
    MATH 2020 Linear Algebra

    Gordon Royle

    October 28, 2008

    1 Vector Spaces

The first vector space that most students encounter is $\mathbb{R}^2$, which is the set of all pairs $(x, y)$ of real numbers. The vectors in $\mathbb{R}^2$ are usually interpreted either as points in the plane, or as quantities (such as velocity) that have both magnitude and direction. In Figure 1 we either view the point $P$ as being located at position $(3, 2)$, or as representing a quantity whose magnitude and direction are given by the directed line segment $\overrightarrow{OP}$, where $O$ is the origin.

[Figure 1: Vectors in $\mathbb{R}^2$: the point $P = (3, 2)$ and the directed segment $\overrightarrow{OP}$ from the origin.]

The vectors in $\mathbb{R}^2$ satisfy certain useful properties (see Linear Algebra Notes, pages 44-45), including (among others) the following:

    Vector Addition: Vectors can be added according to the rule

$(u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2).$

Scalar Multiplication: Vectors can be multiplied by real numbers according to the rule

$\lambda (v_1, v_2) = (\lambda v_1, \lambda v_2).$

Zero Vector: There is a zero vector $0 = (0, 0)$ such that

$u + 0 = 0 + u = u$

for any $u \in \mathbb{R}^2$.

Clearly the set $\mathbb{R}^3$ of triples of real numbers also satisfies these properties, and in general we have the $n$-dimensional space $\mathbb{R}^n$. Although $\mathbb{R}^2$ represents points in the plane and $\mathbb{R}^3$ represents points in space, they are really the same sort of structure.
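As a quick illustration of these componentwise rules, here is a minimal sketch in Python with numpy (the library choice is ours, not part of the notes):

```python
import numpy as np

u = np.array([3.0, 2.0])    # the vector (3, 2) from Figure 1
v = np.array([-1.0, 4.0])   # another vector in R^2

print(u + v)            # componentwise addition: [2. 6.]
print(2.5 * v)          # scalar multiplication:  [-2.5 10.]
print(u + np.zeros(2))  # adding the zero vector leaves u unchanged: [3. 2.]
```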

    1.1 Abstract Real Vector Spaces

One of the main techniques of mathematics is the abstraction of general principles and structures from close examination of particular instances, and their distillation into a collection of axioms. The axioms or rules are meant to capture the essential features that have proven useful in a range of specific examples in a single abstract definition, one that will apply to all of the specific examples, including any that may be discovered in the future.


Definition 1. A real vector space is a set $V$ of objects called vectors together with two functions

$+ : V \times V \to V$

and

$\cdot : \mathbb{R} \times V \to V$

called addition and scalar multiplication, such that the following conditions hold:

(i) Associativity: $u + (v + w) = (u + v) + w$ for all $u, v, w \in V$.

(ii) Commutativity: $u + v = v + u$ for all $u, v \in V$.

(iii) Zero vector: There is some vector $0$ such that $v + 0 = v$ for all $v \in V$.

(iv) Inverses: For each vector $v \in V$ there is another vector $v' \in V$ such that $v + v' = 0$; we usually denote this vector by $-v$.

(v) Distributivity: $\lambda \cdot (u + v) = \lambda \cdot u + \lambda \cdot v$ and $(\lambda + \mu) \cdot v = \lambda \cdot v + \mu \cdot v$ for all $\lambda, \mu \in \mathbb{R}$ and $u, v \in V$.

(vi) Associativity of $\cdot$: $\lambda \cdot (\mu \cdot v) = (\lambda\mu) \cdot v$ for all $\lambda, \mu \in \mathbb{R}$ and $v \in V$.

(vii) Identity: $1 \cdot v = v$ for all $v \in V$.

Although we have given them the usual names of $+$ and $\cdot$, it is important to realize that they can be any functions at all, and they may in some situations look very different from the usual addition and multiplication (even if $V$ is a familiar set).


Example 1. The set $V = \mathbb{R}^2$ with the usual addition and scalar multiplication is a real vector space.

Example 2. The set $\mathbb{R}^{2 \times 2}$ of all $2 \times 2$ real matrices with vector addition defined by

$\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} + \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix} = \begin{pmatrix} a_1 + a_2 & b_1 + b_2 \\ c_1 + c_2 & d_1 + d_2 \end{pmatrix}$

and scalar multiplication by

$\lambda \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \lambda a & \lambda b \\ \lambda c & \lambda d \end{pmatrix}$

is a real vector space.

Example 3. The set of $2 \times 2$ real symmetric matrices (with the usual addition and scalar multiplication) is a real vector space.

Example 4. The set of $2 \times 2$ matrices with determinant 1 (with the usual addition and scalar multiplication) is not a real vector space, because the set is not closed under $\cdot$ (for instance, $\det(2A) = 4$ when $\det A = 1$).

Example 5. The set $V = \{0\}$ with addition defined by

$0 + 0 = 0$

and scalar multiplication defined by

$\lambda \cdot 0 = 0$

is a real vector space.

Example 6. Let $V$ be the set of all positive real numbers and define

$u + v = uv$

and

$k \cdot v = v^k.$

Is this a real vector space?

We need to check all of the conditions in turn.


(i) Vector addition is associative because

$u + (v + w) = u(vw) = (uv)w = (u + v) + w$

by the normal properties of real numbers.

(ii) Vector addition is commutative because

$u + v = uv = vu = v + u.$

(iii) The zero vector is $0 = 1$, because $1 + v = 1v = v$ for all $v$.

(iv) The inverse of $v$ is $1/v$ because

$v + \frac{1}{v} = v \cdot \frac{1}{v} = 1 = 0$

(recall that the zero vector here is the number 1).

(v) Distributivity holds because

$\lambda \cdot (u + v) = \lambda \cdot (uv) = (uv)^\lambda = u^\lambda v^\lambda = (\lambda \cdot u)(\lambda \cdot v) = \lambda \cdot u + \lambda \cdot v.$

(vi) Associativity of $\cdot$ holds because

$\lambda \cdot (\mu \cdot v) = \lambda \cdot (v^\mu) = (v^\mu)^\lambda = v^{\lambda\mu} = (\lambda\mu) \cdot v.$

(vii) Identity holds because $1 \cdot v = v^1 = v$.

    Therefore this set does form a real vector space under these operations.
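These checks can also be probed numerically. The following is a minimal sketch in Python; the helper names `vadd` and `smul` are our own, and random sampling with floating-point tolerances only tests the axioms at finitely many points rather than proving them:

```python
import random

def vadd(u, v):      # "vector addition" on positive reals: u + v := uv
    return u * v

def smul(k, v):      # "scalar multiplication": k . v := v**k
    return v ** k

for _ in range(1000):
    u, v, w = (random.uniform(0.1, 10.0) for _ in range(3))
    lam, mu = random.uniform(-3, 3), random.uniform(-3, 3)
    assert abs(vadd(u, vadd(v, w)) - vadd(vadd(u, v), w)) < 1e-9   # (i)
    assert vadd(u, v) == vadd(v, u)                                # (ii)
    assert vadd(v, 1.0) == v                                       # (iii): zero vector is 1
    assert abs(vadd(v, 1.0 / v) - 1.0) < 1e-12                     # (iv): inverse of v is 1/v
    assert abs(smul(lam, vadd(u, v))
               - vadd(smul(lam, u), smul(lam, v))) < 1e-6          # (v)
    assert abs(smul(lam, smul(mu, v)) - smul(lam * mu, v)) < 1e-6  # (vi)
    assert smul(1.0, v) == v                                       # (vii)
```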

    1.2 Other fields

Throughout the preceding discussion we assumed that the scalars were real numbers. However, there are many other number systems, and we can define a vector space using any of these as the scalars.

The technical term for a suitable number system is a field, but we will not need to know the exact definition of a field, just some examples of fields that we can use. We're already familiar with a number of common fields:


The field $\mathbb{R}$ of real numbers.

The field $\mathbb{C}$ of complex numbers:

$\mathbb{C} = \{a + bi \mid a, b \in \mathbb{R}\}$

where $i$ is a symbol with the property that $i^2 = -1$. We can add together and multiply complex numbers according to the following rules:

$(a + bi) + (c + di) = (a + c) + (b + d)i$

$(a + bi)(c + di) = (ac - bd) + (ad + bc)i$

The field $\mathbb{Q}$ of rational numbers, which are numbers of the form $a/b$ where $a, b$ are integers (i.e. whole numbers) and $b \neq 0$.

All of these fields have infinitely many scalars, but there are also finite fields. The simplest is:

The binary field $\mathbb{F}_2 = \{0, 1\}$, where addition and multiplication are given by

    + | 0 1        x | 0 1
    --+----        --+----
    0 | 0 1        0 | 0 0
    1 | 1 0        1 | 0 1

To a computer scientist these are simply XOR and AND in disguise, while to a mathematician it is simply arithmetic modulo 2.

The prime fields $\mathbb{F}_p = \{0, 1, \ldots, p - 1\}$, where $p$ is a prime (this is important!) and addition and multiplication are performed modulo $p$.

The formal definition of a vector space over a field $\mathbb{F}$ is exactly the same as above, but with every occurrence of $\mathbb{R}$ replaced by an arbitrary field. For example, $\mathbb{F}_2^3$ is the vector space of all triples of scalars from $\mathbb{F}_2$, and thus

$\mathbb{F}_2^3 = \{(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)\}$

is a finite vector space containing just eight vectors.

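The whole space is small enough to enumerate by machine; a short sketch in Python (`itertools.product` generates the triples):

```python
from itertools import product

# All vectors of F_2^3: triples of scalars from F_2 = {0, 1}.
F2_cubed = list(product([0, 1], repeat=3))
print(len(F2_cubed))    # 8, matching the list above

# Vector addition is coordinatewise addition modulo 2 (XOR on each entry).
def add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

print(add((1, 0, 1), (1, 1, 1)))    # (0, 1, 0)
```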


    1.3 Theoretical Consequences

To prove that something is true in all vector spaces, we need to ensure that our argument uses only the axioms and already-proved consequences. It is very easy to accidentally assume something is true because it holds in the familiar examples.

Theorem 1. Let $V$ be a vector space, $v \in V$ and $\lambda \in \mathbb{R}$. Then

(i) $0 \cdot v = 0$

(ii) $\lambda \cdot 0 = 0$

(iii) $(-1) \cdot v = -v$

(iv) $(-\lambda) \cdot v = -(\lambda \cdot v) = \lambda \cdot (-v)$

(v) If $\lambda \cdot v = 0$ then $\lambda = 0$ or $v = 0$.

Proof. We prove each of the assertions in turn, using just the axioms or the earlier results.

Proof of (i): To prove that $0 \cdot v = 0$ we start by using Axiom (v) to deduce that

$0 \cdot v + 0 \cdot v = (0 + 0) \cdot v = 0 \cdot v.$

Adding the inverse of $0 \cdot v$ to both sides we get

$0 \cdot v + 0 = 0$

and so $0 \cdot v = 0$ by Axiom (iii).

Proof of (ii): By Axiom (v),

$\lambda \cdot v + \lambda \cdot 0 = \lambda \cdot (v + 0) = \lambda \cdot v$


and so adding $-(\lambda \cdot v)$ to both sides and arguing similarly to part (i) we get $\lambda \cdot 0 = 0$.

Proof of (iii): For any vector $v$, Axiom (vii) shows that $v = 1 \cdot v$, and so

$v + (-1) \cdot v = 1 \cdot v + (-1) \cdot v = (1 + (-1)) \cdot v = 0 \cdot v = 0$

and so $(-1) \cdot v$ is equal to $-v$.

    Proof of (iv): Similar to (iii).

    Proof of (v): Later.

    Conventions

    From now on, we will adopt the following conventions:

1. We will drop the $\cdot$ for scalar multiplication and just write $\lambda u$ to denote $\lambda \cdot u$.

2. We will define an operation $-$ by

$v - w = v + (-w).$

With these conventions we can now define a linear combination of vectors $\{v_1, v_2, \ldots, v_n\}$ to be any vector of the form

$v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$

without any ambiguity.

    1.4 Some more interesting vector spaces

    We list some further examples of real vector spaces, but omit the proofs.


1. The vector space $\mathbb{R}^\infty$ of sequences

Consider the set of all infinite sequences of real numbers

$\mathbb{R}^\infty = \{(a_1, a_2, \ldots, a_n, \ldots) \mid a_i \in \mathbb{R}\}$

and define vector addition and scalar multiplication so that if $a = (a_1, a_2, \ldots)$ and $b = (b_1, b_2, \ldots)$ then

$a + b = (a_1 + b_1, a_2 + b_2, \ldots)$

and

$\lambda a = (\lambda a_1, \lambda a_2, \ldots).$

This is a real vector space with

$0 = (0, 0, \ldots)$

and

$-a = (-a_1, -a_2, \ldots).$

2. The vector space $\mathbb{R}[x]$ of polynomials

Let $\mathbb{R}[x]$ denote the set of all polynomial functions of a variable $x$. Some sample vectors in this vector space are

$v_1 = 1 + 2x - x^3$

$v_2 = x^{100}$

$v_3 = 3$

Notice that each vector in this space is an entire polynomial (and not, for example, the polynomial evaluated at a single point). Two polynomials $f$ and $g$ are equal if and only if they have the same graph, or in other words if and only if $f(x) = g(x)$ for every value of $x$.

The zero vector $0$ in this vector space is the polynomial $f(x) = 0$ that takes the value 0 everywhere.

The addition of two polynomials is defined simply by adding the coefficients of the corresponding powers of $x$, and multiplying a polynomial by a scalar multiplies each coefficient by that scalar.


3. The vector space $\mathbb{R}^{\mathbb{R}}$ of functions

Let $\mathbb{R}^{\mathbb{R}}$ denote the set of all functions $f : \mathbb{R} \to \mathbb{R}$. This includes the polynomial functions, and so

$\mathbb{R}[x] \subseteq \mathbb{R}^{\mathbb{R}},$

but of course there are many functions that are not polynomials, and so

$\mathbb{R}[x] \neq \mathbb{R}^{\mathbb{R}}.$

The set of all functions includes tame functions such as $\sin x$ and $\cos x$ that are given by simple formulas and can easily be graphed, along with truly wild functions such as

$f(x) = \begin{cases} x, & \text{if } x \in \mathbb{Q}, \\ x^2, & \text{if } x \notin \mathbb{Q}, \end{cases}$

which can't even be drawn.

If $f$ and $g$ are functions, then the sum $f + g$ and the scalar multiple $\lambda f$ are the functions defined by their effect on a value $x$ as follows:

$(f + g)(x) = f(x) + g(x),$

$(\lambda f)(x) = \lambda f(x).$

    2 Subspaces

Several of the examples in the previous section showed that vector spaces can contain smaller vector spaces inside them.

Definition 2 (Subspace). Let $V$ be a vector space and suppose that $W \subseteq V$ is a subset of $V$. Then $W$ is called a subspace of $V$ if $W$ is itself a vector space (with the same field, vector addition and scalar multiplication as $V$).

Proving that a set of vectors is a subspace is a lot easier than proving that a set is a vector space from scratch, because most of the axioms hold automatically in $W$ given that they are true in $V$. For example, we never need to check that vector addition is commutative in $W$, because we already know that it is commutative in $V$.


In fact there are only two things that need to be checked:

Theorem 2. If $V$ is a vector space and $W \subseteq V$ then $W$ is a subspace if and only if

1. $W$ is closed under vector addition, and

2. $W$ is closed under scalar multiplication.

Proof. Omitted here but given in lecture.

Example 7. Is the set of vectors $W = \{(x, y, z) \mid x + y + z = 1\}$ a subspace of $\mathbb{R}^3$?

The answer is no. The set $W$ is not closed under $+$, because if $v = (1, 0, 0)$ and $w = (0, 1, 0)$ then $v, w \in W$ but $v + w = (1, 1, 0) \notin W$.
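Closure failures like this are easy to detect numerically. A minimal sketch in Python (the membership test `in_W` is our own helper for this particular $W$):

```python
import numpy as np

def in_W(v):
    """Membership test for W = {(x, y, z) : x + y + z = 1}."""
    return abs(v.sum() - 1.0) < 1e-12

v = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])
print(in_W(v), in_W(w))   # True True
print(in_W(v + w))        # False: W is not closed under vector addition
print(in_W(2.0 * v))      # False: nor is it closed under scalar multiplication
```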

Example 8. Is the set of even functions a subspace of $\mathbb{R}^{\mathbb{R}}$?

A function $f$ is even if

$f(-x) = f(x)$

for all $x \in \mathbb{R}$, and so the question is whether this property still holds if we add together two even functions. If $h = f + g$ then

$h(-x) = f(-x) + g(-x) = f(x) + g(x) = h(x)$

and so $h$ is itself an even function. In a similar fashion, $h = \lambda f$ is also even, and so the set of even functions is a subspace.

    2.1 Linear Combinations

As noted earlier, in any vector space, an expression of the form

$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$

is unambiguously defined and is called a linear combination of $\{v_1, v_2, \ldots, v_n\}$.


Definition 3. If $V$ is a vector space over a field $\mathbb{F}$, and $v_1, v_2, \ldots, v_n \in V$, then the linear span (usually just called span) of $\{v_1, v_2, \ldots, v_n\}$ is the set of all linear combinations:

$\mathrm{span}(\{v_1, v_2, \ldots, v_n\}) = \{\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n \mid \lambda_i \in \mathbb{F} \text{ for all } i\}.$

    Theorem 3. The span of any set of vectors is a subspace of V.

Example 9. In $\mathbb{R}^3$ let

$S_1 = \{(1, 0, 0)\}$

$S_2 = \{(1, 1, 0), (1, 0, 0)\}$

$S_3 = \{(1, 2, 3), (1, 2, 0), (2, 0, 0)\}$

$S_4 = \{(0, 0, 0), (1, 0, 0), (2, 0, 0)\}$

Then $\mathrm{span}(S_1)$ is the $x$-axis, $\mathrm{span}(S_2)$ is the $xy$-plane, $\mathrm{span}(S_3)$ is the whole of $\mathbb{R}^3$, and $\mathrm{span}(S_4)$ is also the $x$-axis.


    Self Study Problems

1. Which of the following sets are subspaces of $\mathbb{R}^3$?

(a) All vectors of the form $(a, 0, 0)$

(b) All vectors of the form $(a, a^2, a^2)$

(c) All vectors of the form $(a, b, a - b)$

(d) All vectors of the form $(a, b, c)$ where $c = a + b - 1$

(e) The solutions $(x_1, x_2, x_3)$ to the equation

$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 4 \\ 3 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$

(f) The solutions to the equation

$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 4 \\ 3 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$

(g) The line passing through the points $(1, 1, 2)$ and $(1, 2, 1)$

(h) The line passing through the points $(1, 0, 3)$ and $(2, 0, 6)$

(i) The point $(1, 2, 3)$

(j) The point $(0, 0, 0)$

(k) The points at distance at most 1 from the origin

(l) All vectors of the form $(x, \sin^2 x, \cos^2 x)$

(m) All vectors of the form $(0, \sin^2 x, \cos^2 x)$

(n) All vectors of the form $(1, \sin^2 x, \cos^2 x)$

(o) All vectors of the form $(\lambda, \sin^2 x, \cos^2 x)$


2. Which of the following sets are subspaces of $\mathbb{R}^{3 \times 3}$?

    (a) Symmetric matrices.

    (b) Diagonal matrices.

    (c) Upper triangular matrices.

    (d) Singular matrices.

    (e) Invertible matrices.

    (f) Non-invertible matrices.

(g) Matrices $A$ such that $Ax = 0$ where $x = (1, 1, 1)^T$.

(h) Matrices $A$ such that $Ax = x$ for all $x \in \mathbb{R}^3$.

    (i) Matrices with an odd number of entries equal to zero.

    (j) Matrices where the sum of all the entries is 0.

    (k) Matrices where the sum of all the entries is 1.

(l) Row-and-column magic squares, i.e. matrices such that every row and column sums to the same value (not including the diagonals).

(m) Fully magic squares, which also include the diagonal and anti-diagonal sums.

3. Which of the following sets are subspaces of $\mathbb{R}[x]$?

    (a) Polynomials of degree exactly 3.

    (b) Polynomials of degree at most 3.

    (c) Constant polynomials.

    (d) Even polynomials.

(e) Polynomials $f$ such that $f(2) = 1$

(f) Polynomials $f$ such that $f(2) = 0$

(g) Polynomials with zero constant term.

(h) Polynomials that satisfy $f(-1) = f(1)$.

(i) Polynomials of even degree.

(j) Polynomials $f$ such that $\int_0^1 f(x)\,dx = 0$

(k) Polynomials that satisfy $f(-1)f(1) = 0$


4. List all the subspaces of $\mathbb{F}_2^3$.

5. Which of the following are subspaces of $\mathbb{R}^{\mathbb{R}}$?

    (a) Continuous functions

    (b) Differentiable functions

    (c) Piecewise smooth functions

(d) Functions of the form $\lambda \sin x + \mu \cos x$

(e) Functions $f$ such that $\int_0^1 f(x)\,dx = 0$

    3 Bases and Dimension

In this section we consider the idea of a basis of a vector space or subspace. The main concept here is that we wish to specify a vector space or (more usually) a subspace in an efficient manner.

    3.1 Linear Independence

An easy way to specify a subspace $W$ is to give a spanning set of vectors for $W$. For example, in $\mathbb{R}^3$ here are three ways to specify a subspace:

$W_1 = \mathrm{span}(\{(1, 0, -1), (0, 1, 1)\})$

$W_2 = \mathrm{span}(\{(1, 0, -1), (0, 1, 1), (-1, 1, 2)\})$

$W_3 = \mathrm{span}(\{(-1, 1, 2), (1, 1, 0)\})$

Each of these descriptions gives us a complete specification of the subspace: a vector $v$ is in $W_1$ if and only if it can be expressed as a linear combination

$\lambda_1 (1, 0, -1) + \lambda_2 (0, 1, 1).$

For any particular vector we can always decide whether it is in $W_1$ simply by solving a system of linear equations.


Example. Are the vectors $v_1 = (2, 3, 1)$ and $v_2 = (2, 3, -1)$ in $W_1$? To check $v_1$ we need to solve the vector equation

$\lambda_1 (1, 0, -1) + \lambda_2 (0, 1, 1) = (2, 3, 1).$

Looking at each coordinate in turn, we get a system of linear equations

$1\lambda_1 + 0\lambda_2 = 2$
$0\lambda_1 + 1\lambda_2 = 3$
$-1\lambda_1 + 1\lambda_2 = 1$

and this particular system of equations is easy to solve, giving us $\lambda_1 = 2$ and $\lambda_2 = 3$. Therefore we conclude that $v_1$ is a linear combination of the two vectors $\{(1, 0, -1), (0, 1, 1)\}$ and so it is in $W_1$.

When we try to repeat this with $v_2$ we get a different system of equations

$1\lambda_1 + 0\lambda_2 = 2$
$0\lambda_1 + 1\lambda_2 = 3$
$-1\lambda_1 + 1\lambda_2 = -1$

and when we attempt to solve these we discover that they have no solution. Therefore $v_2$ is not in $W_1$.
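Each membership test is just a (possibly inconsistent) linear system, so it can be delegated to a least-squares solver. A sketch in Python with numpy, using the spanning vectors of $W_1$:

```python
import numpy as np

# Columns are the spanning vectors of W1 = span{(1, 0, -1), (0, 1, 1)}.
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 1.0]])

for v in (np.array([2.0, 3.0, 1.0]), np.array([2.0, 3.0, -1.0])):
    lam = np.linalg.lstsq(M, v, rcond=None)[0]   # best-fit (lambda1, lambda2)
    if np.allclose(M @ lam, v):                  # consistent: v is in the span
        print(v, "is in W1 with coefficients", lam)
    else:
        print(v, "is not in W1")
```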

The subspace $W_3$ is actually the same as $W_1$, because any vector that can be expressed as a linear combination of $\{(1, 0, -1), (0, 1, 1)\}$ can also be expressed as a linear combination of $\{(-1, 1, 2), (1, 1, 0)\}$. [Question: How can you prove this?]

The subspace $W_2$ is also the same as $W_1$, but we have expressed it less efficiently. The set of linear combinations that can be reached by using all three vectors is no greater than the set that can be reached by using just the first two, because $(-1, 1, 2)$ is already a linear combination of the other two vectors. Therefore the set of vectors $\{(1, 0, -1), (0, 1, 1), (-1, 1, 2)\}$ has some redundancy in it.

    We need a definition to capture this concept of redundancy.

Definition 4 (Independence). Let $S = \{v_1, v_2, \ldots, v_n\}$ be a set of vectors in a vector space. Then $S$ is called linearly independent if the only solution to the vector equation

$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = 0 \qquad (1)$


is the trivial solution

$\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0.$

If there is a non-trivial solution to (1) then $S$ is called linearly dependent.
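In coordinates, testing independence reduces to a rank computation: the set is independent exactly when the matrix with the vectors as rows has rank equal to the number of vectors. A sketch in Python (this assumes the vectors are already given as tuples of scalars in some $\mathbb{R}^n$):

```python
import numpy as np

def is_independent(vectors):
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

print(is_independent([(1, 0, -1), (0, 1, 1)]))              # True
print(is_independent([(1, 0, -1), (0, 1, 1), (-1, 1, 2)]))  # False: the third
# vector is a linear combination of the first two, as noted above
```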

    Linearly independent sets of vectors have many nice properties:

Theorem 4. If $S = \{v_1, v_2, \ldots, v_n\}$ is a linearly independent set of vectors then

1. Any vector $v \in \mathrm{span}(S)$ can be expressed uniquely as a linear combination of the vectors in $S$.

2. Any subset of $S$ is also linearly independent.

3. If $v \notin \mathrm{span}(S)$ then $S \cup \{v\}$ is also linearly independent.

4. $S$ does not contain $0$.

One of the most important properties of linearly independent sets is a consequence of this unassuming lemma.

Lemma 1. Suppose that $S$ and $T$ are linearly independent sets of size $m$ and $n$ respectively, where $m \le n$, and that $\mathrm{span}(S) = \mathrm{span}(T)$. If $|S \cap T| = k < m$ then there is a linearly independent set $S'$ of size $m$ such that $\mathrm{span}(S') = \mathrm{span}(T)$ and $|S' \cap T| = k + 1$.

Proof. Suppose that

$S = \{v_1, v_2, \ldots, v_k, u_{k+1}, \ldots, u_m\}$

$T = \{v_1, v_2, \ldots, v_k, w_{k+1}, \ldots, w_m, \ldots, w_n\}$

As the two sets have the same span, we can certainly find an expression for $w_{k+1}$ as a linear combination of vectors from $S$:

$w_{k+1} = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k + \alpha_{k+1} u_{k+1} + \cdots + \alpha_m u_m \qquad (2)$

As $T$ is linearly independent, at least one of $\alpha_{k+1}, \ldots, \alpha_m$ must be non-zero, so suppose that $\alpha_j \neq 0$ (where $k + 1 \le j \le m$).


Now let $S' = S - \{u_j\} \cup \{w_{k+1}\}$ (i.e. we replace $u_j$ with $w_{k+1}$). Then $\mathrm{span}(S') = \mathrm{span}(S)$, because anything that can be obtained as a linear combination of the vectors in $S$ can be expressed as a linear combination of the vectors in $S'$, by using (2) to get an expression for $u_j$ in terms of the vectors in $S'$. A similar argument shows that $S'$ is independent.

    3.2 Bases

A basis for a vector space $V$ is a set of vectors $S$ such that

1. $S$ is linearly independent, and

2. $V = \mathrm{span}(S)$.

The easiest way to specify a vector space or subspace is to give a basis for it, because a basis provides a complete and economical specification of exactly which vectors are in that vector space or subspace. A vector space is said to be finite dimensional if it has a finite basis.

Theorem 5. All bases for a finite-dimensional vector space $V$ have the same size, which is known as the dimension of $V$.

Proof. Suppose that $B$ and $C$ are both bases for $V$ and that $|B| \le |C|$. If $B$ is not contained in $C$, then by Lemma 1 we can find a sequence $B = B_1, B_2, \ldots$ of bases of $V$ that have increasingly large intersection with $C$, until eventually $B_i \subseteq C$. As a basis cannot properly contain another basis, it follows that $B_i = C$ and hence that $B$ and $C$ have the same size.

Many of the vector spaces that we have seen have a particular basis that is used so often that it is called the standard basis:

The standard basis for $\mathbb{R}^3$ is

$\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$

and so its dimension is 3.


The standard basis for $\mathbb{R}^{2 \times 2}$ is

$\left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}$

and so its dimension is 4.

The standard basis for $\mathbb{R}[x]$ is

$\{1, x, x^2, x^3, x^4, \ldots\}$

and so it is not finite dimensional.

The standard basis for $\mathbb{R}_3[x]$, the space of polynomials of degree at most 3, is

$\{1, x, x^2, x^3\}$

and so it has dimension 4.

There are two main ways to find a basis:

1. Start with an independent set of vectors $S$ and (if necessary) add more vectors to $S$, keeping it independent, until it becomes a spanning set.

2. Start with a spanning set of vectors $S$ and (if necessary) remove vectors from $S$, keeping it spanning, until it becomes independent.

The first way is called extending an independent set to a basis.

The row space of a matrix is the vector space spanned by its rows. Often, however, the rows will not be linearly independent, and so while they form a spanning set for the row space, they do not form a basis. Gaussian elimination is a procedure for replacing the rows of a matrix with a linearly independent set of rows that span the same space, and thus it can be viewed as a very convenient way to find a basis for a vector space.
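For example, using sympy's exact row reduction (an assumption of this sketch; any implementation of Gaussian elimination would do):

```python
from sympy import Matrix

# The three spanning vectors of W2 from Section 3.1, as rows.
M = Matrix([[1, 0, -1],
            [0, 1, 1],
            [-1, 1, 2]])

R, pivots = M.rref()   # reduced row-echelon form and the pivot columns
print(R)       # non-zero rows (1, 0, -1) and (0, 1, 1): a basis of the row space
print(pivots)  # (0, 1): the rank is 2, so the subspace is 2-dimensional
```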


    3.3 Coordinates

If $B = \{v_1, v_2, \ldots, v_n\}$ is an ordered basis for $V$ then any vector $v \in V$ can be expressed as a unique linear combination $v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$, and the vector

$[v]_B = (\alpha_1, \alpha_2, \ldots, \alpha_n)$

is called the coordinate vector of $v$ with respect to the basis $B$.

This then allows us to specify vectors in any vector space simply as a list of scalars, even if the actual vectors are more complicated objects, such as polynomials or matrices.

Example. Consider the basis

$B = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \right\}$

for $\mathbb{R}^{2 \times 2}$. What is the coordinate vector with respect to $B$ of the vector

$A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}?$

We need to solve the equation

$\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} + \alpha_3 \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + \alpha_4 \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$

Looking at the main diagonal we get

$\alpha_1 + \alpha_2 = 1$
$\alpha_1 - \alpha_2 = 1$

which has the solution $\alpha_1 = 1$ and $\alpha_2 = 0$, and looking at the other two corners we get

$\alpha_3 + \alpha_4 = 2$
$\alpha_3 - \alpha_4 = 1$

and so $\alpha_3 = 3/2$ and $\alpha_4 = 1/2$. Thus

$[A]_B = (1, 0, 3/2, 1/2).$
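Finding a coordinate vector is again just a linear solve once each matrix is flattened to a list of its entries (the flattening order below is our own choice). A sketch in Python:

```python
import numpy as np

# Each basis matrix flattened to (top-left, top-right, bottom-left, bottom-right);
# the columns of M are the basis vectors.
M = np.array([[1, 0, 0, 1],     # [[1, 0], [0, 1]]
              [1, 0, 0, -1],    # [[1, 0], [0, -1]]
              [0, 1, 1, 0],     # [[0, 1], [1, 0]]
              [0, 1, -1, 0]],   # [[0, 1], [-1, 0]]
             dtype=float).T

A = np.array([1.0, 2.0, 1.0, 1.0])   # the matrix A = [[1, 2], [1, 1]], flattened

print(np.linalg.solve(M, A))         # [1. 0. 1.5 0.5] = [A]_B
```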


    4 Linear Transformations

In this section we consider maps between vector spaces that preserve the linear structure, and examine various subspaces associated with those maps.

    4.1 Linear Maps

Definition 5. If $V$ and $W$ are vector spaces over the same field $\mathbb{F}$, then a function $T : V \to W$ is called a linear transformation if the following two conditions are satisfied for all $u, v \in V$ and $\lambda \in \mathbb{F}$:

$T(u + v) = T(u) + T(v)$

$T(\lambda u) = \lambda T(u)$

Notation: Normally we just write $Tu$ rather than $T(u)$ when the meaning is clear.

Example. Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by

$T(x, y, z) = (x + y, y + z).$

This is a linear transformation, because if we put $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ then $u + v = (u_1 + v_1, u_2 + v_2, u_3 + v_3)$ and so

$T(u + v) = (u_1 + v_1 + u_2 + v_2,\ u_2 + v_2 + u_3 + v_3)$

and

$Tu + Tv = (u_1 + u_2, u_2 + u_3) + (v_1 + v_2, v_2 + v_3) = (u_1 + u_2 + v_1 + v_2,\ u_2 + u_3 + v_2 + v_3) = T(u + v)$

as required.

Example. On the other hand, the mapping $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by

$T(x, y) = (xy,\ x + y)$


is not linear, because

$T(1, 1) = (1, 2)$
$T(2, 2) = (4, 4)$

and so $T(2, 2) \neq 2\,T(1, 1)$.

Many common operations that we perform on vectors, matrices and polynomials are actually linear transformations:

1. Projection of vectors onto a subspace, such as $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by

$T(x, y, z) = (x, y, 0).$

2. Taking the trace of a matrix, $T : \mathbb{R}^{3 \times 3} \to \mathbb{R}$, defined by

$T \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11} + a_{22} + a_{33}$

3. Differentiation of polynomials, $T : \mathbb{R}[x] \to \mathbb{R}[x]$, given by

$Tf = f'.$

4. Rotation of vectors (e.g. in computer graphics), $T : \mathbb{R}^2 \to \mathbb{R}^2$, given by

$T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$

    4.2 The rank-nullity theorem

Let $T : V \to W$ be a linear transformation. Then we define the following sets of vectors, called the kernel and range of $T$:

$\ker T = \{v \in V \mid Tv = 0\}$

$\mathrm{range}\, T = \{w \in W \mid w = Tv \text{ for some } v \in V\}$

Of course these are not just arbitrary sets of vectors, but are actually subspaces.


Theorem 6. If $T : V \to W$ is a linear transformation then $\ker T$ is a subspace of $V$ and $\mathrm{range}\, T$ is a subspace of $W$.

Proof. We prove that $\ker T$ is a subspace, as the proof for the range is similar. So suppose that $v_1, v_2 \in \ker T$. Then we have to check whether $v_1 + v_2$ is in the kernel, and also whether $\lambda v_1$ is in the kernel.

$T(v_1 + v_2) = Tv_1 + Tv_2$ (by linearity)
$= 0 + 0$ (by definition of kernel)
$= 0.$

Similarly

$T(\lambda v_1) = \lambda\, Tv_1$ (by linearity)
$= \lambda\, 0$ (by definition of kernel)
$= 0.$

    We have special names for the dimensions of these subspaces.

Definition 6. If $T : V \to W$ is a linear transformation then the rank of $T$ is the dimension of the range, and the nullity of $T$ is the dimension of the kernel.

Example: Consider the projection $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by

$T(x, y, z) = (x, y, 0).$

Then the kernel of $T$ is the set of vectors $\{(0, 0, a) \mid a \in \mathbb{R}\}$, or (speaking geometrically) the $z$-axis. The range of $T$ is the whole of the $xy$-plane. Therefore the rank of $T$ is 2 and the nullity of $T$ is 1.

Theorem 7 (The rank-nullity theorem). If $T : V \to W$ is a linear transformation between finite-dimensional vector spaces $V$ and $W$ then

$\mathrm{rank}(T) + \mathrm{nullity}(T) = \dim(V).$


Proof. Suppose that $\dim(V) = n$ and that the nullity of $T$ is $k$. Then let $v_1, v_2, \ldots, v_k$ be a basis for the kernel of $T$ and extend it to a basis for $V$:

$B = \{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$

Then we claim that

$C = \{Tv_{k+1}, Tv_{k+2}, \ldots, Tv_n\}$

is a basis for the range of $T$, and so the range has dimension $n - k$.

To prove the claim we need to show that every vector in the range is a linear combination of the vectors in $C$, and that the vectors in $C$ are linearly independent. It is fairly clear that every vector in the range of $T$ is a linear combination of the vectors in $C$, and so we just show that they are independent. So suppose that

$\alpha_{k+1} Tv_{k+1} + \cdots + \alpha_n Tv_n = 0 \qquad (3)$

Then by the linearity of $T$ we have

$T(\alpha_{k+1} v_{k+1} + \cdots + \alpha_n v_n) = 0$

which implies that $\alpha_{k+1} v_{k+1} + \cdots + \alpha_n v_n \in \ker T$, and so it is a linear combination of the kernel basis vectors $v_1, \ldots, v_k$. But if

$\alpha_1 v_1 + \cdots + \alpha_k v_k = \alpha_{k+1} v_{k+1} + \cdots + \alpha_n v_n$

then because $B$ is a basis for $V$ it follows that

$\alpha_1 = \alpha_2 = \cdots = \alpha_k = \alpha_{k+1} = \cdots = \alpha_n = 0$

and (3) is the trivial linear combination.

The rank-nullity theorem is often useful for determining the dimension of the kernel or range when one of them is awkward to compute directly for some reason.

Example. Let $T : \mathbb{R}_2[x] \to \mathbb{R}^3$ be given by

$Tf = (f(-1), f(0), f(1))$


What is the kernel of $T$?

It is easy to see that the range of $T$ is the whole of $\mathbb{R}^3$, because $T(1) = (1, 1, 1)$, $T(x) = (-1, 0, 1)$ and $T(x^2) = (1, 0, 1)$, and these three vectors are linearly independent. Therefore the rank is 3, so the nullity is 0, and hence the only vector in the kernel of $T$ is the zero vector $0$.
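The independence claim is a rank computation on the images of the basis $\{1, x, x^2\}$. A quick check in Python:

```python
import numpy as np

# Columns are T(1), T(x), T(x^2) where T f = (f(-1), f(0), f(1)).
M = np.column_stack([np.array([1.0, 1.0, 1.0]),    # T(1)
                     np.array([-1.0, 0.0, 1.0]),   # T(x)
                     np.array([1.0, 0.0, 1.0])])   # T(x^2)

print(np.linalg.matrix_rank(M))   # 3: rank(T) = 3, so nullity(T) = 3 - 3 = 0
```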

    4.3 The matrix of a linear transformation

It is easy to see that multiplication by a matrix is a linear transformation. Thus if we let $A$ be an $m \times n$ matrix (i.e. it has $m$ rows and $n$ columns) then the map $T : \mathbb{R}^n \to \mathbb{R}^m$ given by

$Tx = Ax$

(where $x$ is viewed as a column vector rather than a row vector) is a linear transformation.

In fact, in a very strong sense, every linear transformation is essentially equivalent to multiplication by a matrix.

Definition 7. If $B = \{v_1, v_2, \ldots, v_n\}$ and $C$ are ordered bases for $V$ and $W$ respectively, and $T : V \to W$ is a linear transformation, then the matrix of $T$ with respect to $B$ and $C$ is the matrix

$[T]_{CB} = \Big[\, [Tv_1]_C \;\; [Tv_2]_C \;\; \cdots \;\; [Tv_n]_C \,\Big]$

where the $j$th column is the vector $[Tv_j]_C$ written as a column vector.

Theorem 8. With notation as above, for any vector $v \in V$ we have

$[Tv]_C = [T]_{CB}\, [v]_B$

thus showing that any linear transformation can be expressed as multiplication by a matrix.


Proof. If $v \in V$ then $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$ and so its coordinate vector is

$[v]_B = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}.$

Multiplying this vector by the matrix of the linear transformation we get

$[T]_{CB}\, [v]_B = \alpha_1 [Tv_1]_C + \alpha_2 [Tv_2]_C + \cdots + \alpha_n [Tv_n]_C$

which is

$[T(\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n)]_C = [Tv]_C$

as required.

Note: The key point of this result is that multiplying by the matrix of the linear transformation has the effect of taking the $B$-coordinate vector of $v$ and returning the $C$-coordinate vector of $Tv$. In other words, the matrix multiplication both applies the linear transformation and expresses the result in $C$-coordinates.
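For a concrete instance, the matrix of the map $T(x, y, z) = (x + y, y + z)$ from Section 4.1, with respect to the standard bases (our choice here), can be built column by column. A sketch in Python:

```python
import numpy as np

def T(v):
    x, y, z = v
    return np.array([x + y, y + z])   # T : R^3 -> R^2 from Section 4.1

# Column j is T applied to the j-th standard basis vector of R^3.
M = np.column_stack([T(e) for e in np.eye(3)])
print(M)   # [[1. 1. 0.]
           #  [0. 1. 1.]]

v = np.array([3.0, -1.0, 2.0])
print(M @ v, T(v))   # both give [2. 1.]: multiplying by M applies T
```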

    5 Change of Basis

Let $V$ be a vector space and suppose that $B$ and $C$ are bases for $V$. Then we can consider the identity linear transformation $I : V \to V$ where

$Iv = v.$

In other words, $I$ maps each vector to itself!

The matrix of this linear transformation with respect to the bases $B = \{v_1, \ldots, v_n\}$ and $C$ is defined as usual by

$[I]_{CB} = \big[\, [v_1]_C \;\; [v_2]_C \;\; \cdots \;\; [v_n]_C \,\big]$

We recall that multiplying by the matrix of a linear transformation has two effects on the $B$-coordinate vector of $v$:


1. It applies the linear transformation to $v$.

2. It expresses the result in $C$-coordinates.

When the transformation is the identity, the only effect is to translate $B$-coordinates into $C$-coordinates, and in this case the matrix is called the transition matrix between the bases $B$ and $C$.

Example. Let $V = \mathbb{R}^3$ and consider the two bases

$B = \{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$

and

$C = \{(1, -1, 0), (1, 0, -1), (1, 0, 1)\}.$

Express each of the vectors in $B$ as a linear combination of those in $C$ to get the coordinate vectors. For the first vector of $B$:

$(1, 1, 1) = (-1)(1, -1, 0) + \tfrac{1}{2}(1, 0, -1) + \tfrac{3}{2}(1, 0, 1)$

and so $[(1, 1, 1)]_C = (-1, 1/2, 3/2)$, which is the first column of the transition matrix $[I]_{CB}$.
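Each column of the transition matrix comes from solving a linear system, which is easy to automate. A sketch in Python using the bases $B$ and $C$ above:

```python
import numpy as np

B = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]], dtype=float).T    # columns: basis B
C = np.array([[1, -1, 0], [1, 0, -1], [1, 0, 1]], dtype=float).T  # columns: basis C

# Solving C X = B gives, in column j, the C-coordinates of the j-th vector of B.
P = np.linalg.solve(C, B)
print(P[:, 0])   # [-1.  0.5  1.5], the coordinate vector computed by hand above
```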

    6 Eigenvalues, Eigenvectors and Eigenspaces

Suppose that $A$ is a real $n \times n$ matrix. Then a vector $v \in \mathbb{R}^n$ is called an eigenvector of $A$ with eigenvalue $\lambda$ if

$Av = \lambda v$

In other words, the vector is simply multiplied by the scalar $\lambda$.

Example: If

$A = \begin{pmatrix} 1 & 1 \\ 0 & 3 \end{pmatrix}$


then $v = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ is an eigenvector for $A$ with eigenvalue 3, because

$\begin{pmatrix} 1 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \end{pmatrix}$

If $v$ is an eigenvector with eigenvalue $\lambda$, then so is every non-zero scalar multiple of $v$.

Theorem 9. If $A$ is a real $n \times n$ matrix, then the set of all eigenvectors of $A$ with eigenvalue $\lambda$ (together with the zero vector) is a subspace of $\mathbb{R}^n$, called the eigenspace $E_\lambda$ corresponding to $\lambda$.

Proof. Suppose that $v, w$ are both eigenvectors with eigenvalue $\lambda$. Then

$A(v + w) = Av + Aw = \lambda v + \lambda w = \lambda(v + w)$

and so $v + w$ is an eigenvector with eigenvalue $\lambda$.

    6.1 The characteristic polynomial

We can determine all the possible eigenvalues of a matrix as follows. First we note that if

$Av = \lambda v$

then

$(\lambda I - A)v = 0,$

or in other words $v$ is in the null space of the matrix $\lambda I - A$.

Therefore, if there is a non-zero such $v$, then $\lambda I - A$ is not an invertible matrix and so its determinant is zero. Hence we define the characteristic polynomial of $A$ to be the polynomial

$c(\lambda) = |\lambda I - A|.$

Then the only possible eigenvalues of $A$ are the zeros of this characteristic polynomial.

Example. Let

$A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}$


Then

$c(\lambda) = \begin{vmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2.$

Therefore the only possible eigenvalues for $A$ are $\lambda = 1$ and $\lambda = 2$.

To find the corresponding eigenspaces we need to solve the two systems of linear equations

$\begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$

and

$\begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix}.$

The first system of linear equations is equivalent to

$\begin{pmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$

which after reduction to row-echelon form leaves us with

$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$

and so we have the solution space

$\{(-2z, z, z) \mid z \in \mathbb{R}\}$

which is a one-dimensional eigenspace spanned by $\{(-2, 1, 1)\}$.

The second system is equivalent to

$\begin{pmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$


which after reduction to row-echelon form leaves us with

$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$

and so $y, z$ are free variables and $x = -z$. Hence the solution space is

$\{(-z, y, z) \mid y, z \in \mathbb{R}\}$

which is a two-dimensional eigenspace spanned by $\{(-1, 0, 1), (0, 1, 0)\}$.
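These hand computations are easy to confirm numerically; a sketch with numpy (note that `numpy.linalg.eig` may list the eigenvalues in a different order):

```python
import numpy as np

A = np.array([[0.0, 0.0, -2.0],
              [1.0, 2.0, 1.0],
              [1.0, 0.0, 3.0]])

evals, evecs = np.linalg.eig(A)
print(np.round(evals, 6))          # 1 and 2 (the latter repeated)

v = np.array([-2.0, 1.0, 1.0])     # the hand-computed eigenvector for lambda = 1
print(A @ v)                       # [-2. 1. 1.] = 1 * v, as claimed
```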

    7 Markov Chains

Consider some sort of system that evolves over time in a probabilistic fashion. For example, suppose that out of Perth's 1 million adult inhabitants there are initially 20% Dockers supporters, 40% Eagles supporters and 40% not interested in AFL. We can represent these proportions in a vector representing the state of the system at time 0:

$s_0 = \begin{pmatrix} 0.2 \\ 0.4 \\ 0.4 \end{pmatrix}.$

    After each year, some people change their habits as follows:

Of the Dockers supporters at the end of any year, 50% remain Dockers fans, 30% switch to the Eagles, and 20% give up in disgust and lose interest in football.

Of the Eagles supporters, 20% change to the Dockers while 80% remain with the Eagles.

Of those not interested, 30% become Dockers fans, 30% become Eagles fans and 40% remain uninterested.


The proportions/probabilities are expressed in a transition matrix, where each column contains non-negative real numbers that sum to 1:

$T = \begin{pmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.8 & 0.3 \\ 0.2 & 0 & 0.4 \end{pmatrix}$

Definition 8. A stochastic matrix is a square matrix where each column contains non-negative real numbers that sum to 1.

The state of the system after 1 year (working with counts of people rather than proportions) is then

$s_1 = T s_0 = \begin{pmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.8 & 0.3 \\ 0.2 & 0 & 0.4 \end{pmatrix} \begin{pmatrix} 200000 \\ 400000 \\ 400000 \end{pmatrix} = \begin{pmatrix} 300000 \\ 500000 \\ 200000 \end{pmatrix}$

After another year we have

$s_2 = T s_1 = \begin{pmatrix} 310000 \\ 550000 \\ 140000 \end{pmatrix}$

and then the sequence of states continues

$s_3 = \begin{pmatrix} 307000 \\ 575000 \\ 118000 \end{pmatrix}, \quad s_4 = \begin{pmatrix} 303900 \\ 587500 \\ 108600 \end{pmatrix}, \quad s_5 = \begin{pmatrix} 302030 \\ 593750 \\ 104220 \end{pmatrix}, \quad s_6 = \begin{pmatrix} 301031 \\ 596875 \\ 102094 \end{pmatrix}, \ldots$

The fluctuations appear to be settling down, with the system approaching a steady state $s$ that must satisfy the equation

$T s = s.$

Therefore the steady state of this Markov chain is an eigenvector with eigenvalue 1. Solving the system of linear equations we discover that the steady-state vector is

$s = \begin{pmatrix} 300000 \\ 600000 \\ 100000 \end{pmatrix}$


    and that the system appears to be approaching this steady state.

What are the other eigenvalues and eigenvectors of the transition matrix $T$? It is rather tedious to calculate the characteristic polynomial, but it turns out that the eigenvalues of $T$ are $\{1/5, 1/2, 1\}$, with eigenvectors $v_1 = (1, 0, -1)$, $v_2 = (1, -3, 2)$ and $v_3 = (3, 6, 1)$ respectively. These eigenvectors form a basis for $\mathbb{R}^3$ and so the initial state vector is some linear combination of them:

$s_0 = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3$

Therefore the state at time $k$ is given by

$s_k = T^k s_0 = \alpha_1 (1/5)^k v_1 + \alpha_2 (1/2)^k v_2 + \alpha_3 v_3$

As $k$ increases, the terms $(1/5)^k$ and $(1/2)^k$ become vanishingly small and all that remains is the term $\alpha_3 v_3$. Therefore any initial state vector will tend towards a multiple of $v_3$, provided the initial vector is not in $\mathrm{span}(\{v_1, v_2\})$.
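The convergence is easy to watch directly by iterating the transition matrix. A sketch in Python:

```python
import numpy as np

T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.2, 0.0, 0.4]])

s = np.array([200_000.0, 400_000.0, 400_000.0])   # the initial counts s0
for year in range(1, 7):
    s = T @ s
    print(year, np.round(s))   # reproduces s1, ..., s6 above

for _ in range(44):            # keep going: the (1/5)^k and (1/2)^k terms die out
    s = T @ s
print(np.round(s))             # [300000. 600000. 100000.], the steady state
```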

Under certain mild conditions, a Markov chain can be guaranteed to converge to a steady-state vector:

Theorem 10. If a Markov chain is described by a stochastic transition matrix $T$ such that either $T$ or some power $T^k$ has strictly positive entries, then regardless of the initial state the chain converges to a unique steady state.

    7.1 The $100 billion dollar eigenvector

The most famous Markov chain of all, and certainly the most lucrative, is the one underlying the original PageRank algorithm of Google.

This algorithm views the entire web as a giant Markov process where users move from page to page depending on the links on each page. In particular, the entry $T_{ij}$ models the probability that a user will surf from page $j$ to page $i$, which depends on whether page $j$ links to page $i$ and how many other links there are from page $j$ to other pages. Finally, there is always a small chance that the user will randomly surf to an unlinked page.

The steady-state vector of this giant Markov chain represents the overall popularity of each web page, and so the pages are ranked according to the values in the steady-state vector.


    values in the steady state vector. As T is a matrix with around 4 billion rows

    and columns, this is a massive computation!
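The idea scales down to a toy example. The sketch below builds a 4-page web with hypothetical links (the link matrix, the damping value 0.85 and the page count are our assumptions for illustration, not Google's actual data) and finds the steady state by repeated multiplication:

```python
import numpy as np

# L[i][j] = 1 if page j links to page i (a made-up 4-page web).
L = np.array([[0, 0, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Normalise each column so it sums to 1, then blend in a uniform random jump
# to model the small chance of surfing to an unlinked page.
d = 0.85
T = d * (L / L.sum(axis=0)) + (1 - d) / 4

s = np.full(4, 0.25)       # start from the uniform distribution
for _ in range(100):
    s = T @ s              # power iteration towards the steady state
print(np.round(s, 4))      # the PageRank-style popularity of each page
```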

    8 Inner Products

An inner product on a vector space $V$ is a function

$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$

that satisfies the following conditions for all vectors $u, v, w \in V$ and all scalars $\lambda$:

1. $\langle u, v \rangle = \langle v, u \rangle$

2. $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$

3. $\langle \lambda u, v \rangle = \lambda \langle u, v \rangle = \langle u, \lambda v \rangle$

4. $\langle u, u \rangle \ge 0$, and $\langle u, u \rangle = 0 \iff u = 0$

These properties of an inner product are modelled on the familiar dot product in $\mathbb{R}^n$ given by

$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$

There are a number of other examples of inner products for different vector spaces.

The usual dot product in $\mathbb{R}^n$ (as above).

If we pick $n + 1$ distinct real numbers $x_0, x_1, \ldots, x_n$ then the function

$\langle f, g \rangle = f(x_0)g(x_0) + f(x_1)g(x_1) + \cdots + f(x_n)g(x_n)$

is an inner product on the vector space $\mathbb{R}_n[x]$ of polynomials of degree at most $n$.


The vector space $C[0, 2\pi]$, which is the set of continuous real-valued functions defined on the interval $0 \le x \le 2\pi$, has an inner product

$\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx.$

The function

$\langle A, B \rangle = \mathrm{tr}(AB^T)$

is an inner product on the space of square matrices $\mathbb{R}^{n \times n}$.
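Two of these are easy to experiment with numerically. A sketch in Python (the sample points $-1, 0, 1$ and the particular matrices are our own choices):

```python
import numpy as np

# Evaluation inner product on R_2[x] with the points x0, x1, x2 = -1, 0, 1.
xs = np.array([-1.0, 0.0, 1.0])
def ip_poly(f, g):
    return float(np.sum(f(xs) * g(xs)))

print(ip_poly(lambda x: x, lambda x: x**2))   # <x, x^2> = -1 + 0 + 1 = 0

# Trace inner product <A, B> = tr(A B^T): it agrees with the entrywise
# dot product of the two matrices.
A = np.array([[1.0, 2.0], [1.0, 1.0]])
B = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.trace(A @ B.T), np.sum(A * B))       # both 1.0
```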

A vector space $V$ together with an inner product is called an inner product space.

Definition 9. In an inner product space, a set of non-zero vectors $\{v_1, \ldots, v_k\}$ is called orthogonal if

$\langle v_i, v_j \rangle = 0$

for all $i \neq j$, and the set of vectors is called orthonormal if it is orthogonal and in addition

$\langle v_i, v_i \rangle = 1$

for all $i$.

    8.1 Orthogonal Projection

Let $W$ be a subspace of an inner product space $V$. The next lemma shows that any vector in $V$ can be expressed as the sum of two vectors, one in $W$ and one in the orthogonal complement of $W$, which is the subspace $W^\perp = \{v \in V \mid \langle v, w \rangle = 0 \text{ for all } w \in W\}$.

Lemma 2. Let $W$ be a subspace of an inner product space $V$. Then any vector $v \in V$ can be expressed in the form

$v = w + w^\perp \qquad (4)$

where $w \in W$ and $w^\perp \in W^\perp$.


Proof. Let $W$ have an orthonormal basis $\{w_1, w_2, \ldots, w_k\}$ and define

$w = \langle v, w_1 \rangle w_1 + \langle v, w_2 \rangle w_2 + \cdots + \langle v, w_k \rangle w_k$

and $w^\perp = v - w$. Then it is obvious that $w \in W$ and that $v = w + w^\perp$. To show that $w^\perp \in W^\perp$ we simply show that it is orthogonal to each $w_i$ where $1 \le i \le k$. But clearly

$\langle w_i, w^\perp \rangle = \langle v - w, w_i \rangle = \langle v, w_i \rangle - \langle w, w_i \rangle = 0.$

The vector $w$ in (4) is called the projection of $v$ onto $W$ and is denoted $\mathrm{proj}_W(v)$.
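The proof is constructive, so it translates directly into code. A sketch in Python (the orthonormal basis here spans the $xy$-plane inside $\mathbb{R}^3$, with the standard dot product as the inner product; both are our own choices for illustration):

```python
import numpy as np

def proj(v, onb):
    """Project v onto span(onb); onb must be an orthonormal list of vectors."""
    return sum(np.dot(v, w) * w for w in onb)

W_basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]

v = np.array([3.0, 2.0, 5.0])
w = proj(v, W_basis)
print(w, v - w)   # w = (3, 2, 0) in W, and w_perp = (0, 0, 5)
print([np.dot(v - w, wi) for wi in W_basis])   # zeros: w_perp is orthogonal to W
```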

    8.2 Symmetric Matrices

One important class of linear transformations consists of those represented by symmetric matrices. The major properties of symmetric matrices are summarised in the next theorem.

Theorem 11. Suppose that $A$ is an $n \times n$ symmetric matrix. Then

1. The matrix $A$ has $n$ real eigenvalues (counted with multiplicity).

2. If $\lambda$ is an eigenvalue of $A$ with multiplicity $m$, then the eigenspace $E_\lambda$ has dimension $m$.

3. Eigenvectors belonging to different eigenspaces of $A$ are orthogonal.

Proof. For now we just consider the third statement. Suppose that $v_1$ has eigenvalue $\lambda_1$ and that $v_2$ has eigenvalue $\lambda_2$, where $\lambda_1 \neq \lambda_2$. Then consider the value

$v_1^T A v_2 = v_2^T A v_1$

(these are equal because $A = A^T$ and a $1 \times 1$ matrix equals its own transpose). As $v_2$ is an eigenvector for $A$, the first expression shows that

$v_1^T A v_2 = v_1^T \lambda_2 v_2 = \lambda_2 (v_1 \cdot v_2)$


and as $v_1$ is an eigenvector for $A$, the second expression shows that

$v_2^T A v_1 = v_2^T \lambda_1 v_1 = \lambda_1 (v_1 \cdot v_2).$

Therefore

$\lambda_1 (v_1 \cdot v_2) = \lambda_2 (v_1 \cdot v_2)$

and as $\lambda_1 \neq \lambda_2$ it follows that $v_1 \cdot v_2 = 0$.

Definition 10. An invertible matrix $P$ is called orthogonal if

$P^{-1} = P^T.$

Corollary 1. A matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.
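Numerically, orthogonal diagonalization of a symmetric matrix is exactly what `numpy.linalg.eigh` computes. A sketch (the matrix $S$ is our own example):

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])   # a symmetric matrix

evals, P = np.linalg.eigh(S)      # eigensolver specialised to symmetric matrices
print(evals)                      # three real eigenvalues: [1. 3. 3.]
print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal, P^{-1} = P^T
print(np.round(P.T @ S @ P, 10))  # diag(1, 3, 3): S is orthogonally diagonalized
```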
