
Math Camp for Economists

Daniel A. Graham

July 29, 2010

Copyright © 2007–2008 Daniel A. Graham


Contents

Preface

1 Linear Algebra
  1.1 Real Spaces
  1.2 Combinations of Vectors
  1.3 The Standard Linear Equation
  1.4 Separating and Supporting Hyperplanes
  1.5 Answers

2 Matrix Algebra
  2.1 Linear Spaces
  2.2 Real Valued Linear Transformations and Vectors
  2.3 Linear Transformations and Matrices
  2.4 Invertible Transformations
  2.5 Change of Basis and Similar Matrices
  2.6 Systems of Linear Equations
  2.7 Square Matrices
  2.8 Farkas’ Lemma
  2.9 Answers

3 Topology
  3.1 Counting
  3.2 Metric Spaces
  3.3 Topological Spaces
  3.4 Sigma Algebras and Measure Spaces
  3.5 Answers

4 Calculus
  4.1 The First Differential
  4.2 Quadratic Forms
  4.3 The Second Differential
  4.4 Convex and Concave Functions
  4.5 Answers

5 Optimization
  5.1 Optimization
  5.2 The Well Posed Problem
  5.3 Comparative Statics

6 Dynamics
  6.1 Dynamic Systems
  6.2 Systems of Linear Differential Equations
  6.3 Liapunov’s Method

Notation

Using Mathematica
  Basics
  Input Expressions
  Symbols and Numbers
  Using Prior Expressions
  Commonly Used Commands

Bibliography

List of Problems

Index

Colophon

Preface

The attached represents the results of many years of effort to economize on the use of my scarce mental resources — particularly memory. My goal has been to extract just those mathematical ideas which are most important to Economics and to present them in a way that emphasizes the approach, common to virtually all of mathematics, that begins with the phrase “let X be a non-empty set” and goes on to add a pinch of this and a dash of that.

I believe that Mathematics is both beautiful and useful and, when viewed in the right way, not nearly as complicated as some would have you believe. For me, the right way is to identify the links connecting the ideas and, whenever possible, to embed them in a visual setting.

The reader should be aware of two aspects of these notes. First, intuition is emphasized. While “Prove Theorem 7” might be a common format for exercises inside courses, “State an interesting proposition and prove it” is far more common outside courses. Intuition is vital for such endeavors. Second, use of the symbolic algebra program Mathematica is emphasized for at least the following reasons:

• Mathematica is better at solving a wide variety of problems than you or I will ever be. Our comparative advantage is in modeling, not solving.

• Mathematica lowers the marginal cost of asking “What if?” questions, thereby inducing us to ask more of them. This is a very good thing. One of the best ways of formulating conjectures about what might be true, for instance, is to examine many specific cases and this is a relatively cheap endeavor with Mathematica.

• Mathematica encourages formulating solution plans and, in general, top-down thinking. After all, with it to do the heavy lifting, all that’s left for us is to formulate the problem and plan the steps. This, too, is a very good thing.

Why Mathematica and not Maple, another popular symbolics program? While there are differences, both are wonderful programs and it would be difficult to argue that either is better than the other. I’ve used both and have a slight personal preference for Mathematica.

    Dan Graham

    Duke University


Chapter 1

    Linear Algebra

1.1 Real Spaces
  1.1.1 Equality and Inequalities
  1.1.2 Norms, Sums and Products
1.2 Combinations of Vectors
  1.2.1 Linear Combinations
  1.2.2 Affine Combinations
  1.2.3 Convex Combinations
1.3 The Standard Linear Equation
1.4 Separating and Supporting Hyperplanes
1.5 Answers

The following is an informal review of that part of linear algebra which will be most important to subsequent analysis. Please bear in mind that linear algebra is, perhaps, the single most important tool in Economics and forms the basis for many other important areas of mathematics as well.

    1.1 Real Spaces

Recall that the Cartesian product of sets, e.g. “Capital Letters” × “Integers” × “Lower Case Letters”, is itself a set composed of all ordered n-tuples of elements chosen from the respective sets, e.g., (G, 5, f), (F, 1, a) and so forth. Note that no multiplication is involved in forming this product. Now introduce the set consisting of all real numbers, denoted R and called “the reals”, and real n-space is obtained as the n-fold Cartesian product of the reals with itself:


Figure 1.1: Vectors in R2 (the points (6,−3) and (−4,5))

Rn ≡ R × ⋯ × R (n times) ≡ {(x1, x2, …, xn) | xi ∈ R, i = 1, 2, …, n}

The origin is the point (0, 0, …, 0) ∈ Rn. It will sometimes be written simply as 0 when no confusion should result. An arbitrary element of this set, x ∈ Rn, is sometimes called a point and sometimes called a vector and xi is called the ith component of x. The existence of two terms for the same thing is due, no doubt, to the fact that it is sometimes useful to think of x = (x1, x2), for example, as a point located at x1 on the first axis and x2 on the second axis. Other times it is useful to think of x = (x1, x2) as a directed arrow or vector with its tail at the origin, (0, 0), and its tip at the point x = (x1, x2). See Figure 1.1. It is important to realize, on the other hand, that it is hardly ever useful to think of a vector as a list of its coordinates. Vectors are objects and better regarded as such than as lists of their components.

    1.1.1 Equality and Inequalities

Recall that a relation (or binary relation), R, on a set S is a mapping from S × S to {True, False}, i.e., for every x, y ∈ S, xRy is either “True” or “False”. This is illustrated in Figure 1.2 for the relation > on R. Note that points along the “45-degree line” where x = y map into “False”.

Figure 1.2: The Relation > on R

Supposing that x, y ∈ Rn, several sorts of relations are possible between these vectors. The vector x is equal to the vector y when each component of x is equal to the corresponding component of y:

Definition 1.1. x = y iff xi = yi, i = 1, 2, …, n.

The vector x is greater than or equal to the vector y when each component of x is at least as great as the corresponding component of y:

Definition 1.2. x ≥ y iff xi ≥ yi, i = 1, 2, …, n.

The vector x is greater than the vector y when each component of x is at least as great as the corresponding component of y and at least one component of x is strictly greater than the corresponding component of y:

Definition 1.3. x > y iff x ≥ y and x ≠ y.

Problem 1.1. [Answer] Suppose x > y. Does it necessarily follow that x ≥ y?

The vector x is strictly greater than the vector y when each component of x is greater than the corresponding component of y:

Definition 1.4. x ≫ y iff xi > yi, i = 1, 2, …, n.

Problem 1.2. [Answer] Suppose x ≫ y. Does it necessarily follow that x > y?
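Since these componentwise orderings recur throughout the notes, a short Mathematica sketch may help fix the distinctions (the helper names vecGeq, vecGt and vecStrictGt are illustrative, not from the text):

vecGeq[x_List, y_List] := And @@ Thread[x >= y]      (* Definition 1.2: x ≥ y *)
vecGt[x_List, y_List] := vecGeq[x, y] && x =!= y     (* Definition 1.3: x > y *)
vecStrictGt[x_List, y_List] := And @@ Thread[x > y]  (* Definition 1.4: x ≫ y *)

vecGt[{4, 3}, {4, 3}]        (* False: x > y requires x ≠ y *)
vecGt[{5, 3}, {4, 3}]        (* True *)
vecStrictGt[{5, 3}, {4, 3}]  (* False: the second components are equal *)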


These definitions are standard and they conform to the conventional usage in the special case in which n = 1. The distinctions are illustrated in Figure 1.3. The shaded area in the left-hand panel represents the set of vectors, y, for which y ≫ (4,3). Note that (4,3) does not belong to this set nor does any point directly above (4,3) nor any point directly to the right of (4,3). The shaded area in the right-hand panel illustrates the set of vectors, y, for which y ≥ (4,3). This shaded area differs by including (4,3), points directly above (4,3) and points directly to the right of (4,3). Though not illustrated, the set of y’s for which y > (4,3) corresponds to the shaded area in the right-hand panel with the point (4,3) itself removed.

Figure 1.3: Vector Inequalities: y ≫ x (left) and y ≥ x (right), for x = (4,3)

Problem 1.3. A relation, R, on a set S is transitive iff xRy and yRz implies xRz for all x, y, z ∈ S. (i) Is = transitive? (ii) What about ≥? (iii) ≫? (iv) >?

Problem 1.4. A relation, R, on a set S is complete if either xRy or yRx for all x, y ∈ S. When n = 1 it must either be the case that x ≥ y or that y ≥ x (or both). Thus ≥ on R1 is complete. Is ≥ on Rn complete when n > 1?

Problem 1.5. [Answer] Consider the case in which n = 1 and x, y ∈ R1. Is there any distinction between x > y and x ≫ y?

    1.1.2 Norms, Sums and Products

Figure 1.4: Vector Norms in R1, R2 and R3 (‖(−3)‖ = 3, ‖(3,4)‖ = 5, ‖(2,4,4)‖ = 6)

The (Euclidean) norm or length of a vector, by an obvious extension of the Pythagorean Theorem, is the square root of the sum of the squares of its components.


Definition 1.5. ‖x‖ ≡ (x1^2 + x2^2 + … + xn^2)^(1/2).

Note that the absolute value of a real number and the norm of a vector in R1 are equivalent — if a ∈ R then ‖a‖ = √(a^2) = |a|. The norms of vectors in R2 and R3 are illustrated in Figure 1.4. The extensions to higher dimensions are analogous.

Figure 1.5: Angles (x = (4,−3), y = (3,4), z = (−3,4), w = (−3,−4))

If x, y ∈ Rn, the dot or inner product of these two vectors is obtained by multiplying the respective components and adding.

Definition 1.6. x · y ≡ x1y1 + x2y2 + … + xnyn = ∑i xiyi

There is a very important geometric interpretation of this dot product. It can be shown that

x · y = ‖x‖‖y‖ cos θ        (1.1)

where θ is the included angle between the two vectors. Recall that the cosine of θ is bigger than, equal to or less than zero depending upon whether θ is less than, equal to or greater than ninety degrees.

Theorem 1. Suppose x, y ∈ Rn with x, y ≠ 0. Then x · y > 0 iff x and y form an acute angle, x · y = 0 iff x and y form a right angle and x · y < 0 iff x and y form an obtuse angle.

This theorem is illustrated in Figure 1.5 where (a) x and y form a right angle and x · y = 0, (b) x and w form a right angle and x · w = 0, (c) x and z form an obtuse angle and x · z = −24 < 0, (d) y and w form an obtuse angle and y · w = −25 < 0 and (e) y and z form an acute angle and y · z = 7 > 0.
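These dot products are easy to verify in Mathematica; the built-in VectorAngle returns the included angle directly:

x = {4, -3}; y = {3, 4}; z = {-3, 4}; w = {-3, -4};   (* the vectors of Figure 1.5 *)
{x . y, x . w, x . z, y . w, y . z}                   (* {0, 0, -24, -25, 7} *)
VectorAngle[y, z] < Pi/2                              (* True: y and z form an acute angle *)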

When two vectors form a right angle they are said to be orthogonal. Note that the word “orthogonal” is just the generalization of the word “perpendicular” to Rn. Similarly, “orthant” is the generalization of the word “quadrant” to Rn.

Problem 1.6. The Cauchy-Schwarz inequality states that |x · y| ≤ ‖x‖‖y‖. Show that this inequality follows from Equation 1.1.

    ♦Query 1.1. When does Cauchy-Schwarz hold as an equality?

Problem 1.7. Suppose a, x, y ∈ Rn and a · x > a · y. Does it follow that x > y? [Hint: resist any temptation to “divide both sides by a”.]

Problem 1.8. In Mathematica a vector is a list, e.g. {1,2,3,4} or Table[j,{j,1,4}], and the dot product of two vectors is obtained by placing a period between them. Use Mathematica to evaluate the following dot product:

    Table[j, {j,1,100}] . Table[j-50, {j,1,100}]

    Do the two vectors form an acute angle, an obtuse angle or are they orthogonal?

The sum of two vectors is obtained by adding the respective components. Supposing that x, y ∈ Rn we have:

Definition 1.7. x + y ≡ (x1 + y1, x2 + y2, …, xn + yn)


Figure 1.6: Vector Addition (x = (5,−3), y = (−1,5), x + y = (4,2))

Note that the sum of two vectors in Rn is itself a vector in Rn. The set Rn is said, therefore, to be closed with respect to the operation of addition.

The addition of two points from R2 is illustrated in Figure 1.6. The addition of x = (5,−3) and y = (−1,5) yields a point located at the corner of the parallelogram whose sides are formed by the vectors x and y. Equivalently, x + y = (4,2) is obtained by moving the vector x parallel to itself until its tail rests at the tip of y, or by moving the vector y parallel to itself until its tail rests at the tip of x.

The scalar product of a real number and a vector is obtained by multiplying each component of the vector by the real number. If α ∈ R then:

Definition 1.8. αx = (αx1, αx2, …, αxn).

Note that this product is itself a vector in Rn. The set Rn is said, therefore, to be closed with respect to the operation of scalar multiplication.

Figure 1.7: Scalar Product (x = (3,1) with 2x and −2x)

Scalar multiplication is illustrated in Figure 1.7. Note that for any choice of α, αx lies along the extended line passing through the origin and the point x. The sign of α determines whether αx will be on the same (α > 0) or opposite (α < 0) side of the origin as x. The magnitude of α determines whether αx will be closer (|α| < 1) or further away (|α| > 1) from the origin than x.

Problem 1.9. In Mathematica if x and y are vectors and a is a real number, then x+y gives the sum of the two vectors and a x gives the scalar product of a and x. Use Mathematica to evaluate the following:

    3{1,3,5} + 2{2,4,6}

The norm of αx is

‖αx‖ = (α^2 x1^2 + α^2 x2^2 + … + α^2 xn^2)^(1/2)
     = [α^2 (x1^2 + x2^2 + … + xn^2)]^(1/2)
     = (α^2)^(1/2) (x1^2 + x2^2 + … + xn^2)^(1/2)
     = |α| ‖x‖

Multiplying x by α thus produces a new vector that is |α| times as long as the original vector. It is not difficult to see that αx points in the same direction as x if α is positive and in the opposite direction if α is negative.

Problem 1.10. In Mathematica the norm of the vector x is given by Norm[x]. What is

    Norm[Table[j, {j,1,100}]]

    1.2 Combinations of Vectors

    The operations of vector addition and scalar multiplication can be combined.


1.2.1 Linear Combinations

    Definition 1.9. If x1, x2, . . . , xk are k vectors in Rn and if α1, α2, . . . , αk are real numbers then

z = α1x1 + α2x2 + … + αkxk

    is a linear combination of the vectors.

    A related concept is that of linear independence.

Definition 1.10. If

α1x1 + α2x2 + … + αkxk = (0, 0, …, 0)

has no solution (α1, α2, …, αk) other than the trivial solution, α = 0, then the vectors are said to be linearly independent. Alternatively, if there were a non-trivial solution, α ≠ 0, then the vectors are said to be linearly dependent.

In the latter case we must have αj ≠ 0 for some j and thus can write:

αj xj = −∑(i≠j) αi xi

or, since αj ≠ 0,

xj = ∑(i≠j) (−αi/αj) xi

Thus xj is a linear combination of the remaining x’s. It follows that vectors are either linearly independent or one of them can be expressed as a linear combination of the rest.

This is illustrated in Figure 1.8. In the right-hand panel, x and y are linearly dependent and a non-trivial solution is α = (1, 2). In the left-hand panel, on the other hand, x and y are linearly independent. Scalar multiples of x lie along the dashed line passing through x and the origin and, similarly, scalar multiples of y lie along the dashed line passing through y and the origin. The only way to have the sum of two points selected from these lines add up to the origin is to choose the origin from each line — the trivial solution α = (0, 0).

Figure 1.8: Linear Independence (left: x = (6,−3), y = (6,4)) and Dependence (right: x = (4,−8), y = (−2,4))
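In Mathematica, MatrixRank applied to a matrix whose rows are the given vectors detects (in)dependence, and NullSpace recovers the non-trivial weights; a sketch for the two panels of Figure 1.8:

MatrixRank[{{6, -3}, {6, 4}}]              (* 2: independent (left panel) *)
MatrixRank[{{4, -8}, {-2, 4}}]             (* 1: dependent (right panel) *)
NullSpace[Transpose[{{4, -8}, {-2, 4}}]]   (* {{1, 2}}: weights with 1x + 2y = 0, as in the text *)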

Definition 1.11. If L is a non-empty set which is closed with respect to vector addition and scalar multiplication, i.e. (i) x, y ∈ L ⇒ x + y ∈ L and (ii) α ∈ R, x ∈ L ⇒ αx ∈ L, then L is called a linear space.

Definition 1.12. If L is a linear space and L ⊆ M then L is a linear subspace of M.

Problem 1.11. Which of the following sets are linear subspaces of R3?

    1. A point other than the origin? What about the origin?


2. A line segment? A line through the origin? A line not passing through the origin?

    3. A plane passing through the origin? A plane not passing through the origin?

    4. The non-negative orthant, i.e., {x ∈ R3 | x ≥ 0}?

    ♦Query 1.2. Must the intersection of two linear subspaces itself be a linear subspace?

Definition 1.13. The dimension of a linear (sub)space is an integer equaling the largest number of linearly independent vectors which can be selected from the (sub)space.

Problem 1.12. [Answer] What are the dimensions of the following subsets of R3?

    1. The origin?

    2. A line through the origin?

    3. A plane which passes through the origin?

Definition 1.14. Given a set {x1, x2, …, xk} of k vectors in Rn, the set of all possible linear combinations of these vectors is referred to as the linear subspace spanned by these vectors.

Linear spaces spanned by independent and dependent vectors are illustrated for the case in which n = 2 in Figure 1.8. Since the two vectors, (6,−3) and (6,4), in the left-hand panel are linearly independent, every point in R2 can be obtained as a linear combination of these two vectors. The point (0,7), for example, corresponds to −1x + 1y. In the right-hand panel, on the other hand, the two vectors (4,−8) and (−2,4) are linearly dependent and the linear subspace spanned by these vectors is a one-dimensional, strict subset of R2 corresponding to the line which passes through the two points and the origin.

Problem 1.13. [Answer] Suppose x, y ∈ Rn with x ≠ 0 and let X = {z ∈ Rn | z = αx, α ∈ R} be the (1-dimensional) linear space spanned by x. The projection of y upon X, denoted ŷ, is defined to be that element of X which is closest to y, i.e. that element ŷ ∈ X for which the norm of the residual of the projection, ‖y − ŷ‖, is smallest. Obtain expressions for both α̂ and ŷ as functions of x and y.

Problem 1.14. Suppose a, y ∈ Rn and let X = {x ∈ Rn | a · x = 0} be the linear subspace orthogonal to a. Obtain an expression for ŷ, the projection of y on X, as a function of a and y. [See Problem 1.13.]

Definition 1.15. A basis for a linear (sub)space is a set of linearly independent vectors which span the (sub)space.

    Definition 1.16. An orthonormal basis for a linear (sub)space is a basis with two additional properties:

1. The basis vectors are mutually orthogonal, i.e., if xi and xj, i ≠ j, are vectors in the basis, then xi · xj = 0.

    2. The length of each basis vector is one, i.e., if xi is a vector in the basis, then xi · xi = 1.

Problem 1.15. [Answer] Give an orthonormal basis, {x1, x2, …, xn}, for Rn.

    1.2.2 Affine Combinations

In forming linear combinations of vectors no restriction whatever is placed upon the α’s other than that they must be real numbers. In the left-hand panel of Figure 1.9, for example, every point in the two-dimensional space corresponds to a linear combination of the two vectors. An affine combination of vectors, on the other hand, is a linear combination which has the additional restriction that the α’s add up to one.


Definition 1.17. If x1, x2, …, xk are k vectors in Rn and if α1, α2, …, αk are real numbers with the property that

α1 + α2 + … + αk = 1

then

z = α1x1 + α2x2 + … + αkxk

is an affine combination of the x’s.

Problem 1.16. An affine combination of points is necessarily a linear combination as well but not vice versa. True or false?

An affine space bears the same relationship to affine combinations that a linear space does to linear combinations:

Definition 1.18. If L is closed with respect to affine combinations, i.e. affine combinations of points in L are necessarily also in L, then L is called an affine space. If, additionally, L ⊆ M then L is an affine subspace of M.

The affine subspace spanned by a set of vectors is similarly analogous to the linear subspace spanned by a set of vectors.

Definition 1.19. Given a set {x1, x2, …, xk} of k vectors in Rn, the affine subspace spanned by these vectors is the set of all possible affine combinations of these vectors:

{ z ∈ Rn | z = α1x1 + … + αkxk, α1 + … + αk = 1 }

When k = 2, z = α1x1 + α2x2 is an affine combination of x1 and x2, provided that α1 + α2 = 1. Suppose now that x1 ≠ x2, let λ = α1 and (1 − λ) = α2 and rewrite this as z = λx1 + (1 − λ)x2. Rewriting again we have z = λ(x1 − x2) + x2. Note that when λ = 0, z = x2. Alternatively, when λ = 1, z = x1. In general z is obtained by adding a scalar multiple of (x1 − x2) to x2. It is not difficult to see that such points lie on the extended line passing through x1 and x2 — the set of all possible affine combinations of two distinct vectors is simply the line determined by the two vectors. This is illustrated for n = 2 by the middle panel in Figure 1.9.

    Figure 1.9: Combinations: linear (left), affine (middle) and convex (right)
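The parametrization z = λx1 + (1 − λ)x2 can also be explored numerically; a small Mathematica sketch (the function name line is illustrative):

line[x1_, x2_, t_] := t x1 + (1 - t) x2   (* affine combination of two points *)
line[{6, -3}, {6, 4}, 1]     (* {6, -3} = x1 *)
line[{6, -3}, {6, 4}, 0]     (* {6, 4} = x2 *)
line[{6, -3}, {6, 4}, 2]     (* {6, -10}: still on the line, outside the segment *)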

Problem 1.17. A linear subspace is necessarily an affine subspace as well but not vice versa. True or false?

Problem 1.18. [Answer] Suppose a is a point in L where L is an affine subspace but not a linear subspace. Let M be the set obtained by “subtracting” a from L, i.e. M = {z | z = x − a, x ∈ L}. Is M necessarily a linear subspace?

Problem 1.19. Suppose x, y ∈ Rn with x and y linearly independent and consider the affine subspace A = {z ∈ Rn | z = λx + (1 − λ)y, λ ∈ R}. Find the projection, ô, of the origin on A.


1.2.3 Convex Combinations

If we add the still further requirement that the α’s not only add up to one but also that each is non-negative, then we obtain a convex combination.

Definition 1.20. If x1, x2, …, xk are k vectors in Rn and if α1, α2, …, αk are real numbers with the property that

α1 + α2 + … + αk = 1 and αi ≥ 0, i = 1, …, k

then

z = α1x1 + α2x2 + … + αkxk

is a convex combination of the x’s.

Again considering the case of k = 2, we know that since the α’s must sum to one, convex combinations of two vectors must lie on the line passing through these vectors. The additional requirement that the α’s must be non-negative means that convex combinations correspond to points on the line between x1 and x2, i.e. the set of all possible convex combinations of two distinct points is the line segment connecting the two points. This is illustrated for n = 2 in Figure 1.9.

Problem 1.20. A convex combination of points is necessarily an affine combination and thus a linear combination as well but not vice versa. True or false?

A convex set bears the same relationship to convex combinations that an affine subspace does to affine combinations:

Definition 1.21. If L ⊆ Rn and L is closed with respect to convex combinations, i.e. convex combinations of points in L are necessarily also in L, then L is called a convex set.

Problem 1.21. Show that the intersection of two (or more) convex sets in Rn must itself be a convex set.

Definition 1.22. Given a set L ⊆ Rn, the smallest convex set which contains L is called the convex hull of L. Here “smallest” means the intersection of all convex sets containing the given set.

The convex hull of a set of vectors corresponds to the set of all convex combinations of the vectors and is thus analogous to the affine space spanned by a set of vectors:

Problem 1.22. Suppose x, y and z are three linearly independent vectors in R3. Describe the sets which correspond to all (i) linear, (ii) affine and (iii) convex combinations of these three vectors.

    1.3 The Standard Linear Equation


Figure 1.10: a · x = 0 (with normal a = (4,3))

With the geometrical interpretation of the dot product in mind, consider the problem of solving the linear equation

a1x1 + a2x2 + … + anxn = 0

or

a · x = 0

where a = (a1, a2, …, an) is a known vector of coefficients — called the normal of the equation — and the problem is to find those x’s in Rn which solve the equation. We know that finding such an x is equivalent to finding an x which is orthogonal to a. The solution set,

X(a) ≡ {x ∈ Rn | a · x = 0}

then must consist of all x’s which are orthogonal to a. This is illustrated for n = 2 in Figure 1.10.

Problem 1.23. [Answer] Show that X(a) is a linear subspace.

Problem 1.24. [Answer] What is the dimension of X(a)?

Problem 1.25. Suppose a, b, y ∈ Rn are linearly independent and let L = {x ∈ Rn | a · x = 0 and b · x = 0}. Find an expression for ŷ, the projection of y on L, as a function of a, b and y.

Now consider the “non-zero” version of the linear equation

a1x1 + … + anxn = b

or

a · x = b

where b is not necessarily equal to 0 and let

X(a,b) = {x ∈ Rn | a · x = b}

denote the solution set for this equation.

Problem 1.26. [Answer] Show that X(a,b) is an affine subspace.

Problem 1.27. When is X(a,b) a linear subspace?

♦Query 1.3. Which two subsets of a linear space, X, are always linear subspaces?

To provide a geometric characterization of X(a,b), find a point x∗ that (i) lies in the linear subspace spanned by a and (ii) solves the equation a · x = b. To satisfy (i) it must be the case that x∗ = λa for some real number λ. To satisfy (ii) it must be the case that a · x∗ = b. Combining, we have a · (λa) = b or λ = b/(a · a) and thus x∗ = [b/(a · a)]a.

Now suppose that x′ is any solution to a · x = 0. It follows that x∗ + x′ must solve a · x = b since a · (x∗ + x′) = a · x∗ + a · x′ = b + 0 = b. We may therefore obtain solutions to a · x = b simply by adding x∗ to each solution of a · x = 0. X(a,b) is obtained, in short, by moving X(a) parallel to itself until it passes through x∗. The significance of x∗ is that it is the point in X(a,b) which is closest to the origin. Its norm, moreover, is ‖x∗‖ = |b|/‖a‖. Note that x∗ can be interpreted as the intercept of the solution set with the a “axis”. When b is positive, X(a,b) lies on the same side of the origin as a and a forms a positive dot product (acute angle) with each point in X(a,b). When b is negative, X(a,b) lies on the opposite side of the origin from a and a forms a negative dot product (obtuse angle) with each point in X(a,b).


Figure 1.11: a · x = b (with a = (4,3), b = −25/2 and x∗ = (−2,−3/2))

This x∗ is illustrated in Figure 1.11 for the case in which a = (4,3), ‖a‖ = 5 and b = −25/2. Then x∗ = [b/(a · a)]a = (−25/2)(1/25)(4,3) = (−2,−3/2) and

‖x∗‖ = √((−2,−3/2) · (−2,−3/2)) = 5/2 = |b|/‖a‖
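The same computation can be replicated in Mathematica (the variable names are illustrative):

a = {4, 3}; b = -25/2;
xstar = (b/(a . a)) a            (* {-2, -3/2} *)
a . xstar == b                   (* True: x∗ solves the equation *)
Norm[xstar] == Abs[b]/Norm[a]    (* True: both equal 5/2 *)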

The solution set for the linear equation a · x = b can thus be given the following interpretation: X(a,b) is an affine subspace orthogonal to the normal a and lying a directed distance equal to b/‖a‖ from the origin at the closest point. The term directed distance simply means that X(a,b) lies on the same side of the origin as a if b is positive and on the opposite side if b is negative.

This is the standard form for a linear equation. It replaces the familiar slope-intercept form used for n = 2. In this more general form the slope is given by the “orthogonal to a” requirement and the intercept by the point x∗ a distance b/‖a‖ out the a “axis”.

Problem 1.28. Suppose b ∈ R, a, y ∈ Rn and let X(a,b) = {x ∈ Rn | a · x = b}. Obtain an expression for ŷ, the projection of y on X(a,b), as a function of a, b and y. [See Problem 1.13.]

    1.4 Separating and Supporting Hyperplanes

The solution set X(a,b) bears exactly the same relationship to Rn that a plane does to R3. For example, it is linear (either a linear or an affine subspace) and has a dimension equal to n − 1. For these reasons

    X(a,b) ≡ {x ∈ Rn | a · x = b }

    is called a hyperplane. This hyperplane divides Rn into two associated half spaces

H+(a,b) ≡ {x ∈ Rn | a · x ≥ b}
H−(a,b) ≡ {x ∈ Rn | a · x ≤ b}

the intersection of which is X(a,b) itself:

H+(a,b) ∩ H−(a,b) = X(a,b)

Definition 1.23. If Z ⊂ Rn is an arbitrary set, then X(a,b) is bounding for Z iff Z is entirely contained in one of X(a,b)’s half-spaces, i.e., either Z ⊆ H+ or Z ⊆ H−.

Definition 1.24. If Z ⊂ Rn is an arbitrary set, then X(a,b) is supporting for Z iff X(a,b) is bounding for Z and X(a,b) “touches” Z, i.e.,

inf(z ∈ Z) |a · z − b| = 0

    These concepts together with the following theorem will prove very useful in subsequent analysis.


Theorem 2 (Minkowski’s Theorem). If Z and W are non-empty, convex and non-intersecting subsets of Rn, then there exist a ∈ Rn and b ∈ R such that X(a,b) is separating for Z and W, i.e., X(a,b) (i) is bounding for both Z and W, (ii) contains Z in one half-space and (iii) contains W in the other half-space.

Minkowski’s Theorem is illustrated for n = 2 in Figure 1.12. In the left-hand panel the antecedent conditions for the theorem are met and the separating hyperplane is illustrated. In the right-hand panel one of the sets is not convex and it is not possible to find a separating hyperplane.

Figure 1.12: Conditions for Minkowski’s Theorem: satisfied (left) and violated (right)

    1.5 Answers

Problem 1.1. Yes. From the “only if” in the definition, x > y ⇒ x ≥ y.

Problem 1.2. Yes. From the “only if” in the definition,

x ≫ y ⇒ xi > yi, i = 1, 2, …, n
      ⇒ xi ≥ yi, i = 1, 2, …, n
      ⇒ x ≥ y

Problem 1.5. No. If x > y then at least one component of x must be greater than the corresponding component of y. Since there is only one component, this means that every component of x is greater than the corresponding component of y. Thus x > y ⇒ x ≫ y. The converse also holds.

Problem 1.12. The origin has dimension 0. Surprised? Note that α(0,0,0) = (0,0,0) has an abundance of non-trivial solutions, e.g. α = 1. A line through the origin has dimension 1 and a plane through the origin has dimension 2.

Problem 1.13. Two facts characterize this projection. (i) Since ŷ ∈ X it must be the case that ŷ = α̂x for some real α̂. (ii) The residual of the projection, y − ŷ, must be orthogonal to every vector in X. Since x ∈ X, fact (ii) implies that (y − ŷ) · x = 0 or y · x = ŷ · x. Combining with (i) yields y · x = α̂(x · x) or, since ‖x‖ ≠ 0, α̂ = (y · x)/(x · x) and thus ŷ = [(y · x)/(x · x)]x.
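A quick numerical check of this projection formula in Mathematica (the helper proj and the sample points are illustrative):

proj[y_, x_] := (y . x)/(x . x) x   (* projection of y on the line spanned by x *)
x = {1, 2}; y = {3, 1};
yhat = proj[y, x]                   (* {1, 2} *)
(y - yhat) . x                      (* 0: the residual is orthogonal to x *)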

Problem 1.15.

x1 = (1, 0, 0, …, 0)
x2 = (0, 1, 0, …, 0)
⋮
xn = (0, 0, 0, …, 1)


Problem 1.18. Yes. The argument proceeds in three steps.

1. M is an affine space: If yi ∈ M, i = 1, …, k then it must be the case that yi = xi − a, i = 1, …, k for some xi ∈ L, i = 1, …, k. Since L is affine, it follows that ∑i αi = 1 implies z ≡ ∑i αi xi = ∑i αi (yi + a) ∈ L. But this means that ∑i αi yi = z − a ∈ M. Thus M is an affine space.

2. The origin belongs to M: Since a ∈ L it follows that a − a = 0 ∈ M.

3. An affine space which contains the origin is necessarily a linear space: Suppose that xi ∈ M, i = 1, …, k and βi ∈ R, i = 1, …, k. We need to show that the linear combination ∑i βi xi ∈ M. Note that βi xi = βi xi + (1 − βi)0 ∈ M since xi, 0 ∈ M and M is affine. But then ∑i βi xi + (1 − ∑i βi)0 = ∑i βi xi ∈ M.

Problem 1.23. (i) If x, x′ ∈ X(a) then a · x = a · x′ = 0, a · x + a · x′ = a · (x + x′) = 0 and thus x + x′ ∈ X(a). (ii) If x ∈ X(a) then a · x = 0, α(a · x) = a · (αx) = 0 and thus αx ∈ X(a).

Problem 1.24. Since (i) the dimension of Rn equals n, (ii) a itself spans (occupies) a linear subspace of dimension 1 and (iii) X(a) contains all those x’s which are orthogonal to a, it is not hard to see that there are n − 1 directions left in which to find vectors orthogonal to a. Thus the dimension of X(a) must be equal to n − 1.

Problem 1.26. Since xi ∈ X(a,b) implies a · xi = b and ∑i αi = 1 for any affine combination ∑i αi xi, it follows that

b = (α1 + … + αk)b = α1(a · x1) + … + αk(a · xk) = a · (α1x1 + … + αkxk)

and thus ∑i αi xi ∈ X(a,b).


Chapter 2

    Matrix Algebra

2.1 Linear Spaces
2.2 Real Valued Linear Transformations and Vectors
2.3 Linear Transformations and Matrices
2.4 Invertible Transformations
2.5 Change of Basis and Similar Matrices
2.6 Systems of Linear Equations
  2.6.1 Homogeneous Equations
  2.6.2 Non-Homogeneous Equations
2.7 Square Matrices
  2.7.1 The Inverse of a Matrix
  2.7.2 Application: Ordinary Least Squares Regression as a Projection
  2.7.3 The Determinant
  2.7.4 Cramer’s Rule
  2.7.5 Characteristic Roots
2.8 Farkas’ Lemma
  2.8.1 Application: Asset Pricing and Arbitrage
2.9 Answers

    2.1 Linear Spaces

Thus far we have thought of vectors as points in Rn represented by n-tuples of real numbers. This is a little like thinking of “127 Main Street” as a 15 character text string when, in reality, it’s a house. Similarly, an n-tuple of real numbers is best regarded as the address of the vector that lives there.


Figure 2.1: A linear space (left) and the corresponding “address space” (right), with x = (1,0), y = (0,1), x + y = (1,1) and 3/2 x = (3/2, 0)

All this can be made less abstract by constructing a “coordinate free” linear space using a pencil, ruler, protractor and a blank sheet of paper. Begin by placing a point on the paper and labeling it o to represent the origin. Then arbitrarily pick another couple of points, label them x and y and draw arrows connecting them to o. This is illustrated in the left-hand panel of Figure 2.1.

The lengths, ‖x‖ and ‖y‖, of x and y, respectively, can be measured with the ruler. The scalar product of x and, say, 3/2 can then be obtained by extending x using the ruler until the length is 3/2 times as long as x. Multiplying by a negative real number, say −2, would require extending x in the opposite direction until its length is 2 times the original length. The scalar multiple of an arbitrary point z by the real number a is then obtained by expanding (or contracting) z until its length equals |a|‖z‖ and then reversing the direction if a is negative.

To add, say, x and y, use the protractor to construct a parallel to y through x and a parallel to x through y. The intersection of these parallels gives x + y. Adding any other two points would similarly be accomplished by “completing the parallelogram” formed by the two points.

Note that x and y are linearly independent since ax + by = o has only the trivial solution a = b = 0. Any other point, z, can be expressed as a linear combination, z = ax + by, for appropriate choices of the real numbers a and b. This means that the two vectors, x and y, form a basis for our linear space which, consequently, is 2-dimensional. All this is possible without axes and coordinates.

Now let’s add coordinates by choosing x and y, respectively, as the two basis vectors for our linear space. The corresponding 2-dimensional “address” space is illustrated in the right-hand panel of Figure 2.1 where, for example, (1,0) is the address of x since x lives 1 unit out the first basis vector (x) and 0 units out the second basis vector (y). In general, (a, b) in the right-hand panel is the address of the vector ax + by in the left-hand panel.

Definition 2.1. A linear space is an abstract set, L, with a special element called the “origin” and denoted o, together with an operation on pairs of elements in L called “addition” and denoted +, and another operation on elements in L and real numbers called “scalar multiplication”, with the property that for any x, y ∈ L and any a ∈ R: (i) x + o = x, (ii) 0x = o, (iii) x + y ∈ L and (iv) ax ∈ L.

    2.2 Real Valued Linear Transformations and Vectors

Suppose that L is an n-dimensional linear space and that b = {b1, b2, …, bn} ⊂ L forms a basis for L. A real-valued linear transformation on L is a map, T, from L into R with the property that T(ax + by) = aT(x) + bT(y) for all real numbers a and b and all x, y ∈ L.


Since b is a basis for L, an arbitrary x̂ ∈ L must be expressible as a linear combination of the elements of b. Thus x̂ = ∑i xi bi where x = (x1, x2, …, xn) ∈ Rn and, since T is linear, T(x̂) = T(∑i xi bi) = ∑i T(bi) xi = a · x, where ai ≡ T(bi) and thus a ∈ Rn. This means that a · x gives the image T(x̂) when x is the address of x̂. It also means that for every real-valued linear transformation on the n-dimensional linear space, L, there is a corresponding vector, a ∈ Rn, that represents the associated formula for getting the image of a point from its address.

♦Query 2.1. Let L∗ denote the set of all real-valued linear transformations of the linear space L and define addition and scalar multiplication for elements of L∗ as follows for all f, g ∈ L∗ and α ∈ R:

(f + g)(x) ≡ f(x) + g(x), ∀x ∈ L
(αf)(x) ≡ αf(x), ∀x ∈ L

L∗ thus defined is called the dual space of L. Is it a linear space and, if so, what is its dimensionality?

    2.3 Linear Transformations and Matrices

In general, a linear transformation is a mapping that is, well, linear. This means (i) that if x maps into T(x) and a is a real number, then ax must map into aT(x) and (ii) that if x and y map into T(x) and T(y), respectively, then x + y must map into T(x) + T(y).

Let’s suppose that the domain and range of the linear transformation are both equal to the 2-dimensional linear space illustrated in Figure 2.1 and construct a linear transformation. Consider the left-hand panel of Figure 2.2. First select the same basis vectors as before, x and y. Now choose arbitrary points to be the images of these two points and label them T(x) and T(y). You’re done. That’s right, you have just constructed a linear transformation. To see why, simply note that any point in the domain, z, can be expressed as a linear combination of the basis vectors, z = ax + by. But then the linearity of T implies that T(z) = T(ax + by) = aT(x) + bT(y). Thus the image of any point in the domain is completely determined by the starting selection of T(x) and T(y).

Figure 2.2: A linear transformation (x = (1,0) maps to T(x) = (1, 2/3); y = (0,1) maps to T(y) = (1/2, 1))

Note that the T(x) and T(y) in the illustration are linearly independent. This need not be the case; they could be linearly dependent and span either a one-dimensional linear subspace of L, a line, or a zero-dimensional linear subspace, the origin. See Problem 2.1.

As before, the right-hand panel of Figure 2.2 gives the “address view” of the same linear transformation. This means that x = (1,0) maps into T(x) = (1, 2/3), y = (0,1) maps into T(y) = (1/2, 1) and, in general, z = (z1, z2) maps into

T(z1x + z2y) = z1T(x) + z2T(y) = (1, 2/3)z1 + (1/2, 1)z2

Thus the matrix-vector product

[ 1    1/2 ] [ z1 ]
[ 2/3  1   ] [ z2 ]

gives a formula for computing the address of the image of z from the address of z.

Problem 2.1. Suppose, in the construction of the linear transformation in Figure 2.2, that x and y are linearly independent but that T(x) and T(y) were chosen in a way that made them linearly dependent and, in fact, span only a one-dimensional linear subspace. Discuss the implications for the image of the domain under the transformation and for the matrix that maps addresses for the transformation.

Definition 2.2. A linear transformation is a mapping, T, which associates with each x in some n-dimensional linear space, D, a point T(x) in some m-dimensional linear space, R, with the property that if x1, …, xk ∈ D and α1, …, αk ∈ R then

T(∑i αi xi) = ∑i αi T(xi)

It is customary to refer to D as the domain and R as the range of the transformation.

While the definition imposes no restriction upon the values of m and n, it is convenient to assume for the moment that m = n and R = D. Suppose that b1, …, bn ∈ D is a basis for both D and R, and let T(bj), j = 1, …, n be the images of these basis vectors under the transformation. Since T(bj) belongs to R = D it can be expressed as a linear combination of the basis vectors:

T(bj) = ∑i aij bi

The matrix

[ a11 a12 ⋯ a1n ]
[ a21 a22 ⋯ a2n ]
[  ⋮   ⋮   ⋱  ⋮ ]
[ an1 an2 ⋯ ann ]

obtained in this way has for its jth column the “address” of T(bj) in terms of the basis, i.e., T(bj) “lives” a1j out the basis vector b1, a2j out b2 and so forth.

Similarly, an arbitrary vector x ∈ D can be expressed as a linear combination of the basis vectors

x = ∑j xj bj

and the resulting column vector

[ x1 ]
[ x2 ]
[  ⋮ ]
[ xn ]

can be interpreted as the address of x in terms of the basis.

Now since T is linear,

T(x) = T(∑j xj bj) = ∑j xj T(bj) = ∑j xj (∑i aij bi) = ∑i (∑j aij xj) bi

so that the matrix-vector product

[ a11 a12 ⋯ a1n ] [ x1 ]
[ a21 a22 ⋯ a2n ] [ x2 ]
[  ⋮   ⋮   ⋱  ⋮ ] [  ⋮ ]
[ an1 an2 ⋯ ann ] [ xn ]

can be interpreted as the address of T(x) in terms of the basis.

A similar result can be established when m ≠ n, so that to every linear transformation which maps an n-dimensional linear space into an m-dimensional linear space there corresponds an m by n matrix for mapping addresses in terms of given bases for the domain and the range, and vice versa. This being the case, the study of linear transformations centers upon the matrix-vector product Ax or, equivalently, upon the linear transformations, T, for which D = Rn and R = Rm.

♦Query 2.2. Suppose m ≠ n and thus R ≠ D. Let d1, d2, …, dn ∈ D be a basis for D and r1, r2, …, rm ∈ R be a basis for R. Derive the formula for mapping the address of x ∈ D into the address of T(x) ∈ R.

Now choose a subset of the domain, Rn, and recall that:

Definition 2.3. The image of X ⊆ Rn under T is

T(X) ≡ {y ∈ Rm | y = T(x), x ∈ X} = {y ∈ Rm | y = Ax, x ∈ X}

Note that T(Rn), the set of all linear combinations of the columns of A, is a linear subspace with a dimension equal to the number of linearly independent columns, or rank, of A. It is also true that Rank(A) ≤ min{m, n} since there can’t be more linearly independent columns than there are columns and since the columns themselves live in Rm. When Rank(A) < n the transformation “collapses” the domain into a linear subspace. No such collapse takes place when Rank(A) = m = n and T(Rn) = Rn.

    2.4 Invertible Transformations

Definition 2.4. Given a mapping T : Rn → Rm, the inverse image of Y ⊆ Rm is

T⁻¹(Y) ≡ {x ∈ Rn | T(x) ∈ Y}


Definition 2.5. A mapping is invertible iff the inverse image of any point in the range is a single point in the domain.

Note that when T is invertible

T⁻¹(T(x)) = x = T(T⁻¹(x))        (2.1)

Problem 2.2. [Answer] Show that the transformation associated with the matrix A is invertible iff Rank(A) = m = n.

Problem 2.3. Suppose that the n×n matrix A is invertible. Does it follow that Ax = 0 ⇒ x = 0?

Problem 2.4. Suppose that A is an n×n matrix and that Ax = 0 ⇒ x = 0. Does it follow that A is invertible?

    2.5 Change of Basis and Similar Matrices

What difference does the choice of a basis make to the matrix that represents the linear transformation with respect to the basis? Suppose that A is the original matrix, {b1, b2, …, bn} is the original basis and {b̂1, b̂2, …, b̂n} is the new basis. Since each original basis vector must be uniquely associated with a new basis vector and since, as bases, both must be linearly independent, this change of basis defines a linear transformation which maps the new basis vectors to the old ones, and this transformation must be invertible. Let P be the matrix version of this transformation so that if x̂ is the address of a vector in terms of the new basis, then Px̂ is the address of the same vector in terms of the old basis. Since the transformation itself has not changed, it must be the case that x = Px̂ maps into Ax or, in terms of the new basis, that x̂ maps into P⁻¹Ax = P⁻¹APx̂. Thus B = P⁻¹AP is the matrix that represents the transformation with respect to the new basis.

Definition 2.6. Two matrices, A and B, are called similar if there exists an invertible matrix P such that B = P⁻¹AP.

Theorem 3. Two matrices, A and B, represent the same linear transformation with respect to different bases iff A and B are similar.

    The power of this result is twofold:

1. From a collection of similar matrices, the simplest or most analytically convenient can be selected, since they all represent the same linear transformation.

2. The set of all linear transformations can be partitioned into sets of similar transformations and a simplest representative selected from each to form a set of canonical or representative forms. For example, it can be shown that all 2 × 2 matrices are similar to one of the following three matrices:

[ a  b ]    [ c  0 ]    [ r  0 ]
[ −b a ]    [ 0  d ]    [ 1  r ]        (2.2)

Understanding linear transformations of 2-dimensional linear spaces then reduces to understanding these three canonical forms.

Problem 2.5. Suppose

A = [ a  b ]
    [ −b a ]

Show that ‖Ax‖/‖x‖ = √(a^2 + b^2) and cos(θ) = a/√(a^2 + b^2), where θ is the angle between x and Ax, and thus that this transformation corresponds to a rotation and either a lengthening or a shortening. Hint: For the first part, try Mathematica with

A = {{a, b}, {-b, a}};
x = {x1, x2};
Assuming[{Element[x1, Reals], Element[x2, Reals], Element[a, Reals], Element[b, Reals]}, Simplify[Norm[A.x]/Norm[x]]]

♦Query 2.3. Suppose

A = [ c  0 ]
    [ 0  d ]

Interpret the transformation T, i.e., what are the images, T(x) and T(y), of the two basis vectors, x and y?

    2.6 Systems of Linear Equations

Solving simultaneous systems of linear equations involves nothing more than identifying the properties of the inverse image of a linear transformation. To solve the homogeneous system

a11 x1 + ⋯ + a1n xn = 0
⋮
am1 x1 + ⋯ + amn xn = 0

or Ax = 0 is to find the inverse image of 0 under this linear transformation. Similarly, to solve the non-homogeneous system

a11 x1 + ⋯ + a1n xn = b1
⋮
am1 x1 + ⋯ + amn xn = bm

or Ax = b is to find the inverse image of b under this transformation.

Two distinct views of the matrix-vector product prove useful. In the column view, the vector Ax is viewed as a linear combination of the columns of A using the components of x as the weights:

[ a11 ]      [ a12 ]            [ a1n ]
[ a21 ] x1 + [ a22 ] x2 + ⋯ + [ a2n ] xn        (COL)
[  ⋮  ]      [  ⋮  ]            [  ⋮  ]
[ am1 ]      [ am2 ]            [ amn ]

In the row view, the components of the vector Ax are viewed as the dot products of the rows of A with the vector x:

[ (a11, a12, …, a1n) · x ]
[ (a21, a22, …, a2n) · x ]        (ROW)
[            ⋮           ]
[ (am1, am2, …, amn) · x ]


2.6.1 Homogeneous Equations

Consider the homogeneous system Ax = 0 using the column view. A non-trivial solution (x ≠ 0) is possible iff the columns of A are linearly dependent, since a non-trivial linear combination of the columns using the components of x as weights can only be equal to zero if the columns are linearly dependent.

The row view confirms this since x must be orthogonal to each row of A and thus to the linear subspace spanned by the rows of A. This is possible iff Rank(A) = r < n, in which case the rows span an r-dimensional linear subspace and there are n − r directions left to look for things orthogonal. The solution set in this case, not surprisingly, is itself a linear subspace of dimension n − r and is called the null space of A.

Problem 2.6. Suppose

A = [ 0 1 0 0 0 ]
    [ 0 0 0 1 1 ]
    [ 0 1 0 1 1 ]
    [ 1 1 0 0 1 ]

The Mathematica command

MatrixRank[A]

gives the rank of the matrix A and the command

NullSpace[A]

gives an orthogonal basis for the null space of A. (i) What is the rank of A? (ii) Give an orthogonal basis for the null space of A. (iii) What is the dimension of the null space of A?

The column view is illustrated in the left-hand panel of Figure 2.3 for the case in which

A = [ 6 −3 ]
    [ 4 −2 ]

Figure 2.3: Non-trivial Solutions for Ax = 0 (column view left, row view right)

Since Rank(A) = 1 there are non-trivial choices for the weights x1 and x2 for which A·1 x1 + A·2 x2 = 0, e.g., (x1, x2) = (1, 2). The right-hand panel presents the corresponding row view in which the solution set is a 2 − 1 = 1 dimensional linear subspace orthogonal to the linear subspace spanned by the rows of A. Note that (x1, x2) = (1, 2) belongs to the solution set.
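Mathematica’s NullSpace confirms the solution set shown in Figure 2.3:

NullSpace[{{6, -3}, {4, -2}}]   (* {{1, 2}}: the null space is the line through (1, 2) *)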


2.6.2 Non-Homogeneous Equations

The non-homogeneous system Ax = b is similar. The column view suggests that a solution is possible iff b lies in the linear subspace spanned by the columns of A. Put somewhat differently, a solution is possible iff Rank(A|b) = Rank(A). Given any one such solution, x∗, it is possible to obtain all solutions as follows. Since Ax∗ = b it follows that if x′ is any other solution it must be the case that Ax∗ = Ax′ = b or A(x′ − x∗) = 0. Now we already know that solutions to Ax = 0 form a linear subspace of dimension n − Rank(A). The solutions to Ax = b must then correspond to the set obtained by adding x∗ to each of the solutions to Ax = 0 — an affine subspace of dimension n − Rank(A).

This is illustrated in Figure 2.4 for the case in which

A = [ 4  3 ]        b = [ 5   ]
    [ −3 4 ]            [ −10 ]

Figure 2.4: A Unique Solution for Ax = b (column view left, row view right)

Since Rank(A) = 2 = n, the solutions to Ax = b must be an affine subspace of dimension zero — a single point — which corresponds to the trivial solution for Ax = 0. In the column view illustrated in the left-hand panel this unique solution for x is obtained by “completing the parallelogram” whose sides correspond to the columns of A and whose diagonal corresponds to b. It follows that the unique solution is x = (2, −1). Notice that if the columns of A were chosen as the basis, then the address of b would be (2, −1). In the row view illustrated in the right-hand panel, the unique solution for x corresponds to the intersection of

S1 = {x | (4, 3) · (x1, x2) = 5}

a hyperplane orthogonal to the first row of A and lying a directed distance equal to 5/‖(4,3)‖ = 1 from the origin, and

S2 = {x | (−3, 4) · (x1, x2) = −10}

a hyperplane orthogonal to the second row of A and lying a directed distance equal to −10/‖(−3,4)‖ = −2 from the origin.


2.7 Square Matrices

    2.7.1 The Inverse of a Matrix

When T : Rn → Rn is invertible, the inverse image of any point, T⁻¹(x), is itself a point. Thus T⁻¹ is also a linear transformation. As such it has an associated matrix which is denoted, naturally enough, A⁻¹. A consequence of Equation 2.1 is that

A⁻¹Ax = x = AA⁻¹x

or that

A⁻¹A = I = AA⁻¹        (2.3)

where I is the identity matrix:

[ 1 0 ⋯ 0 ]
[ 0 1 ⋯ 0 ]
[ ⋮ ⋮ ⋱ ⋮ ]
[ 0 0 ⋯ 1 ]

Equation 2.3 thus requires that

Ai· · A⁻¹·j = 1 if i = j, and 0 if i ≠ j        (2.4)

where Ai· is the ith row of A and A⁻¹·j is the jth column of A⁻¹. The jth column of A⁻¹ must therefore

    1. be orthogonal to every row of A save for the jth.

    2. form an acute angle with the jth row.

    3. be just long enough to make the dot product with the jth row equal to one.

These requirements can be used to construct the inverse geometrically — see Figure 2.5 for the case in which n = 2 and

A = [ 4 3 ]        A⁻¹ = [ 1/3   −1/6 ]
    [ 2 6 ]               [ −1/9   2/9 ]


  • (4, 2)

    (3, 6)

    R1

    R2

    Figure 2.5: Constructing the Inverse

In Figure 2.5, R1 is the set of vectors which are orthogonal to the second column of A and form an acute angle with the first column — the first row of A⁻¹ must belong to this set. Similarly, R2 is the set of vectors which are orthogonal to the first column of A and form an acute angle with the second column — the second row of A⁻¹ must belong to this set.
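The geometric construction can be checked against Mathematica’s built-in Inverse:

A = {{4, 3}, {2, 6}};
Inverse[A]                            (* {{1/3, -1/6}, {-1/9, 2/9}} *)
A . Inverse[A] == IdentityMatrix[2]   (* True: Equation 2.3 holds *)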

Problem 2.7. What problem would be encountered in constructing the inverse if the columns of A were linearly dependent?

Problem 2.8. The formula for the inverse of a 2 by 2 matrix is:

[ a  b ]⁻¹                   [  d  −b ]
[ c  d ]    = 1/(ad − bc) × [ −c   a ]

Derive the first row of this inverse using Equation 2.4.

Problem 2.9. “Derive” the formula given in Problem 2.8 using the Mathematica commands A = {{a,b},{c,d}} and then Inverse[A]//MatrixForm. What difference would it make to replace //MatrixForm with //InputForm?

Problem 2.10. [Answer] Gram’s Theorem states that if A is an m by n matrix with m < n and x ∈ Rm then

$$AA^T x = 0 \iff A^T x = 0$$

Prove Gram’s Theorem.

This theorem implies, for example, that if Rank(A) = m then Rank(AAT) = m since ATx = 0 has no solution, x ≠ 0, and AATx = 0 must therefore have no solution either.
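The rank implication is easy to confirm on a small example; a sketch with an invented matrix (chosen here only for illustration):

A = {{1, 0, 2}, {0, 1, 1}};  (* a 2 by 3 matrix with Rank(A) = 2 *)
MatrixRank[A . Transpose[A]] == MatrixRank[A]  (* True: AA^T is 2 by 2 and invertible *)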

    2.7.2 Application: Ordinary Least Squares Regression as a Projection

Consider the problem of ordinary least squares regression. In this problem data is available which describes n observations on each of p exogenous and 1 endogenous variables. This is arranged as follows

• X: an n by p “exogenous data” matrix each row of which corresponds to an observation and each column of which corresponds to an exogenous variable. There are more observations than variables so Rank(X) = p < n.

• y: an n by 1 “endogenous data” vector each “row” of which corresponds to an observation on the endogenous variable.

The problem is to find the projection, ŷ, of y on S = {z | z = Xβ, β ∈ Rp}. The term “least squares” derives from the fact that ŷ is the closest point to y in S and thus minimizes the sum of the squares of the components of the difference — see Figure 2.6 on the following page.

    There are two key facts:

• Since ŷ ∈ S, it must be the case that ŷ = Xβ̂ for some β̂. The problem of finding ŷ thus reduces to one of finding β̂.


[Figure: the vector y, its projection ŷ on S (the linear subspace spanned by the columns of X) and the residual y − ŷ from the projection.]

    Figure 2.6: Ordinary Least Squares Regression

• Since ŷ is the projection of y on S, the residual of the projection, y − ŷ, must be orthogonal to S, the space spanned by the columns of X.

    These facts are sufficient to identify β̂:

• Xj · (y − ŷ) = 0, j = 1, . . . , p. To be orthogonal to the space spanned by the columns of X, y − ŷ must be orthogonal to each column of X.

    • XT (y − ŷ) = 0. Matrix version of previous line.

    • XTy = XT ŷ = XTXβ̂. Carry out the multiplication and substitute for ŷ .

    • β̂ = (XTX)−1XTy . Multiply both sides by (XTX)−1 which exists by virtue of Gram’s Theorem.

Problem 2.11. Suppose x1 = (1,0,2), x2 = (2,0,1), y = (3,3,3) ∈ R3 and let L = {z ∈ R3 | z = α1x1 + α2x2, α1, α2 ∈ R} be the linear subspace spanned by x1 and x2. Find ŷ, the projection of y on L. Hint:

X = {{1, 2}, {0, 0}, {2, 1}}
y = {3, 3, 3}
X . Inverse[Transpose[X] . X] . Transpose[X] . y
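Evaluating the hint should return {3, 0, 3}, and the residual is orthogonal to both columns of X — a quick check of the two key facts:

X = {{1, 2}, {0, 0}, {2, 1}};
y = {3, 3, 3};
yhat = X . Inverse[Transpose[X] . X] . Transpose[X] . y  (* {3, 0, 3} *)
Transpose[X] . (y - yhat)  (* {0, 0}: the residual is orthogonal to each column of X *)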

    2.7.3 The Determinant

The following, due to Hadley [1961], page 87, is a typical definition of the determinant of a square matrix — correct but not particularly intuitive.

    Definition 2.7. The determinant of an n by n matrix A, written |A|, is

$$|A| \equiv \sum (\pm)\, a_{1i} a_{2j} \cdots a_{nr}$$

the sum being taken over all permutations of the second subscripts with a term assigned a plus sign if (i, j, . . . , r) is an even permutation of (1, 2, . . . , n), and a minus sign if it is an odd permutation.

When n = 2 this becomes

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$


Problem 2.12. Use Mathematica to derive the formulas for the determinant and inverse of a general 3 × 3 matrix by first entering

    A = {{a,b,c}, {d,e,f}, {g,h,i}}

    and then Det[A] and Inverse[A]//MatrixForm.

It is often more useful to recognize that the determinant is another “signed magnitude” somewhat analogous to the dot product which is best understood by examining its sign and its magnitude separately. In Figure 2.7 the (linearly independent) columns of

$$A = \begin{bmatrix} 4 & 3 \\ -3 & 4 \end{bmatrix} \qquad |A| = 4 \times 4 - 3 \times (-3) = 25$$

have been illustrated and a parallelogram (in this case a square) has been formed by completing the sides formed by these columns.

[Figure: the columns A1 = (4, −3) and A2 = (3, 4), each of length 5, spanning a square whose area is 25 = |A|.]

    Figure 2.7: The Determinant in R2: AnOriented Area

The first thing to notice is that movement from the first to the second axis is counter clockwise and that the movement from the first column to the second is also counter clockwise. Thus the columns of A have the same orientation as the axes. This means that the determinant has a positive sign. (Switch the columns and the determinant would be negative.) The magnitude, moreover, corresponds to the area of this parallelogram.

    Consider, alternatively, the columns of the singular matrix

$$B = \begin{bmatrix} 6 & -3 \\ 2 & -1 \end{bmatrix}$$

The parallelogram formed by these columns is, in this case, degenerate — a segment of a line rather than an area. The determinant is again equal to the area enclosed within this line interval which, in this case, is equal to zero.
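Both cases are immediate in Mathematica:

Det[{{4, 3}, {-3, 4}}]   (* 25: same orientation as the axes, area 25 *)
Det[{{6, -3}, {2, -1}}]  (* 0: the degenerate parallelogram encloses no area *)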

Problem 2.13. The “formula” for a 2 by 2 determinant is:

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

Show that |ad − bc| is the area of the parallelogram formed by the columns. Hint: let x = (a, c), y = (b, d) and note that the area of the parallelogram is equal to the length of the base, ‖x‖, times the altitude, ‖y − ŷ‖ where ŷ is the projection of y on the linear subspace spanned by x.


[Figure: the columns A1, A2 and A3 forming a parallelepiped.]

    Figure 2.8: The Determinant in R3:An Oriented Volume

Higher dimensional cases are analogous. The determinant of a 3 by 3 matrix, for example, has a sign which depends upon whether the columns have the same orientation as the axes and a magnitude equal to the volume of the parallelepiped formed by the columns. [A parallelepiped is a solid each face of which is a parallelogram.] See Figure 2.8. When the columns are linearly dependent the parallelepiped degenerates into a plane area (rank 2) or a line interval (rank 1), both of which have zero volume and the determinant, accordingly, is equal to zero.

The determinant of an n by n matrix, analogously, has a sign which depends upon the orientation of the columns and a magnitude equal to the volume of the “hyper” parallelepiped formed by the columns.

Problem 2.14. Suppose A is an n by n matrix. Provide geometrical interpretations for the following propositions:

1. Suppose that Â·i = A·i for i ≠ k and

$$\hat{A}_{\cdot k} = \lambda A_{\cdot k}$$

i.e., Â is obtained from A by multiplying the kth column of A by a number λ. Then |Â| = λ|A|.

2. Suppose that Â·i = A·i for i ≠ k and

$$\hat{A}_{\cdot k} = A_{\cdot k} + \sum_{i \neq k} \lambda_i A_{\cdot i}$$

i.e., that Â is obtained from A by adding a linear combination of the other columns to the kth column. Then |Â| = |A|.

Problem 2.15. Suppose A is a non-singular n by n matrix. Show that |Ax| = α|A| for some α ∈ R which depends only upon x. What is α?

    2.7.4 Cramer’s Rule

    An important application of the determinant is provided by:

Theorem 4 (Cramer’s Rule). If Ax = b with A an n by n matrix and |A| ≠ 0 then

$$x_i = \frac{|B_i|}{|A|}$$

where Bi is obtained by replacing the ith column of A with b.

The geometrical interpretation of this theorem is quite simple and is illustrated for the case in which n = 2 in Figure 2.9 on the next page. Note first that the columns of A, labeled A1 and A2, are linearly independent and the solution for both x1 and x2 can be obtained by completing the parallelogram:

$$x_1 = \frac{\|b_1\|}{\|A_1\|} \qquad x_2 = \frac{\|b_2\|}{\|A_2\|}$$


[Figure: the columns A1 and A2, the vector b and its components b1 and b2 along A1 and A2, together with the labeled points o, c, d, e and f used in the argument below.]

Figure 2.9: Using Cramer’s Rule to Solve Ax = b for x1

Let’s use Cramer’s Rule to find, say, x1. Since we wish to identify the first component of x we begin by replacing the first column of A with b to obtain B1. Cramer’s rule then asserts that

$$x_1 = \frac{|B_1|}{|A|}$$

Our task then is to show that

$$\frac{|B_1|}{|A|} = \frac{\|b_1\|}{\|A_1\|} \tag{2.5}$$

Note first that |A| is the oriented volume of the parallelogram formed by the first and second columns of A which, in this case, is positive and could be computed by multiplying the length of the “base”, oA2, by the “altitude”, the distance between the parallel lines ob2 and A1e. Since the parallelogram with vertices at o, d, e and A2 has the same base and the same altitude, its area is also equal to |A|. Call this parallelogram PA.

Turning attention to the numerator, |B1| is the area of the parallelogram with vertices at o, b, f and A2. Call this parallelogram PB. Since PB has the same base as PA, the ratio of the area of PB to the area of PA, |B1|/|A|, is the same as the ratio of the distance between ob2 and b1f and the distance between ob2 and A1e. But this is the same as the ratio ‖b1‖/‖A1‖ which establishes Equation 2.5.
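Cramer’s Rule is also easy to verify numerically; a minimal sketch using the system from Figure 2.4 (chosen here only for concreteness):

A = {{4, 3}, {-3, 4}};
b = {5, -10};
B1 = Transpose[ReplacePart[Transpose[A], 1 -> b]];  (* replace the first column of A with b *)
Det[B1]/Det[A]  (* 2, matching the first component of the solution x = (2, -1) found earlier *)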

    2.7.5 Characteristic Roots

A particularly important “characterization” of a square matrix is provided by its characteristic roots and characteristic vectors:

Definition 2.8. If A is an n by n matrix, λ is a scalar and x ≠ 0 is an n by 1 vector, then λ is a characteristic root of A and x is the associated characteristic vector iff λ and x solve the characteristic equation:

    Ax = λx (2.6)

Characteristic roots and vectors are also sometimes called (i) eigenvalues and eigenvectors or (ii) latent roots and latent vectors.

A fact worth noting about the characteristic roots of a matrix is that they characterize the underlying linear transformation and are invariant with respect to the choice of basis — recall the discussion of Section 2.3 on page 17. To see this note that if A is the matrix representation of the linear transformation T for a particular choice of basis, then to be a characteristic root of A, λ must satisfy

    T(x) = λx

But this means that matrices which represent the same linear transformation under alternative choices of basis, i.e., similar matrices, will have the same characteristic roots.

Since Equation 2.6 can be rewritten as the homogeneous equation

    [A− λI]x = 0


it follows that λ is a characteristic root of A iff

    |A− λI| = 0

The expansion of this determinant

$$\begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0$$

is a polynomial in λ with (−λ)n the highest order term [the product of the diagonal elements]. From the fundamental theorem of algebra we know that such a polynomial will have n, not necessarily distinct, solutions for λ.

Problem 2.16. The characteristic roots of A may be either real or complex but if they are complex they must occur in conjugate pairs so that if λ = a + bi is a root then λ∗ = a − bi must also be a root. Show that it follows that both the sum of the roots and the product of the roots are necessarily real.

    Two elementary facts about characteristic roots are worth noting.

Theorem 5. If A is an n by n matrix with characteristic roots λi, i = 1, . . . , n, then

$$\sum_{i=1}^{n} \lambda_i = \operatorname{trace}(A) \qquad \prod_{i=1}^{n} \lambda_i = |A|$$

where the trace of A is the sum of the diagonal elements:

$$\operatorname{trace}(A) \equiv \sum_{i=1}^{n} a_{ii}$$

Since similar matrices must have the same characteristic roots, it follows from Theorem 5 that similar matrices have the same trace and determinant as well.
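Both facts are easy to confirm in a small case; a quick Mathematica check using the matrix from Figure 2.4, whose roots turn out to be the conjugate pair 4 ± 3i:

A = {{4, 3}, {-3, 4}};
Eigenvalues[A]  (* the conjugate pair {4 + 3 I, 4 - 3 I} *)
Total[Eigenvalues[A]] == Tr[A]  (* True: the roots sum to the trace, 8 *)
Times @@ Eigenvalues[A] == Det[A]  (* True: the roots multiply to the determinant, 25 *)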

Problem 2.17. [Answer] Show that Theorem 5 is valid for the 2 by 2 matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

Problem 2.18. What are the characteristic roots of the three canonical matrices given in Equation 2.2 on page 20?

Problem 2.19. Suppose

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \\ 3 & 4 & 1 & 2 \\ 2 & 3 & 4 & 1 \end{bmatrix}$$

The Mathematica commands for finding the characteristic roots and vectors of a matrix are Eigenvalues[A] and Eigenvectors[A], respectively. What are the characteristic roots and vectors of A?


2.8 Farkas’ Lemma

A final result that will prove very important in subsequent analysis takes us into the realm of linear inequalities. It states that a system of linear equations will have a solution precisely when another system of linear inequalities does not have a solution. The importance of this result involves “indirection” — often it will be easier to establish the existence of a solution to the system of interest by showing that a solution to the complementary system cannot exist.

Theorem 6 (Farkas’ Lemma). Suppose A is an m by n matrix and b ≠ 0 is a 1 by n row vector. Exactly one of the following holds:

    1. yA = b, y > 0 has a solution y ∈ Rm

    2. Az ≥ 0, b · z < 0 has a solution z ∈ Rn

[Figure: two panels showing the rows A1 and A2 and the vector b. In the left-hand panel b lies outside the cone {yA | y > 0} and there is a region of vectors z with Az ≥ 0 and b · z < 0; in the right-hand panel b lies inside the cone and yA = b, y > 0 has a solution.]

    Figure 2.10: Farkas’ Lemma

    The basis of this theorem is quite simple and is illustrated in Figure 2.10. Either

1. a vector z exists which forms a non-obtuse angle with every row of A and an obtuse angle with b (the left-hand panel)

    2. or b lies in the “cone” generated by the rows of A (the right-hand panel)

The key to understanding Figure 2.10 is to fix the rows of A and rotate b clockwise in moving from the left-hand panel to the right-hand panel. Initially b lies outside the cone generated by the rows of A. It follows that there is a vector z for which Az ≥ 0 and for which b · z < 0, i.e., a vector z that makes a non-obtuse angle with every row of A and an obtuse angle with b.

As b rotates clockwise this solution disappears precisely at the point at which b enters the cone spanned by the rows of A but then there is a solution to yA = b with y > 0. This solution persists until b emerges from the cone spanned by the rows of A but at this point there is again a solution to Az ≥ 0 and b · z < 0.


2.8.1 Application: Asset Pricing and Arbitrage

Consider a two period model of asset pricing. There are n assets which can be traded in the first period at prices p. The first period budget constraint limits an investor endowed with portfolio ŝ to portfolios satisfying

    p · s ≤ p · ŝ (2.7)

Asset prices in the second period are uncertain and depend upon which of m possible “states of nature” occurs. It is common knowledge when first period trading takes place that the second-period price of the jth asset will be aij if the ith state occurs. Let A denote the corresponding m × n matrix of second period prices in which rows correspond to states and columns to assets. Holding portfolio s would then pay As in the second period, i.e., the ith component of this m-tuple would be the total value of the portfolio if the ith state occurred.

Note that the components of s are not required to be non-negative. Indeed, negative components correspond to “short” positions, e.g., s1 = −1 would be interpreted as taking a short position of one share on the first asset. This means the investor borrows a share of this asset from the market, sells it for p1 and then uses the receipts to purchase other shares. The catch, of course, is that such loans must be repaid in the second period. Our investor would thus be required to purchase one share of the first asset in the second period, whatever its price turns out to be, to repay the first-period loan. The second-period “solvency” constraint is that the investor must be able to repay such loans or that holding the portfolio not entail bankruptcy in any state

    As ≥ 0 (2.8)

It is important to realize that the components of As are the commodities that investors care about — the components of s only matter to the extent that they affect As. Since p is a vector of security prices and securities are not themselves the focus of interest, the question arises of whether or not it is possible to identify an m-tuple of “shadow prices”, ρ, of the commodities of interest. Here ρi would be interpreted as the price of a claim to one dollar contingent upon state i occurring in a fictional shadow market. For such shadow prices to be interesting, trade opportunities in the shadow market would have to be equivalent to those in the actual markets, i.e., ρ would have to satisfy

    p = ρA, ρ > 0 (2.9)

Can we be sure that a solution to Equation 2.9 exists? Well, if we make the association y = ρ, b = p and z = s, then Farkas’ Lemma states that either Equation 2.9 will have a solution or there will be a solution, s, to

    As ≥ 0, p · s < 0 (2.10)

An s that satisfied Equation 2.10 would be a good thing, too good in fact. It not only satisfies solvency, As ≥ 0, but also “pumps money” into the pocket of the investor in the first period since p · s < 0. In the context of the budget constraint, Equation 2.7, this means that

    p · (λs) = λp · s ≤ p · ŝ

is satisfied for an arbitrarily large λ and thus that our investor could acquire infinite first period wealth. This is commonly called an arbitrage opportunity. If we make the reasonable supposition that p and A preclude such arbitrage opportunities, then the existence of shadow prices satisfying Equation 2.9 is guaranteed.
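Finding the shadow prices amounts to solving the linear system p = ρA. A minimal sketch with an invented two-asset, two-state example (the numbers are illustrative only):

A = {{1, 2}, {3, 1}};  (* second-period prices: rows are states, columns are assets *)
p = {2, 3/2};          (* first-period asset prices *)
rho = LinearSolve[Transpose[A], p]  (* {1/2, 1/2} *)
AllTrue[rho, Positive]  (* True: strictly positive shadow prices exist, so no arbitrage *)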

♦Query 2.4. Suppose m = n = 2, that the two columns of A, A1 and A2, are linearly independent and that p · ŝ = 1, i.e., our investor is worth one dollar in the first period.


1. In a graph of the positive quadrant of R2, illustrate A1 and A2 and the points v1 ≡ A1/p1 and v2 ≡ A2/p2. Is either v1 > v2 or v2 > v1 consistent with no arbitrage opportunities?

2. Illustrate the budget constraint for contingent claims under the assumption that no arbitrage opportunities exist. Label the regions corresponding to long positions on both assets, to a long position on the first asset and a short position on the second and to a short position on the first asset and a long position on the second.

    3. What is the effect in your illustration of adding the solvency constraint to the budget constraint?

    4. Is it possible to determine the shadow prices, ρ1 and ρ2, from your illustration and, if so, how?

♦Query 2.5. Suppose that no arbitrage opportunities exist and let x = As and x̂ = Aŝ. What is the budget constraint corresponding to Equation 2.7 on the previous page in terms of x, x̂ and ρ? What is the solvency constraint corresponding to Equation 2.8 on the previous page?

♦Query 2.6. Suppose that a new asset is introduced, that Rank(A) = Rank(A|b) where b is the vector of state-dependent, second-period prices for the new asset, that no arbitrage opportunities exist either before or after the introduction of the new asset and that p = ρA is the vector of first-period prices of the original assets. What must be the first-period price of the new asset?

♦Query 2.7. Suppose that no arbitrage opportunities exist and that there is a riskless portfolio, s∗, for which As∗ = (1,1, . . . ,1)T. What is the one-period riskless rate of return? Hint: What is the first-period cost of buying claims to a sure, second-period dollar?

    2.9 Answers

Problem 2.2 on page 20. Suppose that Rank(A) = m = n and that A is not invertible. Then there must exist y, x, x′ ∈ Rn with x ≠ x′ such that y = Ax = Ax′. But this means that A(x − x′) = 0 with (x − x′) ≠ 0 and thus the columns of A are linearly dependent — a contradiction. Conversely, suppose A is invertible and the columns of A are linearly dependent. Then there exist weights α = (α1, . . . , αn) ≠ 0 such that Aα = 0. Now choose any x ∈ Rn and note that x − α ≠ x + α and yet A(x − α) = A(x + α) = Ax — thus A is not invertible.

Problem 2.10 on page 25. Suppose AATx = 0. Then

$$x^T A A^T x = 0 \implies (A^T x)^T (A^T x) = 0 \implies \|A^T x\| = 0 \implies A^T x = 0$$

and, conversely, if ATx = 0 then clearly AATx = 0.

Problem 2.17 on page 30. Expanding the determinant yields

$$(a - \lambda)(d - \lambda) - bc = 0$$

or

$$\lambda^2 - (a + d)\lambda + ad - bc = 0$$

Using the quadratic formula yields

$$\lambda_1 = \frac{a + d + \sqrt{(a+d)^2 - 4(ad - bc)}}{2} \qquad \lambda_2 = \frac{a + d - \sqrt{(a+d)^2 - 4(ad - bc)}}{2}$$

    It follows immediately that

$$\lambda_1 + \lambda_2 = a + d = \operatorname{trace}(A) \qquad \lambda_1 \lambda_2 = ad - bc = |A|$$


Chapter 3

    Topology

    3.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    3.1.1 Countable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.1.2 Uncountable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    3.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

    3.2.1 Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.2.2 Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    3.2.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    3.2.4 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    3.3 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    3.3.1 Separation Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    3.3.2 Generic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

    3.3.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    3.4 Sigma Algebras and Measure Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    3.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

This chapter draws much from Simmons [1963], surely one of the most beautiful books about mathematics ever written.

    3.1 Counting

The subject of counting begins simply enough with thoughts of the positive integers, 1, 2, 3, . . ., familiar to all of us. But counting was surely important to human beings even before such symbols were invented. Imagine a primitive society of sheep herders whose number system was limited to the symbols “1”, “2”, “3” and “several”, i.e., “more than 3”. How might they have kept track of herds containing “several” sheep? One simple device might have been to place a stone in a pile for each sheep in the herd and then, each night, to remove a stone for each sheep accounted for. Stones left in the pile would then have indicated strays needing to be found.


3.1.1 Countable Sets

Similarly, the infinite set

N = {1, 2, 3, . . .}

containing all the positive integers or cardinal numbers, serves as a modern “pile of stones”. While this set is adequate for counting any non-empty, finite set, in mathematics there are many infinite sets just as, for the herdsmen, there were many herds with “several” sheep. The simple but profound idea of a one-to-one correspondence that met the needs of the herdsmen also permits comparing these infinite sets.

Definition 3.1. Two sets are said to be numerically equivalent if there is a one-to-one correspondence between the elements of the two sets.

    Definition 3.2. A countable set is a set that is numerically equivalent to the positive integers.

Suppose, for example, that we want to compare the set consisting of all positive integers with the set consisting of all even positive integers. Since the pairing

1  2  3  · · ·  n   · · ·
2  4  6  · · ·  2n  · · ·

establishes a one-to-one correspondence, the two sets must be regarded as having the same number of elements even though one is a proper subset of the other. This situation is not unusual since every infinite set can, in fact, be put into a one-to-one correspondence with a proper subset of itself.

Similarly, there are exactly as many perfect squares as there are positive integers because these two sets can also be put in a one-to-one correspondence:

1   2   3   · · ·  n   · · ·
1²  2²  3²  · · ·  n²  · · ·

As another example, consider the set of all positive rational numbers, i.e., ratios of positive integers. Surely this set is larger than the positive integers, right? No. The following array includes every positive rational number at least once

1/1  1/2  1/3  1/4  · · ·
2/1  2/2  2/3  2/4  · · ·
3/1  3/2  3/3  3/4  · · ·
 ...   ...   ...   ...

    and can be put into a one-to-one correspondence with the positive integers as follows:

1    2    3    4    5    6    7    8    9    · · ·
1/1  1/2  2/1  1/3  2/2  3/1  1/4  2/3  3/2  · · ·

Problem 3.1. Construct a one-to-one correspondence between the set of integers,

    {· · · ,−2,−1,0,1,2, · · · }

    and the set of positive integers.

So how many positive integers are there? The symbol ℵ0, read “aleph null”, is used to represent the number of elements or cardinality of the set. Our list of numbers now includes its first “trans-finite” number:

    1 < 2 < 3 < · · · < ℵ0


3.1.2 Uncountable Sets

Not all sets with infinitely many elements are countable. Consider a countable sequence of points of the form x1, x2, x3, ... where each element xi is either 0 or 1 and a countable listing of these sequences such as:

s1 = (0,0,0,0,0,0,0, · · · )
s2 = (1,1,1,1,1,1,1, · · · )
s3 = (0,1,0,1,0,1,0, · · · )
s4 = (1,0,1,0,1,0,1, · · · )
s5 = (1,1,0,1,0,1,1, · · · )
s6 = (0,0,1,1,0,1,1, · · · )
s7 = (1,0,0,0,1,0,0, · · · )

    ...

It is possible to build a sequence s0 in such a way that its first element is different from the first element of the first sequence in the list, its second element is different from the second element of the second sequence in the list, and, in general, its nth element is different from the nth element of the nth sequence in the list. For instance:

s1 = (0,0,0,0,0,0,0, · · · )
s2 = (1,1,1,1,1,1,1, · · · )
s3 = (0,1,0,1,0,1,0, · · · )
s4 = (1,0,1,0,1,0,1, · · · )
s5 = (1,1,0,1,0,1,1, · · · )
s6 = (0,0,1,1,0,1,1, · · · )
s7 = (1,0,0,0,1,0,0, · · · )
...

s0 = (1,0,1,1,1,0,1, · · · )

Note that the nth element of s0 is in every case different from the nth (diagonal) element of sn in the table above it and thus the new sequence is distinct from all the sequences in the list. From this it follows that the set T, consisting of all countable sequences of zeros and ones, cannot be put into a list s1, s2, s3, .... Otherwise, it would be possible by the above process to construct a sequence s0 which would both be in T (because it is a sequence of 0’s and 1’s) and at the same time not in T (because we deliberately construct it not to be in the list). Therefore T cannot be placed in one-to-one correspondence with the positive integers. In other words, T is uncountable.
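The diagonal construction is mechanical enough to automate. A minimal Mathematica sketch for the list above, truncated to seven places:

list = {{0,0,0,0,0,0,0}, {1,1,1,1,1,1,1}, {0,1,0,1,0,1,0}, {1,0,1,0,1,0,1},
        {1,1,0,1,0,1,1}, {0,0,1,1,0,1,1}, {1,0,0,0,1,0,0}};
s0 = Table[1 - list[[n, n]], {n, Length[list]}]  (* {1,0,1,1,1,0,1}: flips each diagonal element *)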

Now consider the binary representation of a number between zero and one where, for example, 1/2 would be represented as 0.1, 1/4 would be 0.01 and so forth. Since the binary representation of a real number between zero and one must be a countable sequence of zeros and ones preceded by a binary point, e.g., 0.1011011100 . . ., and since the number of such sequences is uncountable, it follows that the set of real numbers lying between zero and one must also be uncountable.

Surely there are more real numbers than those lying between zero and one, right? No, the set of all real numbers and the set of real numbers between zero and one, or in any other interval, are numerically


[Figure: the interval ab bent into a semi-circle resting on the real line, with a point P on the semi-circle projected through the center to the point P′ on the line.]

    Figure 3.1: One-to-one Correspondence Between an Interval and the Real Line

equivalent. The one-to-one correspondence is illustrated in Figure 3.1. Simply bend the interval ab into a semi-circle, rest the result on the real line and then associate an arbitrary point P from the interval with that point P′ from the real line which corresponds to the intersection of a line from the center of the semi-circle through P with the real line.
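An explicit formula accomplishes the same thing. One standard choice (not from the text, but easy to verify) is

$$f(x) = \tan\left(\pi\left(x - \tfrac{1}{2}\right)\right)$$

which maps the open interval (0, 1) one-to-one onto the entire real line.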

We now have a new cardinal number, c, called the cardinal number of the continuum and our list of numbers now includes a second “trans-finite” number:

    1 < 2 < 3 < · · · < ℵ0 < c

Problem 3.2. The Cantor set is obtained as follows. First let C1 denote the closed unit interval [0,1]. Next delete from C1 the open interval (1/3,2/3) corresponding to the middle third of C1 to get C2 and note that C2 = [0,1/3] ∪ [2/3,1]. Now delete the open middle thirds of the two closed intervals to get

    C3 = [0,1/9]∪ [2/9,1/3]∪ [2/3,7/9]∪ [8/9,1]

Continuing in this fashion we obtain a sequence of closed sets, each of which contains all its successors. The Cantor set is defined by

$$C = \bigcap_{i=1}^{\infty} C_i$$

1. Each Cn consists of a number of disjoint closed intervals of equal length. How many closed intervals are there in C30?

2. The intervals removed have lengths 1/3, 2/9, 4/27, . . . , 2n−1/3n, . . . What is the combined length of the intervals that have been removed? Hint: Let Mathematica evaluate

    Sum[2^(n-1)/3^n, {n,1,Infinity}]

    You might be surprised at this point to learn that the cardinality of C is equal to c, i.e., the same as C1.

An interesting consequence is that since the rational numbers are countable but the real numbers are not, the set of irrational numbers must be uncountable as well or, more poetically:

The rational numbers are spotted along the real line like stars against a black sky, and the dense blackness of the background is the firmament of the irrationals.

– E. T. Bell

Are there any cardinal numbers between ℵ0 and c? No one knows the answer to this question though Cantor himself thought that no such number exists. There are, on the other hand, cardinal numbers larger than c — the number of elements in the class of all subsets of R, for example. This is one consequence of the following theorem.

Theorem 7. If X is any non-empty set, then the cardinal number of X is less than the cardinal number of the class of all subsets of X.


Suppose, for example, that X = {1}, then there are two subsets, ∅ and {1}. If X = {1,2}, then there are four subsets, ∅, {1}, {2} and {1,2}. Similarly, X = {1,2,3} has eight subsets and, in general, if X has n elements, then there are 2n subsets.