math review 2014

Math Review Course 2013

Ram Sewak Dubey

August 26, 2013

Contents

Syllabus
    0.1 Overview
    0.2 Course Schedule
    0.3 Topics covered
    0.4 Textbook

1 Introduction to Logic
    1.1 Introduction
    1.2 Statements
    1.3 Logical Connective
    1.4 Quantifiers
    1.5 Rules of Negation of statements with quantifiers
        1.5.1 More Examples
        1.5.2 Negation of statement with one quantifier
        1.5.3 Negation with more than one quantifier
    1.6 Logical Equivalences
    1.7 Some Math symbols and Definitions

2 Proof Techniques
    2.1 Methods of Proof
    2.2 Trivial Proofs
    2.3 Vacuous Proofs
    2.4 Proof by Construction
    2.5 Proof by Contraposition
    2.6 Proof by Contradiction
    2.7 Proof by Induction
    2.8 Additional Notes on Proofs
    2.9 Decomposition or proof by cases

3 Problem Set 1

4 Solution to PS 1

5 Set Theory, Sequence
    5.1 Set Theory
        5.1.1 Basic Definitions
        5.1.2 A Few Common Sets
        5.1.3 Set Operations
    5.2 Set Identities
    5.3 Functions
    5.4 Vector Space
        5.4.1 Metric
        5.4.2 Norm
        5.4.3 Inner Product
        5.4.4 Cauchy-Schwartz Inequality
    5.5 Sequences
        5.5.1 Convergence and Limits
        5.5.2 Some Results on Sequences
    5.6 Sets in R^n

6 Problem Set 2

8 Linear Algebra
    8.1 Vectors
        8.1.1 Special Vectors
        8.1.2 Vector Relations and Operations
    8.2 Matrices
        8.2.1 Matrix Operations
        8.2.2 Some Fun facts about matrix multiplication
        8.2.3 Rules for matrix operations
        8.2.4 Rank of a matrix
        8.2.5 Rules for the inverse
    8.3 Determinant of a matrix
        8.3.1 Properties of Determinants
    8.4 An application of matrix algebra
        8.4.1 Absorbing Markov Chains
    8.5 System of Linear Equations
    8.6 Cramer’s Rule
    8.7 Principal Minors
        8.7.1 Leading Principal Minor
    8.8 Quadratic Form
        8.8.1 Matrix Definiteness
        8.8.2 Test for definiteness of symmetric matrices
    8.9 Eigenvalue and Eigenvectors
    8.10 Eigenvalues of symmetric matrix
    8.11 Eigenvalues, Trace and Determinant of a Matrix
        8.11.1 Eigenvalues and Definiteness of Quadratic Forms

9 Problem Set 3

11 Single and Multivariable Calculus
    11.1 Surjective and Injective Functions
    11.2 Composition of Functions
    11.3 Continuous Functions
        11.3.1 Properties of Continuous Functions
    11.4 Extreme Values
    11.5 An application of Extreme Values Theorem
    11.6 Differentiability
        11.6.1 Rules of Differentiation
        11.6.2 L’Hospital’s Rule
    11.7 Monotone Functions
    11.8 Functions of Several Variables
        11.8.1 Partial Derivative
        11.8.2 Second Order Partial Derivatives

12 Problem Set 4

14 Convex Analysis
    14.1 Concave, Convex Functions
        14.1.1 Hessian, Concavity and Convexity
        14.1.2 Some Useful Results
    14.2 Quasi-concave Functions
        14.2.1 Bordered Hessian

17 Implicit Function Theorem
    17.1 The Linear Implicit Function Theorem
    17.2 Implicit Function Theorem for R^2

19 Unconstrained Optimization
    19.1 Optimization Problem
    19.2 Maxima / Minima for C^2 functions of n variables

22 Optimization Theory: Equality Constraints
    22.1 Constrained Optimization
    22.2 Equality Constraint

23 Envelope Theorem
    23.1 Envelope Theorem for Unconstrained Problems
    23.2 Meaning of the Lagrange multiplier

24 Homogeneous and Homothetic Functions
    24.1 Homogeneous Functions
    24.2 Homothetic Functions

25 Optimization Theory: Inequality Constraints
    25.1 Inequality Constraint
    25.2 Global maximum and constrained local maximum

28 Elementary Concepts in Probability
    28.1 Discrete Probability Model
    28.2 Marginal and Conditional Distribution
    28.3 The Law of Iterated Expectation
    28.4 Continuous Random Variables

Preface

These notes have been prepared for the Math Review Class for graduate students joining the Ph.D. program in the field of economics at Cornell University. While making these notes we have referred to the material used in previous years’ classes.

The objective of the Math Review class is to present elementary concepts from set theory, multi-variable calculus, linear algebra, elementary probability, real analysis and optimization theory. I have used examples and problem sets to explain the concepts, definitions and techniques which are useful in Fall semester graduate economics classes.

These notes could serve to refresh the memory of those incoming students who are familiar with the material. To others, these notes could be a ready reckoner of math techniques they will need to know in the first few weeks of the graduate classes in Economics (Econ 6090, Econ 6130, Econ 6190) before they are discussed in a more rigorous way in Econ 6170.

The topics have been arranged so that the entire material can be covered in thirteen classes of three hours’ duration each. Additional problem sets with solutions are provided on each day’s material. Three additional sections of three hours each are sufficient to go over the questions in the problem sets. It is hoped that they will help the reader to better understand the material in the lecture notes.

Earlier versions have been used for the Math Review Classes during 2009-12. My sincere thanks go to the participants for their comments and also for pointing out typos and errors.

Ram Sewak Dubey


Syllabus

Instructor: Ram Sewak Dubey
Office Room: 432, Uris Hall
Office Hour: 12:15-1:15pm
E-mail: [email protected]

0.1 Overview

The Field of Economics offers the August Math Review Course for incoming first-year Ph.D. students. The aim of this review is to refresh students’ mathematical skills and introduce concepts that are critical to success in the first year economics core courses, i.e., Econ 6090, Econ 6130, Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying concepts and illustrative examples.

There is usually a great deal of variation in the mathematical background of incoming first-year students. However, almost all students have something to gain from the review course. For those who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers an opportunity to catch up on critical concepts and get a head start on the fall classes. For those who took their core undergraduate courses in analysis and algebra some years ago, the course is a good refresher. For those who do not have significant experience with technical courses taught in English, the review offers an opportunity to pick up the math vocabulary that will be in use from the first day of regular instruction.

The Math Review Course is funded by the Department of Economics. There is no charge for students matriculating into the Economics Ph.D. Program. Students matriculating into other Ph.D. programs should contact the Director of Graduate Studies in their Field. There will be a charge for these students, and the DGS in the student’s Field must make arrangements to pay that charge before the student may attend the Math Review Course.

The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I. There is no course grade, and no record will be kept of your performance. However, the Economics Ph.D. program strongly encourages you to attend. Most students who have taken this course in past years have found it useful, regardless of their prior mathematics training. Perhaps most importantly, the review period is an excellent time to get acquainted with other incoming students, meet the faculty and settle into Ithaca.

0.2 Course Schedule

The course duration will be August 5-21. There will be a lecture session each working day and discussion sections on three days. The room for all the sessions is Uris 202.

(A) Regular Session:

Days: August 5-9, 12-16, 19-21 Time: 9am-Noon.

(B) Extra Session: There will be three extra sessions to go over selected solutions.

Days: August 8, 15 Time: 3-6pm,
Day: August 20 Time: 2-5pm.

(C) There will be a handout of some basic definitions distributed at each session, and practice problems will be assigned on each topic. Students are strongly encouraged to at least attempt every problem, as this is the best way to understand the material. The problem sets will be due the following day in class (for example, the problem set given in class on Monday will be due on Tuesday) and I intend to grade some of the questions in each problem set. We will go over the solutions to the problem sets in the extra session.

0.3 Topics covered

A. Elements of Logic: Statements, Truth tables, Implications, Tautologies, Contradictions, Logical Equivalence, Quantifiers, Negation of Quantified Statements

B. Proof Techniques: Trivial Proofs, Vacuous Proofs, Direct Proofs, Proof by Contrapositive, Proof by Cases, Proof by Contradiction, Existence Proofs, Proof by Mathematical Induction

C. Set Theory: Definitions, Set Equality, Set Operations, Venn Diagrams, Set Identities, Cartesian Products, Properties of the Set of Real Numbers

D. Sequences: Convergent Sequences, Subsequences, Cauchy Sequences, Upper and Lower Limits, Algebraic Properties of Limits, Monotone Sequences

E. Functions of One Variable: Limits of Functions, Continuous Functions, Monotone Functions, Properties of Exponential and Logarithmic Functions

F. Linear Algebra: System of Linear Equations, Solution by Substitution or Elimination of Variables, Systems with Many or No Solutions

G. Vectors I: Addition, Subtraction, Scalar Multiplication, Length, Distance, Inner Product

H. Matrix Algebra I: Addition, Subtraction, Scalar and Matrix Multiplication, Transpose, Laws of Matrix Algebra

I. Determinants: Definition, Computation, Properties, Use of Determinants, Matrix Inverse, Cramer’s Rule

J. Vectors II: Linear Independence, R^n as an example of Vector Space, Basis and Dimension in R^n

K. Matrix Algebra II: Algebra of Square Matrices, Eigenvalues, Eigenvectors, Properties of Eigenvalues

L. Differential Calculus: Derivative of a Real Function, Mean Value Theorem, Continuity of Derivatives, L’Hospital’s Rule, Higher Order Derivatives, Taylor’s Theorem

M. Functions of Several Variables: Graphs of Functions of Two Variables, Level Curves, Continuous Functions, Total Derivative, Chain Rule, Partial Derivatives

N. Unconstrained Optimization: First Order Conditions, Global Maxima and Minima, Examples

O. Constrained Optimization with equality constraints: First Order Conditions, Constrained Minimization Problems, Examples

P. Constrained Optimization with inequality constraints: Kuhn-Tucker conditions, Interpreting the Multipliers, Envelope Theorem

0.4 Textbook

There is no textbook for this course; however, the following books are helpful. The textbook Mas-Colell et al. (1995) is used in the Microeconomics course sequence. Simon and Blume (1994) and Wainwright and Chiang (2005) are useful textbooks for Mathematical Economics. It will be useful to refer to Simon and Blume (1994) for understanding the material. Copies of this textbook are available in the libraries. Stricharz (2000) will be our reference book for analysis. Dixit (1990) contains many useful examples. Mitra (2013) is the set of Lecture Notes used in Econ 6170. It should be available at the bookstore at the start of the course.

Chapter 1

Introduction to Logic

1.1 Introduction

The theory that you’ll learn during the first year is built on a foundation borrowed from engineering and pure mathematics. You will be required to both understand and reproduce certain key proofs, particularly in microeconomics. On some problem sets and exams you’ll be asked to produce your own proofs.

If you haven’t taken any pure math courses, you might be thinking “I don’t even know what a proof is”. That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell who had no idea how to write a proof when they arrived. It’s important not to get discouraged because it takes time to learn how to write good proofs. There is a standard bag of tricks that will get you through almost any proof in the first year sequence, but it takes exposure and then practice for you to learn and be comfortable with these tricks. Math majors are at an advantage here, more than in most areas, but by the end of the year they’ll have forgotten the fancier proof techniques and you’ll have learned the necessary ones, so the field will be surprisingly level.

A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof you make use of (i) the rules of logic and (ii) definitions, theorems, and other propositions that have already been proved, or that you are told you can take as given.

The rules of logic are obviously fixed and unchanging. The components of the second point, however, will vary depending on the task at hand. The most important question to ask yourself when attempting to prove a proposition is “What do I already know?” It will often be the case that if you write down all of the relevant mathematical definitions, the theorems or results that you were given or that you know you can take as given, and any result that you just proved in a previous problem, a straightforward rearrangement of everything on the page will give you the proof that you want.


In this chapter we will discuss the principles of logic that are essential for problem solving in mathematics. The ability to reason using the principles of logic is key to seeking the truth, which is our goal in mathematics. Before we explore and study logic, let us start by spending some time motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set of rules. As an illustration, let us consider the following problem.

Example 1.1. Joe is 7 years older than John. Six years from now Joe will be twice John’s age. How old are Joe and John?

Solution 1.1. To answer the above question, we reduce the problem using symbolic formulation. We let John’s age be x. Then Joe’s age is x+7. We are given that six years from now Joe will be twice John’s age. In symbols, (x+7)+6 = 2(x+6). Solving for x yields x = 1. Therefore, John is 1 year old and Joe is 8.
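The symbolic reduction in Solution 1.1 can also be checked mechanically. The following is a minimal sketch in Python, assuming ages are whole numbers between 0 and 120; the function name `solve_ages` and the search range are illustrative choices, not part of the original notes.

```python
def solve_ages(max_age=120):
    """Return all (john, joe) pairs satisfying the two conditions of Example 1.1."""
    solutions = []
    for john in range(max_age + 1):
        joe = john + 7                    # Joe is 7 years older than John
        if joe + 6 == 2 * (john + 6):     # six years from now Joe is twice John's age
            solutions.append((john, joe))
    return solutions


print(solve_ages())  # [(1, 8)]
```

The search finds the single pair (1, 8), in agreement with the algebraic solution x = 1.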

Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation of symbols using a set of rules. The central concept of deductive logic is the concept of argument form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a “claim”). Consider the following two arguments.

Argument 1. If x is a real number such that x < −3 or x > 3, then x^2 > 9. Therefore, if x^2 ≤ 9, then x ≥ −3 and x ≤ 3.

Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home, then it is not raining and I am not sick.

Although the content of the above two arguments is very different, their logical form is the same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q and r) to represent the component sentences and the expression “not p” to refer to the sentence “It is not the case that p.” Then the common logical form of both the arguments above is as follows:

If p or q, then r. Therefore, if not r, then not p and not q.

We start by identifying and giving names to the building blocks which make up an argument. In Arguments 1 and 2, we identified the building blocks as follows:

Argument 1. If x is a real number such that x < −3 (p) or x > 3 (q), then x^2 > 9 (r). Therefore, if x^2 ≤ 9 (not r), then x ≥ −3 (not p) and x ≤ 3 (not q).

Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not stay at home (not r), then it is not raining (not p) and I am not sick (not q).
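One way to see that the common form is valid regardless of content is to try every combination of truth values for p, q and r. The sketch below does this in Python. It reads “if ... then” as the implication connective defined in Section 1.3 (false only when the hypothesis is true and the conclusion is false); the helper names `implies` and `argument_form_is_valid` are hypothetical.

```python
from itertools import product


def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def argument_form_is_valid():
    """Check that 'if p or q, then r' never holds while
    'if not r, then not p and not q' fails."""
    for p, q, r in product([True, False], repeat=3):
        premise = implies(p or q, r)
        conclusion = implies(not r, (not p) and (not q))
        if premise and not conclusion:
            return False      # a counterexample would make the form invalid
    return True


print(argument_form_is_valid())  # True
```

No assignment of truth values makes the premise true and the conclusion false, which is why both Argument 1 and Argument 2 are acceptable.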

1.2 Statements

The study of logic is concerned with the truth or falsity of statements.

Definition 1.1 (Statement). A statement is a sentence which can be classified as true or false without ambiguity. The truth or falsity of the statement is known as the truth value.

For a sentence to be a statement, it is not necessary for us to know whether it is true or false. However, it must be clear that it is one or the other.

Example 1.2. Consider the following examples.

(a) One plus two equals three. It is a statement which is true.

(b) One plus one equals three. It is also a statement which is not true.

(c) He is a university student. This sentence is neither true nor false. The truth or falsity depends on the reference for the pronoun he. For some values of he the sentence is true; for others it is false, and so it is not a statement.

(d) “Every continuous function is differentiable.” is a statement whose truth value is false.

(e) “x < 1” is true for some values of x and false for some others. It is a statement if we have some particular context in mind. Otherwise, it is not a statement.

(f) Goldbach’s Conjecture, “Every even number greater than 2 is the sum of two prime numbers”, is a statement whose truth value is not known yet.

(g) “There are infinitely many prime numbers of the form 2^n + 1, where n is a natural number.” is another statement whose truth value is not known till now.

Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use p, q and r to denote statements, or perhaps p1, p2, · · · , pn if there are several statements involved.

Exercise 1.1. Which of the following sentences are statements?

(a) If x is a real number, then x^2 ≥ 0.

(b) 11 is a prime number.

(c) This sentence is false.


The possible truth values of a statement are often given in a table, called a truth table. The truth values for two statements p and q are given below. Since there are two possible truth values for each of p and q, there are four possible combinations of truth values for p and q. It is customary to consider the four combinations of truth values in the order of TT, TF, FT, FF from top to bottom.

    p  q
    T  T
    T  F
    F  T
    F  F

(1.1)
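The customary ordering TT, TF, FT, FF extends to any number of statements. As a rough illustration, the following Python sketch enumerates the rows of a truth table in that order; the function names `truth_assignments` and `format_row` are assumptions made for this example.

```python
from itertools import product


def truth_assignments(n):
    """All combinations of truth values for n statements, in the order TT..., ..., FF..."""
    return list(product([True, False], repeat=n))


def format_row(values):
    """Render a row of truth values with the letters T and F."""
    return " ".join("T" if v else "F" for v in values)


for row in truth_assignments(2):
    print(format_row(row))
# T T
# T F
# F T
# F F
```

With n statements there are 2^n rows, so `truth_assignments(3)` returns eight combinations.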

1.3 Logical Connective

A logical connective (also called a logical operator) is a symbol or word used to connect two or more statements such that the compound statement produced has a truth value dependent on the respective truth values of the original statements.

We discuss some of the elementary logical operators (connectives) first.

(1) Logical Negation

Logical negation is an operation on one logical value, typically, the value of a proposition, that produces a value of true if its operand is false and a value of false if its operand is true. The truth table for ¬A (also written as NOT A or ∼ A) is as follows:

    A  ¬A
    T  F
    F  T

(1.2)

For example, consider the statement,

p : The integer 2 is even.

Then the negation of p is the statement

∼ p : It is not the case that the integer 2 is even.

It would be better to write,

∼ p : The integer 2 is not even.

Or better yet to write,

∼ p : The integer 2 is odd.

(2) Logical Conjunction

Logical conjunction is an operation on the values of two propositions, that produces a value of true if and only if both of its operands are true. The truth table for A∧B (also written as A AND B) is as follows:

    A  B  A∧B
    T  T   T
    T  F   F
    F  T   F
    F  F   F

(1.3)

In words, if both A and B are true, then the conjunction A∧B is true. For all other assignments of logical values to A and to B the conjunction A∧B is false.

For example, consider the statements

p : The integer 2 is even.

q : 4 is less than 3.

The conjunction of p and q, namely,

p∧q : The integer 2 is even and 4 is less than 3,

is a false statement since q is false (even though p is true).

(3) Logical Disjunction

Logical disjunction is an operation on the values of two propositions, that produces a value of false if and only if both of its operands are false. The truth table for A∨B (also written as A OR B) is as follows:

    A  B  A∨B
    T  T   T
    T  F   T
    F  T   T
    F  F   F

(1.4)

Thus for the statements p and q described earlier, the disjunction of p and q, namely,

p∨q : The integer 2 is even or 4 is less than 3,

is a true statement since at least one of p and q is true (in this case, p is true).


(4) Logical Implication

Logical implication is associated with an operation on the values of two propositions, that produces a value of false only in the case that the first operand is true and the second operand is false. The truth table associated with A ⇒ B is as follows:

    A  B  A ⇒ B
    T  T    T
    T  F    F
    F  T    T
    F  F    T

(1.5)

The last row of the table may appear to be counterintuitive. Note, however, that the use of “if · · · then” as a connective is quite different from that of day-to-day language.

Consider the following example.

Example 1.3. Suppose your supervisor makes you the following promise:

“If you meet the month-end deadline, then you will get a bonus.”

Under what circumstances are you justified in saying that your supervisor spoke falsely?

The answer is: You do meet the month-end deadline and you do not get a bonus. Your supervisor’s promise only says that you will get a bonus if a certain condition (you meet the month-end deadline) is met; it says nothing about what will happen if the condition is not met. So if the condition is not met, your supervisor did not lie (your supervisor promised nothing if you did not meet the month-end deadline); so your supervisor told the truth in this case. Are you convinced? Good! If not, let us then check the truth and falsity of the implication based on the various combinations of the truth values of the statements

p : You meet the month-end deadline;
q : You get a bonus.

The given statement can be written as p ⇒ q.

Suppose first that p is true and q is true. That is, you meet the month-end deadline and you do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true, then so too is p ⇒ q, which agrees with the first row of the truth table of (1.5).

Second, suppose that p is true and q is false. That is, you meet the month-end deadline and you did not get a bonus. Then your supervisor did not do as he/she promised. What your supervisor said was false, which agrees with the second row of the truth table of (1.5).

Third, suppose that p is false and q is true. That is, you did not meet the month-end deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your supervisor promised nothing if you did not meet the month-end deadline); so he/she told the truth. This agrees with the third row of the truth table of (1.5).

Finally, suppose that p and q are both false. That is, you did not meet the month-end deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor only promised you a bonus if you met the month-end deadline. So your supervisor told the truth. This agrees with the fourth row of the truth table of (1.5).

In summary, the implication p ⇒ q is false only when p is true and q is false.

A conditional (or implication) statement that is true by virtue of the fact that its hypothesis is false is said to be vacuously true or true by default. Thus the statement “If you meet the month-end deadline, then you will get a bonus” is vacuously true if you do not meet the month-end deadline!

Example 1.4. Consider the expression 4+1 = 9 ⇒ 8−1 = 3. It may not be apparent why this statement is assigned a truth value of T. That it is indeed true can be seen as follows. If 4+1 = 9, then 4+1−4 = 9−4 = 5, so 1 = 5, and therefore 8−1 = 8−5 = 3.

(5) Logical Equality

Logical equality is an operation on the values of two propositions, that produces a value of true if and only if both operands are false or both operands are true. The truth table for A ≡ B is as follows:

    A  B  A ≡ B
    T  T    T
    T  F    F
    F  T    F
    F  F    T

(1.6)

So A ≡ B is true if A and B have the same truth value (both true or both false), and false if they have different truth values.
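The five connectives above can be written as small functions on Boolean values, which makes it easy to regenerate tables (1.3) to (1.6). This is only a sketch: the upper-case function names and the helper `column` are illustrative assumptions, and Python's built-in `not`, `and`, `or` and `==` are used as stand-ins for ¬, ∧, ∨ and ≡.

```python
from itertools import product


def NOT(a):
    return not a


def AND(a, b):
    return a and b


def OR(a, b):
    return a or b


def IMPLIES(a, b):
    # false only when the first operand is true and the second is false
    return (not a) or b


def EQUIV(a, b):
    # true exactly when both operands have the same truth value
    return a == b


def column(op):
    """Truth-table column of a two-place connective, rows in the order TT, TF, FT, FF."""
    return [op(a, b) for a, b in product([True, False], repeat=2)]


print(column(AND))      # [True, False, False, False]
print(column(OR))       # [True, True, True, False]
print(column(IMPLIES))  # [True, False, True, True]
print(column(EQUIV))    # [True, False, False, True]

# Example 1.4: the hypothesis 4+1 = 9 is false, so the implication is (vacuously) true.
print(IMPLIES(4 + 1 == 9, 8 - 1 == 3))  # True
```

Each printed list matches the last column of the corresponding truth table when read from top to bottom.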

Definition 1.2. A compound statement (statement with connective) is said to be a tautology if it is always true regardless of the truth value of the simple statements from which it is constructed. It is a contradiction if it is always false. Thus the negation of a tautology is a contradiction, and the negation of a contradiction is a tautology.


Example 1.5. A∨(¬A) is a tautology, while A∧(¬A) is a contradiction.

    A  ¬A  A∨(¬A)  A∧(¬A)
    T  F     T       F
    F  T     T       F

(1.7)

Example 1.6. [A∧(A ⇒ B)] ⇒ B is a tautology.

    A  B  A ⇒ B  A∧(A ⇒ B)  [A∧(A ⇒ B)] ⇒ B
    T  T    T        T              T
    T  F    F        F              T
    F  T    T        F              T
    F  F    T        F              T

(1.8)
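Checking whether a compound statement is a tautology or a contradiction amounts to scanning the last column of its truth table. A minimal Python sketch of this check is given below; the names `is_tautology`, `is_contradiction` and the three example formulas are hypothetical and introduced only for illustration.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def is_tautology(formula, n_vars):
    """True if the formula is true under every assignment of truth values."""
    return all(formula(*vals) for vals in product([True, False], repeat=n_vars))


def is_contradiction(formula, n_vars):
    """True if the formula is false under every assignment of truth values."""
    return not any(formula(*vals) for vals in product([True, False], repeat=n_vars))


def a_or_not_a(a):              # Example 1.5, first formula
    return a or (not a)


def a_and_not_a(a):             # Example 1.5, second formula
    return a and (not a)


def example_1_6(a, b):          # [A and (A => B)] => B
    return implies(a and implies(a, b), b)


print(is_tautology(a_or_not_a, 1))       # True
print(is_contradiction(a_and_not_a, 1))  # True
print(is_tautology(example_1_6, 2))      # True
print(is_tautology(implies, 2))          # False: A => B fails when A is T and B is F
```

The last line shows that a plain implication A ⇒ B is neither a tautology nor a contradiction.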

Definition 1.3.

(a) The converse of A ⇒ B is B ⇒ A.

(b) The inverse of A ⇒ B is ∼ A ⇒∼ B.

(c) The contrapositive of A ⇒ B is ∼ B ⇒∼ A.

Example 1.7. Write the converse, inverse and contrapositive of the statement in Example 1.3.

Recall that the given statement can be written as p ⇒ q where p and q are the statements:

p : You meet the month-end deadline;
q : You get a bonus.

(a) The converse of this implication is q ⇒ p: If you get a bonus, then you have met the month-end deadline.

(b) The inverse of this implication is ∼ p ⇒ ∼ q: If you do not meet the month-end deadline, then you will not get a bonus.

(c) The contrapositive of this implication is ∼ q ⇒ ∼ p: If you do not get a bonus, then you will not have met the month-end deadline.

The following theorem is extremely useful.


Theorem 1.1. (A ⇒ B) ⇔ (∼ B ⇒ ∼ A).

Proof. Using Truth table,

    A  B  A ⇒ B  ∼ B  ∼ A  ∼ B ⇒ ∼ A
    T  T    T     F    F       T
    T  F    F     T    F       F
    F  T    T     F    T       T
    F  F    T     T    T       T

(1.9)

The entries in the third and sixth columns are identical.

Remark 1.1. It is an exercise to see that A ⇒ B is not logically equivalent to its converse, B ⇒ A. One should avoid the very common mistake of claiming the opposite.
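Both Theorem 1.1 and the exercise in Remark 1.1 can be explored by comparing truth-table columns. The following Python sketch assumes the material-implication reading of ⇒ from table (1.5); the helper names `equivalent` and `first_disagreement` are illustrative and not part of the notes.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def implication(a, b):
    return implies(a, b)            # A => B


def contrapositive(a, b):
    return implies(not b, not a)    # ~B => ~A


def converse(a, b):
    return implies(b, a)            # B => A


def inverse(a, b):
    return implies(not a, not b)    # ~A => ~B


def equivalent(f, g):
    """True if f and g agree on all four rows TT, TF, FT, FF."""
    return all(f(a, b) == g(a, b) for a, b in product([True, False], repeat=2))


def first_disagreement(f, g):
    """Return the first (A, B) row where f and g differ, or None if they never do."""
    for a, b in product([True, False], repeat=2):
        if f(a, b) != g(a, b):
            return (a, b)
    return None


print(equivalent(implication, contrapositive))    # True  (Theorem 1.1)
print(equivalent(implication, converse))          # False (Remark 1.1)
print(first_disagreement(implication, converse))  # (True, False)
print(equivalent(converse, inverse))              # True
```

The row A = T, B = F is where the implication is false but its converse is true. The last line reflects the fact that the inverse is the contrapositive of the converse.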

Example 1.8. Consider the following two statements,

(A) Cornell is in Ithaca.

(B) Cornell is in NY state.

and the compound statements:

(a) Implication : A ⇒ B : If Cornell is in Ithaca, then Cornell is in NY state.

(b) Contrapositive : ∼ B ⇒ ∼ A : If Cornell is NOT in NYS, then Cornell is NOT in Ithaca.

(c) Converse : B ⇒ A : If Cornell is in NYS, then Cornell is in Ithaca.

Note that the converse statement is FALSE. This leads us to another important interpretation of the implication A ⇒ B. It means that every time A is true, B must be true. Hence A is a sufficient condition for B. If we know that A is true then we can always conclude that B is also true. The contrapositive ∼ B ⇒ ∼ A showed us that when B is not true then A cannot be true either. Hence B is a necessary condition for A. If A is true we must necessarily have that B is true, because if B isn’t true then A cannot be true either. Thus we have the following ways of reading

A ⇒ B :

A implies B,
If A then B,
A is sufficient for B,
B is necessary for A.

(1.10)

Remark 1.2. Note that for the equivalence relation (the if and only if) A ⇔ B, the implication goes in both directions. In this case A and B are necessary and sufficient conditions for each other. A ⇔ B means that both the statement A ⇒ B and its converse B ⇒ A are true.


1.4 Quantifiers

In the previous sections, we learnt some definitions and basic properties of compoundstatements. We were interested in whether a particular statement was true or false. Thislogic is called propositional logic or statement logic. However there are many argumentswhose validity cannot be verified using propositional logic. Consider, for example, thesentence

p : x is an even integer.

This sentence is neither true nor false. The truth or falsity depends on the value of thevariable x. For some values of x the sentence is true; for others it is false. Thus thissentence is not a statement. However, let us denote this sentence by P(x), i.e.,

P(x) : x is an even integer.

Then, P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend the framework of propositional logic to what is called first-order logic.

Definition 1.4. A predicate or propositional function is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. The domain of a predicate variable is the set of all values that may be substituted in place of the variables.

In our earlier example, the sentence

P(x) : x is an even integer

is a propositional function with domain D, the set of integers, since for each x ∈ D, P(x) is a statement, i.e., for each x ∈ D, P(x) is true or false, but not both.

Example 1.9. The following are examples of predicate or propositional functions:

(a) The sentence “P(x) : x + 3 is an even integer” with domain D the set of positive integers.

(b) The sentence “P(x) : x + 3 is an even integer” with domain D the set of integers.

(c) The sentence “P(x; y; z) : $x^2 + y^2 = z^2$” with domain D the set of positive integers.


Before proceeding further, we introduce the following notation. A more comprehensive list of notation will be given later.

    ∈ : “is an element of”,
    ϶ : “such that”,
    ∧ : AND, in the sense that A ∧ B means both A and B,
    ∨ : OR, in the sense that A ∨ B means either A or B or both,
    ∀ : Universal, “for all”,
    ∃ : Existential, “there exists” (one or more).

(a) The Universal Quantifier:

Let P(x) be a predicate with domain D. Then the sentence

Q(x) : for all x,P(x)

is a statement. To see this, notice that either P(x) is true at each value x ∈ D (the notation x ∈ D indicates that x is in the set D, while x ∉ D means that x is not in D) or P(x) is false for at least one value of x ∈ D. If P(x) is true at each value x ∈ D, then Q(x) is true. However, if P(x) is false for at least one value of x ∈ D, then Q(x) is false. Hence, Q(x) is a statement because it is either true or false (but not both).

Definition 1.5. Each of the phrases “every”, “for every”, “for each”, and “for all” is referred to as the universal quantifier and is expressed by the symbol ∀. Let P(x) be a predicate with domain D. A universal statement is a statement of the form ∀x ∈ D, P(x). It is false if P(x) is false for at least one x ∈ D; otherwise, it is true.

Example 1.10. Let D be a set.

The statement ∀x ∈ D, x > 0

means “For all x that are elements of D, x is positive.”

Example 1.11. Let P(x) be the predicate “P(x) : $x^2 \ge x$”.

Determine whether the following universal statements are true or false.

(i) ∀x ∈ R; P(x);

(ii) ∀x ∈ Z; P(x);


(i) Let x = 1/2 ∈ R. Then, $(1/2)^2 = 1/4 < 1/2$, and so P(1/2) is false. Therefore, “∀x ∈ R; P(x)” is false.

(ii) For all integers x, $x^2 \ge x$ is true, and so P(x) is true for all x ∈ Z. Hence, “∀x ∈ Z; P(x)” is true.

(b) The Existential Quantifier:

Each of the phrases “there exists”, “there is”, “for some”, and “for at least one” is referred to as the existential quantifier and is denoted in symbols by ∃. Let P(x) be a predicate with domain D. An existential statement is a statement of the form ∃x ∈ D such that P(x). It is true if P(x) is true for at least one x ∈ D; otherwise, it is false.

Example 1.12. As before let D be a set.

The statement ∃x ∈ D, ϶ x > 0

tells us that “There exists an element x of D such that x is positive.”

Example 1.13. Let P(x) be the predicate “P(x) : $x^2 < x$”.

Determine whether the following existential statements are true or false.

(i) ∃x ∈ R; P(x);

(ii) ∃x ∈ Z; P(x);

(i) Let x = 1/2 ∈ R. Then, $(1/2)^2 = 1/4 < 1/2$, and so P(1/2) is true. Therefore, “∃x ∈ R; P(x)” is true.

(ii) For all integers x, $x^2 \ge x$ is true, and so there is no x ∈ Z such that P(x) is true. Hence, “∃x ∈ Z; P(x)” is false.
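On a finite sample of a domain, the universal and existential quantifiers behave like Python's `all` and `any`. The sketch below revisits Examples 1.11 and 1.13 under that analogy. The sample ranges are assumptions chosen for illustration, and a finite check can refute a universal statement or confirm an existential one, but it cannot prove a universal statement over an infinite domain.

```python
from fractions import Fraction

def p_universal(x):
    """Predicate of Example 1.11: x^2 >= x."""
    return x * x >= x

def p_existential(x):
    """Predicate of Example 1.13: x^2 < x."""
    return x * x < x

# Hypothetical finite samples standing in for Z and for part of R.
integers = range(-50, 51)
rationals = [Fraction(k, 4) for k in range(-8, 9)]  # -2, -7/4, ..., 2

# all(...) plays the role of "for all" on the sample,
# any(...) plays the role of "there exists".
forall_int = all(p_universal(x) for x in integers)
forall_rat = all(p_universal(x) for x in rationals)
exists_int = any(p_existential(x) for x in integers)
exists_rat = any(p_existential(x) for x in rationals)

# Sample points at which x^2 >= x fails; x = 1/2 is among them.
counterexamples = [x for x in rationals if not p_universal(x)]
print(forall_int, forall_rat, exists_int, exists_rat, counterexamples)
```

The point x = 1/2 used in the text appears both as a counterexample to the universal statement over R and as a witness for the existential one.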

(c) Universal Conditional Statements

Recall that a conditional statement has a contrapositive, a converse, and an inverse. These definitions can be extended to universal conditional statements. Consider a universal conditional statement of the form ∀x ∈ D; P(x) → Q(x).

(i) Its contrapositive is the statement,

∀x ∈ D;∼ Q(x)→∼ P(x).


(ii) Its converse is the statement,

∀x ∈ D;Q(x)→ P(x)

(iii) Its inverse is the statement,

∀x ∈ D;∼ P(x)→∼ Q(x).

Example 1.14. Write the contrapositive, converse, and inverse of the statement: If a real number is greater than 3, then its square is greater than 9.

Solution 1.2. Symbolically, the statement can be written as:

∀x ∈ R; if x > 3 then $x^2 > 9$.

Here P(x) is the statement x > 3 and Q(x) the statement $x^2 > 9$.

(i) The contrapositive is:

∀x ∈ R; if $x^2 \not> 9$ then $x \not> 3$,

or, equivalently, ∀x ∈ R; if $x^2 \le 9$ then x ≤ 3.

(ii) The converse is: ∀x ∈ R; if $x^2 > 9$ then x > 3.

Note that the converse is false; take, for example, x = −4. Then, $(-4)^2 > 9$ is true but −4 > 3 is false. Hence the statement “if $(-4)^2 > 9$ then −4 > 3” is false, and so the universal statement ∀x ∈ R; if $x^2 > 9$ then x > 3 is false.

(iii) The inverse is: ∀x ∈ R; if $x \not> 3$ then $x^2 \not> 9$,

or, equivalently, ∀x ∈ R; if x ≤ 3 then $x^2 \le 9$.

(d) Order of quantifiers:

If the quantifiers are of the same type, the order in which they appear does not matter.

∀x, ∀y : x + y = y + x,
∃x, ∃y ϶ x + y = 2 ∧ x + 2y = 3.


But if the quantifiers are of different types we have to be careful. For the set of real numbers, the statement

∀x ∃y ϶ y > x (1.11)

is TRUE; that is, given any real number x, there is always a real number y that is greater than x. But the statement

∃y ϶ ∀ x, y > x (1.12)

is FALSE, since there is no fixed real number y that is greater than every real number.

Example 1.15. The statement [∃y ∈ U ϶ ∀x ∈ V, statement A] means that one y will make A true regardless of what x is. The statement [∀x ∈ V, ∃y ∈ U ϶ statement A] means that A can be made true by choosing y depending on x.
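The difference between the two orders of quantifiers can be seen on a small finite domain by nesting `all` and `any` in the two possible ways. This is only a sketch: the predicate y > x from (1.11) and (1.12) needs an unbounded domain, so the illustration below assumes a different statement A(x, y), namely x + y = 0, on the hypothetical domain {−3, ..., 3}.

```python
# Hypothetical finite domain standing in for U = V.
domain = range(-3, 4)

def a(x, y):
    """Assumed statement A(x, y): y is the additive inverse of x."""
    return x + y == 0

# "For all x there exists y such that A": y may be chosen depending on x.
forall_exists = all(any(a(x, y) for y in domain) for x in domain)

# "There exists y such that for all x, A": one fixed y must work for every x.
exists_forall = any(all(a(x, y) for x in domain) for y in domain)

print(forall_exists, exists_forall)
```

Here every x has its own inverse y = −x, but no single y is the inverse of every x, so the first statement holds and the second fails.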

1.5 Rules of Negation of statements with quantifiers

Fact 1. The negation of a universal statement of the form ∀x ∈ D; P(x) is logically equivalent to an existential statement of the form ∃x ∈ D such that ∼P(x). Symbolically,

∼(∀x ∈ D; P(x)) ≡ ∃x ∈ D such that ∼P(x).

Consider the universal statement ∀x ∈ D; P(x). It is false if P(x) is false for at least one x ∈ D; otherwise, it is true. Hence it is false if and only if P(x) is false for at least one x ∈ D, or, if and only if ∼P(x) is true for at least one x ∈ D. Thus the negation of this statement is the statement ∃x ∈ D such that ∼P(x).

Example 1.16. What is the negation of the statement “All mathematicians wear glasses ”?

Solution 1.3. Let us write this statement symbolically. Let D be the set of all mathematicians and let P(x) be the predicate “x wears glasses” with domain D. The given statement can be written as ∀x ∈ D; P(x). The negation is ∃x ∈ D such that ∼P(x). In words, the negation is “There exists a mathematician who does not wear glasses” or “Some mathematicians do not wear glasses”.

Fact 2. The negation of an existential statement of the form ∃x ∈ D such that P(x) is logically equivalent to a universal statement of the form ∀x ∈ D; ∼P(x). Symbolically,

∼(∃x ∈ D such that P(x)) ≡ ∀x ∈ D; ∼P(x).

Consider the existential statement, ∃x ∈ D such that P(x). It is true if P(x) is true for at least one x ∈ D; otherwise, it is false. Hence it is false if and only if P(x) is false for all x ∈ D, if and only if ∼P(x) is true for all x ∈ D. Thus the negation of this statement is the statement ∀x ∈ D; ∼P(x).


Example 1.17. What is the negation of the statement “Some politicians are honest”?

Solution 1.4. Let us write this statement symbolically. Let D be the set of all politicians and let P(x) be the predicate “x is honest” with domain D. The given statement can be written as ∃x ∈ D such that P(x). The negation is ∀x ∈ D; ∼P(x). In words, the negation is “All politicians are not honest” or “No politician is honest”.

Consider next the negation of a universal conditional statement. By Fact 1, we have that ∼(∀x ∈ D; (P(x) → Q(x))) ≡ ∃x ∈ D such that ∼(P(x) → Q(x)). But the negation of an “if p then q” statement is logically equivalent to a “p and not q” statement. Hence, ∼(P(x) → Q(x)) ≡ P(x) ∧ ∼Q(x). Therefore we have the following fact:

Fact 3. The negation of a universal conditional statement of the form ∀x ∈ D; (P(x) → Q(x)) is logically equivalent to the existential statement of the form ∃x ∈ D such that (P(x) ∧ ∼Q(x)). Symbolically,

∼ (∀x ∈ D;(P(x)→ Q(x)))≡ ∃x ∈ D such that (P(x)∧ ∼ Q(x)).

Written less symbolically, this becomes

∼ (∀x ∈ D; if P(x) then Q(x))≡ ∃x ∈ D such that P(x) and ∼ Q(x).
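Facts 1 to 3 can be sanity-checked on a finite domain, where negating `all` must agree with `any` of the negated predicate and vice versa. The sketch below is illustrative only: the domain and the sample predicates `p` and `q` are assumptions, and agreement on one example is a check, not a proof of the facts.

```python
domain = range(-5, 6)  # hypothetical finite domain D

def p(x):
    return x > 0          # sample predicate P(x)

def q(x):
    return x * x > 9      # sample predicate Q(x)

def implies(a, b):
    return (not a) or b

# Fact 1: ~(for all x, P(x)) is equivalent to: there exists x with ~P(x).
fact1 = (not all(p(x) for x in domain)) == any(not p(x) for x in domain)

# Fact 2: ~(there exists x with P(x)) is equivalent to: for all x, ~P(x).
fact2 = (not any(p(x) for x in domain)) == all(not p(x) for x in domain)

# Fact 3: ~(for all x, P(x) -> Q(x)) is equivalent to:
#         there exists x with P(x) and ~Q(x).
fact3 = (not all(implies(p(x), q(x)) for x in domain)) == any(
    p(x) and not q(x) for x in domain
)

# Witnesses for the right-hand side of Fact 3 in this sample.
witnesses = [x for x in domain if p(x) and not q(x)]
print(fact1, fact2, fact3, witnesses)
```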

1.5.1 More Examples

We can use truth tables to prove the following examples of negations.

∼(A ∧ B) ⇔ ∼A ∨ ∼B,
∼(A ∨ B) ⇔ ∼A ∧ ∼B,
∼(x > y) ⇔ x ⩽ y,
∼(A ⇒ B) ⇔ A ∧ ∼B,
∼(∼A) ⇔ A.

Try proving them (Good Exercise).

1.5.2 Negation of statement with one quantifier

The universal statement in Example 1.10 contains a universal quantifier term and the statement x > 0. To negate a universal statement we need to find only one counterexample. In this example, if we can find just one x in D that is non-positive, we know that it is not true that all x are positive. Thus the negation of the universal statement

∀x ∈ D, x > 0

is an existential statement,

∃x ∈ D, ϶ x ⩽ 0.

To negate an existential statement we must show that every possible instance is false. The existential statement

∃x ∈ D, ϶ x > 0

is false if there are no positive elements of D. Thus the negation of the existential statement is a universal statement,

∀x ∈ D, x ⩽ 0.

Insight from these examples can be generalized to rules of negation. Note that ϶ (“such that”) always follows ∃ (the existential quantifier).

Rule 1.1. To negate the statement [quantifier term, statement], first change the quantifier (∀ becomes ∃, ∃ becomes ∀) and then negate the statement.

1.5.3 Negation with more than one quantifier

Rule 1.2. To negate a statement with a string of quantifiers, change the type of each quantifier, preserve their order and negate the statement that follows the quantifiers.

Example 1.18. Statement:

∀ε > 0 ∃N ϶ ∀n, if n ⩾ N, then ∀x ∈ D, |f_n(x) − f(x)| < ε.    (1.13)

Negation:

∃ε > 0 ϶ ∼[∃N ϶ ∀n, if n ⩾ N, then ∀x ∈ D, |f_n(x) − f(x)| < ε],
or ∃ε > 0 ϶ ∀N, ∼[∀n, if n ⩾ N, then ∀x ∈ D, |f_n(x) − f(x)| < ε],
or ∃ε > 0 ϶ ∀N, ∃n ϶ ∼[if n ⩾ N, then ∀x ∈ D, |f_n(x) − f(x)| < ε],
or ∃ε > 0 ϶ ∀N, ∃n ϶ n ⩾ N and ∼[∀x ∈ D, |f_n(x) − f(x)| < ε],
or ∃ε > 0 ϶ ∀N, ∃n ⩾ N and ∃x ∈ D, ϶ |f_n(x) − f(x)| ⩾ ε.    (1.14)


1.6 Logical Equivalences

There are many fundamental logical equivalences that we often encounter. Several of these are listed in the Theorem below. We may find them useful for future reference.

Theorem 1.2. Let p, q and r be statements. Then the following logical equivalences hold.

(1) Commutative Laws

(i) p∧q ≡ q∧ p;

(ii) p∨q ≡ q∨ p.

(2) Associative Laws

(i) (p∧q)∧ r ≡ p∧ (q∧ r);

(ii) (p∨q)∨ r ≡ p∨ (q∨ r).

(3) Distributive Laws

(i) p∨ (q∧ r)≡ (p∨q)∧ (p∨ r);

(ii) p∧ (q∨ r)≡ (p∧q)∨ (p∧ r).

(4) De Morgan's Laws

(i) ∼ (p∨q)≡ (∼ p)∧ (∼ q);

(ii) ∼ (p∧q)≡ (∼ p)∨ (∼ q).

(5) Idempotent Laws

(i) p∧ p ≡ p;

(ii) p∨ p ≡ p.

(6) Negation Laws

(i) p∨ (∼ p)≡ T ;

(ii) p∧ (∼ p)≡ F ;

where T: True; F: False.

(7) Universal Bound Laws

(i) p∨T ≡ T ;


(ii) p∧F ≡ F .

(8) Identity Laws

(i) p∨F ≡ p;

(ii) p∧T ≡ p.

(9) Double Negation Law ∼ (∼ (p))≡ p.

De Morgan's Laws can be expressed in words as follows: “The negation of an and statement is logically equivalent to the or statement in which each component is negated, while the negation of an or statement is logically equivalent to the and statement in which each component is negated.”
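Each equivalence in Theorem 1.2 can be confirmed by comparing truth tables over all assignments of p, q and r. The sketch below checks a selection of the laws; the dictionary layout and the helper name `equivalent` are illustrative choices, not part of the notes.

```python
from itertools import product

T, F = True, False

# Each law is a pair (lhs, rhs) of functions of the statements p, q, r.
laws = {
    "distributive (i)": (lambda p, q, r: p or (q and r),
                         lambda p, q, r: (p or q) and (p or r)),
    "distributive (ii)": (lambda p, q, r: p and (q or r),
                          lambda p, q, r: (p and q) or (p and r)),
    "De Morgan (i)": (lambda p, q, r: not (p or q),
                      lambda p, q, r: (not p) and (not q)),
    "De Morgan (ii)": (lambda p, q, r: not (p and q),
                       lambda p, q, r: (not p) or (not q)),
    "negation (i)": (lambda p, q, r: p or (not p), lambda p, q, r: T),
    "negation (ii)": (lambda p, q, r: p and (not p), lambda p, q, r: F),
    "double negation": (lambda p, q, r: not (not p), lambda p, q, r: p),
}

def equivalent(lhs, rhs):
    """Two compound statements are equivalent if their truth tables agree."""
    return all(lhs(*row) == rhs(*row) for row in product([T, F], repeat=3))

results = {name: equivalent(lhs, rhs) for name, (lhs, rhs) in laws.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

The same `equivalent` helper reports a failure for a non-equivalence, for instance when an implication is compared with its converse.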

1.7 Some Math symbols and Definitions

This is a very brief list of some of the mathematical shorthand that will be used in this course and in the first year courses. Some of these symbols will be explained in more detail as we go.

    Operator           Meaning
    ∀                  For all, for every, for each
    ∃                  There exists, there is
    ∈                  In, a member of
    ∋                  Owns, contains
    ∨                  Or
    ∧                  And
    ∴                  Therefore
    ∼ or ¬             Not
    ∅                  Empty set
    ⊂                  Subset, is a subset of
    ⊃                  Contains the set
    ∪                  Union (of sets)
    ∩                  Intersection (of sets)
    ⇒                  Implies
    ⇐⇒ or iff          If and only if, each implies the other
    s.t., |, ϶ or :    Such that
    Q.E.D.             Quod erat demonstrandum (Proof complete)


Next we define some of the commonly used mathematical terms.

(a) Theorem A statement which can be demonstrated to be true by accepted mathematical operations and arguments.

In general, a theorem is an embodiment of some general principle that makes it part of a larger theory. The process of showing a theorem to be correct is called a proof.

(b) Proposition A statement which is required to be proved.

(c) Axiom A proposition regarded as self-evidently true without proof. The word “axiom” is a synonym for postulate.

(d) Corollary An immediate consequence of a result already proved. Corollaries usually state more complicated theorems in a language simpler to use and apply.

(e) Lemma A short theorem used in proving a larger theorem.

(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been neither verified nor shown to be false.

(g) Definition Tells us how or what things are.

Chapter 2

Proof Techniques

2.1 Methods of Proof

A proof is a method of establishing the truthfulness of an implication. An example would be to prove a proposition of the form, “If H1, · · · , Hn, then T.” The statements H1, · · · , Hn are referred to as the hypotheses of the proof and the proposition T is referred to as the conclusion. A formal proof would consist of a sequence of valid propositions ending with the conclusion T. By valid proposition, we mean that each proposition in the sequence must either be one of the hypotheses H1, · · · , Hn, or an axiom, a definition, a tautology or a proposition proved earlier, or it must be derived from previous propositions using either logical implication or substitution.

Before we present proof techniques, we describe some elementary definitions in number theory.

Definition 2.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if and only if n = 2k + 1 for some integer k.

Using the quotient-remainder theorem, we can show that every integer is either even or odd.

Definition 2.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if n = r · s then r = 1 or s = 1. An integer n is composite if and only if n = r · s for some positive integers r and s, with r ≠ 1 and s ≠ 1.

The first three prime numbers are 2, 3, and 5. The first six composite numbers are 4, 6, 8, 9, 10 and 12. Every integer greater than 1 is either prime or composite since the two definitions are negations of each other.
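Definition 2.2 translates almost word for word into code, which makes the listed primes and composites easy to confirm. The following is a minimal sketch using plain trial division; the function names are illustrative, and the search bounds (20 and 200) are arbitrary assumptions.

```python
def is_prime(n: int) -> bool:
    """n > 1 and no factorization n = r*s with both r != 1 and s != 1."""
    if n <= 1:
        return False
    return all(n % r != 0 for r in range(2, n))

def is_composite(n: int) -> bool:
    """n = r*s for some positive integers r != 1 and s != 1."""
    return any(n % r == 0 for r in range(2, n))

first_primes = [n for n in range(1, 20) if is_prime(n)][:3]
first_composites = [n for n in range(1, 20) if is_composite(n)][:6]

# For n > 1 exactly one of the two definitions holds.
exactly_one = all(is_prime(n) != is_composite(n) for n in range(2, 200))

print(first_primes, first_composites, exactly_one)
```

Note that 1 is neither prime nor composite under these definitions, which is why the dichotomy is stated only for integers greater than 1.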


Definition 2.3. Two integers m and n are said to be of the same parity if m and n are both even or are both odd, while m and n are said to be of opposite parity if one of m and n is even and the other is odd. Two integers are consecutive if one is one more than the other.

Integers 2 and 8 are of the same parity while 5 and 10 are of opposite parity.

Definition 2.4. Let n and d be integers with d ≠ 0. Then n is said to be divisible by d if n = d · k for some integer k. In such a case we say that n is a multiple of d, or d is a factor of n, or d is a divisor of n, or d divides n.

The notation “d|n” is read as “d divides n”.

We discuss the following techniques of writing proofs. Our emphasis here will be on showing how each of them is used through several examples.

2.2 Trivial Proofs

Let P(x) and Q(x) be statements with domain D. If Q(x) is true for every x ∈ D, then the universal statement

∀x ∈ D,P(x)→ Q(x)

is true regardless of the truth value of P(x). Such a proof is called a trivial proof.

Claim 1. For x ∈ R, if x > −3, then $x^2 + 1 > 0$.

Proof. Consider the two statements P(x) : x > −3 and Q(x) : $x^2 + 1 > 0$. Since $x^2 \ge 0$ for every x ∈ R, it follows that $x^2 + 1 \ge 0 + 1 > 0$ for every x ∈ R. Thus P(x) → Q(x) is true for every x ∈ R and hence for x > −3.

Claim 2. If n is an odd integer, then $6n^3 + 4n + 3$ is an odd integer.

Proof. Since $6n^3 + 4n + 3 = 2(3n^3 + 2n + 1) + 1$ where $3n^3 + 2n + 1$ ∈ Z (i.e. $6n^3 + 4n + 3 = 2k + 1$ where $k = 3n^3 + 2n + 1$ ∈ Z), the integer $6n^3 + 4n + 3$ is odd for every integer n.

Observe that the fact that $6n^3 + 4n + 3$ is odd does not depend on n being odd. It would have been better to replace the statement of the claim by “if n is an integer, then $6n^3 + 4n + 3$ is odd.”


2.3 Vacuous Proofs

Let P(x) and Q(x) be statements with domain D. If P(x) is false for every x ∈ D, then the universal statement

∀x ∈ D, P(x) → Q(x)

is true regardless of the truth value of Q(x). Such a proof is called a vacuous proof.

Claim 3. For x ∈ R, if $x^2 - 2x + 1 < 0$, then x > 1.

Proof. Let P(x) : $x^2 - 2x + 1 < 0$ and Q(x) : x > 1. Since $x^2 - 2x + 1 = (x - 1)^2 \ge 0$ for every x ∈ R, we have that $(x - 1)^2 < 0$ is false for every x ∈ R. Hence, P(x) is false for every x ∈ R. Thus, P(x) → Q(x) is true for every x ∈ R.

2.4 Proof by Construction

In a proof by construction we work straight from the set of assumptions.

Example 2.1. Consider a function

$f(n) = n^2 + n + 17$,    (2.1)

where n ∈ N. If we evaluate this function, it seems that we always get a prime number. For instance,

f(1) = 19,
f(2) = 23,
f(3) = 29,
f(15) = 257.

We can verify that all these numbers are prime. Then we might conjecture that

Conjecture 1. The function $f(n) = n^2 + n + 17$ generates prime numbers for all n ∈ N.

Drawing such a conclusion is an example of inductive reasoning. It is important to note that we have NOT proved the conjecture made above in the example. In fact, this conjecture is false. Take n = 17: $f(17) = 17^2 + 17 + 17 = 17 \cdot 19$, which is not a prime number.
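A short search makes the failure of Conjecture 1 concrete. The sketch below evaluates f on n = 1, ..., 30 (an arbitrary range chosen for illustration) and lists the inputs whose value is not prime, using a simple trial-division test.

```python
def f(n: int) -> int:
    return n * n + n + 17

def is_prime(k: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

# Values of n in 1..30 for which f(n) is NOT prime.
counterexamples = [n for n in range(1, 31) if not is_prime(f(n))]
smallest = counterexamples[0]

print(smallest, f(smallest), counterexamples)
```

The search finds that the pattern already breaks at n = 16, where f(16) = 289 = 17 · 17, one step before the value n = 17 used in the text. Fifteen consecutive prime values are still no substitute for a proof.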

Example 2.2. Let N_E be the set of even natural numbers and N_O be the set of odd natural numbers.

We want to show that (i) the sum of two even numbers is even,

∀x, y ∈ N_E, x + y ∈ N_E,

and (ii) the sum of an odd number and an even number is odd,

∀x ∈ N_E, ∀y ∈ N_O, x + y ∈ N_O.

Proof. (By construction)

(i) Let

x, y ∈ N_E ⇔ ∃m, n ∈ N ϶ x = 2m ∧ y = 2n,
x + y = 2m + 2n = 2(m + n) ∈ N_E, since m + n ∈ N.

(ii) Let

x ∈ N_E ⇔ ∃m ∈ N ϶ x = 2m,
y ∈ N_O ⇔ ∃n ∈ N ϶ y = 2n + 1,
x + y = 2m + 2n + 1 = 2(m + n) + 1, where m + n ∈ N
⇒ x + y ∈ N_O.

Example 2.3. Consider the function g(n, m):

$g(n, m) = n^2 + n + m$, where m, n ∈ N.

$g(1, 2) = 1^2 + 1 + 2 = 2^2$,
$g(2, 3) = 2^2 + 2 + 3 = 3^2$,
$g(12, 13) = 12^2 + 12 + 13 = 13^2$.

On the basis of the above, we can form a conjecture:

Conjecture 2. ∀n ∈ N, $g(n, n + 1) = (n + 1)^2$.    (2.2)

It turns out that this conjecture is true.


Proof. By construction.

$g(n, n + 1) = n^2 + n + (n + 1) = n^2 + 2n + 1 = (n + 1)^2$.

Having proved the general statement, we know that

$g(15, 16) = 16^2$.

This is an example of deductive reasoning.

Example 2.4. Show that if x is odd then $x^2$ is odd.

Proof. By construction. Let x > 1.

x ∈ N_O ⇔ ∃n ∈ N ϶ x = 2n + 1,
$x^2 = (2n + 1)^2 = 4n^2 + 4n + 1 = 2(2n^2 + 2n) + 1$
⇒ $x^2$ ∈ N_O.

For x = 1, $x^2 = 1$, which is odd.

Example 2.5. If the sum of two integers is even, then so is their difference.

Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some integer k. So, m = 2k − n and m − n = 2k − n − n = 2(k − n) = 2l, where l = k − n is an integer. Thus m − n is even.

2.5 Proof by Contraposition

Note that A ⇒ B is not logically equivalent to its converse statement B ⇒ A. It is possible for an implication to be false while its converse is true. Hence we cannot prove A ⇒ B by showing B ⇒ A.

Example 2.6. The implication

$m^2 > 0$ ⇒ m > 0

is false but its converse

m > 0 ⇒ $m^2 > 0$

is true.

To show that A ⇒ B, we can instead show that ∼B ⇒ ∼A. We have already shown before that an implication and its contrapositive are logically equivalent.

Example 2.7. Consider a theorem.

“If 7m is an odd number then m is an odd number.”

Its contrapositive is “If m is not an odd number, then 7m is not an odd number”, or, equivalently, “If m is an even number, then 7m is an even number.”

We are talking about integers here. Using the contrapositive, we can construct a proof of the theorem as follows:

Proof.

m ∈ N_E ⇔ ∃k ∈ N ϶ m = 2k,
7m = 7(2k) = 2(7k), 7k ∈ N
⇒ 7m ∈ N_E.

This is much easier than trying to show directly that 7m being odd implies that m is odd.

Example 2.8. Show that if $x^2$ is even, then x is even.

$x^2$ ∈ N_E ⇒ x ∈ N_E    (2.3)

Its contrapositive is

x ∈ N_O ⇒ $x^2$ ∈ N_O    (2.4)

This we have already shown in an example above.

2.6 Proof by Contradiction

To prove that statement C is true, try supposing ∼C is true and then show that this leads to a contradiction. To show that A ⇒ B we can use

∼(A ⇒ B) ⇔ A ∧ ∼B.    (2.5)

So assume that A and ∼B are both true and show that this leads to a contradiction. Then A ∧ ∼B is false, and so A ⇒ B is true.


Example 2.9. In the last example,

$x^2$ ∈ N_E ⇒ x ∈ N_E.

We can prove the statement by contradiction as follows.

Proof. Assume $x^2$ is even and x is odd.

$x^2$ ∈ N_E ⇔ ∃m ∈ N ϶ $x^2 = 2m$.

x ∈ N_O ⇔ ∃n ∈ N ϶ x = 2n + 1 ⇒ $x^2 = 4n^2 + 4n + 1$, which is odd.

This contradicts the initial assumption that $x^2$ is even.

Example 2.10. There is no greatest integer.

Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N ≥ n for every integer n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption that there is a greatest integer is false. Thus there is no greatest integer.

For the next example, we first define rational numbers.

Definition 2.5. A real number r is a rational number if $r = \frac{m}{n}$ for some integers m and n with n ≠ 0. A real number that is not a rational number is called an irrational number.

Example 2.11. There is no smallest positive rational number.

Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x ≤ y for every positive rational number y. Consider the number $\frac{x}{2}$. Since x is a positive rational number, so too is $\frac{x}{2}$. Multiplying both sides of the inequality $\frac{1}{2} < 1$ by x, which is positive, gives $\frac{x}{2} < x$. Hence, $\frac{x}{2}$ is a positive rational number that is less than x, which is a contradiction. Hence our assumption that there is a least positive rational number is false. Thus there is no least positive rational number.

Example 2.12. The sum of a rational number and an irrational number is irrational.

Proof. Assume, to the contrary, that there exist a rational number p and an irrational number q whose sum is a rational number. Thus, by definition of rational numbers, $p = \frac{a}{b}$ and $p + q = r = \frac{c}{d}$ for some integers a, b, c and d with b ≠ 0 and d ≠ 0. Hence,

\[ q = r - p = \frac{c}{d} - \frac{a}{b} = \frac{bc - ad}{bd}. \]

Now, bc − ad ∈ Z and bd ∈ Z since a, b, c and d ∈ Z. Since b ≠ 0 and d ≠ 0, bd ≠ 0. Hence, q ∈ Q, which is a contradiction since q is irrational. Hence our assumption that there exists a rational number and an irrational number whose sum is a rational number is false. Thus, the sum of a rational number and an irrational number is irrational.

We end this section with a proof of the classical result that $\sqrt{2}$ is irrational.

Example 2.13. The real number $\sqrt{2}$ is irrational.

Proof. Assume, to the contrary, that $\sqrt{2}$ is rational. Then,

\[ \sqrt{2} = \frac{m}{n}, \]

where m, n ∈ Z and n ≠ 0. By dividing m and n by any common factors, if necessary, we may further assume that m and n have no common factors, i.e., $\frac{m}{n}$ has been expressed in (or reduced to) lowest terms. Then, $2 = \frac{m^2}{n^2}$, and so $m^2 = 2n^2$. Thus, $m^2$ is even. Hence, m is even, and so m = 2k, where k ∈ Z. Substituting this into our earlier equation $m^2 = 2n^2$, we have $(2k)^2 = 2n^2$, and so $4k^2 = 2n^2$. Therefore, $n^2 = 2k^2$. Thus, $n^2$ is even, and so n is even. Therefore each of m and n has 2 as a factor, which contradicts our assumption that $\frac{m}{n}$ has been reduced to lowest terms and therefore that m and n have no common factors. We deduce, therefore, that our assumption that $\sqrt{2}$ is rational is incorrect. Hence, $\sqrt{2}$ is irrational.

Remark 2.1. One should be very careful when writing a proof by contradiction. Here is a very strong word of caution which can be found in Royden¹.

“All students are enjoined in the strongest possible terms to eschew proofs by contradiction! There are two reasons for the prohibition: First such proofs are very often fallacious, the contradiction on the final page arising from an erroneous deduction on an earlier page, rather than from the incompatibility of p with ¬q. Second, even when correct, such a proof gives little insight into the connection between p and q whereas both the direct proof and the proof by contraposition construct a chain of argument connecting p and q. One reason why mistakes are so much more likely in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is not always false) all deduction from the hypothesis are true in those cases where hypothesis holds. One is dealing with true statements, and one’s intuition and knowledge about what is true help to keep one from making erroneous statements. In proofs by contradiction, however, you are (assuming the theorem is true) in the unreal world where any statement can be derived, and so the falsity of a statement is no indication of an erroneous deduction.”

¹ H. L. Royden, “Real Analysis”, Third Edition, Prentice Hall, page 3.


2.7 Proof by Induction

A proof by induction involves three steps.

(a) Base of induction: Check whether the statement is true for n = 1.

(b) Inductive transition: Assume that the statement is true for some n and show that it is also true for n + 1.

(c) Inductive conclusion: The statement is true for all n ⩾ 1.

Example 2.14. Show that if $f(x) = x^n$, then $f'(x) = nx^{n-1}$ for n ∈ N.

Proof. By induction.

(a) Base of induction:

$f(x) = x$, $f'(x) = 1 = x^0 = 1 \cdot x^{1-1}$.    (2.6)

(b) Inductive transition:

Assume that for $f(x) = x^n$, $f'(x) = nx^{n-1}$.    (2.7)

Then for $f(x) = x^{n+1} = x^n \cdot x$, the product rule gives

$f'(x) = nx^{n-1} \cdot x + x^n \cdot 1 = nx^n + x^n = (n+1)x^n$.    (2.8)

(c) Inductive conclusion:

∀n ∈ N, if $f(x) = x^n$ then $f'(x) = n \cdot x^{n-1}$.

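Example 2.15. Prove by induction that $7^n - 4^n$ is a multiple of 3, for n ∈ N.

Proof. (a) Base of induction: For n = 1,

$7^1 - 4^1 = 7 - 4 = 3$.    (2.9)

The statement is true.

(b) Inductive transition:

Assume that $7^n - 4^n = 3m$ where m ∈ N. Then

\[
\begin{aligned}
7^{n+1} - 4^{n+1} &= 7 \cdot 7^n - 4 \cdot 4^n \\
&= 7 \cdot 7^n - 7 \cdot 4^n + 7 \cdot 4^n - 4 \cdot 4^n \\
&= 7 \cdot (7^n - 4^n) + (7 - 4) \cdot 4^n \\
&= 7 \cdot (3m) + 3 \cdot 4^n \\
&= 3 \cdot (7m + 4^n).
\end{aligned}
\]

Since m and $4^n$ are natural numbers, so is $7m + 4^n$. So $7^{n+1} - 4^{n+1}$ is a multiple of 3.

(c) Inductive conclusion:

$7^n - 4^n$ is a multiple of 3, for all n ∈ N.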
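The inductive step of Example 2.15 is constructive: from a witness m for n it produces the witness $7m + 4^n$ for n + 1. The sketch below follows that recipe literally and compares the result with a direct computation. The function name `witness` and the tested range are illustrative assumptions.

```python
def witness(n: int) -> int:
    """Return m with 7**n - 4**n == 3*m, built by repeating the inductive step."""
    m = 1                      # base case: 7 - 4 = 3 * 1
    for k in range(1, n):
        m = 7 * m + 4 ** k     # inductive step: m_(k+1) = 7 * m_k + 4^k
    return m

# The witness produced by the induction matches the direct computation.
checks = [7 ** n - 4 ** n == 3 * witness(n) for n in range(1, 21)]

print(all(checks), witness(2), witness(3))
```

For instance, the recipe gives m = 11 for n = 2, and indeed 49 − 16 = 33 = 3 · 11.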

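Example 2.16. Prove the Binomial Theorem, $(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$, by induction.

Proof. (a) Base of induction:

For n = 1, the claim is trivially true.

(b) Inductive transition:

Assume that the Binomial Theorem holds true for n. Then

\[
\begin{aligned}
(a+b)^{n+1} &= (a+b)(a+b)^n = (a+b) \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{n-k+1} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1} \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{n-k+1} b^k + \sum_{l=1}^{n+1} \binom{n}{l-1} a^{n-l+1} b^l \quad \text{(by the change of variable } l = k+1\text{)} \\
&= \binom{n}{0} a^{n+1} + \sum_{l=1}^{n} \left[ \binom{n}{l} + \binom{n}{l-1} \right] a^{n-l+1} b^l + \binom{n}{n} b^{n+1} \\
&= \binom{n+1}{0} a^{n+1} + \sum_{l=1}^{n} \binom{n+1}{l} a^{n-l+1} b^l + \binom{n+1}{n+1} b^{n+1} \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} a^{(n+1)-k} b^k.
\end{aligned}
\]

In the fifth line we have used the fact that

\[ \binom{n}{l} + \binom{n}{l-1} = \binom{n+1}{l}. \]

It is a good exercise to verify this.

(c) Inductive conclusion:

The Binomial Theorem holds for all n ∈ N.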
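Both the identity used in the fifth line and the theorem itself are easy to spot-check with exact integer arithmetic. The sketch below uses `math.comb` from the Python standard library (available from Python 3.8); the ranges of a, b, n and l are arbitrary sample choices, so this is a numerical check, not a replacement for the induction.

```python
from math import comb

def binomial_expansion(a: int, b: int, n: int) -> int:
    """Right-hand side of the Binomial Theorem for integers a and b."""
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

# Identity used in the fifth line: C(n, l) + C(n, l-1) = C(n+1, l).
pascal_ok = all(
    comb(n, l) + comb(n, l - 1) == comb(n + 1, l)
    for n in range(1, 15)
    for l in range(1, n + 1)
)

# The theorem itself on a grid of integer samples, including negatives.
theorem_ok = all(
    (a + b) ** n == binomial_expansion(a, b, n)
    for a in range(-3, 4)
    for b in range(-3, 4)
    for n in range(1, 8)
)

print(pascal_ok, theorem_ok)
```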

Observe that in the inductive hypothesis of our proof above, we assume that P(k) is true for an arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive integers k, for this is precisely what we wish to prove! It is important to understand that our aim is to establish the truth of the implication “If P(k) is true, then P(k+1) is true”, which together with the truth of the statement P(1) allows us to conclude that an infinite number of statements (namely, P(1), P(2), P(3), · · · ) are true.

Example 2.17. For every positive integer n,

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]

Proof. For every integer n ≥ 1, let P(n) be the statement

\[ P(n) : 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]

(a) Base of induction:

When n = 1, the statement $P(1) : 1^2 = \frac{1(1+1)(2 \cdot 1+1)}{6}$ is certainly true since $\frac{1(1+1)(2 \cdot 1+1)}{6} = \frac{6}{6} = 1$. This establishes the base case when n = 1.

(b) For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that $1^2 + \cdots + k^2 = \frac{k(k+1)(2k+1)}{6}$. For the inductive step, we need to show that P(k+1) is true. That is, we show that

\[ 1^2 + 2^2 + \cdots + k^2 + (k+1)^2 = \frac{(k+1)(k+2)(2k+3)}{6}. \]

Evaluating the left-hand side of this equation, we have

\[
\begin{aligned}
1^2 + 2^2 + \cdots + k^2 + (k+1)^2 &= (1^2 + 2^2 + \cdots + k^2) + (k+1)^2 \\
&= \frac{k(k+1)(2k+1)}{6} + (k+1)^2 \quad \text{(by the inductive hypothesis)} \\
&= \frac{k(k+1)(2k+1)}{6} + \frac{6(k+1)^2}{6} \\
&= \frac{(k+1)(2k^2 + k + 6k + 6)}{6} \\
&= \frac{(k+1)(2k^2 + 7k + 6)}{6} = \frac{(k+1)(2k^2 + 4k + 3k + 6)}{6} \\
&= \frac{(k+1)(k+2)(2k+3)}{6},
\end{aligned}
\]

thus verifying that P(k+1) is true.

(c) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \]

is true for every positive integer n.
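The structure of the induction in Example 2.17 (a base case plus a step from k to k + 1) can be mirrored numerically. The sketch below is a finite check over an arbitrarily chosen range, with illustrative function names; it supports the algebra above but proves nothing beyond the range tested.

```python
def sum_of_squares(n: int) -> int:
    """Left-hand side: 1^2 + 2^2 + ... + n^2, added term by term."""
    return sum(i * i for i in range(1, n + 1))

def closed_form(n: int) -> int:
    """Right-hand side: n(n+1)(2n+1)/6, computed with exact integer division."""
    return n * (n + 1) * (2 * n + 1) // 6

# Base of induction: P(1).
base_case = sum_of_squares(1) == closed_form(1)

# Inductive step, checked numerically: the formula for k plus the new
# term (k+1)^2 equals the formula for k+1.
step_ok = all(
    closed_form(k) + (k + 1) ** 2 == closed_form(k + 1) for k in range(1, 200)
)

# Direct comparison of both sides.
formula_ok = all(sum_of_squares(n) == closed_form(n) for n in range(1, 200))

print(base_case, step_ok, formula_ok)
```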

Recall that in a geometric sequence, each term is obtained from the preceding one by multiplying by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is $1, r, r^2, r^3, \cdots, r^n, \cdots$. The sum of the first n terms of this sequence is given by a simple formula which can be verified using mathematical induction. This is left as an exercise.

Induction can also be used to solve problems involving divisibility, as the next example illustrates.

Example 2.18. For all integers n ≥ 1, $2^{2n} - 1$ is divisible by 3.

Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case $2^{2n} - 1 = 2^2 - 1 = 3$ and 3 is divisible by 3. Hence, the base case when n = 1 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that the property holds for n = k, i.e., suppose that $2^{2k} - 1$ is divisible by 3. For the inductive step, we must show that the property holds for n = k + 1. That is, we must show that $2^{2(k+1)} - 1$ is divisible by 3. Since $2^{2k} - 1$ is divisible by 3, there exists, by definition of divisibility, an integer m such that $2^{2k} - 1 = 3m$, and so $2^{2k} = 3m + 1$. Now,

\[
\begin{aligned}
2^{2(k+1)} - 1 &= 2^{2k} \cdot 2^2 - 1 \\
&= 4 \cdot 2^{2k} - 1 \\
&= 4(3m + 1) - 1 \\
&= 12m + 3 \\
&= 3(4m + 1).
\end{aligned}
\]

Since m ∈ Z, we know that 4m + 1 ∈ Z. Hence, $2^{2(k+1)} - 1$ is an integer multiple of 3; that is, $2^{2(k+1)} - 1$ is divisible by 3, as desired. Hence, by the principle of mathematical induction, the property holds for all integers n ≥ 1.

Induction can also be used to verify certain inequalities, as the next example illustrates.

Example 2.19. For all integers n ≥ 2,

\[ \sqrt{n} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}. \]

Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must show that

\[ \sqrt{2} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}}. \]

But this inequality is true if and only if $2 < \sqrt{2} + 1$, which is true if and only if $1 < \sqrt{2}$. Since $1 < \sqrt{2}$ is true, so too is $\sqrt{2} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}}$. Hence the inequality holds for n = 2. This establishes the base case. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2 and assume that the inequality holds for n = k, i.e., suppose that

\[ \sqrt{k} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{k}}. \]

For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show that

\[ \sqrt{k+1} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{k}} + \frac{1}{\sqrt{k+1}}. \]

Since k ≥ 2, $\sqrt{k} < \sqrt{k+1}$, and so (multiplying both sides by $\sqrt{k}$) $k < \sqrt{k}\sqrt{k+1}$. Hence (adding 1 to both sides), $k + 1 < \sqrt{k}\sqrt{k+1} + 1$; and so (dividing both sides by $\sqrt{k+1}$) we have $\sqrt{k+1} < \sqrt{k} + \frac{1}{\sqrt{k+1}}$. Hence, by the inductive hypothesis,

\[ \sqrt{k+1} < \frac{1}{\sqrt{1}} + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{k}} + \frac{1}{\sqrt{k+1}}, \]

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers n ≥ 2.
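The inequality of Example 2.19 and the key estimate of its inductive step can be spot-checked in floating point. The sketch below assumes a test range of n up to 499, chosen arbitrarily; the gaps involved are far larger than rounding error on this range, but floating-point evidence remains a sanity check rather than a proof.

```python
from math import sqrt

def partial_sum(n: int) -> float:
    """1/sqrt(1) + 1/sqrt(2) + ... + 1/sqrt(n), in floating point."""
    return sum(1 / sqrt(i) for i in range(1, n + 1))

# The claimed inequality on a finite range of n >= 2.
inequality_ok = all(sqrt(n) < partial_sum(n) for n in range(2, 500))

# Key estimate of the inductive step: sqrt(k+1) < sqrt(k) + 1/sqrt(k+1).
step_ok = all(sqrt(k + 1) < sqrt(k) + 1 / sqrt(k + 1) for k in range(2, 500))

# The statement starts at n = 2 because n = 1 gives equality, not "<".
equality_at_one = sqrt(1) == partial_sum(1)

print(inequality_ok, step_ok, equality_at_one)
```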

2.8 Additional Notes on Proofs

To prove a universal statement

∀x ∈ D, p(x) (2.10)

we let x represent an arbitrary element of the set D and then show that statement p(x) is true. The only properties we can use about x are those that apply to all elements of D. For example, if the set D consists of the natural numbers, then we cannot assume x to be odd as not all natural numbers are odd. To prove an existential statement,

∃x ∈ D, p(x) (2.11)

all we need to do is to show that there exists at least one member of D for which p(x) is true. We show these techniques through the following examples.

Example 2.20. For every ε > 0, there exists a δ > 0 such that

1−δ < x < 1+δ ⇒ 5− ε < 2x+3 < 5+ ε (2.12)

In this example we are asked to prove that the statement is true for each positive number ε. We begin with an arbitrary ε and use it to find a δ which is positive and has the property that the implication holds true. We give a particular value of δ, which may depend on ε, and show that the statement is true.

Proof. Let ε > 0 be arbitrary and let $\delta = \frac{\varepsilon}{2}$. Note that δ > 0. Suppose $1 - \delta < x < 1 + \delta$. Then

\[ \begin{aligned} 1 - \tfrac{\varepsilon}{2} &< x < 1 + \tfrac{\varepsilon}{2}, \\ 2 - \varepsilon &< 2x < 2 + \varepsilon, \\ 5 - \varepsilon &< 2x + 3 < 5 + \varepsilon. \end{aligned} \]


In some cases, it is possible to prove an existential statement in an indirect way without actually producing any specific element of the set. One indirect method is to use the contrapositive and another is to use a proof by contradiction. Consider the following example.

Example 2.21. Let $f$ be a continuous function. If

\[ \int_0^1 f(x)\,dx \neq 0, \tag{2.13} \]

then there exists a point $x \in [0,1]$ such that $f(x) \neq 0$.

Proof. The contrapositive implication can be written as

\[ \text{If } \forall x \in [0,1],\ f(x) = 0, \text{ then } \int_0^1 f(x)\,dx = 0. \tag{2.14} \]

This is a lot easier to prove. Instead of having to conclude the existence of an $x$ having a particular property, we are given that all $x$ have a different property. The proof follows directly from the definition of the integral, since each of the terms in any Riemann sum will be zero.

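Example 2.22. Let $x$ be a real number. If $x > 0$ then $\frac{1}{x} > 0$.

Proof. Note that $p \Rightarrow q$ is equivalent to $(p \wedge \sim q) \Rightarrow$ contradiction. We begin by assuming $x > 0$ and

\[ \frac{1}{x} \leq 0. \tag{2.15} \]

Since $x > 0$, we can multiply both sides by $x$:

\[ (x)\left(\frac{1}{x}\right) \leq (x) \cdot 0 \quad \text{or} \quad 1 \leq 0. \tag{2.16} \]

This is a contradiction.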

Consider the proof of the following existential statement.

Claim 4. There exist irrational numbers $a$ and $b$ such that $a^b$ is rational.

Proof. Consider the real number $\sqrt{2}^{\sqrt{2}}$. This number is either rational or irrational. We consider each case in turn.

(1) $\sqrt{2}^{\sqrt{2}}$ is rational. Let $a = \sqrt{2}$ and $b = \sqrt{2}$. Thus $a$ and $b$ are irrational, and by assumption, $a^b$ is rational.

(2) $\sqrt{2}^{\sqrt{2}}$ is irrational. Let $a = \sqrt{2}^{\sqrt{2}}$ and $b = \sqrt{2}$. Thus $a$ and $b$ are irrational. Moreover,

\[ a^b = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = (\sqrt{2})^{\sqrt{2} \cdot \sqrt{2}} = (\sqrt{2})^2 = 2 \]

is rational.

In both cases, we proved the existence of irrational numbers $a$ and $b$ such that $a^b$ is rational, and so we have the desired result.

We remark that, as it stands, this proof does not enable us to pinpoint which of the two choices of the pair $(a, b)$ has the required property. In order to determine the correct choice of $(a, b)$, we would need to decide whether $\sqrt{2}^{\sqrt{2}}$ is rational or irrational. It is not a constructive proof. The following would be a constructive proof of this claim. Let $a = \sqrt{2}$ and $b = \log_2 9$. Then $b$ is an irrational number, for if it were rational, then $\log_2 9 = \frac{m}{n}$ where $m$ and $n$ are positive integers with no common factor. This implies $2^m = 9^n$, which is a contradiction as $2^m$ is an even number and $9^n$ is an odd number. This gives $a^b = (\sqrt{2})^{\log_2 9} = 2^{\frac{1}{2}\log_2 9} = 3$, which is rational.²
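As a numerical sanity check of the constructive example, the following sketch evaluates $(\sqrt{2})^{\log_2 9}$ in floating point and confirms, on a small search range chosen purely for illustration, that $2^m = 9^n$ has no solution in positive integers. It illustrates the argument and does not replace it.

```python
import math

a = math.sqrt(2)   # the irrational base a = sqrt(2)
b = math.log2(9)   # the exponent b = log_2(9)
value = a ** b     # floating-point approximation of a^b, expected to be close to 3

# 2^m is even and 9^n is odd, so 2^m == 9^n cannot hold for m, n >= 1.
# Here we only confirm this on a small, arbitrarily chosen range.
no_solution = all(2 ** m != 9 ** n for m in range(1, 60) for n in range(1, 20))

print(value, no_solution)
```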

2.9 Decomposition or proof by cases

Let P(x) be a statement. If x possesses certain properties, and if we can verify that P(x) is true regardless of which of these properties x has, then P(x) is true. Such a proof is called a proof by cases.

Some proofs naturally divide themselves into consideration of two or more cases. For example, positive integers are either even or odd. Real numbers are positive, negative or zero. It may be that different arguments are required for each case.

More rigorously, suppose we want to prove that $p \Rightarrow q$, and that $p$ can be decomposed into two disjoint propositions $p_1$, $p_2$ such that $p_1 \wedge p_2$ is a contradiction. Then $p \equiv (p_1 \vee p_2) \wedge \neg(p_1 \wedge p_2) \equiv (p_1 \vee p_2)$.

With this choice of $p_1$ and $p_2$, we have

\[ \begin{aligned} (p \Rightarrow q) &\Leftrightarrow (\neg p \vee q) \\ &\Leftrightarrow [\neg(p_1 \vee p_2) \vee q] \\ &\Leftrightarrow [(\neg p_1 \wedge \neg p_2) \vee q] \\ &\Leftrightarrow [(\neg p_1 \vee q) \wedge (\neg p_2 \vee q)] \\ &\Leftrightarrow [(p_1 \Rightarrow q) \wedge (p_2 \Rightarrow q)]. \end{aligned} \]

² There is an extensive literature on constructive mathematics. You may like to do a Google search for easy-to-read articles on the subject. A classic reference is Bridges and Ray (1984).


This means that we only need to show that $p_1 \Rightarrow q$ and $p_2 \Rightarrow q$. Note that this method also works if we can decompose $p$ into more than two propositions, as long as these propositions are mutually exclusive (i.e., every pair of them is a contradiction).

Before going over examples that illustrate this technique, we state the following theorem.

Theorem 2.1. (Quotient-Remainder Theorem) For every given integer $n$ and positive integer $d$, there exist unique integers $q$ and $r$ such that

\[ n = d \cdot q + r \quad \text{and} \quad 0 \leq r < d. \]

Definition 2.6. Let $n$ be a nonnegative integer and let $d$ be a positive integer. By the Quotient-Remainder Theorem, there exist unique integers $q$ and $r$ such that $n = d \cdot q + r$, where $0 \leq r < d$. We define

$n \text{ div } d = q$ (read as “n divided by d”), and
$n \bmod d = r$ (read as “n modulo d”).

Thus $n \text{ div } d$ and $n \bmod d$ are the integer quotient and integer remainder, respectively, obtained when $n$ is divided by $d$.

Observe that given a nonnegative integer $n$ and a positive integer $d$, we have that $n \bmod d \in \{0, \ldots, d-1\}$ (since $0 \leq r \leq d - 1$) and that $n \bmod d = 0$ if and only if $n$ is divisible by $d$.
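The quotient and remainder of Definition 2.6 can be computed directly. The sketch below uses Python's built-in `divmod`, whose floor-division convention yields a remainder in $\{0, \ldots, d-1\}$ whenever $d > 0$, also for negative $n$, in agreement with Theorem 2.1. The wrapper name `div_mod` is our own.

```python
from typing import Tuple

def div_mod(n: int, d: int) -> Tuple[int, int]:
    """Return (q, r) with n = d*q + r and 0 <= r < d, for an integer n and d > 0."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    q, r = divmod(n, d)  # floor division: r has the sign of d, so 0 <= r < d
    return q, r

q, r = div_mod(17, 5)
print(q, r)  # 17 = 5*3 + 2
```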

Result 1. Every integer is either even or odd.

Proof. Let $n$ be an integer. By the Quotient-Remainder Theorem with $d = 2$, there exist unique integers $q$ and $r$ such that $n = 2 \cdot q + r$ and $0 \leq r < 2$. Hence, $r = 0$ or $r = 1$. Therefore, $n = 2q$ or $n = 2q + 1$ for some integer $q$, depending on whether $r = 0$ or $r = 1$, respectively. In the case that $n = 2q$, the integer $n$ is even. In the other case that $n = 2q + 1$, the integer $n$ is odd. Hence, $n$ is either even or odd.

Let Z denote the set of integers.

Example 2.23. If $n \in \mathbb{Z}$, then $n^2 + 5n + 3$ is an odd integer.

Proof. We use a proof by cases, depending on whether $n$ is even or odd.

(1) $n$ is even.

Then, $n = 2k$ for some integer $k$. Thus, $n^2 + 5n + 3 = (2k)^2 + 5(2k) + 3 = 4k^2 + 10k + 3 = 2(2k^2 + 5k + 1) + 1 = 2m + 1$, where $m = 2k^2 + 5k + 1$. Since $k \in \mathbb{Z}$, we must have $m \in \mathbb{Z}$. Hence, $n^2 + 5n + 3 = 2m + 1$ for some integer $m$, and so the integer $n^2 + 5n + 3$ is odd.

(2) $n$ is odd.

Then, $n = 2k + 1$ for some integer $k$. Thus, $n^2 + 5n + 3 = (2k+1)^2 + 5(2k+1) + 3 = 4k^2 + 14k + 9 = 2(2k^2 + 7k + 4) + 1 = 2m + 1$, where $m = 2k^2 + 7k + 4$. Since $k \in \mathbb{Z}$, we must have $m \in \mathbb{Z}$. Hence, $n^2 + 5n + 3 = 2m + 1$ for some integer $m$, and so the integer $n^2 + 5n + 3$ is odd.

Example 2.24. Let $m, n \in \mathbb{Z}$. If $m$ and $n$ are of the same parity (either both even or both odd), then $m + n$ is even.

Proof. We use a proof by cases, depending on whether $m$ and $n$ are both even or both odd.

(1) $m$ and $n$ are both even.

Then, $m = 2k$ and $n = 2l$ for some integers $k$ and $l$. Thus, $m + n = 2k + 2l = 2(k + l)$. Since $k + l \in \mathbb{Z}$, the integer $m + n$ is even.

(2) $m$ and $n$ are both odd.

Then, $m = 2k + 1$ and $n = 2l + 1$ for some integers $k$ and $l$. Thus, $m + n = (2k + 1) + (2l + 1) = 2(k + l + 1)$. Since $k + l + 1 \in \mathbb{Z}$, the integer $m + n$ is even.

Example 2.25. Let $n \in \mathbb{Z}$. If $n^2$ is a multiple of 3, then $n$ is a multiple of 3.

Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof by cases. Suppose that $n$ is not a multiple of 3. We wish to show then that $n^2$ is not a multiple of 3. By the Quotient-Remainder Theorem with $d = 3$, there exist unique integers $q$ and $r$ such that $n = 3 \cdot q + r$ and $0 \leq r < 3$. Hence, $r \in \{0, 1, 2\}$. Therefore, $n = 3q$ or $n = 3q + 1$ or $n = 3q + 2$ for some integer $q$, depending on whether $r = 0$, 1 or 2, respectively. Since $n$ is not a multiple of 3, either $n = 3q + 1$ or $n = 3q + 2$ for some integer $q$. We consider each case in turn.

(1) $n = 3q + 1$ for some integer $q$.

Then, $n^2 = (3q+1)^2 = 9q^2 + 6q + 1 = 3(3q^2 + 2q) + 1$, and so $n^2$ is not a multiple of 3.

(2) $n = 3q + 2$ for some integer $q$.

Then, $n^2 = (3q+2)^2 = 9q^2 + 12q + 4 = 3(3q^2 + 4q + 1) + 1$, and so $n^2$ is not a multiple of 3.


Example 2.26. Let $n \in \mathbb{Z}$. If $n$ is an odd integer, then $n^2 = 8m + 1$ for some integer $m$.

Proof. We shall use both a direct proof and a proof by cases. Assume that $n$ is an odd integer. By the Quotient-Remainder Theorem with $d = 4$, there exist unique integers $q$ and $r$ such that $n = 4 \cdot q + r$ and $0 \leq r < 4$. Hence, $r \in \{0, 1, 2, 3\}$. Therefore, $n = 4q$ or $n = 4q + 1$ or $n = 4q + 2$ or $n = 4q + 3$ for some integer $q$, depending on whether $r = 0$, 1, 2 or 3, respectively. Since $n$ is odd, and since $4q$ and $4q + 2$ are both even, either $n = 4q + 1$ or $n = 4q + 3$ for some integer $q$. We consider each case in turn.

(1) $n = 4q + 1$ for some integer $q$.

Then, $n^2 = (4q+1)^2 = 16q^2 + 8q + 1 = 8(2q^2 + q) + 1 = 8m + 1$, where $m = 2q^2 + q$. Since $q \in \mathbb{Z}$, we must have $m \in \mathbb{Z}$. Hence, $n^2 = 8m + 1$ for some integer $m$.

(2) $n = 4q + 3$ for some integer $q$.

Then, $n^2 = (4q+3)^2 = 16q^2 + 24q + 9 = (16q^2 + 24q + 8) + 1 = 8(2q^2 + 3q + 1) + 1 = 8m + 1$, where $m = 2q^2 + 3q + 1$. Since $q \in \mathbb{Z}$, we must have $m \in \mathbb{Z}$. Hence, $n^2 = 8m + 1$ for some integer $m$.
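The claims of Examples 2.23, 2.25 and 2.26 can be spot-checked on a finite sample of integers. This is a sketch for building intuition only; the function names and the sample range are our own, and a finite check is of course not a proof.

```python
def is_odd(n: int) -> bool:
    # In Python, n % 2 lies in {0, 1} even for negative n.
    return n % 2 == 1

def example_2_23(n: int) -> bool:
    """n^2 + 5n + 3 is odd."""
    return is_odd(n * n + 5 * n + 3)

def example_2_25(n: int) -> bool:
    """If n^2 is a multiple of 3, then n is a multiple of 3."""
    return (n * n) % 3 != 0 or n % 3 == 0

def example_2_26(n: int) -> bool:
    """If n is odd, then n^2 mod 8 = 1."""
    return (not is_odd(n)) or (n * n) % 8 == 1

sample = range(-100, 101)
print(all(example_2_23(n) and example_2_25(n) and example_2_26(n) for n in sample))
```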

We remark that the conclusion of Example 2.26 can be restated as follows: For every odd integer $n$, we have $n^2 \bmod 8 = 1$. Here are some additional illustrative examples.

Example 2.27. If $x$ is a real number, then

\[ x \leq |x|. \]

Recall the definition of absolute value:

\[ |x| = \begin{cases} x & \text{if } x \geq 0, \\ -x & \text{if } x < 0. \end{cases} \tag{2.17} \]

Since this definition is divided into two parts, it makes sense to divide the proof also into two parts.

Proof. Let $x$ be an arbitrary real number. Then either $x \geq 0$ or $x < 0$. If $x \geq 0$, then by definition $|x| = x$. If $x < 0$, then $-x > 0$, so that

\[ x < 0 < -x = |x|. \]

In either case, $x \leq |x|$.

Chapter 3

Problem Set 1

(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions or sets, depending on the context.

(a) ∼(A∧B) ⇔ ∼A ∨ ∼B.

(b) ∼(A∨B) ⇔ ∼A ∧ ∼B.

(c) ∼(A ⇒ B) ⇔ A ∧ ∼B.

(d) ((A∨B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C)).

(e) If n and n+1 are consecutive integers, then both cannot be even.

(f) Give a counterexample to the proposed statement: If $n \in \mathbb{N}$ then $n^2 > n$.

(g) If $x$ is odd then $x^2$ is odd.

(2) Write the negation of the following statements.

(a) If S is closed and bounded, then S is compact.

(b) If S is compact, then S is closed and bounded.

(c) A continuous function is differentiable.

(3) Find the contrapositive of

(a) If $x^2 = 3 \wedge y^2 \geq 5$ then $xy$ is a rational number.


(b) If $x \neq 0$ then $\exists y$ such that $xy = 1$.

(4) Find the mistake in the “proof” of the following results, and provide correct proofs.

(a) If $m$ is an even integer and $n$ is an odd integer, then $2m + 3n$ is an odd integer.

Proof. Since $m$ is an even integer and $n$ is an odd integer, $m = 2k$ and $n = 2k + 1$ for some integer $k$. Therefore, $2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1$, where $l = 5k + 1$. Since $k \in \mathbb{Z}$, $l \in \mathbb{Z}$. Hence, $2m + 3n = 2l + 1$ for some integer $l$, whence $2m + 3n$ is an odd integer.

(b) For all integers $n \geq 1$, $n^2 + 2n + 1$ is composite.

Proof. Let $n = 4$. Then, $n^2 + 2n + 1 = 4^2 + 2(4) + 1 = 25$ and 25 is composite.

(5) Prove the following claims:

(a) An integer that is not divisible by 2 cannot be divisible by 4. (Try proving this twice, once with contraposition and once with contradiction.)

(b) There is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

(6) Prove that for $n \in \mathbb{N}$,

(a) \[ 1 + 3 + 5 + \cdots + (2n - 1) = n^2. \]

(b) \[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]

(7) (Sum of a Geometric Sequence): For all integers $n \geq 0$ and all real numbers $r$ with $r \neq 1$,

\[ \sum_{i=0}^{n} r^i = \frac{r^{n+1} - 1}{r - 1}. \]

What can we say when $n \to \infty$ for arbitrary values of $r$? For what values of $r$ is the sum well defined? What is the sum for such values of $r$?

(8) (a) For all integers $n \geq 2$, $n^3 - n$ is divisible by 6.

(b) For all integers $n \geq 3$, $2^n > 2n + 1$.

(9) All prime numbers greater than 6 are either of the form $6n + 1$ or $6n + 5$, where $n$ is some natural number.

Chapter 4

Solution to PS 1

(1) (a) ∼(A∧B) ⇔ ∼A ∨ ∼B and ∼(A∨B) ⇔ ∼A ∧ ∼B. We prove these claims using a truth table.

A   B   A∧B   A∨B   ∼(A∧B)   ∼A   ∼B   ∼A∨∼B   ∼(A∨B)   ∼A∧∼B
1   2    3     4      5       6    7     8        9        10
T   T    T     T      F       F    F     F        F        F
T   F    F     T      T       F    T     T        F        F
F   T    F     T      T       T    F     T        F        F
F   F    F     F      T       T    T     T        T        T

Claim (a) is proved by comparing columns 5 and 8.

(b) Claim (b) is proved by comparing columns 9 and 10.

(c) ∼(A ⇒ B) ⇔ A ∧ ∼B

A   B   A⇒B   ∼(A⇒B)   ∼B   A∧∼B
1   2    3      4       5     6
T   T    T      F       F     F
T   F    F      T       T     T
F   T    T      F       F     F
F   F    T      F       T     F


(d) ((A∨B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C))

A   B   C   A⇒C   B⇒C   A∨B   (A⇒C)∧(B⇒C)   (A∨B)⇒C
1   2   3    4     5     6         7            8
T   T   T    T     T     T         T            T
T   T   F    F     F     T         F            F
T   F   T    T     T     T         T            T
T   F   F    F     T     T         F            F
F   T   T    T     T     T         T            T
F   T   F    T     F     T         F            F
F   F   T    T     T     F         T            T
F   F   F    T     T     F         T            T
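The truth tables above can be generated and compared mechanically. The following sketch enumerates every assignment of truth values and checks the four equivalences of Problem (1)(a)-(d); the helper names `implies` and `equivalent` are our own.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication p => q."""
    return (not p) or q

def equivalent(f, g, arity: int) -> bool:
    """True if f and g agree on every row of the truth table."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=arity))

claim_a = equivalent(lambda a, b: not (a and b), lambda a, b: (not a) or (not b), 2)
claim_b = equivalent(lambda a, b: not (a or b), lambda a, b: (not a) and (not b), 2)
claim_c = equivalent(lambda a, b: not implies(a, b), lambda a, b: a and (not b), 2)
claim_d = equivalent(lambda a, b, c: implies(a or b, c),
                     lambda a, b, c: implies(a, c) and implies(b, c), 3)

print(claim_a, claim_b, claim_c, claim_d)
```

The same helper also detects non-equivalences: for instance, it reports that A ⇒ B and B ⇒ A are not equivalent.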

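(e) This claim is true. If $n$ is even, then $n + 1$ is odd. If $n$ is odd, then $n + 1$ is even. Hence both cannot be even.

(f) Let $n = 1$; then $n^2 = 1 = n$.

(g) Let $x > 1$.

\[ \begin{aligned} x \in \mathbb{N}_O &\Leftrightarrow \exists n \in \mathbb{N} \text{ such that } x = 2n + 1, \\ x^2 &= (2n+1)^2 = 4n^2 + 4n + 1 = 2(2n^2 + 2n) + 1 \\ &\Rightarrow x^2 \in \mathbb{N}_O. \end{aligned} \tag{4.1} \]

For $x = 1$, $x^2 = 1$, which is odd.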

(2) Recall that ∼ (A ⇒ B) is equivalent to A∧ ∼ B.

(a) Set S is closed and bounded, and S is not compact.

(b) Set S is compact, and S is either not closed or unbounded.

(c) Function f is continuous and not differentiable.

(3) (a) If $xy$ is NOT a rational number then $x^2 \neq 3 \vee y^2 < 5$.

(b) If $\nexists y$ such that $xy = 1$ then $x = 0$; or, equivalently, if $\forall y$, $xy \neq 1$ then $x = 0$.

(4) (a) The mistake is in assuming the same value of $k$ for $m$ and $n$. The correct proof should be


Proof. Since $m$ is an even integer and $n$ is an odd integer, $m = 2k$ and $n = 2p + 1$ for some integers $k$ and $p$. Therefore, $2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p + 1) + 1 = 2l + 1$, where $l = 2k + 3p + 1$. Since $k, p \in \mathbb{Z}$, $l \in \mathbb{Z}$. Hence, $2m + 3n = 2l + 1$ for some integer $l$, whence $2m + 3n$ is an odd integer.

(b) The mistake is in showing the claim for only one particular value of $n$, whereas the claim is made for all positive integers. The correct proof should be

Proof. Observe that $n^2 + 2n + 1 = (n+1)^2 = (n+1) \cdot (n+1)$. Since $n + 1 \geq 2$ for every positive integer $n$, this is a composite number for all positive integers $n$.

(5) (a) (i) Contrapositive: If a number is divisible by 4 then it is divisible by 2. Let $y = 4m$ where $m \in \mathbb{Z}$; then $y = 2(2m)$. Hence $y$ is divisible by 2.

(ii) Contradiction: Suppose there exists a number $y$ which is not divisible by 2 but is divisible by 4. Since $y = 4m$ where $m \in \mathbb{Z}$, we know that $y = 2(2m)$ and so $y$ is divisible by 2. This contradicts our initial assumption.

(b) There is no greatest negative real number.

Proof. Assume, to the contrary, that there is a greatest negative real number $x$. Then, $x \geq y$ for every negative real number $y$. Consider the number $\frac{x}{2}$. Since $x$ is a negative real number, so too is $\frac{x}{2}$. Multiplying both sides of the inequality $\frac{1}{2} < 1$ by $x$, which is negative, gives $\frac{x}{2} > x$. Hence, $\frac{x}{2}$ is a negative real number that is greater than $x$, which is a contradiction. Hence our assumption that there is a greatest negative real number is false. Thus there is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

Proof. Assume, to the contrary, that there exist a nonzero rational number $p$ and an irrational number $q$ whose product is a rational number $r$. Thus, by definition of rational numbers, $p = \frac{a}{b}$ and $p \cdot q = r = \frac{c}{d}$ for some integers $a$, $b$, $c$ and $d$ with $a \neq 0$, $b \neq 0$ and $d \neq 0$. Hence,

\[ q = \frac{r}{p} = \frac{c/d}{a/b} = \frac{bc}{ad}. \]

Now, $bc \in \mathbb{Z}$ and $ad \in \mathbb{Z}$ since $a, b, c, d \in \mathbb{Z}$. Since $a \neq 0$ and $d \neq 0$, $ad \neq 0$. Hence, $q \in \mathbb{Q}$, which is a contradiction. Hence our assumption that there exist a nonzero rational number and an irrational number whose product is a rational number is false. Thus, the product of a nonzero rational number and an irrational number is irrational.


(6) (a) (i) Base of induction: When $n = 1$, the statement $P(1): 1 = 1^2$ holds trivially.

(ii) For every integer $n \geq 1$, let $P(n)$ be the statement $P(n): 1 + 3 + \cdots + (2n - 1) = n^2$. For the inductive hypothesis, let $k$ be an arbitrary (but fixed) integer such that $k \geq 1$ and assume that $P(k)$ is true; that is, assume that $1 + 3 + \cdots + (2k - 1) = k^2$. For the inductive step, we need to show that $P(k+1)$ is true. That is, we show that

\[ 1 + 3 + \cdots + (2k - 1) + (2k + 1) = (k+1)^2. \]

Indeed,

\[ \begin{aligned} 1 + 3 + \cdots + (2k-1) + (2k+1) &= (1 + 3 + \cdots + (2k-1)) + (2k+1) \\ &= k^2 + (2k+1) \quad \text{(by the inductive hypothesis)} \\ &= (k+1)^2, \end{aligned} \]

thus verifying that $P(k+1)$ is true.

(iii) Hence, by the principle of mathematical induction, $P(n)$ is true for all integers $n \geq 1$; that is,

\[ 1 + 3 + \cdots + (2n - 1) = n^2 \]

is true for every positive integer $n$.

(b) (i) Base of induction: When $n = 1$, the statement $P(1): 1 = \frac{1(1+1)}{2}$ is certainly true since $\frac{1(1+1)}{2} = \frac{2}{2} = 1$. This establishes the base case when $n = 1$.

(ii) For every integer $n \geq 1$, let $P(n)$ be the statement $P(n): 1 + 2 + \cdots + n = \frac{n(n+1)}{2}$. For the inductive hypothesis, let $k$ be an arbitrary (but fixed) integer such that $k \geq 1$ and assume that $P(k)$ is true; that is, assume that $1 + \cdots + k = \frac{k(k+1)}{2}$. For the inductive step, we need to show that $P(k+1)$ is true. That is, we show that

\[ 1 + 2 + \cdots + k + (k+1) = \frac{(k+1)(k+2)}{2}. \]

Indeed,

\[ \begin{aligned} 1 + 2 + \cdots + k + (k+1) &= (1 + 2 + \cdots + k) + (k+1) \\ &= \frac{k(k+1)}{2} + (k+1) \quad \text{(by the inductive hypothesis)} \\ &= \frac{k(k+1)}{2} + \frac{2(k+1)}{2} \\ &= \frac{(k+1)(k+2)}{2}, \end{aligned} \]

thus verifying that $P(k+1)$ is true.

(iii) Hence, by the principle of mathematical induction, $P(n)$ is true for all integers $n \geq 1$; that is,

\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \]

is true for every positive integer $n$.

(7) To show that the formula holds for $n = 0$, we must show that

\[ \sum_{i=0}^{0} r^i = \frac{r^{0+1} - 1}{r - 1}. \]

The left-hand side of this equation is $\sum_{i=0}^{0} r^i = r^0 = 1$, while the right-hand side is $\frac{r^{0+1} - 1}{r - 1} = 1$, since $r \neq 1$. Hence the formula holds for $n = 0$. For the inductive hypothesis, let $k$ be an arbitrary (but fixed) integer such that $k \geq 0$ and assume that $\sum_{i=0}^{k} r^i = \frac{r^{k+1} - 1}{r - 1}$. For the inductive step, we need to show that $\sum_{i=0}^{k+1} r^i = \frac{r^{k+2} - 1}{r - 1}$. Evaluating the left-hand side of this equation, we have

\[ \begin{aligned} \sum_{i=0}^{k+1} r^i &= \sum_{i=0}^{k} r^i + r^{k+1} \quad \text{(writing the $(k+1)$st term separately)} \\ &= \frac{r^{k+1} - 1}{r - 1} + r^{k+1} \quad \text{(by the inductive hypothesis)} \\ &= \frac{r^{k+1} - 1}{r - 1} + \frac{(r - 1) r^{k+1}}{r - 1} \\ &= \frac{r^{k+1} - 1 + r^{k+2} - r^{k+1}}{r - 1} \\ &= \frac{r^{k+2} - 1}{r - 1}, \end{aligned} \]

thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true for all integers $n \geq 0$.

In the limiting case $n \to \infty$, the sum is well defined for $|r| < 1$, and in this case it equals $\frac{1}{1 - r}$. For $|r| \geq 1$ the limit as $n \to \infty$ is not well defined, though the finite sum is defined for every $n \in \mathbb{N}$.
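The closed form can be compared with the term-by-term sum in exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that no rounding enters; the function names are our own, and the sampled values of $r$ are arbitrary.

```python
from fractions import Fraction

def geometric_sum(r: Fraction, n: int) -> Fraction:
    """Return r^0 + r^1 + ... + r^n by direct summation."""
    return sum((r ** i for i in range(n + 1)), Fraction(0))

def closed_form(r: Fraction, n: int) -> Fraction:
    """Return (r^(n+1) - 1) / (r - 1); requires r != 1."""
    if r == 1:
        raise ValueError("the closed form requires r != 1")
    return (r ** (n + 1) - 1) / (r - 1)

half = Fraction(1, 2)
# For |r| < 1 the partial sums approach 1 / (1 - r); with r = 1/2 the limit is 2.
gap = 1 / (1 - half) - geometric_sum(half, 50)
print(gap)  # exactly (1/2)**50
```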

(8) (a) We proceed by mathematical induction. When $n = 2$, the result is true since in this case $n^3 - n = 2^3 - 2 = 8 - 2 = 6$ and 6 is divisible by 6. Hence, the base case when $n = 2$ is true. For the inductive hypothesis, let $k$ be an arbitrary (but fixed) integer such that $k \geq 2$ and assume that the property holds for $n = k$, i.e., suppose that $k^3 - k$ is divisible by 6. For the inductive step, we must show that the property holds for $n = k + 1$. That is, we must show that $(k+1)^3 - (k+1)$ is divisible by 6. Since $k^3 - k$ is divisible by 6, there exists, by definition of divisibility, an integer $r$ such that $k^3 - k = 6r$. Now, by the laws of algebra and the inductive hypothesis, it follows that

\[ \begin{aligned} (k+1)^3 - (k+1) &= (k^3 + 3k^2 + 3k + 1) - (k + 1) \\ &= (k^3 - k) + 3(k^2 + k) \\ &= 6r + 3k(k+1). \end{aligned} \]

Now, $k(k+1)$ is a product of two consecutive integers, and is therefore even. Hence, $k(k+1) = 2s$ for some integer $s$. Thus, $6r + 3k(k+1) = 6r + 3(2s) = 6(r + s)$, and so, by substitution, $(k+1)^3 - (k+1) = 6(r + s)$, which is divisible by 6. Therefore, $(k+1)^3 - (k+1)$ is divisible by 6, as desired. Hence, by the principle of mathematical induction, the property holds for all integers $n \geq 2$.

(b) We proceed, as before, by mathematical induction. When $n = 3$, the inequality holds since in this case $2^n = 2^3 = 8$ and $2n + 1 = 2 \cdot 3 + 1 = 7$, and $8 > 7$. Hence, the base case when $n = 3$ is true. For the inductive hypothesis, let $k$ be an arbitrary (but fixed) integer such that $k \geq 3$ and assume that the inequality holds for $n = k$, i.e., suppose that $2^k > 2k + 1$. For the inductive step, we must show that the inequality holds for $n = k + 1$. That is, we must show that $2^{k+1} > 2(k+1) + 1$. Now,

\[ \begin{aligned} 2^{k+1} &= 2 \cdot 2^k \\ &> 2 \cdot (2k + 1) \quad \text{(by the inductive hypothesis)} \\ &= 2(k+1) + 2k \\ &> 2(k+1) + 1 \quad \text{(since $k \geq 3$)}, \end{aligned} \]

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers $n \geq 3$.

(9) By the Quotient-Remainder Theorem with $d = 6$, every natural number $m$ can be written as $m = 6n + r$, where $n$ is an integer and $r \in \{0, 1, 2, 3, 4, 5\}$. If $m$ is a prime greater than 6, it cannot be of the form $6n$ (divisible by 6), $6n + 2$ or $6n + 4$ (divisible by 2), or $6n + 3$ (divisible by 3). Thus the only remaining possibilities are $6n + 1$ and $6n + 5$.
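The conclusion of Problem (9) can be checked on small primes. The sketch below uses a simple trial-division primality test (our own helper, adequate only for small inputs) and verifies that every prime in an arbitrarily chosen range above 6 leaves remainder 1 or 5 on division by 6.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test; fine for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

# Remainders modulo 6 of all primes between 7 and 1999.
remainders = {m % 6 for m in range(7, 2000) if is_prime(m)}
print(remainders)
```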

Chapter 5

Set Theory, Sequence

5.1 Set Theory

5.1.1 Basic Definitions

Definition 5.1. A set is a well-specified collection of elements.

We define a set as a “well-specified collection” in order to emphasize that there must be a clear rule or group of rules that determine membership in the set. Essentially all mathematical objects can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be found everywhere around us. For example, we can speak of the set of all living human beings, the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on. Each living human being is an element of the set of all living human beings. Similarly, each prime number is an element of the set of all prime numbers. If A is a set and a is an element of A, then we write a ∈ A. If it so happens that a is not an element of A, then we write a ∉ A. If S is the set whose elements are s, t, and u, then we write S = {s, t, u}. The left brace and right brace visually indicate the “bounds” of the set, while what is written within the bounds indicates the elements of the set. For example, if S = {1, 2, 3, 5}, then 2 ∈ S, but 4 ∉ S. Sets are determined by their elements. The order in which the elements of a given set are listed does not matter. For example, {1, 2, 3} and {3, 1, 2} are the same set. It also does not matter whether some elements of a given set are listed more than once. For instance, {1, 2, 2, 2, 3, 3} is still the set {1, 2, 3}. Many sets are given a shorthand notation in mathematics as they are used so frequently. A set may be defined by a property: for instance, the set of all true propositions, the set of all even integers, the set of all odd integers, and so on. Formally, if P(x) is a property, we write A = {x ∈ S : P(x)} to indicate that the set A consists of all elements x of S having the property P(x). The colon : is commonly read as “such that” and is also written as “|”.


Figure 5.1: Set B is a strict subset of set A: B⊂ A

So {x ∈ S | P(x)} is an alternative notation for {x ∈ S : P(x)}. For a concrete example, consider $A = \{x \in \mathbb{R} : x^2 = 1\}$. Here the property P(x) is $x^2 = 1$. Thus, A is the set of all real numbers whose square is one.

Definition 5.2. If A is a set, then B is a subset of A if every element of B is also an element of A.

We write B ⊆ A or A ⊇ B.

Definition 5.3. If A is a set, then B is a strict subset of A if every element of B is also an element of A, and there exists at least one element of A which is not an element of B.

We write B ⊂ A or A ⊃ B. In shorthand we could write these as: B is a subset of A if

b ∈ B ⇒ b ∈ A

and B is a strict subset of A if

(b ∈ B ⇒ b ∈ A) ∧ (∃a ∈ A s.t. a ∉ B).

Technically we should differentiate between subsets and strict subsets, but economists are usually sloppy about this. In most courses you will see the operator ⊂ used for both, and you will not be required to differentiate between the two concepts. Now let X be a universal set, such that we are interested in subsets of this set.

Definition 5.4. The complement of the set A is the set $A^c$ containing all elements not in A.

Figure 5.2: Complement of Set A

We write $A^c = \{x : x \notin A\}$. For the complement of a set to be clearly understood, we need to know what the relevant universe is. For example, we can define the set J as all real numbers between 2 and 4, inclusive:¹

\[ J = \{x \in \mathbb{R} \mid 2 \leq x \leq 4\}. \]

In this context, the set $J^c$ is the set of all real numbers strictly less than 2 or strictly greater than 4:

\[ J^c = \{x \in \mathbb{R} \mid x < 2 \vee x > 4\}. \]

The “universe” in this case is the set of real numbers. The complement of J doesn’t include all mathematical objects not in J, nor does it include all numbers not in J (because complex numbers are excluded). In most cases the universe is clear from the context.

Example 5.1. Some examples of sets are:

\[ \begin{aligned} D &= \{2, 4, 10\}, \\ B &= \{x \in \mathbb{R} \text{ s.t. } x \geq 10\}, \\ S &= \text{the set of all real-valued functions on } \mathbb{R}. \end{aligned} \]

¹ This can also be written as J = [2, 4], where the square brackets indicate the closed interval between the first entry and the second.


5.1.2 A Few Common Sets

$\mathbb{R}$        The real numbers
$\mathbb{R}_+$      Real numbers ≥ 0
$\mathbb{R}_{++}$   Real numbers > 0
$\mathbb{Z}$        The set of integers (−10, 0, 2, 451, etc.)
$\mathbb{Z}_+$      The set of integers ≥ 0 (also called $\mathbb{N}$)
$\mathbb{Z}_{++}$   The set of integers > 0 (sometimes also called $\mathbb{N}$)
$\mathbb{Q}$        The rational numbers (numbers that can be expressed as fractions)
$\mathbb{C}$        The complex numbers
$\emptyset$         Empty set or null set
$\Omega$            The universal set
$\mathbb{R}^2$      The set of pairs of real numbers

The last set $\mathbb{R}^2$ is shorthand notation for the Cartesian product $\mathbb{R} \times \mathbb{R}$. This notation is acceptable for any number $n \in \mathbb{Z}_{++}$ of sets. You will often encounter proofs and theorems defined on the set $\mathbb{R}^n$, which is the general way of describing the space of n-vectors, each element of which is a real number (this is taking us ahead to linear algebra).

5.1.3 Set Operations

Definition 5.5. Union: The union of n sets is the set containing all elements from all n sets. We write

\[ A \cup B = \{x : x \in A \vee x \in B\}, \]

\[ \bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = \{x : \text{for some } i = 1, \ldots, n,\ x \in A_i\}. \]

Definition 5.6. Intersection: The intersection of n sets is the set containing the elements common to all n sets. We write

\[ A \cap B = \{x : x \in A \wedge x \in B\}, \]

\[ \bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n = \{x : \text{for all } i = 1, \ldots, n,\ x \in A_i\}. \]

Exercise 5.1. Let $A_1, \ldots, A_n$ be subsets of X. Then,

\[ \Big[ \bigcup_{j=1}^{n} A_j \Big]^c = \bigcap_{j=1}^{n} A_j^c; \qquad \Big[ \bigcap_{j=1}^{n} A_j \Big]^c = \bigcup_{j=1}^{n} A_j^c. \]

Figure 5.3: Union of two sets: A∪B

Figure 5.4: Intersection of two sets: A∩B

Figure 5.5: Exclusion of Set B from Set A: A\B

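Definition 5.7. Exclusion: The exclusion of the set B from the set A is the set of all elements in A that are, in addition, not elements of B. We write

\[ A \setminus B = \{x \in A \mid x \notin B\}. \]

Proposition 1. $(A \setminus B) \cap (B \setminus A) = \emptyset$

Proof.

\[ A \setminus B = A \cap B^c \subseteq B^c, \qquad B \setminus A = B \cap A^c \subseteq B, \qquad B \cap B^c = \emptyset. \]

Hence $(A \setminus B) \cap (B \setminus A) \subseteq B^c \cap B = \emptyset$, and any subset of the empty set is empty. Figure 5.6 gives a pictorial representation of this proof.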

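Exercise 5.2. Let B and $A_1, \ldots, A_n$ be subsets of X. Then,

\[ B \setminus \Big[ \bigcup_{j=1}^{n} A_j \Big] = \bigcap_{j=1}^{n} (B \setminus A_j); \qquad B \setminus \Big[ \bigcap_{j=1}^{n} A_j \Big] = \bigcup_{j=1}^{n} (B \setminus A_j). \]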

Figure 5.6: Proposition 1: Sets A\B and B\A have empty intersection.

Next we consider sets whose elements are sets themselves. For example, let A, B, and C be subsets of X; then the collection $\mathcal{A} = \{A, B, C\}$ is a set whose elements are A, B and C. We call a set whose elements are subsets of X a family of subsets of X, or a collection of subsets of X. The notation we follow is that lower case letters refer to the elements of X, upper case letters refer to subsets of X, and script letters refer to families of subsets of X.

Observe that the empty set $\emptyset$ is a subset of X. It is possible to form a non-empty set whose only element is the empty set, i.e., $\{\emptyset\}$. In this case $\{\emptyset\}$ is a singleton. Also $\emptyset \subset \{\emptyset\}$ and $\emptyset \in \{\emptyset\}$.

There is a special family of subsets of X with a special name.

Definition 5.8. Let A be any subset of X. The power class of A, or the power set of A, is defined to be the family of all subsets of A. We denote the power set of A by $\mathcal{P}(A)$.

Specifically,

\[ \mathcal{P}(A) = \{B : B \subseteq A\}. \]

The power set of the empty set is $\mathcal{P}(\emptyset) = \{\emptyset\}$, i.e., the singleton of $\emptyset$. The power set of a singleton is $\mathcal{P}(\{a\}) = \{\emptyset, \{a\}\}$. Note that the power set of A always contains A and $\emptyset$. In general, if A is a finite set with n elements, then $\mathcal{P}(A)$ contains $2^n$ elements.

Exercise 5.3. Prove that if A is a finite set with n elements, then $\mathcal{P}(A)$ contains $2^n$ elements.
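For small finite sets the power set can be enumerated explicitly, which gives a quick empirical check of the $2^n$ count (it does not replace the proof asked for in Exercise 5.3). The sketch below represents subsets as `frozenset` objects so that they can themselves be elements of a set; the function name `power_set` is our own.

```python
from itertools import combinations

def power_set(a: frozenset) -> set:
    """Return the family of all subsets of a, each as a frozenset."""
    elems = list(a)
    return {frozenset(c)
            for k in range(len(elems) + 1)
            for c in combinations(elems, k)}

print(power_set(frozenset()))              # the singleton containing the empty set
print(len(power_set(frozenset("abcd"))))   # number of subsets of a 4-element set
```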

5.2 Set Identities

There are a number of set identities that the set operations of union, intersection, and set difference satisfy. They are very useful in calculations with sets. Below we give a list of such set identities, where U is a universal set and A, B, and C are subsets of U.

• Commutative Laws: A∪B = B∪A ; A∩B = B∩A


• Associative Laws: (A∪B)∪C = A∪ (B∪C) ; (A∩B)∩C = A∩ (B∩C)

• Distributive Laws: A∩ (B∪C) = (A∩B)∪ (A∩C) ; A∪ (B∩C) = (A∪B)∩ (A∪C)

• Idempotent Laws: A∪A = A ; A∩A = A

• Absorption Laws: A∩ (A∪B) = A ; A∪ (A∩B) = A

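• Identity Laws: $A \cup \emptyset = A$ ; $A \cap U = A$

• Universal Bound Laws: $A \cup U = U$ ; $A \cap \emptyset = \emptyset$

• De Morgan’s Laws: $(A \cup B)^c = A^c \cap B^c$ ; $(A \cap B)^c = A^c \cup B^c$

• Complement Laws: $A \cup A^c = U$ ; $A \cap A^c = \emptyset$

• Complements of U and $\emptyset$: $U^c = \emptyset$ ; $\emptyset^c = U$

• Double Complement Law: $(A^c)^c = A$

• Set Difference Law: $A \setminus B = A \cap B^c$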
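Several of the identities listed above can be verified exhaustively on a small universe. The sketch below takes a four-element universal set (an arbitrary choice made for illustration), enumerates all triples of subsets, and tests the distributive, absorption, De Morgan, double complement and set difference laws; the helper names are our own.

```python
from itertools import combinations, product

U = frozenset(range(4))  # small universal set, chosen only for illustration

def subsets(s: frozenset) -> list:
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def comp(a: frozenset) -> frozenset:
    """Complement relative to the universal set U."""
    return U - a

def identities_hold(a: frozenset, b: frozenset, c: frozenset) -> bool:
    return all([
        a & (b | c) == (a & b) | (a & c),    # distributive law
        a | (b & c) == (a | b) & (a | c),    # distributive law
        a & (a | b) == a,                    # absorption law
        a | (a & b) == a,                    # absorption law
        comp(a | b) == comp(a) & comp(b),    # De Morgan's law
        comp(a & b) == comp(a) | comp(b),    # De Morgan's law
        comp(comp(a)) == a,                  # double complement law
        a - b == a & comp(b),                # set difference law
    ])

all_ok = all(identities_hold(a, b, c) for a, b, c in product(subsets(U), repeat=3))
print(all_ok)
```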

Exercise 5.4. Prove the following using only set identities:

(a) (A∪B)\C = (A\C)∪ (B\C).

(b) (A∪B)\ (C \A) = A∪ (B\C).

(c) $A \cap (((B \cup C^c) \cup (D \cap E^c)) \cap ((B \cup B^c) \cap A^c)) = \emptyset$.

We will discuss additional concepts in set theory after we have gone over some elementary exposition of functions and sequences.

5.3 Functions

Definition 5.9. A function consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f (x) which assigns exactly one element from R to each element x ∈ D.


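Here are some examples of functions.

\[ \begin{aligned} f(x) &= x^3, \quad D = \mathbb{R}, \quad R = \mathbb{R}, \\ f(x) &= 0, \quad D = \mathbb{R}, \quad R = \mathbb{R}. \end{aligned} \]

The range need not be exhausted but the domain must be.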

Definition 5.10. A correspondence consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f (x) which assigns at least one element from R to each element x ∈ D.

The set of all functions is a strict subset of the set of all correspondences. This is the same as saying that all functions are correspondences, but not the other way around. From here onwards it is critical that you specify the domain and the range when defining or using a function. For example, these two functions:

\[ f : \mathbb{R} \to \mathbb{R} \text{ such that } f(x) = x^2 \]

\[ g : \mathbb{R} \to \mathbb{R}_+ \text{ such that } g(x) = x^2 \]

are not the same function, even though in practice they produce identical results.²

Definition 5.11. The argument of a function is the element from the domain that is mapped into the range, and the value of a function is the element from the range that is the destination of the mapping.

Definition 5.12. A real-valued function is a function whose range is the set $\mathbb{R}$ or any subset of $\mathbb{R}$.

From Definition 5.12, the definitions of integer-valued functions, complex-valued functions, etc., should be clear.

Definition 5.13. Let $f : D \to R$ and let $A \subseteq D$. We let $f(A)$ represent the subset $\{f(x) : x \in A\}$ of R. The set $f(A)$ is called the image of A in R. If $B \subseteq R$, we let $f^{-1}(B)$ represent the subset $\{x \in D : f(x) \in B\}$ of D. The set $f^{-1}(B)$ is called the pre-image of B in D.

² The only difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative real numbers. This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a non-negative real number. But the two functions are still not the same.

Note that the image of a function may be equal to the range, or it may be a strict subset of the range. In the above example, the image of the function f is a strict subset of its range, but the image of g is equal to its range.
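On finite sets, the image and pre-image of Definition 5.13 can be computed by direct enumeration. The sketch below restricts the squaring function to a small finite domain of integers (an assumption made so that the sets can be listed); the helper names `image` and `preimage` are our own.

```python
def image(f, A):
    """Return f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, D, B):
    """Return the pre-image {x in D : f(x) in B}, for a finite domain D."""
    return {x for x in D if f(x) in B}

D = set(range(-3, 4))        # finite stand-in for the domain

def square(x):
    return x * x

print(image(square, D))             # contains only non-negative values
print(preimage(square, D, {1, 4}))  # both signs of each square root appear
print(preimage(square, D, {-1}))    # empty: no x in D has square -1
```

The last line illustrates that a pre-image can be empty when the set B lies in the range but outside the image.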

5.4 Vector Space

Definition 5.14. A vector space V is a set with two associated operations, called addition (+) and scalar multiplication (·), which satisfy the following properties.

(i) Closure under addition:∀x,y ∈V, x+ y ∈V

(ii) Commutativity of addition:

∀x,y ∈V, x+ y = y+ x

(iii) Associativity of addition:

∀x,y,z ∈V, (x+ y)+ z = x+(y+ z)

(iv) Existence of additive identity:

∃ an element O ∈V ϶ x+O= x ∀x ∈V

(v) Existence of additive inverse:

∀x ∈V ∃ some element y ∈V ϶ x+ y =O

(vi) Closure under scalar multiplication:

∀ α ∈ R, ∀x ∈V, α · x ∈V

(vii) Distributivity of scalar multiplication:

∀ α ∈ R, ∀x,y ∈V, α · (x+ y) = (α · x)+(α · y)

(viii) Scalar distribution:

∀ α,β ∈ R, ∀x ∈V, (α+β) · x = α · x+β · x


(ix) Scalar association:

∀ α,β ∈ R, ∀x ∈V, (αβ) · x = α · (β · x)

(x) Identity element for scalar multiplication:

1 · x = x ∀ x ∈V.

In order to show that any space is a vector space, we simply need to show that theproperties in the above definition are satisfied.

Definition 5.15. The Cartesian Product of sets A and B is the set of pairs (a,b) satisfying a∈A ∧ b ∈ B. We write

A×B = (a,b) | a ∈ A ∧ b ∈ B.

The Cartesian product is the two set case of the general “cross product” of sets, whichis the same concept defined for any number of sets. For example using sets A,B,C and Dwe could define E = A×B×C×D, and a typical element of E would be (a,b,c,d) forsome a ∈ A, b ∈ B, c ∈C and d ∈ D.

Example 5.2.

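$$\mathbb{R}^3 = \mathbb{R}\times\mathbb{R}\times\mathbb{R} = \{(x,y,z) \mid x \in \mathbb{R} \wedge y \in \mathbb{R} \wedge z \in \mathbb{R}\}$$

$$\mathbb{R}^2_{+} = \mathbb{R}_{+}\times\mathbb{R}_{+}; \quad \mathbb{R}^2_{++} = \mathbb{R}_{++}\times\mathbb{R}_{++}.$$

The order of the sets in the cross product does matter, as the following example shows.

Example 5.3. Let

$$A = \{1,2,3\}, \quad B = \{2,4\}$$

$$A \times B = \{(1,2),(1,4),(2,2),(2,4),(3,2),(3,4)\}$$

$$B \times A = \{(2,1),(2,2),(2,3),(4,1),(4,2),(4,3)\}.$$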
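Example 5.3 can be reproduced mechanically. The following is a minimal sketch in Python (the variable names are illustrative and not part of the notes) that builds both products with the standard library and confirms that they differ as sets of ordered pairs.

```python
from itertools import product

A = [1, 2, 3]
B = [2, 4]

# Cartesian products represented as sets of ordered pairs (tuples).
A_times_B = set(product(A, B))
B_times_A = set(product(B, A))

print(sorted(A_times_B))
print(sorted(B_times_A))

# The two products are different sets; they share only the pair (2, 2).
print(A_times_B == B_times_A)
print(A_times_B & B_times_A)
```

Because tuples are ordered, (1,2) and (2,1) are distinct elements, which is exactly why A×B and B×A are not the same set.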

(a) The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.

(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if u · v = 0.

(c) The angle between vectors u and v is $\arccos\left(\dfrac{u \cdot v}{\|u\| \cdot \|v\|}\right)$.


5.4.1 Metric

Definition 5.16. A distance function is a real-valued function d : V × V → R which satisfies

(i) Non-negativity,

∀ x,y ∈V, d(x,y)⩾ 0 with equality if and only if x = y

(ii) Symmetry

∀x,y ∈ V, d(x,y) = d(y,x)

(iii) Triangle Inequality

∀x,y,z ∈V,d(x,z)⩽ d(x,y)+d(y,z).

Any function satisfying these three properties is a distance function. A distance function is also called a metric. The space V with elements x, y, which would be called points, is a metric space if we can associate a distance function to it.

Example 5.4.

(a) Euclidean Distance:

$$d(x,y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}$$

where V = Rn.

(b) Discrete metric:

$$d(x,y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

where V is any vector space.

(c) In V = R2

$$d(x,y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}$$

(d) In space V if d(·, ·) is a metric then

$$d_1(x,y) = \frac{d(x,y)}{1 + d(x,y)}$$

is also a metric. This allows us to construct new metrics from any given metric.
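The four metrics of Example 5.4 can be spot-checked numerically. The sketch below is an illustration in Python, not part of the original notes: the function names and the helper `looks_like_metric` are hypothetical, and passing the check on a finite random sample is only evidence, not a proof, that the three axioms hold.

```python
import itertools
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def discrete(x, y):
    return 0.0 if x == y else 1.0


def max_metric(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def bounded(d):
    """Build d1 = d / (1 + d) from a given metric d, as in part (d)."""
    return lambda x, y: d(x, y) / (1.0 + d(x, y))


def looks_like_metric(d, points, tol=1e-12):
    """Check non-negativity, symmetry and the triangle inequality on a finite sample."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        if abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


random.seed(0)
sample = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(15)]
for name, d in [("euclidean", euclidean), ("discrete", discrete),
                ("max", max_metric), ("bounded euclidean", bounded(euclidean))]:
    print(name, looks_like_metric(d, sample))
```

A function that is not a metric, such as the squared difference on R, fails the triangle inequality on the points 0, 1, 2, which the same helper detects.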


5.4.2 Norm

Definition 5.17. A norm is a real-valued function written ∥·∥ : V → R, defined on vector space V, which satisfies

(i) Non-negativity:

∀x ∈ V, ∥x∥ ⩾ 0, with equality if and only if x = 0,

(ii) Homogeneity:

∀x ∈ V, α ∈ R, ∥α · x∥ = |α| · ∥x∥,

(iii) Triangle Inequality:

∀x,y ∈V,∥ x+ y ∥⩽∥ x ∥+ ∥ y ∥ .

Example 5.5.

(a) Euclidean Norm:

$$\forall x \in \mathbb{R}^n, \quad \|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$$

(b) Taxicab Norm:

$$\forall x \in \mathbb{R}^n, \quad \|x\| = \sum_{i=1}^{n} |x_i|$$
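As a quick illustration of Example 5.5, the two norms can be computed directly. This is a minimal Python sketch with hypothetical function names; the vectors are chosen only for illustration.

```python
import math


def euclidean_norm(x):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))


def taxicab_norm(x):
    """Sum of absolute values of the components."""
    return sum(abs(xi) for xi in x)


x = [3.0, -4.0]
y = [1.0, 2.0]
x_plus_y = [a + b for a, b in zip(x, y)]

print(euclidean_norm(x))  # 5.0
print(taxicab_norm(x))    # 7.0

# Homogeneity: scaling x by alpha scales the norm by |alpha|.
alpha = -2.0
print(euclidean_norm([alpha * xi for xi in x]) == abs(alpha) * euclidean_norm(x))

# Triangle inequality for both norms on this pair of vectors.
print(euclidean_norm(x_plus_y) <= euclidean_norm(x) + euclidean_norm(y))
print(taxicab_norm(x_plus_y) <= taxicab_norm(x) + taxicab_norm(y))
```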

5.4.3 Inner Product

Definition 5.18. An inner product is a real-valued function ⟨·, ·⟩ : V × V → R, defined on vector space V, which satisfies

(i) Symmetry:

∀x,y ∈ V, ⟨x,y⟩ = ⟨y,x⟩,

(ii) Positive definiteness:

∀x ∈ V, ⟨x,x⟩ ⩾ 0 with equality if and only if x = 0,

(iii) Bilinearity:

∀x,y,z ∈ V, ∀α,β ∈ R, ⟨αx + βy, z⟩ = α⟨x,z⟩ + β⟨y,z⟩.


Example 5.6. V = Rn. Dot Product

∀x,y ∈V,x · y = x1y1 + · · ·+ xnyn.

Definition 5.19. A metric space (V,d) is a space V and a distance function d.

A normed space (V,∥·∥) is a vector space V and a norm ∥·∥. An inner product space (V,⟨·, ·⟩) is a vector space V and an inner product ⟨·, ·⟩.

5.4.4 Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality states that for all vectors x and y of an inner product space,

$$|\langle x,y\rangle|^2 \leqslant \langle x,x\rangle \cdot \langle y,y\rangle,$$

where ⟨·, ·⟩ is the inner product. Equivalently, by taking the square root of both sides, and referring to the norms of the vectors (with ∥x∥ = √⟨x,x⟩), the inequality is written as

$$|\langle x,y\rangle| \leqslant \|x\| \cdot \|y\|.$$

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical sense, they are parallel or one of the vectors is equal to zero).

If x1, ···, xn ∈ R and y1, ···, yn ∈ R are any real numbers, the inequality may be restated in a more explicit way as follows:

$$|x_1 y_1 + \cdots + x_n y_n|^2 \leqslant (x_1^2 + \cdots + x_n^2) \cdot (y_1^2 + \cdots + y_n^2).$$
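The explicit form of the inequality is easy to test numerically. The following Python sketch (hypothetical helper names, random test vectors chosen for illustration) computes the gap between the two sides for the dot product on R4 and shows that the gap vanishes for linearly dependent vectors.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cauchy_schwarz_gap(x, y):
    """<x,x><y,y> - <x,y>^2, which the inequality says is non-negative."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2


random.seed(1)
gaps = []
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(4)]
    y = [random.uniform(-10, 10) for _ in range(4)]
    gaps.append(cauchy_schwarz_gap(x, y))

# Allow a tiny tolerance for floating-point rounding.
print(min(gaps) >= -1e-9)

# Linearly dependent vectors (y = 2x): the two sides coincide.
print(cauchy_schwarz_gap([1, 2, 3], [2, 4, 6]))  # 0
```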

5.5 Sequences

Definition 5.20. A sequence is a function

xn : N→ Rm

that gives us an ordered infinite list of points in Rm.

Another notation for a sequence is ⟨xn⟩, where ⟨xn⟩ ≡ (x1, x2, ···). As we saw above, sets are unordered collections of elements. Even if there is an intuitive ordering to the elements of a set, with respect to the definition of the set itself there is no “first element” or “last element”. Sequences, by contrast, are collections whose elements are assigned a particular order.


Example 5.7.

$$S_1 = \left\{\frac{1}{n}\right\},\ n \in \mathbb{N}, \text{ is a sequence in } \mathbb{R}$$

$$S_2 = \left\{\begin{pmatrix} \frac{1}{n} \\ n \end{pmatrix}\right\},\ n \in \mathbb{N}, \text{ is a sequence in } \mathbb{R}^2$$

The interpretation of S1 is that the nth element of the sequence is given by 1/n. So we could also have written $S_1 = \left\{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \cdots\right\}$. Similarly $S_2 = \left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} \frac{1}{2} \\ 2 \end{pmatrix}, \cdots\right\}$.

Note the implication of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0. It’s usually assumed in the first year courses that the first element of a sequence is numbered “1” not “0”, but this need not always be the case. Note that order of appearance of elements matters,

$$\{1,2,3,4,\cdots\} \neq \{2,1,3,4,\cdots\}$$

and elements can be repeated,

S = {1,1,1,···} is a sequence.

5.5.1 Convergence and Limits

Definition 5.21. We say that x is a limit point of {xn}, n ∈ N, if

∀ε > 0 there exist infinitely many terms xn ϶ d(x,xn) < ε.

Example 5.8. (a) Let $x_n = (-1)^n$. This sequence has two limit points: a = −1 and a = 1.

(b) Let $x_n = \sin\left(\frac{\pi n}{2}\right)$. This sequence has three limit points: a = −1, 0, 1.

(c) The sequence

$$\left\{1, -1, \tfrac{1}{2}, -1, \tfrac{1}{3}, -1, \cdots\right\}$$

has two limit points, 0 and −1.

(d) Let $x_n = n^{(-1)^n}$. This sequence has a limit point a = 0.

(e) Let {xn} be a convergent sequence: xn → x as n → ∞. Then {xn} has a limit point x.

Definition 5.22. The sequence S = {xn} converges to x (has a limit x) if

∀ ε > 0, ∃ N ∈ N such that d(xn,x) < ε ∀ n > N.

In this case we write

$$x = \lim_{n\to\infty} x_n.$$

Definition 5.22 is a source of a lot of difficulty. However it’s one of the most important definitions in macroeconomic theory and in parts of micro, and it’s worth forcing yourself to fully absorb it before the end of the Review. The intuition behind limits is not nearly as difficult as the formal definition. A sequence converges to x if after choosing any very, very tiny number (ε), you can identify a point in the sequence (N) after which all of the remaining members of the sequence are no farther than ε from some particular value x. This concept is only well-defined for infinite sequences. In most economic theory, the elements of a convergent sequence never actually reach their limiting value. They simply get closer and closer to it as the sequence progresses.

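Example 5.9. The sequence $x_n = \frac{1}{n}$ is a convergent sequence.

(Claim: 1/n → 0.)

Proof. Let ε > 0 be given. We have to find N such that ∀n > N, d(xn,0) = |xn| < ε. Since xn > 0,

$$|x_n| < \varepsilon \Leftrightarrow x_n < \varepsilon \Leftrightarrow \frac{1}{n} < \varepsilon \Leftrightarrow n > \frac{1}{\varepsilon}.$$

So by choosing N to be any natural number greater than 1/ε, we have

$$\forall n > N, \quad d(x_n,0) = |x_n| = \frac{1}{n} < \frac{1}{N} < \varepsilon.$$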
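The choice of N in this proof can be made concrete. The sketch below is an illustration in Python; the function name `n_for_epsilon` is hypothetical, and the loop only spot-checks a finite stretch of the tail, so it illustrates the definition rather than proving convergence.

```python
import math


def n_for_epsilon(eps):
    """Return a natural number N strictly greater than 1/eps."""
    return math.floor(1 / eps) + 1


for eps in [0.5, 0.1, 0.003]:
    N = n_for_epsilon(eps)
    # Spot-check d(x_n, 0) = |1/n - 0| < eps on the first 1000 terms with n > N.
    ok = all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1001))
    print(eps, N, ok)
```

A smaller ε forces a larger N, which matches the intuition that one must go further out in the sequence to get closer to the limit.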

Definition 5.23. A sequence xn is bounded if

∃ B ∈ R such that d (xn,0) ⩽ B, ∀ n ∈ N.

Definition 5.24. A sequence xn is unbounded if

∀ B ∈ R ∃ n ∈ N such that d (xn,0) > B.

Example 5.10. The sequence {1,0,1,0,···} is bounded. The sequence {xn}, xn = n, n ∈ N, is unbounded.

Definition 5.25. The tail of a sequence {xn} is the continuation of {xn} after some m ∈ N, that is, {xm+1, xm+2, ···}.


Theorem 5.1. A sequence xn is bounded if and only if the tail of xn is bounded.

Proof. {xn} is bounded ⇒ the tail of {xn} is bounded. (TRIVIAL) Now we show that the tail of {xn} is bounded ⇒ {xn} is bounded. Fix some m for which the tail of {xn} is bounded. Then

∃ B such that |xn| < B, ∀ n ⩾ m.

Let

$$B' = \max\{|x_1|, |x_2|, \cdots, |x_{m-1}|, B\}.$$

Then B′ is a bound for {xn}: ∀n ∈ N, |xn| ⩽ B′.

Definition 5.26. If $\{x_n\}_{n=1}^{\infty}$ is a sequence, a subsequence $\{x_{n(k)}\}_{k=1}^{\infty}$ is obtained from {xn} by crossing out some (possibly infinitely many) elements, while preserving the order.

Example 5.11. Sequence: $\{x_n\} = \left\{1, -1, \tfrac{1}{2}, -1, \tfrac{1}{3}, -1, \cdots\right\}$.

Subsequence: $\{x_{n(k)}\} = \{-1,-1,-1,\cdots\}$ or $\left\{1, \tfrac{1}{2}, \tfrac{1}{3}, \cdots\right\}$.

Definition 5.27. A sequence is monotone increasing if

∀n ∈ N, xn+1 ⩾ xn

and is monotone decreasing if

∀n ∈ N, xn+1 ⩽ xn.

The following claim characterizes the convergence of monotone sequences.

Claim 5. Let {xn} be monotonic. Then it is convergent if and only if it is bounded.

Theorem 5.2. Bolzano-Weierstrass Theorem. Every bounded sequence {xn} has a convergent subsequence.

The following proposition is useful in proving the Bolzano-Weierstrass Theorem.

Proposition 2. Nested Interval Property. Suppose that I1 = [a1,b1], I2 = [a2,b2], ···, where I1 ⊇ I2 ⊇ ···, and limn→∞(bn − an) = 0. Then there exists exactly one real number common to all intervals In.


Proof. Note that we have a1 ⩽ a2 ⩽ a3 ⩽ ··· ⩽ an ⩽ ··· ⩽ bn ⩽ ··· ⩽ b2 ⩽ b1. Then each bi is an upper bound for the set A = {a1, a2, ···}. In other words, the sequence {an} is monotone increasing and bounded. Therefore, limn→∞ an = a exists and a = sup{an} ⩽ bk for each natural number k. Hence ak ⩽ a ⩽ bk for every k ∈ N, that is, a is contained in each Ik. Now let b be contained in In for all n ∈ N. Then an ⩽ b ⩽ bn for every n ∈ N, or 0 ⩽ (b − an) ⩽ (bn − an) for each n. Since limn→∞(bn − an) = 0, we get limn→∞(b − an) = 0. It follows that b = limn→∞ an = a, and so a is the only real number common to all intervals.

Now we prove the Bolzano-Weierstrass Theorem as under.

Proof. Let $\{x_n\}_{n=1}^{\infty}$ be bounded. There is B ∈ R such that |xn| ⩽ B for all n ∈ N. We prove the theorem in the following steps.

Step 1 We inductively construct a sequence of intervals I0 ⊇ I1 ⊇ I2 ⊇ ··· such that:

(i) In is a closed interval [an,bn] where $b_n - a_n = \frac{2B}{2^n}$; and

(ii) {i : xi ∈ In} is infinite.

We let I0 = [−B,B]. This closed interval has length 2B and xi ∈ I0 for all i ∈ N. Suppose we have In = [an,bn] satisfying (i) and (ii). Let cn be the midpoint $\frac{a_n + b_n}{2}$. Each of the intervals [an,cn] and [cn,bn] is half the length of In. Thus they both have length $\frac{1}{2} \cdot \frac{2B}{2^n} = \frac{2B}{2^{n+1}}$. If xi ∈ In, then xi ∈ [an,cn] or xi ∈ [cn,bn], possibly both. Thus at least one of the sets {i : xi ∈ [an,cn]} or {i : xi ∈ [cn,bn]} is infinite. If the first set is infinite, we let an+1 = an and bn+1 = cn. Otherwise the second set is infinite, and we let an+1 = cn and bn+1 = bn. Let In+1 = [an+1,bn+1]. Then (i) and (ii) are satisfied. By the Nested Interval Property, there exists $a \in \bigcap_{n=1}^{\infty} I_n$.

Step 2 We next find a subsequence converging to a. Choose i1 ∈ N such that $x_{i_1} \in I_1$. Suppose we have in. We know that {i : xi ∈ In+1} is infinite. Thus we can choose in+1 > in such that $x_{i_{n+1}} \in I_{n+1}$. This allows us to construct a sequence of natural numbers i1 < i2 < i3 < ··· where $x_{i_n} \in I_n$ for all n ∈ N.

Step 3 We finish the proof by showing that the subsequence $(x_{i_n})_{n=1}^{\infty}$ converges to a. Let ε > 0. Choose N such that $\varepsilon > \frac{2B}{2^N}$. Suppose n ⩾ N. Then $x_{i_n} \in I_n$ and a ∈ In. Thus

$$|x_{i_n} - a| \leqslant \frac{2B}{2^n} \leqslant \frac{2B}{2^N} < \varepsilon \quad \text{for all } n \geqslant N.$$

Remark 5.1. Every bounded sequence xn has at least one limit point x.


Definition 5.28. A sequence {xn} is a Cauchy sequence if

∀ ε > 0, ∃N, such that ∀n,m > N, d(xn,xm) < ε.

After N, each element is close to every other element; in other words, the elements lie within a distance of ε from each other.

Some properties of Cauchy sequences are

(i) Every convergent sequence {xn} (with limit x, say) is a Cauchy sequence, since, given any real number ε > 0, beyond some fixed point, every term of the sequence is within distance ε/2 of x, so any two terms of the sequence are within distance ε of each other.

(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the sequence from the N-th position onwards are within distance 1 of each other, and if M is the largest absolute value of the terms up to and including the N-th, then no term of the sequence has absolute value greater than M + 1).

(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is itself convergent (with the same limit), since, given any real number ε > 0, beyond some fixed point in the original sequence, every term of the subsequence is within distance ε/2 of x, and any two terms of the original sequence are within distance ε/2 of each other, so every term of the original sequence is within distance ε of x.

Theorem 5.3. Every sequence has at most one limit.

Proof. By contradiction. We use the intuition that all points end up being close to, say, r1 and r2 at the same time, which is not possible. Let the sequence {xn} converge to two distinct limits r1 and r2. It is enough to show that there is one ε for which this does not hold. Let us choose $\varepsilon = \frac{d(r_1,r_2)}{4} = \frac{|r_1 - r_2|}{4}$, so that |r1 − r2| = 4ε. Since r1 is a limit,

∃N1, ∀n > N1, |xn − r1| < ε

and since r2 is a limit,

∃N2, ∀n > N2, |xn − r2| < ε.

Let N = max{N1,N2}. Then

∀n > N, |xn − r1|+ |xn − r2|< 2ε.

By triangle inequality,

4ε = |r1 − r2|= |(xn − r2)− (xn − r1)|⩽ |xn − r1|+ |xn − r2|< 2ε

which is a contradiction.

Remark 5.2. A sequence can have more than one limit point.


5.5.2 Some Results on Sequences

(a) Every convergent sequence is bounded BUT a bounded sequence may not be convergent. For example, {1,−1,1,−1,···}.

(b) If xn → x and yn → y, then

$$x_n + y_n \to x + y, \qquad x_n \cdot y_n \to x \cdot y,$$

and if yn ≠ 0 ∀n ∧ y ≠ 0,

$$\frac{x_n}{y_n} \to \frac{x}{y}.$$

(c) Weak inequalities are preserved in the limit.

If xn → x and xn ⩾ b (respectively xn ⩽ b, xn > b, xn < b) ∀n ∈ N, then x ⩾ b (respectively x ⩽ b, x ⩾ b, x ⩽ b).

(d) x is a limit point of {xn} if and only if ∃ a subsequence $\{x_{n(k)}\}_{k=1}^{\infty}$ of the sequence {xn} such that $x_{n(k)} \to x$.

(e) A sequence of vectors $x_n = \begin{pmatrix} x_n^1 \\ x_n^2 \\ \vdots \\ x_n^N \end{pmatrix} \in \mathbb{R}^N$ converges to a limit $x = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^N \end{pmatrix}$ if and only if $x_n^i \to x^i$, ∀i = 1,2,···,N.

(f) Every convergent sequence is also a Cauchy sequence.

Definition 5.29. A vector space in which every Cauchy sequence has a limit is called a complete vector space.

5.6 Sets in Rn

Now we are ready for additional useful concepts in set theory. We begin with some definitions.


Definition 5.30. A set A on the real line is bounded if ∃B ∈ R ϶ ∀x ∈ A,∥x∥⩽ B.

Theorem 5.4. For every non-empty bounded set A ⊂ R, ∃ a real number sup A such that

(a) sup A is an upper bound for A,

∀ x ∈ A, x ⩽ sup A

(b) If y is any upper bound for A then

y ⩾ sup A

or sup A is the least upper bound for A.

Similarly inf A is the greatest lower bound.

Example 5.12. For the sets

A = [0,1], B = (0,1), C = [0,1), D = (0,1],

sup = 1, inf = 0.

This example shows that sup and inf of a set need not belong to the set. If sup A belongs to the set A, it is called max A and if inf A belongs to the set A, it is called min A.

Definition 5.31. Point x is a limit point of a set A if every neighborhood of x contains a point of A different from x: x is a limit point of A if

∀ε > 0, ∃y ∈ A ϶ y ≠ x ∧ d(x,y) < ε.

Theorem 5.5. Bolzano-Weierstrass Theorem for sets. Every bounded infinite set has at least one limit point.

Example 5.13. For the set A = (0,1), x = 0 is a limit point of the set A.

This shows that a limit point of a set need not belong to the set.

Theorem 5.6. Point x is a limit point of set A ⊆ Rn if ∃ a sequence

{xn} ϶ ∀n ∈ N, xn ≠ x ∧ xn ∈ A ∧ xn → x.

Definition 5.32. An open ball in Rn centered at x with radius ε > 0 is

$$B_\varepsilon(x) = \{y \in \mathbb{R}^n \mid d(x,y) < \varepsilon\}.$$

Note that the open ball does not include its boundary points.


Figure 5.7: Open ball Br(x) in R2

Example 5.14. An open ball in R2 centered at x = (0,0) with radius 1 is $\{y \in \mathbb{R}^2 \mid y_1^2 + y_2^2 < 1\}$.

Definition 5.33. The set A is open if

∀x ∈ A,∃r > 0 ϶ Br (x)⊆ A.

Around any point in an open set, one can draw an open ball which is completely contained in the set.

Example 5.15. The following sets are open:

A = (0,1) ∪ (5,10); B = (−∞,0); R; ∅.

Definition 5.34. The set A is closed if A contains all its limit points. (contains its borders)

Theorem 5.7. Set A ⊆ Rn is closed if and only if AC is open.

Example 5.16. The following sets are closed:

A = [2,5], since AC = (−∞,2) ∪ (5,∞) is open; R; ∅.

There are two sets which are both open and closed: the empty set and the universal set. The empty set ∅ is open since

int ∅ = ∅

and ∅ is closed since

bd ∅ = ∅ ⊆ ∅.

The universal set is the complement of the empty set and so is both open and closed. There can be sets which are neither open nor closed: A = (0,1]. The following theorem characterizes closed sets using convergent sequences.

Theorem 5.8. A set A ⊆ Rn is closed if and only if every convergent sequence of points xn ∈ A has its limit x ∈ A.

Example 5.17. The budget set

$$B(p,I) = \{y \in \mathbb{R}^n_{+} \mid p \cdot y \leqslant I\},$$

where p ∈ Rn++ and I ∈ R++, is closed.

Proof. Take any sequence {xn} ϶ xn ∈ B(p,I) ∀n ∧ xn → x. Since weak inequalities are preserved in the limit,

xn ⩾ 0, ∀n ⇒ x ⩾ 0,

p · xn ⩽ I, ∀n ⇒ p · x ⩽ I

⇒ x ∈ B(p,I) ⇒ B(p,I) is closed.

Theorem 5.9.

(a) The union of any number of open sets is open.

(b) The intersection of a finite number of open sets is open.

(c) A singleton set is a closed set.

(d) The union of a finite number of closed sets is closed.

(e) The intersection of any number of closed sets is closed.

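Remark 5.3. The restriction to a finite number of sets in (b) and (d) is necessary, as the following examples show.

For (b), $A_n = \left(-\frac{1}{n}, \frac{1}{n}\right)$, n ∈ N, $\bigcap_{n=1}^{\infty} A_n = \{0\}$, which is closed and not open.

For (d), $B_n = \left[\frac{1}{n}, 2\right]$, n ∈ N, $\bigcup_{n=1}^{\infty} B_n = (0,2]$, which is not closed.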


Figure 5.8: Budget set B(p, I) (axes: Good 1 and Good 2, with intercepts M/P1 and M/P2; |Slope| = p1/p2)

Figure 5.9: A Non-convex Set

Definition 5.35. A set A ⊆ Rn is compact if and only if A is closed and bounded.

Example 5.18.

A = [1,2] is compact.

R is closed but not bounded. NOT compact.

B = (1,2] is bounded but not closed. NOT compact.

Definition 5.36. A set A ⊆ Rn is compact if every sequence of points xn ∈ A has a limit point x ∈ A.

Definition 5.37. A set A ⊆ Rn is convex if ∀x,y ∈ A,∀λ ∈ (0,1),

λx+(1−λ)y ∈ A.
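The definition of convexity can be illustrated on the budget set of Example 5.17. The following Python sketch is an illustration only: the function names, the prices p = (1, 2), the income 10 and the test points are all assumptions chosen for the example, and checking a handful of values of λ is evidence rather than a proof.

```python
def in_budget_set(y, p, income):
    """Membership test for B(p, I): y >= 0 componentwise and p . y <= I."""
    return all(yi >= 0 for yi in y) and sum(pi * yi for pi, yi in zip(p, y)) <= income


def convex_combination(x, y, lam):
    """Return lam * x + (1 - lam) * y, computed componentwise."""
    return [lam * a + (1 - lam) * b for a, b in zip(x, y)]


p, income = [1.0, 2.0], 10.0
x, y = [4.0, 1.0], [1.0, 3.0]

# Both bundles are affordable, and so is every sampled convex combination.
print(in_budget_set(x, p, income), in_budget_set(y, p, income))
print(all(in_budget_set(convex_combination(x, y, k / 10), p, income) for k in range(1, 10)))


def in_punctured_interval(t):
    """The set [-1, 1] with 0 removed, which is not convex."""
    return 0 < abs(t[0]) <= 1


midpoint = convex_combination([-1.0], [1.0], 0.5)
print(in_punctured_interval([-1.0]), in_punctured_interval([1.0]), in_punctured_interval(midpoint))
```

The last line shows two points of a set whose midpoint lies outside the set, which is what a non-convex set looks like in terms of the definition.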

It will be useful to draw some sets to differentiate between convex and non-convex sets.


Figure 5.10: Not a convex set

Figure 5.11: A Convex Set

Chapter 6

Problem Set 2

(1) Verify that the following are distance functions (metrics).

(a) The Manhattan distance, for x,y ∈ Rn

$$d(x,y) = \sum_{i=1}^{n} |x_i - y_i| \quad \forall x,y \in \mathbb{R}^n \tag{6.1}$$

(b) For x,y ∈ R2,

$$d(x,y) = \max\{|x_1 - y_1|, |x_2 - y_2|\} \tag{6.2}$$

(c) Let d(·, ·) be a metric, then

$$d_1(x,y) = \frac{d(x,y)}{1 + d(x,y)}. \tag{6.3}$$

(2) Determine whether

$$\bigcup_{n=1}^{\infty} \left[\frac{1}{n}, \frac{2}{n}\right] \tag{6.4}$$

is compact.

(3) Show that the limit of $\left\{\frac{1}{n}\right\}_{n=1}^{\infty}$ is 0.

(4) Which of the following is true? Prove or give a counterexample.

(a) (A∪B)c ⊆ Ac ∪Bc

(b) (A∪B)c ⊇ Ac ∪Bc.



(5) Show that

$$C = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\} \tag{6.5}$$

is not a vector space.

(6) Prove

(J∩K)c = Jc ∪Kc

(J∪K)c = Jc ∩Kc

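(7) Consider sequences $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ such that $\{x_n\}_{n=1}^{\infty} \to x$ and $\{y_n\}_{n=1}^{\infty} \to y$. Show that $\{x_n + y_n\}_{n=1}^{\infty} \to x + y$.

(8) Let xn = n, n ∈ N. Show that $\{x_n\}_{n=1}^{\infty}$ is not convergent.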

(9) Prove that every Cauchy sequence xn is bounded.

(10) Prove that every sequence has at most one limit (i.e., prove that if a sequence has alimit, that limit is unique).

(11) Prove or disprove: A monotone sequence is convergent if and only if it is bounded.

(12) Determine whether the following sets are open, closed, neither or both:

(i) S = (0,1);

(ii) S = [0,1];

(iii) S = R;

(iv) S = [0,1);

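(v) $S = \left\{(x_1,x_2) \in \mathbb{R}^2_{+} \mid x_2 \geq \frac{1}{x_1}\right\}$;

(vi) $S = \left\{(x_1,x_2) \in \mathbb{R}^2_{++} \mid x_2 > \frac{1}{x_1}\right\}$.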


(b) (i) Non-negativity is obvious as the maximum of two absolute values is non-negative. If x = y, d(x,y) = 0. Also

d(x,y) = max{|x1 − y1|, |x2 − y2|} = 0 ⇒ |x1 − y1| = 0 = |x2 − y2| ⇒ x = y.

(ii) Symmetry is obvious too since the absolute value function is symmetric:

|a − b| = |b − a|.

(iii) Triangle Inequality: Note that max{a,b} ⩾ a and max{a,b} ⩾ b. Using this we have

d(x,y) ⩾ |x1 − y1| and d(x,y) ⩾ |x2 − y2|

d(y,z) ⩾ |y1 − z1| and d(y,z) ⩾ |y2 − z2|

d(x,y) + d(y,z) ⩾ |x1 − y1| + |y1 − z1| ⩾ |x1 − z1|

d(x,y) + d(y,z) ⩾ |x2 − y2| + |y2 − z2| ⩾ |x2 − z2|.

It follows that

d(x,y) + d(y,z) ⩾ max{|x1 − z1|, |x2 − z2|} = d(x,z).

Hence it is a distance function.

(c) (i) Non-negativity: d(x,y) ≥ 0 for all x,y in Rn, and thus 1 + d(x,y) ≥ 1 for all x,y in Rn. As a result, d1(x,y) ≥ 0 for all x,y in Rn. By the definition of d1(x,y), d1(x,y) = 0 if and only if d(x,y) = 0. But d(x,y) = 0 if and only if x = y.

(ii) Since d(x,y) = d(y,x), it is straightforward to see that d1(x,y) = d1(y,x).

(iii)

$$d_1(x,z) \leq d_1(x,y) + d_1(y,z)$$

$$\Leftrightarrow \frac{d(x,z)}{1 + d(x,z)} \leq \frac{d(x,y)}{1 + d(x,y)} + \frac{d(y,z)}{1 + d(y,z)}$$

$$\Leftrightarrow d(x,z)[1 + d(x,y)][1 + d(y,z)] \leq d(x,y)[1 + d(x,z)][1 + d(y,z)] + d(y,z)[1 + d(x,y)][1 + d(x,z)]$$

$$\Leftrightarrow d(x,z) \leq d(x,y) + d(y,z) + 2d(x,y)d(y,z) + d(x,y)d(y,z)d(x,z)$$

Since d(x,y) + d(y,z) ≥ d(x,z) and d(a,b) ≥ 0 for any (a,b) ∈ Rn×Rn, the last inequality is always true. Thus d1(x,z) ≤ d1(x,y) + d1(y,z) for all x,y,z in Rn.

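(2) It is bounded. Take B = 2: ∥x∥ ⩽ 2, $\forall x \in \bigcup_{n=1}^{\infty} \left[\frac{1}{n}, \frac{2}{n}\right]$. But it is NOT closed as

$$\bigcup_{n=1}^{\infty} \left[\frac{1}{n}, \frac{2}{n}\right] = (0,2].$$

So it is not compact.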

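(3) Let ε > 0 be given. We have to find N such that

$$\forall n > N, \quad d\left(\frac{1}{n}, 0\right) = \left|\frac{1}{n}\right| < \varepsilon \Leftrightarrow \frac{1}{n} < \varepsilon \Leftrightarrow n > \frac{1}{\varepsilon}.$$

So by choosing N to be any natural number greater than 1/ε, we have

$$\forall n > N, \quad d\left(\frac{1}{n}, 0\right) = \frac{1}{n} < \frac{1}{N} < \varepsilon.$$

Hence 1/n → 0.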

(4) (a) (A∪B)c ⊆ Ac ∪ Bc is TRUE. Let x ∈ (A∪B)c

⇒ x ∉ (A∪B)
⇒ x ∉ A ∧ x ∉ B
⇒ x ∈ Ac ∧ x ∈ Bc
⇒ x ∈ Ac ∪ Bc.

(b) (A∪B)c ⊇ Ac ∪ Bc is FALSE. Let x ∈ Ac ∪ Bc and let x ∈ Ac ∧ x ∉ Bc

⇒ x ∉ A ∧ x ∈ B
⇒ x ∈ A∪B
⇒ x ∉ (A∪B)c.


(5) It is enough to show that one of the ten properties of the vector space is not satisfied by this space. Take scalar multiplication by 2. Let (x1,x2) ∈ C and let α = 2 be a scalar. Then (2x1,2x2) ∈ R2 satisfies

$$(2x_1)^2 + (2x_2)^2 = 4(x_1^2 + x_2^2) = 4 \neq 1.$$

Hence (2x1,2x2) ∉ C and so C is not a vector space.
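The failure of closure under scalar multiplication can be seen on a concrete point of the circle. This is a minimal Python sketch; the helper name `on_unit_circle`, the tolerance, and the angle 0.7 are arbitrary choices made for illustration.

```python
import math


def on_unit_circle(point, tol=1e-12):
    """Test x1^2 + x2^2 = 1 up to a small floating-point tolerance."""
    x1, x2 = point
    return abs(x1 ** 2 + x2 ** 2 - 1.0) <= tol


p = (math.cos(0.7), math.sin(0.7))  # a point of C
q = (2 * p[0], 2 * p[1])            # its scalar multiple with alpha = 2

print(on_unit_circle(p))  # True
print(on_unit_circle(q))  # False
```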

(6) (a) (J∩K)c = Jc ∪ Kc. We split the proof in two parts.

(i) (J∩K)c ⊆ Jc ∪ Kc. (7.1)

Let x ∈ (J∩K)c

⇒ x ∉ (J∩K)
⇒ x ∉ J ∨ x ∉ K
⇒ x ∈ Jc ∨ x ∈ Kc
⇒ x ∈ Jc ∪ Kc.

(ii) Next, Jc ∪ Kc ⊆ (J∩K)c.

Let x ∈ Jc ∪ Kc

⇒ x ∈ Jc ∨ x ∈ Kc
⇒ x ∉ J ∨ x ∉ K
⇒ x ∉ J∩K
⇒ x ∈ (J∩K)c.

(b) (J∪K)c = Jc ∩ Kc.

(i) (J∪K)c ⊆ Jc ∩ Kc. (7.2)

Let x ∈ (J∪K)c

⇒ x ∉ (J∪K)
⇒ x ∉ J ∧ x ∉ K
⇒ x ∈ Jc ∧ x ∈ Kc
⇒ x ∈ Jc ∩ Kc.

(ii) Next, Jc ∩ Kc ⊆ (J∪K)c.

Let x ∈ Jc ∩ Kc

⇒ x ∈ Jc ∧ x ∈ Kc
⇒ x ∉ J ∧ x ∉ K
⇒ x ∉ J∪K
⇒ x ∈ (J∪K)c.

(7) We need to show that ∀ ε > 0 ∃ N such that

∀n > N, |(xn + yn)− (x+ y)|< ε.

Note

|xn + yn − x − y| = |xn − x + yn − y| ⩽ |xn − x| + |yn − y|,

by the triangle inequality. Since

xn → x, ∃N1 s.t. ∀n > N1, |xn − x| < ε/2

yn → y, ∃N2 s.t. ∀n > N2, |yn − y| < ε/2.

Let N = max{N1,N2}. Hence

∀n > N, |xn − x| + |yn − y| < ε/2 + ε/2 = ε,

⇒ |xn + yn − x − y| < ε.

So {xn + yn} → x + y.

(8) We know that if a sequence is convergent then it is bounded. The contrapositive statement is, “If a sequence is not bounded then it is not convergent.” The sequence xn = n, n ∈ N is NOT bounded. No matter which B we choose as a bound, there will be a natural number greater than it. We now use the contrapositive to conclude that $\{x_n\}_{n=1}^{\infty}$ is not convergent.

(9) Since {xn} is a Cauchy sequence, for every ε > 0 there exists N ∈ N such that m,n > N implies |xn − xm| < ε. Choose ε = 1 and m = N + 1; then

|xn − xN+1| < 1 ⇒ |xn| < 1 + |xN+1|, ∀n > N.

Let

B = max{|x1|, |x2|, ···, |xN|, 1 + |xN+1|},

then |xn| ⩽ B, ∀n ∈ N.

(10) We prove this claim by contradiction. We use the intuition that all points end up being close to, say, r1 and r2 at the same time, which is not possible. Let the sequence {xn} converge to two distinct limits r1 and r2. It is enough to show that there is one ε for which this does not hold. Let us choose $\varepsilon = \frac{d(r_1,r_2)}{4} = \frac{|r_1 - r_2|}{4}$, so that |r1 − r2| = 4ε. Since r1 is a limit,

∃N1, ∀n > N1, |xn − r1| < ε

and since r2 is a limit,

∃N2, ∀n > N2, |xn − r2| < ε.

Let N = max{N1,N2}. Then

∀n > N, |xn − r1|+ |xn − r2|< 2ε.

By triangle inequality,

4ε = |r1 − r2|= |(xn − r2)− (xn − r1)|⩽ |xn − r1|+ |xn − r2|< 2ε

which is a contradiction.

(11) We consider a monotone increasing sequence, xn ⩽ xn+1. The proof is analogous for the monotone decreasing case. Let {xn} be a convergent sequence and let $\lim_{n\to\infty} x_n = x$. From the definition of convergence, with ε = 1, we get N ∈ N such that n > N implies |xn − x| < 1. Then,

|xn| < 1 + |x|, ∀n > N.

Let

B = max{|x1|, |x2|, ···, |xN|, 1 + |x|},

then |xn| ⩽ B, ∀n ∈ N, so the sequence is bounded. Now let the sequence be bounded. Let x be the least upper bound. Then xn ⩽ x ∀n ∈ N. For every ε > 0, there exists an N ∈ N such that x − ε < xN ⩽ x. Otherwise x − ε would be an upper bound for the sequence. Since {xn} is increasing, n ⩾ N implies

x − ε < xn ⩽ x

which shows that {xn} converges to x.


(12) (i) S = (0,1) Open: For any x ∈ (0,1), the open ball with radius min{x, 1 − x} is contained in S.

(ii) S = [0,1] Closed: Use the theorem: A set S ⊆ Rn is closed if and only if every convergent sequence of points xn ∈ S has its limit x ∈ S. Let {xn} be a convergent sequence contained in S with limit x; then for all n, xn ⩾ 0 and xn ⩽ 1. Since weak inequalities are preserved in the limit, x ⩽ 1 and x ⩾ 0. So x ∈ S and S is closed.

(iii) S = R: Both open and closed: Use the result in the notes that the empty set is both open and closed and R is the complement of the empty set.

(iv) S = [0,1): Neither open nor closed: It is not closed since the limit of the convergent sequence $\left\{1 - \frac{1}{n}\right\}$ is not contained in S, and it is not open since x = 0 is contained in S but it is not possible to have an open ball with centre x = 0 which is contained in S.

(v) $S = \left\{(x_1,x_2) \in \mathbb{R}^2_{+} \mid x_2 \geq \frac{1}{x_1}\right\}$ Closed. We will use the following result.

Result 2. Let f : Rm+ → R be a continuous function on Rm+. Then the set

C = {x ∈ Rm+ : f(x) ≥ 0} (1)

is closed in Rm.

Proof. Take an arbitrary convergent sequence {xn} of points in C, with limit z ∈ Rm. That is, x1, x2, x3, ··· belong to C, and

$$\lim_{n\to\infty} x_n = z \tag{2}$$

We have to show that z ∈ C. Since xn ∈ C for each n = 1,2,3,···, we have xn ≥ 0 for each n = 1,2,3,···. Since (2) holds, and weak inequalities are preserved in the limit, we have z ≥ 0; that is, z ∈ Rm+. Since f is continuous on Rm+, given any ε > 0, there is δ > 0, such that whenever d(x,z) < δ, we have

|f(x) − f(z)| < ε. (3)

Using this δ > 0, we can find a positive integer N, such that whenever n > N, we have d(xn,z) < δ, since (2) holds. Thus, using (3), for all n > N, we must have |f(xn) − f(z)| < ε. This implies that the sequence {f(xn)} is convergent with limit f(z). Since xn ∈ C for each n = 1,2,3,···, we have f(xn) ≥ 0 for each n = 1,2,3,···. Since {f(xn)} is convergent with limit f(z), and weak inequalities are preserved in the limit, we have f(z) ≥ 0. We have now shown that z ∈ Rm+ and f(z) ≥ 0, so z ∈ C. For our problem, let f(x) = x1x2 − 1, which is continuous on R2+.


(vi) $S = \left\{(x_1,x_2) \in \mathbb{R}^2_{++} \mid x_2 > \frac{1}{x_1}\right\}$ Open:

Consider two sets A and B open in Rn. We show first that A ∩ B is open in Rn. Take an arbitrary x ∈ A ∩ B. Now, since x ∈ A and A is open in Rn, there exists rA > 0 such that B(x,rA) ⊂ A. And since x ∈ B and B is open in Rn, there exists rB > 0 such that B(x,rB) ⊂ B. Take r = min{rA,rB} > 0. Then B(x,r) ⊂ B(x,rA) ⊂ A and B(x,r) ⊂ B(x,rB) ⊂ B. Therefore B(x,r) ⊂ A ∩ B, so A ∩ B is open in Rn. We can write S = A1 ∩ A2 where

A1 = {(x1,x2) ∈ R2 | x1 > 0, x2 > 0}

A2 = {(x1,x2) ∈ R2 | x1x2 > 1}

Note that A2 contains some points where x1 < 0 and x2 < 0. We want to show that both A1 and A2 are open in R2, which will show by the intersection result just proved that S is open in R2.

Claim 6. A1 is open in R2.

Proof. Let $\bar{x} \in A_1$ and take $r = \min\{\bar{x}_1, \bar{x}_2\} > 0$. We want to show $B(\bar{x},r) \subset A_1$. This is the same as showing that for any x ∉ A1, we have $x \notin B(\bar{x},r)$. So, consider some x ∉ A1 and assume without loss of generality that x1 ≤ 0. Then we have $x_1 \leq 0 < r \leq \bar{x}_1$, so $\bar{x}_1 - x_1 \geq r$. That means that $d(x,\bar{x}) = \sqrt{(x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2} \geq r$, so we have $x \notin B(\bar{x},r)$, which is what we wanted to show. Therefore A1 is open.

Claim 7. A2 is open in R2.

Proof. To show that A2 is open in R2, it is easiest to show that ∼A2 is closed in R2. First, define the function f(x) = 1 − x1x2 for all x ∈ R2 and note that f is continuous on R2. Then we can write ∼A2 as

∼A2 = {(x1,x2) ∈ R2 | x1x2 ≤ 1} = {(x1,x2) ∈ R2 | f(x) ≥ 0}

By the result on closed sets used in (v), ∼A2 is closed in R2. Therefore A2 is open in R2.

Since S = A1 ∩ A2 is the intersection of two open sets, S is open in R2, and hence in R2++.

Chapter 8

Linear Algebra

Linear algebra is the branch of mathematics dealing with (among many other things) matrices and vectors. It’s intuitively easy to see why linear algebra is important for econometrics and statistics. Economic data is arranged in matrix format (rows corresponding to observations, columns corresponding to variables), so the body of theory governing matrices should help us analyze data. It is harder to see the connection between matrix theory and the optimization that we do in micro theory, but there are some important links. We’ll cover the basics and some of the necessary detail here, but more detailed coverage will be offered in the core courses.

8.1 Vectors

You may be familiar with vectors from physics courses, in which a vector is a pair giving the magnitude and direction of a moving body. The vectors we use in economics are more general, in that they can have any finite number of elements (rather than just 2), and the meaning of each element can vary with the context (rather than always signifying magnitude and direction). Formally speaking a vector can be defined as a member of a vector space, but we don’t need to deal with such a definition here. For our purposes:

Definition 8.1. A vector is an ordered array of elements with either one row or one column.

The elements are usually numbers. A vector is an n×k matrix for which either n = 1, k = 1 or both (see the definition of a matrix below). A general vector, for which the number of elements is not specified but left as n, will sometimes be called an “n-vector”. We also refer to these as “vectors in Rn”. A vector can be written in either row or column form:


Row Vector: $x \in \mathbb{R}^n = \begin{pmatrix} x_1 & x_2 & \ldots & x_n \end{pmatrix}$; Column Vector: $x \in \mathbb{R}^n = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$.

Although you will sometimes be able to switch between thinking of a vector as a row or a column without restriction, there are certain operations that require a vector to be oriented in a certain way, so it is good to distinguish between row and column vectors whenever possible. Most people use x to refer to the vector in column form and x′ to refer to it in row form, but this is not universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.

8.1.1 Special Vectors

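Null vector: $0_{n\times 1} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$

Sum vector: $u_{n\times 1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$

Unit vector: In Rn there are n unit vectors.

The ith unit vector, called ei, has all elements 0 except for the ith, which is equal to 1. The definition of a unit vector is specific to the vector space in which it sits. For example:

$$e_2 \in \mathbb{R}^3 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{8.1}$$

and

$$e_2 \in \mathbb{R}^4 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \tag{8.2}$$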


8.1.2 Vector Relations and Operations

Definition 8.2.

(a) Equality:

Vectors x ∈ Rn, y ∈ Rm are equal if n = m and xi = yi ∀ i.

(b) Inequalities: ∀x,y ∈ Rn:

x ≥ y if xi ≥ yi ∀ i = 1,···,n;

x > y if xi ≥ yi ∀ i = 1,···,n and xi > yi for at least one i;

x ≫ y if xi > yi ∀ i = 1,···,n.

(c) Addition:

∀x,y ∈ Rn, x + y = z ∈ Rn where zi = xi + yi, ∀i.

(d) Scalar Multiplication:

∀x ∈ Rn, and α ∈ R, we define the scalar product as

$$\alpha x = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix} \tag{8.3}$$

(e) Vector Multiplication: This is essentially an inner product rule applied to Rn. See the rules for matrix multiplication below, as they also apply for vectors.
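The three vector inequalities in part (b) of Definition 8.2 differ only in how many components must be strictly larger. A minimal Python sketch follows; the function names are hypothetical and chosen to mirror the symbols ≥, > and ≫.

```python
def _same_length(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of elements")


def weakly_greater(x, y):
    """x >= y: every component of x is at least the matching component of y."""
    _same_length(x, y)
    return all(a >= b for a, b in zip(x, y))


def greater(x, y):
    """x > y: x >= y and at least one component is strictly larger."""
    return weakly_greater(x, y) and any(a > b for a, b in zip(x, y))


def strictly_greater(x, y):
    """x >> y: every component of x is strictly larger."""
    _same_length(x, y)
    return all(a > b for a, b in zip(x, y))


x, y, z = (2, 3), (2, 1), (1, 0)
print(weakly_greater(x, y), greater(x, y), strictly_greater(x, y))  # True True False
print(strictly_greater(x, z))                                       # True
print(weakly_greater(x, x), greater(x, x))                          # True False
```

Note that, unlike real numbers, two vectors need not be comparable at all: for (1, 0) and (0, 1), neither is ≥ the other.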

8.2 Matrices

Definition 8.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).

A matrix is characterized as n×k when it has n rows and k columns. To represent the n×k matrix A, we can write:

$$[A]_{n\times k} = [a_{ij}]_{n\times k} = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1k} \\ a_{21} & a_{22} & \ldots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nk} \end{pmatrix}$$

The matrix An×k is a null matrix if aij = 0 for i = 1,···,n, j = 1,···,k.

The matrix An×k is square if n = k. In this case we refer to it as an n×n matrix.

The square matrix An×n is symmetric if aij = aji ∀ i, j.

The square matrix An×n is diagonal if aij = 0 whenever i ≠ j.

The square matrix An×n is an identity matrix if aij = 1 whenever i = j and aij = 0 otherwise.

The square matrix An×n is lower triangular if aij = 0 ∀ i < j.

The square matrix An×n is upper triangular if aij = 0 ∀ i > j.

It’s worth it to check your understanding of each of the above definitions by writing out a matrix that satisfies each. Then note this next definition carefully:

The k×n matrix B is called the transpose of An×k if bij = aji ∀ i, j.

We write the transpose of A as either AT or A′. If A is symmetric then A′ = A. This is an obvious statement, but you could try proving it formally. It should only take a few lines.

8.2.1 Matrix Operations

Addition

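Matrix addition is only defined for matrices of the same size. If A is n×k and B is n×k then

$$A + B = C_{n\times k} \tag{8.4}$$

where

$$c_{ij} = a_{ij} + b_{ij} \quad \forall\, i = 1,\cdots,n,\ j = 1,\cdots,k. \tag{8.5}$$

We say that matrix addition occurs “element wise” because we move through each element of the matrix A, adding the corresponding element from B.

Scalar Multiplication

Scalar multiplication is also an element wise operation. That is,

$$\forall \lambda \in \mathbb{R}, \quad \lambda \cdot [A]_{n\times k} = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1k} \\ \lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda a_{n1} & \lambda a_{n2} & \cdots & \lambda a_{nk} \end{pmatrix} \tag{8.6}$$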


Matrix Multiplication

Matrix multiplication is defined for matrices $[A]_{m\times j}$ and $[B]_{n\times k}$ if $j = n$ or $m = k$. That is, the number of columns in one of the matrices must be equal to the number of rows in the other. If matrices $A$ and $B$ satisfy this condition, so that $A$ is $m \times j$ and $B$ is $j \times k$, their product $[C]_{m\times k} \equiv [A]_{m\times j} \cdot [B]_{j\times k}$ is given by $c_{ij} = A_i \cdot B_j$, where $A_i$ is the $i$th row of $A$ and $B_j$ is the $j$th column of $B$. For example, suppose

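\[
[A]_{2\times 2} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\quad\text{and}\quad
[B]_{2\times 3} = \begin{bmatrix} 6 & 5 & 4 \\ 3 & 2 & 1 \end{bmatrix}
\]

Multiplication between $A$ and $B$ is only defined if $A$ is on the left and $B$ is on the right. It must always be the case that the number of columns in the left hand matrix is the same as the number of rows in the right hand matrix. In this case, if we say $AB = C$, then element

\[
c_{11} = \begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 1\cdot 6 + 2\cdot 3 = 12
\]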

Likewise

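\begin{align*}
c_{12} &= 1\cdot 5 + 2\cdot 2 = 9 \\
c_{13} &= 1\cdot 4 + 2\cdot 1 = 6 \\
c_{21} &= 3\cdot 6 + 4\cdot 3 = 30 \\
c_{22} &= 3\cdot 5 + 4\cdot 2 = 23 \\
c_{23} &= 3\cdot 4 + 4\cdot 1 = 16
\end{align*}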

which gives

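\[
[A]_{2\times 2} \cdot [B]_{2\times 3} = [C]_{2\times 3} = \begin{bmatrix} 12 & 9 & 6 \\ 30 & 23 & 16 \end{bmatrix}
\]

Note that matrix multiplication is not a symmetric operation. In general, $AB \neq BA$, and in fact it is often the case that the operation will only be defined in one direction. In our example $BA$ is not defined because the number of columns of $B$ (3) is not equal to the number of rows of $A$ (2). For both $AB$ and $BA$ to be defined, $A$ must be $n \times k$ and $B$ must be $k \times n$, so that

\[
[A]_{n\times k} \cdot [B]_{k\times n} = [C]_{n\times n}
\quad\text{and}\quad
[B]_{k\times n} \cdot [A]_{n\times k} = [D]_{k\times k}.
\]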
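The row-by-column rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `matmul` and the list-of-rows representation are assumptions made here, not part of the notes.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    if len(A[0]) != len(B):
        # Columns of the left matrix must match rows of the right matrix.
        raise ValueError("columns of left matrix must equal rows of right matrix")
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]          # 2 x 2
B = [[6, 5, 4], [3, 2, 1]]    # 2 x 3
C = matmul(A, B)              # the 2 x 3 product AB from the example
print(C)

try:
    matmul(B, A)              # BA: 3 columns against 2 rows
    ba_defined = True
except ValueError:
    ba_defined = False
print("BA defined?", ba_defined)
```

Each entry of `C` is the inner product of a row of `A` with a column of `B`, which is exactly the computation of $c_{11}, \dots, c_{23}$ carried out by hand above.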


8.2.2 Some Fun facts about matrix multiplication

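(i) Even if $n = k$, in general $AB \neq BA$.

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},\quad
B = \begin{bmatrix} 0 & -1 \\ 6 & 7 \end{bmatrix},\quad
AB = \begin{bmatrix} 12 & 13 \\ 24 & 25 \end{bmatrix},\quad
BA = \begin{bmatrix} -3 & -4 \\ 27 & 40 \end{bmatrix}.
\]

(ii) $AB$ may be the null matrix even when $A \neq 0$ and $B \neq 0$.

\[
A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix},\quad
B = \begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix},\quad
AB = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]

(iii) $CD = CE \nRightarrow D = E$, even when $C \neq 0$.

\[
C = \begin{bmatrix} 2 & 3 \\ 6 & 9 \end{bmatrix},\quad
D = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},\quad
E = \begin{bmatrix} -2 & 1 \\ 3 & 2 \end{bmatrix},\quad
CD = CE = \begin{bmatrix} 5 & 8 \\ 15 & 24 \end{bmatrix}.
\]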

8.2.3 Rules for matrix operations

A+B = B+A

A+(B+C) = (A+B)+C

(AB)C = A(BC)

(A+B)C = AC+BC

A(B+C) = AB+AC

Check that you have a clear understanding of the restrictions needed on the number of rows and columns of $A$, $B$ and $C$ in order for the above to work. More matrix rules, involving the transpose:

(A′)′ = A (8.7)

(A+B)′ = A′+B′ (8.8)

(AB)′ = B′A′ (8.9)

Note the reversal of the order of the matrices in the last operation.


8.2.4 Rank of a matrix

Definition 8.4. A set of vectors $x_1, \dots, x_n$ in $\mathbb{R}^m$ is linearly dependent if there exist $\lambda_1, \dots, \lambda_n$, not all zero, such that

\[
\lambda_1 x_1 + \cdots + \lambda_n x_n = 0. \tag{8.10}
\]

Definition 8.5. A set of vectors $x_1, \dots, x_n$ in $\mathbb{R}^m$ is linearly independent if it is not linearly dependent.

Definition 8.6. The rank of a matrix $A$ is the number of linearly independent column vectors of $A$. It is also equal to the number of linearly independent row vectors of $A$.

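Example 8.1. Let

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 2 & 4 & 6 \end{bmatrix}
\]

The first and the third columns are linearly dependent: each element of column 3 is three times the corresponding entry in column 1. Now take columns 1 and 2.

\[
\lambda_1 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\;\Leftrightarrow\;
\begin{cases}
\lambda_1 + 2\lambda_2 = 0 \\
\lambda_2 = 0 \\
2\lambda_1 + 4\lambda_2 = 0
\end{cases}
\;\Leftrightarrow\; \lambda_1 = 0,\ \lambda_2 = 0
\]

is the only solution. So the first two columns are linearly independent. We found two linearly independent columns, so the rank of matrix $A$ is 2. We could have done the exercise taking rows instead of columns and still got the same answer. (Please verify.)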
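One way to check a rank computation like the one in this example is row reduction. The sketch below is not the method used in the notes (which argue directly from linear independence); it uses Gaussian elimination with exact fractions, and the function name `rank` is an assumption made for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a non-zero entry in column c at or below row r.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

A = [[1, 2, 3], [0, 1, 0], [2, 4, 6]]
A_transposed = [list(col) for col in zip(*A)]
print(rank(A), rank(A_transposed))
```

Running it on `A` and on its transpose illustrates the claim in Definition 8.6 that counting independent columns or independent rows gives the same number.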

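Theorem 8.1. (i) $\operatorname{rank}\big([A]_{n\times k}\big) \leq \min\{\#\text{ rows}, \#\text{ columns}\} = \min\{n, k\}$;

(ii) $\operatorname{rank}(AB) \leq \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$, so in particular $\operatorname{rank}(AB) \leq \operatorname{rank}(A)$ and $\operatorname{rank}(AB) \leq \operatorname{rank}(B)$.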

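Definition 8.7. A square matrix $[A]_{n\times n}$ is called non-singular or of full rank if $\operatorname{rank}(A) = n$. It is called singular if $\operatorname{rank}(A) < n$.

Definition 8.8. A square matrix $[A]_{n\times n}$ is invertible if there exists $[B]_{n\times n}$ such that $[A]_{n\times n} \cdot [B]_{n\times n} = [B]_{n\times n} \cdot [A]_{n\times n} = [I]_{n\times n}$. Then $B$ is called the inverse of $A$.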


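8.2.5 Rules for the inverse:

\begin{align}
(A^{-1})^{-1} &= A \tag{8.11} \\
(AB)^{-1} &= B^{-1}A^{-1} \tag{8.12} \\
(A')^{-1} &= (A^{-1})' \tag{8.13}
\end{align}

Definition 8.9. A square matrix $[A]_{n\times n}$ is called orthogonal if $A^{-1} = A'$, i.e., $AA' = I$.

Theorem 8.2. $[A]_{n\times n}$ is invertible $\Leftrightarrow$ $[A]_{n\times n}$ is non-singular.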

8.3 Determinant of a matrix

The determinant is defined only for square matrices. The determinant is a function depending on $n$ that associates a scalar, $\det(A)$, to an $n \times n$ square matrix $A$. The determinant of a 1-by-1 matrix $A$ is the only entry of that matrix: $\det(A) = a_{11}$. The determinant of a 2-by-2 matrix

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is $\det(A) = ad - bc$.

Definition 8.10. The cofactor $A_{ij}$ of the element $a_{ij}$ is defined as $(-1)^{i+j}$ times the determinant of the sub matrix obtained from $A$ after deleting row $i$ and column $j$.

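Example 8.2. Let

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\]

Then

\begin{align*}
A_{11} &= (-1)^{1+1} \cdot 4 = 4, & A_{12} &= (-1)^{1+2} \cdot 3 = -3, \\
A_{21} &= (-1)^{2+1} \cdot 2 = -2, & A_{22} &= (-1)^{2+2} \cdot 1 = 1.
\end{align*}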

Definition 8.11. The determinant of an $n \times n$ matrix $A$ is given by

\[
\det(A) = \sum_{j=1}^{n} a_{1j} A_{1j} = \sum_{i=1}^{n} a_{i1} A_{i1}. \tag{8.14}
\]

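Example 8.3. Let

\[
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}.
\]

Then

\begin{align*}
\det(A) &= a(-1)^{1+1} \det\begin{bmatrix} e & f \\ h & i \end{bmatrix}
+ b(-1)^{1+2} \det\begin{bmatrix} d & f \\ g & i \end{bmatrix}
+ c(-1)^{1+3} \det\begin{bmatrix} d & e \\ g & h \end{bmatrix} \\
&= a(ei - fh) - b(di - fg) + c(dh - eg).
\end{align*}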
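The first-row cofactor expansion in (8.14) translates directly into a recursive function. The following is a minimal sketch, assuming matrices are stored as lists of rows; the names `minor` and `det` are illustrative choices, and the recursion is exponential in $n$, so it is only meant for small matrices.

```python
def minor(M, i, j):
    """Sub matrix of M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    # (-1) ** j is the sign (-1)^(1 + (j+1)) of the cofactor of entry (1, j+1).
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

print(det([[1, 2], [3, 4]]))                    # the 2 x 2 formula ad - bc
print(det([[2, 1, 0], [1, 3, 2], [0, 1, 1]]))   # a hypothetical 3 x 3 matrix
```

For the 3 x 3 case the function computes exactly $a(ei - fh) - b(di - fg) + c(dh - eg)$.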

8.3.1 Properties of Determinants

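(a) The determinant of a matrix equals the determinant of its transpose:

\[
\det(A) = \det(A') \tag{8.15}
\]

(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.

(c) Multiplication of any one row by a scalar $k$ will change the determinant $k$-fold.

(d) If one row is a multiple of another row, the determinant is zero.

(e) The addition of a multiple of one row to another row will leave the determinant unchanged.

(f) If $A$ and $B$ are $n \times n$ matrices, then

\[
\det(AB) = \det(A) \cdot \det(B).
\]

(g) Properties (b)-(e) are valid if we replace rows by columns everywhere.

For example, with $B$ obtained from $A$ by interchanging its two rows,

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},\ \det(A) = -2; \qquad
A' = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix},\ \det(A') = -2; \qquad
B = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix},\ \det(B) = 2.
\]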

Result 3. Let $A$ be an $n \times n$ upper triangular matrix, i.e., $a_{ij} = 0$ whenever $i > j$. The determinant of the matrix $A$ is given by:

\[
\det A = \prod_{i=1}^{n} a_{ii}
\]


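Proof. The matrix $A$ is upper triangular and is described as under:

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\
0 & a_{22} & \cdots & a_{2,n-1} & a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
0 & 0 & \cdots & 0 & a_{nn}
\end{bmatrix}
\]

We prove the result by induction.

(1) Base case: Let $n = 1$. If $A$ is a $1 \times 1$ matrix, then $\det A = a_{11} = \prod_{i=1}^{1} a_{ii}$ by the definition of a determinant.

(2) Inductive case: Let $n > 1$. Assume that for any $(n-1) \times (n-1)$ matrix $B$ with $b_{ij} = 0$ for all $i > j$, we have $\det B = \prod_{i=1}^{n-1} b_{ii}$. Now consider any $n \times n$ matrix $A$ with $a_{ij} = 0$ for all $i > j$. Expanding by the last row, and using $a_{n1} = \cdots = a_{n,n-1} = 0$, we have

\begin{align*}
\det A &= a_{n1} A_{n1} + \cdots + a_{nn} A_{nn} \\
&= a_{nn} (-1)^{n+n}
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1,n-1} \\
0 & a_{22} & \cdots & a_{2,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{n-1,n-1}
\end{vmatrix} \\
&= a_{nn} \prod_{i=1}^{n-1} a_{ii} \\
&= \prod_{i=1}^{n} a_{ii}
\end{align*}

where the third equality follows from the inductive hypothesis.

(3) The result holds for all $n$ by inductive conclusion.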

Using the above, we can also show the following result.

Result 4. The upper triangular square matrix $A$ is non-singular if and only if $a_{ii} \neq 0$ for each $i \in \{1, \dots, n\}$.

As an "if and only if" statement, this requires proofs in both directions.

Claim 8. If the upper triangular matrix $A$ is non-singular, then $a_{ii} \neq 0$ for all $i = 1, \dots, n$.

Proof. Let $A$ be non-singular. Then $A$ has an inverse, $A^{-1}$. Since $1 = \det I = \det(A^{-1}A) = (\det A^{-1})(\det A)$, we know that $\det A \neq 0$. If $a_{ii} = 0$ for any $i \in \{1, \dots, n\}$, then by Result 3 we would have $\det A = 0$, a contradiction. So it must be that $a_{ii} \neq 0$ for all $i = 1, \dots, n$.

Claim 9. If $A$ is upper triangular and $a_{ii} \neq 0$ for all $i = 1, \dots, n$, then $A$ is non-singular.

Proof. Let $a_{ii} \neq 0$ for all $i = 1, \dots, n$. Then by (a), $\det A \neq 0$. Seeking contradiction, suppose $A$ is singular. Without loss of generality, we can write the first column as $A_1 = \sum_{i=2}^{n} \alpha_i A_i$, where $A_i$ denotes the $i$th column of $A$. Let

\[
B = \begin{bmatrix} A_1 - \sum_{i=2}^{n} \alpha_i A_i & A_2 & \cdots & A_n \end{bmatrix}
= \begin{bmatrix} 0 & A_2 & \cdots & A_n \end{bmatrix}
\]

We know, by the properties of determinants, that $\det B = \det A$. But, expanding $B$ by the first column, we have $\det B = 0$. This gives $\det A = 0$, a contradiction. So we have that $A$ is non-singular.

8.4 An application of matrix algebra

We provide an application of matrix algebra: the Markov process, or Markov chain. Markov processes are used to measure movements over time. They involve the use of a Markov transition matrix. Each value in the transition matrix is the probability of moving from one state to another state. The process also specifies a vector containing the initial distribution across each of these states. By repeatedly multiplying the initial distribution vector by the transition matrix, we can estimate changes across states over time.

Consider the problem of movement of employees within a firm at different branches. In the simple case, we take two locations, namely Ithaca and Cortland, to demonstrate the basic elements of a Markov process.

To determine the number of employees in Ithaca tomorrow, we take the probability that the employees will stay in the Ithaca branch multiplied by the total number of employees currently in Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal to the total number of employees in Cortland multiplied by the probability of Cortland employees transferring to Ithaca.

We follow the same process to determine the number of employees in Cortland tomorrow, made up of the employees who chose to remain at Cortland and the Ithaca employees who transfer into Cortland.

There are four probabilities involved, which can be arranged in a Markov transition matrix.

Let $A_t$ and $B_t$ denote the populations of the Ithaca and Cortland locations at some time $t$. The transition probabilities are defined as follows.

$p_{AA} \equiv$ probability that a current A remains an A,

$p_{AB} \equiv$ probability that a current A moves to B,

$p_{BB} \equiv$ probability that a current B remains a B,

$p_{BA} \equiv$ probability that a current B moves to A.

The distribution of employees at time $t$ is denoted by the vector $x_t' = [A_t \ B_t]$ and the transition probabilities in matrix form as

\[
M = \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix}. \tag{8.16}
\]

Then the distribution of employees across the two locations next period $(t+1)$ is $x_t' \cdot M = x_{t+1}'$, which is

\[
[A_t \ B_t] \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix}
= [(A_t p_{AA} + B_t p_{BA}) \ \ (A_t p_{AB} + B_t p_{BB})] = [A_{t+1} \ B_{t+1}].
\]

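In a similar manner we can determine the distribution of employees after two periods.

\begin{align*}
x_{t+1}' \cdot M &= x_{t+2}' \\
[A_{t+1} \ B_{t+1}] \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix} &= [A_{t+2} \ B_{t+2}] \\
[A_t \ B_t] \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix} \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix} &= [A_{t+2} \ B_{t+2}] \\
[A_t \ B_t] \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix}^2 &= [A_{t+2} \ B_{t+2}]
\end{align*}

In general, for $n$ periods,

\[
[A_t \ B_t] \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix}^n = [A_{t+n} \ B_{t+n}] \tag{8.17}
\]

When $n$ is exogenous, the process is known as a finite Markov chain.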


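Example 8.4.

Consider the initial distribution of employees across two locations at time $t = 0$ as

\[
x_0' = [A_0 \ B_0] = [200 \ 200]
\]

Let

\[
M = \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix} = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}.
\]

Then the distribution of employees in the next period $t = 1$ is

\[
[200 \ 200] \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix} = [240 \ 160] = [A_1 \ B_1].
\]

The distribution after two periods is

\[
[200 \ 200] \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}^2
= [200 \ 200] \begin{bmatrix} 0.72 & 0.28 \\ 0.56 & 0.44 \end{bmatrix}
= [256 \ 144] = [A_2 \ B_2]
\]

The distribution after six periods is (with the entries of $M^6$ rounded to three decimals)

\[
[200 \ 200] \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}^6
= [200 \ 200] \begin{bmatrix} 0.668 & 0.332 \\ 0.664 & 0.336 \end{bmatrix}
= [266.4 \ 133.6] = [A_6 \ B_6]
\]

Observe that when the transition matrix is raised to higher powers, the new transition matrix converges to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady state would be

\[
\lim_{n\to\infty} M^n = \begin{bmatrix} \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}.
\]

Try computing this value.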
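The repeated multiplication $x_t' M = x_{t+1}'$ is easy to carry out numerically. Below is a small sketch using the numbers of this example; the helper name `step` and the choice of 50 periods are assumptions made for illustration only.

```python
def step(x, M):
    """Return the row vector x'M for a distribution x and transition matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x)))
            for j in range(len(M[0]))]

M = [[0.8, 0.2],
     [0.4, 0.6]]
x = [200.0, 200.0]          # initial distribution [A_0, B_0]

history = [x]
for _ in range(50):         # 50 periods is an arbitrary illustrative horizon
    x = step(x, M)
    history.append(x)

for n in (1, 2, 6, 50):
    print(n, [round(v, 2) for v in history[n]])
```

After many periods the distribution settles near two thirds of the 400 employees in A and one third in B, in line with the steady-state matrix above. The value for six periods differs slightly from 266.4 because the notes round the entries of $M^6$ before multiplying.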

8.4.1 Absorbing Markov Chains

We can extend the previous model by adding a third choice: employees can exit the firm with

$p_{AE} \equiv$ probability that a current A chooses to exit, E,

$p_{BE} \equiv$ probability that a current B chooses to exit, E.

Let us assume that

\[
p_{EA} = 0, \quad p_{EB} = 0, \quad p_{EE} = 1
\]

where $p_{EA}$, $p_{EB}$, and $p_{EE}$ are the probabilities that an employee who is currently in state E will go to A, B or E respectively. The values assigned above mean that nobody who leaves the firm ever returns. It is also implied by these restrictions that the firm never replaces employees that leave. Starting at time $t = 0$, the Markov chain becomes

\[
[A_0 \ B_0 \ E_0] \begin{bmatrix} p_{AA} & p_{AB} & p_{AE} \\ p_{BA} & p_{BB} & p_{BE} \\ p_{EA} & p_{EB} & p_{EE} \end{bmatrix}^n = [A_n \ B_n \ E_n]
\]

or

\[
[A_0 \ B_0 \ E_0] \begin{bmatrix} p_{AA} & p_{AB} & p_{AE} \\ p_{BA} & p_{BB} & p_{BE} \\ 0 & 0 & 1 \end{bmatrix}^n = [A_n \ B_n \ E_n]
\]

This type of Markov process is referred to as an absorbing Markov chain. The values of the transition probabilities assigned in the third row are such that once an employee goes to state E, he or she remains in that state forever. As $n$ goes to infinity, $A_n$ and $B_n$ will approach zero and $E_n$ will approach the total number of employees at time zero (i.e., $A_0 + B_0 + E_0$).
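The limiting behaviour of an absorbing chain can be illustrated numerically. The notes do not give values for the transition probabilities, so the numbers below are purely hypothetical (chosen so that each row sums to 1 and both $p_{AE}$ and $p_{BE}$ are positive); the helper `step` is likewise an illustrative name.

```python
def step(x, M):
    """Return the row vector x'M."""
    return [sum(x[i] * M[i][j] for i in range(len(x)))
            for j in range(len(M[0]))]

# Hypothetical transition probabilities, not taken from the notes.
M = [[0.7, 0.2, 0.1],   # from A: stay, move to B, exit
     [0.3, 0.5, 0.2],   # from B: move to A, stay, exit
     [0.0, 0.0, 1.0]]   # from E: nobody returns (absorbing state)

x = [200.0, 200.0, 0.0]  # [A_0, B_0, E_0]
for _ in range(200):     # 200 periods, an arbitrary long horizon
    x = step(x, M)

print([round(v, 6) for v in x])
```

With these assumed probabilities, the first two entries shrink toward zero and the third approaches $A_0 + B_0 + E_0 = 400$, as described above.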

8.5 System of Linear Equations

The system of linear equations is

\[
Ax = b \tag{8.18}
\]

where matrix $A$ is of dimension $n \times k$, $x$ is a $k \times 1$ column vector and $b$ is an $n \times 1$ column vector. This is a system of $n$ equations with $k$ unknowns.

Example 8.5. The system of two linear equations,

\begin{align*}
5x + 3y &= 1 \\
6x + y &= 2
\end{align*}

can be written as

\[
\begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

When $b = 0$, the system is called a homogeneous system. When $b \neq 0$, it is called a non-homogeneous system.

Definition 8.12. Column vector $x^*$ is called a solution to the system if $Ax^* = b$.


There are three important questions in this context.

(a) Does a solution exist?

(b) If there exists a solution, is it unique?

(c) If a solution exists, how do we compute it?

Claim 10. A homogeneous system $Ax = 0$ always has a solution (the trivial one, $x = 0$). But there might be other solutions (the solution may not be unique).

Claim 11. For a non-homogeneous system $Ax = b$, a solution may not exist.

Example 8.6. The following system of two linear equations

\begin{align*}
2x + 4y &= 5 \\
x + 2y &= 2
\end{align*}

does not have a solution. Multiply the second equation by 2. Then the left-hand sides of both equations become the same, which leads to $5 = 4$, a contradiction.

Example 8.7. The following system of two linear equations

\begin{align*}
2x + 4y &= 2 \\
x + 2y &= 1
\end{align*}

has many solutions.

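Given $[A]_{n\times k}$ and $b_{n\times 1}$, the $n \times (k+1)$ matrix $[Ab]_{n\times(k+1)} = \begin{bmatrix} A_1 & A_2 & \cdots & A_k & b \end{bmatrix}$ is called the augmented matrix. Note $A_i$ is the $i$th column of $A$.

Example 8.8. Let

\[
A = \begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix},\quad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\;\Rightarrow\;
Ab = \begin{bmatrix} 5 & 3 & 1 \\ 6 & 1 & 2 \end{bmatrix}.
\]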

Theorem 8.3. The system of equations

\[
[A]_{n\times k} \cdot x_{k\times 1} = b_{n\times 1}
\]

has a solution if and only if

\[
\operatorname{rank}(A) = \operatorname{rank}(Ab). \tag{8.19}
\]

The solution is unique if and only if

\[
\operatorname{rank}(A) = \operatorname{rank}(Ab) = k = \#\text{ of columns of } A = \#\text{ of unknowns} \tag{8.20}
\]

Consider the case of $n$ equations in $n$ unknowns. In this case, $A$ is $n \times n$. If $Ax = b$ has a solution and if $\det(A) \neq 0$ then the solution is characterized by

\[
x^*_{n\times 1} = [A^{-1}]_{n\times n} \cdot b_{n\times 1} \tag{8.21}
\]

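Example 8.9. The system of linear equations

\begin{align*}
2x + y &= 0 \\
2x + 2y &= 0
\end{align*}

gives us

\[
A = \begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix},\quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\quad Ab = \begin{bmatrix} 2 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix}.
\]

It is easy to verify that

\[
\operatorname{rank}(A) = 2 = \operatorname{rank}(Ab).
\]

Hence a solution exists and is unique.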

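Example 8.10. The system of linear equations

\begin{align*}
2x + y &= 0 \\
4x + 2y &= 0
\end{align*}

leads to

\[
A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix},\quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\quad Ab = \begin{bmatrix} 2 & 1 & 0 \\ 4 & 2 & 0 \end{bmatrix}.
\]

It is again easy to verify that

\[
\operatorname{rank}(A) = 1 = \operatorname{rank}(Ab).
\]

However

\[
\operatorname{rank}(A) = \operatorname{rank}(Ab) < k = 2.
\]

Hence a solution exists but is not unique.¹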
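The rank conditions (8.19) and (8.20) can be turned into a small classification routine. This is a sketch only: it computes ranks by Gaussian elimination with exact fractions (a technique not used in the notes), and the names `rank` and `classify` are assumptions for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def classify(A, b):
    """Apply the rank test to Ax = b; b is a plain list."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    if rank(A) != rank(augmented):
        return "no solution"
    return "unique solution" if rank(A) == len(A[0]) else "many solutions"

print(classify([[2, 4], [1, 2]], [5, 2]))   # system of Example 8.6
print(classify([[2, 1], [2, 2]], [0, 0]))   # system of Example 8.9
print(classify([[2, 1], [4, 2]], [0, 0]))   # system of Example 8.10
```

The three calls reproduce the three cases discussed in this section: inconsistent, uniquely solvable, and solvable with many solutions.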

Now, we return to the problem of computing the inverse of a non-singular matrix. We first note the following result.

Theorem 8.4. Matrix $[A]_{n\times n}$ is invertible $\Leftrightarrow$ $\det(A) \neq 0$. Also, if $[A]_{n\times n}$ is invertible then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.

¹ A row or column vector of zeros is always linearly dependent on the other vectors.


Proof. Suppose $A$ is invertible. Then

\[
A \cdot A^{-1} = I
\]

so $1 = \det I = \det(AA^{-1}) = \det(A)\det(A^{-1})$ using the properties of determinants noted above. Consequently $\det(A) \neq 0$, and $\det(A^{-1}) = [\det(A)]^{-1}$.

Suppose, next, that $A$ is not invertible. Then $A$ is singular and so one of its columns (say, $A_1$) can be expressed as a linear combination of its other columns $A_2, \dots, A_n$. That is,

\[
A_1 = \sum_{i=2}^{n} \alpha_i A_i
\]

Consider the matrix $B$ whose first column is $\left[ A_1 - \sum_{i=2}^{n} \alpha_i A_i \right]$ and whose other columns are the same as those of $A$. Then the first column of $B$ is zero, and so $|B| = 0$. By the properties of determinants, $|B| = |A|$, and so $|A| = 0$.

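For a square matrix $[A]_{n\times n}$, we define the co-factor matrix of $A$ to be the $n \times n$ matrix given by

\[
C = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{bmatrix}
\]

The transpose of $C$ is called the adjoint of $A$, and denoted by $\operatorname{adj} A$.

Now, by the rules of matrix multiplication,

\[
AC' = \begin{bmatrix}
\sum_{j=1}^{n} a_{1j}A_{1j} & \sum_{j=1}^{n} a_{1j}A_{2j} & \cdots & \sum_{j=1}^{n} a_{1j}A_{nj} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{n} a_{nj}A_{1j} & \sum_{j=1}^{n} a_{nj}A_{2j} & \cdots & \sum_{j=1}^{n} a_{nj}A_{nj}
\end{bmatrix}
= \begin{bmatrix}
|A| & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & |A|
\end{bmatrix}
\]

This yields the equation

\[
AC' = |A|\, I \tag{8.22}
\]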

If $A$ is non-singular (that is, invertible) then there is $A^{-1}$ such that

\[
AA^{-1} = A^{-1}A = I \tag{8.23}
\]

Pre-multiplying (8.22) by $A^{-1}$ and using (8.23),

\[
C' = |A|\, A^{-1}
\]

Since $A$ is non-singular, we have $|A| \neq 0$, and

\[
A^{-1} = \frac{C'}{|A|} = \frac{\operatorname{adj} A}{|A|} \tag{8.24}
\]

Thus (8.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the determinant and cofactors of $A$.
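Formula (8.24) can be sketched in code as follows. The function names (`minor`, `det`, `inverse`) and the use of exact fractions are illustrative assumptions; the approach is practical only for small matrices, since cofactor expansion is slow for large $n$.

```python
from fractions import Fraction

def minor(M, i, j):
    """Sub matrix of M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1  # convention for the empty matrix, so that 1 x 1 cases work
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse(M):
    """Inverse as adj(M) / |M|, following (8.24)."""
    n, d = len(M), det(M)
    if d == 0:
        raise ValueError("matrix is singular")
    cof = [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
           for i in range(n)]
    # The adjoint is the transpose of the cofactor matrix.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
A_inv = inverse(A)
print(A_inv)
```

For this $A$, the cofactors are those computed in Example 8.2, and dividing their transpose by $\det(A) = -2$ gives the inverse.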

8.6 Cramer's Rule

Recall that we wanted to calculate the (unique) solution of a system of $n$ equations in $n$ unknowns given by

\[
Ax = c \tag{8.25}
\]

where $A$ is an $n \times n$ matrix, and $c$ is a vector in $\mathbb{R}^n$.

To obtain a unique solution, we saw that we must have $A$ non-singular, which now translates to the condition "$|A| \neq 0$". The unique solution to (8.25) is then

\[
x = A^{-1}c = \frac{\operatorname{adj} A}{|A|}\, c \tag{8.26}
\]

Let us evaluate $x_1$, using (8.26). This can be done by finding the inner product of $x$ with the first unit vector, $e_1 = (1, 0, \dots, 0)$. Thus,

\begin{align*}
x_1 = e_1 x &= \frac{e_1 \operatorname{adj} A}{|A|}\, c \\
&= \frac{[A_{11} \ A_{21} \ \cdots \ A_{n1}]\, c}{|A|} \\
&= [c_1 A_{11} + c_2 A_{21} + \cdots + c_n A_{n1}] / |A| \\
&= \begin{vmatrix} c_1 & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ c_n & a_{n2} & \cdots & a_{nn} \end{vmatrix} \, |A|^{-1}
\end{align*}


This gives us an easy way to compute the solution for $x_1$. In general, in order to calculate $x_i$, replace the $i$th column of $A$ by the vector $c$ and find the determinant of this matrix. Dividing this number by the determinant of $A$ yields the solution $x_i$. This rule is known as Cramer's Rule.

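Example 8.11. General Market Equilibrium with three goods

Consider a market for three goods. Demand and supply for each good are given by:

\begin{align*}
D_1 &= 5 - 2P_1 + P_2 + P_3 \\
S_1 &= -4 + 3P_1 + 2P_2 \\
D_2 &= 6 + 2P_1 - 3P_2 + P_3 \\
S_2 &= 3 + 2P_2 \\
D_3 &= 20 + P_1 + 2P_2 - 4P_3 \\
S_3 &= 3 + P_2 + 3P_3
\end{align*}

where $P_i$ is the price of good $i$, $i = 1, 2, 3$. The equilibrium conditions are $D_i = S_i$, $i = 1, 2, 3$, that is

\begin{align*}
5P_1 + P_2 - P_3 &= 9 \\
-2P_1 + 5P_2 - P_3 &= 3 \\
-P_1 - P_2 + 7P_3 &= 17
\end{align*}

This system of linear equations can be solved in at least two ways.

(a) Using Cramer's rule. Let $A$ be the coefficient matrix and $A_1$ the matrix obtained by replacing its first column with the right-hand side:

\[
|A_1| = \det\begin{bmatrix} 9 & 1 & -1 \\ 3 & 5 & -1 \\ 17 & -1 & 7 \end{bmatrix} = 356, \qquad
|A| = \det\begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix} = 178,
\]

\[
P_1^* = \frac{|A_1|}{|A|} = \frac{356}{178} = 2.
\]

Similarly $P_2^* = 2$ and $P_3^* = 3$. The vector $(P_1^*, P_2^*, P_3^*)$ describes the general market equilibrium.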


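(b) Using the inverse matrix rule. Let us denote

\[
A = \begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix},\quad
P = \begin{bmatrix} P_1 \\ P_2 \\ P_3 \end{bmatrix},\quad
B = \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix}
\]

The matrix form of the system is $AP = B$, which implies $P = A^{-1}B$.

\[
A^{-1} = \frac{1}{\det A} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix}
\]

\[
P = \frac{1}{178} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix} \cdot \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}
\]

Again, $P_1^* = 2$, $P_2^* = 2$, and $P_3^* = 3$.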
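Cramer's Rule for this market example can be reproduced with a short script. The sketch below assumes a list-of-rows representation and uses illustrative function names (`det`, `cramer`); it is meant as a check on the hand computation, not as an efficient solver.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, c):
    """Solve Ax = c by replacing column i of A with c, for each i."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0: Cramer's Rule does not apply")
    solution = []
    for i in range(len(A)):
        A_i = [row[:i] + [c[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

A = [[5, 1, -1], [-2, 5, -1], [-1, -1, 7]]
c = [9, 3, 17]
prices = cramer(A, c)
print(det(A), prices)
```

The result agrees with both methods (a) and (b): equilibrium prices of 2, 2 and 3.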

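8.7 Principal Minors

Let $[A]_{n\times n}$ be a square matrix.

Definition 8.13. A principal minor of order $k$ $(1 \leq k \leq n)$ of $[A]_{n\times n}$ is the determinant of the $k \times k$ sub matrix that remains when $(n-k)$ rows and columns with the same indices are deleted from $A$.

Example 8.12. Let

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix}
\]

(a) Principal minors of order 1 are 1, 8, 9.

(b) Principal minors of order 2 are

\[
\det\begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8; \quad
\det\begin{bmatrix} 8 & 1 \\ 5 & 9 \end{bmatrix} = 67; \quad
\det\begin{bmatrix} 1 & 3 \\ 2 & 9 \end{bmatrix} = 3.
\]

(c) The principal minor of order 3 is

\[
\det\begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23.
\]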


8.7.1 Leading Principal Minor

Definition 8.14. A leading principal minor of order $k$ $(1 \leq k \leq n)$ of $[A]_{n\times n}$ is the principal minor of order $k$ obtained by deleting the last $(n-k)$ rows and columns.

In the previous example, the leading principal minor of order 1 is 1. The leading principal minor of order 2 is

\[
\det\begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8
\]

and the leading principal minor of order 3 is

\[
\det\begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23.
\]

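8.8 Quadratic Form

A quadratic form consists of a square matrix $[A]_{n\times n}$ which is pre- and post-multiplied by an $n$-vector. It is a scalar.

\[
Q(x, A) = x'Ax \tag{8.27}
\]

Example 8.13. Let

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},\quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]

Then

\[
Q(x, A) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \cdot \begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a x_1^2 + (b + c) x_1 x_2 + d x_2^2.
\]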

8.8.1 Matrix Definiteness

Let $[A]_{n\times n}$ be symmetric.

(a) $A$ is positive definite (PD) if

\[
Q(z, A) = z'Az > 0, \quad \forall\, z \in \mathbb{R}^n,\ z \neq 0. \tag{8.28}
\]

(b) $A$ is negative definite (ND) if

\[
Q(z, A) = z'Az < 0, \quad \forall\, z \in \mathbb{R}^n,\ z \neq 0. \tag{8.29}
\]

(c) $A$ is positive semidefinite (PSD) if

\[
Q(z, A) = z'Az \geq 0, \quad \forall\, z \in \mathbb{R}^n. \tag{8.30}
\]

(d) $A$ is negative semidefinite (NSD) if

\[
Q(z, A) = z'Az \leq 0, \quad \forall\, z \in \mathbb{R}^n. \tag{8.31}
\]

(e) $A$ is indefinite if none of the above conditions hold true.

8.8.2 Test for definiteness of symmetric matrices:

$[A]_{n\times n}$ is PD if and only if all leading principal minors of $A$ are strictly positive.

$[A]_{n\times n}$ is ND if and only if every leading principal minor of $A$ of order $k$ has sign $(-1)^k$, for $k = 1, \dots, n$.

$[A]_{n\times n}$ is PSD if and only if all principal minors of $A$ are non-negative.

$[A]_{n\times n}$ is NSD if and only if every principal minor of $A$ of order $k$ has sign $(-1)^k$ or is 0, for $k = 1, \dots, n$.

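Example 8.14. Let

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.
\]

Then $A$ is

positive definite: $a_{11} > 0$, $a_{11}a_{22} - a_{12}a_{21} > 0$.

negative definite: $a_{11} < 0$, $a_{11}a_{22} - a_{12}a_{21} > 0$.

positive semi-definite: $a_{11} \geq 0$, $a_{22} \geq 0$, $a_{11}a_{22} - a_{12}a_{21} \geq 0$.

negative semi-definite: $a_{11} \leq 0$, $a_{22} \leq 0$, $a_{11}a_{22} - a_{12}a_{21} \geq 0$.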
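The minor-based tests of Section 8.8.2 can be sketched for a general symmetric matrix. The code below is an illustration under the assumption that the input is symmetric (it does not check this); the function names and the four test matrices are hypothetical, chosen here only to exercise each case.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def principal_minors(A, k):
    """All principal minors of order k (same row and column indices kept)."""
    return [det([[A[i][j] for j in idx] for i in idx])
            for idx in combinations(range(len(A)), k)]

def classify(A):
    """Definiteness of a symmetric matrix via (leading) principal minors."""
    n = len(A)
    leading = [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]
    if all(m > 0 for m in leading):
        return "PD"
    if all((-1) ** k * m > 0 for k, m in enumerate(leading, start=1)):
        return "ND"
    minors = {k: principal_minors(A, k) for k in range(1, n + 1)}
    if all(m >= 0 for ms in minors.values() for m in ms):
        return "PSD"
    if all((-1) ** k * m >= 0 for k, ms in minors.items() for m in ms):
        return "NSD"
    return "indefinite"

for A in ([[2, 1], [1, 2]], [[-2, 1], [1, -2]], [[1, 1], [1, 1]], [[1, 2], [2, 1]]):
    print(A, classify(A))
```

Note that the definite cases use only the leading principal minors, while the semidefinite cases need all principal minors, which is why the two helper computations differ.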

Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector could be obtained by a linear combination of columns of $A$ with weights $\alpha_1, \dots, \alpha_n$ (not all zero), then we could define $t = (\alpha_1, \dots, \alpha_n) \neq 0$ to obtain $At = 0$ and hence $t'At = 0$, contradicting negative definiteness.

Definition 8.15. A square matrix $A$ is diagonally dominant if for each row $i$, we have $|a_{i,i}| \geq \sum_{j \neq i} |a_{i,j}|$, and it is strictly diagonally dominant if the latter inequality holds strictly for each row.

Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative entries along the diagonal is negative definite.

8.9 Eigenvalue and Eigenvectors

Given an $n \times n$ real matrix $A$, an eigenvalue of $A$ is a number $\lambda$ which, when subtracted from each of the diagonal entries of $A$, converts $A$ into a singular matrix. Subtracting a scalar $\lambda$ from each diagonal entry of $A$ is the same as subtracting $\lambda$ times the identity matrix $I$ from $A$. Hence, $\lambda$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is a singular matrix.

This is also equivalent to asking for what non-zero vectors $x \in \mathbb{R}^n$, and for what complex numbers $\lambda$, it is true that

\[
Ax = \lambda x \tag{8.32}
\]

This is known as the eigenvalue problem. If $x \neq 0$ and $\lambda$ satisfy equation (8.32), then $\lambda$ is called an eigenvalue of $A$, and $x$ is called an eigenvector of $A$. Clearly (8.32) holds if and only if

\[
(A - \lambda I)x = 0 \tag{8.33}
\]

But (8.33) is a homogeneous system of $n$ equations in $n$ unknowns. It has a non-zero solution for $x$ if and only if $(A - \lambda I)$ is singular; that is, if and only if

\[
|A - \lambda I| = 0 \tag{8.34}
\]

This equation is called the characteristic equation of $A$. If we look at the expression

\[
f(\lambda) \equiv |A - \lambda I| \tag{8.35}
\]

we note that $f$ is a polynomial in $\lambda$; it is called the characteristic polynomial of $A$.

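Example 8.15. Consider the $3 \times 3$ matrix $A$ given by

\[
A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix}
\]

Then subtracting 3 from each diagonal entry transforms $A$ into the singular matrix

\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.
\]

Therefore, 3 is an eigenvalue of matrix $A$.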


Example 8.16. Let

\[
A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}
\]

Then subtracting 4 from each diagonal entry transforms $A$ into the singular matrix

\[
\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}.
\]

Therefore, 4 is an eigenvalue of matrix $A$. Also, subtracting 2 from each diagonal entry transforms $A$ into the singular matrix

\[
\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}.
\]

Therefore, 2 is also an eigenvalue of matrix $A$.

The above example illustrates a general principle about the eigenvalues of a diagonal matrix.

Theorem 8.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.

Theorem 8.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.


Example 8.17. Let

\[
A = \begin{bmatrix} 4 & -4 \\ -4 & 4 \end{bmatrix}
\]

Since the first row is the negative of the second row, matrix $A$ is singular. Hence 0 is an eigenvalue of $A$. Also, subtracting 8 from each diagonal entry transforms $A$ into the singular matrix

\[
\begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}.
\]

Therefore, 8 is also an eigenvalue of matrix $A$.



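Example 8.18. Let

\[
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\]

Then equation (8.34) becomes

\[
\begin{vmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{vmatrix} = 0 \tag{8.36}
\]

So, $(4 - 4\lambda + \lambda^2) - 1 = 0$, which yields

\[
(1 - \lambda)(3 - \lambda) = 0
\]

Thus, the eigenvalues are $\lambda = 1$ and $\lambda = 3$. In this case it was also possible to see that $\lambda = 1$ is an eigenvalue, as subtracting 1 from the diagonal entries converts matrix $A$ into a singular matrix.

Putting $\lambda = 1$ in (8.33), we get

\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

which yields

\[
x_1 + x_2 = 0
\]

Thus the general solution for the eigenvector corresponding to the eigenvalue $\lambda = 1$ is given by

\[
(x_1, x_2) = \theta(1, -1) \quad \text{for } \theta \neq 0.
\]

Similarly, corresponding to the eigenvalue $\lambda = 3$, we have the eigenvector given by

\[
(x_1, x_2) = \theta(1, 1) \quad \text{for } \theta \neq 0.
\]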
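For a $2 \times 2$ matrix the characteristic equation is the quadratic $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$, so its roots can be computed with the quadratic formula. The sketch below assumes real eigenvalues (which always holds when the matrix is symmetric); the function names are illustrative.

```python
import math

def eigenvalues_2x2(A):
    """Roots of |A - lambda I| = 0 for a 2 x 2 matrix with real eigenvalues."""
    (a, b), (c, d) = A
    trace = a + d
    determinant = a * d - b * c
    disc = trace * trace - 4 * determinant
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace - root) / 2, (trace + root) / 2

def apply(A, v):
    """Matrix-vector product Av."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)
print(apply(A, [1, -1]), apply(A, [1, 1]))
```

Applying `A` to $(1, -1)$ returns the same vector and applying it to $(1, 1)$ returns three times that vector, which is the defining property $Ax = \lambda x$ for the two eigenvectors found above.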

Example 8.19. A square matrix $A$ whose entries are non-negative and whose rows (or columns) each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics. Consider the $2 \times 2$ matrix $A$ given by

\[
A = \begin{bmatrix} a & 1 - a \\ b & 1 - b \end{bmatrix}
\]

where $a \geq 0$ and $b \geq 0$. Then subtracting 1 from the diagonal entries leads to the matrix

\[
A - I = \begin{bmatrix} a - 1 & 1 - a \\ b & -b \end{bmatrix}
\]

Notice that each row of this matrix adds to 0. But if each row of a square matrix adds to zero, the columns sum to the zero vector, so the columns are linearly dependent and the matrix is singular. This shows that 1 is an eigenvalue of the Markov matrix. The same argument shows that 1 is an eigenvalue of every Markov matrix.


8.10 Eigenvalues of symmetric matrix

For the case of a symmetric matrix $A$, we can show that all the eigenvalues of $A$ are real.

Theorem 8.7. Let $A$ be a symmetric $n \times n$ matrix. Then all the eigenvalues of $A$ are real.

Proof. Suppose $\lambda$ is a complex eigenvalue, with associated complex eigenvector $x$. Then we have

\[
Ax = \lambda x \tag{8.37}
\]

Define $x^*$ to be the complex conjugate of $x$, and $\lambda^*$ to be the complex conjugate of $\lambda$. Then, since $A$ is real,

\[
Ax^* = \lambda^* x^* \tag{8.38}
\]

Pre-multiply (8.37) by $(x^*)'$ and (8.38) by $x'$ to get

\begin{align}
(x^*)'Ax &= \lambda (x^*)'x \tag{8.39} \\
x'Ax^* &= \lambda^* x'x^* \tag{8.40}
\end{align}

Subtracting (8.40) from (8.39),

\[
(x^*)'Ax - x'Ax^* = (\lambda - \lambda^*)\, x'x^* \tag{8.41}
\]

since $(x^*)'x = x'x^*$. Also,

\[
x'Ax^* = (x'Ax^*)' = (x^*)'A'x = (x^*)'Ax
\]

since $A' = A$ (by symmetry). Thus (8.41) yields

\[
(\lambda - \lambda^*)\, x'x^* = 0 \tag{8.42}
\]

Since $x \neq 0$, we know that $x'x^*$ is real and positive. Hence (8.42) implies that $\lambda = \lambda^*$, so $\lambda$ is real.

8.11 Eigenvalues, Trace and Determinant of a Matrix

If $A$ is an $n \times n$ matrix, the trace of $A$, denoted by $\operatorname{tr}(A)$, is the number defined by

\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}
\]

The following properties of the trace can be verified easily [here $A$, $B$ and $C$ are $n \times n$ matrices, and $\lambda \in \mathbb{R}$].


(a) tr(A+B) = tr(A)+ tr(B)

(b) tr(λA) = λ tr(A)

(c) tr(AB) = tr(BA)

(d) tr(ABC) = tr(BCA) = tr(CAB)

Let $A$ be an $n \times n$ matrix. The characteristic polynomial of $A$, defined in (8.35) above, can generally be written as

\[
|A - \lambda I| = (-\lambda)^n + b_{n-1}(-\lambda)^{n-1} + \cdots + b_1(-\lambda) + b_0 \tag{8.43}
\]

where $b_0, \dots, b_{n-1}$ are the coefficients of the polynomial, which are determined by the coefficients of the $A$-matrix.

On the other hand, if $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$, then the characteristic equation (8.34) can be written as

\[
0 = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda) \tag{8.44}
\]

Using (8.34), (8.43), and (8.44) and "comparing coefficients" we can conclude that

\[
b_{n-1} = \lambda_1 + \lambda_2 + \cdots + \lambda_n
\]

and

\[
b_0 = \lambda_1 \lambda_2 \cdots \lambda_n
\]

Also, by looking at the terms in the characteristic polynomial of $A$ which would involve $(-\lambda)^{n-1}$, we can conclude that

\[
b_{n-1} = a_{11} + a_{22} + \cdots + a_{nn}
\]

Finally, putting $\lambda = 0$ in (8.43), we get

\[
b_0 = |A|
\]

Thus we might note two interesting relationships between the characteristic values, the trace and the determinant of $A$:

\[
\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i
\]

and

\[
|A| = \prod_{i=1}^{n} \lambda_i
\]
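As a quick check of these two relationships, take the symmetric $2 \times 2$ matrix from Section 8.9 whose eigenvalues were found to be $\lambda = 1$ and $\lambda = 3$:

```latex
\[
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
\operatorname{tr} A = 2 + 2 = 4 = 1 + 3 = \lambda_1 + \lambda_2, \qquad
|A| = 2 \cdot 2 - 1 \cdot 1 = 3 = 1 \cdot 3 = \lambda_1 \lambda_2 .
\]
```

Both the sum and the product of the eigenvalues match, as the general result requires.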


8.11.1 Eigenvalues and Definiteness of Quadratic Forms

Theorem 8.8. Let $A$ be a symmetric matrix. Then,

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is negative definite if and only if all the eigenvalues of A are negative.

(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.

(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.

(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.

Chapter 9

Problem Set 3

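(1) Let

\[
A = \begin{bmatrix} 1 & -1 & 7 \\ 0 & 8 & 10 \end{bmatrix},\quad
B = \begin{bmatrix} 9 & 6 & 5 & 4 \\ 1 & -2 & -3 & 3 \\ 0 & 1 & -1 & 2 \end{bmatrix}
\]

Compute $AB$. Is $BA$ defined?

(2) Are the vectors $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 3 \end{pmatrix}$ linearly independent?

(3) What is the transpose of

\[
A = \begin{bmatrix} 1 & 7 & 9 & 5 \\ 8 & 6 & 2 & 3 \\ -1 & 0 & 4 & -3 \end{bmatrix}?
\]

(4) Let

\[
A = \begin{bmatrix} 1 & 6 & 2 \\ -1 & 5 & 3 \end{bmatrix},\quad
B = \begin{bmatrix} 8 & 4 \\ 0 & -2 \\ 7 & -3 \end{bmatrix}.
\]

Compute $AB$ and $BA$.

(5) What is the determinant of

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 2 \\ 1 & 3 & 5 & 7 \\ 2 & 1 & 4 & 1 \end{bmatrix}?
\]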



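(6) What is the rank of

\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 7 \\ 5 & 4 & -1 \end{bmatrix}?
\]

(7) Consider the system of two equations in 2 unknowns:

\begin{align*}
3x + y &= 1 \\
2x - 5y &= 1
\end{align*}

Does a solution exist? If yes, is it unique?

(8) What is the definiteness of the following matrices? (Hint: Use the principal minors)

\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix},\quad
B = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix},\quad
C = \begin{bmatrix} -3 & 4 \\ 4 & 5 \end{bmatrix},\quad
D = \begin{bmatrix} -3 & 4 \\ 4 & -6 \end{bmatrix}.
\]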

(9) Consider the situation of a mass layoff (i.e. a firm goes out of business) where 2000 people become unemployed and now begin a job search. There are two states: employed (E) and unemployed (U) with an initial vector

\[
x_0' = [E \ U] = [0 \ 2000].
\]

Suppose that in any given period an unemployed person will find a job with probability 0.7 and will therefore remain unemployed with a probability 0.3. Additionally, persons who find themselves employed in any given period may lose their job with a probability of 0.1 (and will continue to remain employed with probability 0.9).

(i) Set up the Markov transition matrix for this problem.

(ii) What will be the number of unemployed people after (a) two periods; (b) four periods; (c) six periods; (d) ten periods?

(iii) What is the steady-state level of unemployment?

(10) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal entries.

(11) Suppose that $A$ is an invertible matrix. Show that $(A - \lambda I)x = 0$ implies that $\left(A^{-1} - \frac{1}{\lambda} I\right)x = 0$. Conclude that for an invertible matrix $A$, $\lambda$ is an eigenvalue of $A$ if and only if $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$.

Chapter 10

Solution to PS 3

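(1)

\begin{align}
AB &= \begin{bmatrix} 1 & -1 & 7 \\ 0 & 8 & 10 \end{bmatrix} \cdot \begin{bmatrix} 9 & 6 & 5 & 4 \\ 1 & -2 & -3 & 3 \\ 0 & 1 & -1 & 2 \end{bmatrix} \notag \\
&= \begin{bmatrix}
1\cdot 9 - 1\cdot 1 + 7\cdot 0 & 1\cdot 6 + 1\cdot 2 + 7\cdot 1 & 1\cdot 5 + 1\cdot 3 - 7\cdot 1 & 1\cdot 4 - 1\cdot 3 + 7\cdot 2 \\
0\cdot 9 + 8\cdot 1 + 10\cdot 0 & 0\cdot 6 - 8\cdot 2 + 10\cdot 1 & 0\cdot 5 - 8\cdot 3 - 10\cdot 1 & 0\cdot 4 + 8\cdot 3 + 10\cdot 2
\end{bmatrix} \notag \\
&= \begin{bmatrix} 8 & 15 & 1 & 15 \\ 8 & -6 & -34 & 44 \end{bmatrix} \tag{10.1}
\end{align}

Note $BA$ is not defined in this case.

(2) Set up the vector equation as

\[
\lambda_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \lambda_2 \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\;\Leftrightarrow\;
\begin{cases} \lambda_1 + \lambda_2 = 0 \\ 2\lambda_1 + 3\lambda_2 = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} \lambda_1 = -\lambda_2 \\ \lambda_1 = -\tfrac{3}{2}\lambda_2 \end{cases}
\tag{10.2}
\]

The only solution is

\[
\lambda_1 = 0,\ \lambda_2 = 0. \tag{10.3}
\]

So the two vectors are linearly independent.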



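(3)

\[
A' = \begin{bmatrix} 1 & 8 & -1 \\ 7 & 6 & 0 \\ 9 & 2 & 4 \\ 5 & 3 & -3 \end{bmatrix}. \tag{10.4}
\]

(4)

\begin{align}
AB &= \begin{bmatrix} 1 & 6 & 2 \\ -1 & 5 & 3 \end{bmatrix} \cdot \begin{bmatrix} 8 & 4 \\ 0 & -2 \\ 7 & -3 \end{bmatrix} \notag \\
&= \begin{bmatrix} 1\cdot 8 + 6\cdot 0 + 2\cdot 7 & 1\cdot 4 - 6\cdot 2 - 2\cdot 3 \\ -1\cdot 8 + 5\cdot 0 + 3\cdot 7 & -1\cdot 4 - 5\cdot 2 - 3\cdot 3 \end{bmatrix}
= \begin{bmatrix} 22 & -14 \\ 13 & -23 \end{bmatrix} \tag{10.5}
\end{align}

\begin{align}
BA &= \begin{bmatrix} 8 & 4 \\ 0 & -2 \\ 7 & -3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 6 & 2 \\ -1 & 5 & 3 \end{bmatrix} \notag \\
&= \begin{bmatrix}
8\cdot 1 - 4\cdot 1 & 8\cdot 6 + 4\cdot 5 & 8\cdot 2 + 4\cdot 3 \\
0\cdot 1 + 2\cdot 1 & 0\cdot 6 - 2\cdot 5 & 0\cdot 2 - 2\cdot 3 \\
7\cdot 1 + 3\cdot 1 & 7\cdot 6 - 3\cdot 5 & 7\cdot 2 - 3\cdot 3
\end{bmatrix}
= \begin{bmatrix} 4 & 68 & 28 \\ 2 & -10 & -6 \\ 10 & 27 & 5 \end{bmatrix} \tag{10.6}
\end{align}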

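(5)

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 2 \\ 1 & 3 & 5 & 7 \\ 2 & 1 & 4 & 1 \end{bmatrix}
\]

Let us expand the determinant by the first column.

\begin{align*}
|A| ={}& 1\cdot(-1)^{1+1} \begin{vmatrix} 2 & 1 & 2 \\ 3 & 5 & 7 \\ 1 & 4 & 1 \end{vmatrix}
+ 1\cdot(-1)^{2+1} \begin{vmatrix} 2 & 3 & 4 \\ 3 & 5 & 7 \\ 1 & 4 & 1 \end{vmatrix} \\
&+ 1\cdot(-1)^{3+1} \begin{vmatrix} 2 & 3 & 4 \\ 2 & 1 & 2 \\ 1 & 4 & 1 \end{vmatrix}
+ 2\cdot(-1)^{4+1} \begin{vmatrix} 2 & 3 & 4 \\ 2 & 1 & 2 \\ 3 & 5 & 7 \end{vmatrix}
\end{align*}

The four $3 \times 3$ determinants are

\begin{align*}
\begin{vmatrix} 2 & 1 & 2 \\ 3 & 5 & 7 \\ 1 & 4 & 1 \end{vmatrix}
&= 2\begin{vmatrix} 5 & 7 \\ 4 & 1 \end{vmatrix} - 1\begin{vmatrix} 3 & 7 \\ 1 & 1 \end{vmatrix} + 2\begin{vmatrix} 3 & 5 \\ 1 & 4 \end{vmatrix} = -28 \\
\begin{vmatrix} 2 & 3 & 4 \\ 3 & 5 & 7 \\ 1 & 4 & 1 \end{vmatrix}
&= 2\begin{vmatrix} 5 & 7 \\ 4 & 1 \end{vmatrix} - 3\begin{vmatrix} 3 & 7 \\ 1 & 1 \end{vmatrix} + 4\begin{vmatrix} 3 & 5 \\ 1 & 4 \end{vmatrix} = -6 \\
\begin{vmatrix} 2 & 3 & 4 \\ 2 & 1 & 2 \\ 1 & 4 & 1 \end{vmatrix}
&= 2\begin{vmatrix} 1 & 2 \\ 4 & 1 \end{vmatrix} - 3\begin{vmatrix} 2 & 2 \\ 1 & 1 \end{vmatrix} + 4\begin{vmatrix} 2 & 1 \\ 1 & 4 \end{vmatrix} = 14 \\
\begin{vmatrix} 2 & 3 & 4 \\ 2 & 1 & 2 \\ 3 & 5 & 7 \end{vmatrix}
&= 2\begin{vmatrix} 1 & 2 \\ 5 & 7 \end{vmatrix} - 3\begin{vmatrix} 2 & 2 \\ 3 & 7 \end{vmatrix} + 4\begin{vmatrix} 2 & 1 \\ 3 & 5 \end{vmatrix} = -2
\end{align*}

and therefore

\[
|A| = -28 - 1\cdot(-6) + 1\cdot 14 + (-2)\cdot(-2) = -4
\]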

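(6) Recall the rank of a matrix $A$ is the number of linearly independent column vectors of $A$. It is also equal to the number of linearly independent row vectors of $A$.

\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 7 \\ 5 & 4 & -1 \end{bmatrix}
\]

Take columns 1 and 2.

\[
\lambda_1 \begin{bmatrix} 3 \\ 0 \\ 5 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\;\Leftrightarrow\;
\begin{cases} 3\lambda_1 + 2\lambda_2 = 0 \\ \lambda_2 = 0 \\ 5\lambda_1 + 4\lambda_2 = 0 \end{cases}
\;\Leftrightarrow\; \lambda_1 = 0,\ \lambda_2 = 0 \tag{10.7}
\]

is the only solution. So the first two columns are linearly independent. Now let us take all three columns,

\[
\lambda_1 \begin{bmatrix} 3 \\ 0 \\ 5 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} + \lambda_3 \begin{bmatrix} 1 \\ 7 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\;\Leftrightarrow\;
\begin{cases} 3\lambda_1 + 2\lambda_2 + \lambda_3 = 0 & \text{(i)} \\ \lambda_2 + 7\lambda_3 = 0 & \text{(ii)} \\ 5\lambda_1 + 4\lambda_2 - \lambda_3 = 0 & \text{(iii)} \end{cases}
\]

\[
\Leftrightarrow\;
\begin{cases} \text{(i)} - 2\text{(ii)}: & 3\lambda_1 - 13\lambda_3 = 0 \\ \text{(iii)} - 4\text{(ii)}: & 5\lambda_1 - 29\lambda_3 = 0 \end{cases}
\quad\text{so } \lambda_1 = 0,\ \lambda_3 = 0 \;\Rightarrow\; \lambda_2 = 0
\tag{10.8}
\]

is the only solution. So all three columns are linearly independent. This implies that the rank of matrix $A$ is 3.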

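(7) Recall that the system of equations

$$
A_{n\times k}\, x_{k\times 1} = b_{n\times 1} \tag{10.9}
$$

has a solution if and only if

$$
\operatorname{rank}(A) = \operatorname{rank}(A_b) \tag{10.10}
$$

and the solution, if it exists, is unique if and only if

$$
\operatorname{rank}(A) = \operatorname{rank}(A_b) = k = \text{\# of columns of } A = \text{\# of unknowns} \tag{10.11}
$$

In this question

$$
A = \begin{bmatrix} 3 & 1 \\ 2 & -5 \end{bmatrix}, \qquad A_b = \begin{bmatrix} 3 & 1 & 1 \\ 2 & -5 & 1 \end{bmatrix}
$$

We can verify that the rank of A is 2 as follows:

$$
\lambda_1 \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \lambda_2 \begin{pmatrix} 1 \\ -5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\;\Leftrightarrow\;
\begin{cases} 3\lambda_1 + \lambda_2 = 0 \\ 2\lambda_1 - 5\lambda_2 = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} \lambda_1 = -\dfrac{\lambda_2}{3} \\[4pt] \lambda_1 = \dfrac{5}{2}\lambda_2 \end{cases} \tag{10.12}
$$

which gives $\lambda_1 = 0$, $\lambda_2 = 0$ as the only solution. Then the rank of $A_b$ is also 2 (why?). So a solution exists. Also

$$
\operatorname{rank}(A) = \operatorname{rank}(A_b) = k = 2 = \text{\# of columns of } A = \text{\# of unknowns}.
$$

Hence the solution is unique. $\left(x = \frac{6}{17},\ y = -\frac{1}{17}\right)$ is the unique solution.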
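The stated solution can be confirmed with Cramer's rule. This is a small sketch under the assumption that the system is read off the augmented matrix as 3x + y = 1 and 2x - 5y = 1; the variable names are illustrative.

```python
from fractions import Fraction

# System assumed from the augmented matrix: 3x + y = 1, 2x - 5y = 1.
a11, a12, b1 = 3, 1, 1
a21, a22, b2 = 2, -5, 1

# A nonzero determinant means rank(A) = 2, so the solution is unique.
det_A = a11 * a22 - a12 * a21

# Cramer's rule: replace one column of A by b and divide by det(A).
x = Fraction(b1 * a22 - a12 * b2, det_A)
y = Fraction(a11 * b2 - b1 * a21, det_A)
print(x, y)
```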

(8)

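$A_{11} = 2 > 0$, $A_{11}A_{22} - A_{12}A_{21} = 2\cdot 1 - 1 = 1 > 0$: PD

$B_{11} > 0$, $B_{22} > 0$, $B_{11}B_{22} - B_{12}B_{21} = 2\cdot 8 - 16 = 0$: PSD

$C_{11} < 0$, $C_{11}C_{22} - C_{12}C_{21} = -3\cdot 5 - 16 < 0$: Indefinite

$D_{11} < 0$, $D_{11}D_{22} - D_{12}D_{21} = -3\cdot(-6) - 16 > 0$: ND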

(9) Let $E_t$ and $U_t$ denote the number of employed and unemployed people in some period t. The transition probabilities are defined as follows.

$p_{AA}$ ≡ probability that a current A remains an A,

$p_{AB}$ ≡ probability that a current A moves to B,

$p_{BB}$ ≡ probability that a current B remains a B,

$p_{BA}$ ≡ probability that a current B moves to A.

The distribution of employees at time t is denoted by the vector $x_t' = [A_t \;\; B_t]$ and the transition probabilities in matrix form as

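$$
M = \begin{bmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{bmatrix} = \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}. \tag{10.13}
$$

Then the distribution of employees across the two locations next period (t + 1) is $x_t' \cdot M = x_{t+1}'$, which is

$$
[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix} = [(0.9A_t + 0.3B_t) \;\; (0.1A_t + 0.7B_t)] = [A_{t+1} \;\; B_{t+1}].
$$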


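In a similar manner we can determine the distribution of employees after two periods.

$$
x_{t+1}' \cdot M = x_{t+2}'
$$

$$
[A_{t+1} \;\; B_{t+1}] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix} = [A_{t+2} \;\; B_{t+2}]
$$

$$
[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix} = [A_{t+2} \;\; B_{t+2}]
$$

$$
[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^2 = [A_{t+2} \;\; B_{t+2}]
$$

In general, for n periods,

$$
[A_t \;\; B_t] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^n = [A_{t+n} \;\; B_{t+n}] \tag{10.14}
$$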

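The initial distribution of employees across the two states at time t = 0 is

$$
x_0' = [A_0 \;\; B_0] = [0 \;\; 2000]
$$

Then the distribution of employees in the next period t = 1 is

$$
[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix} = [600 \;\; 1400] = [A_1 \;\; B_1].
$$

The distribution after two periods is

$$
[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^2 = [0 \;\; 2000] \begin{bmatrix} 0.84 & 0.16 \\ 0.48 & 0.52 \end{bmatrix} = [960 \;\; 1040] = [A_2 \;\; B_2]
$$

The distribution after four periods is

$$
[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^4 = [0 \;\; 2000] \begin{bmatrix} 0.782 & 0.218 \\ 0.653 & 0.347 \end{bmatrix} = [1306 \;\; 694] = [A_4 \;\; B_4]
$$

The distribution after six periods is

$$
[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^6 = [0 \;\; 2000] \begin{bmatrix} 0.762 & 0.238 \\ 0.715 & 0.285 \end{bmatrix} = [1430 \;\; 570].
$$

The distribution after ten periods is

$$
[0 \;\; 2000] \begin{bmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}^{10} = [0 \;\; 2000] \begin{bmatrix} 0.751 & 0.249 \\ 0.745 & 0.255 \end{bmatrix} = [1490 \;\; 510] = [A_{10} \;\; B_{10}]
$$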

Observe that when the transition matrix is raised to higher powers, the resulting matrix converges to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady state would be

$$
\lim_{n\to\infty} M^n = \begin{bmatrix} \frac{3}{4} & \frac{1}{4} \\[2pt] \frac{3}{4} & \frac{1}{4} \end{bmatrix}.
$$

Try computing this value.
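One way to carry out this computation is by repeated matrix multiplication. The sketch below uses plain Python lists; the helper names `mat_mul` and `mat_pow` are illustrative, and the choice of 50 periods as a stand-in for "large n" is an assumption made here.

```python
def mat_mul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def mat_pow(m, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


M = [[0.9, 0.1], [0.3, 0.7]]
x0 = [[0.0, 2000.0]]  # row vector [A_0, B_0]

for n in (1, 2, 4, 6, 10, 50):
    xn = mat_mul(x0, mat_pow(M, n))[0]
    print(n, [round(v, 1) for v in xn])

# For large n both rows of M^n approach the steady-state row.
steady = mat_pow(M, 50)
print(steady)
```

The steady-state row π can also be found directly from π = πM together with π_A + π_B = 1, which gives 0.1 π_A = 0.3 π_B and hence π = [3/4, 1/4].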

(10) We use Result 3 to prove this. The determinant of an upper triangular matrix is equal to the product of its diagonal terms. The matrix $[A - \lambda I]$ is also upper triangular, with diagonal entries $a_{jj} - \lambda$. By the definition of an eigenvalue, it is clear that if we take $\lambda_i = a_{ii}$, then the determinant of the matrix $[A - \lambda_i I]$ is zero, since the diagonal entry in row i and column i is zero.

Similar arguments can be used to prove the result for the lower triangular matrix.

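(11) Since A is an invertible matrix, $A^{-1}$ exists and every eigenvalue $\lambda$ of A is non-zero (otherwise $Ax = 0$ would have a non-zero solution). We can pre-multiply the equation $(A - \lambda I)x = 0$ by $A^{-1}$. This yields $(I - \lambda A^{-1})x = 0$, or $\left(\frac{1}{\lambda} I - A^{-1}\right)x = 0$, or $\left(A^{-1} - \frac{1}{\lambda} I\right)x = 0$, as desired. Thus for an invertible matrix A, $\lambda$ is an eigenvalue of A if and only if $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$.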

Chapter 11

Single and Multivariable Calculus

Recall the definition of functions discussed earlier. Now we discuss some features of functions which are useful in optimization exercises.

11.1 Surjective and Injective Functions

Definition 11.1. A function f : D → R is called surjective (or is said to map D onto R) if f(D) = R, i.e., if the image f(D) of the function is equal to the entire range.

Definition 11.2. A function f : D → R is called injective or one to one if

f (x) = f (y)⇔ x = y. (11.1)

A function f : D → R is called a bijection if it is both surjective and injective.

Example 11.1. Consider the function

$$
f : \mathbb{R} \to \mathbb{R}, \quad f(x) = x^2.
$$

It is not surjective, as there exists no element in the domain which gets mapped into −1. Let us restrict the range to $\mathbb{R}_+$. So the new function is

$$
g : \mathbb{R} \to \mathbb{R}_+, \quad g(x) = x^2.
$$

Now this function is surjective, as each non-negative real number has a pre-image (square root) in $\mathbb{R}$. However, this function is not injective, as both −2 and 2 are pre-images of 4.

Next let us also restrict the domain of the function to $\mathbb{R}_+$. The function is

$$
h : \mathbb{R}_+ \to \mathbb{R}_+, \quad h(x) = x^2.
$$

It is both surjective and injective. Hence it is bijective.

Example 11.2. Let A be a non-empty set and let S be a subset of A. We define a function $\chi_S : A \to \{0, 1\}$ by

$$
\chi_S(a) = \begin{cases} 1, & \text{if } a \in S; \\ 0, & \text{if } a \notin S. \end{cases} \tag{11.2}
$$

This function is called the characteristic function or indicator function of S. It is widely used in probability and statistics. If S is a non-empty proper subset of A, then $\chi_S$ is surjective. If $S = \emptyset$ or $S = A$, then $\chi_S$ is not surjective.

Definition 11.3. Inverse Function: Consider f : D → R. If ∃g : R → D such that ∀x ∈ D,

g( f (x)) = x, (11.3)

then g is called the inverse function of f and is written as $f^{-1} : R \to D$. Alternatively, we can also define the inverse function as follows. Let f : D → R be bijective. The inverse function of f is the function $f^{-1} : R \to D$ such that ∀x ∈ D,

f−1 ( f (x)) = x. (11.4)

Theorem 11.1. Let f : D → R be bijective. Then f−1 : R → D is bijective.

Example 11.3. $f(x) = 2x$, $f^{-1}(x) = \dfrac{x}{2}$, $f^{-1}(f(x)) = \dfrac{f(x)}{2} = \dfrac{2x}{2} = x$.

Theorem 11.2. Suppose f : D → R. Let $A, A_1, A_2$ be subsets of D and let B be a subset of R. Then

(a) If f is injective, then $f^{-1}[f(A)] = A$,

(b) If f is surjective, then $f[f^{-1}(B)] = B$,

(c) If f is injective, then $f(A_1 \cap A_2) = f(A_1) \cap f(A_2)$.

Proof. You should try and prove (a) and (b) on your own. I will provide the proof for (c) here. We need to prove that $f(A_1 \cap A_2) \subseteq f(A_1) \cap f(A_2)$ and $f(A_1) \cap f(A_2) \subseteq f(A_1 \cap A_2)$.

Step 1: Show $f(A_1 \cap A_2) \subseteq f(A_1) \cap f(A_2)$.

Let $y \in f(A_1 \cap A_2)$. Then there exists $x \in A_1 \cap A_2$ such that $f(x) = y$. Since $x \in A_1 \cap A_2$, $x \in A_1$ and $x \in A_2$. But then $f(x) \in f(A_1)$ and $f(x) \in f(A_2)$. So $f(x) \in f(A_1) \cap f(A_2)$. Observe that we have not used the fact that f is injective. So this part of the result holds for any function.

Step 2: We need to show

$$
f(A_1) \cap f(A_2) \subseteq f(A_1 \cap A_2).
$$

Let $y \in f(A_1) \cap f(A_2)$. Then $y \in f(A_1)$ and $y \in f(A_2)$. Hence there exist points $x_1 \in A_1$ and $x_2 \in A_2$ such that $f(x_1) = y$ and $f(x_2) = y$. Or

$$
f(x_1) = y = f(x_2).
$$

Since f is injective, we must have $x_1 = x_2$, so $x_1 \in A_1 \cap A_2$. But then $y = f(x_1) \in f(A_1 \cap A_2)$.

Here are some more definitions related to functions.

Definition 11.4.

(a) A function f is odd if and only if for every x, − f (x) = f (−x).

(b) A function f is even if and only if for every x, f (x) = f (−x).

(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f(x + k) = f(x).

(d) A function f is increasing if and only if for every x and every y, if x ≤ y, then f(x) ≤ f(y).

(e) A function f is decreasing if and only if for every x and every y, if x ≤ y, then f(x) ≥ f(y).

11.2 Composition of Functions

Definition 11.5. Composition of Functions: If f : A → B and g : B → C are two functions, then for any a ∈ A, f(a) ∈ B. But B is the domain of g, so the mapping g can be applied to f(a), which yields g(f(a)), an element in C. This establishes a correspondence between a in A and c in C. This correspondence is called the composition function of f and g and is denoted by g ∘ f (read g of f). Thus we have

(g ∘ f)(a) = g(f(a)). (11.5)


Remark 11.1. Composition of two functions need not be commutative; in general,

(g ∘ f)(a) ≠ (f ∘ g)(a)

as the following example shows.

Let f(x) = x², g(x) = x + 1. Then

(g ∘ f)(x) = x² + 1 but

(f ∘ g)(x) = (x + 1)².
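The same example can be tried out numerically. In this short sketch the helper `compose` is a hypothetical name introduced for illustration; it builds g ∘ f from two Python functions.

```python
def compose(g, f):
    """Return the composition g o f, i.e. the map x -> g(f(x))."""
    return lambda x: g(f(x))


f = lambda x: x ** 2
g = lambda x: x + 1

g_of_f = compose(g, f)  # x -> x^2 + 1
f_of_g = compose(f, g)  # x -> (x + 1)^2

print(g_of_f(2), f_of_g(2))
```

Evaluating both at a single point where they differ is enough to show that the two compositions are different functions.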

Theorem 11.3. Let f : A → B, and g : B → C.

(a) If f and g are surjective, then g ∘ f is surjective,

(b) If f and g are injective, then g ∘ f is injective,

(c) If f and g are bijective, then g ∘ f is bijective.

Proof. (a) Since g is surjective, the range of g is C. That is, for any c ∈ C, there exists a b ∈ B such that g(b) = c. Since f is also surjective, there exists a ∈ A such that f(a) = b. But then

(g ∘ f)(a) = g(f(a)) = g(b) = c.

So, g ∘ f is surjective.

(b) Since g is injective, for all b and b′ in B, if g(b) = g(b′) = c ∈ C then b = b′, and since f is injective, for all a and a′ in A, if f(a) = f(a′) = b ∈ B then a = a′. Then

$$
(g \circ f)(a) = (g \circ f)(a') \;\Rightarrow\; g(f(a)) = g(f(a')) \;\Rightarrow\; f(a) = f(a') \;\Rightarrow\; a = a'
$$

So, g ∘ f is injective.

(c) The proof of this result follows from (a) and (b).

11.3 Continuous Functions

Definition 11.6. The real number L is the limit of the function f : D → R at the point c ∈ D if and only if for each ε > 0, there exists a δ > 0 such that |f(x) − L| < ε whenever x ∈ D and 0 < |x − c| < δ.

Definition 11.7. A function f : D → R is continuous at x0 ∈ D, if

∀ε > 0,∃δ > 0 ϶ d (x,x0)< δ ⇒ d ( f (x) , f (x0))< ε. (11.6)

A function f : D → R is continuous if it is continuous at all x0 ∈ D.

It is easy to draw examples of functions which are not continuous. An intuitive way of understanding continuity of a function is that we should be able to draw its graph without lifting the pencil from the paper. If a function has a point of discontinuity, say $x_0$, then as we approach $x_0$ from the left hand side and from the right hand side, the function either approaches different values or approaches a value different from $f(x_0)$.

For a function to be continuous at $x_0$, both the left-hand and right-hand limits must exist and equal the function value.

$$
\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = f(x_0) \tag{11.7}
$$

Theorem 11.4. A function f : D → R is continuous if and only if for every convergent sequence of points $x_n \in D$ with limit $x \in D$, the sequence $f(x_n) \to f(x)$.

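Example 11.4. If

$$
\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) \neq f(x_0)
$$

then the function is not continuous. Take

$$
y = \begin{cases} x & \text{for } 0 \le x < \frac{1}{2} \\ 0 & \text{for } x = \frac{1}{2} \\ 1 - x & \text{for } \frac{1}{2} < x \le 1 \end{cases}
$$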

Definition 11.8. Given f : D → R, let A ⊆ R be any subset of the range. The inverse image of A under f, $f^{-1}(A)$, is the set of points x in the domain D such that f(x) ∈ A:

$$
f^{-1}(A) = \{ x \in D \mid f(x) \in A \}. \tag{11.8}
$$

We give two more theorems on continuity of functions again without proof.

Theorem 11.5. A function f : D → R is continuous if and only if the inverse image of every open set is open.


Proof. Suppose f is continuous on D and V is an open set in R. We have to show that $f^{-1}(V)$ is open in D (i.e., every point of $f^{-1}(V)$ is an interior point of $f^{-1}(V)$). Let p ∈ D with f(p) ∈ V. Since V is open, there exists ε > 0 such that y ∈ V if d(f(p), y) < ε. Also, since f is continuous at p, there exists a δ > 0 such that d(f(p), f(x)) < ε if d(p, x) < δ. Thus $x \in f^{-1}(V)$ as soon as d(p, x) < δ, and hence $f^{-1}(V)$ is open.

Conversely, assume that $f^{-1}(V)$ is open in D for every open set V in R. Fix p ∈ D and ε > 0, and let V be the set of all y ∈ R such that d(f(p), y) < ε. Then V is open and hence $f^{-1}(V)$ is open, and so there exists δ > 0 such that $x \in f^{-1}(V)$ as soon as d(p, x) < δ. But if $x \in f^{-1}(V)$, then f(x) ∈ V, and so d(f(p), f(x)) < ε.

Corollary 1. A function f : D → R is continuous if and only if the inverse image of every closed set is closed.

This follows from Theorem 11.5, since a set is closed if and only if its complement is open, and since $f^{-1}(V^c) = [f^{-1}(V)]^c$ for every V ⊂ R.

11.3.1 Properties of Continuous Functions

Claim 12. If f and g are continuous functions then

$$
f \pm g, \qquad f \cdot g, \qquad \frac{f}{g} \ (\text{if } g \neq 0), \qquad \max\{f, g\}, \qquad \min\{f, g\}
$$

are continuous.

Claim 13. If f is a continuous function of two variables $f(x_1, x_2)$, then the functions of one variable obtained by holding the other variable constant, $f(\cdot, x_2)$ and $f(x_1, \cdot)$, are also continuous.

Theorem 11.6. Intermediate Value Theorem for continuous functions: Let f be a continuous function on a domain containing [a, b], with say f(a) < f(b). Then for any y in between, f(a) < y < f(b), there exists c in (a, b) with f(c) = y.

We can apply the Intermediate Value Theorem to prove the existence of a fixed point for the following function.

Theorem 11.7. Consider a continuous function f : [0, 1] → [0, 1]. Then there exists c ∈ [0, 1] such that f(c) = c.

Figure 11.1: Intermediate Value Theorem (the graph of y = f(x) on [a, b] crosses the horizontal line y = u, with f(a) < u < f(b), at x = c)


Proof. Define a function g(x) = f(x) − x. It is continuous since it is the sum of two continuous functions, f(x) and −x. If f(0) = 0, then x = 0 is a fixed point. If not, then f(0) > 0, or g(0) > 0.

If f(1) = 1, then x = 1 is a fixed point. If not, then f(1) < 1, or g(1) < 0.

Now we apply the Intermediate Value Theorem to claim that there exists a point c ∈ [0, 1] such that g(c) = 0. This implies g(c) = f(c) − c = 0, or f(c) = c, so c is a fixed point.
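The proof is an existence argument, but the sign change of g also suggests a simple way to locate a fixed point numerically by bisection. The sketch below is illustrative: the function name `fixed_point`, the tolerance, and the test function cos (which maps [0, 1] into [0, 1]) are all assumptions made for this example.

```python
import math


def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Locate c in [lo, hi] with f(c) = c by bisection on g(x) = f(x) - x."""
    g = lambda x: f(x) - x
    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    # As in the proof, g(lo) > 0 and g(hi) < 0 at this point.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid  # the sign change lies to the right of mid
        else:
            hi = mid  # the sign change lies at or to the left of mid
    return (lo + hi) / 2


c = fixed_point(math.cos)
print(c, math.cos(c))
```

Each halving keeps an interval on which g changes sign, so the Intermediate Value Theorem guarantees a zero of g inside it at every step.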

11.4 Extreme Values

Definition 11.9. The function f : D → R has a local maximum at $x_0$ if there exists a neighborhood of $x_0$ such that f(x) ≤ f($x_0$) for all x in the neighborhood. The function f : D → R has a strict local maximum at $x_0$ if there exists a neighborhood of $x_0$ such that f(x) < f($x_0$) for all x not equal to $x_0$ in the neighborhood. Local minima are defined by reversing the inequalities.

Definition 11.10. The function f : D → R has a global maximum at $x_0$ if f(x) ≤ f($x_0$), ∀x ∈ D. The function f : D → R has a strict global maximum at $x_0$ if f(x) < f($x_0$), ∀x ∈ D \ {$x_0$}.

Global minima are defined by reversing the inequalities.

Remark 11.2. A global maximum (minimum) is also a local maximum (minimum).

Theorem 11.8. Weierstrass Theorem: Suppose D is a non-empty closed and bounded subset of $\mathbb{R}^n$. If f : D → R is continuous on D, then there exist $x^*$ and $x_*$ in D such that

$$
f(x^*) \ge f(x) \ge f(x_*), \quad \forall x \in D. \tag{11.9}
$$

This is the theorem we will be using to show the existence of optimal bundles for consumers and producers. So we need to understand it and be comfortable with using it.

The following examples show why the function domain must be closed and bounded in order for the theorem to apply. Each fails to attain a maximum on the given interval.

(a) $f(x) = x$ defined over [0, ∞) is not bounded from above.

(b) $f(x) = \frac{x}{1+x}$ defined over [0, ∞) is bounded but does not attain its least upper bound 1.

(c) $f(x) = \frac{1}{x}$ defined over (0, 1] (which is bounded but not closed) is not bounded from above.

(d) $f(x) = 1 - x$ defined over (0, 1] is bounded but never attains its least upper bound 1.

(e) Defining f(0) = 0 in the last two examples extends each function to the closed and bounded interval [0, 1], yet neither attains a maximum. This shows that the theorem also requires continuity on [a, b].

11.5 An application of Extreme Values Theorem

Result 5. Equivalence of norms in finite dimensional vector space.

If we are given two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on some finite-dimensional vector space V over Rn, a very useful fact is that they are always within a constant factor of one another. In other words, there exists a pair of real numbers $0 < C_1 \le C_2$ such that, for all x ∈ V, the following inequality holds:

$$
C_1 \|x\|_b \le \|x\|_a \le C_2 \|x\|_b.
$$

Note that any finite-dimensional vector space, by definition, is spanned by a basis $e_1, e_2, \dots, e_n$, where n is the dimension of the vector space. The basis is often chosen to be orthonormal if we have an inner product. That is, any vector x can be written

$$
x = \sum_{i=1}^{n} \alpha_i e_i,
$$

where the $\alpha_i$ are some real numbers depending on x.

Now, we can prove equivalence of norms in four steps, the last of which requires application of the Extreme Value Theorem.

Step 1 It is sufficient to consider $\|\cdot\|_b = \|\cdot\|_1$ (the transitivity property for norm equivalence holds.)

First, let us define a taxi-cab style norm by

$$
\|x\|_1 = \sum_{i=1}^{n} |\alpha_i|
$$

We have seen earlier in a problem set that it is indeed a norm. The linear independence of any basis $e_i$ implies that $x \neq 0 \iff |\alpha_j| > 0$ for some $j \iff \|x\|_1 > 0$. The triangle inequality and the scaling property are obvious and follow from the usual properties of the 1-norm on $\mathbb{R}^n$.

We will show that it is sufficient to prove that $\|\cdot\|_a$ is equivalent to $\|\cdot\|_1$, because norm equivalence is transitive: if two norms are equivalent to $\|\cdot\|_1$, then they are equivalent to each other.

In particular, suppose both $\|\cdot\|_a$ and $\|\cdot\|_b$ are equivalent to $\|\cdot\|_1$ for constants $0 < C_1 \le C_2$ and $0 < C_1' \le C_2'$, respectively:

$$
C_1 \|x\|_1 \le \|x\|_a \le C_2 \|x\|_1,
$$

$$
C_1' \|x\|_1 \le \|x\|_b \le C_2' \|x\|_1.
$$

Then it immediately follows that

$$
\frac{C_1'}{C_2} \|x\|_a \le \|x\|_b \le \frac{C_2'}{C_1} \|x\|_a,
$$

and hence $\|\cdot\|_a$ and $\|\cdot\|_b$ are equivalent.

Step 2 It is sufficient to consider only x with $\|x\|_1 = 1$.

We want to show that

$$
C_1 \|x\|_1 \le \|x\|_a \le C_2 \|x\|_1
$$

is true for all x ∈ V for some $C_1, C_2$. It is trivially true for x = 0, so we need only consider x ≠ 0, in which case we can divide by $\|x\|_1$ to obtain the condition

$$
C_1 \le \left\| \frac{x}{\|x\|_1} \right\|_a \le C_2,
$$

where $u \equiv \dfrac{x}{\|x\|_1}$ has norm $\|u\|_1 = 1$.

Step 3 Any norm $\|x\|_a$ is continuous under $\|x\|_1$.

We wish to show that any norm $\|\cdot\|_a$ is a continuous function on V under the topology induced by the norm $\|\cdot\|_1$. That is, we wish to show that for any ε > 0, there exists a δ > 0 such that

$$
\|x - x'\|_1 < \delta \;\Rightarrow\; \big|\, \|x\|_a - \|x'\|_a \big| < \varepsilon
$$

We prove this in two steps. First, by the triangle inequality on $\|\cdot\|_a$, it follows that

$$
\|x\|_a - \|x'\|_a = \|x' + (x - x')\|_a - \|x'\|_a \le \|x - x'\|_a,
$$

and

$$
\|x'\|_a - \|x\|_a = \|x - (x - x')\|_a - \|x\|_a \le \|x - x'\|_a,
$$

and therefore,

$$
\big|\, \|x\|_a - \|x'\|_a \big| \le \|x - x'\|_a.
$$

Second, applying the triangle inequality again, and writing $x = \sum_{i=1}^{n} \alpha_i e_i$ and $x' = \sum_{i=1}^{n} \alpha_i' e_i$, we obtain

$$
\|x - x'\|_a \le \sum_{i=1}^{n} |\alpha_i - \alpha_i'| \, \|e_i\|_a \le \|x - x'\|_1 \left( \max_i \|e_i\|_a \right).
$$

Therefore, if we choose

$$
\delta = \frac{\varepsilon}{\max_i \|e_i\|_a}
$$

it immediately follows that

$$
\|x - x'\|_1 < \delta \;\Rightarrow\; \big|\, \|x\|_a - \|x'\|_a \big| < \varepsilon
$$

Step 4 The maximum and minimum of $\|x\|_a$ on the unit sphere

Now we have a continuous function (the norm $\|\cdot\|_a$) on a compact (closed and bounded) non-empty domain (the unit sphere $\{u : \|u\|_1 = 1\}$) and can apply the Weierstrass Theorem. By the extreme value theorem, the function must achieve a maximum and minimum value on the set (it cannot merely approach them). Let

$$
C_1 = \min_{\|u\|_1 = 1} \|u\|_a, \quad \text{and} \quad C_2 = \max_{\|u\|_1 = 1} \|u\|_a.
$$

Since $u \neq 0$ for $\|u\|_1 = 1$, it follows that $C_2 \ge C_1 > 0$, and $C_1 \le \|u\|_a \le C_2$ as required in Step 2. This completes the proof.

11.6 Differentiability

Definition 11.11. A function f : R → R is differentiable at $x_0$ ∈ R, if

$$
\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \ \text{exists.} \tag{11.10}
$$

If this limit exists, we call it the derivative of f at $x_0$ and denote it by $f'(x_0)$ or $\left.\frac{df(x)}{dx}\right|_{x = x_0}$.

We follow the steps listed below to determine whether a derivative exists and if yes, its value.

(a) $\Delta f = f(x_0 + h) - f(x_0)$ is the change in functional value.

(b) The slope of the secant is $\dfrac{\Delta f}{h} = \dfrac{f(x_0 + h) - f(x_0)}{h}$.

(c) If the secant slope $\dfrac{\Delta f}{h}$ has a limit as h → 0, then f is differentiable at $x_0$, and the derivative is equal to this limit.

We can see that the derivative is equal to the slope of the tangent to the graph at $x_0$. Note that the tangent can be used to approximate the function in the neighborhood of $x_0$:

$$
f(x_0 + h) \approx f(x_0) + h \cdot f'(x_0).
$$

It is the best linear approximation.


Definition 11.12. A function f : R → R is differentiable on a set S ⊆ R, if it is differentiable at each point x ∈ S. It is called differentiable if it is differentiable at each point of the domain.

Example 11.5. Let f : R → R be $f(x) = x^2$. This function is differentiable at all x ∈ R.

$$
\alpha_{\text{sec}} = \frac{f(x_0 + h) - f(x_0)}{h} = \frac{(x_0 + h)^2 - x_0^2}{h} = \frac{(x_0^2 + h^2 + 2x_0 h) - x_0^2}{h} = \frac{2x_0 h + h^2}{h} = 2x_0 + h
$$

$$
\lim_{h \to 0} \alpha_{\text{sec}} = 2x_0 \;\Rightarrow\; f'(x_0) = 2x_0.
$$
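The convergence of the secant slopes can be observed numerically. This is a minimal sketch; the function name `secant_slope`, the point x0 = 3, and the chosen step sizes are illustrative assumptions.

```python
def secant_slope(f, x0, h):
    """Slope of the secant through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h


f = lambda x: x ** 2
x0 = 3.0

# For f(x) = x^2 the secant slope equals 2*x0 + h, so it tends to 2*x0 as h -> 0.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(f, x0, h))
```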

Definition 11.13. Second derivative: Let the function f : R → R be differentiable with f′(·) denoting its first derivative. If f′(·) is differentiable, its derivative is denoted by f′′(·) and is called the second derivative of f.

Definition 11.14. A function whose derivative exists and is continuous is called continuously differentiable or of class $C^1$. A function whose second derivative exists and is continuous is called twice continuously differentiable or of class $C^2$.

Result 6. If function f : R→ R is differentiable at x0 then it is continuous at x0.

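Proof. Since f : R → R is differentiable at $x_0$ ∈ R, the limit

$$
\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}
$$

exists and is $f'(x_0)$. Consider

$$
\lim_{x \to x_0} [f(x) - f(x_0)] = \lim_{x \to x_0} [x - x_0] \left[ \frac{f(x) - f(x_0)}{x - x_0} \right]
= \lim_{x \to x_0} [x - x_0] \cdot \lim_{x \to x_0} \left[ \frac{f(x) - f(x_0)}{x - x_0} \right] = 0 \cdot f'(x_0) = 0,
$$

so that

$$
\lim_{x \to x_0} f(x) = f(x_0).
$$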

Hence f is continuous at x0.


Figure 11.2: Continuity does not imply differentiability (graph of f(x) = |x|, which is not differentiable at x = 0)

Note that this claim does not hold in the other direction. Not all continuous functions are differentiable. Consider the example of the absolute value function f : R → R defined by

$$
f(x) = |x|.
$$

The absolute value |x| of x is defined by

$$
|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases}
$$

It is easy to check that f is continuous on R. However, it is not differentiable at $x_0 = 0$ (Please verify).

11.6.1 Rules of Differentiation

Theorem 11.9. If f and g are differentiable functions then

$$
f \pm g \text{ is differentiable with } (f \pm g)'(x) = f'(x) \pm g'(x) \tag{11.11}
$$

$$
f \cdot g \text{ is differentiable with } (f \cdot g)'(x) = f'(x) g(x) + f(x) g'(x) \tag{11.12}
$$

$$
\text{If } g \neq 0, \text{ then } \frac{f}{g} \text{ is differentiable with } \left( \frac{f}{g} \right)'(x) = \frac{f'(x) g(x) - f(x) g'(x)}{(g(x))^2}. \tag{11.13}
$$


Theorem 11.10. Chain Rule: If f and g are differentiable, then

$$
f \circ g \text{ is differentiable with } (f \circ g)'(x) = f'(g(x)) \cdot g'(x) \tag{11.14}
$$

Example 11.6. Let $f(y) = \ln y$ and $g(x) = x^2$. Then $(f \circ g)(x) = \ln(x^2)$ and

$$
(f \circ g)'(x) = \frac{1}{x^2} \cdot 2x = \frac{2}{x}
$$

Theorem 11.11. If f is differentiable and has a local maximum or minimum at $x_0$, then $f'(x_0) = 0$.

Note that the converse is not true. Take $f(x) = x^3$ (See Figure 11.3). The first derivative is zero at $x_0 = 0$, which is a point of inflection.

The following two examples illustrate differentiability and continuous differentiability of a function.

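Example 11.7. Let f be defined by

$$
f(x) = \begin{cases} x \sin\left(\frac{1}{x}\right) & \text{for } x \neq 0 \\ 0 & \text{for } x = 0. \end{cases}
$$

We know that the derivative of sin(x) is cos(x). Using it,

$$
f'(x) = \sin\left(\frac{1}{x}\right) + x \cos\left(\frac{1}{x}\right)\left(-\frac{1}{x^2}\right) = \sin\left(\frac{1}{x}\right) - \frac{1}{x}\cos\left(\frac{1}{x}\right) \quad \text{for } x \neq 0.
$$

At x = 0, this does not work as $\frac{1}{x}$ is not defined there. We use the definition: for h ≠ 0, the secant slope is

$$
\frac{f(h) - f(0)}{h} = \frac{h \sin\left(\frac{1}{h}\right) - 0}{h} = \sin\left(\frac{1}{h}\right).
$$

As h → 0, $\sin\left(\frac{1}{h}\right)$ does not tend to any limit, so $f'(0)$ does not exist.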

Example 11.8. Let f be defined by

$$
f(x) = \begin{cases} x^2 \sin\left(\frac{1}{x}\right) & \text{for } x \neq 0 \\ 0 & \text{for } x = 0. \end{cases}
$$

We know that the derivative of sin(x) is cos(x). Using it,

$$
f'(x) = 2x \sin\left(\frac{1}{x}\right) + x^2 \cos\left(\frac{1}{x}\right)\left(-\frac{1}{x^2}\right) = 2x \sin\left(\frac{1}{x}\right) - \cos\left(\frac{1}{x}\right) \quad \text{for } x \neq 0.
$$

At x = 0, we use the definition as before: for h ≠ 0, the secant slope is

$$
\frac{f(h) - f(0)}{h} = \frac{h^2 \sin\left(\frac{1}{h}\right) - 0}{h} = h \sin\left(\frac{1}{h}\right), \qquad
\left| \frac{f(h) - f(0)}{h} \right| = \left| h \sin\left(\frac{1}{h}\right) \right| \le |h|
$$

As h → 0, we see that $f'(0) = 0$. Thus f(x) is differentiable everywhere, but $f'(x)$ is not continuous, as $\cos\left(\frac{1}{x}\right)$ does not tend to a limit as x → 0.

Figure 11.3: Graph of $x^3$

Figure 11.4: $f(x) = \sin\frac{1}{x}$

11.6.2 L’Hospital’s Rule

Sometimes we need to determine the limit of a ratio where both the numerator and the denominator go to zero. We use L’Hospital’s rule in such a case. If f and g are differentiable at a, f(a) = g(a) = 0 and g′(a) ≠ 0, then

$$
\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}.
$$

Example 11.9. Find $\displaystyle \lim_{x \to 4} \frac{x^2 - 16}{4\sqrt{x} - 8}$.

$$
f(x) = x^2 - 16, \qquad g(x) = 4\sqrt{x} - 8
$$

$$
f(4) = g(4) = 0, \qquad f'(x) = 2x, \qquad g'(x) = \frac{2}{\sqrt{x}}.
$$

Then

$$
\lim_{x \to 4} \frac{x^2 - 16}{4\sqrt{x} - 8} = \frac{f'(4)}{g'(4)} = \frac{8}{1} = 8.
$$
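A quick numerical sanity check of this limit is to evaluate the ratio at points close to 4. The sketch below is illustrative; the step sizes are arbitrary choices, and the ratio is never evaluated at x = 4 itself, where it is 0/0.

```python
import math

f = lambda x: x ** 2 - 16
g = lambda x: 4 * math.sqrt(x) - 8

# Evaluate f/g at points approaching 4 from the right and from the left.
steps = (0.1, 0.01, 0.001, -0.001)
ratios = [f(4 + h) / g(4 + h) for h in steps]
print(ratios)

# Value predicted by L'Hospital's rule: f'(4) / g'(4).
predicted = (2 * 4) / (2 / math.sqrt(4))
print(predicted)
```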

11.7 Monotone Functions

Definition 11.15. Function f is monotone increasing at $x_0$ if there exists a neighborhood of $x_0$ such that

$$
f(x_1) \le f(x_0) \le f(x_2)
$$

for all $x_1, x_2$ in the neighborhood satisfying $x_1 < x_0 < x_2$.

Definition 11.16. Function f is strictly increasing at $x_0$ if there exists a neighborhood of $x_0$ such that

$$
f(x_1) < f(x_0) < f(x_2)
$$

for all $x_1, x_2$ in the neighborhood satisfying $x_1 < x_0 < x_2$.

Definition 11.17. Function f is monotone increasing on an interval if for all points $x_1, x_2$ in the interval satisfying $x_1 < x_2$

$$
f(x_1) \le f(x_2).
$$

Definition 11.18. Function f is strictly increasing on an interval if for all points $x_1, x_2$ in the interval satisfying $x_1 < x_2$

$$
f(x_1) < f(x_2).
$$

We define monotone and strictly decreasing functions in the same way by reversing the inequalities. Some properties of the derivative of monotone functions are

$$
f'(x_0) \;\begin{matrix} > \\ < \end{matrix}\; 0 \;\Rightarrow\; f \text{ is } \begin{matrix} \text{strictly increasing} \\ \text{strictly decreasing} \end{matrix} \text{ at } x_0. \tag{11.15}
$$

$$
f'(x_0) \;\begin{matrix} \ge \\ \le \end{matrix}\; 0 \;\Leftrightarrow\; f \text{ is } \begin{matrix} \text{monotone increasing} \\ \text{monotone decreasing} \end{matrix} \text{ at } x_0. \tag{11.16}
$$


Theorem 11.12. Intermediate Value Theorem for derivative: If f is differentiable on (a, b) then its derivative has the intermediate value property. If $x_1 < x_2$ are any two points in the interval, then f′(x) assumes all values between $f'(x_1)$ and $f'(x_2)$ on the interval $(x_1, x_2)$.

Theorem 11.13. Mean Value Theorem: Let f be a continuous function on the compact interval [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) where

$$
f'(c) = \frac{f(b) - f(a)}{b - a}.
$$

The following claim is helpful in proving the Mean Value Theorem.

Claim 14. Let f(·) and g(·) be continuous functions on [a, b] and differentiable on (a, b). Then there exists x ∈ (a, b) such that

$$
[f(b) - f(a)]\, g'(x) = [g(b) - g(a)]\, f'(x).
$$

Proof. Define

$$
h(s) = [f(b) - f(a)]\, g(s) - [g(b) - g(a)]\, f(s).
$$

Then, it is easy to check that h(a) = f(b)g(a) − f(a)g(b) = h(b). We need to show that h′(x) = 0 for some x ∈ (a, b). If h(x) is a constant function, then h′(x) = 0 for every point in (a, b). If not, then consider without loss of generality h(x) > h(a) for some x ∈ (a, b). Since h(·) is a continuous function defined on a compact domain [a, b], the Weierstrass Theorem can be applied to claim that it attains a maximum at some point s ∈ (a, b). Also, since h(·) is differentiable on (a, b) and attains its maximum at s ∈ (a, b), h′(s) = 0. The case where h(x) < h(a) for some x ∈ (a, b) can be proved in a similar manner, as in this case the function h(·) will attain a minimum at some interior point.

To prove the Mean Value Theorem, we consider g(x) = x. Then g′(x) = 1 leads to

$$
[f(b) - f(a)](1) = [b - a]\, f'(x) \quad \text{or} \quad f'(x) = \frac{f(b) - f(a)}{b - a},
$$

for some x ∈ (a, b).

11.8 Functions of Several Variables

Let f : D → R where D ⊆ $\mathbb{R}^n$ be a function of n variables.

$$
f(x) = f(x_1, x_2, \dots, x_n) \tag{11.17}
$$

Examples of such functions are utility functions for several goods, production functions for many inputs, etc.

Figure 11.5: Mean Value Theorem, $f'(c) = \dfrac{f(b) - f(a)}{b - a}$ (the tangent at c is parallel to the secant through (a, f(a)) and (b, f(b)))

Definition 11.19. The function f(x) is differentiable at the point x if there exists an n dimensional vector Df(x), called the differential or total derivative of f at x, such that

$$
\forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } \|x - y\| < \delta \;\Rightarrow\; |f(x) - f(y) - Df(x) \cdot (x - y)| < \varepsilon \cdot \|x - y\|.
$$

11.8.1 Partial Derivative

To us the more important concept is that of the partial derivative, which we define now.

Definition 11.20. Let f : D → R where D ⊆ $\mathbb{R}^n$ be a function of n variables. If the limit

$$
\lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h}
$$

exists, it is called the ith (first order) partial derivative of f at x and is denoted by $\dfrac{\partial f(x)}{\partial x_i}$ or $f_i(x)$.

The function f(x) is then said to be partially differentiable with respect to $x_i$. The function f(x) is said to be partially differentiable if it is partially differentiable with respect to every $x_i$.

Note that $\dfrac{\partial f(x)}{\partial x_i}$ is the derivative of $f(x_1, \dots, x_n)$ with respect to $x_i$ holding all other variables constant. When all the partial derivatives exist, the vector of partial derivatives

$$
\nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \right]
$$

is called the Jacobian vector or the gradient vector. For functions of one variable, ∇f(x) = f′(x).

Result 7. If a function is differentiable at x0 then it is partially differentiable at x0.

However, the existence of all the partial derivatives does not guarantee even the continuity of the function, as the following example shows.

Example 11.10. Let f(x, y) be defined as

$$
f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\ 0 & \text{otherwise.} \end{cases}
$$

We can prove that the partial derivatives $D_1 f(x, y)$ and $D_2 f(x, y)$ exist at every point in $\mathbb{R}^2$, although f is not continuous at (0, 0).
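The discontinuity at the origin can be seen by approaching (0, 0) along two different paths. This is an illustrative sketch; the sample points and the list names are chosen here and are not part of the notes.

```python
def f(x, y):
    """The function of Example 11.10, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


ts = (0.1, 0.01, 0.001)

# Along the x-axis the function is identically 0, consistent with D1 f(0, 0) = 0.
along_axis = [f(t, 0.0) for t in ts]

# Along the diagonal y = x the function equals 1/2, however close to the origin.
along_diagonal = [f(t, t) for t in ts]

print(along_axis, along_diagonal)
```

Since the values along the diagonal stay at 1/2 while f(0, 0) = 0, the function has no limit equal to f(0, 0) at the origin, even though both partial derivatives exist there.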

However, if f is a real valued function defined on an open set D in $\mathbb{R}^n$, and the partial derivatives are bounded in D, then f is continuous in D.

Example 11.11. Let f : $\mathbb{R}^2$ → R be

$$
f(x_1, x_2) = x_1^3 + 2x_1 x_2 + 3x_2^3.
$$

Then

$$
\frac{\partial f(x)}{\partial x_1} = 3x_1^2 + 2x_2, \qquad \frac{\partial f(x)}{\partial x_2} = 2x_1 + 9x_2^2
$$

$$
\nabla f(x) = \left[ 3x_1^2 + 2x_2,\ 2x_1 + 9x_2^2 \right], \quad \forall x \in \mathbb{R}^2
$$

For functions of one variable we have seen earlier that we could approximate the function around a point by the tangent to the function at the point. We can do something similar in the case of functions of several variables. Instead of approximation by a line (the tangent), we now approximate by the tangent hyperplane.

Definition 11.21. Given f : D → R with gradient ∇f(x) at $x_0$, the tangent hyperplane to f at $x_0$ is given by

$$
f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0).
$$

Note that in an n dimensional world, the tangent hyperplane is an (n − 1) dimensional object.


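11.8.2 Second Order Partial Derivatives

Let us look at the example above again. For

$$
f(x_1, x_2) = x_1^3 + 2x_1 x_2 + 3x_2^3,
$$

$\dfrac{\partial f(x)}{\partial x_1} = 3x_1^2 + 2x_2$ and $\dfrac{\partial f(x)}{\partial x_2} = 2x_1 + 9x_2^2$ are differentiable functions of $x_1$ and $x_2$ themselves. When we take partial derivatives of these functions we get the second partial derivatives.

$$
\frac{\partial^2 f(x)}{\partial x_1^2} = 6x_1, \qquad \frac{\partial^2 f(x)}{\partial x_2^2} = 18x_2, \qquad \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} = \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} = 2.
$$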

This example can be generalized.

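Definition 11.22. Let f : $\mathbb{R}^n$ → R be twice differentiable. For each of the n partial derivatives, we get n partial derivatives of second order,

$$
\frac{\partial}{\partial x_j} \left( \frac{\partial f(x)}{\partial x_i} \right) = \frac{\partial^2 f(x)}{\partial x_j \partial x_i} = f_{ij}(x).
$$

We organize the second order derivatives in a matrix, called the Hessian Matrix.

$$
H_f(x) = D^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} \\
\dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix}. \tag{11.18}
$$

If all the partial derivatives of the first order exist and are continuous then f is called $C^1$ or continuously differentiable. If all the partial derivatives of second order exist and are continuous then f is called $C^2$ or twice continuously differentiable, and so forth.

Theorem 11.14. Young’s Theorem: If f is twice continuously differentiable then

$$
\frac{\partial^2 f(x)}{\partial x_j \partial x_i} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j},
$$

i.e., the Hessian of f is a symmetric matrix.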

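Example 11.12. For the example above

$$
H_f(x) = \begin{bmatrix} 6x_1 & 2 \\ 2 & 18x_2 \end{bmatrix}.
$$

The off-diagonal elements of the Hessian are also called cross-partials. For functions of one variable, $H_f(x) = f''(x)$.

Example 11.13. Let f : $\mathbb{R}^3$ → R be

$$
f(x) = 5x_1^2 + x_1 x_2^3 - x_2^2 x_3^2 + x_3^3.
$$

Then

$$
\nabla f(x) = \begin{bmatrix} 10x_1 + x_2^3 & 3x_1 x_2^2 - 2x_2 x_3^2 & -2x_2^2 x_3 + 3x_3^2 \end{bmatrix}
$$

and

$$
H_f(x) = \begin{bmatrix} 10 & 3x_2^2 & 0 \\ 3x_2^2 & 6x_1 x_2 - 2x_3^2 & -4x_2 x_3 \\ 0 & -4x_2 x_3 & -2x_2^2 + 6x_3 \end{bmatrix}
$$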
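The gradient and Hessian of Example 11.13 can be checked against central finite differences. This is a sketch under stated assumptions: the evaluation point p = (1, 2, -1), the step size, and the helper name `numeric_grad` are arbitrary illustrative choices.

```python
def f(x1, x2, x3):
    return 5 * x1**2 + x1 * x2**3 - x2**2 * x3**2 + x3**3


def grad(x1, x2, x3):
    """Analytic gradient from Example 11.13."""
    return [10 * x1 + x2**3,
            3 * x1 * x2**2 - 2 * x2 * x3**2,
            -2 * x2**2 * x3 + 3 * x3**2]


def hessian(x1, x2, x3):
    """Analytic Hessian from Example 11.13."""
    return [[10, 3 * x2**2, 0],
            [3 * x2**2, 6 * x1 * x2 - 2 * x3**2, -4 * x2 * x3],
            [0, -4 * x2 * x3, -2 * x2**2 + 6 * x3]]


def numeric_grad(func, point, h=1e-6):
    """Central-difference approximation to the gradient of func at point."""
    out = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        out.append((func(*up) - func(*down)) / (2 * h))
    return out


p = (1.0, 2.0, -1.0)
approx_grad = numeric_grad(f, p)
# Row i of the Hessian is the gradient of the i-th partial derivative.
approx_hess = [numeric_grad(lambda *x, i=i: grad(*x)[i], p) for i in range(3)]
print(grad(*p), approx_grad)
print(hessian(*p), approx_hess)
```

The symmetry of the analytic Hessian is what Young's Theorem predicts for this twice continuously differentiable function.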

We now provide three very useful theorems on continuous and differentiable functions

on convex sets in Rn for n ≥ 1. They are the Intermediate Value theorem, the Mean Valuetheorem and Taylor’s theorem.

Theorem 11.15 (Intermediate Value Theorem:). Suppose A is a convex subset of Rn, andf : A → R is a continuous function on A. Suppose x1 and x2 are in A, and f (x1) > f (x2).Then given any c ∈ R such that f (x1) > c > f (x2), there is 0 < θ < 1 such that f [θx1 +(1−θ)x2] = c.

Example 11.14. Suppose X ≡ [a,b] is a closed interval in R (with a < b). Suppose f isa continuous function on X . By Weierstrass theorem, there will exist x1 and x2 in X suchthat f (x1) ≥ f (x) ≥ f (x2) for all x ∈ X . If f (x1) = f (x2) [this is the trivial case], thenf (x) = f (x1) for all x ∈ X , and so f (X) is the single point, f (x1). If f (x1) > f (x2), thenusing the fact that X is a convex set, we can conclude from the Intermediate Value Theoremthat every value between f (x1) and f (x2) is attained by the function f at some point in X .This shows that, f (X) is itself a closed interval.

Theorem 11.16 (Mean Value Theorem). Suppose A is an open convex subset of Rn, andf : A → R is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is0 ≤ θ ≤ 1 such that

f (x2)− f (x1) = (x2 − x1)∇ f (θx1 +(1−θ)x2)

Example 11.15. Let f : R→R be a continuously differentiable function with the propertythat f ′(x)> 0 for all x ∈R. Then given any x1, x2 in R, with x2 > x1 we have by the Mean-Value Theorem (since R is open and convex), the existence of 0 ≤ θ ≤ 1, such that

f (x2)− f (x1) = (x2 − x1) f ′(θx1 +(1−θ)x2)

Now f ′(θx1+(1−θ)x2)> 0 by assumption, and x2 > x1 by hypothesis. So f (x2)> f (x1).This shows that f is an increasing function on R.


Observe that a function f : R→R can be increasing without satisfying f ′(x)> 0 at allx ∈ R. For example, f (x) = x3 is increasing on R, but f ′(0) = 0.

Theorem 11.17 (Taylor's Expansion up to Second Order). Suppose A is an open, convex subset of Rn, and f : A → R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then there exists 0 ≤ θ ≤ 1 such that

f (x2)− f (x1) = (x2 − x1)∇ f (x1) + (1/2)(x2 − x1)′H f (θx1 +(1−θ)x2)(x2 − x1)

Chapter 12

Problem Set 4

(1) Find the derivative of the following functions from R → R:

\[ f(x) = \left[\frac{2x+1}{x-1}\right]^{1/2} \tag{12.1} \]

\[ f(x) = \ln(3x^2 - 5x) \tag{12.2} \]

(2) Find the equation for the tangent to f (x) = 5x^2 + 3x − 2 at x = 2.

(3) Let f : R → R be

\[ f(x) = \begin{cases} x^2 - 1, & x \leq 0 \\ -x^2, & x > 0 \end{cases} \]

and g : R → R be

\[ g(x) = \begin{cases} 3x - 2, & x \leq 2 \\ -x + 6, & x > 2 \end{cases} \]

(a) Is f continuous at x = 0?

(b) Is g continuous at x = 2?

(4) Find

\[ \lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{\exp(x^2) + \exp(-x) - 2}{2x} \tag{12.3} \]



(5) Evaluate the Hessian of the function f : R2 → R

f (x,y) = x^2 y + y^2 x − 2xy + 3x

at the point (1,2).

(6) Let f (x,y) be defined as

f (x,y) = xy


Show that the partial derivatives D1 f (x,y) and D2 f (x,y) exist at every point in R2, although f is not continuous at (0,0).

(7) This exercise gives an example of a function with D12 f (x,y) ≠ D21 f (x,y). Let f (x,y) be defined as

f (x,y) =

xy(x2−y2)


(a) We can prove that the partial derivatives D1 f (x,y) and D2 f (x,y) exist at every point in R2 and are bounded.

(b) Hence f is continuous on R2.

(c) The partial derivatives D1 f (x,y) and D2 f (x,y) are continuous at every point in R2.

(d) The second order cross partial derivatives D12 f (x,y) and D21 f (x,y) exist at every point in R2 and are continuous everywhere in R2 except at (0,0).

(e) D12 f (0,0) = +1 and D21 f (0,0) =−1.

Chapter 13

Solution to PS 4

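(1) (a)

\[
\begin{aligned}
f(x) &= \left[\frac{2x+1}{x-1}\right]^{1/2} \\
f'(x) &= \frac{1}{2}\left[\frac{2x+1}{x-1}\right]^{-1/2} \frac{(x-1)\cdot 2 - (2x+1)\cdot 1}{(x-1)^2} \\
&= -\frac{3}{2}\left[\frac{x-1}{2x+1}\right]^{1/2} \frac{1}{(x-1)^2} \\
&= -\frac{3}{2}\,\frac{1}{(2x+1)^{1/2}(x-1)^{3/2}}
\end{aligned}
\tag{13.1}
\]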

(b)

\[
\begin{aligned}
f(x) &= \ln(3x^2 - 5x) \\
f'(x) &= \frac{1}{3x^2 - 5x}\,(6x - 5) = \frac{6x - 5}{3x^2 - 5x}.
\end{aligned}
\tag{13.2}
\]

(2) Recall the equation for the tangent to f (x) at x0 is

y = f (x0)+ f ′ (x0)(x− x0) .

Using f (2) = 24, f ′ (x) = 10x+3, f ′ (2) = 23,

y = 24 + 23(x − 2), i.e., y = −22 + 23x.



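(3) (a) Continuity of f at x0 requires

\[ \lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = f(x_0). \]

Here, with x0 = 0,

\[ \lim_{x \to 0^-} f(x) = -1 = f(0), \qquad \lim_{x \to 0^+} f(x) = 0 \neq f(0), \]

so that

\[ \lim_{x \to 0^-} f(x) \neq \lim_{x \to 0^+} f(x). \]

Hence f (x) is not continuous at x = 0.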

(b)

\[ \lim_{x \to 2^-} g(x) = 3 \cdot 2 - 2 = 4; \qquad \lim_{x \to 2^+} g(x) = -2 + 6 = 4 = g(2). \]

Hence g(x) is continuous at x = 2.

(4) Since both f (0) = g(0) = 0, we can use L'Hospital's rule to find the limit.

\[ f'(x) = 2x \exp(x^2) - \exp(-x) \;\Rightarrow\; f'(0) = -1, \qquad g'(x) = 2 \;\Rightarrow\; g'(0) = 2. \tag{13.3} \]

Hence

\[ \lim_{x \to 0} \frac{\exp(x^2) + \exp(-x) - 2}{2x} = -\frac{1}{2}. \tag{13.4} \]
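As a quick numerical sanity check of this limit, the following minimal sketch evaluates the ratio at points close to zero and compares it with the value given by L'Hospital's rule. The function names `numerator`, `denominator` and `ratio` are illustrative choices, not notation from these notes.

```python
import math

def numerator(x):
    # f(x) = exp(x^2) + exp(-x) - 2, which vanishes at x = 0
    return math.exp(x * x) + math.exp(-x) - 2.0

def denominator(x):
    # g(x) = 2x, which also vanishes at x = 0
    return 2.0 * x

def ratio(x):
    # f(x) / g(x), defined for x != 0
    return numerator(x) / denominator(x)

# L'Hospital's rule: f'(0) / g'(0), with f'(x) = 2x exp(x^2) - exp(-x) and g'(x) = 2
lhospital_value = (2 * 0 * math.exp(0) - math.exp(0)) / 2.0

# Evaluate the ratio on both sides of 0 for shrinking step sizes
for h in (1e-2, 1e-3, 1e-4):
    print(h, ratio(h), ratio(-h))
print("L'Hospital value:", lhospital_value)
```

The printed ratios should approach the L'Hospital value from both sides as the step shrinks.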

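(5)

\[
\begin{aligned}
\nabla f(x,y) &= \begin{bmatrix} 2xy + y^2 - 2y + 3 & x^2 + 2xy - 2x \end{bmatrix} \\
H_f(x,y) &= \begin{bmatrix} 2y & 2x + 2y - 2 \\ 2x + 2y - 2 & 2x \end{bmatrix} \\
H_f(1,2) &= \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}
\end{aligned}
\tag{13.5}
\]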

Chapter 14

Convex Analysis

14.1 Concave, Convex Functions

Definition 14.1. Function f : D → R is concave if ∀x,y ∈ D, ∀λ ∈ [0,1],

λ f (x)+(1−λ) f (y)⩽ f [λx+(1−λ)y] (14.1)

Function f is strictly concave if the inequality is strict for all λ ∈ (0,1).

The following theorem gives a characterization of concave functions.

Theorem 14.1. Suppose A is a convex subset of Rn and f is a real-valued function on A. Then f is a concave function if and only if the set {(x,α) ∈ A×R : f (x) ⩾ α} is a convex set in Rn+1.

In general, a concave function on a convex set in Rn need not be continuous, as the following example shows.

Example 14.1. Let

\[ f(x) = \begin{cases} 1 + x & \text{for } x > 0 \\ 0 & \text{for } x = 0. \end{cases} \]

This function is concave but it is not continuous at x = 0.

However, if the set A is open and convex, then the concave function f is continuous on A.

If the function is continuously differentiable on an open convex set, then the following theorem characterizes the concave functions.

Figure 14.1: A concave function of one variable: f ′(d) < [ f (d) − f (c)]/(d − c) < f ′(c)

Figure 14.2: A concave function need not be continuous ( f (x) is not continuous at x = 0)

Theorem 14.2. Suppose A ⊂ Rn is an open set, and f : A → R is continuously differentiable on A. Then f is concave on A if and only if

f (x2)− f (x1)⩽ ∇ f (x1)(x2 − x1)

whenever x1 and x2 are in A.

The function is strictly concave if the weak inequality is replaced by a strict inequality for distinct points.

Theorem 14.3. Suppose A ⊂ Rn is an open set, and f : A → R is continuously differentiable on A. Then f is strictly concave on A if and only if

f (x2)− f (x1) < ∇ f (x1)(x2 − x1)

whenever x1 and x2 are in A and x1 ≠ x2.

Now we consider twice continuously differentiable functions. The following two theorems characterize concave and strictly concave functions.

Theorem 14.4. Suppose A ⊂ Rn is an open set, and f : A → R is twice continuously differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all x ∈ A.

If H f (x) is negative definite whenever x ∈ A, then the function is strictly concave, but the converse is not true.


Figure 14.3: Graph of −x^4

Theorem 14.5. Suppose A ⊂ Rn is an open set, and f : A → R is twice continuously differentiable on A. If H f (x) is negative definite for all x ∈ A, then f is strictly concave on A.

The following example shows that the converse implication does not hold.

Example 14.2. Let f : R → R be defined by f (x) = −x^4 for all x ∈ R (see Figure 14.3). This is a twice continuously differentiable function on the open, convex set R. We can verify that f is strictly concave on R, but since f ′′(x) = −12x^2, we have f ′′(0) = 0, so the Hessian is not negative definite at x = 0. This shows that the converse implication is not valid.

Claim 15. If f : D → R is a function of one variable and is twice continuously differentiable then

∀ x ∈ D, f ′′ (x) ⩽ 0 ⇔ f is concave.

Definition 14.2. Function f : D → R is convex if ∀x,y ∈ D,∀λ ∈ [0,1],

λ f (x)+(1−λ) f (y)⩾ f [λx+(1−λ)y] (14.2)

Function f is strictly convex if the inequality is strict for all λ ∈ (0,1).


Claim 16. If f : D → R is a function of one variable and is twice continuously differentiable then

∀ x ∈ D, f ′′ (x) ⩾ 0 ⇔ f is convex.

Note that a local maximum (minimum) of a concave (convex) function is a global maximum (minimum) as well.

14.1.1 Hessian, Concavity and Convexity

Theorem 14.6. Let f : D → R (where D ⊆ Rn is open and convex) be twice continuously differentiable. Then,

f is concave if and only if H f (x) is NSD ∀x ∈ D. (14.3)
f is convex if and only if H f (x) is PSD ∀x ∈ D. (14.4)
H f (x) is ND ∀x ∈ D ⇒ f is strictly concave. (14.5)
H f (x) is PD ∀x ∈ D ⇒ f is strictly convex. (14.6)

Corollary 2. For a function of one variable, this means,

f is concave if and only if f ′′ (x) ⩽ 0 ∀x ∈ D. (14.7)
f is convex if and only if f ′′ (x) ⩾ 0 ∀x ∈ D. (14.8)
f ′′ (x) < 0 ∀x ∈ D ⇒ f is strictly concave. (14.9)
f ′′ (x) > 0 ∀x ∈ D ⇒ f is strictly convex. (14.10)

Example 14.3. The implication

f is strictly convex ⇒ f ′′ (x)> 0,∀x ∈ D

does not hold.

Take f (x) = x^4, so that f ′′ (x) = 12x^2. It is strictly convex everywhere but f ′′ (0) = 0. We would need f ′′ (x) > 0, ∀x ∈ D for the Hessian to be PD.




14.1.2 Some Useful Results

Proposition 3.

(a) If f and g are concave (convex) and a ⩾ 0, b ⩾ 0, then a f + bg is concave (convex).

(b) If f (x) is concave (convex) and F (u) is concave (convex) and increasing, then U (x) = F ( f (x)) is concave (convex).

(c) Function f is concave if and only if − f is convex.

14.2 Quasi-concave Functions

Definition 14.3. Function f : D → R is quasi-concave if ∀x,y ∈ D, ∀λ ∈ [0,1],

f (λx+(1−λ)y) ⩾ min{ f (x), f (y)}.

Theorem 14.7. Function f : D → R is quasi-concave if and only if ∀a ∈ R, the set f_a^+ = {x ∈ D | f (x) ⩾ a} is a convex set. The set f_a^+ is called the upper contour set.

Definition 14.4. Function f : D → R is quasi-convex if function − f is quasi-concave.


Theorem 14.8. Function f : D → R is quasi-convex if and only if ∀a ∈ R, the set f_a^- = {x ∈ D | f (x) ⩽ a} is a convex set. The set f_a^- is called the lower contour set.

Theorem 14.9.

f : D → R concave ⇒ f is quasi-concave,
f : D → R convex ⇒ f is quasi-convex.

Note that for functions of one variable, any monotone function is quasi-concave. This, however, does NOT apply to functions of more than one variable. Also, a quasi-concave function need not be concave. Take f (x) = x^2 on x ⩾ 0: it is monotone increasing there, hence quasi-concave. But it is not concave; rather, it is convex. For functions of one variable, the following theorem characterizes the quasi-concave functions.

Theorem 14.10. A function f of a single variable is quasi-concave if and only if either (a) it is non-decreasing, (b) it is non-increasing, or (c) there exists x∗ such that f is non-decreasing for x < x∗ and non-increasing for x > x∗.

14.2.1 Bordered Hessian

To check quasi-concavity of a C^2 function, we use the bordered Hessian matrix.

Definition 14.5 (Bordered Hessian). Let f be a C^2 function. The bordered Hessian is

\[
B(x) = \begin{bmatrix}
0 & \dfrac{\partial f(x)}{\partial x_1} & \cdots & \dfrac{\partial f(x)}{\partial x_n} \\
\dfrac{\partial f(x)}{\partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f(x)}{\partial x_n} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix}.
\]

Let Br (x) denote the submatrix consisting of the first (r+1) rows and columns of B(x), i.e., Br (x) is an (r+1)× (r+1) matrix.

Condition 1. A necessary condition for f to be quasiconcave is that (−1)^r det(Br (x)) ⩾ 0, ∀r = 1,2, · · · ,n; ∀x ∈ D.

Condition 2. A sufficient condition for f to be quasiconcave is that (−1)^r det(Br (x)) > 0, ∀r = 1,2, · · · ,n; ∀x ∈ D.

When we check for quasi-concavity, we have to check the sufficient conditions. We need

\[
\det\begin{bmatrix} 0 & f_1 \\ f_1 & f_{11} \end{bmatrix} < 0, \qquad
\det\begin{bmatrix} 0 & f_1 & f_2 \\ f_1 & f_{11} & f_{12} \\ f_2 & f_{21} & f_{22} \end{bmatrix} > 0, \quad \text{etc.}
\]

Remark 14.1. When we have to check whether a function is quasi-concave, start by checking whether it is concave, because concavity is easier to check and concavity implies quasi-concavity.

Remark 14.2. Quasi-concavity is preserved under increasing monotone transformations, whereas concavity need not be preserved.

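Example 14.4. Let f (x,y) = √(xy) for (x,y) ∈ R2++. Then

\[
H_f(x,y) = \begin{bmatrix}
-\dfrac{1}{4}\sqrt{\dfrac{y}{x^3}} & \dfrac{1}{4}\sqrt{\dfrac{1}{xy}} \\
\dfrac{1}{4}\sqrt{\dfrac{1}{xy}} & -\dfrac{1}{4}\sqrt{\dfrac{x}{y^3}}
\end{bmatrix}.
\]

The principal minors of order one are negative and the principal minor of order two is zero. Hence H f is negative semi-definite, so f is concave and therefore quasi-concave.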

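Let us take a monotone transformation g(x,y) = ( f (x,y))^4 = x^2y^2, for (x,y) ∈ R2++.

\[
B(x,y) = \begin{bmatrix} 0 & 2xy^2 & 2x^2y \\ 2xy^2 & 2y^2 & 4xy \\ 2x^2y & 4xy & 2x^2 \end{bmatrix}
\]

\[
\det(B_1(x,y)) = \det\begin{bmatrix} 0 & 2xy^2 \\ 2xy^2 & 2y^2 \end{bmatrix} = -4x^2y^4 < 0
\;\Rightarrow\; (-1)^1 \det(B_1(x,y)) > 0, \ \forall (x,y) \in \mathbb{R}^2_{++}.
\]

\[
\begin{aligned}
\det(B_2(x,y)) &= -2xy^2\,(4x^3y^2 - 8x^3y^2) + 2x^2y\,(8x^2y^3 - 4x^2y^3) \\
&= 8x^4y^4 + 8x^4y^4 = 16x^4y^4 > 0, \ \forall (x,y) \in \mathbb{R}^2_{++}
\end{aligned}
\]

⇒ g(x,y) is quasi-concave.

Note, however, that g(x,y) is not concave.

\[ H_g(x,y) = \begin{bmatrix} 2y^2 & 4xy \\ 4xy & 2x^2 \end{bmatrix} \]

The principal minors of order one are strictly positive and the principal minor of order two is −12x^2y^2, which is strictly negative. Thus g(x,y) is not concave.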
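The determinants in this example can be checked numerically at a sample point. The sketch below is only an illustration: the helper names (`det`, `bordered_hessian_g`, `leading_block`, `hessian_g`) are hypothetical, and the determinant uses plain cofactor expansion, which is adequate only for small matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def bordered_hessian_g(x, y):
    # g(x, y) = x^2 y^2: gradient (2xy^2, 2x^2 y), Hessian [[2y^2, 4xy], [4xy, 2x^2]]
    g1, g2 = 2 * x * y**2, 2 * x**2 * y
    return [[0.0, g1, g2],
            [g1, 2 * y**2, 4 * x * y],
            [g2, 4 * x * y, 2 * x**2]]

def hessian_g(x, y):
    return [[2 * y**2, 4 * x * y],
            [4 * x * y, 2 * x**2]]

def leading_block(m, r):
    # B_r: the first (r + 1) rows and columns of the bordered Hessian
    return [row[:r + 1] for row in m[:r + 1]]

def sufficient_condition_holds(x, y):
    # (-1)^r det(B_r) > 0 for r = 1, 2, checked at the single point (x, y)
    b = bordered_hessian_g(x, y)
    return all((-1) ** r * det(leading_block(b, r)) > 0 for r in (1, 2))

x, y = 1.5, 2.0
b = bordered_hessian_g(x, y)
print(det(leading_block(b, 1)), -4 * x**2 * y**4)   # det(B1) against -4 x^2 y^4
print(det(leading_block(b, 2)), 16 * x**4 * y**4)   # det(B2) against 16 x^4 y^4
print(det(hessian_g(x, y)), -12 * x**2 * y**2)      # det(Hg) against -12 x^2 y^2
```

A check at one point does not prove the sign conditions on all of the domain; it only guards against algebra slips in the closed-form determinants.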

Chapter 15

Problem Set 5

(1) Prove or give a counterexample: The sum of two concave functions is concave.

(2) Which of the following is true? Prove or give a counterexample.

(a) If A and B are convex sets, then A∪B is convex.

(b) If A and B are convex sets, then A∩B is convex.

(3) Determine whether each of the following functions is quasiconcave:

(a) f (x) = 3x+4;
(b) g(x,y) = y e^x, y > 0;
(c) h(x,y) = −x^2y^3.

(4) Show using an example that the sum of two quasiconcave functions need not be quasiconcave (in general).

(5) Consider the functions:

(i) f (x,y,z) = 8x^3 + 2xy^2 − z^3

(ii) g(x,y) = x + y − e^x − e^{x+y}

Write out the gradient vectors and the Hessian matrices ∇ f (x,y,z) and H f (x,y,z), and ∇g(x,y) and Hg(x,y). State whether f is concave, quasiconcave, or quasiconvex. What about the function g?


Chapter 16

Solution to PS 5

(1) Let f (x) and g(x) be two concave functions and let h(x) = f (x)+g(x). Concavity of f and g implies that, ∀x,y ∈ D, ∀λ ∈ [0,1],

λ f (x)+(1−λ) f (y) ⩽ f [λx+(1−λ)y]
λg(x)+(1−λ)g(y) ⩽ g [λx+(1−λ)y].

Adding these two inequalities, we get

λ f (x)+(1−λ) f (y)+λg(x)+(1−λ)g(y) ⩽ f [λx+(1−λ)y]+g [λx+(1−λ)y]
λ( f (x)+g(x))+(1−λ)( f (y)+g(y)) ⩽ f [λx+(1−λ)y]+g [λx+(1−λ)y]
λh(x)+(1−λ)h(y) ⩽ h [λx+(1−λ)y].

This proves that h(x) = f (x)+g(x) is concave.

(2) (a) False. Consider A,B ⊆ R with A = [0,2] and B = [4,6], so that A∪B = [0,2]∪ [4,6]. Then 1 ∈ A∪B and 5 ∈ A∪B, but (1/2) ·1+(1/2) ·5 = 3 ∉ A∪B.

(b) True. Let x ∈ A∩B, y ∈ A∩B and λ ∈ [0,1]. Then,

λx+(1−λ)y ∈ A as x,y ∈ A and A is convex (16.1)
λx+(1−λ)y ∈ B as x,y ∈ B and B is convex (16.2)
⇒ λx+(1−λ)y ∈ A∩B. (16.3)

Hence A∩B is convex.

(3) (a) Recall a monotone function of one variable is quasi-concave. Since f (x) = 3x+4is monotone increasing, it is quasi-concave.



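(b) The bordered Hessian is

\[
B(x,y) = \begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix}
\]

\[
\det(B_1(x,y)) = \det\begin{bmatrix} 0 & y\exp(x) \\ y\exp(x) & y\exp(x) \end{bmatrix} = -y^2\exp(2x) < 0;
\]

\[
\det(B_2(x,y)) = \det\begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix} = y\exp(3x) > 0.
\]

Recall the sufficient condition for a function to be quasiconcave is that

(−1)^r det(Br (x)) > 0, ∀r = 1,2, · · · ,n; ∀x ∈ D.

This holds for the function g on its domain y > 0. Hence it is quasiconcave.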

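(c)

\[
B(x,y) = \begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix}
\]

\[
\det(B_1(x,y)) = \det\begin{bmatrix} 0 & -2xy^3 \\ -2xy^3 & -2y^3 \end{bmatrix} = -4x^2y^6 \leq 0 \tag{16.4}
\]

\[
\det(B_2(x,y)) = \det\begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix} = -30x^4y^7 \tag{16.5}
\]

Note that det(B2 (x,y)) is strictly negative whenever x ≠ 0 and y > 0, so the necessary condition (−1)^2 det(B2 (x,y)) ⩾ 0 fails at such points. Hence h is not quasi-concave.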

Figure 16.1: Function f (x), Problem 4

Figure 16.2: Function g(x), Problem 4

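(4) Let

\[
f(x) = \begin{cases}
0 & \text{for } x \leq 0 \\
x & \text{for } 0 \leq x \leq \tfrac{1}{2} \\
1 - x & \text{for } \tfrac{1}{2} \leq x \leq 1 \\
0 & \text{for } x > 1
\end{cases}
\tag{16.6}
\]

\[
g(x) = \begin{cases}
0 & \text{for } x \leq 1 \\
x - 1 & \text{for } 1 \leq x \leq \tfrac{3}{2} \\
2 - x & \text{for } \tfrac{3}{2} \leq x \leq 2 \\
0 & \text{for } x > 2
\end{cases}
\tag{16.7}
\]

and

\[ h(x) = f(x) + g(x) \tag{16.8} \]

Figure 16.3: Function f (x)+g(x), Problem 4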

In the figures, the functions in Figure 16.1 and Figure 16.2 are quasiconcave (each of them is first non-decreasing, then non-increasing), whereas the function in Figure 16.3, which is the sum of the first two, is not quasiconcave (it is not non-decreasing, is not non-increasing, and is not non-decreasing then non-increasing).
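The failure of quasi-concavity for the sum can also be seen directly from the definition, by exhibiting one point between the two peaks where the sum dips below both peak values. The sketch below is illustrative; the helper `violates_quasiconcavity` and its tolerance argument are assumptions introduced here, not part of the notes.

```python
def f(x):
    # tent on [0, 1] with peak at x = 1/2, zero elsewhere
    if x <= 0 or x > 1:
        return 0.0
    return x if x <= 0.5 else 1.0 - x

def g(x):
    # tent on [1, 2] with peak at x = 3/2, zero elsewhere
    if x <= 1 or x > 2:
        return 0.0
    return x - 1.0 if x <= 1.5 else 2.0 - x

def h(x):
    return f(x) + g(x)

def violates_quasiconcavity(func, x, y, lam, tol=1e-12):
    # quasi-concavity requires func(lam*x + (1-lam)*y) >= min(func(x), func(y));
    # tol guards against rounding noise in the convex combination
    z = lam * x + (1 - lam) * y
    return func(z) < min(func(x), func(y)) - tol

# Midpoint of the two peaks: h(1) = 0 while h(1/2) = h(3/2) = 1/2
print(h(0.5), h(1.0), h(1.5))
print(violates_quasiconcavity(h, 0.5, 1.5, 0.5))
```

The same helper finds no violation for f or g on a grid of points, which is consistent with each of them being quasiconcave, although a grid search is of course not a proof.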

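(5) (i)

\[
\nabla f(x,y,z) = \begin{bmatrix} 24x^2 + 2y^2 & 4xy & -3z^2 \end{bmatrix}, \qquad
H_f(x,y,z) = \begin{bmatrix} 48x & 4y & 0 \\ 4y & 4x & 0 \\ 0 & 0 & -6z \end{bmatrix}
\tag{16.9}
\]

Then f is not concave, since the first-order principal minor D1 = 48x is strictly positive for x > 0. The bordered Hessian is

\[
B(x,y,z) = \begin{bmatrix}
0 & 24x^2 + 2y^2 & 4xy & -3z^2 \\
24x^2 + 2y^2 & 48x & 4y & 0 \\
4xy & 4y & 4x & 0 \\
-3z^2 & 0 & 0 & -6z
\end{bmatrix}
\]

\[
\det(B_1) = \det\begin{bmatrix} 0 & 24x^2 + 2y^2 \\ 24x^2 + 2y^2 & 48x \end{bmatrix} = -576x^4 - 96x^2y^2 - 4y^4 \leq 0 \tag{16.10}
\]

\[
\det(B_2) = \det\begin{bmatrix} 0 & 24x^2 + 2y^2 & 4xy \\ 24x^2 + 2y^2 & 48x & 4y \\ 4xy & 4y & 4x \end{bmatrix} = -2304x^5 - 384x^3y^2 + 48xy^4 \tag{16.11}
\]

which can take both positive and negative values. Hence f (x,y,z) is neither quasiconcave nor quasiconvex.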


(ii)

\[
\nabla g(x,y) = \begin{bmatrix} 1 - \exp(x) - \exp(x+y) & 1 - \exp(x+y) \end{bmatrix}, \qquad
H_g(x,y) = \begin{bmatrix} -\exp(x) - \exp(x+y) & -\exp(x+y) \\ -\exp(x+y) & -\exp(x+y) \end{bmatrix}.
\tag{16.12}
\]

Then the leading principal minors are

\[ D_1 = -\exp(x) - \exp(x+y) < 0, \qquad D_2 = e^{x} e^{x+y} > 0 \tag{16.13} \]

which implies that g(x,y) is concave. Hence it is also quasi-concave.

Chapter 17

Implicit Function Theorem

17.1 The Linear Implicit Function Theorem

For the system of simultaneous linear equations Ax = b, we have seen earlier that there exists a unique solution for every choice of the right-hand-side column vector b if and only if the rank of A is equal to the number of rows of A, which is equal to the number of columns of A. In economic models, the vector b represents some externally determined (exogenous) parameters, while the linear equations constitute some equilibrium conditions which determine the vector x, which is the set of internal (endogenous) variables.

In this sense it is possible to divide the set of variables into two disjoint subsets of endogenous and exogenous variables. Thus a general linear economic model will have m equations in n unknowns:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

In general it will be possible to divide the set of variables into endogenous variables and exogenous variables. Such a division will be useful only if, after substituting the values of the exogenous variables in the m equations, it is possible to obtain a solution of the system for the remaining endogenous variables. For this, two conditions must hold. The number of endogenous variables must be equal to the number of equations m, and the square matrix corresponding to the endogenous variables must have maximal rank m.

A formal statement of the above observation is known as the linear version of the Implicit Function Theorem.

Theorem 17.1. Let x1, · · · , xj; xj+1, · · · , xn be a partition of the n variables in the system of equations 23.1 into endogenous and exogenous variables respectively. Then there exists, for every choice of the exogenous variables xj+1, · · · , xn, a unique set of values x1, · · · , xj, if and only if

(a) j = m, i.e., number of endogenous variables = number of equations;

(b) the rank of the j× j square matrix

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} \\
a_{21} & a_{22} & \cdots & a_{2j} \\
\vdots & \vdots & \ddots & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jj}
\end{bmatrix}
\]

corresponding to the endogenous variables is j.

Here is an example for this theorem.

Exercise 17.1.

Let the system of equations be

\[
\begin{aligned}
x + 2y + z - w &= 1 \\
3x - y - 4z + 2w &= 3 \\
0x + y + z + w &= 0
\end{aligned}
\]

Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.

Find an explicit formula for the endogenous variables in terms of the exogenous variables.

Exercise 17.2.

Let the system of equations be

\[
\begin{aligned}
-x + 3y - z + w &= 0 \\
4x - y + 2z + w &= 3 \\
7x + y + z + 3w &= 6
\end{aligned}
\]

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?


17.2 Implicit Function Theorem for R2

Consider the following example of a non-linear implicit function:

y^2 − 6xy + 5x^2 = 0.

Given any value of x, we can solve this equation for y. For example, if x = 0, then y = 0; if x = 1, the equation takes the form y^2 − 6y + 5 = 0 and yields y = 1 or y = 5 as solutions. Observe that it is possible to solve for y explicitly in terms of x (it turns out to be a correspondence) by applying the quadratic formula:

\[ y = \frac{6x \pm \sqrt{36x^2 - 20x^2}}{2}, \quad \text{i.e., } y = 5x \text{ or } y = x. \]

It is possible to apply the quadratic formula to the implicit function xy^2 − 3y − 2 exp(x) = 0 to obtain an explicit expression for y as

\[ y = \frac{3 \pm \sqrt{9 + 8x\exp(x)}}{2x}. \]

However, it could turn out that the explicit function is more difficult to work with than the original implicit function.

If we come across an implicit function

y^5 − 5xy + 4x^2 = 0

then it is not possible to solve it in explicit form, as there is no general formula for solving a quintic equation. Note, however, that the equation still defines y as an implicit function of x. For x = 0 we get y = 0, for x = 1 we get y = 1, and so on.

Example 17.1. A profit maximizing firm uses a single input x (with unit cost w $ per unit) to produce an output y using the production function y = f (x). Let the price of the output be p $ per unit. Then the profit function for this firm given p and w is

Π(x) = p · f (x)−w · x.

To obtain the optimal input x which maximizes the profit, we take the first order condition, which is

p · f ′(x)−w = 0.

We can treat p and w as exogenous variables, and then this equation defines x as a function of p and w. The equation need not yield x as an explicit function of p and w. However, it does define x as an implicit function of p and w, and we can use it to estimate the change in x in response to changes in p and w.


Consider functions of the form

y = G(x1, · · · ,xn).

Here the endogenous variable y is an explicit function of the exogenous variables (x1, · · · ,xn). Such an ideal situation does not occur in every case. More frequently we come across equations of the form

F(x1, · · · ,xn;y) = 0. (17.1)

If Eq. (17.1) determines a corresponding value of y for each set of values (x1, · · · ,xn), then we say that Eq. (17.1) defines the endogenous variable y as an implicit function of the exogenous variables (x1, · · · ,xn).

We consider implicit functions in R2 of the form F(x,y) = c and analyze the following questions. For a given implicit function F(x,y) = c and a specified solution (x0,y0),

(a) Does F(x,y) = c determine y as a continuous function of x for points (x,y) such that x is near x0 and y is near y0?

(b) If so, how do the changes in x affect the corresponding values of y?

More formally, the two questions can be rephrased as follows:

(a) Given the implicit function F(x,y) = c and a point (x0,y0) such that F(x0,y0) = c, does there exist a continuous function y = f (x) defined on an interval I about x0 so that:

(1) F(x, f (x)) = c for all x ∈ I, and

(2) y0 = f (x0)?

(b) If y = f (x) exists and is differentiable, what is f ′(x0)?

Theorem 17.2. Let F(x,y) be a continuously differentiable function on a ball about (x0,y0) in R2. Suppose F(x0,y0) = c, and consider the expression

F(x,y) = c.

If ∂F/∂y (x0,y0) ≠ 0, then there exists a continuously differentiable function y = f (x) defined on an interval I about x0 such that:

(a) F(x, f (x)) = c for all x ∈ I,

(b) y0 = f (x0), and

(c)

\[ f'(x_0) = -\frac{\dfrac{\partial F}{\partial x}(x_0,y_0)}{\dfrac{\partial F}{\partial y}(x_0,y_0)}. \]

Example 17.2. Consider the function F : R2 → R given by F(x,y) = x^2 + y^2 and the equation F(x,y) = 1 (the graph of this equation is a circle with radius r = 1). If we choose (a,b) with F(a,b) = 1, and a ≠ 1, a ≠ −1, there are open intervals I ⊂ R containing a, and Y ⊂ R containing b, such that if x ∈ I, there is a unique y ∈ Y with F(x,y) = 1. Thus, we can define a unique function f : I → Y such that F(x, f (x)) = 1 for all x ∈ I. If a > 0 and b > 0, then f (x) = √(1− x^2) on I. Note that if a = 1 and b = 0, so that F(a,b) = 1, we cannot find such a unique function f.
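For the unit circle, the slope formula of the Implicit Function Theorem can be compared with a finite-difference slope of the explicit upper branch. This is a minimal sketch; the sample point (0.6, 0.8) and the function names are illustrative assumptions.

```python
import math

def F(x, y):
    # F(x, y) = x^2 + y^2; the circle is the level set F = 1
    return x * x + y * y

def implicit_slope(x0, y0):
    # f'(x0) = -F_x / F_y, valid only when F_y(x0, y0) != 0
    F_x, F_y = 2 * x0, 2 * y0
    if F_y == 0:
        raise ValueError("F_y = 0: the theorem gives no conclusion here")
    return -F_x / F_y

def explicit_branch(x):
    # upper semicircle, the branch through points with y > 0
    return math.sqrt(1.0 - x * x)

x0, y0 = 0.6, 0.8          # a point on the circle with y0 > 0
step = 1e-6
finite_difference = (explicit_branch(x0 + step) - explicit_branch(x0 - step)) / (2 * step)
print(implicit_slope(x0, y0), finite_difference)
```

At (1, 0) the partial derivative with respect to y vanishes, so `implicit_slope` raises an error there, matching the remark that no unique function exists near that point.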

Chapter 18

Problem Set 6

(1) Let the system of equations be

\[
\begin{aligned}
x + 3y + z - 2w &= 1 \\
2x + 6y - 2z - 4w &= 3
\end{aligned}
\]

(a) Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.

(b) Find an explicit formula for the endogenous variables in terms of the exogenous variables.

(2) Let the system of equations be

\[
\begin{aligned}
-x + 3y - z + w &= 0 \\
4x - y + z + w &= 3 \\
7x + y + z + 3w &= 6
\end{aligned}
\]

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

(3) Show that the equation x^2 − xy^3 + y^5 = 17 defines y as an implicit function of x in the neighborhood of (x,y) = (5,2). Then estimate the value of y which corresponds to x = 4.9.

(4) Consider the function f (x,y,z) = x^2 − y^2 + z^3.

(a) If x = 6 and y = 3, find a value of z which satisfies the equation f (x,y,z) = 0.

(b) Verify if this equation defines z as an implicit function of x and y near x = 6 and y = 3.

(c) If it does, compute (∂z/∂x)(6,3) and (∂z/∂y)(6,3).

(d) If x increases to 6.1 and y decreases to 2.8, estimate the corresponding change in z.

(5) Consider the profit maximizing firm described in the Example 17.1. If p increases by∆p and w increases by ∆w, what will be the change in the optimal input amount x?

(6) Consider 3x^2yz + xyz^2 = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.

(a) If y increases to 3.1 and z remains the same at 2, use the Implicit Function Theorem to estimate the corresponding x.

(b) Use the quadratic formula to solve 3x^2yz + xyz^2 = 96 for x as an explicit function of y and z.

(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1 and z = 2.

(d) Which of the two methods is easier?

Chapter 19

Unconstrained Optimization

19.1 Optimization Problem

We call

max f (x), x ∈ D ⊆ Rn, (19.1)

or

min f (x), x ∈ D ⊆ Rn, (19.2)

where the domain D is an open set, unconstrained optimization problems. There are no restrictions on x within the domain. Furthermore, there are no boundary solutions, because the domain does not include its boundary (recall the definition of an open set). Note that max f (x), x ∈ Rn or min f (x), x ∈ Rn are unconstrained optimization problems since Rn is an open set. While solving an unconstrained optimization problem, we want to use the tools we developed earlier, i.e., find points where ∇ f (x) = 0 and investigate the curvature / shape of the function.

Remark 19.1. An unconstrained optimization problem may not have a solution.

Example 19.1. Let f (x) = x^2. Then,

max f (x), x ∈ R (19.3)

does not have a solution. See the graph of f (x) = x^2.

Remark 19.2. A minimization problem can always be turned into a maximization problem and vice versa:

\[ \min_{x \in D} f(x) \;\Leftrightarrow\; \max_{x \in D} -f(x). \tag{19.4} \]

Figure: Graph of f (x) = x^2


We will see several examples of unconstrained optimization in these notes. Also, there are additional exercises in the problem set.

19.2 Maxima / Minima for C^2 functions of n variables

Theorem 19.1. First order necessary condition for local maxima / minima: Let A be an open set in Rn, and let f : A → R be a continuously differentiable function on A. If function f has a local maximum / minimum at x∗, then

∇ f (x∗) = 0

where 0 is a n×1 null vector.

Remark 19.3. The converse is not true.

Theorem 19.2. Second order necessary condition for local maxima / minima: Let A bean open set in Rn, and let f : A → R be a twice continuously differentiable function on A.

(a) If function f has local maximum at x∗ then H f (x∗) is negative semi-definite.

(b) If function f has local minimum at x∗ then H f (x∗) is positive semi-definite.


Theorem 19.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn, and let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that H f (x∗) is negative definite and ∇ f (x∗) = 0, then f has a local maximum at x∗.

(b) If x∗ ∈ A is such that H f (x∗) is positive definite and ∇ f (x∗) = 0, then f has a local minimum at x∗.

Theorem 19.4. Concavity (convexity) and global maxima (minima): Let A be an open and convex set in Rn, and let f : A → R be a continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗) = 0 and f is concave on A, then f has a global maximum at x∗.

(b) If x∗ ∈ A is such that ∇ f (x∗) = 0 and f is convex on A, then f has a global minimum at x∗.

This is very easy to show. Note that concavity along with continuous differentiability of f implies that for all x ∈ A,

f (x)− f (x∗) ⩽ ∇ f (x∗) · (x− x∗).

Since ∇ f (x∗) = 0, it follows that f (x)− f (x∗) ⩽ 0, so x∗ is a point of global maximum of f on A.

Theorem 19.5. Let A be an open and convex set in Rn, and let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗) = 0 and H f (x) is negative semi-definite for all x ∈ A, then f has a global maximum at x∗.

(b) If x∗ ∈ A is such that ∇ f (x∗) = 0 and H f (x) is positive semi-definite for all x ∈ A, then f has a global minimum at x∗.

Example 19.2. Letting $X = \mathbb{R}^2_+$ and $f(x) = x_1x_2 - 2x_1^4 - x_2^2$, the first order conditions are $x_2 - 8x_1^3 = 0$ and $x_1 - 2x_2 = 0$. Solving the second equation for $x_1$, we have $x_1 = 2x_2$. Substituting this into the first equation, we have $x_2 - 64x_2^3 = 0$, which has three solutions: $x_2 = 0, \tfrac{1}{8}, -\tfrac{1}{8}$. Then the first order conditions have three solutions, $(x_1,x_2) = (0,0), (\tfrac{1}{4},\tfrac{1}{8}), (-\tfrac{1}{4},-\tfrac{1}{8})$, but the last of these is not in the domain of f, and the first is on the boundary of the domain. Thus, we have a unique solution in the interior of the domain: $(x_1,x_2) = (\tfrac{1}{4},\tfrac{1}{8})$.
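The three candidate points can be verified with exact rational arithmetic. The sketch below also evaluates the Hessian at the interior candidate; that second-order check is an extra step not carried out in the example above, and the function names are illustrative.

```python
from fractions import Fraction

def gradient(x1, x2):
    # f(x1, x2) = x1*x2 - 2*x1**4 - x2**2
    return (x2 - 8 * x1**3, x1 - 2 * x2)

def hessian(x1, x2):
    # second partial derivatives of f; only the (1,1) entry depends on the point
    return [[-24 * x1**2, 1],
            [1, -2]]

candidates = [(Fraction(0), Fraction(0)),
              (Fraction(1, 4), Fraction(1, 8)),
              (Fraction(-1, 4), Fraction(-1, 8))]

# Each candidate should make both first order conditions hold exactly
for point in candidates:
    print(point, gradient(*point))

# Leading principal minors of the Hessian at the interior candidate (1/4, 1/8)
H = hessian(Fraction(1, 4), Fraction(1, 8))
d1 = H[0][0]
d2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(d1, d2)
```

Under these computations the Hessian at (1/4, 1/8) has a negative first minor and a positive second minor, so Theorem 19.3 would indicate a local maximum there.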


Example 19.3. Let us find maxima / minima (if any) for f : R3 → R

f (x,y,z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz.

Step 1 Find ∇ f (x,y,z) and set it equal to the zero vector.

\[ \nabla f(x,y,z) = \begin{bmatrix} 2x + 2y + 2z & 4y + 2x & 6z + 2x \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}. \]

The only solution is (x,y,z) = (0,0,0). So we have one candidate for a local maximum or minimum.

Step 2 Compute H f .

\[ H_f(x,y,z) = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 4 & 0 \\ 2 & 0 & 6 \end{bmatrix}. \]

Note that in this example, H f is independent of (x,y,z). So whichever property of H f we get will be global.

Step 3 Determine the curvature. Begin by computing the leading principal minors.

D1 = 2 > 0, D2 = 2 ·4−2 ·2 = 4 > 0 and
D3 = 2(24−0)−2(12−0)+2(0−8) = 48−24−16 = 8 > 0

All leading principal minors are strictly positive ⇔ H f is positive definite ∀ (x,y,z), including (0,0,0), which implies that f is strictly convex.

Step 4 Conclude, using Theorem 19.4, that we have a global minimum at (0,0,0).
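Step 3 of this example can be reproduced mechanically. The following sketch computes the leading principal minors of the constant Hessian with a small cofactor-expansion determinant; the helper names are hypothetical and the approach is meant for small matrices only.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_principal_minors(m):
    # D_k is the determinant of the top-left k x k block, k = 1, ..., n
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

def gradient(x, y, z):
    # gradient of f(x, y, z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz
    return (2 * x + 2 * y + 2 * z, 4 * y + 2 * x, 6 * z + 2 * x)

H = [[2, 2, 2],
     [2, 4, 0],
     [2, 0, 6]]

minors = leading_principal_minors(H)
print(minors)                      # [2, 4, 8]
print(all(d > 0 for d in minors))  # True: all leading principal minors are positive
print(gradient(0, 0, 0))           # (0, 0, 0): the origin satisfies the first order condition
```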


f (x,y) = −x^3 + xy − y^3.

Step 1 Find ∇ f (x,y) and set it equal to the zero vector.

\[ \nabla f(x,y) = \begin{bmatrix} -3x^2 + y & -3y^2 + x \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}. \]

There are two solutions: (x,y) = (0,0) and (x,y) = (1/3, 1/3).

Step 2 Compute H f .

\[
H_f(x,y) = \begin{bmatrix} -6x & 1 \\ 1 & -6y \end{bmatrix}
\;\Rightarrow\;
H_f\left(\tfrac{1}{3},\tfrac{1}{3}\right) = \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}
\quad \text{and} \quad
H_f(0,0) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]

Step 3 Determine the curvature. For (1/3, 1/3), the leading principal minors are

D1 = −2 < 0, D2 = 3 > 0 ⇔ H f (1/3, 1/3) is negative definite.

For (0,0), the principal minors are

D1 = 0, 0; D2 = −1 < 0 ⇒ H f (0,0) is neither negative semi-definite nor positive semi-definite.

Step 4 Theorem 19.3 on sufficient conditions applies, and we have a strict local maximum at (1/3, 1/3). The contrapositive of the second order necessary conditions (Theorem 19.2) implies that f has neither a local maximum nor a local minimum at (0,0). It is an inflection point.


f (x,y) = 2x^3 + xy^2 + 5x^2 + y^2.

Step 1 Find ∇ f (x,y) and set it equal to the zero vector.

\[ \nabla f(x,y) = \begin{bmatrix} 6x^2 + y^2 + 10x & 2xy + 2y \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}. \]

2xy + 2y = 0 ⇒ y = 0 ∨ x = −1;
for x = −1, 6x^2 + y^2 + 10x = y^2 − 4 = 0 ⇒ y = 2 ∨ y = −2;
for y = 0, 6x^2 + y^2 + 10x = 6x^2 + 10x = 0 ⇒ x = 0 ∨ x = −5/3.

There are four solutions: (x,y) = (0,0), (−1,2), (−1,−2) and (−5/3, 0).

Step 2 Compute H f .

\[ H_f(x,y) = \begin{bmatrix} 12x + 10 & 2y \\ 2y & 2x + 2 \end{bmatrix}. \]

Step 3

\[ H_f(0,0) = \begin{bmatrix} 10 & 0 \\ 0 & 2 \end{bmatrix}, \quad D_1 = 10 > 0, \ D_2 = 20 > 0 \]

⇒ H f (0,0) is positive definite.

\[ H_f(-1,2) = \begin{bmatrix} -2 & 4 \\ 4 & 0 \end{bmatrix}, \quad D_1 = -2 < 0, \ D_2 = -16 < 0 \]

⇒ H f (−1,2) is neither positive semi-definite nor negative semi-definite.

\[ H_f(-1,-2) = \begin{bmatrix} -2 & -4 \\ -4 & 0 \end{bmatrix}, \quad D_1 = -2 < 0, \ D_2 = -16 < 0 \]

⇒ H f (−1,−2) is neither positive semi-definite nor negative semi-definite.

\[ H_f\left(-\tfrac{5}{3},0\right) = \begin{bmatrix} -10 & 0 \\ 0 & -\tfrac{4}{3} \end{bmatrix}, \quad D_1 = -10 < 0, \ D_2 = \tfrac{40}{3} > 0 \]

⇒ H f (−5/3, 0) is negative definite.

Step 4 Theorem 19.3 on sufficient conditions applies for (0,0) and (−5/3, 0). We have a strict local minimum at (0,0) and a strict local maximum at (−5/3, 0). The contrapositive of the second order necessary conditions (Theorem 19.2) implies that f has neither a local maximum nor a local minimum at (−1,2) and (−1,−2). They are inflection points.

We describe a nice application of the unconstrained optimization technique in the determination of regression coefficients by the method of ordinary least squares.

Suppose there are n points (xi, yi), i = 1, · · · ,n in R2. Let F : R → R be given by F(x) = ax+ b for all x ∈ R. Our objective is to find a function F (that is, we want to choose a ∈ R and b ∈ R) such that the quantity

\[ \sum_{i=1}^{n} [F(x_i) - y_i]^2 \]

is minimized. Thus the coefficients are chosen so that the sum of the squares of the residuals (error terms, i.e., the differences between the estimates and the actual observations) is minimized.

We can set up the problem as an unconstrained maximization problem as follows. Define f : R2 → R by

\[ f(a,b) = -\sum_{i=1}^{n} [a x_i + b - y_i]^2 \]

The maximization problem then is

\[ \max_{(a,b)} f(a,b). \]

The function f is twice continuously differentiable on R2 (being a polynomial function),


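and we can calculate

\[
\begin{aligned}
f_1 &= -2\sum_{i=1}^{n} [a x_i + b - y_i]\,x_i = -2\sum_{i=1}^{n} [a x_i^2 + b x_i - x_i y_i] \\
f_2 &= -2\sum_{i=1}^{n} [a x_i + b - y_i] \\
f_{11} &= -2\sum_{i=1}^{n} x_i^2; \qquad f_{12} = -2\sum_{i=1}^{n} x_i; \qquad f_{21} = -2\sum_{i=1}^{n} x_i; \qquad f_{22} = -2n
\end{aligned}
\]

Thus, the determinant of the Hessian of f is

\[ \det(H_f(a,b)) = 4n\sum_{i=1}^{n} x_i^2 - 4\left[\sum_{i=1}^{n} x_i\right]^2 \]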

Recall the Cauchy-Schwarz inequality,

|x · y| ≤ ∥x∥ · ∥y∥.

We can take the two vectors x and the sum vector u and apply the inequality to get

|x ·u| ≤ ∥x∥ · ∥u∥|x ·u|2 ≤ ∥x∥2 · ∥u∥2

|n

∑i=1

xi|2 ≤[∑x2

i]·n.

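Therefore, $\det(H_f(a,b)) \ge 0$. Since $f_{11}(a,b) \le 0$, $f_{22}(a,b) \le 0$, and $\det(H_f(a,b)) \ge 0$, $H_f(a,b)$ is negative semi-definite. Consequently, if $(a^*, b^*)$ satisfies the first-order conditions, then $(a^*, b^*)$ is a point of global maximum of $f$. The first-order conditions are

\begin{align*}
a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i \\
a\sum_{i=1}^{n} x_i + bn &= \sum_{i=1}^{n} y_i .
\end{align*}

Denoting $\frac{1}{n}\sum_{i=1}^{n} x_i$ by $\bar{x}$ and $\frac{1}{n}\sum_{i=1}^{n} y_i$ by $\bar{y}$ (the mean of $x$ and the mean of $y$ respectively), we get

\[ a\bar{x} + b = \bar{y} \tag{19.5} \]

Using this in the first equation leads to

\[ a\sum_{i=1}^{n} x_i^2 + (\bar{y} - a\bar{x})\,n\bar{x} = \sum_{i=1}^{n} x_i y_i \tag{19.6} \]

Thus,

\[ a = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}, \qquad b = \bar{y} - a\bar{x} \]

solves the problem. Note the solution is meaningful provided not all the $x_i$ are the same.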
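As a quick check of the closed-form coefficients above, here is a minimal Python sketch. The function name `ols_fit` and the sample data are illustrative assumptions, not part of the notes; the data are chosen to lie exactly on the line y = 2x + 1, so the formulas should recover a = 2 and b = 1.

```python
def ols_fit(xs, ys):
    """Return (a, b) minimizing sum((a*x + b - y)**2) via the closed form."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: mean of x*y minus the product of the means
    num = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    # Denominator: mean of x^2 minus the squared mean (zero iff all x are equal)
    den = sum(x * x for x in xs) / n - x_bar ** 2
    if den == 0:
        raise ValueError("all x values are equal; the slope is not defined")
    a = num / den
    b = y_bar - a * x_bar
    return a, b

# Hypothetical data lying exactly on y = 2x + 1
a, b = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

The explicit check on the denominator mirrors the remark that the solution is meaningful only when the $x_i$ are not all equal.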

Chapter 20

Problem Set 7

(1) Consider the function $g$ defined for all $x \ge 0$, $y \ge 0$ by

\[ g(x,y) = x^3 + y^3 - 3x - 2y . \]

Write out $\nabla g(x,y)$ and $H_g(x,y)$. Show that $g$ is convex in its domain and find its (global) minimum.

(2) Find all the local maxima and minima of

\[ f(x) = x^4 - 4x^3 + 4x^2 + 4 . \tag{20.1} \]

Which, if any, of them are global maxima or minima?

(3) Suppose that a perfectly competitive firm receives a price of $P$ for its output, pays prices of $w$ and $r$ for its labor ($L$) and capital inputs ($K$), and operates with the production function $Q = L^a K^b$.

(a) Write profits as a function of $L$ and $K$. Derive the first order conditions. Provide an economic interpretation of the first order conditions.

(b) Solve for the optimal levels of $L$ and $K$.

(c) Check the second order conditions. What restrictions on the values of $a$ and $b$ are necessary for a profit maximum? Provide an economic interpretation of these restrictions.

(d) Find the signs of the partial derivatives of $L$ with respect to $P$, $w$, and $r$.

(e) Derive the firm's long run supply curve, i.e., $Q$ as a function of the exogenous parameters. Find the elasticities of supply with respect to $w$, $r$, and $P$. Do these elasticities sum to zero? Provide an economic explanation for this fact.



(4) Suppose that a perfectly competitive firm receives a price of $P$ for its output, pays prices of $w$, $v$, and $r$ for its labor ($L$), natural resource ($R$) and capital inputs ($K$), and operates with the production function $Q = A L^a K^b + \ln R$.

(a) Write profits as a function of $L$, $R$ and $K$. Derive the first order conditions. Provide an economic interpretation of the first order conditions.

Now take $A = 3$, $a = b = \frac{1}{3}$ for the remainder of the problem.

(b) Check the second order conditions.

(c) Without explicitly solving for $L^*$,

(i) Find the change in $L$ for a change in $r$ when all other parameters are constant.

(ii) Find the change in $L$ for a change in $v$ when all other parameters are constant.

(d) Solve for $L^*$. Take the partial derivatives of $L^*$ to confirm the results derived in part (c) above.

Chapter 21

Solution to PS 7

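(1)

\[ \nabla g(x,y) = \begin{bmatrix} 3x^2 - 3 & 3y^2 - 2 \end{bmatrix}, \qquad H_g(x,y) = \begin{bmatrix} 6x & 0 \\ 0 & 6y \end{bmatrix} \tag{21.1} \]

Then

\[ D_1 = 6x \ge 0, \quad D_2 = 36xy \ge 0 \tag{21.2} \]

implies that $g(x,y)$ is convex. Using

\[ \nabla g(x,y) = \begin{bmatrix} 3x^2 - 3 & 3y^2 - 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}, \tag{21.3} \]

\[ x^* = 1, \quad y^* = \sqrt{\tfrac{2}{3}} \tag{21.4} \]

is the unique solution. Using the theorem on convexity and global minima, $g(x,y)$ attains its global minimum at $\left(1, \sqrt{\tfrac{2}{3}}\right)$. The minimum value is

\[ g^* = g\left(1, \sqrt{\tfrac{2}{3}}\right) = -2 - \frac{4}{3}\sqrt{\tfrac{2}{3}} \approx -3.09 . \]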

(2) We know that $f'(x) = 0$ is a necessary condition for $f$ to have a local maximum or minimum.

\begin{align*}
f'(x) = 4x^3 - 12x^2 + 8x &= 0 \tag{21.5} \\
4x\left(x^2 - 3x + 2\right) &= 0, \tag{21.6} \\
x = 0, \; x = 1, \; x &= 2 \tag{21.7}
\end{align*}

Figure 21.1: Graph of $f(x) = x^4 - 4x^3 + 4x^2 + 4$

If we plot the graph of this function, we can see that $x = 0$ and $x = 2$ are local minima and $x = 1$ is a local maximum. Also, $x = 0$ and $x = 2$ are global minima (with $f(0) = f(2) = 4$), and there is no global maximum, since $f(x) \to \infty$ as $|x| \to \infty$.
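The classification read off the graph can also be confirmed with the second derivative test. The short Python sketch below is only an illustration (the helper names `f`, `f2` and `labels` are assumptions made here); it evaluates $f$ and $f''(x) = 12x^2 - 24x + 8$ at the three critical points.

```python
def f(x):
    return x**4 - 4*x**3 + 4*x**2 + 4

def f2(x):
    # Second derivative of f
    return 12*x**2 - 24*x + 8

critical_points = [0, 1, 2]  # roots of f'(x) = 4x(x - 1)(x - 2)
labels = {}
for x in critical_points:
    if f2(x) > 0:
        labels[x] = "local min"
    elif f2(x) < 0:
        labels[x] = "local max"
    else:
        labels[x] = "inconclusive"
    print(x, f(x), f2(x), labels[x])
```

Since $f''(0) = f''(2) = 8 > 0$ and $f''(1) = -4 < 0$, the test agrees with the picture.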

(3) (a) The profit for the firm, when it uses $K$ and $L$ units of capital and labor to produce output $Q = L^a K^b$, given the output and input prices $(P, w, r)$, is

\[ \Pi(K,L) = P \cdot Q - wL - rK . \]

The firm maximizes its profit by choosing $K$ and $L$ such that both the FOC and SOC are satisfied. The FOCs are as under.

\begin{align*}
\frac{d\Pi}{dL} &= P \cdot a L^{a-1} K^b - w = 0, & P \cdot a L^{a-1} K^b &= w; \\
\frac{d\Pi}{dK} &= P \cdot b L^{a} K^{b-1} - r = 0, & P \cdot b L^{a} K^{b-1} &= r .
\end{align*}

The FOC with respect to $L$ leads to the condition that the value of the marginal product of labor is equal to the wage rate $w$. Similarly, the FOC with respect to $K$ leads to the condition that the value of the marginal product of capital is equal to the rental rate $r$.

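(b) To solve for the optimal levels of $L$ and $K$, we divide the first FOC by the second and get

\[ \frac{P \cdot MP_L}{P \cdot MP_K} = \frac{MP_L}{MP_K} = \frac{dK}{dL} = \frac{P \cdot a L^{a-1} K^b}{P \cdot b L^{a} K^{b-1}} = \frac{w}{r}; \qquad \frac{aK}{bL} = \frac{w}{r}; \qquad K = \frac{wb}{ra} L . \]

Observe that the ratio of $MP_L$ and $MP_K$ is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of $K$ can be substituted in either of the two FOCs to get the expression for $L$.

\begin{align*}
P \cdot a L^{a-1} K^b &= w; \\
P \cdot a L^{a-1} \left(\frac{wb}{ra} L\right)^b &= w; \\
P \cdot L^{a+b-1} \left(\frac{wb}{ra}\right)^b &= \frac{w}{a}; \\
P \cdot \left(\frac{a}{w}\right)^{1-b} \left(\frac{b}{r}\right)^{b} &= L^{1-a-b}; \\
L^* &= \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}} .
\end{align*}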


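We compute the optimal value of $K^*$ from the relation $K = \frac{wb}{ra} L$ as under:

\begin{align*}
K^* &= \frac{wb}{ra} L^* \\
&= \frac{wb}{ra} \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}} \\
&= \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b} - 1} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b} + 1} P^{\frac{1}{1-a-b}} \\
&= \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{1-a}{1-a-b}} P^{\frac{1}{1-a-b}} .
\end{align*}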

(c) For the SOC, we first write down the Hessian (the matrix of second order partial derivatives) using the FOCs.

\[ H = \begin{bmatrix} P F_{LL} & P F_{LK} \\ P F_{KL} & P F_{KK} \end{bmatrix} = \begin{bmatrix} P a(a-1) L^{a-2} K^{b} & P ab L^{a-1} K^{b-1} \\ P ab L^{a-1} K^{b-1} & P b(b-1) L^{a} K^{b-2} \end{bmatrix} . \]

For the SOC to be satisfied, the leading principal minor of order one needs to be negative and the leading principal minor of order two needs to be positive. Thus, $P a(a-1) L^{a-2} K^{b} < 0$, which implies that $a - 1 < 0$ or $a < 1$. The LPM of order two is the determinant of the Hessian matrix.

\begin{align*}
\det H &= P^2 ab(a-1)(b-1) L^{2a-2} K^{2b-2} - \left(P ab L^{a-1} K^{b-1}\right)^2 \\
&= P^2 ab\,[(a-1)(b-1) - ab]\, L^{2a-2} K^{2b-2} \\
&= P^2 ab\,[1 - a - b]\, L^{2a-2} K^{2b-2} > 0,
\end{align*}

which holds true if and only if $1 - a - b > 0$. Note that this condition also implies that $b < 1$.

Thus the production function displays diminishing marginal product in each of the two inputs ($a < 1$ and $b < 1$), and it also displays diminishing returns to scale, as the production function is homogeneous of degree $a + b < 1$.
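A numerical sanity check of the closed forms for $L^*$ and $K^*$ from part (b) may be useful here. The sketch below is illustrative only: the function name `optimal_inputs` and the parameter values are assumptions, chosen so that $a + b < 1$ as required by the SOC. It plugs the closed forms back into the two FOCs and reports the residuals.

```python
def optimal_inputs(P, w, r, a, b):
    """Closed-form L*, K* for Q = L**a * K**b, assuming a, b > 0 and a + b < 1."""
    if not (a > 0 and b > 0 and a + b < 1):
        raise ValueError("need a > 0, b > 0 and a + b < 1")
    e = 1.0 / (1.0 - a - b)
    L = (a / w) ** ((1 - b) * e) * (b / r) ** (b * e) * P ** e
    K = (a / w) ** (a * e) * (b / r) ** ((1 - a) * e) * P ** e
    return L, K

P, w, r, a, b = 2.0, 1.0, 1.5, 0.3, 0.4  # illustrative values only
L, K = optimal_inputs(P, w, r, a, b)

# Residuals of the two first-order conditions; both should be close to zero
foc_L = P * a * L ** (a - 1) * K ** b - w
foc_K = P * b * L ** a * K ** (b - 1) - r
print(L, K, foc_L, foc_K)
```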

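(d) We use the expression for $L^*$ derived earlier to find the partial derivatives.

\begin{align*}
\frac{\partial L^*}{\partial P} &= \left(\frac{1}{1-a-b}\right) \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b} - 1} > 0, \\
\frac{\partial L^*}{\partial w} &= \left(-\frac{1-b}{1-a-b}\right) a^{\frac{1-b}{1-a-b}}\, w^{-\frac{1-b}{1-a-b} - 1} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}} < 0, \\
\frac{\partial L^*}{\partial r} &= \left(-\frac{b}{1-a-b}\right) \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}} b^{\frac{b}{1-a-b}}\, r^{-\frac{b}{1-a-b} - 1} P^{\frac{1}{1-a-b}} < 0 .
\end{align*}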

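(e) The output is obtained by noting that the profit maximizing inputs are $K^*$ and $L^*$.

\begin{align*}
Q^* &= (L^*)^a (K^*)^b \\
&= \left[\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{1}{1-a-b}}\right]^a \left[\left(\frac{a}{w}\right)^{\frac{a}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{1-a}{1-a-b}} P^{\frac{1}{1-a-b}}\right]^b \\
&= \left(\frac{a}{w}\right)^{\frac{a(1-b) + ab}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{ab + b(1-a)}{1-a-b}} P^{\frac{a+b}{1-a-b}} \\
&= \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}} \left(\frac{b}{r}\right)^{\frac{b}{1-a-b}} P^{\frac{a+b}{1-a-b}} \\
&= \left[\left(\frac{a}{w}\right)^{a} \left(\frac{b}{r}\right)^{b} P^{a+b}\right]^{\frac{1}{1-a-b}} .
\end{align*}

For computing the elasticity of supply with respect to the output price, note that

\[ Q^* = \left[\left(\frac{a}{w}\right)^{a} \left(\frac{b}{r}\right)^{b} P^{a+b}\right]^{\frac{1}{1-a-b}} = A P^{\frac{a+b}{1-a-b}}, \]

where $A = \left[\left(\frac{a}{w}\right)^{a} \left(\frac{b}{r}\right)^{b}\right]^{\frac{1}{1-a-b}}$ is a constant independent of $P$. It is easy to see that the elasticity will be $\varepsilon_P = \frac{a+b}{1-a-b}$. [Note that for $Q = A P^{b}$, $\varepsilon_P = \frac{dQ}{dP} \cdot \frac{P}{Q} = A b P^{b-1} \frac{P}{Q} = b$.] Similarly, $\varepsilon_w = -\frac{a}{1-a-b}$ and $\varepsilon_r = -\frac{b}{1-a-b}$. Thus,

\begin{align*}
\varepsilon_P + \varepsilon_w + \varepsilon_r &= \frac{a+b}{1-a-b} + \frac{-a}{1-a-b} + \frac{-b}{1-a-b} \\
&= \frac{a + b - a - b}{1-a-b} = 0 .
\end{align*}

The economic interpretation is that if we change all the prices by the same factor, then the profit maximizing quantity does not change. In other words, the profit maximizing output is homogeneous of degree zero in the prices $(P, w, r)$.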
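The homogeneity claim can be illustrated numerically. In the sketch below (the function name `supply` and all parameter values are assumptions for illustration), tripling $P$, $w$ and $r$ together leaves the supply $Q^*$ unchanged, and the three elasticities sum to zero up to rounding.

```python
def supply(P, w, r, a, b):
    """Q* = [(a/w)**a * (b/r)**b * P**(a+b)] ** (1/(1-a-b)), assuming a + b < 1."""
    return ((a / w) ** a * (b / r) ** b * P ** (a + b)) ** (1.0 / (1.0 - a - b))

a, b = 0.3, 0.4                       # illustrative exponents with a + b < 1
q1 = supply(2.0, 1.0, 1.5, a, b)
q2 = supply(6.0, 3.0, 4.5, a, b)      # all three prices multiplied by 3

eps_P = (a + b) / (1 - a - b)
eps_w = -a / (1 - a - b)
eps_r = -b / (1 - a - b)
print(q1, q2, eps_P + eps_w + eps_r)
```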

(f) You may like to write down the expression for the profit function explicitly in terms of $P$, $w$ and $r$, on your own.

(4) (a) The profit for the firm, when it uses $K$, $L$ and $R$ units of capital, labor and natural resources to produce output $Q = A L^a K^b + \ln R$, given the output and input prices $(P, w, v, r)$, is

\[ \Pi(K,L,R) = P \cdot Q - wL - rK - vR = P \cdot A L^a K^b + P \ln R - wL - rK - vR . \]

The firm maximizes its profit by choosing $K$, $L$ and $R$ such that both the FOC and SOC are satisfied. The FOCs are as under.

\begin{align*}
\frac{d\Pi}{dL} &= P \cdot A a L^{a-1} K^b - w = P F_L - w = 0, & P \cdot A a L^{a-1} K^b &= w; \\
\frac{d\Pi}{dK} &= P \cdot A b L^{a} K^{b-1} - r = P F_K - r = 0, & P \cdot A b L^{a} K^{b-1} &= r; \\
\frac{d\Pi}{dR} &= \frac{P}{R} - v = P F_R - v = 0, & \frac{P}{R} &= v .
\end{align*}

The FOC with respect to $L$ leads to the condition that the value of the marginal product of labor is equal to the wage rate $w$. Similarly, the FOC with respect to $K$ leads to the condition that the value of the marginal product of capital is equal to the rental rate $r$. Lastly, the FOC with respect to $R$ leads to the condition that the value of the marginal product of the natural resource is equal to the price of the natural resource $v$.

Now take $A = 3$, $a = b = \frac{1}{3}$ for the remainder of the problem.

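(b) With the given parameter values, the FOCs are (note $Aa = 1 = Ab$)

\[ P \cdot L^{-\frac{2}{3}} K^{\frac{1}{3}} = w; \qquad P \cdot L^{\frac{1}{3}} K^{-\frac{2}{3}} = r; \qquad \frac{P}{R} = v . \]

For the SOC, we first write down the Hessian (the matrix of second order partial derivatives) using the FOCs.

\[ H = \begin{bmatrix} P F_{LL} & P F_{LK} & P F_{LR} \\ P F_{KL} & P F_{KK} & P F_{KR} \\ P F_{RL} & P F_{RK} & P F_{RR} \end{bmatrix} = \begin{bmatrix} -\frac{2}{3} P L^{-\frac{5}{3}} K^{\frac{1}{3}} & \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & 0 \\ \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & -\frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix} . \]

For the SOC to be satisfied, the leading principal minor of order one needs to be negative, the leading principal minor of order two needs to be positive and the leading principal minor of order three needs to be negative.

The LPM of order one is negative, as $-\frac{2}{3} P L^{-\frac{5}{3}} K^{\frac{1}{3}} < 0$ (given that $P > 0$, $K > 0$ and $L > 0$). The LPM of order two is the determinant of the matrix obtained by removing the third row and the third column.

\begin{align*}
\det H_2 &= \det \begin{bmatrix} -\frac{2}{3} P L^{-\frac{5}{3}} K^{\frac{1}{3}} & \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} \\ \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & -\frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}} \end{bmatrix} \\
&= \frac{4}{9} P^2 L^{-\frac{4}{3}} K^{-\frac{4}{3}} - \frac{1}{9} P^2 L^{-\frac{4}{3}} K^{-\frac{4}{3}} \\
&= \frac{1}{3} P^2 L^{-\frac{4}{3}} K^{-\frac{4}{3}} > 0 .
\end{align*}

The LPM of order three is the determinant of the Hessian matrix. We compute the determinant by expanding along the third row to get

\begin{align*}
\det H &= -\frac{P}{R^2} \left[\frac{4}{9} P^2 L^{-\frac{4}{3}} K^{-\frac{4}{3}} - \frac{1}{9} P^2 L^{-\frac{4}{3}} K^{-\frac{4}{3}}\right] \\
&= -\frac{1}{3} \frac{P^3}{R^2} L^{-\frac{4}{3}} K^{-\frac{4}{3}} < 0 .
\end{align*}

Hence the SOC is satisfied.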


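(c) (i) We first totally differentiate the FOCs to get

\begin{align*}
L^{-\frac{2}{3}} K^{\frac{1}{3}}\, dP - \frac{2}{3} P L^{-\frac{5}{3}} K^{\frac{1}{3}}\, dL + \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}}\, dK &= dw \\
L^{\frac{1}{3}} K^{-\frac{2}{3}}\, dP + \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}}\, dL - \frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}}\, dK &= dr \\
\frac{dP}{R} - \frac{P}{R^2}\, dR &= dv .
\end{align*}

We can write this in matrix form as under:

\[ A = \begin{bmatrix} -\frac{2}{3} P L^{-\frac{5}{3}} K^{\frac{1}{3}} & \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & 0 \\ \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & -\frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}, \quad q = \begin{bmatrix} dL \\ dK \\ dR \end{bmatrix}, \quad b = \begin{bmatrix} dw - L^{-\frac{2}{3}} K^{\frac{1}{3}}\, dP \\ dr - L^{\frac{1}{3}} K^{-\frac{2}{3}}\, dP \\ dv - \frac{dP}{R} \end{bmatrix} . \]

Then $Aq = b$. Note that the matrix $A$ is the same as the Hessian, so $\det A = -\frac{1}{3} \frac{P^3}{R^2} L^{-\frac{4}{3}} K^{-\frac{4}{3}}$. Solving for $dL$ when $dP = dw = dv = 0$ and $dr \neq 0$, using Cramer's Rule, we get

\begin{align*}
dL &= \frac{\det \begin{bmatrix} 0 & \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & 0 \\ dr & -\frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det A} \\
&= \frac{\left(-\frac{P}{R^2}\right)(-dr)\, \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}}}{-\frac{1}{3} \frac{P^3}{R^2} L^{-\frac{4}{3}} K^{-\frac{4}{3}}} = -\frac{L^{\frac{2}{3}} K^{\frac{2}{3}}}{P}\, dr,
\end{align*}

so that

\[ \frac{dL^*}{dr} = -\frac{L^{\frac{2}{3}} K^{\frac{2}{3}}}{P} < 0 . \]

Thus, $L^*$ decreases as $r$ increases.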

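(ii) Solving for $dL$ when $dP = dw = dr = 0$ and $dv \neq 0$, using Cramer's Rule, we get

\[ dL = \frac{\det \begin{bmatrix} 0 & \frac{1}{3} P L^{-\frac{2}{3}} K^{-\frac{2}{3}} & 0 \\ 0 & -\frac{2}{3} P L^{\frac{1}{3}} K^{-\frac{5}{3}} & 0 \\ dv & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det A} = \frac{0}{-\frac{1}{3} \frac{P^3}{R^2} L^{-\frac{4}{3}} K^{-\frac{4}{3}}} = 0, \]

where $\det A$ is the determinant of the Hessian computed in part (b), and the numerator vanishes because the first two rows of the matrix are proportional. Hence

\[ \frac{dL^*}{dv} = 0 . \]

Thus, $L^*$ is independent of $v$.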

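(d) To solve for the optimal levels of $L$ and $K$, we divide the first FOC by the second and get (note $a = b = \frac{1}{3}$)

\[ \frac{P \cdot MP_L}{P \cdot MP_K} = \frac{MP_L}{MP_K} = \frac{dK}{dL} = \frac{P \cdot A a L^{a-1} K^b}{P \cdot A b L^{a} K^{b-1}} = \frac{w}{r}; \qquad \frac{aK}{bL} = \frac{w}{r}; \qquad K = \frac{w}{r} L . \]

Observe that the ratio of $MP_L$ and $MP_K$ is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of $K$ can be substituted in either of the two FOCs to get the expression for $L$.

\begin{align*}
P \cdot L^{-\frac{2}{3}} K^{\frac{1}{3}} &= w; \\
P \cdot L^{-\frac{2}{3}} \left(\frac{w}{r} L\right)^{\frac{1}{3}} &= w; \\
P \cdot \left(\frac{w}{r}\right)^{\frac{1}{3}} &= w L^{\frac{1}{3}}; \\
P \cdot \left(\frac{1}{r w^2}\right)^{\frac{1}{3}} &= L^{\frac{1}{3}}; \\
L^* &= \frac{P^3}{r w^2}, \qquad \frac{dL^*}{dr} = -\frac{P^3}{r^2 w^2} .
\end{align*}

To compare with the result in the previous part, note that

\[ K^* = \frac{w}{r} L^* = \frac{P^3}{r^2 w}; \qquad L^* \cdot K^* = \frac{P^6}{r^3 w^3}; \qquad (L^* \cdot K^*)^{\frac{2}{3}} = \frac{P^4}{r^2 w^2}, \]

so that

\[ \frac{dL^*}{dr} = -\frac{(L^* K^*)^{\frac{2}{3}}}{P} = -\frac{P^3}{r^2 w^2}, \]

which agrees with the derivative computed directly from $L^*$.

Since $L^*$ does not depend on $v$, the second conclusion is obvious.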
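The agreement between the Cramer's Rule result and the direct derivative can also be checked numerically. The following Python sketch is illustrative (the function names and the price values are assumptions); it compares a central finite difference of $L^* = P^3/(r w^2)$ in $r$ with the two analytical expressions for $dL^*/dr$.

```python
def L_star(P, w, r):
    return P**3 / (r * w**2)

def K_star(P, w, r):
    return P**3 / (r**2 * w)

P, w, r = 2.0, 1.0, 1.5                 # illustrative prices only
L, K = L_star(P, w, r), K_star(P, w, r)

h = 1e-6
numeric = (L_star(P, w, r + h) - L_star(P, w, r - h)) / (2 * h)  # central difference in r
cramer = -(L * K) ** (2.0 / 3.0) / P    # comparative statics expression from part (c)
direct = -P**3 / (r**2 * w**2)          # derivative of the closed form for L*
print(numeric, cramer, direct)
```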

Chapter 22

Optimization Theory: Equality Constraints

22.1 Constrained Optimization

The optimization problems we encounter in economics are, in general, constrained problems where there are some restrictions on the set from which we can choose $x$. Some examples of constrained optimization problems are:

Example 22.1. Consumer Theory

\[ \max_{x} \; u(x) \quad \text{subject to } x \in B(p, I) \tag{22.1} \]

where $B(p, I)$ is the budget set.

Producer Theory

\[ \max_{y,x} \; py - w \cdot x \quad \text{subject to } (y,x) \in Y \tag{22.2} \]

where

\[ Y = \{ (y,x) \in \mathbb{R} \times \mathbb{R}^n \mid y \le f(x) \} \]

is the production possibility set with $f(x)$ being the production function (one output, many inputs).

We will work with maximization problems, as it is easy to turn a minimization problem into a maximization problem. A constrained maximization problem has the following form.

\[ \max_{x} \; f(x) \quad \text{subject to } x \in G(x) \]

where

$f(x)$ is called the objective function,

$x$ is called the choice variable,

$G(x)$ is called the constraint set.

We assume the objective function to be $C^2$ so that we can use differential calculus techniques.

Example 22.2. Consider the following optimization problem.

\[ \max_{x} \; f(x) \quad \text{subject to } x \in [a,b] \tag{22.3} \]

A solution to this problem is

\[ x^* \in X^* \subset [a,b] \;\wedge\; f(x^*) \ge f(x) \;\; \forall x \in [a,b] . \tag{22.4} \]

The first question to answer is: does a solution exist? Note $f$ is continuous (because it is $C^2$) and $[a,b]$ is a non-empty compact set. We can use the Weierstrass Theorem to show existence of a maximum and a minimum. Having shown the existence, there are two possibilities:

(a) The solution is interior, $x^* \in (a,b)$;

(b) We have a corner (boundary) solution, i.e. $x^* = a$, or $x^* = b$, or both.

Case (a) If the solution is interior, then $x^*$ must also be a local maximum, i.e.,

\[ f'(x^*) = 0 \;\wedge\; f''(x^*) \le 0 . \tag{22.5} \]

Hence we are able to apply earlier theorems to interior solutions.

Case (b) Boundary solution: If $x^* = a$, then $f'(a) \le 0$. If $x^* = b$, then $f'(b) \ge 0$.

In general, constrained optimization problems are of two categories, (a) with equality constraints and (b) with inequality constraints. We discuss them next.

22.2 Equality Constraint

In this case the constraint set $G(x)$ is described by $k$ equality constraints

\begin{align*}
g_1(x) &= 0 \\
&\;\;\vdots \\
g_k(x) &= 0
\end{align*}

where $x \in \mathbb{R}^n$, or,

\[ G(x) = \{ x \in \mathbb{R}^n \mid g(x) = 0 \} . \tag{22.6} \]

Note that $g(x) = (g_1(x), \dots, g_k(x))$ is a $k$-dimensional row vector. The interesting case will be $k < n$, as the following example shows.

Example 22.3. Consider

\[ \max_{x \in \mathbb{R}^2} \; f(x) \]

subject to

\begin{align*}
x_1 + x_2 - 2 &= 0 : \quad g_1(x) = 0 \\
\tfrac{1}{3} x_1 + x_2 - 1 &= 0 : \quad g_2(x) = 0 .
\end{align*}

The only point in the constraint set is $(x_1, x_2) = \left(\tfrac{3}{2}, \tfrac{1}{2}\right)$. Maximizing over this set is trivial. The solitary point in the constraint set is also the solution.

Definition 22.1. A point $x^* \in G(x)$ is a point of local maximum of $f$ subject to the constraint $g(x) = 0$ if there is $\delta > 0$ such that $x \in G(x) \cap B(x^*, \delta)$ implies $f(x) \le f(x^*)$.

Definition 22.2. A point $x^* \in G(x)$ is a point of global maximum of $f$ subject to the constraint $g(x) = 0$ if $x^*$ solves the problem

\[ \max \; f(x) \quad \text{subject to } g(x) = 0 . \]

Theorem 22.1. Necessary condition for a constrained local maximum (Lagrange Theorem) Let $D \subseteq \mathbb{R}^n$ be open and $f : D \to \mathbb{R}$, $g : D \to \mathbb{R}$ be $C^1$ functions. Suppose $x^*$ is a point of local maximum of $f$ subject to the constraint $g(x) = 0$. Suppose further that $\nabla g(x^*) \neq 0$. Then there is $\lambda^* \in \mathbb{R}^k$ such that

\[ \nabla f(x^*) = \lambda^* \nabla g(x^*) . \tag{22.7} \]

Remark 22.1. The condition $\nabla g(x^*) \neq 0$ is called the constraint qualification.

It is important to check the constraint qualification condition $\nabla g(x^*) \neq 0$ before applying the conclusion of Lagrange's theorem. Without this condition, the conclusion of Lagrange's theorem would not be valid, as the following example shows.


Example 22.4. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = 4x_1 + 3x_2$ for all $(x_1, x_2) \in \mathbb{R}^2$, and let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x_1, x_2) = x_1^2 + x_2^2$. Consider the constraint set $C = \{ (x_1, x_2) \in \mathbb{R}^2 : g(x_1, x_2) = 0 \}$. The only element of this set is $(0,0)$, so $(x_1^*, x_2^*) = (0,0)$ is a point of local maximum of $f$ subject to the constraint $g(x) = 0$. Observe that the conclusion of Lagrange's theorem does not hold here. For, if it did, there would exist $\lambda^* \in \mathbb{R}$ such that

\[ \nabla f(0,0) = \lambda^* \nabla g(0,0) . \]

But this means that

\[ (4,3) = (0,0), \]

which is a contradiction. The problem here is that $\nabla g(x_1^*, x_2^*) = \nabla g(0,0) = (0,0)$, so the constraint qualification condition is violated.

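Theorem 22.2. Sufficient Conditions for a Global maximum: Let $D \subseteq \mathbb{R}^n$ be an open convex set and $f : D \to \mathbb{R}$, $g : D \to \mathbb{R}$ be $C^1$ functions. Suppose $(x^*, \lambda^*) \in D \times \mathbb{R}^k$ satisfies $g(x^*) = 0$ and

\[ \nabla f(x^*) = \lambda^* \nabla g(x^*) . \tag{22.8} \]

If $L(x, \lambda^*) = f(x) - \lambda^* \cdot g(x)$ is concave in $x$ on $D$, then $x^*$ is a point of global maximum of $f$ subject to the constraint $g(x) = 0$.

Proof. Let $C = \{ x \in D : g(x) = 0 \}$ denote the constraint set and let $x \in C$. Then,

\[ L(x, \lambda^*) - L(x^*, \lambda^*) \le (x - x^*) \cdot [\nabla f(x^*) - \lambda^* \nabla g(x^*)] \]

by concavity of $L$ in $x$ on $D$. Using the first-order condition (22.8), the right-hand side is zero, so we get

\[ f(x) - \lambda^* g(x) = L(x, \lambda^*) \le L(x^*, \lambda^*) = f(x^*) - \lambda^* g(x^*) . \]

Since $x \in C$ and $x^* \in C$, we have $g(x) = g(x^*) = 0$. Thus, $f(x) \le f(x^*)$, and so $x^*$ is a point of global maximum of $f$ subject to the constraint $g(x) = 0$.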

We use the following steps to solve the optimization problem with equality constraints. Let $f$ and $g_i$, $i = 1, \dots, k$, be $C^1$ functions.

Necessity Route:

Step 1 Existence of a solution can be shown by using the Weierstrass Theorem. For this we need to show that the constraint set is closed and bounded.

Step 2 Define the Lagrangian function as

\[ L(x, \lambda) = f(x) - \lambda \cdot g(x) = f(x) - \lambda_1 g_1(x) - \cdots - \lambda_k g_k(x) \]

where $\lambda_i$, $i = 1, \dots, k$, are Lagrange multipliers.

Step 3 Take the derivative with respect to each variable $x_1, \dots, x_n$ and each Lagrange multiplier $\lambda_1, \dots, \lambda_k$.

Step 4 Solve the following equations:

\begin{align*}
\frac{\partial L(x, \lambda)}{\partial x_i} &= 0, \quad i = 1, \dots, n; \\
\frac{\partial L(x, \lambda)}{\partial \lambda_i} &= 0, \quad i = 1, \dots, k .
\end{align*}

These are $n + k$ first order conditions (FOCs) for $n + k$ unknowns.

Step 5 Let

\[ M = \{ (x, \lambda) \in \mathbb{R}^{n+k} \mid x \text{ satisfies } g_i(x) = 0, \; i = 1, \dots, k, \text{ and the FOCs hold} \} . \]

Verify that $\nabla g(x^*) \neq 0$ holds at each point in the set $M$. Then evaluate $f$ at each $(x, \lambda) \in M$ and find the maximum.

Sufficiency Route: We know that if $f$ and $\lambda_1 g_1(x), \dots, \lambda_k g_k(x)$ are such that $L(x, \lambda)$ is concave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point satisfying the FOCs will be a solution. We illustrate the use of the two routes through the following examples.

Remark 22.2. Note that if $f$ is not concave, we have to compare the points in $M$.

Example 22.5.

\[ \max_{x \in \mathbb{R}^2_+} \; f(x_1, x_2) = -x_1^2 - x_2^2 \quad \text{subject to } 5x_1 + 10x_2 = 10 \]

The constraint set consists of the points on the line $x_2 = 1 - 0.5 x_1$ with non-negative values of $x_1$ and $x_2$. To get the constraint in $g(x) = 0$ form, we rearrange it as

\[ 5x_1 + 10x_2 - 10 = 0 . \]

Necessity Route. The constraint set is non-empty, as $(2,0)$ is contained in it.

The constraint set is closed. Take any convergent sequence $x^n \in G(x)$, $x^n \to x$. Since $5x_1^n + 10x_2^n - 10 = 0$, $x_1^n \ge 0$, $x_2^n \ge 0$ for all $n \in \mathbb{N}$, and weak inequalities are preserved in the limit,

\[ 5x_1 + 10x_2 - 10 = 0, \quad x_1 \ge 0, \quad x_2 \ge 0 . \]

So $x \in G(x)$.

Figure 22.1: Constraint Set $x_2 = 1 - 0.5 \cdot x_1$

The constraint set is bounded. Note that for all $x \in G(x)$,

\[ x_1 \le 2 \text{ and } x_2 \le 1 \;\Rightarrow\; \|x\| \le \|(2,1)\| = \sqrt{2^2 + 1^2} = \sqrt{5} . \]

So $\sqrt{5}$ will serve as a bound. So the constraint set is compact and non-empty and the objective function $f$ is continuous, hence the Weierstrass theorem is applicable and a solution exists.

The Lagrangian and the FOCs are

\begin{align*}
L(x, \lambda) &= -x_1^2 - x_2^2 + \lambda (5x_1 + 10x_2 - 10) \\
\frac{\partial L(x, \lambda)}{\partial x_1} &= -2x_1 + 5\lambda = 0 \\
\frac{\partial L(x, \lambda)}{\partial x_2} &= -2x_2 + 10\lambda = 0 \\
\frac{\partial L(x, \lambda)}{\partial \lambda} &= 5x_1 + 10x_2 - 10 = 0 .
\end{align*}

Now from the first two FOCs

\[ 4x_1 = 2x_2 \;\Leftrightarrow\; 2x_1 = x_2 \]

and from the third FOC,

\[ 5x_1 + 20x_1 - 10 = 0, \qquad x_1 = \frac{10}{25} = \frac{2}{5}, \quad x_2 = \frac{4}{5}, \quad \lambda = \frac{4}{25} . \]

We get a candidate for solution

\[ m_1 = \left(\frac{2}{5}, \frac{4}{5}, \frac{4}{25}\right) . \]

Since we know a solution exists, it must necessarily be either $m_1$ or one of the corners $(2,0)$ or $(0,1)$. The constraint qualification

\[ \nabla g(x^*) = \begin{bmatrix} 5 & 10 \end{bmatrix} \neq 0 \]

is verified trivially. Verify that

\[ f(2,0) = -4, \quad f(0,1) = -1, \quad f\left(\frac{2}{5}, \frac{4}{5}\right) = -\frac{4}{5} . \]

The solution then is $x^* = \left(\frac{2}{5}, \frac{4}{5}\right)$.

Sufficiency Route

\[ \nabla f(x) = \begin{bmatrix} -2x_1 & -2x_2 \end{bmatrix}, \qquad H_f(x) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}, \qquad D_1 = -2 < 0, \quad D_2 = 4 > 0 . \]

So $H_f(x)$ is negative definite for all $x$, and hence $f$ is concave. The constraint function $g(x)$ is concave as it is linear. Also $\lambda \ge 0$. Then $f(x) + \lambda g(x)$ is concave as a sum of concave functions. Then we know that the FOCs are a sufficient condition for a maximum. So the point $x^* = \left(\frac{2}{5}, \frac{4}{5}\right)$ is our solution.
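The arithmetic of the necessity route in Example 22.5 can be reproduced with exact rational numbers. This Python sketch is only an illustration (the names `candidates` and `best` are assumptions); it solves the three linear FOCs by substitution and then compares the interior candidate with the two corners.

```python
from fractions import Fraction as F

# FOCs of Example 22.5 as a linear system in (x1, x2, lam):
#   -2*x1          +  5*lam = 0
#           -2*x2  + 10*lam = 0
#    5*x1 + 10*x2           = 10
# The first two rows give x1 = 5*lam/2 and x2 = 5*lam.
# Substituting into the constraint: 25*lam/2 + 50*lam = 10.
lam = F(10) / (F(25, 2) + 50)
x1 = F(5, 2) * lam
x2 = 5 * lam

def f(u, v):
    return -u * u - v * v

candidates = {
    "interior": (x1, x2),
    "corner (2,0)": (F(2), F(0)),
    "corner (0,1)": (F(0), F(1)),
}
# Pick the candidate with the largest objective value
best = max(candidates, key=lambda name: f(*candidates[name]))
print(x1, x2, lam, best, f(x1, x2))
```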

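Example 22.6. (Non-concave objective function)

\[ \max \; f(x_1, x_2) = x_1^2 x_2 \quad \text{subject to } 2x_1^2 + x_2^2 = 3 . \]

The constraint set is an ellipse and the constraint can be rewritten as $3 - 2x_1^2 - x_2^2 = 0$. Here the sufficiency route will not work, as the objective function is not concave.

\[ H_f(x) = \begin{bmatrix} 2x_2 & 2x_1 \\ 2x_1 & 0 \end{bmatrix}, \qquad D_1 = 2x_2, \quad D_2 = -4x_1^2, \quad D_2 < 0 \;\; \forall x_1 \neq 0, \]

which means that $H_f(x)$ is indefinite whenever $x_1 \neq 0$. So $f$ is not concave. Hence we have to use the necessity route.

The constraint set is non-empty, as $(1,1)$ is contained in it.

The constraint set is closed. Take any convergent sequence $x^n \in G(x)$, $x^n \to x$. Since $2(x_1^n)^2 + (x_2^n)^2 = 3$ for all $n \in \mathbb{N}$, and equalities are preserved in the limit,

\[ 2(x_1)^2 + (x_2)^2 = 3 . \]

So $x \in G(x)$.

The constraint set is bounded. Note that for all $x \in G(x)$,

\[ |x_1| \le \sqrt{\tfrac{3}{2}} < \sqrt{3} \quad \text{and} \quad |x_2| \le \sqrt{3} . \]

So $\|x\| \le \|(\sqrt{3}, \sqrt{3})\| = \sqrt{3 + 3} = \sqrt{6}$. So the constraint set is compact and non-empty and the objective function $f$ is continuous, hence the Weierstrass theorem is applicable and a solution exists.

\begin{align*}
L(x, \lambda) &= x_1^2 x_2 + \lambda \left(3 - 2x_1^2 - x_2^2\right) \\
\frac{\partial L(x, \lambda)}{\partial x_1} &= 2x_1 x_2 - 4\lambda x_1 = 0 \\
\frac{\partial L(x, \lambda)}{\partial x_2} &= x_1^2 - 2\lambda x_2 = 0 \\
\frac{\partial L(x, \lambda)}{\partial \lambda} &= 3 - 2x_1^2 - x_2^2 = 0 .
\end{align*}

Now

\[ 2x_1 (x_2 - 2\lambda) = 0 \;\Leftrightarrow\; x_1 = 0 \;\vee\; \lambda = \frac{x_2}{2} . \]

Case (i)

\[ x_1 = 0, \quad x_2 = \pm\sqrt{3}, \quad \lambda = 0 . \]

We get two candidates for solution

\[ m_1 = \left(0, \sqrt{3}, 0\right), \quad m_2 = \left(0, -\sqrt{3}, 0\right) . \]

Case (ii)

\begin{align*}
\lambda = \frac{x_2}{2} &\;\rightarrow\; x_1^2 - x_2^2 = 0 \\
&\;\rightarrow\; x_1 = x_2 \;\vee\; x_1 = -x_2 \\
&\;\rightarrow\; 3 - 2x_1^2 - x_1^2 = 0
\end{align*}

gives $x_1 = 1 \vee x_1 = -1$. If

\[ x_1 = 1 \;\rightarrow\; x_2 = 1 \;\vee\; x_2 = -1, \quad \lambda = \frac{1}{2} \;\vee\; \lambda = -\frac{1}{2} . \]

Similarly for $x_1 = -1$. We get four more candidates for solution.

\[ m_3 = \left(1, 1, \frac{1}{2}\right), \quad m_4 = \left(1, -1, -\frac{1}{2}\right), \quad m_5 = \left(-1, -1, -\frac{1}{2}\right), \quad m_6 = \left(-1, 1, \frac{1}{2}\right) . \]

Thus

\[ M = \{ m_1, m_2, \dots, m_6 \} . \]

The constraint qualification

\[ \nabla g(x^*) = \begin{bmatrix} -4x_1^* & -2x_2^* \end{bmatrix} \neq 0 \]

holds for each $m_i \in M$. Verify that

\begin{align*}
f(0, \sqrt{3}) &= 0 = f(0, -\sqrt{3}), \\
f(1,1) &= f(-1,1) = 1, \\
f(1,-1) &= f(-1,-1) = -1 .
\end{align*}

The solution then is $x = (1,1)$ and $x = (-1,1)$.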
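The comparison of candidates in Example 22.6 is mechanical and can be scripted. The Python sketch below is illustrative (the helper names are assumptions); it first confirms that each of the six candidates satisfies the three FOCs and then picks out the points at which $f$ is largest.

```python
import math

def f(x1, x2):
    return x1 ** 2 * x2

def foc_residuals(x1, x2, lam):
    """Partial derivatives of L = x1^2*x2 + lam*(3 - 2*x1^2 - x2^2)."""
    return (2 * x1 * x2 - 4 * lam * x1,
            x1 ** 2 - 2 * lam * x2,
            3 - 2 * x1 ** 2 - x2 ** 2)

s3 = math.sqrt(3)
M = [(0, s3, 0), (0, -s3, 0), (1, 1, 0.5), (1, -1, -0.5), (-1, -1, -0.5), (-1, 1, 0.5)]

# Largest absolute FOC residual over all candidates; should be ~0
max_residual = max(abs(res) for m in M for res in foc_residuals(*m))

best_value = max(f(x1, x2) for x1, x2, _ in M)
maximizers = [(x1, x2) for x1, x2, _ in M if f(x1, x2) == best_value]
print(max_residual, best_value, maximizers)
```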


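Example 22.7.

\[ \max_{x \in \mathbb{R}^2_+} \; f(x_1, x_2) = x_1 x_2 \quad \text{subject to } x_1 + 4x_2 = 16, \text{ or } 16 - x_1 - 4x_2 = 0 . \]

The Hessian is

\[ H_f(x) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]

which is indefinite for all values of $x \in \mathbb{R}^2_+$. Hence the objective function is not concave. Observe that $x$ is restricted to $\mathbb{R}^2_+$ and the equality constraint. This constraint set is non-empty, as $(0,4)$ is contained in it, and compact. A solution to this problem exists, as $f$ is continuous and the constraint set is non-empty and compact, hence the Weierstrass theorem is applicable.

\begin{align*}
L(x, \lambda) &= x_1 x_2 + \lambda (16 - x_1 - 4x_2) \\
\frac{\partial L(x, \lambda)}{\partial x_1} &= x_2 - \lambda = 0 \\
\frac{\partial L(x, \lambda)}{\partial x_2} &= x_1 - 4\lambda = 0 \\
\frac{\partial L(x, \lambda)}{\partial \lambda} &= 16 - x_1 - 4x_2 = 0 .
\end{align*}

The FOCs will give us interior candidates. We will still need to compare with the corners. Now

\[ x_1 = 4x_2 \;\rightarrow\; 8x_2 = 16 \;\rightarrow\; x_2 = 2 \text{ and } x_1 = 8, \quad \lambda = x_2 = 2 . \]

We get one candidate for solution

\[ m_1 = (8, 2, 2) . \]

The constraint qualification

\[ \nabla g(x^*) = \begin{bmatrix} -1 & -4 \end{bmatrix} \neq 0 \]

is satisfied trivially for $m_1$. Compare it with the corners $(0,4)$, $(16,0)$ and verify that

\[ f(0,4) = 0 = f(16,0), \quad f(8,2) = 16 . \]

The solution then is $x = (8,2)$.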


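Example 22.8.

\[ \max_{x \in \mathbb{R}^2_+} \; f(x_1, x_2) = \ln x_1 + \ln x_2 \quad \text{subject to } x_1 + 4x_2 = 16, \text{ or } 16 - x_1 - 4x_2 = 0 . \]

Here the necessity route does not work, as the objective function is not defined at the corners of the constraint set, $x = (16,0)$ or $x = (0,4)$, since $\ln y$ is not defined for $y = 0$. The Weierstrass Theorem cannot be applied. Let us use the sufficiency route. Since $\ln$ is not defined at the corners, the problem can be modified as follows:

\[ \max_{x \in \mathbb{R}^2_{++}} \; f(x_1, x_2) = \ln x_1 + \ln x_2 \quad \text{subject to } 16 - x_1 - 4x_2 = 0 . \]

\begin{align*}
L(x, \lambda) &= \ln x_1 + \ln x_2 + \lambda (16 - x_1 - 4x_2) \\
\frac{\partial L(x, \lambda)}{\partial x_1} &= \frac{1}{x_1} - \lambda = 0 \;\rightarrow\; \lambda x_1 = 1 \\
\frac{\partial L(x, \lambda)}{\partial x_2} &= \frac{1}{x_2} - 4\lambda = 0 \;\rightarrow\; 4\lambda x_2 = 1 \\
\frac{\partial L(x, \lambda)}{\partial \lambda} &= 16 - x_1 - 4x_2 = 0 .
\end{align*}

So $x_1 = 4x_2$ from the first two FOCs. Substituting it in the third FOC, we get $x_1 = 8$, $x_2 = 2$, $\lambda = \frac{1}{8}$. The Hessian is

\[ H_f(x) = \begin{bmatrix} -\frac{1}{x_1^2} & 0 \\ 0 & -\frac{1}{x_2^2} \end{bmatrix}, \qquad D_1 = -\frac{1}{x_1^2} < 0, \quad D_2 = \frac{1}{x_1^2 x_2^2} > 0, \quad \forall x \in \mathbb{R}^2_{++} . \]

Hence $H_f(x)$ is negative definite for all $x \in \mathbb{R}^2_{++}$, so $f$ is concave. Also $g(x) = 16 - x_1 - 4x_2$ is linear, hence concave. Lastly $\lambda > 0$. So $L(x, \lambda)$ is concave and the FOCs are sufficient for a maximum. Hence $x^* = (8,2)$ is the solution.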

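Example 22.9. Application: Arithmetic mean Geometric mean inequality. Consider

\[ \max_{(a,b) \in \mathbb{R}^2_+} \; f(a,b) = ab \quad \text{subject to } a + b = 2 . \tag{22.9} \]

Note the constraint set $C = \{ (a,b) : a \ge 0, \; b \ge 0, \; a + b = 2 \}$ is non-empty (as $(2,0)$ is contained in it), closed (since weak inequalities are preserved in the limit), and bounded (as $\|(a,b)\| \le |2| = 2$). The objective function is continuous. Hence by the Weierstrass Theorem a solution exists.

Note that at the solution $a > 0$, $b > 0$. Hence we can rewrite the problem as under:

\[ \max_{(a,b) \in \mathbb{R}^2_{++}} \; f(a,b) = ab \quad \text{subject to } g(a,b) = 2 - a - b = 0 . \]

\begin{align*}
L(a, b, \lambda) &= ab + \lambda (2 - a - b) \\
\frac{\partial L(a, b, \lambda)}{\partial a} &= b - \lambda = 0 \\
\frac{\partial L(a, b, \lambda)}{\partial b} &= a - \lambda = 0 \\
\frac{\partial L(a, b, \lambda)}{\partial \lambda} &= 2 - a - b = 0 .
\end{align*}

Now

\[ a = b \;\rightarrow\; a = b = 1 = \lambda . \]

We get one candidate for solution

\[ m_1 = (1, 1, 1) . \]

The constraint qualification

\[ \nabla g(x^*) = \begin{bmatrix} -1 & -1 \end{bmatrix} \neq 0 \]

is satisfied trivially for $m_1$. Compare it with the corners $(0,2)$, $(2,0)$ and verify that

\[ f(0,2) = 0 = f(2,0), \quad f(1,1) = 1 . \]

The solution then is $(1,1)$. In other words, we have shown that

\[ ab \le 1 . \tag{22.10} \]

Now let $x_1 \ge 0$, $x_2 \ge 0$ be arbitrary with

\[ x_1 + x_2 = x > 0 . \]

Then

\[ 2x_1 + 2x_2 = 2x, \qquad \frac{2x_1}{x} + \frac{2x_2}{x} = 2 . \]

Note that $a = \frac{2x_1}{x} \ge 0$, $b = \frac{2x_2}{x} \ge 0$ and $a + b = 2$. So we can apply the result shown above.

\begin{align*}
ab = \left(\frac{2x_1}{x}\right) \left(\frac{2x_2}{x}\right) &\le 1 \\
x_1 x_2 &\le \frac{x^2}{4} = \left(\frac{x_1 + x_2}{2}\right)^2 \\
\sqrt{x_1 x_2} &\le \frac{x_1 + x_2}{2},
\end{align*}

which is the Arithmetic mean Geometric mean inequality.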

Chapter 23

Envelope Theorem

23.1 Envelope Theorem for Unconstrained Problems

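Let $f(x, \alpha)$ be a continuously differentiable function of $x \in \mathbb{R}^n$ and a parameter $\alpha$. For each choice of $\alpha$, consider the unconstrained maximization problem:

\[ \max \; f(x, \alpha) \]

where the choice variable is $x$. It is of interest to us how the maximized value $f(x^*(\alpha), \alpha)$ changes as the parameter value $\alpha$ changes.

Theorem 23.1. Let $x^*(\alpha)$ be a solution of this problem and also assume that $x^*(\alpha)$ is a continuously differentiable function of $\alpha$. Then,

\[ \frac{d}{d\alpha} f(x^*(\alpha), \alpha) = \frac{\partial}{\partial \alpha} f(x^*(\alpha), \alpha) . \]

Proof. We use the Chain Rule to get

\[ \frac{d}{d\alpha} f(x^*(\alpha), \alpha) = \sum_i \frac{\partial}{\partial x_i} f(x^*(\alpha), \alpha) \cdot \frac{d}{d\alpha} x_i^*(\alpha) + \frac{\partial}{\partial \alpha} f(x^*(\alpha), \alpha) \]

or

\[ \frac{d}{d\alpha} f(x^*(\alpha), \alpha) = \frac{\partial}{\partial \alpha} f(x^*(\alpha), \alpha) \]

since $\frac{\partial}{\partial x_i} f(x^*(\alpha), \alpha) = 0$ for $i = 1, \dots, n$ by the First Order conditions for the solution.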

Example 23.1. Consider the problem of maximizing the function $f(x,a) = -2x^2 + 2ax + 4a^2$ with respect to $x$ for any given value of $a$. What is the effect of a unit increase in the value of $a$ on the maximum value of $f(x,a)$?

This can be done directly by computing the $x^*$ which maximizes $f$. The first order condition yields

\[ \frac{\partial f}{\partial x} = -4x + 2a = 0 . \]

So $x^* = 0.5a$. We can plug this into $f(x,a)$, which leads to

\[ f(x^*(a), a) = f(0.5a, a) = -0.5a^2 + a^2 + 4a^2 = 4.5a^2 . \]

Observe that $f(x^*(a), a)$ increases at the rate of $9a$ as $a$ increases. Alternatively we could apply the Envelope Theorem to get

\[ \frac{df^*}{da} = \frac{\partial f(x^*(a), a)}{\partial a} = 2x^* + 8a = 9a \]

since $x^*(a) = 0.5a$.
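The two computations in Example 23.1 can be compared numerically. In the Python sketch below (the helper names and the evaluation point $a = 3$ are assumptions for illustration), `total` differentiates the value function $f(x^*(a), a)$, while `partial` differentiates $f$ in $a$ with $x$ frozen at $x^*(a) = 0.5a$; the Envelope Theorem says both equal $9a$.

```python
def f(x, a):
    return -2 * x**2 + 2 * a * x + 4 * a**2

def x_star(a):
    return 0.5 * a            # from the first-order condition -4x + 2a = 0

def value(a):
    return f(x_star(a), a)    # equals 4.5 * a**2

a, h = 3.0, 1e-6
x_fixed = x_star(a)
total = (value(a + h) - value(a - h)) / (2 * h)              # d/da of the value function
partial = (f(x_fixed, a + h) - f(x_fixed, a - h)) / (2 * h)  # x held fixed at x*(a)
print(total, partial, 9 * a)
```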

23.2 Meaning of the Lagrange multiplier

In this section we will see that the multipliers measure the sensitivity of the optimal value of the objective function to changes in the right-hand sides (parameters) of the constraints. In this sense, they provide a natural measure of the value of scarce resources in economic maximization problems.

Consider a simple maximization problem with two variables and one equality constraint. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be denoted as $f(x,y)$.

\[ \max_{(x,y) \in \mathbb{R}^2_+} \; f(x,y) \quad \text{subject to } h(x,y) = a . \tag{23.1} \]

Let $(x^*(a), y^*(a))$ be a solution to the above problem for any given parameter value $a$. Thus $f(x^*(a), y^*(a))$ is the corresponding optimal value of the objective function. Let the Lagrange multiplier be denoted by $\lambda^*(a)$. The following theorem shows that $\lambda^*(a)$ measures the rate of change of the optimal value of the objective function $f$ with respect to $a$.

Theorem 23.2. Let $f$ and $h$ be continuously differentiable functions of two variables. For any fixed value of the parameter $a$, let $(x^*(a), y^*(a))$ be the solution of the optimization problem (23.1) with the corresponding Lagrange multiplier $\lambda^*(a)$. Assume that $x^*(a)$, $y^*(a)$ and $\lambda^*(a)$ are continuously differentiable functions of $a$ and the constraint qualification holds at $(x^*(a), y^*(a))$. Then,

\[ \lambda^*(a) = \frac{d f(x^*(a), y^*(a))}{da} . \]

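Proof. The Lagrangian for the problem (23.1) is

\[ L \equiv f(x,y) - \lambda (h(x,y) - a) \]

where $a$ is a parameter. The solution of this problem, $(x^*(a), y^*(a), \lambda^*(a))$, satisfies the First Order conditions

\begin{align*}
\frac{\partial L(x^*(a), y^*(a), \lambda^*(a))}{\partial x} &= \frac{\partial f(x^*(a), y^*(a))}{\partial x} - \lambda^*(a) \frac{\partial h(x^*(a), y^*(a))}{\partial x} = 0 \\
\frac{\partial L(x^*(a), y^*(a), \lambda^*(a))}{\partial y} &= \frac{\partial f(x^*(a), y^*(a))}{\partial y} - \lambda^*(a) \frac{\partial h(x^*(a), y^*(a))}{\partial y} = 0
\end{align*}

for all values of $a$. Also, since $h(x^*(a), y^*(a)) = a$ for all $a$, differentiating with respect to $a$ we get

\[ \frac{\partial h(x^*(a), y^*(a))}{\partial x} \frac{dx^*(a)}{da} + \frac{\partial h(x^*(a), y^*(a))}{\partial y} \frac{dy^*(a)}{da} = 1 \]

for all $a$. Now we can use the Chain Rule and the two First Order conditions:

\begin{align*}
\frac{d f(x^*(a), y^*(a))}{da} &= \frac{\partial f(x^*(a), y^*(a))}{\partial x} \cdot \frac{dx^*(a)}{da} + \frac{\partial f(x^*(a), y^*(a))}{\partial y} \cdot \frac{dy^*(a)}{da} \\
&= \lambda^* \frac{\partial h(x^*(a), y^*(a))}{\partial x} \frac{dx^*(a)}{da} + \lambda^* \frac{\partial h(x^*(a), y^*(a))}{\partial y} \frac{dy^*(a)}{da} \\
&= \lambda^* \left[\frac{\partial h(x^*(a), y^*(a))}{\partial x} \frac{dx^*(a)}{da} + \frac{\partial h(x^*(a), y^*(a))}{\partial y} \frac{dy^*(a)}{da}\right] \\
&= \lambda^* \cdot 1 = \lambda^* .
\end{align*}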
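As an illustration of the theorem, consider the problem of maximizing $x_1 x_2$ subject to $x_1 + 4x_2 = a$, which is an assumed parametrized version of a budget-type constraint chosen here for the sketch. Its FOCs give $x_1 = a/2$, $x_2 = a/8$ and $\lambda = x_2 = a/8$, so the optimal value is $a^2/16$ and its derivative in $a$ is $a/8$. The Python sketch below checks this equality by a finite difference at $a = 16$.

```python
def solve(a):
    """Maximize x1*x2 subject to x1 + 4*x2 = a, using the closed form from the FOCs."""
    x1, x2 = a / 2.0, a / 8.0
    lam = x2                  # FOC with respect to x1: x2 - lam = 0
    return x1, x2, lam

def value(a):
    x1, x2, _ = solve(a)
    return x1 * x2            # optimal value, equal to a**2 / 16

a, h = 16.0, 1e-6
_, _, lam = solve(a)
dvalue_da = (value(a + h) - value(a - h)) / (2 * h)  # central difference in a
print(lam, dvalue_da)
```

The multiplier is thus the marginal value of relaxing the constraint by one unit, which is why it is often read as a shadow price.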


Chapter 24

Homogeneous and Homothetic Functions

24.1 Homogeneous Functions

Most of us have come across homogeneous functions in elementary algebra courses. For example, $f(x) = ax$ is homogeneous of degree 1, $f(x) = ax^m$ is homogeneous of degree $m$, $f(x) = ax + 1$ is not a homogeneous function, and so on. First we define a homogeneous function formally.

Definition 24.1. For any scalar $k$, a real valued function $f(x_1, \dots, x_n)$ is homogeneous of degree $k$ on $\mathbb{R}^n_+$ if for all $x$ in $\mathbb{R}^n_+$ and all $t > 0$,

\[ f(tx_1, \dots, tx_n) = t^k f(x_1, \dots, x_n) . \]

Some examples of homogeneous functions are:

(a) Consider $f : \mathbb{R}^2_+ \to \mathbb{R}$ given by $f(x_1, x_2) = x_1^2 x_2^3$. Then if $t > 0$, we have $f(tx_1, tx_2) = (tx_1)^2 (tx_2)^3 = t^{2+3} x_1^2 x_2^3 = t^5 f(x_1, x_2)$. So, $f$ is homogeneous of degree 5.

(b) The function $f(x_1, x_2) = x_1^a x_2^b$ is homogeneous of degree $a + b$.

(c) The function $f : \mathbb{R}^2_+ \to \mathbb{R}$ given by $f(x_1, x_2) = x_1^2 x_2 + 3x_1 x_2^2 + x_2^3$ is homogeneous of degree 3, since each term is homogeneous of degree 3.

(d) A linear function, $f(x_1, \dots, x_n) = a_1 x_1 + \cdots + a_n x_n$, is homogeneous of degree 1.

(e) A quadratic form, $Q(x, A) = x'Ax = \sum a_{ij} x_i x_j$, is homogeneous of degree 2.

(f) The function $f : \mathbb{R}^2_+ \to \mathbb{R}$ given by $f(x_1, x_2) = 3x_1^2 x_2^3 - 6x_1^5 x_2^2$ is not homogeneous, since the first term is homogeneous of degree 5 but the second term is homogeneous of degree 7.

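Let us look at the function $f(x_1, x_2) = x_1^a x_2^b$ again. We can calculate the partial derivatives of $f$ on $\mathbb{R}^2_{++}$. Thus, $\frac{\partial f(x_1, x_2)}{\partial x_1} = a x_1^{a-1} x_2^b$ and $\frac{\partial f(x_1, x_2)}{\partial x_2} = b x_1^a x_2^{b-1}$. Now, if $t > 0$, then $\frac{\partial f(tx_1, tx_2)}{\partial x_1} = a (tx_1)^{a-1} (tx_2)^b = t^{a+b-1} a x_1^{a-1} x_2^b = t^{a+b-1} \frac{\partial f(x_1, x_2)}{\partial x_1}$. So $\frac{\partial f(x_1, x_2)}{\partial x_1}$ is homogeneous of degree $(a + b - 1)$. Similarly, one can check that $\frac{\partial f(x_1, x_2)}{\partial x_2}$ is homogeneous of degree $(a + b - 1)$. More generally, whenever a function $f$ is homogeneous of degree $k$, its partial derivatives are homogeneous of degree $(k - 1)$.

Theorem 24.1. Suppose $f$ is homogeneous of degree $k$ on $\mathbb{R}^n_+$, and continuously differentiable on $\mathbb{R}^n_{++}$. Then for each $i = 1, \dots, n$, $\frac{\partial f(x_1, \dots, x_n)}{\partial x_i}$ is homogeneous of degree $(k - 1)$ on $\mathbb{R}^n_{++}$.

Proof. To prove this let $t > 0$ be given. Then,

\[ f(tx) = f(tx_1, \dots, tx_n) = t^k f(x_1, \dots, x_n) . \]

Differentiating both sides with respect to $x_1$ and applying the Chain Rule on the left-hand side (the argument for any other $i$ is identical), we have

\[ \frac{\partial f(tx_1, \dots, tx_n)}{\partial x_1} \cdot t = t^k \frac{\partial f(x_1, \dots, x_n)}{\partial x_1}, \tag{24.1} \]

where $\frac{\partial f(tx_1, \dots, tx_n)}{\partial x_1}$ denotes the partial derivative of $f$ with respect to its first argument, evaluated at $tx$. Dividing by $t$, we get

\[ \frac{\partial f(tx_1, \dots, tx_n)}{\partial x_1} = t^{k-1} \frac{\partial f(x_1, \dots, x_n)}{\partial x_1} . \tag{24.2} \]

Thus the partial derivatives are homogeneous functions of degree $k - 1$.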

We can also verify that

x1D1 f (x1, x2)+ x2D2(x1, x2) = axa1 xb

2 +bxa1 xb

2 = (a+b)xa1 xb

2 = (a+b) f (x1, x2).

More generally, when a function, f , is homogeneous of degree k, then x∇ f (x) = k f (x), aresult known as Euler’s theorem.

Theorem 24.2 (Euler’s Theorem). Suppose f : Rn+ → R is homogeneous of degree k on

Rn+ and continuously differentiable on Rn

++. Then,

x1 ·∂ f (x1, · · · ,xn)

∂x1+ · · ·+ xn ·

∂ f (x1, · · · ,xn)

∂xn= k f (x)x ·∇ f (x) = k f (x) for all x ∈ Rn

++

24.2. HOMOTHETIC FUNCTIONS 217

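Proof. To prove this, fix $x \in \mathbb{R}^n_{++}$ and regard

$$f(tx) = f(tx_1, \dots, tx_n)$$

as a function of $t > 0$. Then, applying the Chain Rule, we have

$$\frac{d f(tx)}{dt} = \frac{\partial f(tx)}{\partial x_1}\cdot x_1 + \dots + \frac{\partial f(tx)}{\partial x_n}\cdot x_n \tag{24.3}$$

But since $f$ is homogeneous of degree $k$, we have

$$f(tx) = t^k f(x_1, \dots, x_n)$$

and,

$$\frac{d f(tx)}{dt} = k t^{k-1} f(x_1, \dots, x_n) \tag{24.4}$$

Equate the right-hand sides of (24.3) and (24.4) and take $t = 1$ to complete the proof.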
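The two results above can be checked numerically. The following Python sketch does this for the Cobb-Douglas function $x_1^a x_2^b$; the exponents $a = 0.3$, $b = 0.5$, the test point and the step size `h` are illustrative assumptions, and the partial derivatives are only approximated by central finite differences.

```python
def cobb_douglas(x1, x2, a=0.3, b=0.5):
    """f(x1, x2) = x1**a * x2**b, homogeneous of degree a + b (here 0.8)."""
    return x1 ** a * x2 ** b


def partial(f, point, i, h=1e-6):
    """Central finite-difference estimate of the i-th partial derivative at point."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def euler_gap(f, point, k):
    """Return x . grad f(x) - k * f(x); Euler's theorem says this is zero."""
    dot = sum(x * partial(f, point, i) for i, x in enumerate(point))
    return dot - k * f(*point)


def degree_gap(f, point, i, k, t=2.0):
    """Return D_i f(t x) - t**(k-1) * D_i f(x); Theorem 24.1 says this is zero."""
    scaled = tuple(t * x for x in point)
    return partial(f, scaled, i) - t ** (k - 1) * partial(f, point, i)


if __name__ == "__main__":
    point = (2.0, 3.0)
    print("Euler gap:", euler_gap(cobb_douglas, point, k=0.8))
    print("Degree gap:", degree_gap(cobb_douglas, point, i=0, k=0.8))
```

Both printed gaps should be zero up to finite-difference error.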

The following is a converse of Euler's Theorem.

Theorem 24.3 (Converse of Euler's Theorem). Suppose $f : \mathbb{R}^n_+ \to \mathbb{R}$ is a continuous function on $\mathbb{R}^n_+$ and continuously differentiable on $\mathbb{R}^n_{++}$. Also suppose,

$$x_1 \cdot \frac{\partial f(x_1, \dots, x_n)}{\partial x_1} + \dots + x_n \cdot \frac{\partial f(x_1, \dots, x_n)}{\partial x_n} = k f(x)$$

for all $x \in \mathbb{R}^n_{++}$. Then, $f$ is homogeneous of degree $k$.

24.2 Homothetic Functions

Definition 24.2. A function $f : \mathbb{R}^n_+ \to \mathbb{R}$ is a homothetic function if it is a monotone transformation of a homogeneous function.

Thus if there is a monotone transformation $g : \mathbb{R} \to \mathbb{R}$ and a homogeneous function $h : \mathbb{R}^n_+ \to \mathbb{R}$ such that $f(x) = g(h(x))$ holds for all $x$ in the domain, then $f$ is a homothetic function.

The function $f(x,y) = (xy)^3 + xy$ is homothetic, as $h(x,y) = z = xy$ is a homogeneous function of degree 2 and $g(z) = z^3 + z$ is a monotone transformation of $z$.


Chapter 25

Optimization Theory: Inequality Constraints

25.1 Inequality Constraint

The more general constrained optimization problem deals with inequality constraints. Note that the equality constraint $g(x) = 0$ can be expressed as $g(x) \geq 0$ and $g(x) \leq 0$.

The constrained maximization problem with which we are concerned is the following:

$$\max f(x) \quad \text{subject to } g_j(x) \geq 0 \text{ for } j = 1, \dots, m \text{ and } x \in X$$

where $X$ is a non-empty subset of $\mathbb{R}^n$, and $f$, $g_j$ ($j = 1, \dots, m$) are functions from $X$ to $\mathbb{R}$. We define the constraint set $C$ as follows:

$$C = \{x \in X : g(x) \geq 0\}$$

where, as usual, $g(x) = [g_1(x), \dots, g_m(x)]$.

Definition 25.1. Kuhn-Tucker Conditions: Let $X$ be an open set in $\mathbb{R}^n$, and $f$, $g_j$ ($j = 1, \dots, m$) be continuously differentiable on $X$. A pair $(x^*, \lambda^*)$ in $X \times \mathbb{R}^m_+$ satisfies the Kuhn-Tucker conditions if

(i) $D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* \cdot D_i g_j(x^*) = 0$; $i = 1, \dots, n$

(ii) $g(x^*) \geq 0$ and $\lambda^* \cdot g(x^*) = 0$.


Theorem 25.1. Let $X$ be an open set in $\mathbb{R}^n$, and $f$, $g_j$ ($j = 1, \dots, m$) be continuously differentiable on $X$. Suppose a pair $(x^*, \lambda^*) \in X \times \mathbb{R}^m_+$ satisfies the Kuhn-Tucker conditions. If $X$ is convex and $f$, $g_j$ ($j = 1, \dots, m$) are concave on $X$, then $x^*$ is a point of constrained global maximum.

We illustrate the application of this Theorem through examples. First we take a linear objective function.

Example 25.1. Solve

$$\max_{(x,y) \in \mathbb{R}^2_+} f(x,y) = ax + by \quad \text{subject to } p_1 x + p_2 y \leq M,$$

where $a$, $b$, $p_1$, $p_2$ and $M$ are positive parameters. Find a solution to the problem for the following parameter configurations

(i) $\frac{a}{b} > \frac{p_1}{p_2}$ (ii) $\frac{a}{b} < \frac{p_1}{p_2}$,

using the Kuhn-Tucker sufficiency theorem.

We need to check that all conditions of the Theorem are satisfied.

(i) Let $X = \{(x,y) \in \mathbb{R}^2 \mid x > -1, y > -1\}$. Then $X$ is open as its complement $X^C = \{(x,y) \in \mathbb{R}^2 \mid x \leq -1 \text{ or } y \leq -1\}$ is closed.

(ii) The function $f(x,y)$ is continuous as $ax$ and $by$ are continuous and $f(\cdot, \cdot)$ is obtained by taking the sum of two continuous functions. The functions $g_1(x,y) = M - p_1 x - p_2 y$, $g_2(x,y) = x$, $g_3(x,y) = y$ are linear and hence continuous. Further $f_x(x,y) = a$, $f_y(x,y) = b$ are continuous functions. Hence $f$, $g_j$ ($j = 1, 2, 3$) are continuously differentiable on $X$.

(iii) The set $X$ is convex: if $(x_1,y_1), (x_2,y_2) \in X$, then

$$x_1 > -1,\ x_2 > -1 \Rightarrow \lambda x_1 + (1-\lambda) x_2 > -1 \quad \forall \lambda \in (0,1)$$
$$y_1 > -1,\ y_2 > -1 \Rightarrow \lambda y_1 + (1-\lambda) y_2 > -1 \quad \forall \lambda \in (0,1)$$
$$\Rightarrow (\lambda x_1 + (1-\lambda) x_2,\ \lambda y_1 + (1-\lambda) y_2) \in X.$$


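The function $f(x,y)$ is concave as the sum of two concave functions, and $g_j$ ($j = 1, 2, 3$) are concave being linear functions. Hence for the following problem

$$\max_{(x,y) \in X} f(x,y) = ax + by \quad \text{subject to } p_1 x + p_2 y \leq M,\ x \geq 0,\ y \geq 0,$$

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions

(i) $D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* \cdot D_i g_j(x^*) = 0$; $i = 1, \dots, n$,

(ii) $g(x^*) \geq 0$ and $\lambda^* \cdot g(x^*) = 0$.

They are

$$a - \lambda_1 p_1 + \lambda_2 = 0$$
$$b - \lambda_1 p_2 + \lambda_3 = 0$$
$$M - p_1 x - p_2 y \geq 0, \quad \lambda_1 (M - p_1 x - p_2 y) = 0$$
$$x \geq 0, \quad \lambda_2 x = 0; \qquad y \geq 0, \quad \lambda_3 y = 0$$

If $\lambda_1 = 0$, then $a - \lambda_1 p_1 + \lambda_2 = 0 \Rightarrow \lambda_2 = -a < 0$, which contradicts $\lambda_2 \geq 0$. Hence

$$\lambda_1 > 0 \Rightarrow M - p_1 x - p_2 y = 0.$$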

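So $x = y = 0$ is ruled out.

Case (i) $\frac{a}{b} > \frac{p_1}{p_2}$. Consider $x > 0$, $y = 0$. Note $\lambda_2 = 0$, $x = \frac{M}{p_1}$,

$$\frac{a}{p_1} = \lambda_1, \qquad b - \frac{a}{p_1} p_2 + \lambda_3 = 0,$$

$$\lambda_3 = \frac{a}{p_1} p_2 - b = b\left(\frac{a}{b}\frac{p_2}{p_1} - 1\right) > 0,$$

since $\frac{a}{b} > \frac{p_1}{p_2}$ or $\frac{a}{b}\frac{p_2}{p_1} > 1$. Hence

$$x = \frac{M}{p_1},\quad y = 0,\quad \lambda_1 = \frac{a}{p_1},\quad \lambda_2 = 0,\quad \lambda_3 = b\left(\frac{a}{b}\frac{p_2}{p_1} - 1\right) > 0$$

is a solution.

[Figure 25.1: Case (i): $\frac{a}{b} > \frac{p_1}{p_2}$: Optimal Consumption Bundle = $\left(\frac{M}{p_1}, 0\right)$. Axes: $x_1$ (budget-line intercept $\frac{M}{p_1}$) and $x_2$ (intercept $\frac{M}{p_2}$).]

Case (ii) $\frac{a}{b} < \frac{p_1}{p_2}$. Consider $x = 0$, $y > 0$. Note $\lambda_3 = 0$, $y = \frac{M}{p_2}$,

$$\frac{b}{p_2} = \lambda_1, \qquad a - \frac{b}{p_2} p_1 + \lambda_2 = 0,$$

$$\lambda_2 = \frac{b}{p_2} p_1 - a = a\left(\frac{b}{a}\frac{p_1}{p_2} - 1\right) > 0,$$

since $\frac{a}{b} < \frac{p_1}{p_2}$ or $1 < \frac{b}{a}\frac{p_1}{p_2}$. Hence

$$x = 0,\quad y = \frac{M}{p_2},\quad \lambda_1 = \frac{b}{p_2},\quad \lambda_2 = a\left(\frac{b}{a}\frac{p_1}{p_2} - 1\right) > 0,\quad \lambda_3 = 0$$

is a solution.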
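The case analysis of Example 25.1 can be written as a small function and the Kuhn-Tucker conditions checked mechanically. This is a minimal sketch; the function names, the tolerance and the sample parameter values are assumptions made for illustration.

```python
def solve_linear(a, b, p1, p2, M):
    """Return (x, y, lam1, lam2, lam3) following cases (i) and (ii) of Example 25.1."""
    if a / b > p1 / p2:
        return (M / p1, 0.0, a / p1, 0.0, a * p2 / p1 - b)
    if a / b < p1 / p2:
        return (0.0, M / p2, b / p2, b * p1 / p2 - a, 0.0)
    # Knife-edge case, not treated in the example: objective and budget line are parallel.
    raise ValueError("a/b == p1/p2: every point on the budget line is optimal")


def kuhn_tucker_ok(a, b, p1, p2, M, sol, tol=1e-9):
    """Check stationarity, feasibility, non-negative multipliers and complementary slackness."""
    x, y, l1, l2, l3 = sol
    g = (M - p1 * x - p2 * y, x, y)
    lam = (l1, l2, l3)
    stationarity = (a - l1 * p1 + l2, b - l1 * p2 + l3)
    return (all(abs(s) <= tol for s in stationarity)
            and all(gj >= -tol for gj in g)
            and all(lj >= -tol for lj in lam)
            and abs(sum(lj * gj for lj, gj in zip(lam, g))) <= tol)


if __name__ == "__main__":
    for params in [(3.0, 1.0, 1.0, 1.0, 10.0), (1.0, 3.0, 1.0, 1.0, 10.0)]:
        sol = solve_linear(*params)
        print(params, sol, kuhn_tucker_ok(*params, sol))
```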

Example 25.2. Solve

$$\max_{(x,y) \in \mathbb{R}^2_+} f(x,y) = \frac{x}{1+x} + y \quad \text{subject to } x + 4y \leq 16,$$

using the Kuhn-Tucker sufficiency theorem.

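[Figure 25.2: Case (ii): $\frac{a}{b} < \frac{p_1}{p_2}$: Optimal Consumption Bundle = $\left(0, \frac{M}{p_2}\right)$. Axes: $x_1$ (budget-line intercept $\frac{M}{p_1}$) and $x_2$ (intercept $\frac{M}{p_2}$).]

We need to check that all conditions of the Theorem are satisfied.

(i) Let $X = \{(x,y) \in \mathbb{R}^2 \mid x > -1, y > -1\}$. Then $X$ is open as its complement $X^C = \{(x,y) \in \mathbb{R}^2 \mid x \leq -1 \text{ or } y \leq -1\}$ is closed.

(ii) The function $f(x,y)$ is continuous as $x$, $y$, $1+x$ are continuous, $1 + x > 0$ and $f(\cdot, \cdot)$ is obtained by taking the quotient of two continuous functions $x$ and $1+x$, with non-vanishing denominator, and then adding a continuous function. The functions

$$g_1(x,y) = 16 - x - 4y; \quad g_2(x,y) = x; \quad g_3(x,y) = y$$

are linear and hence continuous. Further $f_x(x,y) = \frac{1}{(1+x)^2}$, $f_y(x,y) = 1$ are continuous functions. Hence $f$, $g_j$ ($j = 1, 2, 3$) are continuously differentiable on $X$.

(iii) The set $X$ is convex: if $(x_1,y_1), (x_2,y_2) \in X$, then

$$x_1 > -1,\ x_2 > -1 \Rightarrow \lambda x_1 + (1-\lambda) x_2 > -1 \quad \forall \lambda \in (0,1)$$
$$y_1 > -1,\ y_2 > -1 \Rightarrow \lambda y_1 + (1-\lambda) y_2 > -1 \quad \forall \lambda \in (0,1)$$
$$\Rightarrow (\lambda x_1 + (1-\lambda) x_2,\ \lambda y_1 + (1-\lambda) y_2) \in X.$$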

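The function $f(x,y)$ is concave as the sum of two concave functions (exercise) and $g_j$ ($j = 1, 2, 3$) are concave being linear functions. Hence for the following problem

$$\max_{(x,y) \in X} f(x,y) = \frac{x}{1+x} + y \quad \text{subject to } x + 4y \leq 16,\ x \geq 0,\ y \geq 0,$$

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions. They are

$$\frac{1}{(1+x)^2} - \lambda_1 + \lambda_2 = 0$$
$$1 - 4\lambda_1 + \lambda_3 = 0$$
$$16 - x - 4y \geq 0, \quad \lambda_1 (16 - x - 4y) = 0$$
$$x \geq 0, \quad \lambda_2 x = 0; \qquad y \geq 0, \quad \lambda_3 y = 0.$$

If $\lambda_1 = 0$, then $1 - 4\lambda_1 + \lambda_3 = 0 \Rightarrow \lambda_3 = -1 < 0$, which contradicts $\lambda_3 \geq 0$. Hence

$$\lambda_1 > 0 \Rightarrow 16 - x - 4y = 0$$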

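and $x = y = 0$ is ruled out. There are three remaining cases.

Case (i) $x > 0$, $y = 0$. Note $\lambda_2 = 0$, $x = 16$,

$$\frac{1}{(1+16)^2} = \lambda_1; \qquad 1 - \frac{4}{289} + \lambda_3 = 0; \qquad \lambda_3 = -\frac{285}{289} < 0.$$

This contradicts $\lambda_3 \geq 0$.

Case (ii) $x = 0$, $y > 0$. Note $\lambda_3 = 0$, $y = 4$,

$$\frac{1}{4} = \lambda_1; \qquad 1 - \lambda_1 + \lambda_2 = 0; \qquad 1 - \frac{1}{4} + \lambda_2 = 0; \qquad \lambda_2 = -\frac{3}{4} < 0.$$

This contradicts $\lambda_2 \geq 0$.

Case (iii) $x > 0$, $y > 0$. Note $\lambda_2 = 0$, $\lambda_3 = 0$,

$$\frac{1}{(1+x)^2} = \lambda_1; \qquad 1 - 4\lambda_1 = 0,$$
$$(1+x)^2 = 4 \Rightarrow x = 1 > 0$$
$$16 - x - 4y = 0 \Rightarrow y = \frac{15}{4} > 0.$$

Note that all conditions are satisfied. The Theorem asserts that $\left(1, \frac{15}{4}\right)$ is a global maximum and therefore solves both problems (the one stated on $X$ and the original one on $\mathbb{R}^2_+$).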
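The answer to Example 25.2 can be cross-checked by brute force. The sketch below evaluates the objective on a grid of feasible points with spacing 1/4 (an arbitrary choice, made so that the candidate $(1, 15/4)$ lies on the grid); a grid search only supports the result and does not prove it.

```python
def f(x, y):
    """Objective of Example 25.2."""
    return x / (1 + x) + y


def grid_max(cells_per_unit=4):
    """Return (best value, (x, y)) over grid points with x >= 0, y >= 0, x + 4y <= 16."""
    n = cells_per_unit
    best_val, best_pt = float("-inf"), None
    for i in range(16 * n + 1):          # x = i / n ranges over [0, 16]
        for j in range(4 * n + 1):       # y = j / n ranges over [0, 4]
            if i + 4 * j <= 16 * n:      # feasibility x + 4y <= 16 in integer arithmetic
                val = f(i / n, j / n)
                if val > best_val:
                    best_val, best_pt = val, (i / n, j / n)
    return best_val, best_pt


if __name__ == "__main__":
    print(grid_max())
```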


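Example 25.3. In the above example, let the price of good $y$ be $p > 0$ and income be $I > 0$. We can redo the exercise by going over the Kuhn-Tucker conditions again. They are

$$\frac{1}{(1+x)^2} - \lambda_1 + \lambda_2 = 0$$
$$1 - p\lambda_1 + \lambda_3 = 0$$
$$I - x - py \geq 0, \quad \lambda_1 (I - x - py) = 0$$
$$x \geq 0, \quad \lambda_2 x = 0; \qquad y \geq 0, \quad \lambda_3 y = 0.$$

If $\lambda_1 = 0$, then $1 - p\lambda_1 + \lambda_3 = 0 \Rightarrow \lambda_3 = -1 < 0$, which contradicts $\lambda_3 \geq 0$. Hence

$$\lambda_1 > 0 \Rightarrow I - x - py = 0$$

and $x = y = 0$ is ruled out because $I > 0$. There are three remaining cases.

Case (i) $x > 0$, $y = 0$. Note $\lambda_2 = 0$, $x = I$,

$$\frac{1}{(1+I)^2} = \lambda_1, \qquad 1 - \frac{p}{(1+I)^2} + \lambda_3 = 0 \Rightarrow \lambda_3 = \frac{p}{(1+I)^2} - 1.$$

If $\frac{p}{(1+I)^2} - 1 \geq 0$, that is $p \geq (I+1)^2$, then $\lambda_3 \geq 0$. So the solution is $\left(I, 0, \frac{1}{(1+I)^2}, 0, \frac{p}{(1+I)^2} - 1\right)$ if $p \geq (I+1)^2$.

Case (ii) $x = 0$, $y > 0$. Note $\lambda_3 = 0$, $y = \frac{I}{p}$,

$$\frac{1}{p} = \lambda_1, \qquad 1 - \lambda_1 + \lambda_2 = 0 \Rightarrow 1 - \frac{1}{p} + \lambda_2 = 0, \qquad \lambda_2 = \frac{1}{p} - 1.$$

If $\frac{1}{p} - 1 \geq 0$, that is $1 \geq p$, then $\lambda_2 \geq 0$. So the solution is $\left(0, \frac{I}{p}, \frac{1}{p}, \frac{1}{p} - 1, 0\right)$ if $p \leq 1$.

Case (iii) $x > 0$, $y > 0$. Note $\lambda_2 = 0$, $\lambda_3 = 0$,

$$\frac{1}{(1+x)^2} = \lambda_1, \qquad 1 - p\lambda_1 = 0,$$
$$(1+x)^2 = p \Rightarrow x = \sqrt{p} - 1 > 0$$
$$I - x - py = 0 \Rightarrow y = \frac{I + 1 - \sqrt{p}}{p} > 0.$$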


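Hence for $p > 1$ and $I + 1 > \sqrt{p}$, the solution is $\left(\sqrt{p} - 1, \frac{I + 1 - \sqrt{p}}{p}, \frac{1}{p}, 0, 0\right)$. Combining them, the solution $(x^*, y^*, \lambda_1^*, \lambda_2^*, \lambda_3^*)$ is

$$\begin{cases}
\left(I,\ 0,\ \frac{1}{(1+I)^2},\ 0,\ \frac{p}{(1+I)^2} - 1\right) & \text{if } p \geq (I+1)^2, \\
\left(0,\ \frac{I}{p},\ \frac{1}{p},\ \frac{1}{p} - 1,\ 0\right) & \text{if } p \leq 1, \text{ and} \\
\left(\sqrt{p} - 1,\ \frac{I + 1 - \sqrt{p}}{p},\ \frac{1}{p},\ 0,\ 0\right) & \text{if } 1 < p < (I+1)^2.
\end{cases}$$

The Kuhn-Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore solves both problems.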
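The three-regime solution of Example 25.3 translates directly into code, which makes it easy to confirm that each regime satisfies the Kuhn-Tucker equalities. This is a sketch under the example's assumptions ($p > 0$, $I > 0$); the function names and test values are illustrative.

```python
import math


def solve(p, income):
    """Return (x, y, lam1, lam2, lam3) for max x/(1+x) + y s.t. x + p*y <= income, x, y >= 0."""
    if p >= (income + 1) ** 2:                     # case (i): only good x is bought
        return (income, 0.0, 1 / (1 + income) ** 2, 0.0, p / (1 + income) ** 2 - 1)
    if p <= 1:                                     # case (ii): only good y is bought
        return (0.0, income / p, 1 / p, 1 / p - 1, 0.0)
    root = math.sqrt(p)                            # case (iii): interior solution
    return (root - 1, (income + 1 - root) / p, 1 / p, 0.0, 0.0)


def kt_residuals(p, income, sol):
    """Left-hand sides of the Kuhn-Tucker equalities; each should be zero at a solution."""
    x, y, l1, l2, l3 = sol
    return (1 / (1 + x) ** 2 - l1 + l2,
            1 - p * l1 + l3,
            l1 * (income - x - p * y),
            l2 * x,
            l3 * y)


if __name__ == "__main__":
    for p, income in [(4.0, 16.0), (0.5, 3.0), (30.0, 2.0)]:
        sol = solve(p, income)
        print(p, income, sol, kt_residuals(p, income, sol))
```

With $p = 4$ and $I = 16$ the function reproduces the bundle $(1, 15/4)$ of Example 25.2.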

25.2 Global maximum and constrained local maximum

We know from the definitions that if $\hat{x}$ is a point of constrained global maximum, then $\hat{x}$ is also a point of constrained local maximum. The situations under which the converse is true are given by the following theorem.

Theorem 25.2. Let $X$ be a convex set in $\mathbb{R}^n$. Let $f$, $g_j$ ($j = 1, \dots, m$) be concave functions on $X$. Suppose $\hat{x}$ is a point of constrained local maximum. Then, $\hat{x}$ is a point of constrained global maximum.

Proof. Since $\hat{x}$ is a point of constrained local maximum, there is $\delta > 0$, such that for all $x \in B(\hat{x}, \delta) \cap C$, we have $f(x) \leq f(\hat{x})$.

Now, if $\hat{x}$ is not a point of constrained global maximum, then there is some $\bar{x} \in C$, such that $f(\bar{x}) > f(\hat{x})$. One can choose $0 < \theta < 1$ with $\theta$ sufficiently close to zero, such that $\tilde{x} \equiv [\theta \bar{x} + (1-\theta)\hat{x}] \in B(\hat{x}, \delta)$. Since $X$ is convex and $g_j$ ($j = 1, \dots, m$) are concave, $C$ is a convex set, and $\tilde{x} \in C$. Thus $\tilde{x} \in B(\hat{x}, \delta) \cap C$. Also, since $f$ is concave,

$$f(\tilde{x}) = f(\theta \bar{x} + (1-\theta)\hat{x}) \geq \theta f(\bar{x}) + (1-\theta) f(\hat{x}) > \theta f(\hat{x}) + (1-\theta) f(\hat{x}) = f(\hat{x}).$$

But this contradicts the fact that $\hat{x}$ is a point of constrained local maximum.

Chapter 26

Problem Set 8

(1) [Cauchy-Schwarz inequality]

Let $(c_1, c_2, c_3)$ be a non-zero vector in $\mathbb{R}^3$. Consider the following constrained maximization problem:

$$\max \sum_{i=1}^{3} c_i x_i \quad \text{subject to } \sum_{i=1}^{3} x_i^2 = 1 \text{ and } (x_1, x_2, x_3) \in \mathbb{R}^3 \tag{26.1}$$

(a) Show, by using Weierstrass theorem, that there exists $\bar{x} \in \mathbb{R}^3$ which solves (26.1).

(b) Use Lagrange's theorem to show that

$$\sum_{i=1}^{3} c_i \bar{x}_i = \|c\|. \tag{26.2}$$

(c) Let $p$, $q$ be arbitrary non-zero vectors in $\mathbb{R}^n$. Using the result in (b), show that $|p \cdot q| \leq \|p\| \cdot \|q\|$.

Solve the following constrained optimization problems.

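(2) Let $f : \mathbb{R}^2 \to \mathbb{R}$.

$$\max_{(x,y) \in \mathbb{R}^2_+} f(x,y) = x^2 - 3xy \quad \text{subject to } x + 2y = 10. \tag{26.3}$$

(3) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$.

$$\max_{(x,y) \in \mathbb{R}^2_+} f(x,y) = x^{\frac{1}{3}} y^{\frac{2}{3}} \quad \text{subject to } 2x + y = 4. \tag{26.4}$$

(4) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$.

$$\max f(x,y) = \sqrt{xy} \quad \text{subject to } x + y \leq 6,\ x \geq 0,\ y \geq 0. \tag{26.5}$$

(5) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$.

$$\max f(x,y) = x + \ln(1+y) \quad \text{subject to } x \geq 0,\ y \geq 0 \text{ and } x + py \leq m. \tag{26.6}$$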

(6) Let $X$ be a non-empty, convex set in $\mathbb{R}^2$. Let $g$ be a continuous function from $X$ to $\mathbb{R}$, and let $f$ be a strictly quasi-concave function from $X$ to $\mathbb{R}$. Consider the following constrained optimization problem.

$$\max f(x) \quad \text{subject to } g(x) \geq 0 \text{ and } x \in X \tag{26.7}$$

and the corresponding optimization problem:

$$\max f(x) \quad \text{subject to } x \in X \tag{26.8}$$

in which the constraint $g(x) \geq 0$ has been omitted.

(a) Suppose that $\bar{x}$ is a solution to (26.8), and $g(\bar{x}) > 0$. Is $\bar{x}$ also a solution to problem (26.7)? Explain.

(b) Suppose that $\bar{x}$ is a solution to (26.8), but $\bar{x}$ is not a solution to (26.7). Show that if $\hat{x}$ is any solution to (26.7), then we must have $g(\hat{x}) = 0$.

(7) Suppose that a consumer has the utility function $U(x,y) = x^a y^b$ and faces the budget constraint $p_x x + p_y y \leq I$.

(A) Utility Maximization

(a) What are the first order conditions for utility maximization?

(b) Solve for the consumer's demands for goods $x$ and $y$.

(c) Solve for the value of $\lambda$. What is the economic interpretation of $\lambda$? When is $\lambda$ an increasing, decreasing or constant function of income?

(d) Show that the second order conditions hold.

(e) Show that the implicit function theorem value of $\frac{dx}{dI}$ is identical to the value of taking the partial derivative of $x^*$ with respect to $I$.

(f) A consumer's indirect utility function is defined to be utility as a function of prices and income. Use $x^*$ and $y^*$ to solve for the indirect utility function. Is it true that the partial of the indirect utility function with respect to income equals $\lambda$?

(B) Expenditure Minimization:

Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, $p_x x + p_y y$, subject to reaching a given level of utility, $u_0$ (the constraint is therefore $u_0 - x^a y^b = 0$).

(a) What are the first order conditions for expenditure minimization?

(b) Use the first order conditions to solve for $x^*$ and $y^*$ (these are called the Hicksian or compensated demand functions).

(c) Check the second order conditions.

(d) Write the level of income, $I$, necessary to reach $u_0$ as a function of $u_0$, prices, and parameters. How does this expenditure function relate to the indirect utility function?

(e) To avoid confusion, let us call the solution for good $x$ in utility maximization $x^*$ and the solution for good $x$ in expenditure minimization $h^*$. Prove that

$$\frac{\partial x^*}{\partial p_x} = \frac{\partial h^*}{\partial p_x} - x^* \frac{\partial x^*}{\partial I}.$$

Interpret this answer.

(8) Suppose a consumer has the utility function $U = a \ln(x - x_0) + b \ln(y - y_0)$ where $a$, $b$, $x_0$ and $y_0$ are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumer's demand for good $x$.

(b) Find the elasticities of demand for good $x$ with respect to income and prices.

(c) Show that the utility function $U = 45 (x - x_0)^{3.5a} (y - y_0)^{3.5b}$ would have yielded the same demand for good $x$.

(9) Optimization with inequality constraints: Rationing.

Suppose a consumer has the utility function,

U(x,y,z) = a ln(x)+b ln(y)+ c ln(z)


where a > 0, b > 0 and c > 0 are such that a+b+ c = 1. The budget constraint is

px+qy+ rz ≤ I.

In other words, the prices of goods $x$, $y$ and $z$ are $p$, $q$ and $r$ respectively and the consumer has an income $I$. The prices and income are positive.

In addition, the consumer faces a rationing constraint. He is not allowed to buy more than $k > 0$ units of good $x$.

(a) Solve the optimization problem.

(b) Under what condition on the various parameters is the rationing constraint binding?

(c) Show that when the rationing constraint binds, the income that the consumer would have liked to spend on good $x$ but cannot do so now is split between goods $y$ and $z$ in proportions $b : c$.

(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this way? If not, how would you expect the bread-butter-rice case to differ from the result in (c)?

Chapter 27

Solution to PS 8

1. (a) The constraint set $C$ can be rewritten as

$$C = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : d^2((0,0,0), (x_1,x_2,x_3)) = 1\},$$

therefore $C$ is

(i) bounded since $C \subset B((0,0,0), 2)$: indeed, $x \in C \Rightarrow d(x, 0) = 1 < 2 \Rightarrow x \in B(0, 2)$,

(ii) closed in $\mathbb{R}^3$ since it is defined as a level set in $\mathbb{R}^3$ of the polynomial, and therefore continuous, function $\sum_{i=1}^{3} x_i^2$ (use the characterization of closed sets in terms of convergent sequences),

(iii) non-empty since $(1,0,0) \in C$.

Since the objective function $\sum_{i=1}^{3} c_i x_i$ is linear, and therefore continuous on $\mathbb{R}^3$, Weierstrass theorem is applicable and yields $\bar{x} \in C$ such that $\sum_{i=1}^{3} c_i x_i \leq \sum_{i=1}^{3} c_i \bar{x}_i$ for any $(x_1, x_2, x_3) \in C$.

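(b) The optimization problem can be rewritten as

$$\max f(x) \quad \text{subject to } g(x) = 0 \text{ and } x \in \mathbb{R}^3 \tag{27.1}$$

where

$$f(x) = \sum_{i=1}^{3} c_i x_i \quad \text{and} \quad g(x) = \sum_{i=1}^{3} x_i^2 - 1.$$

Both functions $f$ and $g$ are polynomial and therefore continuously differentiable on the open set $\mathbb{R}^3$. Since $\bar{x}$ is a point of global maximum of $f$ subject to the constraint $g(x) = 0$, it is also a local maximum of $f$ subject to the constraint $g(x) = 0$. Since $g(0) = -1 \neq 0$ we have $\bar{x} \neq 0$. Now

$$\nabla g(x) = 2 (x_1, x_2, x_3)' \neq 0 \text{ for } x \neq 0,$$

and $\bar{x} \neq 0$, hence the constraint qualification $\nabla g(\bar{x}) \neq 0$ holds. Therefore by Lagrange's theorem there exists $\lambda \in \mathbb{R}$ such that $\nabla f(\bar{x}) = \lambda \nabla g(\bar{x})$, or

$$(c_1, c_2, c_3)' = 2\lambda\, (\bar{x}_1, \bar{x}_2, \bar{x}_3)' \tag{27.2}$$

If we premultiply (27.2) by the row vector $(\bar{x}_1, \bar{x}_2, \bar{x}_3)$, we will get

$$\sum_{i=1}^{3} c_i \bar{x}_i = 2\lambda \sum_{i=1}^{3} \bar{x}_i^2 = 2\lambda (g(\bar{x}) + 1) = 2\lambda (0 + 1) = 2\lambda \tag{27.3}$$

If we premultiply (27.2) by the row vector $(c_1, c_2, c_3)$, equation (27.3) yields

$$\|c\|^2 = \sum_{i=1}^{3} c_i^2 = 2\lambda \sum_{i=1}^{3} c_i \bar{x}_i = \left(\sum_{i=1}^{3} c_i \bar{x}_i\right)^2 \tag{27.4}$$

To conclude that the result holds we only need to show that $\sum_{i=1}^{3} c_i \bar{x}_i \geq 0$. Indeed, since $(c_1, c_2, c_3) \neq (0,0,0)$, we have $c_i \neq 0$ for some $i$. Let $e_i$ denote the $i$-th unit vector. Since $g\left(e_i \frac{|c_i|}{c_i}\right) = 0$ and $\bar{x}$ solves (27.1), by definition of the solution to the constrained maximization problem

$$\sum_{i=1}^{3} c_i \bar{x}_i = f(\bar{x}) \geq f\left(e_i \frac{|c_i|}{c_i}\right) = |c_i| > 0.$$

Now taking square roots in (27.4) yields the result.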

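(c) Let us define $c = p$, and consider $x = \frac{q}{\|q\|}$. Then $\|x\| = 1$, hence $g(x) = 0$ and the definition of the solution of the constrained maximization problem yields

$$\|p\| = \|c\| = \sum_{i=1}^{3} c_i \bar{x}_i = f(\bar{x}) \geq f(x) = \sum_{i=1}^{3} c_i x_i = \frac{1}{\|q\|} \sum_{i=1}^{3} c_i q_i = \frac{1}{\|q\|} \sum_{i=1}^{3} p_i q_i = \frac{p \cdot q}{\|q\|}.$$

Analogously, for $x = -\frac{q}{\|q\|}$ we have $\|x\| = 1$, hence $g(x) = 0$ and the definition of the solution of the constrained maximization problem yields

$$\|p\| = \|c\| = \sum_{i=1}^{3} c_i \bar{x}_i = f(\bar{x}) \geq f(x) = \sum_{i=1}^{3} c_i x_i = -\frac{1}{\|q\|} \sum_{i=1}^{3} c_i q_i = -\frac{1}{\|q\|} \sum_{i=1}^{3} p_i q_i = -\frac{p \cdot q}{\|q\|}.$$

Therefore, since $\|q\| > 0$, we have

$$-\|p\| \|q\| \leq p \cdot q \leq \|p\| \|q\| \Leftrightarrow |p \cdot q| \leq \|p\| \|q\|.$$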
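The argument of Problem 1 can be illustrated numerically: the maximiser of $c \cdot x$ on the unit sphere is $c / \|c\|$, with value $\|c\|$, and no other unit vector does better. The vector `c`, the random seed and the sample size below are arbitrary choices made for this sketch.

```python
import math
import random


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def sphere_maximiser(c):
    """Candidate solution c / ||c|| of max c.x subject to ||x|| = 1."""
    n = norm(c)
    return [ci / n for ci in c]


def random_unit_vector(rng, dim=3):
    """Draw Gaussian coordinates and normalise them to length 1."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n = norm(v)
        if n > 1e-12:
            return [vi / n for vi in v]


if __name__ == "__main__":
    c = [1.0, -2.0, 2.0]
    rng = random.Random(0)
    best_random = max(dot(c, random_unit_vector(rng)) for _ in range(1000))
    print(dot(c, sphere_maximiser(c)), norm(c), best_random)
```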

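2. Necessity Route: The function $f(x,y) = x^2 - 3xy$ is continuous and the constraint set $\{(x,y) \in \mathbb{R}^2_+ \mid x + 2y = 10\}$, which we denote by $G(x,y)$, is non-empty (it contains $(10,0)$), closed (the set is defined by weak inequalities and an equality, which are preserved in the limit) and bounded, as $\|(x,y)\| \leq \sqrt{10^2 + 5^2} = \sqrt{125}$. So the constraint set is compact and non-empty and the objective function $f$ is continuous, hence Weierstrass theorem is applicable and a solution exists. The Lagrangian and the FOCs are

$$\mathcal{L}(x,y,\lambda) = x^2 - 3xy + \lambda (2y + x - 10) \tag{27.5}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial x} = 2x - 3y + \lambda = 0 \tag{27.6}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial y} = -3x + 2\lambda = 0 \Rightarrow \lambda = \frac{3}{2} x \tag{27.7}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial \lambda} = 2y + x - 10 = 0. \tag{27.8}$$

Now

$$2x - 3y + \lambda = 2x - 3y + \frac{3}{2} x = 0 \Rightarrow \frac{7}{2} x = 3y \Rightarrow y = \frac{7}{6} x$$
$$2y + x - 10 = 0 \Rightarrow \frac{7}{3} x + x - 10 = 0 \Rightarrow \frac{10}{3} x = 10 \Rightarrow x = 3$$
$$y = \frac{7}{6} \cdot 3 = \frac{7}{2}, \quad \lambda = \frac{9}{2}.$$

We get an interior candidate for solution

$$m_1 = \left(3, \frac{7}{2}, \frac{9}{2}\right).$$

The constraint qualification holds, since

$$\nabla g(x^*, y^*) = \begin{bmatrix} 1 & 2 \end{bmatrix} \neq 0$$

for all $(x,y) \in \mathbb{R}^2_+$. Verify that

$$f(10, 0) = 100, \quad f(0, 5) = 0, \quad f\left(3, \frac{7}{2}\right) = -\frac{45}{2}.$$

The solution then is $(x^*, y^*) = (10, 0)$. Note that we cannot use the sufficiency route since $f$ is not concave.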
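Substituting $y = (10 - x)/2$ into the objective of Problem 2 gives $2.5x^2 - 15x$, which is convex in $x$, so the interior stationary point is where $f$ is smallest on the constraint and the maximum sits at a corner. The scan below confirms this on a grid; the grid size is an arbitrary choice for this sketch.

```python
def f(x, y):
    """Objective of Problem 2."""
    return x * x - 3 * x * y


def y_on_constraint(x):
    """Solve x + 2y = 10 for y."""
    return (10 - x) / 2


def scan(steps=1000):
    """Evaluate f along the constraint for x in [0, 10]; return ((max, x), (min, x))."""
    values = []
    for i in range(steps + 1):
        x = 10 * i / steps
        values.append((f(x, y_on_constraint(x)), x))
    return max(values), min(values)


if __name__ == "__main__":
    highest, lowest = scan()
    print("largest value and x:", highest)
    print("smallest value and x:", lowest)
```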


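3. Necessity Route: A solution exists by arguments similar to the earlier problem. The Lagrangian and the FOCs are

$$\mathcal{L}(x,y,\lambda) = x^{\frac{1}{3}} y^{\frac{2}{3}} + \lambda (4 - 2x - y) \tag{27.9}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial x} = \frac{1}{3} x^{-\frac{2}{3}} y^{\frac{2}{3}} - 2\lambda = 0 \tag{27.10}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial y} = \frac{2}{3} x^{\frac{1}{3}} y^{-\frac{1}{3}} - \lambda = 0 \tag{27.11}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial \lambda} = 4 - 2x - y = 0. \tag{27.12}$$

Now

$$\frac{\frac{1}{3} x^{-\frac{2}{3}} y^{\frac{2}{3}}}{\frac{2}{3} x^{\frac{1}{3}} y^{-\frac{1}{3}}} = \frac{2\lambda}{\lambda} \Rightarrow \frac{y}{2x} = 2 \Rightarrow y = 4x$$
$$4 - 2x - y = 4 - 2x - 4x = 0 \Rightarrow x = \frac{2}{3}, \quad y = \frac{8}{3}, \quad \lambda = \frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}$$

We get an interior candidate for solution

$$m_1 = \left(\frac{2}{3}, \frac{8}{3}, \frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}\right).$$

The constraint qualification holds, since

$$\nabla g(x^*, y^*) = \begin{bmatrix} -2 & -1 \end{bmatrix} \neq 0$$

for all $(x,y) \in \mathbb{R}^2_+$. Verify that

$$f(2, 0) = 0 = f(0, 4), \quad f\left(\frac{2}{3}, \frac{8}{3}\right) = \left(\frac{2}{3}\right)^{\frac{1}{3}} \left(\frac{8}{3}\right)^{\frac{2}{3}} > 0.$$

The solution then is $(x^*, y^*) = \left(\frac{2}{3}, \frac{8}{3}\right)$.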

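Sufficiency route:

$$\nabla f(x,y) = \begin{bmatrix} \frac{1}{3} x^{-\frac{2}{3}} y^{\frac{2}{3}} & \frac{2}{3} x^{\frac{1}{3}} y^{-\frac{1}{3}} \end{bmatrix}, \qquad H_f(x,y) = \begin{bmatrix} -\frac{2}{9} x^{-\frac{5}{3}} y^{\frac{2}{3}} & \frac{2}{9} x^{-\frac{2}{3}} y^{-\frac{1}{3}} \\ \frac{2}{9} x^{-\frac{2}{3}} y^{-\frac{1}{3}} & -\frac{2}{9} x^{\frac{1}{3}} y^{-\frac{4}{3}} \end{bmatrix}.$$

The principal minors of order one satisfy

$$-\frac{2}{9} x^{-\frac{5}{3}} y^{\frac{2}{3}} \leq 0, \qquad -\frac{2}{9} x^{\frac{1}{3}} y^{-\frac{4}{3}} \leq 0,$$

and the principal minor of order two (the determinant of $H_f$) satisfies

$$0 \geq 0$$

for all $(x,y) \in \mathbb{R}^2_{++}$. Hence $f$ is concave. The constraint is linear and so concave, and $\lambda \geq 0$. Therefore $\mathcal{L}(x,y,\lambda)$ is concave and the FOC are sufficient for a maximum. Therefore $(x^*, y^*) = \left(\frac{2}{3}, \frac{8}{3}\right)$, which satisfies the FOC, is the solution.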
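As a cross-check on Problem 3, the objective can be evaluated along the constraint $y = 4 - 2x$ for $x \in [0, 2]$. The grid of 300 steps is an assumption chosen so that $x = 2/3$ falls on a grid point; a grid scan supports, but does not prove, the analytical result.

```python
def f(x, y):
    """Objective of Problem 3."""
    return x ** (1 / 3) * y ** (2 / 3)


def best_on_budget(steps=300):
    """Scan 2x + y = 4 for x in [0, 2]; return (value, x, y) at the grid maximum."""
    best = (-1.0, 0.0, 0.0)
    for i in range(steps + 1):
        x = 2 * i / steps
        y = 4 - 2 * x
        best = max(best, (f(x, y), x, y))
    return best


if __name__ == "__main__":
    print(best_on_budget())
```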

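4. Let $f : \mathbb{R}^2_+ \to \mathbb{R}$.

$$\max f(x,y) = \sqrt{xy} \quad \text{subject to } x + y \leq 6,\ x \geq 0,\ y \geq 0. \tag{27.13}$$

This problem has inequality constraints and so we will use the Kuhn-Tucker Sufficiency theorem. We need to check that all conditions of the Theorem are satisfied.

(i) Let $X = \{(x,y) \in \mathbb{R}^2_{++}\}$. Then $X$ is open as its complement $X^C = \{(x,y) \in \mathbb{R}^2 \mid x \leq 0 \text{ or } y \leq 0\}$ is closed.

(ii) The function $f(x,y)$ is continuous on $X$ as $x$ and $y$ are continuous, and $f(\cdot)$ is obtained by taking the square root of the product of these two continuous functions. The functions $g_1(x,y) = 6 - x - y$, $g_2(x,y) = x$, $g_3(x,y) = y$ are linear and hence continuous. Further $f_x(x,y) = \frac{1}{2}\sqrt{\frac{y}{x}}$, $f_y(x,y) = \frac{1}{2}\sqrt{\frac{x}{y}}$ are continuous functions. Hence $f$, $g_j$ ($j = 1, 2, 3$) are continuously differentiable on $X$.

(iii) The set $X$ is convex: if $(x_1,y_1), (x_2,y_2) \in X$, then

$$x_1 > 0,\ x_2 > 0 \Rightarrow \lambda x_1 + (1-\lambda) x_2 > 0 \quad \forall \lambda \in (0,1)$$
$$y_1 > 0,\ y_2 > 0 \Rightarrow \lambda y_1 + (1-\lambda) y_2 > 0 \quad \forall \lambda \in (0,1)$$
$$\Rightarrow (\lambda x_1 + (1-\lambda) x_2,\ \lambda y_1 + (1-\lambda) y_2) \in X.$$

(iv) The function $f(x,y)$ is concave as

$$\nabla f(x,y) = \begin{bmatrix} \frac{1}{2}\sqrt{\frac{y}{x}} & \frac{1}{2}\sqrt{\frac{x}{y}} \end{bmatrix}, \qquad H_f(x,y) = \begin{bmatrix} -\frac{1}{4}\sqrt{\frac{y}{x^3}} & \frac{1}{4}\sqrt{\frac{1}{xy}} \\ \frac{1}{4}\sqrt{\frac{1}{xy}} & -\frac{1}{4}\sqrt{\frac{x}{y^3}} \end{bmatrix}.$$

The principal minors of order one satisfy

$$-\frac{1}{4}\sqrt{\frac{y}{x^3}} \leq 0, \qquad -\frac{1}{4}\sqrt{\frac{x}{y^3}} \leq 0,$$

and the principal minor of order two is $\det H_f = \frac{1}{16xy} - \frac{1}{16xy} = 0 \geq 0$, for all $(x,y) \in X$. Hence $f$ is concave. Further, $g_j$ ($j = 1, 2, 3$) are concave being linear functions.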

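Hence for the following problem

$$\max_{(x,y) \in X} f(x,y) = \sqrt{xy} \quad \text{subject to } x + y \leq 6,\ x > 0,\ y > 0,$$

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions

(i) $D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* D_i g_j(x^*) = 0$; $i = 1, \dots, n$,

(ii) $g(x^*) \geq 0$ and $\lambda^* \cdot g(x^*) = 0$.

They are

$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = 0 \tag{27.14}$$
$$\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \tag{27.15}$$
$$6 - x - y \geq 0, \quad \lambda_1 (6 - x - y) = 0 \tag{27.16}$$
$$x \geq 0, \quad \lambda_2 x = 0; \qquad y \geq 0, \quad \lambda_3 y = 0 \tag{27.17}$$

If $\lambda_1 = 0$, then $\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \Rightarrow \lambda_3 = -\frac{1}{2}\sqrt{\frac{x}{y}} < 0$, which contradicts $\lambda_3 \geq 0$. Hence

$$\lambda_1 > 0 \Rightarrow 6 - x - y = 0.$$

Since $x > 0$, $y > 0$, we have $\lambda_2 = 0$, $\lambda_3 = 0$, and

$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = \frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \Rightarrow \frac{1}{2}\sqrt{\frac{x}{y}} = \lambda_1 = \frac{1}{2}\sqrt{\frac{y}{x}}$$
$$\Rightarrow x = y \Rightarrow 6 - x - y = 0 \Rightarrow x = y = 3 > 0.$$

Note that all conditions are satisfied. Hence it is a global maximum on $X$. Observe that it is also a global maximum on $\mathbb{R}^2_+$ as

$$f(x,y) = 0 \text{ for } (x,y) \in \mathbb{R}^2_+ \setminus X$$

and $f(3,3) > 0$. Hence, $(3,3)$ solves the optimization problem.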

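5. Let $f : \mathbb{R}^2_+ \to \mathbb{R}$.

$$\max f(x,y) = x + \ln(1+y) \quad \text{subject to } x \geq 0,\ y \geq 0 \text{ and } x + py \leq m. \tag{27.18}$$

Again we will use the Kuhn-Tucker Sufficiency theorem. We need to check that all conditions of the Theorem are satisfied.

(i) Let $X = \{(x,y) \in \mathbb{R}^2 \mid x > -1, y > -1\}$. Then $X$ is open as its complement $X^C = \{(x,y) \in \mathbb{R}^2 \mid x \leq -1 \text{ or } y \leq -1\}$ is closed.

(ii) The function $f(x,y)$ is continuous as $x$ and $\ln(1+y)$, for $y > -1$, are continuous, and $f(\cdot)$ is the sum of two continuous functions. The functions $g_1(x,y) = m - x - py$, $g_2(x,y) = x$, $g_3(x,y) = y$ are linear and hence continuous. Further $f_x(x,y) = 1$, $f_y(x,y) = \frac{1}{1+y}$ are continuous functions. Hence $f$, $g_j$ ($j = 1, 2, 3$) are continuously differentiable on $X$.

(iii) The set $X$ is convex: if $(x_1,y_1), (x_2,y_2) \in X$, then

$$x_1 > -1,\ x_2 > -1 \Rightarrow \lambda x_1 + (1-\lambda) x_2 > -1 \quad \forall \lambda \in (0,1)$$
$$y_1 > -1,\ y_2 > -1 \Rightarrow \lambda y_1 + (1-\lambda) y_2 > -1 \quad \forall \lambda \in (0,1)$$
$$\Rightarrow (\lambda x_1 + (1-\lambda) x_2,\ \lambda y_1 + (1-\lambda) y_2) \in X.$$

(iv) The function $f(x,y)$ is concave as

$$\nabla f(x,y) = \begin{bmatrix} 1 & \frac{1}{1+y} \end{bmatrix}, \qquad H_f(x,y) = \begin{bmatrix} 0 & 0 \\ 0 & -\frac{1}{(1+y)^2} \end{bmatrix}.$$

The principal minors of order one satisfy

$$0 \leq 0, \qquad -\frac{1}{(1+y)^2} \leq 0,$$

and the principal minor of order two is $\det H_f = 0 \geq 0$, for all $(x,y) \in X$. Hence $f$ is concave. The functions $g_j$ ($j = 1, 2, 3$) are concave being linear functions.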

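Hence for the following problem

$$\max_{(x,y) \in X} f(x,y) = x + \ln(1+y) \quad \text{subject to } x + py \leq m,\ x \geq 0,\ y \geq 0,$$

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions

(i) $D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* D_i g_j(x^*) = 0$; $i = 1, \dots, n$, and

(ii) $g(x^*) \geq 0$ and $\lambda^* \cdot g(x^*) = 0$.

They are

$$1 - \lambda_1 + \lambda_2 = 0 \tag{27.19}$$
$$\frac{1}{1+y} - p\lambda_1 + \lambda_3 = 0 \tag{27.20}$$
$$m - x - py \geq 0, \quad \lambda_1 (m - x - py) = 0 \tag{27.21}$$
$$x \geq 0, \quad \lambda_2 x = 0; \qquad y \geq 0, \quad \lambda_3 y = 0. \tag{27.22}$$

If $\lambda_1 = 0$, then $1 - \lambda_1 + \lambda_2 = 0 \Rightarrow \lambda_2 = -1 < 0$, which contradicts $\lambda_2 \geq 0$. Hence

$$\lambda_1 > 0 \Rightarrow m - x - py = 0$$

and $x = y = 0$ is ruled out because $m > 0$. There are three remaining cases.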

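(i) $x > 0$, $y = 0$. Note $\lambda_2 = 0$, $x = m$,

$$1 = \lambda_1, \qquad 1 - p + \lambda_3 = 0, \qquad \lambda_3 = p - 1.$$

If $p - 1 \geq 0$, then $\lambda_3 \geq 0$. So the solution is $(m, 0, 1, 0, p - 1)$ if $p \geq 1$.

(ii) $x = 0$, $y > 0$. Note $\lambda_3 = 0$, $y = \frac{m}{p}$,

$$\frac{1}{p\left(1 + \frac{m}{p}\right)} = \frac{1}{p + m} = \lambda_1, \qquad 1 - \lambda_1 + \lambda_2 = 0, \qquad 1 - \frac{1}{p+m} + \lambda_2 = 0, \qquad \lambda_2 = \frac{1}{p+m} - 1.$$

If $\frac{1}{p+m} - 1 \geq 0$, that is $1 \geq p + m$, then $\lambda_2 \geq 0$. So the solution is $\left(0, \frac{m}{p}, \frac{1}{p+m}, \frac{1}{p+m} - 1, 0\right)$ if $p + m \leq 1$.

(iii) $x > 0$, $y > 0$. Note $\lambda_2 = 0$, $\lambda_3 = 0$,

$$1 = \lambda_1, \qquad \frac{1}{1+y} = p \Rightarrow y = \frac{1}{p} - 1 > 0 \tag{27.23}$$
$$m - x - py = 0 \Rightarrow x = m - 1 + p > 0 \tag{27.24}$$

Hence for $1 > p > 1 - m$, the solution is $\left(m - 1 + p, \frac{1}{p} - 1, 1, 0, 0\right)$. Combining them, the solution $(x^*, y^*, \lambda_1^*, \lambda_2^*, \lambda_3^*)$ is

$$\begin{cases}
(m,\ 0,\ 1,\ 0,\ p - 1) & \text{if } p \geq 1, \\
\left(0,\ \frac{m}{p},\ \frac{1}{p+m},\ \frac{1}{p+m} - 1,\ 0\right) & \text{if } p \leq 1 - m, \text{ and} \\
\left(m - 1 + p,\ \frac{1}{p} - 1,\ 1,\ 0,\ 0\right) & \text{if } 1 - m < p < 1.
\end{cases}$$

The Kuhn-Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore solves both problems.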
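The piecewise solution of Problem 5 can be coded and each regime checked against equations (27.19) to (27.22). This is a sketch assuming $p > 0$ and $m > 0$ as in the problem; the function names and the three parameter pairs are illustrative.

```python
def solve(p, m):
    """Return (x, y, lam1, lam2, lam3) for max x + ln(1+y) s.t. x + p*y <= m, x, y >= 0."""
    if p >= 1:                                   # case (i): only good x is bought
        return (m, 0.0, 1.0, 0.0, p - 1)
    if p <= 1 - m:                               # case (ii): only good y is bought
        return (0.0, m / p, 1 / (p + m), 1 / (p + m) - 1, 0.0)
    return (m - 1 + p, 1 / p - 1, 1.0, 0.0, 0.0)  # case (iii): interior solution


def kt_residuals(p, m, sol):
    """Left-hand sides of the equalities in (27.19)-(27.22); each should be zero."""
    x, y, l1, l2, l3 = sol
    return (1 - l1 + l2,
            1 / (1 + y) - p * l1 + l3,
            l1 * (m - x - p * y),
            l2 * x,
            l3 * y)


if __name__ == "__main__":
    for p, m in [(2.0, 5.0), (0.2, 0.5), (0.5, 2.0)]:
        sol = solve(p, m)
        print(p, m, sol, kt_residuals(p, m, sol))
```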


6. Recall the two optimization problems are

$$\max f(x) \quad \text{subject to } g(x) \geq 0 \text{ and } x \in X \tag{27.25}$$

and the corresponding optimization problem:

$$\max f(x) \quad \text{subject to } x \in X \tag{27.26}$$

in which the constraint $g(x) \geq 0$ has been omitted.

(a) We claim that $\bar{x}$ is also a solution to problem (27.25). For, if this is not the case, then since $\bar{x}$ is in the constraint set $\{x \in X : g(x) \geq 0\}$ of problem (27.25), there is some $x' \in X$, with $g(x') \geq 0$, such that $f(x') > f(\bar{x})$. But, since $x' \in X$ and is therefore in the constraint set of problem (27.26), this means that $\bar{x}$ is not a solution to problem (27.26), a contradiction. This establishes our claim. [Note that we are not given the information that problem (27.25) has a solution, and so we do not make use of this information in the answer].

(b) Let $\hat{x}$ be any solution to problem (27.25). Note, for future reference, that since both $\bar{x}$ and $\hat{x}$ are in $X$, the constraint set of problem (27.26), and $\bar{x}$ solves problem (27.26), we have

$$f(\bar{x}) \geq f(\hat{x}) \tag{27.27}$$

We claim that $g(\hat{x}) = 0$. For if $g(\hat{x}) \neq 0$, we must have $g(\hat{x}) > 0$, since $\hat{x}$ is a solution to problem (27.25), and must therefore be in the constraint set $\{x \in X : g(x) \geq 0\}$ of problem (27.25).

Since $\bar{x}$ is not a solution to problem (27.25), and $\bar{x} \in X$, it must be the case that $g(\bar{x}) < 0$. For if $g(\bar{x}) \geq 0$, then, given (27.27), $\bar{x}$ would also solve problem (27.25).

Since $g(\bar{x}) < 0$ and $g(\hat{x}) > 0$, continuity of $g$ on the convex set $X$ [using the intermediate value theorem] implies that we can find $\lambda \in (0,1)$, such that:

$$g(\lambda \bar{x} + (1-\lambda)\hat{x}) = 0 \tag{27.28}$$

Denote $(\lambda \bar{x} + (1-\lambda)\hat{x})$ by $z$. Then $z \in X$ and $g(z) = 0$ by (27.28), so $z$ satisfies the constraints of problem (27.25).

Since $f$ is strictly quasi-concave on $X$, we can use $\bar{x} \neq \hat{x}$ [recall that $g(\bar{x}) < 0$ while $g(\hat{x}) > 0$], and $\lambda \in (0,1)$, to obtain:

$$f(z) = f(\lambda \bar{x} + (1-\lambda)\hat{x}) > \min\{f(\bar{x}), f(\hat{x})\} = f(\hat{x})$$

using (27.27). But this contradicts the fact that $\hat{x}$ solves (27.25), and establishes our claim.

7. Suppose that a consumer has the utility function $U(x,y) = x^a y^b$ and faces the budget constraint $p_x x + p_y y \leq I$.

(A) Utility Maximization

(a) What are the first order conditions for utility maximization?

Observe that the utility function makes sense only if $a > 0$ and $b > 0$. The Lagrangean for the optimization problem is

$$\mathcal{L}(x,y,\lambda) = U(x,y) + \lambda (I - p_x x - p_y y) = x^a y^b + \lambda (I - p_x x - p_y y)$$

The first order conditions are,

$$\frac{\partial \mathcal{L}}{\partial x} = a x^{a-1} y^b - \lambda p_x = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = b x^a y^{b-1} - \lambda p_y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I - p_x x - p_y y = 0$$

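(b) Solve for the consumer's demands for goods $x$ and $y$.

From the first two FOCs, we get

$$a x^{a-1} y^b = \lambda p_x, \qquad b x^a y^{b-1} = \lambda p_y$$

Dividing the first equation by the second, we get

$$\frac{a x^{a-1} y^b}{b x^a y^{b-1}} = \frac{\lambda p_x}{\lambda p_y} \Rightarrow \frac{ay}{bx} = \frac{p_x}{p_y} \Rightarrow p_y y = \frac{b}{a} p_x x.$$

We use this in the third FOC to get,

$$p_x x + p_y y = I$$
$$p_x x + \frac{b}{a} p_x x = I$$
$$\frac{a+b}{a} p_x x = I$$
$$p_x x^* = \frac{a}{a+b} I \Rightarrow x^* = \frac{a}{a+b} \frac{I}{p_x}$$

This gives

$$p_y y^* = \frac{b}{a+b} I \Rightarrow y^* = \frac{b}{a+b} \frac{I}{p_y}$$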

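(c) Solve for the value of $\lambda$. What is the economic interpretation of $\lambda$? When is $\lambda$ an increasing, decreasing or constant function of income?

We use the first FOC (with respect to $x$) to get,

$$a x^{a-1} y^b = \lambda p_x \Rightarrow \lambda^* = \frac{a x^{a-1} y^b}{p_x}$$

$$\lambda^* = \frac{a \left(\frac{a}{a+b} \frac{I}{p_x}\right)^{a-1} \left(\frac{b}{a+b} \frac{I}{p_y}\right)^{b}}{p_x} = \left(\frac{a}{p_x}\right)^a \left(\frac{b}{p_y}\right)^b \left(\frac{I}{a+b}\right)^{a+b-1} > 0$$

The Lagrange multiplier $\lambda^*$ is the marginal utility of income, as we can see below.

$$\mathcal{L}^*(x^*, y^*, \lambda^*) = U(x^*, y^*) + \lambda^* (I - p_x x^* - p_y y^*) = (x^*)^a (y^*)^b + \lambda^* \cdot (0)$$

Suppose the income increased by a dollar. Then the utility goes up by approximately $\lambda^*$. Lastly, $\lambda^*$ is increasing with income if and only if $a + b > 1$.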

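(d) Show that the second order conditions hold.

Observe that the second order partial derivatives are,

$$\frac{\partial^2 \mathcal{L}}{\partial x^2} = a(a-1) x^{a-2} y^b, \quad \frac{\partial^2 \mathcal{L}}{\partial x \partial y} = ab\, x^{a-1} y^{b-1}, \quad \frac{\partial^2 \mathcal{L}}{\partial y^2} = b(b-1) x^a y^{b-2}, \quad \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} = -p_x, \quad \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} = -p_y$$

Using these, we get the bordered Hessian matrix as under:

$$H = \begin{bmatrix} \frac{\partial^2 \mathcal{L}}{\partial x^2} & \frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} \\ \frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial y^2} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} \\ \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial \lambda^2} \end{bmatrix} = \begin{bmatrix} a(a-1) x^{a-2} y^b & ab\, x^{a-1} y^{b-1} & -p_x \\ ab\, x^{a-1} y^{b-1} & b(b-1) x^a y^{b-2} & -p_y \\ -p_x & -p_y & 0 \end{bmatrix}.$$

The border preserving leading principal minor of order 2 is the Hessian matrix itself. For the second order condition to be satisfied, the determinant of the Hessian needs to be positive.

$$\begin{aligned}
\det H &= (-p_x)\left[(-p_y)\, ab\, x^{a-1} y^{b-1} - (-p_x)\, b(b-1) x^a y^{b-2}\right] - (-p_y)\left[(-p_y)\, a(a-1) x^{a-2} y^b - (-p_x)\, ab\, x^{a-1} y^{b-1}\right] \\
&= p_x \left[p_y\, ab\, x^{a-1} y^{b-1} - p_x\, b(b-1) x^a y^{b-2}\right] - p_y \left[p_y\, a(a-1) x^{a-2} y^b - p_x\, ab\, x^{a-1} y^{b-1}\right] \\
&= 2 p_x p_y\, ab\, x^{a-1} y^{b-1} - p_x^2\, b(b-1) x^a y^{b-2} - p_y^2\, a(a-1) x^{a-2} y^b \\
&= (x^*)^a (y^*)^b \left[\frac{2ab\, p_x p_y}{xy} - \frac{b(b-1) p_x^2}{y^2} - \frac{a(a-1) p_y^2}{x^2}\right] \\
&= (x^*)^a (y^*)^b \left[\frac{2ab\, p_x p_y}{\frac{aI}{(a+b) p_x} \frac{bI}{(a+b) p_y}} - \frac{b(b-1) p_x^2}{\left(\frac{bI}{(a+b) p_y}\right)^2} - \frac{a(a-1) p_y^2}{\left(\frac{aI}{(a+b) p_x}\right)^2}\right] \\
&= (x^*)^a (y^*)^b \left(2\left[\frac{(a+b) p_x p_y}{I}\right]^2 - (a+b)^2\, \frac{b-1}{b} \left[\frac{p_x p_y}{I}\right]^2 - (a+b)^2\, \frac{a-1}{a} \left[\frac{p_x p_y}{I}\right]^2\right) \\
&= (x^*)^a (y^*)^b \left[\frac{(a+b) p_x p_y}{I}\right]^2 \left(2 - \frac{b-1}{b} - \frac{a-1}{a}\right) \\
&= (x^*)^a (y^*)^b \left[\frac{(a+b) p_x p_y}{I}\right]^2 \left(2 - 1 + \frac{1}{b} - 1 + \frac{1}{a}\right) \\
&= (x^*)^a (y^*)^b \left[\frac{(a+b) p_x p_y}{I}\right]^2 \left(\frac{1}{b} + \frac{1}{a}\right) > 0
\end{aligned}$$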

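(e) Show that the implicit function theorem value of $\frac{dx}{dI}$ is identical to the value of taking the partial derivative of $x^*$ with respect to $I$.

Using $x^*$, we get

$$\frac{\partial x^*}{\partial I} = \frac{a}{a+b} \frac{1}{p_x}$$

Using the implicit function theorem,

$$\begin{aligned}
\frac{dx^*}{dI} &= \frac{\det \begin{bmatrix} 0 & ab\, x^{a-1} y^{b-1} & -p_x \\ 0 & b(b-1) x^a y^{b-2} & -p_y \\ -1 & -p_y & 0 \end{bmatrix}}{\det H} \\
&= \frac{-1\left[(-p_y)\, ab\, x^{a-1} y^{b-1} - (-p_x)\, b(b-1) x^a y^{b-2}\right]}{\det H} \\
&= \frac{p_y\, ab\, x^{a-1} y^{b-1} - p_x\, b(b-1) x^a y^{b-2}}{\det H} \\
&= \frac{b\, x^{a-1} y^{b-2} \left[a p_y y - p_x (b-1) x\right]}{\det H} \\
&= \frac{b\, x^{a-1} y^{b-2}\, x p_x}{\det H} = \frac{b\, x^a y^{b-2} p_x}{\det H} \\
&= \frac{b\, x^a y^{b-2} p_x}{(x^*)^a (y^*)^b \left[\frac{(a+b) p_x p_y}{I}\right]^2 \left(\frac{1}{b} + \frac{1}{a}\right)} \\
&= \frac{b\, p_x}{(y^*)^2 \left[\frac{(a+b) p_x p_y}{I}\right]^2 \left(\frac{a+b}{ab}\right)} \\
&= \frac{p_x}{\left[\frac{y^* (a+b) p_y}{bI}\right]^2 \left(\frac{a+b}{a}\right) p_x^2} \\
&= \frac{1}{\left(\frac{a+b}{a}\right) p_x} = \frac{a}{a+b} \frac{1}{p_x}.
\end{aligned}$$

Here the fifth line uses $a p_y y = b p_x x$ from the FOCs, and the last line uses $\frac{y^* (a+b) p_y}{bI} = 1$.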

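Thus the two expressions are identical.

(f) A consumer's indirect utility function is defined to be utility as a function of prices and income. Use $x^*$ and $y^*$ to solve for the indirect utility function. Is it true that the partial of the indirect utility function with respect to income equals $\lambda$?

The indirect utility function is

$$u^* = u(x^*, y^*) = (x^*)^a (y^*)^b = \left(\frac{aI}{(a+b) p_x}\right)^a \left(\frac{bI}{(a+b) p_y}\right)^b = \left(\frac{a}{(a+b) p_x}\right)^a \left(\frac{b}{(a+b) p_y}\right)^b I^{a+b}$$

Then,

$$\frac{\partial u^*}{\partial I} = (a+b) \left(\frac{a}{(a+b) p_x}\right)^a \left(\frac{b}{(a+b) p_y}\right)^b I^{a+b-1} = \left(\frac{a}{p_x}\right)^a \left(\frac{b}{p_y}\right)^b \left(\frac{I}{a+b}\right)^{a+b-1} = \lambda^*$$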
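The claim that $\partial u^* / \partial I = \lambda^*$ can be checked numerically from the closed forms derived in parts (b), (c) and (f). The parameter values and the finite-difference step below are illustrative assumptions.

```python
def demand(a, b, px, py, income):
    """Marshallian demands x* and y* from part (b)."""
    return a * income / ((a + b) * px), b * income / ((a + b) * py)


def indirect_utility(a, b, px, py, income):
    """Utility evaluated at the optimal bundle, as in part (f)."""
    x, y = demand(a, b, px, py, income)
    return x ** a * y ** b


def multiplier(a, b, px, py, income):
    """Closed form of lambda* from part (c)."""
    return (a / px) ** a * (b / py) ** b * (income / (a + b)) ** (a + b - 1)


def marginal_utility_of_income(a, b, px, py, income, h=1e-5):
    """Central finite-difference derivative of indirect utility with respect to income."""
    up = indirect_utility(a, b, px, py, income + h)
    down = indirect_utility(a, b, px, py, income - h)
    return (up - down) / (2 * h)


if __name__ == "__main__":
    args = (0.3, 0.5, 2.0, 3.0, 10.0)
    print(multiplier(*args), marginal_utility_of_income(*args))
```

The two printed numbers should agree up to finite-difference error.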


(B) Expenditure Minimization:

Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, $p_x x + p_y y$, subject to reaching a given level of utility, $u_0$ (the constraint is therefore $u_0 - x^a y^b = 0$).

(a) What are the first order conditions for expenditure minimization?

First, we write down the minimization problem as

$$\min\ p_x x + p_y y \quad \text{subject to } x^a y^b \geq u_0,$$

which can be converted into a maximization exercise as under:

$$\max\ -p_x x - p_y y \quad \text{subject to } x^a y^b \geq u_0.$$

The Lagrangean for the maximization problem is

$$\mathcal{L}(x,y,\lambda) = -p_x x - p_y y + \lambda (x^a y^b - u_0)$$

The first order conditions are,

$$\frac{\partial \mathcal{L}}{\partial x} = -p_x + \lambda a x^{a-1} y^b = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = -p_y + \lambda b x^a y^{b-1} = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = x^a y^b - u_0 = 0$$

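(b) Use the first order conditions to solve for $x^*$ and $y^*$ (these are called the Hicksian or compensated demand functions).

From the first two FOCs, we get

$$\lambda a x^{a-1} y^b = p_x, \qquad \lambda b x^a y^{b-1} = p_y$$

Dividing the first equation by the second, we get

$$\frac{\lambda a x^{a-1} y^b}{\lambda b x^a y^{b-1}} = \frac{p_x}{p_y} \Rightarrow \frac{ay}{bx} = \frac{p_x}{p_y} \Rightarrow y = \frac{b}{a} \frac{p_x}{p_y} x.$$

We use this in the third FOC to get,

$$x^a y^b = u_0; \quad x^a \left(\frac{b}{a} \frac{p_x}{p_y} x\right)^b = u_0; \quad x^{a+b} = \frac{u_0}{\left(\frac{b}{a} \frac{p_x}{p_y}\right)^b}; \quad x^* = \left(\frac{a}{b} \frac{p_y}{p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$$

$$y^* = \frac{b}{a} \frac{p_x}{p_y} x^* = \frac{b}{a} \frac{p_x}{p_y} \left(\frac{a}{b} \frac{p_y}{p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} = \left(\frac{b}{a} \frac{p_x}{p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}}$$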

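(c) Check the second order conditions.

It is easy to see that the bordered Hessian is the same as in the case of the utility maximization exercise. Hence we conclude that the SOC holds in this case.

(d) Write the level of income, $I$, necessary to reach $u_0$ as a function of $u_0$, prices, and parameters. How does this expenditure function relate to the indirect utility function?

$$\begin{aligned}
e(p_x, p_y, u_0) = p_x x^* + p_y y^* &= p_x \left(\frac{a}{b} \frac{p_y}{p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} + p_y \left(\frac{b}{a} \frac{p_x}{p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}} \\
&= \left(p_x^a p_y^b u_0\right)^{\frac{1}{a+b}} \left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right] \\
&= \left(p_x^a p_y^b\right)^{\frac{1}{a+b}} \left[\left(\frac{a}{(a+b) p_x}\right)^a \left(\frac{b}{(a+b) p_y}\right)^b I^{a+b}\right]^{\frac{1}{a+b}} \left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right] \\
&= \left[\left(\frac{a}{a+b}\right)^a \left(\frac{b}{a+b}\right)^b\right]^{\frac{1}{a+b}} \left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right] I = I.
\end{aligned}$$

In the third line $u_0$ is set equal to the indirect utility $u^*$ from part (A)(f). This shows that the minimum expenditure required to attain utility equal to the indirect utility function is the same as the income $I$. Thus the two approaches are equivalent.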

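(e) To avoid confusion, let us call the solution for good $x$ in utility maximization $x^*$ and the solution for good $x$ in expenditure minimization $h^*$. Prove that

$$\frac{\partial x^*}{\partial p_x} = \frac{\partial h^*}{\partial p_x} - x^* \frac{\partial x^*}{\partial I}.$$

Interpret this answer.

Observe that we can rewrite $h^*$ as $h^* = \theta\, (p_x)^{-\frac{b}{a+b}}$ where $\theta \equiv \left(\frac{a}{b} p_y\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$. This gives us

$$\frac{\partial h^*}{\partial p_x} = \theta \left(-\frac{b}{a+b}\right) (p_x)^{-\frac{b}{a+b} - 1} = \left(-\frac{b}{a+b}\right) \frac{h^*}{p_x}.$$

Also from the utility maximization, we get,

$$\frac{\partial x^*}{\partial p_x} = \left(-\frac{aI}{a+b}\right) (p_x)^{-2} = -\frac{x^*}{p_x}$$

and

$$x^* \frac{\partial x^*}{\partial I} = x^* \left(\frac{a}{a+b}\right) (p_x)^{-1}.$$

Therefore, using $h^* = x^*$ at the optimum,

$$\frac{\partial x^*}{\partial p_x} + x^* \frac{\partial x^*}{\partial I} = -\frac{x^*}{p_x} + \left(\frac{a}{a+b}\right) \left(\frac{x^*}{p_x}\right) = -\frac{b}{a+b} \left(\frac{x^*}{p_x}\right) = \frac{\partial h^*}{\partial p_x}.$$

The change in $x^*$ due to a change in its own price $p_x$ (total effect) is the sum of the substitution effect $\left(\frac{\partial h^*}{\partial p_x}\right)$ and the income effect $\left(-x^* \frac{\partial x^*}{\partial I}\right)$.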
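The decomposition proved in part (e) can be verified numerically with the Marshallian and Hicksian demands derived above, evaluating the Hicksian demand at $u_0$ equal to the indirect utility. Parameter values and the finite-difference step are illustrative assumptions.

```python
def marshallian_x(a, b, px, income):
    """x* from utility maximization."""
    return a * income / ((a + b) * px)


def hicksian_x(a, b, px, py, u0):
    """h* from expenditure minimization."""
    return ((a / b) * (py / px)) ** (b / (a + b)) * u0 ** (1 / (a + b))


def indirect_utility(a, b, px, py, income):
    x = a * income / ((a + b) * px)
    y = b * income / ((a + b) * py)
    return x ** a * y ** b


def slutsky_gap(a, b, px, py, income, h=1e-6):
    """Return dx*/dpx - (dh*/dpx - x* dx*/dI); part (e) says this is zero."""
    u0 = indirect_utility(a, b, px, py, income)
    x = marshallian_x(a, b, px, income)
    dx_dpx = (marshallian_x(a, b, px + h, income) - marshallian_x(a, b, px - h, income)) / (2 * h)
    dx_dI = (marshallian_x(a, b, px, income + h) - marshallian_x(a, b, px, income - h)) / (2 * h)
    dh_dpx = (hicksian_x(a, b, px + h, py, u0) - hicksian_x(a, b, px - h, py, u0)) / (2 * h)
    return dx_dpx - (dh_dpx - x * dx_dI)


if __name__ == "__main__":
    print(slutsky_gap(0.3, 0.5, 2.0, 3.0, 10.0))
```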

8. Suppose a consumer has the utility function U = a ln(x − x0) + b ln(y − y0), where a, b, x0 and y0 are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumer’s demand for good x.

Observe that the utility maximization exercise makes sense only if the consumption bundle (x0, y0) is feasible. Let us denote x − x0 by x′ and y − y0 by y′. Then the utility function can be written as U(x′, y′) = a ln(x′) + b ln(y′). The budget constraint

\[
p_x x + p_y y = I
\]

can be written as

\[
p_x x' + p_y y' = I - p_x x_0 - p_y y_0 = I'.
\]

The utility maximization exercise can therefore be formulated as

\[
\max_{x',\,y'} \; a \ln(x') + b \ln(y') \quad \text{subject to} \quad p_x x' + p_y y' = I'.
\]

The Lagrangean for the optimization problem is

\[
\mathcal{L}(x', y', \lambda) = a \ln(x') + b \ln(y') + \lambda \left(I' - p_x x' - p_y y'\right).
\]


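The first order conditions are

\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial x'} &= \frac{a}{x'} - \lambda p_x = 0, \\
\frac{\partial \mathcal{L}}{\partial y'} &= \frac{b}{y'} - \lambda p_y = 0, \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= I' - p_x x' - p_y y' = 0.
\end{aligned}
\]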

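From the first two FOCs, we get

\[
\frac{a}{x'} = \lambda p_x, \quad \frac{b}{y'} = \lambda p_y
\quad \Longrightarrow \quad
\frac{a y'}{b x'} = \frac{p_x}{p_y}, \quad p_y y' = \frac{b}{a}\, p_x x'.
\]

Substituting into the budget constraint,

\[
\begin{aligned}
p_x x' + p_y y' &= I' \\
p_x x' + \frac{b}{a}\, p_x x' &= I' \\
\frac{a+b}{a}\, p_x x' &= I' \\
p_x x' = \frac{a}{a+b}\, I' \quad &\Longrightarrow \quad x' = \frac{a}{a+b}\, \frac{I'}{p_x}.
\end{aligned}
\]

This gives

\[
p_y y' = \frac{b}{a+b}\, I' \quad \Longrightarrow \quad y' = \frac{b}{a+b}\, \frac{I'}{p_y}.
\]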

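We need to show that the second order conditions hold for the solution to yield a maximum.

Observe that the second order partial derivatives are as follows (in order to save on notation, we omit the primes on x and y in the derivative symbols where the context is clear):

\[
\frac{\partial^2 \mathcal{L}}{\partial x^2} = -\frac{a}{(x')^2}; \quad
\frac{\partial^2 \mathcal{L}}{\partial x \partial y} = 0; \quad
\frac{\partial^2 \mathcal{L}}{\partial y^2} = -\frac{b}{(y')^2}; \quad
\frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} = -p_x; \quad
\frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} = -p_y.
\]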


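Using these, we get the bordered Hessian matrix as under:

\[
H =
\begin{pmatrix}
\frac{\partial^2 \mathcal{L}}{\partial x^2} & \frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} \\
\frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial y^2} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} \\
\frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial \lambda^2}
\end{pmatrix}
=
\begin{pmatrix}
-\frac{a}{(x')^2} & 0 & -p_x \\
0 & -\frac{b}{(y')^2} & -p_y \\
-p_x & -p_y & 0
\end{pmatrix}.
\]

The border-preserving leading principal minor of order 2 is the determinant of the bordered Hessian itself. For the second order condition to be satisfied, this determinant needs to be positive.

\[
\det H = (-p_x)\left[-(-p_x)\left(-\frac{b}{(y')^2}\right)\right] - (-p_y)\left[(-p_y)\left(-\frac{a}{(x')^2}\right)\right]
= \frac{b\, p_x^2}{(y')^2} + \frac{a\, p_y^2}{(x')^2} > 0.
\]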

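Thus the SOC holds and we have a maximum. The optimum consumption bundle is

\[
\begin{aligned}
x^* = x' + x_0 &= \frac{a}{a+b}\, \frac{I - p_x x_0 - p_y y_0}{p_x} + x_0 \\
&= \frac{a}{a+b}\, \frac{I - p_y y_0}{p_x} + \frac{b}{a+b}\, x_0, \\
y^* &= \frac{b}{a+b}\, \frac{I - p_x x_0}{p_y} + \frac{a}{a+b}\, y_0.
\end{aligned}
\]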
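The closed-form bundle can be checked against the budget constraint and the tangency condition. The sketch below is illustrative only: the function name `stone_geary_demand` and the parameter values are assumptions made for the example.

```python
def stone_geary_demand(a, b, x0, y0, px, py, income):
    """Demands for U = a*ln(x - x0) + b*ln(y - y0) with px*x + py*y = income."""
    discretionary = income - px * x0 - py * y0  # I' in the text
    if discretionary <= 0:
        raise ValueError("the bundle (x0, y0) must be strictly affordable")
    x = x0 + a / (a + b) * discretionary / px
    y = y0 + b / (a + b) * discretionary / py
    return x, y


# Illustrative parameter values (assumed, not taken from the notes).
a, b, x0, y0, px, py, income = 1.0, 1.0, 2.0, 3.0, 2.0, 4.0, 100.0
x, y = stone_geary_demand(a, b, x0, y0, px, py, income)

budget_gap = px * x + py * y - income      # zero if the budget is exhausted
mrs = (a / (x - x0)) / (b / (y - y0))      # equals px / py at the optimum
print(x, y, abs(budget_gap) < 1e-9, abs(mrs - px / py) < 1e-9)
```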

(b) Find the elasticities of demand for good x with respect to income and prices.

It is easy to compute the price and income elasticities using the definitions. Please let me know if you have any questions on this.
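One way to carry out this computation is sketched below. The elasticity expressions in the code are obtained by differentiating the demand for x found in part (a) and multiplying by the relevant price or income over x∗. They are a worked suggestion rather than the author's stated solution, and the function names and numbers are illustrative.

```python
def demand_x(a, b, x0, y0, px, py, income):
    """Demand for x from part (a)."""
    return a / (a + b) * (income - py * y0) / px + b / (a + b) * x0


def elasticities_x(a, b, x0, y0, px, py, income):
    """Income, own-price and cross-price elasticities of the demand for x."""
    x = demand_x(a, b, x0, y0, px, py, income)
    share = a / (a + b)
    income_el = share * income / (px * x)
    own_price_el = -share * (income - py * y0) / (px * x)
    cross_price_el = -share * py * y0 / (px * x)
    return income_el, own_price_el, cross_price_el


# Illustrative parameter values (assumed, not taken from the notes).
params = dict(a=1.0, b=1.0, x0=2.0, y0=3.0, px=2.0, py=4.0, income=100.0)
income_el, own_price_el, cross_price_el = elasticities_x(**params)

# Demand is homogeneous of degree zero in prices and income,
# so the three elasticities should sum to zero.
elasticity_sum = income_el + own_price_el + cross_price_el
print(abs(elasticity_sum) < 1e-12)
```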

(c) Show that the utility function $V = 45 (x - x_0)^{3.5a} (y - y_0)^{3.5b}$ would have yielded the same demand for good x.

If we take a positive monotone transformation of the given utility function by taking its natural log, we get an increasing linear function of the utility function in (a):

\[
\begin{aligned}
\ln V &= \ln 45 + 3.5a \ln(x - x_0) + 3.5b \ln(y - y_0) \\
&= \ln 45 + 3.5\, U.
\end{aligned}
\]

This implies that the consumption bundle (x∗, y∗) also maximizes the utility function V.
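A crude grid search along the budget line illustrates the point: U and V rank the affordable bundles identically, so they pick the same maximizer. The grid, the parameter values, and the helper names are assumptions made for this sketch.

```python
import math

# Illustrative parameter values (assumed, not taken from the notes).
a, b, x0, y0, px, py, income = 1.0, 1.0, 2.0, 3.0, 2.0, 4.0, 100.0


def u(x, y):
    return a * math.log(x - x0) + b * math.log(y - y0)


def v(x, y):
    return 45 * (x - x0) ** (3.5 * a) * (y - y0) ** (3.5 * b)


def y_on_budget_line(x):
    return (income - px * x) / py


# Candidate x values that keep both x > x0 and y > y0 on the budget line.
grid = [x0 + 0.5 * k for k in range(1, 84)]
best_under_u = max(grid, key=lambda x: u(x, y_on_budget_line(x)))
best_under_v = max(grid, key=lambda x: v(x, y_on_budget_line(x)))
print(best_under_u == best_under_v)
```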

Chapter 28

Elementary Concepts in Probability

Probability theory deals with random events, events whose occurrence cannot be predicted with certainty. There are at least three sources of randomness. First, many features of our world are stochastic by nature. The evolution of such a diverse variety of life is witness to unpredictability in the universe and environment. Second, many events are the result of a very large number of actions and decisions. Third, some variables may appear random because they are measured with error.

Even though we are not sure about the outcomes of a random event, we can attach to each outcome a number called its probability.

28.1 Discrete Probability Model

We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible outcomes of a random event.

Example 28.1. The set of possible outcomes of flipping a fair coin is

\[
\Omega = \{H, T\}.
\]

The set of outcomes for flipping two coins is

\[
\Omega = \{HT, TH, TT, HH\}.
\]

It is easy to list the set of outcomes for flipping n coins, but very soon the list becomes too long.

Next we form the set $\mathcal{F}$ that contains all elements of the set Ω as well as their unions and complements. Thus if A and B are in $\mathcal{F}$, so are A ∪ B, A^c, and B^c. The set $\mathcal{F}$, which is closed under the operations of union and complement, is called an algebra.


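Example 28.2. The algebra for the outcomes of flipping a fair coin is

\[
\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}.
\]

The algebra for the outcomes of flipping two coins is

\[
\mathcal{F} = \{\emptyset, \Omega, \{TT\}, \{HH\}, \{HT, TH\}, \{HH, TT\}, \{HH, HT, TH\}, \{TT, HT, TH\}\}.
\]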
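Closure under union and complement can be made concrete by generating an algebra from a few starting events. The sketch below is one simple way to do this for small finite sample spaces. The function name `generate_algebra` and the choice of generating events ({H} for one coin, {TT} and {HH} for two coins) are assumptions made for illustration.

```python
from itertools import combinations


def generate_algebra(omega, generators):
    """Close a family of events under complement and pairwise union."""
    omega = frozenset(omega)
    events = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new_events = {omega - e for e in events}
        new_events |= {e1 | e2 for e1, e2 in combinations(events, 2)}
        if new_events <= events:  # nothing new: the family is closed
            return events
        events |= new_events


one_coin = generate_algebra({"H", "T"}, [{"H"}])
two_coins = generate_algebra({"HH", "HT", "TH", "TT"}, [{"TT"}, {"HH"}])
print(len(one_coin), len(two_coins))  # prints: 4 8
```

Starting instead from all four singletons of the two-coin sample space would produce the full power set with 16 events.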

We can now define a probability measure by assigning a probability P to each element of the sample space Ω.

Definition 28.1. The set function P is called a probability measure if

(i) P(∅) = 0;

(ii) P(Ω) = 1;

(iii) P(A ∪ B) = P(A) + P(B) for all A, B ∈ $\mathcal{F}$ with A ∩ B = ∅.

The three conditions listed above are the axioms of probability theory.

Example 28.3. For the outcomes of flipping two fair coins,

\[
P(HH) = P(HT) = P(TT) = P(TH) = 0.25.
\]

The triple $(\Omega, \mathcal{F}, P)$ of the set of outcomes, the algebra, and the probability measure is referred to as a probability model.

In the next step, we assign probabilities to the random events. Three sources for attaching probabilities to the outcomes of random events are (a) equally likely events, (b) long-run frequencies, and (c) degree of confidence (the subjective or Bayesian approach). Observe that whichever way we assign probabilities to different events, the mathematical theory for dealing with random events and their probabilities remains the same.

We define a random variable next. The rule that assigns a real number to each outcome is called a random variable. More formally,

Definition 28.2. A random variable is a set function that maps the set of outcomes of a random event to the set of real numbers.

Such a function is not unique, and depending on the purpose at hand, we may define one or many random variables on the same random event.


Example 28.4. For the outcomes of flipping two fair coins, let us define a random variable X as the number of heads. Then, we have

\[
X(HH) = 2; \quad X(HT) = X(TH) = 1; \quad X(TT) = 0.
\]

We could have defined the random variable X as the number of tails. Then, we have

\[
X(HH) = 0; \quad X(HT) = X(TH) = 1; \quad X(TT) = 2.
\]

In collecting labor statistics, we are interested in the characteristics of the respondents. For example, we may ask if a person is in the labor force or not, employed or unemployed. We could also be interested to learn the demographic characteristics of the respondents like gender, race, age etc. For each of these answers we can define one or more binary variables. For example, let X = 1 if a respondent who is in the labor force is unemployed and X = 0 if employed. We can define Y = 1 if the respondent is a woman and employed, Y = 0 otherwise.

A random variable together with its probabilities is called a probability distribution. Let us consider three flips of a coin.

Example 28.5. For the outcomes of flipping three fair coins, let us define a random variable X as the number of heads. Then, the probability distribution is

P(X = 0) = 0.125;P(X = 1) = 0.375;P(X = 2) = 0.375;P(X = 3) = 0.125.
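The distribution in Example 28.5 can be reproduced by listing the eight equally likely outcomes and counting heads. This is a small illustrative sketch; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of three flips, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome.
heads = Counter(outcome.count("H") for outcome in outcomes)

# P(X = k) = (number of outcomes with k heads) / (number of outcomes).
distribution = {k: Fraction(n, len(outcomes)) for k, n in sorted(heads.items())}
for k, p in distribution.items():
    print(k, float(p))
```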

Probability distributions become unwieldy if the number of outcomes is large or infinite. One way to summarize the information about a probability distribution is through its moments, such as the mean, which measures the central tendency, and the variance, which measures the dispersion or variability of the distribution. Higher moments reflect the skewness of the distribution to the left or to the right and the kurtosis, which is an indicator of the bundling of the outcomes near the mean: the more values are concentrated near the mean, the taller is the peak of the distribution.

The first moment of the distribution around zero, which is the expected value or the mean of the distribution, is defined as

\[
E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i).
\]

Example 28.6. For the distribution of the number of heads in three flips of a coin, we have

\[
\mu = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + 3 \cdot P(X = 3),
\]

which yields the mean as

\[
\mu = 0 + 0.375 + 0.750 + 0.375 = 1.50.
\]


In a similar manner, we may define the rth moment of a distribution around zero as

\[
E(X^r) = m_r = \sum_{i=1}^{n} x_i^r P(x_i).
\]

Example 28.7. For the distribution of the number of heads in three flips of a coin, the second moment is

\[
E(X^2) = 0^2 \cdot P(X = 0) + 1^2 \cdot P(X = 1) + 2^2 \cdot P(X = 2) + 3^2 \cdot P(X = 3),
\]

which yields the second moment as

\[
E(X^2) = 0 + 0.375 + 1.50 + 1.125 = 3.
\]

Another measure (which is of great importance) is the variance, or the second moment around the mean:

\[
E(X - \mu)^2 = \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i).
\]

The formula for the variance can be rewritten using the binomial expansion as

\[
\begin{aligned}
E(X - \mu)^2 &= \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i) \\
&= \sum_{i=1}^{n} x_i^2 P(x_i) - 2\mu \sum_{i=1}^{n} x_i P(x_i) + \mu^2 \\
&= \sum_{i=1}^{n} x_i^2 P(x_i) - \mu^2.
\end{aligned}
\]

Example 28.8. For the distribution of the number of heads in three flips of a coin, the variance is

\[
\sigma^2 = E(X^2) - \mu^2 = 3 - 1.5^2 = 0.75.
\]
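The moment calculations in Examples 28.6 to 28.8 can be reproduced in a few lines. The helper name `raw_moment` is an assumption made for this sketch.

```python
def raw_moment(distribution, r):
    """r-th moment around zero of a discrete distribution {value: probability}."""
    return sum(x**r * p for x, p in distribution.items())


# Number of heads in three flips of a fair coin (Example 28.5).
heads = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mean = raw_moment(heads, 1)
second_moment = raw_moment(heads, 2)

# Variance computed two ways: the shortcut formula and the definition.
variance_shortcut = second_moment - mean**2
variance_direct = sum((x - mean) ** 2 * p for x, p in heads.items())
print(mean, second_moment, variance_shortcut, variance_direct)
```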

The mean is a measure of the central tendency of a distribution showing its center of gravity, whereas the variance and its square root, called the standard deviation, measure the dispersion or the volatility of the distribution. The advantage of using the standard deviation is that it measures the dispersion in the same measurement units as the original variable. In finance, the variance of the returns of an asset is used as a measure of risk.


28.2 Marginal and Conditional Distribution

As we have observed before, a random event may give rise to a number of random variables, each defined by a different set function whose domains are the same set. In the table below we present such a situation where random variables X and Y and their probabilities are reported. Think of Y as the annual income in $1000 of a profession and X as gender, with X = 0 denoting men and X = 1 denoting women. The information contained in the table is the probability of joint events, i.e., the probability of X and Y each taking a particular value. For instance, the probability of X = 1 and Y = 120 is 0.11, which is denoted as

P(X = 1,Y = 120) = 0.11.

Such a probability is referred to as a joint probability because it is the probability that a person in the profession is a woman and earns $120,000 a year.

X    Y      P
0    60     0.02
0    70     0.04
0    80     0.07
0    90     0.09
0    100    0.10
0    110    0.06
0    120    0.03
0    130    0.02
0    140    0.01
0    150    0.01
1    70     0.01
1    80     0.02
1    90     0.04
1    100    0.08
1    110    0.11
1    120    0.11
1    130    0.09
1    140    0.05
1    150    0.03
1    160    0.01

If we are interested only in X, then we can sum over all relevant values of Y and get the marginal probability of X. For example,

\[
P(X = 1) = P(X = 1, Y = 70) + \cdots + P(X = 1, Y = 160) = 0.01 + \cdots + 0.01 = 0.55.
\]

In general we can write

\[
P(X = x_k) = \sum_{j=1}^{n} P(X = x_k, Y = y_j).
\]

In a similar manner, we can calculate the probability of X = 0, which is 0.45. Thus the marginal distribution of X is

X    P(X)
0    0.45
1    0.55

A similar procedure yields the marginal probability of Y . For example,

P(Y = 90) = P(Y = 90,X = 0)+P(Y = 90,X = 1) = 0.09+0.04 = 0.13.

Observe that in this example, the marginal distribution of X shows the distribution of men and women in that profession (45% men and 55% women), whereas the marginal distribution of Y would show the distribution of income for both men and women, i.e., for the profession as a whole.

Sometimes we may be interested to know the probability of Y = 110 when we already know that X = 1. Thus we want to know the conditional probability of Y = 110, given that X = 1.

\[
P(Y = 110 \mid X = 1) = \frac{P(Y = 110, X = 1)}{P(X = 1)} = \frac{0.11}{0.55} = 0.20.
\]

In general,

\[
P(Y = y_j \mid X = x_k) = \frac{P(Y = y_j, X = x_k)}{P(X = x_k)}.
\]

We have computed the conditional distributions of Y | X = 0 and Y | X = 1.

Y      P(Y | X=0)      Y      P(Y | X=1)
60     0.044           70     0.018
70     0.089           80     0.036
80     0.156           90     0.073
90     0.200           100    0.145
100    0.222           110    0.200
110    0.133           120    0.200
120    0.067           130    0.164
130    0.044           140    0.091
140    0.022           150    0.055
150    0.022           160    0.018
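The marginal and conditional probabilities above can be recomputed directly from the joint table. The sketch below stores the table as a nested dictionary. This layout and the function names are assumptions chosen for illustration.

```python
# Joint probabilities P(X = x, Y = y) from the table: X = 0 (men), X = 1 (women).
men = dict(zip(range(60, 160, 10),
               [0.02, 0.04, 0.07, 0.09, 0.10, 0.06, 0.03, 0.02, 0.01, 0.01]))
women = dict(zip(range(70, 170, 10),
                 [0.01, 0.02, 0.04, 0.08, 0.11, 0.11, 0.09, 0.05, 0.03, 0.01]))
joint = {0: men, 1: women}


def marginal_x(joint, x):
    """P(X = x): sum the joint probabilities over all values of Y."""
    return sum(joint[x].values())


def marginal_y(joint, y):
    """P(Y = y): sum the joint probabilities over all values of X."""
    return sum(row.get(y, 0.0) for row in joint.values())


def conditional_y_given_x(joint, y, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)."""
    return joint[x].get(y, 0.0) / marginal_x(joint, x)


print(round(marginal_x(joint, 1), 2))
print(round(marginal_y(joint, 90), 2))
print(round(conditional_y_given_x(joint, 110, 1), 2))
```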


A conditional distribution has a mean, variance and other moments. The mean is

\[
E(Y \mid X = x_k) = \sum_{j=1}^{n} y_j P(y_j \mid X = x_k).
\]

The variance and other higher moments of the conditional distribution can be computed similarly.

The conditional means of the conditional distributions given above are

\[
E(Y \mid X = 0) \approx 96.4; \quad E(Y \mid X = 1) \approx 116.4.
\]

28.3 The Law of Iterated Expectation

The law of iterated expectation relates the conditional mean and the unconditional mean. In general,

\[
E(Y) = E_X E(Y \mid X) = \sum_{j=1}^{n} E(Y \mid X = x_j) P(X = x_j).
\]

For the example above,

\[
\begin{aligned}
E(Y) &= E(Y \mid X = 0) P(X = 0) + E(Y \mid X = 1) P(X = 1) \\
&\approx 96.4 \times 0.45 + 116.4 \times 0.55 \approx 107.4.
\end{aligned}
\]

It is easy to infer that if E(Y | X = x_j) = 0 for all values of x_j, i.e., the conditional expectation of Y equals zero, then the unconditional expectation E(Y) = E_X E(Y | X) = 0. However, the converse is not true: E(Y) = 0 does not imply that E(Y | X = x) = 0 for all values of x.
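The law of iterated expectation can be checked on the joint income table by computing E(Y) in two ways: directly from the joint probabilities, and as the probability-weighted average of the conditional means. The data layout and names below are assumptions made for this sketch.

```python
# Joint probabilities P(X = x, Y = y) from the table: X = 0 (men), X = 1 (women).
men = dict(zip(range(60, 160, 10),
               [0.02, 0.04, 0.07, 0.09, 0.10, 0.06, 0.03, 0.02, 0.01, 0.01]))
women = dict(zip(range(70, 170, 10),
                 [0.01, 0.02, 0.04, 0.08, 0.11, 0.11, 0.09, 0.05, 0.03, 0.01]))
joint = {0: men, 1: women}


def conditional_mean(row):
    """E(Y | X = x) for one row {y: P(X = x, Y = y)} of the joint table."""
    p_x = sum(row.values())
    return sum(y * p for y, p in row.items()) / p_x


cond_means = {x: conditional_mean(row) for x, row in joint.items()}

# E(Y) via the law of iterated expectation: sum over x of E(Y | X = x) P(X = x).
mean_via_lie = sum(cond_means[x] * sum(row.values()) for x, row in joint.items())

# E(Y) computed directly from the joint probabilities.
mean_direct = sum(y * p for row in joint.values() for y, p in row.items())

print({x: round(m, 1) for x, m in cond_means.items()})
print(round(mean_via_lie, 1), round(mean_direct, 1))
```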

28.4 Continuous Random Variables

Many variables we come across in economics are continuous in nature, as against discrete. In assigning probabilities to continuous variables, we face the problem that no matter how small the interval of values of the continuous variable is, there are infinitely many points in it. If we assign a positive probability to each point, the sum of such probabilities would diverge, which violates the axiom of probability theory that the probabilities should add up to one.

This problem is circumvented by assigning probabilities to the segments of the interval within which the random variable is defined, for example

\[
P(X \le 5), \quad \text{or} \quad P(-4 < X \le 2).
\]


Example 28.9. A simple example of a continuous random variable is the uniform distribution. The variable X can take any value between a and b, and the probability of X falling within the segment [a, c], where a ≤ c ≤ b, is proportional to the length of that segment relative to the interval [a, b]:

\[
P(a < X \le c) = \frac{c - a}{b - a}.
\]

The probability distribution function of X is defined by

\[
F(x) = P(X \le x)
\]

and has to conform to the following conditions:

(a) F(x) is continuous.

(b) F(x) is non-decreasing, i.e.,

\[
F(x_1) \le F(x_2) \quad \text{if } x_1 < x_2.
\]

(c)
\[
F(-\infty) = \lim_{x \to -\infty} F(x) = 0, \quad \text{and} \quad F(\infty) = \lim_{x \to \infty} F(x) = 1.
\]

These conditions are the counterpart of the discrete case and entail that probability is always non-negative and that the probabilities add up to one.
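For the uniform distribution of Example 28.9, the distribution function and interval probabilities are easy to write down explicitly. The following is a minimal sketch with illustrative function names.

```python
def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for X uniformly distributed between a and b."""
    if b <= a:
        raise ValueError("need a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def interval_probability(lo, hi, a, b):
    """P(lo < X <= hi) = F(hi) - F(lo)."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)


# Example: X uniform on [0, 4].
print(uniform_cdf(2.0, 0.0, 4.0))                 # probability that X <= 2
print(interval_probability(0.0, 1.0, 0.0, 4.0))   # probability that 0 < X <= 1
```

The function is flat at 0 to the left of a, rises linearly on [a, b], and is flat at 1 to the right of b, so it satisfies conditions (a) to (c).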

Now we define the probability model for continuous random variables. Consider the extended real line $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$, which plays the same role for continuous variables as Ω plays for discrete variables (the set of all possible outcomes). Consider the half-closed intervals on $\mathbb{R}$,

\[
(a, b] = \{x \in \mathbb{R} : a < x \le b\},
\]

and form finite sums of such intervals, provided the intervals are disjoint:

\[
A = \sum_{j=1}^{n} (a_j, b_j], \quad n < \infty.
\]

A set consisting of all such sums plus the empty set ∅ is an algebra, but it is not a σ-algebra. The smallest σ-algebra that contains this set is called the Borel σ-algebra and is denoted by $\mathcal{B}(\mathbb{R})$. Finally we define the probability measure through

\[
F(x) = P((-\infty, x]).
\]

The triple $(\mathbb{R}, \mathcal{B}(\mathbb{R}), P)$ is our probability model for continuous random variables.

Bibliography

Bridges, D., Ray, M., 1984. What is constructive mathematics. The Mathematical Intelligencer 6 (4), 32–38.

Dixit, A. K., 1990. Optimization in Economic Theory, 2nd Edition. Oxford University Press, USA.

Mas-Colell, A., Whinston, M., Green, J., 1995. Microeconomic Theory. Oxford University Press, USA.

Mitra, T., 2013. Lectures on Mathematical Analysis for Economists. Campus Book Store.

Simon, C. P., Blume, L., 1994. Mathematics for Economists. W. W. Norton & Co., New York.

Strichartz, R., 2000. The Way of Analysis. Jones and Bartlett.

Wainwright, E. K., Chiang, A., 2005. Fundamental Methods of Mathematical Economics, 4th Edition. McGraw Hill, New York.
