Background Course in Mathematics¹

European University Institute
Department of Economics

Fall 2012

antonio villanacci

September 20, 2012

¹ I would like to thank the following friends for helpful comments and discussions: Laura Carosi, Michele Gori, Vincent Maurin, Michal Markun, Marina Pireddu, Kiril Shakhnov and all the students of the courses I used these notes for in the past several years.


Contents

I Linear Algebra

1 Systems of linear equations
  1.1 Linear equations and solutions
  1.2 Systems of linear equations, equivalent systems and elementary operations
  1.3 Systems in triangular and echelon form
  1.4 Reduction algorithm
  1.5 Matrices
  1.6 Systems of linear equations and matrices

2 The Euclidean Space $\mathbb{R}^n$
  2.1 Sum and scalar multiplication
  2.2 Scalar product
  2.3 Norms and Distances

3 Matrices
  3.1 Matrix operations
  3.2 Inverse matrices
  3.3 Elementary matrices
  3.4 Elementary column operations

4 Vector spaces
  4.1 Definition
  4.2 Examples
  4.3 Vector subspaces
  4.4 Linear combinations
  4.5 Row and column space of a matrix
  4.6 Linear dependence and independence
  4.7 Basis and dimension
  4.8 Change of basis

5 Determinant and rank of a matrix
  5.1 Definition and properties of the determinant of a matrix
  5.2 Rank of a matrix
  5.3 Inverse matrices (continued)
  5.4 Span of a matrix, linearly independent rows and columns, rank

6 Eigenvalues and eigenvectors
  6.1 Characteristic polynomial
  6.2 Eigenvalues and eigenvectors
  6.3 Similar matrices and diagonalizable matrices

7 Linear functions
  7.1 Definition
  7.2 Kernel and Image of a linear function
  7.3 Nonsingular functions and isomorphisms

8 Linear functions and matrices
  8.1 From a linear function to the associated matrix
  8.2 From a matrix to the associated linear function
  8.3 $\mathcal{M}(m,n)$ and $\mathcal{L}(V,U)$ are isomorphic
  8.4 Some related properties of a linear function and associated matrix
  8.5 Some facts on $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$
  8.6 Examples of computation of $[l]_u^v$
  8.7 Change of basis and linear operators
  8.8 Diagonalization of linear operators
  8.9 Appendix
    8.9.1 The dual and double dual space of a vector space
    8.9.2 Vector spaces as Images or Kernels of well chosen linear functions

9 Solutions to systems of linear equations
  9.1 Some preliminary basic facts
  9.2 A solution method: Rouchè-Capelli's and Cramer's theorems

10 Appendix. Basic results on complex numbers
  10.1 Definition of the set $\mathbb{C}$ of complex numbers
  10.2 The complex numbers as an extension of the real numbers
  10.3 The imaginary unit $i$
  10.4 Geometric interpretation: modulus and argument
  10.5 Complex exponentials
  10.6 Complex-valued functions

II Some topology in metric spaces

11 Metric spaces
  11.1 Definitions and examples
  11.2 Open and closed sets
    11.2.1 Sets which are open or closed in metric subspaces
  11.3 Sequences
  11.4 Sequential characterization of closed sets
  11.5 Compactness
    11.5.1 Compactness and bounded, closed sets
    11.5.2 Sequential compactness
  11.6 Completeness
    11.6.1 Cauchy sequences
    11.6.2 Complete metric spaces
    11.6.3 Completeness and closedness
  11.7 Fixed point theorem: contractions
  11.8 Appendix. Some characterizations of open and closed sets

12 Functions
  12.1 Limits of functions
  12.2 Continuous Functions
  12.3 Continuous functions on compact sets

13 Correspondence and fixed point theorems
  13.1 Continuous Correspondences
  13.2 The Maximum Theorem
  13.3 Fixed point theorems
  13.4 Application of the maximum theorem to the consumer problem

III Differential calculus in Euclidean spaces

14 Partial derivatives and directional derivatives
  14.1 Partial Derivatives
  14.2 Directional Derivatives

15 Differentiability
  15.1 Total Derivative and Differentiability
  15.2 Total Derivatives in terms of Partial Derivatives

16 Some Theorems
  16.1 The chain rule
  16.2 Mean value theorem
  16.3 A sufficient condition for differentiability
  16.4 A sufficient condition for equality of mixed partial derivatives
  16.5 Taylor's theorem for real valued functions

17 Implicit function theorem
  17.1 Some intuition
  17.2 Functions with full rank square Jacobian
  17.3 The inverse function theorem
  17.4 The implicit function theorem
  17.5 Some geometrical remarks on the gradient
  17.6 Extremum problems with equality constraints

IV Nonlinear programming

18 Concavity
  18.1 Convex sets
  18.2 Different Kinds of Concave Functions
    18.2.1 Concave Functions
    18.2.2 Strictly Concave Functions
    18.2.3 Quasi-Concave Functions
    18.2.4 Strictly Quasi-concave Functions
    18.2.5 Pseudo-concave Functions
  18.3 Relationships among Different Kinds of Concavity
    18.3.1 Hessians and Concavity

19 Maximization Problems
  19.1 The case of inequality constraints: Kuhn-Tucker theorems
    19.1.1 On uniqueness of the solution
  19.2 The Case of Equality Constraints: Lagrange Theorem
  19.3 The Case of Both Equality and Inequality Constraints
  19.4 Main Steps to Solve a (Nice) Maximization Problem
    19.4.1 Some problems and some solutions
  19.5 The Implicit Function Theorem and Comparative Statics Analysis
    19.5.1 Maximization problem without constraint
    19.5.2 Maximization problem with equality constraints
    19.5.3 Maximization problem with Inequality Constraints
  19.6 The Envelope Theorem and the meaning of multipliers
    19.6.1 The Envelope Theorem
    19.6.2 On the meaning of the multipliers

20 Applications to Economics
  20.1 The Walrasian Consumer Problem
  20.2 Production
  20.3 The demand for insurance

V Problem Sets

21 Exercises
  21.1 Linear Algebra
  21.2 Some topology in metric spaces
    21.2.1 Basic topology in metric spaces
    21.2.2 Correspondences
  21.3 Differential Calculus in Euclidean Spaces
  21.4 Nonlinear Programming

22 Solutions
  22.1 Linear Algebra
  22.2 Some topology in metric spaces
    22.2.1 Basic topology in metric spaces
    22.2.2 Correspondences
  22.3 Differential Calculus in Euclidean Spaces
  22.4 Nonlinear Programming

Part I

Linear Algebra

Chapter 1

Systems of linear equations

1.1 Linear equations and solutions

Definition 1 A linear equation¹ in the unknowns $x_1, x_2, \dots, x_n$ is an equation of the form
$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b, \qquad (1.1)$$
where $b \in \mathbb{R}$ and $\forall j \in \{1, \dots, n\}$, $a_j \in \mathbb{R}$. The real number $a_j$ is called the coefficient of $x_j$, and $b$ is called the constant of the equation. The $a_j$ for $j \in \{1, \dots, n\}$ and $b$ are also called parameters of equation (1.1).

¹ In this part, I often follow Lipschutz (1991).

Definition 2 A solution to the linear equation (1.1) is an ordered n-tuple $(\bar{x}_1, \dots, \bar{x}_n) := (\bar{x}_j)_{j=1}^n$ such² that the following statement (obtained by substituting $\bar{x}_j$ in the place of $x_j$ for any $j$) is true:
$$a_1 \bar{x}_1 + a_2 \bar{x}_2 + \dots + a_n \bar{x}_n = b.$$
The set of all such solutions is called the solution set, or the general solution, or simply the solution of equation (1.1).

² ":=" means "equal by definition".

The following fact is well known.

Proposition 3 Let the linear equation
$$a x = b \qquad (1.2)$$
in the unknown (variable) $x \in \mathbb{R}$ and parameters $a, b \in \mathbb{R}$ be given. Then,

1. if $a \neq 0$, then $x = \frac{b}{a}$ is the unique solution to (1.2);

2. if $a = 0$ and $b \neq 0$, then (1.2) has no solutions;

3. if $a = 0$ and $b = 0$, then any real number is a solution to (1.2).

Definition 4 A linear equation (1.1) is said to be degenerate if $\forall j \in \{1, \dots, n\}$, $a_j = 0$, i.e., it has the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = b. \qquad (1.3)$$

Clearly,

1. if $b \neq 0$, then equation (1.3) has no solution;

2. if $b = 0$, any n-tuple $(x_j)_{j=1}^n$ is a solution to (1.3).

Definition 5 Let a nondegenerate equation of the form (1.1) be given. The leading unknown of the linear equation (1.1) is the first unknown with a nonzero coefficient, i.e., $x_p$ is the leading unknown if
$$\forall j \in \{1, \dots, p-1\},\ a_j = 0 \quad \text{and} \quad a_p \neq 0.$$
For any $j \in \{1, \dots, n\} \setminus \{p\}$, $x_j$ is called a free variable, consistently with the following obvious result.


Proposition 6 Consider a nondegenerate linear equation $a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b$ with leading unknown $x_p$. Then the set of solutions to that equation is
$$\left\{ (x_k)_{k=1}^n : \forall j \in \{1, \dots, n\} \setminus \{p\},\ x_j \in \mathbb{R} \ \text{ and } \ x_p = \frac{b - \sum_{j \in \{1, \dots, n\} \setminus \{p\}} a_j x_j}{a_p} \right\}.$$

1.2 Systems of linear equations, equivalent systems and elementary operations

Definition 7 A system of m linear equations in the n unknowns $x_1, x_2, \dots, x_n$ is a system of the form
$$\begin{cases} a_{11} x_1 + \dots + a_{1j} x_j + \dots + a_{1n} x_n = b_1 \\ \quad \vdots \\ a_{i1} x_1 + \dots + a_{ij} x_j + \dots + a_{in} x_n = b_i \\ \quad \vdots \\ a_{m1} x_1 + \dots + a_{mj} x_j + \dots + a_{mn} x_n = b_m \end{cases} \qquad (1.4)$$
where $\forall i \in \{1, \dots, m\}$ and $\forall j \in \{1, \dots, n\}$, $a_{ij} \in \mathbb{R}$, and $\forall i \in \{1, \dots, m\}$, $b_i \in \mathbb{R}$. We call $L_i$ the $i$-th linear equation of system (1.4).

A solution to the above system is an ordered n-tuple $(\bar{x}_j)_{j=1}^n$ which is a solution of each equation of the system. The set of all such solutions is called the solution set of the system.

Definition 8 Systems of linear equations are equivalent if their solution sets are the same.

The following fact is obvious.

Proposition 9 Assume that a system of linear equations contains the degenerate equation
$$L : 0 x_1 + 0 x_2 + \dots + 0 x_n = b.$$

1. If $b = 0$, then $L$ may be deleted from the system without changing the solution set;

2. if $b \neq 0$, then the system has no solutions.

A way to solve a system of linear equations is to transform it into an equivalent system whose solution set is "easy" to find. In what follows, we make the above sentence precise.

Definition 10 An elementary operation on a system of linear equations (1.4) is one of the following operations:

[E1] Interchange $L_i$ with $L_j$, an operation denoted by $L_i \leftrightarrow L_j$ (which we can read "put $L_i$ in the place of $L_j$ and $L_j$ in the place of $L_i$");

[E2] Multiply $L_i$ by $k \in \mathbb{R} \setminus \{0\}$, denoted by $k L_i \to L_i$, $k \neq 0$ (which we can read "put $k L_i$ in the place of $L_i$, with $k \neq 0$");

[E3] Replace $L_i$ by ($k$ times $L_j$ plus $L_i$), denoted by $(L_i + k L_j) \to L_i$ (which we can read "put $L_i + k L_j$ in the place of $L_i$").

Sometimes we apply [E2] and [E3] in one step, i.e., we perform the following operation:

[E] Replace $L_i$ by ($k'$ times $L_j$ plus $k \in \mathbb{R} \setminus \{0\}$ times $L_i$), denoted by $(k' L_j + k L_i) \to L_i$, $k \neq 0$.

Elementary operations are important because of the following obvious result.

Proposition 11 If $S_1$ is a system of linear equations obtained from a system $S_2$ of linear equations using a finite number of elementary operations, then systems $S_1$ and $S_2$ are equivalent.

In what follows, we first define two types of "simple" systems (systems in triangular and in echelon form) and see why those systems are in fact "easy" to solve. Then, we show how to transform any system into one of those "simple" systems.


1.3 Systems in triangular and echelon form

Definition 12 A linear system (1.4) is in triangular form if the number of equations is equal to the number $n$ of unknowns and, $\forall i \in \{1, \dots, n\}$, $x_i$ is the leading unknown of equation $i$, i.e., the system has the following form:
$$\begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1,n-1} x_{n-1} + a_{1n} x_n = b_1 \\ \qquad\quad\ a_{22} x_2 + \dots + a_{2,n-1} x_{n-1} + a_{2n} x_n = b_2 \\ \qquad\qquad\qquad \vdots \\ \qquad\qquad a_{n-1,n-1} x_{n-1} + a_{n-1,n} x_n = b_{n-1} \\ \qquad\qquad\qquad\qquad\quad\ a_{nn} x_n = b_n \end{cases} \qquad (1.5)$$
where $\forall i \in \{1, \dots, n\}$, $a_{ii} \neq 0$.

Proposition 13 System (1.5) has a unique solution.

Proof. We can compute the solution of system (1.5) using the following procedure, known as back-substitution.

First, since by assumption $a_{nn} \neq 0$, we solve the last equation with respect to the last unknown, i.e., we get
$$x_n = \frac{b_n}{a_{nn}}.$$
Second, we substitute that value of $x_n$ in the next-to-the-last equation and solve it for the next-to-the-last unknown, i.e.,
$$x_{n-1} = \frac{b_{n-1} - a_{n-1,n} \cdot \frac{b_n}{a_{nn}}}{a_{n-1,n-1}},$$
and so on. The process ends when we have determined the first unknown, $x_1$.

Observe that the above procedure shows that the solution to a system in triangular form is unique since, at each step of the algorithm, the value of each $x_i$ is uniquely determined, as a consequence of Proposition 3, conclusion 1.
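In computational terms, back-substitution is a single backward loop. Below is a minimal Python sketch (the function name and the list-based representation of the system are illustrative choices, not notation from these notes):

```python
def back_substitute(a, b):
    """Solve a triangular system (1.5): a[i][i] != 0 for all i and
    a[i][j] == 0 for i > j; b is the list of constants."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # last equation first
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]         # uses a[i][i] != 0
    return x

# x1 + 2 x2 = 5 and 3 x2 = 6 give x2 = 2 and x1 = 1.
print(back_substitute([[1.0, 2.0], [0.0, 3.0]], [5.0, 6.0]))  # [1.0, 2.0]
```

Each step divides by $a_{ii} \neq 0$, which is exactly where the triangular-form assumption enters and why each $x_i$ is uniquely determined.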

Definition 14 A linear system (1.4) is said to be in echelon form if

1. no equation is degenerate, and

2. the leading unknown in each equation is to the right of the leading unknown of the preceding equation.

In other words, the system is of the form
$$\begin{cases} a_{11} x_1 + \dots + a_{1 j_2} x_{j_2} + \dots + a_{1s} x_s = b_1 \\ \qquad a_{2 j_2} x_{j_2} + \dots + a_{2 j_3} x_{j_3} + \dots + a_{2s} x_s = b_2 \\ \qquad\qquad a_{3 j_3} x_{j_3} + \dots + a_{3s} x_s = b_3 \\ \qquad\qquad\qquad \vdots \\ \qquad\qquad a_{r j_r} x_{j_r} + a_{r, j_r + 1} x_{j_r + 1} + \dots + a_{rs} x_s = b_r \end{cases} \qquad (1.6)$$
with $j_1 := 1 < j_2 < \dots < j_r$ and $a_{11}, a_{2 j_2}, \dots, a_{r j_r} \neq 0$. Observe that the above system has $r$ equations and $s$ variables and that $s \geq r$. The leading unknown in equation $i \in \{1, \dots, r\}$ is $x_{j_i}$.

Remark 15 Systems with no degenerate equations are the "interesting" ones. If an equation is degenerate and its right-hand side term is zero, then you can erase it; if the right-hand side term is not zero, then the system has no solutions.

Definition 16 An unknown $x_k$ in system (1.6) is called a free variable if $x_k$ is not the leading unknown in any equation, i.e., $\forall i \in \{1, \dots, r\}$, $x_k \neq x_{j_i}$.

In system (1.6), there are $r$ leading unknowns, $r$ equations and $s - r \geq 0$ free variables.

Proposition 17 Let a system in echelon form with r equations and s variables be given. Then, the following results hold true.

1. If $s = r$, i.e., the number of unknowns is equal to the number of equations, then the system has a unique solution;

2. if $s > r$, i.e., the number of unknowns is greater than the number of equations, then we can arbitrarily assign values to the $s - r > 0$ free variables and obtain solutions of the system.

Proof. We prove the theorem by induction on the number $r$ of equations of the system.

Step 1. $r = 1$. In this case, we have a single, nondegenerate linear equation, to which Proposition 6 applies if $s > r = 1$, and Proposition 3 applies if $s = r = 1$.

Step 2. Assume that $r > 1$ and that the desired conclusion is true for a system with $r - 1$ equations. Consider the given system in the form (1.6) and erase the first equation, so obtaining the following system:
$$\begin{cases} a_{2 j_2} x_{j_2} + \dots + a_{2 j_3} x_{j_3} + \dots + a_{2s} x_s = b_2 \\ \qquad a_{3 j_3} x_{j_3} + \dots + a_{3s} x_s = b_3 \\ \qquad\qquad \vdots \\ \qquad a_{r j_r} x_{j_r} + a_{r, j_r + 1} x_{j_r + 1} + \dots + a_{rs} x_s = b_r \end{cases} \qquad (1.7)$$
in the unknowns $x_{j_2}, \dots, x_s$. First of all, observe that the above system is in echelon form and has $r - 1$ equations; therefore we can apply the induction hypothesis, distinguishing the two cases $s > r$ and $s = r$.

If $s > r$, then we can assign arbitrary values to the free variables of (1.7), whose number is (the "old" number minus the erased ones)
$$s - r - (j_2 - j_1 - 1) = s - r - j_2 + 2,$$
and obtain a solution of system (1.7). Consider the first equation of the original system,
$$a_{11} x_1 + a_{12} x_2 + \dots + a_{1, j_2 - 1} x_{j_2 - 1} + a_{1 j_2} x_{j_2} + \dots = b_1. \qquad (1.8)$$
We immediately see that the values found above, together with arbitrary values for the additional
$$j_2 - 2$$
free variables of equation (1.8), yield a solution of that equation, as desired. Observe also that the values given to the variables $x_1, \dots, x_{j_2 - 1}$ from the first equation do satisfy the other equations, simply because their coefficients are zero there.

If $s = r$, the system in echelon form becomes, in fact, a system in triangular form, and then the solution exists and is unique.

Remark 18 From the proof of the previous Proposition, if the echelon system (1.6) contains more unknowns than equations, i.e., $s > r$, then the system has an infinite number of solutions, since each of the $s - r \geq 1$ free variables may be assigned an arbitrary real number.

1.4 Reduction algorithm

The following algorithm (sometimes called row reduction) reduces system (1.4) of m equations and n unknowns to either echelon form or triangular form, or shows that the system has no solution. The algorithm then gives a proof of the following result.

Proposition 19 Any system of linear equations has either

1. infinitely many solutions, or

2. a unique solution, or

3. no solutions.


Reduction algorithm.

Consider a system of the form (1.4) such that
$$\forall j \in \{1, \dots, n\},\ \exists i \in \{1, \dots, m\} \text{ such that } a_{ij} \neq 0, \qquad (1.9)$$
i.e., a system in which each variable has a nonzero coefficient in at least one equation. If that is not the case, the remaining variables can be renamed in order to have (1.9) satisfied.

Step 1. Interchange equations so that the first unknown, $x_1$, appears with a nonzero coefficient in the first equation; i.e., arrange that $a_{11} \neq 0$.

Step 2. Use $a_{11}$ as a "pivot" to eliminate $x_1$ from all equations but the first equation. That is, for each $i > 1$, apply the elementary operation
$$[E3] : -\left(\frac{a_{i1}}{a_{11}}\right) L_1 + L_i \to L_i$$
or
$$[E] : -a_{i1} L_1 + a_{11} L_i \to L_i.$$

Step 3. Examine each new equation $L$:

1. If $L$ has the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = 0,$$
or if $L$ is a multiple of another equation, then delete $L$ from the system.³

2. If $L$ has the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = b,$$
with $b \neq 0$, then exit the algorithm: the system has no solutions.

Step 4. Repeat Steps 1, 2 and 3 with the subsystem formed by all the equations, excluding the first equation.

Step 5. Continue the above process until the system is in echelon form or a degenerate equation is obtained in Step 3.2.

³ The justification of Step 3 is Proposition 9 and the fact that if $L = k L'$ for some other equation $L'$ in the system, then the operation $-k L' + L \to L$ replaces $L$ by $0 x_1 + 0 x_2 + \dots + 0 x_n = 0$, which again may be deleted by Proposition 9.

Summarizing, our method for solving system (1.4) consists of two steps:

Step A. Use the above reduction algorithm to reduce system (1.4) to an equivalent simpler system (in triangular form (1.5) or echelon form (1.6)).

Step B. If the system is in triangular form, use back-substitution to find the solution; if the system is in echelon form, bring the free variables to the right-hand side of each equation, give them arbitrary values (say, the name of the free variable with an upper bar), and then use back-substitution.

Example 20 Consider the system
$$\begin{cases} x_1 + 2 x_2 + (-3) x_3 = -1 \\ 3 x_1 + (-1) x_2 + 2 x_3 = 7 \\ 5 x_1 + 3 x_2 + (-4) x_3 = 2 \end{cases}$$

Step A.

Step 1. Nothing to do.


Step 2. Apply the operations
$$-3 L_1 + L_2 \to L_2$$
and
$$-5 L_1 + L_3 \to L_3$$
to get
$$\begin{cases} x_1 + 2 x_2 + (-3) x_3 = -1 \\ \quad (-7) x_2 + 11 x_3 = 10 \\ \quad (-7) x_2 + 11 x_3 = 7 \end{cases}$$

Step 3. Examine each new equation $L_2$ and $L_3$:

1. $L_2$ and $L_3$ do not have the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = 0;$$
$L_2$ is not a multiple of $L_3$;

2. $L_2$ and $L_3$ do not have the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = b \quad \text{with } b \neq 0.$$

Step 4.

Step 1.1 Nothing to do.

Step 2.1 Apply the operation
$$-L_2 + L_3 \to L_3$$
to get
$$\begin{cases} x_1 + 2 x_2 + (-3) x_3 = -1 \\ \quad (-7) x_2 + 11 x_3 = 10 \\ 0 x_1 + 0 x_2 + 0 x_3 = -3 \end{cases}$$

Step 3.1 $L_3$ has the form
$$0 x_1 + 0 x_2 + \dots + 0 x_n = b$$
with $b = -3 \neq 0$; we exit the algorithm. The system has no solutions.

1.5 Matrices

Definition 21 Given $m, n \in \mathbb{N} \setminus \{0\}$, a matrix (of real numbers) of order $m \times n$ is a table of real numbers with m rows and n columns, as displayed below:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1j} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2j} & \dots & a_{2n} \\ \vdots & & & & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{ij} & \dots & a_{in} \\ \vdots & & & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mj} & \dots & a_{mn} \end{bmatrix}$$
For any $i \in \{1, \dots, m\}$ and any $j \in \{1, \dots, n\}$, the real numbers $a_{ij}$ are called entries of the matrix; the first subscript $i$ denotes the row the entry belongs to, and the second subscript $j$ denotes the column the entry belongs to. We will usually denote matrices by capital letters, and we will write $A_{m \times n}$ to denote a matrix of order $m \times n$. Sometimes it is useful to denote a matrix by its "typical" element, and we write $[a_{ij}]_{i \in \{1, \dots, m\},\, j \in \{1, \dots, n\}}$, or simply $[a_{ij}]$ if no ambiguity arises about the number of rows and columns. For $i \in \{1, \dots, m\}$,
$$\begin{bmatrix} a_{i1} & a_{i2} & \dots & a_{ij} & \dots & a_{in} \end{bmatrix}$$
is called the $i$-th row of $A$, and it is denoted by $R_i(A)$. For $j \in \{1, \dots, n\}$,
$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{ij} \\ \vdots \\ a_{mj} \end{bmatrix}$$
is called the $j$-th column of $A$, and it is denoted by $C_j(A)$. We denote the set of $m \times n$ matrices by $\mathcal{M}_{m,n}$, and we write, in an equivalent manner, $A_{m \times n}$ or $A \in \mathcal{M}_{m,n}$.

Definition 22 The matrix
$$A_{m \times 1} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}$$
is called a column vector, and the matrix
$$A_{1 \times n} = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}$$
is called a row vector. We usually denote row or column vectors by small Latin letters.

Definition 23 The first nonzero entry in a row $R$ of a matrix $A_{m \times n}$ is called the leading nonzero entry of $R$. If $R$ has no leading nonzero entry, i.e., if every entry in $R$ is zero, then $R$ is called a zero row. If all the rows of $A$ are zero rows, i.e., each entry of $A$ is zero, then $A$ is called a zero matrix, denoted by $0_{m \times n}$, or simply $0$ if no confusion arises.

In the previous sections, we defined triangular and echelon systems of linear equations. Below, we define triangular matrices, echelon matrices and a special kind of echelon matrix. In Section 1.6, we will see that there is a simple relationship between systems and matrices.

Definition 24 A matrix $A_{m \times n}$ is square if $m = n$. A square matrix $A$ belonging to $\mathcal{M}_{m,m}$ is called a square matrix of order $m$.

Definition 25 Given $A = [a_{ij}] \in \mathcal{M}_{m,m}$, the main diagonal of $A$ is made up of the entries $a_{ii}$ with $i \in \{1, \dots, m\}$.

Definition 26 A square matrix $A = [a_{ij}] \in \mathcal{M}_{m,m}$ is an upper triangular matrix, or simply a triangular matrix, if all entries below the main diagonal are equal to zero, i.e., $\forall i, j \in \{1, \dots, m\}$, if $i > j$, then $a_{ij} = 0$.

Definition 27 $A \in \mathcal{M}_{m,m}$ is called a diagonal matrix of order $m$ if any element outside the main diagonal is equal to zero, i.e., $\forall i, j \in \{1, \dots, m\}$ such that $i \neq j$, $a_{ij} = 0$.

Definition 28 A matrix $A \in \mathcal{M}_{m,n}$ is called an echelon (form) matrix, or is said to be in echelon form, if the following two conditions hold:

1. All zero rows, if any, are at the bottom of the matrix.

2. The leading nonzero entry of each row is to the right of the leading nonzero entry in the preceding row.

Definition 29 If a matrix $A$ is in echelon form, then its leading nonzero entries are called pivot entries, or simply pivots.

Remark 30 If a matrix $A \in \mathcal{M}_{m,n}$ is in echelon form and $r$ is the number of its pivot entries, then $r \leq \min\{m, n\}$. In fact, $r \leq m$ because the matrix may have zero rows, and $r \leq n$ because the leading nonzero entry of the first row may not be in the first column, and each other leading nonzero entry must be "strictly to the right" of the previous one.


Definition 31 A matrix $A \in \mathcal{M}_{m,n}$ is said to be in row canonical form if

1. it is in echelon form,

2. each pivot is 1, and

3. each pivot is the only nonzero entry in its column.

Example 32 1. All the matrices below are echelon matrices; only the fourth one is in row canonical form.
$$\begin{bmatrix} 0 & 7 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 & 7 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 2 & 3 & 2 & 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 1 & -3 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 7 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 3 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}.$$

2. Any zero matrix is in row canonical form.

Remark 33 Let a matrix $A_{m \times n}$ in row canonical form be given. As a consequence of the definition, we have what follows.

1. If some rows of $A$ are erased, the resulting matrix is still in row canonical form.

2. If some columns of zeros are added, the resulting matrix is still in row canonical form.

Definition 34 Denote by $R_i$ the $i$-th row of a matrix $A$. An elementary row operation is one of the following operations on the rows of $A$:

[E1] (Row interchange) Interchange $R_i$ with $R_j$, an operation denoted by $R_i \leftrightarrow R_j$ (which we can read "put $R_i$ in the place of $R_j$ and $R_j$ in the place of $R_i$");

[E2] (Row scaling) Multiply $R_i$ by $k \in \mathbb{R} \setminus \{0\}$, denoted by $k R_i \to R_i$, $k \neq 0$ (which we can read "put $k R_i$ in the place of $R_i$, with $k \neq 0$");

[E3] (Row addition) Replace $R_i$ by ($k$ times $R_j$ plus $R_i$), denoted by $(R_i + k R_j) \to R_i$ (which we can read "put $R_i + k R_j$ in the place of $R_i$").

Sometimes we apply [E2] and [E3] in one step, i.e., we perform the following operation:

[E] Replace $R_i$ by ($k'$ times $R_j$ plus $k \in \mathbb{R} \setminus \{0\}$ times $R_i$), denoted by $(k' R_j + k R_i) \to R_i$, $k \neq 0$.

Definition 35 A matrix $A \in \mathcal{M}_{m,n}$ is said to be row equivalent to a matrix $B \in \mathcal{M}_{m,n}$ if $B$ can be obtained from $A$ by a finite number of elementary row operations.

It is hard not to recognize the similarity between the above operations and those used in solving systems of linear equations. We use the expression "row reduce" to mean "transform a given matrix into another matrix using row operations". The following algorithm "row reduces" a matrix $A$ into a matrix in echelon form.

Row reduction algorithm to echelon form.

Consider a matrix $A = [a_{ij}] \in \mathcal{M}_{m,n}$.

Step 1. Find the first column with a nonzero entry. Suppose it is column $j_1$.

Step 2. Interchange the rows so that a nonzero entry appears in the first row of column $j_1$, i.e., so that $a_{1 j_1} \neq 0$.

Step 3. Use $a_{1 j_1}$ as a "pivot" to obtain zeros below $a_{1 j_1}$, i.e., for each $i > 1$, apply the row operation
$$[E3] : -\left(\frac{a_{i j_1}}{a_{1 j_1}}\right) R_1 + R_i \to R_i$$
or
$$[E] : -a_{i j_1} R_1 + a_{1 j_1} R_i \to R_i.$$


Step 4. Repeat Steps 1, 2 and 3 with the submatrix formed by all the rows, excluding the first row.

Step 5. Continue the above process until the matrix is in echelon form.

Example 36 Let's apply the above algorithm to the following matrix:
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 3 & -1 & 2 & 7 \\ 5 & 3 & -4 & 2 \end{bmatrix}$$

Step 1. Find the first column with a nonzero entry: that is $C_1$, and therefore $j_1 = 1$.

Step 2. Interchange the rows so that a nonzero entry appears in the first row of column $j_1$, i.e., so that $a_{1 j_1} \neq 0$: here $a_{1 j_1} = a_{11} = 1 \neq 0$, so nothing needs to be done.

Step 3. Use $a_{11}$ as a "pivot" to obtain zeros below $a_{11}$. Apply the row operations
$$-3 R_1 + R_2 \to R_2$$
and
$$-5 R_1 + R_3 \to R_3$$
to get
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & -7 & 11 & 7 \end{bmatrix}$$

Step 4. Apply the operation
$$-R_2 + R_3 \to R_3$$
to get
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{bmatrix}$$
which is in echelon form.
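The five steps above can be transcribed almost literally into code. The sketch below is an illustrative Python implementation (the function name and list-of-lists layout are my own choices; exact arithmetic via the standard `fractions` module avoids rounding problems):

```python
from fractions import Fraction

def to_echelon(rows):
    """Row reduce a matrix (list of row lists) to echelon form,
    following Steps 1-5 above."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                 # index of the row being pivoted
    for j in range(len(m[0])):            # Step 1: scan columns left to right
        p = next((i for i in range(r, len(m)) if m[i][j] != 0), None)
        if p is None:
            continue                      # no usable pivot in this column
        m[r], m[p] = m[p], m[r]           # Step 2: row interchange
        for i in range(r + 1, len(m)):    # Step 3: [E3] with pivot m[r][j]
            k = m[i][j] / m[r][j]
            m[i] = [m[i][c] - k * m[r][c] for c in range(len(m[0]))]
        r += 1                            # Steps 4-5: repeat on the submatrix
    return m

A = [[1, 2, -3, -1], [3, -1, 2, 7], [5, 3, -4, 2]]
for row in to_echelon(A):
    print([str(x) for x in row])   # reproduces the echelon form of Example 36
```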

Row reduction algorithm from echelon form to row canonical form.

Consider a matrix $A = [a_{ij}] \in \mathcal{M}_{m,n}$ in echelon form, say with pivots
$$a_{1 j_1}, a_{2 j_2}, \dots, a_{r j_r}.$$

Step 1. Multiply the last nonzero row $R_r$ by $\frac{1}{a_{r j_r}}$, so that the leading nonzero entry of that row becomes 1.

Step 2. Use $a_{r j_r}$ as a "pivot" to obtain zeros above the pivot, i.e., for each $i \in \{r-1, r-2, \dots, 1\}$, apply the row operation
$$[E3] : -a_{i, j_r} R_r + R_i \to R_i.$$

Step 3. Repeat Steps 1 and 2 for rows $R_{r-1}, R_{r-2}, \dots, R_2$.

Step 4. Multiply $R_1$ by $\frac{1}{a_{1 j_1}}$.

Example 37 Consider the matrix
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{bmatrix}$$
in echelon form, with leading nonzero entries
$$a_{11} = 1, \quad a_{22} = -7, \quad a_{34} = -3.$$

Step 1. Multiply the last nonzero row $R_3$ by $\frac{1}{-3}$, so that the leading nonzero entry becomes 1:
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Step 2. Use $a_{r j_r} = a_{34}$ as a "pivot" to obtain zeros above the pivot, i.e., for each $i \in \{r-1, r-2, \dots, 1\} = \{2, 1\}$, apply the row operation
$$[E3] : -a_{i, j_r} R_r + R_i \to R_i,$$
which in our case are
$$-a_{24} R_3 + R_2 \to R_2, \quad \text{i.e.,} \quad -10 R_3 + R_2 \to R_2,$$
$$-a_{14} R_3 + R_1 \to R_1, \quad \text{i.e.,} \quad R_3 + R_1 \to R_1.$$
Then, we get
$$\begin{bmatrix} 1 & 2 & -3 & 0 \\ 0 & -7 & 11 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Step 3. Multiply $R_2$ by $\frac{1}{-7}$, and get
$$\begin{bmatrix} 1 & 2 & -3 & 0 \\ 0 & 1 & -\frac{11}{7} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Use $a_{22}$ as a "pivot" to obtain zeros above the pivot, applying the operation
$$-2 R_2 + R_1 \to R_1$$
to get
$$\begin{bmatrix} 1 & 0 & \frac{1}{7} & 0 \\ 0 & 1 & -\frac{11}{7} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
which is in row canonical form.
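Continuing the sketch given after Example 36, the passage from echelon form to row canonical form can be coded in the same spirit (again an illustrative sketch; it reuses the `to_echelon` function defined there):

```python
def to_row_canonical(m):
    """Continue from echelon form to row canonical form (Definition 31):
    scale each pivot to 1 and clear the entries above it."""
    m = [row[:] for row in m]
    pivots = []                           # (row, column) of each pivot
    for i, row in enumerate(m):
        j = next((c for c, x in enumerate(row) if x != 0), None)
        if j is not None:
            pivots.append((i, j))
    for i, j in reversed(pivots):         # work from the last pivot upward
        m[i] = [x / m[i][j] for x in m[i]]            # make the pivot 1
        for k in range(i):                            # [E3]: zeros above it
            m[k] = [m[k][c] - m[k][j] * m[i][c] for c in range(len(m[0]))]
    return m

E = to_echelon([[1, 2, -3, -1], [3, -1, 2, 7], [5, 3, -4, 2]])
for row in to_row_canonical(E):
    print([str(x) for x in row])   # reproduces the matrix of Example 37
```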

Proposition 38 Any matrix $A \in \mathcal{M}_{m,n}$ is row equivalent to a matrix in row canonical form.

Proof. The two above algorithms show that any matrix is row equivalent to at least one matrix in row canonical form.

Remark 39 In fact, in Proposition 152, we will show that any matrix $A \in \mathcal{M}_{m,n}$ is row equivalent to a unique matrix in row canonical form.

1.6 Systems of linear equations and matrices

Definition 40 Given system (1.4), i.e., a system of m linear equations in the n unknowns $x_1, x_2, \dots, x_n$,
$$\begin{cases} a_{11} x_1 + \dots + a_{1j} x_j + \dots + a_{1n} x_n = b_1 \\ \quad \vdots \\ a_{i1} x_1 + \dots + a_{ij} x_j + \dots + a_{in} x_n = b_i \\ \quad \vdots \\ a_{m1} x_1 + \dots + a_{mj} x_j + \dots + a_{mn} x_n = b_m, \end{cases}$$
the matrix
$$\begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} & b_1 \\ \vdots & & & & & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} & b_i \\ \vdots & & & & & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} & b_m \end{bmatrix}$$
is called the augmented matrix $M$ of system (1.4).


Each row of $M$ corresponds to an equation of the system, and each column of $M$ corresponds to the coefficients of an unknown, except the last column, which corresponds to the constants of the system.

In an obvious way, given an arbitrary matrix $M$, we can find a unique system whose associated matrix is $M$; moreover, given a system of linear equations, there is only one matrix $M$ associated with it. We can therefore identify systems of linear equations with (augmented) matrices.

The coefficient matrix of the system is
$$A = \begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} \\ \vdots & & & & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} \\ \vdots & & & & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} \end{bmatrix}$$

One way to solve a system of linear equations is as follows.

1. Reduce its augmented matrix $M$ to echelon form, which tells whether the system has a solution; if $M$ has a row of the form $(0, 0, \dots, 0, b)$ with $b \neq 0$, then the system has no solution and you can stop. If the system admits solutions, go to the step below.

2. Reduce the matrix in echelon form obtained in the above step to its row canonical form. Write the corresponding system. In each equation, bring the free variables to the right-hand side, obtaining a triangular system. Solve by back-substitution.

The simple justification of this process comes from the following facts:

1. Any elementary row operation on the augmented matrix $M$ of the system is equivalent to applying the corresponding operation on the system itself.

2. The system has a solution if and only if the echelon form of the augmented matrix $M$ does not have a row of the form $(0, 0, \dots, 0, b)$ with $b \neq 0$ - simply because such a row corresponds to a degenerate equation.

3. In the row canonical form of the augmented matrix $M$ (excluding zero rows), the coefficient of each nonfree variable is a leading nonzero entry which is equal to one and is the only nonzero entry in its column; hence the free-variable form of the solution is obtained by simply transferring the free-variable terms to the other side of each equation.

Example 41 Consider the system presented in Example 20:
$$\begin{cases} x_1 + 2 x_2 + (-3) x_3 = -1 \\ 3 x_1 + (-1) x_2 + 2 x_3 = 7 \\ 5 x_1 + 3 x_2 + (-4) x_3 = 2 \end{cases}$$
The associated augmented matrix is
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 3 & -1 & 2 & 7 \\ 5 & 3 & -4 & 2 \end{bmatrix}$$
In Example 36, we have seen that the echelon form of the above matrix is
$$\begin{bmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{bmatrix}$$
whose last row has the form $(0, 0, \dots, 0, b)$ with $b = -3 \neq 0$; therefore the system has no solution.
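Assuming the `to_echelon` sketch from Section 1.5, the test for a row of the form $(0, 0, \dots, 0, b)$ with $b \neq 0$ can be performed mechanically on the augmented matrix:

```python
M = [[1, 2, -3, -1], [3, -1, 2, 7], [5, 3, -4, 2]]   # augmented matrix
E = to_echelon(M)
# A row (0, 0, ..., 0, b) with b != 0 signals a degenerate equation.
inconsistent = any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in E)
print(inconsistent)   # True: the system of Example 41 has no solution
```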


Chapter 2

The Euclidean Space $\mathbb{R}^n$

2.1 Sum and scalar multiplication

It is well known that the real line is a representation of the set $\mathbb{R}$ of real numbers. Similarly, an ordered pair $(x, y)$ of real numbers can be used to represent a point in the plane, and a triple $(x, y, z)$ or $(x_1, x_2, x_3)$ a point in space. In general, if $n \in \mathbb{N}_+ := \{1, 2, \dots\}$, we can define $(x_1, x_2, \dots, x_n)$ or $(x_i)_{i=1}^n$ as a point in $n$-space.

Definition 42 $\mathbb{R}^n := \mathbb{R} \times \dots \times \mathbb{R}$ ($n$ factors).

In other words, $\mathbb{R}^n$ is the Cartesian product of $\mathbb{R}$ with itself $n$ times.

Definition 43 The elements of $\mathbb{R}^n$ are ordered n-tuples of real numbers and are denoted by
$$x = (x_1, x_2, \dots, x_n) \quad \text{or} \quad x = (x_i)_{i=1}^n.$$
$x_i$ is called the $i$-th component of $x \in \mathbb{R}^n$.

Definition 44 $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ and $y = (y_i)_{i=1}^n \in \mathbb{R}^n$ are equal if
$$\forall i \in \{1, \dots, n\},\ x_i = y_i.$$
In that case we write $x = y$.

Let us introduce two operations on $\mathbb{R}^n$ and analyze some properties they satisfy.

Definition 45 Given $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$, we call the addition or sum of $x$ and $y$ the element $x + y \in \mathbb{R}^n$ obtained as follows:
$$x + y := (x_i + y_i)_{i=1}^n.$$

Definition 46 An element $\lambda \in \mathbb{R}$ is called a scalar.

Definition 47 Given $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$, we call the scalar multiplication of $x$ by $\lambda$ the element $\lambda x \in \mathbb{R}^n$ obtained as follows:
$$\lambda x := (\lambda x_i)_{i=1}^n.$$

Geometrical interpretation of the two operations in the case $n = 2$. (Figure omitted.)


From the well-known properties of the sum and product of real numbers, it is possible to verify that the following properties of the above operations do hold true.

Properties of addition.

A1. (Associative) $\forall x, y, z \in \mathbb{R}^n$, $(x + y) + z = x + (y + z)$;

A2. (Existence of a null element) There exists an element $e \in \mathbb{R}^n$ such that for any $x \in \mathbb{R}^n$, $x + e = x$; in fact, such an element is unique, and it is denoted by $0$;

A3. (Existence of an inverse element) $\forall x \in \mathbb{R}^n$, $\exists y \in \mathbb{R}^n$ such that $x + y = 0$; in fact, that element is unique and is denoted by $-x$;

A4. (Commutative) $\forall x, y \in \mathbb{R}^n$, $x + y = y + x$.

Properties of scalar multiplication.

M1. (Distributive) $\forall \alpha \in \mathbb{R}$, $\forall x, y \in \mathbb{R}^n$, $\alpha(x + y) = \alpha x + \alpha y$;

M2. (Distributive) $\forall \alpha, \beta \in \mathbb{R}$, $\forall x \in \mathbb{R}^n$, $(\alpha + \beta) x = \alpha x + \beta x$;

M3. $\forall \alpha, \beta \in \mathbb{R}$, $\forall x \in \mathbb{R}^n$, $(\alpha \beta) x = \alpha (\beta x)$;

M4. $\forall x \in \mathbb{R}^n$, $1 x = x$.

2.2 Scalar product

Definition 48 Given $x = (x_i)_{i=1}^n, y = (y_i)_{i=1}^n \in \mathbb{R}^n$, we call the dot, scalar or inner product of $x$ and $y$, denoted by $xy$ or $x \cdot y$, the scalar
$$\sum_{i=1}^n x_i \cdot y_i \in \mathbb{R}.$$

Remark 49 The scalar product of elements of $\mathbb{R}^n$ satisfies the following properties.

1. $\forall x, y \in \mathbb{R}^n$, $x \cdot y = y \cdot x$;

2. $\forall \alpha, \beta \in \mathbb{R}$, $\forall x, y, z \in \mathbb{R}^n$, $(\alpha x + \beta y) \cdot z = \alpha (x \cdot z) + \beta (y \cdot z)$;

3. $\forall x \in \mathbb{R}^n$, $x \cdot x \geq 0$;

4. $\forall x \in \mathbb{R}^n$, $x \cdot x = 0 \iff x = 0$.

Definition 50 The set $\mathbb{R}^n$ with the three operations described above (addition, scalar multiplication and dot product) is usually called the Euclidean space of dimension $n$.

Definition 51 Given $x = (x_i)_{i=1}^n \in \mathbb{R}^n$, we denote the (Euclidean) norm or length of $x$ by
$$\|x\| := (x \cdot x)^{\frac{1}{2}} = \sqrt{\sum_{i=1}^n (x_i)^2}.$$

Geometrical interpretation of scalar products in $\mathbb{R}^2$.

Given $x = (x_1, x_2) \in \mathbb{R}^2 \setminus \{0\}$, from elementary trigonometry we know that
$$x = (\|x\| \cos\alpha, \|x\| \sin\alpha), \qquad (2.1)$$
where $\alpha$ is the measure of the angle between the positive part of the horizontal axis and $x$ itself.

Using the above observation, we can verify that, given $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2 \setminus \{0\}$,
$$xy = \|x\| \cdot \|y\| \cdot \cos\gamma,$$
where $\gamma$ is an angle¹ between $x$ and $y$. (Figure omitted; see Marcellini-Sbordone, page 179.)

From the figure and (2.1), we have
$$x = (\|x\| \cos\alpha_1, \|x\| \sin\alpha_1)$$
and
$$y = (\|y\| \cos\alpha_2, \|y\| \sin\alpha_2).$$
Then²
$$xy = \|x\| \|y\| (\cos\alpha_1 \cos\alpha_2 + \sin\alpha_1 \sin\alpha_2) = \|x\| \|y\| \cos(\alpha_2 - \alpha_1).$$
Taking $x$ and $y$ not belonging to the same line, define $\theta^* :=$ the angle between $x$ and $y$ with minimum measure. From the above equality, it follows that
$$\theta^* = \tfrac{\pi}{2} \iff x \cdot y = 0,$$
$$\theta^* < \tfrac{\pi}{2} \iff x \cdot y > 0,$$
$$\theta^* > \tfrac{\pi}{2} \iff x \cdot y < 0.$$

¹ Recall that $\forall x \in \mathbb{R}$, $\cos x = \cos(-x) = \cos(2\pi - x)$.
² Recall that $\cos(x_1 \pm x_2) = \cos x_1 \cos x_2 \mp \sin x_1 \sin x_2$.

Definition 52 $x, y \in \mathbb{R}^n \setminus \{0\}$ are orthogonal if $xy = 0$.

2.3 Norms and Distances

Proposition 53 (Properties of the norm) Let $\alpha \in \mathbb{R}$ and $x, y \in \mathbb{R}^n$. Then:

1. $\|x\| \geq 0$, and $\|x\| = 0 \iff x = 0$;

2. $\|\alpha x\| = |\alpha| \cdot \|x\|$;

3. $\|x + y\| \leq \|x\| + \|y\|$ (Triangle inequality);

4. $|xy| \leq \|x\| \cdot \|y\|$ (Cauchy-Schwarz inequality).

Proof. 1. By definition, $\|x\| = \sqrt{\sum_{i=1}^n (x_i)^2} \geq 0$. Moreover, $\|x\| = 0 \iff \|x\|^2 = 0 \iff \sum_{i=1}^n (x_i)^2 = 0 \iff x = 0$.

2. $\|\alpha x\| = \sqrt{\sum_{i=1}^n \alpha^2 (x_i)^2} = |\alpha| \sqrt{\sum_{i=1}^n (x_i)^2} = |\alpha| \cdot \|x\|$.

4. (3 is proved using 4.) We want to show that $|xy| \leq \|x\| \cdot \|y\|$, or $|xy|^2 \leq \|x\|^2 \cdot \|y\|^2$, i.e.,
$$\left( \sum_{i=1}^n x_i y_i \right)^2 \leq \left( \sum_{i=1}^n x_i^2 \right) \cdot \left( \sum_{i=1}^n y_i^2 \right).$$
Having defined $X := \sum_{i=1}^n x_i^2$, $Y := \sum_{i=1}^n y_i^2$ and $Z := \sum_{i=1}^n x_i y_i$, we have to prove that
$$Z^2 \leq XY. \qquad (2.2)$$
Observe that, $\forall a \in \mathbb{R}$,

1. $\sum_{i=1}^n (a x_i + y_i)^2 \geq 0$, and

2. $\sum_{i=1}^n (a x_i + y_i)^2 = 0 \iff \forall i \in \{1, \dots, n\},\ a x_i + y_i = 0$.

Moreover,
$$\sum_{i=1}^n (a x_i + y_i)^2 = a^2 \sum_{i=1}^n x_i^2 + 2a \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 = a^2 X + 2 a Z + Y \geq 0. \qquad (2.3)$$
If $X > 0$, we can take $a = -\frac{Z}{X}$, and from (2.3) we get
$$0 \leq \frac{Z^2}{X^2} X - 2 \frac{Z^2}{X} + Y,$$
or
$$Z^2 \leq XY,$$
as desired. If $X = 0$, then $x = 0$ and $Z = 0$, and (2.2) is true simply because $0 \leq 0$.

3. It suffices to show that $\|x + y\|^2 \leq (\|x\| + \|y\|)^2$:
$$\|x + y\|^2 = \sum_{i=1}^n (x_i + y_i)^2 = \sum_{i=1}^n \left( (x_i)^2 + 2 x_i y_i + (y_i)^2 \right) = \|x\|^2 + 2 xy + \|y\|^2 \leq \|x\|^2 + 2 |xy| + \|y\|^2 \overset{(4)}{\leq} \|x\|^2 + 2 \|x\| \cdot \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

Remark 54 $\left| \|x\| - \|y\| \right| \leq \|x - y\|$.

Recall that $\forall a, b \in \mathbb{R}$,
$$-b \leq a \leq b \iff |a| \leq b.$$
From Proposition 53.3, identifying $x$ with $x - y$ and $y$ with $y$, we get $\|x - y + y\| \leq \|x - y\| + \|y\|$, i.e.,
$$\|x\| - \|y\| \leq \|x - y\|.$$
From Proposition 53.3, identifying $x$ with $y - x$ and $y$ with $x$, we get $\|y - x + x\| \leq \|y - x\| + \|x\|$, i.e.,
$$\|y\| - \|x\| \leq \|y - x\| = \|x - y\|,$$
i.e.,
$$-\|x - y\| \leq \|x\| - \|y\|.$$
The two inequalities together with the recalled fact give the desired result.

Definition 55 For any $n \in \mathbb{N} \setminus \{0\}$ and for any $i \in \{1, \dots, n\}$, let $e_n^i := (e_{n,j}^i)_{j=1}^n \in \mathbb{R}^n$ with
$$e_{n,j}^i = \begin{cases} 0 & \text{if } j \neq i \\ 1 & \text{if } j = i \end{cases}$$
In other words, $e_n^i$ is the element of $\mathbb{R}^n$ whose components are all zero, except the $i$-th component, which is equal to 1. The vector $e_n^i$ is called the $i$-th canonical vector in $\mathbb{R}^n$.

Remark 56 $\forall x \in \mathbb{R}^n$,
$$\|x\| \leq \sum_{i=1}^n |x_i|,$$
as verified below:
$$\|x\| = \left\| \sum_{i=1}^n x_i e^i \right\| \overset{(1)}{\leq} \sum_{i=1}^n \left\| x_i e^i \right\| \overset{(2)}{=} \sum_{i=1}^n |x_i| \cdot \left\| e^i \right\| = \sum_{i=1}^n |x_i|,$$
where (1) follows from the triangle inequality, i.e., Proposition 53.3, and (2) from Proposition 53.2.

Definition 57 Given $x, y \in \mathbb{R}^n$, we denote the (Euclidean) distance between $x$ and $y$ by
$$d(x, y) := \|x - y\|.$$
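The dot product, norm and distance defined in this chapter translate directly into code. The following Python sketch (function names are illustrative choices) also checks the Cauchy-Schwarz and triangle inequalities of Proposition 53 on one example:

```python
import math

def dot(x, y):                     # Definition 48
    return sum(a * b for a, b in zip(x, y))

def norm(x):                       # Definition 51
    return math.sqrt(dot(x, x))

def dist(x, y):                    # Definition 57
    return norm([a - b for a, b in zip(x, y)])

x, y = [1.0, 2.0, 2.0], [3.0, 0.0, 4.0]
print(norm(x))                                           # 3.0
print(abs(dot(x, y)) <= norm(x) * norm(y))               # True (Cauchy-Schwarz)
print(norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y))  # True (triangle)
```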


Proposition 58 (Properties of the distance) Let $x, y, z \in \mathbb{R}^n$. Then:

1. $d(x, y) \geq 0$, and $d(x, y) = 0 \iff x = y$;

2. $d(x, y) = d(y, x)$;

3. $d(x, z) \leq d(x, y) + d(y, z)$ (Triangle inequality).

Proof. 1. It follows from property 1 of the norm.

2. It follows from the definition of the distance as a norm.

3. Identifying $x$ with $x - y$ and $y$ with $y - z$ in property 3 of the norm, we get $\|(x - y) + (y - z)\| \leq \|x - y\| + \|y - z\|$, i.e., the desired result.

Remark 59 Let a set $X$ be given.

• Any function $n : X \to \mathbb{R}$ which satisfies the properties listed below is called a norm. More precisely, we have what follows. Let $\alpha \in \mathbb{R}$ and $x, y \in X$. Assume that

1. $n(x) \geq 0$, and $n(x) = 0 \iff x = 0$;

2. $n(\alpha x) = |\alpha| \cdot n(x)$;

3. $n(x + y) \leq n(x) + n(y)$.

Then $n$ is called a norm, and $(X, n)$ is called a normed space.

• Any function $d : X \times X \to \mathbb{R}$ satisfying properties 1, 2 and 3 presented for the Euclidean distance in Proposition 58 is called a distance or a metric. More precisely, we have what follows. Let $x, y, z \in X$. Assume that

1. $d(x, y) \geq 0$, and $d(x, y) = 0 \iff x = y$;

2. $d(x, y) = d(y, x)$;

3. $d(x, z) \leq d(x, y) + d(y, z)$.

Then $d$ is called a distance, and $(X, d)$ is called a metric space.


Chapter 3

Matrices

We presented the concept of matrix in Definition 21. In this chapter, we study further properties of matrices.

Definition 60 The transpose of a matrix $A \in \mathcal{M}_{m,n}$, denoted by $A^T$, belongs to $\mathcal{M}_{n,m}$, and it is the matrix obtained by writing the rows of $A$, in order, as columns:
$$A^T = \begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} \\ \vdots & & & & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} \\ \vdots & & & & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} \end{bmatrix}^T = \begin{bmatrix} a_{11} & \dots & a_{i1} & \dots & a_{m1} \\ \vdots & & & & \vdots \\ a_{1j} & \dots & a_{ij} & \dots & a_{mj} \\ \vdots & & & & \vdots \\ a_{1n} & \dots & a_{in} & \dots & a_{mn} \end{bmatrix}.$$
In other words, row 1 of the matrix $A$ becomes column 1 of $A^T$, row 2 of $A$ becomes column 2 of $A^T$, and so on, up to row $m$, which becomes column $m$ of $A^T$. The same result is obtained proceeding as follows: column 1 of $A$ becomes row 1 of $A^T$, column 2 of $A$ becomes row 2 of $A^T$, and so on, up to column $n$, which becomes row $n$ of $A^T$. More formally, given $A = [a_{ij}]_{i \in \{1, \dots, m\},\, j \in \{1, \dots, n\}} \in \mathcal{M}_{m,n}$, then
$$A^T = [a_{ji}]_{j \in \{1, \dots, n\},\, i \in \{1, \dots, m\}} \in \mathcal{M}_{n,m}.$$

Definition 61 A matrix $A \in \mathcal{M}_{n,n}$ is said to be symmetric if $A = A^T$, i.e., $\forall i, j \in \{1, \dots, n\}$, $a_{ij} = a_{ji}$.
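As a small illustration (Python, with a matrix stored as a list of rows; the names are my own choices), transposition and the symmetry test read:

```python
def transpose(a):
    """Return A^T: row i of A becomes column i of A^T."""
    return [list(col) for col in zip(*a)]

A = [[1, 2, 3], [4, 5, 6]]              # a 2 x 3 matrix
print(transpose(A))                      # [[1, 4], [2, 5], [3, 6]]

def is_symmetric(a):
    return a == transpose(a)             # A = A^T, Definition 61

print(is_symmetric([[1, 7], [7, 2]]))    # True
```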

Remark 62 We can write a matrix $A_{m \times n} = [a_{ij}]$ as
$$A = \begin{bmatrix} R_1(A) \\ \vdots \\ R_i(A) \\ \vdots \\ R_m(A) \end{bmatrix} = [C_1(A), \dots, C_j(A), \dots, C_n(A)],$$
where
$$R_i(A) = [a_{i1}, \dots, a_{ij}, \dots, a_{in}] := \left[ R_i^1(A), \dots, R_i^j(A), \dots, R_i^n(A) \right] \in \mathbb{R}^n \ \text{ for } i \in \{1, \dots, m\}$$
and
$$C_j(A) = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{ij} \\ \vdots \\ a_{mj} \end{bmatrix} := \begin{bmatrix} C_j^1(A) \\ \vdots \\ C_j^i(A) \\ \vdots \\ C_j^m(A) \end{bmatrix} \in \mathbb{R}^m \ \text{ for } j \in \{1, \dots, n\}.$$
In other words, $R_i(A)$ denotes row $i$ of the matrix $A$, and $C_j(A)$ denotes column $j$ of the matrix $A$.


3.1 Matrix operations

Definition 63 Two matrices $A_{m \times n} := [a_{ij}]$ and $B_{m \times n} := [b_{ij}]$ are equal if
$$\forall i \in \{1, \dots, m\},\ \forall j \in \{1, \dots, n\},\ a_{ij} = b_{ij}.$$

Definition 64 Given the matrices $A_{m \times n} := [a_{ij}]$ and $B_{m \times n} := [b_{ij}]$, the sum of $A$ and $B$, denoted by $A + B$, is the matrix $C_{m \times n} = [c_{ij}]$ such that
$$\forall i \in \{1, \dots, m\},\ \forall j \in \{1, \dots, n\},\ c_{ij} = a_{ij} + b_{ij}.$$

Definition 65 Given the matrix $A_{m \times n} := [a_{ij}]$ and the scalar $\alpha$, the product of the matrix $A$ by the scalar $\alpha$, denoted by $\alpha \cdot A$ or $\alpha A$, is the matrix obtained by multiplying each entry of $A$ by $\alpha$:
$$\alpha A := [\alpha a_{ij}].$$

Remark 66 It is easy to verify that the set of matrices $\mathcal{M}_{m,n}$, with the sum and scalar multiplication defined above, satisfies all the properties listed for elements of $\mathbb{R}^n$ in Section 2.1.

Definition 67 Given $A = [a_{ij}] \in \mathcal{M}_{m,n}$ and $B = [b_{jk}] \in \mathcal{M}_{n,p}$, the product $A \cdot B$ is the matrix $C = [c_{ik}] \in \mathcal{M}_{m,p}$ such that
$$\forall i \in \{1, \dots, m\},\ \forall k \in \{1, \dots, p\},\ c_{ik} := \sum_{j=1}^n a_{ij} b_{jk} = R_i(A) \cdot C_k(B),$$
i.e., since
$$A = \begin{bmatrix} R_1(A) \\ \vdots \\ R_i(A) \\ \vdots \\ R_m(A) \end{bmatrix}, \quad B = [C_1(B), \dots, C_k(B), \dots, C_p(B)], \qquad (3.1)$$
$$AB = \begin{bmatrix} R_1(A) \cdot C_1(B) & \dots & R_1(A) \cdot C_k(B) & \dots & R_1(A) \cdot C_p(B) \\ \vdots & & & & \vdots \\ R_i(A) \cdot C_1(B) & \dots & R_i(A) \cdot C_k(B) & \dots & R_i(A) \cdot C_p(B) \\ \vdots & & & & \vdots \\ R_m(A) \cdot C_1(B) & \dots & R_m(A) \cdot C_k(B) & \dots & R_m(A) \cdot C_p(B) \end{bmatrix} \qquad (3.2)$$
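The formula $c_{ik} = R_i(A) \cdot C_k(B)$ can be transcribed directly; the following Python sketch (names are illustrative choices) multiplies two conformable matrices:

```python
def matmul(a, b):
    """C = A B with c_ik = R_i(A) . C_k(B); A is m x n, B is n x p."""
    n, p = len(b), len(b[0])
    assert all(len(row) == n for row in a), "matrices are not conformable"
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(a))]

A = [[1, 2, 1], [-1, 1, 3]]
B = [[1, 0], [2, 1], [0, 1]]
print(matmul(A, B))    # [[5, 3], [1, 4]]
```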

Remark 68 If $A \in \mathcal{M}_{1,n}$ and $B \in \mathcal{M}_{n,1}$, the above definition coincides with the definition of the scalar product between elements of $\mathbb{R}^n$. In what follows, we often identify an element of $\mathbb{R}^n$ with a row or a column vector (see Definition 22), consistently with what we write. In other words, $A_{m \times n}\, x = y$ means that $x$ and $y$ are column vectors with $n$ and $m$ entries respectively, and $w A_{m \times n} = z$ means that $w$ and $z$ are row vectors with $m$ and $n$ entries respectively.

Definition 69 If two matrices are such that a given operation between them is well defined, we say that they are conformable with respect to that operation.

Remark 70 If $A, B \in \mathcal{M}_{m,n}$, they are conformable with respect to matrix addition. If $A \in \mathcal{M}_{m,n}$ and $B \in \mathcal{M}_{n,p}$, they are conformable with respect to multiplying $A$ on the left of $B$. We often simply say that the two matrices are conformable and let the context define precisely the sense in which conformability is to be understood.

Remark 71 (For future use) $\forall k \in \{1, \dots, p\}$,
$$A \cdot C_k(B) = \begin{bmatrix} R_1(A) \\ \vdots \\ R_i(A) \\ \vdots \\ R_m(A) \end{bmatrix} \cdot C_k(B) = \begin{bmatrix} R_1(A) \cdot C_k(B) \\ \vdots \\ R_i(A) \cdot C_k(B) \\ \vdots \\ R_m(A) \cdot C_k(B) \end{bmatrix}. \qquad (3.3)$$
Then, just comparing (3.2) and (3.3), we get
$$AB = \begin{bmatrix} A \cdot C_1(B) & \dots & A \cdot C_k(B) & \dots & A \cdot C_p(B) \end{bmatrix}. \qquad (3.4)$$
Similarly, $\forall i \in \{1, \dots, m\}$,
$$R_i(A) \cdot B = R_i(A) \cdot \begin{bmatrix} C_1(B) & \dots & C_k(B) & \dots & C_p(B) \end{bmatrix} = \begin{bmatrix} R_i(A) \cdot C_1(B) & \dots & R_i(A) \cdot C_k(B) & \dots & R_i(A) \cdot C_p(B) \end{bmatrix}. \qquad (3.5)$$
Then, just comparing (3.2) and (3.5), we get
$$AB = \begin{bmatrix} R_1(A)\, B \\ \vdots \\ R_i(A)\, B \\ \vdots \\ R_m(A)\, B \end{bmatrix}. \qquad (3.6)$$

Definition 72 A submatrix of a matrix $A \in \mathcal{M}_{m,n}$ is a matrix obtained from $A$ by erasing some rows and columns.

Definition 73 A matrix $A \in \mathcal{M}_{m,n}$ is partitioned in blocks if it is written as submatrices using a system of horizontal and vertical lines.

The reason for the partition into blocks is that the result of operations on block matrices can be obtained by carrying out the computation with the blocks, just as if they were actual scalar entries of the matrices, as described below.

Remark 74 We verify below that, for matrix multiplication, we do not commit an error if, upon conformably partitioning two matrices, we proceed to regard the partitioned blocks as real numbers and apply the usual rules.

1. Take $a := (a_i)_{i=1}^{n_1} \in \mathbb{R}^{n_1}$, $b := (b_j)_{j=1}^{n_2} \in \mathbb{R}^{n_2}$, $c := (c_i)_{i=1}^{n_1} \in \mathbb{R}^{n_1}$, $d := (d_j)_{j=1}^{n_2} \in \mathbb{R}^{n_2}$. Then
$$\begin{bmatrix} a \mid b \end{bmatrix}_{1 \times (n_1 + n_2)} \begin{bmatrix} c \\ \hline d \end{bmatrix}_{(n_1 + n_2) \times 1} = \sum_{i=1}^{n_1} a_i c_i + \sum_{j=1}^{n_2} b_j d_j = a \cdot c + b \cdot d. \qquad (3.7)$$

2. Take $A \in \mathcal{M}_{m,n_1}$, $B \in \mathcal{M}_{m,n_2}$, $C \in \mathcal{M}_{n_1,p}$, $D \in \mathcal{M}_{n_2,p}$, with
$$A = \begin{bmatrix} R_1(A) \\ \vdots \\ R_m(A) \end{bmatrix}, \quad B = \begin{bmatrix} R_1(B) \\ \vdots \\ R_m(B) \end{bmatrix}, \quad C = [C_1(C), \dots, C_p(C)], \quad D = [C_1(D), \dots, C_p(D)].$$
Then,
$$\begin{bmatrix} A & B \end{bmatrix}_{m \times (n_1 + n_2)} \begin{bmatrix} C \\ D \end{bmatrix}_{(n_1 + n_2) \times p} = \begin{bmatrix} R_1(A) & R_1(B) \\ \vdots & \vdots \\ R_m(A) & R_m(B) \end{bmatrix} \begin{bmatrix} C_1(C), \dots, C_p(C) \\ C_1(D), \dots, C_p(D) \end{bmatrix} =$$
$$= \begin{bmatrix} R_1(A) \cdot C_1(C) + R_1(B) \cdot C_1(D) & \dots & R_1(A) \cdot C_p(C) + R_1(B) \cdot C_p(D) \\ \vdots & & \vdots \\ R_m(A) \cdot C_1(C) + R_m(B) \cdot C_1(D) & \dots & R_m(A) \cdot C_p(C) + R_m(B) \cdot C_p(D) \end{bmatrix} =$$
$$= \begin{bmatrix} R_1(A) \cdot C_1(C) & \dots & R_1(A) \cdot C_p(C) \\ \vdots & & \vdots \\ R_m(A) \cdot C_1(C) & \dots & R_m(A) \cdot C_p(C) \end{bmatrix} + \begin{bmatrix} R_1(B) \cdot C_1(D) & \dots & R_1(B) \cdot C_p(D) \\ \vdots & & \vdots \\ R_m(B) \cdot C_1(D) & \dots & R_m(B) \cdot C_p(D) \end{bmatrix} =$$
$$= AC + BD.$$
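The block rule just verified can also be checked numerically; the sketch below assumes the `numpy` library and uses random conformable blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.random((3, 2)), rng.random((3, 4))   # blocks [A | B], 3 x (2+4)
C, D = rng.random((2, 5)), rng.random((4, 5))   # blocks [C; D], (2+4) x 5

left = np.hstack([A, B]) @ np.vstack([C, D])    # product of the partitioned matrices
print(np.allclose(left, A @ C + B @ D))         # True: blocks behave like scalars
```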


Definition 75 Let matrices $A_i \in \mathcal{M}(n_i, n_i)$ for $i \in \{1, \dots, K\}$ be given. Then the matrix
$$A = \begin{bmatrix} A_1 & & & & \\ & A_2 & & & \\ & & \ddots & & \\ & & & A_i & \\ & & & & \ddots \\ & & & & & A_K \end{bmatrix} \in \mathcal{M}\left( \sum_{i=1}^K n_i, \sum_{i=1}^K n_i \right),$$
with all entries outside the diagonal blocks equal to zero, is called a block diagonal matrix.

Very often, having information on the matrices $A_i$ gives information on $A$.

Remark 76 It is easy, but cumbersome, to verify the following properties.

1. (Associative) $\forall A \in \mathcal{M}_{m,n}$, $\forall B \in \mathcal{M}_{n,p}$, $\forall C \in \mathcal{M}_{p,q}$, $A(BC) = (AB)C$;

2. (Distributive) $\forall A, B \in \mathcal{M}_{m,n}$, $\forall C \in \mathcal{M}_{n,p}$, $(A + B)C = AC + BC$;

3. $\forall A \in \mathcal{M}_{m,n}$, $\forall x, y \in \mathbb{R}^n$ and $\forall \alpha, \beta \in \mathbb{R}$,
$$A(\alpha x + \beta y) = A(\alpha x) + A(\beta y) = \alpha A x + \beta A y.$$

It is false that:

1. (Commutative) $\forall A \in \mathcal{M}_{m,n}$, $\forall B \in \mathcal{M}_{n,p}$, $AB = BA$;

2. $\forall A \in \mathcal{M}_{m,n}$, $\forall B, C \in \mathcal{M}_{n,p}$, $\langle A \neq 0,\ AB = AC \rangle \implies \langle B = C \rangle$;

3. $\forall A \in \mathcal{M}_{m,n}$, $\forall B \in \mathcal{M}_{n,p}$, $\langle A \neq 0,\ AB = 0 \rangle \implies \langle B = 0 \rangle$.

Let's show why the above statements are false.

1. Take
$$A = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 1 & 4 \end{bmatrix}$$
and
$$BA = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 5 & 5 \\ -1 & 1 & 3 \end{bmatrix},$$
so $AB$ and $BA$ are not even of the same order. Take also
$$C = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 \\ 3 & 2 \end{bmatrix}.$$
Then
$$CD = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 4 \\ 2 & 2 \end{bmatrix}$$
and
$$DC = \begin{bmatrix} 1 & 0 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 1 & 8 \end{bmatrix},$$
so even two square matrices of the same order need not commute.

Observe that, since the commutative property does not hold true, we have to distinguish between "left factor out" and "right factor out":
$$AB + AC = A(B + C), \qquad EF + GF = (E + G)F,$$
$$AB + CA \neq A(B + C), \qquad AB + CA \neq (B + C)A.$$

2. Given
$$A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 \\ -5 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix},$$
we have
$$AB = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \begin{bmatrix} 4 & 1 \\ -5 & 6 \end{bmatrix} = \begin{bmatrix} 7 & 9 \\ 14 & 18 \end{bmatrix}, \quad AC = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 7 & 9 \\ 14 & 18 \end{bmatrix},$$
so $AB = AC$ with $A \neq 0$ and $B \neq C$.

3. Observe that $3 \Rightarrow 2$, and therefore $\neg 2 \Rightarrow \neg 3$. Otherwise, you can simply observe that a counterexample to 3 follows from the one in 2, choosing $A$ in 3 equal to $A$ in 2, and $B$ in 3 equal to $B - C$ in 2:
$$A(B - C) = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \cdot \left( \begin{bmatrix} 4 & 1 \\ -5 & 6 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} \right) = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \cdot \begin{bmatrix} 3 & -1 \\ -9 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Since the associative property of the product between matrices does hold true, we can give the following definition.

Definition 77 Given $A \in \mathcal{M}_{m,m}$ and $k \in \mathbb{N} \setminus \{0\}$,
$$A^k := \underbrace{A \cdot A \cdot \dots \cdot A}_{k \text{ times}}.$$
Observe that if $A \in \mathcal{M}_{m,m}$ and $k, l \in \mathbb{N} \setminus \{0\}$, then
$$A^k \cdot A^l = A^{k+l}.$$

Remark 78 Properties of transpose matrices:

1. $\forall A \in \mathcal{M}_{m,n}$, $(A^T)^T = A$;

2. $\forall A, B \in \mathcal{M}_{m,n}$, $(A + B)^T = A^T + B^T$;

3. $\forall \alpha \in \mathbb{R}$, $\forall A \in \mathcal{M}_{m,n}$, $(\alpha A)^T = \alpha A^T$;

4. $\forall A \in \mathcal{M}_{m,n}$, $\forall B \in \mathcal{M}_{n,m}$, $(AB)^T = B^T A^T$.

Matrices and linear systems.

In Section 1.6, we have seen that a system of m linear equations in the n unknowns $x_1, x_2, \dots, x_n$, with parameters $a_{ij}$ for $i \in \{1, \dots, m\}$, $j \in \{1, \dots, n\}$, and $(b_i)_{i=1}^m \in \mathbb{R}^m$, is displayed below:
$$\begin{cases} a_{11} x_1 + \dots + a_{1j} x_j + \dots + a_{1n} x_n = b_1 \\ \quad \vdots \\ a_{i1} x_1 + \dots + a_{ij} x_j + \dots + a_{in} x_n = b_i \\ \quad \vdots \\ a_{m1} x_1 + \dots + a_{mj} x_j + \dots + a_{mn} x_n = b_m \end{cases} \qquad (3.8)$$
Moreover, the matrix
$$M = \begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} & b_1 \\ \vdots & & & & & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} & b_i \\ \vdots & & & & & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} & b_m \end{bmatrix}$$
is called the augmented matrix $M$ of system (3.8), and the coefficient matrix $A$ of the system is
$$A = \begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} \\ \vdots & & & & \vdots \\ a_{i1} & \dots & a_{ij} & \dots & a_{in} \\ \vdots & & & & \vdots \\ a_{m1} & \dots & a_{mj} & \dots & a_{mn} \end{bmatrix}.$$
Using the notation we described in the present section, we can rewrite linear equations and systems of linear equations in a convenient and short manner, as described below.

The linear equation in the unknowns $x_1, \dots, x_n$ with parameters $a_1, \dots, a_i, \dots, a_n, b \in \mathbb{R}$,
$$a_1 x_1 + \dots + a_i x_i + \dots + a_n x_n = b,$$
can be rewritten as
$$\sum_{i=1}^n a_i x_i = b$$
or
$$a \cdot x = b,$$
where $a = [a_1, \dots, a_n]$ and $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$.

The linear system (3.8) can be rewritten as
$$\begin{cases} \sum_{j=1}^n a_{1j} x_j = b_1 \\ \quad \vdots \\ \sum_{j=1}^n a_{mj} x_j = b_m \end{cases} \quad \text{or} \quad \begin{cases} R_1(A)\, x = b_1 \\ \quad \vdots \\ R_m(A)\, x = b_m \end{cases} \quad \text{or} \quad A x = b,$$
where $A = [a_{ij}]$.

Definition 79 The trace of $A \in \mathcal{M}_{m,m}$, written $\operatorname{tr} A$, is the sum of the diagonal entries, i.e.,
$$\operatorname{tr} A = \sum_{i=1}^m a_{ii}.$$

Definition 80 The identity matrix $I_m$ is the diagonal matrix of order $m$ with each element on the main diagonal equal to 1. If no confusion arises, we simply write $I$ in the place of $I_m$.

Remark 81 1. $\forall n \in \mathbb{N} \setminus \{0\}$, $(I_m)^n = I_m$;
2. $\forall A \in \mathcal{M}_{m,n}$, $I_m A = A I_n = A$.

Proposition 82 Let $A, B \in \mathcal{M}(m, m)$ and $k \in \mathbb{R}$. Then

1. $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$;

2. $\operatorname{tr}(k A) = k \cdot \operatorname{tr} A$;

3. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

Proof. Exercise.
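As a complement to the exercise, the three properties can be sanity-checked numerically (a Python sketch assuming `numpy`; of course, such a check is not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.random((4, 4)), rng.random((4, 4))
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))   # True
print(np.isclose(np.trace(3 * A), 3 * np.trace(A)))             # True
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))             # True
```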

3.2 Inverse matrices

Definition 83 Given a matrix $A_{n \times n}$, a matrix $B_{n \times n}$ is called an inverse of $A$ if
$$AB = BA = I_n.$$
We then say that $A$ is invertible, or that $A$ admits an inverse.

Proposition 84 If $A$ admits an inverse, then the inverse is unique.

Proof. Let inverse matrices $B$ and $C$ of $A$ be given. Then
$$AB = BA = I_n \qquad (3.9)$$
and
$$AC = CA = I_n. \qquad (3.10)$$
Left-multiplying the first two terms of equality (3.9) by $C$, we get
$$(CA)B = C(BA),$$
and from (3.10) and (3.9) we get $B = C$, as desired.

Thanks to the above Proposition, we can present the following definition.

Definition 85 If the inverse of $A$ does exist, then it is denoted by $A^{-1}$.

Example 86 Assume that for $i \in \{1, \dots, n\}$, $\lambda_i \neq 0$. The diagonal matrix
$$\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}$$
is invertible, and its inverse is
$$\begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_n} \end{bmatrix}.$$

Remark 87 If a row or a column of $A$ is zero, then $A$ is not invertible, as verified below. Without loss of generality, assume the first row of $A$ is equal to zero, and assume that $B$ is an inverse of $A$. Then, since $I = AB$, we would have $1 = R_1(A) \cdot C_1(B) = 0$, a contradiction.

Proposition 88 If $A \in \mathcal{M}_{m,m}$ and $B \in \mathcal{M}_{m,m}$ are invertible matrices, then $AB$ is invertible and
$$(AB)^{-1} = B^{-1} A^{-1}.$$

Proof.
$$(AB)\left(B^{-1} A^{-1}\right) = A \left(B B^{-1}\right) A^{-1} = A I A^{-1} = A A^{-1} = I;$$
$$\left(B^{-1} A^{-1}\right)(AB) = B^{-1} \left(A^{-1} A\right) B = B^{-1} I B = B^{-1} B = I.$$

Remark 89 The existence of the inverse matrix gives an obvious way of solving systems of linear equations with the same number of equations and unknowns. Given the system

$$A_{n\times n}\,x=b,$$

if $A^{-1}$ exists, then

$$x=A^{-1}b.$$
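As a computational aside (an illustration added to these notes, not part of the original text), the remark can be checked with numpy; in numerical practice one solves $Ax=b$ with a routine such as numpy.linalg.solve rather than by forming $A^{-1}$ explicitly, which is cheaper and more accurate.

```python
import numpy as np

A = np.array([[3., 1.],
              [6., 2.1]])        # an invertible 2x2 example (det = 0.3)
b = np.array([1., 2.])

x_inv = np.linalg.inv(A) @ b     # x = A^{-1} b, as in Remark 89
x_solve = np.linalg.solve(A, b)  # preferred: solve without forming A^{-1}
assert np.allclose(x_inv, x_solve)
```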

Proposition 90 (Some other properties of the inverse matrix) Let the invertible matrix $A$ be given.

1. $A^{-1}$ is invertible and $\left(A^{-1}\right)^{-1}=A$;

2. $A^T$ is invertible and $\left(A^T\right)^{-1}=\left(A^{-1}\right)^T$.

Proof. 1. We want to verify that the inverse of $A^{-1}$ is $A$, i.e.,

$$A^{-1}A=I\quad\text{and}\quad AA^{-1}=I,$$

which is obvious.

2. Observe that

$$A^T\cdot\left(A^{-1}\right)^T=\left(A^{-1}\cdot A\right)^T=I^T=I,$$

and

$$\left(A^{-1}\right)^T\cdot A^T=\left(A\cdot A^{-1}\right)^T=I.$$


3.3 Elementary matrices

Below, we recall the definition of elementary row operations on a matrix $A\in M_{m,n}$ presented in Definition 34.

Definition 91 An elementary row operation on a matrix $A\in M_{m,n}$ is one of the following operations on the rows of $A$:

[E1] (Row interchange) Interchange $R_i$ with $R_j$, denoted by $R_i\leftrightarrow R_j$;

[E2] (Row scaling) Multiply $R_i$ by $k\in\mathbb{R}\setminus\{0\}$, denoted by $kR_i\to R_i$, $k\neq 0$;

[E3] (Row addition) Replace $R_i$ by ($k$ times $R_j$ plus $R_i$), denoted by $(R_i+kR_j)\to R_i$.

Sometimes we apply [E2] and [E3] in one step, i.e., we perform the following operation:

[E′] Replace $R_i$ by ($k'$ times $R_j$ plus $k\in\mathbb{R}\setminus\{0\}$ times $R_i$), denoted by $(k'R_j+kR_i)\to R_i$, $k\neq 0$.

Definition 92 Let $\mathcal{E}$ be the set of functions $E:M_{m,n}\to M_{m,n}$ which associate with any matrix $A\in M_{m,n}$ a matrix $E(A)$ obtained from $A$ via an elementary row operation presented in Definition 91. For $i\in\{1,2,3\}$, let $\mathcal{E}_i\subseteq\mathcal{E}$ be the set of elementary row operation functions of type $i$ presented in Definition 91.

Definition 93 For any $E\in\mathcal{E}$, define

$$E_E=E(I_m)\in M_{m,m}.$$

$E_E$ is called the elementary matrix corresponding to the elementary row operation function $E$.

With some abuse of terminology, we call any $E\in\mathcal{E}$ an elementary row operation (omitting the word "function"), and we sometimes omit the subscript $E$.

Proposition 94 Each elementary row operation of type [E1], [E2] or [E3] has an inverse, and that inverse is of the same type, i.e., for $i\in\{1,2,3\}$, $E\in\mathcal{E}_i\Leftrightarrow E^{-1}\in\mathcal{E}_i$.

Proof. 1. The inverse of $R_i\leftrightarrow R_j$ is $R_j\leftrightarrow R_i$.
2. The inverse of $kR_i\to R_i$, $k\neq 0$, is $k^{-1}R_i\to R_i$.
3. The inverse of $(R_i+kR_j)\to R_i$ is $(-kR_j+R_i)\to R_i$.

Remark 95 Given the canonical row vectors $e^i_m$, for $i\in\{1,\ldots,m\}$ (see Definition 55),

$$I_m=\begin{bmatrix}e^1_m\\ \vdots\\ e^i_m\\ \vdots\\ e^j_m\\ \vdots\\ e^m_m\end{bmatrix}$$

The following Proposition shows that the result of applying an elementary row operation $E$ to a matrix $A$ can be obtained by premultiplying $A$ by the corresponding elementary matrix $E_E$.

Proposition 96 For any $A\in M_{m,n}$ and for any $E\in\mathcal{E}$,

$$E(A)=E(I_m)\cdot A:=E_EA.\tag{3.11}$$

Proof. Recall that

$$A=\begin{bmatrix}R_1(A)\\ \vdots\\ R_i(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_m(A)\end{bmatrix}$$


We have to prove that (3.11) holds true for every $E\in\mathcal{E}_1\cup\mathcal{E}_2\cup\mathcal{E}_3$.

1. $E\in\mathcal{E}_1$ (interchange $R_i\leftrightarrow R_j$). First of all, observe that

$$E(I_m)=\begin{bmatrix}e^1_m\\ \vdots\\ e^j_m\\ \vdots\\ e^i_m\\ \vdots\\ e^m_m\end{bmatrix}\quad\text{and}\quad E(A)=\begin{bmatrix}R_1(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_i(A)\\ \vdots\\ R_m(A)\end{bmatrix}.$$

From (3.6),

$$E(I_m)\cdot A=\begin{bmatrix}e^1_m\cdot A\\ \vdots\\ e^j_m\cdot A\\ \vdots\\ e^i_m\cdot A\\ \vdots\\ e^m_m\cdot A\end{bmatrix}=\begin{bmatrix}R_1(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_i(A)\\ \vdots\\ R_m(A)\end{bmatrix},$$

as desired.

2. $E\in\mathcal{E}_2$ (scaling $kR_i\to R_i$). Observe that

$$E(I_m)=\begin{bmatrix}e^1_m\\ \vdots\\ k\cdot e^i_m\\ \vdots\\ e^j_m\\ \vdots\\ e^m_m\end{bmatrix}\quad\text{and}\quad E(A)=\begin{bmatrix}R_1(A)\\ \vdots\\ k\cdot R_i(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_m(A)\end{bmatrix}.$$

$$E(I_m)\cdot A=\begin{bmatrix}e^1_m\cdot A\\ \vdots\\ k\cdot e^i_m\cdot A\\ \vdots\\ e^j_m\cdot A\\ \vdots\\ e^m_m\cdot A\end{bmatrix}=\begin{bmatrix}R_1(A)\\ \vdots\\ k\cdot R_i(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_m(A)\end{bmatrix},$$

as desired.

3. $E\in\mathcal{E}_3$ (row addition $(R_i+kR_j)\to R_i$). Observe that

$$E(I_m)=\begin{bmatrix}e^1_m\\ \vdots\\ e^i_m+k\cdot e^j_m\\ \vdots\\ e^j_m\\ \vdots\\ e^m_m\end{bmatrix}\quad\text{and}\quad E(A)=\begin{bmatrix}R_1(A)\\ \vdots\\ R_i(A)+k\cdot R_j(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_m(A)\end{bmatrix}.$$

$$E(I_m)\cdot A=\begin{bmatrix}e^1_m\cdot A\\ \vdots\\ \left(e^i_m+k\cdot e^j_m\right)\cdot A\\ \vdots\\ e^j_m\cdot A\\ \vdots\\ e^m_m\cdot A\end{bmatrix}=\begin{bmatrix}R_1(A)\\ \vdots\\ R_i(A)+k\cdot R_j(A)\\ \vdots\\ R_j(A)\\ \vdots\\ R_m(A)\end{bmatrix},$$

as desired.
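The content of Proposition 96 is easy to experiment with numerically. The sketch below (numpy; an illustration added to these notes, not part of the original text) applies the type-[E3] operation $(R_2+2R_1)\to R_2$ both directly and via the corresponding elementary matrix.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 1., 4.]])

# Build E_E = E(I_2) for the operation (R2 + 2*R1) -> R2.
E = np.eye(2)
E[1, :] = E[1, :] + 2 * E[0, :]

# Apply the operation directly to A ...
direct = A.copy()
direct[1, :] = direct[1, :] + 2 * direct[0, :]

# ... and via premultiplication by the elementary matrix (Proposition 96).
assert np.allclose(E @ A, direct)
```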


Corollary 97 If $A$ is row equivalent to $B$, then there exist $k\in\mathbb{N}$ and elementary matrices $E_1,\ldots,E_k$ such that

$$B=E_1\cdot E_2\cdot\ldots\cdot E_k\cdot A.$$

Proof. It follows from the definition of row equivalence and Proposition 96.

Proposition 98 Every elementary matrix $E_E$ is invertible and $(E_E)^{-1}$ is an elementary matrix. In fact, $(E_E)^{-1}=E_{E^{-1}}$.

Proof. Given an elementary matrix $E_E$, from Definition 93, $\exists E\in\mathcal{E}$ such that

$$E_E=E(I).\tag{3.12}$$

Define

$$E'=E^{-1}(I).$$

Then

$$I\overset{\text{def. inv. func.}}{=}E^{-1}(E(I))\overset{(3.12)}{=}E^{-1}(E_E)\overset{\text{Prop. 96}}{=}E^{-1}(I)\cdot E_E\overset{\text{def. }E'}{=}E'E_E$$

and

$$I\overset{\text{def. inv. func.}}{=}E\left(E^{-1}(I)\right)\overset{\text{def. }E'}{=}E(E')\overset{\text{Prop. 96}}{=}E(I)\cdot E'\overset{(3.12)}{=}E_EE'.$$

Corollary 99 If $E_1,\ldots,E_k$ are elementary matrices, then

$$P:=E_1\cdot E_2\cdot\ldots\cdot E_k$$

is an invertible matrix.

Proof. It follows from Proposition 88 and Proposition 98. In fact, $E_k^{-1}\cdot\ldots\cdot E_2^{-1}\cdot E_1^{-1}$ is the inverse of $P$.

Proposition 100 Let $A\in M_{m,n}$ be given. Then there exist a matrix $B\in M_{m,n}$ in row canonical form, $k\in\mathbb{N}$ and elementary matrices $E_1,\ldots,E_k$ such that

$$B=E_1\cdot E_2\cdot\ldots\cdot E_k\cdot A.$$

Proof. From Proposition 38, there exist $k\in\mathbb{N}$ elementary operations $\mathcal{E}_1,\ldots,\mathcal{E}_k$ such that

$$\left(\mathcal{E}_1\circ\mathcal{E}_2\circ\ldots\circ\mathcal{E}_k\right)(A)=B.$$

From Proposition 96, $\forall j\in\{1,\ldots,k\}$,

$$\mathcal{E}_j(M)=\mathcal{E}_j(I)\cdot M:=E_j\cdot M.$$

Then,

$$\left(\mathcal{E}_1\circ\ldots\circ\mathcal{E}_k\right)(A)=\left(\mathcal{E}_1\circ\ldots\circ\mathcal{E}_{k-1}\right)\left(\mathcal{E}_k(A)\right)=\left(\mathcal{E}_1\circ\ldots\circ\mathcal{E}_{k-1}\right)(E_k\cdot A)=$$
$$=\left(\mathcal{E}_1\circ\ldots\circ\mathcal{E}_{k-2}\right)\left(\mathcal{E}_{k-1}(E_k\cdot A)\right)=\left(\mathcal{E}_1\circ\ldots\circ\mathcal{E}_{k-2}\right)(E_{k-1}\cdot E_k\cdot A)=\ldots=E_1\cdot E_2\cdot\ldots\cdot E_k\cdot A,$$

as desired.

Remark 101 In fact, in Proposition 152, we will show that the matrix $B$ of the above Proposition is unique.

Proposition 102 To be row equivalent is an equivalence relation.

Proof. Obvious.


Proposition 103 $\forall n\in\mathbb{N}\setminus\{0\}$: $A_{n\times n}$ is in row canonical form and invertible $\Leftrightarrow$ $A=I$.

Proof. [⇐] Obvious.
[⇒] We proceed by induction on $n$.
Case 1. $n=1$. The case $n=1$ is obvious. To better understand the logic of the proof, take $n=2$, i.e., suppose that

$$A=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}$$

is in row canonical form and invertible. Observe that $A\neq 0$.
1. $a_{11}=1$. Suppose $a_{11}=0$. Then, from 1. in the definition of matrix in echelon form (see Definition 28), $a_{12}\neq 0$ (otherwise, you would have a zero row not on the bottom of the matrix). Then, from 2. in that definition, we must have $a_{21}=0$. But then the first column is zero, contradicting the fact that $A$ is invertible (see Remark 87). Since $a_{11}\neq 0$, from 2. in the definition of a matrix in row canonical form (see Definition 31), we get $a_{11}=1$.
2. $a_{21}=0$. It follows from the fact that $a_{11}=1$ and 3. in Definition 31.
3. $a_{22}=1$. Suppose $a_{22}=0$; then the last row would be zero, contradicting the fact that $A$ is invertible. Hence $a_{22}$, the leading nonzero entry of the second row, satisfies $a_{22}\neq 0$; then, from 2. in the definition of a matrix in row canonical form, we get $a_{22}=1$.
4. $a_{12}=0$. It follows from the fact that $a_{22}=1$ and 3. in Definition 31.

Case 2. Assume that the statement is true for $n-1$. Suppose that

$$A=\begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1j}&\ldots&a_{1n}\\a_{21}&a_{22}&\ldots&a_{2j}&\ldots&a_{2n}\\&&&\vdots&&\\a_{i1}&a_{i2}&\ldots&a_{ij}&\ldots&a_{in}\\&&&\vdots&&\\a_{n1}&a_{n2}&\ldots&a_{nj}&\ldots&a_{nn}\end{bmatrix}$$

is in row canonical form and invertible.
1. $a_{11}=1$. Suppose $a_{11}=0$. Then, from 1. in the definition of matrix in echelon form (see Definition 28),

$$(a_{12},\ldots,a_{1n})\neq 0.$$

Then, from 2. in that definition, we must have

$$\begin{bmatrix}a_{21}\\ \vdots\\ a_{i1}\\ \vdots\\ a_{n1}\end{bmatrix}=0.$$

But then the first column is zero, contradicting the fact that $A$ is invertible (see Remark 87). Since $a_{11}\neq 0$, from 2. in the definition of a matrix in row canonical form (see Definition 31), we get $a_{11}=1$.
2. Therefore, we can rewrite the matrix as follows:

$$A=\begin{bmatrix}1&a\\0&A_{22}\end{bmatrix}\tag{3.13}$$

with obvious definitions of $a$ and $A_{22}$. Since, by assumption, $A$ is invertible, there exists an inverse $B$, which we can partition in the same way we partitioned $A$, i.e.,

$$B=\begin{bmatrix}b_{11}&b\\c&B_{22}\end{bmatrix},$$


with $AB=BA=I_n$. Then,

$$I_n=BA=\begin{bmatrix}b_{11}&b\\c&B_{22}\end{bmatrix}\begin{bmatrix}1&a\\0&A_{22}\end{bmatrix}=\begin{bmatrix}b_{11}&b_{11}a+bA_{22}\\c&ca+B_{22}A_{22}\end{bmatrix}=\begin{bmatrix}1&0\\0&I_{n-1}\end{bmatrix};$$

then $c=0$ and $B_{22}A_{22}=I_{n-1}$. Moreover,

$$I_n=AB=\begin{bmatrix}1&a\\0&A_{22}\end{bmatrix}\begin{bmatrix}b_{11}&b\\c&B_{22}\end{bmatrix}=\begin{bmatrix}b_{11}+ac&b+aB_{22}\\A_{22}c&A_{22}B_{22}\end{bmatrix}=\begin{bmatrix}1&0\\0&I_{n-1}\end{bmatrix}.\tag{3.14}$$

Therefore, $A_{22}B_{22}=I_{n-1}$ as well, and $A_{22}$ is invertible. From (3.13), $A_{22}$ can be obtained from $A$ by erasing the first row and then erasing a column of zeros; from Remark 33, $A_{22}$ is a matrix in row canonical form. Then we can apply the assumption of the induction argument to conclude that $A_{22}=I_{n-1}$. Then, from (3.13),

$$A=\begin{bmatrix}1&a\\0&I\end{bmatrix}.$$

Since, by assumption, $A_{n\times n}$ is in row canonical form, from 3. in Definition 31, $a=0$ and, as desired, $A=I$.

Proposition 104 Let A belong to Mm,m. Then the following statements are equivalent.

1. A is invertible;

2. A is row equivalent to Im;

3. A is the product of elementary matrices.

Proof. 1. ⇒ 2. From Proposition 100, there exist a matrix $B\in M_{m,m}$ in row canonical form, $k\in\mathbb{N}$ and elementary matrices $E_1,\ldots,E_k$ such that

$$B=E_1\cdot E_2\cdot\ldots\cdot E_k\cdot A.$$

Since $A$ is invertible and, from Corollary 99, $E_1\cdot E_2\cdot\ldots\cdot E_k$ is invertible as well, from Proposition 88, $B$ is invertible as well. Then, from Proposition 103, $B=I$.

2. ⇒ 3. By assumption and from Corollary 97, there exist $k\in\mathbb{N}$ and elementary matrices $E_1,\ldots,E_k$ such that

$$I=E_1\cdot E_2\cdot\ldots\cdot E_k\cdot A,$$

and, using Proposition 88,

$$A=(E_1\cdot E_2\cdot\ldots\cdot E_k)^{-1}\cdot I=E_k^{-1}\cdot\ldots\cdot E_2^{-1}\cdot E_1^{-1}\cdot I.$$

Since $\forall i\in\{1,\ldots,k\}$, $E_i^{-1}$ is an elementary matrix, the desired result follows.

3. ⇒ 1. By assumption, there exist $k\in\mathbb{N}$ and elementary matrices $E_1,\ldots,E_k$ such that

$$A=E_1\cdot E_2\cdot\ldots\cdot E_k.$$

Since, from Proposition 98, $\forall i\in\{1,\ldots,k\}$, $E_i$ is invertible, $A$ is invertible as well, from Proposition 88.

Proposition 105 Let $A_{m\times n}$ be given.
1. $B_{m\times n}$ is row equivalent to $A_{m\times n}$ $\Leftrightarrow$ there exists an invertible $P_{m\times m}$ such that $B=PA$.
2. If $P_{m\times m}$ is an invertible matrix, then $PA$ is row equivalent to $A$.

Proof. 1.
[⇒] From Corollaries 99 and 97, $B=E_1\cdot\ldots\cdot E_k\cdot A$, with $E_1\cdot\ldots\cdot E_k$ an invertible matrix. Then it suffices to take $P=E_1\cdot\ldots\cdot E_k$.
[⇐] From Proposition 104, $P$ is row equivalent to $I$, i.e., there exist $E_1,\ldots,E_k$ such that $P=E_1\cdot\ldots\cdot E_k\cdot I$. Then, by assumption, $B=E_1\cdot\ldots\cdot E_k\cdot I\cdot A$, i.e., $B$ is row equivalent to $A$.
2. From Proposition 104, $P$ is the product of elementary matrices. Then the desired result follows from Proposition 96.


Proposition 106 If $A$ is row equivalent to a matrix with a zero row, then $A$ is not invertible.

Proof. Suppose otherwise, i.e., $A$ is row equivalent to a matrix $C$ with a zero row and $A$ is invertible. From Proposition 105, there exists an invertible $P$ such that $A=PC$, and then $P^{-1}A=C$. Since $A$ and $P^{-1}$ are invertible, from Proposition 88, $P^{-1}A$ is invertible, while $C$, from Remark 87, is not invertible, a contradiction.

Remark 107 From Proposition 104, we know that if $A_{m\times m}$ is invertible, then there exist $E_1,\ldots,E_k$ such that

$$E_1\cdot\ldots\cdot E_k\cdot A=I\tag{3.15}$$

and, taking inverses of both sides,

$$(E_1\cdot\ldots\cdot E_k\cdot A)^{-1}=I,$$

i.e.,

$$A^{-1}\cdot E_k^{-1}\cdot\ldots\cdot E_1^{-1}=I,$$

or

$$A^{-1}=E_1\cdot\ldots\cdot E_k\cdot I.\tag{3.16}$$

Then, from (3.15) and (3.16), if $A$ is invertible, then $A^{-1}$ is equal to the finite product of those elementary matrices which "transform" $A$ into $I$ or, equivalently, $A^{-1}$ can be obtained by applying a finite number of corresponding elementary operations to the identity matrix $I$. That observation leads to the following (Gaussian elimination) algorithm, which either shows that an arbitrary matrix $A_{m\times m}$ is not invertible or finds the inverse of $A$.

An algorithm to find the inverse of a matrix $A_{m\times m}$ or to show the matrix is not invertible.

Step 1. Construct the matrix $M_{m\times(2m)}:=\left[\,A\ \ I_m\,\right]$.

Step 2. Row reduce $M$ to echelon form. If the process generates a zero row in the part of $M$ corresponding to $A$, then stop: $A$ is row equivalent to a matrix with a zero row and therefore, from Proposition 106, is not invertible. Otherwise, the part of $M$ corresponding to $A$ is a triangular matrix.

Step 3. Row reduce $M$ to the row canonical form $\left[\,I_m\ \ B\,\right]$. Then, from Remark 107, $A^{-1}=B$.

Example 108 We find the inverse of

$$A=\begin{bmatrix}1&0&2\\2&-1&3\\4&1&8\end{bmatrix},$$

applying the above algorithm.

Step 1.

$$M=\left[\begin{array}{ccc|ccc}1&0&2&1&0&0\\2&-1&3&0&1&0\\4&1&8&0&0&1\end{array}\right].$$

Step 2.

$$\left[\begin{array}{ccc|ccc}1&0&2&1&0&0\\0&-1&-1&-2&1&0\\0&1&0&-4&0&1\end{array}\right],\qquad\left[\begin{array}{ccc|ccc}1&0&2&1&0&0\\0&-1&-1&-2&1&0\\0&0&-1&-6&1&1\end{array}\right].$$

The matrix is invertible.


Step 3.

$$\left[\begin{array}{ccc|ccc}1&0&2&1&0&0\\0&-1&-1&-2&1&0\\0&0&-1&-6&1&1\end{array}\right],\qquad\left[\begin{array}{ccc|ccc}1&0&2&1&0&0\\0&-1&-1&-2&1&0\\0&0&1&6&-1&-1\end{array}\right],$$

$$\left[\begin{array}{ccc|ccc}1&0&0&-11&2&2\\0&-1&0&4&0&-1\\0&0&1&6&-1&-1\end{array}\right],\qquad\left[\begin{array}{ccc|ccc}1&0&0&-11&2&2\\0&1&0&-4&0&1\\0&0&1&6&-1&-1\end{array}\right].$$

Then

$$A^{-1}=\begin{bmatrix}-11&2&2\\-4&0&1\\6&-1&-1\end{bmatrix}.$$

Example 109

$$\left[\begin{array}{cc|cc}1&3&1&0\\4&2&0&1\end{array}\right],\quad\left[\begin{array}{cc|cc}1&3&1&0\\0&-10&-4&1\end{array}\right],\quad\left[\begin{array}{cc|cc}1&3&1&0\\0&1&\frac{4}{10}&-\frac{1}{10}\end{array}\right],\quad\left[\begin{array}{cc|cc}1&0&-\frac{2}{10}&\frac{3}{10}\\0&1&\frac{4}{10}&-\frac{1}{10}\end{array}\right].$$
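A minimal computational sketch of the algorithm follows (plain Python with numpy; an illustration added to these notes, with partial pivoting included for numerical robustness).

```python
import numpy as np

def inverse_or_none(A):
    """Gauss-Jordan elimination on [A | I]: return A^{-1}, or None if A is not invertible."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    M = np.hstack([A, np.eye(m)])            # Step 1: M = [A | I]
    for j in range(m):
        p = j + np.argmax(np.abs(M[j:, j]))  # pick a pivot row (row interchange, E1)
        if np.isclose(M[p, j], 0.0):
            return None                      # zero pivot column: not invertible (Proposition 106)
        M[[j, p]] = M[[p, j]]
        M[j] = M[j] / M[j, j]                # row scaling (E2)
        for i in range(m):
            if i != j:
                M[i] = M[i] - M[i, j] * M[j] # row addition (E3)
    return M[:, m:]                          # Step 3: M = [I | A^{-1}]

A = [[1, 0, 2], [2, -1, 3], [4, 1, 8]]
print(inverse_or_none(A))   # recovers the inverse computed in Example 108
```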

3.4 Elementary column operations

This section repeats some of the discussion of the previous section using columns instead of rows of a matrix.

Definition 110 An elementary column operation is one of the following operations on the columns of $A_{m\times n}$:

[F1] (Column interchange) Interchange $C_i$ with $C_j$, denoted by $C_i\leftrightarrow C_j$;

[F2] (Column scaling) Multiply $C_i$ by $k\in\mathbb{R}\setminus\{0\}$, denoted by $kC_i\to C_i$, $k\neq 0$;

[F3] (Column addition) Replace $C_i$ by ($k$ times $C_j$ plus $C_i$), denoted by $(C_i+kC_j)\to C_i$.

Each of the above column operations has an inverse operation of the same type, just like the corresponding row operations.

Definition 111 Let $F$ be an elementary column operation on a matrix $A_{m\times n}$. We denote the resulting matrix by $F(A)$. We also define

$$F_F=F(I_n)\in M_{n,n}.$$

$F_F$ is then called the elementary matrix corresponding to the elementary column operation $F$. We sometimes omit the subscript $F$.

Definition 112 Given an elementary row operation $E$, define $F_E$, if it exists², as the column operation obtained from $E$ by substituting the word row with the word column. Similarly, given an elementary column operation $F$, define $E_F$, if it exists, as the row operation obtained from $F$ by substituting the word column with the word row.

In what follows, $F$ and $E$ are such that $F=F_E$ and $E_F=E$.

² Of course, if you exchange the first and the third row, and the matrix has only two columns, you cannot exchange the first and the third column.


Proposition 113 Let a matrix $A_{m\times n}$ be given. Then

$$F(A)=\left[E\left(A^T\right)\right]^T.$$

Proof. The above fact is equivalent to $E\left(A^T\right)=(F(A))^T$, and it is a consequence of the fact that the columns of $A$ are the rows of $A^T$ and vice versa. As an exercise, carefully do the proof in the case of each of the three elementary operation types.

Remark 114 The above Proposition says that applying the column operation $F$ to a matrix $A$ gives the same result as applying the corresponding row operation $E_F$ to $A^T$ and then taking the transpose.

Proposition 115 Let a matrix $A_{m\times n}$ be given. Then:

1.
$$F(A)=A\cdot(E(I))^T=A\cdot F(I),$$

or, since $E:=E(I)$ and $F:=F(I)$,

$$F(A)=A\cdot E^T=A\cdot F.\tag{3.17}$$

2. $F=E^T$ and $F$ is invertible.

Proof. 1.

$$F(A)\overset{\text{Prop. 113}}{=}\left[E\left(A^T\right)\right]^T\overset{\text{Prop. 96}}{=}\left(E(I)\cdot A^T\right)^T=A\cdot(E(I))^T\overset{\text{Prop. 113}}{=}A\cdot F(I).$$

2. From (3.17), we then get

$$F:=F(I)=I\cdot E^T=E^T.$$

From Proposition 90 and Proposition 98, it follows that $F$ is invertible.

Remark 116 The above Proposition says that the result of applying an elementary column operation $F$ to a matrix $A$ can be obtained by postmultiplying $A$ by the corresponding elementary matrix $F$.

Definition 117 A matrix $B_{m\times n}$ is said to be column equivalent to a matrix $A_{m\times n}$ if $B$ can be obtained from $A$ using a finite number of elementary column operations.

Remark 118 By definition of row equivalent, column equivalent and transpose of a matrix, we have that

$A$ and $B$ are row equivalent $\Leftrightarrow$ $A^T$ and $B^T$ are column equivalent,

and

$A$ and $B$ are column equivalent $\Leftrightarrow$ $A^T$ and $B^T$ are row equivalent.

Proposition 119 1. $B_{m\times n}$ is column equivalent to $A_{m\times n}$ $\Leftrightarrow$ there exists an invertible $Q_{n\times n}$ such that $B_{m\times n}=A_{m\times n}Q_{n\times n}$.
2. If $Q_{n\times n}$ is an invertible matrix, then $AQ$ is column equivalent to $A$.

Proof. It is very similar to the proof of Proposition 105.

Definition 120 A matrix $B_{m\times n}$ is said to be equivalent to a matrix $A_{m\times n}$ if $B$ can be obtained from $A$ using a finite number of elementary row and column operations.

Proposition 121 A matrix $B_{m\times n}$ is equivalent to a matrix $A_{m\times n}$ $\Leftrightarrow$ there exist invertible matrices $P_{m\times m}$ and $Q_{n\times n}$ such that $B_{m\times n}=P_{m\times m}A_{m\times n}Q_{n\times n}$.

Proof. [⇒] By assumption, $B=E_1\cdot\ldots\cdot E_k\cdot A\cdot F_1\cdot\ldots\cdot F_h$; it then suffices to take $P=E_1\cdot\ldots\cdot E_k$ and $Q=F_1\cdot\ldots\cdot F_h$, which are invertible by Corollary 99.
[⇐] Similar to the proof of Proposition 105.


Proposition 122 For any matrix $A_{m\times n}$ there exists a number $r\in\{0,1,\ldots,\min\{m,n\}\}$ such that $A$ is equivalent to the block matrix of the form

$$\begin{bmatrix}I_r&0\\0&0\end{bmatrix}.\tag{3.18}$$

Proof. The proof is constructive, in the form of an algorithm (a computational sketch follows the proof).

Step 1. Row reduce $A$ to row canonical form, with leading nonzero entries $a_{1j_1},a_{2j_2},\ldots,a_{rj_r}$.

Step 2. Interchange $C_1$ and $C_{j_1}$, $C_2$ and $C_{j_2}$, and so on up to $C_r$ and $C_{j_r}$. You then get a matrix of the form

$$\begin{bmatrix}I_r&B\\0&0\end{bmatrix}.$$

Step 3. Use column operations to replace the entries in $B$ with zeros.
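The sketch below (using the sympy library's rref, which computes the row canonical form; an illustration added to these notes, not part of the original text) reaches the block form (3.18) by row reducing, then row reducing the transpose, which amounts to performing the column operations of Steps 2 and 3.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 0, 1]])

B = A.rref()[0]        # Step 1: row canonical form (unique, by Proposition 152)
N = B.T.rref()[0].T    # Steps 2-3, performed as row operations on the transpose
print(N)               # Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 0]]): here r = 2
```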

Remark 123 From Proposition 152, the matrix in Step 2 is unique and therefore the resulting matrix in Step 3, i.e., matrix (3.18), is unique.

Proposition 124 For any $A\in M_{m,n}$, there exist invertible matrices $P\in M_{m,m}$ and $Q\in M_{n,n}$ and $r\in\{0,1,\ldots,\min\{m,n\}\}$ such that

$$PAQ=\begin{bmatrix}I_r&0\\0&0\end{bmatrix}.$$

Proof. It follows immediately from Propositions 122 and 121.

Remark 125 From Proposition 152, the number $r$ in the statement of the previous Proposition is unique.

Proposition 126 If $A_{m\times m}B_{m\times m}=I$, then $BA=I$ and therefore $A$ is invertible and $A^{-1}=B$.

Proof. Suppose $A$ is not invertible; then, from Proposition 104, $A$ is not row equivalent to $I_m$ and, from Proposition 122, $A$ is equivalent to a block matrix of the form displayed in (3.18) with $r<m$. Then, from Proposition 121, there exist invertible matrices $P_{m\times m}$ and $Q_{m\times m}$ such that

$$PAQ=\begin{bmatrix}I_r&0\\0&0\end{bmatrix},$$

and from $AB=I$ we get

$$P=P(AB)=PAQ\,Q^{-1}B,$$

i.e.,

$$\begin{bmatrix}I_r&0\\0&0\end{bmatrix}\left(Q^{-1}B\right)=P.$$

Therefore, $P$ has a zero row, contradicting the fact that $P$ is invertible. Hence $A$ is invertible, and $B=A^{-1}(AB)=A^{-1}$, so that $BA=I$.

Remark 127 The previous Proposition says that, to verify that $A$ is invertible, it is enough to check that $AB=I$.

Remark 128 We will come back to the analysis of further properties of the inverse, and of another way of computing it, in Section 5.3.


Chapter 4

Vector spaces

4.1 Definition

Definition 129 Let a nonempty set $F$ be given, with the operations of
addition, which assigns to any $x,y\in F$ an element denoted by $x\oplus y\in F$, and
multiplication, which assigns to any $x,y\in F$ an element denoted by $x\odot y\in F$.
$(F,\oplus,\odot)$ is called a field if the following properties hold true.

1. (Commutative) $\forall x,y\in F$, $x\oplus y=y\oplus x$ and $x\odot y=y\odot x$;

2. (Associative) $\forall x,y,z\in F$, $(x\oplus y)\oplus z=x\oplus(y\oplus z)$ and $(x\odot y)\odot z=x\odot(y\odot z)$;

3. (Distributive) $\forall x,y,z\in F$, $x\odot(y\oplus z)=(x\odot y)\oplus(x\odot z)$;

4. (Existence of null elements) $\exists f_0,f_1\in F$ such that $\forall x\in F$, $f_0\oplus x=x$ and $f_1\odot x=x$;

5. (Existence of a negative element) $\forall x\in F$ $\exists y\in F$ such that $x\oplus y=f_0$.

From the above properties, it follows that $f_0$ and $f_1$ are unique.¹ We denote $f_0$ and $f_1$ by 0 and 1, respectively.

6. (Existence of an inverse element) $\forall x\in F\setminus\{0\}$, $\exists y\in F$ such that $x\odot y=1$.

Elements of a field are called scalars.

Example 130 The set $\mathbb{R}$ of real numbers with the standard addition and multiplication is a field. From the above properties, all the rules of "elementary" algebra can be deduced.² The set $\mathbb{C}$ of complex numbers is a field (see Appendix 10).

Definition 131 Let $(F,\oplus,\odot)$ be a field and $V$ a nonempty set with the operations of
addition, which assigns to any $u,v\in V$ an element denoted by $u+v\in V$, and
scalar multiplication, which assigns to any $u\in V$ and any $\alpha\in F$ an element $\alpha\cdot u\in V$.
Then $(V,+,\cdot)$ is called a vector space on the field $(F,\oplus,\odot)$, and its elements are called vectors, if the following properties are satisfied.

A1. (Associative) $\forall u,v,w\in V$, $(u+v)+w=u+(v+w)$;
A2. (Existence of zero element) there exists an element 0 in $V$ such that $\forall u\in V$, $u+0=u$;
A3. (Existence of inverse element) $\forall u\in V$ $\exists v\in V$ such that $u+v=0$;
A4. (Commutative) $\forall u,v\in V$, $u+v=v+u$;
M1. (Distributive) $\forall\alpha\in F$ and $\forall u,v\in V$, $\alpha\cdot(u+v)=\alpha\cdot u+\alpha\cdot v$;
M2. (Distributive) $\forall\alpha,\beta\in F$ and $\forall u\in V$, $(\alpha\oplus\beta)\cdot u=\alpha\cdot u+\beta\cdot u$;
M3. $\forall\alpha,\beta\in F$ and $\forall u\in V$, $(\alpha\odot\beta)\cdot u=\alpha\cdot(\beta\cdot u)$;
M4. $\forall u\in V$, $1\cdot u=u$.

Remark 132 From A2. above, we have that, for any vector space $V$, $0\in V$.

¹ The proof of that result is very similar to the proof of Proposition 134.1 and 2.
² See, for example, Apostol (1967), Section 13.2, page 17.


Remark 133 In the remainder of these notes, if no confusion arises, for ease of notation, we will denote a field simply by $F$ and a vector space by $V$. Moreover, we will write $+$ in the place of $\oplus$, and we will omit $\odot$ and $\cdot$, i.e., we will write $xy$ instead of $x\odot y$ and $\alpha v$ instead of $\alpha\cdot v$.

Proposition 134 If $V$ is a vector space, then (as a consequence of the first four properties):
1. The zero vector is unique and it is denoted by 0.
2. $\forall u\in V$, the inverse element of $u$ is unique and it is denoted by $-u$.
3. (Cancellation law) $\forall u,v,w\in V$,

$$u+w=v+w\Rightarrow u=v.$$

Proof. 1. Assume that there exist $0_1,0_2\in V$ which are zero vectors. Then, from (A2),

$$0_1+0_2=0_1\quad\text{and}\quad 0_2+0_1=0_2.$$

From (A4),

$$0_1+0_2=0_2+0_1,$$

and therefore $0_1=0_2$.

2. Given $u\in V$, assume there exist $v^1,v^2\in V$ such that

$$u+v^1=0\quad\text{and}\quad u+v^2=0.$$

Then

$$v^2=v^2+0=v^2+\left(u+v^1\right)=\left(v^2+u\right)+v^1=\left(u+v^2\right)+v^1=0+v^1=v^1.$$

3.

$$u+w=v+w\overset{(1)}{\Rightarrow}u+w+(-w)=v+w+(-w)\overset{(2)}{\Rightarrow}u+0=v+0\overset{(3)}{\Rightarrow}u=v,$$

where (1) follows from the definition of operation, (2) from the definition of $-w$ and (3) from the definition of 0.

Proposition 135 If $V$ is a vector space over a field $F$, then:

1. For $0\in F$ and $\forall u\in V$, $0u=0$.

2. For $0\in V$ and $\forall\alpha\in F$, $\alpha 0=0$.

3. If $\alpha\in F$, $u\in V$ and $\alpha u=0$, then either $\alpha=0$ or $u=0$ or both.

4. $\forall\alpha\in F$ and $\forall u\in V$, $(-\alpha)u=\alpha(-u)=-(\alpha u):=-\alpha u$.

Proof. 1. From (M2),

$$0u+0u=(0+0)u=0u.$$

Then, adding $-(0u)$ to both sides,

$$0u+0u+(-(0u))=0u+(-(0u))$$

and, using (A3),

$$0u+0=0$$

and, using (A2), we get the desired result.

2. From (A2),

$$0+0=0;$$

then, multiplying both sides by $\alpha$ and using (M1),

$$\alpha 0=\alpha(0+0)=\alpha 0+\alpha 0;$$

and, using (A3),

$$\alpha 0+(-(\alpha 0))=\alpha 0+\alpha 0+(-(\alpha 0))$$

and, using (A2), we get the desired result.


3. Assume that $\alpha u=0$ and $\alpha\neq 0$. Then

$$u=1u=\left(\alpha^{-1}\odot\alpha\right)u=\alpha^{-1}(\alpha u)=\alpha^{-1}\cdot 0=0.$$

Taking the contrapositive of the above result, we get $\langle u\neq 0\rangle\Rightarrow\langle\alpha u\neq 0\ \vee\ \alpha=0\rangle$. Therefore, $\langle u\neq 0\ \wedge\ \alpha u=0\rangle\Rightarrow\langle\alpha=0\rangle$.

4. From $u+(-u)=0$, we get $\alpha(u+(-u))=\alpha 0$, and then $\alpha u+\alpha(-u)=0$, and therefore $-(\alpha u)=\alpha(-u)$.
From $\alpha+(-\alpha)=0$, we get $(\alpha+(-\alpha))u=0u$, and then $\alpha u+(-\alpha)u=0$, and therefore $-(\alpha u)=(-\alpha)u$.

Remark 136 From Proposition 135.4 and (M4) in Definition 131, we have

$$(-1)u=1(-u)=-u.$$

We also define subtraction as follows:

$$v-u:=v+(-u).$$

4.2 Examples

Euclidean spaces. The Euclidean space $\mathbb{R}^n$, with sum and scalar multiplication defined in Chapter 2, is a vector space over the field $\mathbb{R}$ with the standard addition and multiplication.

Matrices. For any $m,n\in\mathbb{N}\setminus\{0\}$, the set $M_{m,n}$ of matrices with elements in a field $F$, with the operations of addition and scalar multiplication defined in Section 2.3, is a vector space on $F$, and it is denoted by

$$M_F(m,n).$$

We also set

$$M(m,n):=M_{\mathbb{R}}(m,n),$$

i.e., $M(m,n)$ is the vector space on the field $\mathbb{R}$.

Polynomials. The set of all polynomials

$$a_0+a_1t+a_2t^2+\ldots+a_nt^n$$

with $n\in\mathbb{N}$ and $a_0,a_1,a_2,\ldots,a_n\in\mathbb{R}$ is a vector space on $\mathbb{R}$ with respect to the standard sum between polynomials and scalar multiplication.

Function space $\mathcal{F}(X)$. Given a nonempty set $X$, the set of all functions $f:X\to\mathbb{R}$, with the obvious sum and scalar multiplication, is a vector space on $\mathbb{R}$.

Sets which are not vector spaces. $(0,+\infty)$ and $[0,+\infty)$ are not vector spaces in $\mathbb{R}$. For any $n\in\mathbb{N}\setminus\{0\}$, the set of all polynomials of degree $n$ is not a vector space on $\mathbb{R}$.

4.3 Vector subspaces

In what follows, if no ambiguity may arise, we will say "vector space" instead of "vector space on a field".

Definition 137 Let $W$ be a subset of a vector space $V$. $W$ is called a vector subspace of $V$ if $W$ is a vector space with respect to the operations of vector addition and scalar multiplication defined on $V$.

Proposition 138 Let $W$ be a subset of a vector space $V$. The following three statements are equivalent.

1. $W$ is a vector subspace of $V$.


2. a. $W\neq\emptyset$, i.e., $0\in W$ (the equivalence follows from Proposition 135.1);
   b. $\forall u,v\in W$, $u+v\in W$;
   c. $\forall u\in W$, $\forall\alpha\in F$, $\alpha u\in W$.

3. a. $W\neq\emptyset$, i.e., $0\in W$;
   b. $\forall u,v\in W$, $\forall\alpha,\beta\in F$, $\alpha u+\beta v\in W$.

Proof. 2. and 3. are clearly equivalent. If $W$ is a vector subspace, clearly 2. holds. To show the opposite implication, we have to check only (A2) and (A3), simply by definition of the restriction of a function.
(A2): Since $W\neq\emptyset$, we can take $u\in W$. Then, from c., taking $0\in F$, we have $0u=0\in W$.
(A3): Taking $u\in W$, from c., $(-1)u\in W$, but then $(-1)u=-u\in W$.

Example 139 1. Given an arbitrary vector space $V$, $\{0\}$ and $V$ are vector subspaces of $V$.
2. Given $\mathbb{R}^3$,

$$W:=\left\{(x_1,x_2,x_3)\in\mathbb{R}^3:x_3=0\right\}$$

is a vector subspace of $\mathbb{R}^3$.
3. Given the space $V$ of polynomials, the set $W$ of all polynomials of degree $\leq n$ is a vector subspace of $V$.
4. The set of all bounded, or continuous, or differentiable, or integrable functions $f:X\to\mathbb{R}$ is a vector subspace of $\mathcal{F}(X)$.
5. If $V$ and $W$ are vector subspaces of a common vector space, then $V\cap W$ is a vector subspace of both $V$ and $W$.
6. $[0,+\infty)$ is not a vector subspace of $\mathbb{R}$.
7. Let $V=\{0\}\times\mathbb{R}\subseteq\mathbb{R}^2$ and $W=\mathbb{R}\times\{0\}\subseteq\mathbb{R}^2$. Then $V\cup W$ is not a vector subspace of $\mathbb{R}^2$.

4.4 Linear combinations

Notation convention. Unless otherwise stated, a Greek (or Latin) letter with a subscript denotes a scalar; a Latin letter with a superscript denotes a vector.

Definition 140 Let $V$ be a vector space, $m\in\mathbb{N}\setminus\{0\}$ and $v^1,v^2,\ldots,v^m\in V$. A linear combination of the vectors $v^1,v^2,\ldots,v^m$ via coefficients $\alpha_1,\alpha_2,\ldots,\alpha_m\in F$ is the vector

$$\sum_{i=1}^{m}\alpha_iv^i.$$

The set of all such combinations,

$$\left\{v\in V:\exists\,(\alpha_i)_{i=1}^m\in F^m\ \text{such that}\ v=\sum_{i=1}^{m}\alpha_iv^i\right\},$$

is called the span of $\{v^1,v^2,\ldots,v^m\}$ and it is denoted by

$$\operatorname{span}\left(\left\{v^1,v^2,\ldots,v^m\right\}\right)$$

or simply

$$\operatorname{span}\left(v^1,v^2,\ldots,v^m\right).$$

Definition 141 Let $V$ be a vector space and $S\subseteq V$. $\operatorname{span}(S)$ is the set of all linear combinations of a finite number of vectors in $S$.

Proposition 142 Let $V$ be a vector space and $S\subseteq V$, $S\neq\emptyset$.

1. a. $S\subseteq\operatorname{span}(S)$, and b. $\operatorname{span}(S)$ is a vector subspace of $V$.

2. If $W$ is a subspace of $V$ and $S\subseteq W$, then $\operatorname{span}(S)\subseteq W$.


Proof. 1a. Given $v\in S$, $1v=v\in\operatorname{span}(S)$.
1b. Since $S\neq\emptyset$, $\operatorname{span}(S)\neq\emptyset$. Take $\alpha,\beta\in F$ and $v,w\in\operatorname{span}(S)$. Then there exist $\alpha_1,\ldots,\alpha_n\in F$, $v^1,\ldots,v^n\in S$ and $\beta_1,\ldots,\beta_m\in F$, $w^1,\ldots,w^m\in S$ such that $v=\sum_{i=1}^n\alpha_iv^i$ and $w=\sum_{j=1}^m\beta_jw^j$. Then

$$\alpha v+\beta w=\sum_{i=1}^{n}(\alpha\alpha_i)v^i+\sum_{j=1}^{m}\left(\beta\beta_j\right)w^j\in\operatorname{span}(S).$$

2. Take $v\in\operatorname{span}(S)$. Then there exist $\alpha_1,\ldots,\alpha_n\in F$ and $v^1,\ldots,v^n\in S\subseteq W$ such that $v=\sum_{i=1}^n\alpha_iv^i\in W$, as desired.

Definition 143 Let $V$ be a vector space and $v^1,v^2,\ldots,v^m\in V$. If $V=\operatorname{span}\left(v^1,v^2,\ldots,v^m\right)$, we say that $V$ is the vector space generated (or spanned) by the vectors $v^1,v^2,\ldots,v^m$.

Example 144 1. $\mathbb{R}^3=\operatorname{span}(\{(1,0,0),(0,1,0),(0,0,1)\})$.
2. $\operatorname{span}\left(\{t^n\}_{n\in\mathbb{N}}\right)$ is equal to the vector space of all polynomials.

4.5 Row and column space of a matrix

Definition 145 Given $A\in M(m,n)$,

$$\operatorname{row\,span}A:=\operatorname{span}(R_1(A),\ldots,R_i(A),\ldots,R_m(A))$$

is called the row space of $A$, or row span of $A$. The column space of $A$, or $\operatorname{col\,span}A$, is

$$\operatorname{col\,span}A:=\operatorname{span}(C_1(A),\ldots,C_j(A),\ldots,C_n(A)).$$

Remark 146 Given $A\in M(m,n)$,

$$\operatorname{col\,span}A=\operatorname{row\,span}A^T.$$

Remark 147 Linear combinations of columns and rows of a matrix.

Let $A\in M(m,n)$, $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$. Then, $\forall j\in\{1,\ldots,n\}$, $C_j(A)\in\mathbb{R}^m$ and, $\forall i\in\{1,\ldots,m\}$, $R_i(A)\in\mathbb{R}^n$. Then,

$$Ax=[C_1(A),\ldots,C_j(A),\ldots,C_n(A)]\cdot\begin{bmatrix}x_1\\ \vdots\\ x_j\\ \vdots\\ x_n\end{bmatrix}=\sum_{j=1}^{n}x_j\cdot C_j(A)$$

and

$Ax$ is a linear combination of the columns of $A$ via the components of the vector $x$.

Moreover,

$$yA=[y_1,\ldots,y_i,\ldots,y_m]\begin{bmatrix}R_1(A)\\ \vdots\\ R_i(A)\\ \vdots\\ R_m(A)\end{bmatrix}=\sum_{i=1}^{m}y_i\cdot R_i(A)$$

and

$yA$ is a linear combination of the rows of $A$ via the components of the vector $y$.

As a consequence of the above observation, we have what follows (a numerical illustration is given after item 2).

1. $\operatorname{row\,span}A=\{w\in\mathbb{R}^n:\exists\,y\in\mathbb{R}^m\ \text{such that}\ w=yA\}$.

2. $\operatorname{col\,span}A=\{z\in\mathbb{R}^m:\exists\,x\in\mathbb{R}^n\ \text{such that}\ z=Ax\}$.
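As a quick numerical illustration (numpy; added to these notes, not part of the original text): $Ax$ coincides with the $x$-weighted sum of the columns of $A$, and $yA$ with the $y$-weighted sum of its rows.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 1., 4.]])
x = np.array([2., -1., 3.])   # weights on the n = 3 columns
y = np.array([1., 2.])        # weights on the m = 2 rows

col_comb = sum(x[j] * A[:, j] for j in range(A.shape[1]))
row_comb = sum(y[i] * A[i, :] for i in range(A.shape[0]))

assert np.allclose(A @ x, col_comb)   # Ax: combination of columns
assert np.allclose(y @ A, row_comb)   # yA: combination of rows
```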

Proposition 148 Given $A,B\in M(m,n)$:
1. if $A$ is row equivalent to $B$, then $\operatorname{row\,span}A=\operatorname{row\,span}B$;
2. if $A$ is column equivalent to $B$, then $\operatorname{col\,span}A=\operatorname{col\,span}B$.

Proof. 1. $B$ is obtained from $A$ via elementary row operations. Therefore, $\forall i\in\{1,\ldots,m\}$, either
i. $R_i(B)=R_i(A)$, or
ii. $R_i(B)$ is a linear combination of rows of $A$.
Therefore, $\operatorname{row\,span}B\subseteq\operatorname{row\,span}A$. Since $A$ is obtained from $B$ via elementary row operations, $\operatorname{row\,span}B\supseteq\operatorname{row\,span}A$.
2. If $A$ is column equivalent to $B$, then $A^T$ is row equivalent to $B^T$ and therefore, from 1. above, $\operatorname{row\,span}A^T=\operatorname{row\,span}B^T$. Then the result follows from Remark 146.

Remark 149 Let $A\in M(m,n)$ be given and assume that

$$b:=(b_j)_{j=1}^{n}=\sum_{i=1}^{m}c_i\cdot R_i(A),$$

i.e., $b$ is a linear combination of the rows of $A$. Then,

$$\forall j\in\{1,\ldots,n\},\quad b_j=\sum_{i=1}^{m}c_i\cdot R_i^j(A),$$

where, $\forall i\in\{1,\ldots,m\}$ and $\forall j\in\{1,\ldots,n\}$, $R_i^j(A)$ is the $j$-th component of the $i$-th row $R_i(A)$ of $A$.

Lemma 150 Assume that $A,B\in M(m,n)$ are in echelon form with pivots

$$a_{1j_1},\ldots,a_{ij_i},\ldots,a_{rj_r}$$

and

$$b_{1k_1},\ldots,b_{ik_i},\ldots,b_{sk_s},$$

respectively, where $r,s\leq\min\{m,n\}$ (see Remark 30). Then

$$\langle\operatorname{row\,span}A=\operatorname{row\,span}B\rangle\Rightarrow\langle s=r\ \text{and, for}\ i\in\{1,\ldots,s\},\ j_i=k_i\rangle.$$

Proof. Preliminary remark 1. If $A=0$, then $A=B$ and $s=r=0$.

Preliminary remark 2. Assume that $A,B\neq 0$, and therefore $s,r\geq 1$. We want to verify that $j_1=k_1$. Suppose $j_1<k_1$. Then, by definition of echelon matrix, $C_{j_1}(B)=0$, otherwise you would contradict Property 2 of Definition 28 of echelon matrix. From the assumption that $\operatorname{row\,span}A=\operatorname{row\,span}B$, we have that $R_1(A)$ is a linear combination of the rows of $B$ via some coefficients $c_1,\ldots,c_m$, and, from Remark 149 and the fact that $C_{j_1}(B)=0$, we have that $a_{1j_1}=c_1\cdot 0+\ldots+c_m\cdot 0=0$, contradicting the fact that $a_{1j_1}$ is a pivot of $A$. Therefore, $j_1\geq k_1$. A perfectly symmetric argument shows that $j_1\leq k_1$.

We can now prove the result by induction on the number $m$ of rows.

Step 1. $m=1$. It is basically the proof of Preliminary remark 2.

Step 2. Given $A,B\in M(m,n)$, define $A',B'\in M(m-1,n)$ as the matrices obtained by erasing the first row of $A$ and $B$, respectively. From Remark 33, $A'$ and $B'$ are still in echelon form. If we show that $\operatorname{row\,span}A'=\operatorname{row\,span}B'$, from the induction assumption, and using Preliminary remark 2, we get the desired result.

Let $R=(a_1,\ldots,a_n)$ be any row of $A'$. Since $R\in\operatorname{row\,span}B$, $\exists\,(d_i)_{i=1}^m$ such that

$$R=\sum_{i=1}^{m}d_iR_i(B).$$

Since $A$ is in echelon form and we erased its first row, we have that if $i\leq j_1=k_1$, then $a_i=0$, otherwise we would contradict the definition of $j_1$. Since $B$ is in echelon form, each entry of its $k_1$-th column is zero except $b_{1k_1}$, which is different from zero. Then,

$$a_{k_1}=0=\sum_{i=1}^{m}d_i\cdot b_{ik_1}=d_1\cdot b_{1k_1},$$

and therefore $d_1=0$, i.e., $R=\sum_{i=2}^m d_iR_i(B)$, i.e., $R\in\operatorname{row\,span}B'$, as desired. A symmetric argument shows the other inclusion.

Proposition 151 Assume that $A,B\in M(m,n)$ are in row canonical form. Then

$$\langle\operatorname{row\,span}A=\operatorname{row\,span}B\rangle\Leftrightarrow\langle A=B\rangle.$$

Proof. [⇐] Obvious.
[⇒] From Lemma 150, the number of pivots in $A$ and $B$ is the same. Therefore, $A$ and $B$ have the same number $s$ of nonzero rows, which in fact are the first $s$ rows. Take $i\in\{1,\ldots,s\}$. Since $\operatorname{row\,span}A=\operatorname{row\,span}B$, there exists $(c_h)_{h=1}^s$ such that

$$R_i(A)=\sum_{h=1}^{s}c_h\cdot R_h(B).\tag{4.1}$$

We then want to show that $c_i=1$ and, $\forall l\in\{1,\ldots,s\}\setminus\{i\}$, $c_l=0$.

Let $a_{ij_i}$ be the pivot of $R_i(A)$, i.e., $a_{ij_i}$ is the nonzero $j_i$-th component of $R_i(A)$. Then, from Remark 149,

$$a_{ij_i}=\sum_{h=1}^{s}c_h\cdot R_h^{j_i}(B)=\sum_{h=1}^{s}c_h\cdot b_{hj_i}.\tag{4.2}$$

From Lemma 150, for $i\in\{1,\ldots,s\}$, $j_i=k_i$, and therefore $b_{ij_i}$ is a pivot entry of $B$; since $B$ is in row canonical form, $b_{ij_i}$ is the only nonzero element in the $j_i$-th column of $B$. Therefore, from (4.2),

$$a_{ij_i}=c_i\cdot b_{ij_i}.$$

Since $A$ and $B$ are in row canonical form, $a_{ij_i}=b_{ij_i}=1$ and therefore

$$c_i=1.$$

Now take $l\in\{1,\ldots,s\}\setminus\{i\}$ and consider the pivot element $b_{lj_l}$ of $R_l(B)$. From (4.1) and Remark 149,

$$a_{ij_l}=\sum_{h=1}^{s}c_h\cdot b_{hj_l}=c_l,\tag{4.3}$$

where the last equality follows from the fact that $B$ is in row canonical form and therefore $b_{lj_l}$ is the only nonzero element in the $j_l$-th column of $B$; in fact, $b_{lj_l}=1$. From Lemma 150, since $b_{lj_l}$ is a pivot element of $B$, $a_{lj_l}$ is a pivot element of $A$. Since $A$ is in row canonical form, $a_{lj_l}$ is the only nonzero element in column $j_l$ of $A$. Therefore, since $l\neq i$, $a_{ij_l}=0$, and from (4.3) the desired result,

$$\forall l\in\{1,\ldots,s\}\setminus\{i\},\quad c_l=0,$$

does follow.

Proposition 152 For every $A\in M(m,n)$, there exists a unique $B\in M(m,n)$ which is in row canonical form and row equivalent to $A$.

Proof. The existence of at least one matrix with the desired properties is the content of Proposition 38. Suppose that there exist $B_1$ and $B_2$ with those properties. Then, from Proposition 148, we get

$$\operatorname{row\,span}A=\operatorname{row\,span}B_1=\operatorname{row\,span}B_2.$$

From Proposition 151, $B_1=B_2$.


Corollary 153 1. For any matrix $A\in M(m,n)$ there exists a unique number $r\in\{0,1,\ldots,\min\{m,n\}\}$ such that $A$ is equivalent to the block matrix of the form

$$\begin{bmatrix}I_r&0\\0&0\end{bmatrix}.$$

2. For any $A\in M(m,n)$, there exist invertible matrices $P\in M(m,m)$ and $Q\in M(n,n)$ and a unique number $r\in\{0,1,\ldots,\min\{m,n\}\}$ such that

$$PAQ=\begin{bmatrix}I_r&0\\0&0\end{bmatrix}$$

Proof. 1. From Step 1 in the proof of Proposition 122 and from Proposition 152, there exists a unique matrix $A^*$ which is row equivalent to $A$ and is in row canonical form. From Steps 2 and 3 in the proof of Proposition 122 and from Proposition 152, there exists a unique matrix $\begin{bmatrix}I_r&0\\0&0\end{bmatrix}^T$ which is row equivalent to $A^{*T}$ and is in row canonical form. Therefore, the desired result follows.

2. From Proposition 105.2, $PA$ is row equivalent to $A$; from Proposition 119.2, $PAQ$ is column equivalent to $PA$. Therefore, $PAQ$ is equivalent to $A$. From Proposition 124, $\begin{bmatrix}I_r&0\\0&0\end{bmatrix}$ is equivalent to $A$. From part 1 of the present Proposition, the desired result then follows.

4.6 Linear dependence and independence

Definition 154 Let $V$ be a vector space on a field $F$, $m\in\mathbb{N}\setminus\{0\}$ and $v^1,v^2,\ldots,v^m\in V$. The set $S=\{v^1,v^2,\ldots,v^m\}$ is a set of linearly dependent vectors if

either $S=\{0\}$,
or $\exists\,k\in\{1,\ldots,m\}$ and $(m-1)\geq 1$ coefficients $\alpha_j\in F$, $j\in\{1,\ldots,m\}\setminus\{k\}$, such that

$$v^k=\sum_{j\in\{1,\ldots,m\}\setminus\{k\}}\alpha_jv^j,$$

or, shortly,

$$v^k=\sum_{j\neq k}\alpha_jv^j,$$

i.e., there exists a vector equal to a linear combination of the other vectors.

Geometrical example in R2.

Definition 155 Let $V$ be a vector space and $S\subseteq V$. The set $S$ is linearly dependent if either $S=\{0\}$ or there exists a subset of $S$ of finite cardinality which is linearly dependent.

Proposition 156 Let $V$ be a vector space on a field $F$ and $v^1,v^2,\ldots,v^m\in V$. The set $S=\{v^1,v^2,\ldots,v^m\}\subseteq V$ is a set of linearly dependent vectors if and only if

$$\exists\,(\beta_1,\ldots,\beta_i,\ldots,\beta_m)\in F^m\setminus\{0\}\ \text{such that}\ \sum_{i=1}^{m}\beta_i\cdot v^i=0,\tag{4.4}$$

i.e., there exists a linear combination of the vectors equal to the null vector and with some nonzero coefficient.

Proof. [⇒] If $\#S=1$, i.e., $S=\{0\}$, any $\beta\in F\setminus\{0\}$ is such that $\beta\cdot 0=0$. Assume then that $\#S>1$. Take

$$\beta_i=\begin{cases}\alpha_i&\text{if}\ i\neq k\\-1&\text{if}\ i=k\end{cases}$$


[⇐] If $S=\{v\}$ and $\exists\beta\in F\setminus\{0\}$ such that $\beta\cdot v=0$, then, from Proposition 135.3, $v=0$. Assume then that $\#S>1$. Without loss of generality, take $\beta_1\neq 0$. Then

$$\beta_1v^1+\sum_{i\neq 1}\beta_iv^i=0$$

and

$$v^1=\sum_{i\neq 1}\left(-\frac{\beta_i}{\beta_1}\right)v^i.$$

Proposition 157 Let $m\geq 2$ and let $v^1,\ldots,v^m$ be nonzero linearly dependent vectors. Then one of the vectors is a linear combination of the preceding vectors, i.e., $\exists\,k>1$ and $\left(\alpha_i\right)_{i=1}^{k-1}$ such that

$$v^k=\sum_{i=1}^{k-1}\alpha_iv^i.$$

Proof. Since $\{v^1,\ldots,v^m\}$ are linearly dependent, $\exists\,(\beta_i)_{i=1}^m\in F^m\setminus\{0\}$ such that $\sum_{i=1}^m\beta_iv^i=0$. Let $k$ be the largest $i$ such that $\beta_i\neq 0$, i.e.,

$$\exists\,k\in\{1,\ldots,m\}\ \text{such that}\ \beta_k\neq 0\ \text{and}\ \forall i\in\{k+1,\ldots,m\},\ \beta_i=0.\tag{4.5}$$

Consider the case $k=1$. Then we would have $\beta_1\neq 0$ and, $\forall i>1$, $\beta_i=0$, and therefore $0=\sum_{i=1}^m\beta_iv^i=\beta_1v^1$, contradicting the assumption that $v^1,\ldots,v^m$ are nonzero vectors. Then we must have $k>1$, and from (4.5) we have

$$0=\sum_{i=1}^{m}\beta_iv^i=\sum_{i=1}^{k}\beta_iv^i$$

and

$$\beta_kv^k=-\sum_{i=1}^{k-1}\beta_iv^i,$$

or, as desired,

$$v^k=\sum_{i=1}^{k-1}\left(-\frac{\beta_i}{\beta_k}\right)v^i.$$

It is then enough to choose $\alpha_i=-\frac{\beta_i}{\beta_k}$ for any $i\in\{1,\ldots,k-1\}$.

Example 158 Take the vectors $x^1=(1,2)$, $x^2=(-1,-2)$ and $x^3=(0,4)$. $S:=\{x^1,x^2,x^3\}$ is a set of linearly dependent vectors: $x^1=-1\cdot x^2+0\cdot x^3$. Observe that there are no $\alpha_1,\alpha_2\in\mathbb{R}$ such that $x^3=\alpha_1\cdot x^1+\alpha_2\cdot x^2$.

Definition 159 The set of vectors $S=\{v^1,\ldots,v^m\}\subseteq V$ is called a set of linearly independent vectors if it is not linearly dependent, i.e., $\langle\neg(4.4)\rangle$, i.e., if

$\forall(\beta_1,\ldots,\beta_i,\ldots,\beta_m)\in F^m\setminus\{0\}$, it is the case that $\sum_{i=1}^m\beta_i\cdot v^i\neq 0$,

or

$$(\beta_1,\ldots,\beta_i,\ldots,\beta_m)\in F^m\setminus\{0\}\ \Rightarrow\ \sum_{i=1}^{m}\beta_i\cdot v^i\neq 0,$$

or

$$\sum_{i=1}^{m}\beta_i\cdot v^i=0\ \Rightarrow\ (\beta_1,\ldots,\beta_i,\ldots,\beta_m)=0,$$

i.e., the only linear combination of the vectors which is equal to the null vector has each coefficient equal to zero.

Page 54: Eui math-course-2012-09--19

52 CHAPTER 4. VECTOR SPACES

Example 160 The vectors $(1,2)$, $(1,5)$ are linearly independent.

Example 161 Let $V$ be the vector space of all functions $f:\mathbb{R}\to\mathbb{R}$. The functions $f,g,h$ defined below are linearly independent:

$$f(x)=e^{2x},\quad g(x)=x^2,\quad h(x)=x.$$

Suppose there exists $(\alpha_1,\alpha_2,\alpha_3)\in\mathbb{R}^3$ such that

$$\alpha_1\cdot f+\alpha_2\cdot g+\alpha_3\cdot h=0_V,$$

which means that

$$\forall x\in\mathbb{R},\quad \alpha_1\cdot f(x)+\alpha_2\cdot g(x)+\alpha_3\cdot h(x)=0.$$

We want to show that $(\alpha_1,\alpha_2,\alpha_3)=(0,0,0)$. The trick is to find appropriate values of $x$ to get the desired value of $(\alpha_1,\alpha_2,\alpha_3)$. Choose $x$ to take the values $0,1,-1$. We obtain the following system of equations:

$$\begin{cases}\alpha_1=0\\ e^{2}\,\alpha_1+\alpha_2+\alpha_3=0\\ e^{-2}\,\alpha_1+\alpha_2-\alpha_3=0\end{cases}$$

It then follows that $(\alpha_1,\alpha_2,\alpha_3)=(0,0,0)$, as desired.
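The same computation can be checked numerically. In the sketch below (numpy; an illustration added to these notes), the 3 × 3 matrix of values of $f,g,h$ at $x=0,1,-1$ is invertible, so the only solution of the system is the zero vector.

```python
import numpy as np

xs = np.array([0.0, 1.0, -1.0])
# Columns: values of f(x) = e^{2x}, g(x) = x^2, h(x) = x at the chosen points.
M = np.column_stack([np.exp(2 * xs), xs**2, xs])

# M is invertible, hence M @ (a1, a2, a3) = 0 forces a1 = a2 = a3 = 0.
print(np.linalg.det(M))                  # nonzero (= -2)
print(np.linalg.solve(M, np.zeros(3)))   # [0. 0. 0.]
```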

Geometrical example in R2.

Definition 162 Let $V$ be a vector space and $S\subseteq V$. The set $S$ is linearly independent if every finite subset of $S$ is linearly independent.

Remark 163 From Remark 147, we have what follows:

$$\langle Ax=0\Rightarrow x=0\rangle\Leftrightarrow\langle\text{the column vectors of }A\text{ are linearly independent}\rangle\tag{4.6}$$

$$\langle yA=0\Rightarrow y=0\rangle\Leftrightarrow\langle\text{the row vectors of }A\text{ are linearly independent}\rangle\tag{4.7}$$
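In $\mathbb{R}^n$, conditions (4.6)-(4.7) can be tested numerically. A hedged sketch follows (numpy; an illustration added to these notes), using numpy.linalg.matrix_rank, which estimates the rank: the columns are independent exactly when the rank equals the number of columns.

```python
import numpy as np

def columns_independent(A):
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[1]

A = np.array([[1., 1.],
              [2., 5.]])          # columns (1,2) and (1,5) from Example 160
print(columns_independent(A))     # True: test of (4.6)
print(columns_independent(A.T))   # rows of A, via (4.7): also True
```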

Remark 164 ∅ is a set of linearly independent vectors: see Definition 159.

Example 165 Consider the vectors $x^1=(1,0,0,0)\in\mathbb{R}^4$ and $x^2=(0,1,0,0)$. Observe that $\alpha_1x^1+\alpha_2x^2=0$ means $(\alpha_1,\alpha_2,0,0)=(0,0,0,0)$, so $x^1$ and $x^2$ are linearly independent.

Exercise 166 Say whether the following set is a set of linearly dependent or independent vectors: $S=\{x^1,x^2,x^3\}$ with $x^1=(3,2,1)$, $x^2=(4,1,3)$, $x^3=(3,-3,6)$.

Proposition 167 Let $V$ be a vector space and $v^1,v^2,\ldots,v^m\in V$. If $S=\{v^1,\ldots,v^m\}$ is a set of linearly dependent vectors and $v^{m+1},\ldots,v^{m+k}\in V$ are arbitrary vectors, then

$$S':=S\cup\{v^{m+1},\ldots,v^{m+k}\}=\{v^1,\ldots,v^m,v^{m+1},\ldots,v^{m+k}\}$$

is a set of linearly dependent vectors.

Proof. From the assumptions (in the nontrivial case $S\neq\{0\}$), $\exists\,i^*\in\{1,\ldots,m\}$ and $(\alpha_j)_{j\neq i^*}$ such that

$$v^{i^*}=\sum_{j\neq i^*}\alpha_jv^j.$$

But then

$$v^{i^*}=\sum_{j\neq i^*}\alpha_jv^j+0\cdot v^{m+1}+\ldots+0\cdot v^{m+k}.$$

Proposition 168 If $S=\{v^1,\ldots,v^m\}\subseteq V$ is a set of linearly independent vectors, then any $S'\subseteq S$ is a set of linearly independent vectors.

Proof. Suppose otherwise; then you contradict the previous Proposition.


Remark 169 Consider vectors in $\mathbb{R}^n$.

1. Adding components to linearly dependent vectors gives rise to linearly dependent or independent vectors;

2. Eliminating components from linearly independent vectors gives rise to linearly dependent or independent vectors;

3. Adding components to linearly independent vectors gives rise to linearly independent vectors;

4. Eliminating components from linearly dependent vectors gives rise to linearly dependent vectors.

To verify 1. and 2. above, consider the following two vectors:

$$\begin{bmatrix}1\\1\\0\end{bmatrix},\qquad\begin{bmatrix}2\\2\\1\end{bmatrix}$$

To verify 4., if you have a set of linearly dependent vectors, it is possible to express one vector as a linear combination of the others and then eliminate components, leaving the equality still true. The proof of 3. is contained in the following Proposition.

Proposition 170 If $S_x=\{x^1,\ldots,x^m\}\subseteq\mathbb{R}^n$ is a set of linearly independent vectors and $S_y=\{y^1,\ldots,y^m\}\subseteq\mathbb{R}^k$ is a set of vectors, then $S=\{(x^1,y^1),\ldots,(x^m,y^m)\}\subseteq\mathbb{R}^{n+k}$ is a set of linearly independent vectors.

Proof. By assumption,

$$\sum_{i=1}^{m}\beta_i\cdot x^i=0\ \Rightarrow\ (\beta_1,\ldots,\beta_i,\ldots,\beta_m)=0.$$

$S$ is a set of linearly independent vectors if

$$\sum_{i=1}^{m}\beta_i\cdot\left(x^i,y^i\right)=0\ \Rightarrow\ (\beta_1,\ldots,\beta_i,\ldots,\beta_m)=0.$$

Since

$$\sum_{i=1}^{m}\beta_i\cdot\left(x^i,y^i\right)=0\ \Leftrightarrow\ \sum_{i=1}^{m}\beta_i\cdot x^i=0\ \text{and}\ \sum_{i=1}^{m}\beta_i\cdot y^i=0,$$

the desired result follows.

Corollary 171 If $S_x=\{x^1,\ldots,x^m\}\subseteq\mathbb{R}^n$ is a set of linearly independent vectors and $S_y=\{y^1,\ldots,y^m\}\subseteq\mathbb{R}^k$ and $S_z=\{z^1,\ldots,z^m\}\subseteq\mathbb{R}^l$ are sets of vectors, then

$$S=\left\{\left(z^1,x^1,y^1\right),\ldots,\left(z^m,x^m,y^m\right)\right\}\subseteq\mathbb{R}^{l+n+k}$$

is a set of linearly independent vectors.

Example 172 1. The set of vectors $\{v^1,\ldots,v^m\}$ is a linearly dependent set if $\exists\,k\in\{1,\ldots,m\}$ such that $v^k=0$:

$$v^k+\sum_{i\neq k}0\cdot v^i=0.$$

2. The set of vectors $\{v^1,\ldots,v^m\}$ is a linearly dependent set if $\exists\,k,k'\in\{1,\ldots,m\}$, $k\neq k'$, and $\alpha\in\mathbb{R}$ such that $v^{k'}=\alpha v^k$:

$$v^{k'}-\alpha v^k+\sum_{i\neq k,k'}0\cdot v^i=0.$$

3. Two vectors are linearly dependent if and only if one is a multiple of the other.

Proposition 173 The nonzero rows of a matrix $A$ in echelon form are linearly independent.

Proof. We will show that each row of $A$, starting from the first one, is not a linear combination of the subsequent rows. Then, as a consequence of Proposition 157, the desired result will follow. Since $A$ is in echelon form, the first row has a pivot below which all the elements are zero. Then that row cannot be a linear combination of the following rows. A similar argument applies to the other rows.


4.7 Basis and dimension

Definition 174 A set $S$ (of arbitrary cardinality) in a vector space $V$ on a field $F$ is a basis of $V$ if

1. $S$ is a linearly independent set;

2. $\operatorname{span}(S)=V$.

Proposition 175 A set $S=\{u^1,u^2,\ldots,u^n\}\subseteq V$ is a basis of $V$ on a field $F$ if and only if $\forall v\in V$ there exists a unique $(\alpha_i)_{i=1}^n\in F^n$ such that $v=\sum_{i=1}^n\alpha_iu^i$.

Proof. [⇒] Suppose there exist $(\alpha_i)_{i=1}^n,(\beta_i)_{i=1}^n\in F^n$ such that $v=\sum_{i=1}^n\alpha_iu^i=\sum_{i=1}^n\beta_iu^i$. Then

$$0=\sum_{i=1}^{n}\alpha_iu^i-\sum_{i=1}^{n}\beta_iu^i=\sum_{i=1}^{n}(\alpha_i-\beta_i)u^i.$$

Since $\{u^1,u^2,\ldots,u^n\}$ are linearly independent,

$$\forall i\in\{1,\ldots,n\},\quad\alpha_i-\beta_i=0,$$

as desired.

[⇐] Clearly $V=\operatorname{span}(S)$; we are left with showing that $\{u^1,u^2,\ldots,u^n\}$ are linearly independent. Consider $\sum_{i=1}^n\alpha_iu^i=0$. Moreover, $\sum_{i=1}^n 0\cdot u^i=0$. But since there exists a unique $(\alpha_i)_{i=1}^n\in F^n$ such that $0=\sum_{i=1}^n\alpha_iu^i$, it must be the case that $\forall i\in\{1,\ldots,n\}$, $\alpha_i=0$.

Lemma 176 Suppose that, given a vector space $V$, $\operatorname{span}\left(v^1,\ldots,v^m\right)=V$.

1. If $w\in V$, then $\{w,v^1,\ldots,v^m\}$ is linearly dependent and $\operatorname{span}\left(w,v^1,\ldots,v^m\right)=V$;

2. If $v^i$ is a linear combination of $\{v^j\}_{j=1}^{i-1}$, then $\operatorname{span}\left(v^1,\ldots,v^{i-1},v^{i+1},\ldots,v^m\right)=V$.

Proof. Obvious.

Lemma 177 (Replacement Lemma) Given a vector space $V$, if

1. $\operatorname{span}\left(v^1,\ldots,v^n\right)=V$,

2. $\{w^1,\ldots,w^m\}\subseteq V$ is linearly independent,

then

1. $n\geq m$;

2. a. If $n=m$, then $\operatorname{span}\left(w^1,\ldots,w^m\right)=V$.
   b. If $n>m$, there exists $\left\{v^{i_1},\ldots,v^{i_{n-m}}\right\}\subseteq\{v^1,\ldots,v^n\}$ such that

$$\operatorname{span}\left(w^1,\ldots,w^m,v^{i_1},\ldots,v^{i_{n-m}}\right)=V.$$

Proof. Observe preliminarily that, since $\{w^1,\ldots,w^m\}$ is linearly independent, $\forall j\in\{1,\ldots,m\}$, $w^j\neq 0$.

Define $I_1$ as the subset of $\{1,\ldots,n\}$ such that $\forall i\in I_1$, $v^i\neq 0$. Then, clearly, $\operatorname{span}\left(\{v^i\}_{i\in I_1}\right)=\operatorname{span}\left(\{v^1,\ldots,v^n\}\right)=V$. We are going to show that $m\leq\#I_1$, and since $\#I_1\leq n$, our result will imply conclusion 1. Moreover, we are going to show that there exists $\left\{v^{i_1},\ldots,v^{i_{\#I_1-m}}\right\}\subseteq\{v^i\}_{i\in I_1}\subseteq\{v^1,\ldots,v^n\}$ such that $\operatorname{span}\left(\left\{w^1,\ldots,w^m,v^{i_1},\ldots,v^{i_{\#I_1-m}}\right\}\right)=V$, and that result will imply conclusion 2.

Using the above observation, and to make notation easier, we will assume that $I_1=\{1,\ldots,n\}$, i.e., $\forall i\in\{1,\ldots,n\}$, $v^i\neq 0$.


Now consider the case $n=1$. $\{w^1,\ldots,w^m\}\subseteq V$ implies that there exists $(\alpha_j)_{j=1}^m\in F^m$ such that $\forall j$, $\alpha_j\neq 0$ and $w^j=\alpha_jv^1$; then it has to be that $m=1$ (and conclusion 1 holds) and, since $v^1=\frac{1}{\alpha_1}w^1$, $\operatorname{span}\left(w^1\right)=V$ (and conclusion 2 holds).

In conclusion, we can assume $n\geq 2$. First of all, observe that, from Lemma 176.1, $\{w^1,v^1,\ldots,v^n\}$ is linearly dependent and

$$\operatorname{span}\left(w^1,v^1,\ldots,v^n\right)=V.$$

By Proposition 157, there exists $k_1\in\{1,\ldots,n\}$ such that $v^{k_1}$ is a linear combination of the preceding vectors. Then, from Lemma 176.2, we have

$$\operatorname{span}\left(w^1,\left(v^i\right)_{i\neq k_1}\right)=V.$$

Then, again from Lemma 176.1, $\left\{w^1,w^2,\left(v^i\right)_{i\neq k_1}\right\}$ is linearly dependent and $\operatorname{span}\left(w^1,w^2,\left(v^i\right)_{i\neq k_1}\right)=V$. By Proposition 157, there exists $k_2\in\{1,\ldots,n\}\setminus\{k_1\}$ such that $v^{k_2}$ or $w^2$ is a linear combination of the preceding vectors. That vector cannot be $w^2$, because of assumption 2. Therefore,

$$\operatorname{span}\left(w^1,w^2,\left(v^i\right)_{i\neq k_1,k_2}\right)=V.$$

We can now distinguish three cases: $m<n$, $m=n$ and $m>n$. If $m<n$, after $m$ steps of the above procedure we get

$$\operatorname{span}\left(w^1,\ldots,w^m,\left(v^i\right)_{i\neq k_1,k_2,\ldots,k_m}\right)=V,$$

which shows 2.b. If $m=n$, we have

$$\operatorname{span}\left(w^1,\ldots,w^m\right)=V,$$

which shows 2.a.

Let us now show that it cannot be that $m>n$. Suppose that is the case. Then, after $n$ of the above steps, we get $\operatorname{span}\left(w^1,\ldots,w^n\right)=V$, and therefore $w^{n+1}$ is a linear combination of $\left(w^1,\ldots,w^n\right)$, contradicting assumption 2.

Proposition 178 Assume that $S=\{u^1,u^2,\ldots,u^n\}$ and $T=\{v^1,v^2,\ldots,v^m\}$ are bases of $V$. Then $n=m$.

Proof. By definition of basis, we have that

$\operatorname{span}\left(\{u^1,u^2,\ldots,u^n\}\right)=V$ and $\{v^1,v^2,\ldots,v^m\}$ are linearly independent.

Then, from Lemma 177, $m\leq n$. Similarly,

$\operatorname{span}\left(\{v^1,v^2,\ldots,v^m\}\right)=V$ and $\{u^1,u^2,\ldots,u^n\}$ are linearly independent,

and, from Lemma 177, $n\leq m$.

The above Proposition allows us to give the following Definition.

Definition 179 A vector space $V$ has dimension $n$ if there exists a basis of $V$ whose cardinality is $n$. In that case, we say that $V$ has finite dimension (equal to $n$) and we write $\dim V=n$. If a vector space does not have finite dimension, it is said to be of infinite dimension.

Definition 180 The vector space $\{0\}$ has dimension 0.

Example 181 1. A basis of $\mathbb{R}^n$ is $\{e^1,\ldots,e^i,\ldots,e^n\}$, where $e^i$ is defined in Definition 55. That basis is called the canonical basis.
2. Consider the vector space $P_n(t)$ of polynomials of degree $\leq n$. The set $\{t^0,t^1,\ldots,t^n\}$ of polynomials is a basis of $P_n(t)$ and therefore $\dim P_n(t)=n+1$.

Proposition 182 Let V be a vector space of dimension n.


1. Any $m>n$ vectors in $V$ are linearly dependent;

2. If $S=\{u^1,\ldots,u^n\}\subseteq V$ is a linearly independent set, then it is a basis of $V$;

3. If $\operatorname{span}\left(u^1,\ldots,u^n\right)=V$, then $\{u^1,\ldots,u^n\}$ is a basis of $V$.

Proof. Let $\{w^1,\ldots,w^n\}$ be a basis of $V$. Then $\operatorname{span}\left(\{w^1,\ldots,w^n\}\right)=V$.

1. We want to show that $m>n$ arbitrary vectors $v^1,\ldots,v^m$ in $V$ are linearly dependent. Suppose otherwise; then, by Lemma 177, we would have $m\leq n$, a contradiction.

2. It is the content of Lemma 177.2.a.

3. We have to show that $\{u^1,\ldots,u^n\}$ are linearly independent. Suppose otherwise. Then there exists $k\in\{1,\ldots,n\}$ such that $\operatorname{span}\left(\left\{\left(u^i\right)_{i\neq k}\right\}\right)=V$, but since $\{w^1,\ldots,w^n\}$ is linearly independent, from Lemma 177 (the Replacement Lemma), we have $n\leq n-1$, a contradiction.

Remark 183 The above Proposition 182 shows that, in the case of finite dimensional vector spaces, one of the two conditions defining a basis is sufficient to obtain a basis.

Proposition 184 (Completion Lemma) Let $V$ be a vector space of dimension $n$ and $\{w^1,\ldots,w^m\}\subseteq V$ a linearly independent set, with $m\leq n$ (the inequality follows from Proposition 182.1). If $m<n$, then there exists a set $\{u^1,\ldots,u^{n-m}\}$ such that

$$\{w^1,\ldots,w^m,u^1,\ldots,u^{n-m}\}$$

is a basis of $V$.

Proof. Take a basis $\{v^1,\ldots,v^n\}$ of $V$. Then, from conclusion 2.b in Lemma 177,

$$\operatorname{span}\left(w^1,\ldots,w^m,v^{i_1},\ldots,v^{i_{n-m}}\right)=V.$$

Then, from Proposition 182.3, we get the desired result.

Proposition 185 Let $W$ be a subspace of an $n$-dimensional vector space $V$. Then

1. $\dim W\leq n$;

2. If $\dim W=n$, then $W=V$.

Proof. 1. From Proposition 182.1, $m>n$ vectors in $V$ are linearly dependent. Since a basis of $W$ is a set of linearly independent vectors, $\dim W\leq n$.
2. If $\{w^1,\ldots,w^n\}$ is a basis of $W$, then $\operatorname{span}\left(w^1,\ldots,w^n\right)=W$. Moreover, those vectors are $n$ linearly independent vectors in $V$. Therefore, from Proposition 182.2, $\operatorname{span}\left(w^1,\ldots,w^n\right)=V$.

Remark 186 As a trivial consequence of Lemma 177, $V=\operatorname{span}\{u^1,\ldots,u^r\}\Rightarrow\dim V\leq r$.

Example 187 Let $W$ be a subspace of $\mathbb{R}^3$, whose dimension is 3. Then, from the previous Proposition, $\dim W\in\{0,1,2,3\}$. In fact:
1. if $\dim W=0$, then $W=\{0\}$, i.e., a point;
2. if $\dim W=1$, then $W$ is a straight line through the origin;
3. if $\dim W=2$, then $W$ is a plane through the origin;
4. if $\dim W=3$, then $W=\mathbb{R}^3$.

Definition 188 A maximal linearly independent subset $S'$ of a set of vectors $S\subseteq V$ is a subset of $S$ such that
1. $S'$ is linearly independent, and
2. $S'\subsetneq S''\subseteq S\Rightarrow S''$ is linearly dependent, i.e., if $S''$ is another subset of $S$ which contains $S'$ and whose cardinality is bigger than the cardinality of $S'$, then $S''$ is a linearly dependent set.


Lemma 189 Given a vector space $V$, if
1. $S=\{v^1,\ldots,v^k\}\subseteq V$ is linearly independent, and
2. $S'=S\cup\{v^{k+1}\}$ is linearly dependent,
then $v^{k+1}$ is a linear combination of the vectors in $S$.

Proof. First proof. Since $S'$ is a linearly dependent set,

$$\exists\,i\in\{1,\ldots,k+1\}\ \text{and}\ \left(\beta_j\right)_{j\in\{1,\ldots,k+1\}\setminus\{i\}}\ \text{such that}\ v^i=\sum_{j\in\{1,\ldots,k+1\}\setminus\{i\}}\beta_jv^j.$$

If $i=k+1$, we are done. If $i\neq k+1$, without loss of generality take $i=1$. Then

$$\exists\,\left(\gamma_j\right)_{j\in\{2,\ldots,k+1\}}\ \text{such that}\ v^1=\sum_{j=2}^{k+1}\gamma_jv^j.$$

If $\gamma_{k+1}=0$, we would have $v^1-\sum_{j=2}^k\gamma_jv^j=0$, contradicting assumption 1. Then

$$v^{k+1}=\frac{1}{\gamma_{k+1}}\left(v^1-\sum_{j=2}^{k}\gamma_jv^j\right).$$

Second proof. It follows from Proposition 157, as shown below. Observe that, for any $i\in\{1,\ldots,k\}$, $v^i\neq 0$, otherwise $S$ would be linearly dependent. If $v^{k+1}=0$, we are done. If $v^{k+1}\neq 0$, assume the conclusion is false; then, from Proposition 157 applied to $v^1,\ldots,v^{k+1}$, there would exist $k'\in\{2,\ldots,k\}$ such that $v^{k'}$ is a linear combination of the preceding vectors, contradicting assumption 1.

Remark 190 Let $V$ be a vector space and $S\subseteq T\subseteq V$. Then:
1. $\operatorname{span}S\subseteq\operatorname{span}T$;
2. $\operatorname{span}(\operatorname{span}S)=\operatorname{span}S$.

Proposition 191 If $S$ is a finite set of vectors in $V$ and $S'$ is a maximal linearly independent subset of $S$, then $S'$ is a basis of $\operatorname{span}S$.

Proof. Let $S'$ be equal to $\{v^1,\ldots,v^n\}$. If $w\in S'$, then $w\in\operatorname{span}S'$; if $w\in S\setminus S'$, then $\{v^1,\ldots,v^n,w\}$ is linearly dependent and, from Lemma 189, $w\in\operatorname{span}S'$. Therefore, $w\in S$ implies $w\in\operatorname{span}S'$, i.e.,

$$S\subseteq\operatorname{span}S'.$$

Then

$$\operatorname{span}S\overset{(1)}{\subseteq}\operatorname{span}(\operatorname{span}S')\overset{(2)}{=}\operatorname{span}S'\overset{(3)}{\subseteq}\operatorname{span}S,$$

where (1) follows from Remark 190.1, (2) from Remark 190.2, and (3) from Remark 190.1. Then $\operatorname{span}S=\operatorname{span}S'$ and therefore $S'$ is a basis of $\operatorname{span}S$.

Proposition 192 Let $S',S''$ be two maximal linearly independent subsets of a finite set $S\subseteq V$. Then $\#S'=\#S''$.

Proof. From Proposition 191, $S'$ and $S''$ are bases of $\operatorname{span}S$. Therefore, they have the same cardinality.

Proposition 192 allows us to give the following definition.

Definition 193 The row rank of $A\in M(m,n)$ is the maximum number of linearly independent rows of $A$ (i.e., the cardinality of each maximal linearly independent subset of the set of the row vectors of $A$).

Definition 194 The column rank of $A\in M(m,n)$ is the maximum number of linearly independent columns of $A$.


Proposition 195 For any $A\in M(m,n)$:

1. the row rank of $A$ is equal to $\dim\operatorname{row\,span}A$;

2. the column rank of $A$ is equal to $\dim\operatorname{col\,span}A$.

Proof. 1. Take a maximal linearly independent subset $S$ of the set of rows of $A$. Then, by definition,

$$\#S=\text{row rank}\ A.\tag{4.8}$$

From Proposition 191, $S$ is a basis of $\operatorname{row\,span}A$, and therefore

$$\#S=\dim\operatorname{row\,span}A.\tag{4.9}$$

(4.8) and (4.9) give the desired result.

2. $\dim\operatorname{col\,span}A\overset{\text{Rmk. 146}}{=}\dim\operatorname{row\,span}A^T=$ maximum number of linearly independent rows of $A^T=$ maximum number of linearly independent columns of $A$.

Proposition 196 For any $A\in M(m,n)$, the row rank of $A$ is equal to the column rank of $A$.

Proof. Let $A$ be an arbitrary $m\times n$ matrix

$$\begin{bmatrix}a_{11}&\ldots&a_{1j}&\ldots&a_{1n}\\&&\vdots&&\\a_{i1}&\ldots&a_{ij}&\ldots&a_{in}\\&&\vdots&&\\a_{m1}&\ldots&a_{mj}&\ldots&a_{mn}\end{bmatrix}$$

Suppose the row rank is $r\leq m$ and the following $r$ vectors form a basis of the row space:

$$S^1=[b_{11},\ldots,b_{1j},\ldots,b_{1n}],\ \ldots,\ S^k=[b_{k1},\ldots,b_{kj},\ldots,b_{kn}],\ \ldots,\ S^r=[b_{r1},\ldots,b_{rj},\ldots,b_{rn}].$$

Then each row vector of $A$ is a linear combination of the above vectors, i.e., we have

$$\forall i\in\{1,\ldots,m\},\quad R_i=\sum_{k=1}^{r}\alpha_{ik}S^k,$$

or

$$\forall i\in\{1,\ldots,m\},\quad [a_{i1},\ldots,a_{ij},\ldots,a_{in}]=\sum_{k=1}^{r}\alpha_{ik}[b_{k1},\ldots,b_{kj},\ldots,b_{kn}],$$

and, setting the $j$-th components of each of the above vector equations equal to each other, we have

$$\forall j\in\{1,\ldots,n\}\ \text{and}\ \forall i\in\{1,\ldots,m\},\quad a_{ij}=\sum_{k=1}^{r}\alpha_{ik}b_{kj},$$

i.e.,

$$\forall j\in\{1,\ldots,n\},\quad\begin{cases}a_{1j}=\sum_{k=1}^{r}\alpha_{1k}b_{kj}\\ \quad\vdots\\ a_{ij}=\sum_{k=1}^{r}\alpha_{ik}b_{kj}\\ \quad\vdots\\ a_{mj}=\sum_{k=1}^{r}\alpha_{mk}b_{kj}\end{cases}$$

or

$$\forall j\in\{1,\ldots,n\},\quad\begin{bmatrix}a_{1j}\\ \vdots\\ a_{ij}\\ \vdots\\ a_{mj}\end{bmatrix}=\sum_{k=1}^{r}b_{kj}\begin{bmatrix}\alpha_{1k}\\ \vdots\\ \alpha_{ik}\\ \vdots\\ \alpha_{mk}\end{bmatrix},$$

i.e., each column of $A$ is a linear combination of the $r$ vectors

$$\begin{bmatrix}\alpha_{11}\\ \vdots\\ \alpha_{i1}\\ \vdots\\ \alpha_{m1}\end{bmatrix},\ \ldots,\ \begin{bmatrix}\alpha_{1k}\\ \vdots\\ \alpha_{ik}\\ \vdots\\ \alpha_{mk}\end{bmatrix},\ \ldots,\ \begin{bmatrix}\alpha_{1r}\\ \vdots\\ \alpha_{ir}\\ \vdots\\ \alpha_{mr}\end{bmatrix}.$$

Then, from Remark 186,

$$\dim\operatorname{col\,span}A\leq r=\text{row rank}\ A,\tag{4.10}$$

i.e.,

$$\text{col rank}\ A\leq\text{row rank}\ A.$$

From (4.10), which holds for an arbitrary matrix $A$, we also get

$$\dim\operatorname{col\,span}A^T\leq\text{row rank}\ A^T.\tag{4.11}$$

Moreover,

$$\dim\operatorname{col\,span}A^T\overset{\text{Rmk. 146}}{=}\dim\operatorname{row\,span}A:=\text{row rank}\ A$$

and

$$\text{row rank}\ A^T:=\dim\operatorname{row\,span}A^T\overset{\text{Rmk. 146}}{=}\dim\operatorname{col\,span}A.$$

Then, from the two above equalities and (4.11), we get

$$\text{row rank}\ A\leq\dim\operatorname{col\,span}A,\tag{4.12}$$

and (4.10) and (4.12) give the desired result.

We can summarize Propositions 195 and 196 as follows.

Corollary 197 For every $A\in M(m,n)$,

$$\text{row rank}\ A=\dim\operatorname{row\,span}A=\text{col rank}\ A=\dim\operatorname{col\,span}A.$$
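Numerically, Corollary 197 is reflected in the fact that the rank of a matrix equals the rank of its transpose. A sketch (numpy; an illustration added to these notes, not part of the original text):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [2., 4., 0.],   # = 2 * first row, so the rank is 2
              [1., 0., 1.]])

# Row rank and column rank coincide (Corollary 197).
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```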

Exercise 198 Check the above result on the following matrix:

$$\begin{bmatrix}1&2&3\\5&1&6\\3&7&10\end{bmatrix}.$$

4.8 Change of basis

Definition 199 Let $V$ be a vector space over a field $F$ with a basis $S=\{v^1,\ldots,v^n\}$. Then, from Proposition 175, $\forall v\in V$ $\exists!\,(\alpha_i)_{i=1}^n\in F^n$ such that $v=\sum_{i=1}^n\alpha_iv^i$. The scalars $(\alpha_i)_{i=1}^n$ are called the coordinates of $v$ relative to the basis $S$ and are denoted by $[v]_S$, or simply by $[v]$ when no ambiguity arises. We also denote the $i$-th component of the vector $[v]_S$ by $[v]_S^i$, and therefore we have $[v]_S=\left([v]_S^i\right)_{i=1}^n$.

Remark 200 If $V=\mathbb{R}^n$ and $S=\{e^i_n\}_{i=1}^n:=e_n$, i.e., the canonical basis, then

$$\forall x\in\mathbb{R}^n,\quad [x]_{e_n}=\left[\sum_{i=1}^{n}x_ie^i_n\right]=x.$$
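In $\mathbb{R}^n$, computing $[v]_S$ amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A sketch with numpy (an illustration added to these notes, not part of the original text):

```python
import numpy as np

# A basis S = {v1, v2} of R^2, stored as the columns of B.
B = np.column_stack([np.array([1., 2.]), np.array([1., 5.])])
v = np.array([3., 0.])

coords = np.linalg.solve(B, v)       # [v]_S solves B @ coords = v
assert np.allclose(B @ coords, v)    # v = coords[0]*v1 + coords[1]*v2
print(coords)                        # [ 5. -2.]
```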

Exercise 201 Consider the vector space $V$ of polynomials of degree $\leq 2$, $\{at^2+bt+c:a,b,c\in\mathbb{R}\}$. Find the coordinates of an arbitrary element of $V$ with respect to the basis

$$\left\{v^1=1,\ v^2=t-1,\ v^3=(t-1)^2\right\}$$

o


Definition 202 Consider a vector space $V$ of dimension $n$ and two bases $v=\{v^i\}_{i=1}^n$ and $u=\{u^k\}_{k=1}^n$ of $V$. Then,

$$\forall k\in\{1,\ldots,n\},\ \exists!\,(a_{ik})_{i=1}^{n}=\begin{bmatrix}a_{1k}\\ \vdots\\ a_{ik}\\ \vdots\\ a_{nk}\end{bmatrix}\ \text{such that}\ u^k=\sum_{i=1}^{n}a_{ik}v^i.\tag{4.13}$$

The matrix

$$P=\begin{bmatrix}a_{11}&\ldots&a_{1k}&\ldots&a_{1n}\\ \vdots&&\vdots&&\vdots\\ a_{i1}&\ldots&a_{ik}&\ldots&a_{in}\\ \vdots&&\vdots&&\vdots\\ a_{n1}&\ldots&a_{nk}&\ldots&a_{nn}\end{bmatrix}\in M(n,n),\tag{4.14}$$

i.e.,

$$P=\left[\ [u^1]_v\ \ldots\ [u^k]_v\ \ldots\ [u^n]_v\ \right]\in M(n,n),$$

is called the change-of-basis matrix from the basis $v$ to the basis $u$.

Remark 203 The name of the matrix $P$ in the above Definition follows from the fact that the entries of $P$ are used to "transform" (in the way described in (4.13)) the vectors of the basis $v$ into the vectors of the basis $u$.

Proposition 204 If $P$ is the change-of-basis matrix from the basis $v$ to the basis $u$, and $Q$ is the change-of-basis matrix from the basis $u$ to the basis $v$, then $P$ and $Q$ are invertible matrices and $P=Q^{-1}$.

Proof. By assumption,

$$\forall k\in\{1,\ldots,n\},\ \exists\,(a_{ik})_{i=1}^{n}\ \text{such that}\ u^k=\sum_{i=1}^{n}a_{ik}v^i,\tag{4.15}$$

$$P=[a_{ik}]\in M(n,n),$$

$$\forall i\in\{1,\ldots,n\},\ \exists\,(b_{ji})_{j=1}^{n}=\begin{bmatrix}b_{1i}\\ \vdots\\ b_{ji}\\ \vdots\\ b_{ni}\end{bmatrix}\ \text{such that}\ v^i=\sum_{j=1}^{n}b_{ji}u^j,\tag{4.16}$$

$$Q=[b_{ji}]\in M(n,n).$$

Substituting (4.16) in (4.15), we get

$$u^k=\sum_{i=1}^{n}a_{ik}\left(\sum_{j=1}^{n}b_{ji}u^j\right)=\sum_{j=1}^{n}\left(\sum_{i=1}^{n}b_{ji}a_{ik}\right)u^j.$$

Now observe that, $\forall k\in\{1,\ldots,n\}$,

$$\left[u^k\right]_u=\left(\sum_{i=1}^{n}b_{ji}a_{ik}\right)_{j=1}^{n}=e^k_n$$


and

$$\left(\sum_{i=1}^{n}b_{ji}a_{ik}\right)_{j=1}^{n}=\left(R_j(Q)\cdot C_k(P)\right)_{j=1}^{n}$$

is the $k$-th column of $QP$. Therefore

$$QP=I,$$

and, from Proposition 126, $P$ and $Q$ are invertible and $P=Q^{-1}$, as desired.

The next Proposition explains how the coordinate vectors are affected by a change of basis.

Proposition 205 Let P be the change-of-basis matrix from v to u. Then,

∀w ∈ V, [w]u = P−1 · [w]v and

[w]v = P · [w]u

Proof. By definition of [w]u, there exists (αk)nk=1 ∈ Rn such that [w]u = (αk)

nk=1 and

w =nX

k=1

αkuk.

Moreover, as said at the beginning of the previous proof,

∀k ∈ {1, ..., n} ,∃ (aik)ni=1 such that uk =Pn

i=1 aikvi.

Then,

w =nX

k=1

αk

ÃnXi=1

aikvi

!=

nXi=1

ÃnX

k=1

αkaik

!vi,

i.e.,

[w]v =

ÃnX

k=1

aik · αk

!n

i=1

.

Moreover, using the definition of P - see (4.14) - we get

P · [w]u =

⎡⎢⎢⎢⎢⎣a11 ... a1k ... a1n

...ai1 aik ain

an1 ank ann

⎤⎥⎥⎥⎥⎦⎡⎢⎢⎢⎢⎣

a1...ak

an

⎤⎥⎥⎥⎥⎦ =⎡⎢⎢⎢⎢⎣Pn

k=1 a1k · αk...Pn

k=1 aik · αk...Pn

k=1 ank · αk

⎤⎥⎥⎥⎥⎦ ,as desired.

Remark 206 Although P is called the change-of-basis matrix from v to u, for the reason explainedin Remark 203, it is P−1 which transforms the coordinates of w ∈ V relative to the original basisv into the coordinates of w relative to the new basis u.

Proposition 207 Let S =©v1, ..., vn

ªbe a subset of a vector space V . Let T be a subset of V

obtained from S using one of the following “elementary operations”:

1. interchange two vectors,

2. multiply a vector by a nonzero scalar,

3. add a multiple of one vector to another one.

Then,

Page 64: Eui math-course-2012-09--19

62 CHAPTER 4. VECTOR SPACES

1. spanS = spanT , and

2. S is independent ⇔ T is independent.

Proof. 1. Any element in T is a linear combination of vectors of S. Since any operation has aninverse, i.e., an operation which brings the set back to its original nature - similarly to what saidin Proposition 94 - any element in S is a linear combination of vectors in T .

2. S is independent(1)⇔ S is a basis of spanS

(2)⇔ dim spanS = n(3)⇔ dim spanT = n ⇔ T is a

basis of spanT ⇔ T is independentwhere (1) follows from the definitions of basis and span, the [⇐] part in (2) from Proposition

182.3, (3) from above conclusion 1. .

Proposition 208 Let A = [aij ] , B = [bij ] ∈ M (m,n) be row equivalent matrices over a field Fand v1, ...vn vectors in a vector space V . Define

u1 = a11v1+ ... a1jv

j+ ... a1nvn

...ui = ai1v

1 aijvj+ ainv

n

um = am1v1 amjv

j+ amnvn

andw1 = b11v

1+ ... b1jvj+ ... b1nv

n

...wi = bi1v

1 bijvj+ binv

n

wm = bm1v1 bmjv

j+ bmnvn

.

Thenspan

¡u1, ..., um

¢= span

¡w1, ..., wm

¢.

Proof. Observe that applying “elementary operations” of the type defined in the previousProposition to the set of vectors

©u1, ..., um

ªis equivalent to applying elementary row operations

to the matrix A.Since B can be obtained via row operations from A, then the set of vectors

©w1, ..., wm

ªcan be

obtained applying “elementary operations” of the type defined in the previous Proposition to theset of vectors

©u1, ..., um

ª, and from that Proposition the desired result follows.

Proposition 209 Let v1, ..., vn belong to a vector space V . Assume that ∀k ∈ {1, ..., n} , ∃ (aik)ni=1such that uk =

Pnk=1 aikv

k , and define P = [aik] ∈M (n, n). Then,1. If P is invertible, thenD©

viªni=1

is linearly independentE⇔D©

ukªnk=1

is linearly independentE;

2. If P is not invertible, then©ukªnk=1

is linearly dependent.

Proof. 1.From Proposition 104, P is row equivalent to the identity matrix In. Therefore, from Proposition

208,span

©viªni=1

= span©ukªnk=1

,

and from Proposition 207, the desired result follows.2. Since P is not invertible, then P is not row equivalent to the identity matrix and from

Proposition 122, it is equivalent to a matrix with a zero row, say the last one. Then ∀i ∈ {1, ..., n},ani = 0 and un =

Pnk=1 ankv

k = 0.

Corollary 210 Let v =©v1, ..., vn

ªbe a basis of a vector space V . Let P = [aik] ∈ M (n, n) be

invertible. Then u =©u1, ..., un

ªsuch that ∀k ∈ {1, ..., n} , uk =

Pnk=1 aikv

k is a basis of V andP is the change-of-basis matrix from v to u.

Page 65: Eui math-course-2012-09--19

4.8. CHANGE OF BASIS 63

Proposition 211 If v =©viªni=1

is a basis of Rn,

B :=£v1 ... vn

¤∈M (n, n) ,

i.e., B is the matrix whose columns are the vectors of S, and P ∈M (n, n) is an invertible matrix,then1.

BP :=£u1 ... un

¤∈M (n, n) ,

is a matrix whose columns are another basis of Rn, and P is the change-of-basis matrix from v tou :=

©u1, ..., un

ª, and

2. ∀w ∈ Rn, [w]u = P−1 · [w]v.

Proof. 1.

BP(3.4)in Rmk. 71

=£B · C1 (P ) ... B · Cn (P )

¤ Rmk. 147=

£ Pnk=1C

k1 (P ) · vk ...

Pnk=1C

kn (P ) · vk

¤.

Therefore,

∀i ∈ {1, ..., n} , ui =nX

k=1

Cki (P ) · vk,

i.e., each element in the basis u is a linear combinations of elements in v. Then from Corollary210, the desired result follows.2. It follows from Proposition 205.

Remark 212 In the above Proposition 211, if

B :=£e1 ... en

¤,

i.e., e =©e1, ..., en

ªis the canonical basis (and P is invertible), then

1. the column vectors of P are a new basis of Rn , call it u, and2. ∀x ∈ Rn,

y := [x]u = P−1 · [x]e = P−1 · x.

Page 66: Eui math-course-2012-09--19

64 CHAPTER 4. VECTOR SPACES

Page 67: Eui math-course-2012-09--19

Chapter 5

Determinant and rank of a matrix

In this chapter we are going to introduce the definition of determinant, an useful tool to studylinear dependence, invertibility of matrices and solutions to systems of linear equations.

5.1 Definition and properties of the determinant of a matrixTo motivate the definition of determinant, we present an informal discussion of a way to findsolutions to the linear system with two equations and two unknowns, shown below.½

a11x1 + a12x2 = b1a21x1 + a22x2 = b2

(5.1)

The system can be rewritten as follows

Ax = b

where

A =

∙a11 a12a21 a22

¸, x =

∙x1x2

¸, b =

∙b1b2

¸.

Let’s informally discuss how to find solutions to system (5.1). If a22 6= 0 and a12 6= 0, multiplyingboth sides of the first equation by a22, of the second equation by −a12 and adding up, we get

a11a22x1 + a12a22x2 − a12a21x1 − a12a22x2 = a22b1 − a12b2

Therefore, ifa11a22 − a12a21 6= 0

we have

x1 =b1a22 − b2a12a11a22 − a12a21

(5.2)

In a similar manner1 we have

x2 =b2a11 − b1a21a11a22 − a12a21

(5.3)

We can then the following preliminary definition: given A ∈M22, the determinant of A is

det A = det

∙a11 a12a21 a22

¸:= a11a22 − a12a21

Using the definition of determinant, we can rewrite (5.2) and (5.3)as follows.

x1 =

det

∙b1 a12b2 a22

¸det A

and x2 =

det

∙a11 b1a21 b2

¸detA

1Assuming a21 6= 0 and a11 6= 0, multiply both sides of the first equation by a21, of the second equation by −a11and then add up.

65

Page 68: Eui math-course-2012-09--19

66 CHAPTER 5. DETERMINANT AND RANK OF A MATRIX

We can now present the definition of the determinant of a square matrix An×n for arbitraryn ∈ N\ {0}.

Definition 213 Given n > 1 and A ∈M (n, n), ∀i, j ∈ {1, ..., n}, we call Aij ∈ M (n− 1, n− 1) thematrix obtained from A erasing the i− th row and the j − th column.

Definition 214 Given A ∈M (1, 1), i.e., A = [a] with a ∈ R. The determinant of A is denoted bydetA and we let detA := a. For n ∈ N\ {0, 1}, given A ∈M (n, n), we define the determinant of Aas

detA :=nXj=1

(−1)1+ja1j detA1j

Observe that [a1j ]nj=1 is the first row of A, i.e.,

detA :=nXj=1

(−1)1+jRj1 (A) detA1j

Example 215 For n = 2, we have

det

∙a11 a12a21 a22

¸=

2Xj=1

(−1)1+ja1j detA1j = (−1)1+1a11 detA11 + (−1)1+2a12 detA12 =

= a11a22 − a12a21

and we get the informal definition given above.

Example 216 For n = 3, we have

detA = det

⎡⎣ a11 a12 a13a21 a22 a23a31 a32 a33

⎤⎦ = (−1)1+1a11 detA11 + (−1)1+2a12 detA12 + (−1)1+3a13 detA13

Definition 217 Given A = [aij ] ∈M (n, n), , ∀i, j, detAij is called minor of aij in A;

(−1)i+j detAij

is called cofactor of aij in A.

Theorem 218 Given A ∈M (n, n), detA is equal to the sum of the products of the elements of anyrows or column for the corresponding cofactors, i.e.,

∀i ∈ {1, ...n} , detA =nXj=1

(−1)i+jRji (A) detAij (5.4)

and

∀j ∈ {1, ...n} , detA =nXi=1

(−1)i+jCij (A) detAij (5.5)

Proof. Omitted. We are going to omit several proofs about determinants. There are differentways of introducing the concept of determinant of a square matrix. One of them uses the concept ofpermutations - see, for example, Lipschutz (1991), Chapter 7. Another one is an axiomatic approach- see, for example, Lang (1971) - he introduces (three) properties that a function f : M (n, n)→ Rhas to satisfy and then shows that there exists a unique such function, called determinant. Followingthe first approach the proof of the present theorem can be found on page 252, in Lipschutz (1991)Theorem 7.8, or following the second approach, in Lang (1971), page 128, Theorem 4.

Definition 219 The expression used above for the computation of detA in (5.4) is called “(Laplace)expansion” of the determinant by row i.The expression used above for the computation of detA in (5.5) is called “(Laplace) expansion”

of the determinant by column j.

Page 69: Eui math-course-2012-09--19

5.1. DEFINITION AND PROPERTIES OF THE DETERMINANT OF A MATRIX 67

Definition 220 Consider a matrix An×n. Let 1 ≤ k ≤ n. A k − th order principal submatrix(minor) of A is the (determinant of the) square submatrix of A obtained deleting (n− k) rows and(n− k) columns in the same position.

Theorem 221 (Properties of determinants)Let the matrix A = [aij ] ∈M (n, n) be given. Properties presented below hold true even if words

“column, columns” are substituted by the words “row, rows”.

1. detA = detAT .

2. if two columns are interchanged, the determinant changes its sign,.

3. if there exists j ∈ {1, .., n} such that Cj (A) =Pp

k=1 βkbk, then

det

"C1 (A) , ...,

pXk=1

βkbk, ..., Cn (A)

#=

pXk=1

βk det£C1 (A) , ..., b

k, ..., Cn (A)¤,

i.e., the determinant of a matrix which has a column equal to the linear combination of somevectors is equal to the linear combination of the determinants of the matrices in which thecolumn under analysis is each of the vector of the initial linear combination, and, therefore,∀β ∈ R and ∀j ∈ {1, ..., n},

det [C1 (A) , ..., βCj (A) , ..., Cn (A)] = β det [C1 (A) , ..., Cj (A) , ..., Cn (A)] = β detA.

4. if ∃j ∈ {1, ..., n} such that Cj (A) = 0, then detA = 0, i.e., if a matrix has column equal tozero, then the determinant is zero.

5. if ∃ j, k ∈ {1, ..., n} and β ∈ Rsuch that Cj (A) = βCk (A), then detA = 0, i.e., the determi-nant of a matrix with two columns proportional one to the other is zero.

6. If ∃k ∈ {1, ..., n} and β1, ..., βk−1, βk+1, ..., βn ∈ R such that Ck (A) =P

j 6=k βjCj (A), thendetA = 0, i.e., if a column is equal to a linear combination of the other columns, thendetA = 0.

7.

det

⎡⎣C1 (A) , ..., Ck (A) +Xj 6=k

βj · Cj (A) , ..., Cn (A)

⎤⎦ = detA8. ∀j, j∗ ∈ {1, ..., n} ,

Pni=1 aij ·(−1)

i+j∗detAij∗ = 0, i.e., the sum of the products of the elements

of a column times the cofactor of the analogous elements of another column is equal to zero.

9. If A is triangular, detA = a11 · ... · a22 · ...ann, i.e., if A is triangular (for example, diagonal),the determinant is the product of the elements on the diagonal.

Proof. 1.Consider the expansion of the determinant by the first row for the matrix A and the expansion

of the determinant by the first column for the matrix AT .2.We proceed by induction. Let A be the starting matrix and A0the matrix with the interchanged

columns.P (2) is obviously true.P (n− 1)⇒ P (n)Expand detA and detA0 by a column which is not any of the interchanged ones:

detA =nXi=1

(−1)i+jCij (A) detAij

detA0 =nXi=1

(−1)i+kCij (A) detA

0ij

Page 70: Eui math-course-2012-09--19

68 CHAPTER 5. DETERMINANT AND RANK OF A MATRIX

Since ∀k ∈ {1, ..., n}, Akj , A0kj ∈ M (n− 1, n− 1), and they have interchanged column, by theinduction argument, detAkj = −detA0kj, and the desired result follows.3.Observe that

pXk=1

βkbk =

ÃpX

k=1

βkbki

!n

i=1

.

Then,det

£C1 (A) , ...,

Ppk=1 βkb

k, ..., Cn (A)¤=

= det

⎡⎢⎢⎢⎢⎣C1 (A) , ...,pX

k=1

βk

⎡⎢⎢⎢⎢⎣bk1...bki...bkn

⎤⎥⎥⎥⎥⎦ , ..., Cn (A)

⎤⎥⎥⎥⎥⎦ =

= det

⎡⎢⎢⎢⎢⎣C1 (A) , ...,⎡⎢⎢⎢⎢⎣Pp

k=1 βkbk1

...Ppk=1 βkb

ki

...Ppk=1 βkb

kn

⎤⎥⎥⎥⎥⎦ , ..., Cn (A)

⎤⎥⎥⎥⎥⎦ =

=nXi=1

(−1)i+jÃ

pXk=1

βkbki

!detAij =

=Pp

k=1 βkPn

i=1 (−1)i+j

bki detAij =Pp

k=1 βk det£C1 (A) , ..., b

k, ..., Cn (A)¤.

4.It is sufficient to expand the determinant by the column equal to zero.5.Let A := [C1 (A) , βC1 (A) , C3 (A) , ..., Cn (A)] and eA := [C1 (A) , C1 (A) , C3 (A) , ..., Cn (A)] be

given. Then detA = β det [C1 (A) , C1 (A) , C3 (A) , ..., Cn (A)] = β det eA. Interchanging the firstcolumn with the second column of the matrix eA, from property 2, we have that det eA = −det eA andtherefore det eA = 0, and detA = β det eA = 0.6.It follows from 3 and 5.7.It follows from 3 and 6.8.It follows from the fact that the obtained expression is the determinant of a matrix with two

equal columns.9.It can be shown by induction and expanding the determinant by the firs row or column, choosing

one which has all the elements equal to zero excluding at most the first element. In other words, inthe case of an upper triangular matrix, we can say what follows.

det

∙a11 a120 a22

¸= a11 · a22.

det

⎡⎢⎢⎢⎢⎢⎣a11 a12 a13 ... a1n0 a22 a23 ... a2n0 0 a33 a3n

.... . .

0 ... 0 ... ann

⎤⎥⎥⎥⎥⎥⎦ = a11 det

⎡⎢⎢⎢⎣a22 a23 ... a2n0 a33 a3n

. . .... 0 ... ann

⎤⎥⎥⎥⎦ = a11 · a22 · a33 · ...ann.

Theorem 222 ∀A,B ∈M (n, n), det(AB) = detA · detB.

Proof. Exercise.

Definition 223 A ∈M (n, n) is called nonsingular if detA 6= 0.

Page 71: Eui math-course-2012-09--19

5.2. RANK OF A MATRIX 69

5.2 Rank of a matrixDefinition 224 Given A ∈ M (m,n), a square submatrix of A of order k ≤ min {m,n} is a matrixobtained considering the elements belonging to k rows and k columns of A.

Definition 225 Given A ∈ M (m,n), the rank of A is the greatest order of square nonsingularsubmatrices of A.

Remark 226 rankA ≤ min {m,n}.

To compute rank A, with A ∈ M (m,n), we can proceed as follows.1. Consider k = min {m,n}, and the set of square submatrices of A of order k. If there exists

a nonsingular matrix among them, then rank A = k. If all the square submatrices of A of order kare singular, go to step 2 below.2. Consider k− 1, and then the set of the square submatrices of A of order k− 1. If there exists

a nonsingular matrix among them, then rank A = k− 1. If all square submatrices of order k− 1aresingular, go to step 3.3. Consider k − 2 ...and so on.

Remark 227 1. rank In = n.2. The rank of a matrix with a zero row or column is equal to the rank of that matrix without

that row or column, i.e.,

rank

∙A0

¸= rank

£A 0

¤= rank

∙A 00 0

¸= rank A

That result follows from the fact that the determinant of any square submatrix of A involvingthat zero row or columns is zero.3. From the above results, we also have that

rank

∙Ir 00 0

¸= r

We now describe an easier way to the compute the rank of A, which in fact involves elementaryrow and column operations we studied in Chapter 1.

Proposition 228 Given A,A0 ∈ M (m,n),

h A is equivalent to A0i⇔ hrank A = rank A0i

Proof. [⇒] Since A is equivalent to A0, it is possible to go from A to A0 through a finite numberof elementary row or column operations. In each step, in any square submatrix A∗ of A which hasbeen changed accordingly to those operations, the elementary row or column operations 1, 2 and 3(i.e., 1. row or column interchange, 2. row or column scaling and 3. row or column addition) aresuch that the determinant of A∗ remains unchanged or changes its sign (Property 2, Theorem 221),it is multiplied by a nonzero constant (Property 3), remains unchanged (Property 7), respectively.Therefore, each submatrix A∗ whose determinant is different from zero remains with determinant

different from zero and any submatrix A∗ whose determinant is zero remains with zero determinant.[⇐]From Corollary 153.2 2, we have that there exist unique br and br0 such that

A is equivalent to∙Ir 00 0

¸2That results says what follows:For any A ∈ M (m,n), there exist invertible matrices P ∈ M (m,m) and Q ∈ M (n, n) and a unique number

r ∈ {0, 1, ...,min {m,n}} such that A is equivalent to

PAQ =Ir 00 0

Page 72: Eui math-course-2012-09--19

70 CHAPTER 5. DETERMINANT AND RANK OF A MATRIX

and

A0 is equivalent to∙Ir0 00 0

¸.

Moreover, from [⇒] part of the present proposition, and Remark 227

rankA = rank

∙Ir 00 0

¸= br

and

rankA0 = rank

∙Ir0 00 0

¸= br0.

Then, by assumption, br = br0 := r, and A and A0 are equivalent to∙Ir 00 0

¸and therefore A is equivalent to A0.

Example 229 Given ⎡⎣ 1 2 32 3 43 5 7

⎤⎦we can perform the following elementary rows and column operations, and cancellation of zero

row and columns on the matrix:⎡⎣ 1 2 32 3 43 5 7

⎤⎦ ,⎡⎣ 1 2 32 3 40 0 0

⎤⎦ , ∙ 1 2 32 3 4

¸,

∙1 1 32 1 4

¸,

∙1 1 02 1 1

¸,

∙1 1 00 0 1

¸,

∙0 1 00 0 1

¸,

∙1 00 1

¸.

Therefore, the rank of the matrix is 2.

5.3 Inverse matrices (continued)

Using the notion of determinant, we can find another way of analyzing the problems of i. existenceand ii. computation of the inverse matrix. To do that, we introduce the concept of adjoint matrix.

Definition 230 Given a matrix An×n, we call adjoint matrix of A, and we denote it by Adj A,the matrix whose elements are the cofactors3 of the corresponding elements of AT .

Remark 231 In other words to construct Adj A,1. construct AT ,2. consider the cofactors of each element of AT .

Example 232

A =

∙1 20 2

¸, AT =

∙1 02 2

¸, Adj A =

∙2 −20 1

¸Observe that ∙

1 20 2

¸ ∙2 −20 1

¸=

∙2 00 2

¸= (detA) · I.

3From Definition 217, recall that given A = [aij ] ∈ M (m,m), ∀i, j,

(−1)i+j detAij

is called cofacor of aij in A.

Page 73: Eui math-course-2012-09--19

5.3. INVERSE MATRICES (CONTINUED) 71

Example 233

A =

⎡⎣ 1 2 30 1 21 2 0

⎤⎦ , AT =

⎡⎣ 1 0 12 1 23 2 0

⎤⎦ , Adj A =

⎡⎣ −4 6 12 −3 −2−1 0 1

⎤⎦Proposition 234 Given An×n, we have

A ·Adj A = Adj A ·A = detA · I (5.6)

Proof. Making the product A ·Adj A := B, we have1. ∀i ∈ {1, ..., n}, the i − th element on the diagonal of B is the expansion of the determinant

by the i− th row and therefore is equal to detA.2. any element not on the diagonal of B is the product of the elements of a row times the

corresponding cofactor a parallel row and it is therefore equal to zero due to Property 8 of thedeterminants stated in Theorem 221).

Example 235

det

⎡⎣ 1 2 30 1 21 2 0

⎤⎦ = −3⎡⎣ 1 2 30 1 21 2 0

⎤⎦⎡⎣ −4 6 12 −3 −2−1 0 1

⎤⎦ =⎡⎣ −3 0 0

0 −3 00 0 −3

⎤⎦ = −3 · IProposition 236 Given an n× n matrix A, the following statements are equivalent:

1. detA 6= 0, i.e., A is nonsingular;

2. A−1 exists, i.e., A is invertible;

3. rank A = n;

4. col rank A = n, i.e., the column vectors of the matrix A are linearly independent;

5. row rank A = n, i.e., the row vectors of the matrix A are linearly independent;

6. dim col span A = n;

7. dim row span A = n.

Proof. 1⇒ 2From (5.6) and from the fact that detA 6= 0, we have

A · Adj AdetA

=Adj A

detA·A = I

and therefore

A−1 =Adj A

detA(5.7)

1⇐ 2AA−1 = I ⇒ det

¡AA−1

¢= det I ⇒ detA · detA−1 = 1⇒ detA 6= 0 (and detA−1 6= 0).

1⇔ 3It follows from the definition of rank and the fact that A is n× n matrix.2⇒ 4From (4.6), it suffices to show that hAx = 0⇒ x = 0i. Since A−1 exists, Ax = 0 ⇒ A−1Ax =

A−10⇒ x = 0.2⇐ 4From Proposition 182.2, the n linearly independent column vectors a·1, ..., a·i, ..., a·n are a basis

of Rn. Therefore, each vector in Rn is equal to a linear combination of those vectors. Then

Page 74: Eui math-course-2012-09--19

72 CHAPTER 5. DETERMINANT AND RANK OF A MATRIX

∀k ∈ {1, ..., n} ∃bk ∈ Rn such that the k − th vector ek in the canonical basis is equal to[C1 (A) , ..., Ci (A) , ..., Cn (A)] · bk = Abk, i.e.,£

e1 ... ek ... en¤=£Ab1 ... Abk ... Abn

¤or, from (3.4) in Remark 71, defined

B :=£b1 ... bk ... bn

¤I = AB

i.e., A−1 exists (and it is equal to B).

The remaining equivalences follow from Corollary 197.

Remark 237 From the proof of the previous Proposition, we also have that, if detA 6= 0, thendetA−1 = (detA)−1.

Remark 238 The previous theorem gives a way to compute the inverse matrix as explained in(5.7).

Example 239 1. ⎡⎣ 1 2 30 1 21 2 0

⎤⎦−1 = −13

⎡⎣ −4 6 12 −3 −2−1 0 1

⎤⎦2. ⎡⎣ 0 1 1

1 1 01 1 1

⎤⎦−1 =⎡⎣ −1 0 1

1 1 −10 −1 1

⎤⎦3. ⎡⎣ 0 1 2

3 4 56 7 8

⎤⎦−1 does not exist because det⎡⎣ 0 1 23 4 56 7 8

⎤⎦ = 04. ⎡⎣ 0 1 0

2 0 21 2 3

⎤⎦−1 =⎡⎣ 1 3

4 −121 0 0−1 −14

12

⎤⎦5. ⎡⎣ a 0 0

0 b 00 0 c

⎤⎦−1 =⎡⎣ 1

a 0 00 1

b 00 0 1

c

⎤⎦if a, b, c 6= 0.

5.4 Span of a matrix, linearly independent rows and columns,rank

Proposition 240 Given A ∈M (m,n), then

dim row span A = rank A.

Proof. First proof.The following result which is the content of Corollary 153.5 plays a crucial role in the proof:For any A ∈ M (m,n), there exist invertible matrices P ∈ M (m,m) and Q ∈ M (n, n) and a

unique number r ∈ {0, 1, ...,min {m,n}} such that

A is equivalent to PAQ =∙Ir 00 0

¸.

Page 75: Eui math-course-2012-09--19

5.4. SPAN OF A MATRIX, LINEARLY INDEPENDENT ROWS AND COLUMNS, RANK 73

From the above result, Proposition 228 and Remark 227 , we have that

rank A = rank

∙Ir 00 0

¸= r. (5.8)

From Propositions 105 and 148, we have

row span A = row span PA;

from Propositions 119 and 148, we have

col span PA = col span PAQ = col span

∙Ir 00 0

¸.

From Corollary 197,dim row span PA = dimcol span PA.

Therefore

dim row span A = dimcol span

∙Ir 00 0

¸= r, (5.9)

where the last equality follows simply because

col span

∙Ir 00 0

¸= col span

∙Ir0

¸,

and the r column vectors of the matrix∙Ir0

¸are linearly independent and therefore, from Propo-

sition 191 , they are a basis of span∙Ir0

¸.

From (5.8) and (5.9), the desired result follows.

Second proof.We are going to show that row rank A = rank A.Recall that

row rank A :=

* r ∈ N such thati . r row vectors of A are linearly independent,ii. if m > r, any set of rows of A of cardinality > r is linearly dependent.

+

We ant to show that1. if row rank A = r, then rank A = r, and2. if rank A = r, then row rank A = r.1.Consider the r l.i. row vectors of A. Since r is the maximal number of l.i. row vectors, from

Lemma 189, each of the remaining (m− r) row vectors is a linear combination of the r l.i. ones.Then, up to reordering of the rows of A, which do not change either row rank A or rank A, thereexist matrices A1 ∈M (r, n) and A2 ∈M (m− r, n) such that

rank A = rank

∙A1A2

¸= rank A1

where the last equality follows from Proposition 228. Then r is the maximum number of l.i. rowvectors of A1 and therefore, from Proposition 196, the maximum number of l.i. column vectors ofA1. Then, again from Lemma 189, we have that there exist A11 ∈ M (r, r) and A12 ∈ M (r, n− r)such that

rank A1 = rank£A11 A12

¤= rank A11

Then the square r×r matrix A11 contains r linearly independent vectors. Then from Proposition236, the result follows.2.

Page 76: Eui math-course-2012-09--19

74 CHAPTER 5. DETERMINANT AND RANK OF A MATRIX

By Assumption, up to reordering of rows, which do not change either row rank A or rank A,

A =

⎡⎢⎢⎣s r n− r − s

r A11 A12 A13m− r A21 A22 A23

⎤⎥⎥⎦with

rank A11 = r.

Then from Proposition 236, row, and column, vectors of A12 are linearly independent. Thenfrom Corollary 171, the r row vectors of£

A11 A12 A13¤

are l.i.Now suppose that the maximum number of l.i. row vectors of A are r0 > r (and the other m−r0

row vectors of A are linear combinations of them). Then from part 1 of the present proof, rankA = r0 > r, contradicting the assumption.

Remark 241 From Corollary 197 and the above Proposition, we have for any matrix Am×n, thefollowing numbers are equal:1. rank A :=greatest order of square nonsingular submatrices of A,2. dim row span A,3. dim col span A4. row rank A := max number of linear independent rows of A,5. col rank A := max number of linear independent columns of A.

Page 77: Eui math-course-2012-09--19

Chapter 6

Eigenvalues and eigenvectors

In the present chapter, we try to answer the following basic question: given a real matrix A ∈M (n, n), can we find a nonsingular matrix P (which can be thought to represent a change-of-basismatrix ( see Corollary 211 and Remark after it) such that

P−1AP is a diagonal matrix?

6.1 Characteristic polynomialWe first introduce the notion of similar matrices.

Definition 242 A matrix B ∈ M (n, n) is similar to a matrix A ∈ M (n, n) if there exists aninvertible matrix P ∈M (n, n) such that

B = P−1AP

Remark 243 Similarity is an equivalence relation. Therefore, we say that A and B are similarmatrices if B = P−1AP .

Definition 244 A matrix A ∈ M (n, n) is said to be diagonalizable if there exists an invertiblematrix P such that B = P−1AP is a diagonal matrix, i.e., if there exists a similar diagonal matrix.

Definition 245 Given a matrix A ∈MF (n, n), the matrix

tI −A,

with t ∈ F, is called the characteristic matrix of A;

det [tI −A]

is called the characteristic polynomial (in t) and it is denoted by ∆A (t);

∆A (t) = 0

is the characteristic equation of A .

Proposition 246 Similar matrix have the same determinant and the same characteristic polyno-mial.

Proof. Assume that A and B are similar matrices; then, by definition, there exists a nonsingularmatrix P such that B = P−1AP . Then

detB = det£P−1AP

¤= detP−1 detAdetP =

= detP−1 detP detA = det¡P−1P

¢detA =

= detA.

75

Page 78: Eui math-course-2012-09--19

76 CHAPTER 6. EIGENVALUES AND EIGENVECTORS

and very similarly,

det [tI −B] = det£tI − P−1AP

¤= det

£P−1tIP − P−1AP

¤=

= det£P−1 (tI −A)P

¤= detP−1 det [tI −A] detP =

= detP−1 detP det [(tI −A)] = det¡P−1P

¢det [(tI −A)] =

= det [(tI −A)] .

The proof of the following result goes beyond the scope of these notes and it is therefore omitted.

Proposition 247 Let A ∈MF (n, n) be given. Then its characteristic polynomial is

tn − S1tn−1 + S2t

n−2 + ...+ (−1)n Sn,

where ∀i ∈ {1, ..., n}, Si is the sum of the principal minors of order i in A - see Definition 220.

Exercise 248 Verify the statement of the above Proposition for n ∈ {2, 3}.

6.2 Eigenvalues and eigenvectorsDefinition 249 Let A ∈ M (n, n) on a field F be given. A scalar λ ∈ F is called an eigenvalue ofA if there exists a nonzero vector v ∈ Fn such that

Av = λv.

Every vector v satisfying the above relationship is called an eigenvector of A associated with (orbelonging to) the eigenvalue λ.

Remark 250 The two main reasons to require an eigenvector to be different from zero are explainedin Proposition 252, part 1, and the first line in both steps of the induction proof of Proposition 259below.

Definition 251 Let Eλ be the set of eigenvectors belonging to λ.

Proposition 252 1. There is only one eigenvalue associated with an eigenvector.2. The set Eλ ∪ {0} is a subspace of Fn.

Proof. 1. If Av = λv = μv, then (λ− μ) v = 0. Since v 6= 0, the result follows.2. Step 1. If λ is an eigenvalue of A and v is an associated eigenvector, then ∀k ∈ F\ {0}, kv is

an associated eigenvector as well: Av = λv ⇒ A (kv) = kAv = k (λv).Step 2. If λ is an eigenvalue of A and v, u are associated eigenvectors, then v+u is an associated

eigenvector as well: A (v + u) = Av +Au = λv + λu = λ (v + u).

Proposition 253 Let A be a matrix in M (n, n) over a field F . Then, the following statements areequivalent.

1. λ ∈ F is an eigenvalue of A.

2. λI −A is a singular matrix.

3. λ ∈ F is a solution to the characteristic equation ∆A (t) = 0.

Proof. λ ∈ F is an eigenvalue of A ⇔ ∃v ∈ Fn\ {0} such that Av = λv ⇔ ∃v ∈ Fn\ {0} suchthat (λI −A) v = 0

Exercise⇔ λI − A is a singular matrix ⇔ det (λI −A) = 0 ⇔ λ ∈ F is a solutionto the characteristic equation ∆A (t) = 0.

Theorem 254 (The fundamental theorem of algebra) 1 Let

p (z) = anzn + an−1z

n−1 + ...+ a1z + a0 with an 6= 01For a brief description of the field of complex numbers, see Chapter 10.

Page 79: Eui math-course-2012-09--19

6.3. SIMILAR MATRICES AND DIAGONALIZABLE MATRICES 77

be a polynomial of degree n ≥ 1 with complex coefficients a0, ..., an. Then

p (z) = 0

for at least one z ∈ C.In fact, there exist r ∈ N\ {0}, pairwise distinct (zi)ri=1 ∈ Cr and (ni)

ri=1 ∈ Nr, c ∈ C\ {0} such

thatp (z) = c ·Πri=1 (z − zi)

ni

andrX

i=1

ni = n.

Definition 255 For any i ∈ {1, ..., r}, ni is called the algebraic multiplicity of the solution zi.

As a consequence of the Fundamental Theorem of Algebra, we have the following result.

Proposition 256 Let A be a matrix in M (n, n) over the complex field C. Then, A has at leastone eigenvalue.

Definition 257 Let λ be an eigenvalue of A. The algebraic multiplicity of λ is the multiplicity ofλ as a solution to the characteristic equation ∆A (t) = 0. The geometric multiplicity of λ is thedimension of Eλ ∪ {0}.

In Section 8.8, we will prove the following result: Let λ be an eigenvalue of A. The geometricmultiplicity of λ is smaller or equal than the algebraic multiplicity of λ.

6.3 Similar matrices and diagonalizable matricesProposition 258 A ∈ M (n, n) is diagonalizable, i.e., A is similar to a diagonal matrix D ⇔A has n linearly independent eigenvectors. In that case, the elements on the diagonal of D arethe associated eigenvalues and D = P−1AP , where P is the matrix whose columns are the neigenvectors.

Proof. [⇐] Assume that©vkªnk=1

is the set of linearly independent eigenvectors associated withA and for every k ∈ {1, ..., n}, λk is the unique eigenvalue associated with vk. Then,

∀k ∈ {1, ..., n} , Avk = λkvk

i.e.,AP = P · diag [(λk)

nk=1] ,

where P is the matrix whose columns are the n eigenvectors. Since they are linearly independent,P is invertible and

P−1AP = diag [(λk)nk=1] .

[⇒] Assume that there exists an invertible matrix P ∈M (n, n) and a diagonal matrix D =diag[(λk)

nk=1] such that P

−1AP = D. Then

AP = PD.

Let©vkªnk=1

be the column vectors of P . Then the above expression can be rewritten as

∀k ∈ {1, ..., n} , Avk = λkvk.

First of all, ∀k ∈ {1, ..., n} , vk 6= 0, otherwise P would not be invertible. Then vk, λk areeigenvectors-eigenvalues associated with A. Finally

©vkªnk=1

are linearly independent because P isinvertible.

Proposition 259 Let v1, ..., vn be eigenvectors of A ∈ M (m,m) belonging to pairwise distincteigenvalues λ1, ..., λn, respectively. Then v1, ..., vn are linearly independent.

Page 80: Eui math-course-2012-09--19

78 CHAPTER 6. EIGENVALUES AND EIGENVECTORS

Proof. We prove the result by induction on n.Step 1. n = 1.Since, by definition, eigenvectors are nonzero, the result follows.Step 2.Assume that

nXi=1

aivi = 0. (6.1)

Multiplying (6.1) by A, and using the assumption that ∀i ∈ {1, ..., n}, Avi = λivi, we get

0 =nXi=1

aiAvi =

nXi=1

aiλivi. (6.2)

Multiplying (6.1) by λn, we get

0 =nXi=1

aiλnvi. (6.3)

Subtracting (6.3) from (6.2), we get

0 =n−1Xi=1

ai (λi − λn) vi. (6.4)

From the assumption of Step 2 of the proof by induction and from (6.4), we have that¡v1, ..., vn−1

¢are linearly independent, and therefore, ∀i ∈ {1, ..., n− 1}, ai (λi − λn) = 0, and since eigenvalueare pairwise distinct,

∀i ∈ {1, ..., n− 1} , ai = 0.Substituting the above conditions in (6.1), we get anvn = 0,and since vn is an eigenvector and

therefore different from zero,an = 0.

Remark 260 A ∈ M (m,m) admits at most m distinct eigenvalues. Otherwise, from Proposition259, you would have m+ 1 linearly independent vectors in Rm, contradicting Proposition 182 .1.

Proposition 261 Let A ∈ M (n, n) be given. Assume that ∆A (t) = Πni=1 (t− ai) , with a1, ..., an

pairwise distinct. Then, A is similar to diag [(ai)ni=1], and, in fact, diag [(ai)

ni=1] = P−1AP ,

where P is the matrix whose columns are the n eigenvectors.

Proof. From Proposition 253, ∀i ∈ {1, ..., n}, ai is an eigenvalue of A. For every i ∈ {1, ..., n},let vi be the eigenvector associated with ai. Since a1, ..., an are pairwise distinct, from Proposition259, the eigenvectors v1, ..., vn are linearly independent. From Proposition 258, the desired resultfollows.In Section 9.2, we will describe an algorithm to find eigenvalues and eigenvectors of A and to

show if A is diagonalizable.

Page 81: Eui math-course-2012-09--19

Chapter 7

Linear functions

7.1 Definition

Definition 262 Given the vector spaces V and Uover the same field F , a function l : V → U islinear if

1. ∀v, w ∈ V , l (v + w) = l (v) + l (w), and

2. ∀α ∈ F, ∀v ∈ V , l (αv) = αl (v).

Call L (V,U) the set of all such functions. Any time we write L (V,U), we implicitly assumethat V and U are vector spaces on the same field F .

In other words, l is linear if it “preserves” the two basic operations of vector spaces.

Remark 263 1. l ∈ L (V,U) iff ∀v,w ∈ V and ∀α, β ∈ F, l (αv + βw) = αl (v) + βl (w);2. If l ∈ L (V,U), then l (0) = 0: for arbitrary x ∈ V , l (0) = l (0x) = 0l (x) = 0.

Example 264 Let V and U be vector spaces. The following functions are linear.1. (identity function)

l1 : V → V, l1 (v) = v.

2. (null function)l2 : V → U, l2 (v) = 0.

3.∀a ∈ F, la : V → V, la (v) = av.

4. (projection function)∀n ∈ N,∀k ∈ N\ {0} ,

projn+k,n : Rn+k → Rn, projn+k,n : (xi)n+ki=1 7→ (xi)

ni=1 ;

5. (immersion function)∀n ∈ N,∀k ∈ N\ {0} ,

in,n+k : Rn → Rn+k, in,n+k : (xi)ni=1 7→ ((xi)

ni=1 , 0) with 0 ∈ Rk.

Example 265 Taken A ∈M (m,n), then

l : Rn → Rm, l (x) = Ax

is a linear function, as shown in part 3 in Remark 76.

79

Page 82: Eui math-course-2012-09--19

80 CHAPTER 7. LINEAR FUNCTIONS

Example 266 Let V be the vector space of polynomials in the variable t. The following functionsare linear1. The derivative defines a function D : V → V as

D : p 7→ p0

where p0 is the derivative function of p.2. The definite integral from 0 to 1 defines a function i : V → R as

i : p 7→Z 1

0

p (t) dt.

Proposition 267 If l ∈ L (V,U) is invertible, then its inverse l−1 is linear.

Proposition 268 If l ∈ L (V,U) is invertible, then its inverse l−1 is linear.

Proof. Take arbitrary u, u0 ∈ U and α, β ∈ F . Then, since l is onto, there exist v, v0 ∈ V suchthat

l (v) = u and l (v0) = u0

and by definition of inverse function

l−1 (u) = v and l−1 (u0) = v0 .

Thenαu+ βu0 = αl (v) + βl (v0) = l (αv + βv0)

where last equality comes from the linearity of l. Then again by definition of inverse,

l−1 (αu+ βu0) = αv + βv0 = αl−1 (u) + βl−1 (u0) .

7.2 Kernel and Image of a linear functionDefinition 269 Assume that l ∈ L (V,U). The kernel of l, denoted by ker l is the set

{v ∈ V : l (v) = 0} = l−1 (0) .

The Image of l, denoted by Im l is the set

{u ∈ U : ∃v ∈ V such that f (v) = u} = l (V ) .

Proposition 270 Given l ∈ L (V,U), ker l is a vector subspace of V and Im l is a vector subspaceof U .

Proof. Take v1, v2 ∈ ker l and α, β ∈ F . Then

l¡αv1 + βv2

¢= αl

¡v1¢+ βl

¡v2¢= 0

i.e., αv1 + βv2 ∈ ker l.Take w1, w2 ∈ Im l and α, β ∈ F . Then for i ∈ {1, 2} , ∃vi ∈ V such that l

¡vi¢= wi. Moreover,

αw1 + βw2 = αl (vi) + βl¡v2¢= l

¡αv1 + βv2

¢i.e., αw1 + βw2 ∈ Im l.

Proposition 271 If span¡©v1, ..., vn

ª¢= V and l ∈ L (V,U), then span

¡©l¡vi¢ªn

i=1

¢= Im l.

Proof. Taken u ∈ Im l, there exists v ∈ V such that l (v) = u. Moreover, ∃ (αi)ni=1 ∈ Rn suchthat v =

Pni=1 αiv

i. Then

u = l (v) = l

ÃnXi=1

αivi

!=

nXi=1

αil¡vi¢,

as desired.

Page 83: Eui math-course-2012-09--19

7.2. KERNEL AND IMAGE OF A LINEAR FUNCTION 81

Remark 272 From the previous proposition, we have that if©v1, ..., vn

ªis a basis of V , then

n ≥ dim span³©

l¡vi¢ªn

i=1

´= dim Im l.

Example 273 Let V the vector space of polynomials and D3 : V → V , p 7→ p000, i.e., the thirdderivative of p. Then

kerD3 = set of polynomials of degree ≤ 2,since D3

¡at2 + bt+ c

¢= 0 and D3 (tn) 6= 0 for n > 2. Moreover,

ImD3 = V,

since every polynomial is the third derivative of some polynomial.

Proposition 274 (Dimension Theorem)If V is a finite dimensional vector space and l ∈ L (V,U),then

dimV = dimker l + dim Im l

Proof. (Idea of the proof.1. Using a basis of ker l (with n1 elements) and a basis of Im l (with n2 elements) construct a

set with n1 + n2 elements which generates V .2. Show that set is linearly independent (by contradiction), and therefore a basis of V , and

therefore dimV = n1 + n2.)Since ker l ⊆ V and from Remark 272, ker l and Im l have finite dimension. Therefore, we can

define n1 = dimker l and n2 = dim Im l.Take an arbitrary v ∈ V . Let ©

w1, .., wn1ªbe a basis of ker l (7.1)

and ©u1, .., un2

ªbe a basis of Im l (7.2)

Then,∀i ∈ {1, .., n2} , ∃vi ∈ V such that ui = l

¡vi¢

(7.3)

From (7.2),

∃c = (ci)n2i=1 such that l (v) =n2Xi=1

ciui (7.4)

Then, from (7.4) and (7.3) ,we get

l (v) =

n2Xi=1

ciui =

n2Xi=1

cil¡vi¢

and from linearity of l

0 = l (v)−n2Xi=1

cil¡vi¢= l (v)− l

Ãn2Xi=1

civi

!= l

Ãv −

n2Xi=1

civi

!i.e.,

v −n2Xi=1

civi ∈ ker l (7.5)

From (7.1),

∃ (dj)nj=1 such that v −n2Xi=1

civi =

n1Xj=1

djwj

Summarizing, we have

∀v ∈ V,∃ (ci)n2i=1 and (dj)n1j=1 such that v =

n2Xi=1

civi +

n1Xj=1

djwj

Page 84: Eui math-course-2012-09--19

82 CHAPTER 7. LINEAR FUNCTIONS

Therefore, we found n1 + n2 vectors which generate V ; if we show that they are l .i., then, bydefinition, they are a basis and therefore n = n1 + n2 as desired.We want to show that

n2Xi=1

αivi +

n1Xj=1

βjwj = 0 ⇒

³(αi)

n2i=1 ,

¡βj¢n1j=1

´= 0 (7.6)

Then

0 = l

⎛⎝ n2Xi=1

αivi +

n1Xj=1

βjwj

⎞⎠ =

n2Xi=1

αil¡vi¢+

n1Xj=1

βjl¡wj¢

From (7.1), i.e.,©w1, .., wn1

ªis a basis of ker l, and from (7.3), we get

n2Xi=1

αiui = 0

From (7.2), i.e.,©u1, .., un2

ªbe a basis of Im l,

(αi)n2i=1 = 0 (7.7)

But from the assumption in (7.6) and (7.7) we have thatn1Xj=1

βjwj = 0

and since©w1, .., wn1

ªis a basis of ker l, we get also that¡

βj¢n1j=1

= 0,

as desired.

Example 275 Let V and U be vector spaces, with dimV = n.In 1. and 2. below, we verify the statement of the Dimension Theorem: in 3. and 4., we use

that statement.1. Identity function idV .

dim Im idV = ndimker l = 0.

2. Null function 0 ∈ L (V,U)dim Im0 = 0dimker 0 = n.

3. l ∈ L¡R2,R

¢,

l ((x1, x2)) = x1.

Since ker l =©(x1, x2) ∈ R2 : x1 = 0

ª, {(0, 1)} is a basis1 of ker l and

dimker l = 1,dim Im l = 2− 1 = 1.

4. l ∈ L¡R3,R2

¢,

l ((x1, x2, x3)) =

∙1 2 30 1 0

¸⎡⎣ x1x2x3

⎤⎦ .Defined

A =

∙1 2 30 1 0

¸,

sinceIm l =

©y ∈ R2 : ∃x ∈ R3 such that Ax = y

ª= span col A = rank A,

and since the first two column vectors of A are linearly independent, we have that

dim Im l = 2dimker l = 3− 2 = 1.

1 In Remark 361 we will present an algorithm to compute a basis of ker l.

Page 85: Eui math-course-2012-09--19

7.3. NONSINGULAR FUNCTIONS AND ISOMORPHISMS 83

7.3 Nonsingular functions and isomorphismsDefinition 276 l ∈ L (V,U) is singular if ∃v ∈ V \ {0} such that l (v) = 0.

Remark 277 Thus l ∈ L (V,U) is nonsingular if ∀v ∈ V \ {0} , l (v) 6= 0 i.e., ker l = {0}. Briefly,

l ∈ L (V,U) is nonsingular ⇔ ker l = {0}.

Remark 278 In Remark 310, we will discuss the relationship between singular matrices and sin-gular linear functions.

Proposition 279 If l ∈ L (V,U) is nonsingular, then the image of any linearly independent set islinearly independent.

Proof. Suppose©v1, ..., vn

ªare linearly independent. We want to show that

©l¡v1¢, ..., l (vn)

ªare linearly independent as well. Suppose

nXi=1

αi · l¡vi¢= 0.

Then

l

ÃnXi=1

αi · vi!= 0.

and therefore¡Pn

i=1 αi · vi¢∈ ker l = {0}, where the last equality comes from the fact that l is

nonsingular. ThenPn

i=1 αi · vi = 0 and, since©v1, ..., vn

ªare linearly independent, (αi)

ni=1 = 0, as

desired.

Definition 280 Let two vector spaces V and U be given. U is isomorphic to V if there exists afunction l ∈ L (V,U) which is one-to-one and onto. l is called an isomorphism from V to U .

Remark 281 By definition of isomorphism , if l is an isomorphism, the l is invertible and therefore,from Proposition 267, l−1is linear.

Remark 282 “To be isomorphic” is an equivalence relation.

Proposition 283 Any n-dimensional vector space V on a field F is isomorphic to Fn.

Proof. Since V and Fn are vector spaces, we are left with showing that there exists an isomor-phism between them. Let v =

©v1, ..., vn

ªbe a basis of V . Define

cr : V → Fn, v 7→ [v]v ,

where cr stands for “coordinates”.1. cr is linear. Given v, w ∈ V , suppose

v =nXi=1

aivi and w =

nXi=1

bivi

i.e.,[v]v = [ai]

ni=1 and [w]v = [bi]

ni=1 .

∀α, β ∈ F and ∀v1, v2 ∈ V ,

αv + βw = αnXi=1

aivi + β

nXi=1

bivi =

nXi=1

(αai + βbi) vi

i.e.,[αv + βw]v = α [ai]

ni=1 + β [bi]

ni=1i = α [v]v + β [w]v .

2. cr is onto. ∀ (a)ni=1 ∈ Rn, cr¡Pn

i=1 aivi¢= (a)

ni=1.

3. cr is one-to-one. cr (v) = cr (w) implies that v = w, simply because v =Pn

i=1 cri (v)ui and

w =Pn

i=1 cri (w)ui.

Page 86: Eui math-course-2012-09--19

84 CHAPTER 7. LINEAR FUNCTIONS

Proposition 284 Let V and U be finite dimensional vectors spaces on the same field F such thatS =

©v1, ..., vn

ªis a basis of V and

©u1, ..., un

ªis a set of arbitrary vectors in U . Then there exists

a unique linear function l : V → U such that ∀i ∈ {1, ..., n}, l¡vi¢= ui.

Proof. The proof goes the following three steps.1. Define l;2. Show that l is linear;3. Show that l is unique.1. Using Definition 199, ∀v ∈ V , define

l : V → U, v 7→nXi=1

[v]is · ui.

Then ∀j ∈ {1, ..., n},£vj¤S= ejn

2 and l¡vj¢=Pn

j=1 ejn · uj = uj .

2. Let v, w ∈ V and α, β ∈ F . Then

l (αv + βw) =nXi=1

[αv + βw]iS · ui =

nXi=1

³α [v]

iS + β [w]

iS

´· ui = α

nXi=1

[v]iS · ui + β

nXi=1

[w]iS · ui,

where the before the last equality follows from the linearity of [.] - see the proof of Proposition 2833. Suppose g ∈ L (V,U) and ∀i ∈ {1, ..., n}, g

¡vi¢= ui. Then ∀v ∈ V ,

g (v) = g

ÃnXi=1

[v]iS · vi

!=

nXi=1

[v]iS · g

¡vi¢=

nXi=1

[v]iS · ui = l (v)

where the last equality follows from the definition of l.

Remark 285 Observe that if V and U are finite(nonzero) dimensional vector spaces, there is amultitude of functions from V to U . The above Proposition says that linear functions are completelydetermined by what “they do to the elements of a basis”of V .

Proposition 286 Assume that l ∈ L (V,U) .Then,

l is one-to-one ⇔ l is nonsingular

Proof. [⇒]Take v ∈ ker l. Then

l (v) = 0 = l (0)

where last equality follows from Remark 263. Since l is one-to-one, v = 0.[⇐] If l (v) = l (w), then l (v − w) = 0 and, since l is nonsingular, v − w = 0.

Proposition 287 Assume that V and U are finite dimensional vector spaces and l ∈ L (V,U) .Then,1. l is one-to-one ⇒ dimV ≤ dimU ;2. l is onto ⇒ dimV ≥ dimU ;3. l is invertible ⇒ dimV = dimU .

Proof. The main ingredient in the proof is Proposition 274 , i.e., the Dimension Theorem.1. Since l is one-to-one, from the previous Proposition, dimker l = 0. Then, from Proposition

274 (the Dimension Theorem), dimV = dim Im l. Since Im l is a subspace of U , then dim Im l ≤dimU .2. Since l is onto iff Im l = U , from Proposition 274 (the Dimension Theorem), we get

dimV = dimker l + dimU ≥ dimU.

3. l is invertible iff l is one-to-one and onto.

Proposition 288 Let V and U be finite dimensional vector space on the same field F . Then,

U and V are isomorphic ⇔ dimU = dimV.

2Recall that ein ∈ Rn is the i− th element in the canonical basis of Rn.

Page 87: Eui math-course-2012-09--19

7.3. NONSINGULAR FUNCTIONS AND ISOMORPHISMS 85

Proof. [⇒]It follows from the definition of isomorphism and part 3 in the previous Proposition.[⇐]Assume that V and U are vector spaces such that dimV = dimU = n. Then, from Proposition

283, V and U are isomorphic to Fn and from Remark 282, the result follows.

Proposition 289 Suppose V and U are vector spaces such that dimV = dimU = n and l ∈L (V,U). Then the following statements are equivalent.1. l is nonsingular, i.e., ker l = {0} ,2. l is one-to-one,3. l is onto,4. l is an isomorphism.

Proof. [1⇔ 2] .It is the content of Proposition 286.[1⇒ 3]Since l is nonsingular, then ker l = {0} and dimker l = 0. Then, from Proposition 274 (the

Dimension Theorem), i.e., dimV = dimker l + dim Im l, and the fact dimV = dimU , we getdimU = dim Im l. Since Im l ⊆ U and U is finite dimensional, from Proposition 185, Im l = U , i.e.,l is onto, as desired.[3⇒ 1]Since l is onto, dim Im l = dimV and from Proposition 274 (the Dimension Theorem), dimV =

dimker t+ dimV, and therefore dimker l = 0, i.e., l is nonsingular.[1⇒ 4]It follows from the definition of isomorphism and the facts that [1⇔ 2] and [1⇒ 3].[4⇒ 1]It follows from the definition of isomorphism and the facts that [2⇔ 1].

Page 88: Eui math-course-2012-09--19

86 CHAPTER 7. LINEAR FUNCTIONS

Page 89: Eui math-course-2012-09--19

Chapter 8

Linear functions and matrices

In Remark 66 we have seen that the set of m × n matrices with the standard sum and scalarmultiplication is a vector space, called M (m,n). We are going to show that:1. the set L (V,U) with naturally defined sum and scalar multiplication is a vector space, called

L (V,U);2. If dimV = n and dimU = m, then L (V,U) and M (m,n) are isomorphic.

8.1 From a linear function to the associated matrix

Definition 290 Suppose V and U are vector spaces over a field F and l1, l2 ∈ L (V,U) and α ∈ F .

l1 + l2 : V → U, v 7→ l1 (v) + l2 (v)

αl1 : V → U, v 7→ αl1 (v)

Proposition 291 L (V,U) with the above defined operations is a vector space on F , denoted byL (V,U).

Proof. Exercise.

Remark 292 Compositions of linear functions are linear.

Definition 293 Suppose l ∈ L (V,U) , v =©v1, ..., vn

ªis a basis of V , u =

©u1, ..., um

ªis a basis

of U . Then,

[l]uv :=£ £

l¡v1¢¤u

...£l¡vj¢¤u

... [l (vn)]u¤∈M (m,n) , (8.1)

where for any j ∈ {1, ..., n},£l¡vj¢¤uis a column vector, is called the matrix representation of l

relative to the basis v and u. In words, [l]uv is the matrix whose columns are the coordinates relativeto the basis of the codomain of l of the images of each vector in the basis of the domain of l.

Definition 294 Suppose l ∈ L (V,U) , v =©v1, ..., vn

ªis a basis of V , u =

©u1, ..., um

ªis a basis

of U .

ϕuv : L (V,U)→M (m,n) , l 7→ [l]uv defined in (8.1) .

If no confusion may arise, we will denote ϕuv simply by ϕ.

The proposition below shows that multiplying the coordinate vector of v relative to the basis vby the matrix [l]uv, we get the coordinate vector of l (v) relative to the basis u.

Proposition 295 ∀v ∈ V ,

[l]uv · [v]v = [l (v)]u (8.2)

87

Page 90: Eui math-course-2012-09--19

88 CHAPTER 8. LINEAR FUNCTIONS AND MATRICES

Proof. Assume v ∈ V . First of all observe that

[l]uv · [v]v =£ £

l¡v1¢¤u

...£l¡vj¢¤u

... [l (vn)]u¤⎡⎢⎢⎢⎢⎣[v]

1v

...

[v]jv...[v]

nv

⎤⎥⎥⎥⎥⎦ =nXj=1

[v]jv ·£l¡vj¢¤u.

Moreover, from the linearity of the function cru := [.]u, and using the fact that the compositionof linear functions is a continuous function, we get:

[l (v)]u = cru (l (v)) = (cru ◦ l)

⎛⎝ nXj=1

[v]jv · vj⎞⎠u

=nXj=1

[v]jv · (cru ◦ l)¡vj¢=

nXj=1

[v]jv ·£l¡vj¢¤u.

Example 296 Let’s verify equality (8.2) in the case in whicha.

l : R2 → R2, (x1, x2) 7→µ

x1 + x2x1 − x2

¶b. the basis v of the domain of l is ½µ

10

¶,

µ01

¶¾,

c. the basis u of the codomain of l is ½µ11

¶,

µ21

¶¾,

d.

v =

µ34

¶.

The main needed computations are presented below.

[l]uv :=

∙∙l

µ10

¶¸u

,

∙l

µ01

¶¸u

¸=

∙∙11

¸u

,

∙1−1

¸u

¸=

∙1 −30 2

¸,

[v]v =

∙34

¸,

[l (v)]u =

∙µ7−1

¶¸u

=

∙−98

¸,

[l]uv · [v]v =

∙1 −30 2

¸ ∙34

¸=

∙−98

¸.

8.2 From a matrix to the associated linear functionGiven A ∈M (m,n) , recall that ∀i ∈ {1, ..,m}, Ri (A) denotes the i− th row vector of A, i.e.,

A =

⎡⎢⎢⎢⎢⎣R1 (A)...Ri (A)...Rm (A)

⎤⎥⎥⎥⎥⎦m×n

Definition 297 Consider vector spaces V and U with basis v =©v1, ..., vn

ªand u =

©u1, ..., um

ª,

respectively. Given A ∈M (m,n) , define

luA,v : V → U, v 7→mXi=1

(Ri (A) · [v]v) · ui.

Page 91: Eui math-course-2012-09--19

8.3. M (M,N) AND L (V,U) ARE ISOMORPHIC 89

Proposition 298 luA,v defined above is linear, i.e., luA,v ∈ L (V,U).

Proof. ∀α, β ∈ R and ∀v1, v2 ∈ V ,

luA,v¡αv1 + βv2

¢=Pm

i=1Ri (A) ·£αv1 + βv2

¤v· ui =

Pmi=1Ri (A) ·

¡α£v1¤v+ β

£v2¤v

¢· ui =

= αPm

i=1Ri (A) ·£v1¤v· ui + β

Pmi=1Ri (A) ·

£v2¤v· ui = αluA,v

¡v1¢+ βluA,v

¡v2¢.

where the second equality follows from the proof of Proposition 283.

Definition 299 Given the vector spaces V and U with basis v =©v1, ..., vn

ªand u =

©u1, ..., um

ª,

respectively, defineψuv :M (m,n)→ L (V,U) : A 7→ luA,v .

If no confusion may arise, we will denote ψuv simply by ψ.

Proposition 300 ψ defined above is linear.

Proof. We want to show that ∀α, β ∈ R and ∀A,B ∈M (m,n),

ψ (αA+ βB) = αψ (A) + βψ (B)

i.e.,lvαA+βB,u = αluA,v + βlvB,u

i.e., ∀v ∈ V ,lvαA+βB,u (v) = αluA,v (v) + βlvB,u (v) .

Now,

lvαA+βB,u (v) =Pm

i=1 (α ·Ri (A) + β ·Ri (B)) · [v]v · ui =

= αPm

i=1Ri (A) · [v]v · ui + βPm

i=1Ri (B) · [v]v · ui = αluA,v (v) + βlvB,u (v) ,

where the first equality come from Definition 297.

8.3 M (m,n) and L (V, U) are isomorphicProposition 301 Given the vector space V and U with dimension n and m, respectively,

M (m,n) and L (V,U) are isomorphic.

Proof. Linearity of the two spaces was proved above. We want now to check that ψ presentedin Definition 299 is an isomorphism, i.e., ψ is linear, one-to-one and onto. In fact, thanks toProposition 300, it is enough to show that ψ is invertible.First proof.1. ψ is one-to-one: see Theorem 2, page 105 in Lang (1971);2. ψ is onto: see bottom of page 107 in Lang (1970).Second proof.1. ψ ◦ ϕ = idL(V,U).Given l ∈ L (V,U), we want to show that ∀v ∈ V,

l (v) = ((ψ ◦ ϕ) (l)) (v)

i.e., from Proposition 283,[l (v)]u = [((ψ ◦ ϕ) (l)) (v)]u .

First of all, observe that from (8.2), we have

[l (v)]u = [l]uv [v]v .

Page 92: Eui math-course-2012-09--19

90 CHAPTER 8. LINEAR FUNCTIONS AND MATRICES

Moreover,

[((ψ ◦ ϕ) (l)) (v)]u(1)=£ψ¡[l]uv¢(v)¤u

(2)=£Pn

i=1

¡i− th row of [l]uv

¢· [v]v · ui

¤u

(3)=

=£¡i− th row of [l]uv

¢· [v]v

¤ni=1

(4)= [l]

uv · [v]v

where (1) comes from the definition of ϕ, (2) from the definition of ψ, (3) from the definition of[.]u, (4) from the definition of product between matrices.2. ϕ ◦ ψ = idM(m,n).Given A ∈M (m,n), we want to show that (ϕ ◦ ψ) (A) = A. By definition of ψ,

ψ (A) = luA,v such that ∀v ∈ V , luA,v (v) =mXi=1

Ri (A) · [v]v · ui. (8.3)

By definition of ϕ,ϕ (ψ (A)) =

£luA,v

¤uv.

Therefore, we want to show that£luA,v

¤uv= A. Observe that from 8.3,

luA,v(v1) = m

i=1 Ri(A)·[v1]v·ui = mi=1[ai1,...,aij ,...,ain]·[1,...,0,...,0]·ui = a11u1+...+ai1ui+....+am1um

....

luA,v(vj) = m

i=1Ri(A)·[vj ]v·ui = mi=1[ai1,...,aij ,...,ain]·[0,...,1,...,0]·ui = a1ju1+...+aijui+...+amjum

...

luA,v(vn) = m

i=1 Ri(A)·[vn]v·ui = mi=1[ai1,...,aij ,...,ain]·[0,...,0,...,1]·ui = a1nu1+...+ainui+...+amnum

(From the above, it is clear why in definition 293 we take the transpose.) Therefore,

£luA,v

¤uv=

⎡⎢⎢⎢⎢⎣a11 ... a1j ... a1n...ai1 aij ain...am1 ... amj ... amn

⎤⎥⎥⎥⎥⎦ = A,

as desired.

Proposition 302 Let the following objects be given.1. Vector spaces V with basis v =

©v1, ..., vj , ..., vn

ª, U with basis u =

©u1, ..., ui, ..., um

ªand

W with basis w =©w1, ..., wk, ..., wp

ª;

2. l1 ∈ L (V,U) and l2 ∈ L (U,W ).Then

[l2 ◦ l1]wv = [l2]wu · [l1]

uv ,

orϕwv (l2 ◦ l1) = ϕwu (l2) · ϕuv (l1) .

Proof. By definition1

[l1]uv =

£ £l1¡v1¢¤u

...£l1¡vj¢¤u

... [l1 (vn)]u

¤m×n :=

:=

⎡⎢⎢⎢⎢⎣l11¡v1¢

... l11¡vj¢

... l11 (vn)

...li1¡v1¢

li1¡vj¢

li1 (vn)

...lm1¡v1¢

... lm1¡vj¢

... lm1 (vn)

⎤⎥⎥⎥⎥⎦ :=⎡⎢⎢⎢⎢⎣

l111 ... l1j1 ... l1n1...

li11 lij1 lin1...

lm11 ... lmj1 ... lmn

1

⎤⎥⎥⎥⎥⎦ :=

:=hlij1

ii∈{1,...,m},j∈{1,...,n}

:= A ∈M (m,n) ,

1 l1 v1u, .., [l1 (vn)]u .are column vectors.

Page 93: Eui math-course-2012-09--19

8.3. M (M,N) AND L (V,U) ARE ISOMORPHIC 91

and therefore ∀j ∈ {1, ..., n} , l1¡vj¢=Pm

i=1 lij1 · ui.

Similarly,

[l2]wu =

£ £l2¡u1¢¤w

...£l2¡ui¢¤w

... [l2 (um)]w

¤p×m :=

:=

⎡⎢⎢⎢⎢⎣l12¡u1¢

... l12¡ui¢

... l12 (um)

...lk2¡u1¢

lk2¡ui¢

lk2 (um)

...lp2¡u1¢

... lp2¡ui¢

... lp2 (um)

⎤⎥⎥⎥⎥⎦ :=⎡⎢⎢⎢⎢⎣

l112 ... l1i2 ... l1m2...lk12 lki2 lkm2...

lp12 ... lpi2 ... lpm2

⎤⎥⎥⎥⎥⎦ :=

:=£lki2¤k∈{1,...,p},i∈{1,...,m} := B ∈M (p,m) ,

and therefore ∀i ∈ {1, ...,m} , l2¡ui¢=Pp

k=1 lki2 · wk.

Moreover, defined l := (l2 ◦ l1), we get

[l2 ◦ l1]wv =£ £

l¡v1¢¤w

...£l¡vj¢¤w

... [l (vn)]w¤p×n :=

:=

⎡⎢⎢⎢⎢⎣l1¡v1¢

... l1¡vj¢

... l1 (vn)...

lk¡v1¢

lk¡vj¢

lk (vn)...

lp¡v1¢

... lp¡vj¢

... lp (vn)

⎤⎥⎥⎥⎥⎦ :=⎡⎢⎢⎢⎢⎣

l11 ... l1j ... l1n

...lk1 lkj lkn

...lp1 ... lpj ... lpn

⎤⎥⎥⎥⎥⎦ :=

:=£lkj¤k∈{1,...,p},j∈{1,...,n} := C ∈M (p, n) ,

and therefore ∀j ∈ {1, ..., n} , l¡vj¢=Pp

k=1 lkj · wk.

Now, ∀j ∈ {1, ..., n}

l¡vj¢= (l2 ◦ l1)

¡vj¢= l2

¡l1¡vj¢¢= l2

³Pmi=1 l

ij1 · ui

´=Pm

i=1 lij1 · l2

¡ui¢=Pm

i=1 lij1 ·Pp

k=1 lki2 · wk =

Ppk=1

Pmi=1 l

ki2 · l

ij1 · wk.

The above says that ∀j ∈ {1, ..., n}, the j − th column of C is⎡⎢⎢⎢⎢⎣Pm

i=1 l1i2 · l

ij1Pm

i=1 lki2 · l

ij1Pm

i=1 lpi2 · l

ij1

⎤⎥⎥⎥⎥⎦On the other hand, the j − th column of B ·A is

⎡⎢⎢⎢⎢⎣[1st row of B] · [j − th column of A]

...[k − th row of B] · [j − th column of A]

...[p− th row of B] · [j − th column of A]

⎤⎥⎥⎥⎥⎦ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

£l112 ... l1i2 ... l1m2

¤·

⎡⎢⎢⎢⎢⎣l1j1

lij1

lmj1

⎤⎥⎥⎥⎥⎦...

£lk12 lki2 lkm2

¤·

⎡⎢⎢⎢⎢⎣l1j1

lij1

lmj1

⎤⎥⎥⎥⎥⎦...

£lp12 ... lpi2 ... lpm2

¤·

⎡⎢⎢⎢⎢⎣l1j1

lij1

lmj1

⎤⎥⎥⎥⎥⎦

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

=

⎡⎢⎢⎢⎢⎣Pm

i=1 l1i2 · l

ij1Pm

i=1 lki2 · l

ij1Pm

i=1 lpi2 · l

ij1

⎤⎥⎥⎥⎥⎦

Page 94: Eui math-course-2012-09--19

92 CHAPTER 8. LINEAR FUNCTIONS AND MATRICES

as desired.

8.4 Some related properties of a linear function and associ-ated matrix

In this section, the following objects will be given: a vector space V with a basis v =©v1, ..., vn

ª;

a vector space U with a basis u =©u1, ..., un

ª; l ∈ L (V,U) and A ∈M (m,n).

From Remark 147, recall that

col span A = {z ∈ Rm : ∃ x ∈ Rn such that z = Ax} ;

Lemma 303 cru (Im l) = col span [l]uv.

Proof. [⊆]y ∈ cru (Im l)

def cr⇒ ∃u ∈ Im l such that cru (u) = [u]u = ydef Im l⇒ ∃v ∈ V such that l (v) = u⇒

∃v ∈ V such that [l (v)]u = yProp. 295⇒ ∃v ∈ V such that [l]uv · [v]v = y ⇒ y ∈ col span [l]uv.

[⊇]We want to show that y ∈ col span [l]uv ⇒ y ∈ cru (Im l), i.e., ∃u ∈ Im l such that y = [u]u.

y ∈ col span [l]uv ⇒ ∃xy ∈ Rn such that [l]uv ·xy = y

def cr⇒ ∃v =Pn

j=1 xy,jvj such that [l]uv · [v]v =

yProp. 295⇒ ∃v ∈ V such that [l (v)]u = y

u=l(v)⇒ ∃u ∈ Im l such that y = [u]u, as desired.

Lemma 304 dim Im l = dimcol span [l]uv = rank [l]uv.

Proof. It follows from the above Lemma, the proof of Proposition 283, which says that cru isan isomorphism and Proposition 288, which says that isomorphic spaces have the same dimension.

Proposition 305 Given l ∈ L (V,U),

1. l onto ⇔ rank [l]uv = dimU ;

2. l one-to-one ⇔ rank [l]uv = dimV ;

3. l invertible ⇔ [l]uv invertible, and in that case£l−1¤vu=£[l]uv¤−1.

Proof. Recall that from Remark 277 and Proposition 286, l one-to-one ⇔ l nonsingular ⇔ker l = {0}.

1. l onto ⇔ Im l = U ⇔ dim Im l = dimULemma 304⇔ rank [l]

uv = dimU .

2. l one-to-oneProposition 274⇔ dim Im l = dimV

Lemma 304⇔ rank [l]uv = dimV .

3. The first part of the statement follows from 1. and 2. above. The second part is provenbelow. First of all observe that for any vector space W with basis w, idW ∈ L (W,W ) and ifW has a basis w =

©w1, .., wk

ª, we also have that

[idW ]ww =

££idW

¡w1¢¤w, ..,£idW

¡wk¢¤w

¤= Ik.

Moreover, if l is invertiblel−1 ◦ l = idV

and £l−1 ◦ l

¤vv= [idU ]

vv = Im.

Since £l−1 ◦ l

¤vv=£l−1¤vu· [l]uv ,

the desired result follows.

Page 95: Eui math-course-2012-09--19

8.4. SOMERELATED PROPERTIES OF A LINEAR FUNCTION ANDASSOCIATEDMATRIX93

Remark 306 From the definitions of ϕ and ψ, we have what follows:1.

l = ψ (ϕ (l)) = ψ¡[l]uv

¢= lu[l]uv,v.

2.A = ϕ (ψ (A)) = ϕ

¡luA,v

¢=£luA,v

¤uv.

Lemma 307 cru¡Im luA,v

¢= col spanA.

Proof. Recall that ϕ (l) = [l]uv and ψ (A) = luA,v. For any l ∈ L (V,U),

cru (Im l)Lemma 303

= col span [l]uv

Def. 294= col span ϕ (l) (8.4)

Take l = luA,v. Then from (8.4) ,we have

cru¡Im luA,v

¢= col span ϕ

¡luA,v

¢ Rmk. 306.2= col span A.

Lemma 308 dim Im luA,v = dimcol spanA = rankA.

Proof. Since Lemma 304 holds for any l ∈ L (V,U) and luA,v ∈ L (V,U), we have that

dim Im luA,v = dimcol span£luA,v

¤uv

Rmk. 306.2= dimcol span A

Rmk. 241.= rank A.

Proposition 309 Let A ∈M (m,n) be given.

1. rankA = m⇔ luA,v onto;

2. rankA = n⇔ luA,v one-to-one;

3. A invertible ⇔ luA,v invertible, and in that case lvA−1,u =

¡luA,v

¢−1.

Proof. 1. rankA = mLemma 308⇔ dim Im luA,v = m⇔ luA,v onto;

2. rankA = n(1)⇔ dimker luA,v = 0

Proposition 286⇔ luA,v one-to-one,where (1) the first equivalence follows form the fact that n = dimker luA,v + dim Im luA,v, and

Lemma 308.3. First statement: A invertible

Prop. 236⇔ rankA = m = n1 and 2 above⇔ luA,v invertible.

Second statement: Since luA,v invertible, there exists¡luA,v

¢−1: U → V such that

idV =¡luA,v

¢−1 ◦ luA,v.Then

I = ϕvv (idV )Prop. 302= ϕvu

³¡luA,v

¢−1´ · ϕuv ¡luA,v¢ Rmk. 306= ϕvu

³¡luA,v

¢−1´ ·A.Then, by definition of inverse matrix,

A−1 = ϕvu

³¡luA,v

¢−1´and

ψvu¡A−1

¢= (ψvu ◦ ϕvu)

³¡luA,v

¢−1´= idL(U,U)

³¡luA,v

¢−1´=¡luA,v

¢−1.

Finally, from the definition of ψvu, we have

ψvu¡A−1

¢= lvA−1,u,

as desired.

Page 96: Eui math-course-2012-09--19

94 CHAPTER 8. LINEAR FUNCTIONS AND MATRICES

Remark 310 Consider A ∈ M(n, n). Then from Proposition 309,

A invertible ⇔ l^u_{A,v} invertible;

from Proposition 236,

A invertible ⇔ A nonsingular;

from Proposition 289,

l^u_{A,v} invertible ⇔ l^u_{A,v} nonsingular.

Therefore, A nonsingular ⇔ l^u_{A,v} nonsingular.

8.5 Some facts on L(R^n, R^m)

In this Section, we specialize (basically, repeat) the content of the previous Section in the important case in which

V = R^n,  v = (e_n^j)_{j=1}^n := e_n,
U = R^m,  u = (e_m^i)_{i=1}^m := e_m,

and we denote the generic vector by v = x,   (8.5)

and therefore l ∈ L(R^n, R^m).

From L(R^n, R^m) to M(m, n). From Definition 293, we have

[l]_{e_n}^{e_m} = [ [l(e_n^1)]_{e_m} ... [l(e_n^j)]_{e_m} ... [l(e_n^n)]_{e_m} ] = [ l(e_n^1) ... l(e_n^j) ... l(e_n^n) ] := [l];   (8.6)

from Definition 294,

φ := φ_{e_n}^{e_m} : L(R^n, R^m) → M(m, n), l ↦ [l];

from Proposition 295,

[l] · x = l(x).   (8.7)

From M(m, n) to L(R^n, R^m). From Definition 297,

l_A := l^{e_m}_{A, e_n} : R^n → R^m,   (8.8)

l_A(x) = \sum_{i=1}^m (R_i(A) · [x]_{e_n}) · e_m^i = \begin{bmatrix} R_1(A) · x \\ \vdots \\ R_i(A) · x \\ \vdots \\ R_m(A) · x \end{bmatrix} = Ax.   (8.9)

From Definition 299,

ψ := ψ^{e_m}_{e_n} : M(m, n) → L(R^n, R^m), A ↦ l_A.

From Proposition 301,

M(m, n) and L(R^n, R^m) are isomorphic.


From Proposition 302, if l_1 ∈ L(R^n, R^m) and l_2 ∈ L(R^m, R^p), then

[l_2 ∘ l_1] = [l_2] · [l_1].   (8.10)

Some related properties. From Proposition 305, given l ∈ L(R^n, R^m):
1. l onto ⇔ rank [l] = m;
2. l one-to-one ⇔ rank [l] = n;
3. l invertible ⇔ [l] invertible, and in that case [l^{-1}] = [l]^{-1}.

From Remark 306:
1. l = ψ(φ(l)) = ψ([l]) = l_{[l]};
2. A = φ(ψ(A)) = φ(l_A) = [l_A].

From Proposition 309, given A ∈ M(m, n):
1. rank A = m ⇔ l_A onto;
2. rank A = n ⇔ l_A one-to-one;
3. A invertible ⇔ l_A invertible, and in that case l_{A^{-1}} = (l_A)^{-1}.

Remark 311 From (8.7) and Remark 147,

Im l := {y ∈ R^m : ∃ x ∈ R^n such that y = [l] · x} = col span [l].   (8.11)

Then, from the above and Remark 241,

dim Im l = rank [l] = max # linearly independent columns of [l].   (8.12)

Similarly, from (8.9) and Remark 241, we get

Im l_A := {y ∈ R^m : ∃ x ∈ R^n such that y = l_A(x) = Ax} = col span A,

and

dim Im l_A = rank A = max # linearly independent columns of A.

Remark 312 Assume that l ∈ L(R^n, R^m). The above Remark gives a way of finding a basis of Im l: it is enough to consider rank [l] many linearly independent vectors among the column vectors of [l]. In more detail, we proceed as follows.

1. Compute [l].
2. Compute dim Im l = rank [l] := r.
3. To find a basis of Im l, find r vectors which are (a) linearly independent and (b) elements of Im l.

To get a "simpler" basis, you can perform elementary operations on those columns. Recall that elementary operations on linearly independent vectors lead to linearly independent vectors, and that elementary operations on vectors lead to vectors belonging to the span of the set of starting vectors. A numerical sketch of this procedure follows.
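The procedure above can be sketched numerically. The following Python fragment is a minimal sketch (it assumes numpy is available, which is not part of these notes): it computes r = rank [l] and greedily keeps any column that increases the accumulated rank; the matrix below is the hypothetical [l] of Example 313 with n = 4.

```python
import numpy as np

# Hypothetical [l] of Example 313 with n = 4.
L = np.array([[1., 1., 1., 1.],
              [0., 1., 0., 0.],
              [0., 1., 1., 0.]])

r = np.linalg.matrix_rank(L)      # r = dim Im l
basis_cols = []                   # indices of columns kept so far
for j in range(L.shape[1]):
    # keep column j only if it increases the rank accumulated so far
    if np.linalg.matrix_rank(L[:, basis_cols + [j]]) > len(basis_cols):
        basis_cols.append(j)

basis = L[:, basis_cols]          # its columns form a basis of Im l
print(r, basis_cols)              # 3, [0, 1, 2]
```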

Example 313 Given n ∈ N with n ≥ 3 and the linear function

l : R^n → R^3, x := (x_i)_{i=1}^n ↦ ( \sum_{i=1}^n x_i, x_2, x_2 + x_3 ),

find a basis for Im l.

1.

[l] = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 1 & 0 & \cdots & 0 \end{bmatrix}.

2. Since, performing elementary column operations,

rank \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix} = rank \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = rank \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = 3,

we have dim Im l = rank [l] = 3.

3. A basis of Im l is given by the column vectors of

\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix}.

Remark 314 From (8.7), we have that

ker l = {x ∈ R^n : [l] x = 0},

i.e., ker l is the set, in fact the vector space, of solutions to the system [l] x = 0. In Remark 361, we will describe an algorithm to find a basis of the kernel of an arbitrary linear function.

Remark 315 From Remark 311 and Proposition 274 (the Dimension Theorem), given l ∈ L(R^n, R^m),

dim R^n = dim ker l + rank [l],

and given A ∈ M(m, n),

dim R^n = dim ker l_A + rank A.
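This identity is easy to check numerically: the rows of Vᵀ in a singular value decomposition that correspond to (numerically) zero singular values span ker l_A, so counting them recovers n − rank A. A minimal sketch, assuming numpy; the matrix A is a hypothetical example.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])            # a hypothetical A in M(2, 3)
m, n = A.shape
U, s, Vt = np.linalg.svd(A)             # full SVD: Vt is n x n
rank = int((s > 1e-10).sum())
kernel_basis = Vt[rank:, :]             # rows spanning ker l_A
# dim R^n = dim ker l_A + rank A
assert kernel_basis.shape[0] + rank == n
print(rank, n - rank)                   # 2, 1
```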

8.6 Examples of computation of [l]_v^u

1. id ∈ L(V, V):

[id]_v^v = [ [id(v^1)]_v, ..., [id(v^j)]_v, ..., [id(v^n)]_v ] = [ [v^1]_v, ..., [v^j]_v, ..., [v^n]_v ] = [ e_n^1, ..., e_n^j, ..., e_n^n ] = I_n.

2. 0 ∈ L(V, U):

[0]_v^u = [ [0]_u, ..., [0]_u, ..., [0]_u ] = 0 ∈ M(m, n).

3. l_α ∈ L(V, V), with α ∈ F:

[l_α]_v^v = [ [α·v^1]_v, ..., [α·v^j]_v, ..., [α·v^n]_v ] = [ α·[v^1]_v, ..., α·[v^j]_v, ..., α·[v^n]_v ] = [ α·e_n^1, ..., α·e_n^j, ..., α·e_n^n ] = α·I_n.

4. l_A ∈ L(R^n, R^m), with A ∈ M(m, n):

[l_A] = [ A·e_n^1, ..., A·e_n^j, ..., A·e_n^n ] = A·[ e_n^1, ..., e_n^j, ..., e_n^n ] = A·I_n = A.

5. (projection function) proj_{n+k,n} ∈ L(R^{n+k}, R^n), proj_{n+k,n} : (x_i)_{i=1}^{n+k} ↦ (x_i)_{i=1}^n. Defining p := proj_{n+k,n}, we have

[p] = [ p(e_{n+k}^1), ..., p(e_{n+k}^n), p(e_{n+k}^{n+1}), ..., p(e_{n+k}^{n+k}) ] = [ e_n^1, ..., e_n^n, 0, ..., 0 ] = [I_n | 0],

where 0 ∈ M(n, k).

6. (immersion function) i_{n,n+k} ∈ L(R^n, R^{n+k}), i_{n,n+k} : (x_i)_{i=1}^n ↦ ((x_i)_{i=1}^n, 0) with 0 ∈ R^k. Defining i := i_{n,n+k}, we have

[i] = [ i(e_n^1), ..., i(e_n^n) ] = \begin{bmatrix} e_n^1 & \cdots & e_n^n \\ 0 & \cdots & 0 \end{bmatrix} = \begin{bmatrix} I_n \\ 0 \end{bmatrix},

where 0 ∈ M(k, n).

Remark 316 Point 4 above implies that if l : R^n → R^m, x ↦ Ax, then [l] = A. In other words, to compute [l] you do not have to take the image of each element of the canonical basis: the first row of [l] is the vector of the coefficients of the first component function of l, and so on. For example, if l : R^2 → R^3,

x ↦ ( a_{11}x_1 + a_{12}x_2, a_{21}x_1 + a_{22}x_2, a_{31}x_1 + a_{32}x_2 ),

then

[l] = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.

8.7 Change of basis and linear operators

In this section, we answer the following question: how does the matrix representation of a linear function l ∈ L(V, V) for a given basis v = {v^1, .., v^n} change when we change the basis?

Definition 317 Let V be a vector space over a field F. l ∈ L(V, V) is called a linear operator on V.

Proposition 318 Let P be the change-of-basis matrix from a basis v to a basis u for a vector space V. Then, for any l ∈ L(V, V),

[l]_u^u = P^{-1} [l]_v^v P.

In words, if A is the matrix representing l in the basis v, then B = P^{-1}AP is the matrix which represents l in the new basis u, where P is the change-of-basis matrix from v to u.

Proof. For any v ∈ V, from Proposition 205, we have that

P [v]_u = [v]_v.

Then, premultiplying both sides by P^{-1}[l]_v^v, we get

P^{-1} [l]_v^v P [v]_u = P^{-1} [l]_v^v [v]_v = P^{-1} [l(v)]_v = [l(v)]_u,   (8.13)

where the second equality follows from Proposition 295 and the third from Proposition 205. From Proposition 295,

[l]_u^u · [v]_u = [l(v)]_u.   (8.14)

Therefore, from (8.13) and (8.14), we get

P^{-1} [l]_v^v P [v]_u = [l]_u^u · [v]_u.   (8.15)

Since, from the proof of Proposition 283, [·]_u := cr_u : V → R^n is onto, ∀ x ∈ R^n, ∃ v ∈ V such that [v]_u = x. Observe that ∀ k ∈ {1, .., n}, [u^k]_u = e_n^k. Therefore, rewriting (8.15) for v = u^k for each k ∈ {1, .., n} and collecting the resulting equalities columnwise, we get

P^{-1} [l]_v^v P I_n = [l]_u^u · I_n,

as desired.
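Proposition 318 is easy to probe numerically: B = P^{-1}AP must act on u-coordinates exactly as A acts on v-coordinates. A minimal sketch, assuming numpy; the random matrices below are hypothetical stand-ins for [l]_v^v and the change-of-basis matrix P.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))   # plays the role of [l]_v^v
P = rng.standard_normal((n, n))   # change-of-basis matrix (invertible a.s.)
B = np.linalg.inv(P) @ A @ P      # plays the role of [l]_u^u

x_u = rng.standard_normal(n)      # coordinates [v]_u of some vector
x_v = P @ x_u                     # the same vector in v-coordinates
# l(v) computed in either basis must give the same vector:
assert np.allclose(P @ (B @ x_u), A @ x_v)
```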

Remark 319 "A and B represent l" means that there exist bases v and u such that

A = [l]_v^v and B = [l]_u^u.

Moreover, from Definition 202 and Proposition 204, there exists an invertible P which is a change-of-basis matrix. Proposition 318 above says that A and B are similar. Therefore, all the matrix representations of l ∈ L(V, V) form an equivalence class of similar matrices.

Remark 320 Now suppose that f : M(n, n) → R is such that

if A is similar to B, then f(A) = f(B).

Then, given a vector space V of dimension n, we can define

F : L(V, V) → R

such that for any basis v of V,

F(l) = f([l]_v^v).

Indeed, by the definition of f and by Remark 319, the definition is well given: for any pair of bases u and v, f([l]_u^u) = f([l]_v^v).


Remark 321 For any A, P ∈ M(n, n) with P invertible,

tr(PAP^{-1}) = tr A,

as verified below: tr(PAP^{-1}) = tr((PA)P^{-1}) = tr(P^{-1}PA) = tr A, where the second equality comes from Property 3 in Proposition 82.
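The similarity invariance that justifies the definitions below also holds for the determinant and the characteristic polynomial, and all three are easy to check numerically. A minimal sketch, assuming numpy; the matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))           # invertible almost surely
B = P @ A @ np.linalg.inv(P)              # a matrix similar to A

assert np.isclose(np.trace(A), np.trace(B))            # tr PAP^{-1} = tr A
assert np.isclose(np.linalg.det(A), np.linalg.det(B))  # det is similarity-invariant
assert np.allclose(np.poly(A), np.poly(B))             # same characteristic polynomial
```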

Remark 320, Proposition 246 and Remark 321 allow us to give the following definitions.

Definition 322 Let l ∈ L(V, V) and a basis v of V be given. The determinant, trace and characteristic polynomial of l are defined as follows:

det : L(V, V) → R, l ↦ det [l]_v^v,

tr : L(V, V) → R, l ↦ tr [l]_v^v,

∆_l : L(V, V) → space of polynomials, l ↦ ∆_{[l]_v^v}.

8.8 Diagonalization of linear operators

In this Section, we assume that V is a vector space of dimension n over a field F.

Definition 323 Given a vector space V of dimension n, l ∈ L(V, V) is diagonalizable if there exists a basis v of V such that [l]_v^v is a diagonal matrix.

We present below some definitions and results which are analogous to those discussed in Section 6.3. In what follows, V is a vector space of dimension n.

Definition 324 Let a vector space V over a field F and l ∈ L(V, V) be given. A scalar λ ∈ F is called an eigenvalue of l if there exists a nonzero vector v ∈ V such that

l(v) = λv.

Every vector v satisfying the above relationship is called an eigenvector of l associated with (or belonging to) the eigenvalue λ.

Definition 325 Let E_λ be the set of eigenvectors belonging to λ.

Proposition 326 1. There is only one eigenvalue associated with an eigenvector.
2. The set E_λ ∪ {0} is a subspace of V.

Proof. Similar to the matrix case.

Proposition 327 Let a vector space V over a field F and l ∈ L(V, V) be given. Then the following statements are equivalent.

1. λ ∈ F is an eigenvalue of l.

2. λ id_V − l is singular.

3. λ ∈ F is a solution to the characteristic equation ∆_l(t) = 0.

Proof. Similar to the matrix case.

Definition 328 Let λ be an eigenvalue of l. The algebraic multiplicity of λ is the multiplicity of λ as a solution to the characteristic equation ∆_l(t) = 0. The geometric multiplicity of λ is the dimension of E_λ ∪ {0}.

Proposition 329 l ∈ L(V, V) is diagonalizable ⇔ there exists a basis v of V made up of n (linearly independent) eigenvectors. In that case, the elements on the diagonal of D := [l]_v^v are the associated eigenvalues.


Proof. [⇒] If [l]_v^v = diag[(λ_i)_{i=1}^n] := D, then ∀ i ∈ {1, ..., n},

[l(v^i)]_v = [l]_v^v · [v^i]_v = D · e_n^i = λ_i e_n^i.

Taking (cr_v)^{-1} of both sides, we get

∀ i ∈ {1, ..., n}, l(v^i) = λ_i v^i.

[⇐] If ∀ i ∈ {1, ..., n}, l(v^i) = λ_i v^i, then ∀ i ∈ {1, ..., n}, [l(v^i)]_v = [l]_v^v · [v^i]_v = λ_i e_n^i, and therefore

[l]_v^v := [ [l(v^1)]_v | ... | [l(v^n)]_v ] = [ λ_1 e_n^1 | ... | λ_n e_n^n ] = diag[(λ_i)_{i=1}^n].

Lemma 330 {[v^1]_v, ..., [v^n]_v} is linearly independent ⇔ {v^1, ..., v^n} is linearly independent.

Proof. \sum_{i=1}^n α_i [v^i]_v = 0 ⇔ (cr_v)^{-1}( \sum_{i=1}^n α_i [v^i]_v ) = (cr_v)^{-1}(0) ⇔ \sum_{i=1}^n α_i (cr_v)^{-1}([v^i]_v) = 0 ⇔ \sum_{i=1}^n α_i v^i = 0, where the second equivalence uses the linearity of (cr_v)^{-1}.

Proposition 331 Let v^1, ..., v^r be eigenvectors of l ∈ L(V, V) belonging to pairwise distinct eigenvalues λ_1, ..., λ_r, respectively. Then v^1, ..., v^r are linearly independent.

Proof. By assumption, ∀ i ∈ {1, ..., r}, l(v^i) = λ_i v^i and [l]_v^v · [v^i]_v = λ_i · [v^i]_v. Then, from the analogous theorem for matrices, i.e., Proposition 259, {[v^1]_v, ..., [v^r]_v} is linearly independent. The result then follows from Lemma 330.

As a consequence, if l has n pairwise distinct eigenvalues, then l is diagonalizable: by Proposition 331 the n associated eigenvectors are linearly independent, hence they form a basis of V made up of eigenvectors, and Proposition 329 applies.

Proposition 332 λ is an eigenvalue of l ∈ L(V, V) ⇔ λ is a root of the characteristic polynomial of l.

Proof. Similar to the matrix case, i.e., Proposition 253.

Proof. Assume that the geometric multiplicity of λ is r. Then, by assumption, dimEλ∪{0} = rand therefore there are r linearly independent eigenvectors, v1, ..., vr, and therefore,

∀i ∈ {1, ..., r} , l¡vi¢= λvi

From Lemma 184, there exist©w1, .., ws

ªsuch that u =

©v1, ..., vr, w1, .., ws

ªis a basis of V . Then,

by definition of matrix associated with l with respect to that basis, we have

[l]uu =££l¡v1¢¤u|...| [l (vr)]u |

£l¡w1¢¤u|...| [l (ws)u]

¤=

=

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

λ0...0_00...0

0λ...0_00...0

...

...

...

..._............

00...λ_00...0

||||_||||

a11a21...ar1_b11b21...bs1

a12a22...ar2_b12b22...bs2

...

...

...

..._............

a1sa2s...ars_b1sb2s...bss

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦:=

∙λIr A0 B

¸,

for well chosen values of the entries of A and B. Then the characteristic polynomial of [l]uu is

det

µtI −

∙λIr A0 B

¸¶= det

µ∙(t− λ) Ir A

0 tIs −B

¸¶= (t− λ)

r · det (tIs −B) .

Therefore, the algebraic multiplicity of λ is greater or equal than r.


8.9 Appendix

8.9.1 The dual and double dual space of a vector space

The dual space of a vector space

Definition 334 The² dual space of a vector space V is

V* := L(V, R).

Remark 335 (careful here) From Proposition 292 in my EUI notes, we have that if dim V is finite, then dim V* = dim V.

Proposition 336 Let V be a finite dimensional vector space over a field F and let 𝒱 = {v^i}_{i=1}^n be a basis for V. Then,

1. there exists a unique basis 𝒱* = {l_i}_{i=1}^n for V* such that ∀ i, j ∈ {1, ..., n}, l_i(v^j) = δ_{ij};

2. ∀ f ∈ V*, f = \sum_{i=1}^n f(v^i) · l_i, i.e., [f]_{𝒱*} = (f(v^i))_{i=1}^n,

i.e., the coordinates of f with respect to 𝒱* are the images via f of the elements of the basis 𝒱; and

3. ∀ v ∈ V, v = \sum_{i=1}^n l_i(v) · v^i, i.e., [v]_𝒱 = (l_i(v))_{i=1}^n,

i.e., the coordinates of v with respect to 𝒱 are the images of v via the elements of the dual basis 𝒱*.

Proof. 1. From Proposition 276 in my EUI notes, we have that, given a basis 𝒱 = {v^i}_{i=1}^n for V,

∀ i ∈ {1, ..., n}, ∃! l_i ∈ V* such that ∀ j ∈ {1, ..., n}, l_i(v^j) = δ_{ij}.

We are left with showing that {l_i}_{i=1}^n is a linearly independent set and therefore a basis of the n-dimensional vector space V*. We want to show that if \sum_{i=1}^n α_i l_i = 0, then (α_i)_{i=1}^n = 0, i.e., if ∀ v ∈ V, \sum_{i=1}^n α_i l_i(v) = 0, then (α_i)_{i=1}^n = 0. Indeed, ∀ j ∈ {1, ..., n}, 0 = \sum_{i=1}^n α_i l_i(v^j) = α_j, as desired.

2. Take f ∈ V*; then, since 𝒱* = {l_i}_{i=1}^n is a basis for V*, ∃ (β_i)_{i=1}^n ∈ R^n such that f = \sum_{i=1}^n β_i l_i. Then,

∀ v^j ∈ 𝒱, f(v^j) = \sum_{i=1}^n β_i l_i(v^j) = β_j,

as desired.

3. Take v ∈ V; then, since 𝒱 = {v^i}_{i=1}^n is a basis for V, ∃ (γ_i)_{i=1}^n ∈ R^n such that v = \sum_{i=1}^n γ_i v^i. Then,

∀ l_j ∈ 𝒱*, l_j(v) = \sum_{i=1}^n γ_i l_j(v^i) = γ_j,

as desired.

Definition 337 Let a basis 𝒱 = {v^i}_{i=1}^n for V be given. The unique basis 𝒱* = {l_i}_{i=1}^n for V* such that ∀ i, j ∈ {1, ..., n}, l_i(v^j) = δ_{ij} is called the dual basis of 𝒱.

Corollary 338 If 𝒱* is the unique dual basis of 𝒱 = {v^i}_{i=1}^n, then, defining l : V → R^n, v ↦ (l_i(v))_{i=1}^n, we have that

[l]_𝒱^{E_n} = I ∈ M(n, n).

Proof. [l]_𝒱^{E_n} := [ l(v^1), ..., l(v^n) ] = I.

² Here I follow pages 98 and 99 in Hoffman and Kunze (1971).


The double dual space of a vector space

Question. Take³ a basis 𝒱* of V*. Is 𝒱* the dual basis of some basis 𝒱 of V? Corollary 344 below will answer that question positively.

Definition 339 The double dual V** of V is the dual space of V*, i.e.,

V** := (V*)* := L(V*, R).

Remark 340 If V has finite dimension, then

dim V = dim V* = dim V**.

Proposition 341 For any v ∈ V, the function

L_v : V* → R, f ↦ f(v)

is linear, i.e., L_v ∈ V**.

Proof. For any α, β ∈ F, f_1, f_2 ∈ V*,

L_v(αf_1 + βf_2) = (αf_1 + βf_2)(v) = αf_1(v) + βf_2(v) = αL_v(f_1) + βL_v(f_2).

Proposition 342 Let V be a finite dimensional vector space. Then

φ : V → V**, v ↦ L_v

is an isomorphism.

Proof. Step 1. φ is linear. For any α, β ∈ F, v_1, v_2 ∈ V, by definition of φ,

φ(αv_1 + βv_2) = L_{αv_1+βv_2},

and

αφ(v_1) + βφ(v_2) = αL_{v_1} + βL_{v_2}.

Moreover, ∀ f ∈ V*,

L_{αv_1+βv_2}(f) := f(αv_1 + βv_2) = αf(v_1) + βf(v_2) = (αL_{v_1} + βL_{v_2})(f),

where the second equality holds because f ∈ V* is linear and the third by definition of L_v.

Step 2. φ is nonsingular. We want to show that ker φ = {0}, i.e., that v ≠ 0 ⇒ φ(v) := L_v ≠ 0, i.e., ∀ v ≠ 0, ∃ f ∈ V* such that f(v) ≠ 0. Take a basis 𝒱 = {v, v^2, ..., v^n} of V containing v; the existence of such a basis is insured by Proposition 184. Then, from Proposition 336, there exists a unique dual basis 𝒱* = (l_i)_{i=1}^n associated with 𝒱, and l_1(v) = 1 ≠ 0.

Step 3. Desired result. From Remark 335, n = dim V = dim V* = dim V**. Then, from Proposition 281 in my EUI notes, since φ is nonsingular, it is invertible and therefore an isomorphism.

Corollary 343 Let V be a finite dimensional vector space over a field F. If L ∈ V**, then ∃! v ∈ V such that ∀ f ∈ V*, L(f) = f(v).

Proof. Such a v is just φ^{-1}(L).

Corollary 344 Let V be a finite dimensional vector space over a field F. Any basis of V* is the unique dual basis of some basis of V.

³ Here I follow pages 107 and 108 in Hoffman and Kunze (1971).


Proof. Take a basis {l_i}_{i=1}^n of V*. From Proposition 336, there exists a unique basis {L_i}_{i=1}^n of V** such that

∀ i, j ∈ {1, ..., n}, L_i(l_j) = δ_{ij}.   (8.16)

Then, from the previous Corollary,

∀ i ∈ {1, ..., n}, ∃! v^i ∈ V such that ∀ f ∈ V*, L_i(f) = f(v^i).   (8.17)

Now, from the definition of L_{v^i}, we have

∀ f ∈ V*, L_{v^i}(f) = f(v^i).   (8.18)

Then, from (8.17) and (8.18), we get

L_i = L_{v^i}.   (8.19)

Then,

φ^{-1}(L_i) = φ^{-1}(L_{v^i}) = v^i,

where the first equality uses (8.19) and the last equality follows from Proposition 342. Since {L_i}_{i=1}^n is a basis of V** and φ is an isomorphism, from Proposition 279, {v^i}_{i=1}^n is a basis of V. Moreover, from (8.17) and (8.16), l_i(v^j) = L_j(l_i) = δ_{ij}, and then, by definition, {l_i}_{i=1}^n is the unique dual basis associated with {v^i}_{i=1}^n.

A sometimes useful result is presented below.

Proposition 345 If V is a vector space of dimension bigger than n, for any i ∈ {1, ..., n}, l_i ∈ L(V, R),⁴ and {l_i}_{i=1}^n is a linearly independent set, then l : V → R^n, v ↦ (l_i(v))_{i=1}^n is onto.

Proof. Take an n-dimensional vector subspace V′ of V. From Corollary 344, there exists a basis 𝒱 = {v^i}_{i=1}^n of V′ such that {l_i}_{i=1}^n is its unique associated dual basis. Then, by definition of dual basis, for any i ∈ {1, ..., n}, l(v^i) = e_n^i. Therefore, for any x ∈ R^n,

l( \sum_{i=1}^n x_i v^i ) = \sum_{i=1}^n x_i e_n^i = x,

as desired.

8.9.2 Vector spaces as Images or Kernels of well chosen linear functions

This section is preliminary and incomplete.

Proposition 346 If L is a vector subspace of R^S of dimension A, then

1. a. ∃ l_1 ∈ L(R^S, R^{S−A}) such that ker l_1 = L;

b. ∃ σ ∈ Σ and ∃ E ∈ M(S − A, A) such that

[l_1] = [I_{S−A} | E] · P_σ ∈ M(S − A, S);

c. E = ψ_σ(L), where ψ_σ is a chart of an atlas of G_{S,A}.

2. a. ∃ l_2 ∈ L(R^A, R^S) such that Im l_2 = L;

b.⁵

[l_2] = P_{σ^{-1}} \begin{bmatrix} −E \\ I_{A×A} \end{bmatrix}

and therefore, if σ = id,

[l_2] = \begin{bmatrix} −E \\ I_A \end{bmatrix}_{S×A};

⁴ L(V, R) is the vector space of linear functions from V to R.
⁵ Recall that ∀ σ ∈ Σ, P_{σ^{-1}} = P_σ^{-1} = P_σ^T.


3.

L = ker( [I_{S−A} | E] · P_σ ) = Im( P_{σ^{-1}} \begin{bmatrix} −E \\ I_{A×A} \end{bmatrix} ).

Moreover,

4. Let M, M′ ∈ M_f(S − A, S) be given. Then

ker M = ker M′ ⇔ there exists B ∈ M_f(S − A, S − A) such that M′ = BM.

5. Let Y, Y′ ∈ M_f(S, A) be given. Then

Im Y = Im Y′ ⇔ there exists a unique C ∈ M_f(A, A) such that Y′ = Y C.

Proof. 1. Take a basis {l^1, ..., l^A} ⊆ R^S of L. Define

C = \begin{bmatrix} l^1 \\ \vdots \\ l^A \end{bmatrix} ∈ M(A, S),

where the vectors are taken to be row vectors. By definition of basis, rank C = A. From the Dimension Theorem, dim R^S = dim Im l_C + dim ker l_C, i.e., S = A + dim ker l_C, or dim ker l_C = S − A. Let {k^1, ..., k^{S−A}} ⊆ R^S be a basis of ker l_C. Then ∀ s ∈ {1, ..., S − A}, C k^s = 0, and

∀ a ∈ {1, ..., A}, ∀ s ∈ {1, ..., S − A}, l^a · k^s = 0,

i.e.,⁶

L = (ker l_C)^⊥.   (8.20)

Define

M = \begin{bmatrix} k^1 \\ \vdots \\ k^{S−A} \end{bmatrix} ∈ M(S − A, S).

By definition of basis, rank M = S − A. Moreover, from the fact that {k^1, ..., k^{S−A}} ⊆ R^S is a basis of ker l_C and from (8.20), we have that

L = {x ∈ R^S : ∀ s ∈ {1, ..., S − A}, k^s · x = 0} = {x ∈ R^S : M · x = 0} = ker l_M.

2.a. Let {v^a}_{a=1}^A be a basis of L ⊆ R^S. Take l_2 ∈ L(R^A, R^S) such that

∀ a ∈ {1, ..., A}, l_2(e_A^a) = v^a,

where e_A^a is the a-th element of the canonical basis of R^A. From a standard Proposition in linear algebra⁷, such a function does exist and, in fact, it is unique. Then, from the (linear algebra) Dimension Theorem,

dim Im l_2 = A − dim ker l_2 ≤ A.

Moreover, L = span {v^a}_{a=1}^A ⊆ Im l_2 and dim L = A. Therefore dim Im l_2 = A and L = Im l_2, as desired.

b. From Step 2a above, and by definition of [l_2], we have that

[l_2] = [v^1, ..., v^a, ..., v^A] ∈ M(S, A),

where {v^a}_{a=1}^A is a basis of L and the vectors v^a are written as column vectors. Therefore, we are left with finding a basis of L = ker( [I_{S−A} | E] · P_σ ), which we claim is given by the A column vectors of

P_{σ^{-1}} \begin{bmatrix} −E \\ I_{A×A} \end{bmatrix},

a fact which is proved below.

i. The vectors belong to ker( [I_{S−A} | E] · P_σ ):

[I | E] · P_σ · P_{σ^{-1}} \begin{bmatrix} −E \\ I \end{bmatrix} = [I | E] · \begin{bmatrix} −E \\ I \end{bmatrix} = 0.

ii. The vectors are linearly independent:

rank( P_{σ^{-1}} \begin{bmatrix} −E \\ I \end{bmatrix} ) = rank \begin{bmatrix} −E \\ I \end{bmatrix} = A,

where the first equality follows from Corollary 7, page 10 in the 4-author book.

3. It is an immediate consequence of 1. and 2. above.

4. See Villanacci and others (2002), Proposition 39, page 388.

5. [⇒] From, say, the Theorem on page 41 in Ostaszewski (1990)⁸, we have that

(ker Y)^⊥ = Im Y^T

(this result is proved, along with other things, in my handwritten file some-facts-on-Stiefel-manifolds-2011-10-12.pdf) and therefore, using the assumption,

ker Y^T = (Im Y)^⊥ = (Im Y′)^⊥ = ker (Y′)^T.

Then, from part 4 in the present Proposition,

there exists B̂ ∈ M_f(A, A) such that (Y′)^T = B̂ Y^T,

so

Y′ = Y B̂^T,

and it is then enough to take C = B̂^T.

[⇐] Since C ∈ M_f(A, A) is invertible: Im Y′ ⊆ Im Y, because w′ ∈ Im Y′ ⇒ ∃ z′ ∈ R^A such that w′ = Y′z′ = Y(Cz′) ⇒ w′ ∈ Im Y; and Im Y ⊆ Im Y′, because w ∈ Im Y ⇒ ∃ z ∈ R^A such that w = Yz = Y′(C^{-1}z) ⇒ w ∈ Im Y′.

Uniqueness. Since Y ∈ M_f(S, A), there exists σ ∈ Σ(S) such that P_σ Y = \begin{bmatrix} Y* \\ Ŷ \end{bmatrix}, with Y* ∈ M_f(A, A). Define also P_σ Y′ = \begin{bmatrix} Y′* \\ Ŷ′ \end{bmatrix}. Then, premultiplying Y′ = YC by P_σ, we get Y′* = Y*C and C = (Y*)^{-1} Y′*.

⁶ We are using the obvious fact that two vector spaces V and W are orthogonal iff each element in a basis of V is orthogonal to each element in a basis of W.
⁷ Let V and U be finite dimensional vector spaces such that {v^1, ..., v^n} is a basis of V and u^1, ..., u^n is a set of arbitrary vectors in U. Then there exists a unique linear function l : V → U such that ∀ i ∈ {1, ..., n}, l(v^i) = u^i; see my math class notes, Proposition 273, page 82.
⁸ Ostaszewski, A., (1990), Advanced Mathematical Methods, London School of Economics Mathematics Series, Cambridge University Press, Cambridge, UK.

Chapter 9

Solutions to systems of linear equations

9.1 Some preliminary basic facts

Let's recall some basic definitions from Section 1.6.

Definition 347 Consider the following linear system with m equations and n unknowns:

a_{11}x_1 + ··· + a_{1n}x_n = b_1
...
a_{m1}x_1 + ··· + a_{mn}x_n = b_m,

which can be rewritten as

Ax = b.

A_{m×n} is called the matrix of the coefficients (or coefficient matrix) associated with the system, and M_{m×(n+1)} = [A | b] is called the augmented matrix associated with the system.

Recall the following definition.

Definition 348 Two linear systems are said to be equivalent if they have the same solutions.

Let's recall some basic facts we discussed in previous chapters.

Remark 349 It is well known that the following operations applied to a system of linear equations lead to an equivalent system:
I) interchange two equations;
II) multiply both sides of an equation by a nonzero real number;
III) add the left and right hand sides of an equation to the left and right hand sides of another equation;
IV) change the place of the unknowns.

The transformations I), II), III) and IV) are called elementary transformations and, as is well known, they do not change the solution set of the system they are applied to. Those transformations correspond to elementary operations on rows of M or columns of A in the way described below:
I) interchange two rows of M;
II) multiply a row of M by a nonzero real number;
III) sum a row of M to another row of M;
IV) interchange two columns of A.

The above described operations do not change the rank of A and they do not change the rank of M; see Proposition 228.

Homogeneous linear systems.

Definition 350 A linear system for which b = 0, i.e., of the type

Ax = 0

with A ∈ M(m, n), is called a homogeneous system.

Remark 351 Obviously, 0 is a solution of a homogeneous system. The set of solutions of a homogeneous system is ker l_A. From Remark 315,

dim ker l_A = n − rank A.

9.2 A solution method: Rouché-Capelli's and Cramer's theorems

The solution method presented in this section is based on two basic theorems:

1. Rouché-Capelli's Theorem, which gives a necessary and sufficient condition for the existence of solutions;

2. Cramer's Theorem, which gives a method to compute solutions, if they exist.

Theorem 352 (Rouché-Capelli) A system with m equations and n unknowns

A_{m×n} x = b   (9.1)

has solutions ⇔

rank A = rank [A | b].

Proof. [⇒] Let x* be a solution to (9.1). Then b is a linear combination, via the solution x*, of the columns of A. Then, from Proposition 228,

rank [A | b] = rank [A | 0] = rank A.

[⇐] 1st proof. We want to show that

∃ x* ∈ R^n such that Ax* = b, i.e., b = \sum_{j=1}^n x*_j · C_j(A).

By assumption, rank A = rank [A | b] := r. Since rank A = r, there are r linearly independent column vectors of A, say {C_j(A)}_{j∈R}, where R ⊆ {1, ..., n} and #R = r. Since rank [A | b] = r, the set {C_j(A)}_{j∈R} ∪ {b} is linearly dependent and, from Lemma 189, b is a linear combination of the vectors in {C_j(A)}_{j∈R}, i.e., ∃ (x_j)_{j∈R} such that b = \sum_{j∈R} x_j · C_j(A), so that

b = \sum_{j∈R} x_j · C_j(A) + \sum_{j′∈{1,...,n}\setminus R} 0 · C_{j′}(A).

Then x* = (x*_j)_{j=1}^n with

x*_j = x_j if j ∈ R, and x*_j = 0 if j ∈ {1, ..., n}\setminus R

is a solution to Ax = b.

Second proof.


Since rank A := r ≤ min{m, n}, by definition of rank there exists a rank-r square submatrix A* of A. From Remark 349, reordering columns of A and rows of [A | b] does not change the rank of A or of [A | b] and leads to the following system, which is equivalent to Ax = b:

\begin{bmatrix} A* & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x^1 \\ x^2 \end{bmatrix} = \begin{bmatrix} b′ \\ b′′ \end{bmatrix},

where A_{12} ∈ M(r, n − r), A_{21} ∈ M(m − r, r), A_{22} ∈ M(m − r, n − r), x^1 ∈ R^r, x^2 ∈ R^{n−r}, b′ ∈ R^r, b′′ ∈ R^{m−r};

(x^1, x^2) has been obtained from x by performing on it the same permutations performed on the columns of A;

(b′, b′′) has been obtained from b by performing on it the same permutations performed on the rows of A.

Since

rank [A* A_{12} b′] = rank A* = r,

the r rows of [A* A_{12} b′] are linearly independent. Since

rank \begin{bmatrix} A* & A_{12} & b′ \\ A_{21} & A_{22} & b′′ \end{bmatrix} = r,

the rows of that matrix are linearly dependent and, from Lemma 189, the last m − r rows of [A | b] are linear combinations of the first r rows. Therefore, using Remark 349 again, we have that Ax = b is equivalent to

[A* A_{12}] \begin{bmatrix} x^1 \\ x^2 \end{bmatrix} = b′

or, using Remark 74.2,

A* x^1 + A_{12} x^2 = b′

and

x^1 = (A*)^{-1}(b′ − A_{12} x^2) ∈ R^r,

while x^2 can be chosen arbitrarily; more precisely,

{ (x^1, x^2) ∈ R^n : x^1 = (A*)^{-1}(b′ − A_{12} x^2) ∈ R^r and x^2 ∈ R^{n−r} }

is the nonempty set of solutions to the system Ax = b.

Theorem 353 (Cramer) A system with n equations and n unknowns

A_{n×n} x = b

with det A ≠ 0 has a unique solution x = (x_1, ..., x_i, ..., x_n), where for i ∈ {1, ..., n},

x_i = det A_i / det A

and A_i is the matrix obtained from A by substituting the column vector b in the place of the i-th column.

Proof. Since det A ≠ 0, A^{-1} exists and is unique. Moreover, from Ax = b, we get A^{-1}Ax = A^{-1}b and

x = A^{-1} b.

Moreover,

A^{-1} b = (1 / det A) Adj A · b.

It is then enough to verify that

Adj A · b = \begin{bmatrix} det A_1 \\ \vdots \\ det A_i \\ \vdots \\ det A_n \end{bmatrix},

which we omit (see Exercise 7.34, page 268, in Lipschutz (1991)).

The combination of Rouché-Capelli's and Cramer's Theorems gives a method to solve any linear system, apart from computational difficulties.
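Cramer's rule itself is a few lines of code. A minimal sketch in Python (assuming numpy), applied to the system of Example 365 below; `cramer_solve` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule; assumes det A != 0."""
    n = A.shape[0]
    d = np.linalg.det(A)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                 # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[1., 1.], [1., -1.]])  # the system of Example 365 below
b = np.array([2., 0.])
print(cramer_solve(A, b))            # [1. 1.], matching np.linalg.solve(A, b)
```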

Remark 354 Rouché-Capelli and Cramer's Theorem based method. Let the following system with m equations and n unknowns be given:

A_{m×n} x = b.

1. Compute rank A and rank [A | b].

i. If rank A ≠ rank [A | b], then the system has no solution.

ii. If rank A = rank [A | b] := r, then the system has solutions, which can be computed as follows.

2. Extract a square r-dimensional invertible submatrix A_r from A.

i. Discard the equations, if any, whose corresponding rows are not part of A_r.

ii. In the remaining equations, bring to the right hand side the terms containing unknowns whose coefficients are not part of the matrix A_r, if any.

iii. You then get a system to which Cramer's Theorem can be applied, treating the expressions on the right hand side as constants; these expressions contain n − r unknowns, which can be chosen arbitrarily. Sometimes it is said that the system has "∞^{n−r}" solutions, or that the system admits n − r degrees of freedom. More formally, we can say what follows.

Definition 355 Given S, T ⊆ R^n, we define the sum of the sets S and T, denoted by S + T, as follows:

S + T := {x ∈ R^n : ∃ s ∈ S, ∃ t ∈ T such that x = s + t}.

Proposition 356 Assume that the set S of solutions to the system Ax = b is nonempty and let x* ∈ S. Then

S = {x*} + ker l_A := {x ∈ R^n : ∃ x′ ∈ ker l_A such that x = x* + x′}.

Proof. [⊆] Take x ∈ S. We want to find x′ ∈ ker l_A such that x = x* + x′. Take x′ = x − x*. Clearly x = x* + (x − x*). Moreover,

A x′ = A(x − x*) = b − b = 0,   (9.2)

where the first equality in (9.2) follows from the fact that x, x* ∈ S.

[⊇] Take x = x* + x′ with x* ∈ S and x′ ∈ ker l_A. Then

Ax = Ax* + Ax′ = b + 0 = b.

Remark 357 The above Proposition implies that a linear system either has no solutions, or has a unique solution, or has infinitely many solutions.
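This trichotomy is exactly what the rank test of Theorem 352 detects. A minimal numerical sketch, assuming numpy; `classify` is a hypothetical helper, and the sample system is the one of Exercise 360 below.

```python
import numpy as np

def classify(A, b):
    """Rouché-Capelli test: no, unique, or infinitely many solutions."""
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rA != rAb:
        return "no solution"
    if rA == A.shape[1]:
        return "unique solution"
    return f"infinitely many solutions ({A.shape[1] - rA} degrees of freedom)"

A = np.array([[1., 2., 3.], [4., 5., 6.], [5., 7., 9.]])
b = np.array([1., 2., 3.])
print(classify(A, b))   # infinitely many solutions (1 degree of freedom)
```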

Definition 358 V is an affine subspace of R^n if there exist a vector subspace W of R^n and a vector x ∈ R^n such that

V = {x} + W.

We say that the¹ dimension of the affine subspace V is dim W.

¹ If W′ and W′′ are vector subspaces of R^n, x ∈ R^n and V := {x} + W′ = {x} + W′′, then W′ = W′′. Take w′ ∈ W′; then x + w′ ∈ V = {x} + W′′. Then there exists w′′ ∈ W′′ such that x + w′ = x + w′′. Then w′ = w′′ ∈ W′′, so W′ ⊆ W′′. A similar proof applies for the opposite inclusion.


Remark 359 Since dim ker l_A = n − rank A, the above Proposition and Definition say that if a nonhomogeneous system has solutions, then the set of solutions is an affine subspace of dimension n − rank A.

Exercise 360 Apply the algorithm described in Remark 354 to solve the following linear system:

x_1 + 2x_2 + 3x_3 = 1
4x_1 + 5x_2 + 6x_3 = 2
5x_1 + 7x_2 + 9x_3 = 3.

The associated matrix [A | b] is

\begin{bmatrix} 1 & 2 & 3 & | & 1 \\ 4 & 5 & 6 & | & 2 \\ 5 & 7 & 9 & | & 3 \end{bmatrix}.

1. Since the third row of the matrix [A | b] is equal to the sum of the first two rows, and since

det \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} = 5 − 8 = −3,

we have that

rank A = rank [A | b] = 2,

and the system has solutions.

2. Define

A_2 = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}.

i.-ii. Discarding the equation whose corresponding row is not part of A_2 and, in the remaining equations, bringing to the right hand side the terms containing unknowns whose coefficients are not part of A_2, we get

x_1 + 2x_2 = 1 − 3x_3
4x_1 + 5x_2 = 2 − 6x_3.

iii. Then, using Cramer's Theorem and recalling that det A_2 = −3, we get

x_1 = det \begin{bmatrix} 1 − 3x_3 & 2 \\ 2 − 6x_3 & 5 \end{bmatrix} / (−3) = x_3 − 1/3,

x_2 = det \begin{bmatrix} 1 & 1 − 3x_3 \\ 4 & 2 − 6x_3 \end{bmatrix} / (−3) = 2/3 − 2x_3,

and the solution set is

{ (x_1, x_2, x_3) ∈ R^3 : x_1 = x_3 − 1/3, x_2 = 2/3 − 2x_3 }.
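The same solution set can be recovered symbolically by solving for x_1, x_2 while leaving x_3 as a free parameter, as the rank count predicts. A minimal sketch, assuming sympy is available.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
equations = [sp.Eq(x1 + 2*x2 + 3*x3, 1),
             sp.Eq(4*x1 + 5*x2 + 6*x3, 2),
             sp.Eq(5*x1 + 7*x2 + 9*x3, 3)]
# Solve for x1, x2; x3 remains free, as predicted by n - rank A = 1.
print(sp.solve(equations, [x1, x2]))   # {x1: x3 - 1/3, x2: 2/3 - 2*x3}
```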

Remark 361 How to find a basis of ker l_A. Let A ∈ M(m, n) be given with rank A = r ≤ min{m, n}. Then, from the second proof of Rouché-Capelli's Theorem, we have that the system

Ax = 0

admits the following set of solutions:

{ (x^1, x^2) ∈ R^r × R^{n−r} : x^1 = (A*)^{-1}(−A_{12} · x^2) }.   (9.3)

Observe that dim ker l_A = n − r := p. Then a basis of ker l_A is

B = { \begin{bmatrix} −(A*)^{-1} A_{12} · e_p^1 \\ e_p^1 \end{bmatrix}, ..., \begin{bmatrix} −(A*)^{-1} A_{12} · e_p^p \\ e_p^p \end{bmatrix} }.

To check the above statement, we check that 1. B ⊆ ker l_A, and 2. B is a linearly independent set².
1. It follows from (9.3);
2. It follows from the fact that det [e_p^1, ..., e_p^p] = det I = 1.

² Observe that B is made up of p vectors and dim ker l_A = p.
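Numerically, a basis of ker l_A can also be read off a singular value decomposition instead of the block computation above: the rows of Vᵀ with zero singular value span the kernel. A minimal sketch, assuming numpy; the matrix is the one of Example 362 below, and `kernel_basis` is a hypothetical helper name.

```python
import numpy as np

def kernel_basis(A, tol=1e-10):
    """Columns of the returned matrix form a basis of ker l_A."""
    _, s, Vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return Vt[rank:, :].T

A = np.array([[1., 1., 1., 2.],
              [1., -1., 1., 2.]])   # the system of Example 362 below
K = kernel_basis(A)
assert np.allclose(A @ K, 0)        # every column solves Ax = 0
print(K.shape[1])                   # 2 = n - rank A
```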

Example 362 Consider

x_1 + x_2 + x_3 + 2x_4 = 0
x_1 − x_2 + x_3 + 2x_4 = 0.

Defining

A* = \begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix},  A_{12} = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix},

the starting system can be rewritten as

\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = − \begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_3 \\ x_4 \end{bmatrix}.

Then a basis of ker l_A is obtained by substituting x^2 = e_2^1 and x^2 = e_2^2, which gives

B = { \begin{bmatrix} −1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} −2 \\ 0 \\ 0 \\ 1 \end{bmatrix} }.

An algorithm to find eigenvalues and eigenvectors of A_{n×n} and to check whether A is diagonalizable.

Step 1. Find the characteristic polynomial ∆(t) of A.
Step 2. Find the roots of ∆(t) to obtain the eigenvalues of A.
Step 3. Repeat (a) and (b) below for each eigenvalue λ of A:
(a) form M = λI − A;
(b) find a basis of ker M. These basis vectors are linearly independent eigenvectors of A belonging to λ.
Step 4. Consider the set S = {v^1, v^2, ..., v^m} of all eigenvectors obtained in Step 3:
(a) if m ≠ n, then A is not diagonalizable;
(b) if m = n, let P be the matrix whose columns are the eigenvectors {v^1, v^2, ..., v^m}. Then

D = P^{-1} A P = diag (λ_i)_{i=1}^n,

where for i ∈ {1, ..., n}, λ_i is the eigenvalue corresponding to the eigenvector v^i.
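A minimal numerical sketch of Steps 1-4, assuming numpy; `diagonalize` is a hypothetical helper name, and np.linalg.eig plays the role of Steps 1-3 (roots of the characteristic polynomial and associated eigenvectors).

```python
import numpy as np

def diagonalize(A, tol=1e-8):
    """Return (P, D) with D = P^{-1} A P diagonal, or None."""
    eigvals, P = np.linalg.eig(A)     # Steps 1-3: roots and eigenvectors
    if np.linalg.matrix_rank(P, tol) < A.shape[0]:
        return None                   # fewer than n independent eigenvectors
    D = np.linalg.inv(P) @ A @ P      # Step 4(b)
    return P, D

A = np.array([[1., 4.], [2., 3.]])    # the matrix of Example 363 below
P, D = diagonalize(A)
print(np.round(D, 10))                # diag(-1, 5), up to column ordering
```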

Example 363 Apply the above algorithm to

A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}.

1. det[tI − A] = det \begin{bmatrix} t − 1 & −4 \\ −2 & t − 3 \end{bmatrix} = t² − 4t − 5.

2. t² − 4t − 5 = 0 has solutions λ_1 = −1, λ_2 = 5.

3. λ_1 = −1.
(a) M = λ_1 I − A = \begin{bmatrix} −1 − 1 & −4 \\ −2 & −1 − 3 \end{bmatrix} = \begin{bmatrix} −2 & −4 \\ −2 & −4 \end{bmatrix}.
(b) ker M = {(x, y) ∈ R² : x = −2y}. Therefore a basis of that one-dimensional space is {(2, −1)}.

3'. λ_2 = 5.
(a) M = λ_2 I − A = \begin{bmatrix} 5 − 1 & −4 \\ −2 & 5 − 3 \end{bmatrix} = \begin{bmatrix} 4 & −4 \\ −2 & 2 \end{bmatrix}.
(b) ker M = {(x, y) ∈ R² : x = y}. Therefore a basis of that one-dimensional space is {(1, 1)}.

4. A is diagonalizable:

P = \begin{bmatrix} 2 & 1 \\ −1 & 1 \end{bmatrix},  P^{-1} = \begin{bmatrix} 1/3 & −1/3 \\ 1/3 & 2/3 \end{bmatrix}

and

P^{-1} A P = \begin{bmatrix} 1/3 & −1/3 \\ 1/3 & 2/3 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ −1 & 1 \end{bmatrix} = \begin{bmatrix} −1 & 0 \\ 0 & 5 \end{bmatrix}.

Example 364 Discuss the following system (i.e., say if it admits solutions):

x_1 + x_2 + x_3 = 4
x_1 + x_2 + 2x_3 = 8
2x_1 + 2x_2 + 3x_3 = 12.

The augmented matrix [A | b] is

\begin{bmatrix} 1 & 1 & 1 & | & 4 \\ 1 & 1 & 2 & | & 8 \\ 2 & 2 & 3 & | & 12 \end{bmatrix},

and

rank [A | b] = rank \begin{bmatrix} 1 & 1 & 1 & | & 4 \\ 1 & 1 & 2 & | & 8 \\ 0 & 0 & 0 & | & 0 \end{bmatrix} = 2 = rank A.

From Step 2 of the Rouché-Capelli and Cramer method, we can consider the system

x_2 + x_3 = 4 − x_1
x_2 + 2x_3 = 8 − x_1.

Therefore, x_1 can be chosen arbitrarily and, since det \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = 1,

x_2 = det \begin{bmatrix} 4 − x_1 & 1 \\ 8 − x_1 & 2 \end{bmatrix} = −x_1,

x_3 = det \begin{bmatrix} 1 & 4 − x_1 \\ 1 & 8 − x_1 \end{bmatrix} = 4.

Therefore, the set of solutions is { (x_1, x_2, x_3) ∈ R^3 : x_2 = −x_1, x_3 = 4 }.

Example 365 Discuss the following system:

x_1 + x_2 = 2
x_1 − x_2 = 0.

The augmented matrix [A | b] is

\begin{bmatrix} 1 & 1 & | & 2 \\ 1 & −1 & | & 0 \end{bmatrix}.

Since det A = −1 − 1 = −2 ≠ 0,

rank [A | b] = rank A = 2

and the system has a unique solution:

x_1 = det \begin{bmatrix} 2 & 1 \\ 0 & −1 \end{bmatrix} / (−2) = −2/−2 = 1,

x_2 = det \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} / (−2) = −2/−2 = 1.

Therefore, the set of solutions is {(1, 1)}.

Example 366 Discuss the following system:

x_1 + x_2 = 2
x_1 + x_2 = 0.

The augmented matrix [A | b] is

\begin{bmatrix} 1 & 1 & | & 2 \\ 1 & 1 & | & 0 \end{bmatrix}.

Since det A = 1 − 1 = 0 and det \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} = −2 ≠ 0, we have that

rank [A | b] = 2 ≠ 1 = rank A

and the system has no solutions. Therefore, the set of solutions is ∅.

Example 367 Discuss the following system:

x_1 + x_2 = 2
2x_1 + 2x_2 = 4.

The augmented matrix [A | b] is

\begin{bmatrix} 1 & 1 & | & 2 \\ 2 & 2 & | & 4 \end{bmatrix}.

From rank properties,

rank \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} = rank \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} = 1,

rank \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 4 \end{bmatrix} = rank \begin{bmatrix} 1 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} = 1.

Recall that elementary row operations on the augmented matrix do not change the rank of either the augmented or the coefficient matrix. Therefore

rank [A | b] = 1 = rank A

and the system has infinitely many solutions. More precisely, the set of solutions is

{ (x_1, x_2) ∈ R² : x_1 = 2 − x_2 }.


Example 368 Say for which values of the parameter k ∈ R the following system has one, infinitely many or no solutions:

(k − 1)x + (k + 2)y = 1
−x + ky = 1
x − 2y = 1.

[A | b] = \begin{bmatrix} k − 1 & k + 2 & | & 1 \\ −1 & k & | & 1 \\ 1 & −2 & | & 1 \end{bmatrix}.

Expanding along the third column,

det [A | b] = det \begin{bmatrix} k − 1 & k + 2 & 1 \\ −1 & k & 1 \\ 1 & −2 & 1 \end{bmatrix} = det \begin{bmatrix} −1 & k \\ 1 & −2 \end{bmatrix} − det \begin{bmatrix} k − 1 & k + 2 \\ 1 & −2 \end{bmatrix} + det \begin{bmatrix} k − 1 & k + 2 \\ −1 & k \end{bmatrix} =

= (2 − k) − (−2k + 2 − k − 2) + (k² − k + k + 2) = k² + 2k + 4.

Its discriminant is ∆ = 4 − 16 = −12 < 0. Therefore, the determinant is never equal to zero and rank [A | b] = 3. Since rank A_{3×2} ≤ 2, the solution set of the system is empty for each value of k.

Remark 369 To solve a parametric linear system Ax = b, where A ∈ M(m, n), it is convenient to proceed as follows.

1. Perform "easy" row operations on [A | b];

2. compute min{m, n + 1} := k and consider the k × k submatrices of the matrix [A | b];

3. among those square matrices, choose a matrix A* which is a submatrix of A, if possible; if you have more than one matrix to choose from, choose "the easiest one" from a computational viewpoint, i.e., the one with the highest number of zeros, the lowest number of occurrences of the parameters, and so on;

4. compute det A*, a function of the parameter;

5. analyze the two cases det A* ≠ 0 and det A* = 0. A sketch of this procedure with a computer algebra system is given below.
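The procedure of Remark 369 lends itself to a computer algebra system. A minimal sketch, assuming sympy is available, run on the system of Example 370 below.

```python
import sympy as sp

a = sp.symbols('a')
A = sp.Matrix([[a, 1, 1], [1, -a, 0], [2*a, a, 0]])
b = sp.Matrix([2, 0, 4])

det_Astar = A.det()                 # 2*a**2 + a, from a 3x3 submatrix of [A|b]
critical = sp.solve(det_Astar, a)   # [-1/2, 0]

# At each critical value, compare rank A with rank [A|b] directly:
for val in critical:
    As = A.subs(a, val)
    print(val, As.rank(), As.row_join(b).rank())
# -1/2: rank A = 2, rank [A|b] = 3 -> no solutions
# 0:    rank A = 2, rank [A|b] = 3 -> no solutions
```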

Example 370 Say for which values of the parameter a ∈ R the following system has one, infinitely many or no solutions:

ax_1 + x_2 + x_3 = 2
x_1 − ax_2 = 0
2ax_1 + ax_2 = 4.

[A | b] = \begin{bmatrix} a & 1 & 1 & 2 \\ 1 & −a & 0 & 0 \\ 2a & a & 0 & 4 \end{bmatrix},

det \begin{bmatrix} a & 1 & 1 \\ 1 & −a & 0 \\ 2a & a & 0 \end{bmatrix} = a + 2a² = 0 if a = 0, −1/2.

Therefore, if a ≠ 0, −1/2,

rank [A | b] = rank A = 3

and the system has a unique solution.

If a = 0,

[A | b] = \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}.

Since det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = −1, rank A = 2. On the other hand,

det \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 0 \\ 0 & 0 & 4 \end{bmatrix} = −det \begin{bmatrix} 1 & 2 \\ 0 & 4 \end{bmatrix} = −4 ≠ 0,

and therefore

rank [A | b] = 3 ≠ 2 = rank A

and the system has no solutions.

If a = −1/2,

[A | b] = \begin{bmatrix} −1/2 & 1 & 1 & 2 \\ 1 & 1/2 & 0 & 0 \\ −1 & −1/2 & 0 & 4 \end{bmatrix}.

Since

det \begin{bmatrix} 1 & 1 \\ 1/2 & 0 \end{bmatrix} = −1/2,

rank A = 2. Since

det \begin{bmatrix} −1/2 & 1 & 2 \\ 1 & 0 & 0 \\ −1 & 0 & 4 \end{bmatrix} = −det \begin{bmatrix} 1 & 2 \\ 0 & 4 \end{bmatrix} = −4 ≠ 0,

we have

rank [A | b] = 3 ≠ 2 = rank A

and the system has no solutions.

Example 371 Say for which values of the parameter a ∈ R the following system has one, infinitely many or no solutions:

(a + 1)x_1 − 2x_2 + 2a x_3 = a
a x_1 − a x_2 + x_3 = −2.

[A | b] = \begin{bmatrix} a + 1 & −2 & 2a & | & a \\ a & −a & 1 & | & −2 \end{bmatrix},

det \begin{bmatrix} −2 & 2a \\ −a & 1 \end{bmatrix} = 2a² − 2 = 0,

whose solutions are −1, 1. Therefore, if a ∈ R\{−1, 1},

rank [A | b] = 2 = rank A

and the system has infinitely many solutions. Let's study the system for a ∈ {−1, 1}.

If a = −1, we get

[A | b] = \begin{bmatrix} 0 & −2 & −2 & | & −1 \\ −1 & 1 & 1 & | & −2 \end{bmatrix}

and, since

det \begin{bmatrix} 0 & −2 \\ −1 & 1 \end{bmatrix} = −2 ≠ 0,

we have again

rank [A | b] = 2 = rank A

and the system has infinitely many solutions.

If a = 1, we have

[A | b] = \begin{bmatrix} 2 & −2 & 2 & | & 1 \\ 1 & −1 & 1 & | & −2 \end{bmatrix}

and

rank [A | b] = 2 > 1 = rank A,

so the system has no solution.


Example 372 Say for which values of the parameter a ∈ R the following system has one, infinitely many or no solutions:

ax_1 + x_2 = 1
x_1 + x_2 = a
2x_1 + x_2 = 3a
3x_1 + 2x_2 = a.

[A | b] = \begin{bmatrix} a & 1 & | & 1 \\ 1 & 1 & | & a \\ 2 & 1 & | & 3a \\ 3 & 2 & | & a \end{bmatrix}.

Observe that rank A_{4×2} ≤ 2 and

det \begin{bmatrix} 1 & 1 & a \\ 2 & 1 & 3a \\ 3 & 2 & a \end{bmatrix} = 3a = 0 if a = 0.

Therefore, if a ∈ R\{0},

rank [A | b] = 3 > 2 ≥ rank A_{4×2}

and the system has no solutions. If a = 0,

[A | b] = \begin{bmatrix} 0 & 1 & | & 1 \\ 1 & 1 & | & 0 \\ 2 & 1 & | & 0 \\ 3 & 2 & | & 0 \end{bmatrix}

and, since

det \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 2 & 1 & 0 \end{bmatrix} = −1 ≠ 0,

the system has no solutions for a = 0 either.

Summarizing, ∀ a ∈ R, the system has no solutions.


Chapter 10

Appendix. Basic results on complex numbers

A¹ simple motivation to introduce complex numbers is to observe that the equation

x² + 1 = 0   (10.1)

does not have any solution in R. The symbol √−1, denoted by i, was introduced to "solve the equation anyway" and was called the imaginary unit. It was also said that ±i are solutions to equation (10.1). Numbers such as 3 + 4i were called complex numbers. In what follows, we present a rigorous description of what was said above.

10.1 Definition of the set C of complex numbers

Definition 373 The set of complex numbers is the set R² with equality, addition and multiplication defined respectively as follows: ∀ (x_1, x_2), (y_1, y_2) ∈ R²,

a. Equality: (x_1, x_2) = (y_1, y_2) if x_1 = y_1 and x_2 = y_2;

b. Addition: (x_1, x_2) ⊕ (y_1, y_2) = (x_1 + y_1, x_2 + y_2);

c. Product: (x_1, x_2) ⊙ (y_1, y_2) = (x_1 y_1 − x_2 y_2, x_1 y_2 + x_2 y_1),

where addition and multiplication of real numbers are the standard ones. The set of complex numbers is denoted by C. For any x = (x_1, x_2) ∈ C, x_1 is called the real part of x and x_2 the imaginary part of x.

Remark 374 For simplicity of notation we will use the symbols + and · in place of ⊕ and ⊙, respectively.

Remark 375 "Note that the symbol i = √−1 does not appear anywhere in this definition. Presently, we shall introduce i as a particular complex number which has all the algebraic properties ascribed to the fictitious symbol √−1 by the early mathematicians. However, before we do this, we will discuss the basic properties of the operations just defined." (Apostol (1967), page 359).

Proposition 376 C is a field.

Proof. We have to verify that the operations presented in Definition 373 satisfy the properties in the definition of a field (see Definition 129). We will show only some of them.

Property 2 for multiplication (associativity).

¹ The present Chapter follows very closely Chapter 9 in Apostol (1967).


Given x = (x_1, x_2), y = (y_1, y_2) and z = (z_1, z_2), we have

(x · y) · z = (x_1 y_1 − x_2 y_2, x_1 y_2 + x_2 y_1)(z_1, z_2) =
= ((x_1 y_1 − x_2 y_2)z_1 − (x_1 y_2 + x_2 y_1)z_2, (x_1 y_1 − x_2 y_2)z_2 + (x_1 y_2 + x_2 y_1)z_1) =
= (x_1 y_1 z_1 − x_2 y_2 z_1 − x_1 y_2 z_2 − x_2 y_1 z_2, x_1 y_1 z_2 − x_2 y_2 z_2 + x_1 y_2 z_1 + x_2 y_1 z_1)

and

x · (y · z) = (x_1, x_2)(y_1 z_1 − y_2 z_2, y_1 z_2 + y_2 z_1) =
= (x_1(y_1 z_1 − y_2 z_2) − x_2(y_1 z_2 + y_2 z_1), x_1(y_1 z_2 + y_2 z_1) + x_2(y_1 z_1 − y_2 z_2)) =
= (x_1 y_1 z_1 − x_1 y_2 z_2 − x_2 y_1 z_2 − x_2 y_2 z_1, x_1 y_1 z_2 + x_1 y_2 z_1 + x_2 y_1 z_1 − x_2 y_2 z_2),

and the two results coincide.

Property 4. The null element with respect to the sum is (0, 0). The null element with respect to multiplication is (1, 0): ∀ (x_1, x_2) ∈ C, (1, 0)(x_1, x_2) = (1·x_1 − 0·x_2, 1·x_2 + 0·x_1) = (x_1, x_2).

Property 5. The negative element of (x_1, x_2) is simply (−x_1, −x_2), defined as −(x_1, x_2).

Property 6. ∀ (x_1, x_2) ∈ C\{0}, we want to find its inverse element (x_1, x_2)^{-1} := (a, b). We must then have

(x_1, x_2)(a, b) = (1, 0),

i.e.,

x_1 · a − x_2 · b = 1
x_2 · a + x_1 · b = 0,

which admits a unique solution iff (x_1)² + (x_2)² ≠ 0, i.e., (x_1, x_2) ≠ 0, as we assumed. Then

a = x_1 / ((x_1)² + (x_2)²),  b = −x_2 / ((x_1)² + (x_2)²).

Summarizing,

(x_1, x_2)^{-1} = ( x_1 / ((x_1)² + (x_2)²), −x_2 / ((x_1)² + (x_2)²) ).

Remark 377 As said in the above proof:
0 ∈ C means 0 = (0, 0);
1 ∈ C means 1 = (1, 0);
for any x = (x_1, x_2) ∈ C, we define −x = (−x_1, −x_2).

Remark 378 Obviously, all the laws of algebra deducible from the field axioms also hold for complex numbers.

10.2 The complex numbers as an extension of the real numbers

Definition 379

C_0 = {(x_1, x_2) ∈ C : x_2 = 0}.

Definition 380 A subset G of a field F is a subfield of F if G is a field with respect to the operations defining F as a field.

Proposition 381 C_0 is a subfield of C.


Proof. Properties 1, 2 and 3 follow trivially. Properties 4, 5 and 6 follow from what is said below. The null element with respect to addition is (0, 0) ∈ C_0 and the null element with respect to the product is (1, 0) ∈ C_0; the negative element of any (x, 0) ∈ C_0 is (−x, 0) ∈ C_0. The inverse of any element (x_1, 0) ∈ C_0\{0} is (x_1/(x_1)², 0) = (1/x_1, 0) ∈ C_0.

Definition 382 Given two fields F_1 and F_2, if there exists a function f : F_1 → F_2 which is invertible and preserves operations, then the function is called a (field) isomorphism and the fields are said to be isomorphic.

Remark 383 In the case described in the above definition, each element in F_1 can be "identified" with its isomorphic image.

Proposition 384 R is field isomorphic to C_0, the isomorphism being

f : R → C_0, a ↦ (a, 0).

Proof. 1. f is invertible. Its inverse is

f^{-1} : C_0 → R, (a, 0) ↦ a:

(f ∘ f^{-1})((a, 0)) = f(f^{-1}(a, 0)) = f(a) = (a, 0);
(f^{-1} ∘ f)(a) = f^{-1}(f(a)) = f^{-1}(a, 0) = a.

2. f(a + b) = f(a) + f(b): f(a + b) = (a + b, 0); f(a) + f(b) = (a, 0) + (b, 0) = (a + b, 0).

3. f(ab) = f(a)f(b): f(ab) = (ab, 0); f(a)f(b) = (a, 0)(b, 0) = (ab − 0, 0·a + 0·b) = (ab, 0).

Definition 385 A field F is an extension of a field G if F contains a subfield F′ isomorphic to G.

Remark 386 From the two above Propositions 381 and 384, C_0 is a subfield of C isomorphic to R. Then C is an extension of R and, using what was said in Remark 383,

∀ a ∈ R, we could identify a ∈ R with (a, 0) ∈ C_0.   (10.2)

To be precise, using the notation of Proposition 384,

f(a) = (a, 0)

and also

f(R) = C_0 ⊆ C.   (10.3)

Very often, consistently with (10.2) and (10.3), authors write R ⊆ C.

10.3 The imaginary unit i

Definition 387 For any x ∈ C, x² = x · x = (x_1, x_2)(x_1, x_2) = ((x_1)² − (x_2)², 2x_1 x_2).

Proposition 388 The equation

x² + 1 = 0   (10.4)

has solutions x* = (0, 1) ∈ C and x** = −(0, 1) ∈ C.

Proof. We show that x* is a solution. Observe that in the above equation 1 ∈ C, i.e., we want to show that

(0, 1)(0, 1) + (1, 0) = (0, 0).

Indeed, (0, 1)(0, 1) + (1, 0) = (0 − 1, 0) + (1, 0) = (0, 0), as desired.

On the basis of the above proposition, we can give the following definition.


Definition 389

i := (0, 1).

Remark 390 We then have that −i = (0, −1); both i and −i are solutions to equation (10.4), and

i² = (−i)² = (0, 1)(0, 1) = (0 − 1, 0) = (−1, 0) = −(1, 0) = −1,

where 1 is the complex unit.

The following simple Proposition makes sense of the definition of a + bi as a complex number.

Proposition 391 ∀ (a, b) ∈ C,

(a, b) = (a, 0) + (b, 0) i.

Proof. (a, 0) + (b, 0) i = (a, 0) + (b, 0)(0, 1) = (a, 0) + (b·0 − 0·1, b + 0) = (a, 0) + (0, b) = (a, b).

Remark 392 From Proposition 391, and using (10.2), we could write

(a, b) = a + bi,

or, more precisely, using Proposition 384,

(a, b) = f(a) + i f(b).

Remark 393 The use of the identification of (a, b) with a + bi is mainly mnemonic. In fact, treating a + bi as "made up of real numbers" and using the fact that i² = −1, we get back the definitions of product and inverse as follows:

(a + bi)(c + di) = ac + adi + bci − bd = (ac − bd) + (ad + bc)i,

and, if a + bi ≠ 0,

1/(a + bi) = (a − bi)/((a + bi)(a − bi)) = a/(a² + b²) + (−b/(a² + b²)) i.

We will sometimes use the mnemonic notation.
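The pair operations of Definition 373 can be checked against Python's built-in complex type. A minimal sketch; `cmul` and `cinv` are hypothetical helper names.

```python
import math

def cmul(x, y):
    """Product of Definition 373 on ordered pairs."""
    return (x[0]*y[0] - x[1]*y[1], x[0]*y[1] + x[1]*y[0])

def cinv(x):
    """Inverse computed in the proof of Proposition 376."""
    d = x[0]**2 + x[1]**2
    return (x[0]/d, -x[1]/d)

z, w = (3.0, 4.0), (1.0, -2.0)
assert complex(*cmul(z, w)) == complex(*z) * complex(*w)  # (11, -2)
assert math.isclose(abs(complex(*cmul(z, cinv(z))) - 1), 0.0,
                    abs_tol=1e-12)                        # z * z^{-1} = (1, 0)
```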

10.4 Geometric interpretation: modulus and argument

Since a complex number is an ordered pair of real numbers, it may be represented geometrically by a point in the plane. The x-axis is called the real axis and the y-axis the imaginary axis. Therefore, if z = (a, b) ∈ C\{(0, 0)}, we can express a and b in polar coordinates, finding the unique r ∈ R_{++} and the unique θ ∈ (−π, π] such that

a = r cos θ,  b = r sin θ.

Therefore,

z = r(cos θ + i sin θ).   (10.5)

Remark 394 r ∈ R_{++} is called the modulus or absolute value of z = (a, b) and we write

r = |z| = √(a² + b²).

θ ∈ (−π, π] is called the principal argument of z = (a, b), and we write

θ = arg z.

Remark 395 We let θ belong to the half-open interval (−π, π] of length 2π in order to assign a unique argument θ to a complex number. For a given z = a + ib, the angle θ is determined only up to multiples of 2π, i.e., if θ is an argument of z, then ∀ K ∈ Z, θ + 2Kπ is an argument as well.


Remark 396 Taking r = 0, ∀ θ ∈ R,

C ∋ 0 = 0(cos θ + i sin θ).

Definition 397 The complex conjugate of z = (a, b) is z̄ = (a, −b).

Remark 398 The following properties follow from the above definitions. For any z_1, z_2, z ∈ C:

1. \overline{z_1 + z_2} = z̄_1 + z̄_2;

2. \overline{z_1 · z_2} = z̄_1 · z̄_2;

3. \overline{(z_1/z_2)} = z̄_1 / z̄_2;

4. z · z̄ = |z|².

Remark 399 For a discussion of the solution of a quadratic equation, see Apostol (1967), page 362 and the bottom of page 364.

Remark 400 Some trigonometric identities. The following convenient trigonometric identities hold true² for any α, β ∈ R:

1. (fundamental formula) cos²α + sin²α = 1;

2. summation formulas:

a. cos(α ± β) = cos α · cos β ∓ sin α · sin β;

b. sin(α ± β) = sin α · cos β ± cos α · sin β;

and then

a'. cos 2α = cos²α − sin²α;

b'. sin 2α = 2 sin α cos α.

10.5 Complex exponentials

Now we extend the definition of e^a to C, i.e., we replace a ∈ R with any z ∈ C, and we verify that in this extension:

P1. e^z agrees with the usual exponential when z ∈ R or, more precisely, if z = (a, 0) ∈ C_0, then e^{(a,0)} = (e^a, 0) = f(e^a);

P2. ∀ x, y ∈ C, e^x e^y = e^{x+y}, i.e., the law of exponents is valid for all complex numbers.

Definition 401 Taking³ z = (a, b), e^z is the complex number given by the equation

e^{a+ib} = e^z = e^a(cos b + i sin b),   (10.6)

i.e.,

e^z = e^a(cos b, sin b).

Remark 402 Observe that for b = 0, e^z = (e^a, 0) and P1 is satisfied.

Now we show that P2 is satisfied.

² See, for example, Simon and Blume (1994), Appendix A2.
³ For a motivation of the presented definition see Apostol (1967), page 366.


Theorem 403 ∀ x, y ∈ C,

e^x e^y = e^{x+y}.

Proof. Writing x = (a, b) and y = (u, v), we have

e^x = e^a(cos b, sin b),  e^y = e^u(cos v, sin v),

so

e^x e^y = e^a e^u (cos b cos v − sin b sin v, cos b sin v + sin b cos v).

Using the addition formulas (Remark 400.2) and the law of exponents for real exponentials, we obtain

e^x e^y = e^{a+u} (cos(b + v), sin(b + v)).

Since x + y = (a + u, b + v), from (10.6) we have

e^{x+y} = e^{(a+u, b+v)} = e^{a+u} (cos(b + v), sin(b + v)),

as desired.
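The law of exponents, and the Euler form appearing in Theorem 404 below, can be checked numerically with Python's cmath module. A minimal sketch; the sample points are arbitrary.

```python
import cmath, math

# e^x e^y = e^{x+y} on a few arbitrary complex pairs:
for x, y in [(1 + 2j, 3 - 1j), (-0.5 + 0.25j, 2.0 + 1.0j)]:
    assert cmath.isclose(cmath.exp(x) * cmath.exp(y), cmath.exp(x + y))

# Euler form e^{i*theta} = cos(theta) + i sin(theta), as in Theorem 404:
theta = 0.7
assert cmath.isclose(cmath.exp(1j * theta),
                     complex(math.cos(theta), math.sin(theta)))
```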

Theorem 404 Let z ∈ C\{(0, 0)} be given. Then

z = r e^{θi},   (10.7)

where r := |z| and θ = arg z.

Proof. If z = (a, b), from (10.5) we have

z = r(cos θ, sin θ).

If we take a = 0 and b = θ in (10.6), we obtain

e^{θi} = e^{θ(0,1)} = e^{(0,θ)} = e^0 (cos θ, sin θ) = (cos θ, sin θ),

as desired.

Theorem 405 From (10.7) and (10.5), if z ∈ C\{(0, 0)}, then

z = r(cos θ, sin θ) = r(cos θ + i sin θ) = r e^{iθ}.   (10.8)

Proposition 406 For any z = r(cos θ + i sin θ), any w = s(cos ρ + i sin ρ) and n ∈ N, where r = |z|, θ = arg(z), s = |w| and ρ = arg(w), we have

1. zw = rs(cos(θ + ρ) + i sin(θ + ρ));

2. z^n = r^n(cos nθ + i sin nθ).

Proof. 1. From (10.7) and (10.8),

zw = r e^{iθ} s e^{iρ} = rs e^{i(θ+ρ)} = rs(cos(θ + ρ) + i sin(θ + ρ)).

2. We can prove the result by induction. For n = 1, z = r(cos θ + i sin θ) follows from (10.5). For arbitrary n ∈ N,

z^n = z^{n−1} z = r^{n−1}(cos(n−1)θ + i sin(n−1)θ) · r(cos θ + i sin θ) =
= r^n [cos(n−1)θ cos θ − sin(n−1)θ sin θ + i(sin θ cos(n−1)θ + sin(n−1)θ cos θ)] =
= r^n (cos nθ + i sin nθ),

where the last equality follows from the summation formulas in Remark 400.2.

Remark 407 Expression 2 in Proposition 406 is also valid for negative integers n if we define z^{−m} to be (z^{-1})^m, where m is a positive integer. Moreover, for any z_1, z_2 ∈ C\{(0, 0)}, with moduli r_1, r_2 ∈ R_{++} and arguments θ, φ ∈ R, respectively, we have

z_1/z_2 = (r_1 e^{iθ})/(r_2 e^{iφ}) = (r_1/r_2) e^{i(θ−φ)},

where r_1/r_2 is the modulus of z_1/z_2 and θ − φ is an admissible argument of z_1/z_2.


10.6 Complex-valued functions

Definition 408 A function f : S ⊆ R → C is called a complex-valued function of a real variable. A function f : S ⊆ C → C is called a complex-valued function of a complex variable, or a function of a complex variable.

Remark 409 Some examples of functions of a complex variable are the logarithm, the trigonometric functions and the exponential function defined on C; new properties are revealed in this context, as the example in the following Proposition shows.

Proposition 410 The exponential function

f : C → C, f(z) = e^z

is periodic with period 2πi.

Proof. Taking z = (a, b) and n ∈ Z, from (10.6) we have

e^{z+2nπi} = e^a [cos(b + 2nπ) + i sin(b + 2nπ)] = e^a [cos b + i sin b] = e^z,

or

e^{z+(2nπ,0)(0,1)} = (e^a, 0)(cos(b + 2nπ), sin(b + 2nπ)) = (e^a, 0)(cos b, sin b) = e^z,

i.e., f(z + 2nπi) = f(z).

Below, we present some simple results on complex-valued functions of a real variable.

Definition 411 Given

f : S ⊆ R → C, x ↦ u(x) + i v(x),

or

f : S ⊆ R → C, x ↦ (u(x), v(x)),

u : S ⊆ R → R, u : x ↦ u(x) is a real-valued function called the real part of f, and v : S ⊆ R → R, v : x ↦ v(x) is a real-valued function called the imaginary part of f.

Now we define continuity, differentiation and integration of f in terms of the corresponding concepts for the functions u and v.

Definition 412 Given

f : S ⊆ R → C, x ↦ u(x) + i v(x), or f : S ⊆ R → C, x ↦ (u(x), v(x)),

1. f is continuous at a point if both u and v are continuous at that point;

2. if both derivatives u′ and v′ of u and v exist, the derivative of f is f′(x) = u′(x) + i v′(x) for any x ∈ S, or f′(x) = (u′(x), v′(x)) for any x ∈ S;

3. if both integrals of u and v exist, the integral of f on [a, b] is

∫_a^b f(x) dx = ∫_a^b u(x) dx + i ∫_a^b v(x) dx,  or  ∫_a^b f(x) dx = ( ∫_a^b u(x) dx, ∫_a^b v(x) dx ).

Many of the theorems of differential and integral calculus are also valid for complex-valued functions. As an example, we consider the zero-derivative theorem.

Theorem 413 Let I ⊆ R be an open interval. Given f : I ⊆ R → C, f : x ↦ f(x) = u(x) + i v(x), if ∀ x ∈ I, f′(x) = 0, then f is constant on I.


Proof. Since f′(x) = u′(x) + i v′(x), the statement f′(x) = 0 means that ∀ x ∈ I both u′(x) = 0 and v′(x) = 0. Then, by the zero-derivative theorem for real-valued functions, both u and v are constant on I. Therefore f is constant on I.

We now discuss some examples of differentiation and integration formulas for a complex-valued function of a real variable.

Theorem 414 Let t ∈ C and f : R → C, f(x) = e^{tx} be given. Then

f′(x) = t e^{tx}.

Proof. Given t = α + iβ, α, β ∈ R, from (10.6) we have

f(x) = e^{tx} = e^{αx+iβx} = e^{αx} cos βx + i e^{αx} sin βx.

Therefore, the real and imaginary parts of f(x) are given by

u : R → R, u : x ↦ e^{αx} cos βx,
v : R → R, v : x ↦ e^{αx} sin βx,

and u and v are differentiable. Then we have

u′(x) = α e^{αx} cos βx − β e^{αx} sin βx

and

v′(x) = α e^{αx} sin βx + β e^{αx} cos βx.

Since f′(x) = u′(x) + i v′(x), we also have

f′(x) = (α e^{αx} cos βx − β e^{αx} sin βx) + i(α e^{αx} sin βx + β e^{αx} cos βx) =
= α e^{αx}(cos βx + i sin βx) + iβ e^{αx}(cos βx + i sin βx) = (α + iβ) e^{(α+iβ)x} = t e^{tx}.

Remark 415 Theorem 414 implies that, for t ≠ 0,

∫ e^{tx} dx = e^{tx}/t.   (10.9)

If we let t = α + iβ, α, β ∈ R, α ≠ 0 and β ≠ 0, and equate the real and imaginary parts of equation (10.9), we obtain the integration formulas

∫ e^{αx} cos βx dx = e^{αx}(α cos βx + β sin βx)/(α² + β²)

and

∫ e^{αx} sin βx dx = e^{αx}(α sin βx − β cos βx)/(α² + β²),

as shown in detail below:

∫ e^{tx} dx = ∫ e^{αx+iβx} dx = ∫ ((e^{αx} cos βx) + i(e^{αx} sin βx)) dx = ∫ (e^{αx} cos βx) dx + i ∫ (e^{αx} sin βx) dx,

where the first equality uses (10.6) and the last the definition of the integral of a complex-valued function, and

e^{tx}/t = e^{αx+iβx}/(α + iβ) = e^{αx}(cos βx + i sin βx)(α − iβ)/(α² + β²) = e^{αx}[(α cos βx + β sin βx) + i(α sin βx − β cos βx)]/(α² + β²),

where the second equality uses (10.6) and Remark 393.


Part II

Some topology in metric spaces



Chapter 11

Metric spaces

11.1 Definitions and examples

Definition 416 Let X be a nonempty set. A metric or distance on X is a function d : X × X → R such that ∀ x, y, z ∈ X:

1. (a) d(x, y) ≥ 0, and (b) d(x, y) = 0 ⇔ x = y;
2. d(x, y) = d(y, x);
3. d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

(X, d) is called a metric space.

Remark 417 Observe that the definition requires that ∀ x, y ∈ X, it must be the case that d(x, y) ∈ R.

Example 418 (n-dimensional Euclidean space with Euclidean metric) Given n ∈ N\{0}, take X = R^n and

d_{2,n} : R^n × R^n → R, (x, y) ↦ ( \sum_{i=1}^n (x_i − y_i)² )^{1/2}.

(X, d_{2,n}) was shown to be a metric space in Proposition 58, Section 2.3. d_{2,n} is called the Euclidean distance on R^n. In what follows, unless needed, we simply write d_2 in place of d_{2,n}.

Proposition 419 (Discrete metric space) Given a nonempty set X and the function

d : X² → R, d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y,

(X, d) is a metric space, called the discrete metric space.

Proof. 1a. 0, 1 ≥ 0.
1b. From the definition, d(x, y) = 0 ⇔ x = y.
2. It follows from the fact that x = y ⇔ y = x and x ≠ y ⇔ y ≠ x.
3. If x = z, the result follows. If x ≠ z, then it cannot be that both x = y and y = z, and again the result follows.

Proposition 420 Given n ∈ N\{0}, p ∈ [1, +∞), X = R^n and

d : R^n × R^n → R, (x, y) ↦ ( \sum_{i=1}^n |x_i − y_i|^p )^{1/p},

(X, d) is a metric space.


Proof. 1a. It follows from the definition of absolute value.

1b. [⇐] Obvious. [⇒] ( \sum_{i=1}^n |x_i − y_i|^p )^{1/p} = 0 ⇔ \sum_{i=1}^n |x_i − y_i|^p = 0 ⇒ for any i, |x_i − y_i| = 0 ⇒ for any i, x_i − y_i = 0.

2. It follows from the fact that |x_i − y_i| = |y_i − x_i|.

3. First of all, observe that

d(x, z) = ( \sum_{i=1}^n |(x_i − y_i) + (y_i − z_i)|^p )^{1/p}.

Then it is enough to show that

( \sum_{i=1}^n |(x_i − y_i) + (y_i − z_i)|^p )^{1/p} ≤ ( \sum_{i=1}^n |x_i − y_i|^p )^{1/p} + ( \sum_{i=1}^n |y_i − z_i|^p )^{1/p},

which is a consequence of Proposition 421 below.

Proposition 421 Taking n ∈ N\{0}, p ∈ [1, +∞), X = R^n and a, b ∈ R^n,

( \sum_{i=1}^n |a_i + b_i|^p )^{1/p} ≤ ( \sum_{i=1}^n |a_i|^p )^{1/p} + ( \sum_{i=1}^n |b_i|^p )^{1/p}.

Proof. It follows from the proof of Proposition 424 below.
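Before turning to the sequence-space version, the finite-dimensional Minkowski inequality can be probed numerically. A minimal sketch, assuming numpy; the vectors and exponents are arbitrary, and `pnorm` is a hypothetical helper name.

```python
import numpy as np

def pnorm(v, p):
    return (np.abs(v) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(2)
a, b = rng.standard_normal(6), rng.standard_normal(6)
for p in (1.0, 1.5, 2.0, 7.0):
    # ||a + b||_p <= ||a||_p + ||b||_p, so d_p satisfies the triangle inequality
    assert pnorm(a + b, p) <= pnorm(a, p) + pnorm(b, p) + 1e-12
```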

Definition 422 Let R^∞ be the set of sequences in R.

Definition 423 For any p ∈ [1, +∞), define¹

l_p = { (x_n)_{n∈N} ∈ R^∞ : \sum_{n=1}^{+∞} |x_n|^p < +∞ },

i.e., roughly speaking, l_p is the set of sequences whose associated series are absolutely convergent.

Proposition 424 (Minkowski inequality) ∀ (x_n)_{n∈N}, (y_n)_{n∈N} ∈ l_p, ∀ p ∈ [1, +∞),

( \sum_{n=1}^{+∞} |x_n + y_n|^p )^{1/p} ≤ ( \sum_{n=1}^{+∞} |x_n|^p )^{1/p} + ( \sum_{n=1}^{+∞} |y_n|^p )^{1/p}.   (11.1)

Proof. If either (x_n)_{n∈N} or (y_n)_{n∈N} is such that ∀ n ∈ N, x_n = 0 or ∀ n ∈ N, y_n = 0, i.e., if either sequence is the constant sequence of zeros, then (11.1) is trivially true. Then we can consider the case in which

∃ α, β ∈ R_{++} such that ( \sum_{n=1}^{+∞} |x_n|^p )^{1/p} = α and ( \sum_{n=1}^{+∞} |y_n|^p )^{1/p} = β.   (11.2)

Define

∀ n ∈ N, x̂_n = |x_n|/α and ŷ_n = |y_n|/β.   (11.3)

Then

\sum_{n=1}^{+∞} x̂_n^p = \sum_{n=1}^{+∞} ŷ_n^p = 1.   (11.4)

For any n ∈ N, from the triangle inequality for the absolute value, we have

|x_n + y_n| ≤ |x_n| + |y_n|;

¹ For basic results on series, see, for example, Section 10.5 in Apostol (1967).

Page 131: Eui math-course-2012-09--19

11.1. DEFINITIONS AND EXAMPLES 129

since ∀p ∈ [1,+∞), fp : R+ → R, fp (t) = tp is an increasing function, we have

|xn + yn|p ≤ (|xn|+ |yn|)p . (11.5)

Moreover, from (11.3),

(|xn|+ |yn|)p = (α |bxn|+ β |byn|)p = (α+ β)pµµ

α

α+ β|bxn|+ β

α+ β|byn|¶p¶ . (11.6)

Since ∀p ∈ [1,+∞), fp is convex (just observe that f 00p (t) = p (p− 1) tp−2 ≥ 0), we getµα

α+ β|bxn|+ β

α+ β|byn|¶p ≤ α

α+ β|bxn|p + β

α+ β|byn|p (11.7)

From (11.5) , (11.6) and (11.7), we get

|xn + yn|p ≤ (α+ β)p ·µ

α

α+ β|bxn|p + β

α+ β|byn|p¶ .

From the above inequalities and basic properties of the series, we then get

+∞Xn=1

|xn + yn|p ≤ (α+ β)p

Ãα

α+ β

+∞Xn=1

|bxn|p + β

α+ β

+∞Xn=1

|byn|p! (11.4)= (α+ β)

p

µα

α+ β+

β

α+ β

¶= (α+ β)

p.

Therefore, using (11.2), we get

+∞Xn=1

|xn + yn|p ≤

⎛⎝Ã+∞Xn=1

|xn|p! 1

p

+

Ã+∞Xn=1

|yn|p! 1

p

⎞⎠p

,

and therefore the desired result.

Proposition 425 (lp, dp) with

dp : lp × lp → R, dp

¡(xn)n∈N , (yn)n∈N

¢=

Ã+∞Xn=1

|xn − yn|p! 1

p

.

is a metric space.

Proof. We first of all have to check that dp (x, y) ∈ R, i.e., that³P+∞

n=1 |xn − yn|p´ 1p

converges.

Ã+∞Xn=1

|xn − yn|p! 1

p

=

Ã+∞Xn=1

|xn + (−yn)|p! 1

p

≤Ã+∞Xn=1

|xn|p! 1

p

+

Ã+∞Xn=1

|yn|p! 1

p

< +∞,

where the first inequality follows from Minkowski inequality and the second inequality from theassumption that we are considering sequences in lp.Properties 1 and 2 of the distance follow easily from the definition. Property 3 is again a

consequence of Minkowski inequality:

dp (x, z) =

Ã+∞Xn=1

|(xn − yn) + (yn − zn)|p! 1

p

≤Ã+∞Xn=1

|xn − yn|p! 1

p

+

Ã+∞Xn=1

|(yn − zn)|p! 1

p

:= dp (x, y)+dp (y, z) .

Definition 426 Let T be a non empty set. B (T ) is the set of all bounded real functions defined onT , i.e.,

B (T ) := {f : T → R : sup {|f (x)| : x ∈ T} < +∞} ,

Page 132: Eui math-course-2012-09--19

130 CHAPTER 11. METRIC SPACES

and2

d∞ : B (T )× B (T )→ R, d∞ (f, g) = sup {|f (x)− g (x)| : x ∈ T}

Definition 428l∞ =

©(xn)n∈N ∈ R∞ : sup {|xn| : n ∈ N} < +∞

ªis called the set of bounded real sequences, and, still using the symbol of the previous definition,

d∞ : l∞ × l∞ → R, d∞

¡(xn)n∈N , (yn)n∈N

¢= sup {|xn − yn| : n ∈ N}

Proposition 429 (B (T ) , d∞)and (l∞, d∞) are metric spaces, and d∞ is called the sup metric.

Proof. We show that (B (T ) , d∞) is a metric space. As usual, the difficult part is to showproperty 3 of d∞, which is done below.∀f, g, h ∈ B (T ) ,∀x ∈ T,

|f (x)− g (x)| ≤ |f (x)− h (x)|+ |h (x)− g (x)| ≤

≤ sup {|f (x)− h (x)| : x ∈ T}+ sup {|h (x)− g (x)| : x ∈ T} =

= d∞ (f, h) + d∞ (h, g) .

Then,∀x ∈ T,d∞ (f, g) := sup |f (x)− g (x)| ≤ d∞ (f, g) + d∞ (h, g) .

Exercise 430 If (X, d) is a metric space, thenµX,

d

1 + d

¶is a metric space.

Proposition 431 Given a metric space (X, d) and a set Y such that ∅ 6= Y ⊆ X, then¡Y, d|Y×Y

¢is a metric space.

Proof. By definition.

Definition 432 Given a metric space (X,d) and a set Y such that ∅ 6= Y ⊆ X, then¡Y, d|Y×Y

¢,

or simply, (Y, d) is called a metric subspace of X.

Example 433 1. Given R with the (Euclidean) distance d2,1, ([0, 1) , d2,1) is a metric subspace of(R, d2,1).2. Given R2 with the (Euclidean) distance d2,2, ({0} ×R, d2,2) is a metric subspace of

¡R2, d2,2

¢.

Exercise 434 Let C ([0, 1]) be the set of continuous functions from [0, 1] to R. Show that a metricon that set is defined by

d (f, g) =

Z 1

0

|f (x)− g (x)| dx,

where f, g ∈ C ([0, 1]).

Example 435 Let X be the set of continuous functions from R to R, and consider d (f, g) =supx∈R |f (x)− g (x)|. (X,d) is not a metric space because d is not a function from X2 to R: itcan be supx∈R |f (x)− g (x)| = +∞.

2

Definition 427 Observe that d∞ (f, g) ∈ R :

d∞ (f, g) := sup {|f (x)− g (x)| : x ∈ T} ≤ sup {|f (x)| : x ∈ T}+ sup {|g (x)| : x ∈ T} < +∞.

Page 133: Eui math-course-2012-09--19

11.2. OPEN AND CLOSED SETS 131

Example 436 Let X = {a, b, c} and d : X2 → R such that

d (a, b) = d (b, a) = 2

d (a, c) = d (c, a) = 0

d (b, c) = d (c, b) = 1.

Since d (a, b) = 2 > 0 + 1 = d (a, c) + d (b, c), then (X, d) is not a metric space.

Example 437 Given n ∈ N\ {0} , p ∈ (0, 1) , X = R2, define

d : R2 ×R2 → R, (x, y) 7→Ã

2Xi=1

|xi − yi|p! 1

p

,

(X, d) is not a metric space, as shown below. Take x = (0, 1) , y = (1, 0) and z = (0, 0). Then

d (x, y) = (1p + 1p)1p = 2

1p ,

d (x, z) = (0p + 1p)1p = 1

d (z, y) = 1.

Then, d (x, y)− (d (x, z) + d (z, y)) = 21p − 2 > 0.

11.2 Open and closed sets

Definition 438 Let (X, d) be a metric space. ∀x0 ∈ X and ∀r ∈ R++, the open r-ball of x0 in(X, d) is the set

B(X,d) (x0, r) = {x ∈ X : d (x, x0) < r} .

If there is no ambiguity about the metric space (X, d) we are considering, we use the lighternotation B (x0, r) in the place of B(X,d) (x0, r).

Example 439 1.B(R,d2) (x0, r) = (x0 − r, x0 + r)

is the open interval of radius r centered in x0.2.

B(R,d2) (x0, r) =

½(x1, x2) ∈ R2 :

q(x1 − x01)

2 + (x2 − x02)2 < r

¾is the open disk of radius r centered in x0.3. In R2 with the metric d given by

d ((x1, x2) , (y1, y2)) = max {|x1 − y1, |x2 − y2||}

the open ball B (0, 1) can be pictured as done below:a square around zero.

Definition 440 Let (X, d) be a metric space. x is an interior point of S ⊆ X ifthere exists an open ball centered in x and contained in S, i.e.,∃ r ∈ R++ such that B (x, r) ⊆ S.

Definition 441 The set of all interior points of S is called the Interior of S and it is denoted byInt(X,d) S or simply by Int S.

Page 134: Eui math-course-2012-09--19

132 CHAPTER 11. METRIC SPACES

Remark 442 Int S ⊆ S, simply because x ∈ Int S ⇒ x ∈ B (x, r) ⊆ S, where the first inclusionfollows from the definition of open ball and the second one from the definition of Interior. In otherwords, to find interior points of S, we can limit our search to points belonging to S.It is not true that ∀S ⊆ X, S ⊆ Int S, as shown below. We want to prove that

¬(∀S ⊆ X,∀x ∈ S, x ∈ S ⇒ x ∈ IntS),

i.e.,(∃S ⊆ X and x ∈ S such that x /∈ Int S).

Take (X,d) = (R, d2), S = {1} and x = 1. Then, clearly 1 ∈ {1}, but 1 /∈ Int S : ∀r ∈ R++ ,(1− r, 1 + r) * {1}.

Remark 443 To understand the following example, recall that ∀a, b ∈ R such that a < b, ∃c ∈ Qand d ∈ R\Q such that c, d ∈ (a, b) - see, for example, Apostol (1967).

Example 444 Let (R, d2) be given.1. Int N = Int Q =∅.2. ∀a, b ∈ R, a < b, Int [a, b] = Int [a, b) = Int (a, b] = Int (a, b) = (a, b).3. Int R = R.4. Int ∅ = ∅.

Definition 445 Let (X, d) be a metric space. A set S ⊆ X is open in (X,d) , or (X, d)-open, oropen with respect to the metric space (X,d), if S ⊆ Int S, i.e., S = Int S, i.e.,

∀x ∈ S, ∃r ∈ R++ such that B(X,d) (x, r) := {y ∈ X : d (y, x) < r} ⊆ S.

Remark 446 Let (R, d2) be given. From Example 444, it follows thatN,Q, [a, b] , [a, b) , (a, b] are not open sets, and (a, b) ,R and ∅ are open sets. In particular, open

interval are open sets, but there are open sets which are not open interval. Take for exampleS = (0, 1) ∪ (2, 3).

Exercise 447 ∀n ∈ N,∀i ∈ {1, ..., n} ,∀ai, bi ∈ R with ai < bi,

×ni=1 (ai, bi)

is (Rn, d2) open.

Proposition 448 Let (X, d) be a metric space. An open ball is an open set.

Proof. Take y ∈ B (x0, r). Define

δ = r − d (x0, y) . (11.8)

First of all, observe that, since y ∈ B (x0, r), d (x0, y) < r and then δ ∈ R++. It is then enoughto show that B (y, δ) ⊆ B (x0, r), i.e., we assume that

d (z, y) < δ (11.9)

and we want to show that d (z, x0) < r. From the triangle inequality

d (z, x0) ≤ d (z, y) + d (y, x0)(11.9),(11.8)

< < δ + (r − δ) = r,

as desired.

Example 449 In a discrete metric space (X, d), ∀x ∈ X,∀r ∈ (0, 1] , B (x, r) := {y ∈ X : d (x, y) < r} ={x} and ∀r > 1, B (x, r) := {y ∈ X : d (x, y) < r} = X. Then, it is easy to show that any subset ofa discrete metric space is open, as verified below. Let (X, d) be a discrete metric space and S ⊆ X.For any x ∈ S, take ε = 1

2 ; then B¡x, 12

¢= {x} ⊆ S.

Page 135: Eui math-course-2012-09--19

11.2. OPEN AND CLOSED SETS 133

Definition 450 Let a metric space (X, d) be given. A set T ⊆ X is closed in (X, d) if its comple-ment in X, i.e., X\T is open in (X,d).

If no ambiguity arises, we simply say that T is closed in X, or even, T is closed; we also writeTC in the place of X\T .

Remark 451 S is open ⇔ SC is closed, simply because SC closed ⇔¡SC¢C= S is open.

Example 452 The following sets are closed in (R, d2): R;N;∅;∀a, b ∈ R, a < b, {a} and [a, b].

Remark 453 It is false that:

S is not open ⇒ S is closed

(and therefore that S is not closed ⇒ S is open), i.e., there exist sets which are not open andnot closed, for example (0, 1] in (R, d2). There are also two sets which are both open and closed: ∅and Rn in (Rn, d2).

Proposition 454 Let a metric space (X, d) be given.1. ∅ and X are open sets.2. The union of any (finite or infinite) collection of open sets is an open set.3. The intersection of any finite collection of open sets is an open set.

Proof. 1.∀x ∈ X, ∀r ∈ R++, B (x, r) ⊆ X. ∅ is open because it contains no elements.2.Let I be a collection of open sets and S = ∪A∈IA. Assume that x ∈ S. Then there exists A ∈ I

such that x ∈ A. Then, for some r ∈ R++

x ∈ B (x, r) ⊆ A ⊆ S

where the first inclusion follows from fact that A is open and the second one from the definition ofS.3.Let F be a collection of open sets, i.e., F = {An}n∈N , where N ⊆ N, #N is finite and ∀n ∈ N ,

An is an open set. Take S = ∩n∈NAn. If S = ∅, we are done. Assume that S 6= ∅ and that x ∈ S.Then from the fact that each set A is open and from the definition of S as the intersection of sets

∀n ∈ N,∃rn ∈ R++ such that x ∈ B (x, rn) ⊆ An

Since N is a finite set, there exists a positive r∗ = min {rn : n ∈ N} > 0. Then

∀n ∈ N,x ∈ B (x, r∗) ⊆ B (x, rn) ⊆ An

and from the very definition of intersections

x ∈ B (x, r∗) ⊆ ∩n∈NB (x, rn) ⊆ ∩n∈NAn = S.

Remark 455 The assumption that #N is finite cannot be dispensed with:

∩+∞n=1Bµ0,1

n

¶= ∩+∞n=1

µ− 1n,1

n

¶= {0}

is not open.

Remark 456 A generalization of metric spaces is the concept of topological spaces. In fact, wehave the following definition which “ assumes the previous Proposition”.Let X be a nonempty set. A collection T of subsets of X is said to be a topology on X if1. ∅ and X belong to T ,2. The union of any (finite or infinite) collection of sets in T belongs to T ,3. The intersection of any finite collection of sets in T belongs to T .(X, T ) is called a topological space.The members of T are said to be open set with respect to the topology T , or (X,T ) open.

Page 136: Eui math-course-2012-09--19

134 CHAPTER 11. METRIC SPACES

Proposition 457 Let a metric space (X, d) be given.1. ∅ and X are closed sets.2. The intersection of any (finite or infinite) collection of closed sets is a closed set.3. The union of any finite collection of closed sets is a closed set.

Proof. 1It follows from the definition of closed set, the fact that ∅C = X, XC = ∅ and Proposition 454.2.Let I be a collection of closed sets and S = ∩B∈IB. Then, from de Morgan’s laws,

SC = (∩B∈IB)C = ∪B∈IBC

Then from Remark 451, ∀B ∈ I, BC is open and from Proposition 454.1, ∪B∈IBC is open aswell.2.Let F be a collection of closed sets, i.e., F = {Bn}n∈N , where N ⊆ N, #N is finite and ∀n ∈ N ,

Bn is an open set. Take S = ∪n∈NBn. Then, from de Morgan’s laws,

SC = (∪n∈NBn)C= ∩n∈NBC

n

Then from Remark 451, ∀n ∈ N , BC is open and from Proposition 454.2, ∩B∈IBC is open as well.

Remark 458 The assumption that #N is finite cannot be dispensed with:µ∩+∞n=1B

µ0,1

n

¶¶C= ∪+∞n=1B

µ0,1

n

¶C= ∪+∞n=1

µµ−∞,− 1

n,

¸∪∙1

n,+∞

¶¶is not closed.

Definition 459 If S is both closed and open in (X, d), S is called clopen in (X, d).

Remark 460 In any metric space (X,d), X and ∅ are clopen.

Proposition 461 In any metric space (X, d), {x} is closed.

Proof. We want to show that X\ {x} is open. If X = {x}, then X\ {x} = ∅, and we are done.If X 6= {x}, take y ∈ X, where y 6= x. Taken

r = d (y, x) (11.10)

with r > 0, because x 6= y. We are left with showing that B (y, r) ⊆ X\ {x}, which is truebecause of the following argument. Suppose otherwise; then x ∈ B (y, r), i.e., r

(11.10)= d (y, x) < r,

a contradiction.

Remark 462 From Example 449, any set in any discrete metric space is open. Therefore, thecomplement of each set is open, and therefore each set is then clopen.

Definition 463 Let a metric space (X,d) and a set S ⊆ X be given. x is an boundary point of Sifany open ball centered in x intersects both S and its complement in X, i.e.,

∀r ∈ R++, B (x, r) ∩ S 6= ∅ ∧ B (x, r) ∩ SC 6= ∅.

Definition 464 The set of all boundary points of S is called the Boundary of S and it is denotedby F (S).

Exercise 465 F (S) = F¡SC¢

Exercise 466 F (S) is a closed set.

Page 137: Eui math-course-2012-09--19

11.2. OPEN AND CLOSED SETS 135

Definition 467 The closure of S, denoted by Cl (S) is the intersection of all closed sets containingS, i.e., Cl (S) = ∩S0∈SS0 where S := {S0 ⊆ X : S0 is closed and S0 ⊇ S}.

Proposition 468 1. Cl (S)is a closed set;2. S is closed ⇔ S = Cl (S).

Proof. 1.It follows from the definition and Proposition 457.2.[⇐]It follows from 1. above.[⇒]Since S is closed, then S ∈ S. Therefore, Cl (S) = S ∩ (∩S0∈SS0) = S.

Definition 469 x ∈ X is an accumulation point for S ⊆ X if any open ball centered at x containspoints of S different from x, i.e., if

∀r ∈ R++, (S\ {x}) ∩B (x, r) 6= ∅

The set of accumulation points of S is denoted by D (S) and it is called the Derived set of S.

Definition 470 x ∈ X is an isolated point for S ⊆ X if x ∈ S and it is not an accumulation pointfor S, i.e.,

x ∈ S and ∃r ∈ R++ such that (S\ {x}) ∩B (x, r) = ∅,

or

∃r ∈ R++ such that S ∩B (x, r) = {x} .

The set of isolated points of S is denoted by Is (S)..

Proposition 471 D (S) = {x ∈ Rn : ∀r ∈ R++, S ∩B (x, r) has an infinite cardinality}.

Proof. [⊆]Suppose otherwise, i.e., x is an accumulation point of S and ∃r ∈ R++ such that S ∩B (x, r) =

{x1, ..., xn}.Then defined δ := min {d (x, xi) : i ∈ {1, ..., n}}, (S\ {x}) ∩ B¡x, δ2

¢= ∅, a contradic-

tion.[⊇]Since S ∩B (x, r) has an infinite cardinality, then (S\ {x}) ∩B (x, r) 6= ∅.

Remark 472

Is (S) = {x ∈ S : ∃r ∈ R++ such that S ∩B (x, r) = {x}} ,

as verified below.x is an isolated point of S if belongs to S and it is not an accumulation point, i.e., if

x ∈ S ∧ x /∈ D (S) ,

or

x ∈ S ∧ ∃r ∈ R++ such that (S\ {x}) ∩B (x, r) = ∅,

or

∃r ∈ R++ such that S ∩B (x, r) = {x} .

Page 138: Eui math-course-2012-09--19

136 CHAPTER 11. METRIC SPACES

11.2.1 Sets which are open or closed in metric subspaces.

Remark 473 1. [0, 1) is ([0, 1) , d2) open.2. [0, 1) is not (R, d2) open. We want to show

¬­∀x0 ∈ [0, 1) ,∃r ∈ R++ such that B(R,d2) (x0, r)

®= (x0 − r, x0 + r) ⊆ [0, 1) , (11.11)

i.e.,

∃x0 ∈ [0, 1) such that ∀r ∈ R++, ∃x0 ∈ R such that x0 ∈ (x0 − r, x0 + r) and x0 /∈ [0, 1) .

It is enough to take x0 = 0 and x0 = − r2 .

3. Let ([0,+∞) , d2) be given. [0, 1) is open, as shown below. By definition of open set, - goback to Definition 445 and read it again - we have that, given the metric space ((0,+∞) , d2), [0, 1)is open if

∀x0 ∈ [0, 1), ∃r ∈ R++ such that B([0,+∞),d2) (x0, r) :=nx ∈ [0,+∞) : d (x0, x) < r

o⊆ ([0, 1)) .

If x0 ∈ (0, 1), then take r = min {x0, 1− x0} > 0.If x0 = 0, then take r = 1

2 . Therefore, we have B(R+,d2)¡0, 12

¢=©x ∈ R+ : |x− 0| < 1

2

ª=£

0, 12¢⊆ [0, 1).

Remark 474 1. (0, 1) is ((0, 1) , d2) closed.2. (0, 1] is ((0,+∞) , d2) closed, simply because (1,+∞) is open.

Proposition 475 Let a metric space (X,d), a metric subspace (Y, d) of (X,d) and a set S ⊆ Y begiven.

S is open in (Y, d)⇔ there exists a set O open in (X, d) such that S = Y ∩O.

Proof. Preliminary remark.

∀x0 ∈ Y,∀r ∈ R++,

B(Y,d) (x0, r) := {x ∈ Y : d (x0, x) < r} = Y ∩ {x ∈ X : d (x0, x) < r} = Y ∩B(X,d) (x0, r) .(11.12)

[⇒]Taken x0 ∈ S, by assumption ∃rx0 ∈ R++ such that B(Y,d) (x0, r) ⊆ S ⊆ Y . Then

S = ∪x0∈SB(Y,d) (x0, r)(11.12)= ∪x0∈S

¡Y ∩B(X,d) (x0, r)

¢ distributive laws= Y ∩

¡∪x0∈SB(X,d) (x0, r)

¢,

and the it is enough to take O = ∪x0∈SB(X,d) (x0, r) to get the desired result.[⇐]Take x0 ∈ S. then, x0 ∈ O, and, since, by assumption, O is open in (X, d) , ∃r ∈ R++ such that

B(X,d) (x0, r) ⊆ O. Then

B(Y,d) (x0, r)(11.12)= Y ∩B(X,d) (x0, r) ⊆ O ∩ Y = S,

where the last equality follows from the assumption. Summarizing, ∀x0 ∈ S, ∃r ∈ R++ such thatB(Y,d) (x0, r) ⊆ S, as desired.

Corollary 476 Let a metric space (X,d), a metric subspace (Y, d) of (X, d) and a set S ⊆ Y begiven.1.

hS closed in (Y, d)i⇔ hthere exists a set C closed in (X, d) such that S = Y ∩ C.i .

2.

hS open (respectively, closed) in (X, d)i ⇒: hS open (respectively, closed) in (Y, d)i .

3. If Y is open (respectively, closed) in X,

hS open (respectively, closed) in (X,d)i⇐ hS open (respectively, closed) in (Y, d)i .

i.e., “the implication ⇐ in the above statement 2. does hold true”.

Page 139: Eui math-course-2012-09--19

11.3. SEQUENCES 137

Proof. 1.hS closed in (Y, d)i def.⇔ hY \S open in (Y, d)i Prop. 475⇔⇔ hthere exists an open set S00 in (X, d) such that Y \S = S00 ∩ Y i⇔⇔ hthere exists a closed set S0 in (X, d) such that S = S0 ∩ Y i ,where the last equivalence is proved below;[⇐]Take S00 = X\S0, open in (X, d) by definition. We want to show thatif S00 = X\S, S = S0 ∩ Y and Y ⊆ X, then Y \S = S00 ∩ Y :

x ∈ Y \S iff x ∈ Y ∧ x /∈ Sx ∈ Y ∧ (x /∈ S0 ∩ Y )x ∈ Y ∧ (¬ (x ∈ S0 ∩ Y ))x ∈ Y ∧ (¬ (x ∈ S0 ∧ x ∈ Y ))x ∈ Y ∧ ((x /∈ S0 ∨ x /∈ Y ))(x ∈ Y ∧ x /∈ S0) ∨ ((x ∈ Y ∧ x /∈ Y ))x ∈ Y ∧ x /∈ S0

x ∈ S00 ∩ Y iff x ∈ Y ∧ x ∈ S00

x ∈ Y ∧ (x ∈ X ∧ x /∈ S0)(x ∈ Y ∧ x ∈ X) ∧ x /∈ S0

x ∈ Y ∧ x /∈ S0

[⇒]Take S0 = X\S.Then S0 is closed in (X, d). We want to show thatif T 0 = X\S00, Y \S = S00 ∩ Y, Y ⊆ X, then S = S0 ∩ Y .Observe that we want to show that Y \S = Y \ (S0 ∩ Y ) ,or from the assumptions, we want to

show thatS00 ∩ Y = Y \ ((X\S00) ∩ Y ) .

x ∈ Y \ ((X\S00) ∩ Y ) iff x ∈ Y ∧ (¬ (x ∈ X\S00 ∧ x ∈ Y ))x ∈ Y ∧ (x /∈ X\S00 ∨ x /∈ Y )x ∈ Y ∧ (x ∈ S00 ∨ x /∈ Y )(x ∈ Y ∧ x ∈ S00) ∨ (x ∈ Y ∧ x /∈ Y )x ∈ Y ∧ x ∈ S00

x ∈ S00 ∩ Y

2. and 3.Exercises.

11.3 Sequences

Unless otherwise specified, up to the end of the chapter, we assume that

X is a metric space with metric d,

andRn is the metric space with Euclidean metric.

Definition 477 A sequence in X is a function x : N→ X.

Usually, for any n ∈ N, the value x (n) is denoted by xn,which is called the n-th term of thesequence; the sequence is denoted by (xn)n∈N.

Definition 478 Given a nonempty set X, X∞ is the set of sequences (xn)n∈N such that ∀n ∈ N,xn ∈ X.

Definition 479 A strictly increasing sequence of natural numbers is a sequence (kn)n∈N in N such

1 < k1 < k2 < ... < kn < ...

Page 140: Eui math-course-2012-09--19

138 CHAPTER 11. METRIC SPACES

Definition 480 A subsequence of a sequence (xn)n∈N is a sequence (yn)n∈N such that there existsa strictly increasing sequence (kn)n∈N of natural numbers such that ∀n ∈ N, yn = xkn.

Definition 481 A sequence (xn)n∈N ∈ X∞ is said to be (X, d) convergent to x0 ∈ X (or convergentto x0 ∈ X with respect to the metric space (X, d) ) if

∀ε > 0,∃n0 ∈ N such that ∀n > n0, d (xn, x0) < ε (11.13)

x0 is called the limit of the sequence (xn)n∈N and we write

limn→+∞ xn = x0, or xnn→ x0. (11.14)

(xn)n∈N in a metric space (X,d) is convergent if there exist x0 ∈ X such that (11.13) holds. Inthat case, we say that the sequence converges to x0 and x0 is the limit of the sequence.3

Remark 482 A more precise, and heavy, notation for (11.14) would be

limn→+∞(X,d)

xn = x0 or xnn→

(X,d)x0

Remark 483 Observe that¡1n

¢n∈N+ converges with respect to (R, d2) and it does not converge with

respect to (R++, d2) .

Proposition 484 limn→+∞ xn = x0 ⇔ limn→+∞ d (xn, x0) = 0.

Proof. Observe that we can define the sequence (d (xn, x0))n∈N in R. Then from definition 481,we have that limn→+∞ d (xn, x0) = 0 means that

∀ε > 0,∃n0 ∈ N such that ∀n > n0, |d (xn, x0)− 0| < ε.

Remark 485 Since (d (xn, x0))n∈N is a sequence in R, all well known results hold for that sequence.Some of those results are listed below.

Proposition 486 (Some properties of sequences in R).All the following statements concern sequences in R.1. Every convergent sequence is bounded.2. Every increasing (decreasing) sequence that is bounded above (below) converges to its sup

(inf).3. Every sequence has a monotone subsequence.4. (Bolzano-Weierstrass 1) Every bounded sequence has a convergent subsequence.5. (Bolzano-Weierstrass 2) Every sequence contained in a closed and bounded set has a conver-

gent subsequence in the set.Moreover, suppose that (xn)n∈N and (yn)n∈N are sequences in R and limn→∞ xn = x0 and

limn→+∞ yn = y0. Then6. limn→+∞ (xn + yn) = x0 + y0;7. limn→+∞ xn · yn = x0 · y0;8. if ∀n ∈ N, xn 6= 0 and x0 6= 0, limn→+∞

1xn= 1

x0;

9. if ∀n ∈ N, xn ≤ yn, then x0 ≤ y0;10. Let (zn)n∈N be a sequence such that ∀n ∈ N, xn ≤ zn ≤ yn, and assume that x0 = y0. Then

limn→+∞ zn = x0.

Proof. Most of the above results can be found in Chapter 12 in Simon and Blume (1994), Okpage 50 on, Morris pages 126-128, apostol; put a reference to apostol.

Proposition 487 If (xn)n∈N converges to x0 and (yn)n∈N is a subsequence of (xn)n∈N, then (yn)n∈Nconverges to x0.

3For the last sentence in the Definition, see, for example, Morris (2007), page 121.

Page 141: Eui math-course-2012-09--19

11.3. SEQUENCES 139

Proof. By definition of subsequence, there exists a strictly increasing sequence (kn)n∈N ofnatural numbers, i.e., 1 < k1 < k2 < ... < kn < ..., such that ∀n ∈ N, yn = xkn .If n→ +∞, then kn → +∞. Moreover, ∀n, ∃kn such that

d (x0, xkn) = d (x0, yn)

Taking limits of both sides for n→ +∞, we get the desired result.

Proposition 488 A sequence in (X, d) converges at most to one element in X.

Proof. Assume that xnn→ p and xn

n→ q; we want to show that p = q. From the Triangleinequality,

∀n ∈ N, 0 ≤ d (p, q) ≤ d (p, xn) + d (xn, q) (11.15)

Since d (p, xn)→ 0 and d (xn, q)→ 0, Proposition 486.7 and (11.15) imply that d (p, q) = 0 andtherefore p = q.

Proposition 489 Given a sequence (xn)n∈N =³¡xin¢ki=1

´n∈N

in Rk,

­(xn)n∈N converges to x

®⇔D∀i ∈ {1, ..., k} ,

¡xin¢n∈N converges to xi

E,

and

limn→+∞

xn =

µlim

n→+∞xin

¶ni=1

.

Proof. [⇒]Observe that ¯

xin − xi¯=

q(xin − xi)

2 ≤ d (xnx) .

Then, the result follows.[⇐]By assumption, ∀ε > 0 and ∀i ∈ {1, ..., k} , there exists n0 such that ∀n > n0, we have¯

xin − xi¯< ε√

k. Then ∀n > n0,

d (xnx) =

ÃkXi=1

¯xin − xi

¯2! 12

<

ÃkXi=1

¯ε√k

¯2! 12

=

Ãε2

kXi=1

1

k

! 12

= ε.

Proposition 490 Suppose that (xn)n∈N and (yn)n∈N are sequences in Rk and limn→∞ xn = x0and limn→+∞ yn = y0. Then

1. limn→+∞ (xn + yn) = x0 + y0;

2. ∀c ∈ R, limn→+∞ c · xn = c · x0;

3. limn→+∞ xn · yn = x0 · y0.

Proof. It follows from Propositions 486 and 489.

Example 491 In Proposition 429 , we have seen that (B ([0, 1]) , d∞) is a metric space. Observethat defined ∀n ∈ N\ {0},

fn : [0, 1]→ R, t 7→ tn,

we have that (fn)n ∈ B ([0, 1])∞. Moreover, ∀t ∈ [0, 1],

¡fn¡t¢¢

n∈N ∈ R∞ and it converges in

(R, d2). In fact,

limn→+∞

tn=

⎧⎨⎩ 0 if t ∈ [0, 1)

1 if t = 1.

Page 142: Eui math-course-2012-09--19

140 CHAPTER 11. METRIC SPACES

Define

f : [0, 1]→ R, t 7→

⎧⎨⎩ 0 if t ∈ [0, 1)

1 if t = 1..

10.750.50.250

1

0.75

0.5

0.25

0

x

y

x

y

We want to check that it is false that

fm →(B([0,1]),d∞)

f,

i.e., it is false that d∞ (fm, f)→m0. Then, we have to check

¬­∀ε > 0 ∃Nε ∈ N such that ∀n > Nε, d∞ (fn, f) < ε

®,

i.e.,∃ε > 0 such that ∀Nε ∈ N, ∃n > Nε such that d∞ (fn, f) ≥ ε .

Then, taken ε = 14 , it suffice to show that­

∀m ∈ N, ∃t ∈ (0, 1) such that¯fm¡t¢− f

¡t¢¯≥ 1

4

®.

It is then enough to take t =¡12

¢m.

Exercise 492 For any metric space (X, d) and (xn)n∈N ∈ X∞,¿xn →

(X,d)x

À⇔*xn →(X, 1

1+d)x

+.

11.4 Sequential characterization of closed setsProposition 493 Let (X, d) be a metric space and S ⊆ X.4­

S is closed®⇔

­any (X,d) convergent sequence (xn)n∈N ∈ S∞ converges to an element of S

®.

Proof. We want to show that

S is closed⇔

⇔¿¿

(xn)n∈N is such that 1. ∀n ∈ N, xn ∈ S, and2. xn → x0

À⇒ 3. x0 ∈ S

À.

[⇒]We are going to show that if S is closed, 2. and not 3. hold, then 1. does not hold. Taken a

sequence converging to x0 ∈ X\S. Since S is closed, X\S is open and therefore ∃r ∈ R++ such that4Proposition 555 in Appendix 11.8 presents a different proof of the result below.

Page 143: Eui math-course-2012-09--19

11.5. COMPACTNESS 141

B (x0, r) ⊆ X\S. Since xn → x0, ∃M ∈ N such that ∀n > M , xn ∈ B (x0, r) ⊆ X\S, contradictingAssumption 1 above.[⇐]Suppose otherwise, i.e., S is not closed. Then, X\S is not open. Then, ∃ x ∈ X\S such that

∀n ∈ N, ∃xn ∈ X such that xn ∈ B¡x, 1n

¢∩ S, i.e.,

i. x ∈ X\Sii. ∀n ∈ N, xn ∈ S,iii. d (xn, x) < 1

n , and therefore xn → x,and i., ii. and iii. contradict the assumption.

Remark 494 The Appendix to this chapter contains some other characterizations of closed setsand summarizes all the presented characterizations of open and closed sets.

11.5 Compactness

Definition 495 Let (X, d) be a metric space, S a subset of X, and Γ be a set of arbitrary cardi-nality. A family S = {Sγ}γ∈Γ such that ∀γ ∈ Γ, Sγ is (X, d) open, is said to be an open cover ofS if S ⊆ ∪γ∈ΓSγ.A subfamily S 0 of S is called a subcover of S if S ⊆ ∪S0∈S0S0.

Definition 496 A metric space (X,d) is compact if every open cover of X has a finite subcover.A set S ⊆ X is compact in X if every open cover of S has a finite subcover of S.

Example 497 Any finite set in any metric space is compact.Take S = {xi}ni=1 in (X, d) and an open cover S of S. For any i ∈ {1, ..., n}, take an open set

in S which contains xi; call it Si. Then S 0 = {Si : i ∈ {1, ..., n}} is the desired open subcover of S.

Example 498 1. (0, 1) is not compact in (R, d2).We want to show that the following statement is true:

¬ h∀S such that ∪S∈S S ⊇ (0, 1) , ∃S 0 ⊆ S such that #S 0 is finite and ∪S∈S0 S ⊇ (0, 1)i ,

i.e.,

∃S such that ∪S∈S S ⊇ (0, 1) and ∀S 0 ⊆ S either #S 0 is infinite or ∪S∈S0 S + (0, 1) .

Take S =¡¡

1n , 1

¢¢n∈N\{0,1} and S

0 any finite subcover of S.Then there exists a finite set N such

that S 0 =¡¡

1n , 1

¢¢n∈N . Take n∗ = max {n ∈ N}. Then, ∪S∈S0S = ∪n∈N

¡1n , 1

¢=¡1n∗ , 1

¢and¡

1n∗ , 1

¢+ (0, 1).

2. (0, 1] is not compact in ((0,+∞) , d2). Take S =¡¡

1n , 1 +

1n

¢¢n∈N\{0} and S

0 any finite sub-

cover of S.Then there exists a finite set N such that S 0 =¡¡

1n , 1 +

1n

¢¢n∈N . Take n

∗ = max {n ∈ N}and n∗∗ = min {n ∈ N}. Then, ∪S∈S0S = ∪n∈N

¡1n , 1 +

1n

¢=¡1n∗ , 1 +

1n∗∗

¢and

¡1n∗ , 1 +

1n∗∗

¢+

(0, 1].

Proposition 499 Let (X, d) be a metric space.

X compact and C ⊆ X closed ⇒ hC compacti .

Proof. Take an open cover S of C. Then S ∪ (X\C) is an open cover of X. Since X iscompact, then there exists an open covers S 0of S ∪ (X\C) which cover X. Then S 0\ {X\C} is afinite subcover of S which covers C.

11.5.1 Compactness and bounded, closed sets

Definition 500 Let (X, d) be a metric space and S a subset of X. S is bounded in (X, d) if∃r ∈ R++ and x ∈ S such that S ⊆ B(X,d) (x, r).

Proposition 501 S is bounded ⇔ ∃r∗ ∈ R++ such that S ⊆ B (0, r∗).

Page 144: Eui math-course-2012-09--19

142 CHAPTER 11. METRIC SPACES

Proof. [⇒] Take r∗ = d (x, 0) + r . We want to show that ∀s ∈ S, d (s, 0) < r∗. Indeed,

d (s, 0) ≤ d (s, x) + d (x, 0) < r + (r∗ − r) = r∗,

where the last inequality follows from the fact that s ∈ S implies that d (s, x) < r and that,from the definition of r∗, d (x, 0) = r∗ − r.[⇐] Take r = r∗ and x = 0.

Proposition 502 The finite union of bounded set is bounded.

Proof. Take n ∈ N and {Si}ni=1 such that ∀i ∈ {1, ..., n}, Si is bounded. Then, ∀i ∈ {1, ..., n},∃r ∈ R++ such that Si ⊆ B (0, ri). Take r = max {ri}ni=1. Then ∪ni=1Si ⊆ ∪ni=1B (0, ri) ⊆ B (0, r).

Proposition 503 Let (X, d) be a metric space and S a subset of X.

S compact ⇒ S bounded.

Proof. If S = ∅, we are done. Assume then that S 6= ∅, and take x ∈ S and B ={B (x, n)}n∈N\{0}. B is an open cover of X and therefore of S. Then, there exists B0 ⊆ B such that

B0 = {B (x, ni)}i∈N ,

where N is a finite set and B0 covers S.Then takes n∗ = maxi∈N ni, we get S ⊆ B (x, n∗) as desired.

Proposition 504 Let (X, d) be a metric space and S a subset of X.

S compact ⇒ S closed.

Proof. If S = X, we are done by Proposition 457. Assume that S 6= X: we want to show thatX\S is open. Take y ∈ S and x ∈ X\S. Then, taken ry ∈ R such that

0 < ry <1

2d (x, y) ,

we haveB (y, ry) ∩B (x, ry) = ∅.

Now, S = {B (y, ry) : y ∈ S} is an open cover of S, and since S is compact, there exists a finitesubcover S 0 of S which covers S, say

S 0 = {B (yn, rn)}n∈N ,

such that N is a finite set. Taker∗ = min

n∈Nrn,

and therefore r∗ > 0. Then ∀n ∈ N ,

B (yn, rn) ∩B (x, rn) = ∅,

B (yn, rn) ∩B (x, r∗) = ∅,and

(∪n∈N B (yn, rn)) ∩B (x, r∗) = ∅.Since {B (yn, rn)}n∈N covers S, we then have

S ∩B (x, r∗) = ∅,

orB (x, r∗) ⊆ X\S.

Therefore, we have shown that

∀x ∈ X\S, ∃r∗ ∈ R++ such that B (x, r∗) ⊆ X\S,

i.e., X\S is open and S is closed.

Page 145: Eui math-course-2012-09--19

11.5. COMPACTNESS 143

Remark 505 Summarizing, we have seen that in any metric space

S compact ⇒ S bounded and closed.

The opposite implication is false. In fact, the following sets are bounded, closed and not compact.1. Let the metric space ((0,+∞) , d2). (0, 1] is closed from Remark 474, it is clearly bounded

and it is not compact from Example 498.2 .2. (X,d) where X is an infinite set and d is the discrete metric.X is closed, from Remark 462 .X is bounded: take x ∈ X and r = 2 .X is not compact. Take S = {B (x, 1)}x∈X . Then ∀x ∈ X there exists a unique element Sx in

S such that x ∈ Sx. 5

Remark 506 In next section we are going to show that if (X, d) is an Euclidean space with theEuclidean distance and S ⊆ X, then

S compact ⇐ S bounded and closed.

11.5.2 Sequential compactness

Definition 507 Let a metric space (X, d) be given. S ⊆ X is sequentially compact if every sequenceof elements of S has a subsequence which converges to an element of S, i.e.,­(xn)n∈N is a sequence in S

®⇒­∃ a subsequence (yn)n∈N of (xn)n∈N such that yn → x ∈ S

®.

In what follows, we want to prove that in metric spaces, compactness is equivalent to sequentialcompactness. To do that requires some work and the introduction of some, useful in itself, concepts.

Proposition 508 (Nested intervals) For every n ∈ N, define In = [an, bn] ⊆ R such that In+1 ⊆ In.Then ∩n∈NIn 6= ∅.

Proof. By assumption,a1 ≤ a2 ≤ ... ≤ an ≤ ... (11.16)

and...bn ≤ bn−1 ≤ ... ≤ b1 (11.17)

Then,∀m,n ∈ N, am < bn

simply because, if m > n, then am < bm ≤ bn,where the first inequality follows from thedefinition of interval Im and the second one from (11.17), and if m ≤ n, then am ≤ an ≤ bn,wherethe first inequality follows from (11.16) and the second one from the definition of interval In.Then A := {an : n ∈ N} is nonempty and bounded above by bn for any n.Then supA := s exists.Since ∀n ∈ N, bn is an upper bound for A,

∀n ∈ N, s ≤ bn

and from the definition of sup,∀n ∈ N, an ≤ s

Then∀n ∈ N, an ≤ s ≤ bn

and∀n ∈ N, In 6= ∅.

Remark 509 The statement in the above Proposition is false if instead of taking closed boundedintervals we take either open or unbounded intervals. To see that consider In =

¡0, 1n

¢and In =

[n,+∞].5For other examples, see among others, page 155, Ok (2007).

Page 146: Eui math-course-2012-09--19

144 CHAPTER 11. METRIC SPACES

Proposition 510 (Bolzano- Weirstrass) If S ⊆ Rn has infinite cardinality and is bounded, then Sadmits at least an accumulation point, i.e., D (S) 6= ∅.

Proof. Step 1. n = 1.Since S is bounded, ∃a0, b0 ∈ R such that S ⊆ [a0, b0] := B0.Divide B0 in two subinterval of

equal length: ∙a0,

a0 + b02

¸and

∙a0 + b02

, b0

¸Choose an interval which contains an infinite number of points in S. Call B1 = [a1, b1] that

interval. Proceed as above for B1. We therefore obtain a family of intervals

B0 ⊇ B1 ⊇ .... ⊇ Bn ⊇ ...

Observe that lenght B0 := b0 − a0 and

∀n ∈ N, lenght Bn =b0 − a02n

.

Therefore, ∀ε > 0, ∃ N ∈ N such that ∀n > N, lenght Bn < ε.From Proposition 508, it follows that

∃x ∈ ∩+∞n=0Bn

We are now left with showing that x is an accumulation point for S

∀r ∈ R++, B (x, r) contains an infinite number of points.

By construction, ∀n ∈ N, Bn contains an infinite number of points; it is therefore enough toshow that

∀r ∈ R++,∃n ∈ N such that B (x, r) ⊇ Bn.

Observe that

B (x, r) ⊇ Bn ⇔ (x− r, x+ r) ⊇ [an, bn]⇔ x− r < an < bn < x+ r ⇔ max {x− an, bn − x} < r

Moreover, since x ∈ [an, bn],

max {x− an, bn − x} < bn − an = lenght Bn =b0 − a02n

Therefore, it suffices to show that

∀r ∈ R++,∃n ∈ N such thatb0 − a02n

< r

i.e., n ∈ N and n > log2 (b0 − a0).Step 2. Omitted (See Ok (2007)).

Remark 511 The above Proposition does not say that there exists an accumulation point whichbelongs to S. To see that, consider S =

©1n : n ∈ N

ª.

Proposition 512 Let a metric space (X, d) be given and consider the following statements.

1. S is compact set;

2. Every infinite subset of S has an accumulation point which belongs to S, i.e.,

hT ⊆ S ∧#T is infinitei⇒ hD (T ) ∩ S 6= ∅i ,

3. S is sequentially compact

4. S is closed and bounded.

Page 147: Eui math-course-2012-09--19

11.5. COMPACTNESS 145

Then1.⇔ 2.⇔ 3⇒ 4.

If X = Rn, d = d2, then we also have that

3⇐ 4.

Proof. (1)⇒ (2)6

Take an infinite subset T ⊆ S and suppose otherwise. Then, no point in S is an accumulationpoint of T , i.e., ∀x ∈ S ∃rx > 0 such that

B (x, rx) ∩ T\ {x} = ∅.

Then7

B (x, rx) ∩ T ⊆ {x} . (11.19)

SinceS ⊆ ∪x∈SB (x, rx)

and S is compact, ∃x1, ..., xn such that

S ⊆ ∪ni=1B (xi, ri)

Then, since T ⊆ S,

T = S ∩ T ⊆ (∪ni=1B (xi, ri)) ∩ T = ∪ni=1 (B (xi, ri) ∩ T ) ⊆ {x1, ..., xn}

where the last inclusion follows from (11.19). But then #T ≤ n, a contradiction.(2)⇒ (3)Take a sequence (xn)n∈N of elements in S.If # {xn : n ∈ N} is finite, then ∃xn∗ such that xj = xn∗ for j in an infinite subset of N, and

(xn∗ , ..., xn∗ , ...) is the required convergent subsequence - converging to xn∗ ∈ S.If # {xn : n ∈ N} is infinite, then there exists a subsequence (yn)n∈N of (xn)n∈N with an infinite

amount distinct values, i.e., such that ∀n,m ∈ N, n 6= m, we have yn 6= ym. To construct thesubsequence (yn)n∈N, proceed as follows.

y1 = x1 := xk1 ,y2 = xk2 /∈ {xk1},y3 = xk3 /∈ {xk1 , xk2},...yn = xkn /∈

©xk1 , xk2 , ...xkn−1

ª,

...Since T := {yn : n ∈ N} is an infinite subset of S, by assumption it does have an accumulation

point x in S; moreover, we can redefine (yn)n∈N in order to have ∀n ∈ N, yn 6= x 8 , as follows.If ∃k such that yk = x, take the (sub)sequence (yk+1, yk+2, ...) = (yk+n)n∈N. With some abuse of

6Proofs of 1⇒ 2 and 2⇒ 3 are taken from Aliprantis and Burkinshaw (1990), pages 38-39.7 In general,

A\B = C ⇒ A ⊆ C ∪B, (11.18)

as shown below.Since A\B = A ∩BC , by assumption, we have

A ∩BC ∪B = C ∪B

Moreover,

A ∩BC ∪B = (A ∪B) ∩ BC ∪B = A ∪B ⊇ A

Observe that the inclusion in (11.18) can be strict, i.e., it can be

A\B = C ∧A ⊂ C ∪B;

just take A = {1} , B = {2} and C = {1} :

A\B = {1} = C ∧A = {1} ⊂ C ∪B = {1, 2} .;

8Below we need to have d (yn, x) > 0.

Page 148: Eui math-course-2012-09--19

146 CHAPTER 11. METRIC SPACES

notation, call still (yn)n∈N the sequence so obtained. Now take a further subsequence as follows,using the fact that x is an accumulation point of {yn : n ∈ N} := T ,

ym1∈ T such that d (ym1, x) < 1

1 ,

ym2 ∈ T such that d (ym2, x) < minn12 , (d (ym, x))m≤m1

o,

ym3∈ T such that d (ym3, x) < min

n13 , (d (ym, x))m≤m2

o,

...ymn ∈ T such that d (ymn , x) < min

n1n , (d (ym, x))m≤mn−1

o,

Observe that since ∀n, d (ymn , x) < minn(d (ym, x))m≤mn−1

o, we have that ∀n, mn > mn−1

and therefore (ymn)n∈N is a subsequence of (yn)n∈N and therefore of (xn)n∈N. Finally, since

limn→+∞

d (ymn, x) < lim

n→+∞

1

n= 0

we also have thatlim

n→+∞ymn = x

as desired.(3)⇒ (1)It is the content of Proposition 521 below.(1)⇒ (4)It is the content of Remark 505.If X = Rn, (4)⇒ (2)Take an infinite subset T ⊆ S. Since S is bounded T is bounded as well. Then from Bolzano-

Weirestrass theorem, i.e., Proposition 510, D (T ) 6= ∅. Since T ⊆ S, from Proposition 547, D (T ) ⊆D (S) and since S is closed, D (S) ⊆ S. Then, summarizing ∅ 6= D (T ) ⊆ S and thereforeD (T ) ∩ S = D (T ) 6= ∅, as desired.To complete the proof of the above Theorem it suffices to show sequential compactness implies

compactness, which is done below, and it requires some preliminary results.

Definition 513 Let (X,d) be a metric space and S a subset of X. S is totally bounded if ∀ε > 0,∃a finite set T ⊆ S such that S ⊆ ∪x∈TB (x, ε).

Proposition 514 Let (X, d) be a metric space and S a subset of X.

S totally bounded ⇒ S bounded.

Proof. It follows from the definition of totally bounded sets and from Proposition 502.

Remark 515 In the previous Proposition, the opposite implication does not hold true.

Example 516 Take (X, d) where X is an infinite set and d is the discrete metric. Then, if ε = 12 ,

a ball is needed to “take care of each element in X” .ably Similar situation arises in the followingprobably more interesting example.

Example 517 Consider the metric space¡l2, d2

¢- see Proposition 425. Recall that

l2 =

((xn)n∈N ∈ R∞ :

+∞Xn=1

|xn|2 < +∞)

and

d2 : l2 × l2 → R+,

¡(xn)n∈N , (yn)n∈N

¢7→Ã+∞Xn=1

|xn − yn|2! 1

2

.

Define em = (em,n)n∈N such that

em,n =

⎧⎨⎩ 1 if n = m

0 if n 6= m,

Page 149: Eui math-course-2012-09--19

11.5. COMPACTNESS 147

and S = {em : m ∈ N}. In other words, S =

{(1, 0, 0, ..., 0, ...) , (0, 1, 0, ..., 0, ...) , (0, 0, 1, ..., 0, ...) , ...} .

Observe that ∀m ∈ N,P+∞

n=1 |em,n|2 = 1 and therefore S ⊆ l2. We now want to check that S isbounded, but not totally bounded. The main ingredient of the argument below is that

∀m, p ∈ N such that m 6= p, d (em, ep) =√2. (11.20)

1. S is bounded. For any m ∈ N, d (e1, em) =³P+∞

n=1 |e1,n, em,n|2´ 12

= 212 .

2. S is not totally bounded. We want to show that ∃ε > 0 such that for any finite subset Tof S there exists x ∈ S such that x /∈ ∪x∈TB (x, ε). Take ε = 1and let T = {ek : k ∈ N} with Narbitrary finite subset of N. Then, for k0 ∈ N\N , ek0 ∈ S and from (11.20) , for any k0 ∈ N\N andk ∈ N , d (ek, ek0) =

√2 > 1. Therefore, for k0 ∈ N\N , ek0 /∈ ∪k∈NB (ek, 1) .

Remark 518 In (Rn, d2), S bounded ⇒ S totally bounded.

Lemma 519 Let (X, d) be a metric space and S a subset of X.

S sequentially compact ⇒ S totally bounded.

Proof. Suppose otherwise, i.e., ∃ε > 0 such that for any finite set T ⊆ S, S " ∪x∈TB (x, ε).We are now going to construct a sequence in S which does not admit any convergent subsequence,contradicting sequential compactness.Take an arbitrary

x1 ∈ S.

Then, by assumption S " B (x1, ε). Then take x2 ∈ S\B (x1, ε), i.e.,

x2 ∈ S and d (x1, x2) > ε.

By assumption, S " B (x1, ε) ∪B (x2, ε). Then, take x3 ∈ S\ (B (x1, ε) ∪B (x2, ε)), i.e.,

x3 ∈ S and for i ∈ {1, 2} , d (x3, xi) > ε.

By the axiom of choice, we get that

∀n ∈ N, xn ∈ S and for i ∈ {1, ..., n− 1} , d (xn, xi) > ε.

Therefore, we have constructed a sequence (xn)n∈N ∈ S∞ such that

∀i, j ∈ N, if i 6= j, then d (xi, xj) > ε. (11.21)

But, then it is easy to check that (xn)n∈N does not have any convergent subsequence in S, asverified below. Suppose otherwise, then (xn)n∈N would admit a subsequence (xm)m∈N ∈ S∞ suchthat xm → x ∈ S. But, by definition of convergence, ∃N ∈ N such that ∀m > N, d (xm, x) <

ε2 ,

and therefored (xm, xm+1) ≤ d (xm, x) + d (xm+1, x) < ε,

contradicting (11.21).

Lemma 520 Let (X, d) be a metric space and S a subset of X.¿S sequentially compactS is an open cover of S

À⇒

­∃ε > 0 such that ∀x ∈ S, ∃Ox ∈ S such that B (x, ε) ⊆ Ox

®.

Proof. Suppose otherwise; then

∀n ∈ N+, ∃xn ∈ S such that ∀O ∈ S, B

µxn,

1

n

¶* O. (11.22)

Page 150: Eui math-course-2012-09--19

148 CHAPTER 11. METRIC SPACES

By sequential compactness, the sequence (xn)n∈N ∈ S∞ admits a subsequence, without loss ofgenerality the sequence itself, (xn)n∈N ∈ S∞ such that xn → x ∈ S. Since S is an open cover of S,∃ O ∈ S such that x ∈ O and, since O is open, ∃ε > 0 such that

B (x, ε) ⊆ O. (11.23)

Since xn → x, ∃M ∈ N such that{xM+i, i ∈ N} ⊆ B¡x, ε2

¢. Now, take n > max

©M, 2ε

ª. Then,

B

µxn,

1

n

¶⊆ B (x, ε) . (11.24)

i.e., d (y, xn) < 1n ⇒ d (y, x) < ε, as shown below.

d (y, x) ≤ d (y, xn) + d (xn, x) <1

n+

ε

2< ε.

From (11.23) and (11.24), we get B¡xn,

1n

¢⊆ O ∈ S, contradicting (11.22).

Proposition 521 Let (X, d) be a metric space and S a subset of X.

S sequentially compact ⇒ S compact.

Proof. Take an open cover S of S. Since S is sequentially compact, from Lemma 520,

∃ε > 0 such that ∀x ∈ S ∃Ox ∈ S such that B (x, ε) ⊆ Ox.

Moreover, from Lemma 519 and the definition of total boundedness, there exists a finite setT ⊆ S such that S ⊆ ∪x∈TB (x, ε) ⊆ ∪x∈TOx. But then {Ox : x ∈ T} is the required subcover ofS which covers S.We conclude our discussion on compactness with some results we hope will clarify the concept

of compactness in Rn.

Proposition 522 Let X be a proper subset of Rn, and C a subset of X.

C is bounded and (Rn, d2) closed(m 1)

C is (Rn, d2) compact(m 2)

C is (X, d2) compact⇓ (not ⇑) 3

C is bounded and (X, d2) closed

Proof. [1 m]It is the content of (Propositions 503, 504 and last part of) Proposition 512.To show the other result, observe preliminarily that

(X ∩ Sα) ∪ (X ∩ Sβ) = X ∩ (Sα ∪ Sβ)

[2 ⇓]Take T := {Tα}α∈A such that ∀α ∈ A, Tα is (X, d) open and C ⊆ ∪α∈ATα. From Proposition

475,∀α ∈ A, ∃ Sα such that Sα is (Rn, d2) open and Tα = X ∩ Sα.

ThenC ⊆ ∪α∈ATα ⊆ ∪α∈A (X ∩ Sα) = X ∩ (∪α∈ASα) .

We then have thatC ⊆ ∪α∈ASα,

i.e., S := {Sα}α∈A is a (Rn, d2) open cover of C and since C is (Rn, d2) compact, then there existsa finite subcover {Si}i∈N of S such that

C ⊆ ∪i∈NSi.

Page 151: Eui math-course-2012-09--19

11.6. COMPLETENESS 149

Since C ⊆ X , we then have

C ⊆ (∪i∈NSi) ∩X = ∪i∈N (Si ∩X) = ∪i∈NTi,

i.e., {Ti}i∈N is a (X, d) open subcover of {Tα}α∈A which covers C, as required.[2 ⇑]Take S := {Sα}α∈A such that ∀α ∈ A, Sα is (Rn, d2) open and C ⊆ ∪α∈ASα. From Proposition

475,∀α ∈ A, Tα := X ∩ Sα is (X, d) open.

Since C ⊆ X , we then have

C ⊆ (∪α∈ASα) ∩X = ∪α∈A (Sα ∩X) = ∪α∈ATα.

Then, by assumption, there exists {Ti}i∈N is an open subcover of {Tα}α∈A which covers C, andtherefore there exists a set N with finite cardinality such that

C ⊆ ∪i∈NTi = ∪i∈N (Si ∩X) = (∪i∈NSi) ∩X ⊆ (∪i∈NSi) ,

i.e., {Si}i∈N is a (Rn, d2) open subcover of {Sα}α∈A which covers C, as required.[3 ⇓]It is the content of Propositions 503, 504.[3 not ⇑]See Remark 505.1.

Remark 523 The proof of part [2 m] above can be used to show the following result.Given a metric space (X, d), a metric subspace (Y, d) a set C ⊆ Y , then

C is (Y, d) compactm

C is (X, d) compact

In other words, (X 0, d) compactness of C ⊆ X 0 ⊆ X is an intrinsic property of C: it does notdepend by the subspace X 0you are considering. On the other hand, as we have seen, closedness andopenness are not an intrinsic property of the set.

Remark 524 Observe also that to define “anyway” compact sets as closed and bounded sets wouldnot be a good choice. The conclusion of the extreme value theorem (see Theorem 580) would nothold in that case. That theorem basically says that a continuous real valued function on a compactset admits a global maximum. It is not the case that a continuous real valued function on a closedand bounded set admits a global maximum: consider the continuous function

f : (0, 1]→ R, f (x) =1

x.

The set (0, 1] is bounded and closed (in ((0,+∞) , d2) and f has no maximum on (0, 1].

11.6 Completeness

11.6.1 Cauchy sequences

Definition 525 Let (X, d) be a metric space. A sequence (xn)n∈N ∈ X∞ is a Cauchy sequence if

∀ε > 0, ∃N ∈ N such that ∀l,m > N, d (xl, xm) < ε.

Proposition 526 Let a metric space (X, d) and a sequence (xn)n∈N ∈ X∞ be given.

1. (xn)n∈N is convergent ⇒ (xn)n∈N is Cauchy, but not vice-versa;

2. (xn)n∈N is Cauchy ⇒ (xn)n∈N is bounded;

3. (xn)n∈N is Cauchy and it has a subsequence converging in X ⇒ (xn)n∈N is convergent.

Page 152: Eui math-course-2012-09--19

150 CHAPTER 11. METRIC SPACES

Proof. 1.[⇒] Since (xn)n∈N is convergent, by definition, ∃x ∈ X such that xn → x, ∃N ∈ N such that

∀l,m > N , d (x, xl) < ε2 and d (x, xm) <

ε2 . But then d (xl, xm) ≤ d (xl, x) + d (x, xm) <

ε2 +

ε2 = ε.

[:]Take X = (0, 1) , d = absolute value, (xn)n∈N\{0} ∈ (0, 1)

∞ such that ∀n ∈ N\ {0} , xn = 1n .

(xn)n∈N\{0} is Cauchy:

∀ε > 0, d

µ1

l,1

m

¶=

¯1

l− 1

m

¯<

¯1

l

¯+

¯1

m

¯=1

l+1

m<

ε

2+

ε

2= ε,

where the last inequality is true if 1l < ε2 and

1m < ε

2 , i.e., if l >2ε and m > 2

ε . Then, it isenough to take N > 2

ε and N ∈ N, to get the desired result.(xn)n∈N\{0} is not convergent to any point in (0, 1):take any x ∈ (0, 1). We want to show that

∃ε > 0 such that ∀N ∈ N ∃n > N such that d (xn, x) > ε.

Take ε = x2 > 0 and ∀N ∈ N , take n∗ ∈ N such that 1

n∗ < min©1N , x2

ª. Then, n∗ > N , and¯

1

n∗− x

¯= x− 1

n∗> x− x

2=

x

2= ε.

2.Take ε = 1. Then ∃N ∈ N such that ∀l,m > N, d (xl, xm) < 1. If N = 1, we are done. If

N ≥ 1, definer = max {1, d (x1, xN ) , ..., d (xN−1, xN )} .

Then{xn : n ∈ N} ⊆ B (xN , r) .

3.Let (xnk)k∈N be a convergent subsequence to x ∈ X. Then,

d (xn, x) ≤ d (xn, xnk) + d (xnk , x) .

Since d (xn, xnk)→ 0, because the sequence is Cauchy, and d (xnkn, x)→ 0, because the subse-quence is convergent, the desired result follows.

11.6.2 Complete metric spaces

Definition 527 A metric space (X,d) is complete if every Cauchy sequence is a convergent se-quence.

Remark 528 If a metric space is complete, to show convergence you do not need to guess the limitof the sequence: it is enough to show that the sequence is Cauchy.

Example 529 ((0, 1) , absolute value) is not a complete metric space; it is enough to consider¡1n

¢n∈N\{0}.

Example 530 Let (X, d) be a discrete metric space. Then, it is complete. Take a Cauchy sequence(xn)n∈N ∈ X∞. Then, we claim that ∃N ∈ N and x ∈ X such that ∀n > N, xn = x. Supposeotherwise:

∀N ∈ N,∃m,m0 > N such that xm 6= xm0 ,

but then d (xm, xm0) = 1, contradicting the fact that the sequence is Cauchy.

Example 531 (Q, d2) is not a complete metric space. Since R\Q is dense in R, ∀x ∈ R\Q, wecan find (xn)n∈N ∈ Q∞ such that xn → x.

Proposition 532¡Rk, d2

¢is complete.

Page 153: Eui math-course-2012-09--19

11.6. COMPLETENESS 151

Proof. 1. (R, d2) is complete.Take a Cauchy sequence (xn)n∈N ∈ R∞. Then, from Proposition 526.2, it is bounded. Then from

Bolzanol-Weierstrass Theorem (i.e., Proposition 486.4), (xn)n∈N does have a convergent subsequence- i.e., ∃ (yn)n∈N ∈ R∞ which is a subsequence of (xn)n∈N and such that yn → a ∈ R. Then fromProposition 526.3.2. For any k ≥ 2,

¡Rk, d2

¢is complete.

Take a Cauchy sequence (xn)n∈N ∈¡Rk¢∞

For i ∈ {1, ..., k}, consider¡xin¢n∈N ∈ R

∞. Then, forany n,m ∈ N, ¯

xin − xim¯< kxn − xmk .

Then, ∀i ∈ {1, ..., k},¡xin¢n∈N ∈ R

∞ is Cauchy and therefore from 1. above, ∀i ∈ {1, ..., k},¡xin¢n∈N is convergent. Finally, from Proposition 489, the desired result follows.

Example 533 For any nonempty set T , (B (T ) , d∞) is a complete metric space.Let (fn)n∈N ∈ (B (T ))

∞ be a Cauchy sequence. For any x ∈ T , (fn (x))n∈N ∈ R∞ is a Cauchysequence, and since R is complete, it has a convergent subsequence, without loss of generality,(fn (x))n∈N itself converging say to fx ∈ R. Define

f : T → R, : x 7→ fx.

We are going to show that (i). f ∈ B (T ), and (ii) fn → f .(i). Since (fn)n is Cauchy,

∀ε > 0, ∃N ∈ N such that ∀l,m > N, d∞ (fl, fm) := supx∈T

|fl (x)− fm (x)| < ε.

Then,∀x ∈ T, |fl (x)− fm (x)| ≤ sup

x∈T|fl (x)− fm (x)| = d∞ (fl, fm) < ε. (11.25)

Taking limits of both sides of (11.25) for l→ +∞, and using the continuity of the absolute valuefunction, we have that

∀x ∈ T, liml→+∞

|fl (x)− fm (x)| = |f (x)− fm (x)| < ε. (11.26)

Since9

∀x ∈ T, | |f (x)|− |fm (x)| | ≤ | f (x)− fm (x) | < ε,

and therefore,∀x ∈ T, |f (x)| ≤ fm (x) + ε.

Since fl ∈ B (T ), f ∈ B (T ) as well.(ii) From (11.26), we also have that

∀x ∈ T, |f (x)− fm (x)| < ε,

and by definition of sup

d∞ (fm, f) := supx∈T

|fm (x)− f (x)| < ε,

i.e., d∞ (fm, f)→ 0.

For future use, we also show the following result.

Proposition 534

BC (X) := {f : X → R : f is bounded and continuous}

endowed with the metric d (f, g) = supx∈X |f (x)− g (x)| is a complete metric space.

Proof. See Stokey and Lucas (1989), page 47.9See, for example, page 37 in Ok (2007).

Page 154: Eui math-course-2012-09--19

152 CHAPTER 11. METRIC SPACES

11.6.3 Completeness and closedness

Proposition 535 Let a metric space (X, d) and a metric subspace (Y, d) of (X, d) be given.

1. Y complete ⇒ Y closed;

2. Y complete ⇐ Y closed and X complete.

Proof. 1.Take (xn)n∈N ∈ Y∞ such that xn → x. From Proposition 493, it is enough to show that x ∈ Y .

Since (xn)n∈N is convergent in X, then it is Cauchy. Since Y is complete, by definition, xn → x ∈ Y .2.Take a Cauchy sequence (xn)n∈N ∈ Y∞. We want to show that xn → x ∈ Y . Since Y ⊆ X,

(xn)n∈N is Cauchy in X, and since X is complete, xn → x ∈ X. But since Y is closed, x ∈ Y.

Remark 536 An example of a metric subspace (Y, d) of (X, d) which is closed and not complete isthe following one. (X, d) = (R++, d2), (Y, d) = ((0, 1] , d2) and (xn)n∈N\{0} =

¡1n

¢n∈N\{0}.

Corollary 537 Let a complete metric space (X,d) and a metric subspace (Y, d) of (X, d) be given.Then,

Y complete ⇔ Y closed.

11.7 Fixed point theorem: contractionsDefinition 538 Let (X, d) be a metric space. A function ϕ : X → X is said to be a contraction if

∃k ∈ (0, 1) such that ∀x, y ∈ X, d (φ (x) , φ (y)) ≤ k · d (x, y) .

The inf of the set of k satisfying the above condition is called contraction coefficient of φ.

Example 539 1. Given (R, d2),

fα : R→ R, x 7→ αx

is a contraction iff |α| < 1; in that case |α| is the contraction coefficient of fα.2. Let S be a nonempty open subset of R and f : S → S a differentiable function. If

supx∈S

|f 0 (x)| < 1,

then f is a contraction

Definition 540 For any f, g ∈ X ⊆ B (T ), we say that f ≤ g if ∀x ∈ T, f (x) ≤ g (x).

Proposition 541 (Blackwell) Let the following objects be given:1. a nonempty set T ;2. X is a nonempty subset of the set B (T ) such that ∀f ∈ X, ∀α ∈ R+, f + α ∈ X;3. φ : X → X is increasing, i.e., f ≤ g ⇒ φ (f) ≤ φ (g);4. ∃δ ∈ (0, 1) such that ∀f ∈ X,∀α ∈ R+, φ (f + α) ≤ φ (f) + δα.Then φ is a contraction with contraction coefficient δ.

Proof. ∀f, g ∈ X, ∀x ∈ T

f (x)− g (x) ≤ |f (x)− g (x)| ≤ supx∈T

|f (x)− g (x)| = d∞ (f, g) .

Therefore, f ≤ g + d∞ (f, g), and from Assumption 3,

φ (f) ≤ φ (g + d∞ (f, g)) .

Then, from Assumption 4,

∃δ ∈ (0, 1) such that φ (g + d∞ (f, g)) ≤ φ (g) + δd∞ (f, g) ,

Page 155: Eui math-course-2012-09--19

11.7. FIXED POINT THEOREM: CONTRACTIONS 153

and thereforeφ (f) ≤ φ (g) + δd∞ (f, g) . (11.27)

Since the argument above is symmetric with respect to f and g, we also have

φ (g) ≤ φ (f) + δd∞ (f, g) . (11.28)

From (11.27) and (11.28) and the definition of absolute value, we have

|φ (f)− φ (g)| ≤ δd∞ (f, g) ,

as desired.

Proposition 542 (Banach fixed point theorem) Let (X, d) be a complete metric space. If φ : X →X is a contraction with coefficient k, then

∃! x∗ ∈ X such that x∗ = φ (x∗) . (11.29)

and∀x0 ∈ X and ∀n ∈ N\ {0} , d (φn (x0) , x

∗) ≤ kn · d (x0, x∗) , (11.30)

where φn :=1

(φ ◦2

φ ◦ .... ◦n

φ).Proof. (11.29) holds true.Take any x0 ∈ X and define the sequence

(xn)n∈N ∈ X∞, with ∀n ∈ N, xn+1 = φ (xn) .

We want to show that 1. that (xn)n∈N is Cauchy, 2. its limit is a fixed point for φ, and 3. thatfixed point is unique.1. First of all observe that

∀n ∈ N\ {0} , d (xn+1, xn) ≤ knd (x1, x0) , (11.31)

where k is the contraction coefficient of φ, as shown by induction below.Step 1: P (1) is true:

d (x2, x1) = d (φ (x1) , φ (x0)) ≤ kd (x1, x0)

from the definition of the chosen sequence and the assumption that φ is a contraction.Step 2. P (n− 1)⇒ P (n) :

d (xn+1, xn) = d (φ (xn) , φ (xn−1)) ≤ kd (xn, xn−1) ≤ knd (x1, x0)

from the definition of the chosen sequence, the assumption that φ is a contraction and the assumptionof the induction step.Now, for any m, l ∈ N with m > l,

d (xm, xl) ≤ d (xm, xm−1) + d (xm−1, xm−2) + ...+ d (xl+1, xl) ≤

≤¡km−1 + km−2 + ...+ kl

¢d (x1, x0) ≤ kl 1−k

m−l

1−k d (x1, x0) ,

where the first inequality follows from the triangle inequality, the third one from the followingcomputation10 :

km−1 + km−2 + ...+ kl = kl¡1 + k + ...+ km−l+1

¢= kl

1− km−l

1− k.

10We are also using the basic facat used to study geometrical series. Define

sn ≡ 1 + a+ a2 + ...+ an;

:Multiply both sides of the above equality by(1− a):

(1− a) sn ≡ (1− a) 1 + a+ a2 + ...+ an

(1− a) sn ≡ 1 + a+ a2 + ...+ an − a+ a2 + ...+ an+1 = 1− an+1

Divide both sides by (1− a):

sn ≡ 1 + a+ a2 + ...+ an − a+ a2 + ...+ an+1 =1− an+1

1− a=

1

1− a− an+1

1− a

Page 156: Eui math-course-2012-09--19

154 CHAPTER 11. METRIC SPACES

Finally, since k ∈ (0, 1) ,we get

d (xm, xl) ≤kl

1− kd (x1, x0) . (11.32)

If x1 = x0, then for any m, l ∈ N with m > l, d (xm, xl) = 0 and ∀n ∈ N, xn = x0 andthe sequence is converging and therefore it is Cauchy. Therefore, consider the case x1 6= x0.From(11.32) it follows that (xn)n∈N ∈ X∞ is Cauchy: ∀ε > 0 choose N ∈ N such that kN1−kd (x1, x0) < ε,

i.e., kN < ε(1−k)d(x1,x0)

and N >log ε(1−k)

d(x1,x0)

logκ .

2. Since (X, d) is a complete metric space, (xn)n∈N ∈ X∞ does converge say to x∗ ∈ X, and,in fact, we want to show that φ (x∗) = x∗. Then, ∀ε > 0,∃N ∈ N such that ∀n > N,

d (φ (x∗) , x∗) ≤ d (φ (x∗) , xn+1) + d¡xn+1, x∗

¢≤

≤ d (φ (x∗) , φ (xn)) + d¡xn+1, x∗

¢≤ kd (x∗, xn) + d

¡xn+1, x∗

¢≤

≤ ε2 +

ε2 = ε,

where the first equality comes from the triangle inequality, the second one from the constructionof the sequence (xn)n∈N ∈ X∞, the third one from the assumption that φ is a contraction and thelast one from the fact that (xn)n∈N converges to x∗. Since ε is arbitrary, d (φ (x∗) , x∗) = 0, asdesired.3. Suppose that bx is another fixed point for φ - beside x∗. Then,

d (bx, x∗) = d (φ (bx) , φ (x∗)) ≤ kd (bx, x∗)and assuming bx 6= x∗ would imply 1 ≤ k, a contradiction of the fact that φ is a contraction with

contraction coefficient k.(11.30) hods true.We show the claim by induction on n ∈ N\ {0}.P (1) is true.

d (φ (x0) , x∗) = d (φ (x0) , φ (x

∗)) ≤ k · d (x0, x∗) ,

where the equality follows from the fact that x∗ is a fixed point for φ, and the inequality by thefact that φ is a contraction.P (n− 1) is true implies that P (n) is true.

d (φn (x0) , x∗) = d (φn (x0) , φ (x

∗)) = d¡φ¡φn−1 (x0)

¢, φ (x∗)

¢≤

≤ k · d¡φn−1 (x0) , x

∗¢ ≤ k · kn−1 · d (x0, x∗) = kn · d (x0, x∗) .

11.8 Appendix. Some characterizations of open and closedsets

Remark 543 From basic set theory, we have AC ∩B = ∅⇔ B ⊆ A, as verified below.

¬­∃x : x ∈ AC ∧ x ∈ B

®= h∀x : x ∈ A ∨ ¬ (x ∈ B)i =

= h∀x : ¬ (x ∈ B) ∨ x ∈ Ai (∗)= h∀x : x ∈ B ⇒ x ∈ Ai ,

where (∗) follows from the fact that hp⇒ qi = h(¬p) ∨ qi.

Proposition 544 S is open ⇔ S ∩F (S) = ∅.

Page 157: Eui math-course-2012-09--19

11.8. APPENDIX. SOME CHARACTERIZATIONS OF OPEN AND CLOSED SETS 155

Proof. [⇒]Suppose otherwise, i.e., ∃x ∈ S ∩F (S). Since x ∈ F (S), ∀r ∈ R++, B (x, r) ∩ SC 6= ∅. Then,

from Remark 543, ∀r ∈ R++, it is false that B (x, r) ⊆ S, contradicting the assumption that S isopen.[⇐]Suppose otherwise, i.e., ∃x ∈ S such that

∀r ∈ R++, B (x, r) ∩ SC 6= ∅ (11.33)

Moreoverx ∈ B (x, r) ∩ S 6= ∅ (11.34)

But (11.33) and (11.34) imply x ∈ F (S). Since x ∈ S, we would have S ∩F (S) 6= ∅, contra-dicting the assumption.

Proposition 545 S is closed ⇔ F (S) ⊆ S.

Proof.

S closed ⇔ SC open(1)⇔ SC ∩F

¡SC¢= ∅ (2)⇔ SC ∩F (S) = ∅ (3)⇔ F (S) ⊆ S

where(1) follows from Proposition 544;(2) follows from Remark 465(3) follows Remark 543.

Proposition 546 S is closed ⇔ D (S) ⊆ S.

Proof. We are going to use Proposition 545, i.e., S is closed ⇔ F (S) ⊆ S.[⇒]Suppose otherwise, i.e.,

∃x /∈ S such that ∀r ∈ R++, (S\ {x}) ∩B (x, r) 6= ∅

and since x /∈ S, it is also true that

∀r ∈ R++, S ∩B (x, r) 6= ∅ (11.35)

and∀r ∈ R++, SC ∩B (x, r) 6= ∅ (11.36)

From (11.35) and (11.36), it follows that x ∈ F (S), while x /∈ S, which from Proposition 545contradicts the assumption that S is closed.[⇐]Suppose otherwise, i.e., using Proposition 545,

∃x ∈ F (S) such that x /∈ S

Then, by definition of F (S),

∀r ∈ R++, B (x, r) ∩ S 6= ∅.

Since x /∈ S, we also have

∀r ∈ R++, B (x, r) ∩ (S\ {x}) 6= ∅,

i.e., x ∈ D (S) and x /∈ S, a contradiction.

Proposition 547 ∀S, T ⊆ X, S ⊆ T ⇒ D (S) ⊆ D (T ).


Proof. Take x ∈ D (S). Then

∀r ∈ R++, (S\ {x}) ∩B (x, r) 6= ∅. (11.37)

Since S ⊆ T , we also have

(T\ {x}) ∩B (x, r) ⊇ (S\ {x}) ∩B (x, r) . (11.38)

From (11.37) and (11.38), we get x ∈ D (T ).

Proposition 548 S ∪D (S) is a closed set.

Proof. Take x ∈ (S ∪ D(S))^C, i.e., x ∉ S and x ∉ D(S). We want to show that

∃r ∈ R++ such that B(x, r) ∩ (S ∪ D(S)) = ∅,

i.e.,

∃r ∈ R++ such that (B(x, r) ∩ S) ∪ (B(x, r) ∩ D(S)) = ∅.

Since x ∉ D(S), ∃r ∈ R++ such that B(x, r) ∩ (S\{x}) = ∅. Since x ∉ S, we also have that

∃r ∈ R++ such that S ∩ B(x, r) = ∅.  (11.39)

We are then left with showing that B(x, r) ∩ D(S) = ∅. If y ∈ B(x, r), then from (11.39), y ∉ S and B(x, r) ∩ (S\{y}) = ∅; since B(x, r) is open, it contains a ball centered at y, and that ball misses S\{y}, so y ∉ D(S). Hence B(x, r) ∩ D(S) = ∅, as desired.

Proposition 549 Cl (S) = S ∪D (S).

Proof. [⊇] Since

S ⊆ Cl(S),  (11.40)

from Proposition 547,

D(S) ⊆ D(Cl(S)).  (11.41)

Since Cl(S) is closed, from Proposition 546,

D(Cl(S)) ⊆ Cl(S).  (11.42)

From (11.40), (11.41) and (11.42), we get

S ∪ D(S) ⊆ Cl(S).

[⊆] Since, from Proposition 548, S ∪ D(S) is closed and contains S, by definition of Cl(S),

Cl(S) ⊆ S ∪ D(S).

To proceed in our analysis, we need the following result.

Lemma 550 For any metric space (X, d) and any S ⊆ X, we have that

1. X = Int S ∪ F(S) ∪ Int S^C, and

2. (Int S ∪ F(S))^C = Int S^C.

Proof. If either S = ∅ or S = X, the results are trivial. Otherwise, observe that either x ∈ S or x ∈ X\S.

1. If x ∈ S, then either ∃r ∈ R++ such that B(x, r) ⊆ S, and then x ∈ Int S, or ∀r ∈ R++, B(x, r) ∩ S^C ≠ ∅, and then x ∈ F(S). Similarly, if x ∈ X\S, then either ∃r' ∈ R++ such that B(x, r') ⊆ X\S, and then x ∈ Int(X\S), or ∀r' ∈ R++, B(x, r') ∩ S ≠ ∅, and then x ∈ F(S).

2. By definition of Interior and Boundary of a set, (Int S ∪ F(S)) ∩ Int S^C = ∅. Now, for arbitrary sets A, B ⊆ X such that A ∪ B = X and A ∩ B = ∅, we have what follows: A ∪ B = X ⇔ (A ∪ B)^C = X^C ⇔ A^C ∩ B^C = ∅, and from Remark 543, B^C ⊆ A; A ∩ B = ∅ ⇔ A ∩ (B^C)^C = ∅ ⇒ A ⊆ B^C. Therefore we can conclude the desired result.


Proposition 551 Cl (S) = Int S ∪ F (S).

Proof. From Lemma 550, it is enough to show that

(Cl (S))C = Int SC .

[⊇] Take x ∈ Int S^C. Then ∃r ∈ R++ such that B(x, r) ⊆ S^C, and therefore B(x, r) ∩ S = ∅ and, since x ∉ S,

B(x, r) ∩ (S\{x}) = ∅.

Then x ∉ S and x ∉ D(S), i.e.,

x ∉ S ∪ D(S) = Cl(S),

where the last equality follows from Proposition 549. In other words, x ∈ (Cl(S))^C.

[⊆] Take x ∈ (Cl(S))^C = (D(S) ∪ S)^C. Since x ∉ D(S),

∃r ∈ R++ such that (S\{x}) ∩ B(x, r) = ∅.  (11.43)

Since x ∉ S,

∃r ∈ R++ such that S ∩ B(x, r) = ∅,  (11.44)

i.e.,

∃r ∈ R++ such that B(x, r) ⊆ S^C,  (11.45)

and x ∈ Int S^C.

Definition 552 x ∈ X is an adherent point for S if ∀r ∈ R++, B(x, r) ∩ S ≠ ∅, and

Ad(S) := {x ∈ X : ∀r ∈ R++, B(x, r) ∩ S ≠ ∅}.

Corollary 553 1. Cl(S) = Ad(S).

2. A set S is closed ⇔ Ad(S) = S.

Proof. 1. [⊆] x ∈ Cl(S) ⇒ ⟨x ∈ Int S or x ∈ F(S)⟩, and in both cases the desired conclusion is insured. [⊇] If x ∈ S, then, by definition of closure, x ∈ Cl(S). If x ∉ S, then S = S\{x} and, from the assumption, ∀r ∈ R++, B(x, r) ∩ (S\{x}) ≠ ∅, i.e., x ∈ D(S), which is contained in Cl(S) from Proposition 549.

2. It follows from 1. above and Proposition 468.2.

Proposition 554 x ∈ Cl (S)⇔ ∃ (xn)n∈N in S converging to x.

Proof. [⇒] From Corollary 553, if x ∈ Cl(S), then ∀n ∈ N we can take x_n ∈ B(x, 1/n) ∩ S. Then d(x, x_n) < 1/n and lim_{n→+∞} d(x, x_n) = 0.

[⇐] By definition of convergence,

∀ε > 0, ∃n_ε ∈ N such that ∀n > n_ε, d(x_n, x) < ε, i.e., x_n ∈ B(x, ε),

so that

∀ε > 0, B(x, ε) ∩ S ⊇ {x_n : n > n_ε},

and

∀ε > 0, B(x, ε) ∩ S ≠ ∅,

i.e., x ∈ Ad(S), and from Corollary 553.1 the desired result follows.


Proposition 555 S is closed ⇔ any convergent sequence (x_n)_{n∈N} with elements in S converges to an element of S.

Proof. We are going to show that S is closed using Proposition 546, i.e., S is closed ⇔ D(S) ⊆ S. We want to show that

⟨D(S) ⊆ S⟩ ⇔ ⟨⟨(x_n)_{n∈N} is such that 1. ∀n ∈ N, x_n ∈ S, and 2. x_n → x_0⟩ ⇒ x_0 ∈ S⟩.

[⇒] Suppose otherwise, i.e., there exists (x_n)_{n∈N} such that 1. ∀n ∈ N, x_n ∈ S, and 2. x_n → x_0, but x_0 ∉ S. By definition of convergent sequence, we have

∀ε > 0, ∃n_0 ∈ N such that ∀n > n_0, d(x_n, x_0) < ε,

and, since ∀n ∈ N, x_n ∈ S and x_0 ∉ S,

{x_n : n > n_0} ⊆ B(x_0, ε) ∩ (S\{x_0}).

Then

∀ε > 0, B(x_0, ε) ∩ (S\{x_0}) ≠ ∅,

and therefore x_0 ∈ D(S) while x_0 ∉ S, contradicting the fact that S is closed.

[⇐] Suppose otherwise, i.e., ∃x_0 ∈ D(S) with x_0 ∉ S. We are going to construct a convergent sequence (x_n)_{n∈N} with elements in S which converges to x_0 (a point not belonging to S). From the definition of accumulation point,

∀n ∈ N, (S\{x_0}) ∩ B(x_0, 1/n) ≠ ∅.

Then we can take x_n ∈ (S\{x_0}) ∩ B(x_0, 1/n), and since d(x_n, x_0) < 1/n, we have that d(x_n, x_0) → 0.

Summarizing, the following statements are equivalent:

1. S is open (i.e., S ⊆ Int S),

2. SC is closed,

3. S ∩F (S) = ∅,

and the following statements are equivalent:

1. S is closed,

2. SC is open,

3. F (S) ⊆ S,

4. S = Cl(S),

5. D (S) ⊆ S,

6. Ad (S) = S,

7. any convergent sequence (xn)n∈N with elements in S converges to an element of S.


Chapter 12

Functions

12.1 Limits of functions

In what follows we take as given metric spaces (X, d) and (X', d') and sets S ⊆ X and T ⊆ X'.

Definition 556 Given x_0 ∈ D(S), i.e., given an accumulation point x_0 for S, and f : S → T, we write

lim_{x→x_0} f(x) = l

if

∀ε > 0, ∃δ > 0 such that x ∈ (B_{(X,d)}(x_0, δ) ∩ S)\{x_0} ⇒ f(x) ∈ B_{(X',d')}(l, ε),

or

∀ε > 0, ∃δ > 0 such that ⟨x ∈ S ∧ 0 < d(x, x_0) < δ⟩ ⇒ d'(f(x), l) < ε.

Proposition 557 Given x_0 ∈ D(S) and f : S → T,

⟨lim_{x→x_0} f(x) = l⟩ ⇔ ⟨for any sequence (x_n)_{n∈N} in S such that ∀n ∈ N, x_n ≠ x_0 and lim_{n→+∞} x_n = x_0, we have lim_{n→+∞} f(x_n) = l⟩.

Proof. For the following proof see also Proposition 6.2.4, page 123 in Morris.

[⇒] Take a sequence (x_n)_{n∈N} in S such that ∀n ∈ N, x_n ≠ x_0 and lim_{n→+∞} x_n = x_0. We want to show that lim_{n→+∞} f(x_n) = l, i.e.,

∀ε > 0, ∃n_0 ∈ N such that ∀n > n_0, d'(f(x_n), l) < ε.

Since lim_{x→x_0} f(x) = l,

∀ε > 0, ∃δ > 0 such that x ∈ S ∧ 0 < d(x, x_0) < δ ⇒ d'(f(x), l) < ε.

Since lim_{n→+∞} x_n = x_0,

∀δ > 0, ∃n_0 ∈ N such that ∀n > n_0, 0 <(∗) d(x_n, x_0) < δ,

where (∗) follows from the fact that ∀n ∈ N, x_n ≠ x_0. Therefore, combining the above results, we get

∀ε > 0, ∃n_0 ∈ N such that ∀n > n_0, d'(f(x_n), l) < ε,

as desired.


[⇐] Suppose otherwise; then

∃ε > 0 such that, for δ_n = 1/n, i.e., ∀n ∈ N, ∃x_n ∈ S such that 0 < d(x_n, x_0) < 1/n and d'(f(x_n), l) ≥ ε.  (12.1)

Consider (x_n)_{n∈N}; then, from the above and from Proposition 484, x_n → x_0, and from the above (specifically, the fact that 0 < d(x_n, x_0)) we also have that ∀n ∈ N, x_n ≠ x_0. Then, by assumption, lim_{n→+∞} f(x_n) = l, i.e., by definition of limit,

∀ε > 0, ∃N ∈ N such that if n > N, then d'(f(x_n), l) < ε,

contradicting (12.1).

Proposition 558 (uniqueness) Given x_0 ∈ D(S) and f : S → T,

⟨lim_{x→x_0} f(x) = l_1 and lim_{x→x_0} f(x) = l_2⟩ ⇒ ⟨l_1 = l_2⟩.

Proof. It follows from Proposition 557 and Proposition 488.

Proposition 559 Given S ⊆ X, x_0 ∈ D(S) and f, g : S → R, with

lim_{x→x_0} f(x) = l and lim_{x→x_0} g(x) = m,

we have:

1. lim_{x→x_0} (f(x) + g(x)) = l + m;

2. lim_{x→x_0} f(x) · g(x) = l · m;

3. if m ≠ 0 and ∀x ∈ S, g(x) ≠ 0, then lim_{x→x_0} f(x)/g(x) = l/m.

Proof. It follows from Proposition 557 and Proposition 486.

12.2 Continuous Functions

Definition 560 Given a metric space (X, d) and a set V ⊆ X, an open neighborhood of V is an open set containing V.

Remark 561 Sometimes, an open neighborhood is simply called a neighborhood.

Definition 562 Take x_0 ∈ S and f : S → T. Then f is continuous at x_0 if

∀ε > 0, ∃δ > 0 such that x ∈ B_{(X,d)}(x_0, δ) ∩ S ⇒ f(x) ∈ B_{(X',d')}(f(x_0), ε),

i.e.,

∀ε > 0, ∃δ > 0 such that x ∈ S ∧ d(x, x_0) < δ ⇒ d'(f(x), f(x_0)) < ε,

i.e.,

∀ε > 0, ∃δ > 0 such that f(B_{(X,d)}(x_0, δ) ∩ S) ⊆ B_{(X',d')}(f(x_0), ε),

i.e.,

for any open neighborhood V of f(x_0), there exists an open neighborhood U of x_0 such that f(U ∩ S) ⊆ V.

If f is continuous at x_0 for every x_0 in S, f is continuous on S.

Remark 563 If x_0 is an isolated point of S, f is continuous at x_0. If x_0 is an accumulation point for S, f is continuous at x_0 if and only if lim_{x→x_0} f(x) = f(x_0).


Proposition 564 Suppose that Z ⊆ X'', where (X'', d'') is a metric space, and

f : S → T,  g : W ⊇ f(S) → Z,

h : S → Z,  h(x) = g(f(x)).

If f is continuous at x_0 ∈ S and g is continuous at f(x_0), then h is continuous at x_0.

Proof. Exercise (see Apostol (1974), page 79) or Ok, page 206.

Proposition 565 Take f, g : S ⊆ Rn → R. If f and g are continuous, then

1. f + g is continuous;

2. f · g is continuous;

3. if ∀x ∈ S, g(x) ≠ 0, then f/g is continuous.

Proof. If x_0 is an isolated point of S, from Remark 563 we are done. If x_0 is an accumulation point for S, the result follows from Remark 563 and Proposition 559.

Proposition 566 Let f : S ⊆ X → R^m, and for any j ∈ {1, ..., m} let f_j : S → R be such that ∀x ∈ S,

f(x) = (f_j(x))_{j=1}^m.

Then,

⟨f is continuous⟩ ⇔ ⟨∀j ∈ {1, ..., m}, f_j is continuous⟩.

Proof. The proof follows the strategy used in Proposition 489.

Definition 567 Given, for any i ∈ {1, ..., n}, S_i ⊆ R, the function f : ×_{i=1}^n S_i → R is continuous in each variable separately if ∀i ∈ {1, ..., n} and ∀x_i^0 ∈ S_i, the function

f_{x_i^0} : ×_{k≠i} S_k → R,  f_{x_i^0}((x_k)_{k≠i}) = f(x_1, ..., x_{i−1}, x_i^0, x_{i+1}, ..., x_n)

is continuous.

Proposition 568 Given, for any i ∈ {1, ..., n}, S_i ⊆ R,

f : ×_{i=1}^n S_i → R is continuous ⇒ f is continuous in each variable separately.

Proof. Exercise.

Remark 569 It is false that

f is continuous in each variable separately ⇒ f is continuous.

To see that, consider f : R² → R,

f(x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) = 0 if (x, y) = (0, 0).
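A quick numerical sketch (added here; not part of the original notes) of why this f is the standard counterexample: along each coordinate axis f vanishes identically, so each partial function is continuous at the origin, yet along the diagonal y = x the value is constantly 1/2:

```python
def f(x, y):
    """The counterexample of Remark 569."""
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

for t in (1.0, 0.1, 0.01, 0.001):
    print(f"f({t}, 0) = {f(t, 0.0)}   f(0, {t}) = {f(0.0, t)}   f({t}, {t}) = {f(t, t)}")
# f(t, t) = 1/2 for every t != 0, so f has no limit equal to f(0, 0) = 0
# along the diagonal; hence f is not (jointly) continuous at the origin.
```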

The following Proposition is useful to show continuity of functions using the results about continuity of functions from R to R.

Proposition 570 For any k ∈ {1, ..., n}, take S_k ⊆ X, and define S := ×_{k=1}^n S_k ⊆ X^n. Moreover, take i ∈ {1, ..., n} and let

g : S_i → Y,  x_i ↦ g(x_i)

be a continuous function and

f : S → Y,  (x_k)_{k=1}^n ↦ g(x_i).

Then f is continuous.


Example 571 An example of the objects described in the above Proposition is the following one.

g : [0, π]→ R, g (x) = sinx,

f : [0, π]× [−π, 0]→ R, f (x, y) = sinx.

Proof of Proposition 570. We want to show that

∀x_0 ∈ S, ∀ε > 0, ∃δ > 0 such that d(x, x_0) < δ ∧ x ∈ S ⇒ d(f(x), f(x_0)) < ε.

We know that

∀x_{i0} ∈ S_i, ∀ε > 0, ∃δ' > 0 such that d(x_i, x_{i0}) < δ' ∧ x_i ∈ S_i ⇒ d(g(x_i), g(x_{i0})) < ε.

Take δ = δ'. Then d(x, x_0) < δ ∧ x ∈ S ⇒ d(x_i, x_{i0}) < δ' ∧ x_i ∈ S_i, and ε > d(g(x_i), g(x_{i0})) = d(f(x), f(x_0)), as desired.

Exercise 572 Show that the following function is continuous:

f : R² → R³,  f(x_1, x_2) = (e^{x_1} + cos(x_1 · x_2), sin² x_1 · e^{x_2}, x_1 + x_2).

From Proposition 566, it suffices to show that each component function is continuous. We are going to show that f_1 : R² → R,

f_1(x_1, x_2) = e^{x_1} + cos(x_1 · x_2),

is continuous, leaving the proof of the continuity of the other component functions to the reader.

1. f_{11} : R² → R, f_{11}(x_1, x_2) = e^{x_1} is continuous from Proposition 570 and "Calculus 1".

2. h_1 : R² → R, h_1(x_1, x_2) = x_1 is continuous from Proposition 570 and "Calculus 1"; h_2 : R² → R, h_2(x_1, x_2) = x_2 is continuous from Proposition 570 and "Calculus 1"; g : R² → R, g(x_1, x_2) = h_1(x_1, x_2) · h_2(x_1, x_2) = x_1 · x_2 is continuous from Proposition 565.2; φ : R → R, φ(x) = cos x is continuous from "Calculus 1"; f_{12} : R² → R, f_{12}(x_1, x_2) = (φ ∘ g)(x_1, x_2) = cos(x_1 · x_2) is continuous from Proposition 564 (continuity of composition).

3. f_1 = f_{11} + f_{12} is continuous from Proposition 565.1.

The following Proposition is useful in the proofs of several results.

Proposition 573 Let S, T be arbitrary sets, f : S → T, {A_i}_{i=1}^n a family of subsets of S and {B_i}_{i=1}^n a family of subsets of T. Then

1. "inverse image preserves inclusions, unions, intersections and set differences", i.e.,

a. B_1 ⊆ B_2 ⇒ f^{−1}(B_1) ⊆ f^{−1}(B_2),

b. f^{−1}(∪_{i=1}^n B_i) = ∪_{i=1}^n f^{−1}(B_i),

c. f^{−1}(∩_{i=1}^n B_i) = ∩_{i=1}^n f^{−1}(B_i),

d. f^{−1}(B_1\B_2) = f^{−1}(B_1)\f^{−1}(B_2);

2. "image preserves inclusions, unions, only", i.e.,

e. A_1 ⊆ A_2 ⇒ f(A_1) ⊆ f(A_2),

f. f(∪_{i=1}^n A_i) = ∪_{i=1}^n f(A_i),

g. f(∩_{i=1}^n A_i) ⊆ ∩_{i=1}^n f(A_i), and if f is one-to-one, then f(∩_{i=1}^n A_i) = ∩_{i=1}^n f(A_i),

h. f(A_1\A_2) ⊇ f(A_1)\f(A_2), and if f is one-to-one and onto, then f(A_1\A_2) = f(A_1)\f(A_2);

3. "relationship between image and inverse image", i.e.,


i. A_1 ⊆ f^{−1}(f(A_1)), and if f is one-to-one, then A_1 = f^{−1}(f(A_1)),

l. B_1 ⊇ f(f^{−1}(B_1)), and if f is onto, then B_1 = f(f^{−1}(B_1)).

Proof. ... g. (i) y ∈ f(A_1 ∩ A_2) ⇔ ∃x ∈ A_1 ∩ A_2 such that f(x) = y; (ii) y ∈ f(A_1) ∩ f(A_2) ⇔ y ∈ f(A_1) ∧ y ∈ f(A_2) ⇔ (∃x_1 ∈ A_1 such that f(x_1) = y) ∧ (∃x_2 ∈ A_2 such that f(x_2) = y). To show that (i) ⇒ (ii), it is enough to take x_1 = x and x_2 = x. ...

Proposition 574 f : X → Y is continuous ⇔ ⟨V ⊆ Y is open ⇒ f^{−1}(V) ⊆ X is open⟩.

Proof. [⇒] Take a point x_0 ∈ f^{−1}(V). We want to show that

∃r > 0 such that B(x_0, r) ⊆ f^{−1}(V).

Define y_0 = f(x_0) ∈ V. Since V ⊆ Y is open,

∃ε > 0 such that B(y_0, ε) ⊆ V.  (12.2)

Since f is continuous,

∀ε > 0, ∃δ > 0, f(B(x_0, δ)) ⊆ B(f(x_0), ε) = B(y_0, ε).  (12.3)

Then, taking r = δ, we have

B(x_0, r) = B(x_0, δ) ⊆(1) f^{−1}(f(B(x_0, δ))) ⊆(2) f^{−1}(B(y_0, ε)) ⊆(3) f^{−1}(V),

where (1) follows from 3.i in Proposition 573, (2) follows from 1.a in Proposition 573 and (12.3), and (3) follows from 1.a in Proposition 573 and (12.2).

[⇐] Take x_0 ∈ X and define y_0 = f(x_0); we want to show that f is continuous at x_0. Take ε > 0; then B(y_0, ε) is open and, by assumption,

f^{−1}(B(y_0, ε)) ⊆ X is open.  (12.4)

Moreover, by definition of y_0,

x_0 ∈ f^{−1}(B(y_0, ε)).  (12.5)

(12.4) and (12.5) imply that

∃δ > 0 such that B(x_0, δ) ⊆ f^{−1}(B(y_0, ε)).  (12.6)

Then

f(B(x_0, δ)) ⊆(1) f(f^{−1}(B(y_0, ε))) ⊆(2) B(y_0, ε),

where (1) follows from 2.e in Proposition 573 and (12.6), and (2) follows from 3.l in Proposition 573.

Proposition 575 f : X → Y is continuous ⇔ ⟨V ⊆ Y closed ⇒ f^{−1}(V) ⊆ X closed⟩.


Proof. [⇒] V closed in Y ⇒ Y\V open. Then

f^{−1}(Y\V) = f^{−1}(Y)\f^{−1}(V) = X\f^{−1}(V),  (12.7)

where the first equality follows from 1.d in Proposition 573. Since f is continuous and Y\V is open, from (12.7) X\f^{−1}(V) ⊆ X is open and therefore f^{−1}(V) is closed.

[⇐] We want to show that for every open set V in Y, f^{−1}(V) is open:

V open ⇒ Y\V closed ⇒ f^{−1}(Y\V) closed ⇔ X\f^{−1}(V) closed ⇔ f^{−1}(V) open.

Definition 576 A function f : X → Y is open if

S ⊆ X open ⇒ f(S) open;

it is closed if

S ⊆ X closed ⇒ f(S) closed.

Exercise 577 Through simple examples, show the relationship between open, closed and continuous functions.

We can summarize our discussion of continuous functions in the following Proposition.

Proposition 578 Let f be a function between metric spaces (X, d) and (Y, d'). Then the following statements are equivalent:

1. f is continuous;

2. V ⊆ Y is open ⇒ f^{−1}(V) ⊆ X is open;

3. V ⊆ Y closed ⇒ f^{−1}(V) ⊆ X closed;

4. ∀x_0 ∈ X, for any sequence (x_n)_{n∈N} ∈ X^∞ such that ∀n ∈ N, x_n ≠ x_0 and lim_{n→+∞} x_n = x_0, we have lim_{n→+∞} f(x_n) = f(x_0).

12.3 Continuous functions on compact sets

Proposition 579 Given f : X → Y, if S is a compact subset of X and f is continuous, then f(S) is a compact subset (of Y).

Proof. Let F be an open covering of f (S), so that

f (S) ⊆ ∪A∈FA. (12.8)

We want to show that F admits a finite subcover which covers f(S). Since f is continuous,

∀A ∈ F, f^{−1}(A) is open in X.

Moreover,

S ⊆(1) f^{−1}(f(S)) ⊆(2) f^{−1}(∪_{A∈F} A) =(3) ∪_{A∈F} f^{−1}(A),

where (1) follows from 3.i in Proposition 573, (2) follows from 1.a in Proposition 573 and (12.8), and (3) follows from 1.b in Proposition 573.


In other words, {f^{−1}(A)}_{A∈F} is an open cover of S. Since S is compact, there exist A_1, ..., A_n ∈ F such that

S ⊆ ∪_{i=1}^n f^{−1}(A_i).

Then

f(S) ⊆(1) f(∪_{i=1}^n f^{−1}(A_i)) =(2) ∪_{i=1}^n f(f^{−1}(A_i)) ⊆(3) ∪_{i=1}^n A_i,

where (1) follows from 2.e in Proposition 573, (2) follows from 2.f in Proposition 573, and (3) follows from 3.l in Proposition 573.

Proposition 580 (Extreme Value Theorem) If S is a nonempty, compact subset of X and f : S → R is continuous, then f admits a global maximum and a global minimum on S, i.e.,

∃x_min, x_max ∈ S such that ∀x ∈ S, f(x_min) ≤ f(x) ≤ f(x_max).

Proof. From the previous Proposition, f(S) is closed and bounded. Therefore, since f(S) is bounded, there exists M = sup f(S). By definition of sup,

∀ε > 0, B(M, ε) ∩ f(S) ≠ ∅.

Then¹, ∀n ∈ N, take

α_n ∈ B(M, 1/n) ∩ f(S).

Then (α_n)_{n∈N} is such that ∀n ∈ N, α_n ∈ f(S) and 0 ≤ d(α_n, M) < 1/n. Therefore α_n → M, and since f(S) is closed, M ∈ f(S). But M ∈ f(S) means that ∃x_max ∈ S such that f(x_max) = M, and the fact that M = sup f(S) implies that ∀x ∈ S, f(x) ≤ f(x_max). A similar reasoning holds for x_min.

We conclude the section showing a result useful in itself and needed to show the inverse function theorem (see Section 17.3).

Proposition 581 Let f : X → Y be a function from a metric space (X, d) to another metric space (Y, d'). Assume that f is one-to-one and onto. If X is compact and f is continuous, then the inverse function f^{−1} is continuous.

Proof. Exercise.

We are going to use the above result to show that a "well behaved" consumer problem does have a solution. Let the following objects be given: a price vector p ∈ R^n_{++}, a consumption vector x ∈ R^n, the consumer's wealth w ∈ R_{++}, and a continuous utility function u : R^n → R, x ↦ u(x). The consumer solves the following problem: for given p ∈ R^n_{++}, w ∈ R_{++}, find x which gives the maximum value to the utility function u under the constraint x ∈ C(p, w), defined as

C(p, w) := {x = (x_i)_{i=1}^n ∈ R^n : ∀i ∈ {1, ..., n}, x_i ≥ 0 and px ≤ w}.

As an application of Propositions 580 and 512, we have to show that for any p ∈ R^n_{++}, w ∈ R_{++},

1. C(p, w) ≠ ∅,

2. C(p, w) is bounded, and

3. C(p, w) is closed.

1. 0 ∈ C(p, w).

2. Clearly, if S ⊆ R^n, then S is bounded iff S is bounded below, i.e., ∃x̲ = (x̲_i)_{i=1}^n ∈ R^n such that ∀x = (x_i)_{i=1}^n ∈ S and ∀i ∈ {1, ..., n}, x_i ≥ x̲_i, and S is bounded above, i.e., ∃x̄ = (x̄_i)_{i=1}^n ∈ R^n such that ∀x = (x_i)_{i=1}^n ∈ S and ∀i ∈ {1, ..., n}, x_i ≤ x̄_i.

¹The fact that M ∈ f(S) can also be proved as follows: from Corollary 553, M ∈ Cl f(S) = f(S), where the last equality follows from the fact that f(S) is closed.


C(p, w) is bounded below by zero, i.e., we can take x̲ = 0. C(p, w) is bounded above because for every i ∈ {1, ..., n},

x_i ≤ (w − Σ_{i'≠i} p_{i'} x_{i'})/p_i ≤ w/p_i,

where the first inequality comes from the fact that px ≤ w, and the second inequality from the fact that p ∈ R^n_{++} and x ∈ R^n_+. Then we can take x̄ = (m, m, ..., m), where m = max{w/p_i}_{i=1}^n.

3. Define,

for i ∈ {1, ..., n}, g_i : R^n → R, x = (x_i)_{i=1}^n ↦ x_i,

and

h : R^n → R, x = (x_i)_{i=1}^n ↦ w − px.

All the above functions are continuous and, clearly,

C(p, w) = {x = (x_i)_{i=1}^n ∈ R^n : ∀i ∈ {1, ..., n}, g_i(x) ≥ 0 and h(x) ≥ 0}.

Moreover,

C(p, w) = {x = (x_i)_{i=1}^n ∈ R^n : ∀i ∈ {1, ..., n}, g_i(x) ∈ [0, +∞) and h(x) ∈ [0, +∞)} = ∩_{i=1}^n g_i^{−1}([0, +∞)) ∩ h^{−1}([0, +∞)).

[0, +∞) is a closed set and, since each g_i is continuous and h is continuous, from Proposition 578.3 the following sets are closed:

∀i ∈ {1, ..., n}, g_i^{−1}([0, +∞)) and h^{−1}([0, +∞)).

Then the desired result follows from the fact that the intersection of closed sets is closed.
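The existence argument can be seen at work numerically. The following sketch (a purely illustrative addition, not in the original notes) samples the compact budget set C(p, w) and maximizes a continuous Cobb-Douglas utility over the sample; the specification u(x) = Σ_c a_c log x_c and its standard closed-form maximizer x_c = a_c w/p_c (for Σ_c a_c = 1) are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([1.0, 2.0, 4.0])        # prices, p >> 0
w = 10.0                             # wealth
a = np.array([0.5, 0.3, 0.2])        # Cobb-Douglas weights, summing to 1

def u(x):
    return float(a @ np.log(x))      # continuous on the interior of C(p, w)

# Sample the (compact) budget set and keep the best bundle found.
box = w / p                                        # C(p, w) sits in prod_i [0, w/p_i]
samples = rng.uniform(1e-12, 1.0, size=(200_000, 3)) * box
feasible = samples[samples @ p <= w]
vals = np.log(feasible) @ a
best = feasible[np.argmax(vals)]

x_star = a * w / p                                 # known closed-form demand
print("best sampled bundle:", best.round(2), " u =", round(u(best), 4))
print("closed-form demand :", x_star.round(2), " u =", round(u(x_star), 4))
```

Since C(p, w) is nonempty and compact and u is continuous, the Extreme Value Theorem guarantees that the maximum exists; the sampled maximum approaches it as the sample grows.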


Chapter 13

Correspondence and fixed point theorems

13.1 Continuous Correspondences

Definition 582 Consider¹ two metric spaces (X, d_X) and (Y, d_Y). A correspondence ϕ from X to Y is a rule which associates a subset of Y with each element of X; it is described by the notation

ϕ : X →→ Y,  ϕ : x ↦↦ ϕ(x).

Remark 583 In other words, a correspondence ϕ : X →→ Y can be identified with a function from X to 2^Y (the set of all subsets of Y). If we identify x with {x}, a function from X to Y can be thought of as a particular correspondence.

Remark 584 Some authors make it part of the definition of a correspondence that ϕ is non-empty valued, i.e., that ∀x ∈ X, ϕ(x) ≠ ∅.

In what follows, unless otherwise stated, (X, d_X) and (Y, d_Y) are assumed to be metric spaces and are denoted by X and Y, respectively.

Definition 585 Given U ⊆ X, ϕ (U) = ∪x∈Uϕ (x) = {y ∈ Y : ∃x ∈ U such that y ∈ ϕ (x)}.

Definition 586 The graph of ϕ : X →→ Y is

graph ϕ := {(x, y) ∈ X × Y : y ∈ ϕ (x)} .

Definition 587 Consider ϕ : X →→ Y. ϕ is Upper Hemi-Continuous (UHC) at x ∈ X if ϕ(x) ≠ ∅ and for every open neighborhood V of ϕ(x) there exists an open neighborhood U of x such that for every x' ∈ U, ϕ(x') ⊆ V (or ϕ(U) ⊆ V).

ϕ is UHC if it is UHC at every x ∈ X.

Definition 588 Consider ϕ : X →→ Y. ϕ is Lower Hemi-Continuous (LHC) at x ∈ X if ϕ(x) ≠ ∅ and for any open set V in Y such that ϕ(x) ∩ V ≠ ∅, there exists an open neighborhood U of x such that for every x' ∈ U, ϕ(x') ∩ V ≠ ∅.

ϕ is LHC if it is LHC at every x ∈ X.

Example 589 Consider X = R_+ and Y = [0, 1], and

ϕ_1(x) = [0, 1] if x = 0, {0} if x > 0;

ϕ_2(x) = {0} if x = 0, [0, 1] if x > 0.

ϕ_1 is UHC and not LHC; ϕ_2 is LHC and not UHC.

1This chapter is based mainly on McLean (1985) and Hildebrand (1974).


Some (partial) intuition about the above definitions can be given as follows.

Upper Hemi-Continuity does not allow "explosions". In other words, ϕ is not UHC at x if there exists a small enough open neighborhood of x in which ϕ does "explode", i.e., it becomes much bigger in that neighborhood.

Lower Hemi-Continuity does not allow "implosions". In other words, ϕ is not LHC at x if there exists a small enough open neighborhood of x in which ϕ does "implode", i.e., it becomes much smaller in that neighborhood.

In other words, "UHC ⇒ no explosion" and "LHC ⇒ no implosion" (or "explosion ⇒ not UHC" and "implosion ⇒ not LHC"). On the other hand, the opposite implications are false, i.e., it is false that "not UHC ⇒ explosion" and "not LHC ⇒ implosion", or, in an equivalent manner, it is false that "no explosion ⇒ UHC" and "no implosion ⇒ LHC".

An example of a correspondence which neither explodes nor implodes and which is not UHC and not LHC is presented below:

ϕ : R_+ →→ R,  ϕ : x ↦↦ [1, 2] if x ∈ [0, 1), [3, 4] if x ∈ [1, +∞).

ϕ does not implode or explode if you move away from 1 (in a small open neighborhood of 1): on the right of 1, ϕ does not change; on the left, it changes completely. Clearly, ϕ is neither UHC nor LHC (at 1).

The following correspondence is both UHC and LHC:

ϕ : R_+ →→ R,  ϕ : x ↦↦ [x, x + 1].

A, maybe disturbing, example is the following one:

ϕ : R_+ →→ R,  ϕ : x ↦↦ (x, x + 1).

Observe that the graph of the correspondence under consideration "does not implode, does not explode, does not jump". In fact, the above correspondence is LHC, but it is not UHC at any x ∈ R_+, as verified below. We want to show that

not ⟨for every open neighborhood V of ϕ(x), there exists an open neighborhood U of x such that for every x' ∈ U, ϕ(x') ⊆ V⟩,

i.e.,

⟨there exists an open neighborhood V* of ϕ(x) such that for every open neighborhood U of x there exists x' ∈ U such that ϕ(x') ⊄ V*⟩.

Just take V* = ϕ(x) = (x, x + 1); then for any open neighborhood U of x and, in fact, ∀x' ∈ U\{x}, ϕ(x') ⊄ V*.

Example 590 The correspondence

ϕ : R_+ →→ R,  ϕ : x ↦↦ [1, 2] if x ∈ [0, 1], [3, 4] if x ∈ [1, +∞)

(so that, at the overlap, ϕ(1) = [1, 2] ∪ [3, 4]) is UHC, but not LHC.

Remark 591 Summarizing the above results, we can perhaps say that a correspondence which is both UHC and LHC, in fact a continuous correspondence, is a correspondence which agrees with our intuition of a graph without explosions, implosions or jumps.

Proposition 592 1. If ϕ : X →→ Y is either UHC or LHC and it is a function, then it is a continuous function.

2. If ϕ : X →→ Y is a continuous function, then it is a UHC and LHC correspondence.


Proof. 1. Case 1. ϕ is UHC.

First proof. Use the fourth characterization of continuous functions in Definition 562.

Second proof. Recall that a function f : X → Y is continuous iff [V open in Y] ⇒ [f^{−1}(V) open in X]. Take V open in Y. Consider x ∈ f^{−1}(V), i.e., x such that f(x) ∈ V. By assumption f is UHC, and therefore ∃ an open neighborhood U of x such that f(U) ⊆ V. Then U ⊆ f^{−1}(f(U)) ⊆ f^{−1}(V). Then, for any x ∈ f^{−1}(V), we have found an open set U which contains x and is contained in f^{−1}(V), i.e., f^{−1}(V) is open.

Case 2. ϕ is LHC. See Remark 596 below.

2. The result follows from the definitions and again from Remark 596 below.

Definition 593 ϕ : X →→ Y is a continuous correspondence if it is both UHC and LHC.

Very often, checking whether a correspondence is UHC or LHC is not easy. We present some related concepts which are more convenient to use.

Definition 594 ϕ : X →→ Y is "sequentially LHC" at x ∈ X if for every sequence (x_n)_{n∈N} ∈ X^∞ such that x_n → x, and for every y ∈ ϕ(x), there exists a sequence (y_n)_{n∈N} ∈ Y^∞ such that ∀n ∈ N, y_n ∈ ϕ(x_n) and y_n → y.

ϕ is "sequentially LHC" if it is "sequentially LHC" at every x ∈ X.

Proposition 595 Consider ϕ : X →→ Y. ϕ is LHC at x ∈ X ⇔ ϕ is LHC in terms of sequences at x ∈ X.

Proof. [⇒] Consider an arbitrary sequence (x_n)_{n∈N} ∈ X^∞ such that x_n → x and an arbitrary y ∈ ϕ(x). For every r ∈ N\{0}, consider B(y, 1/r). Clearly B(y, 1/r) ∩ ϕ(x) ≠ ∅, since y belongs to both sets. From the fact that ϕ is LHC, we have that

∀r ∈ N\{0}, ∃ a neighborhood U_r of x such that ∀z ∈ U_r, ϕ(z) ∩ B(y, 1/r) ≠ ∅.  (1)

Since x_n → x,

∀r ∈ N\{0}, ∃n_r ∈ N such that n ≥ n_r ⇒ x_n ∈ U_r.  (2)

Consider {n_1, ..., n_r, ...}. For any r ∈ N\{0}, if n_r < n_{r+1}, define n'_r := n_r and n'_{r+1} := n_{r+1}, i.e., "just add a prime to the name of those indices". If n_r ≥ n_{r+1}, define n'_r := n_r and n'_{r+1} := n'_r + 1. Then ∀r ∈ N\{0}, n'_{r+1} > n'_r and condition (2) still holds, i.e.,

∀r, ∃n'_r such that n ≥ n'_r ⇒ x_n ∈ U_r, and n'_{r+1} > n'_r.  (3)

We can now define the desired sequence (y_n)_{n∈N}. For any n ∈ [n'_r, n'_{r+1}), observe that, from (3), x_n ∈ U_r, and then, from (1), ϕ(x_n) ∩ B(y, 1/r) ≠ ∅. Then,

for any r, for any n ∈ [n'_r, n'_{r+1}) ∩ N, choose y_n ∈ ϕ(x_n) ∩ B(y, 1/r).  (4)

We are left with showing that y_n → y, i.e., ∀ε > 0, ∃m such that n ≥ m ⇒ y_n ∈ B(y, ε). Observe that (4) just says that

for any n ∈ [n'_1, n'_2), y_n ∈ ϕ(x_n) ∩ B(y, 1);

for any n ∈ [n'_2, n'_3), y_n ∈ ϕ(x_n) ∩ B(y, 1/2) ⊆ B(y, 1);

...

for any n ∈ [n'_r, n'_{r+1}), y_n ∈ ϕ(x_n) ∩ B(y, 1/r) ⊆ B(y, 1/(r − 1)),


and so on. Then, for any ε > 0, choose r > 1/ε (so that 1/r < ε) and m = n'_r; then, from the above observations, n ∈ [m, n'_{r+1}) ⇒ y_n ∈ B(y, 1/r) ⊆ B(y, ε), and for n > n'_{r+1}, a fortiori, y_n ∈ B(y, 1/r).

[⇐] Assume otherwise, i.e., ∃ an open set V such that

ϕ(x) ∩ V ≠ ∅  (5)

and such that for every open neighborhood U of x, ∃x_U ∈ U such that ϕ(x_U) ∩ V = ∅. Consider the following family of open neighborhoods of x:

{B(x, 1/n) : n ∈ N\{0}}.

Then ∀n ∈ N\{0}, ∃x_n ∈ B(x, 1/n), and therefore x_n → x, such that

ϕ(x_n) ∩ V = ∅.  (6)

From (5), we can take y ∈ ϕ(x) ∩ V. By assumption, we know that there exists a sequence (y_n)_{n∈N} ∈ Y^∞ such that ∀n ∈ N\{0}, y_n ∈ ϕ(x_n) and y_n → y. Since V is open and y ∈ V, ∃n̄ such that n > n̄ ⇒ y_n ∈ V. Therefore,

y_n ∈ ϕ(x_n) ∩ V.  (7)

But (7) contradicts (6).

Thanks to the above Proposition, from now on we talk simply of Lower Hemi-Continuous correspondences.

Remark 596 If ϕ : X →→ Y is LHC and it is a function, then it is a continuous function. The result follows from the characterization of Lower Hemi-Continuity in terms of sequences and from the characterization of continuous functions presented in Proposition 578.

Definition 597 ϕ : X →→ Y is closed, or "sequentially UHC", at x ∈ X if for every sequence (x_n)_{n∈N} ∈ X^∞ such that x_n → x, and for every sequence (y_n)_{n∈N} ∈ Y^∞ such that y_n ∈ ϕ(x_n) and y_n → y, it is the case that y ∈ ϕ(x).

ϕ is closed if it is closed at every x ∈ X.

Proposition 598 ϕ is closed ⇔ graph ϕ is a closed set in X × Y . 2

Proof. An equivalent way of stating the above Definition is the following one: for every sequence (x_n, y_n)_{n∈N} ∈ (X × Y)^∞ such that ∀n ∈ N, (x_n, y_n) ∈ graph ϕ and (x_n, y_n) → (x, y), it is the case that (x, y) ∈ graph ϕ. Then, from the characterization of closed sets in terms of sequences, i.e., Proposition 493, the desired result follows.

Remark 599 Because of the above result, many authors use the expression "ϕ has closed graph" in place of "ϕ is closed".

Remark 600 The definition of closed correspondence does NOT reduce to continuity in the case of functions, as the following example shows:

ϕ_3 : R_+ →→ R,  ϕ_3(x) = {0} if x = 0, {1/x} if x > 0.

ϕ_3 is a closed correspondence, but it is not a continuous function.
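A small numerical sketch (an addition, not in the original notes) of why ϕ_3 has a closed graph without being continuous: along x_n = 1/n → 0 the values y_n = n diverge, so no convergent sequence (x_n, y_n) in the graph ever tests closedness at (0, ·); the divergence itself is exactly the failure of continuity at 0:

```python
def phi3(x):
    """The correspondence of Remark 600, viewed as a function R+ -> R."""
    return 0.0 if x == 0.0 else 1.0 / x

for n in (1, 10, 100, 1000):
    x_n = 1.0 / n
    print(f"x_n = {x_n:<8} y_n = {phi3(x_n)}")
# x_n -> 0 but y_n = n -> +infinity: the y_n never converge, so the closed-graph
# condition is never violated at (0, .); continuity at 0 fails nonetheless,
# since y_n does not converge to phi3(0) = 0.
```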

Definition 601 ϕ : X →→ Y is closed (non-empty, convex, compact, ...) valued if for every x ∈ X, ϕ(x) is a closed (non-empty, convex, compact, ...) set.

Proposition 602 Consider ϕ : X →→ Y. ϕ closed ⇒ ϕ closed valued; the converse implication is false.

²((X × Y), d*) with d*((x, y), (x', y')) = d(x, x') + d'(y, y') is a metric space.


Proof. [⇒] We want to show that every sequence in ϕ(x) which converges, in fact, converges in ϕ(x). Choose x ∈ X and a sequence (y_n)_{n∈N} such that {y_n : n ∈ N} ⊆ ϕ(x) and y_n → y. Then, setting x_n = x for any n ∈ N, we get x_n → x, y_n ∈ ϕ(x_n), y_n → y. Then, since ϕ is closed, y ∈ ϕ(x). This shows that ϕ(x) is a closed set.

[The converse is false] ϕ_2 in Example 589 is closed valued, but not closed.

Remark 603 Consider ϕ : X →→ Y. ϕ UHC neither implies nor is implied by ϕ closed.

[UHC ⇏ closed] ϕ_4 : R_+ →→ R, ϕ_4(x) = [0, 1) for every x ∈ R_+, is UHC and not closed.

[UHC ⇍ closed] ϕ_3 in Remark 600 is closed and not UHC, simply because it is not a continuous "function".

Proposition 604 Consider ϕ : X →→ Y. If ϕ is UHC (at x) and closed valued (at x), then ϕ is closed (at x).

Proof. Take an arbitrary x ∈ X. We want to show that ϕ is closed at x, i.e., assume that x_n → x, y_n ∈ ϕ(x_n), y_n → y; we want to show that y ∈ ϕ(x). Since ϕ(x) is a closed set, it suffices to show that y ∈ Cl ϕ(x), i.e.,³ ∀ε > 0, B(y, ε) ∩ ϕ(x) ≠ ∅.

Consider {B(z, ε/2) : z ∈ ϕ(x)}. Then V := ∪_{z∈ϕ(x)} B(z, ε/2) is open and contains ϕ(x). Since ϕ is UHC at x, there exists an open neighborhood U of x such that

ϕ(U) ⊆ V.  (1)

Since x_n → x ∈ U, ∃n̂ ∈ N such that ∀n > n̂, x_n ∈ U and, from (1), ϕ(x_n) ⊆ V. Since y_n ∈ ϕ(x_n),

∀n > n̂, y_n ∈ V := ∪_{z∈ϕ(x)} B(z, ε/2).  (2)

From (2), ∀n > n̂, ∃z*_n ∈ ϕ(x) such that y_n ∈ B(z*_n, ε/2), and then

d(y_n, z*_n) < ε/2.  (3)

Since y_n → y, ∃n* such that ∀n > n*,

d(y_n, y) < ε/2.  (4)

From (3) and (4), ∀n > max{n̂, n*}, z*_n ∈ ϕ(x) and d(y, z*_n) ≤ d(y, y_n) + d(y_n, z*_n) < ε, i.e.,

z*_n ∈ B(y, ε) ∩ ϕ(x) ≠ ∅.

Proposition 605 Consider ϕ : X →→ Y. If ϕ is closed and there exists a compact set K ⊆ Y such that ϕ(X) ⊆ K, then ϕ is UHC. Therefore, in simpler terms, if ϕ is closed (at x) and Y is compact, then ϕ is UHC (at x).

Proof. Assume that there exists x ∈ X such that ϕ is not UHC at x, i.e., there exists an open neighborhood V of ϕ(x) such that for every open neighborhood U_x of x, ϕ(U_x) ∩ V^C ≠ ∅. In particular, ∀n ∈ N\{0}, ϕ(B(x, 1/n)) ∩ V^C ≠ ∅. Therefore, we can construct a sequence (x_n)_{n∈N} ∈ X^∞ such that x_n → x and ϕ(x_n) ∩ V^C ≠ ∅. Now, take y_n ∈ ϕ(x_n) ∩ V^C. Since y_n ∈ ϕ(X) ⊆ K and K is compact, and therefore sequentially compact, up to a subsequence, y_n → y ∈ K. Moreover, since ∀n ∈ N\{0}, y_n ∈ V^C and V^C is closed,

y ∈ V^C.  (1)

3 See Corollary 553.


Since ϕ is closed and x_n → x, y_n ∈ ϕ(x_n), y_n → y, we have that y ∈ ϕ(x). Since, by assumption, ϕ(x) ⊆ V, we have that

y ∈ V.  (2)

But (2) contradicts (1).

None of the assumptions of the above Proposition can be dispensed with. All the examples below show correspondences which are not UHC.

Example 606 1.

ϕ : R_+ →→ R,  ϕ(x) = {1/2} if x ∈ [0, 2], {1} if x > 2.

Y = [0, 1], but ϕ is not closed.

2.

ϕ : R_+ →→ R,  ϕ(x) = {0} if x = 0, {1/x} if x > 0.

ϕ is closed, but ϕ(X) = R_+, which is closed but not bounded.

3.

ϕ : [0, 1] →→ [0, 1),  ϕ(x) = {x} if x ∈ [0, 1), {1/2} if x = 1.

ϕ is closed (in Y), but Y = [0, 1) is not compact. Observe that if you consider

ϕ : [0, 1] →→ [0, 1],  ϕ(x) = {x} if x ∈ [0, 1), {1/2} if x = 1,

then ϕ is not closed.

Definition 607 Consider ϕ : X →→ Y and V ⊆ Y. The strong inverse image of V via ϕ is

sϕ^{−1}(V) := {x ∈ X : ϕ(x) ⊆ V};

the weak inverse image of V via ϕ is

wϕ^{−1}(V) := {x ∈ X : ϕ(x) ∩ V ≠ ∅}.

Remark 608 1. ∀V ⊆ Y, sϕ^{−1}(V) ⊆ wϕ^{−1}(V).

2. If ϕ is a function, the usual definition of inverse image coincides with both of the above definitions.
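For a finite toy correspondence the two inverse images can be computed directly; the following sketch (an illustration added here, mirroring Definition 607; the particular ϕ is an arbitrary choice) represents ϕ as a dictionary of sets:

```python
# A finite correspondence phi : X ->-> Y represented as a dict of sets.
phi = {
    0: {"a"},
    1: {"a", "b"},
    2: {"b", "c"},
}

def strong_inverse(phi, V):
    # s phi^{-1}(V) := {x : phi(x) is a subset of V}
    return {x for x, image in phi.items() if image <= V}

def weak_inverse(phi, V):
    # w phi^{-1}(V) := {x : phi(x) meets V}
    return {x for x, image in phi.items() if image & V}

V = {"a", "b"}
print(strong_inverse(phi, V))   # {0, 1}: phi(2) contains "c", outside V
print(weak_inverse(phi, V))     # {0, 1, 2}: every image meets V
# Consistently with Remark 608.1 (here phi is non-empty valued), the strong
# inverse image is contained in the weak one.
```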

Proposition 609 Consider ϕ : X →→ Y.

1.1. ϕ is UHC ⇔ for every open set V in Y, sϕ^{−1}(V) is open in X;

1.2. ϕ is UHC ⇔ for every closed set V in Y, wϕ^{−1}(V) is closed in X;

2.1. ϕ is LHC ⇔ for every open set V in Y, wϕ^{−1}(V) is open in X;

2.2. ϕ is LHC ⇔ for every closed set V in Y, sϕ^{−1}(V) is closed in X.⁴

Proof. [1.1, ⇒] Consider V open in Y. Take x_0 ∈ sϕ^{−1}(V); by definition of sϕ^{−1}, ϕ(x_0) ⊆ V. By definition of UHC correspondence, ∃ an open neighborhood U of x_0 such that ∀x ∈ U, ϕ(x) ⊆ V. Then x_0 ∈ U ⊆ sϕ^{−1}(V).

[1.1, ⇐] Take an arbitrary x_0 ∈ X and an open neighborhood V of ϕ(x_0). Then x_0 ∈ sϕ^{−1}(V), and sϕ^{−1}(V) is open by assumption. Therefore (just identifying U with sϕ^{−1}(V)), we have proved that ϕ is UHC.

To show 1.2, preliminarily observe that

(wϕ^{−1}(V))^C = sϕ^{−1}(V^C).  (13.1)

4Part 2.2 of the Proposition will be used in the proof of the Maximum Theorem.


(To see that, simply observe that (wϕ^{−1}(V))^C := {x ∈ X : ϕ(x) ∩ V = ∅} and sϕ^{−1}(V^C) := {x ∈ X : ϕ(x) ⊆ V^C}.)

[1.2, ⇒] V closed ⇔ V^C open ⇔ (by assumption and (1.1)) sϕ^{−1}(V^C) = (by (13.1)) (wϕ^{−1}(V))^C open ⇔ wϕ^{−1}(V) closed.

[1.2, ⇐] From (1.1), it suffices to show that for every open set V in Y, sϕ^{−1}(V) is open in X. Then, V open ⇔ V^C closed ⇔ (by assumption) wϕ^{−1}(V^C) closed ⇔ (wϕ^{−1}(V^C))^C = (by (13.1)) sϕ^{−1}(V) open.

The proofs of parts 2.1 and 2.2 are similar to the above ones.

Remark 610 Observe that ϕ UHC neither implies nor is implied by the property that for every closed set V in Y, sϕ^{−1}(V) is closed in X.

[⇏] Consider

ϕ : R_+ →→ R,  ϕ(x) = [0, 2] if x ∈ [0, 1], [0, 1] if x > 1.

ϕ is UHC and [0, 1] is closed, but sϕ^{−1}([0, 1]) := {x ∈ R_+ : ϕ(x) ⊆ [0, 1]} = (1, +∞) is not closed.

[⇍] Consider

ϕ : R_+ →→ R_+,  ϕ(x) = [0, 1/2] ∪ {1} if x = 0, [0, 1] if x > 0.

For any closed set V in Y := R_+, sϕ^{−1}(V) can only be one of the following sets, each of which is closed: {0}, R_+, ∅. On the other hand, ϕ is not UHC at 0.

Definition 611 Let the metric spaces (X, d_X), (Y, d_Y) and (Z, d_Z) and the correspondences ϕ : X →→ Y, ψ : Y →→ Z be given. The composition of ϕ with ψ is

ψ ∘ ϕ : X →→ Z,  (ψ ∘ ϕ)(x) := ∪_{y∈ϕ(x)} ψ(y) = {z ∈ Z : ∃y ∈ ϕ(x) such that z ∈ ψ(y)}.

Proposition 612 Consider ϕ : X →→ Y, ψ : Y →→ Z. If ϕ and ψ are UHC, then ψ ◦ ϕ is UHC.

Proof. Step 1. s(ψ ∘ ϕ)^{−1}(V) = sϕ^{−1}(sψ^{−1}(V)):

s(ψ ∘ ϕ)^{−1}(V) = {x ∈ X : ψ(ϕ(x)) ⊆ V} = {x ∈ X : ∀y ∈ ϕ(x), ψ(y) ⊆ V} = {x ∈ X : ∀y ∈ ϕ(x), y ∈ sψ^{−1}(V)} = {x ∈ X : ϕ(x) ⊆ sψ^{−1}(V)} = sϕ^{−1}(sψ^{−1}(V)).

Step 2. Desired result. Take V open in Z. From Proposition 609, we want to show that s(ψ ∘ ϕ)^{−1}(V) is open in X. From Step 1, we have that s(ψ ∘ ϕ)^{−1}(V) = sϕ^{−1}(sψ^{−1}(V)). Now, sψ^{−1}(V) is open because ψ is UHC, and sϕ^{−1}(sψ^{−1}(V)) is open because ϕ is UHC.

Proposition 613 Consider ϕ : X →→ Y. If ϕ is UHC and compact valued, and A ⊆ X is a compact set, then ϕ(A) is compact.

Proof. Consider an arbitrary open cover {C_α}_{α∈I} of ϕ(A). Since ϕ(A) := ∪_{x∈A} ϕ(x) and ϕ is compact valued, for every x ∈ A there exists a finite set N_x ⊆ I such that

ϕ(x) ⊆ ∪_{α∈N_x} C_α := G_x.  (13.2)

Since for every α ∈ N_x, C_α is open, G_x is open. Since ϕ is UHC, sϕ^{−1}(G_x) is open. Moreover, x ∈ sϕ^{−1}(G_x): this is the case because, by definition, x ∈ sϕ^{−1}(G_x) iff ϕ(x) ⊆ G_x,


which is just (13.2). Therefore {sϕ^{−1}(G_x)}_{x∈A} is an open cover of A. Since, by assumption, A is compact, there exists a finite set {x_i}_{i=1}^m ⊆ A such that A ⊆ ∪_{i=1}^m sϕ^{−1}(G_{x_i}). Finally,

ϕ(A) ⊆ ϕ(∪_{i=1}^m sϕ^{−1}(G_{x_i})) ⊆(1) ∪_{i=1}^m ϕ(sϕ^{−1}(G_{x_i})) ⊆(2) ∪_{i=1}^m G_{x_i} = ∪_{i=1}^m ∪_{α∈N_{x_i}} C_α,

and {{C_α}_{α∈N_{x_i}}}_{i=1}^m is a finite subcover of {C_α}_{α∈I}. We are left with showing (1) and (2) above.

(1) In general, it is the case that ϕ(∪_{i=1}^m S_i) ⊆ ∪_{i=1}^m ϕ(S_i): y ∈ ϕ(∪_{i=1}^m S_i) ⇔ ∃x ∈ ∪_{i=1}^m S_i such that y ∈ ϕ(x) ⇒ ∃i such that y ∈ ϕ(x) ⊆ ϕ(S_i) ⇒ y ∈ ∪_{i=1}^m ϕ(S_i).

(2) In general, it is the case that ϕ(sϕ^{−1}(A)) ⊆ A: y ∈ ϕ(sϕ^{−1}(A)) ⇒ ∃x ∈ sϕ^{−1}(A) such that y ∈ ϕ(x). But, by definition of sϕ^{−1}(A), and since x ∈ sϕ^{−1}(A), it follows that ϕ(x) ⊆ A and therefore y ∈ A.

Remark 614 Observe that the assumptions in the above Proposition cannot be dispensed with, as verified below.

Consider ϕ : R_+ →→ R, ϕ(x) = [0, 1). Observe that ϕ is UHC and bounded valued but not closed valued, and ϕ([0, 1]) = [0, 1) is not compact.

Consider ϕ : R_+ →→ R, ϕ(x) = R_+. Observe that ϕ is UHC and closed valued, but not bounded valued, and ϕ([0, 1]) = R_+ is not compact.

Consider ϕ : R_+ →→ R_+, ϕ(x) = {x} if x ≠ 1, {0} if x = 1. Observe that ϕ is not UHC and ϕ([0, 1]) = [0, 1) is not compact.

Add Proposition 5, page 25 and Proposition 6, page 26, from Hildebrand (1974) ...maybe as exercises ...

Remark 615 Below we summarize some facts shown in the present Section, in a somewhat informal manner:

⟨if ϕ is a function, it is continuous⟩ ⇔ ⟨ϕ is UHC⟩; ⟨ϕ is UHC⟩ neither implies nor is implied by ⟨ϕ is sequentially UHC, i.e., closed⟩; ⟨ϕ is closed⟩ does not imply, but is implied by, ⟨if ϕ is a function, it is continuous⟩;

⟨if ϕ is a function, it is continuous⟩ ⇔ ⟨ϕ is LHC⟩ ⇔ ⟨ϕ is sequentially LHC⟩;

⟨ϕ UHC and closed valued at x⟩ ⇒ ⟨ϕ is closed at x⟩; ⟨ϕ UHC at x⟩ ⇐ ⟨ϕ is closed at x and Im ϕ compact⟩.

13.2 The Maximum Theorem

Theorem 616 (Maximum Theorem) Let the metric spaces (Π, d_Π), (X, d_X), the correspondence β : Π →→ X and a function u : X × Π → R be given.⁵ Define

ξ : Π →→ X,  ξ(π) = {z ∈ β(π) : ∀x ∈ β(π), u(z, π) ≥ u(x, π)} = argmax_{x∈β(π)} u(x, π).

Assume that β is non-empty valued, compact valued and continuous, and that u is continuous. Then

1. ξ is non-empty valued, compact valued, UHC and closed, and

2. v : Π → R, v : π ↦ max_{x∈β(π)} u(x, π) is continuous.

⁵Obviously, β stands for "budget correspondence" and u for "utility function".


Proof.

ξ is non-empty valued.

It is a consequence of the fact that β is non-empty valued and compact valued and of the Extreme Value Theorem (see Proposition 580).

ξ is compact valued.

We are going to show that for any π ∈ Π, ξ(π) is a sequentially compact set. Consider a sequence (x_n)_{n∈N} ∈ X^∞ such that {x_n : n ∈ N} ⊆ ξ(π). Since ξ(π) ⊆ β(π) and β(π) is compact by assumption, without loss of generality, up to a subsequence, x_n → x_0 ∈ β(π). We are left with showing that x_0 ∈ ξ(π). Take an arbitrary z ∈ β(π). Since {x_n : n ∈ N} ⊆ ξ(π), we have that u(x_n, π) ≥ u(z, π). By continuity of u, taking limits with respect to n on both sides, we get u(x_0, π) ≥ u(z, π), i.e., x_0 ∈ ξ(π), as desired.

ξ is UHC.

From Proposition 609, it suffices to show that, given an arbitrary closed set V in X, wξ^{−1}(V) := {π ∈ Π : ξ(π) ∩ V ≠ ∅} is closed in Π. Consider an arbitrary sequence (π_n)_{n∈N} such that {π_n : n ∈ N} ⊆ wξ^{−1}(V) and π_n → π_0. We have to show that π_0 ∈ wξ^{−1}(V).

Take a sequence (x_n)_{n∈N} ∈ X^∞ such that for every n, x_n ∈ ξ(π_n) ∩ V ≠ ∅. Since ξ(π_n) ⊆ β(π_n), it follows that x_n ∈ β(π_n). We can now show the following claim.

Claim. There exists a subsequence (x_{n_k})_{k∈N} of (x_n)_{n∈N} such that x_{n_k} → x_0 and x_0 ∈ β(π_0).

Proof of the Claim. Since {π_n : n ∈ N} ∪ {π_0} is a compact set (show it), and since, by assumption, β is UHC and compact valued, from Proposition 613, β({π_n : n ∈ N} ∪ {π_0}) is compact. Since {x_n}_n ⊆ β({π_n} ∪ {π_0}), there exists a subsequence (x_{n_k})_{k∈N} of (x_n)_{n∈N} which converges to some x_0. Since β is compact valued, it is closed valued, too. Then β is UHC and closed valued and, from Proposition 604, β is closed. Since

π_{n_k} → π_0, x_{n_k} ∈ β(π_{n_k}), x_{n_k} → x_0,

the fact that β is closed implies that x_0 ∈ β(π_0). End of the Proof of the Claim.

Choose an arbitrary element z_0 ∈ β(π_0). Since we assumed that π_n → π_0 and since β is LHC, there exists a sequence (z_n)_{n∈N} ∈ X^∞ such that z_n ∈ β(π_n) and z_n → z_0. Summarizing, and taking the subsequences of (π_n)_{n∈N} and (z_n)_{n∈N} corresponding to (x_{n_k})_{k∈N}, we have for any n_k:

π_{n_k} → π_0;  x_{n_k} → x_0, x_{n_k} ∈ ξ(π_{n_k}), x_0 ∈ β(π_0);  z_{n_k} → z_0, z_{n_k} ∈ β(π_{n_k}), z_0 ∈ β(π_0).

Then for any n_k we have that u(x_{n_k}, π_{n_k}) ≥ u(z_{n_k}, π_{n_k}). Since u is continuous, taking limits, we get that u(x_0, π_0) ≥ u(z_0, π_0). Since the choice of z_0 in β(π_0) was arbitrary, we then have x_0 ∈ ξ(π_0).

Finally, since (x_{n_k})_{k∈N} ∈ V^∞, x_{n_k} → x_0 and V is closed, x_0 ∈ V. Then x_0 ∈ ξ(π_0) ∩ V and π_0 ∈ {π ∈ Π : ξ(π) ∩ V ≠ ∅} := wξ^{−1}(V), which was the desired result.

ξ is closed.

ξ is UHC and compact valued, and therefore closed valued. Then, from Proposition 604, it is closed, too.

v is a continuous function.

The basic idea of the proof is that v is a function and "it is equal to" the composition of UHC correspondences; therefore, it is a continuous function. A precise argument goes as follows.

Let the following correspondences be given:

(ξ, id) : Π →→ X × Π,  π ↦↦ ξ(π) × {π},

β : X × Π →→ R,  (x, π) ↦↦ {u(x, π)}.

Then, from Definition 611,

(β ∘ (ξ, id))(π) = ∪_{(x,π)∈ξ(π)×{π}} {u(x, π)}.

By definition of ξ,

∀π ∈ Π, ∀x ∈ ξ(π), ∪_{(x,π)∈ξ(π)×{π}} {u(x, π)} = {u(x, π)},

and

∀π ∈ Π, (β ∘ (ξ, id))(π) = {u(x, π)} = {v(π)}.  (13.3)

Now, (ξ, id) is UHC, and since u is a continuous function, β is UHC as well. From Proposition 612, β ∘ (ξ, id) is UHC and, from (13.3), v is a continuous function.

A sometimes more useful version of the Maximum Theorem is one which does not use the fact that β is UHC.

Theorem 617 (Maximum Theorem) Consider the correspondence β : Π →→ X and the function u : X × Π → R defined in Theorem 616, with Π, X Euclidean spaces. Assume that β is non-empty valued, compact valued, convex valued, closed and LHC, and that u is continuous. Then

1. ξ is a non-empty valued, compact valued, closed and UHC correspondence;

2. v is a continuous function.

Proof. The desired result follows from the next Proposition.

Proposition 618 Consider the correspondence β : Π →→ X, with Π and X Euclidean spaces. Assume that β is non-empty valued, compact valued, convex valued, closed and LHC. Then β is UHC.

Proof. See Hildenbrand (1974), Lemma 1, page 33. The proof also requires Theorem 1 in Hildenbrand (1974).

The following result allows one to substitute the requirement "β is LHC" with the easier-to-check requirement "Cl β is LHC".

Proposition 619 Consider the correspondence ϕ : Π →→ X. ϕ is LHC ⇔ Cl ϕ is LHC.

Proof. Preliminary Claim. If V is an open set and Cl ϕ(π) ∩ V ≠ ∅, then ϕ(π) ∩ V ≠ ∅.

Proof of the Preliminary Claim. Take z ∈ Cl ϕ(π) ∩ V. Since V is open, ∃ε > 0 such that B(z, ε) ⊆ V. Since z ∈ Cl ϕ(π), ∃{z_n} ⊆ ϕ(π) such that z_n → z. But then ∃n_ε such that n > n_ε ⇒ z_n ∈ B(z, ε) ⊆ V. But z_n ∈ V and z_n ∈ ϕ(π) imply that ϕ(π) ∩ V ≠ ∅. End of the Proof of the Preliminary Claim.

[⇒] Take an open set V such that Cl ϕ(π) ∩ V ≠ ∅. We want to show that there exists an open set U* such that π ∈ U* and ∀ξ ∈ U*, Cl ϕ(ξ) ∩ V ≠ ∅. From the Preliminary Claim, it must be the case that ϕ(π) ∩ V ≠ ∅. Then, since ϕ is LHC, there exists an open set U such that π ∈ U and ∀ξ ∈ U, ϕ(ξ) ∩ V ≠ ∅. Since Cl ϕ(ξ) ⊇ ϕ(ξ), we also have ∀ξ ∈ U, Cl ϕ(ξ) ∩ V ≠ ∅. Choosing U* = U, we are done.


[⇐] Take an open set V such that ϕ(π) ∩ V ≠ ∅. Then Cl ϕ(π) ∩ V ≠ ∅ and, by assumption, ∃ an open set U' such that π ∈ U' and ∀ξ ∈ U', Cl ϕ(ξ) ∩ V ≠ ∅. Then, from the Preliminary Claim, for every ξ ∈ U' it must be the case that ϕ(ξ) ∩ V ≠ ∅.

Remark 620 In some economic models, a convenient strategy to show that a correspondence β is LHC is the following one: introduce a correspondence β̂; show that β̂ is LHC; show that Cl β̂ = β. Then, from the above Proposition 619, the desired result follows; see, for example, point 5 of the proof of Theorem 633 below.

13.3 Fixed point theorems

A thorough analysis of the many versions of fixed point theorems existing in the literature is outside the scope of these notes. Below, we present a useful, relatively general version of fixed point theorems both in the case of functions and correspondences.

Theorem 621 (The Brouwer Fixed Point Theorem) For any n ∈ N\{0}, let S be a nonempty, compact, convex subset of R^n. If f : S → S is a continuous function, then ∃x ∈ S such that f(x) = x.

Proof. For a (not self-contained) proof, see Ok (2007), page 279.

Just to try to avoid having a Section without a proof, let us show the following extremely simple version of that theorem.

Proposition 622 If f : [0, 1]→ [0, 1] is a continuous function, then ∃x ∈ [0, 1] such that f (x) = x.

Proof. If f(0) = 0 or f(1) = 1, the result is true. Then suppose otherwise, i.e., f(0) ≠ 0 and f(1) ≠ 1; i.e., since the codomain of f is [0, 1], suppose that f(0) > 0 and f(1) < 1. Define

g : [0, 1] → R,  x ↦ x − f(x).

Clearly, g is continuous, g(0) = −f(0) < 0 and g(1) = 1 − f(1) > 0. Then, from the intermediate value theorem for continuous functions, ∃x ∈ [0, 1] such that g(x) = x − f(x) = 0, i.e., x = f(x), as desired.
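The intermediate value argument is constructive enough to implement: bisecting on g(x) = x − f(x) locates a fixed point. A minimal sketch (added here as an illustration; the choice f = cos, which maps [0, 1] into [cos 1, 1] ⊆ [0, 1], is arbitrary):

```python
import math

def fixed_point_bisection(f, tol=1e-12):
    """Find x in [0, 1] with f(x) = x, for continuous f : [0, 1] -> [0, 1]."""
    g = lambda x: x - f(x)           # g(0) <= 0 <= g(1), as in the proof
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid                 # a zero of g stays bracketed in [mid, hi]
        else:
            hi = mid                 # a zero of g stays bracketed in [lo, mid]
    return (lo + hi) / 2.0

x = fixed_point_bisection(math.cos)
print(x, math.cos(x) - x)            # ~0.7390851332, residual ~0
```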

Theorem 623 (Kakutani's Fixed Point Theorem) For any n ∈ N\{0}, let S be a nonempty, compact, convex subset of R^n. If ϕ : S →→ S is a convex valued, closed correspondence, then ∃x ∈ S such that x ∈ ϕ(x).

Proof. For a proof, see Ok (2007), page 331.

13.4 Application of the maximum theorem to the consumer problem

Definition 624 (Mas Colell (1996), page 17) Commodities are goods and services available for purchase in the market.

We assume the number of commodities is finite and equal to C. Commodities are indexed by the superscript c = 1, ..., C.

Definition 625 A commodity vector is an element of the commodity space R^C.

Definition 626 (almost Mas Colell (1996), page 18) A consumption set is a subset of the commodity space R^C. It is denoted by X. Its elements are the vectors of commodities the individual can conceivably consume, given the physical or institutional constraints imposed by the environment.

Example 627 See Mas Colell, pages 18, 19.


Common assumptions on X are that it is convex, bounded below and unbounded. Unless otherwise stated, we make the following stronger

Assumption 1 X = R^C_+ := {x ∈ R^C : x ≥ 0}.

Definition 628 p ∈ R^C is the vector of commodity prices.

Households' choices are limited also by an economic constraint: they cannot buy goods whose value is bigger than their wealth, i.e., it must be the case that px ≤ w, where w is the household's wealth.

Remark 629 w can take different specifications. For example, we can have w = pe, where e ∈ R^C is the vector of goods owned by the household, i.e., her endowments.

Assumption 2 All commodities are traded in markets at publicly observable prices, expressed in monetary unit terms.

Assumption 3 All commodities are assumed to be strictly goods (and not "bads"), i.e., p ∈ R^C_{++}.

Assumption 4 Households behave as if they cannot influence prices.

Definition 630 The budget set is

β(p, w) := {x ∈ R^C_+ : px ≤ w}.

With some abuse of notation, we define the budget correspondence as

β : R^C_{++} × R_{++} →→ R^C,  β(p, w) = {x ∈ R^C_+ : px ≤ w}.

Definition 631 The utility function is

u : X → R, x 7→ u (x)

Definition 632 The Utility Maximization Problem (UMP) is

max_{x∈R^C_+} u(x) s.t. px ≤ w, i.e., x ∈ β(p, w).

ξ : R^C_{++} × R_{++} →→ R^C, ξ(p, w) = argmax(UMP) is the demand correspondence.

Theorem 633 ξ is a non-empty valued, compact valued, closed and UHC correspondence, and

v : R^C_{++} × R_{++} → R,  v : (p, w) ↦ max(UMP),

i.e., the indirect utility function, is a continuous function.

Proof. As an application of the (second version of the) Maximum Theorem, i.e., Theorem 617, we have to show that β is non-empty valued, compact valued, convex valued, closed and LHC.

1. β is non-empty valued: x = (w/(C p^c))_{c=1}^C ∈ β(p, w) (or, simpler, 0 ∈ β(p, w)).

2. β is compact valued. β(p, w) is closed because it is the intersection of inverse images of closed sets via continuous functions. β(p, w) is bounded below by zero. β(p, w) is bounded above because for every c,

x^c ≤ (w − Σ_{c'≠c} p^{c'} x^{c'})/p^c ≤ w/p^c,

where the first inequality comes from the fact that px ≤ w, and the second inequality from the fact that p ∈ R^C_{++} and x ∈ R^C_+.

3. β is convex valued. To see that, simply observe that, for x', x'' ∈ β(p, w) and λ ∈ [0, 1], p((1 − λ)x' + λx'') = (1 − λ)px' + λpx'' ≤ (1 − λ)w + λw = w.

4. β is closed. We want to show that for every sequence {(p_n, w_n)}_n ⊆ R^C_{++} × R_{++} such that


(p_n, w_n) → (p, w), x_n ∈ β(p_n, w_n), x_n → x,

it is the case that x ∈ β(p, w). Since x_n ∈ β(p_n, w_n), we have that p_n x_n ≤ w_n and x_n ≥ 0. Taking limits on both sides of both inequalities, we get px ≤ w and x ≥ 0, i.e., x ∈ β(p, w).

5. β is LHC. We proceed as follows: a. Int β is LHC; b. Cl Int β = β. Then, from Proposition 619, the result follows.

a. Observe that Int β(p, w) := {x ∈ R^C_+ : x ≫ 0 and px < w} and that Int β(p, w) ≠ ∅, since x = (w/(2C p^c))_{c=1}^C ∈ Int β(p, w). We want to show that the following is true: for every sequence (p_n, w_n)_n ∈ (R^C_{++} × R_{++})^∞ such that (p_n, w_n) → (p, w) and for any x ∈ Int β(p, w), there exists a sequence {x_n}_n ⊆ R^C_+ such that ∀n, x_n ∈ Int β(p_n, w_n) and x_n → x.

We have p_n x − w_n → px − w < 0 (where the strict inequality follows from the fact that x ∈ Int β(p, w)). Then ∃N such that n ≥ N ⇒ p_n x − w_n < 0. For n ≤ N, choose an arbitrary x_n ∈ Int β(p_n, w_n) ≠ ∅. Since p_n x − w_n < 0 for every n > N, there exists ε_n > 0 such that z ∈ B(x, ε_n) ⇒ p_n z − w_n < 0. For any n > N, choose x_n = x + (1/√C) min{ε_n/2, 1/n} · 1. Then

d(x, x_n) = (C ((1/√C) min{ε_n/2, 1/n})²)^{1/2} = min{ε_n/2, 1/n} < ε_n,

i.e., x_n ∈ B(x, ε_n) and therefore

p_n x_n − w_n < 0.  (1)

Since x_n ≫ x, we also have

x_n ≫ 0.  (2)

(1) and (2) imply that x_n ∈ Int β(p_n, w_n). Moreover, since x_n ≫ x, we have 0 ≤ lim_{n→+∞}(x_n − x) = lim_{n→∞} (1/√C) min{ε_n/2, 1/n} · 1 ≤ lim_{n→∞} (1/n)(1/√C) · 1 = 0, i.e., lim_{n→+∞} x_n = x.⁶

b. It follows from the fact that the budget correspondence is the intersection of the inverse images of half spaces via continuous functions.

2. It follows from Proposition 634, part (4), and the Maximum Theorem.

Proposition 634 For every (p, w) ∈ R^C_{++} × R_{++}:

(1) ∀α ∈ R_{++}, ξ(αp, αw) = ξ(p, w);

(2) if u is LNS, ∀x ∈ R^C_+, x ∈ ξ(p, w) ⇒ px = w;

(3) if u is quasi-concave, ξ is convex valued;

(4) if u is strictly quasi-concave, ξ is single valued, i.e., it is a function.

Proof. (1) It simply follows from the fact that ∀α ∈ R_{++}, β(αp, αw) = β(p, w).

(2) Suppose otherwise; then ∃x_0 ∈ R^C_+ such that x_0 ∈ ξ(p, w) and px_0 < w. Therefore ∃ε_0 > 0 such that B(x_0, ε_0) ⊆ β(p, w) (take ε_0 = d(x_0, H(p, w))). Then, from the fact that u is LNS, there exists x* such that x* ∈ B(x_0, ε_0) ⊆ β(p, w) and u(x*) > u(x_0), i.e., x_0 ∉ ξ(p, w), a contradiction.

(3) Assume there exist x', x'' ∈ ξ(p, w). We want to show that ∀λ ∈ [0, 1], x^λ := (1 − λ)x' + λx'' ∈ ξ(p, w). Observe that u(x') = u(x'') := u*. From the quasi-concavity of u, we


have u(x^λ) ≥ u*. We are therefore left with showing that x^λ ∈ β(p, w), i.e., that β is convex valued. To see that, simply observe that px^λ = (1 − λ)px' + λpx'' ≤ (1 − λ)w + λw = w.

(4) Assume otherwise. Following exactly the same argument as above, we have x', x'' ∈ ξ(p, w) and px^λ ≤ w. Since u is strictly quasi-concave, we also have that u(x^λ) > u(x') = u(x'') := u*, which contradicts the fact that x', x'' ∈ ξ(p, w).

⁶Or simply 0 ≤ lim_{n→∞} d(x, x_n) ≤ lim_{n→∞} 1/n = 0.
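For a concrete utility function, the properties in Proposition 634 can be checked directly. A small sketch (an added illustration; the Cobb-Douglas specification and its standard closed-form demand x^c = a_c w/p^c, for Σ_c a_c = 1, are assumptions chosen here, not derived in the text) verifying homogeneity of degree zero, property (1), and budget exhaustion (Walras' law), property (2):

```python
import numpy as np

a = np.array([0.5, 0.3, 0.2])        # Cobb-Douglas exponents, summing to 1

def demand(p, w):
    """Closed-form demand for u(x) = sum_c a_c log(x_c)."""
    return a * w / p

p, w, alpha = np.array([1.0, 2.0, 4.0]), 10.0, 7.3
x = demand(p, w)
assert np.allclose(demand(alpha * p, alpha * w), x)   # (1): homogeneous of degree 0
assert np.isclose(p @ x, w)                           # (2): px = w, since u is LNS
print("demand:", x)
```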

Proposition 635 If u is a continuous LNS utility function, then the indirect utility function has the following properties. For every (p, w) ∈ R^C_{++} × R_{++}:

(1) ∀α ∈ R_{++}, v(αp, αw) = v(p, w);

(2) v is strictly increasing in w and, for every c, non-increasing in p^c;

(3) for every v̄ ∈ R, {(p, w) : v(p, w) ≤ v̄} is convex;

(4) v is continuous.

Proof. (1) It follows from the fact that ∀α ∈ R_{++}, β(αp, αw) = β(p, w).

(2) If w increases, say by Δw > 0, then, from Proposition 634(2), p x(p, w) = w < w + Δw. Define x_0 := x(p, w). Then ∃ε_0 > 0 such that B(x_0, ε_0) ⊆ β(p, w + Δw) (take ε_0 = d(x_0, H(p, w + Δw))). Then, from the fact that u is LNS, there exists x* such that x* ∈ B(x_0, ε_0) ⊆ β(p, w + Δw) and u(x*) > u(x_0). The result follows observing that v(p, w + Δw) ≥ u(x*).

A similar proof applies to the case of a decrease in p. Assume Δp^{c'} < 0. Define Δ := (Δ^c)_{c=1}^C ∈ R^C with Δ^c = 0 if c ≠ c' and Δ^{c'} = Δp^{c'}. Then

p x(p, w) = w ⇒ (p + Δ) x(p, w) = p x(p, w) + Δp^{c'} x^{c'}(p, w) = w + Δp^{c'} x^{c'}(p, w) ≤ w,

since Δp^{c'} x^{c'}(p, w) ≤ 0. The remaining part of the proof is the same as in the case of an increase of w.

(3) Take (p', w'), (p'', w'') ∈ {(p, w) : v(p, w) ≤ v̄} := S(v̄). We want to show that ∀λ ∈ [0, 1], (p^λ, w^λ) := (1 − λ)(p', w') + λ(p'', w'') ∈ S(v̄), i.e., x ∈ ξ(p^λ, w^λ) ⇒ u(x) ≤ v̄.

x ∈ ξ(p^λ, w^λ) ⇒ p^λ x ≤ w^λ ⇔ ((1 − λ)p' + λp'')x ≤ (1 − λ)w' + λw''.

Then either p'x ≤ w' or p''x ≤ w''. If p'x ≤ w', then u(x) ≤ v(p', w') ≤ v̄. Similarly if p''x ≤ w''.

(4) It was proved in Theorem 633.


Part III

Differential calculus in Euclidean spaces


Chapter 14

Partial derivatives and directional derivatives

14.1 Partial Derivatives

The¹ concept of partial derivative is not that different from the concept of "standard" derivative of a function from R to R; in fact, we are going to see that partial derivatives of a function f : R^n → R are just standard derivatives of a naturally associated function from R to R.

Recall that for any k ∈ {1, ..., n}, e_n^k = (0, ..., 1, ..., 0) is the k-th vector in the canonical basis of R^n.

Definition 636 Let a set S ⊆ R^n, a point x_0 = (x_{0k})_{k=1}^n ∈ Int S and a function f : S → R be given. If the following limit exists and is finite,

lim_{h→0} (f(x_0 + h e_n^k) − f(x_0))/h,  (14.1)

then it is called the partial derivative of f with respect to the k-th coordinate computed at x_0, and it is denoted by any of the following symbols:

D_{x_k} f(x_0),  D_k f(x_0),  ∂f/∂x_k (x_0),  ∂f(x)/∂x_k |_{x=x_0}.

Remark 637 As said above, partial derivatives are not really a new concept. We are just treating f as a function of one variable at a time, keeping the other variables fixed. In other words, for simplicity taking S = R^n and using the notation of the above definition, we can define

g_k : R → R,  g_k(x_k) = f(x_0 + (x_k − x_{0k}) e_n^k),

a function of only one variable, and, by definition of g_k,

g'_k(x_{0k}) = lim_{h→0} (g_k(x_{0k} + h) − g_k(x_{0k}))/h = lim_{h→0} (f(x_0 + h e_n^k) − f(x_0))/h = D_{x_k} f(x_0).

Example 638 Given f : R³ → R,

f(x, y, z) = e^{xy} cos x + sin(yz),

we have

(D_x f(x, y, z), D_y f(x, y, z), D_z f(x, y, z)) = (−(sin x) e^{xy} + y (cos x) e^{xy}, z cos(yz) + x (cos x) e^{xy}, y cos(yz)).

¹In this Part, I follow closely Section 5.14 and chapters 12 and 13 in Apostol (1974).
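Partial derivatives are easy to sanity-check numerically with central finite differences. The following sketch (added here as an illustration; the test point is arbitrary) compares a numerical gradient with the analytic one from Example 638:

```python
import math

def f(x, y, z):
    return math.exp(x * y) * math.cos(x) + math.sin(y * z)

def grad_analytic(x, y, z):
    e = math.exp(x * y)
    return (-math.sin(x) * e + y * math.cos(x) * e,
            z * math.cos(y * z) + x * math.cos(x) * e,
            y * math.cos(y * z))

def grad_numeric(g, point, h=1e-6):
    """Central differences along each canonical basis vector e_k."""
    out = []
    for k in range(len(point)):
        plus, minus = list(point), list(point)
        plus[k] += h
        minus[k] -= h
        out.append((g(*plus) - g(*minus)) / (2 * h))
    return tuple(out)

pt = (0.7, -0.4, 1.3)
print(grad_analytic(*pt))
print(grad_numeric(f, pt))   # agrees with the analytic gradient to ~1e-9
```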


Remark 639 Loosely speaking, we can give the following geometrical interpretation of partial derivatives. Given f : R² → R admitting partial derivatives, ∂f(x_0)/∂x_1 is the slope of the graph of the function obtained by cutting the graph of f with a plane which is orthogonal to the x_1-x_2 plane and goes through the line parallel to the x_1 axis passing through the point x_0, a line to which we have given the same orientation as the x_1 axis. (Picture to be added.)

Definition 640 Given an open subset S of R^n and a function f : S → R, if ∀k ∈ {1, ..., n} the limit in (14.1) exists, we call the gradient of f at x_0 the following vector:

(D_k f(x_0))_{k=1}^n,

and we denote it by Df(x_0).

Remark 641 The existence of the gradient of f at x_0 does not imply continuity of the function at x_0, as the following example shows. Consider f : R² → R,

f(x_1, x_2) = x_1 + x_2 if either x_1 = 0 or x_2 = 0, i.e., if (x_1, x_2) ∈ ({0} × R) ∪ (R × {0}), and f(x_1, x_2) = 1 otherwise.

[Figure: the graph of f over the (x, y) plane.]

D_1 f(0) = lim_{x_1→0} (f(x_1, 0) − f(0, 0))/(x_1 − 0) = lim_{x_1→0} x_1/x_1 = 1,

and similarly D_2 f(0) = 1.

f is not continuous at 0: we want to show that ∃ε > 0 such that ∀δ > 0 there exists (x_1, x_2) ∈ R² such that (x_1, x_2) ∈ B(0, δ) and |f(x_1, x_2) − f(0, 0)| ≥ ε. Take ε = 1/2 and any (x_1, x_2) ∈ B(0, δ) such that x_1 ≠ 0 and x_2 ≠ 0. Then |f(x_1, x_2) − f(0, 0)| = 1 > ε.

14.2 Directional Derivatives

A first generalization of the concept of partial derivative of a function is presented in Definition 643 below.


Definition 642 Given

f : S ⊆ R^n → R^m,  x ↦ f(x),

∀i ∈ {1,...,m}, the function

f_i : S ⊆ R^n → R,  x ↦ i-th component of f(x)

is called the i-th component function of f. Therefore,

∀x ∈ S,  f(x) = (f_i(x))_{i=1}^m.    (14.2)

Definition 643 Given m, n ∈ N\{0}, a set S ⊆ R^n, x_0 ∈ Int S, u ∈ R^n, h ∈ R such that x_0 + hu ∈ S, and f : S → R^m, we call the directional derivative of f at x_0 in the direction u, denoted by the symbol

f'(x_0; u),

the limit

\lim_{h\to 0} \frac{f(x_0 + hu) - f(x_0)}{h}    (14.3)

if it exists and is finite.

Remark 644 Assume that the limit in (14.3) exists and is finite. Then, from (14.2) and using Proposition 489,

f'(x_0; u) = \lim_{h\to 0} \frac{f(x_0+hu) - f(x_0)}{h} = \left( \lim_{h\to 0} \frac{f_i(x_0+hu) - f_i(x_0)}{h} \right)_{i=1}^m = (f_i'(x_0; u))_{i=1}^m.

If u = e_n^j, the j-th element of the canonical basis of R^n, we then have

f'(x_0; e_n^j) = \left( \lim_{h\to 0} \frac{f_i(x_0 + h e_n^j) - f_i(x_0)}{h} \right)_{i=1}^m \stackrel{(*)}{=} (D_{x_j} f_i(x_0))_{i=1}^m := D_{x_j} f(x_0),    (14.4)

where equality (*) follows from (14.1). We can then construct a matrix whose n columns are the above vectors, a matrix which involves all partial derivatives of all component functions of f. That matrix is formally defined below.

Definition 645 Assume that f = (f_i)_{i=1}^m : S ⊆ R^n → R^m admits all partial derivatives at x_0. The Jacobian matrix of f at x_0 is denoted by Df(x_0) and is the following m×n matrix:

\begin{bmatrix} D_{x_1}f_1(x_0) & \dots & D_{x_j}f_1(x_0) & \dots & D_{x_n}f_1(x_0) \\ \vdots & & \vdots & & \vdots \\ D_{x_1}f_i(x_0) & \dots & D_{x_j}f_i(x_0) & \dots & D_{x_n}f_i(x_0) \\ \vdots & & \vdots & & \vdots \\ D_{x_1}f_m(x_0) & \dots & D_{x_j}f_m(x_0) & \dots & D_{x_n}f_m(x_0) \end{bmatrix} = [\, D_{x_1}f(x_0) \ \dots \ D_{x_j}f(x_0) \ \dots \ D_{x_n}f(x_0) \,] = [\, f'(x_0; e_n^1) \ \dots \ f'(x_0; e_n^j) \ \dots \ f'(x_0; e_n^n) \,].

Remark 646 How to easily write the Jacobian matrix of a function. To compute the Jacobian of f, it is convenient to construct a table as follows.

1. In the first column, write the m component functions f_1, ..., f_i, ..., f_m of f.

2. In the first row, write the subvectors x_1, ..., x_j, ..., x_n of x.

3. For each i and j, write the partial Jacobian matrix D_{x_j} f_i(x) in the entry at the intersection of the i-th row and j-th column.


We then obtain the following table,

         x_1                ...   x_j                ...   x_n
f_1   ⎡  D_{x_1}f_1(x)            D_{x_j}f_1(x)            D_{x_n}f_1(x)  ⎤
...   ⎢                                                                   ⎥
f_i   ⎢  D_{x_1}f_i(x)            D_{x_j}f_i(x)            D_{x_n}f_i(x)  ⎥
...   ⎢                                                                   ⎥
f_m   ⎣  D_{x_1}f_m(x)            D_{x_j}f_m(x)            D_{x_n}f_m(x)  ⎦

where the Jacobian matrix is the part of the table between square brackets.

Example 647 Given f : R^4 → R^5,

f(x,y,z,t) = \begin{pmatrix} \frac{xy}{x^2+1} \\ \frac{x+yz}{e^x} \\ \frac{xyzt}{e^t} \\ x+y+z+t \\ x^2+t^2 \end{pmatrix},

its Jacobian matrix is

\begin{bmatrix} \frac{y}{x^2+1} - \frac{2x^2 y}{(x^2+1)^2} & \frac{x}{x^2+1} & 0 & 0 \\ \frac{1}{e^x} - \frac{x+yz}{e^x} & \frac{z}{e^x} & \frac{y}{e^x} & 0 \\ \frac{yzt}{e^t} & \frac{xzt}{e^t} & \frac{xyt}{e^t} & \frac{xyz}{e^t} - \frac{xyzt}{e^t} \\ 1 & 1 & 1 & 1 \\ 2x & 0 & 0 & 2t \end{bmatrix}_{5\times 4}
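Again as a numerical sketch (not in the original notes; NumPy assumed), the whole 5×4 Jacobian can be approximated column by column, following Remark 646: the j-th column is D_{x_j} f(x_0).

```python
import numpy as np

def f(v):
    x, y, z, t = v
    return np.array([
        x * y / (x**2 + 1),
        (x + y * z) / np.exp(x),
        x * y * z * t / np.exp(t),
        x + y + z + t,
        x**2 + t**2,
    ])

def jacobian(f, x0, h=1e-6):
    # build Df(x0) column by column via central differences
    m, n = f(x0).size, x0.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = 1.0
        J[:, j] = (f(x0 + h * e) - f(x0 - h * e)) / (2 * h)
    return J

x0 = np.array([0.3, -1.2, 0.7, 2.0])
print(jacobian(f, x0))  # matches the closed-form 5x4 matrix above
```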

Remark 648 From Remark 644,

∀u ∈ R^n, f'(x_0; u) exists ⇒ Df(x_0) exists.    (14.5)

On the other hand, the opposite implication does not hold true. Consider the example in Remark 641. There, we have seen that

D_x f(0) = D_y f(0) = 1.

But if u = (u_1, u_2) with u_1 ≠ 0 and u_2 ≠ 0, we have

\lim_{h\to 0} \frac{f(0 + hu) - f(0)}{h} = \lim_{h\to 0} \frac{1 - 0}{h},

which is not finite, so f'(0; u) does not exist.

Remark 649 Again loosely speaking, we can give the following geometrical interpretation of directional derivatives. Take f : R^2 → R admitting directional derivatives. f'(x_0; u) with ‖u‖ = 1 is the slope of the graph of the function obtained cutting the graph of f with a plane which is orthogonal to the x_1-x_2 plane and goes through the line through the points x_0 and x_0 + u, a line to which we have given the same orientation as u. [Picture to be added.]

Example 650 Take

f : R^n → R,  x ↦ x·x = ‖x‖².

Then the existence of f'(x_0; u) can be checked computing the following limit:

\lim_{h\to 0} \frac{f(x_0+hu) - f(x_0)}{h} = \lim_{h\to 0} \frac{(x_0+hu)\cdot(x_0+hu) - x_0\cdot x_0}{h} = \lim_{h\to 0} \frac{2h\,x_0\cdot u + h^2\,u\cdot u}{h} = \lim_{h\to 0} (2x_0\cdot u + h\,u\cdot u) = 2x_0\cdot u.
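A small numerical check of this formula (a sketch not in the original notes; NumPy assumed):

```python
import numpy as np

f = lambda x: np.dot(x, x)  # f(x) = ||x||^2

def dir_deriv(f, x0, u, h=1e-6):
    # symmetric difference quotient for f'(x0; u)
    return (f(x0 + h * u) - f(x0 - h * u)) / (2 * h)

x0 = np.array([1.0, -2.0, 0.5])
u = np.array([0.3, 0.4, -1.0])
print(dir_deriv(f, x0, u), 2 * np.dot(x0, u))  # both ~ -2.0
```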


Exercise 651 Verify that f'(x_0; −u) = −f'(x_0; u).

Solution:

f'(x_0; -u) = \lim_{h\to 0} \frac{f(x_0 + h(-u)) - f(x_0)}{h} = \lim_{h\to 0} \frac{f(x_0 - hu) - f(x_0)}{h} \stackrel{k:=-h}{=} -\lim_{k\to 0} \frac{f(x_0 + ku) - f(x_0)}{k} = -f'(x_0; u).

Remark 652 It is not the case that

∀u ∈ R^n, f'(x_0; u) exists ⇒ f is continuous at x_0,    (14.6)

as the following example shows. Consider

f : R^2 → R,  f(x,y) = \begin{cases} \frac{xy^2}{x^2+y^4} & \text{if } x ≠ 0 \\ 0 & \text{if } x = 0, \text{ i.e., } (x,y) ∈ \{0\}×R. \end{cases}

[Figure: graph of f near the origin; omitted.]

Let's compute f'(0; u). If u_1 ≠ 0,

\lim_{h\to 0} \frac{f(0+hu) - f(0)}{h} = \lim_{h\to 0} \frac{hu_1 \cdot h^2 u_2^2}{(h^2 u_1^2 + h^4 u_2^4)\,h} = \lim_{h\to 0} \frac{u_1 u_2^2}{u_1^2 + h^2 u_2^4} = \frac{u_2^2}{u_1}.

If u_1 = 0, we have

\lim_{h\to 0} \frac{f(0+hu) - f(0)}{h} = \lim_{h\to 0} \frac{f(0, hu_2) - f(0)}{h} = \lim_{h\to 0} \frac{0}{h} = 0.

On the other hand, if x = y^2 and x, y ≠ 0, i.e., along the graph of the parabola x = y^2 except the origin, we have

f(x,y) = f(y^2, y) = \frac{y^4}{y^4 + y^4} = \frac{1}{2},

while f(0,0) = 0.

Roughly speaking, the existence of directional derivatives at a given point in all directions implies "continuity along straight lines" through that point; it does not imply "continuity along all possible curves through that point", as in the case of the parabola above.
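The following sketch (not in the original notes; plain Python) makes the two behaviours concrete: along a straight line through the origin the values of f tend to f(0,0) = 0, while along the parabola x = y² they are constantly 1/2.

```python
def f(x, y):
    return x * y**2 / (x**2 + y**4) if x != 0 else 0.0

# Along the straight line (x, y) = h*(1, 2), f -> 0 as h -> 0 ...
for h in [1e-1, 1e-3, 1e-5]:
    print(f(h * 1.0, h * 2.0))   # tends to 0

# ... but along the parabola x = y^2 the value is constantly 1/2.
for y in [1e-1, 1e-3, 1e-5]:
    print(f(y**2, y))            # 0.5 each time
```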

Remark 653 We are now left with two problems:
1. Is there a definition of derivative whose existence implies continuity?
2. Is there any "easy" way to compute the directional derivative?


Appendix (to be corrected). There are other definitions of directional derivatives used in the literature. Let the following objects be given: m, n ∈ N\{0}, a set S ⊆ R^n, x_0 ∈ Int S, u ∈ R^n, h ∈ R such that x_0 + hu ∈ S, and f : S → R^m.

Definition 654 (our definition, following Apostol (1974)) We call the directional derivative of f at x_0 in the direction u according to Apostol, denoted by the symbol f'_A(x_0; u), the limit

\lim_{h\to 0} \frac{f(x_0 + hu) - f(x_0)}{h}    (14.7)

if it exists and is finite.

Definition 655 (Girsanov (1972)) We call the directional derivative of f at x_0 in the direction u according to Girsanov, denoted by the symbol f'_G(x_0; u), the limit

\lim_{h\to 0^+} \frac{f(x_0 + hu) - f(x_0)}{h}    (14.8)

if it exists and is finite.

Definition 656 (Wikipedia) Take u ∈ R^n such that ‖u‖ = 1. We call the directional derivative of f at x_0 in the direction u according to Wikipedia, denoted by the symbol f'_W(x_0; u), the limit

\lim_{h\to 0^+} \frac{f(x_0 + hu) - f(x_0)}{h}    (14.9)

if it exists and is finite.

Fact 1. For given x_0 ∈ S, u ∈ R^n,

A ⇒ G ⇒ W,

while the opposite implications do not hold true. In particular, to see why G does not imply A, just take f : R → R,

f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x ≥ 0, \end{cases}

and observe that while the right derivative at 0 is

\lim_{h\to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h\to 0^+} \frac{1-1}{h} = 0,

the left derivative is

\lim_{h\to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h\to 0^-} \frac{0-1}{h} = +∞.

Fact 2. For given x_0 ∈ S,

f'_W(x_0, u) exists ⇒ f'_G(x_0, v) exists for any v = αu with α ∈ R_{++}.

Proof.

f'_G(x_0, v) = \lim_{h\to 0^+} \frac{f(x_0+hv) - f(x_0)}{h} = α \lim_{h\to 0^+} \frac{f(x_0 + hαu) - f(x_0)}{αh} \stackrel{k:=αh>0}{=} α \lim_{k\to 0^+} \frac{f(x_0+ku) - f(x_0)}{k} = α f'_W(x_0, u).


Fact 3. Assume that u ≠ 0 and x_0 ∈ R^n. Then the following implications are true:

∀u ∈ R^n, f'_A(x_0, u) exists ⇔ ∀u ∈ R^n, f'_G(x_0, u) exists ⇔ ∀u ∈ R^n such that ‖u‖ = 1, f'_W(x_0, u) exists.

Proof. From Fact 1, we are left with showing just two implications.

G ⇒ A. We want to show that

∀u ∈ R^n, \lim_{h\to 0^+} \frac{f(x_0+hu) - f(x_0)}{h} ∈ R  ⇒  ∀v ∈ R^n, \lim_{h\to 0} \frac{f(x_0+hv) - f(x_0)}{h} ∈ R.

Therefore, it suffices to show that l := \lim_{h\to 0^-} \frac{f(x_0+hv) - f(x_0)}{h} ∈ R. Take u = −v. Then,

l = \lim_{h\to 0^-} \frac{f(x_0-hu) - f(x_0)}{h} = -\lim_{h\to 0^-} \frac{f(x_0-hu) - f(x_0)}{-h} \stackrel{k:=-h}{=} -\lim_{k\to 0^+} \frac{f(x_0+ku) - f(x_0)}{k} ∈ R.

W ⇒ G. The proof of this implication is basically the proof of Fact 2. We want to show that

∀u ∈ R^n with ‖u‖ = 1, \lim_{h\to 0^+} \frac{f(x_0+hu) - f(x_0)}{h} ∈ R  ⇒  ∀v ∈ R^n\{0}, l := \lim_{h\to 0^+} \frac{f(x_0+hv) - f(x_0)}{h} ∈ R.

In fact,

l = \lim_{h\to 0^+} \frac{f\left(x_0 + h\,‖v‖\,\frac{v}{‖v‖}\right) - f(x_0)}{h} ∈ R,

simply because ‖v/‖v‖‖ = 1.

Remark 657 We can give the following geometrical interpretation of directional derivatives. First of all, observe that from Proposition 660,

f'(x_0; u) := \lim_{h\to 0} \frac{f(x_0+hu) - f(x_0)}{h} = df_{x_0}(u).

If f : R → R, we then have f'(x_0; u) = f'(x_0)·u. Therefore, if u = 1, we have f'(x_0; u) = f'(x_0), and if u > 0, we have sign f'(x_0; u) = sign f'(x_0).

Take now f : R^2 → R admitting directional derivatives. Then

f'(x_0; u) = Df(x_0)·u with ‖u‖ = 1

is the slope of the graph of the function obtained cutting the graph of f with a plane which is orthogonal to the x_1-x_2 plane, along the line going through the points x_0 and x_0 + u, in the direction from x_0 to x_0 + u.


Chapter 15

Differentiability

15.1 Total Derivative and Differentiability

If f : R → R, we say that f is differentiable at x_0 if the following limit exists and is finite:

\lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h},

and we write

f'(x_0) = \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h},

or, in an equivalent manner,

\lim_{h\to 0} \frac{f(x_0+h) - f(x_0) - f'(x_0)\cdot h}{h} = 0

and

f(x_0+h) - (f(x_0) + f'(x_0)\cdot h) = r(h), where \lim_{h\to 0} \frac{r(h)}{h} = 0,

or

f(x_0+h) = f(x_0) + f'(x_0)\cdot h + r(h),

or, using what was said in Section 8.5, and more specifically using Definition 8.8,

f(x_0+h) = f(x_0) + l_{f'(x_0)}(h) + r(h),

where l_{f'(x_0)} ∈ L(R,R) and \lim_{h\to 0} \frac{r(h)}{h} = 0.

Definition 658 Given a set S ⊆ R^n, x_0 ∈ Int S, and f : S → R^m, we say that f is differentiable at x_0 if there exists

a linear function df_{x_0} : R^n → R^m

such that, for any u ∈ R^n, u ≠ 0, such that x_0 + u ∈ S,

\lim_{u\to 0} \frac{f(x_0+u) - f(x_0) - df_{x_0}(u)}{‖u‖} = 0.    (15.1)

In that case, the linear function df_{x_0} is called the total derivative, or the differential, or simply the derivative of f at x_0.


Remark 659 Obviously, given the condition of the previous Definition, we can say that f is differentiable at x_0 if there exists a linear function df_{x_0} : R^n → R^m such that, ∀u ∈ R^n with x_0 + u ∈ S,

f(x_0+u) = f(x_0) + df_{x_0}(u) + r(u), with \lim_{u\to 0} \frac{r(u)}{‖u‖} = 0,    (15.2)

or

f(x_0+u) = f(x_0) + df_{x_0}(u) + ‖u‖·E_{x_0}(u), with \lim_{u\to 0} E_{x_0}(u) = 0.    (15.3)

The above equations are called the first-order Taylor formula (of f at x_0 in the direction u). Condition (15.3) is the most convenient one to use in many instances.

Proposition 660 Assume that f : S → R^m is differentiable at x_0. Then

∀u ∈ R^n, f'(x_0; u) = df_{x_0}(u).

Proof.

f'(x_0; u) := \lim_{h\to 0} \frac{f(x_0+hu) - f(x_0)}{h} \stackrel{(1)}{=} \lim_{h\to 0} \frac{f(x_0) + df_{x_0}(hu) + ‖hu‖·E_{x_0}(hu) - f(x_0)}{h} \stackrel{(2)}{=} \lim_{h\to 0} \frac{h\,df_{x_0}(u) + |h|\,‖u‖·E_{x_0}(hu)}{h} \stackrel{(3)}{=} \lim_{h\to 0} df_{x_0}(u) + \lim_{h\to 0} \operatorname{sign}(h)·‖u‖·E_{x_0}(hu) \stackrel{(4)}{=} df_{x_0}(u) + ‖u‖ \lim_{hu\to 0} \operatorname{sign}(h)·E_{x_0}(hu) \stackrel{(5)}{=} df_{x_0}(u),

where
(1) follows from (15.3) with hu in the place of u,
(2) from the fact that df_{x_0} is linear and therefore (Exercise) continuous, and from a property of a norm,
(3) from the fact that |h|/h = sign(h),¹
(4) from the fact that h → 0 implies hu → 0,
(5) from the assumption that f is differentiable at x_0.

Remark 661 The above Proposition implies that if the differential exists, then it is unique, from the fact that the limit is unique, if it exists.

Proposition 662 If f : S → R^m is differentiable at x_0, then f is continuous at x_0.

Proof. We have to prove that

\lim_{u\to 0} f(x_0+u) - f(x_0) = 0;

i.e., from (15.3), it suffices to show that

\lim_{u\to 0} df_{x_0}(u) + ‖u‖·E_{x_0}(u) = df_{x_0}(0) + \lim_{u\to 0} ‖u‖·E_{x_0}(u) = 0,

where the first equality follows from the fact that df_{x_0} is linear and therefore continuous, and the second equality from the fact that, again since df_{x_0} is linear, df_{x_0}(0) = 0, and from (15.3).

Remark 663 The above Proposition is the answer to Question 1 in Remark 653. We still do not have an answer to Question 2, and another question naturally arises at this point:
3. Is there an "easy" way of checking differentiability?

¹ sign is the function defined as follows:

sign : R → {−1, 0, +1},  x ↦ \begin{cases} −1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ +1 & \text{if } x > 0. \end{cases}


15.2 Total Derivatives in terms of Partial Derivatives

In Remark 665 below, we answer Question 2 in Remark 653: is there any "easy" way to compute the directional derivative?

Proposition 664 Assume that f = (f_j)_{j=1}^m : S ⊆ R^n → R^m is differentiable at x_0. The matrix associated with df_{x_0} with respect to the canonical bases of R^n and R^m is the Jacobian matrix Df(x_0), i.e., using the notation of Section 8.5,

[df_{x_0}] = Df(x_0),

i.e.,

∀x ∈ R^n, df_{x_0}(x) = Df(x_0)·x.    (15.4)

Proof. From (8.6) in Section 8.5,

[df_{x_0}] = [\, df_{x_0}(e_n^1) \ \dots \ df_{x_0}(e_n^i) \ \dots \ df_{x_0}(e_n^n) \,]_{m×n}.

From Proposition 660,

∀i ∈ {1,...,n}, df_{x_0}(e^i) = f'(x_0; e^i),

and from (14.4),

f'(x_0; e^i) = (D_{x_i} f_j(x_0))_{j=1}^m.

Then

[df_{x_0}] = [\, (D_{x_1} f_j(x_0))_{j=1}^m \ \dots \ (D_{x_i} f_j(x_0))_{j=1}^m \ \dots \ (D_{x_n} f_j(x_0))_{j=1}^m \,]_{m×n},

as desired.

Remark 665 From Proposition 660 and the above Proposition 664, we have that if f is differentiable at x_0, then

∀u ∈ R^n, f'(x_0; u) = Df(x_0)·u.
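This identity is easy to test numerically (a sketch not in the original notes; NumPy assumed): for a differentiable f, the difference quotient in direction u should match the matrix-vector product Df(x_0)·u.

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([np.sin(x) * y, x**2 + y**2])

def jacobian(f, x0, h=1e-6):
    cols = [(f(x0 + h * e) - f(x0 - h * e)) / (2 * h) for e in np.eye(x0.size)]
    return np.column_stack(cols)

def dir_deriv(f, x0, u, h=1e-6):
    return (f(x0 + h * u) - f(x0 - h * u)) / (2 * h)

x0 = np.array([0.7, -1.3]); u = np.array([2.0, 0.5])
print(dir_deriv(f, x0, u))   # f'(x0; u)
print(jacobian(f, x0) @ u)   # Df(x0) . u -- same vector
```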

Remark 666 From (15.4), we get

‖df_{x_0}(x)‖ = ‖[Df(x_0)]_{m×n}\,x‖ \stackrel{(1)}{≤} \sum_{j=1}^m |Df_j(x_0)·x| \stackrel{(2)}{≤} \sum_{j=1}^m ‖Df_j(x_0)‖·‖x‖,

where (1) follows from Remark 56 and (2) from the Cauchy-Schwarz inequality in (53). Therefore, defining α := \sum_{j=1}^m ‖Df_j(x_0)‖, we have

‖df_{x_0}(x)‖ ≤ α·‖x‖  and  \lim_{x\to 0} ‖df_{x_0}(x)‖ = 0.

Remark 667 We have seen that

f differentiable at x_0  ⇒  f admits directional derivatives at x_0  ⇒  Df(x_0) exists
          ⇓                            (not ⇓)                              (not ⇓)
f continuous at x_0             f continuous at x_0                   f continuous at x_0

Therefore,

f differentiable at x_0 ⇍ Df(x_0) exists

and

f differentiable at x_0 ⇍ f admits directional derivatives at x_0.

We still do not have an answer to Question 3 in Remark 663: is there an easy way of checking differentiability? We will provide an answer in Proposition 695.


Chapter 16

Some Theorems

We first introduce some needed definitions.

Definition 668 Given an open S ⊆ R^n,

f : S → R^m,  x := (x_j)_{j=1}^n ↦ f(x) = (f_i(x))_{i=1}^m,

I ⊆ {1,...,m} and J ⊆ {1,...,n}, the partial Jacobian of (f_i)_{i∈I} with respect to (x_j)_{j∈J} at x_0 ∈ S is the following (#I)×(#J) submatrix of Df(x_0):

\left[ \frac{∂f_i(x_0)}{∂x_j} \right]_{i∈I,\,j∈J},

and it is denoted by D_{(x_j)_{j∈J}} (f_i)_{i∈I}(x_0).

Example 669 Take: S an open subset of R^{n_1}, with generic element x' = (x_j)_{j=1}^{n_1}; T an open subset of R^{n_2}, with generic element x'' = (x_k)_{k=n_1+1}^{n}; and

f : S × T → R^m,  (x', x'') ↦ f(x', x'').

Then, defining n = n_1 + n_2, we have

D_{x'} f(x_0) = \begin{bmatrix} D_{x_1}f_1(x_0) & \dots & D_{x_{n_1}}f_1(x_0) \\ \vdots & & \vdots \\ D_{x_1}f_i(x_0) & \dots & D_{x_{n_1}}f_i(x_0) \\ \vdots & & \vdots \\ D_{x_1}f_m(x_0) & \dots & D_{x_{n_1}}f_m(x_0) \end{bmatrix}_{m×n_1}

and, similarly,

D_{x''} f(x_0) = \begin{bmatrix} D_{x_{n_1+1}}f_1(x_0) & \dots & D_{x_n}f_1(x_0) \\ \vdots & & \vdots \\ D_{x_{n_1+1}}f_i(x_0) & \dots & D_{x_n}f_i(x_0) \\ \vdots & & \vdots \\ D_{x_{n_1+1}}f_m(x_0) & \dots & D_{x_n}f_m(x_0) \end{bmatrix}_{m×n_2},

and therefore

Df(x_0) = [\, D_{x'} f(x_0) \ |\ D_{x''} f(x_0) \,]_{m×n}.

Definition 670 Given an open set S ⊆ R^n and f : S → R, assume that ∀x ∈ S, Df(x) := (∂f(x)/∂x_j)_{j=1}^n exists. Then, ∀j ∈ {1,...,n}, we define the j-th partial derivative function as

\frac{∂f}{∂x_j} : S → R,  x ↦ \frac{∂f(x)}{∂x_j}.


Assuming that the above function has a partial derivative with respect to x_k for k ∈ {1,...,n}, we define it as the mixed second-order partial derivative of f with respect to x_j and x_k, and we write

\frac{∂^2 f(x)}{∂x_j ∂x_k} := \frac{∂\left(∂f(x)/∂x_j\right)}{∂x_k}.

Definition 671 Given f : S ⊆ R^n → R, the Hessian matrix of f at x_0 is the n×n matrix

D^2 f(x_0) := \left[ \frac{∂^2 f}{∂x_j ∂x_k}(x_0) \right]_{j,k=1,...,n}.

Remark 672 D^2 f(x_0) is the Jacobian matrix of the gradient function of f.

Example 673 Compute the Hessian matrix of f : R^4_{++} → R,

f(x,y,z,t) = e^x \cos y + z^2 + x^2 \log y + \log x + \log z + 2t\log t.

We first compute the gradient:

\begin{pmatrix} 2x\log y + (\cos y)e^x + 1/x \\ -(\sin y)e^x + x^2/y \\ 2z + 1/z \\ 2\log t + 2 \end{pmatrix},

and then the Hessian matrix:

\begin{bmatrix} 2\log y + (\cos y)e^x - 1/x^2 & -(\sin y)e^x + 2x/y & 0 & 0 \\ -(\sin y)e^x + 2x/y & -(\cos y)e^x - x^2/y^2 & 0 & 0 \\ 0 & 0 & -1/z^2 + 2 & 0 \\ 0 & 0 & 0 & 2/t \end{bmatrix}
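A numerical sketch (not in the original notes; NumPy assumed) can approximate this Hessian by second differences; note that the result is symmetric, in line with Proposition 696 below.

```python
import numpy as np

def f(v):
    x, y, z, t = v
    return (np.exp(x) * np.cos(y) + z**2 + x**2 * np.log(y)
            + np.log(x) + np.log(z) + 2 * t * np.log(t))

def hessian(f, x0, h=1e-4):
    # D^2 f(x0) entry by entry, via central second differences
    n = x0.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for j in range(n):
        for k in range(n):
            H[j, k] = (f(x0 + h*I[j] + h*I[k]) - f(x0 + h*I[j] - h*I[k])
                       - f(x0 - h*I[j] + h*I[k]) + f(x0 - h*I[j] - h*I[k])) / (4*h*h)
    return H

x0 = np.array([1.0, 2.0, 0.5, 1.5])   # in R^4_{++}
H = hessian(f, x0)
print(np.allclose(H, H.T))  # True: mixed partials coincide (cf. Proposition 696)
```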

16.1 The chain rule

Proposition 674 (Chain Rule) Given S ⊆ R^n, T ⊆ R^m,

f : S ⊆ R^n → R^m

such that Im f ⊆ T, and

g : T ⊆ R^m → R^p,

assume that f and g are differentiable at x_0 and y_0 = f(x_0), respectively. Then

h : S ⊆ R^n → R^p,  h(x) = (g ∘ f)(x)

is differentiable at x_0, and

dh_{x_0} = dg_{f(x_0)} ∘ df_{x_0}.

Proof. We want to show that there exists a linear function dh_{x_0} : R^n → R^p such that

h(x_0+u) = h(x_0) + dh_{x_0}(u) + ‖u‖·E*_{x_0}(u), with \lim_{u\to 0} E*_{x_0}(u) = 0,

and dh_{x_0} = dg_{f(x_0)} ∘ df_{x_0}. Taking u sufficiently small (in order to have x_0 + u ∈ S), we have

h(x_0+u) − h(x_0) = g[f(x_0+u)] − g[f(x_0)] = g[f(x_0+u)] − g(y_0),

and, defining

v = f(x_0+u) − y_0,

we get

h(x_0+u) − h(x_0) = g(y_0+v) − g(y_0).

Since f is differentiable at x_0, we get

v = df_{x_0}(u) + ‖u‖·E_{x_0}(u), with \lim_{u\to 0} E_{x_0}(u) = 0.    (16.1)

Since g is differentiable at y_0 = f(x_0), we get

g(y_0+v) − g(y_0) = dg_{y_0}(v) + ‖v‖·E_{y_0}(v), with \lim_{v\to 0} E_{y_0}(v) = 0.    (16.2)

Inserting (16.1) in (16.2), we get

g(y_0+v) − g(y_0) = dg_{y_0}(df_{x_0}(u) + ‖u‖·E_{x_0}(u)) + ‖v‖·E_{y_0}(v) = dg_{y_0}(df_{x_0}(u)) + ‖u‖·dg_{y_0}(E_{x_0}(u)) + ‖v‖·E_{y_0}(v).

Defining

E*_{x_0}(u) := \begin{cases} 0 & \text{if } u = 0 \\ dg_{y_0}(E_{x_0}(u)) + \frac{‖v‖}{‖u‖}·E_{y_0}(v) & \text{if } u ≠ 0, \end{cases}

we are left with showing that

\lim_{u\to 0} E*_{x_0}(u) = 0.

Observe that

\lim_{u\to 0} dg_{y_0}(E_{x_0}(u)) = 0,

since linear functions are continuous and from (16.1). Moreover, since \lim_{u\to 0} v = 0, from (16.2) we get

\lim_{u\to 0} E_{y_0}(v) = 0.

Finally, we have to show that ‖v‖/‖u‖ is bounded as u → 0. From (16.1) and Remark 666, defining α := \sum_{j=1}^m ‖Df_j(x_0)‖,

‖v‖ = ‖df_{x_0}(u) + ‖u‖·E_{x_0}(u)‖ ≤ ‖df_{x_0}(u)‖ + ‖u‖·‖E_{x_0}(u)‖ ≤ (α + ‖E_{x_0}(u)‖)·‖u‖,

so that for u ≠ 0,

\frac{‖v‖}{‖u‖} ≤ α + ‖E_{x_0}(u)‖,

which is bounded near u = 0, as desired.

Remark 675 From Proposition 664 and Proposition 302, or simply (8.10), we also have

Dh(x_0)_{p×n} = Dg(f(x_0))_{p×m} · Df(x_0)_{m×n}.

Observe that Dg(f(x_0)) is obtained computing Dg(y) and then substituting f(x_0) in the place of y. We therefore also write Dg(f(x_0)) = Dg(y)|_{y=f(x_0)}.

Exercise 676 Compute dh_{x_0} if n = 1, p = 1.

Definition 677 Given f : S ⊆ R^n → R^{m_1}, g : S ⊆ R^n → R^{m_2},

(f,g) : S ⊆ R^n → R^{m_1+m_2},  (f,g)(x) = (f(x), g(x)).

Remark 678 Clearly,

D(f,g)(x_0) = \begin{bmatrix} Df(x_0) \\ Dg(x_0) \end{bmatrix}

Example 679 Given

f : R → R^2,  x ↦ (\sin x, \cos x),

g : R^2 → R^2,  (y_1, y_2) ↦ (y_1 + y_2,\, y_1·y_2),

h = g ∘ f : R → R^2,  x ↦ (\sin x + \cos x,\, \sin x·\cos x),

verify the conclusion of the Chain Rule Proposition.

Df(x) = \begin{bmatrix} \cos x \\ -\sin x \end{bmatrix},  Dg(y) = \begin{bmatrix} 1 & 1 \\ y_2 & y_1 \end{bmatrix},  Dg(f(x)) = \begin{bmatrix} 1 & 1 \\ \cos x & \sin x \end{bmatrix},

Dg(f(x))·Df(x) = \begin{bmatrix} 1 & 1 \\ \cos x & \sin x \end{bmatrix} \begin{bmatrix} \cos x \\ -\sin x \end{bmatrix} = \begin{bmatrix} \cos x - \sin x \\ \cos^2 x - \sin^2 x \end{bmatrix} = Dh(x).
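The same verification can be done numerically (a sketch not in the original notes; NumPy assumed): the chain-rule product Dg(f(x))·Df(x) should match a difference-quotient approximation of Dh(x).

```python
import numpy as np

f = lambda x: np.array([np.sin(x), np.cos(x)])
g = lambda y: np.array([y[0] + y[1], y[0] * y[1]])
h = lambda x: g(f(x))

def deriv(F, x, eps=1e-6):
    # central difference; works componentwise on vector-valued F
    return (F(x + eps) - F(x - eps)) / (2 * eps)

x = 0.8
Df = np.array([np.cos(x), -np.sin(x)])          # Df(x), 2x1
Dg = np.array([[1.0, 1.0],
               [np.cos(x), np.sin(x)]])          # Dg(f(x)), 2x2
print(Dg @ Df)        # chain-rule product
print(deriv(h, x))    # numerical Dh(x) -- same vector
```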

Example 680 Take

g : R^k → R^n,  t ↦ g(t),

f : R^n × R^k → R^m,  (x,t) ↦ f(x,t),

h : R^k → R^m,  t ↦ f(g(t), t).

Then

g̃ := (g, id_{R^k}) : R^k → R^n × R^k,  t ↦ (g(t), t),

and

h = f ∘ g̃ = f ∘ (g, id_{R^k}).

Therefore, assuming that f, g, h are differentiable,

[Dh(t_0)]_{m×k} = [Df(g(t_0), t_0)]_{m×(n+k)} · \begin{bmatrix} Dg(t_0) \\ I \end{bmatrix}_{(n+k)×k} = \left[\, [D_x f(g(t_0),t_0)]_{m×n} \ |\ [D_t f(g(t_0),t_0)]_{m×k} \,\right] · \begin{bmatrix} [Dg(t_0)]_{n×k} \\ I_{k×k} \end{bmatrix} = [D_x f(g(t_0),t_0)]_{m×n}·[Dg(t_0)]_{n×k} + [D_t f(g(t_0),t_0)]_{m×k}.

In the case k = n = m = 1, the above expression becomes

\frac{df(g(t),t)}{dt} = \frac{∂f(g(t),t)}{∂x}\,\frac{dg(t)}{dt} + \frac{∂f(g(t),t)}{∂t},

or

\frac{df(g(t),t)}{dt} = \frac{∂f(x,t)}{∂x}\Big|_{x=g(t)}\,\frac{dg(t)}{dt} + \frac{∂f(x,t)}{∂t}\Big|_{x=g(t)}.
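The scalar case k = n = m = 1 is easy to test (a sketch not in the original notes; NumPy assumed), with hypothetical choices g(t) = t² and f(x,t) = x·sin t:

```python
import numpy as np

g = lambda t: t**2                 # x = g(t)
f = lambda x, t: x * np.sin(t)     # f(x, t)
h = lambda t: f(g(t), t)

t0 = 1.3
# chain-rule value: f_x * g'(t) + f_t, with f_x = sin t, f_t = x cos t
lhs = np.sin(t0) * 2 * t0 + g(t0) * np.cos(t0)
# numerical total derivative dh/dt
eps = 1e-6
rhs = (h(t0 + eps) - h(t0 - eps)) / (2 * eps)
print(lhs, rhs)  # agree to ~1e-9
```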

16.2 Mean value theorem

Proposition 681 (Mean Value Theorem) Let S be an open subset of R^n and f : S → R^m a differentiable function. Let x, y ∈ S be such that the line segment joining them is contained in S, i.e.,

L(x,y) := {z ∈ R^n : ∃λ ∈ [0,1] such that z = (1−λ)x + λy} ⊆ S.

Then

∀a ∈ R^m, ∃z ∈ L(x,y) such that a·[f(y) − f(x)] = a·[Df(z)·(y−x)].

Remark 682 Under the assumptions of the above theorem, the following conclusion is false:

∃z ∈ L(x,y) such that f(y) − f(x) = Df(z)·(y−x).

But if f : S → R^{m=1}, then setting a ∈ R^{m=1} equal to 1, we get that the above statement is indeed true.


Proof of Proposition 681. Define u = y − x. Since S is open and L(x,y) ⊆ S, ∃δ > 0 such that ∀t ∈ (−δ, 1+δ), x + tu = (1−t)x + ty ∈ S. Taken a ∈ R^m, define

F : (−δ, 1+δ) → R,  t ↦ a·f(x+tu) = \sum_{j=1}^m a_j f_j(x+tu).

Then

F'(t) = \sum_{j=1}^m a_j·[Df_j(x+tu)]_{1×n}·u_{n×1} = a_{1×m}·[Df(x+tu)]_{m×n}·u_{n×1},

and F is continuous on [0,1] and differentiable on (0,1); then we can apply the "Calculus 1" Mean Value Theorem and conclude that

∃θ ∈ (0,1) such that F(1) − F(0) = F'(θ),

and, by definition of F and u,

∃θ ∈ (0,1) such that a·[f(y) − f(x)] = a·Df(x+θu)·(y−x),

which, choosing z = x + θu, gives the desired result.

Remark 683 Using the results we have seen on directional derivatives, the conclusion of the above theorem can be rewritten as follows:

∀a ∈ R^m, ∃z ∈ L(x,y) such that a·[f(y) − f(x)] = a·f'(z; y−x).

As in the case of real functions of a real variable, the Mean Value Theorem allows us to give a simple relationship between the sign of the derivative and monotonicity.

Definition 684 A set C ⊆ Rn is convex if ∀x1, x2 ∈ C and ∀λ ∈ [0, 1], (1− λ)x1 + λx2 ∈ C.

Proposition 685 Let S be an open and convex subset of R^n and f : S → R^m a differentiable function. If ∀x ∈ S, df_x = 0, then f is constant on S.

Proof. Take arbitrary x, y ∈ S. Since S is convex and f is differentiable, from the Mean Value Theorem we have that

∀a ∈ R^m, ∃z ∈ L(x,y) such that a·[f(y) − f(x)] = a·[Df(z)·(y−x)] = 0.

Taking a = f(y) − f(x), we get

‖f(y) − f(x)‖² = 0,

and therefore

f(x) = f(y),

as desired.

Definition 686 Given x := (x_i)_{i=1}^n, y := (y_i)_{i=1}^n ∈ R^n,

x ≥ y means ∀i ∈ {1,...,n}, x_i ≥ y_i;

x > y means x ≥ y ∧ x ≠ y;

x ≫ y means ∀i ∈ {1,...,n}, x_i > y_i.

Definition 687 f : S ⊆ R^n → R is increasing if ∀x, y ∈ S, x > y ⇒ f(x) ≥ f(y); f is strictly increasing if ∀x, y ∈ S, x > y ⇒ f(x) > f(y).

Proposition 688 Take an open, convex subset S of R^n, and f : S → R differentiable.
1. If ∀x ∈ S, Df(x) ≥ 0, then f is increasing;
2. If ∀x ∈ S, Df(x) ≫ 0, then f is strictly increasing.


Proof. 1. Take y > x. Since S is convex, L(x,y) ⊆ S. Then, from the Mean Value Theorem (in the real-valued form of Remark 682),

∃z ∈ L(x,y) such that f(y) − f(x) = Df(z)·(y−x).

Since y − x ≥ 0 and Df(z) ≥ 0, the result follows.
2. Take y > x. Since S is convex, L(x,y) ⊆ S. Then, from the Mean Value Theorem,

∃z ∈ L(x,y) such that f(y) − f(x) = Df(z)·(y−x).

Since y − x > 0 and Df(z) ≫ 0, the result follows.

Exercise 689 Is the following statement correct: "If ∀x ∈ S, Df(x) > 0, then f is strictly increasing"?

Corollary 690 Take an open, convex subset S of R^n, and f ∈ C¹(S,R). If ∃x_0 ∈ S and u ∈ R^n\{0} such that f'(x_0; u) > 0, then ∃t̄ ∈ R_{++} such that ∀t ∈ [0, t̄),

f(x_0 + tu) ≥ f(x_0).

Proof. Since f is C¹ and f'(x_0; u) = Df(x_0)·u > 0, ∃r > 0 such that

∀x ∈ B(x_0, r), f'(x; u) > 0.

Then ∀t ∈ (−r, r), ‖x_0 + \frac{t}{‖u‖}u − x_0‖ = |t| < r, and therefore

f'\left(x_0 + \frac{t}{‖u‖}u;\, u\right) > 0.

Then, from the Mean Value Theorem, ∀t ∈ [0, r/(2‖u‖)] there is θ ∈ (0,1) such that

f(x_0 + tu) − f(x_0) = t·f'(x_0 + θtu; u) ≥ 0,

so the claim holds with t̄ = r/(2‖u‖).

Definition 691 Given a function f : S ⊆ R^n → R, x_0 ∈ S is a point of local maximum for f if

∃δ > 0 such that ∀x ∈ B(x_0,δ) ∩ S, f(x_0) ≥ f(x);

x_0 is a point of global maximum for f if

∀x ∈ S, f(x_0) ≥ f(x);

x_0 ∈ S is a point of strict local maximum for f if

∃δ > 0 such that ∀x ∈ (B(x_0,δ) ∩ S)\{x_0}, f(x_0) > f(x);

x_0 is a point of strict global maximum for f if

∀x ∈ S\{x_0}, f(x_0) > f(x).

Local, global, and strict minima are defined in the obvious manner.

Proposition 692 If S ⊆ R^n, f : S → R admits all partial derivatives at x_0 ∈ Int S, and x_0 is a point of local maximum or minimum, then Df(x_0) = 0.

Proof. Say x_0 is a point of local maximum; then ∃δ > 0 such that ∀x ∈ B(x_0,δ), f(x_0) ≥ f(x). As in Remark 637, for any k ∈ {1,...,n}, define

g_k : R → R,  g_k(x_k) = f(x_0 + (x_k − x_{0k})e_n^k).

Then g_k has a local maximum point at x_{0k}. Then, from Calculus 1,

g_k'(x_{0k}) = 0.

Since, again from Remark 637, we have

D_k f(x_0) = g_k'(x_{0k}),

the result follows.


16.3 A sufficient condition for differentiability

Definition 693 f = (f_i)_{i=1}^m : S ⊆ R^n → R^m is continuously differentiable on A ⊆ S, or f is C¹ on A, or f ∈ C¹(A, R^m), if ∀i ∈ {1,...,m}, j ∈ {1,...,n},

D_{x_j} f_i : A → R,  x ↦ D_{x_j} f_i(x) is continuous.

Definition 694 f : S ⊆ R^n → R is twice continuously differentiable on A ⊆ S, or f is C² on A, or f ∈ C²(A, R), if ∀j, k ∈ {1,...,n},

\frac{∂^2 f}{∂x_j ∂x_k} : A → R,  x ↦ \frac{∂^2 f}{∂x_j ∂x_k}(x) is continuous.

Proposition 695 If f is C¹ in an open neighborhood of x_0, then it is differentiable at x_0.

Proof. See Apostol (1974), page 357, or the section "Existence of the derivative", page 232, in Bartle (1964), where the significant case f : R^n → R^{m=1} is presented using the Cauchy-Schwarz inequality. See also Theorem 1, page 197, in Taylor and Mann (1984), for the case f : R^2 → R.

The above result is the answer to Question 3 in Remark 663. To show that f : R^n → R^m is differentiable, it is enough to verify that all its partial derivatives, i.e., the entries of the Jacobian matrix, are continuous functions.

16.4 A sufficient condition for equality of mixed partial derivatives

Proposition 696 If f : S ⊆ R^n → R is C² in an open neighborhood of x_0, then ∀i, k,

\frac{∂(∂f/∂x_i)}{∂x_k}(x_0) = \frac{∂(∂f/∂x_k)}{∂x_i}(x_0),

or

\frac{∂^2 f}{∂x_i ∂x_k}(x_0) = \frac{∂^2 f}{∂x_k ∂x_i}(x_0).

Proof. See Apostol (1974), Section 12.13, page 358.

16.5 Taylor's theorem for real valued functions

To get Taylor's theorem for functions f : R^n → R, we introduce some notation in line with the definition of directional derivative:

f'(x, u) = \sum_{i=1}^n D_{x_i} f(x)·u_i.

Definition 697 Assume S is an open subset of R^n and the function f : S → R admits partial derivatives at least up to order m, and x ∈ S, u ∈ R^n. Then

f''(x, u) := \sum_{i=1}^n \sum_{j=1}^n D_{i,j} f(x)·u_i·u_j,

f'''(x, u) := \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n D_{i,j,k} f(x)·u_i·u_j·u_k,

and a similar definition applies to f^{(m)}(x, u).


Proposition 698 (Taylor's formula) Assume S is an open subset of R^n and the function f : S → R admits partial derivatives at least up to order m; assume also that all its partial derivatives of order < m are differentiable. If y and x are such that L(y,x) ⊆ S, then there exists z ∈ L(y,x) such that

f(y) = f(x) + \sum_{k=1}^{m-1} \frac{1}{k!} f^{(k)}(x, y−x) + \frac{1}{m!} f^{(m)}(z, y−x).

Proof. Since S is open and L(x,y) ⊆ S, ∃δ > 0 such that ∀t ∈ (−δ, 1+δ), x + t(y−x) ∈ S. Define g : (−δ, 1+δ) → R,

g(t) = f(x + t(y−x)).

From the standard "Calculus 1" Taylor's theorem, we have that ∃θ ∈ (0,1) such that

f(y) − f(x) = g(1) − g(0) = \sum_{k=1}^{m-1} \frac{1}{k!} g^{(k)}(0) + \frac{1}{m!} g^{(m)}(θ).

Then

g'(t) = Df(x+t(y−x))·(y−x) = \sum_{i=1}^n D_{x_i} f(x+t(y−x))·(y_i − x_i) = f'(x+t(y−x), y−x),

g''(t) = \sum_{i=1}^n \sum_{j=1}^n D_{x_i,x_j} f(x+t(y−x))·(y_i−x_i)·(y_j−x_j) = f''(x+t(y−x), y−x),

and similarly

g^{(m)}(t) = f^{(m)}(x+t(y−x), y−x).

Then the desired result follows substituting 0 in place of t where needed and choosing z = x + θ(y−x).
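As a numerical illustration of the first-order case (a sketch not in the original notes; NumPy assumed), the remainder of the first-order Taylor formula should shrink roughly like the square of the step size:

```python
import numpy as np

f = lambda v: np.exp(v[0]) * np.cos(v[1])
Df = lambda v: np.array([np.exp(v[0]) * np.cos(v[1]),
                         -np.exp(v[0]) * np.sin(v[1])])

x = np.array([0.2, 0.5])
u = np.array([0.1, -0.2])
for s in [1.0, 0.5, 0.25]:           # shrink the step
    err = abs(f(x + s * u) - (f(x) + Df(x) @ (s * u)))
    print(s, err)                    # error falls roughly like s**2
```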


Chapter 17

Implicit function theorem

17.1 Some intuition

Below, we present an informal discussion of the Implicit Function Theorem. Assume that

f : R^2 → R,  (x,t) ↦ f(x,t)

is at least C¹. The basic goal is to study the nonlinear equation

f(x,t) = 0,

where x can be interpreted as an endogenous variable and t as a parameter (or an exogenous variable). Assume that

∃(x_0, t_0) ∈ R^2 such that f(x_0, t_0) = 0,

and that for some ε > 0

∃ a C¹ function g : (t_0 − ε, t_0 + ε) → R,  t ↦ g(t)

such that

f(g(t), t) = 0.    (17.1)

We can then say that g describes the solution to the equation

f(x,t) = 0,

in the unknown variable x and parameter t, in an open neighborhood of t_0. Therefore, using the Chain Rule, and in fact Example 680, applied to both sides of (17.1), we get

\frac{∂f(x,t)}{∂x}\Big|_{x=g(t)}\,\frac{dg(t)}{dt} + \frac{∂f(x,t)}{∂t}\Big|_{x=g(t)} = 0,

and, assuming that

\frac{∂f(x,t)}{∂x}\Big|_{x=g(t)} ≠ 0,

we have

\frac{dg(t)}{dt} = -\frac{∂f(x,t)/∂t\,|_{x=g(t)}}{∂f(x,t)/∂x\,|_{x=g(t)}}.    (17.2)

The above expression is the derivative of the function implicitly defined by (17.1) close to the value t_0. In other words, it is the slope of the level curve f(x,t) = 0 at the point (t, g(t)). For example, taking

f : R^2 → R,  (x,t) ↦ x^2 + t^2 − 1,

f(x,t) = 0 describes the circle with center at the origin and radius equal to 1. Putting t on the horizontal axis and x on the vertical axis, we have the following picture.

[Figure: the unit circle x² + t² = 1 in the (t,x) plane; omitted.]

Clearly,

f(0, 1) = 0.

As long as t ∈ (−1,1), g(t) = \sqrt{1−t^2} is such that

f(g(t), t) = 0.    (17.3)

Observe that

\frac{d\sqrt{1−t^2}}{dt} = -\frac{t}{\sqrt{1−t^2}}

and

-\frac{∂f(x,t)/∂t\,|_{x=g(t)}}{∂f(x,t)/∂x\,|_{x=g(t)}} = -\frac{2t}{2x}\Big|_{x=g(t)} = -\frac{t}{\sqrt{1−t^2}}.

For example, for t = 1/\sqrt{2},

g'(t) = -\frac{1/\sqrt{2}}{\sqrt{1−1/2}} = −1.
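A quick numerical confirmation of this computation (a sketch not in the original notes; NumPy assumed):

```python
import numpy as np

# Circle f(x, t) = x^2 + t^2 - 1; explicit branch x = g(t) = sqrt(1 - t^2).
g = lambda t: np.sqrt(1.0 - t**2)

t0 = 1.0 / np.sqrt(2.0)
x0 = g(t0)

# implicit-function formula (17.2): g'(t) = -f_t / f_x = -(2t)/(2x)
implicit = -(2 * t0) / (2 * x0)

# direct numerical derivative of g
eps = 1e-7
direct = (g(t0 + eps) - g(t0 - eps)) / (2 * eps)
print(implicit, direct)  # both ~ -1.0
```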

Let's try to present a more detailed geometrical interpretation.¹ Consider the set

{(x,t) ∈ R^2 : f(x,t) = 0}

presented in the following picture. [Insert picture a., page 80.] In this case, does the equation

f(x,t) = 0    (17.4)

define x as a function of t? Certainly, the curve presented in the picture is not the graph of a function with x as dependent variable and t as independent variable for all values of t in R. In fact:
1. if t ∈ (−∞, t_1], there is only one value of x which satisfies equation (17.4);
2. if t ∈ (t_1, t_2), there are two values of x for which f(x,t) = 0;
3. if t ∈ (t_2, +∞), there are no values satisfying the equation.

If we consider t belonging to an interval contained in (t_1, t_2), we have to restrict the admissible range of variation of x in order to conclude that equation (17.4) defines x as a function of t in that interval. For example, we see that if the rectangle R is as indicated in the picture, the given equation defines x as a function of t, for a well chosen domain and codomain naturally associated with R. The graph of that function is indicated in the figure below. [Insert picture b., page 80.]

¹ This discussion is taken from Sydsaeter (1981), pages 80-81.


The size of R is limited by the fact that we need to define a function, and therefore one and only one value has to be associated with t. Similar rectangles and associated solutions to the equation can be constructed for all other points on the curve, except one: (t_2, x_2). Irrespective of how small we choose the rectangle around that point, there will be values of t close to t_2, say t', such that there are two values of x, say x' and x'', with the property that both (t', x') and (t', x'') satisfy the equation and lie inside the rectangle. Therefore, equation (17.4) does not define x as a function of t in an open neighborhood of the point (t_2, x_2). In fact, there the slope of the tangent to the curve is infinite. If you try to use expression (17.2) to compute the slope of the curve defined by x^2 + t^2 = 1 at the point (t,x) = (1,0), you get an expression with zero in the denominator.

On the basis of the above discussion, we see that it is crucial to require the condition

\frac{∂f(x,t)}{∂x}\Big|_{x=g(t)} ≠ 0

to insure the possibility of locally writing x as a solution function (to (17.4)) of t. We can informally summarize what we said as follows: if f is C¹, f(x_0,t_0) = 0 and ∂f(x,t)/∂x\,|_{(x,t)=(x_0,t_0)} ≠ 0, then f(x,t) = 0 defines x as a C¹ function g of t in an open neighborhood of t_0, and

g'(t) = -\frac{∂f(x,t)/∂t}{∂f(x,t)/∂x}\Big|_{x=g(t)}.

The next sections provide a formal statement and proof of the Implicit Function Theorem. Some work is needed.

Next sections provide a formal statement and proof of the Implicit Function Theorem. Somework is needed.

17.2 Functions with full rank square JacobianProposition 699 Taken a ∈ Rn, r ∈ R++, assume that1. f := (fi)

ni=1 : Rn → Rn is continuous on Cl (B (a, r)) ;

2. ∀x ∈ B (a, r) , [Df (x)]n×n exists and detDf (x) 6= 0;3. ∀x ∈ F (B (a, r)) , f (x) 6= f (a) .Then, ∃δ ∈ R++ such that

f (B (a, r)) ⊇ B (f (a) , δ) .

Proof. Define B := B(a,r) and

g : F(B) → R,  x ↦ ‖f(x) − f(a)‖.

From Assumption 3, ∀x ∈ F(B), g(x) > 0. Moreover, since g is continuous and F(B) is compact, g attains a global minimum value m > 0 on F(B). Take δ = m/2; to prove the desired result, it is enough to show that T := B(f(a), δ) ⊆ f(B), i.e., ∀y ∈ T, y ∈ f(B). Define

h : Cl(B) → R,  x ↦ ‖f(x) − y‖.

Since h is continuous and Cl(B) is compact, h attains a global minimum at a point c ∈ Cl(B). We now want to show that c ∈ B. Observe that, since y ∈ T = B(f(a), m/2),

h(a) = ‖f(a) − y‖ < m/2.    (17.5)

Therefore, since c is a global minimum point for h, it must be the case that h(c) < m/2. Now take x ∈ F(B); then

h(x) = ‖f(x) − y‖ = ‖f(x) − f(a) − (y − f(a))‖ \stackrel{(1)}{≥} ‖f(x) − f(a)‖ − ‖y − f(a)‖ \stackrel{(2)}{≥} g(x) − \frac{m}{2} \stackrel{(3)}{≥} \frac{m}{2},

where (1) follows from Remark 54, (2) from (17.5), and (3) from the fact that g has minimum value equal to m. Therefore ∀x ∈ F(B), h(x) ≥ m/2 > h(a) ≥ h(c), and h does not attain its minimum on F(B); hence c ∈ B. Then h, and hence h², attains its minimum at c ∈ B.² Since

H(x) := h²(x) = ‖f(x) − y‖² = \sum_{i=1}^n (f_i(x) − y_i)²,

² ∀x ∈ B, h(x) ≥ 0 and h(x) ≥ h(c). Therefore h²(x) ≥ h²(c).


from Proposition 692, DH(c) = 0, i.e.,

∀k ∈ {1,...,n},  2\sum_{i=1}^n D_{x_k} f_i(c)·(f_i(c) − y_i) = 0,

i.e.,

[Df(c)]ᵀ_{n×n}\,(f(c) − y)_{n×1} = 0.

Then, from Assumption 2,

f(c) = y,

i.e., since c ∈ B, y ∈ f(B), as desired.

Proposition 700 (1st sufficient condition for openness of a function) Let an open set A ⊆ R^n and a function f : A → R^n be given. If
1. f is continuous,
2. f is one-to-one,
3. ∀x ∈ A, Df(x) exists and det Df(x) ≠ 0,
then f is open.

Proof. Taken b ∈ f(A), there exists a ∈ A such that f(a) = b. Since A is open, there exists r ∈ R_{++} such that B(a,r) ⊆ A. Moreover, since f is one-to-one and a ∉ F(B(a,r)), ∀x ∈ F(B(a,r)), f(x) ≠ f(a). Then,³ for sufficiently small r, Cl(B(a,r)) ⊆ A, the assumptions of Proposition 699 are satisfied, and there exists δ ∈ R_{++} such that

f(A) ⊇ f(Cl(B(a,r))) ⊇ B(f(a), δ),

as desired.

Definition 701 Given f : S → T and A ⊆ S, the function f_{|A} is defined as follows:

f_{|A} : A → f(A),  f_{|A}(x) = f(x).

Proposition 702 Let an open set A ⊆ R^n and a function f : A → R^n be given. If
1. f is C¹,
2. ∃a ∈ A such that det Df(a) ≠ 0,
then ∃r ∈ R_{++} such that f is one-to-one on B(a,r), and, therefore, f_{|B(a,r)} is invertible.

Proof. Consider (R^n)^n with generic element z := (z^i)_{i=1}^n, where ∀i ∈ {1,...,n}, z^i ∈ R^n, and define

h : R^{n²} → R,  (z^i)_{i=1}^n ↦ \det \begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix}.

Observe that h is continuous because f is C¹ and the determinant function is continuous in its entries. Moreover, from Assumption 2,

h(a,...,a,...,a) = \det Df(a) ≠ 0.

Therefore, ∃r' ∈ R_{++} such that

∀(z^i)_{i=1}^n ∈ B((a,...,a,...,a), r'),  h((z^i)_{i=1}^n) ≠ 0.

Observe that B* := B((a,...,a,...,a), r') ∩ β ≠ ∅, where β := {(z^i)_{i=1}^n ∈ R^{n²} : ∀i ∈ {1,...,n}, z^i = z^1}. Define

proj : (R^n)^n → R^n,  (z^i)_{i=1}^n ↦ z^1,

and observe that proj(B*) contains a ball B(a,r) ⊆ R^n with r ∈ R_{++}, such that ∀i ∈ {1,...,n}, ∀z^i ∈ B(a,r), (z^1,...,z^i,...,z^n) ∈ B((a,...,a,...,a), r') and therefore h(z^1,...,z^i,...,z^n) ≠ 0; or, summarizing,

∃r ∈ R_{++} such that ∀i ∈ {1,...,n}, ∀z^i ∈ B(a,r),  h(z^1,...,z^i,...,z^n) ≠ 0.

We now want to show that f is one-to-one on B(a,r). Suppose otherwise, i.e., there are x, y ∈ B(a,r) with f(x) = f(y) but x ≠ y. We can now apply the Mean Value Theorem (see Remark 682) to f_i for any i ∈ {1,...,n} on the segment L(x,y) ⊆ B(a,r). Therefore,

∀i ∈ {1,...,n}, ∃z^i ∈ L(x,y) such that 0 = f_i(x) − f_i(y) = Df_i(z^i)·(y−x),

i.e.,

\begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix}(y−x) = 0.

Observe that ∀i, z^i ∈ B(a,r), and therefore

\det \begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix} = h((z^i)_{i=1}^n) ≠ 0,

and therefore y = x, a contradiction.

³ Simply observe that ∀r ∈ R_{++}, B(x, r/2) ⊆ B(x, r).

Remark 703 The above result is not a global result, i.e., it is false that if f is C¹ and its Jacobian has full rank everywhere in the domain, then f is one-to-one. Just take the function tan.

[Figure: graph of tan; omitted.]

The next result gives a global property.

Proposition 704 (2nd sufficient condition for openness of a function) Let an open set A ⊆ R^n and a function f : A → R^n be given. If
1. f is C¹,
2. ∀x ∈ A, det Df(x) ≠ 0,
then f is an open function.


Proof. Take an open set S ⊆ A. From Proposition 702, ∀x ∈ S there exists r_x ∈ R_{++} such that f is one-to-one on B(x, r_x) ⊆ S. Then, from Proposition 700, f(B(x, r_x)) is open in R^n. We can then write S = ∪_{x∈S} B(x, r_x) and

f(S) = f(∪_{x∈S} B(x, r_x)) = ∪_{x∈S} f(B(x, r_x)),

where the second equality follows from Proposition 573, and then f(S) is an open set.

17.3 The inverse function theorem

Proposition 702 shows that a C¹ function with full rank square Jacobian at a point a has a local inverse in an open neighborhood of a. The inverse function theorem gives local differentiability properties of that local inverse function.

Lemma 705 If g is the inverse function of f : X → Y and A ⊆ X, then g_{|f(A)} is the inverse of f_{|A}; and if g is the inverse function of f : X → Y and B ⊆ Y, then g_{|B} is the inverse of f_{|g(B)}.

Proof. Exercise.

Proposition 706 Let an open set S ⊆ R^n and a function f : S → R^n be given. If
1. f is C¹, and
2. ∃a ∈ S, det Df(a) ≠ 0,
then there exist two open sets X ⊆ S and Y ⊆ f(S) and a unique function g such that
1. a ∈ X and f(a) ∈ Y,
2. Y = f(X),
3. f is one-to-one on X,
4. g is the inverse of f_{|X},
5. g is C¹.

Proof. Since f is C¹, ∃r_1 ∈ R_{++} such that ∀x ∈ B(a,r_1), det Df(x) ≠ 0. Then, from Proposition 702, f is one-to-one on B(a,r_1). Take r_2 ∈ (0, r_1), and define B := B(a,r_2). Observe that Cl(B(a,r_2)) ⊆ B(a,r_1). Using the fact that f is one-to-one on B(a,r_1), and therefore on B(a,r_2), we get that Assumption 3 in Proposition 699 is satisfied, while the other two are trivially satisfied. Then ∃δ ∈ R_{++} such that

f(B(a,r_2)) ⊇ B(f(a), δ) := Y.

Define also

X := f^{-1}(Y) ∩ B,    (17.6)

an open set, because Y and B are open sets and f is continuous. Since f is one-to-one and continuous on the compact set Cl(B), from Proposition 581 there exists a unique continuous inverse function ĝ : f(Cl(B)) → Cl(B) of f_{|Cl(B)}. From the definition of Y,

Y ⊆ f(B) ⊆ f(Cl(B)).    (17.7)

From the definition of X,

f(X) = Y ∩ f(B) = Y.

Then, from Lemma 705,

g = ĝ_{|Y} is the inverse of f_{|X}.

The above shows conclusions 1-4 of the Proposition. (About conclusion 1, observe that a ∈ f^{-1}(B(f(a),δ)) ∩ B(a,r_2) = X and f(a) ∈ f(X) = Y.) We are then left with proving conclusion 5.


Following what was said in the proof of Proposition 702, we can define

h : R^{n²} → R,  (z^i)_{i=1}^n ↦ \det \begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix},

and get that, from Assumption 2,

h(a,...,a,...,a) = \det Df(a) ≠ 0,

and, see the proof of Proposition 702 for details,

∃r ∈ R_{++} such that ∀i ∈ {1,...,n}, ∀z^i ∈ B(a,r), h(z^1,...,z^i,...,z^n) ≠ 0,    (17.8)

and trivially also

∀z ∈ B(a,r),  h(z,...,z,...,z) = \det Df(z) ≠ 0.    (17.9)

Assuming, without loss of generality, that we took r_1 < r, we have that

Cl(B) := Cl(B(a,r_2)) ⊆ B(a,r_1) ⊆ B(a,r).

Then ∀z^1,...,z^n ∈ Cl(B), h(z^1,...,z^i,...,z^n) ≠ 0. Writing g = (g^j)_{j=1}^n, we want to prove that ∀i ∈ {1,...,n}, g^i is C¹. We go through the following two steps: 1. ∀y ∈ Y, ∀i,k ∈ {1,...,n}, D_{y_k} g^i(y) exists, and 2. it is continuous.

Step 1. We want to show that the following limit exists and is finite:

\lim_{h\to 0} \frac{g^i(y + h e_n^k) - g^i(y)}{h}.

Define

x = (x_i)_{i=1}^n = g(y) ∈ X ⊆ Cl(B),  x' = (x'_i)_{i=1}^n = g(y + h e_n^k) ∈ X ⊆ Cl(B).    (17.10)

Then

f(x') − f(x) = (y + h e_n^k) − y = h e_n^k.

We can now apply the Mean Value Theorem to f_i for i ∈ {1,...,n}: ∃z^i ∈ L(x,x') ⊆ Cl(B), where the inclusion follows from the fact that x, x' ∈ Cl(B), a convex set, such that

∀i ∈ {1,...,n},  \frac{f_i(x') - f_i(x)}{h} = \frac{Df_i(z^i)·(x'-x)}{h},

and therefore

\begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix} \frac{1}{h}(x'-x) = e_n^k.

Define

A = \begin{bmatrix} Df_1(z^1) \\ \vdots \\ Df_i(z^i) \\ \vdots \\ Df_n(z^n) \end{bmatrix}.

Then, from (17.8), the above system admits a unique solution, i.e., using (17.10),

\frac{g(y + h e_n^k) - g(y)}{h} = \frac{1}{h}(x'-x),

and, using Cramer's theorem (i.e., Theorem 353),

\frac{g^i(y + h e_n^k) - g^i(y)}{h} = \frac{φ(z^1,...,z^n)}{h(z^1,...,z^n)},

where φ takes values which are determinants of a matrix involving entries of A. We are left with showing that

\lim_{h\to 0} \frac{φ(z^1,...,z^n)}{h(z^1,...,z^n)}

exists and is finite, i.e., the limit of the numerator exists and is finite and the limit of the denominator exists and is finite and nonzero. Now, if h → 0, then y + h e_n^k → y and, g being continuous, x' → x; and, since z^i ∈ L(x,x'), z^i → x for any i. Then h(z^1,...,z^n) → h(x,...,x) ≠ 0, because, from (17.10), x ∈ Cl(B), and from (17.9). Moreover, φ(z^1,...,z^n) → φ(x,...,x).

Step 2. Since

\lim_{h\to 0} \frac{g^i(y + h e_n^k) - g^i(y)}{h} = \frac{φ(x,...,x)}{h(x,...,x)}

and φ and h are continuous functions, the desired result follows.

17.4 The implicit function theorem

Theorem 707 Given S, T open subsets of R^n and R^k respectively, and a function

f : S × T → R^n,  (x,t) ↦ f(x,t),

assume that

1. f is C¹,

and that there exists (x_0, t_0) ∈ S × T such that

2. f(x_0, t_0) = 0,

3. D_x f(x_0, t_0)_{n×n} is invertible.

Then there exist N(x_0) ⊆ S, an open neighborhood of x_0, N(t_0) ⊆ T, an open neighborhood of t_0, and a unique function

g : N(t_0) → N(x_0)

such that

1. g is C¹,

2. {(x,t) ∈ N(x_0)×N(t_0) : f(x,t) = 0} = {(x,t) ∈ N(x_0)×N(t_0) : x = g(t)} := graph g.⁴

Proof. See Apostol (1974).

⁴ Then g(t_0) = x_0.

Remark 708 Conclusion 2 above can be rewritten as

∀t ∈ N(t_0),  f(g(t), t) = 0.    (17.11)

Computing the Jacobian of both sides of (17.11), using Example 680, we get

∀t ∈ N(t_0),  0 = [D_x f(g(t),t)]_{n×n}·[Dg(t)]_{n×k} + [D_t f(g(t),t)]_{n×k},    (17.12)

and, using Assumption 3 of the Implicit Function Theorem, we get

∀t ∈ N(t_0),  [Dg(t)]_{n×k} = -[D_x f(g(t),t)]^{-1}_{n×n}·[D_t f(g(t),t)]_{n×k}.

Observe that (17.12) can be rewritten as the following k systems of equations: ∀i ∈ {1,...,k},

[D_x f(g(t),t)]_{n×n}·[D_{t_i} g(t)]_{n×1} = -[D_{t_i} f(g(t),t)]_{n×1}.


Exercise 709 ⁵ Discuss the application of the Implicit Function Theorem to f : R^5 → R^2,

f(x_1, x_2, t_1, t_2, t_3) = \begin{pmatrix} 2e^{x_1} + x_2 t_1 - 4t_2 + 3 \\ x_2\cos x_1 - 6x_1 + 2t_1 - t_3 \end{pmatrix}

at (x_0, t_0) = (0, 1, 3, 2, 7).

Let's check that each assumption of the Theorem is verified.

1. f(x_0, t_0) = 0. Obvious.

2. f is C¹. We have to compute the Jacobian of the function and check that each entry is a continuous function:

                                       x_1                    x_2       t_1   t_2   t_3
2e^{x_1} + x_2 t_1 − 4t_2 + 3          2e^{x_1}               t_1       x_2   −4    0
x_2 cos x_1 − 6x_1 + 2t_1 − t_3        −x_2 sin x_1 − 6       cos x_1   2     0     −1

3. [D_x f(x_0, t_0)]_{n×n} is invertible:

[D_x f(x_0, t_0)] = \begin{bmatrix} 2e^{x_1} & t_1 \\ -x_2\sin x_1 - 6 & \cos x_1 \end{bmatrix}\Big|_{(0,1,3,2,7)} = \begin{bmatrix} 2 & 3 \\ -6 & 1 \end{bmatrix},

whose determinant is 20.

Therefore, we can apply the Implicit Function Theorem and compute the Jacobian of g : N(t_0) ⊆ R^3 → N(x_0) ⊆ R^2:

Dg(t) = -\begin{bmatrix} 2e^{x_1} & t_1 \\ -x_2\sin x_1 - 6 & \cos x_1 \end{bmatrix}^{-1} \begin{bmatrix} x_2 & -4 & 0 \\ 2 & 0 & -1 \end{bmatrix} = \frac{1}{6t_1 + 2(\cos x_1)e^{x_1} + t_1 x_2 \sin x_1} \begin{bmatrix} 2t_1 - x_2\cos x_1 & 4\cos x_1 & -t_1 \\ -6x_2 - 4e^{x_1} - x_2^2\sin x_1 & 4x_2\sin x_1 + 24 & 2e^{x_1} \end{bmatrix},

where x = g(t).

⁵ The example is taken from Rudin (1976), pages 227-228.
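This exercise can also be checked numerically (a sketch not in the original notes; NumPy assumed): solve f(x,t) = 0 for x near x_0 at perturbed values of t with a simple Newton iteration, and compare the finite-difference Jacobian of that solution map with the implicit-function formula.

```python
import numpy as np

def f(x, t):
    x1, x2 = x; t1, t2, t3 = t
    return np.array([2*np.exp(x1) + x2*t1 - 4*t2 + 3,
                     x2*np.cos(x1) - 6*x1 + 2*t1 - t3])

def Dxf(x, t):
    x1, x2 = x; t1 = t[0]
    return np.array([[2*np.exp(x1), t1],
                     [-x2*np.sin(x1) - 6, np.cos(x1)]])

def Dtf(x, t):
    x1, x2 = x
    return np.array([[x2, -4.0, 0.0],
                     [2.0, 0.0, -1.0]])

def g(t, x_start):
    # solve f(x, t) = 0 for x by Newton's method
    x = x_start.copy()
    for _ in range(50):
        x = x - np.linalg.solve(Dxf(x, t), f(x, t))
    return x

x0 = np.array([0.0, 1.0]); t0 = np.array([3.0, 2.0, 7.0])
Dg_formula = -np.linalg.solve(Dxf(x0, t0), Dtf(x0, t0))

eps = 1e-6
Dg_numeric = np.column_stack([
    (g(t0 + eps*e, x0) - g(t0 - eps*e, x0)) / (2*eps) for e in np.eye(3)])
print(np.max(np.abs(Dg_formula - Dg_numeric)))  # ~1e-9
```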

Exercise 710 Given the utility function u : R²_{++} → R_{++}, (x,y) ↦ u(x,y), satisfying the following properties:
i. u is C²;
ii. ∀(x,y) ∈ R²_{++}, Du(x,y) ≫ 0;
iii. ∀(x,y) ∈ R²_{++}, D_{xx}u(x,y) < 0, D_{yy}u(x,y) < 0, D_{xy}u(x,y) > 0;
compute the Marginal Rate of Substitution at (x_0, y_0) and say whether the graph of each indifference curve is concave.

[Figure: indifference curves; omitted.]


17.5 Some geometrical remarks on the gradient

In what follows, we make some geometrical, not rigorous, remarks on the meaning of the gradient, using the implicit function theorem. Consider an open subset X of R², a C¹ function

f : X → R,  (x,y) ↦ f(x,y),

and a ∈ R. Assume that the set

L(a) := {(x,y) ∈ X : f(x,y) = a}

is such that ∀(x,y) ∈ X, ∂f(x,y)/∂y ≠ 0 and ∂f(x,y)/∂x ≠ 0. Then:

1. L(a) is the graph of a C¹ function from a subset of R to R;

2. (x*,y*) ∈ L(a) ⇒ the line going through the origin and the point Df(x*,y*) is orthogonal to the line going through the origin and parallel to the tangent line to L(a) at (x*,y*); or: the line tangent to the curve L(a) at (x*,y*) is orthogonal to the line to which the gradient belongs;

3. (x*,y*) ∈ L(a) ⇒ the directional derivative of f at (x*,y*) in the direction u with ‖u‖ = 1 is the largest one if u = Df(x*,y*)/‖Df(x*,y*)‖.

1. It follows from the Implicit Function Theorem.

2. The slope of the line going through the origin and the vector Df(x*,y*) is

\frac{∂f(x*,y*)/∂y}{∂f(x*,y*)/∂x}.    (17.13)

Again from the Implicit Function Theorem, the slope of the tangent line to L(a) at (x*,y*) is

-\frac{∂f(x*,y*)/∂x}{∂f(x*,y*)/∂y}.    (17.14)

The product of the expressions in (17.13) and (17.14) is equal to −1.

3. The directional derivative of f at (x*,y*) in the direction u is

f'((x*,y*); u) = Df(x*,y*)·u = ‖Df(x*,y*)‖·‖u‖·\cos θ,

where θ is the angle between the two vectors. Then the above quantity is the greatest possible iff cos θ = 1, i.e., u is collinear with Df(x*,y*), i.e., u = Df(x*,y*)/‖Df(x*,y*)‖.
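Point 3, the "steepest ascent" property of the gradient, is easy to illustrate numerically (a sketch not in the original notes; NumPy assumed, with the hypothetical test function f(x,y) = x² + 2y²):

```python
import numpy as np

grad = lambda v: np.array([2*v[0], 4*v[1]])   # gradient of f(x,y) = x^2 + 2y^2

p = np.array([1.0, 1.0])
g = grad(p)
u_star = g / np.linalg.norm(g)                # claimed maximizing direction

# scan unit directions u(theta) and maximize Df . u numerically
angles = np.linspace(0, 2*np.pi, 2000)
us = np.column_stack([np.cos(angles), np.sin(angles)])
best = us[np.argmax(us @ g)]
print(u_star, best)  # numerically the same unit vector
```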

17.6 Extremum problems with equality constraints

Given the open set X ⊆ R^n, consider the C¹ functions

f : X → R,  x ↦ f(x),

g : X → R^m,  x ↦ g(x) := (g_j(x))_{j=1}^m,

with m ≤ n. Consider also the following "maximization problem":

(P)   max_{x∈X} f(x)  subject to  g(x) = 0.    (17.15)

The set

C := {x ∈ X : g(x) = 0}

is called the constraint set associated with problem (17.15).


Definition 711 The solution set of problem (17.15) is the set

{x* ∈ C : ∀x ∈ C, f(x*) ≥ f(x)},

and it is denoted by argmax (17.15).

The function

L : X × R^m → R,  (x,λ) ↦ f(x) + λᵀ g(x)

is called the Lagrange function associated with problem (17.15).

Theorem 712 Given the open set X ⊆ R^n and the functions

f : X → R, x ↦ f(x),  g : X → R^m, x ↦ g(x) := (g_j(x))_{j=1}^m,

assume that
1. f and g are C¹ functions,
2. x_0 is a solution to problem (17.15),⁶ and
3. rank [Dg(x_0)]_{m×n} = m.
Then there exists λ_0 ∈ R^m such that DL(x_0, λ_0) = 0, i.e.,

\begin{cases} Df(x_0) + λ_0 Dg(x_0) = 0 \\ g(x_0) = 0. \end{cases}    (17.16)

Proof. Define x' := (x_i)_{i=1}^m ∈ R^m and t := (x_{m+k})_{k=1}^{n−m} ∈ R^{n−m}, and therefore x = (x', t) and x_0 = (x'_0, t_0). From Assumption 3, without loss of generality,

\det [D_{x'} g(x_0)]_{m×m} ≠ 0.    (17.17)

We want to show that there exists λ_0 ∈ R^m which is a solution to the system

[Df(x_0)]_{1×n} + λ_{1×m}[Dg(x_0)]_{m×n} = 0.    (17.18)

We can rewrite (17.18) as follows:

[\, D_{x'} f(x_0)_{1×m} \ |\ D_t f(x_0)_{1×(n−m)} \,] + λ_{1×m}[\, D_{x'} g(x_0)_{m×m} \ |\ D_t g(x_0)_{m×(n−m)} \,] = 0,

or

\begin{cases} [D_{x'} f(x_0)_{1×m}] + λ_{1×m}[D_{x'} g(x_0)_{m×m}] = 0 & (1) \\ [D_t f(x_0)_{1×(n−m)}] + λ_{1×m}[D_t g(x_0)_{m×(n−m)}] = 0 & (2) \end{cases}    (17.19)

From (17.17), there exists a unique solution λ_0 to subsystem (1) in (17.19). If n = m, we are done. Assume now that n > m. We have to verify that λ_0 is a solution to subsystem (2) in (17.19) as well. To get the desired result, we are going to use the Implicit Function Theorem. Summarizing, we have that

1. g is C¹,  2. g(x'_0, t_0) = 0,  3. det [D_{x'} g(x'_0, t_0)]_{m×m} ≠ 0,

i.e., all the assumptions of the Implicit Function Theorem are verified. Then we can conclude that there exist N(x'_0) ⊆ R^m, an open neighborhood of x'_0, N(t_0) ⊆ R^{n−m}, an open neighborhood of t_0, and a unique function φ : N(t_0) → N(x'_0) such that

1. φ is C¹,  2. φ(t_0) = x'_0,  3. ∀t ∈ N(t_0), g(φ(t), t) = 0.    (17.20)

Define now

F : N(t_0) ⊆ R^{n−m} → R,  t ↦ f(φ(t), t),

and

G : N(t_0) ⊆ R^{n−m} → R^m,  t ↦ g(φ(t), t).

⁶ The result does apply in the case in which x_0 is a local maximum for Problem (17.15). Obviously, the result applies to the case of (local) minima as well.


Then, from (17.20) and from Remark 708, we have that, ∀t ∈ N(t_0),

0 = [DG(t)]_{m×(n−m)} = [D_{x'} g(φ(t),t)]_{m×m}·[Dφ(t)]_{m×(n−m)} + [D_t g(φ(t),t)]_{m×(n−m)}.    (17.21)

Since,⁷ from (17.20), ∀t ∈ N(t_0), g(φ(t), t) = 0, and since

x_0 := (x'_0, t_0) \stackrel{(17.20)}{=} (φ(t_0), t_0)    (17.22)

is a solution to problem (17.15), we have that f(x_0) = F(t_0) ≥ F(t), i.e., briefly,

∀t ∈ N(t_0),  F(t_0) ≥ F(t).

Then, from Proposition 692, DF(t_0) = 0. Then, from the definition of F and the Chain Rule, we have

[D_{x'} f(φ(t_0),t_0)]_{1×m}·[Dφ(t_0)]_{m×(n−m)} + [D_t f(φ(t_0),t_0)]_{1×(n−m)} = 0.    (17.23)

Premultiplying (17.21) by λ, we get

λ_{1×m}·[D_{x'} g(φ(t),t)]_{m×m}·[Dφ(t)]_{m×(n−m)} + λ_{1×m}·[D_t g(φ(t),t)]_{m×(n−m)} = 0.    (17.24)

Adding up (17.23) and (17.24), computed at t = t_0, we get

([D_{x'} f(φ(t_0),t_0)] + λ·[D_{x'} g(φ(t_0),t_0)])·[Dφ(t_0)] + [D_t f(φ(t_0),t_0)] + λ·[D_t g(φ(t_0),t_0)] = 0,

and, from (17.22),

([D_{x'} f(x_0)] + λ·[D_{x'} g(x_0)])·[Dφ(t_0)] + [D_t f(x_0)] + λ·[D_t g(x_0)] = 0.    (17.25)

Then, from the definition of λ_0 as the unique solution to (1) in (17.19), we have [D_{x'} f(x_0)] + λ_0·[D_{x'} g(x_0)] = 0, and then, from (17.25) computed at λ = λ_0, we have

[D_t f(x_0)] + λ_0·[D_t g(x_0)] = 0,

i.e., (2) in (17.19), the desired result.

⁷ The only place where the proof has to be slightly changed to get the result for local maxima is here.
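A small worked instance of the first-order conditions (17.16) (a sketch not in the original notes; NumPy assumed): maximize the hypothetical objective f(x) = x_1 x_2 subject to g(x) = x_1 + x_2 − 1 = 0. Here DL = 0 happens to be a linear system, so it can be solved directly.

```python
import numpy as np

# max f(x) = x1*x2  s.t.  g(x) = x1 + x2 - 1 = 0.
# DL(x, lam) = 0 reads: x2 + lam = 0, x1 + lam = 0, x1 + x2 = 1.
A = np.array([[0.0, 1.0, 1.0],    # dL/dx1 = x2 + lam
              [1.0, 0.0, 1.0],    # dL/dx2 = x1 + lam
              [1.0, 1.0, 0.0]])   # g(x)   = x1 + x2
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # 0.5 0.5 -0.5: the constrained maximizer and its multiplier
```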


Part IV

Nonlinear programming


Chapter 18

Concavity

Consider¹ a set X ⊆ R^n, a set Π ⊆ R^k and the functions f : X × Π → R, g : X × Π → R^m, h : X × Π → R^l. The goal of this Chapter is to study the problem: for given f, g, h and for given π ∈ Π,

max_{x∈X} f(x)  s.t.  g(x) ≥ 0 and h(x) = 0,

under suitable assumptions. The role of concavity (and differentiability) of the functions f, g and h is crucial.

18.1 Convex sets

Definition 713 A set C ⊆ R^n is convex if ∀x_1, x_2 ∈ C and ∀λ ∈ [0,1], (1−λ)x_1 + λx_2 ∈ C.

Definition 714 A set C ⊆ R^n is strictly convex if ∀x_1, x_2 ∈ C such that x_1 ≠ x_2, and ∀λ ∈ (0,1), (1−λ)x_1 + λx_2 ∈ Int C.

Remark 715 If C is strictly convex, then C is convex, but not vice-versa.

Proposition 716 The intersection of an arbitrary family of convex sets is convex.

Proof. We want to show that, given a family {C_i}_{i∈I} of convex sets, if x, y ∈ C := ∩_{i∈I} C_i, then ∀λ ∈ [0,1], (1−λ)x + λy ∈ C. Now, x, y ∈ C implies that x, y ∈ C_i ∀i ∈ I. Since each C_i is convex, ∀i ∈ I, ∀λ ∈ [0,1], (1−λ)x + λy ∈ C_i, and therefore ∀λ ∈ [0,1], (1−λ)x + λy ∈ C.

Exercise 717 Show that, ∀n ∈ N, if ∀i ∈ {1,...,n}, I_i is an interval in R, then

×_{i=1}^n I_i

is a convex set.

18.2 Different Kinds of Concave Functions

Maintained Assumptions in this Chapter. Unless otherwise stated, X is an open and convex subset of R^n, and f is a function such that

f : X → R,  x ↦ f(x).

For each type of concavity we study, we present

¹ This part is based on Cass (1991).


1. the definition in the case in which f is C⁰ (i.e., continuous);

2. an attempt at a "partial characterization" of that definition in the cases in which f is C¹ and C²; by partial characterization, we mean a statement which is either sufficient or necessary for the concept presented in the case of continuous f;

3. the relationship between the different partial characterizations;

4. the relationship between the type of concavity and critical points and local or global extrema of f.

Finally, we study the relationship between the different kinds of concavity. The following pictures are taken from David Cass's Microeconomic Course I followed at the University of Pennsylvania (in 1985) and summarize points 1, 2 and 3 above. [Pictures omitted.]



18.2.1 Concave Functions.

Definition 718 Consider a C⁰ function f. f is concave iff ∀x', x'' ∈ X, ∀λ ∈ [0,1],

f((1−λ)x' + λx'') ≥ (1−λ)f(x') + λf(x'').

Proposition 719 Consider a C⁰ function f.
f is concave ⇔ M := {(x,y) ∈ X × R : y ≤ f(x)} is convex.

Proof.

[⇒] Take (x',y'), (x'',y'') ∈ M. We want to show that

∀λ ∈ [0,1],  ((1−λ)x' + λx'',\, (1−λ)y' + λy'') ∈ M.

But, from the definition of M and the concavity of f, we get that

(1−λ)y' + λy'' ≤ (1−λ)f(x') + λf(x'') ≤ f((1−λ)x' + λx'').

[⇐] From the definition of M, ∀x', x'' ∈ X, (x', f(x')) ∈ M and (x'', f(x'')) ∈ M. Since M is convex,

((1−λ)x' + λx'',\, (1−λ)f(x') + λf(x'')) ∈ M,

and from the definition of M,

(1−λ)f(x') + λf(x'') ≤ f((1−λ)x' + λx''),

as desired.

Proposition 720 (Some properties of concave functions)
1. If f, g : X → R are concave functions and a, b ∈ R_+, then the function af + bg : X → R, x ↦ af(x) + bg(x), is a concave function.
2. If f : X → R is a concave function and F : A → R, with A ⊇ Im f, is nondecreasing and concave, then F ∘ f is a concave function.

Proof.
1. This result follows by a direct application of the definition.
2. Let x', x'' ∈ X and λ ∈ [0,1]. Then

(F∘f)((1−λ)x' + λx'') \stackrel{(1)}{≥} F((1−λ)f(x') + λf(x'')) \stackrel{(2)}{≥} (1−λ)·(F∘f)(x') + λ·(F∘f)(x''),

where (1) comes from the fact that f is concave and F is nondecreasing, and (2) comes from the fact that F is concave.

Remark 721 (from Sydsæter (1981)) With the notation of part 2 of the above Proposition, the assumption that F is concave cannot be dropped, as the following example shows. Take f, F : R_{++} → R_{++}, f(x) = √x and F(y) = y³. Then f is concave and F is strictly increasing, but (F∘f)(x) = x^{3/2} and its second derivative is (3/4)x^{-1/2} > 0. Then, from Calculus 1, we know that F∘f is strictly convex and therefore it is not concave.

Of course, the monotonicity assumption cannot be dispensed with either. Consider f(x) = −x² and F(y) = −y. Then (F∘f)(x) = x², which is not concave.

Proposition 722 Consider a differentiable function f.
f is concave ⇔ ∀x', x'' ∈ X, f(x'') − f(x') ≤ Df(x')(x'' − x').


Proof.

[⇒] From the definition of concavity, we have that, for λ ∈ (0,1),

(1−λ)f(x') + λf(x'') ≤ f(x' + λ(x''−x'))  ⇒

λ(f(x'') − f(x')) ≤ f(x' + λ(x''−x')) − f(x')  ⇒

f(x'') − f(x') ≤ \frac{f(x' + λ(x''−x')) − f(x')}{λ}.

Taking limits of both sides of the last inequality for λ → 0⁺, we get the desired result.

[⇐] Consider x', x'' ∈ X and λ ∈ (0,1); for λ ∈ {0,1}, the desired result is clearly true. Since X is convex, x^λ := (1−λ)x' + λx'' ∈ X. By assumption,

f(x'') − f(x^λ) ≤ Df(x^λ)(x'' − x^λ)  and  f(x') − f(x^λ) ≤ Df(x^λ)(x' − x^λ).

Multiplying the first expression by λ, the second one by (1−λ), and summing up, we get

λ(f(x'') − f(x^λ)) + (1−λ)(f(x') − f(x^λ)) ≤ Df(x^λ)(λ(x'' − x^λ) + (1−λ)(x' − x^λ)).

Since

λ(x'' − x^λ) + (1−λ)(x' − x^λ) = x^λ − x^λ = 0,

we get

λf(x'') + (1−λ)f(x') ≤ f(x^λ),

i.e., the desired result.

Definition 723 Given a symmetric matrix A_{n×n}, A is negative semidefinite if ∀x ∈ R^n, xᵀAx ≤ 0; A is negative definite if ∀x ∈ R^n\{0}, xᵀAx < 0.

Proposition 724 Consider a C² function f.
f is concave ⇔ ∀x ∈ X, D²f(x) is negative semidefinite.

Proof. [⇒] We want to show that ∀u ∈ Rn, ∀x0 ∈ X, it is the case that uᵀD²f(x0)u ≤ 0. Since X is open, ∀x0 ∈ X ∃a ∈ R++ such that |h| < a ⇒ (x0 + hu) ∈ X. Taking I := (−a, a) ⊆ R, define

g : I → R, g : h 7→ f(x0 + hu) − f(x0) − Df(x0)hu.

Observe that

g′(h) = Df(x0 + hu) · u − Df(x0) · u

and

g″(h) = uᵀ · D²f(x0 + hu) · u.

Since f is a concave function, from Proposition 722, we have that ∀h ∈ I, g(h) ≤ 0. Since g(0) = 0, h = 0 is a maximum point. Then,

g′(0) = 0 and g″(0) ≤ 0 (1).

Moreover,

g″(0) = uᵀ · D²f(x0) · u (2).


(1) and (2) give the desired result.
[⇐] Consider x, x0 ∈ X. From Taylor’s Theorem (see Proposition 698), we get

f(x) = f(x0) + Df(x0)(x − x0) + (1/2)(x − x0)ᵀ D²f(x̄)(x − x0),

where x̄ = x0 + θ(x − x0), for some θ ∈ (0, 1). Since, by assumption, (x − x0)ᵀ D²f(x̄)(x − x0) ≤ 0, we have that

f(x) − f(x0) ≤ Df(x0)(x − x0),

the desired result.

Some Properties.

Proposition 725 Consider a concave function f. If x0 is a local maximum point, then it is a global maximum point.

Proof. By definition of local maximum point, we know that ∃δ > 0 such that ∀x ∈ B(x0, δ), f(x0) ≥ f(x). Take y ∈ X; we want to show that f(x0) ≥ f(y).
Since X is convex,

∀λ ∈ [0, 1], (1 − λ)x0 + λy ∈ X.

Take λ0 > 0 sufficiently small to have (1 − λ0)x0 + λ0y ∈ B(x0, δ). To find such a λ0, just solve the inequality ‖(1 − λ0)x0 + λ0y − x0‖ = ‖λ0(y − x0)‖ = |λ0| ‖y − x0‖ < δ, where, without loss of generality, y ≠ x0. Then,

f(x0) ≥ f((1 − λ0)x0 + λ0y) ≥ (1 − λ0)f(x0) + λ0f(y),

where the second inequality follows from concavity of f, or λ0 f(x0) ≥ λ0 f(y). Dividing both sides of the inequality by λ0 > 0, we get f(x0) ≥ f(y).

Proposition 726 Consider a differentiable and concave function f. If Df(x0) = 0, then x0 is a global maximum point.

Proof. From Proposition 722, if Df(x0) = 0, we get that ∀x ∈ X, f(x0) ≥ f(x), the desired result.

18.2.2 Strictly Concave Functions.

Definition 727 Consider a C0 function f. f is strictly concave iff ∀x0, x00 ∈ X such that x0 ≠ x00, ∀λ ∈ (0, 1),

f((1 − λ)x0 + λx00) > (1 − λ)f(x0) + λf(x00).

Proposition 728 Consider a C1 function f. f is strictly concave ⇔ ∀x0, x00 ∈ X such that x0 ≠ x00,

f(x00) − f(x0) < Df(x0)(x00 − x0).

Proof. [⇒] Since strict concavity implies concavity, it is the case that

∀x0, x00 ∈ X, f(x00) − f(x0) ≤ Df(x0)(x00 − x0). (18.1)

By contradiction, suppose the desired strict inequality does not hold. Then, from 18.1, we have that

Page 228: Eui math-course-2012-09--19

222 CHAPTER 18. CONCAVITY

∃x0, x00 ∈ X, x0 ≠ x00 such that f(x00) = f(x0) + Df(x0)(x00 − x0). (18.2)

From the definition of strict concavity and 18.2, for λ ∈ (0, 1),

f((1 − λ)x0 + λx00) > (1 − λ)f(x0) + λ[f(x0) + Df(x0)(x00 − x0)]

or

f((1− λ)x0 + λx00) > f(x0) + λDf(x0)(x00 − x0). (18.3)

Applying 18.1 to the points x (λ) := (1− λ)x0 + λx00 and x0, we get that for λ ∈ (0, 1) ,

f((1− λ)x0 + λx00) ≤ f(x0) +Df(x0)((1− λ)x0 + λx00 − x0)

or

f((1− λ)x0 + λx00) ≤ f(x0) + λDf(x0)(x00 − x0). (18.4)

And 18.4 contradicts 18.3.
[⇐] The proof is very similar to the one for Proposition 722.

Proposition 729 Consider a C2 function f . If

∀x ∈ X, D2f(x) is negative definite,

then f is strictly concave.

Proof.The proof is similar to that of Proposition 724.

Remark 730 In the above Proposition, the opposite implication does not hold. The standard counterexample is f : R → R, f : x 7→ −x⁴.

Some Properties.

Proposition 731 Consider a strictly concave, C0 function f. If x0 is a local maximum point, then it is a strict global maximum point, i.e., the unique global maximum point.

Proof. First, we show that a. it is a global maximum point, and then b. the desired result.
a. It follows from the fact that strict concavity is stronger than concavity and from Proposition 725.
b. Suppose otherwise, i.e., ∃x0, x00 ∈ X such that x0 ≠ x00 and both of them are global maximum points. Then, ∀λ ∈ (0, 1), (1 − λ)x0 + λx00 ∈ X, since X is convex, and

f((1 − λ)x0 + λx00) > (1 − λ)f(x0) + λf(x00) = f(x0) = f(x00),

a contradiction.

Proposition 732 Consider a strictly concave, differentiable function f. If Df(x0) = 0, then x0 is a strict global maximum point.

Proof. Take an arbitrary x ∈ X such that x ≠ x0. Then, from Proposition 728, we have that f(x) < f(x0) + Df(x0)(x − x0) = f(x0), the desired result.


18.2.3 Quasi-Concave Functions.

Definitions.

Definition 733 Consider a C0 function f . f is quasi-concave iff ∀x0, x00 ∈ X, ∀ λ ∈ [0, 1],

f((1− λ)x0 + λx00) ≥ min {f (x0) , f(x00)} .
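The definition can be probed numerically: the sketch below (our own helper, not from the text) samples random pairs and weights and reports a violating triple if it finds one. Such a test can only refute quasi-concavity on the sampled region, never prove it.

    import random

    def refute_quasi_concavity(f, lo, hi, trials=10000, seed=0):
        # look for x1, x2, lam with f((1-lam)x1 + lam x2) < min{f(x1), f(x2)}
        rng = random.Random(seed)
        for _ in range(trials):
            x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
            lam = rng.uniform(0.0, 1.0)
            xm = (1 - lam) * x1 + lam * x2
            if f(xm) < min(f(x1), f(x2)) - 1e-12:
                return (x1, x2, lam)
        return None

    print(refute_quasi_concavity(lambda x: -x * x, -5, 5))  # None: -x^2 is quasi-concave
    print(refute_quasi_concavity(lambda x: x ** 4, -5, 5))  # a violating triple: x^4 is not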

Proposition 734 If f : X → R is a quasi-concave function and F : R → R is nondecreasing, then F ◦ f is a quasi-concave function.

Proof. Without loss of generality, assume

f(x00) ≥ f(x0) (1).

Then, since f is quasi-concave, we have

f((1 − λ)x0 + λx00) ≥ f(x0) (2).

Then,

F(f((1 − λ)x0 + λx00)) ≥(a) F(f(x0)) =(b) min {F(f(x0)), F(f(x00))},

where (a) comes from (2) and the fact that F is nondecreasing, and (b) comes from (1) and the fact that F is nondecreasing.

Proposition 735 Consider a C0 function f. f is quasi-concave ⇔ ∀α ∈ R, B(α) := {x ∈ X : f(x) ≥ α} is convex.

Proof. [⇒] [Strategy: write what you want to show.] We want to show that ∀α ∈ R and ∀λ ∈ [0, 1], we have that

⟨x0, x00 ∈ B(α)⟩ ⇒ ⟨(1 − λ)x0 + λx00 ∈ B(α)⟩,

i.e.,

⟨f(x0) ≥ α and f(x00) ≥ α⟩ ⇒ ⟨f((1 − λ)x0 + λx00) ≥ α⟩.

But, by assumption,

f((1 − λ)x0 + λx00) ≥ min {f(x0), f(x00)} ≥ α,

where the last inequality follows from the definition of x0, x00.
[⇐] Consider arbitrary x0, x00 ∈ X. Define α := min {f(x0), f(x00)}. Then x0, x00 ∈ B(α). By assumption, ∀λ ∈ [0, 1], (1 − λ)x0 + λx00 ∈ B(α), i.e.,

f((1 − λ)x0 + λx00) ≥ α := min {f(x0), f(x00)}.
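On R, convexity of the upper contour set B(α) simply means that B(α) is an interval, which is easy to probe on a grid. The sketch below (numpy assumed; helper name ours) illustrates the characterization of Proposition 735:

    import numpy as np

    def upper_contour_is_interval(f, alpha, grid):
        # on R, B(alpha) is convex iff the grid points with f >= alpha are contiguous
        idx = np.flatnonzero(f(grid) >= alpha)
        return idx.size == 0 or bool(np.all(np.diff(idx) == 1))

    grid = np.linspace(-3.0, 3.0, 2001)
    print(upper_contour_is_interval(lambda x: -x**2, -1.0, grid))  # True
    print(upper_contour_is_interval(lambda x: x**4, 1.0, grid))    # False: two branches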

Proposition 736 Consider a differentiable function f. f is quasi-concave ⇔ ∀x0, x00 ∈ X,

f(x00) − f(x0) ≥ 0 ⇒ Df(x0)(x00 − x0) ≥ 0.

Proof. [⇒] [Strategy: use the definition of directional derivative.] Take x0, x00 such that f(x00) ≥ f(x0). By assumption,

f((1 − λ)x0 + λx00) ≥ min {f(x0), f(x00)} = f(x0)


and

f((1 − λ)x0 + λx00) − f(x0) ≥ 0.

Dividing both sides of the above inequality by λ > 0, and taking limits for λ → 0+, we get

lim_{λ→0+} [f(x0 + λ(x00 − x0)) − f(x0)] / λ = Df(x0)(x00 − x0) ≥ 0.

[⇐] Without loss of generality, take

f(x0) = min {f(x0), f(x00)} (1).

Define

ϕ : [0, 1] → R, ϕ : λ 7→ f((1 − λ)x0 + λx00).

We want to show that

∀λ ∈ [0, 1], ϕ(λ) ≥ ϕ(0).

Suppose otherwise, i.e., ∃λ∗ ∈ [0, 1] such that ϕ(λ∗) < ϕ(0). Observe that in fact it cannot be that λ∗ ∈ {0, 1}: if λ∗ = 0, we would have ϕ(0) < ϕ(0), and if λ∗ = 1, we would have ϕ(1) < ϕ(0), i.e., f(x00) < f(x0), contradicting (1). Then, we have that

∃λ∗ ∈ (0, 1) such that ϕ (λ∗) < ϕ (0) (2) .

Observe that from (1) , we also have that

ϕ (1) ≥ ϕ (0) (3) .

Therefore (see Lemma 737), ∃λ∗∗ > λ∗ such that

ϕ′(λ∗∗) > 0 (4), and

ϕ(λ∗∗) < ϕ(0) (5).

From (4), using the definition of ϕ′ and the Chain Rule,² we get

0 < ϕ′(λ∗∗) = [Df((1 − λ∗∗)x0 + λ∗∗x00)] (x00 − x0) (6).

Define x∗∗ := (1 − λ∗∗)x0 + λ∗∗x00. From (5), we get that

f(x∗∗) < f(x0).

Therefore, by assumption,

0 ≤ Df(x∗∗)(x0 − x∗∗) = Df(x∗∗)(−λ∗∗)(x00 − x0),

i.e.,

Df(x∗∗)(x00 − x0) ≤ 0 (7).

But (7) contradicts (6) .

Lemma 737 Consider a function g : [a, b] → R with the following properties:
1. g is differentiable on (a, b);
2. there exists c ∈ (a, b) such that g(b) ≥ g(a) > g(c).
Then ∃t ∈ (c, b) such that g′(t) > 0 and g(t) < g(a).

²Defining v : [0, 1] → X ⊆ Rn, λ 7→ (1 − λ)x0 + λx00, we have that ϕ = f ◦ v. Therefore, ϕ′(λ∗) = Df(v(λ∗)) · Dv(λ∗).


Proof. Without loss of generality and to simplify notation, assume g(a) = 0.
Define A := {x ∈ [c, b] : g(x) = 0}. Observe that A = [c, b] ∩ g⁻¹(0) is closed; and it is nonempty, because g is continuous and, by assumption, g(c) < 0 and g(b) ≥ 0. Therefore, A is compact, and we can define ξ := min A.
Claim. x ∈ [c, ξ) ⇒ g(x) < 0.
Suppose not, i.e., ∃y ∈ (c, ξ) such that g(y) ≥ 0. If g(y) = 0, ξ could not be min A. If g(y) > 0, since g(c) < 0 and g is continuous, by the Intermediate Value Theorem there exists x0 ∈ (c, y) ⊆ (c, ξ) with g(x0) = 0, again contradicting the definition of ξ. End of the proof of the Claim.
Finally, applying Lagrange's (Mean Value) Theorem to g on [c, ξ], we have that ∃t ∈ (c, ξ) such that g′(t) = [g(ξ) − g(c)]/(ξ − c). Since g(ξ) = 0 and g(c) < 0, we have that g′(t) > 0. From the above Claim, g(t) < 0 = g(a), and the desired result then follows.

Proposition 738 Consider a C2 function f. If f is quasi-concave, then

∀x ∈ X, ∀∆ ∈ Rn such that Df(x) · ∆ = 0, ∆ᵀD²f(x)∆ ≤ 0.

Proof. Suppose otherwise, i.e., ∃x0 ∈ X and ∃∆ ∈ Rn such that Df(x0) · ∆ = 0 and ∆ᵀD²f(x0)∆ > 0.
Since the function h : X → R, h : x 7→ ∆ᵀD²f(x)∆ is continuous and X is open, ∃ε > 0 such that if ‖x − x0‖ < ε, then ∀λ ∈ [0, 1],

∆ᵀ · D²f(λx + (1 − λ)x0) · ∆ > 0 (1).

Define x := x0 + μ ∆/‖∆‖, with 0 < μ < ε. Then,

‖x − x0‖ = ‖μ ∆/‖∆‖‖ = μ < ε,

and x satisfies (1). Observe that

∆ = (‖∆‖/μ)(x − x0).

Then, we can rewrite (1) as

(x − x0)ᵀ D²f(λx + (1 − λ)x0)(x − x0) > 0.

From Taylor's Theorem, ∃λ ∈ (0, 1) such that

f(x) = f(x0) + Df(x0)(x − x0) + (1/2)(x − x0)ᵀ D²f(λx + (1 − λ)x0)(x − x0).

Since Df(x0)∆ = 0 and from (1), we have

f(x) > f(x0) (2).

Letting x̃ = x0 + μ(−∆/‖∆‖) and using the same procedure as above, we can conclude that

f(x̃) > f(x0) (3).

But, since x0 = (1/2)(x + x̃), (2) and (3) contradict the definition of quasi-concavity.

Remark 739 In the above Proposition, the opposite implication does not hold. Consider f : R → R, f : x 7→ x⁴. From Proposition 735, that function is clearly not quasi-concave: take α > 0; then B(α) = {x ∈ R : x⁴ ≥ α} = (−∞, −α^(1/4)] ∪ [α^(1/4), +∞), which is not convex.
On the other hand, observe the following. f′(x) = 4x³, and 4x³∆ = 0 if either x = 0 or ∆ = 0. In both cases ∆ · 12x² · ∆ = 0 ≤ 0. (This example is taken from Avriel M. and others (1988), page 91.)


Some Properties.

Remark 740 Consider a quasi-concave function f. It is NOT the case that if x0 is a local maximum point, then it is a global maximum point. To see that, consider the following function:

f : R → R, f : x 7→ −x² + 1 if x < 1, and f : x 7→ 0 if x ≥ 1.

[Figure: graph of f.]

Proposition 741 Consider a C0 quasi-concave function f. If x0 is a strict local maximum point, then it is a strict global maximum point.

Proof. By assumption, ∃δ > 0 such that if x ∈ B(x0, δ) ∩ X and x ≠ x0, then f(x0) > f(x).
Suppose the conclusion of the Proposition is false; then ∃x00 ∈ X, x00 ≠ x0, such that f(x00) ≥ f(x0). Since f is quasi-concave,

∀λ ∈ [0, 1], f((1 − λ)x0 + λx00) ≥ min {f(x0), f(x00)} = f(x0). (1)

For sufficiently small λ > 0, (1 − λ)x0 + λx00 ∈ B(x0, δ) and (1) above holds, contradicting the fact that x0 is a strict local maximum point.

Proposition 742 Consider f : (a, b) → R. f monotone ⇒ f quasi-concave.

Proof. Without loss of generality, take x00 ≥ x0.
Case 1. f is increasing. Then f(x00) ≥ f(x0). If λ ∈ [0, 1], then (1 − λ)x0 + λx00 = x0 + λ(x00 − x0) ≥ x0 and therefore f((1 − λ)x0 + λx00) ≥ f(x0).
Case 2. f is decreasing. Then f(x00) ≤ f(x0). If λ ∈ [0, 1], then (1 − λ)x0 + λx00 = x00 − (1 − λ)(x00 − x0) ≤ x00 and therefore f((1 − λ)x0 + λx00) ≥ f(x00).

Remark 743 The following statement is false: if f1 and f2 are quasi-concave and a, b ∈ R+, then af1 + bf2 is quasi-concave.
It is enough to consider f1, f2 : R → R, f1(x) = x³ + x and f2(x) = −4x. Since f1′ > 0, f1 and, of course, f2 are monotone and then, from Proposition 742, they are quasi-concave. On the other hand, g(x) = f1(x) + f2(x) = x³ − 3x has a strict local maximum at x = −1 which is not a strict global maximum, and therefore, from Proposition 741, g is not quasi-concave.


[Figure: graph of g(x) = x³ − 3x.]
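A quick numerical illustration of the remark (an added sketch, not part of the original notes): g has a strict local maximum at x = −1, yet points far to the right dominate it, so by Proposition 741 g cannot be quasi-concave.

    def g(x):
        return x ** 3 - 3 * x

    eps = 1e-3
    strict_local_max = all(g(-1.0) > g(-1.0 + t) for t in (-eps, eps))  # g(-1) = 2
    not_global_max = g(10.0) > g(-1.0)                                  # g(10) = 970
    print(strict_local_max, not_global_max)  # True True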

Remark 744 Consider a differentiable quasi-concave function f. It is NOT the case that if Df(x0) = 0, then x0 is a global maximum point. Just consider f : R → R, f : x 7→ x³ and x0 = 0, and use Proposition 742.

18.2.4 Strictly Quasi-concave Functions.

Definitions.

Definition 745 Consider a C0 function f. f is strictly quasi-concave iff ∀x0, x00 ∈ X such that x0 ≠ x00, and ∀λ ∈ (0, 1), we have that

f((1 − λ)x0 + λx00) > min {f(x0), f(x00)}.

Proposition 746 Consider a C0 function f. f is strictly quasi-concave ⇒ ∀α ∈ R, B(α) := {x ∈ X : f(x) ≥ α} is strictly convex.

Proof. Take an arbitrary α and x0, x00 ∈ B(α), with x0 ≠ x00; we want to show that ∀λ ∈ (0, 1),

xλ := (1 − λ)x0 + λx00 ∈ Int B(α).

Since f is strictly quasi-concave,

f(xλ) > min {f(x0), f(x00)} ≥ α.

Since f is C0, there exists δ > 0 such that ∀x ∈ B(xλ, δ),

f(x) > α,

i.e., B(xλ, δ) ⊆ B(α), as desired. Of course, we are using the fact that {x ∈ X : f(x) > α} ⊆ B(α).

Remark 747 Observe that in Proposition 746, the opposite implication does not hold true: just consider f : R → R, f : x 7→ 1. Observe that ∀α ≤ 1, B(α) = R, and ∀α > 1, B(α) = ∅. On the other hand, f is not strictly quasi-concave.


Definition 748 Consider a differentiable function f. f is differentiable-strictly-quasi-concave iff ∀x0, x00 ∈ X such that x0 ≠ x00, we have that

f(x00) − f(x0) ≥ 0 ⇒ Df(x0)(x00 − x0) > 0.

Proposition 749 Consider a differentiable function f. If f is differentiable-strictly-quasi-concave, then f is strictly quasi-concave.

Proof. The proof is analogous to the case of quasi-concave functions.

Remark 750 Given a differentiable function, it is not the case that strict-quasi-concavity implies differentiable-strict-quasi-concavity.
f : R → R, f : x 7→ x³ a. is differentiable and strictly quasi-concave and b. is not differentiable-strictly-quasi-concave.
a. f is strictly increasing and therefore strictly quasi-concave, see the Fact below.
b. Take x0 = 0 and x00 = 1. Then f(1) = 1 > 0 = f(0). But Df(x0)(x00 − x0) = 0 · 1 = 0 ≯ 0.

Remark 751 If we restrict the class of differentiable functions to those with non-zero gradient everywhere in the domain, then differentiable-strict-quasi-concavity and strict-quasi-concavity are equivalent (see Balasko (1988), Math. 7.2.).

Fact. Consider f : (a, b) → R. f strictly monotone ⇒ f strictly quasi-concave.

Proof. By assumption, x0 ≠ x00, say x0 < x00, implies that f(x0) < f(x00) (or f(x0) > f(x00)). If λ ∈ (0, 1), then x0 < (1 − λ)x0 + λx00 < x00, and therefore, in both cases, f((1 − λ)x0 + λx00) > min {f(x0), f(x00)}.

Proposition 752 Consider a C2 function f. If

∀x ∈ X, ∀∆ ∈ Rn\{0}, we have that ⟨Df(x)∆ = 0 ⇒ ∆ᵀD²f(x)∆ < 0⟩,

then f is differentiable-strictly-quasi-concave.

Proof. Suppose otherwise, i.e., there exist x0, x00 ∈ X such that

x0 ≠ x00, f(x00) ≥ f(x0) and Df(x0)(x00 − x0) ≤ 0.

Since X is an open set, ∃a ∈ R++ such that the following function is well defined:

g : [−a, 1] → R, g : h 7→ f((1 − h)x0 + hx00).

Since g is continuous, there exists hm ∈ [0, 1] which is a global minimum point of g on [0, 1]. We now proceed as follows. Step 1. hm ∉ {0, 1}. Step 2. hm is a strict local maximum point, a contradiction.
Preliminarily, observe that

g′(h) = Df(x0 + h(x00 − x0)) · (x00 − x0)

and

g″(h) = (x00 − x0)ᵀ · D²f(x0 + h(x00 − x0)) · (x00 − x0).

Step 1. If Df(x0)(x00 − x0) = 0, then, by assumption,

g″(0) = (x00 − x0)ᵀ · D²f(x0) · (x00 − x0) < 0.

Therefore, since g′(0) = 0, zero is a strict local maximum point of g (see, for example, Theorem 13.10, page 378, in Apostol (1974)). Therefore, there exists h∗ ∈ (0, 1) such that g(h∗) = f(x0 + h∗(x00 − x0)) < f(x0) = g(0).


If

g′(0) = Df(x0)(x00 − x0) < 0,

then there exists h∗∗ ∈ (0, 1) such that

g(h∗∗) = f(x0 + h∗∗(x00 − x0)) < f(x0) = g(0).

Moreover, g(1) = f(x00) ≥ f(x0). In conclusion, neither zero nor one can be a global minimum point for g on [0, 1].
Step 2. Since the global minimum point hm ∈ (0, 1), we have that

0 = g′(hm) = Df(x0 + hm(x00 − x0))(x00 − x0).

Then, by assumption,

g″(hm) = (x00 − x0)ᵀ · D²f(x0 + hm(x00 − x0)) · (x00 − x0) < 0,

but then hm is a strict local maximum point, a contradiction.

Remark 753 Differentiable-strict-quasi-concavity does not imply the condition presented in Proposition 752. f : R → R, f : x 7→ −x⁴ is differentiable-strictly-quasi-concave (in the next section we will show that strict concavity implies differentiable-strict-quasi-concavity). On the other hand, take x∗ = 0. Then Df(x∗) = 0. Therefore, for any ∆ ∈ R\{0}, we have Df(x∗)∆ = 0, but ∆ᵀD²f(x∗)∆ = 0 ≮ 0.

Some Properties.

Proposition 754 Consider a differentiable-strictly-quasi-concave function f. x∗ is a strict global maximum point ⇔ Df(x∗) = 0.

Proof. [⇒] Obvious.
[⇐] From the contrapositive of the definition of differentiable-strictly-quasi-concave function, we have: ∀x∗, x00 ∈ X such that x∗ ≠ x00, it is the case that Df(x∗)(x00 − x∗) ≤ 0 ⇒ f(x00) − f(x∗) < 0, i.e., f(x∗) > f(x00). Since Df(x∗) = 0, the desired result follows.

Remark 755 Obviously, we also have that if f is differentiable-strictly-quasi-concave, it is the case that:

x∗ local maximum point ⇒ x∗ is a strict global maximum point.

Remark 756 The above implication is true also for continuous, strictly quasi-concave functions. (Suppose otherwise, i.e., ∃x0 ∈ X such that f(x0) ≥ f(x∗). Since f is strictly quasi-concave, ∀λ ∈ (0, 1), f((1 − λ)x∗ + λx0) > f(x∗), which for sufficiently small λ contradicts the fact that x∗ is a local maximum point.)

Is there a definition of ?-concavity, weaker than concavity, such that: if f is a ?-concave function, then x∗ is a global maximum point iff Df(x∗) = 0? The answer is given in the next subsection.

18.2.5 Pseudo-concave Functions.

Definition 757 Consider a differentiable function f . f is pseudo-concave iff

∀x0, x00 ∈ X, f(x00) > f(x0) ⇒ Df(x0)(x00 − x0) > 0,

or, equivalently,

∀x0, x00 ∈ X, Df(x0)(x00 − x0) ≤ 0 ⇒ f(x00) ≤ f(x0).


Proposition 758 If f is a pseudo-concave function, then

x∗ is a global maximum point ⇔ Df (x∗) = 0.

Proof. [⇒] Obvious.
[⇐] Df(x∗) = 0 ⇒ ∀x ∈ X, Df(x∗)(x − x∗) ≤ 0 ⇒ ∀x ∈ X, f(x) ≤ f(x∗).

Remark 759 Observe that the following “definition of pseudo-concavity” would not be useful:

∀x0, x00 ∈ X, Df(x0)(x00 − x0) ≥ 0 ⇒ f(x00) ≤ f(x0). (18.5)

For such a definition the above Proposition would still apply, but the condition is not weaker than concavity. Simply consider the function f : R → R, f : x 7→ −x². That function is concave, but it does not satisfy condition (18.5): take x0 = −2 and x00 = −1; then f′(x0)(x00 − x0) = 4(−1 − (−2)) = 4 > 0, but f(x00) = −1 > f(x0) = −4.

We summarize some of the results of this subsection in the following table, where C stands for the property of being a critical point, and L and G stand for local and global, respectively:

Class of function            C ⇒ G max   L max ⇒ G max   Uniqueness of G max
Strictly concave             Yes         Yes             Yes
Concave                      Yes         Yes             No
Diff.ble-str.-q.-concave     Yes         Yes             Yes
Pseudo-concave               Yes         Yes             No
Quasi-concave                No          No              No

Observe that the first, the second and the last row of the second column apply to the case of C0 and not necessarily differentiable functions.

18.3 Relationships among Different Kinds of Concavity

The relationships among different definitions of concavity in the case of differentiable functions are summarized in the following diagram:

                          strict concavity
                         ⇙                ⇘
linearity ⇒ affinity ⇒ concavity     differentiable-strict-quasi-concavity
                          ⇘              ⇙
                          pseudo-concavity
                                 ⇓
                          quasi-concavity

All the implications which are not implied by those explicitly written do not hold true. In what follows, we prove the truth of each implication described in the diagram and we explain why the other implications do not hold.
Recall that:
1. f : Rn → Rm is a linear function iff ∀x0, x00 ∈ Rn, ∀a, b ∈ R, f(ax0 + bx00) = af(x0) + bf(x00);
2. g : Rn → Rm is an affine function iff there exist a linear function f : Rn → Rm and c ∈ Rm such that ∀x ∈ Rn, g(x) = f(x) + c.

SC ⇒ C. Obvious (“a > b ⇒ a ≥ b”).


C ⇒ PC. From the assumption and from Proposition 722, we have that f(x00) − f(x0) ≤ Df(x0)(x00 − x0). Then f(x00) − f(x0) > 0 ⇒ Df(x0)(x00 − x0) > 0.

PC ⇒ QC. Suppose otherwise, i.e., ∃x0, x00 ∈ X and ∃λ∗ ∈ (0, 1) such that

f((1 − λ∗)x0 + λ∗x00) < min {f(x0), f(x00)}.

Define x(λ) := (1 − λ)x0 + λx00; as λ ranges over [0, 1], x(λ) describes the segment L(x0, x00) joining x0 to x00. Take λ̄ ∈ argmin_{λ∈[0,1]} f(x(λ)); λ̄ is well defined from the Extreme Value Theorem. Observe that λ̄ ≠ 0, 1, because f(x(λ∗)) < min {f(x(0)), f(x(1))} = min {f(x0), f(x00)}.
Therefore, ∀λ ∈ [0, 1] and ∀μ ∈ (0, 1),

f(x(λ̄)) ≤ f((1 − μ)x(λ̄) + μx(λ)).

Then,

∀λ ∈ [0, 1], 0 ≤ lim_{μ→0+} [f((1 − μ)x(λ̄) + μx(λ)) − f(x(λ̄))]/μ = Df(x(λ̄))(x(λ) − x(λ̄)).

Taking λ = 0, 1 in the above expression, we get

Df(x(λ̄))(x0 − x(λ̄)) ≥ 0 (1)

and

Df(x(λ̄))(x00 − x(λ̄)) ≥ 0 (2).

Since

x0 − x(λ̄) = x0 − (1 − λ̄)x0 − λ̄x00 = −λ̄(x00 − x0) (3)

and

x00 − x(λ̄) = x00 − (1 − λ̄)x0 − λ̄x00 = (1 − λ̄)(x00 − x0) (4),

substituting (3) in (1), and (4) in (2), we get

−λ̄ · [Df(x(λ̄)) · (x00 − x0)] ≥ 0, with −λ̄ < 0,

and

(1 − λ̄) · [Df(x(λ̄)) · (x00 − x0)] ≥ 0, with 1 − λ̄ > 0.

Therefore,

0 = Df(x(λ̄)) · (x00 − x0), and hence Df(x(λ̄)) · (x00 − x(λ̄)) = (1 − λ̄) · Df(x(λ̄)) · (x00 − x0) = 0,

where the last equality uses (4). Then, by pseudo-concavity,

f(x00) ≤ f(x(λ̄)) (5).

By assumption,

f(x(λ∗)) < f(x00) (6).


(5) and (6) give f(x(λ∗)) < f(x(λ̄)), contradicting the definition of λ̄.

DSQC ⇒ PC. Obvious.

SC ⇒ DSQC. Obvious.

L ⇒ C. Obvious.

C ⇏ SC: f : R → R, f : x 7→ x.

QC ⇏ PC:

f : R → R, f : x 7→ 0 if x ≤ 0, and f : x 7→ e^(−1/x²) if x > 0.

[Figure: graph of f.]

f is clearly nondecreasing and therefore, from Proposition 742, quasi-concave. f is not pseudo-concave: f(1) = e^(−1) > 0 = f(−1), but f′(−1)(1 − (−1)) = 0 · 2 = 0 ≯ 0.

PC ⇏ C, DSQC ⇏ C and DSQC ⇏ SC:

Take f : (1, +∞) → R, f : x 7→ x³. Take x0 < x00. Then f(x00) > f(x0). Moreover, Df(x0)(x00 − x0) > 0, since both factors are positive. Therefore, f is DSQC, and therefore PC. Since f″(x) > 0, f is strictly convex and therefore it is not concave and, a fortiori, it is not strictly concave.

PC ⇏ DSQC, C ⇏ DSQC:

Consider f : R → R, f : x 7→ 1. f is clearly concave and PC as well (∀x0, x00 ∈ R, Df(x0)(x00 − x0) = 0 ≤ 0 and f(x00) ≤ f(x0)). Moreover, any point in R is a critical point, but it is not the unique global maximum point. Therefore, from Proposition 754, f is not differentiable-strictly-quasi-concave.

QC ⇏ DSQC: if QC ⇒ DSQC held, we would have QC ⇒ DSQC ⇒ PC, contradicting the fact that QC ⇏ PC.

C ⇏ L and SC ⇏ L: f : R → R, f : x 7→ −x².


18.3.1 Hessians and Concavity.

In this subsection, we study the relation between submatrices of a matrix involving the Hessian matrix of a C2 function and the concavity of that function.

Definition 760 Consider a matrix An×n. Let 1 ≤ k ≤ n.
A k-th order principal submatrix (minor) of A is the (determinant of the) square submatrix of A obtained deleting (n − k) rows and (n − k) columns in the same positions. Denote these matrices by D̃^i_k, where i indexes the possible choices of deleted rows and columns.
The k-th order leading principal submatrix (minor) of A is the (determinant of the) square submatrix of A obtained deleting the last (n − k) rows and the last (n − k) columns. Denote these matrices by Dk.

Example 761 Consider

A = [ a11 a12 a13 ]
    [ a21 a22 a23 ]
    [ a31 a32 a33 ].

Then

D̃^1_1 = a11, D̃^2_1 = a22, D̃^3_1 = a33, D1 = D̃^1_1 = a11;

D̃^1_2 = [ a11 a12 ],  D̃^2_2 = [ a11 a13 ],  D̃^3_2 = [ a22 a23 ];
         [ a21 a22 ]           [ a31 a33 ]           [ a32 a33 ]

D2 = D̃^1_2;

D3 = D̃^1_3 = A.

Definition 762 Consider a C2 function f : X ⊆ Rn → R. The bordered Hessian of f is the following matrix:

Bf(x) = [ 0          Df(x)  ]
        [ [Df(x)]ᵀ   D²f(x) ].

Theorem 763 (Simon (1985), Theorem 1.9.c, page 79, and Sydsæter (1981), Theorem 5.17, page 259). Consider a C2 function f : X → R.
1. If ∀x ∈ X, ∀k ∈ {1, ..., n}, sign(k-th leading principal minor of D²f(x)) = sign((−1)^k), then f is strictly concave.
2. ∀x ∈ X, ∀k ∈ {1, ..., n}, sign(non-zero k-th order principal minor of D²f(x)) = sign((−1)^k), iff f is concave.
3. If n ≥ 2 and ∀x ∈ X, ∀k ∈ {3, ..., n + 1}, sign(k-th leading principal minor of Bf(x)) = sign((−1)^(k−1)), then f is pseudo-concave and, therefore, quasi-concave.
4. If f is quasi-concave, then ∀x ∈ X, ∀k ∈ {2, ..., n + 1}, sign(non-zero k-th leading principal minors of Bf(x)) = sign((−1)^(k−1)).
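The sign tests of Theorem 763 are mechanical and can be automated. The sketch below (numpy assumed; function names are ours) computes the leading principal minors of D²f and of the bordered Hessian Bf at a point, here for f(x, y) = x^(1/3) y^(1/3) at (1, 1):

    import numpy as np

    def leading_principal_minors(A):
        # determinants of the top-left k x k blocks, k = 1, ..., n
        return [float(np.linalg.det(A[:k, :k])) for k in range(1, A.shape[0] + 1)]

    def bordered_hessian(grad, hess):
        n = grad.size
        B = np.zeros((n + 1, n + 1))
        B[0, 1:] = grad
        B[1:, 0] = grad
        B[1:, 1:] = hess
        return B

    grad = np.array([1/3, 1/3])                   # Df(1,1) for f = x^(1/3) y^(1/3)
    hess = np.array([[-2/9, 1/9], [1/9, -2/9]])   # D2f(1,1)
    print(leading_principal_minors(hess))         # [-2/9, 1/27]: signs -, + (part 1)
    print(leading_principal_minors(bordered_hessian(grad, hess))[2])  # 2/27 > 0 (part 3, k = 3)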

Remark 764 It can be proved that the conditions in parts 1 and 2 of the above Theorem are sufficient for D²f(x) being negative definite and equivalent to D²f(x) being negative semidefinite, respectively.

Remark 765 (From Sydsæter (1981), page 239.) It is tempting to conjecture that a function f is concave iff

∀x ∈ X, ∀k ∈ {1, ..., n}, sign(non-zero k-th leading principal minor of D²f(x)) = sign((−1)^k). (18.6)


That conjecture is false. Consider

f : R³ → R, f : (x1, x2, x3) 7→ −x2² + x3².

Then Df(x) = (0, −2x2, 2x3) and

D²f(x) = [ 0  0  0 ]
          [ 0 −2  0 ]
          [ 0  0  2 ].

All the leading principal minors of the above matrix are zero, and therefore Condition 18.6 is satisfied, but f is not a concave function. Take x0 = (0, 0, 0) and x00 = (0, 0, 1). Then

∀λ ∈ (0, 1), f((1 − λ)x0 + λx00) = λ² < (1 − λ)f(x0) + λf(x00) = λ.

Example 766 Consider f : R2++ → R, f : (x, y) 7→ x^α y^β, with α, β ∈ R++. Observe that ∀(x, y) ∈ R2++, f(x, y) > 0. Verify that

1. if α + β < 1, then f is strictly concave;

2. ∀α, β ∈ R++, f is quasi-concave;

3. α + β ≤ 1 if and only if f is concave.

1. The derivatives are

Dxf(x, y) = α x^(α−1) y^β = (α/x) f(x, y);
Dyf(x, y) = β x^α y^(β−1) = (β/y) f(x, y);
D²xxf(x, y) = α(α − 1) x^(α−2) y^β = [α(α − 1)/x²] f(x, y);
D²yyf(x, y) = β(β − 1) x^α y^(β−2) = [β(β − 1)/y²] f(x, y);
D²xyf(x, y) = αβ x^(α−1) y^(β−1) = [αβ/(xy)] f(x, y).

Therefore,

D²f(x, y) = f(x, y) · [ α(α − 1)/x²   αβ/(xy)     ]
                      [ αβ/(xy)       β(β − 1)/y² ].

a. α(α − 1)/x² < 0 ⇔ α ∈ (0, 1).
b. Up to the positive factor f(x, y)²,

[α(α − 1)β(β − 1) − α²β²]/(x²y²) = (1/(x²y²)) [αβ(αβ − α − β + 1) − α²β²] = (1/(x²y²)) αβ(1 − α − β) > 0 ⇔ α + β < 1,

where the last equivalence uses α, β > 0.
In conclusion, if α, β ∈ (0, 1) and α + β < 1, then f is strictly concave.
2. Observe that

f(x, y) = g(h(x, y)),

where

h : R2++ → R, (x, y) 7→ α ln x + β ln y,
g : R → R, z 7→ e^z.

Since h is strictly concave (why?) and therefore quasi-concave, and g is strictly increasing, the desired result follows from Proposition 734.
3. Obvious from the above results.
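The computations in part 1 can be double-checked numerically at sample points; this is an added sketch with arbitrary illustrative parameter values (numpy assumed):

    import numpy as np

    def cobb_douglas_hessian(x, y, a, b):
        f = x ** a * y ** b
        return f * np.array([[a * (a - 1) / x**2, a * b / (x * y)],
                             [a * b / (x * y),    b * (b - 1) / y**2]])

    H = cobb_douglas_hessian(2.0, 3.0, 0.3, 0.4)   # alpha + beta < 1
    print(H[0, 0] < 0 and np.linalg.det(H) > 0)    # True: strict concavity conditions hold

    H = cobb_douglas_hessian(2.0, 3.0, 0.8, 0.7)   # alpha + beta > 1
    print(np.linalg.det(H) > 0)                    # False: concavity fails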


Chapter 19

Maximization Problems

Let the following objects be given:

1. an open convex set X ⊆ Rn, n ∈ N\ {0};

2. f : X → R, g : X → Rm, h : X → Rl, m, l ∈ N\{0}, with f, g, h at least differentiable.

The goal of this Chapter is to study the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0 (1)
                    h(x) = 0 (2)        (19.1)

f is called the objective function; x the choice variable vector; (1) and (2) in (19.1) the constraints; g and h the constraint functions;

C := {x ∈ X : g(x) ≥ 0 and h(x) = 0}

is the constraint set. To solve problem (19.1) means to describe the following set,

{x∗ ∈ C : ∀x ∈ C, f(x∗) ≥ f(x)},

which is called the solution set of problem (19.1) and is also denoted by argmax (19.1). We will proceed as follows.
1. We will analyze in detail the problem with inequality constraints, i.e.,

max_{x∈X} f(x) s.t. g(x) ≥ 0. (1)

2. We will analyze in detail the problem with equality constraints, i.e.,

max_{x∈X} f(x) s.t. h(x) = 0. (2)

3. We will describe how to solve the problem with both equality and inequality constraints, i.e.,

max_{x∈X} f(x) s.t. g(x) ≥ 0 (1)
                    h(x) = 0. (2)

19.1 The case of inequality constraints: Kuhn-Tucker theorems

Consider the open and convex set X ⊆ Rn and the differentiable functions f : X → R, g := (gj)_{j=1}^m : X → Rm. The problem we want to study is

max_{x∈X} f(x) s.t. g(x) ≥ 0. (19.2)


Definition 767 The Kuhn-Tucker system (or conditions) associated with problem 19.2 is

Df(x) + λDg(x) = 0 (1)
λ ≥ 0 (2)
g(x) ≥ 0 (3)
λg(x) = 0 (4)        (19.3)

Equations (1) are called first order conditions; conditions (2), (3) and (4) are called complementary slackness conditions.

Remark 768 (x, λ) ∈ X × Rm is a solution to the Kuhn-Tucker system iff it is a solution to any of the following systems:

1.
∂f(x)/∂xi + Σ_{j=1}^m λj ∂gj(x)/∂xi = 0 for i = 1, ..., n (1)
λj ≥ 0 for j = 1, ..., m (2)
gj(x) ≥ 0 for j = 1, ..., m (3)
λj gj(x) = 0 for j = 1, ..., m (4)

2.
Df(x) + λDg(x) = 0 (1)
min {λj, gj(x)} = 0 for j = 1, ..., m (2)

Moreover, (x, λ) ∈ X × Rm is a solution to the Kuhn-Tucker system iff it is a solution to

Df(x) + λDg(x) = 0 (1)

and, for each j = 1, ..., m, to one of the following conditions:

either (λj > 0 and gj(x) = 0),
or (λj = 0 and gj(x) > 0),
or (λj = 0 and gj(x) = 0).
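The min-form in part 2 is convenient computationally: a candidate (x, λ) solves the Kuhn-Tucker system exactly when the residual vector below vanishes. This is an added sketch; Df, Dg and g stand for user-supplied gradient, Jacobian and constraint functions.

    import numpy as np

    def kt_residual(x, lam, Df, Dg, g):
        # stacks Df(x) + lam Dg(x) and min{lam_j, g_j(x)}; zero iff (x, lam) solves (19.3)
        return np.concatenate([Df(x) + lam @ Dg(x), np.minimum(lam, g(x))])

    # one-variable illustration: max -x^2 s.t. x >= 0, solved by x = 0 with lam = 0
    r = kt_residual(np.array([0.0]), np.array([0.0]),
                    Df=lambda x: -2 * x,
                    Dg=lambda x: np.array([[1.0]]),
                    g=lambda x: x)
    print(np.allclose(r, 0.0))  # True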

Definition 769 Given x∗ ∈ X, we say that j is a binding constraint at x∗ if gj(x∗) = 0. Let

J∗(x∗) := {j ∈ {1, ..., m} : gj(x∗) = 0}

and

g∗ := (gj)_{j∈J∗(x∗)}, ĝ := (gj)_{j∉J∗(x∗)}.

Definition 770 x∗ ∈ Rn satisfies the constraint qualifications associated with problem 19.2 if it is a solution to

max_{x∈Rn} Df(x∗)x s.t. Dg∗(x∗)(x − x∗) ≥ 0. (19.4)

The above problem is obtained from 19.2 by

1. replacing g with g∗;

2. linearizing f and g∗ around x∗, i.e., substituting f and g∗ with f(x∗) + Df(x∗)(x − x∗) and g∗(x∗) + Dg∗(x∗)(x − x∗), respectively;

3. dropping redundant terms, i.e., the term f(x∗) in the objective function, and the term g∗(x∗) = 0 in the constraint.

Theorem 771 Suppose x∗ is a solution to problem 19.2 and to problem 19.4. Then there exists λ∗ ∈ Rm such that (x∗, λ∗) satisfies the Kuhn-Tucker conditions.

The proof of the above theorem requires the following lemma, whose proof can be found, for example, in Chapter 2 of Mangasarian (1994).

Lemma 772 (Farkas) Given a matrix Am×n and a vector a ∈ Rn,
either 1. there exists λ ∈ Rm+ such that a = λA,
or 2. there exists y ∈ Rn such that Ay ≥ 0 and ay < 0,
but not both.


Proof of Theorem 771. (Main steps: 1. use the fact that x∗ is a solution to problem 19.4; 2. apply Farkas' Lemma; 3. choose λ∗ = (λ from Farkas, 0).)
Since x∗ is a solution to problem 19.4, for any x ∈ Rn such that Dg∗(x∗)(x − x∗) ≥ 0, it is the case that Df(x∗)x∗ ≥ Df(x∗)x, or

Dg∗(x∗)(x − x∗) ≥ 0 ⇒ [−Df(x∗)](x − x∗) ≥ 0. (19.5)

Applying Farkas' Lemma, identifying

a with −Df(x∗)

and

A with Dg∗(x∗),

we have that either
1. there exists λ ∈ Rm∗+ such that

−Df(x∗) = λDg∗(x∗), (19.6)

or 2. there exists y ∈ Rn such that

Dg∗(x∗)y ≥ 0 and −Df(x∗)y < 0, (19.7)

but not both 1 and 2.
Choose x = y + x∗, so that y = x − x∗. Then 19.7 contradicts 19.5. Therefore, 1. above holds.
Now, choosing λ∗ := (λ, 0) ∈ Rm∗ × Rm−m∗, we have that

Df(x∗) + λ∗Dg(x∗) = Df(x∗) + (λ, 0) (Dg∗(x∗); Dĝ(x∗)) = Df(x∗) + λDg∗(x∗) = 0,

where the last equality follows from 19.6;
λ∗ ≥ 0 by Farkas' Lemma;
g(x∗) ≥ 0 from the assumption that x∗ solves problem 19.2;

λ∗g(x∗) = (λ, 0) (g∗(x∗); ĝ(x∗)) = λg∗(x∗) = 0,

where the last equality follows from the definition of g∗.

Theorem 773 If x∗ is a solution to problem (19.2) and
either for j = 1, ..., m, gj is pseudo-concave and ∃x++ ∈ X such that g(x++) ≫ 0,
or rank Dg∗(x∗) = m∗ := #J∗(x∗),
then x∗ solves problem (19.4).

Proof. We prove the conclusion of the theorem under the first set of conditions.
(Main steps: 1. suppose otherwise: ∃x̃ ...; 2. use the two assumptions; 3. move from x∗ in the direction xθ := (1 − θ)x̃ + θx++.)
Suppose that the conclusion of the theorem is false. Then there exists x̃ ∈ Rn such that

Dg∗(x∗)(x̃ − x∗) ≥ 0 and Df(x∗)(x̃ − x∗) > 0. (19.8)

Moreover, from the definition of g∗ and x++, we have that

g∗(x++) ≫ 0 = g∗(x∗).

Since for j = 1, ..., m, gj is pseudo-concave, we have that

Dg∗(x∗)(x++ − x∗) ≫ 0. (19.9)

Define

xθ := (1 − θ)x̃ + θx++


with θ ∈ (0, 1). Observe that

xθ − x∗ = (1 − θ)x̃ + θx++ − (1 − θ)x∗ − θx∗ = (1 − θ)(x̃ − x∗) + θ(x++ − x∗).

Therefore,

Dg∗(x∗)(xθ − x∗) = (1 − θ)Dg∗(x∗)(x̃ − x∗) + θDg∗(x∗)(x++ − x∗) ≫ 0, (19.10)

where the last inequality comes from 19.8 and 19.9. Moreover,

Df(x∗)(xθ − x∗) = (1 − θ)Df(x∗)(x̃ − x∗) + θDf(x∗)(x++ − x∗) > 0, (19.11)

where the last inequality comes from 19.8 and a choice of θ sufficiently small.¹

Observe that from Remark 665, 19.10 and 19.11, we have that

(g∗)′(x∗, xθ) ≫ 0

and

f′(x∗, xθ) > 0.

Therefore, using the fact that X is open, and that ĝ(x∗) ≫ 0, there exists γ > 0 such that

x∗ + γ(xθ − x∗) ∈ X,
g∗(x∗ + γ(xθ − x∗)) ≫ g∗(x∗) = 0,
f(x∗ + γ(xθ − x∗)) > f(x∗),
ĝ(x∗ + γ(xθ − x∗)) ≫ 0.        (19.12)

But then 19.12 contradicts the fact that x∗ solves problem (19.2).
From Theorems 771 and 773, we then get the following corollary.

Theorem 774 Suppose x∗ is a solution to problem 19.2, and one of the following constraint qualifications holds:
a. for j = 1, ..., m, gj is pseudo-concave and there exists x++ ∈ X such that g(x++) ≫ 0;
b. rank Dg∗(x∗) = #J∗.
Then there exists λ∗ ∈ Rm such that (x∗, λ∗) solves the system 19.3.

Theorem 775 If f is pseudo-concave, and for j = 1, ..., m, gj is quasi-concave, and (x∗, λ∗) solves the system 19.3, then x∗ solves problem 19.2.

Proof. (Main steps: 1. suppose otherwise and use the fact that f is pseudo-concave; 2. for j ∈ J∗(x∗), use the quasi-concavity of gj; 3. for j ∈ Ĵ(x∗) := {1, ..., m}\J∗(x∗), use the (second part of the) Kuhn-Tucker conditions; 4. observe that 2. and 3. above contradict the first part of the Kuhn-Tucker conditions.)
Suppose otherwise, i.e., there exists x̂ ∈ X such that

g(x̂) ≥ 0 and f(x̂) > f(x∗). (19.13)

From 19.13 and the fact that f is pseudo-concave, we get

Df(x∗)(x̂ − x∗) > 0. (19.14)

From 19.13, the fact that g∗(x∗) = 0 and that gj is quasi-concave, we get that

for j ∈ J∗(x∗), Dgj(x∗)(x̂ − x∗) ≥ 0,

¹Assume that θ ∈ (0, 1), α ∈ R++ and β ∈ R. We want to show that there exists θ∗ ∈ (0, 1) such that

(1 − θ∗)α + θ∗β > 0,

i.e.,

α > θ∗(α − β).

If (α − β) = 0, the claim is true. If (α − β) > 0, any θ∗ < α/(α − β) will work (observe that α/(α − β) > 0). If (α − β) < 0, the claim is clearly true because 0 < α and θ∗(α − β) < 0.


and, since λ∗ ≥ 0,

for j ∈ J∗(x∗), λ∗j Dgj(x∗)(x̂ − x∗) ≥ 0. (19.15)

For j ∈ Ĵ(x∗), from the Kuhn-Tucker conditions, we have that gj(x∗) > 0 and λ∗j = 0, and therefore

for j ∈ Ĵ(x∗), λ∗j Dgj(x∗)(x̂ − x∗) = 0. (19.16)

But then, from 19.14, 19.15 and 19.16, we have

Df(x∗)(x̂ − x∗) + λ∗Dg(x∗)(x̂ − x∗) > 0,

contradicting the Kuhn-Tucker conditions.

We can summarize the above results as follows. Call (M) the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0 (19.17)

and define

M := argmax (M), (19.18)

S := {x ∈ X : ∃λ ∈ Rm such that (x, λ) is a solution to the Kuhn-Tucker system (19.3)}. (19.19)

1. Assume that one of the following conditions holds:

(a) for j = 1, ..., m, gj is pseudo-concave and there exists x++ ∈ X such that g(x++) ≫ 0;

(b) rank Dg∗(x∗) = #J∗.

Then

x∗ ∈ M ⇒ x∗ ∈ S.

2. Assume that both the following conditions hold:

(a) f is pseudo-concave, and

(b) for j = 1, ..., m, gj is quasi-concave.

Then

x∗ ∈ S ⇒ x∗ ∈ M.

19.1.1 On uniqueness of the solution

The following proposition is a useful tool to show uniqueness.

Proposition 776 The solution to the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0 (P)

either does not exist or is unique if one of the following conditions holds:

1. f is strictly quasi-concave, and for j ∈ {1, ..., m}, gj is quasi-concave;

2. f is quasi-concave and locally non-satiated (i.e., ∀x ∈ X, ∀ε > 0, there exists x0 ∈ B(x, ε) such that f(x0) > f(x)), and for j ∈ {1, ..., m}, gj is strictly quasi-concave.

Proof. 1. Since gj is quasi-concave, V^j := {x ∈ X : gj(x) ≥ 0} is convex. Since the intersection of convex sets is convex, V := ∩_{j=1}^m V^j is convex.
Suppose that both x0 and x00 are solutions to problem (P) and x0 ≠ x00. Then, for any λ ∈ (0, 1),

(1 − λ)x0 + λx00 ∈ V (19.20)


because V is convex, and

f((1 − λ)x0 + λx00) > min {f(x0), f(x00)} = f(x0) = f(x00) (19.21)

because f is strictly quasi-concave.
But (19.20) and (19.21) contradict the fact that x0 and x00 are solutions to problem (P).
2. Observe that V is strictly convex because each V^j is strictly convex. Suppose that both x0 and x00 are solutions to problem (P) and x0 ≠ x00. Then, for any λ ∈ (0, 1),

x(λ) := (1 − λ)x0 + λx00 ∈ Int V,

i.e., ∃ε > 0 such that B(x(λ), ε) ⊆ V. Since f is locally non-satiated, there exists x̂ ∈ B(x(λ), ε) ⊆ V such that

f(x̂) > f(x(λ)). (19.22)

Since f is quasi-concave,

f(x(λ)) ≥ min {f(x0), f(x00)} = f(x0) = f(x00). (19.23)

But (19.22) and (19.23) contradict the fact that x0 and x00 are solutions to problem (P).

Remark 777 1. If f is strictly increasing (i.e., ∀x0, x00 ∈ X such that x0 > x00, we have that f(x0) > f(x00)) or strictly decreasing, then f is locally non-satiated.
2. If f is affine and not constant, then f is quasi-concave and locally non-satiated.
Proof of 2. f : Rn → R affine and not constant means that there exist a ∈ R and b ∈ Rn\{0} such that f : x 7→ a + bᵀx. Take an arbitrary x and ε > 0. Define x̃ := x + (ε/k)(sign bi)_{i=1}^n, where k > 0 will be chosen below. Then

f(x̃) = a + bᵀx + (ε/k) Σ_{i=1}^n |bi| > f(x);

‖x̃ − x‖ = (ε/k) ‖(sign bi)_{i=1}^n‖ ≤ (ε/k) √n < ε if k > √n.

Remark 778 In part 2 of the statement of the Proposition, f has to be both quasi-concave and locally non-satiated.
a. Example of f quasi-concave (and gj strictly quasi-concave) with more than one solution:

max_{x∈R} 1 s.t. x + 1 ≥ 0, 1 − x ≥ 0.

The set of solutions is [−1, +1].
b. Example of f locally non-satiated (and gj strictly quasi-concave) with more than one solution:

max_{x∈R} x² s.t. x + 1 ≥ 0, 1 − x ≥ 0.

The set of solutions is {−1, +1}.

19.2 The Case of Equality Constraints: Lagrange Theorem.

Consider the C1 functions

f : X → R, f : x 7→ f(x),

g : X → Rm, g : x 7→ g(x) := (gj(x))_{j=1}^m,

with m ≤ n. Consider also the following maximization problem:

(P) max_{x∈X} f(x) s.t. g(x) = 0. (19.24)

The function

L : X × Rm → R, L : (x, λ) 7→ f(x) + λg(x)

is called the Lagrange function associated with problem (19.24). We recall below the statement of Theorem 712.


Theorem 779 (Necessary Conditions) Assume that rank [Dg(x∗)] = m. Under this condition, we have that x∗ is a local maximum for (P) ⇒ there exists λ∗ ∈ Rm such that

Df(x∗) + λ∗Dg(x∗) = 0
g(x∗) = 0.        (19.25)

Remark 780 The full rank condition in the above Theorem cannot be dispensed with. The following example shows a case in which x∗ is a solution to maximization problem (19.24), Dg(x∗) does not have full rank, and there exists no λ∗ satisfying Condition 19.25. Consider

max_{(x,y)∈R²} x s.t. x³ − y = 0
                      x³ + y = 0.

The constraint set is {(0, 0)} and therefore the solution is just (x∗, y∗) = (0, 0). The Jacobian matrix of the constraint function, evaluated at (x∗, y∗), is

[ 3x² −1 ]          [ 0 −1 ]
[ 3x²  1 ] |(0,0) = [ 0  1 ],

which does not have full rank. Moreover, the system

(0, 0) = Df(x∗, y∗) + (λ1, λ2)Dg(x∗, y∗) = (1, 0) + (λ1, λ2) [ 0 −1; 0 1 ] = (1, −λ1 + λ2)

requires 1 = 0 in the first component, from which it follows that there exists no λ∗ solving the above system.

Theorem 781 (Sufficient Conditions) Assume that
1. f is pseudo-concave,
2. for j = 1, ..., m, gj is quasi-concave.
Under the above conditions, we have what follows:
[there exists (x∗, λ∗) ∈ X × Rm such that
3. λ∗ ≥ 0,
4. Df(x∗) + λ∗Dg(x∗) = 0, and
5. g(x∗) = 0] ⇒ x∗ solves (P).

Proof. Suppose otherwise, i.e., there exists x̂ ∈ X such that

for j = 1, ..., m, gj(x̂) = gj(x∗) = 0 (1), and

f(x̂) > f(x∗) (2).

Quasi-concavity of gj and (1) imply that

Dgj(x∗)(x̂ − x∗) ≥ 0 (3).

Pseudo-concavity of f and (2) imply that

Df(x∗)(x̂ − x∗) > 0 (4).

But then

0 = [Df(x∗) + λ∗Dg(x∗)](x̂ − x∗) = Df(x∗)(x̂ − x∗) + λ∗Dg(x∗)(x̂ − x∗) > 0,

where the first equality follows from assumption 4., and the final strict inequality follows from (4), (3) and λ∗ ≥ 0, a contradiction.


19.3 The Case of Both Equality and Inequality Constraints.

Consider the open and convex set X ⊆ Rn and the differentiable functions f : X → R, g : X → Rm, h : X → Rl. Consider the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0
                    h(x) = 0.        (19.26)

Observe that

h(x) = 0 ⇔ for k = 1, ..., l, hk(x) = 0 ⇔ for k = 1, ..., l, hk,1(x) := hk(x) ≥ 0 and hk,2(x) := −hk(x) ≥ 0.

Defining h.1(x) := (hk,1(x))_{k=1}^l and h.2(x) := (hk,2(x))_{k=1}^l, problem 19.26, with the associated multipliers listed next to the constraints, can be rewritten as

max_{x∈X} f(x) s.t. g(x) ≥ 0     (multipliers λ)
                    h.1(x) ≥ 0   (multipliers μ1)
                    h.2(x) ≥ 0   (multipliers μ2)        (19.27)

The Lagrangian function of the above problem is

L(x; λ, μ1, μ2) = f(x) + λᵀg(x) + (μ1 − μ2)ᵀh(x) = f(x) + Σ_{j=1}^m λj gj(x) + Σ_{k=1}^l (μ1k − μ2k) hk(x),

and the Kuhn-Tucker Conditions are:

Df(x) + λᵀDg(x) + (μ1 − μ2)ᵀDh(x) = 0,

gj(x) ≥ 0, λj ≥ 0, λj gj(x) = 0, for j = 1, ..., m,

hk(x) = 0, and μk := μ1k − μ2k is of arbitrary sign, for k = 1, ..., l.        (19.28)

Theorem 782 Assume that f, g and h are C2 functions and that

either rank [ Dg∗(x∗)
              Dh(x∗) ] = m∗ + l,

or for j = 1, ..., m, −gj is pseudo-concave, and, for each k, hk and −hk are pseudo-concave.

Under the above conditions, if x∗ solves 19.26, then there exists (x∗, λ∗, μ∗) ∈ X × Rm × Rl which satisfies the associated Kuhn-Tucker conditions.

Proof. The above conditions are called the “weak reverse convex constraint qualification” (Mangasarian (1969)) or the “reverse constraint qualification” (Bazaraa and Shetty (1976)). The needed result is presented and proved in Mangasarian² (see page 172 and Theorem 6, page 173) and in Bazaraa and Shetty (1976) (see page 148, Theorem 6.2.3, page 148, and Theorem 6.2.4, page 150). See also El-Hodiri (1991), Theorem 1, page 48, and Simon (1985), Theorem 4.4.(iii), page 104.

²What Mangasarian calls a linear function is what we call an affine function.


Theorem 783 Assume that f is pseudo-concave, and that for j = 1, ..., m, gj is quasi-concave, and for k = 1, ..., l, both hk and −hk are quasi-concave. Under the above conditions, if (x∗, λ∗, μ∗) ∈ X × Rm × Rl satisfies the Kuhn-Tucker conditions associated with 19.26, then x∗ solves 19.26.

Proof. This follows from the Theorems proved in the case of inequality constraints.

Similarly to what we have done in previous sections, we can summarize what was said above as follows. Call (M2) the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0
                    h(x) = 0        (19.29)

and define

M2 := argmax (M2),

S2 := {x ∈ X : ∃(λ, μ) ∈ Rm × Rl such that (x, λ, μ) is a solution to the Kuhn-Tucker system (19.28)}.

1. Assume that one of the following conditions holds:

(a) rank [ Dg∗(x∗)
           Dh(x∗) ] = m∗ + l, or

(b) for j = 1, ..., m, gj is linear, and h is affine.

Then

M2 ⊆ S2.

2. Assume that both the following conditions hold:

(a) f is pseudo-concave, and

(b) for j = 1, ..., m, gj is quasi-concave, and for k = 1, ..., l, both hk and −hk are quasi-concave.

Then

M2 ⊇ S2.

19.4 Main Steps to Solve a (Nice) Maximization Problem

We have studied the problem

max_{x∈X} f(x) s.t. g(x) ≥ 0, (M)

which we call a maximization problem in the “canonical form”, i.e., a maximization problem with constraints in the form of “≥”, and we have defined

M := argmax (M),

C := {x ∈ X : g(x) ≥ 0},

S := {x ∈ X : ∃λ ∈ Rm such that (x, λ) satisfies the Kuhn-Tucker Conditions (19.3)}.

Recall that X is an open, convex subset of Rn, f : X → R, ∀j ∈ {1, ..., m}, gj : X → R and g := (gj)_{j=1}^m : X → Rm.


In many cases, we have to study the following problem,

max_x f(x) s.t. g(x) ≥ 0, (M′)

in which the set X is not specified. We list the main steps to try to solve (M′).
1. Canonical form.
Write the problem in the (in fact, our definition of) canonical form. Sometimes the problem contains a parameter π ∈ Π, an open subset of Rk. Then we should write: for given π ∈ Π,

max_x f(x, π) s.t. g(x, π) ≥ 0.

2. The set X and the functions f and g.
a. Define the functions f̃, g̃ naturally arising from the problem, with domain equal to their definition set, where the definition set of a function ϕ is the largest set Dϕ which can serve as the domain of that function.
b. Determine X. A possible choice for X is the intersection of the definition sets of the functions involved, i.e.,

X = Df ∩ Dg1 ∩ ... ∩ Dgm.

c. Check if X is open and convex.
d. To apply the analysis described in the previous sections, show, if possible, that f and g are of class C2, or at least C1.
3. Existence.
Try to apply the Extreme Value Theorem. If f is at least C1, then f is continuous, and therefore

we have to check if the constraint set C is non-empty and compact. Recall that a set S in Rn is compact if and only if S is closed (in Rn) and bounded.
Boundedness has to be shown “brute force”, i.e., using the specific form of the maximization problem.
If X = Rn, then C := {x ∈ X : g(x) ≥ 0} is closed, because of the following well-known argument: C = ∩_{j=1}^m gj⁻¹([0, +∞)); since gj is C2 (or at least C1) and therefore continuous, and [0, +∞) is closed, gj⁻¹([0, +∞)) is closed in X = Rn; then C is closed because it is an intersection of closed sets.
A problem may arise if X is an open proper subset of Rn. In that case the above argument shows that C is a closed set in X ≠ Rn, and therefore it is not necessarily closed in Rn. A possible way out is the following one. Verify that, while the definition set of f is X, the definition set of g is Rn. If Dg = Rn, then, from the above argument,

C̃ := {x ∈ Rn : g̃(x) ≥ 0}

is closed (in Rn). Observe that

C = C̃ ∩ X.

Then, we are left with showing that C̃ ⊆ X, and therefore C̃ ∩ X = C̃ and then

C = C̃.

If C̃ is compact, C is compact as well.³

4. Number of solutions.
See Subsection 19.1.1. In fact, summarizing what was said there, we know that the solution to (M), if any, is unique if
1. f is strictly quasi-concave, and for j ∈ {1, ..., m}, gj is quasi-concave; or
2. for j ∈ {1, ..., m}, gj is strictly quasi-concave and

³Observe that the above argument does not apply to the case in which

C = {x ∈ R2++ : w − x1 − x2 ≥ 0}:

in that case,

C̃ = {x ∈ R2 : w − x1 − x2 ≥ 0} ⊄ R2++.


either a. f is quasi-concave and locally non-satiated,
or b. f is affine and non-constant,
or c. f is quasi-concave and strictly monotone,
or d. f is quasi-concave and ∀x ∈ X, Df(x) ≫ 0,
or e. f is quasi-concave and ∀x ∈ X, Df(x) ≪ 0.
5. Necessity of K-T conditions.
Check if the conditions which insure that M ⊆ S hold, i.e.,
either a. for j = 1, ..., m, gj is pseudo-concave and there exists x++ ∈ X such that g(x++) ≫ 0,
or b. rank Dg∗(x∗) = #J∗.
If those conditions hold, each property we show holds for the elements of S also holds, a fortiori, for the elements of M.
6. Sufficiency of K-T conditions.
Check if the conditions which insure that M ⊇ S hold, i.e., that f is pseudo-concave and, for j = 1, ..., m, gj is quasi-concave.
If those conditions hold, each property we show does not hold for the elements of S does not hold, a fortiori, for the elements of M.
7. K-T conditions.
Write the Lagrangian function and then the Kuhn-Tucker conditions.
8. Solve the K-T conditions.
Try to solve the system of Kuhn-Tucker conditions in the unknown variables (x, λ). To do that, either analyze all cases, or try to get a “good conjecture” and check whether the conjecture is correct.

Example 784 Discuss the problem

max_{(x1,x2)} (1/2) log(1 + x1) + (1/3) log(1 + x2) s.t. x1 ≥ 0
                                                         x2 ≥ 0
                                                         x1 + x2 ≤ w,

with w > 0.

1. Canonical form.
For given w ∈ R++,

max_{(x1,x2)} (1/2) log(1 + x1) + (1/3) log(1 + x2) s.t. x1 ≥ 0
                                                         x2 ≥ 0
                                                         w − x1 − x2 ≥ 0.        (19.30)

2. The set X and the functions f and g.
a.

f̃ : (−1, +∞)² → R, (x1, x2) 7→ (1/2) log(1 + x1) + (1/3) log(1 + x2);
g̃1 : R² → R, (x1, x2) 7→ x1;
g̃2 : R² → R, (x1, x2) 7→ x2;
g̃3 : R² → R, (x1, x2) 7→ w − x1 − x2.

b.

X = (−1, +∞)²,

and therefore f and g are just f̃ and g̃ restricted to X.
c. X is open and convex because it is the Cartesian product of open intervals, which are open, convex sets.
d. Let's compute the gradients and the Hessian matrices of f, g1, g2, g3. The gradients are

Df(x1, x2) = (1/(2(x1 + 1)), 1/(3(x2 + 1)));
Dg̃1(x1, x2) = (1, 0);
Dg̃2(x1, x2) = (0, 1);
Dg̃3(x1, x2) = (−1, −1).


The Hessian matrices are

D²f(x1, x2) = [ −1/(2(x1 + 1)²)   0                ]
              [ 0                 −1/(3(x2 + 1)²)  ];

D²g̃1(x1, x2) = D²g̃2(x1, x2) = D²g̃3(x1, x2) = 0.

In conclusion, f and g1, g2, g3 are C2. In fact, g1 and g2 are linear and g3 is affine.
3. Existence.
C is clearly bounded: ∀x ∈ C,

(0, 0) ≤ (x1, x2) ≤ (w, w).

In fact, the first two constraints simply say that (x1, x2) ≥ (0, 0). Moreover, from the third constraint, x1 ≤ w − x2 ≤ w, simply because x2 ≥ 0; a similar argument can be used to show that x2 ≤ w.
To show closedness, use the strategy proposed above:

C̃ := {x ∈ R² : g̃(x) ≥ 0}

is obviously closed. Since C̃ ⊆ R²+, because of the first two constraints, C̃ ⊆ X := (−1, +∞)²,

and therefore C = C̃ ∩ X = C̃ is closed. We can then conclude that C is compact, and therefore argmax (19.30) ≠ ∅.
4. Number of solutions.
From the analysis of the Hessian and using Theorem 763, parts 1 and 2, we have that f is strictly concave:

−1/(2(x1 + 1)²) < 0,

det D²f(x1, x2) = [1/(2(x1 + 1)²)] · [1/(3(x2 + 1)²)] > 0.

Moreover, g1, g2, g3 are affine and therefore concave. From Proposition 776, part 1, the solution is unique.
5. Necessity of K-T conditions.
Since each gj is affine and therefore pseudo-concave, we are left with showing that there exists x++ ∈ X such that g(x++) ≫ 0. Just take (x1++, x2++) = (w/4)(1, 1):

w/4 > 0, w/4 > 0, w − w/4 − w/4 = w/2 > 0.

Therefore

M ⊆ S.

6. Sufficiency of K-T conditions.
f is strictly concave and therefore pseudo-concave, and each gj is affine and therefore quasi-concave. Therefore

M ⊇ S.

7. K-T conditions.

L(x1, x2, λ1, λ2, μ; w) = (1/2) log(1 + x1) + (1/3) log(1 + x2) + λ1x1 + λ2x2 + μ(w − x1 − x2)

1/(2(x1 + 1)) + λ1 − μ = 0
1/(3(x2 + 1)) + λ2 − μ = 0
min {x1, λ1} = 0
min {x2, λ2} = 0
min {w − x1 − x2, μ} = 0


8. Solve the K-T conditions.
Conjecture: x1 > 0, and therefore λ1 = 0; x2 > 0, and therefore λ2 = 0; w − x1 − x2 = 0. The Kuhn-Tucker system becomes:

1/(2(x1 + 1)) − μ = 0
1/(3(x2 + 1)) − μ = 0
w − x1 − x2 = 0
μ ≥ 0
x1 > 0, x2 > 0
λ1 = 0, λ2 = 0

Then,

x1 = 1/(2μ) − 1
x2 = 1/(3μ) − 1
w − (1/(2μ) − 1) − (1/(3μ) − 1) = 0
μ > 0
x1 > 0, x2 > 0
λ1 = 0, λ2 = 0

From the third equation, 0 = w − 5/(6μ) + 2, so μ = 5/(6(w + 2)) > 0. Then x1 = 1/(2μ) − 1 = 6(w + 2)/(2 · 5) − 1 = (3w + 6 − 5)/5 = (3w + 1)/5 and x2 = 1/(3μ) − 1 = 6(w + 2)/(3 · 5) − 1 = (2w + 4 − 5)/5 = (2w − 1)/5.
Summarizing,

x1 = (3w + 1)/5 > 0
x2 = (2w − 1)/5
μ = 5/(6(w + 2)) > 0
λ1 = 0, λ2 = 0

Observe that while x1 > 0 for any value of w, x2 > 0 iff w > 1/2. Therefore, for w ∈ (0, 1/2], the above is not a solution, and we have to come up with another conjecture: x1 = w, and therefore λ1 = 0; x2 = 0 and λ2 ≥ 0; w − x1 − x2 = 0 and μ ≥ 0. The Kuhn-Tucker conditions become

1/(2(w + 1)) − μ = 0
1/3 + λ2 − μ = 0
λ1 = 0
x2 = 0
λ2 ≥ 0
x1 = w
μ ≥ 0

and

μ = 1/(2(w + 1)) > 0
λ2 = 1/(2(w + 1)) − 1/3 = (3 − 2w − 2)/(6(w + 1)) = (1 − 2w)/(6(w + 1))
λ1 = 0
x2 = 0
x1 = w

λ2 = (1 − 2w)/(6(w + 1)) = 0 if w = 1/2, and λ2 = (1 − 2w)/(6(w + 1)) > 0 if w ∈ (0, 1/2).
Summarizing, the unique solution x∗ to the maximization problem is:

if w ∈ (0, 1/2), then x∗1 = w, λ∗1 = 0 and x∗2 = 0, λ∗2 > 0;
if w = 1/2, then x∗1 = w, λ∗1 = 0 and x∗2 = 0, λ∗2 = 0;
if w ∈ (1/2, +∞), then x∗1 = (3w + 1)/5 > 0, λ∗1 = 0 and x∗2 = (2w − 1)/5 > 0, λ∗2 = 0.
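The closed-form solution just obtained can be cross-checked by brute force on a grid (an added sketch; the grid resolution and the values of w are arbitrary):

    import numpy as np

    def solve_by_grid(w, n=2001):
        x1 = np.linspace(0.0, w, n)
        x2 = np.linspace(0.0, w, n)
        X1, X2 = np.meshgrid(x1, x2)
        F = 0.5 * np.log(1 + X1) + np.log(1 + X2) / 3.0
        F[X1 + X2 > w] = -np.inf                    # enforce x1 + x2 <= w
        i, j = np.unravel_index(np.argmax(F), F.shape)
        return X1[i, j], X2[i, j]

    print(solve_by_grid(1.0))    # approx ((3w+1)/5, (2w-1)/5) = (0.8, 0.2)
    print(solve_by_grid(0.25))   # approx (w, 0) = (0.25, 0.0)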


The graph of x∗1 as a function of w is presented below (please, complete the picture).

[Figure: graph of x∗1 as a function of w.]

The graph below shows the constraint sets for w = 1/4, 1/2, 1, and some significant level curves of the objective function.

[Figure: constraint sets for w = 1/4, 1/2, 1 and level curves of the objective function.]

Observe that in the example we get that if λ∗2 = 0, the associated constraint x2 ≥ 0 is not significant. See Subsection 19.6.2 for a discussion of that statement.

Of course, several problems may arise in applying the above procedure. Below, we describe some commonly encountered problems and some possible (partial) solutions.

19.4.1 Some problems and some solutions

1. The set X.

X is not open. Rewrite the problem in terms of an open set X′ and some added constraints. A standard example is the following one:

max_{x∈Rn+} f(x) s.t. g(x) ≥ 0,


which can be rewritten as

max_{x∈Rn} f(x) s.t. g(x) ≥ 0
                     x ≥ 0.

2. Existence.

a. The constraint set is not compact. If the constraint set is not compact, it is sometimes possible to find another maximization problem such that
i. its constraint set is compact and nonempty, and
ii. its solution set is contained in the solution set of the problem we are analyzing.
A way to try to achieve both i. and ii. above is to “restrict the constraint set (to make it compact) without eliminating the solutions of the original problem”. Sometimes, a problem with the above properties is the following one:

max_{x∈X} f(x) s.t. g(x) ≥ 0
                    f(x) − f(x̂) ≥ 0, (P1)

where x̂ is an element of X such that g(x̂) ≥ 0. Observe that (P − cons) is an example of (P1).
In fact, while condition i. above, i.e., the compactness of the constraint set of (P1), depends upon the specific characteristics of X, f and g, condition ii. above is always satisfied by problem (P1), as shown in detail below. Define

M := argmax (P), M1 := argmax (P1),

and V and V1 the constraint sets of Problems (P) and (P1), respectively. Observe that

V1 ⊆ V. (19.31)

If V1 is compact, then M1 ≠ ∅, and the only thing left to show is that M1 ⊆ M, which is always insured, as proved below.

Proposition 785 M1 ⊆ M.

Proof. If M1 = ∅, we are done.
Suppose that M1 ≠ ∅ and that the conclusion of the Proposition is false, i.e., there exists x1 such that a. x1 ∈ M1 and b. x1 ∉ M, that is:
a. ∀x ∈ X such that g(x) ≥ 0 and f(x) ≥ f(x̂), we have f(x1) ≥ f(x);
and
b. either i. x1 ∉ V, or ii. ∃x̃ ∈ X such that

g(x̃) ≥ 0 (19.32)

and

f(x̃) > f(x1). (19.33)

Let's show that neither i. nor ii. can hold.
i. It cannot hold simply because x1 ∈ V1 ⊆ V, from 19.31.
ii. Since x1 ∈ V1,

f(x1) ≥ f(x̂). (19.34)

From (19.33) and (19.34), it follows that

f(x̃) > f(x̂). (19.35)

But (19.32), (19.35) and (19.33) contradict the definition of x1, i.e., a. above.


b. Existence without the Extreme Value Theorem. If you are not able to show existence, but
i. sufficient conditions to apply the Kuhn-Tucker conditions hold, and
ii. you are able to find a solution to the Kuhn-Tucker conditions,
then a solution exists.

19.5 The Implicit Function Theorem and Comparative Statics Analysis

The Implicit Function Theorem can be used to study how solutions (x ∈ X ⊆ Rn) to maximization problems and, if needed, the associated Lagrange or Kuhn-Tucker multipliers (λ ∈ Rm) change when parameters (π ∈ Π ⊆ Rk) change. That analysis can be done if the solutions to the maximization problem (and the multipliers) are solutions to a system of equations of the form

F1(x, π) = 0,

with (# choice variables) = (dimension of the codomain of F1), or

F2(ξ, π) = 0,

where ξ := (x, λ) and (# choice variables and multipliers) = (dimension of the codomain of F2).
To apply the Implicit Function Theorem, the following conditions must hold.

1. (# choice variables x) = (dimension of the codomain of F1), or (# choice variables and multipliers) = (dimension of the codomain of F2).

2. Fi has to be at least C1. That condition is insured if the above systems are obtained from maximization problems characterized by functions f, g which are at least C2: usually the above systems contain some form of first order conditions, which are written using first derivatives of f and g.

3. F1(x∗, π0) = 0 or F2(ξ∗, π0) = 0. The existence of a solution to the system is usually the result of the strategy describing how to solve a maximization problem, see Section 19.4 above.

4. det [DxF1(x∗, π0)]_{n×n} ≠ 0 or det [DξF2(ξ∗, π0)]_{(n+m)×(n+m)} ≠ 0. That condition has to be verified directly on the problem.

If the above conditions are verified, the Implicit Function Theorem allows us to conclude what follows (stated with reference to F2).
There exist an open neighborhood N(ξ∗) ⊆ X × Rm of ξ∗, an open neighborhood N(π0) ⊆ Π of π0, and a unique C1 function g : N(π0) ⊆ Π ⊆ Rk → N(ξ∗) such that ∀π ∈ N(π0), F2(g(π), π) = 0 and

Dg(π) = −[DξF2(ξ, π)|_{ξ=g(π)}]⁻¹ · [DπF2(ξ, π)|_{ξ=g(π)}].

iTherefore, using the above expression, we may be able to say if the increase in any value of any

parameter implies an increase in the value of any choice variable (or multiplier).Three significant cases of application of the above procedure are presented below. We are going

to consider C2 functions defined on open subsets of Euclidean spaces.
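As a concrete illustration (an added sketch, not part of the original text), take the interior regime of Example 784 (w > 1/2), where the binding-constraint system is F(x1, x2, μ; w) = 0 below. The Jacobians can be approximated by finite differences and Dg(π) = −[DξF]⁻¹ · [DπF] recovered; the step size and helper names are our own.

    import numpy as np

    def F(xi, w):
        x1, x2, mu = xi
        return np.array([0.5 / (1 + x1) - mu,
                         1.0 / (3.0 * (1 + x2)) - mu,
                         w - x1 - x2])

    def jacobian(fun, z, h=1e-6):
        # forward-difference Jacobian of fun at z
        f0 = fun(z)
        return np.column_stack([(fun(z + h * e) - f0) / h for e in np.eye(z.size)])

    w0 = 1.0
    xi0 = np.array([0.8, 0.2, 5.0 / (6.0 * (w0 + 2.0))])  # closed-form solution at w0
    DxiF = jacobian(lambda z: F(z, w0), xi0)
    DwF = (F(xi0, w0 + 1e-6) - F(xi0, w0)) / 1e-6
    print(-np.linalg.solve(DxiF, DwF))  # approx (3/5, 2/5, dmu/dw): matches x1 = (3w+1)/5, x2 = (2w-1)/5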

19.5.1 Maximization problem without constraint

Assume that the problem to study ismaxx∈X

f (x, π)

and that1. f is concave;

Page 257: Eui math-course-2012-09--19

19.5. THE IMPLICIT FUNCTION THEOREM AND COMPARATIVE STATICS ANALYSIS251

2. There exists a solution x∗ to the above problem associated with π0.Then, from Proposition 726, we know that x∗ is a solution to

Df (x, π0) = 0

Therefore, we can try to apply the Implicit Function Theorem to

F1 (x, π) = Df (x, π0)

An example of application of the strategy illustrated above is presented in Section 20.3.

19.5.2 Maximization problem with equality constraints

Consider a maximization problem

maxx∈X f (x, π) s.t. g (x, π) = 0,

Assume that necessary and sufficient conditions to apply Lagrange Theorem hold and that thereexists a vector (x∗, λ∗) which is a solution (not necessarily unique) associated with the parameterπ0. Therefore, we can try to apply the Implicit Function Theorem to

F2 (ξ, π) =

µDf (x∗, π0) + λ∗Dg (x∗, π0)g (x∗, π0) .

¶(19.36)

19.5.3 Maximization problem with Inequality Constraints

Consider the following maximization problems with inequality constraints. For given π ∈ Π,

maxx∈X f (x, π) s.t. g (x, π) ≥ 0 (19.37)

Moreover, assume that the set of solutions of that problem is nonempty and characterized by theset of solutions of the associated Kuhn-Tucker system, i.e., using the notation of Subsection 19.1,

M = S 6= ∅.

We have seen that we can write Kuhn-Tucker conditions in one of the two following ways, besidesome other ones, ⎧⎪⎪⎨⎪⎪⎩

Df (x, π) + λDg (x, π) = 0 (1)λ ≥ 0 (2)g (x, π) ≥ 0 (3)λg (x, π) = 0 (4)

(19.38)

or ½Df (x, π) + λDg (x, π) = 0 (1)min {λj , gj (x, π)} = 0 for j ∈ {1, ...,m} (2)

(19.39)

The Implicit Function Theorem cannot be applied to either system (19.38) or system (19.39):system (19.38) contains inequalities; system (19.39) involves functions which are not differentiable.We present below conditions under which the Implicit Function Theorem can be anyway appliedto allow to make comparative statics analysis. Take a solution (x∗, λ∗, π0) to the above system(s).Assume that

for each j, either λ∗j > 0 or gj (x∗, π0) > 0.

In other words, there is no j such that λj = gj (x∗, π0) = 0. Consider a partition J∗, bJ of

{1, ..,m}, and the resulting Kuhn-Tucker conditions.⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩Df (x∗, π0) + λ∗Dg (x∗, π0) = 0λ∗j > 0 for j ∈ J∗

gj (x∗, π0) = 0 for j ∈ J∗

λ∗j = 0 for j ∈ bJgj (x

∗, π0) > 0 for j ∈ bJ(19.40)

Page 258: Eui math-course-2012-09--19

252 CHAPTER 19. MAXIMIZATION PROBLEMS

Defineg∗ (x∗, π0) := (gj (x

∗, π0))j∈J∗bg (x∗, π0) := (gj (x∗, π0))j∈Jλ∗∗ :=

¡λ∗j¢j∈J∗bλ∗ := ¡λ∗j¢j∈J

Write the system of equations obtained from system (19.40) eliminating strict inequality con-straints and substituting in the zero variables:½

Df (x∗, π0) + λ∗∗Dg∗ (x∗, π0) = 0g∗ (x∗, π0) = 0

(19.41)

Observe that the number of equations is equal to the number of “remaining” unknowns andthey are

n+#J∗

i.e., Condition 1 presented at the beginning of the present Section 19.5 is satisfied. Assume thatthe needed rank condition does hold and we therefore can apply the Implicit Function Theorem to

F2 (ξ, π) =

µDf (x∗, π0) + λ∗∗Dg∗ (x∗, π0)g∗ (x∗, π0)

¶= 0

Then. we can conclude that there exists a unique C1 function ϕ defined in an open neighborhoodN1 of π0 such that

∀π ∈ N1, ϕ (π) := (x∗ (π) , λ∗∗ (π))

is a solution to system (19.41) at π.Therefore, by definition of ϕ,½

Df (x∗ (π) , π) + λ∗∗ (π)T Dg∗ (x∗ (π) , π) = 0g∗ (x∗ (π) , π) = 0

(19.42)

Since ϕ is continuous and λ∗∗ (π0) > 0 and bg (x∗ (π0) , π0) > 0, there exist an open neighborhoodN2 ⊆ N1 of π0 such that ∀π ∈ N2 ½

λ∗∗ (π) > 0bg (x (π) , π) > 0(19.43)

Take also ∀π ∈ N2 n bλ∗ (π) = 0 (19.44)

Then, systems (19.42), (19.43) and (19.44) say that ∀π ∈ N2,³x (π) , λ∗ (π) , bλ (π)´ satisfy

Kuhn-Tucker conditions for problem (19.37) and therefore, since C =M , they are solutions to themaximization problem.The above conclusion does not hold true if Kuhn-Tucker conditions are of the following form⎧⎪⎪⎨⎪⎪⎩

Df (x, π) + λTDg (x, π) = 0λj = 0, gj (x, π) = 0 for j ∈ J 0

λj > 0, gj (x, π) = 0 for j ∈ J 00

λj = 0, gj (x, π) > 0 for j ∈ bJ(19.45)

where J 0 6= ∅, J 00 and bJ is a partition of J .In that case, applying the same procedure described above, i.e., eliminating strict inequality

constraints and substituting in the zero variables, leads to the following systems in the unknownsx ∈ Rn and (λj)j∈J00 ∈ R#J

00:⎧⎨⎩

Df (x, π) + (λj)j∈J00 D (gj)j∈J00 (x, π) = 0

gj (x, π) = 0 for j ∈ J 0

gj (x, π) = 0 for j ∈ J 00

Page 259: Eui math-course-2012-09--19

19.6. THE ENVELOPE THEOREM AND THE MEANING OF MULTIPLIERS 253

and therefore the number of equation is n + #J 00 + #J 0 > n + #J 00,simply because we areconsidering the case J 0 6= ∅. Therefore the crucial condition

(# choice variables and multipliers) = (# dimension of the codomain of F2)

is violated.Even if the Implicit Function Theorem could be applied to the equations contained in (19.45),

in an open neighborhood of π0 we could have

λj (π) < 0 and/or gj (x (π) , π) < 0 for j ∈ J 0

Then ϕ (π) would be solutions to a set of equations and inequalities which are not Kuhn-Tuckerconditions of the maximization problem under analysis, and therefore x (π) would not be a solutionto the that maximization problem.An example of application of the strategy illustrated above is presented in Section 20.1.

19.6 The Envelope Theorem and the meaning of multipliers

19.6.1 The Envelope Theorem

Consider the problem (M) : for given π ∈ Π,

maxx∈X f (x, π) s.t. g (x, π) = 0

Assume that for every π, the above problem admits a unique solution characterized by Lagrangeconditions and that the Implicit function theorem can be applied. Then, there exists an open setO ⊆ Π such that

x : O→ X, x : π 7→ argmax (P ) ,v : O→ R, v : π 7→ max (P ) andλ : O→ Rm, π 7→ unique Lagrange multiplier vector

are differentiable functions.

Theorem 786 For any π∗ ∈ O and for any pair of associated (x∗, λ∗) := (x (π∗) , λ (π∗)), we have

Dπv (π∗) = DπL (x∗, λ∗, π∗)

i.e.,Dπv (π

∗) = Dπf (x∗, π∗) + λ∗Dπg (x

∗, π∗)

Remark 787 Observe that the above analysis applies also to the case of inequality constraints, aslong as the set of binding constraints does not change.

Proof. of Theorem 786 By definition of v (.) and x (.) , we have that

∀π ∈ O, v (π) = f (x (π) , π) . (1)

Consider an arbitrary value π∗ and the unique associate solution x∗ = x (π∗) of problem (P ) .Differentiating both sides of (1) with respect to π and computing at π∗, we get

[Dπv (π∗)]1×k =

hDxf (x, π)|(x∗,π∗)

i1×n

·hDπx (π)|π=π∗

in×k

+hDπf (x, π)|(x∗,π∗)

i1×k

(2)

From Lagrange conditions

Dxf (x, π)|(x∗,π∗) = −λ∗Dxg (x, π)|(x∗,π∗) (3) ,

where λ∗ is the unique value of the Lagrange multiplier. Moreover

∀π ∈ O, g (x (π) , π) = 0. (4)

Page 260: Eui math-course-2012-09--19

254 CHAPTER 19. MAXIMIZATION PROBLEMS

Differentiating both sides of (4) with respect to π and computing at π∗, we gethDxg (x, π)|(x∗,π∗)

im×n

·h[Dπx (π)]|π=π∗

in×k

+hDπg (x, π)|(x∗,π∗)

im×k

= 0 (5) .

Finally,

[Dπv (π∗)]1×k

(2),(3)= −λ∗Dxg (x, π)|(x∗,π∗)Dπx (π)|π=π∗ +Dπf (x, π)|(x∗,π∗)

(5)=

= Dπf (x, π)|(x∗,π∗) + λ∗Dπg (x, π)|(x∗,π∗)

19.6.2 On the meaning of the multipliers

The main goal of this subsection is to try to formalize the following statements.

1. The fact that λj = 0 indicates that the associated constraint gj (x) ≥ 0 is not significant - seeProposition 788 below.

2. The fact that λj > 0 indicates that a way to increase the value of the objective function is toviolate the associated constraint gj (x) ≥ 0 - see Proposition 789 below.

For simplicity, consider the case m = 1. Let (CP ) be the problem

maxx∈X

f (x) s.t. g (x) ≥ 0

and (UP ) the problemmaxx∈X

f (x)

with f strictly quasi-concave, g is quasi-concave and solutions to both problem exist . Definex∗ := argmax (CP ) with associated multiplier λ∗, and x∗∗ := argmax (UP ).

Proposition 788 If λ∗ = 0, then x∗ = argmax (UP )⇔ x∗ = argmax (CP ) .

Proof. By the assumptions of this section, the solution to (CP ) exists, is unique, it is equal tox∗ and there exists λ∗ such that

Df (x∗) + λ∗Dg (x∗) = 0min {g (x∗) , λ∗} = 0.

Moreover, the solution to (UP ) exists, is unique and it is the solution to

Df (x) = 0.

Since λ∗ = 0, the desired result follows.Take ε > 0 and k ∈ (−ε,+∞). Let (CPk) be the problem

maxx∈X

f (x) s.t. g (x) ≥ k

Let bx : (−ε,+∞)→ X, k 7→ argmax (CPk)

bv : (−ε,+∞)→ R, k 7→ max (CPk) := f (bx (k))Let bλ (k) be such that ³bx (k) , bλ (k)´ is the solution to the associated Kuhn-Tucker conditions.Observe that

x∗ = bx (0) , λ∗ = bλ (0) (19.46)

Proposition 789 If λ∗ > 0, then bv0 (0) < 0.

Page 261: Eui math-course-2012-09--19

19.6. THE ENVELOPE THEOREM AND THE MEANING OF MULTIPLIERS 255

Proof. From the envelope theorem,

∀k ∈ (−ε,+∞) , bv0 (k) = ∂ (f (x) + λ (g (x)− k))

∂k |x(k),λ(k)= −bλ (k)

and from (19.46) bv0 (0) = −bλ (0) = −λ∗ < 0.Remark 790 Consider the following problem. For given a ∈ R,

maxx∈X f (x) s.t. g (x)− a ≥ 0 (19.47)

Assume that the above problem is “well-behaved” and that x (a) = argmax (19.47), v (a) =f (x (a)) and (x (a) , λ (a)) is the solution of the associated Kuhn-Tucker conditions. Then, applyingthe Envelope Theorem we have

v0 (a) = λ (a)

Page 262: Eui math-course-2012-09--19

256 CHAPTER 19. MAXIMIZATION PROBLEMS

Page 263: Eui math-course-2012-09--19

Chapter 20

Applications to Economics

20.1 The Walrasian Consumer ProblemThe utility function of a household is

u : RC++ → R : x7→u (x) .

Assumption. u is a C2 function; u is differentiably strictly increasing, i.e., ∀x ∈ RC++, Du (x)À 0;u is differentiably strictly quasi-concave, i.e., ∀x ∈ RC++, ∆x 6= 0 and Du (x)∆x = 0 ⇒∆xTD2u (x)∆x < 0; for any u ∈ R,

©x ∈ RC++ : u (x) ≥ u

ªis closed in RC .

The maximization problem for household h is

(P1) maxx∈RC++ u (x) s.t. px− w ≤ 0.

The budget set of the above problem is clearly not compact. But, in the Appendix, we showthat the solution of (P1) are the same as the solutions of (P2) and (P3) below. Observe that theconstraint set of (P3) is compact.

(P2) maxx∈RC++ u (x) s.t. px− w = 0;

(P3) maxx∈RC++ u (x) s.t. px− w ≤ 0;u (x) ≥ u (e∗) ,

where e∗ ∈©x ∈ RC++ : px ≤ w

ª.

Theorem 791 Under the Assumptions (smooth 1-5), ξh (p,wh) is a C1 function.

Proof.Observe that, from it can be easily shown that, ξ is a function.We want to show that (P2) satisfies necessary and sufficient conditions to Lagrange Theorem,

and then apply the Implicit Function Theorem to the First Order Conditions of that problem.The necessary condition is satisfied because Dx [px− w] = p 6= 0;Define also

μ : RC++ ×RC−1++ → RC++,μ : (p,w) 7→ Lagrange multiplier for (P2) .

The sufficient conditions are satisfied because: from Assumptions (smooth 4), u is differentiablystrictly quasi-concave; the constraint is linear; the Lagrange multiplier μ is strictly positive -seebelow.The Lagrangian function for problem (P2) and the associated First Order Conditions are de-

scribed below.L (x, μ, p, w) = u (x) + μ · (−px+ w)

(FOC) (1) Du (x)− μp = 0(2) −px+ w = 0

257

Page 264: Eui math-course-2012-09--19

258 CHAPTER 20. APPLICATIONS TO ECONOMICS

DefineF : RC++ ×R++ ×RC++ ×R++ → RC ×R,

F : (x, μ, p,w) 7→µ

Du (x)− μp−px+ w

¶.

As an application of the Implicit Function Theorem, it is enough to show thatD(x,μ)F (x, μ, p, w)has full row rank (C + 1).Suppose D(x,μ)F does not have full rank; then there would exist∆x ∈ RC and ∆μ ∈ R such that ∆ := (∆x,∆μ) 6= 0 and D(x,μ)F · (∆x,∆μ) = 0, or∙

D2u (x) −pT−p 0

¸ ∙∆x∆μ

¸= 0,

or(a) D2u (x)∆x− pT∆μ = 0(b) −p∆x = 0

.The idea of the proof is to contradict Assumption u3.Claim 1. ∆x 6= 0.

By assumption it must be ∆ 6= 0 and therefore, if ∆x = 0, ∆μ 6= 0. Since p ∈ RC++, pT∆μ 6= 0.Moreover, if ∆x = 0, from (a), we would have pT∆μ = 0, a contradiction. .Claim 2. Du ·∆x = 0.

From (FOC1) , we have Du ·∆x− μhp ·∆x = 0; using (b) the desired result follows .Claim 3. ∆xTD2u ·∆x = 0.

Premultiplying (a) by ∆xT , we get ∆xTD2u (x)∆x−∆xT pT∆μ = 0. Using (b) , the result follows.Claims 1, 2 and 3 contradict Assumption u3.

The above result gives also a way of computing D(p,w)x (p,w) , as an application of the ImplicitFunction Theorem .Since

x μ p w

Du (x)− μp D2u −pT −μIC 0−px+ w −p 0 −x 1£

D(p,w) (x, μ) (p,w)¤(C+1)×(C+1) =

∙Dpx DwxDpμ Dpμ

¸=

= −∙D2u −pT−p 0

¸−1(C+1)×(C+1)

∙−μIC 0−x 1

¸(C+1)×(C+1)

To compute the inverse of the above matrix, we can use the following fact about the inverse ofpartitioned matrix (see for example, Goldberger, (1963), page 26)Let A be an n× n nonsingular matrix partitioned as

A =

∙E FG H

¸,

where En1×n1 , Fn1×n2 , Gn2×n1 , Hn2×n2 and n1 + n2 = n. Suppose that E and D := H −GE−1F are non singular. Then

A−1 =

∙E−1

¡I + FD−1GE−1

¢−E−1FD−1

−D−1GE−1 D−1

¸.

If we assume that D2u is negative definite and therefore invertible, we have∙D2u −pT−p 0

¸−1=

" ¡D2¢−1 ³

I + δ−1pT p¡D2¢−1´

δ−1¡D2¢−1

pT

δ−1p¡D2¢−1

δ−1

#

where δ = −p¡D2¢−1

pT ∈ R++.And

Page 265: Eui math-course-2012-09--19

20.2. PRODUCTION 259

[Dpx (p,w)]C×C = −h ¡

D2¢−1 ³

I + δ−1pTp¡D2¢−1´

δ−1¡D2¢−1

pTi ∙ μIC

x

¸=

= −μ¡D2¢−1 ³

I + δ−1pT p¡D2¢−1´− δ−1

¡D2¢−1

pTx = −δ−1¡D2¢−1 h

μ³δI + pT p

¡D2¢−1´

+ pTxi

[Dwx (p,w)]C×1 = −h ¡

D2¢−1 ³

I + δ−1pT p¡D2¢−1´

δ−1¡D2¢−1

pTi ∙ 0−1

¸= δ−1

¡D2¢−1

pT

[Dpμ (p,w)]1×C = −hδ−1p

¡D2¢−1

δ−1i ∙ μIC

x

¸= −δ−1

³μp¡D2¢−1

+ x´.

[Dwμ (p,w)]1×1 = −hδ−1p

¡D2¢−1

δ−1i ∙ 0−1

¸= δ−1.

.

.As a simple application of the Envelope Theorem, we also have that, defined the indirect utility

function as

v : RC+1++ → R, v : (p,w) 7→ u (x (p,w)) ,

we have that

D(p,w)v (p,w) = λ£−xT 1

¤.

20.2 Production

Definition 792 A production vector (or input-output or netput vector) is a vector y := (yc)Cc=1 ∈RC which describes the net outputs of C commodities from a production process. Positive numbersdenote outputs, negative numbers denote inputs, zero numbers denote commodities neither used norproduced.

Observe that, given the above definition, py is the profit of the firm.

Definition 793 The set of all feasible production vectors is called the production set Y ⊆ RC . Ify ∈ Y, then y can be obtained as a result of the production process; if y /∈ Y,that is not the case.

Definition 794 The Profit Maximization Problem (PMP) is

maxy

py s.t. y ∈ Y.

It is convenient to describe the production set Y using a function F : RC → R called thetransformation function. That is done as follows:

Y =©y ∈ RC : F (y) ≥ 0

ª.

We list below a smooth version of the assumptions made on Y , using the transformation function.Some assumption on F (.) .(1) ∃y ∈ RC such that F (y) ≥ 0.(2) F is C2.(3) (No Free Lunch) If y ≥ 0, then F (y) < 0.(4) (Possibility of Inaction) F (0) = 0.(5) (F is differentiably strictly decreasing) ∀y ∈ RC , DF (y)¿ 0(6) (Irreversibility) If y 6= 0 and F (y) ≥ 0, then F (−y) < 0.(7) (F is differentiably strictly concave) ∀∆ ∈ RC\ {0} , ∆TD2F (y)∆ < 0.

Definition 795 Consider a function F (.) satisfying the above properties and a strictly positive realnumber N . The Smooth Profit Maximization Problem (SPMP) is

maxy

py s.t. F (y) ≥ 0 and kyk ≤ N. (20.1)

Page 266: Eui math-course-2012-09--19

260 CHAPTER 20. APPLICATIONS TO ECONOMICS

Remark 796 For any solution to the above problem it must be the case that F (y) = 0. Supposethere exists a solution y0 to (SPMP) such that F (y0) > 0. Since F is continuous, in fact C2,there exists ε > 0 such that z ∈ B (y0, ε) ⇒ F (z) > 0. Take z0 = y0 + ε·1

C . Then, d (y0, z0) :=³PCc=1

¡εC

¢2´ 12

=³C¡εC

¢2´ 12

=³ε2

C

´ 12

= ε√C< ε. Therefore z0 ∈ B (y0, ε) and

F (z0) > 0 (1) .

But,

pz0 = py0 + pε · 1C

> py0 (2) .

(1) and (2) contradict the fact that y0 solves (SPMP).

Proposition 797 If a solution with kyk < N to (SPMP ) exists . Then y : RC++ → RC , p 7→argmax (20.1) is a well defined C1function.

Proof.Let’s first show that y (p) is single valued.Suppose there exist y, y0 ∈ y (p) with y 6= y0. Consider yλ := (1− λ) y+λy0. Since F (.) is strictly

concave, it follows that F¡yλ¢> (1− λ)F (y) + λF (y0) ≥ 0, where the last inequality comes from

the fact that y, y0 ∈ y (p) . But then F¡yλ¢> 0. Then following the same argument as in Remark

796, there exists ε > 0 such that z0 = yλ+ ε·1C and F (z0) > 0. But pz0 > pyλ = (1− λ) py+λpy0 = py,

contradicting the fact that y ∈ y (p) .Let’s now show that y is C1

From Remark 796 and from the assumption that kyk < N, (SPMP) can be rewritten asmaxy py s.t. F (y) = 0. We can then try to apply Lagrange Theorem.Necessary conditions: DF (y)¿ 0;sufficient conditions: py is linear and therefore pseudo-concave; F (.) is differentiably strictly

concave and therefore quasi-concave; the Lagrange multiplier λ is strictly positive -see below.Therefore, the solutions to (SPMP ) are characterized by the following First Order Conditions,

i.e., the derivative of the Lagrangian function with respect to y and λ equated to zero:

y pL (y, p) = py + λF (y) . p+ λDF (y) = 0 F (y) = 0

.Observe that λ = − p1

Dy1F (y)> 0.

As usual to show differentiability of the choice function we take derivatives of the First OrderConditions.

y λ

p+ λDF (y) = 0 D2F (y) [DF (y)]T

F (y) = 0 DF (y) 0

We want to show that the above matrix has full rank. By contradiction, assume that thereexists ∆ := (∆y,∆λ) ∈ RC ×R, ∆ 6= 0 such that∙

D2F (y) [DF (y)]T

DF (y) 0

¸ ∙∆y∆λ

¸= 0,

i.e.,

D2F (y) ·∆y + [DF (y)]T ·∆λ = 0 (a) ,

DF (y) ·∆y = 0 (b) .

Premultiplying (a) by ∆yT , we get ∆yT ·D2F (y) ·∆y+∆yT · [DF (y)]T ·∆λ = 0. From (b) , itfollows that ∆yT ·D2F (y) ·∆y = 0, contradicting the differentiably strict concavity of F (.) .(3)From the Envelope Theorem, we know that if

¡y, λ

¢is the unique pair of solution-multiplier

associated with p, we have that

Page 267: Eui math-course-2012-09--19

20.3. THE DEMAND FOR INSURANCE 261

Dpπ (p)|p = Dp (py)|(p,y) + λDpF (y)|(p,y) .

Since Dp (py)|(p,y) = y, DpF (y) = 0 and, by definition of y, y = y (p) , we get Dpπ (p)|p = y (p) ,as desired.(4)

From (3) , we have that Dpy (p) = D2pπ (p) . Since π (.) is convex -see Proposition ?? (2)- the

result follows.(5)

From Proposition ?? (4) and the fact that y (p) is single valued, we know that ∀α ∈ R++, y (p)−y (αp) = 0. Taking derivatives with respect to α, we have Dpy (p)|(αp) ·p = 0. For α = 1, the desiredresult follows.

20.3 The demand for insurance

Consider an individual whose wealth is

W − d with probability π, andW with probability 1− π,

where W > 0 and d > 0.

Let the functionu : A→ R, u : c→ u (c)

be the individual’s Bernoulli function.

Assumption 1. ∀c ∈ R, u0 (c) > 0 and u00 (c) < 0.

Assumption 2. u is bounded above.

An insurance company offers a contract with following features: the potentially insured individ-ual pays a premium p in each state and receives d if the accident occurs. The (potentially insured)individual can buy a quantity a ∈ R of the contract. In the case, she pays a premium (a · p) in eachstate and receives a reimbursement (a · d) if the accident occurs. Therefore, if the individual buysa quantity a of the contract, she get a wealth described as follows

W1 :=W − d− ap+ ad with probability π, andW2 :=W − ap with probability 1− π.

(20.2)

Remark 798 It is reasonable to assume that p ∈ (0, d) .

DefineU : R→ R, U : a 7→ πu (W − d− ap+ ad) + (1− π)u (W − ap) .

Then the individual solves the following problem. For given, W ∈ R++, d ∈ R++, p ∈ (0, d) ,π ∈ (0, 1)

maxa∈R U (a) (M) (20.3)

To show existence of a solution, we introduce the problem presented below. For given W ∈R++, d ∈ R++, p ∈ (0, d) , π ∈ (0, 1)

maxa∈R U (a) s.t. U (a) ≥ U (0) (M 0)

DefinedA∗ := argmax (M) and A0 := argmax (M 0) , the existence of solution to (M) , followsfrom the Proposition below.

Proposition 799 1. A0 ⊂ A∗. 2. A0 6= ∅.

Page 268: Eui math-course-2012-09--19

262 CHAPTER 20. APPLICATIONS TO ECONOMICS

Proof.Exercise

To show that the solution is unique, observe that

U 0 (a) = πu0 (W − d+ a (d− p)) (d− p) + (1− π)u0 (W − ap) (−p) (20.4)

and therefore

U 00 (a) =(+)π u00

(−)(W − d+ a (d− p))

(+)

(d− p)2 +(+)

(1− π)(−)

u00 (W − ap)(+)

p2 < 0.

Summarizing, the unique solution of problem (M) is the unique solution of the equation:

U 0 (a) = 0.

Definition 800 a∗ : R++ × (0, 1)× (0, d)×R++ → R,a∗ : (d, π, p,W ) 7→ argmax (M) .U∗ : Θ→ R,U∗ : θ 7→ πu (W − d+ a∗ (θ) (d− p)) + (1− π)u (W − a∗ (θ) p)

Proposition 801 The signs of the derivatives of a∗ and U∗ with respect to θ are presented in thefollowing table1 :

d π p W

a∗ > 0 if a∗ ∈ [0, 1] > 0 T 0 ≤ 0 if a∗ ≤ 1U∗ ≤ 0 if a∗ ∈ [0, 1] ≤ 0 if a∗ ∈ [0, 1] ≤ 0 if a∗ ≥ 0 > 0

Proof. Exercise.

1Conditions on a∗ (θ) contained in the table can be expressed in terms of exogenous variables.

Page 269: Eui math-course-2012-09--19

Part V

Problem Sets

263

Page 270: Eui math-course-2012-09--19
Page 271: Eui math-course-2012-09--19

Chapter 21

Exercises

21.1 Linear Algebra1.Show that the set of pair of real numbers is not a vector space with respect to the following

operations:(i). (a, b) + (c, d) = (a+ c, b+ d) and k (a, b) = (ka, b) ;(ii) (a, b) + (c, d) = (a+ c, b) and k (a, b) = (ka, kb) .

2.Show that W is not a vector subspace of R3 on R if(i) W =

©(x, y, z) ∈ R3 : z ≥ 0

ª;

(ii) W =©x ∈ R3 : kxk ≤ 1

ª,

(iii) W = Q3.

3.Let V be the vector space of all functions f : R→ R. Show the W is a vector subspace of V if(i) W = {f ∈ V : f (1) = 0} ;(ii) W = {f ∈ V : f (1) = f (2)} .

4.

Show that(i).

V =©(x1, x2, x3) ∈ R3 : x1 + x2 + x3 = 0

ªis a vector subspace of R3;(ii).

S = {(1,−1, 0) , (0, 1,−1)}is a basis for V .

5.Show the following fact.Proposition. Let a matrix A ∈M (n, n), with n ∈ N\ {0} be given. The set

CA := {B ∈M (n, n) : BA = AB}

is a vector subspace of M (n, n) (with respect to the field R).

6.Let U and V be vector subspaces of a vector space W . Show that

U + V := {w ∈W : ∃u ∈ U and v ∈ V such that w = u+ v}

265

Page 272: Eui math-course-2012-09--19

266 CHAPTER 21. EXERCISES

is a vector subspace of W .

7.Show that the following set of vectors is linearly independent:

{(1, 1, 1, ) , (0, 1, 1) , (0, 0, 1)} .

8. Using the definition, find the change-of-basis matrix from

S = {u1 = (1, 2), u2 = (3, 5)}

toE = {e1 = (1, 0), e2 = (0, 1)}

and from E to S. Check the conclusion of Proposition 204, i.e., that one matrix is the inverseof the other one.

9. Find the determinant of

C =

⎡⎢⎢⎢⎢⎣6 2 1 0 52 1 1 −2 11 1 2 −2 33 0 2 3 −1−1 −1 −3 4 2

⎤⎥⎥⎥⎥⎦

10. Say for which values of k ∈ R the following matrix has rank a. 4, b. 3:

A :=

⎡⎣ k + 1 1 −k 2−k 1 2− k k1 0 1 −1

⎤⎦

11.Diagonalize

A =

∙4 23 −1

¸.

12. Let

A =

∙2 21 3

¸.

Find: (a) all eigenvalues of A and corresponding eigenspaces, (b) an invertible matrix P such thatD = P−1AP is diagonal.

13.Show that similarity between matrices is an equivalence relation.

14.Let A be a square symmetric matrix with real entries. Show that eigenvalues are real numbers.

Show that if λi 6= λj for i 6= j then corresponding eigenvectors are orthogonal, i.e., their scalarproduct is zero.

15.Show that

V =©(x1, x2, x3) ∈ R3 : x1 − x2 = 0

ªis a vector subspace of R3 and find a basis for V .

16.

Page 273: Eui math-course-2012-09--19

21.1. LINEAR ALGEBRA 267

Given

l : R4 → R4, l (x1, x2, x3, x4) =¡x1, x1 + x2, x1 + x2 + x3, x1 + x2 + x3 + x4

¢show it is linear, compute the associated matrix with respect to canonical bases, and compute ker land Iml.

17.Complete the text below.Proposition. Assume that l ∈ L (V,U) and ker l = {0}. Then,

∀u ∈ Im l, there exists a unique v ∈ V such that l (v) = u.

Proof.Since ........................, by definition, there exists v ∈ V such that

l (v) = u. (21.1)

Take v0 ∈ V such that l (v0) = u. We want to show that

........................ (21.2)

Observe thatl (v)− l (v0)

(a)= ......................... (21.3)

where (a) follows from .........................Moreover,

l (v)− l (v0)(b)= ........................., (21.4)

where (b) follows from .........................Therefore,

l (v − v0) = 0,

and, by definition of ker l,......................... (21.5)

Since, ........................., from (21.5), it follows that

v − v0 = 0.

18.Let the following sets be given:

V =©(x1, x2, x3, x4) ∈ R4 : x1 − x2 + x3 − x4 = 0

ªand

W =©(x1, x2, x3, x4) ∈ R4 : x1 + x2 + x3 + x4 = 0

ªIf possible, find a basis of V ∩W .

19.Say if the following statement is true or false.Let V and U be vector spaces on R, W a vector subspace of U and l ∈ L (V,U). Then l−1 (W )

is a vector subspace of V .

20.Let the following full rank matrices

A =

∙a11 a12a21 a22

¸B =

∙b11 b12b21 b22

¸

Page 274: Eui math-course-2012-09--19

268 CHAPTER 21. EXERCISES

be given. Say for which values of k ∈ R, the following linear system has solutions.

⎡⎢⎢⎢⎢⎣1 a11 a12 0 0 02 a21 a22 0 0 03 5 6 b11 b12 04 7 8 b21 b22 01 a11 a12 0 0 k

⎤⎥⎥⎥⎥⎦

⎡⎢⎢⎢⎢⎢⎢⎣x1x2x3x4x5x6

⎤⎥⎥⎥⎥⎥⎥⎦ =⎡⎢⎢⎢⎢⎣

k123k

⎤⎥⎥⎥⎥⎦

21.Consider the following Proposition contained in Section 8.1 in the class Notes:Proposition .∀v ∈ V,

[l]uv · [v]v = [l (v)]u (21.6)

Verify the above equality in the case in whicha.

l : R2 → R2, (x1, x2) 7→µ

x1 + x2x1 − x2

¶b. the basis v of the domain of l is ½µ

10

¶,

µ01

¶¾,

c. the basis u of the codomain of l is½µ11

¶,

µ21

¶¾,

d.

v =

µ34

¶.

22.Complete the following proof.Proposition. Letn,m ∈ N\ {0} such that m > n, anda vector subspace L of Rm such that dimL = nbe given. Then, there exists l ∈ L (Rn,Rm) such that Im l = L.Proof. Let

©viªni=1

be a basis of L ⊆ Rm. Take l ∈ L (Rn,Rm) such that

∀i ∈ {1, ..., n} , l2¡ein¢= vi,

where ein is the i—th element in the canonical basis in Rn. Such function does exists and, in fact,it is unique as a consequence of a Proposition in the Class Notes that we copy below:..........................................................................................................................Then, from the Dimension theorem

dim Iml = ..............................................

Moreover,L = ...........

©viªni=1⊆ ...............

Summarizing,L ⊆ Im l , dimL = n and dim Im l ≤ n,

and thereforedim Iml = n.

Finally, from Proposition .................................in the class Notes since L ⊆ Im l , dimL = nand dim Iml = n, we have that Im l = L, as desired.Proposition ............................. in the class Notes says what follows:

Page 275: Eui math-course-2012-09--19

21.2. SOME TOPOLOGY IN METRIC SPACES 269

.......................................................................................

23.Say for which value of the parameter a ∈ R the following system has one, infinite or no solutions⎧⎪⎪⎨⎪⎪⎩

ax1 + x2 = 1x1 + x2 = a2x1 + x2 = 3a3x1 + 2x2 = a

24.Say for which values of k,the system below admits one, none or infinite solutions.

A (k) · x = b (k)

where k ∈ R, and

A (k) ≡

⎡⎢⎢⎣1 0

1− k 2− k1 k1 k − 1

⎤⎥⎥⎦ , b (k) ≡

⎡⎢⎢⎣k − 1k10

⎤⎥⎥⎦ .

21.2 Some topology in metric spaces

21.2.1 Basic topology in metric spaces

1.Do Exercise 430: Let d be a metric on a non-empty set X. Show that

d0 : X ×X → R, d (x, y) =d (x, y)

1 + d (x, y)

is a metric on X.

2.Let X be the set of continuous real valued functions with domain [0, 1] ⊆ R and

d (f, g) =

Z 1

0

f (x)− g (x) dx,

where the integral is the Riemann Integral (that one you learned in Calculus 1). Show that(X, d) is not a metric space.

3.Do Exercise 447 for n = 2 : ∀n ∈ N,∀i ∈ {1, ..., n} ,∀ai, bi ∈ R with ai < bi,

×ni=1 (ai, bi)

is (Rn, d2) open.

4.Show the second equality in Remark 455:

∩+∞n=1µ− 1n,1

n

¶= {0}

5.

Page 276: Eui math-course-2012-09--19

270 CHAPTER 21. EXERCISES

Say if the following set is (R, d2) open or closed:

S :=

½x ∈ R : ∃ n ∈ N\ {0} such that x = (−1)n 1

n

¾

6.Say if the following set is (R, d2) open or closed:

A := ∪+∞n=1µ1

n, 10− 1

n

¶.

7.Do Exercise 465: show that F (S) = F

¡SC¢.

8.Do Exercise 466: show that F (S) is a closed set.

9.Let the metric space (R, d2) be given. Find Int S,Cl (S) ,F (S) ,D (S) , Is (S) and say if S is

open or closed for S = Q, S = (0, 1) and S =©x ∈ R : ∃n ∈ N+ such that x = 1

n

ª.

10.Show that the following statements are false:a. Cl (Int S) = S,b. Int Cl (S) = S.11.Given S ⊆ R, say if the following statements are true or false.a. S is an open bounded interval ⇒ S is an open set;b. S is an open set ⇒ S is an open bounded interval;c. x ∈ F (S)⇒ x ∈ D (S);d. x ∈ D (S)⇒ x ∈ F (S) .

12.Using the definition of convergent sequences, show that the following sequences do converge:a. (xn)n∈N ∈ R∞ such that ∀n ∈ N, xn = 1;b. (xn)n∈N\{0} ∈ R∞ such that ∀n ∈ N\ {0} , xn = 1

n .

13.Using Proposition 493, show that [0, 1] is (R, d2) closed.

14.Show the following result: A subset of a discrete space, i.e., a metric space with the discrete

metric, is compact if and only if it is finite.

15.Say if the following statement is true: An open set is not compact.

16.Using the definition of compactness, show the following statement: Any open ball in

¡R2, d2

¢is

not compact.

17.Show that f (A ∪B) = f (A) ∪ f (B) .

18.Show that f (A ∩B) 6= f (A) ∩ (B) .

19.

Page 277: Eui math-course-2012-09--19

21.2. SOME TOPOLOGY IN METRIC SPACES 271

Using the characterization of continuous functions in terms of open sets, show that for anymetric space (X, d) the constant function is continuous.

20.a. Say if the following sets are (Rn, d2) compact:i.

Rn+,

ii.

∀x ∈ Rn and∀r ∈ R++, , Cl B (x, r) .

b. Say if the following set is (R, d2) compact:½x ∈ R : ∃n ∈ N\ {0} such that x = 1

n

¾.

21.Given the continuous functions

g : Rn → Rm

show that the following set is closed

{x ∈ Rn : g (x) ≥ 0}

22.Assume that f : Rm → Rn is continuous. Say if

X = {x ∈ Rm : f (x) = 0}

is (a) closed, (b) is compact.

23.Using the characterization of continuous functions in terms of open sets, show that the following

function is not continuous

f : R→ R, f (x) =

⎧⎨⎩ x if x 6= 0

1 if x = 0

24.Using the Extreme Value Theorem, say if the following maximization problems have solutions.

maxx∈Rn

nXi=1

xi s.t. kxk ≤ 1

maxx∈Rn

nXi=1

xi s.t. kxk < 1

maxx∈Rn

nXi=1

xi s.t. kxk ≥ 1

Page 278: Eui math-course-2012-09--19

272 CHAPTER 21. EXERCISES

21.2.2 Correspondences

To solve the following exercises on correspondences, we need some preliminary definitions.1

A set C ⊆ Rn is convex if ∀x1, x2 ∈ C and ∀λ ∈ [0, 1], (1− λ)x1 + λx2 ∈ C.A set C ⊆ Rn is strictly convex if ∀x1, x2 ∈ C and ∀λ ∈ (0, 1), (1− λ)x1 + λx2 ∈ Int C.Consider an open and convex set X ⊆ Rn and a continuous function f : X → R, f is quasi-

concave iff ∀ x0, x00 ∈ X, ∀ λ ∈ [0, 1],

f((1− λ)x0 + λx00) ≥ min {f (x0) , f(x00)} .

f is strictly quasi-concave

Definition 802 iff ∀ x0, x00 ∈ X, such that x0 6= x00, and ∀ λ ∈ (0, 1), we have that

f((1− λ)x0 + λx00) > min {f (x0) , f(x00)} .

We define the budget correspondence as

Definition 803

β : RC++ ×R++ →→ RC , β (p,w) =©x ∈ RC+ : px ≤ w

ª.

The Utility Maximization Problem (UMP ) is

Definition 804maxx∈RC+ u (x) s.t. px ≤ w, or x ∈ β (p,w)

ξ : RC++ ×R++ →→ RC , ξ (p,w) = argmax (UMP ) is the demand correspondence.

The Profit Maximization Problem (PMP) is

maxy

py s.t. y ∈ Y.

Definition 805 The supply correspondence is

y : RC++ →→ RC , y (p) = argmax(PMP ).

We can now solve some exercises. (the numbering has to be changed)1.Show that ξ is non-empty valued.2.Show that for every (p,w) ∈ RC++ ×R++,(a) if u is quasiconcave, ξ is convex valued;(b) if u is strictly quasiconcave, ξ is single valued, i.e., it is a function.3.Show that β is closed.4.If a solution to (PMP) exists, show the following properties hold.(a) If Y is convex, y (.) is convex valued;(b) If Y is strictly convex (i.e., ∀λ ∈ (0, 1) , yλ := (1− λ) y0+λy00 ∈ Int Y ), y (.) is single valued.5Consider φ1, φ2 : [0, 2]→→ R,

φ1 (x) =

⎧⎨⎩£−1 + 0.25 · x, x2 − 1

¤if x ∈ [0, 1)

[−1, 1] if x = 1£−1 + 0.25 · x, x2 − 1

¤if x ∈ (1, 2]

,

and

φ2 (x) =

⎧⎨⎩£−1 + 0.25 · x, x2 − 1

¤if x ∈ [0, 1)

[−0.75,−0.25] if x = 1£−1 + 0.25 · x, x2 − 1

¤if x ∈ (1, 2]

.

1The definition of quasi-concavity and strict quasi-concavity will be studied in detail in Chapter .

Page 279: Eui math-course-2012-09--19

21.3. DIFFERENTIAL CALCULUS IN EUCLIDEAN SPACES 273

Say if φ1 and φ2 are LHC, UHC, closed, convex valued, compact valued.6.Consider φ : R+ →→ R,

φ (x) =

½ ©sin 1

x

ªif x > 0

[−1, 1] if x = 0,

Say if φ is LHC, UHC, closed.7.Consider φ : [0, 1]→→ [−1, 1]

φ (x) =

½[0, 1] if x ∈ Q ∩ [0, 1][−1, 0] if x ∈ [0, 1] \Q .

Say if φ is LHC, UHC, closed.8.Consider φ1, φ2 : [0, 3]→→ R,

φ1 (x) =£x2 − 2, x2

¤,

and

φ2 (x) =£x2 − 3, x2 − 1

¤,

φ3 (x) := (φ1 ∩ φ2) (x) := φ1 (x) ∩ φ2 (x) .Say if φ1, φ2 and φ3 are LHC, UHC, closed.

21.3 Differential Calculus in Euclidean Spaces1 .Using the definition, compute the partial derivative of the following function in an arbitrary

point (x0, y0) :f : R2 → R, f (x, y) = 2x2 − xy + y2.

2 .If possible, compute partial derivatives of the following functions.a. f (x, y) = x · arctan y

x ;b. f (x, y) = xy;c. f (x, y) = (sin (x+ y))

√x+y in (0, 3)

3,Given the function f : R2 → R,

f (x, y) =

⎧⎨⎩xy

x2+y2 if x+ y 6= 0

0 otherwise,

show that it admits both partial derivatives in (0, 0) and it is not continuous in (0, 0).

4 .Using the definition, compute the directional derivative f 0 ((1, 1) ; (α1, α2)) with α1, α2 6= 0 for

f : R2 → R,f (x, y) =

x+ y

x2 + y2 + 1.

5 .Using the definition, show that the following function is differentiable: f : R2 → R,

f (x, y) = x2 − y2 + xy (21.7)

Page 280: Eui math-course-2012-09--19

274 CHAPTER 21. EXERCISES

Comment: this exercise requires some tricky computations. Do not spend too much time on it.Do this exercise after having studied Proposition 695.

6 .Using the definition, show that the following functions are differentiable.a. l ∈ L (Rn,Rm);b. the projection function f : Rn → R, f :

¡xi¢ni=1

7→ x1.7.Show the following result which was used in the proof of Proposition 660. A linear function

l : Rn → Rm is continuous.

8 .Compute the Jacobian matrix of f : R2 → R3,

f (x, y) = (sinx cos y, sinx sin y, cosx cos y)

9 .Given differentiable functions g, h : R → R and y ∈ R\ {0}, compute the Jacobian matrix of

f : R3 → R3, f (x, y, z) =³g (x) · h (z) , g(h(x))

y , ex·g(h(x))´

10 .Compute total derivative and directional derivative at x0 in the direction u.a.

f : R3++ → R, f (x1, x2, x3) =1

3log x1 +

1

6log x2 +

1

2log x3

x0 = (1, 1, 2), u = 1√3(1, 1, 1);

b.

f : R3 → R, f (x1, x2, x3) = x21 + 2x22 − x23 − 2x1x2 − 6x2x3

x0 = (1, 0,−1), u =³− 1√

2, 0, 1√

2

´;

c.

f : R2 → R, f (x1, x2) = x1 · ex1x2

x0 = (0, 0), u = (2, 3).

11 .Given

f (x, y, z) =¡x2 + y2 + z2

¢−12 ,

show that if (x, y, z) 6= 0, then

∂2f (x, y, z)

∂x2+

∂2f (x, y, z)

∂y2+

∂2f (x, y, z)

∂z2= 0

12 .Given the C2 functions g, h : R→ R++, compute the Jacobian matrix of

f : R3 → R3, f (x, y, z) =³g(x)h(z) , g (h (x)) + xy, ln (g (x) + h (x))

´

13 .Given the functions

f : R2 → R2, f (x, y) =

µex + yey + x

Page 281: Eui math-course-2012-09--19

21.3. DIFFERENTIAL CALCULUS IN EUCLIDEAN SPACES 275

g : R→ R, x 7→ g (x)

h : R→ R2, h (x) = f (x, g (x))

Assume that g is C2. a. compute the differential of h in 0; b. check the conclusion of the ChainRule.

14 .Let the following differentiable functions be given.

f : R3 → R (x1, x2, x3) 7→ f (x1, x2, x3)

g : R3 → R (x1, x2, x3) 7→ g (x1, x2, x3)

a : R3 → R3, (x1, x2, x3) 7→

⎛⎝ f (x1, x2, x3)g (x1, x2, x3)x1

⎞⎠b : R3 → R2, (y1, y2, y3) 7→

µg (y1, y2, y3)f (y1, y2, y3)

¶Compute the directional derivative of the function b ◦ a in the point (0, 0, 0) in the direction

(1, 1, 1).

15 .Using the theorems of Chapter 16, show that the function in (21.7) is differentiable.

16 .Given

f : R3 → R1, (x, y, z) 7→ z + x+ y3 + 2x2y2 + 3xyz + z3 − 9,say if you can apply the Implicit Function Theorem to the function in (x0, y0, z0) = (1, 1, 1) and,

if possible, compute ∂x∂z and

∂y∂z in (1, 1, 1).

17 .Using the notation of the statement of the Implicit Function Theorem presented in the Class

Notes, say if that Theorem can be applied to the cases described below; if it can be applied, computethe Jacobian of g. (Assume that a solution to the system f (x, t) = 0 does exist).a. f : R4 → R2,

f (x1, x2, t1, t2) 7→µ

x21 − x22 + 2t1 + 3t2x1x2 + t1 − t2

¶b. f : R4 → R2,

f (x1, x2, t1, t2) 7→µ2x1x2 + t1 + t22x21 + x22 + t21 − 2t1t2 + t22

¶c. f : R4 → R2,

f (x1, x2, t1, t2) 7→µ

t21 − t22 + 2x1 + 3x2t1t2 + x1 − x2

18.Say under which conditions, if z3 − xz − y = 0, then

∂2z

∂x∂y= − 3z2 + x

(3z2 − x)3

19. Do Exercise 710: Let the utility function u : R2++ → R++, (x, y) 7→ u (x, y) be given.Assume that it satisfies the following properties i. u is C2, ii. ∀ (x, y) ∈ R2++, Du (x, y) >> 0,iii.∀ (x, y) ∈ R2++, Dxxu (x, y) < 0,Dyyu (x, y) < 0,Dxyu (x, y) > 0. Compute the Marginal Rate ofSubstitution in (x0, y0) and say if the graph of each indifference curve is concave.

Page 282: Eui math-course-2012-09--19

276 CHAPTER 21. EXERCISES

21.4 Nonlinear Programming1. 2 Determine, if possible, the nonnegative parameter values for which the following functionsf : X → R, f : (xi)ni=1 := x 7→ f (x) are concave, pseudo-concave, quasi-concave, strictly concave.(a) X = R++, f (x) = αxβ;

(b) X = Rn++, n ≥ 2, f (x) =Pn

i=1 αi (xi)βi (for pseudo-concavity and quasi-concavity consider

only the case n = 2).(c) X = R, f (x) = min {α, βx− γ} .

2.a. Discuss the following problem. For given π ∈ (0, 1), a ∈ (0,+∞),

max(x1,x2) π · u (x) + (1− π)u (y) s.t. y ≤ a− 12x

y ≤ 2a− 2xx ≥ 0y ≥ 0

where u : R→ R is a C2 function such that ∀z ∈ R, u0 (z) > 0 and u00 (z) < 0.b. Say if there exist values of (π, a) such that (x, y, λ1, λ2, λ3, λ4) =

¡23a,

23a, λ1, 0, 0, 0

¢, with

λ1 > 0, is a solution to Kuhn-Tucker conditions, where for j ∈ {1, 2, 3, 4}, λj the multiplier associ-ated with constraint j.c. “Assuming” that the first, third and fourth constraint hold with a strict inequality, and

the multiplier associated with the second constraint is strictly positive, describe in detail how tocompute the effect of a change of a or π on a solution of the problem.

3.a. Discuss the following problem. For given π ∈ (0, 1) , w1, w2 ∈ R++,

max(x,y,m)∈R2++×R π log x+ (1− π) log y s.t

w1 −m− x ≥ 0w2 +m− y ≥ 0

b. Compute the effect of a change of w1 on the component x∗ of the solution.c. Compute the effect of a change of π on the objective function computed at the solution of

the problem.

4.a. Discuss the following problem.

min(x,y)∈R2 x2 + y2 − 4x− 6y s.t. x+ y ≤ 6y ≤ 2x ≥ 0y ≥ 0

Let (x∗, y∗) be a solution to the the problem.b. Can it be x∗ = 0 ?c. Can it be (x∗, y∗) = (2, 2) ?.

5.Characterize the solutions to the following problems.(a) (consumption-investment)

max(c1,c2,k)∈R3 u (c1) + δu (c2)s.t.c1 + k ≤ ec2 ≤ f (k)c1, c2, k ≥ 0,

2Exercise 1 is taken from David Cass’ problem sets for his Microeconomics course at the University of Pennsylvania.

Page 283: Eui math-course-2012-09--19

21.4. NONLINEAR PROGRAMMING 277

where u : R → R, u0 > 0, u00 < 0; f : R+ → R, f 0 > 0, f 00 < 0 and such that f (0) = 0; δ ∈(0, 1) , e ∈ R++. After having written Kuhn Tucker conditions, consider just the case in whichc1, c2, k > 0.(b) (labor-leisure)

max(x,l)∈R2 u (x, l)s.t.px+ wl ≤ wll ≤ lx, l ≥ 0,

where u : R2 is C2, ∀ (x, l) Du (x, l) À 0, u is differentiably strictly quasi-concave, i.e.,∀ (x, l) ,if ∆ 6= 0 and Du (x, l) ·∆ = 0, then ∆TD2u ∆ < 0; p > 0, w > 0 and l > 0.Describe solutions for which x > 0 and 0 < l < l,

6.(a) Consider the model described in Exercise 6. (a) . What would be the effect on consumption

(c1, c2) of an increase in initial endowment e?What would be the effect on (the value of the objective function computed at the solution of

the problem) of an increase in initial endowment e?Assume that f (k) = akα, with a ∈ R++ and α ∈ (0, 1) . What would be the effect on consump-

tion (c1, c2) of an increase in a?(b) Consider the model described in Exercise 6. (b) .What would be the effect on leisure l of an

increase in the wage rate w? in the price level p?What would be the effect on (the value of the objective function computed at the solution of

the problem) of an increase in the wage rate w? in the price level p?.

Page 284: Eui math-course-2012-09--19

278 CHAPTER 21. EXERCISES

Page 285: Eui math-course-2012-09--19

Chapter 22

Solutions

22.1 Linear Algebra

1.We want to prove that the following sets are not vector spaces. Thus, it is enough to find a

counter-example violating one of the conditions defining vector spaces.(i) The definition violates the so-called M2 distributive assumption since for k = 1 and a = b = 1,

(1 + 1) · (1, 1) = 2 · (1, 1) = (2, 1) : while : 1 · (1, 1) + 1 · (1, 1) = (2, 2)

(ii) The definition violates the so-called A4 commutative property since for a = b = c = 1 andd = 0,

(1, 1) + (1, 0) = (2, 1) 6= (1, 0) + (1, 1) = (2, 0)

2.(i) Take any w := (x, y, z) ∈W with z > 0, and α ∈ R with α < 0; then αw = (αx, αy, αz) with

αz < 0 and therefore w /∈W.(ii) Take any nonzero w ∈W and define α = 2/||w||. Observe that ||αw|| = 2 > 1 and therefore

αw /∈W.(iii) Multiplication of any nonzero element of Q3 by α ∈ R \Q will give an element of R3 \Q3

instead of Q3.3.We use Proposition 138. Therefore, we have to check thata. 0 ∈W ; b. ∀u, v ∈W, ∀α, β ∈ F, αu+ βv ∈W .Define simply by 0 the function f : R→ R such that ∀x ∈ R, f (x) = 0.(i) a. Since 0 (1) = 0, 0 ∈W .b.

αu (1) + βv (1) = α · 0 + β · 0 = 0,

where the first equality follows from the assumption that u, v ∈W . Then, indeed, αu+βv ∈W .(ii) a. Since 0 (1) = 0 = 0 (2), we have that 0 ∈W .b.

αu (1) + βv (1) = αu (2) + βv (2) ,

where the equality follows from the assumption that u, v ∈ W and therefore u (1) = u (2) andv (1) = v (2).4.Again we use Proposition 138 and we have to check thata. 0 ∈W ; b. ∀u, v ∈W, ∀α, β ∈ F, αu+ βv ∈W .a. (0, 0, 0) ∈W simply because 0 + 0 + 0 = 0.b. Given u = (u1, u2, u3) , v = (v1, v2, v3) ∈ V , i.e., such that

u1 + u2 + u3 = 0 and v1 + v2 + v3 = 0, (22.1)

279

Page 286: Eui math-course-2012-09--19

280 CHAPTER 22. SOLUTIONS

we haveαu+ βv = (αu1 + βv1, αu2 + βv2, αu3 + βv3) .

Then,

αu1 + βv1 + αu2 + βv2 + αu3 + βv3 = α (u1 + u2 + u3) + β (v1 + v2 + v3)(22.1)= 0.

(ii) We have to check that a. S is linearly independent and b. span S = V .a.

α (1,−1, 0) + β (0, 1,−1) =¡α −α+ β −β

¢= 0

implies that α = β = 0.b. Taken (x1, x2, x3) ∈ V , we want to find α, β ∈ R such that (x1, x2, x3) = α (1,−1, 0) +

β (0, 1,−1) =¡α −α+ β −β

¢, i.e., we want to find α, β ∈ R such that⎧⎪⎪⎨⎪⎪⎩x1 = αx2 = −α+ βx3 = −βx1 + x2 + x3 = 0⎧⎨⎩ −x2 − x3 = αx2 = −α+ βx3 = −β

Then, α = −x2 − x3, β = −x3 is the (unique) solution to the above system.

5.1. 0 ∈M (n, n) : A0 = 0A = 0.2. ∀α, β ∈ R and ∀B,B0 ∈ CA,

(αB + βB0)A = αBA+ βB0A = αAB + βAB0 = AαB +AβB0 = A (αB + βB0) .

6.i. 0 ∈ U + V , because 0 ∈ U and 0 ∈ V.ii. Take α, β ∈ F and w1, w2 ∈ U + V . Then there exists u1, u2 ∈ U and v1, v2 ∈ V such that

w1 = u1 + v1 and w2 = u2 + v2. Therefore,

αw1 + βw2 = α¡u1 + v1

¢+ β

¡u2 + v2

¢=¡αu1 + βv1

¢+¡αu2 + βv2

¢∈ U + V,

because U and V are vector spaces and therefore αu1 + βv1 ∈ U and αu2 + βv2 ∈ V .7.We want to show that if

P3i=1 βivi = 0, then βi = 0 for all i. Note that

P3i=1 βivi = (β1, β1 +

β2, β1 + β2 + β3) = 0, which implies the desired result.8.We want to apply Definition 202: Consider a vector space V of dimension n and two bases

v =©viªni=1

and u =©ukªnk=1

of V . Then,

P =£ £

u1¤v

...£uk¤v

... [un]v¤∈M (n, n) ,

is called the change-of-basis matrix from the basis v to the basis u. Then in our case, thechange-of-basis matrix from S to E is

P =£ £

e1¤S

£e2¤S

¤∈M (2, 2) .

Moreover,using also Proposition 204, the change-of-basis matrix from S to E is

Q =£ £

v1¤E

£v2¤E

¤= P−1.

Computation of£e1¤S. We want to find α and β such that e1 = αu1 + βu2, i.e., (1, 0) =

α (1, 2) + β (3, 5) =¡α+ 3β 2α+ 5β

¢, i.e.,⎧⎨⎩ α+ 3β = 1

2α+ 5β = 0

Page 287: Eui math-course-2012-09--19

22.1. LINEAR ALGEBRA 281

whose solution is α = −5, β = 2.Computation of

£e2¤S. We want to find α and β such that e2 = αu1 + βu2, i.e., (0, 1) =

α (1, 2) + β (3, 5) =¡α+ 3β 2α+ 5β

¢, i.e.,⎧⎨⎩ α+ 3β = 0

2α+ 5β = 1½α+ 3β = 02α+ 5β = 1

whose solution is α = 3, β = −1. Therefore,

P =

∙−5 32 −1

¸.

Since E is the canonical basis, we have

S = {u1 = (1, 2), u2 = (3, 5)}

Q =

∙1 32 5

¸.

Finally ∙1 32 5

¸−1=

∙−5 32 −1

¸as desired.9.Easiest way: use row and column operations to change C to a triangular matrix.

detC = det

⎡⎢⎢⎢⎢⎣6 2 1 0 52 1 1 −2 11 1 2 −2 33 0 2 3 −1−1 −1 −3 4 2

⎤⎥⎥⎥⎥⎦ = £R1 ↔ R3¤= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 32 1 1 −2 16 2 1 0 53 0 2 3 −1−1 −1 −3 4 2

⎤⎥⎥⎥⎥⎦

= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 32 1 1 −2 16 2 1 0 53 0 2 3 −1−1 −1 −3 4 2

⎤⎥⎥⎥⎥⎦ =⎡⎢⎢⎣−2R1 +R2 → R2−6R1 +R3 → R3−3R1 +R4 → R4R1 +R5 → R5

⎤⎥⎥⎦ = −det⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 −4 −11 12 −130 −3 −4 9 −100 0 −1 2 5

⎤⎥⎥⎥⎥⎦

= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 −4 −11 12 −130 −3 −4 9 −100 0 −1 2 5

⎤⎥⎥⎥⎥⎦ =∙4R2 +R3 → R33R2 +R4 → R4

¸= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 5 3 50 0 −1 2 5

⎤⎥⎥⎥⎥⎦

= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 5 3 50 0 −1 2 5

⎤⎥⎥⎥⎥⎦ =∙−5R3 +R4 → R4R3 +R5 → R5

¸= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 0 −17 −300 0 0 6 12

⎤⎥⎥⎥⎥⎦

= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 0 −17 −300 0 0 6 12

⎤⎥⎥⎥⎥⎦ =∙6

17R4 +R5 → R5

¸= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 0 −17 −300 0 0 0 24

17

⎤⎥⎥⎥⎥⎦

Page 288: Eui math-course-2012-09--19

282 CHAPTER 22. SOLUTIONS

= −det

⎡⎢⎢⎢⎢⎣1 1 2 −2 30 −1 −3 2 −50 0 1 4 70 0 0 −17 −300 0 0 0 24

17

⎤⎥⎥⎥⎥⎦ = −(1)(−1)(1)(−17)(2417) =⇒ detC = −24

10.Observe that it cannot be 4 as rank(A) ≤ min{#rows,#collums}. It’s easy to check that

rank(A) = 3 by using elementary operations on rows and columns of A :−2R3 +R1 → R1, C4 + C1 → C1, C4 + C3 → C3, C1 + C3 → C3, −C2 + C3 → C3,to get ⎡⎣k − 1 1 0 0

0 0 1 k0 0 0 −1

⎤⎦which has the last three columns independent for any k.

11.First we find the eigenvalues of the matrix:

A =

∙4 23 −1

¸=⇒ det[tI −A] =

∙t− 4 −2−3 t+ 1

¸= t2 − 3t− 10

The solutions for det[tI −A] = 0 and, therefore, the eigenvalues of the matrix are λ1 = 5 and λ2 =−2. They are different and thus by Proposition 261, the matrix P composed of the correspondingeigenvectors v1 = (2, 1)0 and v2 = (−1, 3)0 (for example) does the trick. The answer is

P =

∙2 −11 3

¸and D =

∙5 00 −2

¸.

12.For the eigenvalues, we look at the solutions for det[tI −A] = 0

A =

∙2 21 3

¸=⇒ det[tI −A] =

∙t− 2 −2−1 t− 3

¸= t2 − 5t+ 4

So that the eigenvalues are 1 and 4. For the eigenspaces, we substitute each eigenvalue in thetI −A matrix.For λ = 1

I −A =

∙−1 −2−1 −2

¸,

and then©(x, y) ∈ R2 : 2x = −y

ªis the eigenspace for λ = 1.

For λ = 4

4I −A =

∙2 −2−1 1

¸,

and©(x, y) ∈ R2 : x = y

ªis the eigenspace for λ = 4.

Finally, for the P matrix, we consider one eigenvector belonging to each of the eigenspaces, forexample, (2,−1) for λ = 1 and (1, 1) for λ = 4. Then, by Proposition 241, we can construct P as

P =

∙2 1−1 1

¸

13.Recall that: A matrix B ∈ M (n, n) is similar to a matrix A ∈ M (n, n) if there exists an

invertible matrix P ∈M (n, n) such that

Page 289: Eui math-course-2012-09--19

22.1. LINEAR ALGEBRA 283

B = P−1AP.

An equivalence relation is a relation which is reflexive, symmetric and transitive.a. reflexive. B = I−1BI.b. symmetric. B = P−1AP ⇒ A = PBP−1.c. transitive. B = P−1AP and C = Q−1BQ ⇒ C = Q−1BQ = Q−1

¡P−1AP

¢Q =

(PQ)−1A (PQ).14.

1. Let A be a square matrix symmetric. Let λ ∈ C be an eigenvalue, λ its conjugate, and Xa corresponding eigenvector - thus X 6= 0. ∗ denotes the conjugate of a matrix or a vectoraccording to the context. We have that

λX∗X = (λX)∗.X = (AX)∗X = X∗A∗X

Or, A being symmetric A∗ = A. Thus,

λX∗X = X∗AX = X∗(λX) = λX∗X

Since X∗X 6= 0, it must be that λ = λ. Then λ ∈ R.

2. Let now Xi and Xj be two eigenvectors associated with the two distinct eigenvectors λi andλj .Then

λiX∗i Xj =|{z}

λi∈R

(λiXi)∗Xj = (AXi)

∗Xj =|{z}A is symmetric

X∗i AXj = λjX∗i Xj

Since λi 6= λj , it must be that X∗i Xj = 0, i.e., Xi and Xj are orthogonal.

15.Defined

l : R3 → R, (x1, x2, x3) 7→ x1 − x2,

it is easy to check that l is linear and V = ker l,a vector space. Moreover,

[l] =£1 −1 0

¤.

Therefore, dimker l = 3− rank [l] = 3− 1 = 2. u1 = (1, 1, 0) and u2 = (0, 0, 1) are independentelements of V . Therefore, from Remark 183, u1, u2 are a basis for V .16.Linearity is easy. By definition, l is linear if ∀u, v ∈ R4 and ∀α, β ∈ R, l(αu+βv) = αl(u)+βl(v).

Then,

αl(u) + βl(v) =

=α(u1,u1+u2,u1+u2+u3,u1+u2+u3+u4)++β(v1,v1+v2,v1+v2+v3,v1+v2+v3+v4) =

=(αu1+βv1,αu1+αu2+βv1+βv2,αu1+αu2+αu3+βv1+βv2+βv3,αu1+αu2+αu3+αu4+βv1+βv2+βv3+βv4) =

= l(αu+ βv)

and then l is linear.

[l] =

⎡⎢⎢⎣1 0 0 01 1 0 01 1 1 01 1 1 1

⎤⎥⎥⎦ .Therefore, dimker l = 4− rank [l] = 4− 4 = 0. Moreover, dim Im l = 4 and the column vectors

of [l] are a basis of Im l.17.Proposition. Assume that l ∈ L (V,U) and ker l = {0}. Then,

Page 290: Eui math-course-2012-09--19

284 CHAPTER 22. SOLUTIONS

∀u ∈ Iml, there exists a unique v ∈ V such that l (v) = u.

Proof.Since u ∈ Iml, by definition, there exists v ∈ V such that

l (v) = u. (22.2)

Take v0 ∈ V such that l (v0) = u. We want to show that

l (v0) = u. (22.3)

Observe thatl (v)− l (v0)

(a)= u− u = 0, (22.4)

where (a) follows from (22.2) and (22.3).Moreover,

l (v)− l (v0)(b)= l (v − v0) , (22.5)

where (b) follows from the assumption that l ∈ L (V,U).Therefore,

l (v − v0) = 0,

and, by definition of ker l,v − v0 ∈ ker l. (22.6)

Since, by assumption, ker l = {0}, from (22.6), it follows that

v − v0 = 0.

18.Both V and W are ker of linear function; therefore V , W and V ∩W are vector subspaces of

R4. Moreover

V ∩W =

⎧⎨⎩(x1, x2, x3, x4) ∈ R4 :⎧⎨⎩ x1 − x2 + x3 − x4 = 0

x1 + x2 + x3 + x4 = 0

⎫⎬⎭rank

∙1 −1 1 −11 1 1 1

¸= 2

Therefore, dimker l = dimV ∩W = 4− 2 = 2.Let’s compute a basis of V ∩W :⎧⎨⎩ x1 − x2 = −x3 + x4

x1 + x2 = −x3 − x4

After taking sum and subtraction we get following expression⎧⎨⎩ x1 = −x3

x2 = −x4

A basis consists two linearly independent vectors. For example,

{(−1, 0, 1, 0) , (0,−1, 0, 0)} .

19.By proposition 138, we have to show that1. 0 ∈ l−1 (W ) ,2. ∀α, β ∈ R and v1, v2 ∈ l−1 (W ) we have that αv1 + βv2 ∈ l−1 (W ).1.

l (0)l∈L(V,U)= 0

W vector space∈ W

Page 291: Eui math-course-2012-09--19

22.1. LINEAR ALGEBRA 285

,2.Since v1, v2 ∈ l−1 (W ),

l¡v1¢, l¡v2¢∈W. (22.7)

Then

l¡αv1 + βv2

¢ l∈L(V,U)= αl

¡v1¢+ βl

¡v2¢ (a)∈ W

where (a) follows from (22.7) and the fact that W is a vector space.

20.Observe that

det

⎡⎢⎢⎢⎢⎣a11 a12 0 0 0a21 a22 0 0 05 6 b11 b12 07 8 b21 b22 0a11 a12 0 0 k

⎤⎥⎥⎥⎥⎦ = detA · detB · k.Then, if k 6= 0,then the rank of both matrix of coefficients and augmented matrix is 5 and the

set of solution to the system is an affine subspace of R6 of dimension 1. If k = 0,then the system is

⎡⎢⎢⎢⎢⎣1 a11 a12 0 0 02 a21 a22 0 0 03 5 6 b11 b12 04 7 8 b21 b22 01 a11 a12 0 0 0

⎤⎥⎥⎥⎥⎦

⎡⎢⎢⎢⎢⎢⎢⎣x1x2x3x4x5x6

⎤⎥⎥⎥⎥⎥⎥⎦ =⎡⎢⎢⎢⎢⎣01230

⎤⎥⎥⎥⎥⎦ ,

which is equivalent to the system

⎡⎢⎢⎣1 a11 a12 0 0 02 a21 a22 0 0 03 5 6 b11 b12 04 7 8 b21 b22 0

⎤⎥⎥⎦⎡⎢⎢⎢⎢⎢⎢⎣

x1x2x3x4x5x6

⎤⎥⎥⎥⎥⎥⎥⎦ =⎡⎢⎢⎣0123

⎤⎥⎥⎦ ,

whose set of solution is an affine subspace of R6 of dimension 2.21.

[l]uv :=

∙∙l

µ10

¶¸u

,

∙l

µ01

¶¸u

¸=

∙∙11

¸u

,

∙1−1

¸u

¸=

∙1 −30 2

¸,

[v]v =

∙34

¸

[l (v)]u =

∙µ7−1

¶¸u

=

∙−98

¸

[l]uv · [v]v =∙1 −30 2

¸ ∙34

¸=

∙−98

¸22.Letn,m ∈ N\ {0} such that m > n, anda vector subspace L of Rm such that dimL = nbe given. Then, there exists l ∈ L (Rn,Rm) such that Im l = L.Proof. Let

©viªni=1

be a basis of L ⊆ Rm. Take l ∈ L (Rn,Rm) such that

∀i ∈ {1, ..., n} , l2¡ein¢= vi,

where ein is the i—th element in the canonical basis in Rn. Such function does exists and, in fact,it is unique as a consequence of a Proposition in the Class Notes that we copy below:

Page 292: Eui math-course-2012-09--19

286 CHAPTER 22. SOLUTIONS

Let V and U be finite dimensional vectors spaces such that S =©v1, ..., vn

ªis a basis of V and©

u1, ..., unªis a set of arbitrary vectors in U . Then there exists a unique linear function l : V → U

such that ∀i ∈ {1, ..., n}, l¡vi¢= ui - see Proposition 273, page 82.

Then, from the Dimension theorem

dim Iml = n− dimker l ≤ n.

Moreover, L = span©viªni=1⊆ Iml. Summarizing,

L ⊆ Im l , dimL = n and dim Iml ≤ n,

and thereforedim Iml = n.

Finally, from Proposition 179 in the class Notes since L ⊆ Im l , dimL = n and dim Iml = n,we have that Iml = L, as desired.Proposition 179 in the class Notes says what follows: Proposition. Let W be a subspace of an

n−dimensional vector space V . Then, 1. dimW ≤ n; 2. If dimW = n, then W = V .

23.

[A | b] =

⎡⎢⎢⎣a 1 11 1 a2 1 3a3 2 a

⎤⎥⎥⎦ =⇒ £R1 ↔ R2

¤=⇒

⎡⎢⎢⎣1 1 aa 1 12 1 3a3 2 a

⎤⎥⎥⎦⎡⎢⎢⎣1 1 aa 1 12 1 3a3 2 a

⎤⎥⎥⎦ =⇒ −aR1 +R2 → R2−2R1 +R3 → R3−3R1 +R4 → R4

=⇒

⎡⎢⎢⎣1 1 a0 1− a 1− a2

0 −1 a0 −1 −2a

⎤⎥⎥⎦ := [A0 (a) | b0 (a)]Since

det

⎡⎣1 1 a0 −1 a0 −1 −2a

⎤⎦ = 3a,We have that if a 6= 0, then rank A ≤ 2 < 3 = rank [A0 (a) | b0 (a)], and the system has no

solutions. If a = 0, [A0 (a) | b0 (a)] becomes⎡⎢⎢⎣1 1 00 1 10 −1 00 −1 0

⎤⎥⎥⎦whose rank is 3 and again the system has no solutions.

24.

[A (k) |b (k)] ≡

⎡⎢⎢⎣1 0 k − 1

1− k 2− k k1 k 11 k − 1 0

⎤⎥⎥⎦det

⎡⎣ 1 0 k − 11 k 11 k − 1 0

⎤⎦ = 2− 2kIf k 6= 1, the system has no solutions. If k = 1,

[A (1) |b (1)] ≡

⎡⎢⎢⎣1 0 00 1 11 1 11 0 0

⎤⎥⎥⎦

Page 293: Eui math-course-2012-09--19

22.2. SOME TOPOLOGY IN METRIC SPACES 287

⎡⎣ 1 0 00 1 11 1 1

⎤⎦∙1 0 00 1 1

¸Then, if k = 1, there exists a unique solution.

22.2 Some topology in metric spaces

22.2.1 Basic topology in metric spaces

1.To prove that d0 is a metric, we have to check the properties listed in Definition 416.a. d0(x, y) ≥ 0, d0(x, y) = 0⇔ x = yBy definition of d0(x, y), it is always going to be positive as d(x, y) ≥ 0. Furthermore, d0(x, y) =

0⇐⇒ d(x, y) = 0⇐⇒ x = y.b. d0(x, y) = d0(y, x)Applying the definition

d0(x, y) = d0(y, x)⇐⇒ d(x, y)

1 + d(x, y)=

d(y, x)

1 + d(y, x)

but d(x, y) = d(y, x) so we have

d(x, y)

1 + d(x, y)=

d(x, y)

1 + d(x, y)

c. d0(x, z) ≤ d0(x, y) + d0(y, z)Applying the definition

d0(x, z) ≤ d0(x, y) + d0(y, z)⇐⇒ d(x, z)

1 + d(x, z)≤ d(x, y)

1 + d(x, y)+

d(y, z)

1 + d(y, z)

Multiplying both sides by [1 + d(x, z)][1 + d(x, y)][1 + d(y, z)]

d(x, z)[1 + d(x, y)][1 + d(y, z)] ≤ d(x, y)[1 + d(x, z)][1 + d(y, z)] + d(y, z)[1 + d(x, z)][1 + d(x, y)]

Simplifying we obtain

d(x, z) ≤ d(x, y) + d(y, z) + [[1 + d(x, z)][1 + d(x, y)][1 + d(y, z)] + 2[1 + d(x, y)][1 + d(y, z)]]

which concludes the proof.

2.It is enough to show that one of the properties defining a metric does not hold.It can be d (f, g) = 0 and f 6= g. Take

f (x) = 0,∀x ∈ [0, 1] ,

andg (x) = −2x+ 1

Then, Z 1

0

(−2x+ 1) dx = 0.

It can be d (f, g) < 0.Consider the null function and the function that take value 1 for all x in[0; 1]. Then d(0, 1) = −

R 101 dx. by linearity of the Riemann integral, which is equal to −1. Then,

Page 294: Eui math-course-2012-09--19

288 CHAPTER 22. SOLUTIONS

d(0, 1) < 0.

3.Define S = (a1, b1)× (a2, b2) and take x0 :=

¡x01, x

02

¢∈ S. Then, for i ∈ {1, 2}, there exist εi > 0

such that x0i ∈ B¡x0i , εi

¢⊆ (ai, bi). Take ε = min {ε1, ε2}. Then, for i ∈ {1, 2}, x0i ∈ B

¡x0i , ε

¢⊆

(ai, bi) and, defined B =.B¡x01, ε

¢× B

¡x02, ε

¢, we have that x0 ∈ B ⊆ S. It then suffices to show

that B¡x0, ε

¢⊆ B. Observe that

x ∈ B¡x0, ε

¢⇔ d

¡x, x0

¢< ε,

d¡¡x01, 0

¢, (x1, 0)

¢=

q(x01 − x1)

2=¯x01 − x1

¯,

and

d¡¡x01, 0

¢, (x1, 0)

¢=q(x01 − x1) ≤

q(x01 − x1)

2+ (x01 − x1)

2= d

¡x, x0

¢.

4.Show the second equality in Remark 455:

∩+∞n=1µ− 1n,1

n

¶= {0}

5.

S =

½−1,+1

2,−13,1

4,−15,1

6, ...

¾The set is not open: it suffices to find x ∈ S and such that x /∈ Int S; take for example −1. We

want to show that it false that

∃ε > 0 such that (−1− ε,−1 + ε) ⊆ S.

In fact, ∀ε > 0, −1 − ε2 ∈ (−1− ε,−1 + ε), but −1 − ε

2 /∈ S. The set is not closed. It sufficesto show that F (S) is not contained in S, in fact that 0 /∈ S (obvious) and 0 ∈ F (S). We want toshow that ∀ε > 0, B (0, ε) ∩ S 6= ∅.In fact, (−1)n 1

n ∈ B (0, ε) if n is even and (−1)n 1n =

1n < ε. It

is then enough to take n even and n > 1ε .

6.
A = (0, 10).
The set is open in (R, d₂); for example, A = B(5, 5) is an open ball. The set is not closed, because Aᶜ is not open: neither 0 nor 10 belongs to Int(Aᶜ).

7. The solution follows immediately from the definition of boundary of a set. Let a metric space (X, d) and a set S ⊆ X be given: x is a boundary point of S if any open ball centered at x intersects both S and its complement in X, i.e., ∀r ∈ R₊₊, B(x, r) ∩ S ≠ ∅ ∧ B(x, r) ∩ Sᶜ ≠ ∅. Nothing changes in the definition above if we replace the set with its complement; hence F(S) = F(Sᶜ).

8.
x ∈ (F(S))ᶜ ⇔ x ∉ F(S)
⇔ ¬(∀r ∈ R₊₊, B(x, r) ∩ S ≠ ∅ ∧ B(x, r) ∩ Sᶜ ≠ ∅)
⇔ ∃r ∈ R₊₊ such that B(x, r) ∩ S = ∅ ∨ B(x, r) ∩ Sᶜ = ∅
⇔ ∃r ∈ R₊₊ such that B(x, r) ⊆ Sᶜ ∨ B(x, r) ⊆ S
⇔ x ∈ Int Sᶜ ∨ x ∈ Int S
(1)⇔ ∃r*ₓ ∈ R₊₊ such that either a. B(x, r*ₓ) ⊆ Int Sᶜ or b. B(x, r*ₓ) ⊆ Int S,   (22.8)
where (1) follows from the fact that the interior of a set is an open set. If case a. in (22.8) holds true, then, using Lemma 550, B(x, r*ₓ) ⊆ (F(S))ᶜ, and similarly for case b., as desired.

9.

S | Int S | Cl(S) | F(S) | D(S) | Is(S) | open or closed
--- | --- | --- | --- | --- | --- | ---
S = Q | ∅ | R | R | R | ∅ | neither open nor closed
S = (0, 1) | (0, 1) | [0, 1] | {0, 1} | [0, 1] | ∅ | open
S = {1/n}ₙ∈N⁺ | ∅ | S ∪ {0} | S ∪ {0} | {0} | S | neither open nor closed

10.
a. Take S = N. Then Int S = ∅ and Cl(Int S) = Cl(∅) = ∅ ≠ N = S.
b. Take S = N. Then Cl(S) = N and Int Cl(S) = Int N = ∅ ≠ N = S.

11.
a. True. If S is an open bounded interval, then ∃a, b ∈ R, a < b, such that S = (a, b). Take x ∈ S and δ = min{|x − a|, |x − b|}. Then I(x, δ) ⊆ (a, b).
b. False. (0, 1) ∪ (2, 3) is an open set, but it is not an open interval.
c. False. Take S := {0, 1}. Then 0 ∈ F(S), but 0 ∉ D(S).
d. False. Take S = (0, 1). Then 1/2 ∈ D(S), but 1/2 ∉ F(S).

12. Recall that: a sequence (xₙ)ₙ∈N ∈ X^∞ is said to be (X, d) convergent to x₀ ∈ X (or convergent with respect to the metric space (X, d)) if ∀ε > 0, ∃n₀ ∈ N such that ∀n > n₀, d(xₙ, x₀) < ε.
a. (xₙ)ₙ∈N ∈ R^∞ such that ∀n ∈ N, xₙ = 1.
Let ε > 0. Then, by definition of (xₙ)ₙ∈N, ∀n > 0, d(xₙ, 1) = 0 < ε. Therefore
limₙ→∞ xₙ = 1.
b. (xₙ)ₙ∈N ∈ R^∞ such that ∀n ∈ N, xₙ = 1/n.
Let ε > 0. Because N is unbounded, ∃n₀ ∈ N such that n₀ > 1/ε. Then ∀n > n₀, d(xₙ, 0) = 1/n < 1/n₀ < ε. Then, by definition of a limit, we have proved that
limₙ→∞ xₙ = 0.

13. Take (xₙ)ₙ∈N ∈ [0, 1]^∞ such that xₙ → x₀; we want to show that x₀ ∈ [0, 1]. Suppose otherwise, i.e., x₀ ∉ [0, 1].
Case 1. x₀ < 0. By definition of convergence, having chosen ε = −x₀/2 > 0, there exists n_ε ∈ N such that ∀n > n_ε, d(xₙ, x₀) < ε, i.e., |xₙ − x₀| < −x₀/2, i.e., x₀ + x₀/2 < xₙ < x₀ − x₀/2 = x₀/2 < 0.
Summarizing, ∀n > n_ε, xₙ ∉ [0, 1], contradicting the assumption that (xₙ)ₙ∈N ∈ [0, 1]^∞.
Case 2. x₀ > 1. Similar to Case 1.

14. This is Example 7.15, page 150, Morris (2007):
1. In fact, we have the following result: let (X, d) be a metric space and A = {x₁, ..., xₙ} any finite subset of X. Then A is compact, as shown below. Let Oᵢ, i ∈ I, be any family of open sets such that A ⊆ ∪ᵢ∈I Oᵢ. Then for each xⱼ ∈ A, there exists O_{iⱼ} such that xⱼ ∈ O_{iⱼ}. Then A ⊆ O_{i₁} ∪ O_{i₂} ∪ ... ∪ O_{iₙ}. Therefore A is compact.
2. Conversely, let A be compact in a discrete metric space. Then the family of singleton sets Oₓ = {x}, x ∈ A, is such that each Oₓ is open and A ⊆ ∪ₓ∈A Oₓ. Since A is compact, there exist O_{x₁}, O_{x₂}, ..., O_{xₙ} such that A ⊆ O_{x₁} ∪ O_{x₂} ∪ ... ∪ O_{xₙ}, that is, A ⊆ {x₁, ..., xₙ}. Hence, A is finite.

15. In general, it is false: consider, for example, a discrete metric space, as in the previous exercise.

16. Take an open ball B(x, r). Consider S = {B(x, r(1 − 1/n))}_{n∈N\{0,1}}. Observe that S is an open cover of B(x, r); in fact ∪_{n∈N\{0,1}} B(x, r(1 − 1/n)) = B(x, r), as shown below.
[⊆] If x′ ∈ ∪_{n∈N\{0,1}} B(x, r(1 − 1/n)), then ∃n_{x′} ∈ N\{0,1} such that x′ ∈ B(x, r(1 − 1/n_{x′})) ⊆ B(x, r).
[⊇] Take x′ ∈ B(x, r). Then d(x, x′) < r. Take n such that d(x′, x) < r(1 − 1/n), i.e., n > r/(r − d(x′, x)) (and n > 1); then x′ ∈ B(x, r(1 − 1/n)).
Consider an arbitrary finite subfamily of S, i.e.,
S′ = {B(x, r(1 − 1/n))}_{n∈N}
with #N = N ∈ N. Define n* = max{n ∈ N}. Since the balls are nested and increasing in n, ∪_{n∈N} B(x, r(1 − 1/n)) = B(x, r(1 − 1/n*)), and if d(x′, x) ∈ (r(1 − 1/n*), r), then x′ ∈ B(x, r) and x′ ∉ ∪_{n∈N} B(x, r(1 − 1/n)). Therefore no finite subfamily of S covers B(x, r), and B(x, r) is not compact.

17.
1st proof. We have to show that f(A ∪ B) ⊆ f(A) ∪ f(B) and f(A) ∪ f(B) ⊆ f(A ∪ B).
To prove the first inclusion, take y ∈ f(A ∪ B); then ∃x ∈ A ∪ B such that f(x) = y. Then either x ∈ A or x ∈ B, which implies y ∈ f(A) or y ∈ f(B). In both cases, y ∈ f(A) ∪ f(B).
We now show the opposite inclusion. Let y ∈ f(A) ∪ f(B); then y ∈ f(A) or y ∈ f(B). But y ∈ f(A) implies that ∃x ∈ A such that f(x) = y, and the same implication holds for y ∈ f(B). As a result, y = f(x) in either case with x ∈ A ∪ B, i.e., y ∈ f(A ∪ B).
2nd proof.
y ∈ f(A ∪ B) ⇔
⇔ ∃x ∈ A ∪ B such that f(x) = y
⇔ (∃x ∈ A such that f(x) = y) ∨ (∃x ∈ B such that f(x) = y)
⇔ (y ∈ f(A)) ∨ (y ∈ f(B))
⇔ y ∈ f(A) ∪ f(B).

18.
First proof. Take f = sin, A = [−2π, 0], B = [0, 2π].
Second proof. Consider
f : {0, 1} → R, x ↦ 1.
Then take A = {0} and B = {1}. Then A ∩ B = ∅, so f(A ∩ B) = ∅. But since f(A) = f(B) = {1}, we have that f(A) ∩ f(B) = {1} ≠ ∅.

19. Take c ∈ Y and define the following function:
f : X → Y, f(x) = c.
It suffices to show that the preimage of every open subset of the codomain is open in the domain. The inverse image of any open set K is either X (if c ∈ K) or ∅ (if c ∉ K), which are both open sets.

20.
a.i. Rⁿ₊ is not bounded; then, by Proposition 512, it is not compact.
ii. Cl B(x, r) is compact. From Proposition 512, it suffices to show that the set is closed and bounded. Cl B(x, r) is closed from Proposition 468. Cl B(x, r) is bounded because Cl B(x, r) = {y ∈ Rⁿ : d(x, y) ≤ r} =: C ⊆ B(x, 2r).
Let's show the equality Cl B(x, r) = C in detail.
i. Cl B(x, r) ⊆ C.
The function dₓ : Rⁿ → R, dₓ(y) = d(x, y) := (Σᵢ₌₁ⁿ (xᵢ − yᵢ)²)^(1/2), is continuous. Therefore, C = dₓ⁻¹([0, r]) is closed. Since B(x, r) ⊆ C, by definition of closure, the desired result follows.
ii. Cl B(x, r) ⊇ C.
From Corollary 553, it suffices to show that Ad B(x, r) ⊇ C. If d(y, x) < r, we are done. Suppose that d(y, x) = r. We want to show that for every ε > 0, we have that B(x, r) ∩ B(y, ε) ≠ ∅. If ε > r, then x ∈ B(x, r) ∩ B(y, ε). Now take ε ≤ r. It is enough to take a point "very close to y inside B(x, r)". For example, we can verify that z ∈ B(x, r) ∩ B(y, ε), where z = x + (1 − ε/(2r))(y − x). Indeed,
d(x, z) = (1 − ε/(2r))·d(y, x) = (1 − ε/(2r))·r = r − ε/2 < r,
and
d(y, z) = (ε/(2r))·d(y, x) = (ε/(2r))·r = ε/2 < ε.
c. See the solution to Exercise 5, where it was shown that S is not closed; therefore, using Proposition 512, we can conclude that S is not compact.

21. Observe that, given for any j ∈ {1, ..., m} the continuous functions gⱼ : Rⁿ → R and g = (gⱼ)ⱼ₌₁ᵐ, we can define
C := {x ∈ Rⁿ : g(x) ≥ 0}.
Then C is closed, because of the following argument: C = ∩ⱼ₌₁ᵐ gⱼ⁻¹([0, +∞)); since gⱼ is continuous and [0, +∞) is closed, gⱼ⁻¹([0, +∞)) is closed in Rⁿ; then C is closed because it is an intersection of closed sets.

22. The set is closed, because X = f⁻¹({0}) is the inverse image of the closed set {0} via the continuous function f. The set is not compact: take f to be the constant function equal to 0, so that X = f⁻¹({0}) is the whole (unbounded) space.

23. Let V = B_Y(1, ε) be an open ball around the value 1 of the codomain, with ε < 1. Then f⁻¹(V) = {0} ∪ B_X(1, ε), which is neither open nor closed: the isolated point 0 is not an interior point of the set, and the set does not contain all of its boundary points.

24. To apply the Extreme Value Theorem, we first have to check whether the function to be maximized is continuous. Clearly, the function Σᵢ₌₁ⁿ xᵢ is continuous, as it is a sum of affine functions. Therefore, to check the existence of solutions for the problems, we only have to check the compactness of the constraint sets.
The first set is closed, because it is the inverse image of the closed set [0, 1] via the continuous norm function ‖·‖. The first set is bounded as well, by definition. Therefore the set is compact and the function is continuous, and we can apply the Extreme Value Theorem. The second set is not closed, therefore it is not compact, and the Extreme Value Theorem cannot be applied. The third set is unbounded, and therefore it is not compact and the Extreme Value Theorem cannot be applied.


22.2.2 Correspondences

1. Since u is a continuous function, from the Extreme Value Theorem, we are left with showing that for every (p, w), β(p, w) is nonempty and compact, i.e., β is nonempty valued and compact valued.
β(p, w) is nonempty: x = (w/(C·pᶜ))ᶜ₌₁ᶜ ∈ β(p, w).
β(p, w) is closed, because it is the intersection of the inverse images of two closed sets via continuous functions.
β(p, w) is bounded below by zero.
β(p, w) is bounded above because, for every c, xᶜ ≤ (w − Σ_{c′≠c} p^{c′}x^{c′})/pᶜ ≤ w/pᶜ, where the first inequality comes from the fact that px ≤ w, and the second inequality from the fact that p ∈ Rᶜ₊₊ and x ∈ Rᶜ₊.

2.
(a) Consider x′, x″ ∈ ξ(p, w). We want to show that ∀λ ∈ [0, 1], x^λ := (1 − λ)x′ + λx″ ∈ ξ(p, w). Observe that u(x′) = u(x″) := u*. From the quasiconcavity of u, we have u(x^λ) ≥ u*. We are therefore left with showing that x^λ ∈ β(p, w), i.e., β is convex valued. To see that, simply observe that px^λ = (1 − λ)px′ + λpx″ ≤ (1 − λ)w + λw = w.
(b) Assume otherwise. Following exactly the same argument as above, we have x′, x″ ∈ ξ(p, w) and px^λ ≤ w. Since u is strictly quasiconcave, we also have that u(x^λ) > u(x′) = u(x″) := u*, which contradicts the fact that x′, x″ ∈ ξ(p, w).

3. We want to show that for every (p, w) the following is true: for every sequence {(pₙ, wₙ)}ₙ ⊆ Rᶜ₊₊ × R₊₊ such that (pₙ, wₙ) → (p, w), xₙ ∈ β(pₙ, wₙ), xₙ → x, it is the case that x ∈ β(p, w). Since xₙ ∈ β(pₙ, wₙ), we have that pₙxₙ ≤ wₙ. Taking limits on both sides, we get px ≤ w, i.e., x ∈ β(p, w).

4.
(a) We want to show that ∀y′, y″ ∈ y(p), ∀λ ∈ [0, 1], it is the case that y^λ := (1 − λ)y′ + λy″ ∈ y(p), i.e., y^λ ∈ Y and ∀y ∈ Y, py^λ ≥ py. y^λ ∈ Y simply because Y is convex, and, since y′, y″ ∈ y(p),
py^λ := (1 − λ)py′ + λpy″ ≥ (1 − λ)py + λpy = py.
(b) Suppose not; then ∃y′, y″ ∈ Y such that y′ ≠ y″ and such that
∀y ∈ Y, py′ = py″ ≥ py.   (1)
Since Y is strictly convex, ∀λ ∈ (0, 1), y^λ := (1 − λ)y′ + λy″ ∈ Int Y. Then ∃ε > 0 such that B(y^λ, ε) ⊆ Y. Consider y* := y^λ + (ε/(2C))·1, where 1 := (1, ..., 1) ∈ Rᶜ. Then d(y*, y^λ) = (Σᶜ₌₁ᶜ (ε/(2C))²)^(1/2) = ε/(2√C) < ε. Then y* ∈ B(y^λ, ε) ⊆ Y and, since p ≫ 0, we have that py* > py^λ = py′ = py″, contradicting (1).

5. This exercise is taken from Beavis and Dobbs (1990), pages 74-78.

[Figure: graphs of the correspondences φ₁ and φ₂ on [0, 2]]


For every x ∈ [0, 2], both φ₁(x) and φ₂(x) are closed, bounded intervals and therefore convex and compact sets. Clearly φ₁ is closed and φ₂ is not closed.
φ₁ and φ₂ are clearly UHC and LHC for x ≠ 1. Using the definitions, it is easy to see that, for x = 1, φ₁ is UHC and not LHC, and φ₂ is LHC and not UHC.

6.

[Figure: graph of the correspondence φ]

For every x > 0, φ is a continuous function. Therefore, for those values of x, φ is both UHC and LHC.
φ is UHC at 0: every neighborhood V of φ(0) = [−1, 1] contains [−1, 1], and φ(x) ⊆ [−1, 1] ⊆ V for every x in any neighborhood of 0 in R₊.
φ is not LHC at 0. Take the open set V = (1/2, 3/2), which has nonempty intersection with φ(0); we want to show that ∀ε > 0 ∃z* ∈ (0, ε) such that φ(z*) ∩ V = ∅. Take n ∈ N such that 1/n < ε and z* = 1/(nπ). Then 0 < z* < ε and sin(1/z*) = sin(nπ) = 0 ∉ (1/2, 3/2).
Since φ is UHC and closed valued, from Proposition 16 it is closed.

7. φ is not closed. Take xₙ = √2/(2n) ∈ [0, 1] for every n ∈ N; observe that xₙ → 0 and xₙ is irrational. For every n ∈ N, yₙ = −1 ∈ φ(xₙ) and yₙ → −1. But −1 ∉ φ(0) = [0, 1].
φ is not UHC. Take x = 0 and the neighborhood V = (−1/2, 3/2) of φ(0) = [0, 1]. Then ∀ε > 0, ∃x* ∈ (0, ε)\Q. Therefore φ(x*) = [−1, 0] ⊄ V.
φ is not LHC. Take x = 0 and the open set V = (1/2, 3/2). Then φ(0) ∩ (1/2, 3/2) = [0, 1] ∩ (1/2, 3/2) = (1/2, 1] ≠ ∅. But, as above, ∀ε > 0, ∃x* ∈ (0, ε)\Q. Then φ(x*) ∩ V = [−1, 0] ∩ (1/2, 3/2) = ∅.

8. (This exercise is taken from Klein, E. (1973), Mathematical Methods in Theoretical Economics, Academic Press, New York, NY, page 119.)
Observe that φ₃(x) = [x² − 2, x² − 1].

[Figure: graphs of the correspondences φ₁, φ₂ and φ₃]

φ₁([0, 3]) = {(x, y) ∈ R² : 0 ≤ x ≤ 3, x² − 2 ≤ y ≤ x²}. φ₁([0, 3]) is defined in terms of weak inequalities and continuous functions, hence it is closed, and therefore φ₁ is closed. A similar argument applies to φ₂ and φ₃.
Since [−10, 10] is a compact set such that φ₁([0, 3]) ⊆ [−10, 10], from Proposition 17, φ₁ is UHC. A similar argument applies to φ₂ and φ₃.
φ₁ is LHC. Take an arbitrary x ∈ [0, 3] and an open set V with nonempty intersection with φ₁(x) = [x² − 2, x²]. To fix ideas, take V = (ε, x² + ε), with ε ∈ (0, x²). Then take U = (√ε, √(x² + ε)). Then, for every x′ ∈ U, {x′²} ⊆ V ∩ φ₁(x′).
A similar argument applies to φ₂ and φ₃.

22.3 Differential Calculus in Euclidean Spaces

1. The partial derivative of f with respect to the first coordinate at the point (x₀, y₀) is, if it exists and is finite,
limₓ→ₓ₀ [f(x, y₀) − f(x₀, y₀)]/(x − x₀).
For f(x, y) = 2x² − xy + y²,
[f(x, y₀) − f(x₀, y₀)]/(x − x₀) = [2x² − xy₀ + y₀² − (2x₀² − x₀y₀ + y₀²)]/(x − x₀)
= [2(x² − x₀²) − y₀(x − x₀)]/(x − x₀) = 2(x + x₀) − y₀.
Then
D₁f(x₀, y₀) = 4x₀ − y₀.
The partial derivative of f with respect to the second coordinate at the point (x₀, y₀) is, if it exists and is finite,
lim_y→y₀ [f(x₀, y) − f(x₀, y₀)]/(y − y₀).
[f(x₀, y) − f(x₀, y₀)]/(y − y₀) = [2x₀² − x₀y + y² − (2x₀² − x₀y₀ + y₀²)]/(y − y₀)
= [−x₀(y − y₀) + y² − y₀²]/(y − y₀) = −x₀ + (y + y₀).
Then
D₂f(x₀, y₀) = −x₀ + 2y₀.
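These limit computations can be cross-checked symbolically; an illustrative sketch:

```python
import sympy as sp

x, y, x0, y0 = sp.symbols('x y x0 y0')
f = 2*x**2 - x*y + y**2

# Partial derivatives as limits of difference quotients
D1 = sp.limit((f.subs(y, y0) - f.subs({x: x0, y: y0})) / (x - x0), x, x0)
D2 = sp.limit((f.subs(x, x0) - f.subs({x: x0, y: y0})) / (y - y0), y, y0)
print(D1)  # 4*x0 - y0
print(D2)  # -x0 + 2*y0
```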

2.
a. The domain of f(x, y) = x·arctan(y/x) is (R\{0}) × R. As arctan is differentiable over its whole domain, we may compute the partial derivatives over the whole domain of f at the point (x, y); we omit from now on the superscript 0.
D₁f(x, y) = arctan(y/x) + x·(−y/x²)·[1/(1 + (y/x)²)] = arctan(y/x) − (y/x)·[1/(1 + (y/x)²)],
D₂f(x, y) = x·(1/x)·[1/(1 + (y/x)²)] = 1/(1 + (y/x)²).
b. The function is defined on R₊₊ × R, and
∀(x, y) ∈ R₊₊ × R, f(x, y) = e^{y ln x}.
Thus, as exp and ln are differentiable over their whole respective domains, we may compute the partial derivatives:
D₁f(x, y) = (y/x)·e^{y ln x} = y·x^{y−1},
D₂f(x, y) = ln(x)·e^{y ln x} = ln(x)·x^y.
c.

f(x, y) = (sin(x + y))^{√(x+y)} = e^{√(x+y)·ln[sin(x+y)]} at (0, 3).
We check that sin(0 + 3) > 0, so the point belongs to the domain of the function. Both partial derivatives at (x, y) have the same expression, since f is symmetric with respect to x and y:
D₁f(x, y) = D₂f(x, y) = [ ln[sin(x + y)]/(2√(x + y)) + √(x + y)·cot(x + y) ]·(sin(x + y))^{√(x+y)},
and
D₁f(0, 3) = D₂f(0, 3) = [ ln[sin(3)]/(2√3) + √3·cot(3) ]·(sin(3))^{√3}.
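A quick symbolic check of parts a, b and c, using the candidate functions as reconstructed above (illustrative only):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# a. f(x, y) = x * arctan(y/x)
fa = x * sp.atan(y / x)
print(sp.simplify(sp.diff(fa, x)))  # atan(y/x) - x*y/(x**2 + y**2)
print(sp.simplify(sp.diff(fa, y)))  # x**2/(x**2 + y**2)

# b. f(x, y) = x**y on R++ x R
fb = x**y
print(sp.diff(fb, x))  # x**y * y / x  ==  y*x**(y-1)
print(sp.diff(fb, y))  # x**y * log(x)

# c. f(x, y) = sin(x+y)**sqrt(x+y), evaluated at (0, 3)
fc = sp.sin(x + y)**sp.sqrt(x + y)
print(sp.simplify(sp.diff(fc, x).subs({x: 0, y: 3})))
```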

3.
a.
Dₓf(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} 0/h = 0;
b.
D_yf(0, 0) = lim_{k→0} [f(0, k) − f(0, 0)]/k = lim_{k→0} 0/k = 0;
c. The basic idea of the proof is to compute the limit for (x, y) going to (0, 0) along the line y = mx in the xy plane:
lim_{(x,y)→(0,0)} x·(mx)/(x² + (mx)²) = m/(1 + m²).
Therefore, the limit depends on m, and using that result a precise statement in terms of the definition of continuity can be given. Consider the sequences (xₙ)ₙ∈N and (yₙ)ₙ∈N such that ∀n ∈ N, xₙ = yₙ = 1/n. We have that limₙ→∞ (xₙ, yₙ) = 0_{R²}, but
f(xₙ, yₙ) = (1/n²)/(1/n² + 1/n²) = 1/2.
Then
limₙ→∞ f(xₙ, yₙ) = 1/2 ≠ f(0, 0) = 0.
Thus, f is not continuous at (0, 0).
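Numerically, the sequence check is immediate; the sketch below (illustrative) evaluates f(x, y) = xy/(x² + y²) along xₙ = yₙ = 1/n:

```python
import numpy as np

def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), with f(0, 0) = 0."""
    return 0.0 if x == y == 0 else x * y / (x**2 + y**2)

for n in (10, 100, 1000):
    xn = yn = 1.0 / n
    print(n, f(xn, yn))   # always 0.5, while f(0, 0) = 0
```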

4.
f′((1, 1); (α₁, α₂)) = lim_{h→0} [f(1 + hα₁, 1 + hα₂) − f(1, 1)]/h =
= lim_{h→0} (1/h)·[ (2 + h(α₁ + α₂))/((1 + hα₁)² + (1 + hα₂)² + 1) − 2/3 ] = ···
= lim_{h→0} [−2h(α₁² + α₂²) − (α₁ + α₂)] / (3[(1 + hα₁)² + (1 + hα₂)² + 1]) = −(α₁ + α₂)/9.
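The difference quotient above is the one of f(x, y) = (x + y)/(x² + y² + 1); assuming that function, the directional derivative can be checked symbolically (illustrative sketch):

```python
import sympy as sp

h, a1, a2 = sp.symbols('h alpha1 alpha2')
f = lambda u, v: (u + v) / (u**2 + v**2 + 1)

one = sp.Integer(1)   # keep the arithmetic exact
quotient = (f(one + h*a1, one + h*a2) - f(one, one)) / h
print(sp.limit(quotient, h, 0))   # -alpha1/9 - alpha2/9
```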

5.
1st answer. We will show the existence of a linear function T₍ₓ₀,y₀₎(x, y) = a(x − x₀) + b(y − y₀) such that the definition of differential is satisfied. After substituting, we want to show that
lim_{(x,y)→(x₀,y₀)} |x² − y² + xy − x₀² + y₀² − x₀y₀ − a(x − x₀) − b(y − y₀)| / √((x − x₀)² + (y − y₀)²) = 0.
Manipulate the numerator of the above to get NUM = |(x − x₀)(x + x₀ + y − a) − (y − y₀)(y + y₀ − x₀ + b)|. Now the ratio R whose limit we are interested in satisfies
0 ≤ R ≤ |x − x₀|·|x + x₀ + y − a|/√((x − x₀)² + (y − y₀)²) + |y − y₀|·|y + y₀ − x₀ + b|/√((x − x₀)² + (y − y₀)²)
≤ |x + x₀ + y − a| + |y + y₀ − x₀ + b|,
since |x − x₀|, |y − y₀| ≤ √((x − x₀)² + (y − y₀)²). For a = 2x₀ + y₀ and b = x₀ − 2y₀, we get the limit of R equal to zero, as required.
2nd answer. As a polynomial function, we know that f ∈ C¹(R²), so we can "guess" that the differential at the point (x, y) is the following linear application:
df₍ₓ,y₎ : R² → R, u ↦ (2x + y, −2y + x)·(u₁, u₂)ᵀ.
In order to comply with the definition, we have to prove that
lim_{u→0} [f((x, y) + (u₁, u₂)) − f((x, y)) − df₍ₓ,y₎(u)] / ‖u‖ = 0.
f((x, y) + (u₁, u₂)) − f((x, y)) − df₍ₓ,y₎(u) = (x + u₁)² − (y + u₂)² + (x + u₁)(y + u₂) − x² + y² − xy − (2x + y)u₁ − (−2y + x)u₂
= 2xu₁ + u₁² − 2yu₂ − u₂² + xu₂ + yu₁ + u₁u₂ − (2x + y)u₁ − (−2y + x)u₂
= u₁² − u₂² + u₁u₂
= O(‖u‖²).
Now, we know that lim_{u→0} O(‖u‖²)/‖u‖ = 0. Then we have proved that the candidate was indeed the differential of f at the point (x, y).
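The guessed differential is just the gradient of f; a one-line symbolic confirmation (illustrative):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2 + x*y
print([sp.diff(f, v) for v in (x, y)])   # [2*x + y, x - 2*y]
```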

6.
a) Given x₀ ∈ Rⁿ, we need to find Tₓ₀ : Rⁿ → Rᵐ linear and Eₓ₀ with lim_{v→0} Eₓ₀(v) = 0. Take Tₓ₀ = l and Eₓ₀ ≡ 0. Then the desired result follows.
b) A projection is linear, so by a) it is differentiable.

7. From the definition of continuity, we want to show that ∀x₀ ∈ Rⁿ, ∀ε > 0, ∃δ > 0 such that ‖x − x₀‖ < δ ⇒ ‖l(x) − l(x₀)‖ < ε. Defining [l] = A, we have that
‖l(x) − l(x₀)‖ = ‖A·(x − x₀)‖ =
= ‖(R₁(A)·(x − x₀), ..., Rₘ(A)·(x − x₀))‖ ≤(1) Σᵢ₌₁ᵐ |Rᵢ(A)·(x − x₀)| ≤(2) Σᵢ₌₁ᵐ ‖Rᵢ(A)‖·‖x − x₀‖ ≤ m·(maxᵢ∈{1,...,m} ‖Rᵢ(A)‖)·‖x − x₀‖,   (22.9)
where (1) follows from Remark 56 and (2) from Proposition 53.4, i.e., the Cauchy-Schwarz inequality. Therefore, if maxᵢ∈{1,...,m} ‖Rᵢ(A)‖ = 0, we are done. Otherwise, take
δ = ε / (m·maxᵢ∈{1,...,m} ‖Rᵢ(A)‖).
Then ‖x − x₀‖ < δ implies that ‖x − x₀‖·m·(maxᵢ∈{1,...,m} ‖Rᵢ(A)‖) < ε, and, from (22.9), ‖l(x) − l(x₀)‖ < ε, as desired.

8.

Df(x, y) =
[ cos x cos y    −sin x sin y
  cos x sin y     sin x cos y
  −sin x cos y   −cos x sin y ].
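The stated Jacobian is consistent with the map f(x, y) = (sin x cos y, sin x sin y, cos x cos y); assuming that is the exercise's function (the statement is not reproduced here), a symbolic check:

```python
import sympy as sp

x, y = sp.symbols('x y')
# Assumed from the stated Jacobian: f(x, y) = (sin x cos y, sin x sin y, cos x cos y)
f = sp.Matrix([sp.sin(x)*sp.cos(y), sp.sin(x)*sp.sin(y), sp.cos(x)*sp.cos(y)])
print(f.jacobian([x, y]))
```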


9.
Df(x, y, z) =
[ g′(x)h(z)                                      0               g(x)h′(z)
  g′(h(x))h′(x)/y                                −g(h(x))/y²     0
  exp(x·g(h(x)))·(g(h(x)) + g′(h(x))h′(x)·x)     0               0 ].

10.
a. f is differentiable over its domain since x ↦ log x is differentiable over R₊₊. Then, from Proposition 664, we know that
[dfₓ] = Df(x) = [ 1/(3x₁), 1/(6x₂), 1/(2x₃) ].
Then
[dfₓ₀] = [ 1/3, 1/6, 1/4 ].
By application of Remark 665, we have that
f′(x₀, u) = Df(x₀)·u = (1/√3)·[1/3, 1/6, 1/4]·(1, 1, 1)ᵀ = (1/√3)·(1/3 + 1/6 + 1/4) = √3/4.

b. As a polynomial expression, f is differentiable over its domain.
[dfₓ] = Df(x) = [ 2x₁ − 2x₂, 4x₂ − 2x₁ − 6x₃, −2x₃ − 6x₂ ].
Then
[dfₓ₀] = [ 2, 4, 2 ]
and
f′(x₀, u) = Df(x₀)·u = [2, 4, 2]·(−1/√2, 0, 1/√2)ᵀ = 0.

c.
[dfₓ] = Df(x) = [ e^{x₁x₂} + x₂x₁e^{x₁x₂}, x₁²e^{x₁x₂} ].
Then
[dfₓ₀] = [ 1, 0 ]
and
f′(x₀, u) = Df(x₀)·u = [1, 0]·(2, 3)ᵀ = 2.

11. Since f ∈ C^∞(R³\{0}), we know that f admits partial derivative functions and that these functions admit themselves partial derivatives on R³\{0}. Since the function is symmetric in its arguments, it is enough to compute explicitly ∂²f(x, y, z)/∂x².
∂f/∂x (x, y, z) = −x·(x² + y² + z²)^{−3/2}.
Then
∂²f/∂x² (x, y, z) = −(x² + y² + z²)^{−3/2} + 3x²·(x² + y² + z²)^{−5/2}.
Then, ∀(x, y, z) ∈ R³\{0},
∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² = −3(x² + y² + z²)^{−3/2} + (3x² + 3y² + 3z²)(x² + y² + z²)^{−5/2}
= −3(x² + y² + z²)^{−3/2} + 3(x² + y² + z²)^{−3/2}
= 0.
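This is the standard computation showing that f(x, y, z) = (x² + y² + z²)^{−1/2} is harmonic away from the origin; a symbolic check (illustrative):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = (x**2 + y**2 + z**2)**sp.Rational(-1, 2)

laplacian = sum(sp.diff(f, v, 2) for v in (x, y, z))
print(sp.simplify(laplacian))   # 0
```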


12. Given g, h ∈ C²(R, R₊₊), consider
f : R³ → R³, f(x, y, z) = ( g(x)/h(z), g(h(x)) + xy, ln(g(x) + h(x)) ).
Since Im(h) ⊆ R₊₊, (x, y, z) ↦ g(x)/h(z) is differentiable as a ratio of differentiable functions; and since Im(g + h) ⊆ R₊₊ and ln is differentiable over R₊₊, x ↦ ln(g(x) + h(x)) is differentiable by Proposition 619. Then
Df(x, y, z) =
[ g′(x)/h(z)                       0    −h′(z)·g(x)/h(z)²
  h′(x)·g′(h(x)) + y               x    0
  (g′(x) + h′(x))/(g(x) + h(x))    0    0 ].

13.
a. Since

h(x) = ( eˣ + g(x), e^{g(x)} + x ),
then
[dhₓ] = Dh(x) = ( eˣ + g′(x), g′(x)·e^{g(x)} + 1 ),
[dh₀] = Dh(0) = ( 1 + g′(0), g′(0)·e^{g(0)} + 1 ).
b. Let us define l : R → R², l(x) = (x, g(x)), and f : R² → R², f(x, y) = (eˣ + y, e^y + x); then h = f ∘ l. As l is differentiable over R and f is differentiable over R², we may apply the "chain rule":
dh₀ = df_{l(0)} ∘ dl₀,
[dlₓ] = Dl(x) = [ 1 ; g′(x) ],
[df₍ₓ,y₎] = Df(x, y) = [ eˣ 1 ; 1 e^y ].
Then
[dh₀] = [ e⁰ 1 ; 1 e^{g(0)} ]·[ 1 ; g′(0) ] = [ 1 + g′(0) ; 1 + g′(0)·e^{g(0)} ].

14.

Dₓ(b ∘ a)(x) = D_y b(y)|_{y=a(x)} · Dₓa(x),
where
D_y b(y) = [ D_{y₁}g(y)  D_{y₂}g(y)  D_{y₃}g(y)
             D_{y₁}f(y)  D_{y₂}f(y)  D_{y₃}f(y) ]|_{y=a(x)},
Dₓa(x) = [ D_{x₁}f(x)  D_{x₂}f(x)  D_{x₃}f(x)
           D_{x₁}g(x)  D_{x₂}g(x)  D_{x₃}g(x)
           1           0           0 ].
Evaluating D_y b at y = a(x) = (f(x), g(x), x₁) and omitting arguments,
Dₓ(b ∘ a)(x) =
[ D_{y₁}g  D_{y₂}g  D_{y₃}g
  D_{y₁}f  D_{y₂}f  D_{y₃}f ] ·
[ D_{x₁}f  D_{x₂}f  D_{x₃}f
  D_{x₁}g  D_{x₂}g  D_{x₃}g
  1        0        0 ] =
= [ D_{y₁}g·D_{x₁}f + D_{y₂}g·D_{x₁}g + D_{y₃}g    D_{y₁}g·D_{x₂}f + D_{y₂}g·D_{x₂}g    D_{y₁}g·D_{x₃}f + D_{y₂}g·D_{x₃}g
    D_{y₁}f·D_{x₁}f + D_{y₂}f·D_{x₁}g + D_{y₃}f    D_{y₁}f·D_{x₂}f + D_{y₂}f·D_{x₂}g    D_{y₁}f·D_{x₃}f + D_{y₂}f·D_{x₃}g ].

15. By the sufficient condition of differentiability, it is enough to show that f ∈ C¹. The partial derivatives are ∂f/∂x = 2x + y and ∂f/∂y = −2y + x; both are indeed continuous, so f is differentiable.

16.
i) f ∈ C¹, as Df(x, y, z) = (1 + 4xy² + 3yz, 3y² + 4x²y + 3z, 1 + 3xy + 3z²) has continuous entries (everywhere, in particular around (x₀, y₀, z₀)).
ii) f(x₀, y₀, z₀) = 0 by direct calculation.
iii) f′_z = ∂f/∂z|₍ₓ₀,y₀,z₀₎ = 7 ≠ 0, f′_y = ∂f/∂y|₍ₓ₀,y₀,z₀₎ = 10 ≠ 0 and f′ₓ = ∂f/∂x|₍ₓ₀,y₀,z₀₎ = 8 ≠ 0.
Therefore we can apply the Implicit Function Theorem around (x₀, y₀, z₀) = (1, 1, 1) to get
∂x/∂z = −f′_z/f′ₓ = −7/8.

17.
a) Df = [ 2x₁ −2x₂ 2 3 ; x₂ x₁ 1 −1 ], and each entry of Df is continuous; then f is C¹. det Dₓf(x, t) = 2x₁² + 2x₂² ≠ 0, except for x₁ = x₂ = 0. Finally,
Dg(t) = −[ 2x₁ −2x₂ ; x₂ x₁ ]⁻¹·[ 2 3 ; 1 −1 ] = −(1/(2x₁² + 2x₂²))·[ 2x₁ + 2x₂  3x₁ − 2x₂ ; 2x₁ − 2x₂  −2x₁ − 3x₂ ].
b) Df = [ 2x₂ 2x₁ 1 2t₂ ; 2x₁ 2x₂ 2t₁ − 2t₂ −2t₁ + 2t₂ ] is continuous, and det Dₓf(x, t) = 4x₂² − 4x₁² ≠ 0, except for |x₁| = |x₂|. Finally,
Dg(t) = −[ 2x₂ 2x₁ ; 2x₁ 2x₂ ]⁻¹·[ 1 2t₂ ; 2t₁ − 2t₂ −2t₁ + 2t₂ ] =
= −(1/(4x₂² − 4x₁²))·[ −4x₁t₁ + 4x₁t₂ + 2x₂   4x₁t₁ − 4x₁t₂ + 4x₂t₂ ; −2x₁ + 4x₂t₁ − 4x₂t₂   −4x₁t₂ − 4x₂t₁ + 4x₂t₂ ].
c) Df = [ 2 3 2t₁ −2t₂ ; 1 −1 t₂ t₁ ] is continuous, and det Dₓf(x, t) = −5 ≠ 0 always. Finally,
Dg(t) = −[ 2 3 ; 1 −1 ]⁻¹·[ 2t₁ −2t₂ ; t₂ t₁ ] = (1/5)·[ −2t₁ − 3t₂  2t₂ − 3t₁ ; −2t₁ + 2t₂  2t₂ + 2t₁ ].
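Part c) is easy to verify symbolically, since the coefficient matrix of x is constant; an illustrative sketch of the Implicit Function Theorem formula:

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
A = sp.Matrix([[2, 3], [1, -1]])              # D_x f, constant in case c)
B = sp.Matrix([[2*t1, -2*t2], [t2, t1]])      # D_t f
Dg = -A.inv() * B                             # implicit function theorem formula
print(sp.simplify(Dg))
# Matrix([[-2*t1/5 - 3*t2/5, -3*t1/5 + 2*t2/5],
#         [-2*t1/5 + 2*t2/5,  2*t1/5 + 2*t2/5]])
```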

18. As an application of the Implicit Function Theorem to z³ − xz − y = 0, we have that
∂z/∂x = −[∂(z³ − xz − y)/∂x] / [∂(z³ − xz − y)/∂z] = −(−z)/(3z² − x) = z/(3z² − x),
if 3z² − x ≠ 0. Then
∂²z/∂x∂y = ∂[ z(x, y)/(3(z(x, y))² − x) ]/∂y = [ (∂z/∂y)(3z² − x) − 6(∂z/∂y)z² ] / (3z² − x)².
Since
∂z/∂y = −[∂(z³ − xz − y)/∂y] / [∂(z³ − xz − y)/∂z] = −(−1)/(3z² − x) = 1/(3z² − x),
we get
∂²z/∂x∂y = [ (1/(3z² − x))(3z² − x) − 6(1/(3z² − x))z² ] / (3z² − x)² = (−3z² − x)/(3z² − x)³.
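The mixed partial can be confirmed by differentiating the implicit relation directly; a symbolic sketch (illustrative):

```python
import sympy as sp

x, y = sp.symbols('x y')
z = sp.Function('z')(x, y)

# Differentiate the identity z^3 - x*z - y = 0 and solve for the partials.
F = z**3 - x*z - y
zx = sp.solve(sp.diff(F, x), sp.Derivative(z, x))[0]   # z/(3z^2 - x)
zy = sp.solve(sp.diff(F, y), sp.Derivative(z, y))[0]   # 1/(3z^2 - x)
mixed = sp.diff(zx, y).subs(sp.Derivative(z, y), zy)
print(sp.simplify(mixed))   # -(3*z**2 + x)/(3*z**2 - x)**3
```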

19. As an application of the Implicit Function Theorem, we have that the Marginal Rate of Substitution at (x₀, y₀) is
dy/dx |₍ₓ,y₎₌₍ₓ₀,y₀₎ = −[∂(u(x, y) − k)/∂x] / [∂(u(x, y) − k)/∂y] |₍ₓ,y₎₌₍ₓ₀,y₀₎ < 0,
and
d²y/dx² = −∂[ Dₓu(x, y(x))/D_yu(x, y(x)) ]/∂x = −[ (Dₓₓu + Dₓᵧu·(dy/dx))·D_yu − (Dₓᵧu + D_yyu·(dy/dx))·Dₓu ] / (D_yu)².
With the sign assumptions Dₓₓu < 0, Dₓᵧu > 0, D_yyu < 0, Dₓu > 0, D_yu > 0 and dy/dx < 0, the numerator of the fraction is negative, so d²y/dx² > 0, and therefore the function y(x) describing indifference curves is convex.

22.4 Nonlinear Programming

1.
(a) If β = 0, then f(x) = α. The constant function is concave and therefore pseudo-concave and quasi-concave, but not strictly concave.
If β > 0, then f′(x) = αβx^{β−1} and f″(x) = αβ(β − 1)x^{β−2}.
f″(x) ≤ 0 ⇔ αβ(β − 1) ≤ 0 ⇔ (using α ≥ 0, β ≥ 0) 0 ≤ β ≤ 1 ⇔ f concave ⇒ f quasi-concave.
f″(x) < 0 ⇔ (α > 0 and β ∈ (0, 1)) ⇒ f strictly concave.
(b) The Hessian matrix of f is
D²f(x) = diag( α₁β₁(β₁ − 1)x₁^{β₁−2}, ..., αₙβₙ(βₙ − 1)xₙ^{βₙ−2} ).
D²f(x) is negative semidefinite ⇔ (∀i, βᵢ ∈ [0, 1]) ⇒ f is concave.
D²f(x) is negative definite ⇔ (∀i, αᵢ > 0 and βᵢ ∈ (0, 1)) ⇒ f is strictly concave.
The bordered Hessian matrix is
B(f(x)) =
[ 0                α₁β₁x₁^{β₁−1}           ⋯   αₙβₙxₙ^{βₙ−1}
  α₁β₁x₁^{β₁−1}    α₁β₁(β₁ − 1)x₁^{β₁−2}        0
  ⋮                                        ⋱
  αₙβₙxₙ^{βₙ−1}    0                            αₙβₙ(βₙ − 1)xₙ^{βₙ−2} ].
The determinants of the significant leading principal minors are
det [ 0               α₁β₁x₁^{β₁−1}
      α₁β₁x₁^{β₁−1}   α₁β₁(β₁ − 1)x₁^{β₁−2} ] = −α₁²β₁²(x₁^{β₁−1})² < 0
and
det [ 0              α₁β₁x^{β₁−1}           α₂β₂x^{β₂−1}
      α₁β₁x^{β₁−1}   α₁β₁(β₁ − 1)x^{β₁−2}   0
      α₂β₂x^{β₂−1}   0                      α₂β₂(β₂ − 1)x^{β₂−2} ] =
= −[α₁β₁x^{β₁−1}·α₁β₁x^{β₁−1}·α₂β₂(β₂ − 1)x^{β₂−2}] − [α₂β₂x^{β₂−1}·α₂β₂x^{β₂−1}·α₁β₁(β₁ − 1)x^{β₁−2}] =
= −α₁β₁α₂β₂x^{β₁+β₂−4}·[α₁β₁(β₂ − 1)x^{β₁} + α₂β₂(β₁ − 1)x^{β₂}] > 0
iff, for i = 1, 2, αᵢ > 0 and βᵢ ∈ (0, 1).
(c) If β = 0, then f(x) = min{α, −γ}, a constant function, which is concave but not strictly concave. If β > 0, we have the graph below:

[Figure: graph of f, the minimum of the horizontal line y = α and the line y = βx − γ]

The two lines intersect at the point ((α + γ)/β, α); write x* := (α + γ)/β.
f is clearly not strictly concave, because it is constant on a subset of its domain. Let's show it is concave, and therefore pseudo-concave and quasi-concave. Given x′, x″ ∈ X, three cases are possible.
Case 1. x′, x″ ≤ x*. Case 2. x′, x″ ≥ x*. Case 3. x′ ≤ x* and x″ ≥ x*.
The most difficult case is Case 3: we want to show that (1 − λ)f(x′) + λf(x″) ≤ f((1 − λ)x′ + λx″). We have
(1 − λ)f(x′) + λf(x″) = (1 − λ)·min{α, βx′ − γ} + λ·min{α, βx″ − γ} = (1 − λ)(βx′ − γ) + λα.
Since, by construction, α ≥ βx′ − γ,
(1 − λ)(βx′ − γ) + λα ≤ α;
since, by construction, α ≤ βx″ − γ,
(1 − λ)(βx′ − γ) + λα ≤ (1 − λ)(βx′ − γ) + λ(βx″ − γ) = β[(1 − λ)x′ + λx″] − γ.
Then
(1 − λ)f(x′) + λf(x″) ≤ min{α, β[(1 − λ)x′ + λx″] − γ} = f((1 − λ)x′ + λx″),
as desired.

2.
a.
i. Canonical form.


For given π ∈ (0, 1), a ∈ (0, +∞),
max_{(x,y)∈R²} π·u(x) + (1 − π)·u(y)  s.t.
a − (1/2)x − y ≥ 0    (λ₁)
2a − 2x − y ≥ 0       (λ₂)
x ≥ 0                 (λ₃)
y ≥ 0                 (λ₄)
The system { y = a − (1/2)x, y = 2a − 2x } has solution [ x = (2/3)a, y = (2/3)a ].

[Figure: the constraint set, bounded by the lines y = a − x/2 and y = 2a − 2x]

ii. The set X and the functions f and g (X open and convex; f and g at least differentiable).
a. The domain of all the functions is R². Take X = R², which is open and convex.
b. Df(x, y) = (π·u′(x), (1 − π)·u′(y)). The Hessian matrix is
[ π·u″(x)    0
  0          (1 − π)·u″(y) ].
Therefore, f and g are C² functions, f is strictly concave, and the functions gⱼ are affine.
iii. Existence.
C is closed, bounded below by (0, 0) and bounded above by (a, a): y ≤ a − (1/2)x ≤ a, and 2x ≤ 2a − y ≤ 2a, i.e., x ≤ a.
iv. Number of solutions.
The solution is unique because f is strictly concave and the functions gⱼ are affine and therefore concave.
v. Necessity of K-T conditions.
The functions gⱼ are affine and therefore concave. Take x⁺⁺ = ((1/2)a, (1/2)a):
a − (1/2)·(1/2)a − (1/2)a = (1/4)a > 0,
2a − 2·(1/2)a − (1/2)a = (1/2)a > 0,
(1/2)a > 0,  (1/2)a > 0.
vi. Sufficiency of K-T conditions.
The objective function is strictly concave and the functions gⱼ are affine.
vii. K-T conditions.

L(x, y, λ₁, ..., λ₄; π, a) = π·u(x) + (1 − π)·u(y) + λ₁·(a − (1/2)x − y) + λ₂·(2a − 2x − y) + λ₃x + λ₄y.
π·u′(x) − (1/2)λ₁ − 2λ₂ + λ₃ = 0
(1 − π)·u′(y) − λ₁ − λ₂ + λ₄ = 0
min{λ₁, a − (1/2)x − y} = 0
min{λ₂, 2a − 2x − y} = 0
min{λ₃, x} = 0
min{λ₄, y} = 0

b. Inserting (x, y, λ₁, λ₂, λ₃, λ₄) = ((2/3)a, (2/3)a, λ₁, 0, 0, 0), with λ₁ > 0, into the Kuhn-Tucker conditions, we get
π·u′((2/3)a) − (1/2)λ₁ = 0
(1 − π)·u′((2/3)a) − λ₁ = 0
a − (1/2)(2/3)a − (2/3)a = 0
min{0, 2a − 2(2/3)a − (2/3)a} = 0
min{0, (2/3)a} = 0
min{0, (2/3)a} = 0
and therefore
λ₁ = 2π·u′((2/3)a) > 0
λ₁ = (1 − π)·u′((2/3)a) ≥ 0
a − (1/2)(2/3)a − (2/3)a = 0
min{0, 0} = 0
min{0, (2/3)a} = 0
min{0, (2/3)a} = 0.
Therefore, the proposed vector is a solution if
2π·u′((2/3)a) = (1 − π)·u′((2/3)a) > 0,
i.e., 2π = 1 − π, i.e., π = 1/3, and this for any a ∈ R₊₊.

c. If the first, third and fourth constraints hold with strict inequality, and the multiplier associated with the second constraint is strictly positive, the Kuhn-Tucker conditions become
π·u′(x) − 2λ₂ = 0
(1 − π)·u′(y) − λ₂ = 0
a − (1/2)x − y > 0
2a − 2x − y = 0
x > 0, y > 0,
i.e., the system
π·u′(x) − 2λ₂ = 0
(1 − π)·u′(y) − λ₂ = 0
2a − 2x − y = 0.
The derivatives of the left-hand sides with respect to the endogenous variables (x, y, λ₂) and the parameters (π, a) are

                      x          y              λ₂    π         a
π·u′(x) − 2λ₂         π·u″(x)    0              −2    u′(x)     0
(1 − π)·u′(y) − λ₂    0          (1−π)·u″(y)    −1    −u′(y)    0
2a − 2x − y           −2         −1             0     0         2

and
det [ π·u″(x)  0            −2
      0        (1−π)·u″(y)  −1
      −2       −1           0 ] =
= π·u″(x)·det[ (1−π)·u″(y) −1 ; −1 0 ] − 2·det[ 0 −2 ; (1−π)·u″(y) −1 ] =
= −π·u″(x) − 4(1 − π)·u″(y) > 0.
Then
D₍π,a₎(x, y, λ₂) = −[ π·u″(x) 0 −2 ; 0 (1−π)·u″(y) −1 ; −2 −1 0 ]⁻¹ · [ u′(x) 0 ; −u′(y) 0 ; 0 2 ].
Using Maple:
[ π·u″(x) 0 −2 ; 0 (1−π)·u″(y) −1 ; −2 −1 0 ]⁻¹ =
= (1/(−π·u″(x) − 4(1 − π)·u″(y))) ·
[ −1               2            2(1−π)·u″(y)
  2                −4           π·u″(x)
  2(1−π)·u″(y)     π·u″(x)      π(1−π)·u″(x)·u″(y) ].
Therefore,
D₍π,a₎(x, y, λ₂) = (1/(π·u″(x) + 4(1 − π)·u″(y))) ·
[ −u′(x) − 2u′(y)                          4(1−π)·u″(y)
  2u′(x) + 4u′(y)                          2π·u″(x)
  2(1−π)·u″(y)·u′(x) − π·u″(x)·u′(y)       2π(1−π)·u″(x)·u″(y) ].

3.
i. Canonical form.
For given π ∈ (0, 1), w₁, w₂ ∈ R₊₊,

⎤⎦3.i. Canonical form.For given π ∈ (0, 1) , w1, w2 ∈ R++,

max_{(x,y,m)∈R²₊₊×R} π·log x + (1 − π)·log y  s.t.
w₁ − m − x ≥ 0    (λₓ)
w₂ + m − y ≥ 0    (λ_y)
where λₓ and λ_y are the multipliers associated with the first and the second constraint, respectively.
ii. The set X and the functions f and g (X open and convex; f and g at least differentiable).
The set X = R²₊₊ × R is open and convex. The constraint functions are affine and therefore C². The gradient and the Hessian matrix of the objective function are computed below:
Df(x, y, m) = ( π/x, (1 − π)/y, 0 ),
D²f(x, y, m) =
[ −π/x²    0              0
  0        −(1 − π)/y²    0
  0        0              0 ].


Therefore, the objective function is C² and concave, but not strictly concave.
iii. Existence.
The problem has the same solution set as the following problem:
max_{(x,y,m)∈R²₊₊×R} π·log x + (1 − π)·log y  s.t.
w₁ − m − x ≥ 0    (λₓ)
w₂ + m − y ≥ 0    (λ_y)
π·log x + (1 − π)·log y ≥ π·log w₁ + (1 − π)·log w₂,
whose constraint set is compact (details left to the reader).
iv. Number of solutions.
The objective function is concave and the constraint functions are affine; uniqueness is not insured on the basis of the sufficient conditions presented in the notes.
v. Necessity of K-T conditions.
The constraint functions are affine and therefore pseudo-concave. Choose (x, y, m)⁺⁺ = (w₁/2, w₂/2, 0).
vi. Sufficiency of Kuhn-Tucker conditions.
f is concave and therefore pseudo-concave, and the constraint functions are affine and therefore quasi-concave.
vii. K-T conditions.
DₓL = 0 ⇒ π/x − λₓ = 0
D_yL = 0 ⇒ (1 − π)/y − λ_y = 0
DₘL = 0 ⇒ −λₓ + λ_y = 0
min{w₁ − m − x, λₓ} = 0
min{w₂ + m − y, λ_y} = 0
viii. Solve the K-T conditions.
The constraints are binding: λₓ = π/x > 0 and λ_y = (1 − π)/y > 0. Then we get
x = π/λₓ, y = (1 − π)/λ_y, λₓ = λ_y := λ, w₁ − m − x = 0, w₂ + m − y = 0,
and therefore
w₁ − m − π/λ = 0 and w₂ + m − (1 − π)/λ = 0.
Then w₁ − π/λ = −w₂ + (1 − π)/λ and λ = 1/(w₁ + w₂). Therefore
λₓ = λ_y = 1/(w₁ + w₂),
x = π(w₁ + w₂), y = (1 − π)(w₁ + w₂).
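The closed-form demands can be confirmed numerically; an illustrative sketch with π = 0.4, w₁ = 1, w₂ = 2:

```python
import numpy as np
from scipy.optimize import minimize

pi_, w1, w2 = 0.4, 1.0, 2.0

obj = lambda v: -(pi_ * np.log(v[0]) + (1 - pi_) * np.log(v[1]))
cons = [
    {'type': 'ineq', 'fun': lambda v: w1 - v[2] - v[0]},
    {'type': 'ineq', 'fun': lambda v: w2 + v[2] - v[1]},
]
res = minimize(obj, x0=[0.5, 0.5, 0.0], constraints=cons,
               bounds=[(1e-9, None), (1e-9, None), (None, None)])
print(res.x[:2])                               # ~ [1.2, 1.8]
print(pi_*(w1 + w2), (1 - pi_)*(w1 + w2))      # 1.2, 1.8
```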

b., c. Computations of the desired derivatives are straightforward.

4.
i. Canonical form.
max_{(x,y)∈R²} −x² − y² + 4x + 6y  s.t.
−x − y + 6 ≥ 0    (λ₁)
2 − y ≥ 0         (λ₂)
x ≥ 0             (μₓ)
y ≥ 0             (μ_y)
ii. The set X and the functions f and g (X open and convex; f and g at least differentiable).


X = R² is open and convex. The constraint functions are affine and therefore C². The gradient and Hessian matrix of the objective function are computed below:
Df(x, y) = ( −2x + 4, −2y + 6 ),
D²f(x, y) = [ −2 0 ; 0 −2 ].
Therefore the objective function is C² and strictly concave.
iii. Existence.
The constraint set C is nonempty (0 belongs to it) and closed. It is bounded below by 0. y is bounded above by 2. x is bounded above because of the first constraint: x ≤ 6 − y ≤ 6, using y ≥ 0. Therefore C is compact.
iv. Number of solutions.
Since the objective function is strictly concave (and therefore strictly quasi-concave) and the constraint functions are affine and therefore quasi-concave, the solution is unique.
v. Necessity of K-T conditions.
Constraints are affine and therefore pseudo-concave. Take (x⁺⁺, y⁺⁺) = (1, 1).
vi. Sufficiency of K-T conditions.
The objective function is strictly concave and therefore pseudo-concave. Constraints are affine and therefore quasi-concave.
vii. K-T conditions.
L(x, y, λ₁, λ₂, μₓ, μ_y) = −x² − y² + 4x + 6y + λ₁·(−x − y + 6) + λ₂·(2 − y) + μₓx + μ_y y.
−2x + 4 − λ₁ + μₓ = 0
−2y + 6 − λ₁ − λ₂ + μ_y = 0
min{−x − y + 6, λ₁} = 0
min{2 − y, λ₂} = 0
min{x, μₓ} = 0
min{y, μ_y} = 0
b. Imposing x = 0, the conditions become
4 − λ₁ + μₓ = 0
−2y + 6 − λ₁ − λ₂ + μ_y = 0
min{−y + 6, λ₁} = 0
min{2 − y, λ₂} = 0
min{0, μₓ} = 0
min{y, μ_y} = 0.
Since y ≤ 2, we get −y + 6 > 0 and therefore λ₁ = 0. But then μₓ = −4 < 0, which contradicts the Kuhn-Tucker conditions above; hence there is no solution with x = 0.
c. At (x, y) = (2, 2), the conditions become
−4 + 4 − λ₁ + μₓ = 0
−4 + 6 − λ₁ − λ₂ + μ_y = 0
min{−4 + 6, λ₁} = 0
min{2 − 2, λ₂} = 0
min{2, μₓ} = 0
min{2, μ_y} = 0,
i.e., λ₁ = 0, μₓ = 0, μ_y = 0 and 2 − λ₂ = 0, i.e., λ₂ = 2.
Therefore (x*, y*, λ₁*, λ₂*, μₓ*, μ_y*) = (2, 2, 0, 2, 0, 0) is a solution to the Kuhn-Tucker conditions.
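A numerical check that (2, 2) solves the problem (illustrative sketch):

```python
import numpy as np
from scipy.optimize import minimize

obj = lambda v: -(-v[0]**2 - v[1]**2 + 4*v[0] + 6*v[1])
cons = [
    {'type': 'ineq', 'fun': lambda v: -v[0] - v[1] + 6},
    {'type': 'ineq', 'fun': lambda v: 2 - v[1]},
]
res = minimize(obj, x0=[1.0, 1.0], constraints=cons,
               bounds=[(0, None), (0, None)])
print(res.x)   # approximately [2., 2.]
```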

5.¹
a.
i. Canonical form.
For given δ ∈ (0, 1), e ∈ R₊₊,
max_{(c₁,c₂,k)∈R³} u(c₁) + δu(c₂)  s.t.
e − c₁ − k ≥ 0    (λ₁)
f(k) − c₂ ≥ 0     (λ₂)
c₁ ≥ 0            (μ₁)
c₂ ≥ 0            (μ₂)
k ≥ 0             (μ₃)
ii. The set X and the functions f and g (X open and convex; f and g at least differentiable).
X = R³ is open and convex. Let's compute the gradient and the Hessian matrix of the second constraint function (c₁, c₂, k) ↦ f(k) − c₂:
gradient: ( 0, −1, f′(k) );
Hessian:
[ 0  0  0
  0  0  0
  0  0  f″(k) ],  with f″(k) < 0.
Therefore the second constraint function is C² and concave; the other constraint functions are affine.
Let's compute the gradient and the Hessian matrix of the objective function u(c₁) + δu(c₂):
gradient: ( u′(c₁), δu′(c₂), 0 );
Hessian:
[ u″(c₁)  0         0
  0       δu″(c₂)   0
  0       0         0 ].
Therefore the objective function is C² and concave.

¹ Exercises 7 and 8 are taken from David Cass' problem sets for his Microeconomics course at the University of Pennsylvania.


iii. Existence.
The objective function is continuous on R³. The constraint set is closed, because it is the inverse image of closed sets via continuous functions. It is bounded below by 0. It is bounded above: suppose not; then
if c₁ → +∞, then from the first constraint it must be that k → −∞, which is impossible;
if c₂ → +∞, then from the second constraint and the fact that f′ > 0, it must be that k → +∞, violating the first constraint;
if k → +∞, then the first constraint is violated.
Therefore, as an application of the Extreme Value Theorem, a solution exists.
iv. Number of solutions.
Since the objective function is concave and the constraint functions are either concave or affine, uniqueness is not insured on the basis of the sufficient conditions presented in the notes.
v. Necessity of K-T conditions.
The constraints are affine or concave. Take (c₁⁺⁺, c₂⁺⁺, k⁺⁺) = (e/4, (1/2)f(e/4), e/4). Then the constraints are verified with strict inequality.
vi. Sufficiency of K-T conditions.
The objective function is concave and therefore pseudo-concave. The constraint functions are either concave or affine and therefore quasi-concave.
vii. K-T conditions.
L(c₁, c₂, k, λ₁, λ₂, μ₁, μ₂, μ₃) := u(c₁) + δu(c₂) + λ₁(e − c₁ − k) + λ₂(f(k) − c₂) + μ₁c₁ + μ₂c₂ + μ₃k.
u′(c₁) − λ₁ + μ₁ = 0
δu′(c₂) − λ₂ + μ₂ = 0
−λ₁ + λ₂f′(k) + μ₃ = 0
min{e − c₁ − k, λ₁} = 0
min{f(k) − c₂, λ₂} = 0
min{c₁, μ₁} = 0
min{c₂, μ₂} = 0
min{k, μ₃} = 0
viii. Solve the K-T conditions.
Since we are looking for a positive solution, we get μ₁ = μ₂ = μ₃ = 0 and
u′(c₁) − λ₁ = 0
δu′(c₂) − λ₂ = 0
−λ₁ + λ₂f′(k) = 0
e − c₁ − k = 0
f(k) − c₂ = 0.
Observe that, from the first two equations of the above system, λ₁, λ₂ > 0.

5.b.
i. Canonical form.
For given p > 0, w > 0 and l̄ > 0,

max_{(x,l)∈R²} u(x, l)  s.t.
−px − wl + wl̄ ≥ 0
l̄ − l ≥ 0
x ≥ 0
l ≥ 0


ii. The set X and the functions f and g (X open and convex; f and g at least differentiable).
X = R² is open and convex. The constraint functions are affine and therefore C². The objective function is C² and differentiably strictly quasi-concave by assumption.
iii. Existence.
The objective function is continuous on R². The constraint set is closed, because it is the inverse image of closed sets via continuous functions. It is bounded below by 0. It is bounded above: suppose not; then, if x → +∞, from the first constraint (px + wl ≤ wl̄) it must be that l → −∞, which is impossible. A similar argument applies if l → +∞.
Therefore, as an application of the Extreme Value Theorem, a solution exists.
iv. Number of solutions.
The budget set is convex. The function is differentiably strictly quasi-concave and therefore strictly quasi-concave, and the solution is unique.
v. Necessity of K-T conditions.
The constraints are pseudo-concave, and (x⁺⁺, l⁺⁺) = (wl̄/(3p), l̄/3) satisfies the constraints with strict inequalities.
vi. Sufficiency of K-T conditions.
The objective function is differentiably strictly quasi-concave and therefore pseudo-concave. The constraint functions are quasi-concave.
vii. K-T conditions.
L(x, l, λ₁, λ₂, λ₃, λ₄; p, w, l̄) := u(x, l) + λ₁·(−px − wl + wl̄) + λ₂·(l̄ − l) + λ₃x + λ₄l.
Dₓu − λ₁p + λ₃ = 0
D_lu − λ₁w − λ₂ + λ₄ = 0
min{−px − wl + wl̄, λ₁} = 0
min{l̄ − l, λ₂} = 0
min{x, λ₃} = 0
min{l, λ₄} = 0
viii. Solve the K-T conditions.
Since we are looking for solutions at which x > 0 and 0 < l < l̄, we get λ₂ = λ₃ = λ₄ = 0 and
Dₓu − λ₁p = 0
D_lu − λ₁w = 0
min{−px − wl + wl̄, λ₁} = 0,
and then λ₁ > 0 and
Dₓu − λ₁p = 0
D_lu − λ₁w = 0
−px − wl + wl̄ = 0.

7.
a. Let's apply the Implicit Function Theorem (:= IFT) to the first order conditions found in Exercise 5.(a), with production function f(k) = akᵅ, so that the parameters are (e, a). Writing them in the usual informal way, we have:


                      c₁        c₂         k          λ₁    λ₂       e    a
u′(c₁) − λ₁ = 0       u″(c₁)                          −1
δu′(c₂) − λ₂ = 0                δu″(c₂)                     −1
−λ₁ + λ₂f′(k) = 0                          λ₂f″(k)    −1    f′(k)         λ₂αk^{α−1}
e − c₁ − k = 0        −1                   −1                        1
f(k) − c₂ = 0                   −1         f′(k)                          kᵅ

To apply the IFT, we need to check that the following matrix has full rank:
M := [ u″(c₁)   0         0         −1   0
       0        δu″(c₂)   0         0    −1
       0        0         λ₂f″(k)   −1   f′(k)
       −1       0         −1        0    0
       0        −1        f′(k)     0    0 ].
Suppose not; then there exists ∆ := (∆c₁, ∆c₂, ∆k, ∆λ₁, ∆λ₂) ≠ 0 such that M∆ = 0, i.e.,
u″(c₁)∆c₁ − ∆λ₁ = 0
δu″(c₂)∆c₂ − ∆λ₂ = 0
λ₂f″(k)∆k − ∆λ₁ + f′(k)∆λ₂ = 0
−∆c₁ − ∆k = 0
−∆c₂ + f′(k)∆k = 0
Recall that [M∆ = 0 ⇒ ∆ = 0] iff M has full rank; the idea of the proof is to assume M∆ = 0 with ∆ ≠ 0 and get a contradiction.
If we define ∆c := (∆c₁, ∆c₂), ∆λ := (∆λ₁, ∆λ₂) and D² := [ u″(c₁) 0 ; 0 δu″(c₂) ], the above system can be rewritten as
D²∆c − ∆λ = 0   (1)
λ₂f″(k)∆k + [−1, f′(k)]·∆λ = 0   (2)
−∆c + [−1, f′(k)]ᵀ·∆k = 0   (3)
Then
∆cᵀD²∆c =(1) ∆cᵀ∆λ =(3) ∆k·[−1, f′(k)]·∆λ =(2) −λ₂f″(k)·(∆k)² ≥ 0,
while ∆cᵀD²∆c = (∆c₁)²u″(c₁) + (∆c₂)²δu″(c₂) ≤ 0, with equality only if ∆c = 0. Hence both sides are zero: ∆k = 0 (since λ₂ > 0 and f″(k) < 0), ∆c = 0, and then (1) gives ∆λ = 0, so ∆ = 0, a contradiction. Therefore M has full rank.
Therefore, in a neighborhood of the solution we have

2 δu00 (c1) < 0. since we got a contradiction, M has fullrank.Therefore, in a neighborhood of the solution we have

D(e,a) (c1, c2, k, λ1, λ2) = −

⎡⎢⎢⎢⎢⎣u00 (c1) −1

δu00 (c2) −1λ2f

00 (k) −1 f 0 (k)−1 −1

−1 f 0 (k)

⎤⎥⎥⎥⎥⎦−1 ⎡⎢⎢⎢⎢⎣ λ2αk

α−1

1kα

⎤⎥⎥⎥⎥⎦ .To compute the inverse of the above matrix, we can use the following fact about the inverse of

partitioned matrix (see Goldberger, (1964), page 27:Let A be an n× n nonsingular matrix partitioned as


A = [ E   F
      G   H ],
where E is n₁ × n₁, F is n₁ × n₂, G is n₂ × n₁, H is n₂ × n₂ and n₁ + n₂ = n. Suppose that E and D := H − GE⁻¹F are nonsingular. Then
A⁻¹ = [ E⁻¹(I + FD⁻¹GE⁻¹)    −E⁻¹FD⁻¹
        −D⁻¹GE⁻¹             D⁻¹ ].

In fact, using Maple, with the obviously simplified notation u₁ := u″(c₁), u₂ := u″(c₂), f₁ := f′(k), f₂ := f″(k), and writing Δ := u₁ + δu₂f₁² + λ₂f₂ (note that Δ < 0), we get
[ u₁   0     0      −1   0
  0    δu₂   0      0    −1
  0    0     λ₂f₂   −1   f₁
  −1   0     −1     0    0
  0    −1    f₁     0    0 ]⁻¹ = (1/Δ) ·
[ 1                    −f₁            −1       −(δu₂f₁² + λ₂f₂)     −δu₂f₁
  −f₁                  f₁²            f₁       −f₁u₁                −(u₁ + λ₂f₂)
  −1                   f₁             1        −u₁                  δu₂f₁
  −(δu₂f₁² + λ₂f₂)     −f₁u₁          −u₁      −u₁(δu₂f₁² + λ₂f₂)   −δu₂f₁u₁
  −δu₂f₁               −(u₁ + λ₂f₂)   δu₂f₁    −δu₂f₁u₁             −δu₂(u₁ + λ₂f₂) ].
Therefore,

[ D_e c₁   D_a c₁
  D_e c₂   D_a c₂
  D_e k    D_a k
  D_e λ₁   D_a λ₁
  D_e λ₂   D_a λ₂ ] = (1/Δ) ·
[ δu₂f₁² + λ₂f₂        λ₂αk^{α−1} + δu₂f₁kᵅ
  f₁u₁                 −f₁λ₂αk^{α−1} + kᵅu₁ + kᵅλ₂f₂
  u₁                   −(λ₂αk^{α−1} + δu₂f₁kᵅ)
  u₁(δu₂f₁² + λ₂f₂)    u₁(λ₂αk^{α−1} + δu₂f₁kᵅ)
  δu₂f₁u₁              δu₂(−f₁λ₂αk^{α−1} + kᵅu₁ + kᵅλ₂f₂) ].

Then
D_e c₁ = (δu₂f₁² + λ₂f₂)/Δ > 0,
since the numerator (δ > 0, u₂ = u″(c₂) < 0, f₁ = f′(k) > 0, λ₂ > 0, f₂ = f″(k) < 0) and the denominator are both negative;
D_e c₂ = f₁u₁/Δ > 0,
since the numerator (f₁ > 0, u₁ = u″(c₁) < 0) and the denominator are both negative;
D_a k = −(λ₂αk^{α−1} + δu₂f₁kᵅ)/Δ,
which has the same sign as λ₂αk^{α−1} + δu₂f₁kᵅ, the sum of a positive and a negative term.

b. Let's apply the Implicit Function Theorem to the conditions found in a previous exercise. Writing them in the usual informal way, we have:

                      x       l       λ₁    p       w        l̄
Dₓu − λ₁p = 0         D²ₓ     D²ₓₗ    −p    −λ₁     0        0
D_lu − λ₁w = 0        D²ₓₗ    D²ₗ     −w    0       −λ₁      0
−px − wl + wl̄ = 0     −p      −w      0     −x      l̄ − l    w

To apply the IFT, we need to check that the following matrix has full rank:
M := [ D²ₓ    D²ₓₗ   −p
       D²ₓₗ   D²ₗ    −w
       −p     −w     0 ].


Defining D² := [ D²ₓ D²ₓₗ ; D²ₓₗ D²ₗ ] and q := [ p ; w ], we have M = [ D² −q ; −qᵀ 0 ]. Suppose M does not have full rank; then there exists ∆ := (∆y, ∆λ) ∈ (R² × R)\{0} such that M∆ = 0, i.e.,
D²∆y − q∆λ = 0   (1)
−qᵀ∆y = 0   (2)
We are going to show: Step 1. ∆y ≠ 0; Step 2. Du·∆y = 0; Step 3. ∆yᵀD²∆y = 0, i.e., it is not the case that ∆yᵀD²∆y < 0. These results contradict the assumption on u (differentiable strict quasi-concavity).
Step 1. Suppose ∆y = 0. Since q ≫ 0, from (1) we get ∆λ = 0, and therefore ∆ = 0, a contradiction.
Step 2. From the first order conditions, we have Du − λ₁q = 0 (3). Then
Du·∆y =(3) λ₁q·∆y =(2) 0.
Step 3.
∆yᵀD²∆y =(1) ∆yᵀq∆λ =(2) 0.

Therefore, in a neighborhood of the solution, we have
D₍p,w,l̄₎(x, l, λ₁) = −[ D²ₓ D²ₓₗ −p ; D²ₓₗ D²ₗ −w ; −p −w 0 ]⁻¹ ·
[ −λ₁   0       0
  0     −λ₁     0
  −x    l̄ − l   w ].
Unfortunately, here we cannot use the partitioned-inverse formula seen in part (a), because the Hessian of the utility function is not necessarily nonsingular. We can invert the matrix using the definition of inverse. (For the inverse of a partitioned matrix with these characteristics, see also Dhrymes, P. J., (1978), Mathematics for Econometrics, 2nd edition, Springer-Verlag, New York, NY, Addendum, pages 142-144.)
With the simplified notation dₓ := D²ₓ, d := D²ₓₗ, d_l := D²ₗ and Det := dₓw² − 2dpw + p²d_l, using Maple we get
[ dₓ   d     −p
  d    d_l   −w
  −p   −w    0 ]⁻¹ = (1/Det) ·
[ w²          −pw          dw − pd_l
  −pw         p²           dp − dₓw
  dw − pd_l   dp − dₓw     d² − dₓd_l ].
Therefore,
D₍p,w,l̄₎(x, l, λ₁) = (1/Det) ·
[ w²λ₁ + x(dw − pd_l)              −pwλ₁ − (l̄ − l)(dw − pd_l)           −w(dw − pd_l)
  −pwλ₁ + x(dp − dₓw)              p²λ₁ − (l̄ − l)(dp − dₓw)             −w(dp − dₓw)
  λ₁(dw − pd_l) + x(d² − dₓd_l)    λ₁(dp − dₓw) − (l̄ − l)(d² − dₓd_l)   −w(d² − dₓd_l) ].
In particular,
Dₚl = (−pwλ₁ + x(dp − dₓw))/Det,  D_w l = (p²λ₁ − (l̄ − l)(dp − dₓw))/Det.
The sign of these expressions is ambiguous, unless other assumptions are made.


References

Aliprantis, C. D. and O. Burkinshaw, (1990), Principles of Real Analysis, 2nd edition, Academic Press, St. Louis, MO.

Apostol, T. M., (1967), Calculus, Volume 1, 2nd edition, John Wiley & Sons, New York, NY.

Apostol, T. M., (1974), Mathematical Analysis, 2nd edition, Addison-Wesley Publishing Company, Reading, MA.

Azariadis, C., (2000), Intertemporal Macroeconomics, Blackwell Publisher Ltd, Oxford, UK.

Balasko, Y., (1988), Foundations of the Theory of General Equilibrium, Academic Press, Boston, MA.

Bazaraa, M.S. and C. M. Shetty, (1976), Foundations of Optimization, Springer-Verlag, Berlin.

Bartle, R. G., (1964), The Elements of Real Analysis, John Wiley & Sons, New York, NY.

Cass D., (1991), Nonlinear Programming for Economists, mimeo, University of Pennsylvania.

de la Fuente, A., (2000), Mathematical Methods and Models for Economists, Cambridge University Press, Cambridge, UK.

Dhrymes, P. J., (1978), Mathematics for Econometrics, 2nd edition, Springer-Verlag, New York, NY.

El-Hodiri, M. A., (1991), Extrema of smooth functions, Springer-Verlag, Berlin.

Girsanov, I. V., (1972), Lectures on Mathematical Theory of Extremum Problems, Springer-Verlag, Berlin and New York.

Goldberger, A.S, (1964), Econometric Theory, John Wiley & Sons, New York, NY.

Hildenbrand, W., (1974), Core and Equilibria of a Large Economy, Princeton University Press, Princeton, NJ.

Hirsch, M. W. and S. Smale, (1974), Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, New York, NY.

Lang, S., (1971), Linear Algebra, 2nd edition, Addison-Wesley, Reading, MA.

Lang, S., (1986), Introduction to Linear Algebra, 2nd edition, Springer-Verlag, New York, NY.

Lipschutz, S., (1991), Linear Algebra, 2nd edition, McGraw-Hill, New York, NY.

Lipschutz, S., (1965), General Topology, McGraw-Hill, New York, NY.

Mangasarian O. L. (1994), Nonlinear Programming, SIAM, Philadelphia.

McLean, R., (1985), Class notes for the course of Mathematical Economics (708), University of Pennsylvania, Philadelphia, PA, mimeo.

Moore, J. C., (1999), Mathematical Methods for Economic Theory 1, Springer-Verlag, Berlin.

Moore, J. C., (1999), Mathematical Methods for Economic Theory 2, Springer-Verlag, Berlin.

Ok, E. A., (2007), Real Analysis with Economic Applications, Princeton University Press, Princeton, NJ.

Rudin, W., (1976), Principles of Mathematical Analysis, 3rd edition, McGraw-Hill, New York,NY.

Simon, C. S., (1986), Scalar and vector maximization: calculus techniques with economic applications, in Reiter, S., Studies in Mathematical Economics, The Mathematical Association of America, pp. 62-159.

Simon, C. P. and L. Blume, (1994), Mathematics for Economists, Norton, New York, NY.


Simmons, G. F., (1963), Introduction to Topology and Modern Analysis, McGraw-Hill, New York.

Smith, L.,(1992), Linear Algebra, 2nd edition, Springer-Verlag, New York, NY.

Stokey, N. L. and R. E. Lucas (with E. C. Prescott), (1989), Recursive Methods in Economic Dynamics, Harvard University Press, Cambridge, MA.

Sydsaeter, K., (1981), Topics in Mathematical Analysis for Economists, Academic Press, London, UK.

Taylor, A. E. and W. R. Mann, (1984), Advanced Calculus, 3rd edition, John Wiley & Sons, New York, NY.

Wrede, R. C. and M. Spiegel, (2002), Theory and Problems of Advanced Calculus, 2nd edition, Schaum's Outlines, McGraw-Hill, New York, NY.