Source: faculty.arts.ubc.ca/pschrimpf/526/calculus-526.pdf

Differential Calculus
Paul Schrimpf
October 31, 2018
University of British Columbia
Economics 526

In this lecture, we will define derivatives for functions on vector spaces. We will show that all the familiar properties of derivatives — the mean value theorem, chain rule, etc. — hold in any vector space. We will primarily focus on R^n, but we also discuss infinite dimensional spaces (we will need to differentiate in them to study optimal control later). All of this material is also covered in chapter 4 of Carter. Chapter 14 of Simon and Blume and chapter 9 of Rudin's Principles of Mathematical Analysis cover differentiation on R^n. Simon and Blume is better for general understanding and applications, but Rudin is better for proofs and rigor.

1. Derivatives

1.1. Partial derivatives. We have discussed limits of sequences, but perhaps not limits of functions. To be complete, we define limits as follows.

Definition 1.1. Let X and Y be metric spaces and f : X→Y. We write

lim_{x→x_0} f(x) = c,

where x, x_0 ∈ X and c ∈ Y, to mean that ∀ϵ > 0 ∃δ > 0 such that d(x, x_0) < δ implies d(f(x), c) < ϵ.

Equivalently, we could say lim_{x→x_0} f(x) = c means that for any sequence {x_n} with x_n→x_0, f(x_n)→c.

You are probably already familiar with the derivative of a function of one variable. Let f : R→R. f is differentiable at x_0 if

lim_{h→0} (f(x_0 + h) − f(x_0))/h = df/dx (x_0)

exists. Similarly, for f : R^n→R we define the ith partial derivative as follows.

Definition 1.2. Let f : R^n→R. The ith partial derivative of f is

∂f/∂x_i (x_0) = lim_{h→0} (f(x_01, ..., x_0i + h, ..., x_0n) − f(x_0))/h.

The ith partial derivative tells you how much the function changes as its ith argument changes.

¹This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Example 1.1. Let f : R^n→R be a production function. Then we call ∂f/∂x_i the marginal product of x_i. If f is Cobb-Douglas, f(k, l) = A k^α l^β, where k is capital and l is labor, then the marginal products of capital and labor are

∂f/∂k (k, l) = A α k^{α−1} l^β

∂f/∂l (k, l) = A β k^α l^{β−1}.
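To make the formulas concrete, here is a small numerical sketch (the parameter values A = 2, α = 0.3, β = 0.7 and the point (k, l) = (4, 9) are my own choices, not from the notes) comparing the analytic marginal products with central finite-difference approximations to the partial derivatives:

```python
# Check the Cobb-Douglas marginal products against finite differences.
# A, alpha, beta and the evaluation point are illustrative values.
A, alpha, beta = 2.0, 0.3, 0.7
f = lambda k, l: A * k**alpha * l**beta

def partial(f, k0, l0, i, h=1e-6):
    """Central finite-difference approximation to the ith partial derivative."""
    if i == 0:
        return (f(k0 + h, l0) - f(k0 - h, l0)) / (2 * h)
    return (f(k0, l0 + h) - f(k0, l0 - h)) / (2 * h)

k0, l0 = 4.0, 9.0
mpk_analytic = A * alpha * k0**(alpha - 1) * l0**beta   # df/dk
mpl_analytic = A * beta * k0**alpha * l0**(beta - 1)    # df/dl
mpk_numeric = partial(f, k0, l0, 0)
mpl_numeric = partial(f, k0, l0, 1)
```

The finite-difference values agree with the analytic expressions to roughly the truncation error of the central difference, O(h²).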

1.2. Examples.

Example 1.2. If u : R^n→R is a utility function, then we call ∂u/∂x_i the marginal utility of x_i. If u is CRRA,

u(c_1, ..., c_T) = ∑_{t=1}^{T} β^t c_t^{1−γ}/(1 − γ),

then the marginal utility of consumption in period t is

∂u/∂c_t = β^t c_t^{−γ}.

Example 1.3. The price elasticity of demand is the percentage change in demand divided by the percentage change in its price. Suppose q_1 : R³→R is a demand function with three arguments: own price p_1, the price of another good, p_2, and consumer income, y. The own price elasticity is

ϵ_{q_1,p_1} = (∂q_1/∂p_1) p_1 / q_1(p_1, p_2, y).

The cross price elasticity is the percentage change in demand divided by the percentage change in the other good's price, i.e.

ϵ_{q_1,p_2} = (∂q_1/∂p_2) p_2 / q_1(p_1, p_2, y).

Similarly, the income elasticity of demand is

ϵ_{q_1,y} = (∂q_1/∂y) y / q_1(p_1, p_2, y).
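As a quick numerical sketch (my own hypothetical demand function, not from the notes), a log-linear demand q_1 = 5 p_1^{−1.5} p_2^{0.5} y has constant elasticities equal to its exponents, which we can recover from the formulas above using finite-difference partial derivatives:

```python
# Hypothetical log-linear demand: the elasticities are the exponents exactly.
q1 = lambda p1, p2, y: 5.0 * p1**-1.5 * p2**0.5 * y**1.0

def elasticity(q, args, i, h=1e-6):
    """(dq/d args[i]) * args[i] / q(args), via a central finite difference."""
    up = list(args); up[i] += h
    dn = list(args); dn[i] -= h
    dq = (q(*up) - q(*dn)) / (2 * h)
    return dq * args[i] / q(*args)

point = (2.0, 3.0, 10.0)          # (p1, p2, y), illustrative values
own = elasticity(q1, point, 0)    # own price elasticity, about -1.5
cross = elasticity(q1, point, 1)  # cross price elasticity, about 0.5
income = elasticity(q1, point, 2) # income elasticity, about 1.0
```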

1.3. Total derivatives. Derivatives of univariate functions have a number of useful properties that partial derivatives do not always share. Examples of useful properties include univariate derivatives giving the slope of a tangent line, the implicit function theorem, and Taylor series approximations. We would like the derivatives of multivariate functions to have these properties, but partial derivatives are not enough for this.


Example 1.4. Consider f : R²→R,

f(x, y) = { x² + y²  if xy < 0
          { x + y    if xy ≥ 0

[Figure: surface plot of f(x, y) = (x² + y²)·1{xy < 0} + (x + y)·1{xy ≥ 0} for x, y ∈ [−10, 10].]

The partial derivatives of this function at 0 are ∂f/∂x(0, 0) = 1 and ∂f/∂y(0, 0) = 1. However, there are points arbitrarily close to zero with ∂f/∂x(x, y) = 2x. If we were to try to draw a tangent plane to the function at zero, we would find that we cannot. Although the partial derivatives of this function exist everywhere, it is in some sense not differentiable at zero (or anywhere with xy = 0).

Partially motivated by the preceding example, we define the total derivative (or just the derivative; we're saying "total" to emphasize the difference between partial derivatives and the derivative).

Definition 1.3. Let f : R^n→R. The derivative (or total derivative or differential) of f at x_0 is a linear mapping, Df_{x_0} : R^n→R, such that

lim_{h→0} |f(x_0 + h) − f(x_0) − Df_{x_0} h| / ∥h∥ = 0.

The h in this definition is a vector in R^n. This is in contrast to the h in the definition of partial derivatives, which was just a scalar. The fact that h is now a vector is important because h can approach 0 along any path. Partial derivatives only look at the limits


as h approaches 0 along the axes. This allows partial derivatives to exist for strange functions like the one in example 1.4. We can see that the function from the example is not differentiable by letting h approach 0 along a path that switches from xy < 0 to xy ≥ 0 infinitely many times close to 0. The limit in the definition of the derivative does not exist along such a path, so the derivative does not exist.
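We can illustrate this numerically (my own check, not in the notes): both partials at 0 equal 1, so the only candidate derivative is Df_0 = (1, 1), but along the path h = (t, −t³), which stays in the region xy < 0, the remainder |f(h) − f(0) − Df_0 h|/∥h∥ tends to 1 rather than 0:

```python
# f from example 1.4; partials at 0 exist, yet f is not differentiable at 0.
f = lambda x, y: x**2 + y**2 if x * y < 0 else x + y

eps = 1e-8
dfdx0 = (f(eps, 0.0) - f(0.0, 0.0)) / eps   # axes lie in the xy >= 0 branch
dfdy0 = (f(0.0, eps) - f(0.0, 0.0)) / eps

def remainder_ratio(t):
    """|f(h) - f(0) - Df_0 h| / ||h|| along the path h = (t, -t^3)."""
    hx, hy = t, -t**3                        # hx * hy < 0 for t != 0
    norm = (hx**2 + hy**2) ** 0.5
    return abs(f(hx, hy) - f(0.0, 0.0) - (dfdx0 * hx + dfdy0 * hy)) / norm

ratios = [remainder_ratio(t) for t in (1e-2, 1e-4, 1e-6)]
```

The ratios approach 1 as t shrinks, so the limit in definition 1.3 is not 0 along this path.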

Comment 1.1. In proofs, it will be useful to define r(x, h) = f(x + h) − f(x) − Df_x h. We will then repeatedly use the fact that lim_{h→0} |r(x, h)|/∥h∥ = 0.

If the derivative of f at x_0 exists, then so do the partial derivatives, and the total derivative is simply the 1 × n matrix of partial derivatives.

Theorem 1.1. Let f : R^n→R be differentiable at x_0; then ∂f/∂x_i (x_0) exists for each i and

Df_{x_0} h = ( ∂f/∂x_1 (x_0) · · · ∂f/∂x_n (x_0) ) h.

Proof. Since f is differentiable at x_0, we can take h = e_i t for e_i the ith standard basis vector and t a scalar. The definition of the derivative says that

lim_{t→0} |f(x_0 + e_i t) − f(x_0) − Df_{x_0}(e_i t)| / ∥e_i t∥ = 0.

Let

r_i(x_0, t) = f(x_0 + e_i t) − f(x_0) − t Df_{x_0} e_i

and note that lim_{t→0} |r_i(x_0, t)|/|t| = 0. Rearranging and dividing by t,

(f(x_0 + e_i t) − f(x_0))/t = Df_{x_0} e_i + r_i(x_0, t)/t,

and taking the limit,

lim_{t→0} (f(x_0 + e_i t) − f(x_0))/t = Df_{x_0} e_i,

we get exactly the expression in the definition of the partial derivative. Therefore, ∂f/∂x_i (x_0) = Df_{x_0} e_i. Finally, as when we first introduced matrices, we know that the linear transformation Df_{x_0} must be represented by

Df_{x_0} h = ( ∂f/∂x_1 (x_0) · · · ∂f/∂x_n (x_0) ) h. □

We know from example 1.4 that the converse of this theorem is false. The existence of partial derivatives is not enough for a function to be differentiable. However, if the partial derivatives exist and are continuous in a neighborhood, then the function is differentiable.

Theorem 1.2. Let f : R^n→R and suppose its partial derivatives exist and are continuous in N_δ(x_0) for some δ > 0. Then f is differentiable at x_0 with

Df_{x_0} = ( ∂f/∂x_1 (x_0) · · · ∂f/∂x_n (x_0) ).


Proof. Let h = (h_1, ..., h_n) with ∥h∥ < r. Notice that

f(x_0 + h) − f(x_0) = f(x_0 + h_1 e_1) − f(x_0) + f(x_0 + h_1 e_1 + h_2 e_2) − f(x_0 + h_1 e_1) + ...   (1)
    + f(x_0 + h) − f(x_0 + ∑_{i=1}^{n−1} h_i e_i)   (2)
  = ∑_{j=1}^{n} [ f(x_0 + ∑_{i=1}^{j} h_i e_i) − f(x_0 + ∑_{i=1}^{j−1} h_i e_i) ].   (3)

By the mean value theorem (1.5),

f(x_0 + ∑_{i=1}^{j} h_i e_i) − f(x_0 + ∑_{i=1}^{j−1} h_i e_i) = h_j ∂f/∂x_j (x_0 + ∑_{i=1}^{j−1} h_i e_i + h̄_j e_j)

for some h̄_j between 0 and h_j. The partial derivatives are continuous by assumption, so by making r small enough, we can make

| ∂f/∂x_j (x_0 + ∑_{i=1}^{j−1} h_i e_i + h̄_j e_j) − ∂f/∂x_j (x_0) | < ϵ/n

for any ϵ > 0. Combined with equation (3), we now have

f(x_0 + h) − f(x_0) = ∑_{j=1}^{n} h_j ( ∂f/∂x_j (x_0) + ϵ_j )  with |ϵ_j| < ϵ/n   (4)
| f(x_0 + h) − f(x_0) − ∑_{j=1}^{n} h_j ∂f/∂x_j (x_0) | ≤ ∑_{j=1}^{n} |h_j| ϵ/n   (5)
| f(x_0 + h) − f(x_0) − Df_{x_0} h | ≤ ϵ ∥h∥.   (6)

Dividing by ∥h∥ and taking the limit,

lim_{h→0} | f(x_0 + h) − f(x_0) − Df_{x_0} h | / ∥h∥ ≤ ϵ.

This is true for any ϵ > 0, so the limit must be 0. □

A minor modification of this proof would show the stronger result that f : R^n→R has a continuous derivative on an open set U ⊆ R^n if and only if its partial derivatives are continuous on U. We call such a function continuously differentiable on U and denote the set of all such functions as C¹(U).

1.4. Mean value theorem. The mean value theorem in R says that f(x + h) − f(x) = f′(x̄)h for some x̄ between x and x + h. The same theorem holds for multivariate functions. To prove it, we will need a couple of intermediate results.

Theorem 1.3. Let f : Rn→R be continuous and K ⊂ Rn be compact. Then ∃x∗ ∈ K such thatf (x∗) ≥ f (x)∀x ∈ K.

Proof. In the last set of notes. □


Definition 1.4. Let f : R^n→R. We say that f has a local maximum at x if ∃δ > 0 such that f(y) ≤ f(x) for all y ∈ N_δ(x).

Next, we need a result that relates derivatives to maxima.

Theorem 1.4. Let f : R^n→R and suppose f has a local maximum at x and is differentiable at x. Then Df_x = 0.

Proof. Choose δ as in the definition of a local maximum. Since f is differentiable, we can write

(f(x + h) − f(x))/∥h∥ = (Df_x h + r(x, h))/∥h∥

where lim_{h→0} |r(x, h)|/∥h∥ = 0. Let h = tv for some v ∈ R^n with ∥v∥ = 1 and t ∈ R. If Df_x v > 0, then for t > 0 small enough, we would have (f(x + tv) − f(x))/|t| = Df_x v + r(x, tv)/|t| > Df_x v/2 > 0 and f(x + tv) > f(x), in contradiction to x being a local maximum. Similarly, if Df_x v < 0, then for t < 0 and small, we would have (f(x + tv) − f(x))/|t| = −Df_x v + r(x, tv)/|t| > −Df_x v/2 > 0 and f(x + tv) > f(x). Thus, it must be that Df_x v = 0 for all v, i.e. Df_x = 0. □

Now we can prove the mean value theorem.

Theorem 1.5 (mean value). Let f : R^n→R be continuously differentiable on some open set U (i.e. f ∈ C¹(U)). Let x, y ∈ U be such that the line connecting x and y, ℓ(x, y) = {z ∈ R^n : z = λx + (1 − λ)y, λ ∈ [0, 1]}, is also in U. Then there is some x̄ ∈ ℓ(x, y) such that

f(x) − f(y) = Df_{x̄}(x − y).

Proof. Let z(t) = y + t(x − y) for t ∈ [0, 1] (i.e. t = λ). Define

g(t) = f(y) − f(z(t)) + (f(x) − f(y)) t.

Note that g(0) = g(1) = 0. The set [0, 1] is closed and bounded, so it is compact. It is easy to verify that g is continuously differentiable since f is continuously differentiable. Hence, g must attain its maximum on [0, 1], say at t̄. If t̄ = 0 or 1, then either g is constant, in which case any t ∈ (0, 1) is also a maximum, or g must have an interior minimum, and we can look at the maximum of −g instead. When t̄ is not 0 or 1, then the previous theorem shows that g′(t̄) = 0. Simple calculation shows that

g′(t̄) = −Df_{z(t̄)}(x − y) + f(x) − f(y) = 0,

so

Df_{x̄}(x − y) = f(x) − f(y)

where x̄ = z(t̄). □
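As a small numerical check of theorem 1.5 (my own example, not from the notes): for f(x_1, x_2) = x_1² + 3x_1x_2 on the segment from y = (0, 0) to x = (1, 1), we can scan λ ∈ [0, 1] for the point x̄ where Df_{x̄}(x − y) equals f(x) − f(y):

```python
# f and its exact gradient for a hand-picked polynomial example.
f = lambda x1, x2: x1**2 + 3 * x1 * x2
grad = lambda x1, x2: (2 * x1 + 3 * x2, 3 * x1)

x, y = (1.0, 1.0), (0.0, 0.0)
gap = f(*x) - f(*y)                                # f(x) - f(y) = 4

def directional(lam):
    """Df_z (x - y) at z = y + lam * (x - y)."""
    z = (y[0] + lam * (x[0] - y[0]), y[1] + lam * (x[1] - y[1]))
    g = grad(*z)
    return g[0] * (x[0] - y[0]) + g[1] * (x[1] - y[1])

# Here directional(lam) = 8 * lam, so the mean value point is at lam = 0.5.
lam_grid = [i / 1000 for i in range(1001)]
lam_star = min(lam_grid, key=lambda lam: abs(directional(lam) - gap))
```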

1.5. Functions from Rn→Rm . So far we have only looked at functions from Rn to R.Functions to Rm work essentially the same way.

Definition 1.5. Let f : R^n→R^m. The derivative (or total derivative or differential) of f at x_0 is a linear mapping, Df_{x_0} : R^n→R^m, such that

lim_{h→0} ∥f(x_0 + h) − f(x_0) − Df_{x_0} h∥ / ∥h∥ = 0.


Theorems 1.1 and 1.2 still hold with no modification. The total derivative of f can be represented by the m by n matrix of partial derivatives,

Df_{x_0} = ( ∂f_1/∂x_1 (x_0)  · · ·  ∂f_1/∂x_n (x_0)
             ⋮                        ⋮
             ∂f_m/∂x_1 (x_0)  · · ·  ∂f_m/∂x_n (x_0) ).

This matrix of partial derivatives is often called the Jacobian of f.

The mean value theorem 1.5 holds for each of the component functions of f : R^n→R^m. That is, f can be written as f(x) = (f_1(x) · · · f_m(x))ᵀ where each f_j : R^n→R. The mean value theorem is true for each f_j, but the x̄'s will typically differ with j.

Corollary 1.1 (mean value for R^n→R^m). Let f : R^n→R^m be in C¹(U) for some open U. Let x, y ∈ U be such that the line connecting x and y, ℓ(x, y) = {z ∈ R^n : z = λx + (1 − λ)y, λ ∈ [0, 1]}, is also in U. Then there are x̄_j ∈ ℓ(x, y) such that

f_j(x) − f_j(y) = Df_{j,x̄_j}(x − y)

and

f(x) − f(y) = ( Df_{1,x̄_1} ; ⋮ ; Df_{m,x̄_m} ) (x − y).

Slightly abusing notation, we might at times write Df_{x̄} instead of (Df_{1,x̄_1} · · · Df_{m,x̄_m})ᵀ with the understanding that we mean the latter.

1.6. Chain rule. For univariate functions, the chain rule says that the derivative of f(g(x)) is f′(g(x))g′(x). The same is true for multivariate functions.

Theorem 1.6. Let f : R^n→R^m and g : R^k→R^n. Let g be continuously differentiable on some open set U and f be continuously differentiable on g(U). Then h : R^k→R^m, h(x) = f(g(x)), is continuously differentiable on U with

Dh_x = Df_{g(x)} Dg_x.

Proof. Let x ∈ U. Consider

∥f(g(x + d)) − f(g(x))∥ / ∥d∥.

Since g is differentiable, by the mean value theorem, g(x + d) = g(x) + Dg_{x̄(d)} d, so

∥f(g(x + d)) − f(g(x))∥ = ∥f(g(x) + Dg_{x̄(d)} d) − f(g(x))∥ ≤ ∥f(g(x) + Dg_x d) − f(g(x))∥ + ϵ

where the inequality follows from the continuity of Dg_x and f, and holds for any ϵ > 0. f is differentiable, so

lim_{Dg_x d→0} ∥f(g(x) + Dg_x d) − f(g(x)) − Df_{g(x)} Dg_x d∥ / ∥Dg_x d∥ = 0.


Using the Cauchy-Schwarz inequality, ∥Dg_x d∥ ≤ ∥Dg_x∥ ∥d∥, we get

∥f(g(x) + Dg_x d) − f(g(x)) − Df_{g(x)} Dg_x d∥ / (∥Dg_x∥ ∥d∥) ≤ ∥f(g(x) + Dg_x d) − f(g(x)) − Df_{g(x)} Dg_x d∥ / ∥Dg_x d∥,

so

lim_{d→0} ∥f(g(x) + Dg_x d) − f(g(x)) − Df_{g(x)} Dg_x d∥ / ∥d∥ = 0. □

1.7. Higher order derivatives. We can take higher order derivatives of multivariate functions just like of univariate functions. If f : R^n→R^m, then f has nm partial first derivatives. Each of these has n partial derivatives, so f has n²m partial second derivatives, written ∂²f_k/∂x_i∂x_j.

Theorem 1.7. Let f : R^n→R^m be twice continuously differentiable on some open set U. Then

∂²f_k/∂x_i∂x_j (x) = ∂²f_k/∂x_j∂x_i (x)

for all i, j, k and x ∈ U.

Proof. Using the definition of partial derivative twice, we have

∂²f/∂x_i∂x_j = lim_{t_j→0} [ lim_{t_i→0} (f(x + t_i e_i + t_j e_j) − f(x + t_j e_j))/t_i − lim_{t_i→0} (f(x + t_i e_i) − f(x))/t_i ] / t_j
= lim_{t_j→0} lim_{t_i→0} (f(x + t_j e_j + t_i e_i) − f(x + t_j e_j) − f(x + t_i e_i) + f(x)) / (t_j t_i),

from which it is apparent that we get the same expression for ∂²f/∂x_j∂x_i.¹ □

The same argument shows that in general the order of partial derivatives does not matter.

Corollary 1.2. Let f : R^n→R^m be k times continuously differentiable on some open set U. Then

∂^k f / (∂x_1^{j_1} · · · ∂x_n^{j_n}) = ∂^k f / (∂x_{p(1)}^{j_{p(1)}} · · · ∂x_{p(n)}^{j_{p(n)}})

where ∑_{i=1}^{n} j_i = k and p : {1, ..., n}→{1, ..., n} is any permutation (i.e. reordering).

¹This proof is not completely correct. We should carefully show that we can interchange the order of taking limits. Interchanging limits is not always possible, but the assumed continuity makes it possible here.


1.8. Taylor series. You have probably seen Taylor series for univariate functions before. A function can be approximated by a polynomial whose coefficients are the function's derivatives.

Theorem 1.8. Let f : R→R be k + 1 times continuously differentiable on some open set U, and let a, a + h ∈ U. Then

f(a + h) = f(a) + f′(a)h + (f″(a)/2) h² + ... + (f^{(k)}(a)/k!) h^k + (f^{(k+1)}(ā)/(k + 1)!) h^{k+1}

where ā is between a and a + h.
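A quick numerical sketch (my own choice of function and point, not from the notes): for f = exp at a = 0, every derivative equals 1, so the kth-order Taylor polynomial is ∑_{j=0}^{k} h^j/j!, and the error is bounded by the remainder term with ā ∈ (0, h):

```python
import math

def taylor_exp(h, k):
    """Sum_{j=0}^{k} h**j / j!  (all derivatives of exp at 0 equal 1)."""
    return sum(h**j / math.factorial(j) for j in range(k + 1))

h, k = 0.5, 4
approx = taylor_exp(h, k)
err = abs(math.exp(h) - approx)
# The remainder is exp(abar)/(k+1)! * h**(k+1) for some abar in (0, h),
# so it is at most exp(h)/(k+1)! * h**(k+1).
bound = math.exp(h) / math.factorial(k + 1) * h ** (k + 1)
```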

The same theorem is true for multivariate functions.

Theorem 1.9. Let f : R^n→R^m be k times continuously differentiable on some open set U and a, a + h ∈ U. Then there exists a k times continuously differentiable function r_k(a, h) such that

f(a + h) = f(a) + ∑_{1 ≤ j_1+···+j_n ≤ k} (1/(j_1! · · · j_n!)) ∂^{j_1+···+j_n} f / (∂x_1^{j_1} · · · ∂x_n^{j_n}) (a) h_1^{j_1} h_2^{j_2} · · · h_n^{j_n} + r_k(a, h)

and lim_{h→0} ∥r_k(a, h)∥/∥h∥^k = 0.

Proof. Follows from the mean value theorem. For k = 1, the mean value theorem says that

f(a + h) − f(a) = Df_{ā} h
f(a + h) = f(a) + Df_a h + (Df_{ā} − Df_a)h,  where r_1(a, h) = (Df_{ā} − Df_a)h.

Df_a is continuous as a function of a, and as h→0, ā→a, so lim_{h→0} ∥r_1(a, h)∥/∥h∥ = 0, and the theorem is true for k = 1. For general k, suppose we have proven the theorem up to k − 1. Then repeating the same argument with the (k−1)st derivative of f in place of f shows that the theorem is true for k. The only complication is the division by k!. To see where it comes from, we will just focus on f : R→R. The idea is the same for R^n, but the notation gets messy. Suppose we want a second order approximation to f at a,

f̂(h) = f(a) + f′(a)h + c_2 f″(a)h²


and pretend that we do not know c_2. Consider f(a + h) − f̂(h). Applying the mean value theorem to the difference of these functions twice, and using f̂(0) = f(a) and f̂′(t) = f′(a) + 2c_2 f″(a)t, we have

f(a + h) − f̂(h) = f(a) − f̂(0) + (f′(a + h̄_1) − f̂′(h̄_1)) h
= (f′(a + h̄_1) − f′(a) − 2c_2 f″(a)h̄_1) h
= f″(a)(1 − 2c_2)h̄_1 h + f‴(a + h̄_3)h̄_2 h̄_1 h.

If we set c_2 = 1/2, we can eliminate one term and

|f(a + h) − f̂(h)| ≤ |f‴(a + h̄_3) h³| ≡ |r_2(a, h)|.

Repeating this sort of argument, we will see that setting c_k = 1/k! ensures that lim_{h→0} ∥r_k(a, h)∥/∥h∥^k = 0. □

Example 1.5. The mean value theorem is used often in econometrics to show asymptotic normality. Many estimators can be written as

θ̂_n ∈ arg min_{θ∈Θ} Q_n(θ)

where Q_n(θ) is some objective function that depends on the sampled data. Examples include least squares, maximum likelihood, and the generalized method of moments. Suppose there is also a population version of the objective function, Q_0(θ), and Q_n(θ) →p Q_0(θ) as n→∞. There is a true value of the parameter, θ_0, that satisfies

θ_0 ∈ arg min_{θ∈Θ} Q_0(θ).

For example, for OLS,

Q_n(θ) = (1/n) ∑_{i=1}^{n} (y_i − x_iθ)²

and

Q_0(θ) = E[(Y − Xθ)²].

If Q_n is continuously differentiableᵃ on Θ and θ̂_n ∈ int(Θ), then from theorem 1.4,

DQ_{n,θ̂_n} = 0.


Applying the mean value theorem,

0 = DQ_{n,θ̂_n} = DQ_{n,θ_0} + D²Q_{n,θ̄}(θ̂_n − θ_0)
θ̂_n − θ_0 = −(D²Q_{n,θ̄})⁻¹ DQ_{n,θ_0}.

Typically, some variant of the central limit theorem implies √n DQ_{n,θ_0} →d N(0, Σ). For example, for OLS,

√n DQ_{n,θ} = −(1/√n) ∑_i 2(y_i − x_iθ)x_i.

Also, typically D²Q_{n,θ̄} →p D²Q_{0,θ_0}, so by Slutsky's theorem,ᵇ

√n(θ̂_n − θ_0) = −(D²Q_{n,θ̄})⁻¹ √n DQ_{n,θ_0} →d N(0, (D²Q_{0,θ_0})⁻¹ Σ (D²Q_{0,θ_0})⁻¹).

ᵃEssentially the same argument works if you expand Q_0 instead of Q_n. This is sometimes necessary because there are some models, like quantile regression, where Q_n is not differentiable, but Q_0 is differentiable.
ᵇPart of Slutsky's theorem says that if X_n →d X and Y_n →p c, then X_n/Y_n →d X/c.

2. Functions on vector spaces

To analyze infinite dimensional optimization problems, we need to differentiate functions on infinite dimensional vector spaces. We already did this when studying optimal control, but we glossed over the details. Anyway, we can define the derivative of a function between any two vector spaces as follows.

Definition 2.1. Let f : V→W. The Fréchet derivative of f at x_0 is a continuous² linear mapping, Df_{x_0} : V→W, such that

lim_{h→0} ∥f(x_0 + h) − f(x_0) − Df_{x_0} h∥ / ∥h∥ = 0.

Note that this definition is the same as the definition of total derivative.

Example 2.1. Let V = L^∞(0, 1) and W = R. Suppose f is given by

f(x) = ∫₀¹ g(x(τ), τ) dτ

for some continuously differentiable function g : R²→R. Then Df_x is a linear transformation from V to R. How can we calculate Df_x? If V were R^n, we would calculate the partial derivatives of f and then check that they are continuous so that theorem 1.2 applies. For an infinite dimensional space there are infinitely many partial derivatives, so we cannot possibly compute them all. However, we can look at directional derivatives.

²If V and W are finite dimensional, then all linear functions are continuous. In infinite dimensions, there can be discontinuous linear functions.


Definition 2.2. Let f : V→W, v ∈ V, and x ∈ U ⊆ V for some open U. The directional derivative (or Gâteaux derivative when V is infinite dimensional) in direction v at x is

df(x; v) = lim_{α→0} (f(x + αv) − f(x))/α,

where α ∈ R is a scalar.
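As a finite dimensional sketch (my own example, not from the notes): for f(x) = ∥x∥² on R³, the directional derivative is df(x; v) = 2⟨x, v⟩, and the difference quotient in the definition approximates it directly:

```python
f = lambda x: sum(xi * xi for xi in x)

def gateaux(f, x, v, a=1e-6):
    """Forward difference quotient (f(x + a*v) - f(x)) / a."""
    xa = [xi + a * vi for xi, vi in zip(x, v)]
    return (f(xa) - f(x)) / a

x = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0]
numeric = gateaux(f, x, v)
exact = 2 * sum(xi * vi for xi, vi in zip(x, v))   # 2 * (0.5 - 2 + 6) = 9
```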

Analogs of theorems 1.1 and 1.2 relate the Gâteaux derivative to the Fréchet derivative.

Lemma 2.1. If f : V→W is Fréchet differentiable at x, then the Gâteaux derivative, df(x; v), exists for all v ∈ V, and

df(x; v) = Df_x v.

The proof of theorem 1.2 relies on the fact that R^n is finite dimensional. In fact, in an infinite dimensional space it is not enough for all the directional derivatives to be continuous on an open set around x for the function to be differentiable at x; we also require the directional derivatives to be linear in v. In finite dimensions we can always create a linear map by arranging the partial derivatives in a matrix. In infinite dimensions, we cannot do that.

Lemma 2.2. If f : V→W has Gâteaux derivatives that are linear in v and "continuous" in x in the sense that ∀ϵ > 0 ∃δ > 0 such that if ∥x_1 − x∥ < δ, then

sup_{v∈V} ∥df(x_1; v) − df(x; v)∥ / ∥v∥ < ϵ,

then f is Fréchet differentiable with Df_x v = df(x; v).

Comment 2.1. This continuity in x is actually a very natural definition. If V and W are normed vector spaces, then the set of all bounded (or equivalently continuous) linear transformations is also a normed vector space with norm

∥A∥ ≡ sup_{v∈V} ∥Av∥_W / ∥v∥_V.

We are requiring df(x; v), as a function of x, to be continuous with respect to this norm.

Proof. Note that

f(x + h) − f(x) = ∫₀¹ df(x + th; h) dt

by the fundamental theorem of calculus (which we should really prove, but do not have time for, so we will take it as given). Then,

∥f(x + h) − f(x) − df(x; h)∥ = ∥∫₀¹ (df(x + th; h) − df(x; h)) dt∥ ≤ ∫₀¹ ∥df(x + th; h) − df(x; h)∥ dt.


By the definition of sup,

∥df(x + th; h) − df(x; h)∥ ≤ [ sup_{v∈V} ∥df(x + th; v) − df(x; v)∥ / ∥v∥ ] ∥h∥.

The continuity in x implies for any ϵ > 0 ∃δ > 0 such that if ∥th∥ < δ, then

sup_{v∈V} ∥df(x + th; v) − df(x; v)∥ / ∥v∥ < ϵ.

Thus,

∥f(x + h) − f(x) − df(x; h)∥ < ∫₀¹ ϵ∥h∥ dt = ϵ∥h∥.

In other words, for any ϵ > 0 ∃δ > 0 such that if ∥h∥ < δ, then

∥f(x + h) − f(x) − df(x; h)∥ / ∥h∥ < ϵ,

and we can conclude that df(x; h) = Df_x h. □

Example (Example 2.1 continued). Motivated by lemmas 2.1 and 2.2, we can find the Fréchet derivative of f by computing its Gâteaux derivatives. Let v ∈ V. Remember that both x and v are functions in this example. Then,

f(x + αv) = ∫₀¹ g(x(τ) + αv(τ), τ) dτ

and

df(x; v) = lim_{α→0} (1/α) ∫₀¹ [g(x(τ) + αv(τ), τ) − g(x(τ), τ)] dτ = ∫₀¹ (∂g/∂x)(x(τ), τ) v(τ) dτ.

Now, we can either check that these derivatives are linear and continuous, or just guess and verify that

Df_x(v) = ∫₀¹ (∂g/∂x)(x(τ), τ) v(τ) dτ.

Note that this expression is linear in v, as it must be for it to be the derivative. Now, we check that the limit in the definition of the derivative is zero:

lim_{h→0} |f(x + h) − f(x) − Df_x(h)| / ∥h∥ = lim_{h→0} |∫ [g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)] dτ| / ∥h∥
≤ lim_{h→0} ∫ |g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)| dτ / ∥h∥,

where the inequality follows from the triangle inequality. To simplify, let us assume that g and ∂g/∂x are bounded. Then, by the dominated convergence theorem, we can


interchange the integral and the limit.ᵃ We then have

≤ ∫ lim_{h→0} |g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)| / ∥h∥ dτ.

The definition of ∂g/∂x says that

|g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)| / |h(τ)| → 0.

Also, |h(τ)|/∥h∥ ≤ 1 for all τ because in L^∞(0, 1), ∥h∥ = sup_{0≤τ≤1} |h(τ)|. Thus, we can conclude that

lim_{h→0} |g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)| / ∥h∥
= lim_{h→0} [ |g(x(τ) + h(τ), τ) − g(x(τ), τ) − (∂g/∂x)(x(τ), τ)h(τ)| / |h(τ)| ] · [ |h(τ)| / ∥h∥ ] = 0,

so f is Fréchet differentiable at x with derivative Df_x.

ᵃWe have not covered the dominated convergence theorem. Unless specifically stated otherwise, on homeworks and exams you can assume that interchanging limits and integrals is allowed. However, do not forget that this is not always allowed. The issue is the order of taking limits. Integrals are defined in terms of limits (either Riemann sums or integrals of simple functions). It is not difficult to come up with examples where a_{m,n} is a doubly indexed sequence and lim_{m→∞} lim_{n→∞} a_{m,n} ≠ lim_{n→∞} lim_{m→∞} a_{m,n}.

With this definition of the derivative, almost everything that we proved above for functions from R^n→R^m also holds for functions on Banach spaces. In particular, the chain rule, Taylor's theorem, the implicit function theorem, and the inverse function theorem hold. The proofs of these theorems on Banach spaces are essentially the same as above, so we will not go over them. If you find this interesting, you may want to go through the proofs of all these claims, but it is not necessary to do so.

The mean value theorem is slightly more delicate. It still holds for functions f : V→R^m, but must be modified when the target space of f is infinite dimensional. We start with the mean value theorem for f : V→R.

Theorem 2.1 (Mean value theorem (onto R)). Let f : V→R be continuously differentiable on some open set U. Let x, y ∈ U be such that the line connecting x and y, ℓ(x, y) = {z ∈ V : z = λx + (1 − λ)y, λ ∈ [0, 1]}, is also in U. Then there is some x̄ ∈ ℓ(x, y) such that

f(x) − f(y) = Df_{x̄}(x − y).

Proof. Identical to the proof of 1.5. □

This result can then be generalized to f : V→R^m in the same way as corollary 1.1.


Now, let's consider a first order Taylor expansion for f : V→W:

f(x + h) − f(x) = Df_{x̄} h
f(x + h) = f(x) + Df_x h + (Df_{x̄} − Df_x)h,  where r_1(x, h) = (Df_{x̄} − Df_x)h.

To show that first order Taylor expansions have small approximation error, we need to show that (Df_{x̄} − Df_x)h is uniformly small for all possible h. When W is R, there is a single x̄. When W = R^m, then x̄ is different for each dimension (as in corollary 1.1). However, this is okay because we just take the maximum over the finitely many different x̄ to get an upper bound on ∥(Df_{x̄} − Df_x)h∥. When W is infinite dimensional, this argument does not work. Instead, we must use a result called the Hahn-Banach extension theorem, which is closely related to the separating hyperplane theorem.

Theorem 2.2 (Hahn-Banach theorem). Let V be a vector space and g : V→R be convex. Let S ⊆ V be a linear subspace. Suppose f_0 : S→R is linear and

f_0(x) ≤ g(x)

for all x ∈ S. Then f_0 can be extended to f : V→R such that

f_0(x) = f(x)

for all x ∈ S and

f(x) ≤ g(x)

for all x ∈ V.

Proof. We will apply the separating hyperplane theorem to two sets in V × R. Let

A = {(x, a) : a > g(x), x ∈ V, a ∈ R}.

Since g is a convex function, A is a convex set. Let

B = {(x, a) : a = f_0(x), x ∈ S, a ∈ R}.

Since f_0 is linear, B is convex. Clearly A ∩ B = ∅. Also, int A ≠ ∅ because A is open, so int A = A, and A ≠ ∅ since e.g. (x, g(x) + 1) ∈ A.

Therefore, by the separating hyperplane theorem, ∃ξ ∈ (V × R)* and c ∈ R such that

ξ z_A > c ≥ ξ z_B

for all z_A ∈ A and z_B ∈ B. B is a linear subspace, so it must be that ξ z_B = 0 for all z_B ∈ B, so we can take c = 0.

Note that since (0, 1) ∈ A, ξ(0, 1) > 0. Let f : V→R be given by f(x) = −ξ(x, 0)/ξ(0, 1). To conclude, we must show that f(x) = f_0(x) for all x ∈ S and f(x) ≤ g(x) for all x ∈ V. By linearity of ξ we have

ξ(x, y) = ξ(x, 0) + y ξ(0, 1)
ξ(x, y)/ξ(0, 1) = −f(x) + y.


If x ∈ S, then ξ(x, f_0(x)) = 0, so f(x) = f_0(x) for all x ∈ S. Similarly, for any y > g(x), (x, y) ∈ A, so ξ(x, y)/ξ(0, 1) = −f(x) + y > 0, and y > f(x). Therefore, f(x) ≤ g(x). □

With this result in hand, we can now prove the mean value theorem for arbitrary Banach spaces.

Theorem 2.3 (Mean value theorem (onto R)). Let f : V→W be continuously differentiable on some open set U. Let x, y ∈ U be such that the line connecting x and y, ℓ(x, y) = {z ∈ V : z = λx + (1 − λ)y, λ ∈ [0, 1]}, is also in U. Then there is some x̄ ∈ ℓ(x, y) such that

∥f(x) − f(y)∥_W ≤ ∥Df_x̄(x − y)∥_W ≤ ∥Df_x̄∥_{BL(V,W)} ∥x − y∥_V.

Proof. Step 1: use the Hahn-Banach theorem to show that ∃ϕ ∈ W^* with ∥ϕ∥ = 1 and ∥f(x) − f(y)∥_W = ϕ(f(x) − f(y)). The relevant f0 is defined on the span of f(x) − f(y) by f0(α(f(x) − f(y))) = α∥f(x) − f(y)∥_W.

Step 2: define g : [0, 1]→R by g(t) = ϕ(f(tx + (1 − t)y)) and apply the mean value theorem on R to get, for some t^* ∈ (0, 1),

∥f(x) − f(y)∥_W = g(1) − g(0) = ϕ(Df_{t^*x+(1−t^*)y}(x − y)).

Since ∥ϕ∥ = 1, the stated inequalities follow. □
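A small numerical check of the inequality (the function f : R^2→R^2 below is an assumed example, not from the text): since x̄ lies on the segment, ∥f(x) − f(y)∥ is bounded by the supremum of ∥Df_z∥ over the segment times ∥x − y∥.

```python
import numpy as np

# Assumed example: f(u, v) = (u*v, sin(u) + v), so Df is easy to write down.
def f(z):
    u, v = z
    return np.array([u * v, np.sin(u) + v])

def Df(z):
    u, v = z
    return np.array([[v, u],
                     [np.cos(u), 1.0]])

x = np.array([1.0, 2.0])
y = np.array([0.5, 1.5])

# Supremum of the operator norm of Df along the segment from y to x,
# approximated on a grid of 101 points.
sup_norm = max(np.linalg.norm(Df(l * x + (1 - l) * y), 2)
               for l in np.linspace(0.0, 1.0, 101))

lhs = np.linalg.norm(f(x) - f(y))
rhs = sup_norm * np.linalg.norm(x - y)
assert lhs <= rhs
```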

3. Optimization in vector spaces

Recall from earlier that there are many sets of functions that are vector spaces. We talked a little bit about Lp spaces of functions. The set of all continuous functions and the sets of all k times continuously differentiable functions are also vector spaces. One of these vector spaces of functions will be appropriate for finding the solution to optimal control problems. Exactly which vector space is a slightly technical, problem-dependent question, so we will not worry about that for now (and we may not worry about it at all). Similarly, there are vector spaces of infinite sequences. Little ℓp is similar to big Lp, but with sequences instead of functions:

ℓ^p = { {x_t}_{t=1}^∞ : ( ∑_{t=1}^∞ |x_t|^p )^{1/p} < ∞ }.

There are others as well. Again, the right choice of vector space depends on the problem being considered, and we will not worry about it too much.
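As a quick numerical illustration (the sequence x_t = 1/t is my own example, not from the text): truncated partial sums suggest that {1/t} belongs to ℓ^2, since ∑ 1/t^2 converges to π²/6, but not to ℓ^1, since the harmonic sum diverges.

```python
import numpy as np

def lp_partial_norm(x, p):
    """(sum_{t=1}^T |x_t|^p)^(1/p) for a finite truncation x of a sequence."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

t = np.arange(1, 100001, dtype=float)
x = 1.0 / t

ell2 = lp_partial_norm(x, 2)   # converges: sum 1/t^2 -> pi^2/6
ell1 = lp_partial_norm(x, 1)   # diverges like log(T): not in ell^1

assert abs(ell2 - np.sqrt(np.pi ** 2 / 6)) < 1e-2
```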

Given two Banach spaces (complete normed vector spaces) V and W, let BL(V, W) denote the set of all continuous (equivalently, bounded) linear transformations from V to W. The derivative of f : V→W will be in BL(V, W). We can show that BL(V, W) is a vector space, and we can define a norm on BL(V, W) by

∥D∥_{BL(V,W)} = sup_{∥v∥_V = 1} ∥Dv∥_W,

where ∥·∥_V is the norm on V and ∥·∥_W is the norm on W. Moreover, one can show that BL(V, W) is complete. Thus, BL(V, W) is also a Banach space. Viewed as a function of x, Df_x is a function from V to BL(V, W). These are both Banach spaces, so we can differentiate Df_x with respect to x. In this way, we can define the second and higher derivatives of f : V→W.


In the previous section we saw that differentiation for functions on Banach spaces is the same as for functions on finite dimensional vector spaces. All of our proofs of first and second order conditions only relied on Taylor expansions and some properties of linear transformations. Taylor expansions and linear transformations are the same on Banach spaces as on finite dimensional vector spaces, so our results for optimization will still hold. Let's just state the first order condition for equality constraints. The other results are similar, but stating them gets to be slightly cumbersome.

Theorem 3.1 (First order condition for maximization with equality constraints). Let f : U→R and h : U→W be continuously differentiable on U ⊆ V, where V and W are Banach spaces. Suppose x∗ ∈ interior(U) is a local maximizer of f on U subject to h(x) = 0, and suppose that Dh_x∗ : V→W is onto. Then there exists µ∗ ∈ BL(W, R) such that for

L(x, µ) = f(x) − µh(x)

we have

D_x L(x∗, µ∗) = Df_x∗ − µ∗ Dh_x∗ = 0_{BL(V,R)}
D_µ L(x∗, µ∗) = h(x∗) = 0_W.

There are a few differences compared to the finite dimensional case that are worth commenting on. First, in the finite dimensional case, we had h : U→R^m and the condition that rank Dh_x∗ = m. This is the same as saying that Dh_x∗ : R^n→R^m is onto. Rank is not well-defined in infinite dimensions, so we now state this condition as Dh_x∗ being onto instead of being rank m.

Secondly, previously µ ∈ R^m, and the Lagrangian was

L(x, µ) = f(x) − µ^T h(x).

Viewed as a 1 by m matrix, µ^T is a linear transformation from R^m to R. Thus, in the abstract case, we just say µ ∈ BL(W, R), which, as when we defined transposes, is called the dual space of W and is denoted W^*.

Finally, we have subscripted the zeroes in the first order condition with BL(V, R) and W to emphasize that the first equation is an equation of linear transformations from V to R, and the second equation is in W. Df_x∗ is a linear transformation from V to R. Dh_x∗ goes from V to W. µ∗ goes from W to R, so µ∗ composed with Dh_x∗, which we denote by µ∗Dh_x∗, is a linear transformation from V to R.
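In finite dimensions, Theorem 3.1 reduces to the familiar Lagrangian conditions, which can be checked numerically. The quadratic objective and linear constraint below are my own illustrative choices, not from the text; with f(x) = −x'x and h(x) = Ax − b, the first order conditions Df_x∗ − µ∗Dh_x∗ = 0 and h(x∗) = 0 become a linear (KKT) system.

```python
import numpy as np

# maximize f(x) = -x'x subject to h(x) = Ax - b = 0.
# FOC: -2x - A'mu = 0 and Ax = b, stacked as one linear system.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # h : R^3 -> R^2; Dh = A is onto (rank 2)
b = np.array([1.0, 2.0])

n, m = 3, 2
KKT = np.block([[-2.0 * np.eye(n), -A.T],
                [A, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(n), b]))
x_star, mu_star = sol[:n], sol[n:]

# Verify both first order conditions from the theorem.
assert np.allclose(-2.0 * x_star - A.T @ mu_star, 0.0)  # D_x L = 0
assert np.allclose(A @ x_star - b, 0.0)                 # D_mu L = 0
```

Here the maximizer is the minimum-norm point satisfying Ax = b, which works out to x∗ = (0, 1, 1).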

4. Inverse functions

Suppose f : R^n→R^m. If we know f(x) = y, when can we solve for x in terms of y? In other words, when is f invertible? Well, suppose we know that f(a) = b for some a ∈ R^n and b ∈ R^m. Then we can expand f around a,

f(x) = f(a) + Df_a(x − a) + r_1(a, x − a),


where r_1(a, x − a) is small. Since r_1 is small, we can hopefully ignore it; then y = f(x) can be rewritten as a linear equation:

f(a) + Df_a(x − a) = y
Df_a x = y − f(a) + Df_a a.

We know that this equation has a solution if rank Df_a = rank([Df_a  y − f(a) + Df_a a]), i.e. if the augmented coefficient matrix has the same rank as Df_a. It has a solution for any y if rank Df_a = m. Moreover, this solution is unique if rank Df_a = n. This discussion is not entirely rigorous because we have not been very careful about what r_1 being small means. The following theorem makes it more precise.

Theorem 4.1 (Inverse function). Let f : R^n→R^n be continuously differentiable on an open set E. Let a ∈ E, f(a) = b, and Df_a be invertible. Then

(1) there exist open sets U and V such that a ∈ U, b ∈ V, f is one-to-one on U, and f(U) = V; and
(2) the inverse of f exists and is continuously differentiable on V with derivative (Df_{f^{-1}(y)})^{-1}.

The open sets U and V are the areas where r_1 is small enough. The continuity of f and its derivative are also needed to ensure that r_1 is small enough. The proof of this theorem is a bit long, but the main idea is the same as the discussion preceding the theorem.
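A numerical check of the derivative formula in part (2), using an assumed one-dimensional f (so that "invertible derivative" just means f′ ≠ 0): the theorem gives (f^{-1})′(y) = 1/f′(f^{-1}(y)).

```python
import numpy as np

# Assumed example: f(x) = x + exp(x) is strictly increasing (f' > 0),
# hence invertible on all of R.
def f(x):
    return x + np.exp(x)

def f_inv(y, x=0.0):
    # Invert numerically with Newton's method; f'(x) = 1 + exp(x).
    for _ in range(100):
        x = x - (f(x) - y) / (1.0 + np.exp(x))
    return x

y = f(0.3)                      # so f_inv(y) should be 0.3
eps = 1e-6
dinv_numeric = (f_inv(y + eps) - f_inv(y - eps)) / (2 * eps)
dinv_theorem = 1.0 / (1.0 + np.exp(0.3))   # 1 / f'(f^{-1}(y))

assert abs(dinv_numeric - dinv_theorem) < 1e-6
```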

Comment 4.1. The proof uses the fact that the space of all continuous linear transformations between two normed vector spaces is itself a vector space. I do not think we have talked about this before. Anyway, it is a useful fact that already came up in the proof that continuous Gâteaux differentiability implies Fréchet differentiability last lecture. Let V and W be normed vector spaces with norms ∥·∥_V and ∥·∥_W. Let BL(V, W) denote the set of all continuous (or equivalently bounded) linear transformations from V to W. Then BL(V, W) is a normed vector space with norm

∥A∥_BL ≡ sup_{v∈V, v≠0} ∥Av∥_W / ∥v∥_V.

This is sometimes called the operator norm on BL(V, W). Last lecture, the proof that Gâteaux differentiability implies Fréchet differentiability required that the mapping from V to BL(V, W) defined by Df_x as a function of x ∈ V be continuous with respect to the above norm.

We will often use the inequality

∥Av∥_W ≤ ∥A∥_BL ∥v∥_V,

which follows from the definition of ∥·∥_BL. We will also use the fact that if V is finite dimensional and f(x, v) : V × V→W is continuous in x and v and linear in v for each x, then f(x, ·) : V→BL(V, W) is continuous in x with respect to ∥·∥_BL.
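A finite-dimensional sketch of the operator norm (the matrix A is an assumed example): with Euclidean norms on R^3 and R^2, ∥A∥_BL equals the largest singular value of A, a random search over directions approaches it from below, and ∥Av∥_W ≤ ∥A∥_BL ∥v∥_V holds for every sampled v.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # a linear map R^3 -> R^2

rng = np.random.default_rng(0)
vs = rng.normal(size=(10000, 3))                 # random directions in R^3
ratios = np.linalg.norm(vs @ A.T, axis=1) / np.linalg.norm(vs, axis=1)
sup_estimate = ratios.max()

op_norm = np.linalg.norm(A, 2)   # largest singular value = operator norm

# The sampled ratios never exceed the operator norm, and their max
# approaches it from below.
assert np.all(ratios <= op_norm + 1e-9)
assert op_norm - sup_estimate < 0.05
```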

Proof. For any y ∈ R^n, consider φ_y(x) = x + Df_a^{-1}(y − f(x)). By the mean value theorem, for x_1, x_2 ∈ U, where a ∈ U and U is open,

∥φ_y(x_1) − φ_y(x_2)∥ ≤ ∥Dφ_{y,x̄}(x_1 − x_2)∥

for some x̄ on the line segment between x_1 and x_2.


Note that

Dφ_{y,x} = I − Df_a^{-1} Df_x = Df_a^{-1}(Df_a − Df_x).

Since Df_x is continuous (as a function of x), if we make U small enough, then Df_a − Df_x will be near 0. Let λ = 1/(2∥Df_a^{-1}∥_BL). Choose U small enough that ∥Df_a − Df_x∥_BL < λ for all x ∈ U. From above, we know that

∥φ_y(x_1) − φ_y(x_2)∥ ≤ ∥Df_a^{-1}(Df_a − Df_x̄)(x_1 − x_2)∥
≤ ∥Df_a^{-1}∥_BL ∥Df_a − Df_x̄∥_BL ∥x_1 − x_2∥
≤ (1/2) ∥x_1 − x_2∥.   (7)

For any y ∈ f(U), we can start with an arbitrary x_1 ∈ U and then create a sequence by setting

x_{i+1} = φ_y(x_i).

From (7), this sequence satisfies

∥x_{i+1} − x_i∥ ≤ (1/2) ∥x_i − x_{i−1}∥.

Using this, it is easy to verify that the x_i form a Cauchy sequence, so the sequence converges. The limit satisfies φ_y(x) = x, i.e. f(x) = y. Moreover, this x is unique: if φ_y(x_1) = x_1 and φ_y(x_2) = x_2, then ∥x_1 − x_2∥ ≤ (1/2)∥x_1 − x_2∥, which is only possible if x_1 = x_2.³ Thus for each y ∈ f(U), there is exactly one x such that f(x) = y. That is, f is one-to-one on U. This proves the first part of the theorem and that f^{-1} exists.
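The iteration x_{i+1} = φ_y(x_i) from the proof can be run numerically. The function f below is an assumed toy example (not from the text) chosen so that Df_x stays close to Df_a near a = 0, making φ_y a strong contraction.

```python
import numpy as np

# Assumed example: f(u, v) = (u + 0.1*sin(v), v + 0.1*cos(u)), invert
# near a = (0, 0) by iterating phi_y(x) = x + Df_a^{-1} (y - f(x)).
def f(z):
    u, v = z
    return np.array([u + 0.1 * np.sin(v), v + 0.1 * np.cos(u)])

# Jacobian of f at a = (0, 0): rows are gradients of each component.
Dfa = np.array([[1.0, 0.1],
                [0.0, 1.0]])
Dfa_inv = np.linalg.inv(Dfa)

y = np.array([0.05, 0.12])   # target value near b = f(a) = (0, 0.1)
x = np.zeros(2)
for _ in range(50):
    x = x + Dfa_inv @ (y - f(x))   # x_{i+1} = phi_y(x_i)

assert np.allclose(f(x), y, atol=1e-10)
```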

We now show that f^{-1} is continuously differentiable with the stated derivative. Let y, y + k ∈ V = f(U). Then ∃x, x + h ∈ U such that y = f(x) and y + k = f(x + h). With φ_y as defined above, we have

φ_y(x + h) − φ_y(x) = h + Df_a^{-1}(f(x) − f(x + h)) = h − Df_a^{-1} k.

By (7), ∥h − Df_a^{-1} k∥ ≤ (1/2)∥h∥. It follows that ∥Df_a^{-1} k∥ ≥ (1/2)∥h∥ and

∥h∥ ≤ 2 ∥Df_a^{-1}∥_BL ∥k∥ = λ^{-1} ∥k∥.

³Functions like φ_y that satisfy d(φ(x), φ(y)) ≤ c d(x, y) for some c < 1 are called contraction mappings. An x with x = φ(x) is called a fixed point of the contraction mapping. The argument in the proof shows that a contraction mapping has at most one fixed point. It is not hard to show that a contraction mapping that maps a closed set into itself has exactly one fixed point; see section 6.


Importantly, as k→0, we also have h→0. Now,

∥f^{-1}(y + k) − f^{-1}(y) − Df_x^{-1} k∥ / ∥k∥ = ∥−Df_x^{-1}(f(x + h) − f(x) − Df_x h)∥ / ∥k∥
≤ ∥Df_x^{-1}∥_BL λ^{-1} ∥f(x + h) − f(x) − Df_x h∥ / ∥h∥,

so

lim_{k→0} ∥f^{-1}(y + k) − f^{-1}(y) − Df_x^{-1} k∥ / ∥k∥ ≤ lim_{k→0} ∥Df_x^{-1}∥_BL λ^{-1} ∥f(x + h) − f(x) − Df_x h∥ / ∥h∥ = 0.

Finally, since Df_x is continuous, so is (Df_{f^{-1}(y)})^{-1}, which is the derivative of f^{-1}. □

The proof of the inverse function theorem might be a bit confusing. The important idea is that if the derivative of a function is nonsingular at a point, then you can invert the function around that point, because inverting the system of linear equations given by the mean value expansion around that point nearly gives the inverse of the function.

5. Implicit functions

The implicit function theorem is a generalization of the inverse function theorem. In economics, we usually have some variables, say x, that we want to solve for in terms of some parameters, say β. For example, x could be a person's consumption of a bundle of goods, and β could be the prices of each good and the parameters of the utility function. Sometimes, we might be able to separate x and β so that we can write the conditions of our model as f(x) = b(β). Then we can use the inverse function theorem to compute ∂x_i/∂β_j and other quantities of interest. However, it is not always easy and sometimes not possible to separate x and β onto opposite sides of the equation. In this case our model gives us equations of the form f(x, β) = c. The implicit function theorem tells us when we can solve for x in terms of β and what ∂x_i/∂β_j will be.

The basic idea of the implicit function theorem is the same as that for the inverse function theorem. We will take a first order expansion of f and look at a linear system whose coefficients are the first derivatives of f. Let f : R^n→R^m. Suppose f can be written as f(x, y) with x ∈ R^k and y ∈ R^{n−k}. x are endogenous variables that we want to solve for, and y are exogenous parameters. We have a model that requires f(x, y) = c, and we know that some particular x_0 and y_0 satisfy f(x_0, y_0) = c. To solve for x in terms of y, we can expand f around x_0 and y_0:

f(x, y) = f(x_0, y_0) + D_x f_{(x_0,y_0)}(x − x_0) + D_y f_{(x_0,y_0)}(y − y_0) + r(x, y) = c.

In this equation, D_x f_{(x_0,y_0)} is the m by k matrix of first partial derivatives of f with respect to x evaluated at (x_0, y_0). Similarly, D_y f_{(x_0,y_0)} is the m by n − k matrix of first partial derivatives of f with respect to y evaluated at (x_0, y_0). Then, if r(x, y) is small enough, we have

f(x_0, y_0) + D_x f_{(x_0,y_0)}(x − x_0) + D_y f_{(x_0,y_0)}(y − y_0) ≈ c
D_x f_{(x_0,y_0)}(x − x_0) ≈ c − f(x_0, y_0) − D_y f_{(x_0,y_0)}(y − y_0).


This is just a system of linear equations with unknowns (x − x_0). If k = m and D_x f_{(x_0,y_0)} is nonsingular, then we have

x ≈ x_0 + (D_x f_{(x_0,y_0)})^{-1} (c − f(x_0, y_0) − D_y f_{(x_0,y_0)}(y − y_0)),

which gives x approximately as a function of y. The implicit function theorem says that you can make this approximation exact and get x = g(y). The theorem also tells you what the derivative of g(y) is in terms of the derivatives of f.
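A minimal numerical check of the derivative formula, using an assumed scalar f (so D_x f and D_y f are 1 by 1 matrices, not from the text): compare Dg_{y_0} = −(D_x f)^{-1} D_y f to a finite difference of g computed by solving f(x, y) = c with Newton's method.

```python
import numpy as np

# Assumed example: f(x, y) = x + 0.5*x^3 + y with c = 0 and
# (x0, y0) = (0, 0).  The theorem predicts
# Dg_{y0} = -(D_x f)^{-1} D_y f = -(1)^{-1} * 1 = -1.
def f(x, y):
    return x + 0.5 * x ** 3 + y

def solve_for_x(y, x=0.0):
    # Newton's method in x, holding y fixed: this computes g(y).
    for _ in range(50):
        x = x - f(x, y) / (1.0 + 1.5 * x ** 2)   # D_x f = 1 + 1.5*x^2
    return x

eps = 1e-5
dg_numeric = (solve_for_x(eps) - solve_for_x(-eps)) / (2 * eps)
dg_theorem = -1.0   # -(D_x f)^{-1} D_y f evaluated at (0, 0)

assert abs(dg_numeric - dg_theorem) < 1e-6
```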

Theorem 5.1 (Implicit function). Let f : R^{n+m}→R^n be continuously differentiable on some open set E and suppose f(x_0, y_0) = c for some (x_0, y_0) ∈ E, where x_0 ∈ R^n and y_0 ∈ R^m. If D_x f_{(x_0,y_0)} is invertible, then there exist open sets U ⊂ R^{n+m} and W ⊂ R^m with (x_0, y_0) ∈ U and y_0 ∈ W such that

(1) For each y ∈ W there is a unique x such that (x, y) ∈ U and f(x, y) = c.
(2) Define this x as g(y). Then g is continuously differentiable on W, g(y_0) = x_0, f(g(y), y) = c for all y ∈ W, and Dg_{y_0} = −(D_x f_{(x_0,y_0)})^{-1} D_y f_{(x_0,y_0)}.

Proof. We will show the first part by applying the inverse function theorem. Define F : R^{n+m}→R^{n+m} by F(x, y) = (f(x, y), y). To apply the inverse function theorem, we must show that F is continuously differentiable and that DF_{(x_0,y_0)} is invertible. To show that F is continuously differentiable, note that

F(x + h, y + k) − F(x, y) = (f(x + h, y + k) − f(x, y), k),

and since f is differentiable, f(x + h, y + k) − f(x, y) = D_x f_{(x,y)} h + D_y f_{(x,y)} k + o(∥(h, k)∥). It is then apparent that

DF_{(x,y)} = [ D_x f_{(x,y)}   D_y f_{(x,y)} ]
             [ 0               I_m           ],

which is continuous since Df_{(x,y)} is continuous. Also, DF_{(x_0,y_0)} can be shown to be invertible by using the partitioned inverse formula, because D_x f_{(x_0,y_0)} is invertible by assumption. Therefore, by the inverse function theorem, there exist open sets U and V such that (x_0, y_0) ∈ U, (c, y_0) ∈ V, and F is one-to-one on U with F(U) = V.

Let W be the set of y ∈ R^m such that (c, y) ∈ V. By definition, y_0 ∈ W. Also, W is open in R^m because V is open in R^{n+m}.

We can now complete the proof of part 1. If y ∈ W, then (c, y) = F(x, y) for some (x, y) ∈ U. If there is another (x′, y) such that f(x′, y) = c, then F(x′, y) = (c, y) = F(x, y). We just showed that F is one-to-one on U, so x′ = x.

We now prove part 2. Define g(y) for y ∈ W such that (g(y), y) ∈ U and f(g(y), y) = c, so that

F(g(y), y) = (c, y).

By the inverse function theorem, F has an inverse on U. Call it G. Then

G(c, y) = (g(y), y),


and G is continuously differentiable, so g must be as well. Differentiating the above equation with respect to y, we have

D_y G_{(c,y)} = [ Dg_y ]
                [ I_m  ].

On the other hand, from the inverse function theorem, the derivative of G at (c, y_0) = F(x_0, y_0) is

DG_{(c,y_0)} = (DF_{(x_0,y_0)})^{-1} = [ D_x f_{(x_0,y_0)}   D_y f_{(x_0,y_0)} ]^{-1}
                                       [ 0                   I_m               ]
             = [ D_x f_{(x_0,y_0)}^{-1}   −D_x f_{(x_0,y_0)}^{-1} D_y f_{(x_0,y_0)} ]
               [ 0                        I_m                                       ].

In particular,

D_y G_{(c,y_0)} = [ −D_x f_{(x_0,y_0)}^{-1} D_y f_{(x_0,y_0)} ] = [ Dg_{y_0} ]
                  [ I_m                                       ]   [ I_m      ],

so Dg_{y_0} = −D_x f_{(x_0,y_0)}^{-1} D_y f_{(x_0,y_0)}. □

6. Contraction mappings

One step of the proof of the inverse function theorem was to show that

∥φ_y(x_1) − φ_y(x_2)∥ ≤ (1/2) ∥x_1 − x_2∥.

This property ensures that φ_y(x) = x has at most one solution. Functions like φ_y appear quite often, so they have a name.

Definition 6.1. Let f : R^n→R^n. f is a contraction mapping on U ⊆ R^n if for all x, y ∈ U,

∥f(x) − f(y)∥ ≤ c ∥x − y∥

for some 0 ≤ c < 1.

If f is a contraction mapping, then an x such that f(x) = x is called a fixed point of the contraction mapping. Any contraction mapping has at most one fixed point.

Lemma 6.1. Let f : R^n→R^n be a contraction mapping on U ⊆ R^n. If x_1 = f(x_1) and x_2 = f(x_2) for some x_1, x_2 ∈ U, then x_1 = x_2.

Proof. Since f is a contraction mapping,

∥f(x_1) − f(x_2)∥ ≤ c ∥x_1 − x_2∥.

Since f(x_i) = x_i, this gives

∥x_1 − x_2∥ ≤ c ∥x_1 − x_2∥.

Since 0 ≤ c < 1, the previous inequality can only be true if ∥x_1 − x_2∥ = 0. Thus, x_1 = x_2. □


Starting from any x_0, we can construct a sequence: x_1 = f(x_0), x_2 = f(x_1), etc. When f is a contraction, ∥x_{n+1} − x_n∥ ≤ c^n ∥x_1 − x_0∥. These bounds are summable (∑ c^n < ∞), so {x_n} is a Cauchy sequence and converges to a limit. Moreover, this limit will satisfy x = f(x), i.e. it will be a fixed point.

Lemma 6.2. Let f : R^n→R^n be a contraction mapping on a closed set U ⊆ R^n, and suppose that f(U) ⊆ U. Then f has a unique fixed point in U.

Proof. Pick x_0 ∈ U. As in the discussion before the lemma, construct the sequence defined by x_n = f(x_{n−1}). Each x_n ∈ U because x_n = f(x_{n−1}) ∈ f(U) and f(U) ⊆ U by assumption. Since f is a contraction on U, ∥x_{n+1} − x_n∥ ≤ c^n ∥x_1 − x_0∥, so lim_{n→∞} ∥x_{n+1} − x_n∥ = 0 and {x_n} is a Cauchy sequence. Let x = lim_{n→∞} x_n (the limit lies in U whenever U is closed). Then

∥x − f(x)∥ ≤ ∥x − x_n∥ + ∥x_n − f(x)∥ = ∥x − x_n∥ + ∥f(x_{n−1}) − f(x)∥ ≤ ∥x − x_n∥ + c ∥x − x_{n−1}∥.

Since x_n→x, for any ϵ > 0 there exists N such that if n ≥ N + 1, then ∥x − x_n∥ and ∥x − x_{n−1}∥ are both less than ϵ/(1 + c). Then ∥x − f(x)∥ < ϵ for any ϵ > 0. Therefore, x = f(x). Uniqueness follows from Lemma 6.1. □
