Introduction to Optimization
Theory
Lecture Notes
JIANFEI SHEN
S C H O O L O F E C O N O M I C S
S H A N D O N G U N I V E R S I T Y
Besides language and music, mathematics is one of the primary manifestations of the free creative power of the human mind.
— Hermann Weyl
CONTENTS
1 Multivariable Calculus
1.1 Functions on Euclidean Space
1.2 Directional Derivative and Derivative
1.3 Partial Derivatives and the Jacobian
1.4 The Chain Rule
1.5 The Implicit Function Theorem
1.6 Gradient and Its Properties
1.7 Continuously Differentiable Functions
1.8 Quadratic Forms: Definite and Semidefinite Matrices
1.9 Homogeneous Functions and Euler's Formula
2 Optimization in $\mathbb{R}^n$
2.1 Introduction
2.2 Unconstrained Optimization
2.3 Equality Constrained Optimization: Lagrange's Method
2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem
2.5 Envelope Theorem
3 Convex Analysis in $\mathbb{R}^n$
3.1 Convex Sets
3.2 Separation Theorem
3.3 Systems of Linear Inequalities: Theorems of the Alternative
3.4 Convex Functions
3.5 Convexity and Optimization
References
Index
1
MULTIVARIABLE CALCULUS
In this chapter we consider functions mapping $\mathbb{R}^m$ into $\mathbb{R}^n$, and we define what we mean by the derivative of such a function. It is important to be familiar with the idea that the derivative at a point $a$ of a map between open sets of (normed) vector spaces is a linear transformation between the vector spaces (in this chapter the linear transformation is represented as an $n \times m$ matrix).

This chapter is based on Spivak (1965, Chapters 1 & 2) and Munkres (1991, Chapter 2); one could do no better than to study these two excellent books for multivariable calculus.
Notation
We use standard notation:
$$\mathbb{N} \coloneqq \{1, 2, 3, \dots\} = \text{the set of all natural numbers},$$
$$\mathbb{Z} \coloneqq \{0, \pm 1, \pm 2, \dots\} = \text{the set of all integers},$$
$$\mathbb{Q} \coloneqq \{n/m : n, m \in \mathbb{Z} \text{ and } m \neq 0\} = \text{the set of all rational numbers},$$
$$\mathbb{R} \coloneqq \text{the set of all real numbers}.$$
We also define
$$\mathbb{R}_+ \coloneqq \{x \in \mathbb{R} : x \geq 0\} \quad\text{and}\quad \mathbb{R}_{++} \coloneqq \{x \in \mathbb{R} : x > 0\}.$$
1.1 Functions on Euclidean Space
Norm, Inner Product and Metric
Definition 1.1 (Euclidean n-space) Euclidean $n$-space $\mathbb{R}^n$ is defined as the set of all $n$-tuples $(x_1, \dots, x_n)$ of real numbers $x_i$:
$$\mathbb{R}^n \coloneqq \{(x_1, \dots, x_n) : \text{each } x_i \in \mathbb{R}\}.$$
An element of $\mathbb{R}^n$ is often called a point in $\mathbb{R}^n$, and $\mathbb{R}^1$, $\mathbb{R}^2$, $\mathbb{R}^3$ are often called the line, the plane, and space, respectively.

If $x$ denotes an element of $\mathbb{R}^n$, then $x$ is an $n$-tuple of numbers, the $i$th one of which is denoted $x_i$; thus, we can write
$$x = (x_1, \dots, x_n).$$
A point in $\mathbb{R}^n$ is frequently also called a vector in $\mathbb{R}^n$, because $\mathbb{R}^n$, with
$$x + y = (x_1 + y_1, \dots, x_n + y_n), \qquad x, y \in \mathbb{R}^n,$$
and
$$\alpha x = (\alpha x_1, \dots, \alpha x_n), \qquad \alpha \in \mathbb{R} \text{ and } x \in \mathbb{R}^n,$$
as operations, is a vector space.
To obtain the full geometric structure of $\mathbb{R}^n$, we introduce three structures on $\mathbb{R}^n$: the Euclidean norm, inner product, and metric.
Definition 1.2 (Norm) In $\mathbb{R}^n$, the length of a vector $x \in \mathbb{R}^n$, usually called the norm $\|x\|$ of $x$, is defined by
$$\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}.$$
Remark 1.3 The norm $\|\cdot\|$ satisfies the following properties: for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$,
- $\|x\| \geq 0$,
- $\|x\| = 0$ iff¹ $x = 0$,
- $\|\alpha x\| = |\alpha| \cdot \|x\|$,
- $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality).

¹"iff" is the abbreviation of "if and only if".
? Exercise 1.4 Prove that $\big| \|x\| - \|y\| \big| \leq \|x - y\|$ for any two vectors $x, y \in \mathbb{R}^n$ (use the triangle inequality).
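The inequality in Exercise 1.4 is easy to spot-check numerically. The following Python sketch (our illustration, not part of the notes; the helper `norm` is ours) checks it on a few hand-picked pairs:

```python
import math

def norm(v):
    """Euclidean norm on R^n, as in Definition 1.2."""
    return math.sqrt(sum(c * c for c in v))

# Spot-check the reverse triangle inequality | ||x|| - ||y|| | <= ||x - y||.
pairs = [((1.0, 2.0), (3.0, -1.0)),
         ((0.0, 0.0), (5.0, 12.0)),
         ((-2.0, 4.0), (-2.0, 4.0))]   # equal vectors: both sides are 0
for x, y in pairs:
    diff = tuple(a - b for a, b in zip(x, y))
    assert abs(norm(x) - norm(y)) <= norm(diff) + 1e-12
```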
Definition 1.5 (Inner Product) Given $x, y \in \mathbb{R}^n$, the inner product of the vectors $x$ and $y$, denoted $x \cdot y$ or $\langle x, y \rangle$, is defined as
$$x \cdot y = \sum_{i=1}^{n} x_i y_i.$$
Remark 1.6 The norm and the inner product are related through the following identity:
$$\|x\| = \sqrt{x \cdot x}.$$
Theorem 1.7 (Cauchy-Schwarz Inequality) For any $x, y \in \mathbb{R}^n$ we have
$$|x \cdot y| \leq \|x\| \, \|y\|.$$
Proof. We assume that $x \neq 0$, for otherwise the claim is trivial. For every $a \in \mathbb{R}$, we have
$$0 \leq \|a x + y\|^2 = a^2 \|x\|^2 + 2a (x \cdot y) + \|y\|^2.$$
In particular, let $a = -(x \cdot y)/\|x\|^2$. Then, from the above display, we get the desired result. ∎
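A quick numerical illustration of Theorem 1.7 (a Python sketch of ours, not part of the notes): the inequality holds for arbitrary vectors, and, as Exercise 1.8 below suggests, collinear vectors attain equality.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

x, y = (1.0, -2.0, 3.0), (4.0, 0.5, -1.0)
assert abs(dot(x, y)) <= norm(x) * norm(y)      # Cauchy-Schwarz

z = tuple(-3.0 * c for c in x)                  # collinear with x: equality holds
assert math.isclose(abs(dot(x, z)), norm(x) * norm(z))
```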
? Exercise 1.8 Prove the triangle inequality (use the Cauchy-Schwarz Inequality). Show that it holds with equality iff one of the vectors is a nonnegative scalar multiple of the other.
Definition 1.9 (Metric) The distance $d(x, y)$ between two vectors $x, y \in \mathbb{R}^n$ is given by
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$
The distance function $d$ is called a metric.
Example 1.10 In $\mathbb{R}^2$, choose two points $x^1 = (x_1^1, x_2^1)$ and $x^2 = (x_1^2, x_2^2)$ with $x_1^2 - x_1^1 = a$ and $x_2^2 - x_2^1 = b$. Then Pythagoras tells us that (Figure 1.1)
$$d(x^1, x^2) = \sqrt{a^2 + b^2} = \sqrt{(x_1^2 - x_1^1)^2 + (x_2^2 - x_2^1)^2}.$$
Remark 1.11 The metric is related to the norm $\|\cdot\|$ through the identity
$$d(x, y) = \|x - y\|.$$
Figure 1.1: Distance in the plane: $d(x^1, x^2) = \sqrt{a^2 + b^2}$.
Subsets of $\mathbb{R}^n$
Definition 1.12 (Open Ball) Let $x \in \mathbb{R}^n$ and $r > 0$. The open ball $B(x; r)$ with center $x$ and radius $r$ is given by
$$B(x; r) \coloneqq \{y \in \mathbb{R}^n : d(x, y) < r\}.$$
Definition 1.13 (Interior) Let $S \subseteq \mathbb{R}^n$. A point $x \in S$ is called an interior point of $S$ if there is some $r > 0$ such that $B(x; r) \subseteq S$. The set of all interior points of $S$ is called its interior and is denoted $S^\circ$.
Definition 1.14 Let $S \subseteq \mathbb{R}^n$.
- $S$ is open if for every $x \in S$ there exists $r > 0$ such that $B(x; r) \subseteq S$.
- $S$ is closed if its complement $\mathbb{R}^n \setminus S$ is open.
- $S$ is bounded if there exists $r > 0$ such that $S \subseteq B(0; r)$.
- $S$ is compact if (and only if) it is closed and bounded (Heine-Borel Theorem).²
Example 1.15 On $\mathbb{R}$, the interval $(0, 1)$ is open, and the interval $[0, 1]$ is closed. Both $(0, 1)$ and $[0, 1]$ are bounded, and $[0, 1]$ is compact. However, the interval $(0, 1]$ is neither open nor closed. But $\mathbb{R}$ is both open and closed.

²This definition does not work for more general metric spaces. See Willard (2004) for details.
Limit and Continuity
Functions A function from $\mathbb{R}^m$ to $\mathbb{R}^n$ (sometimes called a vector-valued function of $m$ variables) is a rule which associates to each point in $\mathbb{R}^m$ some point in $\mathbb{R}^n$. We write
$$f : \mathbb{R}^m \to \mathbb{R}^n$$
to indicate that $f(x) \in \mathbb{R}^n$ is defined for $x \in \mathbb{R}^m$.
The notation $f : A \to \mathbb{R}^n$ indicates that $f(x)$ is defined only for $x$ in the set $A$, which is called the domain of $f$. If $B \subseteq A$, we define $f(B)$ as the set of all $f(x)$ for $x \in B$:
$$f(B) \coloneqq \{f(x) : x \in B\}.$$
If $C \subseteq \mathbb{R}^n$ we define
$$f^{-1}(C) \coloneqq \{x \in A : f(x) \in C\}.$$
The notation $f : A \to B$ indicates that $f(A) \subseteq B$. The graph of $f : A \to B$ is the set of all pairs $(a, b) \in A \times B$ such that $b = f(a)$.
A function $f : A \to \mathbb{R}^n$ determines $n$ component functions $f_1, \dots, f_n : A \to \mathbb{R}$ by
$$f(x) = \big(f_1(x), \dots, f_n(x)\big).$$
Sequences A sequence is a function that assigns to each natural number $n \in \mathbb{N}$ a vector or point $x_n \in \mathbb{R}^n$. We usually write the sequence as $(x_n)_{n=1}^{\infty}$ or $(x_n)$.
Example 1.16 Examples of sequences in $\mathbb{R}^2$ are
(a) $(x_n) = \big((n, n)\big)$.
(b) $(x_n) = \big((\cos \tfrac{n\pi}{2}, \sin \tfrac{n\pi}{2})\big)$.
(c) $(x_n) = \big(((-1)^n/2^n, \, 1/2^n)\big)$.
(d) $(x_n) = \big(((-1)^n - 1/n, \, (-1)^n - 1/n)\big)$.
See Figure 1.2.
Definition 1.17 (Limit) A sequence $(x_n)$ is said to have a limit $x$, or to converge to $x$, if for every $\varepsilon > 0$ there is $N_\varepsilon \in \mathbb{N}$ such that whenever $n \geq N_\varepsilon$, we have $x_n \in B(x; \varepsilon)$. We write
$$\lim_{n \to \infty} x_n = x \quad\text{or}\quad x_n \to x.$$
Example 1.18 In Example 1.16, the sequences (a), (b) and (d) do not converge, while the sequence (c) converges to $(0, 0)$.
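The convergence claim for sequence (c) can be illustrated numerically; the sketch below (ours, with hypothetical helper names) computes the distance of each term to the claimed limit $(0,0)$, which is $\sqrt{2}/2^n$ and so falls below any $\varepsilon$ eventually:

```python
import math

def x(n):
    """n-th term of sequence (c): ((-1)^n / 2^n, 1 / 2^n)."""
    return ((-1) ** n / 2 ** n, 1 / 2 ** n)

def dist_to_limit(n):
    a, b = x(n)
    return math.sqrt(a * a + b * b)   # distance to the claimed limit (0, 0)

# The distance sqrt(2)/2^n is strictly decreasing and falls below 1e-6 before n = 30,
# so the terms eventually enter any ball B((0,0); epsilon).
dists = [dist_to_limit(n) for n in range(1, 30)]
assert all(d2 < d1 for d1, d2 in zip(dists, dists[1:]))
assert dists[-1] < 1e-6
```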
Figure 1.2: Examples of sequences; of the four, only (c) is convergent.
Figure 1.3: Naive continuity. (a) A continuous function. (b) We cannot draw the graph of $1/x$ without taking our pencil off the paper.
Naive Continuity The simplest way to say that a function $f : A \to \mathbb{R}$ is continuous is to say that one can draw its graph without taking the pencil off the paper. For example, a function whose graph looks like Figure 1.3(a) would be continuous in this sense (Crossley, 2005, Chapter 2).
But if we look at the function $f(x) = 1/x$, then we see that things are not so simple. The graph of this function has two parts: one part corresponding to negative $x$ values, and the other to positive $x$ values. The function is not defined at $0$, so we certainly cannot draw both parts of this graph without taking our pencil off the paper; see Figure 1.3(b). Of course, $f(x) = 1/x$ is continuous near every point in its domain. Such a function deserves to be called continuous. So this characterization of continuity in terms of graph-sketching is too simplistic.
Rigorous Continuity The notation $\lim_{x \to a} f(x) = b$ means, as in the one-variable case, that we can get $f(x)$ as close to $b$ as desired by choosing $x$ sufficiently close to, but not equal to, $a$. In mathematical terms this means that for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that $\|f(x) - b\| < \varepsilon$ for all $x$ in the domain of $f$ which satisfy $0 < \|x - a\| < \delta$.
A function $f : A \to \mathbb{R}^n$ is called continuous at $a \in A$ if $\lim_{x \to a} f(x) = f(a)$, and $f$ is continuous if it is continuous at each $a \in A$.
? Exercise 1.19 Let
$$f(x) = \begin{cases} x & \text{if } x \neq 1 \\ 3/2 & \text{if } x = 1. \end{cases}$$
Show that $f(x)$ is not continuous at $a = 1$.
1.2 Directional Derivative and Derivative
Let us first recall how the derivative of a real-valued function of a real vari-
able is defined. Let A � R; let f W A ! R. Suppose A contains a neigh-
borhood of the point a, that is, there is an open ball B.aI r/ such that
B.aI r/ � A. We define the derivative of f at a by the equation
f 0.a/ D limt!0
f .aC t / � f .a/
t; (1.1)
provided the limit exists. In this case, we say that f is differentiable at a.
Geometrically, f 0.a/ is the slope of the tangent line to the graph of f at the
point .a; f .a//.
Definition 1.20 For a function $f : (a, b) \to \mathbb{R}$ and a point $x_0 \in (a, b)$, if
$$\lim_{t \uparrow 0} \frac{f(x_0 + t) - f(x_0)}{t}$$
exists and is finite, we denote this limit by $f'_-(x_0)$ and call it the left-hand derivative of $f$ at $x_0$. Similarly, we define $f'_+(x_0)$ and call it the right-hand derivative of $f$ at $x_0$. Of course, $f$ is differentiable at $x_0$ iff it has left-hand and right-hand derivatives at $x_0$ that are equal.
Now let $A \subseteq \mathbb{R}^m$, where $m > 1$, and let $f : A \to \mathbb{R}^n$. Can we define the derivative of $f$ by replacing $a$ and $t$ in the definition just given by points of $\mathbb{R}^m$? Certainly we cannot, since we cannot divide a point of $\mathbb{R}^n$ by a point of $\mathbb{R}^m$ when $m > 1$.
Directional Derivative
The following is our first attempt at a definition of "derivative". Let $A \subseteq \mathbb{R}^m$ and let $f : A \to \mathbb{R}^n$. We study how $f$ changes as we move from a point $a \in A^\circ$ (the interior of $A$) along a line segment to a nearby point $a + u$, where $u \neq 0$. Each point on the segment can be expressed as $a + tu$, where $t \in \mathbb{R}$. The vector $u$ describes the direction of the line segment. Since $a$ is an interior point of $A$, the line segment joining $a$ to $a + tu$ will lie in $A$ if $t$ is small enough.
Definition 1.21 (Directional Derivative) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. Given $u \in \mathbb{R}^m$ with $u \neq 0$, define
$$f'(a; u) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t},$$
provided the limit exists. This limit is called the directional derivative of $f$ at $a$ with respect to the vector $u$.
Remark 1.22 In calculus, one usually requires $u$ to be a unit vector, i.e., $\|u\| = 1$, but that is not necessary.
Example 1.23 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation $f(x_1, x_2) = x_1 x_2$. The directional derivative of $f$ at $a = (a_1, a_2)$ with respect to the vector $u = (u_1, u_2)$ is
$$f'(a; u) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \lim_{t \to 0} \frac{(a_1 + t u_1)(a_2 + t u_2) - a_1 a_2}{t} = u_2 a_1 + u_1 a_2.$$
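The computation in Example 1.23 can be confirmed with a difference quotient. This Python sketch (ours; the function names are our choices, not from the notes) compares a symmetric difference quotient with $u_2 a_1 + u_1 a_2$:

```python
def f(x1, x2):
    return x1 * x2

def dir_deriv(a, u, t=1e-7):
    """Symmetric difference quotient approximating f'(a; u)."""
    (a1, a2), (u1, u2) = a, u
    return (f(a1 + t * u1, a2 + t * u2) - f(a1 - t * u1, a2 - t * u2)) / (2 * t)

a, u = (3.0, -2.0), (1.0, 4.0)
exact = u[1] * a[0] + u[0] * a[1]   # u2*a1 + u1*a2 = 10, as in Example 1.23
assert abs(dir_deriv(a, u) - exact) < 1e-6
```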
Example 1.24 Suppose the directional derivative of $f$ at $a$ with respect to $u$ exists. Then for $c \in \mathbb{R}$ with $c \neq 0$,
$$f'(a; cu) = \lim_{t \to 0} \frac{f(a + tcu) - f(a)}{t} = c \lim_{t \to 0} \frac{f(a + tcu) - f(a)}{tc} = c \lim_{s \to 0} \frac{f(a + su) - f(a)}{s} = c f'(a; u).$$
Remark 1.25 Example 1.24 shows that if $u$ and $v$ are collinear vectors in $\mathbb{R}^m$, then $f'(a; u)$ and $f'(a; v)$ are collinear in $\mathbb{R}^n$.
However, the directional derivative is not the appropriate generalization of the notion of "derivative". The main problems are:
Problem 1. Continuity does not follow from this definition of "differentiability". There exist functions $f$ such that $f'(a; u)$ exists for all $u \neq 0$ but $f$ is not continuous at $a$.
Example 1.26 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by setting
$$f(x, y) = \begin{cases} 0 & \text{if } (x, y) = (0, 0) \\ \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x, y) \neq (0, 0). \end{cases}$$
We show that all directional derivatives of $f$ exist at $0$, but that $f$ is not continuous at $0$. Let $u = (h, k) \neq 0$. Then
$$\frac{f(0 + tu) - f(0)}{t} = \frac{(th)^2 (tk)}{\big((th)^4 + (tk)^2\big) t} = \frac{h^2 k}{t^2 h^4 + k^2},$$
so that
$$f'(0; u) = \begin{cases} h^2/k & \text{if } k \neq 0, \\ 0 & \text{if } k = 0. \end{cases}$$
However, the function $f$ takes the value $1/2$ at each point of the parabola $y = x^2$ (except at $0$), so $f$ is not continuous at $0$ since $f(0) = 0$.
Problem 2. Composites of "differentiable" functions may fail to be differentiable.
Derivative
To give the right generalization of the notion of "derivative", let us reconsider (1.1). In fact, if $f'(a)$ exists, let $R_a(t)$ denote the difference
$$R_a(t) \coloneqq \begin{cases} \dfrac{f(a + t) - f(a)}{t} - f'(a) & \text{if } t \neq 0 \\ 0 & \text{if } t = 0. \end{cases} \tag{1.2}$$
From (1.2) we see that $\lim_{t \to 0} R_a(t) = 0$. Then we have
$$f(a + t) = f(a) + f'(a) t + R_a(t) t. \tag{1.3}$$
Note that (1.3) also holds for $t = 0$. This is called the first-order Taylor formula for approximating $f(a + t) - f(a)$ by $f'(a) t$. The error committed is $R_a(t) t$.³ See Figure 1.4. It is this idea that leads to the following definition:
Definition 1.27 (Differentiability) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. We say that $f$ is differentiable at $a$ if there is an $n \times m$ matrix $B_a$ such that
$$f(a + h) = f(a) + B_a \cdot h + \|h\| R_a(h),$$
where $\lim_{h \to 0} R_a(h) = 0$. The matrix $B_a$, which is unique, is called the derivative of $f$ at $a$; it is denoted $Df(a)$.

³"All science is dominated by the idea of approximation."—Bertrand Russell.
Figure 1.4: $f'(a) t$ is the linear approximation to $f(a + t) - f(a)$ at $a$.
Figure 1.5: $Df(a)$ is the linear part of $f$ at $a$.
Remark 1.28 [1] Notice that $h$ is a point of $\mathbb{R}^m$ and $f(a + h) - f(a) - B_a \cdot h$ is a point of $\mathbb{R}^n$, so the norm signs are essential.
[2] The derivative $Df(a)$ depends on the point $a$ as well as the function $f$. We are not saying that there exists a $B$ which works for all $a$, but that for a fixed $a$ such a $B$ exists.
[3] Here is how to visualize $Df$. Take $m = n = 2$. The function $f : A \to \mathbb{R}^2$ distorts shapes nonlinearly; its derivative describes the linear part of the distortion. Circles are sent by $f$ to wobbly ovals, but they become ellipses under $Df(a)$ (here we treat $Df(a)$ as the matrix that represents a linear operator; see, e.g., Axler 1997). Lines are sent by $f$ to curves, but they become straight lines under $Df(a)$. See Figure 1.5.
Example 1.29 Let $f : \mathbb{R}^m \to \mathbb{R}^n$ be defined by the equation
$$f(x) = A \cdot x + b,$$
where $A$ is an $n \times m$ matrix and $b \in \mathbb{R}^n$. Then
$$f(a + h) = A \cdot (a + h) + b = A \cdot a + b + A \cdot h = f(a) + A \cdot h.$$
Hence, $R_a(h) = [f(a + h) - f(a) - A \cdot h]/\|h\| = 0$; that is, $Df(a) = A$.
We now show that this definition of the derivative is stronger than the directional derivative. In particular, we have:
Theorem 1.30 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.
Proof. Differentiability at $a$ implies that
$$\|f(a + h) - f(a)\| = \big\| Df(a) \cdot h + \|h\| R_a(h) \big\| \leq \|Df(a)\| \cdot \|h\| + \|R_a(h)\| \cdot \|h\| \to 0$$
as $a + h \to a$, where the inequality follows from the triangle inequality and the Cauchy-Schwarz Inequality (Theorem 1.7). ∎
However, there is a nice connection between the directional derivative and the derivative.
Theorem 1.31 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If $f$ is differentiable at $a$, then all the directional derivatives of $f$ at $a$ exist, and
$$f'(a; u) = Df(a) \cdot u.$$
Proof. Fix any $u \in \mathbb{R}^m$ and take $h = tu$. Then
$$\frac{f(a + tu) - f(a)}{t} = \frac{Df(a) \cdot (tu) + \|tu\| R_a(tu)}{t} = Df(a) \cdot u + \frac{|t| \cdot \|u\|}{t} R_a(tu).$$
The last term converges to zero as $t \to 0$, which proves that $f'(a; u) = Df(a) \cdot u$. ∎
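Theorem 1.31 can be checked numerically on a concrete differentiable map. In the Python sketch below (our illustration; the particular $f$ and the hand-computed Jacobian are our choices), the difference quotient for $f'(a; u)$ matches $Df(a) \cdot u$:

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 2 + x[1], x[0] * x[1]])

def Df(x):
    # Jacobian computed by hand: rows are the derivatives of the components.
    return np.array([[2 * x[0], 1.0],
                     [x[1],     x[0]]])

a = np.array([1.0, 2.0])
u = np.array([-3.0, 0.5])
t = 1e-6

quotient = (f(a + t * u) - f(a)) / t      # difference quotient for f'(a; u)
assert np.allclose(quotient, Df(a) @ u, atol=1e-4)
```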
? Exercise 1.32 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by setting
$$f(x, y) = \begin{cases} 0 & \text{if } (x, y) = (0, 0) \\ \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x, y) \neq (0, 0). \end{cases}$$
Show that $f$ is not differentiable at $(0, 0)$.
? Exercise 1.33 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = \sqrt{|xy|}$. Show that $f$ is not differentiable at $(0, 0)$.
1.3 Partial Derivatives and the Jacobian
We now introduce the notion of the "partial derivatives" of a real-valued function. Let $(e_1, \dots, e_m)$ be the standard basis of $\mathbb{R}^m$, i.e.,
$$e_1 = (1, 0, 0, \dots, 0),\quad e_2 = (0, 1, 0, \dots, 0),\quad \dots,\quad e_m = (0, 0, \dots, 0, 1).$$
Definition 1.34 (Partial Derivatives) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. We define the $j$th partial derivative of $f$ at $a$ to be the directional derivative of $f$ at $a$ with respect to the vector $e_j$, provided this derivative exists; and we denote it by $D_j f(a)$. That is,
$$D_j f(a) = \lim_{t \to 0} \frac{f(a + t e_j) - f(a)}{t}.$$
Remark 1.35 It is important to note that $D_j f(a)$ is the ordinary derivative of a certain one-variable function; in fact, if $g(x) = f(a_1, \dots, a_{j-1}, x, a_{j+1}, \dots, a_m)$, then $D_j f(a) = g'(a_j)$. This means that $D_j f(a)$ is the slope of the tangent line at $(a, f(a))$ to the curve obtained by intersecting the graph of $f$ with the planes $x_i = a_i$ for $i \neq j$. See Figure 1.6.
The following theorem relates partial derivatives to the derivative in the case where $f$ is a real-valued function.
Theorem 1.36 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. If $f$ is differentiable at $a$, then
$$Df(a) = \begin{bmatrix} D_1 f(a) & D_2 f(a) & \cdots & D_m f(a) \end{bmatrix}.$$
Proof. If $f$ is differentiable at $a$, then $Df(a)$ is a $1 \times m$ matrix. Let
$$Df(a) = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_m \end{bmatrix}.$$
It follows from Theorem 1.31 that
$$D_j f(a) = f'(a; e_j) = Df(a) \cdot e_j = \lambda_j. \qquad\blacksquare$$
1.4 The Chain Rule
We extend the familiar chain rule to the current setting.
Figure 1.6: $D_1 f(a, b)$.
Theorem 1.37 (Chain Rule) Let $A \subseteq \mathbb{R}^m$; let $B \subseteq \mathbb{R}^n$. Let $f : A \to \mathbb{R}^n$ and $g : B \to \mathbb{R}^p$, with $f(A) \subseteq B$. Suppose $f(a) = b$. If $f$ is differentiable at $a$, and if $g$ is differentiable at $b$, then the composite function $g \circ f : A \to \mathbb{R}^p$ is differentiable at $a$. Furthermore,
$$D(g \circ f)(a) = Dg(b) \cdot Df(a).$$
Proof. Omitted. See, e.g., Spivak (1965, Theorem 2-2), Rudin (1976, Theorem 9.15), or Munkres (1991, Theorem 7.1). ∎
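A numerical sanity check of the Chain Rule (our illustration; the particular $f$ and $g$ are arbitrary choices, and their Jacobians are computed by hand): compare $Dg(b) \cdot Df(a)$ with a finite-difference derivative of $g \circ f$.

```python
import numpy as np

# f: R^2 -> R^2 and g: R^2 -> R, with Jacobians computed by hand.
f  = lambda x: np.array([x[0] + x[1], x[0] * x[1]])
Df = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])

g  = lambda y: y[0] ** 2 + np.sin(y[1])
Dg = lambda y: np.array([[2 * y[0], np.cos(y[1])]])

a = np.array([0.5, -1.0])
b = f(a)

# D(g o f)(a) = Dg(b) . Df(a); compare with a finite-difference gradient of g o f.
chain = (Dg(b) @ Df(a)).ravel()
h = 1e-6
fd = np.array([(g(f(a + h * e)) - g(f(a))) / h for e in np.eye(2)])
assert np.allclose(chain, fd, atol=1e-4)
```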
Here is an application of the Chain Rule.
Theorem 1.38 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. Let $f_i : A \to \mathbb{R}$ be the $i$th component function of $f$, so that
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{bmatrix}.$$
(a) The function $f$ is differentiable at $a$ if and only if each component function $f_i$ is differentiable at $a$.
(b) If $f$ is differentiable at $a$, then its derivative is the $n \times m$ matrix whose $i$th row is the derivative of the function $f_i$. That is,
$$Df(a) = \begin{bmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{bmatrix} = \begin{bmatrix} D_1 f_1(a) & \cdots & D_m f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_n(a) & \cdots & D_m f_n(a) \end{bmatrix}.$$
Proof. (a) Assume that $f$ is differentiable at $a$ and express the $i$th component of $f$ as
$$f_i = \pi_i \circ f,$$
where $\pi_i : \mathbb{R}^n \to \mathbb{R}$ is the projection that sends a vector $x = (x_1, \dots, x_n)$ to $x_i$. Notice that we can write $\pi_i$ as $\pi_i(x) = A \cdot x$, where $A$ is the $1 \times n$ matrix
$$A = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$$
in which the number $1$ appears in the $i$th place. Then $\pi_i$ is differentiable and $D\pi_i(x) = A$ for all $x \in \mathbb{R}^n$ (see Example 1.29). By the Chain Rule, $f_i$ is differentiable at $a$ and
$$Df_i(a) = D(\pi_i \circ f)(a) = D\pi_i\big(f(a)\big) \cdot Df(a) = A \cdot Df(a). \tag{1.4}$$
Now suppose that each $f_i$ is differentiable at $a$. Let
$$B \coloneqq \begin{bmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{bmatrix}.$$
We show that $Df(a) = B$. Write
$$\|h\| R_a(h) = f(a + h) - f(a) - B \cdot h = \begin{bmatrix} f_1(a + h) - f_1(a) - Df_1(a) \cdot h \\ \vdots \\ f_n(a + h) - f_n(a) - Df_n(a) \cdot h \end{bmatrix} = \|h\| \begin{bmatrix} R_a(h; f_1) \\ \vdots \\ R_a(h; f_n) \end{bmatrix},$$
where $R_a(h; f_i)$ is the Taylor remainder for $f_i$. Since each $R_a(h; f_i) \to 0$ as $h \to 0$, we get $\lim_{h \to 0} R_a(h) = 0$, which proves that $Df(a) = B$.
(b) This claim follows from the previous part and Theorem 1.36. ∎
Remark 1.39 Theorem 1.38 implies that there is little loss of generality in assuming $n = 1$, i.e., that our functions are real-valued. Multidimensionality of the domain, not the range, is what distinguishes multivariable calculus from one-variable calculus.
Definition 1.40 (Jacobian Matrix) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If the partial derivatives of the component functions $f_i$ of $f$ exist at $a$, then one can form the matrix that has $D_j f_i(a)$ as its entry in row $i$ and column $j$. This matrix, denoted by $J_f(a)$, is called the Jacobian matrix of $f$. That is,
$$J_f(a) = \begin{bmatrix} D_1 f_1(a) & \cdots & D_m f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_n(a) & \cdots & D_m f_n(a) \end{bmatrix}.$$
Remark 1.41 [1] The Jacobian encapsulates all the essential information about the linear function that best approximates a differentiable function at a particular point. For this reason it is the Jacobian which is usually used in practical calculations with the derivative.
[2] If $f$ is differentiable at $a$, then $J_f(a) = Df(a)$. However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist without $f$ being differentiable at $a$ (see Exercise 1.32).
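Since the Jacobian is what one computes in practice, here is a finite-difference sketch (ours; `jacobian_fd` is a hypothetical helper, not from the notes) that approximates $J_f(a)$ entry by entry from the definition of $D_j f_i(a)$:

```python
import numpy as np

def jacobian_fd(f, a, h=1e-6):
    """Finite-difference approximation of the Jacobian J_f(a),
    with D_j f_i(a) in row i, column j."""
    a = np.asarray(a, dtype=float)
    fa = np.asarray(f(a))
    J = np.zeros((fa.size, a.size))
    for j in range(a.size):
        e = np.zeros_like(a)
        e[j] = h
        J[:, j] = (np.asarray(f(a + e)) - fa) / h
    return J

f = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])
a = np.array([2.0, 3.0])
exact = np.array([[3.0, 2.0],     # D1(x1*x2) = x2, D2(x1*x2) = x1
                  [1.0, 6.0]])    # D1(x1 + x2^2) = 1, D2 = 2*x2
assert np.allclose(jacobian_fd(f, a), exact, atol=1e-4)
```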
1.5 The Implicit Function Theorem
Let $U \subseteq \mathbb{R}^k \times \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}^n$. Fix a point $(a, b) \in U$ with $f(a, b) = 0$. Our goal is to solve the equation
$$f(x, y) = 0 \tag{1.5}$$
near $(a, b)$. More precisely, we hope to show that the set of points $(x, y)$ near $(a, b)$ at which $f(x, y) = 0$ (the level set of $f$ through $0$) is the graph of a function $y = g(x)$. If so, $g$ is the implicit function defined by (1.5). See Figure 1.7.
Under various hypotheses we will show that $g$ exists, is unique, and is differentiable. Let us first consider an example.
Example 1.42 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = x^2 + y^2 - 1.$$
Figure 1.7: Near $(a, b)$, $L_f(0)$ is the graph of a function $y = g(x)$.
If we choose $(a, b)$ with $f(a, b) = 0$ and $a \neq \pm 1$, there are (Figure 1.8) open intervals $A$ containing $a$ and $B$ containing $b$ with the following property: if $x \in A$, there is a unique $y \in B$ with $f(x, y) = 0$. We can therefore define a function $g : A \to \mathbb{R}$ by the conditions $g(x) \in B$ and $f(x, g(x)) = 0$ (if $b > 0$, as indicated in Figure 1.8, then $g(x) = \sqrt{1 - x^2}$). For the function $f$ we are considering there is another number $b_1$ such that $f(a, b_1) = 0$. There will also be an interval $B_1$ containing $b_1$ such that, when $x \in A$, we have $f(x, g_1(x)) = 0$ for a unique $g_1(x) \in B_1$ (here $g_1(x) = -\sqrt{1 - x^2}$). Both $g$ and $g_1$ are differentiable. These functions are said to be defined implicitly by the equation $f(x, y) = 0$.
If we choose $a = 1$ or $a = -1$, it is impossible to find any such function $g$ defined in an open interval containing $a$.
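The implicit function $g$ of Example 1.42 can be computed rather than written in closed form. The Newton iteration below (our sketch, not part of the notes) solves $f(x, y) = 0$ for $y$ at fixed $x$, and each step divides by $\partial f/\partial y = 2y$, precisely the quantity whose nonvanishing the Implicit Function Theorem will require:

```python
import math

def g(x, y0=0.8, iters=25):
    """Solve f(x, y) = x^2 + y^2 - 1 = 0 for y near b = 0.8 by Newton's method.
    Each step divides by df/dy = 2y, which must be nonzero."""
    y = y0
    for _ in range(iters):
        y -= (x ** 2 + y ** 2 - 1) / (2 * y)
    return y

# Near a = 0.6 the iteration recovers the explicit branch g(x) = sqrt(1 - x^2).
for x in (0.5, 0.6, 0.7):
    assert math.isclose(g(x), math.sqrt(1 - x ** 2))
```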
Theorem 1.43 (Implicit Function Theorem) Let $U \subseteq \mathbb{R}^{k+n}$ be open; let $f : U \to \mathbb{R}^n$ be of class $C^r$. Write $f$ in the form $f(x, y)$ for $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^n$. Suppose that $(a, b)$ is a point of $U$ such that $f(a, b) = 0$. Let $M$ be the $n \times n$ matrix
$$M = \begin{bmatrix} D_{k+1} f_1(a, b) & D_{k+2} f_1(a, b) & \cdots & D_{k+n} f_1(a, b) \\ D_{k+1} f_2(a, b) & D_{k+2} f_2(a, b) & \cdots & D_{k+n} f_2(a, b) \\ \vdots & \vdots & \ddots & \vdots \\ D_{k+1} f_n(a, b) & D_{k+2} f_n(a, b) & \cdots & D_{k+n} f_n(a, b) \end{bmatrix}.$$
If $\det(M) \neq 0$, then near $(a, b)$, $L_f(0)$ is the graph of a unique function $y = g(x)$. Moreover, $g$ is $C^r$.
Proof. The proof is too long to give here. You can find it in, e.g., Spivak (1965, Theorem 2-12), Rudin (1976, Theorem 9.28), Munkres (1991, Theorem 2.9.2), or Pugh (2002, Theorem 5.22). ∎
Figure 1.8: Implicit function theorem.
Example 1.44 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation
$$f(x, y) = x^2 - y^3.$$
Then $(0, 0)$ is a solution of the equation $f(x, y) = 0$. Because $\partial f(0, 0)/\partial y = 0$, we do not expect to be able to solve this equation for $y$ in terms of $x$ near $(0, 0)$. But in fact, we can; and furthermore, the solution $y = x^{2/3}$ is unique! However, this function is not differentiable at $x = 0$. See Figure 1.9.
Example 1.45 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation
$$f(x, y) = -x^4 + y^2.$$
Then $(0, 0)$ is a solution of the equation $f(x, y) = 0$. Because $\partial f(0, 0)/\partial y = 0$, we do not expect to be able to solve for $y$ in terms of $x$ near $(0, 0)$. In fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique: both $y = x^2$ and $y = -x^2$ work. See Figure 1.10.
Now the point $(1, 1)$ is also a solution to $f(x, y) = 0$. Because $\partial f(1, 1)/\partial y = 2$, one can solve this equation for $y$ as a continuous function of $x$ in a neighborhood of $x = 1$. See Figure 1.10.
Figure 1.9: $y$ is not differentiable at $x = 0$.
Figure 1.10: Example 1.45.
Remark 1.46 We will use the Implicit Function Theorem in Theorem 2.12.
The theorem will also be used to derive comparative statics for economic
models, which we perhaps do not have time to discuss.
1.6 Gradient and Its Properties
In this section we investigate the significance of the gradient vector, which is defined as follows:
Definition 1.47 (Gradient) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. Suppose $A$ contains a neighborhood of $a$. The gradient of $f$ at $a$, denoted by $\nabla f(a)$, is defined by
$$\nabla f(a) \coloneqq \begin{bmatrix} D_1 f(a) & D_2 f(a) & \cdots & D_m f(a) \end{bmatrix}.$$
Remark 1.48 [1] It follows from Theorem 1.36 that if $f$ is differentiable at $a$, then $\nabla f(a) = Df(a)$. The converse does not hold; see Remark 1.41[2].
Figure 1.11: The geometric interpretation of $\nabla f(x)$: (a) $\nabla f(1, 3) = (2, 6)$ and the tangent plane to $L_f(10)$ at $(1, 3)$; (b) the slope of $L_f(c)$ at $x$ is $-D_1 f(x)/D_2 f(x)$.
[2] With the notation of the gradient, we can write the Jacobian of $f = (f_1, \dots, f_n)$ as
$$J_f(a) = \begin{bmatrix} \nabla f_1(a) \\ \vdots \\ \nabla f_n(a) \end{bmatrix}.$$
Let $f : \mathbb{R}^m \to \mathbb{R}^n$ with $f = (f_1, \dots, f_n)$. Recall that the level set of $f$ through $0$ is given by
$$L_f(0) \coloneqq \{x \in \mathbb{R}^m : f(x) = 0\} = \bigcap_{i=1}^{n} L_{f_i}(0). \tag{1.6}$$
Given a point $a \in L_f(0)$, it is intuitively clear what it means for a plane to be tangent to $L_f(0)$ at $a$. Figure 1.12 shows some examples of tangent planes. A formal definition of the tangent plane will be given later. In this section we show that the gradient $\nabla f(a)$ is orthogonal to the tangent plane of $L_f(0)$ at $a$ and that, under some conditions, the tangent plane of $L_f(0)$ at $a$ can be characterized as the set of vectors that are orthogonal to $\nabla f(a)$. We begin with the simplest case, that of $f : \mathbb{R}^2 \to \mathbb{R}$. Here is an example.
Example 1.49 Let $f(x, y) = x^2 + y^2$. The level set $L_f(10)$ is given by
$$L_f(10) = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 10\}.$$
Calculus yields
$$\left.\frac{dy}{dx}\right|_{\text{along } L_f(10) \text{ at } (1, 3)} = -\frac{1}{3}.$$
Figure 1.12: Examples of tangent planes: (a) $f : \mathbb{R}^2 \to \mathbb{R}$; (b) $f : \mathbb{R}^3 \to \mathbb{R}$; (c) $f_1, f_2 : \mathbb{R}^3 \to \mathbb{R}$.
Hence, the tangent plane at $(1, 3)$ is given by $y = 3 - (x - 1)/3$. Since $\nabla f(1, 3) = (2, 6)$, the result follows immediately; see Figure 1.11(a).
The result in Example 1.49 can be explained as follows. If we change $x_1$ and $x_2$ and are to remain on $L_f(0)$, then $dx_1$ and $dx_2$ must be such as to leave the value of $f$ unchanged at $0$. They must therefore satisfy
$$f'(x; (dx_1, dx_2)) = D_1 f(x)\, dx_1 + D_2 f(x)\, dx_2 = 0. \tag{1.7}$$
Solving (1.7) for $dx_2/dx_1$, the slope of the level set through $x$ is (see Figure 1.11(b))
$$\frac{dx_2}{dx_1} = -\frac{D_1 f(x)}{D_2 f(x)}.$$
Since the slope of the vector $\nabla f(x) = (D_1 f(x), D_2 f(x))$ is $D_2 f(x)/D_1 f(x)$, we obtain the desired result.
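This orthogonality is easy to verify at the point of Example 1.49; the small Python check below (ours, not part of the notes) confirms that $\nabla f(1, 3) = (2, 6)$ is perpendicular to a tangent direction of slope $-1/3$:

```python
# f(x1, x2) = x1^2 + x2^2 at the point (1, 3) on L_f(10).
grad = (2.0 * 1.0, 2.0 * 3.0)        # gradient (D1 f, D2 f) = (2, 6)
tangent = (1.0, -1.0 / 3.0)          # direction with slope -D1f/D2f = -1/3

inner = grad[0] * tangent[0] + grad[1] * tangent[1]
assert abs(inner) < 1e-12            # gradient is orthogonal to the level set
```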
We now present the general result. For simplicity, we assume throughout this section that each $f_i \in C^1$. For $L_f(0)$ defined in (1.6), obtaining an explicit representation for the tangent plane is a fundamental problem that we now address. First we define curves on $L_f(0)$ and the tangent plane at some point $x \in \mathbb{R}^m$. You may want to refer to some differential geometry textbooks, e.g., O'Neill (2006), Spivak (1999), or Lee (2009), to understand some of the following concepts better.
One can picture a curve in $\mathbb{R}^m$ as a trip taken by a moving point $c$. At each "time" $t$ in some interval $[a, b] \subseteq \mathbb{R}$, $c$ is located at the point
$$c(t) = \big(c_1(t), \dots, c_m(t)\big) \in \mathbb{R}^m.$$
In rigorous terms, then, $c$ is a function from $[a, b]$ to $\mathbb{R}^m$, and the component functions $c_1, \dots, c_m$ are its Euclidean coordinate functions. We define the function $c$ to be differentiable provided its component functions are differentiable.
Example 1.50 A helix $c : \mathbb{R} \to \mathbb{R}^3$ is obtained through the formula
$$c(t) = (a \cos t, \, a \sin t, \, bt),$$
where $a, b > 0$. See Figure 1.13.
Definition 1.51 A curve on $L_f(0)$ is a continuous curve $c : [a, b] \to L_f(0)$. A curve $c(t)$ is said to pass through the point $a \in L_f(0)$ if $a = c(t^*)$ for some $t^* \in [a, b]$.
Figure 1.13: The Helix.
Definition 1.52 The tangent plane at $a \in L_f(0)$, denoted $T_f(a)$, is defined as the collection of the derivatives at $a$ of all differentiable curves on $L_f(0)$ passing through $a$.
Ideally, we would like to express the tangent plane defined in Definition 1.52 in terms of the derivatives of the functions $f_i$ that define the surface $L_f(0)$ (see Example 1.49). We introduce the subspace
$$M \coloneqq \{x \in \mathbb{R}^m : Df(a) \cdot x = 0\} \tag{1.8}$$
and investigate under what conditions $M$ is equal to the tangent plane at $a$.
The following result shows that $\nabla f_i(a)$ is orthogonal to the tangent plane $T_f(a)$ for all $a \in L_f(0)$.
Theorem 1.53 For each $a \in L_f(0)$, the gradient $\nabla f_i(a)$ is orthogonal to the tangent plane $T_f(a)$.
Proof. We establish this result by showing $T_f(a) \subseteq M$ for each $a \in L_f(0)$. Every differentiable curve $c(t)$ on $L_f(0)$ passing through $a$ at $t = t^*$ satisfies $f(c(t)) = 0$ for all $t$, and so, differentiating at $t^*$,
$$Df\big(c(t^*)\big) \cdot Dc(t^*) = 0.$$
That is, $Dc(t^*) \in M$. ∎
Definition 1.54 A point $a \in L_f(0)$ is said to be a regular point if the gradient vectors $\nabla f_1(a), \dots, \nabla f_n(a)$ are linearly independent.
In general, at regular points it is possible to characterize the tangent plane in terms of $\nabla f_1(a), \dots, \nabla f_n(a)$.
Theorem 1.55 At a regular point $a$ of $L_f(0)$, the tangent plane $T_f(a)$ is equal to $M$.
Proof. We show that $M \subseteq T_f(a)$. Combining this result with Theorem 1.53, we have $T_f(a) = M$.
To show $M \subseteq T_f(a)$, we must show that if $x \in M$ then there exists a curve on $L_f(0)$ passing through $a$ with derivative $x$. To construct such a curve we consider the equations
$$f\big(a + tx + Df(a)^T \cdot u(t)\big) = 0, \tag{1.9}$$
where for fixed $t$ we consider $u(t) \in \mathbb{R}^n$ to be the unknown. This is a nonlinear system of $n$ equations in $n$ unknowns, parametrized continuously by $t$. At $t = 0$ there is a solution $u(0) = 0$. The Jacobian matrix of the system with respect to $u$ at $t = 0$ is the $n \times n$ matrix
$$Df(a) \cdot Df(a)^T,$$
which is nonsingular, since $Df(a)$ is of full rank if $a$ is a regular point. Thus, by the Implicit Function Theorem (Theorem 1.43) there is a continuously differentiable solution $u(t)$ on some interval $t \in [-\delta, \delta]$ with $\delta > 0$.
The curve
$$c(t) \coloneqq a + tx + Df(a)^T \cdot u(t)$$
is thus a curve on $L_f(0)$. Differentiating the system (1.9) with respect to $t$ at $t = 0$, we obtain
$$Df(a) \cdot \big[x + Df(a)^T \cdot Du(0)\big] = 0. \tag{1.10}$$
By definition of $x$ we have $Df(a) \cdot x = 0$ and thus, again since $Df(a) \cdot Df(a)^T$ is nonsingular, we conclude from (1.10) that
$$Du(0) = \big[Df(a) \cdot Df(a)^T\big]^{-1} \cdot 0 = 0.$$
Therefore,
$$Dc(0) = x + Df(a)^T \cdot Du(0) = x,$$
and the constructed curve has derivative $x$ at $a$. ∎
Example 1.56 [1] In $\mathbb{R}^2$ let $f(x_1, x_2) = x_1$. Then $L_f(0)$ is the $x_2$-axis, and every point on that axis is regular since $\nabla f(0, x_2) = (1, 0)$. In this case, $T_f((0, x_2)) = M$, which is the $x_2$-axis.
[2] In $\mathbb{R}^2$ let $f(x_1, x_2) = x_1^2$. Again, $L_f(0)$ is the $x_2$-axis, but now no point on the $x_2$-axis is regular: $\nabla f(0, x_2) = (0, 0)$. Indeed, in this case $M = \mathbb{R}^2$, while the tangent plane is the $x_2$-axis.
We close this section with another property of gradient vectors. Let $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable and let $a \in \mathbb{R}^n$ with $\nabla f(a) \ne 0$. Suppose that we want to determine the direction in which $f$ increases most rapidly at $a$. By a "direction" here we mean a unit vector $u$. Let $\theta_u$ denote the angle between $u$ and $\nabla f(a)$. Then
$$f'(a; u) = \nabla f(a) \cdot u = \lVert \nabla f(a) \rVert \cos \theta_u.$$
But $\cos \theta_u$ attains its maximum value of $1$ when $\theta_u = 0$, that is, when $u$ and $\nabla f(a)$ are collinear and point in the same direction. We conclude that $\lVert \nabla f(a) \rVert$ is the maximum value of $f'(a; u)$ over unit vectors $u$, and that this maximum is attained at $u = \nabla f(a) / \lVert \nabla f(a) \rVert$.
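The claim above can be sanity-checked numerically. The sketch below (the function $f(x, y) = x^2 + 3xy$ and the point $a = (1, 2)$ are illustrative choices, not from the text) scans directional derivatives over unit vectors and recovers $\lVert\nabla f(a)\rVert$ as the maximum, attained along $\nabla f(a)$:

```python
import math

# Illustrative check: for the hypothetical function f(x, y) = x**2 + 3*x*y
# at a = (1, 2), the directional derivative f'(a; u) over unit vectors u
# is maximized at u = grad f(a)/||grad f(a)||, with value ||grad f(a)||.

def f(x, y):
    return x**2 + 3.0 * x * y

def grad_f(x, y):
    return (2.0 * x + 3.0 * y, 3.0 * x)   # exact gradient of f

def directional_derivative(a, u, h=1e-6):
    # central-difference approximation of f'(a; u)
    return (f(a[0] + h * u[0], a[1] + h * u[1])
            - f(a[0] - h * u[0], a[1] - h * u[1])) / (2.0 * h)

a = (1.0, 2.0)
g = grad_f(*a)
norm_g = math.hypot(g[0], g[1])

# Scan unit vectors u = (cos t, sin t) and record the best direction.
best_val, best_u = -float("inf"), None
for k in range(3600):
    t = 2.0 * math.pi * k / 3600.0
    u = (math.cos(t), math.sin(t))
    val = directional_derivative(a, u)
    if val > best_val:
        best_val, best_u = val, u
```

The scan over 3600 angles approximates the maximization over the unit circle; the best value matches $\lVert\nabla f(a)\rVert$ to grid accuracy.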
1.7 Continuously Differentiable Functions
We know that mere existence of the partial derivatives does not imply differentiability (see Exercise 1.32). If, however, we impose the additional condition that these partial derivatives are continuous, then differentiability is assured.
Theorem 1.57 Let $A$ be open in $\mathbb{R}^m$. Suppose that the partial derivatives $D_j f_i(x)$ of the component functions of $f$ exist at each point $x \in A$ and are continuous on $A$. Then $f$ is differentiable at each point of $A$.

A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class $C^1$, on $A$.

Proof of Theorem 1.57. By Theorem 1.38, it suffices to show that each component function of $f$ is differentiable. Therefore we may restrict ourselves to the case of a real-valued function $f: A \to \mathbb{R}$. Let $a \in A$. We claim that $Df(a) = \nabla f(a)$ when $f \in C^1$.
Recall that $(e_1, \dots, e_m)$ is the standard basis of $\mathbb{R}^m$, so every $h = (h_1, \dots, h_m) \in \mathbb{R}^m$ can be written as $h = \sum_{i=1}^{m} h_i e_i$. For each $i = 1, \dots, m$, let
$$p_i := a + \sum_{k=1}^{i} h_k e_k = p_{i-1} + h_i e_i,$$
where $p_0 := a$. Figure 1.14 illustrates the case $m = 3$ with all $h_i$ positive. For each $i = 1, \dots, m$, we also define a function $\gamma_i: [0, 1] \to \mathbb{R}^m$ by
Figure 1.14: The segmented path from $a = p_0$ to $a + h$.
letting
$$\gamma_i(t) = p_{i-1} + t h_i e_i.$$
So $\gamma_i$ is a segment from $p_{i-1}$ to $p_i$.

By the one-dimensional chain rule and the mean value theorem applied to the differentiable real-valued function $g: [0, 1] \to \mathbb{R}$ defined by
$$g(t) = (f \circ \gamma_i)(t),$$
there exists $\bar t_i \in (0, 1)$ such that
$$f(p_i) - f(p_{i-1}) = g(1) - g(0) = g'(\bar t_i) = D_i f\bigl(\gamma_i(\bar t_i)\bigr)\, h_i.$$
Telescoping $f(a + h) - f(a)$ along $(\gamma_1, \dots, \gamma_m)$ gives
$$f(a + h) - f(a) - \nabla f(a) \cdot h = \sum_{i=1}^{m} \bigl[f(p_i) - f(p_{i-1})\bigr] - \nabla f(a) \cdot h = \sum_{i=1}^{m} \bigl[D_i f\bigl(\gamma_i(\bar t_i)\bigr) - D_i f(a)\bigr] h_i.$$
Continuity of the partials implies that $D_i f(\gamma_i(\bar t_i)) - D_i f(a) \to 0$ as $\lVert h \rVert \to 0$. ∎

Remark 1.58 It follows from Theorem 1.57 that $\sin(xy)$ and $xy^2 + z e^{xy}$ are both differentiable, since they are of class $C^1$.
Let $A \subseteq \mathbb{R}^m$ and $f: A \to \mathbb{R}^n$. Suppose that the partial derivatives $D_j f_i$ of the component functions of $f$ exist on $A$. These are then functions from $A$ to $\mathbb{R}$, and we may consider their partial derivatives, which have the form
$$D_k(D_j f_i) =: D_{jk} f_i$$
and are called the second-order partial derivatives of $f$. Similarly, one defines the third-order partial derivatives of the functions $f_i$, or more generally the partial derivatives of order $r$ for arbitrary $r$.

Definition 1.59 If the partial derivatives of the functions $f_i$ of order less than or equal to $r$ are continuous on $A$, we say $f$ is of class $C^r$ on $A$. We say $f$ is of class $C^\infty$ on $A$ if the partials of the functions $f_i$ of all orders are continuous on $A$.
Definition 1.60 (Hessian) Let $a \in A \subseteq \mathbb{R}^m$, and let $f: A \to \mathbb{R}$ be twice differentiable at $a$. The $m \times m$ matrix representing the second derivative of $f$ is called the Hessian of $f$, denoted $H_f(a)$:
$$H_f(a) = \begin{bmatrix} D_{11}f(a) & D_{12}f(a) & \cdots & D_{1m}f(a) \\ D_{21}f(a) & D_{22}f(a) & \cdots & D_{2m}f(a) \\ \vdots & \vdots & \ddots & \vdots \\ D_{m1}f(a) & D_{m2}f(a) & \cdots & D_{mm}f(a) \end{bmatrix} = D(\nabla f).$$

Remark 1.61 If $f: A \to \mathbb{R}$ is of class $C^2$, then the Hessian of $f$ is a symmetric matrix, i.e., $D_{ij}f(a) = D_{ji}f(a)$ for all $i, j = 1, \dots, m$ and for all $a \in A$. See Rudin (1976, Corollary to Theorem 9.41, p. 236).
? Exercise 1.62 Find the Hessian of the Cobb-Douglas function
$$f(x, y) = x^\alpha y^\beta.$$
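One way to sanity-check an answer to this exercise is numerically. The sketch below (the parameter values $\alpha = 0.3$, $\beta = 0.6$ and the evaluation point are hypothetical choices) approximates the Hessian by finite differences on $x, y > 0$ and compares the entries with the closed-form second-order partials; symmetry is guaranteed by Remark 1.61 since $f$ is $C^2$ there:

```python
# Finite-difference cross-check of the Cobb-Douglas Hessian (illustrative
# parameters, not prescribed by the exercise).
alpha, beta = 0.3, 0.6

def f(x, y):
    return x**alpha * y**beta

def hessian(x, y, h=1e-4):
    # second-order central differences for D11, D22, and the mixed partial D12
    d11 = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    d22 = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    d12 = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return [[d11, d12], [d12, d22]]

x0, y0 = 1.5, 2.0
H = hessian(x0, y0)
exact_d12 = alpha * beta * x0**(alpha - 1) * y0**(beta - 1)
```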
1.8 Quadratic Forms: Definite and Semidefinite Matrices
Definition 1.63 (Quadratic Form) Let $A$ be a symmetric $n \times n$ matrix. A quadratic form on $\mathbb{R}^n$ is a function $Q_A: \mathbb{R}^n \to \mathbb{R}$ of the form
$$Q_A(x) = x \cdot Ax = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j.$$
Since the quadratic form $Q_A$ is completely specified by the matrix $A$, we henceforth refer to $A$ itself as the quadratic form. Observe that if $f$ is of class $C^2$, then the Hessian $H_f$ of $f$ defines a quadratic form; see Remark 1.61.

Definition 1.64 A quadratic form $A$ is said to be

- positive definite if $x \cdot Ax > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$;
- positive semidefinite if $x \cdot Ax \ge 0$ for all $x \in \mathbb{R}^n$;
- negative definite if $x \cdot Ax < 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$;
- negative semidefinite if $x \cdot Ax \le 0$ for all $x \in \mathbb{R}^n$.
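These sign conditions are easy to explore in code. The sketch below (both matrices are hypothetical examples) compares the sign of $Q_A(x)$ on sampled nonzero vectors with the closed-form eigenvalues of a symmetric $2 \times 2$ matrix; a symmetric matrix is positive (semi)definite exactly when all its eigenvalues are positive (nonnegative):

```python
import math, random

# Illustrative check for symmetric 2x2 matrices: the sign of Q_A(x) = x . Ax
# on sampled nonzero x agrees with the signs of the eigenvalues, which for
# A = [[a, b], [b, c]] are (a + c)/2 -/+ sqrt(((a - c)/2)**2 + b**2).

def quad_form(A, x):
    (a, b), (_, c) = A
    return a * x[0]**2 + 2.0 * b * x[0] * x[1] + c * x[1]**2

def eigenvalues_2x2(A):
    (a, b), (_, c) = A
    mean = (a + c) / 2.0
    dev = math.sqrt(((a - c) / 2.0)**2 + b**2)
    return mean - dev, mean + dev

random.seed(0)
A_pd = [[2.0, 1.0], [1.0, 2.0]]    # eigenvalues 1 and 3: positive definite
A_ind = [[1.0, 0.0], [0.0, -1.0]]  # eigenvalues -1 and 1: indefinite

samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
samples = [x for x in samples if abs(x[0]) + abs(x[1]) > 1e-9]

pd_ok = all(quad_form(A_pd, x) > 0 for x in samples)
ind_mixed = (any(quad_form(A_ind, x) > 0 for x in samples)
             and any(quad_form(A_ind, x) < 0 for x in samples))
```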
1.9 Homogeneous Functions and Euler’s Formula
Definition 1.65 (Homogeneous Function) A function $f: \mathbb{R}^n \to \mathbb{R}$ is homogeneous of degree $r$ (for $r = \dots, -1, 0, 1, \dots$) if for every $t > 0$ we have
$$f(t x_1, \dots, t x_n) = t^r f(x_1, \dots, x_n).$$

? Exercise 1.66 The function
$$f(x, y) = A x^\alpha y^\beta, \qquad A, \alpha, \beta > 0,$$
is known as the Cobb-Douglas function. Check whether this function is homogeneous.
Theorem 1.67 (Euler's Formula) Suppose that $f: \mathbb{R}^n \to \mathbb{R}$ is homogeneous of degree $r$ (for some $r = \dots, -1, 0, 1, \dots$) and differentiable. Then at any $x^* \in \mathbb{R}^n$ we have
$$\nabla f(x^*) \cdot x^* = r f(x^*).$$
Proof. By definition we have
$$f(t x^*) - t^r f(x^*) = 0 \quad \text{for all } t > 0.$$
Differentiating with respect to $t$ using the chain rule, we obtain
$$\nabla f(t x^*) \cdot x^* = r t^{r-1} f(x^*).$$
Evaluating at $t = 1$ gives the desired result. ∎

Lemma 1.68 If $f$ is homogeneous of degree $r$, then its partial derivatives are homogeneous of degree $r - 1$.
? Exercise 1.69 Prove Lemma 1.68.

? Exercise 1.70 Let $f(x, y) = A x^\alpha y^\beta$ with $\alpha + \beta = 1$ and $A > 0$. Show that Theorem 1.67 and Lemma 1.68 hold for this function.
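The kind of verification Exercise 1.70 asks for by hand can also be checked in code. The sketch below (with arbitrary illustrative parameters, and without the restriction $\alpha + \beta = 1$) verifies Euler's formula and Lemma 1.68 for a Cobb-Douglas function:

```python
# Numerical illustration of Euler's formula for f(x, y) = A * x**a * y**b,
# which is homogeneous of degree r = a + b; Lemma 1.68 is checked for the
# partial derivative with respect to x. Parameter values are illustrative.
A, a, b = 2.0, 0.3, 0.5

def f(x, y):
    return A * x**a * y**b

def grad_f(x, y):
    # exact partial derivatives of f
    return (A * a * x**(a - 1) * y**b, A * b * x**a * y**(b - 1))

x0, y0, r = 1.7, 0.9, a + b
gx, gy = grad_f(x0, y0)
lhs = gx * x0 + gy * y0        # grad f(x*) . x*
rhs = r * f(x0, y0)            # r f(x*)

t = 3.0
scaled_gx = grad_f(t * x0, t * y0)[0]   # should equal t**(r-1) * gx
```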
2
OPTIMIZATION IN $\mathbb{R}^n$
This chapter is based on Luenberger (1969, Chapters 8 & 9), Mas-Colell,
Whinston and Green (1995, Sections M.J & M.K), Sundaram (1996), Vohra
(2005), Luenberger and Ye (2008, Chapter 11), Duggan (2010), and Jehle
and Reny (2011, Chapter A2).
2.1 Introduction
An optimization problem in $\mathbb{R}^n$, or simply an optimization problem, is one in which the values of a given function $f: \mathbb{R}^n \to \mathbb{R}$ are to be maximized or minimized over a given set $X \subseteq \mathbb{R}^n$. The function $f$ is called the objective function and the set $X$ the constraint set. Notationally, we will represent these problems by

Maximize $f(x)$ subject to $x \in X$,

and

Minimize $f(x)$ subject to $x \in X$,

respectively. More compactly, we shall also write
$$\max\{f(x) : x \in X\} \quad \text{and} \quad \min\{f(x) : x \in X\}.$$
Example 2.1 [a] Let $X = [0, \infty)$ and $f(x) = x$. Then the problem $\max\{f(x) : x \in X\}$ has no solution; see Figure 2.1(a).

[b] Let $X = [0, 1]$ and $f(x) = x(1 - x)$. Then the problem $\max\{f(x) : x \in X\}$ has exactly one solution, namely $x = 1/2$; see Figure 2.1(b).
Figure 2.1: Example 2.1: (a) no solution; (b) exactly one solution; (c) two solutions.
[c] Let $X = [-1, 1]$ and $f(x) = x^2$. Then the problem $\max\{f(x) : x \in X\}$ has two solutions, namely $x = -1$ and $x = 1$; see Figure 2.1(c).

Example 2.1 suggests that we should speak of the set of solutions of an optimization problem, denoted
$$\arg\max\{f(x) : x \in X\} = \{x \in X : f(x) \ge f(y) \text{ for all } y \in X\}$$
and
$$\arg\min\{f(x) : x \in X\} = \{x \in X : f(x) \le f(y) \text{ for all } y \in X\}.$$
We close this section by considering an optimization problem from economics.

Example 2.2 There are $n$ commodities in an economy. A consumer's utility from consuming $x_i \ge 0$ units of commodity $i$ ($i = 1, \dots, n$) is given by $u(x_1, \dots, x_n)$, where $u: \mathbb{R}^n_+ \to \mathbb{R}$ is the consumer's utility function. The consumer's income is $I > 0$, and he faces the price vector $p = (p_1, \dots, p_n)$. His budget set is given by (see Figure 2.2)
$$B(p, I) := \{x \in \mathbb{R}^n_+ : p \cdot x \le I\}.$$
The consumer's objective is to maximize his utility over the budget set, i.e.,

Maximize $u(x)$ subject to $x \in B(p, I)$.

Here we introduce an important fact that we will use frequently.
Figure 2.2: The budget set $B(p, I)$ in $\mathbb{R}^2_+$, with intercepts $I/p_1$ and $I/p_2$.
Lemma 2.3 Let $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable. Then for all $h \in \mathbb{R}^n$ satisfying $h \cdot \nabla f(x) > 0$, we have
$$f(x + \varepsilon h) > f(x) \quad \text{for all } \varepsilon > 0 \text{ sufficiently small.}$$
Proof. We approximate $f$ by its Taylor expansion (see Definition 1.27):
$$f(x + \varepsilon h) = f(x) + \varepsilon h \cdot \nabla f(x) + \lVert \varepsilon h \rVert R_x(\varepsilon h),$$
where $\lim_{\varepsilon \to 0} R_x(\varepsilon h) = 0$. Then
$$\lim_{\varepsilon \downarrow 0} \frac{f(x + \varepsilon h) - f(x)}{\varepsilon} = h \cdot \nabla f(x) + \lVert h \rVert \lim_{\varepsilon \downarrow 0} R_x(\varepsilon h) = h \cdot \nabla f(x) > 0;$$
therefore, there is $\bar\varepsilon > 0$ such that for all $\varepsilon \in (0, \bar\varepsilon)$,
$$\frac{f(x + \varepsilon h) - f(x)}{\varepsilon} > 0,$$
i.e., $f(x + \varepsilon h) - f(x) > 0$ for all $\varepsilon \in (0, \bar\varepsilon)$. ∎
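A minimal numerical illustration of Lemma 2.3, with a hypothetical function and point chosen for concreteness:

```python
# Lemma 2.3 in action: with h = grad f(x), we have
# h . grad f(x) = ||grad f(x)||**2 > 0, so small steps along h increase f.
# The function f(x, y) = -x**2 - y**2 and the point (1, 1) are illustrative.

def f(x, y):
    return -(x**2) - y**2

def grad_f(x, y):
    return (-2.0 * x, -2.0 * y)

x0 = (1.0, 1.0)
h = grad_f(*x0)                 # h . grad f(x0) = 8 > 0
increased = all(
    f(x0[0] + eps * h[0], x0[1] + eps * h[1]) > f(*x0)
    for eps in (1e-1, 1e-2, 1e-3)
)
```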
2.2 Unconstrained Optimization
Definition 2.4 (Maximum) Given $X \subseteq \mathbb{R}^n$, $f: X \to \mathbb{R}$ and $x \in X$, we say $x$ is a maximum of $f$ if
$$f(x) = \max\{f(y) : y \in X\}.$$
We say $x$ is a local maximum of $f$ if there is some $\varepsilon > 0$ such that for all $y \in X \cap B(x; \varepsilon)$ we have $f(x) \ge f(y)$. And $x$ is a strict local maximum of $f$ if the latter inequality is strict for all $y \ne x$.
2.2.1 First-Order Necessary Conditions

Recall that $X^\circ$ is the interior of $X \subseteq \mathbb{R}^n$ (Definition 1.13), and $f'(x; u)$ is the directional derivative of $f$ at $x$ with respect to $u$ (Definition 1.21).

Theorem 2.5 Let $X \subseteq \mathbb{R}^n$ and $x \in X^\circ$; let $f: X \to \mathbb{R}$ be differentiable at $x$. If $x$ is a local maximum of $f$, then $\nabla f(x) = 0$.

Proof. Suppose that $\nabla f(x) \ne 0$, and let $h = \nabla f(x)$. Then $h \cdot \nabla f(x) > 0$, so $f(x + \varepsilon h) > f(x)$ for all $\varepsilon > 0$ sufficiently small, by Lemma 2.3 (and $x + \varepsilon h \in X$ for small $\varepsilon$, since $x \in X^\circ$). This contradicts local optimality of $x$. ∎
Definition 2.6 (Critical Point) A vector $x \in \mathbb{R}^n$ such that $\nabla f(x) = 0$ is called a critical point.

Example 2.7 Let $X = \mathbb{R}^2$ and $f(x, y) = xy - 2x^4 - y^2$. The first-order condition is
$$\nabla f(x, y) = (y - 8x^3,\; x - 2y) = (0, 0).$$
Thus, the critical points are $(x, y) = (0, 0)$, $(1/4, 1/8)$, and $(-1/4, -1/8)$.
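These critical points can be verified directly, since the gradient is available in closed form:

```python
# Verifying the critical points of Example 2.7 for
# f(x, y) = x*y - 2*x**4 - y**2, whose gradient is (y - 8*x**3, x - 2*y).

def grad_f(x, y):
    return (y - 8.0 * x**3, x - 2.0 * y)

critical_points = [(0.0, 0.0), (0.25, 0.125), (-0.25, -0.125)]
residuals = [grad_f(x, y) for (x, y) in critical_points]
```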
2.2.2 Second-Order Sufficient Conditions

The first-order conditions for unconstrained local optima do not distinguish between maxima and minima (see Example 2.9 below). To obtain such a distinction in the behavior of $f$ at an optimum, we need to examine the behavior of the Hessian $H_f$ of $f$ (see Definition 1.60).

Theorem 2.8 Suppose $f$ is of class $C^2$ on $X \subseteq \mathbb{R}^n$, and $x \in X^\circ$.

(a) If $f$ has a local maximum at $x$, then $H_f(x)$ is negative semidefinite.
(b) If $f$ has a local minimum at $x$, then $H_f(x)$ is positive semidefinite.
(c) If $\nabla f(x) = 0$ and $H_f(x)$ is negative definite, then $x$ is a strict local maximum of $f$ on $X$.
(d) If $\nabla f(x) = 0$ and $H_f(x)$ is positive definite, then $x$ is a strict local minimum of $f$ on $X$.

Proof. See Sundaram (1996, Section 4.6). ∎
Example 2.9 Let $f: \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = 2x^3 - 3x^2$. It is easy to check that $f \in C^2$ on $\mathbb{R}$ and that there are two critical points: $x = 0$ and $x = 1$. Invoking the second-order conditions, we get $f''(0) = -6$ and $f''(1) = 6$.
Figure 2.3: $x = 0$ and $x = 1$ are local optima but not global optima.
Thus, the point $x = 0$ is a strict local maximum of $f$ on $\mathbb{R}$, and the point $x = 1$ is a strict local minimum of $f$ on $\mathbb{R}$; see Figure 2.3.

However, nothing in the first- or second-order conditions helps determine whether these points are global optima. In fact, they are not: global optima do not exist in this example, since $\lim_{x \to +\infty} f(x) = +\infty$ and $\lim_{x \to -\infty} f(x) = -\infty$.
2.3 Equality Constrained Optimization: Lagrange's Method
In this section we consider problems with equality constraints of the form
$$\max f(x) \quad \text{s.t.} \quad g_i(x) = 0, \; i = 1, \dots, k \le n. \tag{ECP}$$
We assume that $f: \mathbb{R}^n \to \mathbb{R}$ and $g_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, k$, are continuously differentiable functions. For notational convenience, we introduce the constraint function $g: \mathbb{R}^n \to \mathbb{R}^k$, where
$$g = (g_1, \dots, g_k).$$
We can then write the constraints in the more compact form $g(x) = 0$. Also recall that a point $x^*$ such that $g_i(x^*) = 0$ for all $i = 1, \dots, k$ is called a regular point if the gradient vectors
$$\nabla g_1(x^*), \dots, \nabla g_k(x^*)$$
are linearly independent (see Definition 1.54).
2.3.1 First-Order Necessary Conditions
Since the representation of the tangent plane is known from Theorem 1.55, it is fairly simple to derive necessary conditions for a point to be a local extremum subject to equality constraints.

Lemma 2.10 Let $x^*$ be a regular point of the constraints $g(x) = 0$ and a local maximum of $f$ subject to these constraints. Then every $x \in \mathbb{R}^n$ satisfying
$$Dg(x^*) \cdot x = 0 \tag{2.1}$$
must also satisfy
$$\nabla f(x^*) \cdot x = 0. \tag{2.2}$$
Proof. Take an arbitrary point $x \in M := \{y \in \mathbb{R}^n : Dg(x^*) \cdot y = 0\}$. Since $x^*$ is regular, it follows from Theorem 1.55 that the tangent plane $T_g(x^*)$ coincides with $M$. Therefore, there exists a differentiable curve $c: [a, b] \to \mathbb{R}^n$ on $L_g(0)$ passing through $x^*$ with $c(t^*) = x^*$ and $Dc(t^*) = x$ for some $t^* \in [a, b]$.

Since $x^*$ is a constrained local maximum of $f$, the function $t \mapsto f(c(t))$ has a local maximum at $t^*$, so
$$\frac{d f(c(t))}{dt}\bigg|_{t = t^*} = 0,$$
or equivalently, $\nabla f(x^*) \cdot x = 0$. ∎
Lemma 2.10 says that $\nabla f(x^*)$ is orthogonal to the tangent plane $T_g(x^*)$. Here is an example.

Example 2.11 Consider the following problem:
$$\max f(x_1, x_2) = -x_1/2 + x_2/2 \quad \text{s.t.} \quad g(x_1, x_2) = x_1^2 + x_2^2 - 2 = 0.$$
At the local maximum $x^* = (-1, 1)$, the gradient $\nabla f(x^*)$ is orthogonal to the tangent plane of the constraint surface. See Figure 2.4.

We now show that Lemma 2.10 implies that $\nabla f(x^*)$ is a linear combination of the gradients of $g$ at $x^*$.
Figure 2.4: Example 2.11: at $x^* = (-1, 1)$, $\nabla f(x^*) = (-1/2, 1/2)$ is orthogonal to the tangent plane of $L_g(0)$.
Theorem 2.12 (Lagrange's Theorem) Let $x^*$ be a local maximum of $f$ subject to $g(x) = 0$, and assume that $x^*$ is a regular point of these constraints. Then there exists a unique vector $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ such that
$$\nabla f(x^*) = \sum_{i=1}^{k} \lambda_i^* \nabla g_i(x^*).$$
Proof. Since $x^*$ is a regular point of the constraints, the vectors $\nabla g_1(x^*), \dots, \nabla g_k(x^*)$ are linearly independent. If $k = n$ they form a basis of $\mathbb{R}^n$, and in this case the theorem obviously holds. So we assume $k < n$.

Let $U$ be the subspace spanned by $\nabla g_1(x^*), \dots, \nabla g_k(x^*)$. Then $\mathbb{R}^n$ can be represented as the direct sum of $U$ and its orthogonal complement $U^\perp$, i.e., $\mathbb{R}^n = U \oplus U^\perp$. It follows from Theorem 1.55 that $U^\perp = M$. Therefore $\dim(\mathbb{R}^n) = \dim(U) + \dim(M)$, or equivalently,
$$\dim(M) = n - k. \tag{2.3}$$
Let $V$ be the subspace spanned by $\nabla g_1(x^*), \dots, \nabla g_k(x^*), \nabla f(x^*)$. Suppose there do not exist $\lambda_1, \dots, \lambda_k$ such that $\nabla f(x^*) = \sum_{i=1}^{k} \lambda_i \nabla g_i(x^*)$. Then $\nabla g_1(x^*), \dots, \nabla g_k(x^*), \nabla f(x^*)$ are linearly independent, so $\dim(V) = k + 1$. However, Lemma 2.10 says that $M \subseteq V^\perp$, and hence
$$\dim(M) \le \dim(V^\perp) = n - \dim(V) = n - (k + 1), \tag{2.4}$$
which contradicts (2.3). We thus conclude that $\nabla f(x^*) \in U$, i.e., there exists a unique vector $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ such that
$$\nabla f(x^*) = \sum_{i=1}^{k} \lambda_i^* \nabla g_i(x^*). \qquad \blacksquare$$
Example 2.13 Consider the problem
$$\max x_1 x_2 + x_2 x_3 + x_1 x_3 \quad \text{s.t.} \quad x_1 + x_2 + x_3 = 3.$$
The necessary conditions become
$$x_2 + x_3 = \lambda, \qquad x_1 + x_3 = \lambda, \qquad x_1 + x_2 = \lambda.$$
These three equations together with the one constraint equation give four equations that can be solved for the four unknowns $x_1, x_2, x_3, \lambda$. Solving yields $x_1 = x_2 = x_3 = 1$, $\lambda = 2$.
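The hand computation can be mirrored in code: summing the three stationarity conditions gives $2(x_1 + x_2 + x_3) = 3\lambda$, which together with the constraint pins down $\lambda$ and then each $x_i$:

```python
# Reproducing the solution of Example 2.13 by mirroring the hand derivation.
total = 3.0                      # constraint: x1 + x2 + x3 = 3
lam = 2.0 * total / 3.0          # from 2*(x1 + x2 + x3) = 3*lam
x1 = x2 = x3 = total - lam       # each condition reads total - x_i = lam

checks = (abs(x2 + x3 - lam) < 1e-12,
          abs(x1 + x3 - lam) < 1e-12,
          abs(x1 + x2 - lam) < 1e-12,
          abs(x1 + x2 + x3 - total) < 1e-12)
```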
Regularity If $x^*$ is not a regular point, then Theorem 2.12 can fail. Consider the following example.

Example 2.14 Let $f(x) = (x + 1)^2$ and $g(x) = x^2$. Consider the problem of maximizing $f$ subject to $g(x) = 0$. The only feasible point, and hence the maximum, is $x = 0$. But $Dg(0) = 0$ and $Df(0) = 2$, so there is no $\lambda$ such that $Df(0) = \lambda Dg(0)$.
The Lagrange Multipliers The vector $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ in Theorem 2.12 is called the vector of Lagrange multipliers corresponding to the local optimum $x^*$. The $i$-th multiplier $\lambda_i^*$ measures the sensitivity of the value of the objective function at $x^*$ to a small relaxation of the $i$-th constraint $g_i$.

To clarify the notion of "relaxation of a constraint," let us consider the following one-constraint case:
$$\max f(x) \quad \text{s.t.} \quad \tilde g(x) - c = 0,$$
where $c \in \mathbb{R}$. A relaxation of the constraint may then be thought of as an increase in $c$.
Suppose that for each $c \in \mathbb{R}$ there exists a global maximum $x^*(c)$. Suppose further that $x^*(c)$ is a regular point for each $c$, so there exists $\lambda^*(c) \in \mathbb{R}$ such that
$$\nabla f\bigl(x^*(c)\bigr) = \lambda^*(c)\, \nabla \tilde g\bigl(x^*(c)\bigr).$$
Suppose further that $x^*(c)$ is differentiable with respect to $c$. Then
$$\frac{d f\bigl(x^*(c)\bigr)}{dc} = \nabla f\bigl(x^*(c)\bigr) \cdot Dx^*(c) = \lambda^*(c)\Bigl[\nabla \tilde g\bigl(x^*(c)\bigr) \cdot Dx^*(c)\Bigr].$$
Since $\tilde g(x^*(c)) - c = 0$ for all $c$, we also have $\nabla \tilde g\bigl(x^*(c)\bigr) \cdot Dx^*(c) = 1$. Therefore,
$$\frac{d f\bigl(x^*(c)\bigr)}{dc} = \lambda^*(c).$$
In sum, the Lagrange multiplier $\lambda^*(c)$ tells us that a small relaxation of the constraint raises the maximized value of the objective function at rate $\lambda^*(c)$. For this reason, $\lambda^*(c)$ is also called the shadow price of the constraint.
Lagrange’s Method We now describe a procedure for using Theorem 2.12
to solve (ECP).
Step 1 Set up a function L W Rn � Rk ! R, called the Lagrangian, defined
by
L.x;�/ D f .x/ �
kXiD1
�igi .x/:
The vector � D .�1; : : : ; �k/ 2 Rk is called the vector of Lagrange multi-
pliers.
Step 2 Find all critical points of L.x;�/:
@L
@xi.x;�/ D Dif .x/ �
kX`D1
�` Dig`.x/ D 0; i D 1; : : : ; n (2.5)
@L
@�j.x;�/ D gj .x/ D 0; j D 1; : : : ; k: (2.6)
Define
Y ´n.x;�/ 2 RnCk W .x;�/ satisfies (2.5) and (2.6)
o:
Step 3 Evaluate $f$ at each point $x$ in the set
$$\{x \in \mathbb{R}^n : \exists\, \lambda \text{ such that } (x, \lambda) \in Y\}.$$
Thus, we see that Lagrange's method is a clever way of converting a constrained maximization problem into an unconstrained one, by increasing the number of variables.

Why does Lagrange's method typically succeed in identifying the desired optima? Because, when the regularity condition is met, the set of critical points of $L$ contains all local maxima and minima of (ECP): if $x^*$ is a local maximum or minimum of $f$ subject to $g(x^*) = 0$, and $x^*$ is regular, then there exists $\lambda^*$ such that $(x^*, \lambda^*)$ is a critical point of $L$.

We are not going to explain why Lagrange's method can fail (but see Sundaram 1996, Section 5.4 for details).
Example 2.15 Consider the problem
$$\max -x^2 - y^2 \quad \text{s.t.} \quad x + y - 1 = 0.$$
First, form the Lagrangian,
$$L(x, y, \lambda) = -x^2 - y^2 - \lambda(x + y - 1).$$
Then set all of its first-order partials equal to zero:
$$\frac{\partial L}{\partial x} = -2x - \lambda = 0, \qquad \frac{\partial L}{\partial y} = -2y - \lambda = 0, \qquad x + y - 1 = 0.$$
So the unique critical point of $L$ is
$$(x^*, y^*, \lambda^*) = (1/2,\, 1/2,\, -1).$$
Hence $f(x^*, y^*) = -1/2$; see Figure 2.5.
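As a brute-force cross-check of this example, maximizing the objective over a fine grid of points on the constraint line recovers the same solution:

```python
# Brute-force check of Example 2.15: maximize -x**2 - y**2 over grid points
# on the constraint line x + y = 1; the best grid point matches the critical
# point (1/2, 1/2) with value -1/2 found by Lagrange's method.

def f(x, y):
    return -(x**2) - y**2

grid = [i / 1000.0 - 2.0 for i in range(4001)]        # x values in [-2, 2]
best_val, best_x = max((f(x, 1.0 - x), x) for x in grid)
```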
? Exercise 2.16 A consumer purchases a bundle $(x, y)$ to maximize utility. His income is $I > 0$ and prices are $p_x > 0$ and $p_y > 0$. His utility function is
$$u(x, y) = x^a y^b,$$
where $a, b > 0$. Find his optimal choice $(x^*, y^*)$.
Figure 2.5: Lagrange's method: level curves $L_f(-2)$, $L_f(-1)$, $L_f(-1/2)$ and the constraint set $L_g(0)$.
Lagrange’s Theorem Is Not Sufficient Lagrange’s Theorem (Theorem 2.12)
only gives us a necessary—not a sufficient—condition for a constrained lo-
cal maximum. To see why the first order condition is not generally suffi-
cient, consider the following example.
Example 2.17 Consider the problem
max x C y2
s.t. x � 1 D 0:
Observe that .x�; y�/ D .1; 0/ satisfies the constraint g.x�; y�/ D 0, and
.1; 0/ is regular. Furthermore, the first-order condition from Lagrange’s The-
orem is satisfied at .1; 0/. This is because Of .1; 0/ D Og.1; 0/ D .1; 0/. Hence,
by letting � D 1 we have Of .1; 0/ D �Og.1; 0/.However, .1; 0/ is not a constrained local maximum: for " > 0, we have
g.1; "/ D 0 and f .1; "/ D 1C "2 > 1 D f .1; 0/. See Figure 2.6.
2.3.2 Second-Order Analysis

We will not have time to discuss the second-order conditions; see Jehle and Reny (2011, Section A2.3.4) and Sundaram (1996, Section 5.3).
Figure 2.6: Lagrange's Theorem is not sufficient.
2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem
We now consider a problem of the form
$$\max f(x) \quad \text{s.t.} \quad h_1(x) \le 0, \dots, h_\ell(x) \le 0, \tag{ICP}$$
where $f$ and the $h_i$ are continuously differentiable functions from $\mathbb{R}^n$ to $\mathbb{R}$. More succinctly, we can write this problem as
$$\max f(x) \quad \text{s.t.} \quad h(x) \le 0,$$
where $h: \mathbb{R}^n \to \mathbb{R}^\ell$ is the function $h = (h_1, \dots, h_\ell)$.

For any feasible point $x$, the set of active inequality constraints is denoted by
$$A(x) := \{i : h_i(x) = 0\}. \tag{2.7}$$
If $i \notin A(x)$, we say that the $i$-th constraint is inactive at $x$. We note that if $x^*$ is a local maximum of (ICP), then $x^*$ is also a local maximum for a problem identical to (ICP) except that the constraints inactive at $x^*$ have been discarded. Thus, inactive constraints at $x^*$ do not matter; they can be ignored in the statement of optimality conditions.
Figure 2.7: Inequality constrained optimization with none ($x$), one ($y$), or two ($z$) constraints active.
Definition 2.18 Let $x^*$ be a point satisfying the constraint $h(x^*) \le 0$. Then $x^*$ is said to be regular if the gradients $\nabla h_i(x^*)$, $i \in A(x^*)$, are linearly independent.
2.4.1 First-Order Analysis
Example 2.19 Figure 2.7 illustrates a problem with two inequality constraints and depicts three possibilities, depending on whether none, one, or two constraints are active.

- In the first case, we could have a constrained local maximum such as $x$, for which both constraints are inactive. Such a vector must be a critical point of the objective function.
- In the second case, only a single constraint is active at a constrained local maximum such as $y$, and here the gradients of the objective and constraint are collinear. As we will see, these gradients actually point in the same direction.
- Lastly, we could have a constrained local maximum such as $z$, where both constraints are active. Here, the gradient of the objective is collinear with the gradient of neither constraint, and it may appear that no gradient restriction is possible. But in fact, $\nabla f(z)$ can be written as a linear combination of $\nabla h_1(z)$ and $\nabla h_2(z)$ with non-negative weights.

Figure 2.8: Kuhn-Tucker Theorem: $\nabla f(x^*)$ lies in the cone generated by $\nabla h_1(x^*)$ and $\nabla h_2(x^*)$.

The restrictions evident in Figure 2.7 are formalized in the next theorem.
Theorem 2.20 (Kuhn-Tucker Theorem) Let $x^*$ be a local maximum of (ICP), and assume that $x^*$ is regular. Then there exists a vector $\lambda^* = (\lambda_1^*, \dots, \lambda_\ell^*)$ such that
$$\lambda_i^* \ge 0 \quad \text{and} \quad \lambda_i^* h_i(x^*) = 0, \quad i = 1, \dots, \ell, \tag{KT-1}$$
$$\nabla f(x^*) = \sum_{i=1}^{\ell} \lambda_i^* \nabla h_i(x^*). \tag{KT-2}$$
Proof. See Sundaram (1996, Section 6.5). ∎

Remark 2.21 [1] Geometrically, the first-order condition from the Kuhn-Tucker Theorem means that the gradient of the objective function, $\nabla f(x^*)$, is contained in the "semi-positive cone" generated by the gradients of the binding constraints, i.e., it is contained in the set
$$\Bigl\{\sum_{i=1}^{\ell} \alpha_i \nabla h_i(x^*) : \alpha_1, \dots, \alpha_\ell \ge 0\Bigr\},$$
depicted in Figure 2.8.

[2] Condition (KT-1) in Theorem 2.20 is called the condition of complementary slackness: if $h_i(x^*) < 0$ then $\lambda_i^* = 0$; if $\lambda_i^* > 0$ then $h_i(x^*) = 0$.
The Kuhn-Tucker Multipliers The vector $\lambda^*$ in Theorem 2.20 is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum $x^*$. The Kuhn-Tucker multipliers measure the sensitivity of the objective function at $x^*$ to relaxations of the various constraints:
- If $h_i(x^*) < 0$, then the $i$-th constraint is already slack, so relaxing it further will not help raise the value of the objective function, and $\lambda_i^*$ must be zero.
- If $h_i(x^*) = 0$, then relaxing the $i$-th constraint may help increase the value of the maximization exercise, so we have $\lambda_i^* \ge 0$.
Two Differences There are two important differences from the case of equality constraints (compare Theorem 2.12 and Theorem 2.20):

- The regularity condition now applies only to the gradients of the binding constraints.
- The multipliers are non-negative. This difference comes from the fact that now only the inequality $h_i(x) \le 0$ needs to be maintained, so relaxing the constraint never hurts.

The Regularity Condition As with the analogous condition in Theorem 2.12 (see Example 2.14), the regularity condition in Theorem 2.20 is essential, as the following example shows.
Example 2.22 Consider the following maximization problem:
$$\begin{aligned} \max\ & f(x_1, x_2) = x_1 \\ \text{s.t.}\ & h_1(x_1, x_2) = -(1 - x_1)^3 + x_2 \le 0 \\ & h_2(x_1, x_2) = -x_1 \le 0 \\ & h_3(x_1, x_2) = -x_2 \le 0. \end{aligned}$$
See Figure 2.9. Clearly the solution is $(x_1^*, x_2^*) = (1, 0)$. At this point we have
$$\nabla h_1(1, 0) = (0, 1), \quad \nabla h_3(1, 0) = (0, -1), \quad \text{and} \quad \nabla f(1, 0) = (1, 0).$$
Since $x_1^* > 0$, it follows from the complementary slackness condition (KT-1) that $\lambda_2^* = 0$. But now (KT-2) fails: for any $\lambda_1 \ge 0$ and $\lambda_3 \ge 0$, we have
$$\lambda_1 \nabla h_1(1, 0) + \lambda_3 \nabla h_3(1, 0) = (0, \lambda_1 - \lambda_3) \ne \nabla f(1, 0).$$
This is because $(1, 0)$ is not regular: there are two binding constraints at $(1, 0)$, namely $h_1$ and $h_3$, and the gradients $\nabla h_1(1, 0)$ and $\nabla h_3(1, 0)$ are collinear. Certainly $\nabla f(1, 0)$ cannot be contained in the cone generated by $\nabla h_1(1, 0)$ and $\nabla h_3(1, 0)$; see Remark 2.21.
Figure 2.9: The constraint qualification fails at $(1, 0)$.
The Lagrangian As with equality constraints, we can define the Lagrangian $L: \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$ by
$$L(x, \lambda) = f(x) - \sum_{i=1}^{\ell} \lambda_i h_i(x),$$
and then condition (KT-2) from Theorem 2.20 is the requirement that $x$ be a critical point of the Lagrangian given multipliers $\lambda_1, \dots, \lambda_\ell$.

Let us consider a numerical example.

Example 2.23 Consider the problem
$$\max x^2 - y \quad \text{s.t.} \quad x^2 + y^2 - 1 \le 0.$$
Set up the Lagrangian:
$$L(x, y, \lambda) = x^2 - y - \lambda(x^2 + y^2 - 1).$$
Figure 2.10: Example 2.23: the two solutions $(\pm\sqrt{3}/2, -1/2)$ on the constraint circle $L_h(0)$.
The critical points of $L$ are the solutions $(x, y, \lambda)$ to
$$2x - 2\lambda x = 0 \tag{2.8}$$
$$-1 - 2\lambda y = 0 \tag{2.9}$$
$$\lambda \ge 0 \tag{2.10}$$
$$x^2 + y^2 - 1 \le 0 \tag{2.11}$$
$$\lambda(x^2 + y^2 - 1) = 0. \tag{2.12}$$
For (2.8) to hold, we must have $x = 0$ or $\lambda = 1$.

- If $\lambda = 1$, then (2.9) implies $y = -1/2$, and (2.12) implies $x^2 + y^2 = 1$, so $x = \pm\sqrt{3}/2$. That is,
$$(x, y, \lambda) = \Bigl(\pm\frac{\sqrt{3}}{2},\, -\frac{1}{2},\, 1\Bigr).$$
We thus have $f(x, y) = 5/4$; see Figure 2.10.

- If $x = 0$, then $\lambda > 0$ by (2.9). Hence $h$ is binding: $x^2 + y^2 - 1 = 0$, and so $y = \pm 1$. Since (2.9) rules out $y = 1$ (it would force $\lambda = -1/2 < 0$), we have
$$(x, y, \lambda) = \Bigl(0,\, -1,\, \frac{1}{2}\Bigr).$$
At this critical point, we have $f(0, -1) = 1 < 5/4$, which means that $(0, -1, 1/2)$ cannot be a solution. Since there are no other critical points, it follows that there are exactly two solutions to the maximization problem, namely $(x^*, y^*) = (\pm\sqrt{3}/2, -1/2)$.
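The two candidate critical points can be checked against conditions (2.8), (2.9), and (2.12) in a few lines:

```python
import math

# Checking the two candidate critical points of Example 2.23 against the
# stationarity and complementary slackness conditions, and comparing values.

def objective(x, y):
    return x**2 - y

def residuals(x, y, lam):
    return (2.0 * x - 2.0 * lam * x,            # (2.8)
            -1.0 - 2.0 * lam * y,               # (2.9)
            lam * (x**2 + y**2 - 1.0))          # (2.12)

x1, y1, lam1 = math.sqrt(3.0) / 2.0, -0.5, 1.0  # constraint binding, lam = 1
x2, y2, lam2 = 0.0, -1.0, 0.5                   # constraint binding, x = 0
```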
? Exercise 2.24 Let $U = \mathbb{R}^2$; let $f(x, y) = (x - 1)^2 + y^2$, and let $h(x, y) = 2kx - y^2 \le 0$, where $k > 0$. Solve the maximization problem
$$\max\{f(x, y) : h(x, y) \le 0\}.$$
Example 2.25 Consider a consumer's problem:
$$\begin{aligned} \max_{(x, y) \in \mathbb{R}^2}\ & u(x, y) = x + y \\ \text{s.t.}\ & h_1(x, y) = -x \le 0 \\ & h_2(x, y) = -y \le 0 \\ & h_3(x, y) = p_x x + p_y y - I \le 0, \end{aligned} \tag{2.13}$$
where $p_x, p_y, I > 0$.

We first identify all possible combinations of constraints that can, in principle, be binding at the optimum. There are eight combinations to be checked:
$$\varnothing;\ h_1;\ h_2;\ h_3;\ (h_1, h_2);\ (h_1, h_3);\ (h_2, h_3);\ \text{and}\ (h_1, h_2, h_3).$$
Of these, the last one can be ruled out, since $h_1 = h_2 = 0$ implies that $h_3 = -I < 0$. Moreover, since $u$ is strictly increasing in both arguments, it is obvious that $h_3 = 0$ at any optimum. So we only need to check three combinations: $(h_1, h_3)$, $(h_2, h_3)$, and $h_3$ alone.

- If the optimum occurs at a point where only $h_1$ and $h_3$ are binding, then
$$\bigl(\nabla h_1(x, y), \nabla h_3(x, y)\bigr) = \bigl((-1, 0), (p_x, p_y)\bigr)$$
is linearly independent, so the constraint qualification holds at such a point.
- Similarly, the constraint qualification holds if only $(h_2, h_3)$ bind.
- If $h_3$ is the only binding constraint, then $\nabla h_3(x, y) = (p_x, p_y) \ne 0$; that is, the constraint qualification holds.
? Exercise 2.26 Solve problem (2.13).
2.4.2 Second-Order Analysis

We will not have time to discuss the second-order conditions; see Duggan (2010, Section 6.3).
2.5 Envelope Theorem

Let $A \subseteq \mathbb{R}$. The graph of a real-valued function $f$ on $A$ is a curve in the plane $\mathbb{R}^2$, and we shall also refer to the curve itself as $f$. Given a one-dimensional parametrized family of curves $f_\alpha: A \to \mathbb{R}$, where $\alpha$ runs over some interval, the curve $h: A \to \mathbb{R}$ is the envelope of the family if

- at each of its points, $h$ is tangent to the graph of one of the curves $f_\alpha$, and
- each curve $f_\alpha$ is tangent to $h$.

(See, e.g., Apostol 1974, p. 342, or Zorich 2004, p. 252, for this definition.) That is, for each $\alpha$ there is some $q$, and also for each $q$ there is some $\alpha$, satisfying
$$f_\alpha(q) = h(q), \qquad f'_\alpha(q) = h'(q).$$
We may regard $h$ as a function of $\alpha$ if the correspondence between curves and points on the envelope is one-to-one.
2.5.1 An Envelope Theorem for Unconstrained Maximization

Consider now an unconstrained parametrized maximization problem. Let $x^*(q)$ be the value of the control variable $x$ that maximizes $f(x, q)$, where $q$ is our parameter of interest. For fixed $x$, the function
$$\varphi_x(q) := f(x, q)$$
defines a curve. We also define the value function
$$V(q) := f\bigl(x^*(q), q\bigr) = \max_x \varphi_x(q).$$
Under appropriate conditions, the graph of the value function $V$ will be the envelope of the curves $\varphi_x$. "Envelope theorems" in maximization theory are concerned with the tangency conditions this entails.

Example 2.27 Let
$$f(x, q) = q - (x - q)^2 + 1, \quad \text{where } x, q \in [0, 2].$$
Then given $q$, the maximizing $x$ is given by $x^*(q) = q$, and $V(q) = q + 1$.
Figure 2.11: The graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. (a) The family of curves $\{\varphi_x(q) : x = 0, 0.2, \dots, 2\}$; (b) the envelope $V(q) = q + 1$.
For each $x$, the function $\varphi_x$ is given by
$$\varphi_x(q) = q - (x - q)^2 + 1.$$
The graphs of these functions and of $V$ are shown for selected values of $x$ in Figure 2.11. Observe that the graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. Consequently, the slope of $V$ is the slope of the $\varphi_x$ to which it is tangent; that is,
$$V'(q) = \frac{\partial \varphi_x}{\partial q}\bigg|_{x = x^*(q) = q} = \frac{\partial f}{\partial q}\bigg|_{x = x^*(q) = q} = \bigl[1 + 2(x - q)\bigr]\Big|_{x = x^*(q) = q} = 1.$$
This last observation is one version of the Envelope Theorem.
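The tangency computation can be reproduced numerically: a grid-based value function has slope $1$ at an interior $q$, matching $\partial f/\partial q$ at $x = x^*(q)$ (the grid resolution below is an arbitrary choice):

```python
# Numerical check of Example 2.27: for f(x, q) = q - (x - q)**2 + 1 the value
# function is V(q) = q + 1, whose slope matches df/dq at x = x*(q) = q.

def f(x, q):
    return q - (x - q)**2 + 1.0

def V(q):
    # inner maximization over a grid of x values in [0, 2]
    return max(f(i / 1000.0, q) for i in range(2001))

q0, h = 1.0, 1e-4
slope = (V(q0 + h) - V(q0 - h)) / (2.0 * h)   # numerical V'(q0)
df_dq = 1.0 + 2.0 * (q0 - q0)                 # partial f / partial q at x = q0
```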
2.5.2 An Envelope Theorem for Constrained Maximization

Consider the maximization problem
$$\max_{x \in \mathbb{R}^n} f(x, q) \quad \text{s.t.} \quad g_i(x, q) = 0, \; i = 1, \dots, m, \tag{2.14}$$
where $x$ is a vector of choice variables, and $q = (q_1, \dots, q_\ell) \in \mathbb{R}^\ell$ is a vector of parameters that may enter the objective function, the constraints, or both.

Suppose that for each $q$ there exists a unique solution $x(q)$. Furthermore, we assume that the objective function $f: \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$, the constraints $g_i: \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$ ($i = 1, \dots, m$), and the solution $x: \mathbb{R}^\ell \to \mathbb{R}^n$ are differentiable in the parameter $q$.

Then, for every parameter $q$, the maximized value of the objective function is $f(x(q), q)$. This defines a new function, $V: \mathbb{R}^\ell \to \mathbb{R}$, called the value function. Formally,
$$V(q) := \max_{x \in \mathbb{R}^n} \{f(x, q) : g_i(x, q) = 0, \; i = 1, \dots, m\}. \tag{2.15}$$
Theorem 2.28 (Envelope Theorem) Consider the value function $V(q)$ for problem (2.14). Let $(\lambda_1, \dots, \lambda_m)$ be the values of the Lagrange multipliers associated with the maximum solution $x(\bar q)$ at $\bar q$. Then for each $k = 1, \dots, \ell$,
$$\frac{\partial V(\bar q)}{\partial q_k} = \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k}. \tag{2.16}$$
Proof. By definition, $V(q) = f(x(q), q)$ for all $q$. Using the chain rule, we have
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \frac{\partial f(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
It follows from Theorem 2.12 that
$$\frac{\partial f(x(\bar q), \bar q)}{\partial x_i} = \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i}.$$
Hence,
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \Biggl(\sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i}\Biggr) \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} = \sum_{j=1}^{m} \lambda_j \sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
Finally, since $g_j(x(q), q) = 0$ for all $q$, we have
$$\sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k} = 0.$$
Combining, we get (2.16). ∎
Example 2.29 We are given the problem
$$\max_{(x, y) \in \mathbb{R}^2} f((x, y), q) = xy \quad \text{s.t.} \quad g((x, y), q) = 2x + 4y - q = 0.$$
Forming the Lagrangian, we get
$$L = xy - \lambda(2x + 4y - q),$$
with first-order conditions
$$y - 2\lambda = 0, \qquad x - 4\lambda = 0, \qquad q - 2x - 4y = 0. \tag{2.17}$$
These solve for $x(q) = q/4$, $y(q) = q/8$ and $\lambda(q) = q/16$. Thus,
$$V(q) = x(q)\, y(q) = \frac{q^2}{32}.$$
Differentiating $V(q)$ with respect to $q$, we get $V'(q) = q/16$.

Now let us verify this using the Envelope Theorem. The theorem tells us that
$$V'(q) = \frac{\partial f((x(q), y(q)), q)}{\partial q} - \lambda(q) \frac{\partial g((x(q), y(q)), q)}{\partial q} = 0 - \lambda(q) \cdot (-1) = \lambda(q) = \frac{q}{16}.$$
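The algebra in Example 2.29 is easy to check numerically. The sketch below (the test point $q = 3$ and the step size are arbitrary choices of ours) compares a central finite-difference estimate of $V'(q)$ with the multiplier $\lambda(q) = q/16$.

```python
# Numeric check of Example 2.29: V'(q) should equal lambda(q) = q/16.
def x_star(q):
    return q / 4            # x(q) from the first-order conditions

def y_star(q):
    return q / 8            # y(q)

def V(q):
    return x_star(q) * y_star(q)   # value function V(q) = q^2 / 32

q, h = 3.0, 1e-6
dV = (V(q + h) - V(q - h)) / (2 * h)   # central finite difference for V'(q)
lam = q / 16                            # Lagrange multiplier lambda(q)
assert abs(dV - lam) < 1e-6             # the Envelope Theorem's prediction holds
```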
Example 2.30 Consider a consumer whose utility function $u\colon \mathbb{R}^n_+ \to \mathbb{R}$ is strictly increasing in every commodity $i = 1, \dots, n$. This consumer's problem is
$$\max u(x) \quad \text{s.t.} \quad \sum_{i=1}^{n} p_i x_i = I.$$
The Lagrangian is
$$L(x, \lambda) = u(x) - \lambda \left( \sum_{i=1}^{n} p_i x_i - I \right).$$
It follows from Theorem 2.28 that
$$V'(I) = \frac{\partial L(x, \lambda)}{\partial I} = \lambda.$$
That is, $\lambda$ measures the marginal utility of income.
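With a concrete utility function the claim can be verified numerically. The sketch below assumes the (hypothetical, not from the text) log utility $u(x_1, x_2) = \ln x_1 + \ln x_2$, for which the first-order conditions give demands $x_i = I/(2p_i)$ and multiplier $\lambda = 2/I$.

```python
# Sketch for Example 2.30 under the assumed utility u(x1, x2) = ln x1 + ln x2.
import math

def demand(p, I):
    # With u = ln x1 + ln x2, the FOCs 1/x_i = lam * p_i give x_i = I / (2 p_i).
    return [I / (2 * pi) for pi in p]

def V(p, I):
    # Indirect utility: utility evaluated at the optimal demands.
    return sum(math.log(xi) for xi in demand(p, I))

p, I, h = [2.0, 5.0], 10.0, 1e-6
dV = (V(p, I + h) - V(p, I - h)) / (2 * h)   # V'(I) by central differences
lam = 2 / I                                   # multiplier from the FOCs
assert abs(dV - lam) < 1e-6                   # lambda = marginal utility of income
```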
2.5.3 Integral Form Envelope Theorem
The Envelope Theorems we have introduced so far rely on assumptions that are not satisfactory for some applications, e.g., mechanism design. Unfortunately, the more advanced treatment of the Envelope Theorem is too technical to develop here. We refer the reader to Milgrom and Segal (2002) and Milgrom (2004, Chapter 3) for the integral form of the Envelope Theorem.
3 CONVEX ANALYSIS IN $\mathbb{R}^n$

Rockafellar (1970) is the classical reference for finite-dimensional convex analysis. As for infinite-dimensional convex analysis, Luenberger (1969) is an excellent text.
To understand the material that follows, it is necessary that the reader have a good background in multivariable calculus (Chapter 1) and linear algebra (Axler, 1997).

In this chapter, we will exclusively consider convexity in $\mathbb{R}^n$ for concreteness, but much of the discussion here generalizes to infinite-dimensional vector spaces. You may want to consult Lay (1982), Hiriart-Urruty and Lemaréchal (2001), Berkovitz (2002), Bertsekas, Nedic and Ozdaglar (2003), Bertsekas (2009), Ok (2007, Chapter G) and Royden and Fitzpatrick (2010, Chapter 6).
3.1 Convex Sets
Definition 3.1 (Convex Set) A subset $C \subseteq \mathbb{R}^n$ is convex if for every pair of points $x_1, x_2 \in C$, the line segment
$$[x_1, x_2] := \{ x : x = \lambda x_1 + (1 - \lambda) x_2,\ 0 \le \lambda \le 1 \}$$
belongs to $C$.
⋆ Exercise 3.2 Sketch the following sets in $\mathbb{R}^2$ and determine from the figure which sets are convex and which are not:
(a) $\{(x, y) : x^2 + y^2 \le 1\}$,
(b) $\{(x, y) : 0 < x^2 + y^2 \le 1\}$,
(c) $\{(x, y) : y \ge x^2\}$,
(d) $\{(x, y) : |x| + |y| \le 1\}$, and
(e) $\{(x, y) : y \ge 1/(1 + x^2)\}$.

[Figure 3.1: $\Delta^2$ in $\mathbb{R}^3$, the closed triangle with vertices $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$.]
Lemma 3.3 Let $\{C_\alpha\}$ be a collection of convex sets such that $C := \bigcap_\alpha C_\alpha \ne \varnothing$. Then $C$ is convex.

Proof. Let $x_1, x_2 \in C$. Then $x_1, x_2 \in C_\alpha$ for all $\alpha$. Since each $C_\alpha$ is convex, we have $[x_1, x_2] \subseteq C_\alpha$ for all $\alpha$. Hence $[x_1, x_2] \subseteq C$, so $C$ is convex. ∎

For each positive integer $n$, define
$$\Delta^{n-1} := \left\{ (\lambda_1, \dots, \lambda_n) \in [0, 1]^n : \sum_{i=1}^{n} \lambda_i = 1 \right\}. \tag{3.1}$$
For $n = 1$, the set $\Delta^0$ is the singleton $\{1\}$. For $n = 2$, the set $\Delta^1$ is the closed line segment joining $(0, 1)$ and $(1, 0)$. For $n = 3$, the set $\Delta^2$ is the closed triangle with vertices $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$ (see Figure 3.1).

Definition 3.4 A point $x \in \mathbb{R}^n$ is a convex combination of points $x_1, \dots, x_k$ if there exists $\lambda \in \Delta^{k-1}$ such that
$$x = \sum_{i=1}^{k} \lambda_i x_i.$$
Lemma 3.5 A set $C \subseteq \mathbb{R}^n$ is convex iff every convex combination of points in $C$ is also in $C$.

Proof. The "if" part is evident, so we prove the "only if" statement by induction on the number of points $k$. The claim holds for $k = 2$ by definition. Suppose it is true for some $k \ge 2$, and consider $x_1, \dots, x_{k+1} \in C$ and $\lambda \in \Delta^{k}$; we may assume $\lambda_{k+1} \in (0, 1)$, since otherwise the claim is immediate. Then
$$x = \sum_{i=1}^{k+1} \lambda_i x_i = \left( \sum_{j=1}^{k} \lambda_j \right) \left[ \sum_{i=1}^{k} \left( \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j} \right) x_i \right] + \lambda_{k+1} x_{k+1} \in C,$$
since the bracketed sum is a convex combination of $k$ points of $C$ (and hence lies in $C$ by the inductive hypothesis), so that $x$ is a convex combination of two points of $C$. ∎
Let $A \subseteq \mathbb{R}^n$, and let $\mathcal{A}$ be the class of all convex subsets of $\mathbb{R}^n$ that contain $A$. We have $\mathcal{A} \ne \varnothing$; after all, $\mathbb{R}^n \in \mathcal{A}$. Then, by Lemma 3.3, $\bigcap \mathcal{A}$ is a convex set in $\mathbb{R}^n$ that contains $A$. Clearly, this set is the smallest (that is, $\subseteq$-minimum) convex subset of $\mathbb{R}^n$ that contains $A$.

Definition 3.6 The convex hull of $A$, denoted by $\operatorname{cov}(A)$, is the intersection of all convex sets containing $A$.
⋆ Exercise 3.7 For a given set $A$, let $K(A)$ denote the set of all convex combinations of points in $A$. Show that $K(A)$ is convex and $A \subseteq K(A)$.

Theorem 3.8 Let $A \subseteq \mathbb{R}^n$. Then $\operatorname{cov}(A) = K(A)$.

Proof. Let $\mathcal{A}$ be the family of convex sets containing $A$. Since $\operatorname{cov}(A) = \bigcap \mathcal{A}$ and $K(A) \in \mathcal{A}$ (Exercise 3.7), we have $\operatorname{cov}(A) \subseteq K(A)$.

To prove the reverse inclusion, take an arbitrary $C \in \mathcal{A}$. Then $A \subseteq C$. It follows from Lemma 3.5 that $K(A) \subseteq C$. Hence $K(A) \subseteq \bigcap \mathcal{A} = \operatorname{cov}(A)$. ∎
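Theorem 3.8 can be illustrated numerically: convex combinations of the corners of the unit square never leave the square, which is exactly $\operatorname{cov}(A)$ in that case. A small sketch (the set $A$ and the sampling scheme are our own choices):

```python
# Illustration of Theorem 3.8: K(A) is contained in cov(A).
# Here A = corners of the unit square, so cov(A) is the square itself.
import random

A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def convex_combination(points, weights):
    # weights must lie in the simplex Delta^{k-1} of (3.1)
    assert abs(sum(weights) - 1.0) < 1e-12 and all(w >= 0 for w in weights)
    return tuple(sum(w * p[j] for w, p in zip(weights, points)) for j in range(2))

random.seed(0)
for _ in range(1000):
    raw = [random.random() for _ in A]
    t = [r / sum(raw) for r in raw]          # a random point of Delta^3
    x, y = convex_combination(A, t)
    # cov(A) is the unit square, so every combination must stay inside it
    assert 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
```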
3.2 Separation Theorem
This section is devoted to the establishment of separation theorems. In some sense, these theorems are the fundamental theorems of optimization theory. For simplicity, we restrict our analysis to $\mathbb{R}^n$.

Definition 3.9 A hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is defined to be the set of points that satisfy the equation $\langle a, x \rangle = \beta$. Thus,
$$H_a^\beta := \{ x \in \mathbb{R}^n : \langle a, x \rangle = \beta \}.$$
The vector $a$ is said to be a normal to the hyperplane.
[Figure 3.2: A hyperplane $H_a^\beta$ with normal $a$, and the half-spaces above and below it.]
Remark 3.10 Geometrically, a hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is a translation of an $(n - 1)$-dimensional subspace (an affine manifold). Algebraically, it is a level set of a linear functional. For an excellent explanation of hyperplanes, see Luenberger (1969, Section 5.12).

A hyperplane $H_a^\beta$ divides $\mathbb{R}^n$ into two half-spaces, one on each side of $H_a^\beta$. The set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \ge \beta \}$ is called the half-space above the hyperplane $H_a^\beta$, and the set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \le \beta \}$ is called the half-space below the hyperplane $H_a^\beta$; see Figure 3.2.

It therefore seems natural to say that two sets $X$ and $Y$ are separated by a hyperplane $H_a^\beta$ if they are contained in different half-spaces determined by $H_a^\beta$. We will introduce several separation theorems.
Theorem 3.11 (A First Separation Theorem) Let $C$ be a closed and convex subset of $\mathbb{R}^n$, and let $y \in \mathbb{R}^n \setminus C$. Then there exist a vector $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, y \rangle > \beta$ and $\langle a, x \rangle < \beta$ for all $x \in C$.

To motivate the proof, we argue heuristically from Figure 3.3, where $C$ is assumed to have a tangent at each boundary point. Draw a line from $y$ to $x^*$, the point of $C$ that is closest to $y$. The vector $y - x^*$ is orthogonal to $C$ in the sense that $y - x^*$ is orthogonal to the tangent line at $x^*$. The tangent line, which is exactly the set $\{ z \in \mathbb{R}^n : \langle y - x^*, z - x^* \rangle = 0 \}$, separates $y$ and $C$. The point $x^*$ is characterized by the fact that $\langle y - x^*, x - x^* \rangle \le 0$ for all $x \in C$. If we move the tangent line parallel to itself so that it passes through a point $x_0 \in (x^*, y)$, we are done. We now justify these steps in a series of claims.

Claim 1 Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. If there exists a point in $C$ that is closest to $y$, then it is unique.
[Figure 3.3: The separating hyperplane theorem: the point $y$, its closest point $x^*$ in $C$, the normal $y - x^*$, and the hyperplane $H_a^\beta$.]
Proof. Suppose that there were two distinct points $x_1$ and $x_2$ of $C$ that were closest to $y$. Then $(x_1 + x_2)/2 \in C$ since $C$ is convex, and so¹
$$d(y, C) \le \left\| \frac{x_1 + x_2}{2} - y \right\| = \left\| \frac{1}{2}(x_1 - y) + \frac{1}{2}(x_2 - y) \right\| \le \frac{1}{2}\|x_1 - y\| + \frac{1}{2}\|x_2 - y\| = d(y, C).$$
Hence the triangle inequality holds with equality. It follows from Exercise 1.8 that there exists $\lambda \ge 0$ such that $x_1 - y = \lambda (x_2 - y)$. Clearly $\lambda \ne 0$, for otherwise $x_1 - y = 0$ would imply that $y = x_1 \in C$. Then $\lambda = 1$, since $\|x_1 - y\| = \|x_2 - y\| = d(y, C)$. But then $x_1 = x_2$. ∎
Claim 2 Let $C$ be a closed subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then there exists a point $x^* \in C$ that is closest to $y$.

Proof. Take an arbitrary point $x_0 \in C$, and let $r > \|x_0 - y\|$. Then $C_1 := \bar{B}(y; r) \cap C$, where $\bar{B}(y; r)$ is the closed ball of radius $r$ about $y$, is nonempty (at least $x_0$ is in the intersection), closed, and bounded, hence compact. The function $x \mapsto \|x - y\|$ is continuous on $C_1$ and so attains its minimum at some point $x^* \in C_1$, i.e.,
$$\|x^* - y\| \le \|x - y\| \quad \text{for all } x \in C_1.$$
For every $x \in C \setminus C_1$, we have
$$\|x - y\| > r > \|x_0 - y\| \ge \|x^* - y\|,$$
since $x_0 \in C_1$. ∎

¹ For a set $X \subseteq \mathbb{R}^n$, the function $d(y, X)$ is the distance from $y$ to $X$, defined by $d(y, X) := \inf_{x \in X} \|y - x\|$.
Claim 3 Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then $x^* \in C$ is a closest point in $C$ to $y$ iff
$$\langle y - x^*, x - x^* \rangle \le 0 \quad \text{for all } x \in C. \tag{3.2}$$

Proof. Let $x^* \in C$ be a closest point to $y$, and let $x \in C$. Since $C$ is convex, we have
$$[x^*, x] := \{ z(t) \in \mathbb{R}^n : z(t) = x^* + t(x - x^*),\ t \in [0, 1] \} \subseteq C.$$
Let
$$g(t) := \|z(t) - y\|^2 = \langle x^* + t(x - x^*) - y,\ x^* + t(x - x^*) - y \rangle = \sum_{i=1}^{n} \big[ x_i^* - y_i + t(x_i - x_i^*) \big]^2.$$
Observe that $g(0) = \|x^* - y\|^2$. Since $g$ is differentiable and $x^* \in \operatorname{argmin}_{x \in C} \|x - y\|$, we have $g'_+(0) \ge 0$. Since
$$g'(t) = 2 \sum_{i=1}^{n} \big[ x_i^* - y_i + t(x_i - x_i^*) \big](x_i - x_i^*) = 2 \Big[ -\langle y - x^*, x - x^* \rangle + t \|x - x^*\|^2 \Big], \tag{3.3}$$
letting $t \downarrow 0$ we get (3.2).

Conversely, suppose that (3.2) holds. Take an arbitrary $x \in C \setminus \{x^*\}$. It follows from (3.3) that if $t \in (0, 1]$, then
$$g'(t) = 2 \Big[ -\langle y - x^*, x - x^* \rangle + t \|x - x^*\|^2 \Big] \ge 2 t \|x - x^*\|^2 > 0.$$
That is, $g$ is strictly increasing on $[0, 1]$. Thus $g(1) = \|x - y\|^2 > \|x^* - y\|^2 = g(0)$. ∎
Proof of Theorem 3.11. We can now complete the proof of Theorem 3.11. Let $x^* \in C$ be the closest point to $y$ (it exists and is unique by Claims 1 and 2). Let $a = y - x^*$. Then for all $x \in C$, we have $\langle a, x - x^* \rangle \le 0$ (by Claim 3), i.e., $\langle a, x \rangle \le \langle a, x^* \rangle$, with equality occurring when $x = x^*$. Hence
$$\max_{x \in C} \langle a, x \rangle = \langle a, x^* \rangle.$$
On the other hand, $\langle a, y - x^* \rangle = \|a\|^2 > 0$, so
$$\langle a, y \rangle = \langle a, x^* \rangle + \|a\|^2 > \langle a, x^* \rangle.$$
Finally, take an arbitrary $\beta \in (\langle a, x^* \rangle, \langle a, y \rangle)$. We thus have $\langle a, y \rangle > \beta$ and $\langle a, x \rangle \le \langle a, x^* \rangle < \beta$ for all $x \in C$. ∎
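The construction in this proof can be carried out explicitly when $C$ is the closed unit disk: the closest point to an outside point $y$ is the projection $x^* = y/\|y\|$, and $a = y - x^*$ separates. A numeric sketch (the choices of $C$, $y$, and $\beta$ are ours):

```python
# Sketch of Theorem 3.11 for C = closed unit disk in R^2 and y = (2, 1).
import math, random

y = (2.0, 1.0)
ny = math.hypot(*y)
x_star = (y[0] / ny, y[1] / ny)           # closest point of C to y (radial projection)
a = (y[0] - x_star[0], y[1] - x_star[1])  # normal a = y - x*

def inner(u, v):
    return u[0] * v[0] + u[1] * v[1]

# any beta strictly between <a, x*> and <a, y> works; take the midpoint
beta = 0.5 * (inner(a, x_star) + inner(a, y))
assert inner(a, y) > beta                  # y lies above the hyperplane

random.seed(1)
for _ in range(1000):
    r, th = random.random(), random.uniform(0, 2 * math.pi)
    x = (r * math.cos(th), r * math.sin(th))   # a random point of C
    assert inner(a, x) < beta                  # C lies strictly below the hyperplane
```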
Theorem 3.12 (Separation Theorem) Let $X$ and $Y$ be two disjoint convex subsets of $\mathbb{R}^n$. Then there exist $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, x \rangle \le \beta$ for all $x \in X$ and $\langle a, y \rangle \ge \beta$ for all $y \in Y$. That is, there is a hyperplane $H_a^\beta$ that separates $X$ and $Y$.

Proof. First note that
$$X \cap Y = \varnothing \iff 0 \notin X - Y.$$
Since $X$ and $Y$ are convex, so is $X - Y$. Hence the point $0$ can be separated (not necessarily strictly) from the convex set $X - Y$: there exists $a \ne 0$ such that
$$\langle a, x - y \rangle \le 0 \quad \text{for all } x - y \in X - Y.$$
Therefore,
$$\langle a, x \rangle \le \langle a, y \rangle \quad \text{for all } x \in X \text{ and } y \in Y. \tag{3.4}$$
Let $\alpha := \sup\{ \langle a, x \rangle : x \in X \}$ and $\gamma := \inf\{ \langle a, y \rangle : y \in Y \}$. Then from (3.4) we get that $\alpha$ and $\gamma$ are finite with $\alpha \le \gamma$, and for any number $\beta \in [\alpha, \gamma]$,
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad \text{for all } x \in X \text{ and } y \in Y.$$
Thus the hyperplane $H_a^\beta$ separates $X$ and $Y$. ∎
Theorem 3.13 (Proper Separation Theorem) Let $X$ and $Y$ be two convex sets such that $X^\circ \ne \varnothing$ and $X^\circ \cap Y = \varnothing$. Then there exists a hyperplane $H_a^\beta$ that properly separates $\bar{X}$ and $\bar{Y}$.

Proof. Since $X^\circ$ is convex and disjoint from $Y$, it follows from Theorem 3.12 that there exists a hyperplane $H_a^\beta$ such that
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad \text{for all } x \in X^\circ \text{ and all } y \in Y. \tag{3.5}$$
Let $x \in \bar{X}$. Then $x \in \overline{X^\circ}$, since $\bar{X} = \overline{X^\circ}$. So there exists a sequence $(x_n)$ in $X^\circ$ such that $x_n \to x$. It then follows from (3.5) and the continuity of the inner product that (3.5) holds for all $x \in \bar{X}$ and all $y \in \bar{Y}$. Hence the hyperplane $H_a^\beta$ separates $\bar{X}$ and $\bar{Y}$. Since $(\bar{X})^\circ = X^\circ$, we know that $(\bar{X})^\circ \ne \varnothing$. Furthermore, $\langle a, x \rangle < \beta$ for $x \in (\bar{X})^\circ$, so the separation is proper. ∎
Theorem 3.14 (Strong Separation Theorem) Let $K$ be a compact convex set and let $C$ be a closed convex set such that $K \cap C = \varnothing$. Then $K$ and $C$ can be strongly separated.

Proof. For every $\varepsilon > 0$, define
$$K_\varepsilon := K + B(0; \varepsilon) = \bigcup_{x \in K} \big( x + B(0; \varepsilon) \big), \qquad C_\varepsilon := C + B(0; \varepsilon) = \bigcup_{y \in C} \big( y + B(0; \varepsilon) \big).$$
Clearly, both $K_\varepsilon$ and $C_\varepsilon$ are open and convex.

We now show that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. If the assertion were false, there would exist a sequence $(\varepsilon_n)$ with $\varepsilon_n > 0$ and $\varepsilon_n \to 0$, and a sequence $(w_n)$ such that $w_n \in K_{\varepsilon_n} \cap C_{\varepsilon_n}$ for each $n$. Writing
$$w_n = x_n + z_n \ \text{with } x_n \in K,\ \|z_n\| < \varepsilon_n, \qquad w_n = y_n + z'_n \ \text{with } y_n \in C,\ \|z'_n\| < \varepsilon_n,$$
we obtain a sequence $(x_n)$ in $K$ and a sequence $(y_n)$ in $C$ such that $\|w_n - x_n\| < \varepsilon_n$ and $\|w_n - y_n\| < \varepsilon_n$. Hence,
$$\|x_n - y_n\| = \|(x_n - w_n) + (w_n - y_n)\| \le \|x_n - w_n\| + \|y_n - w_n\| < 2\varepsilon_n. \tag{3.6}$$
Since $K$ is compact, there exists a subsequence $(x_{n_k})$ that converges to some $x_0 \in K$. It follows from (3.6) that
$$\|y_{n_k} - x_0\| \le \|x_{n_k} - x_0\| + \|x_{n_k} - y_{n_k}\| \to 0,$$
that is, $y_{n_k} \to x_0$. Since $C$ is closed, $x_0 \in C$. This contradicts the assumption that $C \cap K = \varnothing$, and so the assertion is true.

We have shown that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. Hence, by Theorem 3.13 there is a hyperplane that properly separates $K_\varepsilon$ and $C_\varepsilon$. Since both $K_\varepsilon$ and $C_\varepsilon$ are open, the separation is strict. According to the definition, this says that $K$ and $C$ are strongly separated. ∎
3.3 Systems of Linear Inequalities: Theorems of the Alternative

Let $A$ be an $m \times n$ matrix and let $b \ne 0$ be a vector in $\mathbb{R}^m$. We focus on finding a non-negative $x \in \mathbb{R}^n_+$ such that $Ax = b$, or on showing that no such $x$ exists. The problem can be framed as follows: can $b$ be expressed as a non-negative linear combination of the columns of $A$?

Definition 3.15 The set of all non-negative linear combinations of the columns of $A$ is called the finite cone generated by the columns of $A$. It is denoted $\operatorname{cone}(A)$.
Lemma 3.16 Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a convex set.

Proof. Take any two $y, y' \in \operatorname{cone}(A)$. Then there exist $x, x' \in \mathbb{R}^n_+$ such that
$$y = Ax \quad \text{and} \quad y' = Ax'.$$
For any $\lambda \in [0, 1]$ we have
$$\lambda y + (1 - \lambda) y' = \lambda Ax + (1 - \lambda) Ax' = A\big( \lambda x + (1 - \lambda) x' \big).$$
Since $\lambda x + (1 - \lambda) x' \in \mathbb{R}^n_+$, it follows that $\lambda y + (1 - \lambda) y' \in \operatorname{cone}(A)$. ∎
Lemma 3.17 Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a closed set.

Proof. We prove this lemma by considering two cases.

Case 1: The columns of $A$ are linearly independent. Let $(w_k)$ be a convergent sequence in $\operatorname{cone}(A)$ with limit $w$; we show that $w \in \operatorname{cone}(A)$. For each $w_k$ there exists $x_k \in \mathbb{R}^n_+$ such that $w_k = Ax_k$. Since the columns of $A$ are linearly independent, $A^{\mathsf{T}} A$ is invertible and $x_k = (A^{\mathsf{T}} A)^{-1} A^{\mathsf{T}} w_k$, so the convergence of $(w_k) = (Ax_k)$ implies that $(x_k)$ converges to $x := (A^{\mathsf{T}} A)^{-1} A^{\mathsf{T}} w$. Because $\mathbb{R}^n_+$ is closed, $x \in \mathbb{R}^n_+$, and hence $w = Ax \in \operatorname{cone}(A)$.

Case 2: The columns of $A$ are linearly dependent. By a Carathéodory-type argument, every point of $\operatorname{cone}(A)$ is a non-negative combination of some linearly independent subset of the columns, so $\operatorname{cone}(A)$ is a finite union of cones of the type treated in Case 1; a finite union of closed sets is closed. ∎
Theorem 3.18 (Farkas' Lemma) Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m \setminus \{0\}$. Then one and only one of the systems
$$Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^n, \tag{$F_I$}$$
$$yA \ge 0, \quad y \cdot b < 0, \quad y \in \mathbb{R}^m, \tag{$F_{II}$}$$
has a solution.

Proof. We first suppose that ($F_I$) has a solution $x^*$; we shall show that ($F_{II}$) has no solution. If $y_0 \in \mathbb{R}^m$ satisfies $A^{\mathsf{T}} y_0 \ge 0$, then
$$y_0 \cdot b = y_0 \cdot (A x^*) = x^* \cdot (A^{\mathsf{T}} y_0) \ge 0,$$
since $x^* \ge 0$. Hence ($F_{II}$) has no solution.

We now suppose that ($F_I$) has no solution; that is,
$$b \notin \operatorname{cone}(A).$$
Since $\operatorname{cone}(A)$ is convex (Lemma 3.16) and closed (Lemma 3.17), there exists a hyperplane $H_{y_0}^{\beta}$ that strongly separates $b$ and $\operatorname{cone}(A)$. Choosing the orientation of the normal $y_0 \ne 0$ appropriately,
$$\langle y_0, b \rangle < \beta < \langle y_0, z \rangle \quad \text{for all } z \in \operatorname{cone}(A).$$
Since $0 \in \operatorname{cone}(A)$, we get that $\beta < 0$. Thus,
$$\langle y_0, b \rangle < 0. \tag{3.7}$$
Next, every $z \in \operatorname{cone}(A)$ has the form $z = Ax$ with $x \in \mathbb{R}^n_+$, so
$$\langle A^{\mathsf{T}} y_0, x \rangle = \langle y_0, Ax \rangle > \beta \quad \text{for all } x \in \mathbb{R}^n_+.$$
Since $tx \in \mathbb{R}^n_+$ for every $t > 0$, we also have $\langle A^{\mathsf{T}} y_0, x \rangle > \beta / t$; letting $t \to \infty$ gives $\langle A^{\mathsf{T}} y_0, x \rangle \ge 0$ for all $x \in \mathbb{R}^n_+$, i.e., $A^{\mathsf{T}} y_0 \ge 0$. Together with (3.7), $y_0$ solves ($F_{II}$). ∎
Example 3.19 We use Farkas' Lemma to decide whether the following system has a non-negative solution:
$$\begin{bmatrix} 4 & 1 & -5 \\ 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
The Farkas alternative is
$$\begin{bmatrix} y_1 & y_2 \end{bmatrix} \begin{bmatrix} 4 & 1 & -5 \\ 1 & 0 & 2 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}, \qquad y_1 + y_2 < 0,$$
or equivalently,
$$\begin{cases} 4y_1 + y_2 \ge 0, \\ y_1 \ge 0, \\ -5y_1 + 2y_2 \ge 0, \\ y_1 + y_2 < 0. \end{cases} \tag{3.8}$$
The system (3.8) has no solution: the second inequality requires that $y_1 \ge 0$; combining this with the last inequality, we conclude that $y_2 < 0$; but $y_1 \ge 0$ and $y_2 < 0$ contradict the third inequality of (3.8). So the original system has a non-negative solution.
Example 3.20 We use Farkas' Lemma to decide the solvability of the system
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 1 \end{bmatrix}.$$
We are interested in non-negative solutions of this system. The Farkas alternative is
$$\begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}, \qquad 2y_1 + 2y_2 + 2y_3 + y_4 < 0.$$
One solution is $y_1 = y_2 = y_3 = -1/2$ and $y_4 = 1$, implying that the given system has no solution.
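Both examples can be certified by direct arithmetic: a non-negative solution for Example 3.19 and the Farkas certificate $y$ for Example 3.20. The particular feasible point $(0.4, 0.9, 0.3)$ below is one of many and is our own choice, not part of the text.

```python
# Arithmetic check of Examples 3.19 and 3.20.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Example 3.19: x = (0.4, 0.9, 0.3) solves Ax = b with x >= 0 (one of many solutions).
A1, b1 = [[4, 1, -5], [1, 0, 2]], [1, 1]
x = [0.4, 0.9, 0.3]
assert all(xi >= 0 for xi in x)
assert all(abs(lhs - rhs) < 1e-12 for lhs, rhs in zip(matvec(A1, x), b1))

# Example 3.20: y = (-1/2, -1/2, -1/2, 1) solves the Farkas alternative (F_II),
# so by Farkas' Lemma the system has no non-negative solution.
A2, b2 = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]], [2, 2, 2, 1]
y = [-0.5, -0.5, -0.5, 1.0]
yA = [sum(yi * A2[i][j] for i, yi in enumerate(y)) for j in range(3)]
yb = sum(yi * bi for yi, bi in zip(y, b2))
assert all(v >= 0 for v in yA) and yb < 0
```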
3.4 Convex Functions
Throughout this section we will assume that the subset $C \subseteq \mathbb{R}^n$ is convex and that $f$ is a real-valued function defined on $C$, that is, $f\colon C \to \mathbb{R}$. When we take $x_1, x_2 \in C$, we will let $x_t := t x_1 + (1 - t) x_2$, for $t \in [0, 1]$, denote the convex combination of $x_1$ and $x_2$.
Definitions
Definition 3.21 A function $f\colon C \to \mathbb{R}$ is convex if for all $x_1, x_2 \in C$ and $t \in [0, 1]$,
$$f\big( t x_1 + (1 - t) x_2 \big) \le t f(x_1) + (1 - t) f(x_2).$$
The function $f$ is strictly convex if the above inequality holds strictly whenever $x_1 \ne x_2$ and $t \in (0, 1)$.

A function $f\colon C \to \mathbb{R}$ is concave if for all $x_1, x_2 \in C$ and $t \in [0, 1]$,
$$f\big( t x_1 + (1 - t) x_2 \big) \ge t f(x_1) + (1 - t) f(x_2).$$
The function $f$ is strictly concave if the above inequality holds strictly whenever $x_1 \ne x_2$ and $t \in (0, 1)$.
[Figure 3.4: (a) $f$ is convex: $\operatorname{epi}(f)$ is convex. (b) $f$ is concave: $\operatorname{sub}(f)$ is convex. (c) $f$ is quasi-convex: $I(y^0)$ is convex. (d) $f$ is quasi-concave: $S(y^0)$ is convex.]
Definition 3.22 A function $f\colon C \to \mathbb{R}$ is quasi-convex if, for all $x_1, x_2 \in C$ and $t \in [0, 1]$,
$$f\big( t x_1 + (1 - t) x_2 \big) \le \max\{ f(x_1), f(x_2) \}.$$
The function $f$ is strictly quasi-convex if the above inequality holds strictly whenever $x_1 \ne x_2$ and $t \in (0, 1)$.

A function $f\colon C \to \mathbb{R}$ is quasi-concave if, for all $x_1, x_2 \in C$ and $t \in [0, 1]$,
$$f\big( t x_1 + (1 - t) x_2 \big) \ge \min\{ f(x_1), f(x_2) \}.$$
The function $f$ is strictly quasi-concave if the above inequality holds strictly whenever $x_1 \ne x_2$ and $t \in (0, 1)$.
Geometric Interpretation
Given a function $f\colon C \to \mathbb{R}$ and $y^0 \in \mathbb{R}$, let us define:

- the epigraph of $f$: $\operatorname{epi}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \le y \}$;
- the subgraph of $f$: $\operatorname{sub}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \ge y \}$;
- the superior set for level $y^0$: $S(y^0) := \{ x \in C : f(x) \ge y^0 \}$;
- the inferior set for level $y^0$: $I(y^0) := \{ x \in C : f(x) \le y^0 \}$.

We then have (see Figure 3.4):

- $f$ is convex $\iff$ $\operatorname{epi}(f)$ is convex;
- $f$ is concave $\iff$ $\operatorname{sub}(f)$ is convex;
- $f$ is quasi-convex $\iff$ $I(y^0)$ is convex for every $y^0 \in \mathbb{R}$;
- $f$ is quasi-concave $\iff$ $S(y^0)$ is convex for every $y^0 \in \mathbb{R}$.
Convexity and Quasi-convexity
It is a simple matter to show that concavity (convexity) implies quasi-concavity (quasi-convexity).

Theorem 3.23 Let $C \subseteq \mathbb{R}^n$ be convex and $f\colon C \to \mathbb{R}$. If $f$ is concave on $C$, it is also quasi-concave on $C$. If $f$ is convex on $C$, it is also quasi-convex on $C$.

Proof. We only prove the first claim, and leave the second as an exercise. Suppose $f$ is concave on $C$. Take any $x, y \in C$ and $t \in [0, 1]$. Without loss of generality, we let $f(x) \ge f(y)$. By the definition of concavity, we have
$$f\big( t x + (1 - t) y \big) \ge t f(x) + (1 - t) f(y) = t \big[ f(x) - f(y) \big] + f(y) \ge f(y) = \min\{ f(x), f(y) \}.$$
Hence, $f$ is quasi-concave. ∎

⋆ Exercise 3.24 Prove the second claim in Theorem 3.23: if $f$ is convex, then it is also quasi-convex.
Concavity and the Hessian

We now characterize the concavity of a function using its Hessian matrix.

Theorem 3.25 Let $A \subseteq \mathbb{R}^n$ be open and convex. The (twice continuously differentiable) function $f\colon A \to \mathbb{R}$ is concave if and only if $H_f(x)$ is negative semidefinite for every $x \in A$.

Proof. See Mas-Colell et al. (1995, Theorem M.C.2). ∎

Example 3.26 Let $A := (0, \infty) \times (-5, \infty)$, and let $f(x, y) = \ln x + \ln(y + 5)$. For each point $(x, y) \in A$, the Hessian of $f$ is
$$H_f(x, y) = \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix}.$$
Then for each $(u, v) \in \mathbb{R}^2$, we have
$$\begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\frac{u^2}{x^2} - \frac{v^2}{(y + 5)^2} \le 0.$$
That is, $H_f(x, y)$ is negative semidefinite. Hence, $f(x, y)$ is concave. See Figure 3.5.
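The semidefiniteness claim in Example 3.26 can be spot-checked numerically: since the Hessian is diagonal, its eigenvalues are just the diagonal entries, and the quadratic form should be non-positive at every sampled point of $A$. A small sketch (the sampling ranges are our own choices):

```python
# Numeric companion to Example 3.26: H_f(x, y) is negative semidefinite on A.
import random

def hessian(x, y):
    # Hessian of f(x, y) = ln x + ln(y + 5)
    return [[-1.0 / x**2, 0.0], [0.0, -1.0 / (y + 5.0)**2]]

def quad_form(H, u, v):
    # (u, v) H (u, v)^T for a 2x2 matrix H
    return H[0][0] * u * u + (H[0][1] + H[1][0]) * u * v + H[1][1] * v * v

random.seed(2)
for _ in range(100):
    x, y = random.uniform(0.1, 30.0), random.uniform(-4.9, 30.0)   # points of A
    H = hessian(x, y)
    assert H[0][0] < 0 and H[1][1] < 0       # diagonal => eigenvalues are negative
    u, v = random.uniform(-5, 5), random.uniform(-5, 5)
    assert quad_form(H, u, v) <= 0           # the quadratic form is never positive
```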
Jensen’s Inequality
Theorem 3.27 (Jensen's Inequality) Let $f\colon C \to \mathbb{R}$, where $C \subseteq \mathbb{R}^n$ is convex. Then $f$ is convex iff for every finite set of points $x_1, \dots, x_k \in C$ and every $t = (t_1, \dots, t_k) \in \Delta^{k-1}$ (see (3.1) for the definition of $\Delta^{k-1}$),
$$f\left( \sum_{i=1}^{k} t_i x_i \right) \le \sum_{i=1}^{k} t_i f(x_i). \tag{3.9}$$

Proof. The "if" part is evident, so we only prove the "only if" part. Suppose that $f$ is convex. We shall prove (3.9) by induction on $k$. For $k = 1$ the relation is trivial (remember that $\Delta^0 = \{1\}$ when $k = 1$). For $k = 2$ the relation follows from the definition of a convex function. Suppose that $k > 2$ and that (3.9) has been established for $k - 1$. We show that (3.9) holds for $k$.
[Figure 3.5: The function $\ln x + \ln(y + 5)$ is concave.]
If $t_k = 1$, then there is nothing to prove. If $t_k < 1$, set $T := \sum_{i=1}^{k-1} t_i$. Then $T + t_k = 1$, $T = 1 - t_k > 0$, and
$$\sum_{i=1}^{k-1} \frac{t_i}{T} = \frac{T}{T} = 1.$$
Hence,
$$\begin{aligned}
f\left( \sum_{i=1}^{k} t_i x_i \right) &= f\left( \sum_{i=1}^{k-1} t_i x_i + t_k x_k \right) = f\left( T \left[ \sum_{i=1}^{k-1} \frac{t_i}{T}\, x_i \right] + t_k x_k \right) \\
&\le T f\left( \sum_{i=1}^{k-1} \frac{t_i}{T}\, x_i \right) + t_k f(x_k) \\
&\le T \left[ \sum_{i=1}^{k-1} \frac{t_i}{T}\, f(x_i) \right] + t_k f(x_k) = \sum_{i=1}^{k} t_i f(x_i),
\end{aligned}$$
where the first inequality follows from the convexity of $f$ and the second inequality follows from the inductive hypothesis. ∎
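Inequality (3.9) is easy to test numerically. The sketch below checks it for the convex function $f(x) = e^x$ with random points and random simplex weights (both our own choices).

```python
# Numeric check of Jensen's inequality (3.9) for the convex function f(x) = exp(x).
import math, random

random.seed(3)
f = math.exp
for _ in range(1000):
    k = random.randint(1, 6)
    xs = [random.uniform(-2, 2) for _ in range(k)]
    raw = [random.random() + 1e-9 for _ in range(k)]
    ts = [r / sum(raw) for r in raw]                  # weights in Delta^{k-1}
    lhs = f(sum(t * x for t, x in zip(ts, xs)))       # f of the convex combination
    rhs = sum(t * f(x) for t, x in zip(ts, xs))       # combination of the values
    assert lhs <= rhs + 1e-12                         # (3.9) holds
```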
3.5 Convexity and Optimization
Concave Programming
We first present two results which indicate the importance of convexity for optimization theory.

- In convex optimization problems, all local optima must also be global optima.
- If a strictly convex optimization problem admits a solution, the solution must be unique.

Theorem 3.28 Let $X \subseteq \mathbb{R}^n$ be convex and $f\colon X \to \mathbb{R}$ be concave. Then
(a) any local maximizer of $f$ is a global maximizer of $f$;
(b) the set $\operatorname{argmax}\{ f(x) : x \in X \}$ of maximizers of $f$ on $X$ is either empty or convex.
Proof. (a) Suppose $x$ is a local maximizer but not a global maximizer of $f$. Then there exists $\varepsilon > 0$ such that
$$f(x) \ge f(y) \quad \text{for all } y \in X \cap B(x; \varepsilon),$$
and there exists $z \in X$ such that
$$f(z) > f(x). \tag{3.10}$$
Since $X$ is convex, $t x + (1 - t) z \in X$ for all $t \in (0, 1)$. Take $\bar{t}$ sufficiently close to $1$ so that $\bar{t} x + (1 - \bar{t}) z \in B(x; \varepsilon)$. By the concavity of $f$ and (3.10), we have
$$f\big( \bar{t} x + (1 - \bar{t}) z \big) \ge \bar{t} f(x) + (1 - \bar{t}) f(z) > f(x),$$
a contradiction.

(b) Suppose that $x_1$ and $x_2$ are both maximizers of $f$ on $X$. Then $f(x_1) = f(x_2)$. For any $t \in (0, 1)$, we have
$$f(x_1) \ge f\big( t x_1 + (1 - t) x_2 \big) \ge t f(x_1) + (1 - t) f(x_2) = f(x_1).$$
That is, $f(t x_1 + (1 - t) x_2) = f(x_1)$. Thus, $t x_1 + (1 - t) x_2$ is a maximizer of $f$ on $X$. ∎
Theorem 3.29 Let $X \subseteq \mathbb{R}^n$ be convex and $f\colon X \to \mathbb{R}$ be strictly concave. Then $\operatorname{argmax}\{ f(x) : x \in X \}$ is either empty or contains a single point.

⋆ Exercise 3.30 Prove Theorem 3.29.

We now present an extremely important theorem, which says that the first-order conditions of the Kuhn-Tucker Theorem (Theorem 2.20) are both necessary and sufficient to identify the optima of a convex inequality-constrained optimization problem, provided a mild regularity condition is met.
Theorem 3.31 (Kuhn-Tucker Theorem under Convexity) Let $U \subseteq \mathbb{R}^n$ be open and convex. Let $f\colon U \to \mathbb{R}$ be a concave $C^1$ function. For $i = 1, \dots, \ell$, let $h_i\colon U \to \mathbb{R}$ be convex $C^1$ functions. Suppose there is some $\bar{x} \in U$ such that
$$h_i(\bar{x}) < 0, \quad i = 1, \dots, \ell.$$
(This is called Slater's condition.) Then $x^*$ maximizes $f$ over
$$X := \{ x \in U : h_i(x) \le 0,\ i = 1, \dots, \ell \}$$
if and only if there is $\lambda^* \in \mathbb{R}^\ell$ such that the Kuhn-Tucker first-order conditions hold:
$$\lambda^* \ge 0, \qquad \sum_{i=1}^{\ell} \lambda_i^* h_i(x^*) = 0; \tag{KTC-1}$$
$$\nabla f(x^*) = \sum_{i=1}^{\ell} \lambda_i^* \nabla h_i(x^*). \tag{KTC-2}$$

Proof. See Sundaram (1996, Section 7.7). ∎
Example 3.32 Let $U = (0, \infty) \times (-5, \infty)$. Let $f(x, y) = \ln x + \ln(y + 5)$, $h_1(x, y) = x + y - 4$ and $h_2(x, y) = -y$. Consider the problem
$$\max_{(x, y) \in U} f(x, y) \quad \text{s.t.} \quad h_1(x, y) \le 0,\ h_2(x, y) \le 0. \tag{3.11}$$

⋆ Exercise 3.33 Show that $(x^*, y^*) = (4, 0)$ is the unique point satisfying the first-order conditions for a local maximizer of the problem (3.11).

Clearly, Slater's condition holds. Then, combining Theorem 3.31, Exercise 3.33 and Example 3.26, we conclude that $(4, 0)$ is a global maximizer. See Figure 3.6.
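The conclusion of Example 3.32 can be sanity-checked by brute force. The grid search below is our own discretization, not part of the text; it confirms that the maximum of $f$ over $X$ is attained at $(4, 0)$ with value $\ln 4 + \ln 5$.

```python
# Brute-force check that (4, 0) maximizes f(x, y) = ln x + ln(y + 5)
# over X = {x > 0, y >= 0, x + y <= 4}.
import math

def f(x, y):
    return math.log(x) + math.log(y + 5)

best, arg = -math.inf, None
n = 200
for i in range(1, n + 1):
    x = 4.0 * i / n                       # x in (0, 4]
    for j in range(0, n + 1):
        y = 4.0 * j / n                   # y in [0, 4]
        if x + y <= 4.0 + 1e-12:
            val = f(x, y)
            if val > best:
                best, arg = val, (x, y)

assert arg == (4.0, 0.0)                          # the grid maximum sits at (4, 0)
assert abs(best - (math.log(4) + math.log(5))) < 1e-12
```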
Slater’s Condition
For a formal demonstration of the need for Slater's condition, let us consider the following example.
[Figure 3.6: $(4, 0)$ is the global maximizer; the figure shows the constraint set $X$ and the level curves $L_{h_1}(0)$ and $L_f(\ln 4 + \ln 5)$.]
Example 3.34 Let $U = \mathbb{R}$; let $f(x) = x$ and $g(x) = x^2$. The only point in $\mathbb{R}$ satisfying $g(x) \le 0$ is $x = 0$, so this is trivially the constrained maximizer of $f$. But
$$f'(0) = 1 \quad \text{and} \quad g'(0) = 0,$$
so there is no $\lambda \ge 0$ such that $f'(0) = \lambda g'(0)$.
Quasi-concavity and Optimization
Quasi-concave and quasi-convex functions fail to exhibit many of the sharp properties that distinguish concave and convex functions. As an example, we show that Theorem 3.31 fails for quasi-concave objective functions.

Example 3.35 Let $f\colon \mathbb{R} \to \mathbb{R}$ and $h\colon \mathbb{R} \to \mathbb{R}$ be the quasi-concave, continuously differentiable functions given by
$$f(x) = \begin{cases} x^3 & \text{if } x \in (-\infty, 0), \\ 0 & \text{if } x \in [0, 1], \\ (x - 1)^2 & \text{if } x \in (1, \infty), \end{cases}$$
and $h(x) = -x$; see Figure 3.7.

⋆ Exercise 3.36 Show that for every point $x \in [0, 1]$, there exists $\lambda \ge 0$ such that the pair $(x, \lambda)$ satisfies (KT-1) and (KT-2) (see p. 44).
Furthermore, Slater's condition holds. However, it is clear that no point $x \in [0, 1]$ can be a solution to the problem (why?).
[Figure 3.7: Quasi-concavity and optimization: the graphs of $f(x)$ and $h(x) = -x$.]
REFERENCES

[1] Apostol, Tom M. (1974) Mathematical Analysis, Pearson Education, 2nd edition. [49]

[2] Axler, Sheldon (1997) Linear Algebra Done Right, Undergraduate Texts in Mathematics, New York: Springer-Verlag, 2nd edition. [11, 55]

[3] Berkovitz, Leonard D. (2002) Convexity and Optimization in Rn, Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts, New York: Wiley-Interscience. [55]

[4] Bertsekas, Dimitri P. (2009) Convex Optimization Theory, Belmont, Massachusetts: Athena Scientific. [55]

[5] Bertsekas, Dimitri P., Angelia Nedic, and Asuman E. Ozdaglar (2003) Convex Analysis and Optimization, Belmont, Massachusetts: Athena Scientific. [55]

[6] Crossley, Martin D. (2005) Essential Topology, Springer Undergraduate Mathematics Series, London: Springer-Verlag. [7]

[7] Duggan, John (2010) "Notes on Optimization and Pareto Optimality," URL: http://www.johnduggan.net/lecture, Rochester University. [31, 48]

[8] Hiriart-Urruty, Jean-Baptiste and Claude Lemaréchal (2001) Fundamentals of Convex Analysis, Grundlehren Text Editions, Berlin: Springer-Verlag. [55]

[9] Jehle, Geoffrey A. and Philip J. Reny (2011) Advanced Microeconomic Theory, Essex, England: Prentice Hall, 3rd edition. [31, 41]

[10] Lay, Steven R. (1982) Convex Sets and Their Applications, New York: John Wiley & Sons. [55]

[11] Lee, Jeffrey M. (2009) Manifolds and Differential Geometry, Vol. 107 of Graduate Studies in Mathematics, Providence, Rhode Island: American Mathematical Society. [22]

[12] Luenberger, David G. (1969) Optimization by Vector Space Methods, Wiley Professional Paperback Series, New York: John Wiley & Sons. [31, 55, 58]

[13] Luenberger, David G. and Yinyu Ye (2008) Linear and Nonlinear Programming, International Series in Operations Research and Management Science, New York: Springer, 3rd edition. [31]

[14] Mas-Colell, Andreu, Michael D. Whinston, and Jerry R. Green (1995) Microeconomic Theory, New York: Oxford University Press. [31, 68]

[15] Milgrom, Paul (2004) Putting Auction Theory to Work, Cambridge, U.K.: Cambridge University Press. [53]

[16] Milgrom, Paul and Ilya Segal (2002) "Envelope Theorems for Arbitrary Choice Sets," Econometrica, 70 (2): 583–601. [53]

[17] Munkres, James R. (1991) Analysis on Manifolds, Boulder, Colorado: Westview Press. [1, 14, 17]

[18] Ok, Efe A. (2007) Real Analysis with Economic Applications, New Jersey: Princeton University Press. [55]

[19] O'Neill, Barrett (2006) Elementary Differential Geometry, New York: Academic Press, revised 2nd edition. [22]

[20] Pugh, Charles Chapman (2002) Real Mathematical Analysis, Undergraduate Texts in Mathematics, New York: Springer-Verlag. [17]

[21] Rockafellar, R. Tyrrell (1970) Convex Analysis, Princeton Mathematical Series, New Jersey: Princeton University Press, 2nd edition. [55]

[22] Royden, Halsey and Patrick Fitzpatrick (2010) Real Analysis, New Jersey: Prentice Hall, 4th edition. [55]

[23] Rudin, Walter (1976) Principles of Mathematical Analysis, New York: McGraw-Hill, 3rd edition. [14, 17, 27]

[24] Spivak, Michael (1965) Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Boulder, Colorado: Westview Press. [1, 14, 17]

[25] Spivak, Michael (1999) A Comprehensive Introduction to Differential Geometry, Vol. Two, Houston, Texas: Publish or Perish, Inc. [22]

[26] Sundaram, Rangarajan K. (1996) A First Course in Optimization Theory, Cambridge, U.K.: Cambridge University Press. [31, 34, 40, 41, 44, 72]

[27] Vohra, Rakesh V. (2005) Advanced Mathematical Economics, London: Routledge. [31]

[28] Willard, Stephen (2004) General Topology, New York: Dover Publications, Inc. [4]

[29] Zorich, Vladimir A. (2004) Mathematical Analysis II, Universitext, Berlin: Springer-Verlag. [49]