7/26/2019 Static Lecture 1
1/22
STATIC LECTURE 1
UNIVERSITY OF MARYLAND: ECON 600
1. Introduction
The first half of the course is about techniques for solving a class of constrained
optimization problems that, in their most general form, can be described verbally as
follows. There are some outcomes, x ∈ A, that a decision-maker cares about. The
decision-maker can only choose from a subset, C ⊆ A. What is the best choice for
the decision-maker?
Micro-economics is very much about problems of this sort:
How does a consumer select a consumption bundle from his affordable set
to maximize utility?
How does a monopolist set prices to maximize profits given demand?
How can a manager select inputs so as to minimize costs and achieve a target output?
And so on. Mathematically, this problem can be stated by assuming there exists
a real-valued function, f : A → R, which characterizes the preferences of the
decision-maker. The decision-maker then solves the problem

max_{x ∈ C} f(x)

Although the theory allows us to deal with a wide variety of different types of x,
in this course we will almost exclusively focus on the case where A = R^n and the
decision-maker is selecting an n-dimensional vector.
Date: Summer 2013.
1. The notes corresponding to the static part of the course were prepared by Prof. Vincent.
2. Notes compiled on August 2, 2013.
From your initial experience with microeconomics, you may already have been
confronted with concepts and issues in constrained optimization. Many of the
techniques you acquired in coping with these problems are covered in more detail
in this course.
The intent is two-fold. First, to give you a grounding in the theory that underlies
much of the standard constrained optimization machinery that you are already
using. Second, because the statements of some of the theorems are delicate (they
require some special conditions), you need to be aware of when you can use them
directly, when dangers arise, and of some ways of coping with those dangers.
The plan of the (first half of the) course is to establish some mathematical
preliminaries, then to start with unconstrained optimization (Part II), next the
characteristics of constraint sets (Part III), and then to turn to the most important
theory in constrained optimization, Kuhn-Tucker Theory (Part IV). After that,
applications are studied to appreciate where the theory works and where it fails.
The last section covers an important area of application of K-T Theory: how to use
it to conduct comparative statics (Part V). The second half of the course then uses
these techniques to examine dynamic optimization.
2. Preliminary Concepts
2.1. Some Examples. How do we actually go about solving an optimization
problem of the form max_x f(x)? One can imagine just programming the function
and computing the solution. But what would be a more reliable analytic approach?
You are all probably fairly confident about your ability to solve many simple
unconstrained optimization problems. Loosely speaking, you take a first derivative
and find where the slope of the function is zero.
Obviously, this is too simple, as the next figures show (Figures 2.1, 2.2, 2.3, and
2.4).
The problem in Figure 2.1 is that the solution is unbounded. No matter how
high we select x, we can still do better by choosing a higher one.
Figure 2.1. Function increases without bound
In Figure 2.2, the difficulty is that the function is not differentiable at the
optimum.
In Figure 2.3, the function is not continuous and does not achieve an optimum
at the candidate solution, x.
In Figure 2.4, there are multiple local optima. In fact, there are three solutions
to the problem: take the derivative of f(x) and find where it is equal to zero.
Other problems may also arise. For example, while we understand the
approach for the single-dimensional problem, f : R → R, how does the technique
work for the more interesting and more common multi-dimensional case, f : R^n → R?
2.2. Continuity and Linearity.
Figure 2.2. Derivative does not exist at maximum
Figure 2.3. Function is discontinuous at maximum
Figure 2.4. Function has multiple local maxima
2.2.1. Metrics. If we are focusing on the set of problems where the choice variable
is an n-dimensional real vector (x ∈ R^n), then we need to develop an idea of what it
means for two different choices, x and y, to be close to each other. That is, we
need an idea of distance, or norm, or metric.
The notion of distance or closeness is the usual common-sense idea of
Euclidean distance. For a vector x ∈ R^n, we say that the length of that vector (or,
equivalently, the distance of that vector from the origin), or its norm, or its metric,
is defined by

‖x‖ = √(x_1^2 + x_2^2 + x_3^2 + ⋯ + x_n^2)

That is, the square root of the sum of the squares of its components.
Remark 1. See SB 29.4
In fact, there are many different possible concepts of distance we could have used
even in Rn. If you notice what the norm is doing you will see that it is taking an
element from our Vector Space and giving us back a real number which we interpret
as telling us the distance or size of the element.
Figure 2.5. The Triangle Inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
More generally, then, a norm is any function that operates on a Vector Space V
and gives us a real number back, and which also satisfies the following three properties:
(1) ‖x‖ ≥ 0, and furthermore ‖x‖ = 0 if and only if x = 0
(2) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)
(3) ‖ax‖ = |a| ‖x‖ for all a ∈ R, x ∈ V
Prove for yourself that the Euclidean norm satisfies these conditions. Observe
that the triangle inequality is geometrically sensible in Figure 2.5.
Remark 2. Some other possible metrics on R^n are:
(1) ‖x‖_p = (|x_1|^p + |x_2|^p + |x_3|^p + ⋯ + |x_n|^p)^{1/p} (L^p norm)
(2) ‖x‖_1 = Σ_{i=1}^n |x_i| (taxicab norm)
(3) ‖x‖_∞ = max{|x_1|, |x_2|, |x_3|, . . . , |x_n|} (maximum norm)
Which is the appropriate choice of norm can depend partly on the context (for
example, suppose you cared about a choice of multiple lane tunnels and you needed
to make sure that the tunnel you selected had a lane wide enough to let your vehicle
pass through, which would be the appropriate norm?) and partly on convenience --
some norms have better features or are more tractable than others.
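As a quick illustration, here is a minimal Python sketch of the norms just listed; the function names are my own labels for this example, not standard library API.

```python
import math

def euclidean_norm(x):
    """||x|| = sqrt(x_1^2 + ... + x_n^2), the Euclidean (L2) norm."""
    return math.sqrt(sum(xi * xi for xi in x))

def p_norm(x, p):
    """||x||_p = (|x_1|^p + ... + |x_n|^p)^(1/p), the Lp norm."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def taxicab_norm(x):
    """||x||_1 = |x_1| + ... + |x_n|."""
    return sum(abs(xi) for xi in x)

def max_norm(x):
    """||x||_inf = max |x_i| -- the natural choice for the tunnel-lane
    example, where only the widest dimension of the vehicle matters."""
    return max(abs(xi) for xi in x)

x, y = [3.0, 4.0], [1.0, -2.0]
# Spot-check the three norm properties on the Euclidean norm:
assert euclidean_norm(x) >= 0                                      # property (1)
s = [xi + yi for xi, yi in zip(x, y)]
assert euclidean_norm(s) <= euclidean_norm(x) + euclidean_norm(y)  # (2) triangle
assert abs(euclidean_norm([-2 * xi for xi in x]) - 2 * euclidean_norm(x)) < 1e-12  # (3)
```

Note that the norms give different sizes for the same vector: for x = (3, 4), ‖x‖ = 5, ‖x‖_1 = 7, and ‖x‖_∞ = 4.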
Note that if we think of the norm of x as denoting the distance of x from the
origin, then for any two vectors, x and y, the norm of x − y is a measure of the distance
between x and y.
We have a good intuition about what continuity means for a function from the
real line to the real line (no jumps or gaps), but since we can rarely draw the graphs
of functions on more complicated spaces, we need a more precise definition.
Roughly, we want to ensure that whenever x′ is close to x in the domain space,
f(x′) is close to f(x) in the target space.
Since we focus on R^n, we will typically just use the Euclidean norm as our idea
of distance.
Definition 3. A sequence of elements, {x_n}, is said to converge to a point x ∈ R^n
if for every ε > 0 there is a number N such that for all n > N, ‖x_n − x‖ < ε.
Definition 4. A function f : R^n → R^m is said to be continuous at a point x if for
ALL sequences {x_n} converging to x, the derived sequence of points in the target
space, {f(x_n)}, converges to the point f(x). We say that a function is continuous if
it is continuous at all points in its domain.
Observe why the example in Figure 2.3 above fails the condition of continuity at
x.
Definition 5. A function f : V → W is linear if for any two real numbers a, b and
for any two elements v, v′ ∈ V we have f(av + bv′) = af(v) + bf(v′).
Note that any linear function from R^n to R^m can be represented by an m × n
matrix, A, such that f(x) = Ax. (You might also observe that this means that
f(x) is the (column) vector of numbers that results when we take the inner product
of every row of A with x.)
Note that although we sometimes call functions from R to R of the form f(x) =
mx + b linear functions, these are really affine functions. Why do these functions
not generally satisfy the definition of linear functions?
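A small numerical sketch of these two facts; the particular matrix A and the affine g below are arbitrary illustrations of my own, not examples from the notes.

```python
# A linear f : R^3 -> R^2 represented by a 2 x 3 matrix A, with f(x) = Ax;
# each entry of f(x) is the inner product of a row of A with x.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def f(x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

x = [1.0, 1.0, 1.0]
y = [2.0, 0.0, -1.0]
a, b = 2.0, -3.0
# Linearity: f(ax + by) = a f(x) + b f(y), checked component by component.
lhs = f([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = [a * fi + b * gi for fi, gi in zip(f(x), f(y))]
assert all(abs(l - r) < 1e-12 for l, r in zip(lhs, rhs))

# An affine g(x) = mx + b with b != 0 is NOT linear:
def g(x):
    return 5.0 * x + 1.0

assert g(2.0 * 1.0) != 2.0 * g(1.0)   # g(2) = 11, but 2 g(1) = 12
```

The failure of the affine g illustrates the question posed above: the intercept term b is not scaled by the coefficients a, b in the definition of linearity.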
Remark 6. In order to ensure that a solution exists to an optimization problem
(that is, to rule out problems like that shown in Figure 2.3), we generally need to
rule out decision problems where the objective function, f(x), is not continuous
in x. The typical approach is simply to assume that the problem is such that
the continuity of the objective function holds. Usually this is not controversial.
However, there are cases where it is too strong. We can sometimes make a less
strong assumption and assume that the objective function satisfies

x_n → x implies lim sup_{n→∞} f(x_n) ≤ f(x)

In this case, we say f(x) is upper semi-continuous. For an economic example
where this became an issue, see Dasgupta, P. and E. Maskin, "The Existence of
Equilibria in Discontinuous Games: Theory," Review of Economic Studies, LIII,
1986, pp. 1-26.
2.3. Vector Geometry. The next problem addressed (in the next two subsections)
is how to extend our intuition about an optimum being at a point where the objective
function has slope zero to multiple dimensions. First, we need some notions of
vector geometry and, later, some multidimensional calculus.
Definition 7. A set of vectors {v_1, v_2, . . . , v_n} is linearly independent if and only
if the only set of numbers {a_1, a_2, . . . , a_n} that satisfies the following equation is
the trivial solution where all a_i are identically 0:

a_1 v_1 + a_2 v_2 + ⋯ + a_n v_n = 0

For x, y ∈ R^n, the inner product of x and y is x · y = (x_1 y_1 + x_2 y_2 + ⋯ + x_n y_n).
Note that there is a direct relationship between the inner product and the Euclidean
norm. That is,

‖x‖^2 = x · x
Figure 2.6. Upper semi-continuous (bottom point open, top closed)
Two vectors are orthogonal to each other (geometrically, are perpendicular to
each other) if x · y = 0.
Suppose that x = (1, 0, 0). Find a vector which is orthogonal to x. Show that
if y is orthogonal to x, then ay is also orthogonal to x. Show that there are two
linearly independent vectors which are orthogonal to x.
Let v, w ∈ R^n. In matrix notation, v and w are n × 1 matrices. v′ is the
transpose of v, that is, it is the 1 × n matrix derived from v. We can thus write
the inner product of v and w as the transpose of v pre-multiplying w. That is,

v′w = Σ_{i=1}^n v_i w_i = v · w
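The orthogonality exercise above can be checked numerically; this sketch uses x = (1, 0, 0) together with my own choice of orthogonal vectors.

```python
def inner(v, w):
    """Inner product v . w = v_1 w_1 + ... + v_n w_n."""
    return sum(vi * wi for vi, wi in zip(v, w))

x = (1.0, 0.0, 0.0)
y = (0.0, 1.0, 0.0)   # one vector orthogonal to x
z = (0.0, 0.0, 1.0)   # a second, linearly independent, orthogonal vector
assert inner(x, y) == 0.0
assert inner(x, z) == 0.0

# If y is orthogonal to x, any scalar multiple ay is too:
a = -7.5
assert inner(x, tuple(a * yi for yi in y)) == 0.0

# The norm relationship ||x||^2 = x . x:
assert inner(x, x) == 1.0
```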
Note as well that in R^n, if we take any two vectors and join them at the tail,
the two vectors will define a plane (a two-dimensional flat surface) in R^n. How are
the two vectors related?
Theorem 8. (SB 10.4)
If v · w > 0, v and w form an acute angle with each other.
If v · w < 0, they form an obtuse (greater than 90 degrees) angle with each
other.
If v · w = 0, then they are perpendicular to each other. (They are orthogonal
to each other.)
2.4. Hyperplanes: Supporting and Separating.
Definition 9. A linear function is a transformation from R^n to R^m with the feature
that f(ax + by) = af(x) + bf(y) for all x, y in R^n and for all a, b in R.
Fact 10. Every linear functional (a linear function with R as the range-space) can
itself be represented by an n-dimensional vector (call it (f_1, f_2, . . . , f_n)) with the
feature that

f(x) = Σ_{i=1}^n f_i x_i

That is, the value of the functional at x is just the inner product of this defining
vector (f_1, f_2, . . . , f_n) with x.
Remark 11. Note that once we fix a domain space, for example R^n, we could ask
the question: what constitutes all of the possible linear functionals defined on that
space? Obviously this is a large set. The set of all such functionals for a given
domain space V is called the dual space of V and is often denoted V*.
Fact 12. The fact above implies that R^n is its own dual space. This symmetry,
though, does not always hold (for example, if the domain space is the vector space
of all continuous functions defined over [0, 1], the dual space is quite different), so
it is mathematically correct to continue to carefully distinguish between a domain
space and its dual space.
Definition 13. A hyperplane is the set of points given by {x : f(x) = c} where f
is a linear functional and c is some real number. (We cannot have f be the trivial
linear functional of all zeroes.)
Example. For R^2, a hyperplane is a straight line.
Example. For R^3, a hyperplane is a plane.
Intuition: Note that using the definition of a hyperplane, we can think of it as
one of the many level sets of the special linear functional, f. As we vary c, we
change level sets.
Intuition: Suppose that x, y are two points on a given hyperplane with defining
vector (f_1, f_2, . . . , f_n). Note that the vector that joins x and y, namely x − y, lies
along the hyperplane. Using the definition of the hyperplane, we can show that the
defining vector (f_1, f_2, . . . , f_n) is orthogonal to the hyperplane in the sense that it
is orthogonal to any line that joins any two points on the hyperplane. (Prove this
for yourself.)
Definition 14. A half-space is the set of points on one side or another of a
hyperplane. It can be defined formally as HS(f) = {x : f(x) ≥ c} or HS(f) = {x :
f(x) ≤ c}, where f is the linear functional that defines the hyperplane.
Now consider any two disjoint (nonintersecting) sets. When can I construct a
hyperplane that goes in between them or that separates them?
Definition 15. A hyperplane separates two sets, C_1, C_2, if for all x ∈ C_1, f(x) ≤ c,
and for all x ∈ C_2, f(x) ≥ c. That is, the two sets lie completely in the two different
half-spaces determined by the hyperplane.
In R^2, a separating hyperplane looks like the one shown in Figure 2.7.
But of course, it is not always possible to draw separating hyperplanes. Try
doing it on Figure 2.8.
Figure 2.7. A Separating hyperplane
Definition 16. If C lies in a half-space defined by H and H contains a point on
the boundary of C, then we say that H is a supporting hyperplane of C.
Recall the following important definition.
Definition 17. A set C ⊆ R^n is convex if for all x, y ∈ C and for all λ ∈ [0, 1],

λx + (1 − λ)y ∈ C
Since any given convex set can be represented as the intersection of the half-spaces
defined by all of the supporting hyperplanes of the set, we may be able to anticipate
the role of hyperplanes in optimization theory. The problem of finding the point in C
that minimizes the distance to x yields the same answer as the problem: among all
the separating hyperplanes between x and C, find the hyperplane that is farthest
away from x. Note that this hyperplane is a supporting hyperplane of C and is
orthogonal to the vector x. (See Figure 2.9)
Figure 2.8. No separating Hyperplane exists
Figure 2.9. Supporting Hyperplanes
Figure 2.10. Separating Hyperplane Examples
There are many versions of separating hyperplane theorems, but I will give just
one. (See also de la Fuente, 241-244.) (See Lecture 3 for more details on Int and
other set notation.)
Theorem 18. (Takayama pp. 39-49) Suppose X, Y are non-empty, convex sets
in R^n such that Int(Y) ∩ X = ∅ and the interior of Y is not empty. Then there
exists a vector a in R^n which is the defining vector of a separating hyperplane
between X and Y. That is, for all x ∈ X, a · x ≤ c, and for all y ∈ Y, c ≤ a · y.
Remark 19. The requirement that the interior of Y be disjoint from X allows for
the two sets to intersect on a boundary. The requirement that the interior of Y be
nonempty rules out the counterexample of two intersecting lines (see Figure 2.10).
Definition 20. The Graph of a function from V to W is the set of ordered pairs
{(v, w) : v ∈ V, w = f(v)}.
Example. The graph of f(x) = x^2 is {(x, x^2) : x ∈ R}. See Figure 2.11.
Remark 21. The graph of a function is what you normally see when you draw the
function in a Cartesian diagram.
Figure 2.11. A Graph
2.5. Derivatives, Gradients, and Subgradients. You already know that, in
well-behaved problems, a necessary condition for x to be an unconstrained maximum
of a function f is that its derivative be zero at x (if the derivative exists). Indeed,
this notion generalizes: if the partial derivatives of a function f : R^n → R exist at
x and x is an unconstrained maximum, then all the partial derivatives at x must
be zero.
The questions explored in this section are:
Why is this true?
What happens if the derivatives do not exist?
What is the geometric interpretation of this?
2.5.1. Single Dimension. For f : R → R, the formal definition of the derivative of
f at some point x is

f′(x) = lim_{h→0} [f(x + h) − f(x)] / h

where h → 0 along any sequence going to zero.
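A numerical sketch of this definition: shrinking h gives better and better approximations to f′(x). The step sizes and the cubic test function below are my own choices for illustration.

```python
def diff_quotient(f, x, h):
    """The difference quotient [f(x + h) - f(x)] / h from the definition."""
    return (f(x + h) - f(x)) / h

def f(x):
    return x ** 3          # f'(x) = 3x^2, so f'(2) = 12

# The quotient approaches 12 as h goes to 0 along a sequence:
for h in [1e-2, 1e-4, 1e-6]:
    print(h, diff_quotient(f, 2.0, h))
assert abs(diff_quotient(f, 2.0, 1e-6) - 12.0) < 1e-4

# For g(x) = |x| at x = 0 (cf. Figure 2.2) the limit does not exist:
# the quotient is +1 for h > 0 and -1 for h < 0.
assert diff_quotient(abs, 0.0, 1e-6) == 1.0
assert diff_quotient(abs, 0.0, -1e-6) == -1.0
```

The second half of the sketch is exactly the situation in Figure 2.2: the one-sided quotients converge, but to different numbers, so no derivative exists at the kink.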
Figure 2.12. The Derivative of f
Note that this object does not exist at all x for every function. Thus, we sometimes
encounter functions (even continuous functions) which do not have a derivative at
some x. Though we can often rule these out without any harm, it is also the case
that non-differentiable functions arise naturally in economic problems, so we cannot
always do this.
Informally, we think of the derivative of f at x as telling us about the slope of f.
Note that this is really a notion about the graph of f.
Another way to think about what the derivative does, which ties more directly
into optimization theory and also gives us a better clue about how to extend it to
many dimensions, is to see that it defines a supporting hyperplane to the graph of
f at the point (x̄, f(x̄)).
To see this, consider the points in the (x, y) space given by

H = {(x, y) | (f′(x̄), −1) · (x, y) = f′(x̄)x̄ − f(x̄)}

This is a hyperplane and exactly defines the tangent line drawn in the graph. It
touches (is tangent to) the graph of f at (x̄, f(x̄)).
2.5.2. Multidimensional Derivatives. The extension of the derivative to functions
f : R^n → R is fairly direct. For the ith component, the ith partial derivative of f
at x = (x_i, x_{−i}) is computed by thinking of the function f_{x_{−i}} : R → R given by

f_{x_{−i}}(x_i) = f(x_i; x_{−i})

where we treat the components x_{−i} as fixed and vary x_i. We then compute the
partial derivative of f with respect to x_i at x by computing the ordinary one-dimensional
derivative of f_{x_{−i}}. This is like taking a slice of the graph of f along
the ith dimension.

∂f(x)/∂x_i = lim_{h→0} [f(x_i + h, x_{−i}) − f(x)] / h
Definition 22. The gradient of f at x (written ∇f(x)) is the n-dimensional vector
which lists all the partial derivatives of f, if they exist:

∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n)

Definition 23. The derivative of f at x, written

Df(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n]

is the 1 × n row vector of partial derivatives, if they exist. That is, it is the
transpose of the gradient of f.
These objects are useful because if we take a small vector v = (v_1, v_2, . . . , v_n) ∈ R^n,
the vector ∇f helps us determine approximately how f changes when we move
from x in the direction of v. The sum over i = 1, . . . , n of v_i ∂f(x)/∂x_i is a very
close estimate of the change in f when we move from x to x + v. That is,

f(x + v) ≈ f(x) + Σ_{i=1}^n (∂f(x)/∂x_i) v_i
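This first-order approximation can be checked numerically. The `grad` helper below (a central-difference gradient) and the displacement v are my own illustrative choices; the function is the one drawn in Figure 2.13.

```python
def grad(f, x, h=1e-6):
    """Numerical gradient of f : R^n -> R at x via central differences
    (an illustrative helper, not part of the notes)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def f(x):
    return x[0] ** 2 + x[1] ** 2    # the function drawn in Figure 2.13

x = [1.0, 1.0]
v = [0.01, -0.02]                   # a small displacement
# First-order approximation: f(x + v) ~ f(x) + sum_i (df/dx_i) v_i
approx = f(x) + sum(gi * vi for gi, vi in zip(grad(f, x), v))
exact = f([xi + vi for xi, vi in zip(x, v)])
# The error is of order ||v||^2, so it is tiny for small v:
assert abs(exact - approx) < 1e-3
```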
Figure 2.13. The two-dimensional gradient as a supporting hyperplane
Using the gradient of f, we can write this summation term more concisely as an
inner product:

Σ_{i=1}^n (∂f(x)/∂x_i) v_i = ∇f(x) · v

or as a vector multiplication:

Σ_{i=1}^n (∂f(x)/∂x_i) v_i = v′ ∇f(x)

As in the one-dimensional case, the gradient can be interpreted as defining a
supporting hyperplane of the graph of f. The hyperplane is defined as:

H = {(x, y) | x ∈ R^n, y ∈ R, (∇f(x̄), −1) · (x, y) = ∇f(x̄) · x̄ − f(x̄)}

The object on the right side of the equality is just a real number that corresponds
to the position of the hyperplane; the left side corresponds to the slope of the
hyperplane.
The function in Figure 2.13 is the graph of f(x, y) = x^2 + y^2 and its gradient
hyperplane at the point (1, 1). The flat plane (the gradient hyperplane) just kisses
the graph at the point (1, 1, 2).
2.5.3. Second Order Derivatives. If f : R^n → R^m, then f is a vector-valued function
(that is, for any x ∈ R^n, f gives a vector in R^m). Alternatively, we can think of
f as consisting of a list of m functions, f^i : R^n → R, i = 1, 2, . . . , m, and we can
take derivatives of each of these functions as above. In this case, the gradient of f
at x is actually an n × m matrix. Each column of the matrix, say column i, is the
n-dimensional gradient of the function f^i.
This logic allows us to consider second derivatives of a function, f : R^n → R. If
f is twice continuously differentiable (written as C^2), then we can also differentiate
the gradient of f (which is a function from R^n to R^n) to get an n × n matrix of
functions:

∇^2 f(x) =
[ ∂^2 f(x)/∂x_1^2      ∂^2 f(x)/∂x_1∂x_2   . . .   ∂^2 f(x)/∂x_1∂x_n ]
[ ∂^2 f(x)/∂x_2∂x_1    ∂^2 f(x)/∂x_2^2     . . .   ∂^2 f(x)/∂x_2∂x_n ]
[       .                     .             . . .          .          ]
[ ∂^2 f(x)/∂x_n∂x_1    ∂^2 f(x)/∂x_n∂x_2   . . .   ∂^2 f(x)/∂x_n^2   ]

which is sometimes written as

∇^2 f(x) =
[ f_11   f_12   . . .   f_1n ]
[ f_21   f_22   . . .   f_2n ]
[  .      .     . . .    .   ]
[ f_n1   f_n2   . . .   f_nn ]
Definition 24. The derivative of the gradient (or Jacobian) of f is called the
Hessian of f.
Theorem 25. Young's Theorem (SB 14.5): If f is C^2, then f_ij(x) = f_ji(x). That is,
the Hessian of f is symmetric.
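Young's Theorem can be illustrated numerically with nested central differences; the test function and evaluation point below are my own choices, and the finite-difference helpers are illustrative, not standard API.

```python
def partial(f, x, i, h=1e-5):
    """Central-difference estimate of df/dx_i at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def cross_partial(f, x, i, j, h=1e-4):
    """d/dx_j of (df/dx_i): a finite-difference cross-partial."""
    return partial(lambda y: partial(f, y, i), x, j, h)

def f(x):
    return x[0] ** 3 * x[1] + x[1] ** 2   # a C^2 function; d2f/dx1 dx2 = 3 x1^2

x = [1.5, -0.5]
f12 = cross_partial(f, x, 0, 1)
f21 = cross_partial(f, x, 1, 0)
# Young's Theorem: the two cross-partials agree...
assert abs(f12 - f21) < 1e-5
# ...and both match the analytic value 3 * 1.5^2 = 6.75:
assert abs(f12 - 6.75) < 1e-3
```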
2.6. Homogeneous and Homothetic Functions. Certain functions on R^n are
especially well-behaved:
Definition 26. A function f : R^n → R is homogeneous of degree k if
f(tx_1, tx_2, . . . , tx_n) = t^k f(x) for all t ∈ R.
One reason why these functions are useful is that they are very easy to characterize.
For example, suppose we only knew the values the function takes on some ball
around the origin. Then we can use the homogeneity assumption to determine its
value everywhere in R^n. That is because, for a ball that completely surrounds the
origin, we can write any point x ∈ R^n as a scalar multiple of some point x′ on
that ball, so x = tx′. Then apply the definition.
Definition 27. If f(tx_1, tx_2, . . . , tx_n) = t f(x), we say that f is linearly homogeneous.
An important feature of homogeneous functions comes from the following theorem:
Theorem 28. Euler's Theorem (SB 20.4): If f is homogeneous of degree k, then

x · ∇f(x) = k f(x)
Proof. Use the chain rule of differentiation to get

d/dt f(tx_1, tx_2, . . . , tx_n) = ∂f(tx)/∂x_1 · x_1 + ∂f(tx)/∂x_2 · x_2 + ⋯ + ∂f(tx)/∂x_n · x_n
                                = Σ_{i=1}^n ∂f(tx)/∂x_i · x_i
                                = ∇f(tx) · x

where ∂f(tx)/∂x_i represents the partial derivative of f with respect to its ith
argument, evaluated at tx. Now note that, by homogeneity,

d/dt f(tx_1, tx_2, . . . , tx_n) = d/dt [t^k f(x_1, x_2, . . . , x_n)]
                                = k t^{k−1} f(x)

Both of these results hold for any value of t, so in particular choose t = 1. Substituting
and combining the two equations gives the result.
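Euler's Theorem is easy to verify numerically. In this sketch, the Cobb-Douglas-style exponents, the polynomial, and the `grad`/`euler_gap` helpers are all my own illustrative choices.

```python
def grad(f, x, h=1e-6):
    """Numerical gradient via central differences (illustrative helper)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def euler_gap(f, x, k):
    """x . grad f(x) - k f(x); should be ~0 if f is homogeneous of degree k."""
    return sum(xi * gi for xi, gi in zip(x, grad(f, x))) - k * f(x)

x = [2.0, 3.0]
# Cobb-Douglas-style f, homogeneous of degree 1 (a common economic example):
f1 = lambda x: x[0] ** 0.3 * x[1] ** 0.7
assert abs(euler_gap(f1, x, 1)) < 1e-6

# A degree-2 homogeneous polynomial:
f2 = lambda x: x[0] ** 2 + x[0] * x[1]
assert abs(euler_gap(f2, x, 2)) < 1e-6

# A non-homogeneous function fails the identity:
f3 = lambda x: x[0] + 1.0
assert abs(euler_gap(f3, x, 1)) > 0.5   # here x . grad f = 2 but k f(x) = 3
```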
Definition 29. A ray through x̄ ∈ R^n is defined as the set {x ∈ R^n : x = tx̄, t ∈ R}.
Geometrically, it is the line joining x̄ and the origin and extending forever in
both directions.
A useful feature of homogeneous functions is that the Jacobian or gradient of the
function is (essentially) the same along any ray. For any two points x, x′ on a
ray (x′ = tx), we have f(x′) = t^k f(x), and therefore ∇f(x′) = ∇f(tx) = t^{k−1} ∇f(x). That
is, the gradient at x′ is just a scalar multiple of the gradient at x, and so the two
vectors are linearly dependent.
This means that the level sets of the function, along any ray, have the same
slope.
Application: An important application of this is that homogeneous utility functions
rule out income effects on demand. (For constant prices, consumers demand
goods in the same proportion as income changes.) This feature of identical Jacobian
vectors is not restricted to homogeneous functions. Homothetic functions also exhibit
this feature.
Definition 30. A function f : R^n_+ → R_+ is homothetic if f(x) = h(v(x)), where
h : R_+ → R_+ is strictly increasing and v : R^n_+ → R_+ is homogeneous of degree k.
2.7. Some More Geometry of Vectors in R^n.
Theorem 31. (SB Theorems 10.3 and 14.2). Consider a continuously differentiable
function, f : R^n → R. ∇f(x) is a vector in R^n which points in the direction of
greatest increase of f moving from the point x.
Note that if we define a (small) vector v such that ∇f(x) · v = 0, then we know
that v moves us away from x in a direction that adds zero to the value of f(x).
Therefore, solving the equation v · ∇f(x) = 0 is a way of finding the level sets of
f. Geometrically, the vector v is tangent to the level set of f at x.
Also, we know that v and ∇f(x) are orthogonal (or normal) to each other (by
definition, since two vectors w, v are orthogonal if w · v = 0). Thus, the direction
of greatest increase of a function at a point x is at right angles to the level set at x.
Definition 32. Consider a function f : R^n → R. The level set of f, the set
{x : f(x) = c}, is the set of points in R^n which yield the same value c for f. The set
of points {x : f(x) ≥ c} is an upper contour set of f.
If n = 2 and f is C^1, we can solve for the level set of f by using the total
differential: let (dx, dy) satisfy

df = 0 = (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy

Then the differential equation

dy/dx = − (∂f(x, y)/∂x) / (∂f(x, y)/∂y)

along with an initial condition, y(x_0) = y_0, will trace out the level set of f through
the point (x_0, y_0).
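As a sketch of this procedure, the differential equation can be integrated numerically. Here the circle f(x, y) = x^2 + y^2 = 25 is traced from (3, 4) with simple Euler steps; the function, initial condition, and step size are my own illustrative choices.

```python
# Trace the level set of f(x, y) = x^2 + y^2 through (x0, y0) = (3, 4)
# by integrating dy/dx = -(df/dx)/(df/dy) = -x/y with Euler steps.
def fx(x, y):
    return 2 * x          # df/dx

def fy(x, y):
    return 2 * y          # df/dy

x, y = 3.0, 4.0           # initial condition: f(3, 4) = 25
dx = 1e-4
for _ in range(10000):    # march x from 3.0 down to 2.0
    slope = -fx(x, y) / fy(x, y)   # dy/dx along the level set
    x -= dx
    y -= slope * dx       # dy = slope * (change in x) = slope * (-dx)
# The traced point should still (approximately) satisfy f(x, y) = 25:
assert abs(x - 2.0) < 1e-9
assert abs((x * x + y * y) - 25.0) < 1e-2
```

The endpoint is close to (2, √21), the point on the same level set with x = 2; a smaller step size or a higher-order scheme such as Runge-Kutta would shrink the Euler error further.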