7/26/2019 Static Lecture 1
1/22
STATIC LECTURE 1
UNIVERSITY OF MARYLAND: ECON 600
1. Introduction
The first half of the course is about techniques for solving a class of constrained
optimization problems that, in their most general form, can be described verbally as
follows. There are some outcomes, x ∈ A, that a decision-maker cares about. The
decision-maker can only choose from a subset, C ⊆ A. What is the best choice for
the decision-maker?
Micro-economics is very much about problems of this sort:
How does a consumer select a consumption bundle from his affordable set
to maximize utility?
How does a monopolist set prices to maximize profits given demand?
How can a manager select inputs so as to minimize costs and achieve a target output?
And so on. Mathematically, this problem can be stated by assuming there exists
a real-valued function, f : A → R, which characterizes the preferences of the
decision-maker. The decision-maker then solves the problem

max_{x ∈ C} f(x)

Although the theory allows us to deal with a wide variety of different types of x,
in this course we will almost exclusively focus on the case where A = R^n and the
decision-maker is selecting an n-dimensional vector.
Date: Summer 2013.
1. The notes corresponding to the static part of the course were prepared by Prof. Vincent.
2. Notes compiled on August 2, 2013.
From your initial experience with microeconomics, you may already have been
confronted with concepts and issues in constrained optimization. Many of the
techniques you acquired in coping with these problems are covered in more detail
in this course.
The intent is two-fold. First, to give you a grounding in the theory that underlies
much of the standard constrained optimization machinery that you are already
using. Second, because the statements of some of the theorems are delicate (they
require some special conditions), you need to be aware of when you can use them
directly, when dangers arise, and of some ways of coping with those dangers.
The plan of the (first half of the) course is to establish some mathematical
preliminaries, then to start with unconstrained optimization (Part II), next the
characteristics of constraint sets (Part III), and then to turn to the most important
theory in constrained optimization, Kuhn-Tucker Theory (Part IV). After that,
applications are studied to appreciate where the theory works and where it fails.
The last section covers an important area of application of K-T Theory: how to use
it to conduct comparative statics (Part V). The second half of the course then uses
these techniques to examine dynamic optimization.
2. Preliminary Concepts
2.1. Some Examples. How do we actually go about solving an optimization
problem of the form max_x f(x)? One can imagine just programming the function
and computing the solution. But what would be a more reliable analytic approach?
You are all probably fairly confident about your ability to solve many simple
unconstrained optimization problems. Loosely speaking, you take a first derivative
and find where the slope of the function is zero.
Obviously, this is too simple, as the next figures show (Figures 2.1, 2.2, 2.3, and
2.4).
The problem in Figure 2.1 is that the solution is unbounded. No matter how
high we select x, we can still do better by choosing a higher one.
Figure 2.1. Function increases without bound
In Figure 2.2, the difficulty is that the function is not differentiable at the
optimum.
In Figure 2.3, the function is not continuous and does not achieve an optimum
at the candidate solution, x.
In Figure 2.4, there are multiple local optima. In fact, there are three solutions
to the problem: take the derivative of f(x) and find where it is equal to zero.
Other problems may also arise. For example, while we understand the
approach for the single-dimensional problem, f : R → R, how does the technique
work for the more interesting and more common multi-dimensional case, f : R^n → R?
2.2. Continuity and Linearity.
Figure 2.2. Derivative does not exist at maximum
Figure 2.3. Function is discontinuous at maximum
Figure 2.4. Function has multiple local maxima
2.2.1. Metrics. If we are focusing on the set of problems where the choice variable
is an n-dimensional real vector (x ∈ R^n), then we need to develop an idea of what it
means for two different choices, x and y, to be close to each other. That is, we
need an idea of distance, or norm, or metric.
The notion of distance or closeness is the usual common-sense idea of
Euclidean distance. For a vector x ∈ R^n, we say that the length of that vector (or,
equivalently, the distance of that vector from the origin), or its norm, or its metric,
is defined by

‖x‖ = √(x_1^2 + x_2^2 + x_3^2 + ⋯ + x_n^2)

That is, the square root of the sum of the squares of its components.
Remark 1. See SB 29.4
In fact, there are many different possible concepts of distance we could have used
even in Rn. If you notice what the norm is doing you will see that it is taking an
element from our Vector Space and giving us back a real number which we interpret
as telling us the distance or size of the element.
Figure 2.5. The Triangle Inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
More generally, then, a norm is any function that operates on a Vector Space V
and gives us a real number back, and which also satisfies the following three properties:
(1) ‖x‖ ≥ 0, and furthermore ‖x‖ = 0 if and only if x = 0
(2) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)
(3) ‖ax‖ = |a| ‖x‖ for all a ∈ R, x ∈ V
Prove for yourself that the Euclidean norm satisfies these conditions. Observe
that the triangle inequality is geometrically sensible in Figure 2.5.
Remark 2. Some other possible metrics on R^n are:
(1) ‖x‖_p = (|x_1|^p + |x_2|^p + |x_3|^p + ⋯ + |x_n|^p)^{1/p} (L^p norm)
(2) ‖x‖_1 = Σ_{i=1}^n |x_i| (taxicab norm)
(3) ‖x‖_∞ = max{|x_1|, |x_2|, |x_3|, . . . , |x_n|} (maximum norm)
Which is the appropriate choice of norm can depend partly on the context (for
example, suppose you cared about a choice of multiple lane tunnels and you needed
to make sure that the tunnel you selected had a lane wide enough to let your vehicle
pass through, which would be the appropriate norm?) and partly on convenience --
some norms have better features or are more tractable than others.
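As a quick illustration, here is a minimal Python sketch of the norms just listed; the function names are my own labels for this example, not standard library API.

```python
import math

def euclidean_norm(x):
    """||x|| = sqrt(x_1^2 + ... + x_n^2), the Euclidean (L2) norm."""
    return math.sqrt(sum(xi * xi for xi in x))

def p_norm(x, p):
    """||x||_p = (|x_1|^p + ... + |x_n|^p)^(1/p), the Lp norm."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def taxicab_norm(x):
    """||x||_1 = |x_1| + ... + |x_n|."""
    return sum(abs(xi) for xi in x)

def max_norm(x):
    """||x||_inf = max |x_i| -- the natural choice for the tunnel-lane
    example, where only the widest dimension of the vehicle matters."""
    return max(abs(xi) for xi in x)

x, y = [3.0, 4.0], [1.0, -2.0]
# Spot-check the three norm properties on the Euclidean norm:
assert euclidean_norm(x) >= 0                                      # property (1)
s = [xi + yi for xi, yi in zip(x, y)]
assert euclidean_norm(s) <= euclidean_norm(x) + euclidean_norm(y)  # (2) triangle
assert abs(euclidean_norm([-2 * xi for xi in x]) - 2 * euclidean_norm(x)) < 1e-12  # (3)
```

Note that the norms give different sizes for the same vector: for x = (3, 4), ‖x‖ = 5, ‖x‖_1 = 7, and ‖x‖_∞ = 4.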
Note that if we think of the norm of x as denoting the distance of x from the
origin, then for any two vectors, x and y, the norm of x − y is a measure of the distance
between x and y.
We have a good intuition about what continuity means for a function from the
real line to the real line (no jumps or gaps), but since we can rarely draw the graphs
of functions on more complicated spaces, we need a more precise definition.
Roughly, we want to ensure that whenever x′ is close to x in the domain space,
f(x′) is close to f(x) in the target space.
Since we focus on R^n, we will typically just use the Euclidean norm as our idea
of distance.
Definition 3. A sequence of elements, {x_n}, is said to converge to a point x ∈ R^n
if for every ε > 0 there is a number N such that for all n > N, ‖x_n − x‖ < ε.
Definition 4. A function f : R^n → R^m is said to be continuous at a point x if for
ALL sequences {x_n} converging to x, the derived sequence of points in the target
space, {f(x_n)}, converges to the point f(x). We say that a function is continuous if
it is continuous at all points in its domain.
Observe why the example in Figure 2.3 above fails the condition of continuity at
x.
Definition 5. A function f : V → W is linear if for any two real numbers a, b and
for any two elements v, v′ ∈ V we have f(av + bv′) = af(v) + bf(v′).
Note that any linear function from R^n to R^m can be represented by an m × n
matrix, A, such that f(x) = Ax. (You might also observe that this means that
f(x) is the (column) vector of numbers that results when we take the inner product
of every row of A with x.)
Note that although we sometimes call functions from R to R of the form f(x) =
mx + b linear functions, these are really affine functions. Why do these functions
not generally satisfy the definition of linear functions?
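A small numerical sketch of these two facts; the particular matrix A and the affine g below are arbitrary illustrations of my own, not examples from the notes.

```python
# A linear f : R^3 -> R^2 represented by a 2 x 3 matrix A, with f(x) = Ax;
# each entry of f(x) is the inner product of a row of A with x.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def f(x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

x = [1.0, 1.0, 1.0]
y = [2.0, 0.0, -1.0]
a, b = 2.0, -3.0
# Linearity: f(ax + by) = a f(x) + b f(y), checked component by component.
lhs = f([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = [a * fi + b * gi for fi, gi in zip(f(x), f(y))]
assert all(abs(l - r) < 1e-12 for l, r in zip(lhs, rhs))

# An affine g(x) = mx + b with b != 0 is NOT linear:
def g(x):
    return 5.0 * x + 1.0

assert g(2.0 * 1.0) != 2.0 * g(1.0)   # g(2) = 11, but 2 g(1) = 12
```

The failure of the affine g illustrates the question posed above: the intercept term b is not scaled by the coefficients a, b in the definition of linearity.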
Remark 6. In order to ensure that a solution exists to an optimization problem
(that is, to rule out problems like that shown in Figure 2.3), we generally need to
rule out decision problems where the objective function, f(x), is not continuous
in x. The typical approach is simply to assume that the problem is such that
the continuity of the objective function holds. Usually this is not controversial.
However, there are cases where it is too strong. We can sometimes make a less
strong assumption and assume that the objective function satisfies

x_n → x implies lim sup_{n→∞} f(x_n) ≤ f(x)

In this case, we say f(x) is upper semi-continuous. For an economic example
where this became an issue, see Dasgupta, P. and E. Maskin, "The Existence of
Equilibria in Discontinuous Games: Theory," Review of Economic Studies, LIII,
1986, pp. 1-26.
2.3. Vector Geometry. The next problem addressed (in the next two subsections)
is how to extend our intuition about an optimum being at a point where the objective
function has slope zero to multiple dimensions. First, we need some notions of
vector geometry and, later, some multidimensional calculus.
Definition 7. A set of vectors {v_1, v_2, . . . , v_n} is linearly independent if and only
if the only set of numbers {a_1, a_2, . . . , a_n} that satisfies the following equation is
the trivial solution where all a_i are identically 0:

a_1 v_1 + a_2 v_2 + ⋯ + a_n v_n = 0

For x, y ∈ R^n, the inner product of x and y is x · y = (x_1 y_1 + x_2 y_2 + ⋯ + x_n y_n).
Note that there is a direct relationship between the inner product and the Euclidean
norm. That is,

‖x‖^2 = x · x
Figure 2.6. Upper semi-continuous (bottom point open, top closed)
Two vectors are orthogonal to each other (geometrically, are perpendicular to
each other) if x · y = 0.
Suppose that x = (1, 0, 0). Find a vector which is orthogonal to x. Show that
if y is orthogonal to x, then ay is also orthogonal to x. Show that there are two
linearly independent vectors which are orthogonal to x.
Let v, w ∈ R^n. In matrix notation, v and w are n × 1 matrices. v′ is the
transpose of v, that is, it is the 1 × n matrix derived from v. We can thus write
the inner product of v and w as the transpose of v pre-multiplying w. That is,

v′w = Σ_{i=1}^n v_i w_i = v · w
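The orthogonality exercise above can be checked numerically; this sketch uses x = (1, 0, 0) together with my own choice of orthogonal vectors.

```python
def inner(v, w):
    """Inner product v . w = v_1 w_1 + ... + v_n w_n."""
    return sum(vi * wi for vi, wi in zip(v, w))

x = (1.0, 0.0, 0.0)
y = (0.0, 1.0, 0.0)   # one vector orthogonal to x
z = (0.0, 0.0, 1.0)   # a second, linearly independent, orthogonal vector
assert inner(x, y) == 0.0
assert inner(x, z) == 0.0

# If y is orthogonal to x, any scalar multiple ay is too:
a = -7.5
assert inner(x, tuple(a * yi for yi in y)) == 0.0

# The norm relationship ||x||^2 = x . x:
assert inner(x, x) == 1.0
```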
Note as well that in R^n, if we take any two vectors and join them at the tail,
the two vectors will define a plane (a two-dimensional flat surface) in R^n. How are
the two vectors related?
Theorem 8. (SB 10.4)
If v · w > 0, v and w form an acute angle with each other.
If v · w < 0, they form an obtuse (greater than 90 degrees) angle with each
other.
If v · w = 0, then they are perpendicular to each other. (They are orthogonal
to each other.)
2.4. Hyperplanes: Supporting and Separating.
Definition 9. A linear function is a transformation from R^n to R^m with the feature
that f(ax + by) = af(x) + bf(y) for all x, y in R^n and for all a, b in R.
Fact 10. Every linear functional (a linear function with R as the range-space) can
itself be represented by an n-dimensional vector (call it (f_1, f_2, . . . , f_n)) with the
feature that

f(x) = Σ_{i=1}^n f_i x_i

That is, the value of the functional at x is just the inner product of this defining
vector (f_1, f_2, . . . , f_n) with x.
Remark 11. Note that once we fix a domain space, for example R^n, we could ask
the question: what constitutes all of the possible linear functionals defined on that
space? Obviously this is a large set. The set of all such functionals for a given
domain space V is called the dual space of V and is often denoted V*.
Fact 12. The fact above implies that R^n is its own dual space. This symmetry,
though, does not always hold (for example, if the domain space is the vector space
of all continuous functions defined over [0, 1], the dual space is quite different), so
it is mathematically correct to continue to carefully distinguish between a domain
space and its dual space.
Definition 13. A hyperplane is the set of points given by {x : f(x) = c} where f
is a linear functional and c is some real number. (We cannot have f be the trivial
linear functional of all zeroes.)
Example. For R^2, a hyperplane is a straight line.
Example. For R^3, a hyperplane is a plane.
Intuition: Note that using the definition of a hyperplane, we can think of it as
one of the many level sets of the special linear functional, f. As we vary c, we
change level sets.
Intuition: Suppose that x, y are two points on a given hyperplane with defining
vector (f_1, f_2, . . . , f_n). Note that the vector that joins x and y, namely x − y, lies
along the hyperplane. Using the definition of the hyperplane, we can show that the
defining vector (f_1, f_2, . . . , f_n) is orthogonal to the hyperplane in the sense that it
is orthogonal to any line that joins any two points on the hyperplane. (Prove this
for yourself.)
Definition 14. A half-space is the set of points on one side or another of a
hyperplane. It can be defined formally as HS(f) = {x : f(x) ≥ c} or HS(f) = {x :
f(x) ≤ c}, where f is the linear functional that defines the hyperplane.
Now consider any two disjoint (nonintersecting) sets. When can I construct a
hyperplane that goes in between them or that separates them?
Definition 15. A hyperplane separates two sets, C_1, C_2, if for all x ∈ C_1, f(x) ≤ c,
and for all x ∈ C_2, f(x) ≥ c. That is, the two sets lie completely in the two different
half-spaces determined by the hyperplane.
In R^2, a separating hyperplane looks like the one shown in Figure 2.7.
But of course, it is not always possible to draw separating hyperplanes. Try
doing it on Figure 2.8.
Figure 2.7. A Separating hyperplane
Definition 16. If C lies in a half-space defined by H and H contains a point on
the boundary of C, then we say that H is a supporting hyperplane of C.
Recall the following important definition.
Definition 17. A set C ⊆ R^n is convex if for all x, y ∈ C and for all λ ∈ [0, 1],

λx + (1 − λ)y ∈ C
Since any given convex set can be represented as the intersection of the half-spaces
defined by all of the supporting hyperplanes of the set, we may be able to anticipate
the role of hyperplanes in optimization theory. The problem of finding the point in C
that minimizes the distance to x yields the same answer as the problem: among all
the separating hyperplanes between x and C, find the hyperplane that is farthest
away from x. Note that this hyperplane is a supporting hyperplane of C and is
orthogonal to the vector x. (See Figure 2.9)
Figure 2.8. No separating Hyperplane exists
Figure 2.9. Supporting Hyperplanes
Figure 2.10. Separating Hyperplane Examples
There are many versions of separating hyperplane theorems, but I will give just
one. (See also de la Fuente, 241-244.) (See Lecture 3 for more details on Int and
other set notation.)
Theorem 18. (Takayama pp. 39-49) Suppose X, Y are non-empty, convex sets
in R^n such that Int(Y) ∩ X = ∅ and the interior of Y is not empty. Then there
exists a vector a in R^n which is the defining vector of a separating hyperplane
between X and Y. That is, for all x ∈ X, a · x ≤ c, and for all y ∈ Y, c ≤ a · y.
Remark 19. The requirement that the interior of Y be disjoint from X allows for
the two sets to intersect on a boundary. The requirement that the interior of Y be
nonempty rules out the counterexample of two intersecting lines (see Figure 2.10).
Definition 20. The Graph of a function from V to W is the set of ordered pairs
{(v, w) : v ∈ V, w = f(v)}.
Example. The graph of f(x) = x^2 is {(x, x^2) : x ∈ R}. See Figure 2.11.
Remark 21. The graph of a function is what you normally see when you draw the
function in a Cartesian diagram.
Figure 2.11. A Graph
2.5. Derivatives, Gradients, and Subgradients. You already know that, in
well-behaved problems, a necessary condition for x to be an unconstrained maximum
of a function f is that its derivative be zero at x (if the derivative exists). Indeed,
this notion generalizes: if the partial derivatives of a function f : R^n → R exist at
x and x is an unconstrained maximum, then all the partial derivatives at x must
be zero.
The questions explored in this section are:
Why is this true?
What happens if the derivatives do not exist?
What is the geometric interpretation of this?
2.5.1. Single Dimension. For f : R → R, the formal definition of the derivative of
f at some point x is

f′(x) = lim_{h→0} [f(x + h) − f(x)] / h

where h → 0 along any sequence going to zero.
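A numerical sketch of this definition: shrinking h gives better and better approximations to f′(x). The step sizes and the cubic test function below are my own choices for illustration.

```python
def diff_quotient(f, x, h):
    """The difference quotient [f(x + h) - f(x)] / h from the definition."""
    return (f(x + h) - f(x)) / h

def f(x):
    return x ** 3          # f'(x) = 3x^2, so f'(2) = 12

# The quotient approaches 12 as h goes to 0 along a sequence:
for h in [1e-2, 1e-4, 1e-6]:
    print(h, diff_quotient(f, 2.0, h))
assert abs(diff_quotient(f, 2.0, 1e-6) - 12.0) < 1e-4

# For g(x) = |x| at x = 0 (cf. Figure 2.2) the limit does not exist:
# the quotient is +1 for h > 0 and -1 for h < 0.
assert diff_quotient(abs, 0.0, 1e-6) == 1.0
assert diff_quotient(abs, 0.0, -1e-6) == -1.0
```

The second half of the sketch is exactly the situation in Figure 2.2: the one-sided quotients converge, but to different numbers, so no derivative exists at the kink.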
Figure 2.12. The Derivative of f
Note that this object does not exist at all x for every function. Thus, we sometimes
encounter functions (even continuous functions) which do not have a derivative at
some x. Though we can often rule these out without any harm, it is also the case
that non-differentiable functions arise naturally in economic problems, so we cannot
always do this.
Informally, we think of the derivative of f at x as telling us about the slope of f.
Note that this is really a notion about the graph of f.
Another way to think about what the derivative does, which ties more directly
into optimization theory and also gives us a better clue about how to extend it to
many dimensions, is to see that it defines a supporting hyperplane to the graph of
f at the point (x̄, f(x̄)).
To see this, consider the points in the (x, y) space given by

H = {(x, y) | (f′(x̄), −1) · (x, y) = f′(x̄)x̄ − f(x̄)}

This is a hyperplane and exactly defines the tangent line drawn in the graph. It
touches (is tangent to) the graph of f at (x̄, f(x̄)).
2.5.2. Multidimensional Derivatives. The extension of the derivative to functions
f : R^n → R is fairly direct. For the ith component, the ith partial derivative of f
at x = (x_i, x_{−i}) is computed by thinking of the function f_{x_{−i}} : R → R given by

f_{x_{−i}}(x_i) = f(x_i; x_{−i})

where we treat the components x_{−i} as fixed and vary x_i. We then compute the
partial derivative of f with respect to x_i at x by computing the ordinary one-dimensional
derivative of f_{x_{−i}}. This is like taking a slice of the graph of f along
the ith dimension.

∂f(x)/∂x_i = lim_{h→0} [f(x_i + h, x_{−i}) − f(x)] / h
Definition 22. The gradient of f at x (written ∇f(x)) is the n-dimensional vector
which lists all the partial derivatives of f, if they exist:

∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n)

Definition 23. The derivative of f at x, written

Df(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n]

is the 1 × n row vector of partial derivatives, if they exist. That is, it is the
transpose of the gradient of f.
These objects are useful because if we take a small vector v = (v_1, v_2, . . . , v_n) ∈ R^n,
the vector ∇f helps us determine approximately how f changes when we move
from x in the direction of v. The sum over i = 1, . . . , n of v_i ∂f(x)/∂x_i is a very
close estimate of the change in f when we move from x to x + v. That is,

f(x + v) ≈ f(x) + Σ_{i=1}^n (∂f(x)/∂x_i) v_i
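This first-order approximation can be checked numerically. The `grad` helper below (a central-difference gradient) and the displacement v are my own illustrative choices; the function is the one drawn in Figure 2.13.

```python
def grad(f, x, h=1e-6):
    """Numerical gradient of f : R^n -> R at x via central differences
    (an illustrative helper, not part of the notes)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def f(x):
    return x[0] ** 2 + x[1] ** 2    # the function drawn in Figure 2.13

x = [1.0, 1.0]
v = [0.01, -0.02]                   # a small displacement
# First-order approximation: f(x + v) ~ f(x) + sum_i (df/dx_i) v_i
approx = f(x) + sum(gi * vi for gi, vi in zip(grad(f, x), v))
exact = f([xi + vi for xi, vi in zip(x, v)])
# The error is of order ||v||^2, so it is tiny for small v:
assert abs(exact - approx) < 1e-3
```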
Figure 2.13. The two-dimensional gradient as a supporting hyperplane
Using the gradient of f, we can write this summation term more concisely as an
inner product:

Σ_{i=1}^n (∂f(x)/∂x_i) v_i = ∇f(x) · v

or as a vector multiplication:

Σ_{i=1}^n (∂f(x)/∂x_i) v_i = v′ ∇f(x)

As in the one-dimensional case, the gradient can be interpreted as defining a
supporting hyperplane of the graph of f. The hyperplane is defined as:

H = {(x, y) | x ∈ R^n, y ∈ R, (∇f(x̄), −1) · (x, y) = ∇f(x̄) · x̄ − f(x̄)}

The object on the right side of the equality is just a real number that corresponds
to the position of the hyperplane; the left side corresponds to the slope of the
hyperplane.
The function in Figure 2.13 is the graph of f(x, y) = x^2 + y^2 and its gradient
hyperplane at the point (1, 1). The flat plane (the gradient hyperplane) just kisses
the graph at the point (1, 1, 2).
2.5.3. Second Order Derivatives. If f : R^n → R^m, then f is a vector-valued function
(that is, for any x ∈ R^n, f gives a vector in R^m). Alternatively, we can think of
f as consisting of a list of m functions, f^i : R^n → R, i = 1, 2, . . . , m, and we can
take derivatives of each of these functions as above. In this case, the gradient of f
at x is actually an n × m matrix. Each column of the matrix, say column i, is the
n-dimensional gradient of the function f^i.
This logic allows us to consider second derivatives of a function, f : R^n → R. If
f is twice continuously differentiable (written as C^2), then we can also differentiate
the gradient of f (which is a function from R^n to R^n) to get an n × n matrix of
functions:

∇^2 f(x) =
[ ∂^2 f(x)/∂x_1^2      ∂^2 f(x)/∂x_1∂x_2   . . .   ∂^2 f(x)/∂x_1∂x_n ]
[ ∂^2 f(x)/∂x_2∂x_1    ∂^2 f(x)/∂x_2^2     . . .   ∂^2 f(x)/∂x_2∂x_n ]
[       .                     .             . . .          .          ]
[ ∂^2 f(x)/∂x_n∂x_1    ∂^2 f(x)/∂x_n∂x_2   . . .   ∂^2 f(x)/∂x_n^2   ]

which is sometimes written as

∇^2 f(x) =
[ f_11   f_12   . . .   f_1n ]
[ f_21   f_22   . . .   f_2n ]
[  .      .     . . .    .   ]
[ f_n1   f_n2   . . .   f_nn ]
Definition 24. The derivative of the gradient (or Jacobian) of f is called the
Hessian of f.
Theorem 25. Young's Theorem (SB 14.5): If f is C^2, then f_ij(x) = f_ji(x). That is,
the Hessian of f is symmetric.
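Young's Theorem can be illustrated numerically with nested central differences; the test function and evaluation point below are my own choices, and the finite-difference helpers are illustrative, not standard API.

```python
def partial(f, x, i, h=1e-5):
    """Central-difference estimate of df/dx_i at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def cross_partial(f, x, i, j, h=1e-4):
    """d/dx_j of (df/dx_i): a finite-difference cross-partial."""
    return partial(lambda y: partial(f, y, i), x, j, h)

def f(x):
    return x[0] ** 3 * x[1] + x[1] ** 2   # a C^2 function; d2f/dx1 dx2 = 3 x1^2

x = [1.5, -0.5]
f12 = cross_partial(f, x, 0, 1)
f21 = cross_partial(f, x, 1, 0)
# Young's Theorem: the two cross-partials agree...
assert abs(f12 - f21) < 1e-5
# ...and both match the analytic value 3 * 1.5^2 = 6.75:
assert abs(f12 - 6.75) < 1e-3
```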
2.6. Homogeneous and Homothetic Functions. Certain functions on R^n are
especially well-behaved:
Definition 26. A function f : R^n → R is homogeneous of degree k if
f(tx_1, tx_2, . . . , tx_n) = t^k f(x) for all t ∈ R.
One reason why these functions are useful is that they are very easy to characterize.
For example, suppose we only knew the values the function takes on some ball
around the origin. Then we can use the homogeneity assumption to determine its
value everywhere in R^n. That is because, for a ball that completely surrounds the
origin, we can write any point x ∈ R^n as a scalar multiple of some point x′ on
that ball, so x = tx′. Then apply the definition.
Definition 27. If f(tx_1, tx_2, . . . , tx_n) = t f(x), we say that f is linearly homogeneous.
An important feature of homogeneous functions comes from the following theorem:
Theorem 28. Euler's Theorem (SB 20.4): If f is homogeneous of degree k, then

x · ∇f(x) = k f(x)
Proof. Use the chain rule of differentiation to get

d/dt f(tx_1, tx_2, . . . , tx_n) = ∂f(tx)/∂x_1 · x_1 + ∂f(tx)/∂x_2 · x_2 + ⋯ + ∂f(tx)/∂x_n · x_n
                                = Σ_{i=1}^n ∂f(tx)/∂x_i · x_i
                                = ∇f(tx) · x

where ∂f(tx)/∂x_i represents the partial derivative of f with respect to its ith
argument, evaluated at tx. Now note that, by homogeneity,

d/dt f(tx_1, tx_2, . . . , tx_n) = d/dt [t^k f(x_1, x_2, . . . , x_n)]
                                = k t^{k−1} f(x)

Both of these results hold for any value of t, so in particular choose t = 1. Substituting
and combining the two equations gives the result.
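Euler's Theorem is easy to verify numerically. In this sketch, the Cobb-Douglas-style exponents, the polynomial, and the `grad`/`euler_gap` helpers are all my own illustrative choices.

```python
def grad(f, x, h=1e-6):
    """Numerical gradient via central differences (illustrative helper)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def euler_gap(f, x, k):
    """x . grad f(x) - k f(x); should be ~0 if f is homogeneous of degree k."""
    return sum(xi * gi for xi, gi in zip(x, grad(f, x))) - k * f(x)

x = [2.0, 3.0]
# Cobb-Douglas-style f, homogeneous of degree 1 (a common economic example):
f1 = lambda x: x[0] ** 0.3 * x[1] ** 0.7
assert abs(euler_gap(f1, x, 1)) < 1e-6

# A degree-2 homogeneous polynomial:
f2 = lambda x: x[0] ** 2 + x[0] * x[1]
assert abs(euler_gap(f2, x, 2)) < 1e-6

# A non-homogeneous function fails the identity:
f3 = lambda x: x[0] + 1.0
assert abs(euler_gap(f3, x, 1)) > 0.5   # here x . grad f = 2 but k f(x) = 3
```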
Definition 29. A ray through x̄ ∈ R^n is defined as the set {x ∈ R^n : x = tx̄, t ∈ R}.
Geometrically, it is the line joining x̄ and the origin and extending forever in
both directions.
A useful feature of homogeneous functions is that the Jacobian or gradient of the
function is (essentially) the same along any ray. For any two points x, x′ on a
ray (x′ = tx), we have f(x′) = t^k f(x), and therefore ∇f(x′) = ∇f(tx) = t^{k−1} ∇f(x). That
is, the gradient at x′ is just a scalar multiple of the gradient at x, and so the two
vectors are linearly dependent.
This means that the level sets of the function, along any ray, have the same
slope.
Application: An important application of this is that homogeneous utility functions
rule out income effects on demand. (For constant prices, consumers demand
goods in the same proportion as income changes.) This feature of identical Jacobian
vectors is not restricted to homogeneous functions. Homothetic functions also exhibit
this feature.
Definition 30. A function f : R^n_+ → R_+ is homothetic if f(x) = h(v(x)), where
h : R_+ → R_+ is strictly increasing and v : R^n_+ → R_+ is homogeneous of degree k.
2.7. Some More Geometry of Vectors in R^n.
Theorem 31. (SB Theorems 10.3 and 14.2). Consider a continuously differentiable
function, f : R^n → R. ∇f(x) is a vector in R^n which points in the direction of
greatest increase of f moving from the point x.
Note that if we define a (small) vector v such that ∇f(x) · v = 0, then we know
that v moves us away from x in a direction that adds zero to the value of f(x).
Therefore, solving the equation v · ∇f(x) = 0 is a way of finding the level sets of
f. Geometrically, the vector v is tangent to the level set of f at x.
Also, we know that v and ∇f(x) are orthogonal (or normal) to each other (by
definition, since two vectors w, v are orthogonal if w · v = 0). Thus, the direction
of greatest increase of a function at a point x is at right angles to the level set at x.
Definition 32. Consider a function f : R^n → R. The level set of f, the set
{x : f(x) = c}, is the set of points in R^n which yield the same value c for f. The set
of points {x : f(x) ≥ c} is an upper contour set of f.
If n = 2 and f is C^1, we can solve for the level set of f by using the total
differential: let (dx, dy) satisfy

df = 0 = (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy

Then the differential equation

dy/dx = − (∂f(x, y)/∂x) / (∂f(x, y)/∂y)

along with an initial condition, y(x_0) = y_0, will trace out the level set of f through
the point (x_0, y_0).
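As a sketch of this procedure, the differential equation can be integrated numerically. Here the circle f(x, y) = x^2 + y^2 = 25 is traced from (3, 4) with simple Euler steps; the function, initial condition, and step size are my own illustrative choices.

```python
# Trace the level set of f(x, y) = x^2 + y^2 through (x0, y0) = (3, 4)
# by integrating dy/dx = -(df/dx)/(df/dy) = -x/y with Euler steps.
def fx(x, y):
    return 2 * x          # df/dx

def fy(x, y):
    return 2 * y          # df/dy

x, y = 3.0, 4.0           # initial condition: f(3, 4) = 25
dx = 1e-4
for _ in range(10000):    # march x from 3.0 down to 2.0
    slope = -fx(x, y) / fy(x, y)   # dy/dx along the level set
    x -= dx
    y -= slope * dx       # dy = slope * (change in x) = slope * (-dx)
# The traced point should still (approximately) satisfy f(x, y) = 25:
assert abs(x - 2.0) < 1e-9
assert abs((x * x + y * y) - 25.0) < 1e-2
```

The endpoint is close to (2, √21), the point on the same level set with x = 2; a smaller step size or a higher-order scheme such as Runge-Kutta would shrink the Euler error further.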