notation - university of pittsburghpitt.edu/~jrclass/opt/notes1.pdf · 2018-08-27 · 1 notation r...

1

Notation

R the set of real numbers

nR the set of real n-vectors

m nR the set of real m×n matrices

( )n n R R nonegative (positive) orthant of nR , i.e., the set of all vectors in nR

all the coordinates of which are nonnegative (positive)

xX x is an element of the set X

xX x is not an element of the set X

XY the set X is a subset of the set Y; X=Y is not included

X=Y the sets X and Y coincide

XY the union of sets X and Y

XY the intersection of sets X and Y

X\Y the set of all xX such that xY

XY The Cartesian product of sets X and Y, i.e., the set of pairs (x,y) where xX and yY

{x:T} or {x|T} the set of all x satisfying condition T

int(X), relint(X) The interior (or relative interior) of set X

cl(X) or X The closure of set X

bd(X) or X The boundary of set X

conv(X) The convex hull of set X

aff(X) The affine hull of set X

x reads “for all x”

x reads “there exists x”

dis(x,y) distance between two points x and y

dis(x,X) distance between point x and set Y: inf∈

,

N(x) the open neighborhood of the point x

N(X) the open neighborhood of the set X

N(X) the -neighborhood of the set X: {x: dis(x,X) <

X= the set X is empty

(a,b) satisfies the condition a<<b

2

[a,b] satisfies the condition ab

f:XY a one-to-one mapping of X onto Y; a function f with domain X and range Y

x nR the vector x is an element of the space nR

xT, AT the transpose of the vector x or matrix A

‖ ‖ the norm of the vector x; typically, the Euclidean norm is meant

|A| the determinant of matrix A

x0 (>0) the coordinates of x are nonnegative (positive)

x≼(≺)y component-wise inequality (strict inequality) between x and y

A0 (>0) the square symmetric matrix A is positive semidefinite (positive definite); sometimes denoted as ( )

Diag(z) or D(z) a diagonal matrix whose ith diagonal element is zi

In identity matrix of order n

0nm the zero matrix of order nm

f(x) the column vector whose ith element is (the gradient vector)

2f(x) or H(x) the square matrix whose (i,j)th element is (the Hessian matrix)

G(x) the rectangular matrix whose (i,j)th element is (the Jacobian

matrix)

∈ the set of all xX at which the minimum of the function f is attained

on X.

sup, inf supremum, infimum

3

Mathematics Review

VECTORS

Vector x nR =

nx

x

x

:2

1

= [x1 x2 … xn]T.

Special vectors: 1=

1

:

1

1

; 0=

0

:

0

0

; ei= positioni th

0

:

1

:

0

nR { x nR | xi0, i=1,2,…,n};

nR { x nR | xi>0, i=1,2,…,n}

NORM of a vector ( ) is a real-valued function x : nR nR that is a measure

of “length” or distance. It satisfies the following properties:

(i) x 0x; (ii) x =0x=0; (iii) x =||* x for all real ;

(iv) yx x + y x and y.

Various norms:

l1-norm: 1

x =

n

iix

1

l2-norm: 2

x =2

1

1

2

n

iix

lp-norm: p

x =pn

i

pix

1

1

l-norm:

x = ii xMax

4

Suppose xRn and y Rn; and x, y 0. Then xTy = <x, y> = ixiyi= cosyx ,

where is the angle between the vectors x and y.

Note that if

xTy = 0, then cos = 0 =90 (orthogonal vectors)

xTy > 0 then cos > 0 <90 (acute angle between vectors)

xTy < 0 then cos < 0 >90 (obtuse angle between vectors)

Also, xTx = i xi2 = 0cosxx =

2x (as defined by normal vector

multiplication)

Note that a vector also has associated with it a direction (i.e., the vector [6 6]T and

the vector [1 1]T have the same direction). This is easily seen by looking at their

product as [6 6][1 1]T = 12 = 6 6 1 1 cos (by the definition above), so

that cos =12/(722) = 1, i.e, =0.

x

y

Example in R2

5

Combinations of vectors: Given x1, x2…, xn

Linear combination = j jxj

Affine combination = j jxj, where j j=1

Convex combination = j jxj, where j j=1 and all j0

Note that every vector in R2 is a linear combination of x and y.

Linear Independence: The vectors x1, x2…, xn are said to be linearly

independent if j jxj = 0 implies that j=0 for all j.

Span: The vectors x1, x2…, xk are said to span the space L Rn if any xL can be

written as a linear combination of x1, x2…, xk.

Basis: The vectors x1, x2…, xk constitute a basis for Rn if they span Rn, but if one

of them is dropped then the remaining vectors do not span Rn.

Convex (as well as affine) combination of x and y

Affine combination of x and y

x

y

6

MATRICES

Amn = {aij} where i=1,2,…,m is the row index and j=1,2,…,n is the column

index. We also denote Amn = [a1 a2 … an] where each aj Rm.

Special matrices: In Identity Matrix of order nn

[0] the zero matrix

AT =transpose of A where (AT)ij = Aji. If AT=A then A is said to be symmetric.

Inverse of a matrix: Given Ann, if nn matrix A-1 AA-1= A-1A = I, then A-1 is

called the inverse of A; otherwise A is said to be singular.

A is said to have full rank if the maximum number of linearly independent

columns or rows (= rank of A) = Min{n,m}.

Matrix Norms: Suppose A is some nn matrix. Then the matrix norm A is said

to be induced by the corresponding vector norm x , x nR and is defined as

A = Axx

Max1

.

Thus

1

A = 1

11

Axx

Max

={Max i |(jaijxj)|, st {j |xj|}=1}} =1 1

n

ijj n i

aMax

For example, if A=1 3 2

5 7 9

2 1 6

then Ax=1 2 3

1 2 3

1 2 3

3 2

5 7 9

2 6

x x x

x x x

x x x

and so

1A ={Max {|x1+3x2+2x3|+|-5x1+7x2+9x3|+|2x1-x2+6x3|}, st {|x1|+|x2|+|x3|}=1}}

= Max {1+5+2 (corr.:1,0,0), 3+7+1 (corr.:0,1,0), 2+9+6 (corr.: 0,0,1)}

= 17 (…with x1=x2=0 and x3=1 or –1 at the optimum)

7

Similarly,

2

A = 2

12

Axx

Max

={Max

12 2

1 1

n n

ij ji j

a x

, st

21

1

2

n

jjx =1}} =

SQRT{max(ATA)}, where max(ATA) is the largest eigenvalue of the matrix ATA (may be shown…)

This matrix norm is also called the spectral norm. For our example, the

eigenvalues of ATA are {0, 23.48, 108.05, 167.47} (check using MATLAB or

other package….), so that 2

A = 47.167 =12.94.

A =

Axx

Max1

= 11 1

, st | | 1n

ij j jj ni n j

Max a x max x Max

=

n

jij

miaMax

11

For our example,

A ={Max [{Max (|x1+3x2+2x3|, |-5x1+7x2+9x3|, |2x1-x2+6x3|}],

st Max(|x1|,|x2|,|x3|)=1}

= Max{1+3+2 (corr.:1,1,1), 5+7+9 (corr.:-1,1,1), 2+1+6 (corr.:1,-1,1)}

= Max (6, 21, 9) = 21 (…with x1= -1 and x2= x3 = 1 at the optimum…

Another special norm is the Frobenius norm defined as F

A 1

22

1 1

n n

iji j

a

; note

that this is just the l2-norm of the vector whose elements are all the elements of A.

Also, note that 1I .

8

SETS

A set S is a collection of elements. e.g.,

S = {1,2,3,4} finite, countable S = {1,2,3,…} infinite but countable

S = {xR| 0x1} infinite and not countable

In nonlinear optimization problems, typically we deal with sets of the form

S = {xRn | “some set of conditions”};

e.g., S = {xRn | aTx 0}, i.e. the set of vectors in Rn making an angle of 90 or

less with the vector a. The empty set is the set with no elements in it.

Neighborhood: Given a point xRn, its -neighborhood (where R++) is

defined as the set N(x) = {yRn | xy <}. The set N(x) is also called a ball

(or more precisely, an open ball).

Note that for yN(x), xy <

We may also define an -neighborhood for an entire set S in an analogous

fashion: N(S) = {y Rn | xyx

S

Min <}.

x y

-x

y-x

N(x)

9

“LOCAL” Points belonging to some -neighborhood

“GLOBAL” If is “extremely large,” i.e., “everywhere.”

Sets could be (i) open, (ii) closed, (iii) neither open nor closed or, (iv) both open

and closed.

Suppose S is a subset of Rn. Set S is said to be open if xS, >0 N(x)S.

The interior of a set S is defined as int(S) = {xS| >0 N(x)S}, i.e., it is the

collection of points in S that have some sufficiently small -neighborhood that is

contained entirely within S. It should be clear that a set S is open if S = int(S).

The closure of a set S is defined as cl(S) = {x Rn | >0, SN(x)}. In

other words it is the set of all points in Rn for which an arbitrarily small -

neighborhood can still not be separated from S (obviously the closure also

includes the interior of S), i.e., it is the set of all points that are in, or can be made

“arbitrarily close to” the set S. It should be clear that a set S is closed if S = cl(S).

The set of boundary points of a set S (denoted by S) is defined as the set of all

points x Rn such that for every >0, N(x) has at least one point that belongs to S

and one that does not belong to S. A closed set contains all of its boundary points.

int(S) cl(S) S

10

Examples:

S = {xR| axb} is the closed set [a,b]. Note that int(S)= {xR| a<x<b},

cl(S)=S and S={a,b}.

S = {xR| a<x<b} is the open set (a,b). Note that int(S)= S, while

cl(S)={xR| axb} and S={a,b}.

S = {x R| a<xb} is the set (a,b] that is neither open nor closed. Note that

int(S)= {x R| a<x<b}, cl(S)={xR| axb} and S={a,b}.

A set S is said to be bounded if it can be contained within a ball of finite radius,

i.e., for all xS, x <M for some M>0. Note that all three sets above are

bounded, whereas the set S = {xR| x0} is closed but unbounded.

A set S is said to be compact if it is both closed as well as bounded.

SEQUENCES

A sequence of vectors {xk} (= x1, x2,…) is said to converge to a limit point x* if

*xx k 0 as k, i.e., given >0, a positive integer N() *xx k < for

kN(). We define *k xx k

lim .

A subsequence of {xk} is a subset of x1, x2,…, and is denoted by {xk} where

{1,2,3,…}

Bolzano-Weierstrass Theorem: Every sequence {xk} in a compact set S has a

convergent subsequence with limit x* in S.

If for a sequence, given any >0, N()>0 mk xx < k, mN(), then the

sequence is called a Cauchy sequence. A sequence in Rn has a limit if, and only

if, it is Cauchy.

11

FUNCTIONS

Given a domain D Rn, a function f is a mapping f:DR that maps each point

in the domain on to a real number. A function f is said to be continuous at a point

x0D, if >0, >0 yD, 0xy < |f(y) – f(x0)|<.

A simpler way to state it is that given an -neighborhood of f(x0), we can find a

-neighborhood of x0 such that any point in this -neighborhood leads to a

function value that is in the -neighborhood of f(x0). An equivalent definition of

continuity is that if f is continuous at x0D, then any sequence that converges to

x0 also has the corresponding function values to f(x0).

Fact: Continuous functions attain their maxima or minima over compact sets. For

example, Max f(x) = x; st 0x1 yields a maximum at x=1 since the maximum is

over a compact set. On the other hand, for the problem Max f(x) = x; st 0x<1,

the maximization is over a non-compact set, and thus even though the supremum

of f(x) is equal to 1, this is never attained at any x, i.e., there is no maximizing

point. Note that sometimes the maxima or minima over non-compact sets may

still be attained (e.g., the minimum for the latter problem is attained at x=0).

x0 x

f(x) f(x)

x

x0

12

Differentiability

Let S be a nonempty set in Rn. A function f: S R is differentiable at a point

x0 int(S) if for all h in Rn (x0+h)S, there exist

(a) a vector f(x0) and

(b) a function (x0, h) satisfying0

limh

(x0,h)=0

such that

f(x0+h) = f(x0) + [f(x0)]Th + h (x0,h) ()

Here f(x0) = T

nx

f

x

f

x

f

)(,...),(),(21

000 xxx is called the gradient vector.

Note that the above representation of f as given in () is called a first-order

Taylor series expansion of f at the point x0 and without the remainder term

h (x0,h) it is called a first-order Taylor series approximation of f at x0. An

equivalent form is f(x) f(x0) + [x-x0]Tf(x0)

Another important concept: Using the same definitions for f and h as above, f is

said to be twice differentiable at x0 int(S) if in addition to f(x0) and (x0,h) as

defined above, there exists an nn symmetric matrix H(x0) called the Hessian

f(x0+h) = f(x0) + [f(x0)]Th + (½)hT H(x0) h + 2

h (x0,h) ()

Here H(x0) has as its (i-j)th element the second partial )(2

0xji xx

f

.

Again, the above representation of f as given in () is called a second-order

Taylor series expansion of f at the point x0 and without the remainder term h

13

(x0,h) it is called a second-order Taylor series approximation of f at x0. An

equivalent form is f(x) f(x0) + [x-x0]T f(x0) + (½)[x-x0]TH(x0) [x-x0]

Mean Value Theorem: Given S, a nonempty convex open set in Rn and a

function f:SR that is differentiable on int(S). Then x1, x2int(S), (0,1)

x=x1+(1-)x2 and f(x2)=f(x1)+[x2-x1]Tf(x), i.e., f(x2)-f(x1)= [x2-x1]Tf(x).

An illustration for a univariate function

is shown alongside…

Taylor’s Theorem: Given S, a nonempty convex open set in Rn and a function

f:SR that is twice differentiable on int(S).

Then x1, x2S, (0,1) and x=x1+(1-)x2

f(x2)=f(x1)+[x2-x1]Tf(x1)+ (½)[x2-x1]TH(x) [x2-x1]

Functions with continuous first derivatives are said to be continuously

differentiable, functions with continuous first and second derivatives are said to

be twice-continuously differentiable, etc. In general, the class of functions with

continuous derivatives of order 1 through k is denoted by Ck. Functions with high

degrees of differentiability are said to be “smooth” functions.

f(x1)

x x1 x2

f(x2)

14

NUMERICAL SOLUTION OF EQUATIONS

Used when analytical solutions are not possible; e.g.,

2x - sin x = 0.5,

x2 + x log x= 1

x4 + 5 - 2x = 0

Methods are iterative and provide successively better approximations at each

iteration.

NEWTON RAPHSON METHOD To solve f(x) = 0. 1) Start with x0 - an initial 'guess.' 2) If |f(x0)| > 0, then we try to find a 'correction', say x, so that

|f(x0 +x)| < |f(x0)|.

3) Let x0 + x = x1, our new (and better) approximation. Repeat the procedure

until |f(xn)| =| f(x*)| 0 ('sufficiently' close to zero)

Q. How do we get x?

A. Use a TAYLOR SERIES expansion about x0:

f(x0 +x) = f(x0) + x f (x0) + ½ (x)2 f(x) + ...

want this to be zero small...negligible,

Thus, at iteration i we set the first order Taylor series approximation to zero and

solve: f(xi) + x f(xi) 0 x = - [f(xi) / f(xi)] Then xi+1 = xi + x

15

GEOMETRIC INTERPRETATION: We look for x* such that f(x*)=0, i.e., where

f intersects the X-axis,

Note that xi+1 = xi +x is the intersection of the tangent to the curve at [xi, f(xi)]

with the X-axis.

Example: Given a>0, find √ using only the operations (+, , , ),

Let √ = x x2 a = 0,

To solve the equation f(x) = x2-a = 0, let xi the guess at iteration i,

xi xi+1 xi+2 x*

x

slope=f (xi)

f(xi)

f(xi+1)

f(x)

x

0∆

⟹ ∆

16

⟹ ∆ 2

and the improved guess xi+1 = xi +x = xi

2

0.5

e.g., Suppose a = 5, and the initial guess x0 =2, and we use a tolerance of 0.001

i xi f(xi) xi+1

0 2 -1 0.5(2+5/2)=2.25

1 2.25 0.0625 0.5(2.25+5/2.25)= 2.2361

2 2.2361 0.00014 STOP

Suppose a = 2, and the initial guess x0 =1

i xi f(xi) xi+1

0 1 -1 0.5(1+2/1)=1.5

1 1.5 0.25 0.5(1.5+2/1.5)= 1.4167

2 1.4167 0.00704 0.5(1.4167+2/1.4167)= 1.4142

3 1.4142 -0.00004 STOP

17

WHEEL-COUNTERWEIGHT PROBLEM

r=radius of wheel,

M = unbalanced moment,

t = thickness of the counterweight

= density of the counterweight

d = lever arm; wheel center to center of gravity

= arc; half angle

A = area of counterweight

PROBLEM: Design a sector shaped counterweight for a locomotive driver wheel

to balance a moment M,

e.g. M = 600 Newton-m,

t = 0.1 m,

= 700 kg/m3,

r = 1.0 m,

M

M

r

d

c.g. of counterweight

d t

18

Moment (M) = (Force)*(Distance)= (Mass)*(g)*(Distance)

= (Volume)*(Density)*(g)*(Distance); i.e., M = (At) * () * (g) * (d)

r

Note that 2r sin 2 and cos = d/r

r2 (area of sector subtended by 2) c.g. ½ (counterweight area)

= +

i.e., the counterweight area A is given by

A= 2{r2 - ½ (2r sin)d} [Note that the c.g. is "inside" the counterweight]

= 2{r2 - r2 sin cos} = r2(2 - 2sin cos) = r2(2 - sin2)

Thus we have:

SOLVE FOR ...,

ANALYTICAL APPROACH

2 2 ⁄ CLEARLY IMPOSSIBLE TO SOLVE!

19

NEWTON RAPHSON APPROACH

Let ⁄ (from given values)

Then we solve

f() = 2 2 0

f() = cos 2 2 cos 2 2 sin 2 sin

If = approximation at iteration i, then ∆ = - f( ) f( ), and the improved

approximation at the next iteration is

= – {f( ) f( )}

‐0.2

‐0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

f()

20

#include <iostream.h> #include <math.h> #include <stdlib.h> float c,m,t,rho,r,tol,x,f,fprm; float deriv(float y) { return (‐sin(y)*(2*y‐sin(2*y)) + cos(y)*(2‐2*cos(2*y))); } void main() { cout << "Enter unbalanced moment (M) in Newton‐meters: "; cin >> m; cout << "Enter counterweight thickness (T) in meters: "; cin >> t; cout << "Enter counterweight material density (RHO) in kg/cu.m.: "; cin >> rho; cout << "Enter radius of flywheel (R) in m: "; cin >> r; cout << "Enter maximum allowable unbalanced moment in N‐m: "; cin >> tol; cout << "Enter initial guess for theta in radians: "; cin >> x; cout << "\n"; tol=tol/(t*rho*9.81*(r*r*r)); c = m/(t*rho*9.81*(r*r*r)); // Evaluate f(theta) at current approximation f=(cos(x)*(2*x‐sin(2*x)))‐c; do { cout << "theta = " << x << "\tf(theta) = " << f << endl; // Evaluate the derivative of f(theta) at the current approximation fprm=deriv(x); // Evaluate and add the correction x=x+(‐f/fprm); // Find the new function value f=(cos(x)*(2*x‐sin(2*x)))‐c; } while (fabs(f) > tol); cout << "theta = " << x << "\tf(theta) = " << f << endl; float ff = abs(f)*t*rho*9.81*(r*r*r); cout << "\nFinal unbalanced moment is:\t" << ff; }

21

c:\Users\rajgopal\Documents\CPP>newtrap Enter unbalanced moment (M) in Newton-meters: 600 Enter counterweight thickness (T) in meters: 0.1 Enter counterweight material density (RHO) in kg/cu.m.: 7000 Enter radius of flywheel (R) in m: 1 Enter maximum allowable unbalanced moment in N-m: 0.001 Enter initial guess for theta in radians: 1 theta = 1 f(theta) = 0.501935 theta = 0.180515 f(theta) = -0.079709 theta = 0.815954 f(theta) = 0.346872 theta = 0.46643 f(theta) = 0.0283223 theta = 0.423793 f(theta) = 0.00186569 theta = 0.420555 f(theta) = 1.09459e-05 theta = 0.420536 f(theta) = -1.85746e-09 Final unbalanced moment is: 0

c:\Users\rajgopal\Documents\CPP>newtrap Enter unbalanced moment (M) in Newton-meters: 600 Enter counterweight thickness (T) in meters: 0.1 Enter counterweight material density (RHO) in kg/cu.m.: 7000 Enter radius of flywheel (R) in m: 1 Enter maximum allowable unbalanced moment in N-m: 0.001 Enter initial guess for theta in radians: 1.4 theta = 1.4 f(theta) = 0.331597 theta = 1.58746 f(theta) = -0.140825 theta = 1.54445 f(theta) = -0.00738953 theta = 1.54193 f(theta) = -2.57154e-05 theta = 1.54192 f(theta) = -4.93854e-08 Final unbalanced moment is: 0

22

SECANT METHOD [TO SOLVE f(x)=0]

The Newton-Raphson method requires derivatives (f'(x)) which are sometimes

difficult to compute...

Let x0 and x1 be two initial "guesses"

To Find x2:

From properties of similar triangles:

0

Solving for x2 we have,

x0 x1 x2

f(x0)

f(x1)

f(x)

x

1) Draw the secant through [x0, f(x0)] and [x1, f(x1)]

2) Let its intersection with the X-axis be at x2

3) Go to Step 1 with x1 and x2 as the two new guesses…

∙

23

MINIMIZATION OF A FUNCTION OF A SINGLE VARIABLE

(WITHOUT USING DERIVATIVES)

Analytical expressions for f(x) may be unknown (e.g. f(x) is measured in a

laboratory), or f(x) may not be differentiable everywhere

The result of minimization will be an "interval of uncertainty" which contains

the optimum and we assume that an initial interval of uncertainty is specified -

say [a0, b0]. The method stops when the interval is "sufficiently" small.

We assume that f is UNIMODAL.

UNIMODAL NOT UNIMODAL

These methods are usually used to solve "subproblems" within larger multi-

variable problems.

f(x)

x

f(x)

xlocal min

global min

24

THREE POINT EQUI-INTERVAL SEARCH

A simple but rather inefficient approach...

Let [ai, bi] be the interval of uncertainty iteration i. Let ci = (ai+bi )/2 -- the

midpoint -- and let f(ai ), f(bi), f(ci ) be known,

Find the midpoints of the two new subintervals [ai, ci] and [ci, bi] via

,

Evaluate f(di) and f(ei)

Consider the relative magnitudes of f at the five points.

Case 1: New interval of uncertainty

ai+1= bi+1=


ai+1= bi+1=


ai+1= bi+1=

a d c e b

a d c e b

a d c e b

25


ai+1= bi+1=


ai+1= bi+1=


ai+1= bi+1=

Repeat the procedure until the interval of uncertainty is smaller than some

pre-specified tolerance , i.e., |bn – an| < .

At each iteration, the interval of uncertainty is (usually) halved (for example in

Cases 1-3); occasionally it may be reduced by more than 50 % (Cases 4-5) or less

than 50% (Case 6). HOWEVER, we typically need TWO function evaluations at

each iteration -- this may be quite expensive....

A more efficient search method is the so-called GOLDEN SECTION SEARCH,

which tries to reduce the number of function evaluations required.

a d c e b

a d c e b

a d c e b

26

GOLDEN SECTION SEARCH

Again, we have an interval of uncertainty [ai, bi] at iteration i, along with a point

ci in the interior. HOWEVER, ci is NOT the midpoint of the interval...

We choose ci along with another interior point di such that they are both

symmetrical about the midpoint of [ai, bi].

Consider the cases shown below:

Case 1 Case 2

Case 3 Case 4

Case 5

a c d b a c d b

a c d b a c d b

a c d b

Find [ai+1, bi+1] for each of these five cases

27

The interval is (usually) less than halved at each step. (i.e., the new interval is

larger than in the 3-pt equi-interval search). HOWEVER, only ONE new

function evaluation is required!

QUESTION: How to locate ci inside [ai, bi]?

CASE 1

CASE 2

x

x x x = length of current interval

x ∆ = length of next interval

, are constants; + = 1

and must satisfy

∆ ∆ Δ ∆ ⇒ 1 ∆ /Δ

Δ ∆ ∆ ()

∆ Δ⁄ ⟹∆Δ

⟹1 2 ∆

Δ

So from ( and it follows that (1-2)/ = (1-); i.e., 2 - 3 +1 = 0

√ .Choose “ ” sign; “+” leads to >1

The ratio (+)/ = / = 1.618034 is called the GOLDEN RATIO.

ak ck dk bk

ak+1 ck+1 dk+1 bk+1

ak+1 ck+1 dk+1 bk+1

a c d b

= (3-5)/2 = 0.381966

= 1- = 0.618034

28

Q. WHAT IS THE IMPLICATION?

A. If ci (di) is placed at a distance of 0.381966*(bi –ai) units from ai (bi), then at

the beginning of iteration n, cn (dn) will automatically be at a distance of

0.381966*(bn-an) from an (bn).

Here, at each iteration the interval of uncertainty x is reduced by approx. 38%

and there is only ONE new function evaluation per iteration.

Compare Golden Section to 3-pt Equi-Interval Search...

No. of evaluations of f(x)

Interval of Uncertainty 3-pt. Equi-Interval

Interval of Uncertainty Golden Section

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

1.00

(0.5)1=0.5

(0.5)2=0.25

(0.5)3=0.125

(0.5)4=0.0625

(0.5)5=0.01325

(0.5)6=0.0156

(0.5)1=0.0078

1.00 (0.618)1=0.618 (0.618)2=0.382 (0.618)3=0.236 (0.618)4=0.146 (0.618)5=0.090

(0.618)6=0.0557 (0.618)7=0.0344 (0.618)8=0.0213 (0.618)9=0.0132

(0.618)10=0.00813 (0.618)11=0.00503 (0.618)12=0.00311 (0.618)13=0.00192 (0.618)14=0.00119

29

ELECTRICAL CABLE INSULATION

Maximum Electrical Stress = /(2r)

Utilization = U =(Line Voltage) / (Maximum Electrical Stress)

⟹

ln2

2 ln

PROBLEM: Proportion R with r so as to maximize the "utilization" for a given

volume (cross-sectional area) of the insulation --- this allows a 'maximum' safe

voltage for the cable for a given stress (and hence presumably the most power).

A = area = R2 - r2 = r2(R2/r2 -1) = r2(x2 -1), where we define x=R/r

Also, U = r ln (R/r) r =U / (ln x). Therefore

ln1 ⟹

ln1

To maximize U, we perform the equivalent task of minimizing 1/U...

Line Voltage applied = V

, where

= charge/unit length

= dielectric constant of the insulation

INSULATION

CONDUCTOR

SHEATH

R

r

30

1/ 1

ln

i.e., 1 / ln , where V= 1/U and c=/A

(It turns out that this function is unimodal in x)

Thus one could use (1) Three point Equi-interval Search, or (2) Golden Section

Search.

31

FIBONACCI SEARCH

Named after Leonardo of Pisa, son of Bonacci (hence Fibonacci!), in 1202 A.D.

Demonstrated initially with rabbit-breeding...

ASSUME

Each pair of mature rabbits produces one pair per litter

Exactly ONE litter per month

Maturation requires 1 month

How many pairs of rabbits are there at the end of each month, starting with one

new born pair in January?

Month J F M A M J J A S O

Immature

Mature

TOTAL

1

0

1

0

1

1

1

1

2

1

2

3

2

3

5

3

5

8

5

8

13

8

13

21

13

21

34

FIBONACCI NUMBERS: 1, 1, 2, 3, 5, 8, 13, 21,...

In general, F0 = F1 = 1, FN+1 = FN + FN-1,

32

The Fibonacci numbers are used for the Fibonacci Search. It also starts with 2

initial evaluations, and with only ONE subsequent evaluation per iteration.

HOWEVER, the interval of uncertainty is not reduced by the same amount each

time. Also, the number of function evaluations planned (say n) is determined

before the search commences.

Suppose at iteration k we have the interval Ik = [ak, bk] of length =bk - ak. Let

Then the new interval of uncertainty Ik+1 will usually be [ak, dk] or [ck, bk],

In the first case, = dk - ak = (Fn-k/Fn-(k-1))

In the second case, = bk - ck = (Fn-k/Fn-(k-1)) (VERIFY!!)

In either case is reduced by a factor of (Fn-k/Fn-(k-1))

EXERCISE: At iteration k+1, [ak+1, bk+1] = (i) [ck, bk] or (ii) [ak, dk]. Show

that,

(i) ck+1 = dk or (ii) dk+1 = ck,

Thus only ONE new function evaluation is required at the next step.

33

QUESTION: How should we choose n?

Let l = required final interval of uncertainty (Tolerance). We know that

1

2

1 2

1

1 2

3

2 3

2

2 3

etc., etc., etc…

1

2 1

2 1

2

2 1 1

So, if we want to be < l, then we must choose < l, i.e., Fn > / l.

Thus we pick the first n such that this condition is satisfied.

34

The Fibonacci search then proceeds as follows:

1) Start with a1, b1 (where b1-a1= ). Then find c1 and d1 using the formulas

on the previous page. Set k=0.

2) Set k=k+1. Evaluate f(ak), f(bk), f(ck) and f(dk) and eliminate a portion of

the interval. In general, we will have one of two cases:

(i) ak+1=ak, bk+1=dk, dk+1=ck and ck+1 is computed via the formula on

page 32

(ii) ak+1=ck, bk+1=bk, ck+1=dk and dk+1 is computed via the formula on

page 32

3) If kn-1 go to Step 2.

4) When k=n-1 the two formulas for ck and dk yield the same value ck = dk =

ak+ (½)(bk-ak), since F0 =F1 = 1. So no further interval reduction is

possible. Consequently, we place cn-1 (or dn-1 as the case may be…) at a

distance from the existing interior point.

5) Repeat the procedure of Step 2 for this last interval and obtain the final

interval of uncertainty [an, bn]

NOTE: The quantity is referred to as a “distinguishability constant.”

We will have bn-an = (b1-a1)/Fn + , where = 0 or =.

35

a1 c1 X* d1 b1

=13l ITER 1

a2 X* c2 d2 b2

=8l ITER 2

a3 X* c3 d3 b3

=5l ITER 3

a4 c4 X* d4 b4

=3l ITER 4

a5 c5 d5 X* b5

=2l ITER 5

Note that d5 is the mid‐point of [a5,b5]…

l

= l (Final Interval) ITER 6

Note that the length of the final interval is (1/F6)*(b1‐a1)+ = l+ since the final interval is [c5,b5] for this example. If the optimum had been in the final interval [a5,d5] then its length would have been l = (1/F6)*(b1‐a1).

Also, note that = + . Thus we start with = +

Fn = Fn‐1 + Fn‐2 1n-1 n-2

n n

F F+

F F 1 1 1 2 3

n-1 n-2

n n

F FI + I = I = I + I

F F

36

EXAMPLE: Suppose that the initial interval is [1.05, 4.00], so that =2.95 and

we need to reduce the final interval of uncertainty to l =0.01

Since /l = 295, the smallest value of n for which Fn exceeds 295 is given by

n=13 (…F13=377). So we will plan on 13 iterations. The search might perhaps

then proceed as follows:

a1= 1.05

c1= 1.05+ F11/F13(2.95) = 1.05+(144/377)(2.95) = 2.1768

d1= 1.05+ F12/F13(2.95) = 1.05+(233/377)(2.95) = 2.8732

b1= 4.00

a2= 1.05

c2= 1.05+ F10/F12(2.8732-1.05) = 1.05+(89/233)(1.8232) = 1.7464

d2= 2.1768

b2= 2.8732

a3= 1.7464

c3= 2.1768

d3==1.7464 + F10/F11(2.8732-1.7464) = 1.05+(89/144)(1.268) = 2.4428

b3= 2.8732

etc. etc. etc…

37

LAGRANGE'S INTERPOLATING POLYNOMIALS

Assume that we are given the n+1 values x0, x1,…,xn.

Define

… …

Some properties of lj(x):

1. lj(x) is a polynomial of degree n+1

2. lj(xj) = 1

3. lj(xi) = 0 for ij,

Lagrange's Interpolating polynomial is

Note that

. .

38

QUADRATIC INTERPOLATION OF A MINIMUM

Given a, b, c and f(a), f(b), f(c) we wish to find a point d and f(d).

The interpolating polynomial is

Note: p(a) = f(a), p(b) = f(b), p(c) = f(c).

Differentiating p(x) w.r.t. x and equating to zero, we obtain

2 2

2 0

Solving for x:

12

⟹12

Now, let d=x, then find f(d) and proceed as usual…

39

MULTI-DIMENSIONAL OPTIMIZATION WITHOUT DERIVATIVES: THE HOOKE-JEEVES METHOD

Used to solve the general problem:

MINIMIZE f(x), where x= [x1, x2, …, xn]T Rn,

At iteration i suppose we have the points xi and y1 ( Rn) and a set of search

directions d1,d2, ..., dn.

(NOTE: Usually these directions are the coordinate directions, but this is not

necessary…)

STEP 1

Using y1 find the optimum solution to the problem:

Minimize f() = f(y1+d1) to obtain 1.

Let y2 = y1+l d1,

Using y2 find the optimum solution to the problem:

Minimize f() = f(y2+d2) to obtain 2.

Let y3 = y2+2 d2, etc. etc…

Continue until yn+1 = yn +n dn

Let xi+1 = yn+1

Compare xi+1 and xi; IF ║xi+1 – xi║ < then STOP, ELSE go to STEP 2.

40

STEP 2

Let d = xi+1 – xi; solve the subproblem:

Minimize f() = f(xi+1 +d) to obtain *.

Let y1= xi+1 +*d and return to STEP 1

NOTE: At iteration 1, there is no y1; so use y1= x1)

====================X============X==================

AN EXAMPLE

Minimize f(x) = f(x1, x2) = (x1 -2)4 + (x1 -2x2)2

Start with x1 = 03

, and use d1= 10

, d2= 01

ITERATION 1

STEP 1 y1 = 03

, Min 03

10

= f(,3)

i.e., Min (-2)4 + (-2·3)2 1 = 3.13

Therefore y2 = y1 +l d1 = 03

3.13 10

= 3.133.00

.

Minimize 3.133.00

01

= f(3.13, 3+)

i.e. Min (3.13-2)4 + [3.13-2(3+)]2 2 = -1.44,

Therefore y3 = y2 +2 d2 = 3.133.00

1.44 01

= 3.131.56

.

Let x2 = 3.131.56

, ║x2 – x1║ is not negligible; go to Step 2.

41

STEP 2

d = 3.131.56

- 03

= 3.131.44

Minimize f() = 3.131.56

3.131.44

* = -0.10

Therefore y1= x2+*d = 3.131.56

- (0.10) 3.131.44

= 2.821.70

------------------------------

ITERATION 2

STEP 1 y1 = 2.821.70

, Min 2.821.70

10

= f(2.82+, 1.7)

This yields 1 = -0.12

Therefore y2 = y1 +l d1 = 2.701.70

.

Minimize 2.701.70

01

Min f(2.7, 1.7+) 2 = -0.35,

Therefore y3 = y2 +2 d2 = 2.701.35

.

Let x3 = 2.701.35

, ║x3 – x2║ is not negligible; go to Step 2.

STEP 2

d = x3-x2 = 2.701.35

– 3.131.56

= 0.430.21

Minimize f() = 2.701.35

0.430.21

* = 1.50

43

Therefore y1= x3+*d = 2.701.35

+ (1.50) 0.430.21

= 2.061.04

------------------------------

ITERATION 3,

STEP 1 y1 = 2.061.04

, Min 2.061.04

10

ETC. ETC.

Note that the optimal solution to the problem is, x*= [x1 =2, x2 =1], and we're

headed that way...

GEOMETRICAL INTERPRETATION

STEP 1: Exploratory search to get xi+1 (via y1, y2)

STEP 2: Line search along xi+1 – xi (to get y1, y2 for the next iteration)

x1=03

x2=3.131.56

x3=2.701.35

Contours of f(x)

x1 (=y1)

y3 (=x2)

y2

y1y2

y3 (=x3)

y1

44

CONVEXITY

i.e., If x1S, x2S x1 + (1-)x2 S for all [0,1], then S is a convex set.

In general, 1 x1 + 2 x2 is said to be a convex combination of x1 and x2 if 1,2 0

and 1+2 =1.

CONVEX HULL: The convex hull of the set S denoted by H(S) is the smallest

convex set that contains S. It is the collection of all convex combinations of

points in S.

e.g.

x1

x2

S

A set S is said to be CONVEX if for any two points x1 and x2 in S, the line segment joining x1and x2 also lies entirely in the set S.

x2

x1

S This set S is a NONCONVEX set since the line segment joining x1 and x2 is not entirely in the set S.

S S S

H(S) H(S) H(S)

45

CONVEX FUNCTIONS

The EPIGRAPH of a function f - denoted by epi(f) - is the set of points on or

above the graph of f(x).

The HYPOGRAPH of a function f - denoted by hyp(f) - is the set of points on or

below the graph of f(x).

A function f is convex if, and only if, the epigraph of f is a convex set.

convex nonconvex

Equivalently, for [0,1]

(1-)f(x1) + f(x2) f((1-)x1+ x2))

if, and only if, f is convex.

Some Convex Functions

f(x)

x

f(x)

x

epi (f) epi (f)

x1 x x2 x1 x x2 x1 x x2

F x

F F

f(x)f(x)

F=f(x)

46

SOME TERMINOLOGY AND SOME GENERALIZATIONS

1) EPIGRAPH: EPI(f) = { (x,y) | xRn, yR, yf(x) } Rn+1,

2) HYPOGRAPH: HYP(f) = { (x,y) | xRn, yR, yf(x) } Rn+1

3) LEVEL SET: S = { xRn | f(x) },

4) QUASICONVEXITY: Given [0,1], x1x2, the function f is said to be

quasiconvex if f(x1+(1-) x2) Max{f(x1), f(x2)}.

Clearly, every convex function is also quasiconvex but the converse is not

true. Quasiconvex functions can have discontinuities and local minima in

addition to the global minimum.

Theorem: f(x) quasiconvex all of its level sets are convex

Theorem: Suppose f(x) is quasiconvex. If x* is a strict local minimum of f, it

is also a strict global minimum of f.

f(x)

x

S

e.g.

f(x)

x

S

Quasiconvex, but not convex

NOTE: f: RnR convex S is convex for all

47

We can further strengthen quasiconvexity via the following:

4a) Given [0,1], x1x2, the function f is said to be strongly quasiconvex if

f(x1+(1-) x2) Max{f(x1), f(x2)}

Theorem: Suppose f(x) is strongly quasiconvex. Then it attains its (global)

minimum at no more than one single point.

We can weaken strong quasiconvexity slightly through the following:

4b) Given [0,1], x1x2, f(x1)f(x2), the function f is said to be strictly

quasiconvex if f(x1+(1-) x2) Max{f(x1), f(x2)}

Theorem: Suppose f(x) is strictly quasiconvex. If x* is a local minimum of f, it is

also a global minimum of f.

4c) A function f is quasiconcave if –f is quasiconvex

4d) A function that is both quasiconvex as well as quasiconcave is said to be

quasimonotone.

Strict (but not Strong) Strong as well as strict quasimonotone

48

5) PSEUDOCONVEXITY: f is said to be pseudoconvex on its domain if it is

differentiable everywhere and f T(x1) (x2-x1) 0 f(x2) f(x1), where f

is the gradient vector of f (i.e., fi=f/xi)

Theorem: If the gradient vector of a pseudoconvex function is equal to zero at

some point, the point must be a (global) minimizer.

Note that every (differentiable) convex function is also pseudoconvex, and every

pseudoconvex function is strictly quasiconvex (and therefore every local

minimum is also a global one).

THEOREM: Suppose S is a nonempty open convex set in Rn and let f:SR be

differentiable on S. Then f is (strictly) convex if, and only if, for any x'S,

f(x) (>) f(x') + f T(x') (x- x') for each xS

Note that if we are minimizing a convex f over xS then given any point x'S, the

affine function f(x') + f T(x') (x- x') bounds f from below. So the minimum of

this affine function over xS yields a lower bound on the optimum value of f.

This fact is often used in optimization algorithms.

≡

x1 xx2

f(x)

49

POSITIVE (NEGATIVE) SEMIDEFINITE (DEFINITE) MATRICES

Let H(x) be a square symmetric matrix. H is said to be POSITIVE

SEMIDEFINITE (DEFINITE) at x' if xTH(x')x (>) 0 for all x0, NEGATIVE

SEMIDEFINITE (DEFINITE) if xTH(x')x (<) 0 for all x0.

If the above conditions cannot be satisfied the matrix H is said to be INDEFINITE.

(Also, note here that if all entries in H are 0, it does not imply anything -- this is

not a sufficient condition for H to be positive semidefinite).

Examples:

2 00 2

⟹ 2 00 2

2 2

0forall . Thus H is positive definite everywhere.

2 22 2

⟹ 2 22 2

2

0forall . Thus H is positive semidefinite everywhere.

2 00 2

⟹ 2 00 2

2 2

May be of any sign. Thus H is indefinite.

6 2 0

0 12 1⟹

6 2 0

0 12 1 6 2

12 1 0forall . Thus H is positive definite everywhere.

50

TWICE DIFFERENTIABLE CONVEX FUNCTIONS

Suppose that f has 2nd partial derivatives. Then 2f(x) = H(x) is called the

HESSIAN matrix of f, where Hij(x) = [2f(x)]ij = .

QUADRATIC FORM: The quadratic form is a convenient way of representing

quadratic functions (polynomials of order no higher than 2). Specifically it has

the form f(x) = ½ xTHx + bTx, where H=2f(x) is the Hessian matrix of the

function f(x) with Hij = .

Example 1: f(x1, x2) = ax12 + bx2

2 + cx1x2

⟹ 12

22

Example 2: f(x1, x2) = ax12 + bx2

2 + cx32 + kx1x2 + lx1x3 + mx2x3 + p

⟹ 2

22

+ p

THEOREM

a) f: SR R; f convex 0 xS

b) f: SRnR; f convex 2f(x) is POSITIVE SEMIDEFINITE xS

c) f: SRnR; 2f(x) is POSITIVE DEFINITE xS f strictly convex

f strictly convex 2f(x) is POSITIVE SEMIDEFINITE xS

51

EIGENVALUES AND EIGENVECTORS

Given an n row by n column (nn) symmetric matrix A, if we have a scalar and

a nonzero vector vRn satisfying Av= v, i.e., [A-I]v = 0, then v is called

an eigenvector for A, and is called the corresponding eigenvalue for A.

To compute the eigenvalues, we solve

Det(A-I) = | A-I | = 0 for

Then for each i (in general we have n of them) we solve

(A-iI)v = 0 to find the eigenvector v

THEOREM: A matrix is positive (negative) semidefinite if all its eigenvalues are

nonnegative (nonpositive). A matrix is positive (negative) definite if all its

eigenvalues are positive (negative).

THEOREM: A real nn symmetric matrix has n (not necessarily distinct)

eigenvalues, and at most n linearly independent eigenvectors.

Example: A = 2 22 2

⟹ 2 22 2

Then |A-I| = (2-)2 - 4 = 0 - 4 = 0.

(-4) = 0 the eigenvalues are 1=4, 2=0.

(Note that these being nonnegative, A is positive semidefinite). Then the

eigenvectors are vT = [v1 v2], where2 22 2

= 0

52

i.e., (2-)v1 + 2v2 = 0, 2v1+ (2-)v2 = 0

i. e., for 4, for 0,

where t is any arbitrary real number.

THEOREM: Two eigenvectors of a real symmetric matrix corresponding to

different eigenvalues are orthogonal, i.e., if 12 then (v1)Tv2 = 0.

Further, the eigenvalues of such a matrix are always real.

DEFINITION: A principal minor Hii is the determinant of a (square) sub-matrix

obtained by deleting from a (square) matrix, a row and a column (or multiple

rows and columns) that share the same index i. The leading principal minors of a

square matrix are the determinants of the square sub-matrices along the main

diagonal.

THEOREM: A real symmetric matrix has eigenvalues that are positive if, and

only if, all the leading principal minors are positive. If the eigenvalues are all

nonnegative then the leading principal minors are all nonnegative. If ALL

possible principal minors are nonnegative, then the eigenvalues are nonnegative.

The above theorem indicates that if we have Hii0 (Hii<0) for some i , then H can

never be positive definite (semidefinite).

53

THEOREM: A real symmetric matrix A has eigenvalues that are negative if, and

only if, (-1)i|Ai| (where |Ai| is the ith leading principal minor) is positive for

i=1,..,n. If the eigenvalues are nonpositive then the corresponding (-1)i|Ai| are all

nonnegative, but for the eigenvalues to be nonpositive, (-1)i|Ai| must be

nonnegative for ALL possible principal minors of order i.

The last two theorems provide another way to check the definiteness of a

matrix (and thus to check for convexity...).

EXAMPLE 1: f(x1, x2) = -x12 - 5x2

2 + 2x1x2 + 10x1 - 10x2

2 2 1010 2 10 & 2 2

2 10

|H-I| = 0 (-2-)(-10-) - 4 = 0 = -6 25

Since 1 = -6-25 < 0 and 2 = -6+25 < 0 both eigenvalues are negative,

and therefore the Hessian H is negative definite, implying f is strictly concave.

EXAMPLE 2: . Here 2 11

Let us call . then |H-I| = 0 {Q(x1-2)-}(Qx1-) – Q2(x1-1)2 = 0

Solving for , we get (VERIFY…),

1 1 1 0

1 1 1 0

Therefore H is indefinite and f is neither convex nor concave.