
Numerical Methods for Unconstrained Optimization

Cheng-Liang Chen
PSE Laboratory
Department of Chemical Engineering
National Taiwan University


Analytical vs. Numerical?

In analytical methods, we write the necessary conditions and solve them (analytically or numerically?) for candidate local minimum designs.

Some difficulties:
• The number of design variables in constraints can be large
• Functions for the design problem can be highly nonlinear
• In many applications, cost and/or constraint functions are implicit in terms of the design variables

Numerical methods: estimate an initial design and improve it until the optimality conditions are satisfied.


    Unconstrained Optimization


Descent Step Idea

Move from the current estimate $f(x^{(k)})$ to a new, lower estimate $f(x^{(k+1)})$:

$$f(x^{(k+1)}) = f(x^{(k)} + \alpha_k d^{(k)}) \approx f(x^{(k)}) + \nabla f^T(x^{(k)})\big(x^{(k)} + \alpha_k d^{(k)} - x^{(k)}\big) = f(x^{(k)}) + \alpha_k\, c^{(k)}\cdot d^{(k)}$$

For $f(x^{(k+1)}) < f(x^{(k)})$ with $\alpha_k > 0$, the search direction must satisfy the descent condition $c^{(k)}\cdot d^{(k)} < 0$.


Example: check the descent condition

$$f(x) = x_1^2 - x_1x_2 + 2x_2^2 - 2x_1 + e^{x_1+x_2}$$

Verify whether $d_1 = (1, 2)$ and $d_2 = (1, 0)$ are descent directions at $(0, 0)$:

$$c = \begin{bmatrix} 2x_1 - x_2 - 2 + e^{x_1+x_2} \\ -x_1 + 4x_2 + e^{x_1+x_2} \end{bmatrix}_{(0,0)} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

$$c \cdot d_1 = (-1)(1) + (1)(2) = 1 > 0 \quad \text{(not a descent direction)}$$
$$c \cdot d_2 = (-1)(1) + (1)(0) = -1 < 0 \quad \text{(a descent direction)}$$
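A quick numerical check of this example; a minimal sketch assuming numpy is available (the helper name grad_f is mine, not from the slides):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)."""
    e = np.exp(x[0] + x[1])
    return np.array([2*x[0] - x[1] - 2 + e, -x[0] + 4*x[1] + e])

x0 = np.array([0.0, 0.0])
c = grad_f(x0)                          # c = (-1, 1) at (0, 0)
for d in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    slope = c @ d                       # directional derivative c . d
    print(d, slope, "descent" if slope < 0 else "not descent")
```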


Analytical Method to Compute Step Size

If $d^{(k)}$ is a descent direction, then $\alpha > 0$, with

$$\frac{df(\alpha_k)}{d\alpha} = 0, \qquad \frac{d^2f(\alpha_k)}{d\alpha^2} > 0$$

$$0 = \frac{df(x^{(k+1)})}{d\alpha} = \frac{df(x^{(k+1)})}{dx}\cdot\frac{dx^{(k+1)}}{d\alpha} = \nabla f^T(x^{(k+1)})\, d^{(k)} = c^{(k+1)}\cdot d^{(k)}$$

The gradient of the cost function at the NEW point, $c^{(k+1)}$, is orthogonal to the current search direction, $d^{(k)}$.


Example: analytical step size determination

$$f(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + 7, \qquad d^{(k)} = (-1, -1) \text{ at } x^{(k)} = (1, 2)$$

$$c^{(k)} = \nabla f(x^{(k)}) = \begin{bmatrix} 6x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}_{x^{(k)}} = \begin{bmatrix} 10 \\ 10 \end{bmatrix}$$

$$c^{(k)}\cdot d^{(k)} = (10)(-1) + (10)(-1) = -20 < 0 \quad \text{(a descent direction)}$$

Along the line, $f(x^{(k)} + \alpha d^{(k)}) = f(1-\alpha,\, 2-\alpha) = 7\alpha^2 - 20\alpha + 22$.

NC: $\dfrac{df}{d\alpha} = 14\alpha_k - 20 = 0 \;\Rightarrow\; \alpha_k = \dfrac{10}{7}, \qquad \dfrac{d^2f}{d\alpha^2} = 14 > 0$

$$x^{(k+1)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \frac{10}{7}\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -3/7 \\ 4/7 \end{bmatrix}$$

$$f(x^{(k+1)}) = \frac{54}{7} < 22 = f(x^{(k)})$$


Numerical Methods to Compute Step Size

Most one-dimensional search methods work only for unimodal functions; they work on $\alpha_\ell = 0 \le \alpha \le \alpha_u$, where $(\alpha_\ell, \alpha_u)$ is the interval of uncertainty.


Unimodal Function

$f(x)$ is a unimodal function if $x_1 < x_2 < x^*$ implies $f(x_1) > f(x_2)$, and $x^* < x_3 < x_4$ implies $f(x_3) < f(x_4)$.


Unimodal Function: Outcome of Two Experiments

For $x^* \in [0, 1]$ and two trials $0 < x_1 < x_2$:
• $f_1 > f_2 \Rightarrow x^* \in [x_1, 1]$
• $f_1 < f_2 \Rightarrow x^* \in [0, x_2]$
• $f_1 = f_2 \Rightarrow x^* \in [x_1, x_2]$


Equal Interval Search

To successively reduce the interval of uncertainty, $I$, to a small acceptable value:

$$I = \alpha_u - \alpha_\ell \qquad (\alpha_\ell = 0)$$

Evaluate the function at $\alpha = \delta, 2\delta, 3\delta, \ldots$
• If $f((q+1)\delta) < f(q\delta)$, continue.
• If $f((q+1)\delta) > f(q\delta)$, set $\alpha_\ell = (q-1)\delta$, $\alpha_u = (q+1)\delta$; then $\alpha^* \in [\alpha_\ell, \alpha_u]$.
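A minimal Python sketch of equal interval search under these rules, restarting with a 10x smaller δ inside each new bracket (function names are mine; it assumes the minimum is bracketed after finitely many steps):

```python
import math

def equal_interval_bracket(f, delta):
    """Step forward with fixed increment delta until f starts to increase;
    returns ((q-1)*delta, (q+1)*delta) bracketing the minimum."""
    a, f_cur = delta, f(delta)
    if f(0.0) < f_cur:                  # increased immediately: minimum in [0, delta]
        return 0.0, a
    while True:
        a_next = a + delta
        f_next = f(a_next)
        if f_next > f_cur:              # f((q+1)d) > f(qd): bracket found
            return a - delta, a_next
        a, f_cur = a_next, f_next

def equal_interval_search(f, delta=0.5, eps=1e-3):
    """Bracket, then restart inside the bracket with a 10x smaller
    increment until the interval of uncertainty is below eps."""
    lo, hi = equal_interval_bracket(f, delta)
    while hi - lo > eps:
        delta /= 10.0
        l2, h2 = equal_interval_bracket(lambda a: f(lo + a), delta)
        lo, hi = lo + l2, lo + h2
    return 0.5 * (lo + hi)

f = lambda a: 2 - 4*a + math.exp(a)     # slide example; minimum at ln 4
print(equal_interval_search(f))         # ~1.386
```

Run on the slide's example $f(\alpha) = 2 - 4\alpha + e^{\alpha}$, the successive brackets reproduce the trace tabulated below: [1.0, 2.0], then [1.35, 1.45], then [1.38, 1.39].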


Equal Interval Search: Example

$$f(x) = x(x - 1.5), \qquad x \in [0, 1], \qquad \delta = 0.1$$

 i        1     2     3     4     5     6     7     8     9
 x_i     .1    .2    .3    .4    .5    .6    .7    .8    .9
 f(x_i) -.14  -.26  -.36  -.44  -.50  -.54  -.56  -.56  -.54

The minimum lies in $x^* \in [x_7, x_8] = [0.7, 0.8]$.


Equal Interval Search: Example

$$f(\alpha) = 2 - 4\alpha + e^{\alpha}, \qquad \delta = 0.5, \qquad \varepsilon = 0.001$$

No.  Trial step α   f(α)
 1   0.000000       3.000000    (δ = 0.5)
 2   0.500000       1.648721
 3   1.000000       0.718282
 4   1.500000       0.481689
 5   2.000000       1.389056    (α_u)
 6   1.050000       0.657651    (restart from α_ℓ = 1.0, δ = 0.05)
 7   1.100000       0.604166
 8   1.150000       0.558193
 9   1.200000       0.520117
10   1.250000       0.490343
11   1.300000       0.469297
12   1.350000       0.457426
13   1.400000       0.455200
14   1.450000       0.463115    (α_u)
15   1.355000       0.456761    (restart from α_ℓ = 1.35, δ = 0.005)
16   1.360000       0.456193
17   1.365000       0.455723
18   1.370000       0.455351
19   1.375000       0.455077
20   1.380000       0.454902
21   1.385000       0.454826
22   1.390000       0.454850    (α_u)
23   1.380500       0.454890    (restart from α_ℓ = 1.38, δ = 0.0005)
24   1.381000       0.454879
25   1.381500       0.454868
26   1.382000       0.454859
27   1.382500       0.454851
28   1.383000       0.454844
29   1.383500       0.454838
30   1.384000       0.454833
31   1.384500       0.454829
32   1.385000       0.454826
33   1.385500       0.454824
34   1.386000       0.454823
35   1.386500       0.454823
36   1.387000       0.454824    (α_u)
37   1.386500       0.454823    (α* ≈ 1.3865)


Equal Interval Search: 3 Interior Points

With $x^* \in [a, b]$, three tests $x_1, x_0, x_2$ give three possibilities.


Equal Interval Search: 2 Interior Points

$$\alpha_a = \alpha_\ell + \tfrac{1}{3}(\alpha_u - \alpha_\ell) = \tfrac{1}{3}(\alpha_u + 2\alpha_\ell)$$
$$\alpha_b = \alpha_\ell + \tfrac{2}{3}(\alpha_u - \alpha_\ell) = \tfrac{1}{3}(2\alpha_u + \alpha_\ell)$$

Case 1: $f(\alpha_a) < f(\alpha_b) \Rightarrow \alpha_\ell < \alpha^* < \alpha_b$
Case 2: $f(\alpha_a) > f(\alpha_b) \Rightarrow \alpha_a < \alpha^* < \alpha_u$

$I' = \tfrac{2}{3}I$: reduced interval of uncertainty


Golden Section Search

Problem with equal interval search ($n = 2$ interior points): the known midpoint is not used in the next iteration.

Solution: golden section search.

Fibonacci sequence:

$$F_0 = 1;\quad F_1 = 1;\quad F_n = F_{n-1} + F_{n-2},\; n = 2, 3, \ldots$$
$$1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots$$
$$\frac{F_n}{F_{n-1}} \to 1.618, \qquad \frac{F_{n-1}}{F_n} \to 0.618 \quad \text{as } n \to \infty$$


Golden Section Search: Initial Bracketing of Minimum

Starting at $\alpha = 0$, evaluate $f$ at the trial steps

$$\alpha_q = \sum_{j=0}^{q} \delta(1.618)^j = \alpha_{q-1} + \delta(1.618)^q, \qquad q = 0, 1, 2, \ldots$$

q = 0:  α₀ = δ
q = 1:  α₁ = α₀ + 1.618δ = 2.618δ
q = 2:  α₂ = α₁ + 1.618²δ = 5.236δ
q = 3:  α₃ = α₂ + 1.618³δ = 9.472δ
⋮


Golden Section Search: Initial Bracketing of Minimum

If $f(\alpha_{q-2}) > f(\alpha_{q-1})$ and $f(\alpha_{q-1}) < f(\alpha_q)$, then $\alpha_{q-2} < \alpha^* < \alpha_q$:

$$\alpha_u = \alpha_q = \sum_{j=0}^{q} \delta(1.618)^j, \qquad \alpha_\ell = \alpha_{q-2} = \sum_{j=0}^{q-2} \delta(1.618)^j$$

$$I = \alpha_u - \alpha_\ell = \underbrace{\delta(1.618)^q}_{\alpha_q - \alpha_{q-1}} + \underbrace{\delta(1.618)^{q-1}}_{\alpha_{q-1} - \alpha_{q-2}} = 2.618\,\delta\,(1.618)^{q-1}$$


Golden Section Search: Reduction of Interval of Uncertainty

Given $\alpha_\ell$, $\alpha_u$, $I = \alpha_u - \alpha_\ell$, select $\alpha_a$, $\alpha_b$ such that

$$\alpha_u - \alpha_a = \tau I, \qquad \alpha_a - \alpha_\ell = (1-\tau)I$$
$$\alpha_b - \alpha_\ell = \tau I, \qquad \alpha_u - \alpha_b = (1-\tau)I$$

Suppose $f(\alpha_b) > f(\alpha_a)$: then $\alpha^* \notin [\alpha_b, \alpha_u]$, so delete $[\alpha_b, \alpha_u]$:

$$\alpha_\ell' = \alpha_\ell, \qquad \alpha_b' = \alpha_a, \qquad \alpha_u' = \alpha_b, \qquad I' = \alpha_u' - \alpha_\ell'$$

and again $\alpha_b' - \alpha_\ell' = \tau I'$, $\alpha_u' - \alpha_b' = (1-\tau)I'$.


    I

    = I, (1 )I = I = ( I) 2 + 1 = 0 =1+

    5

    2 = 0.618 = 11.618

    q1

    a

    q2

    = 0.382I,

    uq aq1 = 0.618I = (1.618) 0.382I q1 q2 u a

    a = q q1q1 q2 =

    0.618I

    0.382I = 1.618

    ratio of increased trial step size is 1.618

    Chen CL 25


Golden Section Search: Algorithm

Step 1: for a chosen $\delta$, bracket the minimum: $\alpha_\ell = \alpha_{q-2}$, $\alpha_u = \alpha_q$, $I = \alpha_u - \alpha_\ell$
Step 2: $\alpha_a = \alpha_\ell + 0.382I$, $\alpha_b = \alpha_\ell + 0.618I$; compute $f(\alpha_a)$, $f(\alpha_b)$
Step 3: compare $f(\alpha_a)$ and $f(\alpha_b)$, then go to Step 4, 5, or 6
Step 4: if $f(\alpha_a) < f(\alpha_b)$, then $\alpha_\ell < \alpha^* < \alpha_b$: set $\alpha_\ell = \alpha_\ell$, $\alpha_u = \alpha_b$, $\alpha_b = \alpha_a$, $\alpha_a = \alpha_\ell + 0.382(\alpha_u - \alpha_\ell)$; go to Step 7
Step 5: if $f(\alpha_a) > f(\alpha_b)$, then $\alpha_a < \alpha^* < \alpha_u$: set $\alpha_\ell = \alpha_a$, $\alpha_u = \alpha_u$, $\alpha_a = \alpha_b$, $\alpha_b = \alpha_\ell + 0.618(\alpha_u - \alpha_\ell)$; go to Step 7
Step 6: if $f(\alpha_a) = f(\alpha_b)$, then $\alpha_a < \alpha^* < \alpha_b$: set $\alpha_\ell = \alpha_a$, $\alpha_u = \alpha_b$; return to Step 2
Step 7: if $I = \alpha_u - \alpha_\ell < \varepsilon$, set $\alpha^* = (\alpha_u + \alpha_\ell)/2$ and stop; otherwise return to Step 3
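A minimal Python sketch of the bracketing-plus-reduction scheme above (names are mine; it assumes $f$ decreases at least once from $\alpha = 0$, and it re-uses one interior point per reduction as the algorithm intends):

```python
import math

def golden_section(f, delta=0.5, eps=1e-3):
    """Golden-section line search: bracket the minimum with trial steps
    delta*(1.618)^q, then shrink the interval by tau = 0.618 per pass."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0   # 0.618...
    # initial bracketing (assumes f decreases at least once from 0)
    pts, step = [0.0, delta], delta
    while f(pts[-1]) < f(pts[-2]):
        step *= 1.618
        pts.append(pts[-1] + step)
    lo, hi = pts[-3], pts[-1]
    # interval reduction, re-using one interior point per iteration
    a = lo + (1.0 - tau) * (hi - lo)     # alpha_a at 0.382 I
    b = lo + tau * (hi - lo)             # alpha_b at 0.618 I
    fa, fb = f(a), f(b)
    while hi - lo > eps:
        if fa < fb:                      # Step 4: minimum in [lo, b]
            hi, b, fb = b, a, fa
            a = lo + (1.0 - tau) * (hi - lo)
            fa = f(a)
        else:                            # Steps 5/6: minimum in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + tau * (hi - lo)
            fb = f(b)
    return 0.5 * (lo + hi)

f = lambda a: 2.0 - 4.0*a + math.exp(a)  # slide example; minimum at ln 4
print(golden_section(f))                 # ~1.3863
```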


Golden Section Search: Example

$$f(\alpha) = 2 - 4\alpha + e^{\alpha}, \qquad \delta = 0.5, \qquad \varepsilon = 0.001$$


Golden Section Search: Example

$$f(x) = x(x - 1.5)$$

Initial bracketing of the minimum:

No.  Trial x     f(x)
 1   0.000000     0.000000
 2   0.250000    -0.312500
 3   0.500000    -0.500000    (x_ℓ = 0.5)
 4   0.750000    -0.562500    (f_min)
 5   1.000000    -0.500000    (x_u = 1.0)

Reducing the interval of uncertainty (function values in the second row of each iteration):

No.  x_ℓ            x_a            x_b            x_u            I
 1   0.5000000      0.6910000      0.8090000      1.0000000      0.50000000
     -0.5000000     -0.5590190     -0.5590190     -0.5000000
 2   0.6910000      0.7360760      0.7639240      0.8090000      0.11800000
     -0.5590190     -0.5623061     -0.5623061     -0.5590190
 3   0.7360760      0.7469139      0.7532861      0.7639240      0.02784800
     -0.5623061     -0.5624892     -0.5624892     -0.5623061
 4   0.74691393     0.74922448     0.75077551     0.75328606     0.00657212
     -0.562489202   -0.562499399   -0.562499399   -0.562489202
 5   0.7492244890   0.7498169790   0.7501830210   0.7507755110   0.001551022
     -0.562493399   -0.562499967   -0.562499967   -0.562499399
 6   0.7498467900   0.7499566900   0.7500431210   0.7501830210   0.000366231
     -0.562499966   -0.562499998   -0.562499998   -0.562499967


Polynomial Interpolation: Quadratic Curve Fitting

$$q(\alpha) = a_0 + a_1\alpha + a_2\alpha^2 \quad \text{(approximating quadratic function)}$$

Match $f$ at the three points $\alpha_\ell < \alpha_i < \alpha_u$:

$$f(\alpha_\ell) = q(\alpha_\ell) = a_0 + a_1\alpha_\ell + a_2\alpha_\ell^2$$
$$f(\alpha_i) = q(\alpha_i) = a_0 + a_1\alpha_i + a_2\alpha_i^2$$
$$f(\alpha_u) = q(\alpha_u) = a_0 + a_1\alpha_u + a_2\alpha_u^2$$

Solving for the coefficients:

$$a_2 = \frac{1}{\alpha_u - \alpha_i}\left[\frac{f(\alpha_u) - f(\alpha_\ell)}{\alpha_u - \alpha_\ell} - \frac{f(\alpha_i) - f(\alpha_\ell)}{\alpha_i - \alpha_\ell}\right]$$

$$a_1 = \frac{f(\alpha_i) - f(\alpha_\ell)}{\alpha_i - \alpha_\ell} - a_2(\alpha_i + \alpha_\ell)$$

$$a_0 = f(\alpha_\ell) - a_1\alpha_\ell - a_2\alpha_\ell^2$$

$$\frac{dq(\bar\alpha)}{d\alpha} = a_1 + 2a_2\bar\alpha = 0 \;\Rightarrow\; \bar\alpha = \frac{-a_1}{2a_2}, \quad \text{a minimum if } \frac{d^2q}{d\alpha^2} = 2a_2 > 0$$


Computational Algorithm:

Step 1: locate the initial interval of uncertainty $(\alpha_\ell, \alpha_u)$
Step 2: select $\alpha_\ell < \alpha_i < \alpha_u$ and compute $f(\alpha_i)$
Step 3: compute $a_0$, $a_1$, $a_2$, $\bar\alpha$, and $f(\bar\alpha)$
Step 4: compare $\bar\alpha$ with $\alpha_i$ and refit:
• if $f(\alpha_i) < f(\bar\alpha)$ and $\bar\alpha < \alpha_i$: $\alpha^* \in [\bar\alpha, \alpha_u]$; new points $(\bar\alpha, \alpha_i, \alpha_u)$
• if $f(\alpha_i) < f(\bar\alpha)$ and $\bar\alpha > \alpha_i$: $\alpha^* \in [\alpha_\ell, \bar\alpha]$; new points $(\alpha_\ell, \alpha_i, \bar\alpha)$
• if $f(\alpha_i) > f(\bar\alpha)$ and $\bar\alpha < \alpha_i$: $\alpha^* \in [\alpha_\ell, \alpha_i]$; new points $(\alpha_\ell, \bar\alpha, \alpha_i)$
• if $f(\alpha_i) > f(\bar\alpha)$ and $\bar\alpha > \alpha_i$: $\alpha^* \in [\alpha_i, \alpha_u]$; new points $(\alpha_i, \bar\alpha, \alpha_u)$
Step 5: stop if two successive estimates of the minimum point of $f(\alpha)$ are sufficiently close; otherwise relabel the new points as $(\alpha_\ell, \alpha_i, \alpha_u)$ and return to Step 2
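A sketch of this algorithm in Python, using the $a_0, a_1, a_2$ formulas from the previous slide (names are mine; it assumes $a_2 \neq 0$, i.e., the three points are not collinear):

```python
import math

def quadratic_fit_min(al, ai, au, fl, fi, fu):
    """Vertex of the quadratic through (al,fl), (ai,fi), (au,fu)."""
    a2 = ((fu - fl) / (au - al) - (fi - fl) / (ai - al)) / (au - ai)
    a1 = (fi - fl) / (ai - al) - a2 * (ai + al)
    return -a1 / (2.0 * a2)              # a minimum if a2 > 0

def quadratic_line_search(f, al, ai, au, eps=1e-4, max_iter=50):
    """Refine the bracket (al, ai, au) by repeated quadratic fits,
    following the four refitting cases of Step 4."""
    prev = ai
    for _ in range(max_iter):
        abar = quadratic_fit_min(al, ai, au, f(al), f(ai), f(au))
        if abs(abar - prev) < eps:       # Step 5: successive estimates close
            return abar
        if f(ai) < f(abar):              # keep ai as the interior point
            if abar < ai: al = abar      # minimum in [abar, au]
            else:         au = abar      # minimum in [al, abar]
        else:                            # abar becomes the interior point
            if abar < ai: au, ai = ai, abar   # minimum in [al, ai]
            else:         al, ai = ai, abar   # minimum in [ai, au]
        prev = abar
    return prev

f = lambda a: 2 - 4*a + math.exp(a)
print(quadratic_line_search(f, 0.5, 1.309017, 2.618034))   # ~1.3863
```

On the slide's example, the first two fits reproduce $\bar\alpha = 1.2077$ and $\bar\alpha = 1.3464$ as worked out below.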


Example:

$$f(\alpha) = 2 - 4\alpha + e^{\alpha}, \qquad \delta = 0.5$$

$\alpha_\ell = 0.5$, $\alpha_i = 1.309017$, $\alpha_u = 2.618034$
$f(\alpha_\ell) = 1.648721$, $f(\alpha_i) = 0.466464$, $f(\alpha_u) = 5.236610$

$$a_2 = \frac{1}{1.30902}\left[\frac{3.5879}{2.1180} - \frac{-1.1823}{0.80902}\right] = 2.410$$

$$a_1 = \frac{-1.1823}{0.80902} - (2.41)(1.80902) = -5.821$$

$$a_0 = 1.648721 - (-5.821)(0.50) - 2.41(0.25) = 3.957$$

$$\bar\alpha = \frac{-a_1}{2a_2} = 1.2077 < \alpha_i, \qquad f(\bar\alpha) = 0.5149 > f(\alpha_i)$$

So $\alpha_\ell' = \bar\alpha = 1.2077$, $\alpha_i' = \alpha_i = 1.309017$, $\alpha_u' = \alpha_u = 2.618034$. Refitting:

$\alpha_\ell = 1.2077$, $\alpha_i = 1.309017$, $\alpha_u = 2.618034$
$f(\alpha_\ell) = 0.5149$, $f(\alpha_i) = 0.466464$, $f(\alpha_u) = 5.236610$

$$a_0 = 5.3807, \qquad a_1 = -7.30547, \qquad a_2 = 2.713$$
$$\bar\alpha = 1.3464, \qquad f(\bar\alpha) = 0.4579$$


Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Conjugate Directions

Let $A$ be an $n \times n$ symmetric matrix. A set of $n$ vectors (directions) $\{S_i\}$ is said to be A-conjugate if

$$S_i^T A S_j = 0 \quad \text{for } i, j = 1, \ldots, n;\; i \neq j$$

Note: orthogonal directions are a special case of conjugate directions ($A = I$).


Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Quadratically Convergent Method

If a minimization method, using exact arithmetic, can find the minimum point in $n$ steps while minimizing a quadratic function in $n$ variables, the method is called a quadratically convergent method.

Theorem: Given a quadratic function of $n$ variables and two parallel hyperplanes 1 and 2 of dimension $k < n$, let the constrained stationary points of the quadratic function in the hyperplanes be $X_1$ and $X_2$, respectively. Then the line joining $X_1$ and $X_2$ is conjugate to any line parallel to the hyperplanes.


Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Proof:

$$Q(X) = \tfrac{1}{2}X^T A X + B^T X + C, \qquad \nabla Q(X) = AX + B \quad (n \times 1)$$

Searching from $X_a$ along $S$ gives $X_1$ (a stationary point); searching from $X_b$ along $S$ gives $X_2$. $S$ is orthogonal to $\nabla Q(X_1)$ and $\nabla Q(X_2)$:

$$S^T \nabla Q(X_1) = S^T A X_1 + S^T B = 0$$
$$S^T \nabla Q(X_2) = S^T A X_2 + S^T B = 0$$

Subtracting:

$$S^T\left[\nabla Q(X_1) - \nabla Q(X_2)\right] = S^T A (X_1 - X_2) = 0$$


Meaning: if $X_1$ and $X_2$ are the minima of $Q$ obtained by searching along the direction $S$ from two different starting points $X_a$ and $X_b$, respectively, then the line $(X_1 - X_2)$ will be conjugate to $S$.


Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Theorem: If a quadratic function $Q(X) = \tfrac{1}{2}X^T A X + B^T X + C$ is minimized sequentially, once along each direction of a set of $n$ mutually conjugate directions, the minimum of the function $Q$ will be found at or before the $n$th step, irrespective of the starting point.

Proof: At the minimum, $\nabla Q(X^*) = B + AX^* = 0$. Let

$$X^* = X_1 + \sum_{j=1}^{n} \alpha_j^* S_j, \qquad S_j: \text{ directions conjugate w.r.t. } A$$

$$0 = B + AX_1 + A\sum_{j=1}^{n} \alpha_j^* S_j$$

Premultiplying by $S_i^T$:

$$0 = S_i^T(B + AX_1) + S_i^T A \sum_{j=1}^{n} \alpha_j^* S_j = (B + AX_1)^T S_i + \alpha_i^*\, S_i^T A S_i$$

$$\alpha_i^* = -\frac{(B + AX_1)^T S_i}{S_i^T A S_i}$$


Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Now consider the sequential minimizations $X_{i+1} = X_i + \alpha_i S_i$, $i = 1, \ldots, n$, where $\alpha_i$ is found by minimizing $Q(X_i + \alpha S_i)$, so that $S_i^T \nabla Q(X_{i+1}) = 0$:

$$\nabla Q(X_{i+1}) = B + AX_{i+1} = B + A(X_i + \alpha_i S_i)$$
$$0 = S_i^T\{B + A(X_i + \alpha_i S_i)\} = (B + AX_i)^T S_i + \alpha_i\, S_i^T A S_i$$
$$\alpha_i = -\frac{(B + AX_i)^T S_i}{S_i^T A S_i}$$

Since $X_i = X_1 + \sum_{j=1}^{i-1} \alpha_j S_j$, conjugacy gives

$$X_i^T A S_i = X_1^T A S_i + \Big(\sum_{j=1}^{i-1} \alpha_j S_j\Big)^T A S_i = X_1^T A S_i$$

$$\alpha_i = -\frac{(B + AX_i)^T S_i}{S_i^T A S_i} = -\frac{(B + AX_1)^T S_i}{S_i^T A S_i} = \alpha_i^*$$

Powell's Conjugate Directions: Example

$$f(x_1, x_2) = 6x_1^2 + 2x_2^2 - 6x_1x_2 - x_1 - 2x_2 = \begin{bmatrix} -1 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

If $S_1 = (1, 2)$ and $X_1 = (0, 0)$:

$$S_1^T A S_2 = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} 0 & 2 \end{bmatrix}\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = 0 \;\Rightarrow\; S_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

$$\alpha_1 = -\frac{(B + AX_1)^T S_1}{S_1^T A S_1} = -\frac{\begin{bmatrix} -1 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}}{\begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}} = \frac{5}{4}$$

$$X_2 = X_1 + \alpha_1 S_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \frac{5}{4}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5/4 \\ 5/2 \end{bmatrix}$$


$$\alpha_2 = -\frac{(B + AX_2)^T S_2}{S_2^T A S_2} = -\frac{\begin{bmatrix} -1 & 1/2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}}{\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}} = \frac{1}{12}$$

$$X_3 = X_2 + \alpha_2 S_2 = \begin{bmatrix} 5/4 \\ 5/2 \end{bmatrix} + \frac{1}{12}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4/3 \\ 5/2 \end{bmatrix} = X^* \;(?)$$
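The conjugacy condition and the two-step minimization above can be verified numerically; a small sketch assuming numpy, with the step-size formula $\alpha = -(B + AX)^T S / (S^T A S)$ taken from the proof above:

```python
import numpy as np

# quadratic from the example: f(x) = 1/2 x^T A x + b^T x
A = np.array([[12.0, -6.0], [-6.0, 4.0]])
b = np.array([-1.0, -2.0])

S1 = np.array([1.0, 2.0])
S2 = np.array([1.0, 0.0])
print(S1 @ A @ S2)                      # 0: the directions are A-conjugate

# exact line minimization along S from X
X = np.zeros(2)
for S in (S1, S2):
    alpha = -(b + A @ X) @ S / (S @ A @ S)
    X = X + alpha * S
print(X, np.linalg.solve(A, -b))        # both give the minimum (4/3, 5/2)
```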


Powell's Algorithm


Powell's Conjugate Directions: Example

$$\text{Min: } f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2, \qquad X_1 = [0\;\;0]^T$$

Cycle 1: Univariate Search

Along $u_2$: $f(X_1 + \lambda u_2) = f(0, \lambda) = \lambda^2 - \lambda$; $\dfrac{df}{d\lambda} = 0 \Rightarrow \lambda^* = \dfrac{1}{2}$

$$X_2 = X_1 + \lambda^* u_2 = \begin{bmatrix} 0 \\ 0.5 \end{bmatrix}$$

Along $-u_1$: $f(X_2 - \lambda u_1) = f(-\lambda, 0.5) = 2\lambda^2 - 2\lambda - 0.25$; $\dfrac{df}{d\lambda} = 0 \Rightarrow \lambda^* = \dfrac{1}{2}$

$$X_3 = X_2 - \lambda^* u_1 = \begin{bmatrix} -0.5 \\ 0.5 \end{bmatrix}$$

Along $u_2$: $f(X_3 + \lambda u_2) = f(-0.5, 0.5 + \lambda) = \lambda^2 - \lambda - 0.75$; $\dfrac{df}{d\lambda} = 0 \Rightarrow \lambda^* = \dfrac{1}{2}$

$$X_4 = X_3 + \lambda^* u_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix}$$


Cycle 2: Pattern Search

$$S^{(1)} = X_4 - X_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0.5 \end{bmatrix} = \begin{bmatrix} -0.5 \\ 0.5 \end{bmatrix}$$

$$f(X_4 + \lambda S^{(1)}) = f(-0.5 - 0.5\lambda,\; 1 + 0.5\lambda) = 0.25\lambda^2 - 0.5\lambda - 1$$

$$\frac{df}{d\lambda} = 0 \;\Rightarrow\; \lambda^* = 1.0, \qquad X_5 = X_4 + \lambda^* S^{(1)} = \begin{bmatrix} -1.0 \\ 1.5 \end{bmatrix}$$

(This is in fact the minimum point of $f$.)

Simplex Method

(Slides 45-48 of the original present the simplex method graphically; the figures are not recoverable from this transcript.)

Properties of Gradient Vector

$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \vdots \\ \partial f/\partial x_n \end{bmatrix} = c, \qquad c^{(k)} = c(x^{(k)}) = \nabla f(x^{(k)}) = \left[\frac{\partial f(x^{(k)})}{\partial x_i}\right]$$

Property 1: The gradient vector $c$ of a function $f(x_1, \ldots, x_n)$ at the point $x^* = (x_1^*, \ldots, x_n^*)$ is orthogonal (normal) to the tangent plane of the surface $f(x_1, \ldots, x_n) = \text{constant}$.

If $C$ is any curve on the surface through $x^*$ and $T$ is a vector tangent to curve $C$ at $x^*$, then $c \cdot T = 0$.

Proof: Let $s$ be any parameter along $C$, and

$$T = \left(\frac{\partial x_1}{\partial s}, \ldots, \frac{\partial x_n}{\partial s}\right)\Big|_{x = x^*} \quad \text{(a unit tangent vector along } C \text{ at } x^*\text{)}$$

Since $f(x) = \text{constant}$ on the surface, $df/ds = 0$:

$$0 = \frac{df}{ds} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial s} = c^T T = c \cdot T$$

Property 2: The gradient represents a direction of maximum rate of increase for $f(x)$ at $x^*$.

Proof: Let $u$ be a unit vector in any direction not tangent to $C$, and let $t$ be a parameter along $u$. Then

$$\frac{df}{dt} = \lim_{\varepsilon \to 0} \frac{f(x^* + \varepsilon u) - f(x^*)}{\varepsilon}$$

By a Taylor expansion,

$$f(x^* + \varepsilon u) = f(x^*) + \varepsilon\left(u_1\frac{\partial f}{\partial x_1} + \cdots + u_n\frac{\partial f}{\partial x_n}\right) + O(\varepsilon^2)$$

$$f(x^* + \varepsilon u) - f(x^*) = \varepsilon \sum_{i=1}^{n} u_i \frac{\partial f}{\partial x_i} + O(\varepsilon^2)$$

$$\frac{df}{dt} = \sum_{i=1}^{n} u_i \frac{\partial f}{\partial x_i} = c \cdot u = c^T u = \|c\|\,\|u\|\cos\theta$$

The rate of increase is maximum when $\theta = 0$.

Property 3: The maximum rate of change of $f(x)$ at any point $x^*$ is the magnitude of the gradient vector:

$$\max \frac{df}{dt} = \|c\|$$

and $u$ is in the direction of the gradient vector when $\theta = 0$.

    Verify Properties of Gradient Vector

$$f(x) = 25x_1^2 + x_2^2, \qquad x^{(0)} = (0.6, 4), \qquad f(x^{(0)}) = 25$$

$$c = \nabla f(0.6, 4) = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 50x_1 \\ 2x_2 \end{bmatrix} = \begin{bmatrix} 30 \\ 8 \end{bmatrix}$$

Unit vector along the gradient:

$$C = \frac{c}{\|c\|} = \frac{(30, 8)}{\sqrt{30^2 + 8^2}} = \begin{bmatrix} 0.966235 \\ 0.257663 \end{bmatrix}$$

A vector tangent to the curve $25x_1^2 + x_2^2 = 25$ at $x^{(0)}$ is $t = (-4, 15)$, so the unit tangent is

$$T = \frac{t}{\|t\|} = \frac{(-4, 15)}{\sqrt{(-4)^2 + 15^2}} = \begin{bmatrix} -0.257663 \\ 0.966235 \end{bmatrix}$$

Property 1: $C \cdot T = 0$

Slope of the tangent: $m_1 = \dfrac{dx_2}{dx_1} = \dfrac{-5x_1}{\sqrt{1 - x_1^2}} = -3.75$ (on the curve, $x_2 = 5\sqrt{1 - x_1^2}$)

Slope of the gradient: $m_2 = \dfrac{c_2}{c_1} = \dfrac{2x_2}{50x_1} = \dfrac{8}{30} = \dfrac{1}{3.75}$

so $m_1 m_2 = -1$: the gradient is normal to the tangent.

Property 2: choose an arbitrary direction $D = (0.501034, 0.865430)$ and step $\varepsilon = 0.1$:

$$x_C^{(1)} = x^{(0)} + \varepsilon C = \begin{bmatrix} 0.6 \\ 4.0 \end{bmatrix} + 0.1\begin{bmatrix} 0.966235 \\ 0.257663 \end{bmatrix} = \begin{bmatrix} 0.6966235 \\ 4.0257663 \end{bmatrix}$$

$$x_D^{(1)} = x^{(0)} + \varepsilon D = \begin{bmatrix} 0.6 \\ 4.0 \end{bmatrix} + 0.1\begin{bmatrix} 0.501034 \\ 0.865430 \end{bmatrix} = \begin{bmatrix} 0.6501034 \\ 4.0865430 \end{bmatrix}$$

$$f(x_C^{(1)}) = 28.3389 > f(x_D^{(1)}) = 27.2657$$

Property 3: $C \cdot C = 1.00 > C \cdot D = 0.7071059$


Steepest Descent Algorithm

Steepest Descent Direction: Let $f(x)$ be a differentiable function w.r.t. $x$. The direction of steepest descent for $f(x)$ at any point is $d = -c$.

Steepest Descent Algorithm:
Step 1: estimate a starting design $x^{(0)}$; set $k = 0$; choose a convergence tolerance $\varepsilon$
Step 2: $c^{(k)} = \nabla f(x^{(k)})$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: $d^{(k)} = -c^{(k)}$
Step 4: calculate $\alpha_k$ to minimize $f(x^{(k)} + \alpha d^{(k)})$
Step 5: $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$; set $k = k + 1$ and go to Step 2

Notes:

$$d = -c \;\Rightarrow\; c \cdot d = -\|c\|^2 < 0 \quad \text{(a descent direction)}$$
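A minimal sketch of the algorithm in Python, assuming numpy and scipy are available; scipy's minimize_scalar stands in for the exact line search of Step 4, and the function names are mine:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps=1e-5, max_iter=1000):
    """Steepest descent with a 1-D line search along d = -c."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:      # Step 2: convergence test
            break
        d = -c                           # Step 3: steepest descent direction
        alpha = minimize_scalar(lambda a: f(x + a * d)).x   # Step 4
        x = x + alpha * d                # Step 5: update
    return x

# slide example: f = x1^2 + x2^2 - 2 x1 x2, started at (1, 0)
f = lambda x: x[0]**2 + x[1]**2 - 2*x[0]*x[1]
grad = lambda x: np.array([2*x[0] - 2*x[1], 2*x[1] - 2*x[0]])
print(steepest_descent(f, grad, [1.0, 0.0]))    # -> (0.5, 0.5)
```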


Steepest Descent: Example

$$f(x_1, x_2) = x_1^2 + x_2^2 - 2x_1x_2, \qquad x^{(0)} = (1, 0)$$

Step 1: $x^{(0)} = (1, 0)$, $k = 0$
Step 2: $c^{(0)} = \nabla f(x^{(0)}) = (2x_1 - 2x_2,\; 2x_2 - 2x_1) = (2, -2)$; $\|c^{(0)}\| = 2\sqrt{2} \neq 0$
Step 3: $d^{(0)} = -c^{(0)} = (-2, 2)$
Step 4: minimize $f(x^{(0)} + \alpha d^{(0)}) = f(1 - 2\alpha, 2\alpha)$:
$$f(1-2\alpha, 2\alpha) = (1-2\alpha)^2 + (2\alpha)^2 - 2(1-2\alpha)(2\alpha) = 16\alpha^2 - 8\alpha + 1 = f(\alpha)$$
$$\frac{df(\alpha)}{d\alpha} = 32\alpha - 8 = 0 \;\Rightarrow\; \alpha_0 = 0.25, \qquad \frac{d^2f(\alpha)}{d\alpha^2} = 32 > 0$$
Step 5: $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = (1 - 0.25(2),\; 0 + 0.25(2)) = (0.5, 0.5)$
$c^{(1)} = (0, 0)$: stop.


Steepest Descent: Example

$$f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1x_2 + 2x_2x_3$$
$$x^{(0)} = (2, 4, 10), \qquad x^* = (0, 0, 0)$$

Step 1: $k = 0$, $\varepsilon = 0.005$ ($\delta = 0.05$, $\varepsilon = 0.0001$ for golden section)
Step 2: $c^{(0)} = \nabla f(x^{(0)}) = (2x_1 + 2x_2,\; 4x_2 + 2x_1 + 2x_3,\; 4x_3 + 2x_2) = (12, 40, 48)$; $\|c^{(0)}\| = \sqrt{4048} = 63.6 > \varepsilon$
Step 3: $d^{(0)} = -c^{(0)} = (-12, -40, -48)$
Step 4: minimize $f(x^{(0)} + \alpha d^{(0)})$ by golden section: $\alpha_0 = 0.158718$
Step 5: $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = (0.0954, -2.348, 2.381)$
$c^{(1)} = (-4.5, -4.438, 4.828)$; $\|c^{(1)}\| = 7.952 > \varepsilon$
Note: $c^{(1)} \cdot d^{(0)} = 0$ (perfect line search)


Steepest Descent: Disadvantages

• Slow to converge, especially when approaching the optimum: a large number of iterations is needed
• Information calculated at previous iterations is not used; each iteration is started independently of the others


Scaling of Design Variables

The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix. To accelerate the rate of convergence, scale the design variables such that the condition number of the new Hessian matrix is unity.


Example:

$$\text{Min: } f(x_1, x_2) = 25x_1^2 + x_2^2, \qquad x^{(0)} = (1, 1)$$

$$H = \begin{bmatrix} 50 & 0 \\ 0 & 2 \end{bmatrix}$$

Let $x = Dy$ with $D = \begin{bmatrix} 1/\sqrt{50} & 0 \\ 0 & 1/\sqrt{2} \end{bmatrix}$:

$$\text{Min: } f(y_1, y_2) = \tfrac{1}{2}\left(y_1^2 + y_2^2\right), \qquad y^{(0)} = (\sqrt{50}, \sqrt{2})$$


Example:

$$\text{Min: } f(x_1, x_2) = 6x_1^2 - 6x_1x_2 + 2x_2^2 - 5x_1 + 4x_2 + 2$$

$$H = \begin{bmatrix} 12 & -6 \\ -6 & 4 \end{bmatrix}, \qquad \lambda_{1,2} = 0.7889,\; 15.211 \quad \text{(eigenvalues)}$$
$$v_{1,2} = (0.4718, 0.8817),\; (-0.8817, 0.4718) \quad \text{(eigenvectors)}$$

Let $x = Qy$ with $Q = [v_1\;\; v_2] = \begin{bmatrix} 0.4718 & -0.8817 \\ 0.8817 & 0.4718 \end{bmatrix}$:

$$\text{Min: } f(y_1, y_2) = 0.5\left(0.7889y_1^2 + 15.211y_2^2\right) + 1.1678y_1 + 6.2957y_2 + 2$$

Let $y = Dz$ with $D = \begin{bmatrix} 1/\sqrt{0.7889} & 0 \\ 0 & 1/\sqrt{15.211} \end{bmatrix}$:

$$\text{Min: } f(z_1, z_2) = 0.5\left(z_1^2 + z_2^2\right) + 1.3148z_1 + 1.6142z_2 + 2$$

Starting from $x^{(0)} = (-1, -2)$, the minimum is $z^* = (-1.3148, -1.6142)$, i.e. $x^* = QDz^* = (-1/3, -3/2)$.
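This two-stage transformation can be checked numerically; a small sketch assuming numpy (np.linalg.eigh may order or sign the eigenvectors differently from the slide's $v_1$, $v_2$, which does not affect the result):

```python
import numpy as np

# Hessian of f = 6 x1^2 - 6 x1 x2 + 2 x2^2 - 5 x1 + 4 x2 + 2
H = np.array([[12.0, -6.0], [-6.0, 4.0]])

lam, Q = np.linalg.eigh(H)              # eigenvalues ~0.7889, ~15.211
D = np.diag(1.0 / np.sqrt(lam))

# after x = Q D z the quadratic part becomes 0.5 z^T I z
print((Q @ D).T @ H @ (Q @ D))          # ~ identity: condition number 1
```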


Conjugate Gradient Method (Fletcher and Reeves, 1964)

Steepest descent directions are orthogonal at consecutive steps: the method converges, but slowly. The conjugate gradient method modifies the current steepest descent direction by adding a scaled previous direction, cutting diagonally through the orthogonal steepest descent directions.

Conjugate gradient directions $d^{(i)}$, $d^{(j)}$ are orthogonal w.r.t. a symmetric and positive definite matrix $A$:

$$d^{(i)T} A\, d^{(j)} = 0$$


Conjugate Gradient Method: Algorithm

Step 1: $k = 0$; estimate $x^{(0)}$; $d^{(0)} = -c^{(0)} = -\nabla f(x^{(0)})$; stop if $\|c^{(0)}\| < \varepsilon$, otherwise go to Step 4
Step 2: $c^{(k)} = \nabla f(x^{(k)})$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: $d^{(k)} = -c^{(k)} + \beta_k d^{(k-1)}$, with $\beta_k = \left(\|c^{(k)}\| / \|c^{(k-1)}\|\right)^2$
Step 4: compute $\alpha_k = \alpha$ to minimize $f(x^{(k)} + \alpha d^{(k)})$
Step 5: $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$; set $k = k + 1$ and go to Step 2

Note:
• Finds the minimum in $n$ iterations for positive definite quadratic forms having $n$ design variables
• With inexact line searches or non-quadratic forms, restart every $n + 1$ iterations for computational stability ($x^{(0)} = x^{(n+1)}$)
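A sketch of the Fletcher-Reeves algorithm in Python, assuming numpy and scipy (minimize_scalar stands in for the exact line search; function names are mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x0, eps=1e-6, max_iter=200):
    """Conjugate gradient: d = -c + beta * d_prev, with
    beta = (||c_k|| / ||c_{k-1}||)^2 and a restart every n+1 iterations."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    c = grad(x)
    d = -c
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a * d)).x
        x = x + alpha * d
        c_new = grad(x)
        if (k + 1) % (n + 1) == 0:
            d = -c_new                   # periodic restart for stability
        else:
            beta = (np.linalg.norm(c_new) / np.linalg.norm(c))**2
            d = -c_new + beta * d
        c = c_new
    return x

# slide example: the 3-variable quadratic, started at (2, 4, 10)
f = lambda x: x[0]**2 + 2*x[1]**2 + 2*x[2]**2 + 2*x[0]*x[1] + 2*x[1]*x[2]
grad = lambda x: np.array([2*x[0] + 2*x[1],
                           4*x[1] + 2*x[0] + 2*x[2],
                           4*x[2] + 2*x[1]])
print(fletcher_reeves(f, grad, [2.0, 4.0, 10.0]))   # -> ~(0, 0, 0)
```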


Example:

$$\text{Min: } f(x) = x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1x_2 + 2x_2x_3, \qquad x^{(0)} = (2, 4, 10)$$

$c^{(0)} = (12, 40, 48)$; $\|c^{(0)}\| = 63.6$; $f(x^{(0)}) = 332.0$
$x^{(1)} = (0.0956, -2.348, 2.381)$
$c^{(1)} = (-4.5, -4.438, 4.828)$; $\|c^{(1)}\| = 7.952$; $f(x^{(1)}) = 10.75$

$$\beta_1 = \left(\|c^{(1)}\| / \|c^{(0)}\|\right)^2 = [7.952/63.6]^2 = 0.015633$$

$$d^{(1)} = -c^{(1)} + \beta_1 d^{(0)} = \begin{bmatrix} 4.500 \\ 4.438 \\ -4.828 \end{bmatrix} + (0.015633)\begin{bmatrix} -12 \\ -40 \\ -48 \end{bmatrix} = \begin{bmatrix} 4.31241 \\ 3.81268 \\ -5.57838 \end{bmatrix}$$


$$x^{(2)} = x^{(1)} + \alpha d^{(1)} = \begin{bmatrix} 0.0956 \\ -2.348 \\ 2.381 \end{bmatrix} + \alpha\begin{bmatrix} 4.31241 \\ 3.81268 \\ -5.57838 \end{bmatrix}$$

Minimizing $f(x^{(1)} + \alpha d^{(1)})$ gives $\alpha_1 = 0.3156$:

$x^{(2)} = (1.4566, -1.1447, 0.6205)$
$c^{(2)} = (0.6238, -0.4246, 0.1926)$; $\|c^{(2)}\| = 0.7788$

Note: $c^{(2)} \cdot d^{(1)} = 0$


Newton Method: A Second-Order Method

$x$: current estimate of $x^*$; $x^* \approx x + \Delta x$ (desired).

$$f(x + \Delta x) = f(x) + c^T\Delta x + \tfrac{1}{2}\Delta x^T H \Delta x$$

NC: $\dfrac{\partial f}{\partial(\Delta x)} = c + H\Delta x = 0 \;\Rightarrow\; \Delta x = -H^{-1}c$

Modified: $\Delta x = -\alpha H^{-1}c$

Steps (modified Newton):

Step 1: $k = 0$; estimate $x^{(0)}$; choose $\varepsilon$
Step 2: $c_i^{(k)} = \partial f(x^{(k)})/\partial x_i$, $i = 1, \ldots, n$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: $H(x^{(k)}) = \left[\partial^2 f / \partial x_i \partial x_j\right]$
Step 4: $d^{(k)} = -H^{-1}c^{(k)}$, or solve $Hd^{(k)} = -c^{(k)}$
(Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian)
Step 5: compute $\alpha_k = \alpha$ to minimize $f(x^{(k)} + \alpha d^{(k)})$
Step 6: $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$; set $k = k + 1$ and go to Step 2

Note: unless $H$ is positive definite, $d^{(k)}$ will not be a descent direction for $f$; for positive definite $H$,

$$c^{(k)T}d^{(k)} = -c^{(k)T}H^{-1}c^{(k)} < 0$$
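A sketch of these steps in Python, assuming numpy and scipy (the linear solve in Step 4 replaces the explicit inverse, as the note suggests; function names are mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def modified_newton(f, grad, hess, x0, eps=1e-6, max_iter=100):
    """Modified Newton: solve H d = -c (Step 4), then line-search along d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        H = hess(x)
        d = np.linalg.solve(H, -c)       # linear solve instead of inverting H
        alpha = minimize_scalar(lambda a: f(x + a * d)).x
        x = x + alpha * d
    return x

# slide example: f = 3 x1^2 + 2 x1 x2 + 2 x2^2 + 7 from x0 = (5, 10)
f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
grad = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
hess = lambda x: np.array([[6.0, 2.0], [2.0, 4.0]])
print(modified_newton(f, grad, hess, [5.0, 10.0]))   # -> (0, 0) in one step
```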


Example:

$$f(x) = 3x_1^2 + 2x_1x_2 + 2x_2^2 + 7, \qquad x^{(0)} = (5, 10), \qquad \varepsilon = 0.0001$$

$$c^{(0)} = (6x_1 + 2x_2,\; 2x_1 + 4x_2) = (50, 50); \qquad \|c^{(0)}\| = 50\sqrt{2}$$

$$H^{(0)} = \begin{bmatrix} 6 & 2 \\ 2 & 4 \end{bmatrix}, \qquad H^{(0)^{-1}} = \frac{1}{20}\begin{bmatrix} 4 & -2 \\ -2 & 6 \end{bmatrix}$$

$$d^{(0)} = -H^{-1}c^{(0)} = -\frac{1}{20}\begin{bmatrix} 4 & -2 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} 50 \\ 50 \end{bmatrix} = \begin{bmatrix} -5 \\ -10 \end{bmatrix}$$

$$x^{(1)} = x^{(0)} + \alpha d^{(0)} = \begin{bmatrix} 5 \\ 10 \end{bmatrix} + \alpha\begin{bmatrix} -5 \\ -10 \end{bmatrix} = \begin{bmatrix} 5 - 5\alpha \\ 10 - 10\alpha \end{bmatrix}$$

Set $df/d\alpha = 0$, i.e. $\nabla f(x^{(1)}) \cdot d^{(0)} = 0$:

$$\nabla f(x^{(1)}) = \begin{bmatrix} 6(5-5\alpha) + 2(10-10\alpha) \\ 2(5-5\alpha) + 4(10-10\alpha) \end{bmatrix} = \begin{bmatrix} 50 - 50\alpha \\ 50 - 50\alpha \end{bmatrix}$$

$$\nabla f(x^{(1)}) \cdot d^{(0)} = -5(50 - 50\alpha) - 10(50 - 50\alpha) = 0 \;\Rightarrow\; \alpha = 1$$

$$x^{(1)} = \begin{bmatrix} 5 - 5 \\ 10 - 10 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad c^{(1)} = \begin{bmatrix} 50 - 50 \\ 50 - 50 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Stop: Newton's method reaches the minimum of a quadratic in one iteration.

Example:

$$f(x) = 10x_1^4 - 20x_1^2x_2 + 10x_2^2 + x_1^2 - 2x_1 + 5, \qquad x^{(0)} = (-1, 3)$$

$$c = \nabla f(x) = \left(40x_1^3 - 40x_1x_2 + 2x_1 - 2,\; -20x_1^2 + 20x_2\right)$$

$$H = \nabla^2 f(x) = \begin{bmatrix} 120x_1^2 - 40x_2 + 2 & -40x_1 \\ -40x_1 & 20 \end{bmatrix}$$

Comparison of Steepest Descent, Newton, and Conjugate Gradient Methods

$$f(x) = 50(x_2 - x_1^2)^2 + (2 - x_1)^2, \qquad x^{(0)} = (5, 5), \qquad x^* = (2, 4)$$

(Slides 76-78 of the original show the iteration histories graphically; the figures are not recoverable from this transcript.)

Newton Method

Advantage: quadratic convergence rate

Disadvantages:
• Calculation of second-order derivatives is required at each iteration
• A system of simultaneous linear equations needs to be solved
• The Hessian of the function may be singular at some iterations
• A memoryless method: each iteration is started afresh
• Not convergent unless the Hessian remains positive definite and a step size determination scheme is used

Marquardt Modification (1963)

$$d^{(k)} = -(H + \lambda I)^{-1}c^{(k)}$$

Far away from the solution point, this behaves like steepest descent; near the solution point, it behaves like the Newton method.

Step 1: $k = 0$; estimate $x^{(0)}$; choose $\varepsilon$; set $\lambda_0$ ($= 10000$, large)
Step 2: $c_i^{(k)} = \partial f(x^{(k)})/\partial x_i$, $i = 1, \ldots, n$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: $H(x^{(k)}) = \left[\partial^2 f / \partial x_i \partial x_j\right]$
Step 4: $d^{(k)} = -(H + \lambda_k I)^{-1}c^{(k)}$
Step 5: if $f(x^{(k)} + d^{(k)}) < f(x^{(k)})$, go to Step 6; otherwise set $\lambda_k = 2\lambda_k$ and go to Step 4
Step 6: set $\lambda_{k+1} = 0.5\lambda_k$, $k = k + 1$, and go to Step 2
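A sketch of these steps in Python, assuming numpy (the λ doubling/halving follows Steps 5-6; function names and the test problem choice are mine):

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, lam=1e4, max_iter=200):
    """Marquardt modification: d = -(H + lam*I)^{-1} c; double lam until
    the step reduces f, then halve lam for the next iteration."""
    x = np.asarray(x0, dtype=float)
    I = np.eye(x.size)
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        H = hess(x)
        while True:
            d = np.linalg.solve(H + lam * I, -c)
            if f(x + d) < f(x):          # Step 5: accept only if f decreases
                break
            lam *= 2.0                   # otherwise make the step more gradient-like
        x = x + d
        lam *= 0.5                       # Step 6: trust Newton more near the optimum
    return x

# try it on the quadratic example used above
f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
grad = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
hess = lambda x: np.array([[6.0, 2.0], [2.0, 4.0]])
print(marquardt(f, grad, hess, [5.0, 10.0]))   # -> ~(0, 0)
```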


Quasi-Newton Methods

Steepest descent:
• Uses only first-order information: poor rate of convergence
• Each iteration is started with new design variables, without using any information from previous iterations

Newton method:
• Uses second-order derivatives: quadratic convergence rate
• Requires calculation of $n(n+1)/2$ second-order derivatives!
• Difficulties if the Hessian is singular
• Not a learning process

Quasi-Newton (Update) Methods:

• Use first-order derivatives to generate approximations of the Hessian: combine the desirable features of both the steepest descent and Newton methods
• Use information from previous iterations to speed up convergence (learning processes)
• Several ways to approximate the (updated) Hessian or its inverse
• Preserve the properties of symmetry and positive definiteness

Davidon-Fletcher-Powell (DFP) Method

Davidon (1959), Fletcher and Powell (1963): approximate the Hessian inverse using only first derivatives:

$$\Delta x = -H^{-1}c \approx -Ac$$

$A$: found by using only first-order information.

DFP Procedure ($A \to H^{-1}$):

Step 1: $k = 0$; estimate $x^{(0)}$; choose $\varepsilon$; $A^{(0)}$ ($= I$, $\approx H^{-1}$)
Step 2: $c^{(k)} = \nabla f(x^{(k)})$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: $d^{(k)} = -A^{(k)}c^{(k)}$
Step 4: compute $\alpha_k = \alpha$ to minimize $f(x^{(k)} + \alpha d^{(k)})$
Step 5: $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$
Step 6: update $A^{(k)}$:

$$A^{(k+1)} = A^{(k)} + B^{(k)} + C^{(k)}$$
$$B^{(k)} = \frac{s^{(k)}s^{(k)T}}{s^{(k)}\cdot y^{(k)}}, \qquad C^{(k)} = \frac{-z^{(k)}z^{(k)T}}{y^{(k)}\cdot z^{(k)}}$$
$$s^{(k)} = \alpha_k d^{(k)} \;\text{(change in design)}, \qquad y^{(k)} = c^{(k+1)} - c^{(k)} \;\text{(change in gradient)}$$
$$c^{(k+1)} = \nabla f(x^{(k+1)}), \qquad z^{(k)} = A^{(k)}y^{(k)}$$

Step 7: set $k = k + 1$ and go to Step 2

DFP Properties:

The matrix $A^{(k)}$ is always positive definite, so the method always converges to a local minimum if $\alpha > 0$, since

$$\frac{d}{d\alpha}f(x^{(k)} + \alpha d^{(k)})\Big|_{\alpha=0} = -c^{(k)T}A^{(k)}c^{(k)} < 0$$
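A sketch of the DFP procedure in Python, assuming numpy and scipy (minimize_scalar stands in for the exact line search of Step 4; function names are mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x0, eps=1e-6, max_iter=100):
    """DFP quasi-Newton: A approximates the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)                   # A(0) = I
    c = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = -A @ c                       # Step 3
        alpha = minimize_scalar(lambda a: f(x + a * d)).x
        x_new = x + alpha * d
        c_new = grad(x_new)
        s = alpha * d                    # change in design
        y = c_new - c                    # change in gradient
        z = A @ y
        A = A + np.outer(s, s) / (s @ y) - np.outer(z, z) / (y @ z)
        x, c = x_new, c_new
    return x

# slide example: f = 5 x1^2 + 2 x1 x2 + x2^2 + 7 from x0 = (1, 2)
f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])
print(dfp(f, grad, [1.0, 2.0]))          # -> ~(0, 0)
```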


DFP Example:

$$f(x) = 5x_1^2 + 2x_1x_2 + x_2^2 + 7, \qquad x^{(0)} = (1, 2)$$

1-1. $x^{(0)} = (1, 2)$; $A^{(0)} = I$; $k = 0$, $\varepsilon = 0.001$
$c^{(0)} = (10x_1 + 2x_2,\; 2x_1 + 2x_2) = (14, 6)$
1-2. $\|c^{(0)}\| = \sqrt{14^2 + 6^2} = 15.232 > \varepsilon$
1-3. $d^{(0)} = -c^{(0)} = (-14, -6)$
1-4. $x^{(1)} = x^{(0)} + \alpha d^{(0)} = (1 - 14\alpha,\; 2 - 6\alpha)$
$$f(x^{(1)}) = f(\alpha) = 5(1-14\alpha)^2 + 2(1-14\alpha)(2-6\alpha) + (2-6\alpha)^2 + 7$$
$$\frac{df}{d\alpha} = 5(2)(-14)(1-14\alpha) + 2(-14)(2-6\alpha) + 2(-6)(1-14\alpha) + 2(-6)(2-6\alpha) = 0$$
$$\Rightarrow \alpha_0 = 0.0988, \qquad \frac{d^2f}{d\alpha^2} = 2348 > 0$$
1-5. $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = (1 - 14\alpha_0,\; 2 - 6\alpha_0) = (-0.386, 1.407)$


1-6. $s^{(0)} = \alpha_0 d^{(0)} = (-1.386, -0.593)$; $c^{(1)} = (-1.046, 2.042)$
$y^{(0)} = c^{(1)} - c^{(0)} = (-15.046, -3.958)$, $z^{(0)} = A^{(0)}y^{(0)} = y^{(0)}$
$s^{(0)}\cdot y^{(0)} = 23.20$, $y^{(0)}\cdot z^{(0)} = 242.05$

$$s^{(0)}s^{(0)T} = \begin{bmatrix} 1.921 & 0.822 \\ 0.822 & 0.352 \end{bmatrix}, \qquad z^{(0)}z^{(0)T} = \begin{bmatrix} 226.40 & 59.55 \\ 59.55 & 15.67 \end{bmatrix}$$

$$B^{(0)} = \begin{bmatrix} 0.0828 & 0.0354 \\ 0.0354 & 0.0152 \end{bmatrix}, \qquad C^{(0)} = \begin{bmatrix} -0.935 & -0.246 \\ -0.246 & -0.065 \end{bmatrix}$$

$$A^{(1)} = A^{(0)} + B^{(0)} + C^{(0)} = \begin{bmatrix} 0.148 & -0.211 \\ -0.211 & 0.950 \end{bmatrix}$$


2-2. $\|c^{(1)}\| = \sqrt{1.046^2 + 2.042^2} = 2.29 > \varepsilon$
2-3. $d^{(1)} = -A^{(1)}c^{(1)} = (0.586, -1.719)$
2-4. $x^{(2)} = x^{(1)} + \alpha d^{(1)}$; $\alpha_1 = 0.776$ (minimizes $f(x^{(1)} + \alpha d^{(1)})$)
2-5. $x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = (-0.386, 1.407) + (0.455, -1.334) = (0.069, 0.073)$

2-6. $s^{(1)} = \alpha_1 d^{(1)} = (0.455, -1.334)$; $c^{(2)} = (0.836, 0.284)$
$y^{(1)} = c^{(2)} - c^{(1)} = (1.882, -1.758)$
$z^{(1)} = A^{(1)}y^{(1)} = (0.649, -2.067)$
$s^{(1)}\cdot y^{(1)} = 3.201$, $y^{(1)}\cdot z^{(1)} = 4.855$

$$s^{(1)}s^{(1)T} = \begin{bmatrix} 0.207 & -0.607 \\ -0.607 & 1.780 \end{bmatrix}, \qquad z^{(1)}z^{(1)T} = \begin{bmatrix} 0.421 & -1.341 \\ -1.341 & 4.272 \end{bmatrix}$$

$$B^{(1)} = \begin{bmatrix} 0.0647 & -0.19 \\ -0.19 & 0.556 \end{bmatrix}, \qquad C^{(1)} = \begin{bmatrix} -0.0867 & 0.276 \\ 0.276 & -0.880 \end{bmatrix}$$

$$A^{(2)} = A^{(1)} + B^{(1)} + C^{(1)} = \begin{bmatrix} 0.126 & -0.125 \\ -0.125 & 0.626 \end{bmatrix}$$

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Directly update the Hessian using only first derivatives:

$$\Delta x = -H^{-1}c \;\Leftrightarrow\; H\Delta x = -c \;\approx\; A\Delta x = -c$$

$A$: found by using only first-order information.

BFGS Procedure:

Step 1: $k = 0$; estimate $x^{(0)}$; choose $\varepsilon$; $H^{(0)}$ ($= I$, $\approx H$)
Step 2: $c^{(k)} = \nabla f(x^{(k)})$; stop if $\|c^{(k)}\| < \varepsilon$
Step 3: solve $H^{(k)}d^{(k)} = -c^{(k)}$ to obtain $d^{(k)}$
Step 4: compute $\alpha_k = \alpha$ to minimize $f(x^{(k)} + \alpha d^{(k)})$
Step 5: $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$
Step 6: update $H^{(k)}$:

$$H^{(k+1)} = H^{(k)} + D^{(k)} + E^{(k)}$$
$$D^{(k)} = \frac{y^{(k)}y^{(k)T}}{y^{(k)}\cdot s^{(k)}}, \qquad E^{(k)} = \frac{c^{(k)}c^{(k)T}}{c^{(k)}\cdot d^{(k)}}$$
$$s^{(k)} = \alpha_k d^{(k)} \;\text{(change in design)}, \qquad y^{(k)} = c^{(k+1)} - c^{(k)} \;\text{(change in gradient)}, \qquad c^{(k+1)} = \nabla f(x^{(k+1)})$$

Step 7: set $k = k + 1$ and go to Step 2
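A sketch of the procedure in Python, in the direct-Hessian form of these slides, assuming numpy and scipy (minimize_scalar stands in for the exact line search; function names are mine):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bfgs_direct(f, grad, x0, eps=1e-6, max_iter=100):
    """BFGS in the direct form of the slides: H(k+1) = H + D + E."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                   # H(0) = I
    c = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(H, -c)       # Step 3: H d = -c
        alpha = minimize_scalar(lambda a: f(x + a * d)).x
        x_new = x + alpha * d
        c_new = grad(x_new)
        s = alpha * d                    # change in design
        y = c_new - c                    # change in gradient
        H = H + np.outer(y, y) / (y @ s) + np.outer(c, c) / (c @ d)
        x, c = x_new, c_new
    return x

# same example as the DFP sketch
f = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])
print(bfgs_direct(f, grad, [1.0, 2.0]))   # -> ~(0, 0)
```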


BFGS Example:

$$f(x) = 5x_1^2 + 2x_1x_2 + x_2^2 + 7, \qquad x^{(0)} = (1, 2)$$

1-1. $x^{(0)} = (1, 2)$; $H^{(0)} = I$; $k = 0$, $\varepsilon = 0.001$
$c^{(0)} = (10x_1 + 2x_2,\; 2x_1 + 2x_2) = (14, 6)$
1-2. $\|c^{(0)}\| = \sqrt{14^2 + 6^2} = 15.232 > \varepsilon$
1-3. $d^{(0)} = -c^{(0)} = (-14, -6)$
1-4. $x^{(1)} = x^{(0)} + \alpha d^{(0)} = (1 - 14\alpha,\; 2 - 6\alpha)$; minimizing $f(\alpha)$ as in the DFP example gives $\alpha_0 = 0.0988$, $d^2f/d\alpha^2 = 2348 > 0$
1-5. $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = (-0.386, 1.407)$


1-6. $s^{(0)} = \alpha_0 d^{(0)} = (-1.386, -0.593)$; $c^{(1)} = (-1.046, 2.042)$
$y^{(0)} = c^{(1)} - c^{(0)} = (-15.046, -3.958)$
$y^{(0)}\cdot s^{(0)} = 23.20$, $c^{(0)}\cdot d^{(0)} = -232.0$

$$y^{(0)}y^{(0)T} = \begin{bmatrix} 226.40 & 59.55 \\ 59.55 & 15.67 \end{bmatrix}, \qquad c^{(0)}c^{(0)T} = \begin{bmatrix} 196 & 84 \\ 84 & 36 \end{bmatrix}$$

$$D^{(0)} = \begin{bmatrix} 9.760 & 2.567 \\ 2.567 & 0.675 \end{bmatrix}, \qquad E^{(0)} = \begin{bmatrix} -0.845 & -0.362 \\ -0.362 & -0.155 \end{bmatrix}$$

$$H^{(1)} = H^{(0)} + D^{(0)} + E^{(0)} = \begin{bmatrix} 9.915 & 2.205 \\ 2.205 & 0.520 \end{bmatrix}$$


2-2. $\|c^{(1)}\| = \sqrt{1.046^2 + 2.042^2} = 2.29 > \varepsilon$
2-3. solve $H^{(1)}d^{(1)} = -c^{(1)}$: $d^{(1)} = (17.20, -76.77)$
2-4. $x^{(2)} = x^{(1)} + \alpha d^{(1)}$; $\alpha_1 = 0.018455$ (minimizes $f(x^{(1)} + \alpha d^{(1)})$)
2-5. $x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = (-0.0686, -0.0098)$
2-6. $s^{(1)} = \alpha_1 d^{(1)} = (0.317, -1.417)$; $c^{(2)} = (-0.706, -0.157)$

$y^{(1)} = c^{(2)} - c^{(1)} = (0.340, -2.199)$
$y^{(1)}\cdot s^{(1)} = 3.224$, $c^{(1)}\cdot d^{(1)} = -174.76$

$$y^{(1)}y^{(1)T} = \begin{bmatrix} 0.1156 & -0.748 \\ -0.748 & 4.836 \end{bmatrix}, \qquad c^{(1)}c^{(1)T} = \begin{bmatrix} 1.094 & -2.136 \\ -2.136 & 4.170 \end{bmatrix}$$

$$D^{(1)} = \begin{bmatrix} 0.036 & -0.232 \\ -0.232 & 1.500 \end{bmatrix}, \qquad E^{(1)} = \begin{bmatrix} -0.0063 & 0.0122 \\ 0.0122 & -0.0239 \end{bmatrix}$$

$$H^{(2)} = H^{(1)} + D^{(1)} + E^{(1)} = \begin{bmatrix} 9.945 & 1.985 \\ 1.985 & 1.996 \end{bmatrix}$$