TRANSCRIPT
Computational Optimization
Steepest Descent 2/8
Most Obvious Algorithm
The negative gradient always points downhill.
Head in the direction of the negative gradient.
Use linesearch to decide how far to go.
Repeat.
Steepest Descent Algorithm
Start with $x_0$.
For $k = 1, \dots, K$:
If $x_k$ is optimal, then stop.
Set $p_k = -\nabla f(x_k)$.
Perform exact or backtracking linesearch to determine $\alpha_k$.
Set $x_{k+1} = x_k + \alpha_k p_k$.
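As a concrete illustration, here is a minimal Python sketch of this loop for the quadratic $f(x) = \frac{1}{2}x'Qx - b'x$ analyzed later in the lecture, using the exact linesearch step, which for this quadratic has the closed form $\alpha_k = (g'g)/(g'Qg)$. The function and variable names are illustrative, not from the slides.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-8, max_iter=1000):
    """Steepest descent with exact linesearch on f(x) = 0.5*x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                  # gradient of the quadratic
        if np.linalg.norm(g) < tol:    # "x_k is optimal": gradient ~ 0
            break
        p = -g                         # steepest-descent direction
        alpha = (g @ g) / (g @ Q @ g)  # exact linesearch step for a quadratic
        x = x + alpha * p
    return x
```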
How far should we step?
Linesearch: $x_{k+1} = x_k + \alpha_k p_k$
Fixed: $\alpha_k = c > 0$, a constant
Decreasing: $\alpha_k \to 0$, e.g. $\alpha_k = 1/k$
Exact
Approximate conditions: Armijo, Wolfe, Goldstein
Key Idea: Sufficient Decrease
The step can’t go to 0 unless the gradient goes to 0.
Armijo Condition
Define $g(\alpha) = f(x_k + \alpha p_k)$, so that $g'(0) = \nabla f(x_k)' p_k$.
[Figure: the curve $g(\alpha)$ together with the lines $g(0) + \alpha g'(0)$ and $g(0) + c_1 \alpha g'(0)$]
$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)' p_k$
Curvature Condition
$\nabla f(x_k + \alpha_k p_k)' p_k \ge c_2 \nabla f(x_k)' p_k$
Make sure the step uses up most of the available decrease
Wolfe Condition
For $0 < c_1 < c_2 < 1$:
Sufficient decrease: $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)' p_k$
Curvature: $\nabla f(x_k + \alpha_k p_k)' p_k \ge c_2 \nabla f(x_k)' p_k$
A solution exists for any descent direction if f is bounded below along the linesearch.
(Lemma 3.1)
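A small sketch (hypothetical helper, not from the slides) that tests whether a candidate stepsize satisfies both Wolfe conditions; the defaults $c_1 = 10^{-4}$, $c_2 = 0.9$ are common textbook choices.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for step alpha along direction p."""
    slope0 = grad(x) @ p            # g'(0); negative for a descent direction
    decrease = f(x + alpha * p) <= f(x) + c1 * alpha * slope0
    curvature = grad(x + alpha * p) @ p >= c2 * slope0
    return decrease and curvature
```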
Backtracking Search
Key point: Stepsize cannot be allowed to go to 0 unless gradient is going to 0. Must have sufficient decrease.
Fix $\bar{\alpha} > 0$, $\rho \in (0,1)$, $\mu \in (0,1)$.
Choose $\alpha_k = \max(\bar{\alpha}\rho^0, \bar{\alpha}\rho^1, \bar{\alpha}\rho^2, \dots)$
which satisfies
$f(x_k + \alpha_k p_k) \le f(x_k) + \mu \alpha_k \nabla f(x_k)' p_k$
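A minimal sketch of this backtracking rule in Python; `rho` and `mu` mirror the parameters above, `g` is the gradient at `x` (a NumPy array), and the helper name is illustrative.

```python
def backtracking(f, g, x, p, alpha_bar=1.0, rho=0.5, mu=1e-4):
    """Largest alpha_bar * rho**i satisfying the sufficient-decrease test."""
    slope = g @ p                  # directional derivative; < 0 for descent
    alpha = alpha_bar
    while f(x + alpha * p) > f(x) + mu * alpha * slope:
        alpha *= rho               # shrink until sufficient decrease holds
    return alpha
```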
Steepest Descent Algorithm
Start with $x_0$.
For $k = 1, \dots, K$:
If $x_k$ is optimal, then stop.
Set $p_k = -\nabla f(x_k)$.
Perform backtracking linesearch to determine $\alpha_k$.
Set $x_{k+1} = x_k + \alpha_k p_k$.
Convergence Analysis
We usually analyze the quadratic problem.
This is equivalent to more general quadratics: let $x^*$ solve $\min f(x) = \frac{1}{2}x'Qx - b'x$. Let $y = x - x^*$, so $x = y + x^*$. Then the problem becomes
$\min \tilde{f}(y) = \frac{1}{2}y'Qy + c$, where $\min_y \frac{1}{2}y'Qy = 0$:
$f(x) = \frac{1}{2}(y + x^*)'Q(y + x^*) - b'(y + x^*)$
$= \frac{1}{2}y'Qy + \frac{1}{2}x^{*\prime}Qx^* + y'Qx^* - b'y - b'x^*$
$= \frac{1}{2}y'Qy + y'(Qx^* - b) + \frac{1}{2}x^{*\prime}Qx^* - b'x^*$
$= \frac{1}{2}y'Qy + f(x^*)$
since $Qx^* = b$ at the minimizer, so the linear term vanishes.
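A quick numerical sanity check of this identity (random p.d. $Q$; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3 * np.eye(3)            # random symmetric positive definite Q
b = rng.standard_normal(3)

f = lambda x: 0.5 * x @ Q @ x - b @ x
x_star = np.linalg.solve(Q, b)          # minimizer satisfies Q x* = b

x = rng.standard_normal(3)
y = x - x_star
print(np.isclose(f(x), 0.5 * y @ Q @ y + f(x_star)))   # True
```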
Convergence Rate
Consider the case of a constant stepsize $\alpha$:
$x_{k+1} = x_k - \alpha \nabla f(x_k) = (I - \alpha Q)x_k$ for our quadratic function (after the shift above, $\nabla f(x_k) = Qx_k$).
Square both sides:
$\|x_{k+1}\|^2 = \|(I - \alpha Q)x_k\|^2 = x_k'(I - \alpha Q)^2 x_k \le \big(\max_i |1 - \alpha\lambda_i|\big)^2 \|x_k\|^2$
and we know the eigenvalues of $I - \alpha Q$ are $(1 - \alpha\lambda_i)$, where $\lambda_1 \le \dots \le \lambda_n$ are the eigs of $Q$.
Convergence Rate…
$\frac{\text{error}_{k+1}}{\text{error}_k} = \frac{\|x_{k+1}\|}{\|x_k\|} \le \max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$
since $\max_i |1 - \alpha\lambda_i| = \max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$.
The convergence rate is based on $\max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$,
so we want this ratio to be as small as possible.
Best Step
The best fixed step minimizes $\max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$:
$\alpha^* = \frac{2}{\lambda_1 + \lambda_n}$
The condition number determines the convergence rate.
Linear convergence, since
$\frac{\|x_{k+1}\|}{\|x_k\|} \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \frac{\kappa - 1}{\kappa + 1}$
Condition number: $\kappa = \frac{\lambda_n}{\lambda_1}$; $\kappa = 1$ is best.
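A short sketch computing this optimal fixed step and the resulting contraction factor for a given $Q$ (illustrative helper):

```python
import numpy as np

def best_fixed_step(Q):
    """Optimal constant stepsize and error-contraction rate for 0.5*x'Qx."""
    eigs = np.linalg.eigvalsh(Q)       # eigenvalues in ascending order
    lam1, lamn = eigs[0], eigs[-1]
    alpha = 2.0 / (lam1 + lamn)        # minimizes max|1 - alpha*lambda_i|
    kappa = lamn / lam1                # condition number
    return alpha, (kappa - 1) / (kappa + 1)

print(best_fixed_step(np.diag([1.0, 50.0])))   # rate = 49/51 ~ 0.96: slow
```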
Well-Conditioned Problem
$x^2 + y^2$: cond num $= 1/1 = 1$
Steepest descent does one step, like Newton.
Ill-Conditioned Problem
$50(x - 10)^2 + y^2$: cond num $= 50/1 = 50$
Steepest descent ZIGZAGS!!!
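To see the contrast numerically, a small sketch counting exact-linesearch steepest-descent iterations on each quadratic (setup and names are illustrative):

```python
import numpy as np

def sd_iterations(Q, b, x0, tol=1e-8, max_iter=10_000):
    """Exact-linesearch steepest-descent iteration count on 0.5*x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x - b
        if np.linalg.norm(g) < tol:
            return k
        x -= ((g @ g) / (g @ Q @ g)) * g   # exact step along -gradient
    return max_iter

# x^2 + y^2: Q = 2I -- converges in a single step
print(sd_iterations(np.diag([2.0, 2.0]), np.zeros(2), [3.0, 4.0]))
# 50(x-10)^2 + y^2: Q = diag(100, 2), b = (1000, 0) -- zigzags for hundreds of steps
print(sd_iterations(np.diag([100.0, 2.0]), np.array([1000.0, 0.0]), [0.0, 1.0]))
```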
Linear Convergence
For our fixed stepsize:
$\frac{\|x_{k+1}\|}{\|x_k\|} \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$
Theorem 3.3 NW
Let $\{x_k\}$ be the sequence generated by steepest descent with exact linesearch applied to the function $\min \frac{1}{2}x'Qx - b'x$, where $Q$ is p.d.
Then for any $x_0$, the sequence converges to the unique minimizer $x^*$ and
$\|x_{k+1} - x^*\|_Q^2 \le \left(\frac{\mathrm{cond}(Q) - 1}{\mathrm{cond}(Q) + 1}\right)^2 \|x_k - x^*\|_Q^2$
That is, the method converges linearly.
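A sketch verifying this bound numerically on a random p.d. quadratic; here $\|v\|_Q^2 = v'Qv$, and the setup is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q = A @ A.T + np.eye(4)                    # symmetric positive definite
b = rng.standard_normal(4)
x_star = np.linalg.solve(Q, b)

qnorm2 = lambda v: v @ Q @ v               # squared Q-norm
kappa = np.linalg.cond(Q)
bound = ((kappa - 1) / (kappa + 1)) ** 2

x = rng.standard_normal(4)
for _ in range(20):
    g = Q @ x - b
    x_new = x - ((g @ g) / (g @ Q @ g)) * g          # exact linesearch step
    assert qnorm2(x_new - x_star) <= bound * qnorm2(x - x_star) + 1e-12
    x = x_new
print("Theorem 3.3 bound holds; contraction factor:", bound)
```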
Condition Number
Officially: $\mathrm{cond}(A) = \|A\|\,\|A^{-1}\|$
But this is the ratio of the max and min eigenvalues if $A$ is p.d. and symmetric:
$\|A\|_2 = \sqrt{\lambda_{\max}(A'A)} = \lambda_{\max}(A)$ if $A$ is p.d. and symmetric
$\|A^{-1}\|_2 = \sqrt{\lambda_{\max}((A^{-1})'A^{-1})} = 1/\lambda_{\min}(A)$
So $\mathrm{cond}(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$.
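A quick check that NumPy's 2-norm condition number matches the eigenvalue ratio for a symmetric p.d. matrix (illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = A @ A.T + np.eye(5)            # make A symmetric positive definite

eigs = np.linalg.eigvalsh(A)       # real eigenvalues, ascending order
print(eigs[-1] / eigs[0])          # lambda_max / lambda_min
print(np.linalg.cond(A))           # ||A||_2 * ||A^{-1}||_2 -- same value
```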
Theorem 3.4
Suppose $f$ is twice continuously differentiable and the sequence of steepest descent iterates converges to $x^*$ satisfying the SOSC.
Let $r = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$, where $\lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of $\nabla^2 f(x^*)$.
Then for all $k$ sufficiently large,
$f(x_{k+1}) - f(x^*) \le r^2\,(f(x_k) - f(x^*))$
Other possible directions
Any direction satisfying $\nabla f(x_k)' p_k < 0$
is a descent direction.
Which ones will work?
Let $\theta_k$ be the angle between $p_k$ and $-\nabla f(x_k)$:
$\cos\theta_k = \frac{-\nabla f(x_k)' p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$
Zoutendijk’s Theorem
For any descent directions and stepsizes satisfying the Wolfe conditions, with $f$ bounded below, differentiable, and with Lipschitz continuous gradient:
$\sum_{k \ge 0} \cos^2\theta_k\,\|\nabla f(x_k)\|^2 < \infty$
Corollary: $\cos^2\theta_k\,\|\nabla f(x_k)\|^2 \to 0$
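An illustrative check of Zoutendijk's sum for steepest descent, where $\cos\theta_k = 1$; exact linesearch on a quadratic satisfies the Wolfe conditions here, and the specific $Q$ is just an example.

```python
import numpy as np

Q = np.diag([1.0, 10.0, 100.0])            # p.d. quadratic f(x) = 0.5*x'Qx
x = np.array([1.0, 1.0, 1.0])
total = 0.0
for _ in range(200):
    g = Q @ x
    total += g @ g                         # cos^2(theta_k)*||grad||^2, cos = 1
    x -= ((g @ g) / (g @ Q @ g)) * g       # exact steepest-descent step
print(total)                               # partial sums approach a finite limit
```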
Convergence Theorem
If for all iterations $k$, $\cos\theta_k \ge \delta > 0$,
then $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$.
So steepest descent converges!
Many other variations.
Theorem Convergence of Gradient Related Methods
(i) Assume the set $S$ is bounded, where $S = \{x : f(x) \le f(x_0)\}$.
(ii) Let $\nabla f(x)$ be Lipschitz continuous on $S$, i.e.
$\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$
for all $x, y \in S$ and some fixed finite $L$.
For the sequence generated by
$x_{k+1} = x_k + \alpha_k p_k$, with $x_k \in S$:
Theorem 10.2 Nash and Sofer
(iii) Let the directions $p_k$ be gradient related:
$\frac{-p_k' \nabla f(x_k)}{\|p_k\|\,\|\nabla f(x_k)\|} \ge \epsilon > 0$
(iv) and bounded in norm:
$\|p_k\| \ge m\,\|\nabla f(x_k)\|$ for all $k$ (with $m > 0$)
$\|p_k\| \le M$ for all $k$
Theorem 10.2
(v) Let the stepsize $\alpha_k$ be the first element of the sequence $\{1, 1/2, 1/4, 1/8, \dots\}$ satisfying
$f(x_k + \alpha_k p_k) \le f(x_k) + \mu\,\alpha_k \nabla f(x_k)' p_k$
with $0 < \mu < 1$.
Then $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$.
Steepest Descent
An obvious choice of direction is $p_k = -\nabla f(x_k)$.
It is called steepest descent because, among directions of a fixed length, $-\nabla f(x_k)$ minimizes the directional derivative $\nabla f(x_k)' p$.
It clearly satisfies the gradient-related requirement of Theorem 10.2:
$\frac{-p_k' \nabla f(x_k)}{\|p_k\|\,\|\nabla f(x_k)\|} = \frac{\nabla f(x_k)' \nabla f(x_k)}{\|\nabla f(x_k)\|\,\|\nabla f(x_k)\|} = 1 \ge \epsilon$
Steepest Descent Summary
Simple algorithm
Inexpensive per iteration
Only requires first-derivative information
Global convergence to a local minimum
Linear convergence – may be slow
Convergence rate depends on the condition number
May zigzag