Computational Optimization Steepest Descent 2/8

Page 1:

Computational Optimization

Steepest Descent 2/8

Page 2:

Most Obvious Algorithm

The negative gradient always points downhill.

Head in the direction of the negative gradient.

Use a linesearch to decide how far to go.

Repeat.

Page 3:

Steepest Descent Algorithm

Start with $x_0$

For $k = 1,\dots,K$: if $x_k$ is optimal then stop

Set $p_k = -\nabla f(x_k)$

Perform exact or backtracking linesearch to determine $\alpha_k$

$x_{k+1} = x_k + \alpha_k p_k$
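As a concrete illustration, here is a minimal Python sketch of this loop for the quadratic $f(x) = \tfrac12 x^TQx - b^Tx$, where the exact linesearch has the closed form $\alpha_k = \frac{g_k^Tg_k}{g_k^TQg_k}$ with $g_k = \nabla f(x_k) = Qx_k - b$. The function name, tolerance, and iteration cap are illustrative choices, not from the slides.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-8, K=1000):
    """Steepest descent for f(x) = 0.5 x'Qx - b'x with exact linesearch."""
    x = np.asarray(x0, dtype=float)
    for k in range(K):
        g = Q @ x - b                     # gradient: grad f(x) = Qx - b
        if np.linalg.norm(g) < tol:       # x_k is (approximately) optimal
            break
        alpha = (g @ g) / (g @ (Q @ g))   # exact linesearch step for a quadratic
        x = x - alpha * g                 # step along p_k = -g
    return x, k

# Example: the minimizer solves Qx = b
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
x_star, iters = steepest_descent_quadratic(Q, b, np.zeros(2))
```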

Page 4:

How far should we step?

Linesearch: $x_{k+1} = x_k + \alpha_k p_k$

Choices for $\alpha_k$:

Fixed: $\alpha_k = c > 0$, a constant

Decreasing: $\alpha_k \to 0$ with $\sum_i \alpha_i = \infty$

Exact

Approximate conditions: Armijo, Wolfe, Goldstein

Page 5:

Key Idea: Sufficient Decrease

The step can’t go to 0 unless the gradient goes to 0.

Page 6:

Armijo Condition

Define $g(\alpha) = f(x_k + \alpha p_k)$, so that $g'(0) = \nabla f(x_k)^T p_k$.

[Figure: plot of $g(\alpha)$ together with the lines $g(0) + \alpha\,g'(0)$ and $g(0) + c_1\,\alpha\,g'(0)$, with $c_1 \le 1/2$]

Armijo condition:

$f(x_k + \alpha_k p_k) \le f(x_k) + c_1\,\alpha_k\,\nabla f(x_k)^T p_k$

Page 7:

Curvature Condition

$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\,\nabla f(x_k)^T p_k$

Make sure the step uses up most of the available decrease

Page 8:

Wolfe Conditions

For $0 < c_1 < c_2 < 1$:

$f(x_k + \alpha_k p_k) \le f(x_k) + c_1\,\alpha_k\,\nabla f(x_k)^T p_k$

$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\,\nabla f(x_k)^T p_k$

A solution exists for any descent direction if $f$ is bounded below along the linesearch. (Lemma 3.1)
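As a sketch of how these two tests look in code (a hypothetical helper, not from the slides; the default $c_1$, $c_2$ are common textbook choices):

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for step length alpha."""
    g0 = grad(x) @ p                      # directional derivative; negative for a descent p
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * g0
    curvature = grad(x + alpha * p) @ p >= c2 * g0
    return armijo and curvature

# Example on f(x) = 0.5||x||^2 with the steepest descent direction
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, 2.0])
print(satisfies_wolfe(f, grad, x, -grad(x), alpha=1.0))  # True: exact minimizer along the line
```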

Page 9:

Backtracking Search

Key point: Stepsize cannot be allowed to go to 0 unless gradient is going to 0. Must have sufficient decrease.

Fix $\bar\alpha > 0$, $\tau \in (0,1)$, $\mu \in (0,1)$.

Take $\alpha_k$ as the largest element of $\{\bar\alpha,\, \tau\bar\alpha,\, \tau^2\bar\alpha,\, \dots\}$

which satisfies

$f(x_k + \alpha_k p_k) \le f(x_k) + \mu\,\alpha_k\,\nabla f(x_k)^T p_k$
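A minimal Python sketch of this backtracking rule; the parameter names mirror the reconstruction above ($\bar\alpha$ as alpha_bar, $\tau$ as tau, $\mu$ as mu), and the default values and iteration cap are illustrative assumptions.

```python
import numpy as np

def backtracking(f, grad, x, p, alpha_bar=1.0, tau=0.5, mu=1e-4, max_tries=50):
    """Return the first step in {alpha_bar, tau*alpha_bar, ...} with sufficient decrease."""
    g0 = grad(x) @ p                      # slope along p; must be < 0 for a descent direction
    alpha = alpha_bar
    for _ in range(max_tries):
        if f(x + alpha * p) <= f(x) + mu * alpha * g0:   # sufficient decrease test
            return alpha
        alpha *= tau                      # shrink the step and try again
    return alpha                          # fallback: smallest step tried

# Example on the ill-conditioned quadratic from the later slides
f = lambda x: 50 * (x[0] - 10) ** 2 + x[1] ** 2
grad = lambda x: np.array([100 * (x[0] - 10), 2 * x[1]])
x = np.array([0.0, 1.0])
alpha = backtracking(f, grad, x, -grad(x))
```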

Page 10:

Armijo Condition (repeat of Page 6)

$f(x_k + \alpha_k p_k) \le f(x_k) + c_1\,\alpha_k\,\nabla f(x_k)^T p_k$

Page 11:

Steepest Descent Algorithm

Start with $x_0$

For $k = 1,\dots,K$: if $x_k$ is optimal then stop

Set $p_k = -\nabla f(x_k)$

Perform backtracking linesearch to determine $\alpha_k$

$x_{k+1} = x_k + \alpha_k p_k$

Page 12:

Convergence Analysis

We usually analyze the quadratic problem.

This is equivalent to more general problems: let $x^*$ solve $\min f(x) = \tfrac12 x^TQx - b^Tx$, so $Qx^* = b$. Let $y = x - x^*$, so $x = y + x^*$; then minimizing $f(x)$ is equivalent to minimizing $\tilde f(y) = \tfrac12 y^TQy + c$ with $c = f(x^*)$, where $\min_y \tfrac12 y^TQy = 0$:

$f(x) = \tfrac12 (y + x^*)^T Q (y + x^*) - b^T (y + x^*)$

$\phantom{f(x)} = \tfrac12 y^TQy + y^TQx^* + \tfrac12 x^{*T}Qx^* - b^Ty - b^Tx^*$

$\phantom{f(x)} = \tfrac12 y^TQy + y^T(Qx^* - b) + \tfrac12 x^{*T}Qx^* - b^Tx^*$

$\phantom{f(x)} = \tfrac12 y^TQy + f(x^*)$
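A quick numerical check of this identity (my sketch; the particular $Q$, $b$, and test point are arbitrary assumptions):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x

x_star = np.linalg.solve(Q, b)           # minimizer satisfies Qx* = b
x = np.array([2.0, -1.0])                # arbitrary point
y = x - x_star

# f(x) should equal 0.5*y'Qy + f(x*)
print(f(x), 0.5 * y @ Q @ y + f(x_star))
```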

Page 13:

Convergence Rate

Consider the case of a constant stepsize. For our quadratic function (with the minimizer shifted to the origin, so $\nabla f(x) = Qx$):

$x_{k+1} = x_k - \alpha_k \nabla f(x_k) = (I - \alpha_k Q)\,x_k$

Square both sides:

$\|x_{k+1}\|^2 = \|(I - \alpha_k Q)\,x_k\|^2 \le \|I - \alpha_k Q\|^2\,\|x_k\|^2$

$\|I - \alpha_k Q\|$ is the maximum absolute eigenvalue of $I - \alpha_k Q$, and we know the eigenvalues of $I - \alpha_k Q$ are $(1 - \alpha_k \lambda_i)$, where $\lambda_1 \le \cdots \le \lambda_n$ are the eigs of $Q$.
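A sketch verifying the eigenvalue formula numerically; the diagonal $Q$ and the value of $\alpha$ are arbitrary illustrative choices.

```python
import numpy as np

Q = np.diag([1.0, 4.0, 10.0])            # eigenvalues 1, 4, 10
alpha = 0.15
M = np.eye(3) - alpha * Q

# For symmetric M, the spectral norm equals max_i |1 - alpha*lambda_i|
print(np.linalg.norm(M, 2))                                   # 0.85
print(max(abs(1 - alpha * lam) for lam in [1.0, 4.0, 10.0]))  # 0.85, same value
```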

Page 14:

Convergence Rate…

$\frac{\|x_{k+1}\|}{\|x_k\|} = \frac{\text{error}_{k+1}}{\text{error}_k} \le \max\{\,|1 - \alpha_k\lambda_1|,\;|1 - \alpha_k\lambda_n|\,\}$

The convergence rate is based on $\max\{|1 - \alpha_k\lambda_1|, |1 - \alpha_k\lambda_n|\}$, since for $\alpha_k > 0$

$\max_i |1 - \alpha_k\lambda_i| = \max\{\,|1 - \alpha_k\lambda_1|,\;|1 - \alpha_k\lambda_n|\,\}$

So we want this ratio to be as small as possible.

Page 15:

Best Step

The best fixed step minimizes $\max\{\,|1 - \alpha\lambda_1|,\;|1 - \alpha\lambda_n|\,\}$.

At the minimizer the two terms are equal, $1 - \alpha\lambda_1 = -(1 - \alpha\lambda_n)$, giving

$\alpha^* = \frac{2}{\lambda_1 + \lambda_n}$

with resulting ratio

$|1 - \alpha^*\lambda_1| = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$
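A short check of the best fixed step (the eigenvalues here are arbitrary illustrative values):

```python
lam1, lamn = 1.0, 50.0                   # smallest and largest eigenvalues of Q
alpha_star = 2 / (lam1 + lamn)
rate = max(abs(1 - alpha_star * lam1), abs(1 - alpha_star * lamn))
print(alpha_star, rate, (lamn - lam1) / (lamn + lam1))  # rate equals (lam_n - lam_1)/(lam_n + lam_1)
```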

Page 16:

Condition number determines convergence rate

Linear convergence, since

$\|x_{k+1}\| \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\,\|x_k\|$

Condition number: $\kappa = \lambda_n / \lambda_1$; $\kappa = 1$ is best.

$\frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \frac{\lambda_n/\lambda_1 - 1}{\lambda_n/\lambda_1 + 1} = \frac{\kappa - 1}{\kappa + 1}$

Page 17:

Well-Conditioned Problem

$x^2 + y^2$: condition number $= 1/1 = 1$

Steepest descent does one step, like Newton.

Page 18:

Ill-Conditioned Problem

$50(x - 10)^2 + y^2$: condition number $= 50/1 = 50$

Steepest descent ZIGZAGS!
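A sketch contrasting the two problems by counting exact-linesearch steepest descent iterations; the starting points, tolerance, and iteration cap are illustrative assumptions. Both objectives are written as $\tfrac12 x^TQx - b^Tx$ (equal to the slide forms up to an additive constant).

```python
import numpy as np

def sd_iterations(Q, b, x0, tol=1e-6, K=100000):
    """Count exact-linesearch steepest descent iterations on f = 0.5 x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for k in range(K):
        g = Q @ x - b
        if np.linalg.norm(g) < tol:
            return k
        x -= (g @ g) / (g @ (Q @ g)) * g
    return K

# Well conditioned: x^2 + y^2  ->  Q = 2I (cond = 1), minimizer at the origin
print(sd_iterations(np.diag([2.0, 2.0]), np.zeros(2), np.array([3.0, 4.0])))      # 1 step

# Ill conditioned: 50(x-10)^2 + y^2  ->  Q = diag(100, 2) (cond = 50), minimizer (10, 0)
print(sd_iterations(np.diag([100.0, 2.0]), np.array([1000.0, 0.0]),
                    np.array([0.0, 1.0])))                                        # many zigzag steps
```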

Page 19:

Linear Convergence for our fixed stepsize

$\|x_{k+1}\| \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\,\|x_k\|$

Page 20:

Theorem 3.3 NW

Let $\{x_k\}$ be the sequence generated by steepest descent with exact linesearch applied to $\min f(x) = \tfrac12 x^TQx - b^Tx$, where $Q$ is positive definite.

Then for any $x_0$, the sequence converges to the unique minimizer $x^*$ and

$\|x_{k+1} - x^*\|_Q^2 \le \left(\frac{\operatorname{cond}(Q) - 1}{\operatorname{cond}(Q) + 1}\right)^2 \|x_k - x^*\|_Q^2$

That is, the method converges linearly.
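A sketch verifying the bound on a single iteration, where $\|v\|_Q^2 = v^TQv$; the test matrix and starting point are arbitrary assumptions.

```python
import numpy as np

Q = np.array([[5.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(Q, b)
qnorm2 = lambda v: v @ Q @ v              # squared Q-norm ||v||_Q^2

x = np.array([4.0, -3.0])
g = Q @ x - b
x_next = x - (g @ g) / (g @ (Q @ g)) * g  # one exact-linesearch steepest descent step

lams = np.linalg.eigvalsh(Q)              # eigenvalues in ascending order
kappa = lams[-1] / lams[0]                # cond(Q)
bound = ((kappa - 1) / (kappa + 1)) ** 2
print(qnorm2(x_next - x_star) <= bound * qnorm2(x - x_star))  # True, as the theorem guarantees
```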

Page 21:

Condition Number

Officially:

$\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|$

But this is the ratio of the max and min eigenvalues if $A$ is p.d. and symmetric:

$\|A\|_2 = \sqrt{\lambda_{\max}(A^TA)} = \lambda_{\max}(A)$ if $A$ is p.d. and symmetric

$\|A^{-1}\|_2 = \sqrt{\lambda_{\max}(A^{-T}A^{-1})} = 1 / \lambda_{\min}(A)$

So $\operatorname{cond}(A) = \lambda_{\max}(A) / \lambda_{\min}(A)$.
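A quick check that the two definitions agree for a symmetric p.d. matrix (the test matrix is arbitrary):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite

via_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
lams = np.linalg.eigvalsh(A)             # eigenvalues in ascending order
via_eigs = lams[-1] / lams[0]
print(via_norms, via_eigs, np.linalg.cond(A, 2))   # all three agree
```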

Page 22:

Theorem 3.4

Suppose $f$ is twice continuously differentiable and the sequence $\{x_k\}$ generated by steepest descent converges to $x^*$ satisfying the second-order sufficient conditions (SOSC).

Let $r \in \left(\frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1},\; 1\right)$, where $\lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of $\nabla^2 f(x^*)$.

Then for all $k$ sufficiently large,

$f(x_{k+1}) - f(x^*) \le r^2 \left( f(x_k) - f(x^*) \right)$

Page 23:

Other possible directions

Any direction satisfying

$\nabla f(x_k)^T p_k < 0$

is a descent direction. Which ones will work? Measure the angle $\theta_k$ between $p_k$ and the negative gradient $-\nabla f(x_k)$:

$\cos\theta_k = \frac{-\nabla f(x_k)^T p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$
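A small sketch computing $\cos\theta_k$ for a few candidate directions; the gradient vector used here is an arbitrary assumption.

```python
import numpy as np

def cos_theta(g, p):
    """Cosine of the angle between direction p and the negative gradient -g."""
    return (-g @ p) / (np.linalg.norm(g) * np.linalg.norm(p))

g = np.array([3.0, 4.0])                     # gradient at the current point
print(cos_theta(g, -g))                      # steepest descent: cos = 1
print(cos_theta(g, np.array([-1.0, 0.0])))   # still a descent direction: 0 < cos < 1
print(cos_theta(g, np.array([4.0, -3.0])))   # orthogonal: cos = 0, not a descent direction
```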

Page 24:

Zoutendijk’s Theorem

For any descent directions and step sizes satisfying the Wolfe conditions, with $f$ bounded below, continuously differentiable, and $\nabla f$ Lipschitz continuous:

$\sum_{k \ge 0} \cos^2\theta_k\,\|\nabla f(x_k)\|^2 < \infty$

Corollary: $\cos^2\theta_k\,\|\nabla f(x_k)\|^2 \to 0$

Page 25:

Convergence Theorem

If for all iterations $k$

$\cos\theta_k \ge \delta > 0$

then

$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$

So steepest descent converges! Many other variations exist.

Page 26:

Theorem: Convergence of Gradient-Related Methods

(i) Assume the set $S$ is bounded, where

$S = \{\,x : f(x) \le f(x_0)\,\}$

(ii) Let $\nabla f(x)$ be Lipschitz continuous on $S$, i.e.

$\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$

for all $x, y \in S$ and some fixed finite $L$.

Consider the sequence generated by

$x_{k+1} = x_k + \alpha_k p_k$, with $x_k \in S$.

Page 27:

Theorem 10.2 Nash and Sofer

(iii) Let the directions $p_k$:

(iv) be gradient related:

$\frac{-p_k^T \nabla f(x_k)}{\|p_k\|\,\|\nabla f(x_k)\|} \ge \epsilon > 0$

and be bounded in norm:

$\|p_k\| \ge m\,\|\nabla f(x_k)\|$ for all $k$ (with $m > 0$)

$\|p_k\| \le M$ for all $k$

Page 28:

Theorem 10.2

(v) Let the stepsize $\alpha_k$ be the first element of the sequence $\{1, 1/2, 1/4, 1/8, \dots\}$ satisfying

$f(x_k + \alpha_k p_k) \le f(x_k) + \mu\,\alpha_k\,\nabla f(x_k)^T p_k$

with $0 < \mu < 1$.

Then $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$.

Page 29:

Steepest Descent

An obvious choice of direction is

$p_k = -\nabla f(x_k)$

Called steepest descent because this choice minimizes the normalized directional derivative over all directions:

$\min_{p} \frac{p^T \nabla f(x_k)}{\|p\|\,\|\nabla f(x_k)\|} = \frac{-\nabla f(x_k)^T \nabla f(x_k)}{\|\nabla f(x_k)\|\,\|\nabla f(x_k)\|} = -1$

Clearly satisfies the gradient-related requirement of Theorem 10.2.

Page 30:

Steepest Descent Summary

Simple algorithm

Inexpensive per iteration

Only requires first-derivative information

Global convergence to a local minimum

Linear convergence – may be slow

Convergence rate depends on condition number

May zigzag