TRANSCRIPT
Computational Optimization
Steepest Descent 2/8
Most Obvious Algorithm
The negative gradient always points downhill.
Head in the direction of the negative gradient.
Use linesearch to decide how far to go.
Repeat.
Steepest Descent Algorithm
Start with $x_0$.
For $k = 1, \dots, K$:
If $x_k$ is optimal, then stop.
Set $p_k = -\nabla f(x_k)$.
Perform exact or backtracking linesearch to determine $\alpha_k$.
Set $x_{k+1} = x_k + \alpha_k p_k$.
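As a concrete illustration, here is a minimal Python sketch of this loop for the quadratic $f(x) = \frac{1}{2}x'Qx - b'x$ analyzed later in the lecture, using the exact linesearch step, which for this quadratic has the closed form $\alpha_k = (g'g)/(g'Qg)$. The function and variable names are illustrative, not from the slides.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-8, max_iter=1000):
    """Steepest descent with exact linesearch on f(x) = 0.5*x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                  # gradient of the quadratic
        if np.linalg.norm(g) < tol:    # "x_k is optimal": gradient ~ 0
            break
        p = -g                         # steepest-descent direction
        alpha = (g @ g) / (g @ Q @ g)  # exact linesearch step for a quadratic
        x = x + alpha * p
    return x
```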
How far should we step?
Linesearch: $x_{k+1} = x_k + \alpha_k p_k$
Fixed: $\alpha_k = c > 0$, a constant
Decreasing: $\alpha_k \to 0$, e.g. $\alpha_k = 1/k$
Exact
Approximate conditions: Armijo, Wolfe, Goldstein
Key Idea: Sufficient Decrease
The step can’t go to 0 unless the gradient goes to 0.
Armijo Condition
Define $g(\alpha) = f(x_k + \alpha p_k)$, so that $g'(0) = \nabla f(x_k)' p_k$.
[Figure: the curve $g(\alpha)$ together with the lines $g(0) + \alpha g'(0)$ and $g(0) + c_1 \alpha g'(0)$]
$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)' p_k$
Curvature Condition
$\nabla f(x_k + \alpha_k p_k)' p_k \ge c_2 \nabla f(x_k)' p_k$
Make sure the step uses up most of the available decrease
Wolfe Condition
For $0 < c_1 < c_2 < 1$:
Sufficient decrease: $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)' p_k$
Curvature: $\nabla f(x_k + \alpha_k p_k)' p_k \ge c_2 \nabla f(x_k)' p_k$
A solution exists for any descent direction if f is bounded below along the linesearch.
(Lemma 3.1)
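A small sketch (hypothetical helper, not from the slides) that tests whether a candidate stepsize satisfies both Wolfe conditions; the defaults $c_1 = 10^{-4}$, $c_2 = 0.9$ are common textbook choices.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for step alpha along direction p."""
    slope0 = grad(x) @ p            # g'(0); negative for a descent direction
    decrease = f(x + alpha * p) <= f(x) + c1 * alpha * slope0
    curvature = grad(x + alpha * p) @ p >= c2 * slope0
    return decrease and curvature
```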
Backtracking Search
Key point: Stepsize cannot be allowed to go to 0 unless gradient is going to 0. Must have sufficient decrease.
Fix $\bar{\alpha} > 0$, $\rho \in (0,1)$, $\mu \in (0,1)$.
Choose $\alpha_k = \max(\bar{\alpha}\rho^0, \bar{\alpha}\rho^1, \bar{\alpha}\rho^2, \dots)$
which satisfies
$f(x_k + \alpha_k p_k) \le f(x_k) + \mu \alpha_k \nabla f(x_k)' p_k$
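A minimal sketch of this backtracking rule in Python; `rho` and `mu` mirror the parameters above, `g` is the gradient at `x` (a NumPy array), and the helper name is illustrative.

```python
def backtracking(f, g, x, p, alpha_bar=1.0, rho=0.5, mu=1e-4):
    """Largest alpha_bar * rho**i satisfying the sufficient-decrease test."""
    slope = g @ p                  # directional derivative; < 0 for descent
    alpha = alpha_bar
    while f(x + alpha * p) > f(x) + mu * alpha * slope:
        alpha *= rho               # shrink until sufficient decrease holds
    return alpha
```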
Steepest Descent Algorithm
Start with $x_0$.
For $k = 1, \dots, K$:
If $x_k$ is optimal, then stop.
Set $p_k = -\nabla f(x_k)$.
Perform backtracking linesearch to determine $\alpha_k$.
Set $x_{k+1} = x_k + \alpha_k p_k$.
Convergence Analysis
We usually analyze the quadratic problem.
This is equivalent to more general quadratics: let $x^*$ solve $\min f(x) = \frac{1}{2}x'Qx - b'x$. Let $y = x - x^*$, so $x = y + x^*$. Then the problem becomes
$\min \tilde{f}(y) = \frac{1}{2}y'Qy + c$, where $\min_y \frac{1}{2}y'Qy = 0$:
$f(x) = \frac{1}{2}(y + x^*)'Q(y + x^*) - b'(y + x^*)$
$= \frac{1}{2}y'Qy + \frac{1}{2}x^{*\prime}Qx^* + y'Qx^* - b'y - b'x^*$
$= \frac{1}{2}y'Qy + y'(Qx^* - b) + \frac{1}{2}x^{*\prime}Qx^* - b'x^*$
$= \frac{1}{2}y'Qy + f(x^*)$
since $Qx^* = b$ at the minimizer, so the linear term vanishes.
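A quick numerical sanity check of this identity (random p.d. $Q$; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3 * np.eye(3)            # random symmetric positive definite Q
b = rng.standard_normal(3)

f = lambda x: 0.5 * x @ Q @ x - b @ x
x_star = np.linalg.solve(Q, b)          # minimizer satisfies Q x* = b

x = rng.standard_normal(3)
y = x - x_star
print(np.isclose(f(x), 0.5 * y @ Q @ y + f(x_star)))   # True
```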
Convergence Rate
Consider the case of a constant stepsize $\alpha$:
$x_{k+1} = x_k - \alpha \nabla f(x_k) = (I - \alpha Q)x_k$ for our quadratic function (after the shift above, $\nabla f(x_k) = Qx_k$).
Square both sides:
$\|x_{k+1}\|^2 = \|(I - \alpha Q)x_k\|^2 = x_k'(I - \alpha Q)^2 x_k \le \big(\max_i |1 - \alpha\lambda_i|\big)^2 \|x_k\|^2$
and we know the eigenvalues of $I - \alpha Q$ are $(1 - \alpha\lambda_i)$, where $\lambda_1 \le \dots \le \lambda_n$ are the eigs of $Q$.
Convergence Rate…
$\frac{\text{error}_{k+1}}{\text{error}_k} = \frac{\|x_{k+1}\|}{\|x_k\|} \le \max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$
since $\max_i |1 - \alpha\lambda_i| = \max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$.
The convergence rate is based on $\max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$,
so we want this ratio to be as small as possible.
Best Step
The best fixed step minimizes $\max\{|1 - \alpha\lambda_1|, |1 - \alpha\lambda_n|\}$:
$\alpha^* = \frac{2}{\lambda_1 + \lambda_n}$
The condition number determines the convergence rate.
Linear convergence, since
$\frac{\|x_{k+1}\|}{\|x_k\|} \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \frac{\kappa - 1}{\kappa + 1}$
Condition number: $\kappa = \frac{\lambda_n}{\lambda_1}$; $\kappa = 1$ is best.
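A short sketch computing this optimal fixed step and the resulting contraction factor for a given $Q$ (illustrative helper):

```python
import numpy as np

def best_fixed_step(Q):
    """Optimal constant stepsize and error-contraction rate for 0.5*x'Qx."""
    eigs = np.linalg.eigvalsh(Q)       # eigenvalues in ascending order
    lam1, lamn = eigs[0], eigs[-1]
    alpha = 2.0 / (lam1 + lamn)        # minimizes max|1 - alpha*lambda_i|
    kappa = lamn / lam1                # condition number
    return alpha, (kappa - 1) / (kappa + 1)

print(best_fixed_step(np.diag([1.0, 50.0])))   # rate = 49/51 ~ 0.96: slow
```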
Well-Conditioned Problem
$x^2 + y^2$: cond num $= 1/1 = 1$
Steepest descent does one step, like Newton.
Ill-Conditioned Problem
$50(x - 10)^2 + y^2$: cond num $= 50/1 = 50$
Steepest descent ZIGZAGS!!!
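To see the contrast numerically, a small sketch counting exact-linesearch steepest-descent iterations on each quadratic (setup and names are illustrative):

```python
import numpy as np

def sd_iterations(Q, b, x0, tol=1e-8, max_iter=10_000):
    """Exact-linesearch steepest-descent iteration count on 0.5*x'Qx - b'x."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x - b
        if np.linalg.norm(g) < tol:
            return k
        x -= ((g @ g) / (g @ Q @ g)) * g   # exact step along -gradient
    return max_iter

# x^2 + y^2: Q = 2I -- converges in a single step
print(sd_iterations(np.diag([2.0, 2.0]), np.zeros(2), [3.0, 4.0]))
# 50(x-10)^2 + y^2: Q = diag(100, 2), b = (1000, 0) -- zigzags for hundreds of steps
print(sd_iterations(np.diag([100.0, 2.0]), np.array([1000.0, 0.0]), [0.0, 1.0]))
```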
Linear Convergence
For our fixed stepsize:
$\frac{\|x_{k+1}\|}{\|x_k\|} \le \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$
Theorem 3.3 NW
Let $\{x_k\}$ be the sequence generated by steepest descent with exact linesearch applied to the function $\min \frac{1}{2}x'Qx - b'x$, where $Q$ is p.d.
Then for any $x_0$, the sequence converges to the unique minimizer $x^*$ and
$\|x_{k+1} - x^*\|_Q^2 \le \left(\frac{\mathrm{cond}(Q) - 1}{\mathrm{cond}(Q) + 1}\right)^2 \|x_k - x^*\|_Q^2$
That is, the method converges linearly.
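A sketch verifying this bound numerically on a random p.d. quadratic; here $\|v\|_Q^2 = v'Qv$, and the setup is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q = A @ A.T + np.eye(4)                    # symmetric positive definite
b = rng.standard_normal(4)
x_star = np.linalg.solve(Q, b)

qnorm2 = lambda v: v @ Q @ v               # squared Q-norm
kappa = np.linalg.cond(Q)
bound = ((kappa - 1) / (kappa + 1)) ** 2

x = rng.standard_normal(4)
for _ in range(20):
    g = Q @ x - b
    x_new = x - ((g @ g) / (g @ Q @ g)) * g          # exact linesearch step
    assert qnorm2(x_new - x_star) <= bound * qnorm2(x - x_star) + 1e-12
    x = x_new
print("Theorem 3.3 bound holds; contraction factor:", bound)
```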
Condition Number
Officially: $\mathrm{cond}(A) = \|A\|\,\|A^{-1}\|$
But this is the ratio of the max and min eigenvalues if $A$ is p.d. and symmetric:
$\|A\|_2 = \sqrt{\lambda_{\max}(A'A)} = \lambda_{\max}(A)$ if $A$ is p.d. and symmetric
$\|A^{-1}\|_2 = \sqrt{\lambda_{\max}((A^{-1})'A^{-1})} = 1/\lambda_{\min}(A)$
So $\mathrm{cond}(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$.
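A quick check that NumPy's 2-norm condition number matches the eigenvalue ratio for a symmetric p.d. matrix (illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
A = A @ A.T + np.eye(5)            # make A symmetric positive definite

eigs = np.linalg.eigvalsh(A)       # real eigenvalues, ascending order
print(eigs[-1] / eigs[0])          # lambda_max / lambda_min
print(np.linalg.cond(A))           # ||A||_2 * ||A^{-1}||_2 -- same value
```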
Theorem 3.4
Suppose $f$ is twice continuously differentiable and the sequence of steepest descent iterates converges to $x^*$ satisfying the SOSC.
Let $r = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$, where $\lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of $\nabla^2 f(x^*)$.
Then for all $k$ sufficiently large,
$f(x_{k+1}) - f(x^*) \le r^2\,(f(x_k) - f(x^*))$
Other possible directions
Any direction satisfying $\nabla f(x_k)' p_k < 0$
is a descent direction.
Which ones will work?
Let $\theta_k$ be the angle between $p_k$ and $-\nabla f(x_k)$:
$\cos\theta_k = \frac{-\nabla f(x_k)' p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$
Zoutendijk’s Theorem
For any descent directions and stepsizes satisfying the Wolfe conditions, with $f$ bounded below, differentiable, and with Lipschitz continuous gradient:
$\sum_{k \ge 0} \cos^2\theta_k\,\|\nabla f(x_k)\|^2 < \infty$
Corollary: $\cos^2\theta_k\,\|\nabla f(x_k)\|^2 \to 0$
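An illustrative check of Zoutendijk's sum for steepest descent, where $\cos\theta_k = 1$; exact linesearch on a quadratic satisfies the Wolfe conditions here, and the specific $Q$ is just an example.

```python
import numpy as np

Q = np.diag([1.0, 10.0, 100.0])            # p.d. quadratic f(x) = 0.5*x'Qx
x = np.array([1.0, 1.0, 1.0])
total = 0.0
for _ in range(200):
    g = Q @ x
    total += g @ g                         # cos^2(theta_k)*||grad||^2, cos = 1
    x -= ((g @ g) / (g @ Q @ g)) * g       # exact steepest-descent step
print(total)                               # partial sums approach a finite limit
```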
Convergence Theorem
If for all iterations $k$, $\cos\theta_k \ge \delta > 0$,
then $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$.
So steepest descent converges!
Many other variations.
Theorem Convergence of Gradient Related Methods
(i) Assume the set $S$ is bounded, where $S = \{x : f(x) \le f(x_0)\}$.
(ii) Let $\nabla f(x)$ be Lipschitz continuous on $S$, i.e.
$\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$
for all $x, y \in S$ and some fixed finite $L$.
For the sequence generated by
$x_{k+1} = x_k + \alpha_k p_k$, with $x_k \in S$:
Theorem 10.2 Nash and Sofer
(iii) Let the directions $p_k$ be gradient related:
$\frac{-p_k' \nabla f(x_k)}{\|p_k\|\,\|\nabla f(x_k)\|} \ge \epsilon > 0$
(iv) and bounded in norm:
$\|p_k\| \ge m\,\|\nabla f(x_k)\|$ for all $k$ (with $m > 0$)
$\|p_k\| \le M$ for all $k$
Theorem 10.2
(v) Let the stepsize $\alpha_k$ be the first element of the sequence $\{1, 1/2, 1/4, 1/8, \dots\}$ satisfying
$f(x_k + \alpha_k p_k) \le f(x_k) + \mu\,\alpha_k \nabla f(x_k)' p_k$
with $0 < \mu < 1$.
Then $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$.
Steepest Descent
An obvious choice of direction is $p_k = -\nabla f(x_k)$.
It is called steepest descent because, among directions of a fixed length, $-\nabla f(x_k)$ minimizes the directional derivative $\nabla f(x_k)' p$.
It clearly satisfies the gradient-related requirement of Theorem 10.2:
$\frac{-p_k' \nabla f(x_k)}{\|p_k\|\,\|\nabla f(x_k)\|} = \frac{\nabla f(x_k)' \nabla f(x_k)}{\|\nabla f(x_k)\|\,\|\nabla f(x_k)\|} = 1 \ge \epsilon$
Steepest Descent Summary
Simple algorithm
Inexpensive per iteration
Only requires first-derivative information
Global convergence to a local minimum
Linear convergence – may be slow
Convergence rate depends on the condition number
May zigzag