newton handouts
TRANSCRIPT
8/12/2019 Newton Handouts
Methods for unconstrained optimization
Convergence
Descent directions
Line search
The Newton Method
Nonlinear Optimization
Overview of methods; the Newton method with line search
Niclas Börlin
Department of Computing Science
Umeå University
November 19, 2007
© 2007 Niclas Börlin, CS, UmU — Nonlinear Optimization; The Newton method w/ line search
Overview
Most deterministic methods for unconstrained optimization have the following features:

They are iterative, i.e. they start with an initial guess x_0 of the variables and try to find better points {x_k}, k = 1, . . ..

They are descent methods, i.e. at each iteration k,
    f(x_{k+1}) < f(x_k).

In the line search strategy, a search direction p_k is chosen and the step is determined by approximately solving
    min_{α>0} f(x_k + α p_k),
where the scalar α is called the step length.

In theory we would like optimal step lengths, but in practice it is more efficient to test trial step lengths until we find one that gives us a good enough point.
[Figure: contours of f around the iterate x_k with the search direction p]
Trust-region
In the trust-region strategy, the algorithm defines a region of trust around x_k where the current model function m_k is trusted.

The region of trust is usually defined as
    ‖p‖_2 ≤ Δ,
where the scalar Δ is called the trust-region radius.

A candidate step p is found by approximately solving the following subproblem:
    min_p m_k(x_k + p)  subject to  ‖p‖_2 ≤ Δ.

If the candidate step does not produce a good enough new point, we shrink the trust-region radius and re-solve the subproblem.
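The shrink-and-resolve loop described above can be sketched as follows. This is an illustrative implementation, not code from the handout: the subproblem is solved only approximately via the Cauchy point (the model minimizer along the steepest-descent direction, clipped to the radius), and the acceptance threshold eta is a typical but arbitrary choice.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Approximate solution of min_p g.p + 0.5 p.B.p subject to ||p||_2 <= delta."""
    gBg = g @ B @ g
    if gBg > 0:
        # Clip the unconstrained minimizer along -g to the trust-region radius.
        tau = min(1.0, np.linalg.norm(g) ** 3 / (delta * gBg))
    else:
        tau = 1.0  # negative curvature along -g: go to the boundary
    return -tau * delta / np.linalg.norm(g) * g

def trust_region_step(f, grad, hess, x, delta, eta=0.1):
    """One trust-region iteration: shrink delta until the candidate step is good enough."""
    g, B = grad(x), hess(x)
    while True:
        p = cauchy_point(g, B, delta)
        model_decrease = -(g @ p + 0.5 * p @ B @ p)
        actual_decrease = f(x) - f(x + p)
        if model_decrease > 0 and actual_decrease / model_decrease >= eta:
            return x + p, delta   # good enough new point: accept the step
        delta *= 0.5              # otherwise shrink the radius and re-solve
```

On the quadratic f(x) = ‖x‖², starting at x = (2, 0) with Δ = 1, the Cauchy point is the full boundary step toward the origin and is accepted immediately.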
[Figure: contours of f with the trust region of radius Δ around the iterate x_k]
In the line search strategy, the direction is chosen first, followed by the distance.

In the trust-region strategy, the maximum distance is chosen first, followed by the direction.
[Figures: the line search step along p from x_k (left) vs. the trust-region step within radius Δ around x_k (right)]
Convergence rate
In order to compare different iterative methods, we need an efficiency measure.

Since we do not know the number of iterations in advance, the computational complexity measure used by direct methods cannot be used.

Instead, the concept of a convergence rate is defined.
Assume we have a sequence {x_k} that converges to a solution x*. Define the sequence of errors as
    e_k = x_k − x*
and note that
    lim_{k→∞} e_k = 0.

We say that the sequence {x_k} converges to x* with rate r and rate constant C if
    lim_{k→∞} ‖e_{k+1}‖ / ‖e_k‖^r = C
and C < ∞.
Quadratic convergence, examples

For r = 2, C = 0.1 and e_0 = 1, the sequence becomes
    1, 10^-1, 10^-3, 10^-7, . . .

For r = 2, C = 3 and e_0 = 1, the sequence diverges:
    1, 3, 27, . . .

For r = 2, C = 3 and e_0 = 0.1, the sequence becomes
    0.1, 0.03, 0.0027, . . . ,
i.e. it converges despite C > 1.

For quadratic convergence, the constant C is of lesser importance. Instead it is important that the initial approximation is close enough to the solution, i.e. that e_0 is small.
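The three examples follow directly from the model recursion e_{k+1} = C · e_k^r, and a few lines of Python (illustrative, not part of the handout) reproduce them:

```python
def error_sequence(C, r, e0, n):
    """Generate n error terms from the recursion e_{k+1} = C * e_k**r."""
    errors = [e0]
    for _ in range(n - 1):
        errors.append(C * errors[-1] ** r)
    return errors

# r = 2, C = 0.1, e0 = 1: quadratic convergence, errors 1, 1e-1, 1e-3, 1e-7, ...
print(error_sequence(0.1, 2, 1.0, 4))
# r = 2, C = 3, e0 = 1: the sequence diverges, 1, 3, 27, ...
print(error_sequence(3.0, 2, 1.0, 3))
# r = 2, C = 3, e0 = 0.1: converges despite C > 1, 0.1, 0.03, 0.0027, ...
print(error_sequence(3.0, 2, 0.1, 3))
```

The last example makes the slide's point concrete: with r = 2 the error is squared before being scaled by C, so any e_0 small enough that C · e_0 < 1 starts a collapsing sequence.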
Local vs. global convergence
A method is called locally convergent if it produces a sequence that converges toward a minimizer x* provided a close enough starting approximation.

A method is called globally convergent if it produces a sequence that converges toward a minimizer x* provided any starting approximation.

Note that global convergence does not imply convergence toward a global minimizer.
Globalization strategies
The line search and trust-region methods are sometimes called globalization strategies, since they modify a core method (typically locally convergent) to become globally convergent.

There are two efficiency requirements on any globalization strategy:

Far from the solution, it should stop the core method from going out of control.

Close to the solution, where the core method is efficient, it should interfere as little as possible.
Descent directions
Consider the Taylor expansion of the objective function along a search direction p:
    f(x_k + αp) = f(x_k) + α p^T ∇f_k + (1/2) α² p^T ∇²f(x_k + βp) p,
for some β ∈ (0, α).

Any direction p such that p^T ∇f_k < 0 is called a descent direction, since f(x_k + αp) < f(x_k) for all sufficiently small α > 0.
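The sign condition p^T ∇f_k < 0 can be checked numerically; a minimal illustration (the function and numbers are chosen for this example, not taken from the slides):

```python
import numpy as np

def is_descent_direction(grad, p):
    """p is a descent direction at x iff p . grad f(x) < 0."""
    return p @ grad < 0

f = lambda x: x @ x               # f(x) = ||x||^2, with gradient 2x
x = np.array([1.0, 2.0])
g = 2 * x                         # gradient at x: (2, 4)
p = np.array([-1.0, 0.0])         # p . g = -2 < 0: a descent direction

assert is_descent_direction(g, p)
alpha = 0.1
assert f(x + alpha * p) < f(x)    # a small step along p decreases f
```

As the Taylor expansion predicts, the first-order term α p^T ∇f_k dominates for small α, so the function value drops along any direction with negative slope.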
If the search direction has the form
    p_k = −B_k^-1 ∇f_k,
the descent condition becomes
    p_k^T ∇f_k = −∇f_k^T B_k^-1 ∇f_k < 0,
which is satisfied whenever B_k is positive definite.

Ideally we would like to find the global minimizer of f(x_k + α p_k) with respect to α at every iteration. This is called an exact line search.

However, it is possible to construct inexact line search methods that produce an adequate reduction of f at a minimal cost.

Inexact line search methods construct a number of candidate values for α and stop when certain conditions are satisfied.
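An inexact line search of this kind can be sketched as backtracking: start from a trial step length and shrink it until a sufficient decrease condition holds. This sketch uses the Armijo condition with conventional (but here arbitrary) constants c1 and rho; it is an illustration, not the handout's algorithm.

```python
import numpy as np

def backtracking_line_search(f, grad, x, p, alpha=1.0, c1=1e-4, rho=0.5):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c1*alpha*(p . grad f(x))."""
    slope = p @ grad(x)           # must be negative: p is a descent direction
    while f(x + alpha * p) > f(x) + c1 * alpha * slope:
        alpha *= rho              # candidate rejected: try a shorter step
    return alpha

# Usage on f(x) = ||x||^2 with the steepest-descent direction:
f = lambda x: x @ x
g = lambda x: 2 * x
x = np.array([3.0, 0.0])
a = backtracking_line_search(f, g, x, -g(x))
assert f(x - a * g(x)) < f(x)     # the accepted step gives a good enough point
```

Each trial value of α is a "candidate" in the sense of the slide; the Armijo test is one example of the "certain conditions" that stop the search.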
Exact and inexact line searches
The Sufficient Decrease Condition
Backtracking
The Curvature Condition
The Wolfe Condition
The Sufficient Decrease Condition

Mathematically, the descent condition is f(x_k + α p_k) < f(x_k).
The Newton-Raphson method in R^1
The Classical Newton minimization method in R^n
Geometrical interpretation; the model function
Properties of the Newton method
Ensuring a descent direction
The modified Newton algorithm with line search
Properties of the Newton method

Advantages:
It converges quadratically toward a stationary point.

Disadvantages:
It does not necessarily converge toward a minimizer.
It may diverge if the starting approximation is too far from the solution.
It will fail if ∇²f(x_k) is not invertible for some k.
It requires second-order information ∇²f(x_k).

Newton's method is rarely used in its classical formulation. However, many methods may be seen as approximations of Newton's method.
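For reference, the classical formulation these properties describe solves ∇²f(x_k) p = −∇f(x_k) and always takes the full step. A minimal sketch (function and names chosen for illustration):

```python
import numpy as np

def classical_newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Classical Newton: x_{k+1} = x_k - hess(x_k)^{-1} grad(x_k), full step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Raises if the Hessian is singular -- one of the listed drawbacks.
        x = x - np.linalg.solve(hess(x), g)
    return x

# Minimizing f(x) = x1^4 + x2^2, whose only minimizer is the origin.
x_star = classical_newton(lambda x: np.array([4 * x[0] ** 3, 2 * x[1]]),
                          lambda x: np.array([[12 * x[0] ** 2, 0.0],
                                              [0.0, 2.0]]),
                          [1.0, 1.0])
```

Note that nothing here prevents convergence toward a maximizer or saddle point, nor divergence from a poor start; those failure modes are exactly what the modified algorithm below addresses.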
Ensuring a descent direction
Since the Newton search direction p^N is written as
    p^N = −B_k^-1 ∇f_k
with B_k = ∇²f_k, p^N will be a descent direction if ∇²f_k is positive definite.

If ∇²f_k is not positive definite, the Newton direction p^N may not be a descent direction.

In that case we choose B_k as a positive definite approximation of ∇²f_k.

Performed in a proper way, this modified algorithm will converge toward a minimizer. Furthermore, close to the solution the Hessian is usually positive definite, so the modification is only performed far from the solution.
The positive definite approximation B_k of the Hessian may be found with minimal extra effort. The search direction p is calculated as the solution of
    ∇²f(x) p = −∇f(x).

If ∇²f(x) is positive definite, the matrix factorization
    ∇²f(x) = LDL^T
may be used, where the diagonal elements of D are positive.

If ∇²f(x) is not positive definite, at some point during the factorization a diagonal element will be d_ii ≤ 0. In this case, the element may be replaced with a suitable positive entry.

Finally, the factorization is used to calculate the search direction:
    (LDL^T) p = −∇f(x).
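A minimal sketch of this modification follows. The pivot floor delta is an illustrative choice; production implementations (e.g. Gill-Murray modified Cholesky) choose the replacement entry more carefully, but the structure is the same: factorize, fix non-positive pivots as they appear, then solve by substitution.

```python
import numpy as np

def modified_ldlt(A, delta=1e-4):
    """LDL^T factorization of symmetric A, forcing each pivot d_ii >= delta."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - L[j, :j] ** 2 @ d[:j]
        if d[j] < delta:            # non-positive (or tiny) pivot: replace it
            d[j] = delta
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def modified_newton_direction(H, g):
    """Solve (L D L^T) p = -g, giving a guaranteed descent direction p."""
    L, d = modified_ldlt(H)
    y = np.linalg.solve(L, -g)          # forward substitution
    return np.linalg.solve(L.T, y / d)  # diagonal scaling and back substitution
```

For a positive definite Hessian the factorization is exact and p is the pure Newton direction; for an indefinite one, the replaced pivots yield a positive definite B_k = LDL^T, so p^T ∇f = −∇f^T B_k^-1 ∇f < 0 as required.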
The modified Newton algorithm with line search
Specify a starting approximation x_0 and a convergence tolerance τ. Repeat for k = 0, 1, . . .:

If ‖∇f(x_k)‖ < τ, stop.

Compute the modified LDL^T factorization of the Hessian.

Solve
    (LDL^T) p_k^N = −∇f(x_k)
for the search direction p_k^N.

Perform a line search to determine the new approximation
    x_{k+1} = x_k + α_k p_k^N.
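The full algorithm can be sketched in a few lines. This is an illustrative implementation with two substitutions, both noted in the comments: an eigenvalue clamp stands in for the modified LDL^T factorization (same effect: a positive definite B_k), and backtracking with the Armijo condition stands in for the unspecified line search.

```python
import numpy as np

def modified_newton(f, grad, hess, x0, tol=1e-8, max_iter=200):
    """Modified Newton with line search.

    Sketch: the slides modify an LDL^T factorization; for brevity this
    version clamps the Hessian eigenvalues instead, which likewise yields
    a positive definite B_k and hence a descent direction.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # convergence test
            break
        w, V = np.linalg.eigh(hess(x))
        w = np.maximum(w, 1e-3)              # force positive definiteness
        p = V @ ((V.T @ -g) / w)             # solve B_k p = -grad f(x_k)
        alpha = 1.0                          # backtracking (Armijo) line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (p @ g):
            alpha *= 0.5
        x = x + alpha * p
    return x

# The Rosenbrock function, with minimizer (1, 1):
f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                           200 * (x[1] - x[0] ** 2)])
hess = lambda x: np.array([[2 - 400 * (x[1] - 3 * x[0] ** 2), -400 * x[0]],
                           [-400 * x[0], 200.0]])
x_star = modified_newton(f, grad, hess, [-1.2, 1.0])
```

Close to the solution the Hessian is positive definite and the unit step is accepted, so the globalization strategy interferes as little as possible, exactly as required of it in the earlier slide.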