Numerical Methods I: Solving Nonlinear Equations
Aleksandar Donev, Courant Institute, NYU
Course G63.2010.001 / G22.2420-001, Fall 2010
October 14th, 2010
Final Presentations
The final project writeup will be due Sunday Dec. 26th by midnight (I have to start grading by 12/27 due to University deadlines).
You will also need to give a 15 minute presentation in front of me and other students.
Our last class is officially scheduled for Tuesday 12/14, 5-7pm, and the final exam for Thursday 12/23, 5-7pm. Neither of these is good!
By the end of next week, October 23rd, please let me know the following:
Are you willing to present early, Thursday December 16th, during the usual class time?
Do you want to present during the officially scheduled final exam slot, Thursday 12/23, 5-7pm?
If neither of the above, tell me when you cannot present between Monday Dec. 20th and Thursday Dec. 23rd (finals week).
Basics of Nonlinear Solvers
Fundamentals
Simplest problem: Root finding in one dimension:
\[ f(x) = 0 \quad \text{with} \quad x \in [a, b]. \]
Or more generally, solving a square system of nonlinear equations
\[ \mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \ \text{for } i = 1, \dots, n. \]
There can be no closed-form answer, so just as for eigenvalues, we need iterative methods.
Most generally, starting from $m \geq 1$ initial guesses $x^0, x^1, \dots, x^m$, iterate:
\[ x^{k+1} = \phi\left(x^k, x^{k-1}, \dots, x^{k-m}\right). \]
Order of convergence
Consider one dimensional root finding and let the actual root be $\alpha$, $f(\alpha) = 0$.
A sequence of iterates $x^k$ that converges to $\alpha$ has order of convergence $p > 1$ if as $k \to \infty$
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^p} = \frac{\left|e^{k+1}\right|}{\left|e^k\right|^p} \to C = \text{const}, \]
where the constant $0 < C < \infty$ is the convergence factor.
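The order $p$ can be estimated numerically from successive errors; a minimal MATLAB sketch (an illustrative addition, not from the handout), using Newton's method for $f(x) = x^2 - 2$ with root $\sqrt{2}$:

% Estimate the order p from successive Newton errors (illustrative sketch):
x = 3; e = zeros(1, 4);
for k = 1:4
    x = x - (x^2 - 2)/(2*x);    % Newton update (derived later in this lecture)
    e(k) = abs(x - sqrt(2));    % error |x^k - alpha|
end
p = log(e(4)/e(3)) / log(e(3)/e(2))  % approx 2, i.e., quadratic convergence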
Local vs. global convergence
A good initial guess is extremely important in nonlinear solvers!
Assume we are looking for a unique root $a \leq \alpha \leq b$ starting with an initial guess $a \leq x^0 \leq b$.
A method has local convergence if it converges to a given root $\alpha$ for any initial guess that is sufficiently close to $\alpha$ (in the neighborhood of a root).
A method has global convergence if it converges to the root for any initial guess.
General rule: Global convergence requires a slower (careful) method but is safer.
It is best to combine a global method to first find a good initial guess close to $\alpha$, and then use a faster local method.
Conditioning of root finding
\[ f(\alpha + \delta\alpha) \approx f(\alpha) + f'(\alpha)\,\delta\alpha = f'(\alpha)\,\delta\alpha = \delta f \]
\[ |\delta\alpha| \approx \frac{|\delta f|}{\left|f'(\alpha)\right|} \quad \Rightarrow \quad \kappa_{\text{abs}} = \left|f'(\alpha)\right|^{-1}. \]
The problem of finding a simple root is well-conditioned when $|f'(\alpha)|$ is far from zero.
Finding roots with multiplicity $m > 1$ is ill-conditioned:
\[ \left|f'(\alpha)\right| = \cdots = \left|f^{(m-1)}(\alpha)\right| = 0 \quad \Rightarrow \quad |\delta\alpha| \approx \left(\frac{|\delta f|}{\left|f^{(m)}(\alpha)\right|}\right)^{1/m}. \]
Note that finding roots of algebraic equations (polynomials) is a separate subject of its own that we skip.
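A small numerical illustration of this ill-conditioning (an added sketch, not from the handout): perturbing the double root of $(x-1)^2$ by $\delta$ moves the root by about $\sqrt{\delta}$, not $\delta$.

% Perturb the double root of f(x) = (x-1)^2 by delta (illustrative sketch):
delta = 1e-10;
r = roots([1, -2, 1 - delta]);   % roots of x^2 - 2x + (1 - delta)
abs(r - 1)                       % both approx 1e-5 = sqrt(delta)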
One Dimensional Root Finding
The bisection and Newton algorithms
Bisection
First step is to locate a root by searching for a sign change, i.e., finding $a^0$ and $b^0$ such that
\[ f\left(a^0\right) f\left(b^0\right) < 0. \]
Then bisect the interval and keep the half on which the sign change persists (see the sketch below).
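A minimal MATLAB sketch of the bisection loop (the test function, bracket, and tolerance are illustrative, not from the handout):

% Bisection sketch: assumes f(a)*f(b) < 0 on entry.
f = @(x) x.^3 - 2;           % example function, root at 2^(1/3)
a = 0; b = 2; tol = 1e-12;
while (b - a)/2 > tol
    x = (a + b)/2;                        % midpoint of the current bracket
    if sign(f(x)) == sign(f(a))
        a = x;                            % sign change is in [x, b]
    else
        b = x;                            % sign change is in [a, x]
    end
end
root = (a + b)/2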
Newton's Method
Bisection is a slow but sure method. It uses no information about the value of the function or its derivatives.
Better convergence, of order $p = (1 + \sqrt{5})/2 \approx 1.618$ (the golden ratio), can be achieved by using the value of the function at two points, as in the secant method.
Achieving second-order convergence requires also evaluating the function derivative.
Linearize the function around the current guess using a Taylor series:
\[ f\left(x^{k+1}\right) \approx f\left(x^k\right) + \left(x^{k+1} - x^k\right) f'\left(x^k\right) = 0 \]
\[ \Rightarrow \quad x^{k+1} = x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)}. \]
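In MATLAB the update reads, e.g. (a sketch with an illustrative function and a hand-coded derivative, not from the handout):

% Newton iteration sketch for f(x) = x^3 - 2, with f'(x) coded by hand:
f  = @(x) x.^3 - 2;
fp = @(x) 3*x.^2;
x = 1.0;                          % initial guess
for k = 1:10
    dx = -f(x)/fp(x);             % Newton step
    x = x + dx;
    if abs(dx) < 1e-14, break; end
end
x                                 % converges quadratically to 2^(1/3)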
Convergence of Newton's method
Taylor series with remainder:
\[ f(\alpha) = 0 = f\left(x^k\right) + \left(\alpha - x^k\right) f'\left(x^k\right) + \frac{1}{2}\left(\alpha - x^k\right)^2 f''(\xi), \quad \text{for some } \xi \in \left[x^k, \alpha\right]. \]
After dividing by $f'(x^k) \neq 0$ we get
\[ \left[x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)}\right] - \alpha = \frac{1}{2}\left(\alpha - x^k\right)^2 \frac{f''(\xi)}{f'\left(x^k\right)} \]
\[ x^{k+1} - \alpha = e^{k+1} = \frac{1}{2}\left(e^k\right)^2 \frac{f''(\xi)}{f'\left(x^k\right)}, \]
which shows second-order convergence:
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^2} = \frac{\left|e^{k+1}\right|}{\left|e^k\right|^2} = \frac{\left|f''(\xi)\right|}{2\left|f'\left(x^k\right)\right|} \to \frac{\left|f''(\alpha)\right|}{2\left|f'(\alpha)\right|}. \]
Proof of Local Convergence
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^2} = \frac{\left|f''(\xi)\right|}{2\left|f'\left(x^k\right)\right|} \leq M, \quad \text{where } M = \sup_{\alpha - |e^0| \leq x, y \leq \alpha + |e^0|} \frac{\left|f''(x)\right|}{2\left|f'(y)\right|}. \]
Then
\[ M\left|x^{k+1} - \alpha\right| = E^{k+1} \leq \left(M\left|x^k - \alpha\right|\right)^2 = \left(E^k\right)^2, \]
which will converge if $E^0 = M\left|e^0\right| < 1$, i.e., if the initial guess is sufficiently close to the root.
Fixed-Point Iteration
Another way to devise iterative root finding is to rewrite $f(x) = 0$ in an equivalent form
\[ x = \phi(x). \]
Then we can use the fixed-point iteration
\[ x^{k+1} = \phi\left(x^k\right), \]
whose fixed point (limit), if it converges, is $x \to \alpha$.
For example, recall from the first lecture solving $x^2 = c$ via the Babylonian method for square roots,
\[ x^{n+1} = \phi\left(x^n\right) = \frac{1}{2}\left(\frac{c}{x^n} + x^n\right), \]
which converges (quadratically) for any non-zero initial guess.
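A few lines of MATLAB reproduce this (a sketch; the value of c and the initial guess are illustrative):

% Fixed-point iteration sketch: Babylonian method for sqrt(c).
c = 2;
phi = @(x) (c./x + x)/2;      % iteration function phi(x)
x = 1.0;                      % any non-zero initial guess
for k = 1:8
    x = phi(x);               % x^{k+1} = phi(x^k)
end
x - sqrt(c)                   % error is at roundoff level after a few steps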
Convergence theory
It can be proven that the fixed-point iteration $x^{k+1} = \phi(x^k)$ converges if $\phi(x)$ is a contraction mapping:
\[ \left|\phi'(x)\right| \leq K < 1 \quad \text{for all } x \text{ in a neighborhood of the fixed point.} \]
Applications of general convergence theory
Think of Newton's method
\[ x^{k+1} = x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)} \]
as a fixed-point iteration method $x^{k+1} = \phi(x^k)$ with iteration function
\[ \phi(x) = x - \frac{f(x)}{f'(x)}. \]
We can directly show quadratic convergence because (also see homework)
\[ \phi'(x) = \frac{f(x)\, f''(x)}{\left[f'(x)\right]^2} \quad \Rightarrow \quad \phi'(\alpha) = 0, \]
\[ \phi''(\alpha) = \frac{f''(\alpha)}{f'(\alpha)} \neq 0. \]
Stopping Criteria
A good library function for root finding has to implement careful termination criteria.
An obvious option is to terminate when the residual becomes small,
\[ \left|f\left(x^k\right)\right| < \varepsilon, \]
which is only good for very well-conditioned problems, $|f'(\alpha)| \sim 1$.
Another option is to terminate when the increment becomes small,
\[ \left|x^{k+1} - x^k\right| < \varepsilon. \]
For fixed-point iteration
\[ x^{k+1} - x^k = e^{k+1} - e^k \approx -\left[1 - \phi'(\alpha)\right] e^k \quad \Rightarrow \quad \left|e^k\right| \approx \frac{\left|x^{k+1} - x^k\right|}{\left|1 - \phi'(\alpha)\right|}, \]
so we see that the increment test works for rapidly converging iterations ($\left|\phi'(\alpha)\right| \ll 1$).
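This relation can be checked numerically; a sketch (not from the handout) using the linearly converging fixed-point iteration $x^{k+1} = \cos(x^k)$, for which $\phi'(\alpha) = -\sin(\alpha) \approx -0.67$:

% Compare the computable increment |x^{k+1} - x^k| to the true error |e^k|:
phi = @(x) cos(x);
alpha = fzero(@(x) cos(x) - x, 0.7);  % reference fixed point
x = 1;
for k = 1:25
    xnew = phi(x);
    increment = abs(xnew - x);        % what a stopping test can measure
    err = abs(x - alpha);             % true error of x^k (needs alpha)
    x = xnew;
end
increment/err   % approx |1 - phi'(alpha)| = 1 + sin(alpha), about 1.67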
In practice
A robust but fast algorithm for root finding would combine bisection with Newton's method.
Specifically, a method like Newton's that can easily take huge steps in the wrong direction and lead far from the current point must be safeguarded by a method that ensures one does not leave the search interval and that the zero is not missed.
Once $x^k$ is close to $\alpha$, the safeguard will not be used and quadratic or faster convergence will be achieved.
Newton's method requires first-order derivatives, so often other methods are preferred that require function evaluation only.
MATLAB's function fzero combines bisection, the secant method, and inverse quadratic interpolation and is fail-safe.
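A sketch of one possible safeguard (illustrative, not fzero's actual algorithm): try the Newton step, and fall back to bisection whenever the trial step leaves the bracketing interval.

% Safeguarded Newton sketch: assumes a bracket [a,b] with f(a)*f(b) < 0.
f  = @(x) x.^3 - 2;  fp = @(x) 3*x.^2;
a = 0; b = 2; x = b;
for k = 1:100
    xn = x - f(x)/fp(x);                  % trial Newton step
    if xn <= a || xn >= b
        xn = (a + b)/2;                   % step left the bracket: bisect instead
    end
    if sign(f(xn)) == sign(f(a)), a = xn; else b = xn; end  % shrink bracket
    if abs(xn - x) < 1e-14, x = xn; break; end
    x = xn;
end
x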
Find zeros of $a \sin(x) + b \exp(-x^2/2)$ in MATLAB
% f = @mfile uses a function in an m-file
% Parameterized functions are created with:
a = 1; b = 2;
f = @(x) a*sin(x) + b*exp(-x.^2/2); % handle

figure(1)
ezplot(f, [-5, 5]); grid

x1 = fzero(f, [-2, 0])
[x2, f2] = fzero(f, 2.0)

x1 = -1.227430849357917
x2 = 3.155366415494801
f2 = 2.116362640691705e-16
Figure of f(x)
[Figure: plot of $a\sin(x) + b\exp(-x^2/2)$ for $x \in [-5, 5]$.]
Systems of Non-Linear Equations
Multi-Variable Taylor Expansion
We are after solving a square system of nonlinear equations for some variables $\mathbf{x}$:
\[ \mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \ \text{for } i = 1, \dots, n. \]
It is convenient to focus on one of the equations, i.e., consider a scalar function $f(\mathbf{x})$.
The usual Taylor series is replaced by
\[ f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \mathbf{g}^T (\Delta\mathbf{x}) + \frac{1}{2} (\Delta\mathbf{x})^T \mathbf{H} \, (\Delta\mathbf{x}), \]
where the gradient vector is
\[ \mathbf{g} = \nabla_{\mathbf{x}} f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]^T \]
and the Hessian matrix is
\[ \mathbf{H} = \nabla^2_{\mathbf{x}} f = \left\{\frac{\partial^2 f}{\partial x_i \partial x_j}\right\}_{ij}. \]
Newton's Method for Systems of Equations
It is much harder if not impossible to do globally convergent methods like bisection in higher dimensions!
A good initial guess is therefore a must when solving systems, and Newton's method can be used to refine the guess.
The first-order Taylor series is
\[ \mathbf{f}\left(\mathbf{x}^k + \Delta\mathbf{x}\right) \approx \mathbf{f}\left(\mathbf{x}^k\right) + \mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = \mathbf{0}, \]
where the Jacobian $\mathbf{J}$ has the gradients of $f_i(\mathbf{x})$ as rows:
\[ \left[\mathbf{J}(\mathbf{x})\right]_{ij} = \frac{\partial f_i}{\partial x_j}. \]
So taking a Newton step requires solving a linear system:
\[ \mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = -\mathbf{f}\left(\mathbf{x}^k\right), \quad \text{denoting } \mathbf{J} \equiv \mathbf{J}\left(\mathbf{x}^k\right), \]
\[ \mathbf{x}^{k+1} = \mathbf{x}^k + \Delta\mathbf{x} = \mathbf{x}^k - \mathbf{J}^{-1}\,\mathbf{f}\left(\mathbf{x}^k\right). \]
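A minimal MATLAB sketch for a $2 \times 2$ system (the equations and Jacobian here are an illustrative example, not from the handout):

% Newton's method for f1 = x1^2 + x2^2 - 1 = 0, f2 = x2 - x1^2 = 0:
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
J = @(x) [2*x(1), 2*x(2);  -2*x(1), 1];    % Jacobian, coded by hand
x = [1; 1];                                % initial guess
for k = 1:20
    dx = -J(x) \ f(x);        % solve the linear system J*dx = -f(x)
    x = x + dx;
    if norm(dx) < 1e-14, break; end
end
x    % intersection of the unit circle with the parabola x2 = x1^2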
Convergence of Newton's method
Newton's method converges quadratically if started sufficiently close to a root $\mathbf{x}^\star$ at which the Jacobian is not singular:
\[ \left\|\mathbf{e}^{k+1}\right\| = \left\|\mathbf{x}^{k+1} - \mathbf{x}^\star\right\| = \left\|\mathbf{x}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) - \mathbf{x}^\star\right\| = \left\|\mathbf{e}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right)\right\|, \]
but using a second-order Taylor series
\[ \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) \approx \mathbf{J}^{-1}\left[\mathbf{f}\left(\mathbf{x}^\star\right) + \mathbf{J}\,\mathbf{e}^k + \frac{1}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right)\right] = \mathbf{e}^k + \frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right) \]
\[ \left\|\mathbf{e}^{k+1}\right\| = \left\|\frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right)\right\| \leq \left\|\frac{\mathbf{J}^{-1}\mathbf{H}}{2}\right\| \left\|\mathbf{e}^k\right\|^2. \]
Fixed point iteration theory generalizes to multiple variables, e.g., replacing $\phi'(\alpha)$ with the spectral radius of the Jacobian of the iteration function.
In practice
It is much harder to construct general robust solvers in higher dimensions and some problem-specific knowledge is required.
There is no built-in function for solving nonlinear systems in MATLAB, but the Optimization Toolbox has fsolve.
In many practical situations there is some continuity of the problem so that a previous solution can be used as an initial guess.
For example, implicit methods for differential equations have a time-dependent Jacobian $\mathbf{J}(t)$ and in many cases the solution $\mathbf{x}(t)$ evolves smoothly in time.
For large problems specialized sparse-matrix solvers need to be used.
In many cases derivatives are not provided but there are some techniques for automatic differentiation.
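For instance, the $2 \times 2$ system sketched earlier could be handed to fsolve like this (a minimal usage sketch; the tolerance choice is illustrative, and the Optimization Toolbox is required):

% Minimal fsolve usage sketch (requires the Optimization Toolbox):
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
[x, fval] = fsolve(f, [1; 1], optimset('TolFun', 1e-12))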
Intro to Unconstrained Optimization
Formulation
Optimization problems are among the most important in engineering and finance, e.g., minimizing production cost, maximizing profits, etc.:
\[ \min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}), \]
where $\mathbf{x}$ are some variable parameters and $f : \mathbb{R}^n \to \mathbb{R}$ is a scalar objective function.
Observe that one only needs to consider minimization, as
\[ \max_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}) = -\min_{\mathbf{x} \in \mathbb{R}^n} \left[-f(\mathbf{x})\right]. \]
A local minimum $\mathbf{x}^\star$ is optimal in some neighborhood,
\[ f\left(\mathbf{x}^\star\right) \leq f(\mathbf{x}) \quad \forall \mathbf{x} \ \text{s.t.} \ \left\|\mathbf{x} - \mathbf{x}^\star\right\| \leq R > 0 \]
(think of finding the bottom of a valley).
Finding the global minimum is generally not possible for arbitrary functions (think of finding Mt. Everest without a satellite).
Connection to nonlinear systems
Assume that the objective function is differentiable (i.e., the first-order Taylor series converges or the gradient exists).
Then a necessary condition for a local minimizer is that $\mathbf{x}^\star$ be a critical point,
\[ \mathbf{g}\left(\mathbf{x}^\star\right) = \nabla_{\mathbf{x}} f\left(\mathbf{x}^\star\right) = \left\{\frac{\partial f}{\partial x_i}\left(\mathbf{x}^\star\right)\right\}_i = \mathbf{0}, \]
which is a system of non-linear equations!
In fact similar methods, such as Newton or quasi-Newton, apply to both problems.
Vice versa, observe that solving $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ is equivalent to an optimization problem,
\[ \min_{\mathbf{x}} \ \mathbf{f}(\mathbf{x})^T \mathbf{f}(\mathbf{x}), \]
although this is only recommended under special circumstances.
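As an illustration (a sketch, not from the handout), the $2 \times 2$ system from before can be recast this way and handed to a minimizer:

% Recast root finding as minimizing the scalar objective f(x)'*f(x):
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
obj = @(x) f(x)' * f(x);           % zero exactly at a root of f
[x, fval] = fminsearch(obj, [1; 1])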
Sufficient Conditions
Assume now that the objective function is twice-differentiable (i.e., the Hessian exists).
A critical point $\mathbf{x}^\star$ is a local minimum if the Hessian is positive definite,
\[ \mathbf{H}\left(\mathbf{x}^\star\right) = \nabla^2_{\mathbf{x}} f\left(\mathbf{x}^\star\right) \succ 0, \]
which means that the minimum really looks like a valley or a convex bowl.
At any local minimum the Hessian is positive semi-definite,
\[ \nabla^2_{\mathbf{x}} f\left(\mathbf{x}^\star\right) \succeq 0. \]
Methods that require Hessian information converge fast but are expensive (next class).
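A quick numerical check of the sufficient condition (an illustrative sketch, not from the handout):

% For f(x) = x1^2 + 3*x2^2 - x1*x2 the Hessian is constant:
H = [2, -1; -1, 6];
eig(H)   % both eigenvalues positive => H is positive definite => a minimum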
Direct-Search Methods
A direct search method only requires $f(\mathbf{x})$ to be continuous but not necessarily differentiable, and requires only function evaluations.
Methods that do a search similar to that in bisection can be devised in higher dimensions also, but they may fail to converge and are usually slow.
The MATLAB function fminsearch uses the Nelder-Mead or simplex-search method, which can be thought of as rolling a simplex downhill to find the bottom of a valley. But there are many others and this is an active research area.
Curse of dimensionality: As the number of variables (dimensionality) $n$ becomes larger, direct search becomes hopeless since the number of samples needed grows as $2^n$!
Minimum of $100(x_2 - x_1^2)^2 + (a - x_1)^2$ in MATLAB
% Rosenbrock or banana function:
a = 1;
banana = @(x) 100*(x(2) - x(1)^2)^2 + (a - x(1))^2;

% This function must accept array arguments!
banana_xy = @(x1, x2) 100*(x2 - x1.^2).^2 + (a - x1).^2;

figure(1); ezsurf(banana_xy, [0, 2, 0, 2])

[x, y] = meshgrid(linspace(0, 2, 100));
figure(2); contourf(x, y, banana_xy(x, y), 100)

% Correct answers are x = [1, 1] and f(x) = 0
[x, fval] = fminsearch(banana, [-1.2, 1], optimset('TolX', 1e-8))

x = 0.999999999187814   0.999999998441919
fval = 1.099088951919573e-18
Figure of Rosenbrock f(x)
[Figure: surface and contour plots of the banana function $100(x_2 - x_1^2)^2 + (a - x_1)^2$ over the $(x_1, x_2)$ plane.]
Conclusions
Conclusions/Summary
Root finding is well-conditioned for simple roots (unit multiplicity), ill-conditioned otherwise.
Methods for solving nonlinear equations are always iterative and the order of convergence matters: second order is usually good enough.
A good method uses a higher-order unsafe method such as Newton's method near the root, but safeguards it with something like the bisection method.
Newton's method is second-order but requires derivative/Jacobian evaluation. In higher dimensions having a good initial guess for Newton's method becomes very important.
Quasi-Newton methods can alleviate the complexity of solving the Jacobian linear system.