Numerical Methods I: Solving Nonlinear Equations
Aleksandar Donev, Courant Institute, NYU
Course G63.2010.001 / G22.2420-001, Fall 2010
October 14th, 2010
Final Presentations
The final project writeup will be due Sunday Dec. 26th by midnight (I have to start grading by 12/27 due to University deadlines).
You will also need to give a 15 minute presentation in front of me and other students.
Our last class is officially scheduled for Tuesday 12/14, 5-7pm, and the final exam for Thursday 12/23, 5-7pm. Neither of these is good!
By the end of next week, October 23rd, please let me know the following:
Are you willing to present early, Thursday December 16th, during the usual class time?
Do you want to present during the officially scheduled final exam slot, Thursday 12/23, 5-7pm?
If neither of the above, tell me when you cannot present between Monday Dec. 20th and Thursday Dec. 23rd (finals week).
Basics of Nonlinear Solvers
Fundamentals
Simplest problem: Root finding in one dimension:
\[ f(x) = 0 \quad \text{with} \quad x \in [a, b]. \]
Or more generally, solving a square system of nonlinear equations
\[ \mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \ \text{for } i = 1, \dots, n. \]
There can be no closed-form answer, so just as for eigenvalues, we need iterative methods.
Most generally, starting from $m \geq 1$ initial guesses $x^0, x^1, \dots, x^m$, iterate:
\[ x^{k+1} = \phi\left(x^k, x^{k-1}, \dots, x^{k-m}\right). \]
Order of convergence
Consider one dimensional root finding and let the actual root be $\alpha$, $f(\alpha) = 0$.
A sequence of iterates $x^k$ that converges to $\alpha$ has order of convergence $p > 1$ if as $k \to \infty$
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^p} = \frac{\left|e^{k+1}\right|}{\left|e^k\right|^p} \to C = \text{const}, \]
where the constant $0 < C < \infty$ is the convergence factor.
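The order $p$ can be estimated numerically from successive errors; a minimal MATLAB sketch (an illustrative addition, not from the handout), using Newton's method for $f(x) = x^2 - 2$ with root $\sqrt{2}$:

% Estimate the order p from successive Newton errors (illustrative sketch):
x = 3; e = zeros(1, 4);
for k = 1:4
    x = x - (x^2 - 2)/(2*x);    % Newton update (derived later in this lecture)
    e(k) = abs(x - sqrt(2));    % error |x^k - alpha|
end
p = log(e(4)/e(3)) / log(e(3)/e(2))  % approx 2, i.e., quadratic convergence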
Local vs. global convergence
A good initial guess is extremely important in nonlinear solvers!
Assume we are looking for a unique root $a \leq \alpha \leq b$ starting with an initial guess $a \leq x^0 \leq b$.
A method has local convergence if it converges to a given root $\alpha$ for any initial guess that is sufficiently close to $\alpha$ (in the neighborhood of a root).
A method has global convergence if it converges to the root for any initial guess.
General rule: Global convergence requires a slower (careful) method but is safer.
It is best to combine a global method to first find a good initial guess close to $\alpha$, and then use a faster local method.
Conditioning of root finding
\[ f(\alpha + \delta\alpha) \approx f(\alpha) + f'(\alpha)\,\delta\alpha = f'(\alpha)\,\delta\alpha = \delta f \]
\[ |\delta\alpha| \approx \frac{|\delta f|}{\left|f'(\alpha)\right|} \quad \Rightarrow \quad \kappa_{\text{abs}} = \left|f'(\alpha)\right|^{-1}. \]
The problem of finding a simple root is well-conditioned when $|f'(\alpha)|$ is far from zero.
Finding roots with multiplicity $m > 1$ is ill-conditioned:
\[ \left|f'(\alpha)\right| = \cdots = \left|f^{(m-1)}(\alpha)\right| = 0 \quad \Rightarrow \quad |\delta\alpha| \approx \left(\frac{|\delta f|}{\left|f^{(m)}(\alpha)\right|}\right)^{1/m}. \]
Note that finding roots of algebraic equations (polynomials) is a separate subject of its own that we skip.
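A small numerical illustration of this ill-conditioning (an added sketch, not from the handout): perturbing the double root of $(x-1)^2$ by $\delta$ moves the root by about $\sqrt{\delta}$, not $\delta$.

% Perturb the double root of f(x) = (x-1)^2 by delta (illustrative sketch):
delta = 1e-10;
r = roots([1, -2, 1 - delta]);   % roots of x^2 - 2x + (1 - delta)
abs(r - 1)                       % both approx 1e-5 = sqrt(delta)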
One Dimensional Root Finding
The bisection and Newton algorithms
Bisection
First step is to locate a root by searching for a sign change, i.e., finding $a^0$ and $b^0$ such that
\[ f\left(a^0\right) f\left(b^0\right) < 0. \]
Then bisect the interval and keep the half on which the sign change persists (see the sketch below).
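A minimal MATLAB sketch of the bisection loop (the test function, bracket, and tolerance are illustrative, not from the handout):

% Bisection sketch: assumes f(a)*f(b) < 0 on entry.
f = @(x) x.^3 - 2;           % example function, root at 2^(1/3)
a = 0; b = 2; tol = 1e-12;
while (b - a)/2 > tol
    x = (a + b)/2;                        % midpoint of the current bracket
    if sign(f(x)) == sign(f(a))
        a = x;                            % sign change is in [x, b]
    else
        b = x;                            % sign change is in [a, x]
    end
end
root = (a + b)/2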
Newton's Method
Bisection is a slow but sure method. It uses no information about the value of the function or its derivatives.
Better convergence, of order $p = (1 + \sqrt{5})/2 \approx 1.618$ (the golden ratio), can be achieved by using the value of the function at two points, as in the secant method.
Achieving second-order convergence requires also evaluating the function derivative.
Linearize the function around the current guess using a Taylor series:
\[ f\left(x^{k+1}\right) \approx f\left(x^k\right) + \left(x^{k+1} - x^k\right) f'\left(x^k\right) = 0 \]
\[ \Rightarrow \quad x^{k+1} = x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)}. \]
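In MATLAB the update reads, e.g. (a sketch with an illustrative function and a hand-coded derivative, not from the handout):

% Newton iteration sketch for f(x) = x^3 - 2, with f'(x) coded by hand:
f  = @(x) x.^3 - 2;
fp = @(x) 3*x.^2;
x = 1.0;                          % initial guess
for k = 1:10
    dx = -f(x)/fp(x);             % Newton step
    x = x + dx;
    if abs(dx) < 1e-14, break; end
end
x                                 % converges quadratically to 2^(1/3)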
Convergence of Newton's method
Taylor series with remainder:
\[ f(\alpha) = 0 = f\left(x^k\right) + \left(\alpha - x^k\right) f'\left(x^k\right) + \frac{1}{2}\left(\alpha - x^k\right)^2 f''(\xi), \quad \text{for some } \xi \in \left[x^k, \alpha\right]. \]
After dividing by $f'(x^k) \neq 0$ we get
\[ \left[x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)}\right] - \alpha = \frac{1}{2}\left(\alpha - x^k\right)^2 \frac{f''(\xi)}{f'\left(x^k\right)} \]
\[ x^{k+1} - \alpha = e^{k+1} = \frac{1}{2}\left(e^k\right)^2 \frac{f''(\xi)}{f'\left(x^k\right)}, \]
which shows second-order convergence:
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^2} = \frac{\left|e^{k+1}\right|}{\left|e^k\right|^2} = \frac{\left|f''(\xi)\right|}{2\left|f'\left(x^k\right)\right|} \to \frac{\left|f''(\alpha)\right|}{2\left|f'(\alpha)\right|}. \]
Proof of Local Convergence
\[ \frac{\left|x^{k+1} - \alpha\right|}{\left|x^k - \alpha\right|^2} = \frac{\left|f''(\xi)\right|}{2\left|f'\left(x^k\right)\right|} \leq M, \quad \text{where } M = \sup_{\alpha - |e^0| \leq x, y \leq \alpha + |e^0|} \frac{\left|f''(x)\right|}{2\left|f'(y)\right|}. \]
Then
\[ M\left|x^{k+1} - \alpha\right| = E^{k+1} \leq \left(M\left|x^k - \alpha\right|\right)^2 = \left(E^k\right)^2, \]
which will converge if $E^0 = M\left|e^0\right| < 1$, i.e., if the initial guess is sufficiently close to the root.
Fixed-Point Iteration
Another way to devise iterative root finding is to rewrite $f(x) = 0$ in an equivalent form
\[ x = \phi(x). \]
Then we can use the fixed-point iteration
\[ x^{k+1} = \phi\left(x^k\right), \]
whose fixed point (limit), if it converges, is $x \to \alpha$.
For example, recall from the first lecture solving $x^2 = c$ via the Babylonian method for square roots,
\[ x^{n+1} = \phi\left(x^n\right) = \frac{1}{2}\left(\frac{c}{x^n} + x^n\right), \]
which converges (quadratically) for any non-zero initial guess.
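A few lines of MATLAB reproduce this (a sketch; the value of c and the initial guess are illustrative):

% Fixed-point iteration sketch: Babylonian method for sqrt(c).
c = 2;
phi = @(x) (c./x + x)/2;      % iteration function phi(x)
x = 1.0;                      % any non-zero initial guess
for k = 1:8
    x = phi(x);               % x^{k+1} = phi(x^k)
end
x - sqrt(c)                   % error is at roundoff level after a few steps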
Convergence theory
It can be proven that the fixed-point iteration $x^{k+1} = \phi(x^k)$ converges if $\phi(x)$ is a contraction mapping:
\[ \left|\phi'(x)\right| \leq K < 1 \quad \text{for all } x \text{ in a neighborhood of the fixed point.} \]
Applications of general convergence theory
Think of Newton's method
\[ x^{k+1} = x^k - \frac{f\left(x^k\right)}{f'\left(x^k\right)} \]
as a fixed-point iteration method $x^{k+1} = \phi(x^k)$ with iteration function
\[ \phi(x) = x - \frac{f(x)}{f'(x)}. \]
We can directly show quadratic convergence because (also see homework)
\[ \phi'(x) = \frac{f(x)\, f''(x)}{\left[f'(x)\right]^2} \quad \Rightarrow \quad \phi'(\alpha) = 0, \]
\[ \phi''(\alpha) = \frac{f''(\alpha)}{f'(\alpha)} \neq 0. \]
Stopping Criteria
A good library function for root finding has to implement careful termination criteria.
An obvious option is to terminate when the residual becomes small,
\[ \left|f\left(x^k\right)\right| < \varepsilon, \]
which is only good for very well-conditioned problems, $|f'(\alpha)| \sim 1$.
Another option is to terminate when the increment becomes small,
\[ \left|x^{k+1} - x^k\right| < \varepsilon. \]
For fixed-point iteration
\[ x^{k+1} - x^k = e^{k+1} - e^k \approx -\left[1 - \phi'(\alpha)\right] e^k \quad \Rightarrow \quad \left|e^k\right| \approx \frac{\left|x^{k+1} - x^k\right|}{\left|1 - \phi'(\alpha)\right|}, \]
so we see that the increment test works for rapidly converging iterations ($\left|\phi'(\alpha)\right| \ll 1$).
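This relation can be checked numerically; a sketch (not from the handout) using the linearly converging fixed-point iteration $x^{k+1} = \cos(x^k)$, for which $\phi'(\alpha) = -\sin(\alpha) \approx -0.67$:

% Compare the computable increment |x^{k+1} - x^k| to the true error |e^k|:
phi = @(x) cos(x);
alpha = fzero(@(x) cos(x) - x, 0.7);  % reference fixed point
x = 1;
for k = 1:25
    xnew = phi(x);
    increment = abs(xnew - x);        % what a stopping test can measure
    err = abs(x - alpha);             % true error of x^k (needs alpha)
    x = xnew;
end
increment/err   % approx |1 - phi'(alpha)| = 1 + sin(alpha), about 1.67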
In practice
A robust but fast algorithm for root finding would combine bisection with Newton's method.
Specifically, a method like Newton's that can easily take huge steps in the wrong direction and lead far from the current point must be safeguarded by a method that ensures one does not leave the search interval and that the zero is not missed.
Once $x^k$ is close to $\alpha$, the safeguard will not be used and quadratic or faster convergence will be achieved.
Newton's method requires first-order derivatives, so often other methods are preferred that require function evaluation only.
MATLAB's function fzero combines bisection, the secant method, and inverse quadratic interpolation and is fail-safe.
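A sketch of one possible safeguard (illustrative, not fzero's actual algorithm): try the Newton step, and fall back to bisection whenever the trial step leaves the bracketing interval.

% Safeguarded Newton sketch: assumes a bracket [a,b] with f(a)*f(b) < 0.
f  = @(x) x.^3 - 2;  fp = @(x) 3*x.^2;
a = 0; b = 2; x = b;
for k = 1:100
    xn = x - f(x)/fp(x);                  % trial Newton step
    if xn <= a || xn >= b
        xn = (a + b)/2;                   % step left the bracket: bisect instead
    end
    if sign(f(xn)) == sign(f(a)), a = xn; else b = xn; end  % shrink bracket
    if abs(xn - x) < 1e-14, x = xn; break; end
    x = xn;
end
x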
Find zeros of $a \sin(x) + b \exp(-x^2/2)$ in MATLAB
% f = @mfile uses a function in an m-file
% Parameterized functions are created with:
a = 1; b = 2;
f = @(x) a*sin(x) + b*exp(-x.^2/2); % handle

figure(1)
ezplot(f, [-5, 5]); grid

x1 = fzero(f, [-2, 0])
[x2, f2] = fzero(f, 2.0)

x1 = -1.227430849357917
x2 = 3.155366415494801
f2 = 2.116362640691705e-16
Figure of f(x)
[Figure: plot of $a\sin(x) + b\exp(-x^2/2)$ for $x \in [-5, 5]$.]
Systems of Non-Linear Equations
Multi-Variable Taylor Expansion
We are after solving a square system of nonlinear equations for some variables $\mathbf{x}$:
\[ \mathbf{f}(\mathbf{x}) = \mathbf{0} \quad \Leftrightarrow \quad f_i(x_1, x_2, \dots, x_n) = 0 \ \text{for } i = 1, \dots, n. \]
It is convenient to focus on one of the equations, i.e., consider a scalar function $f(\mathbf{x})$.
The usual Taylor series is replaced by
\[ f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \mathbf{g}^T (\Delta\mathbf{x}) + \frac{1}{2} (\Delta\mathbf{x})^T \mathbf{H} \, (\Delta\mathbf{x}), \]
where the gradient vector is
\[ \mathbf{g} = \nabla_{\mathbf{x}} f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]^T \]
and the Hessian matrix is
\[ \mathbf{H} = \nabla^2_{\mathbf{x}} f = \left\{\frac{\partial^2 f}{\partial x_i \partial x_j}\right\}_{ij}. \]
Newton's Method for Systems of Equations
It is much harder if not impossible to do globally convergent methods like bisection in higher dimensions!
A good initial guess is therefore a must when solving systems, and Newton's method can be used to refine the guess.
The first-order Taylor series is
\[ \mathbf{f}\left(\mathbf{x}^k + \Delta\mathbf{x}\right) \approx \mathbf{f}\left(\mathbf{x}^k\right) + \mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = \mathbf{0}, \]
where the Jacobian $\mathbf{J}$ has the gradients of $f_i(\mathbf{x})$ as rows:
\[ \left[\mathbf{J}(\mathbf{x})\right]_{ij} = \frac{\partial f_i}{\partial x_j}. \]
So taking a Newton step requires solving a linear system:
\[ \mathbf{J}\left(\mathbf{x}^k\right) \Delta\mathbf{x} = -\mathbf{f}\left(\mathbf{x}^k\right), \quad \text{denoting } \mathbf{J} \equiv \mathbf{J}\left(\mathbf{x}^k\right), \]
\[ \mathbf{x}^{k+1} = \mathbf{x}^k + \Delta\mathbf{x} = \mathbf{x}^k - \mathbf{J}^{-1}\,\mathbf{f}\left(\mathbf{x}^k\right). \]
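A minimal MATLAB sketch for a $2 \times 2$ system (the equations and Jacobian here are an illustrative example, not from the handout):

% Newton's method for f1 = x1^2 + x2^2 - 1 = 0, f2 = x2 - x1^2 = 0:
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
J = @(x) [2*x(1), 2*x(2);  -2*x(1), 1];    % Jacobian, coded by hand
x = [1; 1];                                % initial guess
for k = 1:20
    dx = -J(x) \ f(x);        % solve the linear system J*dx = -f(x)
    x = x + dx;
    if norm(dx) < 1e-14, break; end
end
x    % intersection of the unit circle with the parabola x2 = x1^2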
Convergence of Newton's method
Newton's method converges quadratically if started sufficiently close to a root $\mathbf{x}^\star$ at which the Jacobian is not singular:
\[ \left\|\mathbf{e}^{k+1}\right\| = \left\|\mathbf{x}^{k+1} - \mathbf{x}^\star\right\| = \left\|\mathbf{x}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) - \mathbf{x}^\star\right\| = \left\|\mathbf{e}^k - \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right)\right\|, \]
but using a second-order Taylor series
\[ \mathbf{J}^{-1}\mathbf{f}\left(\mathbf{x}^k\right) \approx \mathbf{J}^{-1}\left[\mathbf{f}\left(\mathbf{x}^\star\right) + \mathbf{J}\,\mathbf{e}^k + \frac{1}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right)\right] = \mathbf{e}^k + \frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right) \]
\[ \left\|\mathbf{e}^{k+1}\right\| = \left\|\frac{\mathbf{J}^{-1}}{2}\left(\mathbf{e}^k\right)^T \mathbf{H}\left(\mathbf{e}^k\right)\right\| \leq \left\|\frac{\mathbf{J}^{-1}\mathbf{H}}{2}\right\| \left\|\mathbf{e}^k\right\|^2. \]
Fixed point iteration theory generalizes to multiple variables, e.g., replacing $\phi'(\alpha)$ with the spectral radius of the Jacobian of the iteration function.
In practice
It is much harder to construct general robust solvers in higher dimensions and some problem-specific knowledge is required.
There is no built-in function for solving nonlinear systems in MATLAB, but the Optimization Toolbox has fsolve.
In many practical situations there is some continuity of the problem so that a previous solution can be used as an initial guess.
For example, implicit methods for differential equations have a time-dependent Jacobian $\mathbf{J}(t)$ and in many cases the solution $\mathbf{x}(t)$ evolves smoothly in time.
For large problems specialized sparse-matrix solvers need to be used.
In many cases derivatives are not provided but there are some techniques for automatic differentiation.
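For instance, the $2 \times 2$ system sketched earlier could be handed to fsolve like this (a minimal usage sketch; the tolerance choice is illustrative, and the Optimization Toolbox is required):

% Minimal fsolve usage sketch (requires the Optimization Toolbox):
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
[x, fval] = fsolve(f, [1; 1], optimset('TolFun', 1e-12))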
Intro to Unconstrained Optimization
Formulation
Optimization problems are among the most important in engineering and finance, e.g., minimizing production cost, maximizing profits, etc.:
\[ \min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}), \]
where $\mathbf{x}$ are some variable parameters and $f : \mathbb{R}^n \to \mathbb{R}$ is a scalar objective function.
Observe that one only needs to consider minimization, as
\[ \max_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}) = -\min_{\mathbf{x} \in \mathbb{R}^n} \left[-f(\mathbf{x})\right]. \]
A local minimum $\mathbf{x}^\star$ is optimal in some neighborhood,
\[ f\left(\mathbf{x}^\star\right) \leq f(\mathbf{x}) \quad \forall \mathbf{x} \ \text{s.t.} \ \left\|\mathbf{x} - \mathbf{x}^\star\right\| \leq R > 0 \]
(think of finding the bottom of a valley).
Finding the global minimum is generally not possible for arbitrary functions (think of finding Mt. Everest without a satellite).
Connection to nonlinear systems
Assume that the objective function is differentiable (i.e., the first-order Taylor series converges or the gradient exists).
Then a necessary condition for a local minimizer is that $\mathbf{x}^\star$ be a critical point,
\[ \mathbf{g}\left(\mathbf{x}^\star\right) = \nabla_{\mathbf{x}} f\left(\mathbf{x}^\star\right) = \left\{\frac{\partial f}{\partial x_i}\left(\mathbf{x}^\star\right)\right\}_i = \mathbf{0}, \]
which is a system of non-linear equations!
In fact similar methods, such as Newton or quasi-Newton, apply to both problems.
Vice versa, observe that solving $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ is equivalent to an optimization problem,
\[ \min_{\mathbf{x}} \ \mathbf{f}(\mathbf{x})^T \mathbf{f}(\mathbf{x}), \]
although this is only recommended under special circumstances.
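As an illustration (a sketch, not from the handout), the $2 \times 2$ system from before can be recast this way and handed to a minimizer:

% Recast root finding as minimizing the scalar objective f(x)'*f(x):
f = @(x) [x(1)^2 + x(2)^2 - 1;  x(2) - x(1)^2];
obj = @(x) f(x)' * f(x);           % zero exactly at a root of f
[x, fval] = fminsearch(obj, [1; 1])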
Sufficient Conditions
Assume now that the objective function is twice-differentiable (i.e., the Hessian exists).
A critical point $\mathbf{x}^\star$ is a local minimum if the Hessian is positive definite,
\[ \mathbf{H}\left(\mathbf{x}^\star\right) = \nabla^2_{\mathbf{x}} f\left(\mathbf{x}^\star\right) \succ 0, \]
which means that the minimum really looks like a valley or a convex bowl.
At any local minimum the Hessian is positive semi-definite,
\[ \nabla^2_{\mathbf{x}} f\left(\mathbf{x}^\star\right) \succeq 0. \]
Methods that require Hessian information converge fast but are expensive (next class).
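A quick numerical check of the sufficient condition (an illustrative sketch, not from the handout):

% For f(x) = x1^2 + 3*x2^2 - x1*x2 the Hessian is constant:
H = [2, -1; -1, 6];
eig(H)   % both eigenvalues positive => H is positive definite => a minimum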
Direct-Search Methods
A direct search method only requires $f(\mathbf{x})$ to be continuous but not necessarily differentiable, and requires only function evaluations.
Methods that do a search similar to that in bisection can be devised in higher dimensions also, but they may fail to converge and are usually slow.
The MATLAB function fminsearch uses the Nelder-Mead or simplex-search method, which can be thought of as rolling a simplex downhill to find the bottom of a valley. But there are many others and this is an active research area.
Curse of dimensionality: As the number of variables (dimensionality) $n$ becomes larger, direct search becomes hopeless since the number of samples needed grows as $2^n$!
Minimum of $100(x_2 - x_1^2)^2 + (a - x_1)^2$ in MATLAB
% Rosenbrock or banana function:
a = 1;
banana = @(x) 100*(x(2) - x(1)^2)^2 + (a - x(1))^2;

% This function must accept array arguments!
banana_xy = @(x1, x2) 100*(x2 - x1.^2).^2 + (a - x1).^2;

figure(1); ezsurf(banana_xy, [0, 2, 0, 2])

[x, y] = meshgrid(linspace(0, 2, 100));
figure(2); contourf(x, y, banana_xy(x, y), 100)

% Correct answers are x = [1, 1] and f(x) = 0
[x, fval] = fminsearch(banana, [-1.2, 1], optimset('TolX', 1e-8))

x = 0.999999999187814   0.999999998441919
fval = 1.099088951919573e-18
Figure of Rosenbrock f(x)
[Figure: surface and contour plots of the banana function $100(x_2 - x_1^2)^2 + (a - x_1)^2$ over the $(x_1, x_2)$ plane.]
Conclusions
Conclusions/Summary
Root finding is well-conditioned for simple roots (unit multiplicity), ill-conditioned otherwise.
Methods for solving nonlinear equations are always iterative and the order of convergence matters: second order is usually good enough.
A good method uses a higher-order unsafe method such as Newton's method near the root, but safeguards it with something like the bisection method.
Newton's method is second-order but requires derivative/Jacobian evaluation. In higher dimensions having a good initial guess for Newton's method becomes very important.
Quasi-Newton methods can alleviate the complexity of solving the Jacobian linear system.