L. Vandenberghe ECE236C (Spring 2020)
14. Newton’s method
• Kantorovich theorem
• inexact Newton method
14.1
Newton’s method for nonlinear equations
Newton iteration for solving a nonlinear equation f (x) = 0:
x(k+1) = x(k) − f ′(x(k))−1 f (x(k))
• f : Rn → Rn is a vector valued function f (x) = ( f1(x), . . . , fn(x))• f ′(x) is the n × n Jacobian matrix at x:
( f ′(x))i j =∂ fi∂x j(x), i, j = 1, . . . ,n
• x(k+1) is the solution of the linearized equation at x(k):
f (x(k)) + f ′(x(k))(x − x(k)) = 0
we denote the iterates by the simpler notation xk = x(k) if the meaning is clear
Newton’s method 14.2
Matrix norm
in this lecture, operator norms are used for square matrices:
‖A‖ = supx,0
‖Ax‖‖x‖
the same (arbitrary) vector norm is used for ‖Ax‖ and ‖x‖
Properties (A, B are n × n matrices and x is an n-vector)
• identity matrix: ‖I ‖ = 1
• matrix-vector product: ‖Ax‖ ≤ ‖A‖‖x‖• submultiplicative property: ‖AB‖ ≤ ‖A‖‖B‖• perturbation lemma: if A is invertible and ‖A−1B‖ < 1, then A + B is invertible,
‖(A + B)−1‖ ≤ ‖A−1‖1 − ‖A−1B‖
Newton’s method 14.3
Proof of perturbation lemma
• A + B is invertible: if (A + B)x = 0, then
‖x‖ = ‖A−1Bx‖ ≤ ‖A−1B‖‖x‖
if ‖A−1B‖ < 1 this is only possible if x = 0
• Y = (A + B)−1 satisfies (I + A−1B)Y = A−1; therefore
‖Y ‖ = ‖A−1 − A−1BY ‖≤ ‖A−1‖ + ‖A−1BY ‖≤ ‖A−1‖ + ‖A−1B‖‖Y ‖
from which the inequality in the lemma follows
Newton’s method 14.4
Outline
• Kantorovich theorem
• inexact Newton method
Assumptions in Kantorovich theorem
• f : Rn → Rn is continuously differentiable on an open convex set D
• the Jacobian matrix f ′(x0) at the starting point x0 ∈ D is invertible
• the Jacobian is Lipschitz continuous on D: there exists a positive γ such that
‖ f ′(x0)−1( f ′(x) − f ′(y))‖ ≤ γ‖x − y‖ for all x, y ∈ D
• the norm η B ‖ f ′(x0)−1 f (x0)‖ of the first Newton step is bounded by
ηγ ≤ 12
• D contains the ball B(x0,r) = {x | ‖x − x0‖ ≤ r}, where
r =1 −
√1 − 2γηγ
Newton’s method 14.5
Kantorovich theorem
xk+1 = xk − f ′(xk)−1 f (xk), k = 0,1, . . .
under the assumptions on the previous page:
• the iteration is well defined, i.e., the Jacobian matrices f ′(xk) are invertible
• the iterates remain in B(x0,r)
• the iterates converge to a solution x? of f (x) = 0
• the following error bound holds:
‖xk − x?‖ ≤ (2γη)2k−1
2k−1 η
Newton’s method 14.6
Comments
• this is the affine-invariant version of the theorem: invariant under transformation
f (x) = A f (x), A nonsingular
• existence of a solution is not assumed but follows from the theorem
• complete theorem includes uniqueness of solution in a larger region
• theorem explains very fast local convergence; for example, if γη = 0.4
k (2γη)2k−1/2k−1
0 2.00000000000000001 0.80000000000000002 0.25600000000000003 0.05242880000000004 0.00439804651110405 0.00006189700196436 0.00000002451992877 0.0000000000000077
Newton’s method 14.7
Newton method for quadratic scalar equation
we first examine the convergence of Newton’s method applied to g(t) = 0, where
g(t) = γ2
t2 − t + η with γ > 0, η > 0, h B γη ≤ 12
• the roots will be denoted by t? and t??
t? =1 −√
1 − 2hγ
, t?? =1 +√
1 − 2hγ
• the Newton iteration started at t0 = 0 is
tk+1 =(γ/2)t2
k − ηγtk − 1
Newton’s method 14.8
Iteration map
Newton iteration can be written as tk+1 = ψ(tk) where
ψ(t) = (γ/2)t2 − η
γt − 1
0 t? 1 t?? 2
0
γ = 1, η = h = 3/8
g(t)ψ(t)
t
0 t? = t?? 2
0
γ = 1, η = h = 1/2
g(t)ψ(t)
t
Newton’s method 14.9
Recursions
to derive simple error bounds, we define gk(τ) as g(t) scaled and centered at tk :
gk(τ) =g(τ + tk)−g′(tk)
=γk
2τ2 − τ + ηk, k = 0,1, . . .
• coefficients γk , ηk , and hk = γkηk satisfy the recursions (see next page)
γk+1 =γk
1 − hk, ηk+1 =
hkηk
2(1 − hk), hk+1 =
h2k
2(1 − hk)2
with γ0 = γ, η0 = η, h0 = γη
• we denote the smallest root of gk by rk :
rk = t? − tk =1 − √1 − 2hk
γk=
2ηk
1 +√
1 − 2hk
• Newton step for gk at τ = 0 is equal to Newton step for g at t = tk :
ηk = tk+1 − tk
Newton’s method 14.10
Proof of recursions
• since g is quadratic,
γk
2τ2 − τ + ηk =
g(tk + τ)−g′(tk)
=(g′′(tk)/2)τ2 + g′(tk)τ + g(tk)
−g′(tk)
• recursion for γk :
γk+1 =g′′(tk+1)−g′(tk + ηk)
=g′′(tk)
−g′(tk) − g′′(tk)ηk=
γk
1 − γkηk=
γk
1 − hk
• recursion for ηk :
ηk+1 =g(tk + ηk)−g′(tk + ηk)
=(g′′(tk)/2)η2
k−g′(tk) − g′′(tk)ηk
=γkη
2k
2(1 − γkηk)=
hkηk
2(1 − hk)
• recursion for hk follows from hk+1 = γk+1ηk+1
Newton’s method 14.11
Error bounds
• Newton step ηk = tk+1 − tk :
ηk ≤(2h)2k−1
2kη
(see next page)
• error rk = t? − tk :
rk =2ηk
1 +√
1 − 2hk≤ 2ηk ≤
(2h)2k−1
2k−1 η
Newton’s method 14.12
Proof of bound on ηk
• since h0 ≤ 1/2 the recursion for hk shows that
2hk =h2
k−1(1 − hk−1)2
≤ (2hk−1)2
• applying this recursively we obtain 2hk ≤ (2h0)2k
• from the recursion for ηk (and hk ≤ 1/2):
ηk =hk−1ηk−1
2(1 − hk−1)≤ hk−1ηk−1
• applying this recursively and using the bound on hk we obtain the bound on ηk :
ηk ≤ hk−1 · · · h1h0η0
= 2−k(2h0)2k−1(2h0)2
k−2 · · · (2h0)2(2h0)η0
= 2−k(2h0)2k−1η0
Newton’s method 14.13
Summary of proof of Kantorovich theorem
to prove the Kantorovich theorem (pp. 14.5–14.6), we show that
‖xk+1 − xk ‖ ≤ tk+1 − tk
where tk are the iterates in Newton’s method, started at t0 = 0, for
γ
2t2 − t + η = 0
• tk is called a majorizing sequence for the sequence xk
• the bounds for tk on page 14.12 provide bounds and convergence results for xk
Newton’s method 14.14
Consequences of majorization
t0 = 0, ‖xk+1 − xk ‖ ≤ tk+1 − tk for k ≥ 0
• by the triangle inequality, if k ≥ j,
‖xk − x j ‖ ≤k−1∑i= j‖xi+1 − xi‖ ≤
k−1∑i= j(ti+1 − ti) = tk − t j
• the inequality shows that xk is a Cauchy sequence, so it converges
• taking j = 0 shows that xk remains in the set B(x0,r) (defined on page 14.5):
‖xk − x0‖ ≤ tk − t0 ≤ t? = r
• taking limits for k →∞ shows the error bound on page 14.6:
‖x? − x j ‖ ≤ t? − t j = r j ≤ (2h)2 j−1
2 j−1 η
Newton’s method 14.15
Details of proof of Kantorovich theorem
we prove that the following inequalities hold for k = 0,1, . . .
‖ f ′(xk+1)−1 f ′(xk)‖ ≤1
1 − hk(1)
‖ f ′(xk)−1( f ′(x) − f ′(y))‖ ≤ γk ‖x − y‖ for all x, y ∈ D (2)
‖ f ′(xk)−1 f (xk)‖ ≤ ηk (3)
B(xk+1,rk+1) ⊆ B(xk,rk) (4)
• γk , ηk , hk , rk are the sequences defined on page 14.10
• for k = 0, inequalities (2) and (3) hold by assumption, since γ0 = γ, η0 = η
• (3) is the majorization inequality ‖xk+1 − xk ‖ ≤ ηk = tk+1 − tk
Newton’s method 14.16
Proof by induction: suppose (2) and (3) hold at k = i, and (4) holds for k < i
• xi+1 ∈ D because B(xi,ri) ⊆ · · · ⊆ B(x0,r0) ⊆ D and ‖xi+1 − xi‖ ≤ ηi ≤ ri
• the inequality (2) at k = i implies that
‖ f ′(xi)−1 f ′(xi+1) − I ‖ = ‖ f ′(xi)−1( f ′(xi+1) − f ′(xi))‖≤ γi‖xi+1 − xi‖≤ γiηi
= hi
invertibility of f ′(xi+1) and (1) at k = i follow from the perturbation lemma
• inequality (2) at k = i + 1 follows from (1) and (2) at k = i:
‖ f ′(xi+1)−1( f ′(x) − f ′(y))‖ ≤ ‖ f ′(xi+1)−1 f ′(xi)‖‖ f ′(xi)−1( f ′(x) − f ′(y))‖≤ γi
1 − hi‖x − y‖
= γi+1‖x − y‖
Newton’s method 14.17
• inequality (3) at k = i + 1 follows from (2) at k = i + 1: define v = xi+1 − xi,
‖ f ′(xi+1)−1 f (xi+1)‖ = f ′(xi+1)−1
(∫ 1
0f ′(xi + tv)vdt + f (xi)
) =
f ′(xi+1)−1∫ 1
0
(f ′(xi + tv) − f ′(xi)
)vdt
≤ ‖v‖
∫ 1
0
f ′(xi+1)−1( f ′(xi + tv) − f ′(xi)) dt
≤ γi+12‖v‖2
≤ γi+1η2i
2= ηi+1
• inequality (4) at k = i follows from (3) at k = i + 1 and ri = ri+1 + ηi
‖x − xi+1‖ ≤ ri+1 =⇒ ‖x − xi‖ ≤ ‖x − xi+1‖ + ‖xi+1 − xi‖ ≤ ri+1 + ηi = ri
Newton’s method 14.18
Limit
it remains to show that the limit x? solves the equation
• by the assumptions on page 14.5,
‖ f ′(x0)−1 f (xk)‖ = ‖ f ′(x0)−1 f ′(xk)(xk+1 − xk)‖= ‖( f ′(x0)−1( f ′(xk) − f ′(x0)) + I) (xk+1 − xk)‖≤
(‖( f ′(x0)−1( f ′(xk) − f ′(x0))‖ + 1
)‖xk+1 − xk ‖
≤ (γr + 1)‖xk+1 − xk ‖
• since ‖xk+1 − xk ‖ → 0 and f is continuous,
‖ f ′(x0)−1 f (x?)‖ = limk→∞
‖ f ′(x0)−1 f (xk)‖ = 0
Newton’s method 14.19
Outline
• Kantorovich theorem
• inexact Newton method
Inexact Newton method
inexact Newton method for solving nonlinear equation f (x) = 0:
xk+1 = xk + sk where sk ≈ − f ′(xk)−1 f (xk)
• sk is an approximate solution of the Newton equation
f ′(xk)s = − f (xk)
• goal is to reduce cost per iteration while retaining fast convergence
• in Newton-iterative methods, Newton equation is solved by iterative method
• an example is the Newton-CG method if f ′(x) is symmetric positive definite
Newton’s method 14.20
Forcing condition
accept the inexact Newton step sk if
‖ f ′(xk)sk + f (xk)‖ ≤ αk ‖ f (xk)‖
• coefficient αk < 1 is called the forcing term
• αk limits relative error in the Newton equation
• provides a stopping condition in iterative method for solving Newton equation
• αk is constant or adjusted adaptively
Newton’s method 14.21
Local convergence
Assumptions
• the equation has a solution x? and f ′(x?) is invertible
• f ′ is Lipschitz continuous in a neighborhood of x?
Local convergence result
• the iterates xk converge to x? if x0 is sufficiently close to x?
• an error bound of following type holds (for some κ with κα < 1 if αk ≤ α < 1):
‖xk+1 − x?‖ ≤ κ (‖xk − x?‖ + αk) ‖xk − x?‖
• this shows how the forcing term determines the rate of convergence
αk constant αk = 0 αk ↘ 0
convergence: linear quadratic superlinear
Newton’s method 14.22
References
Newton method
• J. E. Dennis, Jr., and R. B. Schabel, Numerical Methods for Unconstrained Optimization andNonlinear Equations (1996).
• P. Deuflhard, Newton Methods for Nonlinear Problems: Affine Invariance and AdaptiveAlgorithms (2011).
• C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations (1995).• J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several
Variables (2000).
Kantorovich theorem: the statement and proof of the theorem in the lecture follow
• P. Deuflhard, Newton Methods for Nonlinear Problems: Affine Invariance and AdaptiveAlgorithms (2011), theorem 2.1.
• T. Yamamoto, A unified derivation of several error bounds for Newton’s process, Journal ofComputational and Applied Mathematics (1985).
Inexact Newton method
• C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations (1995), chapter 6.• J. Nocedal and S. J. Wright, Numerical Optimization (2006), chapter 7.
Newton’s method 14.23