
Numerical Methods for

Large-Scale Nonlinear Systems

Handouts by Ronald H.W. Hoppe

following the monograph

P. Deuflhard

Newton Methods for Nonlinear Problems

Springer, Berlin-Heidelberg-New York, 2004


1. Classical Newton Convergence Theorems

1.1 Classical Newton-Kantorovich Theorem

Theorem 1.1 Classical Newton-Kantorovich Theorem

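Let $X$ and $Y$ be Banach spaces, $D \subset X$ a convex subset, and suppose that $F : D \subset X \to Y$ is continuously Fréchet differentiable on $D$ with an invertible Fréchet derivative $F'(x^0)$ for some initial guess $x^0 \in D$. Assume further that the following conditions hold true:

\[ \|F'(x^0)^{-1} F(x^0)\| \le \alpha , \tag{1.1} \]
\[ \|F'(y) - F'(x)\| \le \gamma \, \|y - x\| , \quad x, y \in D , \tag{1.2} \]
\[ h_0 := \alpha \gamma \, \|F'(x^0)^{-1}\| < \tfrac{1}{2} , \tag{1.3} \]
\[ B(x^0, \rho_0) \subset D , \quad \rho_0 := \frac{1 - \sqrt{1 - 2h_0}}{\gamma \, \|F'(x^0)^{-1}\|} . \tag{1.4} \]

Then, for the sequence $\{x^k\}_{k \in \mathbb{N}_0}$ of Newton iterates

\[ F'(x^k) \, \Delta x^k = - F(x^k) , \qquad x^{k+1} = x^k + \Delta x^k , \]

there holds

(i) $F'(x)$ is invertible for all Newton iterates $x = x^k$, $k \in \mathbb{N}_0$,
(ii) The sequence $\{x^k\}_{k \in \mathbb{N}}$ of Newton iterates is well defined with $x^k \in B(x^0, \rho_0)$, $k \in \mathbb{N}_0$, and $x^k \to x^* \in B(x^0, \rho_0)$ $(k \to \infty)$, where $F(x^*) = 0$,
(iii) The convergence $x^k \to x^*$ $(k \to \infty)$ is quadratic,
(iv) The solution $x^*$ of $F(x) = 0$ is unique in

\[ B(x^0, \rho_0) \cup \big( D \cap B(x^0, \bar\rho_0) \big) , \quad \bar\rho_0 := \frac{1 + \sqrt{1 - 2h_0}}{\gamma \, \|F'(x^0)^{-1}\|} . \]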

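Proof. We have

\[ \|F'(x^k) - F'(x^0)\| \le \gamma \, \|x^k - x^0\| \le \gamma \, t_k \]

for some upper bound $t_k$, $k \in \mathbb{N}$. If we can prove $x^k \in B(x^0, \rho_0)$ and $\bar t_k := \gamma \, \|F'(x^0)^{-1}\| \, t_k < 1$, $k \in \mathbb{N}$, then by the Banach perturbation lemma $F'(x^k)$ is invertible with

\[ \|F'(x^k)^{-1}\| \le \frac{\|F'(x^0)^{-1}\|}{1 - \|F'(x^0)^{-1}\| \, \|F'(x^k) - F'(x^0)\|} \le \frac{\|F'(x^0)^{-1}\|}{1 - \gamma \, \|F'(x^0)^{-1}\| \, \|x^k - x^0\|} \le \frac{\|F'(x^0)^{-1}\|}{1 - \bar t_k} =: \beta_k . \tag{1.5} \]

We prove $x^k \in B(x^0, \rho_0)$ and $\bar t_k < 1$, $k \in \mathbb{N}$, by induction on $k$: For $k = 1$ we have

\[ \|x^1 - x^0\| = \|F'(x^0)^{-1} F(x^0)\| \le \alpha = \frac{h_0}{\gamma \, \|F'(x^0)^{-1}\|} < \rho_0 , \]

since $h_0 < 1 - \sqrt{1 - 2h_0}$, and

\[ \bar t_1 := \gamma \, \|F'(x^0)^{-1}\| \, t_1 = \gamma \, \|F'(x^0)^{-1}\| \, \|x^1 - x^0\| = \gamma \, \|F'(x^0)^{-1}\| \, \|F'(x^0)^{-1} F(x^0)\| \le \alpha \gamma \, \|F'(x^0)^{-1}\| = h_0 . \]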

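\[
\begin{aligned}
\ldots &= \int_{s=0}^{1} \langle F(x^k + s \Delta x^k) - F(x^k), \Delta x^k \rangle \, ds - \tfrac{1}{2} \langle F'(x^k) \Delta x^k, \Delta x^k \rangle \\
&= \int_{s=0}^{1} s \int_{t=0}^{1} \langle F'(x^k + s t \Delta x^k) \Delta x^k, \Delta x^k \rangle \, dt \, ds - \tfrac{1}{2} \langle F'(x^k) \Delta x^k, \Delta x^k \rangle \\
&= \int_{s=0}^{1} s \int_{t=0}^{1} \langle w^k, \Delta x^k \rangle \, dt \, ds , \qquad w^k := \big( F'(x^k + s t \Delta x^k) - F'(x^k) \big) \Delta x^k , \\
&= \int_{s=0}^{1} s \int_{t=0}^{1} \langle F'(x^k)^{-1/2} w^k, F'(x^k)^{1/2} \Delta x^k \rangle \, dt \, ds \\
&\le \int_{s=0}^{1} s \int_{t=0}^{1} \|F'(x^k)^{-1/2} w^k\| \, \|F'(x^k)^{1/2} \Delta x^k\| \, dt \, ds \\
&\le \epsilon_k \, h_k \int_{s=0}^{1} s^2 \int_{t=0}^{1} t \, dt \, ds = \tfrac{1}{6} \, h_k \, \epsilon_k ,
\end{aligned}
\]

where we have used $\|F'(x^k)^{-1/2} w^k\| \le \omega \, s t \, \|F'(x^k)^{1/2} \Delta x^k\|^2$ together with $\epsilon_k = \|F'(x^k)^{1/2} \Delta x^k\|^2$ and $h_k = \omega \, \|F'(x^k)^{1/2} \Delta x^k\|$. This proves (1.34).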

Using the right-hand side of (1.34) and $h_k < 2$ yields

\[ f(x^k) - f(x^{k+1}) \ge \big( \tfrac{1}{2} - \tfrac{1}{6} h_k \big) \, \epsilon_k > \tfrac{1}{6} \, \epsilon_k . \]

Together, this proves (1.35).

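In order to prove (iii), we use (1.34) and obtain

\[ 0 \le \omega^2 \big( f(x^0) - f(x^*) \big) = \omega^2 \sum_{k=0}^{\infty} \big( f(x^k) - f(x^{k+1}) \big) < \tfrac{5}{6} \, \omega^2 \sum_{k=0}^{\infty} \epsilon_k = \tfrac{5}{6} \sum_{k=0}^{\infty} h_k^2 = \tfrac{5}{6} \cdot 4 \sum_{k=0}^{\infty} \big( \tfrac{1}{2} h_k \big)^2 . \]

Using

\[ \tfrac{1}{2} h_{k+1} \le \big( \tfrac{1}{2} h_k \big)^2 \le \tfrac{1}{2} h_k < 1 , \]

we further get

\[
\begin{aligned}
\sum_{k=0}^{\infty} \big( \tfrac{1}{2} h_k \big)^2 &= \big( \tfrac{1}{2} h_0 \big)^2 + \big( \tfrac{1}{2} h_1 \big)^2 + \ldots \\
&\le \big( \tfrac{1}{2} h_0 \big)^2 + \big( \tfrac{1}{2} h_0 \big)^4 + \big( \tfrac{1}{2} h_1 \big)^4 + \ldots \\
&\le \tfrac{1}{4} h_0^2 \sum_{k=0}^{\infty} \big( \tfrac{1}{2} h_0 \big)^k = \frac{\tfrac{1}{4} h_0^2}{1 - \tfrac{1}{2} h_0} ,
\end{aligned}
\]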



3. Inexact Newton Methods

We recall that Newton's method computes iterates successively as the solution of linear algebraic systems

\[ F'(x^k) \, \Delta x^k = - F(x^k) , \quad k \in \mathbb{N}_0 , \tag{1.37} \]
\[ x^{k+1} = x^k + \Delta x^k . \]

The classical convergence theorems of Newton-Kantorovich and Newton-Mysovskikh and their affine covariant, affine contravariant, and affine conjugate versions assume the exact solution of (1.37).

In practice, however, in particular if the dimension is large, (1.37) will be solved by an iterative method. In this case, we end up with an outer/inner iteration, where the outer iterations are the Newton steps and the inner iterations result from the application of an iterative scheme to (1.37). It is important to tune the outer and inner iterations and to keep track of the iteration errors.

With regard to affine covariance, affine contravariance, and affine conjugacy, the iterative scheme for the inner iterations has to be chosen in such a way that it easily provides information about the

• error norm in case of affine covariance,
• residual norm in case of affine contravariance, and
• energy norm in case of affine conjugacy.

Except for convex optimization, we cannot expect $F'(x)$, $x \in D$, to be symmetric positive definite. Hence, for affine covariance and affine contravariance we have to pick iterative solvers that are designed for nonsymmetric matrices. Appropriate candidates are

• CGNE (Conjugate Gradient for the Normal Equations) in case of affine covariance,
• GMRES (Generalized Minimum RESidual) in case of affine contravariance, and
• PCG (Preconditioned Conjugate Gradient) in case of affine conjugacy.


3.1 Affine Covariant Inexact Newton Methods

3.1.1 CGNE (Conjugate Gradient for the Normal Equations)

We assume $A \in \mathbb{R}^{n \times n}$ to be a regular, nonsymmetric matrix and $b \in \mathbb{R}^n$ to be given and look for $y \in \mathbb{R}^n$ as the unique solution of the linear algebraic system

\[ A y = b . \tag{1.38} \]

As the name already suggests, CGNE is the conjugate gradient method applied to the normal equations: It solves the system

\[ A A^T z = b \tag{1.39} \]

for $z$ and then computes $y$ according to

\[ y = A^T z . \tag{1.40} \]

The implementation of CGNE is as follows:

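CGNE Initialization:

Given an initial guess $y_0 \in \mathbb{R}^n$, compute the residual $r_0 = b - A y_0$ and set

\[ p_0 = 0 , \quad \beta_0 = 0 , \quad \sigma_0 = \|r_0\|^2 . \]

CGNE Iteration Loop: For $1 \le i \le i_{max}$ compute

\[
\begin{aligned}
p_i &= A^T r_{i-1} + \beta_{i-1} \, p_{i-1} , & \alpha_i &= \frac{\sigma_{i-1}}{\|p_i\|^2} , \\
y_i &= y_{i-1} + \alpha_i \, p_i , & \gamma_{i-1}^2 &= \alpha_i \, \sigma_{i-1} , \\
r_i &= r_{i-1} - \alpha_i \, A p_i , & \sigma_i &= \|r_i\|^2 , \\
\beta_i &= \frac{\sigma_i}{\sigma_{i-1}} .
\end{aligned}
\]

CGNE has the error minimizing property

\[ \|y - y_i\| = \min_{v \in K_i(A^T r_0, A^T A)} \|y - v\| , \tag{1.41} \]

where $K_i(A^T r_0, A^T A)$ stands for the Krylov subspace

\[ K_i(A^T r_0, A^T A) := \mathrm{span}\{ A^T r_0, (A^T A) A^T r_0, \ldots, (A^T A)^{i-1} A^T r_0 \} . \tag{1.42} \]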
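The CGNE recursion can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `cgne`, the helper functions, the stopping tolerance `tol`, and the dense list-of-lists matrix format are assumptions made for this example and are not part of the handout.

```python
def matvec(A, x):
    """Dense matrix-vector product A x (A given as a list of rows)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, x):
    """Transposed product A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgne(A, b, y0, imax=50, tol=1e-28):
    """CGNE iteration; returns the last iterate and the list of gamma_j^2.

    tol is a hypothetical threshold on sigma_i = ||r_i||^2 (not in the handout).
    """
    y = list(y0)
    r = [bi - ci for bi, ci in zip(b, matvec(A, y))]   # r_0 = b - A y_0
    p = [0.0] * len(y)                                 # p_0 = 0
    beta = 0.0                                         # beta_0 = 0
    sigma = dot(r, r)                                  # sigma_0 = ||r_0||^2
    gammas_sq = []
    for _ in range(imax):
        if sigma <= tol:
            break
        g = rmatvec(A, r)
        p = [gi + beta * pi for gi, pi in zip(g, p)]   # p_i = A^T r_{i-1} + beta_{i-1} p_{i-1}
        alpha = sigma / dot(p, p)                      # alpha_i = sigma_{i-1} / ||p_i||^2
        y = [yi + alpha * pi for yi, pi in zip(y, p)]  # y_i = y_{i-1} + alpha_i p_i
        gammas_sq.append(alpha * sigma)                # gamma_{i-1}^2 = alpha_i sigma_{i-1}
        Ap = matvec(A, p)
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        sigma_new = dot(r, r)                          # sigma_i = ||r_i||^2
        beta = sigma_new / sigma                       # beta_i = sigma_i / sigma_{i-1}
        sigma = sigma_new
    return y, gammas_sq


A = [[2.0, 1.0], [0.0, 3.0]]          # small nonsymmetric test matrix
y, gammas_sq = cgne(A, [3.0, 3.0], [0.0, 0.0])
```

For this $2 \times 2$ system the iteration reaches the solution $(1, 1)^T$ after two steps, and the stored values $\gamma_j^2 = \|y_{j+1} - y_j\|^2$ sum to $\|y - y_0\|^2$.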


Lemma 3.1 Representation of the iteration error

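Let $\epsilon_i := \|y - y_i\|^2$ be the square of the CGNE iteration error with respect to the $i$-th iterate. Then, there holds

\[ \epsilon_i = \sum_{j=i}^{n-1} \gamma_j^2 . \tag{1.43} \]

Proof. CGNE has the Galerkin orthogonality

\[ (y_i - y_0, y_{i+m} - y_i) = 0 , \quad m \in \mathbb{N} . \tag{1.44} \]

Setting $m = 1$, this implies the orthogonal decomposition

\[ \|y_{i+1} - y_0\|^2 = \|y_{i+1} - y_i\|^2 + \|y_i - y_0\|^2 , \tag{1.45} \]

which readily gives

\[ \|y_i - y_0\|^2 = \sum_{j=0}^{i-1} \|y_{j+1} - y_j\|^2 = \sum_{j=0}^{i-1} \gamma_j^2 . \tag{1.46} \]

On the other hand, observing $y_n = y$, for $m = n - i$ the Galerkin orthogonality yields

\[ \underbrace{\|y - y_0\|^2}_{= \sum_{j=0}^{n-1} \gamma_j^2} = \underbrace{\|y - y_i\|^2}_{= \epsilon_i} + \underbrace{\|y_i - y_0\|^2}_{= \sum_{j=0}^{i-1} \gamma_j^2} . \tag{1.47} \]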

Computable lower bound for the iteration error

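It follows readily from Lemma 3.1 that the computable quantity

\[ [\epsilon_i] := \sum_{j=i}^{i+m} \gamma_j^2 , \quad m \in \mathbb{N} , \tag{1.48} \]

provides a lower bound for the iteration error. In practice, we will test the relative error norm according to

\[ \delta_i := \frac{\|y - y_i\|}{\|y_i\|} \approx \frac{\sqrt{[\epsilon_i]}}{\|y_i\|} \le \delta , \tag{1.49} \]

where $\delta$ is a user specified accuracy.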
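As a small illustration of (1.48) and (1.49), the following sketch evaluates the truncated sum and the relative error test. The function names and the sample numbers are hypothetical; the values $\gamma_0^2 = 1.8$, $\gamma_1^2 = 0.2$ are assumed to come from a two-step CGNE run in which the true squared error after the first step is $0.2$.

```python
import math


def error_estimate(gammas_sq, i, m):
    """Lower bound [eps_i] = sum_{j=i}^{i+m} gamma_j^2 from (1.48)."""
    return sum(gammas_sq[i:i + m + 1])


def relative_error_small(gammas_sq, i, m, norm_yi, delta):
    """Test sqrt([eps_i]) / ||y_i|| <= delta as in (1.49)."""
    return math.sqrt(error_estimate(gammas_sq, i, m)) <= delta * norm_yi


gammas_sq = [1.8, 0.2]                        # hypothetical values gamma_0^2, gamma_1^2
est0 = error_estimate(gammas_sq, 0, 1)        # estimate for ||y - y_0||^2
est1 = error_estimate(gammas_sq, 1, 0)        # estimate for ||y - y_1||^2
```

Note that estimating the error of $y_i$ with look-ahead $m$ needs the values $\gamma_i, \ldots, \gamma_{i+m}$, so the test for step $i$ can only be evaluated after $m + 1$ further inner iterations.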


3.1.2 Convergence of affine covariant inexact Newton methods

We denote by $\delta x^k \in \mathbb{R}^n$ the result of an inner iteration, e.g., CGNE, for the solution of (1.37). Then, it is easy to see that the iteration error $\delta x^k - \Delta x^k$ satisfies the error equation

\[ F'(x^k) (\delta x^k - \Delta x^k) = F(x^k) + F'(x^k) \, \delta x^k =: r^k . \tag{1.50} \]

We will measure the impact of the inexact solution of (1.37) by the relative error

\[ \delta_k := \frac{\|\delta x^k - \Delta x^k\|}{\|\delta x^k\|} . \tag{1.51} \]

Theorem 3.1 Affine covariant convergence theorem for the inexact Newton method. Part I: Linear convergence

Suppose that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on $D$ with invertible Fréchet derivatives $F'(x)$, $x \in D$. Assume further that the following affine covariant Lipschitz condition is satisfied

\[ \|F'(z)^{-1} \big( F'(y) - F'(x) \big) v\| \le \omega \, \|y - x\| \, \|v\| , \tag{1.52} \]

where $x, y, z \in D$, $v \in \mathbb{R}^n$.

Assume that $x^0 \in D$ is an initial guess for the outer Newton iteration and that $\delta x^k_0 = 0$ is chosen as the start iterate for the inner iteration. Consider the Kantorovich quantities

\[ h_k := \omega \, \|\Delta x^k\| , \quad h_k^\delta := \omega \, \|\delta x^k\| = \frac{h_k}{\sqrt{1 + \delta_k^2}} \tag{1.53} \]

associated with the outer and inner iteration. Assume that

\[ h_0 < 2 \bar\Theta , \quad 0 \le \bar\Theta < 1 , \tag{1.54} \]

and control the inner iterations according to

\[ \vartheta(h_k, \delta_k) := \frac{\tfrac{1}{2} h_k^\delta + \delta_k (1 + h_k^\delta)}{\sqrt{1 + \delta_k^2}} \le \bar\Theta < 1 , \tag{1.55} \]

which implies linear convergence. Note that a necessary condition for $\vartheta(h_k, \delta_k) \le \bar\Theta$ is that it holds true for $k = 0$, which is satisfied due to assumption (1.54).


Then, there holds:

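(i) The Newton CGNE iterates $x^k$, $k \in \mathbb{N}_0$, stay in

\[ B(x^0, \bar\rho) , \quad \bar\rho := \frac{\|\delta x^0\|}{1 - \bar\Theta} \tag{1.56} \]

and converge linearly to some $x^* \in B(x^0, \bar\rho)$ with $F(x^*) = 0$.

(ii) The exact Newton increments decrease monotonically according to

\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \bar\Theta , \tag{1.57} \]

whereas for the inexact Newton increments we have

\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}} \; \bar\Theta . \tag{1.58} \]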
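To get a feeling for the control (1.55), the contraction bound can be evaluated numerically. The sketch below assumes the reading $h_k^\delta = h_k / \sqrt{1 + \delta_k^2}$ from (1.53); the function names are hypothetical.

```python
import math


def inexact_kantorovich(h, delta):
    """h^delta = h / sqrt(1 + delta^2), cf. (1.53)."""
    return h / math.sqrt(1.0 + delta * delta)


def contraction_bound(h, delta):
    """theta(h, delta) = (h^delta / 2 + delta (1 + h^delta)) / sqrt(1 + delta^2), cf. (1.55)."""
    h_delta = inexact_kantorovich(h, delta)
    return (0.5 * h_delta + delta * (1.0 + h_delta)) / math.sqrt(1.0 + delta * delta)


exact = contraction_bound(1.0, 0.0)       # delta = 0 recovers the exact Newton bound h/2
inexact = contraction_bound(0.5, 0.25)    # a moderately inexact inner solve
```

For $\delta_k = 0$ the bound reduces to $h_k / 2$, the contraction factor of the exact Newton iteration; increasing $\delta_k$ for fixed $h_k$ increases the bound in the sampled case.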

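Proof. By elementary calculations we find

\[
\begin{aligned}
\Delta x^{k+1} &= - F'(x^{k+1})^{-1} F(x^{k+1}) \\
&= - F'(x^{k+1})^{-1} \big[ F(x^{k+1}) - F(x^k) \big] - F'(x^{k+1})^{-1} \underbrace{F(x^k)}_{= \, r^k - F'(x^k) \delta x^k} \\
&= - F'(x^{k+1})^{-1} \big[ F(x^{k+1}) - F(x^k) - F'(x^k) \delta x^k \big] - F'(x^{k+1})^{-1} \underbrace{r^k}_{= \, F'(x^k)(\delta x^k - \Delta x^k)} \\
&= - \underbrace{\int_0^1 F'(x^{k+1})^{-1} \big[ F'(x^k + t \, \delta x^k) - F'(x^k) \big] \delta x^k \, dt}_{=: \, I} - \underbrace{F'(x^{k+1})^{-1} F'(x^k) (\delta x^k - \Delta x^k)}_{=: \, II} .
\end{aligned}
\tag{1.59}
\]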


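Using the affine covariant Lipschitz condition (1.52), the first term on the right-hand side in (1.59) can be estimated according to

\[ \|I\| \le \omega \, \|\delta x^k\|^2 \int_0^1 t \, dt = \tfrac{1}{2} \, \omega \, \|\delta x^k\|^2 . \tag{1.60} \]

For the second term we obtain by the same argument

\[
\begin{aligned}
\|II\| &= \big\| F'(x^{k+1})^{-1} \big[ F'(x^k)(\delta x^k - \Delta x^k) - F'(x^{k+1})(\delta x^k - \Delta x^k) \big] + F'(x^{k+1})^{-1} F'(x^{k+1}) (\delta x^k - \Delta x^k) \big\| \\
&\le \|F'(x^{k+1})^{-1} \big( F'(x^{k+1}) - F'(x^k) \big) (\delta x^k - \Delta x^k)\| + \|\delta x^k - \Delta x^k\| \\
&\le \omega \, \|\delta x^k\| \, \|\delta x^k - \Delta x^k\| + \|\delta x^k - \Delta x^k\| .
\end{aligned}
\tag{1.61}
\]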

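Combining (1.60) and (1.61) yields

\[ \frac{\|\Delta x^{k+1}\|}{\|\delta x^k\|} \le \tfrac{1}{2} \underbrace{\omega \, \|\delta x^k\|}_{= \, h_k^\delta} + \underbrace{\omega \, \|\delta x^k - \Delta x^k\|}_{= \, \delta_k h_k^\delta} + \underbrace{\frac{\|\delta x^k - \Delta x^k\|}{\|\delta x^k\|}}_{= \, \delta_k} = \tfrac{1}{2} h_k^\delta + \delta_k (1 + h_k^\delta) . \]

Observing (1.53), we finally get

\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \vartheta(h_k, \delta_k) = \frac{\tfrac{1}{2} h_k^\delta + \delta_k (1 + h_k^\delta)}{\sqrt{1 + \delta_k^2}} \le \bar\Theta < 1 , \tag{1.62} \]

which implies linear convergence. Note that a necessary condition for $\vartheta(h_k, \delta_k) \le \bar\Theta$ is that it holds true for $k = 0$, which is satisfied due to assumption (1.54).

For the contraction of the inexact Newton increments we get

\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}} \; \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}} \; \bar\Theta . \tag{1.63} \]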

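It can be easily shown that $\{x^k\}_{k \in \mathbb{N}_0}$ is a Cauchy sequence in $B(x^0, \bar\rho)$. Consequently, there exists $x^* \in B(x^0, \bar\rho)$ such that $x^k \to x^*$ $(k \to \infty)$. Since

\[ \underbrace{F'(x^k) \, \delta x^k}_{\to \, 0} = \underbrace{- F(x^k) + r^k}_{\to \, - F(x^*)} , \]

we conclude $F(x^*) = 0$.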


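Theorem 3.2 Affine covariant convergence theorem for the inexact Newton method. Part II: Quadratic convergence

Under the same assumptions on $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ as in Theorem 3.1 suppose that the initial guess $x^0 \in D$ satisfies

\[ h_0 < \frac{2}{1 + \rho} \tag{1.64} \]

for some $\rho > 0$ and control the inner iterations such that

\[ \delta_k \le \frac{\rho}{2} \, \frac{h_k^\delta}{1 + h_k^\delta} . \tag{1.65} \]

Then, there holds:

(i) The Newton CGNE iterates $x^k$, $k \in \mathbb{N}_0$, stay in

\[ B(x^0, \bar\rho) , \quad \bar\rho := \frac{\|\delta x^0\|}{1 - \frac{1 + \rho}{2} h_0} \tag{1.66} \]

and converge quadratically to some $x^* \in B(x^0, \bar\rho)$ with $F(x^*) = 0$.

(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to

\[ \|\Delta x^{k+1}\| \le \frac{1 + \rho}{2} \, \omega \, \|\Delta x^k\|^2 , \tag{1.67} \]
\[ \|\delta x^{k+1}\| \le \frac{1 + \rho}{2} \, \omega \, \|\delta x^k\|^2 . \tag{1.68} \]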

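Proof. We proceed as in the proof of Theorem 3.1 to obtain

\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \vartheta(h_k, \delta_k) = \frac{\tfrac{1}{2} h_k^\delta + \delta_k (1 + h_k^\delta)}{\sqrt{1 + \delta_k^2}} \]

and

\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} = \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}} \; \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} . \]

In view of (1.65) we get the further estimates

\[ \frac{\|\Delta x^{k+1}\|}{\|\Delta x^k\|} \le \frac{1 + \rho}{2} \, \frac{h_k^\delta}{\sqrt{1 + \delta_k^2}} \le \frac{1 + \rho}{2} \, h_k \]

and

\[ \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \frac{1 + \rho}{2} \, \frac{h_k^\delta}{\sqrt{1 + \delta_{k+1}^2}} \le \frac{1 + \rho}{2} \, h_k^\delta , \]

from which (1.67) and (1.68) follow by the definition of the Kantorovich quantities.

In order to deduce quadratic convergence we have to make sure that the initial increments ($k = 0$) are small enough, i.e.,

\[ \frac{1 + \rho}{2} \, h_0^\delta \le \frac{1 + \rho}{2} \, h_0 < 1 . \tag{1.69} \]

Furthermore, (1.68) and (1.69) allow us to show that the iterates $x^k$, $k \in \mathbb{N}$, stay in $B(x^0, \bar\rho)$. Indeed, (1.68) implies

\[ \|\delta x^j\| \le \frac{1 + \rho}{2} \, h_{j-1}^\delta \, \|\delta x^{j-1}\| \le \frac{1 + \rho}{2} \, h_0 \, \|\delta x^{j-1}\| , \quad j \in \mathbb{N} , \]

and hence,

\[ \|x^k - x^0\| \le \sum_{j=0}^{k-1} \|\delta x^j\| \le \|\delta x^0\| \sum_{j=0}^{k-1} \Big( \frac{1 + \rho}{2} \, h_0 \Big)^j \le \frac{\|\delta x^0\|}{1 - \frac{1 + \rho}{2} h_0} . \]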

3.1.3 Algorithmic aspects of affine covariant inexact Newton methods

(i) Convergence monitor

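Let us assume that the quantity $\bar\Theta < 1$ in both the linear convergence mode and the quadratic convergence mode has been specified and let us further assume that we use CGNE with $\delta x^k_0 = 0$ in the inner iteration. Then, (1.58) suggests the monotonicity test

\[ \tilde\Theta_k := \sqrt{\frac{1 + \tilde\delta_{k+1}^2}{1 + \tilde\delta_k^2}} \; \frac{\|\delta x^{k+1}\|}{\|\delta x^k\|} \le \bar\Theta , \tag{1.70} \]

where $\tilde\delta_k^2$ and $\tilde\delta_{k+1}^2$ are computationally available estimates of $\delta_k^2$ and $\delta_{k+1}^2$.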

(ii) Termination criterion

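We recall that the termination criterion for the exact Newton iteration with respect to a user specified accuracy XTOL is given by

\[ \frac{\|\Delta x^k\|}{1 - \Theta_{k-1}^2} \le \mathrm{XTOL} . \]

According to (1.53) we have

\[ \|\Delta x^k\| = \sqrt{1 + \delta_k^2} \; \|\delta x^k\| . \]

Consequently, replacing $\Theta_{k-1}$ and $\delta_k$ by the computable quantities $\tilde\Theta_{k-1}$ and $\tilde\delta_k$, we arrive at the termination criterion

\[ \frac{\sqrt{1 + \tilde\delta_k^2} \; \|\delta x^k\|}{1 - \tilde\Theta_{k-1}^2} \le \mathrm{XTOL} . \tag{1.71} \]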

(iii) Balancing outer and inner iterations

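According to (1.55) of Theorem 3.1, in the linear convergence mode the adaptive termination criterion for the inner iteration is

\[ \vartheta(h_k, \delta_k) := \frac{\tfrac{1}{2} h_k^\delta + \delta_k (1 + h_k^\delta)}{\sqrt{1 + \delta_k^2}} \le \bar\Theta < 1 . \]

On the other hand, in view of (1.65) of Theorem 3.2, in the quadratic convergence mode the termination criterion is

\[ \delta_k \le \frac{\rho}{2} \, \frac{h_k^\delta}{1 + h_k^\delta} . \]

Since the theoretical Kantorovich quantities (cf. (1.53))

\[ h_k^\delta = \omega \, \|\delta x^k\| = \frac{h_k}{\sqrt{1 + \delta_k^2}} \]

are not directly accessible, we have to replace them by computationally available estimates $[h_k^\delta]$. We recall that for $h_k$ we have the a priori estimate

\[ [h_k] = 2 \, \Theta_{k-1}^2 \le h_k . \]

Consequently, replacing $\delta_k$ by $\tilde\delta_k$, $h_k$ by $[h_k]$, and $\Theta_{k-1}$ by $\tilde\Theta_{k-1}$ (cf. (1.70)), we get the a priori estimates

\[ [h_k^\delta] = \frac{[h_k]}{\sqrt{1 + \tilde\delta_k^2}} , \quad [h_k] = 2 \, \tilde\Theta_{k-1}^2 , \quad k \in \mathbb{N} . \tag{1.72} \]

For $k = 0$, we choose $\tilde\delta_0 = \delta_0 = \tfrac{1}{4}$.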

In practice, for $k \ge 1$ we begin with the quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor $\tilde\Theta_k$ is below some prespecified threshold value $\bar\Theta \le \tfrac{1}{2}$.

(iii)1 Quadratic convergence mode

The computationally realizable termination criterion for the inner iteration in the quadratic convergence mode is

\[ \tilde\delta_k \le \frac{\rho}{2} \, \frac{[h_k^\delta]}{1 + [h_k^\delta]} . \tag{1.73} \]

Inserting (1.72) into (1.73), we obtain a simple nonlinear equation in $\tilde\delta_k$.
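One way to treat this scalar equation is bisection, sketched below under the assumption that (1.73) is taken with equality and that $[h_k^\delta] = [h_k] / \sqrt{1 + \tilde\delta_k^2}$ as in (1.72). The function name, the argument names, and the choice of bisection are illustrative and not prescribed by the handout.

```python
import math


def quadratic_mode_tolerance(h_est, rho, iterations=60):
    """Largest delta with delta <= rho/2 * hd / (1 + hd), hd = h_est / sqrt(1 + delta^2).

    h_est plays the role of [h_k] and rho that of the parameter in (1.65);
    bisection is one possible solver for this scalar equation.
    """
    def excess(delta):
        h_delta = h_est / math.sqrt(1.0 + delta * delta)
        return delta - 0.5 * rho * h_delta / (1.0 + h_delta)

    lo, hi = 0.0, 0.5 * rho      # excess(lo) <= 0 and excess(hi) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


delta_tol = quadratic_mode_tolerance(1.0, 1.0)
```

The left-hand side of the equation increases with $\tilde\delta_k$ while the right-hand side decreases, so the root in $[0, \rho/2]$ is unique and bisection converges to it.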

Remark 3.1 Validity of the approximate termination criterion

Observing that the right-hand side in (1.73) is a monotonically increasing function of $[h_k^\delta]$, and taking $[h_k^\delta] \le h_k^\delta$ into account, it follows that for $\tilde\delta_k \ge \delta_k$ the approximate termination criterion (1.73) implies the exact termination criterion (1.65).

Remark 3.2 Computational work in the quadratic convergence mode

Since $\delta_k \to 0$ $(k \to \infty)$ is enforced, it follows that: The more the iterates $x^k$ approach the solution $x^*$, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.

(iii)2 Linear convergence mode

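We switch to the linear convergence mode, once the criterion

\[ \tilde\Theta_k < \bar\Theta \tag{1.74} \]

is met. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is

\[ [\vartheta(h_k, \delta_k)] := \vartheta([h_k], \tilde\delta_k) = \frac{\tfrac{1}{2} [h_k^\delta] + \tilde\delta_k (1 + [h_k^\delta])}{\sqrt{1 + \tilde\delta_k^2}} \le \bar\Theta . \tag{1.75} \]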

Remark 3.3 Validity of the approximate termination criterion

Since the right-hand side in (1.75) is a monotonically increasing function in $[h_k^\delta]$ and $[h_k^\delta] \le h_k^\delta$, the estimate provided by (1.75) may be too small and thus result in an overestimation of $\tilde\delta_k$. However, since the exact quantities and their a priori estimates both tend to zero as $k$ approaches infinity, asymptotically we may rely on (1.75).

In practice, we require the monotonicity test (1.70) in CGNE and run the inner iterations until $\tilde\delta_k$ satisfies (1.75) or divergence occurs, i.e.,

\[ \tilde\Theta_k > 2 \bar\Theta . \]

Remark 3.4 Computational work in the linear convergence mode

As opposed to the quadratic convergence mode, we observe: The more the iterates $x^k$ approach the solution $x^*$, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.


3.2 Affine Contravariant Inexact Newton Methods

3.2.1 GMRES (Generalized Minimum RESidual)

The Generalized Minimum RESidual Method (GMRES) is an iterative solver for nonsymmetric linear algebraic systems which generates an orthogonal basis of the Krylov subspace

\[ K_i(r_0, A) := \mathrm{span}\{ r_0, A r_0, \ldots, A^{i-1} r_0 \} \tag{1.76} \]

by a modified Gram-Schmidt orthogonalization called the Arnoldi method. The inner product coefficients are stored in an upper Hessenberg matrix so that an approximate solution can be obtained by the solution of a least-squares problem in terms of that Hessenberg matrix:

GMRES Initialization:

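Given an initial guess $y_0 \in \mathbb{R}^n$, compute the residual $r_0 = b - A y_0$ and set

\[ \beta := \|r_0\| , \quad v_1 := \frac{r_0}{\beta} , \quad V_1 := v_1 . \tag{1.77} \]

GMRES Iteration Loop: For $1 \le i \le i_{max}$:

I. Orthogonalization:

\[ \hat v_{i+1} = A v_i - V_i h_i , \tag{1.78} \]
\[ \text{where} \quad h_i = V_i^T A v_i . \tag{1.79} \]

II. Normalization:

\[ v_{i+1} = \frac{\hat v_{i+1}}{\|\hat v_{i+1}\|} . \tag{1.80} \]

III. Update:

\[ V_{i+1} = \big( V_i \;\; v_{i+1} \big) , \tag{1.81} \]
\[ H_i = \begin{pmatrix} h_i \\ \|\hat v_{i+1}\| \end{pmatrix} , \quad i = 1 , \tag{1.82} \]
\[ H_i = \begin{pmatrix} H_{i-1} & h_i \\ 0 & \|\hat v_{i+1}\| \end{pmatrix} , \quad i > 1 . \tag{1.83} \]

IV. Least squares problem: Compute $z_i$ as the solution of

\[ \|\beta e_1 - H_i z_i\| = \min_{z \in \mathbb{R}^i} \|\beta e_1 - H_i z\| . \tag{1.84} \]

V. Approximate solution:

\[ y_i = V_i z_i + y_0 . \tag{1.85} \]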

GMRES has the residual norm minimizing property

\[ \|b - A y_i\| = \min_{z \in K_i(r_0, A)} \|b - A z\| . \tag{1.86} \]

Moreover, the inner residuals decrease monotonically

\[ \|r_{i+1}\| \le \|r_i\| , \quad i \in \mathbb{N}_0 . \tag{1.87} \]

Termination criterion for the GMRES iteration

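The residuals satisfy the orthogonality relation

\[ (r_i, r_i - r_0) = 0 , \quad i \in \mathbb{N} , \tag{1.88} \]

from which we readily deduce

\[ \|r_0\|^2 = \|r_i - r_0\|^2 + \|r_i\|^2 , \quad i \in \mathbb{N} . \tag{1.89} \]

We define the relative residual norm error

\[ \eta_i := \frac{\|r_i\|}{\|r_0\|} . \tag{1.90} \]

Clearly, $\eta_i < 1$, $i \in \mathbb{N}$, and

\[ \eta_{i+1} < \eta_i \quad \text{if } \eta_i \ne 0 . \tag{1.91} \]

Consequently, given a user specified accuracy $\bar\eta$, an appropriate adaptive termination criterion is

\[ \eta_i \le \bar\eta . \tag{1.92} \]

We note that, in terms of $\eta_i$, (1.89) can be written as

\[ \|r_i - r_0\|^2 = (1 - \eta_i^2) \, \|r_0\|^2 . \tag{1.93} \]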
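A compact sketch of GMRES with the termination test (1.92) is given below. It is not the handout's implementation: solving the least squares problem (1.84) by Givens rotations is a common technique chosen here as an assumption, and all function and variable names are hypothetical. The rotated right-hand side provides $\eta_i$ without forming the iterate in every step.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres(A, b, y0, imax, eta_bar):
    """Returns the final iterate and the list of relative residuals eta_i."""
    r0 = [bi - ci for bi, ci in zip(b, matvec(A, y0))]
    beta = math.sqrt(dot(r0, r0))
    if beta == 0.0:
        return list(y0), []
    V = [[ri / beta for ri in r0]]        # orthonormal Krylov basis
    R, cs, sn = [], [], []                # rotated Hessenberg columns, rotations
    g = [beta]                            # rotated right-hand side beta * e_1
    etas = []
    for i in range(imax):
        w = matvec(A, V[i])
        h = []
        for v in V:                       # modified Gram-Schmidt (Arnoldi)
            hij = dot(v, w)
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, v)]
        h_next = math.sqrt(dot(w, w))     # norm of the unnormalized new vector
        for j in range(i):                # apply previous rotations to the new column
            t = cs[j] * h[j] + sn[j] * h[j + 1]
            h[j + 1] = -sn[j] * h[j] + cs[j] * h[j + 1]
            h[j] = t
        d = math.hypot(h[i], h_next)      # new rotation eliminates h_next
        c, s = h[i] / d, h_next / d
        cs.append(c)
        sn.append(s)
        h[i] = d
        R.append(h)
        g.append(-s * g[i])
        g[i] = c * g[i]
        etas.append(abs(g[i + 1]) / beta) # eta_i = ||r_i|| / ||r_0||
        if etas[-1] <= eta_bar or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(R)
    z = [0.0] * k
    for row in range(k - 1, -1, -1):      # back substitution, R[col][row] is entry (row, col)
        acc = g[row] - sum(R[col][row] * z[col] for col in range(row + 1, k))
        z[row] = acc / R[row][row]
    y = [y0[j] + sum(z[col] * V[col][j] for col in range(k)) for j in range(len(y0))]
    return y, etas


y, etas = gmres([[2.0, 1.0], [0.0, 3.0]], [1.0, 3.0], [0.0, 0.0], 2, 1e-12)
```

The list `etas` makes the monotone decrease (1.91) directly observable, and its last entry is the quantity tested in (1.92).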


3.2.2 Convergence of affine contravariant inexact Newton methods

We denote by $\delta x^k \in \mathbb{R}^n$ the result of the inner GMRES iteration. As initial values for GMRES we choose

\[ \delta x^k_0 = 0 , \quad r^k_0 = F(x^k) . \tag{1.94} \]

Consequently, during the inner GMRES iteration the relative error $\eta_i$, $i \in \mathbb{N}_0$, in the residuals satisfies

\[ \eta_i = \frac{\|r^k_i\|}{\|F(x^k)\|} \le 1 , \quad \eta_{i+1} < \eta_i \quad \text{if } \eta_i \ne 0 . \tag{1.95} \]

In the sequel, we drop the subindices $i$ for the inner iterations and refer to $\eta_k$ as the final value of the inner iterations at each outer iteration step $k$.

Theorem 3.3 Affine contravariant convergence theorem for the inexact Newton GMRES method. Part I: Linear convergence

Suppose that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on $D$ and let $x^0 \in D$ be some initial guess. Let further the following affine contravariant Lipschitz condition be satisfied

\[ \|\big( F'(y) - F'(x) \big)(y - x)\| \le \omega \, \|F'(x)(y - x)\|^2 , \quad x, y \in D , \quad \omega \ge 0 . \tag{1.96} \]

Assume further that the level set

\[ L_0 := \{ x \in \mathbb{R}^n \mid \|F(x)\| \le \|F(x^0)\| \} \tag{1.97} \]

is a compact subset of $D$. In terms of the Kantorovich quantities

\[ h_k := \omega \, \|F(x^k)\| , \quad k \in \mathbb{N}_0 , \tag{1.98} \]

the outer residual norms can be bounded according to

\[ \|F(x^{k+1})\| \le \Big( \eta_k + \tfrac{1}{2} (1 - \eta_k^2) \, h_k \Big) \|F(x^k)\| . \tag{1.99} \]

Assume that

\[ h_0 < 2 \tag{1.100} \]

and control the inner iterations according to

\[ \eta_k \le \bar\Theta - \tfrac{1}{2} h_k \tag{1.101} \]

for some $\tfrac{h_0}{2} < \bar\Theta < 1$.

Then, the Newton GMRES iterates $x^k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge linearly to some $x^* \in L_0$ with $F(x^*) = 0$ at an estimated rate

\[ \|F(x^{k+1})\| \le \bar\Theta \, \|F(x^k)\| . \tag{1.102} \]

Proof. We recall that the Newton GMRES iterates satisfy

\[ F'(x^k) \, \delta x^k = - F(x^k) + r^k , \tag{1.103} \]
\[ x^{k+1} = x^k + \delta x^k . \tag{1.104} \]

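It follows from the generalized mean value theorem that

\[ F(x^{k+1}) = F(x^k) + \int_0^1 F'(x^k + t \, \delta x^k) \, \delta x^k \, dt . \tag{1.105} \]

Consequently, replacing $F(x^k)$ in (1.105) by (1.103), we obtain

\[
\begin{aligned}
\|F(x^{k+1})\| &= \Big\| \int_0^1 \big( F'(x^k + t \, \delta x^k) - F'(x^k) \big) \delta x^k \, dt + r^k \Big\| \\
&\le \Big\| \int_0^1 \big( F'(x^k + t \, \delta x^k) - F'(x^k) \big) \delta x^k \, dt \Big\| + \|r^k\| \\
&\le \tfrac{1}{2} \, \omega \, \|F'(x^k) \, \delta x^k\|^2 + \|r^k\| \\
&= \tfrac{1}{2} \, \omega \, \|F(x^k) - r^k\|^2 + \|r^k\| .
\end{aligned}
\]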

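We recall (1.93)

\[ \|r^k - F(x^k)\|^2 = (1 - \eta_k^2) \, \|F(x^k)\|^2 , \]

from which (1.99) can be immediately deduced. Now, in view of (1.101), (1.99) yields

\[ \|F(x^{k+1})\| \le \Big( \underbrace{\eta_k}_{\le \, \bar\Theta - \frac{1}{2} h_k} + \tfrac{1}{2} (1 - \eta_k^2) \, h_k \Big) \|F(x^k)\| \le \big( \bar\Theta - \tfrac{1}{2} \eta_k^2 \, h_k \big) \|F(x^k)\| \le \bar\Theta \, \|F(x^k)\| . \]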

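Taking advantage of the previous inequality, by induction on $k$ it follows that

\[ x^k \in L_0 \subset D , \quad k \in \mathbb{N}_0 . \]

Hence, there exist a subsequence $\mathbb{N}' \subset \mathbb{N}$ and an $x^* \in L_0$ such that $x^k \to x^*$ $(k \in \mathbb{N}')$ and $F(x^*) = 0$. Moreover, since

\[ \|F(x^{k+\ell}) - F(x^k)\| \le \|F(x^{k+\ell})\| + \|F(x^k)\| \le (1 + \bar\Theta^\ell) \, \|F(x^k)\| \le (1 + \bar\Theta^\ell) \, \bar\Theta^k \, \|F(x^0)\| \to 0 \quad (k \in \mathbb{N}) , \]

the whole sequence must converge to $x^*$.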

Theorem 3.4 Affine contravariant convergence theorem for the inexact Newton GMRES method. Part II: Quadratic convergence



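\[ \frac{\eta_k}{1 - \eta_k^2} \le \frac{\rho}{2} \, h_k . \tag{1.107} \]

Then, the Newton GMRES iterates $x^k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge quadratically to some $x^* \in L_0$ with $F(x^*) = 0$ at an estimated rate

\[ \|F(x^{k+1})\| \le \tfrac{1}{2} \, \omega \, (1 + \rho) (1 - \eta_k^2) \, \|F(x^k)\|^2 . \tag{1.108} \]

Proof. Inserting (1.107) into (1.99) and observing $h_k = \omega \, \|F(x^k)\|$ gives the assertion.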

3.2.3 Algorithmic aspects of affine contravariant inexact Newton methods


Throughout the inexact Newton GMRES iteration we use the residual monotonicity test

\[ \Theta_k := \frac{\|F(x^{k+1})\|}{\|F(x^k)\|} < 1 . \tag{1.109} \]

The iteration is considered as divergent, if

\[ \Theta_k > \bar\Theta . \tag{1.110} \]



As in the exact Newton iteration, specifying a residual accuracy FTOL, the termination criterion for the inexact Newton GMRES iteration is

\[ \|F(x^k)\| \le \mathrm{FTOL} . \tag{1.111} \]

(iii) Balancing outer and inner iterations

With regard to (1.101) of Theorem 3.3, in the linear convergence mode the adaptive termination criterion for the inner GMRES iteration is

\[ \eta_k \le \bar\Theta - \tfrac{1}{2} h_k , \]

whereas, in view of (1.107) of Theorem 3.4, in the quadratic convergence mode the termination criterion is

\[ \frac{\eta_k}{1 - \eta_k^2} \le \frac{\rho}{2} \, h_k . \]
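Both criteria can be turned into explicit upper bounds for $\eta_k$. The sketch below assumes the readings $\eta_k \le \bar\Theta - \tfrac{1}{2} h_k$ for the linear mode and $\eta_k / (1 - \eta_k^2) \le \tfrac{\rho}{2} h_k$ for the quadratic mode; the closed form in the second function is the positive root of $c \eta^2 + 2 \eta - c = 0$ with $c = \rho h_k$. Function names are hypothetical.

```python
import math


def eta_linear_mode(theta_bar, h_k):
    """Largest eta_k allowed by eta_k <= theta_bar - h_k / 2 (needs h_k < 2 theta_bar)."""
    return theta_bar - 0.5 * h_k


def eta_quadratic_mode(rho, h_k):
    """Largest eta_k in [0, 1) with eta_k / (1 - eta_k^2) <= rho * h_k / 2."""
    c = rho * h_k
    if c == 0.0:
        return 0.0
    return (math.sqrt(1.0 + c * c) - 1.0) / c


eta_lin = eta_linear_mode(0.5, 0.5)
eta_quad = eta_quadratic_mode(1.0, 0.5)
```

In an actual iteration the unknown $h_k$ would be replaced by a computable estimate $[h_k]$.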

Again, we replace the theoretical Kantorovich quantities $h_k$ by some computationally easily available a priori estimates. We distinguish between the quadratic and the linear convergence mode:

(iii)1 Quadratic convergence mode

We recall the termination criterion (1.107) for the quadratic convergence mode

\[ \frac{\eta_k}{1 - \eta_k^2} \le \frac{\rho}{2} \, h_k . \]

It suggests the a posteriori estimate

\[ [h_k]_2 := \frac{2 \, \Theta_k}{(1 + \rho)(1 - \eta_k^2)} \le h_k . \]

In view of $h_{k+1} = \Theta_k h_k$, this implies the a priori estimate

\[ [h_{k+1}] := \Theta_k \, [h_k]_2 \le \Theta_k \, h_k = h_{k+1} . \tag{1.112} \]

Using (1.112) in (1.107) results in the computationally feasible termination criterion

\[ \frac{\eta_k}{1 - \eta_k^2} \le \tfrac{1}{2} \, \rho \, [h_k] , \quad \rho \le 1.0 . \tag{1.113} \]



We switch from the quadratic to the linear convergence mode, if the local contraction factor satisfies

\[ \Theta_k < \bar\Theta . \tag{1.114} \]

The proof of the previous theorems reveals

\[ \|F(x^{k+1}) - r^k\| \le \frac{\omega}{2} \, \|F(x^k) - r^k\|^2 = \tfrac{1}{2} (1 - \eta_k^2) \, h_k \, \|F(x^k)\| . \tag{1.115} \]

The above inequality (1.115) implies the a posteriori estimate

\[ [h_k]_1 := \frac{2 \, \|F(x^{k+1}) - r^k\|}{(1 - \eta_k^2) \, \|F(x^k)\|} \le h_k \tag{1.116} \]

and the a priori estimate

\[ [h_{k+1}] := \Theta_k \, [h_k]_1 \le h_{k+1} . \tag{1.117} \]

Based on (1.117) we define

\[ \bar\eta_{k+1} := \bar\Theta - \tfrac{1}{2} [h_{k+1}] . \tag{1.118} \]

If we find

\[ \bar\eta_{k+1} < \eta_k \tag{1.119} \]

with $\eta_k$ from (1.113), we continue the iteration in the quadratic convergence mode. Otherwise, we realize the linear convergence mode with some

\[ \eta_{k+1} \le \bar\eta_{k+1} . \tag{1.120} \]


3.3 Affine Conjugate Inexact Newton Methods

3.3.1 PCG (Preconditioned Conjugate Gradient)

The Preconditioned Conjugate Gradient Method (PCG) is an iterative solver for linear algebraic systems with a symmetric positive definite coefficient matrix $A \in \mathbb{R}^{n \times n}$. We recall that any symmetric positive definite matrix $C \in \mathbb{R}^{n \times n}$ defines an energy inner product $(\cdot, \cdot)_C$ according to

\[ (u, v)_C := (u, C v) , \quad u, v \in \mathbb{R}^n . \]

The associated energy norm is denoted by $\| \cdot \|_C$. The PCG Method with a symmetric positive definite preconditioner $B \in \mathbb{R}^{n \times n}$ corresponds to the CG Method applied to the transformed linear algebraic system

\[ B^{1/2} A B^{1/2} (B^{-1/2} y) = B^{1/2} b . \]

The PCG Method is implemented as follows:

PCG Initialization:

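Given an initial guess $y_0 \in \mathbb{R}^n$, compute the residual $r_0 = b - A y_0$ and the preconditioned residual $\bar r_0 = B r_0$ and set

\[ p_0 := \bar r_0 , \quad \sigma_0 := (r_0, \bar r_0) = \|r_0\|_B^2 . \]

PCG Iteration Loop: For $0 \le i \le i_{max}$ compute:

\[
\begin{aligned}
\alpha_i &= \frac{\|p_i\|_A^2}{\sigma_i} , \\
y_{i+1} &= y_i + \frac{1}{\alpha_i} \, p_i , \qquad \gamma_i^2 = \frac{\sigma_i}{\alpha_i} \quad \big( = \|y_{i+1} - y_i\|_A^2 \big) , \\
r_{i+1} &= r_i - \frac{1}{\alpha_i} \, A p_i , \qquad \bar r_{i+1} = B r_{i+1} , \qquad \sigma_{i+1} = \|r_{i+1}\|_B^2 , \\
p_{i+1} &= \bar r_{i+1} + \frac{\sigma_{i+1}}{\sigma_i} \, p_i .
\end{aligned}
\]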
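The PCG loop can be sketched in plain Python as follows. The preconditioner is passed as a function `apply_B` computing $B r$; the Jacobi (diagonal) preconditioner, the function names, and the tolerance `tol` are assumptions chosen for this illustration only.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, y0, apply_B, imax=50, tol=1e-28):
    """PCG iteration; returns the last iterate and the list of gamma_i^2.

    tol is a hypothetical threshold on sigma_i = ||r_i||_B^2.
    """
    y = list(y0)
    r = [bi - ci for bi, ci in zip(b, matvec(A, y))]
    r_bar = apply_B(r)                        # preconditioned residual B r
    p = list(r_bar)
    sigma = dot(r, r_bar)                     # sigma_0 = ||r_0||_B^2
    gammas_sq = []
    for _ in range(imax):
        if sigma <= tol:
            break
        Ap = matvec(A, p)
        alpha = dot(p, Ap) / sigma            # alpha_i = ||p_i||_A^2 / sigma_i
        y = [yi + pi / alpha for yi, pi in zip(y, p)]
        gammas_sq.append(sigma / alpha)       # gamma_i^2 = ||y_{i+1} - y_i||_A^2
        r = [ri - api / alpha for ri, api in zip(r, Ap)]
        r_bar = apply_B(r)
        sigma_new = dot(r, r_bar)
        p = [rb + (sigma_new / sigma) * pi for rb, pi in zip(r_bar, p)]
        sigma = sigma_new
    return y, gammas_sq


A = [[4.0, 1.0], [1.0, 3.0]]                  # symmetric positive definite


def jacobi(r):
    """Hypothetical diagonal preconditioner B = diag(A)^{-1}."""
    return [r[0] / 4.0, r[1] / 3.0]


y, gammas_sq = pcg(A, [1.0, 2.0], [0.0, 0.0], jacobi)
```

For this example the solution is $(1/11, 7/11)^T$, and the values $\gamma_i^2$ add up to the squared energy norm of the initial error, $\|y - y_0\|_A^2 = 15/11$.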


PCG minimizes the energy error norm

\[ \|y - y_i\|_A = \min_{z \in K_i(r_0, A)} \|y - z\|_A , \tag{1.121} \]

where $K_i(r_0, A)$ denotes the Krylov subspace

\[ K_i(r_0, A) := \mathrm{span}\{ r_0, \ldots, A^{i-1} r_0 \} . \tag{1.122} \]

PCG satisfies the Galerkin orthogonality

\[ (y_i - y_0, y_{i+m} - y_i)_A = 0 , \quad m \in \mathbb{N} . \tag{1.123} \]

Denoting by $y \in \mathbb{R}^n$ the unique solution of $A y = b$ and by $\epsilon_i := \|y - y_i\|_A^2$ the square of the iteration error in the energy norm, we have the following error representation:

Lemma 3.2 Representation of the iteration error

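The PCG iteration error satisfies

\[ \epsilon_i = \sum_{j=i}^{n-1} \gamma_j^2 . \tag{1.124} \]

Proof. For $m = 1$ the Galerkin orthogonality implies the orthogonal decompositions

\[ \|y_{i+1} - y_0\|_A^2 = \underbrace{\|y_{i+1} - y_i\|_A^2}_{= \, \gamma_i^2} + \|y_i - y_0\|_A^2 , \tag{1.125} \]
\[ \|y_i - y_0\|_A^2 = \sum_{j=0}^{i-1} \|y_{j+1} - y_j\|_A^2 = \sum_{j=0}^{i-1} \gamma_j^2 . \tag{1.126} \]

On the other hand, observing $y_n = y$, for $m = n - i$ the Galerkin orthogonality yields

\[ \underbrace{\|y - y_0\|_A^2}_{= \sum_{j=0}^{n-1} \gamma_j^2} = \underbrace{\|y - y_i\|_A^2}_{= \, \epsilon_i} + \underbrace{\|y_i - y_0\|_A^2}_{= \sum_{j=0}^{i-1} \gamma_j^2} . \tag{1.127} \]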


Computable lower bound for the iteration error

A lower bound for the iteration error in the energy norm is obviously given by

\[ [\epsilon_i] = \sum_{j=i}^{i+m} \gamma_j^2 . \tag{1.128} \]

In the inexact Newton PCG method we will control the inner PCG iterations by the relative energy error norms

\[ \delta_i = \frac{\|y - y_i\|_A}{\|y_i\|_A} \approx \frac{\sqrt{[\epsilon_i]}}{\|y_i\|_A} \tag{1.129} \]

and use the termination criterion

\[ \delta_i \le \delta , \tag{1.130} \]

where $\delta$ is a user specified accuracy.

3.3.2 Convergence of affine conjugate inexact Newton methods

We denote by $\delta x^k \in \mathbb{R}^n$ the result of the inner PCG iteration. As initial value for PCG we choose

\[ \delta x^k_0 = 0 . \tag{1.131} \]

Again, we will drop the subindices $i$ for the inner PCG iterations and refer to $\delta_k$ as the final value of the inner iterations at each outer iteration step $k$. We recall the Galerkin orthogonality (cf. (1.123))

\[ (\delta x^k, F'(x^k)(\delta x^k - \Delta x^k)) = (\delta x^k, r^k) = 0 . \tag{1.132} \]

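Theorem 3.5 Affine conjugate convergence theorem for the inexact Newton PCG method. Part I: Linear convergence

Suppose that $f : D \subset \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable strictly convex functional on $D$ with the first derivative $F := f'$ and the Hessian $F' = f''$ which is symmetric and uniformly positive definite. Assume that $x^0 \in D$ is some initial guess such that the level set

\[ L_0 := \{ x \in D \mid f(x) \le f(x^0) \} \]

is compact. Let further the following affine conjugate Lipschitz condition be satisfied

\[ \|F'(z)^{-1/2} \big( F'(y) - F'(x) \big) v\| \le \omega \, \|F'(x)^{1/2} (y - x)\| \, \|F'(x)^{1/2} v\| , \quad x, y, z \in D , \quad \omega \ge 0 . \tag{1.133} \]

For the inner Newton PCG iterations consider the exact error terms

\[ \epsilon_k := \|F'(x^k)^{1/2} \Delta x^k\|^2 \]

and the Kantorovich quantities

\[ h_k := \omega \, \|F'(x^k)^{1/2} \Delta x^k\| \]

as well as their inexact analogues

\[ \epsilon_k^\delta := \|F'(x^k)^{1/2} \delta x^k\|^2 = \frac{\epsilon_k}{1 + \delta_k^2} \]

and

\[ h_k^\delta := \omega \, \|F'(x^k)^{1/2} \delta x^k\| = \frac{h_k}{\sqrt{1 + \delta_k^2}} , \]

where $\delta_k$ characterizes the inner PCG iteration error

\[ \delta_k := \frac{\|F'(x^k)^{1/2} (\delta x^k - \Delta x^k)\|}{\|F'(x^k)^{1/2} \delta x^k\|} . \]

Assume that for some $\bar\Theta < 1$

\[ h_0 < 2 \bar\Theta < 2 \tag{1.134} \]

and that

\[ \delta_{k+1} \ge \delta_k , \quad k \in \mathbb{N}_0 , \tag{1.135} \]

holds true throughout the outer Newton iterations. Control the inner iterations according to

\[ \vartheta(h_k^\delta, \delta_k) := \frac{h_k^\delta + \delta_k \big( h_k^\delta + \sqrt{4 + (h_k^\delta)^2} \big)}{2 \sqrt{1 + \delta_k^2}} \le \bar\Theta . \tag{1.136} \]

Then, the inexact Newton PCG iterates $x^k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge linearly to some $x^* \in L_0$ with $f(x^*) = \min_{x \in D} f(x)$.

The following estimates hold true

\[ \|F'(x^{k+1})^{1/2} \Delta x^{k+1}\| \le \bar\Theta \, \|F'(x^k)^{1/2} \Delta x^k\| , \quad k \in \mathbb{N}_0 , \tag{1.137} \]
\[ \|F'(x^{k+1})^{1/2} \delta x^{k+1}\| \le \bar\Theta \, \|F'(x^k)^{1/2} \delta x^k\| , \quad k \in \mathbb{N}_0 . \tag{1.138} \]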

Moreover, the objective functional is reduced according to

110

hk k f(xk) f(xk+1)

2

3k

1

10hk

k . (1.139)

Proof. Observing

rk = F (xk) + F (xk)xk , k lN0 ,

for [0, 1] we obtain

f(xk + xk) f(xk) =

s=0

(xk, F (xk + sxk)) ds = (1.140)

=

s=0

(xk, F (xk + sxk) F (xk)) ds +

s=0

(xk, F (xk)) ds =

=

s=0

s

st=0

(xk, F (xk + stxk)xk) dt ds +

s=0

(xk, F (xk)) ds =

=

s=0

s

st=0

(xk,(F (xk + stxk) F (xk)

)xk) dt ds +

+

s=0

s

st=0

(xk, F (xk)xk) dt ds +

s=0

(xk, F (xk) rkF (xk)xk

) ds =


=

s=0

s

st=0

(F (xk)1/2xk, F (xk)1/2(F (xk + stxk) F (xk)

)xk)

F (xk)1/2xk s t F (xk)1/2xk2 = s t hk k

dt ds

+

s=0

s

st=0

(xk, F (xk)xk) dt ds

s=0

(xk, F (xk)xk) ds +

+

s=0

(xk, rk) = 0 due to (1.123)

ds 110

6 hk k +

1

34 k 2 k .

It readily follows from (1.140) that

f(xk + xk) f(xk) + 2 ( 110

hk k + (

1

32 1) k) . (1.141)

Denoting by $L_k$ the level set

\[ L_k := \{ x \in D \mid f(x) \le f(x^k) \} , \]

by induction on $k$ we prove

\[ h_k < 2 \quad \text{and hence,} \quad x^{k+1} \in L_k . \tag{1.142} \]

For $k = 0$, we have $h_0 < 2$ by assumption (1.134). Since $h_0^\delta \le h_0$, (1.141) readily shows $f(x^1) < f(x^0)$, whence $x^1 \in L_0$.

Now, assuming (1.142) to hold true for some $k \in \mathbb{N}$, again taking advantage of $h_k^\delta \le h_k < 2$, (1.141) yields $f(x^{k+1}) < f(x^k)$ and thus $x^{k+1} \in L_k$.

Moreover, choosing $\lambda = 1$ in (1.141), we obtain the left-hand side of the functional descent property (1.139). We note that we get the right-hand side of (1.139), if in (1.140) we estimate by the other direction of the Cauchy-Schwarz inequality.

Finally, in order to prove the contraction properties (1.137), (1.138) and linear convergence, we estimate the local energy norms as follows:

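\[
\begin{aligned}
\|F'(x^{k+1})^{1/2} \Delta x^{k+1}\| &= \|F'(x^{k+1})^{-1/2} \underbrace{F'(x^{k+1}) \Delta x^{k+1}}_{= \, - F(x^{k+1})}\| \\
&= \|F'(x^{k+1})^{-1/2} \big( F(x^{k+1}) - F(x^k) \big) + F'(x^{k+1})^{-1/2} F(x^k)\| .
\end{aligned}
\]

Observing

\[ F(x^k) = - F'(x^k) \, \delta x^k + r^k , \]

and using the affine conjugate Lipschitz condition we obtain

\[
\begin{aligned}
\|F'(x^{k+1})^{1/2} \Delta x^{k+1}\| &= \Big\| F'(x^{k+1})^{-1/2} \Big( \int_0^1 \big( F'(x^k + t \, \delta x^k) - F'(x^k) \big) \delta x^k \, dt + r^k \Big) \Big\| \\
&\le \tfrac{1}{2} \, \omega \, \|F'(x^k)^{1/2} \delta x^k\|^2 + \|F'(x^{k+1})^{-1/2} r^k\| .
\end{aligned}
\tag{1.143}
\]

Setting $z = \delta x^k - \Delta x^k$, for the second term on the right-hand side of the previous inequality we get the implicit estimate

\[ \|F'(x^{k+1})^{-1/2} r^k\|^2 \le \|F'(x^k)^{1/2} z\|^2 + h_k^\delta \, \|F'(x^k)^{1/2} z\| \, \|F'(x^{k+1})^{-1/2} r^k\| , \]

which gives the explicit bound

\[ \|F'(x^{k+1})^{-1/2} r^k\| \le \tfrac{1}{2} \Big( h_k^\delta + \sqrt{4 + (h_k^\delta)^2} \Big) \|F'(x^k)^{1/2} z\| . \tag{1.144} \]

Using (1.144) in (1.143) results in

\[ \omega \, \|F'(x^{k+1})^{1/2} \Delta x^{k+1}\| \le \tfrac{1}{2} \underbrace{\omega^2 \, \|F'(x^k)^{1/2} \delta x^k\|^2}_{= \, (h_k^\delta)^2} + \tfrac{1}{2} \Big( h_k^\delta + \sqrt{4 + (h_k^\delta)^2} \Big) \underbrace{\omega \, \|F'(x^k)^{1/2} z\|}_{= \, \delta_k h_k^\delta} . \]

Taking (1.136) into account and observing $\omega \, \|F'(x^k)^{1/2} \Delta x^k\| = h_k = \sqrt{1 + \delta_k^2} \; h_k^\delta$, we thus get the contraction factor estimate

\[ \Theta_k := \frac{\|F'(x^{k+1})^{1/2} \Delta x^{k+1}\|}{\|F'(x^k)^{1/2} \Delta x^k\|} \le \vartheta(h_k^\delta, \delta_k) \le \bar\Theta , \tag{1.145} \]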


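which proves (1.137) and linear convergence. For the proof of (1.138) we observe

\[ \|F'(x^\ell)^{1/2} \Delta x^\ell\|^2 = (1 + \delta_\ell^2) \, \|F'(x^\ell)^{1/2} \delta x^\ell\|^2 , \quad \ell = k, k + 1 , \]

as well as $\delta_{k+1} \ge \delta_k$ and obtain

\[ \frac{\|F'(x^{k+1})^{1/2} \delta x^{k+1}\|}{\|F'(x^k)^{1/2} \delta x^k\|} \le \sqrt{\frac{1 + \delta_k^2}{1 + \delta_{k+1}^2}} \; \Theta_k \le \Theta_k . \tag{1.146} \]

By standard arguments we further show that the sequence $\{x^k\}_{k \in \mathbb{N}_0}$ of inexact Newton PCG iterates is a Cauchy sequence in $L_0$ and there exists an $x^* \in L_0$ such that $x^k \to x^*$ $(k \to \infty)$ with $F(x^*) = 0$.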

Theorem 3.6 Affine conjugate convergence theorem for the inexact Newton PCG method. Part II: Quadratic convergence



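\[ \delta_k \le \frac{\rho}{2} \, \frac{h_k^\delta}{h_k^\delta + \sqrt{4 + (h_k^\delta)^2}} . \tag{1.148} \]

Then, there holds:

(i) The Newton PCG iterates $x^k$, $k \in \mathbb{N}_0$, stay in $L_0$ and converge quadratically to some $x^* \in L_0$ with $F(x^*) = 0$.

(ii) The exact Newton increments and the inexact Newton increments decrease quadratically according to

\[ \|F'(x^{k+1})^{1/2} \Delta x^{k+1}\| \le \frac{1 + \rho}{2} \, \omega \, \|F'(x^k)^{1/2} \Delta x^k\|^2 , \tag{1.149} \]
\[ \|F'(x^{k+1})^{1/2} \delta x^{k+1}\| \le \frac{1 + \rho}{2} \, \omega \, \|F'(x^k)^{1/2} \delta x^k\|^2 . \tag{1.150} \]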


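Proof. Using (1.148) in (1.145) yields

\[ \frac{\|F'(x^{k+1})^{1/2} \Delta x^{k+1}\|}{\|F'(x^k)^{1/2} \Delta x^k\|} \le \frac{h_k^\delta + \delta_k \big( h_k^\delta + \sqrt{4 + (h_k^\delta)^2} \big)}{2 \sqrt{1 + \delta_k^2}} \le \tfrac{1}{2} (1 + \rho) \, h_k^\delta , \]

which proves (1.149) in view of $h_k^\delta \le h_k \le h_0 < 2$. The proof of (1.150) follows along the same line by using (1.148) in (1.146).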

3.3.3 Algorithmic aspects of the affine conjugate inexact Newton PCG method


Let us assume that the quantity $\bar\Theta < 1$ in both the linear convergence mode and the quadratic convergence mode has been specified and let us further assume that we use the start iterate $\delta x^k_0 = 0$ in the inner PCG iteration. Denoting by $\tilde\delta_k$ an easily computable estimate of the relative energy norm iteration error $\delta_k$, we accept a new iterate $x^{k+1}$, if the condition

f(xk+1) f(xk) 110

k = 110

(1 + 2

k)k (1.151)

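or the monotonicity test

\[ \tilde\Theta_k := \Big( \frac{\tilde\epsilon_{k+1}}{\tilde\epsilon_k} \Big)^{1/2} = \Big( \frac{(1 + \tilde\delta_{k+1}^2) \, \epsilon_{k+1}^\delta}{(1 + \tilde\delta_k^2) \, \epsilon_k^\delta} \Big)^{1/2} < 1 \tag{1.152} \]

is satisfied. We consider the outer iteration as divergent, if neither (1.151) nor (1.152) holds true.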


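With respect to a user specified accuracy ETOL, the inexact Newton PCG iteration will be terminated, if either

\[ \tilde\epsilon_k = (1 + \tilde\delta_k^2) \, \epsilon_k^\delta \le \mathrm{ETOL}^2 \tag{1.153} \]

or

\[ f(x^k) - f(x^{k+1}) \le \tfrac{1}{2} \, \mathrm{ETOL}^2 . \tag{1.154} \]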

(iii) Balancing outer and inner iterations

For $k = 0$, we choose $\tilde\delta_0 = \delta_0 = \tfrac{1}{4}$.

As in case of the inexact Newton CGNE iteration, for $k \ge 1$ we begin with the quadratic convergence mode and switch to the linear convergence mode as soon as the approximate contraction factor $\tilde\Theta_k$ is below some prespecified threshold value $\bar\Theta \le \tfrac{1}{2}$.

(iii)1 Quadratic convergence mode

A computationally realizable termination criterion for the inner PCG iteration in the quadratic convergence mode is given by

\[ \tilde\delta_k \le \frac{\rho \, [h_k^\delta]}{[h_k^\delta] + \sqrt{4 + [h_k^\delta]^2}} , \tag{1.155} \]

where $[h_k^\delta]$ is an appropriate a priori estimate of the inexact Kantorovich quantity $h_k^\delta$. In view of (1.145), we have the a posteriori estimates

[hk]2 :=10

k|f(xk+1) f(xk) + 1

3k| (1.156)

and

\[ [h_k]_2 := \sqrt{1 + \tilde\delta_k^2} \; [h_k^\delta]_2 . \tag{1.157} \]

We note that (1.157) yields the a priori estimate

\[ [h_k] := \Theta_{k-1} \, [h_{k-1}]_2 . \tag{1.158} \]

Using (1.158) in (1.157), for the inexact Kantorovich quantity we obtain the following a priori estimate

\[ [h_k^\delta] := \frac{[h_k]}{\sqrt{1 + \tilde\delta_k^2}} . \tag{1.159} \]

Inserting (1.159) into (1.155), we obtain a simple nonlinear equation in $\tilde\delta_k$.

Remark 3.5 Computational work in the quadratic convergence mode

Since $\delta_k \to 0$ $(k \to \infty)$ is enforced, it follows that: The more the iterates $x^k$ approach the solution $x^*$, the more computational work is required for the inner iterations to guarantee quadratic convergence of the outer iteration.


We switch to the linear convergence mode, if

\[ \tilde\Theta_k < \bar\Theta \tag{1.160} \]

is satisfied. The computationally realizable termination criterion for the inner iteration in the linear convergence mode is

\[ [\vartheta(h_k^\delta, \delta_k)] := \vartheta([h_k^\delta], \tilde\delta_k) \le \bar\Theta . \tag{1.161} \]

Since asymptotically there holds

\[ \tilde\delta_k \to \frac{\bar\Theta}{\sqrt{1 - \bar\Theta^2}} \quad (k \to \infty) , \]

we observe:

Remark 3.6 Computational work in the linear convergence mode

The more the iterates $x^k$ approach the solution $x^*$, the less computational work is required for the inner iterations to guarantee linear convergence of the outer iteration.


4. Quasi-Newton Methods

4.1 Introduction

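Given $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ as well as $x^k, x^{k+1} \in D$, $x^k \ne x^{k+1}$, the idea is to approximate $F$ locally around $x^{k+1}$ by an affine function

\[ S_{k+1}(x) := F(x^{k+1}) + J_{k+1} (x - x^{k+1}) , \quad J_{k+1} \in \mathbb{R}^{n \times n} , \tag{1.162} \]

such that

\[ S_{k+1}(x^k) = F(x^k) . \tag{1.163} \]

The requirement (1.163) gives rise to the so-called secant condition

\[ J \, \underbrace{(x^{k+1} - x^k)}_{=: \, \delta x^k} = \underbrace{F(x^{k+1}) - F(x^k)}_{=: \, y^k} . \tag{1.164} \]

The matrix $J$ is not uniquely determined by (1.164), since

\[ \dim \mathcal{S}_{k+1} = (n - 1) n , \tag{1.165} \]

where

\[ \mathcal{S}_{k+1} := \{ J \in \mathbb{R}^{n \times n} \mid J \, \delta x^k = y^k \} . \tag{1.166} \]

There are different criteria to select an appropriate $J \in \mathcal{S}_{k+1}$.

4.1.1 The Good Broyden rank 1 update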

Let us consider the change in the affine model as given by

\[ S_{k+1}(x) - S_k(x) = (J_{k+1} - J_k)(x - x^k) . \tag{1.167} \]

An appropriate idea is to choose $J_{k+1} \in \mathcal{S}_{k+1}$ such that there is a least change in the affine model in the sense

\[ \|J_{k+1} - J_k\|_F = \min_{J \in \mathcal{S}_{k+1}} \|J - J_k\|_F , \tag{1.168} \]

where $\| \cdot \|_F$ stands for the Frobenius norm (observe $J = (J_{ik})_{i,k=1}^n$)

\[ \|J\|_F := \Big( \sum_{i,k=1}^n J_{ik}^2 \Big)^{1/2} . \tag{1.169} \]


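The solution of (1.168) can be heuristically motivated as follows: Choose $t^k \perp \delta x^k$ such that

\[ x - x^k = \delta x^k + t^k . \]

Then, (1.167) reads

\[ S_{k+1}(x) - S_k(x) = \underbrace{(J_{k+1} - J_k)\,\delta x^k}_{=\, y^k - J_k \delta x^k} + (J_{k+1} - J_k)\, t^k . \tag{1.170} \]

Now, choose $J_{k+1} \in \mathcal{S}_{k+1}$ such that

\[ (J_{k+1} - J_k)\, t^k = 0 . \]

It follows that

\[ \operatorname{rank}(J_{k+1} - J_k) = 1 , \quad J_{k+1} - J_k = v^k (\delta x^k)^T . \tag{1.171} \]

Inserting (1.171) into (1.170) yields

\[ v^k\, (\delta x^k)^T \delta x^k = y^k - J_k\, \delta x^k , \]

which results in

\[ v^k = \frac{y^k - J_k\, \delta x^k}{(\delta x^k)^T \delta x^k} . \]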

Altogether, this gives us Broyden's rank 1 update (Good Broyden)

\[ J_{k+1} = J_k + \frac{\big[ F(x^{k+1}) - F(x^k) - J_k\, \delta x^k \big] (\delta x^k)^T}{(\delta x^k)^T \delta x^k} . \tag{1.172} \]

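For the solution of nonlinear systems, we are more interested in updates of the inverse of $J_k$. Such an update can be provided by the Sherman-Morrison-Woodbury formula

\[ (A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u} . \tag{1.173} \]

Setting

\[ A := J_k , \quad u := F(x^{k+1}) - F(x^k) - J_k\, \delta x^k , \quad v^T := \frac{(\delta x^k)^T}{(\delta x^k)^T \delta x^k} , \]

we obtain

\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[ \delta x^k - J_k^{-1} \big( F(x^{k+1}) - F(x^k) \big) \big] (\delta x^k)^T J_k^{-1}}{(\delta x^k)^T J_k^{-1} \big[ F(x^{k+1}) - F(x^k) \big]} . \tag{1.174} \]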
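The inverse update (1.174) can be tried out numerically. The following is a minimal sketch, not part of the handout: the function names, the relative breakdown safeguard, the stopping tolerance, and the small test system are all assumptions chosen for illustration.

```python
import numpy as np


def good_broyden_inverse_update(H, dx, y):
    """Inverse update (1.174): H approximates J_k^{-1},
    dx = x^{k+1} - x^k, y = F(x^{k+1}) - F(x^k)."""
    Hy = H @ y
    denom = dx @ Hy  # (dx)^T J_k^{-1} y
    # Hypothetical safeguard (not from the text): relative breakdown test.
    if abs(denom) <= 1e-14 * np.linalg.norm(dx) * np.linalg.norm(Hy):
        raise ZeroDivisionError("Good Broyden update breaks down")
    return H + np.outer(dx - Hy, dx @ H) / denom


def good_broyden_solve(F, x0, J0, tol=1e-10, max_iter=50):
    """Quasi-Newton iteration J_k dx = -F(x^k) with inverse updates."""
    x = np.asarray(x0, dtype=float)
    H = np.linalg.inv(np.asarray(J0, dtype=float))
    Fx = F(x)
    if np.linalg.norm(Fx) < tol:
        return x, 0
    for k in range(1, max_iter + 1):
        dx = -H @ Fx                 # quasi-Newton increment
        x = x + dx
        F_new = F(x)
        if np.linalg.norm(F_new) < tol:
            return x, k
        H = good_broyden_inverse_update(H, dx, F_new - Fx)
        Fx = F_new
    return x, max_iter


def example_F(x):
    # Illustrative test system with solution (1, 1).
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])


x_start = np.array([1.5, 1.0])
J_start = np.array([[3.0, 2.0], [1.0, -1.0]])  # exact Jacobian at x_start
solution, iterations = good_broyden_solve(example_F, x_start, J_start)
```

Only the initial Jacobian is inverted here. Every later step costs a few matrix-vector products, which is the practical reason for updating the inverse instead of $J_k$ itself.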


4.1.2 The Bad Broyden rank 1 update

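Instead of (1.168), an alternative is to choose $J_{k+1} \in \mathcal{S}_{k+1}$ such that there is a least change in the solution of the affine model, i.e.,

\[ \| J_{k+1}^{-1} - J_k^{-1} \|_F = \min_{J \in \mathcal{S}_{k+1}} \| J^{-1} - J_k^{-1} \|_F . \tag{1.175} \]

Similar considerations as before lead us to Broyden's alternative rank 1 update (Bad Broyden)

\[ J_{k+1}^{-1} = J_k^{-1} + \frac{\big[ \delta x^k - J_k^{-1} \big( F(x^{k+1}) - F(x^k) \big) \big] \big( F(x^{k+1}) - F(x^k) \big)^T}{\big( F(x^{k+1}) - F(x^k) \big)^T \big( F(x^{k+1}) - F(x^k) \big)} . \tag{1.176} \]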

4.2 Affine covariant Quasi-Newton method

4.2.1 Affine covariant Quasi-Newton convergence theory

Affine covariant Quasi-Newton methods require the secant condition (1.164) to be stated by means of affine covariant terms in the domain of definition of the nonlinear mapping $F$. Observing that we compute the Quasi-Newton increment $\delta x^k$ as the solution of

\[ J_k\, \delta x^k = -F(x^k) , \tag{1.177} \]

we can rewrite (1.164) according to

\[ (J_k - J)\, \delta x^k = -F(x^{k+1}) . \]

Multiplication by $J_k^{-1}$ yields the affine covariant secant condition

\[ \overline{\delta x}^{k+1} := \underbrace{(I - J_k^{-1} J)}_{=:\, E_k(J)}\, \delta x^k = -J_k^{-1} F(x^{k+1}) . \tag{1.178} \]

We note that any rank 1 update of the form

\[ J_{k+1} = J_k \Big( I - \frac{\overline{\delta x}^{k+1} v^T}{v^T \delta x^k} \Big) , \quad v \in \mathbb{R}^n \setminus \{0\} \tag{1.179} \]

satisfies the affine covariant secant condition (1.178). In particular, for $v = \delta x^k$ we recover the Good Broyden.
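The two claims about (1.179) are easy to check numerically. The sketch below is an illustration only: the random matrix, the random vectors, and the helper names are hypothetical, and the vector `F_new` merely stands in for $F(x^{k+1})$. It uses $y^k - J_k \delta x^k = F(x^{k+1})$, which follows from (1.177).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
J_k = rng.standard_normal((n, n)) + n * np.eye(n)  # hypothetical regular J_k
dx = rng.standard_normal(n)                        # quasi-Newton increment
F_new = rng.standard_normal(n)                     # stands in for F(x^{k+1})
dx_bar = -np.linalg.solve(J_k, F_new)              # right-hand side of (1.178)


def rank_one_update(J_k, dx, dx_bar, v):
    """Rank 1 update (1.179) for a given vector v with v^T dx != 0."""
    return J_k @ (np.eye(len(dx)) - np.outer(dx_bar, v) / (v @ dx))


def secant_residual(J_k, J_new, dx, dx_bar):
    """Norm of E_k(J_new) dx - dx_bar, which vanishes if (1.178) holds."""
    E = np.eye(len(dx)) - np.linalg.solve(J_k, J_new)
    return np.linalg.norm(E @ dx - dx_bar)


v_random = rng.standard_normal(n)
res_random = secant_residual(J_k, rank_one_update(J_k, dx, dx_bar, v_random), dx, dx_bar)

# For v = dx the update coincides with the Good Broyden formula (1.172),
# where y^k - J_k dx = F(x^{k+1}) because J_k dx = -F(x^k).
J_affine = rank_one_update(J_k, dx, dx_bar, dx)
J_good = J_k + np.outer(F_new, dx) / (dx @ dx)
```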


Theorem 4.1 Properties of the affine covariant Quasi-Newton method

For Broyden's affine covariant rank 1 update (Good Broyden)

\[ J_{k+1} = J_k \Big( I - \frac{\overline{\delta x}^{k+1} (\delta x^k)^T}{\| \delta x^k \|^2} \Big) \tag{1.180} \]

assume that the local contraction condition

k =xk+1xk 0 ,


In order to come up with an affine covariant globalization concept, we introduce the level set associated with the level function $T_A$ given by

\[ G_A(z) := \{ x \in D \mid T_A(x) \le T_A(z) \} . \tag{1.252} \]

We recall that monotonicity with respect to $T_A$ reads as follows

\[ x^{k+1} \in \operatorname{int} G_A(x^k) , \quad \text{if } \operatorname{int} G_A(x^k) \neq \emptyset . \]

Denoting by $GL(n)$ the set of all regular $n \times n$ matrices, we introduce the affine covariant level set

\[ \overline{G}(x) := \bigcap_{A \in GL(n)} G_A(x) . \tag{1.253} \]

Theorem 5.1 Newton path

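Assume that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable on $D$ with nonsingular Jacobi matrix $F'(x)$, $x \in D$. Further suppose that for some $A \in GL(n)$ the path-connected component of $G_A(x^0)$, $x^0 \in D$, is a compact subset of $D$. Then, the path-connected component of $\overline{G}(x^0)$ is a topological path $\overline{x} : [0,2] \to \mathbb{R}^n$, called the Newton path. It has the properties

\[ F(\overline{x}(\lambda)) = (1-\lambda)\, F(x^0) , \tag{1.254} \]

\[ T_A(\overline{x}(\lambda)) = (1-\lambda)^2\, T_A(x^0) , \quad A \in GL(n) , \tag{1.255} \]

and satisfies the two-point boundary value problem

\[ \frac{d\overline{x}}{d\lambda} = -F'(\overline{x})^{-1} F(x^0) , \tag{1.256} \]

\[ \overline{x}(0) = x^0 , \quad \overline{x}(1) = x^* . \]

Moreover, we recover the ordinary Newton increment $\Delta x^0$ by means of

\[ \frac{d\overline{x}}{d\lambda}\Big|_{\lambda=0} = -F'(x^0)^{-1} F(x^0) = \Delta x^0 . \tag{1.257} \]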

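Proof. We introduce the level sets

\[ H_A(x^0) := \{ y \in \mathbb{R}^n \mid \|Ay\|^2 \le \|A F(x^0)\|^2 \} \]

and define their intersection

\[ \overline{H}(x^0) := \bigcap_{A \in GL(n)} H_A(x^0) . \tag{1.258} \]

The idea of the proof is to determine $\overline{H}(x^0)$, the range-space counterpart of $\overline{G}(x^0)$. For that purpose, we refer to $\sigma_i$, $1 \le i \le n$, as the singular values of $A$ and to $q_i$, $1 \le i \le n$, as the associated eigenvectors of $A^T A$ such that

\[ A^T A = \sum_{i=1}^n \sigma_i^2\, q_i q_i^T . \]

We further denote by $\mathcal{A}$ the following subset of $GL(n)$

\[ \mathcal{A} := \Big\{ A \in GL(n) \;\Big|\; A^T A = \sum_{i=1}^n \sigma_i^2\, q_i q_i^T , \; q_1 = \frac{F(x^0)}{\|F(x^0)\|} \Big\} . \]

Obviously, every $y \in \mathbb{R}^n$ admits the representation

\[ y = \sum_{j=1}^n b_j q_j , \quad b_j \in \mathbb{R} , \; 1 \le j \le n , \]

and hence,

\[ \|Ay\|^2 = y^T A^T A y = \sum_{i=1}^n \sigma_i^2 b_i^2 , \]

\[ \|A F(x^0)\|^2 = \sigma_1^2\, \|F(x^0)\|^2 . \]

In particular, for $A \in \mathcal{A}$ we find

\[ H_A(x^0) = \Big\{ y \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^n \sigma_i^2 b_i^2 \le \sigma_1^2\, \|F(x^0)\|^2 \Big\} . \]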


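Figure 1: Intersection of ellipsoids $H_A(x^0)$, $A \in \mathcal{A}$.

In other words, $H_A(x^0)$ defines the $n$-dimensional ellipsoid

\[ \frac{1}{\|F(x^0)\|^2}\, b_1^2 + \Big( \frac{\sigma_2}{\sigma_1 \|F(x^0)\|} \Big)^2 b_2^2 + \dots + \Big( \frac{\sigma_n}{\sigma_1 \|F(x^0)\|} \Big)^2 b_n^2 \le 1 . \]

For $A \in \mathcal{A}$, all ellipsoids have a common $b_1$-axis of length $\|F(x^0)\|$, whereas the lengths of the other axes differ (cf. Figure 1).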

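It follows readily that

\[ \widetilde{H}(x^0) := \bigcap_{A \in \mathcal{A}} H_A(x^0) = \{ y \in \mathbb{R}^n \mid y = b_1 q_1 , \; |b_1| \le \|F(x^0)\| \} = \tag{1.259} \]

\[ = \{ y \in \mathbb{R}^n \mid y = (1-\lambda) F(x^0) , \; \lambda \in [0,2] \} = \]

\[ = \{ y \in \mathbb{R}^n \mid Ay = (1-\lambda) A F(x^0) , \; \lambda \in [0,2] , \; A \in GL(n) \} . \]

Since $\mathcal{A} \subset GL(n)$, we have

\[ \overline{H}(x^0) \subset \widetilde{H}(x^0) . \]

On the other hand, for $y \in \widetilde{H}(x^0)$ and $A \in GL(n)$

\[ \|Ay\|^2 = (1-\lambda)^2\, \|A F(x^0)\|^2 \le \|A F(x^0)\|^2 , \]

which shows

\[ \widetilde{H}(x^0) \subset \overline{H}(x^0) . \]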


The final stage of the proof is done by an appropriate lifting of the path $\overline{H}(x^0)$ to $\overline{G}(x^0)$ using the homotopy

\[ \Phi(x, \lambda) := F(x) - (1-\lambda)\, F(x^0) . \]

In view of

\[ \Phi_x = F'(x) , \quad \Phi_\lambda = F(x^0) \]

and observing that $\Phi_x$ is nonsingular for $x \in D$ and $G_A(x^0) \subset D$, local continuation from $\overline{x}(0) = x^0$ by the implicit function theorem, applied to $\Phi \equiv 0$, delivers the existence of the path

\[ \overline{x} \subset G_A(x^0) \subset D \]

with the properties (1.256), (1.257). The assertions (1.254) and (1.255) are now a direct consequence of (1.259).
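The Newton path can be traced numerically by integrating the initial value problem (1.256) from $\lambda = 0$. The sketch below is an illustration under stated assumptions: the test system, the starting point, the classical Runge-Kutta scheme, and the step count are hypothetical choices and not part of the handout.

```python
import numpy as np


def F(x):
    # Illustrative system with solution x* = (1, 1).
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])


def dF(x):
    # Jacobian of F; nonsingular as long as x[0] + x[1] != 0.
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])


def newton_path(F, dF, x0, lam_end=1.0, steps=100):
    """Integrate dx/dlambda = -F'(x)^{-1} F(x^0), x(0) = x^0, by classical RK4."""
    x = np.asarray(x0, dtype=float)
    F0 = F(x)

    def rhs(z):
        return -np.linalg.solve(dF(z), F0)

    h = lam_end / steps
    path = [x]
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x = x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        path.append(x)
    return np.array(path)


path = newton_path(F, dF, [1.5, 1.0])
x_half, x_end = path[50], path[-1]   # lambda = 0.5 and lambda = 1
```

Along the computed path the residual should shrink linearly in $\lambda$ as stated in (1.254), and the end point at $\lambda = 1$ should approximate the solution.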

Remark 5.2 The implication of the previous theorem is that even far from the solution, the normalized Newton increment $\Delta x^0 / \|\Delta x^0\|$, which is tangent to the Newton path originating from $x^0$, plays a decisive role and should be used in an affine invariant globalization strategy. However, its length may be too large and thus has to be controlled appropriately.

Remark 5.3 The previous theorem assumes that the Jacobian is regular in $D$. However, sometimes the situation is encountered where the Jacobian is singular at a critical point $\hat{x}$ even close to the initial guess $x^0$. In this case, the implicit function theorem tells us that the Newton path ends at that critical point.

5.2 Trust region concepts

As we have seen, far away from the solution the ordinary Newton method can still be used, provided that the Newton increment is damped appropriately. Of course, we would like to know how to determine the damping factor, or in other words, what is the region around the current iterate where we can rely on the linearization with respect to the tangent to the Newton path. The specification of such regions is known as trust region concepts.

5.2.1 Trust region based on the Levenberg-Marquardt method

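Given a current iterate $x^k \in \mathbb{R}^n$ and a prespecified parameter $\delta > 0$, the idea of the Levenberg-Marquardt method is to determine an increment $\Delta x^k \in \mathbb{R}^n$ as the solution of the constrained minimization problem

\[ \inf_{\Delta x^k \in K} \| F(x^k) + F'(x^k) \Delta x^k \| , \]

where $K$ stands for the constraint

\[ K := \{ \Delta x^k \in \mathbb{R}^n \mid \|\Delta x^k\| \le \delta \} . \]

Coupling the inequality constraint by a Lagrangian multiplier $\lambda \in \mathbb{R}_+$ leads to the saddle point problem

\[ \inf_{\Delta x^k \in \mathbb{R}^n} \; \sup_{\lambda \in \mathbb{R}_+} L(\Delta x^k, \lambda) \]

in terms of the associated Lagrangian functional

\[ L(\Delta x^k, \lambda) := \| F(x^k) + F'(x^k) \Delta x^k \|^2 + \lambda \big( \|\Delta x^k\|^2 - \delta^2 \big) . \]

The KKT conditions read as follows:

\[ \big( F'(x^k)^T F'(x^k) + \lambda I \big) \Delta x^k = -F'(x^k)^T F(x^k) , \tag{1.260} \]

\[ \lambda \ge 0 , \quad \|\Delta x^k\|^2 - \delta^2 \le 0 , \quad \lambda \big( \|\Delta x^k\|^2 - \delta^2 \big) = 0 . \tag{1.261} \]

Denoting the solution of the saddle point problem by $(\Delta x^k(\lambda), \lambda)$, we observe

\[ \lambda \to 0^+ \;\Longrightarrow\; \Delta x^k(\lambda) \to -F'(x^k)^{-1} F(x^k) , \]

\[ \lambda \gg 1 \;\Longrightarrow\; \Delta x^k(\lambda) \approx -\frac{1}{\lambda} F'(x^k)^T F(x^k) = -\frac{1}{\lambda} \operatorname{grad} T(x^k) . \]

This means: Close to the solution, the method coincides with the ordinary Newton method, whereas far from the solution, it corresponds to a steepest descent with the steplength parameter $1/\lambda$.

The Levenberg-Marquardt method looks robust, since the coefficient matrix $F'(x^k)^T F'(x^k) + \lambda I$ in (1.260) is regular, even if the Jacobian $F'(x^k)$ is singular. However, the method may terminate for singular $F'(x^k)$, since then the right-hand side in (1.260) also degenerates. Moreover, the Levenberg-Marquardt method lacks affine invariance.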
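The two limiting regimes of (1.260) can be observed directly. The following sketch is illustrative only: the matrix `J` and the vector `Fx` are hypothetical stand-ins for $F'(x^k)$ and $F(x^k)$, and the function name is an assumption.

```python
import numpy as np


def lm_increment(J, Fx, lam):
    """Solve (J^T J + lam I) dx = -J^T F, cf. (1.260)."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ Fx)


J = np.array([[3.0, 2.0], [1.0, -1.0]])   # stand-in for F'(x^k), regular
Fx = np.array([1.25, 0.5])                # stand-in for F(x^k)

newton_step = np.linalg.solve(J, -Fx)     # ordinary Newton increment
gradient = J.T @ Fx                       # grad T(x^k) for T = 0.5 * ||F||^2

dx_small = lm_increment(J, Fx, 1e-10)     # multiplier close to 0
dx_large = lm_increment(J, Fx, 1e8)       # multiplier much larger than 1
```

For the small multiplier the increment agrees with the Newton step up to roundoff, and for the large multiplier it agrees with the scaled negative gradient $-\lambda^{-1} \operatorname{grad} T(x^k)$.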

5.2.2 The Armijo damping strategy

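An empirical damping strategy is the Armijo strategy: Let $\Lambda_k \subset \{1, \tfrac12, \tfrac14, \dots, \lambda_{\min}\}$ be a sequence of steplengths with the property

\[ T(x^k + \lambda \Delta x^k) \le \big( 1 - \tfrac12 \lambda \big)\, T(x^k) , \quad \lambda \in \Lambda_k . \tag{1.262} \]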


Figure 2: Geometric interpretation of the affine covariant trust region method

Then, the damping parameter $\lambda_k \in \Lambda_k$ is chosen as the optimal one:

\[ T(x^k + \lambda_k \Delta x^k) = \min_{\lambda \in \Lambda_k} T(x^k + \lambda \Delta x^k) . \]

Obviously, the choice of the level function $T(x)$ in the Armijo rule does not reflect affine covariance. We will develop an affine covariant damping strategy below.
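The Armijo selection can be sketched in a few lines. This is a minimal illustration, assuming the residual level function $T(x) = \tfrac12 \|F(x)\|^2$; the function name, the lower bound `lam_min`, and the scalar arctan example are hypothetical and not taken from the handout.

```python
import numpy as np


def armijo_damping(F, x, dx, lam_min=1e-4):
    """Collect all steplengths 1, 1/2, 1/4, ... >= lam_min that pass the
    test (1.262) and return the one with the smallest level function value."""
    def T(z):
        Fz = F(z)
        return 0.5 * np.dot(Fz, Fz)   # assumed level function T = 0.5 * ||F||^2

    T0 = T(x)
    admissible = []
    lam = 1.0
    while lam >= lam_min:
        if T(x + lam * dx) <= (1.0 - 0.5 * lam) * T0:
            admissible.append(lam)
        lam *= 0.5
    if not admissible:
        raise RuntimeError("no admissible steplength found")
    return min(admissible, key=lambda l: T(x + l * dx))


# Scalar example F(x) = arctan(x): the undamped Newton step from x = 3 overshoots.
F_atan = lambda z: np.arctan(z)
x_k = np.array([3.0])
dx_k = -np.arctan(x_k) * (1.0 + x_k ** 2)   # Newton increment -F(x)/F'(x)
lam_k = armijo_damping(F_atan, x_k, dx_k)
```

In this example the steplengths $1$ and $\tfrac12$ fail the test (1.262), and among the admissible ones $\lambda = \tfrac14$ gives the smallest value of $T$.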

5.2.3 Affine covariant trust region method

The Levenberg-Marquardt method can be easily reformulated to yield an affine covariant version. Since affine covariance means affine invariance with respect to transformations in the domain of definition, we have to modify the objective functional:

\[ \inf_{\Delta x^k \in K} \big\| F'(x^k)^{-1} \big( F(x^k) + F'(x^k) \Delta x^k \big) \big\| , \tag{1.263} \]

whereas the set of constraints $K$ is given as before.

The affine covariant trust region method (1.263) admits an easy geometric interpretation as shown in Figure 2. The set $K$ of constraints is represented as a sphere with radius $\delta$ around $x^k$. If $\delta$ exceeds the length of the Newton correction $\Delta x^k$, the constraint is not active, and we are in the regime of the ordinary Newton method. However, if $\delta$ is smaller than the length of the Newton correction $\Delta x^k$, we have to apply an appropriate damping.


5.2.4 Affine contravariant trust region method

We can also easily reformulate the Levenberg-Marquardt method to come up with an affine contravariant version. Since affine contravariance means affine invariance with respect to transformations in the range space, the objective functional remains unchanged, but we have to modify the set of constraints:

\[ \inf_{\Delta x^k \in K} \| F(x^k) + F'(x^k) \Delta x^k \| , \]

whereas the set of constraints $K$ is given as follows:

\[ K := \{ \Delta x^k \in \mathbb{R}^n \mid \| F'(x^k) \Delta x^k \| \le \delta \} . \tag{1.264} \]

There is basically the same geometric interpretation as before, with the only difference that now the picture has to be drawn in the range space.

5.3 Globalization of affine contravariant Newton methods

5.3.1 Convergence of the damped Newton iteration

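We consider the damped Newton iteration

\[ F'(x^k) \Delta x^k = -F(x^k) , \tag{1.265} \]

\[ x^{k+1} = x^k + \lambda_k \Delta x^k , \quad \lambda_k \in [0,1] \]

in an affine contravariant setting where the damping factor $\lambda_k$ is chosen to achieve residual contraction.

Theorem 5.2 Optimal choice of the damping factor

Assume that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$, $D$ convex, is continuously differentiable on $D$ with regular Jacobian $F'(x)$, $x \in D$. We further suppose that the following affine contravariant Lipschitz condition holds true

\[ \| (F'(y) - F'(x))(y - x) \| \le \omega\, \| F'(x)(y - x) \|^2 , \quad x, y \in D . \tag{1.266} \]

Setting $h_k := \omega \|F(x^k)\|$, for $\lambda \in [0, \min(1, 2/h_k)]$ we have

\[ \| F(x^k + \lambda \Delta x^k) \| \le t_k(\lambda)\, \| F(x^k) \| , \tag{1.267} \]

where

\[ t_k(\lambda) := 1 - \lambda + \tfrac12 h_k \lambda^2 . \]

The optimal choice of the damping factor is

\[ \overline{\lambda}_k := \min\big( 1, \tfrac{1}{h_k} \big) . \tag{1.268} \]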

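Proof. By straightforward calculation we find

\[ \| F(x^k + \lambda \Delta x^k) \| = \| F(x^k + \lambda \Delta x^k) - F(x^k) - F'(x^k) \Delta x^k \| = \]

\[ = \Big\| \int_0^\lambda \big( F'(x^k + t \Delta x^k) - F'(x^k) \big) \Delta x^k \, dt - (1-\lambda)\, F'(x^k) \Delta x^k \Big\| \le \]

\[ \le \Big\| \int_0^\lambda \big( F'(x^k + t \Delta x^k) - F'(x^k) \big) \Delta x^k \, dt \Big\| + (1-\lambda)\, \| F(x^k) \| . \]

The first term on the right-hand side measures the deviation from the Newton path. Using the affine contravariant Lipschitz condition, it can be estimated as follows

\[ \Big\| \int_0^\lambda \big( F'(x^k + t \Delta x^k) - F'(x^k) \big) \Delta x^k \, dt \Big\| \le \tfrac12 \omega \lambda^2\, \| F'(x^k) \Delta x^k \|^2 \le \tfrac12 h_k \lambda^2\, \| F(x^k) \| . \]

Inserting this estimate into the previous one and minimizing $t_k(\lambda)$ proves the theorem.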
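The role of the parabola $t_k(\lambda)$ and of the damping factor (1.268) can be illustrated as follows. This is a sketch under assumptions: the value passed as `omega` is a user-supplied guess that is not verified against (1.266), and the function names and the arctan example are hypothetical.

```python
import numpy as np


def t_k(lam, h):
    """Reduction factor t_k(lambda) = 1 - lambda + 0.5 * h_k * lambda^2."""
    return 1.0 - lam + 0.5 * h * lam ** 2


def optimal_damping(h):
    """Damping factor (1.268): min(1, 1/h_k), written to be safe for h_k = 0."""
    return 1.0 if h <= 1.0 else 1.0 / h


def damped_newton(F, dF, x0, omega, tol=1e-12, max_iter=100):
    """Damped Newton iteration (1.265) with lambda_k = min(1, 1/h_k),
    h_k = omega * ||F(x^k)||; omega is an assumed Lipschitz constant."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    residuals = [np.linalg.norm(F(x))]
    for _ in range(max_iter):
        if residuals[-1] < tol:
            break
        dx = np.linalg.solve(dF(x), -F(x))
        lam = optimal_damping(omega * residuals[-1])
        x = x + lam * dx
        residuals.append(np.linalg.norm(F(x)))
    return x, residuals


# Scalar example F(x) = arctan(x), for which the undamped iteration fails from x = 3.
F_atan = lambda z: np.arctan(z)
dF_atan = lambda z: np.diag(1.0 / (1.0 + z ** 2))
x_final, residuals = damped_newton(F_atan, dF_atan, [3.0], omega=10.0)
```

Far from the solution the iteration takes short steps with $\lambda_k = 1/h_k$. Once $h_k \le 1$ it switches to full Newton steps.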

Theorem 5.3 Global convergence of affine contravariant Newton methods

Under the same assumptions as in Theorem 5.2, let $D_0$ be the path-connected component of the level set $G(x^0)$ and suppose that $D_0$ is a compact subset of $D$. Then, for all damping factors

\[ \lambda_k \in [\varepsilon, 2\overline{\lambda}_k - \varepsilon] \tag{1.269} \]

with $\varepsilon > 0$ sufficiently small, the damped Newton iterates $x^k$, $k \in \mathbb{N}_0$, converge to some $x^* \in D_0$ with $F(x^*) = 0$.


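Proof. The parabola $t_k(\lambda)$ from Theorem 5.2 can be bounded by a polygonal as follows

\[ t_k(\lambda) \le \begin{cases} 1 - \tfrac12 \lambda , & 0 \le \lambda \le \tfrac{1}{h_k} , \\[2pt] 1 + \tfrac12 \lambda - \tfrac{1}{h_k} , & \tfrac{1}{h_k} \le \lambda \le \tfrac{2}{h_k} . \end{cases} \]

For $0 < \varepsilon \le \tfrac{1}{h_k}$ and $\lambda_k \in [\varepsilon, 2\overline{\lambda}_k - \varepsilon]$ we thus have

\[ t_k(\lambda) \le 1 - \tfrac12 \varepsilon , \tag{1.270} \]

which shows strict reduction of the residual level function $T(x)$. The existence of a global $\varepsilon > 0$ follows from the compactness assumption on $D_0$ which implies

\[ \max_{x \in D_0} \| F(x) \| < \infty . \]

Consequently, if $G(x^k) \subset D_0$, then (1.270) yields

\[ G(x^{k+1}(\lambda)) \subset G(x^k) . \]

The rest of the proof is along the same lines as the proof of the affine contravariant Newton-Mysovskikh theorem.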

5.3.2 Adaptive affine contravariant trust region strategy

In Theorem 5.2 we derived the theoretical damping factor (1.268). Since the Kantorovich quantity $h_k = \omega \|F(x^k)\|$ cannot be accessed directly, we again have to provide appropriate estimates

\[ [h_k] := [\omega]\, \|F(x^k)\| , \tag{1.271} \]

where $[\omega]$ is a lower bound for the domain dependent Lipschitz constant $\omega$ that can be obtained by pointwise sampling. Then, an estimate of the optimal damping factor is given by means of

\[ [\overline{\lambda}_k] := \min\big( 1, \tfrac{1}{[h_k]} \big) . \tag{1.272} \]

It follows readily from (1.271) that

\[ [\overline{\lambda}_k] \ge \overline{\lambda}_k , \]

i.e., we may have a considerable overestimation. As a remedy, repeated reductions must be performed by appropriate prediction and correction strategies. The following bit counting lemma gives information about the contraction in the residuals in terms of the accuracy of the estimate for the Kantorovich quantity.

Lemma 5.3 Bit counting lemma

Assume that for some $0 \le \sigma < 1$ there holds

\[ 0 \le h_k - [h_k] < \sigma \max(1, [h_k]) . \tag{1.273} \]

Then, the residual monotonicity test (1.267) yields

\[ \| F(x^{k+1}) \| \le \big( 1 - \tfrac12 (1-\sigma) \lambda_k \big)\, \| F(x^k) \| . \tag{1.274} \]

Proof. The assumption (1.273) can be rewritten as

\[ [h_k] \le h_k < (1+\sigma) \max(1, [h_k]) , \]

which results in the following estimate of the residual contraction

F (xk+1)F (xk) [1 +

1

22hk]|=[k] 0 being sufficiently small, the damped Newton method convergesto some x D0 with F (x) = 0.Proof. As before, we remark that the parabola tAk () can be bounded fromabove by a polygonal bound according to

tAk () 1 1

2 , 0 < 1

hk. (1.288)

Moreover, there is a global $\varepsilon$, since with regard to the compactness assumption on $D_0$ we have

\[ \max_{x \in D_0} \| F'(x)^{-1} F(x) \| \, \operatorname{cond}(A F'(x)) < \infty . \]

The proof proceeds by induction on $k$: Assuming $G_A(x^k) \subset D_0$, (1.288) yields

\[ G_A(x^{k+1}) \subset G_A(x^k) \subset D_0 . \]

Consequently, the sequence of Newton iterates lives in a compact set, which allows us to conclude.

Remark 5.5 The flaws of residual monotonicity

Setting $A = I$ in the previous theorem, we are obviously back in the residual based regime where we have proved global convergence according to Theorem 5.3. However, if the Jacobian $F'(x^k)$ is ill conditioned, we obtain

\[ \overline{\lambda}_k = \big( h_k \operatorname{cond}(F'(x^k)) \big)^{-1} \ll 1 , \tag{1.289} \]


Figure 3: Reduction factors and optimal damping factors

which algorithmically will result in a termination of the iteration.

5.4.2 Natural level function

In view of (1.283) and (1.286), the most natural choice of the matrix $A \in GL(n)$ in the level function $T_A$ is

\[ A := A_k = F'(x^k)^{-1} . \tag{1.290} \]

The associated level function $T_{F'(x^k)^{-1}}$ is called the natural level function, which gives rise to the natural monotonicity test

\[ \| \overline{\Delta x}^{k+1} \| \le \| \Delta x^k \| \tag{1.291} \]

in terms of the simplified Newton correction

\[ \overline{\Delta x}^{k+1} = -F'(x^k)^{-1} F(x^{k+1}) . \tag{1.292} \]

Several remarks are due with respect to the properties of the natural level function.

Remark 5.6 Extremal properties

As shown in Figure 3, for $A \in GL(n)$ the reduction factors $t_k^A(\lambda)$ and the optimal damping factors $\overline{\lambda}_k(A)$ satisfy

\[ t_k^{A_k}(\lambda) = 1 - \lambda + \tfrac12 \lambda^2 h_k \le t_k^A(\lambda) , \tag{1.293} \]

\[ \overline{\lambda}_k(A_k) = \min\big( 1, \tfrac{1}{h_k} \big) \ge \overline{\lambda}_k(A) . \tag{1.294} \]


Figure 4: Asymptotic distance spheres associated with natural level sets

Remark 5.7 Steepest descent property

The damped Newton method in $x^k$ is a method of steepest descent for the natural level function $T_{A_k}$:

\[ \Delta x^k = -\operatorname{grad} T_{A_k}(x^k) . \tag{1.295} \]

Remark 5.8 Asymptotic optimality

In view of

\[ h_k < 1 \;\Longrightarrow\; \overline{\lambda}_k(A_k) = 1 , \tag{1.296} \]

the damped Newton method asymptotically achieves quadratic convergence.

Remark 5.9 Asymptotic distance function

If $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$ is twice continuously differentiable, we can show

\[ T_{F'(x^*)^{-1}}(x) = \tfrac12 \| x - x^* \|^2 + O(\| x - x^* \|^3) . \]

Hence, for $x^k \to x^*$ the natural monotonicity criterion approaches a distance criterion of the form

\[ \| x^{k+1} - x^* \| \le \| x^k - x^* \| . \]

As shown in Figure 4, close to the solution $x^*$ the natural level surface is close to a sphere, whereas it degenerates to an osculating sphere with increasing distance to $x^*$. Note that for other level functions, the level surface is an ellipsoid close to $x^*$, with the ratio of the largest to the smallest half-axis being related to the condition number of the Jacobian, and an osculating ellipsoid off $x^*$.

Remark 5.10 Local descent

If we insert $A = A_k$ into (1.285), (1.286) of Theorem 5.4, we get the local descent property

\[ \| \overline{\Delta x}^{k+1} \| \le \big( 1 - \lambda + \tfrac12 \lambda^2 h_k \big)\, \| \Delta x^k \| . \tag{1.297} \]

Remark 5.11 Global convergence

We note that the results of Theorem 5.5 are not applicable to the situation at hand, since $A = A_k$ changes from one step to the other. Taking the asymptotic distance function property into account, in the subsequent global convergence result we make the fixed choice $A = F'(x^*)^{-1}$.

Theorem 5.6 Global convergence of the affine covariant damped Newton method with natural level functions; Part I

Assume that $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$, $D \subset \mathbb{R}^n$ convex, is continuously differentiable on $D$ with regular Jacobian $F'(x)$, $x \in D$, and suppose that the following affine covariant Lipschitz condition is fulfilled

F (x)1(F (y) F (x)

)(y x) y x2 , x, y D . (1.298)

Suppose further that $x^* \in D$ is the unique solution in $D$ and let $x^0 \in D$ be an initial guess such that the path-connected component of $G_{F'(x^*)^{-1}}(x^0)$ is a compact subset of $D$. Let the damping factors be chosen according to

k [, 2k ] , 0 < 0 the data

x(`) , q `


is available. Then, in terms of the fundamental Lagrange polynomials L`q(),the prediction path is given by the interpolating polynomial

xq() :=

`=qx(` L

`q() . (1.334)

Standard error estimates give

x() xq() Cq+1 () , (1.335)where

() :=

`=q( `) .

(iv)2 Hermite extrapolation

Here, we assume that we are given the data

x(`) , x(`) , q ` .

We define the prediction path xq() as the associated Hermite polynomial andobtain

x() xq() Cq+1 () , (1.336)where

() :=

`=q( `)2 .

6.1.3 Affine covariant correction method

Once we have computed a prediction path $\hat{x}(\lambda)$, $\lambda_\nu \le \lambda \le \lambda_{\nu+1}$, we choose the predicted value $x^0 := \hat{x}(\lambda_{\nu+1})$ as an initial guess for a correction method to compute an approximation of $x^* := x(\lambda_{\nu+1})$. We will study the ordinary Newton method with a new Jacobian at each iterate. Applying the affine covariant version of the Newton-Kantorovich theorem, we get the following result.

Theorem 6.1 Convergence of the corrector

Assume that $F : D \times I \to \mathbb{R}^n$ is continuously differentiable with nonsingular Jacobian $F_x(x, \lambda)$, $(x, \lambda) \in D \times I$. Further, suppose that there exists a unique homotopy path $x(\lambda)$ and that the affine covariant Lipschitz condition

\[ \big\| F_x(\hat{x}(\lambda), \lambda)^{-1} \big( F_x(y, \lambda) - F_x(x, \lambda) \big) \big\| \le \omega_0\, \| y - x \| , \quad x, y \in D , \; \lambda \in I \tag{1.337} \]

is satisfied, where $\hat{x}(\lambda)$ is a prediction method of order $p$ (cf. (1.326)). Then, for all step sizes

\[ \Delta\lambda_\nu \le \Delta\lambda_{\max} := \Big( \frac{\sqrt{2} - 1}{\omega_0\, \eta_p} \Big)^{1/p} , \tag{1.338} \]

the ordinary Newton method with initial guess $\hat{x}(\lambda_{\nu+1})$ converges to the solution point $x(\lambda_{\nu+1})$.

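Proof. For the ease of exposition, we write $\lambda$ instead of $\lambda_{\nu+1}$. The affine covariant Newton-Kantorovich theorem requires

\[ \| \Delta x^0(\lambda) \|\, \omega_0 \le \tfrac12 . \tag{1.339} \]

Applying the Lipschitz condition (1.337), by straightforward computation we find, with $\hat{x} = \hat{x}(\lambda)$ and $x = x(\lambda)$,

\[ \| \Delta x^0(\lambda) \| = \| F_x(\hat{x}, \lambda)^{-1} F(\hat{x}, \lambda) \| = \big\| F_x(\hat{x}, \lambda)^{-1} \big( F(\hat{x}, \lambda) - F(x, \lambda) \big) \big\| = \]

\[ = \Big\| F_x(\hat{x}, \lambda)^{-1} \int_0^1 F_x(x + t(\hat{x} - x), \lambda)(\hat{x} - x) \, dt \Big\| \le \| \hat{x} - x \| \big( 1 + \tfrac12 \omega_0\, \| \hat{x} - x \| \big) . \]

Observing (1.326), we deduce

\[ \| \Delta x^0(\lambda) \| \le \eta_p \Delta\lambda^p \big( 1 + \tfrac12 \omega_0 \eta_p \Delta\lambda^p \big) =: \alpha(\Delta\lambda) . \tag{1.340} \]

Consequently, this leads to the requirement

\[ \omega_0 \eta_p \Delta\lambda^p \big( 1 + \tfrac12 \omega_0 \eta_p \Delta\lambda^p \big) \le \tfrac12 , \]

which is equivalent to

\[ \omega_0 \eta_p \Delta\lambda^p \le \sqrt{2} - 1 . \]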

6.1.4 Adaptive stepsize control

For the practical application of the theoretical convergence results we have to replace the theoretical quantities $\omega_0$ and $\eta_p$ by computationally available lower bounds $[\omega_0]$ and $[\eta_p]$, thus resulting in the stepsize estimate

\[ [\Delta\lambda_{\max}] := \Big( \frac{\sqrt{2} - 1}{[\omega_0]\, [\eta_p]} \Big)^{1/p} \ge \Delta\lambda_{\max} . \tag{1.341} \]

Since there might be a substantial overestimation, we again need a prediction strategy and a correction strategy.


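As far as the correction strategy is concerned, let us assume that for $\lambda_{\nu+1}$ we already know the first contraction factor

\[ \Theta_0(\lambda) := \frac{\| \Delta x^1(\lambda) \|}{\| \Delta x^0(\lambda) \|} . \]

The convergence analysis of the affine covariant Newton method yields

\[ \Theta_0(\lambda) \le \tfrac12 \omega_0\, \| \Delta x^0(\lambda) \| . \tag{1.342} \]

Hence, inserting (1.340) gives us

\[ \Theta_0(\lambda) \le \tfrac12 \omega_0 \eta_p \Delta\lambda^p \big( 1 + \tfrac12 \omega_0 \eta_p \Delta\lambda^p \big) , \]

which leads to

\[ \omega_0 \eta_p \Delta\lambda^p \ge g(\Theta_0(\lambda)) , \]

where

\[ g(\Theta) := \sqrt{1 + 4\Theta} - 1 . \]

From this, we get the a posteriori estimate

\[ [\omega_0 \eta_p] := \frac{g(\Theta_0(\lambda))}{\Delta\lambda^p} \le \omega_0 \eta_p , \]

and the associated stepsize estimate

\[ [\Delta\lambda_{\max}] := \Big( \frac{g(\overline{\Theta})}{[\omega_0 \eta_p]} \Big)^{1/p} , \quad \overline{\Theta} = \tfrac14 . \]

Denoting by $\Delta\lambda_\nu$ the stepsize associated with the computed value of $\Theta_0$ and by $\Delta\lambda_\nu'$ the one corresponding to $\overline{\Theta} = \tfrac14$, we arrive at the stepsize correction

\[ \Delta\lambda_\nu' := \Big( \frac{g(\overline{\Theta})}{g(\Theta_0)} \Big)^{1/p} \Delta\lambda_\nu . \tag{1.343} \]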
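The stepsize formulas above reduce to a few scalar operations. The sketch below is illustrative: the function names are assumptions, the argument `omega0_eta_p` denotes an estimate of the product $\omega_0 \eta_p$, and the sample numbers in the usage lines are hypothetical.

```python
import math


def g(theta):
    """g(Theta) = sqrt(1 + 4 Theta) - 1; note g(1/4) = sqrt(2) - 1."""
    return math.sqrt(1.0 + 4.0 * theta) - 1.0


def max_stepsize(omega0_eta_p, p, theta_bar=0.25):
    """Stepsize bound (g(theta_bar) / (omega_0 eta_p))^(1/p), cf. (1.338)."""
    return (g(theta_bar) / omega0_eta_p) ** (1.0 / p)


def estimate_omega0_eta_p(theta0, step, p):
    """A posteriori estimate [omega_0 eta_p] = g(Theta_0) / step^p."""
    return g(theta0) / step ** p


def corrected_stepsize(step, theta0, p, theta_bar=0.25):
    """Stepsize correction (1.343)."""
    return (g(theta_bar) / g(theta0)) ** (1.0 / p) * step


# Hypothetical numbers: order p = 2, last step 0.1, observed contraction 0.5.
new_step = corrected_stepsize(0.1, 0.5, 2)
same_step = max_stepsize(estimate_omega0_eta_p(0.5, 0.1, 2), 2)
```

Since $g$ is increasing, an observed contraction factor above $\tfrac14$ shrinks the stepsize and one below $\tfrac14$ enlarges it.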

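Remark: If the termination criterion detects some $k$ such that $\Theta_k > \tfrac12$, the last continuation step has to be repeated with

\[ \Delta\lambda_\nu' := \Big( \frac{g(\overline{\Theta})}{g(\Theta_k)} \Big)^{1/p} \Delta\lambda_\nu , \tag{1.344} \]

which gives rise to a reduction, since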

[max]
