nonlinear eigenvalue problems - tuhh · nonlinear eigenvalue problems t( )x = 0 arise in a variety...

24
115 Nonlinear Eigenvalue Problems Heinrich Voss Hamburg University of Technology 115.1 Basic Properties ........................................ 115-2 115.2 Analytic matrix functions ............................. 115-3 115.3Variational Characterization of Eigenvalues ........ 115-7 115.4General Rayleigh Functionals ........................ 115-9 115.5 Methods for dense eigenvalue problems ............. 115-10 115.6 Iterative projection methods .......................... 115-13 115.7Methods using invariant pairs ........................ 115-17 115.8 The infinite Arnoldi method .......................... 115-20 References ...................................................... 115-22 This chapter considers the nonlinear eigenvalue problem to find a parameter λ such that the linear system T (λ)x =0 (115.1) has a nontrivial solution x, where T (·): D C n×n is a family of matrices depending on a complex parameter λ D. It generalizes the linear eigenvalue problem Ax = λx, A C n×n , where T (λ)= λI - A, and the generalized linear eigenvalue problem where T (λ)= λB - A, A, B C n×n . Nonlinear eigenvalue problems T (λ)x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures, vibrations of fluid–solid structures, the electronic behavior of quantum dots, and delay eigenvalue problems, to name just a few. Due to its wide range of applications, the quadratic eigenvalue problem T (λ)x = λ 2 M x + λCx + Kx = 0 is of particular interest, but also polynomial, rational and more general eigenvalue problems appear. A standard approach for investigating or numerically solving polynomial eigenvalue problems is linearization where the original problem is transformed into a generalized linear eigenvalue problem with the same spectrum. Details on linearization and structure preservation are discussed in Chapter 102, Matrix Polynomials. This chapter is concerned with the general nonlinear eigenvalue problem which in general can not be linearized. Unlike for linear and polynomial eigenvalue problems there may exist infinitely many eigenvalues. In practice, however, one is usually interested only in a few eigenvalues close to a target value or a line in the complex plane. If T is linear then T (λ)= T (0)+ λT 0 (0) has the form of a generalized eigenvalue problem, and in the general case linerization gives the approximation T (λ)= T (0) + λT 0 (0) + O(λ 2 ), which is again a generalized linear eigenvalue problem. Hence, it is not surprising, that the (elementwise) derivative T 0 (λ) of T (λ) plays an important role in the analysis of nonlinear eigenvalue problems. We tacitly assume in the whole chapter that whenever a derivative T 0 ( ˆ λ) appears, T is analytic in a neighborhood of ˆ λ or in the real case T : D R n×n , D R that T is differentiable in a neighborhood of ˆ λ. k·k always denotes the Euclidean and spectral norm, respectively, and we use the notation [x; y] := [x T , y T ] T for column vectors. 115-1

Upload: others

Post on 26-Mar-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115Nonlinear Eigenvalue Problems

Heinrich VossHamburg University of Technology

115.1 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115-2115.2 Analytic matrix functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115-3115.3 Variational Characterization of Eigenvalues . . . . . . . . 115-7115.4 General Rayleigh Functionals . . . . . . . . . . . . . . . . . . . . . . . . 115-9115.5 Methods for dense eigenvalue problems . . . . . . . . . . . . . 115-10

115.6 Iterative projection methods . . . . . . . . . . . . . . . . . . . . . . . . . . 115-13

115.7 Methods using invariant pairs . . . . . . . . . . . . . . . . . . . . . . . . 115-17

115.8 The infinite Arnoldi method . . . . . . . . . . . . . . . . . . . . . . . . . . 115-20

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115-22

This chapter considers the nonlinear eigenvalue problem to find a parameter λ such thatthe linear system

T (λ)x = 0 (115.1)

has a nontrivial solution x, where T (·) : D → Cn×n is a family of matrices depending ona complex parameter λ ∈ D.

It generalizes the linear eigenvalue problem Ax = λx, A ∈ Cn×n, where T (λ) = λI − A,and the generalized linear eigenvalue problem where T (λ) = λB −A, A,B ∈ Cn×n.

Nonlinear eigenvalue problems T (λ)x = 0 arise in a variety of applications in science andengineering, such as the dynamic analysis of structures, vibrations of fluid–solid structures,the electronic behavior of quantum dots, and delay eigenvalue problems, to name just a few.Due to its wide range of applications, the quadratic eigenvalue problem T (λ)x = λ2Mx +λCx + Kx = 0 is of particular interest, but also polynomial, rational and more generaleigenvalue problems appear. A standard approach for investigating or numerically solvingpolynomial eigenvalue problems is linearization where the original problem is transformedinto a generalized linear eigenvalue problem with the same spectrum. Details on linearizationand structure preservation are discussed in Chapter 102, Matrix Polynomials.

This chapter is concerned with the general nonlinear eigenvalue problem which in generalcan not be linearized. Unlike for linear and polynomial eigenvalue problems there may existinfinitely many eigenvalues. In practice, however, one is usually interested only in a feweigenvalues close to a target value or a line in the complex plane.

If T is linear then T (λ) = T (0)+λT ′(0) has the form of a generalized eigenvalue problem,and in the general case linerization gives the approximation T (λ) = T (0) +λT ′(0) +O(λ2),which is again a generalized linear eigenvalue problem. Hence, it is not surprising, that the(elementwise) derivative T ′(λ) of T (λ) plays an important role in the analysis of nonlineareigenvalue problems.

We tacitly assume in the whole chapter that whenever a derivative T ′(λ) appears, T is

analytic in a neighborhood of λ or in the real case T : D → Rn×n, D ⊂ R that T isdifferentiable in a neighborhood of λ. ‖ · ‖ always denotes the Euclidean and spectral norm,respectively, and we use the notation [x; y] := [xT ,yT ]T for column vectors.

115-1

Page 2: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-2 Handbook of Linear Algebra

115.1 Basic Properties

This section presents basic properties of the nonlinear eigenvalue problem (115.1)

Definitions:

As for a linear eigenvalue problem, λ ∈ D is called an eigenvalue of T (·) if T (λ)x = 0 has a

nontrivial solution x 6= 0. Then x is called a corresponding eigenvector or right eigenvector,

and (λ, x) is called eigenpair of T (·).Any nontrivial solution y 6= 0 of the adjoint equation T (λ)∗y = 0 is called left eigenvector of

T (·) and the vector-scalar-vector triplet (y, λ, x) is called eigentriplet of T (·).The eigenvalue problem (115.1) is regular if detT (λ) 6≡ 0, and otherwise it is called singular.

The spectrum σ(T (·)) of T (·) is the set of all eigenvalues of T (·).An eigenvalue λ of T (·) has algebraic multiplicity k if d`

dλ` det(T (λ))∣∣∣λ=λ

= 0 for ` = 0, . . . , k−

1 and dk

dλk det(T (λ))∣∣∣λ=λ

6= 0.

An eigenvalue λ is simple if its algebraic multiplicity is one.

The geometric multiplicity of an eigenvalue λ is the dimension of the kernel ker(T (λ)) of

T (λ).

An eigenvalue λ is called semi-simple if its algebraic and geometric mutiplicity coincide.

T (·) : J → Rn×n is real symmetric if T (λ) = T (λ)T for every λ ∈ J ⊂ R.

T (·) : D → Cn×n is complex symmetric if T (λ) = T (λ)T for every λ ∈ D.

T (·) : D → Cn×n is Hermitian if D is symmetric with respect to the real line and T (λ)∗ = T (λ)

for every λ ∈ D.

Facts:

1. For A ∈ Cn×n and T (λ) = λI − A, the terms eigenvalue, (left and right) eigen-vector, eigenpair, eigentriplet, spectrum, algebraic and geometric multiplicity andsemi-simple have their standard meaning.

2. For linear eigenvalue problems,

– eigenvectors corresponding to distinct eigenvalues are linearly independent, whichis not the case for nonlinear eigenvalue problems (cf. Example 1).

– left and right eigenvectors corresponding to distinct eigenvalues are orthogonal,which does not hold for nonlinear eigenproblems (cf. Example 2).

– the algebraic multiplicities of eigenvalues sum up to the dimension of the prob-lem, whereas for nonlinear problems there may exist an infinite number of eigen-values (cf. Example 2) and an eigenvalue may have any algebraic multiplicity(cf. Example 3).

3. [Sch08] If λ is an algebraically simple eigenvalue of T (·), then λ is geometricallysimple.

4. [Neu85, Sch08] Let (y, λ, x) be an eigentriplet of T (·). Then λ is algebraically simple

if and only if λ is geometrically simple and y∗T ′(λ)x 6= 0.5. [Sch08] Let D ⊂ C and E ⊂ Cd be open sets. Let T : D×E → Cn×n be continuously

differentiable, and let λ be a simple eigenvalue of T (·, 0) and x and y right and left

eigenvectors with unit norm. Then the first order perturbation expansion at λ readsas follows:

λ(ε)− λ =1

y∗T ′(λ, 0)x

d∑j=1

εjy∗ ∂T

∂εj(λ, 0)x + o(‖ε‖).

Page 3: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-3

The normwise condition number for λ is given by

κ(λ) = lim sup‖ε‖→0

|λ(ε)− λ|‖ε‖

=1

|y∗T ′(λ, 0)x|

√√√√ d∑j=1

∣∣∣∣y∗ ∂T∂εj (λ, 0)x

∣∣∣∣2.6. [Sch08] Let (y, λ, x) be an eigentriplet of T (·) with simple eigenvalue λ. Then for

sufficiently small |λ− λ|

T (λ)−1 =1

λ− λxy∗

y∗T ′(λ)x+O(1).

7. [Neu85] Let λ be a simple eigenvalue of T (·), and let x be a right eigenvector normal-

ized such that e∗x = 1 for some vector e. Then the matrix B := T (λ) + T ′(λ)xe∗ isnonsingular.

8. If T (·) is real symmetric and λ is a real eigenvalue, then left and right eigenvectorscorresponding to λ coincide.

9. If T (·) is complex symmetric and x is a right eigenvector, then x is a left eigenvectorcorresponding to the same eigenvalue.

10. If T (·) is Hermitian, then eigenvalues are real (and left and right eigenvectors corre-sponding to λ coincide) or they come in pairs, i.e. if (y, λ,x) is an eigentriplet of T (·),then this is also true for (x, λ,y).

Examples:

1. For the quadratic eigenvalue problem T (λ)x = 0 with

T (λ) :=

[0 1

−2 3

]+ λ

[7 −5

10 −8

]+ λ2

[1 0

0 1

](115.2)

the

distinct eigenvalues λ = 1 and λ = 2 share the eigenvector [1; 2].

2. Let T (λ)x :=

[eiλ

2

1

1 1

]x = 0. Then T (λ)x = 0 has a countable set of eigenvalues

√2kπ,

k ∈ N ∪ 0. λ = 0 is an algebraically double and geometrically simple eigenvalue with

left and right eigenvectors x = y = [1;−1], and y∗T ′(0)x = 0. Every λk =√

2kπ, k 6= 0

is algebraically and geometrically simple with the same eigenvectors x, y as before, and

y∗T ′(λk)x = 2√

2kπi 6= 0.

3. T (λ) = (λk), k ∈ N has the eigenvalue λ = 0 with algebraic multiplicity k.

115.2 Analytic matrix functions

In this section we consider the eigenvalue problem (115.1) where T (·) : D → Cn×n is a

regular matrix function which is analytic in a neighborhood of an eigenvalue λ.

Definitions:

A sequence of vectors x0,x1, . . . ,xr−1 is called a Jordan chain (of length r) corresponding to λ

if x0 6= 0 and ∑k=0

1

k!

dkT (λ)

dλk

∣∣∣∣λ=λ

x`−k = 0 for ` = 0, . . . , r − 1.

Page 4: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-4 Handbook of Linear Algebra

x0 is an eigenvector and x1, . . . ,xr−1 are generalized eigenvectors.

Let x0 be an eigenvector corresponding to an eigenvalue λ. The maximal length of a Jordan

chain that starts with x0 is called the multiplicity of x0.

An eigenvalue λ is is said to be normal if it is a discrete point in σ(T (·))) and the multiplicity

of each corresponding eigenvector is finite.

An analytic function x : D → Cn is called root function of T (·) at λ ∈ D if T (λ)x(λ) = 0 and

x(λ) 6= 0.

The multiplicity of λ as a zero of T (λ)x(λ) is called the multiplicity of x(·).The rank of an eigenvector x0 is the maximum of the multiplicities of all root functions x(·)

such that x(λ) = x0.

A root function x(·) is called a maximal root function if the multiplicity of x(·) is equal to

the rank of x0 := x(λ).

Let x(1)0 ∈ kerT (λ) be an eigenvector with maximal rank and let x(1)(λ) =

∑∞j=0 x

(1)j (λ − λ)j

be a maximal root function such that x(1)(λ) = x(1)0 . Suppose that the root functions x(k)(λ) =∑∞

j=0 x(k)j (λ − λ)j , k = 1, . . . , i − 1 are already constructed, and let x

(i)0 be an eigenvector with

maximal rank in some direct complement to the linear span of the vectors x(1)0 , . . . ,x

(i−1)0 in

kerT (λ). Let x(i)(λ) =∑∞j=0 x

(i)j (λ − λ)j be a maximal root function such that x(i)(λ) = x

(i)0 .

Then the ordered set

x(1)0 , . . . ,x

(1)r1−1,x

(2)0 , . . . ,x

(2)r2−1, . . . ,x

(k)0 , . . . ,x

(k)rk−1,

where k = dim kerT (λ) and rj = rank x(j)0 is called canonical set of Jordan chains, and the

ordered set x(1)(λ), . . . ,x(k)(λ) is called canonical system of root functions.

Let X ∈ Cn×α contain in its columns the vectors of a canonical set of Jordan chains and let

J = diag(J1, . . . , Jk), where Jj is a Jordan block of size rj × rj corresponding to λ. Then the pair

(X, J) is called a Jordan pair.

Let x(1)(λ), . . . ,x(k)(λ) be a canonical system of root functions at λ, and let x(k+1), . . . ,x(n) ∈Cn such that x(1)(λ), . . . ,x(k)(λ),x(k+1), . . . ,x(n) is a basis of Cn. Then the system x(1)(λ),

. . . ,x(k)(λ),x(k+1), . . . ,x(n) is called an extended canonical system of root functions. To

the constant functions x(k+1), . . . ,x(n) ∈ Cn (which are not root functions in the strict sense of

the definition) is assigned the multiplicity 0.

Let λ be an eigenvalue of T (·), and let Φ(·) be an analytic matrix function such that its columns

form an extended canonical system of root functions of T (·) at λ. Then (cf. [GKS93]) in a neigh-

borhood of λ,

L(λ)Φ(λ) = P (λ)D(λ), (115.3)

where D(λ) is a diagonal matrix with diagonal entries (λ− λ)κ1 , . . . , (λ− λ)κn and P (·) is a matrix

function analytic at λ such that det P (λ) 6= 0. Furthermore, the exponents κ1, . . . , κn are the

multiplicities of the columns of Φ(·), also called partial multiplicities of T (·) at λ. (115.3) is the

local Smith form of T (·) in a neighborhood of λ.

A pair of matrices (Y,Z) ∈ Cn×p ×Cp×p is a regular pair if for some integer ` ≥ 1

rank

Y

Y Z...

Y Z`−1

= p.

The number p is called the order of the regular pair (Y,Z).

Facts:

The following facts for which no specific reference is given can be found in [GLR82, GR81].

Page 5: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-5

1. In contrast to linear eigenvalue problems the vectors in a Jordan chain need not belinearly independent. Even the zero vector is admissible as a generalized eigenvector.

2. Let x(·) be a root function at λ, and let x(j) denote the jth derivative of x. Then the

vectors xj := x(j)(λ), j = 0, . . . , r − 1 form a Jordan chain at λ, where r denotes themultiplicity of x(·).

3. The multiplicity of a root function at λ (and hence the rank of an eigenvector) is at

most the algebraic multiplicity of λ.4. The numbers r1 ≥ · · · ≥ rk in a Jordan pair are uniquely determined.5. The number α := r1 + · · ·+ rk is the algebraic multiplicity of the eigenvalue λ.6. [GKS93] Let y1, . . . ,y` : D → Cn be a set of root functions at λ with multiplicities

s1 ≥ · · · ≥ s` such that y1(λ), . . . ,y`(λ) ∈ kerT (λ) are linearly independent. If the

root functions x1, . . . ,xk define a canonical set of Jordan chains of T (·) at λ withmultiplicities r1 ≥ · · · ≥ rk, then k ≥ ` and ri ≥ si for i = 1, . . . , `. Moreover,y1, . . . ,y` define a canonical set of Jordan chains of T (·) at λ if and only if ` = k andsj = rj for j = 1, . . . , `.

7. Let S(·) be an analytic matrix function with det S(λ) 6= 0. x0, . . . ,xk is a Jordan

chain of T (·)S(·) corresponding to λ if and only if the vectors y0, . . . ,yk given by

yj =∑ji=0

1i!S

(i)(λ)xj−i, j = 0, . . . , k − 1 is a Jordan chain of T (·) corresponding to

λ.8. For S as in the last fact the Jordan chains of T (·) coincide with those of S(·)T (·)

corresponding to the same λ.9. Two regular analytic matrix functions T1(·) and T2(·) have a common Jordan pair at

λ if and only if T2(λ)T−11 (λ) is analytic and invertible at λ.

10. [GKS93] Let T (·), Φ(·), D(·) and P (·) be regular n × n matrix functions, analytic

at λ, such that L(λ)Φ(λ) = P (λ)D(λ) in a neighborhood of λ. Assume that Φ(λ)is invertible and that D(·) is a diagonal matrix polynomial with diagonal entries

(λ − λ)κ1 , . . . , (λ − λ)κn , where κ1 ≥ · · · ≥ κn. Then the following three conditionsare equivalent:

(i) the columns of Φ(·) form an extended canonical system of root functions of T (·)at λ with partial multiplicities κ1, . . . , κn

(ii) det P (λ) 6= 0

(iii)∑nj=1 κj is the the algebraic multiplicity of λ.

11. [GKS93] Let x(λ) =∑∞j=0(λ− λ)jxj be an analytic Cn-vector function with x0 6= 0,

and set X := [x0, . . . ,xp]. Then x(·) is a root function of T (·) at λ of multiplicity atmost p if and only if T (λ)X(λI − Jλ,p)

−1 is an n× p analytic matrix function. Here

Jλ,p denotes a p× p Jordan block with eigenvalue λ.

12. [AST09] T (·) admits a representation P (λ)T (λ)Q(λ) = D(λ) where P (·) and Q(·) areregular analytic matrix functions with constant nonzero determinants, and D(λ) =diag dj(λ))j=1...,n is a diagonal matrix of analytic functions such that dj(λ)/dj−1(λ)are analytic for j = 2, 3, . . . , n. This representation is also called local Smith form.

13. [AST09] With the representation in the last fact, if qj(λ) is the jth column of Q, and

λ a zero of dj(·), then (λ,qj(λ)) is an eigenpair of T (·).14. The non-zero partial multiplicities κj in the local Smith form of T (·) at λ coincide

with the lengths r1 ≥ · · · ≥ rk of Jordan chains in a canonical set.15. A Jordan pair (X, J) of T (·) at an eigenvalue λ is regular.

16. [GR81] Let λ be an eigenvalue of T (·) with algebraic multiplicity α, and let (Y,Z) ∈Cn×α×Cα×α be a pair of matrices such that σ(Z) = λ. (Y,Z) is similar to a Jordan

Page 6: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-6 Handbook of Linear Algebra

pair (X,J) (i.e. Y = XS and Z = S−1JS for some invertible matrix S) if and onlyif (Y,Z) is regular and the following equation holds:

∞∑i=0

TjY (T − λI)j = 0, where Tj =1

j!T (j)(λ)

(note that only a finite number of terms in the left-hand side of the equation is

different from zero, because σ (Z) = λ).17. [HL99] Suppose that A(λ) and B(λ) are analytic matrix-valued functions such that

A(λ) and B(λ) are non-singular. Then the partial multiplicities of the eigenvalue λof T (λ) and T (λ) := B(λ)A(λ)C(λ) coincide.

18. [HL99] Suppose that a matrix-valued function T (λ, τ) depends analytically on λ and

continuously on τ and that λ = 0 is an eigenvalue of T (·, 0) of algebraic multiplicity

α. Then there exists a neighborhood O of λ such that, for all τ sufficiently close tothe origin, there are exactly α eigenvalues (counting with algebraic multiplicities) ofthe matrix-valued function T (·, τ) in O.

Examples:

1. [GLR82] For T (λ) =

[λ2 −λ0 λ2

]we have detT (λ) = λ4, and hence λ = 0 is an eigenvalue of

T (·) with algebraic multiplicity 4 and geometric multiplicity 2.

For an eigenvector x0 = [x01;x02] the first generalized eigenvector x1 satisfies T ′(0)x0 +

T (0)x1 = 0, i.e.

[0 −1

0 0

]x0 = 0, and x1 exists if and only if x02 = 0, and x1 can be taken

completely arbitrary. For a second generalized eigenvector x2 we have 12T′′(0)x0 +T ′(0)x1 +

T (0)x2 = 0, i.e.

[1 0

0 1

]x0 +

[0 −1

0 0

]x1 = 0, i.e. x01 = x12, and if this equation is satisfied,

x2 can be chosen arbitrarily. The condition for the third generalized eigenvector x3 reads[1 0

0 1

]x1 +

[0 −1

0 0

]x2 = 0, which implies x12 = 0 and is contradictory.

To summarize, the length of a Jordan chain can not exceed 3. Jordan chains of length 1 are

x0, x0 6= 0, Jordan chains of length 2 are x0 = [x01; 0],x1 with x01 6= 0 and x1 arbitrary,

and Jordan chains of length 3 are x0 = [x01; 0],x1 = [x11;x01],x2, where x01 6= 0, and x11

and x2 are arbitrary. One example of a canonical system of Jordan chains is x(1)0 = [1; 0],

x(1)1 = [0; 1], x

(1)2 = [1; 1], x

(2)0 = [0; 1].

T (0) = 0 implies that x(·) is a root function at λ = 0, if x1 and x2 are analytic and x(0) 6= 0.

T (λ)x(λ) = [λ2x1(λ) − λx2(λ);λ2x2(λ)] = 0 yields that x has at least the multiplicity 2,

and if x2(λ) = λx1(λ), then the multiplicity is 3, and a higher multiplicity is not possible.

In the latter case one obtains a Jordan chain as [x1(0); 0], [x′1(0);x1(0)], [x′′1 (0); 2x′1(0)].

2. For the quadratic eigenvalue problem in (115.2), det T (λ) = λ4 − λ3 − 3λ2 + λ+ 2. Hence,

λ = −1 is an eigenvalue with algebraic multiplicity 2 and geometric multiplicity 1. From

T (−1)x0 =

[−6 6

−12 12

]x0 =

[0

0

], T (−1)x1+T ′(−1)x0 =

[−6 6

−12 12

]x1+

[5 −5

10 −10

]x0 =

[0

0

]it follows that x = [1; 1] is an eigenvector corresponding to λ, and a generalized eigenvector

as well. Then for X =

[1 1

1 1

]and J =

[−1 1

0 −1

]the pair (X, J) is a regular pair of order 2,

namely the Jordan pair corresponding to λ = −1.

Page 7: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-7

115.3 Variational Characterization of Eigenvalues

Variational characterizations of eigenvalues are very powerful tools when studying self-adjoint linear operators on a Hilbert space. Many things can be easily proved using thesecharacterizations; for example, bounds for eigenvalues, comparison theorems, interlacingresults and monotonicity of eigenvalues, to name just a few.

This section presents similar results for nonlinear eigenvalue problems. A minmax charac-terization was first proved in [Duf55] for overdamped quadratic eigenproblems, generalizedin [Rog64] to general overdamped, and in [VW82] to non-overdamped problems. Althoughthe characterizations also hold for infinite dimensional problems [Had68, VW82] the pre-sentation here is restricted to the finite dimensional case.

We assume in this whole section that J ⊂ R is an open interval (which may be un-bounded), and we consider a family of Hermitian matrices T : J → Cn×n dependingcontinuously on the parameter λ ∈ J , such that the following two conditions are satisfied

(i) For every x ∈ Cn, x 6= 0 the real equation

f(λ; x) := x∗T (λ)x = 0 (115.4)

has at most one solution λ =: p(x) in J . Then (115.4) implicitly defines a (nonlinear)functional on some domain D(p).

(ii)

(λ− p(x))f(λ; x) > 0 for every x ∈ D(p) and every λ ∈ J, λ 6= p(x). (115.5)

Definitions:

The functional p : D(p)→ J is called the Rayleigh functional.

If D(p) = Cn \ 0, then the problem T (λ)x = 0 is called overdamped.

An eigenvalue λ ∈ J of T (·) is a jth eigenvalue if µ = 0 is the j largest eigenvalue of the matrix

T (λ).

Facts:

In this subsection we denote by Sj the set of all j dimensional subspaces of Cn. The followingfacts for which no specific reference is given can be found in [Had68, VW82, Vos09].

1. D(p) is an open set in Cn.2. p(αx) = p(x) for every x ∈ D(p) and every α ∈ C \ 0.3. If T (·) is differentiable in a neighborhood of an eigenvalue λ and x∗T ′(λ)x 6= 0 for a

corresponding eigenvector x, then x is a stationary point of p, i.e. |p(x + h)− p(x)| =o(‖h‖). In the real case T : J → Rn×n, J ⊂ R, we have ∇p(x) = 0.

4. For every j ∈ 1, . . . , n there is at most one jth eigenvalue of T (·).5. T (·) has at most n eigenvalues in J .6. [Rog64] If T (·) is overdamped, then T (·) has exactly n eigenvalues in J .7. If

λj := infV ∈Sj , V ∩D(p)6=∅

supx∈V ∩D(p)

p(x) ∈ J,

then λj is a jth eigenvalue of T (·).8. (minmax characterization) If λj ∈ J is a jth eigenvalue of T (·), then

λj := minV ∈Sj , V ∩D(p)6=∅

maxx∈V ∩D(p)

p(x) ∈ J.

The minimum is attained for an invariant subspace of the matrix T (λj) correspondingto its j largest eigenvalues. The maximum is attained for some x ∈ ker T (λj).

Page 8: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-8 Handbook of Linear Algebra

9. Let λ1 := infx∈D(p) p(x) ∈ J and λj ∈ J for some j ∈ 1, . . . , n. Then for every k ∈1, . . . , j there exists Uk ∈ Sk with Uk ⊂ D(p) ∪ 0 and λk := maxx∈Uk, x 6=0 p(x).Hence,

λk := minV ∈Sk, V⊂D(p)∪0

maxx∈V, x 6=0

p(x) ∈ J for k = 1, . . . , j.

10. [Vos03] (maxmin characterization) Assume that there exists a jth eigenvalue λj ∈ J .Then

λj = maxV ∈Sj−1, V ⊥∩D(p)6=∅

infx∈V ⊥∩D(p)

p(x).

The maximum is attained for every invariant subspace of T (λj) corresponding to itsj − 1 largest eigenvalues.

11. Let λj ∈ J be a jth eigenvalue of T (·) and λ ∈ J . Then

λ

<=>

λj ⇐⇒ maxV ∈Sj

minx∈V,x6=0

x∗T (λ)x

x∗x

<=>

0.

12. (orthogonality) [Had68] Let T (·) be differentiable in J . Then eigenvectors can bechosen orthogonal with respect to the generalized scalar product

[x,y] :=

y∗T (p(x))− T (p(y))

p(x)− p(y)x, if p(x) 6= p(y)

y∗T ′(p(x))x, if p(x) = p(y)

which is symmetric and homogeneous, but in general is not bilinear.If T (·) is differentiable und condition (ii) strengthened to x∗T ′(p(x))x > 0 for everyx ∈ D, then [·, ·] is definite, i.e. [x,x] > 0 for every x ∈ D(p).

13. (Rayleigh’s principle) Assume that J contains λ1, . . . , λj−1 where λk is a kth eigen-value of T (·), and let xk, k = 1, . . . , j − 1 be a corresponding eigenvectors. If

λj := infp(x) : x ∈ D(p), [x,xk] = 0, k = 1, . . . , j − 1 ∈ J,

then λj is a jth eigenvalue of T (·).14. (Sylvester’s law; overdamped case) Assume that T (·) is overdamped. For σ ∈ J let

(π, ν, δ) be the inertia of T (σ). Then T (·) has π eigenvalues that are smaller than σ,ν eigenvalues that exceed σ, and if δ 6= 0, then σ is an eigenvalue of multiplicity δ.

15. (Sylvester’s law; extreme eigenvalues) Assume that T (µ) is negative definite for someµ ∈ J , and for σ > µ let (π, ν, δ) be the inertia of T (σ). Then T (·) has exactly πeigenvalues in J that are smaller than σ.

16. (Sylvester’s law; general case) Let µ ∈ J , and assume that for every r dimensionalsubspace V with V ∩D(p) 6= ∅ there exists x ∈ V ∩D(p) with p(x) > µ. For σ ∈ J ,σ > µ let (π, ν, δ) be the inertia of T (σ). Then for j = r, . . . , π there exists a jtheigenvalue λj of T (·) in [µ, σ).

Examples:

1. [Duf55] The quadratic pencil Q(λ) := λ2A+ λB +C with positive definite A,B,C ∈ Cn×nis overdamped if and only if d(x) := (x∗Bx)2− 4(x∗Ax)(x∗Cx) > 0 for every x ∈ Cn \ 0.For x 6= 0 the quadratic equation x∗Q(λ)x = 0 has two real solutions p±(x) = (−x∗Bx ±√d(x))/(2x∗Ax), and γ− := supx6=0 p−(x) < γ+ := infx6=0 p+(x).

Q(·) has n eigenvalues in (−∞, γ+) which are minmax values of p− and n eigenvalues in

(γ−, 0) which are minmax values of p+.

Page 9: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-9

2. Assume that Q(·) as in the last example is not necessarily overdamped, and let in(Q(σ)) =

(π, ν, δ) denote the inertia of Q(σ). If σ < γ+ := infx6=0p+(x) : p+(x) ∈ R, then Q(·) has

exactly ν eigenvalues in (−∞, σ), and if σ > γ− := supx6=0p−(x) : p−(x) ∈ R, then Q(·)has ν eigenvalues in (σ, 0).

If µmin and µmax are the minimal and maximal eigenvalues of Cx = µAx, then −√µmax ≤γ+ and −√µmin ≥ γ−.

If κmin and κmax are the minimal and maximal eigenvalues of Cx = κBx, respectively, then

−2κmax ≤ γ+ and −2κmin ≥ γ−.

115.4 General Rayleigh Functionals

Whereas Section 115.3 presupposes the existence and uniqueness of a Rayleigh functionalfor problems allowing for a variational characterization, this section extends its definitionto more general eigenproblems. It collects results on the existence and approximation prop-erties of a Rayleigh functional in a vicinity of eigenvectors corresponding to algebraicallysimple eigenvalues. The material in this section is mostly taken from [Sch08, SS10].

Definitions:

Let T : D → Cn×n be a matrix valued mapping, which is analytic, or which is differentiable with

Lipschitz continuous derivative in the real case.

Let (λ, x) be an eigenpair of T (·), and define neighborhoods B(λ, τ) := λ ∈ C : |λ − λ| < τand Kε(x) := x ∈ Cn : ∠(Spanx, Spanx) ≤ ε of λ and x, respectively.

p : Kε → B(λ, τ) is a (one-sided) Rayleigh functional if the following conditions hold:

(i) p(αx) = p(x) for every α ∈ C, α 6= 0

(ii) x∗T (p(x))x = 0 for every x ∈ Kε(x)

(iii) x∗T ′(p(x))x 6= 0 for every x ∈ Kε(x).

Let (y, λ, x) be an eigentriplet of T (·).p : Kε(x)×Kε(y)→ B(λ, τ) is a two-sided Rayleigh functional if the following conditions

hold for every x ∈ Kε(x) and y ∈ Kε(y):

(i) p(αx, βy) = p(x,y) for every α, β ∈ C \ 0,(ii) y∗T (p(x,y))x = 0,

(iii) y∗T ′(p(x,y))x 6= 0.

The generalized Rayleigh quotient (which was introduced in [Lan61] only for polynomial

eigenvalue problems) is defined as

pL : Kε(x)×B(λ, τ)×Kε(y)→ B(λ, τ), pL(y, λ,x) := λ− y∗T (λ)x

y∗T ′(λ)x.

Facts:

The following facts can be found in [Sch08, SS10].

1. Let (y, λ, x) be an eigentriplet of T (·) with ‖x‖ = ‖y‖ = 1, and assume that

y∗T ′(λ)x 6= 0. Then there exist ε > 0 and τ > 0 such that the two-sided Rayleighfunctional is defined in Kε(x)×Kε(y), and

|p(x,y)− λ| ≤ 8

3

‖T (λ)‖|y∗T ′(λ)x|

tan ξ tan η,

where ξ := ∠(Spanx,Spanx) and η := ∠(Spany,Spany).

Page 10: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-10 Handbook of Linear Algebra

2. Under the conditions of Fact 1 let ξ < π/3 and η < π/3. Then

|p(x,y)− λ| ≤ 32

3

‖T (λ)‖|y∗T ′(λ)x|

‖x− x‖‖y − y‖.

3. Under the conditions of Fact 1 the two-sided Rayleigh functional is stationary at(x, y), i.e. |p(x + s, y + t)− λ| = O((‖s‖+ ‖t‖)2).

4. Let (λ, x) be an eigenpair of T (·) with ‖x‖ = 1 and x∗T ′(λ)x 6= 0, and suppose that

T (λ) = T (λ)∗. Then there exist ε > 0 and τ > 0 such that the one-sided Rayleighfunctional p(·) is defined in Kε(x), and

|p(x)− λ| ≤ 8

3

‖T (λ)‖|x∗T ′(λ)x|

tan2 ξ,

where ξ := ∠(Spanx,Spanx).5. Let (λ, x) be an eigenpair of T (·) with ‖x‖ = 1 and x∗T ′(λ)x 6= 0. Then there existε > 0 and τ > 0 such that the one-sided Rayleigh functional p(·) is defined in Kε(x),and

|p(x)− λ| ≤ 10

3

‖T (λ)‖|x∗T ′(λ)x|

tan ξ,

where ξ := ∠(Spanx,Spanx).6. Let x be a right eigenvector of T (·) corresponding to λ, and x∗T ′(λ)x 6= 0. The

one-sided Rayleigh functional p is only stationary at x if x is also a left eigenvector.7. The generalized Rayleigh quotient pL is obtained when applying Newton’s method

to the equation defining the two-sided Rayleigh functional for fixed x and y.8. [Lan61] Let (y, λ, x) be an eigentriplet of T (·) with y∗T ′(λ)x 6= 0. Then the general-

ized Rayleigh quotient pL is stationary at (y, λ, x).9. Under the conditions of Fact 1 the generalized Rayleigh quotient pL is defined for allλ ∈ B(λ, τ) and (x,y) ∈ Kε(x)×Kε(y), and

|pL(y, λ,x)− λ| ≤ 4‖T (λ)‖|y∗T ′(λ)x|

tan ξ tan η +2L

|y∗T ′(λ)x||λ− λ|2

cos ξ cos η,

where L denotes the Lipschitz constant of T ′(·).

115.5 Methods for dense eigenvalue problems

The size of the eigenvalue problems that can be treated with the numerical methods con-sidered in this section is limited to a few thousands depending on the available storagecapacities. Moreover, they require several factorizations of varying matrices to approximateone eigenvalue, and therefore, they are not appropriate for large and sparse problems. How-ever, they are needed to solve the projected eigenproblem in most of the iterative projectionmethods for sparse problems.

For general nonlinear eigenvalue problems, the classical approach is to formulate theeigenvalue problem as a system of nonlinear equations and to use variations of Newton’smethod or the inverse iteration method. Thus, these methods are local and therefore notguaranteed to converge, but as for linear eigenvalue problems their basin of convergence canbe enlarged using homotopy methods [DP01] or trust region strategies [YMW07].

Page 11: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-11

Facts:

1. [Kub70] Let T (λ)P (λ) = Q(λ)R(λ) be the QR factorization of T (λ), where P (λ) is apermutation matrix which is chosen such that the diagonal elements rjj(λ) of R(λ)are decreasing in magnitude, i.e. |r11(λ)| ≥ |r22(λ)| ≥ · · · ≥ |rnn(λ)|. Then λ is aneigenvalue of T (·) if and only if rnn(λ) = 0.Applying Newton’s method to this equation, one obtains the iteration

λk+1 = λk −1

eTnQ(λk)∗T ′(λk)P (λk)R(λk)−1en

for approximations to an eigenvalue of problem T (λ)x = 0, where en denotes the nthunit vector.Approximations to left and right eigenvectors can be obtained from yk = Q(λk)enand xk = P (λk)R(λk)−1en.However, this relatively simple idea is not efficient, since it computes eigenvalues oneat a time and needs several O(n3) factorizations per eigenvalue. It is, however, usefulin the context of iterative refinement of computed eigenvalues and eigenvectors.

2. [AR68] Applying Newton’s method to the nonlinear system

F (x, λ) :=

(T (λ)x

v∗x− 1

)= 0

where v ∈ Cn is suitably chosen one obtains the inverse iteration given in Algorithm1. Being a variant of Newton’s method it converges locally and quadratically forsimple eigenpairs.

Algorithm 1: Inverse iteration

Require: Initial pair (λ0,x0) and normalization vector v with v∗x0 = 11: for k = 0, 1, 2, . . . until convergence do2: solve T (λk)uk+1 = T ′(λk)xk for uk+1

3: λk+1 ← λk − (v∗xk)/(v∗uk+1)4: normalize xk+1 ← uk+1/v

∗uk+1

5: end for

3. If T (·) is Hermitian such that the general conditions of Section 115.3 are satisfied, oneobtains the Rayleigh functional iteration if the update of λk+1 in step 3 of Algorithm1 is replaced with λk+1 ← p(uk+1). This method converges locally and cubically[Rot89] for simple eigenpairs.

4. [Lan61, Ruh73] Replacing the vector v in the normalization step of inverse iterationfor a general matrix function T (·) with vk = T (λk)∗yk, where yk is an approximationto a left eigenvector, the update for λ becomes

λk+1 ← λk −y∗kT (λk)xky∗kT

′(λk)xk,

which is the generalized Rayleigh quotient pL.5. [Sch08] For general T (·) and simple eigentriplets (y, λ, x) cubic convergence is also

achieved by the two-sided Rayleigh functional iteration in Algorithm 2. If the linearsystem in step 2 is solved by factorizing T (λk), then taking the conjugate transposethe factorization can be reused for the system in step 3. So, the cost of one iterationstep is similar to the one of the one-sided Rayleigh functional iteration.

Page 12: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-12 Handbook of Linear Algebra

Algorithm 2: Two-sided Rayleigh functional iteration

Require: Initial triplet (y0, λ0,x0) where x∗0x0 = y∗0y0 = 11: for k = 0, 1, 2, . . . until convergence do2: solve T (λk)uk+1 = T ′(λk)xk for uk+1; xk+1 ← uk+1/‖uk+1‖3: solve T (λk)∗vk+1 = T ′(λk)∗yk for vk+1; yk+1 ← vk+1/‖vk+1‖4: solve y∗k+1T (λk+1)xk+1 = 0 for λk+1

5: end for

6. [Neu85] The cost for solving a linear system in each iteration step with a varyingmatrix is avoided in the residual inverse iteration in Algorithm 3 where the matrixT (λ0) is fixed during the whole iteration (or at least for several steps).

Algorithm 3: Residual inverse iteration

Require: Initial pair (λ0,x0) and normalization vector w with w∗x0 = 11: for k = 0, 1, 2, . . . until convergence do2: solve w∗T (λ0)−1T (λk+1)xk = 0 for λk+1

3: solve T (λ0)uk = T (λk+1)xk for uk4: set vk+1 ← xk − uk and normalize xk+1 ← vk+1/w

∗vk+1

5: end for

If T (·) is Hermitian and λ ∈ R, then the convergence can be improved by determiningλk+1 in step 1 via the Rayleigh functional, i.e. solving x∗kT (λk+1)xk = 0 for λk+1.

If T (·) is twice continuously differentiable and λ is algebraically simple, then the

residual inverse iteration converges for all (λ0,x0) sufficiently close to (λ, x), and

‖xk+1 − x‖‖xk − x‖

= O(|λ0 − λ|) and |λk+1 − λ| = O(‖xk − x‖t)

where t = 2 in the Hermitian case if λk+1 is updated via the Rayleigh functional, andt = 1 in the general case.

7. [Ruh73] The first order approximation T (λ + σ)x = T (λ)x + σT ′(λ)x + o(|σ|) sug-gests the method of successive linear problems in Algorithm 4, which also convergesquadratically for simple eigenvalues.

Algorithm 4: Method of successive linear problems

Require: Initial approximation λ0

1: for k = 0, 1, . . . until convergence do2: solve the linear eigenproblem T (λk)u = θT ′(λk)u3: choose an eigenvalue θ smallest in modulus4: λk+1 = λk − θ5: end for

If λ is a semi-simple eigenvalue, xk converges to a right eigenvector x. If y is a lefteigenvector corresponding to λ such that y∗T ′(λ)x 6= 0 (which is guaranteed for asimple eigenvalue), then the convergence factor is given by (cf. [Jar12])

c := limk→∞

λk+1 − λ(λk − λ)2

=1

2

y∗T ′′(λ)x

y∗T ′(λ)x.

Page 13: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-13

8. [Wer70] If the nonlinear eigenvalue problem allows for a variational characterization ofits eigenvalues, then the safeguarded iteration, which aims at a particular eigenvalue,is a natural choice.

Algorithm 5: Safeguarded iteration for determining an mth eigenvalue

Require: Approximation λ0 to an mth eigenvalue1: for k = 0, 1, . . . until convergence do2: determine an eigenvector xk corresponding to the m-largest eigenvalue of T (λk)3: solve x∗kT (λk+1)xk = 0 for λk+1

4: end for

Under the conditions of Section 115.3, the safeguarded iteration has the followingproperties [NV10]:

(i) If λ1 := infx∈D(p) p(x) ∈ J and x0 ∈ D, then the safeguarded iteration with

m = 1 converges globally to λ1.

(ii) If T (·) is continuously differentiable and λm is a simple eigenvalue, then the

safeguarded iteration converges locally and quadratically to λm.

(iii) Let T (·) be twice continuously differentiable and T ′(λm) be positive definite.If xk in step 3 is chosen to be an eigenvector corresponding to the m largesteigenvalue of the generalized eigenvalue problem T (λk)x = µT ′(λk)x, then theconvergence is even cubic.

9. [SX11] For higher dimensions n it is too costly to solve the occurring linear systemsexactly. Szyld and Xue [SX11] studied inexact versions of inverse iteration and residualinverse iteration and proved that the same order of convergence can be achieved as forthe exact methods if the respective linear systems are solved sufficiently accurately.

115.6 Iterative projection methods

For sparse linear eigenvalue problems Ax = λx iterative projection methods like the Lanc-zos, Arnoldi, rational Krylov or Jacobi–Davidson method are very efficient. Here the di-mension of the eigenproblem is reduced by projecting it to a subspace of much smallerdimension, and the reduced problem is solved by a fast technique for dense problems. Thesubspaces are expanded in the course of the algorithm in an iterative way with the aim thatsome of the eigenvalues of the reduced matrix become good approximations to some of thewanted eigenvalues of the given large matrix.

Two types of iterative projection methods are in use: methods which expand the subspacesindependently of the eigenpair of the projected problem and which take advantage of anormal form of A like the Arnoldi, Lanczos, and rational Krylov method, and methodswhich aim at a particular eigenpair and choose the expansion such that it has a highapproximation potential for a wanted eigenvector like the Jacobi–Davidson method.

For general nonlinear eigenproblems a normal form does not exist. Therefore, generaliza-tions of iterative projection methods to general nonlinear eigenproblems always have to beof the second type. There are essentially two types of these methods, the Jacobi–Davidsonmethod (and its two–sided version) which is based on inverse iteration and the nonlinearArnoldi method which is based on residual inverse iteration.

Jacobi–Davidson methodAssume that we are given a search space V and a matrix V with orthonormal columns

Page 14: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-14 Handbook of Linear Algebra

containing a basis of V. Let (y, θ) be an eigenpair of the projected problem V ∗T (λ)V y = 0and x = V y be the corresponding Ritz vector. A direction with high approximation potentialis given by inverse iteration v = T (θ)−1T ′(θ)x, however replacing v with an inexact solutionof the linear system T (θ)v = T ′(θ)x will spoil the favorable approximation properties ofinverse iteration.

Actually, we are not interested in the direction v but in an expansion of V which containsv, and for every α 6= 0 the vector t = x + αv is as qualified as v. It was shown in [Vos07]that the most robust expansion of this type is obtained if x and t := x+αv are orthogonal,and it is easily seen that this t solves the so called correction equation(

I − T ′(θ)xx∗

x∗T ′(θ)x

)T (θ)

(I − xx∗

x∗x

)t = T (θ)x, t ⊥ x.

The resulting iterative projection method is called Jacobi–Davidson method, a templateof which is given in Algorithm 6.

Algorithm 6: Nonlinear Jacobi–Davidson method

Require: Initial basis V , V ∗V = I; m = 11: determine preconditioner K ≈ T (σ)−1, σ close to first wanted eigenvalue2: while m ≤ number of wanted eigenvalues do3: compute an approximation θ to the mth wanted eigenvalue and corresponding

eigenvector y of the projected problem TV (θ)y := V ∗T (θ)V y = 04: determine the Ritz vector u = V y and the residual r = T (θ)u5: if ‖r‖/‖u‖ < ε then6: accept approximate eigenpair (λm,xm) := (θ,u); increase m← m+ 1;7: reduce search space V if indicated8: determine new preconditioner K ≈ T (λm)−1 if necessary9: choose approximation (θ,u) to next eigenpair

10: compute residual r = T (θ)u;11: end if12: Find approximate solution of correction equation

(I − T ′(θ)uu∗

u∗T ′(θ)u)T (θ)(I − uu∗

u∗u)t = −r, t ⊥ u (115.6)

(by preconditioned Krylov solver, e.g.)13: orthogonalize t = t− V V ∗t, v = t/‖t‖, and expand subspace V = [V,v]14: update projected problem15: end while

Facts:

1. The Jacobi–Davidson method was introduced for polynomial eigenproblem in [SBF96]and studied for general nonlinear eigenvalue problems in [BV04, Vos07a].

2. As in the linear case the correction equation (115.6) does not have to be solvedexactly to maintain fast convergence, but usually a few steps of a Krylov subspacesolver with an appropriate preconditioner suffice to obtain a good expansion directionof the search space.

3. In the correction equation (115.6) the operator T (θ) is restricted to map the subspacex⊥ into itself. Hence, if K−1 ≈ T (θ) is a preconditioner of T (θ), then a preconditioner

Page 15: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-15

for an iterative solver of (115.6) should be modified correspondingly to

K := (I − T ′(θ)uu∗

u∗T ′(θ)u)K−1(I − uu∗

u∗u).

With left-preconditioning equation (115.6) becomes

K−1T (θ)t = −K−1r, t ⊥ u where T (θ) := (I − T ′(θ)uu∗

u∗T ′(θ)u)T (θ)(I − uu∗

u∗u).

Taking into account the projectors in the preconditioner, i.e. using K instead of Kin a preconditioned Krylov solver, raises the cost only slightly. In every step one hasto solve one linear system Kw = y, and to initialize the solver requires only oneadditional solve.

4. In step 1 of Algorithm 6 any preinformation such as a small number of known ap-proximate eigenvectors of problem (115.1) corresponding to eigenvalues close to σ orof eigenvectors of a contiguous problem can and should be used.If no information on eigenvectors is at hand, and if one is interested in eigenvaluesclose to the parameter σ ∈ D, one can choose an initial vector at random, executea few Arnoldi steps for the linear eigenproblem T (σ)u = θu or T (σ)u = θT ′(σ)u,and choose the eigenvector corresponding to the smallest eigenvalue in modulus or asmall number of Schur vectors as initial basis of the search space.Starting with a random vector without this preprocessing usually will yield a valueλm in step 4 which is far away from σ and will avert convergence.

5. As the subspaces expand in the course of the algorithm the increasing storage orthe computational cost for solving the projected eigenvalue problems may make itnecessary to restart the algorithm and purge some of the basis vectors. Since a restartdestroys information on the eigenvectors and particularly on the one the method isjust aiming at, the method is restarted only if an eigenvector has just converged.An obvious way to restart is to determine a Ritz pair (µ,u) from the projection to thecurrent search space span(V ) approximating an eigenpair wanted next, and to restartthe Jacobi–Davidson method with this single vector u. However, this may discardtoo much valuable information contained in span(V ), and may slowdown the speedof convergence too much. Therefore, thick restarts with subspaces spanned by theRitz vector u and a small number of eigenvector approximations obtained in previoussteps which correspond to eigenvalues closest to µ are preferable.

6. A crucial point in iterative methods for general nonlinear eigenvalue problems whenapproximating more than one eigenvalue is to inhibit the method from convergingto the same eigenvalue repeatedly. For linear eigenvalue problems locking of alreadyconverged eigenvectors can be achieved using an incomplete Schur factorization. Fornonlinear problems allowing for a variational characterization of its eigenvalues onecan determine the eigenpairs one after another solving the projected problem bysafeguarded iteration [BV04]. For general nonlinear eigenproblems a locking procedurebased on invariant pairs was introduced in [Eff12] (cf. Subsection 7).

7. Often the matrix function T (·) is given in the following form T (λ) :=∑mj=1 fj(λ)Aj

where fj : Ω→ C are continuous functions and Aj ∈ Cn×n are fixed matrices. Thenthe projected problem can be updated easily appending one row and column to eachof the projected matrices V ∗AjV .

Two-sided Jacobi–Davidson methodIn Algorithm 6 approximations to an eigenvalue are obtained in step 3 from a Galerkinprojection of T (λ)x = 0 to the search space Span (V ) for right eigenvectors. Computing a

Page 16: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-16 Handbook of Linear Algebra

left search space also with a correction equation for left eigenvectors and applying a Petrov-Galerkin projection one arrives at the Two-sided Jacobi-Davidson method in Algorithm 7(where only the computation of one eigentriplet is considered):

Algorithm 7: Two-sided Jacobi–Davidson method

Require: Initial bases U with U∗U = I and V with V ∗V = I1: while not converged do2: solve V ∗T (θ)Uc = 0 and U∗T (θ)∗V d = 0 for (θ, c,d)3: determine Ritz vectors u = Uc and v = V d and residuals ru = T (θ)u, rv = T (θ)∗v4: if min(‖ru‖/‖u‖, ‖rv‖/‖v‖) < ε then5: accept approximate eigentriplet (v, θ,u); STOP6: end if7: Solve (approximately) correction equations

(I − T ′(θ)uv∗

v∗T ′(θ)u)T (θ)(I − uu∗

u∗u)s = −ru, s ⊥ u,

(I − T ′(θ)∗vu∗

u∗T ′(θ)∗v)T (θ)∗(I − vv∗

v∗v)t = −rv, t ⊥ v

8: orthogonalize s = s− UU∗s, s = s/‖s‖, and expand left search space U = [U, s]9: orthogonalize t = t− V V ∗t, t = t/‖t‖, and expand right search space V = [V, t]

10: end while

Facts:

8. [Sch08] θ as computed in step 2 is the value of the two-sided Rayleigh functional at(u,v), and one therefore may expect local cubic convergence for simple eigenvalues.

9. [HS03] The correction equation in step 7 of Algorithm 7 can be replaced with

(I − T ′(θ)uv∗

v∗T ′(θ)u)T (θ)(I − T ′(θ)uv∗

v∗T ′(θ)u)s = −ru, s ⊥ u,

(I − T ′(θ)∗vu∗

u∗T ′(θ)∗v)T (θ)∗(I − T ′(θ)∗vu∗

u∗T ′(θ)∗v)t = −rv, t ⊥ v.

This variant was suggested in [HS03] for linear eigenvalue problems, and its general-ization to the nonlinear problem is obvious. Since again θ is the value of the two-sidedRayleigh functional the convergence should also be cubic.

10. [SS06] Replacing the correction equations with

(I − vv∗)T (θ)(I − uu∗)s = −ru, s ⊥ u,

(I − uu∗)T (θ)∗(I − vv∗)t = −rv, t ⊥ v

one obtains the primal-dual Jacobi-Davidson method which was shown to bequadratically convergent.

Nonlinear Arnoldi methodExpanding the current search space V by the direction v = x− T−1(σ)T (θ)x as suggestedby residual inverse iteration generates similar robustness problems as for inverse iteration.If v is close to the desired eigenvector, then an inexact evaluation of v spoils the favorableapproximation properties of residual inverse iteration.

Page 17: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-17

Similarly as in the Jacobi–Davidson method one could replace v by z := x + αv whereα is chosen that x∗z = 0, and one could determine an approximation to z solving acorrection equation. However, since the new search direction is orthonormalized againstthe previous search space V and since x is contained in V we may choose the new di-rection v = T (σ)−1T (θ)x as well. This direction satisfies the orthogonality condition

x∗v = 0 at least in the limit as θ approaches a simple eigenvalue λ (cf. [Vos07]), i.e.limθ→λ x∗T (σ)−1T (θ)x = 0.

A template for the preconditioned nonlinear Arnoldi method with restarts and varyingpreconditioner is just like Algorithm 6. Only step 12 has to be replaced with t = Kr.

Facts:

11. The general remarks about the initial approximation to the eigenvector, restarts andlocking following the Jacobi–Davidson method apply to the nonlinear Arnoldi methodalso.

12. Since the residual inverse iteration with fixed pole σ converges (at least) linearly, andthe contraction rate satisfies O(|σ−λm|), it is reasonable to update the preconditionerif the convergence (measured by the quotient of the last two residual norms beforeconvergence) has become too slow.

13. The nonlinear Arnoldi method was introduced for quadratic eigenvalue problems in[Mee01] and for general nonlinear eigenvalue problems in [Vos04].

14. [LBL10] studies a variant that avoids complex arithmetic augmenting the search spaceby two vectors, the real and imaginary part of the expansion t = Kr.

115.7 Methods using invariant pairs

One of the most important problems when determining more than one eigenpair of a nonlin-ear eigenvalue problem is to prevent the method from determining the same pair repeatedly.Jordan chains are conceptually elegant but unstable under perturbations. More robust con-cepts for computing several eigenvalues along with the corresponding (generalized) eigen-vectors were introduced only recently and are based on invariant pairs [Kre09, BT09].

It is convenient to consider the nonlinear eigenvalue problem in the following form:

T (λ)x :=

m∑j=1

fj(λ)Ajx = 0 (115.7)

where fj : Ω→ C are analytic functions and Aj ∈ Cn×n are fixed matrices.

Definitions:

Let the eigenvalues of S ∈ Ck×k be contained in Ω and let X ∈ Cn×k. Then (X,S) is called

invariant pair of the nonlinear eigenvalue problem (115.7) if

m∑j=1

AjXfj(S) = 0.

Page 18: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

115-18 Handbook of Linear Algebra

A pair (X,S) ∈ Cn×k ×Ck×k is minimal if there is ` ∈ N such that the matrix

V`(X,S) :=

X

XS...

XS`−1

has rank k.

The smallest such ` is called the minimality index of (X,S).

An invariant pair (X,S) is called simple if (X,S) is minimal and the algebraic multiplicities of

the eigenvalues of S are identical to the ones of the corresponding eigenvalues of T (·).

Facts:

The following facts for which no specific reference is given can be found in [Kre09].

1. Let (X,S) be a minimal invariant pair of (115.7). Then the eigenvalues of S areeigenvalues of T (·).

2. By the Cayley–Hamilton theorem the minimality index of a minimal pair can notexceed k.

3. [BK11] For a regular matrix polynomial of degree m the minimality index of a minimalinvariant pair can not exceed m.

4. [Eff12] Let p0, . . . , p`−1 be a basis for the polynomials of degree less than `. Then thepair (X,S) is minimal with minimality index at most ` if and only if

V p` (X,S) =

Xp0(S)...

Xp`−1(S)

has full column rank.

5. [BK11] If V`(X,S) has rank k < k, then there is a minimal pair (X, S) ∈ Cn×k×Ck×ksuch that Span(X) = Span(X) and Span(V`(X, S)) = Span(V`(X,S)).

6. If (X,S) is a minimal invariant pair, then (XZ,Z−1SZ) is also a minimal invariantpair for every invertible matrix Z ∈ Ck×k.

7. Let (X,S) be a minimal invariant pair, and let pj ∈ Πk be the Hermite interpolatingpolynomials of fj at the spectrum of S of maximum degree k. Then (X,S) is aminimal invariant pair of P (λ)x :=

∑mj=1 pj(λ)Ajx = 0.

8. Let (λj ,xj), j = 1, . . . , k be eigenpairs of T (·) with λi 6= λj for i 6= j. Then theinvariant pair (X,S) := ([x1, . . . ,xk],diag(λ1, . . . , λk)) is minimal.

9. Consider the nonlinear matrix operator

T :

Cn×k × Ck×kΩ → Cn×k

(X,S) 7→∑mj=1AjXfj(S)

(115.8)

where Ck×kΩ denotes the set of k×k matrices with eigenvalues in Ω. Then an invariantpair (X,S) satisfies T(X,S) = 0, but this relation is not sufficient to characterize(X,S).To define a scaling condition, choose ` such that the matrix V`(X,S) has rank k, anddefine the partition

W =

W0

W1

...W`−1

:= V`(X,S) (V`(X,S)∗V`(X,S))−1 ∈ Cnk×k

Page 19: Nonlinear Eigenvalue Problems - TUHH · Nonlinear eigenvalue problems T( )x = 0 arise in a variety of applications in science and engineering, such as the dynamic analysis of structures,

Nonlinear Eigenvalue Problem 115-19

with Wj ∈ Cn×k. Then V(X,S) = 0 for the operator

V : Cn×k × Ck×kΩ → Cn×k, V(X,S) := W ∗V`(X,S)− Ik.

If (X,S) is a minimal invariant pair for the nonlinear eigenvalue problem T (·)x = 0,then (X,S) is simple if and only if the linear matrix operator

L : Cn×k × Ck×k → Cn×k × Ck×k, (∆X,∆S) 7→ (DT(∆X,∆S),DV(∆X,∆S))

is invertible, where DT and DV denotes the Frechet derivative of T and V, respectively.10. [Kre09] The last Fact motivates to apply Newton’s method to the system T(X,S) = 0,

V(X,S) = 0 which can be written as

(Xp+1, Sp+1) = (Xp, Sp)− L−1(T(Xp, Sp),V(Xp, Sp))

where L = (DT,DV) is the Jacobian matrix of T(X,S) = 0,V(X,S) = 0.

DT(∆X,∆S) = T(∆X,S) +

m∑j=1

AjX[Dfj(S)](∆S),

DV(∆X,∆S) = W ∗0 ∆X +

m∑j=1

W ∗j (∆XSj +X[DSj ](∆S)).

Algorithm 8: Newton’s method for computing invariant pairs

Require: Initial pair (X0, S0) ∈ Cn×k × Ck×k such that V`(X0, S0)∗V`(X0, S0) = Ik1: p← 0, W ← V`(X0, S0)2: repeat3: Res← T(Xp, Sp)4: Solve linear matrix equation L(∆X,∆S) = (Res, O)5: Xp+1 ← Xp −∆X, Sp+1 ← Sp −∆S

6: Compute compact QR decomposition V`(Xp+1, Sp+1) = WR

7: Xp+1 ← Xp+1R−1, Sp+1 ← RSp+1R

−1

8: until convergence

11. [Bey12, BEK11]

T(X,S) =

∫Γ

T (z)X(zI − S)−1 dz

where Γ is a contour (i.e. a simply closed curve) in Ω containing the spectrum of Sin its interior.

12. [BEK11]

$$\mathrm{D}\mathbb{T}(X,S)(\Delta X,\Delta S) = \frac{1}{2\pi i}\int_{\Gamma} T(z)\big(\Delta X + X(zI - S)^{-1}\Delta S\big)(zI - S)^{-1}\, dz.$$

13. [Eff12] Let (X,S) be a minimal (index ℓ) invariant pair of T(·). If ([Y; V], M) is a minimal invariant pair of the augmented analytic matrix function T̃ : Ω → C^{(n+k)×(n+k)} defined by

$$\tilde T(\mu)\begin{bmatrix} y \\ v \end{bmatrix} = \begin{bmatrix} \mathbb{T}\Big([X,\, y],\, \begin{bmatrix} S & v \\ 0 & \mu \end{bmatrix}\Big) \\[4pt] \big[V^p_{\ell+1}(X,S)\big]^{*}\, V^p_{\ell+1}\Big([X,\, y],\, \begin{bmatrix} S & v \\ 0 & \mu \end{bmatrix}\Big) \end{bmatrix} e_{k+1}$$


with 𝕋 as in Fact 11, V^p_{ℓ+1} analogous to Fact 4, and e_{k+1} = (0, …, 0, 1)^T ∈ R^{k+1}, then $\big([X, Y], \begin{bmatrix} S & V \\ 0 & M \end{bmatrix}\big)$ is a minimal invariant pair of T(·). Conversely, for any minimal invariant pair $\big([X, Y], \begin{bmatrix} S & V \\ 0 & M \end{bmatrix}\big)$ of T(·) there exists a unique F such that $\Big(\begin{bmatrix} Y - XF \\ V - (SF - FM) \end{bmatrix}, M\Big)$ is a minimal invariant pair of T̃(·).

14. The previous fact shows that working with T̃(·) deflates the minimal invariant pair (X,S) from T(·).

15. [Eff12] Effenberger combined the deflation in Fact 13 with the Jacobi–Davidson method to determine several eigenpairs of a nonlinear eigenvalue problem one after another in a safe way.

16. [GKS93] The pair (X,S) is minimal if and only if

$$\begin{bmatrix} \lambda I - S \\ X \end{bmatrix}$$

has full column rank for every λ ∈ C (or, equivalently, for every eigenvalue λ of S).

17. [GKS93] Let λ be an eigenvalue of T(·) and X := [x_0, …, x_{k−1}] ∈ C^{n×k} with x_0 ≠ 0. Then x_0, …, x_{k−1} is a Jordan chain at λ if and only if (X, J_k(λ)) is an invariant pair of T(·), where J_k(λ) denotes a k × k Jordan block corresponding to λ.

18. [BEK11] Let λ be an eigenvalue of T(·) and consider a matrix X = [X^{(1)}, …, X^{(p)}], X^{(i)} = [x_0^{(i)}, …, x_{m_i}^{(i)}], with x_0^{(i)} ≠ 0. Then every x_0^{(i)}, …, x_{m_i}^{(i)}, i = 1, …, p, is a Jordan chain if and only if (X, J) with J := diag(J_{m_1}(λ), …, J_{m_p}(λ)) is an invariant pair of T(·). Moreover, (X, J) is minimal if and only if x_0^{(1)}, …, x_0^{(p)} are linearly independent.

19. [SX12] Suppose that (X,S) is a simple invariant pair of (115.7), λ an eigenvalue of S, and J = Z^{−1}SZ the Jordan canonical form of S. Assume that J has m Jordan blocks corresponding to λ, of sizes k_i × k_i, 1 ≤ i ≤ m. Then there are exactly m Jordan chains of T(·) corresponding to λ, of lengths k_1, …, k_m, and the geometric multiplicity of λ is m.

This fact demonstrates that the spectral structure of an eigenvalue λ of a matrix function T(·), including the algebraic, partial, and geometric multiplicities together with all Jordan chains, is completely represented in a simple invariant pair (X,S) for which λ is an eigenvalue of S.
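The following Python/NumPy sketch illustrates the Newton iteration of Fact 10 in the special case k = 1, where the invariant pair reduces to a single eigenpair (x, λ) and the normalization 𝕍(X,S) = 0 reduces to v^*x = 1 (cf. the classical approach of [AR68]). The function name and the callables T and Tprime are illustrative assumptions of this sketch, not part of [Kre09].

```python
import numpy as np

def newton_eigenpair(T, Tprime, x0, lam0, tol=1e-12, maxit=50):
    """Newton's method for the bordered system T(lam) x = 0, v^* x - 1 = 0.
    T and Tprime are callables returning T(lam) and T'(lam) as n x n arrays;
    this is the k = 1 analogue of Algorithm 8 (without QR renormalization)."""
    x, lam = x0.astype(complex), complex(lam0)
    v = x / np.vdot(x, x)                      # normalization vector: v^* x0 = 1
    for _ in range(maxit):
        Tl = T(lam)
        res = Tl @ x
        if np.linalg.norm(res) <= tol * np.linalg.norm(x):
            break
        # Jacobian of (x, lam) -> (T(lam) x, v^* x - 1): a bordered matrix
        J = np.block([[Tl, (Tprime(lam) @ x)[:, None]],
                      [v.conj()[None, :], np.zeros((1, 1))]])
        rhs = np.concatenate([res, [np.vdot(v, x) - 1.0]])
        step = np.linalg.solve(J, rhs)
        x, lam = x - step[:-1], lam - step[-1]
    return lam, x
```

For a quadratic problem λ²Mx + λCx + Kx = 0, for instance, one would pass T = lambda lam: lam**2*M + lam*C + K and Tprime = lambda lam: 2*lam*M + C.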

Examples:

1. For the quadratic eigenvalue problem (115.2) with eigenvalue λ = −1 and eigenvector x = [1; 1] the pair (X,S) := (x, λ) is a minimal invariant pair with minimality index 1, which is not simple, because the algebraic multiplicity of λ is 2 as an eigenvalue of T(λ)x = 0 and only 1 as an eigenvalue of S.

The Jordan pair (X_1, S_1) with

$$X_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad S_1 = \begin{bmatrix} -1 & 1 \\ 0 & -1 \end{bmatrix}$$

is a minimal invariant pair with minimality index 2, which is simple, and the same is true for the pair (X_2, S_2) with

$$X_2 = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}, \qquad S_2 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix},$$

and for (X_3, S_3) with X_3 := [X_1, X_2] and S_3 := diag(S_1, S_2). A numerical check of minimality via the rank test of Fact 16 is sketched below.
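The following is a minimal Python/NumPy sketch of the rank test of Fact 16, applied to the pairs of this example; the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def is_minimal(X, S, tol=1e-10):
    """Rank test of Fact 16: (X, S) is minimal iff [lam*I - S; X] has
    full column rank for every eigenvalue lam of S."""
    k = S.shape[0]
    return all(
        np.linalg.matrix_rank(np.vstack([lam * np.eye(k) - S, X]), tol) == k
        for lam in np.linalg.eigvals(S)
    )

X1 = np.array([[1.0, 1.0], [1.0, 1.0]])
S1 = np.array([[-1.0, 1.0], [0.0, -1.0]])   # Jordan pair from Example 1
print(is_minimal(X1, S1))                    # True: minimal with index 2
print(is_minimal(np.array([[1.0], [1.0]]), np.array([[-1.0]])))  # True: index 1
```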

115.8 The infinite Arnoldi method

Let T : Ω → C^{n×n} be analytic on a neighborhood Ω of the origin, and assume that λ = 0 is not an eigenvalue of T(·). To determine eigenvalues close to 0, [JMM12] use the equivalence of T(λ)x = 0 to a linear, infinite-dimensional eigenvalue problem and apply the linear Arnoldi method, which can be reformulated as an iteration involving only standard linear algebra operations on matrices and vectors of finite dimension.

Definitions:

B(λ) := T(0)^{−1}(T(0) − T(λ))/λ for λ ≠ 0 and B(0) := −T(0)^{−1}T′(0) defines a function that is analytic on Ω, and λ is an eigenvalue of T(·) if and only if λ is an eigenvalue of the problem λB(λ)x = x.
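This equivalence is immediate from the definition of B: for λ ≠ 0 and x ≠ 0,

$$\lambda B(\lambda)x = x \;\Longleftrightarrow\; \big(T(0) - T(\lambda)\big)x = T(0)x \;\Longleftrightarrow\; T(\lambda)x = 0,$$

and for λ = 0 neither problem has an eigenvalue, by the assumption that 0 is not an eigenvalue of T(·).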

Let D(B) := {φ ∈ C^∞(R, C^n) : Σ_{i=0}^∞ (1/i!) B^{(i)}(0)φ^{(i)}(0) converges}, and define

$$(B\varphi)(\theta) := \int_0^{\theta} \varphi(\hat\theta)\, d\hat\theta + C(\varphi),\qquad C(\varphi) := \sum_{i=0}^{\infty} \frac{B^{(i)}(0)}{i!}\,\varphi^{(i)}(0) = \Big(B\big(\tfrac{d}{d\theta}\big)\varphi\Big)(0). \qquad (115.9)$$

(Ψ, R) ∈ D(B)^p × C^{p×p} is an invariant pair of the operator B if (BΨ)(θ) = Ψ(θ)R.

Facts:

The following facts can be found in [JMM12].

1. Let x ∈ C^n \ {0}, λ ∈ Ω, and denote φ(θ) := xe^{λθ}. Then the following two statements are equivalent:

(i) (λ, x) is an eigenpair of T(·);

(ii) (λ, φ) is an eigenpair of the linear, infinite-dimensional eigenvalue problem λBφ = φ.

2. All eigenfunctions of B depend exponentially on θ, i.e., if λBψ = ψ, then ψ(θ) = xe^{λθ} for some x ∈ C^n.

3. The (linear) Arnoldi method for the operator B is given in Algorithm 9. Here 〈·, ·〉 denotes a scalar product on C^∞(R, C^n), and H_k = (h_{ij}) ∈ C^{k×k} is the Hessenberg matrix constructed in the algorithm.

Algorithm 9: Arnoldi method for B

Require: Initial function φ_1 with 〈φ_1, φ_1〉 = 1
1: for k = 1, 2, … until convergence do
2:  ψ ← Bφ_k
3:  for i = 1, …, k do
4:   h_{ik} ← 〈ψ, φ_i〉
5:   ψ ← ψ − h_{ik}φ_i
6:  end for
7:  h_{k+1,k} ← √〈ψ, ψ〉
8:  φ_{k+1} ← ψ/h_{k+1,k}
9: end for
10: Compute the eigenvalues μ_i of the Hessenberg matrix H_k
11: Return 1/μ_i as approximate eigenvalues of T(·)

4. Since the Arnoldi method favors extreme eigenvalues of B, the values 1/μ_i approximate eigenvalues of T(·) close to the origin.

5. If φ is a polynomial of degree k, then Bφ is a polynomial of degree k + 1. Hence, if φ_1 is a constant function, then after N steps Algorithm 9 arrives at a Krylov space K_N(B, φ_1) = Span{φ_1, …, φ_N} of vectors of polynomials of degree at most N − 1.

6. Let {q_i}_{i=0,1,…} be a sequence of polynomials such that q_i is of degree i with nonzero leading coefficient, and let q_0 ≡ 1. Let L_N ∈ R^{N×N} be an integration map corresponding to {q_i} such that

$$\begin{bmatrix} q_0(\theta) \\ \vdots \\ q_{N-1}(\theta) \end{bmatrix} = L_N \begin{bmatrix} q_1'(\theta) \\ \vdots \\ q_N'(\theta) \end{bmatrix}.$$

Let the columns of (x_0, …, x_{N−1}) =: X ∈ C^{n×N} denote the vector coefficients in the basis {q_i}, and denote a vector of polynomials φ(θ) := Σ_{i=0}^{N−1} q_i(θ)x_i. If ψ(θ) = (Bφ)(θ) =: Σ_{i=0}^{N} q_i(θ)y_i, then the coefficients y_i of Bφ are given by

$$(y_1, \ldots, y_N) = X L_N, \qquad y_0 = \Big(\sum_{i=0}^{N-1} B\big(\tfrac{d}{d\theta}\big) q_i(\theta)\, x_i\Big)(0) \;-\; \sum_{i=1}^{N} q_i(0)\, y_i.$$

This fact permits reformulating Algorithm 9 as an iteration involving only standard linear algebra operations on matrices and vectors of finite dimension; a sketch for the monomial basis is given after this list of facts. In [JMM12] the details are worked out for two polynomial bases, the monomial basis q_i = θ^i and Chebyshev polynomials.

7. [JMM11] Suppose that S ∈ C^{p×p} is invertible and that (Ψ, S^{−1}) is an invariant pair of B. Then Ψ can be expressed as Ψ(θ) = X exp(θS) for some matrix X ∈ C^{n×p}.

8. [JMM11] Assume that T(λ) := Σ_{j=1}^m f_j(λ)A_j. Let S ∈ C^{p×p} be nonsingular and X ∈ C^{n×p}. The following two statements are equivalent:

(i) (Ψ, S^{−1}) with Ψ(θ) := X exp(θS) is an invariant pair of the operator B;

(ii) (X, S) is an invariant pair of T(·), i.e., Σ_{j=1}^m A_j X f_j(S) = 0.

9. Inspired by the implicitly restarted Arnoldi method for linear eigenproblems, [JMM11] proposes a variant of the infinite Arnoldi method for the nonlinear eigenvalue problem T(λ)x = 0 which allows locking of already converged eigenpairs. The locked part of the partial Schur factorization for linear problems is replaced by invariant pairs. The method uses functions φ(θ) = Xe^{Sθ}c + q(θ), where X ∈ C^{n×p}, S ∈ C^{p×p}, c ∈ C^p, and q : C → C^n is a vector of polynomials.
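In the monomial basis q_i(θ) = θ^i, the maps in Fact 6 become explicit: integration gives y_{i+1} = x_i/(i+1) for i = 0, …, N−1, and, since q_i(0) = 0 for i ≥ 1, y_0 = C(φ) = −T(0)^{−1} Σ_{i=1}^{N} T^{(i)}(0) y_i. The following Python/NumPy sketch of Algorithm 9 rests on these formulas and on two assumptions that are not part of [JMM12]: the scalar product 〈·, ·〉 is taken as the Euclidean inner product of the stacked coefficient vectors, and Tsolve0 (applying T(0)^{−1}) and Tder(i) (returning T^{(i)}(0)) are hypothetical user-supplied helpers.

```python
import numpy as np

def taylor_infinite_arnoldi(Tsolve0, Tder, n, N, seed=0):
    """Sketch of Algorithm 9 in the monomial (Taylor) basis.
    Column k-1 of V stores the coefficients x_0,...,x_{k-1} of phi_k,
    stacked and padded with zeros to length n*(N+1)."""
    rng = np.random.default_rng(seed)
    V = np.zeros((n * (N + 1), N + 1), dtype=complex)
    H = np.zeros((N + 1, N), dtype=complex)
    v0 = rng.standard_normal(n)               # phi_1: a constant function
    V[:n, 0] = v0 / np.linalg.norm(v0)
    for k in range(1, N + 1):
        X = V[:n * k, k - 1].reshape(k, n).T  # columns x_0,...,x_{k-1}
        Y = np.zeros((n, k + 1), dtype=complex)
        Y[:, 1:] = X / np.arange(1, k + 1)    # integration: y_{i+1} = x_i/(i+1)
        Y[:, 0] = -Tsolve0(sum(Tder(i) @ Y[:, i] for i in range(1, k + 1)))
        w = np.zeros(n * (N + 1), dtype=complex)
        w[:n * (k + 1)] = Y.T.reshape(-1)     # stack y_0,...,y_k
        for i in range(k):                    # Gram-Schmidt against the basis
            H[i, k - 1] = np.vdot(V[:, i], w)
            w -= H[i, k - 1] * V[:, i]
        H[k, k - 1] = np.linalg.norm(w)
        V[:, k] = w / H[k, k - 1]
    mu = np.linalg.eigvals(H[:N, :N])         # Ritz values of B
    return 1.0 / mu                           # approximate eigenvalues of T near 0
```

For a delay eigenvalue problem of the form T(λ) = −λI + A_0 + A_1e^{−λ}, for instance, Tsolve0 would solve with T(0) = A_0 + A_1, while Tder(1) = −I − A_1 and Tder(i) = (−1)^i A_1 for i ≥ 2.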

References

[AR68] P.M. Anselone, L.B. Rall, The solution of characteristic value-vector problems by Newton's method, Numer. Math., 11:38–45, 1968.

[AST09] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami, K. Kimura, A numerical method for nonlinear eigenvalue problems using contour integrals, JSIAM Letters, 1:52–55, 2009.

[BV04] T. Betcke, H. Voss, A Jacobi–Davidson-type projection method for nonlinear eigenvalue problems, Future Generation Comput. Syst., 20:363–372, 2004.

[Bey12] W.-J. Beyn, An integral method for solving nonlinear eigenvalue problems, Linear Algebra Appl., 436:3839–3863, 2012.

[BEK11] W.-J. Beyn, C. Effenberger, D. Kressner, Continuation of eigenvalues and invariant pairs for parametrized nonlinear eigenvalue problems, Numer. Math., 119:489–516, 2011.

[BT09] W.-J. Beyn, V. Thümmler, Continuation of low-dimensional invariant subspaces in dynamical systems of large dimension, SIAM J. Matrix Anal. Appl., 31:1361–1381, 2009.

[BK11] T. Betcke, D. Kressner, Perturbation, extraction and refinement of invariant pairs, Linear Algebra Appl., 435:514–536, 2011.

[DP01] E.M. Daya, M. Potier-Ferry, A numerical method for nonlinear eigenvalue problems, application to vibrations of viscoelastic structures, Computers & Structures, 79:533–541, 2001.

[Duf55] R.J. Duffin, A minimax theory for overdamped networks, J. Rat. Mech. Anal., 4:221–233, 1955.


[Eff12] C. Effenberger, Robust successive computation of eigenpairs for nonlinear eigenvalue problems, Tech. Report 27.2012, Math. Inst. of Comput. Sci. Engn., EPF Lausanne, 2012.

[GKS93] I. Gohberg, M.A. Kaashoek, F. van Schagen, On the local theory of regular analytic matrix functions, Linear Algebra Appl., 182:9–25, 1993.

[GLR82] I. Gohberg, P. Lancaster, L. Rodman, Matrix Polynomials, Academic Press, New York, 1982.

[GR81] I. Gohberg, L. Rodman, On the local theory of regular analytic functions, Linear Algebra Appl., 182:9–25, 1993.

[Had68] K.P. Hadeler, Variationsprinzipien bei nichtlinearen Eigenwertaufgaben, Arch. Ration. Mech. Anal., 30:297–307, 1968.

[HS03] M.E. Hochstenbach, G.L.G. Sleijpen, Two-sided and alternating Jacobi–Davidson, Linear Algebra Appl., 358:145–172, 2003.

[HL99] R. Hryniv, P. Lancaster, On the perturbation of analytic matrix functions, Integral Equations Operator Theory, 34:325–338, 1999.

[Jar12] E. Jarlebring, Convergence factors of Newton methods for nonlinear eigenvalue problems, Linear Algebra Appl., 436:3943–3953, 2012.

[JMM11] E. Jarlebring, W. Michiels, K. Meerbergen, Computing a partial Schur factorization of nonlinear eigenvalue problems using the infinite Arnoldi method, Tech. Report, Dept. Computer Science, K.U. Leuven, 2011.

[JMM12] E. Jarlebring, W. Michiels, K. Meerbergen, A linear eigenvalue algorithm for the nonlinear eigenvalue problem, Numer. Math., accepted, 2012.

[Kre09] D. Kressner, A block Newton method for nonlinear eigenvalue problems, Numer. Math., 114:355–372, 2009.

[Kub70] V.N. Kublanovskaya, On an approach to the solution of the generalized latent value problem for λ-matrices, SIAM J. Numer. Anal., 7:532–537, 1970.

[Lan61] P. Lancaster, A generalised Rayleigh quotient iteration for lambda-matrices, Arch. Rat. Mech. Anal., 8:309–322, 1961.

[LBL10] B.-S. Liao, Z. Bai, L.-Q. Lee, K. Ko, Nonlinear Rayleigh–Ritz iterative method for solving large scale nonlinear eigenvalue problems, Taiwanese J. Math., 14:869–883, 2010.

[Mee01] K. Meerbergen, Locking and restarting quadratic eigenvalue solvers, SIAM J. Sci. Comput., 22:1814–1839, 2001.

[Neu85] A. Neumaier, Residual inverse iteration for the nonlinear eigenvalue problem, SIAM J. Numer. Anal., 22:914–923, 1985.

[NV10] V. Niendorf, H. Voss, Detecting hyperbolic and definite matrix polynomials, Linear Algebra Appl., 432:1017–1035, 2010.

[Rog64] E.H. Rogers, A minimax theory for overdamped systems, Arch. Ration. Mech. Anal., 16:89–96, 1964.

[Rot89] K. Rothe, Lösungsverfahren für nichtlineare Matrixeigenwertaufgaben mit Anwendungen auf die Ausgleichselementmethode, Ph.D. Thesis, Universität Hamburg, Germany, 1989.

[Ruh73] A. Ruhe, Algorithms for nonlinear eigenvalue problems, SIAM J. Numer. Anal., 10:674–689, 1973.

[Sch08] K. Schreiber, Nonlinear Eigenvalue Problems: Newton-type Methods and Nonlinear Rayleigh Functionals, Ph.D. Thesis, TU Berlin, Germany, 2008.

[SS06] H. Schwetlick, K. Schreiber, A primal-dual Jacobi–Davidson-like method for nonlinear eigenvalue problems, Tech. Report ZIH-IR-0613, Technische Universität Dresden, Germany, 2006.

[SS10] H. Schwetlick, K. Schreiber, Nonlinear Rayleigh functionals, Linear Algebra Appl., 436:3991–4016, 2012.

[SBF96] G.L. Sleijpen, G.L. Booten, D.R. Fokkema, H.A. van der Vorst, Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems, BIT Numerical Mathematics, 36:595–633, 1996.

[SX11] D. Szyld, F. Xue, Local convergence analysis of several inexact Newton-type algorithms for general nonlinear eigenvalue problems, Tech. Report 11-08-09, Temple University, Philadelphia, USA, 2011.

[SX12] D. Szyld, F. Xue, Several properties of invariant pairs of nonlinear algebraic eigenvalue problems, Tech. Report 12-02-09, Temple University, Philadelphia, USA, 2012.

[Vos03] H. Voss, A maxmin principle for nonlinear eigenvalue problems with application to a rational spectral problem in fluid–solid vibration, Appl. Math., 48:607–622, 2003.

[Vos04] H. Voss, An Arnoldi method for nonlinear eigenvalue problems, BIT Numerical Mathematics, 44:387–401, 2004.

[Vos07] H. Voss, A new justification of the Jacobi–Davidson method for large eigenproblems, Linear Algebra Appl., 424:448–455, 2007.

[Vos07a] H. Voss, A Jacobi–Davidson method for nonlinear and nonsymmetric eigenproblems, Computers & Structures, 85:1284–1292, 2007.

[Vos09] H. Voss, A minmax principle for nonlinear eigenproblems depending continuously on the eigenparameter, Numer. Linear Algebra Appl., 16:899–913, 2009.

[VW82] H. Voss, B. Werner, A minimax principle for nonlinear eigenvalue problems with applications to nonoverdamped systems, Math. Meth. Appl. Sci., 4:415–424, 1982.

[Wer70] B. Werner, Das Spektrum von Operatorenscharen mit verallgemeinerten Rayleighquotienten, Ph.D. Thesis, Universität Hamburg, Germany, 1970.

[YMW07] C. Yang, J.C. Meza, L.-W. Wang, A trust region direct constrained minimization algorithm for the Kohn–Sham equation, SIAM J. Sci. Comput., 29:1854–1875, 2007.