
6. The Symmetric Eigenvalue Problem

A ‘must’ for engineers . . .

Numerical Programming I (for CSE), Hans-Joachim Bungartz


6.1. Motivation

6.1.1. An Example from Physics

Consider the following system, consisting of two bodies B_1, B_2 with mass m, connected by springs with spring constant k. Let x_1(t), x_2(t) denote the displacement of the bodies B_1, B_2 at time t.

[Figure: two bodies of mass m between two walls, coupled to the walls and to each other by three springs with spring constant k; x_1 and x_2 denote their displacements.]

From Newton's second law (F = m·ẍ) and Hooke's law (F = −k·x), we obtain the following linear second-order differential equations with constant coefficients:

m ẍ_1 = −k x_1 + k (x_2 − x_1)
m ẍ_2 = −k (x_2 − x_1) − k x_2


• From calculus, we know that we can use the following complex ansatz to find a (non-trivial) solution of this differential equation:

x_j(t) = c_j e^{iωt},   c_j, ω ∈ R,   j = 1, 2.

• Since ẍ_j(t) = −c_j ω² e^{iωt}, we have:

−m c_1 ω² = −k c_1 + k (c_2 − c_1) = k (−2c_1 + c_2)
−m c_2 ω² = −k (c_2 − c_1) − k c_2 = k (c_1 − 2c_2)

• We substitute λ := −mω²/k:

λ c_1 = −2c_1 + c_2
λ c_2 = c_1 − 2c_2

Or, with c := (c_1, c_2)^T,

\begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix} c = λ c

We have an instance of the symmetric eigenvalue problem!

• For given initial values, the solution is unique.

• We can obtain similar instances of the symmetric eigenvalue problem for systems with a higher number of bodies.
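
As a quick numerical check of the two-body example above, the following sketch (with illustrative values m = k = 1, not from the text) computes the eigenvalues of the 2 × 2 matrix with numpy and recovers the two normal-mode frequencies from λ = −mω²/k:

```python
import numpy as np

m, k = 1.0, 1.0                        # illustrative values
A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
lam, C = np.linalg.eigh(A)             # eigenvalues of the symmetric matrix: -3 and -1
omega = np.sqrt(-lam * k / m)          # lambda = -m*omega^2/k  =>  omega = sqrt(-lambda*k/m)
print(lam)                             # [-3. -1.]
print(omega)                           # [1.732..., 1.0] = [sqrt(3k/m), sqrt(k/m)]
```

The frequency sqrt(k/m) belongs to the in-phase mode (c_1 = c_2), sqrt(3k/m) to the out-of-phase mode (c_1 = −c_2).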


6.1.2. Eigenvalues in Numerics

• A common problem in numerics is to solve systems of linear equations of the form Ax = b with A ∈ R^{n×n} symmetric, b ∈ R^n.

• An important question is the condition of this problem, i.e. to what extent an error in b affects the correct solution x* (cf. chapter 5).

• The maximum ratio of the relative error in the solution x* to the relative error in b (measured in the Euclidean norm) is the condition number κ(A) of solving Ax = b. It can be shown that, for symmetric A,

κ(A) = | λ_max(A) / λ_min(A) |.

That means the condition depends directly on the eigenvalues of A! (A short numerical check follows below.)

• This number plays an important role in the iterative solution of linear equation systems (cf. chapter 7), for example:

– Assume we have an approximate solution x̃.
– This means we have exactly solved the system A x̃ = b̃ with b̃ := A x̃.
– However, if κ(A) is large, then even if b − b̃ is small, our approximation x̃ may be far away from the solution of Ax = b!

• Finally, the convergence speed of common iterative solvers is also greatly affected by the eigenvalues of A.
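
A short numpy check of κ(A) = |λ_max(A)/λ_min(A)| (the matrix is an illustrative symmetric positive definite example, so this value coincides with the 2-norm condition number returned by np.linalg.cond):

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 3.0]])        # symmetric, eigenvalues 2, 3, 5
eigs = np.linalg.eigvalsh(A)           # eigenvalues of a symmetric matrix
print(abs(eigs.max() / eigs.min()))    # |lambda_max / lambda_min| = 2.5
print(np.linalg.cond(A, 2))            # 2-norm condition number, also 2.5
```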


6.2. Condition

• Before trying to develop numerical algorithms for the symmetric eigenvalue problem, we should have a look at its condition!

• Assume that instead of A we have a perturbed matrix A + εB, where ‖B‖_2 = 1. Since A is symmetric, we assume that B is also symmetric (usually only one half of A is stored in memory).

• Let λ be an eigenvalue of A and x a corresponding eigenvector, i.e. Ax = λx. Let x(ε) and λ(ε) be the perturbed values of x and λ, implicitly given by the equality:

(A + εB)x(ε) = λ(ε)x(ε)

• Using a Taylor series expansion around ε_0 = 0, we have:

(A + εB)(x + x'(0)ε + ½ x''(0)ε² + ...) = (λ + λ'(0)ε + ½ λ''(0)ε² + ...)(x + x'(0)ε + ½ x''(0)ε² + ...)

• We are interested in the value of λ′(0). Comparing coefficients of ε, we have:

Bx + A x'(0) = λ'(0) x + λ x'(0)
x^T Bx + x^T A x'(0) = λ'(0) x^T x + λ x^T x'(0)        (multiply by x^T from the left)
x^T Bx + λ x^T x'(0) = λ'(0) x^T x + λ x^T x'(0)        (x^T A = (Ax)^T = λ x^T, since A is symmetric)

x^T Bx / (x^T x) = λ'(0)


• It follows that

λ(ε) − λ(0) = (x^T Bx / x^T x) · ε + O(ε²).

• Since |x^T Bx / x^T x| ≤ ‖B‖_2, we get

|λ(ε) − λ(0)| ≤ |ε| · ‖B‖_2 + O(ε²).

• Finally, with our assumption ‖B‖_2 = 1 for the perturbation matrix B, we have

|λ(ε) − λ(0)| = O(ε).

• Thus, the symmetric eigenvalue problem is a well-conditioned problem!

Remark: The asymmetric eigenvalue problem is ill-conditioned! Consider the asymmetric matrices

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 + 10^{-10} \end{pmatrix}, \qquad A + εB = \begin{pmatrix} 1 & 1 \\ 10^{-10} & 1 + 10^{-10} \end{pmatrix}.

A has the eigenvalues λ_1 = 1, λ_2 = 1 + 10^{-10}, while A + εB has the eigenvalues μ_{1/2} ≈ 1 ± 10^{-5}. This means the error in the eigenvalues is about 10^5 times larger than the error in the matrix!
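
This contrast is easy to reproduce numerically. The sketch below perturbs the asymmetric example above and, for comparison, an illustrative symmetric matrix of similar form:

```python
import numpy as np

eps = 1e-10

# Asymmetric example from the remark: the eigenvalues move by about 1e-5.
A = np.array([[1.0, 1.0],
              [0.0, 1.0 + eps]])
A_pert = np.array([[1.0, 1.0],
                   [eps, 1.0 + eps]])
shift_asym = np.max(np.abs(np.sort(np.linalg.eigvals(A_pert).real)
                           - np.sort(np.linalg.eigvals(A).real)))
print(shift_asym)                      # ~1e-5

# Symmetric counterpart: a perturbation eps*B with ||B||_2 = 1 moves the
# eigenvalues by at most eps.
S = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # symmetric, ||B||_2 = 1
shift_sym = np.max(np.abs(np.linalg.eigvalsh(S + eps * B)
                          - np.linalg.eigvalsh(S)))
print(shift_sym)                       # ~1e-10
```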


Naive Computation

• A naive method to compute eigenvalues could be to determine the characteristic polynomial p(λ) of A and then use an iterative method, e.g. Newton's method, to solve p(λ) = 0.

• However, this reduces the well-conditioned symmetric eigenvalue problem to the ill-conditioned problem of determining the roots of a polynomial!

• Example: Consider a 12 × 12 matrix with the eigenvalues λ_i = i. Its characteristic polynomial is

p(λ) = \prod_{i=1}^{12} (λ − i).

The coefficient of λ^7 is −6926634. Assume we have instead the polynomial q(λ) = p(λ) − 0.001·λ^7, i.e. the relative error of the coefficient of λ^7 is ε ≈ 1.44 · 10^{−10}. However, the eigenvalue λ_9 = 9 then changes by ≈ 0.02, i.e. its relative error is roughly seven orders of magnitude larger than ε! (A short numpy reproduction follows after this list.)

• We ought to have some better ideas . . .
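
The polynomial example can be reproduced with numpy (np.poly builds the coefficients of a polynomial from its roots, np.roots computes roots from coefficients):

```python
import numpy as np

p = np.poly(np.arange(1.0, 13.0))      # coefficients of prod_{i=1}^{12} (x - i),
                                       # highest degree first
print(p[5])                            # coefficient of x^7: -6926634.0
print(abs(0.001 / p[5]))               # relative coefficient error ~1.44e-10

q = p.copy()
q[5] -= 0.001                          # q(x) = p(x) - 0.001 * x^7
print(np.sort(np.roots(p).real))       # 1, 2, ..., 12
print(np.sort(np.roots(q).real))       # the roots near 8, 9, 10 move by about 1e-2
```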


6.3. Vector Iteration

6.3.1. Power Iteration

Let A ∈ R^{n×n} be symmetric. Then A has n eigenvectors u_1, ..., u_n that form a basis of R^n, and we can write any vector x ∈ R^n as a linear combination of u_1, ..., u_n:

x = c_1 u_1 + c_2 u_2 + ··· + c_n u_n    (c_1, ..., c_n ∈ R)

Let λ_1, ..., λ_n be the eigenvalues of A corresponding to the eigenvectors u_1, ..., u_n. Then

Ax  = λ_1 c_1 u_1 + λ_2 c_2 u_2 + ··· + λ_n c_n u_n
A²x = λ_1² c_1 u_1 + λ_2² c_2 u_2 + ··· + λ_n² c_n u_n
⋮
A^k x = λ_1^k c_1 u_1 + λ_2^k c_2 u_2 + ··· + λ_n^k c_n u_n
      = λ_1^k [ c_1 u_1 + (λ_2/λ_1)^k c_2 u_2 + ··· + (λ_n/λ_1)^k c_n u_n ]

Assume that λ_1 is dominant (i.e. |λ_1| > |λ_2| ≥ ··· ≥ |λ_n|).

Then for k → ∞, we have (λ_2/λ_1)^k, ..., (λ_n/λ_1)^k → 0, and thus (1/λ_1^k) A^k x → c_1 u_1.


• By choosing an appropriate start vector x^{(0)} ∈ R^n and performing the iteration

x^{(k+1)} = A x^{(k)},

we can calculate an approximation of an eigenvector of A belonging to the eigenvalue λ_1.

• “Appropriate” means that u_1^T x^{(0)} ≠ 0. (Otherwise, x^{(k)} will converge to the first eigenvector u_i with u_i^T x^{(0)} ≠ 0.) If x^{(0)} is chosen randomly from a uniform distribution, the probability that u_1^T x^{(0)} = 0 is negligible (mathematically, it is zero).

• The entries of the iteration vector become arbitrarily large (if |λ_1| > 1) or small (if |λ_1| < 1). To avoid numerical problems, we should therefore normalize x^{(k)} after each iteration step.

• Calculating eigenvalues from eigenvectors: Let x be an eigenvector of A belonging to the eigenvalue λ. Then

Ax = λx   ⟹   x^T Ax / x^T x = λ.

If x is normalized, i.e. ‖x‖ = 1, then λ = x^T Ax.

• The term x^T Ax / x^T x is also called the Rayleigh quotient.


Power Iteration: Algorithm

• Choose a random x^{(0)} ∈ R^n such that ‖x^{(0)}‖ = 1

• for k = 0, 1, 2, ...:
  1. w^{(k)} := A x^{(k)}
  2. λ^{(k)} := (x^{(k)})^T w^{(k)}
  3. x^{(k+1)} := w^{(k)} / ‖w^{(k)}‖
  4. stop if approximation “sufficiently accurate”

Remarks:

• We can consider our approximation x^{(k)} “sufficiently accurate” if ‖w^{(k)} − λ^{(k)} x^{(k)}‖ ≤ ε ‖w^{(k)}‖.

• The cost per iteration step is determined by the cost of computing the matrix-vector product A x^{(k)}, which is at most O(n²).
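
A direct numpy transcription of the algorithm above (function name and test matrix are illustrative; the stopping test is the residual criterion from the remark):

```python
import numpy as np

def power_iteration(A, eps=1e-10, max_iter=10_000):
    """Approximate the dominant eigenpair of the symmetric matrix A."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)                    # random start vector with ||x|| = 1
    lam = x @ A @ x
    for _ in range(max_iter):
        w = A @ x                             # w^(k) := A x^(k)
        lam = x @ w                           # lambda^(k) := (x^(k))^T w^(k)
        if np.linalg.norm(w - lam * x) <= eps * np.linalg.norm(w):
            break                             # "sufficiently accurate"
        x = w / np.linalg.norm(w)             # x^(k+1) := w^(k) / ||w^(k)||
    return lam, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])               # eigenvalues 1, 2, 4
lam, x = power_iteration(A)
print(lam, np.linalg.eigvalsh(A).max())       # both ~4.0
```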


Convergence

• The convergence order of the sequence {x^{(k)}} is linear, with convergence rate q = |λ_2/λ_1|.

• The error of the eigenvalue approximations {λ^{(k)}} decreases roughly like q^{2k}, i.e. twice as fast (rate q²).

• Thus, if λ_2 is only slightly smaller than λ_1, convergence will be very slow. We can address this problem by shifting the eigenvalues:

– Assume we have guessed an approximation μ ≈ λ_2.
– Consider the matrix A − μI. It is easy to see that this matrix has the eigenvalues λ_1 − μ, ..., λ_n − μ.
– By performing the iteration with the matrix A' = A − μI instead of A, we can greatly speed up convergence.
– Example: n = 2, λ_1 = 5, λ_2 = 4.9. Using μ = 4.85, we obtain the convergence rate (λ_2 − μ)/(λ_1 − μ) = 1/3. Compare this to the “original” convergence rate λ_2/λ_1 = 49/50!
– In general, we obtain the convergence rate |λ_j − μ| / |λ_i − μ|, where λ_i − μ is the largest and λ_j − μ the second largest shifted eigenvalue in modulus, i.e. |λ_i − μ| ≥ |λ_j − μ| ≥ |λ_k − μ| for all k ∈ {1, ..., n} \ {i, j}.
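
The effect of the shift in the example above can be observed directly; the helper below (an illustrative sketch) counts power-iteration steps until the residual criterion of the power iteration algorithm is met:

```python
import numpy as np

def steps_until_converged(A, tol=1e-8, max_iter=100_000):
    """Number of power-iteration steps until ||w - lam*x|| <= tol*||w||."""
    rng = np.random.default_rng(1)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for k in range(1, max_iter + 1):
        w = A @ x
        lam = x @ w
        if np.linalg.norm(w - lam * x) <= tol * np.linalg.norm(w):
            return k
        x = w / np.linalg.norm(w)
    return max_iter

A = np.diag([5.0, 4.9])                          # eigenvalues 5 and 4.9
mu = 4.85
print(steps_until_converged(A))                  # rate 49/50: several hundred steps
print(steps_until_converged(A - mu * np.eye(2))) # rate 1/3: roughly 20 steps
```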


Remarks

• The algorithm also works for a slightly weaker condition on the eigenvalues of A: If A has no dominant eigenvalue, but there exists an r ∈ {1, ..., n} such that λ_1 = ··· = λ_r and |λ_r| > |λ_i| for all i ∈ {r + 1, ..., n}, then the sequence {x^{(k)}} converges to a linear combination of u_1, ..., u_r.

• Power iteration can be used to successively compute several eigenvalues in descending order (deflation; a short numerical check follows below):

– It is a standard result in linear algebra that, if v_1 is a normalized eigenvector belonging to the eigenvalue λ_1, the matrix A − λ_1 v_1 v_1^T has the same eigenvalues and eigenvectors as A, but λ_1 has been replaced by 0.
– After the eigenvalue λ_1 and a corresponding eigenvector v_1 have been computed with sufficient precision, we continue the iteration with the matrix A − λ_1 v_1 v_1^T to find λ_2, and so on.

• Prominent applications of power iteration include Google's PageRank algorithm and Principal Component Analysis in statistics.
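
A short check of the deflation trick above (the matrix is illustrative, and the eigenpair is taken from np.linalg.eigh here in place of a converged power iteration):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])               # eigenvalues 1, 2, 4
eigs, vecs = np.linalg.eigh(A)
lam1, v1 = eigs[-1], vecs[:, -1]              # dominant eigenpair, v1 normalized
A_defl = A - lam1 * np.outer(v1, v1)          # deflated matrix
print(np.linalg.eigvalsh(A_defl))             # approximately [0, 1, 2]: lambda_1 = 4 replaced by 0
```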


6.3.2. Inverse Iteration

• We now look for a method to compute a specific eigenvalue λ* of a symmetric matrix A ∈ R^{n×n}, given an approximation μ ≈ λ*. Let λ_1, ..., λ_n be the eigenvalues of A.

• Remember that A^{−1} has the eigenvalues λ_1^{−1}, ..., λ_n^{−1}, so we could compute the smallest eigenvalue (in modulus) of A by performing power iteration with A^{−1}.

• We can combine this with the shifting technique we saw before: If μ is closer to λ* than to any other eigenvalue, then λ* − μ is the smallest eigenvalue of A − μI in modulus.

• We can therefore perform power iteration with (A − μI)^{−1} to calculate an approximation for λ* − μ!

• However, we do not have to calculate (A − μI)^{−1} explicitly, since the recurrence

x^{(k+1)} = (A − μI)^{−1} x^{(k)}

is equivalent to

(A − μI) x^{(k+1)} = x^{(k)}.

This means that we can calculate the value of x^{(k+1)} by using a linear system solver in each iteration step.


Inverse Iteration: Algorithm

• Choose a random x^{(0)} ∈ R^n such that ‖x^{(0)}‖ = 1

• for k = 0, 1, 2, ...:
  1. solve (A − μI) w^{(k)} = x^{(k)}
  2. x^{(k+1)} := w^{(k)} / ‖w^{(k)}‖
  3. stop if approximation “sufficiently accurate”

Remarks:

• The cost of each iteration step is dominated by the complexity of solving the linear system with coefficient matrix A − μI.

• Since this matrix does not change during the iteration, we can compute its LU factorization at the beginning, thereby reducing the cost of each iteration step to O(n²) operations.

• As we are essentially performing power iteration, the same remarks about convergence apply.
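
A sketch of inverse iteration with a fixed shift; it uses scipy's lu_factor/lu_solve so that A − μI is factored only once, as suggested in the remark above (names and the test matrix are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, mu, eps=1e-12, max_iter=1_000):
    """Approximate the eigenpair of A whose eigenvalue is closest to mu."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lu_piv = lu_factor(A - mu * np.eye(A.shape[0]))  # factor once, reuse in every step
    lam = mu
    for _ in range(max_iter):
        w = lu_solve(lu_piv, x)                      # solve (A - mu*I) w^(k) = x^(k) in O(n^2)
        x = w / np.linalg.norm(w)
        lam = x @ A @ x                              # Rayleigh quotient of the iterate
        if np.linalg.norm(A @ x - lam * x) <= eps * abs(lam):
            break
    return lam, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                      # eigenvalues 1, 2, 4
print(inverse_iteration(A, mu=0.9)[0])               # ~1.0, the eigenvalue closest to 0.9
```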


6.3.3. Rayleigh Quotient Iteration

• In our implementation of power iteration, we used the Rayleigh quotient to compute an approximation of λ_1 in each iteration step.

• We can make use of this value to refine our approximation µ in each step!

• This gives the following algorithm:

– Choose a random x^{(0)} ∈ R^n such that ‖x^{(0)}‖ = 1

– for k = 0, 1, 2, ...:
  1. μ^{(k)} := (x^{(k)})^T A x^{(k)}
  2. solve (A − μ^{(k)} I) w^{(k)} = x^{(k)}
  3. x^{(k+1)} := w^{(k)} / ‖w^{(k)}‖
  4. stop if approximation “sufficiently accurate”

• This algorithm is usually known as Rayleigh quotient iteration.
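
A minimal numpy sketch of the algorithm (names are illustrative). Note that A − μ^{(k)}I becomes nearly singular as the iteration converges; that is harmless here, since only the direction of the computed solution is used:

```python
import numpy as np

def rayleigh_quotient_iteration(A, eps=1e-13, max_iter=50):
    """Rayleigh quotient iteration; the eigenpair it converges to
    depends on the start vector."""
    rng = np.random.default_rng(3)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    mu = x @ A @ x
    for _ in range(max_iter):
        mu = x @ A @ x                               # shift = current Rayleigh quotient
        if np.linalg.norm(A @ x - mu * x) <= eps * abs(mu):
            break
        w = np.linalg.solve(A - mu * np.eye(A.shape[0]), x)  # a new system in every step
        x = w / np.linalg.norm(w)
    return mu, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                      # eigenvalues 1, 2, 4
print(rayleigh_quotient_iteration(A)[0])             # one of 1.0, 2.0, 4.0
```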


Remarks

• It can be shown that the convergence order of the sequence {λ^{(k)}} is now cubic! The eigenvector approximations converge correspondingly fast.

• However, since we have to solve a different system of linear equations in each iteration step, the cost per step is usually significantly higher than for power iteration!

• Rayleigh quotient iteration is the method of choice if we have a good approximation of the eigenvalue we are looking for: then only a few iteration steps are needed due to the high convergence order.

• If we already have an (accurate) eigenvalue and only want to compute a corresponding eigenvector, “standard” inverse iteration with this fixed shift has an advantage over Rayleigh quotient iteration: it also converges in very few steps, and each step is cheaper, since a single LU factorization can be reused.


6.4. QR Iteration

• Factorization methods such as the QR iteration [Francis 1961] compute all eigenvalues of a symmetric matrix A ∈ R^{n×n} at once.

• The idea is to compute a sequence of matrices {A^{(k)}} that are similar to A. This sequence should converge to a matrix in triangular form, such that the diagonal of A^{(k)} is an approximation of the eigenvalues of A.

• The QR iteration is based on the QR decomposition, which factorizes a given matrix A such that A = QR, with Q an orthogonal and R an upper triangular matrix. Then the matrix

RQ = Q^{−1} A Q = Q^T A Q

is similar to A.

• It can be shown that the sequence {A^{(k)}} defined by

A^{(0)} = A,
A^{(k)} = Q^{(k)} R^{(k)}   (Q^{(k)} orthogonal, R^{(k)} upper triangular),
A^{(k+1)} = R^{(k)} Q^{(k)}

converges to a diagonal matrix.


QR Iteration: Algorithm

• Thus, we have the following algorithm:

– A^{(0)} := A
– for k = 0, 1, 2, ...:
  1. compute a QR decomposition A^{(k)} = QR
  2. A^{(k+1)} := RQ
  3. if a^{(k+1)}_{n,n−1} is sufficiently small:
     output λ_n ≈ a^{(k+1)}_{n,n}
     remove row and column n

• We consider the subdiagonal element a^{(k+1)}_{n,n−1} “sufficiently small” if its absolute value is ≤ ε · |a^{(k+1)}_{n,n}|.

• We still need to know how to implement the QR decomposition!
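
Before constructing the QR decomposition ourselves, the iteration itself can be sketched with numpy's built-in np.linalg.qr used as a black box, including the deflation step of the algorithm above (matrix and tolerance are illustrative):

```python
import numpy as np

def qr_iteration(A, eps=1e-12, max_iter=10_000):
    """Unshifted QR iteration with deflation; returns approximate eigenvalues
    of the symmetric matrix A in ascending order."""
    Ak = np.array(A, dtype=float)
    eigenvalues = []
    for _ in range(max_iter):
        n = Ak.shape[0]
        if n == 1:
            eigenvalues.append(Ak[0, 0])
            break
        Q, R = np.linalg.qr(Ak)                    # A^(k) = QR
        Ak = R @ Q                                 # A^(k+1) = RQ, similar to A^(k)
        if abs(Ak[n - 1, n - 2]) <= eps * abs(Ak[n - 1, n - 1]):
            eigenvalues.append(Ak[n - 1, n - 1])   # output lambda_n ~ a_{n,n}
            Ak = Ak[:n - 1, :n - 1]                # remove row and column n
    return np.sort(np.array(eigenvalues))

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(qr_iteration(A))                             # [1. 2. 4.]
print(np.linalg.eigvalsh(A))                       # same eigenvalues
```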


6.4.1. QR Decomposition

• Remember: We want to decompose a matrix A into an orthogonal matrix Q and an upper triangular matrix R such that A = QR.

• This problem is equivalent to the following: Find an orthogonal matrix G such that GA = R, i.e. G “zeroes out” the subdiagonal part of A. Then A = G^T R.

• We first have a look at the two-dimensional case: Given a vector x = (x_1, x_2)^T, find an orthogonal matrix G such that Gx = (r, 0)^T for some r ∈ R.

• It is easy to see that

G = \frac{1}{\sqrt{x_1^2 + x_2^2}} \begin{pmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{pmatrix}

does the job.

• Geometrically, the matrix G describes a rotation by θ = arctan(x_2/x_1) radians in the (x_1, x_2) plane.

• G is called a Givens rotation matrix [Givens 1958].


• Generalization to the n-dimensional case: When multiplied with A, the matrix

G_{ij} = \begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & \rho\, a_{jj} & \cdots & \rho\, a_{ij} & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -\rho\, a_{ij} & \cdots & \rho\, a_{jj} & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}, \qquad \rho = \frac{1}{\sqrt{a_{ij}^2 + a_{jj}^2}}

affects only rows j and i of A and eliminates the subdiagonal entry a_{ij} (i > j). G_{ij} differs from the identity matrix in only four entries.

• To zero out the entire subdiagonal part, we use the product

G = G_{n,n−1} · G_{n,n−2} G_{n−1,n−2} · ··· · G_{n,2} ··· G_{3,2} · G_{n,1} ··· G_{2,1},

i.e. we eliminate the matrix entries column-wise from left to right, top to bottom, in order not to introduce additional non-zero entries!

• Since the product of two orthogonal matrices is orthogonal, G is orthogonal and GA is an upper triangular matrix for any matrix A.
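
A sketch of the QR decomposition built from Givens rotations as described above; instead of forming each G explicitly, only the two affected rows (and the two corresponding columns of Q) are updated, so one rotation costs O(n). Function name and test matrix are illustrative:

```python
import numpy as np

def givens_qr(A):
    """QR decomposition via Givens rotations: returns Q, R with A = Q @ R."""
    R = np.array(A, dtype=float)
    n = R.shape[0]
    Q = np.eye(n)
    for j in range(n - 1):                   # columns, left to right
        for i in range(j + 1, n):            # subdiagonal entries, top to bottom
            if R[i, j] == 0.0:
                continue
            r = np.hypot(R[j, j], R[i, j])
            c, s = R[j, j] / r, R[i, j] / r  # c = rho*a_jj, s = rho*a_ij
            Rj, Ri = R[j, :].copy(), R[i, :].copy()
            R[j, :] = c * Rj + s * Ri        # rotate rows j and i of R ...
            R[i, :] = -s * Rj + c * Ri       # ... so that the entry (i, j) becomes 0
            Qj, Qi = Q[:, j].copy(), Q[:, i].copy()
            Q[:, j] = c * Qj + s * Qi        # accumulate Q = G_{2,1}^T G_{3,1}^T ...
            Q[:, i] = -s * Qj + c * Qi
    return Q, R

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0.0))   # True True
```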


QR Iteration: Implementation

• We now have a method to compute the QR decomposition of A: R = GA, Q = G^T.

• The naive computation of G via n(n−1)/2 matrix multiplications costs O(n^5) operations!

• However, the effect of the multiplication with a Givens rotation matrix can be computed with O(n) operations due to its special structure. This leads to a total of O(n³) operations for the QR decomposition.

• For QR iteration, we are not even interested in the value of G. We only need to compute the product A^{(k+1)} = G A^{(k)} G^T!

• Since matrix multiplication is associative and (M_1 M_2)^T = M_2^T M_1^T, we can use the following scheme for the computation of A^{(k+1)}:

A^{(k+1)} = ··· G_{3,1} ( G_{2,1} A^{(k)} G_{2,1}^T ) G_{3,1}^T ···


Remarks

• QR iteration is the method of choice if all eigenvalues of a matrix are needed.

• The convergence order of the sequence {A^{(k)}} is linear.

• As with power iteration, convergence is slow if any of the eigenvalues are close together. By using a shift operation, similar to the one in Rayleigh quotient iteration, we can achieve quadratic convergence order.

• Our algorithm does not calculate eigenvectors. Eigenvectors for specific eigenvalues can be retrieved using inverse iteration.

• QR decomposition can also be used as a direct solver for systems of linear equations: Let A ∈ R^{n×n}, b ∈ R^n. Then

Ax = b  ⟺  QRx = b  ⟺  Rx = Q^T b.

– Since R is upper triangular, the system Rx = Q^T b can be solved with O(n²) operations.
– As the computation of the QR decomposition takes O(n³) operations, this method is, like LU factorization, especially useful when a set of systems with the same matrix A but many different right-hand sides b_i has to be solved.
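
A minimal sketch of this use of the QR decomposition; np.linalg.qr stands in for the Givens-based factorization, and the back substitution is written out to show the O(n²) solve:

```python
import numpy as np

def solve_via_qr(A, b):
    """Solve Ax = b via A = QR:  Rx = Q^T b, solved by back substitution."""
    Q, R = np.linalg.qr(A)              # one-time O(n^3) factorization
    y = Q.T @ b
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):      # back substitution, O(n^2)
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(solve_via_qr(A, b))               # same result as np.linalg.solve(A, b)
print(np.linalg.solve(A, b))
```

Once Q and R are available, every additional right-hand side costs only the O(n²) back substitution.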


6.5. Reduction Algorithms

• The cost of all three algorithms presented so far depends largely on the number of non-zero entries in the matrix A:

– The cost of an iteration step in power iteration is dominated by the complexity of the matrix-vector product Ax.
– The cost of an iteration step in inverse iteration is dominated by the complexity of solving Ax = b.
– The cost of an iteration step in QR iteration depends on the number of Givens rotations that we have to use to eliminate non-zero entries in the iteration matrix A^{(k)}.

• Therefore, for the numerical computation of eigenvalues, we are interested in matrices that are similar to A but have a significantly smaller number of non-zero entries.


• We try to develop an algorithm that transforms a symmetric matrix A ∈ R^{n×n} into a similar tridiagonal symmetric matrix:

A → \begin{pmatrix}
* & * & 0 & 0 & \cdots & \cdots & 0 \\
* & * & * & 0 & 0 & \cdots & 0 \\
0 & * & * & * & 0 & \cdots & 0 \\
 & & & \ddots & & & \vdots \\
0 & \cdots & 0 & * & * & * & 0 \\
0 & \cdots & 0 & 0 & * & * & * \\
0 & \cdots & \cdots & 0 & 0 & * & *
\end{pmatrix}

• Observation: A matrix A ∈ R^{n×n} is tridiagonal and symmetric if and only if it is of the form

\begin{pmatrix}
\alpha & \rho & 0 & \cdots & 0 \\
\rho & & & & \\
0 & & A' & & \\
\vdots & & & & \\
0 & & & &
\end{pmatrix}

with A' ∈ R^{(n−1)×(n−1)} tridiagonal and symmetric.

• This leads to a “dynamic programming” approach: Use a similarity transformation to transform A to a matrix in the above form, then continue with A' recursively.


Transformation Algorithm

• If we had a matrix H' ∈ R^{(n−1)×(n−1)} such that

H' \begin{pmatrix} a_{12} \\ a_{13} \\ \vdots \\ a_{1n} \end{pmatrix} = \begin{pmatrix} \rho \\ 0 \\ \vdots \\ 0 \end{pmatrix}

for some ρ ∈ R, then

HA := \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & H' & \\
0 & & &
\end{pmatrix}
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{12} & & & \\
a_{13} & & \ddots & \\
\vdots & & & \\
a_{1n} & \cdots & & a_{nn}
\end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\rho & & & \\
0 & & B & \\
\vdots & & & \\
0 & & &
\end{pmatrix}

with B ∈ R^{(n−1)×(n−1)}.

• We already know that such an H' exists: for example, it can be described as the product of n − 2 Givens rotations.


• By transposing the result, we can use the effect of multiplication with H again:

H(HA)^T = H \begin{pmatrix}
a_{11} & \rho & 0 & \cdots & 0 \\
a_{12} & & & & \\
a_{13} & & B^T & & \\
\vdots & & & & \\
a_{1n} & & & &
\end{pmatrix}
=
\begin{pmatrix}
a_{11} & \rho & 0 & \cdots & 0 \\
\rho & & & & \\
0 & & A_2 & & \\
\vdots & & & & \\
0 & & & &
\end{pmatrix}

• Since H(HA)^T = H A^T H^T = H A H^T (A is symmetric), this is a similarity transformation if and only if H is orthogonal.

• For that, it is important that the application of H does not destroy the first row of A. Therefore, we can only eliminate everything below the subdiagonal element of the first column, and we obtain a tridiagonal matrix only (instead of a diagonal one).

• With basic linear algebra, one can also show that A_2 is symmetric.

• Our task is now to find an appropriate matrix H′!


Householder Transformations

• A first idea could be to choose H' as a product of Givens rotations, as previously indicated.

• However, in practice Householder transformations [Householder 1958] are used more frequently.

• For any vector x ∈ R^n, the Householder matrix H ∈ R^{n×n} is defined by

H = I − 2vv^T,   with v := u/‖u‖_2 and u := x − ‖x‖_2 e_1,

such that

Hx = (ρ, 0, ..., 0)^T

for some ρ ∈ R.

• Geometrically, H describes a reflection of x in the hyperplane defined by the vector v, which is orthogonal to that hyperplane.
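
A tiny numpy check of this definition; the vector x is an arbitrary example, and in practice the sign of ‖x‖_2 in u is chosen to avoid cancellation when x is already close to a multiple of e_1:

```python
import numpy as np

def householder_matrix(x):
    """H = I - 2*v*v^T with v = u/||u||, u = x - ||x||_2 * e_1 (definition above)."""
    u = np.array(x, dtype=float)
    u[0] -= np.linalg.norm(x)            # u = x - ||x||_2 e_1
    v = u / np.linalg.norm(u)            # (undefined if x is already a positive multiple of e_1)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

x = np.array([3.0, 4.0, 0.0])
print(householder_matrix(x) @ x)         # ~[5. 0. 0.], i.e. rho = ||x||_2
```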


Remarks

• As with Givens rotation matrices, Householder matrices are usually not explicitly computed. When using an appropriate implementation, the effect of multiplication with a Householder matrix can be computed with O(n²) operations.

• In that case, the transformation algorithm described above takes O(n³) operations, rather than the O(n⁴) operations of a naive implementation with O(n) matrix multiplications.

• Householder transformations can also be used to compute the QR decomposition, since the product of n − 1 Householder matrices transforms a matrix into upper triangular form.

• We could therefore also use them for QR iteration!

• However, it is more efficient to first reduce the matrix to tridiagonal form using Householder transformations (cost: O(n³)), and then perform QR iteration using Givens rotations on the reduced matrix.

• Then, in each iteration step, only O(n) elements need to be zeroed using Givens rotations, which results in a cost of O(n²) per iteration step, vs. O(n³) when using Householder transformations.
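
To close the section, here is a deliberately naive numpy sketch of the Householder reduction to tridiagonal form: it forms every H explicitly and multiplies full matrices, i.e. it is the O(n⁴) variant mentioned above, kept simple for readability (an efficient implementation would apply the reflections in O(n²) per step without ever building H). The test matrix is illustrative:

```python
import numpy as np

def tridiagonalize(A):
    """Transform a symmetric matrix into a similar tridiagonal matrix
    using Householder transformations (naive O(n^4) sketch)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        u = T[k + 1:, k].copy()            # x = part of column k below the diagonal
        u[0] -= np.linalg.norm(u)          # u = x - ||x||_2 e_1
        if np.linalg.norm(u) == 0.0:
            continue                       # column already has the desired form
        v = u / np.linalg.norm(u)
        Hp = np.eye(n - k - 1) - 2.0 * np.outer(v, v)   # H' for this step
        H = np.eye(n)
        H[k + 1:, k + 1:] = Hp             # H = diag(I, H'), orthogonal
        T = H @ T @ H.T                    # similarity transformation, stays symmetric
    return T

A = np.array([[4.0, 1.0, 2.0, 0.5],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 2.5, 1.5],
              [0.5, 1.0, 1.5, 2.0]])
T = tridiagonalize(A)
print(np.round(T, 8))                                              # tridiagonal
print(np.allclose(np.linalg.eigvalsh(T), np.linalg.eigvalsh(A)))   # same eigenvalues: True
```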
