
Page 1: Numerical Linear Algebrawetton/m521/NotesXVII.pdf · 1 Introduction to Numerical Linear Algebra In this section, we will consider the solution of the symmetric, positive definite

Numerical Linear Algebra

Brian Wetton

November 22, 2010

Contents

1 Introduction to Numerical Linear Algebra
1.1 Notation and Model Problems
1.2 Direct and Fast Transform methods
1.3 Conjugate Gradient Method
1.3.1 How it works (three tricks)
1.3.2 How well it works
1.3.3 Performance on the Model Problems
1.3.4 Preconditioned Conjugate Gradients
2 Spectral Methods

What is Numerical Analysis?

In a recent edition of SIAM News, L.N. Trefethen gives two definitions of Numerical Analysis:

1. the study of rounding errors.

2. the study of algorithms for the problems of continuum mechanics.

It is the second definition that will take priority for us in this class. Specifically, we are considering the FEM applied to elliptic problems. We have already seen how to use the FE approach to discretize elliptic problems and turn them into systems of linear equations whose solution will give us approximations to the solution of the original problem. Then, we got an idea of how a specific geometry, forcing and boundary data can be input into a general package like PLTMG to generate the linear system we wish to solve. In the next section, we will discuss the properties of various methods to solve the resulting system of equations, restricting ourselves to positive definite systems for now. It is important to note that all three aspects (discretization, geometry setup and solution) are necessary for “good” computational results. A fast solver for a poor (or even inconsistent!) discretization does not help, and an accurate discretization that leads to systems that cannot be solved on existing machines is useless. Neither the discretization nor the solution procedure is of any value if you cannot set up the geometry of the problem you wish to solve in a reasonably efficient way.

1 Introduction to Numerical Linear Algebra

In this section, we will consider the solution of the symmetric, positive definite linear systems arising from a FEM discretization of an elliptic problem. We will consider this in the general form

$$A_h U = F$$

where $h$ is considered to be a grid spacing parameter in the discretization. Although this seems like a very different problem from what we considered earlier (analysing the error from a FE discretization), many Numerical Analysts have spent their whole careers considering this question. We will see why a good choice of solver is vital if we want to get accurate answers using less than a decade of computer time.

We consider first two extremes: direct methods and fast transform methods of solution. Then, we turn to the real task of solution procedures for general problems and geometries. We consider conjugate gradient methods and preconditioning; simple iterative methods (Jacobi, Gauss-Seidel, SSOR); using SSOR as a preconditioner for conjugate gradients; and finally the (asymptotic) winner: multigrid methods. Along the way, we consider the question of spectra for elliptic problems and their discretizations. Some notation and model problems are discussed below.

1.1 Notation and Model Problems

We consider the three model problems below for $u$ on the unit square domain $\Omega = [0,1]^2$. The data is taken to be $f = e^{\sin 2\pi(x-y)}$.

1. $-\Delta u = f$ in $\Omega$ and $u = 0$ on $\partial\Omega$ (our original problem).

2. $-\Delta u + u = f$ and $u$ is doubly periodic.

3. $-\Delta u + (2 + \cos(\cos 2\pi(x+y)))\,u = f$ and $u$ is doubly periodic.

Note that the term $\alpha u$ with $\alpha \ge 1$ is added in the periodic problems to give uniqueness of solutions. Although we ultimately want to judge the performance of solvers for FE discretizations of these problems, we will first consider a centered FD discretization.

Note 1 (Very Useful) Many of the properties of solvers for FE discretizations are the same as those of solvers for centered FD approximations.

As support for this claim, we note that the FE discretization of the Poisson problem on a triangulation of a regular square grid is the same (except for some weighting of the forcing term) as the standard centered FD approximation. It is useful to first consider the solution methods applied to FD discretizations because they are easier to implement and the properties of the methods are often much more transparent.

N       time (SPARC 2 CPU seconds)
8       1.2
16      31.8
32      (2035)
64      (1.3e5)
...     ...
1024    (2.18e12 = 69 millennia)

Table 1: CPU time vs N for direct LU method.

As a review of FD discretization, we consider a regular grid approximation of model problem 1 (M1) above on an $N \times N$ grid with grid spacing $h = 1/N$. The discrete values $U_{i,j}$ approximate $u(ih, jh)$ for $i, j = 1, \ldots, N$. The equation for unknown $(i,j)$ (i.e. the $(i,j)$ row of the corresponding matrix $A_h$) is

$$\frac{-U_{i+1,j} - U_{i-1,j} - U_{i,j+1} - U_{i,j-1} + 4U_{i,j}}{h^2} = F_{i,j}.$$
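As a minimal sketch (not part of the original notes), the matrix $A_h$ for M1 can be assembled with `scipy.sparse` as a Kronecker sum of 1D second-difference matrices. The sketch assumes the unknowns are the interior values $U_{i,j}$, $i, j = 1, \ldots, N-1$ (the zero boundary values are eliminated), ordered with $i$ varying fastest; the helper name `laplacian_dirichlet` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_dirichlet(N):
    """Hypothetical helper: 5-point matrix A_h for M1 with h = 1/N.

    Assumes unknowns at interior nodes i, j = 1, ..., N-1 (zero Dirichlet
    data), ordered with i varying fastest.
    """
    h = 1.0 / N
    n = N - 1
    # 1D second difference tridiag(-1, 2, -1)
    T = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
    I = sp.identity(n)
    # kron(I, T) differences along i, kron(T, I) along j; their sum is the
    # 5-point stencil (4 U_ij - four neighbours), scaled by 1/h^2
    return ((sp.kron(I, T) + sp.kron(T, I)) / h**2).tocsr()

A = laplacian_dirichlet(8)
print(A.shape)                                   # (49, 49)
row = A[24].toarray().ravel() / 64               # times h^2 = 1/64
print(row[[17, 23, 24, 25, 31]])                 # stencil: -1, -1, 4, -1, -1
```

Row 24 is an interior node, so its neighbours one step away in $i$ sit at offsets $\pm 1$ and in $j$ at offsets $\pm(N-1)$.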

3D variants of the problems and discretizations will be considered later for operation count comparisons.

A final note on notation should be made here: $N$ denotes the number of grid lines (or planes) in a single coordinate direction, not the total number of unknowns $M$. In 2D $M = O(N^2)$ and in 3D $M = O(N^3)$. We retain $N$ because many of the properties of the solution methods we consider will scale like $N$, not $M$, for different dimensions. Also, the concept of “acceptable” resolution for many problems will scale like $N$. For physical problems, often $N$ is between 50 and 1,000, giving $M \sim$ 2,500 to 1,000,000 in 2D and $M \sim$ 125,000 to 1,000,000,000 in 3D. We will discuss the performance of methods in terms of asymptotically large $N$ (i.e. the best possible, and unattainable, method would solve a discretization of the model problems above in $O(N^2)$ operations in 2D and $O(N^3)$ in 3D). However, note that the number of unknowns can be huge while $N$ is relatively moderate. Therefore, we will also compare the actual performance of the methods for reasonable $N$.

1.2 Direct and Fast Transform methods

We will concentrate on the problem M2 in this section (switching to M1 briefly for a discussion of banded solvers). The FD discretization of the problem results in a system with $M = N^2$ unknowns. The simplest (but inadvisable) thing to do is to call a “black box” matrix solver. I called the NAG routine F04ARF, which does an LU factorization of the matrix and then solves the problem backwards (like Gaussian elimination and backward substitution). Times for various $N$ are shown in table 1. Actually, I ran out of memory for the $N = 32$ calculation, and the numbers shown in brackets are projected from $N = 16$ using the fact that the operation count for the direct solver is $O(M^3) = O(N^6)$, i.e. that every time the resolution is doubled, the computation time goes up by a factor of $2^6 = 64$. Note that the ratio of the $N = 8$ to $N = 16$ times is not 64; this is due to overhead time.

Of course, this just shows that a totally “black box” approach to solving the problem will not be good enough. We see below how some simple information on the banded structure can be used to speed up the computation significantly. For this discussion, we consider the problem M1. If we order the unknowns as $(1,1), (2,1), \ldots, (N,1), (1,2), (2,2), \ldots, (N,N)$ we end up with the following banded structure to $A_h$:

$$A_h = \frac{1}{h^2}\begin{pmatrix} T & -I & & \\ -I & T & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & T \end{pmatrix}, \qquad T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix} \tag{1}$$

where $T$ and the identity $I$ are $N \times N$ blocks.

The bands are of size $N$, and a banded solver can be used here to considerably reduce the computational time involved with the solution of this problem, as shown in table 2. My system runs out of main memory at $N = 128$, and after this point the times are extrapolated using the operation count $O(N^4)$, i.e. every time the resolution doubles, the computation time goes up by a factor of $2^4 = 16$. At $N = 64$ we are close to the asymptotic regime.

N       time (SPARC 2 CPU seconds)
8       0.1
16      0.15
32      1.28
64      17.42
...     ...
1024    (1.14e6 = 13 days)

Table 2: CPU time vs N for banded direct method.
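A minimal sketch of the banded approach, assuming SciPy's `scipy.linalg.solveh_banded` (a banded Cholesky solver for symmetric positive definite matrices) as a stand-in for the banded routine behind table 2, which the notes do not name. Only the upper half-band of $A_h$ is stored. The sketch assumes the interior-node ordering $i, j = 1, \ldots, N-1$ with $i$ fastest, so the half-bandwidth is $N-1$; `banded_laplacian` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import solveh_banded

def banded_laplacian(N):
    """Hypothetical helper: A_h for M1 in the upper banded storage that
    solveh_banded expects, ab[u + i - j, j] = A[i, j] with u = N - 1.

    Assumes interior nodes i, j = 1..N-1 ordered with i fastest (N >= 3).
    """
    n = N - 1                       # unknowns per grid line = half-bandwidth
    M = n * n
    inv_h2 = float(N * N)
    ab = np.zeros((n + 1, M))
    ab[n, :] = 4.0 * inv_h2         # main diagonal
    sup1 = -inv_h2 * np.ones(M - 1) # neighbour (i+1, j)
    sup1[n - 1::n] = 0.0            # no coupling across the end of a grid line
    ab[n - 1, 1:] = sup1
    ab[0, n:] = -inv_h2             # neighbour (i, j+1), n positions away
    return ab

N = 64
ab = banded_laplacian(N)
F = np.ones(ab.shape[1])
U = solveh_banded(ab, F)   # banded Cholesky: O(M n^2) = O(N^4) work
```

Storing only $N$ diagonals is also what keeps memory at $O(N^3)$ instead of the $O(N^4)$ a dense matrix would need.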

Note 2 (On Direct Sparse Matrix Solvers) In the FE discretization of M1, the unknowns are not ordered in such a way that the banded structure is apparent. An additional computation must be done to reorder the unknowns to reduce the bandwidth. This is the simplest form of sparse matrix techniques.
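To illustrate the reordering step, the sketch below scrambles the lexicographic ordering of the M1 matrix (a stand-in, assumed here, for the arbitrary node numbering a mesh generator may produce) and then recovers a small bandwidth with reverse Cuthill-McKee, available in SciPy as `scipy.sparse.csgraph.reverse_cuthill_mckee`. RCM is one common bandwidth-reduction heuristic; the notes do not say which reordering is meant.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Largest |i - j| over the stored nonzeros of A."""
    C = A.tocoo()
    return int(np.abs(C.row - C.col).max())

N = 20
n = N - 1
T = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
I = sp.identity(n)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsr()    # h^2 A_h, lexicographic order

rng = np.random.default_rng(0)
p = rng.permutation(A.shape[0])                # scrambled "mesh" numbering
A_scrambled = A[p][:, p]

q = reverse_cuthill_mckee(A_scrambled, symmetric_mode=True)
A_rcm = A_scrambled[q][:, q]

print(bandwidth(A), bandwidth(A_scrambled), bandwidth(A_rcm))
```

The scrambled matrix has the same sparsity but a bandwidth close to $M$; after RCM it is back to roughly the grid-line width, so a banded solver is useful again.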

For 1D computations, the matrix will be tridiagonal (the bands don’t grow in size), which makes solution techniques trivial in 1D (a tridiagonal solver is $O(N)$, the optimal for 1D). For 3D computations, the bands are proportionally much wider, $O(N^2)$, and banded techniques are even less competitive.

These direct techniques are about as bad as we can do. We see below an example of a “fast method” which is about as good as we can do. We return to M2. The FFT is a fast method to compute the discrete Fourier transform (DFT) of an array of values. For a 2D $N \times N$ array $U_{k,j}$ (index $k, j$ from 0 to $N-1$ here) the DFT $\hat U_{\alpha,\beta}$ is given by

$$\hat U_{\alpha,\beta} = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{j=0}^{N-1} U_{k,j}\, e^{-2\pi i(k\alpha + j\beta)/N} \tag{2}$$

for $\alpha, \beta = 0, \ldots, N-1$. The inverse transform is given by

$$U = \mathcal{F}(\hat U^{*})^{*}$$

where $\mathcal{F}$ also denotes the DFT and $^*$ denotes complex conjugation. The transform (2) is just writing the vector $U$ in terms of coefficients in the orthonormal basis $(e_{\alpha,\beta})_{k,j} := \frac{1}{N} e^{2\pi i(k\alpha + j\beta)/N}$. This basis is especially convenient because

1. Computing $\hat U$ is very efficient using the FFT (assume $N$ is a power of 2 or has “nice” prime factors): it takes $O(N^2 \log N)$ operations.

2. The DFT diagonalizes constant coefficient periodic problems, i.e.

$$A_h e_{\alpha,\beta} = \kappa_{\alpha,\beta}\, e_{\alpha,\beta}$$

where $A_h$ is the finite difference approximation of M2 and

$$\kappa_{\alpha,\beta} = \frac{4 - 2\cos(2\pi\alpha h) - 2\cos(2\pi\beta h)}{h^2} + 1.$$

The fast method for solving M2 is realized as follows: calculate $\hat F$ by FFT; calculate $\hat U_{\alpha,\beta} = \hat F_{\alpha,\beta}/\kappa_{\alpha,\beta}$ pointwise; finally calculate $U$ from $\hat U$ using the inverse FFT. We are left with an algorithm of order $O(N^2 \log N)$. The factor $\log N$ grows very slowly, so we have a near optimal algorithm in which the computational time basically quadruples every time the grid is refined by a factor of two. The performance of the fast method on M2 is given in table 3.

N       time (SPARC 2 CPU seconds)
8       0.14
16      0.20
32      0.36
64      1.06
128     4.01
256     15.6
...     ...
1024    (250)

Table 3: CPU time vs N for fast transform method.

Obviously, this method is far superior to the previous direct methods. It is, in fact, the most efficient solution technique for this problem. However, it is limited to problems with constant coefficients, regular grids and nice geometries. But...

Note 3 (Use a Fast Method When You Can) If you are trying to examine some phenomenon that does not require the presence of complicated boundaries to occur, do the computations in a simple domain with a FD (or spectral in this case) discretization. Fast solvers for periodic channels and boxes in any dimension with Dirichlet or Neumann boundary conditions are easy to implement (in the case of a box, a fast sine or cosine transform is used). This permits much finer resolution computations and may decide whether the phenomena can be observed numerically at all.
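A minimal sketch of both fast solvers, under stated assumptions. For M2 the grid values are $F_{k,j} = f(kh, jh)$, $k, j = 0, \ldots, N-1$, and NumPy's `fft2`/`ifft2` play the role of the FFT (their normalization differs from (2), but the factors cancel in the solve). For the Dirichlet box of Note 3, the unknowns of M1 are taken as the interior values $i, j = 1, \ldots, N-1$, and SciPy's type-I sine transform (`scipy.fft.dstn`/`idstn`) replaces the FFT, since the modes $\sin(\pi p i h)\sin(\pi q j h)$ are eigenvectors of $A_h$ with eigenvalues $(4 - 2\cos(\pi p h) - 2\cos(\pi q h))/h^2$. All function names are hypothetical.

```python
import numpy as np
from scipy.fft import dstn, idstn

def solve_m2_fft(F):
    """Fast solve of the 5-point FD discretization of M2 (doubly periodic).

    F[k, j] = f(k h, j h) for k, j = 0..N-1 with h = 1/N.
    """
    N = F.shape[0]
    h = 1.0 / N
    c = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(N) * h)
    kappa = (c[:, None] + c[None, :]) / h**2 + 1.0   # kappa_{alpha,beta}
    F_hat = np.fft.fft2(F)                           # O(N^2 log N)
    return np.fft.ifft2(F_hat / kappa).real          # imaginary part is roundoff

def solve_m1_dst(F):
    """Fast solve of the 5-point FD discretization of M1 (zero Dirichlet data).

    F holds f at the interior nodes i, j = 1..N-1, shape (N-1, N-1).
    """
    N = F.shape[0] + 1
    h = 1.0 / N
    c = 2.0 - 2.0 * np.cos(np.pi * np.arange(1, N) * h)
    lam = (c[:, None] + c[None, :]) / h**2           # eigenvalue of sine mode (p, q)
    return idstn(dstn(F, type=1) / lam, type=1)      # O(N^2 log N)

def apply_m2(U):
    """A_h for M2 via periodic shifts (used only to check the solve)."""
    N = U.shape[0]
    lap = (4.0 * U - np.roll(U, 1, axis=0) - np.roll(U, -1, axis=0)
           - np.roll(U, 1, axis=1) - np.roll(U, -1, axis=1)) * N**2
    return lap + U

def apply_m1(U):
    """A_h for M1, with the zero boundary values padded in (for checking)."""
    N = U.shape[0] + 1
    P = np.pad(U, 1)
    return (4.0 * P[1:-1, 1:-1] - P[2:, 1:-1] - P[:-2, 1:-1]
            - P[1:-1, 2:] - P[1:-1, :-2]) * N**2

def f(X, Y):
    return np.exp(np.sin(2.0 * np.pi * (X - Y)))

N = 64
x2 = np.arange(N) / N                     # periodic grid for M2
F2 = f(*np.meshgrid(x2, x2, indexing="ij"))
U2 = solve_m2_fft(F2)

x1 = np.arange(1, N) / N                  # interior nodes for M1
F1 = f(*np.meshgrid(x1, x1, indexing="ij"))
U1 = solve_m1_dst(F1)

print(np.abs(apply_m2(U2) - F2).max(), np.abs(apply_m1(U1) - F1).max())
```

Both residuals should sit at roundoff level; neither solver ever forms $A_h$.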


method          1D          2D            3D            comments
optimal         N           N^2           N^3           unattainable except in 1D
direct LU       N^3         N^6           N^9           “black box” but useless
direct banded   N           N^4           N^7           great for 1D, limited use in 2D, useless in 3D
fast            N log N     N^2 log N     N^3 log N     near optimal for 2D and 3D, limited to “nice” geometries and constant coefficient problems

Table 4: Operation counts for some solvers

The discussion of this section is summarized in table 4.

1.3 Conjugate Gradient Method

We now turn to problems where fast transform methods cannot be used (like M3, which has variable coefficients). For this kind of problem, the best method above was the banded solver, since it took advantage of the sparsity of the matrix $A$. However, it certainly did not do as well as possible (the fact that the bands themselves were sparse was not used). For instance, multiplication by $A$ takes advantage of the sparsity, taking $O(M)$ operations. Let us run with this idea. With a sparse $A$ it is easy to construct the vectors

$$F,\ AF,\ A^2F,\ \ldots \tag{3}$$

If we denote the space $\mathrm{span}\{F, AF, \ldots, A^{k-1}F\}$ by $S_k$, we might want to choose an approximation $U_k \in S_k$ to $U$ that minimizes the error $\|U - U_k\|$. There are three “tricks” involved in turning this idea into a useful algorithm. These are described below, followed by an error analysis and then a discussion of the results on application to a test problem.

1.3.1 How it works (three tricks)

It turns out that the “right” norm to use to minimize the error is not the Euclidean norm. We introduce several inner products on the space $\mathbb{R}^M$:

$$(U,V) = \sum_{i=1}^{M} U_i V_i \quad \text{(Euclidean)}$$

$$(U,V)_A = (U, AV)$$

$$(U,V)_{A^{-1}} = (U, A^{-1}V)$$

$$(U,V)_{A^2} = (AU, AV) \quad \text{(residual)}.$$

These norms all make sense for matrices $A$ that are symmetric positive definite. We will discuss this in more detail in the next section and show that the matrices $A_h$ from the discretization of our model problems are positive definite.

Note 4 (On Norms) Just like John, I am being a little cavalier with notation. When continuous norms are being used for the discrete functions $U_h$, we will use the subscript $h$ explicitly on $U$. With this notation, it is easy to show that

$$(U,V)_A = (U_h, V_h)_{H^1_0} \tag{4}$$

$$(U,V)_M = (U_h, V_h)_{L^2} \tag{5}$$

where

$$A_{i,j} = (\nabla\phi^i_h, \nabla\phi^j_h) \tag{6}$$

$$M_{i,j} = (\phi^i_h, \phi^j_h) \tag{7}$$

Since the $H^1_0$ norm is the one in which our original problem was defined, it makes sense to do the minimizing in this norm (it also makes the algorithm amazingly simple, as we will see below). To make the minimization easy, we will construct an $(\cdot,\cdot)_A$-orthogonal ($\perp_A$) sequence $d_i$ such that $d_1, \ldots, d_k$ spans $S_k$ (this could be done, e.g., by applying the Gram-Schmidt process to (3), but it can be done much more efficiently, as shown below). We write our approximation $U_k$ as

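$$U_k = \sum_{i=1}^{k} \alpha_i d_i.$$

In order to minimize the error (in $(\cdot,\cdot)_A$, remember) between $U$ and $U_k \in S_k$, for each $k$ we must have

$$(U, Ad_j) = (U_k, Ad_j) \quad \text{for } j = 1, \ldots, k$$

or

$$\alpha_j = \frac{(U, Ad_j)}{(d_j, Ad_j)} \quad \text{for } j = 1, \ldots, k$$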

using the fact that the $d_i$ are A-orthogonal. I think this is clear in itself, but it is a consequence of the following result: if $U$ is given, $L$ is a linear subspace of $\mathbb{R}^M$, and $V \in L$ minimizes $\|U - V\|_\star$ over $V \in L$, then $U - V \in L^{\perp_\star}$. Now comes the magic aspect of the choice of inner product (using the symmetry of $A$ and $AU = F$):

$$\alpha_j = \frac{(U, Ad_j)}{(d_j, Ad_j)} = \frac{(AU, d_j)}{(d_j, Ad_j)} = \frac{(F, d_j)}{(d_j, Ad_j)}$$

and so the coefficients in the approximations for $U$ do not involve $U$ (good, since we don’t know it) but only $F$ (which is known).
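The claim that the coefficients need only $F$ can be checked numerically. The sketch below is an illustration, not an efficient method: it builds the Krylov vectors (3) for a small random symmetric positive definite matrix (an assumption made only for the test), $A$-orthogonalizes them by Gram-Schmidt, forms $U_k$ from $\alpha_j = (F, d_j)/(d_j, Ad_j)$, and confirms that the error $U - U_k$ is $A$-orthogonal to $S_k$, which is the optimality condition above.

```python
import numpy as np

rng = np.random.default_rng(1)
M, k = 30, 6
B = rng.standard_normal((M, M))
A = B @ B.T + M * np.eye(M)          # random SPD test matrix (assumption)
F = rng.standard_normal(M)
U = np.linalg.solve(A, F)            # used only to check the result

# Krylov vectors F, AF, ..., A^{k-1}F, normalized to limit growth
K = [F / np.linalg.norm(F)]
for _ in range(k - 1):
    v = A @ K[-1]
    K.append(v / np.linalg.norm(v))

# Gram-Schmidt in the A inner product (U, V)_A = (U, AV)
d = []
for v in K:
    w = v.copy()
    for di in d:
        w -= (w @ A @ di) / (di @ A @ di) * di
    d.append(w)

# coefficients use F only, never U
alpha = [(F @ di) / (di @ A @ di) for di in d]
U_k = sum(a * di for a, di in zip(alpha, d))

# optimality: (U - U_k, A d_j) = 0 for every j
print(max(abs((U - U_k) @ A @ di) for di in d))
```

Building all $d_i$ this way costs storage and work growing with $k$; the next trick removes that.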

The algorithm is made practical by a trick to calculate the $\perp_A$ vectors without needing the previous vectors. This is accomplished by introducing the residual $r_k = Ax_k - b = A(x_k - x)$ (in this derivation we write $x$ for $U$ and $b$ for $F$). Note that $h_k := r_k + b = Ax_k$ is in the space $AS_k$. We rewrite the original minimization problem to show that $h_k$ minimizes the problem

$$\|A^{-1}(h - b)\|_A^2, \quad h \in AS_k.$$

Since

$$\|A^{-1}x\|_A^2 = (A^{-1}x, AA^{-1}x) = (A^{-1}x, x) = \|x\|_{A^{-1}}^2,$$

our problem is also that of minimizing

$$\|h - b\|_{A^{-1}}^2, \quad h \in AS_k.$$

As before, the minimizer $h_k$ must be such that $h_k - b = r_k$ is $\perp_{A^{-1}}$ to $AS_k$. This shows easily that

$$r_k \perp S_k \tag{8}$$

$$r_k \perp_A S_{k-1}. \tag{9}$$

Theorem 1 The vector $d_k$ is in $\mathrm{span}\{r_{k-1}, d_{k-1}\}$.

Proof: Note that $S_k = \mathrm{span}\{F, r_1, \ldots, r_{k-1}\}$. This follows easily from the recurrence relation $r_k = r_{k-1} + \alpha_k A d_k$. Thus, we can write

$$d_k = r_{k-1} - \sum_{i=1}^{k-1} \gamma_i d_i.$$

We take A-inner products to solve for $\gamma_i$:

$$\gamma_i = \frac{(r_{k-1}, Ad_i)}{(d_i, Ad_i)}.$$

Now, using (9) we know that $\gamma_i = 0$ for $i = 1, \ldots, k-2$.

We change notation slightly from the theorem and write

$$d_k = r_{k-1} + \beta_k d_{k-1}$$

where $\beta_k = -\gamma_{k-1} = -(r_{k-1}, Ad_{k-1})/(d_{k-1}, Ad_{k-1})$.

Using the orthogonality results above, many other formulas for $\alpha$ and $\beta$ can be derived, as in the algorithm described below. (The algorithm uses the residual with the opposite sign, $r_j = F - AU_j$; this does not change the orthogonality relations (8) and (9).) Begin with $U_0 = 0$ and $r_0 = F$ and compute, for $j = 1, 2, \ldots$:

1. $\beta_j = (r_{j-1}, r_{j-1})/(r_{j-2}, r_{j-2})$ (except $\beta_1 = 0$)

2. $d_j = r_{j-1} + \beta_j d_{j-1}$ (except $d_1 = r_0$)

3. $\alpha_j = (r_{j-1}, r_{j-1})/(d_j, Ad_j)$

4. $U_j = U_{j-1} + \alpha_j d_j$

5. $r_j = r_{j-1} - \alpha_j A d_j$

We now show that the above method (called the conjugate gradient method) terminates with the exact solution in a finite number of steps.

Theorem 2 Let $M^* = \dim S_M$. The CG algorithm provides the exact solution $U$ in $M^*$ steps (i.e. $U = U_{M^*}$).

Proof: If $M^* = M$ then $S_M = \mathbb{R}^M$ and so $U_M = U$. Otherwise, consider $r_{M^*} = Ax_{M^*} - b$. Note that $r_{M^*} \in S_{M^*}$, since this is the biggest space generated by the action of $A$ on $F$. Also, by (8), $r_{M^*} \in S_{M^*}^{\perp}$. Therefore, $r_{M^*} = 0$ and so $U = U_{M^*}$.

Note that only one matrix multiply is needed per iteration. Therefore, the total operation count is $O(M^2)$ to get the exact solution. This is $O(N^4)$ in 2D and $O(N^6)$ in 3D. However, we make the following remark.

Note 5 (You Don’t Need The Exact Solution of the Discrete Problem) Our discrete solution $U_h$ is only an approximation of the exact solution. Therefore, we can terminate the CG method before we have the exact solution to the discrete problem and not care, as long as our answer is “accurate enough”. However, it is often helpful to make the error from the solution procedure much smaller than the discretization error so as not to confuse the source of errors. When the method has been tested on a class of problems, the accuracy of the solution procedure can be relaxed to increase efficiency.
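The CG steps 1 to 5 above, with the early stopping that Note 5 allows, can be sketched in a few lines of NumPy. The matrix enters only through a user-supplied function `apply_A` (a hypothetical name), so the one multiply per iteration can be a sparse product or a stencil. The stopping rule $\|r_j\| \le \mathrm{tol}\,\|F\|$ is an assumption here; section 1.3.3 discusses A-norm and residual-norm criteria.

```python
import numpy as np

def cg(apply_A, F, tol=1e-10, maxiter=None):
    """Conjugate gradients, steps 1-5, for symmetric positive definite A.

    apply_A(v) must return A v. Starts from U_0 = 0, stops when
    ||r_j|| <= tol * ||F|| (an assumed stopping rule), returns (U_j, j).
    """
    U = np.zeros_like(F, dtype=float)
    r = np.array(F, dtype=float)                  # r_0 = F
    d = np.zeros_like(r)
    rr = rr0 = float(np.vdot(r, r))
    rr_old = rr
    maxiter = r.size if maxiter is None else maxiter
    for j in range(1, maxiter + 1):
        if rr <= tol**2 * rr0:                    # residual small enough: stop
            return U, j - 1
        beta = 0.0 if j == 1 else rr / rr_old     # step 1
        d = r + beta * d                          # step 2 (d_1 = r_0)
        Ad = apply_A(d)                           # the only multiply by A
        alpha = rr / float(np.vdot(d, Ad))        # step 3
        U = U + alpha * d                         # step 4
        r = r - alpha * Ad                        # step 5
        rr_old, rr = rr, float(np.vdot(r, r))
    return U, maxiter

# Usage (assumed example): 1D analogue -u'' = 1 on (0, 1), u(0) = u(1) = 0,
# whose FD solution is exactly x(1 - x)/2 at the nodes
n = 99
h = 1.0 / (n + 1)

def apply_A(v):
    w = 2.0 * v
    w[1:] -= v[:-1]
    w[:-1] -= v[1:]
    return w / h**2

x = h * np.arange(1, n + 1)
U, its = cg(apply_A, np.ones(n), maxiter=1000)
print(its, np.abs(U - x * (1 - x) / 2).max())
```

Only $r$, $d$, $U$ and $Ad$ are stored, which is the practical payoff of Theorem 1.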

1.3.2 How well it works

We know that the CG method will give the exact solution in at most $M$ iterations, but how large is the error at each step? Recall that

$$\|U - U_k\|_A^2 = \|r_k\|_{A^{-1}}^2 = \min_{h_k \in AS_k} \|F - h_k\|_{A^{-1}}^2. \tag{10}$$

Notice that all vectors of the form $F - h_k$ can be written as $P(A)F$, where $P$ is a polynomial in $\Pi_k$, the set of all polynomials of degree $\le k$ with $P(0) = 1$. Recall that $A$ is symmetric positive definite, so it has a full set of orthonormal (in the Euclidean inner product $(\cdot,\cdot)$) eigenvectors $v_i$ with eigenvalues $\lambda_i$. We expand $F$ in this basis,

$$F = \sum_{i=1}^{M} a_i v_i,$$

and note that $U = \sum_{i=1}^{M} \lambda_i^{-1} a_i v_i$ and that $\|F\|^2 = \sum a_i^2$, $\|F\|_A^2 = \sum \lambda_i a_i^2$, etc. Suppose we pick a polynomial $P$ in $\Pi_k$ such that

$$|P(\lambda_i)| \le M \quad \text{for all } i.$$

Then

$$P(A)F = \sum P(\lambda_i)\, a_i v_i$$

and so

$$\|P(A)F\|_{A^{-1}}^2 = \sum \lambda_i^{-1} P^2(\lambda_i)\, a_i^2 \le M^2 \sum \lambda_i^{-1} a_i^2 = M^2 \|U\|_A^2.$$

Therefore, using (10), we have

$$\|U - U_k\|_A \le M \|U\|_A.$$

We see that the study of the performance of CG methods for finite iterations boils down to the study of polynomials on the spectral set of $A$ (more on spectra in the next section). To prove the main theorem of this section, we will use some properties of the Chebyshev polynomials $T_k$ coming from the two equivalent formulas below:

1. $T_k(x) = \frac{1}{2}\left[\left(x + \sqrt{x^2 - 1}\right)^k + \left(x - \sqrt{x^2 - 1}\right)^k\right]$.

2. $T_k(x) = \cos[k \cos^{-1} x]$ for $|x| \le 1$.

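From formula 2 we see that

$$\max_{|x| \le 1} |T_k(x)| = 1 \tag{11}$$

and

$$T_k(x_i) = (-1)^i \quad \text{for } x_i = \cos(i\pi/k),\ i = 0, 1, \ldots, k. \tag{12}$$

From formula 1 (for $x = (a+1)/(a-1)$ one has $x + \sqrt{x^2 - 1} = (\sqrt{a}+1)/(\sqrt{a}-1)$) we get the bound

$$T_k\!\left(\frac{a+1}{a-1}\right) > \frac{1}{2}\left(\frac{\sqrt{a}+1}{\sqrt{a}-1}\right)^k. \tag{13}$$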

We are now in a position to state the main theorem, which gives a bound on the error after $k$ steps using only the values of the extreme eigenvalues, i.e. $\lambda_1$ and $\lambda_M$ with $\lambda_1 < \lambda_M$ and $\lambda_i \in [\lambda_1, \lambda_M]$ for all $i$.

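Theorem 3 The CG iterates satisfy

$$\|U - U_k\|_A \le 2\gamma^k \|U\|_A \tag{14}$$

where

$$\gamma = \frac{\sqrt{a} - 1}{\sqrt{a} + 1}$$

and $a = \lambda_M/\lambda_1$. In addition, the number of iterations $p(\varepsilon)$ needed to reduce the initial error $\|U\|_A$ by a factor of $\varepsilon$ satisfies

$$p(\varepsilon) \le \frac{1}{2}\sqrt{a}\,\ln(2/\varepsilon) + 1. \tag{15}$$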


Proof: Since we know nothing about the structure of the spectrum in the interval $[\lambda_1, \lambda_M]$, we will attempt to find the polynomials in $\Pi_k$ that have a minimal maximum value $B_k$ over the interval. It turns out that the scaled Chebyshev polynomials

$$P_k(x) = \frac{T_k[(\lambda_M + \lambda_1 - 2x)/(\lambda_M - \lambda_1)]}{T_k[(\lambda_M + \lambda_1)/(\lambda_M - \lambda_1)]}$$

have this property. Note that the scaling of the argument in the numerator maps the interval $[\lambda_1, \lambda_M]$ onto the interval $[-1, 1]$ and that the argument in the denominator is greater than 1, so the value cannot be zero (all the zeros of $T_k$ are in $[-1, 1]$). The maximum value of $P_k$ on the spectral interval is

$$C_k = T_k[(\lambda_M + \lambda_1)/(\lambda_M - \lambda_1)]^{-1} \tag{16}$$

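which occurs $k+1$ times (with alternating sign) at the mapped points $x_i$ from (12). To show that $P_k$ has the desired property, assume that there is a polynomial $Q_k \in \Pi_k$ with

$$\max_{x \in [\lambda_1, \lambda_M]} |Q_k(x)| = B_k < C_k.$$

Now consider $R_k = Q_k - P_k$. This has a zero at $x = 0$ (both polynomials equal 1 there) and alternates sign at the $k+1$ mapped points $x_i$ (since $|Q_k(x)| < C_k$ at all points in the interval). Since $R_k$ must have a zero between consecutive points where it changes sign, it has $k$ zeros in the interval; together with the zero at $x = 0$ (which lies outside $[\lambda_1, \lambda_M]$ because $\lambda_1 > 0$), this gives $k+1$ zeros of a polynomial of degree at most $k$, so $R_k \equiv 0$, contradicting $B_k < C_k$. This proves the optimal quality of the polynomial $P_k$. Since $(\lambda_M + \lambda_1)/(\lambda_M - \lambda_1) = (a+1)/(a-1)$, the bound (16) along with (13) proves (14).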

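We now turn to a proof of (15). Clearly, we will satisfy the condition if $p$ satisfies

$$2\gamma^p \le \varepsilon$$

or

$$p \ge \frac{\log(2/\varepsilon)}{\log(1/\gamma)}.$$

Since

$$\log\left[\frac{\sqrt{a}+1}{\sqrt{a}-1}\right] > \frac{2}{\sqrt{a}} \quad \text{for all } a > 1,$$

the result (15) follows.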
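To see what (15) means in practice, the sketch below (hypothetical helper `cg_iteration_counts`) compares the smallest $p$ with $2\gamma^p \le \varepsilon$ against the right-hand side of (15) for a few condition ratios $a$.

```python
import math

def cg_iteration_counts(a, eps):
    """Return (smallest p with 2*gamma**p <= eps, right-hand side of (15))."""
    gamma = (math.sqrt(a) - 1.0) / (math.sqrt(a) + 1.0)
    p_min = math.ceil(math.log(2.0 / eps) / math.log(1.0 / gamma))
    p_bound = 0.5 * math.sqrt(a) * math.log(2.0 / eps) + 1.0
    return p_min, p_bound

for a in (10.0, 1e3, 1e5):
    print(a, cg_iteration_counts(a, 1e-7))
```

Both numbers grow like $\sqrt{a}$, so the estimate (15) is close to the exact consequence of (14) for large $a$.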

Clearly, this result is not optimal, because it does not take into account the distribution of the eigenvalues within the interval. For instance, even if $\lambda_1 = 1$ and $\lambda_M = 1000$ (a large ratio, so the theorem predicts poor performance), we will get convergence in two iterations if these are the only eigenvalues present (with as many multiplicities as you want).
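The two-eigenvalue remark can be checked through the polynomial argument behind (10): $P(x) = (1 - x)(1 - x/1000)$ lies in $\Pi_2$ and vanishes at both eigenvalues, so $P(A)F = 0$ and hence $U_2 = U$. The sketch assumes a random orthonormal eigenbasis and multiplicities chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 40
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))   # orthonormal eigenvectors
lam = np.where(np.arange(M) < 25, 1.0, 1000.0)      # only eigenvalues 1 and 1000
A = Q @ np.diag(lam) @ Q.T
F = rng.standard_normal(M)

# P(x) = (1 - x)(1 - x/1000) = 1 - 1.001 x + 0.001 x^2, so P(0) = 1
PF = F - 1.001 * (A @ F) + 0.001 * (A @ (A @ F))
print(np.linalg.norm(PF) / np.linalg.norm(F))        # roundoff level
```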

We now consider what the spectrum looks like for one of our model problems, M2. Actually, we have the answer already, because the problem was diagonalized by the DFT. The eigenvalues were

$$\kappa_{\alpha,\beta} = \frac{4 - 2\cos(2\pi\alpha h) - 2\cos(2\pi\beta h)}{h^2} + 1.$$

Thus the minimum eigenvalue is 1 and the maximum is $8/h^2 + 1$, so the ratio is $a \sim 8N^2$. Therefore, using the results of the theorem above, we get convergence to tolerance $\varepsilon$ in $O(N \log(1/\varepsilon))$ iterations. For simplicity of notation we assume that $\varepsilon \sim h^q$ (i.e. that we choose $\varepsilon$ to be comparable with some discretization error). In this case, we get convergence in $O(N \log N)$ iterations. The same holds in 3D, because the ratio of extreme eigenvalues of the discrete operator is still $O(N^2)$ there. Therefore, since each iteration costs $O(M)$ operations, we get sufficient accuracy in $O(N^3 \log N)$ operations in 2D and $O(N^4 \log N)$ in 3D. We observe this behaviour in the example calculations described below.
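The spectral ratio can be tabulated directly from $\kappa_{\alpha,\beta}$. This sketch (hypothetical helper `m2_eigenvalues`, even $N$ assumed) confirms that the smallest eigenvalue is 1, the largest is $8/h^2 + 1$, that $a/N^2 \to 8$, and that $N(1 - \gamma)$ settles to a constant, i.e. $\gamma = 1 - O(h)$.

```python
import numpy as np

def m2_eigenvalues(N):
    """All N^2 eigenvalues kappa_{alpha,beta} of the FD matrix for M2."""
    h = 1.0 / N
    c = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(N) * h)
    return ((c[:, None] + c[None, :]) / h**2 + 1.0).ravel()

for N in (16, 32, 64, 128):
    lam = m2_eigenvalues(N)
    a = lam.max() / lam.min()
    gamma = (np.sqrt(a) - 1.0) / (np.sqrt(a) + 1.0)
    print(N, lam.min(), lam.max(), a / N**2, N * (1.0 - gamma))
```

For $N = 16$ this gives the spectral interval $[1, 2049]$ with 256 eigenvalues, as in figure 1 below.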

Note 6 (Good News, Bad News) This is a remarkable improvement over the original CG method (run to termination) that we get essentially for free. However, we could still be unhappy because the rate of convergence gets worse as the problem gets bigger. We would like the rate to be constant as the problem gets bigger (since we have to do more work per iteration anyway). As far as I know, this property holds only for some “nice” preconditioned CG methods (an example is given below) and Multi-Grid methods.

1.3.3 Performance on the Model Problems

From the discussion of the previous section, we expect the conjugate gradient method to converge to the exact solution to some fixed accuracy in the A-norm in $O(N)$ iterations. We will do some model calculations on M2, because in this case the exact (discrete) solution can be found using the FFT and then compared to the CG iterates (in $\|\cdot\|_A$). We compute the convergence factor

$$\rho_k = \left(\frac{\|U - U_k\|_A}{\|U\|_A}\right)^{1/k}$$

and expect $\rho_k \sim 2^{1/k}\gamma$ from our previous analysis. We take $\varepsilon = 10^{-7}$ and compute until the relative error is less than $\varepsilon$ in the A-norm. The results for $N = 16$ are shown in table 5. The surprising feature of this table is that the convergence factors decrease during the computation, unlike the behaviour predicted by the bound. There are a number of reasons for this. First, the eigenvalues are not uniformly distributed, as shown in Fig. 1 (there are actually 256 eigenvalues in there, with many duplications). Also, we have not taken into account the properties of the coefficients $a_i$ in the modal decomposition of $F$. In this case, the $a_i$’s decay very rapidly, so the CG algorithm does not “feel” the effect of large eigenvalues (see discussion below). There are also some subtleties with the decrease in convergence factor, or “superlinear convergence”, discussed in [9].

We continue applying the CG method to M2 with different $N$. The number of iterations to convergence, the first convergence factor, and the total calculation time are shown in table 6. Notice that the first convergence factors $\rho_1$ are not tending to 1 with increasing $N$ (the prediction was $\rho \sim \gamma = 1 - O(h)$). As discussed above, we expect the convergence factors to improve, but at the first step one would think the method will “see” the full spectrum. Again, the observed effect is due to a peculiarity of this sample problem, where the $a_i$ (or equivalently the $\hat F_{\alpha,\beta}$) decay very rapidly. In fact, we will see in the discussion of spectra below that if $f$ is smooth, then

$$|\hat F_{\alpha,\beta}| \le \frac{K_r}{|\alpha|^r + |\beta|^r}$$

for every $r$. In this case, $f$ is smooth and the rapid decay of the data means that the CG method does not “see” the effect of the large eigenvalues. Despite the peculiarities of this example, we see that the number of iterations needed for convergence grows linearly with $N$, as predicted.

iteration   ‖U − U_k‖_A / ‖U‖_A   ρ_k
1           1.23                  0.974
2           1.20                  0.975
3           1.15                  0.969
4           0.95                  0.930
5           0.34                  0.769
6           3.62e-02              0.552
7           1.26e-03              0.372
8           8.69e-06              0.226
9           1.38e-14              2.81e-02

Table 5: Performance of CG on M2 with N = 16

[Figure 1: Eigenvalue distribution for a FD discretization of M2. The large marks are zero (left) and 2500 (right). The spectrum lies between 1 and 2049.]

N       # iterations   ρ_1        time
16      9              0.9744     0.2
32      13             0.9748     1.01
64      20             0.9749     6.42
128     37             0.9749     52.7
256     73             0.9749     416
...     ...            ...        ...
1024    (292)          (0.9749)   (26K)

Table 6: Performance of CG on M2.

N       # iterations   ρ_1        1 − ρ_1    time
16      32             0.9761     0.0239     0.25
32      64             0.9882     0.0118     2.22
64      126            0.9941     0.0059     19.9
128     252            0.9971     0.0029     196
...     ...            ...        ...        ...
1024    (2016)         (0.9996)   (0.0004)   (100K)

Table 7: Performance of CG on M2 with nonsmooth data.

We now apply the method to the problem M2 with a modified $f = e^{\sin(x-y)}$. This function is smooth on $[0,1]^2$ but not smooth as a periodic function, so the components of $F$ along eigenvectors with large eigenvalues are no longer negligible. The performance is shown in table 7. The $\rho_1$ values now show $1 - O(h)$ behaviour.

Question 1 Can you show why we see the $\rho_1 = 1 - O(h)$ behaviour? With even rougher data (something like approximate delta functions), will we see the expected $\rho_1 = 1 - O(h^2)$ behaviour?

We return to the original smooth data for M2. We knew the exact discrete solution to this problem (from the fast method) and so could measure the error in the A-norm. This is artificial, since we rarely know the solution beforehand. However, a convenient norm is the residual norm $\|\cdot\|_{A^2}$ since

$$\|U - U_k\|_{A^2} = \|A(U - U_k)\| = \|r_k\|$$

and the $r_k$ are the residuals calculated during the CG process (they do not require the exact solution $U$, of course). The performance of the CG method with a stopping criterion based on a relative error of $\varepsilon = 10^{-7}$ in this norm is shown in table 8. The number of iterations required for convergence is roughly the same for the residual and A-norms for this and other problems tested (convergence in residual norm is in general a stricter criterion). Note that the residuals can grow in size: we designed the algorithm to reduce the error in the A-norm, not the residual.

Tests on the model problems M1 and M3 (using convergence in residual norm) show similar behaviour to that shown in table 7. The FEAT test problem also shows a similar behaviour (try it!).


iteration   ‖U − U_k‖_{A^2} / ‖U‖_{A^2}   ρ_k
1           2.75                          1.82
2           4.16                          1.66
3           7.65                          1.71
4           16.2                          1.81
5           11.6                          1.50
6           1.48                          0.997
7           5.57e-02                      0.624
8           3.93e-04                      0.356
9           1.77e-13                      3.65e-02

Table 8: Performance of CG on M2 with N = 16 using residual norm

Note 7 The exact performance of the CG method depends on the particular problem and data. However, the predicted behaviour of convergence in $O(N^3)$ operations in 2D is borne out by the experiments.

1.3.4 Preconditioned Conjugate Gradients

Remember that the performance of the CG method depends on the location of theeigenvalues of A. It would be great if we could modify the matrix A to A so that thenew matrix had a “better” spectrum. We consider preconditioning by a symmetricpositive definite matrix M such that MA has a better spectrum than A and that Mis efficient to apply. Typically, M is some approximation to A−1. Note that to retainsymmetry, A must actually be defined as M1/2AM1/2 where M1/2 is the uniquelydefined positive definite symmetric square root of M . However, in the algorithmdefined below, M1/2 is never used. The preconditioned CG or PCG algorithm beginswith r0 = F , z0 = d1 = Mr0 , U0 = 0 and continues iteratively:

1. $\alpha_j = (z_{j-1}, r_{j-1})/(d_j, A d_j)$

2. $U_j = U_{j-1} + \alpha_j d_j$

3. $r_j = r_{j-1} - \alpha_j A d_j$

4. $z_j = M r_j$

5. $\beta_j = (z_j, r_j)/(z_{j-1}, r_{j-1})$

6. $d_{j+1} = z_j + \beta_j d_j$

Note that only one multiplication by $A$ and one multiplication by $M$ are done every iteration. A subtle point is that the iterates from this procedure are optimal in the $A$-norm, not in the $\tilde A$-norm. This is not of real practical concern.
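The six steps can be transcribed almost literally. Below is a minimal pure-Python sketch; the names `pcg`, `apply_A` and `apply_M` are illustrative and do not come from any library. $A$ and $M$ are passed in as functions, so $M^{1/2}$ never appears and each iteration makes exactly one call to each. The stopping test $\|r_j\| \le \varepsilon\|F\|$ is an assumption of this sketch. The usage lines apply it to a hypothetical badly scaled diagonal system, once with $M = I$ (plain CG) and once with the diagonal scaling $M = \mathrm{diag}(a_{i,i}^{-1})$.

```python
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def pcg(apply_A, apply_M, F, eps=1e-10, maxiter=500):
    """Preconditioned CG following steps 1-6, starting from U_0 = 0.

    apply_A(v) returns A v; apply_M(v) returns M v for an SPD M approximating A^{-1}.
    Stops when ||r_j|| <= eps * ||F||; returns (U_j, j).
    """
    U = [0.0] * len(F)
    r = list(F)                                            # r_0 = F
    z = apply_M(r)                                         # z_0 = M r_0
    d = list(z)                                            # d_1 = z_0
    zr = dot(z, r)                                         # (z_0, r_0)
    norm_F = math.sqrt(dot(F, F))
    for j in range(1, maxiter + 1):
        Ad = apply_A(d)                                    # the one product with A
        alpha = zr / dot(d, Ad)                            # step 1
        U = [u + alpha * di for u, di in zip(U, d)]        # step 2
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]   # step 3
        if math.sqrt(dot(r, r)) <= eps * norm_F:
            return U, j
        z = apply_M(r)                                     # step 4: the one product with M
        zr_new = dot(z, r)
        beta = zr_new / zr                                 # step 5
        d = [zi + beta * di for zi, di in zip(z, d)]       # step 6
        zr = zr_new
    return U, maxiter


# Hypothetical badly scaled diagonal system.
a = [1.0, 10.0, 100.0, 1000.0]
F = [1.0, 1.0, 1.0, 1.0]


def apply_A(v):
    return [ai * vi for ai, vi in zip(a, v)]


def jacobi(v):          # M = diag(1 / a_ii): the diagonal scaling
    return [vi / ai for ai, vi in zip(a, v)]


def identity(v):        # M = I: plain CG
    return list(v)


U_cg, its_cg = pcg(apply_A, identity, F)
U_jac, its_jac = pcg(apply_A, jacobi, F)
print(its_cg, its_jac)
```

With $M = I$ the loop reduces to unpreconditioned CG. With the diagonal scaling it stops after one iteration, because $MA = I$ when $A$ is diagonal.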

Question 2 How does the above algorithm get around the problem of calculating $M^{1/2}$?


N       # iterations   time
16      4              0.29
32      4              0.85
64      4              3.05
128     4              12.6
...     ...            ...
1024    (4)            (806)

Table 9: Performance of preconditioned CG on M3.

The simplest preconditioner is the one used automatically by FEAT: a diagonal scaling. Here, $M$ is the diagonal matrix with entries $a_{i,i}^{-1}$, where $a_{i,i}$ are the diagonal entries of $A$. This removes any scaling differences in the system. Applying such a preconditioner to a diagonal $A$ will give convergence in one iteration.
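The one-iteration claim can be checked directly from steps 1 to 3 of the PCG algorithm. For a diagonal $A$, the diagonal scaling gives exactly $M = A^{-1}$, so the first preconditioned direction is already the solution. A short worked derivation:

```latex
% Diagonal A with the diagonal scaling M = diag(a_{i,i}^{-1}) = A^{-1}, starting from U_0 = 0:
\begin{aligned}
z_0 &= M r_0 = A^{-1} F = U, \qquad d_1 = z_0 = U, \\
\alpha_1 &= \frac{(z_0, r_0)}{(d_1, A d_1)} = \frac{(U, F)}{(U, AU)} = \frac{(U, F)}{(U, F)} = 1, \\
U_1 &= U_0 + \alpha_1 d_1 = U, \qquad r_1 = r_0 - \alpha_1 A d_1 = F - AU = 0.
\end{aligned}
```

For a general $A$, the same scaling only makes the diagonal of $MA$ equal to one. That is why it removes scaling differences but does not by itself fix a poor spectrum.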

We now turn to a more interesting example. We could not solve the problem M3 using a fast transform method because of the variable coefficients. However, we could easily solve the closely related constant coefficient problem M2. This suggests using the preconditioner $M = A_2^{-1}$ (where $A_2$ is the matrix from M2) for $A_3$ from M3. In the algorithm described above, we must compute $z = Mr$, i.e. the $z$ that satisfies $A_2 z = r$, which can be done using the FFT technique described earlier. The performance of the method is described in table 9. It appears that a constant number of iterations is needed for every $N$. A look at the spectra for $N = 16$ and $N = 32$ shown in figure 2 reveals why: the eigenvalues of $A_2^{-1} A_3$ are contained in an interval $[c, 1]$, where $c > 0$ does not depend on $N$ (I believe this can be shown rigorously). Therefore, every PCG iteration will reduce the error by at least a constant factor. This gives us an $O(N^2 \log N \log(1/\varepsilon))$ algorithm, since each iteration requires a fast solve costing $O(N^2 \log N)$.
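Here is a minimal pure-Python sketch of the same idea. It does not use M2 or M3. Instead it assumes a hypothetical periodic 1D analogue, $A_3 u = -u_{xx} + b(x)u$ with $b(x) = 2 + \sin x$, discretized by second-order differences and preconditioned by the constant-coefficient $A_2 u = -u_{xx} + 2u$. Because $A_2$ is circulant, $A_2 z = r$ is solved by dividing DFT coefficients by the eigenvalues $(4/h^2)\sin^2(\pi a/N) + 2$. An $O(N^2)$ DFT stands in for the FFT to keep the code dependency-free. For this choice, $1 \le b \le 3$, and comparing Rayleigh quotients puts the eigenvalues of $A_2^{-1}A_3$ in $[1/2, 3/2]$ for every $N$. The PCG iteration count should therefore stay bounded as $N$ grows, while plain CG needs more iterations.

```python
import cmath
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def dft(v):
    """vhat[a] = sum_k v[k] exp(-2 pi i k a / N): an O(N^2) stand-in for an FFT."""
    N = len(v)
    return [sum(v[k] * cmath.exp(-2j * math.pi * k * a / N) for k in range(N))
            for a in range(N)]


def idft_real(vhat):
    """Inverse of dft(), keeping the real part (all vectors here are real)."""
    N = len(vhat)
    return [(sum(vhat[a] * cmath.exp(2j * math.pi * k * a / N) for a in range(N)) / N).real
            for k in range(N)]


def pcg(apply_A, apply_M, F, eps=1e-8, maxiter=500):
    """PCG (steps 1-6 of the algorithm) from U_0 = 0; returns (U, iterations)."""
    U = [0.0] * len(F)
    r = list(F)
    z = apply_M(r)
    d = list(z)
    zr = dot(z, r)
    norm_F = math.sqrt(dot(F, F))
    for j in range(1, maxiter + 1):
        Ad = apply_A(d)
        alpha = zr / dot(d, Ad)
        U = [u + alpha * di for u, di in zip(U, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        if math.sqrt(dot(r, r)) <= eps * norm_F:
            return U, j
        z = apply_M(r)
        zr, zr_old = dot(z, r), zr
        d = [zi + (zr / zr_old) * di for zi, di in zip(z, d)]
    return U, maxiter


def iterations(N, precondition=True):
    """Iterations for the hypothetical periodic problem -u_xx + b(x) u = F on N points."""
    h = 2 * math.pi / N
    x = [k * h for k in range(N)]
    b = [2.0 + math.sin(xk) for xk in x]                  # hypothetical b(x), 1 <= b <= 3
    lam = [4 * math.sin(math.pi * a / N) ** 2 / h ** 2 + 2.0 for a in range(N)]  # eigenvalues of A2

    def apply_A3(u):                                      # periodic second differences + b u
        return [(2 * u[k] - u[k - 1] - u[(k + 1) % N]) / h ** 2 + b[k] * u[k]
                for k in range(N)]

    def solve_A2(r):                                      # z = A2^{-1} r via the DFT
        return idft_real([c / lam_a for c, lam_a in zip(dft(r), lam)])

    F = [math.sin(1.0 + k * k) for k in range(N)]        # hypothetical data, all frequencies
    apply_M = solve_A2 if precondition else (lambda v: list(v))
    return pcg(apply_A3, apply_M, F)[1]


for N in (16, 32, 64):
    print(f"N = {N:3d}   PCG: {iterations(N):3d}   CG: {iterations(N, precondition=False):3d}")
```

The right-hand side `F` is an arbitrary vector chosen to excite all frequencies. With very smooth data, plain CG can look deceptively good because only a few modes are active.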

Note 8 If your problem is remotely similar to a problem that you can use a fast method on, then you can use the fast method as an effective preconditioner.

However, many problems do not fit into this category. We need effective preconditioners that can be used for more general problems. This will be our goal in later lectures.

Note 9 (Warning) Most of the asymptotic properties of the methods became evident at $N = 32$ or $N = 64$. However, at this grid size, the problems we were looking at were “over-resolved”, i.e. we were computing the exact solution to within a relative error of less than 1%. There is clearly no application where we would need this accuracy. Therefore, one must be careful in reading off the properties of methods in this way. We will try to return to the performance of different methods on “real” problems at more modest resolution whenever possible.


Figure 2: Eigenvalue distribution for M3 preconditioned by M2. The longer marks are zero (left) and 1.2 (right). The spectra are shown for $N = 16$ and $N = 32$.


2 Spectral Methods

We are considering a 1D model problem:

$$A u := -\mu u_{xx} + b u = f \qquad (17)$$

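with $x \in [0, 2\pi]$ and $u$ periodic. The continuous Fourier transform for $2\pi$-periodic functions is given by

$$\hat u_\alpha = \mathcal{F}(u) = \frac{1}{2\pi}\int_0^{2\pi} u(x)\, e^{-i\alpha x}\, dx \qquad (18)$$

with inverse

$$u(x) = \sum_{\alpha=-\infty}^{\infty} \hat u_\alpha e^{i\alpha x}. \qquad (19)$$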

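Note that the functions $e^{i\alpha x}$ form an orthonormal set with respect to the normalized inner product $(u, v) = \frac{1}{2\pi}\int_0^{2\pi} u \bar v\, dx$ and span $L^2$, so that

$$\|u\|_2^2 = \sum_{\alpha=-\infty}^{\infty} |\hat u_\alpha|^2.$$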

Also, these functions are eigenfunctions of $A$, i.e.

$$A e^{i\alpha x} = (\mu\alpha^2 + b)\, e^{i\alpha x}.$$

A solution to (17) can be found easily by decomposing $f$ into its spectral representation: we compute $\mathcal{F}(f)$, set $\hat u_\alpha = \hat f_\alpha/(\mu\alpha^2 + b)$, and compute $u(x)$ by the inverse transform.
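As a quick worked example, take $f = \cos x$, chosen only for illustration. Only two Fourier coefficients are nonzero, so the recipe gives the solution in closed form:

```latex
% f(x) = \cos x = (e^{ix} + e^{-ix})/2, so only \hat f_{\pm 1} = 1/2 are nonzero.
\hat u_{\pm 1} = \frac{\hat f_{\pm 1}}{\mu + b} = \frac{1}{2(\mu + b)}, \qquad
\hat u_\alpha = 0 \ (|\alpha| \neq 1)
\quad\Longrightarrow\quad
u(x) = \frac{\cos x}{\mu + b}.
% Check: -\mu u_{xx} + b u = \frac{\mu \cos x + b \cos x}{\mu + b} = \cos x.
```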

Spectral Methods

The idea of spectral methods is to use a finite number of terms in the expansion (19). We will use an approximation

$$U(x) = \sum_{\alpha=-N/2}^{N/2} \hat u_\alpha e^{i\alpha x},$$

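where we use

$$\hat u_\alpha = \frac{\hat f_\alpha}{\mu\alpha^2 + b}$$

with the exact $\hat f_\alpha$ for now. We can ask how close $U$ is to the exact $u$ in some convenient norm. For this problem, all norms are convenient. We'll use the $L^2$ norm here and notice that

$$\|U - u\|_2^2 = \sum_{|\alpha| > N/2} |\hat u_\alpha|^2 = \sum_{|\alpha| > N/2} \frac{|\hat f_\alpha|^2}{(\mu\alpha^2 + b)^2} \qquad (20)$$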


since the coefficients for $|\alpha| \le N/2$ are the same. We assume that $f \in C^\infty$ (periodic) and derive the following simple estimate:

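$$\begin{aligned}
|\hat f_\alpha| &= \left| \frac{1}{2\pi}\int_0^{2\pi} f(x)\, e^{-i\alpha x}\, dx \right| \\
&= \left| \frac{1}{2\pi i\alpha}\int_0^{2\pi} f'(x)\, e^{-i\alpha x}\, dx \right| \\
&\le B/|\alpha|
\end{aligned}$$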

where $B = \max|f'|$ and $\alpha \neq 0$ (the boundary terms in the integration by parts cancel by periodicity). Repeating the integration by parts gives

$$|\hat f_\alpha| \le B_j/|\alpha|^j \qquad (21)$$

where $B_j = \max|f^{(j)}|$. This shows that if $f$ is smooth, then $|\hat f_\alpha|$ decays faster than any power of $|\alpha|$. Returning to the error equation (20), we see that

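$$\begin{aligned}
\|U - u\|_2^2 &= \sum_{|\alpha| > N/2} \frac{|\hat f_\alpha|^2}{(\mu\alpha^2 + b)^2} \\
&\le \left( \frac{B_j}{(N/2)^j} \right)^2 \sum_{|\alpha| > N/2} \frac{1}{(\mu\alpha^2 + b)^2} \\
&\le (C_j/N^j)^2
\end{aligned}$$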

for all $j$, where $C_j$ depends only on $\mu$, $b$ and the size of the derivatives of $f$ up to order $j$ (note that this estimate is not sharp). Thus we have

$$\|U - u\|_2 \le C_j/N^j \qquad (22)$$

for all $j$. Recall the estimate for a second order FD approximation:

$$\|U_h - u\| \le C h^2.$$

For higher order ($q$) methods, we will see higher order convergence

$$\|U_h - u\| \le C h^q,$$

but $q$ is still fixed. Considering (22), we see that spectral methods are asymptotically more accurate than any FD method, since they converge faster than any power of $h \propto 1/N$. This type of convergence (22) is called spectral convergence. Remember that this estimate only holds for smooth data $f$.
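Estimate (20) can be evaluated exactly for data whose Fourier coefficients are known. The sketch below assumes hypothetical smooth data $f(x) = (1 - r^2)/(1 - 2r\cos x + r^2)$ with $r = 1/2$, for which $\hat f_\alpha = r^{|\alpha|}$, and takes $\mu = b = 1$. It sums the tail in (20) and prints a reference $h^2$ column (with $h = 2\pi/N$) for comparison with a second order method. In this sketch the error should fall far faster than any fixed power of $1/N$.

```python
import math


def spectral_error(N, r=0.5, mu=1.0, b=1.0, extra_terms=400):
    """||U - u||_2 from (20) for the hypothetical data with fhat_alpha = r^|alpha|.

    The infinite tail is cut off after `extra_terms` terms; because r^(2|alpha|)
    decays geometrically, the neglected part is negligible.
    """
    tail = 0.0
    for alpha in range(N // 2 + 1, N // 2 + 1 + extra_terms):
        tail += 2 * r ** (2 * alpha) / (mu * alpha ** 2 + b) ** 2   # terms alpha and -alpha
    return math.sqrt(tail)


for N in (4, 8, 16, 32):
    h = 2 * math.pi / N
    print(f"N = {N:3d}   spectral error {spectral_error(N):9.2e}   reference h^2 {h ** 2:9.2e}")
```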

Aliasing and Pseudo-Spectral Methods

The only thing we don’t like about the spectral method described above is the potentially laborious calculation needed to compute $\hat f_\alpha$ accurately from the integrals (18) and to evaluate $U(x)$ at desired points from the summation (19) (recall that this summation has only finitely many terms for $U$). The way to get around this problem is to use the inverse FFT to evaluate the approximation on a uniform grid with $N$ grid points; this procedure is fast and exact. We can also use the FFT to approximate the Fourier coefficients $\hat f_\alpha$ from a vector of values $F_i = f(ih)$ on $N$ grid points. The details are presented below.

Recall that the DFT is given by

$$\hat U_\alpha = \frac{1}{N}\sum_{k=0}^{N-1} U_k e^{-ik\alpha h},$$

where $h = 2\pi/N$, and the inverse is given by

$$U_k = \sum_{\alpha=0}^{N-1} \hat U_\alpha e^{ik\alpha h}. \qquad (23)$$

We will first deal with some technical details. Note that the sum above goes from $0$ to $N - 1$, but the spectral method provides us with $\hat u_\alpha$ for $|\alpha| \le N/2$ (this makes more sense because the smaller $|\alpha|$ are the significant ones). However, if we evaluate the functions $e^{i\alpha x}$ and $e^{i(\alpha+N)x}$ only at the grid points $x_k = kh = 2\pi k/N$, we cannot distinguish them. Therefore, we can easily relabel the summation in (23) to go from $\alpha = -N/2 + 1, \ldots, N/2$, where

$$\hat U_\alpha \sim \hat u_\alpha \quad \text{for } \alpha = -N/2 + 1, \ldots, N/2 - 1$$

and

$$\hat U_{N/2} \sim \hat u_{-N/2} + \hat u_{N/2}. \qquad (24)$$

No information has been lost in (24) because these two modes are indistinguishable on the grid. Whatever we do with the $N/2$ coefficient is not that important, since we expect it to be very small (remember the fast decay for large $|\alpha|$). However, it is sometimes useful to consider this term in the form (24), for example so that spectral evaluations of first derivatives of real $f$ stay real. Using the above values of $\hat U$, the inverse FFT can be used to evaluate the function $U$ on the grid points exactly.
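The relabelling, and the fact that some modes cannot be told apart on the grid, are easy to check numerically. In this small sketch, `grid_values` and `signed_wavenumber` are illustrative helper names, and $N$ is assumed even.

```python
import cmath
import math


def grid_values(alpha, N):
    """exp(i alpha x_k) at the grid points x_k = k h, h = 2 pi / N."""
    h = 2 * math.pi / N
    return [cmath.exp(1j * alpha * k * h) for k in range(N)]


def signed_wavenumber(a, N):
    """Relabel a DFT index a = 0, ..., N-1 as alpha in -N/2+1, ..., N/2 (N even)."""
    return a if a <= N // 2 else a - N


def max_diff(p, q):
    return max(abs(s - t) for s, t in zip(p, q))


N = 8
print([signed_wavenumber(a, N) for a in range(N)])               # [0, 1, 2, 3, 4, -3, -2, -1]
print(max_diff(grid_values(3, N), grid_values(3 + N, N)))         # alpha and alpha + N: rounding level
print(max_diff(grid_values(-N // 2, N), grid_values(N // 2, N)))  # -N/2 and N/2: both (-1)^k
```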

To approximate $\hat f_\alpha$ for $|\alpha| \le N/2$ we follow the same plan. We get the vector $F$ by evaluating $f$ on the grid points, then compute $\hat F$ by the (forward) FFT and identify

$$\hat F_\alpha \sim \hat f_\alpha \quad \text{for } \alpha = -N/2 + 1, \ldots, N/2 - 1$$

and

$$\hat F_{N/2} \sim \hat f_{-N/2} + \hat f_{N/2}. \qquad (25)$$

It is easy to show that, in fact,

$$\hat F_\alpha = \sum_{l=-\infty}^{\infty} \hat f_{\alpha + Nl}. \qquad (26)$$

This is not surprising, since we cannot distinguish between the $\alpha$, $\alpha + N$, $\alpha + 2N$, etc. modes on the grid. The effect in (26), where high frequency information is seen as low frequency information, is called aliasing. Using the decay of the coefficients (21), it can be shown that

$$|\hat F_\alpha - \hat f_\alpha| \le C_j/N^j \qquad (27)$$

for $|\alpha| \le N/2 - 1$ and all $j$. A similar bound can be made for the $N/2$ term. With the $\hat F$ values we can compute

$$\hat U_\alpha = \hat F_\alpha/\kappa_\alpha$$

for $\alpha = -N/2 + 1, \ldots, N/2$, where $\kappa_\alpha$ is the corresponding eigenvalue $\mu\alpha^2 + b$. Note that this approximation is consistent with (24) and (25), since $\kappa_{-N/2} = \kappa_{N/2}$. Now the values of $U$ can be evaluated on the grid as described above. This is a fast ($O(N \log N)$) method, called the pseudo-spectral method, which is also spectrally accurate, as shown below.
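Identity (26), and the size of the aliasing error in (27), can be checked on data with known coefficients. Assume the hypothetical smooth data $f(x) = (1 - r^2)/(1 - 2r\cos x + r^2)$, whose coefficients are $\hat f_\alpha = r^{|\alpha|}$. The sum in (26) can then be done in closed form: for $0 \le \alpha \le N-1$ it equals $(r^\alpha + r^{N-\alpha})/(1 - r^N)$. The sketch compares this with the DFT of the grid values, using an $O(N^2)$ DFT in place of the FFT.

```python
import cmath
import math


def dft_coefficients(values):
    """Fhat[a] = (1/N) sum_k F_k exp(-i k a h), h = 2 pi / N (O(N^2) stand-in for an FFT)."""
    N = len(values)
    h = 2 * math.pi / N
    return [sum(v * cmath.exp(-1j * k * a * h) for k, v in enumerate(values)) / N
            for a in range(N)]


def f(x, r=0.5):
    """Hypothetical smooth periodic data whose exact coefficients are fhat_alpha = r^|alpha|."""
    return (1 - r * r) / (1 - 2 * r * math.cos(x) + r * r)


r, N = 0.5, 16
h = 2 * math.pi / N
Fhat = dft_coefficients([f(k * h, r) for k in range(N)])

for a in range(N):
    alpha = a if a <= N // 2 else a - N                   # relabelled wavenumber
    aliased = (r ** a + r ** (N - a)) / (1 - r ** N)      # sum over l of r^|a + N l|, as in (26)
    print(f"{alpha:4d}   DFT {Fhat[a].real:.6e}   (26) {aliased:.6e}   exact {r ** abs(alpha):.6e}")
```

For $\alpha$ near $0$ the DFT coefficient is very close to the exact $r^{|\alpha|}$. Near $\alpha = \pm N/2$ the aliased contribution is comparable in size to $\hat f_\alpha$ itself, although both are already small.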

We consider pointwise errors rather than L2 errors this time.

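$$\begin{aligned}
|u(x) - U(x)| &= \left| \sum_{\alpha=-\infty}^{\infty} \frac{\hat f_\alpha}{\kappa_\alpha} e^{i\alpha x} - \sum_{\alpha=-N/2}^{N/2} \frac{\hat F_\alpha}{\kappa_\alpha} e^{i\alpha x} \right| \\
&\le \frac{1}{b}\left( \sum_{\alpha=-N/2}^{N/2} |\hat f_\alpha - \hat F_\alpha| + \sum_{|\alpha| > N/2} |\hat f_\alpha| \right),
\end{aligned}$$

since $\kappa_\alpha = \mu\alpha^2 + b \ge b > 0$.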

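The second term decays spectrally as before, and so does the first, using (27). Summing $N + 1$ such terms costs only one power of $N$, and $j$ is arbitrary. Thus we have

$$|u(x) - U(x)| \le C_j/N^j$$

for all $j$ and all $x$. We write $\|u - U\|_\infty \le C_j/N^j$. As before, the estimates above are not sharp.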
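Here is a minimal sketch of the whole pseudo-spectral solver, using a manufactured solution so that the grid error can be measured exactly. With the hypothetical test data $u(x) = e^{\sin x}$, we have $f = -\mu u'' + bu = \big(b - \mu(\cos^2 x - \sin x)\big)e^{\sin x}$. The solver samples $f$ and applies the DFT with the normalization above. It then divides by $\kappa_\alpha = \mu\alpha^2 + b$, using the relabelled wavenumbers, and transforms back with (23). The $O(N^2)$ DFT stands in for the FFT. In this sketch the maximum grid error should reach rounding level by $N = 32$.

```python
import cmath
import math


def dft(v):
    """vhat[a] = (1/N) sum_k v[k] exp(-i k a h), the normalization used above."""
    N = len(v)
    h = 2 * math.pi / N
    return [sum(v[k] * cmath.exp(-1j * k * a * h) for k in range(N)) / N for a in range(N)]


def inverse_dft(vhat):
    """v[k] = sum_a vhat[a] exp(i k a h), as in (23)."""
    N = len(vhat)
    h = 2 * math.pi / N
    return [sum(vhat[a] * cmath.exp(1j * k * a * h) for a in range(N)) for k in range(N)]


def pseudo_spectral_solve(f, N, mu, b):
    """Grid values U_k for -mu u_xx + b u = f, 2 pi-periodic, N even."""
    h = 2 * math.pi / N
    Fhat = dft([f(k * h) for k in range(N)])
    Uhat = []
    for a in range(N):
        alpha = a if a <= N // 2 else a - N   # kappa_{-N/2} = kappa_{N/2}, so a = N/2 is safe
        Uhat.append(Fhat[a] / (mu * alpha ** 2 + b))
    return [v.real for v in inverse_dft(Uhat)]


mu, b = 1.0, 1.0


def u_exact(x):                               # manufactured solution (hypothetical test data)
    return math.exp(math.sin(x))


def f(x):                                     # f = -mu u'' + b u for u = exp(sin x)
    return (b - mu * (math.cos(x) ** 2 - math.sin(x))) * math.exp(math.sin(x))


def max_grid_error(N):
    h = 2 * math.pi / N
    U = pseudo_spectral_solve(f, N, mu, b)
    return max(abs(U[k] - u_exact(k * h)) for k in range(N))


for N in (8, 16, 32):
    print(f"N = {N:3d}   max grid error {max_grid_error(N):.2e}")
```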

Problems with Variable Coefficients

We now apply a spectral method to a variable coefficient problem:

$$-\mu u_{xx} + b(x)\, u = f.$$

This is a symmetric, positive definite problem (for $b(x) > 0$) which has a complete set of eigenfunctions, as before. However, we do not want to use these functions as our basis, because we don’t know them and we won’t have a fast transform technique to convert from a grid representation to a spectral one. Therefore, we will continue to use the Fourier basis from before.

Note 10 (Terminology) A spectral method does not mean we always use a spectral representation of the problem. It means we are using a representation that will give us spectral accuracy.

We first develop an infinite set of conditions that the exact solution $u$ must satisfy. By taking the transform of the original problem we get

$$\mu\alpha^2 \hat u_\alpha + \widehat{(bu)}_\alpha = \hat f_\alpha.$$


However,

$$\widehat{(bu)}_\alpha = \sum_{n=-\infty}^{\infty} \hat b_n \hat u_{\alpha - n},$$

so the Fourier transform does not diagonalize the problem. This should not be expected, since the Fourier modes are not eigenfunctions.

A true spectral method (a Galerkin method) can be derived by assuming $\hat u_\alpha = 0$ for $|\alpha| > N/2$ and then projecting the resulting equations onto the corresponding finite dimensional space. The resulting equations for $\hat U_\alpha$, $|\alpha| \le N/2$, are

$$\mu\alpha^2 \hat U_\alpha + \sum_{|\beta| \le N/2} \hat b_{\alpha - \beta} \hat U_\beta = \hat f_\alpha,$$

i.e. $(\mathrm{diag}(\mu\alpha^2) + B)\hat U = \hat f$ in matrix form. Here the matrix $B$, with entries $B_{\alpha\beta} = \hat b_{\alpha-\beta}$ (the $b$ terms from the truncated convolution), is full. Solving this system directly will be slow, because full matrix methods must be used. Solving the system iteratively by the CG method is possible (the matrix is symmetric and positive definite), but it will still be slow, because each multiplication by $B$ costs $O(N^2)$ operations.

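We can speed up this process, however, by evaluating $B\hat U$ approximately, in the pseudo-spectral sense (which will introduce aliasing). Recall that $B\hat U$ approximates the Fourier coefficients of $bu$. Therefore, we could compute the grid values $U_i$ by the inverse FFT, multiply pointwise by $b_i = b(x_i)$, and then apply the forward FFT. This approximation is equivalent to replacing the matrix $B$ by

$$\mathcal{F}\,\mathrm{diag}(b_i)\,\mathcal{F}^{-1},$$

where $\mathcal{F}$ denotes the DFT and $\mathrm{diag}(b_i)$ is the diagonal matrix representing the pointwise multiplication. The resulting system is

$$A\hat U := \left(\Xi + \mathcal{F}\,\mathrm{diag}(b_i)\,\mathcal{F}^{-1}\right)\hat U = \mathcal{F}F \qquad (28)$$

where $\Xi$ is a diagonal matrix with entries $\mu\alpha^2$, $F$ is the vector of grid values of $f$, and $f$ is assumed to be approximated pseudo-spectrally. Note that $A$ is symmetric (Hermitian): when $\mathcal{F}$ is normalized to be unitary, $\mathcal{F}^{-1} = \mathcal{F}^*$, where $^*$ denotes the conjugate transpose. It is also positive definite, since $b_i > 0$ and the entries of $\Xi$ are nonnegative. Although $A$ is a full matrix, its product with a vector can be evaluated quickly (two FFTs, $O(N \log N)$ operations), so the Conjugate Gradient (CG) method can be applied. In this case, preconditioning by the inverse of the pseudo-spectral approximation of the constant coefficient case is effective.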
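Here is a minimal sketch of this approach, under several assumptions:

- Grid values are used as the unknowns. The grid-space operator $\mathcal F^{-1} A \mathcal F = \mathcal F^{-1}\Xi\mathcal F + \mathrm{diag}(b_i)$ is unitarily similar to $A$ in (28), so CG behaves the same.
- $b(x) = 2 + \cos x$ and $\mu = 1$ are hypothetical choices.
- The constant-coefficient preconditioner uses $\bar b = 2$.
- $u = e^{\sin x}$ is a manufactured solution.

Each product with $A$ costs two transforms, and an $O(N^2)$ DFT stands in for the FFT.

```python
import cmath
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def dft(v):
    N = len(v)
    h = 2 * math.pi / N
    return [sum(v[k] * cmath.exp(-1j * k * a * h) for k in range(N)) / N for a in range(N)]


def inverse_dft_real(vhat):
    N = len(vhat)
    h = 2 * math.pi / N
    return [sum(vhat[a] * cmath.exp(1j * k * a * h) for a in range(N)).real for k in range(N)]


def pcg(apply_A, apply_M, F, eps=1e-12, maxiter=200):
    """Preconditioned CG from U_0 = 0; returns (U, iterations)."""
    U = [0.0] * len(F)
    r = list(F)
    z = apply_M(r)
    d = list(z)
    zr = dot(z, r)
    norm_F = math.sqrt(dot(F, F))
    for j in range(1, maxiter + 1):
        Ad = apply_A(d)
        alpha = zr / dot(d, Ad)
        U = [u + alpha * di for u, di in zip(U, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        if math.sqrt(dot(r, r)) <= eps * norm_F:
            return U, j
        z = apply_M(r)
        zr, zr_old = dot(z, r), zr
        d = [zi + (zr / zr_old) * di for zi, di in zip(z, d)]
    return U, maxiter


def solve_variable(N, mu=1.0, b_bar=2.0):
    """Solve -mu u_xx + b(x) u = f pseudo-spectrally with PCG; returns (max error, iterations)."""
    h = 2 * math.pi / N
    x = [k * h for k in range(N)]
    b = [2.0 + math.cos(xk) for xk in x]                              # hypothetical b(x), 1 <= b <= 3
    xi = [mu * (a if a <= N // 2 else a - N) ** 2 for a in range(N)]  # diagonal of Xi

    def apply_A(u):     # F^{-1} Xi F u + b u: two transforms, no dense matrix
        second = inverse_dft_real([s * c for s, c in zip(xi, dft(u))])
        return [p + bk * uk for p, bk, uk in zip(second, b, u)]

    def apply_M(r):     # inverse of the constant-coefficient operator Xi + b_bar
        return inverse_dft_real([c / (s + b_bar) for s, c in zip(xi, dft(r))])

    u_exact = [math.exp(math.sin(xk)) for xk in x]                     # manufactured solution
    F = [(bk - mu * (math.cos(xk) ** 2 - math.sin(xk))) * math.exp(math.sin(xk))
         for bk, xk in zip(b, x)]                                      # f = -mu u'' + b u
    U, its = pcg(apply_A, apply_M, F)
    return max(abs(p - q) for p, q in zip(U, u_exact)), its


for N in (16, 32):
    err, its = solve_variable(N)
    print(f"N = {N:3d}   PCG iterations {its:3d}   max error {err:.2e}")
```

Since $1 \le b \le 3$ and $\bar b = 2$, the preconditioned spectrum lies in $[1/2, 3/2]$ for every $N$. This is why only a modest number of iterations is expected.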

Note 11 (Don’t need A sparse for CG) This example shows that a matrix does not have to be sparse for CG to be an efficient technique. In fact, our matrix $A$ was dense, but we could still evaluate its action on a vector quickly. For another interesting example of this phenomenon, see [6].

Note 12 (Notation) Since almost all practical methods involve pseudo-spectral evaluation of coefficients and interpolation, the “pseudo” is being dropped from these methods, and they are usually called simply spectral methods.


Be Careful with Boundaries

What about domains with boundaries? We consider the problem

$$-u_{xx} = f$$

with $u(0) = u(1) = 0$. The eigenfunctions of this problem are $\sin(n\pi x)$, with corresponding eigenvalues $n^2\pi^2$. Since these correspond to the basis of the fast sine transform, they seem to be a suitable basis for a spectral method. We consider computing the problem with $f \equiv 1$ (smooth), giving $u(x) = -x^2/2 + x/2$ (also smooth). However, the coefficients in the sine series for $f$,

$$f_n = \sqrt{2}\int_0^1 f(x) \sin(n\pi x)\, dx,$$

decay only like $n^{-1}$. The corresponding solution $u$ has sine coefficients $u_n = f_n/(n^2\pi^2)$ that decay only like $n^{-3}$. Thus, the “spectral” approximation of this problem described above converges only algebraically. The neglected coefficients are of size $O(N^{-3})$, so the errors are no better than those of a fixed, low-order FD method.
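The algebraic rate is easy to see numerically. This sketch sums the first $N$ terms of the exact sine series of $u(x) = x(1-x)/2$. Only odd $n$ contribute, and in the orthonormal basis $\sqrt2\sin(n\pi x)$ the coefficients are $u_n = 2\sqrt2/(n\pi)^3$. The sketch then measures the maximum error on a fine grid of $x$ values; the 2001-point grid is an arbitrary choice. In this sketch, doubling $N$ should reduce the error only by a roughly constant factor, about 4. The tail of coefficients of size $n^{-3}$ sums to a pointwise error of order $N^{-2}$. This contrasts with the spectral behaviour for smooth periodic data.

```python
import math


def truncated_sine_series(x, N):
    """First N terms of the sine series of u(x) = x (1 - x) / 2 on [0, 1].

    Only odd n contribute: u_n = 2 sqrt(2) / (n pi)^3 in the orthonormal basis
    sqrt(2) sin(n pi x), i.e. the term 4 sin(n pi x) / (n pi)^3.
    """
    return sum(4 * math.sin(n * math.pi * x) / (n * math.pi) ** 3
               for n in range(1, N + 1, 2))


xs = [i / 2000 for i in range(2001)]      # fine evaluation grid (arbitrary choice)


def max_error(N):
    return max(abs(x * (1 - x) / 2 - truncated_sine_series(x, N)) for x in xs)


for N in (16, 32, 64):
    print(f"N = {N:3d}   max error {max_error(N):.2e}")
```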

In this case, the eigenfunctions are a poor choice of basis functions for a spectral method. A more appropriate choice would be the Chebyshev polynomials, using fast interpolation on a non-uniform grid of points. The details are given in the references.

References

[1] Axelsson and Barker, “FE Solution of Boundary Value Problems”.

[2] Canuto, C. et al., “Spectral Methods in Fluid Dynamics”.

[3] Golub and Van Loan, “Matrix Computations”.

[4] Gottlieb, D. and Orszag, S., “Numerical Analysis of Spectral Methods”.

[5] Hackbusch, W., “Multi-Grid Methods and Applications”.

[6] Rokhlin, V., “Rapid Solution of Integral Equations of Classical Potential Theory”, J. Comput. Phys. 60, 187-207 (1983).

[7] Strang, G., “Introduction to Applied Mathematics”.

[8] Varga, R., “Matrix Iterative Analysis”.

[9] van der Sluis and van der Vorst, “The Rate of Convergence of Conjugate Gradients”, Numer. Math. 48, 543-560 (1986).
