3.6 Solving Large Linear Systems
Finite elements and finite differences produce large linear systems KU = F. The matrices K are extremely sparse. They have only a small number of nonzero entries in a typical row. In "physical space" those nonzeros are clustered tightly together—they come from neighboring nodes and meshpoints. But we cannot number N^2 nodes in a plane in any way that keeps neighbors close together! So in 2-dimensional problems, and even more in 3-dimensional problems, we meet three questions right away:
1. How best to number the nodes

2. How to use the sparseness of K (when nonzeros can be widely separated)

3. Whether to choose direct elimination or an iterative method.
That last point will split this section into two parts—elimination methods in 2D (where node order is important) and iterative methods in 3D (where preconditioning is crucial).
To fix ideas, we will create the n equations KU = F from Laplace's difference equation in an interval, a square, and a cube. With N unknowns in each direction, K has order n = N or N^2 or N^3. There are 3 or 5 or 7 nonzeros in a typical row of the matrix. Second differences in 1D, 2D, and 3D are shown in Figure 3.17.
Figure 3.17: 3, 5, 7 point difference molecules for −u_xx, −u_xx − u_yy, −u_xx − u_yy − u_zz. The corresponding matrices are tridiagonal K (N by N), block tridiagonal K2D (N^2 by N^2), and K3D (N^3 by N^3).
Along a typical row of the matrix, the entries add to zero. In two dimensions this is 4 − 1 − 1 − 1 − 1 = 0. This "zero sum" remains true for finite elements (the element shapes decide the exact numerical entries). It reflects the fact that u = 1 solves Laplace's equation and U_i = 1 has differences equal to zero. The constant vector solves KU = 0 except near the boundaries. When a neighbor is a boundary point where U_i is known, its value moves onto the right side of KU = F. Then that row of K is not zero sum. Otherwise K would be singular, with K ∗ ones(n,1) = zeros(n,1).
Using block matrix notation, we can create the 2D matrix K = K2D from the familiar N by N second difference matrix K. We number the nodes of the square a row at a time (this "natural numbering" is not necessarily best). Then the −1's for the neighbor above and the neighbor below are N positions away from the main diagonal of K2D. The 2D matrix is block tridiagonal with tridiagonal blocks:
$$
K = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & \cdot & \cdot & \cdot \\ & & -1 & 2 \end{bmatrix}
\qquad
K_{2D} = \begin{bmatrix} K+2I & -I & & \\ -I & K+2I & -I & \\ & \cdot & \cdot & \cdot \\ & & -I & K+2I \end{bmatrix}
\qquad (1)
$$

For tridiagonal K the size and elimination time are both N. Elimination in this order for K2D: size n = N^2, bandwidth w = N, space nw = N^3, time nw^2 = N^4.
The matrix K2D has 4's down the main diagonal. Its bandwidth w = N is the distance from the diagonal to the nonzeros in −I. Many of the spaces in between are filled during elimination! Then the storage space required for the factors in K = LU is of order nw = N^3. The time is proportional to nw^2 = N^4, when n rows each contain w nonzeros, and w nonzeros below the pivot require elimination.
Those counts are not impossibly large in many practical 2D problems (and we show how they can be reduced). The horrifyingly large counts come for K3D in three dimensions. Suppose the 3D grid is numbered by square cross-sections in the natural order 1, . . . , N. Then K3D has blocks of order N^2 from those squares. Each square is numbered as above to produce blocks coming from K2D and I = I2D:
$$
K_{3D} = \begin{bmatrix} K_{2D}+2I & -I & & \\ -I & K_{2D}+2I & -I & \\ & \cdot & \cdot & \cdot \\ & & -I & K_{2D}+2I \end{bmatrix}
\qquad
\begin{matrix} \text{Size } n = N^3 \\ \text{Bandwidth } w = N^2 \\ \text{Elimination space } nw = N^5 \\ \text{Elimination time } \approx nw^2 = N^7 \end{matrix}
$$
Now the main diagonal contains 6's, and "inside rows" have six −1's. Next to a point or edge or corner of the boundary cube, we lose one or two or three of those −1's.
The good way to create K2D from K and I (N by N) is to use the kron(A, B) command. This Kronecker product replaces each entry a_ij by the block a_ij B. To take second differences in all rows at the same time, and then all columns, use kron:

K2D = kron(K, I) + kron(I, K).    (2)
The identity matrix in two dimensions is I2D = kron(I, I). This adjusts to allow rectangles, with I's of different sizes, and in three dimensions to allow boxes. For a cube we take second differences inside all planes and also in the z-direction:

K3D = kron(K2D, I) + kron(I2D, K).
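A few lines of MATLAB (a sketch, with an assumed small size N = 4) build all three matrices exactly this way:

    % Build K, K2D, K3D by Kronecker products, as in equation (2)
    N = 4;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);   % second difference matrix, N by N
    I = speye(N);
    K2D = kron(K,I) + kron(I,K);            % size N^2, 5 nonzeros in an inside row
    I2D = kron(I,I);
    K3D = kron(K2D,I) + kron(I2D,K);        % size N^3, 7 nonzeros in an inside row
    spy(K2D)                                % display the sparsity pattern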
Having set up these special matrices K2D and K3D, we have to say that there are special ways to work with them. The x, y, z directions are separable. The geometry (a box) is also separable. See Section 7.2 on Fast Poisson Solvers. Here the matrices K and K2D and K3D are serving as models of the type of matrices that we meet.
Minimum Degree Algorithm
We now describe (a little roughly) a useful reordering of the nodes and the equations in K2D U = F. The ordering achieves minimum degree at each step—the number of nonzeros below the pivot row is minimized. This is essentially the algorithm used in MATLAB's command U = K\F, when K has been defined as a sparse matrix.
We list some of the functions from the sparfun directory:

speye (sparse identity I)    nnz (number of nonzero entries)
find (find indices of nonzeros)    spy (visualize sparsity pattern)
colamd and symamd (approximate minimum degree permutation of K)

You can test and use the minimum degree algorithms without a careful analysis. The approximations are faster than the exact minimum degree permutations colmmd and symmmd. The speed (in two dimensions) and the roundoff errors are quite reasonable.
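Here is a quick experiment (a sketch; the sizes are my choice, not from the text) that uses these functions to see how much fill-in the reordering avoids:

    % Fill-in with natural ordering vs approximate minimum degree
    N = 20;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);
    K2D = kron(K,speye(N)) + kron(speye(N),K);
    p = symamd(K2D);              % approximate minimum degree permutation
    nnz(chol(K2D))                % nonzeros in the Cholesky factor, natural order
    nnz(chol(K2D(p,p)))           % fewer nonzeros after reordering
    spy(chol(K2D(p,p)))           % visualize where the nonzeros end up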
In the Laplace examples, the minimum degree ordering of nodes is irregular compared to "a row at a time." The final bandwidth is probably not decreased. But the nonzero entries are postponed as long as possible! That is the key.
The difference is shown in the arrow matrix of Figure 3.18. On the left, minimum degree (one nonzero off the diagonal) leads to large bandwidth. But there is no fill-in. Elimination will only change its last row and column. The triangular factors L and U have all the same zeros as A. The space for storage stays at 3n, and elimination needs only n divisions and multiplications and subtractions.
Figure 3.18: Arrow matrix: minimum degree (no F) against minimum bandwidth. Bandwidths 6 and 3; fill-in 0 and 6.
The second ordering reduces the bandwidth from 6 to 3. But when row 4 is reached as the pivot row, the entries indicated by F are filled in. That full lower quarter of A gives n^2/8 nonzeros to both factors L and U. You see that the whole "profile" of the matrix decides the fill-in, not just the bandwidth.
The minimum degree algorithm chooses the (k+1)st pivot column, after k columns have been eliminated as usual below the diagonal, by the following rule: In the remaining matrix of size n − k, select the column with the fewest nonzeros.
The component of U corresponding to that column is renumbered k+1. So is the node in the finite difference grid. Of course elimination in that column will normally produce new nonzeros in the remaining columns! Some fill-in is unavoidable. So the algorithm must keep track of the new positions of nonzeros, and also the actual entries. It is the positions that decide the ordering of unknowns. Then the entries decide the numbers in L and U.
Example  Figure 3.19 shows a small example of the minimal degree ordering, for Laplace's 5-point scheme. The node connections produce nonzero entries (indicated by ∗) in K. The problem has six unknowns. K has two 3 by 3 tridiagonal blocks from horizontal links, and two 3 by 3 blocks with −I from vertical links.
The degree of a node is the number of connections to other nodes. This is the number of nonzeros in that column of K. The corner nodes 1, 3, 4, 6 all have degree 2. Nodes 2 and 5 have degree 3. A larger region has inside nodes of degree 4, which will not be eliminated first. The degrees change as elimination proceeds, because of fill-in.
The first elimination step chooses row 1 as pivot row, because node 1 has minimum degree 2. (We had to break a tie! Any degree 2 node could come first, leading to different elimination orders.) The pivot is P, the other nonzeros in that row are boxed. When row 1 operates on rows 2 and 4, it changes six entries below it. In particular, the two fill-in entries marked by F change to nonzeros. This fill-in of the (2,4) and (4,2) entries corresponds to the dashed line connecting nodes 2 and 4 in the graph.
Figure 3.19: Minimum degree nodes 1 and 3. The pivots P are in rows 1 and 3; new edges 2–4 and 2–6 in the graph match the matrix entries F filled in by elimination.
Nodes that were connected to the eliminated node are now connected to each other. Elimination continues on the 5 by 5 matrix (and the graph with 5 nodes). Node 2 still has degree 3, so it is not eliminated next. If we break the tie by choosing node 3, elimination using the new pivot P will fill in the (2,6) and (6,2) positions. Node 2 becomes linked to node 6 because they were both linked to the eliminated node 3.
The problem is reduced to 4 by 4, for the unknown U's at the remaining nodes 2, 4, 5, 6. Problem 8 asks you to take the next step—choose a minimum degree node and reduce the system to 3 by 3.
Storing the Nonzero Structure = Sparsity Pattern
A large system KU = F needs a fast and economical storage of the node connections (which match the positions of nonzeros in K). The connections and nonzeros change as elimination proceeds. The list of edges and nonzero positions corresponds to the "adjacency matrix" of the graph of nodes. The adjacency matrix has 1 or 0 to indicate nonzero or zero in K.

For each node i, we have a list adj(i) of the nodes connected to i. How to combine these into one master list NZ for the whole graph and the whole matrix K? A simple way is to store the lists adj(i) sequentially in NZ (the nonzeros for i = 1 up to i = n). An index array IND of pointers tells the starting position of the sublist adj(i) within the master list NZ. It is useful to give IND an (n+1)st entry to point to the final entry in NZ (or to the blank that follows, in Figure 3.20). MATLAB will store one more array (the same length nnz(K) as NZ) to give the actual nonzero entries.
NZ    | 2 4 | 1 3 5 | 2 6 | 1 5 | 2 4 6 | 3 5 |
IND     1     3       6     8     10      13     15
Node    1     2       3     4     5       6

Figure 3.20: Master list NZ of nonzeros (neighbors in Figure 3.19). Positions in IND.
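A minimal sketch of this storage scheme, filled with the lists from Figure 3.20 (the helper adj is mine, not a MATLAB builtin):

    % Master list NZ with pointer array IND (the graph of Figure 3.19)
    NZ  = [2 4   1 3 5   2 6   1 5   2 4 6   3 5];  % adj(1),...,adj(6) end to end
    IND = [1 3 6 8 10 13 15];                       % IND(i) = start of adj(i) in NZ
    adj = @(i) NZ(IND(i):IND(i+1)-1);               % recover the sublist for node i
    adj(5)                                          % neighbors of node 5: [2 4 6]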
The indices i are the "original numbering" of the nodes. If there is renumbering, the new ordering can be stored as a permutation PERM. Then PERM(i) = k when the new number i is assigned to the node with original number k. The text [GL] by George and Liu is the classic reference for this entire section on ordering of the nodes.
Graph Separators
Here is another good ordering, different from minimum degree. Graphs or meshes are often separated into disjoint pieces by a cut. The cut goes through a small number of nodes or meshpoints (a separator). It is a good idea to number the nodes in the separator last. Elimination is relatively fast for the disjoint pieces P and Q. It only slows down at the end, for the (smaller) separator S.
The three groups P, Q, S of meshpoints have no direct connections between P and Q (they are both connected to the separator S). Numbered in that order, the "block arrow" stiffness matrix and its K = LU factorization look like this:

$$
K = \begin{bmatrix} K_P & 0 & K_{PS} \\ 0 & K_Q & K_{QS} \\ K_{SP} & K_{SQ} & K_S \end{bmatrix}
\qquad
L = \begin{bmatrix} L_P & & \\ 0 & L_Q & \\ X & Y & Z \end{bmatrix}
\qquad
U = \begin{bmatrix} U_P & 0 & A \\ & U_Q & B \\ & & C \end{bmatrix}
\qquad (3)
$$
The zero blocks in K give zero blocks in L and U. The submatrix K_P comes first in elimination, to produce L_P and U_P. Then come the factors L_Q U_Q of K_Q, followed by the connections through the separator. The major cost is often that last step, the solution of a fairly dense system of the size of the separator.
Figure 3.21: A graph separator numbered last produces a block arrow matrix K. (Three panels: the arrow matrix of Figure 3.18, the 6-node mesh of Figure 3.19 with its separator S numbered last, and blocks P, Q meeting a separator S.)
Figure 3.21 shows three examples, each with separators. The graph for a perfect arrow matrix has a one-point separator (very unusual). The 6-node rectangle has a two-node separator in the middle. Every N by N grid can be cut by an N-point separator (and N is much smaller than N^2). If the meshpoints form a rectangle, the best cut is down the middle in the shorter direction.
You could say that the numbering of P then Q then S is block minimum degree. But one cut with one separator will not come close to an optimal numbering. It is natural to extend the idea to a nested sequence of cuts. P and Q have their own separators at the next level. This nested dissection continues until it is not productive to cut further. It is a strategy of "divide and conquer."
Figure 3.22 illustrates three levels of nested dissection on a 7 by 7 grid. The first cut is down the middle. Then two cuts go across and four cuts go down. Numbering the separators last within each stage, the matrix K of size 49 has arrows inside arrows inside arrows. The spy command will display the pattern of nonzeros.
Separators and nested dissection show how numbering strategies are based on the graph of nodes and edges in the mesh. Those edges correspond to nonzeros in the matrix K. The nonzeros created by elimination (filled entries in L and U) correspond to paths in the graph. In practice, there has to be a balance between simplicity and optimality in the numbering—in scientific computing simplicity is a very good thing!
… more powerful. So the older iterations of Jacobi and Gauss-Seidel and overrelaxation are less favored in scientific computing, compared to conjugate gradients and GMRES. When the growing Krylov subspaces reach the whole space R^n, these methods (in exact arithmetic) give the exact solution A^{-1}b. But in reality we stop much earlier, long before n steps are complete. The conjugate gradient method (for positive definite A, and with a good preconditioner) has become truly important.
The next ten pages will introduce you to numerical linear algebra. This has become a central part of scientific computing, with a clear goal: Find a fast stable algorithm that uses the special properties of the matrices. We meet matrices that are sparse or symmetric or triangular or orthogonal or tridiagonal or Hessenberg or Givens or Householder. Those matrices are at the core of so many computational problems. The algorithm doesn't need details of the entries (which come from the specific application). By using only their structure, numerical linear algebra offers major help.
Overall, elimination with good numbering is the first choice until storage and CPU time become excessive. This high cost often arises first in three dimensions. At that point we turn to iterative methods, which require more expertise. You must choose the method and the preconditioner. The next pages aim to help the reader at this frontier of scientific computing.
Pure Iterations
We begin with old-style pure iteration (not obsolete). The letter K will be reserved for "Krylov" so we leave behind the notation KU = F. The linear system becomes Ax = b with a large sparse matrix A, not necessarily symmetric or positive definite:

Linear system Ax = b    Residual r_k = b − Ax_k    Preconditioner P ≈ A

The preconditioner P attempts to be "close to A" and at the same time much easier to work with. A diagonal P is one extreme (not very close). P = A is the other extreme (too close). Splitting the matrix A gives an equivalent form of Ax = b:

Splitting    P x = (P − A)x + b.    (4)
This suggests an iteration, in which every vector x_k leads to the next x_{k+1}:

Iteration    P x_{k+1} = (P − A)x_k + b.    (5)

Starting from any x_0, the first step finds x_1 from P x_1 = (P − A)x_0 + b. The iteration continues to x_2 with the same matrix P, so it often helps to know its triangular factors L and U. Sometimes P itself is triangular, or its factors L and U are approximations to the triangular factors of A. Two conditions on P make the iteration successful:
1. The new x_{k+1} must be quickly computable. Equation (5) must be fast to solve.

2. The errors e_k = x − x_k must converge quickly to zero.
Subtract equation (5) from (4) to find the error equation. It connects e_k to e_{k+1}:

Error    P e_{k+1} = (P − A)e_k    which means    e_{k+1} = (I − P^{-1}A)e_k = M e_k.    (6)
The right side b disappears in this error equation. Each step multiplies the error vector by M = I − P^{-1}A. The speed of convergence of x_k to x (and of e_k to zero) depends entirely on M. The test for convergence is given by the eigenvalues of M:

Convergence test    Every eigenvalue of M must have |λ(M)| < 1.

The largest eigenvalue (in absolute value) is the spectral radius ρ(M) = max |λ(M)|. Convergence requires ρ(M) < 1. The convergence rate is set by the largest eigenvalue. For a large problem, we are happy with ρ(M) = .9 and even ρ(M) = .99.
Suppose that the initial error e_0 happens to be an eigenvector of M. Then the next error is e_1 = M e_0 = λ e_0. At every step the error is multiplied by λ, so we must have |λ| < 1. Normally e_0 is a combination of all the eigenvectors. When the iteration multiplies by M, each eigenvector is multiplied by its own eigenvalue. After k steps those multipliers are λ^k. We have convergence if all |λ| < 1.
For preconditioner we first propose two simple choices:

Jacobi iteration        P = diagonal part of A
Gauss-Seidel iteration  P = lower triangular part of A
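A short sketch (test sizes are my own) runs iteration (5) with Jacobi's P and checks the convergence test on M numerically:

    % Jacobi iteration for the -1,2,-1 matrix: P = diagonal part of A
    N = 10;  e = ones(N,1);
    A = spdiags([-e 2*e -e], -1:1, N, N);
    b = ones(N,1);  x = zeros(N,1);
    P = spdiags(diag(A), 0, N, N);     % Jacobi preconditioner
    M = speye(N) - P\A;                % iteration matrix from (6)
    rho = max(abs(eig(full(M))))       % spectral radius, just below 1
    for k = 1:200
        x = P\((P - A)*x + b);         % iteration (5)
    end
    norm(b - A*x)                      % small residual after 200 sweeps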
Typical examples have spectral radius ρ(M) = 1 − cN^{-1}. This comes closer and closer to 1 as the mesh is refined and the matrix grows. An improved preconditioner P can give ρ(M) = 1 − cN^{-1/2}. Then ρ is smaller and convergence is faster, as in "overrelaxation." But a different approach has given more flexibility in constructing a good P, from a quick incomplete LU factorization of the true matrix A:
Incomplete LU    P = (approximation to L)(approximation to U).
The exact A = LU has fill-in, so zero entries in A become nonzero in L and U. The approximate L and U could ignore this fill-in (fairly dangerous). Or P = L_approx U_approx can keep only the fill-in entries F above a fixed threshold. The variety of options, and the fact that the computer can decide automatically which entries to keep, has made the ILU idea (incomplete LU) a very popular starting point.
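MATLAB's ilu command constructs such approximate factors; here is a sketch (the options and sizes are my choices):

    % Incomplete LU: keep the sparsity of A ('nofill') or drop small entries
    N = 20;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);
    A = kron(K,speye(N)) + kron(speye(N),K);       % K2D as a test matrix
    [L,U] = ilu(A, struct('type','nofill'));       % no fill-in allowed: P = L*U
    nnz(L) + nnz(U)                                % far fewer nonzeros than lu(A)
    setup.type = 'ilutp';  setup.droptol = 1e-2;   % or drop entries below a threshold
    [L2,U2,P2] = ilu(A, setup);                    % threshold ILU with pivoting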
Example  The −1, 2, −1 matrix A = K provides an excellent example. We choose the preconditioner P = T, the same matrix with T_11 = 1 instead of K_11 = 2. The LU factors of T are perfect first differences, with diagonals of +1 and −1. (Remember that all pivots of T equal 1, while the pivots of K are 2/1, 3/2, 4/3, . . .) We can compute the right side of T^{-1}Kx = T^{-1}b with only 2N additions and no multiplications (just back substitution using L and U). Idea: This L and U are approximately correct for K.
The matrix P^{-1}A = T^{-1}K on the left side is triangular. More than that, T is a rank 1 change from K (the 1,1 entry changes from 2 to 1). It follows that T^{-1}K and K^{-1}T are rank 1 changes from the identity matrix I. A calculation in Problem 10 shows that only the first column of I is changed, by the "linear vector" ℓ = (N, N−1, . . . , 1):

P^{-1}A = T^{-1}K = I + ℓ e_1^T   and   K^{-1}T = I − (ℓ e_1^T)/(N + 1).    (7)
Here e_1 = (1, 0, . . . , 0)^T so ℓ e_1^T has first column ℓ. This example finds x = K^{-1}b by a quick exact formula (K^{-1}T)(T^{-1}b), needing only 2N additions for T^{-1} and N additions and multiplications for K^{-1}T. In practice we wouldn't precondition this K (just solve).
The usual purpose of preconditioning is to speed up convergence for iterative methods, and that depends on the eigenvalues of P^{-1}A. Here the eigenvalues of T^{-1}K are its diagonal entries N+1, 1, . . . , 1. This example will illustrate a special property of conjugate gradients, that with only two different eigenvalues it reaches the true solution x in two steps.
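A few MATLAB lines (a sketch; N is arbitrary) confirm both formulas in (7):

    % Verify equation (7): T\K = I + ell*e1' and K\T = I - ell*e1'/(N+1)
    N = 6;  e = ones(N,1);
    K = full(spdiags([-e 2*e -e], -1:1, N, N));
    T = K;  T(1,1) = 1;                   % the preconditioner: 1 replaces 2 in the corner
    ell = (N:-1:1)';                      % the "linear vector" (N, N-1, ..., 1)
    e1 = [1; zeros(N-1,1)];
    norm(T\K - (eye(N) + ell*e1'))        % zero up to roundoff
    norm(K\T - (eye(N) - ell*e1'/(N+1)))  % the inverse formula in (7)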
The iteration P x_{k+1} = (P − A)x_k + b is too simple! It is choosing one particular vector in a "Krylov subspace." With relatively little work we can make a much better choice of x_k. Krylov projections are the state of the art in today's iterative methods.
Krylov Subspaces
Our original equation is Ax = b. The preconditioned equation is P^{-1}Ax = P^{-1}b. When we write P^{-1}, we never intend that an inverse would be explicitly computed (except in our example). The ordinary iteration is a correction to x_k by the vector P^{-1}r_k:

P x_{k+1} = (P − A)x_k + b   or   P x_{k+1} = P x_k + r_k   or   x_{k+1} = x_k + P^{-1}r_k.    (8)
Here r_k = b − Ax_k is the residual. It is the error in Ax = b, not the error e_k in x. The symbol P^{-1}r_k represents the change from x_k to x_{k+1}, but that step is not computed by multiplying P^{-1} times r_k. We might use incomplete LU, or a few steps of a "multigrid" iteration, or "domain decomposition." Or an entirely new preconditioner.
In describing Krylov subspaces, I should work with P^{-1}A. For simplicity I will only write A. I am assuming that P has been chosen and used, and the preconditioned equation P^{-1}Ax = P^{-1}b is given the notation Ax = b. The preconditioner is now P = I. Our new matrix A is probably better than the original matrix with that name.
The Krylov subspace K_k(A, b) contains all combinations of b, Ab, . . . , A^{k-1}b. These are the vectors that we can compute quickly, multiplying by a sparse A. We look in this space K_k for the approximation x_k to the true solution of Ax = b. Notice that the pure iteration x_k = (I − A)x_{k-1} + b does produce a vector in K_k when x_{k-1} is in K_{k-1}. The Krylov subspace methods make other choices of x_k. Here are four different approaches to choosing a good x_k in K_k—this is the important decision:
1. The residual r_k = b − Ax_k is orthogonal to K_k (Conjugate Gradients, . . . )

2. The residual r_k has minimum norm for x_k in K_k (GMRES, MINRES, . . . )

3. r_k is orthogonal to a different space like K_k(A^T) (BiConjugate Gradients, . . . )

4. The error e_k has minimum norm (SYMMLQ; for BiCGStab x_k is in A^T K_k(A^T); . . . )
In every case we hope to compute the new x_k quickly and stably from the earlier x's. If that recursion only involves x_{k-1} and x_{k-2} (a short recurrence) it is especially fast. We will see this happen for conjugate gradients and symmetric positive definite A. The BiCG method in 3 is a natural extension of short recurrences to unsymmetric A—but stability and other questions open the door to the whole range of methods.
To compute x_k we need a basis for K_k. The best basis q_1, . . . , q_k is orthonormal. Each new q_k comes from orthogonalizing t = Aq_{k-1} to the basis vectors q_1, . . . , q_{k-1} that are already chosen. This is the Gram-Schmidt idea (called modified Gram-Schmidt when we subtract projections of t onto the q's one at a time, for numerical stability). The iteration to compute the orthonormal q's is known as Arnoldi's method:
    1  q_1 = b/‖b‖                      % Normalize to ‖q_1‖ = 1
       for j = 1, . . . , k−1
    2    t = A q_j                      % t is in the Krylov space K_{j+1}(A, b)
         for i = 1, . . . , j
    3      h_{ij} = q_i^T t             % h_{ij} q_i = projection of t onto q_i
    4      t = t − h_{ij} q_i           % Subtract component of t along q_i
         end
    5    h_{j+1,j} = ‖t‖                % t is now orthogonal to q_1, . . . , q_j
    6    q_{j+1} = t/h_{j+1,j}          % Normalize t to ‖q_{j+1}‖ = 1
       end                              % q_1, . . . , q_k are orthonormal in K_k
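The same steps as a runnable MATLAB function (a direct transcription; the q's are stored as columns of Q):

    function [Q,H] = arnoldi(A,b,k)
    % Arnoldi iteration: orthonormal basis Q of K_k(A,b) and Hessenberg H
    % with A*Q(:,1:k-1) = Q*H, as in equation (9) below
    n = length(b);  Q = zeros(n,k);  H = zeros(k,k-1);
    Q(:,1) = b/norm(b);                    % step 1
    for j = 1:k-1
        t = A*Q(:,j);                      % step 2
        for i = 1:j
            H(i,j) = Q(:,i)'*t;            % step 3
            t = t - H(i,j)*Q(:,i);         % step 4 (modified Gram-Schmidt)
        end
        H(j+1,j) = norm(t);                % step 5
        Q(:,j+1) = t/H(j+1,j);             % step 6
    end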
Put the column vectors q_1, . . . , q_k into an n by k matrix Q_k. Multiplying rows of Q_k^T by columns of Q_k produces all the inner products q_i^T q_j, which are the 0's and 1's in the identity matrix. The orthonormal property means that Q_k^T Q_k = I_k.
Arnoldi constructs each q_{j+1} from Aq_j by subtracting projections h_{ij} q_i. If we express the steps up to j = k−1 in matrix notation, they become AQ_{k-1} = Q_k H_{k,k-1}:

$$
AQ_{k-1} = \begin{bmatrix} Aq_1 & \cdots & Aq_{k-1} \end{bmatrix}
= \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}
\begin{bmatrix} h_{11} & h_{12} & \cdot & h_{1,k-1} \\ h_{21} & h_{22} & \cdot & h_{2,k-1} \\ 0 & h_{32} & \cdot & \cdot \\ 0 & 0 & \cdot & h_{k,k-1} \end{bmatrix}
\qquad (9)
$$

The shapes are n by k−1, n by k, and k by k−1.
That matrix H_{k,k-1} is "upper Hessenberg" because it has only one nonzero diagonal below the main diagonal. We check that the first column of this matrix equation (multiplying by columns!) produces q_2:

Aq_1 = h_{11} q_1 + h_{21} q_2   or   q_2 = (Aq_1 − h_{11} q_1)/h_{21}.    (10)

That subtraction is Step 4 in Arnoldi's algorithm. Division by h_{21} is Step 6.
Unless more of the h_{ij} are zero, the cost is increasing at every iteration. We have k dot products to compute at steps 3 and 5, and k vector updates in steps 4 and 6. A short recurrence means that most of these h_{ij} are zero. That happens when A = A^T.
The matrix H is tridiagonal when A is symmetric. This fact is the foundation of conjugate gradients. For a matrix proof, multiply equation (9) by Q_{k-1}^T. The right side becomes H without its last row, because (Q_{k-1}^T Q_k) H_{k,k-1} = [I 0] H_{k,k-1}. The left side Q_{k-1}^T A Q_{k-1} is always symmetric when A is symmetric. So that H matrix has to be symmetric, which makes it tridiagonal. There are only three nonzeros in the rows and columns of H, and Gram-Schmidt to find q_{k+1} only involves q_k and q_{k-1}:
Arnoldi when A = A^T    Aq_k = h_{k+1,k} q_{k+1} + h_{k,k} q_k + h_{k-1,k} q_{k-1}.    (11)
This is the Lanczos iteration. Each new q_{k+1} = (Aq_k − h_{k,k} q_k − h_{k-1,k} q_{k-1})/h_{k+1,k} involves one multiplication Aq_k, two dot products for new h's, and two vector updates.
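For symmetric A the Arnoldi inner loop collapses to this three-term recurrence; here is a sketch (test matrix is my choice, and a careful code would add reorthogonalization):

    % Lanczos: the short recurrence (11) for symmetric A
    n = 50;  k = 8;
    A = gallery('tridiag',n);              % symmetric -1,2,-1 test matrix
    b = randn(n,1);
    Q = zeros(n,k+1);  Q(:,1) = b/norm(b);
    beta = 0;  qold = zeros(n,1);
    for j = 1:k
        t = A*Q(:,j) - beta*qold;          % subtract h_{j-1,j} q_{j-1}
        alpha = Q(:,j)'*t;                 % h_{j,j}
        t = t - alpha*Q(:,j);              % subtract h_{j,j} q_j
        beta = norm(t);                    % h_{j+1,j}
        qold = Q(:,j);
        Q(:,j+1) = t/beta;
    end
    H = Q(:,1:k)'*A*Q(:,1:k);              % tridiagonal up to roundoff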
The QR Method for Eigenvalues
Allow me an important comment on the eigenvalue problem Ax = λx. We have seen that H_{k-1} = Q_{k-1}^T A Q_{k-1} is tridiagonal if A = A^T. When k−1 reaches n and Q_n is square, the matrix H = Q_n^T A Q_n = Q_n^{-1} A Q_n has the same eigenvalues as A:

Same λ    Hy = Q_n^{-1} A Q_n y = λy   gives   Ax = λx  with  x = Q_n y.    (12)
It is much easier to find the eigenvalues λ for a tridiagonal H than for the original A. The famous "QR method" for the eigenvalue problem starts with T_1 = H, factors it into T_1 = Q_1 R_1 (this is Gram-Schmidt on the short columns of T_1), and reverses order to produce T_2 = R_1 Q_1. The matrix T_2 is again tridiagonal, and its off-diagonal entries are normally smaller than for T_1. The next step is Gram-Schmidt on T_2, orthogonalizing its columns in Q_2 by the combinations in the upper triangular R_2:
QR Method    Factor T_2 into Q_2 R_2. Reverse order to T_3 = R_2 Q_2 = Q_2^{-1} T_2 Q_2.    (13)
By the reasoning in (12), any Q^{-1} T Q has the same eigenvalues as T. So the matrices T_2, T_3, . . . all have the same eigenvalues as T_1 = H and A. (These square Q_k from Gram-Schmidt are entirely different from the rectangular Q_k in Arnoldi.) We can even shift T before Gram-Schmidt, and we should, provided we remember to shift back:
Shifted QR    Factor T_k − s_k I = Q_k R_k. Reverse to T_{k+1} = R_k Q_k + s_k I.    (14)
When the shift s_k is chosen to be the n, n entry of T_k, the last off-diagonal entry of T_{k+1} becomes very small. The n, n entry of T_{k+1} moves close to an eigenvalue. Shifted QR is one of the great algorithms of numerical linear algebra. It solves moderate-size eigenvalue problems with great efficiency. This is the core of MATLAB's eig(A).
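A bare-bones sketch of (14) shows the idea (the test matrix and iteration count are my choices; eig uses a far more careful version):

    % Shifted QR iteration on a symmetric tridiagonal T
    n = 6;
    T = full(gallery('tridiag',n));     % start from T1 = H
    for step = 1:20
        s = T(n,n);                     % shift s_k = the n,n entry
        [Q,R] = qr(T - s*eye(n));       % factor T_k - s_k I = Q_k R_k
        T = R*Q + s*eye(n);             % reverse order and shift back, as in (14)
    end
    T(n,n-1)                            % last off-diagonal entry: near zero
    T(n,n)                              % close to an eigenvalue of the original T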
For a large symmetric matrix, we often stop the Arnoldi-Lanczos iteration at a tridiagonal H_k with k < n. The full n-step process to reach H_n is too expensive, and often we don't need all n eigenvalues. So we compute (by the same QR method) the k eigenvalues of H_k instead of the n eigenvalues of H_n. These computed λ_{1k}, λ_{2k}, . . . , λ_{kk} can provide good approximations to the first k eigenvalues of A. And we have an excellent start on the eigenvalue problem for H_{k+1}, if we decide to take a further step. This Lanczos method will find, approximately and iteratively and quickly, the leading eigenvalues of a large symmetric matrix.
The Conjugate Gradient Method
We return to iterative methods for Ax = b. The Arnoldi algorithm produced orthonormal basis vectors q_1, q_2, . . . for the growing Krylov subspaces K_1, K_2, . . .. Now we select vectors x_1, x_2, . . . in those subspaces that approach the exact solution to Ax = b. We concentrate on the conjugate gradient method for symmetric positive definite A.
The rule for x_k in conjugate gradients is that the residual r_k = b − Ax_k should be orthogonal to all vectors in K_k. Since r_k will be in K_{k+1}, it must be a multiple of Arnoldi's next vector q_{k+1}! Each residual is therefore orthogonal to all previous residuals (which are multiples of the previous q's):

Orthogonal residuals    r_i^T r_k = 0  for  i < k.    (15)
The difference between r_k and q_{k+1} is that the q's are normalized, as in q_1 = b/‖b‖. Since r_{k-1} is a multiple of q_k, the difference r_k − r_{k-1} is orthogonal to each subspace K_i with i < k. Certainly x_i − x_{i-1} lies in that K_i. So ∆r is orthogonal to earlier ∆x's:

(x_i − x_{i-1})^T (r_k − r_{k-1}) = 0  for  i < k.    (16)
These differences ∆x and ∆r are directly connected, because the b's cancel in ∆r:

r_k − r_{k-1} = (b − Ax_k) − (b − Ax_{k-1}) = −A(x_k − x_{k-1}).    (17)
Substituting (17) into (16), the updates in the x's are "A-orthogonal" or conjugate:

Conjugate updates ∆x    (x_i − x_{i-1})^T A (x_k − x_{k-1}) = 0  for  i < k.    (18)
Now we have all the requirements. Each conjugate gradient step will find a new "search direction" d_k for the update x_k − x_{k-1}. From x_{k-1} it will move the right distance α_k d_k to x_k. Using (17) it will compute the new r_k. The constants β_k in the search direction and α_k in the update will be determined by (15) and (16) for i = k−1. For symmetric A the orthogonality in (15) and (16) will be automatic for i < k−1, as in Arnoldi. We have a "short recurrence" for the new x_k and r_k.
Here is one cycle of the algorithm, starting from x_0 = 0 and r_0 = b and β_1 = 0. It involves only two new dot products and one matrix-vector multiplication Ad:

Conjugate    1  β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}    % Improvement this step
Gradient     2  d_k = r_{k-1} + β_k d_{k-1}                    % Next search direction
Method       3  α_k = r_{k-1}^T r_{k-1} / d_k^T A d_k          % Step length to next x_k
             4  x_k = x_{k-1} + α_k d_k                        % Approximate solution
             5  r_k = r_{k-1} − α_k A d_k                      % New residual from (17)
The formulas 1 and 3 for β_k and α_k are explained briefly below—and fully by Trefethen-Bau [ ] and Shewchuk [ ] and many other good references.
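Here is the cycle as a complete MATLAB function (a direct transcription of steps 1–5; the stopping test on ‖r_k‖ is my addition):

    function x = cg(A,b,kmax)
    % Conjugate gradient method for symmetric positive definite A
    x = zeros(size(b));  r = b;  d = r;    % x0 = 0, r0 = b, d1 = r0 (beta_1 = 0)
    rr = r'*r;
    for k = 1:kmax
        Ad = A*d;                          % the one matrix-vector multiplication
        alpha = rr/(d'*Ad);                % step 3: step length
        x = x + alpha*d;                   % step 4: approximate solution
        r = r - alpha*Ad;                  % step 5: new residual from (17)
        rrnew = r'*r;
        if sqrt(rrnew) < 1e-12, break, end
        beta = rrnew/rr;                   % step 1: improvement this step
        d = r + beta*d;                    % step 2: next search direction
        rr = rrnew;
    end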
Different Viewpoints on Conjugate Gradients
I want to describe the (same!) conjugate gradient method in two different ways:

1. It solves a tridiagonal system Hy = f recursively

2. It minimizes the energy (1/2)x^T Ax − x^T b recursively.
How does Ax = b change to the tridiagonal Hy = f? That uses Arnoldi's orthonormal columns q_1, . . . , q_n in Q, with Q^T Q = I and Q^T AQ = H:

Ax = b  is  (Q^T AQ)(Q^T x) = Q^T b  which is  Hy = f = (‖b‖, 0, . . . , 0).    (19)
Since q_1 is b/‖b‖, the first component of f = Q^T b is q_1^T b = ‖b‖, and the other components are q_i^T b = 0. The conjugate gradient method is implicitly computing this symmetric tridiagonal H and updating the solution y at each step. Here is the third step:

$$
H_3 y_3 = \begin{bmatrix} h_{11} & h_{12} & \\ h_{21} & h_{22} & h_{23} \\ & h_{32} & h_{33} \end{bmatrix} y_3
= \begin{bmatrix} \|b\| \\ 0 \\ 0 \end{bmatrix}.
\qquad (20)
$$
This is the equation Ax = b projected by Q_3 onto the third Krylov subspace K_3. These h's never appear in conjugate gradients. We don't want to do Arnoldi too!
It is the LDL^T factors of H that CG is somehow computing—two new numbers at each step. Those give a fast update from y_{j-1} to y_j. The corresponding x_j = Q_j y_j from conjugate gradients approaches the exact solution x_n = Q_n y_n which is x = A^{-1}b.
If we can see conjugate gradients also as an energy minimizing algorithm, we can extend it to nonlinear problems and use it in optimization. For our linear equation Ax = b, the energy is E(x) = (1/2)x^T Ax − x^T b. Minimizing E(x) is the same as solving Ax = b, when A is positive definite (the main point of Section 1. ). The CG iteration minimizes E(x) on the growing Krylov subspaces. On the first subspace K_1, the line where x is αb = αd_1, this minimization produces the right
value for α_1:

E(αb) = (1/2)α^2 b^T Ab − α b^T b   is minimized at   α_1 = b^T b / b^T Ab.    (21)
That α_1 is the constant chosen in step 3 of the first conjugate gradient cycle.
The gradient of E(x) = (1/2)x^T Ax − x^T b is exactly Ax − b. The steepest descent direction at x_1 is along the negative gradient, which is r_1! This sounds like the perfect direction d_2 for the next move. But the great difficulty with steepest descent is that this r_1 can be too close to the first direction. Little progress that way. So we add the right multiple β_2 d_1, in order to make d_2 = r_1 + β_2 d_1 A-orthogonal to the first direction d_1.
Then we move in this conjugate direction d_2 to x_2 = x_1 + α_2 d_2. This explains the name conjugate gradients, rather than the pure gradients of steepest descent. Every cycle of CG chooses α_j to minimize E(x) in the new search direction x = x_{j-1} + α d_j. The last cycle (if we go that far) gives the overall minimizer x_n = x = A^{-1}b.
Example    Ax = b is

$$
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \end{bmatrix}.
$$

From x_0 = 0 and β_1 = 0 and r_0 = d_1 = b the first cycle gives α_1 = 1/2 and x_1 = b/2 = (2, 0, 0). The new residual is r_1 = b − Ax_1 = (0, −2, −2). Then the second cycle yields

β_2 = 8/16,   d_2 = (2, −2, −2),   α_2 = 8/16,   x_2 = (3, −1, −1) = A^{-1}b!

The correct solution is reached in two steps, where normally it will take n = 3 steps. The reason is that this particular A has only two distinct eigenvalues 4 and 1. In that case A^{-1}b is a combination of b and Ab, and this best combination x_2 is found at cycle 2. The residual r_2 is zero and the cycles stop early—very unusual.
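Running the cg function sketched earlier on this A and b confirms the two-step convergence:

    A = [2 1 1; 1 2 1; 1 1 2];  b = [4; 0; 0];
    x = cg(A, b, 3)        % returns (3, -1, -1): the residual is zero after cycle 2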
Energy minimization leads in [ ] to an estimate of the convergence rate for the error e = x − x_j in conjugate gradients, using the A-norm ‖e‖_A = √(e^T Ae):

$$
\text{Error estimate}\qquad
\|x - x_j\|_A \;\le\; 2\left(\frac{\sqrt{\lambda_{\max}} - \sqrt{\lambda_{\min}}}{\sqrt{\lambda_{\max}} + \sqrt{\lambda_{\min}}}\right)^{j} \|x - x_0\|_A .
\qquad (22)
$$

This is the best-known error estimate, although it doesn't account for any clustering of the eigenvalues of A. It involves only the condition number λ_max/λ_min. Problem gives the "optimal" error estimate but it is not so easy to compute. That optimal estimate needs all the eigenvalues of A, while (22) uses only the extreme eigenvalues λ_max(A) and λ_min(A)—which in practice we can bound above and below.
Minimum Residual Methods
When A is not symmetric positive definite, conjugate gradients are not guaranteed to solve Ax = b. Most likely they won't. We will follow van der Vorst [ ] in briefly describing the minimum norm residual approach, leading to MINRES and GMRES. These methods choose x_j in the Krylov subspace K_j so that ‖b − Ax_j‖ is minimal.
First we compute the orthonormal Arnoldi vectors q_1, . . . , q_j. They go in the columns of Q_j, so Q_j^T Q_j = I. As in (19) we set x_j = Q_j y, to express the solution as a combination of those q's. Then the norm of the residual r_j, using (9), is

‖b − Ax_j‖ = ‖b − AQ_j y‖ = ‖b − Q_{j+1} H_{j+1,j} y‖.    (23)
These vectors are all in the Krylov space K_{j+1}, where r_j^T (Q_{j+1} Q_{j+1}^T r_j) = r_j^T r_j. This says that the norm is not changed when we multiply by Q_{j+1}^T. Our problem becomes:

Choose y to minimize    ‖r_j‖ = ‖Q_{j+1}^T b − H_{j+1,j} y‖ = ‖f − Hy‖.    (24)
This is an ordinary least squares problem for the equation Hy = f with only j+1 equations and j unknowns. The right side f = Q_{j+1}^T b is (‖r_0‖, 0, . . . , 0) as in (19). The matrix H = H_{j+1,j} is Hessenberg as in (9), with one nonzero diagonal below the main diagonal. We face a completely typical problem of numerical linear algebra: Use the special properties of H and f to find a fast algorithm that computes y. The two favorite algorithms for this least squares problem are closely related:
MINRES    A is symmetric (probably indefinite, or we use CG) and H is tridiagonal

GMRES    A is not symmetric and the upper triangular part of H can be full

In both cases we want to clear out that nonzero diagonal below the main diagonal of H. The natural way to do that, one nonzero entry at a time, is by "Givens rotations." These plane rotations are so useful and simple (the essential part is only 2 by 2) that we complete this section by explaining them.
Givens Rotations
The direct approach to the least squares solution of Hy = f constructs the normal equations H^T Hy = H^T f. That was the central idea in Chapter 1, but you see what we lose. If H is Hessenberg, with many good zeros, H^T H is full. Those zeros in H should simplify and shorten the computations, so we don't want the normal equations.

The other approach to least squares is by Gram-Schmidt. We factor H into orthogonal times upper triangular. Since the letter Q is already used, the orthogonal matrix will be called G (after Givens). The upper triangular matrix is G^{-1}H. The 3 by 2 case shows how a plane rotation G_{21}^{-1} can clear out the subdiagonal entry
h_{21}:

$$
G_{21}^{-1} H =
\begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ 0 & h_{32} \end{bmatrix}
= \begin{bmatrix} * & * \\ \mathbf{0} & * \\ 0 & * \end{bmatrix}.
\qquad (25)
$$
That bold zero entry requires h_{11} sin θ = h_{21} cos θ, which determines θ. A second rotation G_{32}^{-1}, in the 2-3 plane, will zero out the 3,2 entry. Then G_{32}^{-1} G_{21}^{-1} H is a square upper triangular matrix U above a row of zeros!
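In MATLAB the two rotations look like this (H and f are small assumed data, not from the text):

    % Two Givens rotations clear the subdiagonal of a 3 by 2 Hessenberg H
    H = [2 1; 1 3; 0 2];  f = [5; 0; 0];     % least squares Hy = f (assumed data)
    for i = 1:2
        r = hypot(H(i,i), H(i+1,i));
        c = H(i,i)/r;  s = H(i+1,i)/r;       % cos and sin with h11*sin = h21*cos
        G = eye(3);  G([i i+1],[i i+1]) = [c s; -s c];
        H = G*H;  f = G*f;                   % rotate H and the right side together
    end
    y = H(1:2,:)\f(1:2)                      % solve Uy = F exactly
    e = abs(f(3))                            % the unreducible error in (26)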
The Givens orthogonal matrix is G = G_{21} G_{32} but there is no reason to do this multiplication. We use each G_{ij} as it is constructed, to simplify the least squares problem. Rotations (and all orthogonal matrices) leave the lengths of vectors unchanged:

$$
\|Hy - f\| = \left\| G_{32}^{-1}G_{21}^{-1}Hy - G_{32}^{-1}G_{21}^{-1}f \right\|
= \left\| \begin{bmatrix} U \\ 0 \end{bmatrix} y - \begin{bmatrix} F \\ e \end{bmatrix} \right\| .
\qquad (26)
$$
This length is what MINRES and GMRES minimize. The row of zeros below U means that the last entry e is the error—we can't reduce it. But we get all the other entries exactly right by solving the j by j system Uy = F (here j = 2). This gives the best least squares solution y. Going back to the original problem of minimizing ‖r‖ = ‖b − Ax_j‖, the best x_j in the Krylov space K_j is Q_j y.
For non-symmetric A (GMRES rather than MINRES) we don't have a short recurrence. The upper triangle in H can be full, and step j becomes expensive and possibly inaccurate as j increases. So we may change "full GMRES" to GMRES(m), which restarts the algorithm every m steps. It is not so easy to choose a good m.
Problem Set 3.6
1  Create K2D for a 4 by 4 square grid with N^2 = 3^2 interior mesh points (so n = 9). Print out its factors K = LU (or its Cholesky factor C = chol(K) for the symmetrized form K = C^T C). How many zeros in these triangular factors? Also print out inv(K) to see that it is full.
2  As N increases, what parts of the LU factors of K2D are filled in?
3  Can you answer the same question for K3D? In each case we really want an estimate cN^p of the number of nonzeros (the most important number is p).
4  Use the tic; ...; toc clocking command to compare the solution time for K2D x = f with a random f, in ordinary MATLAB and sparse MATLAB (where K2D is defined as a sparse matrix). Above what value of N does the sparse routine K\f win?
5  Compare ordinary vs. sparse solution times in the three-dimensional K3D x = f with a random f. At which N does the sparse K\f begin to win?
6  Incomplete LU

7  Conjugate gradients
8  Draw the next step after Figure 3.19 when the matrix has become 4 by 4 and the graph has nodes 2–4–5–6. Which have minimum degree? Is there more fill-in?
9  Redraw the right side of Figure 3.19 if row number 2 is chosen as the second pivot row. Node 2 does not have minimum degree. Indicate new edges in the 5-node graph and new nonzeros F in the matrix.
10  To show that T^{-1}K = I + ℓ e_1^T in (7), with e_1^T = [1 0 . . . 0], we can start from K = T + e_1 e_1^T. Then T^{-1}K = I + (T^{-1}e_1) e_1^T and we verify that Tℓ = e_1:

$$
T\ell = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & \cdot & \cdot & \cdot \\ & & -1 & 2 \end{bmatrix}
\begin{bmatrix} N \\ N-1 \\ \cdot \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ \cdot \\ 0 \end{bmatrix} = e_1 .
$$

Second differences of a linear vector are zero. Now multiply T^{-1}K = I + ℓ e_1^T times I − (ℓ e_1^T)/(N+1) to establish the inverse matrix K^{-1}T in (7).
11  Arnoldi expresses each Aq_k as h_{k+1,k} q_{k+1} + h_{k,k} q_k + · · · + h_{1,k} q_1. Multiply by q_i^T to find h_{i,k} = q_i^T Aq_k. If A is symmetric you can write this as (Aq_i)^T q_k. Explain why (Aq_i)^T q_k = 0 for i < k − 1 by expanding Aq_i into h_{i+1,i} q_{i+1} + · · · + h_{1,i} q_1. We have a short recurrence if A = A^T (only h_{k+1,k} and h_{k,k} and h_{k-1,k} are nonzero).