3.6 Solving Large Linear Systems
Finite elements and finite differences produce large linear systems KU = F. The matrices K are extremely sparse. They have only a small number of nonzero entries in a typical row. In "physical space" those nonzeros are clustered tightly together—they come from neighboring nodes and meshpoints. But we cannot number N^2 nodes in a plane in any way that keeps neighbors close together! So in 2-dimensional problems, and even more in 3-dimensional problems, we meet three questions right away:
1. How best to number the nodes

2. How to use the sparseness of K (when nonzeros can be widely separated)

3. Whether to choose direct elimination or an iterative method.
That last point will split this section into two parts—elimination methods in 2D (where node order is important) and iterative methods in 3D (where preconditioning is crucial).
To fix ideas, we will create the n equations KU = F from Laplace's difference equation in an interval, a square, and a cube. With N unknowns in each direction, K has order n = N or N^2 or N^3. There are 3 or 5 or 7 nonzeros in a typical row of the matrix. Second differences in 1D, 2D, and 3D are shown in Figure 3.17.
Figure 3.17: 3, 5, 7 point difference molecules for −u_xx, −u_xx − u_yy, −u_xx − u_yy − u_zz. The corresponding matrices are tridiagonal K (N by N), block tridiagonal K2D (N^2 by N^2), and K3D (N^3 by N^3).
Along a typical row of the matrix, the entries add to zero. In two dimensions this is 4 − 1 − 1 − 1 − 1 = 0. This "zero sum" remains true for finite elements (the element shapes decide the exact numerical entries). It reflects the fact that u = 1 solves Laplace's equation and U_i = 1 has differences equal to zero. The constant vector solves KU = 0 except near the boundaries. When a neighbor is a boundary point where U_i is known, its value moves onto the right side of KU = F. Then that row of K is not zero sum. Otherwise K would be singular, with K ∗ ones(n,1) = zeros(n,1).
Using block matrix notation, we can create the 2D matrix K = K2D from the familiar N by N second difference matrix K. We number the nodes of the square a row at a time (this "natural numbering" is not necessarily best). Then the −1's for the neighbor above and the neighbor below are N positions away from the main diagonal of K2D. The 2D matrix is block tridiagonal with tridiagonal blocks:
$$
K = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & \cdot & \cdot & \cdot \\ & & -1 & 2 \end{bmatrix}
\qquad
K_{2D} = \begin{bmatrix} K+2I & -I & & \\ -I & K+2I & -I & \\ & \cdot & \cdot & \cdot \\ & & -I & K+2I \end{bmatrix}
\qquad (1)
$$

For tridiagonal K the size and elimination time are both N. Elimination in this order for K2D: size n = N^2, bandwidth w = N, space nw = N^3, time nw^2 = N^4.
The matrix K2D has 4's down the main diagonal. Its bandwidth w = N is the distance from the diagonal to the nonzeros in −I. Many of the spaces in between are filled during elimination! Then the storage space required for the factors in K = LU is of order nw = N^3. The time is proportional to nw^2 = N^4, when n rows each contain w nonzeros, and w nonzeros below the pivot require elimination.
Those counts are not impossibly large in many practical 2D problems (and we show how they can be reduced). The horrifyingly large counts come for K3D in three dimensions. Suppose the 3D grid is numbered by square cross-sections in the natural order 1, . . . , N. Then K3D has blocks of order N^2 from those squares. Each square is numbered as above to produce blocks coming from K2D and I = I2D:
$$
K_{3D} = \begin{bmatrix} K_{2D}+2I & -I & & \\ -I & K_{2D}+2I & -I & \\ & \cdot & \cdot & \cdot \\ & & -I & K_{2D}+2I \end{bmatrix}
\qquad
\begin{matrix} \text{Size } n = N^3 \\ \text{Bandwidth } w = N^2 \\ \text{Elimination space } nw = N^5 \\ \text{Elimination time } \approx nw^2 = N^7 \end{matrix}
$$
Now the main diagonal contains 6's, and "inside rows" have six −1's. Next to a point or edge or corner of the boundary cube, we lose one or two or three of those −1's.
The good way to create K2D from K and I (N by N) is to use the kron(A, B) command. This Kronecker product replaces each entry a_ij by the block a_ij B. To take second differences in all rows at the same time, and then all columns, use kron:

K2D = kron(K, I) + kron(I, K).    (2)
The identity matrix in two dimensions is I2D = kron(I, I). This adjusts to allow rectangles, with I's of different sizes, and in three dimensions to allow boxes. For a cube we take second differences inside all planes and also in the z-direction:

K3D = kron(K2D, I) + kron(I2D, K).
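A few lines of MATLAB (a sketch, with an assumed small size N = 4) build all three matrices exactly this way:

    % Build K, K2D, K3D by Kronecker products, as in equation (2)
    N = 4;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);   % second difference matrix, N by N
    I = speye(N);
    K2D = kron(K,I) + kron(I,K);            % size N^2, 5 nonzeros in an inside row
    I2D = kron(I,I);
    K3D = kron(K2D,I) + kron(I2D,K);        % size N^3, 7 nonzeros in an inside row
    spy(K2D)                                % display the sparsity pattern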
Having set up these special matrices K2D and K3D, we have to say that there are special ways to work with them. The x, y, z directions are separable. The geometry (a box) is also separable. See Section 7.2 on Fast Poisson Solvers. Here the matrices K and K2D and K3D are serving as models of the type of matrices that we meet.
Minimum Degree Algorithm
We now describe (a little roughly) a useful reordering of the nodes and the equations in K2D U = F. The ordering achieves minimum degree at each step—the number of nonzeros below the pivot row is minimized. This is essentially the algorithm used in MATLAB's command U = K\F, when K has been defined as a sparse matrix.
We list some of the functions from the sparfun directory:

speye (sparse identity I)    nnz (number of nonzero entries)
find (find indices of nonzeros)    spy (visualize sparsity pattern)
colamd and symamd (approximate minimum degree permutation of K)

You can test and use the minimum degree algorithms without a careful analysis. The approximations are faster than the exact minimum degree permutations colmmd and symmmd. The speed (in two dimensions) and the roundoff errors are quite reasonable.
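Here is a quick experiment (a sketch; the sizes are my choice, not from the text) that uses these functions to see how much fill-in the reordering avoids:

    % Fill-in with natural ordering vs approximate minimum degree
    N = 20;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);
    K2D = kron(K,speye(N)) + kron(speye(N),K);
    p = symamd(K2D);              % approximate minimum degree permutation
    nnz(chol(K2D))                % nonzeros in the Cholesky factor, natural order
    nnz(chol(K2D(p,p)))           % fewer nonzeros after reordering
    spy(chol(K2D(p,p)))           % visualize where the nonzeros end up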
In the Laplace examples, the minimum degree ordering of nodes is irregular compared to "a row at a time." The final bandwidth is probably not decreased. But the nonzero entries are postponed as long as possible! That is the key.
The difference is shown in the arrow matrix of Figure 3.18. On the left, minimum degree (one nonzero off the diagonal) leads to large bandwidth. But there is no fill-in. Elimination will only change its last row and column. The triangular factors L and U have all the same zeros as A. The space for storage stays at 3n, and elimination needs only n divisions and multiplications and subtractions.
Figure 3.18: Arrow matrix: minimum degree (no F) against minimum bandwidth. Bandwidths 6 and 3; fill-in 0 and 6.
The second ordering reduces the bandwidth from 6 to 3. But when row 4 is reached as the pivot row, the entries indicated by F are filled in. That full lower quarter of A gives n^2/8 nonzeros to both factors L and U. You see that the whole "profile" of the matrix decides the fill-in, not just the bandwidth.
The minimum degree algorithm chooses the (k+1)st pivot column, after k columns have been eliminated as usual below the diagonal, by the following rule: In the remaining matrix of size n − k, select the column with the fewest nonzeros.
The component of U corresponding to that column is renumbered k+1. So is the node in the finite difference grid. Of course elimination in that column will normally produce new nonzeros in the remaining columns! Some fill-in is unavoidable. So the algorithm must keep track of the new positions of nonzeros, and also the actual entries. It is the positions that decide the ordering of unknowns. Then the entries decide the numbers in L and U.
Example  Figure 3.19 shows a small example of the minimal degree ordering, for Laplace's 5-point scheme. The node connections produce nonzero entries (indicated by ∗) in K. The problem has six unknowns. K has two 3 by 3 tridiagonal blocks from horizontal links, and two 3 by 3 blocks with −I from vertical links.
The degree of a node is the number of connections to other nodes. This is the number of nonzeros in that column of K. The corner nodes 1, 3, 4, 6 all have degree 2. Nodes 2 and 5 have degree 3. A larger region has inside nodes of degree 4, which will not be eliminated first. The degrees change as elimination proceeds, because of fill-in.
The first elimination step chooses row 1 as pivot row, because node 1 has minimum degree 2. (We had to break a tie! Any degree 2 node could come first, leading to different elimination orders.) The pivot is P, the other nonzeros in that row are boxed. When row 1 operates on rows 2 and 4, it changes six entries below it. In particular, the two fill-in entries marked by F change to nonzeros. This fill-in of the (2,4) and (4,2) entries corresponds to the dashed line connecting nodes 2 and 4 in the graph.
Figure 3.19: Minimum degree nodes 1 and 3. The pivots P are in rows 1 and 3; new edges 2–4 and 2–6 in the graph match the matrix entries F filled in by elimination.
Nodes that were connected to the eliminated node are now connected to each other. Elimination continues on the 5 by 5 matrix (and the graph with 5 nodes). Node 2 still has degree 3, so it is not eliminated next. If we break the tie by choosing node 3, elimination using the new pivot P will fill in the (2,6) and (6,2) positions. Node 2 becomes linked to node 6 because they were both linked to the eliminated node 3.
The problem is reduced to 4 by 4, for the unknown U's at the remaining nodes 2, 4, 5, 6. Problem 8 asks you to take the next step—choose a minimum degree node and reduce the system to 3 by 3.
Storing the Nonzero Structure = Sparsity Pattern
A large system KU = F needs a fast and economical storage of the node connections (which match the positions of nonzeros in K). The connections and nonzeros change as elimination proceeds. The list of edges and nonzero positions corresponds to the "adjacency matrix" of the graph of nodes. The adjacency matrix has 1 or 0 to indicate nonzero or zero in K.

For each node i, we have a list adj(i) of the nodes connected to i. How to combine these into one master list NZ for the whole graph and the whole matrix K? A simple way is to store the lists adj(i) sequentially in NZ (the nonzeros for i = 1 up to i = n). An index array IND of pointers tells the starting position of the sublist adj(i) within the master list NZ. It is useful to give IND an (n+1)st entry to point to the final entry in NZ (or to the blank that follows, in Figure 3.20). MATLAB will store one more array (the same length nnz(K) as NZ) to give the actual nonzero entries.
NZ    | 2 4 | 1 3 5 | 2 6 | 1 5 | 2 4 6 | 3 5 |
IND     1     3       6     8     10      13     15
Node    1     2       3     4     5       6

Figure 3.20: Master list NZ of nonzeros (neighbors in Figure 3.19). Positions in IND.
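A minimal sketch of this storage scheme, filled with the lists from Figure 3.20 (the helper adj is mine, not a MATLAB builtin):

    % Master list NZ with pointer array IND (the graph of Figure 3.19)
    NZ  = [2 4   1 3 5   2 6   1 5   2 4 6   3 5];  % adj(1),...,adj(6) end to end
    IND = [1 3 6 8 10 13 15];                       % IND(i) = start of adj(i) in NZ
    adj = @(i) NZ(IND(i):IND(i+1)-1);               % recover the sublist for node i
    adj(5)                                          % neighbors of node 5: [2 4 6]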
The indices i are the "original numbering" of the nodes. If there is renumbering, the new ordering can be stored as a permutation PERM. Then PERM(i) = k when the new number i is assigned to the node with original number k. The text [GL] by George and Liu is the classic reference for this entire section on ordering of the nodes.
Graph Separators
Here is another good ordering, different from minimum degree. Graphs or meshes are often separated into disjoint pieces by a cut. The cut goes through a small number of nodes or meshpoints (a separator). It is a good idea to number the nodes in the separator last. Elimination is relatively fast for the disjoint pieces P and Q. It only slows down at the end, for the (smaller) separator S.
The three groups P, Q, S of meshpoints have no direct connections between P and Q (they are both connected to the separator S). Numbered in that order, the "block arrow" stiffness matrix and its K = LU factorization look like this:

$$
K = \begin{bmatrix} K_P & 0 & K_{PS} \\ 0 & K_Q & K_{QS} \\ K_{SP} & K_{SQ} & K_S \end{bmatrix}
\qquad
L = \begin{bmatrix} L_P & & \\ 0 & L_Q & \\ X & Y & Z \end{bmatrix}
\qquad
U = \begin{bmatrix} U_P & 0 & A \\ & U_Q & B \\ & & C \end{bmatrix}
\qquad (3)
$$
The zero blocks in K give zero blocks in L and U. The submatrix K_P comes first in elimination, to produce L_P and U_P. Then come the factors L_Q U_Q of K_Q, followed by the connections through the separator. The major cost is often that last step, the solution of a fairly dense system of the size of the separator.
Figure 3.21: A graph separator numbered last produces a block arrow matrix K. (Three panels: the arrow matrix of Figure 3.18, the 6-node mesh of Figure 3.19 with its separator S numbered last, and blocks P, Q meeting a separator S.)
Figure 3.21 shows three examples, each with separators. The graph for a perfect arrow matrix has a one-point separator (very unusual). The 6-node rectangle has a two-node separator in the middle. Every N by N grid can be cut by an N-point separator (and N is much smaller than N^2). If the meshpoints form a rectangle, the best cut is down the middle in the shorter direction.
You could say that the numbering of P then Q then S is block minimum degree. But one cut with one separator will not come close to an optimal numbering. It is natural to extend the idea to a nested sequence of cuts. P and Q have their own separators at the next level. This nested dissection continues until it is not productive to cut further. It is a strategy of "divide and conquer."
Figure 3.22 illustrates three levels of nested dissection on a 7 by 7 grid. The first cut is down the middle. Then two cuts go across and four cuts go down. Numbering the separators last within each stage, the matrix K of size 49 has arrows inside arrows inside arrows. The spy command will display the pattern of nonzeros.
Separators and nested dissection show how numbering strategies are based on the graph of nodes and edges in the mesh. Those edges correspond to nonzeros in the matrix K. The nonzeros created by elimination (filled entries in L and U) correspond to paths in the graph. In practice, there has to be a balance between simplicity and optimality in the numbering—in scientific computing simplicity is a very good thing!
… more powerful. So the older iterations of Jacobi and Gauss-Seidel and overrelaxation are less favored in scientific computing, compared to conjugate gradients and GMRES. When the growing Krylov subspaces reach the whole space R^n, these methods (in exact arithmetic) give the exact solution A^{-1}b. But in reality we stop much earlier, long before n steps are complete. The conjugate gradient method (for positive definite A, and with a good preconditioner) has become truly important.
The next ten pages will introduce you to numerical linear algebra. This has become a central part of scientific computing, with a clear goal: Find a fast stable algorithm that uses the special properties of the matrices. We meet matrices that are sparse or symmetric or triangular or orthogonal or tridiagonal or Hessenberg or Givens or Householder. Those matrices are at the core of so many computational problems. The algorithm doesn't need details of the entries (which come from the specific application). By using only their structure, numerical linear algebra offers major help.
Overall, elimination with good numbering is the first choice until storage and CPU time become excessive. This high cost often arises first in three dimensions. At that point we turn to iterative methods, which require more expertise. You must choose the method and the preconditioner. The next pages aim to help the reader at this frontier of scientific computing.
Pure Iterations
We begin with old-style pure iteration (not obsolete). The letter K will be reserved for "Krylov" so we leave behind the notation KU = F. The linear system becomes Ax = b with a large sparse matrix A, not necessarily symmetric or positive definite:

Linear system Ax = b    Residual r_k = b − Ax_k    Preconditioner P ≈ A

The preconditioner P attempts to be "close to A" and at the same time much easier to work with. A diagonal P is one extreme (not very close). P = A is the other extreme (too close). Splitting the matrix A gives an equivalent form of Ax = b:

Splitting    P x = (P − A)x + b.    (4)
This suggests an iteration, in which every vector x_k leads to the next x_{k+1}:

Iteration    P x_{k+1} = (P − A)x_k + b.    (5)

Starting from any x_0, the first step finds x_1 from P x_1 = (P − A)x_0 + b. The iteration continues to x_2 with the same matrix P, so it often helps to know its triangular factors L and U. Sometimes P itself is triangular, or its factors L and U are approximations to the triangular factors of A. Two conditions on P make the iteration successful:
1. The new x_{k+1} must be quickly computable. Equation (5) must be fast to solve.

2. The errors e_k = x − x_k must converge quickly to zero.
Subtract equation (5) from (4) to find the error equation. It connects e_k to e_{k+1}:

Error    P e_{k+1} = (P − A)e_k    which means    e_{k+1} = (I − P^{-1}A)e_k = M e_k.    (6)
The right side b disappears in this error equation. Each step multiplies the error vector by M = I − P^{-1}A. The speed of convergence of x_k to x (and of e_k to zero) depends entirely on M. The test for convergence is given by the eigenvalues of M:

Convergence test    Every eigenvalue of M must have |λ(M)| < 1.

The largest eigenvalue (in absolute value) is the spectral radius ρ(M) = max |λ(M)|. Convergence requires ρ(M) < 1. The convergence rate is set by the largest eigenvalue. For a large problem, we are happy with ρ(M) = .9 and even ρ(M) = .99.
Suppose that the initial error e_0 happens to be an eigenvector of M. Then the next error is e_1 = M e_0 = λ e_0. At every step the error is multiplied by λ, so we must have |λ| < 1. Normally e_0 is a combination of all the eigenvectors. When the iteration multiplies by M, each eigenvector is multiplied by its own eigenvalue. After k steps those multipliers are λ^k. We have convergence if all |λ| < 1.
For preconditioner we first propose two simple choices:

Jacobi iteration        P = diagonal part of A
Gauss-Seidel iteration  P = lower triangular part of A
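A short sketch (test sizes are my own) runs iteration (5) with Jacobi's P and checks the convergence test on M numerically:

    % Jacobi iteration for the -1,2,-1 matrix: P = diagonal part of A
    N = 10;  e = ones(N,1);
    A = spdiags([-e 2*e -e], -1:1, N, N);
    b = ones(N,1);  x = zeros(N,1);
    P = spdiags(diag(A), 0, N, N);     % Jacobi preconditioner
    M = speye(N) - P\A;                % iteration matrix from (6)
    rho = max(abs(eig(full(M))))       % spectral radius, just below 1
    for k = 1:200
        x = P\((P - A)*x + b);         % iteration (5)
    end
    norm(b - A*x)                      % small residual after 200 sweeps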
Typical examples have spectral radius ρ(M) = 1 − cN^{-1}. This comes closer and closer to 1 as the mesh is refined and the matrix grows. An improved preconditioner P can give ρ(M) = 1 − cN^{-1/2}. Then ρ is smaller and convergence is faster, as in "overrelaxation." But a different approach has given more flexibility in constructing a good P, from a quick incomplete LU factorization of the true matrix A:
Incomplete LU    P = (approximation to L)(approximation to U).
The exact A = LU has fill-in, so zero entries in A become nonzero in L and U. The approximate L and U could ignore this fill-in (fairly dangerous). Or P = L_approx U_approx can keep only the fill-in entries F above a fixed threshold. The variety of options, and the fact that the computer can decide automatically which entries to keep, has made the ILU idea (incomplete LU) a very popular starting point.
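MATLAB's ilu command constructs such approximate factors; here is a sketch (the options and sizes are my choices):

    % Incomplete LU: keep the sparsity of A ('nofill') or drop small entries
    N = 20;  e = ones(N,1);
    K = spdiags([-e 2*e -e], -1:1, N, N);
    A = kron(K,speye(N)) + kron(speye(N),K);       % K2D as a test matrix
    [L,U] = ilu(A, struct('type','nofill'));       % no fill-in allowed: P = L*U
    nnz(L) + nnz(U)                                % far fewer nonzeros than lu(A)
    setup.type = 'ilutp';  setup.droptol = 1e-2;   % or drop entries below a threshold
    [L2,U2,P2] = ilu(A, setup);                    % threshold ILU with pivoting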
Example  The −1, 2, −1 matrix A = K provides an excellent example. We choose the preconditioner P = T, the same matrix with T_11 = 1 instead of K_11 = 2. The LU factors of T are perfect first differences, with diagonals of +1 and −1. (Remember that all pivots of T equal 1, while the pivots of K are 2/1, 3/2, 4/3, . . .) We can compute the right side of T^{-1}Kx = T^{-1}b with only 2N additions and no multiplications (just back substitution using L and U). Idea: This L and U are approximately correct for K.
The matrix P^{-1}A = T^{-1}K on the left side is triangular. More than that, T is a rank 1 change from K (the 1,1 entry changes from 2 to 1). It follows that T^{-1}K and K^{-1}T are rank 1 changes from the identity matrix I. A calculation in Problem 10 shows that only the first column of I is changed, by the "linear vector" ℓ = (N, N−1, . . . , 1):

P^{-1}A = T^{-1}K = I + ℓ e_1^T   and   K^{-1}T = I − (ℓ e_1^T)/(N + 1).    (7)
Here e_1 = (1, 0, . . . , 0)^T so ℓ e_1^T has first column ℓ. This example finds x = K^{-1}b by a quick exact formula (K^{-1}T)(T^{-1}b), needing only 2N additions for T^{-1} and N additions and multiplications for K^{-1}T. In practice we wouldn't precondition this K (just solve).
The usual purpose of preconditioning is to speed up convergence for iterative methods, and that depends on the eigenvalues of P^{-1}A. Here the eigenvalues of T^{-1}K are its diagonal entries N+1, 1, . . . , 1. This example will illustrate a special property of conjugate gradients, that with only two different eigenvalues it reaches the true solution x in two steps.
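A few MATLAB lines (a sketch; N is arbitrary) confirm both formulas in (7):

    % Verify equation (7): T\K = I + ell*e1' and K\T = I - ell*e1'/(N+1)
    N = 6;  e = ones(N,1);
    K = full(spdiags([-e 2*e -e], -1:1, N, N));
    T = K;  T(1,1) = 1;                   % the preconditioner: 1 replaces 2 in the corner
    ell = (N:-1:1)';                      % the "linear vector" (N, N-1, ..., 1)
    e1 = [1; zeros(N-1,1)];
    norm(T\K - (eye(N) + ell*e1'))        % zero up to roundoff
    norm(K\T - (eye(N) - ell*e1'/(N+1)))  % the inverse formula in (7)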
The iteration P x_{k+1} = (P − A)x_k + b is too simple! It is choosing one particular vector in a "Krylov subspace." With relatively little work we can make a much better choice of x_k. Krylov projections are the state of the art in today's iterative methods.
Krylov Subspaces
Our original equation is Ax = b. The preconditioned equation is P^{-1}Ax = P^{-1}b. When we write P^{-1}, we never intend that an inverse would be explicitly computed (except in our example). The ordinary iteration is a correction to x_k by the vector P^{-1}r_k:

P x_{k+1} = (P − A)x_k + b   or   P x_{k+1} = P x_k + r_k   or   x_{k+1} = x_k + P^{-1}r_k.    (8)
Here r_k = b − Ax_k is the residual. It is the error in Ax = b, not the error e_k in x. The symbol P^{-1}r_k represents the change from x_k to x_{k+1}, but that step is not computed by multiplying P^{-1} times r_k. We might use incomplete LU, or a few steps of a "multigrid" iteration, or "domain decomposition." Or an entirely new preconditioner.
In describing Krylov subspaces, I should work with P^{-1}A. For simplicity I will only write A. I am assuming that P has been chosen and used, and the preconditioned equation P^{-1}Ax = P^{-1}b is given the notation Ax = b. The preconditioner is now P = I. Our new matrix A is probably better than the original matrix with that name.
The Krylov subspace K_k(A, b) contains all combinations of b, Ab, . . . , A^{k-1}b. These are the vectors that we can compute quickly, multiplying by a sparse A. We look in this space K_k for the approximation x_k to the true solution of Ax = b. Notice that the pure iteration x_k = (I − A)x_{k-1} + b does produce a vector in K_k when x_{k-1} is in K_{k-1}. The Krylov subspace methods make other choices of x_k. Here are four different approaches to choosing a good x_k in K_k—this is the important decision:
1. The residual r_k = b − Ax_k is orthogonal to K_k (Conjugate Gradients, . . . )

2. The residual r_k has minimum norm for x_k in K_k (GMRES, MINRES, . . . )

3. r_k is orthogonal to a different space like K_k(A^T) (BiConjugate Gradients, . . . )

4. The error e_k has minimum norm (SYMMLQ; for BiCGStab x_k is in A^T K_k(A^T); . . . )
In every case we hope to compute the new x_k quickly and stably from the earlier x's. If that recursion only involves x_{k-1} and x_{k-2} (a short recurrence) it is especially fast. We will see this happen for conjugate gradients and symmetric positive definite A. The BiCG method in 3 is a natural extension of short recurrences to unsymmetric A—but stability and other questions open the door to the whole range of methods.
To compute x_k we need a basis for K_k. The best basis q_1, . . . , q_k is orthonormal. Each new q_k comes from orthogonalizing t = Aq_{k-1} to the basis vectors q_1, . . . , q_{k-1} that are already chosen. This is the Gram-Schmidt idea (called modified Gram-Schmidt when we subtract projections of t onto the q's one at a time, for numerical stability). The iteration to compute the orthonormal q's is known as Arnoldi's method:
    1  q_1 = b/‖b‖                      % Normalize to ‖q_1‖ = 1
       for j = 1, . . . , k−1
    2    t = A q_j                      % t is in the Krylov space K_{j+1}(A, b)
         for i = 1, . . . , j
    3      h_{ij} = q_i^T t             % h_{ij} q_i = projection of t onto q_i
    4      t = t − h_{ij} q_i           % Subtract component of t along q_i
         end
    5    h_{j+1,j} = ‖t‖                % t is now orthogonal to q_1, . . . , q_j
    6    q_{j+1} = t/h_{j+1,j}          % Normalize t to ‖q_{j+1}‖ = 1
       end                              % q_1, . . . , q_k are orthonormal in K_k
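The same steps as a runnable MATLAB function (a direct transcription; the q's are stored as columns of Q):

    function [Q,H] = arnoldi(A,b,k)
    % Arnoldi iteration: orthonormal basis Q of K_k(A,b) and Hessenberg H
    % with A*Q(:,1:k-1) = Q*H, as in equation (9) below
    n = length(b);  Q = zeros(n,k);  H = zeros(k,k-1);
    Q(:,1) = b/norm(b);                    % step 1
    for j = 1:k-1
        t = A*Q(:,j);                      % step 2
        for i = 1:j
            H(i,j) = Q(:,i)'*t;            % step 3
            t = t - H(i,j)*Q(:,i);         % step 4 (modified Gram-Schmidt)
        end
        H(j+1,j) = norm(t);                % step 5
        Q(:,j+1) = t/H(j+1,j);             % step 6
    end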
Put the column vectors q_1, . . . , q_k into an n by k matrix Q_k. Multiplying rows of Q_k^T by columns of Q_k produces all the inner products q_i^T q_j, which are the 0's and 1's in the identity matrix. The orthonormal property means that Q_k^T Q_k = I_k.
Arnoldi constructs each q_{j+1} from Aq_j by subtracting projections h_{ij} q_i. If we express the steps up to j = k−1 in matrix notation, they become AQ_{k-1} = Q_k H_{k,k-1}:

$$
AQ_{k-1} = \begin{bmatrix} Aq_1 & \cdots & Aq_{k-1} \end{bmatrix}
= \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}
\begin{bmatrix} h_{11} & h_{12} & \cdot & h_{1,k-1} \\ h_{21} & h_{22} & \cdot & h_{2,k-1} \\ 0 & h_{32} & \cdot & \cdot \\ 0 & 0 & \cdot & h_{k,k-1} \end{bmatrix}
\qquad (9)
$$

The shapes are n by k−1, n by k, and k by k−1.
That matrix H_{k,k-1} is "upper Hessenberg" because it has only one nonzero diagonal below the main diagonal. We check that the first column of this matrix equation (multiplying by columns!) produces q_2:

Aq_1 = h_{11} q_1 + h_{21} q_2   or   q_2 = (Aq_1 − h_{11} q_1)/h_{21}.    (10)

That subtraction is Step 4 in Arnoldi's algorithm. Division by h_{21} is Step 6.
Unless more of the h_{ij} are zero, the cost is increasing at every iteration. We have k dot products to compute at steps 3 and 5, and k vector updates in steps 4 and 6. A short recurrence means that most of these h_{ij} are zero. That happens when A = A^T.
The matrix H is tridiagonal when A is symmetric. This fact is the foundation of conjugate gradients. For a matrix proof, multiply equation (9) by Q_{k-1}^T. The right side becomes H without its last row, because (Q_{k-1}^T Q_k) H_{k,k-1} = [I 0] H_{k,k-1}. The left side Q_{k-1}^T A Q_{k-1} is always symmetric when A is symmetric. So that H matrix has to be symmetric, which makes it tridiagonal. There are only three nonzeros in the rows and columns of H, and Gram-Schmidt to find q_{k+1} only involves q_k and q_{k-1}:
Arnoldi when A = A^T    Aq_k = h_{k+1,k} q_{k+1} + h_{k,k} q_k + h_{k-1,k} q_{k-1}.    (11)
This is the Lanczos iteration. Each new q_{k+1} = (Aq_k − h_{k,k} q_k − h_{k-1,k} q_{k-1})/h_{k+1,k} involves one multiplication Aq_k, two dot products for new h's, and two vector updates.
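For symmetric A the Arnoldi inner loop collapses to this three-term recurrence; here is a sketch (test matrix is my choice, and a careful code would add reorthogonalization):

    % Lanczos: the short recurrence (11) for symmetric A
    n = 50;  k = 8;
    A = gallery('tridiag',n);              % symmetric -1,2,-1 test matrix
    b = randn(n,1);
    Q = zeros(n,k+1);  Q(:,1) = b/norm(b);
    beta = 0;  qold = zeros(n,1);
    for j = 1:k
        t = A*Q(:,j) - beta*qold;          % subtract h_{j-1,j} q_{j-1}
        alpha = Q(:,j)'*t;                 % h_{j,j}
        t = t - alpha*Q(:,j);              % subtract h_{j,j} q_j
        beta = norm(t);                    % h_{j+1,j}
        qold = Q(:,j);
        Q(:,j+1) = t/beta;
    end
    H = Q(:,1:k)'*A*Q(:,1:k);              % tridiagonal up to roundoff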
The QR Method for Eigenvalues
Allow me an important comment on the eigenvalue problem Ax = λx. We have seen that H_{k-1} = Q_{k-1}^T A Q_{k-1} is tridiagonal if A = A^T. When k−1 reaches n and Q_n is square, the matrix H = Q_n^T A Q_n = Q_n^{-1} A Q_n has the same eigenvalues as A:

Same λ    Hy = Q_n^{-1} A Q_n y = λy   gives   Ax = λx  with  x = Q_n y.    (12)
It is much easier to find the eigenvalues λ for a tridiagonal H than for the original A. The famous "QR method" for the eigenvalue problem starts with T_1 = H, factors it into T_1 = Q_1 R_1 (this is Gram-Schmidt on the short columns of T_1), and reverses order to produce T_2 = R_1 Q_1. The matrix T_2 is again tridiagonal, and its off-diagonal entries are normally smaller than for T_1. The next step is Gram-Schmidt on T_2, orthogonalizing its columns in Q_2 by the combinations in the upper triangular R_2:
QR Method    Factor T_2 into Q_2 R_2. Reverse order to T_3 = R_2 Q_2 = Q_2^{-1} T_2 Q_2.    (13)
By the reasoning in (12), any Q^{-1} T Q has the same eigenvalues as T. So the matrices T_2, T_3, . . . all have the same eigenvalues as T_1 = H and A. (These square Q_k from Gram-Schmidt are entirely different from the rectangular Q_k in Arnoldi.) We can even shift T before Gram-Schmidt, and we should, provided we remember to shift back:
Shifted QR    Factor T_k − s_k I = Q_k R_k. Reverse to T_{k+1} = R_k Q_k + s_k I.    (14)
When the shift s_k is chosen to be the n, n entry of T_k, the last off-diagonal entry of T_{k+1} becomes very small. The n, n entry of T_{k+1} moves close to an eigenvalue. Shifted QR is one of the great algorithms of numerical linear algebra. It solves moderate-size eigenvalue problems with great efficiency. This is the core of MATLAB's eig(A).
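A bare-bones sketch of (14) shows the idea (the test matrix and iteration count are my choices; eig uses a far more careful version):

    % Shifted QR iteration on a symmetric tridiagonal T
    n = 6;
    T = full(gallery('tridiag',n));     % start from T1 = H
    for step = 1:20
        s = T(n,n);                     % shift s_k = the n,n entry
        [Q,R] = qr(T - s*eye(n));       % factor T_k - s_k I = Q_k R_k
        T = R*Q + s*eye(n);             % reverse order and shift back, as in (14)
    end
    T(n,n-1)                            % last off-diagonal entry: near zero
    T(n,n)                              % close to an eigenvalue of the original T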
For a large symmetric matrix, we often stop the Arnoldi-Lanczos iteration at a tridiagonal H_k with k < n. The full n-step process to reach H_n is too expensive, and often we don't need all n eigenvalues. So we compute (by the same QR method) the k eigenvalues of H_k instead of the n eigenvalues of H_n. These computed λ_{1k}, λ_{2k}, . . . , λ_{kk} can provide good approximations to the first k eigenvalues of A. And we have an excellent start on the eigenvalue problem for H_{k+1}, if we decide to take a further step. This Lanczos method will find, approximately and iteratively and quickly, the leading eigenvalues of a large symmetric matrix.
The Conjugate Gradient Method
We return to iterative methods for Ax = b. The Arnoldi algorithm produced orthonormal basis vectors q_1, q_2, . . . for the growing Krylov subspaces K_1, K_2, . . .. Now we select vectors x_1, x_2, . . . in those subspaces that approach the exact solution to Ax = b. We concentrate on the conjugate gradient method for symmetric positive definite A.
The rule for x_k in conjugate gradients is that the residual r_k = b − Ax_k should be orthogonal to all vectors in K_k. Since r_k will be in K_{k+1}, it must be a multiple of Arnoldi's next vector q_{k+1}! Each residual is therefore orthogonal to all previous residuals (which are multiples of the previous q's):

Orthogonal residuals    r_i^T r_k = 0  for  i < k.    (15)
The difference between r_k and q_{k+1} is that the q's are normalized, as in q_1 = b/‖b‖. Since r_{k-1} is a multiple of q_k, the difference r_k − r_{k-1} is orthogonal to each subspace K_i with i < k. Certainly x_i − x_{i-1} lies in that K_i. So ∆r is orthogonal to earlier ∆x's:

(x_i − x_{i-1})^T (r_k − r_{k-1}) = 0  for  i < k.    (16)
These differences ∆x and ∆r are directly connected, because the b's cancel in ∆r:

r_k − r_{k-1} = (b − Ax_k) − (b − Ax_{k-1}) = −A(x_k − x_{k-1}).    (17)
Substituting (17) into (16), the updates in the x's are "A-orthogonal" or conjugate:

Conjugate updates ∆x    (x_i − x_{i-1})^T A (x_k − x_{k-1}) = 0  for  i < k.    (18)
Now we have all the requirements. Each conjugate gradient step will find a new "search direction" d_k for the update x_k − x_{k-1}. From x_{k-1} it will move the right distance α_k d_k to x_k. Using (17) it will compute the new r_k. The constants β_k in the search direction and α_k in the update will be determined by (15) and (16) for i = k−1. For symmetric A the orthogonality in (15) and (16) will be automatic for i < k−1, as in Arnoldi. We have a "short recurrence" for the new x_k and r_k.
Here is one cycle of the algorithm, starting from x_0 = 0 and r_0 = b and β_1 = 0. It involves only two new dot products and one matrix-vector multiplication Ad:

Conjugate    1  β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}    % Improvement this step
Gradient     2  d_k = r_{k-1} + β_k d_{k-1}                    % Next search direction
Method       3  α_k = r_{k-1}^T r_{k-1} / d_k^T A d_k          % Step length to next x_k
             4  x_k = x_{k-1} + α_k d_k                        % Approximate solution
             5  r_k = r_{k-1} − α_k A d_k                      % New residual from (17)
The formulas 1 and 3 for β_k and α_k are explained briefly below—and fully by Trefethen-Bau [ ] and Shewchuk [ ] and many other good references.
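Here is the cycle as a complete MATLAB function (a direct transcription of steps 1–5; the stopping test on ‖r_k‖ is my addition):

    function x = cg(A,b,kmax)
    % Conjugate gradient method for symmetric positive definite A
    x = zeros(size(b));  r = b;  d = r;    % x0 = 0, r0 = b, d1 = r0 (beta_1 = 0)
    rr = r'*r;
    for k = 1:kmax
        Ad = A*d;                          % the one matrix-vector multiplication
        alpha = rr/(d'*Ad);                % step 3: step length
        x = x + alpha*d;                   % step 4: approximate solution
        r = r - alpha*Ad;                  % step 5: new residual from (17)
        rrnew = r'*r;
        if sqrt(rrnew) < 1e-12, break, end
        beta = rrnew/rr;                   % step 1: improvement this step
        d = r + beta*d;                    % step 2: next search direction
        rr = rrnew;
    end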
Different Viewpoints on Conjugate Gradients
I want to describe the (same!) conjugate gradient method in two different ways:

1. It solves a tridiagonal system Hy = f recursively

2. It minimizes the energy (1/2)x^T Ax − x^T b recursively.
How does Ax = b change to the tridiagonal Hy = f? That uses Arnoldi's orthonormal columns q_1, . . . , q_n in Q, with Q^T Q = I and Q^T AQ = H:

Ax = b  is  (Q^T AQ)(Q^T x) = Q^T b  which is  Hy = f = (‖b‖, 0, . . . , 0).    (19)
Since q_1 is b/‖b‖, the first component of f = Q^T b is q_1^T b = ‖b‖, and the other components are q_i^T b = 0. The conjugate gradient method is implicitly computing this symmetric tridiagonal H and updating the solution y at each step. Here is the third step:

$$
H_3 y_3 = \begin{bmatrix} h_{11} & h_{12} & \\ h_{21} & h_{22} & h_{23} \\ & h_{32} & h_{33} \end{bmatrix} y_3
= \begin{bmatrix} \|b\| \\ 0 \\ 0 \end{bmatrix}.
\qquad (20)
$$
This is the equation Ax = b projected by Q_3 onto the third Krylov subspace K_3. These h's never appear in conjugate gradients. We don't want to do Arnoldi too!
It is the LDL^T factors of H that CG is somehow computing—two new numbers at each step. Those give a fast update from y_{j-1} to y_j. The corresponding x_j = Q_j y_j from conjugate gradients approaches the exact solution x_n = Q_n y_n which is x = A^{-1}b.
If we can see conjugate gradients also as an energy minimizing algorithm, we can extend it to nonlinear problems and use it in optimization. For our linear equation Ax = b, the energy is E(x) = (1/2)x^T Ax − x^T b. Minimizing E(x) is the same as solving Ax = b, when A is positive definite (the main point of Section 1. ). The CG iteration minimizes E(x) on the growing Krylov subspaces. On the first subspace K_1, the line where x is αb = αd_1, this minimization produces the right
value for α_1:

E(αb) = (1/2)α^2 b^T Ab − α b^T b   is minimized at   α_1 = b^T b / b^T Ab.    (21)
That α_1 is the constant chosen in step 3 of the first conjugate gradient cycle.
The gradient of E(x) = (1/2)x^T Ax − x^T b is exactly Ax − b. The steepest descent direction at x_1 is along the negative gradient, which is r_1! This sounds like the perfect direction d_2 for the next move. But the great difficulty with steepest descent is that this r_1 can be too close to the first direction. Little progress that way. So we add the right multiple β_2 d_1, in order to make d_2 = r_1 + β_2 d_1 A-orthogonal to the first direction d_1.
Then we move in this conjugate direction d_2 to x_2 = x_1 + α_2 d_2. This explains the name conjugate gradients, rather than the pure gradients of steepest descent. Every cycle of CG chooses α_j to minimize E(x) in the new search direction x = x_{j-1} + α d_j. The last cycle (if we go that far) gives the overall minimizer x_n = x = A^{-1}b.
Example    Ax = b is

$$
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \end{bmatrix}.
$$

From x_0 = 0 and β_1 = 0 and r_0 = d_1 = b the first cycle gives α_1 = 1/2 and x_1 = b/2 = (2, 0, 0). The new residual is r_1 = b − Ax_1 = (0, −2, −2). Then the second cycle yields

β_2 = 8/16,   d_2 = (2, −2, −2),   α_2 = 8/16,   x_2 = (3, −1, −1) = A^{-1}b!

The correct solution is reached in two steps, where normally it will take n = 3 steps. The reason is that this particular A has only two distinct eigenvalues 4 and 1. In that case A^{-1}b is a combination of b and Ab, and this best combination x_2 is found at cycle 2. The residual r_2 is zero and the cycles stop early—very unusual.
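Running the cg function sketched earlier on this A and b confirms the two-step convergence:

    A = [2 1 1; 1 2 1; 1 1 2];  b = [4; 0; 0];
    x = cg(A, b, 3)        % returns (3, -1, -1): the residual is zero after cycle 2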
Energy minimization leads in [ ] to an estimate of the convergence rate for the error e = x − x_j in conjugate gradients, using the A-norm ‖e‖_A = √(e^T Ae):

$$
\text{Error estimate}\qquad
\|x - x_j\|_A \;\le\; 2\left(\frac{\sqrt{\lambda_{\max}} - \sqrt{\lambda_{\min}}}{\sqrt{\lambda_{\max}} + \sqrt{\lambda_{\min}}}\right)^{j} \|x - x_0\|_A .
\qquad (22)
$$

This is the best-known error estimate, although it doesn't account for any clustering of the eigenvalues of A. It involves only the condition number λ_max/λ_min. Problem gives the "optimal" error estimate but it is not so easy to compute. That optimal estimate needs all the eigenvalues of A, while (22) uses only the extreme eigenvalues λ_max(A) and λ_min(A)—which in practice we can bound above and below.
Minimum Residual Methods
When A is not symmetric positive definite, conjugate gradients are not guaranteed to solve Ax = b. Most likely they won't. We will follow van der Vorst [ ] in briefly describing the minimum norm residual approach, leading to MINRES and GMRES. These methods choose x_j in the Krylov subspace K_j so that ‖b − Ax_j‖ is minimal.
First we compute the orthonormal Arnoldi vectors q_1, . . . , q_j. They go in the columns of Q_j, so Q_j^T Q_j = I. As in (19) we set x_j = Q_j y, to express the solution as a combination of those q's. Then the norm of the residual r_j, using (9), is

‖b − Ax_j‖ = ‖b − AQ_j y‖ = ‖b − Q_{j+1} H_{j+1,j} y‖.    (23)
These vectors are all in the Krylov space K_{j+1}, where r_j^T (Q_{j+1} Q_{j+1}^T r_j) = r_j^T r_j. This says that the norm is not changed when we multiply by Q_{j+1}^T. Our problem becomes:

Choose y to minimize    ‖r_j‖ = ‖Q_{j+1}^T b − H_{j+1,j} y‖ = ‖f − Hy‖.    (24)
This is an ordinary least squares problem for the equation Hy = f with only j+1 equations and j unknowns. The right side f = Q_{j+1}^T b is (‖r_0‖, 0, . . . , 0) as in (19). The matrix H = H_{j+1,j} is Hessenberg as in (9), with one nonzero diagonal below the main diagonal. We face a completely typical problem of numerical linear algebra: Use the special properties of H and f to find a fast algorithm that computes y. The two favorite algorithms for this least squares problem are closely related:
MINRES    A is symmetric (probably indefinite, or we use CG) and H is tridiagonal

GMRES    A is not symmetric and the upper triangular part of H can be full

In both cases we want to clear out that nonzero diagonal below the main diagonal of H. The natural way to do that, one nonzero entry at a time, is by "Givens rotations." These plane rotations are so useful and simple (the essential part is only 2 by 2) that we complete this section by explaining them.
Givens Rotations
The direct approach to the least squares solution of Hy = f constructs the normal equations H^T Hy = H^T f. That was the central idea in Chapter 1, but you see what we lose. If H is Hessenberg, with many good zeros, H^T H is full. Those zeros in H should simplify and shorten the computations, so we don't want the normal equations.

The other approach to least squares is by Gram-Schmidt. We factor H into orthogonal times upper triangular. Since the letter Q is already used, the orthogonal matrix will be called G (after Givens). The upper triangular matrix is G^{-1}H. The 3 by 2 case shows how a plane rotation G_{21}^{-1} can clear out the subdiagonal entry
h_{21}:

$$
G_{21}^{-1} H =
\begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ 0 & h_{32} \end{bmatrix}
= \begin{bmatrix} * & * \\ \mathbf{0} & * \\ 0 & * \end{bmatrix}.
\qquad (25)
$$
That bold zero entry requires h_{11} sin θ = h_{21} cos θ, which determines θ. A second rotation G_{32}^{-1}, in the 2-3 plane, will zero out the 3,2 entry. Then G_{32}^{-1} G_{21}^{-1} H is a square upper triangular matrix U above a row of zeros!
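In MATLAB the two rotations look like this (H and f are small assumed data, not from the text):

    % Two Givens rotations clear the subdiagonal of a 3 by 2 Hessenberg H
    H = [2 1; 1 3; 0 2];  f = [5; 0; 0];     % least squares Hy = f (assumed data)
    for i = 1:2
        r = hypot(H(i,i), H(i+1,i));
        c = H(i,i)/r;  s = H(i+1,i)/r;       % cos and sin with h11*sin = h21*cos
        G = eye(3);  G([i i+1],[i i+1]) = [c s; -s c];
        H = G*H;  f = G*f;                   % rotate H and the right side together
    end
    y = H(1:2,:)\f(1:2)                      % solve Uy = F exactly
    e = abs(f(3))                            % the unreducible error in (26)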
The Givens orthogonal matrix is G = G_{21} G_{32} but there is no reason to do this multiplication. We use each G_{ij} as it is constructed, to simplify the least squares problem. Rotations (and all orthogonal matrices) leave the lengths of vectors unchanged:

$$
\|Hy - f\| = \left\| G_{32}^{-1}G_{21}^{-1}Hy - G_{32}^{-1}G_{21}^{-1}f \right\|
= \left\| \begin{bmatrix} U \\ 0 \end{bmatrix} y - \begin{bmatrix} F \\ e \end{bmatrix} \right\| .
\qquad (26)
$$
This length is what MINRES and GMRES minimize. The row of zeros below U means that the last entry e is the error—we can't reduce it. But we get all the other entries exactly right by solving the j by j system Uy = F (here j = 2). This gives the best least squares solution y. Going back to the original problem of minimizing ‖r‖ = ‖b − Ax_j‖, the best x_j in the Krylov space K_j is Q_j y.
For non-symmetric A (GMRES rather than MINRES) we don't have a short recurrence. The upper triangle in H can be full, and step j becomes expensive and possibly inaccurate as j increases. So we may change "full GMRES" to GMRES(m), which restarts the algorithm every m steps. It is not so easy to choose a good m.
Problem Set 3.6
1  Create K2D for a 4 by 4 square grid with N^2 = 3^2 interior mesh points (so n = 9). Print out its factors K = LU (or its Cholesky factor C = chol(K) for the symmetrized form K = C^T C). How many zeros in these triangular factors? Also print out inv(K) to see that it is full.
2  As N increases, what parts of the LU factors of K2D are filled in?
3  Can you answer the same question for K3D? In each case we really want an estimate cN^p of the number of nonzeros (the most important number is p).
4  Use the tic; ...; toc clocking command to compare the solution time for K2D x = f with a random f, in ordinary MATLAB and sparse MATLAB (where K2D is defined as a sparse matrix). Above what value of N does the sparse routine K\f win?
5  Compare ordinary vs. sparse solution times in the three-dimensional K3D x = f with a random f. At which N does the sparse K\f begin to win?
6  Incomplete LU

7  Conjugate gradients
8  Draw the next step after Figure 3.19 when the matrix has become 4 by 4 and the graph has nodes 2–4–5–6. Which have minimum degree? Is there more fill-in?
9  Redraw the right side of Figure 3.19 if row number 2 is chosen as the second pivot row. Node 2 does not have minimum degree. Indicate new edges in the 5-node graph and new nonzeros F in the matrix.
10  To show that T^{-1}K = I + ℓ e_1^T in (7), with e_1^T = [1 0 . . . 0], we can start from K = T + e_1 e_1^T. Then T^{-1}K = I + (T^{-1}e_1) e_1^T and we verify that Tℓ = e_1:

$$
T\ell = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & \cdot & \cdot & \cdot \\ & & -1 & 2 \end{bmatrix}
\begin{bmatrix} N \\ N-1 \\ \cdot \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ \cdot \\ 0 \end{bmatrix} = e_1 .
$$

Second differences of a linear vector are zero. Now multiply T^{-1}K = I + ℓ e_1^T times I − (ℓ e_1^T)/(N+1) to establish the inverse matrix K^{-1}T in (7).
11  Arnoldi expresses each Aq_k as h_{k+1,k} q_{k+1} + h_{k,k} q_k + · · · + h_{1,k} q_1. Multiply by q_i^T to find h_{i,k} = q_i^T Aq_k. If A is symmetric you can write this as (Aq_i)^T q_k. Explain why (Aq_i)^T q_k = 0 for i < k − 1 by expanding Aq_i into h_{i+1,i} q_{i+1} + · · · + h_{1,i} q_1. We have a short recurrence if A = A^T (only h_{k+1,k} and h_{k,k} and h_{k-1,k} are nonzero).