A general Krylov method for solving symmetric systems of linear equations
Anders FORSGREN∗ Tove ODLAND∗
Technical Report TRITA-MAT-2014-OS-01
Department of Mathematics
KTH Royal Institute of Technology
March 2014
Abstract
Krylov subspace methods are used for solving systems of linear equations $Hx + c = 0$. We present a general Krylov subspace method that can be applied to a symmetric system of linear equations, i.e., for a system in which $H = H^T$. In each iteration, we have a choice of scaling of the orthogonal vectors that successively make the Krylov subspaces available. We define an extended representation of each such Krylov vector, so that the Krylov vector is represented by a triple. We show that our Krylov subspace method is able to determine if the system of equations is compatible. If so, a solution is computed. If not, a certificate of incompatibility is computed. The method of conjugate gradients is obtained as a special case of our general method. Our framework gives a way to understand the method of conjugate gradients, in particular when $H$ is not (positive) definite. Finally, we derive a minimum-residual method based on our framework and show how the iterates may be updated explicitly based on the triples.
1. Introduction
An important problem in numerical linear algebra is to solve a system of equations where the matrix is symmetric. Such a problem may be posed as
$$Hx + c = 0, \tag{1.1}$$
for $x \in \mathbb{R}^n$, with $c \in \mathbb{R}^n$ and $H = H^T \in \mathbb{R}^{n \times n}$. Our primary motivation comes from KKT systems arising in optimization, where $H$ is symmetric but in general indefinite, see, e.g., [5], but there are many other applications. Throughout, $H$ is assumed symmetric; any other assumptions on $H$ at particular instances will be stated explicitly. It is also assumed throughout that $c \neq 0$.
A strategy for a class of methods for solving (1.1) is to generate linearly independent vectors $q_k$, $k = 0, 1, \ldots$, until $q_k$ becomes linearly dependent for some $k = r \le n$ and hence $q_r = 0$. These methods will have finite termination properties.
∗Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden ([email protected], [email protected]). Research partially supported by the Swedish Research Council (VR).
In this paper we consider a general Krylov subspace method in which the generated vectors form an orthogonal basis for the Krylov subspaces generated by $H$ and $c$,
$$\mathcal{K}_0(c,H) = \{0\}, \quad \mathcal{K}_k(c,H) = \operatorname{span}\{c, Hc, H^2c, \ldots, H^{k-1}c\}, \quad k = 1, 2, \ldots. \tag{1.2}$$
In our method, the process by which these vectors are generated bears much resemblance to the Lanczos process, with the difference that the available scaling parameter is left explicitly undecided. The method of conjugate gradients is derived as a special case of this general Krylov method. In addition, the minimum-residual method and explicit recursions for the minimum-residual iterates can be derived based on our framework.
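As a small illustration of definition (1.2), the following sketch computes the dimension of $\mathcal{K}_k(c,H)$ numerically for increasing $k$. The helper name `krylov_dim` is our own, the data are borrowed from Example 3.1 in Section 3.3, and the rank is estimated in floating point rather than in the exact arithmetic assumed by the theory.

```python
import numpy as np

def krylov_dim(H, c, k):
    """Numerical dimension of K_k(c, H) = span{c, Hc, ..., H^(k-1) c}, k >= 1."""
    cols, v = [], np.asarray(c, dtype=float)
    for _ in range(k):
        cols.append(v)
        v = H @ v          # next power of H applied to c
    return int(np.linalg.matrix_rank(np.column_stack(cols)))

# Data of Example 3.1: H = diag(c) with one zero entry in c.
c = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
H = np.diag(c)
dims = [krylov_dim(H, c, k) for k in range(1, 9)]
print(dims)  # the dimension grows by one per step until it stagnates
```

The index at which the dimension stops growing is the termination index $r$ used throughout the paper.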
There have been very many contributions to the theory regarding the Lanczos process, the method of conjugate gradients and minimum-residual methods, see, e.g., [6] for an extensive survey of the years 1948-1976 and [7].
We will assume exact arithmetic and the theory developed in this paper is based on that assumption. At the end of the paper we briefly discuss computational aspects of our results in finite precision.
The outline of the paper is as follows. In Section 2 we define and give a recursion for the Krylov vectors $q_k \in \mathcal{K}_{k+1}(c,H)$, $k = 0, \ldots, r$. In Section 3 we define triples $(q_k, y_k, \delta_k)$ associated with $q_k$, $k = 0, \ldots, r$, such that $q_k = Hy_k + \delta_k c$, $q_k \in \mathcal{K}_{k+1}(c,H)$, $y_k \in \mathcal{K}_k(c,H)$ and $\delta_k \in \mathbb{R}$, $k = 0, \ldots, r$. These triples are defined up to a scaling and recursions for them are given in Proposition 3.1. We then state several results concerning the triples and using these results we devise an algorithm in Section 3.3 that either gives the solution, if (1.1) is compatible (in this case we show that $\delta_r \neq 0$), or gives a certificate of incompatibility (in this case $\delta_r = 0$). The main attribute is that if we do not require $\delta_k$ to attain a specific value or sign, then we do not have to require anything more than symmetry of $H$.
Further, in Section 4, the method of conjugate gradients is illustrated in our framework, which gives a better way of understanding its behavior, in particular when $H$ is not a (positive) definite matrix. Finally, in Section 5, a minimum-residual method applicable also for incompatible systems is illustrated in this framework and explicit recursions for the minimum-residual iterates are given. An algorithm for this method is given in Section 5.1.
1.1. Notation
The letters $i$, $j$ and $k$ denote integer indices; other lowercase letters such as $q$, $y$ and $c$ denote column vectors, possibly with super- and/or subscripts. For a symmetric matrix $H$, $H \succ 0$ denotes $H$ positive definite. Analogously, $H \succeq 0$ is used to denote $H$ positive semidefinite. $\mathcal{N}(H)$ denotes the null space of $H$ and $\mathcal{R}(H)$ denotes the range space of $H$. We will denote by $Z$ an orthonormal matrix whose columns form a basis for $\mathcal{N}(H)$. If $H$ is nonsingular, then $Z$ is to be interpreted as an empty matrix. When referring to a norm, the Euclidean norm is used throughout.
2. Background
Regarding (1.1), the raw data available is the matrix $H$ and the vector $c$ and combinations of the two, for example represented by the Krylov subspaces generated by $H$ and $c$, as defined in (1.2). For an introduction and background on Krylov subspaces, see, e.g., [18].
Without loss of generality, the scaling of the first Krylov vector $q_0$ may be chosen so that $q_0 = c$. Then one sequence of linearly independent vectors may be generated by letting $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp$, $k = 1, \ldots, r$, such that $q_k \neq 0$ for $k = 0, 1, \ldots, r-1$ and $q_r = 0$, where $r$ is the minimum index $k$ for which $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp = \{0\}$. These vectors $\{q_0, q_1, \ldots, q_{r-1}\}$ form an orthogonal, hence linearly independent, basis of $\mathcal{K}_r(c,H)$. We will refer to these vectors as the Krylov vectors. With $q_0 = c$, each vector $q_k$, $k = 1, \ldots, r-1$, is uniquely determined up to a scaling. A vector $q_k \in \mathcal{K}_{k+1}(c,H)$ may be expressed as
$$q_k = \sum_{j=0}^{k} \delta_k^{(j)} H^j c, \tag{2.1}$$
for some parameters $\delta_k^{(j)}$, $j = 0, \ldots, k$. In order to ensure that $q_k = 0$ if and only if $H^k c$ is linearly dependent on $c, Hc, \ldots, H^{k-1}c$, it must hold that $\delta_k^{(k)} \neq 0$. Also note that for $k < r$, the vectors $c, Hc, \ldots, H^{k-1}c$ are linearly independent. Hence, since $q_k$ is uniquely determined up to a nonzero scaling, so are $\delta_k^{(j)}$, $j = 0, \ldots, k$. For $k = r$, $q_r = 0$, and since $\delta_r^{(r)} \neq 0$, the same argument shows that $\delta_r^{(j)}$, $j = 0, \ldots, r$, are uniquely determined up to a common nonzero scaling. This is made precise in Lemma A.1.
The following proposition states a recursion for such a sequence of vectors where the scaling factors, denoted by $\{\alpha_k\}_{k=0}^{r-1}$, are left unspecified. The recursion in Proposition 2.1 is a slight generalization of the Lanczos process for generating mutually orthogonal vectors, see [10, 11], in which the scaling of each vector $q_k$ is chosen such that $\|q_k\| = 1$, $k = 0, \ldots, r-1$. For completeness, this proposition and its proof are included.
Proposition 2.1. Let $r$ denote the smallest positive integer $k$ for which
$$\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp = \{0\}.$$
Given $q_0 = c \in \mathcal{K}_1(c,H)$, there exist vectors $q_k$, $k = 1, \ldots, r$, such that
$$q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp, \quad k = 1, \ldots, r,$$
for which $q_k \neq 0$, $k = 1, \ldots, r-1$, and $q_r = 0$. Each such $q_k$, $k = 1, \ldots, r-1$, is uniquely determined up to a scaling, and a sequence $\{q_k\}_{k=1}^{r}$ may be generated as
$$q_1 = \alpha_0 \Big( -Hq_0 + \frac{q_0^T H q_0}{q_0^T q_0} q_0 \Big), \tag{2.2a}$$
$$q_{k+1} = \alpha_k \Big( -Hq_k + \frac{q_k^T H q_k}{q_k^T q_k} q_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} q_{k-1} \Big), \quad k = 1, \ldots, r-1, \tag{2.2b}$$
where $\alpha_k$, $k = 0, \ldots, r-1$, are free and nonzero parameters. In addition, it holds that
$$q_{k+1}^T q_{k+1} = -\alpha_k q_{k+1}^T H q_k, \quad k = 0, \ldots, r-1. \tag{2.3}$$
Proof. Given $q_0 = c$, let $k$ be an integer such that $1 \le k \le r-1$. Assume that $q_i$, $i = 0, \ldots, k$, are mutually orthogonal with $q_i \in \mathcal{K}_{i+1}(c,H) \cap \mathcal{K}_i(c,H)^\perp$. Let $q_{k+1} \in \mathcal{K}_{k+2}(c,H)$ be expressed as
$$q_{k+1} = -\alpha_k H q_k + \sum_{i=0}^{k} \eta_k^{(i)} q_i, \quad k = 0, \ldots, r-1. \tag{2.4}$$
In order for $q_{k+1}$ to be orthogonal to $q_i$, $i = 0, \ldots, k$, the parameters $\eta_k^{(i)}$, $i = 0, \ldots, k$, are uniquely determined as follows.
For $k = 0$, to have $q_0^T q_1 = 0$, it must hold that
$$\eta_0^{(0)} = \alpha_0 \frac{q_0^T H q_0}{q_0^T q_0},$$
hence obtaining $q_1 \in \mathcal{K}_2(c,H) \cap \mathcal{K}_1(c,H)^\perp$ as in (2.2a), where $\alpha_0$ is free and nonzero. For $k$ such that $1 \le k \le r-1$, in order to have $q_i^T q_{k+1} = 0$, $i = 0, \ldots, k$, it must hold that
$$\eta_k^{(k)} = \alpha_k \frac{q_k^T H q_k}{q_k^T q_k}, \quad \eta_k^{(k-1)} = \alpha_k \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}}, \quad \text{and} \quad \eta_k^{(i)} = 0, \quad i = 0, \ldots, k-2.$$
The last relation follows by the symmetry of $H$. Hence, $q_{k+1} \in \mathcal{K}_{k+2}(c,H) \cap \mathcal{K}_{k+1}(c,H)^\perp$ is obtained as in the three-term recurrence (2.2b), where $\alpha_k$, $k = 1, \ldots, r-1$, are free and nonzero.
Since $q_1$ is orthogonal to $q_0$, and since $q_{k+1}$ is orthogonal to $q_k$ and $q_{k-1}$, $k = 1, \ldots, r-1$, pre-multiplication of (2.2) with $q_{k+1}^T$ yields
$$q_{k+1}^T q_{k+1} = -\alpha_k q_{k+1}^T H q_k, \quad k = 0, \ldots, r-1.$$
Finally note that if $q_{k+1}$ is given by (2.2), then the only term that increases the power of $H$ is $\alpha_k(-Hq_k)$. Since $\alpha_k \neq 0$, repeated use of this argument gives $\delta_{k+1}^{(k+1)} \neq 0$ if $q_{k+1}$ is expressed by (2.1). In fact, $\delta_{k+1}^{(k+1)} = (-1)^{k+1} \prod_{i=0}^{k} \alpha_i \neq 0$. Hence, by Lemma A.1, $q_{k+1} = 0$ implies $\mathcal{K}_{k+2}(c,H) \cap \mathcal{K}_{k+1}(c,H)^\perp = \{0\}$, so that $k + 1 = r$, as required.
The choice of a sign in front of the first term of (2.4) is arbitrary. Our choice of minus-sign is made in order to get coherence with existing theory as will be shown later on.
Many methods for solving (1.1) are based on using the Lanczos process, in which the scaling factors are chosen such that the generated vectors have norm one and matrix-factorization techniques are used on the symmetric tridiagonal matrix obtained by putting the three-term recurrence on matrix form. For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., [2, 15]. For our purposes we leave these available scaling factors unspecified.
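The recurrence (2.2) can be sketched in a few lines. The code below is an illustration under our own assumptions, not the authors' implementation: the function name `krylov_vectors`, the callable `alpha` supplying the free scalings, the zero tolerance `tol`, and the small indefinite test matrix are all hypothetical choices. It checks mutual orthogonality of the generated vectors and relation (2.3).

```python
import numpy as np

def krylov_vectors(H, c, alpha=lambda k: 1.0, tol=1e-10):
    """Generate q_0, ..., q_r with the three-term recurrence (2.2)."""
    q = [np.asarray(c, dtype=float)]
    # r <= n, so at most n new vectors are ever needed.
    while np.linalg.norm(q[-1]) > tol and len(q) <= len(c):
        k = len(q) - 1
        Hq = H @ q[k]
        new = -Hq + (q[k] @ Hq) / (q[k] @ q[k]) * q[k]
        if k >= 1:  # third term of (2.2b)
            new += (q[k - 1] @ Hq) / (q[k - 1] @ q[k - 1]) * q[k - 1]
        q.append(alpha(k) * new)
    return q

# Hypothetical symmetric indefinite test data.
H = np.array([[2.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
c = np.array([1.0, 0.0, 0.0])

q = krylov_vectors(H, c, alpha=lambda k: 2.0)
r = len(q) - 1
gram = np.array([[qi @ qj for qj in q[:r]] for qi in q[:r]])
print("r =", r)
print("off-diagonal Gram entries vanish:", np.allclose(gram, np.diag(np.diag(gram))))
for k in range(r - 1):
    # relation (2.3) with alpha_k = 2
    print(k, q[k + 1] @ q[k + 1], -2.0 * (q[k + 1] @ (H @ q[k])))
```

Changing `alpha` rescales the vectors but leaves their directions, and hence the termination index $r$, unchanged.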
3. A general Krylov method
3.1. An extended representation of the Krylov vectors
To find a solution of (1.1), if it exists, it is not sufficient to generate the sequence $\{q_k\}_{k=1}^{r}$. Note that, as in (2.1), $q_k \in \mathcal{K}_{k+1}(c,H)$, $k = 0, \ldots, r$, may be expressed as
$$q_k = \sum_{j=0}^{k} \delta_k^{(j)} H^j c = H \Big( \sum_{j=1}^{k} \delta_k^{(j)} H^{j-1} c \Big) + \delta_k^{(0)} c, \quad k = 1, \ldots, r. \tag{3.1}$$
It is not convenient to represent $q_k$ by (2.1). Therefore, we may define $y_0 = 0$, $\delta_0 = 1$,
$$y_k = \sum_{j=1}^{k} \delta_k^{(j)} H^{j-1} c \in \mathcal{K}_k(c,H) \quad \text{and} \quad \delta_k = \delta_k^{(0)}, \quad k = 1, \ldots, r,$$
so that (3.1) gives $q_k = Hy_k + \delta_k c$. These quantities may be represented by a triple $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$. Note that $\{\delta_k^{(j)}\}_{j=0}^{k}$ are given in association with the raw data $H$ and $c$; the choice we make is to use only $\delta_k^{(0)}$ explicitly and collect all other terms in $y_k$. These triples form the basis for the results of this paper. It is straightforward to note that if $\delta_r \neq 0$, then $0 = q_r = Hx_r + c$ for $x_r = (1/\delta_r) y_r$, so that $x_r$ solves (1.1). It will be shown that (1.1) has a solution if and only if $\delta_r \neq 0$.
We will define $(q_0, y_0, \delta_0) = (c, 0, 1)$. As mentioned earlier, for a given $k$ such that $1 \le k \le r$, the parameters $\delta_k^{(j)}$, $j = 1, \ldots, k$, are uniquely defined up to a scaling. Hence, so is the triple $(q_k, y_k, \delta_k)$. This is made precise in the recursion given in Proposition 3.1.
The terms $g_k$ and $x_k$ are reserved for the case when the scaling yields $\delta_k = 1$; then we denote $(q_k, y_k, \delta_k)$ by $(g_k, x_k, 1)$. Such a scaling is not always well defined. However, as we shall show, there is no need to base the scaling on the value of $\delta_k$.
It is possible to continue the recursion, and let
$$y_k = H y_k^{(1)} + \delta_k^{(1)} c, \quad \text{with} \quad y_k^{(1)} = \sum_{j=2}^{k} \delta_k^{(j)} H^{j-2} c \in \mathcal{K}_{k-1}(c,H), \quad k = 2, \ldots, r, \tag{3.2}$$
in addition to $y_1^{(1)} = 0$. This will be used in the analysis, but not in the algorithm presented.
3.2. Results on the triples
Based on Proposition 2.1, the following proposition states a recursion for $(q_k, y_k, \delta_k)$, $k = 1, \ldots, r$, given $(q_0, y_0, \delta_0) = (c, 0, 1)$.
Proposition 3.1. Let $r$ denote the smallest positive integer $k$ for which
$$\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp = \{0\}.$$
Given $(q_0, y_0, \delta_0) = (c, 0, 1)$, there exist vectors $(q_k, y_k, \delta_k)$, $k = 1, \ldots, r$, such that
$$q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp, \quad y_k \in \mathcal{K}_k(c,H), \quad q_k = Hy_k + \delta_k c, \quad k = 1, \ldots, r,$$
for which $q_k \neq 0$, $k = 1, \ldots, r-1$, and $q_r = 0$. Each such $(q_k, y_k, \delta_k)$, $k = 1, \ldots, r$, is uniquely determined up to a scalar, and a sequence $\{(q_k, y_k, \delta_k)\}_{k=1}^{r}$ may be generated as
$$y_1 = \alpha_0 \Big( -q_0 + \frac{q_0^T H q_0}{q_0^T q_0} y_0 \Big), \quad \delta_1 = \alpha_0 \Big( \frac{q_0^T H q_0}{q_0^T q_0} \delta_0 \Big), \quad q_1 = Hy_1 + \delta_1 c,$$
and
$$y_{k+1} = \alpha_k \Big( -q_k + \frac{q_k^T H q_k}{q_k^T q_k} y_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} y_{k-1} \Big), \quad k = 1, \ldots, r-1,$$
$$\delta_{k+1} = \alpha_k \Big( \frac{q_k^T H q_k}{q_k^T q_k} \delta_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1} \Big), \quad k = 1, \ldots, r-1,$$
$$q_{k+1} = Hy_{k+1} + \delta_{k+1} c, \quad k = 1, \ldots, r-1,$$
where $\alpha_k$, $k = 0, \ldots, r-1$, are free and nonzero parameters. In addition, it holds that $y_k$ are linearly independent for $k = 1, \ldots, r$.
Proof. For $k = 0$, $q_0 = c = Hy_0 + \delta_0 c$ holds by assumption since $y_0 = 0$ and $\delta_0 = 1$. For $k = 1$, rewriting the expression for $q_1$ from Proposition 2.1 yields
$$\begin{aligned} q_1 &= \alpha_0 \Big( -Hq_0 + \frac{q_0^T H q_0}{q_0^T q_0} q_0 \Big) = \alpha_0 \Big( -Hq_0 + \frac{q_0^T H q_0}{q_0^T q_0} (Hy_0 + \delta_0 c) \Big) \\ &= \alpha_0 H \Big( -q_0 + \frac{q_0^T H q_0}{q_0^T q_0} y_0 \Big) + \alpha_0 \Big( \frac{q_0^T H q_0}{q_0^T q_0} \delta_0 \Big) c = Hy_1 + \delta_1 c, \end{aligned}$$
where the last equality follows from the expressions for $y_1$ and $\delta_1$. Hence, $q_1 = Hy_1 + \delta_1 c$. Similarly for $k = 2, \ldots, r-1$, assume that $q_k = Hy_k + \delta_k c$ and $q_{k-1} = Hy_{k-1} + \delta_{k-1} c$; then rewriting the expression for $q_{k+1}$ from Proposition 2.1 yields
$$\begin{aligned} q_{k+1} &= \alpha_k \Big( -Hq_k + \frac{q_k^T H q_k}{q_k^T q_k} q_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} q_{k-1} \Big) \\ &= \alpha_k \Big( -Hq_k + \frac{q_k^T H q_k}{q_k^T q_k} (Hy_k + \delta_k c) + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} (Hy_{k-1} + \delta_{k-1} c) \Big) \\ &= \alpha_k H \Big( -q_k + \frac{q_k^T H q_k}{q_k^T q_k} y_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} y_{k-1} \Big) + \alpha_k \Big( \frac{q_k^T H q_k}{q_k^T q_k} \delta_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1} \Big) c \\ &= Hy_{k+1} + \delta_{k+1} c, \end{aligned}$$
for $k = 2, \ldots, r-1$, where the last equality follows from the expressions for $y_{k+1}$ and $\delta_{k+1}$, $k = 1, \ldots, r-1$. Hence, $q_{k+1} = Hy_{k+1} + \delta_{k+1} c$, $k = 1, \ldots, r-1$.
By Lemma A.1 it holds that for $k = 0, \ldots, r$, $\delta_k^{(j)}$, $j = 0, \ldots, k$, are uniquely determined up to a scaling; hence it follows that $y_{k+1}$ and $\delta_{k+1}$, $k = 0, \ldots, r-1$, are uniquely determined up to a scaling by the recursions of this proposition.
Further, note that the recursion for $y_{k+1}$ has a nonzero leading term of $q_k$ plus additional terms of $y_i$, $i = k$ and $i = k-1$. Since $q_k$ is orthogonal to $y_i$ for $i \le k$ and $q_k \neq 0$ for $k < r$, it follows that $y_{k+1}$ are linearly independent for $k = 0, \ldots, r-1$.
We will henceforth refer to the triples $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, as given by Proposition 3.1. In particular, the triple $(q_r, y_r, \delta_r)$ can now be used to show our main convergence result, namely that (1.1) has a solution if and only if $\delta_r \neq 0$, and that the recursions in Proposition 3.1 can be used to find a solution if $\delta_r \neq 0$ and a certificate of incompatibility if $\delta_r = 0$.
Theorem 3.1. Let $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, be given by Proposition 3.1, and let $Z$ denote a matrix whose columns form an orthonormal basis for $\mathcal{N}(H)$. Then, the following holds for the cases $\delta_r \neq 0$ and $\delta_r = 0$ respectively.
a) If $\delta_r \neq 0$, then $Hx_r + c = 0$ for $x_r = (1/\delta_r) y_r$, so that $c \in \mathcal{R}(H)$ and $x_r$ solves (1.1). In addition, it holds that $Z^T y_k = 0$, $k = 0, \ldots, r$.
b) If $\delta_r = 0$, then $y_r = \delta_r^{(1)} Z Z^T c$, with $\delta_r^{(1)} \neq 0$ and $Z^T c \neq 0$, so that $c \notin \mathcal{R}(H)$ and (1.1) has no solution. Further, there is a $y_r^{(1)} \in \mathcal{K}_{r-1}(c,H)$ so that $y_r = H y_r^{(1)} + \delta_r^{(1)} c$. Hence, $H(H x_r^{(1)} + c) = 0$ for $x_r^{(1)} = (1/\delta_r^{(1)}) y_r^{(1)}$, so that $x_r^{(1)}$ solves $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$.
If $H$ is nonsingular, then $Z$ is to be interpreted as an empty matrix.
Proof. For (a), suppose that $\delta_r \neq 0$. Then $0 = q_r = Hy_r + \delta_r c$, hence $Hx_r + c = 0$ for $x_r = (1/\delta_r) y_r$, i.e., $x_r = (1/\delta_r) y_r$ is a solution to (1.1). Since (1.1) has a solution, it must hold that $Z^T c = 0$. We have $y_k = H y_k^{(1)} + \delta_k^{(1)} c$ for $k = 0, \ldots, r$, so that $Z^T y_k = \delta_k^{(1)} Z^T c$. As $Z^T c = 0$, it follows that $Z^T y_k = 0$, $k = 0, \ldots, r$.
For (b), suppose that $\delta_r = 0$. We have $y_r = H y_r^{(1)} + \delta_r^{(1)} c$, so that $Z^T y_r = \delta_r^{(1)} Z^T c$. If $\delta_r = 0$, then $0 = q_r = Hy_r$ so that $y_r = \delta_r^{(1)} Z Z^T c$. It follows from Proposition 3.1 that $y_r \neq 0$ so that $\delta_r^{(1)} \neq 0$ and $Z^T c \neq 0$. A combination of $Hy_r = 0$ and $y_r = H y_r^{(1)} + \delta_r^{(1)} c$ gives $H(H y_r^{(1)} + \delta_r^{(1)} c) = 0$. Consequently, since $\delta_r^{(1)} \neq 0$, it holds that $H(H x_r^{(1)} + c) = 0$ for $x_r^{(1)} = (1/\delta_r^{(1)}) y_r^{(1)}$. With $f(x) = \frac{1}{2} \|Hx + c\|_2^2$, one obtains $\nabla f(x) = H(Hx + c)$, so that $x_r^{(1)}$ is a global minimizer to $f$ over $\mathbb{R}^n$.
Hence, we have shown that (1.1) has a solution if and only if $\delta_r \neq 0$, and that the recursions in Proposition 3.1 can be used to find a solution if $\delta_r \neq 0$ and a certificate of incompatibility if $\delta_r = 0$.
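To make case (b) of Theorem 3.1 concrete, the following small worked example may help. The data $H$ and $c$ and the choice $\alpha_0 = \alpha_1 = 1$ are ours, picked only so that the arithmetic can be followed by hand.

$$H = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad Z = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad Z^T c = 1 \neq 0.$$

Starting from $(q_0, y_0, \delta_0) = (c, 0, 1)$ with $q_0^T H q_0 / q_0^T q_0 = 1/2$, Proposition 3.1 gives

$$y_1 = -q_0 = \begin{pmatrix} -1 \\ -1 \end{pmatrix}, \quad \delta_1 = \tfrac{1}{2}, \quad q_1 = Hy_1 + \delta_1 c = \begin{pmatrix} -1/2 \\ 1/2 \end{pmatrix}.$$

With $q_1^T H q_1 / q_1^T q_1 = 1/2$ and $q_0^T H q_1 / q_0^T q_0 = -1/4$, the next step gives

$$y_2 = -q_1 + \tfrac{1}{2} y_1 - \tfrac{1}{4} y_0 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad \delta_2 = \tfrac{1}{2} \cdot \tfrac{1}{2} - \tfrac{1}{4} \cdot 1 = 0, \quad q_2 = Hy_2 + \delta_2 c = 0.$$

Hence $r = 2$ and $\delta_r = 0$, so the system is incompatible, and $y_r = -Z Z^T c$ lies in $\mathcal{N}(H)$ with $\delta_r^{(1)} = -1$. Writing $y_2 = H y_2^{(1)} + \delta_2^{(1)} c$ with $y_2^{(1)} = c$ gives $x_r^{(1)} = (1/\delta_r^{(1)}) y_r^{(1)} = (-1, -1)^T$, for which $H x_r^{(1)} + c = (0, 1)^T$ and $H(H x_r^{(1)} + c) = 0$, in agreement with the theorem.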
Next we make a few remarks on the behavior of the third component of the triples defined by Proposition 3.1, i.e., the sequence $\{\delta_k\}$, and on the choice of sign for the nonzero scaling factors $\{\alpha_k\}$. In the following proposition it is shown that the sequence $\{\delta_k\}$ cannot have two zero elements in a row.
Proposition 3.2. Let $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, be given by Proposition 3.1. If $q_k \neq 0$ and $\delta_k = 0$, then
$$\delta_{k+1} = -\frac{\alpha_k}{\alpha_{k-1}} \frac{q_k^T q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1} \neq 0.$$
Proof. By Proposition 2.1 it holds that
$$q_k^T q_k = -\alpha_{k-1} q_k^T H q_{k-1},$$
and, taking into account $\delta_k = 0$, the expression for $\delta_{k+1}$ from Proposition 3.1 gives
$$\delta_{k+1} = \alpha_k \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1} = -\frac{\alpha_k}{\alpha_{k-1}} \frac{q_k^T q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1}, \tag{3.3}$$
giving the required expression for $\delta_{k+1}$.
It remains to show that $\delta_{k+1} \neq 0$. First, assume that $k = 1$ so that $\delta_1 = 0$. Then, since $\alpha_1 \neq 0$, $\alpha_0 \neq 0$ and $\delta_0 = 1$, (3.3) gives $\delta_2 \neq 0$. Now assume that $k > 1$. Assume, to get a contradiction, that $\delta_{k+1} = 0$. Then, since $\alpha_k \neq 0$, $\alpha_{k-1} \neq 0$, (3.3) gives $\delta_{k-1} = 0$. We may then repeat the same argument to obtain $\delta_i = 0$, $i = 1, \ldots, k$. But this gives a contradiction, as $\delta_1 = 0$ implies $\delta_2 \neq 0$. Hence, it must hold that $\delta_{k+1} \neq 0$, as required.
Based on Proposition 3.2 the following corollary states that if $\alpha_{k-1}$ and $\alpha_k$ have the same sign and $\delta_k = 0$, then $\delta_{k+1}$ and $\delta_{k-1}$ will have opposite signs.
Corollary 3.1. Let $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, be given by Proposition 3.1 with $\alpha_{k-1}$ and $\alpha_k$ of the same sign. If $q_k \neq 0$ and $\delta_k = 0$, then $\delta_{k+1} \delta_{k-1} < 0$.
The following lemma states an expression for the triples that will be useful in showing properties of the signs of $\delta_k$ and $\alpha_k$ for the case when $H$ is positive semidefinite.
Lemma 3.1. Let $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, be given by Proposition 3.1. If $\delta_k \neq 0$ and $k < r$, then
$$\Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big)^T H \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big) = \alpha_k \frac{\delta_{k+1}}{\delta_k} q_k^T q_k. \tag{3.4}$$
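Proof. Eliminating $c$ from the difference of $q_{k+1}$ and $q_k$ yields
$$q_{k+1} - \frac{\delta_{k+1}}{\delta_k} q_k = H \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big). \tag{3.5}$$
Then pre-multiplication of (3.5) with $\big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \big)^T$ yields
$$\Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big)^T \Big( q_{k+1} - \frac{\delta_{k+1}}{\delta_k} q_k \Big) = \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big)^T H \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big). \tag{3.6}$$
Since $q_{k+1}$ is orthogonal to $y_{k+1}$ and $y_k$, and since $q_k$ is orthogonal to $y_k$, (3.6) becomes
$$-\frac{\delta_{k+1}}{\delta_k} y_{k+1}^T q_k = \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big)^T H \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big). \tag{3.7}$$
Hence, by Proposition 2.1 and since $q_k$ is orthogonal to $y_k$ and $y_{k-1}$, (3.7) may be written as
$$\alpha_k \frac{\delta_{k+1}}{\delta_k} q_k^T q_k = \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big)^T H \Big( y_{k+1} - \frac{\delta_{k+1}}{\delta_k} y_k \Big),$$
hence (3.4) is obtained.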
The following lemma gives some results on the behavior of the sequence $\{\delta_k\}$ in connection to the sign of $\alpha_k$ for the case when $H \succeq 0$.
Lemma 3.2. Let $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, be given by Proposition 3.1. Assume that $H \succeq 0$. Then $\delta_k \neq 0$ for $k < r$. If $\delta_k > 0$ and $\delta_{k+1} \neq 0$, then $\delta_{k+1} > 0$ if and only if $\alpha_k > 0$.
Proof. Assume that $\delta_k = 0$ for $k < r$; then $q_k = Hy_k$, hence pre-multiplication with $y_k^T$ yields $0 = y_k^T q_k = y_k^T H y_k$, since $q_k$ is orthogonal to $y_k$. Then, since $H \succeq 0$, it follows that $Hy_k = 0$ and hence $q_k = 0$. Since $q_k \neq 0$ for $k < r$, the assumption yields a contradiction. Hence, $\delta_k \neq 0$, $k < r$.
Next suppose that $\delta_k > 0$ and $\delta_{k+1} \neq 0$. Since $H \succeq 0$, Lemma 3.1 gives
$$\alpha_k \frac{\delta_{k+1}}{\delta_k} q_k^T q_k \ge 0, \tag{3.8}$$
which implies that $\delta_{k+1}$ and $\delta_k$ have the same sign if and only if $\alpha_k > 0$. Hence, if $\delta_k > 0$, then $\delta_{k+1} > 0$ if and only if $\alpha_k > 0$.
The result of Lemma 3.2 motivates our choice of the minus-sign in (2.4). Otherwise $\delta_k$ would alternate sign in each iteration for $\alpha_k > 0$ and $H \succeq 0$.
Note that for $H \succeq 0$ and $\delta_0 > 0$, then $\delta_i > 0$, $i = 0, \ldots, r-1$, and $\delta_r \ge 0$. Further, if $H \succ 0$ then the inequality in (3.8) is strict, implying that $\delta_i > 0$, $i = 1, \ldots, r$, i.e., $\delta_r \neq 0$ since (1.1) with $H \succ 0$ is always compatible.
Another consequence is that if $\alpha_k$ is chosen positive for $k = 0, \ldots, r$, then $\delta_k \le 0$ for some $k$ implies $H \not\succ 0$ and $\delta_k < 0$ for some $k$ implies $H \not\succeq 0$.
3.3. A general Krylov algorithm
Based on our results we now state an algorithm for solving (1.1) based on the triples $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, as in Proposition 3.1 using some $\alpha_k$ of our choice. We call Algorithm 3.1 a Krylov algorithm as it is the vectors $\{q_k\}$, spanning the Krylov subspaces, that drive the progress of the algorithm and ensure its termination.
The choice of a nonzero $\alpha_k$ is in theory arbitrary, but for the algorithm stated below we have made the choice to let $\alpha_k > 0$ such that $\|y_{k+1}\|_2 = \|c\|_2$, i.e.,
$$\alpha_0 = \frac{\|c\|_2}{\Big\| -q_0 + \frac{q_0^T H q_0}{q_0^T q_0} y_0 \Big\|_2}, \quad \alpha_k = \frac{\|c\|_2}{\Big\| -q_k + \frac{q_k^T H q_k}{q_k^T q_k} y_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} y_{k-1} \Big\|_2}, \quad k = 1, \ldots, r.$$
This choice is well defined since $y_k \neq 0$, $k = 1, \ldots, r$, by Proposition 3.1.
In theory, triples are generated as long as $q_k \neq 0$. In the algorithm, we introduce a tolerance such that we proceed with the iterations as long as $\|q_k\|_2 > q_{tol}$, where we let $q_{tol} = \sqrt{\varepsilon_M}$, where $\varepsilon_M$ is the machine precision. In theory we also draw conclusions based on $\delta_r \neq 0$ or $\delta_r = 0$; for this we introduce a tolerance $\delta_{tol} = \sqrt{\varepsilon_M}$.
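Algorithm 3.1 A general Krylov algorithm
Input arguments: $H$, $c$;
Output arguments: compatible; $x_r$ if compatible = 1; $y_r$ if compatible = 0;
$q_{tol} \leftarrow$ tolerance on $\|q\|_2$; [Our choice: $q_{tol} = \sqrt{\varepsilon_M}$]
$\delta_{tol} \leftarrow$ tolerance on $|\delta|$; [Our choice: $\delta_{tol} = \sqrt{\varepsilon_M}$]
$k \leftarrow 0$; $q_0 \leftarrow c$; $y_0 \leftarrow 0$; $\delta_0 \leftarrow 1$;
$y_1 \leftarrow \big( -q_0 + \frac{q_0^T H q_0}{q_0^T q_0} y_0 \big)$; $\delta_1 \leftarrow \big( \frac{q_0^T H q_0}{q_0^T q_0} \delta_0 \big)$; $q_1 \leftarrow Hy_1 + \delta_1 c$;
$\alpha_0 \leftarrow$ nonzero scalar; [Our choice: $\alpha_0 = \|c\|_2 / \|y_1\|_2$]
$q_1 \leftarrow \alpha_0 q_1$; $y_1 \leftarrow \alpha_0 y_1$; $\delta_1 \leftarrow \alpha_0 \delta_1$;
$k \leftarrow 1$;
while $\|q_k\|_2 > q_{tol}$ do
    $y_{k+1} \leftarrow \big( -q_k + \frac{q_k^T H q_k}{q_k^T q_k} y_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} y_{k-1} \big)$; $\delta_{k+1} \leftarrow \big( \frac{q_k^T H q_k}{q_k^T q_k} \delta_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1} \big)$;
    $q_{k+1} \leftarrow Hy_{k+1} + \delta_{k+1} c$;
    $\alpha_k \leftarrow$ nonzero scalar; [Our choice: $\alpha_k = \|c\|_2 / \|y_{k+1}\|_2$]
    $q_{k+1} \leftarrow \alpha_k q_{k+1}$; $y_{k+1} \leftarrow \alpha_k y_{k+1}$; $\delta_{k+1} \leftarrow \alpha_k \delta_{k+1}$;
    $k \leftarrow k + 1$;
end while
$r \leftarrow k$;
if $|\delta_r| > \delta_{tol}$ then
    $x_r \leftarrow \frac{1}{\delta_r} y_r$; compatible $\leftarrow 1$;
else
    compatible $\leftarrow 0$;
end if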
By Theorem 3.1, Algorithm 3.1 will return either a solution to (1.1) or a certificate that the system is incompatible. Note that $\delta_k = 0$ for some $k$ does not stop Algorithm 3.1.
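A minimal NumPy sketch of Algorithm 3.1 is given below for illustration. It follows the steps of the algorithm with the scaling $\alpha_k = \|c\|_2 / \|y_{k+1}\|_2$, but the function name `general_krylov`, its return convention, and the iteration cap of $n$ steps are our own assumptions rather than part of the original report. Scaling $y_{k+1}$ and $\delta_{k+1}$ before forming $q_{k+1}$ is equivalent to scaling all three afterwards.

```python
import numpy as np

def general_krylov(H, c, qtol=None, dtol=None):
    """Sketch of Algorithm 3.1.

    Returns (compatible, v, deltas): v is x_r if compatible, else the
    certificate y_r; deltas lists delta_0, ..., delta_r.
    """
    eps = np.finfo(float).eps
    qtol = np.sqrt(eps) if qtol is None else qtol
    dtol = np.sqrt(eps) if dtol is None else dtol
    c = np.asarray(c, dtype=float)
    norm_c = np.linalg.norm(c)
    q_prev = y_prev = d_prev = None
    q, y, d = c.copy(), np.zeros_like(c), 1.0   # (q_0, y_0, delta_0) = (c, 0, 1)
    deltas = [d]
    # In exact arithmetic r <= n; the cap guards against stagnation in floating point.
    while np.linalg.norm(q) > qtol and len(deltas) <= len(c):
        Hq = H @ q
        a = (q @ Hq) / (q @ q)
        y_new, d_new = -q + a * y, a * d
        if q_prev is not None:
            b = (q_prev @ Hq) / (q_prev @ q_prev)
            y_new, d_new = y_new + b * y_prev, d_new + b * d_prev
        alpha = norm_c / np.linalg.norm(y_new)   # makes ||y_{k+1}|| = ||c||
        y_new, d_new = alpha * y_new, alpha * d_new
        q_prev, y_prev, d_prev = q, y, d
        q, y, d = H @ y_new + d_new * c, y_new, d_new
        deltas.append(d)
    if abs(d) > dtol:
        return True, y / d, deltas
    return False, y, deltas

# Data of Example 3.1 (compatible, with delta_k = 0 at every other iteration).
c_ex = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
ok, x, deltas = general_krylov(np.diag(c_ex), c_ex)
print(ok, np.round(x, 4), np.round(deltas, 4))

# A small incompatible system (our own choice): a certificate is returned.
H_bad, c_bad = np.diag([1.0, 0.0]), np.array([1.0, 1.0])
ok_bad, cert, _ = general_krylov(H_bad, c_bad)
print(ok_bad, cert, H_bad @ cert)
```

In the incompatible case the returned vector satisfies $Hy_r = 0$ and $y_r^T c \neq 0$, which is what certifies $c \notin \mathcal{R}(H)$.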
The following example illustrates Algorithm 3.1 with our choices for $\alpha_k > 0$, $q_{tol}$ and $\delta_{tol}$, on a case of (1.1) where $\delta_k = 0$ every other iteration. The example also illustrates the change of sign between $\delta_{k+1}$ and $\delta_{k-1}$ when $\delta_k = 0$ predicted by Corollary 3.1.
Example 3.1. Let
$$c = \begin{pmatrix} 3 & 2 & 1 & 0 & -1 & -2 & -3 \end{pmatrix}^T, \quad H = \operatorname{diag}(c).$$
Algorithm 3.1 applied to $H$ and $c$ with $\alpha_k > 0$ such that $\|y_k\| = \|c\|$, $k = 1, \ldots, r$, and $q_{tol} = \delta_{tol} = \sqrt{\varepsilon_M}$, yields the following sequences, where the columns of q and y and the entries of delta correspond to $k = 0, \ldots, 6$.
q =
3.0000 -9.0000 2.2678 -2.7046 0.2648 -0.2445 0.0000
2.0000 -4.0000 -2.2678 5.4912 -1.0591 1.4673 0
1.0000 -1.0000 -2.2678 2.3768 1.3239 -3.6681 0
0 0 0 0 0 0 0
-1.0000 -1.0000 2.2678 2.3768 -1.3239 -3.6681 0
-2.0000 -4.0000 2.2678 5.4912 1.0591 1.4673 0
-3.0000 -9.0000 -2.2678 -2.7046 -0.2648 -0.2445 -0.0000
y =
0 -3.0000 3.4017 -0.9015 -2.2241 -0.0815 2.1602
0 -2.0000 1.5119 2.7456 -2.8419 0.7336 2.1602
0 -1.0000 0.3780 2.3768 -0.9885 -3.6681 2.1602
0 0 0 0 0 0 0
0 1.0000 0.3780 -2.3768 -0.9885 3.6681 2.1602
0 2.0000 1.5119 -2.7456 -2.8419 -0.7336 2.1602
0 3.0000 3.4017 0.9015 -2.2241 0.0815 2.1602
delta =
1.0000 0 -2.6458 0 2.3123 0 -2.1602
Hence, $r = 6$ and $x_r = (1/\delta_r) y_r = \begin{pmatrix} -1 & -1 & -1 & 0 & -1 & -1 & -1 \end{pmatrix}^T$.
4. Connection to the method of conjugate gradients
Next we will put the method of conjugate gradients by Hestenes and Stiefel [9] into the framework of our results. For an introduction to the method of conjugate gradients see, e.g., [12, 19]. This method is defined for the case where $H \succ 0$. The iterate $x_k$ is then defined as the solution to $\min_{x \in \mathcal{K}_k(c,H)} \frac{1}{2} x^T H x + c^T x$, and $g_k = Hx_k + c$ for $k = 0, \ldots, r$, i.e., $g_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp$.
In our setting, this is equivalent to generating triples $(q_k, y_k, \delta_k)$, $k = 0, \ldots, r$, given by Proposition 3.1, selecting the scaling $\alpha_k$ such that $\delta_k = 1$. If it is assumed that $H \succeq 0$, Lemma 3.2 gives $\delta_k \neq 0$, $k = 0, \ldots, r-1$. If $c \in \mathcal{R}(H)$, i.e., (1.1) is compatible, then Theorem 3.1 ensures that $\delta_r \neq 0$. Hence, if $H \succeq 0$ and $c \in \mathcal{R}(H)$, such a scaling is well defined. The following proposition gives the particular choices of scaling $\alpha_k$ for which $(q_k, y_k, \delta_k)$ may be denoted by $(g_k, x_k, 1)$, $k = 0, \ldots, r$.
Proposition 4.1. Assume that $H \succeq 0$ and $c \in \mathcal{R}(H)$. If $(q_k, y_k, \delta_k)$ are given by Proposition 3.1 for the particular choices
$$\alpha_0 = \frac{1}{\delta_0} \frac{q_0^T q_0}{q_0^T H q_0}, \quad \alpha_k = \frac{1}{\frac{q_k^T H q_k}{q_k^T q_k} \delta_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}} \delta_{k-1}}, \quad k = 1, \ldots, r-1, \tag{4.1}$$
then $\alpha_k > 0$, $k = 0, \ldots, r-1$, and $\delta_k = 1$, $k = 0, \ldots, r$.
Hence, $(q_k, y_k, \delta_k)$ may be denoted by $(g_k, x_k, 1)$, $k = 0, \ldots, r$, and it holds that
$$\alpha_0 = \frac{g_0^T g_0}{g_0^T H g_0}, \quad x_1 = x_0 + \alpha_0 (-g_0), \quad g_1 = Hx_1 + c, \tag{4.2a}$$
$$\alpha_k = \frac{1}{\frac{g_k^T H g_k}{g_k^T g_k} + \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}}}, \quad k = 1, \ldots, r-1, \tag{4.2b}$$
$$x_{k+1} = x_k + \alpha_k \Big( -g_k - \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}} (x_k - x_{k-1}) \Big), \quad k = 1, \ldots, r-1, \tag{4.2c}$$
$$g_{k+1} = Hx_{k+1} + c, \quad k = 1, \ldots, r-1. \tag{4.2d}$$
Proof. By assumption $\delta_0 = 1$ and by inserting $\alpha_k$, $k = 0, \ldots, r-1$, as in (4.1) in the expressions for $\delta_{k+1}$, $k = 1, \ldots, r-1$, from Proposition 3.1, it holds that $\delta_k = 1$, $k = 0, \ldots, r$. Hence, $(q_k, y_k, \delta_k)$ may be denoted by $(g_k, x_k, 1)$, $k = 0, \ldots, r$.
If $(g_k, x_k, 1)$ is introduced for $(q_k, y_k, \delta_k)$ in (4.1) then the expressions for $\alpha_k$, $k = 0, \ldots, r-1$, in (4.2a) and (4.2b) are obtained. By these expressions, $x_k$, $k = 0, \ldots, r$, are given by $x_0 = 0$,
$$x_1 = \alpha_0 \Big( -g_0 + \frac{g_0^T H g_0}{g_0^T g_0} x_0 \Big) = x_0 + \alpha_0 (-g_0),$$
and
$$\begin{aligned} x_{k+1} &= \alpha_k \Big( -g_k + \frac{g_k^T H g_k}{g_k^T g_k} x_k + \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}} x_{k-1} \Big) \\ &= x_k + \alpha_k \Big( -g_k - \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}} (x_k - x_{k-1}) \Big), \quad k = 1, \ldots, r-1, \end{aligned}$$
and $g_k$, $k = 0, \ldots, r$, are given by $g_0 = c$ and $g_{k+1} = Hx_{k+1} + c$, $k = 0, \ldots, r-1$. By Lemma 3.2, and since $\delta_k = 1 > 0$, $k = 0, \ldots, r$, it holds that $\alpha_k > 0$, $k = 0, \ldots, r-1$.
It is evident that the method of conjugate gradients relies heavily on the fact that $(g_k, x_k, 1)$ is well defined for all $k = 0, \ldots, r$, i.e., that $\delta_k \neq 0$ for all $k = 0, \ldots, r$. In effect, the method of conjugate gradients implicitly makes the very specific choice of $\alpha_k$, $k = 0, \ldots, r-1$, as given in Proposition 4.1. If the assumption $H \succeq 0$ and $c \in \mathcal{R}(H)$ is lifted then a high price is paid for using $(g_k, x_k, 1)$: the algorithm breaks down if $\delta_k = 0$ for some $k$. By our analysis, this choice of scaling is unnecessary, as illustrated by Theorem 3.1 and by Algorithm 3.1.
In particular, if the method of conjugate gradients is applied to a case where $H \succeq 0$ and $c \notin \mathcal{R}(H)$, the algorithm will break down. From Lemma 3.1, it is clear that this will happen at the very last iteration. Algorithm 3.1 would instead provide $y_r$ as a certificate of incompatibility in this situation.
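The recursion (4.2) can be run directly, without search directions. The sketch below is our own illustration: the function name `cg_three_term`, the stopping tolerance, and the small positive definite test matrix are assumptions, and the code presumes $H \succ 0$ so that every denominator is nonzero.

```python
import numpy as np

def cg_three_term(H, c, tol=1e-12):
    """Conjugate gradients written as recursion (4.2), assuming H is positive definite."""
    c = np.asarray(c, dtype=float)
    x_prev = g_prev = None
    x, g = np.zeros_like(c), c.copy()          # (g_0, x_0, 1) = (c, 0, 1)
    for _ in range(len(c)):                    # at most n steps in exact arithmetic
        if np.linalg.norm(g) <= tol:
            break
        Hg = H @ g
        if x_prev is None:
            alpha = (g @ g) / (g @ Hg)                      # (4.2a)
            x_new = x - alpha * g
        else:
            b = (g_prev @ Hg) / (g_prev @ g_prev)
            alpha = 1.0 / ((g @ Hg) / (g @ g) + b)          # (4.2b)
            x_new = x + alpha * (-g - b * (x - x_prev))     # (4.2c)
        x_prev, g_prev = x, g
        x = x_new
        g = H @ x + c                                       # (4.2d)
    return x

# Hypothetical positive definite test data.
H_pd = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c_pd = np.array([1.0, 2.0, 3.0])
x_cg = cg_three_term(H_pd, c_pd)
print(x_cg, np.linalg.norm(H_pd @ x_cg + c_pd))
```

If $H$ is indefinite, the denominator in (4.2b) can vanish, which is the breakdown described above.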
The method of conjugate gradients may be regarded as a line-search method on an unconstrained quadratic program, see, e.g., [19], where search directions $p_k$ and optimal step lengths along these are calculated along with the $x_k$ and $g_k$, $k = 0, \ldots, r$. In the following corollary we show that this formulation can be obtained from Proposition 4.1.
Corollary 4.1. Assume that $H \succeq 0$ and $c \in \mathcal{R}(H)$. Let $(g_k, x_k, 1)$, $k = 0, \ldots, r$, be given by Proposition 4.1. Then,
$$x_{k+1} = x_k + \alpha_k p_k, \quad k = 0, \ldots, r-1,$$
$$g_{k+1} = g_k + \alpha_k H p_k, \quad k = 0, \ldots, r-1,$$
where
$$p_0 = -g_0, \quad p_k = -g_k + \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} p_{k-1}, \quad k = 1, \ldots, r-1,$$
$$\alpha_k = -\frac{g_k^T p_k}{p_k^T H p_k}, \quad k = 0, \ldots, r-1.$$
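Proof. Let $p_k = (1/\alpha_k)(x_{k+1} - x_k)$, $k = 0, \ldots, r-1$. Proposition 4.1 then gives
$$p_0 = -g_0, \quad \text{and} \quad p_k = -g_k - \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}} (x_k - x_{k-1}), \quad k = 1, \ldots, r-1.$$
By Proposition 2.1 it holds that $g_k^T g_k = -\alpha_{k-1} g_k^T H g_{k-1}$, hence
$$p_k = -g_k - \frac{g_{k-1}^T H g_k}{g_{k-1}^T g_{k-1}} \alpha_{k-1} p_{k-1} = -g_k + \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} p_{k-1}, \quad k = 1, \ldots, r-1.$$
Consequently, since $g_{k+1} - g_k = H(x_{k+1} - x_k) = \alpha_k H p_k$, $k = 0, \ldots, r-1$, it holds that $g_{k+1} = g_k + \alpha_k H p_k$, $k = 0, \ldots, r-1$. Further, $g_{k+1}^T p_k = 0$ since $g_{k+1} \in \mathcal{K}_{k+2}(c,H) \cap \mathcal{K}_{k+1}(c,H)^\perp$ and $p_k \in \mathcal{K}_{k+1}(c,H)$. Hence, $0 = g_{k+1}^T p_k = g_k^T p_k + \alpha_k p_k^T H p_k$ yields
$$\alpha_k = -\frac{g_k^T p_k}{p_k^T H p_k}, \quad k = 0, \ldots, r-1,$$
completing the proof.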
Note that the scaling, given in Proposition 4.1, of the triple in our formulation is regarded as the scaling of a search direction in the line-search formulation of the method of conjugate gradients. From the point of view of our results it is not necessary to calculate search directions; Proposition 4.1 gives a complete description of the method of conjugate gradients.
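For comparison, the line-search form of Corollary 4.1 may be sketched as follows. As before, the function name `cg_line_search`, the tolerance, and the test data are our own assumptions, and $H \succ 0$ is presumed.

```python
import numpy as np

def cg_line_search(H, c, tol=1e-12):
    """Conjugate gradients in the search-direction form of Corollary 4.1 (H positive definite)."""
    c = np.asarray(c, dtype=float)
    x, g = np.zeros_like(c), c.copy()   # x_0 = 0, g_0 = c
    p = -g                              # p_0 = -g_0
    for _ in range(len(c)):
        if np.linalg.norm(g) <= tol:
            break
        Hp = H @ p
        alpha = -(g @ p) / (p @ Hp)     # optimal step length along p_k
        x = x + alpha * p
        g_new = g + alpha * Hp          # g_{k+1} = g_k + alpha_k H p_k
        p = -g_new + (g_new @ g_new) / (g @ g) * p
        g = g_new
    return x

# Hypothetical positive definite test data.
H_ls = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c_ls = np.array([1.0, 2.0, 3.0])
x_ls_form = cg_line_search(H_ls, c_ls)
print(x_ls_form, np.linalg.norm(H_ls @ x_ls_form + c_ls))
```

Both forms generate the same iterates in exact arithmetic; the line-search form stores $p_k$ where the recursion (4.2) stores the previous iterate and residual.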
5. Connection to the minimum-residual method
One issue that remains to be addressed is the case when (1.1) is incompatible. Algorithm 3.1 then gives a certificate of this fact, but often one would be interested in a vector $x$ that is "as good as possible". The method of choice could then be the minimum-residual method. This method goes back to [11] and [20], and the name is adopted from the implementation of the method, MINRES, by Paige and Saunders, see [16].
For $k = 0, \ldots, r$, $x_k^{MR}$ is defined as a solution to $\min_{x \in \mathcal{K}_k(c,H)} \|Hx + c\|_2^2$, and the corresponding residual $g_k^{MR}$ is defined as $g_k^{MR} = Hx_k^{MR} + c$. The vectors $x_k^{MR}$ are uniquely defined for $k = 0, \ldots, r-1$, and for $k = r$ if $c \in \mathcal{R}(H)$. For the case $k = r$ and $c \notin \mathcal{R}(H)$ there is one degree of freedom for $x_r^{MR}$. If $c \in \mathcal{R}(H)$, then $x_r^{MR}$ solves (1.1), and if $c \notin \mathcal{R}(H)$, then $x_{r-1}^{MR}$ and $x_r^{MR}$ are both solutions to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$.
In the following theorem, we derive the minimum-residual method based on our framework. In particular, we give explicit formulas for $x_k^{MR}$ and $g_k^{MR}$, $k = 0, \ldots, r$. For the case $k = r$, $c \notin \mathcal{R}(H)$, we give an explicit formula for $x_r^{MR}$ of minimum Euclidean norm.
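Theorem 5.1. Let $(q_k, y_k, \delta_k)$ be given by Proposition 3.1 for $k = 0, \ldots, r$. Then, for $k = 0, \ldots, r$, it holds that $x_k^{MR}$ solves $\min_{x \in \mathcal{K}_k(c,H)} \|Hx + c\|_2^2$ if and only if $x_k^{MR} = \sum_{i=0}^{k} \gamma_i y_i$ for some $\gamma_i$, $i = 0, \ldots, k$, that are optimal to
$$\begin{array}{ll} \underset{\gamma_0, \ldots, \gamma_k}{\text{minimize}} & \frac{1}{2} \sum_{i=0}^{k} \gamma_i^2 q_i^T q_i \\ \text{subject to} & \sum_{i=0}^{k} \gamma_i \delta_i = 1. \end{array} \tag{5.1}$$
In particular, $x_k^{MR}$ takes the following form for the mutually exclusive cases (a) $k < r$; (b) $k = r$ and $\delta_r \neq 0$; and (c) $k = r$ and $\delta_r = 0$.
a) For $k < r$, it holds that
$$x_k^{MR} = \frac{1}{\sum_{j=0}^{k} \frac{\delta_j^2}{q_j^T q_j}} \sum_{i=0}^{k} \frac{\delta_i}{q_i^T q_i} y_i, \tag{5.2}$$
and $g_k^{MR} = Hx_k^{MR} + c \neq 0$.
b) For $k = r$ and $\delta_r \neq 0$, it holds that $x_r^{MR} = (1/\delta_r) y_r$ and $g_r^{MR} = Hx_r^{MR} + c = 0$, so that $x_r^{MR}$ solves (1.1) and $x_r^{MR}$ is identical to $x_r$ of Theorem 3.1.
c) For $k = r$ and $\delta_r = 0$, it holds that $x_r^{MR} = x_{r-1}^{MR} + \gamma_r y_r$, where $\gamma_r$ is an arbitrary scalar, and $g_r^{MR} = Hx_r^{MR} + c = g_{r-1}^{MR} \neq 0$. In addition, $x_{r-1}^{MR}$ and $x_r^{MR}$ solve $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$. The particular choice
$$\gamma_r = -\frac{y_r^T x_{r-1}^{MR}}{y_r^T y_r}$$
makes $x_r^{MR}$ an optimal solution to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$ of minimum Euclidean norm.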
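Formula (5.2) can be checked numerically against a direct least-squares solve over an explicit basis of $\mathcal{K}_k(c,H)$. The sketch below is an illustration under our own assumptions: the helper names `triples`, `x_mr` and `x_ls`, the choice $\alpha_k = 1$, and the small indefinite test matrix are not from the report.

```python
import numpy as np

def triples(H, c, qtol=1e-10):
    """Triples (q_k, y_k, delta_k) of Proposition 3.1 with alpha_k = 1 (our choice)."""
    c = np.asarray(c, dtype=float)
    q, y, d = [c.copy()], [np.zeros_like(c)], [1.0]
    while np.linalg.norm(q[-1]) > qtol and len(q) <= len(c):
        k = len(q) - 1
        Hq = H @ q[k]
        a = (q[k] @ Hq) / (q[k] @ q[k])
        y_new, d_new = -q[k] + a * y[k], a * d[k]
        if k >= 1:
            b = (q[k - 1] @ Hq) / (q[k - 1] @ q[k - 1])
            y_new, d_new = y_new + b * y[k - 1], d_new + b * d[k - 1]
        q.append(H @ y_new + d_new * c)
        y.append(y_new)
        d.append(d_new)
    return q, y, d

def x_mr(q, y, d, k):
    """Formula (5.2); only valid for k < r, where all q_i are nonzero."""
    w = [d[i] / (q[i] @ q[i]) for i in range(k + 1)]
    lam = 1.0 / sum(d[i] * w[i] for i in range(k + 1))
    return lam * sum(w[i] * y[i] for i in range(k + 1))

def x_ls(H, c, k):
    """Reference: least squares over the basis c, Hc, ..., H^(k-1) c, for k >= 1."""
    K = np.column_stack([np.linalg.matrix_power(H, j) @ c for j in range(k)])
    z = np.linalg.lstsq(H @ K, -c, rcond=None)[0]
    return K @ z

# Hypothetical symmetric indefinite test data.
H_mr = np.array([[2.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
c_mr = np.array([1.0, 0.0, 0.0])
q_mr, y_mr, d_mr = triples(H_mr, c_mr)
r_mr = len(q_mr) - 1
for k in range(1, r_mr):
    print(k, x_mr(q_mr, y_mr, d_mr, k), x_ls(H_mr, c_mr, k))
```

Since (5.2) only involves the scalars $\delta_i$ and $q_i^T q_i$ and the vectors $y_i$, the minimum-residual iterates come at little extra cost once the triples are available.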
Proof. Since $q_i$, $i = 0, \dots, k$, form an orthogonal basis for $\mathcal{K}_{k+1}(c,H)$, an arbitrary vector in $\mathcal{K}_{k+1}(c,H)$ can be written as
$$
g = \sum_{i=0}^{k} \gamma_i q_i = \sum_{i=0}^{k} \gamma_i (H y_i + \delta_i c) = H \Big( \sum_{i=0}^{k} \gamma_i y_i \Big) + \Big( \sum_{i=0}^{k} \gamma_i \delta_i \Big) c,
\tag{5.3}
$$
for some parameters $\gamma_i$, $i = 0, \dots, k$. Consequently, the condition $\sum_{i=0}^{k} \gamma_i \delta_i = 1$ inserted into (5.3) gives $g = Hx + c$ with $x = \sum_{i=0}^{k} \gamma_i y_i$, i.e., $g$ is an arbitrary vector in $\mathcal{K}_{k+1}(c,H)$ for which the coefficient in front of $c$ equals one, and $x$ is the corresponding arbitrary vector in $\mathcal{K}_k(c,H)$. Minimizing the Euclidean norm of such a $g$ gives the quadratic program
$$
\begin{array}{ll}
\underset{g, \gamma_0, \dots, \gamma_k}{\text{minimize}} & \frac{1}{2} g^T g \\
\text{subject to} & g = \sum_{i=0}^{k} \gamma_i q_i, \quad \sum_{i=0}^{k} \gamma_i \delta_i = 1,
\end{array}
\tag{5.4}
$$
so that, by (5.3), the optimal values of $\gamma_i$, $i = 0, \dots, k$, give $g_k^{MR}$ as $g_k^{MR} = \sum_{i=0}^{k} \gamma_i q_i$ and $x_k^{MR}$ as $x_k^{MR} = \sum_{i=0}^{k} \gamma_i y_i$. Elimination of $g$ in (5.4), taking into account the orthogonality of the $q_i$'s, gives the equivalent problem (5.1). Also note that since $\delta_0 \ne 0$, the quadratic programs (5.1) and (5.4) are always feasible, and hence they are well defined.
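Let $L(\gamma, \lambda)$ be the Lagrangian function for (5.1),
$$
L(\gamma, \lambda) = \frac{1}{2} \sum_{i=0}^{k} \gamma_i^2 \, q_i^T q_i - \lambda \Big( \sum_{i=0}^{k} \gamma_i \delta_i - 1 \Big).
$$
The optimality conditions for (5.1) are given by
$$
0 = \nabla_{\gamma_i} L(\gamma, \lambda) = \gamma_i \, q_i^T q_i - \lambda \delta_i, \quad i = 0, \dots, k, \tag{5.5a}
$$
$$
0 = \nabla_{\lambda} L(\gamma, \lambda) = 1 - \sum_{i=0}^{k} \gamma_i \delta_i. \tag{5.5b}
$$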
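First, for (a), consider the case $k < r$. From (5.5a) it holds that
$$
\gamma_i = \lambda \frac{\delta_i}{q_i^T q_i}, \quad i = 0, \dots, k, \tag{5.6}
$$
which are well defined, since $q_i \ne 0$, $i = 0, \dots, r-1$. The expression for $\lambda$ is obtained by inserting the expression for $\gamma_i$, $i = 0, \dots, k$, given by (5.6) in (5.5b) so that
$$
1 = \sum_{i=0}^{k} \gamma_i \delta_i = \sum_{i=0}^{k} \lambda \frac{\delta_i^2}{q_i^T q_i}, \quad \text{yielding} \quad \lambda = \frac{1}{\sum_{i=0}^{k} \frac{\delta_i^2}{q_i^T q_i}}. \tag{5.7}
$$
Consequently, a combination of (5.6) and (5.7) gives
$$
\gamma_i = \frac{1}{\sum_{j=0}^{k} \frac{\delta_j^2}{q_j^T q_j}} \, \frac{\delta_i}{q_i^T q_i}, \quad i = 0, \dots, k. \tag{5.8}
$$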
Hence, by letting $x_k^{MR} = \sum_{i=0}^{k} \gamma_i y_i$, with $\gamma_i$ given by (5.8), (5.2) follows.
Now, for (b), consider the case $k = r$ with $\delta_r \ne 0$. Then, since $q_r = 0$, (5.5a) for $i = r$ gives $\lambda = 0$, and hence $\gamma_i = 0$, $i = 0, \dots, r-1$. Consequently, (5.5b) gives $\gamma_r = 1/\delta_r$. Again, by letting $x_r^{MR} = \sum_{i=0}^{r} \gamma_i y_i$, it holds that $x_r^{MR} = (1/\delta_r) y_r$, for which $g_r^{MR} = H x_r^{MR} + c = 0$, so that the optimal value is zero in (5.1) and $x_r^{MR}$ solves (1.1).
Finally, for (c), consider the case $k = r$ with $\delta_r = 0$. Theorem 3.1 shows that there exists an $x_r^{(1)} \in \mathcal{K}_{k-1}(c,H)$ that solves $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$. Consequently, since $x_r^{(1)} \in \mathcal{K}_{k-1}(c,H)$, it follows from (a) that it must hold that $x_r^{(1)} = x_{r-1}^{MR}$, so that $x_{r-1}^{MR}$ solves $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$.
For $k = r$, the optimality conditions (5.5) are equivalent to when $k = r - 1$, just with the additional information that $\gamma_r$ is arbitrary. Hence, $x_r^{MR} = x_{r-1}^{MR} + \gamma_r y_r$ and $g_r^{MR} = H x_r^{MR} + c = g_{r-1}^{MR}$ for $\gamma_r$ arbitrary since $H y_r = 0$.
For the remainder of the proof, let $x_r^{MR} = x_{r-1}^{MR} + \gamma_r y_r$ for the particular choice $\gamma_r = -(y_r^T x_{r-1}^{MR})/(y_r^T y_r)$, so that $y_r^T x_r^{MR} = 0$. Since $y_k = H y_k^{(1)} + \delta_k^{(1)} c$, it follows that $Z^T y_k = \delta_k^{(1)} Z^T c$, $k = 0, \dots, r$. Consequently, since $x_r^{MR}$ is formed as a linear combination of $y_k$, $k = 0, \dots, r$, it holds that $Z^T x_r^{MR}$ is parallel to $Z^T c$. But $0 = y_r^T x_r^{MR} = c^T Z Z^T x_r^{MR}$, so that $Z^T x_r^{MR}$ is also orthogonal to $Z^T c$. By Theorem 3.1, $Z^T c \ne 0$, so that $Z^T x_r^{MR} = 0$. Hence, $x_r^{MR}$ belongs to the range space of $H$ for this choice of $\gamma_r$. As $x_r^{MR} = x_{r-1}^{MR} + \gamma_r y_r$ and $y_r$ belongs to the null space of $H$, it follows that the range-space component of $x_r^{MR}$ is unique. Hence, since the range-space component is unique and the null-space component is zero, the conclusion is that $x_r^{MR}$ is an optimal solution to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$ of minimum Euclidean norm.
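The weights (5.8) are simple to evaluate once the scalars $\delta_i$ and $q_i^T q_i$ are known. The following Python sketch is only an illustration of case (a): the function names `mr_weights` and `combine` and the sample numbers are hypothetical and do not come from the paper, and the inputs are assumed to satisfy $\delta_0 \ne 0$ and $q_i^T q_i > 0$.

```python
def mr_weights(delta, qq):
    """Weights gamma_i of (5.8), given delta_i and qq_i = q_i^T q_i (all qq_i > 0)."""
    lam = 1.0 / sum(d * d / s for d, s in zip(delta, qq))   # multiplier (5.7)
    return [lam * d / s for d, s in zip(delta, qq)]         # weights (5.6)


def combine(gamma, ys):
    """Form sum_i gamma_i * y_i for vectors y_i stored as lists."""
    n = len(ys[0])
    return [sum(g * y[j] for g, y in zip(gamma, ys)) for j in range(n)]


# Hypothetical data for three triples; delta_1 = 0 gives a zero weight.
delta = [1.0, 0.0, 2.0]
qq = [4.0, 1.0, 2.0]
ys = [[0.0, 0.0], [1.0, -1.0], [3.0, 1.5]]

gamma = mr_weights(delta, qq)
x_mr = combine(gamma, ys)
constraint = sum(g * d for g, d in zip(gamma, delta))   # should equal 1 by (5.5b)
```

By construction the weights satisfy the constraint of (5.1), and a triple with $\delta_i = 0$ receives weight zero.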
Note that at an iteration $k$ at which $\delta_k = 0$ and $q_k \ne 0$, it holds that $x_k^{MR} = x_{k-1}^{MR}$, so that the iterate is unchanged. By Proposition 3.2, this cannot happen at two consecutive iterations. If $q_k \ne 0$, not all information from the problem has been extracted, even if $\delta_k = 0$. Only in the case $k = r$ is global information obtained, and it is then determined whether (1.1) has a solution.
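The unchanged iterate can be read off from (5.6) and (5.7). The short derivation below is a sketch that uses only those two formulas, for an index $k < r$ with $\delta_k = 0$ and $q_k \ne 0$; the multiplier $\lambda$ is the same at iterations $k-1$ and $k$, so the weights $\gamma_i$, $i < k$, are unchanged as well.

```latex
\gamma_k = \lambda \, \frac{\delta_k}{q_k^T q_k} = 0,
\qquad
\frac{1}{\lambda} = \sum_{i=0}^{k} \frac{\delta_i^2}{q_i^T q_i}
                  = \sum_{i=0}^{k-1} \frac{\delta_i^2}{q_i^T q_i},
\qquad \text{so that} \qquad
x_k^{MR} = \sum_{i=0}^{k-1} \gamma_i y_i = x_{k-1}^{MR}.
```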
In the following corollary of Theorem 5.1, we state explicit recursions for the minimum-residual method.
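Corollary 5.1. Let $(q_k, y_k, \delta_k)$ be given by Proposition 3.1 and let $x_k^{MR}$ be given by Theorem 5.1 for $k = 0, \dots, r$. If $\delta_0^{MR} = \delta_0^2$, $y_0^{MR} = \delta_0 y_0$,
$$
\delta_k^{MR} = q_k^T q_k \sum_{i=0}^{k-1} \frac{\delta_i^2}{q_i^T q_i} + \delta_k^2
\quad \text{and} \quad
y_k^{MR} = q_k^T q_k \sum_{i=0}^{k-1} \frac{\delta_i}{q_i^T q_i} \, y_i + \delta_k y_k, \quad k = 1, \dots, r,
$$
then
$$
x_k^{MR} = \frac{1}{\delta_k^{MR}} \, y_k^{MR}, \quad k = 0, \dots, r-1 \text{ and } k = r \text{ if } \delta_r \ne 0.
$$
In addition, it holds that
$$
\delta_{k+1}^{MR} = \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \, \delta_k^{MR} + \delta_{k+1}^2
\quad \text{and} \quad
y_{k+1}^{MR} = \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \, y_k^{MR} + \delta_{k+1} y_{k+1},
$$
for $k = 0, \dots, r-1$.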
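Proof. For $k = 0, \dots, r-1$, the expressions for $\delta_k^{MR}$ and $y_k^{MR}$ give
$$
\frac{1}{q_k^T q_k} \, \delta_k^{MR} = \sum_{i=0}^{k} \frac{\delta_i^2}{q_i^T q_i}
\quad \text{and} \quad
\frac{1}{q_k^T q_k} \, y_k^{MR} = \sum_{i=0}^{k} \frac{\delta_i}{q_i^T q_i} \, y_i.
$$
Note that $\delta_0 \ne 0$ and $q_i \ne 0$, $i = 0, \dots, k$, implies $\delta_k^{MR} > 0$ for $k < r$. Hence, Theorem 5.1 gives $x_k^{MR} = (1/\delta_k^{MR}) y_k^{MR}$. If $k = r$ and $\delta_r \ne 0$, the expressions for $\delta_r^{MR}$ and $y_r^{MR}$ give
$$
\delta_r^{MR} = \delta_r^2 > 0 \quad \text{and} \quad y_r^{MR} = \delta_r y_r,
$$
so that Theorem 5.1 gives $x_r^{MR} = (1/\delta_r^{MR}) y_r^{MR}$ also for this case.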
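To see the recursions, note that for $k = 0, \dots, r-1$ it follows that
$$
\delta_{k+1}^{MR} = q_{k+1}^T q_{k+1} \sum_{i=0}^{k} \frac{\delta_i^2}{q_i^T q_i} + \delta_{k+1}^2
= \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \Big( q_k^T q_k \sum_{i=0}^{k-1} \frac{\delta_i^2}{q_i^T q_i} + \delta_k^2 \Big) + \delta_{k+1}^2
= \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \, \delta_k^{MR} + \delta_{k+1}^2,
$$
and
$$
y_{k+1}^{MR} = q_{k+1}^T q_{k+1} \sum_{i=0}^{k} \frac{\delta_i}{q_i^T q_i} \, y_i + \delta_{k+1} y_{k+1}
= \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \Big( q_k^T q_k \sum_{i=0}^{k-1} \frac{\delta_i}{q_i^T q_i} \, y_i + \delta_k y_k \Big) + \delta_{k+1} y_{k+1}
= \frac{q_{k+1}^T q_{k+1}}{q_k^T q_k} \, y_k^{MR} + \delta_{k+1} y_{k+1},
$$
as required.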
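The agreement between the recursions of Corollary 5.1 and the closed form (5.2) is a purely algebraic identity, so it can be checked on arbitrary numbers. The Python sketch below does this under stated assumptions: the function names and the sample data are hypothetical, the data are not Krylov triples of any particular system, and only $\delta_0 \ne 0$ and $q_k^T q_k > 0$ are required.

```python
def mr_iterates_recursive(delta, qq, ys):
    """x_k^MR, k = 0..len(delta)-1, via the recursions of Corollary 5.1."""
    d_mr = delta[0] ** 2                       # delta_0^MR
    y_mr = [delta[0] * v for v in ys[0]]       # y_0^MR
    xs = [[v / d_mr for v in y_mr]]
    for k in range(len(delta) - 1):
        ratio = qq[k + 1] / qq[k]
        d_mr = ratio * d_mr + delta[k + 1] ** 2
        y_mr = [ratio * a + delta[k + 1] * b for a, b in zip(y_mr, ys[k + 1])]
        xs.append([v / d_mr for v in y_mr])
    return xs


def mr_iterate_closed_form(delta, qq, ys, k):
    """x_k^MR from the closed form (5.2)."""
    lam = 1.0 / sum(delta[j] ** 2 / qq[j] for j in range(k + 1))
    n = len(ys[0])
    return [lam * sum(delta[i] / qq[i] * ys[i][j] for i in range(k + 1))
            for j in range(n)]


# Hypothetical sample data: qq[k] stands for q_k^T q_k.
delta = [1.0, 0.5, 0.0, -2.0]
qq = [4.0, 2.0, 1.0, 0.5]
ys = [[0.0, 0.0], [1.0, -1.0], [2.0, 0.5], [-1.0, 3.0]]

recursive = mr_iterates_recursive(delta, qq, ys)
closed = [mr_iterate_closed_form(delta, qq, ys, k) for k in range(len(delta))]
max_gap = max(abs(a - b) for xr, xc in zip(recursive, closed) for a, b in zip(xr, xc))
```

The recursive form needs only the current triple and the previous pair $(\delta_k^{MR}, y_k^{MR})$, which is what makes it suitable for use inside an iteration.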
The recursion for $x_r^{MR}$, based on $x_{r-1}^{MR}$ and $(q_r, y_r, \delta_r)$, for the case $\delta_r = 0$ is given in Theorem 5.1 and it is not reiterated in Corollary 5.1.
Note that the expressions in Theorem 5.1 and Corollary 5.1 for $x_k^{MR}$, $k = 0, \dots, r$, are independent of the scaling of $(q_k, y_k, \delta_k)$. Hence, the scaling $\delta_k = 1$ may be used for the case $H \succeq 0$, $c \in \mathcal{R}(H)$, as stated in the following corollary.
Corollary 5.2. Assume that $H \succeq 0$ and $c \in \mathcal{R}(H)$. Let $(g_k, x_k, 1)$, $k = 0, \dots, r$, be given by Proposition 4.1. Then, for each $k = 0, \dots, r$, $(q_k, y_k, \delta_k)$ may be replaced by $(g_k, x_k, 1)$ in Theorem 5.1 and Corollary 5.1.
In effect, if $H \succeq 0$ and $c \in \mathcal{R}(H)$, then $x_k^{MR}$, $k = 0, \dots, r-1$, is given as a convex combination of $x_i$, $i = 0, \dots, k$.
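The convex-combination statement follows by inserting $(q_i, y_i, \delta_i) = (g_i, x_i, 1)$ into (5.2). As a sketch, with the weights $\theta_i$ introduced here only for illustration:

```latex
x_k^{MR} = \sum_{i=0}^{k} \theta_i x_i,
\qquad
\theta_i = \frac{1 / (g_i^T g_i)}{\sum_{j=0}^{k} 1 / (g_j^T g_j)} > 0,
\qquad
\sum_{i=0}^{k} \theta_i = 1,
\qquad k = 0, \dots, r-1 .
```

Iterates $x_i$ with small residual norm $\|g_i\|_2$ therefore receive the largest weights.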
5.1. A minimum-residual algorithm based on the general Krylov method
Next we state an algorithm for the minimum-residual method based on Algorithm 3.1 and extended with the recursion in Corollary 5.1. We use the same $\alpha_k > 0$ and tolerances $q_{tol}$ and $\delta_{tol}$ as for Algorithm 3.1.
Hence, for a compatible system (1.1) Algorithm 5.2 gives the same solution $x_r$ as Algorithm 3.1, and in addition it calculates $x_r^{MR}$. They are both estimates of a solution to (1.1). Further, if (1.1) is incompatible then Algorithm 5.2 delivers $x_r^{MR}$, an optimal solution to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$ of minimum Euclidean norm, in addition to the certificate of incompatibility.
Algorithm 5.2 A minimum-residual algorithm based on the general Krylov method

Input arguments: H, c;
Output arguments: x^MR_r, g^MR_r, compatible; (x_r or y_r if compatible = 1 or 0;)

q_tol ← tolerance on ‖q‖_2;  [Our choice: q_tol = √ε_M]
δ_tol ← tolerance on |δ|;  [Our choice: δ_tol = √ε_M]
k ← 0; q_0 ← c; y_0 ← 0; δ_0 ← 1;
y^MR_0 ← δ_0 y_0; δ^MR_0 ← δ_0^2; x^MR_0 ← (1/δ^MR_0) y^MR_0; g^MR_0 ← H x^MR_0 + c;
y_1 ← −q_0 + ((q_0^T H q_0)/(q_0^T q_0)) y_0; δ_1 ← ((q_0^T H q_0)/(q_0^T q_0)) δ_0; q_1 ← H y_1 + δ_1 c;
α_0 ← nonzero scalar;  [Our choice: α_0 = ‖c‖_2/‖y_1‖_2]
q_1 ← α_0 q_1; y_1 ← α_0 y_1; δ_1 ← α_0 δ_1;
if ‖q_1‖_2 > q_tol or |δ_1| > δ_tol then
    y^MR_1 ← ((q_1^T q_1)/(q_0^T q_0)) y^MR_0 + δ_1 y_1; δ^MR_1 ← ((q_1^T q_1)/(q_0^T q_0)) δ^MR_0 + δ_1^2;
    x^MR_1 ← (1/δ^MR_1) y^MR_1; g^MR_1 ← H x^MR_1 + c;
end if
k ← 1;
while ‖q_k‖_2 > q_tol do
    y_{k+1} ← −q_k + ((q_k^T H q_k)/(q_k^T q_k)) y_k + ((q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1})) y_{k−1};
    δ_{k+1} ← ((q_k^T H q_k)/(q_k^T q_k)) δ_k + ((q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1})) δ_{k−1};
    q_{k+1} ← H y_{k+1} + δ_{k+1} c;
    α_k ← nonzero scalar;  [Our choice: α_k = ‖c‖_2/‖y_{k+1}‖_2]
    q_{k+1} ← α_k q_{k+1}; y_{k+1} ← α_k y_{k+1}; δ_{k+1} ← α_k δ_{k+1};
    if ‖q_{k+1}‖_2 > q_tol or |δ_{k+1}| > δ_tol then
        y^MR_{k+1} ← ((q_{k+1}^T q_{k+1})/(q_k^T q_k)) y^MR_k + δ_{k+1} y_{k+1}; δ^MR_{k+1} ← ((q_{k+1}^T q_{k+1})/(q_k^T q_k)) δ^MR_k + δ_{k+1}^2;
        x^MR_{k+1} ← (1/δ^MR_{k+1}) y^MR_{k+1}; g^MR_{k+1} ← H x^MR_{k+1} + c;
    end if
    k ← k + 1;
end while
r ← k;
if |δ_r| > δ_tol then
    x_r ← (1/δ_r) y_r; compatible ← 1;
else
    x^MR_r ← x^MR_{r−1} − ((y_r^T x^MR_{r−1})/(y_r^T y_r)) y_r; compatible ← 0;
end if
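The steps of Algorithm 5.2 can be sketched in plain Python as follows. This is an illustrative transcription under stated assumptions, not the authors' implementation: the names `minres_krylov`, `dot`, `matvec` and `norm` are hypothetical, a single tolerance `tol` plays the role of both $q_{tol}$ and $\delta_{tol}$, the scaling by $\alpha_k$ is applied before $q_{k+1}$ is formed (equivalent, since $q_{k+1}$ is linear in $(y_{k+1}, \delta_{k+1})$), and the loop bound `k < n` is an added safeguard that is not part of the algorithm. As the paper notes, no numerical properties are claimed.

```python
import math
import sys


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def norm(v):
    return math.sqrt(dot(v, v))


def minres_krylov(H, c, tol=math.sqrt(sys.float_info.epsilon)):
    """Sketch of Algorithm 5.2; returns (x_mr, compatible, r)."""
    n = len(c)
    q_old, y_old, d_old = None, None, None      # triple k-1
    q, y, d = list(c), [0.0] * n, 1.0           # triple k = 0
    y_mr, d_mr = [d * v for v in y], d * d      # y_0^MR, delta_0^MR
    x_mr = [v / d_mr for v in y_mr]
    k = 0
    # The bound k < n is an added safeguard, not part of Algorithm 5.2.
    while norm(q) > tol and k < n:
        Hq, qq = matvec(H, q), dot(q, q)
        a = dot(q, Hq) / qq
        y_new = [-qi + a * yi for qi, yi in zip(q, y)]
        d_new = a * d
        if q_old is not None:                   # second term is absent for k = 0
            b = dot(q_old, Hq) / dot(q_old, q_old)
            y_new = [v + b * w for v, w in zip(y_new, y_old)]
            d_new += b * d_old
        alpha = norm(c) / norm(y_new)           # scaling choice stated in the paper
        y_new = [alpha * v for v in y_new]
        d_new *= alpha
        q_new = [hv + d_new * ci for hv, ci in zip(matvec(H, y_new), c)]
        if norm(q_new) > tol or abs(d_new) > tol:
            ratio = dot(q_new, q_new) / qq      # recursions of Corollary 5.1
            y_mr = [ratio * v + d_new * w for v, w in zip(y_mr, y_new)]
            d_mr = ratio * d_mr + d_new * d_new
            x_mr = [v / d_mr for v in y_mr]
        q_old, y_old, d_old = q, y, d
        q, y, d = q_new, y_new, d_new
        k += 1
    compatible = abs(d) > tol
    if not compatible:                          # minimum-norm step of Theorem 5.1 (c)
        gamma = -dot(y, x_mr) / dot(y, y)
        x_mr = [v + gamma * w for v, w in zip(x_mr, y)]
    return x_mr, compatible, k


# Small compatible system: H x + c = 0 has the solution x = (1, 1).
x_small, ok_small, r_small = minres_krylov([[2.0, 0.0], [0.0, 1.0]], [-2.0, -1.0])
```

The sketch keeps only the two most recent triples, which is all the three-term recurrence needs. Applied to the data of Example 5.1, it is intended to reproduce the reported $x_r^{MR}$ up to rounding.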
Revisiting Example 3.1 and applying the extended Algorithm 5.2 to the compatible system (1.1), we find that, in addition to the identical $\{\delta_k\}$, $\{q_k\}$ and $\{y_k\}$ (see Example 3.1), Algorithm 5.2 also returns
xMR =
0 0 -1.1108 -1.1108 -0.9953 -0.9953 -1.0000
0 0 -0.4937 -0.4937 -1.0641 -1.0641 -1.0000
0 0 -0.1234 -0.1234 -0.3593 -0.3593 -1.0000
0 0 0 0 0 0 0
0 0 -0.1234 -0.1234 -0.3593 -0.3593 -1.0000
0 0 -0.4937 -0.4937 -1.0641 -1.0641 -1.0000
0 0 -1.1108 -1.1108 -0.9953 -0.9953 -1.0000
We state {δk} again for convenience,
delta =
1.0000 0 -2.6458 0 2.3123 0 -2.1602
and note that for $\delta_k = 0$, it holds that $x_k^{MR} = x_{k-1}^{MR}$, as predicted by Theorem 5.1.
Next we consider an example that illustrates Algorithm 5.2 with our choices for $\alpha_k > 0$, $q_{tol}$ and $\delta_{tol}$, on a case when (1.1) is incompatible, i.e., $c \notin \mathcal{R}(H)$.
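Example 5.1. Let
$$
c = \begin{pmatrix} 3 & 2 & 1 & 1 & -1 & -2 & -3 \end{pmatrix}^T, \quad
H = \mathrm{diag}\begin{pmatrix} 5 & 2 & 1 & 0 & -1 & -2 & -3 \end{pmatrix}.
$$
Algorithm 5.2 applied to $H$ and $c$ with $\alpha_k > 0$ such that $\|y_k\| = \|c\|$, $k = 1, \dots, r$, and $q_{tol} = \delta_{tol} = \sqrt{\varepsilon_M}$, yields the following sequences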
q =
3.0000 -13.1379 3.5628 -0.8597 0.1063 -0.0181 0.0017 -0.0000
2.0000 -2.7586 -5.7676 3.9832 -1.3787 0.6372 -0.1470 -0.0000
1.0000 -0.3793 -3.1464 0.1039 1.8638 -2.5737 1.1021 0.0000
1.0000 0.6207 -2.8617 -1.7605 2.2573 0.5896 -1.7634 -0.0000
-1.0000 -1.6207 2.0296 2.7934 -0.6882 -2.4489 -1.4695 0.0000
-2.0000 -5.2414 1.3007 4.3735 2.1842 1.1548 0.3149 -0.0000
-3.0000 -10.8621 -3.8286 -2.6032 -0.6658 -0.2082 -0.0367 0.0000
y =
0 -3.0000 2.4296 0.8844 -1.3331 -0.3574 1.0584 -0.0000
0 -2.0000 -0.0222 3.7521 -2.9466 -0.2710 1.6899 0
0 -1.0000 -0.2847 1.8644 -0.3935 -3.1633 2.8655 0.0000
0 -1.0000 -0.5584 1.5833 0.7502 -3.7018 -0.7931 5.3852
0 1.0000 0.8320 -1.0329 -1.5691 1.8593 3.2329 0.0000
0 2.0000 2.2113 -0.4262 -3.3494 -1.1670 1.6060 0.0000
0 3.0000 4.1379 2.6283 -2.0353 -0.5202 1.7756 0.0000
delta =
1.0000 0.6207 -2.8617 -1.7605 2.2573 0.5896 -1.7634 -0.0000
xMR =
0 -0.1588 -0.6633 -0.6143 -0.5995 -0.5998 -0.6000 -0.6000
0 -0.1059 -0.0228 -0.6647 -1.0640 -1.0371 -1.0000 -1.0000
0 -0.0529 0.0585 -0.2817 -0.2148 -0.4441 -1.0000 -1.0000
0 -0.0529 0.1284 -0.1845 0.1376 -0.1481 0.1333 -0.0000
0 0.0529 -0.1983 0.0407 -0.4178 -0.2588 -1.0000 -1.0000
0 0.1059 -0.5364 -0.2994 -1.0375 -1.0794 -1.0000 -1.0000
0 0.1588 -1.0143 -1.1600 -0.9990 -0.9938 -1.0000 -1.0000
Hence, $r = 7$ and $x_r^{MR} = \begin{pmatrix} -0.6 & -1 & -1 & 0 & -1 & -1 & -1 \end{pmatrix}^T$. Note that since $|\delta_r| < \delta_{tol}$, the system is considered incompatible and $x_r^{MR}$ is the optimal solution to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$ of minimum Euclidean norm and $\|H x_r^{MR} + c\|_2^2 = 1$.
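The stated properties of $x_r^{MR}$ in Example 5.1 can be checked directly from the data above. The following Python sketch is only a verification aid with hypothetical variable names: it evaluates the residual $g = Hx + c$, the least-squares optimality condition $Hg = 0$, and the component of $x$ along the null space of $H$ (here the fourth coordinate, since $H$ is diagonal with a single zero entry).

```python
# Data of Example 5.1; H is diagonal, so only its diagonal is stored.
H_diag = [5.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0]
c = [3.0, 2.0, 1.0, 1.0, -1.0, -2.0, -3.0]
x_mr = [-0.6, -1.0, -1.0, 0.0, -1.0, -1.0, -1.0]

# Residual g = H x + c and its squared Euclidean norm.
g = [h * x + ci for h, x, ci in zip(H_diag, x_mr, c)]
res2 = sum(v * v for v in g)

# At a minimizer of ||H x + c||_2^2 the normal equations H (H x + c) = 0 hold.
normal_eq = [h * v for h, v in zip(H_diag, g)]

# Minimum norm: the component of x in the null space of H must vanish.
null_component = [x for h, x in zip(H_diag, x_mr) if h == 0.0]
```

Only the fourth residual component is nonzero, which matches $c \notin \mathcal{R}(H)$ being caused by the zero diagonal entry of $H$.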
6. Summary and conclusion
We have presented a general Krylov method for solving a system of linear equations $Hx + c = 0$ for $H$ symmetric. No assumption on $H$ except symmetry is made. In exact arithmetic, we determine if the system is compatible. If so, a solution is computed. If not, a certificate of incompatibility is computed. The basis for our method is the triples $(q_k, y_k, \delta_k)$, $k = 0, \dots, r$, given by Proposition 3.1, that are uniquely defined up to a scaling. We have obtained the method of conjugate gradients as a special case of our method by giving the scaling for which $\delta_k = 1$, $k = 0, \dots, r$.
We have also put the minimum-residual method in our framework and provided explicit formulas for the iterations. Again, we base our analysis on the triples. In the case of an incompatible system, our method gives $x_r^{MR}$ of minimum Euclidean norm. The original implementation of MINRES [16] did not deliver the optimal solution to $\min_{x \in \mathbb{R}^n} \|Hx + c\|_2^2$ of minimum Euclidean norm. In [2], a MINRES-like algorithm, MINRES-QLP, is developed that does.
One may observe that an alternative to using the minimum-residual iterations would be to consider recursions for $y_k^{(1)}$ and $\delta_k^{(1)}$ as given by (3.2) and then calculate $x_{r-1}^{MR} = (1/\delta_r^{(1)}) y_r^{(1)}$, according to the analysis in the proof of Theorems 3.1 and 5.1. However, such an approach would not automatically yield the estimate $x_k^{MR}$, $k = 0, \dots, r-2$.
One could also note that the method of conjugate gradients may be viewed as trying to solve the minimum-residual problem (5.1) in the situation where only the present triple $(q_k, y_k, \delta_k)$ is allowed in the linear combination, i.e., $\gamma_i = 0$, $i = 0, \dots, k-1$. The problem is then not necessarily feasible. It will be infeasible exactly when $\delta_k = 0$. One could think of methods other than the minimum-residual method which use a linear combination of more than one triple. By Proposition 3.2, it would suffice to use two consecutive triples, since it cannot hold that $\delta_{k-1} = 0$ and $\delta_k = 0$. The method SYMMLQ [16] uses this property, but we have not investigated the precise relationship to an optimization problem of the same form as (5.1).
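The restricted problem mentioned above can be written out explicitly. As a sketch, fixing $\gamma_i = 0$, $i = 0, \dots, k-1$, in (5.1) leaves a single variable $\gamma_k$, and (5.3) then gives the corresponding $x$ and $g$:

```latex
\begin{array}{ll}
\underset{\gamma_k}{\text{minimize}} & \frac{1}{2} \gamma_k^2 \, q_k^T q_k \\
\text{subject to} & \gamma_k \delta_k = 1,
\end{array}
\qquad \Longrightarrow \qquad
\gamma_k = \frac{1}{\delta_k}, \quad
x = \frac{1}{\delta_k} \, y_k, \quad
g = Hx + c = \frac{1}{\delta_k} \, q_k
\quad (\delta_k \ne 0).
```

For $\delta_k = 0$ the constraint reads $0 = 1$, so no feasible $\gamma_k$ exists.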
Finally, we want to stress that this paper is meant to give insight into Krylov methods, but we do not claim any numerical properties. The general Krylov method would inherit deficiencies of any method based on the Lanczos process, such as loss of orthogonality of the generated vectors. It is beyond the scope of the present paper to make such an analysis; see, e.g., [8, 13, 14, 17]. The theory of our paper is based on determining if certain quantities are zero or not. In our algorithms, we have made choices on optimality tolerances that are not meant to be universal. To obtain a fully functioning algorithm, the issue of determining if a quantity is near-zero would need to be considered in more detail. Also, we have based our analysis on the triples, so that termination of Algorithm 5.2 is based on $q_k$ and $\delta_k$. In practice, one should probably also consider $g_k^{MR}$.
Further, the use of preconditioning is not explored in this paper; for this subject see, e.g., [1, 3, 4].
A. A result on the Krylov vectors
For completeness, we review a result that characterizes the properties of qk expressedas in (2.1), needed for the analysis.
Lemma A.1. Let $r$ denote the smallest positive integer $k$ for which
$$
\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp} = \{0\}.
$$
For an index $k$ such that $1 \le k \le r$, let $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp}$ be expressed as in (2.1). Then, the scalars $\delta_k^{(j)}$, $j = 0, \dots, k$, are uniquely determined up to a nonzero scaling. In addition, if $\delta_k^{(k)} \ne 0$, it holds that $q_k = 0$ if and only if $\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp} = \{0\}$, i.e., if $k = r$.
Proof. Assume that $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp}$ is expressed as in (2.1). If $k < r$, then $c, Hc, H^2c, \dots, H^kc$ are linearly independent. Hence, $\delta_k^{(j)}$, $j = 0, \dots, k$, are uniquely determined by $q_k$. Consequently, as $q_k$ is uniquely defined up to a nonzero scaling, then so are $\delta_k^{(j)}$, $j = 0, \dots, k$. For $k = r$, we have $q_r = 0$ so that
$$
-\delta_r^{(r)} H^r c = \sum_{j=0}^{r-1} \delta_r^{(j)} H^j c. \tag{A.1}
$$
By the definition of $r$, it holds that $c, Hc, H^2c, \dots, H^{r-1}c$ are linearly independent. Hence, (A.1) shows that a fixed $\delta_r^{(r)}$ uniquely determines $\delta_r^{(j)}$, $j = 0, \dots, r-1$. Consequently, a scaling of $\delta_r^{(r)}$ gives a corresponding scaling of $\delta_r^{(j)}$, $j = 0, \dots, r-1$. Thus, $\delta_r^{(j)}$, $j = 0, \dots, r$, are uniquely determined up to a common scaling.
Finally, assume that $\delta_k^{(k)} \ne 0$. By definition, $\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp} = \{0\}$ implies $q_k = 0$. To show the converse, assume that $q_k = 0$. Then,
$$
-\delta_k^{(k)} H^k c = \sum_{j=0}^{k-1} \delta_k^{(j)} H^j c. \tag{A.2}
$$
If $\delta_k^{(k)} \ne 0$, then (A.2) implies $H^k c \in \mathrm{span}\{c, Hc, H^2c, \dots, H^{k-1}c\}$, i.e., $\mathcal{K}_{k+1}(c,H) = \mathcal{K}_k(c,H)$, so that $\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^{\perp} = \{0\}$, completing the proof.
References
[1] Axelsson, O. On the efficiency of a class of A-stable methods. Nordisk Tidskr. Informationsbehandling (BIT) 14 (1974), 279–287.
[2] Choi, S.-C. T., Paige, C. C., and Saunders, M. A. MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems. SIAM J. Sci. Comput. 33, 4 (2011), 1810–1836.
[3] Evans, D. J. The use of pre-conditioning in iterative methods for solving linear equations with symmetric positive definite matrices. J. Inst. Math. Appl. 4 (1968), 295–314.
[4] Evans, D. J. The analysis and application of sparse matrix algorithms in the finite element method. 427–447.
[5] Forsgren, A., Gill, P. E., and Shinnerl, J. R. Stability of symmetric ill-conditioned systems arising in interior methods for constrained optimization. SIAM J. Matrix Anal. Appl. 17 (1996), 187–211.
[6] Golub, G. H., and O'Leary, D. P. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Rev. 31, 1 (1989), 50–102.
[7] Golub, G. H., and Van Loan, C. F. Matrix computations, vol. 3 of Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, 1983.
[8] Hanke, M. Conjugate gradient type methods for ill-posed problems, vol. 327 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, Harlow, 1995.
[9] Hestenes, M. R., and Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).
[10] Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Research Nat. Bur. Standards 45 (1950), 255–282.
[11] Lanczos, C. Solution of systems of linear equations by minimized-iterations. J. Research Nat. Bur. Standards 49 (1952), 33–53.
[12] Luenberger, D. G. Linear and nonlinear programming, second ed. Addison-Wesley Pub Co, Boston, MA, 1984.
[13] Meurant, G., and Strakos, Z. The Lanczos and conjugate gradient algorithms in finite precision arithmetic. Acta Numer. 15 (2006), 471–542.
[14] Paige, C. C. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. J. Inst. Math. Appl. 18, 3 (1976), 341–349.
[15] Paige, C. C. Krylov subspace processes, Krylov subspace methods, and iteration polynomials. In Proceedings of the Cornelius Lanczos International Centenary Conference (Raleigh, NC, 1993) (Philadelphia, PA, 1994), SIAM, pp. 83–92.
[16] Paige, C. C., and Saunders, M. A. Solutions of sparse indefinite systems of linear equations. SIAM J. Numer. Anal. 12, 4 (1975), 617–629.
[17] Reid, J. K. On the method of conjugate gradients for the solution of large sparse systems of linear equations. In Large sparse sets of linear equations (Proc. Conf., St. Catherine's Coll., Oxford, 1970). Academic Press, London, 1971, pp. 231–254.
[18] Saad, Y. Iterative methods for sparse linear systems, second ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2003.
[19] Shewchuk, J. R. An introduction to the conjugate gradient method without the agonizing pain. Tech. rep., Pittsburgh, PA, USA, 1994.
[20] Stiefel, E. Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme. Comment. Math. Helv. 29 (1955), 157–179.