a general krylov method for solving symmetric systems of linear equations

A general Krylov method for solving symmetric systems oflinear equations

Anders FORSGREN∗ Tove ODLAND∗

Technical Report TRITA-MAT-2014-OS-01Department of Mathematics

KTH Royal Institute of Technology

March 2014

Abstract

Krylov subspace methods are used for solving systems of linear equationsHx+ c = 0. We present a general Krylov subspace method that can be appliedto a symmetric system of linear equations, i.e., for a system in which H = HT .In each iteration, we have a choice of scaling of the orthogonal vectors thatsuccessively make the Krylov subspaces available. We define an extended rep-resentation of each such Krylov vector, so that the Krylov vector is representedby a triple. We show that our Krylov subspace method is able to determine ifthe system of equations is compatible. If so, a solution is computed. If not, acertificate of incompatibility is computed. The method of conjugate gradientsis obtained as a special case of our general method. Our framework gives a wayto understand the method of conjugate gradients, in particular when H is not(positive) definite. Finally, we derive a minimum-residual method based on ourframework and show how the iterates may be updated explicitly based on thetriples.

1. Introduction

An important problem in numerical linear algebra is to solve a system of equationswhere the matrix is symmetric. Such a problem may be posed as

Hx+ c = 0, (1.1)

for x ∈ Rn, with c ∈ Rn and H = HT ∈ Rn×n. Our primary motivation comesfrom KKT systems arising in optimization, where H is symmetric but in generalindefinite, see, e.g., [5], but there are many other applications. Throughout, H isassumed symmetric, any other assumptions on H at particular instances will bestated explicitly. It is also assumed throughout that c 6= 0.

A strategy for a class of methods for solving (1.1) is to generate linearly in-dependent vectors, qk, k = 0, 1, . . . until qk becomes linearly dependent for somek = r ≤ n and hence qr = 0. These methods will have finite termination properties.

∗Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Tech-nology, SE-100 44 Stockholm, Sweden ([email protected],[email protected]). Research partially sup-ported by the Swedish Research Council (VR).

2 A general Krylov method for solving symmetric systems of linear equations

In this paper we consider a general Krylov subspace method in which the generatedvectors form an orthogonal basis for the Krylov subspaces generated by H and c,

K0(c,H) = {0}, Kk(c,H) = span{c,Hc,H2c, . . . ,Hk−1c}, k = 1, 2, . . . . (1.2)

In our method, the process by which these vectors are generated bear much re-semblance to the Lanczos process, with the difference that the available scalingparameter is left explicitly undecided. The method of conjugate gradients is derivedas a special case of this general Krylov method. In addition, the minimum-residualmethod and explicit recursions for the minimum-residual iterates can be derivedbased on our framework.

There have been very many contributions to the theory regarding the Lanczosprocess, the method of conjugate gradients and minimum-residual methods, see,e.g., [6] for an extensive survey of the years 1948-1976 and [7].

We will assume exact arithmetic and the theory developed in this paper is basedon that assumption. In the end of the paper we briefly discuss computational aspectsof our results in finite precision.

The outline of the paper is as follows. In Section 2 we define and give a re-cursion for the Krylov vectors qk ∈ Kk+1(c,H), k = 0, . . . , r. In Section 3 wedefine triples (qk, yk, δk) associated with qk, k = 0, . . . , r, such that qk = Hyk + δkc,qk ∈ Kk+1(c,H), yk ∈ Kk(c,H) and δk ∈ R, k = 0, . . . , r. These triples are definedup to a scaling and recursions for them are given in Proposition 3.1. We then stateseveral results concerning the triples and using these results we devise an algorithmin Section 3.3 that either gives the solution, if (1.1) is compatible (in this case weshow that δr 6= 0), or gives a certificate of incompatibility (in this case δr = 0). Themain attribute is that if we do not require δk to attain a specific value or sign, thenwe do not have to require anything more than symmetry of H.

Further, in Section 4, the method of conjugate gradients is illustrated in ourframework, which gives a better way of understanding its behavior, in particularwhen H is not a (positive) definite matrix. Finally, in Section 5, a minimum-residualmethod applicable also for incompatible systems is illustrated in this framework andexplicit recursions for the for the minimum-residual iterates are given. An algorithmfor this method is given in Section 5.1.

1.1. Notation

The letter i, j and k denote integer indices, other lowercase letters such as q, y andc denote column vectors, possibly with super- and/or subscripts. For a symmetricmatrix H, H � 0 denotes H positive definite. Analogously, H � 0 is used to denoteH positive semidefinite. N (H) denotes the null space of H and R(H) denotes therange space of H. We will denote by Z an orthonormal matrix whose columns forma basis for N (H). If H is nonsingular, then Z is to be interpreted as an emptymatrix. When referring to a norm, the Euclidean norm is used throughout.

2. Background 3

2. Background

Regarding (1.1), the raw data available is the matrix H and the vector c and com-binations of the two, for example represented by the Krylov subspaces generatedby H and c, as defined in (1.2). For an introduction and background on Krylovsubspaces, see, e.g., [18].

Without loss of generality, the scaling of the first Krylov vector q0 may be chosenso that q0 = c. Then one sequence of linearly independent vectors may be generatedby letting qk ∈ Kk+1(c,H) ∩ Kk(c,H)⊥, k = 1, . . . , r, such that qk 6= 0, for k =0, 1, . . . , r−1 and qr = 0 where r is the minimum index k for which qk ∈ Kk+1(c,H)∩Kk(c,H)⊥ = {0}. These vectors {q0, q1, . . . , qr−1} form an orthogonal, hence linearlyindependent, basis of Kr(c,H). We will refer to these vectors as the Krylov vectors.With q0 = c, each vector qk, k = 1, . . . , r−1, is uniquely determined up to a scaling.A vector qk ∈ Kk+1(c,H) may be expressed as

qk =

k∑j=0

δ(j)k Hjc, (2.1)

for some parameters δ(j)k , j = 0, . . . , k. In order to ensure that qk = 0 if and only

if Hkc is linearly dependent on c, Hc, . . . , Hk−1c, it must hold that δ(k)k 6= 0. Also

note that for k < r, the vectors c, Hc, . . . , Hk−1c are linearly independent. Hence,

since qk is uniquely determined up to a nonzero scaling, so are δ(j)k , j = 0, . . . , k. For

k = r, qr = 0, and since δ(r)r 6= 0, the same argument shows that δ

(j)r , j = 0, . . . , r,

are uniquely determined up to a common nonzero scaling. This is made precise inLemma A.1.

The following proposition states a recursion for such a sequence of vectors wherethe scaling factors denoted by {αk}r−1k=0 is left unspecified. The recursion in Propo-sition 2.1 is a slight generalization of the Lanczos process for generating mutuallyorthogonal vectors, see [10,11], in which the scaling of each vector qk is chosen suchthat ||qk|| = 1, k = 0, . . . , r − 1. For completeness, this proposition and its proof isincluded.

Proposition 2.1. Let r denote the smallest positive integer k for which

Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}.

Given q0 = c ∈ K1(c,H), there exist vectors qk, k = 1, . . . , r, such that

qk ∈ Kk+1(c,H) ∩ Kk(c,H)⊥, k = 1, . . . , r,

for which qk 6= 0, k = 1, . . . , r − 1, and qr = 0. Each such qk, k = 1, . . . , r − 1, isuniquely determined up to a scaling, and a sequence {qk}rk=1 may be generated as

q1 = α0

(−Hq0 +

qT0 Hq0

qT0 q0q0), (2.2a)

qk+1 = αk(−Hqk +

qTkHqk

qTk qkqk +

qTk−1Hqk

qTk−1qk−1qk−1

), k = 1, . . . , r − 1, (2.2b)


where αk, k = 0, . . . , r − 1, are free and nonzero parameters. In addition, it holdsthat

qTk+1qk+1 = −αkqTk+1Hqk, k = 0, . . . , r − 1. (2.3)

Proof. Given q0 = c, let k be an integer such that 1 ≤ k ≤ r − 1. Assume thatqi, i = 0, . . . , k, are mutually orthogonal with qi ∈ Ki+1(c,H) ∩ Ki(c,H)⊥. Letqk+1 ∈ Kk+2(c,H) be expressed as

qk+1 = −αkHqk +k∑i=0

η(i)k qi, k = 0, . . . , r − 1, (2.4)

In order for qk+1 to be orthogonal to qi, i = 0, . . . , k, the parameters η(i)k , i = 0, . . . , k,

are uniquely determined as follows.For k = 0, to have qT0 q1 = 0, it must hold that

η(0)0 = α0

qT0 Hq0

qT0 q0,

hence obtaining q1 ∈ K2(c,H)∩K1(c,H)⊥ as in (2.2a), where α0 is free and nonzero.For k such that 1 ≤ k ≤ r − 1, in order to have qTi qk+1 = 0, i = 0, . . . , k, it musthold that

η(k)k = αk

qTkHqk

qTk qk, η

(k−1)k = αk

qTk−1Hqk

qTk−1qk−1, and η

(i)k = 0, i = 0, . . . , k − 2.

The last relation follows by the symmetry ofH. Hence, obtaining qk+1 ∈ Kk+2(c,H)∩Kk+1(c,H)⊥ as in the three-term recurrence of (2.2b), where αk, k = 1, . . . , r − 1,are free and nonzero.

Since q1 is orthogonal to q0, and since qk+1 is orthogonal to qk and qk−1, k =1, . . . , r − 1, pre-multiplication of (2.2) with qTk+1 yields

qTk+1qk+1 = −αkqTk+1Hqk, k = 0, . . . , r − 1.

Finally note that if qk+1 is given by (2.2), then the only term that increases the power

of H is αk(−Hqk). Since αk 6= 0, repeated use of this argument gives δ(k+1)k+1 6= 0

if qk+1 is expressed by (2.1). In fact, δ(k+1)k+1 = (−1)k+1

∏ki=0 αi 6= 0. Hence, by

Lemma A.1, qk+1 = 0 implies Kk+2(c,H) ∩ Kk+1(c,H)⊥ = {0}, so that k + 1 = r,as required.

The choice of a sign in front of the first term of (2.4) is arbitrary. Our choice ofminus-sign is made in order to get coherence with existing theory as will be shownlater on.

Many methods for solving (1.1) are based on using the Lanczos process, inwhich the scaling factors are chosen such that the generated vectors have norm oneand matrix-factorization techniques are used on the symmetric tridiagonal matrixobtained by putting the three-term recurrence on matrix form. For an introductionto how Krylov subspace methods are formalized in this way, see, e.g., [2, 15]. Forour purposes we leave these available scaling factors unspecified.

3. A general Krylov method 5

3. A general Krylov method

3.1. An extended representation of the Krylov vectors

To find a solution of (1.1), if it exists, it is not sufficient to generate the sequence{qk}rk=1. Note that, as in (2.1), qk ∈ Kk+1(c,H), k = 0, . . . , r, may be expressed as

qk =k∑j=0

δ(j)k Hjc = H

( k∑j=1

δ(j)k Hj−1c

)+ δ

(0)k c, k = 1, . . . , r. (3.1)

It is not convenient to represent qk by (2.1). Therefore, we may define y0 = 0,δ0 = 1,

yk =

k∑j=1

δ(j)k Hj−1c ∈ Kk(c,H) and δk = δ

(0)k , k = 1, . . . , r,

so that (3.1) gives qk = Hyk + δkc. These quantities may be represented by a

triple (qk, yk, δk), k = 0, . . . , r. Note that {δ(j)k }kj=0 are given in association with the

raw data H and c, the choice we make is to use only δ(0)k explicitly and collect all

other terms in yk. These triples form the basis for the results of this paper. It isstraightforward to note that if δr 6= 0, then 0 = qr = Hxr + c for xr = (1/δr)yr, sothat xr solves (1.1). It will be shown that (1.1) has a solution if and only if δr 6= 0.

We will define (q0, y0, δ0) = (c, 0, 1). As mentioned earlier, for a given k such

that 1 ≤ k ≤ r, the parameters δ(j)k , j = 1, . . . , k, are uniquely defined up to a

scaling. Hence, so is the triple (qk, yk, δk). This is made precise in the recursiongiven in Proposition 3.1.

The terms gk and xk are reserved for the case when the scaling yields δk = 1,then we denote (qk, yk, δk) by (gk, xk, 1). Such a scaling is not always well defined.However, as we shall show, there is no need to base the scaling on the value of δk.

It is possible to continue the recursion, and let

yk = Hy(1)k + δ

(1)k c, with (3.2)

y(1)k =

k∑j=2

δ(j)k Hj−2c ∈ Kk−1(c,H), k = 2, . . . , r,

in addition to y(1)1 = 0. This will be used in the analysis, but not in the algorithm

presented.

3.2. Results on the triples

Based on Proposition 2.1, the following proposition states a recursion for (qk, yk, δk),k = 1, . . . , r, given (q0, y0, δ0) = (c, 0, 1).

Proposition 3.1. Let r denote the smallest positive integer k for which

Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}.


Given (q0, y0, δ0) = (c, 0, 1), there exist vectors (qk, yk, δk), k = 1, . . . , r, such that

qk ∈ Kk+1(c,H) ∩ Kk(c,H)⊥, yk ∈ Kk(c,H), qk = Hyk + δkc, k = 1, . . . , r,

for which qk 6= 0, k = 1, . . . , r−1, and qr = 0. Each such (qk, yk, δk), k = 1, . . . , r, isuniquely determined up to a scalar, and a sequence {(qk, yk, δk)}rk=1 may be generatedas

y1 = α0

(− q0 +

qT0 Hq0

qT0 q0y0), δ1 = α0

(qT0 Hq0qT0 q0

δ0), q1 = Hy1 + δ1c,

and

yk+1 = αk(− qk +

qTkHqk

qTk qkyk +

qTk−1Hqk

qTk−1qk−1yk−1

), k = 1, . . . , r − 1,

δk+1 = αk(qTkHqkqTk qk

δk +qTk−1Hqk

qTk−1qk−1δk−1

), k = 1, . . . , r − 1,

qk+1 = Hyk+1 + δk+1c, k = 1, . . . , r − 1,

where αk, k = 0, . . . , r − 1, are free and nonzero parameters. In addition, it holdsthat yk are linearly independent for k = 1, . . . , r.

Proof. For k = 0, q0 = c = Hy0+δ0c, holds by assumption since y0 = 0 and δ0 = 1.For k = 1 rewriting the expression for q1 from Proposition 2.1 yields

q1 = α0

(−Hq0 +

qT0 Hq0

qT0 q0q0)

= α0

(−Hq0 +

qT0 Hq0

qT0 q0(Hy0 + δ0c)

)=

= α0H(− q0 +

qT0 Hq0

qT0 q0y0)

+ α0

(qT0 Hq0qT0 q0

δ0)c = Hy1 + δ1c,

where the last equality follows from the expressions for y1 and δ1. Hence, q1 =Hy1 + δ1c. Similarly for k = 2, . . . , r − 1, assume that qk = Hyk + δkc and qk−1 =Hyk−1 + δk−1c, then rewriting the expression for qk+1 from Proposition 2.1 yields,

qk+1 = αk(−Hqk +

qTkHqk

qTk qkqk +

qTk−1Hqk

qTk−1qk−1qk−1

)=

= αk(−Hqk +

qTkHqk

qTk qk(Hyk + δkc) +

qTk−1Hqk

qTk−1qk−1(Hyk−1 + δk−1c)

)=

= αkH(− qk +

qTkHqk

qTk qkyk +

qTk−1Hqk

qTk−1qk−1yk−1

)+ αk

(qTkHqkqTk qk

δk +qTk−1Hqk


)c =

= Hyk+1 + δk+1c,

for k = 2, . . . , r − 1, where the last equality follows from the expressions for yk+1

and δk+1, k = 1, . . . , r − 1. Hence, qk+1 = Hyk+1 + δk+1c, k = 1, . . . , r − 1.

By Lemma A.1 it holds that for k = 0, . . . , r, δ(j)k , j = 0, . . . , k are uniquely

determined up to a scaling, hence it follows that yk+1 and δk+1, k = 0, . . . , r− 1 areuniquely determined up to a scaling by the recursions of this proposition.


Further, note that the recursion for yk+1 has a nonzero leading term of qk plusadditional terms of yi, i = k and i = k−1. Since qk is orthogonal to yi for i ≤ k andqk 6= 0 for k < r, it follows that yk+1 are linearly independent for k = 0, . . . , r − 1.

We will henceforth refer to the triples (qk, yk, δk), k = 0, . . . , r, as given byProposition 3.1. In particular, the triple (qr, yr, δr), can now be used to show ourmain convergence result, namely that (1.1) has a solution if and only if δr 6= 0, andthat the recursions in Proposition 3.1 can be used to find a solution if δr 6= 0 and acertificate of incompatibility if δr = 0.

Theorem 3.1. Let (qk, yk, δk), k = 0, . . . , r, be given by Proposition 3.1, and let Zdenote a matrix whose columns form an orthonormal basis for N (H). Then, thefollowing holds for the cases δr 6= 0 and δr = 0 respectively.

a) If δr 6= 0, then Hxr + c = 0 for xr = (1/δr)yr, so that c ∈ R(H) and xr solves(1.1). In addition, it holds that ZTyk = 0, k = 0, . . . , r.

b) If δr = 0, then yr = δ(1)r ZZT c, with δ

(1)r 6= 0 and ZT c 6= 0, so that c 6∈ R(H)

and (1.1) has no solution. Further, there is a y(1)r ∈ Kk−1(c,H) so that yr =

Hy(1)r + δ

(1)r c. Hence, H(Hx

(1)r + c) = 0 for x

(1)r = (1/δ

(1)r )y

(1)r , so that x

(1)r

solves minx∈IRn ‖Hx+ c‖22.

If H is nonsingular, then Z is to be interpreted as an empty matrix.

Proof. For (a), suppose that δr 6= 0. Then 0 = qr = Hyr + δrc, hence Hxr + c = 0for xr = (1/δr)yr, i.e., xr = (1/δr)yr is a solution to (1.1). Since (1.1) has a solution,

it must hold that ZTc = 0. We have yk = Hy(1)k + δ

(1)k c for k = 0, . . . , r, so that

ZTyk = δ(1)k ZT c. As ZTc = 0, it follows that ZTyk = 0, k = 0, . . . , r.

For (b), suppose that δr = 0. We have yr = Hy(1)r + δ

(1)r c, so that ZT yr =

δ(1)r ZT c. If δr = 0, then 0 = qr = Hyr so that yr = δ

(1)r ZZT c. It follows from

Proposition 3.1 that yr 6= 0 so that δ(1)r 6= 0 and ZTc 6= 0. A combination of Hyr = 0

and yr = Hy(1)r + δ

(1)r c gives H(Hy

(1)r + δ

(1)r c) = 0. Consequently, since δ

(1)r 6= 0, it

holds that H(Hx(1)r + c) = 0 for x

(1)r = (1/δ

(1)r )y

(1)r . With f(x) = 1

2‖Hx+ c‖22, one

obtains ∇f(x) = H(Hx+ c), so that x(1)r is a global minimizer to f over IRn.

Hence, we have shown that (1.1) has a solution if and only if δr 6= 0, and that therecursions in Proposition 3.1 can be used to find a solution if δr 6= 0 and a certificateof incompatibility if δr = 0.

Next we make a few remarks on the behavior of the third component of thetriples defined by Proposition 3.1, i.e., the sequence {δk} and on the choice of signfor the nonzero scaling factors {αk}. In the following proposition it is shown thatthe sequence {δk} can not have two zero elements in a row.


Proposition 3.2. Let (qk, yk, δk), k = 0, . . . , r, be given by Proposition 3.1. Ifqk 6= 0 and δk = 0, then

δk+1 = − αkαk−1

qTk qk

qTk−1qk−1δk−1 6= 0.

Proof. By Proposition 2.1 it holds that

qTk qk = −αk−1qTkHqk−1,

and, taking into account δk = 0, the expression for δk+1 from Proposition 3.1 give

δk+1 = αkqTk−1Hqk

qTk−1qk−1δk−1 = − αk

αk−1

qTk qk

qTk−1qk−1δk−1, (3.3)

giving the required expression for δk+1.It remains to show that δk+1 6= 0. First, assume that k = 1 so that δ1 = 0.

Then, since α1 6= 0, α0 6= 0 and δ0 = 1, (3.3) gives δ2 6= 0. Now assume that k > 1.Assume, to get a contradiction, that δk+1 = 0. Then, since αk 6= 0, αk−1 6= 0,(3.3) gives δk−1 = 0. We may then repeat the same argument to obtain δi = 0,i = 1, . . . , k. But this gives a contradiction, as δ1 = 0 implies δ2 6= 0. Hence, it musthold that δk+1 6= 0, as required.

Based on Proposition 3.2 the following corollary states that if αk−1 and αk havethe same sign and δk = 0, then δk+1 and δk−1 will have opposite signs.

Corollary 3.1. Let (qk, yk, δk), k = 0, . . . , r, be given by Proposition 3.1 with αk−1and αk of the same sign. If qk 6= 0 and δk = 0, then δk+1δk−1 < 0.

The following lemma states an expression for the triples that will be usefulin showing properties of the signs of δk and αk for the case when H is positivesemidefinite.

Lemma 3.1. Let (qk, yk, δk), k = 0, . . . , r, be given by Proposition 3.1. If δk 6= 0and k < r, then(

yk+1 −δk+1

δkyk)TH(yk+1 −

δk+1

δkyk)

= αkδk+1

δkqTk qk. (3.4)

Proof. Eliminating c from the difference of qk+1 and qk yields

qk+1 −δk+1

δkqk = H

(yk+1 −

δk+1

δkyk). (3.5)

Then pre-multiplication of (3.5) with (yk+1 − δk+1

δkyk)

T yields

(yk+1 −

δk+1

δkyk)T (

qk+1 −δk+1

δkqk)

=(yk+1 −

δk+1

δkyk)TH(yk+1 −

δk+1

δkyk). (3.6)


Since qk+1 is orthogonal to yk+1 and yk, and since qk is orthogonal to yk, (3.6)becomes

−δk+1

δkyTk+1qk =

(yk+1 −

δk+1

δkyk)TH(yk+1 −

δk+1

δkyk). (3.7)

Hence, by Proposition 2.1 and since qk is orthogonal to yk and yk−1, (3.7) may bewritten as

αkδk+1

δkqTk qk =

(yk+1 −

δk+1

δkyk)TH(yk+1 −

δk+1

δkyk),

hence (3.4) is obtained.

The following lemma gives some results on the behavior of the sequence of {δk}in connection to the sign of αk for the case when H � 0.

Lemma 3.2. Let (qk, yk, δk), k = 0, . . . , r, be given by Proposition 3.1. Assumethat H � 0. Then δk 6= 0 for k < r. If δk > 0 and δk+1 6= 0, then δk+1 > 0 if andonly if αk > 0.

Proof. Assume that δk = 0 for k < r, then qk = Hyk, hence pre-multiplicationwith yTk yields 0 = yTk qk = yTkHyk, since qk is orthogonal to yk. Then, since H � 0,it follows that Hyk = 0 and hence qk = 0. Since qk 6= 0 for k < r, the assumptionyields a contradiction. Hence, δk 6= 0, k < r.

Next suppose that δk > 0 and δk+1 6= 0. Since H � 0, Lemma 3.1 gives

αkδk+1

δkqTk qk ≥ 0, (3.8)

which implies that δk+1 and δk have the same sign if and only if αk > 0. Hence, ifδk > 0, then δk+1 > 0 if and only if αk > 0.

The result of Lemma 3.2 motivates our choice of the minus-sign in (2.4). Oth-erwise δk would alternate sign in each iteration for αk > 0 and H � 0.

Note that for H � 0 and δ0 > 0, then δi > 0, i = 0, . . . , r−1, and δr ≥ 0. Further,if H � 0 then the inequality in (3.8) is strict implying that δi > 0, i = 1, . . . , r, i.e.,δr 6= 0 since (1.1) with H � 0 is always compatible.

Another consequence is that if αk is chosen positive for k = 0, . . . , r, then δk ≤ 0for some k implies H 6� 0 and δk < 0 for some k implies H 6� 0.

3.3. A general Krylov algorithm

Based on our results we now state an algorithm for solving (1.1) based on the triples(qk, yk, δk), k = 0, . . . , r, as in Proposition 3.1 using some αk of our choice. We callAlgorithm 3.1 a Krylov algorithm as it is the vectors {qk}, spanning the Krylov-subspaces, that drive the progress of the algorithm and ensures its termination.

The choice of a nonzero αk is in theory arbitrary, but for the algorithm statedbelow we have made the choice to let αk > 0 such that ‖yk+1‖2 = ‖c‖2, i.e.,

α0 =‖c‖2∣∣∣∣− q0 +qT0 Hq0qT0 q0

y0∣∣∣∣2

, αk =‖c‖2∣∣∣∣− qk +

qTk HqkqTk qk

yk +qTk−1Hqk

qTk−1qk−1yk−1

∣∣∣∣2

, k = 1, . . . , r.


This choice is well defined since yk 6= 0, k = 1, . . . , r, by Proposition 3.1.In theory, triples are generated as long as qk 6= 0. In the algorithm, we introduce

a tolerance such that we proceed with the iterations as long as ‖qk‖2 > qtol, wherewe let qtol =

√εM , where εM is the machine precision. In theory we also draw

conclusions based on δr 6= 0 or δr = 0, for this we introduce a tolerance δtol =√εM .

Algorithm 3.1 A general Krylov algorithm

Input arguments: H, c;Output arguments: compatible; xr if compatible=1; yr if compatible=0;qtol ← tolerance on ‖q‖2; [Our choice: qtol =

√εM ]

δtol ← tolerance on |δ|; [Our choice: δtol =√εM ]

k ← 0; q0 ← c; y0 ← 0; δ0 ← 1;

y1 ←(− q0 +

qT0 Hq0qT0 q0

y0); δ1 ←

( qT0 Hq0qT0 q0

δ0); q1 ← Hy1 + δ1c;

α0 ← nonzero scalar; [Our choice: α0 = ‖c‖2/‖y1‖2]q1 ← α0q1; y1 ← α0y1; δ1 ← α0δ1;k ← 1;while ‖qk‖2 > qtol do

yk+1 ←(− qk +

qTk HqkqTk qk

yk +qTk−1Hqk

qTk−1qk−1yk−1

); δk+1 ←

( qTk HqkqTk qk

δk +qTk−1Hqk


);

qk+1 ← Hyk+1 + δk+1c;αk ← nonzero scalar; [Our choice: αk = ‖c‖2/‖yk+1‖2]qk+1 ← αkqk+1; yk+1 ← αkyk+1; δk+1 ← αkδk+1;k ← k + 1;

end whiler ← k;if |δr| > δtol then

xr ← 1δryr; compatible ← 1;

elsecompatible ← 0;

end if

By Theorem 3.1, Algorithm 3.1 will return either a solution to (1.1) or a certifi-cate that the system is incompatible. Note that δk = 0 for some k does not stopAlgorithm 3.1.

The following example illustrates Algorithm 3.1 with our choices for αk > 0,qtol and δtol, on a case of (1.1) where δk = 0 every other iteration. The examplealso illustrates the change of sign between δk+1 and δk−1 when δk = 0 predicted byCorollary 3.1.

Example 3.1. Let

c =(

3 2 1 0 −1 −2 −3)T, H = diag(c),

Algorithm 3.1 applied to H and c with αk > 0 such that ||yk|| = ||c||, k = 1, . . . , r,and qtol = δtol =

√εM , yields the following sequences

4. Connection to the method of conjugate gradients 11

q =

3.0000 -9.0000 2.2678 -2.7046 0.2648 -0.2445 0.0000

2.0000 -4.0000 -2.2678 5.4912 -1.0591 1.4673 0

1.0000 -1.0000 -2.2678 2.3768 1.3239 -3.6681 0

0 0 0 0 0 0 0

-1.0000 -1.0000 2.2678 2.3768 -1.3239 -3.6681 0

-2.0000 -4.0000 2.2678 5.4912 1.0591 1.4673 0

-3.0000 -9.0000 -2.2678 -2.7046 -0.2648 -0.2445 -0.0000

y =

0 -3.0000 3.4017 -0.9015 -2.2241 -0.0815 2.1602

0 -2.0000 1.5119 2.7456 -2.8419 0.7336 2.1602

0 -1.0000 0.3780 2.3768 -0.9885 -3.6681 2.1602

0 0 0 0 0 0 0

0 1.0000 0.3780 -2.3768 -0.9885 3.6681 2.1602

0 2.0000 1.5119 -2.7456 -2.8419 -0.7336 2.1602

0 3.0000 3.4017 0.9015 -2.2241 0.0815 2.1602

delta =

1.0000 0 -2.6458 0 2.3123 0 -2.1602

Hence, r = 6 and xr = (1/δr)yr =(−1 −1 −1 0 −1 −1 −1

)T.

4. Connection to the method of conjugate gradients

Next we will put the method of conjugate gradients by Hestenes and Stiefel [9]into the framework of our results. For an introduction to the method of conjugategradients see, e.g., [12, 19]. This method is defined for the case where H � 0.The iterate xk is then defined as the solution to minKk(c,H)

12x

THx + cTx, and

gk = Hxk + c for k = 0, . . . , r, i.e., gk ∈ Kk+1(c,H) ∩ Kk(c,H)⊥.

In our setting, this is equivalent to generating triples (qk, yk, δk), k = 0, . . . , r,given by Proposition 3.1, selecting the scaling αk such that δk = 1. If it is assumedthat H � 0, Lemma 3.2 gives δk 6= 0, k = 0, . . . , r − 1. If c ∈ R(H), i.e., (1.1) iscompatible, then Theorem 3.1 ensures that δr 6= 0. Hence, if H � 0 and c ∈ R(H),such a scaling is well defined. The following proposition gives the particular choicesof scaling αk for which (qk, yk, δk) may be denoted by (gk, xk, 1), k = 0, . . . , r.

Proposition 4.1. Assume that H � 0 and c ∈ R(H). If (qk, yk, δk) are given byProposition 3.1 for the particular choices

α0 =1

δ0

qT0 q0

qT0 Hq0, αk =

1

qTk HqkqTk qk

δk +qTk−1Hqk


, k = 1, . . . , r − 1, (4.1)

then αk > 0, k = 0, . . . , r − 1, and δk = 1, k = 0, . . . , r.


Hence, (qk, yk, δk) may be denoted by (gk, xk, 1), k = 0, . . . , r, and it holds that

α0 =gT0 g0

gT0 Hg0, x1 = x0 + α0(−g0), g1 = Hx1 + c, (4.2a)

αk =1

gTk HgkgTk gk

+gTk−1Hgk

gTk−1gk−1

, k = 1, . . . , r − 1, (4.2b)

xk+1 = xk + αk(− gk −

gTk−1Hgk

gTk−1gk−1(xk − xk−1)

), k = 1, . . . , r − 1, (4.2c)

gk+1 = Hxk+1 + c, k = 1, . . . , r − 1. (4.2d)

Proof. By assumption δ0 = 1 and by inserting αk, k = 0, . . . , r − 1, as in (4.1) inthe expressions for δk+1, k = 1, . . . , r−1, from Proposition 3.1, it holds that δk = 1,k = 0, . . . , r. Hence, (qk, yk, δk) may be denoted by (gk, xk, 1), k = 0, . . . , r.

If (gk, xk, 1) is introduced for (qk, yk, δk) in (4.1) then the expressions for αk,k = 0, . . . , r − 1, in (4.2a) and (4.2b) are obtained. By these expressions, xk,k = 0, . . . , r, are given by x0 = 0,

x1 = α0

(− g0 +

gT0 Hg0

gT0 g0x0)

= x0 + α0(−g0),

and

xk+1 = αk(− gk +

gTkHgk

gTk gkxk +

gTk−1Hgk

gTk−1gk−1xk−1

)=

= xk + αk(− gk −

gTk−1Hgk

gTk−1gk−1(xk − xk−1)

), k = 1, . . . , r − 1,

and gk, k = 0, . . . , r, are given by g0 = c and gk+1 = Hxk+1 +c, k = 0, . . . , r−1. ByLemma 3.2, and since δk = 1 > 0, k = 0, . . . , r, it holds that αk > 0, k = 0, . . . , r−1.

It is evident that the method of conjugate gradients relies heavily on the factthat (gk, xk, 1) is well defined for all k = 0, . . . , r, i.e., that δk 6= 0 for all k = 0, . . . , r.In effect, the method of conjugate gradients implicitly makes the very specific choiceof αk, k = 0, . . . , r − 1, as given in Proposition 4.1. If the assumption H � 0 andc ∈ R(H) is lifted then a high price is paid for using (gk, xk, 1), the algorithm breaksdown if δk = 0 for some k. By our analysis, this choice of scaling is unnecessary, asillustrated by Theorem 3.1 and by Algorithm 3.1.

In particular, if the method of conjugate gradients is applied to a case whereH � 0 and c 6∈ R(H), the algorithm will break down. From Lemma 3.1, it is clearthat this will happen at the very last iteration. Algorithm 3.1 would instead provideyr as a certificate of incompatibility in this situation.

The method of conjugate gradients may be regarded as a line-search method onan unconstrained quadratic program, see, e.g. [19], where search-directions pk andan optimal step-lengths along these are calculated along with the xk and gk, k =

5. Connection to the minimum-residual method 13

0, . . . , r. In the following corollary we show that this formulation can be obtainedfrom Proposition 4.1.

Corollary 4.1. Assume that H � 0 and c ∈ R(H). Let (gk, xk, 1), k = 0, . . . , r, begiven by Proposition 4.1. Then,

xk+1 = xk + αkpk, k = 0, . . . , r − 1,

gk+1 = gk + αkHpk, k = 0, . . . , r − 1,

where

p0 = −g0, pk = −gk +gTk gk

gTk−1gk−1pk−1, k = 1, . . . , r − 1,

αk = −gTk pk

pTkHpk, k = 0, . . . , r − 1.

Proof. Let pk = (1/αk)(xk+1 − xk), k = 0, . . . , r − 1. Proposition 4.1 then gives

p0 = −g0, and pk = −gk −gTk−1Hgk

gTk−1gk−1(xk − xk−1), k = 1, . . . , r − 1.

By Proposition 2.1 it holds that gTk gk = −αk−1gTkHgk−1, hence

pk = −gk −gTk−1Hgk

gTk−1gk−1αk−1pk−1 = −gk +

gTk gk

gTk−1gk−1pk−1, k = 1, . . . , r − 1.

Consequently, since gk+1 − gk = H(xk+1 − xk) = αkHpk, k = 0, . . . , r − 1, it holdsthat gk+1 = gk + αkHpk, k = 0, . . . , r − 1. Further, gTk+1pk = 0 since gk+1 ∈Kk+1(c,H) ∩ Kk(c,H)⊥ and pk ∈ Kk(c,H). Hence, 0 = gTk+1pk = gTk pk + αkp

TkHpk

yields

αk = −gTk pk

pTkHpk, k = 0, . . . , r − 1,

completing the proof.

Note that the scaling, given in Proposition 4.1, of the triple in our formulationis regarded as the scaling of a search direction in the line-search formulation ofthe method of conjugate gradients. From the point of view of our results it is notnecessary to calculate search directions, Proposition 4.1 gives a complete descriptionof the method of conjugate gradients.

5. Connection to the minimum-residual method

One issue that remains to be addressed is the case when (1.1) is incompatible.Algorithm 3.1 then gives a certificate of this fact, but often one would be interestedin a vector x that is ”as good as possible”. The method of choice could then be theminimum residual method. This method goes back to [11] and [20], and the name is


adopted form the implementation of the method, MINRES, by Paige and Saunders,see [16].

For k = 0, . . . , r, xMRk is defined as a solution to minx∈Kk(c,H) ‖Hx + c‖22, and

the corresponding residual gMRk is defined as gMR

k = HxMRk + c. The vectors xMR

k

are uniquely defined for k = 0, . . . , r − 1, and for k = r if c ∈ R(H). For thecase k = r and c 6∈ R(H) there is one degree of freedom for xMR

r . If c ∈ R(H),then xMR

r solves (1.1), and if c 6∈ R(H), then xMRr−1 and xMR

r are both solutions tominx∈IRn ‖Hx+ c‖22.

In the following theorem, we derive the minimum-residual method based on ourframework. In particular, we give explicit formulas for xMR

k and gMRk , k = 0, . . . , r.

For the case k = r, c 6∈ R(H), we give an explicit formula for xMRr of minimum

Euclidean norm.

Theorem 5.1. Let (qk, yk, δk) be given by Proposition 3.1 for k = 0, . . . , r. Then,for k = 0, . . . , r, it holds that xMR

k solves minx∈Kk(c,H) ‖Hx + c‖22 if and only if

xMRk =

∑ki=0 γiyi for some γi, i = 0, . . . , k, that are optimal to

minimizeγ0,...,γk

12

k∑i=0

γ2i qTi qi

subject to∑k

i=0 γiδi = 1.

(5.1)

In particular, xMRk takes the following form for the mutually exclusive cases (a)

k < r; (b) k = r and δr 6= 0; and (c) k = r and δr = 0.

a) For k < r, it holds that

xMRk =

1∑kj=0

δ2jqTj qj

k∑i=0

δi

qTi qiyi, (5.2)

and gMRk = HxMR

k + c 6= 0.

b) For k = r and δr 6= 0, it holds that xMRr = (1/δr)yr and g

MRr = HxMR

r +c = 0,so that xMR

r solves (1.1) and xMRr is identical to xr of Theorem 3.1.

c) For k = r and δr = 0, it holds that xMRr = xMR

r−1+γryr, where γr is an arbitraryscalar, and gMR

r = HxMRr + c = gMR

r−1 6= 0. In addition, xMRr−1 and xMR

r solveminx∈IRn ‖Hx+ c‖22. The particular choice

γr = −yTrx

MRr−1

yTryr

makes xMRr an optimal solution to minx∈IRn ‖Hx+ c‖22 of minimum Euclidean

norm.


Proof. Since qi, i = 0, . . . , k, form an orthogonal basis for Kk+1(c,H), an arbitraryvector in Kk+1(c,H) can be written as

g =k∑i=0

γiqi =k∑i=0

γi(Hyi + δic) = H( k∑i=0

γiyi)

+ (k∑i=0

γiδi)c, (5.3)

for some parameters γi, i = 0, . . . , k. Consequently, the condition∑k

i=0 γiδi = 1

inserted into (5.3) gives g = Hx + c with x =∑k

i=0 γiyi, i.e., g is an arbitraryvector in Kk+1(c,H) for which the coefficient in front of c equals one, and x is thecorresponding arbitrary vector in Kk(c,H). Minimizing the Euclidean norm of sucha g gives the quadratic program

minimizeg,γ0,...,γk

12gTg

subject to g =∑k

i=0 γiqi,∑ki=0 γiδi = 1,

(5.4)

so that, by (5.3), the optimal values of γi, i = 0, . . . , k, give gMRk as gMR

k =∑k

i=0 γiqiand xMR

k as xMRk =

∑ki=0 γiyi. Elimination of g in (5.4), taking into account the

orthogonality of the qi’s, gives the equivalent problem (5.1). Also note that sinceδ0 6= 0, the quadratic programs (5.1) and (5.4) are always feasible, and hence theyare well defined.

Let L(γ, λ) be the Lagrangian function for (5.1),

L(γ, λ) =1

2

k∑i=0

γ2i qTi qi − λ

( k∑i=0

γiδi − 1).

The optimality conditions for (5.1) are given by

0 = ∇γiL(γ, λ) = γiqTi qi − λδi, i = 0, . . . , k, (5.5a)

0 = ∇λL(γ, λ) = 1−k∑i=0

γiδi. (5.5b)

First, for (a), consider the case k < r. From (5.5a) it holds that

γi = λδi

qTi qi, i = 0, . . . , k, (5.6)

which are well defined, since qi 6= 0, i = 0, . . . , r−1. The expression for λ is obtainedby inserting the expression for γi, i = 0, . . . , k, given by (5.6) in (5.5b) so that

1 =k∑i=0

γiδi =k∑i=0

λδ2iqTi qi

yielding λ =1∑k

i=0δ2iqTi qi

. (5.7)

Consequently, a combination of (5.6) and (5.7) gives

γi =1∑k

j=0

δ2jqTj qj

δi

qTi qi, i = 0, . . . , k. (5.8)


Hence, by letting xMRk =

∑ki=0 γiyi, with γi given by (5.8), (5.2) follows.

Now, for (b), consider the case k = r with δr 6= 0. Then, since qr = 0, (5.5a)gives λ = 0 and γi = 0, i = 0, . . . , r − 1. Consequently, (5.5b) gives γr = 1/δr.Again, by letting xMR

r =∑r

i=0 γiyi, it holds that xMRk = (1/δr)yr, for which gMR

k =HxMR

k + c = 0, so that the optimal value is zero in (5.1) and xMRr solves (1.1).

Finally, for (c), consider the case k = r with δr = 0. Theorem 3.1 shows that

there exists an x(1)r ∈ Kk−1(c,H) that solves minx∈IRn ‖Hx + c‖22. Consequently,

since x(1)r ∈ Kk−1(c,H), it follows from (a) that it must hold that x

(1)r = xMR

r−1 sothat xMR

r−1 solves minx∈IRn ‖Hx+ c‖22.For k = r, the optimality conditions (5.5) are equivalent to when k = r− 1, just

with the additional information that γr is arbitrary. Hence, xMRr = xMR

r−1 +γryr andgMRr = HxMR

r + c = gMRr−1 for γr arbitrary since Hyr = 0.

For the remainder of the proof, let xMRr = xMR

r−1 + γryr for the particular choice

γr = −(yTr xMRr−1)/(yTr yr), so that yTr x

MRr = 0. Since yk = Hy

(1)k + δ

(1)k c, it follows

that ZT y(1)k = δ

(1)k ZT c, k = 0, . . . , r. Consequently, since xMR

r is formed as alinear combination of yk, k = 0, . . . , r, it holds that ZTxMR

r−1 is parallel to ZTc. But0 = yTr x

MRr = cTZZTxMR = 0, so that ZTxMR is also orthogonal to ZTc. By

Theorem 3.1, ZTc 6= 0, so that ZTxMR = 0. Hence, xMRr belongs to the range

space of H for this choice of γr. As xMRr = xMR

r−1 + γryr and yr belongs to the nullspace of H, it follows that the range-space component of xMR

r is unique. Hence,since the range-space component is unique and the null-space component is zero,the conclusion is that xMR

r is an optimal solution to minx∈IRn ‖Hx+c‖22 of minimumEuclidean norm.

Note that at an iteration k at which δk = 0 and qk 6= 0, it holds that xMRk = xMR

k−1so that the iterate is unchanged. By Proposition 3.2 this cannot happen at twoconsecutive iterations. If qk 6= 0, all information from the problem has not beenextracted even if δk = 0. Only in the case when k = r, global information isobtained, and it is determined whether (1.1) has a solution.

In the following corollary of Theorem 5.1 we may state explicit recursions forthe minimum-residual method.

Corollary 5.1. Let (qk, yk, δk) be given by Proposition 3.1 and let xMRk be given by

Theorem 5.1 for k = 0, . . . , r. If δMR0 = δ20, y

MR0 = δ0y0,

δMRk = qTk qk

k−1∑i=0

δ2iqTi qi

+ δ2k and yMRk = qTk qk

k−1∑i=0

δi

qTi qiyi + δkyk, k = 1, . . . , r,

then

xMRk =

1

δMRk

yMRk , k = 0, . . . , r − 1 and k = r if δr 6= 0.

In addition, it holds that

δMRk+1 =

qTk+1qk+1

qTk qkδMRk + δ2k+1 and yMR

k+1 =qTk+1qk+1

qTk qkyMRk + δk+1yk+1,

for k = 0, . . . , r − 1.


Proof. For k = 0, . . . , r − 1, the expressions for δMRk and yMR

k give

1

qTk qkδMRk =

k∑i=0

δ2iqTi qi

and1

qTk qkyMRk =

k∑i=0

δi

qTi qiyi.

Note that δ0 6= 0 and qi 6= 0, i = 0, . . . , k implies δMRk > 0 for k < r. Hence,

Theorem 5.1 gives xMRk = (1/δMR

k )yMRk . If k = r and δr 6= 0, the expressions for

δMRr and yMR

r giveδMRr = δ2r > 0 and yMR

r = δryr,

so that Theorem 5.1 gives xMRr = (1/δMR

r )yMRr also for this case.

To see the recursions, note that for k = 0, . . . , r − 1 it follows that

δMRk+1 = qTk+1qk+1

k∑i=0

δ2iqTi qi

+ δ2k+1

=qTk+1qk+1

qTk qk

(qTk qk

k−1∑i=0

δ2iqTi qi

+ δ2k

)+ δ2k+1 =

qTk+1qk+1

qTk qkδMRk + δ2k+1,

and

yMRk+1 = qTk+1qk+1

k∑i=0

δi

qTi qiyi + δk+1yk+1

=qTk+1qk+1

qTk qk

(qTk qk

k−1∑i=0

δi

qTi qiyi + δkyk

)+ δk+1yk+1 =

qTk+1qk+1

qTk qkyMRk + δk+1yk+1,

as required.

The recursion for xMRr , based on xMR

r−1 and (qr, yr, δr), for the case δr = 0 is givenin Theorem 5.1 and it is not reiterated in Corollary 5.1.

Note that the expressions in Theorem 5.1 and Corollary 5.1 for xMRk , k = 0, . . . , r,

are independent of the scaling of (qk, yk, δk). Hence, the scaling δk = 1 may be usedfor the case H � 0, c ∈ R(H), as stated in the following corollary.

Corollary 5.2. Assume that H � 0 and c ∈ R(H). Let (gk, xk, 1), k = 0, . . . , r, begiven by Proposition 4.1. Then, for each k = 0, . . . , r, (qk, yk, δk) may be replacedby (gk, xk, 1) in Theorem 5.1 and Corollary 5.1.

In effect, if H � 0 and c ∈ R(H), then xMRk , k = 0, . . . , r − 1, is given as a

convex combination of xi, i = 0, . . . , k.

5.1. A mimimum-residual algorithm based on the general Krylov method

Next we state an algorithm for the minimum-residual method based on Algorithm 3.1and extended with the recursion in Corollary 5.1. We used the same αk > 0 andtolerances qtol and δtol as for Algorithm 3.1.


Hence, for a compatible system (1.1) Algorithm 5.2 gives the same solution xras Algorithm 3.1, and in addition it calculates xMR

r . They are both estimates of asolution to (1.1). Further, if (1.1) is incompatible then Algorithm 5.2 delivers xMR

r ,an optimal solution to minx∈IRn ‖Hx+ c‖22 of minimum Euclidean norm, in additionto the certificate of incompatibility.

Algorithm 5.2 A mimimum-residual algorithm based on the general Krylovmethod

Input arguments: H, c;Output arguments: xMR

r , gMRr , compatible; (xr or yr if compatible=1 or 0;)

qtol ← tolerance on ‖q‖2; [Our choice: qtol =√εM ]

δtol ← tolerance on |δ|; [Our choice: δtol =√εM ]

k ← 0; q0 ← c; y0 ← 0; δ0 ← 1;yMR0 ← δ0y0; δMR

0 ← δ20 ; xMR0 ← 1

δMR0

yMR0 ; gMR

0 ← HxMR0 + c;

y1 ←(− q0 +

qT0 Hq0qT0 q0

y0); δ1 ←

( qT0 Hq0qT0 q0

δ0); q1 ← Hy1 + δ1c;

α0 ← nonzero scalar; [Our choice: α0 = ‖c‖2/‖y1‖2]q1 ← α0q1; y1 ← α0y1; δ1 ← α0δ1;if ‖q1‖2 > qtol or |δ1| > δtol then

yMR1 ← qT1 q1

qT0 q0yMR0 + δ1y1; δMR

1 ← qT1 q1qT0 q0

δMR0 + δ21 ;

xMR1 ← 1

δMR1

yMR1 ; gMR

1 ← HxMR1 + c;

end ifk ← 1;while ‖qk‖2 > qtol do

yk+1 ←(− qk +

qTk HqkqTk qk

yk +qTk−1Hqk

qTk−1qk−1yk−1

); δk+1 ←

( qTk HqkqTk qk

δk +qTk−1Hqk


);

qk+1 ← Hyk+1 + δk+1c;αk ← nonzero scalar; [Our choice: αk = ‖c‖2/‖yk+1‖2]qk+1 ← αkqk+1; yk+1 ← αkyk+1; δk+1 ← αkδk+1;if ‖qk+1‖2 > qtol or |δk+1| > δtol then

yMRk+1 ←

qTk+1qk+1

qTk qkyMRk + δk+1yk+1; δMR

k+1 ←qTk+1qk+1

qTk qkδMRk + δ2k+1;

xMRk+1 ←

1δMRk+1

yMRk+1; gMR

k+1 ← HxMRk+1 + c;

end ifk ← k + 1;

end whiler ← k;if |δr| > δtol then

xr ← 1δryr; compatible ← 1;

else

xMRr ← xMR

r−1 −yTrx

MRr−1

yTryryr; compatible ← 0;

end if

Revisiting Example 3.1 and applying the extended Algorithm 5.2 to the compat-ible system of (1.1), then in addition the identical {δk}, {qk} and {yk}(see Example


3.1), Algorithm 5.2 also returns

xMR =

0 0 -1.1108 -1.1108 -0.9953 -0.9953 -1.0000

0 0 -0.4937 -0.4937 -1.0641 -1.0641 -1.0000

0 0 -0.1234 -0.1234 -0.3593 -0.3593 -1.0000

0 0 0 0 0 0 0

0 0 -0.1234 -0.1234 -0.3593 -0.3593 -1.0000

0 0 -0.4937 -0.4937 -1.0641 -1.0641 -1.0000

0 0 -1.1108 -1.1108 -0.9953 -0.9953 -1.0000

We state {δk} again for convenience,

delta =

1.0000 0 -2.6458 0 2.3123 0 -2.1602

and note that for δk = 0, it holds that xMRk = xMR

k−1, as predicted by Theorem 5.1.Next we observe an example that illustrates Algorithm 5.2 with our choices for

αk > 0, qtol and δtol, on a case when (1.1) is incompatible, i.e. c /∈ R(H).

Example 5.1. Let

c =(

3 2 1 1 −1 −2 −3)T, H = diag

(5 2 1 0 −1 −2 −3

),

Algorithm 5.2 applied to H and c with αk > 0 such that ||yk|| = ||c||, k = 1, . . . , r,and qtol = δtol =

√εM , yields the following sequences

q =

3.0000 -13.1379 3.5628 -0.8597 0.1063 -0.0181 0.0017 -0.0000

2.0000 -2.7586 -5.7676 3.9832 -1.3787 0.6372 -0.1470 -0.0000

1.0000 -0.3793 -3.1464 0.1039 1.8638 -2.5737 1.1021 0.0000

1.0000 0.6207 -2.8617 -1.7605 2.2573 0.5896 -1.7634 -0.0000

-1.0000 -1.6207 2.0296 2.7934 -0.6882 -2.4489 -1.4695 0.0000

-2.0000 -5.2414 1.3007 4.3735 2.1842 1.1548 0.3149 -0.0000

-3.0000 -10.8621 -3.8286 -2.6032 -0.6658 -0.2082 -0.0367 0.0000

y =

0 -3.0000 2.4296 0.8844 -1.3331 -0.3574 1.0584 -0.0000

0 -2.0000 -0.0222 3.7521 -2.9466 -0.2710 1.6899 0

0 -1.0000 -0.2847 1.8644 -0.3935 -3.1633 2.8655 0.0000

0 -1.0000 -0.5584 1.5833 0.7502 -3.7018 -0.7931 5.3852

0 1.0000 0.8320 -1.0329 -1.5691 1.8593 3.2329 0.0000

0 2.0000 2.2113 -0.4262 -3.3494 -1.1670 1.6060 0.0000

0 3.0000 4.1379 2.6283 -2.0353 -0.5202 1.7756 0.0000

delta =

1.0000 0.6207 -2.8617 -1.7605 2.2573 0.5896 -1.7634 -0.0000


xMR =

0 -0.1588 -0.6633 -0.6143 -0.5995 -0.5998 -0.6000 -0.6000

0 -0.1059 -0.0228 -0.6647 -1.0640 -1.0371 -1.0000 -1.0000

0 -0.0529 0.0585 -0.2817 -0.2148 -0.4441 -1.0000 -1.0000

0 -0.0529 0.1284 -0.1845 0.1376 -0.1481 0.1333 -0.0000

0 0.0529 -0.1983 0.0407 -0.4178 -0.2588 -1.0000 -1.0000

0 0.1059 -0.5364 -0.2994 -1.0375 -1.0794 -1.0000 -1.0000

0 0.1588 -1.0143 -1.1600 -0.9990 -0.9938 -1.0000 -1.0000

Hence, r = 7 and xMRr =

(−0.6 −1 −1 0 −1 −1 −1

)T. Note that since

|δr| < δtol, the system is considered incompatible and xMRr is the optimal solution to

minx∈IRn ‖Hx+ c‖22 of minimum Euclidean norm and ‖HxMRr + c‖22 = 1.

6. Summary and conclusion

We have presented a general Krylov method for solving a system of linear equationsHx+ c for H symmetric. No assumption on H except symmetry is made. In exactarithmetic, we determine if the system is compatible. If so, a solution is computed.If not, a certificate of incompatibility is computed. The basis for our method are thetriples (qk, yk, δk), k = 0, . . . , r, given by Proposition 3.1, that are uniquely definedup to a scaling. We have obtained the method of conjugate gradients as a specialcase of our method by giving the scaling for which δk = 1, k = 0, . . . , r.

We have also put the minimum-residual method in our framework and providedexplicit formulas for the iterations. Again, we base our analysis on the triples. Inthe case of an incompatible system, our method gives xMR

r of minimum Euclideannorm. The original implementation of MINRES [16] did not deliver the optimalsolution to minx∈IRn ‖Hx+ c‖22 of minimum Euclidean norm. In [2] a MINRES-likealgorithm, MINRES-QLP is developed that does.

One may observe that an alternative to using the minimum-residual iterations

would be to consider recursions for y(1)k and δ

(1)k as given by (3.2) and then calculate

xMRr−1 = (1/δ

(1)r )y

(1)r , according to the analysis in the proof of Theorems 3.1 and

5.1. However, such an approach would not automatically yield the estimate xMRk ,

k = 0, . . . , r − 2.

One could also note that the method of conjugate gradients may be viewedas trying to solve the miminum-residual problem (5.1) in the situation where onlythe present triple (qk, yk, δk) is allowed in the linear combination, i.e., γi = 0, i =0, . . . , k−1. The problem is then not necessarily feasible. It will be infeasible exactlywhen δk = 0. One could think of methods other than the minimum-residual methodwhich use a linear combination of more than one triple. By Proposition 3.2, it wouldsuffice to use two consecutive triples, since it cannot hold that δk−1 = 0 and δk = 0.The method SYMMLQ [16] uses this property, but we have not investigated theprecise relationship to an optimization problem on the same form as (5.1).

Finally, we want to stress that this paper is meant to give insight into Krylovmethods, but we do not claim any numerical properties. The general Krylov methodwould inherit deficiencies of any method based on the Lanczos process such as loss oforthogonality of the generated vectors. It is beyond the scope of the present paper

A. A result on the Krylov vectors 21

to make such an analysis, see, e.g., [8,13,14,17]. The theory of our paper is based ondetermining if certain quantities are zero or not. In our algorithms, we have madechoices on optimality tolerances that are not meant to be universal. To obtain afully functioning algorithm, the issue of determining if a quantity is near-zero wouldneed to be considered more in detail. Also, we have based our analysis on the triples,so that termination of Algorithm 5.2 is based on qk and δk. In practice, one shouldprobably also consider gMR

k .Further, the use of pre-conditioning is not explored in this paper, for this subject

see, e.g., [1, 3, 4].

A. A result on the Krylov vectors

For completeness, we review a result that characterizes the properties of qk expressedas in (2.1), needed for the analysis.

Lemma A.1. Let r denote the smallest positive integer k for which

Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}.

For an index k such that 1 ≤ k ≤ r, let qk ∈ Kk+1(c,H) ∩ Kk(c,H)⊥, be expressed

as in (2.1). Then, the scalars δ(j)k , j = 0, . . . , k, are uniquely determined up to

a nonzero scaling. In addition, if δ(k)k 6= 0 it holds that qk = 0 if and only if

Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}, i.e., if k = r.

Proof. Assume that qk ∈ Kk+1(c,H)∩Kk(c,H)⊥ is expressed as in (2.1). If k < r,

then c, Hc, H2c, . . . , Hkc are linearly independent. Hence, δ(j)k , j = 0, . . . , k, are

uniquely determined by qk. Consequently, as qk is uniquely defined up to a nonzero

scaling, then so are δ(j)k , j = 0, . . . , k. For k = r, we have qr = 0 so that

−δ(r)r Hrc =r−1∑j=0

δ(j)r Hjc. (A.1)

By the definition of r, it holds that c, Hc, H2c, . . . , Hr−1c are linearly independent.

Hence, (A.1) shows that a fixed δ(r)r uniquely determines δ

(j)r , j = 1, . . . , r − 1.

Consequently, a scaling of δ(r)r gives a corresponding scaling of δ

(j)r , j = 0, . . . , r− 1.

Thus, δ(j)r , j = 0, . . . , r are uniquely determined up to a common scaling.

Finally, assume that δ(k)k 6= 0. By definition Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}

implies qk = 0. To show the converse, assume that qk = 0. Then,

−δ(k)k Hkc =

k−1∑j=0

δ(j)k Hjc. (A.2)

If δ(k)k 6= 0, then (A.2) impliesHkc ∈ span{c,Hc,H2c, . . . ,Hk−1c}, i.e., Kk+1(c,H) =

Kk(c,H), so that Kk+1(c,H) ∩ Kk(c,H)⊥ = {0}, completing the proof.

22 References

References

[1] Axelsson, O. On the efficiency of a class of A-stable methods. Nordisk Tidskr. Informations-behandling (BIT) 14 (1974), 279–287.

[2] Choi, S.-C. T., Paige, C. C., and Saunders, M. A. MINRES-QLP: a Krylov subspacemethod for indefinite or singular symmetric systems. SIAM J. Sci. Comput. 33, 4 (2011),1810–1836.

[3] Evans, D. J. The use of pre-conditioning in iterative methods for solving linear equationswith symmetric positive definite matrices. J. Inst. Math. Appl. 4 (1968), 295–314.

[4] Evans, D. J. The analysis and application of sparse matrix algorithms in the finite elementmethod. 427–447.

[5] Forsgren, A., Gill, P. E., and Shinnerl, J. R. Stability of symmetric ill-conditionedsystems arising in interior methods for constrained optimization. SIAM J. Matrix Anal. Appl.17 (1996), 187–211.

[6] Golub, G. H., and O’Leary, D. P. Some history of the conjugate gradient and Lanczosalgorithms: 1948–1976. SIAM Rev. 31, 1 (1989), 50–102.

[7] Golub, G. H., and Van Loan, C. F. Matrix computations, vol. 3 of Johns Hopkins Seriesin the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, 1983.

[8] Hanke, M. Conjugate gradient type methods for ill-posed problems, vol. 327 of Pitman ResearchNotes in Mathematics Series. Longman Scientific & Technical, Harlow, 1995.

[9] Hestenes, M. R., and Stiefel, E. Methods of conjugate gradients for solving linear systems.J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).

[10] Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differ-ential and integral operators. J. Research Nat. Bur. Standards 45 (1950), 255–282.

[11] Lanczos, C. Solution of systems of linear equations by minimized-iterations. J. ResearchNat. Bur. Standards 49 (1952), 33–53.

[12] Luenberger, D. G. Linear and nonlinear programming, second ed. Addison-Wesley Pub Co,Boston, MA, 1984.

[13] Meurant, G., and Strakos, Z. The Lanczos and conjugate gradient algorithms in finiteprecision arithmetic. Acta Numer. 15 (2006), 471–542.

[14] Paige, C. C. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix.J. Inst. Math. Appl. 18, 3 (1976), 341–349.

[15] Paige, C. C. Krylov subspace processes, Krylov subspace methods, and iteration polynomials.In Proceedings of the Cornelius Lanczos International Centenary Conference (Raleigh, NC,1993) (Philadelphia, PA, 1994), SIAM, pp. 83–92.

[16] Paige, C. C., and Saunders, M. A. Solutions of sparse indefinite systems of linear equations.SIAM J. Numer. Anal. 12, 4 (1975), 617–629.

[17] Reid, J. K. On the method of conjugate gradients for the solution of large sparse systems oflinear equations. In Large sparse sets of linear equations (Proc. Conf., St. Catherine’s Coll.,Oxford, 1970). Academic Press, London, 1971, pp. 231–254.

[18] Saad, Y. Iterative methods for sparse linear systems, second ed. Society for Industrial andApplied Mathematics, Philadelphia, PA, 2003.

[19] Shewchuk, J. R. An introduction to the conjugate gradient method without the agonizingpain. Tech. rep., Pittsburgh, PA, USA, 1994.

[20] Stiefel, E. Relaxationsmethoden bester Strategie zur Losung linearer Gleichungssysteme.Comment. Math. Helv. 29 (1955), 157–179.

a general krylov method for solving symmetric systems of linear equations

Documents