minimal residual methods for large scale lyapunov equations

24
MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS YIDING LIN AND VALERIA SIMONCINI Abstract. Projection methods have emerged as competitive techniques for solving large scale matrix Lyapunov equations. We explore the numerical solution of this class of linear matrix equations when a Minimal Residual (MR) condition is used during the projection step. We derive both a new direct method, and a preconditioned operator-oriented iterative solver based on CGLS, for solving the projected reduced least squares problem. Numerical experiments with benchmark problems show the effectiveness of an MR approach over a Galerkin procedure using the same approximation space. Key words. Lyapunov equation, minimal residual method, Galerkin condition, normal equa- tion, PCGLS. AMS subject classifications. 65F10, 65F30, 15A06 1. Introduction. The solution of the following Lyapunov matrix equation, AX + XA + BB =0, A R n×n ,B R n×p (1.1) with p n and denoting the conjugate transpose, plays an important role in the stability analysis of dynamical systems and in control theory; see, e.g., [1],[4]. We shall assume that A is stable, that is the real parts of its eigenvalues are in the left half complex plane. For n small, say up to a few hundreds, well established and robust numerical methods exist, which are based on the Schur decomposition of A [2]; see also [15]. For large scale problems, and in particular for A stemming from the discretization of two (2D) or three-dimensional (3D) partial differential operators, several numerical methods have been explored to approximate the positive semidefinite solution matrix X by means of a low-rank matrix, ZZ X, so that only the tall matrix Z needs to be stored. The Alternating Direction Implicit (ADI) method was introduced by Wachspress [42] as an iterative procedure to obtain a full approximation matrix; see also [32]. An effective cyclic low-rank version was proposed by Penzl in [36] and by Li and White in [31], and since then, a lot of work has been devoted to the algorithmic and convergence analysis of the method; we refer to [6] for a recent overview, and to [25], [5] for extended ADI-based strategies. Projection methods have also been proposed for iteratively solving (1.1). As for modern ADI schemes, these methods provide a factorized approximate solution, obtained from the projection of the original problem onto a much smaller dimension space. A pioneering strategy was proposed by Saad in 1990 ([38]) where a Krylov subspace was used as projection space, and then further explored in [21], [24], [13]; see also [27]. However, only more recently projection strategies have emerged as true competitors with respect to ADI type methods: the use of powerful subspaces for the projection step may allow one to determine an accurate factored approximation with a very small approximation space dimension. In particular, extended Krylov subspaces and more general rational Krylov subspaces have proven to be particularly effective on large 2D and 3D problems [40], [11]. Due to the good experimental behavior, large Version of August 28, 2012. School of Mathematical Sciences, Xiamen University, China, and Dipartimento di Matematica, Universit` a di Bologna, Bologna, Italy ([email protected]). Dipartimento di Matematica, Universit` a di Bologna, Piazza di Porta S. Donato, 5,40127 Bologna, Italy ([email protected]). 1

Upload: others

Post on 03-Feb-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV

EQUATIONS∗

YIDING LIN† AND VALERIA SIMONCINI ‡

Abstract. Projection methods have emerged as competitive techniques for solving large scalematrix Lyapunov equations. We explore the numerical solution of this class of linear matrix equationswhen a Minimal Residual (MR) condition is used during the projection step. We derive both a newdirect method, and a preconditioned operator-oriented iterative solver based on CGLS, for solvingthe projected reduced least squares problem. Numerical experiments with benchmark problems showthe effectiveness of an MR approach over a Galerkin procedure using the same approximation space.

Key words. Lyapunov equation, minimal residual method, Galerkin condition, normal equa-tion, PCGLS.

AMS subject classifications. 65F10, 65F30, 15A06

1. Introduction. The solution of the following Lyapunov matrix equation,

AX +XA∗ +BB∗ = 0, A ∈ Rn×n, B ∈ R

n×p (1.1)

with p ≪ n and ∗ denoting the conjugate transpose, plays an important role in thestability analysis of dynamical systems and in control theory; see, e.g., [1],[4]. Weshall assume that A is stable, that is the real parts of its eigenvalues are in the lefthalf complex plane. For n small, say up to a few hundreds, well established and robustnumerical methods exist, which are based on the Schur decomposition of A [2]; see also[15]. For large scale problems, and in particular for A stemming from the discretizationof two (2D) or three-dimensional (3D) partial differential operators, several numericalmethods have been explored to approximate the positive semidefinite solution matrixX by means of a low-rank matrix, ZZ∗ ≈ X, so that only the tall matrix Z needsto be stored. The Alternating Direction Implicit (ADI) method was introduced byWachspress [42] as an iterative procedure to obtain a full approximation matrix; seealso [32]. An effective cyclic low-rank version was proposed by Penzl in [36] and by Liand White in [31], and since then, a lot of work has been devoted to the algorithmicand convergence analysis of the method; we refer to [6] for a recent overview, and to[25], [5] for extended ADI-based strategies.

Projection methods have also been proposed for iteratively solving (1.1). Asfor modern ADI schemes, these methods provide a factorized approximate solution,obtained from the projection of the original problem onto a much smaller dimensionspace. A pioneering strategy was proposed by Saad in 1990 ([38]) where a Krylovsubspace was used as projection space, and then further explored in [21], [24], [13];see also [27]. However, only more recently projection strategies have emerged as truecompetitors with respect to ADI type methods: the use of powerful subspaces for theprojection step may allow one to determine an accurate factored approximation with avery small approximation space dimension. In particular, extended Krylov subspacesand more general rational Krylov subspaces have proven to be particularly effectiveon large 2D and 3D problems [40], [11]. Due to the good experimental behavior, large

∗Version of August 28, 2012.†School of Mathematical Sciences, Xiamen University, China, and Dipartimento di Matematica,

Universita di Bologna, Bologna, Italy ([email protected]).‡Dipartimento di Matematica, Universita di Bologna, Piazza di Porta S. Donato, 5,40127 Bologna,

Italy ([email protected]).

1

Page 2: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

2 Yiding Lin and Valeria Simoncini

efforts have been recently devoted to the theoretical analysis of projection methods,whose new results have filled a gap in the understanding of these strategies for matrixequations, also with respect to the related ADI method [10], [41], [3], [30].

Projection-type methods for (1.1) generally perform the following steps:1. Generate the approximation subspace Range(Vk)2. Project the original problem onto Range(Vk); solve the reduced problem for

Yk

3. Generate the approximate solution Xk = VkYkV∗

k .If the approximate solution is not satisfactory, then the space is expanded and anew candidate computed. The second step above characterizes the specific extractionmethod, once the approximation space has been chosen. Most successful projectionmethods use the Galerkin condition for extracting the approximate solution from thegiven space. This condition requires that the associated residual R = AXk +XkA

∗ +BB∗ be orthogonal to the approximation space. In matrix terms, this translates intothe equation (cf., e.g., [38])

V∗

k (AVkYkV∗

k + VkYkV∗

kA∗ +BB∗)Vk = 0. (1.2)

Assuming that Vk has orthonormal columns, Yk can be determined as the solution toa reduced Lyapunov matrix equation with coefficient matrix V∗

kAVk.We are interested in exploring a different (Petrov-Galerkin-type) condition, which

corresponds to minimizing the Frobenius norm of the residual. In this case, thereference problem is

Y MRk = arg min

Yk∈Rk×k‖AVkYkV∗

k + VkYkV∗

kA∗ +BB∗‖F . (1.3)

The approximation process can thus be stopped by monitoring the quantity that isactually minimized. On the contrary, the residual norm in the Galerkin process mayexhibit a erratic behavior, thus delaying the process. This behavior is reminiscentof the well known situation in projection-based methods for solving standard linearsystems [39]: the residual norm of the FOM method (applying the Galerkin condition)may heavily oscillate, whereas the minimized residual norm of GMRES (based on aresidual minimization procedure) smoothly decays. A thorough analysis shows thatpeaks and plateaux of FOM and GMRES, respectively, can be explicitly related [8], [9].In the matrix equation setting we can numerically confirm this fascinating interplay.

Minimal residual approaches were analyzed in [24], [21]; however the proposedalgorithms for solving the reduced problem did not lead to computationally compet-itive methods with respect to the Galerkin strategy. As a result, minimal residualstrategies were somehow disregarded, in spite of their good theoretical properties. Inthis paper we provide an implementation of the minimal residual method (MR in thefollowing) whose leading computational cost is comparable to that of the Galerkinmethod, using the same projection space, with smoother convergence behavior. Thealgorithmic bottleneck in the MR method is the solution of a reduced order linearleast squares matrix problem with three terms, whose efficient numerical solutionhas not been addressed in the standard least squares literature. This cost should becompared with that of solving the reduced Lyapunov equation (1.2) for the Galerkinapproach. By using Kronecker products, the least squares problem can be reformu-lated as a standard least squares problem of exploding size, so that solving the latterproblem becomes prohibitive even for moderate dimension of the matrix least squaresproblem. We thus explore three venues: i) We propose a new direct method that

Page 3: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 3

computes the residual norm and the least squares solution by using spectral decom-positions; ii) We improve the algorithm originally proposed in [21] by exploiting theLyapunov structure of our setting; iii) We devise operator oriented preconditioningstrategies for iteratively solving the matrix normal equation associated with the ma-trix least squares problem. For all strategies we provide a complete derivation anddiscuss the main algebraic properties and computational challenges. A selection ofnumerical experiments will be reported to show the potential of the new algorithmson various known data sets.

The paper is organized as follows: given an approximation space satisfying certainproperties, section 2 describes the projection step under a minimal residual constraint.Section 3 serves as introduction to the description of the computational core. In sec-tion 3.1 we devise a procedure for computing the least squares residual norm as thespace expands, without explicitly computing the least squares solution; from section3.2 to section 3.4 we propose three different strategies, direct and iterative, to com-pute the final solution to the reduced problem. Section 4 summarizes the minimalresidual algorithm. In section 5 we report on our numerical experience with these algo-rithms when different approximation spaces are used, and compare their performancewith that of a standard Galerkin approach. Our conclusions and open problems aredescribed in section 6.

Throughout the paper, ‖ · ‖F denotes the Frobenius norm, the symbol ⊗ denotesthe Kronecker product and vec(·) denotes the vectorization operator, which stacksall columns of a matrix one below the other. The field of values of A is defined asF (A) = {x∗Ax : x ∈ C

n, x∗x = 1}. The identity matrix of size k is denoted by Ik,though the subscript will be omitted when clear from the context.

2. Residual minimization with subspace projection. The general residualminimization problem (1.3) can be considerably simplified whenever the approxima-tion space satisfies certain properties. In particular, the new problem has much smallerdimension, so that computation is significantly reduced. We next derive the simpli-fication step and show that some well established classes of Krylov subspaces satisfythese properties.

Proposition 2.1. Let the columns of Vk = [V1, V2, . . . , Vk] form an orthonormalbasis of the given approximation space. Assume that B = V1RB for some RB, andthat for each k there exist two matrices Vk+1 ∈ R

n×p, having orthogonal columns, andHk ∈ R

(k+1)p×kp, such that

AVk = Vk+1Hk, with Vk+1 = [V1, V2, . . . , Vk, Vk+1] (2.1)

Then the minimization problem (1.3) can be written as

Y MRkp = arg min

Y ∈Rkp×kp

∥∥∥∥HY[I, 0

]+

[I0

]Y H∗ +

[RBR

B 00 0

]∥∥∥∥F

. (2.2)

Proof. We have

minY ∈Rkp×kp

‖AVkY V∗

k + VkY V∗

kA∗ +BB∗‖F

= minY ∈Rkp×kp

‖Vk+1HkY V∗

k + VkY H∗

kV∗

k+1 +BB∗‖F

= minY ∈Rkp×kp

∥∥∥∥Vk+1

[HkYk

[I, 0

]+

[I0

]YkH

k +

[RBR

B 00 0

]]V∗

k+1

∥∥∥∥F

= minY ∈Rkp×kp

∥∥∥∥HY[I, 0

]+

[I0

]Y H∗ +

[RBR

B 00 0

]∥∥∥∥F

.

Page 4: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

4 Yiding Lin and Valeria Simoncini

The reduction from an n× kp to a (k+1)p× kp is possible mainly thanks to relation(2.1). Such relation is well known to hold for the standard block Krylov subspace

Kk = Range([B,AB, . . . , Ak−1B]), dim(Kk) ≤ kp,

for which it can be verified that AVk = Vk+1Hk, where the columns of Vj are anorthonormal basis for Kj , j = k, k + 1, and Hk = V∗

k+1AVk [39, sec. 6.12].Analogously, for the extended block Krylov subspace

EKk = Range([B,A−1B,AB,A−2B, . . . , Ak−1B,A−kB]), dim(EKk) ≤ 2kp,

for each k the relation (2.1) holds with Vk+1 = Vk+1, where the columns of Vk+1 spanEKk+1, and Hk = V∗

k+1AVk [12], [23], [18], [40].We next show that the crucial property (2.1) also holds for the rational block

Krylov subspace, defined as

RKk = Range([B, (A− s2I)−1B, . . . ,

k∏

j=2

(A− sjI)−1B]), dim(RKk) ≤ kp,

where s2, . . . , sk are properly chosen parameters such that A− sjI is nonsingular forall j = 2, . . . , k; see, e.g., [11] and references therein.

Proposition 2.2. Let the columns of Vj = [V1, . . . , Vj ] ∈ Rn×jp, span RKj,

j = k, k + 1, and assume that Vk is generated by means of a block Arnoldi-typeprocess, so that B = V1RB for some p×p matrix RB, and the block upper Hessenberg

matrix Hk =

[Hk

χk+1,kE∗

k

], χk+1,k ∈ R

p×p, holds the orthogonalization coefficients.

Let Dk = blkdiag(s2Ip, . . . , sk+1Ip) and Tk := V∗

kAVk. Then (2.1) holds with Vk+1

and Hk given as follows:

Vk+1sk+1 − (I − VkV∗

k )AVk+1 = Vk+1R, Hk :=

[Tk

Rχk+1,kE∗

kH−1k

],

where Vk+1R is the thin QR decomposition of the left-hand side matrix.Proof. From [11, Proof of Proposition 4.2] we obtain

AVk = VkTk + Vk+1Hk+1,kE∗

kDkH−1k − (I − VkV∗

k )AVk+1Hk+1,kE∗

kH−1k

= [Vk, Vk+1sk+1 − (I − VkV∗

k )AVk+1]

[Tk

χk+1,kE∗

kH−1k

].

The final relation follows by substituting the thin QR decomposition of Vk+1sk+1 −(I − VkV∗

k )AVk+1.

We explicitly notice that by construction, Vk+1 is orthogonal to the matrix Vk;this fact will be used in the following when computing the Frobenius norm.

To simplify the notation, from now on we omit the iteration subscript, whichremains clear from the context. To make the presentation more uniform, irrespectiveof the projection process used and of the right-hand side rank, we write the reducedminimization problem as

Y MR = arg minY ∈Rk×k

∥∥∥∥HY I∗ + IY H∗ +

[RBR

B 00 0

]∥∥∥∥F

, (2.3)

Page 5: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 5

where I =

[I0

], H =

[HE∗

], with I,H ∈ R

l×k, H ∈ Rk×k, where k is the dimension

of the approximation space. In practice, l is slightly larger than k, according to thedifferent kinds of process and right-hand side; for instance, in theory EKSM with arank-one right-hand side will have l = k + 2, while with a rank-2 right-hand side willhave l = k+4. In fact, since for EKSM the last row of the projected matrix is alwayszero, we actually use l = k + 1 and l = k + 2, respectively; see, e.g., [22].

For later comparison, we recall that the Galerkin reduced problem obtained fromthe same approximation subspace is given by

HY + Y H∗ = −[RBR

B 00 0

]. (2.4)

The reduced problem (2.4) has a unique solution if and only if H and −H∗ do nothave common eigenvalues. A standard hypothesis to prevent common eigenvalues isto require that the field of values of A be contained in C

−, so that the field of valuesof H will also be in C

−. The reduced problem (2.3) does not have this restriction.However, since the procedures we consider for its solution are only defined wheneverHand −H∗ have no common eigenvalues, this hypothesis will be assumed throughout.

3. The solution of the reduced minimization problem. The numericalsolution of matrix least squares problem with more than two terms has not beensystematically addressed in the available literature, and only very few methods exist,usually tailored to specific cases; we refer to, e.g., [37] for an early general algorithm.

By means of the Kronecker product, it is possible to recast (2.3) as a standard(vector) least squares problem with coefficient matrix of size l2 × k2. Without takinginto account the structure, its solution requires a computational complexity of theorder of O(k5), roughly corresponding to a QR decomposition of a k2 × k2 matrixwith lower bandwidth k; see also [28]. The methods in [21],[24], devised in the contextof the standard Krylov subspace, were able to solve the problem with a complexity ofO(k4). However, we wish to solve this least squares problem at a cost O(k3), whichwould be comparable to the cost of solving the matrix Lyapunov equation (2.4). Weestimate that any approach that explicitly uses the Kronecker formulation withouttaking into account the special structure, will have a complexity of at least O(k4);therefore, we employ the Kronecker formulation only at the design level.

As opposed to previous Krylov space-based minimum residual algorithms, weexploit the fact that the reduced problem does not need to be solved at each iteration;instead, only its residual norm is required, and the solution matrix is only computedat convergence. This crucial fact is well known in the linear system setting, where thebest-established minimal residual method, GMRES, cleverly and cheaply performsthe computation of the residual norm at very iteration. Unfortunately, computingthe residual norm in our matrix problem is significantly more expensive than in thelinear system case, though we are able to keep a O(k3) overall cost. To simplify thepresentation in later sections, we assume that the projection space has dimension kand the right-hand side B has rank one, so that B = v1‖B‖ = Vke1β0.

In the next section we present a new strategy that only computes the residualnorm during the iteration. In subsequent sections we explore various alternatives tosolve the final least squares problem: a new direct method, and a cheaper variantof a direct algorithm originally proposed in [21]. In section 3.4 we also describe theuse of a preconditioned iterative solver, namely the preconditioned Conjugate Gradi-ent method for least squares problem (PCGLS), with a selection of preconditioning

Page 6: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

6 Yiding Lin and Valeria Simoncini

strategies.

3.1. Computation of the residual norm at each iteration. In Kroneckerform, we can write the problem (2.3) as

miny

‖e1β0 −Hy‖, H = H ⊗ I + I ⊗H. (3.1)

Since H is (k + 1) × k, the Kronecker matrix H is (k + 1)2 × k2. For H full columnrank, y = (H∗H)−1H∗e1β0 is the least squares solution, and the residual is given bye1β0 − Hy = (I − H(H∗H)−1H)e1β0, which is the projection onto Null(H∗). Let Ube a full column rank matrix such that Range(U) is the null space of H∗. Therefore,

‖e1β0 −Hy‖2 = ‖U(U∗U)−1U∗b‖2 = β20 e

1U(U∗U)−1U∗e1.

We are thus left with the determination of U ∈ R(k+1)2×(2k+1). Due to its structure,

H is already too large for moderate k, say k = 10, to perform a cheap computationof ‖e1β0 − Hy‖ or an explicit determination of U , without taking the structure intoaccount. We next show that a structure-preserving basis U can be formally derived,and that its explicit computation is not needed to monitor the least squares residualnorm. In the following, ◦ denotes the Hadamard (element-wise) product.

Proposition 3.1. Let (λi, yi), i = 1, . . . , k, be the eigenpairs of H∗, and assumethat w∗

i = E∗(H + λiI)−1 is defined, for i = 1, . . . , k. Let β0 = ‖b‖,

W =

[w1 . . . wk

1 . . . 1

]= [w1, . . . , wk], Y =

[y1 . . . yk0 . . . 0

]= [y1, . . . , yk].

Then the least squares residual norm satisfies

‖e1β0 −Hy‖2 = β20 g

Y ∗Y ◦ W ∗W Y ∗W ◦ W ∗Y 0

W ∗Y ◦ Y ∗W W ∗W ◦ Y ∗Y 00 0 1

−1

g, (3.2)

with g∗ = [y1(1)w1(1), . . . , yk(1)wk(1), w1(1)y1(1), . . . , wk(1)yk(1), 0].Before we prove this result, we stress that the large matrix appearing in (3.2) has

size 2k + 1, and that its determination entails computations of the order of k3. Theresidual norm thus requires O(k3) floating point operations.

Proof. We have

H∗u = 0 ⇔ I∗UH +H∗UI = 0, u = vec(U).

We thus look for 2k + 1 linearly independent vectors u1, . . . , u2k+1 with (k + 1)2

components each, which can form a null space basis, and the columns of U .We write U = [U1, U2;U

3 , U4] ∈ C(k+1)×(k+1), with U4 scalar and U2, U3 vectors,

so that the condition above corresponds to

U1H + U2E∗ +H∗U1 + EU∗

3 = 0. (3.3)

The scalar U4 is not used in (3.3), so that it can be freely chosen. Therefore, weset U = [0, 0; 0, 1] (that is, U4 = 1 and all other matrices zero) as one candidatefor the basis; let it be u2k+1. With the equation above we can determine 2k morevectors u1, . . . , u2k, all having U4 = 0, so that ui ⊥ u2k+1, i = 1, . . . , k, to ensure

Page 7: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 7

good conditioning with respect to the last basis vector. To proceed, one could chooseU1 and U3 somewhat randomly and then solve the Lyapunov equation for H. Thiswould require O(k) solves, giving a O(k4) overall cost. Instead, we require that eitherU2 is equal to an eigenvector of H∗ and U3 = 0, or the opposite.

Assume first that U3 = 0 and let U2 be an eigenvector of H∗ with eigenvalue λ.Then, for H + λI nonsingular, by substitution we find that the matrix U1 = U2w

∗,with w∗ = −E∗(H+λI)−1 satisfies (3.3). By using different eigenvectors, we can givek different U ’s.

Analogously, taking U2 = 0 and the vector U3 to be any eigenvector of H∗, weobtain that the choice U1 = vU∗

3 and v = −(H∗ + λI)−1E = w satisfies (3.3).Summarizing, if (λi, yi), i = 1, . . . , k are the eigenpairs of H∗, the first k columns

of U are taken as

ui = vec(

[yiw

i yi0 0

]) = vec(yiw

i ) =¯wi ⊗ yi, i = 1, . . . , k, (3.4)

while the next k columns can be chosen to be

uk+i = vec(

[wiy

i 0y∗i 0

]) = vec(wiy

i ) =¯yi ⊗ wi, i = 1, . . . , k; (3.5)

moreover, u2k+1 = e2k+1. Therefore, for U = [u1, . . . , uk, uk+1, . . . , u2k, u2k+1], theentries of U∗U satisfy:

u∗

i uj =(¯wi ⊗ yi

)∗ ( ¯wj ⊗ yj)=

(¯wi

y∗i

) (¯wj ⊗ yj

)

= y∗i yj(w∗

iwj + 1), i, j = 1, . . . k;

analogously,

u∗

k+iuj =(¯yi ⊗ wi

)∗ ( ¯wj ⊗ yj)=

(¯y∗

i ⊗ w∗

i

) (¯wj ⊗ yj

)

= (w∗

i yj)(y∗

iwj), i, j = 1, . . . k.

In the same way we obtain u∗

i uk+j = (y∗i wj)(w∗

i yj), u∗

k+iuk+j = (w∗

iwj + 1)y∗i yj .Moreover, by construction, u∗

i u2k+1 = 0, i = 1, . . . , 2k. Therefore, we have

U∗U =

(y∗i yj(w

iwj + 1))i,j=1,k

(y∗i wjw∗

i yj)i,j=1,k 0

(w∗

i yjy∗

i wj)i,j=1,k ((w∗

iwj + 1)y∗i yj)i,j=1,k 00 0 1

=

Y ∗Y ◦ W ∗W Y ∗W ◦ W ∗Y 0

W ∗Y ◦ Y ∗W W ∗W ◦ Y ∗Y 00 0 1

.

In addition, e∗1U = [y1(1)w1(1), . . . , yk(1)wk(1), w1(1)y1(1), . . . , wk(1)yk(1), 0], fromwhich the result follows.

A few remarks concerning Proposition 3.1 are in order.

Remark 3.2. The formula in (3.2) requires solving a linear system with a possiblycomplex matrix. However, the upper part of the linear system (the last element ofthe solution is zero) has the following structure

[B1 B2

B2 B1

] [xy

]=

[g1g1

]

Page 8: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

8 Yiding Lin and Valeria Simoncini

with B2 complex symmetric, from which it follows that y = x. Therefore, the upperpart of the system can be rewritten by means of real matrices as follows

[ℜ(B1) + ℜ(B2) −ℑ(B1) + ℑ(B2)ℑ(B1) + ℑ(B2) ℜ(B1)−ℜ(B2)

] [ℜ(x)ℑ(x)

]=

[ℜ(g1)ℑ(g1)

],

significantly lowering the number of floating point operations. Note also that sinceℑ(B1) is skew-symmetric, the coefficient matrix is symmetric.

Remark 3.3. When B has p > 1 columns, after k iterations the matrix H hasdimension (k + 1)p× kp, so that H has dimension (k+ 1)2p2 × k2p2, with H∗ havinga null space of dimension 2kp2 + p2. The proof of Proposition 3.1 proceeds as in thecase p = 1, except that U2 = ye∗i is now a matrix, with ei ∈ R

p, i = 1, . . . , p, andeach matrix U1 is selected as U1 = U2W

∗. Clearly, the curse of dimensionality maydramatically affect the performance of the residual computation, and this can be seenin our experiments in Example 5.5.

The proof of Proposition 3.1 constructs a basis U for the null space of H, althoughwe show that its explicit computation is not needed. The question arises whether sucha basis is sufficiently well conditioned. The following result shows that its conditioningis driven by, though in general better than, that of the left eigenvectors of H.

Proposition 3.4. Assume the same notations and definitions of Proposition 3.1hold. Let U = [u1, . . . , uk, uk+1, . . . , u2k, u2k+1], with ui and uk+i, i = 1, . . . , k asdefined in (3.4) and (3.5), respectively, and u2k+1 = e2k+1. Then

cos(〈(ui, uj)) = cos(〈(uk+i, uk+j)) = cos(〈(yi, yj)) cos(〈(wi, wj)), i, j = 1, . . . , k,

and

cos(〈(ui, uk+j)) = cos(〈(wi, yj)) cos(〈(yi, wj)), i, j = 1, . . . , k,

Proof. The relations can be obtained by simply making the inner products u∗

i uj ,u∗

i uk+j , etc. more explicit.

3.2. Computation of the final least squares solution. Once the residualnorm of the inner least squares problem has become sufficiently small, we can computethe solution y and thus the matrix Y MR in (2.3). The following proposition showsthat Y MR can be obtained with some extra O(k3) effort.

Proposition 3.5. Let the columns of U define a basis for the null space of H∗.and let P be a permutation matrix such that

P(I ⊗H + I ⊗H) =

[I ⊗H +H ⊗ I

h∗

]=: H.

Define accordingly PU = [U1;U2], and let d = e1β0 − U1(U∗U)−1U∗e1β0. Then Y MR

is the solution to HY + Y H∗ −D = 0, with d = vec(D).Proof. With the Kronecker notation of problem (3.1), let e = e1β0, and let

H := P(I ⊗ H + I ⊗ H) so that H = [H;h∗] with H nonsingular. Note also thatPe = e so that the known vector is not affected by the permutation, as well as thesolution. Then we have that the solution y = vec(Y MR) satisfies

y = (H∗H)−1H∗

e = (H∗H+ hh∗)−1H∗e = H−1(I +H−Thh∗H)−1e

= H−1(I −H−Th(I + h∗H−1H−Th)−1h∗H−1)e,

Page 9: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 9

where in the last equality the Sherman-Morrison-Woodbury formula was used. From[H∗, h](PU) = 0, that is H∗[I,H−Th](PU) = 0, it follows that there exists a nonsin-gular matrix ρ such that

[H−Th−I

]ρ = PU , so that U1 = H−Thρ.

Substituting H−Th = U1ρ−1 in the expression for y and noticing that

U∗U = (PU)∗(PU) = ρ∗(I + h∗H−1H−Th)ρ

we obtain

y = H−1(I −H−Th(I + h∗H−1H−Th)−1h∗H−1)e

= H−1(I − U1ρ−1ρ(U∗U)−1ρ∗ρ−HU∗

1 )e.

Observing that U∗

1 e = U∗e, where in the second case the length of e was properlyadjusted, we obtain that y = H−1d, which is the Kronecker version of the statedLyapunov equation.

Proposition 3.5 uses the quantity (U∗U)−1U∗e1β0 which is already available fromthe computation of the residual norm. Since the eigenvalue/vector decomposition ofH∗ is available, the Lyapunov equation giving Y in Proposition 3.5 can be solvedelement-wise by using the spectrally transformed problem.

3.3. Hu-Reichel direct method for the solution of the reduced problem.

In [21], Hu and Reichel proposed a O(k4) direct method for solving the reduced prob-lem (2.3) in the more general case of the Sylvester equation. We emphasize that theprocedure in [21] was derived after reduction with a standard Krylov subspace; herewe apply the procedure after projection onto a possibly more general space. We alsomention the variant in [28], which however does not seem to lower the computationalcost of the method.

The algorithm in [21] obtains the least squares solution from the Galerkin solution,by using the Sherman-Morrison-Woodbury formula. We next recall the algorithm inthe Lyapunov setting, and propose a cheaper, though still order k4, variant.

Let H = UTHU∗ be the real Schur decomposition of H, with TH block uppertriangular and U orthonormal, and let F = E∗U . Then we can write (2.3) as

minY

(‖HY + Y H∗ +RBR

B‖2F + ‖E∗Y ‖2F + ‖Y E‖2F)

= minY

(∥∥∥TH Y + Y T ∗

H + U∗RBR∗

BU∥∥∥2

F+ ‖FY ‖2F + ‖Y F ∗‖2F

),

where Y = U∗Y U . The Kronecker form of the problem is

minvec(Y )∈Rk2

∥∥∥∥∥∥

I ⊗ TH + TH ⊗ II ⊗ FF ⊗ I

vec(Y ) +

vec(U∗RBR∗

BU)00

∥∥∥∥∥∥

2

2

.

Letting R = I ⊗ TH + TH ⊗ I, S =

[I ⊗ FF ⊗ I

], d = vec(U∗RBR

BU) and y = vec(Y ),

the problem becomes

miny∈Rk2

∥∥∥∥[

RS

]y +

[d0

]∥∥∥∥2

2

, (3.6)

Page 10: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

10 Yiding Lin and Valeria Simoncini

whose associated normal equation is given by (R∗R + S∗S) y + R∗d = 0. For Rnonsingular, we can equivalently write

(I + (SR−1)∗SR−1

)y′ + d = 0, y = R−1y′. (3.7)

By means of the Sherman-Morrison-Woodbury formula we can obtain y′ as follows

y′ = −d+ (SR−1)∗(I + SR−1(SR−1)∗)−1SR−1d,

from which y and thus Y and Y can be readily recovered. This approach requires theexplicit computation of

SR−1 = (

[I ⊗ FF ⊗ I

])(I ⊗ TH + TH ⊗ I)−1 =

[(I ⊗ F )(I ⊗ TH + TH ⊗ I)−1

(F ⊗ I)(I ⊗ TH + TH ⊗ I)−1

],

so that the systems (I ⊗ TH + TH ⊗ I)∗z = (I ⊗ F ∗)ei and (I ⊗ TH + TH ⊗ I)∗z =(F ∗ ⊗ I)ei for i = 1, . . . , k(l − k) need be solved. We notice that I ⊗ F ∗ is a blockdiagonal (rectangular) matrix, with all diagonal blocks equal to the tall matrix F ∗.Therefore, each column fℓ = (I ⊗ F ∗)eℓ, ℓ = 1, . . . , k(l − k) satisfies the relationvec(F ∗eie

j ) = fi+(j−1)(l−k), i = 1, . . . , (l − k), j = 1, . . . , k. With this relation, thesystem ((I ⊗ TH + TH ⊗ I)∗z = (I ⊗ F ∗)eℓ is equivalent to solving the Lyapunovequations T ∗

HZ +ZTH = F ∗eie∗

j , with z = vec(Z), for i = 1, . . . , (l− k), j = 1, . . . , k,with ℓ = i+ (j − 1)(ℓ− k).

The computational cost of the Sylvester equation T ∗

HZ + ZTH = F ∗eie∗

j can befurther reduced by exploiting the block structure of TH and the high sparsity of theright-hand side. Indeed, partition T ∗

H as T ∗

H = [T1, 0;T2, T3], with the size of T1 notsmaller than j; this is possible due to the quasi-triangular form of TH . Then, with acorresponding partitioning of Z and F ∗eie

j we obtain

[T1 0T2 T3

] [Z1 Z2

Z3 Z4

]+

[Z1 Z2

Z3 Z4

] [T ∗

1 T ∗

2

0 T ∗

3

]+[F1 0

]= 0.

An explicit computation shows that Z2 = 0, Z4 = 0 and

[T1 0T2 T3

] [Z1

Z3

]+

[Z1

Z3

] [T ∗

1

]+ F1 = 0,

so that only the blocks Z1, Z3 need to be computed explicitly. The cost for solvingthe last Sylvester equation is lower than that for the original equation as long as T1

is significantly smaller than the whole matrix TH . Therefore, we used this reducedform for j ≤ k/2.

For the second block of SR−1, involving F ⊗ I, we similarly observe that itscomputation corresponds to solving the matrix equation T ∗

HZ + ZTH = (e∗jF )∗ei.Since this equation is the transpose of the previous matrix equation, we do not needto solve it again; instead, we simply compute the new z as z = vec(Z∗) directly.We notice that this computational saving exploits the symmetry of the Lyapunovequation.

The complete algorithm in [21] is summarized in Figure 3.1. Overall, the algo-rithm requires a Schur decomposition, the solution of k Sylvester equations of leadingsize k, and the solution of k×k systems in the Sherman-Morrison-Woodbury formula,giving the already mentioned O(k4) complexity.

Page 11: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 11

Fig. 3.1. Hu-Reichel Algorithm for solving minY ‖H(Y ) + C‖F .

1: Compute real Schur decomposition H = UTHU∗

2: Compute F = E∗U , d = vec(U∗RBR∗

BU)3: Compute the rows of SR−1 by solving k Lyapunov equations: T ∗

HZ + ZTH =F ∗eie

j , i, j = 1, . . . , k.

4: (Variant) Compute the rows of SR−1 by solving k Sylvester equations: T ∗

HZ +Z(TH)1:j,1:j = F ∗eie

j , i, j = 1, . . . , k.

5: Solve [I + (SR−1K )∗SR−1

K ]y′ = vec(U∗RBR∗

BU) using the Sherman-Morrison-Woodbury formula;

6: Solve TH Y + Y T ∗

H = Y ′, where vec(Y ′) = y′;

7: Recover Y = UY U∗

As a possibly cheaper alternative, Hu and Reichel also suggest applying the Con-jugate Gradient method (CG) to (3.7). Since the coefficient matrix is a rank-(2k(l−k))modification of the identity, in exact arithmetic CG will convergence in 2k(l−k) iter-ations. Their approach is mathematically equivalent to applying CGLS to (3.7) witha structured preconditioner (cf. section 3.4.1).

3.4. Iterative method for the solution of the reduced problem. Theleast squares problem (2.3) can also be solved by means of an iterative procedure. Weuse this strategy only at convergence, however in principle it could be used at eachiteration whenever this is more efficient than the direct method; see Example 5.6.

We first introduce the operator H(T ) := HTI∗ + ITH∗(k × k 7→ l × l), so thatthe reduced problem (2.3) becomes minY ‖H(Y )+C‖F . The transposed operator canbe defined as H∗(S) = H∗SI + I∗SH(l × l 7→ k × k). Therefore,

minY

‖H(Y ) + C‖F ⇔ H∗(H(Y ) + C) = 0, (3.8)

where the latter equation is the associated normal equation. Using I∗H = H, we get

H∗(H(Y )) = H∗HY + Y HTH +HYH +H∗Y H∗. (3.9)

The matrix equation on the right-hand side in (3.8) can be written as

H∗HY + Y H∗H +HYH +HTY H∗ = CNE , CNE := H∗(−C). (3.10)

Since H has full column rank, we have that H∗H = H∗H + EE∗ > 0. If H isnonsingular, then H∗H > 0.

Both the matrix least squares problem in (2.3) and the matrix normal equationin (3.10) could be transformed into vector form by using the Kronecker formulation.Then, a standard iterative solver such as preconditioned LSQR [35],[34] or precondi-tioned CGLS (PCGLS) [16] could be employed. To exploit matrix-matrix operations,we employ a matrix form of PCGLS that directly uses the operators H and H∗. Theidea of using matrix-matrix operations when dealing with linear systems in Kroneckerform is not new, see, e.g., [19],[13],[17],[26].

In the matrix inner product defined by 〈X,Y 〉F = tr(Y ∗X), ∀X,Y ∈ Rk×k an

operator K is self-adjoint and positive definite if it satisfies

〈K(T ), S〉F = 〈T,K(S)〉F , ∀S, T ∈ Rk×k, 〈K(T ), T 〉F > 0, ∀T 6= 0 ∈ R

k×k.

Page 12: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

12 Yiding Lin and Valeria Simoncini

Algorithm 1

Operator version of PCGLS for solving minY ‖A(Y ) +C‖F with preconditioner M(·) andinitial guess Y0.

1: Compute R0 = −C − A(Y0),S0 = A∗(R0),Z0 = M−1(S0),P0 = R0.2: for j = 1, . . . , jmax do

3: Wj = A(Pj)4: αj = 〈Sj , Zj〉F /‖Wj‖2F ;5: Yj+1 = Yj + αjPj ;6: Rj+1 = Rj − αjWj ; % least squares residual7: Sj+1 = A∗(Rj+1); % normal eqn. residual8: if converged then

9: break;10: end if

11: Zj+1 = M−1(Sj+1)12: βj = 〈Sj+1, Zj+1〉F /〈Sj , Zj〉F ;13: Pj+1 = Zj+1 + βjPj ;14: end for

15: return Yj+1;

Given a generic full column rank operator A and a “preconditioning” operator M(·),both self-adjoint and positive definite, Algorithm 1 reproduces the operator version ofPCGLS for solving minY ‖A(Y )+C‖F with preconditioner M(·) and initial guess Y0;in our numerical experiments we used Y0 = 0. There is no significant visual difference1

between this algorithm and the classical PCGLS method, except that all capital lettersrefer to matrices; the latter can be recovered by transforming all matrices in vectors(by the “vec” operation) and using the Kronecker product for the matrix operators.

The O(k3) complexity of applying both H(P ) and H∗(R) at each iteration ofPCGLS dominates all other costs; since computing H(P ) involves the sum of a matrixand its transpose, obviously only one of them should be explicitly computed. Thesame applies to H∗(R).

3.4.1. Preconditioning strategies. In line 11 of Algorithm 1 the precondi-tioning step is performed. For this step to be effective, we need to determine a selfadjoint positive operator M that is close to A∗(A(·)). A natural preconditioningoperator is T 7→ H∗HT + TH∗H, which uses the first two terms of the normal equa-tion operator H∗(H(T )). Applying this preconditioner requires solving a Lyapunovequation with nonsingular coefficient matrix H∗H. Fortunately, this cost can be sig-nificantly reduced as follows. Let H∗H = QΛQ∗ be the eigenvalue decomposition ofthe symmetric and positive definite matrix H∗H. Then we can rewrite (3.10) as

ΛY + Y Λ + HY H + H∗Y H∗ = Q∗CNEQ, (3.11)

with Y = Q∗Y Q and H = Q∗HQ. The previous preconditioning operator becomes

M(S) = ΛS + SΛ. (3.12)

1The computation of 〈X,Y 〉F = tr(Y ∗X) requires some care, to avoid O(k3) operations, formatrices X,Y with k rows. Adding all the elements of Y ◦X only requires O(k2) complexity.

Page 13: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 13

This change of variable corresponds to modifying the original operator H into

H(Y ) := HY[Q∗ 0

]+

[Q∗

0

]Y H

, with H := HQ,

associated with the least squares problem minY‖H(Y ) + C‖F . The corresponding

matrix normal equation gives precisely (3.11) with the normal equation operator

H∗(H(Y )) = ΛY + Y Λ + HY H + H∗Y H∗. (3.13)

In summary, an effective application of the preconditioner can be obtained byfirst transforming the original matrix equation so that its leading coefficient matricesare diagonal. Therefore, the PCGLS method is applied to the transformed problem.

We have explored the use of other preconditioning operators, namely:

S 7→ ΛS + SΛ + 2DSD, where D = diag(diag(H)) (3.14)

S 7→ ΛS + SΛ + 2Λ1

2SΛ1

2 = Λ1

2 (Λ1

2S + SΛ1

2 ) + (Λ1

2S + SΛ1

2 )Λ1

2 . (3.15)

These choices, together with M in (3.12), represent approximations to the normal

equation operator H∗(H(·)) in (3.13). The operator M uses the first two terms, whilethe operator in (3.14) adds the diagonal part of the second two terms. Finally, theoperator in (3.15) is constructed so that we can define its square root operator. Thiscan be useful if one wishes to use an operator version of LSQR as underlying solver,since LSQR requires a split positive definite preconditioner. Our experiments did notreport the clear superiority of LSQR with (3.15), compared to PCGLS with the samepreconditioner; Therefore, we will not discuss this approach further.

By using their Kronecker form, it can be shown that all considered precondition-ers are positive definite in the defined inner product, so that they can be used aspreconditioners for CGLS.

From a computational stand point, we stress that the diagonalization of H∗Hallowed us to design preconditioners that effectively use a relevant splitting of theoperator, at a low cost. Indeed, since all preconditioning operators are diagonal, thecost of applying the preconditioner is kept at O(k2).

Yet another alternative preconditioning strategy may be obtained by writing

H∗(H(Y )) = H∗HY + Y H∗H +HYH +H∗Y H∗ + EE∗Y + Y EE∗

= H∗(HY + Y H∗) + (HY + Y H∗)H + EE∗Y + Y EE∗,

and thus defining

S 7→ H∗(HS + SH∗) + (HS + SH∗)H. (3.16)

Inverting this operator requires two Lyapunov solves and thus also in this case, aspectral decomposition can be used to improve efficiency. More precisely, if H =UTHU∗ is the real Schur decomposition of H, then the original normal equation canbe rewritten so that the preconditioning operator is given by S 7→ T ∗

A(THS+ST ∗

H)+(THS+ST ∗

H)TH , whose inversion involves the solution of two Lyapunov equations withblock triangular coefficient matrices. This approach is mathematically equivalent toiteratively solving (3.7), as suggested in [21]. Our numerical experience shows that thispreconditioning strategy is more effective than the previous ones, in terms of numberof PCGLS iterations, however its CPU time per iteration is significantly higher, dueto the nondiagonal structure of the Lyapunov equations involved; we report on ournumerical experience with this preconditioner in Example 5.4 (cf. Table 5.4).

Page 14: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

14 Yiding Lin and Valeria Simoncini

3.5. The overall iterative scheme and its properties. We can summarizethe steps taken by our iterative strategy for solving (2.3) as follows:

1. Perform a pre-processing for efficient preconditioning:For preconditionersM and (3.14) compute eigenvalue decompositionH∗H =

QΛQ∗

Set H := HQ, Q :=

[Q∗

0

], or H := Q∗HQ (only for preconditioner (3.14))

For preconditioner (3.16) compute Schur decomposition H = UTHU∗ andwork with TH

2. Apply Algorithm 1 to solve minY‖HY Q∗ + QY H

+ C‖F with associatedpreconditioner

3. Recover solution Y = QY Q∗

The implementation core lies in step 2, whose cost depends on the convergencerate of the PCGLS method. In turn, convergence is largely driven by the spectralproperties of the preconditioned coefficient matrix (in Kronecker form).

To complete our characterization of the iterative procedure, we analyze the con-vergence of PCGLS for (3.11) (or equivalently for (3.10)) when using the precondi-tioner M. Let

H = Ik ⊗H∗H +H∗H ⊗ Ik︸ ︷︷ ︸M

+H∗ ⊗H +H ⊗H∗

︸ ︷︷ ︸N

(3.17)

and further split M as

M = Ik ⊗H∗H +H∗H ⊗ Ik︸ ︷︷ ︸M

+ Ik ⊗ E∗ + E∗ ⊗ Ik︸ ︷︷ ︸E

, (3.18)

where we recall that E is such that H = [H;E∗]. The operator preconditioner M

corresponds to M , and thus the preconditioned normal equation in Kronecker formhas coefficient matrix I + M−1N . We are thus interested in analyzing the spectralproperties of I +M−1N , from which an upper bound on the error norm for PCGLScan be derived. Here, Λ(M) denotes the set of eigenvalues of the matrix M.

Theorem 3.6. Let M ,N be as in (3.17). Then Λ(I + M−1N ) ⊂ [0, 2].Proof. We only need to show that Λ(M−1N ) ⊂ [−1, 1]. Since M is symmetric

and positive definite, and N is symmetric, all eigenvalues of M−1N are real. Forany 0 6= Y ∈ R

k×k, let y = vec(Y ). Then

y∗N y = 〈Y,HY H +H∗Y H∗〉F= tr(Y ∗HYH) + tr(Y ∗H∗Y H∗) = 2tr(HY ∗HY )

≤ 2‖Y H∗‖F ‖HY ‖F ≤ ‖HY ∗‖2F + ‖HY ‖2F= tr(Y H∗HY ∗) + tr(Y ∗H∗HY ) = tr(Y ∗Y H∗H) + tr(Y ∗H∗HY )

= 〈Y,H∗HY + Y H∗H〉F = y∗M y ≤ y∗M y,

(3.19)

where the last inequality follows from y∗E y ≥ 0, ∀y 6= 0. The result follows.The result above precisely locates the spectrum of the preconditioned matrix,

and it also shows that eigenvalues close to zero may arise, with possible disastrousconsequences on convergence. Nonetheless, we next provide an example for which fastconvergence is ensured.

Corollary 3.7. If N ≥ 0, then κ2(I + M−1

2 N M−1

2 ) ≤ 2.

Page 15: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 15

Proof. If N ≥ 0, then Λ(M−1N ) ⊂ [0, 1]. Therefore, Λ(I + M−1N ) ⊂ [1, 2],

hence κ2(I + M−1

2 N M−1

2 ) ≤ 2 .The result above ensures that if N ≥ 0, the PCGLS method converges with a

rate at least proportional to (√2 − 1)/(

√2 + 1) [14, p. 531]. The result applies to

symmetric matrices. The situation is far more difficult in the non-hermitian case.In the next theorem we derive a sufficient, albeit very stringent, condition for N

to be positive semidefinite.Theorem 3.8. Assume that 0 /∈ F (H). If |ℜ(ξ)| ≥ |ℑ(ξ)| for any ξ ∈ F (H),

then N ≥ 0.Proof. From the definition of N we have that for any Y ∈ R

k×k, y = vec(Y ),y∗N y = 2tr(HY ∗HY ). Therefore, N ≥ 0 if and only if tr(HY ∗HY ) ≥ 0.

We observe that any Y can be written as Y = Z2Z∗

1 with Z2 nonsingular, sothat tr(HY ∗HY ) = tr(HZ1Z

2HZ2Z∗

1 ) = tr(Z∗

1HZ1Z∗

2HZ2). We will show thattr(Z∗

1HZ1Z∗

2HZ2) ≥ 0.The angular field of values of a square matrix H is given by F ′(H) = {x∗Hx :

x ∈ Cn, x 6= 0} (see, e.g., [20, p.6]). For any nonsingular Z2 ∈ R

k×k, F ′(Z∗

2HZ2) =F ′(H); see, e.g., [20, p.13]. Moreover, for any possibly singular Z1 ∈ R

k×k it holdsthat2 F ′(Z∗

1HZ1) ⊂ F ′(H) ∪ {0}. Collecting these two results and recalling that|ℜ(ξ)| ≥ |ℑ(ξ)| for any ξ ∈ F (H), we obtain that for all Z1, Z2 with Z2 nonsingular,it holds that3

F ′(Z∗

1HZ1)F′(Z∗

2HZ2) ⊂ (F ′(H) ∪ {0})F ′(H) ≡ C+0 , (3.20)

where C+0 = {z ∈ C,ℜ(z) ≥ 0}. Since 0 /∈ F ′(Z∗

2HZ2), it follows that (cf. [20, p.67]and (3.20))

Λ(Z∗

1HZ1Z∗

2HZ2) ⊂ F ′(Z∗

1HZ1)F′(Z∗

2HZ2) ⊂ C+0 .

Therefore, tr(Z∗

1HZ1Z∗

2HZ2) ≥ 0 for any Z1, Z2 with Z2 nonsingular, from whichthe result follows.

Theorem 3.8 requires the field of values F (H) to lie in the west cone obtained bythe two bisectors of the complex plane. This condition is significantly more restrictivethan assuming that F (H) ⊂ C

−, as is usually assumed in Galerkin approximations.

4. A minimal residual algorithm for the Lyapunov Equation. We sum-marize all steps of our Minimal Residual algorithm in Algorithm 2.

At convergence, the algorithm performs a rank truncation of the solution matrixY MRk . Such truncation affects the final solution and the associated residual norm,

therefore the choice of the threshold for the rank truncation is left to the user. It isalso important to realize that the symmetric solution Y MR is not necessarily positivedefinite, as opposed to the solution to (1.1). However, for sufficiently accurate ap-proximations, the magnitude of the negative eigenvalues of Y MR usually falls belowthe truncation tolerance, so that these eigenvalues can be neglected. In case of neg-ative eigenvalues, however, it is possible to monitor the least squares norm with thetruncated solution matrix, to decide whether the iteration should really be stopped.

2For any y 6= 0, y∗Z∗1HZ1y is either zero, if y is in the null space of Z1, or y∗Z∗

1HZ1y = w∗Hw ∈F ′(H), with w = Z1y.

3Since F ′(H) ⊆ {z = ρeiθ, 34π ≤ θ ≤ 5

4π}, we have that for z = z1z2 ∈ (F ′(H) ∪ {0})F ′(H) it

holds that z = ρ1ρ2ei(θ1+θ2), with either z = 0 or 32π ≤ θ1+θ2 ≤ 5

2π, that is (F ′(H)∪{0})F ′(H) =

C+0 .

Page 16: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

16 Yiding Lin and Valeria Simoncini

Algorithm 2

MR: Minimal residual algorithm for AX +XA∗ +BB∗ = 0.

1: for k = 1 : kmax do

2: Expand the approximate subspace. Determine Vk,Vk+1 and Hk such thatAVk = Vk+1Hk

3: Compute residual norm using Proposition 3.14: if residual norm < tol‖BB∗‖F then

5: Apply either Direct method or Iterative method to get Y MRk

6: break;7: end if

8: end for

9: Find Lk such that Y MRk ≈ LkL

k

10: Compute Zk = VkLk such that XMRk = ZkZ

k

11: return

The relative residual norm is used in our stopping criterion. Another possibilitywould be the use of the backward error (cf. also [40]), that is

‖RMRk ‖F

‖B‖2F + ‖A‖F ‖Y MRk ‖F

< tol

where ‖RMRk ‖F denotes the residual Frobenius norm. Since the backward error de-

pends on the approximation matrix, we preferred to use the relative residuals, to havea more explicit residual comparison between the MR and Galerkin approaches.

5. Numerical experiments. In this section we propose a selection of our nu-merical experiments, showing the performance of the minimal residual method, fordifferent choices of approximation spaces. We first discuss a few motivating examples,which emphasize the relevance of a minimum residual strategy.

All experiments were carried out in Matlab 7.11 (64 bits) [33] under a Windowsoperating system using one of the four available CPUs.

−2000 −1000 0 1000 2000

−2000

−1500

−1000

−500

0

500

1000

1500

2000

Fig. 5.1. Field of value of matrix ISS. Eigenvalues are represented by ‘×’.

5.1. Minimal residual and Galerkin methods. The difference between aGalerkin approach and a minimal residual strategy only lies in the extraction step,that is in the determination of the reduced matrix Y . In the linear system case, the

Page 17: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 17

0 20 40 60 80 10010

−8

10−6

10−4

10−2

100

102

outer iteration index

rela

tive

resi

dual

MRGarlerkin

0 50 100 150 20010

−8

10−6

10−4

10−2

100

102

outer iteration index

rela

tive

resi

dual

MRGarlerkin

Fig. 5.2. Convergence history (residual norm) of Galerkin and MR methods. Left: ExtendedKrylov subspace. Right: Rational Krylov subspace.

Galerkin approach may exhibit a erratic behavior when eigenvalues of the projectedmatrix H are near the origin, whereas a smooth behavior is observed in the minimalresidual approach [8]. In our setting peaks in the residual convergence curve occurwhen the matrix I ⊗ H + H ⊗ I has eigenvalues close to the origin. This is morelikely to occur, for instance, when A is stable but not passive, so that 0 ∈ F (A). Atypical situation is reported in Figure 5.1, for the well known dataset ISS [7], with asmall but particularly nasty matrix. Figure 5.2 shows the convergence history of MRand Galerkin methods for ISS (the first column of the available matrix B is used), forthe Extended (left) and Rational Krylov (right) subspaces. By construction, the MRmethod provides a smoother convergence curve than that obtained in the Galerkinmethod; in addition, for certain values of the subspace dimension, the two residualcurves are remarkably different, even several orders of magnitude apart. This behavioris particularly evident with the Rational Krylov space. Similarly to the linear systemcase, peaks in the Galerkin convergence history seem to correspond to plateaux in theMR case [9]; the derivation of a precise correspondence is an open problem.

5.2. Numerical evidence with large scale problems. In this section wereport on our numerical experience with MR and Galerkin methods, and with differentapproximation subspaces, when applied to large datasets. Although an MR methodis well defined and can be used for symmetric and negative definite matrices, we thinkthat it is particularly well suited for nonsymmetric matrices, therefore we shall onlyreport on nonsymmetric problems.

Our benchmark problems derive the equation AXE + EXA∗ + BB∗ = 0 fromthe associated linear dynamical system Ex′ = Ax + Bu. In all cases considered, Eis diagonal, symmetric and positive definite. Whenever E is not the identity matrix,letting E = LL∗ and denoting X = L∗XL, we equivalently solve equation

(L−1AL−T )X + X(L−1AL−T ) + L−1B(L−1B)∗ = 0.

Moreover, to improve efficiency, in all examples a symmetric approximate minimumdegree permutation was applied to A as a preprocessing (Matlab function symamd).

In the next examples we consider various matrices of size 10,000 or larger (cf.Table 5.1) stemming from available benchmark problems as well as from the explicitdiscretization (5-7 stencil finite differences) of partial differential operators.All matri-ces are stable (eigenvalues with negative real part), while all are passive except forL1, which is very moderately nonpassive.

Page 18: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

18 Yiding Lin and Valeria Simoncini

Table 5.1

Large scale matrices used in the examples.

name size originFLOW 9 669 Oberwolfach collection ([7])CHIP 20 082 Oberwolfach collection ([7])L1 10 000 L(u) = (exp(−10xy)ux)x + (exp(10xy)uy)y − (10(x+ y)u)x ([11])L2 10 648 L(u) = uxx + uyy + uzz − 10xux − 1000yuy − 10uz ([40])L3 160 000 L(u) = div(exp(3xy)gradu)− 1/(x+ y)ux ([29])

The Standard Krylov subspace approach was implemented as in, e.g.,[38], whilethe Extended Krylov subspace was computed using the K-PIK procedure in [40]; forthe latter method, we recall that the generated space has twice the dimension asthe number of iterations. For the Rational Krylov method, the adaptive procedurefor the automatic selection of the poles introduced in [11] was employed. All citedimplementations use the Galerkin strategy to extract the approximate solution. Foran MR method we only had to replace the extraction step, while the constructionof the space remained unchanged. For the sake of stability, in all cases we enforceddouble orthogonalization during the construction of the orthogonal basis.

Both MR and Galerkin methods use the relative residual Frobenius norm as stop-ping criterion, which is computed at each iteration with the same procedure, irrespec-tive of the projection space employed. For the Galerkin approach, this entails deter-mining the solution to the projected problem; see, e.g., [38],[40]. The computationof the residual norm for MR and the computation of the solution Y in Galerkin, wasperformed at each iteration. For the MR method we use the procedure introduced insection 3.1. The stopping tolerance was fixed to tol = 10−8 for all examples, unless ex-plicitly stated otherwise. The relative residual norm of the normal equation was usedfor the stopping test in PCGLS, with tolinner = 10−13. The actual solution matrixmay differ from the solution one would obtain by using a direct method, affecting thefinal solution rank, which may slightly differ in some cases. Since the iterative solveruses the Galerkin solution during the preconditioning step, we also expect that veryfew iterations will be required whenever the MR and Galerkin residuals and solutionsare very close.

The various MR procedures only differ for the extraction strategy at convergence:We compare the CPU times of the new procedure of Proposition 3.5, the modified Huand Reichel method (in the tables denoted by HR, cf. section 3.3), and the iterativesolver PCGLS with preconditioner M (cf. section 3.4). A cost breakdown of the MRprocedures is also provided in a separate table.

For both MR and Galerkin approaches we also report the final approximationspace dimension, and (in parenthesis) the solution rank after truncation (for a trun-cation tolerance of 10−12). The final numerical rank may vary significantly dependingon the chosen tolerance, due to the presence of many small eigenvalues around thethreshold. Therefore, a smaller subspace dimension does not necessarily lead to asmaller numerical rank of the approximate solution.

Example 5.1. In this set of experiments we compare MR and Galerkin whenusing the Extended Krylov subspace (EK) as projection space, with B of rank one.Whenever the vector B was not available, we considered a unit vector with all constantentries. The matrix was factorized at the beginning of the process and solves withthe sparse factors were performed at each iteration. Results are reported in the firstblock of rows of Table 5.2. Within the MR approaches, all methods have similarCPU time performance as long as the approximation space is small; for the operator

Page 19: MINIMAL RESIDUAL METHODS FOR LARGE SCALE LYAPUNOV EQUATIONS

A new minimal residual method for large scale Lyapunov equations 19

Table 5.2

CPU Times and other data for MR and Galerkin methods using Extended and Rational Krylovsubspaces.

Galerkin MRGalerkin Space dim. Iterative Modified New Space dim.

Basis Matrix method (Rank) method HR direct (Rank)EK FLOW 9.50 74(48) 9.54 9.84 9.57 74(47)

CHIP 48.31 56(27) 48.46 48.55 48.64 56(27)L1 9.88 228(42) 12.15 21.69 11.48 218(47)L2 71.58 130(64) 67.73 68.60 67.43 118(82)L3 30.84 100(30) 29.34 30.06 29.83 96(30)

RK FLOW 3.73 32(32) 3.34 3.34 3.34 29(29)CHIP 49.23 22(22) 49.20 49.25 49.75 22(22)L1 10.39 72(41) 9.67 10.11 9.65 68(40)L2 38.32 79(61) 36.90 36.98 36.69 76(66)L3 73.86 27(26) 67.50 67.65 67.50 25(24)

L1 a significantly larger space is required, so that the modified Hu-Reichel methodsuffers from its O(k4) complexity, as expected. The double orthogonalization stepoverwhelms all other costs, as reported in Table 5.3.

A comparison between the MR and Galerkin strategies shows that CPU timesare fairly comparable in all cases, except for the operator L1, for which a large spacedimension k is used. We should mention that the Galerkin strategy only uses thecompiled Matlab function lyap, whereas the new strategy also uses uncompiled Mat-lab commands, which are in general less optimized. We also notice that the finalspace dimension is in general lower for MR than for Galerkin, and this is due to thesmoothness properties of the MR residual.

Table 5.3
CPU time details of MR solution strategies for Table 5.2.

Basis  Matrix    Ortho      Residual       Iterative    Modified    New direct
                            computation    method       HR          method
EK     FLOW       9.24        0.12           0.03         0.27        0.03
       CHIP      48.17        0.06           0.01         0.12        0.01
       L1         6.26        4.03           1.64        11.25        0.48
       L2        66.38        0.56           0.67         1.48        0.11
       L3        28.75        0.20           0.04         0.74        0.03
RK     FLOW       3.200       0.048          0.014        0.046       0.064
       CHIP      49.108       0.017          0.002        0.014       0.003
       L1         9.339       0.165          0.048        0.514       0.021
       L2        36.276       0.248          0.217        0.362       0.037
       L3        67.117       0.020          0.003        0.015       0.003

Remark 5.2. We also experimented with the standard Krylov subspace as approximation space. The dimension required for a satisfactory solution was unacceptably large, confirming experimental observations in the literature; see, e.g., [36], [40]. For instance, for the FLOW problem, the Galerkin procedure required a space of dimension 870 with a rank 313 solution. Analogously, the MR approach needed a Krylov subspace of dimension 840 with a rank 510 solution. A comparison of these figures with those in Table 5.2 for the same matrix shows that the Extended Krylov subspace is able to complete the task with a drastically reduced space dimension.

Example 5.3. In this example we consider the use of the Rational Krylov subspace as approximation space. All other conditions are as in the previous example.


The second block of rows of Table 5.2 shows the CPU times for this choice of space for all considered methods, and also the final space dimension and the approximate solution rank. Galerkin and MR procedures behave as in the case of the Extended Krylov subspace, and considerations as in Example 5.1 apply.

As expected, Rational Krylov approximation spaces have significantly lower dimension. However, this welcome property is not always reflected in the CPU time. Generating the Rational Krylov space requires solving a different shifted system at each iteration, thus the overall performance heavily depends on the cost of solving these systems; this fact can be appreciated for the largest matrix L3. The trade-off between memory requirements and CPU time may thus drive the subspace selection.
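A minimal sketch of the shifted solve that dominates each Rational Krylov step is given below (Matlab, with illustrative names: s_j is the current shift and v the latest basis vector; the shift selection and the orthogonalization are omitted):

    % One Rational Krylov step: a shifted sparse solve with a new shift s_j,
    % hence, in general, a new factorization at every iteration.
    n = size(A, 1);
    w = (A - s_j * speye(n)) \ v;
    % w is subsequently orthogonalized against the current basis (not shown).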

The detailed information in Table 5.3 shows that the final CPU time devoted to the solution extraction is in general negligible compared with the orthogonalization process, due to the small size of the approximation space.

Example 5.4. We conclude our exploration for a rank-one B with a comparison of two preconditioning techniques when using PCGLS. Table 5.2 reported results with the preconditioner M. For the same data, Table 5.4 shows the performance of the “double” preconditioner defined in (3.16), together with the iteration count.

Table 5.4
PCGLS with different preconditioners.

                    PCGLS + M            PCGLS + (3.16)
Basis  Matrix    CPU time    its       CPU time    its
EK     FLOW        0.03       27         0.12       15
       CHIP        0.01       19         0.06       11
       L1          1.64      107         3.86       41
       L2          0.67      240         0.39       17
       L3          0.04       18         0.29       17
RK     FLOW        0.014      19         0.049      21
       CHIP        0.002      18         0.021      17
       L1          0.048      65         0.263      47
       L2          0.217     242         0.156      17
       L3          0.003      18         0.030      23

These figures show that the preconditioner defined in (3.16) better captures the spectral properties of the operator, resulting in a lower number of PCGLS iterations. When the number of iterations and the problem size are significant, however, CPU times show that the application of the preconditioner in (3.16) is more expensive. We also observe that since the preconditioner in (3.16) uses the Galerkin solution as an intermediate step, a low number of iterations with this preconditioner means that the Galerkin and the MR solutions are very close to each other.

Example 5.5. We consider the case rank(B) = p > 1 and, as a sample, we report our numerical experience with p = 4. Except for the stopping tolerance, which is here set to 10^{-7}, the experimental setting is the same as in the two previous examples, and Tables 5.5-5.6 should be interpreted analogously. When not available in the dataset, the columns of B were chosen from a uniformly distributed random sequence in the interval (0, 1) (Matlab function rand); a sketch is given below.
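For illustration only (n denotes the problem size; no particular random seed is implied):

    % Rank-four right-hand side factor with uniformly distributed entries in (0,1).
    p = 4;
    B = rand(n, p);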

We immediately notice that all methods are significantly more expensive than for p = 1, since a block of p (2p for EK) new vectors is added to the basis at each iteration. In particular, the modified Hu-Reichel method is affected by its O(k^4) computational cost. We also remark that when using the Extended Krylov subspace, no MR procedure is competitive with respect to the Galerkin solution, especially when a very large subspace needs to be computed (cf. operator L1).


On the other hand, when using RK a much smaller approximation space suffices, making the Galerkin and MR approaches more similar in terms of CPU times. We should add that for p > 1 all MR algorithms showed sensitivity with respect to the conditioning of the eigenvector basis, which was reflected in some instability of the residual norm computation; for this reason a looser stopping tolerance was selected, though the threshold is still well below any reasonable value used in practice.

Table 5.5
Rank(B)=4. CPU times and other data for MR and Galerkin methods using Extended and Rational Krylov subspaces.

                           Galerkin                                   MR
Basis  Matrix    Galerkin    Space dim.    Iterative    Modified    New        Space dim.
                 method      (Rank)        method       HR          direct     (Rank)
EK     FLOW        6.86      320(240)       30.48       247.74      32.03      304(201)
       CHIP       29.95      184(154)       32.53        60.33      33.04      176(140)
       L1        123.92      760(294)      702.37          –       695.88      704(336)
       L2         93.02      456(251)      197.85          –       186.74      432(318)
       L3         74.81      296(171)       79.77       216.80      81.26      280(155)
RK     FLOW       18.88      192(184)       22.50        52.09      22.74      180(162)
       CHIP       75.28      108(107)       76.27        80.35      76.40      108(101)
       L1         66.61      360(258)      101.97       331.46     101.73      316(231)
       L2         74.46      288(239)      102.11       268.17      99.41      276(234)
       L3        101.52      100(100)       92.98        95.23      93.03       92(91)

“–” means out of memory

Table 5.6
CPU time details of MR solution strategies for Table 5.5.

Basis  Matrix    Ortho      Residual       Iterative    Modified    New direct
                            computation    method       HR          method
EK     FLOW       3.55       23.36           2.93       220.15        4.11
       CHIP      28.93        2.89           0.23        28.13        0.74
       L1        22.26      618.25          58.26          –         46.57
       L2        78.17       94.86          23.96          –         11.54
       L3        63.95       10.34           0.63       136.81        1.46
RK     FLOW      15.587       5.922          0.653       30.284       0.783
       CHIP      74.951       0.994          0.062        4.237       0.193
       L1        43.792      52.031          5.193      234.486       4.409
       L2        64.910      29.994          6.518      172.558       3.162
       L3        90.823       0.404          0.045        2.325       0.098

“–” means out of memory

Example 5.6. In this example we consider solving the reduced least squares problem at each iteration by means of PCGLS, giving rise to an inner-outer iteration, where the inner normal equation is solved by PCGLS. The approximate solution at the previous outer iteration, padded with zeros, was used as initial guess Y0 (a minimal sketch of this warm start is given below). In Table 5.7 we display the CPU time and the average number of inner iterations for p = 1 and p = 4. The final subspace dimensions and ranks are the same as for the direct method. The CPU timings are similar to those obtained by using the direct procedure for the computation of the residual at each iteration; on the other hand, memory requirements are in general lower.
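A possible form of the zero-padded initial guess is sketched below (Matlab, with illustrative names: Y_prev is the projected solution at the previous outer iteration and k the current reduced dimension):

    % Warm start for the inner PCGLS solve: embed the previous projected
    % solution in the leading block and pad the new rows/columns with zeros.
    kprev = size(Y_prev, 1);
    Y0 = zeros(k, k);
    Y0(1:kprev, 1:kprev) = Y_prev;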

6. Conclusions. We have presented a new implementation of the minimal residual method for solving the matrix Lyapunov equation by means of a projection strategy. Our new approach computes the residual norm at each iteration, while it determines the projected solution only at termination.


Table 5.7
CPU time and average number of inner iterations for the inner-outer MR method using PCGLS with M preconditioning.

                    rank-one B                  rank-four B
Basis  Matrix    CPU time    avg.            CPU time    avg.
                             inner its.                  inner its.
EK     FLOW        9.67       13.03            21.51      40.92
       CHIP       48.50       12.21            30.41      16.95
       L1         21.44       39.95           605.34      67.84
       L2         72.59      111.85           263.78     116.91
       L3         29.70       12.10            73.31      12.20
RK     FLOW        3.37        9.79            20.66      36.58
       CHIP       49.41       10.41            75.46      15.07
       L1         10.12       28.96            96.91      50.70
       L2         38.89       96.68           128.60      99.12
       L3         67.42       10.44            92.76      11.43

We have analyzed different strategies for the final solution computation: for a rank-one right-hand side, the new direct method appears to be more efficient than available direct procedures, especially when the space dimension increases. For the iterative solver we devised new preconditioning techniques that exploit the matrix structure of the associated normal equation. Preliminary experiments show that the MR inner-outer approach is also very competitive.

Numerical experiments on large benchmark problems and on standard 2D and 3D operators show that the performance of the method is comparable to that of a Galerkin procedure, while the residual convergence curve is in general smoother.

One significant contribution of this paper was the development of direct and iterative solvers for a matrix least squares problem with three terms, which has not been systematically addressed in the literature as a numerical linear algebra problem. Moreover, the associated normal equation is a general linear matrix equation with five terms, whose solution has not been discussed in the recent literature. We thus plan to further explore these new problems in more general settings in the future, as they occur in other applications of broad interest.

The proposed derivation has some shortcomings, though. To maintain an O(k^3) complexity, our new direct strategy relies on the computation of the eigendecomposition of the projected matrix Hk, as opposed to the more stable Schur decomposition used in the projected Galerkin approach. We have experienced this sensitivity when B is a matrix, in some of the more difficult problems. We expect that this implementation may suffer from these instabilities in the case of highly non-normal or almost non-diagonalizable matrices Hk.

Another possible shortcoming is that a minimal residual condition does not necessarily ensure that the obtained approximate solution is positive semidefinite. In the case of an indefinite approximate solution, its negative eigenvalues could be purged a posteriori, with an obvious effect on the associated residual (a minimal sketch is given below). Although our stopping tolerance was sufficiently small to prevent the appearance of negative eigenvalues, we believe this issue deserves further analysis.
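The purging step could be realized as follows (an illustrative Matlab sketch acting on the projected solution Y, assumed symmetric; not part of the proposed algorithm):

    function Yplus = purge_negative_part(Y)
    % Replace Y by its positive semidefinite part, discarding negative eigenvalues.
      [Q, D] = eig((Y + Y')/2);
      d = max(diag(D), 0);        % set negative eigenvalues to zero
      Yplus = Q * diag(d) * Q';
    end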

In the derivation of our new direct method we have assumed that the matrix H ⊗ I + I ⊗ H is nonsingular, as for the Galerkin method. This is a technical assumption, since our procedure relies on a Lyapunov equation with coefficient matrix H, whereas the least squares solution would still exist even for singular H ⊗ I + I ⊗ H. It would be desirable to prove explicitly that the singularity of H ⊗ I + I ⊗ H prevents the definition of the Galerkin solution, while enforcing stagnation of the minimal residual method. Such a result would generalize the corresponding result relating FOM and GMRES in the linear system setting.

A Matlab version of all algorithms developed by the authors is available upon request.

Acknowledgments. This work was performed while the first author was visiting the Department of Mathematics of the Università di Bologna, supported by fellowship 2011631028 from the China Scholarship Council (CSC).

REFERENCES

[1] Athanasios C. Antoulas. Approximation of large-scale dynamical systems, volume 6 of Advances in Design and Control. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2005. With a foreword by Jan C. Willems.

[2] R. H. Bartels and G. W. Stewart. Solution of the matrix equation AX + XB = C [F4]. Communications of the ACM, 15(9):820–826, 1972.

[3] Bernhard Beckermann. An error analysis for rational Galerkin projection applied to the Sylvester equation. SIAM J. Numer. Anal., 49(6):2430–2450, 2011.

[4] P. Benner. Solving large-scale control problems. Control Systems Magazine, IEEE, 24(1):44–59, 2004.

[5] P. Benner and J. Saak. A Galerkin-Newton-ADI method for solving large-scale algebraic Riccati equations. Technical Report SPP1253-090, Deutsche Forschungsgemeinschaft - Priority Program 1253, 2010.

[6] Peter Benner, Jing-Rebecca Li, and Thilo Penzl. Numerical solution of large-scale Lyapunov equations, Riccati equations, and linear-quadratic optimal control problems. Numer. Linear Algebra Appl., 15(9):755–777, 2008.

[7] Benchmark Collection. Oberwolfach model reduction benchmark collection, 2003. http://www.imtek.de/simulation/benchmark.

[8] Jane Cullum and Anne Greenbaum. Relations between Galerkin and norm-minimizing iterative methods for solving linear systems. SIAM J. Matrix Anal. Appl., 17(2):223–247, 1996.

[9] Jane K. Cullum. Peaks, plateaus, numerical instabilities in a Galerkin minimal residual pair of methods for solving Ax = b. Appl. Numer. Math., 19(3):255–278, 1995. Special issue on iterative methods for linear equations (Atlanta, GA, 1994).

[10] V. Druskin, L. Knizhnerman, and V. Simoncini. Analysis of the rational Krylov subspace and ADI methods for solving the Lyapunov equation. SIAM J. Numer. Anal., 49(5):1875–1898, 2011.

[11] V. Druskin and V. Simoncini. Adaptive rational Krylov subspaces for large-scale dynamical systems. Systems Control Lett., 60(8):546–560, 2011.

[12] Vladimir Druskin and Leonid Knizhnerman. Extended Krylov subspaces: approximation of the matrix square root and related functions. SIAM J. Matrix Anal. Appl., 19(3):755–771 (electronic), 1998.

[13] A. El Guennouni, K. Jbilou, and A. J. Riquet. Block Krylov subspace methods for solving large Sylvester equations. Numer. Algorithms, 29(1-3):75–96, 2002. Matrix iterative analysis and biorthogonality (Luminy, 2000).

[14] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.

[15] S. J. Hammarling. Numerical solution of the stable, nonnegative definite Lyapunov equation. IMA J. Numer. Anal., 2(3):303–323, 1982.

[16] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49:409–436 (1953), 1952.

[17] M. Heyouni and K. Jbilou. Matrix Krylov subspace methods for large scale model reduction problems. Appl. Math. Comput., 181(2):1215–1228, 2006.

[18] M. Heyouni and K. Jbilou. An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equation. Electron. Trans. Numer. Anal., 33:53–62, 2008/09.

[19] Marlis Hochbruck and Gerhard Starke. Preconditioned Krylov subspace methods for Lyapunov matrix equations. SIAM J. Matrix Anal. Appl., 16(1):156–171, 1995.


[20] Roger A. Horn and Charles R. Johnson. Topics in matrix analysis. Cambridge University Press, Cambridge, 1994. Corrected reprint of the 1991 original.

[21] D. Y. Hu and L. Reichel. Krylov-subspace methods for the Sylvester equation. Linear Algebra Appl., 172:283–313, 1992. Second NIU Conference on Linear Algebra, Numerical Linear Algebra and Applications (DeKalb, IL, 1991).

[22] Carl Jagels and Lothar Reichel. The extended Krylov subspace method and orthogonal Laurent polynomials. Linear Algebra Appl., 431(3-4):441–458, 2009.

[23] Carl Jagels and Lothar Reichel. Recursion relations for the extended Krylov subspace method. Linear Algebra Appl., 434(7):1716–1732, 2011.

[24] Imad M. Jaimoukha and Ebrahim M. Kasenally. Krylov subspace methods for solving large Lyapunov equations. SIAM J. Numer. Anal., 31(1):227–251, 1994.

[25] K. Jbilou. ADI preconditioned Krylov methods for large Lyapunov matrix equations. Linear Algebra Appl., 432(10):2473–2485, 2010.

[26] K. Jbilou, A. Messaoudi, and H. Sadok. Global FOM and GMRES algorithms for matrix equations. Appl. Numer. Math., 31(1):49–63, 1999.

[27] K. Jbilou and A. J. Riquet. Projection methods for large Lyapunov matrix equations. Linear Algebra Appl., 415(2-3):344–358, 2006.

[28] Zhongxiao Jia and Yuquan Sun. A QR decomposition based solver for the least squares problems from the minimal residual method for the Sylvester equation. J. Comput. Math., 25(5):531–542, 2007.

[29] L. Knizhnerman and V. Simoncini. A new investigation of the extended Krylov subspace method for matrix function evaluations. Numer. Linear Algebra Appl., 17(4):615–638, 2010.

[30] L. Knizhnerman and V. Simoncini. Convergence analysis of the extended Krylov subspace method for the Lyapunov equation. Numer. Math., 118(3):567–586, 2011.

[31] Jing-Rebecca Li and Jacob White. Low rank solution of Lyapunov equations. SIAM J. Matrix Anal. Appl., 24(1):260–280, 2002.

[32] An Lu and E. L. Wachspress. Solution of Lyapunov equations by alternating direction implicit iteration. Comput. Math. Appl., 21(9):43–58, 1991.

[33] The MathWorks, Inc. MATLAB 7, September 2004.

[34] C. C. Paige and M. A. Saunders. Algorithm 583: LSQR: sparse linear equations and least squares problems. ACM Transactions on Mathematical Software (TOMS), 8(2):195–209, 1982.

[35] Christopher C. Paige and Michael A. Saunders. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8(1):43–71, 1982.

[36] Thilo Penzl. A cyclic low-rank Smith method for large sparse Lyapunov equations. SIAM J. Sci. Comput., 21(4):1401–1418 (electronic), 1999/00.

[37] Yu-yang Qiu, Zhen-yue Zhang, and An-ding Wang. The least squares problem of the matrix equation A_1 X_1 B_1^T + A_2 X_2 B_2^T = T. Appl. Math. J. Chinese Univ. Ser. B, 24(4):451–461, 2009.

[38] Youcef Saad. Numerical solution of large Lyapunov equations. In Signal processing, scattering and operator theory, and numerical methods (Amsterdam, 1989), volume 5 of Progr. Systems Control Theory, pages 503–511. Birkhäuser Boston, Boston, MA, 1990.

[39] Yousef Saad. Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, second edition, 2003.

[40] V. Simoncini. A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput., 29(3):1268–1288, 2007.

[41] V. Simoncini and V. Druskin. Convergence analysis of projection methods for the numerical solution of large Lyapunov equations. SIAM J. Numer. Anal., 47(2):828–843, 2009.

[42] E. L. Wachspress. Iterative solution of the Lyapunov matrix equations. Appl. Math. Lett., 1(1):87–90, 1988.