
Numerical Algorithms 1 (1991) 353-374

ESTIMATING THE LARGEST SINGULAR VALUES OF LARGE SPARSE MATRICES VIA MODIFIED MOMENTS

Michael BERRY¹,* and Gene GOLUB²,**

¹ Department of Computer Science, University of Tennessee, Knoxville, TN 37996-1301, U.S.A.
² Department of Computer Science, Stanford University, Stanford, CA 94305-2140, U.S.A.

Received 6 June 1991

We describe a procedure for determining a few of the largest singular values of a large sparse matrix. The method of Golub and Kent, which uses modified moments to estimate the eigenvalues of operators arising in iterative methods for the solution of linear systems of equations, is appropriately modified in order to generate a sequence of bidiagonal matrices whose singular values approximate those of the original sparse matrix. A simple Lanczos recursion is proposed for determining the corresponding left and right singular vectors. The potential asynchronous computation of the bidiagonal matrices via modified moments, concurrent with the iterations of an adapted Chebyshev semi-iterative (CSI) method, is an attractive feature for parallel computers. Comparisons in efficiency and accuracy with an appropriate Lanczos algorithm (with selective re-orthogonalization) are presented on large sparse (rectangular) matrices arising from applications such as information retrieval and seismic reflection tomography. This procedure is essentially motivated by the theory of moments and Gauss quadrature.

Subject classifications: 65D32, 65F15, 65F20, 65F50, 65Y05

Keywords: Chebyshev, modified moments, singular value decomposition, sparse

1. Introduction

The singular value decomposition (SVD) is commonly used in the solution of unconstrained linear least squares problems, matrix rank estimation, and canonical correlation analysis. In applications such as information retrieval, seismic reflection tomography, and real-time signal processing, the solution to these

* This author's work was supported by the National Science Foundation under grants NSF CCR-8717492 and CCR-910000N (NCSA), the U.S. Department of Energy under grant DOE DE-FG02-85ER25001, and the Air Force Office of Scientific Research under grant AFOSR-90-0044 while at the University of Illinois at Urbana-Champaign Center for Supercomputing Research and Development.

** This author's work was supported by the U.S. Army Research Office under grant DAAL03-90-G-0105, and the National Science Foundation under grant NSF DCR-8412314.

© J.C. Baltzer A.G. Scientific Publishing Company


problems is needed in the shortest possible time. Given the growing availability of multiprocessor computer systems, there has been great interest in the development of efficient implementations of the singular value decomposition, in general. In applications such as information retrieval ([1], [2], [6], and [7]) and seismic tomography ([1], [3], and [18]), the data matrix whose SVD is sought is usually large and sparse. It is this particular case that motivates our research. Before our discussion of modified moments and how they may be used to determine the SVD of a sparse matrix, we review a few of the fundamental characterizations of the SVD.

Without loss of generality, suppose A is a sparse n × p (n >> p) matrix with rank(A) = r. The singular value decomposition (SVD) of A can be defined as

A = UΣV^T,  (1)

where U^TU = V^TV = I_p and Σ = diag(σ_1, ..., σ_p), σ_i > 0 for 1 ≤ i ≤ r, σ_i = 0 for i ≥ r + 1. The first r columns of the orthogonal matrices U and V define the orthonormalized eigenvectors associated with the r nonzero eigenvalues of AA^T and A^TA, respectively. The singular values of A are defined as the diagonal elements of Σ, which are the nonnegative square roots of the p eigenvalues of A^TA. The set {u_i, σ_i, v_i} is called the i-th singular triplet. The singular vectors (triplets) corresponding to large (small) singular values are called large (small) singular vectors (triplets).

To illustrate ways in which the SVD can reveal important information about the structure of a matrix we state two well-known theorems:

THEOREM 1.1
Let the SVD of A be given by (1) and

σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_p = 0,

then
1. Rank property: rank(A) = r, N(A) = span{v_{r+1}, ..., v_p}, and R(A) = span{u_1, ..., u_r}, where U = [u_1 u_2 ... u_p] and V = [v_1 v_2 ... v_p].
2. Dyadic decomposition: A = Σ_{i=1}^{r} u_i · σ_i · v_i^T.
3. Norms: ||A||_F^2 = σ_1^2 + ... + σ_r^2, and ||A||_2 = σ_1.

The rank property, perhaps one of the most useful aspects of the SVD, allows us to use the singular values of A as quantitative measures of the qualitative notion of rank. The dyadic decomposition, which is the rationale for data reduction or compression in many applications, provides a canonical description of a matrix as a sum of r rank-one matrices of decreasing importance, as measured by the singular values. The three results in Theorem 1.1 can be combined to yield the following quantification of matrix rank deficiency (see [12] for a proof):


THEOREM 1.2 [Eckart and Young]
Let the SVD of A be given by (1) with r = rank(A) ≤ p = min(n, p) and define

A_k = Σ_{i=1}^{k} u_i · σ_i · v_i^T  with k < r,

then

min_{rank(B)≤k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + ... + σ_r^2.

This important result, which indicates that A_k is the best rank-k approximation to the matrix A, is the basis for concepts such as data reduction and image enhancement. In fact, A_k is the best approximation to A for any unitarily invariant norm ([15]). Hence,

min_{rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}.
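As a quick numerical check of Theorem 1.2 (our own illustration, not from the paper; the random test matrix and the choices of n, p, and k are arbitrary), the following Python/NumPy sketch verifies both error identities for a truncated SVD:

```python
import numpy as np

# Illustration of Theorem 1.2 (Eckart-Young) on a small dense matrix.
rng = np.random.default_rng(0)
n, p, k = 8, 5, 2
A = rng.standard_normal((n, p))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # dyadic sum of the first k triplets

# ||A - A_k||_F^2 is the sum of the squared discarded singular values,
# and ||A - A_k||_2 equals sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(s[k:]**2))
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```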

Associated with the n × p (n ≥ p) matrix A is the symmetric (n + p) × (n + p) matrix

B = ( 0    A )
    ( A^T  0 ).  (2)

If rank(A) = p, it can be easily shown that the eigenvalues of B are the p pairs ±σ_i, where σ_i is a singular value of A, with (n − p) additional zero eigenvalues if n > p. The multiplicity of the zero eigenvalue of B is n + p − 2r, where r = rank(A). The following lemma (see [4] for a proof) demonstrates how the SVD of A is generated from the eigenvalues and eigenvectors of the matrix B in (2).

LEMMA 1.3
Let A be an n × p (n ≥ p) matrix and B defined by (2).

1. For any positive eigenvalue, σ_i, of B let (u_i^T, v_i^T)^T denote a corresponding eigenvector of norm √2. Then σ_i is a singular value of A and u_i, v_i are, respectively, left and right singular vectors of A corresponding to σ_i.

2. For σ_i = 0, if B has corresponding orthogonal eigenvectors (u_j^T, v_j^T)^T with v_j ≠ 0 and u_j ≠ 0 for j = 1, ..., t for some t ≥ 1, then 0 is a singular value of the matrix A, and the corresponding left and right singular vectors can be obtained by orthogonalizing these u_j and v_j, respectively. Otherwise, A has full rank, i.e., rank(A) = p.
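Lemma 1.3 is easy to verify numerically. The following sketch (ours; dense matrices for simplicity) checks that the spectrum of B consists of the pairs ±σ_i plus n − p zeros:

```python
import numpy as np

# Eigenvalue correspondence of Lemma 1.3 on a small random example.
rng = np.random.default_rng(1)
n, p = 6, 4
A = rng.standard_normal((n, p))

B = np.block([[np.zeros((n, n)), A],
              [A.T, np.zeros((p, p))]])

sigma = np.linalg.svd(A, compute_uv=False)    # the p singular values of A
eig = np.sort(np.linalg.eigvalsh(B))          # the n + p eigenvalues of B

# Eigenvalues of B are +/- sigma_i plus (n - p) additional zeros.
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - p)]))
assert np.allclose(eig, expected)
```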

The numerical accuracy of the i-th approximate singular triplet (ũ_i, σ̃_i, ṽ_i) as determined via the eigensystem of the 2-cyclic¹ matrix B (provided A ≥ 0) is

¹ A non-negative irreducible matrix B which is 2-cyclic has 2 eigenvalues of modulus ρ(B), where ρ(B) is the spectral radius of B. See Definition 2.2 on page 35 in [22].


then determined by the norm of the eigenpair residual vector r_i defined as

||r_i||_2 = ||Bz̃_i − σ̃_iz̃_i||_2 / ||z̃_i||_2,  z̃_i = (ũ_i^T, ṽ_i^T)^T,

which can also be written as

||r_i||_2 = [ ||Aṽ_i − σ̃_iũ_i||_2^2 + ||A^Tũ_i − σ̃_iṽ_i||_2^2 ]^{1/2} / [ ||ũ_i||_2^2 + ||ṽ_i||_2^2 ]^{1/2}.  (3)

We note that the interval

|σ − σ̃_i| ≤ ||r_i||_2

contains at least one singular value σ_i of A.
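For reference, (3) translates directly into code; the helper below is our own transcription and accepts a dense or SciPy-sparse A:

```python
import numpy as np

def triplet_residual(A, u, sigma, v):
    # Residual norm (3) for an approximate singular triplet (u, sigma, v).
    top = np.hypot(np.linalg.norm(A @ v - sigma * u),
                   np.linalg.norm(A.T @ u - sigma * v))
    return top / np.sqrt(u @ u + v @ v)
```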

Alternatively, we may compute the SVD of A indirectly via the eigenpairs of either the p × p matrix A^TA or the n × n matrix AA^T. The following lemma illustrates the fundamental relations between these symmetric eigenvalue problems and the SVD.

LEMMA 1.4
Let A be an n × p (n ≥ p) matrix with rank(A) = r.

(1) If V = {v_1, v_2, ..., v_r} are linearly independent p × 1 eigenvectors of A^TA so that V^T(A^TA)V = diag(σ_1^2, σ_2^2, ..., σ_r^2), then σ_i is the i-th nonzero singular value of A corresponding to the right singular vector v_i. The corresponding left singular vector, u_i, is then obtained as u_i = (1/σ_i)Av_i.

(2) If U = {u_1, u_2, ..., u_r} are linearly independent n × 1 eigenvectors of AA^T so that U^T(AA^T)U = diag(σ_1^2, σ_2^2, ..., σ_r^2), then σ_i is the i-th nonzero singular value of A corresponding to the left singular vector u_i. The corresponding right singular vector, v_i, is then obtained as v_i = (1/σ_i)A^Tu_i.
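Part (1) of Lemma 1.4 underlies many sparse SVD computations. A minimal SciPy sketch (our choices of test matrix, k, and the eigsh eigensolver; not the LASVD code discussed later) recovers the three largest singular triplets from the eigensystem of A^TA:

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import eigsh

# Largest singular triplets via the eigensystem of A^T A (Lemma 1.4, part 1).
A = sprandom(5000, 1400, density=0.01, format='csr', random_state=2)

k = 3
vals, V = eigsh(A.T @ A, k=k, which='LA')    # k largest eigenpairs of A^T A
order = np.argsort(vals)[::-1]
sigma = np.sqrt(vals[order])                  # sigma_i = sqrt(lambda_i)
V = V[:, order]
U = (A @ V) / sigma                           # u_i = (1/sigma_i) A v_i

print(sigma)
```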

Computing the SVD of A via the eigensystems of either A^TA or AA^T may be adequate for determining the largest singular triplets of A, but the loss of accuracy can be severe for the smallest singular triplets (see [4]). Whereas the smallest and largest singular values of A are the extremes of the spectrum of A^TA or AA^T, the smallest singular values of A lie at the center of the spectrum of B in (2). For computed eigenpairs of A^TA and AA^T, the norms of the i-th eigenpair residuals (corresponding to (3)) are given by

||r_i||_2 = ||A^TAṽ_i − σ̃_i^2ṽ_i||_2 / ||ṽ_i||_2

and

||r_i||_2 = ||AA^Tũ_i − σ̃_i^2ũ_i||_2 / ||ũ_i||_2,

respectively. Thus, extremely high precision in computed eigenpairs may be necessary to compute the smallest singular triplets of A.

Having defined equivalent symmetric eigenvalue problems for the SVD problem, in Section 2 we discuss the Chebyshev Semi-Iterative (CSI) method


and its application to the eigensystem of the matrix B in (2). We then demonstrate how modified moments derived from the iteration vectors of the CSI method can be used to approximate the largest singular values of a sparse matrix A in Section 3. Section 4 presents a Lanczos-type iteration for reconstructing the appropriate left and right singular vectors once the desired singular values have been determined. In Section 5, we demonstrate the success of our hybrid SVD method with comparisons in accuracy and speed on the Cray Y-MP/4-64 for sparse matrices arising from applications in information retrieval and seismic reflection tomography. We assess our preliminary results and give concluding remarks in Section 6.

Throughout this paper, we assume that the original matrix A has been scaled so that σ_i(A) ≤ 1, where σ_i(A) is the i-th largest singular value of A. If we let E = √(||A||_1 · ||A||_∞), then ||A||_2 ≤ E, and σ_i(Â) ≤ 1, where Â = (1/E)A.
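This scaling is cheap to apply; a sketch using SciPy's sparse 1- and ∞-norms (the test matrix below is a stand-in):

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import norm as spnorm

# Scaling assumed throughout the paper: sigma_i(A_hat) <= 1, using
# ||A||_2 <= sqrt(||A||_1 * ||A||_inf).
A = sprandom(1000, 300, density=0.02, format='csr', random_state=0)

E = np.sqrt(spnorm(A, 1) * spnorm(A, np.inf))
A_hat = A / E   # now ||A_hat||_2 <= E/E = 1, so all sigma_i(A_hat) <= 1
```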

2. Chebyshev semi-iterative method

Consider the system of linear equations

Cz = 0,  (4)

where the (n + p) × (n + p) matrix C is defined by

C = ( I     −A )
    ( −A^T   I ),  (5)

and z^T = (x^T, y^T), where x and y are n × 1 and p × 1 column vectors, respectively. Let B = I − C be the 2-cyclic matrix in (2), and consider the basic iteration

z^{(k+1)} = Bz^{(k)}.  (6)

If we apply the Chebyshev semi-iterative (CSI) method (see [13]), then (6) can be written as

ξ^{(k)} = p_k(B)ξ^{(0)},  (7)

where p_k(B) is a polynomial of degree k in B, and ξ^{(k)} is an (n + p) × 1 column vector. Using (7) with the method of modified moments (presented in the succeeding section) we can estimate the eigenvalues of B corresponding to the largest singular values of A.

The basic algorithm has the form

ξ^{(k+1)} = ξ^{(k−1)} + ω_{k+1}(ξ^{(k)} − ξ^{(k−1)} − Cξ^{(k)}),  (8)

where

ω_1 = 1,  ω_2 = (1 − ½μ^2)^{−1},  ω_{k+1} = (1 − ¼μ^2ω_k)^{−1},  k = 2, 3, ...,


and 0 < μ ≤ 1. Exploiting the form of C in (5), and partitioning ξ^{(k)} in the form

ξ^{(k)} = ( x^{(k)} )
          ( y^{(k)} ),

where x^{(k)} and y^{(k)} are n × 1 and p × 1 column vectors, respectively, the right-hand side of (8) may be written as

( x^{(k−1)} + ω_{k+1}(Ay^{(k)} − x^{(k−1)}) )
( y^{(k−1)} + ω_{k+1}(A^Tx^{(k)} − y^{(k−1)}) ).

If we assume the initial vector ξ^{(0)} is chosen such that

ξ^{(0)} = ( x^{(0)} )
          (   0    ),  (9)

where ||x^{(0)}||_2 = 1/2, it then follows that (8) can be cast as the coupled equations

x^{(2m)} = x^{(2m−2)} + ω_{2m}(Ay^{(2m−1)} − x^{(2m−2)}),
y^{(2m+1)} = y^{(2m−1)} + ω_{2m+1}(A^Tx^{(2m)} − y^{(2m−1)}),  (10)

where

ξ^{(2m)} = ( x^{(2m)} ),   ξ^{(2m+1)} = (     0      ),  (11)
           (    0    )                  ( y^{(2m+1)} )

so that the inner product (ξ^{(2m)}, ξ^{(2m+1)}) = 0. Similar to the Lanczos algorithm for the symmetric eigenvalue problem (see

[16]), we construct a sequence of bidiagonal matrices J_k whose largest singular values (and corresponding left and right singular vectors) approximate those of the original sparse matrix A. Such a sequence can be constructed via the modified moments (see [9])

ν_k = (ξ^{(0)}, ξ^{(k)}) = (ξ^{(0)}, p_k(B)ξ^{(0)}).  (12)
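The coupled iteration (10) and the inner products needed later for the moments can be sketched as follows (our illustrative transcription; the test matrix, μ, and m_max are arbitrary, and A is pre-scaled as in Section 1):

```python
import numpy as np
from scipy.sparse import random as sprandom

# Coupled CSI iteration (10), collecting the inner products
# (x^(2k), x^(2k)) and (y^(2k+1), y^(2k+1)) used by the moments (19).
rng = np.random.default_rng(3)
A = sprandom(2000, 500, density=0.02, format='csr', random_state=rng)
A = A / np.sqrt(abs(A).sum(axis=0).max() * abs(A).sum(axis=1).max())

mu, m_max = 0.99, 20
n, p = A.shape
x = rng.standard_normal(n)
x *= 0.5 / np.linalg.norm(x)          # ||x^(0)||_2 = 1/2, so nu_0 = 1/4
y = np.zeros(p)

xx = [x @ x]                          # xx[k] = (x^(2k), x^(2k)), starting at k = 0
yy = []                               # yy[k] = (y^(2k+1), y^(2k+1))
omega = 1.0                           # omega_1
for k in range(1, 2 * m_max):         # step k produces xi^(k)
    if k == 2:
        omega = 1.0 / (1.0 - 0.5 * mu**2)           # omega_2
    elif k > 2:
        omega = 1.0 / (1.0 - 0.25 * mu**2 * omega)  # omega_k recurrence
    if k % 2:                         # odd k: second equation of (10)
        y = y + omega * (A.T @ x - y)
        yy.append(y @ y)
    else:                             # even k: first equation of (10)
        x = x + omega * (A @ y - x)
        xx.append(x @ x)
```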

3. Modified moments

If r = rank(A) and Q = [q_1, q_2, ..., q_r] is an orthogonal matrix whose i-th column, q_i, is an eigenvector of B corresponding to the i-th largest (positive) eigenvalue of B, λ_i, then following [9] we can write

ξ^{(0)} = Σ_{i=1}^{r} α_i q_i.


From (7), it follows that

ξ^{(k)} = Σ_{i=1}^{r} α_i p_k(σ_i)q_i = Σ_{i=1}^{r} α_i T_k(σ_i)q_i,

since λ_i = σ_i (of course, −σ_i is also an eigenvalue of B) and p_k(B) = T_k(B/μ)/T_k(1/μ), the scaled Chebyshev polynomial (first kind) of degree k, which we continue to write simply as T_k (for μ = 1 it reduces to T_k(B)). We then form the inner product

(ξ^{(k)}, ξ^{(l)}) = Σ_{i=1}^{r} α_i^2 T_k(σ_i)T_l(σ_i) = ∫ T_k(σ)T_l(σ) dα(σ),  (13)

where α(σ) is a discrete non-negative distribution with jumps of height α_i^2 at each singular value σ_i. Associated with the distribution α(σ) is a set of discrete orthogonal polynomials {ψ_k(σ)}_{k=0}^{r} such that

∫ ψ_i(σ)ψ_j(σ) dα(σ) = 0 when i ≠ j.

The final orthogonal polynomial ψ_r(σ) has a zero at each point of the distribution, i.e., ψ_r(σ_i) = 0, i = 1, 2, ..., r. For each iteration of (10), we calculate modified moments of the distribution α(σ):

ν_{4k} = (ξ^{(4k)}, ξ^{(0)}) = ∫ T_{4k}(σ) dα(σ),
ν_{4k+2} = (ξ^{(4k+2)}, ξ^{(0)}) = ∫ T_{4k+2}(σ) dα(σ),

which are used to compute the recurrence relationship for the orthogonal polynomials {ψ_k(σ)}.

Following [9], let

σT_j(σ) = b_jT_{j+1}(σ) + a_jT_j(σ) + c_jT_{j−1}(σ),
σψ_j(σ) = γ_{j+1}ψ_{j+1}(σ) + α_jψ_j(σ) + γ_jψ_{j−1}(σ),  (14)

and define

η_{kl} = ∫ ψ_k(σ)T_l(σ) dα(σ).

The modified Chebyshev algorithm begins with

η_{−1,l} = 0,  l = −1, 0, ..., 4m − 1,
η_{0,l} = ν_l,  l = 0, 1, ..., 4m − 2,

and

α_0 = a_0 + b_0(η_{01}/η_{00}),  γ_0 = 0.

Then, for k = 1, 2, ..., 2m − 1,

η_{kl} = b_lη_{k−1,l+1} + (a_l − α_{k−1})η_{k−1,l} + c_lη_{k−1,l−1} − γ_{k−1}η_{k−2,l},  (15)

for l = k, k + 1, ..., 2(2m − 1) − k,

α_k = a_k + b_k(η_{k,k+1}/η_{kk}) − b_{k−1}(η_{k−1,k}/η_{k−1,k−1}),


and

γ_k = b_{k−1}(η_{kk}/η_{k−1,k−1}).

For the Chebyshev polynomials, T_j(σ), we choose

b_j = 1/ω_{j+1},  a_j = 0,  and  c_j = (ω_{j+1} − 1)/ω_{j+1} = μ^2ω_j/4.

We note that c_1 = μ^2ω_1/2 = μ^2/2. From (18) we have

α_0 = b_0(ν_1/ν_0) = 0,

and

η_{k,k+2i−1} = 0,  for i = 1, 2, ..., 2m − k − 1.

Hence, α_k = 0 for k = 1, 2, ..., 2m − 1 and (15) may be rewritten as

η_{kl} = b_lη_{k−1,l+1} + c_lη_{k−1,l−1} − γ_{k−1}η_{k−2,l},  (16)

for l = k, k + 1, ..., 2(2m − 1) − k.

In the absence of roundoff, this computational procedure yields the same tridiagonal matrix as the Lanczos algorithm ([14]) for determining the eigenvalues of the 2-cyclic matrix B in (2). The extreme singular values of the bidiagonal matrix J_k of order k (k > 1) formed by the square roots of the γ_j's in (14),

J_k = bidiag(√γ_1, √γ_3, ..., √γ_{2k−1}; √γ_2, √γ_4, ..., √γ_{2k−2}),  (17)

i.e., the upper bidiagonal matrix with diagonal entries √γ_{2i−1} and superdiagonal entries √γ_{2i}, will, in general, provide good approximations to the extreme singular values of the original matrix A as k increases ([11]). This procedure terminates when either

(i) η_{kk} ≤ 0, for some k < 2m − 1, after m CSI steps (10), or
(ii) m = m_max, where m_max is the maximum number of CSI steps allowed, or
(iii) |σ̃_i^{(m)} − σ̃_i^{(m−1)}|/σ̃_i^{(m)} ≤ τ, where τ is a user-specified lower bound for the residual error in the convergence of the i-th singular value of J_m, σ̃_i^{(m)}, to an exact singular value of A, σ_i, after m CSI steps.
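The recurrences above translate into a short program. The sketch below is our reading of (14)-(17) — in particular of the index ranges and of the bidiagonal assembly in (17) — and is illustrative rather than the authors' implementation:

```python
import numpy as np

def modified_chebyshev_gammas(nu, mu, m):
    # Simplified modified Chebyshev recurrences (16): from the moments
    # nu[l] = nu_l, l = 0, ..., 4m-2 (odd entries zero by (18)),
    # recover gamma_1, ..., gamma_{2m-1}.
    L = 4 * m - 2
    omega = np.empty(L + 2)                  # holds omega_1, ..., omega_{L+1}
    omega[1] = 1.0
    omega[2] = 1.0 / (1.0 - 0.5 * mu**2)
    for j in range(3, L + 2):
        omega[j] = 1.0 / (1.0 - 0.25 * mu**2 * omega[j - 1])
    b = 1.0 / omega[1:]                      # b_l = 1/omega_{l+1}
    c = 1.0 - b                              # c_l = (omega_{l+1} - 1)/omega_{l+1}

    eta_km2 = np.zeros(L + 1)                # eta_{-1, l} = 0
    eta_km1 = np.asarray(nu, dtype=float)    # eta_{0, l} = nu_l
    gamma = [0.0]                            # gamma_0 = 0
    for k in range(1, 2 * m):                # k = 1, ..., 2m - 1
        eta = np.zeros(L + 1)
        for l in range(k, 2 * (2 * m - 1) - k + 1):
            eta[l] = b[l] * eta_km1[l + 1] + c[l] * eta_km1[l - 1] \
                     - gamma[k - 1] * eta_km2[l]
        if eta[k] <= 0.0:                    # termination criterion (i)
            break
        gamma.append(b[k - 1] * eta[k] / eta_km1[k - 1])  # gamma_k
        eta_km2, eta_km1 = eta_km1, eta
    return gamma[1:]

def bidiagonal_singular_values(gamma):
    # J_k of (17): diagonal sqrt(gamma_{2i-1}), superdiagonal sqrt(gamma_{2i});
    # its largest singular values estimate the largest sigma_i(A).
    g = np.sqrt(gamma)
    g = g[: 2 * ((len(g) + 1) // 2) - 1]     # keep an odd count so J is square
    J = np.diag(g[0::2]) + np.diag(g[1::2], 1)
    return np.linalg.svd(J, compute_uv=False)
```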

We can use a simple recursion to calculate the moments ν_k, k = 0, 1, 2, ..., 4m − 2, defined by (12). From (11), it follows that

ν_{2k+1} = (ξ^{(k)}, ξ^{(k+1)}) = 0,  k = 0, 1, ...,  (18)

and ν_0 = 1/4, since ||x^{(0)}||_2 = 1/2. Now, for k = 1, 2, ..., m − 1 we need only determine

ν_{4k} = (x^{(2k)}, x^{(2k)}) + (1/T_{4k}(1/μ))[(x^{(2k)}, x^{(2k)}) − ν_0]  (19)


and

ν_{4k+2} = (y^{(2k+1)}, y^{(2k+1)}) + (1/T_{4k+2}(1/μ))[(y^{(2k+1)}, y^{(2k+1)}) − ν_0],

where the Chebyshev polynomials T_k satisfy the three-term recurrence

T_{k+1}(1/μ) = (2/μ)T_k(1/μ) − T_{k−1}(1/μ),  (20)

and T_0(1/μ) = 1, T_1(1/μ) = 1/μ.

We particularly note that the parameter μ in (19) and (20) plays an important role in our method. If we desire to treat all singular values equally, we choose μ = 1. However, if we want to suppress (or damp out) all singular values, σ_i, having magnitudes less than a threshold μ̄, we set μ = μ̄.
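Assembling the moments from the CSI inner products per (18)-(20) can be sketched as follows (our transcription; the k = 0 case of the second formula supplies ν_2 by the same identity):

```python
import numpy as np

def moments_from_iterates(xx, yy, mu, m):
    # Sketch of (18)-(20): nu_0 = 1/4, odd moments vanish, and the even
    # moments come from xx[k] = (x^(2k), x^(2k)) and yy[k] = (y^(2k+1), y^(2k+1)).
    nu = np.zeros(4 * m - 1)
    nu[0] = 0.25                              # since ||x^(0)||_2 = 1/2
    T = np.empty(4 * m - 1)                   # T[j] = T_j(1/mu) via (20)
    T[0], T[1] = 1.0, 1.0 / mu
    for j in range(2, 4 * m - 1):
        T[j] = (2.0 / mu) * T[j - 1] - T[j - 2]
    for k in range(1, m):
        nu[4 * k] = xx[k] + (xx[k] - nu[0]) / T[4 * k]            # eq. (19)
    for k in range(m):                        # k = 0 supplies nu_2 the same way
        nu[4 * k + 2] = yy[k] + (yy[k] - nu[0]) / T[4 * k + 2]
    return nu                                 # odd entries stay 0 by (18)
```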

4. Singular vector recursion

After m steps of the CSI method (10), we can approximate the largest singular values of the n × p matrix A by computing the SVD of the m × m bidiagonal matrix J_m,

J_m = P_m Σ̃_m Q_m^T,

where the columns of P_m, Q_m are the left and right singular vectors (respectively) of J_m, and Σ̃_m is a diagonal matrix whose elements, σ̃_i, approximate singular values of A.

To compute the corresponding left and right singular vectors of the matrix A, we rely upon a Lanczos-type recursion ([12]) defined by

AY = XJ_m,  A^TX = YJ_m^T,  (21)

where X, Y are n × m and p × m matrices, respectively, having orthogonal columns. Constructing such matrices X and Y coupled with the SVD of J_m will yield our desired approximate singular vectors, i.e.,

A ≈ (XP_m)Σ̃_m(YQ_m)^T,

so that the columns of Ũ = XP_m and Ṽ = YQ_m are the desired approximate left and right singular vectors of A, respectively.

If we define X = [x_1, x_2, ..., x_m] and Y = [y_1, y_2, ..., y_m], then from (17) and (21) we necessarily have

x_1 = (1/√γ_1)Ay_1,

where y_1 = ỹ_1/||ỹ_1||_2 (i.e., y_1 is a normalized p × 1 vector). For k = 1, 2, ..., m − 1, we then determine

y_{k+1} = (1/√γ_{2k})(A^Tx_k − √γ_{2k−1}y_k)

and

x_{k+1} = (1/√γ_{2k+1})(Ay_{k+1} − √γ_{2k}x_k).
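Under our reading of (17) and (21), this recursion can be sketched as follows (g[j] holds √γ_{j+1} from the modified Chebyshev algorithm; y1 is the normalized starting vector ỹ_1, whose provenance the text leaves open):

```python
import numpy as np

def recover_singular_vectors(A, g, y1):
    # Lanczos-type recursion (21): rebuild X (n x m) and Y (p x m) from the
    # square roots of the gammas, assuming the bidiagonal structure (17).
    n, p = A.shape
    m = (len(g) + 1) // 2
    X, Y = np.zeros((n, m)), np.zeros((p, m))
    Y[:, 0] = y1 / np.linalg.norm(y1)
    X[:, 0] = (A @ Y[:, 0]) / g[0]                      # A y_1 = sqrt(gamma_1) x_1
    for k in range(1, m):
        Y[:, k] = (A.T @ X[:, k-1] - g[2*k-2] * Y[:, k-1]) / g[2*k-1]
        X[:, k] = (A @ Y[:, k] - g[2*k-1] * X[:, k-1]) / g[2*k]
    # With P_m, Q_m from np.linalg.svd(J_m), the approximate singular
    # vectors are U = X @ P_m and V = Y @ Q_m.
    return X, Y
```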

[Fig. 1. Construction of the 2-D η-array for m = 1, 2, 3.]

5. Performance of hybrid SVD method

To assess the merits of our hybrid SVD method with regard to speed and accuracy on a Cray Y-MP/4-64 supercomputer², we consider the SVD of large sparse matrices arising from two application areas: query-based information retrieval and seismic reflection tomography. Below, we briefly discuss the need for approximating the largest singular values and corresponding singular vectors of large sparse matrices by researchers in the physical and social sciences.

5.1. LATENT SEMANTIC INDEXING

In [1], [6], and [7] a new approach to automatic indexing and retrieval is discussed. It is designed to overcome a fundamental problem that plagues existing information retrieval techniques that try to match words of queries with words of documents. The problem is that users want to retrieve on the basis of conceptual topic or meaning of a document. There are usually many ways to

² Cray Y-MP/4-64 supercomputer located at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.


Table 1
Sparse term-document matrix specifications

Data      Columns   Rows     Nonzeros   Density   μ_c     μ_r
CISI      1460      5143     66340      0.88      45.4    12.9
CRAN      1400      4997     78942      1.10      56.4    15.8
MED       1033      5831     52012      0.86      50.4    8.9
TIME      425       10337    80888      1.80      190.3   7.8
TECH      6535      16637    32744      0.30      50.0    20.0
NYTIMES   19660     35796    1879480    0.27      95.6    50.0

express a given concept (synonymy), so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents.

The proposed latent semantic indexing (LSI) approach tries to overcome the problems of word-based access by treating the observed word to text-object association data as an unreliable estimate of the true, larger pool of words that could have been associated with each object. It is assumed there is some underlying latent semantic structure³ in word usage data that is partially obscured by the variability of word choice. Using the SVD defined in (1), we can estimate this latent structure and remove the obscuring noise.

Specifically, for an n × p term-document matrix A whose n rows and p columns (n >> p) correspond to terms and documents, respectively, we seek the rank-k (k << p) matrix

A_k = Σ_{i=1}^{k} u_i · σ_i · v_i^T,  (22)

given by Theorem 1.2. The idea is that the matrix A_k captures the major associational structure in the matrix and removes the noise. Since relatively few terms are used as referents to a given document, the rectangular matrix A = [a_ij] is quite sparse. The matrix element a_ij indicates the frequency with which term i occurs in document j. Depending upon the size of the database from which the term-document matrix is generated, the matrix A can have several thousand rows and slightly fewer columns. Table 1 lists a few statistics of six large sparse term-document matrices that have been generated⁴. We note that μ_r and μ_c are the average number of nonzeros per row and column, respectively.

³ Semantic structure refers to the correlation structure in the way in which individual words appear in documents; semantic implies only the fact that terms in a document may be taken as referents to the document itself or to its topic.

⁴ Special thanks to Sue Dumais from Bell Communications Research (Bellcore), Morristown, NJ, for providing the various sparse matrices from Latent Semantic Indexing (LSI) studies.


By using the reduced model in (22), usually with 100 < k ≤ 200, minor differences in terminology are virtually ignored. Moreover, the closeness of objects is determined by the overall pattern of term usage, so documents can be classified together regardless of the precise words that are used to describe them, and their description depends on a consensus of their term meanings, thus dampening the effects of polysemy. As a result, terms that do not actually appear in a document may still be used as referents, if that is consistent with the major patterns of association in the data. Position in the reduced space (R(A_k)) then serves as a new kind of semantic indexing.

As demonstrated in [1] and [6], LSI using the sparse SVD is more robust and economical than straight term overlap methods. However, in practice, one must compute at least the 100-200 largest singular values and corresponding singular vectors of sparse matrices having characteristics similar to those of the matrices in Table 1. In addition, it is not necessarily the case that rank(A) = p for the n × p term-document matrix A; this is due to errors caused by term extraction, spelling, or duplication of documents. Regarding the numerical precision of the desired singular triplets for LSI, recent tests using a few of the matrices listed in Table 1 have revealed that, for the i-th approximate singular triplet {ũ_i, σ̃_i, ṽ_i}, a residual accuracy of

10^{-6} ≤ ||Aṽ_i − σ̃_iũ_i||_2 ≤ 10^{-3}

will suffice. Finally, as the desire for using LSI on larger and larger databases or archives grows, fast algorithms for computing the sparse singular value decomposition will become of paramount importance.

5.2. SEISMIC TOMOGRAPHY

Seismic travel tomography is another application area in which the sparse SVD problem arises. In recent literature (see [3], [18], and [19]), great interest has been expressed in the ability to approximate some of the largest singular values and corresponding singular vectors of large unstructured sparse matrices.

In this particular application, the sparse SVD problem arises from the solution of nonlinear inverse problems associated with the approximation of acoustic or elastic wavespeed from travel times. Specifically, the travel times (data) are related to the wavespeed (model parameters) via

t(r) = ∫_r s(x, y, z) dl,  (23)

where x, y, and z are spatial coordinates, dl is the distance (differential) along the ray r, and s(x, y, z) = 1/v(x, y, z) is the slowness (reciprocal of velocity) at the point (x, y, z). For large 2D problems, the travel times t(r), extracted from the original seismograms, can involve up to O(10^5) rays. The ray path depends on the slowness (unknown) and thus (23) must be linearized about some initial or reference slowness model. Discretization of the slowness by cells


Table 2
Sparse Jacobian matrix specifications from seismic tomography

Data     Columns   Rows    Nonzeros   Density   μ_c     μ_r
AMOCO1   330       1436    35210      7.43      106.7   24.5
AMOCO2   8754      9855    1159116    1.34      132.4   117.6

or finite elements, within which the slowness is assumed to be constant, allows the linearized integral to be approximated as a sum. The resulting overdetermined system of linear equations for the unknown slowness perturbation values is

DΔs = Δt,  (24)

where the components of Δt are the differences between the travel times computed for the model and the observed times, the components of Δs are the differences between the initial and updated model, and D is the Jacobian matrix whose (i, j) element is the distance the i-th ray travels in the j-th cell. For 2D problems, the matrix D in (24) is generally large (up to order O(10^5)) and sparse. Table 2 lists the statistics for two such Jacobian matrices, D, arising from seismic reflection tomography research⁵.

As discussed in [18], the linear least squares solution for (24) is usually determined using the pseudo-inverse

D† = V_kΣ_k^{−1}U_k^T,

where k = rank(D), Σ_k = diag(σ_1, σ_2, ..., σ_k), and U_k^TU_k = V_k^TV_k = I_k, so that

U_k^TDV_k = Σ_k

is the SVD of the matrix D. It has been observed ([18]) that perturbations or errors comparable to the smallest singular values may introduce large changes in the least squares solution,

Δs_{ls} = D†Δt,

which may become quite rough (non-smooth) for noisy singular inverse problems. Further, the singular vectors corresponding to the smallest singular values may control the high-spatial-frequency aspects of the solution.

As discussed in [1], the poorly resolved features of the inverse problem are controlled by the smallest singular triplets, since these features are in the approximate null space of the Jacobian matrix, N(D). Since long-wavelength rather than short-wavelength trends are usually observed from seismic travel times, inversion for velocity trends alone using travel times t(r) will force the vectors of the smallest singular triplets to be high frequency. However, as

⁵ Special thanks to John A. Scales from Amoco Research Center, Tulsa, OK, for providing sample reflection data matrices.


discussed in [18] and [19], inversion for subsurface velocity (v(x, y, z)) and depths-to-reflectors from travel times involves a fundamental trade-off between velocity and reflector position: increasing (decreasing) the velocity above a reflector while simultaneously increasing (decreasing) its position does not affect the travel times. In this case, the singular vectors corresponding to the smallest singular values control the velocity-reflector depth trade-off. If these singular values are at or below the data noise level, the long-wavelength velocity-reflector depth trade-off is unresolvable. Consequently, researchers in seismic reflection tomography can assess trade-offs in model parameters using sparse SVD methods which approximate singular triplets above specified quantities (noise level).

5.3. EXPERIMENTS

We now demonstrate the performance of the method of modified moments presented in Section 3 for computing a few of the largest singular values of the sparse matrices in Tables 1 and 2 on a Cray Y-MP/4-64 supercomputer. As discussed in [1], a single-vector Lanczos algorithm has been demonstrated to be the fastest method among Lanczos and subspace-iteration based methods for computing several of the largest singular values and corresponding vectors of large sparse matrices (Tables 1 and 2). This particular Lanczos SVD scheme employs a selective reorthogonalization strategy (see [17], [20]) and determines the largest singular triplets as eigenpairs of A^TA according to Lemma 1.4 (see [1]). For clarification purposes, LASVD and MMSVD denote the single-vector Lanczos SVD method and our modified moments SVD method, respectively.

In the following experiments, we approximate the three largest singular values, {σ_1, σ_2, σ_3}, of the n × p matrix A, where

σ_1 ≥ σ_2 ≥ ... ≥ σ_p,

and A is one of the sparse matrices listed in Tables 1 and 2. Several of the largest singular triplets of these matrices have already been determined ([1]) by LASVD so that

||r_i||_2 ≤ 10^{-6},

where ||r_i||_2 is the residual norm given in (3). We assess the accuracy of our hybrid method, MMSVD, by the difference

d_i = |σ̃_i^M − σ̃_i^L|,  i = 1, 2, 3,  (25)

where σ̃_i^M and σ̃_i^L are the approximations to σ_i by MMSVD and LASVD, respectively. Table 3 lists the three largest singular values (σ̃_i^L, i = 1, 2, 3) for each of the sparse matrices in Tables 1 and 2. For MMSVD, we set m_max = 20, where m_max is the maximum number of Chebyshev steps, and define τ = 10^{-6}, where τ is the lower bound for the residual error in any σ̃_i^M (see Section 3).


Table 3
The three largest singular values of the matrices in Tables 1 and 2 as determined by LASVD on the Cray Y-MP/4-64

Matrix    σ̃_1               σ̃_2               σ̃_3
AMOCO1    23.7630524172      19.7682890658      16.5819300424
CISI      33.8000293446      21.0610875033      18.3848409765
MED       35.9639240492      29.0816318619      26.1893678962
CRAN      51.7216091924      32.5492649037      30.4925801984
TECH      54.2676842963      37.9635161112      36.8817304775
TIME      58.0774731128      46.7912355145      42.6990705083
NYTIMES   702.7858027361     179.4845080951     160.1175339729
AMOCO2    14354.5922858634   1145.81331917319   861.2512871199

For each sparse matrix considered (Tables 1 and 2) the nonzeros are stored in computer memory using the column-wise Harwell/Boeing sparse matrix format ([5]). In particular, we store the sparse matrix A only and access A^T through the same data structure. With regard to parallelism, only compiler-generated (autotasking) parallelization of MMSVD and LASVD was used on the Cray Y-MP/4-64, which has 4 CPUs and 64 million words (64 bits per word) of core memory. Most of our experiments were conducted in a multi-user environment. Consequently, the average number of concurrent CPUs dedicated to each experiment ranged only from 1 to 1.5. We do note, however, that the Chebyshev steps (10), the assembly of the bidiagonal matrices J_k (17), and the computation of singular values of J_k can execute in parallel within MMSVD. In other words, after m Chebyshev steps we can assemble J_{2m−1} and be computing the singular values of J_{2m−1} while subsequent Chebyshev steps (m + 1, m + 2, ...) are executed. The asynchronous computation of Chebyshev steps and moment calculation coupled with an appropriate QR method (see [12]) for computing singular values of bidiagonal matrices is an attractive feature of MMSVD that can be exploited on a multiprocessor architecture.

In Tables 4 and 5, we illustrate the performance of MMSVD as compared to that of LASVD on the Cray Y-MP/4-64, when approximating the three largest singular values (see Table 3) of the sparse matrices in Tables 1 and 2 to single precision (six or more decimal digits) accuracy. In these tables, the elapsed user CPU time in seconds, the total number of sparse matrix multiplications by A or A^T (SpMxV), and the number of rows (n) and columns (p) of each sparse matrix A are indicated. The number of Lanczos steps required by LASVD is given by l, and the number of Chebyshev steps required by MMSVD per choice of μ (the spectrum damping parameter in (19) and (20)) is given by m.

Among our six choices of μ (0.99, 0.95, 0.90, 0.75, 0.50, 0.25), we observe that the speed improvement (ratio of CPU times) of MMSVD relative to LASVD ranges from approximately 1.9 to 6.4, and the ratio of total numbers of sparse matrix multiplications by a vector (SpMxV) ranges from 2.2 to 6.5.


Table 4
Performance of MMSVD and LASVD when computing the 3 largest singular values of the matrices in Tables 1 and 2 on the Cray Y-MP/4-64

                      MMSVD                          LASVD
Matrix (n, p)         m    μ     SpMxV   CPU (s)     l    SpMxV   CPU (s)
AMOCO1 (1436, 330)    11   0.99  21      0.09        22   46      0.18
                      11   0.95  21      0.09
                      11   0.90  21      0.09
                      11   0.75  21      0.09
                       9   0.50  17      0.08
                       8   0.25  15      0.06
CISI (5143, 1460)      9   0.99  17      0.15        31   64      0.56
                       9   0.95  17      0.15
                       9   0.90  17      0.15
                       9   0.75  17      0.15
                       7   0.50  13      0.11
                       7   0.25  13      0.11
MED (5831, 1033)      12   0.99  23      0.16        31   64      0.43
                      12   0.95  23      0.16
                      12   0.90  23      0.16
                      11   0.75  21      0.15
                       9   0.50  17      0.12
                       8   0.25  15      0.10
CRAN (4997, 1400)      9   0.99  17      0.17        22   46      0.45
                       9   0.95  17      0.17
                       9   0.90  17      0.17
                       9   0.75  17      0.17
                       8   0.50  15      0.15
                       7   0.25  13      0.13

The larger speed improvements are typically obtained for the smaller choices of μ (0.50, 0.25), which reflect smaller numbers of Chebyshev steps prior to termination (see the criteria in Section 3) and minimal damping of unwanted singular values. In Table 6 we indicate the average cost (in CPU time) per iteration, l and m, for LASVD and for MMSVD with μ = 0.99 and 0.25, respectively. Essentially, the cost per iteration for the two methods (irrespective of the choice of μ for MMSVD) is about the same across our test suite of sparse matrices. Hence, the difference in overall cost is attributed to the 2(l + 1) and 2m − 1 sparse matrix multiplications by a vector required by LASVD and MMSVD, respectively.

To assess the quality of the singular value approximations determined by MMSVD, Tables 7 and 8 list

⌊−log_10(d_i)⌋,  i = 1, 2, 3,


Table 5
Performance of MMSVD and LASVD when computing the 3 largest singular values of the matrices in Tables 1 and 2 on the Cray Y-MP/4-64

                        MMSVD                          LASVD
Matrix (n, p)           m    μ     SpMxV   CPU (s)     l    SpMxV   CPU (s)
TIME (10337, 425)       12   0.99  23      0.20        22   46      0.38
                        12   0.95  23      0.20
                        12   0.90  23      0.20
                        10   0.75  19      0.16
                         9   0.50  17      0.14
                         8   0.25  15      0.13
TECH (16637, 6535)      10   0.99  19      0.76        31   64      2.67
                        10   0.95  19      0.77
                        10   0.90  19      0.77
                         9   0.75  17      0.69
                         8   0.50  15      0.61
                         7   0.25  13      0.52
AMOCO2 (9855, 8754)      4   0.99   7      0.77        19   40      4.92
                         4   0.95   7      0.78
                         4   0.90   7      0.77
                         4   0.75   7      0.77
                         4   0.50   7      0.77
                         4   0.25   7      0.78
NYTIMES (35796, 19660)   8   0.99  15      3.73        22   46      10.9
                         5   0.95   9      2.27
                         5   0.90   9      2.27
                         5   0.75   9      2.27
                         5   0.50   9      2.27
                         4   0.25   7      1.79

Table 6
Average cost in CPU seconds per Lanczos and Chebyshev step of LASVD and MMSVD, respectively, when approximating the three largest singular values of the sparse matrices in Tables 1 and 2 on the Cray Y-MP/4-64

Matrix     MMSVD (μ = 0.99)   MMSVD (μ = 0.25)   LASVD
AMOCO1     0.008              0.008              0.008
CISI       0.016              0.016              0.018
MED        0.013              0.013              0.014
CRAN       0.019              0.019              0.020
TIME       0.017              0.016              0.017
TECH       0.076              0.074              0.086
AMOCO2     0.193              0.195              0.259
NYTIMES    0.466              0.447              0.495


Table 7
Numerical accuracy of MMSVD compared to LASVD when approximating the 3 largest singular values of the matrices in Tables 1 and 2 on the Cray Y-MP/4-64

Matrix (n, p)         m    μ     ⌊−log_10(d_1)⌋   ⌊−log_10(d_2)⌋   ⌊−log_10(d_3)⌋
AMOCO1 (1436, 330)    11   0.99  14               12               6
                      11   0.95  13               11               6
                      11   0.90  14               11               6
                      11   0.75  13               10               7
                       9   0.50  12                8               5
                       8   0.25  10                6               3
CISI (5143, 1460)      9   0.99  14                5               3
                       9   0.95  13                4               2
                       9   0.90  13                4               2
                       9   0.75  13                5               3
                       7   0.50  12                3               1
                       7   0.25  11                3               1
MED (5831, 1033)      12   0.99  13                6               3
                      12   0.95  13                6               3
                      12   0.90  13                6               3
                      11   0.75  13                5               2
                       9   0.50   9                4               2
                       8   0.25   9                3               1
CRAN (4997, 1400)      9   0.99  13                4               4
                       9   0.95  13                4               3
                       9   0.90  13                4               3
                       9   0.75  13                4               3
                       8   0.50  11                2               1
                       7   0.25  11                2               0

where d_i is given by (25). For approximations to the largest singular value, σ_1, we note that MMSVD produces approximations equivalent to those of LASVD in a range of 9 to 14 decimal digits. For the choice μ = 0.99, the average number of decimal digits in agreement is 13.25, while the average is about 10.25 for the choice μ = 0.25. For approximations to the second largest singular value, σ_2, MMSVD produces approximations equivalent to those of LASVD in a range of 0 to 12 decimal digits. An average of 6.125 decimal digits can be achieved for μ = 0.99, while the choice μ = 0.25 yields only an average of 3.125 decimal digits. For the third largest singular value, σ_3, MMSVD with μ = 0.99 produces approximations within 3.125 decimal digits of those of LASVD, while the choice μ = 0.25 averages only 1.25 decimal digits. For the matrices AMOCO2 and NYTIMES, in particular, the approximations to σ_3 produced by MMSVD (for all 6 choices of μ) are not competitive with those produced by LASVD. Among the 8 matrices in Tables 1 and 2, we can ensure that

⌊−log_10(d_i)⌋ ≥ 6,


Table 8
Numerical accuracy of MMSVD compared to LASVD when approximating the 3 largest singular values of the matrices in Tables 1 and 2 on the Cray Y-MP/4-64

Matrix (n, p)           m    μ     ⌊−log_10(d_1)⌋   ⌊−log_10(d_2)⌋   ⌊−log_10(d_3)⌋
TIME (10337, 425)       12   0.99  14                9               5
                        12   0.95  12                8               4
                        12   0.90  12                8               5
                        10   0.75  12                7               3
                         9   0.50  10                4               2
                         8   0.25  10                4               2
TECH (16637, 6535)      10   0.99  14                5               4
                        10   0.95  14                5               4
                        10   0.90  12                5               3
                         9   0.75  12                4               2
                         8   0.50  11                4               2
                         7   0.25  10                3               1
AMOCO2 (9855, 8754)      4   0.99  12                4               0
                         4   0.95  11                4               0
                         4   0.90  11                4               0
                         4   0.75  11                4               0
                         4   0.50  11                4               0
                         4   0.25  11                4               0
NYTIMES (35796, 19660)   8   0.99  12                4               0
                         5   0.95  12                3               0
                         5   0.90  12                3               0
                         5   0.75  13                2               0
                         5   0.50  12                2               0
                         4   0.25  10                0               2

for all 8 matrices when i = 1, for 3 matrices when i = 2, and for only 1 matrix when i = 3. Hence, we have demonstrated that MMSVD can be extremely efficient for approximating two of the largest singular values of the large sparse matrices in Tables 1 and 2.

6. Conclusions

We have demonstrated a procedure, MMSVD, for determining a few of the largest singular values of a large sparse matrix using the method of modified moments. As a Lanczos-like method, this scheme generates a sequence of bidiagonal matrices whose singular values approximate those of the original sparse matrix. Although we have used MMSVD for approximating singular values only (no singular vectors), we have proposed a simple Lanczos recursion that could be used to recover the corresponding left and right singular vectors. The potential asynchronous computation of the bidiagonal matrices using modified


moments with the iterations of an adapted Chebyshev semi-iterative (CSI) method is an attractive feature for multiprocessors, which we hope to exploit in future research.

References

[1] M. Berry, Multiprocessor sparse SVD algorithms and applications, PhD thesis, The University of Illinois at Urbana-Champaign, October 1990.
[2] M. Berry, Multiprocessor sparse SVD methods for information retrieval applications, in: Science and Engineering on Supercomputers, ed. Eric J. Pitcher (Computational Mechanics Publications, Southampton, Springer-Verlag, Berlin, 1990) 133-144.
[3] R. Bording, A. Gersztenkorn, L. Lines, J. Scales and S. Treitel, Applications of seismic travel-time tomography, Geophys. J. R. Astr. Soc. 90 (2) (1987) 285-304.
[4] J.K. Cullum and R.A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1: Theory (Birkhäuser, Boston, 1985).
[5] I.S. Duff, R.G. Grimes and J.G. Lewis, Sparse matrix test problems, ACM Trans. Math. Software 15 (1989) 1-14.
[6] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science 41 (6) (1990) 391-407.
[7] S.T. Dumais, G.W. Furnas and T.K. Landauer, Using latent semantic analysis to improve access to textual information, in: Proc. Computer Human Interaction '88, 1988.
[8] W. Gautschi, On generating orthogonal polynomials, SIAM J. Sci. Stat. Comput. 3 (3) (1982) 289-317.
[9] G.H. Golub and M.D. Kent, Estimates of eigenvalues for iterative methods, Mathematics of Computation 53 (188) (1989) 619-626.
[10] G. Golub and W. Kahan, Calculating the singular values and pseudoinverse of a matrix, SIAM J. Numer. Anal. 2 (3) (1965) 205-224.
[11] G.H. Golub, F.T. Luk and M.L. Overton, A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix, ACM Transactions on Mathematical Software 7 (2) (1981) 149-169.
[12] G. Golub and C. Van Loan, Matrix Computations, 2nd ed. (Johns Hopkins, Baltimore, 1989).
[13] G.H. Golub and R.S. Varga, Chebyshev semi-iterative methods, successive overrelaxation methods, and second order Richardson iterative methods, Numer. Math. 3 (1961) 147-168.
[14] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards 45 (4) (1950) 255-282.
[15] L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. 11 (1960) 50-59.
[16] B.N. Parlett, The Symmetric Eigenvalue Problem (Prentice Hall, Englewood Cliffs, NJ, 1980).
[17] B.N. Parlett and D.S. Scott, The Lanczos algorithm with selective reorthogonalization, Math. Comp. 33 (1979) 217-238.
[18] J.A. Scales, P. Docherty and A. Gersztenkorn, Regularization of nonlinear inverse problems: imaging the near-surface weathering layer, Inverse Problems 6 (1) (1990) 115-131.
[19] J.A. Scales and A. Gersztenkorn, Robust methods in inverse theory, Inverse Problems 4 (1988) 1071-1091.
[20] H. Simon, Analysis of the symmetric Lanczos algorithm with reorthogonalization methods, Lin. Alg. Appl. 61 (1984) 101-131.
[21] J. Vandewalle and B. De Moor, A variety of applications of singular value decomposition in identification and signal processing, in: SVD and Signal Processing: Algorithms, Applications, and Architectures (Elsevier, Amsterdam, 1988).
[22] R.S. Varga, Matrix Iterative Analysis (Prentice Hall, Englewood Cliffs, NJ, 1962).
[23] J.H. Wilkinson, The Algebraic Eigenvalue Problem (Clarendon Press, Oxford, 1965).