
Adv Data Anal Classif DOI 10.1007/s11634-014-0181-7

REGULAR ARTICLE

Skew symmetry in retrospect

John C. Gower

Received: 15 March 2014 / Revised: 21 July 2014 / Accepted: 1 August 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract The paper gives a short account of how I became interested in analysing asymmetry in square tables. The early history of the canonical analysis of skew-symmetry and the associated development of its geometrical interpretation are described.

Keywords Skew-symmetry · Canonical analysis of skew-symmetric matrices · Singular value decomposition of skew matrices · Hedra · Bimensions · Triangle diagrams

Mathematics Subject Classification 62-09

1 Early history

My interest in asymmetry began when I spent the year 1974 in Adelaide with CSIRO. My principal collaborator there was Graham Constantine. Graham pointed out that although there was a huge amount of work on the analysis of symmetric matrices there was little, if anything, on the analysis of square asymmetric matrices, except in so far as they could be regarded as a special case of any rectangular table, in which case there is a legion of models underpinning possible methods of analysis. A common way of handling asymmetry in a square matrix $\mathbf{X}$ is to analyse the symmetric matrix $\tfrac{1}{2}(\mathbf{X} + \mathbf{X}^T)$, thus ignoring departures from symmetry, perhaps treating them as of little interest or as error. In practice, many applications are concerned just as much with any asymmetry as they are with symmetry. Well-known examples are studies of social mobility between classes, international trade between countries and pecking order in hens. Graham and I wondered whether we could make any useful contribution to the problem of examining asymmetry.

J. C. Gower, Department of Mathematics and Statistics, The Open University, Milton Keynes MK7 6AA, UK. e-mail: [email protected]; [email protected]

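We began by asking: what rank-one matrix, when added to a square matrix $\mathbf{X}$, brings it closest to symmetry? To answer this question, we have to define what is meant by closeness to symmetry in this context. The obvious criterion of asymmetry is $\mathbf{X} - \mathbf{X}^T$ and its sum-of-squares is its simplest measure. Indeed, the well-known identity:

$$\mathbf{X} = \mathbf{M} + \mathbf{N}, \quad \text{where } \mathbf{M} = \tfrac{1}{2}(\mathbf{X} + \mathbf{X}^T) \text{ and } \mathbf{N} = \tfrac{1}{2}(\mathbf{X} - \mathbf{X}^T), \text{ for which } \|\mathbf{X}\|^2 = \|\mathbf{M}\|^2 + \|\mathbf{N}\|^2,$$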

underpins an orthogonal analysis of variance in which symmetry and departures from symmetry (i.e. skew-symmetry) may be analysed independently using least squares. This approach is apt when symmetry and departures from symmetry are thought likely to be generated by different mechanisms that govern the generation of the data. Even when the two parts are influenced by a common mechanism it is interesting to examine how. Thus we want to minimise $\|\tfrac{1}{2}\mathbf{X} + \mathbf{u}\mathbf{v}' - \tfrac{1}{2}\mathbf{X}^T\|^2$, which is equivalent to minimising $\|\mathbf{N} + \mathbf{u}\mathbf{v}'\|^2$. Note that the factors $\tfrac{1}{2}$ are introduced only to preserve notation and have no substantive effect.

The trouble begins when we recognise that $\|\mathbf{N} + \mathbf{u}\mathbf{v}'\|^2 = \|\mathbf{N} - \mathbf{v}\mathbf{u}'\|^2$, which is a simple consequence of reversing the order of multiplication in the following:

$$\operatorname{trace}\,(\mathbf{N} - \mathbf{v}\mathbf{u}')(-\mathbf{N} - \mathbf{u}\mathbf{v}') = \operatorname{trace}\,(\mathbf{N} + \mathbf{u}\mathbf{v}')(-\mathbf{N} + \mathbf{v}\mathbf{u}').$$
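The two identities used so far can be checked numerically. The following is a minimal sketch, assuming NumPy; the random matrix, the seed and the helper name `ss` are illustrative choices and not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed, for illustration only
n = 6
X = rng.normal(size=(n, n))         # an arbitrary square "data" matrix

M = 0.5 * (X + X.T)                 # symmetric part
N = 0.5 * (X - X.T)                 # skew-symmetric part


def ss(A):
    """Sum of squares of the elements of A (squared Frobenius norm)."""
    return float(np.sum(A * A))


# Orthogonal decomposition: ||X||^2 = ||M||^2 + ||N||^2
decomposition_ok = np.isclose(ss(X), ss(M) + ss(N))

# Non-uniqueness of a rank-one correction: uv' and -vu' fit equally well
u = rng.normal(size=n)
v = rng.normal(size=n)
fit_uv = ss(N + np.outer(u, v))
fit_vu = ss(N - np.outer(v, u))

print(decomposition_ok, np.isclose(fit_uv, fit_vu))
```

Because the second equality holds for any pair of vectors, it holds in particular at a minimiser, which is why a best rank-one correction cannot be unique.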

It follows that if $\mathbf{u}\mathbf{v}'$ is a valid solution to the problem of adding a rank-one matrix then $-\mathbf{v}\mathbf{u}'$ is another equally good solution. Indeed, there is an infinite set of solutions obtainable by rotating $\mathbf{u}$ and $\mathbf{v}$ orthogonally. By asking for a best rank-one solution it seems that we do not have a well posed problem and instead have to ask what is the best rank-two matrix to add. We were not very surprised by our finding because we knew that skew-symmetric matrices have imaginary eigenvalues $+i\sigma$, $-i\sigma$ corresponding to conjugate eigenvectors (see Theorem 1 of the Appendix) but we did not know about the singular values and corresponding vectors. I have to admit that we asked the computer to evaluate the singular vectors of a skew-symmetric matrix and found that the singular values come in equal pairs, corresponding to right and left vectors being equal but in reverse order and with a change of sign. Of course, we said, that is obvious and is how the SVD of a skew-symmetric matrix must be:

$$\mathbf{N} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{U}' \tag{1}$$

where $\mathbf{J}$ is zero except that $J_{2i-1,\,2i} = 1$ and $J_{2i,\,2i-1} = -1$ for $i = 1, 2, 3, \ldots$
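The experiment of "asking the computer" is easy to repeat today. The sketch below assumes NumPy and uses its general-purpose `np.linalg.svd`, not any routine from the original work; the variable names are hypothetical. It also assumes the generic case in which different pairs have different singular values, so that each pair spans a well-defined plane.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for illustration only
n = 5                                    # odd order, so one singular value is zero
X = rng.normal(size=(n, n))
N = 0.5 * (X - X.T)                      # skew-symmetric test matrix

# Ordinary SVD, N = A diag(s) Bt, with s in non-increasing order
A, s, Bt = np.linalg.svd(N)
m = n // 2                               # number of pairs of singular values

# From each pair of equal singular values keep one left/right vector pair
sigma = s[0:2 * m:2]
u = A[:, 0:2 * m:2]                      # left vectors with index 0, 2, ...
v = Bt[0:2 * m:2, :].T                   # right vectors with the same index

# Sum of m rank-two skew-symmetric terms sigma_i (u_i v_i' - v_i u_i')
N_sum = sum(sigma[i] * (np.outer(u[:, i], v[:, i]) - np.outer(v[:, i], u[:, i]))
            for i in range(m))

# Assemble U, Sigma and J so that N = U Sigma J U'
U = np.zeros((n, n))
U[:, 0:2 * m:2] = u
U[:, 1:2 * m:2] = v
Sigma = np.zeros((n, n))
J = np.zeros((n, n))
for i in range(m):
    Sigma[2 * i, 2 * i] = Sigma[2 * i + 1, 2 * i + 1] = sigma[i]
    J[2 * i, 2 * i + 1] = 1.0
    J[2 * i + 1, 2 * i] = -1.0
if n % 2 == 1:
    U[:, -1] = Bt[-1, :]                 # null vector for the zero singular value
    J[-1, -1] = 1.0                      # J augmented by a final unit diagonal value

print(np.round(s, 6))                    # singular values appear in equal pairs
print(np.allclose(U @ Sigma @ J @ U.T, N))
```

Taking the left and right singular vectors with the same index as the two vectors of a pair works because, for a skew-symmetric matrix, the left vector of a singular triple is orthogonal to its right vector and lies in the same plane.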

In fact it seemed so obvious that we did not write a proper paper about it but merely reported the result in passing in Gower (1977), a conference paper; in Constantine and Gower (1978b), a paper giving applications of several forms of analysis of asymmetry; in Gower (1980c), a paper giving applications in biochemistry; and so on.


As Gower (1977) reported, Graham and I had plans for a formal treatment, hopefully to be published in a main-stream statistical journal, but they never materialised. In fact it was only in 1992 that Gower and Zielman, in a University of Leiden technical report, published an algebraic treatment. The latter paper appeared six years later in Linear Algebra and its Applications (Gower and Zielman 1998) but without the Appendix concerning the singular value decomposition of a skew-symmetric matrix. Therefore, I give a simple proof now as Theorem 2 of the current paper.

The result (1) gives all that is needed from an algebraic point of view but in data-analytical applications visualisation is essential. Equation (1) may be written

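$$\mathbf{N} = \sum_{i=1}^{m} \sigma_i(\mathbf{u}_i\mathbf{v}_i' - \mathbf{v}_i\mathbf{u}_i') \tag{2}$$

where $m = \tfrac{1}{2}n$ for $n$ even and $m = \tfrac{1}{2}(n - 1)$ for $n$ odd. Also, $\mathbf{u}_i$ is an alias for the $(2i-1)$th column of $\mathbf{U}$ and $\mathbf{v}_i$ is an alias for the $(2i)$th column of $\mathbf{U}$. Thus the first term of (2) is:

$$\mathbf{N} = \sigma_1(\mathbf{u}_1\mathbf{v}_1' - \mathbf{v}_1\mathbf{u}_1'). \tag{3}$$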

The usual way of interpreting a singular value decomposition $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'$ is to use the first two columns of $\mathbf{U}$ as coordinates for the $n$ row points and the first two columns of $\mathbf{V}$ as coordinates for the $n$ column points. There are two problems with continuing this method. Firstly, the first two columns of $\mathbf{U}$ and $\mathbf{V}$ essentially give the same information and, secondly, the inner product (3) does not seem to be readily interpretable.

We did not immediately understand how to overcome these difficulties. Indeed, the answer did not come until John Hartigan, of Yale, visiting his homeland of Australia, came to Adelaide, where I outlined our on-going work on asymmetry. In the course of explaining things I suddenly realised the geometrical interpretation of our algebra. Focussing on row $i$ and column $j$ of (3) gives

$$n_{ij} = \sigma_1(u_{1i}v_{1j} - v_{1i}u_{1j})$$

which is proportional to the area of a triangle with vertices $P_i(u_{1i}, v_{1i})$, $P_j(u_{1j}, v_{1j})$ and the origin $O$. So the inner product (3) is to be interpreted as the area of a triangle. Thus John Hartigan was the unsuspecting catalyst of triangle diagrams. Once the triangular nature is recognised, it is a small step to develop other geometrical properties that are helpful for interpretation. Thus, all points that form a triangle with $P_i(u_{1i}, v_{1i})$ and $O$ of the same area as that formed by $P_j(u_{1j}, v_{1j})$ lie on the line through $P_j$ parallel to $OP_i$. This is the counterpart of the familiar Euclidean property that all points $P_j$ that are the same distance from a given point $P_i$ lie on a circle with centre $P_i$. When the origin $O$ is collinear with $P_i$ and $P_j$, the three points form a triangle of zero area, so $n_{ij}$ is zero; thus although proximity suggests zero asymmetry, as in Euclidean visualisations, in triangle diagrams proximity is not a necessary condition for zero asymmetry. The triangles $OP_jP_i$ and $OP_iP_j$ are the same but have their labels permuted, giving the same areas but with different signs. We can use the convention that clockwise labelled areas are positive and anticlockwise areas are negative, in accordance with the basic property of skew-symmetric matrices that $n_{ji} = -n_{ij}$. The points $P_i$, $P_j$ and $P_k$ (say) of the hedron (see below) define a triangle whose area is proportional to $n_{ij} + n_{jk} + n_{ki}$. The interpretation of the areas of such triangles depends on whether $O$ is within the triangle or not.
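The area interpretation can be illustrated with a few lines of code. This is a minimal sketch assuming NumPy; the coordinates and the value of `sigma` are made up for illustration and are not normalised singular vectors, which does not matter because the area identities hold for any coordinates. The helper `signed_area` is a hypothetical name; it returns a positive value for anticlockwise labelling when the first coordinate is plotted horizontally, and which orientation is called positive is purely a convention.

```python
import numpy as np


def signed_area(p, q, r):
    """Signed area of triangle pqr; the sign flips when two labels are swapped."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))


sigma = 2.0                                   # hypothetical singular value
u = np.array([1.0, 2.0, 0.0, 3.0])            # hypothetical first coordinates
v = np.array([0.0, 1.0, 2.0, 1.5])            # hypothetical second coordinates

# One hedral term: n_ij = sigma * (u_i v_j - v_i u_j)
N = sigma * (np.outer(u, v) - np.outer(v, u))

P = np.column_stack([u, v])                   # point P_i = (u_i, v_i)
O = np.zeros(2)

# (a) n_ij is 2*sigma times the signed area of triangle O, P_i, P_j
area_with_origin = np.array([[signed_area(O, P[i], P[j]) for j in range(4)]
                             for i in range(4)])

# (b) n_ij + n_jk + n_ki is 2*sigma times the signed area of triangle P_i, P_j, P_k
i, j, k = 0, 1, 2
cycle_sum = N[i, j] + N[j, k] + N[k, i]
area_ijk = signed_area(P[i], P[j], P[k])

# (c) P_1 = (2, 1) and P_3 = (3, 1.5) are collinear with O, so n_13 vanishes
print(N[1, 3])
```

In this example the points $P_1$ and $P_3$ are not close to one another, yet their asymmetry is zero, which is the sense in which proximity is not a necessary condition for zero asymmetry.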

I have used the name triangle diagram for this visualisation but Doug Carroll coined the term Bimension, which has some merit but is a Graeco-Latin hybrid. Gower and Zielman (1992, 1998) preferred hedron (adjective hedral), which is of pure Greek derivation and refers to planes as in polyhedron, dihedral etc. Thus the decomposition (2) refers to $m$ hedra.

To sum up, what we did in 1974 was:

(i) We obtained the hedral form of the singular value decomposition of a skew-symmetric matrix.

(ii) We identified the geometrical properties of the hedral form in terms of areas of triangles with (a) two points of the hedron and the origin and (b) three general points of the hedron. These are the non-Euclidean basis of visualisations.

(iii) We recognised that the symmetric and skew-symmetric components of a square matrix may originate independently and so merit independent analyses. Nevertheless, the two components may be linked and one objective of analysis is to identify possible linkages. This aspect is treated in Sect. 2.

2 Linear skew-symmetry and combining skew-symmetry with symmetry

The simplest, and perhaps the most useful, form of skew symmetry is given by

$$\mathbf{N} = \mathbf{1}\mathbf{n}' - \mathbf{n}\mathbf{1}' \tag{4}$$

which is linear. This form occurs frequently, notably in the earliest model of which I know that incorporated asymmetry (Yates 1947). Note that despite its linearity, this form of skew symmetry remains of rank two. The hedral representation of (4) is merely a set of points $P_i$ with coordinates $(1, n_i)$ for $i = 1, \ldots, n$; in other words, $n$ collinear points on a line orthogonal to the horizontal axis. It follows that any linear hedral representation is an orthogonal rotation of this basic form and may be so parameterised. Conversely, a linear hedral diagram implies the linear form (4), so the diagram may be used as a tool for diagnosing linear asymmetry.
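The properties claimed for the linear form (4) can be verified directly. The sketch below assumes NumPy; the entries of `n_vec` are arbitrary illustrative values, and the coordinates $(1, n_i)$ are used unnormalised.

```python
import numpy as np

n_vec = np.array([0.3, -1.2, 2.0, 0.7, -0.4])   # hypothetical values of the vector n
ones = np.ones_like(n_vec)

# Linear skew-symmetry: N = 1n' - n1', so that N[i, j] = n_j - n_i
N = np.outer(ones, n_vec) - np.outer(n_vec, ones)

# Despite its linearity the matrix has rank two, with one pair of equal singular values
rank = np.linalg.matrix_rank(N)
s = np.linalg.svd(N, compute_uv=False)

# Unnormalised hedral coordinates: P_i = (1, n_i), all on the vertical line x = 1
P = np.column_stack([ones, n_vec])
twice_area = np.outer(P[:, 0], P[:, 1]) - np.outer(P[:, 1], P[:, 0])

print(rank, np.allclose(twice_area, N))
```

Here `twice_area[i, j]` is twice the signed area of the triangle $O P_i P_j$, so the collinear points reproduce every element of the matrix.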

Gower (1977) examined two simple models of asymmetry: the jet-stream model and the cyclone model. The jet-stream model imagined a country with $n$ towns $P_i$ $(i = 1, 2, \ldots, n)$ with aeroplanes flying between them with velocity $V$. There is also a jet-stream with fixed direction and velocity $v$ which will affect the times of flight, depending on whether the direction of the jet-stream gives a head or tail wind. Simple algebra shows that, providing $v/V$ is sufficiently small, the times of flight have a symmetric component $\mathbf{M}$ and a skew-symmetric component $\mathbf{N}$ with the linear form (4). Multidimensional scaling of the symmetric component readily produces the map $\mathbf{Z}$ (say) of the country while $\mathbf{n}$ is proportional to the projections of the towns onto the direction of the jet-stream. The projection Procrustes minimisation of $\|\mathbf{Z}\mathbf{p} - \lambda\mathbf{n}\|^2$ gives $\mathbf{p}$, and $\lambda$ gives an estimate of $v/V$, thus disentangling the whole scenario. A similar story applies to the cyclone model where the jet-stream is replaced by a cyclone (i.e. a circular wind). Thus the symmetric part $\mathbf{M}$ of the flight times continues to give $\mathbf{Z}$ and a map of the country. The skew-symmetric part is of full rank-two form and not linear. However its hedral representation is a set of triangular regions centred on the centre of the cyclone, so the problem is to find this centre and map the regions onto the corresponding triangular hedral regions representing the towns, again allowing for a scaling factor $\omega/V$, where $\omega$ is the angular velocity of the cyclone.
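The jet-stream scenario can be simulated as a rough illustration. The sketch below assumes NumPy and rests on several assumptions that are not in the original: flight times are computed from the standard wind-triangle ground speed (the aircraft heads slightly into the wind to stay on track), the true centred map is used in place of a multidimensional scaling solution, and ordinary least squares stands in for the projection Procrustes step. All numerical values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed, for illustration only
k = 6                                          # number of towns
Z = rng.uniform(-100.0, 100.0, size=(k, 2))    # hypothetical town coordinates
Z -= Z.mean(axis=0)                            # centre the map
V, v = 500.0, 50.0                             # hypothetical airspeed and wind speed
w = np.array([np.cos(0.6), np.sin(0.6)])       # hypothetical unit wind direction

# Flight times under the assumed wind-triangle ground speed
T = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        if i != j:
            d = np.linalg.norm(Z[j] - Z[i])
            cos_t = (Z[j] - Z[i]) @ w / d      # cosine of angle between track and wind
            sin2_t = 1.0 - cos_t ** 2
            ground_speed = v * cos_t + np.sqrt(V ** 2 - v ** 2 * sin2_t)
            T[i, j] = d / ground_speed

M = 0.5 * (T + T.T)                            # symmetric part of the flight times
N = 0.5 * (T - T.T)                            # skew-symmetric part

# Recover n from row sums (valid here because the map is centred)
n_hat = -N.sum(axis=1) / k
N_linear = np.outer(np.ones(k), n_hat) - np.outer(n_hat, np.ones(k))

# Regress n on the map coordinates to estimate the wind direction and scale
coef, *_ = np.linalg.lstsq(Z, n_hat, rcond=None)
w_hat = -coef / np.linalg.norm(coef)           # tail winds shorten flights, hence the sign
scale_hat = np.linalg.norm(coef)               # v / (V^2 - v^2) under this model

print(np.allclose(N, N_linear), w_hat, scale_hat)
```

Under this particular ground-speed assumption the skew-symmetric part turns out to have the linear form exactly, with the elements of $\mathbf{n}$ proportional to the projections of the towns onto the wind direction; with other flight-time models the linear form would hold only approximately for small $v/V$.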

Thus, if we have the right model, it is relatively simple to fit it and to estimate everything interpretable in terms of the model. Asymmetry may be independent of symmetry and even when there seems to be some dependency it may not be readily identified, although it can point to where mechanisms may be sought. For example, Gower (1980c) produced a map of DNA proteins which could be supplemented by linear asymmetry which could be superimposed on the map as roughly concentric contours. These suggested that certain pairs of DNA molecules could more readily mutate in one direction rather than the other. Similarly, Banfield and Gower (1980) showed that a multidimensional scaling of sugar related compounds had a linear asymmetry, apparently related to amount of sugar molecules. The point is that even linear asymmetry may occur in many ways.

If one has an a priori model of the two components, it is relatively easy to see appropriate ways of interpretation. For example, with the wind model an appropriate direction has to be found in a geographical two-dimensional map, or with the cyclone model the centre of the cyclone has to be found and mapped. The converse problem of how to disentangle possible unknown symmetric and asymmetric components is much more difficult. Using the data to find a model is much more difficult than fitting a given model to data. I have long hoped that good ways of doing this might be found but my most recent essay on this aspect (Gower 2008) shows that the difficulties are formidable.

A referee has drawn my attention to Escoufier (1980) which I have now reread. It is an interesting paper quite closely related to my work with Graham Constantine but it is not the same. Indeed Escoufier and Grorud provide a remark which I quote below (N.B. the notation E, A, B corresponds to X, M, N in the above):

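Remarque: Dans leurs travaux A.G. Constantine et J.C. Gower propose de représenter séparément A et B. Nous proposons de les représenter conjointement. La représentation séparée a l’avantage d’être optimale pour A et B mais a l’inconvenience de ne pas être conjointe. La représentation conjointe a l’avantage de être conjointe, optimale pour E; elle a l’inconvénient de ne pas être optimale pour A et B séparément.

[In English: Remark: In their work A.G. Constantine and J.C. Gower propose to represent A and B separately. We propose to represent them jointly. The separate representation has the advantage of being optimal for A and B but the disadvantage of not being joint. The joint representation has the advantage of being joint and optimal for E; it has the disadvantage of not being optimal for A and B separately.]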

This remark precisely explains the difference between the two approaches. This section is about combining symmetry with skew symmetry, so it should be clear that my preference is to treat the two separately. Indeed, even with a simple model like (4) it is hard to see why any symmetric part should be spanned by the vectors $\mathbf{1}$ and $\mathbf{n}$. Nevertheless it is nice to have the conjoint approach available.


3 Miscellaneous comments

The analysis of skew symmetry has fascinated me for 40 years and from time to time it has stimulated new research. In Sect. 2, I outlined how the symmetric and skew-symmetric aspects of a square matrix $\mathbf{A}$ might be combined. More encouraging was work begun while in Adelaide to find the hedral structure of various skew-symmetric matrices. Gower and Laslett (1978) found the structures of matrices, such as those with elements nij = t|i−j| sign(i − j). They used direct but rather tedious methods. Immediately after I returned from Adelaide, I began to develop a more systematic approach based ultimately on the characteristic polynomial of any square matrix $\mathbf{A}$. This depended on satisfying the property that the powers $\mathbf{A}^r$ have some common structure that is preserved when summed. It turned out that this condition is frequently satisfied and is associated with matrices $\mathbf{A}$ which conform to some useful model that might be used as a basis for data analysis. The method may be put into the form of a simple algorithm with algebraic, rather than computational, motivation. It gives explicit algebraic forms for the eigenstructures and inverses, possibly pseudo-inverses, of matrices. I discovered from Colin Mallows of Bell Laboratories that such a method could be found in an algebra text by Fadeev and Fadeeva (1963) and in Gantmacher (1959). Fadeev and Fadeeva attributed the method to the famous French astronomer Leverrier (1840) but although I looked at Leverrier’s paper, which was very long and couched in terms of algebra in three dimensions, I could not find it. However, it is probably there in some form. Fadeev and Fadeeva required that the eigenvalues of $\mathbf{A}$ be distinct, whereas I was concerned with matrices $-\mathbf{N}^2$ which are symmetric positive semi-definite with eigenvalues occurring in equal pairs. A modified form of the algorithm based on the minimal polynomial is then available (Gower 1980b). With the Leverrier–Fadeev version of the algorithm, the coefficients of the characteristic polynomial are found sequentially as part of the algorithm. With the modified version, the coefficients of the minimal polynomial are not found but this turns out to be inconsequential. In fact the algorithm is available for general real square matrices and I have found many applications (e.g. Gower 1980a, b, c; Gower and Groenen 1991; Denis and Gower 1994; Gower and De Rooij 2003).
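For readers unfamiliar with it, the classical recursion can be sketched in a few lines. The code below assumes NumPy and implements the textbook Leverrier–Fadeev recursion in its numerical form, not the modified minimal-polynomial version of Gower (1980b) and not the algebraic use described above; the function name and the test matrices are hypothetical.

```python
import numpy as np


def leverrier_fadeev(A):
    """Classical Leverrier-Fadeev recursion.

    Returns (coeffs, M): coeffs = [1, c_{n-1}, ..., c_0] are the coefficients of
    det(lambda*I - A), highest power first, found one at a time; M is the last
    auxiliary matrix, which satisfies A @ M = -c_0 * I.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    identity = np.eye(n)
    M = np.zeros((n, n))
    c = 1.0
    coeffs = [c]
    for k in range(1, n + 1):
        M = A @ M + c * identity          # next auxiliary matrix
        c = -np.trace(A @ M) / k          # next coefficient, found sequentially
        coeffs.append(c)
    return np.array(coeffs), M


# A small symmetric example: the inverse follows from the last auxiliary matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
coeffs_A, M_A = leverrier_fadeev(A)
A_inv = -M_A / coeffs_A[-1]

# A skew-symmetric example: the odd-power coefficients vanish
N = np.array([[0.0, 1.0, 2.0, 3.0],
              [-1.0, 0.0, 4.0, 5.0],
              [-2.0, -4.0, 0.0, 6.0],
              [-3.0, -5.0, -6.0, 0.0]])
coeffs_N, _ = leverrier_fadeev(N)

print(coeffs_A, coeffs_N)
```

For the skew-symmetric example the characteristic polynomial contains only even powers, which reflects the eigenvalues occurring in pairs $\pm i\sigma$.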

Another spin-off was the development of further hedral structures. From the beginning it was clear that the Cayley formulae for associating skew-symmetric matrices with orthogonal matrices imply corresponding associations between hedral forms. The key result [see e.g. Gantmacher (1959), but the result goes back to the last quarter of the nineteenth century as detailed in MacDuffee (1946)] is that any orthogonal matrix $\mathbf{Q}$ may be written in the form $\mathbf{Q} = \mathbf{U}[\mathbf{I}, -\mathbf{I}, \mathbf{H}_i]\mathbf{U}'$, where $\mathbf{U}$ is itself orthogonal, $\mathbf{I}$ refers to unit transformations, $-\mathbf{I}$ refers to reflections in the corresponding columns of $\mathbf{U}$, and $\mathbf{H}_i$ is an elementary rotation through an angle $\theta_i$ in two corresponding columns of $\mathbf{U}$ given by:

$$\begin{pmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{pmatrix}.$$

There are as many elementary rotations as required, immediately demonstrating the hedral structure, albeit supplemented by unit matrices. The real interest comes from asking what unit matrix is closest to $\mathbf{Q}$. Minimising $\|\mathbf{Q} - \mathbf{I}\|^2$ gives hedral components $\mathbf{I} - \mathbf{I} = \mathbf{0}$, $\mathbf{I} - (-\mathbf{I}) = 2\mathbf{I}$ and a set depending on the elementary rotations, giving:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{pmatrix} = 2\sin(\theta_i/2)\begin{pmatrix} \sin(\theta_i/2) & -\cos(\theta_i/2) \\ \cos(\theta_i/2) & \sin(\theta_i/2) \end{pmatrix}$$

signifying a best two-dimensional rotation given by the largest singular value $2\sin(\theta_i/2)$. This offers a way of finding best-rotations but the possibilities remain relatively unexplored (see Gower and Zielman 1992, 1998).
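Both the elementary-rotation identity and the link between skew-symmetric and orthogonal matrices can be checked numerically. The sketch below assumes NumPy and uses one common form of the Cayley transform, $\mathbf{Q} = (\mathbf{I} - \mathbf{N})(\mathbf{I} + \mathbf{N})^{-1}$; which variant of the Cayley formulae was used in the original work is not stated, so this choice, like the angle and the random matrix, is an assumption.

```python
import numpy as np

# (a) The identity for one elementary rotation H through an angle theta
theta = 0.8                                        # hypothetical angle
H = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
half = theta / 2.0
R = np.array([[np.sin(half), -np.cos(half)],
              [np.cos(half), np.sin(half)]])       # orthogonal factor
lhs = np.eye(2) - H
rhs = 2.0 * np.sin(half) * R

# (b) A Cayley transform of a skew-symmetric matrix gives an orthogonal matrix
rng = np.random.default_rng(3)                     # arbitrary seed
X = rng.normal(size=(4, 4))
N = 0.5 * (X - X.T)
I4 = np.eye(4)
Q = (I4 - N) @ np.linalg.inv(I4 + N)

# Singular values of I - Q come in equal pairs 2|sin(theta_i / 2)|,
# where the theta_i are the rotation angles of Q
sv = np.linalg.svd(I4 - Q, compute_uv=False)
angles = np.angle(np.linalg.eigvals(Q))
expected = 2.0 * np.abs(np.sin(angles / 2.0))

print(np.allclose(lhs, rhs), np.round(sv, 6))
```

Because the factor `R` is orthogonal, each elementary rotation contributes a pair of equal singular values to $\mathbf{I} - \mathbf{Q}$, which is the hedral structure referred to in the text.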

Hedral diagrams find their way into methods not designed for analysing asymmetry. Thus the ordinary inner product expressed in terms of cosines can be reinterpreted in terms of the relationship $\cos(\theta) = \sin(\theta + \pi/2)$, showing that inner products in two-dimensional space can be interpreted in terms of areas of triangles and thus may be exhibited as triangles (Gower et al. 2010). Even more recently Albers and Gower (2014) have shown how triple-product terms, as appear in INDSCAL, have interpretations in terms of content (the three-dimensional extension of area) and may be expressed in hedral form.
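The relationship between inner products and areas can be made concrete in two dimensions. The following sketch, assuming NumPy, shows one way of realising it: rotating one of the two vectors through a quarter turn converts the inner product into twice the signed area of a triangle with a vertex at the origin. The vectors and the helper name are hypothetical, and no claim is made that this is exactly the construction used in Gower et al. (2010).

```python
import numpy as np


def rotate_quarter_turn(b):
    """Rotate a two-dimensional vector anticlockwise through pi/2."""
    return np.array([-b[1], b[0]])


a = np.array([2.0, 1.0])                  # hypothetical vectors
b = np.array([0.5, 3.0])

b_rot = rotate_quarter_turn(b)

# Twice the signed area of the triangle with vertices O, a and b_rot
twice_area = a[0] * b_rot[1] - a[1] * b_rot[0]

inner = float(a @ b)                      # ordinary inner product of a and b

print(np.isclose(twice_area, inner))
```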

I would like to think that the work I did with Graham Constantine has been useful; it has certainly been fun. There is still much to be done in understanding and developing triangle diagrams, skew-symmetry and generalised concepts of hedra.

4 Appendix: Basic results for skew-symmetry and orthogonality

In this appendix, for convenience we gather together some results on skew-symmetric matrices. Some of these are well known but I do not think we ever gave a formal proof of the form of the singular value decomposition of a skew-symmetric matrix, other than in a University of Leiden internal report (Gower and Zielman 1992) which was later published in shortened form as Gower and Zielman (1998) without the material given below.

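Theorem 1 If $\mathbf{N}$ is a real skew-symmetric matrix, then its eigenvalues are imaginary and occur in conjugate pairs $i\sigma$ and $-i\sigma$ corresponding to eigenvectors $\mathbf{x} + i\mathbf{y}$ and $\mathbf{x} - i\mathbf{y}$, respectively, where $\mathbf{x}'\mathbf{x} = \mathbf{y}'\mathbf{y}$ and $\mathbf{x}'\mathbf{y} = 0$. When the order of $\mathbf{N}$ is odd, there is an additional zero eigenvalue.

Proof of Theorem 1 If $\rho + i\sigma$ is an eigenvalue associated with an eigenvector $\mathbf{x} + i\mathbf{y}$ we have:

$$\mathbf{N}(\mathbf{x} + i\mathbf{y}) = (\rho + i\sigma)(\mathbf{x} + i\mathbf{y})$$

and equating real and imaginary parts gives:

$$\left.\begin{aligned} \mathbf{N}\mathbf{x} &= \rho\mathbf{x} - \sigma\mathbf{y} \\ \mathbf{N}\mathbf{y} &= \sigma\mathbf{x} + \rho\mathbf{y} \end{aligned}\right\} \tag{5}$$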


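Premultiplying the first equation of (5) by $\mathbf{x}'$ and the second by $\mathbf{y}'$, and adding, gives

$$\mathbf{x}'\mathbf{N}\mathbf{x} + \mathbf{y}'\mathbf{N}\mathbf{y} = \rho(\mathbf{x}'\mathbf{x} + \mathbf{y}'\mathbf{y}).$$

But $\mathbf{x}'\mathbf{N}\mathbf{x} = \mathbf{y}'\mathbf{N}\mathbf{y} = 0$ and so $\rho = 0$, showing that the non-zero eigenvalues are purely imaginary and $\mathbf{x}'\mathbf{y} = 0$. Premultiplying instead by $\mathbf{y}'$ and $\mathbf{x}'$, respectively, shows that $\sigma\mathbf{x}'\mathbf{x} = \sigma\mathbf{y}'\mathbf{y}$.

Setting $\rho = 0$ in (5) gives

$$\left.\begin{aligned} \mathbf{N}\mathbf{x} &= -\sigma\mathbf{y} \\ \mathbf{N}\mathbf{y} &= \sigma\mathbf{x} \end{aligned}\right\} \tag{6}$$

and hence if $i\sigma$ is an eigenvalue of $\mathbf{N}$ satisfying $\mathbf{N}(\mathbf{x} + i\mathbf{y}) = i\sigma(\mathbf{x} + i\mathbf{y})$ then the eigenvector equation $\mathbf{N}(\mathbf{x} - i\mathbf{y}) = -i\sigma(\mathbf{x} - i\mathbf{y})$ is also satisfied.

When $\mathbf{N}$ is of odd order there is an extra eigenvalue $\nu$, say, which is not one of a pair. Because the imaginary pairs cancel one another, the sum of all the eigenvalues must be $\nu$. Hence, $\nu = \operatorname{trace}(\mathbf{N}) = 0$.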
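The statements of Theorem 1 can be illustrated numerically. This is a minimal sketch assuming NumPy and a randomly generated skew-symmetric matrix of odd order; it illustrates the theorem on one example and is of course no substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(4)            # arbitrary seed, for illustration only
n = 5                                     # odd order, so one eigenvalue is zero
X = rng.normal(size=(n, n))
N = 0.5 * (X - X.T)

eigvals, eigvecs = np.linalg.eig(N)

# Pick the eigenvalue i*sigma with the largest sigma and split its eigenvector
k = int(np.argmax(eigvals.imag))
sigma = eigvals[k].imag
z = eigvecs[:, k]
x, y = z.real, z.imag                     # eigenvector x + iy

print(np.round(eigvals, 6))               # purely imaginary conjugate pairs and a zero
print(np.isclose(x @ x, y @ y), np.isclose(x @ y, 0.0))
```

The relations between `x` and `y` do not depend on how the numerical routine normalises the complex eigenvector, because multiplying $\mathbf{x} + i\mathbf{y}$ by any complex scalar preserves both $\mathbf{x}'\mathbf{x} = \mathbf{y}'\mathbf{y}$ and $\mathbf{x}'\mathbf{y} = 0$.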

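Theorem 2 The singular value decomposition of a real skew-symmetric matrix $\mathbf{N}$ has the form $\mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{U}'$ where $\mathbf{U}$ is orthogonal and $\mathbf{J}$ is defined in Sect. 1.

Proof of Theorem 2 Assume that $\mathbf{N}$ has a general singular value decomposition $\mathbf{N} = \mathbf{U}\mathbf{S}\mathbf{V}'$. Then the columns of $\mathbf{U}$ and $\mathbf{V}$ are the eigenvectors of the real symmetric positive semi-definite matrices $\mathbf{N}\mathbf{N}' = \mathbf{U}\mathbf{S}^2\mathbf{U}'$ and $\mathbf{N}'\mathbf{N} = \mathbf{V}\mathbf{S}^2\mathbf{V}'$. Because $\mathbf{N}$ is skew-symmetric we have that $\mathbf{N}\mathbf{N}' = \mathbf{N}'\mathbf{N} = -\mathbf{N}^2$ and hence from (6):

$$\mathbf{N}\mathbf{N}'\mathbf{x} = \sigma^2\mathbf{x} \quad \text{and} \quad \mathbf{N}'\mathbf{N}\mathbf{y} = \sigma^2\mathbf{y}.$$

This shows that the singular values of $\mathbf{N}$ occur in pairs, corresponding to orthogonal vectors $\mathbf{x}$ and $\mathbf{y}$ which occur as columns in both $\mathbf{U}$ and $\mathbf{V}$. Indeed, rearranging (6) gives:

$$\mathbf{N}(\mathbf{x}, \mathbf{y}) = \sigma(\mathbf{x}, \mathbf{y})\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \sigma(-\mathbf{y}, \mathbf{x}).$$

This shows that the columns of $\mathbf{V}$ corresponding to the singular value $\sigma$ are the same as those of $\mathbf{U}$ but in reverse order and with a change of sign. When $n$ is even all the singular vectors have the relationship $\mathbf{V} = \mathbf{U}\mathbf{J}'$ and when $n$ is odd there is a zero singular value and $\mathbf{J}$ has to be augmented by a final unit diagonal value. In both cases, $\mathbf{J}$ is orthogonal and so is $\mathbf{U}\mathbf{J}'$, as it has to be for a valid singular value decomposition. Thus, finally the SVD of a skew-symmetric matrix is:

$$\mathbf{N} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{U}' \tag{7}$$

where $\mathbf{S} = \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1, \sigma_1, \sigma_2, \sigma_2, \ldots, (0))$, the singular values are assumed to be in non-increasing order and the final “0” is omitted when $n$ is even.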


References

Albers CJ, Gower JC (2014) A contribution to the visualisation of three-way tables. J Multivar Anal

Banfield CF, Gower JC (1980) A note on the graphical representation of multivariate binary data. Appl Stat 29:238–245

Constantine AG, Gower JC (1978a) Some properties and applications of simple orthogonal matrices. J Inst Math Appl 21:445–454

Constantine AG, Gower JC (1978b) Graphical representation of asymmetry. Appl Stat 27:297–304

Denis J-B, Gower JC (1994) Asymptotic covariances for the parameters of biadditive models. Utilitas Mathematica 46:193–205

Escoufier Y (1980) Analyse factorielle des matrices carrees non symmetriques. In: Diday E et al (eds) Data analysis and informatics. North-Holland Press, Amsterdam, pp 263–276

Fadeev DK, Fadeeva VD (1963) Computational methods in linear algebra (in Russian, Fizmatgiz, Moscow, 1960). English translation by Williams RC. W.H. Freeman, San Francisco

Gantmacher FR (1959) The theory of matrices (two volumes) (Hirsch KA, Trans). Chelsea Publishing Company, New York

Gower JC (1977) The analysis of asymmetry and orthogonality. In: Barra J et al (eds) Recent developments in statistics. North Holland Press, Amsterdam, pp 109–123

Gower JC (1980a) An application of the Leverrier–Fadeev algorithm to skew-symmetric matrix decompositions. Utilitas Mathematica 18:225–240

Gower JC (1980b) A modified Leverrier–Fadeev algorithm for matrices with multiple eigenvalues. Linear Algebra Appl 31:61–70

Gower JC (1980c) Problems in interpreting asymmetrical chemical relationships. In: Bisby FA, Vaughan JG, Wright CA (eds) Chemosystematics: principles and practice, vol 16. Academic Press, London, pp 399–409

Gower JC (2000) Rank-one and rank-two departures from symmetry. Comput Stat Data Anal 33:177–188

Gower JC (2005) An application of the modified Leverrier–Faddeev algorithm to the singular value decomposition of block-circulant matrices and the spectral decomposition of symmetric block-circulant matrices. In: Barlow JL, Berry MW, Ruhe A, Zha H (eds) CSDA. Special issue: Matrix computations and statistics, vol 50/1, pp 89–106

Gower JC (2008) Asymmetry analysis: the place of models. In: Shigemesu K, Okada A, Imaizumi T, Hoshino T (eds) New trends in psychometrics. Universal Academy Press, Tokyo, pp 69–78

Gower JC, Groenen PJF (1991) Applications of the modified Leverrier–Faddeev algorithm for the construction of explicit matrix spectral decompositions and inverses. Utilitas Mathematica 40:51–64

Gower JC, Laslett GM (1978a) Explicit singular value decompositions, spectral decompositions and inverses of some skew symmetric matrices. Utilitas Mathematica 13:33–48

Gower JC, De Rooij M (2003) A comparison of the multidimensional scaling of triadic and dyadic distances. J Classif 20:115–136

Gower JC, van der Velden M, Groenen PJF (2010) Area biplots. J Comput Gr Stat 19:46–61

Gower JC, Zielman B (1992) Some remarks on orthogonality in the analysis of asymmetry. RR-92-08, Department of Data Theory, University of Leiden, p 22

Gower JC, Zielman B (1998) Orthogonality and its approximation in the analysis of asymmetry. Linear Algebra Appl 278:183–193

Leverrier UJJ (1840) Sur les variations seculaires des elements des orbites. J Math

MacDuffee CC (1946) The theory of matrices. Chelsea Publishing Company, New York

Yates F (1947) The analysis of data from all possible reciprocal crosses between a set of parental lines. Heredity 1:297–301
