
Journal of Classification 31:107-128 (2014) DOI: 10.1007/s00357-014-9149-8

We would like to thank the anonymous reviewers and the editor for their helpful comments and suggestions. This work is based upon research supported by the National Research Foundation of South Africa. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and therefore the NRF does not accept any liability in regard thereto.

Authors' Addresses: J.C. Gower, Department of Mathematics and Statistics, The Open University, Milton Keynes, MK7 6AA, UK, e-mail: [email protected]; N.J. le Roux, Department of Statistics and Actuarial Science, Stellenbosch University, Stellenbosch, 7600, South Africa, e-mail: [email protected]; S. Lubbe, Department of Statistical Sciences, University of Cape Town, Cape Town, 7701, South Africa, e-mail: [email protected].

    The Canonical Analysis of Distance

John C. Gower, The Open University, UK; Niel J. le Roux, Stellenbosch University, South Africa; Sugnet Gardner-Lubbe, University of Cape Town, South Africa

Abstract: Canonical Variate Analysis (CVA) is one of the most useful of multivariate methods. It is concerned with separating between and within group variation among N samples from K populations with respect to p measured variables. Mahalanobis distance between the K group means can be represented as points in a (K − 1) dimensional space and approximated in a smaller space, with the variables shown as calibrated biplot axes. Within group variation may also be shown, together with circular confidence regions and other convex prediction regions, which may be used to discriminate new samples.

This type of representation extends to what we term Analysis of Distance (AoD), whenever a Euclidean inter-sample distance is defined. Although the N × N distance matrix of the samples, which may be large, is required, eigenvalue calculations are needed only for the much smaller K × K matrix of distances between group centroids. All the ancillary information that is attached to a CVA analysis is available in an AoD analysis.

We outline the theory and the R programs we developed to implement AoD by presenting two examples.

Keywords: Analysis of distance; Biplot; Canonical variate analysis.

    Published online: 3 April 2014


    1. Introduction

Canonical Variate Analysis (CVA) gives a useful method for describing and assessing the differences between the means of K groups or classes (see e.g. Krzanowski 2000; Mardia, Kent and Bibby 1979; and McLachlan 1992). A key concept in CVA is the use of Mahalanobis distance to define inter-group distance. The group means occupy K − 1 dimensions and, after transformation into Mahalanobis space, will usually be approximated, essentially by Principal Component Analysis (PCA), in some smaller number of dimensions. This approximation may be exhibited graphically, together with points representing the individual samples. Confidence circles, or other regions, may be included to represent the degree of uncertainty, and the whole may be endowed with calibrated linear biplot axes. Thus, CVA has two aspects: (a) making maps that exhibit within and between group variability and (b) using such maps to aid discrimination by assigning samples to their best group. By extending the technique referred to by Digby and Gower (1981) as an Analysis of Distance (AoD), we show how the ideas behind CVA can be generalized to cope with other definitions of distance which often occur in the applied literature. We emphasise aspect (a), while (b), which is related to linear discriminant analysis, gets less prominence.

As in CVA, we have measurements on each of p variables for n samples distributed among K groups of sizes $n_1, n_2, \ldots, n_K$ summing to n. These measurements are available in an $n \times p$ matrix $\mathbf{X}$, with group-membership given in an $n \times K$ indicator matrix $\mathbf{G}$. Here, $\mathbf{G}$ is zero except that $g_{ik} = 1$ when the ith sample belongs to the kth group. Thus $\mathbf{G}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}'\mathbf{G} = \mathbf{1}'\mathbf{N}$, where $\mathbf{N} = \mathrm{diag}(n_1, n_2, \ldots, n_K) = \mathbf{G}'\mathbf{G}$.

In AoD it is assumed that the distances $d_{ii'}$ between all pairs i and i′ of samples $(i, i' = 1, \ldots, n)$ are available in the form of an $n \times n$ matrix $\mathbf{D} = \{-\tfrac{1}{2}d_{ii'}^2\}$. To avoid tedious repetition we term such a matrix, of squared distances multiplied by $-\tfrac{1}{2}$, a ddistance matrix. Distances may be defined very generally, though it is desirable that they be Euclidean embeddable, as we shall assume in the following. We shall also require that each variable contributes additively to squared distance, thus satisfying:

$$d_{ii'}^2 = \sum_{j=1}^{p} f_j(x_{ij}, x_{i'j}), \qquad (1)$$

where $f_j(\cdot\,,\cdot)$ is a function defining squared distance for variable j. Distances may be expressed in terms of quantitative variables or by qualitative (categorical) variables. Gower and Legendre (1986) give a list of some of the possibilities. Usually, $f_j(\cdot\,,\cdot)$ represents the same function for each variable, but this is not necessary. It is especially useful to allow different functions when some variables are numerical and some categorical.
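As a concrete illustration, the following minimal R sketch builds a ddistance matrix under the additive definition (1). It is our own illustration, not the authors' code; it assumes the data are held in a data frame, and takes squared difference as $f_j$ for numeric variables and simple mismatch for factors.

```r
## Minimal sketch (ours): a ddistance matrix D = {-(1/2) d^2_{ii'}} from (1).
ddist <- function(X) {
  n  <- nrow(X)
  D2 <- matrix(0, n, n)                    # accumulates squared distances
  for (j in seq_len(ncol(X))) {
    xj <- X[[j]]
    if (is.numeric(xj)) {
      D2 <- D2 + outer(xj, xj, function(a, b) (a - b)^2)  # f_j = (x - x')^2
    } else {
      D2 <- D2 + outer(xj, xj, "!=")       # f_j = 1 if levels differ, else 0
    }
  }
  -0.5 * D2                                # the ddistance convention
}
```

Other choices of $f_j$, such as those listed by Gower and Legendre (1986), drop in by replacing the per-variable terms above.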

Thus, AoD differs from CVA in allowing any Euclidean embeddable measure of inter-sample distance and, by extension, any measure of inter-group distance. As with CVA, these distances may be represented in maps with points representing the centres of the groups, supplemented by additional points representing the within-group variation. In this paper we give examples of Pythagorean distance (equivalent to PCA), the square root of Manhattan distance and Clark's distance.

We could proceed by (a) doing a Principal Coordinate Analysis (PCO) (see Gower and Hand 1996) of the $n \times n$ matrix $\mathbf{D}$, followed by (b) evaluating the group means to produce a map of the K group means. The latter may be approximated in fewer, r say, dimensions by using K-dimensional PCA. PCA involves a rotation of the K group means, and the same rotation may then be applied to all n samples. A problem with this approach is that n may be very large, entailing a massive eigen-decomposition. This can be avoided by using the AoD methodology described below, which requires the whole of $\mathbf{D}$ but the eigenstructure of only a $K \times K$ matrix. This simplification allows large data sets to be handled efficiently and, at the same time, by focussing on the group-average space, helps interpretation. These provide the main motivations of the following.

    The basic methodology started with a somewhat hard-to-find publication by Digby and Gower (1981), followed by Gower (1989), Krzanowski (1994), Gower and Krzanowski (1999) and Gower, Lubbe and le Roux (2011). Ringrose (1996) and Krzanowski and Radley (1989) discussed nonparametric confidence and tolerance regions which may be used to aid discrimination in CVA, and which are readily adaptable to AoD. These papers present successive enhancements and generalizations, a process continued here.

The general plan followed below is:

• Representation of the K group means in K − 1 dimensions.
• Addition of points for all the n samples.
• The approximation of the above in r dimensions and summary in the form of an analysis of distance.
• Endowment of the approximation with predictive calibrated nonlinear biplot axes for quantitative variables.
• A discussion of the methodology of using group sizes as weights.
• Introduction to software written in R for performing an AoD as described above.
• Presentation of examples.


2. Representation of the Group Means

If $\mathbf{g}_k$ is the kth column of $\mathbf{G}$, then $\bar{D}_{kk'} = \frac{1}{n_k n_{k'}}\,\mathbf{g}_k'\mathbf{D}\mathbf{g}_{k'}$ gives the average of the ddistances between the members of the kth and k′th groups. When $k = k'$, the zero diagonals and repeated symmetric values are included in the averaging process. For all K groups we obtain the $K \times K$ matrix:

$$\bar{\mathbf{D}} = \mathbf{N}^{-1}\mathbf{G}'\mathbf{D}\mathbf{G}\mathbf{N}^{-1}. \qquad (2)$$

Using (2), Gower and Hand (1996, p. 249) showed that

$$\delta_{kk'} = \bar{D}_{kk'} - \tfrac{1}{2}\left(\bar{D}_{kk} + \bar{D}_{k'k'}\right) \qquad (3)$$

is the ddistance between the centroids of groups k and k′, forming a ddistance matrix $\boldsymbol{\Delta} = \{\delta_{kk'}\}: K \times K$. In order to obtain a map of the group means, analogous to the map of group means in CVA, any method of multidimensional scaling can be used. In the following we shall use PCO because of its simplicity and openness to algebraic analysis. When we define the distances to be Pythagorean, a PCO of $\boldsymbol{\Delta}$ is equivalent to a PCA of the group means. If in $\mathbf{D}$ we defined Mahalanobis distance between all pairs of samples, a PCO of $\boldsymbol{\Delta}$ would recover the CVA of the canonical means. With other choices of embeddable distance and MDS, different analyses and representations will ensue.
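A short R sketch of (2) and (3) may help fix the computations; group_ddist is our illustrative name, not a function from the authors' software:

```r
## Sketch of (2)-(3); D is the n x n ddistance matrix, G the n x K indicator.
group_ddist <- function(D, G) {
  Ninv  <- diag(1 / colSums(G))                    # N^{-1}
  Dbar  <- Ninv %*% t(G) %*% D %*% G %*% Ninv      # (2): averaged ddistances
  dbar  <- diag(Dbar)                              # within-group averages
  K     <- ncol(G)
  Delta <- Dbar - 0.5 * (outer(dbar, rep(1, K)) +
                         outer(rep(1, K), dbar))   # (3); diag(Delta) is zero
  list(Dbar = Dbar, Delta = Delta)
}
```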

    3. Adding Points for the Individual Samples

Using PCO to represent $\boldsymbol{\Delta}$, the coordinates of the group means $\mathbf{Y}: K \times m$ are obtained from the spectral decomposition of

$$(\mathbf{I} - \mathbf{1}\mathbf{1}'/K)\,\boldsymbol{\Delta}\,(\mathbf{I} - \mathbf{1}\mathbf{1}'/K) = \mathbf{Y}\mathbf{Y}', \qquad (4)$$

where $\mathbf{Y}'\mathbf{Y} = \boldsymbol{\Lambda}$ is diagonal. Gower (1968) showed that any point P, say, whose ddistances to the group means are given in a vector $\boldsymbol{\delta}_P$, has coordinates

$$\mathbf{y}_P' = (\boldsymbol{\delta}_P - \boldsymbol{\Delta}\mathbf{1}/K)'\,\mathbf{Y}\boldsymbol{\Lambda}^{-1}, \qquad (5)$$

together with $y_{P,m+1}^2 = \mathbf{1}'\boldsymbol{\Delta}\mathbf{1}/K^2 - 2\,\mathbf{1}'\boldsymbol{\delta}_P/K - \mathbf{y}_P'\mathbf{y}_P$, an extra (m + 1)th dimension required for each point added but rarely needed in applications.

We assume that the ddistances from the new point P to all the n original points are given in a column-vector $\mathbf{d}: n \times 1$ with $\mathbf{d}' = (\mathbf{d}_1', \mathbf{d}_2', \ldots, \mathbf{d}_K')$; here $\mathbf{d}_k$ is a vector of size $n_k$ giving the ddistances


of the new point from the samples in the kth group. To interpolate all the samples from the kth group, $\mathbf{d}$ must be taken successively as the $n_k$ columns of $\mathbf{D}$ corresponding to the kth group, $[\mathbf{D}_{1k}',\ \mathbf{D}_{2k}',\ \ldots,\ \mathbf{D}_{Kk}']'$. Let $\mathbf{d}^{(h)} = (\mathbf{d}_1^{(h)\prime}, \mathbf{d}_2^{(h)\prime}, \ldots, \mathbf{d}_K^{(h)\prime})'$ denote the hth column of the latter matrix; then Gower, Lubbe and le Roux (2011) show that substituting $\boldsymbol{\delta} = \{\delta_i\}: K \times 1$, with $n_k$ columns of the form

$$\delta_i = \frac{1}{n_i}\mathbf{1}'\mathbf{d}_i^{(h)} - \tfrac{1}{2}\bar{D}_{ii}, \qquad (6)$$

for $i = 1, \ldots, K$ and $h = 1, \ldots, n_k$, in (5) yields the coordinates of all n samples,

$$\mathbf{Z} = \left(\mathbf{D}\mathbf{G}\mathbf{N}^{-1} - \tfrac{1}{2}\,\mathbf{1}\,\mathrm{diag}(\bar{\mathbf{D}})' - \tfrac{1}{K}\,\mathbf{1}\mathbf{1}'\boldsymbol{\Delta}\right)\mathbf{Y}\boldsymbol{\Lambda}^{-1}: n \times m, \qquad (7)$$

where $\mathrm{diag}(\bar{\mathbf{D}})$ denotes the K-vector of the diagonal elements of $\bar{\mathbf{D}}$. The centroids of the $n_k$ inserted points are at the same position as the kth group-mean in $\mathbf{Y}$, as is verified in Section 5.8.1 of Gower, Lubbe and le Roux (2011).
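The following R sketch (again ours, not the AODplot/AODbiplot code) strings (2)-(7) together: a PCO of $\boldsymbol{\Delta}$ for the group-mean coordinates, followed by interpolation of all n samples.

```r
## Sketch of (4), (5) and (7); reuses group_ddist() from the Section 2 sketch.
aod_coords <- function(D, G, tol = 1e-9) {
  n  <- nrow(D); K <- ncol(G); nk <- colSums(G)
  gd <- group_ddist(D, G)                          # Dbar and Delta, (2)-(3)
  C  <- diag(K) - matrix(1 / K, K, K)              # centring I - 11'/K
  e  <- eigen(C %*% gd$Delta %*% C, symmetric = TRUE)
  m  <- sum(e$values > tol)
  Y  <- e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(e$values[1:m]), m)
  Lam <- diag(e$values[1:m], m)                    # Lambda = Y'Y
  ## (6): ddistances of every sample to the K group means
  delta <- D %*% G %*% diag(1 / nk, K) -
           matrix(diag(gd$Dbar) / 2, n, K, byrow = TRUE)
  ## (5)/(7): interpolated sample coordinates as rows of Z
  Z <- (delta - matrix(rowMeans(gd$Delta), n, K, byrow = TRUE)) %*%
       Y %*% solve(Lam)
  list(Y = Y, Z = Z, Lambda = Lam)
}
```

By construction, averaging the rows of Z within each group reproduces the corresponding row of Y, the centroid property noted above.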

    4. Approximation in r Dimensions

Equation (5) gives the coordinates of an added point in K − 1 or fewer dimensions; in r-dimensional approximations, only the first r columns of $\mathbf{Y}$ and the first r eigenvalues will be needed. This is achieved by replacing $\mathbf{Y}\boldsymbol{\Lambda}^{-1}$ in (5) by $\mathbf{Y}\boldsymbol{\Lambda}^{-1}\mathbf{J}$, where

$$\mathbf{J} = \begin{pmatrix} \mathbf{I}_{r \times r} & \mathbf{0}_{r \times (K-r)} \\ \mathbf{0}_{(K-r) \times r} & \mathbf{0}_{(K-r) \times (K-r)} \end{pmatrix}.$$

The cloud of points surrounding each centroid may be enclosed in any tolerance region that expresses spread, analogously to the confidence circles of CVA. Thus, we may use minimal covering circles or ellipses enclosing, say, all or 95% of the points, or we may use bagplots or convex hulls (see e.g. Section 2.9 of Gower, Lubbe and le Roux 2011). Furthermore, a nonparametric permutation procedure can be used for testing group differences, as illustrated in an example below.

Next we show how the above may be summarized in the form of an analysis of distance. We may write $\boldsymbol{\Delta}$ in full matrix form as:

$$\boldsymbol{\Delta} = \bar{\mathbf{D}} - \tfrac{1}{2}\left[\mathrm{diag}(\bar{\mathbf{D}})\,\mathbf{1}' + \mathbf{1}\,\mathrm{diag}(\bar{\mathbf{D}})'\right],$$


whence

$$\mathbf{n}'\boldsymbol{\Delta}\mathbf{n} = \mathbf{n}'\bar{\mathbf{D}}\mathbf{n} - n\,\mathbf{n}'\mathrm{diag}(\bar{\mathbf{D}}), \qquad (8)$$

with $\mathbf{n}$ the K-vector of diagonal elements of $\mathbf{N}$. From (8), and noting that $\mathbf{n}'\bar{\mathbf{D}}\mathbf{n} = \mathbf{1}'\mathbf{D}\mathbf{1}$ and $n_k\bar{D}_{kk} = \mathbf{g}_k'\mathbf{D}\mathbf{g}_k/n_k$, we have

$$\mathbf{n}'\boldsymbol{\Delta}\mathbf{n} = \mathbf{1}'\mathbf{D}\mathbf{1} - n\sum_{k=1}^{K}\frac{\mathbf{g}_k'\mathbf{D}\mathbf{g}_k}{n_k},$$

which rearranges to:

$$\frac{\mathbf{n}'\boldsymbol{\Delta}\mathbf{n}}{n} = \frac{\mathbf{1}'\mathbf{D}\mathbf{1}}{n} - \sum_{k=1}^{K}\frac{\mathbf{g}_k'\mathbf{D}\mathbf{g}_k}{n_k}. \qquad (9)$$

Recalling that $\mathbf{1}'\mathbf{D}\mathbf{1}/n$ is the total sum of squares and $\mathbf{g}_k'\mathbf{D}\mathbf{g}_k/n_k$ is the sum of squares within the kth group, we see that, apart from sign, the analysis of distance (9) is analogous to the CVA orthogonal analysis of variance:

Total sum of squares = Between group sum of squares + Within group sum of squares.

Thus, from (9) we may form an analysis of distance table in which the contributions between and within groups are exhibited. Further, we may break this down into the contributions arising from different dimensions and sets of dimensions, especially the r fitted dimensions and the remaining residual dimensions. The latter may be further subdivided into the (K − 1 − r) dimensions holding the group means and the distances orthogonal to the group means. Note that with K groups, the means fit into K − 1 or fewer dimensions, so the remaining residual dimensions for the group means are null.
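In R, the partition (9) can be sketched as follows, with signs reversed so that all three terms are positive, as in the numerical examples of Section 8; aod_anova is our illustrative name:

```r
## Sketch of the analysis-of-distance partition (9).
aod_anova <- function(D, G) {
  n      <- nrow(D)
  nk     <- colSums(G)
  total  <- -sum(D) / n                          # -1'D1/n
  within <- -sum(diag(t(G) %*% D %*% G) / nk)    # -sum_k g_k'Dg_k/n_k
  c(total = total, between = total - within, within = within)
}
```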

    5. Biplot Axes

We have represented within group variation by choosing $\mathbf{d}$ as the successive columns of $\mathbf{D}$, eventually leading to (7). However, $\mathbf{d}$ may also refer to a genuine new sample, in which case (5) interpolates that sample into the map. In particular, $\mathbf{d}$ may be chosen as a pseudo sample and used to plot predictive trajectories for numerical variables. To do this, we set the pseudo sample for the jth variable to have value $\mu\mathbf{e}_j$, so that as μ varies we trace out a nonlinear trajectory for the jth variable, which may be calibrated for suitably chosen values of μ. In this way the AoD of the individual samples may be enhanced with a biplot to include information on the variables. These trajectories may be approximated in r dimensions by the methods given by Gower and Hand (1996) and Gower, Lubbe and le Roux (2011) for nonlinear biplots. Here, we extend the nonlinear theory


    to construct trajectories in the case of canonical analysis of distance. The trajectories act like coordinate axes and may be used by projecting sample points onto them and reading off the nearest calibrated value. An easier method for reading off predictions equivalent to normal projection is termed circular projection. In circular projection, the trajectories are constructed so that when a circle is drawn with diameter given by the origin and the sample point, the predictions are given where this circle intersects the trajectories. Alternatively, the regression method (see e.g. Chapter 4 in Gower, Lubbe and le Roux 2011) may be used to give approximate linear biplot axes. Krzanowski (2004) suggests a half-way house where a limited number of pseudo samples (say 10) are fitted and joined by linear axes.

    6. Weighting

Finally, we note that the above uses the unweighted centroid of the group means as its origin O, say. Again, analogously to CVA, we may use centroids weighted by sample sizes. The starting point is the weighted PCO of $\boldsymbol{\Delta}$, where now (4) is replaced by

$$(\mathbf{I} - \mathbf{1}\mathbf{n}'/n)\,\boldsymbol{\Delta}\,(\mathbf{I} - \mathbf{n}\mathbf{1}'/n) = \mathbf{Y}_2\mathbf{Y}_2'. \qquad (10)$$

Because $\mathbf{n}'\mathbf{Y}_2 = \mathbf{0}'$, the origin moves from O to G, the origin of the samples, which is the centroid of group centroids weighted by the group sizes. As with CVA, the use of a weighted centroid does not affect the distances between the individual centroids, but in approximations, groups with smaller sample sizes will be less well represented than those with larger sample sizes. A PCO type eigendecomposition of (10) is not entirely satisfactory because it gives an unweighted fit to a best-fitting plane through the weighted centroids. That is, residuals from projections of the group-centroids onto any r-dimensional approximation plane all have unit weight. To weight the residuals according to given weights $\mathbf{W}$, say, is readily accomplished by a weighted PCA of $\mathbf{Y}_2$. That is, we minimize $\operatorname{trace}\{(\mathbf{Y}_2 - \hat{\mathbf{Y}}_2)'\mathbf{W}(\mathbf{Y}_2 - \hat{\mathbf{Y}}_2)\}$, which may be written $\min\|\mathbf{W}^{1/2}(\mathbf{Y}_2 - \hat{\mathbf{Y}}_2)\|^2$. This requires a simple application of the Eckart-Young theorem, in which the singular value decomposition (SVD) $\mathbf{W}^{1/2}\mathbf{Y}_2 = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'$ gives the r-dimensional approximation $\mathbf{W}^{1/2}\hat{\mathbf{Y}}_2 = \mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}'$ and, finally:


$$\hat{\mathbf{Y}}_2 = \mathbf{W}^{-1/2}\mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}'. \qquad (11)$$

Note that (i) we may write $\hat{\mathbf{Y}}_2 = \mathbf{W}^{-1/2}\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'\mathbf{V}\mathbf{J}\mathbf{V}' = \mathbf{Y}_2\mathbf{V}\mathbf{J}\mathbf{V}'$, showing $\hat{\mathbf{Y}}_2$ as an orthogonal projection of $\mathbf{Y}_2$, and (ii) for r-dimensional plotting purposes we may use $\hat{\mathbf{Y}}_2 = \mathbf{Y}_2\mathbf{V}\mathbf{J}$, because the final orthogonal matrix $\mathbf{V}$ merely rotates the r-dimensional solution into p dimensions.

The two steps of determining $\mathbf{Y}_2$ from (10), followed by a weighted PCA of $\mathbf{Y}_2$, may be subsumed into a single step, as follows. Combining the SVD of $\mathbf{W}^{1/2}\mathbf{Y}_2$ with (10) gives:

$$\mathbf{W}^{1/2}(\mathbf{I} - \mathbf{1}\mathbf{n}'/n)\,\boldsymbol{\Delta}\,(\mathbf{I} - \mathbf{n}\mathbf{1}'/n)\mathbf{W}^{1/2} = \mathbf{U}\boldsymbol{\Sigma}^2\mathbf{U}', \qquad (12)$$

which immediately yields $\mathbf{U}$ and $\boldsymbol{\Sigma}$. These may be substituted into (11), ignoring $\mathbf{V}$ as just discussed, to give $\hat{\mathbf{Y}}_2$ referred to r-dimensional principal axes. Normally, the weights $\mathbf{W}$ would be the diagonal matrix $\mathbf{N}$ with the group sizes in the diagonal.

Thus (12) immediately gives a weighted AoD for the group means; the problem of how to add individual samples remains. This is not difficult, but two points have to be borne in mind. Firstly, $\mathbf{Y}$ and $\mathbf{Y}_2$ are referred to different origins. Secondly, although $\mathbf{Y}$ given by (4) and the $\mathbf{Y}_2$ derived from (10) generate the same ddistances, their orientations will differ. Recall that any centring $\mathbf{s}$, where $\mathbf{s}'\mathbf{1} = 1$, does not affect the distances generated by $\mathbf{Y}_s$. This follows from noting that, writing $\mathbf{Y}_s\mathbf{Y}_s' = (\mathbf{I} - \mathbf{1}\mathbf{s}')\boldsymbol{\Delta}(\mathbf{I} - \mathbf{s}\mathbf{1}')$, the squared distance between the ith and i′th rows of $\mathbf{Y}_s$ is

$$(\mathbf{e}_i - \mathbf{e}_{i'})'(\mathbf{I} - \mathbf{1}\mathbf{s}')\boldsymbol{\Delta}(\mathbf{I} - \mathbf{s}\mathbf{1}')(\mathbf{e}_i - \mathbf{e}_{i'}) = (\mathbf{e}_i - \mathbf{e}_{i'})'\boldsymbol{\Delta}(\mathbf{e}_i - \mathbf{e}_{i'}) = \delta_{ii} + \delta_{i'i'} - 2\delta_{ii'} = -2\delta_{ii'}$$

(where $\mathbf{e}_i$ denotes a unit K-vector with its ith element equal to unity, else zero), as given in (3). In (4) we have chosen $\mathbf{s} = \mathbf{1}/K$ and in (10) $\mathbf{s} = \mathbf{n}/n$.

Denote the recentred matrix $(\mathbf{I} - \mathbf{1}\mathbf{n}'/n)\mathbf{Y}$ by $\mathbf{Y}_1$; then we require the rotation $\mathbf{Q}$ of the coordinates represented by $\mathbf{Y}_1$ that matches the coordinates represented by $\mathbf{Y}_2$ or $\hat{\mathbf{Y}}_2$. This match is given by the solution of the orthogonal Procrustes problem $\min_{\mathbf{Q}}\|\mathbf{Y}_2 - \mathbf{Y}_1\mathbf{Q}\|^2$ or $\min_{\mathbf{Q}}\|\hat{\mathbf{Y}}_2 - \mathbf{Y}_1\mathbf{Q}\|^2$.


The solution to this problem is well-known (see, for example, Gower and Dijksterhuis 2004) and is obtained through the SVD $\mathbf{Y}_2'\mathbf{Y}_1 = \mathbf{S}\boldsymbol{\Phi}\mathbf{T}'$ or $\hat{\mathbf{Y}}_2'\mathbf{Y}_1 = \mathbf{S}\boldsymbol{\Phi}\mathbf{T}'$, by setting $\mathbf{Q} = \mathbf{T}\mathbf{S}'$. Moreover, the fit is exact when $\mathbf{Y}_2$ is used. The difference in origin of $\mathbf{Y}_2$ or $\hat{\mathbf{Y}}_2$, relative to $\mathbf{Y}$, is the translation $\mathbf{n}'\mathbf{Y}/n$. These results imply that if a sample is added according to the methodology given in Section 3 to give a point $\mathbf{y}$, then relative to the weighted analysis the point has coordinates $(\mathbf{y}' - \mathbf{n}'\mathbf{Y}/n)\mathbf{Q}$ in the delta-space; the coordinates orthogonal to the delta-space are unchanged. This allows all samples and all biplot trajectories, as well as CLPs, to be placed in the space of the weighted analysis, whose first r dimensions then give the r-dimensional weighted approximation. Thus, the weighted analysis is easily derived from the unweighted analysis.
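A compact R sketch of the weighted analysis, under the assumption $\mathbf{W} = \mathbf{N}$ and with our own function name, is:

```r
## Sketch of (10)-(12) plus the Procrustes alignment; Delta, Y as before,
## nk the vector of group sizes.
weighted_aod <- function(Delta, nk, Y, tol = 1e-9) {
  K   <- length(nk); n <- sum(nk)
  Cw  <- diag(K) - rep(1, K) %o% (nk / n)            # I - 1n'/n
  W12 <- diag(sqrt(nk), K)
  M   <- W12 %*% Cw %*% Delta %*% t(Cw) %*% W12      # (12): M = U Sigma^2 U'
  e   <- eigen(M, symmetric = TRUE)
  m   <- min(sum(e$values > tol), ncol(Y))
  Y2  <- diag(1 / sqrt(nk), K) %*% e$vectors[, 1:m, drop = FALSE] %*%
         diag(sqrt(e$values[1:m]), m)                # W^{-1/2} U Sigma
  Y1  <- Cw %*% Y[, 1:m, drop = FALSE]               # recentred unweighted Y
  s   <- svd(t(Y2) %*% Y1)                           # Y2'Y1 = S Phi T'
  Q   <- s$v %*% t(s$u)                              # Q = TS'
  list(Y2 = Y2, Q = Q)
}
```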

    7. Software for Constructing AoD Biplots

A shortcoming, already referred to above, is that in the discussion of AoD given by Gower, Lubbe and le Roux (2011) no AoD plots are presented with trajectories fitted when the plot is based on general additive Euclidean embeddable distances; nor does their R function AODplot provide facilities for constructing AoD biplots. To address these shortcomings, the R function AODbiplot has been written to construct the AoD biplots discussed in this paper. AODbiplot extends AODplot by utilizing nonlinear biplots as discussed by Gower, Lubbe and le Roux (2011). All the above functions are available by following the instructions in the file ReadMe.txt at https://dl.dropbox.com/u/17860902/CanonAnalDist.zip.

These authors show that the coordinates for tracing a prediction biplot axis $\boldsymbol{\tau}$ for variable t along a series of values μ are based on a series of lines L(μ) with equation

    biplot axis W for variable t along a series of values P, is based on a series of lines L(P) with equation

    c

    c

    PP

    OO

    OO

    dd

    Kyy

    yy

    dd

    KK

    1z 1

    21

    11

    121

    111

    21

    21

    ## , (13)

where $\mathbf{z}$ denotes the two-dimensional coordinates in the biplot space, $\mathbf{J}_2$ selects the first two coordinates, $\boldsymbol{\delta}(\mu)$ is the vector of ddistances from the pseudo sample to the K group means, and $y_{ij}$ is the ijth element of $\mathbf{Y}: K \times m$. Thus (13) requires the sample point $\mathbf{z}$ to project onto the interpolated trajectory at the point corresponding to μ. The procedure is analogous to that of the nonlinear biplot (see Gower and Ngouenet 2005), except for the derivative of $\boldsymbol{\delta}$. Assuming additive distances defined by (1), the ddistances between the pseudo sample $\boldsymbol{\tau}(\mu) = \mu\mathbf{e}_t$ and the $n_k$ samples in the kth group are given by


$$\mathbf{d}^{(k)}(\mu) = -\tfrac{1}{2}\begin{pmatrix} \sum_{j=1}^{p} f_j(x_{1j},0) - f_t(x_{1t},0) + f_t(x_{1t},\mu) \\ \vdots \\ \sum_{j=1}^{p} f_j(x_{n_k j},0) - f_t(x_{n_k t},0) + f_t(x_{n_k t},\mu) \end{pmatrix}.$$

Therefore, for the kth group and variable t it follows that

$$\frac{d\,\mathbf{d}^{(k)}(\mu)}{d\mu} = -\tfrac{1}{2}\begin{pmatrix} \frac{d}{d\mu} f_t(x_{1t},\mu) \\ \vdots \\ \frac{d}{d\mu} f_t(x_{n_k t},\mu) \end{pmatrix}. \qquad (14)$$

Using (6) and (14), it follows that

$$\frac{d\,\delta_i(\mu)}{d\mu} = \frac{1}{n_i}\,\mathbf{1}'\frac{d\,\mathbf{d}^{(i)}(\mu)}{d\mu} = -\frac{1}{2n_i}\sum_{j=1}^{n_i}\frac{d}{d\mu} f_t(x_{jt},\mu)$$

and

$$\frac{d\,\boldsymbol{\delta}(\mu)}{d\mu} = -\tfrac{1}{2}\begin{pmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1}\frac{d}{d\mu} f_t(x_{it},\mu) \\ \frac{1}{n_2}\sum_{i=1}^{n_2}\frac{d}{d\mu} f_t(x_{it},\mu) \\ \vdots \\ \frac{1}{n_K}\sum_{i=1}^{n_K}\frac{d}{d\mu} f_t(x_{it},\mu) \end{pmatrix}.$$

Writing (13) as $\mathbf{a}(\mu)'\mathbf{z} = c(\mu)$ with reparameterization

$$l_i(\mu) = a_i(\mu)\Big/\sqrt{a_1^2(\mu) + a_2^2(\mu)} \quad \text{for } i = 1, 2, \qquad c^*(\mu) = c(\mu)\Big/\sqrt{a_1^2(\mu) + a_2^2(\mu)},$$

the normal projection prediction biplot trajectories are given by the intersections of neighbouring lines L(μ), that is, by solving $l_1(\mu)z_1 + l_2(\mu)z_2 = c^*(\mu)$ together with its derivative with respect to μ:

$$\boldsymbol{\tau}(\mu)' = \frac{1}{l_1(\mu)\dot{l}_2(\mu) - l_2(\mu)\dot{l}_1(\mu)}\left[\;\dot{l}_2(\mu)\,c^*(\mu) - l_2(\mu)\,\dot{c}^*(\mu) \quad l_1(\mu)\,\dot{c}^*(\mu) - \dot{l}_1(\mu)\,c^*(\mu)\;\right],$$


where dots denote differentiation with respect to μ and $\mu_0$ denotes the solution to $c^*(\mu_0) = 0$, the value whose prediction line passes through the origin; the circle projection prediction biplot trajectories are given by

$$\boldsymbol{\tau}(\mu)' = c^*(\mu)\left[\;l_1(\mu) \quad l_2(\mu)\;\right]$$

(Gower, Lubbe and le Roux 2011).

This has been implemented in our R function AODbiplot, used for constructing the examples in the next section. It is of interest to note that the nonlinear biplot described by Gower and Hand (1996) or Gower, Lubbe and le Roux (2011) is obtained as a special case of an AoD biplot by specifying an n-group AoD biplot with each group consisting of a single sample.
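To make the construction concrete, the following R sketch (our own illustration, not AODbiplot itself) traces a circular-projection trajectory for variable t under Pythagorean distance, using numerical differentiation of $\boldsymbol{\delta}(\mu)$ in place of the analytical derivative (14); tvar names the variable so as not to mask R's t().

```r
## Sketch: circular-prediction trajectory points tau(mu) = c*(mu) l(mu).
## X is a numeric data matrix; Y, Lam, Dbar, Delta come from earlier sketches.
trajectory_t <- function(X, G, tvar, Y, Lam, Dbar, Delta, mus, eps = 1e-5) {
  nk <- colSums(G)
  delta_mu <- function(mu) {           # ddistances of pseudo sample mu*e_t
    d2 <- rowSums(X^2) - X[, tvar]^2 + (X[, tvar] - mu)^2   # additive, (1)
    dd <- -0.5 * d2
    as.vector(t(G) %*% dd) / nk - diag(Dbar) / 2            # (6)
  }
  rmD <- rowMeans(Delta)               # Delta 1 / K
  t(sapply(mus, function(mu) {
    del  <- delta_mu(mu)
    ddel <- (delta_mu(mu + eps) - delta_mu(mu - eps)) / (2 * eps)
    y    <- solve(Lam, t(Y) %*% (del - rmD))   # (5), full dimensionality
    ydot <- solve(Lam, t(Y) %*% ddel)
    a  <- ydot[1:2]                    # a(mu): the two plotting components
    cc <- sum(y * ydot)                # c(mu) = y(mu)' ydot(mu)
    (cc / sum(a^2)) * a                # tau(mu) = c*(mu) l(mu)
  }))
}
```

Calibration markers are then placed at the rows of the returned matrix corresponding to suitably chosen values in mus.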

    8. Examples

8.1 Ocotea Data

Our first example concerns the properties of timber sampled from species of the hard wood genus Ocotea. Gower, Lubbe and le Roux (2011) give a detailed description of the data, consisting of three groups (the species) and six continuous variables. Anatomical characteristics of 37 wood samples were determined by microscopic methods. The following measurements were made: vessel diameter in μm (VesD), vessel element length in μm (VesL), fibre length in μm (FibL), ray height in μm (RayH), ray width in μm (RayW) and the number of vessels per mm² (NumVes). The 37 samples consisted of three known species: Ocotea bullata, O. porosa and O. kenyensis.

Initially, we used Pythagorean distance, after normalizing the variables in the usual way to unit variances. With our methodology this gives a PCA of the group means; the individual samples are interpolated as described above. A two-dimensional biplot approximation is shown in Figure 1. We have used open symbols for the samples to differentiate the three species and the corresponding filled symbols to mark the positions of the sample means. Notice that, despite the normalization, for convenience we have calibrated the axes in terms of the actual measurements. We can immediately see that Oken scores high on RayH and FibL but low on NumVes. Similarly, Obul scores high on VesL but low on RayH, VesD and RayW. The main feature of Opor is its low score on VesL compared with the other species.

We can also see from Figure 1 that the sample variation within groups is neither homogeneous nor elliptical, contrary to the classical assumptions of CVA; similar heterogeneity is evident in other figures shown in this section. Nonparametric tolerance regions, as discussed by Ringrose (1996) and Krzanowski and Radley (1989), would have to be used if we had been concerned with discrimination.

The biplot axes in Figure 1 are linear and the individual samples are interpolated onto the plot to show within group variation.


    Figure 1. An AoD using Pythagorean distance, showing the group means (filled symbols) and surrounding sample variation (corresponding unfilled symbols). This is similar to a PCA of the group means and, because of the Pythagorean distance, continues to have linear biplot axes.

Because there are only three species, the group means fit exactly into our two-dimensional display. Therefore, the group means given in Table 1 can be read exactly from the biplot axes in Figure 1.

    The partitioning (9) of the total AoD sum of squared distances of 216.0 in the analysis underlying Figure 1 is: Between = 66.9625 and Within = 149.0375.

That there are real differences between the species is evident from Figure 1. A permutation test of the null hypothesis that any observed difference between the group means is due to chance may be made by assigning the 37 samples randomly to three groups of sizes 20, 10 and 7 respectively.



Table 1. The Group Mean Values of the Raw Ocotea Data.

        VesD    VesL     FibL     RayH    RayW   NumVes
Obul   98.10  412.00  1185.40   375.35   32.30    14.30
Oken  137.29  401.71  1568.86   446.14   37.29     9.14
Opor  129.30  342.40  1051.70   398.20   39.40    14.80

The between and within contributions to the total sum of squared distances of 216.0 were then determined for 10 000 repetitions, and the achieved significance level (ASL) determined as the proportion of times the between contribution exceeds the value of 66.9625. It is clear from the permutation density displayed in Figure 2 that the null hypothesis is rejected, with an ASL of approximately zero.
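The test is easy to sketch in R, reusing aod_anova() from the Section 4 sketch; perm_test and its defaults are our illustrative choices:

```r
## Sketch of the permutation test: random reallocation of the 37 samples
## to groups of sizes 20, 10 and 7, recomputing the between sum of squares.
perm_test <- function(D, sizes = c(20, 10, 7), nrep = 10000,
                      observed = 66.9625) {
  between <- replicate(nrep, {
    lab <- sample(rep(seq_along(sizes), sizes))    # random group labels
    G   <- outer(lab, seq_along(sizes), "==") + 0
    aod_anova(D, G)[["between"]]
  })
  mean(between >= observed)                        # achieved significance level
}
```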

    Next we repeat the analysis of Figure 1, but using the square root of the Manhattan distance in the AoD biplot function (after the same normalization of the data underlying the biplot in Figure 1). Notice that taking the derivative of an absolute value function results in the sharp turns in the biplot axes of Figure 3.

    In contrast to the Pythagorean distance used in Figure 1, the square root of the Manhattan distance results in perfect separation between the samples of the three groups. However, projections onto the trajectories can be ambiguous and are more problematic.

    We note that from (9) the partitioning of the sum of squared distances of the AoD based on the square root of Manhattan distance follows as: Total = 119.2082; Between = 27.3292 and Within = 91.8790.

Next, we give an example of a Euclidean embeddable distance that results in smooth nonlinear trajectories, namely Clark's distance (Gower and Ngouenet 2005), defined by

$$d_{ij} = \sqrt{\sum_{k=1}^{p}\left(\frac{x_{ik} - x_{jk}}{x_{ik} + x_{jk}}\right)^2}$$

for non-negative values $x_{ik}, x_{jk}$. Clark's distance is invariant to the scaling of the variables but not to the location of the origin. Thus, the variables in Clark's distance should always be positive (and preferably nonzero); it is ideal when there is a natural zero for every variable. Since the Ocotea data contain only non-negative values, Clark's distance (without any normalization) can be used in the AoD. The resulting biplot, with axes for circular prediction, is shown in Figures 4 and 5.

It is clear that the intersections of the circles with the respective axes in Figure 5 give relatively accurate predictions of the true values in Table 1, with the exception of the VesL value of Oken and the NumVes value of Obul. The corresponding partitioning (9) of the sum of the squared distances now becomes: Total = 2.7810; Between = 0.7926 and Within = 1.9884.


Figure 2. Permutation distribution of the between sum of squares for 10 000 repetitions, performing the AoD resulting in Figure 1 by randomly allocating the observed samples to three groups of sizes 20, 10 and 7 respectively.

Taking Figures 4 and 5 together, we can make the following remarks. Firstly, the axes are only slightly nonlinear; indeed some axes turn out to be close to linearity. The corresponding AoD biplot with axes constructed to enable normal prediction will give the same predictions as those in Figures 4 and 5. It is therefore not shown here, but we can report that the program used for constructing Figures 4 and 5 takes several orders of magnitude longer when instructed to construct axes for normal prediction. As well as the nonlinear nature of the axes, we can remark on the regularity of the scale markers used for calibration. Of course, in PCA everything is linear and regular, while in Figure 3 everything is nonlinear and irregular. However, Clark's distance seems to have produced only mild nonlinearity accompanied by mild irregularity. Most notable are probably the variables NumVes and FibL, the latter being particularly noteworthy because its irregularity occurs in the centre of the range of the Obul samples. Apart from speed considerations, circular projection has the property of giving all predictions for a sample simultaneously, which is useful, especially when used in conjunction with interactive software. Although we have shown only predictions for the group means, both circular prediction and normal prediction are equally valid for predicting the values of all sample points.


Figure 3. The same as Figure 1 but using the square root of Manhattan distance. In order to follow a trajectory more easily, different colours are used for them. The trajectories are now angular (at data-values) but the groups overlap less.

8.2 Pine Data

In our second example, we illustrate the effects of weighting on the AoD. We use a small set of data on 36 samples (group sizes 11, 5, 6, 9, 5) collected from five species of pine (the groups). The species are described by seven variables: TotYield (total pulp yield expressed as a percentage of the original mass); Alkali (percentage alkali consumption); Density (wood density in kg m⁻³); TEA (tensile energy absorption in mJ g⁻¹); Tensile (tensile index); Tear (tearing index in mN m² g⁻¹) and Burst (burst index in kPa m² g⁻¹) (see Gower, Lubbe and le Roux 2011 for a detailed description). The data come from an investigation at a South African wood mill into the underlying relationships between genetic (species) and physiological factors of wood, including pulp quality. The Pine data will also be used to show how the within sum of squares can be decomposed into components from the delta-space (here in four dimensions), the biplot display space (here in two dimensions), and the space orthogonal to the delta-space (here in three dimensions). In this example, we use Pythagorean distance throughout, but similar methodology would apply with other distances, though not necessarily with similar results.



Figure 4. AoD biplot with Clark's distance on the original Ocotea data. The axes are designed for circular prediction and are well-behaved, with no, or very slight, nonlinearity and near-regular calibrated intervals.



Figure 5. Similar to Figure 4 but omitting the samples to illustrate how circular prediction can be made for the group means.

The partitioning of the total sum of squares associated with the above AoD is: Total ss = 245.00; Between ss = 75.15 and Within ss = 169.85. The contributions in the delta-space to the above partitioning are given in Table 2. Notice that the between sum of squares is confined to the four dimensions of the delta-space, while the within group sum of squares has components in the three higher dimensions. Summing over the first r dimensions gives the contributions in the r-dimensional display space.



Table 2. Contributions to the overall partitioning of the sum of squares obtained in the AoD analysis of the Pine data for both the unweighted and weighted analyses. The first four columns refer to the delta-space and the fifth column to all dimensions orthogonal to the delta-space.

Unweighted analysis
             Dim 1   Dim 2   Dim 3   Dim 4   Dim >4      Sum
Between ss   34.36   27.41   11.60    1.79     0        75.16
Within ss    20.36   55.94   24.62   16.44    52.48    169.84
Total ss     54.72   83.35   36.22   18.23    52.48    245.00

Weighted analysis
             Dim 1   Dim 2   Dim 3   Dim 4   Dim >4      Sum
Between ss   35.97   27.18   10.25    1.75     0        75.16
Within ss    12.05   64.24   25.42   15.65    52.48    169.84
Total ss     48.02   91.42   35.67   17.40    52.48    245.00

Thus, the overall quality of the display in two dimensions is 82.18% (unweighted) and 84.04% (weighted). Figures 6 and 7 contain the unweighted and weighted AoD biplots for the Pine data, respectively.

    The only difference between the unweighted and weighted analyses is in the values found in delta-space. The most striking observation from Figures 6 and 7 is that the weighted and unweighted analyses are very close. Probably the only difference apparent to the naked eye lies in the different distribution of within group variances in the first two dimensions, and even that is accounted for to some extent by differences in orientation.

    Next, we consider the individual contributions to the total within sum of squares (52.48) in the space orthogonal to the delta-space. In Table 3 we show the samples having the five smallest together with the samples having the five largest individual within sum of squares in the space orthogonal to the delta-space.

    Table 3 shows that samples 15, 22, 25, 27 and 34 are nearest to the delta-space while samples 30, 10, 29, 2 and 31 are the furthest away from the delta-space.

Finally, the sum of squares orthogonal to the delta-space can be computed separately for each group (Table 4).

9. Conclusions

Within the context of grouped samples, the methodology described here generalizes canonical analysis to apply to any additive Euclidean embeddable distance, giving associated visualizations of samples and group means, together with the usual accessories of analysis of variance, representations of uncertainty regions and calibrated biplot axes. This generalization has two advantages: (a) it presents a methodology for grouped data that handles commonly used distances of the kind often met in applications to ungrouped data in fields such as ecology, taxonomy and sociology, and (b) our methods are computationally efficient, depending on the number of groups rather than the total number of samples. The generalization is not quite complete, as the additivity assumption does not directly allow for intra-group correlation, but even this may be possible if there is an independently available metric correction-matrix, possibly obtained from previously determined or hypothesized measures of within group dispersion.


Figure 6. Unweighted AoD of the Pine data for comparison with the weighted AoD in Figure 7. Pythagorean distance is used and predictive axes constructed. Axes pass through the unweighted centroid of the group centroids. With Pythagorean distance the axes are linear with regular calibrations.



    Figure 7. Weighted AoD for comparison with the unweighted AoD of Figure 6. The axes run through the weighted centroid of the group centroids which is the same as the centroid of all the individual samples. However, note that the intersection of the axes can be placed anywhere on the plot using orthogonal parallel shifts as explained in Gower, Lubbe and le Roux (2011).

Although we have presented our results in terms of continuous variables, the methodology extends to cover categorical variables, or mixtures of continuous and categorical variables. The main difference is that because a categorical variable has a limited number of possible levels, it will be represented by a set of points, the category level points (CLPs), rather than by a continuous linear or nonlinear trajectory. Although the basic ideas are similar to those we have discussed above, their algebraic development is quite demanding and we shall develop the details elsewhere, showing that many properties of CLPs for ungrouped data extend to grouped data.



Table 3. The five samples with the smallest and the five samples with the largest within sum of squares in the space orthogonal to the delta-space (unweighted and weighted analyses identical).

  Smallest within ss           Largest within ss
Sample  Sum of squares       Sample  Sum of squares
  15         0.09              31         3.13
  22         0.23              02         3.25
  25         0.31              29         3.56
  27         0.32              10         3.76
  34         0.50              30         4.37

Table 4. The within group sum of squares orthogonal to the delta-space, per group.

Group     SS     Group size
P.ell   19.18        11
P.kes    4.02         5
P.max    7.45         6
P.pat   15.51         9
P.tae    6.32         5
Sum     52.48        36

    References

DIGBY, P.G.N., and GOWER, J.C. (1981), "Ordination Between and Within Groups Applied to Soil Classification", in Down to Earth Statistics: Solutions Looking for Geological Problems, ed. D.F. Merriam, Syracuse University Geology Contributions, pp. 53-75.

GOWER, J.C. (1968), "Adding a Point to Vector Diagrams in Multivariate Analysis", Biometrika, 55, 582-585.

GOWER, J.C. (1989), "Generalized Canonical Analysis", in Multiway Data Analysis, eds. R. Coppi and S. Bolasco, Amsterdam: Elsevier (North Holland).

GOWER, J.C., and DIJKSTERHUIS, G.B. (2004), Procrustes Problems, Oxford: Oxford University Press.

GOWER, J.C., and NGOUENET, R.F. (2005), "Nonlinearity Effects in Multidimensional Scaling", Journal of Multivariate Analysis, 94, 344-365.

GOWER, J.C., and HAND, D.J. (1996), Biplots, London: Chapman and Hall.

GOWER, J.C., and KRZANOWSKI, W.J. (1999), "Analysis of Distance for Structured Multivariate Data", Applied Statistics, 48, 505-519.

GOWER, J.C., LUBBE, S., and LE ROUX, N.J. (2011), Understanding Biplots, Chichester: John Wiley & Sons Ltd.

GOWER, J.C., and LEGENDRE, P. (1986), "Metric and Euclidean Properties of Dissimilarity Coefficients", Journal of Classification, 3, 5-48.

KRZANOWSKI, W.J. (1994), "Ordination in the Presence of Group Structure, for General Multivariate Data", Journal of Classification, 11, 195-207.

KRZANOWSKI, W.J. (2004), "Biplots for Multifactorial Analysis of Distance", Biometrics, 60, 517-524.

KRZANOWSKI, W.J. (2000), Principles of Multivariate Analysis: A User's Perspective (Revised Edition), Oxford: Oxford University Press.

KRZANOWSKI, W.J., and RADLEY, D. (1989), "Nonparametric Confidence and Tolerance Regions in Canonical Variate Analysis", Biometrics, 45, 1163-1173.

MARDIA, K.V., KENT, J.T., and BIBBY, J.M. (1979), Multivariate Analysis, London: Academic Press.

McLACHLAN, G.J. (1992), Discriminant Analysis and Statistical Pattern Recognition, Chichester: John Wiley & Sons Ltd.

RINGROSE, T.J. (1996), "Alternative Confidence Regions for Canonical Variate Analysis", Biometrika, 83, 575-587.