PSYCHOMETRIKA -- VOL. 53, NO. 4, 503-524, DECEMBER 1988

CANONICAL ANALYSIS OF TWO CONVEX POLYHEDRAL CONES AND APPLICATIONS

MICHEL TENENHAUS

CENTRE HEC-ISA

Canonical analysis of two convex polyhedral cones consists in looking for two vectors (one in each cone) whose squared cosine is a maximum. This paper presents new results about the properties of the optimal solution to this problem, and also discusses in detail the convergence of an alternating least squares algorithm. The set of scalings of an ordinal variable is a convex polyhedral cone, which thus plays an important role in optimal scaling methods for the analysis of ordinal data. Monotone analysis of variance, and correspondence analysis subject to an ordinal constraint on one of the factors, are both canonical analyses of a convex polyhedral cone and a subspace. Optimal multiple regression of a dependent ordinal variable on a set of independent ordinal variables is a canonical analysis of two convex polyhedral cones as long as the signs of the regression coefficients are given. We discuss these three situations and illustrate them by examples.

Key words: convex polyhedral cone, alternating least squares algorithm, optimal scaling, monotone analysis of variance, optimal multiple regression.

Introduction

The analysis of ordinal data by optimal scaling methods leads to the search for a quantification of the categories of the ordinal variables that respects the ordinal structure and maximizes a criterion (de Leeuw, 1984; Gifi, 1981; Young, 1981). Algorithms for the search for optimal scalings have been proposed for analysis of variance (de Leeuw, Young & Takane, 1976; Kruskal, 1965), multiple regression (Young, de Leeuw & Takane, 1976), principal component analysis (Young, Takane & de Leeuw, 1978), canonical analysis (van der Burg & de Leeuw, 1983), generalized canonical analysis (Tenenhaus, 1984; van der Burg, de Leeuw & Verdegaal, 1984; Verdegaal, 1986), as well as for many other methods.

In this paper we have assembled some results useful to the development of such algorithms. The set of scalings of an ordinal variable is a convex polyhedral cone, which thus plays an important role in these algorithms. For example, monotone regression is the projection of a vector into a convex polyhedral cone; monotone analysis of variance and correspondence analysis subject to an ordinal constraint on one of the factors are both canonical analyses of a convex polyhedral cone and a subspace; optimal multiple regression of a dependent ordinal variable on a set of independent ordinal variables is a canonical analysis of two convex polyhedral cones, as long as the signs of the regression coefficients are given.

Canonical analysis of two convex polyhedral cones consists in looking for two vectors (one in each cone) whose squared cosine is a maximum. This paper presents new results about the properties of the optimal solution to this problem. We also discuss in detail the convergence of an alternating least squares algorithm, and we derive some

This paper is dedicated to the memory of Patrice Bertier who suggested its subject to me in 1974. This paper also owes much to the very valuable suggestions of J. P. Benzécri, P. Cazes and J. de Leeuw, and I am very grateful to them. I also want to thank J. Yellin for his help in improving the English of the manuscript.

Requests for reprints should be sent to Michel Tenenhaus, Département S.I.A.D., Centre HEC-ISA, 78350 Jouy-en-Josas, FRANCE.

0033-3123/88/1200-8065 $00.75/0 © 1988 The Psychometric Society


properties of the limits of the calculated sequences, thus completing the results of Benzécri (1979) about convex cones.

As background to the new results, it is useful to recall some definitions and properties of a convex polyhedral cone and its polar. This is done in section one, where we also review the various properties of the projection of a vector into a convex polyhedral cone and study the case of monotone regression. In addition, this section gives an interpretation of a "pool-adjacent-violators" algorithm as a backward multiple regression, which is new. Section two is devoted to canonical analysis of two convex polyhedral cones. In the third and last section we apply the previously presented results to: (a) monotone analysis of variance, (b) correspondence analysis subject to an ordinal constraint on one of the factors, and (c) regression of an ordinal variable on a set of ordinal variables.

1. Projection of a Vector Into a Convex Polyhedral Cone

The properties and determination of the projection of a vector into a convex polyhedral cone play an important role in optimal scaling algorithms. A good reference for the definitions and properties of a convex polyhedral cone is Stoer and Witzgall (1970). The properties of the projection of a vector into a convex polyhedral cone are studied by Stoer and Witzgall (1970) as well as de Leeuw (1977). Some results for this projection are presented in Luenberger (1969) and Kruskal and Carroll (1969). Efficient algorithms for the determination of this projection have been proposed by Lawson and Hanson (1974), and Bremner (1982). In this section, we summarize the various definitions, properties and results. We also discuss the case of monotone regression and we show that a "pool-adjacent-violators" algorithm presented in Barlow, Bartholomew, Bremner and Brunk (1972) has a simple interpretation as a backward multiple regression.

Convex Polyhedral Cone and its Polar

A convex polyhedral cone C can be defined as the set of vectors x in Rn verifying the condition A'x ≤ 0, where A is a matrix with n rows and m columns and A' is its transpose. A convex polyhedral cone C is generated by a finite number of generators: there exist vectors s1, ..., sk of Rn such that any element x of C can be written x = Σ {λi si; i = 1, ..., k}, where λ1, ..., λk are nonnegative. Conversely, any cone with a finite number of generators is polyhedral. We denote by C(S) the conical hull of a set S in Rn.

The polar cone Cp of C is the set of vectors y in Rn such that x'y is nonpositive for any vector x of C. The polar cone Cp is polyhedral: it is generated by the columns of the matrix A.
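As a concrete illustration (not from the paper), the nonnegative quadrant of R2 can serve as a minimal numeric check; the matrix A and the test vectors below are our own choices:

```python
import numpy as np

# C = {x : A'x <= 0} with these columns of A is the nonnegative quadrant of R^2.
# Its generators are s1 = (1, 0) and s2 = (0, 1); the polar cone Cp is the
# nonpositive quadrant, generated by the columns of A.
A = np.array([[-1.0, 0.0],
              [0.0, -1.0]])
x = np.array([2.0, 3.0])        # a vector of C
y = A @ np.array([1.0, 4.0])    # a nonnegative combination of A's columns: y is in Cp
print(np.all(A.T @ x <= 0))     # True: x satisfies A'x <= 0
print(x @ y <= 0)               # True: x'y is nonpositive, as the polar cone requires
```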

Properties of the Projection of a Vector Into a Convex Polyhedral Cone

The properties of the projection of a vector of Rn into a closed convex set in Rn can be applied to the particular case of a convex polyhedral cone. The following two propositions, which are proved in Luenberger (1969, pp. 69-71), summarize these properties. They are illustrated in Figure 1.

Proposition 1. Let y be a vector of Rn and K a closed convex set in Rn. There exists a unique vector x of K such that ||y - x|| ≤ ||y - z|| for any vector z of K. This vector x is the projection of y into K. Furthermore, a vector x is the projection of y into K if and only if (y - x)'(z - x) ≤ 0 for any vector z of K.

Applying this proposition to the particular case of a convex polyhedral cone results in Proposition 2.


FIGURE 1. Projection of a vector y into a closed convex set K or a convex polyhedral cone C.

Proposition 2. Let x be the projection of a vector y of Rn into a convex polyhedral cone C = C(S) that is generated by a set S = {s1, ..., sk}. Let R be the set of vectors si of S orthogonal to y - x. Then the vector x is equal to the projection of y into the subspace L(R) generated by the vectors of R.

The polar cone plays the same role as the orthogonal subspace. This is shown by the following proposition (Stoer & Witzgall, 1970, pp. 51-53) and in Figure 2.

Proposition 3. Let y be a vector of Rn and C a convex polyhedral cone. The orthogonal decomposition y = x + z, where x is in C, z in Cp, and x'z = 0, is unique and is obtained from the projections x and z of the vector y into C and Cp respectively.

Corollary 1. Let y be a vector of Rn and C = C(S) be a convex polyhedral cone generated by the set S = {s1, ..., sk}. Let R be a subset of S. The projection x of y into L(R) is equal to the projection of y into C if and only if (y - x)'s ≤ 0 for any s of S - R, where S - R is the set of vectors of S which do not belong to R.

Corollary 2. Let C be a convex polyhedral cone. A vector y belongs to the polar cone Cp if and only if the projection of the vector y into C is the null vector.

The projection of a vector into a convex polyhedral cone possesses optimal properties often used in optimal scaling algorithms. They are described in the next proposition (de Leeuw, 1977; Kruskal & Carroll, 1969).

Proposition 4. Let C be a convex polyhedral cone, y a vector in Rn which does not belong to Cp, x the projection of y into C, and z any vector of C. We have the following results:

1. The minimum of ||y - z||/||y|| over z in C is reached at z = x and is equal to (1 - cos² (x, y))^{1/2}.

2. The minimum of ||y - z||/||z|| over z in C is reached at z = (1/cos² (x, y))x and is equal to (1 - cos² (x, y))^{1/2}.


FIGURE 2. Orthogonal decomposition of a vector y with respect to C and Cp.

3. The maximum of cos (y, z) over z in C is reached at any nonnull vector of C(x). Furthermore, cos (y, z) = cos (y, x) implies that z is a nonnull vector of C(x).

Proof. The vector x is nonnull since y does not belong to Cp. Result (1) is obvious. We now prove (3). The maximum of cos (y, z) over z in C is reached for some vector z = x1 and is strictly positive since y does not belong to Cp. The cosine being independent of the norm, we may choose a vector x1 with the same norm as that of x. We then have ||y - x1|| ≤ ||y - x||, and the unicity of the projection into C implies that x1 = x. The same argument is used to show that cos (y, z) = cos (y, x) implies that z belongs to C(x). We now prove (2). The minimum of ||y - z||/||z|| is reached for some vector z = x2. We denote by x3 a vector of C(x) with the norm of x2. From ||y - x3||/||x3|| ≥ ||y - x2||/||x2|| we get cos (y, x3) ≤ cos (y, x2), so x2 belongs to C(x). Let us calculate the scalar λ such that x2 = λx. The ratio ||y - λx||/||λx|| being equal to the minimum of ||y - ξx||/||ξx|| with respect to ξ, we conclude that λ = ||y||²/x'y = 1/cos² (x, y). We can deduce (2) from the equalities

||y - x2||²/||x2||² = 1 - ||x||²/||y||²   and   cos (x, y) = ||x||/||y||. □

Corollary. Let C be a convex polyhedral cone, y a vector of Rn not belonging to Cp, S the sphere of radius 1, and x the projection of y into C. Then the projection of y into C ∩ S is equal to x/||x||.

Proof. The minimum of ||y - u|| for u belonging to C ∩ S is reached for a unit vector that maximizes cos (y, u). □

The next result is useful for the study of the convergence of alternating least squares algorithms.

Proposition 5. Let C be a convex polyhedral cone and S the unit sphere. The projection operator A into C is continuous on Rn, and the projection operator B into C ∩ S is continuous on Rn - Cp.

Proof. This proposition is deduced from the following inequalities:

||Au - Av|| ≤ ||u - v|| for any u and v in Rn,

and

||Bu - Bv|| ≤ (2/||Au||) ||u - v|| for any u and v in Rn - Cp. □

Determination of the Projection of a Vector Into a Convex Polyhedral Cone

Many algorithms have been proposed for the determination of the projection x of a vector y into the convex polyhedral cone C = C(S) generated by the vectors of S = {s1, ..., sk}. Quadratic programming can be used to minimize ||y - Σ {αi si; i = 1, ..., k}|| subject to the constraints α1, ..., αk ≥ 0 (Judge & Takayama, 1966). It would seem to be a better procedure, however, to search for a subset R of S such that x is equal to the projection of y into L(R). This is done by using standard computational routines for regression. Waterman (1974) proposed to calculate all the possible regressions of y on a subset of S. Deutsch, McCabe and Phillips (1975) improved this procedure by checking the optimality of each subset. Armstrong and Frome (1976) proposed a branch-and-bound algorithm that restrains the number of subsets of S to be examined.

More efficient algorithms have been designed by Lawson and Hanson (1974) and Bremner (1982). The NNLS (Nonnegative Least Squares) algorithm of Lawson and Hanson and Bremner's algorithm are similar. In both cases, the search for an optimal subset R is performed by an iterative multiple regression. At the initialization step R is empty. During the current step, we obtain a subset R of S such that the projection yR = Σ {αi si; si ∈ R} of y into L(R) has all its coefficients αi strictly positive. If (y - yR)'s is nonpositive for any s belonging to S - R, then yR is the projection of y into C. Otherwise, we add to the subset R the vector s that maximizes either (y - yR)'s (Lawson & Hanson) or the t statistic of s in the regression of y on R + s, the union of sets R and s (Bremner). Now it is possible to find a subset Q of R such that the projection yQ+s = Σ {βi si; si ∈ Q + s} of y into L(Q + s) has all its coefficients βi strictly positive and ||y - yQ+s|| < ||y - yR||. The current step is over and the new set R is now Q + s. The algorithm converges to the optimal subset because, at each iteration, the distance between y and yR strictly decreases and the number of subsets R is finite. To obtain the subset Q, Bremner proposes to take off the variable s with the largest negative t statistic, and to iterate this procedure until all the variables have positive regression coefficients. It is not clear that this leads to a vector yQ+s such that ||y - yQ+s|| < ||y - yR||. In the NNLS algorithm, however, we are sure that a vector yQ+s with this property is obtained.

Lawson and Hanson do not explicitly prove the convergence of the NNLS algorithm to the optimal subset R. This can be done quite easily, however, by using only geometrical arguments, as is shown below.

Proposition 6. Let R be a subset of S such that the projection yR = Σ {αi si; si ∈ R} of y into L(R) has all its coefficients αi strictly positive. Suppose that there exists a vector s of S - R such that (y - yR)'s is strictly positive. Then there exists a subset Q of R such that the projection yQ+s = Σ {βi si; si ∈ Q + s} of y into L(Q + s) has all its coefficients βi strictly positive and such that ||y - yQ+s|| < ||y - yR||.


Proof. Let yR+s = Σ {βi si; si ∈ R + s}. If all the βi are strictly positive, then Q = R and the proposition is proved. Otherwise there exists at least one βi ≤ 0. However, the coefficient β of s is strictly positive, since

0 < ||y - yR||² - ||y - yR+s||² = 2β(y - yR)'s - ||yR - yR+s||²,

and (y - yR)'s is strictly positive. Now consider the vector x of the segment [yR, yR+s] which belongs to C and is as close as possible to y. This vector x is equal to λyR + (1 - λ)yR+s with λ = Max {βi/(βi - αi); βi ≤ 0}. We denote by I the set of vectors si such that λαi + (1 - λ)βi = 0. As λ < 1 and β > 0, the vector s does not belong to I. Since the vector x belongs to C(R + s - I) we get ||y - yR+s-I|| ≤ ||y - x|| < ||y - yR||. Let yR+s-I = Σ {β'i si; si ∈ R + s - I} be the projection of y into L(R + s - I). If all the β'i are strictly positive, then Q = R - I. Otherwise, we iterate the described procedure with x playing the role of yR and yR+s-I that of yR+s, excluding from R another subset I' such that ||y - yR+s-I-I'|| < ||y - yR||. After a finite number of iterations we obtain a subset Q of R with the desired property. □

We now present the NNLS algorithm which gives the projection of y into C in a finite number of steps.

The NNLS algorithm.

Step 1: Initialization of R: R = ∅.
Step 2: If R = S or if the scalars (y - yR)'s are nonpositive for all the vectors s of S - R, then go to Step 6. Otherwise, go to Step 3.
Step 3: Find the vector s of S - R which maximizes (y - yR)'s.
Step 4: If yR+s = Σ {βi si; si ∈ R + s} is such that all the coefficients βi are strictly positive, then incorporate the vector s into the set R and go to Step 2. Otherwise go to Step 5.
Step 5: Calculate λ = Max {βi/(βi - αi); βi ≤ 0} and γi = βi + λ(αi - βi). Eject the vectors si such that γi = 0 from R. Set αi = γi. Go to Step 4.
Step 6: The projection of y into C is equal to the projection of y into L(R).
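To make the six steps concrete, here is a minimal NumPy sketch; the function name and tolerance are our own choices, and for production use scipy.optimize.nnls solves the same problem:

```python
import numpy as np

def cone_projection_nnls(S, y, tol=1e-10):
    """Projection of y into C(S), the cone generated by the columns of S,
    following Steps 1-6 above (a sketch)."""
    n, k = S.shape
    alpha = np.zeros(k)            # Step 1: R is empty, all coefficients zero
    active = []                    # indices of the current subset R

    def ls_coeffs(idx):
        beta = np.zeros(k)         # unrestricted regression of y on L(R)
        beta[idx] = np.linalg.lstsq(S[:, idx], y, rcond=None)[0]
        return beta

    while True:
        w = S.T @ (y - S @ alpha)                      # scalars (y - yR)'s
        outside = [j for j in range(k) if j not in active and w[j] > tol]
        if not outside:                                # Step 2: optimality reached
            return S @ alpha, alpha
        s = max(outside, key=lambda j: w[j])           # Step 3: best entering vector
        trial = active + [s]
        beta = ls_coeffs(trial)                        # Step 4
        while any(beta[j] <= tol for j in trial):      # Step 5: partial step
            lam = max(beta[j] / (beta[j] - alpha[j])
                      for j in trial if beta[j] <= tol)
            gamma = beta + lam * (alpha - beta)
            trial = [j for j in trial if gamma[j] > tol]   # eject gamma_i = 0
            alpha = np.zeros(k)
            alpha[trial] = gamma[trial]
            beta = ls_coeffs(trial)
        active, alpha = trial, beta                    # back to Step 2
```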

From a numerical point of view the calculations are simplified by using the Sweep operator (Jennrich, 1977), as proposed by Bremner (1982). This use of the Sweep operator in the NNLS algorithm has been implemented by Agha (1986).

Monotone Regression

Monotone regression is the projection of a numerical variable y into the convex polyhedral cone of the scalings of an ordinal variable x. In this paragraph we define this problem more precisely and give a new geometrical interpretation of the "Pool-Adjacent-Violators" algorithm described in Barlow, Bartholomew, Bremner, and Brunk (1972, p. 13).

Scaling of an ordinal variable. An ordinal variable x is observed on n subjects and takes its values on a set M = {1, 2, ..., m} of categories provided with the natural order. A scaling δ of the categories of x is a real nondecreasing function which associates a real number δj with each category j of M. The scaling x* of x induced by δ is the numerical variable which associates the scaling δ(x(i)) with each subject i. If we denote by xj the dummy variable which is equal to one if x(i) = j and zero otherwise, the scaling x* can be written x* = Σ {δj xj; j = 1, ..., m}. Taking into account the ordinal constraint, the scalings δj can be written δj = α1 + α2 + ... + αj with α2, ..., αm ≥ 0. Consequently we get x* = Σ {αj zj; j = 1, ..., m}, where zj = Σ {xh; h = j, ..., m}. So the set of scalings x* of the ordinal variable x is the convex polyhedral cone C = L(z1) ⊕ C(z2, ..., zm), which is the direct sum of the subspace generated by z1 and the convex polyhedral cone generated by z2, ..., zm.
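A small sketch of this construction in Python (all names are ours):

```python
import numpy as np

def scaling_cone_generators(x, m):
    """Generators z_j = x_j + x_{j+1} + ... + x_m of the cone of scalings of an
    ordinal variable x coded 1..m and observed on n subjects (a sketch)."""
    x = np.asarray(x)
    X = (x[:, None] == np.arange(1, m + 1)).astype(float)  # dummy variables x_j
    Z = X[:, ::-1].cumsum(axis=1)[:, ::-1]                 # suffix sums give the z_j
    return Z        # every scaling is Z @ alpha with alpha_2, ..., alpha_m >= 0

# z_j(i) = 1 exactly when x(i) >= j, so Z @ alpha is nondecreasing in the category:
Z = scaling_cone_generators(np.array([1, 3, 2, 3, 1]), 3)
```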

Monotone regression and the "pool-adjacent-violators" algorithm. If a numerical variable y and an ordinal variable x are observed on a set of n subjects, the monotone regression problem consists in looking for the scaling x* of the ordinal variable x that is as close as possible to the numerical variable y. So we are looking for the projection x* of the vector y of Rn into the convex polyhedral cone C of the scalings of the ordinal variable x. The particular structure of the cone C allows the construction of fast and simple algorithms based on the assembling of adjacent violators (Ayer, Brunk, Ewing, Reid, & Silverman, 1955; Barlow et al., 1972; Kruskal, 1964; Van Eeden, 1956). In this paragraph we describe a "Pool-Adjacent-Violators" algorithm (Barlow et al., p. 13) and give a geometrical interpretation of it in terms of a backward stepwise regression of y on z1, z2, ..., zm.

Let ȳ(B) denote the mean of the variable y restricted to the subjects i such that x(i) belongs to the subset B of M. A block of categories of M is a subset of M formed by consecutive elements. Let B0, B1, ..., Br be a partition of M into increasing blocks: the largest element of Bh is smaller than the smallest element of Bh+1. By using geometrical arguments, it is possible to show that there exists a partition of M into increasing blocks B0, B1, ..., Br such that the optimal scaling δ takes on the value δ(j) = ȳ(Bh) for category j in Bh.

Proposition 7. There exists a partition of M into increasing blocks B0, B1, ..., Br such that the optimal scaling δ of the categories of x associated with the monotone regression of y on x takes on the value δ(j) = ȳ(Bh) for any category j in Bh.

Proof. Proposition 2 implies the existence of a subset R of {z1, z2, ..., zm} such that the projection x* of y into C = L(z1) ⊕ C(z2, ..., zm) is equal to the projection of y into the subspace L(R). The vector z1 belongs to R since -z1 and z1 belong to C. Let J denote the set of indices j such that zj belongs to R: J = {1, j1, ..., jr}. This set J induces a partition of M into increasing blocks: {Bh = {jh, ..., j(h+1) - 1}, h = 0, ..., r} with j0 = 1 and j(r+1) = m + 1. The indicatory variables i(Bh) = Σ {xj; j ∈ Bh} of the blocks Bh verify i(Bh) = zjh - zj(h+1) for h = 0, ..., r - 1 and i(Br) = zjr. Since the subsets R = {z1, zj1, ..., zjr} and {i(Bh), h = 0, ..., r} generate the same subspace, we may deduce

x* = Σ {ȳ(Bh) i(Bh); h = 0, ..., r} = ȳ(B0) z1 + Σ {(ȳ(Bh) - ȳ(Bh-1)) zjh; h = 1, ..., r}. □

The search for the optimal partition of M into increasing blocks B0, B1, ..., Br is equivalent to the search for an optimal subset J = {1, j1, ..., jr} of M where each jh is the smallest element of Bh. The "Pool-Adjacent-Violators" algorithm gives the optimal blocks B0, B1, ..., Br in an iterative way. At the first iteration each category j is a block. We look at the sequence ȳ(j). If ȳ(1) < ȳ(2) < ... < ȳ(m), the optimal solution has been found. Otherwise, beginning at the first category, we pool together the categories which constitute a monotonically non-increasing run, and this gives another set of blocks on which the procedure is iterated. The optimal solution B0, B1, ..., Br is reached as soon as ȳ(B0) < ȳ(B1) < ... < ȳ(Br).

This algorithm can be interpreted as a backward stepwise multiple regression of the dependent variable y on the independent variables z2, ..., zm. (Note that the variable z1, which is a vector of ones, is implicitly in the regression.) At each step, the variables zj with a non-positive regression coefficient are suppressed. The optimal subset J = {1, j1, ..., jr} is obtained as soon as all the regression coefficients of the variables zj1, ..., zjr are strictly positive.


FIGURE 3. Canonical analysis of two convex polyhedral cones (the cases ρ > 0 and ρ < 0).

This interpretation of the "Pool-Adjacent-Violators" algorithm as a backward stepwise regression comes directly from the fact that the regression coefficient of the variable zjh is ȳ(Bh) - ȳ(Bh-1). Pooling Bh-1 and Bh when ȳ(Bh-1) ≥ ȳ(Bh) is equivalent to ejecting the category jh from J and then removing the variable zjh from the regression.
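The pooling step itself is easy to sketch in Python; the version below pools violating pairs and backtracks, which reaches the same blocks as pooling whole non-increasing runs (a sketch with our own names):

```python
import numpy as np

def pool_adjacent_violators(means, weights):
    """Monotone regression on the category means ybar(j) with category sizes as
    weights: pool adjacent violators until the block means increase, then give
    each category its block mean (a sketch)."""
    blocks = [[m, w, [j]] for j, (m, w) in enumerate(zip(means, weights))]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:       # adjacent violators: pool them
            m1, w1, c1 = blocks[i]
            m2, w2, c2 = blocks[i + 1]
            blocks[i:i + 2] = [[(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2]]
            i = max(i - 1, 0)                     # pooling may create an upstream violation
        else:
            i += 1
    scaling = np.empty(len(means))
    for mean, _, cats in blocks:
        scaling[cats] = mean                      # delta(j) = ybar(B_h) on block B_h
    return scaling
```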

2. Canonical Analysis of Two Convex Polyhedral Cones

The Problem

We consider two convex polyhedral cones C and D generated respectively by the sets of vectors S = {s1, ..., sk} and T = {t1, ..., tl} in Rn. Canonical analysis of the two cones C and D is the search for two normed vectors x of C and y of D maximizing cos² (u, v) over u in C and v in D. According to the relative position of the cones C and D, the problem is to find the smallest or the largest angle between the two cones.

Properties of an Optimal Solution

Let x in C and y in D be two normed vectors verifying ρ² = cos² (x, y) ≥ cos² (u, v) for any u in C and any v in D. The following proposition shows that this optimal solution can be obtained by a canonical analysis of two well-chosen subspaces. This is illustrated in Figure 3.

Proposition 8. There exist a subset P of S and a subset Q of T, such that the optimal solution (x, y) can be obtained by a canonical analysis of the two subspaces L(P) and


L(Q). Furthermore, the vectors x and y are the canonical vectors associated with the largest canonical correlation ρ1.

Proof. Let ε be the sign of ρ = cos (x, y) and let PC and PD be the respective projection operators into the cones C and D. Since the sign of cos (εx, y) is positive, we may conclude that the maximum of cos (u, v) for u in εC and v in D is reached for εx and y. Therefore, using Proposition 4, we conclude that the vectors x and y verify the equations x = (1/|ρ|) PC(εy) and y = (1/|ρ|) PD(εx). Moreover there exist subsets P of S and Q of T such that PC(εy) = PL(P)(εy) and PD(εx) = PL(Q)(εx). From that we deduce the equations PL(P) ∘ PL(Q)(x) = ρ²x and PL(Q) ∘ PL(P)(y) = ρ²y. Thus, the vectors x and y are obtained by canonical analysis of the subspaces L(P) and L(Q): the canonical vectors x and y are associated with the eigenvalue λ = ρ².

Moreover the canonical vectors x and y are associated with the largest eigenvalue λ1 = ρ1². Let x1 and y1 be the canonical vectors associated with the largest eigenvalue λ1. It is always possible to choose these vectors in such a way that the sign of ρ1 = cos (x1, y1) is that of ρ = cos (x, y). Let us suppose that λ1 = ρ1² is strictly greater than λ = ρ²; then (x1, y1) does not belong to the cone C × D (otherwise the optimality of (x, y) would be directly contradicted). Then there exists a vector (u, v) = γ(x1, y1) + (1 - γ)(x, y) of the segment [(x, y), (x1, y1)] belonging to C × D, where γ is strictly positive. In effect, we can suppose without loss of generality that x = Σ {αh sh; sh ∈ P} and y = Σ {βh th; th ∈ Q}, where all the αh and βh are strictly positive. Putting x1 = Σ {α1h sh; sh ∈ P} and y1 = Σ {β1h th; th ∈ Q}, there exists some coefficient α1h or β1h strictly negative, since (x1, y1) does not belong to the cone C × D. The vector (u, v) of the segment [(x, y), (x1, y1)] belongs to C × D as long as γα1h + (1 - γ)αh ≥ 0 and γβ1h + (1 - γ)βh ≥ 0 for all h. Thus we have:

γ ≤ minimum {αh/(αh - α1h), βh/(βh - β1h); α1h < 0 or β1h < 0}.

Since this minimum is strictly positive, we can choose γ to be strictly positive. Since the canonical vectors x1, y1 are orthogonal to the canonical vectors x, y, we obtain |cos (u, v)| = (γ²|ρ1| + (1 - γ)²|ρ|)/(γ² + (1 - γ)²) > |ρ|. So we arrive at a contradiction, which implies the proposition. □

Determination of a Stationary Solution

We call a stationary solution to the problem of canonical analysis of two convex polyhedral cones C and D two normed vectors x in C and y in D that verify the relations x = PC∩S(εy) and y = PD∩S(εx), where ε is the sign of cos (x, y). The optimal solution is stationary, but the converse is not necessarily true. The number of stationary solutions is finite if, for any subset (P, Q) of S × T, the nonnull eigenvalues of PL(P) ∘ PL(Q) are distinct.

By using -C rather than C, it is always possible to be brought back to the case where ε = 1 and where the polar cone of C does not contain D. Let us now study an alternating least squares algorithm which converges to a stationary solution, under the hypothesis that the number of stationary solutions is finite and that D is not contained in the polar cone of C. These hypotheses are not really restrictive: for real data, eigenvalues are usually distinct, and if D is contained in Cp (si'tj ≤ 0 for any i, j), then -D is used instead of D. The algorithm consists in the construction of the sequences xt = PC∩S(yt-1), yt = PD∩S(xt), ρt = cos (xt, yt). The process is initialized by an arbitrary vector y0 of D - Cp. The properties of this process are studied in Propositions 9 through 12.
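A minimal sketch of these sequences in Python, assuming the cones are given by generator matrices S and T and that y0 is not in the polar cone of C; scipy.optimize.nnls performs the cone projections:

```python
import numpy as np
from scipy.optimize import nnls

def project_cone(G, v):
    """Projection of v into the cone generated by the columns of G."""
    coef, _ = nnls(G, v)
    return G @ coef

def als_two_cones(S, T, y0, n_iter=200, tol=1e-12):
    """Sequences x_t = P_{C ∩ S}(y_{t-1}), y_t = P_{D ∩ S}(x_t), rho_t = cos(x_t, y_t)
    (a sketch; assumes no projection collapses to the null vector)."""
    y = y0 / np.linalg.norm(y0)
    rho = -np.inf
    for _ in range(n_iter):
        x = project_cone(S, y)
        x = x / np.linalg.norm(x)     # x_t, normed: projection into C ∩ S
        v = project_cone(T, x)
        y = v / np.linalg.norm(v)     # y_t
        rho_new = float(x @ y)        # rho_t
        if rho_new - rho < tol:       # Proposition 10: a stationary rho stops the process
            break
        rho = rho_new
    return x, y, rho_new
```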

FIGURE 4. The ALS algorithm.

Proposition 9. Let C and D be two convex polyhedral cones such that one of the cones is not contained in the polar of the other, and let S be the unit sphere. The sequences xt = PC∩S(yt-1), yt = PD∩S(xt) and ρt = cos (xt, yt), which are initialized by an arbitrary vector y0 of D - Cp, have the following properties:

1. The sequence ρt is increasing and converges to a limit ρ.
2. The sequences ||xt+1 - xt|| and ||yt+1 - yt|| converge to zero.

Proof. Using Proposition 4 and its corollary we deduce the inequalities ρt ≤ cos (xt+1, yt) ≤ ρt+1, which imply the first property. We now prove the second property. Note that yt is equal to zt/||zt||, where zt is the projection of xt into the cone D; see Figure 4.

1. The sequence ||xt - zt||² is decreasing and tends to 1 - ρ², because it is equal to 1 - ρt².

2. Similarly the sequence ||xt+1 - zt||² is decreasing and tends to 1 - ρ², because it verifies ||xt+1 - zt+1|| ≤ ||xt+1 - zt|| ≤ ||xt - zt||. The right inequality comes from the fact that xt+1 is also equal to the projection of zt into C ∩ S.

3. Let us show that ||zt+1 - zt|| converges to zero. We have:

||xt+1 - zt||² = ||xt+1 - zt+1||² + ||zt+1 - zt||² + 2(xt+1 - zt+1)'(zt+1 - zt).

The last term of the right member is non-negative since zt belongs to D. From that, we deduce ||zt+1 - zt||² ≤ ||xt+1 - zt||² - ||xt+1 - zt+1||². And from 1 and 2, we deduce the result.

4. The convergence to zero of the sequence ||zt+1 - zt|| implies the convergence to zero of the sequences ||yt+1 - yt|| and ||xt+1 - xt||, since the projection operators into S and C ∩ S are continuous. □


The sequence ρt can converge in a finite number of steps. In fact, as soon as ρn = ρn+1, the solution (xn, yn) is stationary. This is shown in the next proposition.

Proposition 10. If there exists an integer n such that ρn = ρn+1, we have (xt, yt) = (xn, yn) for any t > n and the solution (xn, yn) is stationary.

Proof. The equality ρn = ρn+1 implies ρn = cos (xn+1, yn) = ρn+1 and therefore implies the proposition. □

Let us now study the case where the sequence ρt is strictly increasing.

Proposition 11. Let (xt, yt) be a sequence such that ρt = cos (xt, yt) is strictly increasing. Any accumulation point (x*, y*) of the sequence (xt, yt) is stationary.

Proof. Let H be a set of integers such that the limit of the sequence {(xt, yt), t ∈ H} is (x*, y*). Since the sequence ρt converges to ρ, the subsequence {ρt, t ∈ H} also converges to ρ = cos (x*, y*). The limit of the sequence {yt = PD∩S(xt), t ∈ H} is y* = PD∩S(x*). Moreover, for any integer t and u in C, we have ρ ≥ cos (xt+1, yt) ≥ cos (u, yt). Therefore, cos (x*, y*) ≥ cos (u, y*) for any u in C; consequently, x* = PC∩S(y*). □

Proposition 12. If there are a finite number of stationary solutions to the problem of canonical analysis of the convex polyhedral cones C and D, then the sequence (xt, yt) converges to one of the solutions.

Proof. We suppose that the sequence ρt is strictly increasing (otherwise the result is obtained). Since the accumulation points (x*, y*) of the sequence (xt, yt) are stationary, they are finite in number by hypothesis. Let us now use a result of Ostrowski (1966, chap. 28) mentioned by de Leeuw (1977):

If the bounded sequence (xt, yt) is such that ||(xt+1, yt+1) - (xt, yt)|| converges to zero and if the number of accumulation points is finite, then the sequence (xt, yt) converges to some limit (x*, y*).

If there exist subsets P of S and Q of T such that xt = PL(P)∩S(yt-1) and yt = PL(Q)∩S(xt) for any large enough t, the sequence (xt, yt) converges to the first canonical variables of the canonical analysis of the subspaces L(P) and L(Q). This leads to the following algorithm, which can accelerate the convergence of the ALS algorithm and which gives the first canonical variables of some canonical analysis as solution. After several iterations of the ALS algorithm, we get xt in L(Pt) and yt in L(Qt), where Pt and Qt are subsets of S and T. Then we perform an iterative canonical analysis, sketched in code below:

We express the first canonical variables of the subspaces L(Pt) and L(Qt) as linear combinations of vectors of Pt and Qt respectively. If all the canonical coefficients are positive, the desired solution is obtained. Otherwise, we suppress the vectors of Pt and Qt that have nonpositive coefficients, and the remaining sets play the role of Pt and Qt. We iterate this procedure until all the coefficients are positive. This gives a stationary solution (x*, y*).
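A sketch of this finishing step (the SVD route to the first canonical pair and the sign convention are our own choices):

```python
import numpy as np

def first_canonical_pair(P, Q):
    """First canonical vectors of the subspaces spanned by the columns of P and
    Q, via the SVD of the product of orthonormal bases (a sketch)."""
    Up, _ = np.linalg.qr(P)
    Uq, _ = np.linalg.qr(Q)
    U, sv, Vt = np.linalg.svd(Up.T @ Uq)
    return Up @ U[:, 0], Uq @ Vt[0], sv[0]       # x, y and |cos(x, y)|

def iterative_canonical(P, Q, tol=1e-12):
    """Suppress generators with nonpositive canonical coefficients until the
    first canonical pair lies in the cones C(P) and C(Q)."""
    kp, kq = list(range(P.shape[1])), list(range(Q.shape[1]))
    while True:
        x, y, rho = first_canonical_pair(P[:, kp], Q[:, kq])
        ap = np.linalg.lstsq(P[:, kp], x, rcond=None)[0]
        aq = np.linalg.lstsq(Q[:, kq], y, rcond=None)[0]
        if ap.sum() < 0:                 # a canonical pair is defined up to a sign
            x, y, ap, aq = -x, -y, -ap, -aq
        bad_p = [i for i, a in zip(kp, ap) if a <= tol]
        bad_q = [i for i, a in zip(kq, aq) if a <= tol]
        if not bad_p and not bad_q:
            return x, y, rho             # stationary solution (x*, y*)
        kp = [i for i in kp if i not in bad_p]
        kq = [i for i in kq if i not in bad_q]
```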

We have to check that cos (x*, y*) is larger than cos (xt, yt). If it is not the case, it


may mean that we have switched too soon to the iterative canonical analysis, and we should return to the ALS algorithm at the point where we left it and execute additional iterations. It may also mean that the limit of the sequence (xt, yt) is not optimal: this limit is a pair of canonical variables, but not the first pair, which does not belong to C × D. Regardless of the result, however, completion of the ALS algorithm by iterative canonical analysis is useful. If the algorithm succeeds we obtain a stationary solution which is a pair of first canonical variables. If it fails we know that the limit of the ALS algorithm is not optimal, and we can try another initialization.

Agha (1986) has written a program for canonical analysis of two convex polyhedral cones that follows the described procedure.

3. Applications

Let us now use canonical analysis of two convex polyhedral cones to solve three problems: (a) the search for the best monotone transformation of the dependent variable in multiple regression; (b) correspondence analysis subject to an ordinal constraint on one of the factors; and (c) optimal multiple regression of a dependent ordinal variable on a set of independent ordinal variables, subject to the constraint that the regression coefficients have fixed signs.

Search for the Best Monotone Transformation of the Dependent Variable in Multiple Regression

Let us consider a multiple regression problem. Let the vectors of the values of the dependent and independent variables observed on n subjects be denoted by y, x1, ..., xk. Denote by C the set of monotone nondecreasing transformations z of the vector y: yi ≤ yj implies zi ≤ zj, where yi and zi are coordinates of y and z. The matrix X = [x0, x1, ..., xk] consists of a column x0 of n ones and k columns of independent variables. The subspace generated by the columns of X is denoted by L(X). Kruskal (1965) proposed to look for a monotone transformation z = f(y) such that the quantity (called stress) S(z, u) = ||z - u||/||u - ū|| is minimum for z in C and for u in L(X), where ū is the mean of the coordinates of the vector u. The function S(z, u) is invariant for any affine transformation: S(az + b, au + b) = S(z, u). Therefore we can limit the search for the minimum of S(z, u) to z in C and u in L, the subspace of L(X) orthogonal to x0, and S(z, u) becomes equal to ||z - u||/||u||. From the set of coordinates yi of y, we extract the m distinct values ranked in increasing order: y(1) < y(2) < ... < y(m). Let us define the dummy variables zh by zh(i) = 1 if yi ≥ y(h) and 0 otherwise, where h = 1, ..., m. The cone C of the monotone transformations of y is then equal to L(z1) ⊕ C(z2, ..., zm).

For a fixed u of L, the minimum of S(z, u) is reached for the projection z* of u into C and has the value (1 - cos² (z*, u))^{1/2}. For a fixed z of C, the minimum of S(z, u) is reached for (1/cos² (z, u*)) u*, where u* is the projection of z into L, and has the value (1 - cos² (z, u*))^{1/2}.

These two results come directly from Proposition 4. So the problem studied by Kruskal is equivalent to a canonical analysis of the cone C and the subspace L, and we get S(z, u) = (1 - cos² (z, u))^{1/2} at the minimum. In the particular case of a factorial experimental design, Kruskal proposed an iterative algorithm (MONANOVA; Kruskal, 1965; Kruskal & Carmone, 1968), which consists in minimizing the stress S(z, u) alternatively over z and u. The minimization with respect to u is performed, however, by a gradient method rather than by the calculation of a projection into L.
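The cone-and-subspace canonical analysis itself is easy to sketch; the code below alternates an exact projection into L (least squares) with an exact projection into C (isotonic regression from scikit-learn), instead of MONANOVA's gradient step. All names are ours:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotone_stress_fit(y, X, n_iter=200, tol=1e-10):
    """Alternate projections between L (the centered column space of X, the
    n x k matrix of independent variables without the constant column) and the
    cone C of monotone nondecreasing transformations of y (a sketch)."""
    Xc = X - X.mean(axis=0)                  # work orthogonally to the constant
    z = y - y.mean()
    z = z / np.linalg.norm(z)                # start from the standardized y
    iso = IsotonicRegression()
    rho = 0.0
    for _ in range(n_iter):
        u = Xc @ np.linalg.lstsq(Xc, z, rcond=None)[0]   # projection of z into L
        u = u / np.linalg.norm(u)
        z = iso.fit_transform(y, u)          # monotone regression of u on y
        z = z - z.mean()
        z = z / np.linalg.norm(z)            # standardized projection into C
        rho_new = float(z @ u)
        if rho_new - rho < tol:
            break
        rho = rho_new
    return z, u, rho_new                     # stress at the end: sqrt(1 - rho_new**2)
```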

Box and Cox (1964) and Kruskal (1965) have analysed the data of Barella and Sust about the resistance to traction of a wool yarn. The data, which appear in the Box and


Cox paper, give the number of cycles y needed to break the yarn in a 3³ factorial experimental design, where the factors are: x1 = length of the yarn (250, 300, 350 mm), x2 = amplitude of the cycle (8, 9, 10 mm) and x3 = load (40, 45, 50 g). Let us now look for a monotone transformation of y that is as closely related as possible to the factors. We construct the 27 dummy variables zh which indicate the order of the yi and the 6 dummy variables xij associated with the first two categories of the factors xi (we have dropped the last category of each factor).

We perform a canonical analysis of the cone C = L(z1) ⊕ C(z2, ..., z27) and the subspace L = L(x0, x11, ..., x32). We use the ALS algorithm completed with the iterative canonical analysis. We initialize the algorithm with the standardized variable y; then we construct the sequences xt = PL(yt-1), yt = PC(xt) and ρt = cos (xt, yt). The first iterations give ρ1 = .977, ρ2 = .994 and ρ3 = .9964. We then perform the iterative canonical analysis described above and obtain the "optimal" monotone transformation z given in Figure 5. The multiple correlation between z and the variables xij is equal to .9975.

The regression coefficients of the dummy variables xij in the regression of z on the dummy variables xij are the scalings of the categories and are given in Table 1.

The ordinal structure of the factors has been respected: it is a property of the data and not of the method. If such had not been the case, it would have been possible to impose ordinal constraints on the calculation of the coefficients of the factor categories by using the ADDALS algorithm (de Leeuw, Young, & Takane, 1976) or the MORALS algorithm (Young, de Leeuw, & Takane, 1976). As did Fisher (1946), who performed a qualitative analysis of variance by canonical analysis, we have calculated the usual F statistics in the analysis of variance of the transformed variable z with the factors length, amplitude and load (see Table 2). Although these F statistics do not follow Fisher-Snedecor distributions, they can be used as indices of the strength of the relationship between the various factors and the monotone transformation z of the number of cycles needed to break the wool yarn.

Table 2 also shows the results of the analysis of variance when y and log y are used instead of z. The increase in the strength of the relationship between the dependent variable and the factors is clear. We may note that the correlation between z and log y is 0.9899.

Correspondence Analysis Subject to an Ordinal Constraint on One Factor

Correspondence analysis of a contingency table {kij} representing association frequencies between categories i and j of two qualitative variables y and x leads to the search for standardized scalings y* and x* as correlated as possible. If we denote by Y and X the binary tables associated with the variables y and x, the optimal scalings can be written y* = Yψ and x* = Xφ, where ψ and φ are the first factors of correspondence analysis. Let us now study the problem of correspondence analysis subject to an ordinal constraint on one of the factors.

Suppose that the variable y is ordinal with categories 1, ..., m respecting the natural order and that the variable x is nominal with p categories. We want to perform a correspondence analysis subject to the constraint that the factor ψ is a nondecreasing function of the categories of y. Therefore a scaling y* may be written y* = Zα, where the columns of the matrix Z are the indicatory variables of the order of the categories and α is a vector such that all its coordinates (except the first one) must be nonnegative. We wish to find the vectors α and φ that maximize the correlation between the variables Zα and Xφ. But this is precisely a canonical analysis of the cone of the centered scalings y* of y and the subspace L(X). The optimal solution is obtained for the first non-trivial canonical vectors of L(ZJ) and L(X), where ZJ is a submatrix of Z that includes the first column Z1 and also includes the columns Zj for j belonging to J = {j1, ..., jk}.


FIGURE 5. Plot of the "optimal" monotone transformation z versus the logarithm of y.

Denote by Yh = Zj(h-1) - Zjh the indicatory variables of the intervals [j(h - 1), jh - 1], h = 1, ..., k + 1, where j0 = 1, j(k + 1) = m + 1 and Zj(k+1) = 0, and denote by YJ the matrix with the columns Y1, ..., Yk+1. Since the subspaces generated by the columns of ZJ and YJ are the same, the optimal solution is obtained for the first factors of the correspondence analysis of a contingency table gotten by aggregating the rows of the initial table {kij}: the new row h corresponds to the categories of the interval [j(h - 1), jh - 1].
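Since a partition of the m ordered categories into increasing blocks is just a choice of cut points, the optimal pooling can be found by brute force when m is small, which is what the next example does by hand. A sketch (the CA step is a standard SVD of the standardized residuals; names are ours):

```python
import numpy as np
from itertools import combinations

def ca_first_factor(K):
    """First non-trivial row factor and eigenvalue of a correspondence analysis
    of the table K (a standard SVD sketch)."""
    K = np.asarray(K, dtype=float)
    n = K.sum()
    r, c = K.sum(axis=1) / n, K.sum(axis=0) / n
    S = (K / n - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S)
    return U[:, 0] / np.sqrt(r), s[0] ** 2   # standardized row factor, eigenvalue

def best_monotone_pooling(K):
    """Try every partition of the rows into increasing blocks; keep the pooled
    table whose first factor respects the order with the largest eigenvalue."""
    K = np.asarray(K, dtype=float)
    m = K.shape[0]
    best = None
    for ncuts in range(m):
        for cuts in combinations(range(1, m), ncuts):
            bounds = (0,) + cuts + (m,)
            pooled = np.array([K[a:b].sum(axis=0) for a, b in zip(bounds, bounds[1:])])
            psi, lam = ca_first_factor(pooled)
            for cand in (psi, -psi):         # a factor is defined up to its sign
                if np.all(np.diff(cand) >= 0) and (best is None or lam > best[0]):
                    best = (lam, bounds, cand)
    return best        # (eigenvalue, block boundaries, constrained row factor)
```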

An interesting case was analysed by Bradley, Katti & Coons (1962, Example 3) and by Nishisato (1980, p. 163). They considered five treatments which were evaluated according to the scale 1 = Excellent, 2 = Good, 3 = Average, 4 = Bad and 5 = Terrible. They obtained the contingency table shown in Table 3.

We wish to perform a correspondence analysis of this table, subject to the constraint that the factor ψ associated with the rows is a non-decreasing function of the categories. We have studied all the partitions of the scale categories into increasing blocks and found that the optimal partition leading to a first factor ψ that respects the constraint and is associated with the largest first eigenvalue is {[1, 2], 3, [4, 5]}. The corresponding factor φ permits the ranking of the treatments: φ(I) = .086, φ(II) = .617, φ(III) = -1.28, φ(IV) = -.98, φ(V) = 1.406. The new reordered contingency table is given in Table 4 and confirms the ranking of the treatments.

TABLE 1: Scaling of the Factors

            Length of Yarn       Amplitude of Cycle       Load
             250    300   350      8      9    10       40    45   50
Scalings   -1.78  -0.71    0     1.34   .63    0       .87   .46    0


TABLE 2: F Table

                       F Values When the Dependent Variable is
Factors                     y       log y        z
Length of yarn             18         174     1000
Amplitude of cycle         12         100      618
Load                        4          39      258

The alternating least squares algorithm described in section 2 is here a variant of the reciprocal averages method in correspondence analysis. The projection Xφ of Yψ into L(X) leads to the relation

φ(j) = Σ {(kij/k.j) ψ(i); i = 1, ..., m}, for all j.   (1)

The projection Yψ of Xφ into the cone of the scalings y* of y is obtained by monotone regression of Xφ on y. Let B0, ..., Br be the partition of the categories of y associated with this monotone regression. We obtain the relation

ψ(i) = Σ {[Σ {klj; l ∈ Bh} / Σ {kl.; l ∈ Bh}] φ(j); j = 1, ..., p}, for any i belonging to Bh.   (2)

TABLE 3: Contingency Table Scale × Treatment

                       Treatment
Scale        I     II    III     IV      V
1            9      7     14     11      0
2            5      3     13     15      2
3            9     10      6      3     10
4           13     20      7      5     30
5            4      4      0      8      2


TABLE 4: The New Reordered Contingency Table

                       Treatment
Scale      III     IV      I     II      V
1,2         27     26     14     10      2
3            6      3      9     10     10
4,5          7     13     17     24     32

From an arbitrary standardized scaling ψ0 we obtain φ1 by using (1) and then ψ1 by using (2). The process is repeated until the algorithm converges. The scalings are in fact normalized at each step. In the current example the algorithm has converged to the optimal solution. This ALS algorithm has been proposed by Pietri (1972).
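A sketch of this ALS variant of reciprocal averaging; the monotone regression of relation (2) is done with a weighted isotonic regression from scikit-learn, and the normalization uses the row margins (all names are ours):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ordinal_row_ca(K, n_iter=500, tol=1e-10):
    """Reciprocal averages on the contingency table K with the row scaling psi
    constrained to be nondecreasing, alternating relations (1) and (2)."""
    K = np.asarray(K, dtype=float)
    r, c, n = K.sum(axis=1), K.sum(axis=0), K.sum()
    rows = np.arange(K.shape[0], dtype=float)
    iso = IsotonicRegression()
    psi = rows.copy()                          # an arbitrary monotone start psi_0
    for _ in range(n_iter):
        psi_old = psi
        phi = (K.T @ psi) / c                  # relation (1): column averages
        avg = (K @ phi) / r                    # row averages of phi
        psi = iso.fit_transform(rows, avg, sample_weight=r)   # relation (2)
        psi = psi - (r @ psi) / n              # standardize with the row margins
        psi = psi / np.sqrt((r @ psi ** 2) / n)
        if np.linalg.norm(psi - psi_old) < tol:
            break
    return psi, phi
```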

The SDM method of Nishisato (1980, p. 165) is an iterative correspondence analysis. At the initial step the original table is analyzed: if the first factor ψ respects the constraint, the optimal solution has been found; otherwise the two successive categories i and i + 1 maximizing ψ(i) - ψ(i + 1) are pooled and a correspondence analysis is performed on the new table. This procedure is repeated until the constraint on ψ is respected. The steps of the algorithm are shown in Table 5, where we can see that the optimal solution has been found.

TABLE 5: Use of the SDM Method: First Factor ψ1 of the Correspondence Analysis and Associated Eigenvalue λ1

                          Scale                     Eigenvalue
              1       2      3      4      5            λ1
Step 1     -1.04   -1.30   .55   1.07   -.47          .284
Step 2     -1.15   -1.40   .56    .85    .85          .245
Step 3     -1.28   -1.28   .55    .86    .86          .244


TABLE 6: A Subjective Preference System for Ten Reference Cars
(source of criteria information: Special Salon, L'Action Automobile et Touristique, No. 238, octobre 1980)

                                              Criteria
                          Subjective   Maximum   Consumption   Consumption    Horse-
Cars                      ranking      speed     in town       at 120 km/h    power
                                       (km/h)    (lt/100 km)   (lt/100 km)    (CV)
Peugeot 505 GR                1          173        11.4          10.01        10
Opel Rekord 2000 LS           2          176        12.3          10.48        11
Citroën Visa Super "E"        3          142         8.2           7.30         5
VW Golf 1300 GLS              4          148        10.5           9.61        13
Citroën CX 2400 Pallas        5          178        14.5          11.05        13
Mercedes 230                  6          180        13.6          10.40        13
BMW 520                       7          182        12.7          12.26        11
Volvo 244 DL                  8          145        14.3          12.95        11
Peugeot 104 ZS                9          161         8.6           8.42         7
Citroën Dyane                10          117         7.2           6.75         3

Both the ALS algorithm and Nishisato's algorithm lead to a stationary solution, but not necessarily to the optimal one. From this point of view, the approach of Bradley, Katti and Coons is superior: their approach is strictly equivalent to maximizing the correlation between the scalings y* and x*, and they show that their iterative procedure leads to the optimal solution. For this example we may check that they found an R² of .244, strictly equal to the λ1 of Table 5. Their optimal scaling of the categories of y, (-.569, -.569, .286, .431, .431), is the result of an affine transformation of (-1.28, -1.28, .55, .86, .86) such that the difference between the scalings of the extreme categories is one.

Optimal Multiple Regression of Ordinal Variables

Let y, x1, ..., xk be ordinal variables. We want to find standardized optimal scalings y*, x1*, ..., xk* that maximize the correlation between y* and the regression equation ŷ* = β1x1* + ... + βkxk*, where the signs ei of the βi are determined by the user for logical reasons. If we denote by Z, Z1, ..., Zk the centered indicatory variables of the order of the categories of the various variables, the optimal scalings y*, x1*, ..., xk* are obtained by a canonical analysis of the two cones C(Z) and C(e1Z1, ..., ekZk).

We have used the program written by Agha (1986) for canonical analysis of two convex polyhedral cones to perform an optimal multiple regression of the variables which appear in Table 6. These data were analysed by Jacquet-Lagrèze and Siskos (1982). They used linear programming (the UTA method) to find a scaling of the subjective ranking as close as possible to a linear combination of the scalings of the various criteria, where distance is measured by absolute deviation and with the constraint that the coefficients of maximum speed and horsepower be negative and those of consumption in town and consumption at 120 km/h be positive. In using optimal multiple regression we have the same goal, but in a least squares sense. All the variables of Table 6 are considered to be ordinal. The results of the canonical analysis of the cones C(Z), associated with the subjective ranking, and C(-Z1, Z2, Z3, -Z4), associated with the criteria, appear in Figure 6. The optimal scaling of the subjective ranking is perfectly reconstituted by the following equation:


FIGURE 6. Optimal scalings of the ordinal variables (panels: subjective ranking, consumption in town, consumption at 120 km/h, horsepower).

(Subjective ranking)* = 0.86 (Consumption in town)* + 0.67 (Consumption at 120 km/h)* - 1.58 (Horsepower)*.   (3)

The multiple correlation between the dependent variable (Subjective Ranking)* and the independent variables (Consumption in Town)*, (Consumption at 120 km/h)* and (Horsepower)* is equal to one. It is interesting to note that the variable "Maximum Speed" has been eliminated from the model: in the canonical analysis of the two cones C(Z) and C(-Z1, Z2, Z3, -Z4), the coefficients of the columns of (-Z1) are all zero at the optimum. So we have constructed monotone transformations of the variables that are perfectly related.

For the subject who rank-orders the cars, (3) indicates the system of compensation among consumption in town, consumption at 120 km/h and horsepower. Note that the above results are certainly too good and should be validated by bootstrap procedures.
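A sketch of this computation with the two-cone ALS of section 2, building the centered order indicators and projecting with NNLS (the variable names and the fixed iteration count are ours):

```python
import numpy as np
from scipy.optimize import nnls

def centered_order_indicators(x):
    """Centered indicatory variables z_j (j >= 2) of the order of the categories
    of an ordinal variable x coded 1..m (a sketch)."""
    m = int(np.max(x))
    Z = np.column_stack([(x >= j).astype(float) for j in range(2, m + 1)])
    return Z - Z.mean(axis=0)

def ordinal_regression_als(y, xs, signs, n_iter=300):
    """Canonical analysis of C(Z) and C(e_1 Z_1, ..., e_k Z_k): optimal scalings
    of ordinal variables with fixed signs for the regression coefficients."""
    Zy = centered_order_indicators(y)
    Zx = np.hstack([e * centered_order_indicators(x) for x, e in zip(xs, signs)])
    v = Zy.sum(axis=1)                  # a valid nondecreasing starting scaling
    v = v / np.linalg.norm(v)
    for _ in range(n_iter):
        u = Zx @ nnls(Zx, v)[0]
        u = u / np.linalg.norm(u)       # projection of v into C(eZ), normed
        v = Zy @ nnls(Zy, u)[0]
        v = v / np.linalg.norm(v)       # projection of u into C(Z), normed
    return v, u, float(v @ u)           # y*, its best predictor, their correlation
```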

For the same data we have used the MORALS algorithm. This algorithm searches for optimal scalings y*, x1*, ..., xk* that have the same means and standard deviations as the original data, and that maximize the correlation between y* and the regression equation ŷ* = β0 + β1x1* + ... + βkxk*, but where there is no constraint on the signs of the regression coefficients.


FIGURE 7. Optimal scaling of the ordinal variables given by MORALS. Cor (Ranking, Ranking*) = .9986; Cor (C. Town, C. Town*) = .9955; Cor (C. Road, C. Road*) = .9925.

This problem is equivalent to a canonical analysis of the polyhedral convex cone C(Z) and the (non-convex) cone composed of the union of the cones C(e1Z1, ..., ekZk) with e1, ..., ek equal to plus or minus one. The optimal scalings given by MORALS appear in Figure 7, where we can see that the data have been less transformed than above. It is rather interesting to compare the regressions on the original and transformed data; see Table 7. Here too the results are too good, and we can see that very small transformations of the data lead to a large variation in R² but hardly any change at the level of the regression coefficients.


TABLE 7: Comparison of the Regressions on the Original and Transformed Data

Correlations with Rank

                      Speed    C.Town    C.Road     H.P.
Original data         -.417     -.193     -.076    -.263
Transformed data      -.58      -.25      -.073    -.27

Multiple Regressions

Original data:
Rank = 50 - .28 Speed - 4.41 C.Town + 1.79 C.Road + 3.64 H.P.
R² = 0.41, F = 0.86, σ̂ = 3.12

Transformed data:
Rank* = 51 - 0.3 Speed* - 4.36 C.Town* + 1.84 C.Road* + 3.7 H.P.*
R² = 1

Conclusion

The primary purpose of this paper has been to study canonical analysis of two convex polyhedral cones in detail. We have found useful properties of the optimal solution and have studied the convergence of an ALS algorithm, not only at the criterion level but also at the level of the sequences of projections into the cones. The second purpose of the paper has been to apply canonical analysis of two convex polyhedral cones to three situations. For monotone analysis of variance we have shown that the MONANOVA algorithm is equivalent to a canonical analysis of a convex polyhedral cone and a subspace. We have also shown that one can construct a MONANOVA algorithm with only an iterative use of a multiple regression program, since monotone regression can be done by backward stepwise multiple regression. Furthermore, the convergence of the algorithm can be improved with a canonical analysis program. For correspondence analysis subject to an ordinal constraint on one of the factors, we have shown that the optimal solution is obtained for the first factors of a correspondence analysis performed on a table where some categories of the ordinal dimension have been pooled. The algorithms of Pietri, Nishisato and Bradley et al. have been compared: the algorithms of Pietri and Nishisato lead to a stationary solution; the algorithm of Bradley et al. leads to the optimal one. Multiple ordinal regression has been studied as an example of canonical analysis of two convex polyhedral cones, as long as the signs of the regression coefficients have been given.


We have seen that the ALS algorithm is efficient, and that the geometrical interpretation of it as a succession of projections into cones seems very attractive. Finally, it is hoped that this paper will provide a sound basis for further studies on optimal scaling methods for analyzing ordinal data.

References

Agha, A. (1986). Analyse canonique de deux cônes polyédriques convexes [Canonical analysis of two convex polyhedral cones]. Mémoire de D.E.A. Paris: Université Paris IX-Dauphine, LAMSADE.
Armstrong, R. D., & Frome, E. L. (1976). A branch-and-bound solution of a restricted least squares problem. Technometrics, 18, 447-450.
Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T., & Silverman, E. (1955). An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 26, 641-647.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.
Benzécri, J. P. (1979). Un algorithme pour construire une perpendiculaire commune à deux convexes [An algorithm to construct a common perpendicular to two convex sets]. Les cahiers de l'analyse des données, 4, 159-173.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.
Bradley, R. A., Katti, S. K., & Coons, I. J. (1962). Optimal scaling for ordered categories. Psychometrika, 27, 355-374.
Bremner, J. M. (1982). An algorithm for nonnegative least squares and projection into cones. In H. Caussinus, P. Ettinger, & R. Tomassone (Eds.), Compstat 1982 (pp. 155-160). Wien: Physica-Verlag.
de Leeuw, J. (1977). A normalized cone regression approach to alternating least squares algorithms. Unpublished manuscript. Leiden: University of Leiden, Department of Data Theory.
de Leeuw, J. (1984). The Gifi system of nonlinear multivariate analysis. In E. Diday, M. Jambu, L. Lebart, J.-P. Pagès, & R. Tomassone (Eds.), Data analysis and informatics, 3 (pp. 415-424). Amsterdam: North-Holland.
de Leeuw, J., Young, F. W., & Takane, Y. (1976). Additive structure in qualitative data: An alternating least squares method with optimal scaling features. Psychometrika, 41, 471-503.
Deutsch, F., McCabe, J. H., & Phillips, G. M. (1975). Some algorithms for computing best approximations from convex cones. SIAM Journal on Numerical Analysis, 12, 390-403.
Fisher, R. A. (1946). Statistical methods for research workers (10th ed.). Edinburgh: Oliver and Boyd.
Gifi, A. (1981). Nonlinear multivariate analysis. Leiden: University of Leiden, Department of Data Theory.
Jacquet-Lagrèze, E., & Siskos, J. (1982). Assessing a set of additive utility functions for multicriteria decision-making: The UTA method. European Journal of Operational Research, 10, 151-164.
Jennrich, R. I. (1977). Stepwise regression. In K. Enslein, A. Ralston, & H. S. Wilf (Eds.), Statistical methods for digital computers (pp. 58-75). New York: John Wiley & Sons.
Judge, G. G., & Takayama, T. (1966). Inequality restrictions in regression analysis. Journal of the American Statistical Association, 61, 166-181.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 28-42.
Kruskal, J. B. (1965). Analysis of factorial experiments by estimating monotone transformations of the data. Journal of the Royal Statistical Society, Series B, 27, 251-263.
Kruskal, J. B., & Carmone, F. J. (1968). MONANOVA, a FORTRAN IV program for monotone analysis of variance. Unpublished manuscript. Bell Telephone Laboratories, Murray Hill, New Jersey.
Kruskal, J. B., & Carroll, J. D. (1969). Geometrical models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis II (pp. 639-671). New York: Academic Press.
Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs: Prentice-Hall.
Luenberger, D. G. (1969). Optimization by vector space methods. New York: J. Wiley.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Ostrowski, A. M. (1966). Solutions of equations and systems of equations. New York: Academic Press.
Pietri, M. (1972). Analyse des correspondances sous contraintes [Correspondence analysis subject to constraints]. Unpublished manuscript. Fontenay-sous-Bois: CEA, LSEES.
Stoer, J., & Witzgall, C. (1970). Convexity and optimization in finite dimensions I. Berlin: Springer-Verlag.
Tenenhaus, M. (1984). L'analyse canonique généralisée de variables numériques, nominales ou ordinales par des méthodes de codage optimal [Generalized canonical analysis of numerical, nominal or ordinal variables by optimal scaling methods]. In E. Diday, M. Jambu, L. Lebart, J.-P. Pagès, & R. Tomassone (Eds.), Data analysis and informatics, 3 (pp. 71-84). Amsterdam: North-Holland.
Van Eeden, C. (1956). Maximum likelihood estimation of ordered probabilities. Indagationes Mathematicae, 18(3), 444-455.
van der Burg, E. (1983). CANALS users guide. Leiden: University of Leiden, Department of Data Theory.
van der Burg, E., & de Leeuw, J. (1983). Nonlinear canonical correlation. British Journal of Mathematical and Statistical Psychology, 36, 54-80.
van der Burg, E., de Leeuw, J., & Verdegaal, R. (1984). Nonlinear canonical correlation with m sets of variables. Leiden: University of Leiden, Department of Data Theory.
Verdegaal, R. (1986). OVERALS. Leiden: University of Leiden, Department of Data Theory.
Waterman, M. S. (1974). A restricted least squares problem. Technometrics, 16, 135-136.
Young, F. W. (1981). Quantitative analysis of qualitative data. Psychometrika, 46, 357-388.
Young, F. W., de Leeuw, J., & Takane, Y. (1976). Regression with qualitative and quantitative variables: An alternating least squares method with optimal scaling features. Psychometrika, 41, 505-529.
Young, F. W., Takane, Y., & de Leeuw, J. (1978). The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika, 43, 279-281.

Manuscript received 1/20/86. Final version received 8/7/87.