
Past, Present, and Future of Multidimensional Scaling

Patrick J. F. Groenen

*Econometric Institute, Erasmus University Rotterdam, The Netherlands,

[email protected], http://people.few.eur.nl/groenen/

Summary:

1 What is MDS?

2 Some Historical Milestones

3 Present

4 Future

5 Summary of highlights in MDS


1 What is MDS?

• Table of travel times by train between 10 French cities:

            Bordeaux  Brest  Lille  Lyon   Marseille  Nice   Paris  Strasbourg  Toulouse  Tours
Bordeaux    0
Brest       9h58      0
Lille       6h39      7h11   0
Lyon        8h05      7h11   4h52   0
Marseille   5h47      8h49   6h12   1h35   0
Nice        8h30      13h36  8h20   4h33   2h26       0
Paris       2h59      4h17   1h04   2h01   3h00       5h52   0
Strasbourg  8h08      10h16  6h54   4h36   7h04       11h15  4h01   0
Toulouse    2h02      13h52  9h42   4h25   3h26       6h29   5h14   10h56       0
Tours       2h36      5h38   4h17   4h21   5h13       9h04   1h13   6h03        6h06      0

[Figure: two panels showing the same 10 cities. Left: MDS map of travel time by train. Right: geographic map of France.]


Dissimilarity matrix ∆

        O1       O2       O3       ...  On-1     On
O1      0
O2      δ12      0
O3      δ13      δ23      0
...     ...      ...      ...      ...
On-1    δ1,n-1   δ2,n-1   δ3,n-1   ...  0
On      δ1n      δ2n      δ3n      ...  δn-1,n   0

⇓

Coordinates matrix X

        dim 1    dim 2
O1      x11      x12
O2      x21      x22
O3      x31      x32
...     ...      ...
On-1    xn-1,1   xn-1,2
On      xn1      xn2

⇒

[Figure: the objects O1, O2, O3, ..., On-1, On plotted as points in the two-dimensional MDS space.]


• First sentence in Borg and Groenen (2005):

"Multidimensional scaling (MDS) is a method that represents measurements of similarity (or dissimilarity) among pairs of objects as distances between points of a low-dimensional space."

• Who uses MDS? Researchers in:

– psychology,

– sociology,

– archaeology,

– biology,

– medicine,

– chemistry,

– network analysis,

– economics, etc.

• Similarities and dissimilarities:

– Large similarity approximated by small distance in MDS.

– Large dissimilarity (δij) approximated by large distance in MDS.

– General term: proximity.


2 Some Historical Milestones

• 1635: van Langren: Provides a distance matrix and a map.

[Figure: map of Durham county by cartographer Jacob van Langren, dated 1635, showing Newcastle and Durham.]



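• 1958: Torgerson: Provides a solution for classical MDS based on eigendecomposition.

• 1966: Gower: Independently provides the same solution for classical MDS and gives the connection to principal components analysis.

Classical MDS: minimize

\[
\mathrm{Strain}(X) = \tfrac{1}{4}\,\bigl\| J\bigl(\Delta^{(2)} - D^{(2)}(X)\bigr) J \bigr\|^2
\]

with J the centering matrix, and \(\Delta^{(2)}\) and \(D^{(2)}(X)\) the matrices of squared dissimilarities and squared distances. The minimum is found by the eigendecomposition of \(-\tfrac{1}{2} J \Delta^{(2)} J\).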
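The eigendecomposition route to classical MDS can be illustrated with a minimal NumPy sketch. The function name `classical_mds` and the toy rectangle data are assumptions made for illustration; negative eigenvalues, which arise when the dissimilarities are not Euclidean, are simply set to zero here.

```python
import numpy as np

def classical_mds(delta, p=2):
    """Classical (Torgerson-Gower) MDS of a symmetric dissimilarity matrix.

    Illustrative sketch: returns an n-by-p coordinate matrix.
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix J
    B = -0.5 * J @ (delta ** 2) @ J            # -1/2 J Delta^(2) J
    eigval, eigvec = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:p]       # indices of the p largest
    lam = np.clip(eigval[order], 0.0, None)    # negative eigenvalues set to zero
    return eigvec[:, order] * np.sqrt(lam)

# Toy data (hypothetical): the four corners of a 3-by-4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
delta = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

X = classical_mds(delta, p=2)
d_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
print(np.allclose(d_hat, delta))  # distances of Euclidean data are recovered
```

The recovered configuration equals the original one only up to rotation, reflection, and translation, which is why the check compares distances rather than coordinates.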





• 1962: Shepard: Provides a heuristic for MDS.








• 1964: Kruskal: Establishes least-squares MDS. Provides a minimization algorithm. Proposes ordinal MDS plus optimization.

Minimize Stress-I:

\[
\sigma_I(X, \hat{d}) = \sqrt{\frac{\sum_{i<j} \bigl(\hat{d}_{ij} - d_{ij}(X)\bigr)^2}{\sum_{i<j} d_{ij}^2(X)}}
\]

with \(\hat{d}_{ij}\) a disparity satisfying a monotone relation with the proximities.
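As a rough illustration, Stress-I in Kruskal's form (the square root of the normalized squared error) can be evaluated as in the sketch below. It assumes the disparities are already available; the monotone regression step that would produce them in ordinal MDS is omitted, and the helper names and the three-point configuration are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def stress_1(X, d_hat):
    """Kruskal's Stress-I for configuration X and disparities d_hat."""
    D = pairwise_distances(X)
    iu = np.triu_indices(X.shape[0], k=1)      # pairs with i < j
    numerator = ((d_hat[iu] - D[iu]) ** 2).sum()
    denominator = (D[iu] ** 2).sum()
    return np.sqrt(numerator / denominator)

# Hypothetical configuration: three points on a line.
X = np.array([[0.0], [1.0], [3.0]])
D = pairwise_distances(X)

perfect = stress_1(X, D)        # disparities equal to the distances
doubled = stress_1(X, 2.0 * D)  # disparities twice the distances
print(perfect, doubled)
```

Because of the normalization by the sum of squared distances, Stress-I does not change when the configuration and the disparities are rescaled by the same factor.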


– Classic example: Rothkopf (1957) Morse code confusion data

+ Is there some systematic way in which people confuse Morse codes?

+ 36 Morse codes (26 for the alphabet, 10 for the digits).

+ Subjects' task: judge whether two Morse codes are the same or not. For example:

+ Is -. (N) the same as .-. (R)? Yes (1) or no (2).

+ Each stimulus pair is presented in two orders: pair NR and RN.

+ Each subject judges many combinations of Morse codes.

+ N = 598.

+ Morse code confusion table: proportion confused (shown as percentages).

+ Data are similarities.

             A    B    C    D    ...  0
.-      A    92   4    6    13   ...  3
-...    B    5    84   37   31   ...  4
-.-.    C    4    38   87   17   ...  12
-..     D    8    62   17   88   ...  6
...     ...  ...  ...  ...  ...  ...  ...
-----   0    9    3    11   2    ...  94


[Figure: MDS configuration of the Rothkopf (1957) Morse code confusion data, with each point labeled by its Morse code.]






• 1964: Guttman: Facet theory and regional interpretation in MDS.

– In facet theory, extra information (external variables) is available on the objects according to the facet design by which the objects are generated:

+ Every object i belongs to a category on one or more facets.

+ See, e.g., Guttman (1959), Borg & Shye (1995), Borg & Groenen (1997, 1998).

Dissimilarity matrix ∆ and facet design (category of each object on facets 1, 2, and 3):

        Dissimilarity matrix ∆                            Facet
        O1       O2       O3       ...  On-1     On       1    2    3
O1      0                                                 1    1    3
O2      δ12      0                                        1    2    3
O3      δ13      δ23      0                               2    1    3
...     ...      ...      ...      ...                    ...  ...  ...
On-1    δ1,n-1   δ2,n-1   δ3,n-1   ...  0                 3    1    1
On      δ1n      δ2n      δ3n      ...  δn-1,n   0        3    2    1

– The extra facet information is used to partition the objects in the MDS space into regions.

– Facets are used for regional hypotheses about the empirical structure of the data.

[Figure: three prototypical partitions of an MDS space into regions by a facet with categories a, b, and c. Panels: axial, modular, polar.]



– For the Morse code data, we have additional information available:

1. Length of the signal (.05 to .95 seconds; listed below in hundredths of a second).

2. Signal type (ratio of long versus short beeps, with 1 = short and 2 = long).

Letter  Morse code  Length  Signal type    Letter  Morse code  Length  Signal type
A       .-          25      1=2            S       ...         25      1
B       -...        45      1>2            T       -           15      2
C       -.-.        55      1=2            U       ..-         35      1>2
D       -..         35      1>2            V       ...-        45      1>2
E       .           05      1              W       .--         45      1<2
F       ..-.        45      1>2            X       -..-        55      1=2
G       --.         45      1<2            Y       -.--        65      1<2
H       ....        35      1              Z       --..        55      1=2
I       ..          15      1              1       .----       85      1<2
J       .---        65      1<2            2       ..---       75      1<2
K       -.-         45      1<2            3       ...--       65      1>2
L       .-..        45      1>2            4       ....-       55      1>2
M       --          35      2              5       .....       45      1
N       -.          25      1=2            6       -....       55      1>2
O       ---         55      2              7       --...       65      1>2
P       .--.        55      1=2            8       ---..       75      1<2
Q       --.-        65      1<2            9       ----.       85      1<2
R       .-.         35      1>2            0       -----       95      2



– Borg and Groenen (2005): Regional restrictions through Proxscal, by specifying:

+ two dimensions,

+ two external variables,

+ an ordinal transformation of each variable, using the primary approach to ties.

[Figure: two MDS solutions of the Morse code data. Panels: Unconstrained, Regionally constrained. Points are labeled by their code (1 = short beep, 2 = long beep); the regions are defined by signal type (1, 1>2, 1=2, 2>1, 2) and by signal length (05 to 95).]











• 1969: Horan: Dimension weighting models in 3-way MDS.

• 1970: Carroll and Chang: Introduction of INDSCAL and IDIOSCAL.




– 3-way MDS: more than one dissimilarity matrix.

– In the weighted Euclidean model, the common space G may be stretched or shrunk along the axes for each source.

– Model δijk ≈ dij(GSk) with

+ G a single common space and

+ Sk a diagonal matrix of dimension weights.

– INDSCAL uses STRAIN loss.

[Figure: a common space and the three individual spaces that approximate ∆1, ∆2, and ∆3, obtained by weighting its dimensions: s11 = 1.5, s12 = .5; s21 = .8, s22 = 1.5; s31 = 1, s32 = .3.]
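The effect of the dimension weights in the weighted Euclidean model can be sketched numerically. The weights below are the ones shown in the figure; the common space `G` (the corners of a unit square) is a made-up example, and the helper name `pairwise_distances` is an assumption.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical common space G: corners of the unit square.
G = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Dimension weights (s_k1, s_k2) for the three sources, as in the figure.
weights = {1: (1.5, 0.5), 2: (0.8, 1.5), 3: (1.0, 0.3)}

# Model distances d_ij(G S_k) with S_k a diagonal matrix of weights.
individual = {k: pairwise_distances(G @ np.diag(s)) for k, s in weights.items()}

for k, D in individual.items():
    print(f"source {k}: d_12 = {D[0, 1]:.2f}, d_13 = {D[0, 2]:.2f}")
```

Objects 1 and 2 differ only on dimension 1 and objects 1 and 3 only on dimension 2, so their model distances for source k equal the weights themselves.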


– In the generalized Euclidean model, the common space G may be rotated and then stretched or shrunk along the rotated axes.

– Model δijk ≈ dij(GSk) with

+ G a single common space and

+ Sk any matrix of dimension weights.

– IDIOSCAL uses STRAIN loss.


[Figure: a common space and the three individual spaces that approximate ∆1, ∆2, and ∆3, obtained by rotating it and then weighting the rotated axes: α1 = 30°, s11 = 1.2, s12 = .5; α2 = 45°, s21 = .8, s22 = .5; α3 = -10°, s31 = 1, s32 = .3.]







2.1 Milestones in MDS algorithms

• 1958, 1966: Torgerson & Gower: Solutions for classical MDS.

• 1964: Kruskal: Introduction of the Stress-I loss function, its minimization, and ordinal MDS.

• 1977: De Leeuw: Introduction of the SMACOF (Scaling by MAjorizing a COmplicated Function) algorithm for MDS.

• 1980: De Leeuw & Heiser: SMACOF extended to a comprehensive MDS algorithm allowing transformations of the dissimilarities, constraints on the configuration, and three-way dimension weighting extensions.

• 1988: De Leeuw: Convergence results derived for the SMACOF algorithm.

• 1995: Groenen, Mathar, Heiser: Extension of SMACOF to city-block distances.

[Photos: De Leeuw, Heiser, Mathar.]


• Formalizing MDS by minimizing raw Stress over X:

\[
\sigma_r(X, \hat{d}) = \sum_{i<j} w_{ij} \bigl(\hat{d}_{ij} - d_{ij}(X)\bigr)^2
\]

with \(w_{ij} \ge 0\) and \(\delta_{ij} \ge 0\), where

\(\hat{d}_{ij}\): disparity, d-hat, pseudo-distance: optimal transformation of the dissimilarities subject to (ordinal) restrictions and \(\sum_{i<j} w_{ij} \hat{d}_{ij}^2 = n(n-1)/2\), to avoid the trivial solution \(\hat{d} = 0\) and \(X = 0\)

\(d_{ij}(X)\): Euclidean distance between rows i and j of X

\(X\): n×p matrix of coordinates of n objects by p dimensions

\(w_{ij}\): nonnegative weights (for example, to code missings)
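To make the minimization concrete, here is a minimal sketch of raw Stress together with one majorization (SMACOF) update, often called the Guttman transform. It assumes fixed disparities (no transformation step), a symmetric weight matrix with zero diagonal, and made-up random data; the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def raw_stress(X, d_hat, W):
    """Raw Stress: sum over i < j of w_ij (d_hat_ij - d_ij(X))^2."""
    D = pairwise_distances(X)
    iu = np.triu_indices(X.shape[0], k=1)
    return float((W[iu] * (d_hat[iu] - D[iu]) ** 2).sum())

def guttman_transform(Y, d_hat, W):
    """One SMACOF update X = V^+ B(Y) Y from the previous configuration Y."""
    D = pairwise_distances(Y)
    V = -np.asarray(W, dtype=float)
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))        # v_ii = sum_j w_ij
    ratio = np.divide(d_hat, D, out=np.zeros_like(D), where=D > 0)
    B = -W * ratio                             # b_ij = -w_ij d_hat_ij / d_ij(Y)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))        # b_ii = -sum_{j != i} b_ij
    return np.linalg.pinv(V) @ B @ Y

# Hypothetical data: disparities taken from a random 2D configuration.
rng = np.random.default_rng(0)
d_hat = pairwise_distances(rng.normal(size=(6, 2)))
W = np.ones((6, 6)) - np.eye(6)                # unit weights, no missings

X = rng.normal(size=(6, 2))                    # random start
history = [raw_stress(X, d_hat, W)]
for _ in range(50):
    X = guttman_transform(X, d_hat, W)
    history.append(raw_stress(X, d_hat, W))
print(history[0], history[-1])
```

Each update cannot increase raw Stress, which is the property that makes majorization attractive; the sequence may still end in a local minimum.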


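• Constrained MDS (De Leeuw & Heiser, 1980):

– Easy to combine majorization with constraints.

– The majorizing function \(\hat{\sigma}(X, Y)\) can be conveniently expressed as

\[
\begin{aligned}
\hat{\sigma}(X, Y) &= \eta_\delta^2 + \operatorname{tr} X'VX - 2 \operatorname{tr} X'B(Y)Y \\
&= \eta_\delta^2 + \operatorname{tr} X'VX - 2 \operatorname{tr} X'V\bar{X} \\
&= \eta_\delta^2 + \bigl(\operatorname{tr} X'VX + \operatorname{tr} \bar{X}'V\bar{X} - 2 \operatorname{tr} X'V\bar{X}\bigr) - \operatorname{tr} \bar{X}'V\bar{X} \\
&= \eta_\delta^2 + \operatorname{tr} (X - \bar{X})'V(X - \bar{X}) - \operatorname{tr} \bar{X}'V\bar{X}
\end{aligned}
\]

with

\(Y\) the previous configuration,

\(\bar{X}\) the unconstrained update, which satisfies \(V\bar{X} = B(Y)Y\),

\(\eta_\delta^2\) the sum of squared dissimilarities,

\(V\) a fixed (positive semi-definite) matrix depending on the weights.

In the last line, \(\operatorname{tr} (X - \bar{X})'V(X - \bar{X})\) is quadratic in X, whereas \(\eta_\delta^2\) and \(\operatorname{tr} \bar{X}'V\bar{X}\) are constant.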



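• What type of constraints can be imposed?

– Any constraint on X that is solved easily by minimizing least squares error, e.g., the linear constraints X = ZC (for given Z).

– Three-way MDS through constrained MDS (De Leeuw & Heiser, 1980):

– Minimize

\[
\sigma_r(G, S_1, S_2, \ldots, S_K) = \sum_{k=1}^{K} \sum_{i<j} w_{ijk} \bigl(\delta_{ijk} - d_{ij}(GS_k)\bigr)^2
\]

where

+ G is the n×p matrix of coordinates (the common space)

+ Sk is the p×p matrix of weights

– Consider the block matrices (shown here for K = 4):

\[
\Delta^* = \begin{bmatrix} \Delta_1 & 0 & 0 & 0 \\ 0 & \Delta_2 & 0 & 0 \\ 0 & 0 & \Delta_3 & 0 \\ 0 & 0 & 0 & \Delta_4 \end{bmatrix}, \quad
W^* = \begin{bmatrix} W_1 & 0 & 0 & 0 \\ 0 & W_2 & 0 & 0 \\ 0 & 0 & W_3 & 0 \\ 0 & 0 & 0 & W_4 \end{bmatrix}, \quad \text{and} \quad
X^* = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}
\]

– Then, the dimension weighting models amount to restricting X* by Xk = GSk.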
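For the linear constraint X = ZC mentioned above, the constrained update reduces to a weighted least-squares problem: minimize tr (ZC - X̄)'V(ZC - X̄) over C. The sketch below solves the corresponding normal equations; the function names, the random external variables Z, and the random stand-in for the unconstrained update are assumptions for illustration only.

```python
import numpy as np

def objective(X, X_bar, V):
    """Quadratic part of the majorizing function: tr (X - X_bar)' V (X - X_bar)."""
    E = X - X_bar
    return float(np.trace(E.T @ V @ E))

def constrained_update(X_bar, V, Z):
    """Least-squares fit of X = Z C to the unconstrained update X_bar in the metric V."""
    A = Z.T @ V @ Z
    rhs = Z.T @ V @ X_bar
    # lstsq also copes with a singular Z'VZ (V has the constant vector in its null space).
    C = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return Z @ C, C

rng = np.random.default_rng(1)
n = 5
W = np.ones((n, n)) - np.eye(n)            # unit weights
V = np.diag(W.sum(axis=1)) - W             # V as implied by the weights
Z = rng.normal(size=(n, 2))                # hypothetical external variables
X_bar = rng.normal(size=(n, 2))            # stands in for an unconstrained update

X_con, C = constrained_update(X_bar, V, Z)
print(objective(X_con, X_bar, V))
```

In a full algorithm this step would simply replace the unconstrained update in every majorization iteration.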


3 Present

• 1986-1998: Meulman: integration of (nonlinear) multivariate analysis and MDS.

– Much emphasis on the representation of objects, less on the variables.

– Fitting by MDS through Stress as a dimension reduction technique.

– Including a wide variety of MVA techniques:

+ (Nonlinear) PCA

+ Multiple Correspondence Analysis

+ Correspondence Analysis

+ Generalized Canonical Correlation Analysis

+ Discriminant Analysis.



• 1994: Buja: Constant dissimilarities

– Data with all δij = 1 can be seen as maximally uninformative in MDS, since all pairs of objects are equally dissimilar.

– Suppose

\[
\Delta = c \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \quad \text{with } c > 0
\]

– What configuration does MDS yield with constant data?





– Buja, Logan, Reeds, & Shepp (1994) proved:

Dimensionality             Configuration
1 dimensional              points equally spaced on a line
2 dimensional              points on concentric circles
3 dimensional or higher    points on a sphere




• 1978, 1995-: Various authors: local minima in MDS:

1. unidimensional scaling (Defays, De Leeuw, Pliner, Hubert, Arabie, Vera)

[Photos: Daniel Defays; Larry Hubert and Mathew Hesson-McInnes; Phipps Arabie; Jose Fernando Vera.]


– When do local minima occur?

+ Unidimensional scaling (De Leeuw & Heiser, 1977; Defays, 1978; Hubert and Arabie, 1986; Pliner, 1996).

+ City-block MDS (Hubert, Arabie & Hesson-McInnes, 1992).

+ Depends on the data: on the dimensionality (increasing dimensionality ⇒ fewer local minima) and on the error structure.



– What can you do about local minima?

+ Multiple random starts.

+ Tunneling (Groenen & Heiser, 1996).

+ Distance smoothing: unidimensional scaling (Pliner, 1996), city-block MDS, general MDS (Groenen et al., 1999).

+ Metaheuristics: simulated annealing (De Soete, Hubert, Arabie, 1988; Brusco, 2001; Vera & Heiser, 2005, 2007), genetic algorithms, ...





• 1998: Buja: Applying weights in Stress to mimic loss functions

+ Choose \(w_{ij} = \delta_{ij}^{-2}\). Then raw Stress becomes:

\[
\sigma_r(X) = \sum_{i<j} w_{ij} \bigl(\delta_{ij} - d_{ij}(X)\bigr)^2
= \sum_{i<j} \delta_{ij}^{-2} \bigl(\delta_{ij} - d_{ij}(X)\bigr)^2
= \sum_{i<j} \Bigl(1 - \frac{d_{ij}(X)}{\delta_{ij}}\Bigr)^2
\]

+ With these weights, Stress fits the ratio of distances to dissimilarities:

large δij = 10, dij(X) = 5: \(e_{ij} = 1 - d_{ij}(X)/\delta_{ij} = 1 - 5/10 = .5\)

small δij = 2, dij(X) = 1: \(e_{ij} = 1 - d_{ij}(X)/\delta_{ij} = 1 - 1/2 = .5\)

+ This is a very good way to attach equal importance to errors on small and on large dissimilarities.
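The two numerical cases above can be checked directly. This small sketch uses only the values from the slide (δ = 10 with d = 5, and δ = 2 with d = 1); the variable names are arbitrary.

```python
import numpy as np

delta = np.array([10.0, 2.0])   # one large and one small dissimilarity
d = np.array([5.0, 1.0])        # corresponding distances d_ij(X)

w = delta ** -2.0               # weights w_ij = delta_ij^(-2)

unweighted = (delta - d) ** 2           # contributions with w_ij = 1
weighted = w * (delta - d) ** 2         # contributions with w_ij = delta_ij^(-2)
relative = (1.0 - d / delta) ** 2       # squared relative errors e_ij^2

print(unweighted)   # the large dissimilarity dominates
print(weighted)     # both pairs contribute equally
print(relative)     # identical to the weighted contributions
```

Without weights the pair with the large dissimilarity contributes 25 times as much as the other pair, although both are off by the same factor of two.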







4 Future

• 1999-: Heiser, Meulman, Busing: PROXSCAL (i.e. SMACOF) in SPSS (PASW)

• 2009: De Leeuw & Mair: SMACOF in R.




• 2000: Tenenbaum et al.: ISOMAP heuristic for large scale MDS.

• 2005-: Groenen, Trosset, Kagie: Large scale MDS through Stress.

– Problems:

+ Computationally too demanding.

+ Storage is a problem (of order n²).

+ Uninformative solutions.

[Figure: CPU seconds (.01 to 1000) plotted against n (10 to 10000).]


– Solution Groenen, Trosset, Kagie:

+ Use only a fraction of the data.

+ Make use of smart designs.

+ Use sparseness of the data efficiently to obtain a fast majorization algorithm.

– Comparison of large scale majorization versus SMACOF:

+ n = 1,000; proportion nonmissing: .05 (Nnonmis = 23,000 out of 499,500).

[Figure: Stress (0.05 to 0.45) against CPU seconds (0 to 2) for large scale majorization and SMACOF.]

+ n = 10,000; proportion nonmissing: .005 (Nnonmis = 250,000 out of 49,995,000).

[Figure: Stress (0.05 to 0.45) against CPU seconds (0 to 200) for large scale majorization and SMACOF.]


– Avoiding uninformative solutions:

+ Edinburgh Associative Thesaurus (EAT) data set (1968, 1971): words associated with a stimulus.

+ cij contains the number of associations between words i and j.

+ n = 23,219 terms.

+ 325,060 nonzero association counts between terms (sparseness = 0.1%).

+ Solution when choosing δij = 1/cij.

+ Solution when choosing the gravity model

\[
\delta_{ij} = \frac{o_i o_j}{c_{ij}} \quad \text{and} \quad w_{ij} = \delta_{ij}^{5}
\]

with oi the total number of occurrences of term i.






• 2002-: Buja, Cook, Swayne: Dynamic MDS visualization in the GGvis software.

• 2003: Groenen: Dynamic MDS visualization through iMDS.

[Photos: Andreas Buja, Deborah Swayne, Di Cook.]








• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of intervals

• 2006: Groenen, Winsberg: Symbolic MDS of histograms


[Figure: ten objects (1 to 10) represented as rectangles in two dimensions, both axes running from -0.8 to 0.6, with the distances d28(L) and d28(U) between objects 2 and 8 indicated.]

• 2002: Denœux, Masson, Groenen, Winsberg, Diday: Symbolic MDS of interval data

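– Instead of δij, its interval is given: \(\delta_{ij} \in [\delta_{ij}^{(L)}, \delta_{ij}^{(U)}]\).

– Find an MDS solution in which the coordinates xis are also intervals.

– Minimize I-Stress:

\[
\sigma_I^2(X, R) = \sum_{i<j} w_{ij} \bigl(\delta_{ij}^{(L)} - d_{ij}^{(L)}(X, R)\bigr)^2 + \sum_{i<j} w_{ij} \bigl(\delta_{ij}^{(U)} - d_{ij}^{(U)}(X, R)\bigr)^2
\]

with

\(d_{ij}^{(L)}(X, R)\) the smallest distance between rectangles i and j,

\(d_{ij}^{(U)}(X, R)\) the largest distance between rectangles i and j.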
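The smallest and largest distances between two rectangles can be computed per dimension, as in the sketch below. It assumes, as an illustration not stated on the slide, that rectangle i is axis-aligned, centered at row i of X, with half-widths given by row i of R; the function name and the example values are hypothetical.

```python
import numpy as np

def interval_distances(x_i, r_i, x_j, r_j):
    """Smallest and largest Euclidean distance between two axis-aligned rectangles.

    x_* are the centers and r_* the nonnegative half-widths per dimension.
    """
    gap = np.abs(x_i - x_j)                 # distance between centers per dimension
    spread = r_i + r_j                      # combined half-widths per dimension
    d_lower = np.sqrt((np.maximum(0.0, gap - spread) ** 2).sum())
    d_upper = np.sqrt(((gap + spread) ** 2).sum())
    return d_lower, d_upper

# Hypothetical rectangles: [-1, 1] x [-1, 1] and [4, 6] x [-2, 2].
x_i, r_i = np.array([0.0, 0.0]), np.array([1.0, 1.0])
x_j, r_j = np.array([5.0, 0.0]), np.array([1.0, 2.0])

d_lower, d_upper = interval_distances(x_i, r_i, x_j, r_j)
print(d_lower, d_upper)
```

In this example the rectangles are 3 apart horizontally and overlap vertically, so the smallest distance is 3, while the largest runs between opposite corners.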



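– Instead of δij, its empirical distribution (percentiles) is given, e.g., for α = [.20, .30, .40]:

            Lower bound                             Upper bound
k    α      percentile   \(\delta_{ijk}^{(L)}\)     percentile   \(\delta_{ijk}^{(U)}\)
1    .20    20           \(\delta_{ij1}^{(L)}\)     80           \(\delta_{ij1}^{(U)}\)
2    .30    30           \(\delta_{ij2}^{(L)}\)     70           \(\delta_{ij2}^{(U)}\)
3    .40    40           \(\delta_{ij3}^{(L)}\)     60           \(\delta_{ij3}^{(U)}\)

– Minimize

\[
\sigma_{I\text{-}Hist}^2(X, R_1, \ldots, R_K) = \sum_{k} \sum_{i<j} w_{ij} \bigl(\delta_{ijk}^{(L)} - d_{ij}^{(L)}(X, R_k)\bigr)^2 + \sum_{k} \sum_{i<j} w_{ij} \bigl(\delta_{ijk}^{(U)} - d_{ij}^{(U)}(X, R_k)\bigr)^2
\]

subject to \(0 \le r_{is1} \le r_{is2} \le \ldots \le r_{isK}\).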










• 2010: Groenen: Dynamic MDS of Dutch political parties


• A political party comparison website for the 2010 Dutch parliamentary elections (www.stemwijzer.nl) asks users to rate 30 political statements, e.g.,

1. The government needs to cut the budget by billions. The budget deficit should disappear by 2015 at the latest.

Agree / Don't know / Disagree

2. Those with a high income should pay more taxes.

Agree / Don't know / Disagree

– 11 political parties also rated these 30 items.

– What is the political landscape in the Dutch elections of 2010?

– Do iMDS on the distances between the 11 parties in 30-dimensional space.


5 Summary of highlights in MDS

Year(s)        Main author(s)                               Topic

Past
1958, 1966     Torgerson, Gower                             Classical MDS
1964           Kruskal                                      Least-squares MDS through Stress with transformations
1964           Guttman                                      Facet theory and regional interpretations in MDS
1969, 1970     Horan, Carroll                               Three-way MDS models (INDSCAL, IDIOSCAL)
1977-          De Leeuw and others                          The majorization algorithm for MDS

Present
1986-1998      Meulman                                      Distance-based MVA through MDS
1994           Buja                                         Constant dissimilarities
1978, 1995-    Various                                      Local minimum problem
1998           Buja                                         Smart use of weights in MDS

Future
1999-          Heiser, Meulman, Busing                      Modern MDS software: Proxscal in SPSS (PASW)
2000           Tenenbaum et al.                             Large scale MDS ISOMAP heuristic
2002           Buja, Swayne, Cook                           Dynamic MDS in GGvis (part of GGobi)
2003           Groenen                                      Dynamic MDS visualization through iMDS
2005-          Groenen, Trosset, Kagie                      Large scale MDS through Stress
2002           Denœux, Masson, Groenen, Winsberg, Diday     Symbolic MDS of interval dissimilarities
2006           Groenen, Winsberg                            Symbolic MDS of histograms
2009           De Leeuw, Mair                               Smacof package in R
