
Extending metric multidimensional scaling with Bregman divergences

Jigang Sun and Colin Fyfe


Visualising 18-dimensional data


Outline

• Bregman divergences.

• Multidimensional scaling (MDS).

• Extending MDS with Bregman divergences.

• Relating the Sammon mapping to mappings with Bregman divergences; comparison of effects and explanation.

• Conclusion.


Strictly Convex function

$$F(\eta p + (1 - \eta) q) < \eta F(p) + (1 - \eta) F(q)$$

for any $p \neq q$ in its domain and any $\eta \in (0, 1)$.

Pictorially, between q and p the graph of a strictly convex function F(x) lies below the segment connecting the two points (q, F(q)) and (p, F(p)).


Bregman Divergences

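$$d_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), x - y \rangle$$

is the Bregman divergence between x and y based on the convex function φ.

Taylor series expansion (scalar case) is

$$\phi(x) = \phi(y) + \phi'(y)(x - y) + \frac{\phi''(y)}{2!}(x - y)^2 + \dots$$

so the divergence is the part of the expansion beyond the linear term: $d_\phi(x, y) = \frac{\phi''(y)}{2!}(x - y)^2 + \dots$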
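A minimal Python sketch of the scalar definition follows. The convex function φ(z) = z log z is an illustrative assumption, and `bregman` and `second_order_term` are hypothetical helper names; the snippet checks numerically that, for x close to y, the divergence is dominated by the second-order Taylor term.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def bregman(phi: Fn, dphi: Fn, x: float, y: float) -> float:
    """Scalar Bregman divergence d_phi(x, y) = phi(x) - phi(y) - phi'(y) (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)


def second_order_term(d2phi: Fn, x: float, y: float) -> float:
    """Leading Taylor term phi''(y) (x - y)^2 / 2! of the divergence."""
    return d2phi(y) * (x - y) ** 2 / 2.0


# Illustrative convex function on z > 0 and its first two derivatives.
def phi(z: float) -> float:
    return z * math.log(z)


def dphi(z: float) -> float:
    return math.log(z) + 1.0


def d2phi(z: float) -> float:
    return 1.0 / z


if __name__ == "__main__":
    y = 2.0
    for x in (2.5, 2.1, 2.01):
        exact = bregman(phi, dphi, x, y)
        approx = second_order_term(d2phi, x, y)
        print(f"x={x}: exact={exact:.6g}  second-order={approx:.6g}")
```

As x approaches y the two printed values agree more closely, because the neglected terms are of order (x - y)^3.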


Bregman Divergences


Euclidean distance is a Bregman divergence

$$\phi(x) = \|x\|^2$$

$$d(x, y) = \|x\|^2 - \|y\|^2 - \langle 2y, x - y \rangle = \|x\|^2 + \|y\|^2 - 2 \langle x, y \rangle = \|x - y\|^2$$
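A quick numerical check, as a sketch with hypothetical helper names: with φ(x) = ||x||², whose gradient at y is 2y, the Bregman divergence equals the squared Euclidean distance ||x - y||².

```python
from typing import Sequence

Vec = Sequence[float]


def sq_norm(v: Vec) -> float:
    """phi(v) = ||v||^2."""
    return sum(a * a for a in v)


def bregman_sq_norm(x: Vec, y: Vec) -> float:
    """Bregman divergence of phi(v) = ||v||^2, whose gradient at y is 2y."""
    grad_y = [2.0 * b for b in y]
    diff = [a - b for a, b in zip(x, y)]
    return sq_norm(x) - sq_norm(y) - sum(g * d for g, d in zip(grad_y, diff))


def sq_euclidean(x: Vec, y: Vec) -> float:
    """||x - y||^2 computed directly."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [0.0, -1.0, 2.0]
    print(bregman_sq_norm(x, y), sq_euclidean(x, y))  # both 11.0
```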


Kullback Leibler Divergence

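$$\phi(p) = \sum_{i=1}^{d} p_i \log p_i$$

$$d(p, q) = \sum_{i=1}^{d} p_i \log p_i - \sum_{i=1}^{d} q_i \log q_i - \langle \log q + e, p - q \rangle$$

$$= \sum_{i=1}^{d} p_i \log p_i - \sum_{i=1}^{d} q_i \log q_i - \sum_{i=1}^{d} (\log q_i + 1)(p_i - q_i)$$

$$= \sum_{i=1}^{d} p_i \log p_i - \sum_{i=1}^{d} p_i \log q_i - \sum_{i=1}^{d} (p_i - q_i)$$

$$= \sum_{i=1}^{d} p_i \log \frac{p_i}{q_i}$$

where e is the all-ones vector and the last step uses $\sum_i p_i = \sum_i q_i = 1$ for probability vectors.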
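To see the derivation numerically, the sketch below (hypothetical helper names; entries assumed strictly positive) evaluates the Bregman form built from φ(p) = Σ p_i log p_i and compares it with Σ p_i log(p_i / q_i) for two probability vectors. Swapping the arguments also shows that the divergence is not symmetric.

```python
import math
from typing import Sequence


def neg_entropy(p: Sequence[float]) -> float:
    """phi(p) = sum_i p_i log p_i (entries must be positive)."""
    return sum(pi * math.log(pi) for pi in p)


def bregman_neg_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Bregman divergence of neg_entropy; its gradient at q is log q_i + 1."""
    linear = sum((math.log(qi) + 1.0) * (pi - qi) for pi, qi in zip(p, q))
    return neg_entropy(p) - neg_entropy(q) - linear


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    q = [0.4, 0.4, 0.2]
    print(bregman_neg_entropy(p, q), kl(p, q))
    print(kl(q, p))  # differs from kl(p, q): the divergence is not symmetric
```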


Generalised Information Divergence

• φ(z) = z log(z)

$$d(x, y) = x \log x - y \log y - \langle \log y + 1, x - y \rangle$$

$$= x \log x - x \log y - x + y$$

$$= x \log \frac{x}{y} - x + y$$
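The same check for unnormalised positive vectors, sketched with hypothetical helper names, shows why the -x + y terms matter: without Σ x_i = Σ y_i they no longer cancel, so the result differs from the KL expression by Σ (y_i - x_i).

```python
import math
from typing import Sequence


def i_divergence(x: Sequence[float], y: Sequence[float]) -> float:
    """Generalised I-divergence: sum_i x_i log(x_i / y_i) - x_i + y_i (positive entries)."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def i_divergence_bregman(x: Sequence[float], y: Sequence[float]) -> float:
    """Same quantity from the Bregman definition with phi(z) = z log z per component."""
    return sum(a * math.log(a) - b * math.log(b) - (math.log(b) + 1.0) * (a - b)
               for a, b in zip(x, y))


if __name__ == "__main__":
    x, y = [1.0, 3.0], [2.0, 1.0]  # not normalised: the sums are 4 and 3
    kl_part = sum(a * math.log(a / b) for a, b in zip(x, y))
    print(i_divergence(x, y), i_divergence_bregman(x, y), kl_part)
```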


Other Divergences

• Itakura-Saito divergence: $\phi(x) = -\log x$

• Mahalanobis distance: $\phi(x) = x^T \Sigma^{-1} x$

• Logistic loss: $\phi(x) = x \log x + (1 - x) \log(1 - x)$

• Any convex function
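A sketch (hypothetical helper names) that checks each closed form against the generic definition. For the Mahalanobis case a symmetric positive-definite matrix M stands in for Σ⁻¹, which is an assumption about the intended φ.

```python
import math
from typing import Callable, Sequence

Fn = Callable[[float], float]


def bregman(phi: Fn, dphi: Fn, x: float, y: float) -> float:
    """Scalar Bregman divergence phi(x) - phi(y) - phi'(y) (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)


# Itakura-Saito: phi(x) = -log x, phi'(x) = -1/x, for x > 0.
def neg_log(x: float) -> float:
    return -math.log(x)


def d_neg_log(x: float) -> float:
    return -1.0 / x


def itakura_saito(x: float, y: float) -> float:
    """Closed form x/y - log(x/y) - 1."""
    return x / y - math.log(x / y) - 1.0


# Logistic loss: phi(x) = x log x + (1 - x) log(1 - x), for 0 < x < 1.
def logistic_phi(x: float) -> float:
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)


def d_logistic_phi(x: float) -> float:
    return math.log(x / (1.0 - x))


def logistic_loss(x: float, y: float) -> float:
    """Closed form x log(x/y) + (1 - x) log((1 - x)/(1 - y))."""
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


# Mahalanobis: phi(v) = v^T M v, with M symmetric positive definite (standing in for Sigma^-1).
def quad(v: Sequence[float], m) -> float:
    return sum(v[i] * m[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


def bregman_quadratic(x, y, m) -> float:
    """phi(x) - phi(y) - <2 M y, x - y> for the quadratic phi above."""
    grad_y = [2.0 * sum(m[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
    return quad(x, m) - quad(y, m) - sum(g * (a - b) for g, a, b in zip(grad_y, x, y))


def mahalanobis_sq(x, y, m) -> float:
    """(x - y)^T M (x - y)."""
    return quad([a - b for a, b in zip(x, y)], m)


if __name__ == "__main__":
    print(bregman(neg_log, d_neg_log, 2.0, 5.0), itakura_saito(2.0, 5.0))
    print(bregman(logistic_phi, d_logistic_phi, 0.3, 0.6), logistic_loss(0.3, 0.6))
    m = [[2.0, 0.5], [0.5, 1.0]]
    print(bregman_quadratic([1.0, 2.0], [0.0, -1.0], m), mahalanobis_sq([1.0, 2.0], [0.0, -1.0], m))
```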


Some Properties

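• $d_\phi(x, y) \ge 0$, with equality iff $x = y$ (for strictly convex φ).

• Not a metric: in general $d_\phi(x, y) \neq d_\phi(y, x)$.

• (Though $d(x, y) = d_\phi(x, y) + d_\phi(y, x)$ is symmetric.)

• Convex in the first argument.

• Linear in the generating function: $d_{\phi + a\gamma}(x, y) = d_\phi(x, y) + a \, d_\gamma(x, y)$ for $a \ge 0$.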
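These properties can be checked numerically. The sketch below assumes two illustrative convex functions, φ(z) = z log z and γ(z) = -log z; all helper names are hypothetical.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def bregman(phi: Fn, dphi: Fn, x: float, y: float) -> float:
    """Scalar Bregman divergence phi(x) - phi(y) - phi'(y) (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)


# Two convex functions on z > 0, chosen only for illustration.
def phi(z: float) -> float:
    return z * math.log(z)


def dphi(z: float) -> float:
    return math.log(z) + 1.0


def gam(z: float) -> float:
    return -math.log(z)


def dgam(z: float) -> float:
    return -1.0 / z


def d_phi(x: float, y: float) -> float:
    return bregman(phi, dphi, x, y)


def d_sym(x: float, y: float) -> float:
    """Symmetrised divergence d_phi(x, y) + d_phi(y, x)."""
    return d_phi(x, y) + d_phi(y, x)


def d_combined(x: float, y: float, a: float) -> float:
    """Divergence generated by phi + a * gam (a >= 0 keeps the sum convex)."""
    return bregman(lambda z: phi(z) + a * gam(z),
                   lambda z: dphi(z) + a * dgam(z), x, y)


if __name__ == "__main__":
    x, y, a = 1.0, 3.0, 0.5
    print(d_phi(x, y), d_phi(y, x))  # not equal: asymmetric
    print(d_sym(x, y), d_sym(y, x))  # equal
    print(d_combined(x, y, a), d_phi(x, y) + a * bregman(gam, dgam, x, y))  # equal up to rounding
```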


Multidimensional Scaling

• Creates one latent point for each data point.

• The latent space is often 2-dimensional.

• Positions the latent points so that they best represent the data distances:

  – Two latent points are close if the two corresponding data points are close.

  – Two latent points are distant if the two corresponding data points are distant.


Classical/Basic Metric MDS

• We minimise the stress function

$$E_{BasicMDS} = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (L_{ij} - D_{ij})^2 = \sum_{i=1}^{N} \sum_{j=i+1}^{N} E_{ij}^2$$

where

$D_{ij} = \|X_i - X_j\|$ is the distance between points i and j in data space,

$L_{ij} = \|Y_i - Y_j\|$ is the mapped distance between points i and j in latent space,

$E_{ij} = \mathrm{abs}(L_{ij} - D_{ij})$ is the error.

[Figure: data space X and latent space]
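A minimal sketch of the basic MDS stress, assuming Euclidean distances in both spaces and summing over pairs i < j as in the formula; the helper names are hypothetical. It only evaluates the stress; minimising it over the latent points is a separate step.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Sequence[float]


def pairwise_distances(points: Sequence[Point]) -> Dict[Tuple[int, int], float]:
    """Euclidean distance for every pair i < j."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def basic_mds_stress(data: Sequence[Point], latent: Sequence[Point]) -> float:
    """E_BasicMDS = sum_{i<j} (L_ij - D_ij)^2."""
    D = pairwise_distances(data)    # D_ij: distances in data space
    L = pairwise_distances(latent)  # L_ij: mapped distances in latent space
    return sum((L[k] - D[k]) ** 2 for k in D)


if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    # Every distance preserved, so the stress is zero.
    print(basic_mds_stress(data, [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]))
    # One point moved: two distances are now wrong.
    print(basic_mds_stress(data, [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]))
```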


Sammon Mapping (1969)

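$$E_{Sammon} = C \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{(L_{ij} - D_{ij})^2}{D_{ij}} = C \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{E_{ij}^2}{D_{ij}}$$

where $E_{ij} = \mathrm{abs}(L_{ij} - D_{ij})$ is the error and

$$C = \frac{1}{\sum_{i=1}^{N} \sum_{j=i+1}^{N} D_{ij}}$$

is a normalisation scalar.

Focuses on small distances: for the same error, a pair with a smaller data-space distance is given a bigger stress.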
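The Sammon stress differs from the basic stress only by the 1/D_ij weighting and the constant C. A sketch with hypothetical helper names (data points are assumed distinct, so every D_ij > 0):

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair i < j."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def sammon_term(l: float, d: float) -> float:
    """Unnormalised per-pair Sammon stress (L_ij - D_ij)^2 / D_ij."""
    return (l - d) ** 2 / d


def sammon_stress(data, latent) -> float:
    """E_Sammon = C * sum_{i<j} (L_ij - D_ij)^2 / D_ij with C = 1 / sum_{i<j} D_ij."""
    D = pairwise_distances(data)    # requires distinct data points (D_ij > 0)
    L = pairwise_distances(latent)
    C = 1.0 / sum(D.values())
    return C * sum(sammon_term(L[k], D[k]) for k in D)


if __name__ == "__main__":
    # Same absolute error of 1.0, different data-space distances:
    print(sammon_term(3.0, 2.0), sammon_term(11.0, 10.0))  # 0.5 0.1
```

The small pair is penalised five times more heavily than the large one, which is the "focus on small distances" described above.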


Possible Extensions

$$J = \sum_{i,j=1}^{N} \left( d_{F_1}(x_i, x_j) - d_{F_2}(y_i, y_j) \right)^2$$

Bregman divergences in both data space and latent space.

Or even

$$J = \sum_{i,j=1}^{N} d_{F_1}\left( d_{F_2}(x_i, x_j), d_{F_3}(y_i, y_j) \right)$$


Metric MDS with Bregman divergence between distances

$$J_{BMDS} = \sum_{i,j=1}^{N} d_{F_1}\left( d_{F_2}(x_i, x_j), d_{F_3}(y_i, y_j) \right) = \sum_{i,j=1}^{N} d_{F_1}(L_{ij}, D_{ij})$$

Euclidean distance on latents: $L_{ij} = d_{F_2}(x_i, x_j)$

Any divergence on data: $D_{ij} = d_{F_3}(y_i, y_j)$

Itakura-Saito divergence between them, to minimise (Sammon-like):

$$d_{IS}(L_{ij}, D_{ij}) = \frac{L_{ij}}{D_{ij}} - \log \frac{L_{ij}}{D_{ij}} - 1$$


Moving the Latent Points

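$$J_{BMDS} = \sum_{i,j=1}^{N} d_{F_1}\left( d_{F_2}(x_i, x_j), d_{F_3}(y_i, y_j) \right)$$

$$d_{IS}(L_{ij}, D_{ij}) = \frac{L_{ij}}{D_{ij}} - \log \frac{L_{ij}}{D_{ij}} - 1$$

$$\frac{\partial J_{BMDS}}{\partial x_i} \propto \sum_{j=1, j \neq i}^{N} \left( \frac{1}{D_{ij}} - \frac{1}{L_{ij}} \right) \frac{x_i - x_j}{L_{ij}}$$

F1: Itakura-Saito divergence; F2: Euclidean distance; F3: any divergence.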
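The update can be sketched as plain gradient descent on the latent points. This is an illustrative sketch, not the authors' implementation: the data distances are assumed Euclidean (F3 is left open on the slide), the sum runs over pairs i < j so the constant factor in the gradient is 1, and the circle initialisation, learning rate and iteration count are arbitrary choices. All function names are hypothetical.

```python
import math
from itertools import combinations


def data_distances(data):
    """D_ij for i < j; plain Euclidean here (the slide allows any divergence F3)."""
    return {(i, j): math.dist(data[i], data[j])
            for i, j in combinations(range(len(data)), 2)}


def d_is(l: float, d: float) -> float:
    """Itakura-Saito divergence d_IS(L, D) = L/D - log(L/D) - 1."""
    r = l / d
    return r - math.log(r) - 1.0


def objective(x, D) -> float:
    """J = sum_{i<j} d_IS(L_ij, D_ij) with L_ij = ||x_i - x_j||."""
    return sum(d_is(math.dist(x[i], x[j]), d) for (i, j), d in D.items())


def gradient(x, D):
    """dJ/dx_i = sum_j (1/D_ij - 1/L_ij) (x_i - x_j) / L_ij (latent points must be distinct)."""
    g = [[0.0] * len(p) for p in x]
    for (i, j), d in D.items():
        diff = [a - b for a, b in zip(x[i], x[j])]
        l = math.hypot(*diff)
        coef = (1.0 / d - 1.0 / l) / l
        for k, c in enumerate(diff):
            g[i][k] += coef * c  # L_ij grows with x_i - x_j
            g[j][k] -= coef * c  # and shrinks with x_j - x_i
    return g


def circle_init(n: int, radius: float = 0.5):
    """Deterministic 2-D starting configuration: n points on a circle."""
    return [[radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n)]
            for i in range(n)]


def bmds_is(data, lr: float = 0.02, iters: int = 1000):
    """Move the latent points x_i down the gradient of J."""
    D = data_distances(data)
    x = circle_init(len(data))
    for _ in range(iters):
        g = gradient(x, D)
        x = [[a - lr * b for a, b in zip(p, gp)] for p, gp in zip(x, g)]
    return x, D


if __name__ == "__main__":
    square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    x, D = bmds_is(square)
    print(objective(circle_init(4), D), "->", objective(x, D))
```

On this toy input the objective falls from its starting value. Gradient-based MDS can stop in local minima, so whether it reaches zero depends on the starting configuration.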


The algae data set


The algae data set


Two representations

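The standard Bregman representation:

$$E_{BMDS}(Y) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} d_F(L_{ij}, D_{ij}) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( F(L_{ij}) - F(D_{ij}) - (L_{ij} - D_{ij}) F'(D_{ij}) \right)$$

Concentrating on the residual errors:

$$E_{BMDS}(Y) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( \frac{1}{2!} \frac{d^2 F(D_{ij})}{dD_{ij}^2} (L_{ij} - D_{ij})^2 + \frac{1}{3!} \frac{d^3 F(D_{ij})}{dD_{ij}^3} (L_{ij} - D_{ij})^3 + \dots \right)$$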
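For a concrete F the two representations can be compared numerically. The sketch below assumes F(x) = x log x, whose derivatives give F^(n)(D)/n! = (-1)^n / (n (n - 1) D^(n-1)) for n ≥ 2; helper names are hypothetical.

```python
import math


def d_exact(l: float, d: float) -> float:
    """Standard Bregman representation for F(x) = x log x: F(L) - F(D) - F'(D)(L - D)."""
    return (l * math.log(l) - d * math.log(d)) - (math.log(d) + 1.0) * (l - d)


def d_series(l: float, d: float, order: int) -> float:
    """Residual-error representation truncated at `order`.

    Uses F^(n)(D) / n! = (-1)^n / (n (n - 1) D^(n - 1)) for F(x) = x log x, n >= 2;
    the series converges for |L - D| < D.
    """
    e = l - d
    return sum((-1) ** n * e ** n / (n * (n - 1) * d ** (n - 1))
               for n in range(2, order + 1))


if __name__ == "__main__":
    l, d = 1.5, 2.0
    for order in (2, 3, 5, 10):
        print(order, d_series(l, d, order), d_exact(l, d))
```

Adding terms moves the truncated sum towards the exact divergence; the order-2 sum is the quadratic residual term alone.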


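Basic MDS is a special BMMDS

• Base convex function is chosen as $F(x) = x^2$.

• And higher order derivatives are $F^{(n)}(x) = 0$ for $n \ge 3$, with $F''(x) = 2$.

• So $d_F(L_{ij}, D_{ij}) = \frac{1}{2!} \cdot 2 \, (L_{ij} - D_{ij})^2 = (L_{ij} - D_{ij})^2$.

• $E_{BasicMDS}$ is derived as $E_{BMDS}(Y) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (L_{ij} - D_{ij})^2 = E_{BasicMDS}$.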
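A short arithmetic check, with hypothetical helper names and made-up (L_ij, D_ij) pairs: for F(x) = x² the Bregman divergence of each pair is exactly (L_ij - D_ij)², so the two stresses coincide.

```python
def bregman_square(l: float, d: float) -> float:
    """Bregman divergence of F(x) = x^2: F(L) - F(D) - F'(D)(L - D), with F'(x) = 2x."""
    return l * l - d * d - 2.0 * d * (l - d)


def e_bmds_square(pairs) -> float:
    """E_BMDS with F(x) = x^2, summed over (L_ij, D_ij) pairs."""
    return sum(bregman_square(l, d) for l, d in pairs)


def e_basic_mds(pairs) -> float:
    """E_BasicMDS = sum (L_ij - D_ij)^2 over the same pairs."""
    return sum((l - d) ** 2 for l, d in pairs)


if __name__ == "__main__":
    pairs = [(3.0, 5.0), (1.2, 1.0), (4.0, 4.0)]  # hypothetical (L_ij, D_ij) values
    print(e_bmds_square(pairs), e_basic_mds(pairs))
```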


Sammon Mapping

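Select

$$F(x) = x \log x, \quad \frac{dF(x)}{dx} = \log x + 1, \quad \frac{d^2 F(x)}{dx^2} = \frac{1}{x}$$

Then

$$E_{BMDS}(Y) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( \frac{1}{2!} \frac{1}{D_{ij}} (L_{ij} - D_{ij})^2 + \frac{1}{3!} \frac{d^3 F(D_{ij})}{dD_{ij}^3} (L_{ij} - D_{ij})^3 + \dots \right)$$

$$= \frac{1}{2C} E_{Sammon} + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( \frac{1}{3!} \frac{d^3 F(D_{ij})}{dD_{ij}^3} (L_{ij} - D_{ij})^3 + \dots \right)$$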


Example 2: Extended Sammon

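• Base convex function $F(x) = x \log x$, $x > 0$.

• This is equivalent to $E_{ExtendedSammon}(Y) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( L_{ij} \log \frac{L_{ij}}{D_{ij}} - L_{ij} + D_{ij} \right)$.

• The Sammon mapping is rewritten as $E_{Sammon} = 2C \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{1}{2!} \frac{d^2 F(D_{ij})}{dD_{ij}^2} (L_{ij} - D_{ij})^2$.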


Sammon and Extended Sammon

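• The common term is $\sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{1}{2!} \frac{1}{D_{ij}} (L_{ij} - D_{ij})^2 = \frac{1}{2C} E_{Sammon}$.

• The Sammon mapping is thus an approximation to the Extended Sammon mapping via the common term.

• The Extended Sammon mapping makes further adjustments on the basis of the higher-order terms.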
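A numerical sketch of this relationship, assuming F(x) = x log x and a few hypothetical (L_ij, D_ij) pairs: the common term equals E_Sammon / (2C), and the Extended Sammon stress adds a higher-order remainder that is small when every L_ij is close to its D_ij. Helper names are hypothetical.

```python
import math


def extended_sammon(pairs) -> float:
    """sum (L log(L/D) - L + D): Bregman divergence of F(x) = x log x."""
    return sum(l * math.log(l / d) - l + d for l, d in pairs)


def sammon(pairs) -> float:
    """E_Sammon = C * sum (L - D)^2 / D with C = 1 / sum D."""
    c = 1.0 / sum(d for _, d in pairs)
    return c * sum((l - d) ** 2 / d for l, d in pairs)


def common_term(pairs) -> float:
    """Second-order term sum (1/2!) (1/D) (L - D)^2 shared by both mappings."""
    return sum((l - d) ** 2 / (2.0 * d) for l, d in pairs)


if __name__ == "__main__":
    pairs = [(0.9, 1.0), (2.2, 2.0), (4.5, 5.0)]  # hypothetical (L_ij, D_ij) values
    c = 1.0 / sum(d for _, d in pairs)
    print(common_term(pairs), sammon(pairs) / (2.0 * c))  # identical
    print(extended_sammon(pairs) - common_term(pairs))    # higher-order remainder
```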


An experiment on the Swiss roll data set


Distance preservation


Relative standard deviation


Relative standard deviation

• On short distances, Sammon has smaller variance than Basic MDS, and Extended Sammon has smaller variance than Sammon; i.e. control of small distances is progressively enhanced.

• Long distances are given progressively more freedom, in the same order.


LCMC: local continuity meta-criterion (L. Chen 2006)

• A common measure for assessing the projection quality of different MDS methods.

• Measures neighbourhood preservation.

• Value between 0 and 1; the higher, the better.
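A minimal LCMC sketch. The formula is an assumption on our part, the commonly used form LCMC(K) = (1/(NK)) Σ_i |N_K^data(i) ∩ N_K^latent(i)| - K/(N - 1), which subtracts the overlap expected from random neighbourhoods; in this form the best attainable value is 1 - K/(N - 1). Helper names and the tie-breaking rule are hypothetical.

```python
import math
from typing import Sequence, Set

Point = Sequence[float]


def knn(points: Sequence[Point], i: int, k: int) -> Set[int]:
    """Indices of the k nearest neighbours of point i (i excluded; ties broken by index)."""
    others = sorted((math.dist(points[i], points[j]), j)
                    for j in range(len(points)) if j != i)
    return {j for _, j in others[:k]}


def lcmc(data: Sequence[Point], latent: Sequence[Point], k: int) -> float:
    """Average K-neighbourhood overlap minus the chance level K / (N - 1)."""
    n = len(data)
    overlap = sum(len(knn(data, i, k) & knn(latent, i, k)) for i in range(n))
    return overlap / (n * k) - k / (n - 1)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
    print(lcmc(pts, pts, 2))  # 0.5: perfect preservation gives 1 - K/(N-1)
```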


Quality assessed by LCMC


Why Extended Sammon outperforms Sammon

when

• Stress formation


Features of the base convex function

• Recall that the base convex function for the Extended Sammon mapping is $F(x) = x \log x$, $x > 0$.

• Higher order derivatives are $F^{(n)}(x) = (-1)^n \frac{(n-2)!}{x^{n-1}}$ for $n \ge 2$.

• Even orders are positive and odd ones are negative.


Stress comparison between Sammon and Extended Sammon


Stress configured by Sammon, calculated and mapped by Extended Sammon


Stress configured by Sammon, calculated and mapped by Extended Sammon

• The Extended Sammon mapping calculates stress on the basis of the configuration found by the Sammon mapping.

• For , the mean stresses calculated by the Extended Sammon mapping are much higher than those of the Sammon mapping.

• For , the calculated mean stresses are clearly lower than those of the Sammon mapping.

• The Extended Sammon mapping makes short mapped distances even shorter and long ones even longer.
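As a per-pair illustration (not a reproduction of the slide's measurements), the Extended Sammon term L log(L/D) - L + D can be compared with the shared Sammon term (L - D)²/(2D) for a fixed D. The sketch uses D = 1 and a few hypothetical values of L; helper names are hypothetical.

```python
import math


def ext_sammon_term(l: float, d: float) -> float:
    """Extended Sammon per-pair stress: L log(L/D) - L + D."""
    return l * math.log(l / d) - l + d


def sammon_common_term(l: float, d: float) -> float:
    """Shared second-order term (L - D)^2 / (2 D)."""
    return (l - d) ** 2 / (2.0 * d)


if __name__ == "__main__":
    d = 1.0
    for l in (0.5, 0.8, 1.2, 1.5):
        print(f"L={l}: extended={ext_sammon_term(l, d):.4f}  "
              f"common={sammon_common_term(l, d):.4f}")
```

The Extended Sammon term is larger when a pair is mapped shorter than its data-space distance and smaller when it is mapped longer, consistent with the alternating signs of the higher-order derivatives.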


Stress formation by items


Generalisation: from MDS to Bregman divergences

• A group of MDS methods is generalised as

• C is a normalisation scalar used for quantitative comparison; it does not affect the mapping results.

• Weight function for missing samples

• Basic MDS and the Sammon mapping belong to this group.


Generalisation: from MDS to Bregman divergences

• If C = 1, then set

• Then the generalised MDS is the first term of BMMDS, and BMMDS is an extension of MDS.

• Recall that BMMDS is equivalent to


Criterion for base convex function selection

• In order to focus on local distances and concentrate less on long distances, the base convex function must satisfy

• Not all convex functions qualify; for example, F(x) = exp(x) does not, because its second-order derivative grows with x.

• The second-order derivative is the primary consideration: we want it to be large for small distances and small for long distances, since it represents the focusing power on local distances.
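The focusing-power criterion can be checked by evaluating F'' at a small, a medium and a long distance. The sketch uses the closed-form second derivatives of x log x (Extended Sammon), -log x (Itakura-Saito) and exp(x); the dictionary and helper names are hypothetical.

```python
import math

# Closed-form second derivatives F''(x) of candidate base convex functions.
SECOND_DERIVATIVES = {
    "x log x": lambda x: 1.0 / x,       # Extended Sammon
    "-log x": lambda x: 1.0 / x ** 2,   # Itakura-Saito
    "exp(x)": math.exp,                 # rejected: grows with distance
}


def decreases_with_distance(f2, distances=(0.1, 1.0, 10.0)) -> bool:
    """True if F'' is strictly decreasing over the sampled distances."""
    vals = [f2(x) for x in distances]
    return all(a > b for a, b in zip(vals, vals[1:]))


if __name__ == "__main__":
    for name, f2 in SECOND_DERIVATIVES.items():
        print(f"{name:8s} F''(0.1)={f2(0.1):.3g}  F''(10)={f2(10.0):.3g}  "
              f"decreasing={decreases_with_distance(f2)}")
```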


Two groups of Convex functions

• The even-order derivatives are positive and the odd-order ones are negative.

• No. 1 is the base convex function of the Extended Sammon mapping.


Focusing power


Different strategies for focusing power

• The vertical axis is the logarithm of the second-order derivative.

• These functions use different strategies for increasing focusing power.

• In the first group, the second-order derivatives become ever higher for small distances and ever lower for long distances.

• In the second group, the second-order derivatives have a bounded maximum for very small distances, but fall off drastically for long distances as λ increases.


Two groups of Bregman divergences

• Elastic scaling (Victor E. McGee, 1966)


Experiment on Swiss roll: The FirstGroup


Experiment on Swiss roll: FirstGroup

• For Extended Sammon, Itakura-Saito, ..., local distances are mapped progressively better and long distances are stretched, so the unfolding trend is obvious.


Distance mapping: FirstGroup


Standard deviation : FirstGroup


LCMC measure : FirstGroup


Experiment on Swiss roll: SecondGroup


Distance mapping: SecondGroup


Standard deviation: SecondGroup


LCMC: SecondGroup


OpenBox, Sammon and FirstGroup


SecondGroup on OpenBox


Distance mapping: two groups


LCMC: two groups


Standard deviation: two groups


Swiss roll distances distribution


OpenBox distances distribution


Swiss roll vs OpenBox

• Distance distributions:

  – Swiss roll: the proportion of longer distances is greater than that of shorter distances.

  – OpenBox: a very large number of medium distances; small distances make up most of the rest.

• Mapping results:

  – Swiss roll: long distances are stretched and local distances are usually mapped shorter.

  – OpenBox: the longest distances are not obviously stretched, and are perhaps even compressed. Some methods map small distances longer than their original data-space values.

• Conclusion: a tug of war between local and long distances, each trying to be mapped to its original data-space value.


Left and right Bregman divergences

• All of this uses left divergences: the latent points are in the left position of the divergence, ...

• We can show that right divergences produce extensions of curvilinear component analysis (Sun et al., ESANN 2010).


Conclusion

• Applied Bregman divergences to multidimensional scaling.

• Showed that basic metric MDS is a special case and that the Sammon mapping approximates a BMMDS.

• Improved upon both with two families of divergences.

• Showed results on two artificial data sets.