Fast global k-means clustering using cluster membership and inequality

Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

Pattern Recognition (PR, 2010)

Presenter : Lin, Shu-Han

Authors : Jim Z.C. Lai, Tsung-Jen Huang


Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments


Motivation

Fast global k-means (FGKM) and modified global k-means (MGKM) have the same computational complexity.

MGKM claims to be more effective than FGKM (see 2008.PR.8.書漢.1027, Modified global k-means algorithm for minimum sum-of-squares clustering problems).


Objectives

Develop a set of inequalities to speed up FGKM and MGKM; the resulting method is called MFGKM.

Use the Karhunen-Loève Transform (KLT), which is closely related to Principal Component Analysis (PCA), with th = 0.9999.
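As a rough illustration of the KLT step, the sketch below estimates the first principal axis of a data set by power iteration on the covariance matrix and projects the points onto it. The function names `principal_axis` and `project`, the starting vector and the iteration count are assumptions made for this example; the slides do not say how the transform is computed or how th is applied.

```python
import math

def principal_axis(points, iterations=200):
    """Estimate the mean and the dominant covariance eigenvector (hypothetical helper)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    # Sample covariance matrix (divided by n; the scale does not change the axis).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Assumed starting vector; it must not be orthogonal to the dominant axis.
    v = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, axis):
    """Coordinate of each point along the given axis, after centering."""
    return [sum((p[i] - mean[i]) * axis[i] for i in range(len(axis)))
            for p in points]

data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, axis = principal_axis(data)
first_coordinate = project(data, mean, axis)
```

Sorting the points by `first_coordinate` gives an ordering along the direction of largest variance, which is one plausible reading of the starred attribute A1* in the tables of the later slides.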


Methodology – MFGKM

Red = proposed

(or s Yj’ , called MCS)


Methodology – cluster center selection algorithm

(Speed up)


Methodology – Candidate set construction algorithm


Methodology – Candidate set construction algorithm (Cont.)

         A1*   A2    A3    …    Ana
x10      8.2   4.2   …     …    …
x9       8.0   2.1
x8 (Q)   7.2   2.2   3.2   …    …
x7       5.4
x6       5.1

l+p

r10 = d(x10, c) = 2

|8.2 − 7.2| = 1, and 1 + |2.2 − 4.2| = 3 > r10, so delete x10:

x10 cannot be the nearest neighbor of x8
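The elimination test above can be sketched as a partial-distance check. The slide accumulates absolute coordinate differences; the sketch below instead accumulates squared differences and compares them with the squared radius, which is the standard partial-distance form for Euclidean distance and is an assumption here, not necessarily the exact inequality of the paper. The name `can_reject` is hypothetical.

```python
def can_reject(query, point, radius):
    """Return True as soon as the partial squared distance reaches radius**2.

    Each added term is non-negative, so once the running sum reaches
    radius**2 the full Euclidean distance is at least radius and the
    remaining attributes need not be examined.
    """
    limit = radius * radius
    partial = 0.0
    for q, x in zip(query, point):
        partial += (q - x) ** 2
        if partial >= limit:
            return True
    return False

# Only the first two attributes are given on the slide; the rest are elided.
x8 = (7.2, 2.2)
x10 = (8.2, 4.2)
rejected = can_reject(x8, x10, 2.0)
```

With the slide's numbers the running sum is about 1 after the first attribute and about 5 after the second, which exceeds r10² = 4, so x10 is rejected in this variant as well.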

1.

m 1 2


         A1*   A2    A3    …    Ana
x12      11    4     3     1
x11      9.7   4.3   2     0
x10      8.3   4.2
x9       8.0   5.1
x8 (Q)   7.2   2.2   3.2   …    …
x7       5.4
x6       5.1
x5       4.7   ...   ...   ...  ...
x4       ...   ...   ...   ...  ...
x3       ...   ...   ...   ...  ...
x2       ...   ...   ...   ...  ...
x1       ...   ...   ...   ...  ...

m

rmax = 2

2.
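One reading of this step is that, with the points sorted by their first attribute, only those whose first attribute lies within rmax of the query's need a full distance test, because the difference in one coordinate never exceeds the Euclidean distance. The sketch below illustrates that reading with a binary search; the function name `candidate_window` and the interpretation of rmax as the largest radius are assumptions.

```python
from bisect import bisect_left, bisect_right

def candidate_window(sorted_a1, query_a1, r_max):
    """Return the half-open index range [lo, hi) of points whose first
    attribute lies within r_max of the query's first attribute."""
    lo = bisect_left(sorted_a1, query_a1 - r_max)
    hi = bisect_right(sorted_a1, query_a1 + r_max)
    return lo, hi

# First-attribute values of x5..x12 from the slide, in ascending order.
a1 = [4.7, 5.1, 5.4, 7.2, 8.0, 8.3, 9.7, 11.0]
lo, hi = candidate_window(a1, 7.2, 2.0)
survivors = a1[lo:hi]  # values that still need a full distance test
```

For the query x8 with rmax = 2, the window keeps x7, x8, x9 and x10, and the points from x11 upward and x6 downward are skipped without computing any distance.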


3.


         A1*   A2    A3    …    Ana
x9       8.2   2.2   4     3    2
x8 (Q)   7.2   2.2   4     3    2
x7       6.2   2.2   4     3    2

m

Diff (distortion)

Diff = (r9 − d(x8, x9)) + (r7 − d(x8, x7)) = (2 − 1) + (2 − 1) = 2

4.

Return Diff = 2 and the center of x9 and x7
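The Diff computation on this slide can be sketched as follows. The function name `distortion_gain` is hypothetical, and the choice to return the centroid of the contributing points as "the center" is one reading of the slide, not a confirmed detail of the paper.

```python
import math

def distortion_gain(query, candidates, radii):
    """Sum r_j - d(query, x_j) over candidates strictly inside their radius.

    Returns the total gain and the centroid of the contributing candidates
    (None when no candidate contributes).
    """
    gain = 0.0
    members = []
    for x, r in zip(candidates, radii):
        d = math.dist(query, x)  # Euclidean distance, Python 3.8+
        if d < r:
            gain += r - d
            members.append(x)
    if not members:
        return gain, None
    center = [sum(col) / len(members) for col in zip(*members)]
    return gain, center

x9 = (8.2, 2.2, 4.0, 3.0, 2.0)
x8 = (7.2, 2.2, 4.0, 3.0, 2.0)
x7 = (6.2, 2.2, 4.0, 3.0, 2.0)
gain, center = distortion_gain(x8, [x9, x7], [2.0, 2.0])
```

Both candidates lie at distance 1 from x8 and have radius 2, so each contributes about 1 and the total is about 2, up to floating-point rounding.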



Methodology – MCS


Experiments – Computing time


Experiments – Distortion

Least distortion

Faster, but distortion


Conclusions

GKM

FGKM: faster, but local

MGKM: better performance than FGKM, but higher computational complexity

MFGKM: faster and better than MGKM

MFGKM+MCS: fastest method, with performance comparable to MGKM


Comments

Advantage: improves both performance and speed

Drawback: …

Application: …


Methodology – k-Means

sensitive to the choice of a starting point
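A small one-dimensional example makes this sensitivity concrete. The sketch below is a plain Lloyd-style k-means written for illustration; the function name `kmeans_1d`, the data and the two initializations are assumptions chosen for the example.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd-style k-means on scalars; returns final centers and distortion."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    distortion = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, distortion

data = [0, 1, 10, 11, 20, 21]
good_centers, good_distortion = kmeans_1d(data, [0, 10, 20])
bad_centers, bad_distortion = kmeans_1d(data, [0, 1, 10])
```

Starting with one center per group converges to a distortion of 1.5, while starting with two centers in the first group gets stuck at a distortion of 101, since no update step ever moves a center across to the unclaimed group.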


Methodology – The GKM algorithm

Objective function
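For orientation, global k-means as it is commonly described adds one center at a time: for each new center it tries every data point as the starting position, runs k-means from there, and keeps the run with the lowest sum of squared errors. The sketch below follows that description; it is a minimal illustration with hypothetical helper names (`sq_dist`, `lloyd`, `global_kmeans`), not the presenter's or the authors' code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def lloyd(points, centers, max_iter=100):
    """Plain k-means from the given centers; returns centers and error."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error

def global_kmeans(points, k):
    """Grow the solution from 1 to k centers, trying every point as a seed."""
    points = [tuple(p) for p in points]
    n = len(points)
    centers = [tuple(sum(col) / n for col in zip(*points))]  # k = 1: the mean
    error = sum(sq_dist(p, centers[0]) for p in points)
    for _ in range(2, k + 1):
        best = None
        for seed in points:
            trial = lloyd(points, centers + [seed])
            if best is None or trial[1] < best[1]:
                best = trial
        centers, error = best
    return centers, error

data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
centers, error = global_kmeans(data, 3)
```

Each added center costs one k-means run per data point, which is the expense that the faster variants discussed in these slides aim to reduce.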


Methodology – Objective function

Old version

Reformulated version


Methodology – fast GKM algorithm

Old version

Proposed version (auxiliary cluster function)
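The formulas for this slide did not survive extraction. As background, the commonly cited form of the fast global k-means selection rule scores each candidate x_n by b_n = Σ_j max(d_j − ‖x_n − x_j‖², 0), where d_j is the squared distance from x_j to its nearest existing center, and seeds the new center at the candidate with the largest b_n. The sketch below implements that commonly cited form; whether it matches the "old version" shown on the slide is an assumption, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def best_candidate(points, centers):
    """Return (index, b_n) of the point with the largest guaranteed error drop."""
    # d[j]: squared distance from x_j to its nearest existing center.
    d = [min(sq_dist(p, c) for c in centers) for p in points]
    best_index, best_gain = None, float("-inf")
    for n, candidate in enumerate(points):
        gain = sum(max(d[j] - sq_dist(candidate, x), 0.0)
                   for j, x in enumerate(points))
        if gain > best_gain:  # ties keep the earliest candidate
            best_index, best_gain = n, gain
    return best_index, best_gain

data = [(0.0,), (1.0,), (10.0,), (11.0,)]
index, gain = best_candidate(data, [(0.5,)])
```

With one existing center at 0.5, the candidates at 10 and 11 tie for the largest score, and the earlier one is returned. Evaluating every candidate this way takes quadratic time in the number of points, which is the cost that a reduced candidate set is meant to cut.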



Methodology – modified GKM algorithm

Proposed version


