Fast global k-means clustering using cluster membership and inequality

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
Fast global k-means clustering using cluster membership and inequality
Pattern Recognition (PR, 2010)
Presenter: Lin, Shu-Han
Authors: Jim Z.C. Lai, Tsung-Jen Huang




Outline

Motivation
Objective
Methodology
Experiments
Conclusion
Comments


Motivation

FGKM (fast global k-means) and MGKM (modified global k-means) have the same computational complexity.

MGKM claims to be more effective than FGKM (see 2008.PR.8.Shu-Han.1027, "Modified global k-means algorithm for minimum sum-of-squares clustering problems").


Objectives

Develop a set of inequalities to speed up FGKM and MGKM; the accelerated method is called MFGKM.

Use the Karhunen-Loève Transform (KLT), which is closely related to Principal Component Analysis (PCA).

Threshold: th = 0.9999
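As a rough illustration of the KLT/PCA step, the sketch below estimates the first principal axis of a small data set by power iteration on the covariance matrix and sorts the points by their projection onto that axis. It is a minimal pure-Python sketch, not the authors' implementation: the function names `principal_axis` and `project`, the iteration count, and the sample data are all assumptions, and the role of the threshold th is not modeled here.

```python
import math


def principal_axis(points, iterations=200):
    """Estimate the first principal axis (unit vector) by power iteration.

    Assumption: a fixed iteration count is enough for this small example.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    # Covariance matrix (dim x dim), normalized by n.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data: all points identical
        v = [x / norm for x in w]
    return mean, v


def project(point, mean, axis):
    """Coordinate of a point along the axis, measured from the mean."""
    return sum((point[d] - mean[d]) * axis[d] for d in range(len(point)))


# Hypothetical sample data lying roughly along the direction (1, 2).
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
mean, axis = principal_axis(data)
# Indices of the points, sorted by their coordinate on the first axis.
order = sorted(range(len(data)), key=lambda i: project(data[i], mean, axis))
```

Sorting by the first-axis coordinate is what makes the later candidate-set tables (points ordered by A1*) cheap to scan.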

Methodology – MFGKM

[Figure: MFGKM algorithm listing. Red = proposed steps.]

(or s Yj', called MCS)

Methodology – cluster center selection algorithm

(Speed up)

Methodology – Candidate set construction algorithm

Methodology – Candidate set construction algorithm (Cont.)

         A1*   A2    A3    …    Ana
x10      8.2   4.2   …     …    …
x9       8.0   2.1
x8 (Q)   7.2   2.2   3.2   …    …
x7       5.4
x6       5.1

1.

r10 = d(x10, c) = 2

|8.2 - 7.2| = 1
1 + |2.2 - 4.2| = 3 > r10, so delete x10: x10 cannot be the nearest neighbor of x8.
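The elimination test in this example can be sketched as a partial-distance check: differences are accumulated axis by axis, and the point is discarded as soon as the running total exceeds its radius. The slide adds absolute differences (1 + 2 = 3 > 2); the sketch below instead accumulates squared differences and compares against the squared radius, which is an assumption chosen because it is exact for Euclidean distance. The function name `can_eliminate` is hypothetical.

```python
def can_eliminate(query, point, radius):
    """Return True if `point` can be discarded for candidate `query`.

    Squared differences are accumulated axis by axis; once the partial sum
    exceeds radius**2, the full Euclidean distance must exceed `radius` too.
    Assumption: Euclidean distance (the slide adds absolute differences).
    """
    limit = radius * radius
    partial = 0.0
    for q, p in zip(query, point):
        partial += (q - p) ** 2
        if partial > limit:
            return True  # early exit: later axes can only increase the sum
    return False


x8 = (7.2, 2.2)    # query Q, first two axes from the slide
x10 = (8.2, 4.2)
r10 = 2.0          # d(x10, c) from the slide
eliminated = can_eliminate(x8, x10, r10)   # partial sum 1 + 4 = 5 > 4
```

The early exit is the point of the technique: most points are rejected after one or two axes, so the full distance is rarely computed.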

Methodology – Candidate set construction algorithm (Cont.)

         A1*   A2    A3    …    Ana
x12      11    4     3     1
x11      9.7   4.3   2     0
x10      8.3   4.2
x9       8.0   5.1
x8 (Q)   7.2   2.2   3.2   …    …
x7       5.4
x6       5.1
x5       4.7   ...   ...   ...  ...
x4       ...   ...   ...   ...  ...
x3       ...   ...   ...   ...  ...
x2       ...   ...   ...   ...  ...
x1       ...   ...   ...   ...  ...

2.

rmax = 2
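One plausible reading of rmax is as a search window on the sorted first axis: a point whose A1* value differs from the query's by more than rmax cannot lie within distance rmax of it, so only the rows inside the window need to be examined. The sketch below applies that reading to the A1* column of the table using binary search. The function name `window` and the interpretation of rmax are assumptions, not taken from the slide.

```python
from bisect import bisect_left, bisect_right


def window(sorted_a1, q_value, r_max):
    """Indices i with |sorted_a1[i] - q_value| <= r_max.

    `sorted_a1` must be in ascending order.
    """
    lo = bisect_left(sorted_a1, q_value - r_max)
    hi = bisect_right(sorted_a1, q_value + r_max)
    return list(range(lo, hi))


# First-axis values from the slide, in ascending order (x5 ... x12).
names = ["x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12"]
a1 = [4.7, 5.1, 5.4, 7.2, 8.0, 8.3, 9.7, 11.0]
inside = [names[i] for i in window(a1, 7.2, 2.0)]
# inside == ["x7", "x8", "x9", "x10"]
```

With rmax = 2 around x8 (A1* = 7.2), x6 (5.1) and x11 (9.7) fall just outside the window and are never compared in full.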

Methodology – Candidate set construction algorithm (Cont.)

3.

Methodology – Candidate set construction algorithm (Cont.)

         A1*   A2    A3    …    Ana
x9       8.2   2.2   4     3    2
x8 (Q)   7.2   2.2   4     3    2
x7       6.2   2.2   4     3    2

4.

Diff (distortion)

Diff = (r9 - d(x8, x9)) + (r7 - d(x8, x7)) = (2 - 1) + (2 - 1) = 2

Return 2 and the center of x9 and x7
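The Diff computation can be sketched as follows: for a candidate center Q, every point that lies closer to Q than its current radius contributes the difference, and the centroid of those points is returned alongside the total. This is a minimal sketch that follows the slide's arithmetic (plain distances, not squared). The function name `evaluate_candidate` is hypothetical, and the radius of x7 is assumed to be 2, matching r9.

```python
import math


def evaluate_candidate(query, points, radii):
    """Return (diff, center) for a candidate center `query`.

    diff   : sum of (r_i - d(query, x_i)) over captured points, i.e. points
             that are closer to `query` than their current radius r_i.
    center : centroid of the captured points (None if none is captured).
    """
    diff = 0.0
    captured = []
    for p, r in zip(points, radii):
        d = math.dist(query, p)
        if d < r:
            diff += r - d
            captured.append(p)
    if not captured:
        return 0.0, None
    dim = len(query)
    center = tuple(sum(p[k] for p in captured) / len(captured)
                   for k in range(dim))
    return diff, center


x8 = (7.2, 2.2, 4.0, 3.0, 2.0)            # candidate Q
neighbors = [(8.2, 2.2, 4.0, 3.0, 2.0),   # x9
             (6.2, 2.2, 4.0, 3.0, 2.0)]   # x7
radii = [2.0, 2.0]                        # r9 from the slide; r7 = 2 is assumed
diff, center = evaluate_candidate(x8, neighbors, radii)
```

For the rows in the table this gives a Diff of about 2 and a center that coincides with x8, since x9 and x7 sit symmetrically on either side of it.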


Methodology – MCS

Experiments – Computing time

Experiments – Distortion

Least distortion

Faster, but distortion


Conclusions

GKM
FGKM: faster, but local
MGKM: better performance than FGKM, but needs more computation
MFGKM: faster, and better than MGKM
MFGKM+MCS: fastest method, with performance comparable to MGKM


Comments

Advantage: improves both performance and speed

Drawback: …

Application: …

Methodology – k-Means

sensitive to the choice of a starting point
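The sensitivity to the starting point is easy to reproduce on a toy example. The sketch below runs plain Lloyd-style k-means on one-dimensional data from two different sets of initial centers and compares the final distortion. The function name `kmeans_1d` and the data are assumptions made for illustration.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centers, distortion)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break  # assignments no longer change the centers
        centers = updated
    distortion = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, distortion


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]   # three well-separated groups
_, good = kmeans_1d(data, [1, 11, 21])     # one start per group
_, poor = kmeans_1d(data, [0, 1, 2])       # all starts in the first group
# good == 6, poor == 154.5
```

Both runs converge, but the second one gets stuck with two centers in the first group and one center shared by the other two groups, which is the behavior the global k-means family tries to avoid.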

Methodology – The GKM algorithm

Objective function
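GKM, as it is usually described, builds the solution incrementally: the one-center problem is solved by the centroid, and each further center is found by trying every data point as the new starting center, running k-means, and keeping the best result. The sketch below shows that loop on one-dimensional data. It is a minimal illustration, not the slide's exact listing; the function names and the data are assumptions.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centers, distortion)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    distortion = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, distortion


def global_kmeans_1d(data, k_max):
    """Incremental GKM: solve k = 1, 2, ..., k_max in turn."""
    centers = [sum(data) / len(data)]   # optimal solution for k = 1
    distortion = sum((x - centers[0]) ** 2 for x in data)
    for _ in range(2, k_max + 1):
        best = None
        for x in data:                  # try every point as the new center
            trial = kmeans_1d(data, centers + [x])
            if best is None or trial[1] < best[1]:
                best = trial
        centers, distortion = best
    return sorted(centers), distortion


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
centers, distortion = global_kmeans_1d(data, 3)
# centers == [1.0, 11.0, 21.0], distortion == 6.0
```

Each step runs k-means once per data point, which is the cost that FGKM and the later variants set out to reduce.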

Methodology – Objective function

Old version

Reformulated version
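The formulas on this slide did not survive extraction. Assuming they follow the usual minimum sum-of-squares formulation used in the modified global k-means paper cited under Motivation, the two versions would look roughly like this. The symbols are assumptions: a^i are the m data points, x^j the k cluster centers, and w_{ij} the 0/1 membership weights.

```latex
% Assumed "old" version: explicit 0/1 membership weights w_{ij}
\min_{x,\,w}\; \psi_k(x, w)
  = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij}\, \lVert x^j - a^i \rVert^2
\quad \text{s.t.} \quad
  \sum_{j=1}^{k} w_{ij} = 1, \qquad w_{ij} \in \{0, 1\}

% Assumed reformulated version: membership folded into a minimum
\min_{x}\; f_k(x^1, \dots, x^k)
  = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \dots, k} \lVert x^j - a^i \rVert^2
```

The second form removes the integer variables: each point is charged the squared distance to its nearest center, so only the centers remain as unknowns.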

Methodology – fast GKM algorithm

Old version

Proposed version (auxiliary cluster function)

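The formulas on this slide were lost in extraction. Assuming the "old version" is the usual FGKM selection rule (pick the data point with the largest guaranteed distortion reduction) and the "proposed version" is the auxiliary cluster function from the modified GKM paper, the two can be sketched side by side. All function names and the sample data are hypothetical; d_prev[i] denotes the squared distance from point i to its nearest current center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def reduction(candidate, points, d_prev):
    """b(y) = sum_i max(d_prev[i] - ||y - x_i||^2, 0)."""
    return sum(max(d - sq_dist(candidate, p), 0.0)
               for p, d in zip(points, d_prev))


def auxiliary(candidate, points, d_prev):
    """f_bar(y) = (1/m) * sum_i min(d_prev[i], ||y - x_i||^2)."""
    total = sum(min(d, sq_dist(candidate, p))
                for p, d in zip(points, d_prev))
    return total / len(points)


def select_new_center(points, centers):
    """Pick the data point that maximizes b(y) for the current centers."""
    d_prev = [min(sq_dist(p, c) for c in centers) for p in points]
    best = max(points, key=lambda y: reduction(y, points, d_prev))
    return best, d_prev


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centers = [(1 / 3, 1 / 3)]   # hypothetical current solution with one center
new_center, d_prev = select_new_center(points, centers)
```

The two quantities are tied together by b(y) = sum(d_prev) - m * f_bar(y), so maximizing the reduction over the data points is the same as minimizing the auxiliary function over them. The auxiliary function, however, is defined for any y, not only for data points.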

Methodology – modified GKM algorithm

Proposed version

