Data Clustering: 50 years beyond K-means
DESCRIPTION
Data Clustering: 50 years beyond K-means. Presenter: Jiang-Shan Wang. Author: Anil K. Jain. 國立雲林科技大學 National Yunlin University of Science and Technology. PRL 2010. Outline: Motivation, Objective, Data clustering, User’s dilemma, K-means, Extensions of K-means.
TRANSCRIPT
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Data Clustering: 50 years beyond K-means
Presenter: Jiang-Shan Wang
Author: Anil K. Jain
Pattern Recognition Letters (PRL), 2010
國立雲林科技大學 National Yunlin University of Science and Technology
Outline
Motivation
Objective
Data clustering
User’s dilemma
K-means
Extensions of K-means
Trends in data clustering
Summary
Comments
Motivation
Provide a brief overview of clustering and point out some of the emerging and useful research directions.
Objective
Summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions.
Data clustering
Three main purposes:
Underlying structure
Natural classification
Compression
K-means
Three parameters:
Number of clusters
Cluster initialization
Distance metrics
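The three parameters above map directly onto a basic K-means loop. The following is a minimal sketch, assuming NumPy, squared Euclidean distance, and random-point initialization; the function name `kmeans` and its arguments are illustrative and are not taken from the paper.

```python
import numpy as np

def kmeans(points, k, init_centers=None, n_iter=100, seed=0):
    """Basic K-means loop (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Parameter 1: number of clusters k (must match init_centers if given).
    # Parameter 2: cluster initialization (k distinct random points by default).
    if init_centers is None:
        centers = points[rng.choice(len(points), size=k, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Parameter 3: distance metric (assumed here: squared Euclidean).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

data = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centers = kmeans(data, k=2)
```

Changing `k`, the initial centers, or the distance computation can change the resulting partition, which is why these three choices are left to the user.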
Extensions of K-means
Fuzzy C-means
Bisecting K-means
X-means
K-medoid
Kernel K-means
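As one illustration of how these extensions differ from plain K-means, Fuzzy C-means replaces the hard assignment with a soft membership for every point and cluster. The sketch below shows only the standard membership update, assuming NumPy, Euclidean distance, and a fuzzifier `m` greater than 1; the function name `fuzzy_memberships` and the `eps` guard are assumptions made for this example.

```python
import numpy as np

def fuzzy_memberships(points, centers, m=2.0, eps=1e-12):
    """Membership matrix U (n_points x n_clusters); each row sums to 1."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Euclidean distance from every point to every center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dists = np.maximum(dists, eps)  # guard against a point lying exactly on a center
    # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
    ratio = (dists[:, :, None] / dists[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

U = fuzzy_memberships([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]],
                      [[0.0, 0.0], [4.0, 0.0]])
```

A point midway between two centers receives comparable memberships in both, whereas K-means would force it into exactly one cluster.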
User’s dilemma
Representation
User’s dilemma
Purpose of grouping
User’s dilemma
Number of clusters
User’s dilemma
Cluster validity
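The slide does not name a specific validity measure. As an assumed example of an external criterion, the Rand index compares a clustering against a reference partition by counting the point pairs on which the two agree. The function name `rand_index` is illustrative.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if both partitions group it together or both separate it.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

score = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same grouping, different label names
```

Because only co-membership of pairs is compared, the measure does not depend on how the clusters are numbered.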
User’s dilemma
Comparing clustering algorithms
User’s dilemma
Admissibility analysis of clustering algorithms
Fisher and Van Ness’s criteria:
Convex
Cluster proportion
Cluster omission
Monotone
Kleinberg’s criteria:
Scale invariance
Richness
Consistency
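Kleinberg’s three criteria can be stated compactly. The notation below is an assumption made for this note: $f$ is a clustering function that maps a distance function $d$ on a point set $S$ to a partition of $S$, and a $\Gamma$-transformation shrinks distances within the clusters of $\Gamma$ and stretches distances between them.

```latex
\begin{align*}
\text{Scale invariance:}\quad & f(\alpha \cdot d) = f(d) \quad \text{for all } \alpha > 0 \\
\text{Richness:}\quad & \operatorname{Range}(f) = \{\text{all partitions of } S\} \\
\text{Consistency:}\quad & d' \text{ is a } \Gamma\text{-transformation of } d,\ \Gamma = f(d)
  \;\Longrightarrow\; f(d') = f(d)
\end{align*}
```

Kleinberg’s impossibility result is that no clustering function satisfies all three properties at once.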
Trends in data clustering
Clustering ensembles
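One common way to combine several partitions of the same data, assumed here rather than taken from the slide, is a co-association matrix: entry (i, j) is the fraction of partitions in which points i and j fall in the same cluster. The function name `co_association` is illustrative.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    partitions = np.asarray(partitions)  # shape: (n_partitions, n_points)
    # same[p, i, j] is True when partition p puts points i and j together.
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)

C = co_association([[0, 0, 1, 1],
                    [0, 0, 0, 1],
                    [1, 1, 0, 0]])
```

The resulting matrix can be treated as a similarity matrix and passed to any similarity-based clustering method to obtain a consensus partition.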
Trends in data clustering
Semi-supervised clustering
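Side information in semi-supervised clustering is often given as pairwise must-link and cannot-link constraints. The sketch below, with the hypothetical helper `violated_constraints`, only checks which constraints a given labeling breaks; it is not a constrained clustering algorithm in itself.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the pairwise constraints that a labeling breaks."""
    # A must-link pair is broken when its two points land in different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is broken when its two points land in the same cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl

broken_ml, broken_cl = violated_constraints(
    labels=[0, 0, 1, 1],
    must_link=[(0, 1), (1, 2)],
    cannot_link=[(0, 3), (2, 3)],
)
```

A constrained method would use such a check, or a penalty based on it, while assigning points to clusters.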
Trends in data clustering
Large-scale clustering
Studies:
Efficient nearest neighbor search
Data summarization
Distributed computing
Incremental clustering
Sampling-based methods
Trends in data clustering
Multi-way clustering
Heterogeneous data:
Rank data
Dynamic data
Graph data
Relational data
Summary
There needs to be a suite of benchmark data.
Achieve a tighter integration between clustering algorithms and application needs.
Computational cost of the underlying optimization problems.
Stability or consistency of the clustering under perturbations of the data.
Choose clustering principles according to satisfiability of the stated axioms.
Develop semi-supervised clustering.
Comments
Advantage: Many figures that aid understanding.
Drawback: …
Application: Clustering.