CLUSTERING DENSITY-BASED METHODS Elsayed Hemayed Data Mining Course


Post on 18-Jan-2018

Outline

- Density-Based Clustering Methods
- Density-Based Clustering Background
- Terminology
- How does DBSCAN find clusters?
- DBSCAN

Clustering Methods

- Partitioning methods
  - K-Means
- Hierarchical methods
  - Agglomerative hierarchical clustering
  - Divisive hierarchical clustering
- Density-based methods
  - DBSCAN: A Density-Based Spatial Clustering of Applications with Noise
- Grid-based methods
  - STING: A Statistical Information Grid Approach to Spatial Data Mining
- Model-based methods
  - Expectation-Maximization
  - Neural network approach
- High-dimensional data clustering
  - CLIQUE: A Dimension-Growth Subspace Clustering Method

DBSCAN

Density-based Clustering Methods

Density-Based Clustering Methods

- Clusters are defined by density, for example as sets of density-connected points, rather than directly by a distance criterion.
- Cluster = set of “density-connected” points.
- Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - Need density parameters as a termination condition (a cluster stops growing when no new objects can be added to it)
- Examples:
  - DBSCAN (Ester et al., 1996)
  - OPTICS (Ankerst et al., 1999)
  - DENCLUE (Hinneburg & Keim, 1998)

Density-Based Clustering: Background

- Eps-neighborhood: the neighborhood within a radius Eps of a given object.
- MinPts: the minimum number of points required in the Eps-neighborhood of an object.
- Core object: an object whose Eps-neighborhood contains at least MinPts points.
- Directly density-reachable: a point p is directly density-reachable from a point q wrt. Eps and MinPts if
  1) p is within the Eps-neighborhood of q, and
  2) q is a core object.

[Figure: points p and q; MinPts = 5, Eps = 1]
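
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the Euclidean distance, and the sample coordinates are hypothetical and not taken from the slides, and a point is counted as a member of its own Eps-neighborhood.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def eps_neighborhood(points, q, eps):
    """Indices of all points within distance eps of points[q].
    Assumption: the point itself counts as part of its own neighborhood."""
    return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """A core object has at least min_pts points in its Eps-neighborhood."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is directly density-reachable from points[q]:
    q must be a core object and p must lie in q's Eps-neighborhood."""
    return is_core(points, q, eps, min_pts) and p in eps_neighborhood(points, q, eps)

# Hypothetical sample data: five points near the origin and one outlying point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (1.4, 0.0)]
EPS, MIN_PTS = 1.0, 5

core_flags = [is_core(points, i, EPS, MIN_PTS) for i in range(len(points))]
forward = directly_density_reachable(points, 5, 1, EPS, MIN_PTS)   # point 5 from point 1
backward = directly_density_reachable(points, 1, 5, EPS, MIN_PTS)  # point 1 from point 5
print(core_flags, forward, backward)
```

In this sample, point 5 is directly density-reachable from point 1 but not the other way round, because point 5 is not a core object. The relation is therefore not symmetric in general.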

Density Reachability and Density Connectivity

M, P, O, and R are core objects, since the Eps-neighborhood of each contains at least MinPts = 3 points.

[Figure: MinPts = 3; Eps = radius of the circles]

Directly density-reachable

- Q is directly density-reachable from M.
- M is directly density-reachable from P, and vice versa.

Indirectly density-reachable

- Q is indirectly density-reachable from P, since Q is directly density-reachable from M and M is directly density-reachable from P.
- However, P is not density-reachable from Q, since Q is not a core object.
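
The asymmetry of density reachability can be checked with a small search that follows chains of directly density-reachable steps. This is a minimal sketch, assuming Euclidean distance and a neighborhood that includes the point itself. The coordinates are hypothetical and were chosen only so that P and M are core objects while Q is not; they are not read off the slide's figure, and the extra point A is made up.

```python
from collections import deque
from math import dist  # Euclidean distance (Python 3.8+)

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, p in enumerate(points) if dist(p, points[i]) <= eps]

def density_reachable(points, target, source, eps, min_pts):
    """True if points[target] is density-reachable from points[source],
    i.e. a chain of directly density-reachable steps leads from source to target."""
    seen = {source}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # not a core object, so no chain can continue through it
        for j in nbrs:
            if j == target:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical layout on a line; A is an extra helper point.
names = ["A", "P", "M", "Q"]
points = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (2.4, 0.0)]
idx = {name: i for i, name in enumerate(names)}
EPS, MIN_PTS = 1.0, 3

q_from_p = density_reachable(points, idx["Q"], idx["P"], EPS, MIN_PTS)
p_from_q = density_reachable(points, idx["P"], idx["Q"], EPS, MIN_PTS)
print(q_from_p, p_from_q)
```

With these values, Q is reached from P through M, while the search starting at Q stops immediately because Q has too few neighbors to be a core object.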

Core, border, and noise points

- DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
- Density = number of points within a specified radius (Eps).
- A point is a core point if it has at least a specified number of points (MinPts) within Eps. These points are in the interior of a cluster.
- A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
- A noise point is any point that is neither a core point nor a border point.
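
The three point types can be assigned mechanically once the neighborhoods are known. The sketch below is one possible way to do it, assuming Euclidean distance and counting each point in its own neighborhood. The function name `classify_points` and the sample data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Eps-neighborhood of every point (brute force; the point itself is included).
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = {i for i in range(n) if len(nbrs[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs[i]):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical sample: a dense square, one point on its edge, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.3, 0.5), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=4)
print(labels)
```

For this sample the four square corners come out as core points, the point at (1.3, 0.5) as a border point, and the point at (5.0, 5.0) as noise.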

How does DBSCAN find clusters?

- DBSCAN searches for clusters by checking the Eps-neighborhood of each point in the database.
- If the Eps-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created.
- DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters.
- The process terminates when no new point can be added to any cluster.

DBSCAN Algorithm

- Arbitrarily select a point p.
- Retrieve all points density-reachable from p wrt. Eps and MinPts.
- If p is a core point, a cluster is formed.
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
- Continue the process until all of the points have been processed.
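
The steps above can be sketched as a short, brute-force Python function. It is an illustrative sketch rather than a reference implementation: it assumes Euclidean distance, visits points in index order instead of arbitrarily, uses a quadratic neighborhood search, and the names `dbscan`, `region`, and `NOISE` as well as the sample data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)

NOISE = -1  # label for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    n = len(points)
    labels = [None] * n  # None means "not processed yet"

    def region(i):
        # Eps-neighborhood of point i (brute force; includes i itself).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = region(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1           # p is a core point: start a new cluster
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not None:
                continue             # already assigned to a cluster
            labels[q] = cluster
            q_nbrs = region(q)
            if len(q_nbrs) >= min_pts:
                queue.extend(q_nbrs)  # q is a core point: keep growing
    return labels

# Hypothetical sample: two dense squares and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5),
          (10.0, 0.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

A point first marked as noise is relabeled if a later core point reaches it, which is how border points end up inside a cluster. For larger data sets the brute-force `region` search would usually be replaced by a spatial index.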

DBSCAN Summary

- DBSCAN is a density-based clustering method based on connected regions with sufficiently high density.
- The algorithm grows regions with sufficiently high density into clusters and discovers clusters of arbitrary shape in spatial databases with noise.
- It defines a cluster as a maximal set of density-connected points, so cluster membership is decided by density connectivity rather than directly by a distance criterion, unlike in hierarchical methods.

Summary

- Density-Based Clustering Methods
- Density-Based Clustering Background
- Terminology
- How does DBSCAN find clusters?
- DBSCAN