Density-Based Clustering

Izabela Moise, Evangelos Pournaras, Dirk Helbing

Reminder

• Unsupervised data mining
• Clustering → k-means

Main Clustering Approaches

• Partitioning method
→ constructs partitions of data points
→ evaluates the partitions by some criterion
→ k-means, k-medoids

• Density-based method
→ based on connectivity and density functions
→ DBSCAN, DJCluster

Density-Based Clustering

Density-based clustering locates regions of high density that are separated from one another by regions of low density.

Main principles

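• Two parameters:
1. maximum radius of the neighbourhood → Eps
2. minimum number of points in the Eps-neighbourhood of a point → MinPts

• Eps-neighbourhood of a point p in a dataset D: $N_{Eps}(p) = \{ q \in D \mid \mathrm{dist}(p, q) \le Eps \}$

• Key idea: the density of the neighbourhood, i.e. the number of points $|N_{Eps}(p)|$, has to reach a threshold (MinPts).

• The shape of a neighbourhood depends on the dist function: in 2D, a circle for Euclidean distance and a diamond for Manhattan distance.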
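A minimal Python sketch of the Eps-neighbourhood, assuming points are 2D tuples. The function names and the two distance functions are illustrative choices, not taken from the slides; they show that the same Eps selects different points depending on dist.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def euclidean(p: Point, q: Point) -> float:
    return math.dist(p, q)

def manhattan(p: Point, q: Point) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))

def eps_neighbourhood(p: Point, data: Sequence[Point], eps: float,
                      dist: Callable[[Point, Point], float] = euclidean) -> List[Point]:
    """N_Eps(p) = {q in D : dist(p, q) <= Eps}; p itself is included."""
    return [q for q in data if dist(p, q) <= eps]

D = [(0.0, 0.0), (0.7, 0.7), (1.0, 0.0), (2.0, 2.0)]
print(eps_neighbourhood((0.0, 0.0), D, 1.0))             # circle: [(0.0, 0.0), (0.7, 0.7), (1.0, 0.0)]
print(eps_neighbourhood((0.0, 0.0), D, 1.0, manhattan))  # diamond: [(0.0, 0.0), (1.0, 0.0)]
```

The point (0.7, 0.7) lies at Euclidean distance of about 0.99 from the origin but at Manhattan distance 1.4, so it drops out of the diamond-shaped neighbourhood.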

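Core, Border and Noise/Outlier

[Figure: core, border and noise/outlier points. Source: Jing Gao, SUNY Buffalo]

• Core point: $|N_{Eps}(p)| \ge MinPts$
• Border point: not a core point, but inside the Eps-neighbourhood of a core point
• Noise/outlier: neither a core point nor a border point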
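The three point types can be computed directly from the Eps-neighbourhoods. A minimal sketch, assuming Euclidean distance and 2D tuples; `classify_points` is a hypothetical name and the usual definitions are restated in the comments.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def classify_points(data: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Label each point 'core', 'border' or 'noise' (Euclidean distance assumed)."""
    # Indices in each point's Eps-neighbourhood; the point itself is included.
    neigh = [[j for j, q in enumerate(data) if math.dist(p, q) <= eps] for p in data]
    # Core point: at least MinPts points in its Eps-neighbourhood.
    is_core = [len(n) >= min_pts for n in neigh]
    labels = []
    for i, n in enumerate(neigh):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in n):
            # Border point: not core, but within Eps of some core point.
            labels.append("border")
        else:
            labels.append("noise")  # neither core nor border
    return labels

D = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
print(classify_points(D, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Because the distance is symmetric, "p lies in the neighbourhood of a core point" is checked by looking for a core point in p's own neighbourhood.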

Directly Density-Reachable

Directly density-reachable:
→ A point p is directly density-reachable from a point q wrt. Eps, MinPts if:

1. $p \in N_{Eps}(q)$ and
2. $|N_{Eps}(q)| \ge MinPts$ (q is a core point)
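The two conditions translate into a one-line predicate. A sketch assuming Euclidean distance; `n_eps` and `directly_density_reachable` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def n_eps(p: Point, data: Sequence[Point], eps: float) -> List[Point]:
    """Eps-neighbourhood of p under Euclidean distance (p included)."""
    return [q for q in data if math.dist(p, q) <= eps]

def directly_density_reachable(p: Point, q: Point, data: Sequence[Point],
                               eps: float, min_pts: int) -> bool:
    """True if p is directly density-reachable from q wrt. Eps, MinPts."""
    nq = n_eps(q, data, eps)
    return p in nq and len(nq) >= min_pts  # condition 1 and condition 2

D = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
print(directly_density_reachable((2, 1), (1, 1), D, eps=1.0, min_pts=4))  # True
print(directly_density_reachable((1, 1), (2, 1), D, eps=1.0, min_pts=4))  # False
```

(1, 1) has four points in its neighbourhood while (2, 1) has only two, so the relation holds in one direction only.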

Density-Reachable

• Density-reachable:
→ A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points $p_1, \dots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$

• Transitive but not symmetric (symmetric only between core points)
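Searching for such a chain is a breadth-first search that only expands core points. A minimal sketch, assuming Euclidean distance and points referred to by their index in the dataset; the helper names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def n_eps(i: int, data: Sequence[Point], eps: float) -> List[int]:
    """Indices of the points in the Eps-neighbourhood of data[i] (Euclidean)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]

def density_reachable(p: int, q: int, data: Sequence[Point], eps: float, min_pts: int) -> bool:
    """True if data[p] is density-reachable from data[q].

    Breadth-first search over chains q = p_1, ..., p_n = p in which every
    p_{i+1} is directly density-reachable from p_i. A chain of length 1
    (p == q) is accepted, following the chain definition literally.
    """
    reached = {q}
    frontier = deque([q])
    while frontier:
        i = frontier.popleft()
        neigh = n_eps(i, data, eps)
        if len(neigh) < min_pts:
            continue  # not a core point: nothing is directly density-reachable from it
        for j in neigh:
            if j not in reached:
                reached.add(j)
                frontier.append(j)
    return p in reached

# A horizontal chain of points plus one isolated point (index 5).
D = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 3)]
print(density_reachable(4, 1, D, eps=1.0, min_pts=3))  # True: 1 -> 2 -> 3 -> 4
print(density_reachable(1, 4, D, eps=1.0, min_pts=3))  # False: point 4 is not a core point
```

The first call relies on transitivity through the core points 2 and 3; the second shows the missing symmetry, since the chain cannot start at the non-core end point 4.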

Density-Connected

Density-connected:
→ A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts
→ symmetric
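Density-connectivity can be checked by computing, for each candidate o, the set of points density-reachable from it. A brute-force sketch assuming Euclidean distance and index-based points; `reachable_from` and `density_connected` are hypothetical names.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]

def n_eps(i: int, data: Sequence[Point], eps: float) -> List[int]:
    """Indices of the points in the Eps-neighbourhood of data[i] (Euclidean)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]

def reachable_from(o: int, data: Sequence[Point], eps: float, min_pts: int) -> Set[int]:
    """Indices of all points density-reachable from data[o] (BFS through core points)."""
    reached = {o}
    frontier = deque([o])
    while frontier:
        i = frontier.popleft()
        neigh = n_eps(i, data, eps)
        if len(neigh) < min_pts:
            continue  # only core points extend a chain
        for j in neigh:
            if j not in reached:
                reached.add(j)
                frontier.append(j)
    return reached

def density_connected(p: int, q: int, data: Sequence[Point], eps: float, min_pts: int) -> bool:
    """True if some point o has both data[p] and data[q] density-reachable from it."""
    return any(p in r and q in r
               for r in (reachable_from(o, data, eps, min_pts) for o in range(len(data))))

D = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 3)]
print(density_connected(0, 4, D, eps=1.0, min_pts=3))  # True, e.g. via o = 1
print(density_connected(4, 0, D, eps=1.0, min_pts=3))  # True: the relation is symmetric
print(density_connected(0, 5, D, eps=1.0, min_pts=3))  # False: point 5 is isolated
```

Points 0 and 4 are both border points, so neither is density-reachable from the other, yet they are density-connected through the core points between them.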

DBSCAN - Density-Based Spatial Clustering of Applications with Noise

Main Principles

One of the most cited clustering algorithms

Main principle:

a cluster is defined as a maximal set of density-connected points.

• Discovers clusters of arbitrary shape (spherical, elongated, linear) as well as noise

• Suited to spatial datasets
→ geomarketing, tomography, satellite images

• Requires only two parameters, Eps and MinPts (no prior knowledge of the number of clusters)

Definition: Cluster

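[Figure: definition of a cluster. Source: Erik Kropat, University of the Bundeswehr Munich]

A cluster C wrt. Eps and MinPts is a non-empty subset of D such that:
1. Maximality: for all p, q, if $p \in C$ and q is density-reachable from p, then $q \in C$
2. Connectivity: for all $p, q \in C$, p is density-connected to q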

Definition: Noise

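[Figure: definition of noise. Source: Erik Kropat, University of the Bundeswehr Munich]

Let $C_1, \dots, C_m$ be the clusters of D wrt. Eps and MinPts. Noise is the set of points of D that belong to no cluster: $\mathit{noise} = \{ p \in D \mid p \notin C_i \text{ for all } i \}$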

The Algorithm

1. Select an arbitrary (e.g. random) point p that has not been processed yet
2. Retrieve all points density-reachable from p wrt. Eps and MinPts
3. If p is a core point, these points form a cluster
4. If p is not a core point (a border or noise point), no points are density-reachable from p → visit the next data point
5. Continue the process until all points have been processed
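The steps above can be sketched in Python as follows. This is a minimal, non-optimised version under stated assumptions: Euclidean distance, points visited in index order rather than randomly, and the hypothetical names `dbscan` and `region_query`. Non-core points are first marked as noise and relabelled if a cluster later reaches them.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1

def region_query(data: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices in the Eps-neighbourhood of data[i] (Euclidean, naive O(n) scan)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]

def dbscan(data: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(data)  # None = not yet processed
    cluster = -1
    for i in range(len(data)):  # index order stands in for "select an arbitrary point"
        if labels[i] is not None:
            continue
        neigh = region_query(data, i, eps)
        if len(neigh) < min_pts:
            labels[i] = NOISE  # not a core point; may still become a border point later
            continue
        cluster += 1  # p is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neigh)
        while queue:  # collect every point density-reachable from p
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neigh_j = region_query(data, j, eps)
            if len(neigh_j) >= min_pts:
                queue.extend(neigh_j)  # only core points extend the cluster
    # Every point has been processed, so no label is None any more.
    return [lab if lab is not None else NOISE for lab in labels]

D = [(2.5, 1.0),
     (0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (5, 5)]
print(dbscan(D, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Point 0 is first marked as noise and is relabelled as a border point of cluster 0 once the core point (1, 1) is expanded. With the naive neighbourhood scan this sketch runs in O(n²) time; larger datasets usually call for a spatial index.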

Selecting Eps and MinPts

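The two parameters can be determined by a heuristic based on k-dist, the distance from a point to its k-th nearest neighbour.

Observation:
• For points in a cluster, their k-th nearest neighbours are at roughly the same distance.
• Noise points have their k-th nearest neighbour at a farther distance.
→ Plot the k-dist values of all points in sorted order: noise points form the steep part of the curve and cluster points the flat part, so a threshold at the transition between them (the "knee") is a candidate for Eps, with MinPts derived from k.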
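A sketch of the sorted k-dist values behind this heuristic, assuming Euclidean distance and a k-th nearest neighbour that excludes the point itself (conventions differ on this). `sorted_k_distances` is a hypothetical name; the knee is still read off the plot by eye.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def sorted_k_distances(data: Sequence[Point], k: int) -> List[float]:
    """k-dist of every point (distance to its k-th nearest neighbour, the point
    itself excluded), sorted in descending order as in a k-distance plot."""
    k_dist = []
    for i, p in enumerate(data):
        others = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        k_dist.append(others[k - 1])
    return sorted(k_dist, reverse=True)

D = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (5, 5)]
print(sorted_k_distances(D, k=2))
```

Here the outlier (5, 5) has k-dist √41 ≈ 6.40 and all eight cluster points have k-dist 1.0, so an Eps just above 1.0 separates them. Since this sketch excludes the point itself while $N_{Eps}(p)$ includes it, MinPts = k + 1 makes the points with k-dist ≤ Eps exactly the core points.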

[Three figures omitted. Source: Erik Kropat, University of the Bundeswehr Munich]

Pros and Cons

Pros:

• discovers clusters of arbitrary shape
• handles noise
• needs density parameters (Eps, MinPts) as a termination condition

Cons:

• cannot handle varying densities: a single global Eps and MinPts cannot fit both dense and sparse clusters
• sensitive to parameters → hard to determine the correct set of parameters
• sampling affects density measures
