3.4 density and grid methods

26
Clustering Density and Grid Based 1

Upload: krishver2

Post on 12-Aug-2015

42 views

Category:

Education


0 download

TRANSCRIPT

Page 1: 3.4 density and grid methods

ClusteringDensity and Grid

Based

1

Page 2: 3.4 density and grid methods

2

Density based methods Clusters – dense regions of objects Low density regions – Noise DBSCAN

Density Based Spatial Clustering of Applications with Noise

OPTICS Ordering Points To Identify the Clustering Structure

DENCLUE DENsity Based CLUstEring

Page 3: 3.4 density and grid methods

3

DBSCAN Cluster – maximal set of density connected points

Grows regions with sufficiently high density into clusters -neighborhood MinPts and Core object Directly Density Reachable

An object p is directly density reachable from object q if p is within the -neighborhood of q and q is a core object

pq

MinPts = 5

Page 4: 3.4 density and grid methods

4

DBSCAN Density Reachable

An object p is density reachable from q, if there is a chain of objects p1, …pn, p1=q and pn=p such that pi+1 is directly density reachable from pi

p

qp1

Page 5: 3.4 density and grid methods

5

DBSCAN Density Connected

An object p is density connected to object q if there is an object o such that both p and q are density reachable from o.

p q

o

Page 6: 3.4 density and grid methods

6

DBSCAN Arbitrarily select a point p

Retrieve all points density-reachable from p

If p is a core point, a cluster is formed.

If p is a border point, no points are density-reachable

from p, then DBSCAN visits the next point of the

database.

Continue the process until all of the points have been

processed.

Complexity : O(n log n) / O(n2)

Page 7: 3.4 density and grid methods

7

OPTICS: A Cluster-Ordering Method OPTICS: Ordering Points To Identify the Clustering

Structure Produces a special order of the database with respect

to its density-based clustering structure Good for both automatic and interactive cluster

analysis, including finding intrinsic clustering structure Can be represented graphically or using visualization

techniques

Page 8: 3.4 density and grid methods

OPTICS In DBSCAN, for a constant MinPts value, density based

clusters with respect to a higher density (lower value of ) are

completely contained in lower density sets.

DBSCAN is extended so that Objects are processed in a

specific order. Selects an object that is density-reachable with respect to lowest

value

Core distance of an object p : smallest ’ value that makes {p} a core

object

Reachability distance of an object q with respect to p = max (core-

distance of p, d(p,q))

8

Page 9: 3.4 density and grid methods

OPTICS

Complexity : O(n log n)

9

Page 10: 3.4 density and grid methods

10

Reachability-distance

Cluster-order

of the objects

undefined

OPTICS

Page 11: 3.4 density and grid methods

11

DENCLUE: using density functions

DENsity-based CLUstEring Major features

Solid mathematical foundation Good for data sets with large amounts of noise Allows a compact mathematical description of arbitrarily

shaped clusters in high-dimensional data sets Significantly faster than existing algorithm (faster than

DBSCAN by a factor of up to 45) But needs a large number of parameters

Page 12: 3.4 density and grid methods

12

Influence function: describes the impact of a data point within its neighborhood. x, y – objects in Fd – d-dimensional input space Influence of object y on x is: Can be determined by distance:

Overall density of the data space can be calculated as the sum of the influence function of all data points.

Clusters can be determined mathematically by identifying density attractors. Density attractors are local maximal of the overall density function.

DENCLUE

),()( yxfxf ByB

otherwise 1or ),( 0),( yxdifyxf square

f x y eGaussian

d x y

( , )( , )

2

22

Page 13: 3.4 density and grid methods

13

Density attractor – Local maxima of overall density function

A point x is said to be density attracted to a density attractor x* if there exists a set of points x0, x1,..xk such that x0 = x and xk =x* and the gradient of xi-1 is in the direction of xi

DENCLUE

Page 14: 3.4 density and grid methods

DENCLUE Center defined clusters

For a density attractor x* - a subset of points that are density attracted by x* and where density function x* is no less than threshold

Others are outliers Arbitrary shape cluster

Set of density attractors and set of Cs There should be a path from each density attractor to

another where density function value for each point is no less that

14

Page 15: 3.4 density and grid methods

DENCLUE

15

Page 16: 3.4 density and grid methods

16

Grid Based Methods

Uses a Multi-resolution grid data structure Quantizes space into a finite number of cells

that form a grid structure Fast processing time STING WaveCluster CLIQUE – CLustering In QUEst

Page 17: 3.4 density and grid methods

17

STING

STatistical Information Grid Spatial area is divided into rectangular cells Several levels of cells – at different levels of

resolution High level cell is partitioned into several lower

level cells Statistical attributes are stored in cell

Mean, Maximum, Minimum

Page 18: 3.4 density and grid methods

18

STING

Page 19: 3.4 density and grid methods

19

STING Parameters of higher level cells are computed

from those at lower levels To answer queries

Identify level Estimate cell’s relevance to query Process relevant cells at lower levels Continue to lowest level

Page 20: 3.4 density and grid methods

20

STING

Computation is query independent Parallel processing – supported Data is processed in a single pass Quality depends on granularity

Page 21: 3.4 density and grid methods

21

WaveCluster A multi-resolution clustering approach which applies

wavelet transform to the feature space A wavelet transform is a signal processing technique

that decomposes a signal into different frequency sub-band.

Both grid-based and density-based Input parameters:

# of grid cells for each dimension the wavelet, and the # of applications of wavelet

transform.

Page 22: 3.4 density and grid methods

22

WaveCluster

Using wavelet transform to find clusters Summarises the data by imposing a multidimensional

grid structure onto data space These multidimensional spatial data objects are

represented in a n-dimensional feature space Apply wavelet transform on feature space to find the

dense regions in the feature space Apply wavelet transform multiple times which result in

clusters at different scales from fine to coarse

Page 23: 3.4 density and grid methods

23

Quantization

Page 24: 3.4 density and grid methods

24

Transformation

Page 25: 3.4 density and grid methods

25

WaveCluster Reasons for using Wavelet transformation in clustering

Unsupervised clustering

It uses filters to emphasize region where points cluster, but simultaneously to suppress weaker information in their boundary

Effective removal of outliers Multi-resolution Cost efficiency

Major features: Complexity O(N) Detect arbitrary shaped clusters at different scales Not sensitive to noise, not sensitive to input order Only applicable to low dimensional data

Page 26: 3.4 density and grid methods

26

CLIQUE (Clustering In QUEst) Automatically identifying subspaces of a high dimensional data space

that allow better clustering than original space

CLIQUE can be considered as both density-based and grid-based

It partitions each dimension into the same number of equal length interval

It partitions an m-dimensional data space into non-overlapping rectangular units

A unit is dense if the fraction of total data points contained in the unit exceeds the input model parameter

A cluster is a maximal set of connected dense units within a subspace