Data Mining Cluster Analysis: Advanced Concepts and Algorithms

Ref.: Chapter 9, Introduction to Data Mining by Tan, Steinbach, Kumar
Data e Web Mining
Outline
Prototype-based – Fuzzy c-means – Mixture Model Clustering
Density-based – Grid-based clustering – Subspace clustering
Graph-based – Chameleon
Scalable Clustering Algorithms – CURE and BIRCH
Characteristics of Clustering Algorithms
Hard (Crisp) vs Soft (Fuzzy) Clustering
Hard (Crisp) vs. Soft (Fuzzy) clustering
– Generalize the K-means objective function (over all N points)
w_ij: the weight with which object x_i belongs to cluster C_j
– To minimize the SSE, repeat the following two steps: fix the c_j and determine the w_ij (cluster assignment); fix the w_ij and recompute the c_j
– Hard clustering: w_ij ∈ {0,1}
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{N} w_{ij}\,(x_i - c_j)^2, \qquad \sum_{j=1}^{k} w_{ij} = 1$$
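The two alternating steps above can be sketched in code (an illustrative 1-D implementation; function and variable names are my own, not from the book):

```python
def hard_kmeans(points, centroids, iters=10):
    """Alternate the two steps that minimize the SSE for hard clustering:
    fix the c_j and set w_ij in {0,1} (assignment), then fix the w_ij and
    recompute the c_j (update). 1-D points for brevity."""
    for _ in range(iters):
        # Assignment: w_ij = 1 for the nearest centroid, 0 for all others
        assign = [min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
                  for x in points]
        # Update: each centroid becomes the mean of its assigned points
        for j in range(len(centroids)):
            members = [x for x, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    sse = sum((x - centroids[a]) ** 2 for x, a in zip(points, assign))
    return centroids, assign, sse
```

Each assignment step minimizes the SSE for fixed centroids, and each update step minimizes it for fixed assignments, so the SSE never increases.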
Hard (Crisp) vs Soft (Fuzzy) Clustering
SSE(x) is minimized when w_{x1} = 1, w_{x2} = 0
[Number line: c1 at 1, x at 2, c2 at 5]
Fuzzy C-means
Objective function:
w_ij: the weight with which object x_i belongs to cluster C_j
– To minimize the objective function, repeat the following: fix the c_j and determine the w_ij; fix the w_ij and recompute the c_j
– Fuzzy clustering: w_ij ∈ [0,1]
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{N} w_{ij}^{p}\,(x_i - c_j)^2, \qquad \sum_{j=1}^{k} w_{ij} = 1$$
p: fuzzifier (p > 1)
Fuzzy C-means
SSE(x) is minimized when w_{x1} = 0.9, w_{x2} = 0.1
[Number line: c1 at 1, x at 2, c2 at 5]
Fuzzy C-means
Objective function:
Initialization: choose the weights wij randomly
Repeat:
– Update centroids: $c_j = \sum_{i=1}^{N} w_{ij}^{p}\,x_i \,\big/\, \sum_{i=1}^{N} w_{ij}^{p}$
– Update weights: $w_{ij} = \left(1/\operatorname{dist}(x_i,c_j)^2\right)^{1/(p-1)} \big/ \sum_{q=1}^{k} \left(1/\operatorname{dist}(x_i,c_q)^2\right)^{1/(p-1)}$
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{N} w_{ij}^{p}\,(x_i - c_j)^2, \qquad \sum_{j=1}^{k} w_{ij} = 1$$
p: fuzzifier (p > 1)
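The full fuzzy c-means loop can be sketched as follows (a 1-D illustration; I use a deterministic initialization instead of random weights, and all names are my own):

```python
def fuzzy_cmeans(points, k, p=2.0, iters=50):
    """Fuzzy c-means sketch on 1-D points. Alternates the two closed-form
    updates that minimize SSE = sum_j sum_i w_ij^p (x_i - c_j)^2
    subject to sum_j w_ij = 1."""
    # Deterministic initialization (spread over the data) for reproducibility
    centroids = [points[i * (len(points) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        # Weight update: w_ij proportional to (1 / d_ij^2)^(1/(p-1))
        weights = []
        for x in points:
            d2 = [(x - c) ** 2 for c in centroids]
            if any(d == 0 for d in d2):       # point coincides with a centroid
                row = [1.0 if d == 0 else 0.0 for d in d2]
            else:
                row = [(1.0 / d) ** (1.0 / (p - 1)) for d in d2]
            s = sum(row)
            weights.append([r / s for r in row])
        # Centroid update: mean of the points weighted by w_ij^p
        centroids = [sum(w[j] ** p * x for x, w in zip(points, weights)) /
                     sum(w[j] ** p for w in weights) for j in range(k)]
    return centroids, weights
```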
Fuzzy C-means Applied to Sample Data
Hard (Crisp) vs Soft (Probabilistic) Clustering
The idea is to model the set of data points as arising from a mixture of distributions
– Typically, the normal (Gaussian) distribution is used
– But other distributions have also been used profitably
Clusters are found by estimating the parameters of the statistical distributions
– Can use a k-means-like algorithm, called the EM algorithm, to estimate these parameters; in fact, k-means is a special case of this approach
– Provides a compact representation of clusters
– The probabilities with which a point belongs to each cluster provide functionality similar to fuzzy clustering.
Probabilistic Clustering: Example
Informal example: consider modeling the points that generate the following histogram.
Looks like a combination of two normal distributions
Suppose we can estimate the mean and standard deviation of each normal distribution.
– This completely describes the two clusters
– We can compute the probability with which each point belongs to each cluster
– We can assign each point to the cluster (distribution) under which it is most probable.
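Given the estimated means and standard deviations, the membership probabilities can be computed with Bayes' rule. A minimal sketch, assuming equal cluster priors unless given (names are my own):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def membership(x, params, priors=None):
    """Probability that point x belongs to each cluster (distribution),
    by Bayes' rule. params is a list of (mu, sigma) pairs."""
    if priors is None:
        priors = [1.0 / len(params)] * len(params)
    likelihoods = [pi * normal_pdf(x, mu, sd) for pi, (mu, sd) in zip(priors, params)]
    total = sum(likelihoods)
    return [l / total for l in likelihoods]
```

Assigning a point to its most probable distribution is then a simple argmax over the returned list.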
Probabilistic Clustering: EM Algorithm
Initialize the parameters
Repeat
– For each point, compute its probability under each distribution
– Using these probabilities, update the parameters of each distribution
Until there is no change

Very similar to K-means: consists of assignment and update steps, and can use random initialization
– Problem of local minima; for normal distributions, typically use K-means to initialize
If normal distributions are used, elliptical as well as spherical cluster shapes can be found.
Probabilistic Clustering: EM Algorithm
Choose K seeds: the means of K Gaussian distributions
Estimation: calculate the probability that each point belongs to each cluster, based on distance
Maximization: move the mean of each Gaussian to the centroid of the data set, weighted by the contribution of each point
Repeat until the means don't move
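The estimation/maximization loop above can be sketched for 1-D data, assuming (as the slide does) fixed, equal variances and priors so that only the means are re-estimated (an illustrative sketch, not the full EM algorithm):

```python
import math

def em_means(points, means, sigma=1.0, iters=100, tol=1e-6):
    """EM sketch: variances and priors are fixed and equal, so only the
    Gaussian means are re-estimated.
    E-step: responsibility of each cluster for each point.
    M-step: move each mean to the responsibility-weighted centroid."""
    for _ in range(iters):
        # E-step: responsibilities proportional to the Gaussian densities
        resp = []
        for x in points:
            dens = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted centroid for each cluster
        new_means = [sum(r[j] * x for x, r in zip(points, resp)) /
                     sum(r[j] for r in resp) for j in range(len(means))]
        converged = max(abs(a - b) for a, b in zip(means, new_means)) < tol
        means = new_means
        if converged:
            break
    return means
```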
Probabilistic Clustering Applied to Sample Data
Grid-based Clustering
A type of density-based clustering
Grid-based Clustering
Issues
– how to discretize the dimensions: equal-width vs. equal-frequency discretization
– the density of cells containing points close to the border of a cluster can be very low, so these cells are discarded; a possible solution is to reduce the size of the cells, but this may introduce additional problems
Subspace Clustering
Until now, we found clusters by considering all of the attributes
Some clusters may involve only a subset of the attributes, i.e., subspaces of the data
– Example: in a document collection, documents can be represented as vectors whose dimensions correspond to terms; when k-means is used to find document clusters, the resulting clusters can typically be characterized by 10 or so terms
Example
Three clear clusters.
– The circle points are not a cluster in three dimensions
– If the dimensions are discretized (equal width), these points fall into low-density cells
Histograms to determine density
Equi-width discretized space
Density Threshold = 6%
Contiguous intervals to be clustered
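The histogram-based procedure can be sketched as: discretize with equi-width bins, keep bins whose fraction of points exceeds the density threshold, and merge runs of contiguous dense bins into intervals (an illustrative sketch; names are my own):

```python
def dense_intervals(values, low, high, bins, threshold):
    """Equi-width discretization: mark bins whose fraction of points exceeds
    the density threshold, then group contiguous dense bins into intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - low) / width), bins - 1)   # clamp v == high into last bin
        counts[i] += 1
    dense = [c / len(values) > threshold for c in counts]
    # Merge runs of contiguous dense bins into (start, end) intervals
    intervals, start = [], None
    for i, d in enumerate(dense):
        if d and start is None:
            start = i
        elif not d and start is not None:
            intervals.append((low + start * width, low + i * width))
            start = None
    if start is not None:
        intervals.append((low + start * width, high))
    return intervals
```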
Example: remarks
The circles do not form a cluster in the three dimensions, but they may form a cluster in some subspaces
A cluster in the three dimensions is part of a cluster (maybe a larger one) in the subspaces
CLIQUE Algorithm – Overview

A grid-based clustering algorithm that methodically finds subspace clusters
– Partitions the data space into rectangular units of equal volume
– Measures the density of each unit by the fraction of points it contains
– A unit is dense if the fraction of the overall points it contains is above a user-specified threshold, τ
– A cluster is a group of contiguous (touching) dense units
CLIQUE Algorithm

It is impractical to check every subspace to see if it is dense, because the number of subspaces is exponential: 2^n subspaces, where n is the number of dimensions
Monotone property of density-based clusters:
– If a set of points forms a density-based cluster in k dimensions, then the same set of points is also part of a density-based cluster in all possible subsets of those dimensions
Very similar to the Apriori algorithm for frequent itemset mining
Can find overlapping clusters
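The Apriori-style use of the monotone property can be sketched as candidate generation: a k-dimensional subspace is considered only if all of its (k-1)-dimensional subsets were found dense. Here subspaces are represented as sorted tuples of dimension indices (an illustrative sketch):

```python
from itertools import combinations

def candidate_subspaces(dense_prev):
    """Apriori-style candidate generation used by CLIQUE: a k-dimensional
    subspace can only contain dense units if all of its (k-1)-dimensional
    subsets do too (monotone property), so generate candidates by joining
    dense (k-1)-dimensional subspaces and pruning."""
    prev = set(dense_prev)
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k:
                # Prune: every (k-1)-subset must itself be dense
                if all(tuple(sorted(s)) in prev for s in combinations(union, k - 1)):
                    candidates.add(union)
    return sorted(candidates)
```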
Limitations of CLIQUE

Time complexity is exponential in the number of dimensions
– especially if “too many” dense units are generated at the lower stages
May fail if clusters have widely differing densities, since the threshold is fixed
– determining an appropriate threshold and unit interval length can be challenging
Graph-Based Clustering: General Concepts
Graph-based clustering uses the proximity graph
– Start with the proximity matrix
– Consider each point as a node in a graph
– Each edge between two nodes has a weight, which is the proximity between the two points
– Initially the proximity graph is fully connected
– MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph

In the simplest case, clusters are the connected components of the graph.
Graph-Based Clustering: Chameleon
Based on several key ideas
– Sparsification of the proximity graph
– Partitioning the data into clusters that are relatively pure subclusters of the “true” clusters
– Merging based on preserving characteristics of clusters
Graph-Based Clustering: Sparsification
The amount of data that needs to be processed is drastically reduced, making the algorithm more scalable
– Sparsification can eliminate more than 99% of the entries in a proximity matrix
– The amount of time required to cluster the data is drastically reduced
– The size of the problems that can be handled is increased
Graph-Based Clustering: Sparsification …
Clustering may work better
– Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points.
– The nearest neighbors of a point tend to belong to the same class as the point itself.
– This reduces the impact of noise and outliers and sharpens the distinction between clusters.
Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning algorithms)
– Chameleon and Hypergraph-based Clustering
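A simple k-nearest-neighbor sparsification of a similarity matrix can be sketched as follows (plain-Python illustration; the "keep an edge if either endpoint selected it" symmetrization rule is one of several possible choices):

```python
def sparsify_knn(sim, k):
    """Keep, for each point, only the edges to its k most similar neighbors;
    break all other connections (set them to 0). sim is a symmetric
    similarity matrix where larger means more similar."""
    n = len(sim)
    kept = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: sim[i][j], reverse=True)[:k]
        for j in neighbors:
            kept[i][j] = sim[i][j]
    # Symmetrize: keep an edge if either endpoint selected it
    for i in range(n):
        for j in range(n):
            if kept[i][j] or kept[j][i]:
                kept[i][j] = kept[j][i] = sim[i][j]
    return kept
```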
Sparsification in the Clustering Process
Limitations of Current Merging Schemes
Existing merging schemes in hierarchical clustering algorithms are static in nature
– MIN or CURE: merge two clusters based on their closeness (or minimum distance)
– GROUP AVERAGE: merge two clusters based on their average connectivity
Limitations of Current Merging Schemes
Closeness schemes will merge (a) and (b); average connectivity schemes will merge (c) and (d)
[Figure: four example point sets, labeled (a)–(d)]
Chameleon: Clustering Using Dynamic Modeling
Adapt to the characteristics of the data set to find the natural clusters
Use a dynamic model to measure the similarity between clusters
– The main properties are the relative closeness and relative interconnectivity of the clusters
– Two clusters are combined if the resulting cluster shares certain properties with the constituent clusters
– The merging scheme preserves self-similarity
Experimental Results: CHAMELEON
CURE: a Scalable Algorithm
Agglomerative hierarchical clustering algorithms vary in how the proximity of two clusters is computed
– MIN (single link): susceptible to noise and outliers
– MAX (complete link) / GROUP AVERAGE / centroid: may not work well with non-globular clusters

The CURE (Clustering Using REpresentatives) algorithm tries to handle both problems
– It is a graph-based algorithm
– Starts with a proximity matrix / proximity graph
CURE Algorithm

Represents a cluster using multiple representative points
– Goal: scalability, by choosing points that capture the geometry and shape of clusters
– Representative points are found by selecting a constant number of points from a cluster: the first representative point is chosen to be the point farthest from the center of the cluster; the remaining representative points are chosen so that they are farthest from all previously chosen points
CURE Algorithm

“Shrink” the representative points toward the center of the cluster by a factor α
– Shrinking the representative points toward the center helps avoid problems with noise and outliers
Cluster similarity is the similarity of the closest pair of representative points (MIN) from different clusters
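The selection and shrinking of representative points can be sketched in 1-D (an illustrative sketch; `num_rep` and `alpha` are assumed parameter names, not from the paper):

```python
def cure_representatives(cluster, num_rep=4, alpha=0.2):
    """Pick well-scattered representative points for a cluster with
    farthest-point selection, then shrink each toward the centroid by
    factor alpha. 1-D points for brevity."""
    centroid = sum(cluster) / len(cluster)
    # First representative: the point farthest from the centroid
    reps = [max(cluster, key=lambda x: abs(x - centroid))]
    while len(reps) < min(num_rep, len(cluster)):
        # Next: the point farthest from all previously chosen representatives
        reps.append(max(cluster, key=lambda x: min(abs(x - r) for r in reps)))
    # Shrink each representative toward the centroid
    return [r + alpha * (centroid - r) for r in reps]
```

Cluster proximity would then be the minimum distance between representative points of different clusters.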
CURE Algorithm
Uses an agglomerative hierarchical scheme to perform clustering
– α = 0: similar to centroid-based
– α = 1: somewhat similar to single link (MIN)

CURE is better able to handle clusters of arbitrary shapes and sizes
Experimental Results: CURE (10 clusters)
Experimental Results: CURE (9 clusters)
Experimental Results: CURE
Picture from CURE, Guha, Rastogi, Shim.
Experimental Results: CURE
Picture from CURE, Guha, Rastogi, Shim.
[Figure panels: (centroid) and (single link)]
CURE Cannot Handle Differing Densities
[Figure: original points vs. the CURE result]
BIRCH: a Scalable Algorithm
Balanced Iterative Reducing and Clustering using Hierarchies
Scales linearly: finds a good clustering with a single scan, and improves the quality with a few additional scans
Weakness: handles only numeric data (Euclidean space), and is sensitive to the order of the data records
BIRCH
Clustering Feature (CF):
– Number of points, linear sum of the points, and sum of squares of the points:
$$CF = (N, \vec{LS}, \vec{SS})$$
– The CF is incrementally updated, and is used for computing centroids and variance (used for measuring the diameter of the cluster)
– Also used for computing distances between clusters
BIRCH
CF is a compact storage for data on points in a cluster
Has enough information to calculate the intra-cluster distances
Additivity theorem allows us to merge sub-clusters
– C3 = C1 ∪ C2
– CF_{C3} = ⟨N_{C1} + N_{C2}, \vec{LS}_{C1} + \vec{LS}_{C2}, \vec{SS}_{C1} + \vec{SS}_{C2}⟩
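The clustering feature and the additivity theorem can be sketched for 1-D points (an illustrative sketch, not BIRCH's actual code):

```python
class CF:
    """Clustering Feature: (N, linear sum, sum of squares) of 1-D points.
    Enough information to compute the centroid and variance incrementally."""
    def __init__(self, points=()):
        self.n = len(points)
        self.ls = sum(points)
        self.ss = sum(x * x for x in points)

    def merge(self, other):
        """Additivity theorem: the CF of the union is the componentwise sum."""
        out = CF()
        out.n = self.n + other.n
        out.ls = self.ls + other.ls
        out.ss = self.ss + other.ss
        return out

    def centroid(self):
        return self.ls / self.n

    def variance(self):
        # E[x^2] - (E[x])^2, both derivable from the stored sums
        return self.ss / self.n - self.centroid() ** 2
```

Because a CF is just three numbers, merging two sub-clusters never requires revisiting their points.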
BIRCH
Basic steps of BIRCH
– Load the data into memory by creating a CF tree that “summarizes” the data (see the following slide)
– Perform global clustering. Produces a better clustering than the initial step. An agglomerative, hierarchical technique was selected.
– Redistribute the data points using the centroids of clusters discovered in the global clustering phase, and thus, discover a new (and hopefully better) set of clusters.
BIRCH
BIRCH maintains a balanced CF tree
– Branching factor B: the maximum number of entries in a non-leaf node
– Maximum leaf size L: the maximum number of entries in a leaf node
– Threshold T: the diameter of each leaf entry must be < T
[CF tree figure: a root node with entries CF1…CF6 pointing to children; non-leaf nodes with entries CF1…CF5 pointing to children; leaf nodes holding CF entries and chained by prev/next pointers; B = 7, L = 6]
Characteristics of Data

– High dimensionality
– Size of data set
– Sparsity of attribute values
– Noise and outliers
– Types of attributes and type of data sets
– Differences in attribute scale
– Properties of the data space: can you define a meaningful centroid?
Characteristics of Clusters

– Data distribution
– Shape
– Differing sizes
– Differing densities
– Poor separation
– Relationship of clusters
– Subspace clusters
Characteristics of Clustering Algorithms

– Order dependence
– Non-determinism
– Parameter selection
– Scalability
– Underlying model
– Optimization-based approach