![Page 1: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/1.jpg)
Clustering
YZM 3226 – Makine Öğrenmesi
![Page 2: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/2.jpg)
Outline
◘ What is Cluster Analysis?
◘ Similarity Measure
◘ Clustering Applications
◘ Clustering Methods
◘ Clustering Algorithm Selection
◘ Cluster Validation
![Page 3: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/3.jpg)
What is Cluster Analysis?
◘ Clustering is the process of grouping large data sets according to
their similarity.
◘ A cluster is a collection of data objects that are similar to one
another within the same cluster.
Size Based Geographic Distance Based
Each point represents a house
![Page 4: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/4.jpg)
Clustering Definition
◘ Grouping similar data objects into clusters.
◘ Clustering
– Given a set of data points
– Data points have a set of attributes find clusters
– A similarity measure
![Page 5: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/5.jpg)
Clustering Applications
◘ Spatial Data Analysis
– Detect spatial clusters
– e.g. Create thematic maps in GIS
◘ Image Processing
◘ WWW
– Document clustering
• To find groups of documents that are similar to each other based on the
important terms appearing in them.
– Cluster Weblog data
• To discover groups of similar access patterns
◘ Marketting
◘ Medical Applications
◘ Science Applications
![Page 6: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/6.jpg)
Example Clustering Applications
◘ Marketing: – Customer Segmentation: Find clusters of similar customers
– Market Segmentation: Subdivide a market into subsets
◘ Medical: Clustering of disease
◘ Insurance: Identifying groups of insurance policy holders
◘ City-planning: Identifying groups of houses according to their house
type, value, and geographical location
◘ Earth-quake studies: Observed earth quake epicenters should be
clustered along continent faults
Income
![Page 7: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/7.jpg)
Data Types
Categorical
Nominal Ordinal Continous Discrete
Numeric
Continous Discrete Binary Categorical
(Nominal)
Categorical
(Ordinal)
12 0-18 Smoker Mountain bicycle Very Unhappy
45 18-40 Non-Smoker Utility bicycle Unhappy
34 40-100 Racing bicycle Neutral
9 Happy
48 Very Happy
Similarity Measures
![Page 8: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/8.jpg)
Similarity Measures
◘ Numeric Data
– If attributes are continuous:
• Manhattan Distance (p=1)
• Euclidean Distance (p=2)
• Minkowski Distance
◘ Categorical Data
– Jaccard's distance
– ...
◘ Others
– Problem-specific measures
cbacb jid
),(
![Page 9: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/9.jpg)
p1 p2 p3 p4
p1 0 2.828 3.162 5.099
p2 2.828 0 1.414 3.162
p3 3.162 1.414 0 2
p4 5.099 3.162 2 0
1- Similarity Measures for Numeric Data
0
1
2
3
0 1 2 3 4 5 6
p1
p2
p3 p4
point x y
p1 0 2
p2 2 0
p3 3 1
p4 5 1
Euclidean Distance Matrix
p1 p2 p3 p4
p1 0 4 4 6
p2 4 0 2 4
p3 4 2 0 2
p4 6 4 2 0
Manhattan Distance Matrix
![Page 10: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/10.jpg)
p1 p2 p3 p4
p1 0 2.828 3.162 5.099
p2 2.828 0 1.414 3.162
p3 3.162 1.414 0 2
p4 5.099 3.162 2 0
Similarity Measures for Numeric Data
0
1
2
3
0 1 2 3 4 5 6
p1
p2
p3 p4
point x y
p1 0 2
p2 2 0
p3 3 1
p4 5 1
Euclidean Distance Matrix
p1 p2 p3 p4
p1 0 4 4 6
p2 4 0 2 4
p3 4 2 0 2
p4 6 4 2 0
Manhattan Distance Matrix
![Page 11: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/11.jpg)
Example for Clustering Numeric Data
◘ Document Clustering
– Each document becomes a `term' vector,
• each term is a component (attribute) of the vector,
• the value of each component is the number of times the corresponding term occurs in
the document.
Document 1
se
aso
n
time
ou
t
lost
wi
n
ga
me
sco
re
ba
ll
play
co
ach
tea
m
Document 2
Document 3
3 0 5 0 2 6 0 2 0 2
0
0
7 0 2 1 0 0 3 0 0
1 0 0 1 2 2 0 3 0
Doc Doc Doc Doc Doc Doc Doc Doc
![Page 12: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/12.jpg)
2- Similarity Measures for Categorical Data
◘ Categorical Data:
◘ e.g. Binary Variables - 0/1 - presence/absence
– Jaccard's coefficient (measure similarity)
– Jaccard's distance (measure dissimilarity)
cbaa jisim
Jaccard ),(
pdbcasum
dcdc
baba
sum
0
1
01
Object i
Object j
cbacb jid
),(
![Page 13: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/13.jpg)
Example for Clustering Categorical Data
◘ Find the Jaccard's distance between Apple and Banana.
Feature of Fruit Sphere shape Sweet Sour Crunchy
Object i =Apple Yes Yes Yes Yes
Object j =Banana No Yes No No
pdbcasum
dcdc
baba
sum
0
1
01
Object i
Object j
cbacb jid
),((a = 1, b = 3, c = 0, d= 0)
(3+0) / (1 + 3 + 0) = 3/4 = 0.75
![Page 14: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/14.jpg)
Example for Clustering Categorical Data
◘ Who are the most likely to have a similar disease?
Let the values Y and P be set to 1, and the value N be set to 0
Result: Jim and Mary are unlikely to have a similar disease.
Jack and Mary are the most likely to have a similar disease.
Name Gender Fever Cough Test-1 Test-2 Test-3 Test-4
Jack M Y N P N N N
Mary F Y N P N P N
Jim M Y P N N N N
75.0211
21),(
67.0111
11),(
33.0102
10),(
maryjimd
jimjackd
maryjackd
pdbcasum
dcdc
baba
sum
0
1
01
Object i
Object j
cbacb jid
),(
![Page 15: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/15.jpg)
Clustering Methods
◘ Partitioning Methods
– K-Means, K-Medoids, PAM, CLARA, CLARANS, ...
◘ Hierarchical Methods
– AGNES, DIANA, BIRCH, CURE, CHAMELEON, ...
◘ Density-Based Methods
– DBSCAN, OPTICS, DENCLUE, ...
◘ Grid-Based Methods
– STING, WaveCluster, CLIQUE ...
◘ Model-Based Methods
– COBWEB, CLASSIT, SOM (Self-Organizing Feature Maps) ...
![Page 16: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/16.jpg)
Clustering Algorithm Selection
1. Scalability
– Efficiently execution on large databases
– Scanning database only several times
2. Running on different data types
– Continuous, discrete, binary, nominal, ordinal, …
3. Updateability
– Updating clusters after insertion and deletion of some data values
4. Efficient memory usage
5. Input parameters
– Results in different outputs on different inputs
– Un-understandable and too many input parameters
6. Without any foreknowledge
7. Different cluster shapes
8. Workable on dirty data
– Workable on missing, wrong and noise data
9. Insensitivity on data ordering
10. Multi dimension
– Workability on multi-dimensional datasets
11. Usability for different areas
![Page 17: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/17.jpg)
Cluster Characteristics
◘ Each cluster is represented by the following characteristics
![Page 18: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/18.jpg)
Categories of Clustering Algorithms
Partitioning
Methods
Hierarchical
Methods
K-Means
K-Medoid
PAM
CLARA
CLARANS
AGNES
DIANA
BIRCH
CURE
CHAMELEON
OPTICS
DENCLUE
STING
WaveCluster
CLIQUE
Model
Based
Methods
Density
Based
Methods
Grid
Based
Methods
1
2
3
4
5
6
1
23 4
5
COBWEB
CLASSIT
SOM
DBSCAN
![Page 19: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/19.jpg)
Partitioning Methods
◘ Construct a partition of a database D of n objects into a set of k
clusters
◘ Given a k, find a partition of k clusters that optimizes the chosen
partitioning criterion e.g. minimize SSE (Sum of Squared Distance)
![Page 20: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/20.jpg)
K-Means
◘ K-Means is an algorithm to cluster n objects based on attributes into k
partitions, k < n.
![Page 21: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/21.jpg)
K-Means Example
![Page 22: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/22.jpg)
K-Means Example
![Page 23: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/23.jpg)
◘ http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
K-Means Demo
![Page 24: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/24.jpg)
K-Means Adv. DisAdv.
◘ Strength:
– Relatively efficient: O(tkn) n is # objects, k is # clusters, and t is # iterations.
– Easy to understand
◘ Weakness
– Applicable only when mean is defined, then what about categorical data?
– Need to specify k, the number of clusters, in advance
– Unable to handle noisy data and outliers
– Not suitable to discover clusters with non-convex shapes
K-Means result Other clustering
algorithm result
![Page 25: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/25.jpg)
K-Medoids Method
◘ K-Medoids: Instead of taking the mean value of the object in a cluster as a
reference point, medoids can be used, which is the most centrally located
object in a cluster.
![Page 26: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/26.jpg)
K-Medoids Method
K-Means
K-Medoids
![Page 27: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/27.jpg)
Categories of Clustering Algorithms
Partitioning
Methods
Hierarchical
Methods
K-Means
K-Medoid
PAM
CLARA
CLARANS
OPTICS
DENCLUE
STING
WaveCluster
CLIQUE
Model
Based
Methods
Density
Based
Methods
Grid
Based
Methods
1
2
3
4
5
6
1
23 4
5
COBWEB
CLASSIT
SOM
DBSCAN
AGNES
DIANA
BIRCH
CURE
CHAMELEON
![Page 28: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/28.jpg)
Hierarchical Clustering
◘ Create a hierarchical decomposition of the set of data using some criterion
◘ Strength: This method does not require the number of clusters k as an input.
◘ Weakness: But it needs a termination condition.
Step 0 Step 1 Step 2 Step 3 Step 4
b
d
c
e
a a b
d e
c d e
a b c d e
Step 4 Step 3 Step 2 Step 1 Step 0
agglomerative
(AGNES)
divisive
(DIANA)
![Page 29: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/29.jpg)
Hierarchical Clustering
![Page 30: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/30.jpg)
How the Clusters are Merged?
◘ Given: Matrix of similarity between every point pair
◘ Start with each point in a separate cluster and merge clusters based on some
criteria:
– Single link: The minimum of the distances between the members of the clusters.
– Complete link: The maximum of the distances between the members of the clusters.
– Average link: The average of the distances between the members of the clusters.
Min
distance
Average
distance
Max
distance
![Page 31: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/31.jpg)
How the Clusters are Merged?
Single Link: Kümelerin en yakın
elemanları arasındaki uzaklık
Complete Link: Kümelerin en uzak
elemanları arasındaki uzaklık
Average Link: Bir kümedeki elemanlar ve diğer
kümedeki başka elemanlar arasındaki ortalama
uzaklık
Centroid Link: İki kümenin
centroidleri arasındaki uzaklık
![Page 32: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/32.jpg)
How the Clusters are Merged?
Single Link
1
2
3
4
5
6
1
2
3
4
5
3 6 2 5 4 10
0.05
0.1
0.15
0.2 1
2
3
4
5
6
1
2 5
3
4
3 6 4 1 2 50
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
Complete Link
1
2
3
4
5
6 1
2
5
3
4
Average Link
3 6 4 1 2 50
0.05
0.1
0.15
0.2
0.25
![Page 33: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/33.jpg)
Hierarchical Clustering Demo
◘ http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html
![Page 34: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/34.jpg)
Categories of Clustering Algorithms
Partitioning
Methods
Hierarchical
Methods
K-Means
K-Medoid
PAM
CLARA
CLARANS
OPTICS
DENCLUE
STING
WaveCluster
CLIQUE
Model
Based
Methods
Density
Based
Methods
Grid
Based
Methods
1
2
3
4
5
6
1
23 4
5
COBWEB
CLASSIT
SOM
DBSCAN
AGNES
DIANA
BIRCH
CURE
CHAMELEON
![Page 35: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/35.jpg)
Density-Based Clustering
◘ Dense objects should be grouped together into one cluster.
◘ They use a fixed threshold value to determine dense regions.
◘ Density-based clustering algorithms
– DBSCAN (1996)
– OPTICS (1999)
– DENCLUE (1998)
Check the number of points within a specified radius of the point
Core
Border
Outlier
Eps = 1cm
MinPts = 5
![Page 36: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/36.jpg)
Categories of Clustering Algorithms
Partitioning
Methods
Hierarchical
Methods
K-Means
K-Medoid
PAM
CLARA
CLARANS
OPTICS
DENCLUE
STING
WaveCluster
CLIQUE
Model
Based
Methods
Density
Based
Methods
Grid
Based
Methods
1
2
3
4
5
6
1
23 4
5
COBWEB
CLASSIT
SOM
DBSCAN
AGNES
DIANA
BIRCH
CURE
CHAMELEON
![Page 37: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/37.jpg)
Grid Based Clustering Methods
◘ Simplest approach is to divide region into a number of rectangular
cells of equal volume and define density as # of points the cell
contains
![Page 38: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/38.jpg)
Categories of Clustering Algorithms
Partitioning
Methods
Hierarchical
Methods
K-Means
K-Medoid
PAM
CLARA
CLARANS
OPTICS
DENCLUE
STING
WaveCluster
CLIQUE
Model
Based
Methods
Density
Based
Methods
Grid
Based
Methods
1
2
3
4
5
6
1
23 4
5
COBWEB
CLASSIT
SOM
DBSCAN
AGNES
DIANA
BIRCH
CURE
CHAMELEON
![Page 39: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/39.jpg)
Model Based Methods
Attempt to optimize the fit between the given data and some mathematical model
It uses statistical functions
![Page 40: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/40.jpg)
Clustering Algorithms
General Overview
![Page 41: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/41.jpg)
Factors Affecting Clustering Results
◘ Outliers
◘ Inappropriate value for parameters
◘ Drawbacks of the clustering algorithm themselves
INPUT DATASET
GOOD CLUSTERING BAD CLUSTERING
Parameter (k=6) Parameter (k=20)
![Page 42: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/42.jpg)
Cluster Validation - SSE
◘ Clustering • Data points in one cluster are more similar to one another.
• Data points in separate clusters are less similar to one another.
◘ A good clustering method will produce high quality clusters with • high intra-class similarity
• low inter-class similarity
Intracluster distances
are minimized
Intercluster distances
are maximized
Clustering in 3-D space.
Sum of Squared Error (SSE)
![Page 43: Click to add title - WordPress.com · · 11/7/2016 What is Cluster Analysis? Clustering is the process of grouping large data sets according to their similarity. A cluster is a](https://reader033.vdocuments.site/reader033/viewer/2022060213/5f054d947e708231d4124b53/html5/thumbnails/43.jpg)
Cluster Validation - Correlation
Between -1.0 and +1.0