clustering_kuk.pptx
TRANSCRIPT
![Page 1: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/1.jpg)
Introduction to Clustering
Alka Arora, Sr. Scientist
Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi
[email protected], [email protected]
![Page 2: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/2.jpg)
What is Clustering?
• Organizing data into classes such that there is
  • high intra-class similarity
  • low inter-class similarity
• Finding the class labels and the number of classes directly from the data (in contrast to classification).
• More informally, finding natural groupings among objects.
Also called unsupervised learning; sometimes called classification by statisticians, sorting by psychologists, and segmentation by people in marketing.
![Page 3: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/3.jpg)
What is a natural grouping among these objects?
Clustering is subjective
[Figure: the same characters grouped alternatively as School Employees, Simpson's Family, Males, Females]
![Page 4: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/4.jpg)
How do we measure similarity?
[Figure: example objects to compare: the numbers 0.23, 3 and 342.7, and the names Peter and Piotr]
![Page 5: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/5.jpg)
Edit Distance Example
How similar are the names "Peter" and "Piotr"?
It is possible to transform any string Q into string C using only substitution, insertion and deletion. Assume that each of these operations has a cost associated with it; the similarity between two strings can then be defined as the cost of the cheapest transformation from Q to C.
Assume the following cost function: substitution 1 unit, insertion 1 unit, deletion 1 unit.
Peter → Piter (substitution, i for e) → Pioter (insertion, o) → Piotr (deletion, e)
D(Peter, Piotr) = 3
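The three unit-cost operations above define the classic Levenshtein distance; a minimal dynamic-programming sketch, assuming unit costs as on the slide:

```python
def edit_distance(q, c):
    """Minimum number of substitutions, insertions and deletions
    (each costing 1 unit) needed to transform string q into string c."""
    m, n = len(q), len(c)
    # d[i][j] = cost of transforming q[:i] into c[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if q[i - 1] == c[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[m][n]

print(edit_distance("Peter", "Piotr"))  # 3
```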
![Page 6: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/6.jpg)
Similarity / Dissimilarity
• The similarity between objects i and j is denoted by s_ij. We can measure this quantity in several ways depending on the scale of measurement (or data type) that we have.
• Dissimilarity (d_ij) is the opposite of similarity. There are many types of distance and similarity measures, and each has its own characteristics.
• The relationship between dissimilarity and similarity is given by d_ij = 1 - s_ij (for similarities scaled to [0, 1]).
![Page 7: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/7.jpg)
Distance for Binary Variables
• Binary variables are those which take binary values such as Yes and No, Agree and Disagree, True and False, Success and Failure, 0 and 1, Absent and Present, Positive and Negative, etc.
• Since a binary variable takes only two possible values, which can be represented as positive and negative, the similarity or dissimilarity (distance) of two objects can be measured in terms of the number of occurrences (frequency) of positive and negative values in each object.
![Page 8: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/8.jpg)
Example
• Let p = number of variables positive for both objects
• q = number of variables positive for the i-th object and negative for the j-th object
• r = number of variables negative for the i-th object and positive for the j-th object
• s = number of variables negative for both objects
• t = p + q + r + s = total number of variables
• Apple and Banana have p = 1, q = 3, r = 0, s = 0. Thus t = p + q + r + s = 4.

Fruit  | Sphere shape | Sweet | Sour | Crunchy
Apple  | Yes          | Yes   | Yes  | Yes
Banana | No           | Yes   | No   | No
![Page 9: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/9.jpg)
Distance Measures for Binary Variables
• Simple matching distance: (q + r) / (p + q + r + s) = 3/4
• Jaccard's distance: (q + r) / (p + q + r) = 3/4
• Hamming distance: the number of positions at which the values differ, q + r = 3
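A small sketch of these three measures on the Apple/Banana table, encoding Yes = 1 and No = 0 (the function names are illustrative):

```python
def binary_counts(a, b):
    """Return the p, q, r, s counts for two equal-length binary vectors."""
    p = sum(x == 1 and y == 1 for x, y in zip(a, b))  # positive in both
    q = sum(x == 1 and y == 0 for x, y in zip(a, b))  # positive only in a
    r = sum(x == 0 and y == 1 for x, y in zip(a, b))  # positive only in b
    s = sum(x == 0 and y == 0 for x, y in zip(a, b))  # negative in both
    return p, q, r, s

apple  = (1, 1, 1, 1)   # sphere shape, sweet, sour, crunchy = Yes
banana = (0, 1, 0, 0)   # only sweet = Yes

p, q, r, s = binary_counts(apple, banana)
t = p + q + r + s
simple_matching = (q + r) / t            # 3/4
jaccard         = (q + r) / (p + q + r)  # 3/4
hamming         = q + r                  # 3
```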
![Page 10: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/10.jpg)
Distance for Quantitative Variables
Euclidean distance is the square root of the sum of squared differences between the coordinates of a pair of objects.
City block (Manhattan) distance is the sum of the absolute differences between the coordinates of a pair of objects.
A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
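For the vectors A and B above, the two measures can be sketched as:

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
print(round(euclidean(A, B), 3))  # sqrt(95) ≈ 9.747
print(manhattan(A, B))            # 17
```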
![Page 11: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/11.jpg)
Distance for Quantitative Variables
Chebyshev distance is also called maximum value distance. It takes the largest absolute difference between the coordinates of a pair of objects. This distance can be used for both ordinal and quantitative variables.
Minkowski distance is the generalized metric distance with order λ. When λ = 1 it becomes city block distance, and when λ = 2 it becomes Euclidean distance. Chebyshev distance is a special case of Minkowski distance with λ → ∞ (taking the limit).
A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
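A sketch of the Minkowski family on the same A and B, showing the λ = 1 and λ = 2 special cases and the maximum-value limit:

```python
def minkowski(a, b, lam):
    """Generalized metric: lam = 1 gives city block, lam = 2 gives Euclidean."""
    return sum(abs(x - y) ** lam for x, y in zip(a, b)) ** (1 / lam)

def chebyshev(a, b):
    """Limit of Minkowski as lam -> infinity: the largest coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
print(chebyshev(A, B))               # 7
print(minkowski(A, B, 1))            # 17.0 (city block)
print(round(minkowski(A, B, 2), 3))  # 9.747 (Euclidean, sqrt(95))
```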
![Page 12: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/12.jpg)
Distance for Quantitative Variables
• Angular separation: the cosine of the angle between two vectors. It measures similarity rather than distance or dissimilarity, so a higher value of angular separation indicates that the two objects are more similar. Like any cosine, its value lies in [-1, 1]. Centering the vectors on their means first turns it into the correlation coefficient (next slide).
A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
![Page 13: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/13.jpg)
Distance for Quantitative Variables
• The correlation coefficient is the angular separation standardized by centering the coordinates on their mean values. The value is between -1 and +1. It measures similarity rather than distance or dissimilarity.
A = (0, 3, 4, 5)
B = (7, 6, 3, -1)
![Page 14: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/14.jpg)
Two Types of Clustering
• Partitional algorithms: construct various partitions and then evaluate them by some criterion.
• Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion.
[Figure: a hierarchical dendrogram and a partitional grouping, side by side]
![Page 15: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/15.jpg)
A Useful Tool for Summarizing Similarity Measurements
The dendrogram is a tool for summarizing the similarity measurements underlying a hierarchical clustering.
[Figure: a dendrogram with its parts labelled: root, internal node, internal branch, terminal branch, leaf]
The similarity between two objects in a dendrogram is represented as the height of the lowest internal node they share.
![Page 16: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/16.jpg)
[Figure: part of Yahoo's directory: Business & Economy → B2B, Finance, Shopping, Jobs → Aerospace, Agriculture…, Banking, Bonds…, Animals, Apparel, Career, Workspace]
Note that hierarchies are commonly used to organize information, for example in a web portal.
Yahoo's hierarchy is manually created; data mining focuses on the automatic creation of hierarchies.
![Page 17: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/17.jpg)
We can look at the dendrogram to determine the "correct" number of clusters. In this case, the two well-separated subtrees are highly suggestive of two clusters. (Things are rarely this clear cut, unfortunately.)
![Page 18: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/18.jpg)
Outlier
One potential use of a dendrogram is to detect outliers: a single isolated branch is suggestive of a data point that is very different from all the others.
![Page 19: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/19.jpg)
(How-to) Hierarchical Clustering
The number of dendrograms with n leaves = (2n - 3)! / [2^(n-2) (n - 2)!]
Number of Leaves | Number of Possible Dendrograms
2                | 1
3                | 3
4                | 15
5                | 105
…                | …
10               | 34,459,425
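The formula can be checked directly; a short sketch reproducing the table:

```python
from math import factorial

def num_dendrograms(n):
    """Number of rooted binary dendrograms on n labelled leaves:
    (2n - 3)! / (2^(n-2) * (n - 2)!)"""
    if n < 2:
        return 1
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))
# 2 1 / 3 3 / 4 15 / 5 105 / 10 34459425
```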
Since we cannot test all possible trees, we will have to use a heuristic search over the space of possible trees. We could do this in two ways:
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Top-Down (divisive): Starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides.
![Page 20: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/20.jpg)
Process flow of Agglomerative Hierarchical Clustering
• Convert object features to a distance matrix.
• Set each object as a cluster (thus if we have 6 objects, we will have 6 clusters in the beginning).
• Iterate until the number of clusters is 1:
  – Merge the two closest clusters.
  – Update the distance matrix.
![Page 21: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/21.jpg)
Linkage criteria determine the distance between an object and a cluster, or between two clusters.
• Single linkage (nearest neighbour): the distance between two clusters is the distance between the two closest objects (nearest neighbours) in the different clusters.
• Complete linkage (furthest neighbour): the distance between two clusters is the greatest distance between any two objects in the different clusters (i.e., between the "furthest neighbours").
• Group average linkage: the distance between two clusters is the average distance between all pairs of objects in the two different clusters.
• Ward's linkage: clusters are merged so as to minimize the variance of the merged cluster.
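The first three criteria can be sketched as plain functions over a pairwise distance (Ward's linkage is omitted here, since it needs cluster variances rather than pairwise distances alone); the example points D, F and A are taken from the slides that follow:

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def single_linkage(c1, c2, dist=euclidean):
    """Nearest neighbour: the smallest pairwise distance between the clusters."""
    return min(dist(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2, dist=euclidean):
    """Furthest neighbour: the largest pairwise distance."""
    return max(dist(a, b) for a in c1 for b in c2)

def average_linkage(c1, c2, dist=euclidean):
    """Mean of all pairwise distances between the two clusters."""
    return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

# e.g. the merged cluster (D, F) against object A
DF, A = [(3, 4), (3, 3.5)], [(1, 1)]
print(round(single_linkage(DF, A), 2))    # min(d(D,A), d(F,A)) = 3.2
print(round(complete_linkage(DF, A), 2))  # d(D,A) = 3.61
```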
![Page 22: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/22.jpg)
Example
• Consider 6 objects (named A, B, C, D, E and F) with two measured features (X1 and X2).
• For example, A = (1, 1), B = (1.5, 1.5), D = (3, 4) and F = (3, 3.5); distances between objects are Euclidean distances between their feature vectors.
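Only the coordinates of A, B, D and F are given on this slide; their pairwise Euclidean distances can be sketched as:

```python
import math

points = {"A": (1, 1), "B": (1.5, 1.5), "D": (3, 4), "F": (3, 3.5)}

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

print(round(euclidean(points["A"], points["B"]), 2))  # 0.71
print(round(euclidean(points["D"], points["F"]), 2))  # 0.5
```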
![Page 23: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/23.jpg)
Example cont..
• Distances between ungrouped clusters do not change from the original distance matrix.
• The problem now is how to calculate the distance between the newly merged cluster (D, F) and the other clusters:
  – Using single linkage, we take the minimum distance between the original objects of the two clusters.
  – Using the input distance matrix, the distance between cluster (D, F) and cluster A is computed as min(d(D, A), d(F, A)).
![Page 24: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/24.jpg)
Example cont..
• Distance between cluster C and cluster (D, F)
• Distance between cluster (D, F) and cluster (A, B)
![Page 25: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/25.jpg)
Example cont..
• Distance between cluster ((D, F), E) and cluster (A, B)
![Page 26: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/26.jpg)
Summary of Hierarchical Clustering Methods
• No need to specify the number of clusters in advance.
• The hierarchical nature maps nicely onto human intuition for some domains.
• They do not scale well: time complexity of at least O(n²), where n is the total number of objects.
• Interpretation of results is (very) subjective.
![Page 27: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/27.jpg)
Partitional Clustering
• Nonhierarchical, each instance is placed in exactly one of K non-overlapping clusters.
• Since only one set of clusters is output, the user normally has to input the desired number of clusters K.
![Page 28: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/28.jpg)
Algorithm: k-means
It randomly selects k objects as the initial cluster means (centers) and works toward minimizing the squared-error criterion function.
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise goto 3.
E = Σ_{i=1}^{k} Σ_{x ∈ C_i} ‖x - m_i‖², where m_i is the mean of cluster C_i.
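Steps 1 to 5 above can be sketched as a self-contained function (initial centers drawn at random from the data; a minimal sketch, not a production implementation):

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Steps 1-5: random initial centers, assign each point to the nearest
    center, re-estimate centers, stop when no membership changes."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                      # step 2
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]                   # step 3
        if new_labels == labels:                         # step 5: converged
            break
        labels = new_labels
        for j in range(k):                               # step 4
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels

# on well-separated data the natural grouping emerges quickly
centers, labels = kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], k=2)
print(sorted(centers))  # [(1.5, 1.0), (4.5, 3.5)]
```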
![Page 29: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/29.jpg)
K-means Clustering: Step 1 (Algorithm: k-means, Distance Metric: Euclidean Distance)
[Figure: data points on a 0-5 x 0-5 plot with cluster centers k1, k2, k3]
![Page 30: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/30.jpg)
K-means Clustering: Step 2 (Algorithm: k-means, Distance Metric: Euclidean Distance)
[Figure: data points on a 0-5 x 0-5 plot with cluster centers k1, k2, k3]
![Page 31: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/31.jpg)
K-means Clustering: Step 3 (Algorithm: k-means, Distance Metric: Euclidean Distance)
[Figure: data points on a 0-5 x 0-5 plot with cluster centers k1, k2, k3]
![Page 32: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/32.jpg)
K-means Clustering: Step 4 (Algorithm: k-means, Distance Metric: Euclidean Distance)
[Figure: data points on a 0-5 x 0-5 plot with cluster centers k1, k2, k3]
![Page 33: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/33.jpg)
K-means Clustering: Step 5 (Algorithm: k-means, Distance Metric: Euclidean Distance)
[Figure: data points plotted as "expression in condition 1" vs "expression in condition 2", with cluster centers k1, k2, k3]
![Page 34: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/34.jpg)
Example
Object X Y
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine D 5 4
K = 2, based on the two features X and Y.
1. Initial value of centroids: suppose we use medicine A and medicine B as the first centroids. Let c1 and c2 denote the centroids; then c1 = (1, 1) and c2 = (2, 1).
2. Objects-centroids distance: use Euclidean distance between each cluster centroid and each object.
• The distance from medicine C = (4, 3) to the first centroid c1 = (1, 1) is √((4-1)² + (3-1)²) = √13 ≈ 3.61.
• The distance from medicine C = (4, 3) to the second centroid c2 = (2, 1) is √((4-2)² + (3-1)²) = √8 ≈ 2.83.
The distance matrix at iteration 0 is built the same way:
• Each column of the distance matrix corresponds to one object.
• The first row of the distance matrix holds the distance of each object to the first centroid.
• The second row holds the distance of each object to the second centroid.
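The iteration-0 distance matrix can be reproduced with a short sketch:

```python
import math

objects = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1, 1), (2, 1)]  # c1 = medicine A, c2 = medicine B

matrix = [[round(math.dist(c, xy), 2) for xy in objects.values()]
          for c in centroids]
print(matrix[0])  # distances to c1: [0.0, 1.0, 3.61, 5.0]
print(matrix[1])  # distances to c2: [1.0, 0.0, 2.83, 4.24]
```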
![Page 35: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/35.jpg)
Example
Object X Y
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine D 5 4
K = 2, based on the two features X and Y.
3. Objects clustering: each object is assigned to the group with the minimum distance. Thus medicine A is assigned to group 1; medicines B, C and D to group 2.
4. Iteration 1, determine centroids: compute the new centroid of each group from these new memberships.
• Group 1 has only one member, so its centroid remains c1 = (1, 1).
• Group 2 has three members, so its new centroid is the mean of B, C and D: c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3).
5. Iteration 1, objects-centroids distances: compute the distance of all objects to the new centroids; as in step 2, a new distance matrix is obtained.
Group Matrix
• An element of the group matrix is 1 if and only if the object is assigned to that group.
![Page 36: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/36.jpg)
Example
6. Iteration 1, objects clustering: as in step 3, assign each object to the group with the minimum distance. Medicine B moves to group 1.
7. Iteration 2, determine centroids: step 4 is repeated to calculate the new centroids: c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1) and c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5).
8. Iteration 2, objects-centroids distances: repeat step 2 to compute the new distance matrix.
9. Iteration 2, objects clustering: again each object is assigned to a group.
Comparing the results, G² = G¹ reveals that the objects no longer change groups: the clustering has converged.
Group Matrix
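Steps 3 to 9 can be traced end to end; this sketch reprints the group memberships and centroids per iteration (groups indexed 0 and 1 here):

```python
import math

objects = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1.0, 1.0), (2.0, 1.0)]   # initial centroids: medicines A and B

for it in range(3):
    # assign every object to its nearest centroid
    groups = {name: min((0, 1), key=lambda j: math.dist(xy, centroids[j]))
              for name, xy in objects.items()}
    # re-estimate each centroid as the mean of its members
    for j in (0, 1):
        members = [xy for name, xy in objects.items() if groups[name] == j]
        centroids[j] = (sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members))
    print(it, groups, [tuple(round(c, 2) for c in cen) for cen in centroids])
```

The trace ends with A and B in one group and C and D in the other, with centroids (1.5, 1) and (4.5, 3.5), matching the slide's result.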
![Page 37: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/37.jpg)
Comments on the K-Means Method
• Strength
  – Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally k, t << n.
  – Scales well for large datasets.
• Weakness
  – Applicable only when the mean is defined; what about categorical data?
  – Need to specify k, the number of clusters, in advance.
  – Unable to handle noisy data and outliers.
![Page 38: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/38.jpg)
![Page 39: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/39.jpg)
![Page 40: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/40.jpg)
Probability-Based Clustering
• The probability-based clustering approach is founded on a so-called finite mixture model.
• A mixture is a set of k probability distributions, each of which governs the distribution of attribute values within one cluster.
• We seek the most likely clusters given the data.
• An instance belongs to a particular cluster with a certain probability.
![Page 41: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/41.jpg)
Using the Mixture Model
• Probability that instance x belongs to cluster A:
  Pr[A|x] = Pr[x|A] Pr[A] / Pr[x] = f(x; μ_A, σ_A) Pr[A] / Pr[x]
  where f is the normal density
  f(x; μ, σ) = 1/(√(2π) σ) · exp(-(x - μ)² / (2σ²))
• Probability of an instance given the clusters:
  Pr[x | the clusters] = Σᵢ Pr[x | clusterᵢ] Pr[clusterᵢ]
![Page 42: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/42.jpg)
Learning the Clusters
• Assume we know there are k clusters.
• Learn the clusters: determine their parameters (mean and standard deviation).
• Performance criterion: probability of the training data given the clusters.
• The EM algorithm finds a (local) maximum of the likelihood.
![Page 43: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/43.jpg)
EM Algorithm
• EM = Expectation Maximization: generalizes k-means to a probabilistic setting.
• Iterative procedure:
  – E "expectation" step: calculate the cluster probability for each instance.
  – M "maximization" step: estimate the distribution parameters from the cluster probabilities.
• Cluster probabilities are stored as instance weights.
• Stop when the improvement is negligible.
![Page 44: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/44.jpg)
A 2-Cluster Example of the Finite Mixture Model
• In this example, it is assumed that there are two clusters and the attribute value distributions in both clusters are normal distributions.
N(μ₁, σ₁²) and N(μ₂, σ₂²)
![Page 45: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/45.jpg)
Operation of the EM Algorithm
• The EM algorithm figures out the parameters of the finite mixture model.
• In this example, we need to estimate the following five parameters: μ₁, σ₁, μ₂, σ₂, P(C₁).
![Page 46: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/46.jpg)
• Then, the probabilities that sample sᵢ belongs to the two clusters are computed as follows:
  Prob(C₁|xᵢ) = Prob(xᵢ|C₁) Prob(C₁) / Prob(xᵢ),  with  Prob(xᵢ|C₁) = 1/(√(2π) σ₁) · exp(-(xᵢ - μ₁)² / (2σ₁²))
  Prob(C₂|xᵢ) = Prob(xᵢ|C₂) Prob(C₂) / Prob(xᵢ),  with  Prob(xᵢ|C₂) = 1/(√(2π) σ₂) · exp(-(xᵢ - μ₂)² / (2σ₂²))
  pᵢ = Prob(C₁|xᵢ) / (Prob(C₁|xᵢ) + Prob(C₂|xᵢ))
  where xᵢ is the attribute value of sample sᵢ.
![Page 47: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/47.jpg)
• The new estimated values of the parameters are computed as follows (all sums over the n samples):
  μ₁ = Σᵢ pᵢ xᵢ / Σᵢ pᵢ ;  σ₁² = Σᵢ pᵢ (xᵢ - μ₁)² / Σᵢ pᵢ
  μ₂ = Σᵢ (1 - pᵢ) xᵢ / Σᵢ (1 - pᵢ) ;  σ₂² = Σᵢ (1 - pᵢ)(xᵢ - μ₂)² / Σᵢ (1 - pᵢ)
  P(C₁) = Σᵢ pᵢ / n
![Page 48: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/48.jpg)
• The process is repeated until the clustering results converge.
• Generally, we attempt to maximize the following likelihood function:
L = Πᵢ [ P(C₁) P(xᵢ|C₁) + P(C₂) P(xᵢ|C₂) ]
![Page 49: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/49.jpg)
• Once we have figured out the approximate parameter values, we assign sample sᵢ to C₁ if
  pᵢ = Prob(C₁|xᵢ) / (Prob(C₁|xᵢ) + Prob(C₂|xᵢ)) ≥ 0.5,
  where Prob(C₁|xᵢ) and Prob(C₂|xᵢ) are computed from the normal densities as before.
• Otherwise, sᵢ is assigned to C₂.
![Page 50: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/50.jpg)
Example
Consider a one-dimensional clustering problem in which the given data are x₁ = -4, x₂ = -3, x₃ = -1, x₄ = 3, x₅ = 5, drawn from two probability distributions.
The initial means of the first Gaussian (first cluster) and the second Gaussian (second cluster) are taken to be 0 and 2.
The probability density function of the normal Gaussian used here (with σ = 2) is:
f(x, μ) = 1/√(8π) · exp(-(x - μ)²/8)
![Page 51: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/51.jpg)
Example
• First, the probability of each instance under each cluster is computed using the probability density function:
  f(-4, 0) = 1/√(8π) · e^(-(-4-0)²/8) = 0.0269    f(-4, 2) = 1/√(8π) · e^(-(-4-2)²/8) = 0.0022
  f(-3, 0) = 0.0646    f(-3, 2) = 0.00874
  f(-1, 0) = 0.176     f(-1, 2) = 0.0646
  f(3, 0) = 0.0646     f(3, 2) = 0.176
  f(5, 0) = 0.00874    f(5, 2) = 0.0646
![Page 52: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/52.jpg)
Example
• E step (cluster responsibilities):
  h₁₁ = f(x₁, μ₁) / (f(x₁, μ₁) + f(x₁, μ₂)) = 0.0269 / (0.0269 + 0.0022) = 0.924    h₁₂ = 0.0756
  h₂₁ = 0.8808    h₂₂ = 0.1191
  h₃₁ = 0.7315    h₃₂ = 0.2685
  h₄₁ = 0.2685    h₄₂ = 0.7315
  h₅₁ = 0.1191    h₅₂ = 0.8808
• M step (calculate the means again):
  μ₁ = Σᵢ hᵢ₁ xᵢ / Σᵢ hᵢ₁ = -1.94    μ₂ = Σᵢ hᵢ₂ xᵢ / Σᵢ hᵢ₂ = 3.39
![Page 53: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/53.jpg)
2nd Iteration
• E step:
  f(-4, -1.94) = 1/√(8π) · e^(-(-4+1.94)²/8) = 0.11733    f(-4, 3.39) = 1/√(8π) · e^(-(-4-3.39)²/8) = 0.00022
  f(-3, -1.94) = 0.17330    f(-3, 3.39) = 0.00121
  f(-1, -1.94) = 0.17858    f(-1, 3.39) = 0.01793
  f(3, -1.94) = 0.00944     f(3, 3.39) = 0.19568
  f(5, -1.94) = 0.00048     f(5, 3.39) = 0.14424
  h₁₁ = 0.11733 / (0.11733 + 0.00022) = 0.99813    h₁₂ = 0.00187
  h₂₁ = 0.17330 / (0.17330 + 0.00121) = 0.99307    h₂₂ = 0.00693
  h₃₁ = 0.17858 / (0.17858 + 0.01793) = 0.90876    h₃₂ = 0.09124
  h₄₁ = 0.00944 / (0.00944 + 0.19568) = 0.04602    h₄₂ = 0.95398
  h₅₁ = 0.00048 / (0.00048 + 0.14424) = 0.00332    h₅₂ = 0.99668
• M step:
  μ₁ = Σᵢ hᵢ₁ xᵢ / Σᵢ hᵢ₁ = -2.62    μ₂ = Σᵢ hᵢ₂ xᵢ / Σᵢ hᵢ₂ = 3.77
![Page 54: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/54.jpg)
3rd Iteration
• E step:
  f(-4, -2.62) = 0.15718    f(-4, 3.77) = 0.00011
  f(-3, -2.62) = 0.19586    f(-3, 3.77) = 0.00065
  f(-1, -2.62) = 0.14366    f(-1, 3.77) = 0.01160
  f(3, -2.62) = 0.00385     f(3, 3.77) = 0.18519
  f(5, -2.62) = 0.00014     f(5, 3.77) = 0.16507
  h₁₁ = 0.99930    h₁₂ = 0.00070
  h₂₁ = 0.99669    h₂₂ = 0.00331
  h₃₁ = 0.92529    h₃₂ = 0.07471
  h₄₁ = 0.02037    h₄₂ = 0.97963
  h₅₁ = 0.00085    h₅₂ = 0.99915
• M step:
  μ₁ = Σᵢ hᵢ₁ xᵢ / Σᵢ hᵢ₁ = -2.67    μ₂ = Σᵢ hᵢ₂ xᵢ / Σᵢ hᵢ₂ = 3.81
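The slides' one-dimensional example (σ fixed at 2 and equal priors for the two clusters assumed, as in the slides) can be sketched as follows; per-iteration values may differ slightly from the slides' rounded figures:

```python
import math

data = [-4, -3, -1, 3, 5]
mu1, mu2 = 0.0, 2.0          # initial means

def f(x, mu):
    """Normal density with sigma = 2: (1/sqrt(8*pi)) * exp(-(x-mu)^2/8)."""
    return math.exp(-((x - mu) ** 2) / 8) / math.sqrt(8 * math.pi)

for _ in range(20):
    # E step: responsibility of cluster 1 for each point
    h1 = [f(x, mu1) / (f(x, mu1) + f(x, mu2)) for x in data]
    # M step: means re-estimated as responsibility-weighted averages
    mu1 = sum(h * x for h, x in zip(h1, data)) / sum(h1)
    mu2 = sum((1 - h) * x for h, x in zip(h1, data)) / sum(1 - h for h in h1)

print(round(mu1, 2), round(mu2, 2))
```

After a few iterations the means settle near the two groups {-4, -3, -1} and {3, 5}, as in the slides.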
![Page 55: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/55.jpg)
EM Clustering in Weka
• The Expectation-Maximization (EM) algorithm is part of the Weka clustering package.
• The EM clustering scheme generates probabilistic descriptions of the clusters: mean and standard deviation for the numeric attributes, and value counts (incremented by 1 and modified with a small value to avoid zero probabilities) for the nominal ones.
• EM can select the number of clusters automatically by maximizing the logarithm of the likelihood of future data, estimated using ten-fold cross-validation.
Cross-validation steps:
(i) The number of clusters is set to 1.
(ii) The training set is split randomly into 10 folds.
(iii) EM is performed 10 times using the 10 folds.
(iv) The log-likelihood is averaged over all 10 results.
(v) If the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step (ii).
![Page 56: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/56.jpg)
Limitation of the Finite Mixture Model and the EM Algorithm
• The finite mixture model and the EM algorithm generally assume that the attributes are independent.
• Approaches have been proposed for handling correlated attributes. However, these approaches are subject to further limitations.
![Page 57: Clustering_KUK.pptx](https://reader036.vdocuments.site/reader036/viewer/2022081603/563dbb7d550346aa9aada167/html5/thumbnails/57.jpg)
Attribute Information of Soybean Disease Dataset
v1 date: april=0, may=1, june=2, july=3, august=4, september=5, october=6
v2 plant-stand: normal=0, lt-normal=1
v3 precip: lt-norm=0, norm=1, gt-norm=2
v4 temp: lt-norm=0, norm=1, gt-norm=2
v5 hail: yes=0, no=1
v6 crop-hist: diff-lst-year=0, same-lst-yr=1, same-lst-two-yrs=2, same-lst-sev-yrs=3
v7 area-damaged: scattered=0, low-areas=1, upper-areas=2, whole-field=3
v8 severity: pot-severe=1, severe=2
v9 seed-tmt: none=0, fungicide=1
v10 germination: '90-100%'=0, '80-89%'=1, 'lt-80%'=2
v12 leaves: norm=0, abnorm=1
v20 lodging: yes=0, no=1
v21 stem-cankers: absent=0, below-soil=1, above-soil=2, above-sec-nde=3
v22 canker-lesion: dna=0, brown=1, dk-brown-blk=2, tan=3
v23 fruiting-bodies: absent=0, present=1
v24 external decay: absent=0, firm-and-dry=1
v25 mycelium: absent=0, present=1
v26 int-discolor: none=0, black=2
v27 sclerotia: absent=0, present=1
v28 fruit-pods: norm=0, dna=3
v35 roots: norm=0, rotted=1