TRANSCRIPT
Clustering: Hierarchical Clustering and K-Means Clustering

Machine Learning 10-601B

Seyoung Kim

Many of these slides are derived from William Cohen, Ziv Bar-Joseph, Eric Xing. Thanks!
Two Classes of Learning Problems
• Supervised learning = learning from labeled data, where class labels (in classification) or output values (in regression) are given – Training data: (X, Y) for inputs X and labels Y
• Unsupervised learning = learning from unlabeled, unannotated data – Training data: X only – we do not have a teacher that provides examples with their labels
• Organizing data into clusters such that there is
• high intra-cluster similarity
• low inter-cluster similarity
• Informally, finding natural groupings among objects.
• Why do we want to do that?
• Any REAL application?
Examples
• People
• Images
• Language
• Species
Unsupervised learning
• Clustering methods
– Non-probabilistic methods
• Hierarchical clustering
• K-means algorithm
– Probabilistic methods
• Mixture models
• We will also discuss dimensionality reduction, another unsupervised learning method, later in the course
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)

Similarity is hard to define, but… "We know it when we see it."

The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).

[Figure: expression profiles of gene1 and gene2, with example distance values 0.23, 3, and 342.7]
What properties should a distance measure have?
• D(A,B) = D(B,A) (Symmetry)
• D(A,A) = 0 (Constancy of Self-Similarity)
• D(A,B) = 0 iff A = B (Positivity/Separation)
• D(A,B) ≤ D(A,C) + D(B,C) (Triangle Inequality)
Intuitions behind desirable distance measure properties

• D(A,B) = D(B,A) (Symmetry)
– Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex."
• D(A,A) = 0 (Constancy of Self-Similarity)
– Otherwise you could claim "Alex looks more like Bob than Bob does."
• D(A,B) = 0 iff A = B (Positivity/Separation)
– Otherwise there are objects in your world that are different, but you cannot tell apart.
• D(A,B) ≤ D(A,C) + D(B,C) (Triangle Inequality)
– Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl."
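As a quick sanity check (a sketch of mine, not from the slides), all four properties can be verified numerically for Euclidean distance on a random sample of points:

```python
# Hypothetical sanity check: verify the four distance-measure properties
# for Euclidean distance on a small random sample of 3-D points.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

random.seed(0)
pts = [tuple(random.random() for _ in range(3)) for _ in range(15)]

for A in pts:
    assert euclidean(A, A) == 0.0                 # constancy of self-similarity
    for B in pts:
        assert math.isclose(euclidean(A, B), euclidean(B, A))  # symmetry
        if A != B:
            assert euclidean(A, B) > 0            # positivity/separation
        for C in pts:
            # triangle inequality (tiny tolerance for floating-point error)
            assert euclidean(A, B) <= euclidean(A, C) + euclidean(B, C) + 1e-12
print("all four properties hold on this sample")
```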
Distance Measures
• Suppose two objects x and y both have p features
• Euclidean distance
• Correlation coefficient
d(x, y) = \sqrt{ \sum_{i=1}^{p} | x_i - y_i |^2 }
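Both measures translate directly into code. The following is a sketch (function and variable names are my own), with the correlation coefficient implemented as the standard Pearson formula:

```python
# Sketch: Euclidean distance and Pearson correlation coefficient
# between two objects with p features each.
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def correlation(x, y):
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(euclidean(x, y))    # sqrt(30), about 5.48
print(correlation(x, y))  # about 1.0: perfectly correlated despite nonzero distance
```

Note the two measures answer different questions: y = 2x is far from x in Euclidean distance yet has correlation 1, which is why correlation is often preferred for comparing expression profiles.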
Similarity Measures: Correlation Coefficient
[Figure: expression levels of Gene A and Gene B plotted over time, illustrating profile similarity as measured by the correlation coefficient]
• Hierarchical algorithms: Create a hierarchical decomposition of the set of objects using some criterion (bottom-up or top-down)
• Partitional algorithms: Construct various partitions and then evaluate them by some criterion (top-down)
(How-to) Hierarchical Clustering

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
We begin with a distance matrix which contains the distances between every pair of objects in our database. For five objects (shown as images on the slide), the upper triangle of the matrix is:

    0  8  8  7  7
       0  2  4  4
          0  3  3
             0  1
                0

For example, one pair of objects has D = 8, and the closest pair has D = 1.
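Stored programmatically (the labels 1–5 are my own; the slide identifies objects by pictures), the matrix above and the search for the closest pair to merge look like:

```python
# The upper-triangular distance matrix above, stored as a dict keyed
# by object pairs. Labels 1-5 are assumed for illustration.
D = {(1, 2): 8, (1, 3): 8, (1, 4): 7, (1, 5): 7,
     (2, 3): 2, (2, 4): 4, (2, 5): 4,
     (3, 4): 3, (3, 5): 3,
     (4, 5): 1}

# the pair with the smallest distance is merged first
closest = min(D, key=D.get)
print(closest, D[closest])  # (4, 5) 1
```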
At each step: consider all possible merges and choose the best. Repeat.
But how do we compute distances between clusters rather than objects?
Computing distance between clusters: Single Link

• cluster distance = distance between the two closest members, one in each cluster
− Potentially long and skinny clusters
Computing distance between clusters: Complete Link

• cluster distance = distance between the two farthest members, one in each cluster
+ Tight clusters
Computing distance between clusters: Average Link

• cluster distance = average distance over all pairs of members, one from each cluster
+ The most widely used measure
+ Robust against noise
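The three linkage rules side by side, as a sketch with made-up 1-D toy clusters:

```python
# Sketch of the three cluster-distance rules, assuming a pointwise
# distance d; the example clusters are made up.
def d(a, b):
    return abs(a - b)

def single_link(c1, c2):
    return min(d(a, b) for a in c1 for b in c2)    # closest members

def complete_link(c1, c2):
    return max(d(a, b) for a in c1 for b in c2)    # farthest members

def average_link(c1, c2):
    return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

c1, c2 = [0.0, 1.0], [3.0, 5.0]
print(single_link(c1, c2))    # 2.0
print(complete_link(c1, c2))  # 5.0
print(average_link(c1, c2))   # 3.5
```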
Example: single link

[Figure: five objects labeled 1–5, merged step by step under single-link clustering]
[Figure: dendrograms from average linkage and single linkage on the same data; height represents the distance between objects/clusters]
Summary of Hierarchical Clustering Methods

• No need to specify the number of clusters in advance.
• The hierarchical structure maps nicely onto human intuition for some domains.
• They do not scale well: time complexity of at least O(n²), where n is the total number of objects.
• Like any heuristic search algorithm, local optima are a problem.
• Interpretation of results is (very) subjective.
In some cases we can determine the “correct” number of clusters. However, things are rarely this clear cut, unfortunately.
But what are the clusters?
Outlier

The single isolated branch is suggestive of a data point that is very different from all others.
Example: clustering genes
• Microarrays measure the activities of all genes in different conditions
• Group genes that perform the same function
• Clustering genes can help determine new functions for unknown genes
• An early "killer application" in this area – the most cited (>12,000) paper in PNAS!
Partitioning Algorithms

• Partitioning method: Construct a partition of n objects into a set of K clusters
– Given: a set of objects and the number K
– Find: a partition into K clusters that optimizes the chosen partitioning criterion
• Globally optimal: exhaustively enumerate all partitions
• Effective heuristic method: the K-means algorithm
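To see why exhaustive enumeration is hopeless: the number of ways to partition n objects into K non-empty clusters is the Stirling number of the second kind S(n, K), which explodes quickly. (This count is standard combinatorics, not from the slides.)

```python
# Stirling numbers of the second kind via the standard recurrence
# S(n, k) = k * S(n-1, k) + S(n-1, k-1).
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    if k == 0:
        return 1 if n == 0 else 0
    if n == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(10, 3))   # 9330 partitions for just 10 objects
print(stirling2(100, 3))  # enormous: enumeration is clearly infeasible
```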
• Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.
• Since the output is only one set of clusters, the user has to specify the desired number of clusters K.
[Figure: K-means with three cluster centers k1, k2, k3. Re-assign objects and move the centers until no objects change their cluster membership.]
K-Means Clustering Algorithm

Find the cluster means, re-assign samples to clusters, and iterate until convergence.

[Figures: K-means clustering illustrated step by step, steps 1–5]
K-Means Algorithm

1. Decide on a value for K.
2. Initialize the K cluster centers randomly if necessary.
3. Decide the class memberships of the N objects by assigning them to the nearest cluster centroids (aka the center of gravity or mean).
4. Re-estimate the K cluster centers, assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to 3.
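Steps 3–5 can be sketched in a few lines. This is an illustrative toy of mine (1-D data, fixed initial centers for reproducibility), not the course's code:

```python
# Minimal K-means sketch following the numbered steps above.
def kmeans(points, centers, max_iter=100):
    for _ in range(max_iter):
        # step 3: assign each object to its nearest cluster center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # step 4: re-estimate each center as the mean of its members
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        # step 5: stop once nothing changes; otherwise repeat from step 3
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
final_centers, final_clusters = kmeans(points, centers=[0.0, 10.0])
print(final_centers)  # converges near [1.0, 8.0]
```

Since K-means only finds a local optimum, in practice it is typically restarted from several random initializations and the run with the lowest objective is kept.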
[Figure: example data points plotted on a 10×10 grid]
[Figure: clustering result for k = 1]

When k = 1, the objective function is 873.0:

Obj = \sum_k \sum_i \| x_i - \mu_k \|_2^2
[Figure: clustering result for k = 2]

When k = 2, the objective function is 173.1.
[Figure: clustering result for k = 3]

When k = 3, the objective function is 133.6.
[Figure: objective function value (y-axis, 0 to 1000) vs. k (x-axis, 1 to 6)]

We can plot the objective function values for k = 1 to 6. The abrupt change at k = 2 is highly suggestive of two clusters in the data. This technique for determining the number of clusters is known as "knee finding" or "elbow finding".

Note that the results are not always as clear cut as in this toy example.
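Elbow finding can be sketched on made-up 1-D data with two obvious groups (everything here, including the kmeans_obj helper, is my own toy code, not the course's):

```python
# Sketch of "elbow finding": run K-means for k = 1..6 and look for
# the largest drop in the objective (sum of squared distances).
import random

def kmeans_obj(points, k, iters=50, seed=0):
    """Run Lloyd's iterations, return the final objective value."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# two well-separated groups: expect a sharp elbow at k = 2
points = [1.0, 1.1, 0.9, 1.2, 9.0, 9.1, 8.9, 9.2]
objs = [kmeans_obj(points, k) for k in range(1, 7)]
for k, obj in zip(range(1, 7), objs):
    print(k, round(obj, 3))
```

The objective drops sharply from k = 1 to k = 2 and barely improves after that, which is exactly the "knee" described above.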
What you should know
• Clustering as an unsupervised learning method
• What are the different types of clustering algorithms
• What are the assumptions we are making for each, and what can we get from them
• Unsolved issues: number of clusters, initialization, etc.