Introduction to Machine Learning (10-701), slides/14_EM.pdf · 2017. 10. 18.
TRANSCRIPT
Introduction to Machine Learning
Clustering and EM
Barnabás Póczos
Contents
Clustering
K-means
Mixture of Gaussians
Expectation Maximization
Variational Methods
Clustering
What is Clustering?
Clustering: the process of grouping a set of objects into classes of similar objects
– high intra-class similarity
– low inter-class similarity
Clustering is the most common form of unsupervised learning.
Clustering is Subjective
What is Similarity?
Hard to define! …but we know it when we see it
The K-means Clustering Problem
K-means Clustering Problem
K-means clustering problem: partition the n observations into K sets (K ≤ n), S = {S1, S2, …, SK}, such that the sets minimize the within-cluster sum of squares. (Example: K = 3)
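The objective itself did not survive transcription; in standard notation (with μi denoting the mean of the points in Si), the problem the slide states is:

```latex
\operatorname*{arg\,min}_{S}\;\sum_{i=1}^{K}\;\sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x_j \in S_i} x_j
```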
K-means Clustering Problem
How hard is this problem? The problem is NP-hard, but there are good heuristic algorithms that seem to work well in practice:
• the K-means algorithm
• mixture of Gaussians
K-means Clustering Alg: Step 1
• Given n objects.
• Guess the K cluster centers k1, k2, k3 (they correspond to the three cluster centers shown in the previous slide).
K-means Clustering Alg: Step 2
Decide the class memberships of the n objects by assigning them to the nearest cluster centers k1, k2, k3. (= Build a Voronoi diagram based on the cluster centers k1, k2, k3.)
K-means Clustering Alg: Step 3
• Re-estimate the cluster centers (aka the centroid or mean), by assuming the memberships found above are correct.
K-means Clustering Alg: Step 4
• Build a new Voronoi diagram based on the new cluster centers.
• Decide the class memberships of the n objects based on this diagram.
K-means Clustering Alg: Step 5
• Re-estimate the cluster centers.
K-means Clustering Alg: Step 6
• Stop when everything is settled. (The Voronoi diagrams don’t change anymore)
K-means Clustering Algorithm
Input
– Data + Desired number of clusters, K
Initialize
– the K cluster centers (randomly if necessary)
Iterate
1. Decide the class memberships of the n objects by assigning them to the nearest cluster centers
2. Re-estimate the K cluster centers (aka the centroid or mean), by assuming the memberships found above are correct.
Termination
– If none of the n objects changed membership in the last iteration, exit.
Otherwise go to 1.
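The iteration above can be sketched in a few lines of NumPy (a minimal illustrative implementation, not the course's code; the function name and defaults are my own):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and re-centering."""
    rng = np.random.default_rng(seed)
    # Initialize the K cluster centers at randomly chosen data points.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Step 1 (classify): assign each point to the nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Terminate when no object changed membership in the last iteration.
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2 (re-center): each center becomes the centroid of its members.
        for i in range(K):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels
```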
K-means Algorithm: Computational Complexity
At each iteration,
– Computing distance between each of the n objects and the K cluster centers is O(Kn).
– Computing cluster centers: Each object gets added once to some cluster: O(n).
Assume these two steps are each done once for l iterations: O(lKn).
Seed Choice
The results of the K-means algorithm can vary based on the random seed selection. Some seeds can result in a poor convergence rate, or in convergence to a sub-optimal clustering; the K-means algorithm can easily get stuck in local minima. Remedies:
– Select good seeds using a heuristic (e.g., object least similar to any existing mean)
– Try out multiple starting points (very important!!!)
– Initialize with the results of another method.
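The "try out multiple starting points" remedy is easy to apply: run the algorithm from several random seeds and keep the run with the lowest within-cluster sum of squares. A hedged NumPy sketch (all names are mine):

```python
import numpy as np

def kmeans_once(X, K, seed, n_iter=50):
    """A single K-means run from one random seed; returns (centers, objective)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for i in range(K):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    # Within-cluster sum of squares: the quantity K-means tries to minimize.
    sse = float(((X - centers[labels]) ** 2).sum())
    return centers, sse

def kmeans_restarts(X, K, n_restarts=10):
    """Run K-means from several seeds and keep the best (lowest-SSE) solution."""
    runs = [kmeans_once(X, K, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[1])
```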
Alternating Optimization
K-means Algorithm (more formally)
Randomly initialize k centers
Classify: at iteration t, assign each point xj (j ∈ {1, …, n}) to the nearest center (classification at iteration t):
Recenter: μi(t+1) is the centroid of the newly assigned set (re-computed cluster centers at iteration t):
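The two update formulas were lost in the transcript; in standard notation, with C(t)(j) denoting the cluster assigned to point xj at iteration t:

```latex
\text{Classify:}\quad C^{(t)}(j) \;=\; \operatorname*{arg\,min}_{i \in \{1,\dots,K\}} \bigl\lVert x_j - \mu_i^{(t)} \bigr\rVert^2
\qquad
\text{Recenter:}\quad \mu_i^{(t+1)} \;=\; \frac{\sum_{j\,:\,C^{(t)}(j)=i} x_j}{\bigl\lvert \{\, j : C^{(t)}(j)=i \,\} \bigr\rvert}
```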
What is the K-means Algorithm Optimizing?
Define the following potential function F of centers and point allocation C
It’s easy to see that the optimal solution of the K-means problem is:
Two equivalent versions
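The potential function did not survive transcription; the standard reconstruction, consistent with the classify/recenter updates, is below. "Two equivalent versions" refers to the two orders of minimization:

```latex
F(\mu, C) \;=\; \sum_{j=1}^{n} \bigl\lVert x_j - \mu_{C(j)} \bigr\rVert^2,
\qquad
\min_{\mu,\,C} F(\mu, C)
\;=\; \min_{\mu}\,\min_{C} F(\mu, C)
\;=\; \min_{C}\,\min_{\mu} F(\mu, C)
```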
K-means Algorithm
The K-means algorithm optimizes the potential function F by alternating two steps:
(1) Assign each point to the nearest cluster center – exactly the first step (classify).
(2) Re-estimate the cluster centers – exactly the 2nd step (re-center).
K-means Algorithm
K-means algorithm (coordinate descent on F), optimizing the potential function:
(1) the classify step is an “Expectation step”,
(2) the re-center step is a “Maximization step”.
Today, we will see a generalization of this approach: the EM algorithm.
Gaussian Mixture Model
Generative Gaussian Mixture Model
Mixture of K Gaussian distributions (a multi-modal distribution):
• There are K components.
• Component i has an associated mean vector μi.
• Component i generates data from a Gaussian centered at μi.
Each data point is generated using this process:
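The generative process can be sketched directly (a hedged NumPy illustration with spherical components of width sigma; all names and defaults are mine):

```python
import numpy as np

def sample_gmm(n, means, sigma, pis, seed=0):
    """Generate n points: draw a component i ~ pis, then x ~ N(mu_i, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    K, d = means.shape
    # Hidden variable: which mixture component generated each point.
    y = rng.choice(K, size=n, p=pis)
    # Observed data: a Gaussian draw around the chosen component's mean.
    x = means[y] + sigma * rng.standard_normal((n, d))
    return x, y
```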
Mixture of K Gaussian distributions (a multi-modal distribution), with labeled parts:
• mixture component
• mixture proportion
• observed data
• hidden variable
Mixture of Gaussians Clustering
Assume the component parameters are given. For a given x we want to decide if it belongs to cluster i or cluster j. Cluster x based on the ratio of posteriors:
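The ratio the slide refers to follows from Bayes' rule; reconstructed in standard notation:

```latex
\frac{P(y=i \mid x)}{P(y=j \mid x)}
\;=\; \frac{P(x \mid y=i)\,P(y=i)}{P(x \mid y=j)\,P(y=j)}
\;=\; \frac{\pi_i\,\mathcal{N}(x;\,\mu_i,\Sigma_i)}{\pi_j\,\mathcal{N}(x;\,\mu_j,\Sigma_j)}
```

x is assigned to cluster i whenever this ratio exceeds 1.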
Piecewise linear decision boundary
![Page 32: Introduction to Machine Learning,10701/slides/14_EM.pdf · 2017. 10. 18. · Some seeds can result in poor convergence rate, or convergence to sub-optimal clustering. K-means algorithm](https://reader035.vdocuments.site/reader035/viewer/2022063016/5fd6d10f8f76d7124610a7b3/html5/thumbnails/32.jpg)
32
MLE for GMM
What if we don't know the parameters? ⇒ Maximum Likelihood Estimate (MLE)
![Page 33: Introduction to Machine Learning,10701/slides/14_EM.pdf · 2017. 10. 18. · Some seeds can result in poor convergence rate, or convergence to sub-optimal clustering. K-means algorithm](https://reader035.vdocuments.site/reader035/viewer/2022063016/5fd6d10f8f76d7124610a7b3/html5/thumbnails/33.jpg)
33
General GMM – Gaussian Mixture Model
• mixture component
• mixture proportion
![Page 34: Introduction to Machine Learning,10701/slides/14_EM.pdf · 2017. 10. 18. · Some seeds can result in poor convergence rate, or convergence to sub-optimal clustering. K-means algorithm](https://reader035.vdocuments.site/reader035/viewer/2022063016/5fd6d10f8f76d7124610a7b3/html5/thumbnails/34.jpg)
34
General GMM
Assume the component parameters are given. Clustering based on ratios of posteriors:
“Quadratic decision boundary” – the second-order terms don’t cancel out.
![Page 35: Introduction to Machine Learning,10701/slides/14_EM.pdf · 2017. 10. 18. · Some seeds can result in poor convergence rate, or convergence to sub-optimal clustering. K-means algorithm](https://reader035.vdocuments.site/reader035/viewer/2022063016/5fd6d10f8f76d7124610a7b3/html5/thumbnails/35.jpg)
35
General GMM: MLE Estimation
What if we don't know the parameters? ⇒ Maximize the marginal likelihood (MLE).
This objective is non-linear and not analytically solvable; direct optimization is doable, but often slow.
![Page 36: Introduction to Machine Learning,10701/slides/14_EM.pdf · 2017. 10. 18. · Some seeds can result in poor convergence rate, or convergence to sub-optimal clustering. K-means algorithm](https://reader035.vdocuments.site/reader035/viewer/2022063016/5fd6d10f8f76d7124610a7b3/html5/thumbnails/36.jpg)
36
The EM algorithm
What is EM in the general case, and why does it work?
Expectation-Maximization (EM)
A general algorithm to deal with hidden data; we will study it first in the context of unsupervised learning (hidden class labels = clustering).
• EM is an optimization strategy for objective functions that can be interpreted as likelihoods in the presence of missing data.
• In the following examples EM is “simpler” than gradient methods: there is no need to choose a step size.
• EM is an iterative algorithm with two linked steps:
o E-step: fill-in hidden values using inference
o M-step: apply standard MLE/MAP method to completed data
• We will prove that this procedure monotonically improves the likelihood (or leaves it unchanged).
General EM algorithm
Observed data:
Unknown variables:
Parameters:
For example in clustering:
For example in MoG:
Goal:
Notation
General EM algorithm
Goal:
Free energy:
E Step:
M Step:
We are going to discuss why this approach works.
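The formulas on this slide were lost in transcription; the standard definitions, which the proof on the following slides relies on, are:

```latex
F(q, \theta) \;=\; \mathbb{E}_{q(z)}\bigl[\log p(x, z \mid \theta)\bigr] + H(q)
\;=\; \log p(x \mid \theta) \;-\; \mathrm{KL}\bigl(q(z) \,\Vert\, p(z \mid x, \theta)\bigr)
```

```latex
\text{E step:}\quad q^{(t+1)} = \operatorname*{arg\,max}_{q} F\bigl(q, \theta^{(t)}\bigr) = p\bigl(z \mid x, \theta^{(t)}\bigr)
\qquad
\text{M step:}\quad \theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} F\bigl(q^{(t+1)}, \theta\bigr)
```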
General EM algorithm
Free energy:
E Step:
M Step: (we maximize F only in θ here!)
General EM algorithm
Free energy:
Theorem: During the EM algorithm the marginal likelihood is non-decreasing!
Proof:
General EM algorithm
Goal:
E Step:
M Step:
During the EM algorithm the marginal likelihood is not decreasing!
Convergence of EM
[Figure: the log-likelihood function together with the sequence of EM lower-bound F-functions.]
EM monotonically converges to a local maximum of the likelihood!
Convergence of EM
[Figure: different sequences of EM lower-bound F-functions, depending on the initialization.]
⇒ Use multiple, randomized initializations in practice.
Variational Methods
Variational methods
Free energy:
Variational methods might decrease the marginal likelihood!
Variational methods
Free energy:
Partial E Step:
Partial M Step:
These partial steps improve F, but do not necessarily find the best max/min.
Variational methods might decrease the marginal likelihood!
Summary: EM Algorithm
A way of maximizing the likelihood function for hidden-variable models.
Finds MLE of parameters when the original (hard) problem can be broken up into two (easy) pieces:
1. Estimate some “missing” or “unobserved” data from the observed data and current parameters.
2. Using this “complete” data, find the MLE parameter estimates.
Alternate between filling in the latent variables using the best guess (posterior) and updating the parameters based on this guess:
In the M-step we optimize a lower bound F on the log-likelihood L.
In the E-step we close the gap, making the bound F equal to the log-likelihood L.
EM performs coordinate ascent on F and can get stuck in local optima.
E Step:
M Step:
EM Examples
Expectation-Maximization (EM): A Simple Case
• We have unlabeled data x1, x2, …, xn
• We know there are K classes
• We know P(y=1)=π1, P(y=2)=π2, P(y=3)=π3, …, P(y=K)=πK
• We know the common variance σ²
• We don’t know μ1, μ2, …, μK, and we want to learn them
We can write the likelihood (independent data; marginalize over the class):
⇒ learn μ1, μ2, …, μK
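The resulting marginal log-likelihood (reconstructed; it uses only the known πi and the common variance σ²):

```latex
\log p(x_1, \dots, x_n \mid \mu_1, \dots, \mu_K)
\;=\; \sum_{j=1}^{n} \log \sum_{i=1}^{K} \pi_i\, \mathcal{N}\!\bigl(x_j;\, \mu_i, \sigma^2 I\bigr)
```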
EXPECTATION (E) STEP
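The E-step formula is missing from the transcript; for this model it computes the responsibilities (soft class-membership weights):

```latex
w_{ij}^{(t)} \;=\; P\bigl(y_j = i \mid x_j, \mu^{(t)}\bigr)
\;=\; \frac{\pi_i \exp\bigl(-\lVert x_j - \mu_i^{(t)} \rVert^2 / 2\sigma^2\bigr)}
           {\sum_{k=1}^{K} \pi_k \exp\bigl(-\lVert x_j - \mu_k^{(t)} \rVert^2 / 2\sigma^2\bigr)}
```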
Maximization (M) step
M step: at iteration t, maximize the function Q in μ(t):
• The joint distribution is simple.
• We calculated these weights in the E step.
• This is equivalent to updating the cluster centers in K-means.
EM for spherical, same variance GMMs
E-step
Compute “expected” classes of all datapoints for each class
In K-means “E-step” we do hard assignment. EM does soft assignment
M-step
Compute the max of the function Q (i.e., update μ given our data’s class-membership distributions/weights).
Iterate.
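The E/M alternation above can be sketched for this spherical, known-variance, known-proportions case (a hedged NumPy sketch; the function name, the optional init argument, and the defaults are mine):

```python
import numpy as np

def em_spherical_gmm(X, pis, sigma, n_iter=100, init=None, seed=0):
    """EM for a GMM with known mixing proportions pis and common spherical
    variance sigma^2; only the means are learned (the slide's setting)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = len(pis)
    mus = (np.asarray(init, dtype=float) if init is not None
           else X[rng.choice(n, size=K, replace=False)].astype(float))
    pis = np.asarray(pis, dtype=float)
    for _ in range(n_iter):
        # E-step (soft assignment): w[j, i] ∝ pi_i * exp(-||x_j - mu_i||^2 / 2σ²).
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        logw = np.log(pis)[None, :] - sq / (2.0 * sigma**2)
        logw -= logw.max(axis=1, keepdims=True)   # numerical stabilization
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data.
        mus = (w.T @ X) / w.sum(axis=0)[:, None]
    return mus, w
```

Unlike K-means' hard assignment, every point contributes to every mean, weighted by its responsibility.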
EM for general GMMs: Example
After the 1st iteration
After the 2nd iteration
After the 3rd iteration
After the 4th iteration
After the 5th iteration
After the 6th iteration
After the 20th iteration
GMM for Density Estimation
WHAT YOU SHOULD KNOW
• K-means problem
• K-means algorithm
• Mixture of Gaussians model
• Expectation Maximization Algorithm
• EM vs MLE
Thanks for your attention!