Unsupervised Learning

TRANSCRIPT

Page 1:

Unsupervised Learning

Page 2:

Supervised learning vs. unsupervised learning

Pages 3–7: Figure-only slides (images not preserved in the transcript).

Adapted from Andrew Moore, http://www.autonlab.org/tutorials/gmm

Page 8:


K-means clustering algorithm

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Input: k (the number of clusters), D (the data set)

Choose k points from D as the initial centroids (cluster centers);

Repeat until the stopping criterion is met:
  For each data point x ∈ D:
    compute the distance from x to each centroid;
    assign x to the closest centroid;
  Re-compute each centroid as the mean of the data points currently assigned to it.
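A minimal sketch of this algorithm in Python (NumPy only); the function and parameter names are my own, not from the slides:

import numpy as np

def kmeans(D, k, max_iters=100, seed=0):
    """Basic k-means on an (n, d) data array D with k clusters."""
    rng = np.random.default_rng(seed)
    # Choose k distinct data points as the initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)].astype(float)
    assignments = np.full(len(D), -1)
    for _ in range(max_iters):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Stop when no data point changes cluster (criterion 1 on page 10).
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Re-compute each centroid as the mean of its current members.
        for j in range(k):
            members = D[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assignments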

Page 10:


Stopping/convergence criterion:

1. no (or minimum) re-assignment of data points to different clusters,

2. no (or minimum) change of centroids, or

3. minimum decrease in the sum of squared error (SSE):

SSE = \sum_{j=1}^{k} \sum_{x \in C_j} dist(x, m_j)^2    (1)

where C_j is the jth cluster, m_j is the centroid of cluster C_j (the mean vector of all the data points in C_j), and dist(x, m_j) is the distance between data point x and centroid m_j.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
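Equation (1) translates directly into code; a small sketch (the function name is mine), intended to pair with the kmeans sketch above:

import numpy as np

def sse(D, centroids, assignments):
    """Sum of squared error, equation (1): for each cluster C_j, add up
    the squared distances from its members to its centroid m_j."""
    total = 0.0
    for j, m_j in enumerate(centroids):
        members = D[assignments == j]   # the points currently in C_j
        total += np.sum(np.linalg.norm(members - m_j, axis=1) ** 2)
    return total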

Page 11:


Example distance functions

• Let x_i = (a_{i1}, ..., a_{in}) and x_j = (a_{j1}, ..., a_{jn})

– Euclidean distance: d_{ij} = \sqrt{\sum_{q=1}^{n} (a_{iq} - a_{jq})^2}

– Manhattan (city block) distance: d_{ij} = \sum_{q=1}^{n} |a_{iq} - a_{jq}|
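Both distances in Python (a sketch; the function names are mine):

import numpy as np

def euclidean(x_i, x_j):
    # d_ij = sqrt( sum_q (a_iq - a_jq)^2 )
    return np.sqrt(np.sum((x_i - x_j) ** 2))

def manhattan(x_i, x_j):
    # d_ij = sum_q |a_iq - a_jq|
    return np.sum(np.abs(x_i - x_j))

# e.g. euclidean(np.array([0, 0]), np.array([3, 4])) == 5.0
#      manhattan(np.array([0, 0]), np.array([3, 4])) == 7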

Page 12:

Distance function for text documents

• A text document consists of a sequence of sentences, and each sentence consists of a sequence of words.

• To simplify, a document is usually treated as a "bag" of words in document clustering.
  – The sequence and position of words are ignored.

• A document is represented as a vector, just like a normal data point.

• The similarity between two documents is the cosine of the angle between their corresponding feature vectors; a distance can be taken as 1 − cosine similarity.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
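A sketch of this measure on bag-of-words count vectors (the function names are mine):

import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two bag-of-words vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def cosine_distance(u, v):
    # One common way to turn the similarity into a distance.
    return 1.0 - cosine_similarity(u, v)

# Identical documents: cosine_similarity == 1, distance == 0;
# documents with no words in common: cosine_similarity == 0.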

Page 13:

Clustering Map of Biomedical Articles

Example from http://arbesman.net/blog/2011/03/24/clustering-map-of-biomedical-articles

Page 14:

Example: Image segmentation by k-means clustering by color

From http://vitroz.com/Documents/Image%20Segmentation.pdf

K=5, RGB space
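A sketch of this kind of segmentation, assuming scikit-learn and Pillow are available; "photo.jpg" is a placeholder filename:

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg").convert("RGB"))  # shape (H, W, 3)
pixels = img.reshape(-1, 3).astype(float)                 # one row per pixel

km = KMeans(n_clusters=5, n_init=10).fit(pixels)          # K=5 in RGB space
# Replace each pixel with the RGB value of its cluster centroid.
segmented = km.cluster_centers_[km.labels_].reshape(img.shape)
Image.fromarray(segmented.astype(np.uint8)).save("segmented_k5.png")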

Pages 15–19: Further segmentation examples (images not preserved in the transcript), alternating K=10 and K=5 in RGB space.

Pages 20–24:

Weaknesses of k-means

• The algorithm is only applicable if the mean is defined.
  – For categorical data, use k-modes, where the centroid is represented by the most frequent value of each attribute (see the sketch after this list).

• The user needs to specify k.

• The algorithm is sensitive to outliers.
  – Outliers are data points that are very far away from other data points.
  – Outliers could be errors in the data recording or special data points with very different values.

• k-means is sensitive to the initial random centroids.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
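The k-modes centroid from the first bullet, as a small Python sketch (an illustrative helper, not a full k-modes implementation):

from collections import Counter

def mode_centroid(members):
    """k-modes-style centroid for categorical data: the most
    frequent value of each attribute across the cluster members."""
    n_attrs = len(members[0])
    return tuple(Counter(row[q] for row in members).most_common(1)[0][0]
                 for q in range(n_attrs))

# e.g. mode_centroid([("red", "S"), ("red", "M"), ("blue", "M")])
#      -> ("red", "M")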

Page 25:


Weaknesses of k-means: Problems with outliers

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Page 26:

How to deal with outliers/noise in clustering?

Page 27:


Dealing with outliers

• One method is to remove, during the clustering process, data points that are much further away from the centroids than other data points (see the sketch below).
  – To be safe, we may want to monitor these possible outliers over a few iterations before deciding to remove them.

• Another method is to perform random sampling. Since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small.
  – The remaining data points are then assigned to clusters by distance or similarity comparison, or by classification.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
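One way the first method might look in code; the mean plus three standard deviations threshold is my own illustrative choice, not from the slides:

import numpy as np

def flag_outliers(D, centroids, assignments, factor=3.0):
    """Flag points much further from their assigned centroid than is
    typical; candidates can be monitored for a few iterations first."""
    d = np.linalg.norm(D - centroids[assignments], axis=1)
    threshold = d.mean() + factor * d.std()
    return d > threshold   # boolean mask of candidate outliers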

Page 28:


Weaknesses of k-means (cont …)

• The algorithm is sensitive to initial seeds.


Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Page 29:


Weaknesses of k-means (cont …)

• If we use different seeds, we can get good results. There are methods to help choose good seeds (for example, k-means++ seeding, or several random restarts as sketched below).

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
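A sketch of both ideas using scikit-learn (the library choice is my own; the slides do not name one): init="k-means++" spreads the initial seeds apart, and n_init runs several random restarts, keeping the one with the lowest SSE:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))   # toy data

# "k-means++" chooses well-spread initial seeds; n_init=10 performs
# ten restarts and keeps the solution with the lowest SSE (inertia_).
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.inertia_)   # SSE of the best run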

Page 30:


Weaknesses of k-means (cont …)

• The k-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
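A quick way to see this failure mode, assuming scikit-learn: two interleaved crescent-shaped clusters are not hyper-ellipsoidal, so k-means splits them incorrectly:

from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two interleaved half-moons; the true clusters are not hyper-ellipsoids.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
y_kmeans = KMeans(n_clusters=2, n_init=10).fit_predict(X)
# Comparing y_kmeans with y_true shows many points assigned to the
# wrong moon: each k-means cluster is roughly one half of the plane.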

Page 31:


k-means summary

• Despite its weaknesses, k-means is still the most popular clustering algorithm due to its simplicity and efficiency.
  – Other clustering algorithms have their own lists of weaknesses.

• There is no clear evidence that any other clustering algorithm performs better in general,
  – although other algorithms may be more suitable for some specific types of data or applications.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt