
Introduction to Artificial Intelligence: Unsupervised Learning

Janyl Jumadinova, October 21, 2016

Supervised learning vs. Unsupervised learning

- Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute.
  - These patterns are then used to predict the values of the target attribute in future data instances.

- Unsupervised learning: the data has no target attribute.
  - We want to explore the data to find some intrinsic structure in it.


Clustering

- Organizing data into classes such that there is:
  - high intra-class similarity
  - low inter-class similarity

- Finding the class labels and the number of classes directly from the data (in contrast to classification).

- More informally, finding natural groupings among objects.


Clustering

Clustering is one of the most widely used data mining techniques.

It has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc.

- Ex.: Given a collection of text documents, we want to organize them according to their content similarities.

- Ex.: In marketing, segment customers according to their similarities (to do targeted marketing).


What is a natural grouping among these objects?


What is Similarity?

"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)

Similarity is hard to define, but... "We know it when we see it."

The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.


Defining Distance Measures

Definition:

Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).
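For numeric feature vectors, one common concrete choice for D is the Euclidean distance; a minimal sketch (the function name and example points are illustrative, not from the slides):

```python
import math

def euclidean_distance(o1, o2):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))

# Example: two points in the plane.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```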


What properties should a distance measure have?

- D(A,B) = D(B,A) (Symmetry)
  Otherwise you could claim "Greg looks like Oliver, but Oliver looks nothing like Greg."

- D(A,A) = 0 (Constancy of Self-Similarity)
  Otherwise you could claim "Greg looks more like Oliver than Oliver does."

- D(A,B) = 0 iff A = B (Positivity / Separation)
  Otherwise there are objects in your world that are different, but you cannot tell apart.

- D(A,B) ≤ D(A,C) + D(B,C) (Triangle Inequality)
  Otherwise you could claim "Greg is very like Bob, and Greg is very like Oliver, but Bob is very unlike Oliver." (A sketch checking these properties follows this list.)
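These properties can be checked empirically for a candidate measure; a small sketch for the Euclidean distance defined earlier (the test data and float tolerance are illustrative):

```python
import itertools
import math
import random

def D(a, b):
    """Euclidean distance, used here as the measure under test."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
points = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(15)]

for A, B, C in itertools.product(points, repeat=3):
    assert D(A, B) == D(B, A)                   # symmetry
    assert D(A, A) == 0                         # constancy of self-similarity
    assert D(A, B) <= D(A, C) + D(B, C) + 1e-9  # triangle inequality (float slack)
```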



How do we measure similarity?

To measure the similarity between two objects, transform one of the objects into the other, and measure how much effort it took. The measure of effort becomes the distance measure.
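A classic instance of this idea for strings is the edit (Levenshtein) distance: the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other. A minimal dynamic-programming sketch:

```python
def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to transform string s into string t."""
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                                # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                                # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution (or match)
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```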


Partitional Clustering

- Non-hierarchical: each instance is placed in exactly one of K nonoverlapping clusters.

- Since only one set of clusters is output, the user normally has to input the desired number of clusters K.


Minimize Squared Error
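The formula on this slide is not reproduced in the transcript; the standard squared-error objective that k-means-style partitional clustering minimizes is, assuming k clusters C_i with centroids mu_i:

```latex
% Sum of squared error over k clusters C_1, ..., C_k with centroids \mu_1, ..., \mu_k:
\mathrm{SSE} \;=\; \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2}
```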


K-means clustering

- K-means is a partitional clustering algorithm.

- The k-means algorithm partitions the given data into k clusters.

- Each cluster has a cluster center, called the centroid.

- k is specified by the user.


K-means Algorithm

1. Decide on a value for k.

2. Initialize the k cluster centers (randomly, if necessary).

3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.

4. Re-estimate the k cluster centers, assuming the memberships found above are correct.

5. If none of the N objects changed membership in the last iteration, exit; otherwise, go to step 3.
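A minimal NumPy sketch of these five steps (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, k, seed=0):
    """Cluster the rows of X into k clusters; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize the k cluster centers with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    while True:
        # Step 3: assign each of the N objects to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: exit if no object changed membership in the last iteration.
        if labels is not None and np.array_equal(new_labels, labels):
            return centroids, labels
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members
        # (a center keeps its old position if its cluster is empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)

# Step 1: decide on a value for k; here the toy data has two obvious groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = k_means(X, k=2)
print(centroids.round(2))
```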


[Figures: K-Means Clustering, Steps 1-5, illustrating successive iterations of the algorithm above]

How can we tell the right number of clusters?

- In general, this is an unsolved problem.

- We can use approximation methods!


We can plot the objective function values for k = 1, ..., 6:

- The abrupt change at k = 2 is highly suggestive of two clusters in the data.

- This technique for determining the number of clusters is known as "knee finding" or "elbow finding". (A sketch follows this list.)
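A sketch of this heuristic using scikit-learn (the toy data and parameters are illustrative; inertia_ is the fitted model's sum of squared distances to the nearest centroid):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data with two well-separated groups, so the elbow should appear at k = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# Objective (sum of squared errors) for k = 1, ..., 6.
for k in range(1, 7):
    sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(sse, 1))  # look for the abrupt drop followed by a flat tail
```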


Strengths of K-Means

- Simple: easy to understand and to implement.

- Efficient: time complexity O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
  - Since both k and t are small, k-means is considered a linear algorithm.

- Often terminates at a local optimum.
  - The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.


Weaknesses of K-Means

- The algorithm is only applicable if the mean is defined.
  - For categorical data, the centroid can instead be represented by the most frequent values (see the sketch after this list).
  - The user needs to specify k.

- The algorithm is sensitive to outliers.
  - Outliers are data points that are very far away from other data points.
  - Outliers could be errors in the data recording or special data points with very different values.
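To make the categorical-data point concrete, a small sketch of a most-frequent-value "centroid" (the records here are invented for illustration):

```python
from collections import Counter

def mode_centroid(records):
    """Per-attribute most frequent value across a cluster of categorical
    records (each record is a tuple of category labels)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_centroid(cluster))  # ('red', 'small')
```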


K-Means Summary

- Despite its weaknesses, k-means is still the most popular algorithm due to its simplicity and efficiency; other clustering algorithms have their own lists of weaknesses.

- There is no clear evidence that any other clustering algorithm performs better in general, although some may be more suitable for specific types of data or applications.

- Comparing different clustering algorithms is a difficult task. No one knows the correct clusters!
