Clustering Algorithms
Margareta Ackerman
A sea of algorithms
• As we discussed last class, there are MANY clustering algorithms, and new ones are proposed all the time.
• They are very different from each other!
Input/output
• There are clustering algorithms for a wide variety of input and output types. Today, we will focus on the most popular type.
• Input: (X, d) and k, where
  1. X is a set of elements (think of them as the labels of the points)
  2. d: X × X → R+ is a dissimilarity function
  3. k is the desired number of clusters, 1 ≤ k ≤ |X|
• Output: a partition of X into k sets {C1, C2, …, Ck}, where
  1. Ci ∩ Cj = ∅ for all i ≠ j
  2. C1 ∪ C2 ∪ … ∪ Ck = X
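For concreteness, here is a minimal Python sketch of one such input instance; the labels, coordinates, and dissimilarity function are made up for illustration:

```python
# A concrete (X, d, k) instance: five labeled points on the real line.
X = ["a", "b", "c", "d", "e"]                  # the set of elements (labels)
coords = {"a": 0.0, "b": 0.1, "c": 0.2, "d": 5.0, "e": 5.1}
d = lambda x, y: abs(coords[x] - coords[y])    # dissimilarity d: X x X -> R+
k = 2                                          # desired number of clusters
```

A valid output here would be the partition {{a, b, c}, {d, e}}.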
Linkage-Based Algorithms
• Start by placing each point in its own cluster
• Then, merge the two “closest” clusters
• Continue merging the two “closest” clusters until exactly k clusters remain
Linkage-Based Algorithms: More detail
Start by placing each point in its own cluster
Calculate and store the distance between each pair of clusters
While there are more than k clusters:
  - Let A, B be the two closest clusters
  - Add cluster A ∪ B
  - Remove clusters A and B
  - Find the distance between A ∪ B and all other clusters
Examples of linkage-based algorithms
• How do we define the distance between clusters?
• Common examples:
  – Single-linkage: minimum between-cluster distance
  – Average-linkage: average between-cluster distance
  – Complete-linkage: maximum between-cluster distance
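A minimal Python sketch of the procedure above, with the three linkages as plug-in functions (the names and the brute-force closest-pair search are illustrative, not an optimized implementation):

```python
import itertools

def linkage_cluster(points, d, k, linkage):
    """Merge the two closest clusters until exactly k clusters remain."""
    clusters = [frozenset([p]) for p in points]        # each point in its own cluster
    while len(clusters) > k:
        # find the two clusters that are closest under the chosen linkage
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)                         # merge A and B
    return clusters

def single(A, B, d):    # min between-cluster distance
    return min(d(x, y) for x in A for y in B)

def complete(A, B, d):  # max between-cluster distance
    return max(d(x, y) for x in A for y in B)

def average(A, B, d):   # average between-cluster distance
    return sum(d(x, y) for x in A for y in B) / (len(A) * len(B))
```

For example, with the (X, d, k) instance above, `linkage_cluster(X, d, 2, single)` groups the points into {a, b, c} and {d, e}.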
Hierarchical algorithms
Linkage-based algorithms are often applied in the hierarchical setting, where the algorithm outputs an entire tree of clusterings.
Hierarchical linkage-based algorithms are similar to the partitional versions we saw here (more about the hierarchical setting later).
K-means
• Perhaps the most popular clustering algorithm!
• Often applied to data in Euclidean space.
K-means Objective Function
Given a clustering {C1, C2, …, Ck}, the k-means objective function is

$$\sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

where µi is the mean of Ci. That is,

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

The ideal goal is to find a clustering with the minimum k-means cost, but that can take too long (it’s NP-hard).
So instead, we apply a heuristic: an algorithm that, in practice, tends to find clusterings with low k-means cost.
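As a quick sketch, the cost is straightforward to compute; here clusters are assumed to be NumPy arrays of points (one row per point):

```python
import numpy as np

def kmeans_cost(clusters):
    """Sum of squared distances from each point to its cluster's mean."""
    return sum(((C - C.mean(axis=0)) ** 2).sum() for C in clusters)
```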
Lloyd’s method
Pick k points (call them “centers”)
Until convergence:
  - Assign each point to its closest center. This gives us k clusters.
  - Compute the mean of each cluster.
  - Let these means be the new centers.

The algorithm converges when the clusters don’t change in two consecutive iterations.
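A minimal NumPy sketch of Lloyd’s method, assuming X is an (n, dim) array and the initial centers are given; empty clusters, which can arise in practice, are not handled here:

```python
import numpy as np

def lloyd(X, centers):
    """Lloyd's method from a given (k, dim) array of initial centers."""
    labels = None
    while True:
        # assign each point to its closest center
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        # converged: clusters unchanged in two consecutive iterations
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centers
        labels = new_labels
        # the mean of each cluster becomes its new center
        centers = np.stack([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
```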
Variations of Lloyd’s method
How could we initialize the centers?

Furthest centroids:
  - Pick one random center c1.
  - Set c2 to the furthest point from c1.
  - Set each subsequent ci to the point with the largest minimum distance to the centers already chosen.
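A sketch of furthest-centroid initialization (the function name and the use of squared distances are my choices; any monotone distance behaves the same way):

```python
import numpy as np

def furthest_centroids(X, k, rng=None):
    """Greedy furthest-point initialization for Lloyd's method."""
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]       # c1: a uniformly random point
    while len(centers) < k:
        # each point's distance to its nearest already-chosen center
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])      # largest minimum distance
    return np.stack(centers)
```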
Variations of Lloyd’s method
How could we initialize the centers?

Random: Pick k random initial centers.
Using this approach, we might end up in a “local optimum.”
So, we run the algorithm many times (~100) to completion and pick the minimum-cost clustering.
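A sketch of that restart loop, reusing the `lloyd` and `kmeans_cost` sketches above (the 100-run default mirrors the slide; it is a heuristic, not a guarantee of optimality):

```python
import numpy as np

def lloyd_random_restarts(X, k, runs=100, rng=None):
    """Run Lloyd's method from many random initializations; keep the cheapest."""
    rng = rng or np.random.default_rng()
    best_cost, best = float("inf"), None
    for _ in range(runs):
        init = X[rng.choice(len(X), size=k, replace=False)]  # k random data points
        labels, centers = lloyd(X, init)
        cost = kmeans_cost([X[labels == j] for j in range(k)])
        if cost < best_cost:
            best_cost, best = cost, (labels, centers)
    return best
```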
Lloyd’s method with random centers
• Picking random centers works VERY WELL in practice.
• In particular, it works much better than furthest centroids.
• It works so well that “k-means” is synonymous with this approach.
• Does Lloyd’s method with random centers always find the optimal k-means solution? No.
• We will see other ways to initialize Lloyd’s method.
K-median
Like k-means, except that we do not square the distance to the center.
Given a clustering {C1, C2, …, Ck}, the k-median objective function is

$$\sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert$$

where µi is the mean of Ci. That is,

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
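The cost sketch above changes in exactly one place: the distances are no longer squared. A minimal version, again assuming clusters are NumPy arrays:

```python
import numpy as np

def kmedian_cost(clusters):
    """Sum of (unsquared) distances from each point to its cluster's mean."""
    return sum(np.linalg.norm(C - C.mean(axis=0), axis=1).sum() for C in clusters)
```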
K-medoids!
Like k-means, except that the centers must be part of the data set. !!
Given a clustering {C1, C2, …, Ck}, the k-medoids objective function is !
!
!
where that minimizes the above sum. !
ci 2 Ci
kX
i=1
X
x2Ci
kx� c
i
k2
17
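A brute-force cost sketch: for each cluster, try every member as the medoid and keep the best. This is fine for small clusters; practical k-medoids solvers avoid the full scan:

```python
import numpy as np

def kmedoids_cost(clusters):
    """For each cluster, the best sum of squared distances to a member point."""
    return sum(min(((C - c) ** 2).sum(axis=1).sum() for c in C)
               for C in clusters)
```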
Min-sum
Given a clustering {C1, C2, …, Ck}, the min-sum objective function is

$$\sum_{i=1}^{k} \sum_{x, y \in C_i} d(x, y)$$
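A direct sketch of this sum, taking clusters as lists of elements together with the dissimilarity d (all ordered pairs are counted, including x = y, where d(x, x) = 0):

```python
def min_sum_cost(clusters, d):
    """Sum of within-cluster pairwise dissimilarities."""
    return sum(d(x, y) for C in clusters for x in C for y in C)
```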
Differences in Input/Output Behavior of Clustering Algorithms
[Figure: the same data set clustered by single-linkage vs. k-means]
Differences in Input/Output Behavior of Clustering Algorithms
[Figure: clusterings produced by single-linkage, average-linkage, complete-linkage, and min-diameter vs. k-means, k-median, and k-medoids]
There are a wide variety of clustering algorithms, which can produce very different clusterings.
The User’s Dilemma
How should a user decide which algorithm to use for a given application?
Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc.
There is inadequate emphasis on input-output behaviour.
Clustering Algorithm Selection
A framework that lets a user utilize prior knowledge to select an algorithm:
• Identify properties that distinguish between the input-output behaviour of different clustering paradigms
• The properties should be:
  1. Intuitive and “user-friendly”
  2. Useful for distinguishing between clustering algorithms
Framework for Algorithm Selection (Ackerman, Ben-David, and Loker, NIPS 2010)
The goal is to understand fundamental differences between clustering methods, and convey them formally, clearly, and as simply as possible.
Property-based classification for fixed k (Ackerman, Ben-David, and Loker, NIPS 2010)

                  Local  Outer Con.  Inner Con.  Consistent  Refin. Pres.  Order Inv.  Rich  Outer Rich  Rep. Ind.  Scale Inv.
Single linkage      ✓        ✓           ✓           ✓            ✓            ✓         ✓        ✓          ✓          ✓
Average linkage     ✓        ✓           ✗           ✗            ✓            ✗         ✓        ✓          ✓          ✓
Complete linkage    ✓        ✓           ✗           ✗            ✓            ✓         ✓        ✓          ✓          ✓
K-means             ✓        ✓           ✗           ✗            ✗            ✗         ✓        ✓          ✓          ✓
K-medoids           ✓        ✓           ✗           ✗            ✗            ✗         ✓        ✓          ✓          ✓
Min-Sum             ✓        ✓           ✓           ✓            ✗            ✗         ✓        ✓          ✓          ✓
Ratio-cut           ✗        ✗           ✓           ✓            ✗            ✗         ✓        ✓          ✓          ✓
Normalized-cut      ✗        ✗           ✗           ✗            ✗            ✗         ✓        ✓          ✓          ✓

(✓ = satisfies the property; ✗ = does not)
Kleinberg’s axioms for fixed k
Kleinberg’s axioms are consistent when k is given.
Single-linkage satisfies everything
Single linkage satisfies ALL of these properties (see its row in the table above)!
So should we just use single linkage all the time?
No: it’s not a good clustering algorithm in practice!
What’s Left To Be Done?
Despite much work on clustering properties, some basic questions remained unanswered.
Consider some of the most popular clustering methods: k-means, single-linkage, average-linkage, etc.
• How do these algorithms differ in their input-output behavior?
• What are the advantages of k-means over other methods?
• We were missing some key properties.
More on that in our next class!