reza bosagh zadeh (carnegie mellon) shai ben-david (waterloo) uai 09, montréal, june 2009 a...

26
Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

Upload: samuel-nelson

Post on 21-Jan-2016

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

Reza Bosagh Zadeh (Carnegie Mellon)

Shai Ben-David (Waterloo)

UAI 09, Montréal, June 2009

A UNIQUENESS THEOREMFOR CLUSTERING

Page 2: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

TALK OUTLINE

Questions being addressed

Introduce Axioms

Introduce Properties

Uniqueness Theorem for Single-Linkage

Taxonomy of Partitioning Functions

Bosagh Zadeh, Ben-David, UAI 2009

Page 3: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

WHAT IS CLUSTERING?

Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.

40 45 50 55

74

76

78

80

82

84

Bosagh Zadeh, Ben-David, UAI 2009

Page 4: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

SOME BASIC UNANSWERED QUESTIONS

Are there principles governing all clustering paradigms?

Which clustering paradigm should I use for a given task?

Bosagh Zadeh, Ben-David, UAI 2009

Page 5: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

“Clustering” is an ill-defined problem

There are many different clustering tasks, leading to different clustering paradigms:

THERE ARE MANY CLUSTERING TASKS

Bosagh Zadeh, Ben-David, UAI 2009

Page 6: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

“Clustering” is an ill-defined problem

There are many different clustering tasks, leading to different clustering paradigms:

THERE ARE MANY CLUSTERING TASKS

Bosagh Zadeh, Ben-David, UAI 2009

Page 7: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING

Independently of any particular algorithm, particular objective function, or particular generative data model

Bosagh Zadeh, Ben-David, UAI 2009

Page 8: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

WHAT FOR?

Choosing a suitable algorithm for a given task.

Axioms: to capture intuition about clustering in general.

Expected to be satisfied by all clustering functions

Properties: to capture differences between different clustering paradigms

Bosagh Zadeh, Ben-David, UAI 2009

Page 9: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

TIMELINE

Jardine, Sibson 1971 Considered only hierarchical functions

Kleinberg 2003 Presented an impossibility result

This paper Presents a uniqueness result for Single-Linkage

Bosagh Zadeh, Ben-David, UAI 2009

Page 10: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

THE BASIC SETTING

For a finite domain set SS, a distance function is a symmetric mapping

d:SxSd:SxS → RR++ such that d(x,y)=0d(x,y)=0 iff x=yx=y.

A partitioning function takes a dissimilarity function on SS and returns a partition of SS.

We wish to define the axioms that distinguish clustering functions, from any other functions that output domain partitions.

Bosagh Zadeh, Ben-David, UAI 2009

Page 11: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive

λλ.

RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings

ConsistencyIf d’d’ equals dd except for shrinking

distances within clusters of F(d)F(d) or stretching between-cluster distances,

then F(d) = F(d’).F(d) = F(d’).

Bosagh Zadeh, Ben-David, UAI 2009

Page 12: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

KLEINBERG’S AXIOMS Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive

λλ.

RichnessThe range of F(d)F(d) over all dd is the set of all possible partitionings

ConsistencyIf d’d’ equals dd except for shrinking

distances within clusters of F(d)F(d) or stretching between-cluster distances,

then F(d) = F(d’).F(d) = F(d’).Inconsistent! No algorithm can satisfy all 3 of these.

Bosagh Zadeh, Ben-David, UAI 2009

Page 13: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

CONSISTENT AXIOMS: Scale Invariance F(F(λλd, k)=F(d, k)d, k)=F(d, k) for all d d and all strictly positive

λλ.

k-RichnessThe range of F(d, k)F(d, k) over all dd is the set of all possible k-k-partitionings

ConsistencyIf d’d’ equals dd except for shrinking distances

within clusters of F(d, k)F(d, k) or stretching between-cluster distances,

then F(d, k)=F(d’, k).F(d, k)=F(d’, k).Consistent! (And satisfied by Single-Linkage, Min-Sum, …)

Fix kk

Bosagh Zadeh, Ben-David, UAI 2009

Page 14: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

Definition. Call any partitioning function which satisfies

a Clustering Function

Scale Invariancek-RichnessConsistency

CLUSTERING FUNCTIONS

Bosagh Zadeh, Ben-David, UAI 2009

Page 15: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

TWO CLUSTERING FUNCTIONS

Single-Linkage

1. Start with with all points in their own cluster

2. While there are more than k clusters Merge the two most

similar clusters

Similarity between two clusters is the similarity of the most similar two

points from differing clusters

Min-Sum k-Clustering

Find the k-partitioning Γ which minimizes

(Is NP-Hard to optimize)

Scale Invariancek-RichnessConsistency

Both Functions satisfy:

Hierarchical

Not Hierarchical

Proofs in paper.

Bosagh Zadeh, Ben-David, UAI 2009

Page 16: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

CLUSTERING FUNCTIONS

Single-Linkage and Min-Sum are both Clustering functions.

How to distinguish between them in an Axiomatic framework? Use Properties

Not all properties are desired in every clustering situation: pick and choose properties for your task

Bosagh Zadeh, Ben-David, UAI 2009

Page 17: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

PROPERTIES - ORDER-CONSISTENCY

Order-ConsistencyIf two datasets dd and d’d’ have the same

ordering of the distances, then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)

Bosagh Zadeh, Ben-David, UAI 2009

o In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points.

o Satisfied by Single-Linkage, Max-Linkage, Average-Linkage…

o NOT satisfied by most objective functions (Min-Sum, k-means, …)

Page 18: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

PATH-DISTANCE

In other words, we find the path from x to y, which has the smallest longest jump in it.

1 2

7

14

e.g.

Pd( , ) = 2

Since the path from above has a jump of distance 2

Undrawn edges are large

Bosagh Zadeh, Ben-David, UAI 2009

Page 19: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

PATH-DISTANCE

Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks.

Being human, we are restricted in how far we can jump from island to island.

Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.

Bosagh Zadeh, Ben-David, UAI 2009

Page 20: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

PROPERTIES – PATH-DISTANCE COHERENCE

Path-Distance CoherenceIf two datasets dd and d’d’ have the same

induced path distance then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)

Bosagh Zadeh, Ben-David, UAI 2009

Page 21: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

UNIQUENESS THEOREM

Theorem (This work) Single-Linkage is the only clustering function

satisfying Order-Consistency and Path Distance-Coherence

Bosagh Zadeh, Ben-David, UAI 2009

Page 22: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

UNIQUENESS THEOREM

Theorem (This work) Single-Linkage is the only clustering function satisfying

Order-Consistency and Path-Distance-Coherence

Is Path-Distance-Coherence doing all the work? No. Consistency is necessary for uniqueness k-Richness is necessary

“X is Necessary”: All other axioms/properties satisfied, just X missing, still not enough to get uniqueness

Bosagh Zadeh, Ben-David, UAI 2009

Page 23: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

PRACTICAL CONSIDERATIONS

Single-Linkage is not always the right function to use. Because Path-Distance-Coherence is not always

desirable.

It’s not always immediately obvious when we want a function to focus on the Path Distance

Introduce a different formulation involving Minimum Spanning Trees

Bosagh Zadeh, Ben-David, UAI 2009

Page 24: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

2020

PROPERTIES - MST-COHERENCE

F

If

Then

MST-CoherenceIf two datasets dd and d’d’ have the same

Minimum Spanning Tree then for all kk, F(d, k)=F(d’, k)F(d, k)=F(d’, k)

, 2

F, 220

20 , 2

Bosagh Zadeh, Ben-David, UAI 2009

Page 25: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

A TAXONOMY OF CLUSTERING FUNCTIONS

Min-Sum satisfies neither MST-Coherence nor Order-Consistency

Future work: Characterize other clustering functions

Bosagh Zadeh, Ben-David, UAI 2009

Page 26: Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

THANKS FOR YOUR ATTENTION!

Bosagh Zadeh, Ben-David, UAI 2009