theoretical foundations of clustering – cs497 talk shai ben-david february 2007

22
Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Upload: belinda-potter

Post on 17-Jan-2016

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Theoretical Foundations of Clustering – CS497 Talk

Shai Ben-David

February 2007

Page 2: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

What is clustering?

Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.

40 45 50 55

74

76

78

80

82

84

Page 3: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Another example

Page 4: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Why should we care about clustering?

Clustering is a basic step in most data mining procedures:

Examples :

Clustering movie viewers for movie ranking.

Clustering proteins by their functionality.

Clustering text documents for content similarity.

Page 5: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Clustering is one of the most widely used toolfor exploratory data analysis. Social Sciences Biology Astronomy Computer Science . .All apply clustering to gain a first understanding of the structure of large data sets.

The Theory-Practice Gap

Yet, there exist distressingly little theoretical understanding of clustering

Page 6: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

What questions should research address? What is clustering?

What is “good” clustering?

Can clustering be carried out efficiently?

Can we distinguish “clusterable” from “structureless” data?

Many more …

Page 7: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

“Clustering” is an ill defined problem

There are many different clustering tasks, leading to different clustering paradigms:

There are Many Clustering TasksThere are Many Clustering Tasks

Page 8: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

“Clustering” is an ill defined problem

There are many different clustering tasks, leading to different clustering paradigms:

There are Many Clustering TasksThere are Many Clustering Tasks

Page 9: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Some more examples

-2 0 2

-3-2

-10

12

3

2-d data set

-2 0 2

-3-2

-10

12

3

Compact partitioning into tw o strata

-2 0 2

-3-2

-10

12

3

Unsupervised learning

Page 10: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

I would like to discuss the broad notion of clustering

Independently of any particular algorithm, objective function or generative data model

Page 11: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

What for?

Choosing a suitable algorithm for a given task.

Evaluating the quality of clustering methods. Distinguishing significant structure from random fata morgana,

Distinguishing clusterable data from structure-less data.

Providing performance guarantees for clustering algorithms.

Much more …

Page 12: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

The Basic Setting

For a finite domain set SS, a dissimilarity function (DF) is a symmetric mapping

d:SxSd:SxS → RR++ such that d(x,y)=0d(x,y)=0 iff x=yx=y.

A clustering function takes a dissimilarity function on SS and returns a partition of SS.

We wish to define the properties that distinguish clustering functions (from any other functions that output domain partitions).

Page 13: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Kleinberg’s Axioms

Scale Invariance F(F(λλd)=F(d)d)=F(d) for all d d and all strictly positive λλ.

Richness For any finite domain SS, {F(d): d {F(d): d is a DF over S}={P:P S}={P:P a partition of S} S}

Consistency d’d’ equals dd except for shrinking distances within

clusters of F(d)F(d) or stretching between-cluster distances (w.r.t. F(d)F(d)), then F(d)=F(d’).F(d)=F(d’).

Page 14: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Note that any pair is realizable

Consider Single-Linkage with different stopping criteria:

k connected components. Distance r stopping. Scale α stopping:

add edges as long as their length is

at most α(max-distance)

Page 15: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Kleinberg’s Impossibility result

There exist no clustering function

Proof:

Scaling up

Consistency

Page 16: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

A Different Perspective

The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.

Axioms as a tool for classifying clustering paradigms

Page 17: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Axioms as a tool for a taxonomy of clustering paradigms

The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.

Scale Invariance

Richness Local Consistency

Full Consistency

Single Linkage - + + +Center Based + + + -Spectral + + + -MDL + + -Rate Distortion + + -

“Axioms” “Properties”

Page 18: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Ideal Theory

We would like to have a list of simple properties so that major clustering methods are distinguishable from

each other using these properties.

We would like the axioms to be such that all methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them.

(this is probably too much to hope for).

In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of

what this theory-development program may involve.

Page 19: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Types of Axioms/Properties

Richness requirements

E.g., relaxations of Kelinberg’s richness, e.g.,

{F(d): d {F(d): d is a DF over S}={P:P S}={P:P a partition of S S into k k sets}}

Invariance/Robustness/Stability requirements.

E.g., Scale-Invariance, Consistency, robustness

to perturbations of dd (“smoothness” of F F) or stability

w.r.t. sampling of SS.

Page 20: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Relaxations of Consistency

Local Consistency –

Let CC11, …C…Ckk be the clusters of F(d). F(d).

For every λλ0 0 ≥ 1 ≥ 1 and positive λλ11, .., ..λλk k ≤≤ 11, if d’d’ is defined by:

λλiid(a,b)d(a,b) if aa and bb are in CCii

d’(a,b)=d’(a,b)=

λλ00d(a,b)d(a,b) if a,ba,b are not in the same F(d)F(d)--cluster,

then F(d)=F(d’).F(d)=F(d’).

Is there any known clustering method for which it fails?

Page 21: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Other types of clustering

Culotta and McCallum’s “Clusterwise Similarity”

Edge-Detection (advantage to smooth contours)

Texture clustering

The professors example.

Page 22: Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

Conclusions and open questions

There is a place for developing an axiomatic framework for clustering.

The existing negative results do not rule

out the possibility of useful axiomatization. We should also develop a system of “clustering

properties” for a taxonomy of clustering methods. There are many possible routes to take and

hidden subtleties in this project.