linked based clustering and its theoretical foundations paper written by margareta ackerman and shai...

53
Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Presented by Yan T. Yang Yan T. Yang

Upload: alec-weare

Post on 15-Dec-2015

216 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linked Based Clustering and Its Theoretical Foundations

Paper written by Margareta Ackerman and Shai Ben-David

Presented by Yan T. YangYan T. Yang

Page 2: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Outline

Hierarchical clustering Linkage based clustering Theoretical foundation of clustering Characterization of linkage based

clustering

Page 3: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Outline

Hierarchical clustering Linkage based clustering Theoretical foundation of clustering Characterization of linkage based

clustering

Page 4: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Hierarchical clustering

Agglomerative Divisive

Page 5: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Hierarchical clustering

A

D

E

B

C

A B C D E

Raw Data Dendrogram

F

F

Page 6: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

Raw Data Dendrogram

A B C D E F

A

D

E

B

C

F

Page 7: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

Raw Data Dendrogram

A B C D E F

A

D

E

B

C

F

Page 8: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

A

D

E

B

C

Raw Data Dendrogram

F

A B C D E F

Page 9: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

A

D

E

B

C

Raw Data Dendrogram

F

A B C D E F

Page 10: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

Raw Data Dendrogram

A B C D E F

A

D

E

B

C

F

Page 11: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

Raw Data Dendrogram

A B C D E F

A

D

E

B

C

F

General Complexity: O(n3)

Page 12: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Agglomerative clustering

Raw Data Dendrogram

General Complexity: O(n3)

Linkage Based: O(n2)

Page 13: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage Based Clustering

General Complexity: O(n3)

Linkage Based: O(n2)

Whenever a pair of points share the same cluster,

they are connected via a tight chain of points.

A

D

E

B

C

F

Page 14: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage Based Clustering

General Complexity: O(n3)

Linkage Based: O(n2)

Whenever a pair of points share the same cluster,

they are connected via a tight chain of points.

A

D

E

B

C

F

3. Repeat until only k clusters remain.

Steps:

1. Start: singletons

2. At each step, merge the closest pair of clusters

Page 15: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering

Linkage Based: O(n2)

Dissimilarity Measure Linkage Criteria

A measure of distance between pairof observations.

A measure of distance between setsof observations

as function of dissimilarity measure.

Page 16: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering

Linkage Based: O(n2)

Euclidean distance

Minkowski distance (generalized Euclidean)

Mahalahobis distance (scaled Euclidean)

City block distance ( |*| version of Euclidean)

Dissimilarity Measure Linkage Criteria

Page 17: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering

Linkage Based: O(n2)

Dissimilarity Measure Linkage Criteria

Between two sets A and B

},:),(min{ :Linkage Single BbAabad

},:),(max{ :Linkage Complete BbAabad

Aa Bb

bad ),(|B||A|

1 :Linkage Average

Page 18: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering

Linkage Based: O(n2)

Linkage Criteria

Between two sets A and B

},:),(min{ :Linkage Single BbAabad

},:),(max{ :Linkage Complete BbAabad

Aa Bb

bad ),(|B||A|

1 :Linkage Average

Minimum Spanning Tree

Successive Cliques

UPGMA tree

Page 19: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering

Linkage Based: O(n2)

Linkage Criteria

Between two sets A and B

},:),(min{ :Linkage Single BbAabad

},:),(max{ :Linkage Complete BbAabad

Aa Bb

bad ),(|B||A|

1 :Linkage Average

SLINK [Defays]

CLINK [Sibson]

UPGMA [Murtagh]

Page 20: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 21: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 22: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 23: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 24: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 25: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(min{ :Linkage Single BbAabad

Add to cluster as long as there is a link

Page 26: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 27: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 28: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 29: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 30: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 31: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 32: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 33: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 34: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Linkage-based clustering},:),(max{ :Linkage Complete BbAabad

Add to cluster when a clique is formed

Page 35: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Many different clustering techniques: which one to use ?

Identify properties that separate input-output behavior of different clustering paradigms: [Ackerman, Ben-David, Loker]

1) Be intuitive and meaningful

2) Differentiate algorithms

The properties should

Page 36: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

[Kleinberg, 2002]: abstract properties OR axioms.

[Zadeh, Ben-David 2009]: properties to characterize single linkage.

[Ackerman, Ben-David, Loker] : properties to characterize all linkage.

Page 37: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

Clustering function

Hierarchical

Page 38: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of some clusters in C.

CC’

Refinement

Hierarchical

Clustering function

Hierarchical

Page 39: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

A clustering function F(X,d,k):

Clustering function

Hierarchical

1. Take input X, all data points2. Distance measure, d

3. Number of clusters desired, k.4. Output the clusters

Page 40: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

A clustering function is hierarchical if forall X and all d and every 1<= k <= k’ <= |X| F(X,d,k’) is a refinement of F(X,d,k)

Clustering function

A B C D E F

6

43

2

Hierarchical

Page 41: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

Clustering function)||,,(

:any and any for if local is F

Cc

CdcFC

F(X,d,k)C X, d, k

C

Hierarchical

Page 42: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

Clustering function . and , allfor then distances,

cluster -between increasingfor except ,equals If

kd, X(X,d’,k)F(X,d,k)=F

dd’

d d’

Hierarchical

Page 43: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

Clustering function

Locality and outer consistency

Not axioms

Spectral clustering with

Ratio region

Normalized cut

does not satisfy these two.

Hierarchical

Page 44: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Definition

Refinement

Hierarchical

Clustering function ),( 11 dX ),( 22 dX ),( 33 dX

),( 321 dXXX Hierarchical

Page 45: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Theorem:

The clustering function is linkage based

If and only if

All of the properties are satisfied.

Hierarchical

Page 46: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Theorem:

The clustering function is linkage based

If and only if

All of the properties are satisfied.

properties are satisfied.

Properties

Extended richness

Outer consistency

Locality

Hierarchical

linkage based function

Page 47: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Theorem:

The clustering function is linkage based

If and only if

All of the properties are satisfied.

linkage based function properties are satisfied.

Straightforward

Properties

Extended richness

Outer consistency

Locality

Hierarchical

Page 48: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Theorem:

The clustering function is linkage based

If and only if

All of the properties are satisfied.

properties are satisfied.

Properties

Extended richness

Outer consistency

Locality

Hierarchical

linkage based function

Page 49: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Hierarchical

linkage based function

A linkage function is a function l: {(X1,X2,d) : d is a dissimilarity function over X1 U X2} → R+

that satisfies the following:},:),(min{ :Linkage Single BbAabad

},:),(max{ :Linkage Complete BbAabad

Aa Bb

bad ),(|B||A|

1 :Linkage Average

- Representation independent

- Monotonic

- Any pair of clusters can be made arbitrarily distant

Page 50: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation

Properties

Extended richness

Outer consistency

Locality

Hierarchical

linkage based function

Goal

Given a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k).

F

Linkage-based

Page 51: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical FoundationGiven a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k).

F

Linkage-based

Define an operator <F: (A,B,d1) <F (C,D,d2) if there exists d that extends d1 and d2 such that when we run F on (A U B U C U D), A and B are merged before C and D.

A

B

C

D

F(AUBUCUD,d,4) A

B

C

D

F(AUBUCUD,d,3)

Page 52: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical FoundationGiven a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k).

F

Linkage-based

Define an operator <F: (A,B,d1) <F (C,D,d2) if there exists d that extends d1 and d2 such that when we run F on (A U B U C U D), A and B are merged before C and D.

A

B

C

D

F(AUBUCUD,d,4) A

B

C

D

F(AUBUCUD,d,3)

Page 53: Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

Theoretical Foundation- Prove that <F can be extended to a partial ordering - Use the ordering to define l

Lemma (<F is cycle-free)

Given a function F that is hierarchical, local, outer-consistent and satisfies extended richness, there are no (A1,B1,d1), (A2,B2,d2),… (An,Bn,dn) so that

(A1,B1,d1) <F (A2,B2,d2) … <F (An,Bn,dn)and (A1,B1,d1) = (An,Bn,dn)

By the above Lemma, the transitive closure of <F is a partial ordering.

There exists an order preserving function l that maps pairs of data sets to R (since <F is defined over a countable set)

It can be shown that l satisfies the properties of a linkage function.