Iterative Optimization of Hierarchical Clusterings

Doug Fisher, Department of Computer Science, Vanderbilt University

Journal of Artificial Intelligence Research 4 (1996) 147-179

Presentation: Yugong Cheng

Page 1:

Iterative Optimization of Hierarchical Clusterings

Doug Fisher, Department of Computer Science, Vanderbilt University

Journal of Artificial Intelligence Research 4 (1996) 147-179

Presentation: Yugong Cheng

04/23/02

Page 2:

Outline

• Introduction
• Objective Function
• Iterative Optimization Methods and Experiments
• Simplification of Hierarchical Clustering
• Conclusion
• Final Exam Questions Summary

Page 3:

Introduction

• Clustering is an unsupervised learning process that groups objects into clusters.

• Major Clustering Methods
– Partitioning
– Hierarchical
– Density-based
– Grid-based
– Model-based

Page 4:

Introduction (Continued)

• Clustering systems differ in
– objective function
– control strategy

• Usually a search strategy cannot be both computationally inexpensive and able to give any guarantee about clustering quality.

Page 5:

Introduction (Continued)

This paper discusses the use of iterative optimization and simplification to construct clusters that satisfy both conditions:

• High quality

• Computationally inexpensive

The suggested method involves two steps:

• Constructing a clustering inexpensively

• Using an iterative optimization method to improve the clustering

Page 6:

Category Utility

• CU(Ck) = P(Ck) Σi Σj [P(Ai = Vij | Ck)² − P(Ai = Vij)²]

• PU({C1, C2, …, CN}) = Σk CU(Ck) / N

where an observation is a vector of values Vij along attributes (or variables) Ai

• This measure rewards clusters Ck that increase the predictability of the Vij within Ck (i.e., P(Ai = Vij | Ck)) relative to their predictability in the population as a whole (i.e., P(Ai = Vij))
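These two formulas can be sketched in Python for nominal data; the function and variable names below are my own, not the paper's:

```python
from collections import Counter

def category_utility(cluster, population):
    """CU(Ck) = P(Ck) * sum_ij [ P(Ai=Vij|Ck)^2 - P(Ai=Vij)^2 ].
    Observations are tuples of nominal values, one slot per attribute Ai."""
    n = len(population)
    p_ck = len(cluster) / n
    score = 0.0
    for i in range(len(population[0])):
        in_cluster = Counter(obs[i] for obs in cluster)
        in_pop = Counter(obs[i] for obs in population)
        for v in in_pop:  # every value Vij of attribute Ai in the population
            p_cond = in_cluster.get(v, 0) / len(cluster)
            p_base = in_pop[v] / n
            score += p_cond ** 2 - p_base ** 2
    return p_ck * score

def partition_utility(clusters):
    """PU = average CU over the N clusters of the partition."""
    population = [obs for c in clusters for obs in c]
    return sum(category_utility(c, population) for c in clusters) / len(clusters)
```

For cleanly separable data, a pure partition scores a strictly higher PU than a mixed one, which is the behavior the search methods in later slides exploit.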

Page 7:

Page 8:

Hierarchical Sorting

• Given an observation and the current partition, evaluate the quality of the clusterings that result from
– placing the observation in each of the existing clusters
– creating a new cluster that covers only the new observation

• Select the option that yields the highest quality score (PU)
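A minimal sketch of this sorting step, under the assumption that observations are tuples of nominal values; pu here is a compact version of the partition-utility measure above, and all names are illustrative:

```python
from collections import Counter

def pu(partition):
    """Partition utility over clusters of nominal observations (tuples)."""
    pop = [o for c in partition for o in c]
    n, total = len(pop), 0.0
    for cluster in partition:
        s = 0.0
        for i in range(len(pop[0])):
            counts = Counter(o[i] for o in cluster)
            base = Counter(o[i] for o in pop)
            for v in base:
                s += (counts.get(v, 0) / len(cluster)) ** 2 - (base[v] / n) ** 2
        total += (len(cluster) / n) * s
    return total / len(partition)

def sort_observation(obs, partition):
    """Return the highest-PU way to place obs: into one of the existing
    clusters, or into a brand-new singleton cluster."""
    candidates = [partition[:k] + [partition[k] + [obs]] + partition[k + 1:]
                  for k in range(len(partition))]
    candidates.append(partition + [[obs]])   # option: new singleton cluster
    return max(candidates, key=pu)
```

Sorting ('a','x') into the partition [[('a','x')], [('b','y')]] joins it to the matching cluster rather than spawning a third one.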

Page 9:

Page 10:

Iterative Optimization Methods

• Reorder-resort (Cluster/2): seed selection, reordering, and re-clustering.

• Iterative redistribution of single observations: moving single observations one by one.

• Iterative hierarchical redistribution: moving a cluster together with its sub-tree.

Page 11:

Reorder-resort (k-means)

k-means: k random seeds are selected, and k clusters are grown around these attractors; the centroids of the clusters are picked as new seeds, and new clusters are grown around them. The process iterates until there is no further improvement in the quality of the generated clustering.
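The loop just described can be sketched as plain k-means over numeric points with Euclidean distance (a generic sketch, not the paper's code; names are mine):

```python
import random

def kmeans(points, k, rng=None):
    """k seeds are selected, clusters grow around them, centroids become the
    new seeds; iterate until the assignment stops changing."""
    rng = rng or random.Random(0)
    seeds = rng.sample(points, k)
    assignment = None
    while True:
        # grow clusters: assign each point to its nearest seed
        new_assignment = [
            min(range(k),
                key=lambda j: sum((p - s) ** 2 for p, s in zip(pt, seeds[j])))
            for pt in points
        ]
        if new_assignment == assignment:   # no further improvement
            return seeds, assignment
        assignment = new_assignment
        # centroids of the grown clusters become the new seeds
        for j in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:
                seeds[j] = tuple(sum(d) / len(members) for d in zip(*members))
```

On two well-separated groups of points, any pair of distinct seeds converges to the natural two-cluster split.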

Page 12:

Reorder-resort (k-means)

• Ordering data so that consecutive observations are dissimilar (by Euclidean distance) leads to good clusterings

• Extracting a biased “dissimilarity” ordering from the hierarchical clustering

• Initial sorting, extraction of a dissimilarity ordering, re-clustering

Page 13:

Iterative Redistribution of Single Observations

• Moves single observations from cluster to cluster

• A cluster that contains only one observation is removed, and its single observation is resorted

• Iterate until two consecutive iterations yield the same clustering

Page 14:

Single Observation Redistribution Variations

• The ISODATA algorithm determines a target cluster for each observation, but does not move any observation until targets for all observations have been determined

• A sequential version moves each observation as soon as its target is identified through sorting

Page 15:

Iterative Hierarchical Redistribution

• Takes large steps in the search for a better clustering

• Resorts sub-trees instead of single observations

• Sub-tree removal requires that the variable-value counts of its ancestors be decremented; likewise, the host cluster’s variable-value counts must be incremented.

Page 16:

Scheme

• Given an existing hierarchical clustering, a recursive loop examines sibling clusters in the hierarchy in a depth first fashion.

• An inner, iterative loop examines each sibling based on the objective function, and repeats until two consecutive iterations lead to the same set of siblings.

Page 17:

(Continued)

• The recursive loop then turns its attention to the children of each of these remaining siblings.

• Finally, the leaves are reached and resorted.

• The recursive loop is applied repeatedly until no changes occur from one pass to the next.
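One level of this scheme can be sketched as follows. For brevity each sibling sub-tree is flattened to its leaf observations and PU drives the moves; the full algorithm also recurses depth-first into the children and maintains ancestor counts, which this sketch omits (all names are mine):

```python
from collections import Counter

def pu(partition):
    """Partition utility over clusters of nominal observations (tuples)."""
    pop = [o for c in partition for o in c]
    n, total = len(pop), 0.0
    for cluster in partition:
        s = 0.0
        for i in range(len(pop[0])):
            counts = Counter(o[i] for o in cluster)
            base = Counter(o[i] for o in pop)
            for v in base:
                s += (counts.get(v, 0) / len(cluster)) ** 2 - (base[v] / n) ** 2
        total += (len(cluster) / n) * s
    return total / len(partition)

def moves(siblings):
    """Every partition reachable by re-hosting one whole sibling sub-tree
    (flattened here to its observations) under another sibling."""
    for i, sub in enumerate(siblings):
        rest = siblings[:i] + siblings[i + 1:]
        for j in range(len(rest)):
            yield rest[:j] + [rest[j] + sub] + rest[j + 1:]

def redistribute_siblings(siblings):
    """Accept sub-tree moves while they improve PU; stop when none do."""
    while True:
        best = max(moves(siblings), key=pu, default=None)
        if best is None or pu(best) <= pu(siblings):
            return siblings
        siblings = best
```

Moving whole sub-trees lets a stray singleton sibling rejoin its natural cluster in one step, which is the “large step” the slide refers to.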

Page 18:

Page 19:

Experiment conditions

– The initial clustering is generated by hierarchical sorting on

• randomly ordered observations

• similarity-ordered observations, which samples observations within the same region before sampling observations from differing regions

– Optimization strategies are applied

– Assume the primary goal of clustering is to discover a single-level partitioning of the data that is of optimal quality

Page 20:

Comparison between Iterative Optimization Strategies

Page 21:

Main findings from the Table:

• Hierarchical redistribution achieves the highest mean PU scores in both the random and similarity case in 3 of 4 domains.

• Reordering and re-clustering comes closest to hierarchical redistribution’s performance in all cases, bettering it in one domain.

• Single-observation redistribution modestly improves an initial sort, and is substantially worse than the other two optimization methods.

Page 22:

Time requirements

Page 23:

Level of Tree

Page 24:

Simplifying Hierarchical Clustering

• Simplify hierarchical clustering and minimize classification cost

• Minimize Error Rate

• A validation set is used to identify the frontier of clusters for prediction of each variable

• Nodes that lie below the frontier of every variable are pruned

Page 25:

Validation

• For each variable, Ai, the objects from the validation set are each classified through the hierarchical clustering with the value of variable Ai “masked” for purposes of classification.

• At each cluster encountered during classification the observation’s value for Ai is compared to the most probable value for Ai at the cluster.

• A count of all correct predictions for each variable at a cluster is maintained.

• A preferred frontier for each variable is identified that maximizes the number of correct counts for the variable.
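The counting and frontier selection above can be sketched for a single variable. The dict-based tree, the route rule used to classify with the variable masked, and every name here are assumptions for illustration:

```python
def classify_path(node, obs, route):
    """Clusters encountered when obs is classified from the root down."""
    path = [node]
    while node.get('children'):
        node = route(node, obs)
        path.append(node)
    return path

def correct_counts(tree, validation, var, route):
    """Per-cluster count of correct predictions of `var`; `var` is masked
    during classification, so route must not inspect it."""
    counts = {}
    for obs in validation:
        for node in classify_path(tree, obs, route):
            if node['majority'].get(var) == obs[var]:
                counts[id(node)] = counts.get(id(node), 0) + 1
    return counts

def best_frontier(node, counts):
    """Frontier under `node` that maximizes correct counts for the variable:
    keep this node, or replace it by its children's best frontiers."""
    here = counts.get(id(node), 0)
    children = node.get('children') or []
    if children:
        results = [best_frontier(c, counts) for c in children]
        below = sum(n for _, n in results)
        if below > here:
            return [f for frontier, _ in results for f in frontier], below
    return [node], here
```

A node that then falls below the preferred frontier of every variable would be pruned.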

Page 26:

Page 27:

Page 28:

Concluding Remarks

• There are three phases in searching the space of hierarchical clusterings:
– Inexpensive generation of an initial clustering
– Iterative optimization of clusterings
– Retrospective simplification of generated clusterings

• The new method, hierarchical redistribution optimization, works well.

Page 29:

Final Exam Questions

1. The main idea of the paper is to construct clusterings that satisfy two conditions. 1) Name the conditions; 2) name the two steps taken to satisfy them.

1) The clusterings should satisfy both conditions: high quality and computationally inexpensive construction.

2) First construct a clustering inexpensively (hierarchical sorting), then use an iterative optimization method to improve the quality of the clustering (reorder-resort, iterative single-observation redistribution, or hierarchical redistribution).

Page 30:

Final Exam Question

2. Describe the three iterative methods for clustering optimization:

Reorder-resort (k-means): Extracting a biased “dissimilarity” ordering from the initial hierarchical clustering, then performing k-means partitioning iteratively.

Iterative redistribution of single observations: moving single observations one by one. A cluster that contains only one observation is removed and its single observation is resorted. Iterate until two consecutive iterations yield the same clustering.

Hierarchical redistribution: takes large steps in the search for a better clustering by resorting sub-trees instead of single observations.

• Given an existing hierarchical clustering, a recursive loop examines sibling clusters in the hierarchy in a depth-first fashion.

• An inner, iterative loop examines each sibling based on the objective function, and repeats until two consecutive iterations lead to the same set of siblings.

• The recursive loop then turns its attention to the children of each of these remaining siblings.

• Finally, the leaves are reached and resorted.

• The recursive loop is applied repeatedly until no changes occur from one pass to the next.

Page 31:

Final Exam Question

3. (1) The cluster is better when the relative CU score is a) big, b) small, c) equal to 0.

The cluster is better with a higher CU score. So choose a).

(2) Which sorting method is better? a) random sorting, b) similarity sorting.

A dissimilarity ordering yields better clusterings, so random ordering of the samples is better than similarity ordering. Choose a).