Hierarchical Clustering

Tilani Gunawardena Algorithms: Clustering


Page 1: Hierarchical Clustering

Tilani Gunawardena

Algorithms: Clustering

Page 2: Hierarchical Clustering

What is Clustering?

• Grouping of records, observations, or cases into classes of similar objects.
• A cluster is a collection of records that are:
  – similar to one another
  – dissimilar to records in other clusters

Page 3: Hierarchical Clustering

Clustering

Page 4: Hierarchical Clustering

Clustering

Page 5: Hierarchical Clustering

Difference between Clustering and Classification

• There is no target variable for clustering.
• Clustering does not try to classify or predict the values of a target variable.
• Instead, clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, where
  – the similarity of the records within a cluster is maximized, and
  – the similarity to records outside the cluster is minimized.

Page 6: Hierarchical Clustering

Goal of Clustering

• Identification of groups of records such that similarity within a group is very high while the similarity to records in other groups is very low.
  – Group data points that are close (or similar) to each other.
  – Identify such groupings (or clusters) in an unsupervised manner.
• Unsupervised: no information is provided to the algorithm on which data points belong to which clusters.
• In other words, a clustering algorithm seeks to construct clusters of records such that the between-cluster variation (BCV) is large compared to the within-cluster variation (WCV).

Page 7: Hierarchical Clustering

Goal of Clustering

• Between-cluster variation (BCV) should be large compared to within-cluster variation (WCV).
• Intra-cluster distance: the sum of distances between objects in the same cluster is minimized.
• Inter-cluster distance: the distances between different clusters are maximized.
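The formulas this slide pointed at did not survive extraction. One simple formalization, consistent with the two quantities the worked k-means example later reports at each iteration (the between-centroid distance d(m1, m2) and the SSE), is:

$$\text{BCV} = d(m_1, m_2), \qquad \text{WCV} = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2,$$

where $m_i$ is the centroid of cluster $C_i$, and the goal is for BCV/WCV to be large.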

Page 8: Hierarchical Clustering

Clustering

• Clustering techniques apply when there is no class to be predicted.
• As we've seen, clusters can be:
  – disjoint vs. overlapping
  – deterministic vs. probabilistic
  – flat vs. hierarchical
• k-means Algorithm
  – k-means clusters are disjoint, deterministic, and flat.

Page 9: Hierarchical Clustering

Issues Related to Clustering

• How to measure similarity?
  – Euclidean distance
  – City-block distance
  – Minkowski distance
• How to recode categorical variables?
• How to standardize or normalize numerical variables?
  – Min-max normalization
  – Z-score standardization
• How many clusters do we expect to uncover?
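To make these choices concrete, here is a small illustrative sketch (not from the original slides; the arrays x, y, and col are hypothetical data) of the listed distance measures and normalizations:

```python
import numpy as np

x = np.array([1.0, 3.0])
y = np.array([4.0, 2.0])

# Distance measures mentioned on the slide
euclidean = np.sqrt(np.sum((x - y) ** 2))          # L2 distance
city_block = np.sum(np.abs(x - y))                 # L1 distance
minkowski = np.sum(np.abs(x - y) ** 3) ** (1 / 3)  # Minkowski distance with p = 3

# Normalizing a numeric variable before clustering
col = np.array([2.0, 5.0, 9.0, 4.0])
min_max = (col - col.min()) / (col.max() - col.min())  # min-max normalization -> [0, 1]
z_score = (col - col.mean()) / col.std()               # z-score standardization
```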

Page 10: Hierarchical Clustering

Types of Clustering

• Partitional clustering: partitional algorithms determine all clusters at once. They include:
  – k-means clustering
  – Fuzzy c-means clustering
  – QT clustering
• Hierarchical clustering:
  – Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
  – Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.

Page 11: Hierarchical Clustering

K-means Clustering

Page 12: Hierarchical Clustering

k-Means Clustering

• Input: n objects (or points) and a number k
• Algorithm:
  1) Randomly assign k records to be the initial cluster center locations.
  2) Assign each object to the group that has the closest centroid.
  3) When all objects have been assigned, recalculate the positions of the k centroids.
  4) Repeat steps 2 and 3 until convergence or termination.
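A minimal NumPy sketch of these four steps (an illustration only; the function name kmeans and its defaults are not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic k-means: X is an (n, d) array of points, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # 1) Randomly pick k records as the initial cluster center locations
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2) Assign each object to the group with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3) Recalculate the position of each centroid
        #    (assumes no cluster becomes empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4) Repeat until convergence: the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```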

Page 13: Hierarchical Clustering

k-Means Clustering

Page 14: Hierarchical Clustering

Termination Conditions

• The algorithm terminates when the centroids no longer change.
• Alternatively, it terminates when the SSE (sum of squared errors) value is less than some small threshold value, where p ∈ Ci represents each data point in cluster i and mi represents the centroid of cluster i.
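The SSE equation itself did not survive extraction; in the standard form implied by the bullet above it is:

$$\text{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$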

Page 15: Hierarchical Clustering

Example 1:

• Let's suppose the following points are the delivery locations for pizza.

Page 16: Hierarchical Clustering

• Let's locate three cluster centers randomly.

Page 17: Hierarchical Clustering

• Find the distance from each point to the cluster centers, as shown.

Page 18: Hierarchical Clustering

• Assign the points to the nearest cluster center based on the distance between each center and the point

Page 19: Hierarchical Clustering

• Re-compute the cluster centers and locate the nearest points.

Page 20: Hierarchical Clustering

• Re-compute the cluster centers, locate the nearest points, and calculate the distances.

Page 21: Hierarchical Clustering

Form the three clusters

Page 22: Hierarchical Clustering

Example 2:

• Suppose that we have eight data points in two-dimensional space as follows.
• And suppose that we are interested in uncovering k = 2 clusters.
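The coordinates of the eight points did not survive extraction, but they can be recovered from the centroid calculations and distance tables on the following slides. The snippet below uses those reconstructed values (treat them as inferred, not quoted) and reproduces the first-iteration table:

```python
import numpy as np

# Coordinates inferred from the centroid computations on Pages 26 and 30
points = {"a": (1, 3), "b": (3, 3), "c": (4, 3), "d": (5, 3),
          "e": (1, 2), "f": (4, 2), "g": (1, 1), "h": (2, 1)}
m1, m2 = np.array(points["g"]), np.array(points["h"])  # initial centers: records g and h

for name, p in points.items():
    d1 = np.linalg.norm(np.array(p) - m1)
    d2 = np.linalg.norm(np.array(p) - m2)
    print(f"{name}: d(m1)={d1:.2f}  d(m2)={d2:.2f}  cluster={'C1' if d1 < d2 else 'C2'}")
```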

Page 23: Hierarchical Clustering

Point  Distance from m1  Distance from m2  Cluster membership
a
b
c
d
e
f
g
h

Distance measure: $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Page 24: Hierarchical Clustering

Point  Distance from m1  Distance from m2  Cluster membership
a      2.00              2.24
b      2.83              2.24
c      3.61              2.83
d      4.47              3.61
e      1.00              1.41
f      3.16              2.24
g      0.00              1.00
h      1.00              0.00

Distance measure: $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Page 25: Hierarchical Clustering

Point  Distance from m1  Distance from m2  Cluster membership
a      2.00              2.24              C1
b      2.83              2.24              C2
c      3.61              2.83              C2
d      4.47              3.61              C2
e      1.00              1.41              C1
f      3.16              2.24              C2
g      0.00              1.00              C1
h      1.00              0.00              C2

SSE = 2.00² + 2.24² + 2.83² + 3.61² + 1.00² + 2.24² + 0² + 0² = 36
d(m1, m2) = 1

Page 26: Hierarchical Clustering

Centroid of cluster 1 is [(1 + 1 + 1)/3, (3 + 2 + 1)/3] = (1, 2)
Centroid of cluster 2 is [(3 + 4 + 5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5] = (3.6, 2.4)

Page 27: Hierarchical Clustering

m1 = (1, 2), m2 = (3.6, 2.4)

Point  Distance from m1  Distance from m2  Cluster membership
a
b
c
d
e
f
g
h

Page 28: Hierarchical Clustering

m1 = (1, 2), m2 = (3.6, 2.4)

Point  Distance from m1  Distance from m2  Cluster membership
a      1.00              2.67
b      2.24              0.85
c      3.16              0.72
d      4.12              1.52
e      0.00              2.63
f      3.00              0.57
g      1.00              2.95
h      1.41              2.13

Page 29: Hierarchical Clustering

m1 = (1, 2), m2 = (3.6, 2.4)

Point  Distance from m1  Distance from m2  Cluster membership
a      1.00              2.67              C1
b      2.24              0.85              C2
c      3.16              0.72              C2
d      4.12              1.52              C2
e      0.00              2.63              C1
f      3.00              0.57              C2
g      1.00              2.95              C1
h      1.41              2.13              C1

SSE = 1.00² + 0.85² + 0.72² + 1.52² + 0² + 0.57² + 1.00² + 1.41² = 7.88
d(m1, m2) = 2.63

Page 30: Hierarchical Clustering

Centroid of cluster 1 is [(1 + 1 + 1 + 2)/4, (3 + 2 + 1 + 1)/4] = (1.25, 1.75)
Centroid of cluster 2 is [(3 + 4 + 5 + 4)/4, (3 + 3 + 3 + 2)/4] = (4, 2.75)

Page 31: Hierarchical Clustering

m1 = (1.25, 1.75), m2 = (4, 2.75)

Point  Distance from m1  Distance from m2  Cluster membership
a
b
c
d
e
f
g
h

Page 32: Hierarchical Clustering

m1 = (1.25, 1.75), m2 = (4, 2.75)

Point  Distance from m1  Distance from m2  Cluster membership
a      1.27              3.01
b      2.15              1.03
c      3.02              0.25
d      3.95              1.03
e      0.35              3.09
f      2.76              0.75
g      0.79              3.47
h      1.06              2.66

Page 33: Hierarchical Clustering

m1 = (1.25, 1.75), m2 = (4, 2.75)

Point  Distance from m1  Distance from m2  Cluster membership
a      1.27              3.01              C1
b      2.15              1.03              C2
c      3.02              0.25              C2
d      3.95              1.03              C2
e      0.35              3.09              C1
f      2.76              0.75              C2
g      0.79              3.47              C1
h      1.06              2.66              C1

SSE = 1.27² + 1.03² + 0.25² + 1.03² + 0.35² + 0.75² + 0.79² + 1.06² = 6.25
d(m1, m2) = 2.93

Page 34: Hierarchical Clustering

Final Results

Page 35: Hierarchical Clustering

Example 2:

Page 36: Hierarchical Clustering

Page 37: Hierarchical Clustering

How to decide k?

• Unless the analyst has prior knowledge of the number of underlying clusters:
  – clustering solutions for each value of k are compared, and
  – the value of k resulting in the smallest SSE is selected.
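A small sketch of this model-selection loop, using scikit-learn's KMeans for brevity and reusing the reconstructed coordinates from Example 2 (both are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
              [1, 2], [4, 2], [1, 1], [2, 1]], dtype=float)

# Compare clustering solutions for each candidate k and track the SSE (inertia)
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))
# Pick the k where the SSE stops dropping sharply (the "elbow")
```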

Page 38: Hierarchical Clustering

Summary

• The k-means algorithm is a simple yet popular method for clustering analysis.
• Low complexity: its complexity is O(nkt), where t = number of iterations.
• Its performance is determined by the initialisation and by an appropriate distance measure.
• There are several variants of k-means that overcome its weaknesses:
  – k-medoids: resistance to noise and/or outliers (data that do not comply with the general behaviour or model of the data)
  – k-modes: extension to categorical data clustering analysis
  – CLARA: extension to deal with large data sets
  – Gaussian mixture models (EM algorithm): handling uncertainty of cluster membership

Page 39: Hierarchical Clustering

Hierarchical Clustering

Page 40: Hierarchical Clustering

Hierarchical Clustering

• Build a tree-based hierarchical taxonomy (dendrogram).
• One approach: recursive application of a partitional clustering algorithm.

[Figure: example taxonomy — animal splits into vertebrate (fish, reptile, amphibian, mammal) and invertebrate (worm, insect, crustacean)]

Page 41: Hierarchical Clustering

Hierarchical clustering and dendrograms

• A hierarchical clustering on a set of objects D is a set of nested partitions of D. It is represented by a binary tree such that:
  – The root node is a cluster that contains all data points.
  – Each (parent) node is a cluster made of two subclusters (children).
  – Each leaf node represents one data point (a singleton, i.e. a cluster with only one item).
• A hierarchical clustering scheme is also called a taxonomy. In data clustering the binary tree is called a dendrogram.

Page 42: Hierarchical Clustering

Dendrogram: Hierarchical Clustering

• A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
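A minimal SciPy sketch of this cut operation (the five data points are the ones used in the single-link example later in the deck, so the choice of data and of the cut height 2.0 is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 2.5], [3, 1], [4, 0.5], [4, 2]])
Z = linkage(X, method="single")                     # build the dendrogram
labels = fcluster(Z, t=2.0, criterion="distance")   # cut it at height 2.0
print(labels)  # points in the same connected component below the cut share a label
```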

Page 43: Hierarchical Clustering

Hierarchical clustering: forming clusters

• Forming clusters from dendrograms.

Page 44: Hierarchical Clustering

Hierarchical clustering

• There are two styles of hierarchical clustering algorithms to build a tree from the input set S:
  – Agglomerative (bottom-up):
    • Begins with singletons (sets with one element).
    • Merges them until S is achieved as the root.
    • In each step, the two closest clusters are aggregated into a new combined cluster.
    • In this way, the number of clusters in the data set is reduced at each step.
    • Eventually, all records/elements are combined into a single huge cluster.
    • It is the most common approach.
  – Divisive (top-down):
    • All records are combined into one big cluster.
    • Then the most dissimilar records are split off, recursively partitioning S until singleton sets are reached.
• Does not require the number of clusters k in advance.

Page 45: Hierarchical Clustering

Two types of hierarchical clustering algorithms:

• Agglomerative: "bottom-up"
• Divisive: "top-down"

Page 46: Hierarchical Clustering

Hierarchical Agglomerative Clustering (HAC) Algorithm

Start with all instances in their own cluster.
Until there is only one cluster:
    Among the current clusters, determine the two clusters, ci and cj, that are most similar.
    Replace ci and cj with a single cluster ci ∪ cj.

• Assumes a similarity function for determining the similarity of two instances.
• Starts with all instances in separate clusters and then repeatedly joins the two clusters that are most similar until there is only one cluster. (A minimal sketch follows below.)
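A compact, illustrative Python sketch of this loop, assuming single-link similarity (the function name and its return value are not from the slides):

```python
import numpy as np
from itertools import combinations

def hac_single_link(X):
    """Naive HAC on an (n, d) array X; returns the sequence of merges performed."""
    dist = lambda a, b: float(np.linalg.norm(X[a] - X[b]))
    clusters = [{i} for i in range(len(X))]   # start: every instance in its own cluster
    merges = []
    while len(clusters) > 1:
        # determine the two clusters ci, cj that are most similar
        # (single link: distance between their closest members)
        (i, j), d = min(
            (((i, j), min(dist(a, b) for a in clusters[i] for b in clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        merges.append((set(clusters[i]), set(clusters[j]), d))
        clusters[i] = clusters[i] | clusters[j]   # replace ci and cj with ci ∪ cj
        del clusters[j]
    return merges
```

For example, calling hac_single_link on the five points listed in the single-link example later in the deck reproduces the merge order shown there.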

Page 47: Hierarchical Clustering

Classification of AHC

We can distinguish AHC algorithms according to the type of distance measure used. There are two approaches:

Graph methods:
• Single link method
• Complete link method
• Group average method (UPGMA)
• Weighted group average method (WPGMA)

Geometric methods:
• Ward's method
• Centroid method
• Median method

Page 48: Hierarchical Clustering

Lance-Williams Algorithm

Definition (Lance-Williams formula): In AHC algorithms, the Lance-Williams formula [Lance and Williams, 1967] is a recurrence equation used to calculate the dissimilarity between a cluster Ck and a cluster formed by merging two other clusters Cl ∪ Cl′, where αl, αl′, β, γ are real numbers.
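The recurrence itself did not survive extraction; its standard form is:

$$d(C_k, C_l \cup C_{l'}) = \alpha_l\, d(C_k, C_l) + \alpha_{l'}\, d(C_k, C_{l'}) + \beta\, d(C_l, C_{l'}) + \gamma\, \left| d(C_k, C_l) - d(C_k, C_{l'}) \right|$$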

Page 49: Hierarchical Clustering

AHC methods and the Lance-Williams formula

Page 50: Hierarchical Clustering

Single link method

• Also known as the nearest-neighbor method, since it employs the nearest neighbor to measure the dissimilarity between two clusters.

Page 51: Hierarchical Clustering

Cluster distance measures

• Single link
  – Distance between the closest elements of the two clusters
• Complete link
  – Distance between the farthest elements of the two clusters
• Centroid
  – Distance between the centroids (means) of the two clusters

Page 52: Hierarchical Clustering

Single-link clustering

[Figure: nested clusters (left) and the corresponding dendrogram (right) for six points, with merge heights between 0.05 and 0.2]

Page 53: Hierarchical Clustering

Example 1: Single link method

Pages 54–66: Hierarchical Clustering (worked example, figures only)

Page 67: Hierarchical Clustering

Example 2: Single link method

• x1 = (1, 2)
• x2 = (1, 2.5)
• x3 = (3, 1)
• x4 = (4, 0.5)
• x5 = (4, 2)

Page 68: Hierarchical Clustering

• x1 = (1, 2)
• x2 = (1, 2.5)
• x3 = (3, 1)
• x4 = (4, 0.5)
• x5 = (4, 2)

Merge X1 and X2

Page 69: Hierarchical Clustering
Page 70: Hierarchical Clustering

Merge X3 and X4

Page 71: Hierarchical Clustering
Page 72: Hierarchical Clustering

Merge {X3, X4} and X5

Page 73: Hierarchical Clustering
Page 74: Hierarchical Clustering

Merge {X1, X2} and {X3, X4, X5}
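The merge distances were shown only in figures; they can be re-derived from the listed points. A quick SciPy check of the sequence above (the expected heights in the comment are computed, not quoted from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1, 2], [1, 2.5], [3, 1], [4, 0.5], [4, 2]])  # x1..x5
# Each row of the linkage matrix is one merge: (cluster a, cluster b, distance, size)
print(linkage(X, method="single"))
# Expected single-link merge heights: 0.50 (x1, x2), ~1.12 (x3, x4),
# ~1.41 ({x3, x4}, x5), ~2.24 ({x1, x2}, {x3, x4, x5})
```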

Page 75: Hierarchical Clustering
Page 76: Hierarchical Clustering

Example 3: Complete link method

• x1 = (1, 2)
• x2 = (1, 2.5)
• x3 = (3, 1)
• x4 = (4, 0.5)
• x5 = (4, 2)

Merge X1 and X2

Page 77: Hierarchical Clustering
Page 78: Hierarchical Clustering

Merge X3 and X4

Page 79: Hierarchical Clustering
Page 80: Hierarchical Clustering

Merge {X3, X4} and X5

Page 81: Hierarchical Clustering
Page 82: Hierarchical Clustering

Merge {X1, X2} and {X3, X4, X5}
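For reference (computed from the five listed points, so a reconstruction rather than a quotation from the slides): complete linkage produces the same merge sequence as single linkage on this data, but at larger heights — 0.50 for {X1, X2}, about 1.12 for {X3, X4}, 1.50 for {X3, X4} with X5 (= max(1.41, 1.50)), and about 3.61 for the final merge (the X2–X4 distance).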

Page 83: Hierarchical Clustering

The dendrogram:

Page 84: Hierarchical Clustering

Summary: Hierarchical Clustering

• For a dataset consisting of n points:
  – O(n²) space: it requires storing the distance matrix.
  – O(n³) time complexity in most cases (agglomerative clustering).
• Advantages
  – Dendrograms are great for visualization.
  – Provides hierarchical relations between clusters.
• Disadvantages
  – Not easy to define levels for clusters.
  – Can never undo what was done previously.
  – Sensitive to cluster distance measures and noise/outliers.
  – Experiments have shown that other clustering techniques can outperform hierarchical clustering.
• There are several variants to overcome its weaknesses:
  – BIRCH: scalable to large data sets
  – ROCK: clustering categorical data
  – CHAMELEON: hierarchical clustering using dynamic modelling