
Page 1:

Lecture 09 Clustering-based Learning

• Topics
  – Basics
  – K-Means
  – Self-Organizing Maps
  – Applications
  – Discussions

Page 2:

Basics

• Clustering
  – Grouping a collection of objects (examples) into clusters, such that objects are most similar within each cluster and least similar between clusters.
  – Core problem: similarity definition
    • Intra-cluster similarity
    • Inter-cluster similarity
  – Inductive learning
  – Unsupervised learning

Page 3:

Basics

• Minimizing intra-cluster dissimilarity is equivalent to maximizing inter-cluster dissimilarity.

• Clustering performance in terms of intra-cluster dissimilarity, with K clusters and d(xi, xi') as the dissimilarity measure:

$$DS(C) = \frac{1}{2}\sum_{k=1}^{K} W_k, \qquad \text{where}\quad W_k = \sum_{C(i)=k}\,\sum_{C(i')=k} d(x_i, x_{i'})$$
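A minimal NumPy sketch of this measure, assuming squared Euclidean distance as d (the function name is ours):

```python
import numpy as np

def within_cluster_scatter(X, labels, K):
    """DS(C) = (1/2) * sum_k W_k, with W_k the sum of pairwise
    squared Euclidean distances between points in cluster k."""
    total = 0.0
    for k in range(K):
        Xk = X[labels == k]                        # points assigned to cluster k
        diffs = Xk[:, None, :] - Xk[None, :, :]    # all pairwise differences
        total += 0.5 * np.sum(diffs ** 2)          # the 1/2 from the formula
    return total

# Two tight, well-separated blobs give a small DS(C)
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(within_cluster_scatter(X, np.array([0, 0, 1, 1]), K=2))
```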

Page 4:

Basics

• The dissimilarity measure depends on value types and value coding systems.

• Some examples:
  – Quantitative variables: $d(x_i, x_{i'}) = l(x_i - x_{i'})$, where $l$ is a loss function such as the squared difference.
  – Ordinal variables: replace the $i$-th of $M$ ordered values by $(i - 1/2)/M$, then treat it as quantitative.
  – Categorical variables:
$$d(x_i, x_{i'}) = \begin{cases} 0, & x_i = x_{i'} \\ 1, & \text{otherwise} \end{cases}$$
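A small sketch of these three cases (the function name and the squared-difference choice of l are ours):

```python
def dissimilarity(a, b, kind="quantitative", M=None):
    """One-variable dissimilarity following the slide's three cases."""
    if kind == "quantitative":
        return (a - b) ** 2                   # squared difference as l
    if kind == "ordinal":                     # a, b are ranks in 1..M
        return ((a - 0.5) / M - (b - 0.5) / M) ** 2
    if kind == "categorical":
        return 0.0 if a == b else 1.0
    raise ValueError(f"unknown kind: {kind}")

print(dissimilarity(3, 1, "ordinal", M=5))    # 0.16
```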

Page 5:

Basics

• Clustering algorithms
  – Combinatorial algorithms
    • Work directly on the observed data
    • K-Means
    • Self-Organizing Maps

Page 6:

K-Means

• A statistical learning mechanism
• A given object is assigned to the cluster whose mean value it is least dissimilar to.
• Euclidean or Manhattan distance is commonly used to measure dissimilarity.
• The mean value of each cluster is recalculated in each iteration.

Page 7:

K-Means

• Step 1: Selecting Centers
  Select k objects randomly; each becomes the center (mean) of an initial cluster.

• Step 2: Clustering
  Assign each of the remaining objects to the cluster with the nearest center. The most popular method for calculating distance is Euclidean distance. Given two points p = (p1, p2, …, pk) and q = (q1, q2, …, qk), their Euclidean distance is defined as:

$$d(p, q) = \left[ \sum_{i=1}^{k} (p_i - q_i)^2 \right]^{1/2}$$

Page 8:

K-Means

• Step 3: Computing New Centers
  Compute new cluster centers. Let xi be one of the elements assigned to the kth cluster, and Nk the number of elements in that cluster. The new center of cluster k, Ck, is calculated as:

$$C_k = \frac{1}{N_k} \sum_{i=1}^{N_k} x_i$$

• Step 4: Iteration
  Repeat Steps 2 and 3 until no members change their clusters.
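Putting Steps 1–4 together, a compact NumPy sketch (a minimal illustration; function and variable names are ours):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Minimal K-Means following Steps 1-4 of the slides."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k objects at random as initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest center (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when no object changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members):                  # guard against empty clusters
                centers[j] = members.mean(axis=0)
    return centers, labels
```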

Page 9:

K-Means

• Example

[Figure: K-Means example with K = 2 on a 2-D scatter plot (axes 0–10), shown as a sequence of panels: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign objects; update the cluster means again and reassign until membership is stable.]

Page 10:

K-Means

• Usually, the problem itself fixes the setting of K.
• If K is not given, then to find the best K we examine the intra-cluster dissimilarity Wk, which is a function of K.
• Wk usually decreases as K increases.

Page 11:

K-Means

• Deciding K: choose the K at which a sharp drop of Wk is observed (the "elbow" of the Wk-versus-K curve).
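A sketch of this selection procedure, assuming scikit-learn is available (its KMeans.inertia_ is the within-cluster sum of squares, a common stand-in for Wk):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three synthetic blobs, so the "right" K is 3
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [4, 4], [0, 4])])

for k in range(1, 8):
    w_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(w_k, 1))
# Wk drops sharply up to K = 3, then flattens -> pick K = 3
```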

Page 12:

K-Means

• Hierarchical Clustering

Page 13:

K-Means

• Agglomerative Hierarchical Clustering

[Figure: three 2-D scatter plots (axes 0–10) showing clusters being merged step by step, bottom-up.]

Page 14:

K-Means

• Divisive Hierarchical Clustering

[Figure: three 2-D scatter plots (axes 0–10) showing a single cluster being split step by step, top-down.]

Page 15:

Self-Organizing Maps

• Brain self-organizing structure
  – Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses.
  – The cortex includes areas that are responsible for different human activities (motor, visual, auditory, etc.) and associated with different sensory inputs.
  – We can say that each sensory input is mapped into a corresponding area of the cerebral cortex.
  – The cortex is a self-organising computational map in the human brain.

Page 16:

Self-Organizing Maps

• The self-organising map (SOM) provides a topological mapping emulating the cortex structure. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer.

• SOM is a subsymbolic learning algorithm; input data need to be numerically coded.

Page 17:

Self-Organizing Maps

[Figure: two network diagrams, (a) and (b), each showing an input layer fully connected to a Kohonen layer, drawn with the binary input patterns 1 0 and 0 1.]

Page 18:

Self-Organizing Maps

• Training of SOM is based on competitive learning: neurons compete among themselves to be activated, but only a single output neuron can be active at any time.

• The output neuron that wins the "competition" is called the winner-takes-all neuron.

• Training in SOM begins with the winner's neighborhood at a fairly large size. Then, as training proceeds, the neighborhood size gradually decreases.

Page 19:

Self-Organizing Maps

• Conceptual architecture

[Figure: input signals x1, x2 enter the input layer, which is fully connected to the output (Kohonen) layer, producing output signals y1, y2, y3.]

Page 20:

Self-Organizing Maps

• The lateral connections are used to create competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner; it is the only neuron that produces an output signal, while the activity of all other neurons is suppressed in the competition.

• The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This can be achieved with a Mexican hat function describing the synaptic weights between neurons in the Kohonen layer.
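The Mexican hat shape can be modeled, for instance, as a difference of Gaussians; a small sketch with parameter choices of our own:

```python
import numpy as np

def mexican_hat(distance, sigma_e=1.0, sigma_i=3.0, k_e=2.0, k_i=1.0):
    """Difference-of-Gaussians lateral weight: excitatory near the
    winner (small distance), inhibitory farther away, fading to zero."""
    d2 = np.asarray(distance, dtype=float) ** 2
    return (k_e * np.exp(-d2 / (2 * sigma_e ** 2))
            - k_i * np.exp(-d2 / (2 * sigma_i ** 2)))

d = np.arange(0, 10)
print(np.round(mexican_hat(d), 2))   # positive near 0, negative at mid range
```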

Page 21:

Self-Organizing Maps

• Mexican hat function of lateral connection

[Figure: connection strength versus distance from the winner — an excitatory effect around distance 0, flanked by inhibitory effects at larger distances.]

Page 22:

SOM – Competitive Learning Algorithm

• Step 1: Initialization
  Set initial weights to small random values, say in an interval [0, 1], and assign a small positive value, e.g., 0.2 to 0.5, to the initial learning rate parameter α0.

Page 23:

SOM – Competitive Learning Algorithm

• Step 2: Activation and Similarity Matching
  Activate the SOM by applying the input vector X, and find the best-matching (winner) neuron JX at iteration p, using the minimum Euclidean distance criterion:

$$J_X(p) = \arg\min_j \|\mathbf{X} - \mathbf{W}_j(p)\| = \arg\min_j \left[ \sum_{i=1}^{n} \big(x_i - w_{ij}(p)\big)^2 \right]^{1/2}, \qquad j = 1, 2, \ldots, m$$

  where n is the number of neurons in the input layer and m is the number of neurons in the Kohonen layer.

Page 24:

SOM – Competitive Learning Algorithm

• Step 3: Learning
  (a) Calculate the weight corrections according to the competitive learning rule:

$$\Delta w_{ij}(p) = \begin{cases} \alpha(p)\,\big[x_i - w_{ij}(p)\big], & j \in \Lambda_{J_X}(p) \\ 0, & j \notin \Lambda_{J_X}(p) \end{cases}$$

  where

$$\alpha(p) = \alpha_0\,\big(1 - p/T\big), \qquad \Lambda_{J_X}(p) = \big\{\, j \;\big|\; \|j - J_X(p)\| \le d(p) \,\big\}, \qquad d(p) = d_0\,\big(1 - p/T\big)$$

  ΛJ: neighborhood of the winner neuron J; d0: initial neighborhood size; T: total number of repetitions.

Page 25:

SOM – Competitive Learning Algorithm

• Step 3: Learning (Continued)
  (b) Update the weights:

$$w_{ij}(p + 1) = w_{ij}(p) + \Delta w_{ij}(p)$$

  where Δwij(p) is the weight correction at iteration p.

• Step 4: Iteration
  Increase iteration p by one, go back to Step 2, and continue until the minimum-distance Euclidean criterion is satisfied or no noticeable changes occur in the feature map.
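Steps 1–4 in a minimal NumPy sketch for a 1-D line of output neurons (function names and the fixed iteration budget standing in for the stopping criterion are ours; the decay schedules follow the formulas above):

```python
import numpy as np

def train_som(X, m, T, alpha0=0.3, d0=None, seed=0):
    """Competitive learning for a SOM with m output neurons on a line.
    X: (N, n) input vectors; T: total number of repetitions."""
    rng = np.random.default_rng(seed)
    W = rng.random((m, X.shape[1]))          # Step 1: random weights in [0, 1]
    d0 = d0 if d0 is not None else m / 2     # initial neighborhood size
    positions = np.arange(m)                 # neuron coordinates on the lattice
    for p in range(T):
        x = X[rng.integers(len(X))]          # present one input vector
        # Step 2: winner = neuron at minimum Euclidean distance from x
        J = np.argmin(np.linalg.norm(x - W, axis=1))
        alpha = alpha0 * (1 - p / T)         # alpha(p) = alpha0 (1 - p/T)
        d = d0 * (1 - p / T)                 # d(p) = d0 (1 - p/T)
        # Step 3: move the winner's neighborhood toward x
        near = np.abs(positions - J) <= d
        W[near] += alpha * (x - W[near])
        # Step 4: loop for T repetitions (fixed budget in place of the
        # "no noticeable changes in the feature map" test)
    return W
```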

Page 26:

Self-Organizing Maps

• SOM is online K-Means: rather than recomputing cluster means over all members in each pass, each newly presented object nudges only the winning (nearest) center toward itself.

[Figure: 2-D scatter plot (axes 0–10) with a new object being presented to the map.]

Page 27:

Self-Organizing Maps

• Example: a SOM with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. It is required to classify two-dimensional input vectors: each neuron in the network should respond only to the input vectors occurring in its region.

• The network is trained with 1,000 two-dimensional input vectors generated randomly in a square region in the interval between –1 and +1. The learning rate parameter is fixed at 0.1.
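A sketch of this setup, a 2-D-lattice variant of the train_som sketch above (the fixed learning rate 0.1 is from the slide; the shrinking neighborhood schedule is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
rows = cols = 10
X = rng.uniform(-1, 1, size=(1000, 2))       # 1,000 random 2-D inputs in [-1, 1]

W = rng.uniform(-1, 1, size=(rows * cols, 2))
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

T = 10_000
alpha = 0.1                                  # fixed learning rate, as in the slide
for p in range(T):
    x = X[p % len(X)]
    J = np.argmin(np.linalg.norm(x - W, axis=1))          # winner neuron
    d = max(1.0, (rows / 2) * (1 - p / T))                # shrinking radius (assumed)
    near = np.linalg.norm(grid - grid[J], axis=1) <= d    # lattice neighborhood
    W[near] += alpha * (x - W[near])
# W now approximates a regular grid over the square, as in the figures below
```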

Page 28:

Self-Organizing Maps

[Figure: network weights plotted as W(2,j) versus W(1,j) on [–1, 1] × [–1, 1] — initial random weights.]

Page 29:

Self-Organizing Maps

[Figure: W(2,j) versus W(1,j) on [–1, 1] × [–1, 1] — after 100 repetitions.]

Page 30:

Self-Organizing Maps

[Figure: W(2,j) versus W(1,j) on [–1, 1] × [–1, 1] — after 1,000 repetitions.]

Page 31:

Self-Organizing Maps

[Figure: W(2,j) versus W(1,j) on [–1, 1] × [–1, 1] — after 10,000 repetitions.]

Page 32:

Applications

• K-Means
  – Clustering ECG signals according to correlation dimensions

• Self-Organizing Maps
  – Finding churner groups
  – Speech recognition

Page 33:

Discussions

• Clustering events with attribute-based representation
  – Attribute-based similarity measure for clusters
  – Hierarchical clustering of event sequences
  – Generalization, e.g.,
    • "A ∧ B ∧ C" generalized to "A ∧ B"
    • "A ∨ B" generalized to "A ∨ B ∨ C"
    • Ontology-based generalization
  – Specialization