Fast algorithm and implementation of dissimilarity self-organizing maps
Reporter: Ming-Jui Kuo D9515007
Outline
Introduction
The DSOM
Simulation Results
A drawback in standard SOM
The standard SOM is restricted to vectors from a fixed, finite-dimensional vector space. Unfortunately, many real-world data depart strongly from this model. It is quite common, for instance, to have variable-sized data.
Variable-sized data arise naturally, for example, in online handwriting recognition, where the representation of a character drawn by the user can vary in length with the drawing conditions. Other data, such as texts, are strongly non-numerical and have a complex internal structure, which makes them very difficult to represent accurately in a vector space.
Related papers
[1] Teuvo Kohonen and Panu Somervuo, “Self-organizing maps of symbol strings,” Neurocomputing, vol. 21, pp. 19-30, 1998.
[2] Teuvo Kohonen and Panu Somervuo, “How to make large self-organizing maps for nonvectorial data,” Neural Networks, vol. 15, pp. 945-952, 2002.
[3] Aïcha El Golli, Brieuc Conan-Guez, and Fabrice Rossi, “A Self Organizing Map for dissimilarity data,” IFCS, 2004, Proceedings.
[4] Aïcha El Golli, “Speeding up the self organizing map for dissimilarity data”
Alias names
Median self-organizing map : Median SOM [2]
Dissimilarity self-organizing map : DSOM [3]
Related web sites
http://apiacoa.org/
http://lists.gforge.inria.fr/pipermail/somlib-commits
by Fabrice Rossi
The goal of this paper
A major drawback of the DSOM is that its running time can be very high, especially when compared to the standard vector SOM.
It is well known that the running time of the SOM algorithm grows linearly with the number of input data. In contrast, the running time of the DSOM grows quadratically with this number.
The goal of this paper (cont’d)
In this paper, the authors propose several modifications of the basic algorithm that allow a much faster implementation.
The quadratic nature of the algorithm cannot be avoided, essentially because dissimilarity data are intrinsically described by a quadratic number of one-to-one dissimilarities.
The goal of this paper (cont’d)
The cost of the standard DSOM algorithm is proportional to $N^2 M + N M^2$, where N is the number of observations and M the number of clusters that the algorithm has to produce, whereas the modifications of this paper lead to a cost proportional to $N^2 + N M^2$. The savings come from the representation phase.
An important property of all modifications in this paper is that the obtained algorithm produces exactly the same results as the standard DSOM algorithm.
Dissimilarity Data
Given a data set X, a dissimilarity measure d assigns a value d(xi, xj) to every pair of data instances (one-to-one, pairwise). A distance can sometimes serve as the dissimilarity measure.
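As an illustration of pairwise dissimilarities on variable-sized data, the sketch below builds the N × N dissimilarity matrix for a few strings. The choice of edit (Levenshtein) distance, the function names and the toy word list are assumptions made for this example; they do not come from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca by cb
        prev = curr
    return prev[-1]


def dissimilarity_matrix(data, d):
    """Return the N x N matrix whose entry (i, j) is d(data[i], data[j])."""
    n = len(data)
    return [[d(data[i], data[j]) for j in range(n)] for i in range(n)]


# Made-up toy data: strings of different lengths.
words = ["map", "maps", "cap", "median"]
D = dissimilarity_matrix(words, edit_distance)
```

Only the matrix D is needed afterwards; the strings themselves never have to be embedded in a vector space.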
Dissimilarity Data (cont’d)
[Figure: a data set with two instances xi and xj at dissimilarity d(xi, xj), next to the N × N dissimilarity matrix whose entry in row i, column j is d(xi, xj).]
Dissimilarity Data (cont’d)
[Figure: raw data (instances xi and xj) converted into dissimilarity data, the N × N matrix whose entry in row i, column j is d(xi, xj).]
The DSOM algorithm
choose initial values for $M^0 = (m_1^0, \ldots, m_M^0)$ {Initialization phase}
for l = 1 to L do
    for all $i \in \{1, \ldots, N\}$ do {Template for the affectation phase}
        compute $c^l(i) = \arg\min_{j \in \{1, \ldots, M\}} d(x_i, m_j^{l-1})$
    end for
    for all $j \in \{1, \ldots, M\}$ do {Template for the representation phase}
        compute $m_j^l = \arg\min_{m \in D} \sum_{i=1}^{N} h^l(c^l(i), j)\, d(x_i, m)$
    end for
end for
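The two phases of the algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: prototypes are taken among the observations themselves (so each prototype is stored as an index into the dissimilarity matrix), the neighbourhood function is a Gaussian on the map grid, and its width shrinks geometrically over the epochs. The names `dsom`, `neighborhood`, `grid`, `init`, `sigma0` and `sigma_end` are hypothetical.

```python
import math


def neighborhood(grid, sigma):
    """Assumed Gaussian neighbourhood h[k][j] between map units k and j."""
    m = len(grid)
    return [[math.exp(-((grid[k][0] - grid[j][0]) ** 2 +
                        (grid[k][1] - grid[j][1]) ** 2) / (2.0 * sigma ** 2))
             for j in range(m)] for k in range(m)]


def dsom(D, grid, init, epochs, sigma0=1.0, sigma_end=0.1):
    """Standard batch DSOM on an N x N dissimilarity matrix D.

    init lists the observation indices used as initial prototypes.
    Returns (prototype indices, cluster index of each observation).
    """
    n, m = len(D), len(grid)
    protos = list(init)
    assign = [0] * n
    for l in range(epochs):
        # Assumed schedule: geometric decay from sigma0 to sigma_end.
        t = l / max(epochs - 1, 1)
        h = neighborhood(grid, sigma0 * (sigma_end / sigma0) ** t)
        # Affectation phase: closest prototype of the previous epoch.
        assign = [min(range(m), key=lambda j: D[i][protos[j]])
                  for i in range(n)]
        # Representation phase: exhaustive search over all N candidates,
        # each scored by a sum over all N observations.
        protos = [min(range(n),
                      key=lambda u: sum(h[assign[i]][j] * D[i][u]
                                        for i in range(n)))
                  for j in range(m)]
    return protos, assign


# Toy usage: six points on a line, two map units.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in points] for a in points]
protos, assign = dsom(D, grid=[(0, 0), (1, 0)], init=[0, 5],
                      epochs=5, sigma0=0.5)
```

In this sketch the representation phase evaluates N candidates for each of the M units, and each evaluation sums N terms, which is where the quadratic dependence on N shows up.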
Figure. A diagram of a SOM neural network with two layers: an input layer holding the 2D input vector X = (x1, x2) and a competitive layer holding a 4 × 4 array of outputs yij (output vectors Y), connected by the weighting vectors wij,l (W).
The DSOM
[Figure: the DSOM (median SOM) map with prototypes m1 to m9 and their clusters C1 to C9. Labels on the figure: Representation Phase, Affectation Phase, (Standard SOM), (Operation by k-means).]
Partial Sums
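One way partial sums can speed up the representation phase is sketched below. It assumes that the criterion of the representation phase is regrouped by cluster: the sum of dissimilarities from each candidate to each cluster is computed once, then reused for every map unit. The function names and the toy values of `h` are hypothetical, and the naive version is included only to check that both give the same prototypes.

```python
def representation_naive(D, assign, h, m):
    """Exhaustive representation phase: N candidates x N terms per unit."""
    n = len(D)
    return [min(range(n),
                key=lambda u: sum(h[assign[i]][j] * D[i][u]
                                  for i in range(n)))
            for j in range(m)]


def representation_partial_sums(D, assign, h, m):
    """Same search, with per-cluster sums computed once and shared."""
    n = len(D)
    # S[u][k] = sum of D[i][u] over the observations i assigned to unit k.
    S = [[0.0] * m for _ in range(n)]
    for i in range(n):
        k = assign[i]
        for u in range(n):
            S[u][k] += D[i][u]
    # Criterion of candidate u for unit j: sum over k of h[k][j] * S[u][k].
    return [min(range(n),
                key=lambda u: sum(h[k][j] * S[u][k] for k in range(m)))
            for j in range(m)]


# Toy check on six points on a line and two map units.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in points] for a in points]
assign = [0, 0, 0, 1, 1, 1]
h = [[1.0, 0.25], [0.25, 1.0]]  # made-up neighbourhood values
fast = representation_partial_sums(D, assign, h, 2)
slow = representation_naive(D, assign, h, 2)
```

Filling S takes N × N additions, and each of the N × M candidate scores then needs only M terms instead of N.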
Early Stopping
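A possible reading of early stopping, sketched under the assumption that all dissimilarities and neighbourhood values are non-negative: while the score of a candidate prototype is being accumulated, the sum can be abandoned as soon as it reaches the best score found so far, because further terms can only increase it. The helper `argmin_early_stop` and the toy values are hypothetical.

```python
def argmin_early_stop(n_candidates, terms):
    """Index of the candidate with the smallest sum of non-negative terms.

    terms(u) must return an iterable of the summands of candidate u.
    """
    best_u, best_val = None, float("inf")
    for u in range(n_candidates):
        total, pruned = 0.0, False
        for t in terms(u):
            total += t
            if total >= best_val:  # candidate u can no longer win
                pruned = True
                break
        if not pruned and total < best_val:
            best_u, best_val = u, total
    return best_u


# Toy check on six points on a line and two map units.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in points] for a in points]
assign = [0, 0, 0, 1, 1, 1]
h = [[1.0, 0.25], [0.25, 1.0]]  # made-up neighbourhood values
n = len(points)
protos = [
    argmin_early_stop(
        n, lambda u, j=j: (h[assign[i]][j] * D[i][u] for i in range(n)))
    for j in range(2)
]
```

Ties are resolved in favour of the first candidate, as with an exhaustive search in index order, so the selected prototypes do not change.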