

Multiple Spheres SOM

Songwen Zha

<[email protected]>

30 May 2012

A report submitted for the degree of Master of Computing of the Australian National University

Supervisor: Prof. Tom Gedeon


Acknowledgements

I am sincerely and heartily thankful to my supervisor Prof. Tom Gedeon, who gave his guidance and encouragement during the whole project. I would also like to thank Dr. Dingyun Zhu for his suggestions about the coding process.

Moreover, I thank Dr. Weifa Liang, who shared many presentation and writing techniques for the project with me.

Finally, I also would like to thank my family and my friends for their help and support.


Abstract

This report first gives an overview of neural networks and Self-Organizing Maps (SOMs), and then introduces the Spherical Self-Organizing Map (SSOM), which avoids some ill effects of the SOM. In order to explore a more effective method for classification, a new type of Self-Organizing Map is proposed: the Multiple Spherical Self-Organizing Map (MSSOM).

In this report, I present a new way to create the MSSOM based on the SSOM. In this kind of MSSOM, the distance between spheres is no longer a constant, but depends on the number of spheres and the size of the spheres. Due to this change, the neighborhood structure becomes more flexible. I also discuss the quantization error (QE) and topological error (TE), which are used to measure the classification results.

Finally, I designed four experiments to demonstrate that the method I present is viable. The results show that the method is feasible and worth further investigation.


List of Abbreviations

NN      Neural Networks
SOM     Self-Organizing Map
SSOM    Spherical Self-Organizing Map
MSSOM   Multiple Spherical Self-Organizing Map
QE      Quantization Error
TE      Topological Error


Table of Contents

Acknowledgements
Abstract
List of Abbreviations
1 Introduction
  1.1 Motivation
  1.2 Objective of project
  1.3 Contribution
  1.4 Report organization
2 Background and relevant techniques
  2.1 Neural network
  2.2 Kohonen's Self-organizing map
  2.3 Spherical Self-Organizing Maps (S-SOMs)
3 Spherical SOM
  3.1 The weight adaptation algorithm for SSOM
  3.2 The neighborhood structure for the SSOM
4 Multiple Spheres SOM and relevant techniques
  4.1 Description
  4.2 Process of Multiple Spheres SOM
  4.3 Neighborhood structure
  4.4 Distance between spheres
  4.5 Training methods
  4.6 Visualization
  4.7 Evaluation
5 Experiments of Multiple Spheres SOM
  5.1 Experiment 1: Comparison of different SOMs
    5.1.1 Experiment description
    5.1.2 Experiment process and data analysis
  5.2 Experiment 2: Searching for a better Multiple Spheres SOM
    5.2.1 Experiment description
    5.2.2 Experiment process and data analysis
  5.3 Experiment 3: In-depth experiment on MSSOM
  5.4 Experiment 4: Another new method to modify the distance
6 Conclusion and future work
  6.1 Conclusion
  6.2 Future work
References


List of Figures

Figure 1   An example of a basic neuron
Figure 2   2D arrangement
Figure 3   Weight adaptation process
Figure 4   User interface of Multiple Spheres SOM
Figure 5   Data set list
Figure 6   Data structure list
Figure 7   Loading the data
Figure 8   After the training
Figure 9   The flow chart of Multiple Spheres SOM
Figure 10  2D output neuron map of Multiple Spheres SOM
Figure 11  4 spheres SOM
Figure 12  4 spheres chain glyph visualization
Figure 13  4 spheres equal glyph visualization
Figure 14  SSOM visual effects from different angles
Figure 15  4 spheres SOMs visual effects
Figure 16  MSSOM (4s) visual effects


List of Tables

Table 1   Differences between von Neumann machines and neural networks
Table 2   Relation between N and the number of neurons in ICOSA
Table 3   4-pattern input vector in parallel training
Table 4   4-pattern input vector in sequence training
Table 5   Hardware environment
Table 6   Attribute information for IRIS
Table 7   Summary of dataset
Table 8   Comparison of QE and TE for different SOMs
Table 9   Attribute information for ECSH
Table 10  Summary of dataset
Table 11  Results of Huajie Wu's method
Table 12  Results of my method
Table 13  Results of Huajie Wu's method
Table 14  Results of my method
Table 15  Correlation between number of spheres and QE & TE
Table 16  Results of new method


1 Introduction

Self-Organizing Feature Maps (SOMs) [1] are a type of artificial neural network which can map high-dimensional data to a low-dimensional representation space by unsupervised learning. The SOM uses a neighborhood function to preserve the topological mapping from the high-dimensional data space to the map neurons [2]. The regular grid structure of the SOM makes it very useful for visualizing certain characteristics of the model vectors and of the cluster structure [3].

1.1 Motivation

Ideally, all the neurons in a SOM enjoy the same chance of updating their own weights and their neighbors' weights. However, the conventional SOM is a planar map, and the grid units at the boundary have fewer neighbors than the inside units. In other words, the grid units at the boundary get less chance to update their weights and their neighborhoods' weights during the training process, which often leads to the notorious "border effect" [4]. As a result, the spherical SOM was introduced to perform better: its boundary units are eliminated, so all the units enjoy the same priority to compete and update.

In order to improve clustering results, the way the distance between spheres is calculated has been changed. Better clustering may be found by modifying this distance.

Visualization is a great tool to assist the user in analyzing data interactively, especially for discovering and analyzing patterns within large multi-dimensional data sets. Using visual elements such as color and shape, it is much easier to identify patterns. The Multiple Spheres SOM gives a three-dimensional (3D) view, which means the observer may see or find more precise details of informative patterns.

1.2 Objective of project

This report aims to allow users to choose an arbitrary number of spheres to visualize the clustering results, using a modified version of Sangole & Leontitsis's SSOM code, and to explore the influence of the distance between spheres on the clustering results.


1.3 Contribution

The contribution of this report is to extend previous work which constructed a Multiple Spheres SOM by modifying Sangole & Leontitsis's SSOM code. The previous work did not cluster well using many spheres, so a new method of calculating the distance between spheres was introduced and tested on a large dataset.

1.4 Report organization

Chapter 2 gives a background and overview of the relevant techniques, concepts and algorithms, including neural networks, SOMs and SSOMs, which prepare the reader to understand the Multiple Spheres SOM. Chapter 3 gives a further interpretation and explanation of Sangole & Leontitsis's SSOM. Chapter 4 describes how to modify Sangole & Leontitsis's SSOM code to change the distance between spheres, in order to explore the influence that the distance between spheres has on the clustering results. Chapter 5 presents experiments that examine the hypothesis that the distance between spheres affects the clustering results.

2 Background and relevant techniques

2.1 Neural network

Neural networks, or artificial neural networks, are composed of interconnected artificial neurons which can simulate some properties and functions of biological neural networks to solve specific artificial intelligence problems [5].

Compared to von Neumann machines, neural networks are a different paradigm for computing. The von Neumann machine is based on a processing/memory abstraction of human information processing, whereas neural networks are based on the parallel processing structure of the human brain. Table 1 below shows the main differences between the von Neumann and neural network models.

Von Neumann                        Neural Network
Single CPU                         Multiple processing units
CPU is complex                     Each neuron has a simple processing function
Processing is fast                 Processing might be slow
Data pathways are simple           Interconnections between neurons are complex
Data processing is outstanding     Good at pattern recognition
Symbol processing is outstanding   Good at sub-symbolic problems

Table 1  Differences between von Neumann machines and neural networks

Neural networks consist of many neurons, and each neuron can be described by a simple model. Figure 1 shows a basic neuron. From the figure we can see that there is an input layer and an output layer. The parameter X represents an input, which could come from the data set or from other neurons; the parameter Y represents the output; and the parameter W represents the strength of the connection between the input and the neuron. Each input and output could be multi-dimensional.

[Figure 1: An example of a basic neuron. Inputs X1, X2, ..., XN are multiplied by weights W1, W2, ..., WN, summed (ΣWX), and passed through an activation function to produce the output Y.]

Figure 1 shows a simple single-neuron adaptive network. First, when the neuron receives the inputs, each input is multiplied by its weight. The neuron then sums all the weighted inputs. Finally, the neuron applies an activation function to generate the output:

$$y = f\!\left(\sum_{i=1}^{n} w_i x_i\right)$$

where f is the activation function (e.g. the sigmoid function), y is the output of the neuron, n is the number of inputs, $x_i$ is the i-th input value, and $w_i$ is the weight of the i-th input value.
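As a minimal illustration of this computation, the following MATLAB sketch evaluates one neuron; the input and weight values are illustrative, and the sigmoid is just one possible choice of f:

    x = [0.5; 1.0; -0.3];          % example inputs x_1 ... x_n
    w = [0.8; -0.2; 0.4];          % connection weights w_1 ... w_n
    f = @(a) 1 ./ (1 + exp(-a));   % sigmoid activation function
    y = f(w' * x)                  % output y = f(sum_i w_i * x_i)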

In general, neural networks consist of many such basic neurons. A typical neural network contains three layers: an input layer, an output layer and a hidden layer. Each layer plays a critical role during the training process.

There are three main learning algorithms to adapt the weights in neural networks: supervised learning, unsupervised learning and reinforcement learning.

In supervised learning, the network has a group of desired outputs; each actual output is compared with the desired output and the weights are adapted accordingly. In reinforcement learning, some information on the quality of the outputs is required so that credit can be assigned. In unsupervised learning, the input is modeled statistically by the network. In this report, I use unsupervised learning in the Multiple Spheres SOM, which means the network will group the input data and find suitable patterns.

2.2 Kohonen’s Self-organizing map

In fact, the term "self-organizing map" covers many different approaches. Kohonen's SOM is called a topology-preserving map [6]: it uses a neighborhood function to preserve the topological properties of the input space. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space.

In this report, "SOM" always stands for Kohonen's self-organizing map. Teuvo Kohonen first introduced the self-organizing map in 1982 [7]. Kohonen describes the SOM as a visualization and analysis tool for high-dimensional data; it is also commonly used for clustering, dimensionality reduction, classification, sampling, vector quantization and data mining.

A significant difference between the SOM and the backpropagation neural network is that the SOM only contains an input layer and an output layer, without a hidden layer. In other words, all the input neurons are directly connected to the output neurons. Figure 2 below shows the two-dimensional arrangement for the SOM.

Figure 2  2D arrangement


Next, I introduce the algorithm for Kohonen's SOM. It is assumed that the output neurons are connected in an array and that the network is fully connected, meaning all the neurons in the input layer are connected to all the neurons in the output layer. The weight vectors are initialized and the desired number of cycles is set. Then the competitive learning algorithm is used to find the "winning neuron".

First of all, an input vector X is randomly chosen. Then the following function is used to find the distance (Euclidean distance) between the input vector and each of the N weight vectors:

$$d_n = \gamma(n)\,\lVert X - W_n \rVert, \qquad n = 1, 2, \ldots, N \tag{2.1}$$

In the function above, $\gamma(n)$ is a count-dependent non-decreasing function, which is used to prevent cluster under-utilization [7].

Then the nearest neuron needs to be found: the neuron n whose weight vector has the smallest distance to the input X is selected as the winning neuron [6].

After that, the weights associated with the winning neuron and all the neurons residing within its specified neighborhood are updated:

$$W_n(t+1) = W_n(t) + \alpha\, h(n)\,\bigl[X - W_n(t)\bigr] \tag{2.2}$$

$$h(n) = \begin{cases} 1, & d_{map}(n, c) \le s\,R \\ 0, & \text{otherwise} \end{cases} \tag{2.3}$$

where c is the winning neuron, $d_{map}$ is the distance on the map, $\alpha$ is a predefined learning rate, s is the neighborhood size parameter, and R is the size of the neighborhood that is to cover a hemisphere; it is considered constant [6].

Finally, the process above is repeated until the desired number of cycles is reached or a predefined stopping criterion is met.
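The following MATLAB sketch summarizes this loop for a 1-D array of output neurons. It is a minimal illustration under stated assumptions: the data is a random stand-in, the conscience term gamma of equation (2.1) is omitted, a simple bubble neighborhood replaces h(s), and the values of alpha and R are illustrative.

    d = 4; N = 20; P = 150;                   % dimensions, neurons, patterns
    data = rand(P, d);                        % stand-in input vectors
    W = rand(N, d);                           % initial weight vectors
    alpha = 0.1; R = 3; epochs = 40;
    for t = 1:epochs
        for p = randperm(P)                   % randomly chosen inputs
            X = data(p, :);
            dist = sum((W - repmat(X, N, 1)).^2, 2);
            [dmin, c] = min(dist);            % winning neuron c
            for n = max(1, c - R) : min(N, c + R)   % neighborhood of c
                W(n, :) = W(n, :) + alpha * (X - W(n, :));  % update, eq. (2.2)
            end
        end
    end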

2.3 Spherical Self-Organizing Maps (S-SOMs)

The Spherical Self-Organizing Map (S-SOM) is an extension of the SOM. The normal SOM leads to the notorious "border effect", with neurons on the edges of the grid having fewer neighbors; the S-SOM eliminates this effect because all its neurons enjoy the same priority in competition.

The S-SOM inherits the capability of visualizing high-dimensional data: it not only has overall symmetry and continuity in its structure


but also gives a 3D framework for visualizing the data [8].

There are many different topologies for the S-SOM. In this report, I focus on Sangole & Leontitsis's S-SOM; it provides the index of the neighborhood and the relationships between the neurons, which makes it convenient to extend to the multiple-sphere S-SOM.

3 Spherical SOM

In the SSOM, there is a non-linear mapping from the data space to the surface of a sphere. In that case, all the neurons have the same chance to update themselves and their neighborhoods. Several approaches have been attempted to eliminate the "border effect" [9]. Kohonen suggested using a heuristic weighting rule. The problem could also be solved by implementing the map on a torus instead of a sphere; however, the torus-based SOM cannot provide an intuitively readable map, and people find a map on a sphere easier to understand. In addition, the spherical SOM is more suitable for data with an underlying directional structure [10]. Therefore, the spherical SOM is considered an effective way to map from a higher-dimensional input space to a lower-dimensional map space.

3.1 The weight adaptation algorithm for SSOM

In this implementation, all the data vectors are mapped onto a spherical SOM. In the following steps, the winning neuron which is closest to the input vector is selected and updated. The data space is in Cartesian form. Figure 3 shows the weight adaptation process.

At the beginning of the training process (step 0), all the weight vectors are initialized and the desired number of epochs is set.

In step 1, an input vector $X_p$ is selected at random, where the parameter p runs over the total number of training patterns. After the input vector is selected, the distance (or difference) between the input vector and the weight vectors is calculated. From step 0 we already know the input vector's Cartesian coordinates, and a weight vector is indexed by (i,j,k). The following formula shows how to compute the distance between the input vector and the weight vectors; $\gamma_{ijk}$ is a count-dependent non-decreasing function used to prevent cluster under-utilization, and $w_{n,ijk}$ stands for the weight from the n-th input to the (i,j,k)-th weight vector, given i = 1,2,...,I, j = 1,2,...,J, k = 1,2,...,K.


$$d_{ijk} = \gamma_{ijk}\,\bigl\lVert X_p - W_{ijk} \bigr\rVert = \gamma_{ijk}\sqrt{\sum_{n=1}^{N}\bigl(x_{p,n} - w_{n,ijk}\bigr)^{2}} \tag{3.1}$$

In step 3, the winning neuron (i,j,k) is selected by comparing its distance with those of the other neurons. The winning neuron gets the chance to update itself and its neighborhood [8].

In step 4, all the weights associated with the winning neuron (i,j,k) and all the neurons residing within the specified neighborhood $\sigma(t)$ are updated. The following formulas show how to update the weights:

$$W_{ijk}(t+1) = W_{ijk}(t) + \alpha\,\bigl[X_p - W_{ijk}(t)\bigr] \quad \text{for all neurons within } \sigma(t) \text{ of the winner} \tag{3.2}$$

where

$$\sigma(t) = \begin{cases} \sigma_0, & 0 \le t < T/4 \\ \sigma_0/2, & T/4 \le t < T/2 \\ 1, & T/2 \le t < 3T/4 \\ 0, & 3T/4 \le t \le T \end{cases} \tag{3.3}$$

$\alpha$ is a predefined learning rate, $\sigma_0$ stands for the initial neighborhood size [3], and T is the total number of epochs.

[Figure 3: Weight adaptation process. Flow: initialize the weight vectors and set the desired number of epochs → randomly select an input from the dataset → calculate the distance between the input vector and the weight vectors of all neurons → select the winning neuron as the one with the minimum distance → update the weights of the winning neuron and its neighborhood → if the stopping criterion is not satisfied, repeat from the input selection; once all inputs have been selected, finish.]

The current neighborhood size is reduced gradually during the training process; at the end of the process, only the winning neuron is updated. The number of epochs is set in step 0, as already mentioned. The neighborhood size is generally reduced during training: in the first quarter of the epochs, the current neighborhood size equals the initial neighborhood size; in the second quarter, it equals half of the initial neighborhood size; in the third quarter, it equals 1; and in the last quarter, it equals 0 [13].

In step 5, whether the stopping criterion is satisfied is checked; if not, the training process returns to step 1 and repeats until the criterion is satisfied.

In the final step, whether all the input patterns have been selected is checked, to make sure the training process is correct.
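As a small sketch of the quarter-wise schedule above (the function name is mine; T is the total number of epochs and sigma0 the initial neighborhood size):

    function s = nbhd_size(t, T, sigma0)
    % Neighborhood-size schedule of equation (3.3): full size, half
    % size, 1, then 0 across the four quarters of training.
    if t <= T/4
        s = sigma0;
    elseif t <= T/2
        s = sigma0 / 2;
    elseif t <= 3*T/4
        s = 1;
    else
        s = 0;          % only the winning neuron is updated
    end
    end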

3.2 The neighborhood structure for the SSOM

In Sangole's spherical SOM, the distance between two neurons on the sphere is defined as the length of the shortest path connecting them. Each neuron maintains a list of its immediate neighbors in a multi-dimensional matrix, so it is easy to find the neighborhood neurons once the winning neuron is selected.

There are five regular tessellations of the sphere based on the Platonic polyhedra: the tetrahedron, octahedron, cube, dodecahedron and icosahedron [2]. The icosahedron-based geodesic dome is chosen for the spherical SOM because the icosahedron is the most similar to the sphere, meaning the variation in edge length is smaller than for the others. We call this arrangement ICOSA(N), where N is the number of recursive subdivisions; ICOSA(N) arranges $2 + 10 \times 4^{N}$ neurons in the output space. Table 2 shows the number of neurons for different values of N.

N   Number of neurons
0   12
1   42
2   162
3   642
4   2562
5   10242

Table 2  Relation between N and the number of neurons in ICOSA

From the table we can see that a single sphere may not give an exactly suitable structure for a dataset. However, we can choose a more suitable structure using multiple spheres.
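The counts in Table 2 follow directly from the subdivision formula; a one-line MATLAB check:

    for N = 0:5
        fprintf('N = %d: %d neurons\n', N, 2 + 10 * 4^N);   % 2 + 10*4^N
    end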

4 Multiple Spheres SOM and relevant techniques

In this chapter, I introduce the training process of the Multiple Spheres SOM, then explain how the neighborhood structure and the distance between spheres are defined. Some visualization results are also displayed. Moreover, the two training methods implemented in the program are described. Finally, the evaluation criteria used to analyze the classification results are introduced.

4.1 Description

In this section, I introduce the main GUI window of the software and describe the user interface. Figure 4 shows that the user interface is straightforward: users can quickly get the critical information from it.

Figure 4  User interface of Multiple Spheres SOM

Before the training, a dataset and a SOM data structure must be loaded. All the datasets are saved in '.mat' files (shown in Figure 5). All the data structures are listed in Figure 6 (for more details refer to Table 2). First, the dataset to be classified is loaded. Then a suitable data structure file from Figure 6 is loaded; which structure to choose depends on the size of the dataset. Finally, the parameters in the first column on the right side of Figure 4, the training parameters, need to be set. "Epochs" represents the number of parallel training cycles, and is set to 44 by default. "Size" stands for the neighborhood size, and is set to 0.5 by default, which means the neighborhood covers a hemisphere and is considered constant. "Spheres" represents the number of spheres, and is set to 3 by default. "Times" counts the number of sequence training passes, and is set to 3 by default.


Figure 5  Data set list

Figure 6  Data structure list

The parameters in the second column are used for display; when the training process is finished, they will all have values.

Figure 7  Loading the data

From Figure 7 we can see that, after loading the dataset file and the data structure, the two training buttons become visible. The first training button is used for parallel training, which uses the parameter "Epochs" to count the training cycles. The second training button is used for sequence training, which uses the parameter "Times" to count the training passes. I describe these two training methods in detail in Section 4.5.

Figure 8  After the training

From Figure 8, we can see that the plot glyph buttons are now visible. The first button, "Plot 'Chain' Glyph", creates the glyphs in a chain, and the second, "Plot 'Equal' Glyph", generates glyphs of equal size. I describe the details in Section 4.6.

If users want more information about the software, the "Help" button provides it.

4.2 Process of Multiple Spheres SOM

In this project, Sangole & Leontitsis's S-SOM is used to generate the Multiple Spherical Self-Organizing Map. The flow chart below (Figure 9) shows the general process of the Multiple Spheres SOM. From the chart, it can be seen that the process consists of four critical parts: initialization, training, visualization and evaluation.

In step 1, the dataset file and data structure file are loaded, then the common parameters such as size and spheres are set, and all the variables are saved in the workspace. Compared to the SSOM, the Multiple Spheres SOM's neighborhood structure is changed, because the neighbors of a neuron are no longer all on the same sphere. The neighbors on the other spheres must be calculated, so the neighborhood list must be reorganized. Also, for the neurons on


the spheres, the Cartesian coordinates change according to the number of spheres set before, so we need to resize all the neurons' coordinates as well.

[Figure 9: The flow chart of Multiple Spheres SOM. Step 1, initialization: load the data set and data structure (1.1); load the common parameters and save the variables (1.2); reorganize the neurons and their neighbors (1.3); read the epochs for parallel training (1.4.1) or the times for sequence training (1.4.2). Step 2, training process: parallel training (2.1.1) or sequence training (2.1.2). Step 3, visualization: Plot 'Chain' Glyph or Plot 'Equal' Glyph, with distortions and colors. Step 4, evaluation: QE and TE (4.1.1, 4.1.2).]

In step 2, one of two training methods is applied, depending on the button the user chooses. More details are given in Section 4.5.

After the training process, we need to analyze the training result. Step 3 provides two different ways to view the data: Plot 'Chain' Glyph and Plot 'Equal' Glyph. Section 4.6 gives more detail about them.

In the final step, we evaluate the classification result obtained from the training process. Section 4.7 gives a more specific explanation of the evaluation.

4.3 Neighborhood structure

One of the differences between the SSOM and the Multiple Spheres SOM is the neighborhood structure. The neighborhood size becomes smaller and smaller during training. In the initialization, the neighborhood size is set to 0.5 by default, which means the neighborhood covers a hemisphere. When a neuron's neighborhood covers a hemisphere, the search may extend to the other spheres. Therefore, the neighborhood list must be modified: more neighbors from the other spheres are added to the list, which makes the training process more complex.

[Figure 10: 2D output neuron map of the Multiple Spheres SOM, showing spheres 1 to 3 with neurons labelled a to m on each sphere.]

Figure 10 shows part of a Multiple Spheres SOM. There are three spheres, drawn in different colors. Suppose the winning neuron is a on sphere 1, the distance between adjacent neurons is 1, and the distance between spheres is also 1. Compared to the S-SOM, the neighborhood size changes; we use nsize to represent it. When nsize = 1, the red neurons (left sphere) b to g are all neighbors of the winning red a; because the distance between spheres is also 1, the blue (middle sphere) a is a neighbor of red a as well. When nsize = 2, the red neurons h to m are neighbors of red a on sphere 1, the blue neurons b to g are neighbors of red a on sphere 2, and the green (right sphere) neuron a is the neighbor of red a on sphere 3.

In the discussion above, however, the distance between spheres was 1, and in fact a distance of 1 between spheres does not make much sense. In the next section, I provide a new way to set the distance between spheres in order to get better classification.
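A compact way to restate the rule illustrated by Figure 10: a neuron on a sphere j hops away from the winner is a neighbor when its ring distance on that sphere, plus the hops times the inter-sphere distance, stays within nsize. A small MATLAB sketch (the variable names are mine; the values reproduce the nsize = 2 example above):

    nsize = 2; sphere_dist = 1; n_spheres = 3;
    for j = 0 : n_spheres - 1              % hops away from the winner's sphere
        budget = nsize - j * sphere_dist;  % ring radius left on that sphere
        if budget >= 0
            fprintf('sphere %d: neighbors within ring distance %d\n', j + 1, budget);
        end
    end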

4.4 Distance between spheres

In the initialization step, we already set the maximum neighborhood size equal to the number of steps a neuron takes to traverse the hemisphere; in the S-SOM, it is effective to set the neighborhood size parameter to 0.5. However, the Multiple Spheres SOM introduces the distance between spheres. In the S-SOM all output neurons are on one sphere, but here, if we still chose neighborhoods only from the same sphere, it would not be


fair for the neurons on the other spheres. If we set the distance between spheres much shorter than the distance between adjacent neurons, the training process involves more complex computation. If we set it much longer, the Multiple Spheres SOM becomes less effective, because the winning neuron may only ever search among neurons on its own sphere. As a result, a suitable distance between spheres needs to be found.

The distance between spheres can be related to the neighborhood size of the Multiple Spheres SOM. In other words, we aim for the number of steps by which a neuron traverses the hemisphere to be exactly the number of steps by which it traverses half of the spheres. Figure 11 below shows the example of a 4-sphere SOM, where the second sphere is chosen as the start sphere.

[Figure 11: A 4-sphere SOM. Neurons a, b and c are shown on each of spheres 1 to 4, with the distance between spheres marked between adjacent spheres.]

Following this idea, in Figure 11 we want to find a distance such that the number of steps from a on sphere 1 to a on sphere 3 equals the number of steps for a to traverse the hemisphere, which we denote rsize. From Figure 11 we can easily see that double the distance equals rsize, and half the number of spheres is exactly two. If there were six spheres, triple the distance would equal rsize, and half the number of spheres would be three. As a result, we can conclude that the distance between spheres equals double rsize divided by the number of spheres:

$$\text{Distance} = \frac{2 \times rsize}{spheres} \tag{4.1}$$

where rsize is the number of steps for a neuron on the sphere to traverse the hemisphere.

Moreover, the pseudocode for updating the neurons' neighborhood lists and the distance between spheres is given below:

% Algorithm: update the neurons' neighborhood lists and the
% distance between spheres.
% C is the single-sphere neighborhood data structure (a cell array),
% X the neuron coordinates, nsize the number of neurons per sphere.
Cnew = C;
Xnew = X;

% Calculate the distance between spheres (formula 4.1).
if spheres == 1
    Distance = 0;
else
    Distance = 2 * rsize / spheres;
end

% Replicate the neighborhood structure for the additional spheres.
for i = 1 : spheres - 1
    Xnew = [Xnew; X];
    Cnew = [Cnew; C];
end

% Offset the neighbor indexes so that each sphere's copy refers
% to its own sphere's neurons.
for i = 1 : size(C, 2)
    for j = 1 : spheres
        for k = 1 : nsize
            Cnew{k + nsize*(j-1), i} = Cnew{k + nsize*(j-1), i} + nsize*(j-1);
        end
    end
end

% rsize is set to span all the spheres as far as possible; if rsize
% exceeds the default radius parameter, the excess entries are set
% to empty values to avoid indexing past the structure.

% Resize the neighborhood list:
%   1. get the indexes of the adjacent spheres;
%   2. for a sphere k steps away, add its neurons within the
%      remaining radius (rsize - k) to the list.


4.5 Training methods

Two different training methods are applied in the Multiple Spheres SOM: parallel training and sequence training. Parallel training is the traditional method mentioned in Section 3.1; in parallel training, the winning neuron is always chosen starting from the first pattern for each sphere. In sequence training, the first pattern each sphere sees is randomly selected. The following two tables show how the two training methods work on a 4-sphere SOM whose input vector has 4 patterns.

Epochs\Spheres   S1    S2    S3    S4
1                P1    P1    P1    P1
1                P2    P2    P2    P2
1                P3    P3    P3    P3
1                P4    P4    P4    P4
2                P1    P1    P1    P1
...              ...   ...   ...   ...

Table 3  4-pattern input vector in parallel training

Times\Spheres    S1    S2    S3    S4
1                P4    P3    P2    P1
2                P3    P2    P1    P4
3                P2    P1    P4    P3
4                P1    P4    P3    P2
5                P2    P3    P4    P1
...              ...   ...   ...   ...

Table 4  4-pattern input vector in sequence training

where P1 stands for the first pattern of the input vector.

From the tables above, parallel training uses one epoch to present each pattern to each sphere, and sequence training takes 4 "times" to achieve the same. Therefore, 4 times of sequence training have the same effect as one epoch of parallel training.

4.6 Visualization

One advantage of the Multiple Spheres SOM is that it provides a 3D framework for visualizing the data. This project gives two display methods, which makes it convenient for the user to analyze the data from different viewpoints.

Page 26: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 26

The "Plot 'Chain' Glyph" option visualizes the chain of spheres, so users see all the spheres at the same time. The center sphere is the focus and gets the biggest view size, and the sizes of the other spheres decrease with their distance from the center sphere. Users can choose which sphere to focus on by adjusting the initialization settings. Figure 12 shows 4 spheres with sphere 1 set as the center sphere; the ends of the chain wrap around.

Figure 12  4 spheres chain glyph visualization

However, sometimes the data set has a lot of input, which usually means many spheres are needed. In that case, users may have to see many spheres in the same graph, and even if the user sets the center sphere, it is hard to recognize the features of the smaller spheres. The "Plot 'Equal' Glyph" option displays an arbitrary contiguous range of equal-size spheres, which avoids these problems. Figure 13 shows the same result as Figure 12, with only sphere 1 shown on the graph.

Figure 13  4 spheres equal glyph visualization

Page 27: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 27

4.7 Evaluation

There are many ways to evaluate the quality of Multiple Spheres SOM. Here I

would like introduce quantization error and topological error methods which I

used in this project.

The SOM could compress the information while preserving the topological and

the metric relationships of primary data items. When the input dimension is

higher than the output dimension, the vector quantization and topology

preservation always conflict..

However, no matter how much we train, some difference remains between the input patterns and the neurons they are mapped to. These differences are what we consider the quantization error [11]. The quantization error (QE) is defined as follows:

$$QE = \frac{1}{N}\sum_{i=1}^{N}\bigl\lVert x_i - m_{c(x_i)} \bigr\rVert \tag{4.2}$$

where $m_{c(x_i)}$ is the winning neuron's weight vector for the data point $x_i$, and N is the number of data points.

The topological error (TE) evaluates the complexity of the output space [12]: it measures the average number of times the second-closest neuron is not adjacent to the first-closest neuron. The higher the topological error, the more complex the output space [11]; a high topological error may indicate that the training was not adequate. The topological error is defined as follows:

$$TE = \frac{1}{N}\sum_{i=1}^{N} u(x_i) \tag{4.3}$$

where $u(x_i) = 1$ if the best-matching and second-best-matching neurons of $x_i$ are not adjacent on the map, and $u(x_i) = 0$ otherwise.

From the definitions, we can see that TE increases during training when non-adjacent neurons on the map approach each other in the data space. In contrast, a high QE means that the vectors associated with the map neurons do not represent the data points well.
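A minimal MATLAB sketch of both measures, assuming W (one row per neuron) holds the trained weights, data holds the input vectors, and nbrs{n} lists the neurons adjacent to neuron n on the map (all three names are mine):

    P = size(data, 1);
    qe = 0; te = 0;
    for i = 1:P
        dist = sum((W - repmat(data(i,:), size(W,1), 1)).^2, 2);
        [dsorted, idx] = sort(dist);         % idx(1) = BMU, idx(2) = 2nd BMU
        qe = qe + sqrt(dsorted(1));          % distance to the winner
        if ~any(nbrs{idx(1)} == idx(2))      % second BMU not adjacent?
            te = te + 1;
        end
    end
    qe = qe / P    % quantization error, equation (4.2)
    te = te / P    % topological error, equation (4.3)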

Page 28: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 28

5 Experiments of Multiple Spheres SOM

All the experiments described here are based on the Multiple Spheres SOM model. The network and test suites are implemented in Matlab R2009a (7.8.0.347), Windows version. The hardware environment used is listed in the table below:

PC brand           ASUS K42JV laptop
CPU                Intel Core i5 M460 (2.54 GHz)
Memory             2.67 GB
Operating system   Windows 7, 32-bit
Video card         NVIDIA GeForce GT335M
Network            PCI Express Gigabit Ethernet Adapter

Table 5  Hardware environment

5.1 Experiment 1: Comparison of different SOMs

In this experiment, I compare the quality of different SOMs using the same dataset in each. The quantization error and topological error are used to evaluate the classification results of the different SOMs.

5.1.1 Experiment description

The dataset I chose is the IRIS dataset, perhaps the best-known database in the pattern recognition literature. It consists of 3 classes, each containing 50 instances. The attribute information for the IRIS dataset is listed below:

Attribute 1   sepal length in cm
Attribute 2   sepal width in cm
Attribute 3   petal length in cm
Attribute 4   petal width in cm
Attribute 5   class: Iris Setosa, Iris Versicolor, Iris Virginica

Table 6  Attribute information for IRIS

The following table shows a summary of the IRIS dataset properties:

Dataset name   Input dimensions   Number of instances   Missing values   Data types
IRIS           4                  150                   No               Multivariate

Table 7  Summary of dataset

Page 29: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 29

5.1.2 Experiment process and data analysis

As mentioned before, I am interested in comparing three types of self-organizing map: Kohonen's SOM, the SSOM and the Multiple Spheres SOM. This part of the experiment aims to find the better type of self-organizing map by comparing quantization errors and topological errors among the three. First, all the parameters are initialized. "Epoch", the number of training passes over the network, is set to 40; this parameter should be neither too large nor too small, as that may cause overfitting or underfitting respectively. The parameter "Size" is set to 0.5, the neighborhood size needed to cover a hemisphere in a spherical SOM.

Then, in order to collect comparable results, I chose similar numbers of neurons for each type of self-organizing map: the ICOSA(2) structure, which contains 162 neurons, is selected for the SSOM, and 4 spheres of ICOSA(1), with 42 neurons each (168 in total), are selected for the Multiple Spheres SOM.

Finally, I ran each structure 10 times; the average values are shown in the following table:

                          Kohonen's SOM   SSOM    4 Spheres SOM
Quantization error (QE)   0.255           0.218   0.235
Topological error (TE)    0.028           0.021   0.019

Table 8  Comparison of QE and TE for different SOMs

From the table above, we can see that the SSOM and the 4-sphere SOM are both better than Kohonen's SOM in both quantization error (QE) and topological error (TE).

In the introduction I mentioned that Kohonen's SOM suffers border effects: a neuron on the border has less chance to update itself and its neighborhood. Both the SSOM and the Multiple Spheres SOM avoid this problem, because all their neurons are on the surface of a sphere rather than in a planar arrangement. As a result, the SSOM and the 4-sphere SOM get better results than Kohonen's SOM.

The following graphs give the optimal visual effects using the "Equal Glyph" display scheme for the SSOM and the 4-sphere SOM. Figure 14 shows the SSOM visual effects with a QE of 0.192 and a TE of 0.018, and Figure 15 shows the Multiple Spheres SOM (4 spheres) visual effects with a QE of 0.114 and a TE of 0.013.

Figure 14  SSOM visual effects from different angles

Figure 15  4 spheres SOMs visual effects

From Table 8 we can see that there are only slight differences between the results of the SSOM and the Multiple Spheres SOM. That is most likely because the IRIS dataset only contains 150 instances and 4 input dimensions; the dataset is too small to differentiate between the two models. As a result, I chose the ECSH dataset, which contains 3641 instances and 7 input dimensions, to get results on a more complex dataset.

Page 31: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 31

5.2 Experiment 2: Searching for a better Multiple Spheres SOM

In this experiment, I again use the quantization error and topological error to evaluate the classification results of different SSOMs. I chose a more complex dataset in order to get more general results across the different SSOMs.

As mentioned in Chapter 4, I introduced a new way to set the distance between spheres. In Huajie Wu's method, the distance between spheres is 1. In my opinion this is not suitable for the Multiple Spheres SOM: the distance between two adjacent neurons is already set to 1, so if the distance between spheres is also 1, it is as if we had merely split one big sphere into several small spheres, and the neighborhood structure does not change much. Therefore, we need a new way to set the distance between spheres. In my method, the distance depends on the number of spheres and the size of the spheres (the formula given in Section 4.4).

In order to test whether my method is better, I chose the same dataset Wu used in his experiment and compared the results of Wu's method and my method.

5.2.1 Experiment description

This dataset is used for judging whether the text a reader is looking at is easy, calm, stressful or hard during the reading process. The attribute information for ECSH is listed below:

Attribute 1   xGaze
Attribute 2   yGaze
Attribute 3   pupiLdiam
Attribute 4   pupiRdiam
Attribute 5   ECG
Attribute 6   GSR
Attribute 7   BP

Table 9  Attribute information for ECSH

The following table shows the summary of the ECSH dataset:

Dataset name   Input dimensions   Number of instances   Missing values   Data types
ECSH           7                  3641                  Yes              Time sequence

Table 10  Summary of dataset

Page 32: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 32

5.2.2 Experiment process and data analysis

In order to get results comparable to the previous work, I used the same parameter values as Wu: "Epoch" is set to 20 and "Size" to 0.5. I chose 2,562 total neurons for the SSOM, the same as Wu's setting, and used the same experiment groups as Wu.

The following table shows Wu's results on ECSH; all values are averages over 10 runs:

Error   SSOM     MSSOM (4s)   MSSOM (15s)   MSSOM (61s)   MSSOM (214s)
        2,562n   4s x 642n    15s x 162n    61s x 42n     214s x 12n
QE      193.77   381.71       188.75        587.73        426.14
TE      0.101    0.328        0.179         0.092         0.105

Table 11  Results of Huajie Wu's method

In Table 11, SSOM is a spherical SOM with 2,562 neurons, and MSSOM stands for Multiple Spheres SOM: MSSOM (4s) represents 4 spheres of 642 neurons each; MSSOM (15s), 15 spheres of 162 neurons; MSSOM (61s), 61 spheres of 42 neurons; and MSSOM (214s), 214 spheres of 12 neurons.

The table below shows my results on ECSH; all values are again averages over 10 runs:

Error   SSOM     MSSOM (4s)   MSSOM (15s)   MSSOM (61s)   MSSOM (214s)
        2,562n   4s x 642n    15s x 162n    61s x 42n     214s x 12n
QE      193.77   91.99        431.33        3196.75       3001.99
TE      0.101    0.135        0.136         0.088         0.174

Table 12  Results of my method

From Tables 11 and 12, we can see that both methods find a better MSSOM, one with lower quantization error and topological error. Both methods show their smallest topological errors in the same group, MSSOM (61s), and with almost the same values. Wu said in his report that "MSSOM (61s) has the largest average number of neighbors per neuron". In my method I changed the distance between spheres: when the number of spheres is large enough, the distance between spheres becomes much smaller than 1, so more neurons get the chance to update themselves during training. TE measures whether a neuron's first-closest neuron is connected to its second-closest neuron, so as the number of neighbors increases, the chance of a topological error is reduced. Therefore, MSSOM (61s) has the smallest TE in both methods.

Page 33: Multiple Spheres SOM - Home - CECS - ANUcourses.cecs.anu.edu.au/courses/CS_PROJECTS/12S1/Reports/...Page 3 Abstract This report first gives an overview of neural networks and Self-Organizing

Page 33

However, we can also see that MSSOM (4s) in my method gets a much lower quantization error (QE) than either the SSOM or Wu's method. In MSSOM (4s), the distance between spheres is nearly 5.76, which is significantly larger than 1 and much larger than the distance between neurons on the same sphere. Therefore, the input patterns have more chance of being mapped to a similar neuron.

The following graphs show the optimal visual effects of MSSOM (4s) using the "Equal Glyph" display scheme. Figure 16 shows the MSSOM (4s) visual effects with a QE of 0.114 and a TE of 0.013.

Even though I get a much better result than Wu's for MSSOM (4s), it is still not completely convincing; to confirm it, we need to try other experiments with large spheres. Therefore, I designed the following experiment.

Figure 16  MSSOM (4s) visual effects

5.3 Experiment 3: In-depth experiment on MSSOM

In this experiment, I compare two further SOM topologies with Wu's method. To rule out that the result of experiment 2 was specific to four spheres, I use the same per-sphere structure as MSSOM (4s) but vary the number of spheres.

In experiment 2, MSSOM (4s) consisted of 4 spheres of 642 neurons each. Here I use 3 spheres and 5 spheres, each sphere again containing 642 neurons, to determine whether the results of experiment 2 carry over.

Table 13 shows the results of Wu's method; each value is the average over 10 runs:

Spheres   QE       TE
3         416.71   0.383
4         381.71   0.328
5         305.45   0.459

Table 13: Results of Huajie Wu's method

Table 14 shows the results of my method; each value is again the average over 10 runs:

Spheres   QE      TE
3         96.52   0.113
4         91.99   0.135
5         66.25   0.117

Table 14: Results of my method

From Tables 13 and 14, we can clearly see that both QE and TE in my method are much lower than in Wu's method. Meanwhile, we can see that QE decreases as the number of spheres increases. The correlations are summarised below:

Method          QE                      TE
Zha's method    Negative correlation    No clear correlation
Wu's method     Negative correlation    No clear correlation

Table 15: Correlation between the number of spheres and QE & TE

From the table above, QE is negatively correlated with the number of spheres, while TE shows no clear correlation with it. Since QE varies systematically with the number of spheres and TE does not, QE has more impact on the results, and we reach the same conclusion as in experiment 2.
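As a quick check of these correlations, the sketch below computes Pearson coefficients over the three data points in Tables 13 and 14; with only three sphere counts it is indicative rather than conclusive.

```python
import numpy as np

# QE and TE values from Tables 13 and 14 (3, 4 and 5 spheres)
spheres = np.array([3, 4, 5])
series = {
    "QE (Wu)":  np.array([416.71, 381.71, 305.45]),
    "TE (Wu)":  np.array([0.383, 0.328, 0.459]),
    "QE (Zha)": np.array([96.52, 91.99, 66.25]),
    "TE (Zha)": np.array([0.113, 0.135, 0.117]),
}

for name, values in series.items():
    # Pearson correlation between sphere count and the error measure
    r = np.corrcoef(spheres, values)[0, 1]
    print(f"{name}: r = {r:+.2f}")
# QE correlates strongly and negatively with the number of spheres
# in both methods; TE shows no consistent relationship.
```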

5.4 Experiment 4: Another new method to modify the distance

The three experiments above show that the method I proposed is an effective way to perform classification with a small number of large spheres. To look for a still more effective method, I modified formula (4.1) and repeated the experiment on the same dataset. The new formula is as follows:

distance = 4 * rsize / (spheres * sqrt(spheres))    (5.1)

The reason for this modification is that I want the new method to remain as effective on few large spheres as before, while also reducing the errors on many small spheres. From formula (5.1) we can see that when the number of spheres equals 4, formula (5.1) gives the same distance as formula (4.1).

Spheres   Wu's method      distance = 2*rsize/spheres   distance = 4*rsize/(spheres*sqrt(spheres))
          QE      TE       QE        TE                 QE        TE
4         381.71  0.328    91.99     0.135              91.99     0.135
15        188.75  0.179    431.33    0.136              428.57    0.076
61        587.73  0.092    3196.75   0.088              2776.96   0.073

Table 16: Results of the new method

Table 16 shows that the new method slightly reduces the QE and the TE compared with formula (4.1). Because the distance given by the new formula is smaller once there are more than four spheres, the neighborhood structure changes more gradually during training; as a result, both QE and TE decrease.
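A short sketch makes the relation between the two formulas concrete; the function names are illustrative, and rsize is left as a free parameter since its value is not restated here.

```python
import math

def d_old(rsize, n):
    # Formula (4.1): distance = 2 * rsize / n
    return 2.0 * rsize / n

def d_new(rsize, n):
    # Formula (5.1): distance = 4 * rsize / (n * sqrt(n))
    return 4.0 * rsize / (n * math.sqrt(n))

# The two formulas coincide at n = 4, and the new distance shrinks
# faster as more spheres are added (values shown for rsize = 1).
for n in (4, 15, 61):
    print(n, round(d_old(1.0, n), 4), round(d_new(1.0, n), 4))
# 4   0.5     0.5
# 15  0.1333  0.0689
# 61  0.0328  0.0084
```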

6 Conclusion and future work

6.1 Conclusion

I have proposed a new way to modify the structure of the Multiple Spheres SOM. In the new method the neighborhood structure is changed: the distance between spheres depends on the number of spheres and the size of the spheres. I then designed three experiments to demonstrate that this method is better than Wu's method.

In experiment 1, I demonstrated that the Multiple Spheres SOM I proposed is better than Kohonen's SOM. In experiment 2, I obtained a much better QE for MSSOM (4s), almost half the best result of Wu's method. Finally, I demonstrated that not only MSSOM (4s) but also MSSOM (3s) and MSSOM (5s) are better than Wu's.

In conclusion, my modification of the way the distance between spheres is set is an effective route to better classification.

6.2 Future work

First of all, in order to compare with Wu's method, I only used the datasets from Wu's experiments. More datasets should be tested on different types of MSSOM, and more experiment groups need to be designed, to further demonstrate the effectiveness and limitations of my method.

Also, even though I obtained better results with a few large spheres, when the number of spheres is large the distance between spheres becomes small and the neighborhood structure changes. Usually, when the winning neuron searches its neighborhood, the search distance increases by 1 at each step, because the neighborhood structure is stored in a matrix whose indices must be integers. The winning neuron may then spend more time searching the other spheres, which may increase QE and TE; it may also take longer to search across more spheres because the distance between spheres is so small. To avoid these problems, formula (4.1) should be modified to cater for large numbers of spheres. It should be investigated whether the problem is really due to there being many spheres, or due to the small size of the spheres used. One possibility to investigate is whether there should be a maximum on how many spheres away a neighborhood can stretch, as sketched below.
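A minimal sketch of that idea follows, assuming the neighborhood is found by breadth-first search over a neuron adjacency structure with an integer radius; `neighbours`, `sphere_of` and `max_sphere_hops` are hypothetical names, not the report's implementation.

```python
from collections import deque

def neighbourhood(start, neighbours, sphere_of, radius, max_sphere_hops):
    """Breadth-first neighborhood search that caps how many spheres
    away from the winning neuron the neighborhood may stretch.

    start           : index of the winning neuron
    neighbours      : list mapping neuron index -> adjacent neuron indices
    sphere_of       : list mapping neuron index -> sphere index
    radius          : integer search radius, grown by 1 per step
    max_sphere_hops : proposed cap on sphere crossings
    """
    dist = {start: 0}               # neuron -> integer map distance
    hops = {start: 0}               # sphere crossings along the path
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:       # stop expanding at the search radius
            continue
        for v in neighbours[u]:
            crossed = hops[u] + (sphere_of[v] != sphere_of[u])
            if v not in dist and crossed <= max_sphere_hops:
                dist[v] = dist[u] + 1
                hops[v] = crossed
                queue.append(v)
    return dist                     # neurons reached, with distances
```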

References

[1] Leontitsis, A & Sangole, AP 2006, 'Estimating an Optimal Neighborhood Size in the Spherical Self-Organizing Feature Map', International Journal of Computational Intelligence, vol. 18, no. 35, pp. 192-196.

[2] Nishio, H, Altaf-Ul-Amin, MD, Kurokawa, K, Minato, K & Kanaya, S 2005, 'Spherical SOM with arbitrary number of neurons and measure of suitability', WSOM, pp. 323-330.

[3] Sangole, A & Knopf, GK 2002, 'Representing high-dimensional data sets as closed surfaces', Information Visualization, vol. 1, no. 2, June, pp. 111-124.

[4] Tuoya, Suggi, Y, Satoh, H, Yu, D, Matsuura, Y, Tokutaka, H & Seno 2008, 'Spherical self-organizing map as a helpful tool to identify category-specific cell surface markers', Biochemical and Biophysical Research Communications, vol. 376, pp. 414-418.

[5] Lawrence, J 1994, Introduction to Neural Networks: Design, Theory and Applications, 6th edn, California Scientific Software Press.

[6] Kohonen, T 1990, 'The Self-Organizing Map', Proceedings of the IEEE, vol. 78, no. 9, September, pp. 1464-1480.

[7] Kohonen, T 1982, 'Self-Organized Formation of Topologically Correct Feature Maps', Biological Cybernetics, vol. 43.

[8] Sangole, A & Knopf, GK 2003, 'Visualization of randomly ordered numeric data sets using spherical self-organizing feature maps', Computers & Graphics, vol. 27, no. 6, pp. 963-976.

[9] Wu, Y & Takatsuka, M 2005, 'Fast spherical self-organizing map: use of indexed geodesic data structure', WSOM.

[10] Wu, Y & Takatsuka, M 2006, 'Spherical self-organizing map using efficient indexed geodesic data structure', Neural Networks, vol. 19, no. 6, pp. 900-910.

[11] Kirk, JS & Zurada, JM 1999, 'Algorithms for improved topology preservation in self-organizing maps', IEEE, pp. 396-400.

[12] Uriarte, EA & Martin, FD 2005, 'Topology Preservation in SOM', International Journal of Mathematical and Computer Science, vol. 1, no. 1, pp. 1-4.

[13] Sangole, A & Knopf, GK 2003, 'Geometric Representations for High-Dimensional Data Using a Spherical SOFM', Smart Engineering System Design, vol. 5, pp. 11-20.