Post on 29-Oct-2019

Self-Organized Swarms for cluster preserving Projections of high-dimensional Data

Alfred Ultsch, Lutz Herrmann

Databionics Research Group

Herrmann / Ultsch, University of Marburg

Outline of the talk

How to visualize structures of R^n

What do we know about the structure preservation of projections?

Problems with naive usage

U-Matrix on ESOM

Swarm Organized Projection (SOP)

How to visualize structures of R^n

Answer: projection into R^m with m < 4

Possibilities:

1. Linear projections

2. Nonlinear projections preserving distances

3. Discontinuous projections preserving what?

1. Linear Projections: PCA

Principal Component Analysis (PCA)

PCA is “all wrong” here

1. Linear Projections: ICA

Independent Component Analysis (ICA)

ICA is “all wrong” here

2. Nonlinear Projections: MDS

Multidimensional Scaling (MDS)

MDS is “all wrong” here

See the nonlinear distortion of space!

2. Nonlinear Projections: Sammon

Sammon's Mapping (SM)

SM is “all wrong” here

See the nonlinear distortion of space!

Curvilinear Component Analysis

advanced offspring of MDS

Michel Verleysen's study group

“unfolds non-linear manifolds”

minimization of topographic error

3. Discontinuous Projection: CCA

Curvilinear Component Analysis (CCA) [Verleysen et al., Leuven]

advanced offspring of MDS

“unfolds non-linear manifolds”

minimization of “topographic error”

Problem: cooling scheme for neighborhood radius

[Figure: Chainlink data set (axes x, y, z) and its CCA projection]

3. Discontinuous Projection: ESOM

Emergent SOM = ESOM

Visualize structures using the U-matrix

Approved recipes for parameters

ESOM (Emergent SOM)

Self-Organized Feature Maps come in 2 flavours:

1. k-means-SOM

Very few neurons

Neuron = cluster

Clustering properties: identical with k-means

2. Emergent SOM (ESOM)

“cluster”

ESOM (Emergent SOM)

Self-Organized Feature Maps come in 2 flavours:

1. k-means-SOM

2. Emergent SOM (ESOM)

4000+ neurons on a rectangular grid

Borderless (toroid)

Properties:

nonlinear, topology (cluster) preserving projection

R^n (data) -> grid of neurons

Nonlinear interpolation,

rather projection than clustering!

[Figure: grid of 82 × 50 neurons]

3. Discontinuous Projection: ESOM

ESOM is “all wrong” here

3. Discontinuous Projection: ESOM

However: cluster structure is preserved!

Disentangling of rings!

What do we know about the structure preservation of projections?

Topology preservation of R^n -> R^m is NOT POSSIBLE

Only possible if “true data dimension = m”, i.e. if the data resides on an m-dimensional submanifold

Measuring this “intrinsic” dimension is difficult!

“swiss roll”

What do we know about the structure preservation of projections?

Perfect preservation of topology for R^n -> R^m, m < n, is per se impossible

So, how do the projection algorithms perform?

Ad 1. Linear projections: d(p(x),p(y)) = f_lin(D(x,y))

PCA aims at variance preservation

ICA aims at “non-Gaussianity”

Both aims may not be cluster preserving!
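
The point about variance preservation can be tried out with a minimal sketch. It assumes a synthetic stand-in for the Chainlink data (two interlocked rings in R^3) and a hand-rolled PCA by power iteration. The function names, noise level and sample size are illustrative choices, not taken from the talk.

```python
import math
import random

def chainlink(n_per_ring=200, noise=0.05, seed=0):
    """Two interlocked rings in R^3, a stand-in for the Chainlink data set."""
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(n_per_ring):
        t = 2.0 * math.pi * k / n_per_ring
        e = [rng.gauss(0.0, noise) for _ in range(6)]
        # Ring 0 lies in the x-y plane around the origin.
        points.append((math.cos(t) + e[0], math.sin(t) + e[1], e[2]))
        labels.append(0)
        # Ring 1 lies in the x-z plane around (1, 0, 0) and threads through ring 0.
        points.append((1.0 + math.cos(t) + e[3], e[4], math.sin(t) + e[5]))
        labels.append(1)
    return points, labels

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def top_eigenpair(m, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    v = [1.0 / math.sqrt(len(m))] * len(m)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(m, v)), v

def pca_2d(points):
    """Project points onto their first two principal components."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    var1, v1 = top_eigenpair(cov)
    # Deflation removes the first component so that the second one can be found.
    deflated = [[cov[a][b] - var1 * v1[a] * v1[b] for b in range(d)]
                for a in range(d)]
    var2, v2 = top_eigenpair(deflated)
    return [(dot(c, v1), dot(c, v2)) for c in centred], (var1, var2)

points, labels = chainlink()
projected, variances = pca_2d(points)
```

Plotting `projected` coloured by `labels` should show the two rings drawn over each other: the projection keeps the directions of largest variance, and nothing in that criterion refers to the separation of the rings.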

What do we know about the structure preservation of projections?

Ad 2. Nonlinear projections

Often a stress measure E:

E(D(x,y), d(p(x),p(y))) = error(D(x,y) -> d(p(x),p(y)))

Directly minimized (see MDS, Sammon, CCA)

However: may not be cluster preserving!

Distance structures may be nonlinearly distorted!

Visualization may be misleading! (see above)
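
As one concrete instance of such a stress measure, the sketch below computes Sammon's stress between input distances D and output distances d. The helper names and the toy data are assumptions made for illustration. MDS and CCA use differently weighted error terms.

```python
import math

def pairwise(points):
    """Matrix of Euclidean distances between all points."""
    return [[math.dist(p, q) for q in points] for p in points]

def sammon_stress(D, d):
    """Sammon's stress: sum over i<j of (D_ij - d_ij)^2 / D_ij, divided by sum of D_ij."""
    n = len(D)
    total, error = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] > 0.0:  # pairs of identical input points are skipped
                total += D[i][j]
                error += (D[i][j] - d[i][j]) ** 2 / D[i][j]
    return error / total

# Toy data: four points in R^3, projected to R^2 by dropping the z coordinate.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
projection = [(x, y) for x, y, _ in data]

stress_of_projection = sammon_stress(pairwise(data), pairwise(projection))
stress_of_identity = sammon_stress(pairwise(data), pairwise(data))
```

A low stress value only says that distances are reproduced well on average. It does not say whether the gaps between clusters survived the projection, which is the concern raised above.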

Ad 3. Discontinuous projections: (E)SOM

Bottom lines of theory on SOM:

SOM is nonlinear, discontinuous and “usually” cluster preserving

For SOM, in principle no energy function is possible

Variants of SOM show topology preservation under certain preconditions

Let’s face the Problem:

For projections R^n -> R^m, n > 3, m < 4:

Conservation of structures (clusters) is in general impossible

So what?

Proposal: visualize the “problem areas”

Easy for ESOM: the U-matrix

U-Matrix

ESOM: topology preservation, nonlinear disentangling

[Figure: grid of neurons and the weights of the neurons; each neuron on the grid has the neighbours N, NE, E, SE, S, SW, W, NW]

U-heights are average local distances!

U-matrix on SOM

Shows distance structures of R^n

High ridges = large distances

Low valleys = points in R^n close to each other

[Figure: U-Matrix of Chainlink; neurons on the grid with neighbours N, NE, E, SE, S, SW, W, NW]

U-heights are local distances!
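
A minimal sketch of the U-height computation follows. It assumes a borderless (toroid) grid, Euclidean distances between neuron weights, and the eight grid neighbours N, NE, E, SE, S, SW, W, NW. The function name and the toy grid are illustrative.

```python
import math

def u_matrix(weights):
    """U-height of each neuron: mean distance of its weight to its 8 neighbours' weights.

    weights[r][c] is the weight vector of the neuron in grid row r, column c.
    Row and column indices wrap around, so the grid is treated as a toroid.
    """
    rows, cols = len(weights), len(weights[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(weights[r][c],
                               weights[(r + dr) % rows][(c + dc) % cols])
                     for dr, dc in offsets]
            heights[r][c] = sum(dists) / len(dists)
    return heights

# Toy grid: columns 0-2 carry one prototype, columns 3-5 another.
weights = [[(0.0, 0.0) if c < 3 else (5.0, 5.0) for c in range(6)]
           for r in range(4)]
heights = u_matrix(weights)
```

In this toy grid the interior columns 1 and 4 are valleys of height 0. Ridges appear at the boundary between columns 2 and 3 and, because the grid is borderless, also between columns 5 and 0.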

Ant-Based Projections/Clustering

stochastic ants perform random walks on a grid

fixed perceptive neighbourhood of size σ² ∈ {9, 25}

one or many ants:

pick input sample when neighbourhood contains dissimilar samples

drop input sample when neighbourhood contains similar samples
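
The pick and drop rules can be sketched with the formulation commonly cited for Lumer/Faieta. The scale parameter `alpha` and the constants `k1` and `k2` are assumptions of this sketch with illustrative values. They do not appear on the slides.

```python
def neighbourhood_similarity(item, neighbours, dissim, sigma2, alpha):
    """f = max(0, (1 / sigma^2) * sum over neighbours of (1 - d(item, other) / alpha))."""
    total = sum(1.0 - dissim(item, other) / alpha for other in neighbours)
    return max(0.0, total / sigma2)

def pick_probability(f, k1=0.1):
    """High when the neighbourhood is dissimilar (f close to 0)."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """High when the neighbourhood is similar (f at or above k2)."""
    return 2.0 * f if f < k2 else 1.0

def absdiff(a, b):
    return abs(a - b)

# A 3 x 3 perceptive neighbourhood (sigma^2 = 9) holds up to 8 other samples.
f_similar = neighbourhood_similarity(1.0, [1.0] * 8, absdiff, sigma2=9, alpha=0.5)
f_dissimilar = neighbourhood_similarity(1.0, [11.0] * 8, absdiff, sigma2=9, alpha=0.5)
```

With these values an ant carrying the sample would drop it in the similar neighbourhood with probability 1, and an unladen ant would pick it up from the dissimilar neighbourhood with probability 1.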

Important Ant Models

Deneubourg (1990): modelling emergent phenomena of ants clustering corpses

Lumer, Faieta (1994): clustering pairwise dissimilarity data, e.g. four gaussians

Ramos (2003): ACLUSTER method includes pheromones, analyzes web traffic data

Handl, Knowles (2005): ATTA method, first solid empirical evaluations

Tan et al. (2006): empirical evaluation: number of ants is irrelevant

Problems with Ant-Based Methods

too many, too small clusters appear

topologically distorted clusters

results are highly sensitive to parameters, e.g. grid size

no proof of topological ordering

[Figure: data set with 4 Gaussian clusters (both axes from -0.2 to 1.2), and the same data processed with the Lumer/Faieta algorithm]

Strengths & Weaknesses

                                  Ant-Based Clustering    (Emergent) SOM
Quality of topographic mapping    poor                    good
continuous learning               yes                     no
data                              metric spaces           normed vector spaces

Improve the quality of topographic mappings to obtain a method that is superior to Self-Organizing Maps!

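Bestmatch Objectives (Batch-SOM, Dissimilarity-SOM, Ants)

$$m(x) = \arg\min_i \frac{\sum_{y \in X} h(i, m(y)) \, \|x - y\|}{\sum_{y \in X} h(i, m(y))}$$

$$m(x) = \arg\min_i \left\| x - \arg\min_y \frac{\sum_z h(m(z), i) \, \|y - z\|}{\sum_z h(m(z), i)} \right\|$$

$$m(x) = \arg\max_i \frac{N(i)}{\sigma^2} \left( 1 - \frac{\sum_{y \in X} h(i, m(y)) \, \|x - y\|}{\sum_{y \in X} h(i, m(y))} \right)$$

Distortion: upper limit of the SOM objective, resembles Dissimilarity-SOM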

Ants versus SOM

Ants maximize the product: “output density × topography preservation”

This distorts the formation of correct topographic maps.
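
One way to see where this product comes from is to factor the neighbourhood function as it is commonly stated for Lumer/Faieta. The notation is an assumption of this sketch: N_x(i) is the set of samples in the perceptive neighbourhood of grid position i (assumed non-empty), σ² is the neighbourhood size, d is the dissimilarity and α is a scale parameter.

$$\frac{1}{\sigma^2} \sum_{y \in N_x(i)} \left( 1 - \frac{d(x,y)}{\alpha} \right) = \underbrace{\frac{|N_x(i)|}{\sigma^2}}_{\text{output density}} \cdot \underbrace{\left( 1 - \frac{1}{\alpha} \cdot \frac{1}{|N_x(i)|} \sum_{y \in N_x(i)} d(x,y) \right)}_{\text{topography preservation}}$$

The first factor grows simply by crowding samples into the neighbourhood, whatever their dissimilarities are. The second factor depends only on the mean dissimilarity to the neighbours.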

From Ants to Swarms

Ants

use small, but fixed neighbourhoods

account for output space density

Swarms

use large, shrinking neighbourhoods

do not account for output space density

Lumer/Faieta Projection of Chainlink

too many & too small clusters emerge

[Figure: Chainlink data set in 3D and its projection by Ants on a grid with both axes from 0 to 60]

Summary up to here (ESANN 08)

Ants and swarms construct much worse mappings than ESOM.

Reason: objective function includes optimization of output density

Swarm Model improvement

Make ant systems more ESOM-like:

1. use perceptive neighbourhoods like in ESOM (starting large, then shrinking…)

2. omit output space densities from the objective function; their preservation is of doubtful value

The density term distorts the topographic term: output space densities are easy to optimize, so they will dominate the objective function
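
The two changes can be sketched as follows. This is a sketch under stated assumptions, not the SOP implementation: the linear cooling schedule, the handling of empty neighbourhoods and all function names are hypothetical choices made for illustration.

```python
def radius_schedule(epoch, n_epochs, r_start, r_end=1.0):
    """Shrink the perceptive neighbourhood radius linearly from r_start to r_end.

    The slides only say "starting large, then shrinking"; linear cooling is an assumption.
    """
    if n_epochs <= 1:
        return r_end
    fraction = epoch / (n_epochs - 1)
    return r_start + fraction * (r_end - r_start)

def topographic_term(item, neighbours, dissim):
    """Mean dissimilarity between an item and the samples in its neighbourhood.

    Lower is better. There is no factor that counts the neighbours, so crowding
    samples together does not improve the value by itself.
    """
    if not neighbours:
        # Assumption of this sketch: an empty neighbourhood is never attractive.
        return float("inf")
    return sum(dissim(item, other) for other in neighbours) / len(neighbours)

def absdiff(a, b):
    return abs(a - b)

radii = [radius_schedule(e, 10, r_start=25.0) for e in range(10)]
sparse = topographic_term(0.0, [1.0, 3.0], absdiff)
crowded = topographic_term(0.0, [1.0, 3.0] * 5, absdiff)
```

In this toy case `sparse` and `crowded` have the same value: repeating the same neighbours five times changes the density but not the topographic term.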

Swarm Clustering of Chainlink

Cluster structure is preserved.

[Figure: Chainlink data set in 3D and its projections by Ants and by Swarm on a grid with both axes from 0 to 60]

Trials on Fundamental Clustering Problems Suite (FCPS)

FCPS: a collection of simple data sets

Every decent clustering / topo-mapping algorithm should be able to handle these little problems

The ESOM does

But many don't

FCPS download

google: FCPS, Ultsch

Experimental Results: Ants versus Swarm Model on FCPS

Measuring topography preservation with “Minimal Pathlength” [Goodhill 95]

[Figure: bar chart of percent improvements (axis from 0 to 25) for the data sets atom, chainlink, hepta, iris, target, 2diamonds, wingnut]

percent improvements: typically > +10%

On 100 runs: significantly reduced error values according to the Kolmogorov-Smirnov test at the α = 1% level
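
A minimal sketch of a minimal path length measure follows, read here as the sum of input-space distances over all pairs of samples that are direct neighbours on the output grid. The 4-neighbourhood, the non-toroid grid and the function name are assumptions of this sketch. The exact variant used in the experiments is not given on the slides.

```python
import math

def minimal_path_length(data, positions):
    """Sum of input-space distances over sample pairs on adjacent grid cells.

    data[i] is the input vector of sample i, positions[i] its (row, column)
    cell on the output grid. Lower values indicate better topography preservation.
    """
    total = 0.0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            d_row = abs(positions[i][0] - positions[j][0])
            d_col = abs(positions[i][1] - positions[j][1])
            if d_row + d_col == 1:  # direct neighbours (4-neighbourhood)
                total += math.dist(data[i], data[j])
    return total

# Four samples on a line in input space, placed on a 1 x 4 grid.
data = [(0.0,), (1.0,), (2.0,), (3.0,)]
ordered = [(0, 0), (0, 1), (0, 2), (0, 3)]   # grid order follows input order
shuffled = [(0, 0), (0, 2), (0, 1), (0, 3)]  # samples 1 and 2 swapped

ordered_error = minimal_path_length(data, ordered)
shuffled_error = minimal_path_length(data, shuffled)
```

The ordered placement gives 3.0 and the shuffled one 5.0, so the measure penalizes placements that put distant inputs next to each other on the grid.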

Real World Application of Swarm Model

Bioinformatics data containing protein data: GPD194 [Popescu 06]

194 proteins / 3 classes

pairwise dissimilarities

well defined cluster structure

see Silhouettes:
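
Silhouette values can be computed directly from pairwise dissimilarities, which is the form in which GPD194 is given. The sketch below uses a tiny made-up dissimilarity matrix, not the protein data, and assumes at least two clusters.

```python
def silhouettes(D, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every sample.

    a is the mean dissimilarity to the other members of the sample's own cluster,
    b is the smallest mean dissimilarity to the members of any other cluster.
    """
    n = len(labels)
    values = []
    for i in range(n):
        means = {}
        for cluster in set(labels):
            members = [j for j in range(n) if labels[j] == cluster and j != i]
            if members:
                means[cluster] = sum(D[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in means:
            values.append(0.0)  # a singleton cluster gets silhouette 0 by convention
            continue
        a = means[own]
        b = min(v for cluster, v in means.items() if cluster != own)
        values.append((b - a) / max(a, b) if max(a, b) > 0.0 else 0.0)
    return values

# Made-up example: samples at 0, 1, 10, 11 on a line, two classes.
coords = [0.0, 1.0, 10.0, 11.0]
D = [[abs(p - q) for q in coords] for p in coords]
values = silhouettes(D, [0, 0, 1, 1])
```

Values close to 1 for all samples indicate a well defined cluster structure. Values near 0 or below indicate samples that sit between clusters or in the wrong one.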

Other Projections of GPD194

MDS

Other Projections of GPD194

CCA

SOP Projection

Conclusions

“Naive” usage of your favorite projection:

May show clusters that are not in the data

May miss clusters that are in the data

Cluster preserving projection R^n -> R^2 for nontrivial data is in principle impossible

ESOM/U-Matrix superior to most projections

Swarm Organized Projection (SOP) still better
