Artificial Neural Networks (ICANN)


  • Lecture Notes in Computer Science 5163

    Commenced Publication in 1973

    Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

    Editorial Board

    David Hutchison, Lancaster University, UK

    Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA

    Josef Kittler, University of Surrey, Guildford, UK

    Jon M. Kleinberg, Cornell University, Ithaca, NY, USA

    Alfred Kobsa, University of California, Irvine, CA, USA

    Friedemann Mattern, ETH Zurich, Switzerland

    John C. Mitchell, Stanford University, CA, USA

    Moni Naor, Weizmann Institute of Science, Rehovot, Israel

    Oscar Nierstrasz, University of Bern, Switzerland

    C. Pandu Rangan, Indian Institute of Technology, Madras, India

    Bernhard Steffen, University of Dortmund, Germany

    Madhu Sudan, Massachusetts Institute of Technology, MA, USA

    Demetri Terzopoulos, University of California, Los Angeles, CA, USA

    Doug Tygar, University of California, Berkeley, CA, USA

    Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

  • Vera Kurkova

    Roman Neruda

    Jan Koutnik (Eds.)

    Artificial Neural Networks – ICANN 2008

    18th International Conference

    Prague, Czech Republic, September 3-6, 2008

    Proceedings, Part I


  • Volume Editors

    Vera Kurkova
    Roman Neruda
    Institute of Computer Science
    Academy of Sciences of the Czech Republic
    Pod Vodarenskou vezi 2, 182 07 Prague 8, Czech Republic
    E-mail: {vera, roman}@cs.cas.cz

    Jan Koutnik
    Department of Computer Science
    Czech Technical University in Prague
    Karlovo nam. 13, 121 35 Prague 2, Czech Republic
    E-mail: [email protected]

    Library of Congress Control Number: 2008934470

    CR Subject Classification (1998): F.1, I.2, I.5, I.4, G.3, J.3, C.2.1, C.1.3

    LNCS Sublibrary: SL 1 Theoretical Computer Science and General Issues

    ISSN 0302-9743

    ISBN-10 3-540-87535-2 Springer Berlin Heidelberg New York

    ISBN-13 978-3-540-87535-2 Springer Berlin Heidelberg New York

    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

    Springer is a part of Springer Science+Business Media

    springer.com

    © Springer-Verlag Berlin Heidelberg 2008
    Printed in Germany

    Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
    Printed on acid-free paper. SPIN: 12520565 06/3180 5 4 3 2 1 0

  • Preface

    This volume is the first part of the two-volume proceedings of the 18th International Conference on Artificial Neural Networks (ICANN 2008), held September 3–6, 2008 in Prague, Czech Republic. The ICANN conferences are annual meetings supervised by the European Neural Network Society, in cooperation with the International Neural Network Society and the Japanese Neural Network Society. This series of conferences has been held since 1991 in various European countries and covers the field of neurocomputing and related areas. In 2008, the ICANN conference was organized by the Institute of Computer Science, Academy of Sciences of the Czech Republic, together with the Department of Computer Science and Engineering from the Faculty of Electrical Engineering of the Czech Technical University in Prague. Over 300 papers were submitted to the regular sessions, two special sessions and two workshops. The Program Committee selected about 200 papers after a thorough peer-review process; they are published in the two volumes of these proceedings. The large number, variety of topics and high quality of submitted papers reflect the vitality of the field of artificial neural networks.

    The first volume contains papers on the mathematical theory of neurocomputing, learning algorithms, kernel methods, statistical learning and ensemble techniques, support vector machines, reinforcement learning, evolutionary computing, hybrid systems, self-organization, control and robotics, signal and time series processing, and image processing.

    The second volume is devoted to pattern recognition and data analysis, hardware and embedded systems, computational neuroscience, connectionistic cognitive science, neuroinformatics and neural dynamics. It also contains papers from two special sessions, Coupling, Synchronies, and Firing Patterns: From Cognition to Disease, and Constructive Neural Networks, and two workshops, New Trends in Self-Organization and Optimization of Artificial Neural Networks, and Adaptive Mechanisms of the Perception-Action Cycle.

    It is our pleasure to express our gratitude to everyone who contributed in any way to the success of the event and the completion of these proceedings. In particular, we thank the members of the Board of the ENNS, who uphold the tradition of the series and helped with the organization. With deep gratitude we thank all the members of the Program Committee and the reviewers for their great effort in the reviewing process. We are very grateful to the members of the Organizing Committee, whose hard work made the vision of the 18th ICANN a reality. Zdenek Buk, Eva Pospisilova and the entire Computational Intelligence Group at the Czech Technical University in Prague deserve special thanks for preparing the conference proceedings. We thank Miroslav Cepek for the conference website administration. We thank Milena Zeithamlova and the Action M Agency for perfect local arrangements. We also thank Alfred Hofmann, Ursula Barth, Anna Kramer and Peter Strasser from Springer for their help with this demanding publication project. Last but not least, we thank all authors who contributed to this volume for sharing their new ideas and results with the community of researchers in this rapidly developing field of biologically motivated computer science. We hope that you enjoy reading and find inspiration for your future work in the papers contained in these two volumes.

    June 2008

    Vera Kurkova
    Roman Neruda
    Jan Koutnik

  • Organization

    Conference Chairs

    General Chair: Vera Kurkova, Academy of Sciences of the Czech Republic, Czech Republic

    Co-Chairs: Roman Neruda, Academy of Sciences of the Czech Republic, Czech Republic

    Jan Koutnik, Czech Technical University in Prague, Czech Republic

    Milena Zeithamlova, Action M Agency, Czech Republic

    Honorary Chair: John Taylor, King's College London, UK

    Program Committee

    Wlodzislaw Duch, Nicolaus Copernicus University in Torun, Poland
    Luis Alexandre, University of Beira Interior, Portugal
    Bruno Apolloni, Universita Degli Studi di Milano, Italy
    Timo Honkela, Helsinki University of Technology, Finland
    Stefanos Kollias, National Technical University in Athens, Greece
    Thomas Martinetz, University of Lubeck, Germany
    Guenter Palm, University of Ulm, Germany
    Alessandro Sperduti, Universita Degli Studi di Padova, Italy
    Michel Verleysen, Universite catholique de Louvain, Belgium
    Alessandro E.P. Villa, Universite Joseph Fourier, Grenoble, France
    Stefan Wermter, University of Sunderland, UK
    Rudolf Albrecht, University of Innsbruck, Austria
    Peter Andras, Newcastle University, UK
    Gabriela Andrejkova, P.J. Safarik University in Kosice, Slovakia
    Bartlomiej Beliczynski, Warsaw University of Technology, Poland
    Monica Bianchini, Universita degli Studi di Siena, Italy
    Andrej Dobnikar, University of Ljubljana, Slovenia
    Jose R. Dorronsoro, Universidad Autonoma de Madrid, Spain
    Peter Erdi, Hungarian Academy of Sciences, Hungary
    Marco Gori, Universita degli Studi di Siena, Italy
    Barbara Hammer, University of Osnabruck, Germany
    Tom Heskes, Radboud University Nijmegen, The Netherlands
    Yoshifusa Ito, Aichi-Gakuin University, Japan
    Janusz Kacprzyk, Polish Academy of Sciences, Poland
    Paul C. Kainen, Georgetown University, USA
    Mikko Kolehmainen, University of Kuopio, Finland
    Pavel Kordik, Czech Technical University in Prague, Czech Republic
    Vladimir Kvasnicka, Slovak University of Technology in Bratislava, Slovakia
    Danilo P. Mandic, Imperial College, UK
    Erkki Oja, Helsinki University of Technology, Finland
    David Pearson, Universite Jean Monnet, Saint-Etienne, France
    Lionel Prevost, Universite Pierre et Marie Curie, Paris, France
    Bernadete Ribeiro, University of Coimbra, Portugal
    Leszek Rutkowski, Czestochowa University of Technology, Poland
    Marcello Sanguineti, University of Genova, Italy
    Katerina Schindler, Austrian Academy of Sciences, Austria
    Juergen Schmidhuber, TU Munich (Germany) and IDSIA (Switzerland)
    Jiri Sima, Academy of Sciences of the Czech Republic, Czech Republic
    Peter Sincak, Technical University in Kosice, Slovakia
    Miroslav Skrbek, Czech Technical University in Prague, Czech Republic
    Johan Suykens, Katholieke Universiteit Leuven, Belgium
    Miroslav Snorek, Czech Technical University in Prague, Czech Republic
    Ryszard Tadeusiewicz, AGH University of Science and Technology, Poland

    Local Organizing Committee

    Zdenek Buk, Czech Technical University in Prague
    Miroslav Cepek, Czech Technical University in Prague
    Jan Drchal, Czech Technical University in Prague
    Paul C. Kainen, Georgetown University
    Oleg Kovarik, Czech Technical University in Prague
    Rudolf Marek, Czech Technical University in Prague
    Ales Pilny, Czech Technical University in Prague
    Eva Pospisilova, Academy of Sciences of the Czech Republic
    Tomas Siegl, Czech Technical University in Prague


    Referees

    S. Abe, R. Adamczak, R. Albrecht, E. Alhoniemi, R. Andonie, G. Angelini, D. Anguita, C. Angulo-Bahon, C. Archambeau, M. Atencia, P. Aubrecht, Y. Avrithis, L. Benuskova, T. Beran, Z. Buk, G. Cawley, M. Cepek, E. Corchado, V. Cutsuridis, E. Dominguez, G. Dounias, J. Drchal, D.A. Elizondo, H. Erwin, Z. Fabian, A. Flanagan, L. Franco, D. Francois, C. Fyfe, N. Garcia-Pedrajas, G. Gnecco, B. Gosselin, J. Grim, R. Haschke, M. Holena, J. Hollmen, T. David Huang, D. Husek, A. Hussain, M. Chetouani, C. Igel, G. Indiveri, S. Ishii, H. Izumi, J.M. Jerez, M. Jirina, M. Jirina, jr., K.T. Kalveram, K. Karpouzis, S. Kasderidis, M. Koskela, J. Kubalik, M. Kulich, F.J. Kurfess, M. Kurzynski, J. Laaksonen, E. Lang, K. Leiviska, L. Lhotska, A. Likas, C. Loizou, R. Marek, E. Marchiori, M.A. Martin-Merino, V. di Massa, F. Masulli, J. Mandziuk, S. Melacci, A. Micheli, F. Moutarde, R. Cristian Muresan, M. Nakayama, M. Navara, D. Novak, M. Olteanu, D. Ortiz Boyer, H. Paugam-Moisy, K. Pelckmans, G. Peters, P. Posik, D. Polani, M. Porrmann, A. Pucci, A. Raouzaiou, K. Rapantzikos, M. Rocha, A. Romariz, F. Rossi, L. Sarti, B. Schrauwen, F. Schwenker, O. Simula, A. Skodras, S. Slusny, A. Stafylopatis, J. Stastny, D. Stefka, G. Stoilos, A. Suarez, E. Trentin, N. Tsapatsoulis, P. Vidnerova, T. Villmann, Z. Vomlel, T. Wennekers, P. Wira, B. Wyns, Z. Yang, F. Zelezny

  • Table of Contents Part I

    Mathematical Theory of Neurocomputing

    Dimension Reduction for Mixtures of Exponential Families ... 1
    Shotaro Akaho

    Several Enhancements to Hermite-Based Approximation of One-Variable Functions ... 11
    Bartlomiej Beliczynski and Bernardete Ribeiro

    Multi-category Bayesian Decision by Neural Networks ... 21
    Yoshifusa Ito, Cidambi Srinivasan, and Hiroyuki Izumi

    Estimates of Network Complexity and Integral Representations ... 31
    Paul C. Kainen and Vera Kurkova

    Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios ... 41
    Sascha Klement, Amir Madany Mamlouk, and Thomas Martinetz

    Generalization of Concave and Convex Decomposition in Kikuchi Free Energy ... 51
    Yu Nishiyama and Sumio Watanabe

    Analysis of Chaotic Dynamics Using Measures of the Complex Network Theory ... 61
    Yutaka Shimada, Takayuki Kimura, and Tohru Ikeguchi

    Global Dynamics of Finite Cellular Automata ... 71
    Martin Schule, Thomas Ott, and Ruedi Stoop

    Learning Algorithms

    Semi-supervised Learning of Tree-Structured RBF Networks Using Co-training ... 79
    Mohamed F. Abdel Hady, Friedhelm Schwenker, and Gunther Palm

    A New Type of ART2 Architecture and Application to Color Image Segmentation ... 89
    Jiaoyan Ai, Brian Funt, and Lilong Shi

    BICA: A Boolean Independent Component Analysis Approach ... 99
    Bruno Apolloni, Simone Bassis, and Andrea Brega

    Improving the Learning Speed in 2-Layered LSTM Network by Estimating the Configuration of Hidden Units and Optimizing Weights Initialization ... 109
    Debora C. Correa, Alexandre L.M. Levada, and Jose H. Saito

    Manifold Construction Using the Multilayer Perceptron ... 119
    Wei-Chen Cheng and Cheng-Yuan Liou

    Improving Performance of a Binary Classifier by Training Set Selection ... 128
    Cezary Dendek and Jacek Mandziuk

    An Overcomplete ICA Algorithm by InfoMax and InfoMin ... 136
    Yoshitatsu Matsuda and Kazunori Yamaguchi

    OP-ELM: Theory, Experiments and a Toolbox ... 145
    Yoan Miche, Antti Sorjamaa, and Amaury Lendasse

    Robust Nonparametric Probability Density Estimation by Soft Clustering ... 155
    Ezequiel Lopez-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, Domingo Lopez-Rodriguez, and Maria del Carmen Vargas-Gonzalez

    Natural Conjugate Gradient on Complex Flag Manifolds for Complex Independent Subspace Analysis ... 165
    Yasunori Nishimori, Shotaro Akaho, and Mark D. Plumbley

    Quadratically Constrained Quadratic Programming for Subspace Selection in Kernel Regression Estimation ... 175
    Marco Signoretto, Kristiaan Pelckmans, and Johan A.K. Suykens

    The Influence of the Risk Functional in Data Classification with MLPs ... 185
    Luis M. Silva, Mark Embrechts, Jorge M. Santos, and Joaquim Marques de Sa

    Nonnegative Least Squares Learning for the Random Neural Network ... 195
    Stelios Timotheou

    Kernel Methods, Statistical Learning, and Ensemble Techniques

    Sparse Bayes Machines for Binary Classification ... 205
    Daniel Hernandez-Lobato

    Tikhonov Regularization Parameter in Reproducing Kernel Hilbert Spaces with Respect to the Sensitivity of the Solution ... 215
    Katerina Hlavackova-Schindler

    Mixture of Expert Used to Learn Game Play ... 225
    Peter Lacko and Vladimir Kvasnicka

    Unsupervised Bayesian Network Learning for Object Recognition in Image Sequences ... 235
    Daniel Oberhoff and Marina Kolesnik

    Using Feature Distribution Methods in Ensemble Systems Combined by Fusion and Selection-Based Methods ... 245
    Laura E.A. Santana, Anne M.P. Canuto, and Joao C. Xavier Jr.

    Bayesian Ying-Yang Learning on Orthogonal Binary Factor Analysis ... 255
    Ke Sun and Lei Xu

    A Comparative Study on Data Smoothing Regularization for Local Factor Analysis ... 265
    Shikui Tu, Lei Shi, and Lei Xu

    Adding Diversity in Ensembles of Neural Networks by Reordering the Training Set ... 275
    Joaquin Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo

    New Results on Combination Methods for Boosting Ensembles ... 285
    Joaquin Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo

    Support Vector Machines

    Batch Support Vector Training Based on Exact Incremental Training ... 295
    Shigeo Abe

    A Kernel Method for the Optimization of the Margin Distribution ... 305
    Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti

    A 4-Vector MDM Algorithm for Support Vector Training ... 315
    Alvaro Barbero, Jorge Lopez, and Jose R. Dorronsoro

    Implementation Issues of an Incremental and Decremental SVM ... 325
    Honorius Galmeanu and Razvan Andonie

    Online Clustering of Non-stationary Data Using Incremental and Decremental SVM ... 336
    Khaled Boukharouba and Stephane Lecoeuche

    Support Vector Machines for Visualization and Dimensionality Reduction ... 346
    Tomasz Maszczyk and Wlodzislaw Duch


    Reinforcement Learning

    Multigrid Reinforcement Learning with Reward Shaping ... 357
    Marek Grzes and Daniel Kudenko

    Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments ... 367
    Yu Hiei, Takeshi Mori, and Shin Ishii

    Robust Population Coding in Free-Energy-Based Reinforcement Learning ... 377
    Makoto Otsuka, Junichiro Yoshimoto, and Kenji Doya

    Policy Gradients with Parameter-Based Exploration for Control ... 387
    Frank Sehnke, Christian Osendorfer, Thomas Ruckstiess, Alex Graves, Jan Peters, and Jurgen Schmidhuber

    A Continuous Internal-State Controller for Partially Observable Markov Decision Processes ... 397
    Yuki Taniguchi, Takeshi Mori, and Shin Ishii

    Episodic Reinforcement Learning by Logistic Reward-Weighted Regression ... 407
    Daan Wierstra, Tom Schaul, Jan Peters, and Juergen Schmidhuber

    Error-Entropy Minimization for Dynamical Systems Modeling ... 417
    Jernej Zupanc

    Evolutionary Computing

    Hybrid Evolution of Heterogeneous Neural Networks ... 426
    Zdenek Buk and Miroslav Snorek

    Ant Colony Optimization with Castes ... 435
    Oleg Kovarik and Miroslav Skrbek

    Neural Network Ensembles for Classification Problems Using Multiobjective Genetic Algorithms ... 443
    David Lahoz and Pedro Mateo

    Analysis of Vestibular-Ocular Reflex by Evolutionary Framework ... 452
    Daniel Novak, Ales Pilny, Pavel Kordik, Stefan Holiga, Petr Posik, R. Cerny, and Richard Brzezny

    Fetal Weight Prediction Models: Standard Techniques or Computational Intelligence Methods? ... 462
    Tomas Siegl, Pavel Kordik, Miroslav Snorek, and Pavel Calda

    Evolutionary Canonical Particle Swarm Optimizer – A Proposal of Meta-optimization in Model Selection ... 472
    Hong Zhang and Masumi Ishikawa

    Hybrid Systems

    Building Localized Basis Function Networks Using Context Dependent Clustering ... 482
    Marcin Blachnik and Wlodzislaw Duch

    Adaptation of Connectionist Weighted Fuzzy Logic Programs with Kripke-Kleene Semantics ... 492
    Alexandros Chortaras, Giorgos Stamou, Andreas Stafylopatis, and Stefanos Kollias

    Neuro-fuzzy System for Road Signs Recognition ... 503
    Boguslaw Cyganek

    Neuro-inspired Speech Recognition with Recurrent Spiking Neurons ... 513
    Arfan Ghani, T. Martin McGinnity, Liam P. Maguire, and Jim Harkin

    Predicting the Performance of Learning Algorithms Using Support Vector Machines as Meta-regressors ... 523
    Silvio B. Guerra, Ricardo B.C. Prudencio, and Teresa B. Ludermir

    Municipal Creditworthiness Modelling by Kohonen's Self-organizing Feature Maps and Fuzzy Logic Neural Networks ... 533
    Petr Hajek and Vladimir Olej

    Implementing Boolean Matrix Factorization ... 543
    Roman Neruda, Vaclav Snasel, Jan Platos, Pavel Kromer, Dusan Husek, and Alexander A. Frolov

    Application of Potts-Model Perceptron for Binary Patterns Identification ... 553
    Vladimir Kryzhanovsky, Boris Kryzhanovsky, and Anatoly Fonarev

    Using ARTMAP-Based Ensemble Systems Designed by Three Variants of Boosting ... 562
    Araken de Medeiros Santos and Anne Magaly de Paula Canuto

    Self-organization

    Matrix Learning for Topographic Neural Maps ... 572
    Banchar Arnonkijpanich, Barbara Hammer, Alexander Hasenfuss, and Chidchanok Lursinsap

    Clustering Quality and Topology Preservation in Fast Learning SOMs ... 583
    Antonino Fiannaca, Giuseppe Di Fatta, Salvatore Gaglio, Riccardo Rizzo, and Alfonso Urso

    Enhancing Topology Preservation during Neural Field Development Via Wiring Length Minimization ... 593
    Claudius Glaser, Frank Joublin, and Christian Goerick

    Adaptive Translation: Finding Interlingual Mappings Using Self-organizing Maps ... 603
    Timo Honkela, Sami Virpioja, and Jaakko Vayrynen

    Self-Organizing Neural Grove: Efficient Multiple Classifier System with Pruned Self-Generating Neural Trees ... 613
    Hirotaka Inoue

    Self-organized Complex Neural Networks through Nonlinear Temporally Asymmetric Hebbian Plasticity ... 623
    Hideyuki Kato and Tohru Ikeguchi

    Temporal Hebbian Self-Organizing Map for Sequences ... 632
    Jan Koutnik and Miroslav Snorek

    FLSOM with Different Rates for Classification in Imbalanced Datasets ... 642
    Ivan Machon-Gonzalez and Hilario Lopez-Garcia

    A Self-organizing Neural System for Background and Foreground Modeling ... 652
    Lucia Maddalena and Alfredo Petrosino

    Analyzing the Behavior of the SOM through Wavelet Decomposition of Time Series Generated during Its Execution ... 662
    Victor Mireles and Antonio Neme

    Decreasing Neighborhood Revisited in Self-Organizing Maps ... 671
    Antonio Neme, Elizabeth Chavez, Alejandra Cervera, and Victor Mireles

    A New GHSOM Model Applied to Network Security ... 680
    Esteban J. Palomo, Enrique Dominguez, Rafael Marcos Luque, and Jose Munoz

    Reduction of Visual Information in Neural Network Learning Visualization ... 690
    Matus Uzak, Rudolf Jaksa, and Peter Sincak


    Control and Robotics

    Heuristics-Based High-Level Strategy for Multi-agent Systems ... 700
    Peter Gasztonyi and Istvan Harmati

    Echo State Networks for Online Prediction of Movement Data – Comparing Investigations ... 710
    Sven Hellbach, Soren Strauss, Julian P. Eggert, Edgar Korner, and Horst-Michael Gross

    Comparison of RBF Network Learning and Reinforcement Learning on the Maze Exploration Problem ... 720
    Stanislav Slusny, Roman Neruda, and Petra Vidnerova

    Modular Neural Networks for Model-Free Behavioral Learning ... 730
    Johane Takeuchi, Osamu Shouno, and Hiroshi Tsujino

    From Exploration to Planning ... 740
    Cornelius Weber and Jochen Triesch

    Signal and Time Series Processing

    Sentence-Level Evaluation Using Co-occurrences of N-Grams ... 750
    Theologos Athanaselis, Stelios Bakamidis, Konstantinos Mamouras, and Ioannis Dologlou

    Identifying Single Source Data for Mixing Matrix Estimation in Instantaneous Blind Source Separation ... 759
    Pau Bofill

    ECG Signal Classification Using GAME Neural Network and Its Comparison to Other Classifiers ... 768
    Miroslav Cepek, Miroslav Snorek, and Vaclav Chudacek

    Predictive Modeling with Echo State Networks ... 778
    Michal Cernansky and Peter Tino

    Sparse Coding Neural Gas for the Separation of Noisy Overcomplete Sources ... 788
    Kai Labusch, Erhardt Barth, and Thomas Martinetz

    Mutual Information Based Input Variable Selection Algorithm and Wavelet Neural Network for Time Series Prediction ... 798
    Rashidi Khazaee Parviz, Mozayani Nasser, and M.R. Jahed Motlagh

    Stable Output Feedback in Reservoir Computing Using Ridge Regression ... 808
    Francis Wyffels, Benjamin Schrauwen, and Dirk Stroobandt


    Image Processing

    Spatio-temporal Summarizing Method of Periodic Image Sequences with Kohonen Maps ... 818
    Mohamed Berkane, Patrick Clarysse, and Isabelle E. Magnin

    Image Classification by Histogram Features Created with Learning Vector Quantization ... 827
    Marcin Blachnik and Jorma Laaksonen

    A Statistical Model for Histogram Refinement ... 837
    Nizar Bouguila and Walid ElGuebaly

    Efficient Video Shot Summarization Using an Enhanced Spectral Clustering Approach ... 847
    Vasileios Chasanis, Aristidis Likas, and Nikolaos Galatsanos

    Surface Reconstruction Techniques Using Neural Networks to Recover Noisy 3D Scenes ... 857
    David Elizondo, Shang-Ming Zhou, and Charalambos Chrysostomou

    A Spatio-temporal Extension of the SUSAN-Filter ... 867
    Benedikt Kaiser and Gunther Heidemann

    A Neighborhood-Based Competitive Network for Video Segmentation and Object Detection ... 877
    Rafael Marcos Luque Baena, Enrique Dominguez, Domingo Lopez-Rodriguez, and Esteban J. Palomo

    A Hierarchic Method for Footprint Segmentation Based on SOM ... 887
    Marco Mora Cofre, Ruben Valenzuela, and Girma Berhe

    Co-occurrence Matrixes for the Quality Assessment of Coded Images ... 897
    Judith Redi, Paolo Gastaldo, Rodolfo Zunino, and Ingrid Heynderickx

    Semantic Adaptation of Neural Network Classifiers in Image Segmentation ... 907
    Nikolaos Simou, Thanos Athanasiadis, Stefanos Kollias, Giorgos Stamou, and Andreas Stafylopatis

    Partially Monotone Networks Applied to Breast Cancer Detection on Mammograms ... 917
    Marina Velikova, Hennie Daniels, and Maurice Samulski

    Image Processing – Recognition Systems

    A Neuro-fuzzy Approach to User Attention Recognition ... 927
    Stylianos Asteriadis, Kostas Karpouzis, and Stefanos Kollias

    TriangleVision: A Toy Visual System ... 937
    Thomas Bangert

    Face Recognition with VG-RAM Weightless Neural Networks ... 951
    Alberto F. De Souza, Claudine Badue, Felipe Pedroni, Elias Oliveira, Stiven Schwanz Dias, Hallysson Oliveira, and Soterio Ferreira de Souza

    Invariant Object Recognition with Slow Feature Analysis ... 961
    Mathias Franzius, Niko Wilbert, and Laurenz Wiskott

    Analysis-by-Synthesis by Learning to Invert Generative Black Boxes ... 971
    Vinod Nair, Josh Susskind, and Geoffrey E. Hinton

    A Bio-inspired Connectionist Architecture for Visual Classification of Moving Objects ... 982
    Pedro L. Sanchez Orellana and Claudio Castellanos Sanchez

    A Visual Object Recognition System Invariant to Scale and Rotation ... 991
    Yasuomi D. Sato, Jenia Jitsev, and Christoph von der Malsburg

    Recognizing Facial Expressions: A Comparison of Computational Approaches ... 1001
    Aruna Shenoy, Tim M. Gale, Neil Davey, Bruce Christiansen, and Ray Frank

    A Probabilistic Prediction Method for Object Contour Tracking ... 1011
    Daniel Weiler, Volker Willert, and Julian Eggert

    Author Index ... 1021

  • Table of Contents Part II

    Pattern Recognition and Data Analysis

    Investigating Similarity of Ontology Instances and Its Causes ... 1
    Anton Andrejko and Maria Bielikova

    A Neural Model for Delay Correction in a Distributed Control System ... 11
    Ana Antunes, Fernando Morgado Dias, and Alexandre Mota

    A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets ... 21
    Gianluca Bontempi and Patrick E. Meyer

    Non-stationary Data Mining: The Network Security Issue ... 32
    Sergio Decherchi, Paolo Gastaldo, Judith Redi, and Rodolfo Zunino

    Efficient Feature Selection for PTR-MS Fingerprinting of Agroindustrial Products ... 42
    Pablo M. Granitto, Franco Biasioli, Cesare Furlanello, and Flavia Gasperi

    Extraction of Binary Features by Probabilistic Neural Networks ... 52
    Jiri Grim

    Correlation Integral Decomposition for Classification ... 62
    Marcel Jirina and Marcel Jirina Jr.

    Modified q-State Potts Model with Binarized Synaptic Coefficients ... 72
    Vladimir Kryzhanovsky

    Learning Similarity Measures from Pairwise Constraints with Neural Networks ... 81
    Marco Maggini, Stefano Melacci, and Lorenzo Sarti

    Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines ... 91
    Yi Sun, Mark Robinson, Rod Adams, Alistair Rust, and Neil Davey

    Mimicking Go Experts with Convolutional Neural Networks ... 101
    Ilya Sutskever and Vinod Nair

    Associative Memories Applied to Pattern Recognition ... 111
    Roberto A. Vazquez and Humberto Sossa

    MLP-Based Detection of Targets in Clutter: Robustness with Respect to the Shape Parameter of Weibull-Distributed Clutter ... 121
    Raul Vicen-Bueno, Eduardo Galan-Fernandez, Manuel Rosa-Zurera, and Maria P. Jarabo-Amores

    Hardware, Embedded Systems

    Modeling and Synthesis of Computational Efficient Adaptive Neuro-Fuzzy Systems Based on Matlab ... 131
    Guillermo Bosque, Javier Echanobe, Ines del Campo, and Jose M. Tarela

    Embedded Neural Network for Swarm Learning of Physical Robots ... 141
    Pitoyo Hartono and Sachiko Kakita

    Distribution Stream of Tasks in Dual-Processor System ... 150
    Michael Kryzhanovsky and Magomed Malsagov

    Efficient Implementation of the THSOM Neural Network ... 159
    Rudolf Marek and Miroslav Skrbek

    Reconfigurable MAC-Based Architecture for Parallel Hardware Implementation on FPGAs of Artificial Neural Networks ... 169
    Nadia Nedjah, Rodrigo Martins da Silva, Luiza de Macedo Mourelle, and Marcus Vinicius Carvalho da Silva

    Implementation of Central Pattern Generator in an FPGA-Based Embedded System ... 179
    Cesar Torres-Huitzil and Bernard Girau

    Biologically-Inspired Digital Architecture for a Cortical Model of Orientation Selectivity ... 188
    Cesar Torres-Huitzil, Bernard Girau, and Miguel Arias-Estrada

    Neural Network Training with Extended Kalman Filter Using Graphics Processing Unit ... 198
    Peter Trebaticky and Jiri Pospichal

    Blind Source-Separation in Mixed-Signal VLSI Using the InfoMax Algorithm ... 208
    Waldo Valenzuela, Gonzalo Carvajal, and Miguel Figueroa

    Computational Neuroscience

    Synaptic Rewiring for Topographic Map Formation ... 218
    Simeon A. Bamford, Alan F. Murray, and David J. Willshaw

    Implementing Bayes Rule with Neural Fields ... 228
    Raymond H. Cuijpers and Wolfram Erlhagen

    Encoding and Retrieval in a CA1 Microcircuit Model of the Hippocampus ... 238
    Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

    A Bio-inspired Architecture of an Active Visual Search Model ... 248
    Vassilis Cutsuridis

    Implementing Fuzzy Reasoning on a Spiking Neural Network ... 258
    Cornelius Glackin, Liam McDaid, Liam Maguire, and Heather Sayers

    Short Term Plasticity Provides Temporal Filtering at Chemical Synapses ... 268
    Bruce P. Graham and Christian Stricker

    Observational Versus Trial and Error Effects in a Model of an Infant Learning Paradigm ... 277
    Matthew Hartley, Jacqueline Fagard, Rana Esseily, and John Taylor

    Modeling the Effects of Dopamine on the Antisaccade Reaction Times (aSRT) of Schizophrenia Patients ... 290
    Ioannis Kahramanoglou, Stavros Perantonis, Nikolaos Smyrnis, Ioannis Evdokimidis, and Vassilis Cutsuridis

    Fast Multi-command SSVEP Brain Machine Interface without Training ... 300
    Pablo Martinez Vasquez, Hovagim Bakardjian, Montserrat Vallverdu, and Andrzej Cichocki

    Separating Global Motion Components in Transparent Visual Stimuli – A Phenomenological Analysis ... 308
    Andrew Meso and Johannes M. Zanker

    Lateral Excitation between Dissimilar Orientation Columns for Ongoing Subthreshold Membrane Oscillations in Primary Visual Cortex ... 318
    Yuto Nakamura, Kazuhiro Tsuboi, and Osamu Hoshino

    A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behaviour ... 328
    N. Serap Sengor, Ozkan Karabacak, and Ulrich Steinmetz

    Firing Pattern Estimation of Synaptically Coupled Hindmarsh-Rose Neurons by Adaptive Observer ... 338
    Yusuke Totoki, Kouichi Mitsunaga, Haruo Suemitsu, and Takami Matsuo

    Global Oscillations of Neural Fields in CA3 ... 348
    Francesco Ventriglia

    Connectionistic Cognitive Science

    Selective Attention Model of Moving Objects ... 358
    Roman Borisyuk, David Chik, and Yakov Kazanovich

    Tempotron-Like Learning with ReSuMe ... 368
    Razvan V. Florian

    Neural Network Capable of Amodal Completion ... 376
    Kunihiko Fukushima

    Predictive Coding in Cortical Microcircuits ... 386
    Andreea Lazar, Gordon Pipa, and Jochen Triesch

    A Biologically Inspired Spiking Neural Network for Sound Localisation by the Inferior Colliculus ... 396
    Jindong Liu, Harry Erwin, Stefan Wermter, and Mahmoud Elsaid

    Learning Structurally Analogous Tasks ... 406
    Paul W. Munro

    Auto-structure of Presynaptic Activity Defines Postsynaptic Firing Statistics and Can Modulate STDP-Based Structure Formation and Learning ... 413
    Gordon Pipa, Raul Vicente, and Alexander Tikhonov

    Decision Making Logic of Visual Brain ... 423
    Andrzej W. Przybyszewski

    A Computational Model of Saliency Map Read-Out During Visual Search ... 433
    Mia Setic and Drazen Domijan

    A Corpus-Based Computational Model of Metaphor Understanding Incorporating Dynamic Interaction ... 443
    Asuka Terai and Masanori Nakagawa

    Deterministic Coincidence Detection and Adaptation Via Delayed Inputs ... 453
    Zhijun Yang, Alan Murray, and Juan Huo

    Synaptic Formation Rate as a Control Parameter in a Model for the Ontogenesis of Retinotopy ... 462
    Junmei Zhu

    Neuroinformatics

    Fuzzy Symbolic Dynamics for Neurodynamical Systems ... 471
    Krzysztof Dobosz and Wlodzislaw Duch

    Towards Personalized Neural Networks for Epileptic Seizure Prediction ... 479
    Antonio Dourado, Ricardo Martins, Joao Duarte, and Bruno Direito

    Real and Modeled Spike Trains: Where Do They Meet? ... 488
    Vasile V. Moca, Danko Nikolic, and Raul C. Muresan

    The InfoPhase Method or How to Read Neurons with Neurons ... 498
    Raul C. Muresan, Wolf Singer, and Danko Nikolic

    Artifact Processor for Neuronal Activity Analysis during Deep Brain Stimulation ... 508
    Dimitri V. Nowicki, Brigitte Piallat, Alim-Louis Benabid, and Tatiana I. Aksenova

    Analysis of Human Brain NMR Spectra in Vivo Using Artificial Neural Networks ... 517
    Erik Saudek, Daniel Novak, Dita Wagnerova, and Milan Hajek

    Multi-stage FCM-Based Intensity Inhomogeneity Correction for MR Brain Image Segmentation ... 527
    Laszlo Szilagyi, Sandor M. Szilagyi, Laszlo David, and Zoltan Benyo

    KCMAC: A Novel Fuzzy Cerebellar Model for Medical Decision Support ... 537
    S.D. Teddy

    Decoding Population Neuronal Responses by Topological Clustering ... 547
    Hujun Yin, Stefano Panzeri, Zareen Mehboob, and Mathew Diamond

    Neural Dynamics

    Learning of Neural Information Routing for Correspondence Finding ... 557
    Jan D. Bouecke and Jorg Lucke

    A Globally Asymptotically Stable Plasticity Rule for Firing Rate Homeostasis ... 567
    Prashant Joshi and Jochen Triesch

    Analysis and Visualization of the Dynamics of Recurrent Neural Networks for Symbolic Sequences Processing ... 577
    Matej Makula and Lubica Benuskova

    Chaotic Search for Traveling Salesman Problems by Using 2-opt and Or-opt Algorithms ... 587
    Takafumi Matsuura and Tohru Ikeguchi

    Comparison of Neural Networks Incorporating Partial Monotonicity by Structure ... 597
    Alexey Minin and Bernhard Lang

    Special Session: Coupling, Synchronies and Firing Patterns: From Cognition to Disease

    Effect of the Background Activity on the Reconstruction of Spike Train by Spike Pattern Detection ... 607
    Yoshiyuki Asai and Alessandro E.P. Villa

    Assemblies as Phase-Locked Pattern Sets That Collectively Win the Competition for Coherence ... 617
    Thomas Burwick

    A Ca2+ Dynamics Model of the STDP Symmetry-to-Asymmetry Transition in the CA1 Pyramidal Cell of the Hippocampus ... 627
    Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

    Improving Associative Memory in a Network of Spiking Neurons ... 636
    Russell Hunter, Stuart Cobb, and Bruce P. Graham

    Effect of Feedback Strength in Coupled Spiking Neural Networks ... 646
    Javier Iglesias, Jordi Garcia-Ojalvo, and Alessandro E.P. Villa

    Bifurcations in Discrete-Time Delayed Hopfield Neural Networks of Two Neurons ... 655
    Eva Kaslik and Stefan Balint

    EEG Switching: Three Views from Dynamical Systems ... 665
    Carlos Lourenco

    Modeling Synchronization Loss in Large-Scale Brain Dynamics ... 675
    Antonio J. Pons Rivero, Jose Luis Cantero, Mercedes Atienza, and Jordi Garcia-Ojalvo

    Spatio-temporal Dynamics during Perceptual Processing in an Oscillatory Neural Network ... 685
    A. Ravishankar Rao and Guillermo Cecchi

    Resonant Spike Propagation in Coupled Neurons with Subthreshold Activity ... 695
    Belen Sancristobal, Jose M. Sancho, and Jordi Garcia-Ojalvo

    Contour Integration and Synchronization in Neuronal Networks of the Visual Cortex ... 703
    Ekkehard Ullner, Raul Vicente, Gordon Pipa, and Jordi Garcia-Ojalvo

    Special Session: Constructive Neural Networks

    Fuzzy Growing Hierarchical Self-Organizing Networks ... 713
    Miguel Barreto-Sanz, Andres Perez-Uribe, Carlos-Andres Pena-Reyes, and Marco Tomassini

    MBabCoNN – A Multiclass Version of a Constructive Neural Network Algorithm Based on Linear Separability and Convex Hull ... 723
    Joao Roberto Bertini Jr. and Maria do Carmo Nicoletti

    On the Generalization of the m-Class RDP Neural Network ... 734
    David A. Elizondo, Juan M. Ortiz-de-Lazcano-Lobato, and Ralph Birkenhead

    A Constructive Technique Based on Linear Programming for Training Switching Neural Networks ... 744
    Enrico Ferrari and Marco Muselli

    Projection Pursuit Constructive Neural Networks Based on Quality of Projected Clusters ... 754
    Marek Grochowski and Wlodzislaw Duch

    Introduction to Constructive and Optimization Aspects of SONN-3 ... 763
    Adrian Horzyk

    A Reward-Value Based Constructive Method for the Autonomous Creation of Machine Controllers ... 773
    Andreas Huemer, David Elizondo, and Mario Gongora

    A Brief Review and Comparison of Feedforward Morphological Neural Networks with Applications to Classification ... 783
    Alexandre Monteiro da Silva and Peter Sussner

    Prototype Proliferation in the Growing Neural Gas Algorithm ... 793
    Hector F. Satizabal, Andres Perez-Uribe, and Marco Tomassini

    Active Learning Using a Constructive Neural Network Algorithm ... 803
    Jose Luis Subirats, Leonardo Franco, Ignacio Molina Conde, and Jose M. Jerez

    M-CLANN: Multi-class Concept Lattice-Based Artificial Neural Network for Supervised Classification ... 812
    Engelbert Mephu Nguifo, Norbert Tsopze, and Gilbert Tindo

    Workshop: New Trends in Self-organization and Optimization of Artificial Neural Networks

    A Classification Method of Children with Developmental Dysphasia Based on Disorder Speech Analysis ... 822
    Marek Bartu and Jana Tuckova

    Nature Inspired Methods in the Radial Basis Function Network Learning Process ... 829
    Miroslav Bursa and Lenka Lhotska

    Tree-Based Indirect Encodings for Evolutionary Development of Neural Networks ... 839
    Jan Drchal and Miroslav Snorek

    Generating Complex Connectivity Structures for Large-Scale Neural Models ... 849
    Martin Hulse

    The GAME Algorithm Applied to Complex Fractionated Atrial Electrograms Data Set ... 859
    Pavel Kordik, Vaclav Kremen, and Lenka Lhotska

    Geometrical Perspective on Hairy Memory ... 869
    Cheng-Yuan Liou

    Neural Network Based BCI by Using Orthogonal Components of Multi-channel Brain Waves and Generalization ... 879
    Kenji Nakayama, Hiroki Horita, and Akihiro Hirano

    Feature Ranking Derived from Data Mining Process ... 889
    Ales Pilny, Pavel Kordik, and Miroslav Snorek

    A Neural Network Approach for Learning Object Ranking ... 899
    Leonardo Rigutini, Tiziano Papini, Marco Maggini, and Monica Bianchini

    Evolving Efficient Connection for the Design of Artificial Neural Networks ... 909
    Min Shi and Haifeng Wu

    The Extreme Energy Ratio Criterion for EEG Feature Extraction ... 919
    Shiliang Sun

    Workshop: Adaptive Mechanisms of the Perception-Action Cycle

    The Schizophrenic Brain: A Broken Hermeneutic Circle ... 929
    Peter Erdi, Vaibhav Diwadkar, and Balazs Ujfalussy

    Neural Model for the Visual Recognition of Goal-Directed Movements ... 939
    Falk Fleischer, Antonino Casile, and Martin A. Giese

    Emergent Common Functional Principles in Control Theory and the Vertebrate Brain: A Case Study with Autonomous Vehicle Control ... 949
    Amir Hussain, Kevin Gurney, Rudwan Abdullah, and Jon Chambers

    Organising the Complexity of Behaviour ... 959
    Stathis Kasderidis

    Towards a Neural Model of Mental Simulation ... 969
    Matthew Hartley and John Taylor

    Author Index ... 981

  • Dimension Reduction for Mixtures of Exponential Families

    Shotaro Akaho

    Neuroscience Research Institute, AIST, Tsukuba 305-8568, Japan

    Abstract. Dimension reduction for a set of distribution parameters has been important in various applications of data mining. The exponential family PCA has been proposed for that purpose, but it cannot be directly applied to mixture models that do not belong to an exponential family. This paper proposes a method to apply the exponential family PCA to mixture models. A key idea is to embed mixtures into a space of an exponential family. The problem is that the embedding is not unique, and the dimensionality of the parameter space is not constant when the numbers of mixture components are different. The proposed method finds a sub-optimal solution by a linear programming formulation.

    1 Introduction

    In many applications, dimension reduction is important for many purposes, such as visualization and data compression. Traditionally, principal component analysis (PCA) has been widely used as a powerful tool for dimension reduction in Euclidean space. However, data are often given as binary strings or graph structures whose nature is very different from that of Euclidean vectors.

    One approach, which we take here, is to regard such data as parameters of probability distributions. Information geometry [1] gives a mathematical framework for the space of probability distributions, and a dimension reduction method has been proposed for the class of exponential families [2,3,4,5]. There are two main advantages of the information geometrical approach over conventional methods: one is that the information geometrical projection of a data point always lies on the support of the parameters, and the other is that the projection is defined more naturally for a distribution than the conventional Euclidean projection.

    In this paper, we focus on mixture models [6], which are very flexible and are often used for clustering. However, we cannot apply the exponential family PCA to mixture models, because they are not members of an exponential family. Our main idea is to embed mixture models into the space of an exponential family. This is not straightforward, however, because the embedding is not unique and the dimensionality of the parameter space is not constant when the numbers of mixture components are different. Those problems can be resolved by solving a combinatorial optimization problem, which is computationally intractable. Therefore, we propose a method that finds a sub-optimal solution by separating the problem into subproblems, each of which can be optimized more easily.

    V. Kurkova et al. (Eds.): ICANN 2008, Part I, LNCS 5163, pp. 1–10, 2008.
    © Springer-Verlag Berlin Heidelberg 2008


    The proposed framework is useful not only for visualization and data compression, but also for applications that have been developed recently in the field of data mining: privacy-preserving data mining [7] and distributed data mining [8]. In distributed data mining, raw data are collected at many distributed sites. Those data are not sent directly to the center, but are processed into statistical data at each site in order to preserve privacy as well as to reduce communication costs; the statistical data are then sent to the center. A similar framework has begun to be studied in the field of sensor networks [9].

    2 e-PCA and m-PCA: Dual Dimension Reduction

    2.1 Information Geometry of Exponential Family

In this section, we review the exponential family PCA, called e-PCA and m-PCA [4]. An exponential family is defined as a class of distributions given by

    p(x; θ) = exp{ Σ_{i=1}^d θ_i F_i(x) + C(x) − ψ(θ) },    (1)

with a random variable x and a parameter θ = (θ_1, ..., θ_d). The set of distributions p(x; θ) obtained by varying θ forms a space (manifold) S. The structure of the manifold is determined by introducing a Riemannian metric and an affine connection. The statistically natural metric is the Fisher information matrix g_{jk}(θ) = E[{∂ log p(x; θ)/∂θ_j}{∂ log p(x; θ)/∂θ_k}], and the natural connection is the α-connection, specified by one real-valued parameter α. In particular, α = ±1 is important, because S then becomes a flat manifold. When α = 1, the space is called e-flat¹ with respect to an affine coordinate (the e-coordinate) θ. When α = −1, the exponential family is also flat with respect to another affine coordinate (the m-coordinate) η = (η_1, ..., η_d), defined by η_i = E[F_i(x)]. The coordinates θ and η are dually related and are transformed into each other by the Legendre transform; we write this coordinate transform as θ(η), η(θ).

    2.2 e-PCA and m-PCA

Since the manifold of an exponential family is flat in the e- and m-affine coordinates, there are accordingly two kinds of flat submanifolds for dimension reduction. The e-PCA (m-PCA) is defined by finding the e-flat (m-flat) submanifold that best fits the samples, given as a set of points of the exponential family. Here we describe only e-PCA, because m-PCA is completely dual to e-PCA and is obtained by exchanging e- and m- throughout the description.

Let us define an h-dimensional e-flat subspace M. The points on M can be expressed as

    θ(w; U) = Σ_{j=1}^h w_j u_j + u_0,    (2)

¹ e stands for exponential and m stands for mixture.


where U = [u_0, u_1, ..., u_h] ∈ R^{d×(h+1)} is a matrix containing the basis vectors of the subspace and w = (w_1, ..., w_h)^T ∈ R^h is a local coordinate on M.

Suppose we have a set of parameters θ^(1), ..., θ^(n) ∈ S as sample points. For dimension reduction, we need to consider the projection of the sample points onto M, which is defined by a geodesic that is orthogonal to M with respect to the Fisher information. According to the two kinds of geodesic, we can define the e-projection and the m-projection.

Amari [1] has proved that the m-projection onto an e-flat submanifold is unique, and further that it is given by the point minimizing the m-divergence

    K_m(p, q) = ∫ p(x){log p(x) − log q(x)} dx,    (3)

hence we take the m-projection for e-PCA.² As a cost function for fitting the sample points to a submanifold, it is convenient to take the sum of the m-divergences

    L(U, W) = Σ_{i=1}^n K_m(θ^(i), θ(w^(i); U)),    (4)

where W = (w^(1), ..., w^(n)), and e-PCA is defined by finding the U and W that minimize L(U, W). Note that even when data are given as values of a random variable instead of parameters, the random variable can be related to a parameter, so the same framework applies [10].

    2.3 Alternating Gradient Descent Algorithm

Although it is difficult to optimize L(U, W) with respect to U and W simultaneously, the optimization becomes easier with an alternating procedure in which optimization is performed for one variable while the other is fixed. If we fix the basis vectors U, the projection onto an e-flat space from a sample point is unique, as mentioned above. On the other hand, optimizing U with W fixed is also an m-projection onto the e-flat subspace determined by W, which is a submanifold of the product space S^n; therefore it also has a unique solution.

In each optimization step we can apply a Newton-like method [4], but in this paper we use only a simple gradient descent. Note that whatever algorithm we use, it does not always converge to the global solution, even if each alternating step is globally optimized, as in EM and variational Bayes.

The gradient descent algorithm is given by

    Δw_j^(i) = −ε_w u_j^T Δη^(i),    Δu_j = −ε_u Σ_{i=1}^n w_j^(i) Δη^(i),    Δu_0 = −ε_u Σ_{i=1}^n Δη^(i),    (5)

² By duality, we take the e-projection for m-PCA, and the e-divergence is defined by K_e(p, q) = K_m(q, p).


where Δη^(i) = η̂^(i) − η^(i) is the difference between the m-coordinates of the point specified by the current estimate w^(i) and of the sample point. As a general tendency, the problem is more sensitive to U than to W; thus we take the learning constants such that ε_w > ε_u.

Further, the basis vectors U have a redundancy of linear transformation: when U is transformed by any non-singular matrix A, the same solution is obtained by transforming W to WA^{−1}. It can also happen that two different bases u_i and u_j converge to the same direction if they are optimized by ordinary gradient descent without any constraints. Therefore, we restrict U to be an orthogonal frame (i.e., U^T U = I). Such a space is called a Grassmann manifold, and optimization in a Grassmann manifold is often used for finding principal or minor components [11]. The natural gradient for U is given by

    ∇U_nat = ∇U − U U^T ∇U,    (6)

where ∇U is the matrix whose columns are the Δu_j in (5). Since this update rule does not preserve the orthogonality constraint strictly, we need either to re-orthogonalize U after each update (we apply this in the experiment) or to update U along the geodesic.
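For illustration, one alternating update of (5)–(6) can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' implementation: the coordinate transform theta_to_eta is assumed to be supplied in closed form for the family at hand, all names are illustrative, and the frame is re-orthonormalized by a QR step rather than a geodesic update.

```python
import numpy as np

def alternating_step(U, u0, W, theta_samples, theta_to_eta,
                     eps_w=0.1, eps_u=0.01):
    """One alternating e-PCA update following eqs. (5)-(6) (sketch).

    U  : (d, h) orthonormal frame with columns u_1..u_h
    u0 : (d,)   offset vector
    W  : (n, h) local coordinates w^(1)..w^(n)
    theta_samples : (n, d) sample points theta^(1)..theta^(n)
    theta_to_eta  : vectorized map from e- to m-coordinates (assumed given)
    """
    # m-coordinate residuals: delta_eta^(i) = eta(estimate) - eta(sample)
    theta_est = W @ U.T + u0
    d_eta = theta_to_eta(theta_est) - theta_to_eta(theta_samples)

    # eq. (5): gradient steps for W, the basis vectors, and the offset
    W_new = W - eps_w * d_eta @ U
    grad_U = d_eta.T @ W                 # column j holds sum_i w_j^(i) delta_eta^(i)
    u0_new = u0 - eps_u * d_eta.sum(axis=0)

    # eq. (6): natural gradient on the orthogonal frame, then re-orthonormalize
    grad_nat = grad_U - U @ (U.T @ grad_U)
    U_new, _ = np.linalg.qr(U - eps_u * grad_nat)
    return U_new, u0_new, W_new
```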

    2.4 e-Center and m-Center

An important special case of e-PCA and m-PCA is a zero-dimensional subspace, which corresponds to a single point. The only parameter in that case is u_0, which is given in closed form:

    θ_ec = θ( (1/n) Σ_{i=1}^n η(θ^(i)) ),    η_mc = η( (1/n) Σ_{i=1}^n θ(η^(i)) ).    (7)

We call them the e-center and the m-center, respectively.
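When the coordinate transforms are available in closed form, the centers (7) are straightforward to compute. The sketch below does this for univariate Gaussians N(μ, σ²), whose m-coordinates are (E[x], E[x²]) = (μ, μ² + σ²); the helper names are illustrative.

```python
import numpy as np

def eta_from_moments(mu, var):
    """m-coordinates of N(mu, var) for sufficient statistics F(x) = (x, x^2)."""
    return np.array([mu, mu**2 + var])

def moments_from_eta(eta):
    """Recover (mu, var) from the m-coordinates."""
    return eta[0], eta[1] - eta[0]**2

def e_center(params):
    """e-center (7): average the m-coordinates, then map back to a distribution."""
    etas = np.array([eta_from_moments(mu, var) for mu, var in params])
    return moments_from_eta(etas.mean(axis=0))

# Example: the e-center of N(0, 1) and N(2, 1) is N(1, 2).
print(e_center([(0.0, 1.0), (2.0, 1.0)]))
```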

    2.5 Properties of e-PCA and m-PCA

In this subsection, we summarize several points in which e-PCA and m-PCA differ from ordinary PCA.

The first concerns the hierarchical relation between different dimensions. Since the formulations of e-PCA and m-PCA include a nonlinear part, an optimal low-dimensional subspace is not always included in a higher-dimensional one. In some applications, hierarchical structures are necessary or convenient; in such cases, we can construct an algorithm that finds the optimal subspace by constraining the search space.

The second concerns the domain (or support) of S. The parameter set of an exponential family is a local coordinate, which means that θ does not define a probability distribution for every value in R^d. In general, the domain forms a convex region in the e- and m-coordinate systems. It is known that the m-projection for e-PCA is guaranteed to lie in that region. However, when we apply a gradient-type algorithm, too large a step size can push the candidate solution out of the domain. In our implementation, the candidate solution is checked for inclusion at each learning step, and the learning constant is adaptively decreased in case of an excess.

The third concerns the initialization problem. Since the alternating algorithm gives only a local optimum, it is important to find a good initial solution. A naive idea is to use conventional PCA with the Euclidean metric, with u_0 initialized by the e-center. However, the initialization is related to the domain problem above: the initial points have to lie in the domain region. For simplicity, we take W = 0 in our numerical simulation, which corresponds to the initial projection point always being u_0.

    3 Embedding of Mixture Models

Now let us move on to our main topic, the dimension reduction of mixture models. A major difficulty is that mixture models are not members of an exponential family. However, if we add a latent variable z representing which component x is generated from, p(x, z; θ) belongs to an exponential family.

    3.1 Latent Variable Model

A mixture of exponential family distributions is written as

    p(x) = Σ_{i=0}^k π_i f_i(x; θ_i),    f_i(x; θ_i) = exp(θ_i^T F_i(x) − ψ_i(θ_i)),    i = 0, ..., k.    (8)

Since the number of degrees of freedom of {π_i} is k, we regard π_1, ..., π_k as parameters and define π_0 by π_0 = 1 − Σ_{i=1}^k π_i.

When z ∈ {0, 1, 2, ..., k} is a latent variable representing which component of the mixture x is generated from, the distribution of (x, z) is an exponential family [12], written as

    p(x, z) = π_z f_z(x; θ_z) = exp[ Σ_{i=1}^k θ_i^T F_i(x) δ_i(z) + θ_0^T F_0(x)(1 − Σ_{i=1}^k δ_i(z)) + Σ_{i=1}^k β_i δ_i(z) − γ ],    (9)

where δ_i(z) = 1 when z = i, and 0 otherwise, and

    β_i = log π_i − ψ_i(θ_i) − (log π_0 − ψ_0(θ_0)),    γ = −log π_0 + ψ_0(θ_0).    (10)

The e-coordinate of this model is θ = (β_1, ..., β_k, θ_0, θ_1, ..., θ_k), and the m-coordinate consists of E[δ_i(z)] = π_i corresponding to β_i, and E[F_i(x) δ_i(z)] = π_i η_i corresponding to θ_i, where η_i = E[F_i(x)] is the m-coordinate of each component distribution f_i(x; θ_i).
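For reference, here is a small sketch computing the embedded e-coordinates (β_i, γ) of (10) from the mixing weights and the component log-partition values ψ_i(θ_i); the function is illustrative and assumes those values are supplied.

```python
import numpy as np

def embed_mixture(log_pi, psi):
    """e-coordinates (10) of a mixture embedded in the joint family (9).

    log_pi : (k+1,) log mixing weights log pi_0 .. log pi_k
    psi    : (k+1,) log-partition values psi_i(theta_i)
    """
    base = log_pi[0] - psi[0]             # log pi_0 - psi_0(theta_0)
    beta = (log_pi[1:] - psi[1:]) - base  # beta_i for i = 1..k
    gamma = -log_pi[0] + psi[0]
    return beta, gamma
```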


    3.2 Problems of the Embedding

There are two problems with the embedding described above. The first is that the embedding is not unique, because a mixture distribution is invariant when its components are exchanged. The second arises when mixtures have different numbers of components: in that case we cannot embed them directly into one common space, because the dimensionalities of their parameter spaces differ.

For the first problem, we choose the embedding so that the embedded distributions are located as closely together as possible; once the embedding is done, e-PCA (or m-PCA) can be applied directly. For the second problem, we split components to match the dimensions across different numbers of components.

    3.3 Embedding for the Homogeneous Mixtures

First, we consider the homogeneous case, in which the numbers of components are the same for all mixtures θ^(i).

A naive way to resolve the problem would be to perform e-PCA (or m-PCA) for every possible embedding and take the best one. However, this is not practical, because the number of possible embeddings increases exponentially with the number of components and the number of mixtures. Instead, we try to find a configuration in which the mixtures are as close together as possible. The following proposition shows that the divergence between two mixtures in the embedded space takes a very simple form.

Proposition 1. Suppose there are two mixture distributions with the same numbers of components, and let their distributions with latent variables be

    p_1(x, z) = π_z f_z(x; θ_z),    p_2(x, z) = π'_z f_z(x; θ'_z).    (11)

The m-divergence between p_1 and p_2 is given by

    K_m(p_1, p_2) = Σ_{i=0}^k π_i [ K_m(f_i(x; θ_i), f_i(x; θ'_i)) + log(π_i / π'_i) ].    (12)

This means that the divergence separates into a sum of functions, each of which depends only on a pair of corresponding components of the two mixtures. Note that the divergence between the original mixtures has no such simple form.
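For Gaussian components, the component divergences in (12) have a closed form, so the embedded divergence is cheap to evaluate once the components are aligned. A minimal sketch, with components given as (mean, variance) pairs; names are illustrative:

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """Closed-form K_m between two 1-D Gaussian components."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2)**2) / v2 - 1.0)

def km_embedded(pi1, comps1, pi2, comps2):
    """m-divergence (12) between two embedded mixtures with aligned components."""
    return sum(p1 * (kl_gauss(*c1, *c2) + np.log(p1 / p2))
               for p1, c1, p2, c2 in zip(pi1, comps1, pi2, comps2))
```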

Based on this fact, we can derive the optimal embedding for two mixtures that minimizes the divergence. It should be noted that the optimality is not invariant with respect to the order of p_1 and p_2, because the divergence is not a symmetric function. For the general case of n mixtures, we apply the following greedy algorithm based on pairwise optimality.

    [Embedding algorithm (for e-PCA, homogeneous)]

1. Embed θ^(1) in any configuration.
2. Repeat the following procedures for i = 2, 3, ..., n:
   (a) Let θ_ec be the e-center of the already embedded mixtures θ^(j), j = 1, ..., i − 1.
   (b) Embed θ^(i) so as to minimize the m-divergence between θ^(i) and θ_ec in the embedded space (see next subsection).


Fig. 1. Matching of distributions. Left: the homogeneous case; the sum of the matching weights is minimized. Right: the heterogeneous case; in this example, the k-th component of the left group is split and matched with two components (the 0th and the k-th) of the right group.

    3.4 The Optimal Matching Solution

In this subsection, we give an optimization method for finding a matching between two mixtures that minimizes the cost function (12), which is a sum of component-wise terms (left of Fig. 1).

Letting the weight values be

    μ_ij = π_i [ K_m(f_i(x; θ_i), f_j(x; θ'_j)) + log(π_i / π'_j) ],    (13)

we obtain the optimization problem in terms of linear programming:

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^k μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = Σ_{j=0}^k a_ij = 1.    (14)

The solution a_ij takes binary values (0 or 1) by the following integrality theorem.

Proposition 2 (Integrality theorem [13]). In the transshipment problem

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^{k'} μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = s_j,   Σ_{j=0}^{k'} a_ij = t_i,

a_ij has an integer optimal solution when the problem has at least one feasible solution and the s_j, t_i are all integers. In particular, the solution given by the simplex method is always an integer solution.
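In practice, (14) is the classical assignment problem, so instead of a general-purpose LP solver one may also use a Hungarian-type routine such as scipy's linear_sum_assignment, which returns the integral solution directly. A sketch, assuming a precomputed matrix of component divergences (names illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(pi1, pi2, km_comp):
    """Solve (14): the permutation minimizing the total cost of eq. (13).

    pi1, pi2 : (k+1,) mixing weights of the two mixtures
    km_comp  : (k+1, k+1) matrix with K_m(f_i, f_j) between components
    """
    mu = pi1[:, None] * (km_comp + np.log(pi1[:, None] / pi2[None, :]))
    rows, cols = linear_sum_assignment(mu)   # integral by Proposition 2
    return cols, mu[rows, cols].sum()        # cols[i]: component of p2 matched to i
```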

    3.5 General Case: Splitting Components

When the numbers of components of the mixtures are different (the heterogeneous case), we can adjust the numbers by splitting components. Splitting mixture components has played an important role in various situations, for example in finding an optimal number of components for fitting a mixture [14].

Let f(x; θ) be one of the components of a mixture, with weight π; it can be split into k + 1 components as

    π_i f(x; θ),   i = 0, ..., k,    Σ_{i=0}^k π_i = π,   π_i > 0.    (15)


We need to determine two things: which component should be split, and how large the splitting weights π_i should be. Since it is hard to optimize both simultaneously, we solve the problem sequentially: first we determine the component to be split, based on the optimal assignment problem of the previous subsection, and then we optimize the splitting weights.

    3.6 Component Selection

Suppose we have two mixtures p_1 and p_2 given as in (11). When their numbers of components are different (the heterogeneous case), we need to find a one-to-many matching. Let z = 0, 1, ..., k for p_1 and z = 0, 1, ..., k' for p_2. In order to find the one-to-many matching, we extend the optimization problem of the homogeneous case in a natural way:

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^{k'} μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = 1,   Σ_{j=0}^{k'} a_ij ≥ 1,    (16)

where μ_ij is defined by (13), we assume that p_1 has a smaller number of components than p_2 (k ≤ k'), and some equality constraints are replaced by inequality constraints to deal with the one-to-many matching (right of Fig. 1).

Note that this problem gives only a sub-optimal matching for the entire problem, because the splitting weights are not taken into account. From the computational point of view, however, the integrality property of the solution is preserved, so all weights are guaranteed to be binary; a further virtue of this formulation is that the homogeneous case is included as a special case of the heterogeneous one.

    3.7 Optimal Weights

After the matching is performed, we split the component f(x; θ) into the k + 1 components given by (15) and find the optimal correspondence to the components π_i f_i(x; θ_i), i = 0, ..., k. This is given by the following proposition.

Proposition 3. The optimal splitting that minimizes the sum of m-divergences between π_i f(x; θ) and π_i f_i(x; θ_i), i = 0, ..., k, is given by

    π_i^e = (π_i / Z) exp(−K_m(f(x; θ), f(x; θ_i))),    (17)

where Z is a normalization constant. The splitting for the e-divergence is given by

    π_i^m = π_i / Z.    (18)
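A sketch of (17) follows, assuming the normalization constant Z is chosen so that the sub-weights sum to the weight of the component being split; the function and argument names are illustrative.

```python
import numpy as np

def split_weights_e(pi_total, pi_targets, km_vals):
    """Splitting weights (17): proportional to pi_i * exp(-K_m(f, f_i)),
    normalized so the sub-weights sum to pi_total (assumption)."""
    w = np.asarray(pi_targets) * np.exp(-np.asarray(km_vals))
    return pi_total * w / w.sum()
```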

We now summarize the embedding method in the general case, covering both the homogeneous and the heterogeneous settings.


Fig. 2. Upper left: original mixtures. Upper right: mixtures with reduced dimension. Bottom: two-dimensional scatter plot of the mixtures.

    [Embedding algorithm (for e-PCA, general)]

1. Sort θ^(1), ..., θ^(n) in descending order of the numbers of components.
2. Embed θ^(1) in any configuration.
3. Repeat the following (a), (b), (c) for i = 2, 3, ..., n:
   (a) Let θ_ec be the e-center of the already embedded mixtures θ^(j), j = 1, ..., i − 1.
   (b) Solve (16) to find the correspondence between θ^(i) and θ_ec.
   (c) If the number of components of θ^(i) is smaller than that of θ_ec, split the components by (17).

    4 Numerical Experiments

We applied the proposed method to a synthetic data set of one-dimensional Gaussian mixtures. First, Gaussian mixtures were generated (8 in total: 4 mixtures with 3 components, 3 mixtures with 2 components, and 1 mixture with 1 component), with the parameters of the mixtures (mixing weight, mean, and variance of each component) determined at random (upper left of Fig. 2).

The learning coefficients of e-PCA are taken to be ε_w = 0.1, ε_u = 0.01, except when a parameter exceeds the domain boundary or the objective function increases exceptionally (in such cases the learning rate is decreased adaptively). The update of U is performed for 20 steps, each of which follows 50 update steps of W, for the sake of stable convergence.

The bottom panel of Fig. 2 shows the result of dimension reduction (e-PCA) to a 2-dimensional subspace from the 8-dimensional original space (the number of parameters of a Gaussian mixture with 3 components). The objective function L(U, W) is about 6.4 at the initial solution (the basis is initialized by Euclidean PCA) and decreases to about 1.9. The upper right of Fig. 2 shows the projected distributions obtained by e-PCA. Their original shapes are well preserved even in the 2-D subspace, though slightly smoothed. We also applied m-PCA, and similar but not identical results were obtained.

    5 Concluding Remarks

We have proposed a dimension reduction method for the parameters of mixture distributions. Two important problems remain to be solved. One is to find a good initial solution, because the final solution is not a global optimum even though the optimal solution is obtained in each alternating step. The other is to develop a stable and fast algorithm. As for the embedding, there are many possible improvements over the proposed greedy algorithm. Applications to real-world data, extensions to other structured models such as HMMs, and extensions to other types of methods such as clustering are all left as future work.

    References

1. Amari, S.: Differential-Geometrical Methods in Statistics. Springer, Heidelberg (1985)
2. Amari, S.: Information Geometry on Hierarchy of Probability Distributions. IEEE Trans. on Information Theory 47 (2001)
3. Collins, M., Dasgupta, S., Schapire, R.: A Generalization of Principal Component Analysis to the Exponential Family. In: Advances in NIPS, vol. 14 (2002)
4. Akaho, S.: The e-PCA and m-PCA: dimension reduction by information geometry. In: IJCNN 2004, pp. 129–134 (2004)
5. Watanabe, K., Akaho, S., Okada, M.: Clustering on a Subspace of Exponential Family Using Variational Bayes Method. In: Proc. of Worldcomp2008/Information Theory and Statistical Learning (2008)
6. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
7. Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proc. of the ACM SIGMOD, pp. 439–450 (2000)
8. Kumar, A., Kantardzic, M., Madden, S.: Distributed Data Mining: Framework and Implementations. IEEE Internet Computing 10, 15–17 (2006)
9. Chong, C.Y., Kumar, S.: Sensor networks: evolution, opportunities, and challenges. Proc. of the IEEE 91, 1247–1256 (2003)
10. Buntine, W.: Variational extensions to EM and multinomial PCA. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430. Springer, Heidelberg (2002)
11. Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
12. Amari, S.: Information geometry of the EM and em algorithms for neural networks. Neural Networks 8(9), 1379–1408 (1995)
13. Chvatal, V.: Linear Programming. W.H. Freeman and Company, New York (1983)
14. Fukumizu, K., Akaho, S., Amari, S.: Critical lines in symmetry of mixture models and its application to component splitting. In: Proc. of NIPS 15 (2003)

Several Enhancements to Hermite-Based Approximation of One-Variable Functions

Bartlomiej Beliczynski¹ and Bernardete Ribeiro²

¹ Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, 00-662 Warszawa, Poland
[email protected]
² Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal
[email protected]

Abstract. Several enhancements of and comments on Hermite-based one-variable function approximation are presented. First, we prove that a constant bias extracted from the function contributes to a decrease of the error, and we demonstrate how to choose that bias. Second, we show how to select a basis among orthonormal functions to achieve minimum error for a fixed dimension of the approximation space. Third, we prove that the loss of orthonormality due to truncation of the argument range of the basis functions affects neither the overall approximation error nor the expansion coefficients, and we show how this feature can be used. An application of the obtained results to ECG data compression is presented.

    1 Introduction

A set of Hermite functions forming an orthonormal basis is naturally attractive for various approximation, classification, and data compression tasks. These basis functions are defined on the set of real numbers IR and can be calculated recursively. The approximating function coefficients can be determined relatively easily so as to achieve the best approximation property. Since Hermite functions are eigenfunctions of the Fourier transform, time and frequency spectra are approximated simultaneously. Each subsequent basis function extends the frequency bandwidth within a limited range of well-concentrated energy; see for instance [1]. By introducing a scaling parameter we may control the bandwidth, influencing at the same time the dynamic range of the input argument, until we strike a desirable balance.

If Hermite one-variable functions are generalized to two variables, they retain the same useful properties and turn out to be very suitable for image compression tasks.

Recently, several publications (see for instance [2], [3]) have suggested using Hermite functions as activation functions in neural schemes. In [3], a so-called constructive approximation scheme is used; it is a type of incremental approximation developed in [4], [5]. The novelty of this approach is that, contrary to the traditional neural architecture, every node in the hidden layer has a different activation function. It gains several advantages from the Hermite functions. However, in such an approach the orthogonality of the Hermite functions is not really exploited.

In this paper we return to the basic task of one-variable function approximation. For this classical problem we offer two enhancements and one proof of correctness.

For fixed basis functions in a Hilbert space, the best approximation always exists. If the basis is orthonormal, the approximation can relatively easily be calculated in the form of expansion coefficients. Those coefficients represent the original function approximated in the Hermite basis, and they usually require less space than the original data. At first glance there seems to be little room for improvement. However, one may slightly reformulate the problem: instead of approximating the function f, one may approximate f − f0, where f0 is a fixed chosen function. After the approximation is done, f0 is added to the approximant of f − f0. From the approximation and data compression point of view, this procedure makes sense if the additional effort put into the representation of f0 is compensated by a reduction of the approximation error.

In a typically stated approximation problem, a basis of n+1 functions {e0, e1, ..., en} is given and we look for their expansion coefficients. We may, however, reformulate the problem in the following way: let us search for any n+1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest error of approximation. This is the second issue.

The third problem stated and discussed here is the loss of the orthonormality property of the basis functions when the set IR is replaced by a subset of it. When the approximating basis is orthonormal, the expansion coefficients are calculated easily; otherwise the calculations are more complicated. However, we prove that despite the loss of orthonormality, we may determine the Hermite expansion coefficients as before.

In this paper we focus on the Hermite basis; however, many of the studied properties are applicable to any orthonormal basis. Our enhancements were tested and demonstrated on ECG data compression, a well-known application area.

This paper is organized as follows. In Section 2, basic facts about approximation needed for later use are recalled. In Section 3, Hermite functions are briefly described. We then present our results in Section 4: bias extraction, basis function selection, and a proof of correctness for the expansion coefficient calculation despite the lack of basis orthonormality. In Section 5, certain practicalities are presented, and an application of our improvements to ECG data compression is demonstrated and discussed. In Section 6, conclusions are drawn.

    2 Approximation Framework

Some selected facts on function approximation useful for this paper will now be recalled. Let us consider the function

    f_{n+1} = Σ_{i=0}^n w_i g_i,    (1)

where g_i ∈ G ⊂ H, H = (H, ||·||) is a Hilbert space, and w_i ∈ IR, i = 0, ..., n.

For any function f from a Hilbert space H and a closed (finite-dimensional) subspace G ⊂ H with basis {g_0, ..., g_n}, there exists a unique best approximation of f by elements of G [6]. Let us denote it by g_b. Because the error of the best approximation is orthogonal to all elements of the approximation space, f − g_b ⊥ G, the coefficients w_i may be calculated from the set of linear equations

    ⟨g_i, f − g_b⟩ = 0   for i = 0, ..., n,    (2)

where ⟨·,·⟩ denotes the inner product. Formula (2) can also be written as ⟨g_i, f − Σ_{k=0}^n w_k g_k⟩ = ⟨g_i, f⟩ − Σ_{k=0}^n w_k ⟨g_i, g_k⟩ = 0 for i = 0, ..., n, or in matrix form

    Γ w = G_f,    (3)

where Γ = [⟨g_i, g_j⟩], i, j = 0, ..., n, w = [w_0, ..., w_n]^T, G_f = [⟨g_0, f⟩, ..., ⟨g_n, f⟩]^T, and T denotes transposition.

Because there exists a unique best approximation of f in the (n+1)-dimensional space G with basis {g_0, ..., g_n}, the matrix Γ is nonsingular and w_b = Γ^{−1} G_f.

For any basis {g_0, ..., g_n} one can find an orthonormal basis {e_0, ..., e_n}, with ⟨e_i, e_j⟩ = 1 when i = j and ⟨e_i, e_j⟩ = 0 when i ≠ j, such that span{g_0, ..., g_n} = span{e_0, ..., e_n}. In that case Γ is the unit matrix and

    w_b = [⟨e_0, f⟩, ⟨e_1, f⟩, ..., ⟨e_n, f⟩]^T.    (4)

Finally, (1) takes the form

    f_{n+1} = Σ_{i=0}^n ⟨e_i, f⟩ e_i.    (5)

The squared error error_{n+1} = ⟨f − f_{n+1}, f − f_{n+1}⟩ of the best approximation of a function f in the basis {e_0, ..., e_n} is thus expressible as

    ||error_{n+1}||² = ||f||² − Σ_{i=0}^n w_i².    (6)
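On a uniform grid, the coefficients (4), the approximant (5), and the error (6) can be computed by approximating the inner-product integrals with Riemann sums. A minimal sketch (names illustrative):

```python
import numpy as np

def best_coefficients(f, basis, t):
    """Expansion coefficients (4), approximant (5), squared error (6) on a grid.

    f     : (m,) samples of the target function on the grid t
    basis : (n+1, m) samples of an orthonormal basis e_0..e_n on t
    t     : (m,) uniform grid; dt-weighted sums approximate the integrals
    """
    dt = t[1] - t[0]
    w = basis @ f * dt                        # w_i = <e_i, f>
    approx = w @ basis                        # f_{n+1}
    err2 = np.sum(f**2) * dt - np.sum(w**2)   # ||error_{n+1}||^2
    return w, approx, err2
```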

    3 Hermite Functions

We will be looking at an orthonormal set of functions in the form of Hermite functions. Their expansion coefficients are easily and independently calculated from (4). Let us consider the space of great practical interest L²(−∞, +∞)


Fig. 1. Hermite functions h0, h1, h3

with the inner product defined as ⟨x, y⟩ = ∫_{−∞}^{+∞} x(t) y(t) dt. In such a space, a sequence of linearly independent and bounded functions can be defined as follows: h_0(t) = w(t) = e^{−t²/2}, h_1(t) = t w(t), ..., h_n(t) = t^n w(t), ... This basis can be orthonormalized using the well-known and efficient Gram–Schmidt process (see for instance [6]). Finally, a new, now orthonormal, basis spanning the same space is obtained:

    h_0(t), h_1(t), ..., h_n(t), ...    (7)

where

    h_n(t) = c_n e^{−t²/2} H_n(t);    H_n(t) = (−1)^n e^{t²} (dⁿ/dtⁿ) e^{−t²};    c_n = 1 / (2^n n! π^{1/2})^{1/2}.    (8)

The polynomials H_n(t) are called Hermite polynomials and the functions h_n(t) Hermite functions. According to (8), the first several Hermite functions can be calculated:

    h_0(t) = (1/π^{1/4}) e^{−t²/2};    h_1(t) = (1/(√2 π^{1/4})) e^{−t²/2} · 2t;

    h_2(t) = (1/(2√2 π^{1/4})) e^{−t²/2} (4t² − 2);    h_3(t) = (1/(4√3 π^{1/4})) e^{−t²/2} (8t³ − 12t).

Plots of several functions of the Hermite basis are shown in Fig. 1.
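Numerically, the Hermite functions of (8) are best evaluated by the standard stable three-term recurrence rather than through the explicit polynomials. A sketch:

```python
import numpy as np

def hermite_functions(n_max, t):
    """Orthonormal Hermite functions h_0..h_{n_max} of (8) on the grid t,
    using h_{n+1} = t*sqrt(2/(n+1))*h_n - sqrt(n/(n+1))*h_{n-1}."""
    t = np.asarray(t, dtype=float)
    h = np.empty((n_max + 1, t.size))
    h[0] = np.pi ** -0.25 * np.exp(-t**2 / 2.0)
    if n_max >= 1:
        h[1] = np.sqrt(2.0) * t * h[0]
    for n in range(1, n_max):
        h[n + 1] = (t * np.sqrt(2.0 / (n + 1)) * h[n]
                    - np.sqrt(n / (n + 1.0)) * h[n - 1])
    return h
```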

    4 Main Results

4.1 Extracting the Bias

In this section our first enhancement is introduced. Let f be any function from a Hilbert space H. Instead of approximating the function f, we suggest approximating the function f − f0, where f0 ∈ H is a known function. Afterwards, f0 is added to the approximant of f − f0. The modification of (5) is then the following:

    f^{f0}_{n+1} = f0 + Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i.    (9)

The approximation error is then expressed as

    e^{f0}_{n+1} = f − f^{f0}_{n+1} = f − f0 − Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i,

and, similarly to (6), its squared norm is

    ||e^{f0}_{n+1}||² = ||f − f0||² − Σ_{i=0}^n ⟨f − f0, e_i⟩².    (10)

Theorem 1. Let H be a Hilbert space of functions on a subset of IR containing the interval [a, b], let f be a function from H, f ∈ H, let {e_0, e_1, ..., e_n} be an orthonormal set in H, and let c ∈ IR be a constant. Let f0 = c·1_{[a,b]}, where 1_{[a,b]} denotes the function of value 1 in the range [a, b] and 0 elsewhere, and let the approximation formula be

    f^{f0}_{n+1} = f0 + Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i.

Then the norm of the approximation error is minimized for c = c_0, where

    c_0 = ( ⟨f, 1_{[a,b]}⟩ − Σ_{i=0}^n ⟨f, e_i⟩ ⟨e_i, 1_{[a,b]}⟩ ) / ( (b − a) − Σ_{i=0}^n ⟨e_i, 1_{[a,b]}⟩² ).    (11)

Proof. The squared error formula (10) can be expanded as

    ||e^{f0}_{n+1}||² = ||f||² + ||f0||² − 2⟨f, f0⟩ − Σ_{i=0}^n ( ⟨f, e_i⟩ − ⟨e_i, f0⟩ )²
                     = ||f||² + c²(b − a) − 2c⟨f, 1_{[a,b]}⟩ − Σ_{i=0}^n ( ⟨f, e_i⟩² + c²⟨e_i, 1_{[a,b]}⟩² − 2c⟨f, e_i⟩⟨e_i, 1_{[a,b]}⟩ ).

Differentiating the squared error formula with respect to c and equating the derivative to zero, one obtains (11).

Along with the theorem, we suggest a two-step approximation: first f0 is calculated, and then the function f − f0 is approximated in the usual way.

Remark 1. One may notice that in many applications c_0 of (11) is well approximated by

    c_0 ≈ ⟨f, 1_{[a,b]}⟩ / (b − a).    (12)

The right-hand side of (12) is the mean value of the approximated function f over the range [a, b]. A usual choice of [a, b] is the actual range of the function argument.


    4.2 Basis Selection

In a typically stated approximation problem, there is a function f to be approximated and a basis of approximation {e_0, e_1, ..., e_n}. We look for the function's expansion coefficients related to the basis functions.

The problem may, however, be reformulated in the following way: let us search for any n+1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest error of approximation. In practice this can easily be done. Since for any orthonormal basis an indicator of the error reduction associated with the basis function e_i is |w_i| = |⟨f, e_i⟩|, one may calculate sufficiently many coefficients and order them by magnitude.
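A sketch of this selection, under the same grid conventions as above: compute many candidate coefficients, then keep the m basis functions with the largest |w_i|.

```python
import numpy as np

def select_basis(f, basis, t, m):
    """Keep the m basis functions with the largest |w_i| = |<f, e_i>|."""
    dt = t[1] - t[0]
    w = basis @ f * dt
    keep = np.sort(np.argsort(-np.abs(w))[:m])   # indices of the m largest |w_i|
    return keep, w[keep]
```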