Artificial Neural Networks (ICANN)


  • Lecture Notes in Computer Science 5163

    Commenced Publication in 1973

    Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

    Editorial Board

    David Hutchison, Lancaster University, UK

    Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA

    Josef Kittler, University of Surrey, Guildford, UK

    Jon M. Kleinberg, Cornell University, Ithaca, NY, USA

    Alfred Kobsa, University of California, Irvine, CA, USA

    Friedemann Mattern, ETH Zurich, Switzerland

    John C. Mitchell, Stanford University, CA, USA

    Moni Naor, Weizmann Institute of Science, Rehovot, Israel

    Oscar Nierstrasz, University of Bern, Switzerland

    C. Pandu Rangan, Indian Institute of Technology, Madras, India

    Bernhard Steffen, University of Dortmund, Germany

    Madhu Sudan, Massachusetts Institute of Technology, MA, USA

    Demetri Terzopoulos, University of California, Los Angeles, CA, USA

    Doug Tygar, University of California, Berkeley, CA, USA

    Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

  • Vera Kurkova

    Roman Neruda

    Jan Koutnik (Eds.)

    Artificial Neural Networks – ICANN 2008

    18th International Conference

    Prague, Czech Republic, September 3-6, 2008

    Proceedings, Part I


  • Volume Editors

    Vera Kurkova
    Roman Neruda
    Institute of Computer Science
    Academy of Sciences of the Czech Republic
    Pod Vodarenskou vezi 2, 182 07 Prague 8, Czech Republic
    E-mail: {vera, roman}@cs.cas.cz

    Jan Koutnik
    Department of Computer Science
    Czech Technical University in Prague
    Karlovo nam. 13, 121 35 Prague 2, Czech Republic
    E-mail: [email protected]

    Library of Congress Control Number: 2008934470

    CR Subject Classification (1998): F.1, I.2, I.5, I.4, G.3, J.3, C.2.1, C.1.3

    LNCS Sublibrary: SL 1 Theoretical Computer Science and General Issues

    ISSN 0302-9743

    ISBN-10 3-540-87535-2 Springer Berlin Heidelberg New York

    ISBN-13 978-3-540-87535-2 Springer Berlin Heidelberg New York

    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

    Springer is a part of Springer Science+Business Media

    springer.com

    © Springer-Verlag Berlin Heidelberg 2008
    Printed in Germany

    Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
    Printed on acid-free paper. SPIN: 12520565 06/3180 5 4 3 2 1 0

  • Preface

    This volume is the first part of the two-volume proceedings of the 18th International Conference on Artificial Neural Networks (ICANN 2008), held September 3–6, 2008 in Prague, Czech Republic. The ICANN conferences are annual meetings supervised by the European Neural Network Society, in cooperation with the International Neural Network Society and the Japanese Neural Network Society. This series of conferences has been held since 1991 in various European countries and covers the field of neurocomputing and related areas. In 2008, the ICANN conference was organized by the Institute of Computer Science, Academy of Sciences of the Czech Republic, together with the Department of Computer Science and Engineering from the Faculty of Electrical Engineering of the Czech Technical University in Prague. Over 300 papers were submitted to the regular sessions, two special sessions and two workshops. The Program Committee selected about 200 papers after a thorough peer-review process; they are published in the two volumes of these proceedings. The large number, variety of topics and high quality of submitted papers reflect the vitality of the field of artificial neural networks.

    The first volume contains papers on the mathematical theory of neurocomputing, learning algorithms, kernel methods, statistical learning and ensemble techniques, support vector machines, reinforcement learning, evolutionary computing, hybrid systems, self-organization, control and robotics, signal and time series processing, and image processing.

    The second volume is devoted to pattern recognition and data analysis, hardware and embedded systems, computational neuroscience, connectionistic cognitive science, neuroinformatics and neural dynamics. It also contains papers from two special sessions, Coupling, Synchronies, and Firing Patterns: From Cognition to Disease, and Constructive Neural Networks, and two workshops, New Trends in Self-Organization and Optimization of Artificial Neural Networks, and Adaptive Mechanisms of the Perception-Action Cycle.

    It is our pleasure to express our gratitude to everyone who contributed in any way to the success of the event and the completion of these proceedings. In particular, we thank the members of the Board of the ENNS, who uphold the tradition of the series and helped with the organization. With deep gratitude we thank all the members of the Program Committee and the reviewers for their great effort in the reviewing process. We are very grateful to the members of the Organizing Committee, whose hard work made the vision of the 18th ICANN a reality. Zdenek Buk, Eva Pospisilova and the entire Computational Intelligence Group at the Czech Technical University in Prague deserve special thanks for preparing the conference proceedings. We thank Miroslav Cepek for the conference website administration. We thank Milena Zeithamlova and the Action M Agency for perfect local arrangements. We also thank Alfred Hofmann, Ursula Barth, Anna Kramer and Peter Strasser from Springer for their help with this demanding publication project. Last but not least, we thank all authors who contributed to this volume for sharing their new ideas and results with the community of researchers in this rapidly developing field of biologically motivated computer science. We hope that you enjoy reading and find inspiration for your future work in the papers contained in these two volumes.

    June 2008

    Vera Kurkova
    Roman Neruda
    Jan Koutnik

  • Organization

    Conference Chairs

    General Chair: Vera Kurkova, Academy of Sciences of the Czech Republic, Czech Republic

    Co-Chairs: Roman Neruda, Academy of Sciences of the Czech Republic, Czech Republic

    Jan Koutnik, Czech Technical University in Prague, Czech Republic

    Milena Zeithamlova, Action M Agency, Czech Republic

    Honorary Chair: John Taylor, King's College London, UK

    Program Committee

    Wlodzislaw Duch, Nicolaus Copernicus University in Torun, Poland
    Luis Alexandre, University of Beira Interior, Portugal
    Bruno Apolloni, Universita Degli Studi di Milano, Italy
    Timo Honkela, Helsinki University of Technology, Finland
    Stefanos Kollias, National Technical University in Athens, Greece
    Thomas Martinetz, University of Lubeck, Germany
    Guenter Palm, University of Ulm, Germany
    Alessandro Sperduti, Universita Degli Studi di Padova, Italy
    Michel Verleysen, Universite catholique de Louvain, Belgium
    Alessandro E.P. Villa, Universite Joseph Fourier, Grenoble, France
    Stefan Wermter, University of Sunderland, UK
    Rudolf Albrecht, University of Innsbruck, Austria
    Peter Andras, Newcastle University, UK
    Gabriela Andrejkova, P.J. Safarik University in Kosice, Slovakia
    Bartlomiej Beliczynski, Warsaw University of Technology, Poland
    Monica Bianchini, Universita degli Studi di Siena, Italy
    Andrej Dobnikar, University of Ljubljana, Slovenia
    Jose R. Dorronsoro, Universidad Autonoma de Madrid, Spain
    Peter Erdi, Hungarian Academy of Sciences, Hungary
    Marco Gori, Universita degli Studi di Siena, Italy
    Barbara Hammer, University of Osnabruck, Germany
    Tom Heskes, Radboud University Nijmegen, The Netherlands
    Yoshifusa Ito, Aichi-Gakuin University, Japan
    Janusz Kacprzyk, Polish Academy of Sciences, Poland
    Paul C. Kainen, Georgetown University, USA
    Mikko Kolehmainen, University of Kuopio, Finland
    Pavel Kordik, Czech Technical University in Prague, Czech Republic
    Vladimir Kvasnicka, Slovak University of Technology in Bratislava, Slovakia
    Danilo P. Mandic, Imperial College, UK
    Erkki Oja, Helsinki University of Technology, Finland
    David Pearson, Universite Jean Monnet, Saint-Etienne, France
    Lionel Prevost, Universite Pierre et Marie Curie, Paris, France
    Bernadete Ribeiro, University of Coimbra, Portugal
    Leszek Rutkowski, Czestochowa University of Technology, Poland
    Marcello Sanguineti, University of Genova, Italy
    Katerina Schindler, Austrian Academy of Sciences, Austria
    Juergen Schmidhuber, TU Munich (Germany) and IDSIA (Switzerland)
    Jiri Sima, Academy of Sciences of the Czech Republic, Czech Republic
    Peter Sincak, Technical University in Kosice, Slovakia
    Miroslav Skrbek, Czech Technical University in Prague, Czech Republic
    Johan Suykens, Katholieke Universiteit Leuven, Belgium
    Miroslav Snorek, Czech Technical University in Prague, Czech Republic
    Ryszard Tadeusiewicz, AGH University of Science and Technology, Poland

    Local Organizing Committee

    Zdenek Buk, Czech Technical University in Prague
    Miroslav Cepek, Czech Technical University in Prague
    Jan Drchal, Czech Technical University in Prague
    Paul C. Kainen, Georgetown University
    Oleg Kovarik, Czech Technical University in Prague
    Rudolf Marek, Czech Technical University in Prague
    Ales Pilny, Czech Technical University in Prague
    Eva Pospisilova, Academy of Sciences of the Czech Republic
    Tomas Siegl, Czech Technical University in Prague


    Referees

    S. Abe, R. Adamczak, R. Albrecht, E. Alhoniemi, R. Andonie, G. Angelini, D. Anguita, C. Angulo-Bahon, C. Archambeau, M. Atencia, P. Aubrecht, Y. Avrithis, L. Benuskova, T. Beran, Z. Buk, G. Cawley, M. Cepek, E. Corchado, V. Cutsuridis, E. Dominguez, G. Dounias, J. Drchal, D.A. Elizondo, H. Erwin, Z. Fabian, A. Flanagan, L. Franco, D. Francois, C. Fyfe, N. Garcia-Pedrajas, G. Gnecco, B. Gosselin, J. Grim, R. Haschke, M. Holena, J. Hollmen, T. David Huang, D. Husek, A. Hussain, M. Chetouani, C. Igel, G. Indiveri, S. Ishii, H. Izumi, J.M. Jerez, M. Jirina, M. Jirina, jr., K.T. Kalveram, K. Karpouzis, S. Kasderidis, M. Koskela, J. Kubalik, M. Kulich, F.J. Kurfess, M. Kurzynski, J. Laaksonen, E. Lang, K. Leiviska, L. Lhotska, A. Likas, C. Loizou, R. Marek, E. Marchiori, M.A. Martin-Merino, V. di Massa, F. Masulli, J. Mandziuk, S. Melacci, A. Micheli, F. Moutarde, R. Cristian Muresan, M. Nakayama, M. Navara, D. Novak, M. Olteanu, D. Ortiz Boyer, H. Paugam-Moisy, K. Pelckmans, G. Peters, P. Posik, D. Polani, M. Porrmann, A. Pucci, A. Raouzaiou, K. Rapantzikos, M. Rocha, A. Romariz, F. Rossi, L. Sarti, B. Schrauwen, F. Schwenker, O. Simula, A. Skodras, S. Slusny, A. Stafylopatis, J. Stastny, D. Stefka, G. Stoilos, A. Suarez, E. Trentin, N. Tsapatsoulis, P. Vidnerova, T. Villmann, Z. Vomlel, T. Wennekers, P. Wira, B. Wyns, Z. Yang, F. Zelezny

  • Table of Contents Part I

    Mathematical Theory of Neurocomputing

    Dimension Reduction for Mixtures of Exponential Families ... 1
    Shotaro Akaho

    Several Enhancements to Hermite-Based Approximation of One-Variable Functions ... 11
    Bartlomiej Beliczynski and Bernardete Ribeiro

    Multi-category Bayesian Decision by Neural Networks ... 21
    Yoshifusa Ito, Cidambi Srinivasan, and Hiroyuki Izumi

    Estimates of Network Complexity and Integral Representations ... 31
    Paul C. Kainen and Vera Kurkova

    Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios ... 41
    Sascha Klement, Amir Madany Mamlouk, and Thomas Martinetz

    Generalization of Concave and Convex Decomposition in Kikuchi Free Energy ... 51
    Yu Nishiyama and Sumio Watanabe

    Analysis of Chaotic Dynamics Using Measures of the Complex Network Theory ... 61
    Yutaka Shimada, Takayuki Kimura, and Tohru Ikeguchi

    Global Dynamics of Finite Cellular Automata ... 71
    Martin Schule, Thomas Ott, and Ruedi Stoop

    Learning Algorithms

    Semi-supervised Learning of Tree-Structured RBF Networks Using Co-training ... 79
    Mohamed F. Abdel Hady, Friedhelm Schwenker, and Gunther Palm

    A New Type of ART2 Architecture and Application to Color Image Segmentation ... 89
    Jiaoyan Ai, Brian Funt, and Lilong Shi

    BICA: A Boolean Independent Component Analysis Approach ... 99
    Bruno Apolloni, Simone Bassis, and Andrea Brega

    Improving the Learning Speed in 2-Layered LSTM Network by Estimating the Configuration of Hidden Units and Optimizing Weights Initialization ... 109
    Debora C. Correa, Alexandre L.M. Levada, and Jose H. Saito

    Manifold Construction Using the Multilayer Perceptron ... 119
    Wei-Chen Cheng and Cheng-Yuan Liou

    Improving Performance of a Binary Classifier by Training Set Selection ... 128
    Cezary Dendek and Jacek Mandziuk

    An Overcomplete ICA Algorithm by InfoMax and InfoMin ... 136
    Yoshitatsu Matsuda and Kazunori Yamaguchi

    OP-ELM: Theory, Experiments and a Toolbox ... 145
    Yoan Miche, Antti Sorjamaa, and Amaury Lendasse

    Robust Nonparametric Probability Density Estimation by Soft Clustering ... 155
    Ezequiel Lopez-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, Domingo Lopez-Rodriguez, and Maria del Carmen Vargas-Gonzalez

    Natural Conjugate Gradient on Complex Flag Manifolds for Complex Independent Subspace Analysis ... 165
    Yasunori Nishimori, Shotaro Akaho, and Mark D. Plumbley

    Quadratically Constrained Quadratic Programming for Subspace Selection in Kernel Regression Estimation ... 175
    Marco Signoretto, Kristiaan Pelckmans, and Johan A.K. Suykens

    The Influence of the Risk Functional in Data Classification with MLPs ... 185
    Luis M. Silva, Mark Embrechts, Jorge M. Santos, and Joaquim Marques de Sa

    Nonnegative Least Squares Learning for the Random Neural Network ... 195
    Stelios Timotheou

    Kernel Methods, Statistical Learning, and Ensemble Techniques

    Sparse Bayes Machines for Binary Classification ... 205
    Daniel Hernandez-Lobato

    Tikhonov Regularization Parameter in Reproducing Kernel Hilbert Spaces with Respect to the Sensitivity of the Solution ... 215
    Katerina Hlavackova-Schindler

    Mixture of Expert Used to Learn Game Play ... 225
    Peter Lacko and Vladimir Kvasnicka

    Unsupervised Bayesian Network Learning for Object Recognition in Image Sequences ... 235
    Daniel Oberhoff and Marina Kolesnik

    Using Feature Distribution Methods in Ensemble Systems Combined by Fusion and Selection-Based Methods ... 245
    Laura E.A. Santana, Anne M.P. Canuto, and Joao C. Xavier Jr.

    Bayesian Ying-Yang Learning on Orthogonal Binary Factor Analysis ... 255
    Ke Sun and Lei Xu

    A Comparative Study on Data Smoothing Regularization for Local Factor Analysis ... 265
    Shikui Tu, Lei Shi, and Lei Xu

    Adding Diversity in Ensembles of Neural Networks by Reordering the Training Set ... 275
    Joaquin Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo

    New Results on Combination Methods for Boosting Ensembles ... 285
    Joaquin Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo

    Support Vector Machines

    Batch Support Vector Training Based on Exact Incremental Training ... 295
    Shigeo Abe

    A Kernel Method for the Optimization of the Margin Distribution ... 305
    Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti

    A 4-Vector MDM Algorithm for Support Vector Training ... 315
    Alvaro Barbero, Jorge Lopez, and Jose R. Dorronsoro

    Implementation Issues of an Incremental and Decremental SVM ... 325
    Honorius Galmeanu and Razvan Andonie

    Online Clustering of Non-stationary Data Using Incremental and Decremental SVM ... 336
    Khaled Boukharouba and Stephane Lecoeuche

    Support Vector Machines for Visualization and Dimensionality Reduction ... 346
    Tomasz Maszczyk and Wlodzislaw Duch


    Reinforcement Learning

    Multigrid Reinforcement Learning with Reward Shaping ... 357
    Marek Grzes and Daniel Kudenko

    Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments ... 367
    Yu Hiei, Takeshi Mori, and Shin Ishii

    Robust Population Coding in Free-Energy-Based Reinforcement Learning ... 377
    Makoto Otsuka, Junichiro Yoshimoto, and Kenji Doya

    Policy Gradients with Parameter-Based Exploration for Control ... 387
    Frank Sehnke, Christian Osendorfer, Thomas Ruckstiess, Alex Graves, Jan Peters, and Jurgen Schmidhuber

    A Continuous Internal-State Controller for Partially Observable Markov Decision Processes ... 397
    Yuki Taniguchi, Takeshi Mori, and Shin Ishii

    Episodic Reinforcement Learning by Logistic Reward-Weighted Regression ... 407
    Daan Wierstra, Tom Schaul, Jan Peters, and Juergen Schmidhuber

    Error-Entropy Minimization for Dynamical Systems Modeling ... 417
    Jernej Zupanc

    Evolutionary Computing

    Hybrid Evolution of Heterogeneous Neural Networks ... 426
    Zdenek Buk and Miroslav Snorek

    Ant Colony Optimization with Castes ... 435
    Oleg Kovarik and Miroslav Skrbek

    Neural Network Ensembles for Classification Problems Using Multiobjective Genetic Algorithms ... 443
    David Lahoz and Pedro Mateo

    Analysis of Vestibular-Ocular Reflex by Evolutionary Framework ... 452
    Daniel Novak, Ales Pilny, Pavel Kordik, Stefan Holiga, Petr Posik, R. Cerny, and Richard Brzezny

    Fetal Weight Prediction Models: Standard Techniques or Computational Intelligence Methods? ... 462
    Tomas Siegl, Pavel Kordik, Miroslav Snorek, and Pavel Calda

    Evolutionary Canonical Particle Swarm Optimizer – A Proposal of Meta-optimization in Model Selection ... 472
    Hong Zhang and Masumi Ishikawa

    Hybrid Systems

    Building Localized Basis Function Networks Using Context Dependent Clustering ... 482
    Marcin Blachnik and Wlodzislaw Duch

    Adaptation of Connectionist Weighted Fuzzy Logic Programs with Kripke-Kleene Semantics ... 492
    Alexandros Chortaras, Giorgos Stamou, Andreas Stafylopatis, and Stefanos Kollias

    Neuro-fuzzy System for Road Signs Recognition ... 503
    Boguslaw Cyganek

    Neuro-inspired Speech Recognition with Recurrent Spiking Neurons ... 513
    Arfan Ghani, T. Martin McGinnity, Liam P. Maguire, and Jim Harkin

    Predicting the Performance of Learning Algorithms Using Support Vector Machines as Meta-regressors ... 523
    Silvio B. Guerra, Ricardo B.C. Prudencio, and Teresa B. Ludermir

    Municipal Creditworthiness Modelling by Kohonen's Self-organizing Feature Maps and Fuzzy Logic Neural Networks ... 533
    Petr Hajek and Vladimir Olej

    Implementing Boolean Matrix Factorization ... 543
    Roman Neruda, Vaclav Snasel, Jan Platos, Pavel Kromer, Dusan Husek, and Alexander A. Frolov

    Application of Potts-Model Perceptron for Binary Patterns Identification ... 553
    Vladimir Kryzhanovsky, Boris Kryzhanovsky, and Anatoly Fonarev

    Using ARTMAP-Based Ensemble Systems Designed by Three Variants of Boosting ... 562
    Araken de Medeiros Santos and Anne Magaly de Paula Canuto

    Self-organization

    Matrix Learning for Topographic Neural Maps ... 572
    Banchar Arnonkijpanich, Barbara Hammer, Alexander Hasenfuss, and Chidchanok Lursinsap

    Clustering Quality and Topology Preservation in Fast Learning SOMs ... 583
    Antonino Fiannaca, Giuseppe Di Fatta, Salvatore Gaglio, Riccardo Rizzo, and Alfonso Urso

    Enhancing Topology Preservation during Neural Field Development Via Wiring Length Minimization ... 593
    Claudius Glaser, Frank Joublin, and Christian Goerick

    Adaptive Translation: Finding Interlingual Mappings Using Self-organizing Maps ... 603
    Timo Honkela, Sami Virpioja, and Jaakko Vayrynen

    Self-Organizing Neural Grove: Efficient Multiple Classifier System with Pruned Self-Generating Neural Trees ... 613
    Hirotaka Inoue

    Self-organized Complex Neural Networks through Nonlinear Temporally Asymmetric Hebbian Plasticity ... 623
    Hideyuki Kato and Tohru Ikeguchi

    Temporal Hebbian Self-Organizing Map for Sequences ... 632
    Jan Koutnik and Miroslav Snorek

    FLSOM with Different Rates for Classification in Imbalanced Datasets ... 642
    Ivan Machon-Gonzalez and Hilario Lopez-Garcia

    A Self-organizing Neural System for Background and Foreground Modeling ... 652
    Lucia Maddalena and Alfredo Petrosino

    Analyzing the Behavior of the SOM through Wavelet Decomposition of Time Series Generated during Its Execution ... 662
    Victor Mireles and Antonio Neme

    Decreasing Neighborhood Revisited in Self-Organizing Maps ... 671
    Antonio Neme, Elizabeth Chavez, Alejandra Cervera, and Victor Mireles

    A New GHSOM Model Applied to Network Security ... 680
    Esteban J. Palomo, Enrique Dominguez, Rafael Marcos Luque, and Jose Munoz

    Reduction of Visual Information in Neural Network Learning Visualization ... 690
    Matus Uzak, Rudolf Jaksa, and Peter Sincak


    Control and Robotics

    Heuristics-Based High-Level Strategy for Multi-agent Systems ... 700
    Peter Gasztonyi and Istvan Harmati

    Echo State Networks for Online Prediction of Movement Data – Comparing Investigations ... 710
    Sven Hellbach, Soren Strauss, Julian P. Eggert, Edgar Korner, and Horst-Michael Gross

    Comparison of RBF Network Learning and Reinforcement Learning on the Maze Exploration Problem ... 720
    Stanislav Slusny, Roman Neruda, and Petra Vidnerova

    Modular Neural Networks for Model-Free Behavioral Learning ... 730
    Johane Takeuchi, Osamu Shouno, and Hiroshi Tsujino

    From Exploration to Planning ... 740
    Cornelius Weber and Jochen Triesch

    Signal and Time Series Processing

    Sentence-Level Evaluation Using Co-occurrences of N-Grams ... 750
    Theologos Athanaselis, Stelios Bakamidis, Konstantinos Mamouras, and Ioannis Dologlou

    Identifying Single Source Data for Mixing Matrix Estimation in Instantaneous Blind Source Separation ... 759
    Pau Bofill

    ECG Signal Classification Using GAME Neural Network and Its Comparison to Other Classifiers ... 768
    Miroslav Cepek, Miroslav Snorek, and Vaclav Chudacek

    Predictive Modeling with Echo State Networks ... 778
    Michal Cernansky and Peter Tino

    Sparse Coding Neural Gas for the Separation of Noisy Overcomplete Sources ... 788
    Kai Labusch, Erhardt Barth, and Thomas Martinetz

    Mutual Information Based Input Variable Selection Algorithm and Wavelet Neural Network for Time Series Prediction ... 798
    Rashidi Khazaee Parviz, Mozayani Nasser, and M.R. Jahed Motlagh

    Stable Output Feedback in Reservoir Computing Using Ridge Regression ... 808
    Francis Wyffels, Benjamin Schrauwen, and Dirk Stroobandt


    Image Processing

    Spatio-temporal Summarizing Method of Periodic Image Sequences with Kohonen Maps ... 818
    Mohamed Berkane, Patrick Clarysse, and Isabelle E. Magnin

    Image Classification by Histogram Features Created with Learning Vector Quantization ... 827
    Marcin Blachnik and Jorma Laaksonen

    A Statistical Model for Histogram Refinement ... 837
    Nizar Bouguila and Walid ElGuebaly

    Efficient Video Shot Summarization Using an Enhanced Spectral Clustering Approach ... 847
    Vasileios Chasanis, Aristidis Likas, and Nikolaos Galatsanos

    Surface Reconstruction Techniques Using Neural Networks to Recover Noisy 3D Scenes ... 857
    David Elizondo, Shang-Ming Zhou, and Charalambos Chrysostomou

    A Spatio-temporal Extension of the SUSAN-Filter ... 867
    Benedikt Kaiser and Gunther Heidemann

    A Neighborhood-Based Competitive Network for Video Segmentation and Object Detection ... 877
    Rafael Marcos Luque Baena, Enrique Dominguez, Domingo Lopez-Rodriguez, and Esteban J. Palomo

    A Hierarchic Method for Footprint Segmentation Based on SOM ... 887
    Marco Mora Cofre, Ruben Valenzuela, and Girma Berhe

    Co-occurrence Matrixes for the Quality Assessment of Coded Images ... 897
    Judith Redi, Paolo Gastaldo, Rodolfo Zunino, and Ingrid Heynderickx

    Semantic Adaptation of Neural Network Classifiers in Image Segmentation ... 907
    Nikolaos Simou, Thanos Athanasiadis, Stefanos Kollias, Giorgos Stamou, and Andreas Stafylopatis

    Partially Monotone Networks Applied to Breast Cancer Detection on Mammograms ... 917
    Marina Velikova, Hennie Daniels, and Maurice Samulski

    Image Processing – Recognition Systems

    A Neuro-fuzzy Approach to User Attention Recognition ... 927
    Stylianos Asteriadis, Kostas Karpouzis, and Stefanos Kollias

    TriangleVision: A Toy Visual System ... 937
    Thomas Bangert

    Face Recognition with VG-RAM Weightless Neural Networks ... 951
    Alberto F. De Souza, Claudine Badue, Felipe Pedroni, Elias Oliveira, Stiven Schwanz Dias, Hallysson Oliveira, and Soterio Ferreira de Souza

    Invariant Object Recognition with Slow Feature Analysis ... 961
    Mathias Franzius, Niko Wilbert, and Laurenz Wiskott

    Analysis-by-Synthesis by Learning to Invert Generative Black Boxes ... 971
    Vinod Nair, Josh Susskind, and Geoffrey E. Hinton

    A Bio-inspired Connectionist Architecture for Visual Classification of Moving Objects ... 982
    Pedro L. Sanchez Orellana and Claudio Castellanos Sanchez

    A Visual Object Recognition System Invariant to Scale and Rotation ... 991
    Yasuomi D. Sato, Jenia Jitsev, and Christoph von der Malsburg

    Recognizing Facial Expressions: A Comparison of Computational Approaches ... 1001
    Aruna Shenoy, Tim M. Gale, Neil Davey, Bruce Christiansen, and Ray Frank

    A Probabilistic Prediction Method for Object Contour Tracking ... 1011
    Daniel Weiler, Volker Willert, and Julian Eggert

    Author Index ... 1021

  • Table of Contents Part II

    Pattern Recognition and Data Analysis

    Investigating Similarity of Ontology Instances and Its Causes ... 1
    Anton Andrejko and Maria Bielikova

    A Neural Model for Delay Correction in a Distributed Control System ... 11
    Ana Antunes, Fernando Morgado Dias, and Alexandre Mota

    A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets ... 21
    Gianluca Bontempi and Patrick E. Meyer

    Non-stationary Data Mining: The Network Security Issue ... 32
    Sergio Decherchi, Paolo Gastaldo, Judith Redi, and Rodolfo Zunino

    Efficient Feature Selection for PTR-MS Fingerprinting of Agroindustrial Products ... 42
    Pablo M. Granitto, Franco Biasioli, Cesare Furlanello, and Flavia Gasperi

    Extraction of Binary Features by Probabilistic Neural Networks ... 52
    Jiri Grim

    Correlation Integral Decomposition for Classification ... 62
    Marcel Jirina and Marcel Jirina Jr.

    Modified q-State Potts Model with Binarized Synaptic Coefficients ... 72
    Vladimir Kryzhanovsky

    Learning Similarity Measures from Pairwise Constraints with Neural Networks ... 81
    Marco Maggini, Stefano Melacci, and Lorenzo Sarti

    Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines ... 91
    Yi Sun, Mark Robinson, Rod Adams, Alistair Rust, and Neil Davey

    Mimicking Go Experts with Convolutional Neural Networks ... 101
    Ilya Sutskever and Vinod Nair

    Associative Memories Applied to Pattern Recognition ... 111
    Roberto A. Vazquez and Humberto Sossa

    MLP-Based Detection of Targets in Clutter: Robustness with Respect to the Shape Parameter of Weibull-Distributed Clutter ... 121
    Raul Vicen-Bueno, Eduardo Galan-Fernandez, Manuel Rosa-Zurera, and Maria P. Jarabo-Amores

    Hardware, Embedded Systems

    Modeling and Synthesis of Computational Efficient Adaptive Neuro-Fuzzy Systems Based on Matlab ... 131
    Guillermo Bosque, Javier Echanobe, Ines del Campo, and Jose M. Tarela

    Embedded Neural Network for Swarm Learning of Physical Robots ... 141
    Pitoyo Hartono and Sachiko Kakita

    Distribution Stream of Tasks in Dual-Processor System ... 150
    Michael Kryzhanovsky and Magomed Malsagov

    Efficient Implementation of the THSOM Neural Network ... 159
    Rudolf Marek and Miroslav Skrbek

    Reconfigurable MAC-Based Architecture for Parallel Hardware Implementation on FPGAs of Artificial Neural Networks ... 169
    Nadia Nedjah, Rodrigo Martins da Silva, Luiza de Macedo Mourelle, and Marcus Vinicius Carvalho da Silva

    Implementation of Central Pattern Generator in an FPGA-Based Embedded System ... 179
    Cesar Torres-Huitzil and Bernard Girau

    Biologically-Inspired Digital Architecture for a Cortical Model of Orientation Selectivity ... 188
    Cesar Torres-Huitzil, Bernard Girau, and Miguel Arias-Estrada

    Neural Network Training with Extended Kalman Filter Using Graphics Processing Unit ... 198
    Peter Trebaticky and Jiri Pospichal

    Blind Source-Separation in Mixed-Signal VLSI Using the InfoMax Algorithm ... 208
    Waldo Valenzuela, Gonzalo Carvajal, and Miguel Figueroa

    Computational Neuroscience

    Synaptic Rewiring for Topographic Map Formation ... 218
    Simeon A. Bamford, Alan F. Murray, and David J. Willshaw

    Implementing Bayes Rule with Neural Fields ... 228
    Raymond H. Cuijpers and Wolfram Erlhagen

    Encoding and Retrieval in a CA1 Microcircuit Model of the Hippocampus ... 238
    Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

    A Bio-inspired Architecture of an Active Visual Search Model ... 248
    Vassilis Cutsuridis

    Implementing Fuzzy Reasoning on a Spiking Neural Network ... 258
    Cornelius Glackin, Liam McDaid, Liam Maguire, and Heather Sayers

    Short Term Plasticity Provides Temporal Filtering at Chemical Synapses ... 268
    Bruce P. Graham and Christian Stricker

    Observational Versus Trial and Error Effects in a Model of an Infant Learning Paradigm ... 277
    Matthew Hartley, Jacqueline Fagard, Rana Esseily, and John Taylor

    Modeling the Effects of Dopamine on the Antisaccade Reaction Times (aSRT) of Schizophrenia Patients ... 290
    Ioannis Kahramanoglou, Stavros Perantonis, Nikolaos Smyrnis, Ioannis Evdokimidis, and Vassilis Cutsuridis

    Fast Multi-command SSVEP Brain Machine Interface without Training ... 300
    Pablo Martinez Vasquez, Hovagim Bakardjian, Montserrat Vallverdu, and Andrzej Cichocki

    Separating Global Motion Components in Transparent Visual Stimuli – A Phenomenological Analysis ... 308
    Andrew Meso and Johannes M. Zanker

    Lateral Excitation between Dissimilar Orientation Columns for Ongoing Subthreshold Membrane Oscillations in Primary Visual Cortex ... 318
    Yuto Nakamura, Kazuhiro Tsuboi, and Osamu Hoshino

    A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behaviour ... 328
    N. Serap Sengor, Ozkan Karabacak, and Ulrich Steinmetz

    Firing Pattern Estimation of Synaptically Coupled Hindmarsh-Rose Neurons by Adaptive Observer ... 338
    Yusuke Totoki, Kouichi Mitsunaga, Haruo Suemitsu, and Takami Matsuo

    Global Oscillations of Neural Fields in CA3 ... 348
    Francesco Ventriglia

    Connectionistic Cognitive Science

    Selective Attention Model of Moving Objects ... 358
    Roman Borisyuk, David Chik, and Yakov Kazanovich

    Tempotron-Like Learning with ReSuMe ... 368
    Razvan V. Florian

    Neural Network Capable of Amodal Completion ... 376
    Kunihiko Fukushima

    Predictive Coding in Cortical Microcircuits ... 386
    Andreea Lazar, Gordon Pipa, and Jochen Triesch

    A Biologically Inspired Spiking Neural Network for Sound Localisation by the Inferior Colliculus ... 396
    Jindong Liu, Harry Erwin, Stefan Wermter, and Mahmoud Elsaid

    Learning Structurally Analogous Tasks ... 406
    Paul W. Munro

    Auto-structure of Presynaptic Activity Defines Postsynaptic Firing Statistics and Can Modulate STDP-Based Structure Formation and Learning ... 413
    Gordon Pipa, Raul Vicente, and Alexander Tikhonov

    Decision Making Logic of Visual Brain ... 423
    Andrzej W. Przybyszewski

    A Computational Model of Saliency Map Read-Out During Visual Search ... 433
    Mia Setic and Drazen Domijan

    A Corpus-Based Computational Model of Metaphor Understanding Incorporating Dynamic Interaction ... 443
    Asuka Terai and Masanori Nakagawa

    Deterministic Coincidence Detection and Adaptation Via Delayed Inputs ... 453
    Zhijun Yang, Alan Murray, and Juan Huo

    Synaptic Formation Rate as a Control Parameter in a Model for the Ontogenesis of Retinotopy ... 462
    Junmei Zhu

    Neuroinformatics

    Fuzzy Symbolic Dynamics for Neurodynamical Systems ... 471
    Krzysztof Dobosz and Wlodzislaw Duch

    Towards Personalized Neural Networks for Epileptic Seizure Prediction ... 479
    Antonio Dourado, Ricardo Martins, Joao Duarte, and Bruno Direito

    Real and Modeled Spike Trains: Where Do They Meet? ... 488
    Vasile V. Moca, Danko Nikolic, and Raul C. Muresan

    The InfoPhase Method or How to Read Neurons with Neurons ... 498
    Raul C. Muresan, Wolf Singer, and Danko Nikolic

    Artifact Processor for Neuronal Activity Analysis during Deep Brain Stimulation ... 508
    Dimitri V. Nowicki, Brigitte Piallat, Alim-Louis Benabid, and Tatiana I. Aksenova

    Analysis of Human Brain NMR Spectra in Vivo Using Artificial Neural Networks ... 517
    Erik Saudek, Daniel Novak, Dita Wagnerova, and Milan Hajek

    Multi-stage FCM-Based Intensity Inhomogeneity Correction for MR Brain Image Segmentation ... 527
    Laszlo Szilagyi, Sandor M. Szilagyi, Laszlo David, and Zoltan Benyo

    KCMAC: A Novel Fuzzy Cerebellar Model for Medical Decision Support ... 537
    S.D. Teddy

    Decoding Population Neuronal Responses by Topological Clustering ... 547
    Hujun Yin, Stefano Panzeri, Zareen Mehboob, and Mathew Diamond

    Neural Dynamics

    Learning of Neural Information Routing for Correspondence Finding ... 557
    Jan D. Bouecke and Jorg Lucke

    A Globally Asymptotically Stable Plasticity Rule for Firing Rate Homeostasis ... 567
    Prashant Joshi and Jochen Triesch

    Analysis and Visualization of the Dynamics of Recurrent Neural Networks for Symbolic Sequences Processing ... 577
    Matej Makula and Lubica Benuskova

    Chaotic Search for Traveling Salesman Problems by Using 2-opt and Or-opt Algorithms ... 587
    Takafumi Matsuura and Tohru Ikeguchi

    Comparison of Neural Networks Incorporating Partial Monotonicity by Structure ... 597
    Alexey Minin and Bernhard Lang

    Special Session: Coupling, Synchronies and Firing Patterns: From Cognition to Disease

    Effect of the Background Activity on the Reconstruction of Spike Train by Spike Pattern Detection ... 607
    Yoshiyuki Asai and Alessandro E.P. Villa

    Assemblies as Phase-Locked Pattern Sets That Collectively Win the Competition for Coherence ... 617
    Thomas Burwick

    A Ca2+ Dynamics Model of the STDP Symmetry-to-Asymmetry Transition in the CA1 Pyramidal Cell of the Hippocampus ... 627
    Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

    Improving Associative Memory in a Network of Spiking Neurons ... 636
    Russell Hunter, Stuart Cobb, and Bruce P. Graham

    Effect of Feedback Strength in Coupled Spiking Neural Networks ... 646
    Javier Iglesias, Jordi Garcia-Ojalvo, and Alessandro E.P. Villa

    Bifurcations in Discrete-Time Delayed Hopfield Neural Networks of Two Neurons ... 655
    Eva Kaslik and Stefan Balint

    EEG Switching: Three Views from Dynamical Systems ... 665
    Carlos Lourenco

    Modeling Synchronization Loss in Large-Scale Brain Dynamics ... 675
    Antonio J. Pons Rivero, Jose Luis Cantero, Mercedes Atienza, and Jordi Garcia-Ojalvo

    Spatio-temporal Dynamics during Perceptual Processing in an Oscillatory Neural Network ... 685
    A. Ravishankar Rao and Guillermo Cecchi

    Resonant Spike Propagation in Coupled Neurons with Subthreshold Activity ... 695
    Belen Sancristobal, Jose M. Sancho, and Jordi Garcia-Ojalvo

    Contour Integration and Synchronization in Neuronal Networks of the Visual Cortex ... 703
    Ekkehard Ullner, Raul Vicente, Gordon Pipa, and Jordi Garcia-Ojalvo

    Special Session: Constructive Neural Networks

    Fuzzy Growing Hierarchical Self-Organizing Networks ... 713
    Miguel Barreto-Sanz, Andres Perez-Uribe, Carlos-Andres Pena-Reyes, and Marco Tomassini

    MBabCoNN – A Multiclass Version of a Constructive Neural Network Algorithm Based on Linear Separability and Convex Hull ... 723
    Joao Roberto Bertini Jr. and Maria do Carmo Nicoletti

    On the Generalization of the m-Class RDP Neural Network ... 734
    David A. Elizondo, Juan M. Ortiz-de-Lazcano-Lobato, and Ralph Birkenhead

    A Constructive Technique Based on Linear Programming for Training Switching Neural Networks ... 744
    Enrico Ferrari and Marco Muselli

    Projection Pursuit Constructive Neural Networks Based on Quality of Projected Clusters ... 754
    Marek Grochowski and Wlodzislaw Duch

    Introduction to Constructive and Optimization Aspects of SONN-3 ... 763
    Adrian Horzyk

    A Reward-Value Based Constructive Method for the Autonomous Creation of Machine Controllers ... 773
    Andreas Huemer, David Elizondo, and Mario Gongora

    A Brief Review and Comparison of Feedforward Morphological Neural Networks with Applications to Classification ... 783
    Alexandre Monteiro da Silva and Peter Sussner

    Prototype Proliferation in the Growing Neural Gas Algorithm ... 793
    Hector F. Satizabal, Andres Perez-Uribe, and Marco Tomassini

    Active Learning Using a Constructive Neural Network Algorithm ... 803
    Jose Luis Subirats, Leonardo Franco, Ignacio Molina Conde, and Jose M. Jerez

    M-CLANN: Multi-class Concept Lattice-Based Artificial Neural Network for Supervised Classification ... 812
    Engelbert Mephu Nguifo, Norbert Tsopze, and Gilbert Tindo

    Workshop: New Trends in Self-organization and Optimization of Artificial Neural Networks

    A Classification Method of Children with Developmental Dysphasia Based on Disorder Speech Analysis ... 822
    Marek Bartu and Jana Tuckova

    Nature Inspired Methods in the Radial Basis Function Network Learning Process ... 829
    Miroslav Bursa and Lenka Lhotska

    Tree-Based Indirect Encodings for Evolutionary Development of Neural Networks ... 839
    Jan Drchal and Miroslav Snorek

    Generating Complex Connectivity Structures for Large-Scale Neural Models ... 849
    Martin Hulse

    The GAME Algorithm Applied to Complex Fractionated Atrial Electrograms Data Set ... 859
    Pavel Kordik, Vaclav Kremen, and Lenka Lhotska

    Geometrical Perspective on Hairy Memory ... 869
    Cheng-Yuan Liou

    Neural Network Based BCI by Using Orthogonal Components of Multi-channel Brain Waves and Generalization ... 879
    Kenji Nakayama, Hiroki Horita, and Akihiro Hirano

    Feature Ranking Derived from Data Mining Process ... 889
    Ales Pilny, Pavel Kordik, and Miroslav Snorek

    A Neural Network Approach for Learning Object Ranking ... 899
    Leonardo Rigutini, Tiziano Papini, Marco Maggini, and Monica Bianchini

    Evolving Efficient Connection for the Design of Artificial Neural Networks ... 909
    Min Shi and Haifeng Wu

    The Extreme Energy Ratio Criterion for EEG Feature Extraction ... 919
    Shiliang Sun

    Workshop: Adaptive Mechanisms of the Perception-Action Cycle

    The Schizophrenic Brain: A Broken Hermeneutic Circle ... 929
    Peter Erdi, Vaibhav Diwadkar, and Balazs Ujfalussy

    Neural Model for the Visual Recognition of Goal-Directed Movements ... 939
    Falk Fleischer, Antonino Casile, and Martin A. Giese

    Emergent Common Functional Principles in Control Theory and the Vertebrate Brain: A Case Study with Autonomous Vehicle Control ... 949
    Amir Hussain, Kevin Gurney, Rudwan Abdullah, and Jon Chambers

    Organising the Complexity of Behaviour ... 959
    Stathis Kasderidis

    Towards a Neural Model of Mental Simulation ... 969
    Matthew Hartley and John Taylor

    Author Index ... 981

  • Dimension Reduction for Mixtures of Exponential Families

    Shotaro Akaho

    Neuroscience Research Institute, AIST, Tsukuba 305-8568, Japan

    Abstract. Dimension reduction for a set of distribution parameters has been important in various applications of data mining. The exponential family PCA has been proposed for that purpose, but it cannot be directly applied to mixture models that do not belong to an exponential family. This paper proposes a method to apply the exponential family PCA to mixture models. A key idea is to embed mixtures into a space of an exponential family. The problem is that the embedding is not unique, and the dimensionality of the parameter space is not constant when the numbers of mixture components are different. The proposed method finds a sub-optimal solution by a linear programming formulation.

    1 Introduction

    In many applications, dimension reduction is important for many purposes, such as visualization and data compression. Traditionally, principal component analysis (PCA) has been widely used as a powerful tool for dimension reduction in Euclidean space. However, data are often given as binary strings or graph structures whose nature is very different from that of Euclidean vectors.

    One approach, which we take here, is to regard such data as parameters of probability distributions. Information geometry [1] gives a mathematical framework for the space of probability distributions, and a dimension reduction method has been proposed for the class of exponential families [2,3,4,5]. There are two main advantages of the information geometrical approach over conventional methods: one is that the information geometrical projection of a data point always lies on the support of the parameters, and the other is that the projection is defined more naturally for a distribution than the conventional Euclidean projection.

    In this paper, we focus on mixture models [6], which are very flexible and are often used for clustering. However, we cannot apply the exponential family PCA to mixture models, because they are not members of an exponential family. Our main idea is to embed mixture models into the space of an exponential family. This is not straightforward, however, because the embedding is not unique and the dimensionality of the parameter space is not constant when the numbers of mixture components are different. Those problems can be resolved by solving a combinatorial optimization problem, which is computationally intractable. Therefore, we propose a method that finds a sub-optimal solution by separating the problem into subproblems, each of which can be optimized more easily.

    V. Kurkova et al. (Eds.): ICANN 2008, Part I, LNCS 5163, pp. 1–10, 2008.
    © Springer-Verlag Berlin Heidelberg 2008


    The proposed framework is useful not only for visualization and data compression, but also for applications that have been developed recently in the field of data mining: privacy-preserving data mining [7] and distributed data mining [8]. In distributed data mining, raw data are collected at many distributed sites. Those data are not sent directly to the center, but are processed into statistical data at each site in order to preserve privacy as well as to reduce communication costs; the statistical data are then sent to the center. A similar framework has begun to be studied in the field of sensor networks [9].

    2 e-PCA and m-PCA: Dual Dimension Reduction

    2.1 Information Geometry of Exponential Family

In this section, we review the exponential family PCA, called e-PCA and m-PCA [4]. An exponential family is defined as a class of distributions given by

    p(x; θ) = exp{ Σ_{i=1}^d θ_i F_i(x) + C(x) − ψ(θ) },    (1)

with a random variable x and a parameter θ = (θ_1, ..., θ_d). The set of distributions p(x; θ) obtained by varying θ forms a space (manifold) S. The structure of the manifold is determined by introducing a Riemannian metric and an affine connection. The statistically natural metric is the Fisher information matrix g_{jk}(θ) = E[{∂ log p(x; θ)/∂θ_j}{∂ log p(x; θ)/∂θ_k}], and the natural connection is the α-connection, specified by one real-valued parameter α. In particular, α = ±1 is important, because S then becomes a flat manifold. When α = 1, the space is called e-flat¹ with respect to an affine coordinate (the e-coordinate) θ. When α = −1, the exponential family is also flat with respect to another affine coordinate (the m-coordinate) η = (η_1, ..., η_d), defined by η_i = E[F_i(x)]. The coordinates θ and η are dually related and are transformed into each other by the Legendre transform; we write this coordinate transform as θ(η), η(θ).

    2.2 e-PCA and m-PCA

Since the manifold of an exponential family is flat in the e- and m-affine coordinates, there are accordingly two kinds of flat submanifolds for dimension reduction. The e-PCA (m-PCA) is defined by finding the e-flat (m-flat) submanifold that best fits the samples, given as a set of points of the exponential family. Here we describe only e-PCA, because m-PCA is completely dual to e-PCA and is obtained by exchanging e- and m- throughout the description.

Let us define an h-dimensional e-flat subspace M. The points on M can be expressed as

    θ(w; U) = Σ_{j=1}^h w_j u_j + u_0,    (2)

¹ e stands for exponential and m stands for mixture.


where U = [u_0, u_1, ..., u_h] ∈ R^{d×(h+1)} is a matrix containing the basis vectors of the subspace and w = (w_1, ..., w_h)^T ∈ R^h is a local coordinate on M.

Suppose we have a set of parameters θ^(1), ..., θ^(n) ∈ S as sample points. For dimension reduction, we need to consider the projection of the sample points onto M, which is defined by a geodesic that is orthogonal to M with respect to the Fisher information. According to the two kinds of geodesic, we can define the e-projection and the m-projection.

Amari [1] has proved that the m-projection onto an e-flat submanifold is unique, and further that it is given by the point minimizing the m-divergence

    K_m(p, q) = ∫ p(x){log p(x) − log q(x)} dx,    (3)

hence we take the m-projection for e-PCA.² As a cost function for fitting the sample points to a submanifold, it is convenient to take the sum of the m-divergences

    L(U, W) = Σ_{i=1}^n K_m(θ^(i), θ(w^(i); U)),    (4)

where W = (w^(1), ..., w^(n)), and e-PCA is defined by finding the U and W that minimize L(U, W). Note that even when data are given as values of a random variable instead of parameters, the random variable can be related to a parameter, so the same framework applies [10].

    2.3 Alternating Gradient Descent Algorithm

Although it is difficult to optimize L(U, W) with respect to U and W simultaneously, the optimization becomes easier with an alternating procedure in which optimization is performed for one variable while the other is fixed. If we fix the basis vectors U, the projection onto an e-flat space from a sample point is unique, as mentioned above. On the other hand, optimizing U with W fixed is also an m-projection onto the e-flat subspace determined by W, which is a submanifold of the product space S^n; therefore it also has a unique solution.

In each optimization step we can apply a Newton-like method [4], but in this paper we use only a simple gradient descent. Note that whatever algorithm we use, it does not always converge to the global solution, even if each alternating step is globally optimized, as in EM and variational Bayes.

The gradient descent algorithm is given by

    Δw_j^(i) = −ε_w u_j^T Δη^(i),    Δu_j = −ε_u Σ_{i=1}^n w_j^(i) Δη^(i),    Δu_0 = −ε_u Σ_{i=1}^n Δη^(i),    (5)

² By duality, we take the e-projection for m-PCA, and the e-divergence is defined by K_e(p, q) = K_m(q, p).


where Δη^(i) = η̂^(i) − η^(i) is the difference between the m-coordinates of the point specified by the current estimate w^(i) and of the sample point. As a general tendency, the problem is more sensitive to U than to W; thus we take the learning constants such that ε_w > ε_u.

Further, the basis vectors U have a redundancy of linear transformation: when U is transformed by any non-singular matrix A, the same solution is obtained by transforming W to WA^{−1}. It can also happen that two different bases u_i and u_j converge to the same direction if they are optimized by ordinary gradient descent without any constraints. Therefore, we restrict U to be an orthogonal frame (i.e., U^T U = I). Such a space is called a Grassmann manifold, and optimization in a Grassmann manifold is often used for finding principal or minor components [11]. The natural gradient for U is given by

    ∇U_nat = ∇U − U U^T ∇U,    (6)

where ∇U is the matrix whose columns are the Δu_j in (5). Since this update rule does not preserve the orthogonality constraint strictly, we need either to re-orthogonalize U after each update (we apply this in the experiment) or to update U along the geodesic.
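For illustration, one alternating update of (5)–(6) can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' implementation: the coordinate transform theta_to_eta is assumed to be supplied in closed form for the family at hand, all names are illustrative, and the frame is re-orthonormalized by a QR step rather than a geodesic update.

```python
import numpy as np

def alternating_step(U, u0, W, theta_samples, theta_to_eta,
                     eps_w=0.1, eps_u=0.01):
    """One alternating e-PCA update following eqs. (5)-(6) (sketch).

    U  : (d, h) orthonormal frame with columns u_1..u_h
    u0 : (d,)   offset vector
    W  : (n, h) local coordinates w^(1)..w^(n)
    theta_samples : (n, d) sample points theta^(1)..theta^(n)
    theta_to_eta  : vectorized map from e- to m-coordinates (assumed given)
    """
    # m-coordinate residuals: delta_eta^(i) = eta(estimate) - eta(sample)
    theta_est = W @ U.T + u0
    d_eta = theta_to_eta(theta_est) - theta_to_eta(theta_samples)

    # eq. (5): gradient steps for W, the basis vectors, and the offset
    W_new = W - eps_w * d_eta @ U
    grad_U = d_eta.T @ W                 # column j holds sum_i w_j^(i) delta_eta^(i)
    u0_new = u0 - eps_u * d_eta.sum(axis=0)

    # eq. (6): natural gradient on the orthogonal frame, then re-orthonormalize
    grad_nat = grad_U - U @ (U.T @ grad_U)
    U_new, _ = np.linalg.qr(U - eps_u * grad_nat)
    return U_new, u0_new, W_new
```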

    2.4 e-Center and m-Center

An important special case of e-PCA and m-PCA is a zero-dimensional subspace, which corresponds to a single point. The only parameter in that case is u_0, which is given in closed form:

    θ_ec = θ( (1/n) Σ_{i=1}^n η(θ^(i)) ),    η_mc = η( (1/n) Σ_{i=1}^n θ(η^(i)) ).    (7)

We call them the e-center and the m-center, respectively.
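When the coordinate transforms are available in closed form, the centers (7) are straightforward to compute. The sketch below does this for univariate Gaussians N(μ, σ²), whose m-coordinates are (E[x], E[x²]) = (μ, μ² + σ²); the helper names are illustrative.

```python
import numpy as np

def eta_from_moments(mu, var):
    """m-coordinates of N(mu, var) for sufficient statistics F(x) = (x, x^2)."""
    return np.array([mu, mu**2 + var])

def moments_from_eta(eta):
    """Recover (mu, var) from the m-coordinates."""
    return eta[0], eta[1] - eta[0]**2

def e_center(params):
    """e-center (7): average the m-coordinates, then map back to a distribution."""
    etas = np.array([eta_from_moments(mu, var) for mu, var in params])
    return moments_from_eta(etas.mean(axis=0))

# Example: the e-center of N(0, 1) and N(2, 1) is N(1, 2).
print(e_center([(0.0, 1.0), (2.0, 1.0)]))
```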

    2.5 Properties of e-PCA and m-PCA

In this subsection, we summarize several points in which e-PCA and m-PCA differ from ordinary PCA.

The first concerns the hierarchical relation between different dimensions. Since the formulations of e-PCA and m-PCA include a nonlinear part, an optimal low-dimensional subspace is not always included in a higher-dimensional one. In some applications, hierarchical structures are necessary or convenient; in such cases, we can construct an algorithm that finds the optimal subspace by constraining the search space.

The second concerns the domain (or support) of S. The parameter set of an exponential family is a local coordinate, which means that θ does not define a probability distribution for every value in R^d. In general, the domain forms a convex region in the e- and m-coordinate systems. It is known that the m-projection for e-PCA is guaranteed to lie in that region. However, when we apply a gradient-type algorithm, too large a step size can push the candidate solution out of the domain. In our implementation, the candidate solution is checked for inclusion at each learning step, and the learning constant is adaptively decreased in case of an excess.

The third concerns the initialization problem. Since the alternating algorithm gives only a local optimum, it is important to find a good initial solution. A naive idea is to use conventional PCA with the Euclidean metric, with u_0 initialized by the e-center. However, the initialization is related to the domain problem above: the initial points have to lie in the domain region. For simplicity, we take W = 0 in our numerical simulation, which corresponds to the initial projection point always being u_0.

    3 Embedding of Mixture Models

Now let us move on to our main topic, the dimension reduction of mixture models. A major difficulty is that mixture models are not members of an exponential family. However, if we add a latent variable z representing which component x is generated from, p(x, z; θ) belongs to an exponential family.

    3.1 Latent Variable Model

A mixture of exponential family distributions is written as

    p(x) = Σ_{i=0}^k π_i f_i(x; θ_i),    f_i(x; θ_i) = exp(θ_i^T F_i(x) − ψ_i(θ_i)),    i = 0, ..., k.    (8)

Since the number of degrees of freedom of {π_i} is k, we regard π_1, ..., π_k as parameters and define π_0 by π_0 = 1 − Σ_{i=1}^k π_i.

When z ∈ {0, 1, 2, ..., k} is a latent variable representing which component of the mixture x is generated from, the distribution of (x, z) is an exponential family [12], written as

    p(x, z) = π_z f_z(x; θ_z) = exp[ Σ_{i=1}^k θ_i^T F_i(x) δ_i(z) + θ_0^T F_0(x)(1 − Σ_{i=1}^k δ_i(z)) + Σ_{i=1}^k β_i δ_i(z) − γ ],    (9)

where δ_i(z) = 1 when z = i, and 0 otherwise, and

    β_i = log π_i − ψ_i(θ_i) − (log π_0 − ψ_0(θ_0)),    γ = −log π_0 + ψ_0(θ_0).    (10)

The e-coordinate of this model is θ = (β_1, ..., β_k, θ_0, θ_1, ..., θ_k), and the m-coordinate consists of E[δ_i(z)] = π_i corresponding to β_i, and E[F_i(x) δ_i(z)] = π_i η_i corresponding to θ_i, where η_i = E[F_i(x)] is the m-coordinate of each component distribution f_i(x; θ_i).
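For reference, here is a small sketch computing the embedded e-coordinates (β_i, γ) of (10) from the mixing weights and the component log-partition values ψ_i(θ_i); the function is illustrative and assumes those values are supplied.

```python
import numpy as np

def embed_mixture(log_pi, psi):
    """e-coordinates (10) of a mixture embedded in the joint family (9).

    log_pi : (k+1,) log mixing weights log pi_0 .. log pi_k
    psi    : (k+1,) log-partition values psi_i(theta_i)
    """
    base = log_pi[0] - psi[0]             # log pi_0 - psi_0(theta_0)
    beta = (log_pi[1:] - psi[1:]) - base  # beta_i for i = 1..k
    gamma = -log_pi[0] + psi[0]
    return beta, gamma
```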


    3.2 Problems of the Embedding

There are two problems with the embedding described above. The first is that the embedding is not unique, because a mixture distribution is invariant when its components are exchanged. The second arises when mixtures have different numbers of components: in that case we cannot embed them directly into one common space, because the dimensionalities of their parameter spaces differ.

For the first problem, we choose the embedding so that the embedded distributions are located as closely together as possible; once the embedding is done, e-PCA (or m-PCA) can be applied directly. For the second problem, we split components to match the dimensions across different numbers of components.

    3.3 Embedding for the Homogeneous Mixtures

First, we consider the homogeneous case, in which the numbers of components are the same for all mixtures θ^(i).

A naive way to resolve the problem would be to perform e-PCA (or m-PCA) for every possible embedding and take the best one. However, this is not practical, because the number of possible embeddings increases exponentially with the number of components and the number of mixtures. Instead, we try to find a configuration in which the mixtures are as close together as possible. The following proposition shows that the divergence between two mixtures in the embedded space takes a very simple form.

Proposition 1. Suppose there are two mixture distributions with the same numbers of components, and let their distributions with latent variables be

    p_1(x, z) = π_z f_z(x; θ_z),    p_2(x, z) = π'_z f_z(x; θ'_z).    (11)

The m-divergence between p_1 and p_2 is given by

    K_m(p_1, p_2) = Σ_{i=0}^k π_i [ K_m(f_i(x; θ_i), f_i(x; θ'_i)) + log(π_i / π'_i) ].    (12)

This means that the divergence separates into a sum of functions, each of which depends only on a pair of corresponding components of the two mixtures. Note that the divergence between the original mixtures has no such simple form.
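For Gaussian components, the component divergences in (12) have a closed form, so the embedded divergence is cheap to evaluate once the components are aligned. A minimal sketch, with components given as (mean, variance) pairs; names are illustrative:

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """Closed-form K_m between two 1-D Gaussian components."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2)**2) / v2 - 1.0)

def km_embedded(pi1, comps1, pi2, comps2):
    """m-divergence (12) between two embedded mixtures with aligned components."""
    return sum(p1 * (kl_gauss(*c1, *c2) + np.log(p1 / p2))
               for p1, c1, p2, c2 in zip(pi1, comps1, pi2, comps2))
```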

Based on this fact, we can derive the optimal embedding for two mixtures that minimizes the divergence. It should be noted that the optimality is not invariant with respect to the order of p_1 and p_2, because the divergence is not a symmetric function. For the general case of n mixtures, we apply the following greedy algorithm based on pairwise optimality.

    [Embedding algorithm (for e-PCA, homogeneous)]

1. Embed θ^(1) in any configuration.
2. Repeat the following procedures for i = 2, 3, ..., n:
   (a) Let θ_ec be the e-center of the already embedded mixtures θ^(j), j = 1, ..., i − 1.
   (b) Embed θ^(i) so as to minimize the m-divergence between θ^(i) and θ_ec in the embedded space (see next subsection).


Fig. 1. Matching of distributions. Left: the homogeneous case; the sum of the matching weights is minimized. Right: the heterogeneous case; in this example, the k-th component of the left group is split and matched with two components (the 0th and the k-th) of the right group.

    3.4 The Optimal Matching Solution

In this subsection, we give an optimization method for finding a matching between two mixtures that minimizes the cost function (12), which is a sum of component-wise terms (left of Fig. 1).

Letting the weight values be

    μ_ij = π_i [ K_m(f_i(x; θ_i), f_j(x; θ'_j)) + log(π_i / π'_j) ],    (13)

we obtain the optimization problem in terms of linear programming:

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^k μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = Σ_{j=0}^k a_ij = 1.    (14)

The solution a_ij takes binary values (0 or 1) by the following integrality theorem.

Proposition 2 (Integrality theorem [13]). In the transshipment problem

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^{k'} μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = s_j,   Σ_{j=0}^{k'} a_ij = t_i,

a_ij has an integer optimal solution when the problem has at least one feasible solution and the s_j, t_i are all integers. In particular, the solution given by the simplex method is always an integer solution.
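In practice, (14) is the classical assignment problem, so instead of a general-purpose LP solver one may also use a Hungarian-type routine such as scipy's linear_sum_assignment, which returns the integral solution directly. A sketch, assuming a precomputed matrix of component divergences (names illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(pi1, pi2, km_comp):
    """Solve (14): the permutation minimizing the total cost of eq. (13).

    pi1, pi2 : (k+1,) mixing weights of the two mixtures
    km_comp  : (k+1, k+1) matrix with K_m(f_i, f_j) between components
    """
    mu = pi1[:, None] * (km_comp + np.log(pi1[:, None] / pi2[None, :]))
    rows, cols = linear_sum_assignment(mu)   # integral by Proposition 2
    return cols, mu[rows, cols].sum()        # cols[i]: component of p2 matched to i
```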

    3.5 General Case: Splitting Components

When the numbers of components of the mixtures are different (the heterogeneous case), we can adjust the numbers by splitting components. Splitting mixture components has played an important role in various situations, for example in finding an optimal number of components for fitting a mixture [14].

Let f(x; θ) be one of the components of a mixture, with weight π; it can be split into k + 1 components as

    π_i f(x; θ),   i = 0, ..., k,    Σ_{i=0}^k π_i = π,   π_i > 0.    (15)


We need to determine two things: which component should be split, and how large the splitting weights π_i should be. Since it is hard to optimize both simultaneously, we solve the problem sequentially: first we determine the component to be split, based on the optimal assignment problem of the previous subsection, and then we optimize the splitting weights.

    3.6 Component Selection

Suppose we have two mixtures p_1 and p_2 given as in (11). When their numbers of components are different (the heterogeneous case), we need to find a one-to-many matching. Let z = 0, 1, ..., k for p_1 and z = 0, 1, ..., k' for p_2. In order to find the one-to-many matching, we extend the optimization problem of the homogeneous case in a natural way:

    min_{a_ij} Σ_{i=0}^k Σ_{j=0}^{k'} μ_ij a_ij    s.t.  a_ij ≥ 0,   Σ_{i=0}^k a_ij = 1,   Σ_{j=0}^{k'} a_ij ≥ 1,    (16)

where μ_ij is defined by (13), we assume that p_1 has a smaller number of components than p_2 (k ≤ k'), and some equality constraints are replaced by inequality constraints to deal with the one-to-many matching (right of Fig. 1).

Note that this problem gives only a sub-optimal matching for the entire problem, because the splitting weights are not taken into account. From the computational point of view, however, the integrality property of the solution is preserved, so all weights are guaranteed to be binary; a further virtue of this formulation is that the homogeneous case is included as a special case of the heterogeneous one.

    3.7 Optimal Weights

After the matching is performed, we split the component f(x; θ) into the k + 1 components given by (15) and find the optimal correspondence to the components π_i f_i(x; θ_i), i = 0, ..., k. This is given by the following proposition.

Proposition 3. The optimal splitting that minimizes the sum of m-divergences between π_i f(x; θ) and π_i f_i(x; θ_i), i = 0, ..., k, is given by

    π_i^e = (π_i / Z) exp(−K_m(f(x; θ), f(x; θ_i))),    (17)

where Z is a normalization constant. The splitting for the e-divergence is given by

    π_i^m = π_i / Z.    (18)
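A sketch of (17) follows, assuming the normalization constant Z is chosen so that the sub-weights sum to the weight of the component being split; the function and argument names are illustrative.

```python
import numpy as np

def split_weights_e(pi_total, pi_targets, km_vals):
    """Splitting weights (17): proportional to pi_i * exp(-K_m(f, f_i)),
    normalized so the sub-weights sum to pi_total (assumption)."""
    w = np.asarray(pi_targets) * np.exp(-np.asarray(km_vals))
    return pi_total * w / w.sum()
```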

We now summarize the embedding method in the general case, covering both the homogeneous and the heterogeneous settings.


Fig. 2. Upper left: original mixtures. Upper right: mixtures with reduced dimension. Bottom: two-dimensional scatter plot of the mixtures.

    [Embedding algorithm (for e-PCA, general)]

1. Sort θ^(1), ..., θ^(n) in descending order of the numbers of components.
2. Embed θ^(1) in any configuration.
3. Repeat the following (a), (b), (c) for i = 2, 3, ..., n:
   (a) Let θ_ec be the e-center of the already embedded mixtures θ^(j), j = 1, ..., i − 1.
   (b) Solve (16) to find the correspondence between θ^(i) and θ_ec.
   (c) If the number of components of θ^(i) is smaller than that of θ_ec, split the components by (17).

    4 Numerical Experiments

We applied the proposed method to a synthetic data set of one-dimensional Gaussian mixtures. First, Gaussian mixtures were generated (8 in total: 4 mixtures with 3 components, 3 mixtures with 2 components, and 1 mixture with 1 component), with the parameters of the mixtures (mixing weight, mean, and variance of each component) determined at random (upper left of Fig. 2).

The learning coefficients of e-PCA are taken to be ε_w = 0.1, ε_u = 0.01, except when a parameter exceeds the domain boundary or the objective function increases exceptionally (in such cases the learning rate is decreased adaptively). The update of U is performed for 20 steps, each of which follows 50 update steps of W, for the sake of stable convergence.

The bottom panel of Fig. 2 shows the result of dimension reduction (e-PCA) to a 2-dimensional subspace from the 8-dimensional original space (the number of parameters of a Gaussian mixture with 3 components). The objective function L(U, W) is about 6.4 at the initial solution (the basis is initialized by Euclidean PCA) and decreases to about 1.9. The upper right of Fig. 2 shows the projected distributions obtained by e-PCA. Their original shapes are well preserved even in the 2-D subspace, though slightly smoothed. We also applied m-PCA, and similar but not identical results were obtained.

    5 Concluding Remarks

We have proposed a dimension reduction method for the parameters of mixture distributions. Two important problems remain to be solved. One is to find a good initial solution, because the final solution is not a global optimum even though the optimal solution is obtained in each alternating step. The other is to develop a stable and fast algorithm. As for the embedding, there are many possible improvements over the proposed greedy algorithm. Applications to real-world data, extensions to other structured models such as HMMs, and extensions to other types of methods such as clustering are all left as future work.

    References

1. Amari, S.: Differential-Geometrical Methods in Statistics. Springer, Heidelberg (1985)
2. Amari, S.: Information Geometry on Hierarchy of Probability Distributions. IEEE Trans. on Information Theory 47 (2001)
3. Collins, M., Dasgupta, S., Schapire, R.: A Generalization of Principal Component Analysis to the Exponential Family. In: Advances in NIPS, vol. 14 (2002)
4. Akaho, S.: The e-PCA and m-PCA: dimension reduction by information geometry. In: IJCNN 2004, pp. 129–134 (2004)
5. Watanabe, K., Akaho, S., Okada, M.: Clustering on a Subspace of Exponential Family Using Variational Bayes Method. In: Proc. of Worldcomp2008/Information Theory and Statistical Learning (2008)
6. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
7. Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proc. of the ACM SIGMOD, pp. 439–450 (2000)
8. Kumar, A., Kantardzic, M., Madden, S.: Distributed Data Mining: Framework and Implementations. IEEE Internet Computing 10, 15–17 (2006)
9. Chong, C.Y., Kumar, S.: Sensor networks: evolution, opportunities, and challenges. Proc. of the IEEE 91, 1247–1256 (2003)
10. Buntine, W.: Variational extensions to EM and multinomial PCA. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430. Springer, Heidelberg (2002)
11. Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
12. Amari, S.: Information geometry of the EM and em algorithms for neural networks. Neural Networks 8(9), 1379–1408 (1995)
13. Chvatal, V.: Linear Programming. W.H. Freeman and Company, New York (1983)
14. Fukumizu, K., Akaho, S., Amari, S.: Critical lines in symmetry of mixture models and its application to component splitting. In: Proc. of NIPS 15 (2003)

Several Enhancements to Hermite-Based Approximation of One-Variable Functions

Bartlomiej Beliczynski¹ and Bernardete Ribeiro²

¹ Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, 00-662 Warszawa, Poland
[email protected]
² Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal
[email protected]

Abstract. Several enhancements of and comments on Hermite-based one-variable function approximation are presented. First, we prove that a constant bias extracted from the function contributes to a decrease of the error, and we demonstrate how to choose that bias. Second, we show how to select a basis among orthonormal functions to achieve minimum error for a fixed dimension of the approximation space. Third, we prove that the loss of orthonormality due to truncation of the argument range of the basis functions affects neither the overall approximation error nor the expansion coefficients, and we show how this feature can be used. An application of the obtained results to ECG data compression is presented.

    1 Introduction

A set of Hermite functions forming an orthonormal basis is naturally attractive for various approximation, classification, and data compression tasks. These basis functions are defined on the set of real numbers IR and can be calculated recursively. The approximating function coefficients can be determined relatively easily so as to achieve the best approximation property. Since Hermite functions are eigenfunctions of the Fourier transform, time and frequency spectra are approximated simultaneously. Each subsequent basis function extends the frequency bandwidth within a limited range of well-concentrated energy; see for instance [1]. By introducing a scaling parameter we may control the bandwidth, influencing at the same time the dynamic range of the input argument, until we strike a desirable balance.

If Hermite one-variable functions are generalized to two variables, they retain the same useful properties and turn out to be very suitable for image compression tasks.

Recently, several publications (see for instance [2], [3]) have suggested using Hermite functions as activation functions in neural schemes. In [3], a so-called constructive approximation scheme is used; it is a type of incremental approximation developed in [4], [5]. The novelty of this approach is that, contrary to the traditional neural architecture, every node in the hidden layer has a different activation function. It gains several advantages from the Hermite functions. However, in such an approach the orthogonality of the Hermite functions is not really exploited.

In this paper we return to the basic task of one-variable function approximation. For this classical problem we offer two enhancements and one proof of correctness.

For fixed basis functions in a Hilbert space, the best approximation always exists. If the basis is orthonormal, the approximation can relatively easily be calculated in the form of expansion coefficients. Those coefficients represent the original function approximated in the Hermite basis, and they usually require less space than the original data. At first glance there seems to be little room for improvement. However, one may slightly reformulate the problem: instead of approximating the function f, one may approximate f − f0, where f0 is a fixed chosen function. After the approximation is done, f0 is added to the approximant of f − f0. From the approximation and data compression point of view, this procedure makes sense if the additional effort put into the representation of f0 is compensated by a reduction of the approximation error.

In a typically stated approximation problem, a basis of n+1 functions {e0, e1, ..., en} is given and we look for their expansion coefficients. We may, however, reformulate the problem in the following way: let us search for any n+1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest error of approximation. This is the second issue.

The third problem stated and discussed here is the loss of the orthonormality property of the basis functions when the set IR is replaced by a subset of it. When the approximating basis is orthonormal, the expansion coefficients are calculated easily; otherwise the calculations are more complicated. However, we prove that despite the loss of orthonormality, we may determine the Hermite expansion coefficients as before.

In this paper we focus on the Hermite basis; however, many of the studied properties are applicable to any orthonormal basis. Our enhancements were tested and demonstrated on ECG data compression, a well-known application area.

This paper is organized as follows. In Section 2, basic facts about approximation needed for later use are recalled. In Section 3, Hermite functions are briefly described. We then present our results in Section 4: bias extraction, basis function selection, and a proof of correctness for the expansion coefficient calculation despite the lack of basis orthonormality. In Section 5, certain practicalities are presented, and an application of our improvements to ECG data compression is demonstrated and discussed. In Section 6, conclusions are drawn.

    2 Approximation Framework

Some selected facts on function approximation useful for this paper will now be recalled. Let us consider the function

    f_{n+1} = Σ_{i=0}^n w_i g_i,    (1)

where g_i ∈ G ⊂ H, H = (H, ||·||) is a Hilbert space, and w_i ∈ IR, i = 0, ..., n.

For any function f from a Hilbert space H and a closed (finite-dimensional) subspace G ⊂ H with basis {g_0, ..., g_n}, there exists a unique best approximation of f by elements of G [6]. Let us denote it by g_b. Because the error of the best approximation is orthogonal to all elements of the approximation space, f − g_b ⊥ G, the coefficients w_i may be calculated from the set of linear equations

    ⟨g_i, f − g_b⟩ = 0   for i = 0, ..., n,    (2)

where ⟨·,·⟩ denotes the inner product. Formula (2) can also be written as ⟨g_i, f − Σ_{k=0}^n w_k g_k⟩ = ⟨g_i, f⟩ − Σ_{k=0}^n w_k ⟨g_i, g_k⟩ = 0 for i = 0, ..., n, or in matrix form

    Γ w = G_f,    (3)

where Γ = [⟨g_i, g_j⟩], i, j = 0, ..., n, w = [w_0, ..., w_n]^T, G_f = [⟨g_0, f⟩, ..., ⟨g_n, f⟩]^T, and T denotes transposition.

Because there exists a unique best approximation of f in the (n+1)-dimensional space G with basis {g_0, ..., g_n}, the matrix Γ is nonsingular and w_b = Γ^{−1} G_f.

For any basis {g_0, ..., g_n} one can find an orthonormal basis {e_0, ..., e_n}, with ⟨e_i, e_j⟩ = 1 when i = j and ⟨e_i, e_j⟩ = 0 when i ≠ j, such that span{g_0, ..., g_n} = span{e_0, ..., e_n}. In that case Γ is the unit matrix and

    w_b = [⟨e_0, f⟩, ⟨e_1, f⟩, ..., ⟨e_n, f⟩]^T.    (4)

Finally, (1) takes the form

    f_{n+1} = Σ_{i=0}^n ⟨e_i, f⟩ e_i.    (5)

The squared error error_{n+1} = ⟨f − f_{n+1}, f − f_{n+1}⟩ of the best approximation of a function f in the basis {e_0, ..., e_n} is thus expressible as

    ||error_{n+1}||² = ||f||² − Σ_{i=0}^n w_i².    (6)
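On a uniform grid, the coefficients (4), the approximant (5), and the error (6) can be computed by approximating the inner-product integrals with Riemann sums. A minimal sketch (names illustrative):

```python
import numpy as np

def best_coefficients(f, basis, t):
    """Expansion coefficients (4), approximant (5), squared error (6) on a grid.

    f     : (m,) samples of the target function on the grid t
    basis : (n+1, m) samples of an orthonormal basis e_0..e_n on t
    t     : (m,) uniform grid; dt-weighted sums approximate the integrals
    """
    dt = t[1] - t[0]
    w = basis @ f * dt                        # w_i = <e_i, f>
    approx = w @ basis                        # f_{n+1}
    err2 = np.sum(f**2) * dt - np.sum(w**2)   # ||error_{n+1}||^2
    return w, approx, err2
```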

    3 Hermite Functions

We will be looking at an orthonormal set of functions in the form of Hermite functions. Their expansion coefficients are easily and independently calculated from (4). Let us consider the space of great practical interest L²(−∞, +∞)


Fig. 1. Hermite functions h0, h1, h3

with the inner product defined as ⟨x, y⟩ = ∫_{−∞}^{+∞} x(t) y(t) dt. In such a space, a sequence of linearly independent and bounded functions can be defined as follows: h_0(t) = w(t) = e^{−t²/2}, h_1(t) = t w(t), ..., h_n(t) = t^n w(t), ... This basis can be orthonormalized using the well-known and efficient Gram–Schmidt process (see for instance [6]). Finally, a new, now orthonormal, basis spanning the same space is obtained:

    h_0(t), h_1(t), ..., h_n(t), ...    (7)

where

    h_n(t) = c_n e^{−t²/2} H_n(t);    H_n(t) = (−1)^n e^{t²} (dⁿ/dtⁿ) e^{−t²};    c_n = 1 / (2^n n! π^{1/2})^{1/2}.    (8)

The polynomials H_n(t) are called Hermite polynomials and the functions h_n(t) Hermite functions. According to (8), the first several Hermite functions can be calculated:

    h_0(t) = (1/π^{1/4}) e^{−t²/2};    h_1(t) = (1/(√2 π^{1/4})) e^{−t²/2} · 2t;

    h_2(t) = (1/(2√2 π^{1/4})) e^{−t²/2} (4t² − 2);    h_3(t) = (1/(4√3 π^{1/4})) e^{−t²/2} (8t³ − 12t).

Plots of several functions of the Hermite basis are shown in Fig. 1.
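Numerically, the Hermite functions of (8) are best evaluated by the standard stable three-term recurrence rather than through the explicit polynomials. A sketch:

```python
import numpy as np

def hermite_functions(n_max, t):
    """Orthonormal Hermite functions h_0..h_{n_max} of (8) on the grid t,
    using h_{n+1} = t*sqrt(2/(n+1))*h_n - sqrt(n/(n+1))*h_{n-1}."""
    t = np.asarray(t, dtype=float)
    h = np.empty((n_max + 1, t.size))
    h[0] = np.pi ** -0.25 * np.exp(-t**2 / 2.0)
    if n_max >= 1:
        h[1] = np.sqrt(2.0) * t * h[0]
    for n in range(1, n_max):
        h[n + 1] = (t * np.sqrt(2.0 / (n + 1)) * h[n]
                    - np.sqrt(n / (n + 1.0)) * h[n - 1])
    return h
```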

    4 Main Results

4.1 Extracting the Bias

In this section our first enhancement is introduced. Let f be any function from a Hilbert space H. Instead of approximating the function f, we suggest approximating the function f − f0, where f0 ∈ H is a known function. Afterwards, f0 is added to the approximant of f − f0. The modification of (5) is then the following:

    f^{f0}_{n+1} = f0 + Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i.    (9)

The approximation error is then expressed as

    e^{f0}_{n+1} = f − f^{f0}_{n+1} = f − f0 − Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i,

and, similarly to (6), its squared norm is

    ||e^{f0}_{n+1}||² = ||f − f0||² − Σ_{i=0}^n ⟨f − f0, e_i⟩².    (10)

Theorem 1. Let H be a Hilbert space of functions on a subset of IR containing the interval [a, b], let f be a function from H, f ∈ H, let {e_0, e_1, ..., e_n} be an orthonormal set in H, and let c ∈ IR be a constant. Let f0 = c·1_{[a,b]}, where 1_{[a,b]} denotes the function of value 1 in the range [a, b] and 0 elsewhere, and let the approximation formula be

    f^{f0}_{n+1} = f0 + Σ_{i=0}^n ⟨f − f0, e_i⟩ e_i.

Then the norm of the approximation error is minimized for c = c_0, where

    c_0 = ( ⟨f, 1_{[a,b]}⟩ − Σ_{i=0}^n ⟨f, e_i⟩ ⟨e_i, 1_{[a,b]}⟩ ) / ( (b − a) − Σ_{i=0}^n ⟨e_i, 1_{[a,b]}⟩² ).    (11)

Proof. The squared error formula (10) can be expanded as

    ||e^{f0}_{n+1}||² = ||f||² + ||f0||² − 2⟨f, f0⟩ − Σ_{i=0}^n ( ⟨f, e_i⟩ − ⟨e_i, f0⟩ )²
                     = ||f||² + c²(b − a) − 2c⟨f, 1_{[a,b]}⟩ − Σ_{i=0}^n ( ⟨f, e_i⟩² + c²⟨e_i, 1_{[a,b]}⟩² − 2c⟨f, e_i⟩⟨e_i, 1_{[a,b]}⟩ ).

Differentiating the squared error formula with respect to c and equating the derivative to zero, one obtains (11).

Along with the theorem, we suggest a two-step approximation: first f0 is calculated, and then the function f − f0 is approximated in the usual way.

Remark 1. One may notice that in many applications c_0 of (11) is well approximated by

    c_0 ≈ ⟨f, 1_{[a,b]}⟩ / (b − a).    (12)

The right-hand side of (12) is the mean value of the approximated function f over the range [a, b]. A usual choice of [a, b] is the actual range of the function argument.


    4.2 Basis Selection

In a typically stated approximation problem, there is a function f to be approximated and a basis of approximation {e_0, e_1, ..., e_n}. We look for the function's expansion coefficients related to the basis functions.

The problem may, however, be reformulated in the following way: let us search for any n+1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest error of approximation. In practice this can easily be done. Since for any orthonormal basis an indicator of the error reduction associated with the basis function e_i is |w_i| = |⟨f, e_i⟩|, one may calculate sufficiently many coefficients and order them by magnitude.
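A sketch of this selection, under the same grid conventions as above: compute many candidate coefficients, then keep the m basis functions with the largest |w_i|.

```python
import numpy as np

def select_basis(f, basis, t, m):
    """Keep the m basis functions with the largest |w_i| = |<f, e_i>|."""
    dt = t[1] - t[0]
    w = basis @ f * dt
    keep = np.sort(np.argsort(-np.abs(w))[:m])   # indices of the m largest |w_i|
    return keep, w[keep]
```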