Lecture Notes in Computer Science 5163
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
-
Věra Kůrková
Roman Neruda
Jan Koutník (Eds.)
Artificial Neural Networks – ICANN 2008
18th International Conference
Prague, Czech Republic, September 3-6, 2008
Proceedings, Part I
-
Volume Editors
Věra Kůrková, Roman Neruda
Institute of Computer Science
Academy of Sciences of the Czech Republic
Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic
E-mail: {vera, roman}@cs.cas.cz

Jan Koutník
Department of Computer Science
Czech Technical University in Prague
Karlovo nám. 13, 121 35 Prague 2, Czech Republic
E-mail: [email protected]
Library of Congress Control Number: 2008934470
CR Subject Classification (1998): F.1, I.2, I.5, I.4, G.3, J.3, C.2.1, C.1.3
LNCS Sublibrary: SL 1 Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-540-87535-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-87535-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12520565 06/3180 5 4 3 2 1 0
-
Preface
This volume is the first part of the two-volume proceedings of the 18th International Conference on Artificial Neural Networks (ICANN 2008), held September 3–6, 2008 in Prague, Czech Republic. The ICANN conferences are annual meetings supervised by the European Neural Network Society, in cooperation with the International Neural Network Society and the Japanese Neural Network Society. This series of conferences has been held since 1991 in various European countries and covers the field of neurocomputing and related areas. In 2008, the ICANN conference was organized by the Institute of Computer Science, Academy of Sciences of the Czech Republic, together with the Department of Computer Science and Engineering of the Faculty of Electrical Engineering of the Czech Technical University in Prague. Over 300 papers were submitted to the regular sessions, two special sessions and two workshops. The Program Committee selected about 200 papers after a thorough peer-review process; they are published in the two volumes of these proceedings. The large number, variety of topics and high quality of submitted papers reflect the vitality of the field of artificial neural networks.
The first volume contains papers on the mathematical theory of neurocomputing, learning algorithms, kernel methods, statistical learning and ensemble techniques, support vector machines, reinforcement learning, evolutionary computing, hybrid systems, self-organization, control and robotics, signal and time series processing, and image processing.
The second volume is devoted to pattern recognition and data analysis, hardware and embedded systems, computational neuroscience, connectionistic cognitive science, neuroinformatics and neural dynamics. It also contains papers from two special sessions, Coupling, Synchronies, and Firing Patterns: From Cognition to Disease, and Constructive Neural Networks, and two workshops, New Trends in Self-Organization and Optimization of Artificial Neural Networks, and Adaptive Mechanisms of the Perception-Action Cycle.
It is our pleasure to express our gratitude to everyone who contributed in any way to the success of the event and the completion of these proceedings. In particular, we thank the members of the Board of the ENNS who uphold the tradition of the series and helped with the organization. With deep gratitude we thank all the members of the Program Committee and the reviewers for their great effort in the reviewing process. We are very grateful to the members of the Organizing Committee whose hard work made the vision of the 18th ICANN a reality. Zdeněk Buk and Eva Pospíšilová and the entire Computational Intelligence Group at the Czech Technical University in Prague deserve special thanks for preparing the conference proceedings. We thank Miroslav Čepek for the conference website administration. We thank Milena Zeithamlová and the Action M Agency for perfect local arrangements. We also thank Alfred Hofmann, Ursula
Barth, Anna Kramer and Peter Strasser from Springer for their help with this demanding publication project. Last but not least, we thank all authors who contributed to this volume for sharing their new ideas and results with the community of researchers in this rapidly developing field of biologically motivated computer science. We hope that you enjoy reading and find inspiration for your future work in the papers contained in these two volumes.
June 2008

Věra Kůrková
Roman Neruda
Jan Koutník
-
Organization
Conference Chairs
General Chair: Věra Kůrková, Academy of Sciences of the Czech Republic, Czech Republic
Co-Chairs: Roman Neruda, Academy of Sciences of the Czech Republic, Czech Republic
Jan Koutník, Czech Technical University in Prague, Czech Republic
Milena Zeithamlová, Action M Agency, Czech Republic
Honorary Chair: John Taylor, King's College London, UK
Program Committee
Wlodzislaw Duch, Nicolaus Copernicus University in Torun, Poland
Luis Alexandre, University of Beira Interior, Portugal
Bruno Apolloni, Università degli Studi di Milano, Italy
Timo Honkela, Helsinki University of Technology, Finland
Stefanos Kollias, National Technical University in Athens, Greece
Thomas Martinetz, University of Lübeck, Germany
Guenter Palm, University of Ulm, Germany
Alessandro Sperduti, Università degli Studi di Padova, Italy
Michel Verleysen, Université catholique de Louvain, Belgium
Alessandro E.P. Villa, Université Joseph Fourier, Grenoble, France
Stefan Wermter, University of Sunderland, UK
Rudolf Albrecht, University of Innsbruck, Austria
Peter Andras, Newcastle University, UK
Gabriela Andrejkova, P.J. Safarik University in Kosice, Slovakia
Bartlomiej Beliczynski, Warsaw University of Technology, Poland
Monica Bianchini, Università degli Studi di Siena, Italy
Andrej Dobnikar, University of Ljubljana, Slovenia
Jose R. Dorronsoro, Universidad Autónoma de Madrid, Spain
Peter Erdi, Hungarian Academy of Sciences, Hungary
Marco Gori, Università degli Studi di Siena, Italy
Barbara Hammer, University of Osnabrück, Germany
Tom Heskes, Radboud University Nijmegen, The Netherlands
Yoshifusa Ito, Aichi-Gakuin University, Japan
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Paul C. Kainen, Georgetown University, USA
Mikko Kolehmainen, University of Kuopio, Finland
Pavel Kordík, Czech Technical University in Prague, Czech Republic
Vladimír Kvasnička, Slovak University of Technology in Bratislava, Slovakia
Danilo P. Mandic, Imperial College, UK
Erkki Oja, Helsinki University of Technology, Finland
David Pearson, Université Jean Monnet, Saint-Étienne, France
Lionel Prevost, Université Pierre et Marie Curie, Paris, France
Bernardete Ribeiro, University of Coimbra, Portugal
Leszek Rutkowski, Czestochowa University of Technology, Poland
Marcello Sanguineti, University of Genova, Italy
Katerina Schindler, Austrian Academy of Sciences, Austria
Juergen Schmidhuber, TU Munich (Germany) and IDSIA (Switzerland)
Jiří Šíma, Academy of Sciences of the Czech Republic, Czech Republic
Peter Sincak, Technical University in Kosice, Slovakia
Miroslav Skrbek, Czech Technical University in Prague, Czech Republic
Johan Suykens, Katholieke Universiteit Leuven, Belgium
Miroslav Šnorek, Czech Technical University in Prague, Czech Republic
Ryszard Tadeusiewicz, AGH University of Science and Technology, Poland
Local Organizing Committee
Zdeněk Buk, Czech Technical University in Prague
Miroslav Čepek, Czech Technical University in Prague
Jan Drchal, Czech Technical University in Prague
Paul C. Kainen, Georgetown University
Oleg Kovařík, Czech Technical University in Prague
Rudolf Marek, Czech Technical University in Prague
Ales Pilny, Czech Technical University in Prague
Eva Pospíšilová, Academy of Sciences of the Czech Republic
Tomas Siegl, Czech Technical University in Prague
Referees
S. Abe, R. Adamczak, R. Albrecht, E. Alhoniemi, R. Andonie, G. Angelini, D. Anguita, C. Angulo-Bahon, C. Archambeau, M. Atencia, P. Aubrecht, Y. Avrithis, L. Benuskova, T. Beran, Z. Buk, G. Cawley, M. Čepek, E. Corchado, V. Cutsuridis, E. Dominguez, G. Dounias, J. Drchal, D.A. Elizondo, H. Erwin, Z. Fabian, A. Flanagan, L. Franco, D. Francois, C. Fyfe, N. García-Pedrajas, G. Gnecco, B. Gosselin, J. Grim, R. Haschke, M. Holena, J. Hollmen, T. David Huang, D. Husek, A. Hussain, M. Chetouani, C. Igel, G. Indiveri, S. Ishii, H. Izumi, J.M. Jerez, M. Jirina, M. Jirina Jr., K.T. Kalveram, K. Karpouzis, S. Kasderidis, M. Koskela, J. Kubalík, M. Kulich, F.J. Kurfess, M. Kurzynski, J. Laaksonen, E. Lang, K. Leiviska, L. Lhotska, A. Likas, C. Loizou, R. Marek, E. Marchiori, M.A. Martín-Merino, V. di Massa, F. Masulli, J. Mandziuk, S. Melacci, A. Micheli, F. Moutarde, R. Cristian Muresan, M. Nakayama, M. Navara, D. Novak, M. Olteanu, D. Ortiz Boyer, H. Paugam-Moisy, K. Pelckmans, G. Peters, P. Pošík, D. Polani, M. Porrmann, A. Pucci, A. Raouzaiou, K. Rapantzikos, M. Rocha, A. Romariz, F. Rossi, L. Sarti, B. Schrauwen, F. Schwenker, O. Simula, A. Skodras, S. Slusny, A. Stafylopatis, J. Stastny, D. Stefka, G. Stoilos, A. Suarez, E. Trentin, N. Tsapatsoulis, P. Vidnerova, T. Villmann, Z. Vomlel, T. Wennekers, P. Wira, B. Wyns, Z. Yang, F. Zelezny
-
Table of Contents Part I
Mathematical Theory of Neurocomputing
Dimension Reduction for Mixtures of Exponential Families . . . 1
Shotaro Akaho

Several Enhancements to Hermite-Based Approximation of One-Variable Functions . . . 11
Bartlomiej Beliczynski and Bernardete Ribeiro

Multi-category Bayesian Decision by Neural Networks . . . 21
Yoshifusa Ito, Cidambi Srinivasan, and Hiroyuki Izumi

Estimates of Network Complexity and Integral Representations . . . 31
Paul C. Kainen and Věra Kůrková

Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios . . . 41
Sascha Klement, Amir Madany Mamlouk, and Thomas Martinetz

Generalization of Concave and Convex Decomposition in Kikuchi Free Energy . . . 51
Yu Nishiyama and Sumio Watanabe

Analysis of Chaotic Dynamics Using Measures of the Complex Network Theory . . . 61
Yutaka Shimada, Takayuki Kimura, and Tohru Ikeguchi

Global Dynamics of Finite Cellular Automata . . . 71
Martin Schüle, Thomas Ott, and Ruedi Stoop
Learning Algorithms
Semi-supervised Learning of Tree-Structured RBF Networks Using Co-training . . . 79
Mohamed F. Abdel Hady, Friedhelm Schwenker, and Günther Palm

A New Type of ART2 Architecture and Application to Color Image Segmentation . . . 89
Jiaoyan Ai, Brian Funt, and Lilong Shi

BICA: A Boolean Independent Component Analysis Approach . . . 99
Bruno Apolloni, Simone Bassis, and Andrea Brega

Improving the Learning Speed in 2-Layered LSTM Network by Estimating the Configuration of Hidden Units and Optimizing Weights Initialization . . . 109
Debora C. Correa, Alexandre L.M. Levada, and Jose H. Saito

Manifold Construction Using the Multilayer Perceptron . . . 119
Wei-Chen Cheng and Cheng-Yuan Liou

Improving Performance of a Binary Classifier by Training Set Selection . . . 128
Cezary Dendek and Jacek Mandziuk

An Overcomplete ICA Algorithm by InfoMax and InfoMin . . . 136
Yoshitatsu Matsuda and Kazunori Yamaguchi

OP-ELM: Theory, Experiments and a Toolbox . . . 145
Yoan Miche, Antti Sorjamaa, and Amaury Lendasse

Robust Nonparametric Probability Density Estimation by Soft Clustering . . . 155
Ezequiel Lopez-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, Domingo Lopez-Rodriguez, and Maria del Carmen Vargas-Gonzalez

Natural Conjugate Gradient on Complex Flag Manifolds for Complex Independent Subspace Analysis . . . 165
Yasunori Nishimori, Shotaro Akaho, and Mark D. Plumbley

Quadratically Constrained Quadratic Programming for Subspace Selection in Kernel Regression Estimation . . . 175
Marco Signoretto, Kristiaan Pelckmans, and Johan A.K. Suykens

The Influence of the Risk Functional in Data Classification with MLPs . . . 185
Luís M. Silva, Mark Embrechts, Jorge M. Santos, and Joaquim Marques de Sá

Nonnegative Least Squares Learning for the Random Neural Network . . . 195
Stelios Timotheou
Kernel Methods, Statistical Learning, and Ensemble Techniques
Sparse Bayes Machines for Binary Classification . . . 205
Daniel Hernandez-Lobato

Tikhonov Regularization Parameter in Reproducing Kernel Hilbert Spaces with Respect to the Sensitivity of the Solution . . . 215
Katerina Hlavackova-Schindler

Mixture of Expert Used to Learn Game Play . . . 225
Peter Lacko and Vladimír Kvasnička

Unsupervised Bayesian Network Learning for Object Recognition in Image Sequences . . . 235
Daniel Oberhoff and Marina Kolesnik

Using Feature Distribution Methods in Ensemble Systems Combined by Fusion and Selection-Based Methods . . . 245
Laura E.A. Santana, Anne M.P. Canuto, and Joao C. Xavier Jr.

Bayesian Ying-Yang Learning on Orthogonal Binary Factor Analysis . . . 255
Ke Sun and Lei Xu

A Comparative Study on Data Smoothing Regularization for Local Factor Analysis . . . 265
Shikui Tu, Lei Shi, and Lei Xu

Adding Diversity in Ensembles of Neural Networks by Reordering the Training Set . . . 275
Joaquín Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo

New Results on Combination Methods for Boosting Ensembles . . . 285
Joaquín Torres-Sospedra, Carlos Hernandez-Espinosa, and Mercedes Fernandez-Redondo
Support Vector Machines
Batch Support Vector Training Based on Exact Incremental Training . . . 295
Shigeo Abe

A Kernel Method for the Optimization of the Margin Distribution . . . 305
Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti

A 4-Vector MDM Algorithm for Support Vector Training . . . 315
Alvaro Barbero, Jorge Lopez, and Jose R. Dorronsoro

Implementation Issues of an Incremental and Decremental SVM . . . 325
Honorius Galmeanu and Razvan Andonie

Online Clustering of Non-stationary Data Using Incremental and Decremental SVM . . . 336
Khaled Boukharouba and Stephane Lecoeuche

Support Vector Machines for Visualization and Dimensionality Reduction . . . 346
Tomasz Maszczyk and Wlodzislaw Duch
Reinforcement Learning
Multigrid Reinforcement Learning with Reward Shaping . . . 357
Marek Grzes and Daniel Kudenko

Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments . . . 367
Yu Hiei, Takeshi Mori, and Shin Ishii

Robust Population Coding in Free-Energy-Based Reinforcement Learning . . . 377
Makoto Otsuka, Junichiro Yoshimoto, and Kenji Doya

Policy Gradients with Parameter-Based Exploration for Control . . . 387
Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, and Jürgen Schmidhuber

A Continuous Internal-State Controller for Partially Observable Markov Decision Processes . . . 397
Yuki Taniguchi, Takeshi Mori, and Shin Ishii

Episodic Reinforcement Learning by Logistic Reward-Weighted Regression . . . 407
Daan Wierstra, Tom Schaul, Jan Peters, and Juergen Schmidhuber

Error-Entropy Minimization for Dynamical Systems Modeling . . . 417
Jernej Zupanc
Evolutionary Computing
Hybrid Evolution of Heterogeneous Neural Networks . . . 426
Zdeněk Buk and Miroslav Šnorek

Ant Colony Optimization with Castes . . . 435
Oleg Kovařík and Miroslav Skrbek

Neural Network Ensembles for Classification Problems Using Multiobjective Genetic Algorithms . . . 443
David Lahoz and Pedro Mateo

Analysis of Vestibular-Ocular Reflex by Evolutionary Framework . . . 452
Daniel Novak, Ales Pilny, Pavel Kordík, Stefan Holiga, Petr Pošík, R. Cerny, and Richard Brzezny

Fetal Weight Prediction Models: Standard Techniques or Computational Intelligence Methods? . . . 462
Tomas Siegl, Pavel Kordík, Miroslav Šnorek, and Pavel Calda

Evolutionary Canonical Particle Swarm Optimizer – A Proposal of Meta-optimization in Model Selection . . . 472
Hong Zhang and Masumi Ishikawa
Hybrid Systems
Building Localized Basis Function Networks Using Context Dependent Clustering . . . 482
Marcin Blachnik and Wlodzislaw Duch

Adaptation of Connectionist Weighted Fuzzy Logic Programs with Kripke-Kleene Semantics . . . 492
Alexandros Chortaras, Giorgos Stamou, Andreas Stafylopatis, and Stefanos Kollias

Neuro-fuzzy System for Road Signs Recognition . . . 503
Boguslaw Cyganek

Neuro-inspired Speech Recognition with Recurrent Spiking Neurons . . . 513
Arfan Ghani, T. Martin McGinnity, Liam P. Maguire, and Jim Harkin

Predicting the Performance of Learning Algorithms Using Support Vector Machines as Meta-regressors . . . 523
Silvio B. Guerra, Ricardo B.C. Prudencio, and Teresa B. Ludermir

Municipal Creditworthiness Modelling by Kohonen's Self-organizing Feature Maps and Fuzzy Logic Neural Networks . . . 533
Petr Hajek and Vladimir Olej

Implementing Boolean Matrix Factorization . . . 543
Roman Neruda, Vaclav Snasel, Jan Platos, Pavel Kromer, Dusan Husek, and Alexander A. Frolov

Application of Potts-Model Perceptron for Binary Patterns Identification . . . 553
Vladimir Kryzhanovsky, Boris Kryzhanovsky, and Anatoly Fonarev

Using ARTMAP-Based Ensemble Systems Designed by Three Variants of Boosting . . . 562
Araken de Medeiros Santos and Anne Magaly de Paula Canuto
Self-organization
Matrix Learning for Topographic Neural Maps . . . 572
Banchar Arnonkijpanich, Barbara Hammer, Alexander Hasenfuss, and Chidchanok Lursinsap

Clustering Quality and Topology Preservation in Fast Learning SOMs . . . 583
Antonino Fiannaca, Giuseppe Di Fatta, Salvatore Gaglio, Riccardo Rizzo, and Alfonso Urso

Enhancing Topology Preservation during Neural Field Development via Wiring Length Minimization . . . 593
Claudius Glaser, Frank Joublin, and Christian Goerick

Adaptive Translation: Finding Interlingual Mappings Using Self-organizing Maps . . . 603
Timo Honkela, Sami Virpioja, and Jaakko Väyrynen

Self-Organizing Neural Grove: Efficient Multiple Classifier System with Pruned Self-Generating Neural Trees . . . 613
Hirotaka Inoue

Self-organized Complex Neural Networks through Nonlinear Temporally Asymmetric Hebbian Plasticity . . . 623
Hideyuki Kato and Tohru Ikeguchi

Temporal Hebbian Self-Organizing Map for Sequences . . . 632
Jan Koutník and Miroslav Šnorek

FLSOM with Different Rates for Classification in Imbalanced Datasets . . . 642
Ivan Machon-Gonzalez and Hilario Lopez-García

A Self-organizing Neural System for Background and Foreground Modeling . . . 652
Lucia Maddalena and Alfredo Petrosino

Analyzing the Behavior of the SOM through Wavelet Decomposition of Time Series Generated during Its Execution . . . 662
Víctor Mireles and Antonio Neme

Decreasing Neighborhood Revisited in Self-Organizing Maps . . . 671
Antonio Neme, Elizabeth Chavez, Alejandra Cervera, and Víctor Mireles

A New GHSOM Model Applied to Network Security . . . 680
Esteban J. Palomo, Enrique Domínguez, Rafael Marcos Luque, and Jose Muñoz

Reduction of Visual Information in Neural Network Learning Visualization . . . 690
Matus Uzak, Rudolf Jaksa, and Peter Sincak
Control and Robotics
Heuristics-Based High-Level Strategy for Multi-agent Systems . . . 700
Peter Gasztonyi and Istvan Harmati

Echo State Networks for Online Prediction of Movement Data – Comparing Investigations . . . 710
Sven Hellbach, Sören Strauss, Julian P. Eggert, Edgar Körner, and Horst-Michael Gross

Comparison of RBF Network Learning and Reinforcement Learning on the Maze Exploration Problem . . . 720
Stanislav Slusny, Roman Neruda, and Petra Vidnerova

Modular Neural Networks for Model-Free Behavioral Learning . . . 730
Johane Takeuchi, Osamu Shouno, and Hiroshi Tsujino

From Exploration to Planning . . . 740
Cornelius Weber and Jochen Triesch
Signal and Time Series Processing
Sentence-Level Evaluation Using Co-occurrences of N-Grams . . . 750
Theologos Athanaselis, Stelios Bakamidis, Konstantinos Mamouras, and Ioannis Dologlou

Identifying Single Source Data for Mixing Matrix Estimation in Instantaneous Blind Source Separation . . . 759
Pau Bofill

ECG Signal Classification Using GAME Neural Network and Its Comparison to Other Classifiers . . . 768
Miroslav Čepek, Miroslav Šnorek, and Vaclav Chudacek

Predictive Modeling with Echo State Networks . . . 778
Michal Čerňanský and Peter Tiňo

Sparse Coding Neural Gas for the Separation of Noisy Overcomplete Sources . . . 788
Kai Labusch, Erhardt Barth, and Thomas Martinetz

Mutual Information Based Input Variable Selection Algorithm and Wavelet Neural Network for Time Series Prediction . . . 798
Rashidi Khazaee Parviz, Mozayani Nasser, and M.R. Jahed Motlagh

Stable Output Feedback in Reservoir Computing Using Ridge Regression . . . 808
Francis Wyffels, Benjamin Schrauwen, and Dirk Stroobandt
Image Processing
Spatio-temporal Summarizing Method of Periodic Image Sequences with Kohonen Maps . . . 818
Mohamed Berkane, Patrick Clarysse, and Isabelle E. Magnin

Image Classification by Histogram Features Created with Learning Vector Quantization . . . 827
Marcin Blachnik and Jorma Laaksonen

A Statistical Model for Histogram Refinement . . . 837
Nizar Bouguila and Walid ElGuebaly

Efficient Video Shot Summarization Using an Enhanced Spectral Clustering Approach . . . 847
Vasileios Chasanis, Aristidis Likas, and Nikolaos Galatsanos

Surface Reconstruction Techniques Using Neural Networks to Recover Noisy 3D Scenes . . . 857
David Elizondo, Shang-Ming Zhou, and Charalambos Chrysostomou

A Spatio-temporal Extension of the SUSAN-Filter . . . 867
Benedikt Kaiser and Gunther Heidemann

A Neighborhood-Based Competitive Network for Video Segmentation and Object Detection . . . 877
Rafael Marcos Luque Baena, Enrique Dominguez, Domingo Lopez-Rodriguez, and Esteban J. Palomo

A Hierarchic Method for Footprint Segmentation Based on SOM . . . 887
Marco Mora Cofre, Ruben Valenzuela, and Girma Berhe

Co-occurrence Matrixes for the Quality Assessment of Coded Images . . . 897
Judith Redi, Paolo Gastaldo, Rodolfo Zunino, and Ingrid Heynderickx

Semantic Adaptation of Neural Network Classifiers in Image Segmentation . . . 907
Nikolaos Simou, Thanos Athanasiadis, Stefanos Kollias, Giorgos Stamou, and Andreas Stafylopatis

Partially Monotone Networks Applied to Breast Cancer Detection on Mammograms . . . 917
Marina Velikova, Hennie Daniels, and Maurice Samulski
Image Processing Recognition Systems
A Neuro-fuzzy Approach to User Attention Recognition . . . 927
Stylianos Asteriadis, Kostas Karpouzis, and Stefanos Kollias

TriangleVision: A Toy Visual System . . . 937
Thomas Bangert

Face Recognition with VG-RAM Weightless Neural Networks . . . 951
Alberto F. De Souza, Claudine Badue, Felipe Pedroni, Elias Oliveira, Stiven Schwanz Dias, Hallysson Oliveira, and Soterio Ferreira de Souza

Invariant Object Recognition with Slow Feature Analysis . . . 961
Mathias Franzius, Niko Wilbert, and Laurenz Wiskott

Analysis-by-Synthesis by Learning to Invert Generative Black Boxes . . . 971
Vinod Nair, Josh Susskind, and Geoffrey E. Hinton

A Bio-inspired Connectionist Architecture for Visual Classification of Moving Objects . . . 982
Pedro L. Sanchez Orellana and Claudio Castellanos Sanchez

A Visual Object Recognition System Invariant to Scale and Rotation . . . 991
Yasuomi D. Sato, Jenia Jitsev, and Christoph von der Malsburg

Recognizing Facial Expressions: A Comparison of Computational Approaches . . . 1001
Aruna Shenoy, Tim M. Gale, Neil Davey, Bruce Christiansen, and Ray Frank

A Probabilistic Prediction Method for Object Contour Tracking . . . 1011
Daniel Weiler, Volker Willert, and Julian Eggert

Author Index . . . 1021
-
Table of Contents Part II
Pattern Recognition and Data Analysis
Investigating Similarity of Ontology Instances and Its Causes . . . 1
Anton Andrejko and Maria Bielikova

A Neural Model for Delay Correction in a Distributed Control System . . . 11
Ana Antunes, Fernando Morgado Dias, and Alexandre Mota

A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets . . . 21
Gianluca Bontempi and Patrick E. Meyer

Non-stationary Data Mining: The Network Security Issue . . . 32
Sergio Decherchi, Paolo Gastaldo, Judith Redi, and Rodolfo Zunino

Efficient Feature Selection for PTR-MS Fingerprinting of Agroindustrial Products . . . 42
Pablo M. Granitto, Franco Biasioli, Cesare Furlanello, and Flavia Gasperi

Extraction of Binary Features by Probabilistic Neural Networks . . . 52
Jiří Grim

Correlation Integral Decomposition for Classification . . . 62
Marcel Jirina and Marcel Jirina Jr.

Modified q-State Potts Model with Binarized Synaptic Coefficients . . . 72
Vladimir Kryzhanovsky

Learning Similarity Measures from Pairwise Constraints with Neural Networks . . . 81
Marco Maggini, Stefano Melacci, and Lorenzo Sarti

Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines . . . 91
Yi Sun, Mark Robinson, Rod Adams, Alistair Rust, and Neil Davey

Mimicking Go Experts with Convolutional Neural Networks . . . 101
Ilya Sutskever and Vinod Nair

Associative Memories Applied to Pattern Recognition . . . 111
Roberto A. Vazquez and Humberto Sossa
MLP-Based Detection of Targets in Clutter: Robustness with Respect to the Shape Parameter of Weibull-Distributed Clutter . . . 121
Raul Vicen-Bueno, Eduardo Galan-Fernandez, Manuel Rosa-Zurera, and Maria P. Jarabo-Amores
Hardware, Embedded Systems
Modeling and Synthesis of Computational Efficient AdaptiveNeuro-Fuzzy Systems Based on Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Guillermo Bosque, Javier Echanobe, Ines del Campo, and
Jose M. Tarela
Embedded Neural Network for Swarm Learning of Physical Robots . . . . . 141Pitoyo Hartono and Sachiko Kakita
Distribution Stream of Tasks in Dual-Processor System . . . . . . . . . . . . . . . 150Michael Kryzhanovsky and Magomed Malsagov
Efficient Implementation of the THSOM Neural Network . . . . . . . . . . . . . . 159Rudolf Marek and Miroslav Skrbek
Reconfigurable MAC-Based Architecture for Parallel HardwareImplementation on FPGAs of Artificial Neural Networks . . . . . . . . . . . . . . 169
Nadia Nedjah, Rodrigo Martins da Silva,
Luiza de Macedo Mourelle, and Marcus Vinicius Carvalho da Silva
Implementation of Central Pattern Generator in an FPGA-BasedEmbedded System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Cesar Torres-Huitzil and Bernard Girau
Biologically-Inspired Digital Architecture for a Cortical Model ofOrientation Selectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Cesar Torres-Huitzil, Bernard Girau, and Miguel Arias-Estrada
Neural Network Training with Extended Kalman Filter Using GraphicsProcessing Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
Peter Trebaticky and Jir Pospchal
Blind Source-Separation in Mixed-Signal VLSI Using the InfoMaxAlgorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Waldo Valenzuela, Gonzalo Carvajal, and Miguel Figueroa
Computational Neuroscience
Synaptic Rewiring for Topographic Map Formation . . . 218
Simeon A. Bamford, Alan F. Murray, and David J. Willshaw

Implementing Bayes Rule with Neural Fields . . . 228
Raymond H. Cuijpers and Wolfram Erlhagen
Encoding and Retrieval in a CA1 Microcircuit Model of the Hippocampus . . . 238
Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

A Bio-inspired Architecture of an Active Visual Search Model . . . 248
Vassilis Cutsuridis

Implementing Fuzzy Reasoning on a Spiking Neural Network . . . 258
Cornelius Glackin, Liam McDaid, Liam Maguire, and Heather Sayers

Short Term Plasticity Provides Temporal Filtering at Chemical Synapses . . . 268
Bruce P. Graham and Christian Stricker

Observational Versus Trial and Error Effects in a Model of an Infant Learning Paradigm . . . 277
Matthew Hartley, Jacqueline Fagard, Rana Esseily, and John Taylor

Modeling the Effects of Dopamine on the Antisaccade Reaction Times (aSRT) of Schizophrenia Patients . . . 290
Ioannis Kahramanoglou, Stavros Perantonis, Nikolaos Smyrnis, Ioannis Evdokimidis, and Vassilis Cutsuridis

Fast Multi-command SSVEP Brain Machine Interface without Training . . . 300
Pablo Martinez Vasquez, Hovagim Bakardjian, Montserrat Vallverdu, and Andrzej Cichocki

Separating Global Motion Components in Transparent Visual Stimuli – A Phenomenological Analysis . . . 308
Andrew Meso and Johannes M. Zanker

Lateral Excitation between Dissimilar Orientation Columns for Ongoing Subthreshold Membrane Oscillations in Primary Visual Cortex . . . 318
Yuto Nakamura, Kazuhiro Tsuboi, and Osamu Hoshino

A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behaviour . . . 328
N. Serap Sengor, Ozkan Karabacak, and Ulrich Steinmetz

Firing Pattern Estimation of Synaptically Coupled Hindmarsh-Rose Neurons by Adaptive Observer . . . 338
Yusuke Totoki, Kouichi Mitsunaga, Haruo Suemitsu, and Takami Matsuo

Global Oscillations of Neural Fields in CA3 . . . 348
Francesco Ventriglia
Connectionistic Cognitive Science
Selective Attention Model of Moving Objects . . . 358
Roman Borisyuk, David Chik, and Yakov Kazanovich

Tempotron-Like Learning with ReSuMe . . . 368
Razvan V. Florian

Neural Network Capable of Amodal Completion . . . 376
Kunihiko Fukushima

Predictive Coding in Cortical Microcircuits . . . 386
Andreea Lazar, Gordon Pipa, and Jochen Triesch

A Biologically Inspired Spiking Neural Network for Sound Localisation by the Inferior Colliculus . . . 396
Jindong Liu, Harry Erwin, Stefan Wermter, and Mahmoud Elsaid

Learning Structurally Analogous Tasks . . . 406
Paul W. Munro

Auto-structure of Presynaptic Activity Defines Postsynaptic Firing Statistics and Can Modulate STDP-Based Structure Formation and Learning . . . 413
Gordon Pipa, Raul Vicente, and Alexander Tikhonov

Decision Making Logic of Visual Brain . . . 423
Andrzej W. Przybyszewski

A Computational Model of Saliency Map Read-Out During Visual Search . . . 433
Mia Setic and Drazen Domijan

A Corpus-Based Computational Model of Metaphor Understanding Incorporating Dynamic Interaction . . . 443
Asuka Terai and Masanori Nakagawa

Deterministic Coincidence Detection and Adaptation Via Delayed Inputs . . . 453
Zhijun Yang, Alan Murray, and Juan Huo

Synaptic Formation Rate as a Control Parameter in a Model for the Ontogenesis of Retinotopy . . . 462
Junmei Zhu
Neuroinformatics
Fuzzy Symbolic Dynamics for Neurodynamical Systems . . . 471
Krzysztof Dobosz and Wlodzislaw Duch
Towards Personalized Neural Networks for Epileptic Seizure Prediction . . . 479
Antonio Dourado, Ricardo Martins, Joao Duarte, and Bruno Direito

Real and Modeled Spike Trains: Where Do They Meet? . . . 488
Vasile V. Moca, Danko Nikolic, and Raul C. Muresan

The InfoPhase Method or How to Read Neurons with Neurons . . . 498
Raul C. Muresan, Wolf Singer, and Danko Nikolic

Artifact Processor for Neuronal Activity Analysis during Deep Brain Stimulation . . . 508
Dimitri V. Nowicki, Brigitte Piallat, Alim-Louis Benabid, and Tatiana I. Aksenova

Analysis of Human Brain NMR Spectra in Vivo Using Artificial Neural Networks . . . 517
Erik Saudek, Daniel Novak, Dita Wagnerova, and Milan Hajek

Multi-stage FCM-Based Intensity Inhomogeneity Correction for MR Brain Image Segmentation . . . 527
Laszlo Szilagyi, Sandor M. Szilagyi, Laszlo David, and Zoltan Benyo

KCMAC: A Novel Fuzzy Cerebellar Model for Medical Decision Support . . . 537
S.D. Teddy

Decoding Population Neuronal Responses by Topological Clustering . . . 547
Hujun Yin, Stefano Panzeri, Zareen Mehboob, and Mathew Diamond
Neural Dynamics
Learning of Neural Information Routing for Correspondence Finding . . . 557
Jan D. Bouecke and Jorg Lucke

A Globally Asymptotically Stable Plasticity Rule for Firing Rate Homeostasis . . . 567
Prashant Joshi and Jochen Triesch

Analysis and Visualization of the Dynamics of Recurrent Neural Networks for Symbolic Sequences Processing . . . 577
Matej Makula and Lubica Benuskova

Chaotic Search for Traveling Salesman Problems by Using 2-opt and Or-opt Algorithms . . . 587
Takafumi Matsuura and Tohru Ikeguchi

Comparison of Neural Networks Incorporating Partial Monotonicity by Structure . . . 597
Alexey Minin and Bernhard Lang
Special Session: Coupling, Synchronies and Firing Patterns: from Cognition to Disease
Effect of the Background Activity on the Reconstruction of Spike Train by Spike Pattern Detection . . . 607
Yoshiyuki Asai and Alessandro E.P. Villa

Assemblies as Phase-Locked Pattern Sets That Collectively Win the Competition for Coherence . . . 617
Thomas Burwick

A Ca2+ Dynamics Model of the STDP Symmetry-to-Asymmetry Transition in the CA1 Pyramidal Cell of the Hippocampus . . . 627
Vassilis Cutsuridis, Stuart Cobb, and Bruce P. Graham

Improving Associative Memory in a Network of Spiking Neurons . . . 636
Russell Hunter, Stuart Cobb, and Bruce P. Graham

Effect of Feedback Strength in Coupled Spiking Neural Networks . . . 646
Javier Iglesias, Jordi García-Ojalvo, and Alessandro E.P. Villa

Bifurcations in Discrete-Time Delayed Hopfield Neural Networks of Two Neurons . . . 655
Eva Kaslik and Stefan Balint

EEG Switching: Three Views from Dynamical Systems . . . 665
Carlos Lourenco

Modeling Synchronization Loss in Large-Scale Brain Dynamics . . . 675
Antonio J. Pons Rivero, Jose Luis Cantero, Mercedes Atienza, and Jordi García-Ojalvo

Spatio-temporal Dynamics during Perceptual Processing in an Oscillatory Neural Network . . . 685
A. Ravishankar Rao and Guillermo Cecchi

Resonant Spike Propagation in Coupled Neurons with Subthreshold Activity . . . 695
Belen Sancristobal, Jose M. Sancho, and Jordi García-Ojalvo

Contour Integration and Synchronization in Neuronal Networks of the Visual Cortex . . . 703
Ekkehard Ullner, Raul Vicente, Gordon Pipa, and Jordi García-Ojalvo
MBabCoNN – A Multiclass Version of a Constructive Neural Network Algorithm Based on Linear Separability and Convex Hull . . . 723
Joao Roberto Bertini Jr. and Maria do Carmo Nicoletti

On the Generalization of the m-Class RDP Neural Network . . . 734
David A. Elizondo, Juan M. Ortiz-de-Lazcano-Lobato, and Ralph Birkenhead

A Constructive Technique Based on Linear Programming for Training Switching Neural Networks . . . 744
Enrico Ferrari and Marco Muselli

Projection Pursuit Constructive Neural Networks Based on Quality of Projected Clusters . . . 754
Marek Grochowski and Wlodzislaw Duch

Introduction to Constructive and Optimization Aspects of SONN-3 . . . 763
Adrian Horzyk

A Reward-Value Based Constructive Method for the Autonomous Creation of Machine Controllers . . . 773
Andreas Huemer, David Elizondo, and Mario Gongora

A Brief Review and Comparison of Feedforward Morphological Neural Networks with Applications to Classification . . . 783
Alexandre Monteiro da Silva and Peter Sussner

Prototype Proliferation in the Growing Neural Gas Algorithm . . . 793
Hector F. Satizabal, Andres Perez-Uribe, and Marco Tomassini

Active Learning Using a Constructive Neural Network Algorithm . . . 803
Jose Luis Subirats, Leonardo Franco, Ignacio Molina Conde, and Jose M. Jerez

M-CLANN: Multi-class Concept Lattice-Based Artificial Neural Network for Supervised Classification . . . 812
Engelbert Mephu Nguifo, Norbert Tsopze, and Gilbert Tindo

Workshop: New Trends in Self-organization and Optimization of Artificial Neural Networks

A Classification Method of Children with Developmental Dysphasia Based on Disorder Speech Analysis . . . 822
Marek Bartu and Jana Tuckova

Nature Inspired Methods in the Radial Basis Function Network Learning Process . . . 829
Miroslav Bursa and Lenka Lhotska
Tree-Based Indirect Encodings for Evolutionary Development of Neural Networks . . . 839
Jan Drchal and Miroslav Snorek

Generating Complex Connectivity Structures for Large-Scale Neural Models . . . 849
Martin Hulse

The GAME Algorithm Applied to Complex Fractionated Atrial Electrograms Data Set . . . 859
Pavel Kordík, Vaclav Kremen, and Lenka Lhotska

Geometrical Perspective on Hairy Memory . . . 869
Cheng-Yuan Liou

Neural Network Based BCI by Using Orthogonal Components of Multi-channel Brain Waves and Generalization . . . 879
Kenji Nakayama, Hiroki Horita, and Akihiro Hirano

Feature Ranking Derived from Data Mining Process . . . 889
Ales Pilny, Pavel Kordík, and Miroslav Snorek

A Neural Network Approach for Learning Object Ranking . . . 899
Leonardo Rigutini, Tiziano Papini, Marco Maggini, and Monica Bianchini

Evolving Efficient Connection for the Design of Artificial Neural Networks . . . 909
Min Shi and Haifeng Wu

The Extreme Energy Ratio Criterion for EEG Feature Extraction . . . 919
Shiliang Sun

Workshop: Adaptive Mechanisms of the Perception-Action Cycle

The Schizophrenic Brain: A Broken Hermeneutic Circle . . . 929
Peter Erdi, Vaibhav Diwadkar, and Balazs Ujfalussy

Neural Model for the Visual Recognition of Goal-Directed Movements . . . 939
Falk Fleischer, Antonino Casile, and Martin A. Giese

Emergent Common Functional Principles in Control Theory and the Vertebrate Brain: A Case Study with Autonomous Vehicle Control . . . 949
Amir Hussain, Kevin Gurney, Rudwan Abdullah, and Jon Chambers

Organising the Complexity of Behaviour . . . 959
Stathis Kasderidis
Towards a Neural Model of Mental Simulation . . . 969
Matthew Hartley and John Taylor
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981
Dimension Reduction for Mixtures of
Exponential Families
Shotaro Akaho
Neuroscience Research Institute, AIST, Tsukuba 3058568, Japan
Abstract. Dimension reduction for a set of distribution parameters has been important in various applications of data mining. The exponential family PCA has been proposed for that purpose, but it cannot be directly applied to mixture models, which do not belong to an exponential family. This paper proposes a method to apply the exponential family PCA to mixture models. A key idea is to embed mixtures into the space of an exponential family. The problem is that the embedding is not unique, and the dimensionality of the parameter space is not constant when the numbers of mixture components differ. The proposed method finds a sub-optimal solution by a linear programming formulation.
1 Introduction
In many applications, dimension reduction is important for purposes such as visualization and data compression. Traditionally, principal component analysis (PCA) has been widely used as a powerful tool for dimension reduction in Euclidean space. However, data are often given as binary strings or graph structures, whose nature is very different from that of Euclidean vectors.
The approach that we take here is to regard such data as parameters of probability distributions. Information geometry [1] gives a mathematical framework for the space of probability distributions, and a dimension reduction method has been proposed for the class of exponential families [2,3,4,5]. The information geometrical approach has two main advantages over conventional methods: the information geometrical projection of a data point always lies on the support of the parameters, and the projection is defined more naturally for a distribution than the conventional Euclidean projection.
In this paper, we focus on mixture models [6], which are very flexible and are often used for clustering. However, we cannot apply the exponential family PCA to mixture models, because they are not members of an exponential family. Our main idea is to embed mixture models into the space of an exponential family. This is not straightforward, because the embedding is not unique and the dimensionality of the parameter space is not constant when the numbers of mixture components differ. These problems can be resolved by solving a combinatorial optimization problem, which is computationally intractable. Therefore, we propose a method that finds a sub-optimal solution by separating the problem into subproblems, each of which can be optimized more easily.
V. Kurkova et al. (Eds.): ICANN 2008, Part I, LNCS 5163, pp. 1–10, 2008.
© Springer-Verlag Berlin Heidelberg 2008
The proposed framework is useful not only for visualization and data compression, but also for applications that have been developed recently in the field of data mining: privacy-preserving data mining [7] and distributed data mining [8]. In distributed data mining, raw data are collected at many distributed sites. These data are not sent directly to the center; instead they are processed into statistical data at each site, in order to preserve privacy as well as to reduce communication costs, and those statistical data are then sent to the center. A similar framework has begun to be studied in the field of sensor networks [9].
2 e-PCA and m-PCA: Dual Dimension Reduction
2.1 Information Geometry of Exponential Family
In this section, we review the exponential family PCA, called e-PCA and m-PCA [4]. An exponential family is defined as a class of distributions given by

p(x; θ) = exp{ Σ_{i=1}^d θ_i F_i(x) + C(x) − ψ(θ) },   (1)

with a random variable x and a parameter θ = (θ_1, ..., θ_d). The set of all distributions p(x; θ) obtained by varying θ forms a space (manifold) S. The structure of the manifold is determined by introducing a Riemannian metric and an affine connection. The statistically natural metric is the Fisher information matrix g_jk(θ) = E[{∂ log p(x; θ)/∂θ_j}{∂ log p(x; θ)/∂θ_k}], and the natural connection is the α-connection, specified by one real-valued parameter α. In particular, α = ±1 is important, because then S becomes a flat manifold. When α = 1, the space is called e-flat¹ with respect to the affine coordinate (e-coordinate) θ. When α = −1, the exponential family is also flat with respect to another affine coordinate (m-coordinate) η = (η_1, ..., η_d) defined by η_i = E[F_i(x)]. The coordinates θ and η are dually related and transformed into each other by the Legendre transform; we write this coordinate transform as θ(η), η(θ).
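As a concrete illustration (our example, not from the text), the Bernoulli distribution p(x; θ) = exp{θx − ψ(θ)} with ψ(θ) = log(1 + e^θ) is perhaps the simplest exponential family; the sketch below shows its dual coordinate pair and the transforms θ(η), η(θ). The function names are ours, chosen for readability.

```python
import math

# Bernoulli as a one-dimensional exponential family:
#   p(x; theta) = exp(theta * x - psi(theta)),  x in {0, 1},
# with log-partition psi(theta) = log(1 + e^theta).
def psi(theta):
    return math.log(1.0 + math.exp(theta))

def eta_of_theta(theta):
    # m-coordinate: eta = E[x] = dpsi/dtheta (the sigmoid of theta)
    return 1.0 / (1.0 + math.exp(-theta))

def theta_of_eta(eta):
    # dual (inverse) Legendre transform: theta = log(eta / (1 - eta))
    return math.log(eta / (1.0 - eta))

# the two coordinate transforms are mutually inverse
for mu in (0.1, 0.5, 0.9):
    assert abs(eta_of_theta(theta_of_eta(mu)) - mu) < 1e-12
```

Analogous closed-form pairs exist for other families, e.g. the Gaussian and the multinomial; only ψ and the sufficient statistics change.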
2.2 e-PCA and m-PCA
Since the manifold of an exponential family is flat in both the e- and m-affine coordinates, there are accordingly two kinds of flat submanifolds for dimension reduction. The e-PCA (and m-PCA) is defined by finding the e-flat (m-flat) submanifold that best fits samples given as a set of points of the exponential family. Here we describe only the e-PCA, because the m-PCA is completely dual: it is obtained by exchanging e- and m- in the description of the e-PCA.
Let us define an h-dimensional e-flat subspace M. The points on M can be expressed by

θ(w; U) = Σ_{j=1}^h w_j u_j + u_0,   (2)

¹ "e" stands for exponential and "m" stands for mixture.
where U = [u_0, u_1, ..., u_h] ∈ R^{d×(h+1)} is a matrix containing the basis vectors of the subspace and w = (w_1, ..., w_h) ∈ R^h is a local coordinate on M. Suppose we have a set of parameters θ^(1), ..., θ^(n) ∈ S as sample points. For dimension reduction, we need to consider the projection of the sample points onto M, which is defined by a geodesic that is orthogonal to M with respect to the Fisher information. According to the two kinds of geodesic, we can define the e-projection and the m-projection.
Amari [1] has proved that the m-projection onto an e-flat submanifold is unique, and further that it is given by the point that minimizes the m-divergence

K_m(p, q) = ∫ p(x){ log p(x) − log q(x) } dx,   (3)

hence we take the m-projection for the e-PCA². As a cost function measuring the fit of the sample points to a submanifold, it is convenient to take the sum of the m-divergences

L(U, W) = Σ_{i=1}^n K_m( θ^(i), θ(w^(i); U) ),   (4)

where W = (w^(1), ..., w^(n)), and the e-PCA is defined by finding the U and W that minimize L(U, W). Note that even when data are given as values of a random variable instead of parameters, the random variable can be related to a parameter, so the same framework applies [10].
2.3 Alternating Gradient Descent Algorithm
Although it is difficult to optimize L(U, W) with respect to U and W simultaneously, the optimization becomes easier with an alternating procedure in which optimization is performed for one variable while the other is fixed. If we fix the basis vectors U, the projection onto an e-flat space from a sample point is unique, as mentioned above. On the other hand, optimizing U with W fixed is also an m-projection onto the e-flat subspace determined by W, which is a submanifold of the product space S^n; therefore it also has a unique solution.
In each optimization step we could apply a Newton-like method [4], but in this paper we use only a simple gradient descent. Note that whatever algorithm we use, it does not always converge to the global solution, even if each alternating step is globally optimized, as in EM and variational Bayes.
The gradient descent algorithm is given by

Δw_j^(i) = ε_w u_j^T Δη^(i),   Δu_j = ε_u Σ_{i=1}^n w_j^(i) Δη^(i),   Δu_0 = ε_u Σ_{i=1}^n Δη^(i),   (5)
² By duality, we take the e-projection for the m-PCA, and the e-divergence is defined by K_e(p, q) = K_m(q, p).
where Δη^(i) = η^(i) − η̃^(i) is the difference of m-coordinates between the sample point and the point specified by the current estimate w^(i). As a general tendency, the problem is more sensitive to U than to W, so we take the learning constants such that ε_w > ε_u.
Further, the basis U has a redundancy up to linear transformations: when U is transformed by any non-singular matrix A, the same solution is obtained by transforming W to WA^{−1}. It can also happen that two different bases u_i and u_j converge to the same direction if they are optimized by ordinary gradient descent without any constraints. Therefore, we restrict U to be an orthogonal frame (i.e., U^T U = I). Such a space is called a Grassmann manifold; optimization on the Grassmann manifold is often used for finding principal or minor components [11]. The natural gradient for U is given by

∇U_nat = ∇U − U U^T ∇U,   (6)

where ∇U is the matrix whose columns are the Δu_j in (5). Since this update rule does not preserve the orthogonality constraint strictly, we need either to orthogonalize U after each step (which we do in the experiment) or to update U along the geodesic.
2.4 e-Center and m-Center
An important special case of the e-PCA and m-PCA is a zero-dimensional subspace, which corresponds to a point. The only parameter in that case is u_0, which is given in closed form as

θ_ec = θ( (1/n) Σ_{i=1}^n η(θ^(i)) ),   θ_mc = η( (1/n) Σ_{i=1}^n θ(η^(i)) ).   (7)

We call these points the e-center and the m-center, respectively.
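For the Bernoulli family (our running example, with η(θ) the sigmoid and θ(η) the logit), the two centers reduce to "average in one coordinate system, then map back if needed":

```python
import math

def sigmoid(t):                  # eta(theta) for the Bernoulli family
    return 1.0 / (1.0 + math.exp(-t))

def logit(m):                    # theta(eta) for the Bernoulli family
    return math.log(m / (1.0 - m))

thetas = [-1.0, 0.0, 2.0]        # sample points given in the e-coordinate

# e-center: average the m-coordinates, then map back to the e-coordinate
theta_ec = logit(sum(sigmoid(t) for t in thetas) / len(thetas))

# m-center: average the e-coordinates directly
theta_mc = sum(thetas) / len(thetas)

assert theta_ec != theta_mc      # the two centers differ in general
```

That the two centers disagree for the same sample set is exactly the e-/m-duality at work.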
2.5 Properties of e-PCA and m-PCA
In this subsection, we summarize several points in which the e-PCA and m-PCA differ from ordinary PCA.
The first concerns the hierarchical relation between different dimensions. Since the e-PCA and m-PCA include a nonlinear part in their formulation, an optimal low dimensional subspace is not always included in a higher dimensional one. In some applications, hierarchical structures are necessary or convenient; in such cases, we can construct an algorithm that finds the optimal subspace by constraining the search space.
The second concerns the domain (or support) of S. The parameter set of an exponential family is a local coordinate: θ does not define a probability distribution for all values in R^d. In general, the domain forms a convex region in the e- and m-coordinate systems. It is known that the m-projection for the e-PCA is guaranteed to be included in that region. However, when we apply a gradient-type algorithm, too large a step size can push the candidate solution out of the domain. In our implementation, the candidate solution is checked for inclusion in each learning step, and the learning constant is adaptively changed in case of excess.
The third concerns the initialization problem. Since the alternating algorithm only gives a local optimum, it is important to find a good initial solution. The naive idea is to use conventional PCA with the Euclidean metric, with u_0 initialized by the e-center. However, the initialization is related to the domain problem above, i.e., the initial points have to lie in the domain region. For simplicity, we take W = 0 in our numerical simulation, which corresponds to the initial projection point always being u_0.
3 Embedding of Mixture Models
Now let us move on to our main topic: dimension reduction of mixture models. A major difficulty is that mixture models are not members of an exponential family. However, if we add a latent variable z representing which component x is generated from, then p(x, z; θ) belongs to an exponential family.
3.1 Latent Variable Model
A mixture of exponential family distributions is written as

p(x) = Σ_{i=0}^k α_i f_i(x; θ_i),   f_i(x; θ_i) = exp( θ_i^T F_i(x) − ψ_i(θ_i) ),   i = 0, ..., k.   (8)

Since the number of degrees of freedom of {α_i} is k, we regard α_1, ..., α_k as parameters and define α_0 by α_0 = 1 − Σ_{i=1}^k α_i.
When z ∈ {0, 1, 2, ..., k} is a latent variable representing which component of the mixture x is generated from, the distribution of (x, z) is an exponential family [12], as written down below.
p(x, z) = α_z f_z(x; θ_z) = exp[ Σ_{i=1}^k θ_i^T F_i(x) δ_i(z) + θ_0^T F_0(x)( 1 − Σ_{i=1}^k δ_i(z) ) + Σ_{i=1}^k β_i δ_i(z) + γ ],   (9)

where δ_i(z) = 1 when z = i, and 0 otherwise, and

β_i = log α_i − ψ_i(θ_i) − ( log α_0 − ψ_0(θ_0) ),   γ = log α_0 − ψ_0(θ_0).   (10)
The e-coordinate of this model is θ = (θ_1, ..., θ_k, θ_0, β_1, ..., β_k), and the m-coordinate consists of E[δ_i(z)] = α_i, corresponding to β_i, and E[F_i(x) δ_i(z)] = α_i η_i, corresponding to θ_i, where η_i = E[F_i(x)] is the m-coordinate of the component distribution f_i(x; θ_i).
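As a sanity check on the reparametrization (9)-(10) (our own worked example with Bernoulli components; all names are illustrative), one can verify numerically that exp(θ_z x + β_z + γ) reproduces α_z f_z(x):

```python
import math

# a mixture of three Bernoulli components (alpha_i, theta_i)
alphas = [0.5, 0.3, 0.2]
thetas = [-1.0, 0.5, 2.0]
psi = lambda t: math.log(1.0 + math.exp(t))   # Bernoulli log-partition

# Eq. (10): the extra e-coordinates introduced by the embedding
gamma = math.log(alphas[0]) - psi(thetas[0])
betas = [math.log(alphas[z]) - psi(thetas[z]) - gamma for z in (1, 2)]

for z in (1, 2):
    for x in (0.0, 1.0):
        # exponent of Eq. (9) for z >= 1 (the theta_0 term vanishes)
        lhs = math.exp(thetas[z] * x + betas[z - 1] + gamma)
        rhs = alphas[z] * math.exp(thetas[z] * x - psi(thetas[z]))
        assert abs(lhs - rhs) < 1e-12

# for z = 0 the exponent is theta_0 * x + gamma
for x in (0.0, 1.0):
    assert abs(math.exp(thetas[0] * x + gamma)
               - alphas[0] * math.exp(thetas[0] * x - psi(thetas[0]))) < 1e-12
```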
3.2 Problems of the Embedding
There are two problems with the embedding described above. The first is that the embedding is not unique, because a mixture distribution is invariant when its components are exchanged. The second arises when there are different numbers of mixture components: in that case we cannot embed the mixtures directly into one common space, because the dimensions differ.
For the first problem, we will find the embedding such that the embedded distributions are located as closely together as possible. Once the embedding is fixed, the e-PCA (or m-PCA) can be applied directly. For the second problem, we will split components to adjust the dimensions between mixtures with different numbers of components.
3.3 Embedding for the Homogeneous Mixtures
Firstly, we consider the homogeneous case, in which the numbers of components are the same for all mixtures θ^(i).
A naive way to resolve the first problem is to perform the e-PCA (or m-PCA) for every possible embedding and take the best one. However, this is not practical, because the number of possible embeddings increases exponentially with the number of components and the number of mixtures. Instead, we try to find a configuration in which the mixtures are as close together as possible. The following proposition shows that the divergence between two mixtures in the embedded space has a very simple form.
Proposition 1. Suppose there are two mixture distributions with the same number of components, and let their distributions with latent variables be

p_1(x, z) = α_z f_z(x; θ_z),   p_2(x, z) = α̃_z f_z(x; θ̃_z).   (11)

The m-divergence between p_1 and p_2 is given by

K_m(p_1, p_2) = Σ_{i=0}^k α_i [ K_m( f_i(x; θ_i), f_i(x; θ̃_i) ) + log( α_i / α̃_i ) ].   (12)
This means that the divergence separates into a sum of functions, each of which depends only on a pair of components of the two mixtures. Note that the divergence between the original mixtures has no such simple form.
Based on this fact, we can derive the optimal embedding for two mixtures that minimizes the divergence. It should be noted that the optimality is not invariant with respect to the order of p_1 and p_2, because the divergence is not a symmetric function. For the general case of n mixtures, we apply the following greedy algorithm based on the pairwise optimality.
[Embedding algorithm (for e-PCA, homogeneous)]
1. Embed \theta^{(1)} in any configuration.
2. Repeat the following for i = 2, 3, ..., n:
   (a) Let ec be the e-center of the already embedded mixtures \theta^{(j)}, j = 1, ..., i-1.
   (b) Embed \theta^{(i)} so as to minimize the m-divergence between \theta^{(i)} and ec in the embedded space (see the next subsection).
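Once a matching between components is fixed, the component-wise form (12) is cheap to evaluate. A minimal sketch in Python, assuming univariate Gaussian components for which the m-divergence K_m reduces to the ordinary KL divergence; the function names are illustrative, not the paper's:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL divergence between univariate Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2.0 * s2**2) - 0.5

def mixture_divergence(pi, comps, pi_bar, comps_bar):
    """Eq. (12): component-wise sum for two mixtures with matched components.

    pi, pi_bar   : mixing weights of the two mixtures
    comps        : list of (mean, std) per component of p1
    comps_bar    : list of (mean, std) per matched component of p2
    """
    return sum(p * (kl_gauss(*c1, *c2) + np.log(p / pb))
               for p, c1, pb, c2 in zip(pi, comps, pi_bar, comps_bar))
```

The divergence of a mixture with itself is zero, and it grows as the matched components move apart, which is exactly what the greedy embedding step minimizes.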
Dimension Reduction for Mixtures of Exponential Families 7
Fig. 1. Matching of distributions. Left: homogeneous case; the sum of the matching weights is minimized. Right: heterogeneous case; in this example, the k-th component of the left group is split and matched with two components (the 0-th and k-th) of the right group.
3.4 The Optimal Matching Solution
In this subsection, we give an optimization method to find a matching between two mixtures that minimizes the cost function (12), which is the sum of component-wise terms (left of Fig. 1).
Letting the weight values be

  \omega_{ij} = \pi_i [ K_m(f_i(x; \theta_i), f_j(x; \bar\theta_j)) + \log(\pi_i / \bar\pi_j) ],  (13)
we obtain the optimization problem as a linear program:

  \min_{a_{ij}} \sum_{i=0}^{k} \sum_{j=0}^{k} \omega_{ij} a_{ij}   s.t.  a_{ij} \ge 0,  \sum_{i=0}^{k} a_{ij} = \sum_{j=0}^{k} a_{ij} = 1.  (14)
The solution a_{ij} takes binary values (0 or 1) by the following integrality theorem.
Proposition 2 (Integrality theorem [13]). The transshipment problem

  \min_{a_{ij}} \sum_{i=0}^{k} \sum_{j=0}^{k} \omega_{ij} a_{ij}   s.t.  a_{ij} \ge 0,  \sum_{i=0}^{k} a_{ij} = s_j,  \sum_{j=0}^{k} a_{ij} = t_i

has an integer optimal solution whenever it has at least one feasible solution and all s_j, t_i are integers. In particular, the solution given by the simplex method is always integral.
3.5 General Case: Splitting Components
When the numbers of components differ between mixtures (the heterogeneous case), we can equalize them by splitting components. Splitting mixture components has played an important role in other settings as well, for example in finding the optimal number of components when fitting a mixture [14].
Let f(x; \theta) be one of the components of a mixture, with weight \pi; it can be split into k + 1 components

  \lambda_i f(x; \theta),  i = 0, ..., k,   \sum_{i=0}^{k} \lambda_i = \pi,  \lambda_i > 0.  (15)
8 S. Akaho
We need to determine two things: which component should be split, and how large the splitting weights \lambda_i should be. Since it is hard to optimize both simultaneously, we solve the problem sequentially: first we determine the component to be split using the optimal assignment problem of the previous subsection, and then we optimize the splitting weights.
3.6 Component Selection
Suppose we have two mixtures p_1 and p_2 given by (11). When their numbers of components differ (the heterogeneous case), we need a one-to-many matching. Let z = 0, 1, ..., k for p_1 and z = 0, 1, ..., k' for p_2. To find the one-to-many matching, we extend the optimization problem of the homogeneous case in a natural way:

  \min_{a_{ij}} \sum_{i=0}^{k} \sum_{j=0}^{k'} \omega_{ij} a_{ij}   s.t.  a_{ij} \ge 0,  \sum_{i=0}^{k} a_{ij} = 1,  \sum_{j=0}^{k'} a_{ij} \ge 1,  (16)
where \omega_{ij} is defined by (13), we assume p_1 has no more components than p_2 (k \le k'), and some equality constraints are replaced by inequality constraints to allow one-to-many matching (right of Fig. 1).
Note that this problem gives only a sub-optimal matching for the entire problem, because the splitting weights are not taken into account. From the computational point of view, however, the integrality property of the solution is preserved, so all a_{ij} are guaranteed to be binary; a further virtue of this formulation is that the homogeneous case is included as a special case of the heterogeneous one.
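Under our reading of the constraints in (16) — each component of the larger mixture p_2 is matched to exactly one component of p_1, and each component of p_1 is used at least once — the relaxation can be handed to an off-the-shelf LP solver and, by the integrality property, still returns a binary vertex solution. A sketch with SciPy (the cost matrix is made up for illustration, and the constraint reading is our assumption):

```python
import numpy as np
from scipy.optimize import linprog

k1, k2 = 2, 3                         # p1 has fewer components than p2
omega = np.array([[0.1, 0.2, 3.0],    # omega[i, j] as in Eq. (13)
                  [2.5, 2.0, 0.3]])

# Equality: each component j of p2 is matched to exactly one component of p1.
A_eq = np.zeros((k2, k1 * k2))
for j in range(k2):
    A_eq[j, j::k2] = 1.0              # a[i, j] lives at flat index i*k2 + j
b_eq = np.ones(k2)

# Inequality: each component i of p1 is matched at least once (splitting).
# linprog uses <=, so encode  sum_j a_ij >= 1  as  -sum_j a_ij <= -1.
A_ub = np.zeros((k1, k1 * k2))
for i in range(k1):
    A_ub[i, i * k2:(i + 1) * k2] = -1.0
b_ub = -np.ones(k1)

res = linprog(omega.ravel(), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
a = res.x.reshape(k1, k2)             # binary by the integrality property
```

Here component 0 of p_1 is split and matched with two components of p_2, mirroring the right panel of Fig. 1.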
3.7 Optimal Weights
After the matching is found, we split the component f(x; \theta) into the k + 1 components given by (15) and find the optimal correspondence to the components \pi_i f_i(x; \theta_i), i = 0, ..., k. This is given by the following proposition.
Proposition 3. The optimal splitting that minimizes the sum of m-divergences between \lambda_i f(x; \theta) and \pi_i f_i(x; \theta_i), i = 0, ..., k, is given by

  \lambda_i^e = (\pi_i / Z) \exp(-K_m(f(x; \theta), f_i(x; \theta_i))),  (17)

where Z is a normalization constant. The splitting for the e-divergence is given by

  \lambda_i^m = \pi_i / Z.  (18)
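Both splitting rules are one-liners once the divergences are known. A minimal Python sketch, assuming the normalization Z is chosen so that the split weights sum to the original component weight \pi (that reading of Z, and the function names, are ours):

```python
import numpy as np

def split_weights_e(pi_targets, divergences, pi_total):
    """Eq. (17): split weights proportional to pi_i * exp(-K_m(f, f_i)),
    normalized so that they sum to the original weight pi_total."""
    lam = np.asarray(pi_targets, dtype=float) * np.exp(-np.asarray(divergences))
    return pi_total * lam / lam.sum()

def split_weights_m(pi_targets, pi_total):
    """Eq. (18): for the e-divergence the split is proportional to pi_i."""
    p = np.asarray(pi_targets, dtype=float)
    return pi_total * p / p.sum()
```

Target components that are closer (smaller divergence) or heavier (larger \pi_i) thus receive a larger share of the split weight.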
We now summarize the embedding method in the general case, covering both the homogeneous and heterogeneous settings.
Fig. 2. Upper left: original mixtures (eight panels, numbered 1-8). Upper right: mixtures with reduced dimension. Bottom: two-dimensional scatter plot of the mixtures, points labeled 1-8.
[Embedding algorithm (for e-PCA, general)]
1. Sort \theta^{(1)}, ..., \theta^{(n)} in descending order of their numbers of components.
2. Embed \theta^{(1)} in any configuration.
3. Repeat the following (a), (b), (c) for i = 2, 3, ..., n:
   (a) Let ec be the e-center of the already embedded mixtures \theta^{(j)}, j = 1, ..., i-1.
   (b) Solve (16) to find the correspondence between \theta^{(i)} and ec.
   (c) If \theta^{(i)} has fewer components than ec, split its components using (17).
4 Numerical Experiments
We applied the proposed method to a synthetic data set of one-dimensional Gaussian mixtures. First, eight Gaussian mixtures were generated (four mixtures with 3 components, three with 2 components, and one with 1 component), with the parameters of each mixture (mixing weight, mean, and variance of each component) drawn at random (upper left of Fig. 2).
The learning coefficients of e-PCA were set to \eta_w = 0.1 and \eta_u = 0.01, except when a parameter exceeded the domain boundary or the objective function increased abnormally (in such cases the learning rate was decreased adaptively). For stable convergence, U was updated for 20 steps, each following 50 update steps of W.
The bottom panel of Fig. 2 shows the result of dimension reduction (e-PCA) to a 2-dimensional subspace from the 8-dimensional original space (the number of parameters of a Gaussian mixture with 3 components). The objective function
value L(U, W) is about 6.4 at the initial solution (the basis is initialized by Euclidean PCA) and decreases to about 1.9. The upper right of Fig. 2 shows the projected distributions obtained by e-PCA. Their original shapes are well preserved even in the 2-D subspace, though slightly smoothed. We also applied m-PCA, obtaining similar but slightly different results.
5 Concluding Remarks
We have proposed a dimension reduction method for the parameters of mixture distributions. Two important problems remain to be solved. One is to find a good initial solution, because the final solution is not a global optimum even though each step attains its own optimum. The other is to develop a stable and fast algorithm. As for the embedding, there are many possible improvements over the proposed greedy algorithm. Application to real-world data, extensions to other structured models such as HMMs, and extensions to other types of methods such as clustering are all left as future work.
References
1. Amari, S.: Differential-Geometrical Methods in Statistics. Springer, Heidelberg (1985)
2. Amari, S.: Information Geometry on Hierarchy of Probability Distributions. IEEE Trans. on Information Theory 41 (2001)
3. Collins, M., Dasgupta, S., Schapire, R.: A Generalization of Principal Component Analysis to the Exponential Family. In: Advances in NIPS, vol. 14 (2002)
4. Akaho, S.: The e-PCA and m-PCA: dimension reduction by information geometry. In: IJCNN 2004, pp. 129-134 (2004)
5. Watanabe, K., Akaho, S., Okada, M.: Clustering on a Subspace of Exponential Family Using Variational Bayes Method. In: Proc. of Worldcomp2008/Information Theory and Statistical Learning (2008)
6. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
7. Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proc. of the ACM SIGMOD, pp. 439-450 (2000)
8. Kumar, A., Kantardzic, M., Madden, S.: Distributed Data Mining: Framework and Implementations. IEEE Internet Computing 10, 15-17 (2006)
9. Chong, C.Y., Kumar, S.: Sensor networks: evolution, opportunities, and challenges. Proc. of the IEEE 91, 1247-1256 (2003)
10. Buntine, W.: Variational extensions to EM and multinomial PCA. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430. Springer, Heidelberg (2002)
11. Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303-353 (1998)
12. Amari, S.: Information geometry of the EM and em algorithms for neural networks. Neural Networks 8(9), 1379-1408 (1995)
13. Chvatal, V.: Linear Programming. W.H. Freeman and Company, New York (1983)
14. Fukumizu, K., Akaho, S., Amari, S.: Critical lines in symmetry of mixture models and its application to component splitting. In: Proc. of NIPS 15 (2003)
Several Enhancements to Hermite-Based
Approximation of One-Variable Functions
Bartlomiej Beliczynski1 and Bernardete Ribeiro2

1 Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, 00-662 Warszawa, Poland
2 Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal
[email protected]
Abstract. Several enhancements of, and comments on, Hermite-based one-variable function approximation are presented. First, we prove that a constant bias extracted from the function contributes to a decrease of the error, and we demonstrate how to choose that bias. Second, we show how to select a basis among orthonormal functions to achieve minimum error for a fixed dimension of the approximation space. Third, we prove that the loss of orthonormality due to truncation of the argument range of the basis functions affects neither the overall approximation error nor the expansion coefficients, and we show how this feature can be used. An application of the obtained results to ECG data compression is presented.
1 Introduction
A set of Hermite functions forming an orthonormal basis is naturally attractive for various approximation, classification, and data compression tasks. These basis functions are defined on the set of real numbers IR and can be calculated recursively. The approximating function coefficients can be determined relatively easily so as to achieve the best approximation property. Since Hermite functions are eigenfunctions of the Fourier transform, the time and frequency spectra are approximated simultaneously. Each subsequent basis function extends the frequency bandwidth within a limited range of well-concentrated energy; see for instance [1]. By introducing a scaling parameter we may control the bandwidth, influencing at the same time the dynamic range of the input argument, until we strike a desirable balance.
If Hermite one-variable functions are generalized to two variables, they retainthe same useful property and turn out to be very suitable for image compressiontasks.
Recently, several publications (see for instance [2], [3]) have suggested using Hermite functions as activation functions in neural schemes. In [3], a so-called constructive approximation scheme is used; it is a type of incremental approximation developed in [4], [5]. The novelty of this approach is that, contrary to the
V. Kurkova et al. (Eds.): ICANN 2008, Part I, LNCS 5163, pp. 1120, 2008.c Springer-Verlag Berlin Heidelberg 2008
12 B. Beliczynski and B. Ribeiro
traditional neural architecture, every node in the hidden layer has a different activation function. This gains several advantages of the Hermite functions; however, in such an approach the orthogonality of the Hermite functions is not really exploited.
In this paper we return to the basic task of one-variable function approximation. For this classical problem we offer two enhancements and one proof of correctness.
For fixed basis functions in a Hilbert space, there always exists a best approximation. If the basis is orthonormal, the approximation can be calculated relatively easily in the form of expansion coefficients. Those coefficients represent the original function approximated in the Hermite basis and usually require less space than the original data. At first glance there seems to be little room for improvement. However, one may slightly reformulate the problem: instead of approximating the function f, one may approximate f - f0, where f0 is a fixed, chosen function. After the approximation is done, f0 is added to the approximant of f - f0. From the approximation and data compression point of view, this procedure makes sense if the additional effort put into representing f0 is compensated by the reduction of the approximation error.
In a typically stated approximation problem, a basis of n + 1 functions {e0, e1, ..., en} is given and we look for the corresponding expansion coefficients. We may, however, reformulate the problem as follows: search for any n + 1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest error of approximation. This is the second issue.
The third problem stated and discussed here is the loss of the orthonormality property of the basis functions when the set IR is replaced by a subset of it. When the approximating basis is orthonormal, the expansion coefficients are calculated easily; otherwise the calculations are more complicated. However, we prove that despite the loss of orthonormality, we may determine the Hermite expansion coefficients as before.
In this paper we focus on the Hermite basis; however, many of the studied properties apply to any orthonormal basis. Our enhancements were tested and demonstrated on ECG data compression, a well-known application area.
This paper is organized as follows. In Section 2, basic facts about approximation needed later are recalled. In Section 3, Hermite functions are briefly described. We then present our results in Section 4: bias extraction, basis function selection, and a proof of correctness of the expansion coefficient calculation despite the lack of basis orthonormality. In Section 5, certain practicalities are presented, and an application of our improvements to ECG data compression is demonstrated and discussed. In Section 6, conclusions are drawn.
2 Approximation Framework
Some selected facts on function approximation useful for this paper are recalled below. Let us consider the following function
Several Enhancements to Hermite-Based Approximation 13
  f_{n+1} = \sum_{i=0}^{n} w_i g_i,  (1)

where g_i \in G \subset H, H is a Hilbert space H = (H, ||.||), and w_i \in IR, i = 0, ..., n.
For any function f from a Hilbert space H and a closed (finite-dimensional) subspace G \subset H with basis {g_0, ..., g_n}, there exists a unique best approximation of f by elements of G [6]; let us denote it by g_b. Because the error of the best approximation is orthogonal to all elements of the approximation space, f - g_b \perp G, the coefficients w_i may be calculated from the set of linear equations

  \langle g_i, f - g_b \rangle = 0  for i = 0, ..., n,  (2)

where \langle ., . \rangle denotes the inner product. Formula (2) can also be written as \langle g_i, f - \sum_{k=0}^{n} w_k g_k \rangle = \langle g_i, f \rangle - \sum_{k=0}^{n} w_k \langle g_i, g_k \rangle = 0 for i = 0, ..., n, or in matrix form

  \Gamma w = G_f,  (3)

where \Gamma = [\langle g_i, g_j \rangle], i, j = 0, ..., n, w = [w_0, ..., w_n]^T, G_f = [\langle g_0, f \rangle, ..., \langle g_n, f \rangle]^T, and T denotes transposition.
Because there exists a unique best approximation of f in the (n+1)-dimensional space G with basis {g_0, ..., g_n}, the matrix \Gamma is nonsingular and w_b = \Gamma^{-1} G_f.
For any basis {g_0, ..., g_n} one can find an orthonormal basis {e_0, ..., e_n}, with \langle e_i, e_j \rangle = 1 when i = j and \langle e_i, e_j \rangle = 0 when i \ne j, such that span{g_0, ..., g_n} = span{e_0, ..., e_n}. In such a case \Gamma is the unit matrix and

  w_b = [\langle e_0, f \rangle, \langle e_1, f \rangle, ..., \langle e_n, f \rangle]^T.  (4)

Finally, (1) takes the form

  f_{n+1} = \sum_{i=0}^{n} \langle e_i, f \rangle e_i.  (5)

The squared norm of the error error_{n+1} = f - f_{n+1} of the best approximation of f in the basis {e_0, ..., e_n} is thus expressible as

  ||error_{n+1}||^2 = ||f||^2 - \sum_{i=0}^{n} w_i^2.  (6)
3 Hermite Functions
We will consider an orthonormal set of functions in the form of Hermite functions, whose expansion coefficients are easily and independently calculated from (4). Let us consider a space of great practical interest, L2(-\infty, +\infty)
Fig. 1. Hermite functions h0, h1, h3
with the inner product defined as \langle x, y \rangle = \int_{-\infty}^{+\infty} x(t) y(t) dt. In this space, a sequence of linearly independent and bounded functions can be defined as follows: \bar h_0(t) = w(t) = e^{-t^2/2}, \bar h_1(t) = t w(t), ..., \bar h_n(t) = t^n w(t), ... This basis can be orthonormalized using the well-known and efficient Gram-Schmidt process (see for instance [6]). Finally, a new, now orthonormal, basis spanning the same space is obtained:

  h_0(t), h_1(t), ..., h_n(t), ...  (7)
where

  h_n(t) = c_n e^{-t^2/2} H_n(t);   H_n(t) = (-1)^n e^{t^2} \frac{d^n}{dt^n}(e^{-t^2});   c_n = \frac{1}{(2^n n! \sqrt{\pi})^{1/2}}.  (8)
The polynomials H_n(t) are called Hermite polynomials, and the functions h_n(t) Hermite functions. According to (8), the first several Hermite functions are

  h_0(t) = \frac{1}{\pi^{1/4}} e^{-t^2/2};   h_1(t) = \frac{1}{\sqrt{2}\,\pi^{1/4}} e^{-t^2/2} \, 2t;

  h_2(t) = \frac{1}{2\sqrt{2}\,\pi^{1/4}} e^{-t^2/2} (4t^2 - 2);   h_3(t) = \frac{1}{4\sqrt{3}\,\pi^{1/4}} e^{-t^2/2} (8t^3 - 12t).
Plots of several functions of the Hermite basis are shown in Fig. 1.
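In numerical practice the functions (8) are generated not from the derivative formula but from the stable three-term recurrence h_{n+1}(t) = \sqrt{2/(n+1)}\, t\, h_n(t) - \sqrt{n/(n+1)}\, h_{n-1}(t), which follows from the Hermite-polynomial recurrence H_{n+1} = 2t H_n - 2n H_{n-1}. A sketch:

```python
import numpy as np

def hermite_functions(n_max, t):
    """Orthonormal Hermite functions h_0, ..., h_{n_max} sampled on grid t,
    built with the stable recurrence
      h_{n+1}(t) = sqrt(2/(n+1)) t h_n(t) - sqrt(n/(n+1)) h_{n-1}(t)."""
    h = [np.pi**-0.25 * np.exp(-t**2 / 2)]          # h_0
    if n_max >= 1:
        h.append(np.sqrt(2.0) * t * h[0])            # h_1
    for n in range(1, n_max):
        h.append(np.sqrt(2.0 / (n + 1)) * t * h[n]
                 - np.sqrt(n / (n + 1)) * h[n - 1])
    return h
```

A quick numerical check of \langle h_i, h_j \rangle on a wide grid confirms orthonormality, as used implicitly in (4).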
4 Main Results
4.1 Extracting of Bias
In this section our first enhancement is introduced. Let f be any function from a Hilbert space H. Instead of approximating the function f, we suggest approximating
the function f - f_0, where f_0 \in H is a known function. Later, f_0 is added to the approximant of f - f_0. The modification of (5) is the following:

  f_{n+1}^{f_0} = f_0 + \sum_{i=0}^{n} \langle f - f_0, e_i \rangle e_i.  (9)

The approximation error is then expressed as

  e_{n+1}^{f_0} = f - f_{n+1}^{f_0} = f - f_0 - \sum_{i=0}^{n} \langle f - f_0, e_i \rangle e_i,

and, similarly to (6), its squared norm is

  ||e_{n+1}^{f_0}||^2 = ||f - f_0||^2 - \sum_{i=0}^{n} \langle f - f_0, e_i \rangle^2.  (10)
Theorem 1. Let H be a Hilbert space of functions on a subset of IR containing the interval [a, b], let f \in H, let {e_0, e_1, ..., e_n} be an orthonormal set in H, and let c \in IR be a constant. Let f_0 = c 1_{[a,b]}, where 1_{[a,b]} denotes the function of value 1 on [a, b] and 0 elsewhere, and let the approximation formula be

  f_{n+1}^{f_0} = f_0 + \sum_{i=0}^{n} \langle f - f_0, e_i \rangle e_i.

Then the norm of the approximation error is minimized for c = c_0, where

  c_0 = \frac{\langle f, 1_{[a,b]} \rangle - \sum_{i=0}^{n} \langle f, e_i \rangle \langle e_i, 1_{[a,b]} \rangle}{(b - a) - \sum_{i=0}^{n} \langle e_i, 1_{[a,b]} \rangle^2}.  (11)
Proof. The squared error formula (10) can be expanded as

  ||e_{n+1}^{f_0}||^2 = ||f||^2 + ||f_0||^2 - 2\langle f, f_0 \rangle - \sum_{i=0}^{n} (\langle f, e_i \rangle - \langle e_i, f_0 \rangle)^2
  = ||f||^2 + c^2 (b - a) - 2c \langle f, 1_{[a,b]} \rangle - \sum_{i=0}^{n} (\langle f, e_i \rangle^2 + c^2 \langle e_i, 1_{[a,b]} \rangle^2 - 2c \langle f, e_i \rangle \langle e_i, 1_{[a,b]} \rangle).

Differentiating the squared error with respect to c and equating the derivative to zero, one obtains (11).
Following the theorem, we suggest a two-step approximation: first f_0 is determined, and then the function f - f_0 is approximated in the usual way.
Remark 1. One may notice that in many applications c_0 of (11) can be well approximated by

  c_0 \approx \frac{\langle f, 1_{[a,b]} \rangle}{b - a}.  (12)
The right-hand side of (12) is the mean value of the approximated function f on the interval [a, b]. A usual choice of [a, b] is the actual range of the function argument.
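The optimal bias (11) and its mean-value shortcut (12) are cheap to compute numerically. A sketch with a discretized inner product; the grid, interval, test function, and one-element basis are our illustrative choices:

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
inner = lambda x, y: np.sum(x * y) * dt        # <x, y> ~ integral x(t) y(t) dt

a, b = -4.0, 4.0
box = ((t >= a) & (t <= b)).astype(float)      # indicator 1_[a,b]

# a toy orthonormal set: the first Hermite function only (illustrative)
e0 = np.pi**-0.25 * np.exp(-t**2 / 2)
basis = [e0]

f = 1.0 / (1.0 + t**2) + 0.5                   # function with a visible offset

num = inner(f, box) - sum(inner(f, e) * inner(e, box) for e in basis)
den = (b - a) - sum(inner(e, box)**2 for e in basis)
c0 = num / den                                  # Eq. (11), the optimal bias

c0_mean = inner(f, box) / (b - a)               # Eq. (12), mean-value shortcut
```

Evaluating the squared error (10) for c = 0, c = c0_mean, and c = c0 confirms the ordering the theorem predicts: the optimal bias never does worse than no bias or the mean-value shortcut.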
4.2 Basis Selection
In a typically stated approximation problem, there is a function f to be approximated and an approximation basis {e_0, e_1, ..., e_n}. We look for the expansion coefficients of the function with respect to the basis functions.
The problem may, however, be reformulated as follows. Let us search for any n + 1 Hermite basis functions, not necessarily with consecutive indices, ensuring the smallest approximation error. In practice this can easily be done: since for any orthonormal basis an indicator of the error reduction associated with the basis function e_i is |w_i| = |\langle f, e_i \rangle|, one may calculate sufficiently many coefficients and order them