Particle Swarm Optimization in Machine Learning

Michał Okulewicz, Julian Zubek
Institute of Computer Science, Polish Academy of Sciences

Statistical Machine Learning, 09 January 2014

Outline: Introduction · Application to training of MLP · Application to training of SNN · Application to clustering · Application to full model selection · Conclusions




    Presentation Plan

    1 Introduction

    2 Application to training of MLP

    3 Application to training of SNN

    4 Application to clustering

    5 Application to full model selection


    General machine learning task

    Machine learning algorithm

    ML algorithm = family of models + model selection

    Family of considered models is called hypothesis space.

    Choosing the best model is an optimization problem.

    Note: some methods do not describe the hypothesis function explicitly (e.g. kNN).


    Optimization in machine learning

    Decision tree

    Hypothesis space: all possible partitions by trees.

    Model selection: multistep greedy search optimizing the Gini impurity at each split.


    Optimization in machine learning

    Linear regression

    Hypothesis space:

    $y = \beta_0 + \beta_1 x$

    Model selection: ordinary least squares estimator (closed-form).

    Logistic regression

    Hypothesis space:

    $\pi(x) = \dfrac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x)\right)}$

    Model selection: Newton's method (iterative root finding).


    Optimization in machine learning

    Multilayer perceptron

    Hypothesis space:

    $y(x) = \sigma\left(\beta_{00} + \beta_{01}\left[\sigma\left(\beta_{10} + \beta_{11}\left[\sigma\left(\beta_{20} + \beta_{21} x\right), \ldots\right]^T\right), \ldots\right]^T\right)$

    Model selection: backpropagation (gradient descent optimization).


    Multilayer Perceptron

    Interpretation:

    Artificial Neural Network modeling a neural system. A stack of logistic regression models.

    Applications:

    Classification, Regression.

    Problems:

    The standard backpropagation algorithm might get stuck in local minima (restarts needed).

    It needs tuning of a learning rate.

    Backpropagation is unsuitable for more than 2 hidden layers.


    Spiking Neural Network

    Interpretation:

    Artificial Neural Network taking into account the timing of inputs.

    A set of differential equations for computing the membrane potential of a neuron.

    Applications:

    Sequence (time-series) analysis. Pattern recognition.

    Problems:

    Tuning of the parameters is not easy (the STDP and ReSuMe algorithms train only the weights, but not the recovery time, increase time, and initial potential of the neurons).


    Selected types of clustering

    Similarity-based clustering with defined K.

    Capacitated clustering (possibly with maximum K).

    Cost-based clustering (possibly with maximum K and limited cluster capacity).


    When is global stochastic search feasible?

    The search space is very large.

    The objective function has multiple minima.

    The function gradient is unknown.

    Non-standard evaluation criteria.

    It is easy to overfit.

    "[In machine learning] it appears to be better not to optimize!" (Tom Dietterich, 1995)


    Particle Swarm Optimization

    A continuous, iterative, global optimization metaheuristic.

    Utilizes the idea of Swarm Intelligence.

    Optimization is performed by a set of simple agents called particles.

    Each particle has a current location, a velocity, and a memory of the best visited location.

    Particles communicate their best visited location to the set of their neighbours.


    Particle Swarm Optimization

    Initialize the swarm.

    Evaluate the particles.

    For each particle (in parallel): update its velocity, then update its position.

    If the stop conditions are not met, return to the evaluation step; otherwise stop.


    SPSO 2007

    $t$-th iteration for the $i$-th particle:

    $v_i^{(t+1)} = c_1\, u^{(1)}_{U[0;1]}\left(x^{(best)}_{n[i]} - x_i^{(t)}\right) + c_2\, u^{(2)}_{U[0;1]}\left(x^{(best)}_i - x_i^{(t)}\right) + \omega\, v_i^{(t)}$  (1)

    $x_i^{(t+1)} = x_i^{(t)} + v_i^{(t+1)}$  (2)

    where $x^{(best)}_{n[i]}$ is the best location visited by the neighbourhood of particle $i$, $x^{(best)}_i$ is the particle's own best visited location, and $u^{(1)}, u^{(2)} \sim U[0;1]$ are drawn per dimension.
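Equations (1)-(2) map directly onto a few lines of NumPy. A minimal sketch of a single particle's update; the coefficient defaults ($\omega \approx 0.721$, $c_1 = c_2 \approx 1.193$) follow Clerc's standard and are an assumption here, not values stated on the slide:

```python
import numpy as np

def spso2007_step(x, v, p_best, l_best, w=0.721, c1=1.193, c2=1.193, rng=None):
    """One SPSO-2007-style update for a single particle.

    x, v      -- current position and velocity (1-D arrays)
    p_best    -- the particle's own best visited location
    l_best    -- best location known in the particle's neighbourhood
    w, c1, c2 -- inertia and acceleration coefficients (assumed Clerc defaults)
    """
    rng = rng or np.random.default_rng()
    u1 = rng.uniform(size=x.shape)  # u^(1) ~ U[0;1], drawn per dimension
    u2 = rng.uniform(size=x.shape)  # u^(2) ~ U[0;1]
    v_new = w * v + c1 * u1 * (l_best - x) + c2 * u2 * (p_best - x)  # Eq. (1)
    x_new = x + v_new                                                # Eq. (2)
    return x_new, v_new
```

A full optimizer loops this over the swarm, re-evaluating the objective and updating each particle's own best and neighbourhood best after every step.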


    SPSO 2011

    $t$-th iteration for the $i$-th particle:

    $g_i^{(t)} = \frac{1}{3}\left[3 x_i^{(t)} + c_1\left(x^{(best)}_{n[i]} - x_i^{(t)}\right) + c_2\left(x^{(best)}_i - x_i^{(t)}\right)\right]$  (3)

    $x'^{(t)}_i \sim \mathrm{unif}\, B_i\!\left(g_i^{(t)},\ \left\|x_i^{(t)} - g_i^{(t)}\right\|\right)$

    $v_i^{(t+1)} = \omega\, v_i^{(t)} + x'^{(t)}_i - x_i^{(t)}$  (4)

    $x_i^{(t+1)} = x_i^{(t)} + v_i^{(t+1)}$  (5)

    where $x'^{(t)}_i$ is drawn uniformly from the ball centred at $g_i^{(t)}$ with radius $\|x_i^{(t)} - g_i^{(t)}\|$.
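The distinctive part of SPSO 2011 is sampling $x'_i$ uniformly from a ball around the gravity centre $g_i$. A sketch of one particle's update; the uniform-in-ball draw uses the standard direction-plus-scaled-radius trick, and the coefficient defaults are again assumed from Clerc's standard rather than taken from the slide:

```python
import numpy as np

def spso2011_step(x, v, p_best, l_best, w=0.721, c1=1.193, c2=1.193, rng=None):
    """One SPSO-2011-style update for a single particle (Eqs. 3-5)."""
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    # Eq. (3): gravity centre of x, x + c1*(l_best - x) and x + c2*(p_best - x)
    g = x + (c1 * (l_best - x) + c2 * (p_best - x)) / 3.0
    r = np.linalg.norm(x - g)
    # Draw x' uniformly from the ball B(g, r): uniform random direction,
    # radius scaled by U^(1/d) so the volume is covered uniformly.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction) + 1e-12
    x_prime = g + direction * r * rng.uniform() ** (1.0 / d)
    v_new = w * v + x_prime - x   # Eq. (4)
    x_new = x + v_new             # Eq. (5)
    return x_new, v_new
```

Unlike SPSO 2007, this sampling makes the update rotation-invariant, since the random perturbation no longer acts independently per coordinate.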


    Simulation on Rastrigin's function for SPSO 2007


    Simulation on Rastrigin's function for SPSO 2011
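For reference, the Rastrigin benchmark behind both simulations is the standard multimodal test function; the definition below is the usual one, not taken from the slides:

```python
import numpy as np

def rastrigin(x):
    """Rastrigin benchmark: f(x) = 10 d + sum_i (x_i^2 - 10 cos(2 pi x_i)).

    Highly multimodal with regularly spaced local minima; the single
    global minimum f(0) = 0 makes it a standard stress test for swarm
    optimizers.
    """
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + float(np.sum(x * x - 10.0 * np.cos(2.0 * np.pi * x)))
```

The regular grid of local minima is exactly what distinguishes swarms that keep exploring from those that collapse prematurely onto one basin.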


    MLP: PSO + SGD BP

    Task: minimize MSE.
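The experiment's idea (PSO used to initialize the MLP weights before gradient training, compared against gradient training alone) can be sketched end to end. Everything below is illustrative rather than taken from the talk: a tiny 1-4-1 tanh network on synthetic data, a short global-best PSO stage over the flattened weight vector, then gradient descent from the best particle, with a numerical gradient standing in for backpropagation to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data (illustrative): learn y = x^2 on [-1, 1]
X = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = X[:, 0] ** 2

H = 4                        # hidden units
n_w = H + H + H + 1          # W1 (1xH), b1 (H), W2 (Hx1), b2 (scalar)

def mse(w):
    """MSE of a 1-H-1 tanh network whose weights are the flat vector w."""
    W1 = w[:H].reshape(1, H)
    b1 = w[H:2 * H]
    W2 = w[2 * H:3 * H].reshape(H, 1)
    b2 = w[3 * H]
    hidden = np.tanh(X @ W1 + b1)
    out = (hidden @ W2)[:, 0] + b2
    return float(np.mean((out - y) ** 2))

# Stage 1: global-best PSO over the flattened weight vector.
n_p = 15
P = rng.uniform(-1.0, 1.0, size=(n_p, n_w))
V = np.zeros_like(P)
B, Bf = P.copy(), np.array([mse(p) for p in P])
init_mse = Bf.min()
for _ in range(30):
    g = B[Bf.argmin()]
    u1, u2 = rng.uniform(size=P.shape), rng.uniform(size=P.shape)
    V = 0.72 * V + 1.19 * u1 * (g - P) + 1.19 * u2 * (B - P)
    P = P + V
    f = np.array([mse(p) for p in P])
    imp = f < Bf
    B[imp], Bf[imp] = P[imp], f[imp]

# Stage 2: gradient descent from the best particle (numerical gradient
# as a stand-in for backpropagation).
w = B[Bf.argmin()].copy()
eps, lr = 1e-5, 0.1
for _ in range(200):
    grad = np.array([(mse(w + eps * np.eye(n_w)[j]) - mse(w)) / eps
                     for j in range(n_w)])
    w_try = w - lr * grad
    if mse(w_try) < mse(w):   # safeguard: keep only improving steps
        w = w_try
final_mse = mse(w)
```

The point of the two-stage design is that PSO explores many basins cheaply, so the subsequent gradient descent starts from a better region than a single random initialization would give.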

    [Figures: log(MSE) vs. iteration on the iris, glass, thyroid, gsm and wifi benchmarks (panels irisF2_, glassF2_, thyroidF2_, gsmF2_40_agg_mean, wifiF2_40_agg_mean, wifiF2_40_minus); each panel compares training curves with and without initial PSO.]


    Task: minimize the AUC of the absolute value of the membrane potential of the Similarity Measure Neuron.


    K-means clustering

    Task: minimize the distance from cluster centers to the points belonging to the clusters.
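For this objective with a fixed K, a particle can encode all K cluster centres as one flat vector, and the task above becomes its fitness. A minimal sketch; the data layout and the Euclidean distance choice are assumptions:

```python
import numpy as np

def clustering_fitness(position, points, k):
    """Fitness of one particle for PSO-based clustering.

    position -- the K cluster centres flattened into one 1-D vector,
                which is exactly what a particle's location encodes
    points   -- data matrix, one row per observation

    Returns the summed Euclidean distance of each point to its nearest
    centre (lower is better).
    """
    centers = position.reshape(k, points.shape[1])
    # Pairwise distances, points x centres, then min over centres.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float(d.min(axis=1).sum())
```

A swarm minimizing this fitness searches the space of centre placements directly, without the alternating assignment step of Lloyd's k-means.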


    DVRP: an example of cost-based clustering

    Task: minimize the total length of the routes.


    Full model selection

    Standard machine learning

    Model selection:

    Tuning parameters of a function of a given class.

    Meta-learning

    Full model selection:

    Choosing a preprocessing algorithm and its parameters.

    Choosing a feature selection strategy and its parameters.

    Choosing a machine learning algorithm and its parameters.
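To attack full model selection with PSO, the continuous particle position has to be decoded into discrete pipeline choices plus their numeric parameters. A hypothetical decoding scheme; none of the names or ranges come from the PSMS paper:

```python
# Illustrative choice lists; a real system would enumerate its own components.
PREPROCESSORS = ["none", "standardize", "pca"]
CLASSIFIERS = ["knn", "tree", "svm"]

def decode(position):
    """Map a particle position in [0, 1]^4 to one candidate pipeline.

    position[0] -- preprocessor choice (discretized)
    position[1] -- fraction of features kept by feature selection
    position[2] -- classifier choice (discretized)
    position[3] -- a classifier hyperparameter on a log scale
    """
    p0, p1, p2, p3 = (min(max(v, 0.0), 1.0) for v in position)
    pre = PREPROCESSORS[min(int(p0 * len(PREPROCESSORS)), len(PREPROCESSORS) - 1)]
    keep_frac = 0.1 + 0.9 * p1              # keep between 10% and 100% of features
    clf = CLASSIFIERS[min(int(p2 * len(CLASSIFIERS)), len(CLASSIFIERS) - 1)]
    c = 10.0 ** (-2.0 + 4.0 * p3)           # hyperparameter in [1e-2, 1e2]
    return pre, keep_frac, clf, c
```

The swarm then moves through the continuous cube while each evaluation trains and cross-validates the decoded pipeline, which is why runtimes of hours per dataset are unsurprising.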


    PSO in full model selection

    Particle Swarm Model Selection (H. J. Escalante, M. Montes, E. Sucar, 2009):

    Implemented on top of the Challenge Learning Object Package (CLOP) for MATLAB.

    Used in the Agnostic Learning vs. Prior Knowledge 2007 challenge (75 competitors):

    8th place overall, 5th place among agnostic methods, 2nd place among methods utilizing only standard CLOP algorithms.

    2-6 hours needed for each dataset from the competition.


    Conclusions

    PSO (or possibly other metaheuristics) can be applied to the fitting and selection of ML models.

    In the standard setting it does not usually beat well-known, problem-specific training algorithms.

    It can be used for mathematically non-trivial models (SNN) or with non-standard fitness functions (e.g. it is easier to optimize directly for a business criterion).


    Bibliography I

    Maurice Clerc.

    Standard PSO 2006, 2011, 2012.

    Ioan Cristian Trelea.

    The particle swarm optimization algorithm: convergence analysis and parameter selection. Information Processing Letters, 85(6):317-325, 2003.

    X. Cui, T. E. Potok, and P. Palathingal.

    Document clustering using particle swarm optimization. In Swarm Intelligence Symposium, 2005. SIS 2005. Proceedings 2005 IEEE, pages 185-191, June 2005.

    Hugo Jair Escalante, Manuel Montes, Luis Enrique Sucar, Isabelle Guyon, and Amir Saffari.

    Particle swarm model selection. In JMLR, Special Topic on Model Selection, pages 405-440, 2009.

    Yuan-wei Jing, Tao Ren, and Yu-cheng Zhou.

    Neural network training using PSO algorithm in ATM traffic control. In Intelligent Control and Automation, pages 341-350. Springer, 2006.


    Bibliography II

    Jan Karwowski, Michał Okulewicz, and Jarosław Legierski.

    Application of particle swarm optimization algorithm to neural network training process in the localization of the mobile terminal. In Lazaros Iliadis, Harris Papadopoulos, and Chrisina Jayne, editors, Engineering Applications of Neural Networks, volume 383 of Communications in Computer and Information Science, pages 122-131. Springer Berlin Heidelberg, 2013.

    J. Kennedy and R. Eberhart.

    Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks. IV, pages 1942-1948, 1995.

    Hongbo Liu, Bo Li, Xiukun Wang, Ye Ji, and Yiyuan Tang.

    Survival density particle swarm optimization for neural network training. In Advances in Neural Networks - ISNN 2004, pages 332-337. Springer, 2004.

    Michael Meissner, Michael Schmuker, and Gisbert Schneider.

    Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics, 7(1):125, 2006.

    Ammar Mohemmed, Satoshi Matsuda, Stefan Schliebs, Kshitij Dhoble, and Nikola Kasabov.

    Optimization of spiking neural networks with dynamic synapses for spike sequence generation using PSO. In IJCNN, pages 2969-2974. IEEE, 2011.


    Bibliography III

    Ben Niu and Li Li.

    A hybrid particle swarm optimization for feed-forward neural network training. In Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, pages 494-501. Springer, 2008.

    Michał Okulewicz and Jacek Mańdziuk.

    Application of particle swarm optimization algorithm to dynamic vehicle routing problem. In Leszek Rutkowski, Marcin Korytkowski, Rafał Scherer, Ryszard Tadeusiewicz, Lotfi A. Zadeh, and Jacek M. Zurada, editors, Artificial Intelligence and Soft Computing, volume 7895 of Lecture Notes in Computer Science, pages 547-558. Springer Berlin Heidelberg, 2013.

    Xiaorong Pu, Zhongjie Fang, and Yongguo Liu.

    Multilayer perceptron networks training using particle swarm optimization with minimum velocity constraints. In Advances in Neural Networks - ISNN 2007, pages 237-245. Springer, 2007.

    Y. Shi and R. C. Eberhart.

    A modified particle swarm optimizer. Proceedings of IEEE International Conference on Evolutionary Computation, pages 69-73, 1998.

    Y. Shi and R. C. Eberhart.

    Parameter selection in particle swarm optimization. Proceedings of Evolutionary Programming VII (EP98), pages 591-600, 1998.
