On Implementation of Neural Network (Back-propagation)
TRANSCRIPT
On Implementation of Neural Network (Back-propagation)
Yu Liu
National Institute of Informatics
Nov 9, 2010
Yu Liu MapReduce For Machine Learning
Outline
1 Motivations
2 Brief Introduction of Background
The Neural Network
The Back-propagation Algorithm
The Problems of Back-propagation
3 Implementation Using C++ STL, SkeTo Lib, Intel TBB, and Boost MapReduce (next week)
Main Flow of Data Processing
Analysis of Parallelism
Optimization
4 The Benchmark Results
5 The Remaining Problems
Motivation
Get more practice with parallel programming.
Use and compare different parallel programming libraries.
Study the principles of designing a good parallel programming library.
MapReduce Programming Model: What Is MapReduce?
The Computation of the MapReduce Framework
Input: a set of key/value pairs.
Output: a set of key/value pairs.
The user provides two functions: Map and Reduce.
Main Concepts of the MapReduce Programming Paradigm
SPLIT: Split the input data and iterate over it;
MAP: Compute key/value pairs on each split;
SHUFFLE and SORT: Group intermediate values by key;
REDUCE: Iterate over the resulting groups and reduce each group.
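As a concrete illustration of these four phases, here is a minimal, purely sequential word-count sketch in C++ (the classic MapReduce example; all names here are illustrative, not from any particular framework):

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// MAP: emit a (word, 1) pair for every word in one input split.
std::vector<std::pair<std::string, int>> map_split(const std::string& split) {
    std::vector<std::pair<std::string, int>> pairs;
    std::istringstream in(split);
    std::string word;
    while (in >> word) pairs.push_back({word, 1});
    return pairs;
}

// SHUFFLE and SORT: group intermediate values by key.
std::map<std::string, std::vector<int>>
shuffle(const std::vector<std::vector<std::pair<std::string, int>>>& all_pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& pairs : all_pairs)
        for (const auto& kv : pairs) groups[kv.first].push_back(kv.second);
    return groups;
}

// REDUCE: fold each group down to a single (word, count) pair.
std::map<std::string, int>
reduce(const std::map<std::string, std::vector<int>>& groups) {
    std::map<std::string, int> counts;
    for (const auto& g : groups) {
        int sum = 0;
        for (int v : g.second) sum += v;
        counts[g.first] = sum;
    }
    return counts;
}

// SPLIT + MAP + SHUFFLE + REDUCE over a set of input splits.
std::map<std::string, int> word_count(const std::vector<std::string>& splits) {
    std::vector<std::vector<std::pair<std::string, int>>> mapped;
    for (const auto& s : splits) mapped.push_back(map_split(s));
    return reduce(shuffle(mapped));
}
```

A real framework runs the map calls and the reduce calls in parallel; only the shuffle requires communication between them.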
MapReduce Programming Model: Applying MapReduce to Machine Learning
The paper Map-Reduce for Machine Learning on Multicore gives us a programming framework model that uses the MapReduce paradigm to do parallel data processing:
The Artificial Neural Network
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
R. Rojas: Neural Networks. Springer-Verlag, Berlin, 1996
The Artificial Neural Network: A Simple Example
The neural network can be trained to recognise some patterns:
The Back-Propagation Algorithm: Training the NN
In order to train a neural network, we must adjust the weights of each unit so that the error between the desired output and the actual output is reduced.
The back-propagation algorithm: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
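For a single sigmoid output unit, the weight adjustment that reduces this error can be sketched as a textbook gradient-descent step (this is the standard delta rule, not the exact code from these slides; the learning rate eta is an illustrative parameter):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One back-propagation step for one sigmoid output unit.
// weights and inputs have the same length; eta is the learning rate.
// Returns the unit's output before the update.
double update_output_unit(std::vector<double>& weights,
                          const std::vector<double>& inputs,
                          double target, double eta) {
    double net = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) net += weights[i] * inputs[i];
    double out = sigmoid(net);
    // Error term of the delta rule: the error scaled by the
    // derivative of the sigmoid, out * (1 - out).
    double delta = (target - out) * out * (1.0 - out);
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += eta * delta * inputs[i];
    return out;
}
```

Repeating this step drives the unit's output toward the target, which is exactly the "reduce the error" behaviour described above.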
The Back-Propagation Algorithm: Back-Propagation Concepts
1 Propagate inputs forward in the usual way, i.e.
All outputs are computed using a sigmoid threshold of the inner product of the corresponding weight and input vectors.
All outputs at stage n are connected to all the inputs at stage n+1.
2 Propagate the errors backwards by apportioning them to each unit according to the amount of the error the unit is responsible for.
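A minimal sketch of this forward pass (illustrative names; each unit applies a sigmoid to the inner product of its weight vector and the layer's input vector, and every output of stage n feeds every unit of stage n+1):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward-propagate one layer: each row of `weights` is the weight
// vector of one unit; the unit's output is the sigmoid of the inner
// product of that row with the layer's input vector.
std::vector<double> forward_layer(const std::vector<std::vector<double>>& weights,
                                  const std::vector<double>& inputs) {
    std::vector<double> outputs;
    for (const auto& w : weights) {
        double net = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) net += w[i] * inputs[i];
        outputs.push_back(sigmoid(net));
    }
    return outputs;
}
```

Chaining forward_layer calls, feeding each layer's outputs to the next, gives the full forward pass of step 1.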
The Back-Propagation Algorithm
Back-Propagation process:
Three Versions of Implementation: Sequential, SkeTo, and TBB
A neural network (NN) C++ class was implemented:
An instance of a neural network can be created by giving arguments for the number of inputs, the number of layers, the number of neurons per layer, etc.
Given an input pattern, it will give a (set of) output signal(s).
It has a B-P method to update all neurons' weights.
It has other methods for all kinds of operations, e.g. put weights, get weights.
All the operations of the neural network are sequential.
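A hedged sketch of what such a class interface might look like (the member names and the sizing formula here are illustrative guesses, not the actual ones from the repository; the training and evaluation bodies are stubs):

```cpp
#include <cassert>
#include <vector>

// Illustrative interface of a sequential neural-network class:
// construction from sizes, forward evaluation, a B-P update,
// and weight getters/setters for the map/reduce plumbing.
class NeuralNetwork {
public:
    NeuralNetwork(int n_inputs, int n_layers, int neurons_per_layer, int n_outputs)
        : n_outputs_(n_outputs),
          weights_(static_cast<std::size_t>(n_layers) * neurons_per_layer
                       * (n_inputs + 1),  // placeholder sizing, "+1" for bias
                   0.0) {}

    // Given an input pattern, produce the output signals (stub here).
    std::vector<double> evaluate(const std::vector<double>& pattern) const {
        (void)pattern;
        return std::vector<double>(n_outputs_, 0.0);
    }

    // One back-propagation update toward `target` (stub here).
    void back_propagate(const std::vector<double>& pattern,
                        const std::vector<double>& target) {
        (void)pattern; (void)target;
    }

    // Weight access, used to average weights across map results.
    const std::vector<double>& get_weights() const { return weights_; }
    void put_weights(const std::vector<double>& w) { weights_ = w; }

private:
    int n_outputs_;
    std::vector<double> weights_;
};
```

The get_weights/put_weights pair is what makes the class usable inside a map/reduce pipeline: map trains a copy, reduce averages the extracted weight vectors.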
Three Versions of Implementation: Sequential, SkeTo, and TBB
All three versions are implemented with the same architecture.
The Training stage:
The training data and an instance of the NN (B-P algorithm) are the inputs of a map function;
the output of the map function is a set of new weights;
the input of the reduce function is the output of the map function;
the output of the reduce function is the average of these new weights (here I simply average all the weights).
Three Versions of Implementation: Sequential, SkeTo, and TBB
When training is finished, this neural network can be used to recognize unknown data:
the inputs of the map function are the unknown patterns and the neural network algorithm;
the outputs of the map function are a set of signals that denote what the input data are;
no reduce processing is needed.
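A minimal sketch of this map-only stage (illustrative names; the trained network's decision function is applied independently to each unknown pattern, so no reduce is needed):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative stand-in for a trained network's decision function:
// classify a pattern by the sign of its first component.
int classify(const std::vector<double>& pattern) {
    return pattern[0] >= 0.0 ? 1 : 0;
}

// MAP: apply the trained network to every unknown pattern independently.
std::vector<int> recognize(const std::vector<std::vector<double>>& patterns) {
    std::vector<int> signals(patterns.size());
    std::transform(patterns.begin(), patterns.end(), signals.begin(), classify);
    return signals;
}
```

Because each pattern is handled independently, the std::transform here parallelizes trivially.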
Sequential Implementation (STL): Training Stage
The MAP and REDUCE functions:
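A hedged sketch of the shape of those two functions, consistent with the training-stage description above (the actual training step is abstracted behind an illustrative train_on_chunk stand-in; none of these names are from the repository):

```cpp
#include <cassert>
#include <vector>

using Weights = std::vector<double>;

// Illustrative stand-in for training a copy of the network on one
// chunk of data; here it just shifts the weights deterministically.
Weights train_on_chunk(Weights w, double chunk_signal) {
    for (double& x : w) x += chunk_signal;
    return w;
}

// MAP: each chunk of training data yields one new weight vector.
std::vector<Weights> map_stage(const Weights& initial,
                               const std::vector<double>& chunks) {
    std::vector<Weights> results;
    for (double c : chunks) results.push_back(train_on_chunk(initial, c));
    return results;
}

// REDUCE: element-wise average of the mapped weight vectors.
Weights reduce_stage(const std::vector<Weights>& all) {
    Weights avg(all.front().size(), 0.0);
    for (const Weights& w : all)
        for (std::size_t i = 0; i < w.size(); ++i) avg[i] += w[i];
    for (double& x : avg) x /= static_cast<double>(all.size());
    return avg;
}
```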
Sequential Implementation (STL): Training Stage
The function object used by MAP:
Implementation Using the SkeTo Lib: Training Stage
Forward and backward propagation: basic ideas
1 Each computing node has an instance of the neural network with the same initial weights;
2 This neural network is an extended neural network (each layer has a "1" input);
3 Let each computing node calculate the same amount of training samples;
4 Run a fixed number of training steps;
5 Sum up each node's weights and take the average;
6 Update each NN with these average weights, and calculate the total error;
7 Repeat 4-6 until the total error is less than a given value.
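Steps 4-7 above can be sketched as a sequential simulation (the per-node training and the error computation are illustrative stubs, not the actual SkeTo code; in the real version the inner loop over nodes runs in parallel):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Weights = std::vector<double>;

// Illustrative stub: each "node" trains its NN copy for a fixed number
// of steps on its share of the samples and returns updated weights.
Weights train_fixed_steps(Weights w, int node_id) {
    for (double& x : w) x += 0.1 * (node_id + 1);
    return w;
}

// Illustrative stub: a total error that shrinks as the weights grow,
// standing in for evaluating the averaged network on the training set.
double total_error(const Weights& w) {
    double s = 0.0;
    for (double x : w) s += x;
    return std::exp(-s);
}

// Steps 4-7: train on every node, average the weights, broadcast them
// back, and repeat until the total error drops below `eps`.
Weights train_until(Weights w, int num_nodes, double eps, int max_rounds) {
    for (int round = 0; round < max_rounds; ++round) {
        Weights sum(w.size(), 0.0);
        for (int node = 0; node < num_nodes; ++node) {
            Weights local = train_fixed_steps(w, node);          // step 4
            for (std::size_t i = 0; i < w.size(); ++i) sum[i] += local[i];
        }
        for (double& x : sum) x /= num_nodes;                    // step 5
        w = sum;                                                 // step 6
        if (total_error(w) < eps) break;                         // step 7
    }
    return w;
}
```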
Implementation Using STL, SkeTo, TBB: The Well-Trained Stage
Then the system can be used to analyse the patterns in the data:
1 Map/split the input data to each computing node/core;
2 Reduce is not needed.
This is a very good case for parallel processing: the input data are all independent.
For the SkeTo and TBB versions of the implementation, the parallelism of this stage is P if there are P processors (cores).
Implementation Using STL, SkeTo, TBB: TBB
I use tbb::parallel_for and tbb::parallel_reduce to implement the MAP and REDUCE.
1 The TBB version looks a little more complex than the SkeTo version, but it is also very easy to use.
2 But TBB provides a lot of useful tools, such as concurrent containers, task management, etc.
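tbb::parallel_reduce splits a range, applies a body object's operator() to each subrange, and merges partial results with join(). The extra complexity mentioned above is mostly that body pattern. A sequential stand-in for it (run here without TBB so the sketch stays self-contained; the real call would look something like tbb::parallel_reduce(tbb::blocked_range<size_t>(0, n), body), with TBB choosing the split points):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body in the shape tbb::parallel_reduce expects: operator() accumulates
// over a subrange, join() merges in the result of a split-off body.
struct SumBody {
    const std::vector<double>* data;
    double sum;

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) sum += (*data)[i];
    }
    void join(const SumBody& other) { sum += other.sum; }
};

// Sequential stand-in for parallel_reduce: split the range in two,
// run each half in its own body, then join the partial sums.
double reduce_sum(const std::vector<double>& v) {
    SumBody left{&v, 0.0};
    SumBody right{&v, 0.0};
    std::size_t mid = v.size() / 2;
    left(0, mid);
    right(mid, v.size());
    left.join(right);
    return left.sum;
}
```

In the NN training stage the body would accumulate weight vectors instead of a scalar sum, and join() would add them element-wise before the final averaging.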
Implementation Using STL, SkeTo, TBB
The source code is on Google Code: http://skyto-neurl-network.googlecode.com/svn/trunk/skyto_NN
Performance Test: On a Multi-core Machine
Testing the performance on an 8-core workstation (weaker than "kraken", but I have a GPU :D).
8-core Xeon E5620, 8 GB RAM, 2.4 GHz. Training stage:
1 million input patterns.
Neural network: 4 inputs, 4 hidden layers, each hidden layer has 4 neurons, 4 outputs.
TBB version 3.0, GCC 4.4.5, Ubuntu 10.04 LTS 64-bit.
Performance Test: On a Multi-core Machine
The STL version vs. the SkeTo version (1 core):
Training: 6.5992 s vs. 7.1128 s
Recognizing data: 1.4782 s vs. 1.4762 s
Performance Test: On a Multi-core Machine
The test results of the SkeTo version:
Performance Test: On a Multi-core Machine
The test results of the SkeTo version: speedup
Performance Test: On a Multi-core Machine
A comparison of the SkeTo version and the TBB version (8 cores):
Training: SkeTo: 1.0110 s, TBB: 1.3924 s
Recognizing data: SkeTo: 0.1925 s, TBB: –
The Remaining Problems
The implementation is not complete yet (the Boost version). Some details are not resolved:
Some B-P algorithm problems, such as the "local minima problem";
Not tested with very big data;
The size of the neural network is hard to decide (I still lack knowledge of NNs).
The Boost MapReduce Library
The Boost.MapReduce library is a MapReduce implementation across a plurality of CPU cores rather than machines. The library is implemented as a set of C++ class templates, and is a header-only library.
It provides map, reduce, combine, and lots of utilities;
It is based on Boost.FileSystem and Boost.Thread; like Hadoop, it uses files as I/O media.
For now, it is not yet part of the Boost library and is still under development and review.