On Implementation of Neural Network (Back-propagation)

On Implementation of Neural Network (Back-propagation). Yu Liu, National Institute of Informatics, Nov 9, 2010.


Page 1: On Implementation of Neural Network (Back-propagation)

On Implementation of Neural Network (Back-propagation)

Yu Liu

National Institute of Informatics

Nov 9, 2010

Yu Liu MapReduce For Machine Learning

Page 2: On Implementation of Neural Network (Back-propagation)

Outline

1 Motivations
2 Brief introduction of background
  - The Neural Network
  - The Back-propagation Algorithm
  - The Problems of Back-propagation
3 Implementation using C++ STL, the Sketo library, Intel TBB, and Boost MapReduce (next week)
  - Main flow of data processing
  - Analysis of parallelism
  - Optimization
4 The Benchmark Results
5 The Remaining Problems

Page 3: On Implementation of Neural Network (Back-propagation)

Motivation

Do more practice of parallel programming.

Use and compare different parallel programming libraries.

Study the principles of designing a good parallel programming library.


Page 6: On Implementation of Neural Network (Back-propagation)

MapReduce Programming Model: What is MapReduce

The computation of the MapReduce framework:

Input: a set of key/value pairs.

Output: a set of key/value pairs.

The user provides two functions: Map and Reduce.

Main concepts of the MapReduce programming paradigm:

SPLIT: splitting the input data and iterating over it;

MAP: computing key/value pairs on each split;

SHUFFLE and SORT: grouping intermediate values by key;

REDUCE: iterating over the resulting groups and reducing each group.
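The four phases above can be sketched concretely. A minimal word-count sketch in C++ (function names and types are illustrative, not taken from any particular MapReduce library):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Key/value pair type used by all phases (illustrative, not a library type).
using KV = std::pair<std::string, int>;

// MAP: compute key/value pairs on one split (here: emit (word, 1) per word).
std::vector<KV> map_fn(const std::vector<std::string>& split) {
    std::vector<KV> out;
    for (const std::string& w : split) out.push_back({w, 1});
    return out;
}

// SHUFFLE and SORT: group intermediate values by key (std::map keeps keys sorted).
std::map<std::string, std::vector<int>> shuffle(
        const std::vector<std::vector<KV>>& mapped) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& part : mapped)
        for (const KV& kv : part) groups[kv.first].push_back(kv.second);
    return groups;
}

// REDUCE: iterate over the resulting groups and reduce each one (sum the counts).
std::map<std::string, int> reduce_fn(
        const std::map<std::string, std::vector<int>>& groups) {
    std::map<std::string, int> out;
    for (const auto& g : groups) {
        int sum = 0;
        for (int v : g.second) sum += v;
        out[g.first] = sum;
    }
    return out;
}
```

SPLIT is represented implicitly: each chunk of the input is handed to its own `map_fn` call, and the shuffled groups then pass through `reduce_fn`.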


Page 13: On Implementation of Neural Network (Back-propagation)

MapReduce Programming Model: An example of applying MapReduce to machine learning

The paper "Map-Reduce for Machine Learning on Multicore" gives us a programming framework model which uses the MapReduce paradigm to do parallel data processing:

Page 14: On Implementation of Neural Network (Back-propagation)

The Artificial Neural Network

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems.
R. Rojas: Neural Networks. Springer-Verlag, Berlin, 1996.

Page 15: On Implementation of Neural Network (Back-propagation)

The Artificial Neural Network: A simple MapReduce example

The neural network can be trained to recognise some patterns:

Page 16: On Implementation of Neural Network (Back-propagation)

The Back-Propagation Algorithm: Training the NN

In order to train a neural network, the weights of each unit must be adjusted so that the error between the desired output and the actual output is reduced.
The back-propagation algorithm:
http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
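The weight update that tutorial derives can be summarized; the following is a sketch of the standard rule for sigmoid units with learning rate η (the symbols are the conventional ones, not copied from the slides):

```latex
% Squared error between desired output d_j and actual output y_j:
E = \tfrac{1}{2}\sum_j (d_j - y_j)^2
% Error signal of an output unit (sigmoid derivative y(1-y)):
\delta_j = (d_j - y_j)\,y_j(1 - y_j)
% Error signal of a hidden unit, apportioned from the layer above:
\delta_i = y_i(1 - y_i)\sum_j w_{ij}\,\delta_j
% Weight update with learning rate \eta and input x_i on the connection:
\Delta w_{ij} = \eta\,\delta_j\,x_i
```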

Page 17: On Implementation of Neural Network (Back-propagation)

The Back-Propagation Algorithm: Back-Propagation concepts

1 Propagate inputs forward in the usual way:
  - All outputs are computed using a sigmoid threshold of the inner product of the corresponding weight and input vectors.
  - All outputs at stage n are connected to all the inputs at stage n+1.

2 Propagate the errors backwards by apportioning them to each unit according to the amount of the error the unit is responsible for.
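Step 1 above, written out as code: each unit outputs the sigmoid threshold of the inner product of its weight vector and the input vector (one weight vector per unit; names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Forward step for one stage: each unit's output is the sigmoid of <w, x>.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

std::vector<double> forward_layer(const std::vector<std::vector<double>>& weights,
                                  const std::vector<double>& input) {
    std::vector<double> output;
    for (const std::vector<double>& unit : weights) {
        double net = 0.0;                          // inner product <w, x>
        for (std::size_t i = 0; i < input.size(); ++i) net += unit[i] * input[i];
        output.push_back(sigmoid(net));            // sigmoid threshold
    }
    return output;                                 // feeds stage n+1 as its input
}
```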

Page 18: On Implementation of Neural Network (Back-propagation)

The Back-Propagation Algorithm

Back-Propagation process:


Page 20: On Implementation of Neural Network (Back-propagation)

3 versions of implementations: Sequential, Sketo, and TBB

A neural network (NN) C++ class was implemented:

- An instance of a neural network can be created by giving arguments for the number of inputs, layers, number of neurons, etc.
- Given an input pattern, it produces a (set of) output signal(s).
- It has a B-P method to update all neurons' weights.
- It has other methods for all kinds of operations, e.g. put weights, get weights.
- All the operations of the neural network are sequential.
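One plausible C++ shape of such a class, sketched from the bullet points above. The constructor arguments, method names, and zero initial weights are assumptions for illustration, not the actual interface from the repository:

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch of the sequential NN class described above.
class NeuralNetwork {
public:
    // sizes = {inputs, hidden..., outputs}; each layer also has a bias input.
    explicit NeuralNetwork(const std::vector<int>& sizes) : sizes_(sizes) {
        for (std::size_t l = 1; l < sizes.size(); ++l)
            w_.push_back(std::vector<double>(sizes[l] * (sizes[l - 1] + 1), 0.0));
    }

    // Given an input pattern, produce the output signals.
    std::vector<double> feed_forward(std::vector<double> x) {
        acts_.assign(1, x);
        for (std::size_t l = 0; l < w_.size(); ++l) {
            std::vector<double> y(sizes_[l + 1]);
            int nin = sizes_[l];
            for (int j = 0; j < sizes_[l + 1]; ++j) {
                double net = w_[l][j * (nin + 1) + nin];       // bias ("1" input)
                for (int i = 0; i < nin; ++i)
                    net += w_[l][j * (nin + 1) + i] * x[i];
                y[j] = 1.0 / (1.0 + std::exp(-net));           // sigmoid
            }
            x = y;
            acts_.push_back(x);
        }
        return x;
    }

    // One B-P weight update; call after feed_forward on the same pattern.
    void back_propagate(const std::vector<double>& target, double eta) {
        std::vector<double> delta;
        const std::vector<double>& out = acts_.back();
        for (std::size_t j = 0; j < out.size(); ++j)           // output deltas
            delta.push_back((out[j] - target[j]) * out[j] * (1.0 - out[j]));
        for (int l = static_cast<int>(w_.size()) - 1; l >= 0; --l) {
            const std::vector<double>& in = acts_[l];
            int nin = sizes_[l];
            std::vector<double> prev(nin, 0.0);
            for (int j = 0; j < sizes_[l + 1]; ++j) {
                for (int i = 0; i < nin; ++i) {                // apportion error
                    prev[i] += w_[l][j * (nin + 1) + i] * delta[j];
                    w_[l][j * (nin + 1) + i] -= eta * delta[j] * in[i];
                }
                w_[l][j * (nin + 1) + nin] -= eta * delta[j];  // bias weight
            }
            for (int i = 0; i < nin; ++i)                      // hidden deltas
                prev[i] *= in[i] * (1.0 - in[i]);
            delta = prev;
        }
    }

    std::vector<std::vector<double>> get_weights() const { return w_; }
    void put_weights(const std::vector<std::vector<double>>& w) { w_ = w; }

private:
    std::vector<int> sizes_;
    std::vector<std::vector<double>> w_;      // per-layer flat weight vectors
    std::vector<std::vector<double>> acts_;   // activations saved per layer
};
```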

Page 21: On Implementation of Neural Network (Back-propagation)

3 versions of implementations: Sequential, Sketo, and TBB

All three versions are implemented with the same architecture.

The training stage:

- Training data and an instance of the NN (B-P algorithm) are the inputs of a map function;
- The output of the map function is a set of new weights;
- The input of the reduce function is the output of the map function;
- The output of the reduce function is the average of these new weights (here I simply average all the weights).
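The shapes of those two training-stage functions can be sketched as below. `map_train`'s body is only a placeholder nudge (the real update is the NN's B-P step); `reduce_average` is the element-wise average the slide describes. All names are illustrative:

```cpp
#include <cstddef>
#include <vector>

using Weights = std::vector<double>;

// MAP: training data chunk + current weights in, new weights out.
Weights map_train(const Weights& w, const std::vector<double>& chunk) {
    double mean = 0.0;
    for (double x : chunk) mean += x;
    mean /= chunk.size();
    Weights out = w;
    for (double& wi : out) wi += 0.01 * mean;   // placeholder for B-P updates
    return out;
}

// REDUCE: average the weight sets produced by all map tasks.
Weights reduce_average(const std::vector<Weights>& all) {
    Weights avg(all[0].size(), 0.0);
    for (const Weights& w : all)
        for (std::size_t i = 0; i < w.size(); ++i) avg[i] += w[i];
    for (double& a : avg) a /= all.size();
    return avg;
}
```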

Page 22: On Implementation of Neural Network (Back-propagation)

3 versions of implementations: Sequential, Sketo, and TBB

When training is finished, this neural network can be used to recognize unknown data:

- The inputs of the map function are unknown patterns and the neural network algorithm;
- The output of the map function is a set of signals which denote what the input data are;
- No reduce processing is needed.

Page 23: On Implementation of Neural Network (Back-propagation)

Sequential implementation (STL): Training stage

The MAP and REDUCE functions:

Page 24: On Implementation of Neural Network (Back-propagation)

Sequential implementation (STL): Training stage

The function object used by MAP:

Page 25: On Implementation of Neural Network (Back-propagation)

Implementation using the Sketo library: Training stage

Forward and backward propagation: basic ideas

1 Each computing node has an instance of the neural network with the same initial weights;
2 This neural network is an extended neural network (each layer has a "1" input);
3 Let each computing node calculate the same amount of training samples;
4 Run a fixed number of training steps;
5 Sum up each node's weights and take the average;
6 Update each NN with the averaged weights, and calculate the total error;
7 Repeat 4-6 until the total error is less than a given value.
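Steps 4-7 can be sketched with a toy one-weight model per node: each node runs a fixed number of local update steps toward the mean of its samples, the node weights are averaged and broadcast back, and the loop repeats until the total squared error falls below a threshold. The model and all names are illustrative, not the deck's actual code:

```cpp
#include <cstddef>
#include <vector>

double train_until(const std::vector<std::vector<double>>& node_data, double eps) {
    std::size_t n = node_data.size();
    std::vector<double> w(n, 0.0);            // same initial weights (step 1)
    double total_error = eps + 1.0;
    while (total_error > eps) {               // repeat until converged (step 7)
        for (std::size_t k = 0; k < n; ++k)   // fixed local training (step 4)
            for (int step = 0; step < 10; ++step)
                for (double x : node_data[k])
                    w[k] += 0.1 * (x - w[k]);
        double avg = 0.0;                     // sum and average weights (step 5)
        for (double wk : w) avg += wk;
        avg /= n;
        total_error = 0.0;                    // broadcast + total error (step 6)
        for (std::size_t k = 0; k < n; ++k) {
            w[k] = avg;
            for (double x : node_data[k])
                total_error += (x - avg) * (x - avg);
        }
    }
    return w[0];
}
```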

Page 26: On Implementation of Neural Network (Back-propagation)

Implementation using STL, Sketo, TBB: Well-trained stage

Then the system can be used to analyse the data's patterns:

1 Map/split the input data to each computing node/core;
2 Reduce is not needed.

This parallelizes very well: the input data are all independent.

For the Sketo and TBB implementations, the parallelism of this stage is P, if there are P processors (cores).

Page 27: On Implementation of Neural Network (Back-propagation)

Implementation using STL, Sketo, TBB: TBB

I use tbb::parallel_for and tbb::parallel_reduce to implement MAP and REDUCE.

1 The TBB version looks a little more complex than the Sketo one, but it is also very easy to use.
2 TBB also provides a lot of useful tools, such as concurrent containers, task management, etc.
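The fork-join shape that tbb::parallel_reduce provides can be sketched with only the standard library, so no TBB install is needed to follow it; TBB additionally handles range splitting, grain size, and work stealing. A stand-in using std::async (function and variable names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector by forking one task per chunk and joining the partial results.
double parallel_sum(const std::vector<double>& v, std::size_t chunks) {
    std::size_t step = (v.size() + chunks - 1) / chunks;
    std::vector<std::future<double>> parts;
    for (std::size_t b = 0; b < v.size(); b += step) {
        std::size_t e = std::min(b + step, v.size());
        // MAP: each task folds its own sub-range.
        parts.push_back(std::async(std::launch::async, [&v, b, e] {
            return std::accumulate(v.begin() + b, v.begin() + e, 0.0);
        }));
    }
    double total = 0.0;
    for (std::future<double>& p : parts) total += p.get();  // REDUCE: join results
    return total;
}
```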

Page 28: On Implementation of Neural Network (Back-propagation)

Implementation using STL, Sketo, TBB

The source code is on Google Code:
http://skyto-neurl-network.googlecode.com/svn/trunk/skyto NN

Page 29: On Implementation of Neural Network (Back-propagation)

Performance Test: On a multi-core machine

Test the performance on an 8-core workstation (weaker than "kraken", but I have a GPU :D):

- 8-core Xeon E5620, 8 GB RAM, 2.4 GHz. Training stage:
- 1 million input patterns.
- Neural network: 4 inputs, 4 hidden layers, each hidden layer has 4 neurons, 4 outputs.
- TBB version 3.0, GCC 4.4.5, Ubuntu 10.04 LTS 64-bit.

Page 30: On Implementation of Neural Network (Back-propagation)

Performance Test: On a multi-core machine

The STL version vs the Sketo version (1 core):
- Training: 6.5992 s vs 7.1128 s
- Recognizing data: 1.4782 s vs 1.4762 s

Page 31: On Implementation of Neural Network (Back-propagation)

Performance Test: On a multi-core machine

The test result of the Sketo version

Page 32: On Implementation of Neural Network (Back-propagation)

Performance Test: On a multi-core machine

The test result of the Sketo version - speedup

Page 33: On Implementation of Neural Network (Back-propagation)

Performance Test: On a multi-core machine

Comparison of the Sketo version and the TBB version (8 cores):
- Training: Sketo 1.0110 s, TBB 1.3924 s
- Recognizing data: Sketo 0.1925 s, TBB –

Page 34: On Implementation of Neural Network (Back-propagation)

The Remaining Problems

The implementation is not yet complete (Boost version). Some details are not resolved:

- Some B-P algorithm problems, such as the "local minima" problem;
- Not tested with very big data;
- The size of the neural network is hard to decide (I still lack knowledge of NNs).

Page 35: On Implementation of Neural Network (Back-propagation)

The Boost MapReduce library

The Boost.MapReduce library is a MapReduce implementation across a plurality of CPU cores rather than machines. The library is implemented as a set of C++ class templates, and is a header-only library.

- It provides map, reduce, combine, and lots of utilities;
- It is based on Boost.FileSystem and Boost.Thread; like Hadoop, it uses files as I/O media.

It is not yet part of the Boost library and is still under development and review.