
Page 1:

Restricted Boltzmann Machines on Multi-Core Processors

By Sai Prasad Nooka and Stavan Karia

Page 2:

Overview

• What is Machine Learning?

• Brain vs. Processors

• Motivation & Goal

• Introduction to Artificial Neural Networks

• What is a Deep Neural Network?

• Why Deep Neural Networks?

• Boltzmann Machines

• Restricted Boltzmann Machines

• Semi-Supervised Learning

• Learning Feature Hierarchy

• RBM Implementation

• Compute Unified Device Architecture (CUDA)

• Implementation on GPU

• Results

• Conclusion

Page 3:

What is Machine Learning?

• A digit-recognition demo: http://www.cs.toronto.edu/~hinton/adi/index.htm

• This model learns to generate combinations of labels and images.

[Figure (Hinton): the demo network. A 28 x 28 pixel image and 10 label neurons feed two 500-neuron hidden layers, topped by 2000 top-level neurons.]

Page 4:

Brain vs. Processors

• The brain is made up of billions of cells called neurons, with highly parallel and adaptive connections.

• Chips now have a similar number of transistors, but their connections are not adaptive.

Switching time:

• Neurons switch at a frequency of about 10 kHz.

• Processor switching frequencies are approaching 10 GHz, so in this respect processors are far faster.

Connections:

• In the brain, each neuron is interconnected with thousands of other neurons.

• In processors, there are at most about 10 connections per transistor.

Page 5:

Motivation & Goal

http://publications.csail.mit.edu/abstracts/abstracts07/brussell2/brussell2.html

Page 6:

Motivation & Goal

• The goal is to solve practical problems using novel learning algorithms inspired by the brain, and to make computers more user friendly.

• Try to achieve human-like performance on problems such as:

  • Object detection

  • Speech recognition

Page 7:

Introduction to Artificial Neural Networks

[Figure: a single artificial neuron, with weighted inputs summed and passed through an activation function.]
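
To make the figure concrete, here is a minimal host-side sketch of one artificial neuron (hypothetical weights and inputs; a sigmoid activation is assumed), compilable with nvcc or any C++ compiler:

    // neuron.cu -- a single artificial neuron: weighted sum of the inputs
    // passed through a sigmoid activation. All values are made up.
    #include <cmath>
    #include <cstdio>

    float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

    // Output of one neuron: activation(bias + sum_i w[i] * x[i]).
    float neuron(const float *x, const float *w, float bias, int n) {
        float sum = bias;
        for (int i = 0; i < n; ++i) sum += w[i] * x[i];
        return sigmoid(sum);
    }

    int main() {
        float x[3] = {1.0f, 0.0f, 1.0f};   // inputs
        float w[3] = {0.5f, -0.3f, 0.8f};  // connection weights
        printf("output = %f\n", neuron(x, w, -0.2f, 3));  // sigmoid(1.1) ~ 0.75
        return 0;
    }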

Page 8:

Introduction to Artificial Neural Networks

http://neuralnetworksanddeeplearning.com/chap5.html

Page 9:

What is a Deep Neural Network?

http://neuralnetworksanddeeplearning.com/chap5.html

Page 10:

Why Deep Neural Networks?

• There is a high chance that back-propagation gets stuck in a local minimum.

• It is very slow in networks with multiple hidden layers.

• It requires labeled training data.

These limitations of back-propagation motivate the unsupervised, layer-wise training approach discussed in the following slides.

Page 11:

Boltzmann Machines

• Boltzmann Machines were introduced by Hinton & Sejnowski ['83].

• Boltzmann Machines have bidirectional connections.

• Each neuron has a binary-valued state ('on' or 'off').

• Boltzmann Machines learn the complex regularities in the training data.

• State transitions are probabilistic.

• The learning algorithm is very slow in networks with many layers, which gave rise to Restricted Boltzmann Machines.

Page 12:

Restricted Boltzmann Machines (RBM)

• RBMs are Boltzmann Machines with the following restrictions:

  • There are no connections between any two visible units.

  • There are no connections between any two hidden units.

• With these restrictions, the hidden units are conditionally independent given a visible vector, so their states can be computed in parallel.

Page 13:

Semi-Supervised Learning

[Figure: unlabeled images (all cars/elephants) are used for feature learning; labeled "elephant"/"car" examples and a test set complete the setup. Source: Caltech-101]

Page 14:

Learning Feature Hierarchy

[Figure: a feature hierarchy learned layer by layer. Credit: Akshay N Hegde]

Page 15:

Compute Unified Device Architecture (CUDA)

• CUDA is a general-purpose architecture that allows parallel computation on NVIDIA GPUs.

• The CUDA programming model uses C and C++ to create special functions, called kernels, that define data-parallel computations.

• Kernels are executed by many threads on the GPU, which operates as a coprocessor/accelerator to the CPU.

• To run a kernel, its threads must first be organized into blocks that can run independently of each other (see the sketch below).
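
To illustrate this model, here is a minimal, self-contained CUDA sketch (not from the presentation): one kernel squares an array, with its threads organized into independent blocks:

    // square.cu -- minimal CUDA example: one kernel, threads grouped into blocks.
    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread handles one array element, identified by its block index
    // and its thread index within the block.
    __global__ void square(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] * data[i];
    }

    int main() {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = (float)i;

        float *dev;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch 4 independent blocks of 256 threads each (4 * 256 = n).
        square<<<4, 256>>>(dev, n);

        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        printf("host[3] = %f\n", host[3]);  // 9.0
        return 0;
    }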

Page 16:

RBM Implementation

• An RBM assigns a probability to each joint configuration (v, h) of visible and hidden units: p(v, h) = e^(-E(v, h)) / Z.

• Z is the partition function, given by:

  Z = Σ_(v,h) e^(-E(v, h))

• Given a random input v, the probability of hidden unit j being 1 is:

  p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)

  where σ(x) = 1 / (1 + e^(-x)) is the sigmoid function.

• Similarly, given a random hidden vector, the state of visible unit i is set to 1 with probability:

  p(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij)

Source: [2]
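
As an illustration only (hypothetical names, and one thread per hidden unit rather than the paper's block-per-neuron kernels), the sketch below computes p(h_j = 1 | v) on the GPU:

    // hidden_probs.cu -- p(h_j = 1 | v) = sigma(b_j + sum_i v_i * w_ij),
    // computed with one thread per hidden unit.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void hiddenProbs(const float *v, const float *w, const float *b,
                                float *ph, int I, int J) {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j >= J) return;
        float sum = b[j];
        for (int i = 0; i < I; ++i)
            sum += v[i] * w[j * I + i];      // weights stored as a J x I matrix
        ph[j] = 1.0f / (1.0f + expf(-sum));  // sigmoid
    }

    int main() {
        const int I = 4, J = 2;
        float v[I] = {1, 0, 1, 1}, w[J * I] = {0}, b[J] = {0}, ph[J];
        float *dv, *dw, *db, *dp;
        cudaMalloc(&dv, sizeof v);  cudaMemcpy(dv, v, sizeof v, cudaMemcpyHostToDevice);
        cudaMalloc(&dw, sizeof w);  cudaMemcpy(dw, w, sizeof w, cudaMemcpyHostToDevice);
        cudaMalloc(&db, sizeof b);  cudaMemcpy(db, b, sizeof b, cudaMemcpyHostToDevice);
        cudaMalloc(&dp, sizeof ph);
        hiddenProbs<<<1, 32>>>(dv, dw, db, dp, I, J);
        cudaMemcpy(ph, dp, sizeof ph, cudaMemcpyDeviceToHost);
        printf("p(h_0 = 1 | v) = %f\n", ph[0]);  // 0.5 with zero weights and biases
        return 0;
    }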

Page 17:

RBM Implementation

• Updating the weights and biases by contrastive divergence, where ⟨·⟩_data and ⟨·⟩_recon denote averages over the data and over the reconstruction, and η is the learning rate:

  Δw_ij = η (⟨v_i h_j⟩_data - ⟨v_i h_j⟩_recon)   (6)

  Δa_i = η (⟨v_i⟩_data - ⟨v_i⟩_recon)   (7)

  Δb_j = η (⟨h_j⟩_data - ⟨h_j⟩_recon)   (8)

Source: [2]
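
A minimal CUDA sketch of update (6), assuming a single sample so the averages reduce to products (hypothetical names; not the paper's Correct Weights kernel):

    // correct_weights.cu -- weight update (6) for a single sample:
    // w_ij += eta * (v_i * h_j - v'_i * h'_j). One thread per weight.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void correctWeights(float *w, const float *v, const float *h,
                                   const float *vr, const float *hr,
                                   int I, int J, float eta) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // visible index
        int j = blockIdx.y * blockDim.y + threadIdx.y;  // hidden index
        if (i < I && j < J)
            w[j * I + i] += eta * (v[i] * h[j] - vr[i] * hr[j]);
    }

    int main() {
        const int I = 2, J = 2;
        float w[J * I] = {0}, v[I] = {1, 0}, h[J] = {1, 1}, vr[I] = {0, 0}, hr[J] = {1, 0};
        float *dw, *dv, *dh, *dvr, *dhr;
        cudaMalloc(&dw, sizeof w);   cudaMemcpy(dw, w, sizeof w, cudaMemcpyHostToDevice);
        cudaMalloc(&dv, sizeof v);   cudaMemcpy(dv, v, sizeof v, cudaMemcpyHostToDevice);
        cudaMalloc(&dh, sizeof h);   cudaMemcpy(dh, h, sizeof h, cudaMemcpyHostToDevice);
        cudaMalloc(&dvr, sizeof vr); cudaMemcpy(dvr, vr, sizeof vr, cudaMemcpyHostToDevice);
        cudaMalloc(&dhr, sizeof hr); cudaMemcpy(dhr, hr, sizeof hr, cudaMemcpyHostToDevice);
        correctWeights<<<dim3(1, 1), dim3(16, 16)>>>(dw, dv, dh, dvr, dhr, I, J, 0.1f);
        cudaMemcpy(w, dw, sizeof w, cudaMemcpyDeviceToHost);
        printf("w[0] = %f\n", w[0]);  // 0.1 * (1*1 - 0*1) = 0.1
        return 0;
    }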

Page 18:

Algorithm 1

Source: [2]

Page 19:

Algorithm 2

Source: [2]

Page 20:

RBM Implementation on GPU

• RBM kernels:

  • Compute Status Hidden Units

  • Compute Status Visible Units

  • Correct Weights

Page 21:

Sequence of GPU kernel calls per epoch

Source: [2]

Page 22:

Kernel Implementation

• Compute Status Hidden Units kernel & Compute Status Visible Units kernel:

  • Each neuron in the visible or hidden layer is handled by one block, which sums the values computed by its threads using a reduction process and then computes the output of the neuron for the active sample (see the sketch below).

  • The order in which the weight matrix is placed in memory affects both kernels. The weights are stored as a J x I matrix, which favors Compute Status Hidden Units, since that kernel is executed more often.

• Correct Weights kernel, 1st method:

  • The kernel sums the values of all samples in each block.

  • Each thread gathers and sums the values for all the samples; a reduction process then takes place in order to compute the weight and bias updates.

Source: [2]
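
A sketch of the block-per-neuron pattern described above, assuming one block per hidden unit and a power-of-two thread count (illustrative names, not the paper's code):

    // block_per_neuron.cu -- each block computes one hidden unit: its threads
    // partially sum v_i * w_ij over a strided slice, a shared-memory tree
    // reduction combines the partial sums, and thread 0 applies the sigmoid.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define THREADS 128  // threads per block (power of two for the reduction)

    __global__ void hiddenUnitBlock(const float *v, const float *w, const float *b,
                                    float *ph, int I) {
        __shared__ float partial[THREADS];
        int j = blockIdx.x;  // this block's hidden unit

        float sum = 0.0f;
        for (int i = threadIdx.x; i < I; i += blockDim.x)
            sum += v[i] * w[j * I + i];
        partial[threadIdx.x] = sum;
        __syncthreads();

        // Tree reduction in shared memory.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
            __syncthreads();
        }

        if (threadIdx.x == 0)
            ph[j] = 1.0f / (1.0f + expf(-(b[j] + partial[0])));
    }

    int main() {
        const int I = 256, J = 4;
        float *dv, *dw, *db, *dp, ph[J];
        cudaMalloc(&dv, I * sizeof(float));      cudaMemset(dv, 0, I * sizeof(float));
        cudaMalloc(&dw, J * I * sizeof(float));  cudaMemset(dw, 0, J * I * sizeof(float));
        cudaMalloc(&db, J * sizeof(float));      cudaMemset(db, 0, J * sizeof(float));
        cudaMalloc(&dp, J * sizeof(float));
        hiddenUnitBlock<<<J, THREADS>>>(dv, dw, db, dp, I);  // one block per neuron
        cudaMemcpy(ph, dp, sizeof ph, cudaMemcpyDeviceToHost);
        printf("p(h_0 = 1 | v) = %f\n", ph[0]);  // 0.5 with zeroed data
        return 0;
    }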

Page 23:

Kernel Implementation

• Correct Weights kernel, 2nd method:

  • Each block has 16 x 16 threads; the first dimension of the block (x) is associated with an input unit 'i', while the second dimension (y) is associated with a hidden unit 'j'. Each thread within a block processes all the samples (see the sketch below).

Source: [2]
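
A kernel-only sketch of this second method, under the same assumptions as the earlier examples (illustrative names and a sample-major data layout; the launch configuration is shown in comments):

    // correct_weights_v2.cu -- second method: 16 x 16 thread blocks, x indexes
    // a visible unit i, y a hidden unit j; each thread loops over all N samples
    // and accumulates its own (i, j) statistics, so no reduction is needed.
    #include <cuda_runtime.h>

    __global__ void correctWeights2(float *w, const float *v, const float *h,
                                    const float *vr, const float *hr,
                                    int I, int J, int N, float eta) {
        int i = blockIdx.x * 16 + threadIdx.x;  // visible unit
        int j = blockIdx.y * 16 + threadIdx.y;  // hidden unit
        if (i >= I || j >= J) return;

        float delta = 0.0f;
        for (int n = 0; n < N; ++n)  // data phase minus reconstruction phase
            delta += v[n * I + i] * h[n * J + j] - vr[n * I + i] * hr[n * J + j];
        w[j * I + i] += eta * delta / N;
    }

    // Example launch covering the whole J x I weight matrix:
    //   dim3 block(16, 16);
    //   dim3 grid((I + 15) / 16, (J + 15) / 16);
    //   correctWeights2<<<grid, block>>>(dw, dv, dh, dvr, dhr, I, J, N, 0.1f);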

Page 24:

Comparison between two methods

Fig. Proportion of time spent, per epoch, in each task/kernel (as measured on a GTX 280 device).

Source: [2]

Page 25:

Limitations of the 1st Approach

• Two main problems are related to memory access:

  • Memory accesses are not coalesced, so cache performance is not at its best.

  • Many blocks try to access exactly the same memory addresses, generating memory conflicts.

Source: [2]

Page 26:

Experiment Setup

Data set: MNIST

Number of samples: 60,000

Number of visible units: 784 (28 x 28)

CPU: Intel dual-core i5-2410M with 8 GB of memory

GPU: NVIDIA GeForce GTX 460

• The number of hidden units and the number of training samples are varied across runs.

Source: [2]

Page 27:

Results

[Figure: speedup plots; the sample size increases within each plot, and the number of hidden units increases across the horizontal dimension.]

Source: [2]

Page 28:

Analysis

• The GPU speedups obtained range from 22 to 46 times.

• For example, with N = 60,000 samples and 800 hidden units, training takes 40 minutes per epoch on the CPU but only 53 seconds per epoch on the GPU.

Factor                  | Change    | Speedup             | Execution time
------------------------|-----------|---------------------|-------------------
Number of samples       | Increases | Tremendous increase | Drastic fall
Number of hidden units  | Increases | Sub-linear increase | Mediocre reduction

Source: [2]

Page 29:

Conclusion

• Training a Deep Belief Network model is time consuming and computationally expensive.

• With the help of GPUs, taking advantage of their inherently parallel architecture, many experiments can be run in a short period.

Source: [2]

Page 30:

References

[1] G. O. Young, "Synthetic structure of industrial plastics (Book style with paper title and editor)," in Plastics, 2nd ed., vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15-64.

[2] Noel Lopes, Bernardete Ribeiro and Joao Goncalves, "Restricted Boltzmann Machines and Deep Belief Networks on Multi-Core Processors," in WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia.

Page 31:

Thank You