Distributed Deep Learning Techniques



Apache Singa: A General Distributed Deep Learning Platform

Md Johirul Islam
Department of Computer Science, Iowa State University
[email protected]

    March 3, 2016


Outline
- Overview
- System Architecture
- Distributed Training Framework
- NeuralNet
- Training
- Summary

Overview

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets.

It is designed with an intuitive programming model based on the layer abstraction.

SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework.

SINGA can run on top of a distributed storage system to achieve scalability. The current version of SINGA supports HDFS.



Workflow

The training goal is to find the optimal parameters of the transformation functions that generate good features for specific tasks.

The SGD algorithm is used to randomly initialize the parameters and then update them iteratively using gradients computed on mini-batches.
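As a rough sketch of that loop (plain Python for illustration, not SINGA code; grad_fn and batches stand in for the model's gradient computation and the training data):

```python
import random

def sgd_train(init_params, grad_fn, batches, lr=0.01, epochs=5):
    """Randomly initialize the parameters, then update them iteratively.

    grad_fn(params, batch) is assumed to return one gradient per parameter.
    """
    params = [random.uniform(-0.1, 0.1) for _ in init_params]
    for _ in range(epochs):
        random.shuffle(batches)
        for batch in batches:
            grads = grad_fn(params, batch)
            # Vanilla SGD update: w <- w - lr * dL/dw
            params = [w - lr * g for w, g in zip(params, grads)]
    return params
```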



The training workload is distributed over the workers and servers.

In each iteration, every worker calls the TrainOneBatch function to compute parameter gradients.

TrainOneBatch takes a NeuralNet object representing a neural network and visits all the layers in a certain order.

The resulting gradients are aggregated by the local stub, which forwards them to the corresponding servers for updating.
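A highly simplified sketch of one iteration (illustrative Python only; the function and attribute names are invented, not SINGA's API):

```python
def train_one_batch(net, batch):
    """Sketch of one worker iteration: forward pass, backward pass, then
    collect the gradients of the parameters owned by the local layers."""
    for layer in net.layers:
        layer.compute_feature(batch)
    for layer in reversed(net.layers):
        layer.compute_gradient()
    return {p.name: p.grad for layer in net.layers for p in layer.params}

def stub_route(worker_grads, send_to_server):
    """Sketch of the stub: aggregate the gradients reported by the local
    workers and forward each one to the server owning that parameter."""
    aggregated = {}
    for grads in worker_grads:              # one dict per local worker
        for name, g in grads.items():
            aggregated[name] = aggregated.get(name, 0.0) + g
    for name, g in aggregated.items():
        send_to_server(name, g)
```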


System Architecture

Logical Architecture

Worker Group
- Made up of one or more workers.
- Each worker group trains a complete model replica for a particular dataset.
- The workers compute the parameter gradients.
- A worker group communicates with only one server group. Worker groups communicate with server groups asynchronously, while the workers inside a worker group communicate synchronously.

Server Group
- Made up of a number of servers.
- Each server manages a partition of the model parameters.
- The servers handle get/update requests.
- Neighboring server groups synchronize from time to time.
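To make the get/update idea concrete, here is a small invented sketch (not SINGA's API) of a server group whose servers each own one partition of the parameters:

```python
class Server:
    """Manages one partition of the model parameters."""
    def __init__(self, partition):
        self.params = dict(partition)

    def get(self, name):
        return self.params[name]

    def update(self, name, grad, lr=0.01):
        self.params[name] -= lr * grad      # SGD-style update

class ServerGroup:
    """Routes each get/update request to the server owning that parameter."""
    def __init__(self, partitions):
        self.owner = {}
        for partition in partitions:
            server = Server(partition)
            for name in partition:
                self.owner[name] = server

    def get(self, name):
        return self.owner[name].get(name)

    def update(self, name, grad):
        self.owner[name].update(name, grad)

# Two servers, each managing one partition of the parameters.
group = ServerGroup([{"w1": 0.1, "w2": 0.2}, {"w3": 0.3}])
group.update("w3", grad=0.5)
print(group.get("w3"))      # 0.295
```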


Parallelism

    Figure: Hybrid Parallelism.


Communication

In SINGA, workers and servers run in separate threads, and several workers and servers reside in a single process. A main thread in each process works as the stub. The communication between them occurs through messages.

The stub aggregates all the local messages and forwards them to the corresponding threads.


The SINGA communication library consists of two components: Message and Socket.


The message header contains the sender and receiver IDs.

Each sender and receiver ID comprises a group ID and a worker/server ID.

The stub forwards messages by looking up these IDs in its address table.


    Creating Address
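The slide's code listing is not part of this transcript. Purely as an invented illustration of the idea, an address can be composed from a group ID and a worker/server ID, and the stub can route messages by looking the receiver's address up in its table:

```python
def make_id(group_id, entity_id):
    """Compose a group ID and a worker/server ID into one logical address."""
    return (group_id, entity_id)

class Stub:
    """Keeps an address table and forwards each message to the queue
    registered for the message's receiver ID."""
    def __init__(self):
        self.address_table = {}

    def register(self, logical_id, inbox):
        self.address_table[logical_id] = inbox

    def forward(self, msg):
        self.address_table[msg["receiver"]].append(msg)

# A server with group ID 0 and server ID 1 registers its inbox with the stub,
# then a worker message addressed to it is routed through the address table.
stub = Stub()
server_inbox = []
stub.register(make_id(0, 1), server_inbox)
stub.forward({"sender": make_id(0, 0), "receiver": make_id(0, 1), "body": "update w"})
print(server_inbox)
```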


    Sockets

There are two types of sockets: the Dealer socket and the Router socket.

Communication between dealers and routers is asynchronous.

The basic functions of a socket are to send and receive messages.


    Poller


A Poller class provides asynchronous communication between the dealers and the routers.

One can register a set of SocketInterface objects with a Poller instance by calling its add method, and then call the poller's wait method to wait for the registered sockets to be ready for sending and receiving messages.
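A toy version of this add/wait pattern (modeled loosely on ZeroMQ-style dealer/router sockets; the classes below are invented for illustration and are not SINGA's communication API):

```python
import queue

class DealerSocket:
    """Toy dealer: sends messages to the single router it is connected to."""
    def __init__(self, router, sender_id):
        self.router, self.sender_id = router, sender_id

    def send(self, msg):
        self.router.inbox.put((self.sender_id, msg))

class RouterSocket:
    """Toy router: receives messages from any number of dealers."""
    def __init__(self):
        self.inbox = queue.Queue()

    def ready(self):
        return not self.inbox.empty()

    def receive(self):
        return self.inbox.get()

class Poller:
    """Register sockets with add(); wait() returns the ones ready to receive."""
    def __init__(self):
        self.sockets = []

    def add(self, sock):
        self.sockets.append(sock)

    def wait(self):
        while True:                          # a real poller blocks in the OS
            ready = [s for s in self.sockets if s.ready()]
            if ready:
                return ready

router = RouterSocket()
dealer = DealerSocket(router, sender_id=(0, 0))
poller = Poller()
poller.add(router)
dealer.send("parameter gradients")
for sock in poller.wait():
    print(sock.receive())
```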


In SINGA, a Dealer socket can connect to only one Router socket.

The connection is set up by connecting the dealer socket to the endpoint of the router socket. A Router socket can connect to one or more Dealer sockets. Upon receiving a message, the router forwards it to the appropriate dealer according to the receiver ID of the message.


Distributed Training Framework


The SINGA cluster topology supports different distributed training frameworks.

The cluster topology is configured in the cluster field of JobProto.
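Purely as an illustration (the field names below are approximations and may not match SINGA's actual JobProto), the cluster configuration boils down to a handful of group-size knobs, and different settings give the topologies described next:

```python
# Hypothetical illustration only: the real field names live in SINGA's JobProto.
cluster = {
    "nworker_groups": 2,      # number of worker groups (model replicas)
    "nserver_groups": 1,      # number of server groups
    "nworkers_per_group": 4,  # workers that train one replica synchronously
    "nservers_per_group": 4,  # servers sharing the parameter partitions
}

# Roughly, the topologies described below correspond to different settings:
#   SandBlaster: one server group serving all workers (synchronous)
#   AllReduce:   a worker and a server bound together on every node (synchronous)
#   Downpour:    several worker groups updating the servers asynchronously
```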


Types of Topology

    SandBlaster

This is a synchronous framework used by Google Brain. A single server group is launched to handle all requests from the workers. A worker computes on its partition of the model, and only communicates with the servers handling the related parameters.


    Figure: SandBlaster topology


    AllReduce

This is a synchronous framework used by Baidu's DeepImage.

Each worker is bound with a server on the same node, so that each node is responsible for maintaining one partition of the parameters and collecting updates from all the other nodes.


    Figure: AllReduce topology


    Downpour

This is an asynchronous framework used by Google Brain.

    Figure: Downpour topology


NeuralNet


NeuralNet represents a user's neural network model.

The neural network has to be converted into a NeuralNet configuration. Users configure a NeuralNet by listing all layers of the neural net and specifying each layer's source layer names.
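As an illustration only (SINGA's real configuration is written as protobuf text, and the layer type names below are made up), a net listed as layers with their source-layer names looks roughly like this:

```python
# A toy three-layer feed-forward net, written as "layers plus source layers".
net = [
    {"name": "data",   "type": "input",         "srclayers": []},
    {"name": "hidden", "type": "inner_product", "srclayers": ["data"]},
    {"name": "loss",   "type": "softmax_loss",  "srclayers": ["hidden"]},
]
```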


    Feed Forward

They do not have any cycles. Examples: MLP, CNN.


    Figure: A Simple MLP


Energy Models

In energy models the connections are undirected. To convert these models into a NeuralNet, each undirected connection is replaced with two directed connections.


    RNN Models

For recurrent neural networks, the first step is to unroll the recurrent layer.
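A tiny conceptual sketch of unrolling (not SINGA code): a recurrent layer applied for T time steps becomes T chained feed-forward copies, each fed by the input at its step and by the previous copy:

```python
def unroll(layer_name, num_steps):
    """Replace one recurrent layer with num_steps chained feed-forward copies."""
    layers, prev_hidden = [], None
    for t in range(num_steps):
        name = f"{layer_name}_t{t}"
        srcs = [f"input_t{t}"] + ([prev_hidden] if prev_hidden else [])
        layers.append({"name": name, "srclayers": srcs})
        prev_hidden = name
    return layers

print(unroll("recurrent", num_steps=3))
```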


Layer

Layer is the core abstraction in SINGA. It performs a variety of feature transformations to obtain high-level features.


Built-in Layers
- Input layers: for loading data from HDFS, disk, or the network into memory.
- Neuron layers: for feature transformations, e.g. convolution, pooling, dropout.
- Loss layers: for measuring the training objective loss, e.g. cross-entropy loss, Euclidean loss.
- Output layers: for writing prediction output to disk, HDFS, etc.
- Connection layers: for connecting partitions when a NeuralNet is partitioned.


Input Layers

A base layer for loading data from a data store. It has different subclasses: SingleLabelRecordLayer, RecordInputLayer, CSVInputLayer, ImagePreprocessLayer, and many others.


    Output Layers

This layer gets data from its source layer and converts it into records of type RecordProto. The records are written as (key, value) tuples into a Store.


    Neuron Layer

These layers perform feature transformations; for example, ConvolutionLayer conducts the convolution transformation.


    ConnectionLayer

- ConcateLayer: connects to more than one source layer to concatenate their feature blobs along a given dimension.
- SliceLayer: connects to more than one destination layer to slice its feature blob along a given dimension.
- SplitLayer: connects to more than one destination layer to replicate its feature blob.
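The tensor operations behind these layers are easy to picture with NumPy (an analogy only, not SINGA code):

```python
import numpy as np

a = np.ones((4, 3))    # feature blob from one source layer
b = np.zeros((4, 2))   # feature blob from another source layer

# ConcateLayer analogy: concatenate feature blobs along a given dimension
concat = np.concatenate([a, b], axis=1)      # shape (4, 5)

# SliceLayer analogy: slice one feature blob along a given dimension
left, right = np.split(concat, [3], axis=1)  # shapes (4, 3) and (4, 2)

# SplitLayer analogy: replicate one feature blob for several destinations
copy1, copy2 = concat.copy(), concat.copy()

print(concat.shape, left.shape, right.shape)
```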


Base Layer Class

The base Layer class defines the fields and methods that every layer shares and that custom layers override.


    Creating Custom Layer
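The code shown on the slides is not included in this transcript. As a rough sketch of the general pattern (class and method names invented, not SINGA's exact API), a custom layer subclasses the base Layer class and overrides its setup, feature-computation, and gradient-computation hooks:

```python
class Layer:
    """Toy stand-in for the base layer class."""
    def setup(self, conf, srclayers):
        raise NotImplementedError

    def compute_feature(self, srclayers):
        raise NotImplementedError

    def compute_gradient(self, srclayers):
        raise NotImplementedError

class ScaleLayer(Layer):
    """Hypothetical custom layer that multiplies its input feature by a constant."""
    def setup(self, conf, srclayers):
        self.scale = conf.get("scale", 1.0)
        self.data, self.grad = [], []

    def compute_feature(self, srclayers):
        self.data = [self.scale * x for x in srclayers[0].data]

    def compute_gradient(self, srclayers):
        # d(loss)/d(input) = scale * d(loss)/d(output)
        srclayers[0].grad = [self.scale * g for g in self.grad]
```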


Param

A Param object in SINGA represents a set of parameters, e.g. a weight matrix or a bias vector, configured inside a layer configuration.


    Different Parameter Types


    Creating Custom Parameter Type


Training

TrainOneBatch


For each SGD iteration, every worker calls the TrainOneBatch function to compute the gradients of the parameters associated with its local layers.

SINGA implements two algorithms for TrainOneBatch:
- BP (back-propagation), used by feed-forward and RNN models
- CD (contrastive divergence), used by energy models


    Implementing new Algorithms

To implement a new algorithm for TrainOneBatch, we have to create a subclass of Worker.
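As an illustrative sketch only (the class and method names are invented, not SINGA's API), a BP-style worker subclass visits the layers forward and then backward, and a new algorithm such as CD would simply be another subclass:

```python
class Worker:
    """Toy stand-in for the Worker base class."""
    def __init__(self, net):
        self.net = net

    def train_one_batch(self, batch):
        raise NotImplementedError       # each training algorithm overrides this

class BPWorker(Worker):
    """Back-propagation: forward pass, then backward pass in reverse order."""
    def train_one_batch(self, batch):
        for layer in self.net.layers:
            layer.compute_feature(batch)
        for layer in reversed(self.net.layers):
            layer.compute_gradient()

class CDWorker(Worker):
    """Contrastive divergence for energy models: positive phase, Gibbs sampling
    for the negative phase, then the gradient computation (details omitted)."""
    def train_one_batch(self, batch):
        ...
```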


Updater

Every server in SINGA has an updater instance. There are many updaters, all of which are subclasses of the Updater class.

The base Updater implements the vanilla SGD algorithm.
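A minimal sketch of the idea (the interface below is invented; SINGA's real Updater is a C++ class): the base updater applies vanilla SGD, and new update rules subclass it:

```python
class Updater:
    """Vanilla SGD: param <- param - lr * grad."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def learning_rate(self, step):
        return self.lr                  # fixed rate by default

    def update(self, step, param, grad):
        return param - self.learning_rate(step) * grad

class MomentumUpdater(Updater):
    """Hypothetical subclass adding classical momentum."""
    def __init__(self, lr=0.01, momentum=0.9):
        super().__init__(lr)
        self.momentum, self.velocity = momentum, 0.0

    def update(self, step, param, grad):
        self.velocity = self.momentum * self.velocity - self.learning_rate(step) * grad
        return param + self.velocity
```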


    Learning Rate

There are different change methods, such as kFixed, kLinear, kExponential, kInverseT, kStep, and kFixedStep.
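For illustration, here are common forms of a few such schedules (these are typical definitions, not necessarily the exact formulas SINGA uses):

```python
def lr_fixed(step, base_lr=0.01):
    return base_lr

def lr_step(step, base_lr=0.01, gamma=0.1, step_size=1000):
    # "Step" style: drop the rate by a factor of gamma every step_size steps
    return base_lr * (gamma ** (step // step_size))

def lr_exponential(step, base_lr=0.01, gamma=0.999):
    # Exponential decay of the base rate
    return base_lr * (gamma ** step)

print(lr_fixed(2500), lr_step(2500), lr_exponential(2500))
```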


For different change methods, different configuration fields are used.


    Implement Custom Updater

    Figure: Base Updater Class


Summary


SINGA can be used without much programming experience.

To get custom layers, parameters, or algorithms, we need to change the code.

Apache SINGA is still in the development phase; a lot of features are being added.

It currently has a Python binding following Keras.

It currently supports training on GPUs.
