Parallelization Strategies of Deep Learning Neural Network Training

Disclaimer: This is a beta talk - don’t kill me

Building Brains - Parallelisation Strategies of Large-Scale Deep Learning Neural Networks on Parallel Scale Out Architectures Like Apache Spark

Romeo Kienzler, IBM Watson IoT

Upload: romeo-kienzler

Post on 22-Jan-2018

Category: Science

TRANSCRIPT

Page 1: Parallelization Strategies of Deep Learning Neural Network Training

Building Brains - Parallelisation Strategies of Large-Scale Deep Learning Neural Networks on Parallel Scale Out Architectures Like Apache Spark

Romeo Kienzler, IBM Watson IoT

Pages 2-5: Parallelization Strategies of Deep Learning Neural Network Training (title and disclaimer slides, repeated)

Page 6: Parallelization Strategies of Deep Learning Neural Network Training

the forward pass
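The slide itself is a diagram; as a minimal sketch of what the forward pass computes (illustrative NumPy, not code from the talk), each layer multiplies the previous layer's output by a weight matrix, adds a bias, and applies a nonlinearity:

```python
# Minimal forward-pass sketch (illustrative only, not from the talk):
# each layer computes sigmoid(W @ input + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    a1 = sigmoid(W1 @ x + b1)   # hidden-layer activations
    a2 = sigmoid(W2 @ a1 + b2)  # network output
    return a1, a2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
_, y = forward(rng.normal(size=3), W1, b1, W2, b2)
```

Real frameworks batch many inputs per pass; the structure is the same.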

Page 7: Parallelization Strategies of Deep Learning Neural Network Training

back propagation

Page 8: Parallelization Strategies of Deep Learning Neural Network Training

back propagation
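Back propagation pushes the output error backwards through the layers with the chain rule. A hedged sketch (illustrative, not from the talk) for the two-layer sigmoid network above:

```python
# Backpropagation sketch (illustrative): gradients of
# L = 0.5 * ||a2 - target||^2 computed layer by layer, output to input.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass, keeping activations for the backward pass."""
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return a1, a2

def backward(x, a1, a2, target, W2):
    d2 = (a2 - target) * a2 * (1.0 - a2)   # output-layer error
    dW2, db2 = np.outer(d2, a1), d2
    d1 = (W2.T @ d2) * a1 * (1.0 - a1)     # error propagated one layer back
    dW1, db1 = np.outer(d1, x), d1
    return dW1, db1, dW2, db2
```

The returned gradients feed directly into the gradient-descent update on the next slides.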

Pages 9-14: Parallelization Strategies of Deep Learning Neural Network Training

gradient descent
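The gradient descent slides animate the same update rule: step the weights against the gradient of the loss. A minimal sketch (illustrative, not from the talk):

```python
# Gradient descent sketch (illustrative): repeat w <- w - eta * dL/dw.
def gradient_descent(grad, w0, eta=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= eta * grad(w)     # step against the gradient
    return w

# L(w) = (w - 4)^2 has gradient 2*(w - 4) and its minimum at w = 4.
w = gradient_descent(lambda w: 2.0 * (w - 4.0), w0=0.0)
```

With a full network, `grad` is exactly what backpropagation returns.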

Page 15: Parallelization Strategies of Deep Learning Neural Network Training

parallelisation

‣ inter-model parallelism
‣ data parallelism
‣ intra-model parallelism
‣ pipelined parallelism

Pages 16-22: Parallelization Strategies of Deep Learning Neural Network Training

inter-model parallelism, aka hyper-parameter space exploration / tuning

Model A (Node 1), Model B (Node 2), Model C (Node 3)
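Inter-model parallelism needs no coordination during training: each candidate configuration is an independent job. A toy sketch (illustrative, not from the talk; threads stand in for the slides' cluster nodes):

```python
# Inter-model parallelism sketch (illustrative): three hyper-parameter
# candidates train independently, like Model A/B/C on Node 1/2/3.
from concurrent.futures import ThreadPoolExecutor

def train(lr, steps=200):
    """Toy training job: minimize (w - 3)^2 with learning rate lr."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return lr, (w - 3.0) ** 2          # (hyper-parameter, final loss)

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train, [0.001, 0.01, 0.1]))

best_lr, best_loss = min(results, key=lambda r: r[1])
```

In practice the workers are separate processes or cluster nodes, but the pattern is identical: fan out, train, compare.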

Pages 23-29: Parallelization Strategies of Deep Learning Neural Network Training

data parallelism, aka “Jeff Dean style” parameter averaging

Model A replicated on Node 1, Node 2, Node 3, coordinated by a Parameter Server
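In data parallelism every node trains the same model on its own data shard, and a parameter server merges the replicas. A minimal sketch (illustrative, not from the talk; a plain average stands in for the parameter server):

```python
# Data-parallelism sketch (illustrative): each worker takes one gradient
# step on its shard; the "parameter server" averages the replicas.
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One gradient step of linear regression on one worker's shard."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
shards = np.array_split(np.arange(90), 3)    # one shard per node

w = np.zeros(3)                              # global parameters
for _ in range(200):
    replicas = [local_step(w, X[i], y[i]) for i in shards]
    w = np.mean(replicas, axis=0)            # parameter averaging
```

The list comprehension is where a real system runs the three workers concurrently; only the averaged weights cross the network.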

Page 30: Parallelization Strategies of Deep Learning Neural Network Training

intra-model parallelism (Parameter Server)

Model A Part 1 (Node 1), Model A Part 2 (Node 2), Model A Part 3 (Node 3)

Pages 31-35: Parallelization Strategies of Deep Learning Neural Network Training

intra-model parallelism
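Intra-model parallelism splits a single model across nodes. One common form, sketched here (illustrative, not from the talk), partitions a layer's weight matrix by output rows so each node computes a slice of the activations:

```python
# Intra-model parallelism sketch (illustrative): the layer's weight
# matrix is split into Part 1/2/3, one per node; each node computes its
# slice of the output and the slices are concatenated.
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(6, 4))            # full layer: 4 inputs, 6 outputs
x = rng.normal(size=4)

parts = np.array_split(W, 3, axis=0)   # Part 1/2/3, one per node
partial = [Wp @ x for Wp in parts]     # these run in parallel in practice
y = np.concatenate(partial)

assert np.allclose(y, W @ x)           # identical to the undivided layer
```

Each node only has to hold a third of the weights, which is the point when the model no longer fits on one machine.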

Pages 36-41: Parallelization Strategies of Deep Learning Neural Network Training

pipelined parallelism

0 0 0 1 1 0 1 1
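Pipelined parallelism assigns consecutive stages (layers) to different workers: while stage 2 processes micro-batch i, stage 1 already works on micro-batch i+1, like the bit patterns streaming across the slides. A minimal two-stage sketch (illustrative, not from the talk):

```python
# Pipelined-parallelism sketch (illustrative): two stages connected by a
# queue run concurrently on a stream of micro-batches.
import queue
import threading

handoff = queue.Queue()
results = []
DONE = object()                       # sentinel closing the stream

def stage1(batches):
    for b in batches:
        handoff.put(b * 2)            # stand-in for layer-1 compute
    handoff.put(DONE)

def stage2():
    while True:
        item = handoff.get()
        if item is DONE:
            break
        results.append(item + 1)      # stand-in for layer-2 compute

t1 = threading.Thread(target=stage1, args=([0, 1, 2, 3],))
t2 = threading.Thread(target=stage2)
t1.start(); t2.start()
t1.join(); t2.join()
```

The FIFO queue keeps micro-batches in order while both stages stay busy; real pipelines chain more stages across devices the same way.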

Page 42: Parallelization Strategies of Deep Learning Neural Network Training

[Diagram: Apache Spark architecture - a Driver JVM and five Compute Nodes, each running three Executor JVMs]

Pages 43-44: Parallelization Strategies of Deep Learning Neural Network Training

DeepLearning4J

vs.

data parallelism

Page 45: Parallelization Strategies of Deep Learning Neural Network Training

ND4J (DeepLearning4J)

• Tensor support (linear buffer + stride)

• Multiple implementations, one interface

• Vectorized C++ code (JavaCPP), off-heap data storage, BLAS (OpenBLAS, Intel MKL, cuBLAS)

• GPU (CUDA 8)
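The "linear buffer + stride" design means a tensor is a flat buffer plus per-dimension strides, so views like transposes never copy data. A toy sketch of the idea (illustrative Python, not ND4J code):

```python
# "Linear buffer + stride" sketch (illustrative): element [i, j] lives
# at flat offset i*stride[0] + j*stride[1]; transposing swaps strides.
buf = list(range(6))                  # flat buffer of a 2x3 tensor

def at(i, j, stride):
    return buf[i * stride[0] + j * stride[1]]

row_major = (3, 1)                    # strides of the 2x3 view
transposed = (1, 3)                   # strides of the 3x2 view, no copy

assert at(1, 2, row_major) == 5       # element [1, 2] of the 2x3 tensor
assert at(2, 1, transposed) == 5      # the same element seen as 3x2
```

This is also why the data can live off-heap: the JVM only needs the shape/stride metadata, while BLAS operates on the raw buffer.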

Page 46: Parallelization Strategies of Deep Learning Neural Network Training

ND4J (DeepLearning4J), annotated: intra-model parallelism

Page 47: Parallelization Strategies of Deep Learning Neural Network Training

Apache SystemML

• Custom machine learning algorithms

• Declarative ML

• Transparent distribution on data-parallel framework

• Scale-up

• Scale-out

• Cost-based optimiser generates low-level execution plans

Page 48: Parallelization Strategies of Deep Learning Neural Network Training

2007-2008: Multiple projects at IBM Research - Almaden involving machine learning on Hadoop.

2009: We form a dedicated team for scalable ML.

2009-2010: Through engagements with customers, we observe how data scientists create ML solutions.

Page 49: Parallelization Strategies of Deep Learning Neural Network Training

2011-2014: Research

Page 50: Parallelization Strategies of Deep Learning Neural Network Training

June 2015: IBM announces open-source SystemML.

September 2015: Code available on GitHub.

November 2015: SystemML enters Apache incubation.

February 2016: First release (0.9) of Apache SystemML.

June 2016: Second Apache release (0.10).

Page 51: Parallelization Strategies of Deep Learning Neural Network Training

Apache SystemML (DML script):

    U = rand(nrow(X), r, min = -1.0, max = 1.0);
    V = rand(r, ncol(X), min = -1.0, max = 1.0);
    while (i < mi) {
      i = i + 1; ii = 1;
      if (is_U)
        G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
      else
        G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
      norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
      R = -G; S = R;
      while (norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
        if (is_U) {
          HS = (W * (S %*% V)) %*% t(V) + lambda * S;
          alpha = norm_R2 / sum(S * HS);
          U = U + alpha * S;
        } else {
          HS = t(U) %*% (W * (U %*% S)) + lambda * S;
          alpha = norm_R2 / sum(S * HS);
          V = V + alpha * S;
        }
        R = R - alpha * HS;
        old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
        S = R + (norm_R2 / old_norm_R2) * S;
        ii = ii + 1;
      }
      is_U = ! is_U;
    }

Page 52: Parallelization Strategies of Deep Learning Neural Network Training

(same DML script as page 51)

SystemML: compile and run at scale, no performance code needed!

Page 53: Parallelization Strategies of Deep Learning Neural Network Training

[Chart: running time (sec, axis 0-20000) of R, MLlib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB datasets; R and MLlib hit >24h runtimes or run out of memory (OOM) on the larger datasets]

Page 54: Parallelization Strategies of Deep Learning Neural Network Training

Architecture: High-Level Algorithm -> SystemML Optimizer -> Parallel Spark Program

Page 55: Parallelization Strategies of Deep Learning Neural Network Training

Architecture

High-Level Operations (HOPs): general representation of statements in the data analysis language

Low-Level Operations (LOPs): general representation of operations in the runtime framework

High-level language front-ends; multiple execution environments; cost-based optimizer

Page 56: Parallelization Strategies of Deep Learning Neural Network Training

Architecture (HOPs/LOPs as on page 55), annotated: data parallelism

Page 57: Parallelization Strategies of Deep Learning Neural Network Training

Architecture (HOPs/LOPs as on page 55), annotated: intra-model parallelism

Pages 58-60: Parallelization Strategies of Deep Learning Neural Network Training

TensorFrames: five Compute Nodes, each running three Executor JVMs

inter-model parallelism

Pages 61-62: Parallelization Strategies of Deep Learning Neural Network Training

TensorSpark: Parameter Server on the Driver; five Compute Nodes, each running three Executor JVMs

data parallelism

Pages 63-64: Parallelization Strategies of Deep Learning Neural Network Training

CaffeOnSpark

data parallelism

Page 65: Parallelization Strategies of Deep Learning Neural Network Training

?