Deep learning on a mixed cluster with Deeplearning4j and Spark
Barcelona Spark meetup, Dec 9, 2016 (right after NIPS)
[email protected] @huitseeker
Agenda
Intro
Why Deep Learning on a Cluster
Big Data Architecture
Deeplearning4j
Spark challenges
Introduction : Deep Learning in the trenches today
The bad thing about doing a talk right after NIPS
You guys are scary.
The good thing about doing a talk right after NIPS
You guys don't need to be told SkyNet is a fantasy (for now).
Paying algorithms
Anomaly detection in many forms (bad guys / predictive maintenance / market rally)
Fraud detection
Network intrusion
Fintech securities churn prediction
Video object detection (security)
Models that are being neglected in benchmarks and implementation efforts
LSTMs
Autoencoders
How to deal with this in the Spark world ?
experiment with trained model application: Tensorframes,
what are the deep learning frameworks that let you train?
Why Deep Learning on a cluster ?
Practically ... let's look at benchmarks
Training, but how ?
New Amazon GPU instances
Cluster training in the enterprise
it's really about multi-tenancy & economies of scale
a big bunch of machines shared among everybody shares better
if only because you can reuse it for other workloads
Minor reasons
enterprises may not have GPUs
Distributing training
basically distributing SGD (R)
challenge is AllReduce Communication
Sparse updates, async communications
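To make the "distributing SGD" point concrete, here is a minimal single-process sketch of one synchronous data-parallel step: each worker computes a gradient on its own shard, an AllReduce-style mean combines them, and every worker applies the same update. The class and method names (AllReduceSketch, allReduceMean, sgdStep) are hypothetical and only for illustration; this is not the Deeplearning4j or Spark API, and a real AllReduce exchanges the arrays over the network instead of looping over them in memory.

```java
import java.util.Arrays;

public class AllReduceSketch {

    /** Element-wise mean of the per-worker gradients (what every worker holds after an AllReduce). */
    static double[] allReduceMean(double[][] workerGradients) {
        int n = workerGradients[0].length;
        double[] mean = new double[n];
        for (double[] gradient : workerGradients) {
            for (int i = 0; i < n; i++) {
                mean[i] += gradient[i];
            }
        }
        for (int i = 0; i < n; i++) {
            mean[i] /= workerGradients.length;
        }
        return mean;
    }

    /** One synchronous SGD step: params - learningRate * mean gradient. */
    static double[] sgdStep(double[] params, double[][] workerGradients, double learningRate) {
        double[] mean = allReduceMean(workerGradients);
        double[] updated = params.clone();
        for (int i = 0; i < updated.length; i++) {
            updated[i] -= learningRate * mean[i];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] params = {1.0, 2.0};
        // Two workers, each with a gradient computed on its own data shard (made-up values).
        double[][] gradients = {{2.0, 0.0}, {4.0, 2.0}};
        System.out.println(Arrays.toString(sgdStep(params, gradients, 0.5)));
    }
}
```

The communication cost sits in allReduceMean: every step moves a full gradient per worker, which is why sparse updates and asynchronous communication are attractive.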
Distributing training : good engineering matters
Cluster training in your (experimenter) case ?
it's a fun problem : AllReduce
Ultimately solved for people with a large amount of images
that solution is not open-source (but at Facebook, Google, Amazon, Microsoft¹, Baidu)
¹: 1-bit SGD is under non-commercial license in CNTK 2.0
Big Data architecture
With a parameter server
With Spark
Spark does the initial ETL
Spark ingests the final result
In the middle : parameter server.
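As a rough illustration of the piece "in the middle", here is a toy in-memory parameter server: workers pull a snapshot of the parameters, compute a gradient, and push it back, and the server applies each push as soon as it arrives (asynchronous SGD). Everything here is an assumption made for the sketch: the class name ParameterServerSketch, the pull/push methods, and the toy objective (w - 3)^2 are hypothetical and do not come from Deeplearning4j or Spark. A real parameter server shards parameters across machines and has to handle fault tolerance.

```java
import java.util.Arrays;

public class ParameterServerSketch {
    private final double[] params;
    private final double learningRate;

    ParameterServerSketch(double[] initial, double learningRate) {
        this.params = initial.clone();
        this.learningRate = learningRate;
    }

    /** A worker pulls a snapshot (copy) of the current parameters. */
    synchronized double[] pull() {
        return params.clone();
    }

    /** A worker pushes a gradient; the server applies it immediately. */
    synchronized void push(double[] gradient) {
        for (int i = 0; i < params.length; i++) {
            params[i] -= learningRate * gradient[i];
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ParameterServerSketch server = new ParameterServerSketch(new double[]{0.0}, 0.1);
        // Each worker minimizes the toy objective (w - 3)^2, whose gradient is 2 * (w - 3).
        Runnable worker = () -> {
            for (int step = 0; step < 200; step++) {
                double[] w = server.pull();                      // possibly stale by push time
                server.push(new double[]{2.0 * (w[0] - 3.0)});
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        // The value should end up close to 3, but the exact digits depend on thread interleaving.
        System.out.println(Arrays.toString(server.pull()));
    }
}
```

In the architecture above, Spark would sit on both sides of this loop: preparing the training data before it, and consuming the trained parameters after it.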
Spark cluster modes
Mesos GPU support merged
devices cgroups !
YARN GPU support through tags
Spark Standalone : ?
Deeplearning4j
Deeplearning4j
the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala
Skymind its commercial support arm
Scientific computing on the JVM
libnd4j : Vectorization, 32-bit addressing, linalg (BLAS!)
JavaCPP: generates JNI bindings to your CPP libs
ND4J : numpy for the JVM, native superfast arrays
Datavec : one-stop interface to an NDArray
DeepLearning4J: orchestration, backprop, layer definition
ScalNet: gateway drug, inspired from (and closely following) Keras
RL4J : Reinforcement learning for the JVM
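With Spark
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.spark.api.TrainingMaster;
    import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
    import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;
    import org.nd4j.linalg.dataset.DataSet;

    JavaSparkContext sc = ...;
    JavaRDD<DataSet> trainingData = ...;
    MultiLayerConfiguration networkConfig = ...;

    // Create the TrainingMaster instance
    int examplesPerDataSetObject = 1;
    TrainingMaster trainingMaster = new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
            // ... (other configuration options)
            .build();

    // Create the SparkDl4jMultiLayer instance
    SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);

    // Fit the network using the training data
    sparkNetwork.fit(trainingData);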
Spark Challenges
Even if you don't care about Deeplearning
(from Kazuaki Ishizaki @ IBM Japan)
SPARK-6442 : better linear algebra than Breeze
ND4J will have sparse representations soon
Even if you don't care about Deeplearning II
Meta-RDDs
Killing the bottlenecks
Spark has already changed its networking backend once.
Better support for parameter servers and their fault tolerance.
A Last Word (from Andrew Y. Ng)
get involved !
don't just read papers, reproduce research results
Also
We're happy to mentor contributions, and there's a book !
Questions ?