Hadoop Summit 2014 Distributed Deep Learning

Deep Learning on Hadoop Scaleout Deep Learning on YARN


DESCRIPTION

Deep Learning on Hadoop with DeepLearning4j and Metronome

Deep learning is useful for detecting anomalies such as fraud, spam, and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices. Deeplearning4j is a scalable deep-learning architecture suitable for Hadoop and other big-data infrastructures. It includes a distributed deep-learning framework and a normal single-machine framework; i.e. it also runs on a single thread. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala, and Clojure. The distributed deep-learning framework is built for data input and neural-net training at scale, and its output should be highly accurate predictive models. The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets, and recursive neural tensor networks.

TRANSCRIPT

Page 1: Hadoop Summit 2014 Distributed Deep Learning

Deep Learning on Hadoop

Scaleout Deep Learning on YARN

Page 3: Hadoop Summit 2014 Distributed Deep Learning

Josh Patterson

Email: [email protected]

Twitter: @jpatanooga

Github: https://github.com/jpatanooga

Past

Published in IAAI-09:

“TinyTermite: A Secure Routing Algorithm”

Grad work in Meta-heuristics, Ant-algorithms

Tennessee Valley Authority (TVA)

Hadoop and the Smartgrid

Cloudera

Principal Solution Architect

Today: Patterson Consulting

Page 4: Hadoop Summit 2014 Distributed Deep Learning

Overview

• What is Deep Learning?

• Deep Belief Networks

• Implementation on Hadoop/YARN

• Results

Page 5: Hadoop Summit 2014 Distributed Deep Learning

What is Deep Learning?

Page 6: Hadoop Summit 2014 Distributed Deep Learning

What is Deep Learning?

Algorithm that tries to learn simple features in lower layers

And more complex features in higher layers

Page 7: Hadoop Summit 2014 Distributed Deep Learning

Interesting Properties of Deep Learning

Reduces the overfitting problem in neural networks.

Introduces new techniques for “unsupervised feature learning.”

Introduces more automatic ways to discover which parts of your data to feed into the learning algorithm.

Page 8: Hadoop Summit 2014 Distributed Deep Learning

Chasing Nature

Learning sparse representations of auditory signals

leads to filters that closely correspond to neurons in early audio processing in mammals

When applied to speech

Learned representations showed a striking resemblance to the cochlear filters in the auditory cortex

Page 9: Hadoop Summit 2014 Distributed Deep Learning

Yann LeCun on Deep Learning

Has become the dominant method for acoustic modeling in speech recognition

Quickly becoming the dominant method for several vision tasks such as

object recognition

object detection

semantic segmentation.

Page 10: Hadoop Summit 2014 Distributed Deep Learning

Deep Belief Networks

Page 11: Hadoop Summit 2014 Distributed Deep Learning

What is a Deep Belief Network?

Generative probabilistic model

Composed of one visible layer

Many hidden layers

Each hidden layer learns the relationships between units in the layer below

Higher-layer representations tend to become more complex (sketched below)
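As background (the standard DBN formulation from Hinton et al., 2006, not text from the slides), a DBN with visible units v and hidden layers h^1, ..., h^L defines the joint distribution

    P(v, h^1, ..., h^L) = P(v | h^1) P(h^1 | h^2) ... P(h^{L-2} | h^{L-1}) P(h^{L-1}, h^L)

where the top two layers form an RBM and every lower layer is generated top-down from the layer above it.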

Page 12: Hadoop Summit 2014 Distributed Deep Learning

Restricted Boltzmann Machines

Unsupervised model: does feature learning by repeatedly sampling the input data, learning to reconstruct it well enough for good feature detection. RBMs use different formulations for different kinds of data (sketched after this list):

Binary

Continuous
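As background (standard formulations, not text from the slides): a binary RBM defines P(v, h) proportional to exp(-E(v, h)) with energy

    E(v, h) = -b'v - c'h - v'Wh        (binary visible and hidden units)

while the Gaussian-Bernoulli variant used for continuous inputs changes the visible-unit term:

    E(v, h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2) - c'h - sum_{i,j} (v_i / sigma_i) W_ij h_j

Training by repeated sampling (contrastive divergence) adjusts W, b, and c so the model reconstructs its inputs better, which is what makes the learned hidden units good feature detectors.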

Page 13: Hadoop Summit 2014 Distributed Deep Learning

DeepLearning4J

Implementation in Java

Self-contained & built on Akka, Hazelcast, Jblas

Distributed to run faster and with more features than current Theano-based implementations.

Talks to any data source, expects one format.

Page 14: Hadoop Summit 2014 Distributed Deep Learning

Vectorized Implementation

Handles lots of data concurrently.

Processes any number of examples at once without changing the code (see the sketch after this list).

Faster: Allows for native/GPU execution.

One format: Everything is a matrix.
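A minimal sketch of what "everything is a matrix" buys you, written against jblas (the linear-algebra library the deck says DL4J is built on); the layer below is illustrative, not DL4J's actual API. The same forward pass runs unchanged whether the batch holds one example or a million:

    import org.jblas.DoubleMatrix;
    import org.jblas.MatrixFunctions;

    public class VectorizedLayer {
        // One weight matrix and one bias row vector serve the whole mini-batch.
        final DoubleMatrix W;   // nIn x nOut
        final DoubleMatrix b;   // 1 x nOut

        VectorizedLayer(int nIn, int nOut) {
            W = DoubleMatrix.randn(nIn, nOut).muli(0.01);
            b = DoubleMatrix.zeros(1, nOut);
        }

        // X is (batchSize x nIn); the batch size never appears in the code.
        DoubleMatrix forward(DoubleMatrix X) {
            DoubleMatrix z = X.mmul(W).addRowVector(b);             // batchSize x nOut
            DoubleMatrix one = DoubleMatrix.ones(z.rows, z.columns);
            return one.div(MatrixFunctions.exp(z.neg()).add(1.0));  // elementwise sigmoid
        }

        public static void main(String[] args) {
            VectorizedLayer layer = new VectorizedLayer(784, 500);
            System.out.println(layer.forward(DoubleMatrix.rand(64, 784)).rows);    // 64
            System.out.println(layer.forward(DoubleMatrix.rand(4096, 784)).rows);  // 4096
        }
    }

Because every operation is a matrix call, the same code can be retargeted at a native BLAS or a GPU backend without touching the model logic.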

Page 15: Hadoop Summit 2014 Distributed Deep Learning

DL4J vs Theano Perf

GPUs are inherently faster than native CPU execution.

Theano is not distributed, and GPUs have comparatively little RAM.

DL4J allows for situations where you have to “throw CPUs at it.”

Page 16: Hadoop Summit 2014 Distributed Deep Learning

What are Good Applications for Deep Learning?

Image Processing

High MNIST Scores

Audio Processing

Current Champ on TIMIT dataset

Text / NLP Processing

Word2vec, etc

Page 17: Hadoop Summit 2014 Distributed Deep Learning

Deep Learning on

Hadoop

Page 18: Hadoop Summit 2014 Distributed Deep Learning

Past Work: Parallel Iterative Algorithms on YARN

Started with

Parallel linear, logistic regression

Parallel Neural Networks

Packaged in Metronome

100% Java, ASF 2.0 Licensed, on github

Page 19: Hadoop Summit 2014 Distributed Deep Learning


Parameter Averaging

McDonald, 2010

Distributed Training Strategies for the Structured Perceptron

Langford, 2007

Vowpal Wabbit

Jeff Dean’s Work on Parallel SGD

DownPour SGD
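The core of parameter averaging (McDonald, 2010) fits in one line: with N workers, each holding parameters w_i after a pass over its own data split, the master computes

    w_global = (1/N) * (w_1 + w_2 + ... + w_N)

and broadcasts w_global back to every worker before the next pass. DownPour SGD differs in that workers asynchronously push gradients to, and pull parameters from, a shared parameter server instead of synchronizing on an average.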

Page 20: Hadoop Summit 2014 Distributed Deep Learning


MapReduce vs. Parallel Iterative

[Diagram: on the left, a MapReduce flow (Input -> Map -> Reduce -> Output); on the right, parallel iterative processing, in which a fixed set of processors advances together through Superstep 1, Superstep 2, and so on.]

Page 21: Hadoop Summit 2014 Distributed Deep Learning


SGD: Serial vs Parallel

[Diagram: serial SGD trains one model over all of the training data; parallel SGD divides the data into Splits 1 through N, Workers 1 through N each train a partial model on their own split, and the Master averages the partial models into a global model.]
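A toy, self-contained sketch of the parallel pattern in the diagram (illustrative only; the class and the stand-in worker update are hypothetical, not Metronome's code): each worker produces a partial model from the current global model, and the master averages the partial models into the next global model every pass.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ParameterAveragingSketch {

        // Master step: average the workers' partial models into the global model.
        static double[] average(List<double[]> partials) {
            double[] global = new double[partials.get(0).length];
            for (double[] w : partials)
                for (int j = 0; j < global.length; j++)
                    global[j] += w[j] / partials.size();
            return global;
        }

        public static void main(String[] args) {
            int nWorkers = 3, nParams = 4, nPasses = 5;
            double[] global = new double[nParams];

            for (int pass = 0; pass < nPasses; pass++) {
                List<double[]> partials = new ArrayList<>();
                for (int worker = 0; worker < nWorkers; worker++) {
                    // Stand-in for a real SGD pass over this worker's data split:
                    // each worker updates its own copy of the global parameters.
                    double[] w = global.clone();
                    for (int j = 0; j < nParams; j++) w[j] += 0.1 * (worker + 1);
                    partials.add(w);
                }
                global = average(partials);   // superstep boundary: workers sync here
            }
            System.out.println(Arrays.toString(global));
        }
    }

In the real system the worker loop runs in separate YARN containers and the averaging happens at the master over the network, but the shape of the loop is the same.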

Page 22: Hadoop Summit 2014 Distributed Deep Learning

Managing Resources

Running through YARN on Hadoop is important

Allows for workflow scheduling

Allows for scheduler oversight

Allows the jobs to be first class citizens on Hadoop

And share resources nicely

Page 23: Hadoop Summit 2014 Distributed Deep Learning

Parallelizing Deep Belief Networks

Two phase training

Pre Train

Fine tune

Each phase can make multiple passes over the dataset

Entire network is averaged at master

Page 24: Hadoop Summit 2014 Distributed Deep Learning

PreTrain and Lots of Data

We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks

Allows for the use of far more unlabeled data

Allows us to more easily model the massive amounts of structured data in HDFS

Page 25: Hadoop Summit 2014 Distributed Deep Learning

Results

Page 26: Hadoop Summit 2014 Distributed Deep Learning

DBNs on IR Performance

Faster to Train.

Parameter averaging is an automatic form of regularization.

Adagrad with IR (iterative reduce) allows for better generalization across different features and more even pacing.
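For reference, the standard Adagrad rule the slide refers to keeps a running sum of squared gradients per parameter and scales each step by it:

    G_j <- G_j + g_j^2
    w_j <- w_j - (alpha / (sqrt(G_j) + epsilon)) * g_j

so frequently-updated parameters take smaller steps while rare features keep larger effective learning rates, which is what evens out the pacing across features when combined with iterative reduce.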

Page 27: Hadoop Summit 2014 Distributed Deep Learning

Scale Out Metrics

Batches of records can be processed by as many workers as there are data splits

Message passing overhead is minimal

Exhibits linear scaling

Example: 3x workers, 3x faster learning

Page 28: Hadoop Summit 2014 Distributed Deep Learning

Usage From Command Line

Run Deep Learning on Hadoop

yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]

Evaluate model

./score_model.sh [props file]

Page 29: Hadoop Summit 2014 Distributed Deep Learning

Handwriting Renders

Page 30: Hadoop Summit 2014 Distributed Deep Learning

Faces Renders

Page 31: Hadoop Summit 2014 Distributed Deep Learning

…In Which We Gather Lots of Cat Photos

Page 32: Hadoop Summit 2014 Distributed Deep Learning

Future Direction

GPUs

Better Vectorization tooling

Move YARN version back over to JBLAS for matrices

Page 33: Hadoop Summit 2014 Distributed Deep Learning

References

“A Fast Learning Algorithm for Deep Belief Nets”

Hinton, G. E., Osindero, S. and Teh, Y. - Neural Computation (2006)

“Large Scale Distributed Deep Networks”

Dean, Corrado, Monga - NIPS (2012)

“Visually Debugging Restricted Boltzmann Machine Training with a 3D Example”

Yosinski, Lipson - Representation Learning Workshop (2012)