Deep Learning with H2O
H2O.ai: Scalable In-Memory Machine Learning
Webinar, 5/21/14
SriSatish Ambati, CEO and Co-Founder
Arno Candel, PhD, Physicist & Hacker
H2O Deep Learning, @ArnoCandel
Outline
Intro & Live Demo (5 mins)
Methods & Implementation (10 mins)
Results & Live Demo (10 mins)
MNIST handwritten digits
text classification
Q & A (10 mins)
About H2O (aka 0xdata)
Pure Java, Apache v2 Open Source
Join the www.h2o.ai/community!
+1 Cyprien Noel for prior work
Customer Demands for Practical Machine Learning
Requirements      Value
In-Memory         Fast (Interactive)
Distributed       Big Data (No Sampling)
Open Source       Ownership of Methods
API / SDK         Extensibility
H2O was developed by 0xdata to meet these requirements
H2O Integration
[Diagram: H2O runs standalone, over YARN, or on Hadoop MRv1, with data in HDFS; interfaces: Java, R, Scala, JSON, Python]
H2O Architecture
[Diagram: H2O architecture components]
Distributed in-memory K-V store with column compression
Memory manager
MapReduce
Machine learning algorithms (e.g. Deep Learning)
R engine
Nano-fast scoring engine
Prediction engine
H2O + R = Happy Data Scientist
Machine Learning on Big Data with R: Data resides on the H2O cluster!
H2O Deep Learning in Action
Train: 60,000 rows, 784 integer columns, 10 classes
Test: 10,000 rows, 784 integer columns, 10 classes
MNIST = Digitized handwritten digits database (Yann LeCun)
Live Demo: Build an H2O Deep Learning model on MNIST train/test data
Data: 28x28=784 pixels with (gray-scale) values in 0…255
Yann LeCun: “Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet.”
What is Deep Learning?
Wikipedia: Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
Example: Input data (image) -> Prediction (who?)
Facebook's DeepFace (Yann LeCun) recognises faces as well as humans
Deep Learning is Trending
[Chart: Google Trends, 2011 to 2013]
Businesses are using Deep Learning techniques!
Google Brain (Andrew Ng, Jeff Dean & Geoffrey Hinton)
FBI FACE: $1 billion face recognition project
Chinese Search Giant Baidu Hires Man Behind the “Google Brain” (Andrew Ng)
What is NOT Deep
Linear models are not deep (by definition)
Neural nets with 1 hidden layer are not deep (no feature hierarchy)
SVMs and Kernel methods are not deep (2 layers: kernel + linear)
Classification trees are not deep (operate on original input space)
Deep Learning in H2O
1970s multi-layer feed-forward Neural Network (supervised learning with stochastic gradient descent using back-propagation)
+ distributed processing for big data (H2O in-memory MapReduce paradigm on distributed data)
+ multi-threaded speedup (H2O Fork/Join worker threads update the model asynchronously)
+ breakthrough algorithms for accuracy (weight initialization, adaptive learning, momentum, dropout, regularization)
= Top-notch prediction engine!
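Example Neural Network
“fully connected” directed graph of neurons
[Diagram: input neurons (age, income, employment) -> hidden neurons -> output neurons (married, single); information flows from input to output]
Layers:        Input layer   Hidden layer 1   Hidden layer 2   Output layer
#neurons:      3             4                3                2
#connections:  3x4 (input to hidden 1), 4x3 (hidden 1 to hidden 2), 3x2 (hidden 2 to output)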
Prediction: Forward Propagation
“neurons activate each other via weighted sums”
Inputs x_i: age, income, employment
Hidden layer 1 (weights u_ij): y_j = tanh(sum_i(x_i * u_ij) + b_j)
Hidden layer 2 (weights v_jk): z_k = tanh(sum_j(y_j * v_jk) + c_k)
Output layer (weights w_kl): p_l = softmax(sum_k(z_k * w_kl) + d_l)
Outputs p_l: married, single (per-class probabilities, sum_l(p_l) = 1)
softmax(x_k) = exp(x_k) / sum_k(exp(x_k))
b_j, c_k, d_l: bias values (independent of inputs)
Activation function: tanh; alternative: x -> max(0,x) (“rectifier”)
p_l is a non-linear function of x_i: can approximate ANY function with enough layers!
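The forward-propagation formulas on this slide can be sketched in a few lines of plain Python. This is only an illustration for the 3-4-3-2 example network, not H2O's implementation: the helper names (dense, softmax, forward) and all weight, bias, and input values are made up for the example.

```python
import math

def dense(inputs, weights, biases):
    """Weighted sums: out[j] = sum_i(inputs[i] * weights[i][j]) + biases[j]."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]

def softmax(values):
    """exp(x_k) / sum_k(exp(x_k)); shifting by the max avoids overflow."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, u, b, v, c, w, d):
    y = [math.tanh(s) for s in dense(x, u, b)]  # hidden layer 1
    z = [math.tanh(s) for s in dense(y, v, c)]  # hidden layer 2
    return softmax(dense(z, w, d))              # per-class probabilities

# Hypothetical parameters (values are made up for illustration).
u = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.1, 0.2], [-0.3, 0.2, 0.1, 0.5]]  # 3x4
b = [0.0, 0.1, -0.1, 0.0]
v = [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2], [-0.4, 0.2, 0.1], [0.0, 0.1, 0.3]]  # 4x3
c = [0.1, 0.0, -0.1]
w = [[0.5, -0.5], [-0.3, 0.3], [0.2, -0.2]]  # 3x2
d = [0.0, 0.0]

# Made-up, already-scaled inputs for age, income, employment.
p = forward([0.5, -1.0, 0.25], u, b, v, c, w, d)
print(dict(zip(["married", "single"], p)))
```

Whatever the weights are, the two outputs are positive and sum to 1, which is what lets them be read as per-class probabilities.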
Training: Update Weights & Biases
For each training row, we make a prediction and compare with the actual label (supervised learning):
class      predicted   actual
married    0.8         1
single     0.2         0
Objective: minimize prediction error (MSE or cross-entropy)
Mean Square Error = (0.2^2 + 0.2^2)/2 “penalize differences per-class”
Cross-entropy = -log(0.8) “strongly penalize non-1-ness”
Stochastic Gradient Descent: Update weights and biases via gradient of the error (via back-propagation):
w <- w - rate * ∂E/∂w
[Plot: error E as a function of a weight w; the step size is set by the rate]
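As a worked check of the numbers on this slide, the sketch below computes both error measures for the married/single example and applies one gradient-descent update. The function names and the toy error function E(w) = (w - 3)^2 are assumptions made for illustration; back-propagation itself is not shown.

```python
import math

predicted = {"married": 0.8, "single": 0.2}
actual = {"married": 1.0, "single": 0.0}

def mean_square_error(predicted, actual):
    """Average of squared per-class differences."""
    return sum((predicted[k] - actual[k]) ** 2 for k in actual) / len(actual)

def cross_entropy(predicted, actual):
    """Only the true class (actual == 1) contributes: -log(p_true)."""
    return -sum(actual[k] * math.log(predicted[k]) for k in actual)

def sgd_step(w, grad, rate):
    """w <- w - rate * dE/dw"""
    return w - rate * grad

mse = mean_square_error(predicted, actual)  # (0.2^2 + 0.2^2) / 2
ce = cross_entropy(predicted, actual)       # -log(0.8)

# One update on a toy error E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_old = 0.0
w_new = sgd_step(w_old, 2 * (w_old - 3), rate=0.1)
print(mse, ce, w_new)
```

The update moves w toward the minimum of the toy error at w = 3; a larger rate takes a larger step along the same direction.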
H2O Deep Learning Architecture
initial model: weights and biases w
map: each node trains a copy of the weights and biases with (some* or all of) its local data with asynchronous F/J threads
reduce: model averaging: average weights and biases from all nodes, speedup is at least #nodes/log(#rows) arxiv:1209.4129v3
updated model: w* = (w1+w2+w3+w4)/4
Keep iterating over the data (“epochs”), score from time to time
Query & display the model via JSON, WWW
Communication: nodes/JVMs: sync; threads: async
H2O atomic in-memory K-V store
[Diagram: nodes with HTTPD and K-V store; local weights w1, w2, w3, w4 are trained from the initial model w, and the partial sums w1+w3 and w2+w4 are combined into w*]
*user can specify the number of total rows per MapReduce iteration
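The map/reduce loop on this slide can be mimicked on a single machine. The sketch below is a toy under stated assumptions, not H2O code: each "node" is just a list of rows, the model is a two-parameter linear fit y = w0 + w1*x trained by plain sequential SGD (no asynchronous threads), and the names local_sgd, average_models, and map_reduce_iteration are hypothetical.

```python
def local_sgd(weights, rows, rate=0.05):
    """Map step stand-in: one pass of SGD over a node's local rows."""
    w0, w1 = weights
    for x, y in rows:
        err = (w0 + w1 * x) - y  # prediction error for this row
        w0 -= rate * err         # gradient of 0.5 * err^2 w.r.t. w0
        w1 -= rate * err * x     # gradient of 0.5 * err^2 w.r.t. w1
    return [w0, w1]

def average_models(models):
    """Reduce step: element-wise mean, e.g. w* = (w1+w2+w3+w4)/4."""
    return [sum(ws) / len(models) for ws in zip(*models)]

def map_reduce_iteration(weights, partitions):
    local_models = [local_sgd(list(weights), rows) for rows in partitions]
    return average_models(local_models)

# Made-up data from y = 1 + 2x, split round-robin across 4 "nodes".
rows = [(i / 8 - 1, 1 + 2 * (i / 8 - 1)) for i in range(16)]
partitions = [rows[n::4] for n in range(4)]

model = [0.0, 0.0]            # initial model w
for epoch in range(300):      # keep iterating over the data ("epochs")
    model = map_reduce_iteration(model, partitions)
print(model)
```

Averaging is applied after every pass here; the footnote on the slide suggests that in H2O the amount of data processed between averaging steps is user-configurable.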
“Secret” Sauce to Higher Accuracy
Adaptive learning rate - ADADELTA (Google)
Automatically set learning rate for each neuron based on its training history
Grid Search and Checkpointing
Run a grid search to scan many hyper-parameters, then continue training the most promising model(s)
Regularization
L1: penalizes non-zero weights
L2: penalizes large weights
Dropout: randomly ignore certain inputs
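To make the adaptive learning rate item more concrete, here is a minimal sketch of the published ADADELTA update rule for a single scalar weight. It is an assumption that H2O follows this rule exactly; the class name, the defaults rho=0.95 and eps=1e-6, and the toy error E(w) = w^2 are chosen only for illustration.

```python
import math

class AdaDelta:
    """ADADELTA state for one weight: running averages of g^2 and delta^2."""

    def __init__(self, rho=0.95, eps=1e-6):
        self.rho = rho            # decay of the running averages (assumed value)
        self.eps = eps            # small constant for numerical stability
        self.avg_sq_grad = 0.0    # E[g^2]
        self.avg_sq_delta = 0.0   # E[delta^2]

    def step(self, w, grad):
        self.avg_sq_grad = self.rho * self.avg_sq_grad + (1 - self.rho) * grad ** 2
        # The effective rate is RMS(previous deltas) / RMS(gradients).
        delta = -(math.sqrt(self.avg_sq_delta + self.eps)
                  / math.sqrt(self.avg_sq_grad + self.eps)) * grad
        self.avg_sq_delta = self.rho * self.avg_sq_delta + (1 - self.rho) * delta ** 2
        return w + delta

# Toy problem: minimize E(w) = w^2, whose gradient is 2w.
opt = AdaDelta()
w = 1.0
history = [w]
for _ in range(10):
    w = opt.step(w, 2 * w)
    history.append(w)
print(history)
```

No learning rate is passed in: the step size comes entirely from the weight's own gradient and update history, which is the property the slide highlights.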
MNIST: digits classification
Standing world record: Without distortions or convolutions, the best-ever published error rate on test set: 0.83% (Microsoft)
Time to check in on the demo!
Let’s see how H2O did in the past 10 minutes!
H2O Deep Learning on MNIST: 0.87% test set error (so far)
Test set error:
1.5% after 10 mins
1.0% after 1.5 hours
0.87% after 4 hours
World-class results!
No pre-training, no distortions, no convolutions, no unsupervised training
Running on 4 nodes with 16 cores each
Frequent errors: confuse 2/7 and 4/9
Use Case: Text Classification
Goal: Predict the item from seller’s text description
Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes
“Vintage 18KT gold Rolex 2 Tone in great condition”
Data: Binary word vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 (the 1s mark words that occur in the description, e.g. “vintage”, “gold”, “condition”)
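A binary word vector like the one above can be built as follows. This is a hedged sketch, not the preprocessing used for the webinar data: the whitespace tokenization, the function names, and the second item description are all assumptions made for the example.

```python
def build_vocabulary(descriptions):
    """Map each distinct lower-cased word to a column index."""
    words = sorted({w for text in descriptions for w in text.lower().split()})
    return {word: index for index, word in enumerate(words)}

def to_binary_vector(text, vocabulary):
    """1 if the word occurs in the text, 0 otherwise (counts are ignored)."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        index = vocabulary.get(word)
        if index is not None:
            vector[index] = 1
    return vector

descriptions = [
    "Vintage 18KT gold Rolex 2 Tone in great condition",  # from the slide
    "gold ring in good condition",                        # made-up second item
]
vocabulary = build_vocabulary(descriptions)
vectors = [to_binary_vector(text, vocabulary) for text in descriptions]
print(vocabulary)
print(vectors)
```

Each column corresponds to one vocabulary word, so a dataset with 8,647 columns would correspond to a vocabulary of that size, with most entries in any row being 0.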
Let’s see how H2O does on the ebay dataset!
Use Case: Text Classification
Out-Of-The-Box: 11.6% test set error after 10 epochs! Predicts the correct class (out of 143) 88.4% of the time!
Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes
Note 1: H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)
Note 2: No tuning was done (results are for illustration only)
Parallel Scalability (for 64 epochs on MNIST, with “0.87%” parameters)
[Chart: Speedup (axis 0 to 40) vs. number of H2O nodes (1, 2, 4, 8, 16, 32, 63); 4 cores per node, 1 epoch per node per MapReduce]
[Chart: Training time in minutes (axis 0 to 100) vs. number of H2O nodes (1, 2, 4, 8, 16, 32, 63); data label: 2.7 mins]
Outlook for H2O Deep Learning
Convolutional and Pooling Layers for General Image Recognition (ImageNet)
Sparse Auto-Encoders for Dimensionality Reduction and Anomaly Detection
Execution on GPU clusters for even faster training
H2O Steam: Scoring Platform
H2O Steam: More Coming Soon!
Key Take-Aways
H2O is a distributed in-memory math platform for enterprise-grade machine learning applications.
H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data!
Join our Community and Meetups!
git clone https://github.com/0xdata/h2o
http://docs.0xdata.com
www.h2o.ai/community
@hexadata