Webinar: Deep Learning with H2O

Deep Learning with H2O. H2O.ai: Scalable In-Memory Machine Learning. Webinar, 5/21/14. SriSatish Ambati, CEO and Co-Founder; Arno Candel, PhD, Physicist & Hacker

Upload: srisatish-ambati

Posted on 26-Jan-2015


DESCRIPTION

Note: Make sure to download the slides to get the high-resolution version! You can also find the webinar recording here (please download it for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov

Come hear how Deep Learning in H2O is unlocking never-before-seen prediction performance! H2O is a Google-scale open-source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation, and results from recent developments. Real-world classification & regression use cases on the eBay text dataset, MNIST handwritten digits, and Cancer datasets demonstrate the power of this game-changing technology.

TRANSCRIPT

Page 1: Webinar: Deep Learning with H2O

Deep Learning with H2O

H2O.ai: Scalable In-Memory Machine Learning

Webinar, 5/21/14

SriSatish Ambati, CEO and Co-Founder; Arno Candel, PhD, Physicist & Hacker

Page 2: Webinar: Deep Learning with H2O

H2O Deep Learning, @ArnoCandel

Outline

Intro & Live Demo (5 mins)

Methods & Implementation (10 mins)

Results & Live Demo (10 mins)

MNIST handwritten digits

text classification

Q & A (10 mins)


Page 3: Webinar: Deep Learning with H2O

About H2O (aka 0xdata)

Pure Java, Apache v2 Open Source

Join the www.h2o.ai/community!

+1 Cyprien Noel for prior work

Page 4: Webinar: Deep Learning with H2O

Customer Demands for Practical Machine Learning

Requirements | Value
In-Memory | Fast (Interactive)
Distributed | Big Data (No Sampling)
Open Source | Ownership of Methods
API / SDK | Extensibility

H2O was developed by 0xdata to meet these requirements

Page 5: Webinar: Deep Learning with H2O

H2O Integration

[Diagram: three deployment modes for H2O, each reading from HDFS: Standalone, Over YARN, and On MRv1 (Hadoop MR). Client interfaces: R, Scala, JSON, Python, Java.]

Page 6: Webinar: Deep Learning with H2O

H2O Architecture

[Diagram components:]
Distributed In-Memory K-V store, Col. compression
Memory manager
MapReduce
Machine Learning Algorithms (e.g. Deep Learning)
R Engine
Nano-fast Scoring Engine
Prediction Engine

Page 7: Webinar: Deep Learning with H2O

H2O + R = Happy Data Scientist

Machine Learning on Big Data with R: Data resides on the H2O cluster!

Page 8: Webinar: Deep Learning with H2O

H2O Deep Learning in Action

Train: 60,000 rows, 784 integer columns, 10 classes
Test: 10,000 rows, 784 integer columns, 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Live Demo: Build an H2O Deep Learning model on MNIST train/test data

Data: 28x28=784 pixels with (gray-scale) values in 0…255
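To make the 28x28=784 layout concrete, here is a minimal Python sketch that flattens one image into a single row of 784 integer columns. The image is synthetic (a made-up pattern, not MNIST data), and the row-major pixel order is an assumption, since the slide does not say how pixels map to columns.

```python
# Hypothetical 28x28 gray-scale image with values in 0..255 (synthetic data).
ROWS, COLS = 28, 28
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

def flatten(img):
    """Flatten a 2-D image into one row of pixel columns (row-major order assumed)."""
    return [pixel for row in img for pixel in row]

row = flatten(image)
print(len(row))            # number of input columns
print(min(row), max(row))  # value range of the pixels
```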

Yann LeCun: “Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet.”

Page 9: Webinar: Deep Learning with H2O

What is Deep Learning?

Wikipedia: Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.

Example: Input data (image) -> Prediction (who?)

Facebook's DeepFace (Yann LeCun) recognises faces as well as humans

Page 10: Webinar: Deep Learning with H2O

Deep Learning is Trending

[Chart: Google Trends, 2011 to 2013]

Businesses are using Deep Learning techniques!

Google Brain (Andrew Ng, Jeff Dean & Geoffrey Hinton)
FBI FACE: $1 billion face recognition project
Chinese Search Giant Baidu Hires Man Behind the “Google Brain” (Andrew Ng)

Page 11: Webinar: Deep Learning with H2O

What is NOT Deep

Linear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers: kernel + linear)

Classification trees are not deep (operate on original input space)

Page 12: Webinar: Deep Learning with H2O

Deep Learning in H2O

1970s multi-layer feed-forward Neural Network (supervised learning with stochastic gradient descent using back-propagation)
+ distributed processing for big data (H2O in-memory MapReduce paradigm on distributed data)
+ multi-threaded speedup (H2O Fork/Join worker threads update the model asynchronously)
+ breakthrough algorithms for accuracy (weight initialization, adaptive learning, momentum, dropout, regularization)
= Top-notch prediction engine!

Page 13: Webinar: Deep Learning with H2O

Example Neural Network

“fully connected” directed graph of neurons

[Diagram: information flows from the input layer through two hidden layers to the output layer. Input and output neurons are drawn differently from hidden neurons.]

Layer | #neurons | Neurons
Input layer | 3 | age, income, employment
Hidden layer 1 | 4 |
Hidden layer 2 | 3 |
Output layer | 2 | married, single

#connections: 3x4, 4x3, 3x2
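The connection counts on this slide follow directly from the layer sizes. The short Python sketch below reproduces them; the helper names are made up for illustration, and the bias count assumes one bias per non-input neuron.

```python
# Layer sizes from the slide: input, hidden 1, hidden 2, output.
layer_sizes = [3, 4, 3, 2]

def count_connections(sizes):
    """Number of weights between each pair of consecutive fully connected layers."""
    return [a * b for a, b in zip(sizes, sizes[1:])]

def count_biases(sizes):
    """One bias per non-input neuron (assumption stated in the lead-in)."""
    return sum(sizes[1:])

connections = count_connections(layer_sizes)
print(connections)                # per-layer weight counts: 3x4, 4x3, 3x2
print(sum(connections))           # total number of weights
print(count_biases(layer_sizes))  # total number of biases
```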

Page 14: Webinar: Deep Learning with H2O

Prediction: Forward Propagation

“neurons activate each other via weighted sums”

Inputs $x_i$: age, income, employment. Outputs $p_l$: married, single.

Hidden layer 1: $y_j = \tanh\left(\sum_i x_i u_{ij} + b_j\right)$

Hidden layer 2: $z_k = \tanh\left(\sum_j y_j v_{jk} + c_k\right)$

Output layer: $p_l = \mathrm{softmax}\left(\sum_k z_k w_{kl} + d_l\right)$

$\mathrm{softmax}(x_k) = \exp(x_k) / \sum_{k'} \exp(x_{k'})$

per-class probabilities: $\sum_l p_l = 1$

$u_{ij}$, $v_{jk}$, $w_{kl}$: weights

$b_j$, $c_k$, $d_l$: bias values (indep. of inputs)

activation function: tanh; alternative: x -> max(0,x) “rectifier”

$p_l$ is a non-linear function of $x_i$: can approximate ANY function with enough layers!
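The forward pass on this slide can be sketched in a few lines of plain Python. This is an illustration only, not H2O's implementation: the random weights, zero biases, and the input values are all made up, and the helper names are hypothetical.

```python
import math
import random

def dense(inputs, weights, biases):
    """Weighted sums: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]

def softmax(values):
    """exp(v) / sum(exp(v)), shifted by max(values) for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, params):
    """Forward pass with two tanh hidden layers and a softmax output layer."""
    (u, b), (v, c), (w, d) = params
    y = [math.tanh(s) for s in dense(x, u, b)]   # hidden layer 1
    z = [math.tanh(s) for s in dense(y, v, c)]   # hidden layer 2
    return softmax(dense(z, w, d))               # per-class probabilities

def random_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialization only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

rng = random.Random(0)
params = [random_layer(3, 4, rng), random_layer(4, 3, rng), random_layer(3, 2, rng)]
x = [0.5, -1.2, 1.0]  # hypothetical, already-scaled age, income, employment
p = forward(x, params)
print(p, sum(p))      # probabilities for (married, single) and their sum
```

Whatever the weights are, the softmax output layer guarantees that the two class probabilities are positive and sum to 1.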

Page 15: Webinar: Deep Learning with H2O

Training: Update Weights & Biases

For each training row, we make a prediction and compare with the actual label (supervised learning):

Class | Predicted | Actual
married | 0.8 | 1
single | 0.2 | 0

Objective: minimize prediction error (MSE or cross-entropy)

Mean Square Error = $(0.2^2 + 0.2^2)/2$ “penalize differences per-class”

Cross-entropy = $-\log(0.8)$ “strongly penalize non-1-ness”

Stochastic Gradient Descent: Update weights and biases via gradient of the error (via back-propagation):

$w \leftarrow w - \mathrm{rate} \cdot \partial E/\partial w$

[Figure: error E plotted against weight w, with the step size labeled “rate”]
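The two error measures for the married/single example, and the weight update rule, can be checked numerically. The sketch below uses the predicted and actual values from the slide; the toy error function E(w) = (w - 3)^2 and the learning rate of 0.1 are assumptions chosen only to show the update converging.

```python
import math

# Predicted and actual values from the slide.
predicted = {"married": 0.8, "single": 0.2}
actual = {"married": 1.0, "single": 0.0}

def mean_square_error(pred, act):
    """Average of squared per-class differences."""
    return sum((pred[k] - act[k]) ** 2 for k in act) / len(act)

def cross_entropy(pred, act):
    """Negative log-probability of the true class (only actual == 1 contributes)."""
    return -sum(act[k] * math.log(pred[k]) for k in act)

mse = mean_square_error(predicted, actual)
ce = cross_entropy(predicted, actual)
print(mse, ce)

def sgd_step(w, grad, rate):
    """One gradient descent update: w <- w - rate * dE/dw."""
    return w - rate * grad

# Toy error E(w) = (w - 3)^2 with gradient 2 * (w - 3); hypothetical example.
w = 0.0
for _ in range(50):
    w = sgd_step(w, 2.0 * (w - 3.0), rate=0.1)
print(w)  # approaches the minimum at w = 3
```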

Page 16: Webinar: Deep Learning with H2O

H2O Deep Learning Architecture

[Diagram: several nodes/JVMs, each with its own K-V store and HTTPD, exchange weights w through the H2O atomic in-memory K-V store.]

initial model: weights and biases w

map: each node trains a copy of the weights and biases with (some* or all of) its local data with asynchronous F/J threads

reduce: model averaging: average weights and biases from all nodes, e.g. with partial sums w1+w3 and w2+w4: w* = (w1+w2+w3+w4)/4

speedup is at least #nodes/log(#rows) (arxiv:1209.4129v3)

updated model: w*

Keep iterating over the data (“epochs”), score from time to time

communication: nodes/JVMs: sync; threads: async

Query & display the model via JSON, WWW

*user can specify the number of total rows per MapReduce iteration
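The map/reduce loop described on this slide can be imitated on a single machine. The following Python sketch is a simplified stand-in, not H2O code: the four "nodes" are simulated sequentially (no asynchronous threads), the model is a single weight for y = w * x, and the data is synthetic.

```python
import random

def local_sgd(w, rows, rate):
    """Map step: one SGD pass over a node's local rows for the model y = w * x."""
    for x, y in rows:
        grad = 2.0 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
        w -= rate * grad
    return w

def average(weights):
    """Reduce step: model averaging across nodes, w* = (w1 + ... + wn) / n."""
    return sum(weights) / len(weights)

rng = random.Random(42)
true_w = 2.0

# Four hypothetical nodes, each holding its own shard of synthetic data.
shards = []
for _ in range(4):
    shard = []
    for _ in range(50):
        x = rng.uniform(-1, 1)
        shard.append((x, true_w * x + rng.gauss(0, 0.01)))
    shards.append(shard)

w = 0.0  # initial model
for epoch in range(20):
    node_weights = [local_sgd(w, shard, rate=0.1) for shard in shards]  # map
    w = average(node_weights)                                          # reduce
print(w)  # should end up close to true_w
```

Each iteration starts every node from the same averaged model, which mirrors the "initial model w, updated model w*" cycle in the diagram.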

Page 17: Webinar: Deep Learning with H2O

“Secret” Sauce to Higher Accuracy

Adaptive learning rate - ADADELTA (Google)
Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing
Run a grid search to scan many hyper-parameters, then continue training the most promising model(s)

Regularization
L1: penalizes non-zero weights
L2: penalizes large weights
Dropout: randomly ignore certain inputs
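As a rough illustration of how an adaptive learning rate uses training history, here is the ADADELTA update rule as published by Zeiler (2012), applied to a single scalar parameter. It is a sketch, not H2O's implementation: the toy objective (w - 3)^2, the values rho = 0.95 and eps = 1e-6, and the function name are assumptions for this example.

```python
import math

def adadelta_minimize(grad_fn, x0, steps, rho=0.95, eps=1e-6):
    """Minimize a 1-D function with ADADELTA, given its gradient function."""
    x = x0
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # Step size is the ratio of the two running RMS values: no global rate.
        dx = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * dx * dx
        x += dx
    return x

x_start = 5.0
x_end = adadelta_minimize(lambda x: 2.0 * (x - 3.0), x_start, steps=200)
print(x_start, x_end)  # x_end has moved toward the minimum at 3
```

In a network, the same two running averages would be kept separately for every weight, so each one gets its own effective step size.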

Page 18: Webinar: Deep Learning with H2O

MNIST: digits classification

Standing world record: Without distortions or convolutions, the best-ever published error rate on test set: 0.83% (Microsoft)

Time to check in on the demo!

Let’s see how H2O did in the past 10 minutes!

Page 19: Webinar: Deep Learning with H2O

H2O Deep Learning on MNIST: 0.87% test set error (so far)

test set error:
1.5% after 10 mins
1.0% after 1.5 hours
0.87% after 4 hours

Running on 4 nodes with 16 cores each

World-class results!

No pre-training
No distortions
No convolutions
No unsupervised training

Frequent errors: confuse 2/7 and 4/9

Page 20: Webinar: Deep Learning with H2O

Use Case: Text Classification

Goal: Predict the item from seller’s text description

Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes

Example: “Vintage 18KT gold Rolex 2 Tone in great condition”

Data: Binary word vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 (the 1s mark the words vintage, gold, condition)

Let’s see how H2O does on the ebay dataset!
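A binary word vector like the one on this slide can be built with a few lines of Python. This is a minimal sketch: the six-word vocabulary and the tokenization rule are made up for illustration, whereas the real dataset has 8,647 columns whose vocabulary is not given in the slides.

```python
def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()

def binary_word_vector(text, vocabulary):
    """1 if the vocabulary word occurs in the text, else 0 (one column per word)."""
    words = set(tokenize(text))
    return [1 if word in words else 0 for word in vocabulary]

# Hypothetical tiny vocabulary, in a fixed column order.
vocabulary = ["antique", "condition", "diamond", "gold", "silver", "vintage"]
description = "Vintage 18KT gold Rolex 2 Tone in great condition"
vector = binary_word_vector(description, vocabulary)
print(vector)
```

Words outside the vocabulary (here "rolex", "tone", "great") are simply dropped, which is why most entries in a real row are 0.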

Page 21: Webinar: Deep Learning with H2O

Use Case: Text Classification

Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes

Out-Of-The-Box: 11.6% test set error after 10 epochs!
Predicts the correct class (out of 143) 88.4% of the time!

Note 1: H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Note 2: No tuning was done (results are for illustration only)
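The figures in Note 1 can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which the slide does not specify.

```python
# Training set dimensions from the slide.
rows, cols = 578_361, 8_647
n_values = rows * cols
print(n_values)  # roughly 5 billion values

# Assumption: decimal units (MB = 10**6 bytes, GB = 10**9 bytes).
compressed_bytes = 60 * 10**6
csv_bytes = 18 * 10**9

bits_per_value = compressed_bytes * 8 / n_values
ratio = csv_bytes / compressed_bytes
print(bits_per_value)  # average storage per value in the compressed store
print(ratio)           # size of dense CSV relative to the compressed store
```

Under these assumptions the compressed store spends less than a tenth of a bit per value on average, which is plausible only because the binary word vectors are overwhelmingly zeros.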

Page 22: Webinar: Deep Learning with H2O

Parallel Scalability (for 64 epochs on MNIST, with “0.87%” parameters)

[Chart: Speedup vs. number of H2O nodes (1, 2, 4, 8, 16, 32, 63); y-axis from 0 to 40]

[Chart: Training time in minutes vs. number of H2O nodes (1, 2, 4, 8, 16, 32, 63); y-axis from 0 to 100; annotation: 2.7 mins]

(4 cores per node, 1 epoch per node per MapReduce)

Page 23: Webinar: Deep Learning with H2O

Outlook for H2O Deep Learning

Convolutional and Pooling Layers for General Image Recognition (ImageNet)

Sparse Auto-Encoders for Dimensionality Reduction and Anomaly Detection

Execution on GPU clusters for even faster training

Page 24: Webinar: Deep Learning with H2O

H2O Steam: Scoring Platform

Page 25: Webinar: Deep Learning with H2O

H2O Steam: More Coming Soon!

Page 26: Webinar: Deep Learning with H2O

Key Take-Aways

H2O is a distributed in-memory math platform for enterprise-grade machine learning applications.

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data!

Join our Community and Meetups!
git clone https://github.com/0xdata/h2o
http://docs.0xdata.com
www.h2o.ai/community
@hexadata