latest developments in h2o

57
Latest Developments in H 2 O Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulous H 2 O at Booking.com – Amsterdam 6 th April, 2017

Upload: jo-fai-chow

Post on 16-Apr-2017

115 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Latest Developments in H2O

Latest Developments in H2O

Jo-fai (Joe) Chow

Data Scientist

[email protected]

@matlabulous

H2O at Booking.com – Amsterdam6th April, 2017

Page 2: Latest Developments in H2O

About Me

• Civil (Water) Engineer• 2010 – 2015

• Consultant (UK)• Utilities

• Asset Management

• Constrained Optimization

• Industrial PhD (UK)• Infrastructure Design Optimization

• Machine Learning + Water Engineering

• Discovered H2O in 2014

• Data Scientist• 2015

• Virgin Media (UK)

• Domino Data Lab (Silicon Valley, US)

• 2016 – Present

• H2O.ai (Silicon Valley, US)

2

Page 3: Latest Developments in H2O

About Me

3

R + H2O + Domino for KaggleGuest Blog Post for Domino & H2O (2014)

• The Long Story• bit.ly/joe_kaggle_story

Page 4: Latest Developments in H2O

About Me

4

Developing R Packages for FunrPlotter (2014)

Page 5: Latest Developments in H2O

Agenda

• About H2O.ai• Company• Machine Learning Platform

• What’s New?• Deep Water• H2O + xgboost• Stacked Ensembles• Auto Machine Learning• Model Interpretation• Community

• PyData Conference

5

Page 6: Latest Developments in H2O

About H2O.ai

6

Page 7: Latest Developments in H2O

Company OverviewFounded 2011 Venture-backed, debuted in 2012

Products • H2O Open Source In-Memory AI Prediction Engine• Sparkling Water• Steam

Mission Operationalize Data Science, and provide a platform for users to build beautiful data products

Team 70 employees• Distributed Systems Engineers doing Machine Learning• World-class visualization designers

Headquarters Mountain View, CA

7

Page 8: Latest Developments in H2O

8

Our Team

Joe

Kuba

Page 9: Latest Developments in H2O

Scientific Advisory Council

9

Page 10: Latest Developments in H2O

10

Page 11: Latest Developments in H2O

11

Joe (2015)

http://www.h2o.ai/gartner-magic-quadrant/

Page 12: Latest Developments in H2O

12

Check out our websiteh2o.ai

Page 13: Latest Developments in H2O

H2O Machine Learning Platform

13

Page 14: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

14

Page 15: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

15

Import Data from Multiple Sources

Page 16: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

16

Fast, Scalable & Distributed Compute Engine Written in

Java

Page 17: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

17

Fast, Scalable & Distributed Compute Engine Written in

Java

Page 18: Latest Developments in H2O

Supervised Learning

• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie

• Naïve Bayes

Statistical Analysis

Ensembles

• Distributed Random Forest: Classification or regression models

• Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations

Deep Neural Networks

• Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations

Algorithms OverviewUnsupervised Learning

• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k

Clustering

Dimensionality Reduction

• Principal Component Analysis: Linearly transforms correlated variables to independent components

• Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection

• Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning

18

Page 19: Latest Developments in H2O

H2O Deep Learning in Action

19

Page 20: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

20

Multiple Interfaces

Page 21: Latest Developments in H2O

H2O + R

21

Page 22: Latest Developments in H2O

H2O + Python

22

Page 23: Latest Developments in H2O

23

H2O Flow (Web) Interface

Page 24: Latest Developments in H2O

HDFS

S3

NFS

DistributedIn-Memory

Load Data

Loss-lessCompression

H2O Compute Engine

Production Scoring Environment

Exploratory &Descriptive

Analysis

Feature Engineering &

Selection

Supervised &Unsupervised

Modeling

ModelEvaluation &

Selection

Predict

Data & ModelStorage

Model Export:Plain Old Java Object

YourImagination

Data Prep Export:Plain Old Java Object

Local

SQL

High Level Architecture

24

Export Standalone Models for Production

Page 25: Latest Developments in H2O

25

docs.h2o.ai

Page 26: Latest Developments in H2O

New Developments

26

Page 27: Latest Developments in H2O

27

Page 28: Latest Developments in H2O

Deep Water Next-Gen Distributed Deep Learning with H2O

H2O integrates with existing GPU backends for significant performance gains

One Interface - GPU Enabled - Significant Performance GainsInherits All H2O Properties in Scalability, Ease of Use and Deployment

Recurrent Neural Networksenabling natural language processing, sequences, time series, and more

Convolutional Neural Networks enabling Image, video, speech recognition

Hybrid Neural Network Architectures enabling speech to text translation, image captioning, scene parsing and more

Deep Water

28

Page 29: Latest Developments in H2O

Deep Water Architecture

Node 1 Node N

ScalaSpark

H2OJava

Execution Engine

TensorFlow/mxnet/CaffeC++

GPU CPU

TensorFlow/mxnet/CaffeC++

GPU CPU

RPC

R/Py/Flow/Scala clientREST APIWeb server

H2OJava

Execution Engine

grpc/MPI/RDMA

ScalaSpark

29

Page 31: Latest Developments in H2O

31

Example: Deep Water + H2O FlowChoosing different network structures

Page 32: Latest Developments in H2O

32

Choosing different backends(TensorFlow, MXNet, Caffe)

Page 33: Latest Developments in H2O

H2O + xgboost

33

Page 34: Latest Developments in H2O

34

Page 35: Latest Developments in H2O

35

Page 36: Latest Developments in H2O

Stacked Ensembles

36

Page 37: Latest Developments in H2O

37

https://github.com/h2oai/h2o-meetups/blob/master/2017_02_23_Metis_SF_Sacked_Ensembles_Deep_Water/stacked_ensembles_in_h2o_feb2017.pdf

Page 38: Latest Developments in H2O

Available in both Python and R

38

https://github.com/woobe/odsc_h2o_machine_learning

Page 39: Latest Developments in H2O

Automatic Machine Learning(AutoML)

39

Page 40: Latest Developments in H2O

40

Page 41: Latest Developments in H2O

Model Interpretation

41

Page 42: Latest Developments in H2O

42

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning

Page 43: Latest Developments in H2O

43

Page 44: Latest Developments in H2O

Time Series in H2O

44

Page 45: Latest Developments in H2O

45

Page 46: Latest Developments in H2O

46

Page 47: Latest Developments in H2O

Blog Post

47

http://blog.h2o.ai/2016/11/indexing-1-billion-time-series-with-h2o-and-isax/

Page 48: Latest Developments in H2O

Sparkling WaterH2O’s integration with Spark – latest developments

48

Page 49: Latest Developments in H2O

External H2O Backend for Sparkling Water

• High Availability Mode• Separating Spark and H2O

• while preserving same API

• Advantages• H2O does not crash when Spark

executor goes down

• Better resource management since resources can be planned per tool

49

• Disadvantages• Transfer overhead between Spark

and H2O processes

• Under measurement with cooperation of a customer

• Links• https://github.com/h2oai/sparklin

g-water/blob/master/doc/backends.md

Page 50: Latest Developments in H2O

50

Separation

Page 51: Latest Developments in H2O

H2O Community

51

Page 52: Latest Developments in H2O

H2O Meetup Groups

• www.meetup.com/topics/h2o/all/

• Need your help ☺

52

Page 53: Latest Developments in H2O

#AroundTheWorldWithH2Oai

53

Page 54: Latest Developments in H2O

54

Page 55: Latest Developments in H2O

PyData Amsterdam 2017

55

Page 56: Latest Developments in H2O

56

H2O TutorialFriday – 4:00 pm

(All Materials will be available on GitHub)

Sparkling Water TalkSaturday – 3:15 pm

Lucas’ TalkSaturday – 1:30 pm

Page 57: Latest Developments in H2O

• Organizers & Sponsors

• Find us at PyData Conference• Live Demos

57

Thanks!

• Code, Slides & Documents• bit.ly/h2o_meetups• docs.h2o.ai

• Contact• [email protected]• @matlabulous• github.com/woobe

• Please search/ask questions on Stack Overflow• Use the tag `h2o` (not H2 zero)