latest developments in h2o

Download Latest Developments in H2O

Post on 16-Apr-2017

105 views

Category:

Software

0 download

Embed Size (px)

TRANSCRIPT

  • Latest Developments in H2O

    Jo-fai (Joe) Chow

    Data Scientist

    joe@h2o.ai

    @matlabulous

    H2O at Booking.com Amsterdam6th April, 2017

  • About Me

    Civil (Water) Engineer 2010 2015

    Consultant (UK) Utilities

    Asset Management

    Constrained Optimization

    Industrial PhD (UK) Infrastructure Design Optimization

    Machine Learning + Water Engineering

    Discovered H2O in 2014

    Data Scientist 2015

    Virgin Media (UK)

    Domino Data Lab (Silicon Valley, US)

    2016 Present

    H2O.ai (Silicon Valley, US)

    2

  • About Me

    3

    R + H2O + Domino for KaggleGuest Blog Post for Domino & H2O (2014)

    The Long Story bit.ly/joe_kaggle_story

    https://blog.dominodatalab.com/using-r-h2o-and-domino-for-a-kaggle-competition/

  • About Me

    4

    Developing R Packages for FunrPlotter (2014)

    https://github.com/woobe/rPlotter

  • Agenda

    About H2O.ai Company Machine Learning Platform

    Whats New? Deep Water H2O + xgboost Stacked Ensembles Auto Machine Learning Model Interpretation Community

    PyData Conference

    5

  • About H2O.ai

    6

  • Company OverviewFounded 2011 Venture-backed, debuted in 2012

    Products H2O Open Source In-Memory AI Prediction Engine Sparkling Water Steam

    Mission Operationalize Data Science, and provide a platform for users to build beautiful data products

    Team 70 employees Distributed Systems Engineers doing Machine Learning World-class visualization designers

    Headquarters Mountain View, CA

    7

  • 8

    Our Team

    Joe

    Kuba

  • Scientific Advisory Council

    9

  • 10

  • 11

    Joe (2015)

    http://www.h2o.ai/gartner-magic-quadrant/

    http://www.h2o.ai/gartner-magic-quadrant/

  • 12

    Check out our websiteh2o.ai

  • H2O Machine Learning Platform

    13

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    14

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    15

    Import Data from Multiple Sources

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    16

    Fast, Scalable & Distributed Compute Engine Written in

    Java

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    17

    Fast, Scalable & Distributed Compute Engine Written in

    Java

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • Supervised Learning

    Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie

    Nave Bayes

    Statistical Analysis

    Ensembles

    Distributed Random Forest: Classification or regression models

    Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations

    Deep Neural Networks

    Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations

    Algorithms OverviewUnsupervised Learning

    K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k

    Clustering

    Dimensionality Reduction

    Principal Component Analysis: Linearly transforms correlated variables to independent components

    Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

    Anomaly Detection

    Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning

    18

  • H2O Deep Learning in Action

    19

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    20

    Multiple Interfaces

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • H2O + R

    21

  • H2O + Python

    22

  • 23

    H2O Flow (Web) Interface

  • HDFS

    S3

    NFS

    DistributedIn-Memory

    Load Data

    Loss-lessCompression

    H2O Compute Engine

    Production Scoring Environment

    Exploratory &Descriptive

    Analysis

    Feature Engineering &

    Selection

    Supervised &Unsupervised

    Modeling

    ModelEvaluation &

    Selection

    Predict

    Data & ModelStorage

    Model Export:Plain Old Java Object

    YourImagination

    Data Prep Export:Plain Old Java Object

    Local

    SQL

    High Level Architecture

    24

    Export Standalone Models for Production

    https://www.youtube.com/v/UGW3cT_cZLc&autoplay=1

  • 25

    docs.h2o.ai

  • New Developments

    26

  • 27

  • Deep Water Next-Gen Distributed Deep Learning with H2O

    H2O integrates with existing GPU backends for significant performance gains

    One Interface - GPU Enabled - Significant Performance GainsInherits All H2O Properties in Scalability, Ease of Use and Deployment

    Recurrent Neural Networksenabling natural language processing, sequences, time series, and more

    Convolutional Neural Networks enabling Image, video, speech recognition

    Hybrid Neural Network Architectures enabling speech to text translation, image captioning, scene parsing and more

    Deep Water

    28

  • Deep Water Architecture

    Node 1 Node N

    ScalaSpark

    H2OJava

    Execution Engine

    TensorFlow/mxnet/CaffeC++

    GPU CPU

    TensorFlow/mxnet/CaffeC++

    GPU CPU

    RPC

    R/Py/Flow/Scala clientREST APIWeb server

    H2OJava

    Execution Engine

    grpc/MPI/RDMA

    ScalaSpark

    29

  • Available Networks in Deep Water

    LeNet

    AlexNet

    VGGNet

    Inception (GoogLeNet)

    ResNet (Deep Residual Learning)

    Build Your Own

    30

    ResNet

    http://yann.lecun.com/exdb/lenet/http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networkshttp://www.robots.ox.ac.uk/~vgg/research/very_deep/https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiT2KSWo8rQAhWqKcAKHZJnDEwQFgggMAA&url=https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf&usg=AFQjCNHSEJVb0PWLBIG-Y-zWh9gRv9ehBQ&sig2=52aAA7D5iNSVyBxbCpmtrA&bvm=bv.139782543,d.bGshttps://arxiv.org/abs/1512.03385

  • 31

    Example: Deep Water + H2O FlowChoosing different network structures

  • 32

    Choosing different backends(TensorFlow, MXNet, Caffe)

  • H2O + xgboost

    33

  • 34

  • 35

  • Stacked Ensembles

    36

  • 37

    https://github.com/h2oai/h2o-meetups/blob/master/2017_02_23_Metis_SF_Sacked_Ensembles_Deep_Water/stacked_ensembles_in_h2o_feb2017.pdf

    https://github.com/h2oai/h2o-meetups/blob/master/2017_02_23_Metis_SF_Stacked_Ensembles_Deep_Water/stacked_ensembles_in_h2o_feb2017.pdf

  • Available in both Python and R

    38

    https://github.com/woobe/odsc_h2o_machine_learning

    https://github.com/woobe/odsc_h2o_machine_learning

  • Automatic Machine

Recommended

View more >