h2o world - h2o deep learning with arno candel

8
H2O Deep Learning Arno Candel, PhD Chief Architect, H2O.ai @ArnoCandel

Upload: jo-fai-chow

Post on 16-Apr-2017

2.292 views

Category:

Software


4 download

TRANSCRIPT

Page 1: H2O World - H2O Deep Learning with Arno Candel

H2O  Deep  Learning

Arno  Candel,  PhD  Chief  Architect,  H2O.ai  @ArnoCandel

Page 2: H2O World - H2O Deep Learning with Arno Candel

Why  Deep  Learning?

• Deep  Learning  is  trending  (so  it  must  be  useful)

2013 20152011

I  join  H2O

Deep  Learning  is  everywhere

Google  starts  Google  Trends

Page 3: H2O World - H2O Deep Learning with Arno Candel

What   is  Deep  Learning?

• Deep  Learning  learns  a  hierarchy  of  non-­‐linear  transformations  

• Neurons  transform  their  input  in  a  non-­‐linear  way  

• Black-­‐box,  brute-­‐force  method,  really  good  at  pattern  recognition  

• Deep  Learning  got  a  boost  in  the  last  decade  due  to  faster  hardware  and  algorithmic  advances

Deep  Learning  model    

=    

set  of  connecting  weights    

+  

type  of  non-­‐linearity

Age

Income

FICO  score

Fraud

Legit

Layer  1  hidden  neurons Layer  2  

hidden    neurons

Page 4: H2O World - H2O Deep Learning with Arno Candel

Deep  Learning:  Practical  Use

• non  linear  • robust  to  correlated  features  • conceptually  simple  • learned  features  can  be  extracted  • can  stop  training  at  any  time  • can  be  fine-­‐tuned  with  more  data  • great  ensemble  member  • efficient  for  multi-­‐class  problems  • world-­‐class  at  pattern  recognition

strengths

• slow  to  train  • slow  to  score  • not  interpretable  • results  not  fully  reproducible  • theory  not  well  understood  • overfits,  needs  regularization  • many  hyper-­‐parameters  • expands  categorical  variables  • must  impute  missing  values

weaknesses

Page 5: H2O World - H2O Deep Learning with Arno Candel

H2O  Deep  Learning  Features

• H2O  Eco-­‐System  Benefits:  

- Scalable  to  massive  datasets  on  large  clusters,  fully  parallelized  

- Low-­‐latency  Java  (“POJO”)  scoring  code  is  auto-­‐generated  

- Easy  to  deploy  on  Laptop,  Server,  Hadoop  cluster,  Spark  cluster,  HPC  

- APIs  include  R,  Python,  Flow  UI,  Scala,  Java,  JavaScript,  REST  

• Regularization  techniques:  Dropout,  L1/L2  

• Early  stopping,  N-­‐fold  cross-­‐validation,  Grid  search  

• Handling  of  categorical,  missing  and  sparse  data  

• Gaussian/Laplace/Poisson/Gamma/Tweedie  regression  with  offsets,  observation  weights,  various  loss  functions  

• Unsupervised  mode  for  non-­‐linear  dimensionality  reduction,  outlier  detection

Page 6: H2O World - H2O Deep Learning with Arno Candel

Learn  More  about  H2O  Deep  Learning

Tomorrow  11:00  AM  Erdos  Stage  

Top  10  Deep  Learning  Tips  &  Tricks

Page 7: H2O World - H2O Deep Learning with Arno Candel

What  do  these  st ickers  mean?

I have H2O Installed

I have Python installed

I have R installed

I have the H2O World data sets

P i ck  up   s t i cke rs  o r   get   i n s ta l l   he lp   a t   the  in fo rmat ion  booth

Page 8: H2O World - H2O Deep Learning with Arno Candel

Hands-­‐On  Tutorial

• Introduction  

• Installation  and  Startup  

• Decision  Boundaries  

• Cover  Type  Dataset  

• Exploratory  Data  Analysis  

• Deep  Learning  Model  

• Hyper-­‐Parameter  Search  

• Checkpointing  

• Cross-­‐Validation  

• Model  Save  &  Load  

• Regression  and  Binary  Classification  

• Deep  Learning  Tips  &  Tricks  (more  tomorrow!)