H2O World - Machine Learning for non-data scientists

Calling Your Shots with Data: How to Ask Smarter Questions to Make Better Business Decisions. November 9, 2015. Jessica Lanford, Chen Huang.

Upload: srisatish-ambati

Post on 08-Jan-2017


Page 1: H2O World - Machine Learning for non-data scientists

Calling Your Shots with Data

How to Ask Smarter Questions to Make Better Business Decisions

November 9, 2015

Jessica Lanford, Chen Huang

Page 2: H2O World - Machine Learning for non-data scientists

Need a handy conference guide?

Download our app, “H2O World 2015”

Page 3: H2O World - Machine Learning for non-data scientists

What do these stickers mean?

• I have H2O installed
• I have Python installed
• I have R installed
• I have the H2O World data sets

Pick up stickers or get install help at the information booth

Page 4: H2O World - Machine Learning for non-data scientists

Agenda

• Introduction
• Business Decision Process
• Tools & Resources
• Bridging the Communications Gap
• Q&A

Page 5: H2O World - Machine Learning for non-data scientists

Business Decision Process

Make data-informed business decisions

Ask business questions

Define business problems

Analysis Process

Page 6: H2O World - Machine Learning for non-data scientists

Motivators that Influence Business Decisions

Business decisions will impact:
• Customer product interaction
• Customer and company engagement
• Product development
• ???

[Diagram labels: Your customers; Your products / services / offerings; Your business; regions numbered 1 to 4]

Page 7: H2O World - Machine Learning for non-data scientists

Asking the “Right” Questions

The answers to the business questions will ultimately provide business context for the “Analysis Process”.

Make data-informed business decisions

Ask business questions

Define business problems

Analysis Process

Page 8: H2O World - Machine Learning for non-data scientists

Analysis Process

Data Scientist

Analyze Data & Find Insights

1. Frame the question
2. Collect raw data
3. Prepare and explore data
4. Develop model
5. Evaluate, validate, and interpret results
6. Communicate and visualize results

Discussion to reach agreement in problem statement

Translate business questions and context into a problem statement

Page 9: H2O World - Machine Learning for non-data scientists

Complete Business Decision Process

Analysis Process

Analyze Data & Find Insights

Define business problems

Ask business questions

Make data-informed business decisions

1. Frame the question
2. Collect raw data
3. Prepare and explore data
4. Develop model
5. Evaluate, validate, and interpret results
6. Communicate and visualize results

Business decision maker

Data Scientist

Discussion to reach agreement in problem statement

Translate business questions and context into a problem statement

Page 10: H2O World - Machine Learning for non-data scientists

Tools and Resources

Data Science Team

Languages: R, Python, Scala, Java, CoffeeScript / JavaScript, SQL, Julia

Notebooks + IDEs: Jupyter (IPython), H2O Flow, …

Skillsets:
• Domain knowledge
• Math and statistics
• Programming skills
• Databases
• Machine learning
• Communication and visualization

Page 11: H2O World - Machine Learning for non-data scientists

Bridging the Communication Gap

Page 12: H2O World - Machine Learning for non-data scientists

What is Machine Learning?

• Machine reads the data, learns from the data, uses it to make predictions
• Can show you correlation but not necessarily causation
• Can find relationships and patterns within volumes of data that the human mind is incapable of processing

Jessica talk

Note: There is no “right” or “best” model that a data scientist can use. The model used is dependent on the data, problem, and the data scientist.

Page 13: H2O World - Machine Learning for non-data scientists

Supervised Learning

Data Science Concept:
• Known right answer, using model to verify
• Algorithm tries to predict results
• Based on its training data, the program can make accurate decisions when given new data
• Examples of algorithms and models: GLM, DRF, GBM, Deep Learning

Business Applications:
• Classification
  • Twitter sentiments: Rant -> negative, Rave -> positive
  • Coffee vs. tea vs. soda drinker
• Recommender systems
  • Netflix’s “More Like This”
  • Amazon’s “Customers Who Bought This Item Also Bought”
• Fraud detection
  • Authorizing transactions
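The supervised workflow on this slide (learn from examples with known answers, then predict for new data) can be sketched in Python as follows. The sketch uses a nearest-neighbor rule, which is not one of the algorithms listed on the slide; it is swapped in only because it fits in a few lines. The features (age, wake-up hour) and all data values are hypothetical.

```python
import math

# Hypothetical labeled training data: (age, wake-up hour) -> favorite drink.
training_data = [
    ((45, 6), "coffee"),
    ((50, 5), "coffee"),
    ((30, 8), "tea"),
    ((35, 9), "tea"),
    ((18, 11), "soda"),
    ((20, 10), "soda"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda row: math.dist(row[0], features))
    return nearest[1]

# New, unlabeled people: the program answers based on what it has seen.
print(predict((48, 6)))
print(predict((19, 10)))
```

The "known right answer" lives in the labels of `training_data`; without those labels this approach has nothing to learn from.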

Page 14: H2O World - Machine Learning for non-data scientists

Unsupervised Learning

Data Science Concept:
• No “known” answer, using algorithms to determine answer
• Algorithm tries to identify patterns in the data
• General understanding of input data where no prediction is needed
• Examples of algorithms and models: K-means, PCA

Business Applications:
• Anomaly detection
  • outliers: detecting irregular heartbeats
  • computer security with unauthorized access
• Clustering
  • Grouping users by salary
  • Grouping users by behavior
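As a rough illustration of anomaly detection without labels, the sketch below flags values that sit far from the rest of the data. It uses a simple standard-deviation rule rather than K-means or PCA, and the heartbeat intervals and the threshold of 2 are assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical intervals between heartbeats, in milliseconds.
beat_intervals = [800, 810, 795, 805, 790, 1400, 800, 815]

center = mean(beat_intervals)
spread = stdev(beat_intervals)

# Flag any value more than 2 standard deviations away from the mean.
# The threshold of 2 is an arbitrary choice for this sketch.
anomalies = [x for x in beat_intervals if abs(x - center) / spread > 2]
print(anomalies)
```

Nobody told the program which beat was irregular; it stands out only because it differs from the pattern in the rest of the data.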

Page 15: H2O World - Machine Learning for non-data scientists

Classification (Supervised)

Data Science Concept:
• Classification is the process of taking an input and assigning a label to it.
• The labels could be binomial (Yes, No) or multinomial (High, Medium, Low).
• Examples of algorithms and models: Random Forest

Business Applications:
• Will customers upgrade to new software?
• What age groups tested well for this new TV show? (marketing campaigns)
• Nigerian 419 (spam classification)
• Will the real Barack Obama please stand up? (fraud detection)
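A binomial (Yes / No) classifier for the software-upgrade question might look like the sketch below. It learns a single split on one feature, which is a one-level decision tree; a Random Forest combines many deeper trees, so this is a simplified stand-in and not the algorithm named on the slide. The feature (logins per week) and the data are hypothetical.

```python
# Hypothetical history: (logins per week, did the customer upgrade?).
history = [(1, "No"), (2, "No"), (3, "No"), (6, "Yes"), (8, "Yes"), (9, "Yes")]

def errors(threshold):
    """Count customers misclassified by the rule 'Yes if logins >= threshold'."""
    return sum(
        ("Yes" if logins >= threshold else "No") != label
        for logins, label in history
    )

# Try every observed login count as a candidate split and keep the best one.
best_threshold = min((logins for logins, _ in history), key=errors)

def classify(logins):
    """Assign a label to a new customer using the learned split."""
    return "Yes" if logins >= best_threshold else "No"

print(best_threshold, classify(7), classify(2))
```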

Page 16: H2O World - Machine Learning for non-data scientists

Regression (Supervised)

Data Science Concept:
• Regression predicts a continuous numerical output value
• Examples of algorithms and models: Linear Regression, Random Forest

Business Applications:
• How much money would a user who has reached level 200 in CandyCrush spend on in-app purchases? (forecasting)
• How much would a customer expect to pay for car insurance based on age, gender, and car type? (prediction)
• How many registered meetup.com attendees will actually show up based on past event registration and attendance? (prediction)

Page 17: H2O World - Machine Learning for non-data scientists

Deep Learning (Supervised and Unsupervised)

Data Science Concept:
• Uses “features” (multiple variables impacting a result) to identify patterns
• Uses results to iteratively improve predictions for new data

Business Applications:
• Scanning mug shots of suspects against FBI database (scanning image classification)
• Siri (language processing)
• Early detection of frustrated customers who call into call centers (audio processing)
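The idea of using results to iteratively improve predictions can be shown with a single artificial neuron trained by gradient descent. This is not a deep network; deep learning stacks many such units in layers. The sketch only illustrates the feedback loop. The call-center features, data values, learning rate, and number of passes are all assumptions made for the example.

```python
import math

# Hypothetical call-center records:
# (minutes on hold, number of transfers) -> 1 if the caller was frustrated, else 0.
calls = [((1, 0), 0), ((2, 1), 0), ((3, 0), 0),
         ((8, 2), 1), ((9, 3), 1), ((10, 2), 1)]

def predict(weights, bias, features):
    """Estimated probability that a caller is frustrated."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))

def average_loss(weights, bias):
    """Log loss over the training calls; lower means better predictions."""
    total = 0.0
    for features, label in calls:
        p = predict(weights, bias, features)
        total -= math.log(p) if label == 1 else math.log(1 - p)
    return total / len(calls)

def train(epochs=2000, learning_rate=0.01):
    """Repeatedly compare predictions with known answers and adjust the weights."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in calls:
            error = predict(weights, bias, features) - label
            # Nudge each weight a little in the direction that shrinks the error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

loss_before = average_loss([0.0, 0.0], 0.0)
weights, bias = train()
loss_after = average_loss(weights, bias)
print(f"loss before training: {loss_before:.3f}, after: {loss_after:.3f}")
```

Each pass over the data uses the previous pass's mistakes to adjust the weights, which is the "iteratively improve" step from the slide in miniature.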

Page 18: H2O World - Machine Learning for non-data scientists

Clustering (Unsupervised)

Data Science Concept:
• Grouping a set of objects so that objects in the same group are more similar to each other than to objects in other groups
• Examples of algorithms and models: K-means clustering, hierarchical clustering, DBSCAN

Business Applications:
• Identify different types of shoppers based on purchasing history to create exclusive promotions (market segmentation)
• Identifying groups of products people like to buy online
• Identify geographic locations where a national mobile carrier should install its next cellular tower to optimize for its user base
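A bare-bones version of K-means clustering, applied to the shopper segmentation example, is sketched below. The shopper features (orders per month, average basket value), the data, and the choice of starting centroids are assumptions made for illustration. A production library would also scale the features and choose starting points more carefully.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical shoppers: (orders per month, average basket value in dollars).
shoppers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]

# Start from two of the shoppers as initial guesses for the group centers.
clusters, centers = kmeans(shoppers, centroids=[shoppers[0], shoppers[-1]])
for center, members in zip(centers, clusters):
    print(center, members)
```

No labels are supplied: the two groups emerge only from how close the shoppers are to one another.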

Page 19: H2O World - Machine Learning for non-data scientists

Machine Learning Summary

Types of Machine Learning: Supervised

Business Examples:
• Calculating estimated lifetime value
• Forecasting and prediction
• Recommendation engine
• Fraud detection

Data Science Concepts:
• Classification
• Regression
• Deep Learning

Types of Machine Learning: Unsupervised

Business Examples:
• Anomaly detection
• Determining customer behavior
• Image, text, and audio processing

Data Science Concepts:
• Deep Learning
• Clustering

Page 20: H2O World - Machine Learning for non-data scientists

If You Want to Learn More…

• StackExchange: stats.stackexchange.com
• Quora: quora.com/Machine-Learning
• Data Science in H2O: http://docs.h2o.ai/h2oclassic/datascience/top.html
• A Visual Introduction to Machine Learning: r2d3.us/visual-intro-to-machine-learning-part-1
• Machine Learning Map: http://scikit-learn.org/stable/tutorial/machine_learning_map/

Page 21: H2O World - Machine Learning for non-data scientists

Questions and Answers