h2o world - machine learning for non-data scientists
TRANSCRIPT
Calling Your Shots with Data
How to Ask Smarter Questions to Make Better Business Decisions
November 9, 2015
Jessica Lanford, Chen Huang
Need a handy conference guide?
Download our app, “H2O World 2015”
What do these stickers mean?
I have H2O installed
I have Python installed
I have R installed
I have the H2O World data sets
Pick up stickers or get install help at the information booth
Agenda
• Introduction
• Business Decision Process
• Tools & Resources
• Bridging the Communications Gap
• Q&A
Business Decision Process
Make data-informed business decisions
Ask business questions
Define business problems
Analysis Process
Motivators that Influence Business Decisions
Business decisions will impact:
• Customer product interaction
• Customer and company engagement
• Product development
• ???
[Diagram: Your customers; Your products / services / offerings; Your business]
Asking the “Right” Questions
The answers to the business questions will ultimately provide business context for the “Analysis Process”.
Make data-informed business decisions
Ask business questions
Define business problems
Analysis Process
Data Scientist
Analysis Process
Analyze Data & Find Insights
1. Frame the question
2. Collect raw data
3. Prepare and explore data
4. Develop model
5. Evaluate, validate, and interpret results
6. Communicate and visualize results
Discussion to reach agreement in problem statement
Translate business questions and context into a problem statement
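The six steps above can be walked through on a toy problem. The sketch below is only an illustration: the question, the weekly ticket counts, and the "historical average" model are all made-up assumptions, not part of the talk.

```python
# Step 1. Frame the question (hypothetical): how many support tickets
# should we expect in a typical week?

# Step 2. Collect raw data (made-up weekly ticket counts; None = missing).
raw_data = [120, 130, None, 125, 135, 128, None, 132]

# Step 3. Prepare and explore data: drop the missing weeks.
clean = [value for value in raw_data if value is not None]

# Hold out the two most recent weeks so the model can be checked later.
train, holdout = clean[:-2], clean[-2:]

# Step 4. Develop model: a deliberately simple baseline, the average of
# the training weeks.
prediction = sum(train) / len(train)

# Step 5. Evaluate and validate: mean absolute error on the held-out weeks.
mean_abs_error = sum(abs(value - prediction) for value in holdout) / len(holdout)

# Step 6. Communicate results in plain language.
print(f"Expect about {prediction:.1f} tickets per week, "
      f"typically off by {mean_abs_error:.1f}.")
```

A real project would replace the baseline in step 4 with a proper model, but the order of the steps stays the same.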
Complete Business Decision Process
Business decision maker
• Define business problems
• Ask business questions
• Make data-informed business decisions
Data Scientist
Analysis Process
Analyze Data & Find Insights
1. Frame the question
2. Collect raw data
3. Prepare and explore data
4. Develop model
5. Evaluate, validate, and interpret results
6. Communicate and visualize results
Discussion to reach agreement in problem statement
Translate business questions and context into a problem statement
Tools and Resources
Data Science Team
Languages: R, Python, Scala, Java, CoffeeScript / JavaScript, SQL, Julia
Notebooks + IDEs: Jupyter (IPython), H2O Flow, …
Skillsets:
• Domain knowledge
• Math and statistics
• Programming skills
• Databases
• Machine learning
• Communication and visualization
Bridging the Communication Gap
What is Machine Learning?
• Machine reads the data, learns from the data, uses it to make predictions
• Can show you correlation but not necessarily causation
• Can find relationships and patterns within volumes of data that the human mind is incapable of processing
Jessica talk
Note: There is no “right” or “best” model that a data scientist can use. The model used is dependent on the data, problem, and the data scientist.
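To make the "correlation but not necessarily causation" point concrete, here is a minimal sketch that measures how strongly two columns of numbers move together. The monthly figures are made up for illustration. A value near 1 means the two rise and fall together; it says nothing about whether one causes the other (here, warm weather plausibly drives both).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Made-up monthly figures, for illustration only.
ice_cream_sales = [20, 35, 50, 65, 80, 95]
sunburn_cases = [2, 4, 5, 7, 8, 10]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation = {r:.2f}")  # strongly positive, yet neither causes the other
```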
Supervised Learning
Data Science Concept:
• Known right answer, using model to verify
• Algorithm tries to predict results
• Based on its training data, the program can make accurate decisions when given new data
• Examples of algorithms and models: GLM, DRF, GBM, Deep Learning
Business Applications:
• Classification
  • Twitter sentiments: Rant -> negative, Rave -> positive
  • Coffee vs. tea vs. soda drinker
• Recommender systems
  • Netflix’s “More Like This”
  • Amazon’s “Customers Who Bought This Item Also Bought”
• Fraud detection
  • Authorizing transactions
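The "train on known answers, then predict on new data" idea can be sketched with the coffee vs. tea vs. soda example. This is a minimal nearest-centroid classifier, chosen for brevity rather than taken from the talk (it is not GLM, DRF, or GBM). The two features (daily caffeine in mg, daily sugar in g) and all numbers are hypothetical.

```python
from collections import defaultdict

def train(examples):
    """Learn one average feature vector (centroid) per known label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical training data: (caffeine mg/day, sugar g/day) with the known answer.
training_data = [
    ((300, 5), "coffee"), ((280, 10), "coffee"),
    ((90, 8), "tea"), ((110, 12), "tea"),
    ((60, 70), "soda"), ((40, 90), "soda"),
]

model = train(training_data)
print(predict(model, (290, 7)))   # a new, unlabeled customer
print(predict(model, (50, 80)))
```

The labels in `training_data` are the "known right answers"; the model never sees a label for the new customers it is asked about.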
Unsupervised Learning
Data Science Concept:
• No “known” answer, using algorithms to determine answer
• Algorithm tries to identify patterns in the data
• General understanding of input data where no prediction is needed
• Examples of algorithms and models: K-means, PCA
Business Applications:
• Anomaly detection
  • Outliers: detecting irregular heartbeats
  • Computer security with unauthorized access
• Clustering
  • Grouping users by salary
  • Grouping users by behavior
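As a rough illustration of anomaly detection with no "known" answer, the sketch below flags values that sit unusually far from the rest of the data. It uses a simple standard-deviation rule (not K-means or PCA), and the heartbeat intervals and the threshold of 2 are made-up assumptions.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]

# Made-up time between heartbeats, in milliseconds.
beat_intervals_ms = [800, 810, 795, 805, 790, 1400, 800, 815, 798, 802]

print(find_outliers(beat_intervals_ms))
```

Nobody told the code which beat was irregular; it only looked for a pattern (most intervals are close together) and reported what does not fit.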
Classification (Supervised)
Data Science Concept:
• Classification is the process of taking an input and assigning a label to it.
• The labels could be binomial (Yes, No) or multinomial (High, Medium, Low).
• Examples of algorithms and models: Random Forest
Business Applications:
• Will customers upgrade to new software?
• What age groups tested well for this new TV show? (marketing campaigns)
• Nigerian 419 (spam classification)
• Will the real Barack Obama please stand up? (fraud detection)
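Here is a minimal sketch of assigning a binomial label (Yes, No) to the "will customers upgrade?" question. It uses k-nearest neighbors, a simpler stand-in for Random Forest: a new customer gets the label most common among the most similar past customers. The features (logins per week, features used) and the data are hypothetical.

```python
from collections import Counter

def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k closest training examples."""
    by_distance = sorted(
        training,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical past customers: (logins per week, features used) -> upgraded?
customers = [
    ((20, 12), "Yes"), ((18, 10), "Yes"), ((25, 15), "Yes"),
    ((2, 1), "No"), ((4, 0), "No"), ((1, 2), "No"),
]

print(knn_classify(customers, (22, 11)))  # a heavy user
print(knn_classify(customers, (3, 1)))    # a light user
```

The same function handles multinomial labels such as High, Medium, and Low without changes, since it only counts votes.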
Regression (Supervised)
Data Science Concept:
• Regression predicts a continuous numerical output value
• Examples of algorithms and models: Linear Regression, Random Forest
Business Applications:
• How much money would a user who has reached level 200 in Candy Crush spend on in-app purchases? (forecasting)
• How much would a customer expect to pay for car insurance based on age, gender, and car type? (prediction)
• How many registered meetup.com attendees will actually show up based on past event registration and attendance? (prediction)
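The meetup attendance question can be sketched with simple linear regression: fit a straight line through past (registered, attended) pairs, then read off a prediction for a new event. The event history below is made up, and a single input variable is an assumption made to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history of past events.
registered = [10, 20, 30, 40, 50]
attended = [6, 12, 15, 22, 25]

slope, intercept = fit_line(registered, attended)

# Predict attendance for an upcoming event with 60 registrations.
predicted = slope * 60 + intercept
print(f"Expect roughly {predicted:.0f} attendees out of 60 registered.")
```

Unlike classification, the output here is a number on a continuous scale rather than a label.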
Deep Learning (Supervised and Unsupervised)
Data Science Concept:
• Uses “features” (multiple variables impacting a result) to identify patterns
• Uses results to iteratively improve predictions for new data
Business Applications:
• Scanning mug shots of suspects against FBI database (image classification)
• Siri (language processing)
• Early detection of frustrated customers who call into call centers (audio processing)
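The "iteratively improve predictions" idea can be sketched with a single artificial neuron, the building block that deep networks stack in many layers. This is not a deep network; it only shows the loop of predicting, measuring the error, and nudging the weights. The two features (raised voice, repeat caller), the data, and the training settings are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Score between 0 and 1; higher means 'more likely frustrated'."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Repeatedly adjust weights to shrink the gap between prediction and target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for features, target in samples:
            error = predict(weights, bias, features) - target
            total_loss += error ** 2  # squared error, tracked only to monitor progress
            # For a sigmoid neuron with cross-entropy loss, the gradient
            # with respect to each weight is error * input.
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
        losses.append(total_loss / len(samples))
    return weights, bias, losses

# Hypothetical calls: (raised voice, repeat caller) -> frustrated (1) or not (0).
calls = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, losses = train_neuron(calls)
print(f"average error after first pass: {losses[0]:.3f}, after last pass: {losses[-1]:.3f}")
```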
Clustering (Unsupervised)
Data Science Concept:
• Grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups
• Examples of algorithms and models: K-means clustering, hierarchical clustering, DBSCAN
Business Applications:
• Identify different types of shoppers based on purchasing history to create exclusive promotions (market segmentation)
• Identifying groups of products people like to buy online
• Identify geographic locations where a national mobile carrier should install its next cellular tower to optimize for its user base
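A minimal K-means sketch for the shopper segmentation example: alternate between assigning each shopper to the nearest group center and moving each center to the average of its group. The shopper data, the two features (visits per month, average basket in dollars), and the fixed starting centers are assumptions made so the example is repeatable.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Return final centroids and the clusters from the last assignment step."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up shoppers: (visits per month, average basket in dollars).
shoppers = [(2, 20), (3, 25), (1, 15), (10, 200), (12, 220), (11, 210)]

centroids, clusters = kmeans(shoppers, centroids=[shoppers[0], shoppers[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

In practice the features would usually be rescaled first so that dollars do not dominate visit counts, and the starting centers would be chosen at random.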
Machine Learning Summary
Types of Machine Learning: Supervised
Business Examples:
• Calculating estimated lifetime value
• Forecasting and prediction
• Recommendation engine
• Fraud detection
Data Science Concepts:
• Classification
• Regression
• Deep Learning
Types of Machine Learning: Unsupervised
Business Examples:
• Anomaly detection
• Determining customer behavior
• Image, text, and audio processing
Data Science Concepts:
• Deep Learning
• Clustering
If You Want to Learn More…
• StackExchange: stats.stackexchange.com
• Quora: quora.com/Machine-Learning
• Data Science in H2O: http://docs.h2o.ai/h2oclassic/datascience/top.html
• Visual Introduction to Machine Learning: r2d3.us/visual-intro-to-machine-learning-part-1
• Machine Learning Map: http://scikit-learn.org/stable/tutorial/machine_learning_map/
Questions and Answers