Beyond the Arc, Inc. © 2014 Beyond the Arc, Inc.
Customer Experience & Strategic Communications
July 15, 2014
Big Data Monetization in Telecoms
Addressing your biggest business challenges
with Data Science
Workshop
About Beyond the Arc
Beyond the Arc is a Berkeley-based customer experience consultancy that helps businesses use Big Data to:
• Transform the customer experience
• Streamline operations
• Develop the products of the future
Who am I?
Brandon Purcell
Data Science Team Lead at Beyond the Arc
• Manage a team of data scientists who specialize in translating business challenges into data challenges, then translating data solutions back into implementable business solutions
• Trader on the CBOE and American Stock Exchange
• Peace Corps volunteer in Benin (West Africa)
• MBA from Haas School of Business at UC Berkeley
• BA from Dartmouth College
Who are you?
Tell us about yourselves
• Name and current position
• Background – how did you find your way into Big Data?
• What are you hoping to get out of today’s workshop?
• Something about yourself that you don’t usually reveal to people you’ve just met
Workshop goal #1
Our goal today is to teach you a standard framework for data
mining and apply it to your specific Big Data business challenges
Workshop goal #2
Have fun! (We are in Vegas, after all.)
Agenda
Time | Items | Goals
1:30 to 1:45 | Introductions | Introduce ourselves; review workshop goals
1:45 to 2:00 | What is Big Data? | Define Big Data; identify your top Big Data challenges
2:00 to 3:00 | CRISP-DM: Business Understanding, Data Understanding, Data Preparation | Introduction to Data Mining and CRISP-DM; apply CRISP-DM to your real-world challenges
3:00 to 3:30 | Break | Relax and reenergize
3:30 to 4:00 | CRISP-DM: Modeling, Evaluation, Deployment | Apply CRISP-DM to your real-world challenges
4:00 to 4:30 | Wild-card discussion and conclusion | Discuss top-of-mind topics; wrap up the workshop
What is Big Data and how can we monetize it?
What is Big Data?
and how can we monetize it?
What Big Data business challenges are you facing?
In other words, why are you here?
• Tell us one of your most pressing Big Data business challenges and we will attempt to address it today
• Note: “I need a Big Data solution” is not an adequate answer
◦ Why do you need it?
◦ What real business problem will it address?
◦ How will an effective solution impact your bottom line?
Look for these criteria
Business problems with:
• Clear business objective
• Available data
• Feeds into existing business process and makes it better
Selecting your project
• What are your most pressing problems…
• about which you have data…
• that we can address in a short time (30-90 days)?
What Big Data business challenges are you facing?
Common examples of Big Data business challenges:
• Retention – How can we stop our customers from leaving?
• Cross-sell and up-sell – How can we make our customer relationships more profitable?
• Product development – How do I know what people want? How do I improve an existing product or service?
• Operational efficiency – Where is there waste in our operations? How can we cut costs?
• Predictive maintenance – Can we predict and therefore prevent outages before they occur?
• Compliance – How can we prevent complaints and compliance issues before they occur?
Big Data opportunities for telecoms
Introduction to Data Mining and CRISP-DM
What is data mining?
• Data mining means:
o finding patterns in your data
o which you can use
o to do your business better
Data mining algorithms are only tools
• Data Mining algorithms are incredibly smart data-wise, but incredibly dumb business-wise.
• Algorithms find patterns in data.
• We’re looking for patterns in business and customer behavior.
• Only significant and actionable patterns are interesting; computers can’t decide that.
Okay, how do we do it?
• Create models to understand and predict behavior; apply the models by generating a score for each customer
• Deploy these models immediately, tested and targeted across the customer base
• Evaluate change in customer behavior
• Repeat this process to learn for the future, which leads to…
Decision Models
• The most exciting results of data mining are decision models.
• These are executable objects that can be put to work wherever appropriate in the business.
• For example, decision models can score each customer with:
o Their risk to default on payment
o Their propensity to buy a new product/service
o The likelihood that they will close their account
Applications
• Propensity to:
o Make a purchase
o Take up offer
o Default on payment
o Cancel a contract
• Likelihood of risk:
o Credit risk
o Fraud risk
• Cross-Sell / Up-Sell:
o Next best activities
o Next best offer
• Segmentation:
o Groups of customers
o Groups of products
o Groups of communities
• Campaign Strategy:
o Creative Campaign strategy
o Call Centre strategy
o Direct Mailing strategy
o Viral Marketing strategy
Scoring each customer
• Predictive models are applied to current customer data to produce scores
• Each customer is given a unique score (e.g., cross-selling likelihood, risk, potential revenue)
[Figure: Data Warehouse, Data Mining Model, and the resulting customer scores]
Likelihood to purchase
Customer1: 90%
Customer2: 50%
Customer3: 80%
Customer4: 10%
Customer5: 5%
Customer6: 95%
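As a rough illustration of how scores like these get used, the sketch below ranks the six example customers by likelihood to purchase and keeps the top few for targeting. It assumes the scores have already been produced by some model; the `rank_customers` helper and the `top_n` cut-off are hypothetical, not part of any particular tool.

```python
# Likelihood-to-purchase scores from the example above (assumed to come from a model).
scores = {
    "Customer1": 0.90,
    "Customer2": 0.50,
    "Customer3": 0.80,
    "Customer4": 0.10,
    "Customer5": 0.05,
    "Customer6": 0.95,
}

def rank_customers(scores, top_n=None):
    """Hypothetical helper: sort customers from highest to lowest score, optionally keeping top_n."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]

# Target, e.g., the three customers most likely to buy.
for name, score in rank_customers(scores, top_n=3):
    print(f"{name}: {score:.0%}")
```

This prints Customer6, Customer1 and Customer3 in that order. The cut-off is a business decision (campaign budget, call-center capacity), not something the model decides.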
Business Examples of Predictive Analytics
Customer Acquisition
• Problem: We need more customers
• Resolution: Send promotions to those most likely to accept our offer
• Process:
o Get data showing who accepted the same or similar offer in the past
o Purchase demographic data about people
o Build models that match buyers and demographics
o Use the models to rank the prospects
Cross-sell and Up-sell
• Problem: Let’s make the customers we have more profitable by selling them more products or products with a higher margin
• Resolution: Predict which ones are most likely to buy and why. Present targeted offers to those most likely to buy.
• Process:
o Build predictive models from historical data
o Use the models to understand reasons for purchases
o Predict buying propensity and reason for each customer
Churn Analysis – Customer Retention
• Problem: Customers are leaving
• Resolution: Predict which ones are most likely to leave and why. Prepare targeted retention messages and incentives, so the call center is ready when customers call.
• Process:
o Using historical data, build models predicting which customers are most likely to leave and why
o Predict risk level and reasons for each customer and store in database, along with prepared response text
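The last step of this process (store each customer's risk level and reason alongside prepared response text) can be sketched as below. This is a minimal illustration under stated assumptions: the churn probabilities and reasons are taken as given from some model, and the customer IDs, reason codes, risk thresholds, response wording and the `churn_scores` table are all hypothetical. An in-memory SQLite database stands in for whatever database the call center actually uses.

```python
import sqlite3

# Hypothetical model output: (customer id, churn probability, main churn reason).
predictions = [
    ("C001", 0.82, "price"),
    ("C002", 0.35, "network_quality"),
    ("C003", 0.10, "price"),
]

# Hypothetical prepared retention messages, keyed by churn reason.
RESPONSES = {
    "price": "Offer a loyalty discount on the next renewal.",
    "network_quality": "Acknowledge service issues and offer a coverage check.",
}

def risk_level(probability):
    """Bucket a churn probability into a risk tier (thresholds are illustrative only)."""
    if probability >= 0.7:
        return "high"
    if probability >= 0.3:
        return "medium"
    return "low"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE churn_scores ("
    "customer_id TEXT PRIMARY KEY, probability REAL, risk TEXT, reason TEXT, response TEXT)"
)
conn.executemany(
    "INSERT INTO churn_scores VALUES (?, ?, ?, ?, ?)",
    [(cid, p, risk_level(p), reason, RESPONSES[reason]) for cid, p, reason in predictions],
)
conn.commit()

# The lookup a call-center screen might run when customer C001 calls in.
row = conn.execute(
    "SELECT risk, response FROM churn_scores WHERE customer_id = ?", ("C001",)
).fetchone()
print(row)
```

Precomputing the scores and responses means the agent sees them the moment the call arrives, rather than waiting for the model to run.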
CRISP-DM: Cross-Industry Standard Process for Data Mining
CRISP-DM Process
What it is:
• Reliable and repeatable process for doing data mining projects
• Used across industries
Six phases:
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Business Understanding
Focus of Business Understanding
• Understand project objectives and requirements from a business perspective
• Convert this knowledge into:
o Data mining problem definition
o Preliminary plan designed to achieve the objectives
Business Understanding Tasks
• Determine Business Objectives
• Assess Situation
• Determine Data Mining Goal
• Produce Project Plan
Tasks with details
• Determine Business Objectives
o Describe the business owner’s primary objectives
o Describe the criteria for a successful/useful outcome
• Assess Situation
o Identify resources, assumptions, constraints, and risks
o Do a cost-benefit analysis
• Determine Data Mining Goals
o Describe outputs and define success criteria
• Produce Project Plan
o Specify steps, including selection of tools and techniques
o List stages and dependencies
Business Understanding – Key Questions
• Imagine that you are starting a new data mining project.
• What questions would you ask to understand the business owner’s needs and expectations?
Business Understanding – Key Questions
• Who are my business partners?
• What is important to them?
• What resources do I need? Have available?
• What assumptions am I making?
• What constraints should I consider?
• What are my data mining goals?
• How will I know if I’ve achieved them?
• What’s the timeline? Budget?
Exercise – Business Understanding
Answer as many of these key questions as possible
• Who are my business partners?
• What is important to them?
• What resources do I need? Have available?
• What assumptions am I making?
• What constraints should I consider?
• What are my data mining goals?
• How will I know if I’ve achieved them?
• What’s the timeline? Budget?
You have 10 minutes
Data Understanding
Focus of Data Understanding
• Begins with data collection
• Followed by activities to:
o Get familiar with the data
o Identify data quality problems
o Discover first insights into the data
o Detect interesting subsets to form hypotheses for hidden information
Data Understanding Tasks
• Collect Data
• Describe Data
• Explore Data
• Verify Data Quality
Tasks with details
• Collect data
o Acquire data necessary for the project
o Integration of multiple data sources may be necessary
• Describe data
o High-level report on data properties
• Explore data
o Identify key attributes
o Identify interesting subsets
• Verify data quality
o Determine whether data is complete
o Determine steps for Data Preparation
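A first pass at "Describe data" and "Verify data quality" is often a simple per-field profile: how many values are missing and how many distinct values appear. The sketch below is one minimal way to do that over a list of records. The field names and sample rows are hypothetical, and treating `None`, an empty string, or an absent key as "missing" is an assumption you may need to adjust for your own data.

```python
def profile(records):
    """Per-field missing-value percentage and distinct-value count for a list of dicts."""
    fields = sorted({key for record in records for key in record})
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]  # absent key counts as missing
        present = [v for v in values if v is not None and v != ""]
        report[field] = {
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
            "distinct": len(set(present)),
        }
    return report

# Hypothetical extract from a customer table.
records = [
    {"customer_id": "C1", "plan": "prepaid", "monthly_minutes": 300},
    {"customer_id": "C2", "plan": "", "monthly_minutes": None},
    {"customer_id": "C3", "plan": "postpaid"},
    {"customer_id": "C4", "plan": "prepaid", "monthly_minutes": 120},
]

for field, stats in profile(records).items():
    print(field, stats)
```

Fields with high missing percentages, or with suspiciously few distinct values, become items on the Data Preparation task list.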
Data Understanding – Key Questions
When starting a new data mining project with a new data source, what key questions would you ask?
Data Understanding – Key Questions
• What data do I have?
• What data do I need?
• Where do I acquire it and how do I get access?
• How often is it updated?
• How clean is it?
• How and when was it collected?
• How far back does it go?
• Is the data internal? External? Mixed?
• What’s the security protocol? Can I share it / take it home?
• Can the data be improved at all going forward?
Data Understanding – Key Questions
Tell us a little bit about your data
• What data do I have?
• What data do I need?
• Where do I acquire it and how do I get access?
• How often is it updated?
• How clean is it?
• How and when was it collected?
• How far back does it go?
• Is the data internal? External? Mixed?
• What’s the security protocol? Can I share it / take it home?
• Can the data be improved at all going forward?
You have 10 minutes
Data Preparation
Focus of Data Preparation
• Covers all activities needed to construct final dataset
• Tasks are likely to be performed multiple times, and not in any set order; they include:
o Table, record, and attribute selection
o Transformation and cleaning of data
90% of an analyst’s time is spent on Data Prep
Data Preparation Tasks
• Select Data
• Clean Data
• Construct Data
• Integrate Data
• Format Data
You will typically spend most of your time on this step
Tasks with details
• Select data
o Decide which data is necessary for analysis
• Clean data
o Improve data quality so it can be used for modeling
• Construct data
o Derive new attributes
o Transform existing values
• Integrate data
o Combine information from multiple tables
• Format data
o Prepare data for tool use
Data Preparation – Key Questions
When preparing data sources for analysis, what questions would you need to ask?
Data Preparation – Key Questions
• What should my data look like to enable me to do the analysis?
• What specific data do I need to select for the analysis? Why?
• Which fields are necessary for my analysis?
• What data am I lacking?
• Do I have enough data?
• Do I have duplicates?
• What fields do I need to derive?
• What do I do with null values? (discard, impute, etc.)
• Do I need to combine data from multiple sources?
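Several of these questions (duplicates, null values, derived fields, combining sources) can be answered in a few lines. The sketch below shows one possible set of choices on hypothetical billing and usage extracts: drop exact duplicate records, impute missing charges with the median (imputation is only one option; discarding rows is another), join in usage by customer ID, and derive a charge-per-minute field. All field names and values are illustrative assumptions.

```python
from statistics import median

# Hypothetical extracts from two sources.
billing = [
    {"customer_id": "C1", "monthly_charge": 60.0},
    {"customer_id": "C2", "monthly_charge": None},  # missing value
    {"customer_id": "C2", "monthly_charge": None},  # exact duplicate record
    {"customer_id": "C3", "monthly_charge": 45.0},
]
usage = {"C1": 300, "C2": 120, "C3": 90}  # minutes used this month, keyed by customer

def prepare(billing, usage):
    # 1. Remove exact duplicate records, keeping the first occurrence.
    seen, rows = set(), []
    for record in billing:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            rows.append(dict(record))
    # 2. Impute missing charges with the median of the observed charges.
    fill = median(r["monthly_charge"] for r in rows if r["monthly_charge"] is not None)
    for r in rows:
        if r["monthly_charge"] is None:
            r["monthly_charge"] = fill
    # 3. Integrate the usage source and derive a new attribute.
    for r in rows:
        r["minutes"] = usage.get(r["customer_id"], 0)
        r["charge_per_minute"] = r["monthly_charge"] / r["minutes"] if r["minutes"] else None
    return rows

for row in prepare(billing, usage):
    print(row)
```

Each of these steps is a judgment call that should be recorded (the CRISP-DM Data Cleaning Report), because the model inherits every choice made here.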
Data Preparation – Key Questions
What are 3 data prep steps you will need to accomplish?
• What should my data look like to enable me to do the analysis?
• What specific data do I need to select for the analysis? Why?
• Which fields are necessary for my analysis?
• What data am I lacking?
• Do I have enough data?
• Do I have duplicates?
• What fields do I need to derive?
• What do I do with null values? (discard, impute, etc.)
• Do I need to combine data from multiple sources?
You have 5 minutes
Break
Modeling
Translating business problems to data mining goals
Problem | Example
Predict (propensity) | Which people will become customers? Who will leave the company?
Predict (estimation) | What do we forecast as a future network load?
Classify | How can we profile or describe the customers who belong to known groups of interest (e.g., high profit/low profit/loss making)?
Segment | Which customers form groups with highly similar members?
Associate | Which products or services are bought together?
Sequence | Find the most common sequences of events and see how they play out in a given customer’s behavior.
Matching data mining goals to analytic approaches
Problem | Example | Approach
Predict (propensity) | Which people will become customers? Who will leave the company? | C5.0, C&RT, Neural Networks
Predict (estimation) | What do we forecast as a future network load? | C&RT, Neural Networks, Linear Regression
Classify | How can we profile or describe the customers who belong to known groups of interest (e.g., high profit/low profit/loss making)? | Neural Networks, C5.0, C&RT, Logistic Regression
Segment | Which customers form groups with highly similar members? | Kohonen (Self-Organizing) Mapping, K-Means Clustering, Two-Step Clustering, C5.0, C&RT
Associate | Which products or services are bought together? | Apriori, GRI
Sequence | Find the most common sequences of events and see how they play out in a given customer’s behavior. | Sequence (CARMA)
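To make the "Associate" row concrete: association algorithms such as Apriori report rules like "customers with X also have Y" along with their support (how often X and Y occur together) and confidence (how often Y occurs given X). The sketch below computes those two measures for item pairs only. It is not Apriori itself, which also prunes candidate itemsets level by level to scale to larger itemsets, and the service names, baskets and `min_support` value are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the services each customer subscribes to.
baskets = [
    {"mobile", "broadband"},
    {"mobile", "broadband", "tv"},
    {"mobile", "tv"},
    {"broadband", "tv"},
    {"mobile", "broadband"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence = P(consequent | antecedent), reported in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising `min_support` keeps only the most common combinations; here `min_support=0.5` leaves just the broadband/mobile pair.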
Focus of Modeling
• Select and apply the modeling techniques; typically, there are several techniques for the same type of data mining problem
• Some techniques have specific requirements on the form of data…
• So it’s often necessary to go back to the data preparation phase
Modeling Tasks
• Select Modeling Technique
• Generate Test Design
• Build Model
• Assess Model
Tasks with details
• Select Modeling Technique
o Select the specific modeling technique
• Generate Test Design
o Separate the dataset into train and test sets
o Build the model on the train set
o Estimate its quality on the test set
• Build Model
o Run the modeling tool on the dataset
• Assess Model
o Use evaluation criteria to assess the model or models
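The "Generate Test Design" steps (split, build on train, estimate quality on test) can be sketched with the standard library alone. The data below is synthetic and purely hypothetical (dropped calls vs. churn, with some label noise), and the "model" is a deliberately trivial one-feature threshold rule so that the focus stays on the split: the threshold is chosen using the training rows only, then scored on rows it has never seen.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(threshold, data):
    """Share of rows where 'dropped calls >= threshold' matches the churn label."""
    return sum((dropped >= threshold) == churned for dropped, churned in data) / len(data)

def fit_threshold(train):
    """Toy model: the threshold that best separates churners in the TRAINING rows only."""
    candidates = sorted({dropped for dropped, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical synthetic data: (dropped calls last month, churned?), 15% label noise.
gen = random.Random(0)
rows = []
for _ in range(100):
    dropped = gen.randint(0, 10)
    churned = dropped >= 6
    if gen.random() < 0.15:
        churned = not churned
    rows.append((dropped, churned))

train, test = train_test_split(rows)
threshold = fit_threshold(train)
print(f"threshold={threshold}, train accuracy={accuracy(threshold, train):.2f}, "
      f"test accuracy={accuracy(threshold, test):.2f}")
```

The test accuracy is the number to report: the training accuracy was optimized on the same rows and will tend to look better than the model really is.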
Frame the business problem
• Understand the “real” problem (not just the data mining/modeling tasks).
• In some cases, studying the problem may reveal that a model or other sophisticated analysis is not needed.
• In most cases, the model will only be one part of a larger solution.
• A predictive model or clustering mechanism must be aligned with the structure of the overall solution.
Validate your approach
Data mining models always require validation
o Test model performance against new/unseen data to prove that it works
o This means partitioning your data into “Training” and “Validation” sets
Modeling – Key Questions
When selecting a modeling technique, what key questions would you ask?
Modeling – Key Questions
• What type of problem am I trying to solve?
• What measurement type is my target field?
• What measurement types are my input fields?
• How should I partition the data?
• What models should I use?
• How will I evaluate them?
• How will I explain models to my leadership?
Modeling – Key Questions
Which model might you use and why?
• What type of problem am I trying to solve?
• What measurement type is my target field?
• What measurement types are my input fields?
• How should I partition the data?
• What models should I use?
• How will I evaluate them?
• How will I explain models to my leadership?
You have 5 minutes to select a model
Evaluation
Focus of Evaluation
• At this phase, one or more high quality models have been built
• Before proceeding, thoroughly evaluate the models to be certain they achieve the business objectives
• Determine if there is an important business issue that has not been sufficiently addressed
• At the end of this phase, reach a decision on the use of the data mining results
Evaluation Tasks
• Evaluate Results
• Review Process
• Determine Next Steps
Tasks with details
• Evaluate Results
o Assess degree to which model meets business objectives
• Review Process
o Check to see if an important factor or task has been overlooked
o Conduct QA
• Determine Next Steps
o Decide whether or not to move to deployment
Evaluation – Key Questions
Once you’ve modeled the data, what questions would you ask to evaluate your findings?
Evaluation – Key Questions
• How do I evaluate the results?
• What should I do if I get no results?
• What tools do I use?
• Which model is best?
• Does the model make sense from a business standpoint?
• Are the results in an actionable form?
Evaluation – Key Questions
How will you evaluate your results?
• How do I evaluate the results?
• What should I do if I get no results?
• What tools do I use?
• Which model is best?
• Does the model make sense from a business standpoint?
• Are the results in an actionable form?
You have 5 minutes
Deployment
Focus of Deployment
• Creating the model is generally not the end of the project
• Even if the purpose of the model is to increase knowledge of the data, this knowledge will need to be organized and presented in a way the business owner can use
• Depending on requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process
• In many cases, it will be the business owner, not the data analyst, who carries out the deployment steps
• The business owner needs to understand up front what actions need to be carried out to make use of the models
Deployment Tasks
• Plan Deployment
• Plan Monitoring and Maintenance
• Produce Final Report
• Review Project
Tasks with details
• Plan Deployment
o Summarize your deployment strategy/steps
• Plan Monitoring and Maintenance
o Summarize your monitoring and maintenance strategy/steps
• Produce Final Report
o Produce a final written report
• Review Project
o Assess what happened
o Identify areas for improvement
Deployment – Key Questions About Findings
Before presenting your findings, what questions would you want to answer?
Deployment – Key Questions About Findings
• What are the insights that emerged from this analysis?
• Do they answer the business problem that I set out to solve?
• Do they answer another business problem?
• Are they actionable?
• Who is affected by this? (audience for presentation)
• Are there political sensitivities about these insights?
• What is the most compelling way to present these to my management team?
• How will we measure success?
Hint: The “so what” needs to be upfront; include recommendations for action as appropriate.
Deployment – Key Questions About Findings
What would a key insight look like and how would you put it into action?
• What are the insights that emerged from this analysis?
• Do they answer the business problem that I set out to solve?
• Do they answer another business problem?
• Are they actionable?
• Who is affected by this? (audience for presentation)
• Are there political sensitivities about these insights?
• What is the most compelling way to present these to my management team?
• How will we measure success?
You have 5 minutes to complete this
Congratulations! You have used the CRISP-DM process to solve a Big Data business problem!
Next Steps:
• Collect your notes and present to leadership and relevant stakeholders
• Initiate project, adhering to CRISP-DM process
• Measure success
• Bask in professional glory
• Retire early and move to Kauai
Wild-Card Discussion
Brandon Purcell 510.926.2694 bpurcell@beyondthearc.com
Office: 877.676.3743
Web: beyondthearc.com
Blog: Beyondthearc.com/blog
Thank you
Please keep in touch!
Appendix
CRISP-DM Process
Each generic task is listed with its outputs.
Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Assess Situation: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report
Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion/Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data
Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models; Model Description
• Assess Model: Model Assessment; Revised Parameter Settings
Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision
Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation
Model Matrix
Output (Target) | Input | Clear results structure? | Algorithm | Goal | Results
Flag or Nominal | Flag, or Nominal and Continuous | Yes | C5.0, CHAID, C&RT, QUEST | Predict / Profile | Rule set or decision tree with predicted category and associated confidence (CHAID, C&RT, QUEST can be built interactively)
Flag or Nominal | Flag, or Nominal and Continuous | No | Decision List, Neural Network, SVM, Bayes Net, Discriminant, KNN | Predict / Profile | Ranked set of rows; more opaque solution structure
Flag or Nominal | Flag, or Nominal and Continuous | No | SLRM | Campaign modeling | Prediction of response yes/no
Continuous | Continuous and Flag or Nominal | Yes | C&RT, CHAID | Predict / Profile | Decision tree with mean predictions and associated variance (CHAID, C&RT, QUEST can be built interactively)
Continuous | Continuous (and Flag or Nominal) | Yes | Linear Regression | Predict | Equation for mean prediction with coefficients
Continuous | Continuous (and Flag or Nominal) | No | Neural Network, Generalized Linear Model, SVM, KNN | Predict | Numeric prediction
Flag or Nominal | Continuous (and Flag or Nominal) | Sort of | Logistic Regression | Predict | Equation for prediction of probability and associated coefficients
Flag or Nominal or Continuous | Continuous, Flag, or Nominal | No | Neural Network | Predict | Prediction and relative importance of input variables, but no equation or tree (black-box solution)
None | Continuous, Flag, or Nominal | Sort of | Kohonen Map | Cluster | Cluster membership represented as X and Y coordinates
None | Numeric | Yes | K-Means | Cluster | Cluster membership
None | Continuous, Flag, or Nominal | Yes | Two-Step | Cluster | Cluster membership
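The K-Means row (numeric inputs, output is cluster membership) can be illustrated with the classic assign-then-update loop. This is a bare-bones sketch of the standard K-Means procedure in pure Python (it needs Python 3.8+ for `math.dist`), not how any particular product implements it. The two features and the six customers are hypothetical, and real data would normally be scaled first so that no single feature dominates the distance.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Basic K-Means: assign each point to the nearest centroid, then move centroids to the cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical customers: (monthly voice minutes in hundreds, monthly data in GB).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The resulting `labels` are the "cluster membership" the matrix refers to; profiling what each cluster means (for example with a decision tree on the labels) is still up to the analyst.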
Model Matrix – Specialty algorithms
Output (Target) | Input | Clear results structure? | Algorithm | Goal | Results
Flag or Nominal | Flag or Nominal | Yes | Apriori, CARMA | Associate | Association with confidence
Flag or Nominal | Flag or Nominal with time sequence | Yes | Sequence (CARMA) | Sequence | Sequence association with confidence
Continuous | Continuous | Sort of | Time Series (Exp Smoothing, ARIMA) | Forecast | Equation and future predictions with confidence intervals; line graph
Numeric | Continuous | Yes | Cox Regression | Time until something happens | Predicted time to event
None | Continuous | Sort of | Factor / PCA | Reduce number of variables, remove correlation | Variable groupings that make up factors; continuous score for each factor to use in next model
None | Continuous, Flag, or Nominal | Sort of | Feature Selection, Anomaly Detection | Detect data problems | Filter node for variables, flag for outlier cases
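For the "Forecast" row, the simplest member of the exponential smoothing family updates a running level as `level = alpha * new_value + (1 - alpha) * previous_level`. The sketch below implements that simple (single) exponential smoothing only; it has no trend or seasonal terms and is not ARIMA. The load readings and `alpha=0.5` are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the first observation.
    The final level is the flat one-step-ahead forecast.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical hourly network load readings (arbitrary units).
load = [100, 110, 105, 120]
levels = simple_exponential_smoothing(load, alpha=0.5)
print(levels)      # [100, 105.0, 105.0, 112.5]
print(levels[-1])  # forecast for the next hour: 112.5
```

A larger `alpha` reacts faster to recent changes in load; a smaller one smooths out more noise.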
Approaches to business problems–C5.0
Approach: C5.0
More examples:
• Key behaviors of customers who are likely to leave
• Customer acquisition profiles
• Improve profitability with a targeted message to customers
• Discover best niche markets
• Discover unusual segments for better business strategy
• Identify the most important variables from a larger set by using those which appear higher in the tree and by using the Rule Set generator
Strengths:
• Gives a clear explanation in the form of a rule set or decision tree
• Works well with complicated data and nonlinear data
• Works with small-cell designs
• Allows multiple rules to fire and can select the best rule by voting
• Can help you generate hypotheses and insight from the rules
• Important variables may be combined with a derive node and/or dropped in later models to gain increased insight
• The rule set is not built directly from the tree but from a subset of the tree that may be better for real data
Watch out for…
• If two numeric input variables are highly correlated, or if two symbolic variables are closely related, C5.0 will use only one of them for the tree and drop the other. Classification results are just as good, but the dropped variable may be easier to obtain for deployment. You may wish to test the data in C&RT as well to see if there is a variable substitution effect of this kind.
Approaches to business problems–C&RT
Approach: C&RT
More examples:
• Key behaviors of customers who are likely to leave
• Customer acquisition profiles
• Improve profitability with a targeted message to customers
• Discover best niche markets
• Discover unusual segments for better business strategy
• Identify the most important variables from a larger set by using those which appear higher in the tree
Strengths:
• Gives a clear explanation in the form of a rule set or decision tree
• Works well with complicated data and nonlinear data
• Works with small-cell designs
• Accepts a numeric or a symbolic target
• You can explore surrogate variables the model would have used if the variable it did use were not available
Watch out for…
• Only gives binary splits in the tree
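To show what a "binary split" means in practice: for a categorical target, C&RT is commonly described as choosing, at each node, the yes/no split that most reduces Gini impurity. The sketch below searches for the best `x <= threshold` split on a single numeric input using that criterion. It is a simplified illustration, not a full tree builder (no recursion, stopping rules, pruning, surrogates, or categorical inputs), and the dropped-call data is hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(xs, ys):
    """Return (threshold, weighted Gini) of the best split x <= threshold vs. x > threshold."""
    n = len(ys)
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: dropped calls per customer and whether they churned.
dropped_calls = [0, 1, 2, 5, 6, 8]
outcome = ["stay", "stay", "stay", "churn", "churn", "churn"]
print(best_binary_split(dropped_calls, outcome))  # (2, 0.0): a perfectly pure split
```

A full tree repeats this search over every input at every node, which is why each node in a C&RT tree has exactly two children.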
Approaches to business problems–Linear Regression
Approach: Linear Regression
More examples:
• Detect fraudulent transactions by looking at outliers and poorly predicted cases
• Predict customer behavior within the observed range of example data
• Which variables are most important?
• What-if scenarios by substituting new values into the regression equation
• What is the amount of expected change in the outcome variable when one of the inputs changes?
Strengths:
• Works well with linear data
• Gives an easily understood equation with effects for each input, controlling for the other variables
Watch out for…
• Cannot make reliable predictions outside the observed range of inputs
• Cannot have a two-category or dummy-coded target; use Logistic Regression instead
• Highly correlated input variables make the coefficients unstable; use PCA to remove this problem
• Does not work well with complicated data, nonlinear data, or time series forecasting
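The "what-if" use case and the warning about predicting outside the observed range fit in one small example. The sketch below fits ordinary least squares with a single input (the closed-form slope and intercept), then makes what-if predictions but refuses to extrapolate beyond the inputs it was fitted on. With one input there are no "other variables" to control for; a multiple-regression version would need matrix algebra. The usage/bill numbers are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single input."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def what_if(intercept, slope, x, observed_range):
    """Predict y for a new x, but only inside the range the model was fitted on."""
    low, high = observed_range
    if not low <= x <= high:
        raise ValueError(f"x={x} is outside the observed range {observed_range}")
    return intercept + slope * x

# Hypothetical data: monthly data usage (GB) vs. monthly bill.
usage_gb = [1, 2, 3, 4]
bill = [32, 34, 36, 38]
intercept, slope = fit_simple_ols(usage_gb, bill)
print(intercept, slope)  # 30.0 2.0
print(what_if(intercept, slope, 3.5, (min(usage_gb), max(usage_gb))))  # 37.0
```

The slope answers "what is the expected change in the bill when usage rises by 1 GB?"; here it is 2.0.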
Beyond the Arc, Inc. 92
Approaches to business problems–Logistic Regression
Approach: Logistic Regression
More examples:
• Predict the probability of one behavior vs. another (e.g., cancel vs. active)
• “What if…” scenarios
• Discover influential factors pertaining to the desired outcome
• Predict probabilities for multi-category targets (e.g., cancel, active, terminate)
Strengths:
• Gives an equation evaluating tradeoffs (changes in odds) for a given combination of inputs
• Allows many dummy-coded variables as inputs
• Widely understood
Watch out for:
• The equation gives the natural log of the odds (the logit), which is not always easily understood; exponentiating a coefficient gives the odds ratio for a one-unit change in that input
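The sketch below uses made-up coefficients (not fitted to any data) to show how to read a logistic equation: it returns log-odds, the inverse logit turns that into a probability, and exp(coefficient) is the odds ratio for a one-unit change in the input. The variable names are hypothetical.

```python
import math

# Hypothetical fitted model: log(p / (1 - p)) = b0 + b1 * support_calls
# (coefficients invented for illustration, not estimated from real data)
b0, b1 = -3.0, 0.7

def churn_probability(support_calls):
    log_odds = b0 + b1 * support_calls          # what the equation returns
    return 1.0 / (1.0 + math.exp(-log_odds))    # inverse logit gives a probability

odds_ratio = math.exp(b1)  # odds multiply by this for each extra support call

print(round(churn_probability(0), 3))  # 0.047
print(round(churn_probability(5), 3))  # 0.622
print(round(odds_ratio, 3))            # 2.014
```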
Approaches to business problems–Neural Network
Approach: Neural Network
More examples:
• Use a customer's propensity to churn to select the best save offer
• Evaluate a customer's risk of defecting
• Discover unusual customers potentially associated with fraud
• Identify the best potential customers
• Identify the most important variables using a relative importance measure
Strengths:
• Works well with complicated data and nonlinear data
• Often has high accuracy
• Offers many different network topologies to pick from
Watch out for:
• Gives a “black box” solution that is hard to explain
• Too many input categories may cause the model to overfit the data
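As an illustration of why a network can handle nonlinear data that a single linear equation cannot, this sketch wires a tiny 2-2-1 network by hand to reproduce XOR (output 1 only when exactly one input is on), a pattern no linear model can separate. The weights are hand-picked for illustration, not learned; a trained network finds its own, usually far less readable, weights, which is where the "black box" caveat comes from.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked (not trained) weights: (input weights, bias) per unit
HIDDEN = [([20.0, 20.0], -10.0),    # on when either input is on (OR-like)
          ([-20.0, -20.0], 30.0)]   # on unless both inputs are on (NAND-like)
OUTPUT = ([20.0, 20.0], -30.0)      # on when both hidden units are on (AND-like)

def forward(x):
    """One forward pass through the 2-2-1 network."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in HIDDEN]
    ws, b = OUTPUT
    return sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(forward(x)))  # XOR: 0, 1, 1, 0
```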
Approaches to business problems–Kohonen Map
Approach: Kohonen Map (self-organizing map)
More examples:
• Discover similarities in customer behavior
• Create a new segment variable to use as input for further analysis
• Use segments to create targeted messages
Strengths:
• Works well with complicated data and nonlinear data
• Groups cases based on similar patterns
• You may experiment to determine the ideal number of clusters by exploring more than one map layout
• You may use rule induction methods or graphical techniques to profile segments
Watch out for:
• Cluster understanding is still necessary to interpret findings
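A minimal sketch of a one-dimensional Kohonen map in plain Python, assuming hypothetical two-feature usage profiles. Each case pulls its best-matching node, and more weakly that node's neighbors on the map, toward itself; the index of each case's best-matching node can then serve as the new segment variable. The learning-rate and neighborhood schedules are illustrative choices, and `n_nodes` stands in for the map layout you would vary when exploring more than one solution.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Index of the map node closest (squared Euclidean distance) to point x."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], x)))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D Kohonen map (a single row of n_nodes) on numeric points."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]  # start at data points
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):    # present cases in shuffled order
            bmu = best_matching_unit(nodes, x)
            for j, node in enumerate(nodes):
                # Gaussian neighborhood measured along the map, not in data space
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(x)):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes

# Hypothetical usage profiles: (calls per day, GB per day), two behavior groups
profiles = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (9, 10), (10, 9), (10, 10)]
som = train_som(profiles)
print([best_matching_unit(som, p) for p in profiles])  # segment label per case
```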
Approaches to business problems–K-Means Clustering
Approach: K-Means Clustering
More examples:
• Create a new segment variable to use as input for further analysis
• Use segments to create targeted messages
Strengths:
• Searches for cases that are close together in multi-dimensional space using a distance measure
• Uses fewer data passes than traditional hierarchical clustering
Watch out for:
• User must determine the ideal number of clusters by exploring more than one solution (choosing different values of K and re-running the analysis)
• Cluster understanding is still necessary to interpret findings
• You may use rule induction methods or graphical techniques to profile segments
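The sketch below is a plain-Python version of Lloyd's k-means algorithm that re-runs the analysis for K = 1, 2, 3 and reports the within-cluster sum of squares so the solutions can be compared. It starts from a deterministic farthest-first choice of centroids to keep the run reproducible (many implementations use randomized seeding such as k-means++ instead); the customer data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def farthest_first(points, k):
    """Start with the first case, then repeatedly add the case farthest from all chosen."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the centroid to its members' mean (leave it in place if empty)
            new_centroids.append([sum(v) / len(members) for v in zip(*members)]
                                 if members else centroids[c])
        if new_centroids == centroids:
            break  # no centroid moved: converged
        centroids = new_centroids
    return centroids, assign(points, centroids)

def within_ss(points, centroids, labels):
    """Total within-cluster sum of squares (lower means tighter clusters)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical customers: (minutes in hundreds, GB per month), two clear groups
customers = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
for k in (1, 2, 3):
    centroids, labels = kmeans(customers, k)
    print(k, within_ss(customers, centroids, labels))
```

The large drop from K = 1 to K = 2, followed by a much smaller gain at K = 3, is the kind of pattern you look for when comparing solutions.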
Approaches to business problems–Two-Step Clustering
Approach: Two-Step Clustering
More examples:
• Create a new segment variable to use as input for further analysis
• Use segments to create targeted messages
Strengths:
• Searches for cases that are close together in space
• Uses only one data pass
• Uses a statistical criterion to determine the ideal number of clusters
Watch out for:
• Data should be randomized before the analysis, because results can depend on the order of the cases
• Cluster understanding is still necessary to interpret findings
• You may use rule induction methods or graphical techniques to profile segments
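The real TwoStep procedure pre-clusters cases in one pass using a cluster-feature tree and then merges the pre-clusters; the simplified leader-style pass below is a stand-in, not that algorithm. It stores only a count and running sums per pre-cluster and shows why case order matters: the same four cases produce different pre-clusters in two different orders. The function name and threshold are hypothetical.

```python
import random

def one_pass_preclusters(points, threshold):
    """Scan the data once: add each case to the nearest pre-cluster if its center
    is within `threshold`, otherwise start a new pre-cluster. Only a count and
    running sums are kept per pre-cluster, so memory stays small."""
    clusters = []  # each entry: [count, sum_x, sum_y]
    for x, y in points:
        best, best_d = None, None
        for c in clusters:
            cx, cy = c[1] / c[0], c[2] / c[0]
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best[0] += 1
            best[1] += x
            best[2] += y
        else:
            clusters.append([1, x, y])
    return [(n, (sx / n, sy / n)) for n, sx, sy in clusters]

# The same four cases in two different orders give different pre-clusters
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
reordered = [line[i] for i in (1, 2, 0, 3)]
print(len(one_pass_preclusters(line, 1.0)))       # 2
print(len(one_pass_preclusters(reordered, 1.0)))  # 3

# Randomize the order first (fixed seed so the run is reproducible)
shuffled = line[:]
random.Random(42).shuffle(shuffled)
print(one_pass_preclusters(shuffled, 1.0))
```

Shuffling does not remove the order effect; it keeps an accidental sort order (for example, by signup date) from systematically shaping the pre-clusters.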
Approaches to business problems–Apriori
Approach: Apriori
More examples:
• Market basket analysis
• Identify behaviors associated with a particular outcome
• Identify features used together
• Identify “hot spots”
Strengths:
• Data can be transactional or tabular
• A time field may be incorporated to tell the model when events happen at the same time
• Can control the interestingness of rules from a variety of perspectives
• Can evaluate flags as true-only or as presence and absence
Watch out for:
• Only symbolic inputs and conclusions
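A compact, unoptimized sketch of the Apriori idea: count support level by level, only extend itemsets whose subsets are all frequent, then derive rules that pass a confidence filter. The feature names and thresholds are hypothetical, and every item is a symbolic flag, in line with the restriction on inputs and conclusions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Support = share of baskets that contain every item in the candidate
        level = {}
        for cand in candidates:
            support = sum(1 for b in baskets if cand <= b) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Next level: (k+1)-itemsets whose every k-item subset is frequent
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) == k and all(frozenset(s) in level
                                      for s in combinations(cand, k - 1)):
                candidates.add(cand)
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) with one-item consequents."""
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = support / frequent[antecedent]  # subsets of frequent sets are frequent
            if confidence >= min_confidence:
                yield antecedent, item, support, confidence

# Hypothetical accounts: which optional features are used together
accounts = [
    {"voicemail", "call_waiting", "intl_plan"},
    {"voicemail", "call_waiting"},
    {"voicemail", "data_addon"},
    {"call_waiting", "data_addon"},
    {"voicemail", "call_waiting", "data_addon"},
]
frequent = apriori(accounts, min_support=0.4)
for antecedent, consequent, sup, conf in rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", consequent, round(sup, 2), round(conf, 2))
```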
Approaches to business problems–GRI
Approach: GRI (Generalized Rule Induction)
More examples:
• Market basket analysis
• Identify thresholds of inputs associated with particular behaviors
• Identify behaviors associated with a particular outcome
• Identify features used together
Strengths:
• Data can be transactional or tabular
• Can control the length, coverage and confidence of rules
• Can evaluate flags as true-only or as presence and absence
Watch out for:
• While inputs may be symbolic or numeric, conclusions may only be symbolic
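GRI can find thresholds on numeric inputs that lead to a symbolic conclusion. The sketch below assumes the J-measure of Smyth and Goodman as the rule-ranking criterion (commonly described as the basis of GRI) and searches thresholds for rules of the form "if support calls > t then churn = yes". It is a simplified illustration, not the Modeler implementation, and the data are hypothetical.

```python
import math

def j_measure(p_x, p_y, p_y_given_x):
    """J-measure (in bits) of the rule 'if x then y'."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)  # treat 0 * log 0 as 0
    return p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))

def best_threshold_rule(values, outcomes, target):
    """Search rules 'if value > t then outcome == target' over candidate thresholds.
    Returns (threshold, J, coverage, confidence) for the highest-J rule."""
    n = len(values)
    p_y = sum(1 for o in outcomes if o == target) / n
    best = None
    for t in sorted(set(values))[:-1]:  # the largest value would cover no cases
        covered = [o for v, o in zip(values, outcomes) if v > t]
        p_x = len(covered) / n
        p_y_given_x = sum(1 for o in covered if o == target) / len(covered)
        score = j_measure(p_x, p_y, p_y_given_x)
        if best is None or score > best[1]:
            best = (t, score, p_x, p_y_given_x)
    return best

# Hypothetical data: support calls per customer and churn outcome
calls = [0, 1, 1, 2, 3, 4, 5, 6]
churn = ["no", "no", "no", "no", "yes", "no", "yes", "yes"]
best = best_threshold_rule(calls, churn, "yes")
print(best[0], round(best[1], 3))  # 4 0.354
```

The J-measure rewards rules that both cover enough cases and shift the outcome probability well away from its base rate, so the winning threshold balances coverage against confidence.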
Approaches to business problems–Sequence Detection
Approach: Sequence Detection
More examples:
• Use insight to streamline business processes
• Discover unusual sequences indicating areas for business improvement
Strengths:
• The order of events must be recorded, though not necessarily their timing
• Thousands of sequences can be evaluated
• Uses the CARMA algorithm
Watch out for:
• It may take a long time or a lot of memory
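The Sequence node uses CARMA, which this sketch does not implement. Instead it brute-force counts, for each ordered pair of events, the share of customers in whom the first event happens before the second. Only the order is needed, not timestamps, and the number of pairs grows quickly with sequence length, which hints at the time and memory caveat. The event names are hypothetical.

```python
from itertools import combinations

def ordered_pair_support(sequences):
    """Share of sequences in which event a occurs before event b (any gap, no timing)."""
    counts = {}
    for seq in sequences:
        # combinations of positions always have i < j, so the pair keeps event order
        pairs = {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}
        for pair in pairs:  # count each ordered pair at most once per sequence
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / len(sequences) for pair, c in counts.items()}

# Hypothetical customer journeys, recorded in order without timestamps
journeys = [
    ["bill_shock", "support_call", "cancel"],
    ["bill_shock", "plan_change", "support_call"],
    ["support_call", "bill_shock", "cancel"],
    ["plan_change", "upgrade"],
]
support = ordered_pair_support(journeys)
print(support[("bill_shock", "support_call")])  # 0.5
print(support[("support_call", "bill_shock")])  # 0.25
```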
Approaches to business problems–Factor/PCA
Approach: Factor / Principal Components Analysis (PCA)
More examples:
• Remove correlation between independent variables
• Discover which inputs are most important for each underlying factor
Strengths:
• Place the PCA nugget in your stream followed by a Type node. Use the variables created by the PCA node as inputs into your model, and remove (set to None) the variables that were used as input for the PCA model.
Watch out for:
• Can make direct interpretation of the effect of inputs very awkward (what is the meaning of a unit change in a factor score?). Consider using the mean of the variables with the highest loadings on each factor, excluding the variables used on other factors, to create an interpretable set of variables.
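For two correlated inputs, PCA can be worked out in closed form. The sketch below (hypothetical minutes and revenue data, chosen on similar scales so no standardization step is shown) computes the eigenvalues and unit eigenvectors of the 2x2 covariance matrix and projects the centered data onto them. The resulting component scores are uncorrelated, which is why they can replace correlated inputs, but each score blends both original variables, which is the interpretation problem noted for this approach.

```python
import math

def pca_2d(xs, ys):
    """PCA of two variables from the closed-form eigen-decomposition of their
    2x2 covariance matrix [[a, b], [b, c]]. Returns (eigenvalues, weights, scores)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    a = sum(u * u for u in dx) / (n - 1)              # variance of x
    c = sum(v * v for v in dy) / (n - 1)              # variance of y
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)  # covariance of x and y
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b * b)
    eigenvalues = (mid + half_gap, mid - half_gap)    # largest component first
    if b == 0:
        # Already uncorrelated: the original axes are the components
        weights = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        weights = []
        for lam in eigenvalues:
            vx, vy = b, lam - a                       # solves (a - lam)*vx + b*vy = 0
            norm = math.hypot(vx, vy)
            weights.append((vx / norm, vy / norm))    # unit eigenvector
    # Component scores: project each centered case onto each eigenvector
    scores = [tuple(u * wx + v * wy for wx, wy in weights) for u, v in zip(dx, dy)]
    return eigenvalues, weights, scores

# Hypothetical, strongly correlated inputs
minutes = [1, 2, 3, 4, 5]            # monthly minutes, in hundreds
revenue = [1.2, 1.9, 3.3, 4.1, 5.0]  # monthly revenue, in tens of dollars
vals, vecs, scores = pca_2d(minutes, revenue)
print([round(v, 3) for v in vals])
print([tuple(round(w, 3) for w in vec) for vec in vecs])
```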