TRANSCRIPT
© Abbott Analytics 2001-2009 1
How to Improve Customer Acquisition
Models with Ensembles
Dean Abbott
Abbott Analytics
http://www.abbottanalytics.com
Predictive Analytics World
October 20, 2009
Washington, DC
Brian Siegel
TN Marketing, LLC
http://www.tnmarketing.com
Acknowledgements
Thanks to:
– TN Marketing, LLC for allowing this problem and solution to be described in a public setting
– The Modeling Agency (TMA) and its President, Eric King
TMA is the contractor for predictive analytics consulting with TN Marketing
Mr. Abbott was a representative of TMA in this (and other) consulting engagements
http://www.the-modeling-agency.com
Data described in this talk is:
– Real, live, and difficult to model
Outline of Presentation
Introductions
Overview of Project
Data Preparation
Modeling Approach with Ensembles
Modeling Results
Deployment Results
About Abbott Analytics
Abbott Analytics
– Founded in 1999, based in San Diego, CA
– Dedicated to data mining consulting and training
Principal: Dean Abbott
– Applied Data Mining for 22+ years in Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment
– Course Instruction: Public 2- and 3-day Data Mining Courses
Conference Tutorials
– Customized Training and Knowledge Transfer: Data mining methodology
Training services and hands-on courses for software products, including Clementine, Statistica, Affinium Model, Enterprise Miner, Tibco Spotfire Miner, CART,
A Word About TN Marketing
TN Marketing has been in business since Dec. 1998
Privately owned program developer and marketer, located in Minneapolis, MN.
TN Marketing’s business provides Partners with a productive marketing program that:
Generates direct revenues without investment
Increases brand loyalty
Supports leading-edge direct marketing and fulfillment of books and DVDs to members and customers of affinity partners using proprietary technology systems.
One of the 10 largest Direct Response book and video marketing/distribution companies in America.
The TN Marketing Model
TN Marketing licenses the brand and content for use in a direct mail marketing campaign to the brand’s current customers.
TN Marketing identifies and develops product(s) in consultation with the brand.
The brand approves the products.
TN Marketing markets products to qualified brand enthusiasts. If a product appeals to them, they may elect to receive similar products in a continuity series.
The brand earns royalties on sales.
The brand’s customers are assured:
– 100% satisfaction guaranteed
– No minimum purchase obligations and the ability to cancel at any time
TN Marketing enjoys partnerships with
numerous consumer, subscriber, donor
and membership organizations
Objectives
A random test mailing to NRA’s house file achieved an 11% response rate
Minimum response rate required to meet financial expectations is 13.5%
Develop a binary-outcome model that will rank-order the current database based on propensity to respond to a traditional mailing, optimizing at a cumulative average response rate of >= 13.5%.
Source Data
Business partner provided data that summarizes transactional data for every active NRA member: 49 independent variables.
TN Marketing enhanced the database with demographic data: 18 appended variables.
I-Miner was used to derive new variable features and transformations of pre-existing data points: 79 derived variables.
Data Preparation
Key transformations:
– Date Features
– Filling missing data: use "Distribution" when possible for numeric fields
Use a constant for categoricals
For numeric data with both "in-house" and third-party versions, use in-house when available, and if not, use third party
– Binning and Binarization: reduce the number of values for nominal variables with many poorly populated values
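The fill rules above can be sketched outside of I-Miner. This is a minimal pure-Python illustration; the function names and the "MISSING" constant are my own labels, not from the project.

```python
import random

def fill_missing(rows, numeric_cols, categorical_cols, constant="MISSING", seed=42):
    """Fill missing values the way the slide describes: sample from the
    observed (empirical) distribution for numeric fields, and substitute a
    constant category for categoricals."""
    rng = random.Random(seed)
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        for r in rows:
            if r[col] is None:
                r[col] = rng.choice(observed)   # draw from empirical distribution
    for col in categorical_cols:
        for r in rows:
            if r[col] is None:
                r[col] = constant
    return rows

def coalesce_in_house(in_house, third_party):
    """Prefer the in-house value; fall back to the third-party version."""
    return in_house if in_house is not None else third_party
```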
Data Size
Original Data
Data after data cleanup and feature creation
Data after further cleanup and adding interaction terms
Sampling
Randomly split the 21,557 records into two 10,775-record data sets:
– Build response model on training data set
– Validate model by scoring test data set
Problem:
– Training set contains just over 1,000 affirmative examples, at the edge of the lower bound for building reliable models
– Could result in unreliable models that behave poorly upon rollout
Solution: Model Ensembles
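A simple random split like the one described can be sketched as follows. This is a generic illustration, not the actual I-Miner workflow; the seed value is arbitrary.

```python
import random

def split_train_test(records, seed=100):
    """Randomly shuffle records and split them into two equal halves:
    one for building the model, one for validating it by scoring."""
    rng = random.Random(seed)    # fixed seed for a repeatable split
    shuffled = records[:]        # leave the original list untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```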
What are Model Ensembles?
Combining outputs from multiple models into a single decision
Models can be created using the same algorithm, or several
different algorithms
[Diagram: outputs from multiple models pass through decision logic to form the ensemble prediction]
Bagging
Bagging Method:
– Create many data sets by bootstrapping (can also do this with cross-validation)
– Create one decision tree for each data set
– Combine decision trees by averaging (or voting) final decisions
– Primarily reduces model variance rather than bias
Results:
– On average, better than any individual tree
[Diagram: bootstrap samples each train a tree; tree predictions are averaged into a final answer]
Breiman, L. (1996). Bagging Predictors. Machine Learning, Vol. 24, No. 2, pp. 123-140.
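The bagging recipe (bootstrap, fit one model per sample, average the predictions) can be sketched generically. The slide uses decision trees, but the mechanism is algorithm-agnostic, so this sketch takes any caller-supplied `train_fn`; the names are my own.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in data]

def bagged_predict(data, train_fn, x, n_models=10, seed=1):
    """Bagging: fit one model per bootstrap sample, then average the
    individual model predictions for input x."""
    rng = random.Random(seed)
    models = [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(m(x) for m in models) / n_models
```

For a (deliberately trivial) base learner, `train_fn` could return a constant predictor equal to the sample's mean target; with real trees, the averaging step is what reduces variance.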
Why Ensembles Work
Single model synthesis can be difficult:
– algorithms search for the best model, but not exhaustively (trees, polynomial nets, regression)
– iterative algorithms converge to local minima (neural nets)
– insufficient data to smooth out noise if one partitions data into train/test
Approximately uncorrelated output estimates provide a means to eliminate errors from individual classifiers
Picture from: T.G. Dietterich. Ensemble Methods in Machine Learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
http://citeseer.ist.psu.edu/dietterich00ensemble.html
Model Ensembles: The Good and the Bad
Pro:
– Can significantly reduce model error
– Can be easy to automate; already done in many commercial tools using Boosting, Bagging, Random Forests, and others
Con:
– Model interpretability is more difficult
– Can be very time consuming to generate dozens of models to combine
Note:
– Weak learners (trees, Naïve Bayes) have greater margin for improvement, but all algorithms can benefit from ensembles through either
Improved performance, or
Risk reduction
Error Ranges for Model
Combinations on Glass Data
Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
Combining 3-5 models is on average better than the best single model
Combining all 6 models is not best (best is a 3- or 4-model combination), but is close
This is an example of reducing model variance through ensembles, but not model bias
[Figure: Percent Classification Error (Max, Min, and Average Error) vs. Number of Models Combined, 1 through 6]
Abbott, D.W. (1999). Combining Models to Improve Classifier Accuracy and Robustness. 1999
International Conference on Information Fusion—Fusion99, Sunnyvale, CA, July 6-8.
Model Comparison Example: Rankings Tell Different Stories
Model Number Model ID AUC Train RMS Test RMS AUC Rank Train RMS Rank Test RMS Rank
50 NeuralNet1032 73.3% 0.459 0.370 9 53 1
39 NeuralNet303 72.4% 0.477 0.374 42 59 2
36 NeuralNet284 75.0% 0.458 0.376 2 52 3
31 NeuralNet244 72.7% 0.454 0.386 33 49 4
57 CVLinReg2087 70.4% 0.397 0.393 52 5 5
34 NeuralNet277 72.7% 0.455 0.399 28 50 6
37 NeuralNet297 72.4% 0.449 0.399 43 38 7
56 CV_CART2079 68.0% 0.391 0.401 54 4 8
54 CVNeuralNet2073 67.9% 0.403 0.401 55 6 9
59 CVNeuralNet2097 66.0% 0.403 0.401 59 7 10
61 CV_CART2104 70.4% 0.386 0.402 53 3 11
42 NeuralNet334 72.4% 0.450 0.404 40 44 12
52 CVLinReg2063 67.5% 0.404 0.404 57 8 13
41 NeuralNet330 72.4% 0.443 0.406 41 16 14
38 NeuralNet300 72.4% 0.451 0.408 38 45 15
55 CV_CHAID2078 64.6% 0.380 0.411 60 2 16
45 NeuralNet852 74.2% 0.456 0.413 3 51 17
53 CVLogit2068 67.5% 0.414 0.414 58 10 18
60 CV_CHAID2102 61.5% 0.380 0.414 61 1 19
58 CVLogit2092 67.7% 0.413 0.414 56 9 20
The top test-RMS model is 9th in AUC; the 2nd-ranked test-RMS model is 42nd in AUC
Correlation between rankings (negative values in parentheses):
                 AUC Rank   Train RMS Rank   Test RMS Rank
AUC Rank            1
Train RMS Rank   (0.465)          1
Test RMS Rank    (0.301)        0.267              1
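Entries like these are correlations computed between the rank columns; a hand-rolled Pearson correlation (which, when applied to ranks, is the Spearman rank correlation) is enough to reproduce such a table. A sketch:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient. Applied to two columns of ranks,
    this equals the Spearman rank correlation between the metrics."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```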
PAKDD07 Results
Problem: finance company wants to cross-sell
home loans to credit card holders
40K records for training, 8K testing
http://lamda.nju.edu.cn/conf/pakdd07/dmc07/
Even Ensembles of Ensembles
Can Help
http://www.tiberius.biz/pakdd07.html
Our Ensemble Approach
Build 10 bootstrap samples of training data
Build one logistic regression model per bootstrap sample
– Build each model carefully, pruning to avoid needless overfit (but not avoiding overfit completely)
Combine models through averaging of probabilities
Rank-order composite score to determine mailing depth
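The approach can be sketched end to end. This is an illustrative pure-Python version: the project used I-Miner logistic regression models built from seeds 100 through 1000, while the tiny gradient-descent fitter below is a stand-in (and skips the per-model pruning step), not the production code.

```python
import math
import random

def fit_logreg(X, y, lr=0.1, epochs=200):
    """Minimal logistic regression via stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict_proba(model, xi):
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_scores(X, y, n_models=10, seed=100):
    """Fit one logistic model per bootstrap sample (seeds 100, 200, ...
    echoing the talk), then average the predicted probabilities."""
    models = []
    for i in range(n_models):
        rng = random.Random(seed + i * 100)
        idx = [rng.randrange(len(X)) for _ in X]     # bootstrap indices
        models.append(fit_logreg([X[j] for j in idx], [y[j] for j in idx]))
    return [sum(predict_proba(m, xi) for m in models) / n_models for xi in X]
```

Sorting records by the averaged score descending then gives the rank order used to choose mailing depth.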
Building Ensemble Predictions:
Considerations for I-Miner
Steps:
1. Join by record: all models applied to same data in same row order
2. Change probability names
3. Average probabilities
   – "Decision" is avg_prob > threshold
4. Decile Probability Ranks
Ensemble of Logistic Regression Models
Average Probability:
– Pr(1) = (Pr1Seed100 + Pr1Seed200 + Pr1Seed300 + Pr1Seed400 + Pr1Seed500 + Pr1Seed600 + Pr1Seed700 + Pr1Seed800 + Pr1Seed900 + Pr1Seed1000) / 10
Decision: If average probability is greater than original proportion of responders, count prediction as response
– Decision = ifelse( ((Pr1Seed100 + Pr1Seed200 + Pr1Seed300 + Pr1Seed400 + Pr1Seed500 + Pr1Seed600 + Pr1Seed700 + Pr1Seed800 + Pr1Seed900 + Pr1Seed1000) / 10) > threshold, "1", "0" )
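The same decision rule is straightforward to express outside of I-Miner; a sketch (the function name is mine):

```python
def ensemble_decision(probs, threshold):
    """Average the per-seed probabilities; flag a predicted response when
    the average exceeds the threshold (the original responder proportion)."""
    avg = sum(probs) / len(probs)
    return "1" if avg > threshold else "0"
```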
Variable Inclusion in Model Ensembles
Twenty-five different variables represented in the ten models
Only five were represented in seven or more models
Twelve were represented in one or two models
[Chart: number of variables vs. number of models sharing common variables]
Individual Model vs. Ensembles
Decile by Decile Response Rates
Cumulative Lift
Individual Model vs. Ensemble:
Ranks of Cumulative Lift
Notes:
– Every model was ranked in the top 2 of cumulative lift at some point going down the deciles
– Every model was ranked 8th-10th in cumulative lift at some point going down the deciles
Conclusion: Single models behave erratically on this (small) data set
* Ensemble ranks are based upon placement after ranks of individual models were already set. Therefore, its rank will match an individual model that it has bested, so its average rank is pessimistic.
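Cumulative lift, as used in this comparison, can be computed as follows. This is a generic sketch; decile boundaries here are simple equal-count slices of the score-sorted list.

```python
def cumulative_lift(scores, responses, n_deciles=10):
    """For each decile depth, the cumulative response rate of the
    top-scored records divided by the overall response rate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    overall = sum(responses) / len(responses)
    per = len(scores) // n_deciles
    lifts = []
    for d in range(1, n_deciles + 1):
        top = order[: per * d]
        rate = sum(responses[i] for i in top) / len(top)
        lifts.append(rate / overall)
    return lifts
```

At the tenth decile the cumulative lift is 1.0 by construction, since the "top" group is then the whole file.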
Ensemble Model Results
Compare Response to Score
[Chart: linear fit of Response Rate vs. Mean Score: y = 1.0028x, R² = 0.9649]
[Chart: polynomial fit of Response Rate vs. Mean Score: y = -0.3331x² + 1.0489x, R² = 0.9664]
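The linear calibration fit shown constrains the intercept to zero (y = a·x); under that assumption the slope and R² can be recomputed directly. This is a sketch of the computation, not the original fitting tool.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for a no-intercept fit y = a*x:
    a = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def r_squared(xs, ys, predict):
    """Coefficient of determination for any prediction function."""
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

A slope near 1 with high R², as on the slide, says the averaged ensemble probabilities are well calibrated: mean score tracks actual response rate almost one-for-one.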
Model Deployment
Deployment Rifle Prospect Scoring Summary

Mailed deciles (17,452 records):
DECILE  Score Mean  Score Min  Score Max  StDev     Diff Mean  Linear Pred.  Quadratic Pred.  Count   Actual
1       14.97%      11.48%     31.19%     0.026208  0          15.01%        14.95%           11,410  15.11%
2       10.65%      10.00%     11.48%     0.0042    0.00497    10.68%        10.79%           6,042   10.95%
Weighted Ave.                                                  13.5110%      13.5134%         17,452  13.67%

All scored prospects (114,105 records):
DECILE  Score Mean  Score Min  Score Max  StDev     Diff Mean  Linear Pred.  Quadratic Pred.  Count
1       14.97%      11.48%     31.19%     0.026208  -0.0425    15.01%        14.95%           11,410
2       10.15%      9.24%      11.48%     0.006281  -0.0528    10.18%        10.31%           11,410
3       8.72%       8.28%      9.24%      0.002739  -0.05      8.74%         8.89%            11,411
4       7.95%       7.65%      8.28%      0.001818  -0.0409    7.98%         8.13%            11,410
5       7.39%       7.13%      7.65%      0.001516  -0.032     7.41%         7.57%            11,411
6       6.87%       6.62%      7.13%      0.001456  -0.0267    6.89%         7.05%            11,410
7       6.37%       6.12%      6.62%      0.001431  -0.0233    6.39%         6.55%            11,411
8       5.86%       5.58%      6.12%      0.001563  -0.0198    5.87%         6.03%            11,410
9       5.28%       4.94%      5.58%      0.001854  -0.0158    5.29%         5.44%            11,411
10      4.30%       0.56%      4.94%      0.005632  -0.0117    4.31%         4.45%            11,411
Ensemble Model Results
Scored over 2,100,000 prospects
Actual results from the rollout
– Average response rate = 13.67%
Significant gross revenue generated for
business partner.
Questions?
Thank You!