TRANSCRIPT
© Abbott Analytics 2001-2009 1
How to Improve Customer Acquisition
Models with Ensembles
Dean Abbott
Abbott Analytics
http://www.abbottanalytics.com
Predictive Analytics World
October 20, 2009
Washington, DC
Brian Siegel
TN Marketing, LLC
http://www.tnmarketing.com
Acknowledgements
Thanks to:
– TN Marketing, LLC for allowing this problem and solution to be described in a public setting
– The Modeling Agency (TMA) and its President, Eric King
TMA is the contractor for predictive analytics consulting with TN Marketing
Mr. Abbott was a representative of TMA in this (and other) consulting engagements
http://www.the-modeling-agency.com
Data described in this talk is:
– Real, live, and difficult to model
Outline of Presentation
Introductions
Overview of Project
Data Preparation
Modeling Approach with Ensembles
Modeling Results
Deployment Results
About Abbott Analytics
Abbott Analytics
– Founded in 1999, based in San Diego, CA
– Dedicated to data mining consulting and training
Principal: Dean Abbott
– Applied Data Mining for 22+ years in Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment
– Course Instruction: Public 2- and 3-day Data Mining Courses
Conference Tutorials
– Customized Training and Knowledge Transfer: Data mining methodology
Training services and hands-on courses for software products, including Clementine, Statistica, Affinium Model, Enterprise Miner, Tibco Spotfire Miner, CART,
A Word About TN Marketing
TN Marketing has been in business since Dec. 1998
Privately owned program developer and marketer, located in Minneapolis, MN.
TN Marketing’s business provides Partners with a productive marketing program that:
Generates direct revenues without investment
Increases brand loyalty
Supports leading-edge direct marketing and fulfillment of books and DVDs to members and customers of affinity partners using proprietary technology systems.
One of the 10 largest Direct Response book and video marketing/distribution companies in America.
The TN Marketing Model
TN Marketing licenses the brand and content for use in a direct mail marketing campaign to the brand’s current customers.
TN Marketing identifies and develops product(s) in consultation with the brand.
The brand approves the products.
TN Marketing markets products to qualified brand enthusiasts. If a product appeals to them, they may elect to receive similar products in a continuity series.
The brand earns royalties on sales.
The brand’s customers are assured:
– 100% satisfaction guaranteed
– No minimum purchase obligations and the ability to cancel at any time
TN Marketing enjoys partnerships with
numerous consumer, subscriber, donor
and membership organizations
Objectives
A random test mailing to NRA’s house file achieved an 11% response rate
Minimum response rate required to meet financial expectations is 13.5%
Develop a binary-outcome model that will rank-order the current database based on propensity to respond to a traditional mailing, optimizing at a cumulative average response rate of >= 13.5%.
Source Data
Business partner provided data that summarizes transactional data for every active NRA member: 49 independent variables.
TN Marketing enhanced the database with demographic data: 18 appended variables.
I-Miner was used to derive new variable features and transformations of pre-existing data points: 79 derived variables.
Data Preparation
Key transformations:
– Date Features
– Filling missing data: use "Distribution" when possible for numeric fields
Use a constant for categoricals
For numeric data with both "in-house" and third-party versions, use in-house when available, and if not, use third party
– Binning and Binarization: reduce the number of values for nominal variables with many poorly populated values
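The fill rules above can be sketched outside of I-Miner. This is a minimal pure-Python illustration; the function names and the "MISSING" constant are my own labels, not from the project.

```python
import random

def fill_missing(rows, numeric_cols, categorical_cols, constant="MISSING", seed=42):
    """Fill missing values the way the slide describes: sample from the
    observed (empirical) distribution for numeric fields, and substitute a
    constant category for categoricals."""
    rng = random.Random(seed)
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        for r in rows:
            if r[col] is None:
                r[col] = rng.choice(observed)   # draw from empirical distribution
    for col in categorical_cols:
        for r in rows:
            if r[col] is None:
                r[col] = constant
    return rows

def coalesce_in_house(in_house, third_party):
    """Prefer the in-house value; fall back to the third-party version."""
    return in_house if in_house is not None else third_party
```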
Data Size
Original Data
Data after data cleanup and feature creation
Data after further cleanup and adding interaction terms
Sampling
Randomly split the 21,557 records into two 10,775-record data sets:
– Build response model on training data set
– Validate model by scoring test data set
Problem:
– Training set contains just over 1,000 affirmative examples, at the edge of the lower bound for building reliable models
– Could result in unreliable models that behave poorly upon rollout
Solution: Model Ensembles
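A simple random split like the one described can be sketched as follows. This is a generic illustration, not the actual I-Miner workflow; the seed value is arbitrary.

```python
import random

def split_train_test(records, seed=100):
    """Randomly shuffle records and split them into two equal halves:
    one for building the model, one for validating it by scoring."""
    rng = random.Random(seed)    # fixed seed for a repeatable split
    shuffled = records[:]        # leave the original list untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```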
What are Model Ensembles?
Combining outputs from multiple models into a single decision
Models can be created using the same algorithm, or several
different algorithms
[Diagram: outputs from multiple models pass through decision logic to form the ensemble prediction]
Bagging
Bagging Method:
– Create many data sets by bootstrapping (can also do this with cross-validation)
– Create one decision tree for each data set
– Combine decision trees by averaging (or voting) final decisions
– Primarily reduces model variance rather than bias
Results:
– On average, better than any individual tree
[Diagram: bootstrap samples each train a tree; tree predictions are averaged into a final answer]
Breiman, L. (1996). Bagging Predictors. Machine Learning, Vol. 24, No. 2, pp. 123-140.
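The bagging recipe (bootstrap, fit one model per sample, average the predictions) can be sketched generically. The slide uses decision trees, but the mechanism is algorithm-agnostic, so this sketch takes any caller-supplied `train_fn`; the names are my own.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in data]

def bagged_predict(data, train_fn, x, n_models=10, seed=1):
    """Bagging: fit one model per bootstrap sample, then average the
    individual model predictions for input x."""
    rng = random.Random(seed)
    models = [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(m(x) for m in models) / n_models
```

For a (deliberately trivial) base learner, `train_fn` could return a constant predictor equal to the sample's mean target; with real trees, the averaging step is what reduces variance.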
Why Ensembles Work
Single model synthesis can be difficult:
– algorithms search for the best model, but not exhaustively (trees, polynomial nets, regression)
– iterative algorithms converge to local minima (neural nets)
– insufficient data to smooth out noise if one partitions data into train/test
Approximately uncorrelated output estimates provide a means to eliminate errors from individual classifiers
Picture from: T.G. Dietterich. Ensemble Methods in Machine Learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
http://citeseer.ist.psu.edu/dietterich00ensemble.html
Model Ensembles: The Good and the Bad
Pro:
– Can significantly reduce model error
– Can be easy to automate; already done in many commercial tools using Boosting, Bagging, Random Forests, and others
Con:
– Model interpretability is more difficult
– Can be very time consuming to generate dozens of models to combine
Note:
– Weak learners (trees, Naïve Bayes) have greater margin for improvement, but all algorithms can benefit from ensembles through either
Improved performance, or
Risk reduction
Error Ranges for Model
Combinations on Glass Data
Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
Combining 3-5 models is on average better than the best single model
Combining all 6 models is not best (best is a 3- or 4-model combination), but is close
This is an example of reducing model variance through ensembles, but not model bias
[Figure: Percent Classification Error (Max, Min, and Average Error) vs. Number of Models Combined, 1 through 6]
Abbott, D.W. (1999). Combining Models to Improve Classifier Accuracy and Robustness. 1999
International Conference on Information Fusion—Fusion99, Sunnyvale, CA, July 6-8.
Model Comparison Example: Rankings Tell Different Stories
Model Number Model ID AUC Train RMS Test RMS AUC Rank Train RMS Rank Test RMS Rank
50 NeuralNet1032 73.3% 0.459 0.370 9 53 1
39 NeuralNet303 72.4% 0.477 0.374 42 59 2
36 NeuralNet284 75.0% 0.458 0.376 2 52 3
31 NeuralNet244 72.7% 0.454 0.386 33 49 4
57 CVLinReg2087 70.4% 0.397 0.393 52 5 5
34 NeuralNet277 72.7% 0.455 0.399 28 50 6
37 NeuralNet297 72.4% 0.449 0.399 43 38 7
56 CV_CART2079 68.0% 0.391 0.401 54 4 8
54 CVNeuralNet2073 67.9% 0.403 0.401 55 6 9
59 CVNeuralNet2097 66.0% 0.403 0.401 59 7 10
61 CV_CART2104 70.4% 0.386 0.402 53 3 11
42 NeuralNet334 72.4% 0.450 0.404 40 44 12
52 CVLinReg2063 67.5% 0.404 0.404 57 8 13
41 NeuralNet330 72.4% 0.443 0.406 41 16 14
38 NeuralNet300 72.4% 0.451 0.408 38 45 15
55 CV_CHAID2078 64.6% 0.380 0.411 60 2 16
45 NeuralNet852 74.2% 0.456 0.413 3 51 17
53 CVLogit2068 67.5% 0.414 0.414 58 10 18
60 CV_CHAID2102 61.5% 0.380 0.414 61 1 19
58 CVLogit2092 67.7% 0.413 0.414 56 9 20
The top test-RMS model is 9th in AUC; the 2nd-ranked test-RMS model is 42nd in AUC
Correlation between rankings (negative values in parentheses):
                 AUC Rank   Train RMS Rank   Test RMS Rank
AUC Rank            1
Train RMS Rank   (0.465)          1
Test RMS Rank    (0.301)        0.267              1
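Entries like these are correlations computed between the rank columns; a hand-rolled Pearson correlation (which, when applied to ranks, is the Spearman rank correlation) is enough to reproduce such a table. A sketch:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient. Applied to two columns of ranks,
    this equals the Spearman rank correlation between the metrics."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```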
PAKDD07 Results
Problem: finance company wants to cross-sell
home loans to credit card holders
40K records for training, 8K testing
http://lamda.nju.edu.cn/conf/pakdd07/dmc07/
Even Ensembles of Ensembles
Can Help
http://www.tiberius.biz/pakdd07.html
Our Ensemble Approach
Build 10 bootstrap samples of training data
Build one logistic regression model per bootstrap sample
– Build each model carefully, pruning to avoid needless overfit (but not avoiding overfit completely)
Combine models through averaging of probabilities
Rank-order composite score to determine mailing depth
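The approach can be sketched end to end. This is an illustrative pure-Python version: the project used I-Miner logistic regression models built from seeds 100 through 1000, while the tiny gradient-descent fitter below is a stand-in (and skips the per-model pruning step), not the production code.

```python
import math
import random

def fit_logreg(X, y, lr=0.1, epochs=200):
    """Minimal logistic regression via stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict_proba(model, xi):
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_scores(X, y, n_models=10, seed=100):
    """Fit one logistic model per bootstrap sample (seeds 100, 200, ...
    echoing the talk), then average the predicted probabilities."""
    models = []
    for i in range(n_models):
        rng = random.Random(seed + i * 100)
        idx = [rng.randrange(len(X)) for _ in X]     # bootstrap indices
        models.append(fit_logreg([X[j] for j in idx], [y[j] for j in idx]))
    return [sum(predict_proba(m, xi) for m in models) / n_models for xi in X]
```

Sorting records by the averaged score descending then gives the rank order used to choose mailing depth.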
Building Ensemble Predictions:
Considerations for I-Miner
Steps:
1. Join by record: all models applied to same data in same row order
2. Change probability names
3. Average probabilities
   – "Decision" is avg_prob > threshold
4. Decile Probability Ranks
Ensemble of Logistic Regression Models
Average Probability:
– Pr(1) = (Pr1Seed100 + Pr1Seed200 + Pr1Seed300 + Pr1Seed400 + Pr1Seed500 + Pr1Seed600 + Pr1Seed700 + Pr1Seed800 + Pr1Seed900 + Pr1Seed1000) / 10
Decision: If average probability is greater than original proportion of responders, count prediction as response
– Decision = ifelse( ((Pr1Seed100 + Pr1Seed200 + Pr1Seed300 + Pr1Seed400 + Pr1Seed500 + Pr1Seed600 + Pr1Seed700 + Pr1Seed800 + Pr1Seed900 + Pr1Seed1000) / 10) > threshold, "1", "0" )
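The same decision rule is straightforward to express outside of I-Miner; a sketch (the function name is mine):

```python
def ensemble_decision(probs, threshold):
    """Average the per-seed probabilities; flag a predicted response when
    the average exceeds the threshold (the original responder proportion)."""
    avg = sum(probs) / len(probs)
    return "1" if avg > threshold else "0"
```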
Variable Inclusion in Model Ensembles
Twenty-five different variables represented in the ten models
Only five were represented in seven or more models
Twelve were represented in one or two models
[Chart: number of variables vs. number of models sharing common variables]
Individual Model vs. Ensembles
Decile by Decile Response Rates
Cumulative Lift
Individual Model vs. Ensemble:
Ranks of Cumulative Lift
Notes:
– Every model was ranked in the top 2 of cumulative lift at some point going down the deciles
– Every model was ranked 8th-10th in cumulative lift at some point going down the deciles
Conclusion: Single models behave erratically on this (small) data set
* Ensemble ranks are based upon placement after ranks of individual models were already set. Therefore, its rank will match an individual model that it has bested, so its average rank is pessimistic.
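Cumulative lift, as used in this comparison, can be computed as follows. This is a generic sketch; decile boundaries here are simple equal-count slices of the score-sorted list.

```python
def cumulative_lift(scores, responses, n_deciles=10):
    """For each decile depth, the cumulative response rate of the
    top-scored records divided by the overall response rate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    overall = sum(responses) / len(responses)
    per = len(scores) // n_deciles
    lifts = []
    for d in range(1, n_deciles + 1):
        top = order[: per * d]
        rate = sum(responses[i] for i in top) / len(top)
        lifts.append(rate / overall)
    return lifts
```

At the tenth decile the cumulative lift is 1.0 by construction, since the "top" group is then the whole file.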
Ensemble Model Results
Compare Response to Score
[Chart: linear fit of Response Rate vs. Mean Score: y = 1.0028x, R² = 0.9649]
[Chart: polynomial fit of Response Rate vs. Mean Score: y = -0.3331x² + 1.0489x, R² = 0.9664]
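The linear calibration fit shown constrains the intercept to zero (y = a·x); under that assumption the slope and R² can be recomputed directly. This is a sketch of the computation, not the original fitting tool.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for a no-intercept fit y = a*x:
    a = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def r_squared(xs, ys, predict):
    """Coefficient of determination for any prediction function."""
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

A slope near 1 with high R², as on the slide, says the averaged ensemble probabilities are well calibrated: mean score tracks actual response rate almost one-for-one.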
Model Deployment
Deployment Rifle Prospect Scoring Summary

Mailed deciles (17,452 records):
DECILE  Score Mean  Score Min  Score Max  StDev     Diff Mean  Linear Pred.  Quadratic Pred.  Count   Actual
1       14.97%      11.48%     31.19%     0.026208  0          15.01%        14.95%           11,410  15.11%
2       10.65%      10.00%     11.48%     0.0042    0.00497    10.68%        10.79%           6,042   10.95%
Weighted Ave.                                                  13.5110%      13.5134%         17,452  13.67%

All scored prospects (114,105 records):
DECILE  Score Mean  Score Min  Score Max  StDev     Diff Mean  Linear Pred.  Quadratic Pred.  Count
1       14.97%      11.48%     31.19%     0.026208  -0.0425    15.01%        14.95%           11,410
2       10.15%      9.24%      11.48%     0.006281  -0.0528    10.18%        10.31%           11,410
3       8.72%       8.28%      9.24%      0.002739  -0.05      8.74%         8.89%            11,411
4       7.95%       7.65%      8.28%      0.001818  -0.0409    7.98%         8.13%            11,410
5       7.39%       7.13%      7.65%      0.001516  -0.032     7.41%         7.57%            11,411
6       6.87%       6.62%      7.13%      0.001456  -0.0267    6.89%         7.05%            11,410
7       6.37%       6.12%      6.62%      0.001431  -0.0233    6.39%         6.55%            11,411
8       5.86%       5.58%      6.12%      0.001563  -0.0198    5.87%         6.03%            11,410
9       5.28%       4.94%      5.58%      0.001854  -0.0158    5.29%         5.44%            11,411
10      4.30%       0.56%      4.94%      0.005632  -0.0117    4.31%         4.45%            11,411
Ensemble Model Results
Scored over 2,100,000 prospects
Actual results from the rollout
– Average response rate = 13.67%
Significant gross revenue generated for
business partner.
Questions?
Thank You!