introduction to uplift modelling

Introduction to Uplift Modelling : An online gaming application

A few words about me

•  Senior Data Scientist at Dataiku (worked on churn prediction, fraud detection, bot detection, recommender systems, graph analytics, smart cities, … )

•  Occasional Kaggle competitor

•  Mostly code with Python and SQL

•  Twitter @prrgutierrez

Plan •  Introduction / Client situation

•  Uplift use case examples

•  Uplift modelling

•  Uplift evaluation & results

Client situation •  French Online Gaming Company (RPG)

•  A lot of users are leaving •  Let’s build a churn prediction model !

•  Target : the user does not come back within 14 or 28 days (14 days of absence -> 80 % chance of not coming back; 28 days of absence -> 90 % chance of not coming back)

•  Features :

•  Connection features : •  Time played in the last 1, 7, 15, 30, … days •  Time since last connection •  Connection frequency •  Days of the week / hours of the day played

•  Equivalent for payments and subscriptions

•  Age, sex, country •  Number of accounts, is a bot … •  No in-game features (no data)

Client situation •  Model Results :

•  AUC 0.88 •  Very stable model

•  Marketing actions : •  7 different actions based on customer segmentation (offers, promotion, … ) •  A/B test -> -5 % churn for persons contacted by email

•  Going further : •  Feature engineering : guilds, close network, in game actions, … •  Study long term churn …

Client situation •  But wait !

•  Strong hypothesis : target the people who are the most likely to churn •  What is the gain / person for an action ?

•  $c$ : cost of action •  $v_i$ : value of customer $i$ •  $X$ : independent variables •  $T$ : “treated” population and $C$ : “control” population

•  $Y = 1$ if customer churns, $0$ otherwise

•  Value with action : $E_T(V_i) = v_i (1 - P_T(Y = 1|X)) - c$

•  Value without action : $E_C(V_i) = v_i (1 - P_C(Y = 1|X))$

•  Gain (if $v_i$ independent of treatment) : $E(G_i) = v_i (P_C(Y = 1|X) - P_T(Y = 1|X)) - c$
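A minimal sketch of the gain formula, to make the arithmetic concrete. The customer names, values, probabilities and the cost below are hypothetical numbers chosen for illustration, not figures from the client study.

```python
def expected_gain(value, p_churn_control, p_churn_treated, cost):
    """E(G_i) = v_i * (P_C(Y=1|X) - P_T(Y=1|X)) - c."""
    return value * (p_churn_control - p_churn_treated) - cost


# Hypothetical customers: (label, v_i, P_C(Y=1|X), P_T(Y=1|X))
customers = [
    ("persuadable", 50.0, 0.60, 0.40),          # action lowers churn probability
    ("already lost", 50.0, 0.90, 0.90),         # action changes nothing
    ("annoyed by contact", 50.0, 0.30, 0.40),   # action raises churn probability
]
COST = 2.0  # hypothetical cost c of one action

gains = {
    label: expected_gain(v, p_c, p_t, COST)
    for label, v, p_c, p_t in customers
}
for label, gain in gains.items():
    print(f"{label}: {gain:+.2f}")
```

Only the first customer has a positive expected gain, although the second one is the most likely to churn.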


•  Objective : maximize this gain : $E(G_i) = v_i (P_C(Y = 1|X) - P_T(Y = 1|X)) - c$

•  Targeting highly probable churners -> minimize $P_T(Y = 1|X)$, but not the difference !

•  Intuitive examples :

•  $P_C(Y = 1) < P_T(Y = 1)$ : the action is expected to make the situation worse. Spam ? •  $P_C(Y = 1) \approx P_T(Y = 1)$ : the user does not care, is already lost

•  Uplift = model $\Delta P$

Uplift •  Model effect of the action

•  4 groups of customers / patients

•  1 Responded because of the action (the people we want) •  2 Responded, but would have responded anyway (unnecessary costs) •  3 Did not respond and the action had no impact (unnecessary costs) •  4 Did not respond because the action had a negative impact (negative impact)

•  Incomplete knowledge

Uplift Examples •  Healthcare :

•  A typical medical trial: •  treatment group: gets the treatment •  control group: gets placebo (or another treatment)

•  do a statistical test to show that the treatment is better than placebo

•  With uplift modeling we can find out for whom the treatment works best

•  Personalized medicine

•  Ex : What is the gain in survival probability ?

-> classification/uplift problem

Uplift Examples •  Churn :

•  E-gaming •  Other Ex : Coyote

•  Retail : •  Compare coupons campaigns

Uplift Examples •  Mailing : Hillstrom challenge

•  2 campaigns : •  one men’s email

•  one women’s email

•  Question : who are the people to target / that have the best response rate

Uplift Examples •  Common pattern

•  Experiment or A/B testing -> Test and control

•  Warning : Control can be biased easily : •  Targeted most probable churners and control is the rest •  Call only the people that come to a shop

•  Limited experiment trial -> no bandit algorithm (once a medical trial is done, you don’t continue the “exploration”) -> feedback is relatively large and discrete in time.

Uplift modelling •  Three main methods :

•  Two models approach

•  Class variable modification

•  Modification of existing machine learning models

Uplift modelling : Two model approach •  Build a model on treatment to get $P_T(Y|X)$

•  Build a model on control to get $P_C(Y|X)$

•  Set : $\Delta P = P_T(Y|X) - P_C(Y|X)$

Uplift modelling : Two model approach •  Advantages :

•  Standard ML models can be used •  In theory, two good estimators -> a good uplift model •  Works well in practice •  Generalizes to regression and multi-treatment easily

•  Drawbacks : •  The difference of two estimators is probably not the best estimator of the difference •  The two classifiers can ignore the weaker uplift signal (since it’s not their target) •  Algorithms focusing on estimating the difference should perform better
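The two model approach can be sketched as follows. This is a toy illustration under stated assumptions: `SegmentRateModel` is a hypothetical stand-in for any standard classifier (it only estimates a smoothed response rate per segment), the data are invented, and `y = 1` denotes the outcome being modelled.

```python
from collections import defaultdict


class SegmentRateModel:
    """Toy classifier: P(Y=1 | segment) as a Laplace-smoothed empirical rate."""

    def fit(self, segments, outcomes):
        counts = defaultdict(lambda: [0, 0])  # segment -> [positives, total]
        for seg, y in zip(segments, outcomes):
            counts[seg][0] += y
            counts[seg][1] += 1
        self.rates_ = {s: (pos + 1) / (n + 2) for s, (pos, n) in counts.items()}
        return self

    def predict_proba(self, segments):
        # Unseen segments fall back to 0.5 (no information).
        return [self.rates_.get(s, 0.5) for s in segments]


def two_model_uplift(model_t, model_c, segments):
    """Delta P = P_T(Y|X) - P_C(Y|X), one model per group."""
    p_t = model_t.predict_proba(segments)
    p_c = model_c.predict_proba(segments)
    return [pt - pc for pt, pc in zip(p_t, p_c)]


# Hypothetical data: lists of (segment, y)
treated = ([("casual", 1)] * 6 + [("casual", 0)] * 4
           + [("hardcore", 1)] * 5 + [("hardcore", 0)] * 5)
control = ([("casual", 1)] * 3 + [("casual", 0)] * 7
           + [("hardcore", 1)] * 5 + [("hardcore", 0)] * 5)

model_t = SegmentRateModel().fit(*zip(*treated))  # model built on treatment only
model_c = SegmentRateModel().fit(*zip(*control))  # model built on control only

to_score = ["casual", "hardcore"]
uplift = dict(zip(to_score, two_model_uplift(model_t, model_c, to_score)))
print(uplift)
```

Neither model ever sees the other group, which is why a weak uplift signal can be lost: each model is only optimised for its own target.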

Uplift modelling : Class variable modification •  Introduced in Jaskowski, Jaroszewicz 2012 •  Allows any classifier to be updated to uplift modeling

•  Let $G \in \{T, C\}$ denote the group membership (Treatment or Control)

•  Let’s define the new target variable :

$$Z = \begin{cases} 1 & \text{if } G = T \text{ and } Y = 1 \\ 1 & \text{if } G = C \text{ and } Y = 0 \\ 0 & \text{otherwise} \end{cases}$$

•  This corresponds to flipping the target in the control dataset.

Uplift modelling : Class variable modification •  Summary :

•  Flip class for control dataset •  Concatenate treatment and control datasets •  Build a classifier •  Target users with highest probability

•  Advantages :

•  Any classifier can be used •  Directly predict uplift (and not each class separately) •  Single model on a larger dataset (instead of two small ones)

•  Drawbacks :

•  Complex decision surface -> model can perform poorly •  Interpretation : what is AUC in this case ?
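A minimal sketch of the class variable modification steps. The per-segment rate estimator is a hypothetical stand-in for "any classifier" and the rows are invented. The last step assumes treatment and control are the same size and assigned independently of $X$; in that balanced case $P(Z = 1|X) = \frac{1}{2}\left(P_T(Y = 1|X) + 1 - P_C(Y = 1|X)\right)$, so $2 P(Z = 1|X) - 1$ recovers $P_T - P_C$.

```python
from collections import defaultdict


def transform_target(group, y):
    """Z = 1 if (G = T and Y = 1) or (G = C and Y = 0), else 0."""
    return y if group == "T" else 1 - y  # flip the class in control only


def fit_segment_rates(segments, targets):
    """Toy 'classifier': empirical P(Z=1 | segment)."""
    positives, totals = defaultdict(int), defaultdict(int)
    for seg, z in zip(segments, targets):
        positives[seg] += z
        totals[seg] += 1
    return {seg: positives[seg] / totals[seg] for seg in totals}


# Hypothetical rows (segment, group, y); treatment and control already concatenated
rows = (
    [("casual", "T", 1)] * 6 + [("casual", "T", 0)] * 4
    + [("casual", "C", 1)] * 3 + [("casual", "C", 0)] * 7
    + [("hardcore", "T", 1)] * 5 + [("hardcore", "T", 0)] * 5
    + [("hardcore", "C", 1)] * 5 + [("hardcore", "C", 0)] * 5
)

segments = [seg for seg, _, _ in rows]
z = [transform_target(group, y) for _, group, y in rows]

p_z = fit_segment_rates(segments, z)               # single model on the full dataset
uplift = {seg: 2 * p - 1 for seg, p in p_z.items()}  # valid in the balanced setting
print(uplift)
```

Ranking users by $P(Z = 1|X)$ is enough for targeting; the rescaling is only needed to read the score as an uplift.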

Uplift modeling : Other methods •  Based on decision trees :

•  Rzepakowski, Jaroszewicz 2012 : new decision tree split criterion based on information theory •  Soltys, Rzepakowski, Jaroszewicz 2013 : ensemble methods for uplift modeling

(out of today’s scope)

Evaluation •  We used :

•  2 model approach -> AUC ? Not very informative. •  1 model approach -> does AUC mean something ? •  How can we evaluate / compare them ?

•  Cross Validation : •  4 datasets : treatment/control x train/test

•  Problem : •  We don’t have a clear 0/1 target. •  We would need to know for each customer

•  Response to treatment •  Response to control -> not possible

Evaluation

•  Gain for group of customers : •  Gain for the 10% highest scoring customers =

% of successes for top 10% treated customers − % of successes for top 10% control customers

•  Uplift curve ? :

•  Difference between two lift curves •  Interpretation : net gain in success rate if a given percentage of the population is treated •  Problem : no theoretical maximum •  Problem 2 : weird behaviour for 2 wizard models.
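The gain for the highest scoring customers can be computed as below. This is a sketch with invented scores and outcomes; the function name and the rounding rule used to size the top group are assumptions.

```python
def top_fraction_gain(scores_t, y_t, scores_c, y_c, fraction=0.1):
    """Success rate in the top `fraction` of treated minus the same rate in control."""

    def top_rate(scores, outcomes):
        k = max(1, round(len(scores) * fraction))  # size of the top group
        ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
        return sum(y for _, y in ranked[:k]) / k

    return top_rate(scores_t, y_t) - top_rate(scores_c, y_c)


# Hypothetical scored test sets: 20 treated and 20 control customers
scores = [i / 20 for i in range(20)]   # same score grid for both groups
y_treated = [0] * 18 + [1, 1]          # both top-scored treated customers succeed
y_control = [0] * 19 + [1]             # only one of the top two controls succeeds

gain_top10 = top_fraction_gain(scores, y_treated, scores, y_control, fraction=0.1)
gain_all = top_fraction_gain(scores, y_treated, scores, y_control, fraction=1.0)
print(f"top 10%: {gain_top10:+.2f}, whole population: {gain_all:+.2f}")
```

Evaluating this gain over a grid of fractions gives the uplift curve; at 100 % it equals the global effect of the treatment.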

Evaluation : Qini

•  Qini Measure : •  Similar to Gini (area under lift curve). Lift curve <-> Qini curve •  Parametric curve defined by : $f(t) = Y_T(t) - Y_C(t) \cdot N_T(t) / N_C(t)$ (control successes rescaled to the size of the treated group)

•  When taking the first $t$ observations : •  $Y_T$ is the total number of 1 seen in target (treated) observations •  $Y_C$ is the total number of 1 seen in control observations •  $N_T$ is the total number of target observations •  $N_C$ is the total number of control observations

•  Balanced setting : $f(t) = Y_T(t) - Y_C(t)$
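A sketch of the Qini curve on a scored test set. It assumes the usual rescaling of control successes to the size of the treated group, $N_T(t)/N_C(t)$, and sets the correction to 0 until the first control observation is seen; both choices, like the data, are assumptions made for illustration.

```python
def qini_curve(scores, groups, outcomes):
    """f(t) = Y_T(t) - Y_C(t) * N_T(t) / N_C(t) for t = 1..n, best scores first."""
    ranked = sorted(zip(scores, groups, outcomes), key=lambda r: r[0], reverse=True)
    y_t = y_c = n_t = n_c = 0
    curve = []
    for _, group, y in ranked:
        if group == "T":
            n_t += 1
            y_t += y
        else:
            n_c += 1
            y_c += y
        # Correction is undefined before any control observation: use 0.
        correction = y_c * n_t / n_c if n_c else 0.0
        curve.append(y_t - correction)
    return curve


# Hypothetical scored observations, treatment and control mixed
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
groups = ["T", "C", "T", "C", "T", "C", "T", "C"]
outcomes = [1, 0, 1, 0, 0, 1, 0, 1]

curve = qini_curve(scores, groups, outcomes)
print(curve)
```

The last point of the curve is the global effect of the treatment (here zero), so a useful model shows up as a curve that rises above the straight line joining the origin to that point.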

Evaluation : Qini

•  Personal intuition : •  We can’t know everything :

•  Treated that convert, not treated that don’t convert : what would have happened otherwise ? •  But we don’t want to see :

•  Treated not converting •  Not treated converting (in our top list)

•  In the top $t$ we want to minimize : $N_T(t) - Y_T(t) + Y_C(t)$

•  Very similar to lift taking into account only negative examples.

Evaluation : Qini •  Best model :

•  Ranks all positives in target first and all positives in control last. •  No theoretical best model :

•  depends on possibility of negative effect •  Displayed for no negative effect

•  Random model : •  Corresponds to global effect of treatment

•  Hillstrom Dataset : •  For women, models are comparable and useful •  For men, there are no clear individuals to target

Evaluation : Qini •  Back to our study :

•  Class modification performs best •  Two models approach performs poorly

•  A/B test failure : •  Control dataset is way too small ! •  Class modification model very close to lift •  Two models approach slightly better than random -> need to redo the A/B test.

Conclusion •  Uplift :

•  Surprisingly little literature / examples •  The theory is rather easy to test

•  Two models •  Class modification

•  The intuition and evaluation are not easy to grasp

•  On the client side : •  I don’t lose hope we’ll do the A/B test again •  A good lead to select the best offer for a customer

A few references •  Data :

•  Churn in gaming : WOWAH dataset (blog post to come)

•  Uplift for healthcare : Colon Dataset

•  Uplift in mailing : Hillstrom data challenge

•  Uplift in General :

Simulated data : (blog post to come)

A few references •  Application

•  Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz) •  Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)

A few references •  Modeling techniques :

•  Rzepakowski Jaroszewicz 2011 (decision trees) •  Soltys Rzepakowski Jaroszewicz 2013 (ensemble for uplift) •  Jaskowski Jaroszewicz 2012 (Class modification model)

A few references •  Evaluation

•  Using Control Groups to Target on Predicted Lift (Radcliffe) •  Testing a New Metric for Uplift Models (Mesalles Naranjo)

Thank you for your attention !
