phd seminar riezlern 2016

Leveraging Regularity in Predicting Customer Lifetime Value

Michael Platzer & Thomas Reutterer

Seminar Riezlern 2016

Warm Up

[Figure: purchase timelines of Customer A and Customer B, from 1-Jan-16, 09:00 to 21-Jun-16, 10:28]

1) Which customer would you prefer? The regular one, or the clumpy one?

2) Which type of customer is more prevalent? The regular ones, or the clumpy ones?

Two customers – Same Recency, Same Frequency

1. Intro to BTYD models

2. On the Subject of Regularity

3. Our Pareto/GGG model

4. Our (M)BG/CNBD-k model

5. Our BTYDplus R package

Buy-Till-You-Die

Non-contractual setting: the customer purchases until she stops purchasing. However, the dropout event is not observed.

[Figure: purchase timelines labeled “alive!” and “dead?”]

Key Issues in the Management of Customer Relationships


Given: Purchase history of customer cohort in non-contractual setting.

Example: CD Sales

Broadening the context:
• purchase ≈ transaction ≈ event …
• customer relationship ≈ channel activity ≈ service activity …

Questions:
• How valuable is that cohort?
• How many purchases to expect?
• Who will still be active?
• Who will be most active?
• When will the next purchase take place?

BTYD “Gold Standard”: Pareto/NBD
Schmittlein, Morrison and Colombo, 1987

Assumptions

1. Purchase process (while ‘alive’): NBD (Ehrenberg 1959)
• Purchases follow a Poisson process, i.e. exponentially distributed inter-transaction times, itt_{i,j} ~ Exponential(λ_i)
• λ_i are Gamma(r, α) distributed across customers

2. Dropout (‘death’) process: Pareto
• The customer’s (unobserved) lifetime is exponentially distributed, lifetime τ_i ~ Exponential(μ_i)
• μ_i are Gamma(s, β) distributed across customers

3. λ and μ vary independently

→ parameter estimation of (r, α, s, β) via Maximum Likelihood
→ closed-form solutions for key expressions: P(alive), # of future purchases
→ require only recency/frequency summary statistics (x, tx, T) per customer
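The three assumptions above can be turned into a small simulation. The following is a minimal base-R sketch, not part of any package: the function name simulate_pnbd and the parameter values are made up for illustration.

```r
# Minimal sketch: simulate one cohort under the Pareto/NBD assumptions.
# simulate_pnbd and the parameter values below are illustrative assumptions.
simulate_pnbd <- function(n, r, alpha, s, beta, T.cal) {
  lambda <- rgamma(n, shape = r, rate = alpha)  # purchase rates, Gamma(r, alpha)
  mu     <- rgamma(n, shape = s, rate = beta)   # dropout rates, Gamma(s, beta)
  tau    <- rexp(n, rate = mu)                  # unobserved lifetimes
  x <- integer(n)
  t.x <- numeric(n)
  for (i in seq_len(n)) {
    horizon <- min(tau[i], T.cal)               # purchases stop at death or end of observation
    t <- rexp(1, rate = lambda[i])
    while (t <= horizon) {
      x[i] <- x[i] + 1L                         # frequency: number of repeat purchases
      t.x[i] <- t                               # recency: time of last repeat purchase
      t <- t + rexp(1, rate = lambda[i])
    }
  }
  data.frame(x = x, t.x = t.x, T.cal = T.cal, alive = tau > T.cal)
}

set.seed(1)
cohort <- simulate_pnbd(n = 1000, r = 0.55, alpha = 10, s = 0.6, beta = 12, T.cal = 52)
head(cohort)
```

The alive column is only available because this is a simulation; with real data the model has to infer it from (x, t.x, T.cal).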

BTYD Models

• BG/NBD (Fader, Hardie, and Lee 2005): Discrete-time defection process (after any transaction) instead of a continuous one

• MBG/NBD (Batislam et al. 2007), CBG/NBD (Hoppe and Wagner 2007): Customers can drop out at time zero (immediately after the first purchase)

• PDO/NBD (Jerath et al. 2011): Defection opportunities tied to calendar time (independent of transaction timing)

• GG/NBD (Bemmaor and Glady 2012): Flexible lifetime model, departing from the exponential (Gamma-Gompertz)

• Pareto/NBD variant (Abe 2009): Hierarchical Bayes extension of Pareto/NBD (dependencies of λ_i and μ_i)

→ All modify the dropout process, but not the purchase process

Regularity improves Predictability

[Figure: event timelines split into past and future; next event?]

Well, so what?



Buy-Till-You-Die Setting

[Figure: purchase timelines of customers A and B; still alive?]

Customers A and B exhibit the same Recency and Frequency, yet we arrive at different assessments of P(alive).



Regularity in Purchase Timings
• Erlang-k: Herniter (1971)
• Gamma: Wheat & Morrison (1990)
• CNBD: Chatfield and Goodhardt (1973), Schmittlein and Morrison (1983), Morrison and Schmittlein (1988)
• CNBD Models: Gupta (1991), Wu and Chen (2000), Schweidel and Fader (2009)

Irregularity in Purchase Timings
• RFMC: Zhang, Bradlow and Small (2015)


Empirical Findings

Data Sets                  k_wheat
Grocery                    2.5
Donations                  2.2
Health Supplements         2.1
Office Supply              1.8
CD Sales                   1.0
Fashion & Accessories      0.6

Grocery Categories         k_wheat
Coffee pads                3.1
Detergents                 2.8
Toilet Paper               2.8
Cat food                   2.8
…
Light bulbs                1.9
Cosmetics & perfumes       1.6
Sparkling Wine             1.6
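The k_wheat values refer to a regularity estimate in the spirit of Wheat & Morrison (1990). As a rough sketch of how such an estimate can be obtained (the helper estimate_k_wheat is a hypothetical name, and the simulated data are made up): for Gamma(k) waiting times, the ratio M = itt1 / (itt1 + itt2) of two consecutive inter-transaction times is Beta(k, k) distributed, so Var(M) = 1 / (4 (2k + 1)), which can be solved for k.

```r
# Sketch of a Wheat & Morrison (1990) style estimator of the regularity parameter k.
# estimate_k_wheat is a hypothetical helper name, not a package function.
estimate_k_wheat <- function(itt1, itt2) {
  m <- itt1 / (itt1 + itt2)   # M ~ Beta(k, k); the customer's own rate cancels out
  v <- var(m)
  (1 - 4 * v) / (8 * v)       # solves Var(M) = 1 / (4 * (2k + 1)) for k
}

# Simulated check: heterogeneous purchase rates, common regularity k = 2
set.seed(42)
n      <- 5000
k_true <- 2
lambda <- rgamma(n, shape = 2, rate = 20)
itt1   <- rgamma(n, shape = k_true, rate = k_true * lambda)
itt2   <- rgamma(n, shape = k_true, rate = k_true * lambda)
k_hat  <- estimate_k_wheat(itt1, itt2)
k_hat
```

Because M is a ratio, the estimate needs only two inter-transaction times per customer and is unaffected by heterogeneity in λ.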

Pareto/GGG
Platzer and Reutterer, forthcoming

Customer Level
• Purchase Process: While alive, the customer purchases with Gamma distributed waiting times; i.e. itt_{i,j} ~ Gamma(k_i, k_i λ_i)
• Dropout Process: Each customer remains alive for an exponentially distributed lifetime with death rate μ_i; i.e. lifetime τ_i ~ Exponential(μ_i)

Heterogeneity across Customers
• λ_i ~ Gamma(r, α)
• μ_i ~ Gamma(s, β)
• k_i ~ Gamma(t, γ)
• λ_i, μ_i, k_i vary independently

Pareto/GGG = Pareto/NBD + Varying Regularity
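To make the data-generating process concrete, here is a minimal base-R sketch that draws customers under these assumptions. The function name simulate_pggg and the parameter values are illustrative assumptions; Gamma(k, kλ) is read as shape k and rate kλ, so the mean waiting time is 1/λ.

```r
# Minimal sketch: simulate customers under the Pareto/GGG assumptions.
# simulate_pggg is an illustrative name; arguments t and gamma mirror (t, γ).
simulate_pggg <- function(n, r, alpha, s, beta, t, gamma, T.cal) {
  k      <- rgamma(n, shape = t, rate = gamma)   # individual regularity
  lambda <- rgamma(n, shape = r, rate = alpha)   # individual purchase rate
  mu     <- rgamma(n, shape = s, rate = beta)    # individual dropout rate
  tau    <- rexp(n, rate = mu)                   # unobserved lifetime
  rows <- lapply(seq_len(n), function(i) {
    horizon <- min(tau[i], T.cal)
    times <- numeric(0)
    tt <- rgamma(1, shape = k[i], rate = k[i] * lambda[i])
    while (tt <= horizon) {
      times <- c(times, tt)
      tt <- tt + rgamma(1, shape = k[i], rate = k[i] * lambda[i])
    }
    itt <- diff(c(0, times))                     # inter-transaction times
    data.frame(x = length(times),
               t.x = if (length(times) > 0) max(times) else 0,
               litt = sum(log(itt)),             # the one additional summary statistic
               T.cal = T.cal,
               k = k[i])
  })
  do.call(rbind, rows)
}

set.seed(2)
sim <- simulate_pggg(n = 500, r = 0.55, alpha = 10, s = 0.6, beta = 12,
                     t = 5, gamma = 2.5, T.cal = 52)
head(sim)
```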

Gamma Distributed Interpurchase Times

[Figure: Gamma densities of interpurchase times for k = 0.3 (clumpy), k = 1 (Exponential, random) and k = 8 (Erlang-8, regular)]

Coefficient of Variation = 1 / sqrt(k)
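As a short worked step, assuming the Gamma waiting times are parameterized with shape k and rate kλ, the coefficient of variation follows directly from the Gamma moments:

```latex
\begin{align*}
\mathrm{E}[itt] &= \frac{k}{k\lambda} = \frac{1}{\lambda}, &
\mathrm{Var}[itt] &= \frac{k}{(k\lambda)^2} = \frac{1}{k\lambda^2}, \\
\mathrm{CV} &= \frac{\sqrt{\mathrm{Var}[itt]}}{\mathrm{E}[itt]} = \frac{1}{\sqrt{k}}.
\end{align*}
```

So k leaves the mean waiting time untouched and only controls its dispersion: k < 1 gives CV > 1 (clumpy), k = 1 gives CV = 1 (random), and k > 1 gives CV < 1 (regular).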


Pareto/GGG Estimation via MCMC
Component-wise Slice Sampling within Gibbs with Data Augmentation

:-( Significantly increased computational costs (2 mins for drawing 1’000 customers)



:-) but…
• Posterior distributions instead of point estimates
• Also for individual-level parameters
• Direct simulation of key metrics of managerial interest
• And only one additional summary statistic required
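For readers unfamiliar with slice sampling, the following is a generic univariate slice sampler with stepping-out and shrinkage (Neal 2003), written in base R. It is a sketch of the general technique only; the function name slice_sample is an assumption, and the actual sampler used for Pareto/GGG may differ in its details.

```r
# Generic univariate slice sampler (stepping-out and shrinkage, Neal 2003).
# logf: log density up to a constant; x0: starting value inside the support.
slice_sample <- function(logf, x0, n, w = 1, lower = -Inf, upper = Inf) {
  draws <- numeric(n)
  x <- x0
  for (i in seq_len(n)) {
    logy <- logf(x) - rexp(1)                         # log height of the slice
    L <- x - runif(1) * w                             # randomly placed initial interval
    R <- L + w
    while (L > lower && logf(L) > logy) L <- L - w    # step out to the left
    while (R < upper && logf(R) > logy) R <- R + w    # step out to the right
    L <- max(L, lower)
    R <- min(R, upper)
    repeat {                                          # shrink until a point lies in the slice
      x1 <- runif(1, L, R)
      if (logf(x1) > logy) break
      if (x1 < x) L <- x1 else R <- x1
    }
    x <- x1
    draws[i] <- x
  }
  draws
}

# Usage: sample from a Gamma(3, 2) target, whose mean is 1.5
set.seed(123)
draws <- slice_sample(function(x) dgamma(x, shape = 3, rate = 2, log = TRUE),
                      x0 = 1, n = 20000, w = 1, lower = 0)
mean(draws)
```

In a component-wise scheme, one such update is applied to each parameter in turn while the others are held fixed.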

Simulation Study: Design

160 scenarios covering a wide range of parameter settings (similar to the simulation design of the BG/BB paper)

• N = {1000, 4000}
• r = {0.25, 0.75}, α = {5, 15}
• s = {0.25, 0.75}, β = {5, 15}
• (t, γ) = {(1.6, 0.4), (5, 2.5), (6, 4), (8, 8), (17, 20)}
=> Total of 400’000 simulated customers
=> Total of 64 billion individual-level parameter draws (via slice sampling)

Compare the individual-level forecast accuracy of Pareto/GGG vs. Pareto/NBD in terms of mean absolute error (MAE). Study the relative improvement in MAE.
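The scenario count and the evaluation metric can be checked with a few lines of base R. The helper names mae and rel_lift are assumptions, as is the exact definition of relative lift used here (one minus the ratio of the two MAEs).

```r
# Full factorial design: 2 x 2 x 2 x 2 x 2 settings times 5 (t, gamma) pairs
tg <- data.frame(t = c(1.6, 5, 6, 8, 17), gamma = c(0.4, 2.5, 4, 8, 20))
grid <- expand.grid(N = c(1000, 4000),
                    r = c(0.25, 0.75), alpha = c(5, 15),
                    s = c(0.25, 0.75), beta = c(5, 15),
                    tg = seq_len(nrow(tg)))
grid <- cbind(grid[, c("N", "r", "alpha", "s", "beta")], tg[grid$tg, ])
nrow(grid)    # 160 scenarios
sum(grid$N)   # 400000 simulated customers

# Mean absolute error and relative lift of a new model over a baseline
# (assumed definition: share of the baseline MAE that the new model removes)
mae <- function(actual, predicted) mean(abs(actual - predicted))
rel_lift <- function(actual, pred_base, pred_new) {
  1 - mae(actual, pred_new) / mae(actual, pred_base)
}
```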


Simulation Study: Regularity improves Predictability

• bigger lift for bigger regularity
• even for mildly regular patterns we see lift
• no lift for random and clumpy customers


Simulation Study: Lift in Predictive Accuracy by Segment


Simulation Study: Interplay of Recency, Frequency and Regularity

Assumptions: mean(itt) = 6 weeks, mean(lifetime) = 52 weeks



Same RF, but different P(alive) for different k, particularly when the customer is already “overdue”.

Compared to a randomly purchasing customer, regular customers are less likely and clumpy customers are more likely to still be alive.
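The effect can be reproduced with a small calculation. The sketch below assumes the individual-level parameters (k, λ, μ) are known, which ignores the heterogeneity that the full model estimates; the function name palive_gamma and the choice of 18 weeks since the last purchase are illustrative assumptions. Let d = T − t.x be the time since the last purchase and S(u) the survival function of the Gamma(k, kλ) waiting time. Then P(alive) weighs "still alive and no purchase for d" against "died at some point u after the last purchase, with no purchase before that".

```r
# P(alive) for known individual-level parameters, given time d since last purchase
palive_gamma <- function(d, k, lambda, mu) {
  surv  <- function(u) pgamma(u, shape = k, rate = k * lambda, lower.tail = FALSE)
  alive <- surv(d) * exp(-mu * d)                       # alive, but no purchase for d
  dead  <- integrate(function(u) surv(u) * mu * exp(-mu * u),
                     lower = 0, upper = d)$value        # died at u, no purchase before
  alive / (alive + dead)
}

lambda <- 1 / 6    # mean(itt) = 6 weeks
mu     <- 1 / 52   # mean(lifetime) = 52 weeks
d      <- 18       # assumed: 18 weeks since last purchase, i.e. clearly "overdue"
p <- sapply(c(clumpy = 0.3, random = 1, regular = 8),
            palive_gamma, d = d, lambda = lambda, mu = mu)
round(p, 3)
```

For k = 1 the expression reduces to the familiar exponential case, 1 / (1 + μ/(λ+μ) (exp((λ+μ) d) − 1)).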



Empirical Findings

[Figure: regularity of the datasets, with labels regular, Poisson and clumpy]

→ regularity varies across but also within datasets

[Table: median(k) and relative lift in MAE]

→ improved predictive accuracy for datasets with regular patterns


Empirical Findings

→ estimates for next transaction timings differ when regularity is taken into consideration


Empirical Findings

(M)BG/CNBD-k
Platzer and Reutterer, forthcoming

Customer Level
• Purchase Process: While alive, the customer purchases with Erlang-k distributed waiting times; i.e. itt_{i,j} ~ Erlang-k(λ_i)
• Dropout Process: A customer drops out at a (re-)purchase event with probability p_i

Heterogeneity across Customers
• λ_i ~ Gamma(r, α)
• p_i ~ Beta(a, b)
• λ_i, p_i vary independently

BG/CNBD-k = BG/NBD + Fixed Regularity

MBG/CNBD-k = MBG/NBD + Fixed Regularity
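A minimal base-R sketch of this data-generating process follows. The function name simulate_bgcnbd, the modified flag and the parameter values are illustrative assumptions; the flag switches between dropout opportunities only at repurchases (BG) and an additional dropout opportunity at time zero (MBG).

```r
# Minimal sketch: simulate customers under the (M)BG/CNBD-k assumptions.
# simulate_bgcnbd and its 'modified' flag are illustrative, not package API.
simulate_bgcnbd <- function(n, k, r, alpha, a, b, T.cal, modified = TRUE) {
  lambda <- rgamma(n, shape = r, rate = alpha)   # purchase rates, Gamma(r, alpha)
  p      <- rbeta(n, a, b)                       # dropout probabilities, Beta(a, b)
  x <- integer(n)
  t.x <- numeric(n)
  for (i in seq_len(n)) {
    t <- 0
    # MBG only: the customer may already drop out right after the first purchase
    alive <- !(modified && runif(1) < p[i])
    while (alive) {
      t <- t + rgamma(1, shape = k, rate = lambda[i])   # Erlang-k waiting time
      if (t > T.cal) break
      x[i] <- x[i] + 1L
      t.x[i] <- t
      alive <- runif(1) >= p[i]                  # dropout opportunity at each repurchase
    }
  }
  data.frame(x = x, t.x = t.x, T.cal = T.cal)
}

set.seed(3)
sim_mbg <- simulate_bgcnbd(n = 500, k = 2, r = 0.5, alpha = 6, a = 0.9, b = 1.6, T.cal = 52)
head(sim_mbg)
```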


Closed-Form Expressions

• Likelihood → 100-1000x faster parameter estimation via MLE than via MCMC

• P(X(t)=x | r, α, a, b) → approximate Unconditional Expectation

• P(alive | r, α, a, b, x, tx, T) → key component for Conditional Expectation

• Conditional Expected Transactions → “pretty good” approximation possible


Erlang-k = Poisson with every kth event counted
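This equivalence is easy to check by simulation. The sketch below uses made-up values for k and λ: it generates a Poisson process with rate λ, keeps every k-th event, and compares the resulting waiting times with the Erlang-k moments (mean k/λ, coefficient of variation 1/sqrt(k)).

```r
set.seed(7)
k      <- 3
lambda <- 0.5
events <- cumsum(rexp(30000, rate = lambda))        # event times of a Poisson process
kept   <- events[seq(k, length(events), by = k)]    # count only every k-th event
itt    <- diff(c(0, kept))                          # waiting times between counted events

c(mean = mean(itt), cv = sd(itt) / mean(itt))       # simulated moments
c(mean = k / lambda, cv = 1 / sqrt(k))              # Erlang-k theory
```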


Simulation Study: Design

324 scenarios covering a wide range of parameter settings, 5 repeats each (similar to the simulation design of the BG/NBD paper)

• N = 4000, T.cal = 52, T.star = {4, 16, 52}
• r = {0.25, 0.50, 0.75}, α = {5, 10, 15}
• s = {0.50, 0.75, 1.00}, β = {2.5, 5, 10}
• k = {1, 2, 3, 4}
=> Total of 1’300’000 simulated customers

Compare the individual-level forecast accuracy of Pareto/GGG vs. Pareto/NBD in terms of mean absolute error (MAE). Study the relative improvement in MAE.


Simulation Study: Example

Simulation Study: Results

• bigger lift for bigger regularity
• even for mildly regular patterns we see lift


Empirical Findings: Results

Findings

1. MBG/NBD is either on par with or better than BG/NBD

2. MBG/CNBD-k sees a lift in forecast accuracy if regularity is present

3. MBG/CNBD-k comes close to P/GGG

Yet to come: Study lift by retail category


BTYDplus

• https://github.com/mplatzer/BTYDplus
• GPL-3 license
• Implementations of
  • MBG/NBD – Batislam et al. (2007)
  • GammaGompertz/NBD – Bemmaor & Glady (2012)
  • (M)BG/CNBD-k – Platzer and Reutterer (forthcoming)
  • Pareto/NBD (MCMC) – Ma and Liu (2007)
  • Pareto/NBD variant (MCMC) – Abe (2009)
  • Pareto/GGG (MCMC) – Platzer and Reutterer (forthcoming)
• Fully tested and documented, incl. demos
• Vignette forthcoming

…

Users


BTYDplus demo

Transform event-log to summary stats (optionally one can split data into calibration and holdout)

> elog
       cust       date
   1:     4 1997-01-18
   2:     4 1997-08-02
   3:     4 1997-12-12
   4:    18 1997-01-04
   5:    21 1997-01-01
  ---
6914: 23556 1997-07-26
6915: 23556 1997-09-27
6916: 23556 1998-01-03
6917: 23556 1998-06-07
6918: 23569 1997-03-25

> (cbs <- elog2cbs(elog, per="week", T.cal=as.Date("1997-09-30"), T.tot=as.Date("1998-06-30")))
       cust x       t.x      litt    T.cal T.star x.star
   1:     4 1 28.000000 3.3322045 36.42857     39      1
   2:    18 0  0.000000 0.0000000 38.42857     39      0
   3:    21 1  1.714286 0.5389965 38.85714     39      0
   4:    50 0  0.000000 0.0000000 38.85714     39      0
   5:    60 0  0.000000 0.0000000 34.42857     39      0
  ---
2353: 23537 0  0.000000 0.0000000 27.00000     39      2
2354: 23551 5 24.285714 5.5243721 27.00000     39      0
2355: 23554 0  0.000000 0.0000000 27.00000     39      1
2356: 23556 4 26.571429 6.3127713 27.00000     39      2
2357: 23569 0  0.000000 0.0000000 27.00000     39      0

Columns:
• cust = customer ID
• calibration summary stats: x = Frequency, t.x = Recency, litt = Sum Over Logarithmic Intertransaction Times, T.cal
• holdout summary stats: T.star, x.star
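To show what the summary statistics mean, the following base-R sketch recomputes the row of customer 4 by hand from her three transaction dates. The helper cbs_one is a hypothetical name for illustration, and the real elog2cbs may treat edge cases (such as several transactions in one period) differently.

```r
# Event log of customer 4, as printed above
elog4 <- data.frame(cust = c(4, 4, 4),
                    date = as.Date(c("1997-01-18", "1997-08-02", "1997-12-12")))
T.cal <- as.Date("1997-09-30")

# Hypothetical helper: summary stats for one customer, time measured in weeks
cbs_one <- function(dates, T.cal) {
  dates <- sort(dates)
  cal   <- dates[dates <= T.cal]                                  # calibration period
  weeks <- as.numeric(difftime(cal, cal[1], units = "days")) / 7  # weeks since first purchase
  itt   <- diff(weeks)                                            # inter-transaction times
  data.frame(x      = length(cal) - 1,                            # repeat purchases
             t.x    = max(weeks),                                 # time of last purchase
             litt   = sum(log(itt)),                              # sum of log itt
             T.cal  = as.numeric(difftime(T.cal, cal[1], units = "days")) / 7,
             x.star = sum(dates > T.cal))                         # holdout purchases
}

row4 <- cbs_one(elog4$date, T.cal)
row4   # x = 1, t.x = 28, litt = log(28) = 3.3322045, T.cal = 36.42857, x.star = 1
```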


BTYDplus demo: MBG/CNBD-k

> params <- mbgcnbd.EstimateParameters(cbs)
> round(params, 2)
    k     r alpha     a     b
 1.00  0.52  6.17  0.89  1.62

> cbs$xstar_est <- mbgcnbd.ConditionalExpectedTransactions(params, cbs$T.star, cbs$x, cbs$t.x, cbs$T.cal)
> cbs$palive_est <- mbgcnbd.PAlive(params, cbs$x, cbs$t.x, cbs$T.cal)
> cbs
       cust x       t.x      litt    T.cal T.star x.star palive_est xstar_est
   1:     4 1 28.000000 3.3322045 36.42857     39      1  0.6771113 0.7838636
   2:    18 0  0.000000 0.0000000 38.42857     39      0  0.3919457 0.1558104
   3:    21 1  1.714286 0.5389965 38.85714     39      0  0.1711458 0.1890291
   4:    50 0  0.000000 0.0000000 38.85714     39      0  0.3907532 0.1540336
   5:    60 0  0.000000 0.0000000 34.42857     39      0  0.4037292 0.1742668
  ---
2353: 23537 0  0.000000 0.0000000 27.00000     39      2  0.4294331 0.2206554
2354: 23551 5 24.285714 5.5243721 27.00000     39      0  0.8222069 3.9501015
2355: 23554 0  0.000000 0.0000000 27.00000     39      1  0.4294331 0.2206554
2356: 23556 4 26.571429 6.3127713 27.00000     39      2  0.8557381 3.4019351
2357: 23569 0  0.000000 0.0000000 27.00000     39      0  0.4294331 0.2206554

Columns: palive_est = P(alive), xstar_est = E(X(T+T.star))


BTYDplus demo: Pareto/GGG

> params_draws <- pggg.mcmc.DrawParameters(cbs)
> round(summary(params_draws$level_2)$quantiles[, "50%"], 2)
    t gamma     r alpha     s  beta
45.31 43.36  0.55 10.74  0.66 12.51

> est_draws <- mcmc.DrawFutureTransactions(cbs, params_draws, cbs$T.star)
> cbs$palive_est <- sapply(params_draws$level_1, function(draws) mean(as.matrix(draws)[, 'z']))
> cbs$xstar_est <- apply(est_draws, 2, mean)
> cbs
       cust x       t.x      litt    T.cal T.star x.star palive_est xstar_est
   1:     4 1 28.000000 3.3322045 36.42857     39      1       0.92      0.77
   2:    18 0  0.000000 0.0000000 38.42857     39      0       0.26      0.08
   3:    21 1  1.714286 0.5389965 38.85714     39      0       0.17      0.11
   4:    50 0  0.000000 0.0000000 38.85714     39      0       0.33      0.05
   5:    60 0  0.000000 0.0000000 34.42857     39      0       0.34      0.27
  ---
2353: 23537 0  0.000000 0.0000000 27.00000     39      2       0.38      0.15
2354: 23551 5 24.285714 5.5243721 27.00000     39      0       0.95      4.55
2355: 23554 0  0.000000 0.0000000 27.00000     39      1       0.36      0.17
2356: 23556 4 26.571429 6.3127713 27.00000     39      2       1.00      3.41
2357: 23569 0  0.000000 0.0000000 27.00000     39      0       0.51      0.31

Columns: palive_est = P(alive), xstar_est = E(X(T+T.star))

[email protected]@wu.ac.at

Try BTYDplus !!!

Appendix

• C Measure by Zhang, Bradlow, Small 2015

• MCMC Sampling Scheme

ZBS: Clumpiness Measure C
a metric-based approach

Predicting Customer Value Using Clumpiness: From RFM to RFMC
Zhang, Bradlow, Small

• Introduce metric C, which captures the “non-randomness” in timing patterns
• Straightforward calculation at individual level
• Useful for descriptive analysis and segmentation

Main Empirical Findings
• Capturing timing patterns adds predictive power
• When controlling for R and F, clumpy customers tend to be more active in the future

Both findings are supported and can be explained by our model-based approach.

Shortcomings
• Requires many transactions at individual level
• Metric C will be skewed when dealing with different acquisition dates and churn settings

Both are appropriately handled by our model-based approach.

→ sparse individual-level data mandates a model-based approach
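To illustrate why a metric such as C is simple to compute per customer, here is a sketch of an entropy-based clumpiness measure. The exact formula is an assumption on our side (an entropy form over the n + 1 scaled gaps between events in N periods) and should be checked against Zhang, Bradlow and Small (2015); the function name clumpiness is likewise illustrative. It assumes distinct event times strictly between 0 and N + 1.

```r
# Sketch of an entropy-based clumpiness measure (assumed form, see lead-in).
# event_times: distinct periods with an event, each in 1..N
clumpiness <- function(event_times, N) {
  n    <- length(event_times)
  gaps <- diff(c(0, sort(event_times), N + 1)) / (N + 1)   # n + 1 scaled gaps, summing to 1
  1 + sum(gaps * log(gaps)) / log(n + 1)                   # 0 for evenly spaced events
}

c_even   <- clumpiness(c(5, 10, 15), N = 19)   # evenly spaced events
c_clumpy <- clumpiness(c(1, 2, 3), N = 19)     # all events bunched at the start
c(even = c_even, clumpy = c_clumpy)
```

With only a handful of events per customer the gaps are few, which is the sparse-data limitation noted in the shortcomings above.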

Parameter Estimation via MCMC
Component-wise Slice Sampling within Gibbs with Data Augmentation
