Lessons for External Validity from Large Scale Experimentation
Susan Athey, Stanford

Two-day course on machine learning and causal inference, with videos and scripts: https://www.aeaweb.org/conference/cont-ed/2018-webcasts
Survey paper: https://www.nber.org/chapters/c14009.pdf
Links to papers: https://athey.people.stanford.edu/research



Overview

Experimentation at tech firms is ubiquitous and has become a large research field

Some problems it solves easily
◦ Variations
◦ Bandits

Some problems it solves with more complex designs
◦ Interference
◦ Staggered adoption
◦ Multiple randomization designs

Some problems require modeling or offline simulation on top of or instead of experimentation

Validity
◦ Can validate with ongoing experimentation
◦ Lots of small experiments

Heterogeneous treatment effects: one angle
◦ Estimate them
◦ Look for systematic differences in τ(x) across settings

Analytics
• Previous observational data
• Previous experiments

Innovation
• Algorithmic development, e.g. personalization
• Pilot experiment

Experimental Design
• Develop KPIs and validate externally
• Formulate hypotheses
• Pre-analysis planning
• Advanced experimentation (e.g. adaptive)

Analyze and Improve
• Generalizable insights
• Tactical insights
• New innovation plan

Tech Firm Experimentation

Experimentation Research at Tech Firms

My Research on Design and Analysis of Experiments

Surrogates (Athey, Chetty, Imbens, Kang, 2016, update coming shortly)

Heterogeneous treatment effects (Athey & Imbens, PNAS, 2016; Wager & Athey, JASA, 2018; Athey, Tibshirani, and Wager, AOS, 2019; Friedberg, Athey, Tibshirani, and Wager, 2018)

Offline policy estimation (Athey and Wager, 2017; Zhou, Athey, and Wager 2018)

Improving estimation model used in contextual bandit algorithms (Dimakopoulou, Zhou, Athey and Imbens, AAAI, 2018)

Designing experiments with staggered rollouts (Xiong, Athey, Bayati, Imbens 2019)

Testing hypotheses using adaptively collected data (Hadad, Hirshberg, Zhan, Wager, Athey, 2019)

Survey: Athey & Imbens, The Econometrics of Randomized Experiments (Handbook of Experimental Economics)

A/B Testing: Typical Applications

Performance of ad copy

Factorial experiments for email campaigns

Compare two ranking algorithms

Background color for website

Change the signup process

A/B Testing: Challenges

Long-term effects
• Surrogates, structural models

Many arms
• Bandits, factorial experiments

Interference (networks)
• Clustered randomization and design

Marketplace experimentation
• Staggered rollout designs, clustered randomization, structural models

Equilibrium adjustment
• Structural models, offline simulators + experiments

Adjusting for many experiments
• Experiment splitting, empirical Bayes, other recent work

Validity Challenges and Tactical Solutions

Impacts are specific to the state of the system/market at time of experiment
• Continue to experiment
• Long-term holdouts (e.g. no-ads group)

Impacts are specific to the platform/company
• Likely true for many/most
• Yet, many common themes
  ◦ Reputation systems
  ◦ Supplier incentives
  ◦ Consumer marketing and promotions
  ◦ Add-on fees/delivery charges/taxes

Active Learning

System interacts with its environment, taking actions or assigning treatments

Bandits:
◦ Balance exploration (learning) and exploitation (getting the best outcome for each subject)
◦ Heuristics such as Thompson Sampling
◦ Assign treatment in proportion to probability it is optimal
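The Thompson Sampling heuristic can be sketched for the simplest case of binary outcomes. This is a minimal illustration, assuming Bernoulli rewards and independent Beta(1, 1) priors; the function name `thompson_sampling` and the reward rates are hypothetical and not taken from the slides.

```python
import numpy as np

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the number of pulls per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    successes = np.ones(k)   # Beta(1, 1) prior for every arm (assumption)
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one value per arm from its posterior and pick the largest draw.
        # Each arm is therefore chosen with the posterior probability that it is optimal.
        draws = rng.beta(successes, failures)
        arm = int(np.argmax(draws))
        reward = int(rng.random() < true_rates[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical conversion rates for three treatment arms.
pulls = thompson_sampling([0.05, 0.10, 0.20], n_rounds=5000)
```

As evidence accumulates, most subjects are assigned to the arm that looks best, while weaker arms keep receiving occasional exploratory assignments.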

Testing hypotheses with adaptively collected data

[Figure: panels labeled Simple Mean, IPW Estimator, and Weighted IPW]

Hadad, Hirshberg, Zhan, Wager, Athey (2019)
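To make two of the panel labels concrete, the sketch below computes the simple mean and the standard inverse-probability-weighted (IPW) estimate of one arm's mean outcome from logged assignment probabilities. The weighted IPW estimator of the cited paper is not implemented here; the function name, argument names, and the toy data are assumptions for illustration.

```python
import numpy as np

def arm_value_estimates(arms, rewards, probs, arm):
    """Simple-mean and IPW estimates of the mean reward of `arm`.

    arms[t]    : arm assigned in period t
    rewards[t] : outcome observed in period t
    probs[t]   : probability with which the assigned arm was chosen in period t
    """
    arms = np.asarray(arms)
    rewards = np.asarray(rewards, dtype=float)
    probs = np.asarray(probs, dtype=float)
    chosen = arms == arm
    simple_mean = rewards[chosen].mean()
    # IPW: weight each observed reward by 1 / (assignment probability) and
    # average over all T periods; periods assigned to other arms contribute 0.
    ipw = np.sum(chosen * rewards / probs) / len(arms)
    return simple_mean, ipw

# Toy log in which assignment probabilities drift toward arm 1 over time.
arms = [0, 1, 1, 0, 1, 1]
rewards = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
probs = [0.5, 0.5, 0.5, 0.25, 0.75, 0.75]
simple_mean, ipw = arm_value_estimates(arms, rewards, probs, arm=1)
```

The IPW weights grow as an arm's assignment probability shrinks, which is the source of the variance problem under adaptive assignment.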

Active Learning

System interacts with its environment, taking actions or assigning treatments

Outcomes for different arms depend on contexts

Contextual bandits:
◦ Learn a targeted treatment assignment policy mapping from individual characteristics to treatments, $\pi: \mathbb{X} \to \mathbb{W}$
◦ Consider batches of subjects
◦ After each batch, estimate a model mapping characteristics to (counterfactual) outcomes for each treatment
◦ Then apply bandit heuristics

Doubly robust contextual bandit learns the optimal treatment assignment policy

Estimation along the path is plagued by adaptivity of the assignment process; weighting creates variance as assignment probabilities converge

Heterogeneous Treatment Effect Estimation

Estimate CATE $\tau(x)$
• If there is sufficient data to estimate this well, this addresses the problem that other environments have different populations

Estimate CATE $\tau(x, s)$
• s is the “state,” e.g. time, place
• If the CATE varies with s, or if the ATE differs across s after adjusting for covariates, there is concern about validity with unseen s

ML and Econometrics: Causal inference vs. Supervised ML

Supervised learning:
◦ Can evaluate in test set in model-free way
◦ MSE: $\sum_i (Y_i - \hat{\mu}(X_i))^2$

Causal inference:
◦ Objective: unbiased/consistent parameter estimation
◦ Parameters of interest not observed in test set
◦ Can estimate objective (MSE of parameter), but requires maintained assumptions, often not model-free
◦ Infeasible MSE: $\sum_i (\theta_i - \hat{\theta}(X_i))^2$
◦ Tune for counterfactuals: distinct from tuning for fit; different counterfactuals also select different models
◦ Theoretical assumptions, domain knowledge
◦ Sampling variation matters even in large data sets
◦ Statistical theory and inference play important roles

Causal Inference Approaches (“Program evaluation”, “Treatment effect estimation”)

Goal: estimate the causal impact of interventions or treatment assignment policies
◦ Low dimensional intervention
◦ Desire confidence intervals

Estimands
◦ Average effect
◦ Heterogeneous effects
◦ Optimal policy

Designs that enable identification and estimation of these effects
◦ Randomized experiments
◦ Unconfoundedness
◦ “Natural” experiments (IV)
◦ Regression discontinuity
◦ Difference-in-difference
◦ Longitudinal data
◦ Randomized and natural experiments in social network/settings with interference

For each Estimand × Design: new ML-based method, theory, confidence intervals

My own work on ML/Causal Inference

Pitfalls of Pure Prediction
• “Beyond Prediction: Using Big Data for Policy Problems,” Science, 2017
• “The Impact of Machine Learning on Economics,” The Economics of Artificial Intelligence

Stable/robust prediction and estimation
• “Stable Prediction across Unknown Environments” (with Kun Kuang, Ruoxuan Xiong, Peng Cui, Bo Li), Knowledge Discovery & Data Mining, 2018
• “Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges” (with Guido Imbens, Thai Pham, and Stefan Wager), American Economic Review, May 2017
• “A Measure of Robustness to Misspecification” (with Guido Imbens), American Economic Review, May 2015, 105 (5), 476-480

Surrogates
• “Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index” (with Raj Chetty, Guido Imbens, Hyunseung Kang), 2016

Combining ML and Structural Models of Consumer Behavior
• “Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data” (with David Blei, Robert Donnelly, Francisco Ruiz, and Tobias Schmidt), American Economic Review Papers and Proceedings, May 2018
• “SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements” (with Francisco Ruiz and David Blei), 2017
• “Counterfactual Inference for Consumer Choice Across Many Product Categories” (with David Blei, Rob Donnelly, Francisco Ruiz)

Generative Adversarial Networks
• “Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations” (with Guido Imbens, Jonas Metzger, Evan Munro)

Causal Panel Data Models
• Athey, Bayati, Doudchenko, Khosravi, Imbens: “Matrix Completion Methods for Causal Panel Data Models,” 2018
• Arkhangelsky, Athey, Hirshberg, Imbens, Wager: “Synthetic Difference in Differences,” 2018
• Johannemann, Hadad, Athey, Wager: “Sufficient Representations for Categorical Variables”
• Xiong, Athey, Bayati, Imbens: “Optimal Experimental Designs for Staggered Rollouts,” 2019

Treatment Effects, Assignment Policies
• “Recursive Partitioning for Heterogeneous Causal Effects” (with Guido Imbens), PNAS, 2016
• “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests” (with Stefan Wager), Journal of the American Statistical Association, 2018
• “Generalized Random Forests” (with Julie Tibshirani and Stefan Wager), Annals of Statistics, 2019
• “Efficient Policy Learning” (with Stefan Wager), 2017
• “Offline Multi-Action Policy Learning: Generalization and Optimization” (with Zhengyuan Zhou and Stefan Wager)
• “Local Linear Forests” (with Rina Friedberg, Julie Tibshirani, and Stefan Wager), 2018

Bandits, Contextual Bandits
• “Balanced Linear Contextual Bandits” (with Maria Dimakopoulou, Zhengyuan Zhou, and Guido Imbens), Association for the Advancement of Artificial Intelligence (AAAI), 2019
• Hadad, Hirshberg, Zhan, Wager, Athey: “Confidence Intervals for Policy Evaluation in Adaptive Experiments”

The potential outcomes framework

For a set of i.i.d. subjects $i = 1, \dots, n$, we observe a tuple $(X_i, Y_i, W_i)$, comprised of

• A feature vector $X_i \in \mathbb{R}^p$,
• A response $Y_i \in \mathbb{R}$, and
• A treatment assignment $W_i \in \{0, 1\}$.

Following the potential outcomes framework (Holland, 1986; Imbens and Rubin, 2015; Rosenbaum and Rubin, 1983; Rubin, 1974), we posit the existence of quantities $Y_i^{(0)}$ and $Y_i^{(1)}$.

• These correspond to the response we would have measured given that the i-th subject received treatment ($W_i = 1$) or no treatment ($W_i = 0$).

Goal is to estimate the conditional average treatment effect

$$\tau(x) = \mathbb{E}\left[Y^{(1)} - Y^{(0)} \mid X = x\right].$$

NB: In experiments, we only get to see $Y_i = Y_i^{(W_i)}$.
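A small simulation can make the missing-data nature of the problem concrete. This is a sketch under assumed data-generating choices (the outcome model and the effect function 1 + 0.5x are invented for illustration): both potential outcomes exist in the simulation, but the analyst only sees the one selected by the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)              # feature X_i
y0 = x + rng.normal(size=n)         # potential outcome Y_i^(0)
y1 = y0 + 1.0 + 0.5 * x             # potential outcome Y_i^(1); assumed tau(x) = 1 + 0.5 x
w = rng.integers(0, 2, size=n)      # randomized assignment W_i
y = np.where(w == 1, y1, y0)        # observed outcome Y_i = Y_i^(W_i)

# Infeasible in practice: requires both potential outcomes for every subject.
true_ate = np.mean(y1 - y0)
# Feasible under randomization: difference in observed means.
diff_in_means = y[w == 1].mean() - y[w == 0].mean()
```

Under random assignment the difference in means recovers the average effect, even though no individual effect is ever observed.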

If we make no further assumptions, estimating $\tau(x)$ is not possible.

• Literature often assumes unconfoundedness (Rosenbaum and Rubin, 1983):

$$\{Y_i^{(0)}, Y_i^{(1)}\} \perp\!\!\!\perp W_i \mid X_i.$$

• When this assumption holds, methods based on matching or propensity score estimation are usually consistent.
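The role of unconfoundedness can be illustrated with a simulation in which assignment depends on the observed feature only. This sketch assumes a logistic propensity and a constant treatment effect of 1, and it uses the true propensity rather than an estimate; in practice the propensity would have to be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))                 # propensity e(x) = P(W = 1 | X = x)
w = (rng.random(n) < e).astype(int)          # assignment depends on x only
y = 2.0 * x + 1.0 * w + rng.normal(size=n)   # assumed model: effect is 1 for everyone

# Naive comparison is confounded: treated units have systematically larger x.
naive = y[w == 1].mean() - y[w == 0].mean()
# Inverse propensity weighting removes the imbalance in x.
ipw = np.mean(w * y / e - (1 - w) * y / (1 - e))
```

Here the naive contrast is far from 1, while the propensity-weighted contrast is close to it.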

Causal Trees

Divide population into subgroups to minimize MSE in treatment effects

• Goal: report heterogeneity without pre-analysis plan but with valid confidence intervals
• Moving the goalposts: method defines estimand (treatment effects for subgroups) and generates estimates
• Solve over-fitting problem with sample splitting: choose subgroups in half the sample and estimate on other half

Challenges

• Objective function is infeasible: $\sum_i (\tau_i - \hat{\tau}(X_i))^2$
• Need to estimate objective to optimize for it rather than take a simple average of squared error $\sum_i (Y_i - \hat{\mu}(X_i))^2$
• Estimand is unstable

Notation for Partitions and Leaf Effect Estimates

Three samples: model selection/tree construction sample $S^{tr}$, estimation sample for leaf effects $S^{est}$, and a (hypothetical) test sample $S^{te}$.

Given a partition $\Pi$, $\hat{\tau}(X_i; S^{est}, \Pi)$ is the sample average treatment effect in sample $S^{est}$ for the leaf $\ell(X_i; \Pi)$ associated with covariates $X_i$:

$$\hat{\tau}(X_i; S^{est}, \Pi) = \frac{1}{\sum_{j \in S^{est} \cap \ell(X_i; \Pi)} W_j} \sum_{j \in S^{est} \cap \ell(X_i; \Pi)} W_j Y_j - \frac{1}{\sum_{j \in S^{est} \cap \ell(X_i; \Pi)} (1 - W_j)} \sum_{j \in S^{est} \cap \ell(X_i; \Pi)} (1 - W_j) Y_j$$
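The leaf effect estimate is a difference in means computed within one leaf. The sketch below applies it to a hypothetical one-split partition on a scalar covariate; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def leaf_effect(y, w):
    """Mean outcome of treated units minus mean outcome of control units in a leaf."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    treated_mean = np.sum(w * y) / np.sum(w)
    control_mean = np.sum((1 - w) * y) / np.sum(1 - w)
    return treated_mean - control_mean

def leaf_effects(x, y, w, threshold):
    """Effects in the two leaves of the partition {x <= threshold, x > threshold}."""
    x, y, w = map(np.asarray, (x, y, w))
    left = x <= threshold
    return leaf_effect(y[left], w[left]), leaf_effect(y[~left], w[~left])

# Toy estimation sample with four units per leaf.
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
w = [1, 0, 1, 0, 1, 0, 1, 0]
y = [2.0, 1.0, 4.0, 1.0, 5.0, 2.0, 7.0, 2.0]
tau_left, tau_right = leaf_effects(x, y, w, threshold=0.5)
```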

Estimating the MSE Criterion

Criterion for evaluating a partition $\Pi$ anticipating re-estimating leaf effects using sample splitting:

$$\mathrm{MSE}(S^{est}, S^{te}) = \sum_{i \in S^{te}} \left(\tau_i - \hat{\tau}(X_i; S^{est}, \Pi)\right)^2 = \sum_{i \in S^{te}} \left(\tau_i^2 - 2 \cdot \tau_i \cdot \hat{\tau}(X_i; S^{est}, \Pi) + \hat{\tau}^2(X_i; S^{est}, \Pi)\right)$$

$$\mathrm{EMSE} = \mathbb{E}_{S^{est}, S^{te}}\left[\mathrm{MSE}(S^{est}, S^{te})\right] = \mathbb{V}_{S^{est}, X_i}\left[\hat{\tau}(X_i; S^{est}, \Pi)\right] - \mathbb{E}_{X_i}\left[\tau^2(X_i; \Pi)\right] + \mathbb{E}\left[\tau_i^2\right]$$

The last equality makes use of the fact that estimates are unbiased in an independent test sample. We can construct empirical estimates of each of these quantities except for the last, which does not depend on $\Pi$ and thus does not affect partition selection.

Causal Tree Algorithm

• Divide data into tree-building $S^{tr}$ and estimation $S^{est}$ samples
• Use a greedy algorithm to recursively partition covariate space $\mathbb{X}$ into a deep partition $\Pi$
  ◦ At each node the split is selected as the one that minimizes our estimate of EMSE over all possible binary splits
  ◦ Preserve minimum number of treated and control units in each child leaf
• Use cross-validation to select the depth $d^*$ of the partition that minimizes an estimate of MSE of treatment effects, using left-out folds as proxies for the test set
• Select partition $\Pi^*$ by pruning $\Pi$ to depth $d^*$, pruning leaves that provide the smallest improvement in goodness of fit
• Estimate the treatment effects in each leaf of $\Pi^*$ using the estimation sample $S^{est}$
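The sample-splitting logic of the steps above can be sketched with a single split on a scalar covariate. This is a deliberate simplification and not the full algorithm: the split criterion is the sum over leaves of leaf size times squared estimated effect (it omits the variance terms of the EMSE estimate), there is no cross-validation or pruning, and all names and the simulated data are assumptions.

```python
import numpy as np

def diff_in_means(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

def fit_honest_stump(x, y, w, seed=0, min_leaf=25):
    """One-split 'causal tree': choose the split on one half, estimate on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    tr, est = idx[: len(x) // 2], idx[len(x) // 2 :]

    best_score, best_cut = -np.inf, None
    for cut in np.quantile(x[tr], np.linspace(0.1, 0.9, 17)):
        left = x[tr] <= cut
        leaves = [(y[tr][m], w[tr][m]) for m in (left, ~left)]
        # Preserve a minimum number of treated and control units in each child leaf.
        if any(min(np.sum(wl == 1), np.sum(wl == 0)) < min_leaf for _, wl in leaves):
            continue
        # Simplified criterion (assumption): sum over leaves of n_leaf * tau_hat^2.
        score = sum(len(yl) * diff_in_means(yl, wl) ** 2 for yl, wl in leaves)
        if score > best_score:
            best_score, best_cut = score, cut

    left_tr, left_est = x[tr] <= best_cut, x[est] <= best_cut
    # Adaptive estimates reuse the split-selection sample; honest estimates do not.
    adaptive = (diff_in_means(y[tr][left_tr], w[tr][left_tr]),
                diff_in_means(y[tr][~left_tr], w[tr][~left_tr]))
    honest = (diff_in_means(y[est][left_est], w[est][left_est]),
              diff_in_means(y[est][~left_est], w[est][~left_est]))
    return best_cut, adaptive, honest

rng = np.random.default_rng(42)
n = 4000
x = rng.uniform(size=n)
w = rng.integers(0, 2, size=n)
y = (x > 0.5) * 2.0 * w + rng.normal(size=n)   # assumed effect: 0 below 0.5, 2 above
cut, adaptive, honest = fit_honest_stump(x, y, w)
```

The honest leaf estimates come from data that played no role in choosing the split, which is what keeps the leaf-level confidence intervals valid.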

Causal Trees: Search Demotion Example

Causal Trees: Adaptive versus Honest Estimates

Crucial to use sample splitting!

Low-Dimensional Representations v. Fully Nonparametric Estimation

Causal Trees
• Move the goalpost, but get guaranteed coverage
• Easy to interpret, easy to mis-interpret
• Can be many trees
• Leaves differ in many ways if covariates correlated; describe leaves by means in all covariates

Causal Forests
• Attempt to estimate $\tau(x)$
• Can estimate partial effects
• In high dimensions, still can have omitted variable issues
• Confidence intervals lose coverage in high dimensions (bias)

Baseline method: k-NN matching

Consider the k-NN matching estimator for $\tau(x)$:

$$\hat{\tau}(x) = \frac{1}{k} \sum_{i \in S_1(x)} Y_i - \frac{1}{k} \sum_{i \in S_0(x)} Y_i,$$

where $S_1(x)$ and $S_0(x)$ are the sets of the k nearest cases and controls to x, respectively. This is consistent given unconfoundedness and regularity conditions.

• Pro: Transparent asymptotics and good, robust performance when p is small.
• Con: Acute curse of dimensionality, even when p = 20 and n = 20k.

NB: Kernels have similar qualitative issues as k-NN.
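The k-NN matching estimator is short enough to write out directly. The sketch below uses Euclidean distance and a simulated randomized experiment with p = 2; the function name `knn_cate` and the data-generating process are assumptions for illustration.

```python
import numpy as np

def knn_cate(x0, X, y, w, k=10):
    """k-NN matching estimate of tau(x0): mean outcome of the k nearest treated
    units minus mean outcome of the k nearest control units."""
    dist = np.linalg.norm(X - x0, axis=1)
    means = []
    for arm in (1, 0):
        idx = np.flatnonzero(w == arm)
        nearest = idx[np.argsort(dist[idx])[:k]]
        means.append(y[nearest].mean())
    return means[0] - means[1]

rng = np.random.default_rng(0)
n, p = 5000, 2
X = rng.uniform(-1, 1, size=(n, p))
w = rng.integers(0, 2, size=n)
tau = 1.0 + X[:, 0]                         # assumed true CATE for the simulation
y = X[:, 1] + tau * w + 0.1 * rng.normal(size=n)
tau_hat = knn_cate(np.array([0.5, 0.0]), X, y, w, k=50)
```

With p = 2 the neighborhoods are tight and the estimate lands near the assumed value of 1.5; as p grows, the k nearest neighbors become far away, which is the curse of dimensionality noted above.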

Adaptive nearest neighbor matching

Random forests are a popular heuristic for adaptive nearest neighbors estimation introduced by Breiman (2001).

• Pro: Excellent empirical track record.
• Con: Often used as a black box, without statistical discussion.

There has been considerable interest in using forest-like methods for treatment effect estimation, but without formal theory.

• Green and Kern (2012) and Hill (2011) have considered using Bayesian forest algorithms (BART, Chipman et al., 2010).
• Several authors have also studied related tree-based methods: Athey and Imbens (2016), Su et al. (2009), Taddy et al. (2014), Wang and Rudin (2015), Zeileis et al. (2008), ...

Wager and Athey (2018) provide the first formal results allowing random forests to be used for provably valid asymptotic inference.

Making k-NN matching adaptive

Athey and Imbens (2016) introduce the causal tree: it defines neighborhoods for matching based on recursive partitioning (Breiman, Friedman, Olshen, and Stone, 1984), and they advocate sample splitting (with a modified splitting rule) to get assumption-free confidence intervals for treatment effects in each leaf.

[Figure panels: Euclidean neighborhood, for k-NN matching; tree-based neighborhood]


From trees to random forests (Breiman, 2001)

Suppose we have a training set $\{(X_i, Y_i, W_i)\}_{i=1}^n$, a test point x, and a tree predictor

$$\hat{\tau}(x) = T\left(x; \{(X_i, Y_i, W_i)\}_{i=1}^n\right).$$

Random forest idea: build and average many different trees $T^*$:

$$\hat{\tau}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b^*\left(x; \{(X_i, Y_i, W_i)\}_{i=1}^n\right).$$

We turn $T$ into $T^*$ by:

• Bagging / subsampling the training set (Breiman, 1996); this helps smooth over discontinuities (Bühlmann and Yu, 2002).
• Selecting the splitting variable at each step from m out of p randomly drawn features (Amit and Geman, 1997).
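The two randomization devices can be sketched with a toy forest whose trees are single-split stumps. This is an illustrative simplification, not the causal forest of the cited papers: each tree splits once at the median of the best of m randomly drawn features, the split score is a simplified heterogeneity criterion, and the trees are not honest. All names and the simulated data are assumptions.

```python
import numpy as np

def diff_in_means(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

def stump_predict(x0, X, y, w, features, min_arm=5):
    """T*(x0): best median split among `features`, then the effect in x0's leaf."""
    best_score, best_leaf = -np.inf, np.ones(len(y), dtype=bool)
    for j in features:
        cut = np.median(X[:, j])
        left = X[:, j] <= cut
        sides = (left, ~left)
        if any(min(np.sum(w[m] == 1), np.sum(w[m] == 0)) < min_arm for m in sides):
            continue
        score = sum(m.sum() * diff_in_means(y[m], w[m]) ** 2 for m in sides)
        if score > best_score:
            best_score = score
            best_leaf = left if x0[j] <= cut else ~left
    return diff_in_means(y[best_leaf], w[best_leaf])

def forest_predict(x0, X, y, w, n_trees=200, subsample=0.5, m=1, seed=0):
    """Average of B randomized trees, each built on a subsample with m of p features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = []
    for _ in range(n_trees):
        rows = rng.choice(n, size=int(subsample * n), replace=False)  # subsampling
        feats = rng.choice(p, size=m, replace=False)                   # m out of p features
        preds.append(stump_predict(x0, X[rows], y[rows], w[rows], feats))
    return float(np.mean(preds))

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-1, 1, size=(n, 3))
w = rng.integers(0, 2, size=n)
y = 2.0 * (X[:, 0] > 0) * w + rng.normal(size=n)   # assumed effect: 2 when the first feature is positive
tau_pos = forest_predict(np.array([0.5, 0.0, 0.0]), X, y, w, m=2)
tau_neg = forest_predict(np.array([-0.5, 0.0, 0.0]), X, y, w, m=2)
```

Averaging over subsamples and random feature draws smooths the step function that any single stump would produce.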

Statistical inference with regression forests

Honest trees do not use the same data to select the partition (splits) and make predictions. Ex: split-sample trees, propensity trees.

Theorem. (Wager and Athey, JASA, 2018) Regression forests are asymptotically Gaussian and centered,

$$\frac{\hat{\mu}_n(x) - \mu(x)}{\sigma_n(x)} \Rightarrow \mathcal{N}(0, 1), \qquad \sigma_n^2(x) \to_p 0,$$

given the following assumptions (+ technical conditions):

1. Honesty. Individual trees are honest.
2. Subsampling. Individual trees are built on random subsamples of size $s \asymp n^\beta$, where $\beta_{\min} < \beta < 1$.
3. Continuous features. The features $X_i$ have a density that is bounded away from 0 and $\infty$.
4. Lipschitz response. The conditional mean function $\mu(x) = \mathbb{E}[Y \mid X = x]$ is Lipschitz continuous.

The random forest kernel

[Figure: tree-based neighborhoods averaged into a forest kernel]

Forests induce a kernel via averaging tree-based neighborhoods. This idea was used by Meinshausen (2006) for quantile regression.

Applications in Economics and Marketing

• Hitsch and Misra (2017): Use causal forests to target catalog mailings. Causal forest detects significant heterogeneity and performs better than alternatives including LASSO and off-the-shelf random forest
• Davis and Heller (2017): Analyze heterogeneous impacts of summer jobs using causal forest
• Athey, Campbell, Chyn, Hastings, and White (2018): Use causal forest to show that re-employment services show no benefit in the ATE, but a targeted policy can have substantial benefits

Labor Market - Reemployment Services
Athey, Campbell, Chyn, Hastings, and White (2018)

• Goal: Increase job skills and employment for all Rhode Islanders (efficiently)
• Measure impact of employment service programs
• Take advantage of a field experiment run by US Department of Labor to measure the impact of employment services on UI and subsequent employment
• From 2005-2015, states were asked to randomly send letters to UI claimants requiring employment services for continued UI receipt
• Basic evaluation of 4 states finds mixed evidence on decrease in UI and earnings impacts
• In RI we find that nudge decreased weeks on UI by 1.4/21, no impact on earnings
• Measure impact using new administrative data and causal forest (Wager and Athey 2018) to understand who benefits
• Use causal forest estimates to simulate benefits of targeted letters

HTE in Rhode Island Re-employment Services Example

ML and Structural Models: Shopping Application

Joint work with Rob Donnelly, David Blei, Fran Ruiz

Combine structural model with matrix factorization techniques and computational methods from ML

Scanner data from supermarket
◦ Product hierarchy (category, class, subclass, UPC)
◦ Prices change Tuesday evening
◦ Study 123 high-frequency categories with 1263 UPCs
◦ Multiple UPCs per category
◦ Typically purchase only one UPC per trip in category
◦ Independent price changes
◦ Not too much seasonality
◦ 333,000 shopping trips for ~2000 consumers over 20 months

Economic Goals:
◦ Optimal pricing
◦ Benefits of personalization versus simpler segmentation

Methodological Goals:
◦ Contrast off-the-shelf ML, off-the-shelf econometrics with combined models
◦ Tune and test models for counterfactual performance

Structural Model

Mixed logit
• User u, product i, time t
• Mean utility $\mu_{uit}$ is specified in terms of $\nu$, $\beta X$, and $\alpha p$; utility is $U_{uit} = \mu_{uit} + \epsilon_{uit}$
• If $\epsilon_{uit}$ is i.i.d. Type I EV, then $\Pr(Y_{ut} = i) = \frac{\exp(\mu_{uit})}{\sum_j \exp(\mu_{ujt})}$
• Counterfactuals
  ◦ Out of stock
  ◦ Price changes

Matrix Factorization

• A Users × Items matrix ($U \times I$) is approximated by the product of a $U \times K$ matrix and a $K \times I$ matrix
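The logit choice probabilities and the two counterfactuals named on the slide can be sketched for a single user and shopping trip. The utility form used here, an item intercept minus a price coefficient times price, and all numeric values are assumptions for illustration; the sketch has no outside good and no random coefficients.

```python
import numpy as np

def choice_probs(mu, available=None):
    """Logit probabilities Pr(Y = i) = exp(mu_i) / sum_j exp(mu_j)."""
    mu = np.asarray(mu, dtype=float)
    if available is None:
        available = np.ones(mu.shape, dtype=bool)
    # Out-of-stock items get utility -inf, i.e. choice probability zero.
    masked = np.where(available, mu, -np.inf)
    z = np.exp(masked - masked.max())   # subtract the max for numerical stability
    return z / z.sum()

# Hypothetical mean utilities: mu_i = nu_i - alpha * p_i (assumed form and values).
nu = np.array([1.0, 0.5, 0.0])
alpha = 2.0
price = np.array([1.0, 0.8, 0.5])

base = choice_probs(nu - alpha * price)
# Counterfactual 1: item 0 is out of stock.
out_of_stock = choice_probs(nu - alpha * price, available=np.array([False, True, True]))
# Counterfactual 2: the price of item 0 drops by 0.25.
cheaper = choice_probs(nu - alpha * (price - np.array([0.25, 0.0, 0.0])))
```

In the out-of-stock scenario the remaining items keep their relative odds, which is the independence-of-irrelevant-alternatives property of the plain logit.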

Structural Model + Factorization

Mixed logit
• User u, product i, time t
• Mean utility $\mu_{uit}$ is specified in terms of $\nu$, $\kappa X$, and $\alpha p$; utility is $U_{uit} = \mu_{uit} + \epsilon_{uit}$
• If $\epsilon_{uit}$ is i.i.d. Type I EV, then $\Pr(Y_{ut} = i) = \frac{\exp(\mu_{uit})}{\sum_j \exp(\mu_{ujt})}$
• Counterfactuals
  ◦ Out of stock
  ◦ Price changes

Mixed logit + factors
• User u, product i, time t
• Mean utility $\mu_{uit}$ is specified in terms of $\beta$, $\theta$, $\kappa$, $X$, $\rho$, $\alpha$, and $p$
• Add in nesting for outside good
• Implement as two-stage estimation with inclusive value (McFadden)
• Also factorization of outside good

Model Comparisons

Nested Factorization
◦ All categories estimated in single model
◦ Items substitutes within category, independent across
◦ Tuned on held-out validation set

Hierarchical Poisson Factorization (HPF)
◦ All items in single model, each item independent of others
◦ A form of matrix factorization allowing for covariates
◦ Ignores prices
◦ Scales easily

Category by category logits
◦ Mixed logit (random coefficients)
◦ Nested logit
◦ With various controls (demographic, etc.)

Logits with HPF Factors
◦ Include user-item prediction from HPF model

Performance by Scenario (Counterfactual)

Evaluate log-likelihood only in weeks where an item falls into specified scenarios:
• Price changed for the item this week
• Price changed for another item in the same category this week
• Another item in the same category is out of stock at least one day this week

Traditional logits improve with HPF (ML-based user-item predictions)
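Scenario-specific evaluation amounts to computing the held-out log-likelihood on a flagged subset of observations. The sketch below assumes predicted choice probabilities are already available as a matrix; the function name, the flag, and the toy numbers are hypothetical.

```python
import numpy as np

def scenario_log_likelihood(pred_probs, chosen, in_scenario):
    """Mean log-likelihood of observed choices, restricted to flagged observations.

    pred_probs  : (n_obs, n_items) predicted choice probabilities
    chosen      : (n_obs,) index of the item actually chosen
    in_scenario : (n_obs,) boolean flag, e.g. 'price changed for the item this week'
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    chosen = np.asarray(chosen)
    in_scenario = np.asarray(in_scenario, dtype=bool)
    p_chosen = pred_probs[np.arange(len(chosen)), chosen]
    return float(np.mean(np.log(p_chosen[in_scenario])))

pred_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.5, 0.25, 0.25]]
chosen = [0, 1, 2, 1]
price_changed = [True, False, True, True]
ll = scenario_log_likelihood(pred_probs, chosen, price_changed)
```

Comparing this quantity across models, scenario by scenario, shows which model predicts best exactly where counterfactual questions are asked.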

Validation of Structural Parameter Estimates

Compare Tues-Wed change in price to Tues-Wed change in demand, in test set

Break out results by how price-sensitive (elastic) we have estimated consumers to be