Causal inference in practice


Page 1: Causal inference in practice

Here, there, causality is everywhere
AMIT SHARMA, MICROSOFT RESEARCH
http://www.amitsharma.in | @amt_shrma

Page 2: Causal inference in practice

My route to causality
◦ Building recommender systems in social networks
◦ Conducting user experiments
◦ Estimating impact of recommendations and social feeds

Page 3: Causal inference in practice

Causality is everywhere
Spans every branch of science.
◦ Economics
◦ Political science
◦ Study of human behavior
◦ Biology and medicine
◦ Computer science (?)

Spans centuries of thought.
◦ Aristotle: "To know, is to know the final cause."

Took us until the 1930s to come up with the randomized experiment (Fisher).

Still early days for estimating causal effects from observational data.

Page 4: Causal inference in practice

Causality in economics

David Card. The causal effect of education on earnings (1999).
Conley and Heerwig. The Long-Term Effects of Military Conscription on Mortality: Estimates From the Vietnam-Era Draft Lottery (2012).

Page 5: Causal inference in practice

Causality in political science

Darrell West. Air Wars (2013).
Chattopadhyay and Duflo. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India (2004).

Page 6: Causal inference in practice

Causality in human behavior

Thistlethwaite and Campbell. Effect of public recognition of scholastic achievement (1960)

Christakis and Fowler. The collective dynamics of smoking in a large social network (2008)

Page 7: Causal inference in practice

Causality in biology and medicine

Effect of Vitamin D deficiency on colon cancer
Effect of heart attack surgery on long-term health of the patient

Page 8: Causal inference in practice

Causality in web applications

Sharma and Cosley. Distinguishing between personal preference and homophily in online activity feeds (2016).

Sharma, Hofman and Watts. Estimating the causal impact of recommender systems (2015).

Page 9: Causal inference in practice

Counterfactual reasoning

Correlation question: How well can X predict Y?
◦ Machine learning, Statistical estimation.

Interventionist question: If X is changed to X', what will be the value of Y?
◦ Experiments, Reinforcement learning, Contextual bandits.

Counterfactual question: If X would have been X', what would be the value of Y?
◦ Today's focus.

Page 10: Causal inference in practice

Estimating causal effects from observational data

Why is causal inference hard?
◦ Simpson's paradox

The language of graphical models
◦ Backdoor criterion
◦ Frontdoor criterion

Common approaches for causal inference
◦ Conditioning
◦ Mechanism-based
◦ Natural experiments

Example: Estimating causal impact of recommender systems

Page 11: Causal inference in practice

Estimating the effectiveness of kidney stone treatment

Two treatments for kidney stones:
Treatment A: 78% effective
Treatment B: 83% effective

              Treatment A      Treatment B
Small stones  93% (81/87)      87% (234/270)
Large stones  73% (192/263)    69% (55/80)
Both          78% (273/350)    83% (289/350)

Julious and Mullee. Confounding and Simpson's Paradox (1994).
http://en.wikipedia.org/wiki/Simpson's_paradox
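Not on the slide, but to see the reversal numerically, here is a minimal sketch that recomputes the per-stratum and pooled success rates from the table above (the counts are taken directly from the slide):

```python
# Kidney stone data from the table above: (successes, total) per stratum.
data = {
    "A": {"small": (81, 87),  "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

for treatment, strata in data.items():
    per_stratum = {k: f"{rate(*v):.0%}" for k, v in strata.items()}
    pooled_s = sum(s for s, _ in strata.values())
    pooled_n = sum(n for _, n in strata.values())
    print(treatment, per_stratum, f"pooled: {rate(pooled_s, pooled_n):.0%}")

# Prints (approximately):
# A {'small': '93%', 'large': '73%'} pooled: 78%
# B {'small': '87%', 'large': '69%'} pooled: 83%
# Treatment A wins within every stratum yet loses in the pooled data, because
# A is given mostly to the harder (large-stone) cases.
```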

Page 12: Causal inference in practice

Estimating ad placement on a search engine

Suppose we would like to optimize the set of ads shown for a query, rather than optimize each ad individually.

Click probability estimates: q1 for the 1st ad, q2 for the 2nd ad.

Does q2 depend on q1?

Page 13: Causal inference in practice

Confounders in ad placement

Let us define two groups with 2000 queries each:
◦ High q1: 7.5% (149/2000) CTR on the second ad
◦ Low q1: 6.2% (124/2000) CTR on the second ad

          Low q1            High q1
Low q2    5.1% (92/1823)    4.8% (71/1500)
High q2   18.1% (32/176)    15.6% (78/500)
Both      6.2% (124/2000)   7.5% (149/2000)

Bottou et al. Counterfactual reasoning and learning systems (2013).

Page 14: Causal inference in practice

Causal graphical models: a framework for causality
Structural equation modeling (SEM)
X = q1
Y = CTR on second ad
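To make the structural-equation idea concrete, here is a minimal and entirely hypothetical pair of structural equations for this slide's variables, with a latent query-quality term driving both; the functional forms and coefficients are invented for illustration.

```python
import random

def simulate_query():
    # Hypothetical structural equations for the ad-placement example:
    u = random.random()                        # latent query quality (unobserved)
    x = 0.1 + 0.5 * u + random.gauss(0, 0.02)  # X = q1, click estimate for the 1st ad
    y = 0.05 + 0.2 * x + 0.3 * u               # Y = CTR on the 2nd ad
    return x, y

samples = [simulate_query() for _ in range(10_000)]
# Because u enters both equations, the observed association between X and Y
# overstates the direct effect of X on Y (0.2 here); isolating that effect is
# the identification problem the next slides address.
```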

Page 15: Causal inference in practice

Which variables to condition on?

Observed variables
◦ Which observed variables?
◦ As we will see, conditioning on all observed variables may not be correct.

Known unknowns:
◦ Age, past diseases, food intake

Unknown unknowns:
◦ What else could impact recovery from kidney stones?
◦ Genetic markers?

Page 16: Causal inference in practice

Which variables to condition on?

Page 17: Causal inference in practice

Connections to Bayesian networks

Markov assumption: Probability of an effect is independent of everything else given its direct causes.

Two approaches:
◦ Backdoor criterion
◦ Frontdoor criterion
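Not shown on the slide, but for reference, these are the standard adjustment formulas the two criteria license (Pearl's notation; Z is any set of variables satisfying the backdoor criterion for X → Y, and M is a mediator satisfying the frontdoor criterion):

```latex
% Backdoor adjustment:
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)

% Frontdoor adjustment:
P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```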

Page 18: Causal inference in practice

Graphical Models and common methods for causal estimation

Condition on observed covariates
• Stratification
• Matching
• Regression (?)

Mechanism-based strategies
• Path-based approaches

Natural experiments
• As-if experiments
• Instrumental variables
• Regression discontinuity

Page 19: Causal inference in practice

I. Conditioning on observed covariates

Corresponds to the backdoor criterion.

Page 20: Causal inference in practice

a) Stratification
Condition on different levels of socio-economic status.

Page 21: Causal inference in practice

b) Matching
Socio-economic status is a function of parents' income, locality and other observed indicators.

Page 22: Causal inference in practice

b) Matching
Model the propensity to attend a particular school.

Pschool = f(PI, Loc, …)
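A minimal sketch of how such a propensity model could be fit and used for matching, on synthetic data; the covariates, coefficients, and nearest-propensity matching rule are illustrative assumptions, not the speaker's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
parent_income = rng.normal(50, 15, n)              # hypothetical covariate (PI)
locality = rng.integers(0, 5, n)                   # hypothetical covariate (Loc)
# Treatment (attending the school) depends on the covariates, so it is confounded:
p_school = 1 / (1 + np.exp(-((parent_income - 50) / 10 + 0.2 * locality)))
attended = rng.random(n) < p_school
outcome = 2.0 * attended + 0.05 * parent_income + rng.normal(0, 1, n)

# 1) Model the propensity Pschool = f(PI, Loc, ...), as on the slide.
covs = np.column_stack([parent_income, locality])
propensity = LogisticRegression(max_iter=1000).fit(covs, attended).predict_proba(covs)[:, 1]

# 2) Match each treated unit to the control unit with the closest propensity.
t_idx, c_idx = np.where(attended)[0], np.where(~attended)[0]
gaps = np.abs(propensity[c_idx][None, :] - propensity[t_idx][:, None])
matched_controls = c_idx[gaps.argmin(axis=1)]
att = (outcome[t_idx] - outcome[matched_controls]).mean()
print(f"Estimated effect on the treated: {att:.2f} (true effect in this simulation: 2.0)")
```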

Page 23: Causal inference in practice

c) Regression
Condition on observed covariates by adding them as independent variables in a regression.

Works only if the true causal relationship between variables is linear.
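A small synthetic illustration of regression adjustment (coefficients and noise model are made up; note the simulation deliberately satisfies the linearity assumption the slide warns about):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)                         # observed confounder
x = 0.8 * z + rng.normal(size=n)               # treatment influenced by z
y = 1.5 * x + 2.0 * z + rng.normal(size=n)     # true effect of x on y is 1.5

# Regressing y on x alone is biased by the backdoor path x <- z -> y.
naive = np.polyfit(x, y, 1)[0]

# Adding z as an independent variable removes the bias, because here the
# true relationships really are linear.
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"naive slope: {naive:.2f}, adjusted slope: {coef[1]:.2f} (truth: 1.5)")
```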

Page 24: Causal inference in practice

II. Mechanism-based strategies

Corresponds to the frontdoor criterion.

Page 25: Causal inference in practice

III. Natural experiments
Look for experiments happening in the real world.

Promise greater generalizability than controlled lab experiments.

Require greater care to ensure validity of causal identification.

Page 26: Causal inference in practice

a. (As-if) random experiments

Page 27: Causal inference in practice

b) Regression discontinuity

Page 28: Causal inference in practice

c) Instrumental variables

Shock! Increase in traffic
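A minimal two-stage least squares sketch of the instrumental-variable idea that a traffic shock embodies, on synthetic data (the shock Z shifts traffic X but affects the outcome Y only through X; all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
u = rng.normal(size=n)                       # unobserved confounder (e.g. demand)
z = rng.binomial(1, 0.5, n)                  # instrument: an exogenous traffic shock
x = 1.0 * z + 0.9 * u + rng.normal(size=n)   # traffic to the focal product
y = 0.3 * x + 0.9 * u + rng.normal(size=n)   # outcome; true effect of x is 0.3

naive = np.polyfit(x, y, 1)[0]               # biased upward by u

# Two-stage least squares: regress x on z, then y on the fitted values of x.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
iv = np.polyfit(x_hat, y, 1)[0]
print(f"naive: {naive:.2f}, IV estimate: {iv:.2f} (truth: 0.3)")
```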

Page 29: Causal inference in practice

Summary: Two graphical criteria explain all of the conventional approaches

A principled, succinct framework for causality.

Allows arbitrary functional forms for relationships between variables.

Leads to clear statements about causal assumptions.

If a causal effect can be identified, it can be derived using do-calculus (helpful for bigger graphs).

Page 30: Causal inference in practice

Product recommendations on Amazon

Do recommendations expose people to new products?

Do recommendations lead to more purchases?

Page 31: Causal inference in practice

Counterfactual reasoning

What would have happened if there were no recommendations?

Page 32: Causal inference in practice

X = Activity on the current item that the user is viewing

Y = Activity on the recommended item

UX = Latent properties of X

UY = Latent properties of Y

Why is estimating effects of recommendations difficult using observational data?
If latent properties for X and Y are correlated, then observed changes in AY cannot be directly attributed to AX.

[Diagram: causal graph over the observed activities AX, AY and the latent properties UX, UY]

A causal graphical model for the impact of recommendations (ref. Pearl 09)

Page 33: Causal inference in practice

AX = Visits on a product X on Amazon

AY = Recommendation click-throughs from X to Y

UX = Consumer demand for X

UY = Consumer demand for Y

If latent properties for X and Y are correlated, then observed changes in AY cannot be directly attributed to AX.

[Diagram: causal graph over the observed quantities AX, AY and the latent demands UX, UY]

A causal graphical model for the impact of recommendations

Page 34: Causal inference in practice

Example: Looking for a machine learning book
Observed clickthrough data due to recommendations do not tell the full story.

For example, let’s assume I just completed the Artificial Intelligence book by Russell and Norvig and now I want to learn more about machine learning.

Page 35: Causal inference in practice
Page 36: Causal inference in practice

Xi: Focal Product

Yj: Recommended Products

Page 37: Causal inference in practice

Xi: Focal Product
Yj: Recommended Products

Types of links from Xi to Yj:
◦ Causal Link
◦ Convenience Link
◦ Revisit Link
◦ Wasted Link

There could also be irrelevant links.

Page 38: Causal inference in practice

The Shock strategy (I.V.)

If direct visits to product Yj are nearly constant, then we can assume that the convenience clicks to Yj will be nearly constant.

Thus, any change in recommendation clickthroughs from Xi to Yj during a shock to Xi can be attributed to the causal effect of the recommendation.

Page 39: Causal inference in practice

The Shock strategy
We cannot say much during normal traffic for a product. But if a product experiences a spike in visits and its recommended product does not, then we can compute the causal clickthrough rate.
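One plausible reading of this strategy as arithmetic; this is my sketch under the stated assumption (convenience clicks to Yj unchanged during the shock), not the exact estimator from the paper, and all counts are invented.

```python
# Hypothetical daily counts around a shock to the focal product Xi.
visits_before, visits_during = 1_000, 9_000      # direct visits to Xi
rec_clicks_before, rec_clicks_during = 40, 360   # recommendation clickthroughs Xi -> Yj

# If convenience clicks to Yj are unchanged, the extra clickthroughs are
# attributed to the extra (shock) visitors to Xi:
causal_ctr = (rec_clicks_during - rec_clicks_before) / (visits_during - visits_before)
print(f"causal clickthrough rate ~ {causal_ctr:.1%}")  # 4.0% in this made-up example
```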

Page 40: Causal inference in practice

Data description
Dataset: Anonymized Amazon URL log data from the Bing toolbar for opted-in users. Nine months (Sept. 1, 2013 to May 31, 2014).

URL structure allows us to determine:
◦ Type of page visited (product, search, cart, bestsellers, wishlist)
◦ Type of referral to a product (recommendation, search, none, others)

After filtering out bots, sellers, authors, publishers and unpopular products (<5 visits):

◦ Number of products = 1.38M
◦ Number of users = 2.1M
◦ 60 product categories (such as Books, Toys, Electronics)

Page 41: Causal inference in practice

Implementing the strategy: The shock criteria
Large: Visits during a shock must exceed 5 times the median traffic for the product

Sudden: Visits during a shock must be 5 times the last day’s traffic and 5 times the last week’s traffic

Sane: Visits from at least 10 unique users and on 5 different days before and after a shock

4776 shocks to 4126 products
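The three criteria above can be sketched as a simple filter over a product's daily visit series; this is a plain-Python approximation with assumed bookkeeping (in particular, "last week's traffic" is read here as the visits one week earlier, and the median is taken over the days before the candidate shock).

```python
from statistics import median

def is_shock(daily_visits, day, unique_users_on_day, active_days_before, active_days_after):
    """daily_visits: per-day visit counts for one product; day: candidate index (assumed >= 7)."""
    today = daily_visits[day]
    large = today >= 5 * median(daily_visits[:day])            # Large
    sudden = (today >= 5 * daily_visits[day - 1] and           # Sudden
              today >= 5 * daily_visits[day - 7])
    sane = (unique_users_on_day >= 10 and                      # Sane
            active_days_before >= 5 and active_days_after >= 5)
    return large and sudden and sane
```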

Page 42: Causal inference in practice

Implementing the strategy: The shock criteria
Additionally, we want direct visits to Yj to be constant: the maximum change in direct visits to Yj should not be bigger than the size of the shock.
When beta=0, ideally causal; when beta=1, all bets are off.

[Figure: an example of a good shock and of a bad shock (filtered out at beta=0.7)]
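A possible reading of this filter in code; it assumes beta is the ratio between the change in direct visits to Yj and the size of the shock to Xi, which is my interpretation rather than the paper's exact definition (the 0.7 cutoff is from the slide).

```python
def passes_beta_filter(shock_size_x, max_change_direct_visits_y, beta=0.7):
    # Keep the shock only if direct traffic to Yj moved by less than beta times
    # the shock in traffic to Xi (beta -> 0: ideally causal; beta -> 1: all bets are off).
    return max_change_direct_visits_y <= beta * shock_size_x
```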

Page 43: Causal inference in practice

Results: Fraction of causal clickthroughs by category
Majority of the clickthroughs are due to convenience.

Within any category, 5% or lower is a more accurate estimate of clickthroughs caused by recommendations.

Page 44: Causal inference in practice

Robustness checks
Shocks may not be representative.
◦ The distribution of users, popularity, and the affinity between users and products shows little difference (except that shocked products are, on average, more popular).

Shocks may be caused by deals which make the focal product more attractive.
◦ Verified using referrals from log data (e.g. bookbub.com) and manual inspection of past prices (from camelcamelcamel.com).

Shocks may be a property of the weird holiday season.
◦ They occur throughout the data, although more frequently during the holidays.

Page 45: Causal inference in practice

Graphical models form a succinct, sound and complete framework for reasoning about causality. They can also be practical.

THANK YOU!
AMIT SHARMA, MICROSOFT RESEARCH
http://www.amitsharma.in | @amt_shrma