TRANSCRIPT
Causal Bayesian Inference 1
Introduction to Causal Bayesian Inference
Chris Holmes, University of Oxford
Causal Bayesian Inference 2
Objectives of Course
◦ To introduce concepts and methods for Causal Inference from experimental and
observational studies
◦ To demonstrate applied Bayesian methods for causal data analysis
Causal Bayesian Inference 3
Overview of Course
◦ Introduction to causal inference and statistical association
◦ The fundamental problem of causal inference: potential outcomes and
counterfactuals
◦ Experimental designs where causal inference is straightforward
◦ The Rubin Causal Model – assignment mechanisms and models of “the
science”
◦ Causal inference in observational studies
- challenges and pitfalls
◦ Instrumental Variables and Mendelian Randomisation
◦ Conclusion: is causal analysis useful?
Causal Bayesian Inference 4
Reference material
Material is taken from some important references – please Google for exact citations
Angrist J. and Pischke J. (2009). Mostly Harmless Econometrics. Princeton.
Dawid, P. (2000). Causal inference without counterfactuals (with lively(!)
discussion). J. Am. Statist. Assoc.
Didelez V. and Sheehan, N. (2007). Mendelian randomization as an
instrumental variable....Stats. Methods in Medical Research 16: 309–330.
Gelman, A. and Hill, J. (2007) Data analysis using regression and
multilevel/hierarchical models. Chaps 9, 10. Cambridge.
Hill, B. (1965) The environment and disease: association or causation? Proc.
Roy. Soc. Med. 58: 295–300
Holland, P. (1986). Statistics and causal inference. (with discussions) J. Am.
Stats. Assoc Vol. 81: 945-960.
Causal Bayesian Inference 5
Lauritzen, S. (2001). Causal inference from graphical models. In
Barndorff-Nielsen, O.E., Cox, D. R. and Klüppelberg, C. (eds.) Complex
Stochastic Systems. Chapman and Hall/CRC. pp: 63-107.
Neyman, J. (1923). On the application of probability...... Reproduced in English
with commentary (1990), Statistical Science 5(4): 465–480
Parascandola, M. and Weed, D.L. (2001). Causation in epidemiology. J. Epidemiol.
Community Health 55: 905–912.
Pearl, J. (2009). Causal inference in statistics. Statistics Surveys 3: 96–146
Rubin, D. (1974). Estimating causal effects...J. Educational Psychology
Rubin, D. (2004). Direct and indirect causal effects...(with discussion). Scand. J.
Statist. 31: 161–170
Rubin, D. (2005). Causal inference using potential outcomes...J. Am. Statist.
Assoc. 100(469): 322–
Causal Inference 6
Caveat
In my study of the literature and in my own thinking I find myself most closely
aligned with Rubin’s approach to causal inference, treated as a missing data
problem embedded within a Bayesian model-based approach. I have found this to be
the most explicit and useful for the type of applied problems in causal inference
that I address through my research. However, there is considerable contention in
the field as to the best way to approach causal inference (see references), with
graphical models being another important approach. For clarity of purpose this
course will mainly consider the Rubin approach, though I strongly recommend you
read the literature to form an informed opinion on your own position.
Causal Inference 7
Statistics
◦ Statistics is the scientific discipline concerned with the collection, analysis and
interpretation of data in order to better understand some underlying system of
interest
◦ Statistical analysis of experimental or observational data is traditionally based
around predictive models of association between explanatory variables (or
covariates) X and a response variable of interest Y
Pr(Y |X)
◦ Such descriptive analysis of covariation or dependence between X and Y is
without recourse to any mechanism which might link X to Y
◦ This approach is widespread, well motivated, important and sufficient for many
situations
- e.g. a predictive biomarker of treatment efficacy where we are concerned
with the predictive strength, quality and calibration of predictions
Causal Inference 8
- an exploratory (discovery driven) study of genome-wide genotypic variation
associating with disease risk
- the construction of an efficient portfolio of economic indicators
◦ However, when considering predictive models it is natural for one’s mind to
entertain notions of causation or to (incorrectly) interpret the model Pr(Y |X) as a causal model
- “What would have been the effect on y if x had been x∗?”
◦ In order to make such causal statements, about the effect on Y of changing X , we
will need an extended statistical framework
◦ In order to extend the statistical framework to make valid causal inference,
assumptions will have to be made
- such assumptions should be made explicit and highly visible from the
beginning of any analysis
“nothing comes from nothing”
Causal Inference 9
Statistical Inference
◦ Statements such as “statistics can only tell us about association, and
association is not causation” are somewhat out-dated
◦ For example, randomised clinical trials estimate causal effects and are widely
used
- NEJM (2000) listed statistics (and RCTs) as one of top 11 most important
contributions to medicine in the last 1000 years; alongside fields such as
“elucidation of human anatomy and physiology”
◦ Why might one be interested in causation?
- Scientific endeavor: understanding mechanisms by which variables interact
- Public health: identifying modifiable risk factors – “if you stop smoking you
will reduce your risk of heart disease”
- Pharmacogenomics: characterising druggable targets on a pathway
Causal Inference 10
Fundamentals of Causal Inference: Potential Outcomes, Counterfactuals and
the fundamental problem of causal inference
◦ Throughout we shall restrict attention to the causal effect of a binary treatment
variable W ∈ {0, 1}
e.g. W might stand for “taking an aspirin”, “inheriting an allele”, “drinking a
pint of ale”....
◦ Causal statements will be made using the notion of potential outcomes (due to
Neyman-Rubin)
- these are comparisons of the potential outcomes, Y , which would have been
observed under different exposures of units to treatment
- a unit here refers to an experimental entity to which the treatment will be
applied: an individual, a plot, a mouse. A row in a data matrix
Causal Inference 11
◦ Potential Outcomes are closely related to the concept of counterfactuals which
in philosophy refer to statements of the kind
Counterfactual: “If X had been x∗, Y ∗ would have occurred”; when it is known
in fact that X ≠ x∗
◦ To make clear the dependence of Y on the treatment we shall use notation
Y (0) to denote the outcome from an untreated unit and Y (1) for that of a
treated unit
◦ So the causal effect of the treatment is [Y (1)− Y (0)] and the potential
outcome Y (1−W ) is counterfactual following application of the treatment
◦ The “fundamental problem of causal inference” (Holland 1986) is that for any
individual only one of {Y (1), Y (0)} is observed, the other being missing (or
unobserved)
Yobs = WY (1) + (1−W )Y (0)
Causal Inference 12
and
Ymis = (1−W )Y (1) + WY (0)
Causal Inference 13
Fundamental of Causal Inference
◦ It is instructive to contemplate the following data structure which illustrates the
fundamental problem and motivates a solution
Causal Inference 14
Unit Covariate X W Y (0) Y (1) Individ. causal effect Y (1)− Y (0)
1 31 0 4 2∗ -2∗
2 20 1 6∗ 8 +2∗
3 27 1 2∗ 5 +3∗
4 34 0 7 8∗ +1∗
5 29 0 4 3∗ -1∗
6 31 1 7∗ 11 +4∗
True Averages 5∗ 6.2∗ +1.2∗
Observed 5 8
◦ where a ∗ denotes the value was unobserved
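◦ The masking above can be written out directly. A short sketch (Python; the numbers are copied from the table, the helper names are illustrative):

```python
# The six-unit example from the table: both potential outcomes are listed,
# but the assignment W reveals only one of Y(0), Y(1) per unit.
units = [
    # (X, W, Y0, Y1)
    (31, 0, 4, 2),
    (20, 1, 6, 8),
    (27, 1, 2, 5),
    (34, 0, 7, 8),
    (29, 0, 4, 3),
    (31, 1, 7, 11),
]

def mean(xs):
    return sum(xs) / len(xs)

# True (but unobservable) averages over all six units
true_y0 = mean([y0 for _, _, y0, _ in units])           # 5.0
true_y1 = mean([y1 for _, _, _, y1 in units])           # ~6.2
true_ace = true_y1 - true_y0                            # ~+1.2

# What we actually observe: Yobs = W*Y(1) + (1-W)*Y(0)
obs_y0 = mean([y0 for _, w, y0, _ in units if w == 0])  # 5.0
obs_y1 = mean([y1 for _, w, _, y1 in units if w == 1])  # 8.0

print(true_y0, round(true_y1, 2), obs_y0, obs_y1)
```

◦ Note the observed treated average (8) differs from the true average of Y (1) (6.2): half of each column is missing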
Causal Inference 15
◦ In reality what we observe is a table which looks like
Unit Covariate X W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 31 0 4 ? ?
2 20 1 ? 8 ?
3 27 1 ? 5 ?
4 34 0 7 ? ?
5 29 0 4 ? ?
6 31 1 ? 11 ?
True Averages ? ? ?
Observed 5 8
Causal Inference 16
Causal Statements
◦ Given the fundamental problem, is it possible to make any valid statements on
causal effects?
◦ Not without assumptions
◦ However, consider a simple case where
1. Ascertainment / Recruitment: Units are selected or drawn at random from
some underlying population FU
2. Assignment: The experiment is designed and treatments are administered
according to Wi’s which are drawn randomly; say uniformly without
replacement from W = {0, 0, . . . , 0, 1, 1, . . . , 1}
3. Assumption: The units do not interfere with one another such that
[Yi(1), Yi(0)] |Wi ⊥ W−i, Y−i(1), Y−i(0)
where W−i, Y−i(1), Y−i(0) denotes the treatments and outcomes other
than that of unit i.
Causal Inference 17
◦ Then under such circumstances we are able to make valid claims about
population statistics such as EU [Y (1)] and EU [Y (0)] and the average
(population) causal effect (ACE), Neyman (1923),
ACE = EU [Y (1)]− EU [Y (0)]
Causal Inference 18
Completely Randomised Design
The important implications of points 1-3 above are that:
Point 1: The causal statement is with respect to the population defined via the unit
ascertainment process
Point 2: The randomisation process ensures, by design, that the assignment of
treatments to units is both unconfounded with potential outcomes and ignorable
Unconfounded:
Pr(W |Y (1), Y (0), X) = Pr(W |X)
so that yi(1), yi(0) ⊥ wi
Ignorable:
Pr(W |Y (1), Y (0), X) = Pr(W |Yobs, X)
Unconfounded being the stronger condition
From which it follows that any systematic differences between EU [Y (1)] and
Causal Inference 19
EU [Y (0)] are attributable to treatment
Point 3: Non-interference implies conditional independence between the units given the
treatments and hence
y1 = (1/n1) ∑_{i:Wi=1} Yi(1)
and
y0 = (1/n0) ∑_{i:Wi=0} Yi(0)
are unbiased and optimal estimators of EU [Y (1)] and EU [Y (0)]
◦ These ideas date back (independently) to Fisher (1924) and Neyman (1923)
◦ Fisher considered the above set up and the notion of a sharp null hypothesis
that the treatment has no effect
H0 : Yi(1) = Yi(0)
it being “sharp” as under the null all potential outcomes are known
Causal Inference 20
◦ Then, under the null we can examine the distribution of a statistic S with respect
to the randomisation Pr(W ),
S = y1 − y0
which has known distributional form. Hence a p-value for H0 the causal null
hypothesis can be obtained given the observed statistic S.
◦ Neyman (1923), translated and reviewed in Stats. Sci (1990), derived
y1 − y0 ± 1.96se
as a confidence interval with frequentist coverage probabilities of
Y (1)− Y (0), with respect to the randomisation and population of units U
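◦ Both procedures can be sketched on simulated data (Python, standard library only; the sample size, constant effect tau = 2 and noise level are illustrative assumptions, not from the slides):

```python
import random
import statistics

random.seed(1)

# Simulated completely randomised experiment: n units, half treated.
n, tau = 100, 2.0
y0 = [random.gauss(0, 1) for _ in range(n)]   # potential outcomes Y(0)
y1 = [y + tau for y in y0]                    # constant effect: Y(1) = Y(0) + tau
w = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(w)                             # completely randomised assignment
yobs = [y1[i] if w[i] else y0[i] for i in range(n)]

def diff_in_means(w, y):
    t = [yi for wi, yi in zip(w, y) if wi == 1]
    c = [yi for wi, yi in zip(w, y) if wi == 0]
    return statistics.mean(t) - statistics.mean(c), t, c

s_obs, t, c = diff_in_means(w, yobs)

# Fisher: under the sharp null Yi(1) = Yi(0), Yobs is fixed whatever W is,
# so re-randomise W and recompute S to get the null distribution of S.
draws = []
w_perm = w[:]
for _ in range(2000):
    random.shuffle(w_perm)
    draws.append(diff_in_means(w_perm, yobs)[0])
p_value = sum(abs(s) >= abs(s_obs) for s in draws) / len(draws)

# Neyman: y1bar - y0bar +/- 1.96 se, with the usual variance estimate
se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
ci = (s_obs - 1.96 * se, s_obs + 1.96 * se)

print(round(s_obs, 2), p_value, [round(x, 2) for x in ci])
```

◦ With a real treatment effect the randomisation p-value is essentially zero and the interval sits around tau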
Causal Inference 21
Hypothetical Example
◦ Suppose I wish to infer the causal effect of aspirin on migraine
1. Ascertainment / Recruitment: I sit in a hospital waiting for migraine patients
to present
2. Assignment: On presentation I draw a ball out of a bag at random without
replacement. If the ball is red I give, and watch them take, an aspirin. If the
ball is blue I give, and watch them take, a placebo. The tablets are considered
identical other than the drug content. I do not tell the patients which type they
are receiving. I stay with the patient for 30 mins and then ask them to
report their severity of migraine
3. Assumption: The patients have no interaction during this trial
◦ The trial stops when there are no more balls in the bag. I then look at the
average severity scores for red-ball people (treated) vs blue-ball people
(untreated). I report the difference in average scores as an estimate of the
population average causal effect of aspirin on migraine severity
Causal Inference 22
◦ Aside: what is the causal estimand here?
Causal Inference 23
Some things to note
◦ Point 1 defines the implicit frame of reference for the analysis
◦ Any systematic difference that exists between the units of study and future units
of treatment may well lead to differences between observed treatment effects
and inferred treatment effects
- consider a drug trial on poor, undernourished students. These might have
systematic differences from the population as a whole
◦ Point 3: no interference between units has been formalised in the
stable-unit-treatment-value condition (SUTVA) of Rubin
- this is a key assumption; without it, more restrictive assumptions are
required
- SUTVA also assumes non-heterogeneous treatment effects. That is, the
treatment has the same expected effect no matter how it is administered
e.g. if I slip you an aspirin in your tea or I give you an aspirin to take (and
Causal Inference 24
you take it) the treatment effect is the same
◦ However, some examples where SUTVA may be violated:
- suppose W is aspirin treatment for headache administered to a group of
patients on a single ward. Then the outcome of the patient next to me may
affect my headache (if they are complaining or sleeping)
- in an agricultural field experiment with treatments applied to adjacent plots
healthy plants might block out light from unhealthy plants, or remove nutrients
or water from the soil thus affecting neighbouring plots
Causal Inference 25
Extensions
◦ Causal inference is clearly on safest ground when
- units are drawn at random from the population of interest
- treatments are administered in a completely randomised fashion
- units are treated such that interference cannot take place
◦ However, such circumstances are clearly restrictive
◦ Question: can causal effects be estimated in more general settings?
Causal Inference 26
Rubin Causal Model
◦ Beginning in 1974 Rubin produced a series of papers exploring generalisations
of the completely randomised experimental design
◦ This has become known as the Rubin Causal Model (RCM), though it’s more of
a framework than a model
◦ The cornerstone of Rubin’s approach is to consider a joint probability model for
all quantities; under which the analysis of potential outcomes reduces to a
missing data problem for which well known inferential methods exist
◦ Under the RCM we construct a joint model
Pr(Y (1), Y (0),W,X)
it is important to note that for now X only includes covariates which are not
affected by the treatment (more on this later)
Causal Inference 27
◦ The joint model can be factorised as
Pr(Y (1), Y (0),W,X) = Pr(Y (1), Y (0), X)Pr(W |Y (1), Y (0), X)
That is, into a model of “the science”
Pr(Y (1), Y (0), X)
and a model for the assignment mechanism
Pr(W |Y (1), Y (0), X)
◦ Moreover if the assignment mechanism is unconfounded then
Pr(Y (1), Y (0),W,X) = Pr(Y (1), Y (0)|X)Pr(W |X)Pr(X)
which allows causal inference to proceed (with some further assumptions to be
explained).
Causal Inference 28
◦ It is the form of assignment mechanism Pr(W |·) that dictates whether the
scientific model is identifiable or not from the study data.
- It’s important to note the truism that the science exists independently of the study
◦ From this framework, randomised, completely randomised, experimental and
observational studies can all be studied: they’re simply different assignment
mechanisms!
◦ The RCM is beautifully simple and powerful
◦ From such a framework we are better able to understand and interpret
the methods of Fisher and Neyman and see how the process can be
generalised to less rigid experimental designs and even observational studies
under further assumptions
Causal Inference 29
Implications of the RCM
1. The RCM separates out the true unknown model of Nature (“the science”)
Pr(Y (1), Y (0)|X)Pr(X)
from the man-made experiment (the assignment mechanism)
Pr(W |·)
assuming non-heterogeneous treatment effects; that the treatment has the
same effect regardless of how it was administered
◦ This construction is incredibly powerful as it allows us to concentrate on
appropriate models for the causal effect on potential outcomes {Y (1), Y (0)}
in isolation of the experimental set up
- how the experiment was performed should not alter your prior beliefs about
the scientific mechanism
◦ Furthermore, as noted above, both experimental and observational studies can
Causal Inference 30
be analysed within the same framework. They simply have different assignment
mechanisms, for instance,
(a) In a completely randomised experiment the probabilistic assignment
mechanism is determined by the experimenter, say,
Pr(W |·) = Pr(W )
which, by design, makes the assignment unconfounded (and hence
ignorable) against the potential outcomes. Moreover
0 < Pr(Wi = 1|X,Yobs) < 1
the propensities are between (0, 1).
These two features in turn allow for efficient causal inference on
Pr(Y (1), Y (0), X). This is the key benefit of randomisation!
(b) In an observational study we need to construct a model for the assignment.
Causal Inference 31
If we can safely assume that the assignment is unconfounded or ignorable
Pr(W |·) = Pr(W |X,Yobs)
then we have a causal inferential framework to learn about Nature
Pr(Y (1), Y (0), X) – see below.
If we are unable to believe Pr(W |·) = Pr(W |X, Yobs) then it may be that
the data and study do not support a causal analysis
In observational studies we may often be in the situation whereby the data
does not support a causal analysis. This is life. The assumptions should be
clearly stated and the analysis closed
◦ What can occur in non-ignorable assignment mechanisms?
◦ Consider a study whereby a doctor is prescribing a treatment whose effect we
wish to infer. Suppose we obtain the following data:
Causal Inference 32
Unit W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 0 10 11∗ +1∗
2 1 3∗ 8 +5∗
3 1 5∗ 9 +4∗
4 0 11 10∗ -1∗
5 0 9 9∗ +0∗
6 1 7∗ 10 +3∗
True Averages 7.5∗ 9.5∗ +2∗
Observed 10 9
Causal Inference 33
◦ Unbeknown to us the doctor has a (good) intuition about whom the treatment will
benefit most: and he uses his intuition to assign treatments
◦ Hence the assignment mechanism
Pr(W |Y (1), Y (0), X)
cannot be simplified and is confounded with potential outcomes, such that if your
individual [Yi(1)− Yi(0)] is large you are more likely to be given the treatment
◦ In this case, if we had wrongly assumed an ignorable assignment mechanism
Pr(W |Y (1), Y (0), X) = Pr(W |Yobs, X)
we would estimate the average causal effect as −1 when it is in fact +2
◦ This highlights the strengths of experimental studies (we know(!) that Pr(W ) is
unconfounded by construction) and the dangers of interpreting observational
studies where we may have to assume the assignment is ignorable
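◦ The sign reversal in this example can be verified directly from the table (Python; the numbers are copied from the slide):

```python
# The doctor's table: treatment tends to go to the units with the
# largest individual effects Y(1) - Y(0), confounding the assignment.
units = [
    # (W, Y0, Y1)
    (0, 10, 11),
    (1, 3, 8),
    (1, 5, 9),
    (0, 11, 10),
    (0, 9, 9),
    (1, 7, 10),
]

def mean(xs):
    return sum(xs) / len(xs)

true_ace = mean([y1 - y0 for _, y0, y1 in units])          # +2.0

# Naive difference in observed group means, as if assignment were ignorable
naive = (mean([y1 for w, _, y1 in units if w == 1])
         - mean([y0 for w, y0, _ in units if w == 0]))     # 9 - 10 = -1.0

print(true_ace, naive)
```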
Causal Inference 34
Implications of the RCM II
2. We note that the Fisher and Neyman procedures are “model free”
nonparametric estimators of causal effects concerned with population statistics,
such as expectation.
◦ The RCM allows for the definition of a probabilistic model for
Pr(Y (1), Y (0), X) which captures a priori the uncertainty in the causal
effects and allows for any causal estimand to be inferred
Causal Inference 35
The scientific model
◦ The RCM leads to the specification of the joint distribution on potential
outcomes and covariates unaffected by treatment
Pr(Y (1), Y (0), X)
◦ It is natural to assume exchangeability between units
- you can think of exchangeability as implying “if I permuted the rows of the
data matrix you would not change your model”
◦ Hence by the de Finetti representation theorem we have, without loss of
generality, for any Pr(Y (1), Y (0), X)
Pr(Y (1), Y (0), X) = ∫ ∏i f(Yi(1), Yi(0), Xi | θ) p(θ) dθ
where f(·|θ) is an iid model (akin to a likelihood) for the potential outcomes and
covariates and p(θ) is a prior distribution on the model parameters interpreted
Causal Inference 36
very loosely as
Pr(θ ≤ θ′) = Pr[ F̂Y(1),Y(0) →n FY(1),Y(0)(· | θ∗) : θ∗ ≤ θ′ ]
That is, your subjective beliefs that the empirical distribution function converges
to the model’s sampling distribution with a parameter value less than θ′
◦ Hence the RCM naturally aligns to a Bayesian modelling perspective whereby
we use Pr(Y (1), Y (0), X) to capture our current state of uncertainty in the
causal effects
◦ One immediate beneficial consequence of this is that within Bayesian inference
when given a joint model and missing data captured in the form
Causal Inference 37
Unit Covariate X W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 31 0 4 ? ?
2 20 1 ? 8 ?
3 27 1 ? 5 ?
4 34 0 7 ? ?
5 29 1 ? 11 ?
6 31 0 7 ? ?
True Averages ? ? ?
Observed 6 8
◦ We have well known imputation methods to proceed
Causal Inference 38
◦ This also highlights the key aspect played by ignorable assignment mechanisms
Pr(W |·) = Pr(W |X, Yobs)
as under this assumption we have
Pr(Ymis|X, Yobs,W ) ∝ Pr(Ymis|X, Yobs)
◦ which then allows inference to proceed via, say, simulation in a two stage
imputation process
(i) Update Ymis|Yobs, X, θ
(ii) Update θ|Yobs, Ymis, X
Repeat
◦ This procedure is known as Gibbs sampling. Under this scheme we know that
the collection of samples {θ(1), θ(2), . . . , θ(T )} from T -iterations (loops of the
Causal Inference 39
above) will look like samples from
Pr(θ|Yobs, X)
the posterior distribution for the causal effect
◦ We can use the samples {θ(i)}Ti=1 to calculate any statistic we wish about the
causal effect
◦ Note: it may also be that the marginal Pr(Ymis|X, Yobs) is available
Pr(Ymis|X, Yobs) = ∫ Pr(Ymis|X, Yobs, θ) Pr(θ|X, Yobs) dθ
which allows joint updating
◦ If the assignment is ignorable and the scientific model is simple, say,
WY (1) + (1−W )Y (0) ∼ N(Xβ + Wθ, σ2)
then we can simply regress Yobs on {X,W} to obtain an estimate of the
causal effect θ̂; though the Gibbs sampling approach is more general.
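◦ A minimal sketch of this regression route (Python, standard library only; the coefficient values, sample size and zero-intercept set-up are illustrative assumptions):

```python
import random

random.seed(7)

# Ignorable-assignment case from the slide:
# Yobs ~ N(X*beta + W*theta, sigma^2), with theta the causal effect.
# All numerical values here are illustrative assumptions.
n, beta, theta, sigma = 500, 1.5, 2.0, 1.0
x = [random.gauss(0, 1) for _ in range(n)]
w = [random.random() < 0.5 for _ in range(n)]   # randomised, hence ignorable
y = [beta * xi + theta * wi + random.gauss(0, sigma)
     for xi, wi in zip(x, w)]

# Least-squares regression of Yobs on {X, W}: solve the 2x2 normal equations
sxx = sum(xi * xi for xi in x)
sxw = sum(xi * wi for xi, wi in zip(x, w))
sww = sum(wi * wi for wi in w)
sxy = sum(xi * yi for xi, yi in zip(x, y))
swy = sum(wi * yi for wi, yi in zip(w, y))
det = sxx * sww - sxw * sxw
beta_hat = (sww * sxy - sxw * swy) / det
theta_hat = (sxx * swy - sxw * sxy) / det        # ~ theta

print(round(beta_hat, 2), round(theta_hat, 2))
```

◦ The full Gibbs scheme above generalises this to any scientific model where the complete-data updates are tractable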
Causal Inference 40
◦ Note: for an observational study (with non-randomized treatment assignment)
ignorable means we have accounted for all confounders, i.e. all covariates
which associate with treatment and potential outcomes are contained in X
Causal Inference 41
RCM Recap
◦ To recap, the essential features of the RCM are
(a) A joint model for potential outcomes, assignment mechanism and covariates
unaffected by treatment; which is factorised as
(b) A model of Nature on potential outcomes
(c) An explicit model for the assignment mechanism
(d) When the assignment mechanism is deemed ignorable we can impute Ymis
and thence infer any causal estimand
Causal Inference 42
Practical complications arising in Observational studies
◦ A number of complications can arise when dealing with observational studies
even when the assignment mechanism can be assumed ignorable
◦ In particular when the covariate distributions Pr(X|W = 1) and
Pr(X|W = 0) differ across the treated and untreated groups
◦ Two particular instances of this are
(i) Imbalance: when the distributions are dissimilar
and the more extreme case of....
(ii) Lack of overlap: where the samples {Xi}Wi=1 and {Xi}Wi=0 have
regions of X space where they share little coverage
◦ Both of these lead to large variation in the propensity scores over X space
Pr(Wi|Xi, Yobs)
◦ Imbalance will decrease the precision (increase variance) in the posterior
Causal Inference 43
uncertainty of causal effects
◦ Lack of overlap may lead to non-identifiability of causal effects as differences in
potential outcomes are equally attributable to X as to W
- again, sometimes a data set cannot support a causal analysis
◦ See Gelman and Hill, Chapter 10, (2007) for further details and possible
solutions
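◦ A lack of overlap can be checked directly. An illustrative sketch (Python; the two covariate distributions and the region inspected are assumptions made purely for illustration):

```python
import random

random.seed(3)

# Illustrative lack of overlap: treated units are drawn with systematically
# larger X, so Pr(X|W=1) and Pr(X|W=0) differ and share little coverage
# in the upper tail.
x_control = [random.gauss(0.0, 1.0) for _ in range(300)]
x_treated = [random.gauss(2.0, 1.0) for _ in range(300)]

def coverage(xs, lo, hi):
    """Fraction of the sample falling in [lo, hi]."""
    return sum(lo <= x <= hi for x in xs) / len(xs)

# A region of X space occupied almost exclusively by treated units:
# causal comparisons there rest on extrapolation, not data.
print(coverage(x_control, 3.0, 6.0), coverage(x_treated, 3.0, 6.0))
```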
Causal Inference 44
Dealing with variables on the causal pathway
◦ An interesting situation arises when we have additional intermediate variables
assumed to arise on the causal pathway, pictorially,
W → Z → Y
◦ If the assignment is ignorable, one might be tempted to treat Z as a covariate
and condition on it within a regression model, say,
Yobs ∼ N(Xβ + Zα + Wθ, σ2I)
and report θ̂ as the causal estimate for θ having controlled for X and Z
◦ This would be wrong!
- note even Fisher made this mistake; see Rubin (2005)
◦ In general you cannot treat Z as a covariate as you are conditioning on Zobs
and Z is affected by treatment
Causal Inference 45
◦ The solution is to consider {Z(1), Z(0)} as a potential intermediate variable;
only one of which is observed, the other being counterfactual once the
treatment has been assigned
◦ This causes no difficulties for the Bayesian simulation approach using Gibbs
Sampling which allows inference to proceed via, say, simulation in a three stage
imputation process
(i) Update Ymis | ·
(ii) Update Zmis | ·
(iii) Update θ | ·
Repeat
◦ A key feature of the RCM is that it allows us to treat all unknowns within a
common coherent framework
- Condition on all potential confounders X unaffected by treatment
- Treat potential outcomes, and intermediate outcomes on the causal pathway
Causal Inference 46
within a missing data framework
{Y (1), Y (0), Z(1)(1), Z(1)(0), . . . , Z(p)(1), Z(p)(0)} for p
intermediate potential outcomes
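◦ The pitfall of conditioning on Zobs can be demonstrated by simulation (Python; the linear coefficients and sample size are illustrative assumptions, with Z = a·W + noise and Y = b·Z + c·W + noise, so the total causal effect of W on Y is a·b + c):

```python
import random

random.seed(11)

# Sketch of the W -> Z -> Y pitfall: conditioning on the observed
# intermediate Z removes the part of W's effect that flows through Z.
# Illustrative coefficients: total effect = a*b + c = 1.0*2.0 + 0.5 = 2.5
n, a, b, c = 2000, 1.0, 2.0, 0.5
w = [random.random() < 0.5 for _ in range(n)]
z = [a * wi + random.gauss(0, 1) for wi in w]
y = [b * zi + c * wi + random.gauss(0, 1) for zi, wi in zip(z, w)]

def mean(xs):
    return sum(xs) / len(xs)

# Correct total effect under randomised W: difference in mean Y by group
total = (mean([yi for wi, yi in zip(w, y) if wi])
         - mean([yi for wi, yi in zip(w, y) if not wi]))  # ~ a*b + c

# Wrong analysis: regress Y on {Z, W}; the W coefficient is only the
# direct effect c, missing the pathway through Z.
szz = sum(zi * zi for zi in z)
szw = sum(zi * wi for zi, wi in zip(z, w))
sww = sum(wi * wi for wi in w)
szy = sum(zi * yi for zi, yi in zip(z, y))
swy = sum(wi * yi for wi, yi in zip(w, y))
det = szz * sww - szw * szw
w_coef = (szz * swy - szw * szy) / det                    # ~ c

print(round(total, 2), round(w_coef, 2))
```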
Causal Inference 47
Observational Studies Recap
When the assignment mechanism is not randomised by the experimenter, i.e. an
observational study, we need to be much more careful about statements of putative
causality. Important points to note:
◦ Carefully consider and condition on all possible confounders, variables which
associate with both assignment and potential outcomes, in an attempt to make
the assignment ignorable
◦ Include any available predictors of potential outcome, i.e. variables which are
independent of assignment but associated with [Y (1), Y (0)]. This will improve
the precision of causal estimates
- my rule of thumb: you get 1% increase in power for every 1% of variance
explained in Y
◦ Treat any additional variables on the causal pathway as potential intermediate
outcomes [Z(1), Z(0)]
Causal Inference 48
◦ Use Bayesian inference
Finally, an extremely useful piece of advice when considering causal analysis of an
observational study is to think about the idealized randomised experiment that you
would have liked to perform (putting budgetary and ethical considerations aside) in
order to learn about the causal estimand of interest. This thought experiment really
helps reveal underlying deficiencies in the data and concentrates the mind on
confounders and what exactly can and is being estimated. See Angrist and Pischke
(2009) and Gelman and Hill (2007).
Causal Inference 49
Instrumental Variables
◦ As we’ve seen, one real challenge in the assumptions needed to make causal
statements on non-randomised studies is the inclusion of all confounders
(including the “known unknowns and the unknown unknowns”)
◦ It is perhaps rare that we will be in a position of confidence when we state we
have included all variables that associate with assignment and potential
outcome
◦ However, in certain circumstances we are able to proceed, albeit with a loss of
information, if we can find an instrumental variable that associates with
treatment but not marginally with the unmeasured confounder or conditionally
(on treatment) with the outcome. That is, T is an instrumental variable if we
have
1. Pr(T,W ) ≠ Pr(T )Pr(W ) and
2. Pr(Y (0), Y (1)|W,T, U) = Pr(Y (0), Y (1)|W,U) and
Causal Inference 50
3. Pr(U, T ) = Pr(U)Pr(T ) for any confounder U
◦ In this case there is information in the data to infer a causal effect of treatment
even if we’ve not measured all confounders – see below
◦ Note: T does not have to be causal for W ; it simply has to be associated with it.
Causal Inference 51
Mendelian Randomization
◦ Mendelian Randomisation has received considerable attention and debate in
the epidemiology literature, see Didelez and Sheehan (2007).
◦ MR is an instantiation of an instrumental variable approach in which the
instrument is an allele (a genetic marker), where interest is not in the causal
effect of the allele itself on a phenotype
◦ In particular suppose we have a molecular biomarker, say a metabolite
expression level in blood {present, absent}, and we’re interested to see if it is
causally related to some clinical phenotype or anthropometric trait
◦ We are concerned that we have not recorded all potential confounders between
the biomarker and the phenotype
◦ However we know of a genetic marker (allele or SNP) which strongly associates
with the biomarker
◦ Then we can use the genotype as an instrumental variable for the biomarker
Causal Inference 52
◦ Because the genetic marker has been handed out randomly (Mendelian
inheritance) we may have greater confidence that as an instrument it is
unrelated to possible confounders. E.g. the biomarker might vary with (suppose
unmeasured) age, or diet, but as long as we can safely assume that the marker
you inherited is (marginally) unrelated with your age or diet then we’re safe to
proceed with a causal analysis
Causal Inference 53
Estimation of causal effect from IV
◦ Classical (non-Bayesian) estimation of the Average Causal Effect (ACE)
requires an additional assumption on the dependence structure in order to
make the estimate identifiable
Strict Monotonicity: which implies that E[W |T ] is either increasing with T
or decreasing with T – clearly this always holds for binary T
◦ In this case a consistent estimate of the ACE is
θ̂ = β̂Y,T / β̂W,T
where β̂Y,T is the estimate of the regression coefficient from a least squares
regression of Yobs on the instrument T , with abuse of notation
β̂Y,T = (T ′T )−1T ′Yobs
and β̂W,T is from the least squares regression of Wobs on the instrument T ,
Causal Inference 54
with abuse of notation
β̂W,T = (T ′T )−1T ′Wobs
◦ Note:
(i) the stronger the causal effect between treatment and Y the stronger the
association between instrument and Y
(ii) the weaker the association between instrument and treatment the greater the
attenuation in the estimate of the ACE
◦ From which we can see that we’d like a strong instrument, one strongly
correlated with treatment
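◦ The ratio estimator can be sketched on simulated data (Python; the instrument strength, confounder loadings and sample size are illustrative assumptions, and the slopes here include an intercept, a slight variation on the (T ′T )−1T ′Yobs notation above):

```python
import random

random.seed(5)

# Sketch of the IV ratio estimator theta_hat = beta_{Y,T} / beta_{W,T}
# in the presence of an unmeasured confounder U. Coefficients illustrative.
n, theta = 5000, 2.0
t = [random.random() < 0.5 for _ in range(n)]  # instrument (e.g. an allele)
u = [random.gauss(0, 1) for _ in range(n)]     # unmeasured confounder
# Treatment associates with T (condition 1) and with U (confounding),
# while T is independent of U (condition 3).
w = [0.8 * ti + 0.8 * ui + random.gauss(0, 1) for ti, ui in zip(t, u)]
# Outcome depends on W and U only, not directly on T (condition 2).
y = [theta * wi + 1.5 * ui + random.gauss(0, 1) for wi, ui in zip(w, u)]

def ls_slope(t, v):
    """Least-squares slope of v on t (with intercept)."""
    mt, mv = sum(t) / len(t), sum(v) / len(v)
    num = sum((ti - mt) * (vi - mv) for ti, vi in zip(t, v))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den

naive = ls_slope(w, y)                        # biased upwards by U
theta_iv = ls_slope(t, y) / ls_slope(t, w)    # ~ theta despite unmeasured U

print(round(naive, 2), round(theta_iv, 2))
```

◦ Swapping the simulated instrument for a genotype gives the Mendelian randomisation setting of the previous slides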
Causal Inference 55
Discussion: Is causal inference useful?
◦ When the experimentalist has the opportunity to intervene by prescribing
treatments, by designing Pr(W |X), there is no doubt that causal analysis is
highly informative
◦ For observational studies causal inference requires much more careful thought
than predictive, descriptive models of association Pr(Y |X,W )
◦ We might question whether in fact causal statements can ever be achieved
(Dawid, 2000)
- though note there are many situations where we’re happy to proceed as if
conditions are met – cf. Bayesian analysis, which requires with probability 1
that the true model is contained under your prior (M-closed)
◦ The question then remains: is the undertaking of a causal analysis useful?
◦ I believe, in many circumstances, the answer is yes. Clearly when designing
experiments or observational studies, thinking about the assignment
Causal Inference 56
mechanism is a highly useful exercise
◦ When presented with observational data to analyse, contemplating the idealized
randomized experiment to infer the causal estimand is an extremely useful
exercise
- Better understanding the limitations of your data is always a good thing.
◦ Moreover, causal inference pointing to a putative causal effect adds greater
weight to an association being true than a simple predictive model and hence is
more likely to hold up in practice
Causal Inference 57
In short when interest falls on understanding the effect of a particular explanatory
variable W on a response variable Y , causal analysis can help you plan, design,
understand and interpret both experimental and observational data even if the data
does not support a causal interpretation and you have to rely on predictive models
of dependence.