TRANSCRIPT
Causal Bayesian Inference 1
Introduction to Causal Bayesian Inference
Chris Holmes, University of Oxford
Causal Bayesian Inference 2
Objectives of Course
◦ To introduce concepts and methods for Causal Inference from experimental and
observational studies
◦ To demonstrate applied Bayesian methods for causal data analysis
Causal Bayesian Inference 3
Overview of Course
◦ Introduction to causal inference and statistical association
◦ The fundamental problem of causal inference: potential outcomes and
counterfactuals
◦ Experimental designs where causal inference is straightforward
◦ The Rubin Causal Model – assignment mechanisms and models of “the
science”
◦ Causal inference in observational studies
- challenges and pitfalls
◦ Instrumental Variables and Mendelian Randomisation
◦ Conclusion: is causal analysis useful?
Causal Bayesian Inference 4
Reference material
Material is taken from some important references – please Google for exact citations
Angrist J. and Pischke J. (2009). Mostly Harmless Econometrics. Princeton.
Dawid, P. (2000). Causal inference without counterfactuals (with lively(!)
discussion). J. Am. Statist. Assoc.
Didelez V. and Sheehan, N. (2007). Mendelian randomization as an
instrumental variable....Stats. Methods in Medical Research 16: 309–330.
Gelman, A. and Hill, J. (2007) Data analysis using regression and
multilevel/hierarchical models. Chaps 9, 10. Cambridge.
Hill, B. (1965) The environment and disease: association or causation? Proc.
Roy. Soc. Med. 58: 295–300
Holland, P. (1986). Statistics and causal inference. (with discussions) J. Am.
Stats. Assoc Vol. 81: 945-960.
Causal Bayesian Inference 5
Lauritzen, S. (2001). Causal inference from graphical models. In
Barndorff-Nielsen, O.E., Cox, D. R. and Klüppelberg, C. (eds.) Complex
Stochastic Systems. Chapman and Hall/CRC. pp: 63-107.
Neyman, J. (1923). On the application of probability...... Reproduced in English
with commentary (1990), Statistical Science 5(4): 465–480
Parascandola, M. and Weed, D.L. (2001). Causation in epidemiology. J. Epidemiol.
Community Health 55: 905–912.
Pearl, J. (2009). Causal inference in statistics. Statistics Surveys 3: 96–146
Rubin, D. (1974). Estimating causal effects...J. Educational Psychology
Rubin, D. (2004). Direct and indirect causal effects...(with discussion). Scand. J.
Statist. 31: 161–170
Rubin, D. (2005). Causal inference using potential outcomes...J. Am. Statist.
Assoc. 100(469): 322–
Causal Inference 6
Caveat
In my study of the literature and in my own thinking I find myself most closely
aligned with Rubin’s approach to causal inference, treated as a missing data
problem embedded within a Bayesian model-based approach. I have found this to be
the most explicit and useful for the type of applied problems in causal inference
that I address through my research. However, there is considerable contention in
the field as to the best way to approach causal inference (see references), with
graphical models being another important approach. For clarity of purpose this
course will mainly consider the Rubin approach, though I strongly recommend you
read the literature to form an informed opinion on your own position.
Causal Inference 7
Statistics
◦ Statistics is the scientific discipline concerned with the collection, analysis and
interpretation of data in order to better understand some underlying system of
interest
◦ Statistical analysis of experimental or observational data is traditionally based
around predictive models of association between explanatory variables (or
covariates) X and a response variable of interest Y
Pr(Y |X)
◦ Such descriptive analysis of covariation or dependence between X and Y is
without recourse to any mechanism which might link X to Y
◦ This approach is widespread, well motivated, important and sufficient for many
situations
- e.g. a predictive biomarker of treatment efficacy where we are concerned
with the predictive strength, quality and calibration of predictions
Causal Inference 8
- an exploratory (discovery driven) study of genome-wide genotypic variation
associating with disease risk
- the construction of an efficient portfolio of economic indicators
◦ However, when considering predictive models it is natural for one’s mind to
entertain notions of causation or to (incorrectly) interpret the model Pr(Y |X) as a causal model
- “What would have been the effect on y if x had been x∗?”
◦ In order to make such causal statements, about the effect on Y of changing X , we
will need an extended statistical framework
◦ In order to extend the statistical framework to make valid causal inference,
assumptions will have to be made
- such assumptions should be made explicit and highly visible from the
beginning of any analysis
“nothing comes from nothing”
Causal Inference 9
Statistical Inference
◦ Statements such as “statistics can only tell us about association, and
association is not causation” are somewhat out-dated
◦ For example, randomised clinical trials estimate causal effects and are widely
used
- NEJM (2000) listed statistics (and RCTs) as one of top 11 most important
contributions to medicine in the last 1000 years; alongside fields such as
“elucidation of human anatomy and physiology”
◦ Why might one be interested in causation?
- Scientific endeavor: understanding mechanisms by which variables interact
- Public health: identifying modifiable risk factors – “if you stop smoking you
will reduce your risk of heart disease”
- Pharmacogenomics: characterising druggable targets on a pathway
Causal Inference 10
Fundamentals of Causal Inference: Potential Outcomes, Counterfactuals and
the fundamental problem of causal inference
◦ Throughout we shall restrict attention to the causal effect of a binary treatment
variable W ∈ {0, 1}
e.g. W might stand for “taking an aspirin”, “inheriting an allele”, “drinking a
pint of ale”....
◦ Causal statements will be made using the notion of potential outcomes (due to
Neyman-Rubin)
- these are comparisons of the potential outcomes, Y , which would have been
observed under different exposures of units to treatment
- a unit here refers to an experimental entity to which the treatment will be
applied: an individual, a plot, a mouse. A row in a data matrix
Causal Inference 11
◦ Potential Outcomes are closely related to the concept of counterfactuals which
in philosophy refer to statements of the kind
Counterfactual: “If X had been x∗, Y ∗ would have occurred”; when it is known
in fact that X ≠ x∗
◦ To make clear the dependence of Y on the treatment we shall use notation
Y (0) to denote the outcome from an untreated unit and Y (1) for that of a
treated unit
◦ So the causal effect of the treatment is [Y (1)− Y (0)] and the potential
outcome Y (1−W ) is counterfactual following application of the treatment
◦ The “fundamental problem of causal inference” (Holland 1986) is that for any
individual only one of {Y (1), Y (0)} is observed, the other being missing (or
unobserved)
Yobs = WY (1) + (1−W )Y (0)
Causal Inference 12
and
Ymis = (1−W )Y (1) + WY (0)
Causal Inference 13
Fundamental of Causal Inference
◦ It is instructive to contemplate the following data structure which illustrates the
fundamental problem and motivates a solution
Causal Inference 14
Unit Covariate X W Y (0) Y (1) Individ. causal effect Y (1)− Y (0)
1 31 0 4 2∗ -2∗
2 20 1 6∗ 8 +2∗
3 27 1 2∗ 5 +3∗
4 34 0 7 8∗ +1∗
5 29 0 4 3∗ -1∗
6 31 1 7∗ 11 +4∗
True Averages 5∗ 6.2∗ +1.2∗
Observed 5 8
◦ where a ∗ denotes the value was unobserved
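◦ The masking above can be written out directly. A short sketch (Python; the numbers are copied from the table, the helper names are illustrative):

```python
# The six-unit example from the table: both potential outcomes are listed,
# but the assignment W reveals only one of Y(0), Y(1) per unit.
units = [
    # (X, W, Y0, Y1)
    (31, 0, 4, 2),
    (20, 1, 6, 8),
    (27, 1, 2, 5),
    (34, 0, 7, 8),
    (29, 0, 4, 3),
    (31, 1, 7, 11),
]

def mean(xs):
    return sum(xs) / len(xs)

# True (but unobservable) averages over all six units
true_y0 = mean([y0 for _, _, y0, _ in units])           # 5.0
true_y1 = mean([y1 for _, _, _, y1 in units])           # ~6.2
true_ace = true_y1 - true_y0                            # ~+1.2

# What we actually observe: Yobs = W*Y(1) + (1-W)*Y(0)
obs_y0 = mean([y0 for _, w, y0, _ in units if w == 0])  # 5.0
obs_y1 = mean([y1 for _, w, _, y1 in units if w == 1])  # 8.0

print(true_y0, round(true_y1, 2), obs_y0, obs_y1)
```

◦ Note the observed treated average (8) differs from the true average of Y (1) (6.2): half of each column is missing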
Causal Inference 15
◦ In reality what we observe is a table which looks like
Unit Covariate X W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 31 0 4 ? ?
2 20 1 ? 8 ?
3 27 1 ? 5 ?
4 34 0 7 ? ?
5 29 0 4 ? ?
6 31 1 ? 11 ?
True Averages ? ? ?
Observed 5 8
Causal Inference 16
Causal Statements
◦ Given the fundamental problem, is it possible to make any valid statements on
causal effects?
◦ Not without assumptions
◦ However, consider a simple case where
1. Ascertainment / Recruitment: Units are selected or drawn at random from
some underlying population FU
2. Assignment: The experiment is designed and treatments are administered
according to Wi’s which are drawn randomly; say uniformly without
replacement from W = {0, 0, . . . , 0, 1, 1, . . . , 1}
3. Assumption: The units do not interfere with one another such that
[Yi(1), Yi(0)] |Wi ⊥ W−i, Y−i(1), Y−i(0)
where W−i, Y−i(1), Y−i(0) denotes the treatments and outcomes other
than that of unit i.
Causal Inference 17
◦ Then under such circumstances we are able to make valid claims about
population statistics such as EU [Y (1)] and EU [Y (0)] and the average
(population) causal effect (ACE), Neyman (1923),
ACE = EU [Y (1)]− EU [Y (0)]
Causal Inference 18
Completely Randomised Design
The important implications of points 1-3 above are that:
Point 1: The causal statement is with respect to the population defined via the unit
ascertainment process
Point 2: The randomisation process ensures, by design, that the assignment of
treatments to units is both unconfounded with potential outcomes and ignorable
Unconfounded:
Pr(W |Y (1), Y (0), X) = Pr(W |X)
so that yi(1), yi(0) ⊥ wi
Ignorable:
Pr(W |Y (1), Y (0), X) = Pr(W |Yobs, X)
Unconfounded being the stronger condition
From which it follows that any systematic differences between EU [Y (1)] and
Causal Inference 19
EU [Y (0)] are attributable to treatment
Point 3: Non-interference implies conditional independence between the units given the
treatments and hence
y1 = (1/n1) ∑_{i:Wi=1} Yi(1)
and
y0 = (1/n0) ∑_{i:Wi=0} Yi(0)
are unbiased and optimal estimators of EU [Y (1)] and EU [Y (0)]
◦ These ideas date back (independently) to Fisher (1924) and Neyman (1923)
◦ Fisher considered the above set up and the notion of a sharp null hypothesis
that the treatment has no effect
H0 : Yi(1) = Yi(0)
it being “sharp” as under the null all potential outcomes are known
Causal Inference 20
◦ Then, under the null we can examine the distribution of a statistic S with respect
to the randomisation Pr(W ),
S = y1 − y0
which has known distributional form. Hence a p-value for H0 the causal null
hypothesis can be obtained given the observed statistic S.
◦ Neyman (1923), translated and reviewed in Stats. Sci (1990), derived
y1 − y0 ± 1.96se
as a confidence interval with frequentist coverage probabilities of
Y (1)− Y (0), with respect to the randomisation and population of units U
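◦ Both procedures can be sketched on simulated data (Python, standard library only; the sample size, constant effect tau = 2 and noise level are illustrative assumptions, not from the slides):

```python
import random
import statistics

random.seed(1)

# Simulated completely randomised experiment: n units, half treated.
n, tau = 100, 2.0
y0 = [random.gauss(0, 1) for _ in range(n)]   # potential outcomes Y(0)
y1 = [y + tau for y in y0]                    # constant effect: Y(1) = Y(0) + tau
w = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(w)                             # completely randomised assignment
yobs = [y1[i] if w[i] else y0[i] for i in range(n)]

def diff_in_means(w, y):
    t = [yi for wi, yi in zip(w, y) if wi == 1]
    c = [yi for wi, yi in zip(w, y) if wi == 0]
    return statistics.mean(t) - statistics.mean(c), t, c

s_obs, t, c = diff_in_means(w, yobs)

# Fisher: under the sharp null Yi(1) = Yi(0), Yobs is fixed whatever W is,
# so re-randomise W and recompute S to get the null distribution of S.
draws = []
w_perm = w[:]
for _ in range(2000):
    random.shuffle(w_perm)
    draws.append(diff_in_means(w_perm, yobs)[0])
p_value = sum(abs(s) >= abs(s_obs) for s in draws) / len(draws)

# Neyman: y1bar - y0bar +/- 1.96 se, with the usual variance estimate
se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
ci = (s_obs - 1.96 * se, s_obs + 1.96 * se)

print(round(s_obs, 2), p_value, [round(x, 2) for x in ci])
```

◦ With a real treatment effect the randomisation p-value is essentially zero and the interval sits around tau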
Causal Inference 21
Hypothetical Example
◦ Suppose I wish to infer the causal effect of aspirin on migraine
1. Ascertainment / Recruitment: I sit in a hospital waiting for migraine patients
to present
2. Assignment: On presentation I draw a ball out of a bag at random without
replacement. If the ball is red I give, and watch them take, an aspirin. If the
ball is blue I give, and watch them take, a placebo. The tablets are considered
identical other than the drug content. I do not tell the patients which type they
are receiving. I stay with the patient for 30 mins and then ask them to
report their severity of migraine
3. Assumption: The patients have no interaction during this trial
◦ The trial stops when there are no more balls in the bag. I then look at the
average severity scores for red-ball people (treated) vs blue-ball people
(untreated). I report the difference in average scores as an estimate of the
population average causal effect of aspirin on migraine severity
Causal Inference 22
◦ Aside: what is the causal estimand here?
Causal Inference 23
Some things to note
◦ Point 1 defines the implicit frame of reference for the analysis
◦ Any systematic difference that exists between the units of study and future units
of treatment may well lead to differences between observed treatment effects
and inferred treatment effects
- consider a drug trial on poor, undernourished students. These might have
systematic differences from the population as a whole
◦ Point 3: no interference between units has been formalised in the
stable-unit-treatment-value condition (SUTVA) of Rubin
- this is a key assumption; without it, more restrictive assumptions are
required
- SUTVA also assumes non-heterogeneous treatment effects. That is, the
treatment has the same expected effect no matter how it is administered
e.g. if I slip you an aspirin in your tea or I give you an aspirin to take (and
Causal Inference 24
you take it) the treatment effect is the same
◦ However, some examples where SUTVA may be violated:
- suppose W is aspirin treatment for headache administered to a group of
patients on a single ward. Then the outcome of the patient next to me may
affect my headache (if they are complaining or sleeping)
- in an agricultural field experiment with treatments applied to adjacent plots
healthy plants might block out light from unhealthy plants, or remove nutrients
or water from the soil thus affecting neighbouring plots
Causal Inference 25
Extensions
◦ Causal inference is clearly on safest ground when
- units are drawn at random from the population of interest
- treatments are administered in a completely randomised fashion
- units are treated such that interference cannot take place
◦ However, such circumstances are clearly restrictive
◦ Question: can causal effects be estimated in more general settings?
Causal Inference 26
Rubin Causal Model
◦ Beginning in 1974 Rubin produced a series of papers exploring generalisations
of the completely randomised experimental design
◦ This has become known as the Rubin Causal Model (RCM), though it’s more of
a framework than a model
◦ The cornerstone of Rubin’s approach is to consider a joint probability model for
all quantities; under which the analysis of potential outcomes reduces to a
missing data problem for which well known inferential methods exist
◦ Under the RCM we construct a joint model
Pr(Y (1), Y (0),W,X)
it is important to note that for now X only includes covariates which are not
affected by the treatment (more on this later)
Causal Inference 27
◦ The joint model can be factorised as
Pr(Y (1), Y (0),W,X) = Pr(Y (1), Y (0), X)Pr(W |Y (1), Y (0), X)
That is, into a model of “the science”
Pr(Y (1), Y (0), X)
and a model for the assignment mechanism
Pr(W |Y (1), Y (0), X)
◦ Moreover if the assignment mechanism is unconfounded then
Pr(Y (1), Y (0),W,X) = Pr(Y (1), Y (0)|X)Pr(W |X)Pr(X)
which allows causal inference to proceed (with some further assumptions to be
explained).
Causal Inference 28
◦ It is the form of assignment mechanism Pr(W |·) that dictates whether the
scientific model is identifiable or not from the study data.
- It’s important to note the truism that the science exists independently of the study
◦ From this framework, randomised, completely randomised, experimental and
observational studies can all be studied: they’re simply different assignment
mechanisms!
◦ The RCM is beautifully simple and powerful
◦ From such a framework we are better able to understand and interpret
the methods of Fisher and Neyman and see how the process can be
generalised to less rigid experimental designs and even observational studies
under further assumptions
Causal Inference 29
Implications of the RCM
1. The RCM separates out the true unknown model of Nature (“the science”)
Pr(Y (1), Y (0)|X)Pr(X)
from the man-made experiment (the assignment mechanism)
Pr(W |·)
assuming non-heterogeneous treatment effects; that the treatment has the
same effect regardless of how it was administered
◦ This construction is incredibly powerful as it allows us to concentrate on
appropriate models for the causal effect on potential outcomes {Y (1), Y (0)}
in isolation of the experimental set up
- how the experiment was performed should not alter your prior beliefs about
the scientific mechanism
◦ Furthermore, as noted above, both experimental and observational studies can
Causal Inference 30
be analysed within the same framework. They simply have different assignment
mechanisms, for instance,
(a) In a completely randomised experiment the probabilistic assignment
mechanism is determined by the experimenter, say,
Pr(W |·) = Pr(W )
which, by design, makes the assignment unconfounded (and hence
ignorable) against the potential outcomes. Moreover
0 < Pr(Wi = 1|X,Yobs) < 1
the propensities are between (0, 1).
These two features in turn allow for efficient causal inference on
Pr(Y (1), Y (0), X). This is the key benefit of randomisation!
(b) In an observational study we need to construct a model for the assignment.
Causal Inference 31
If we can safely assume that the assignment is unconfounded or ignorable
Pr(W |·) = Pr(W |X,Yobs)
then we have a causal inferential framework to learn about Nature
Pr(Y (1), Y (0), X) – see below.
If we are unable to believe Pr(W |·) = Pr(W |X, Yobs) then it may be that
the data and study do not support a causal analysis
In observational studies we may often be in the situation whereby the data
does not support a causal analysis. This is life. The assumptions should be
clearly stated and the analysis closed
◦ What can occur in non-ignorable assignment mechanisms?
◦ Consider a study whereby a doctor is prescribing a treatment whose effect we
wish to infer. Suppose we obtain the following data:
Causal Inference 32
Unit W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 0 10 11∗ +1∗
2 1 3∗ 8 +5∗
3 1 5∗ 9 +4∗
4 0 11 10∗ -1∗
5 0 9 9∗ +0∗
6 1 7∗ 10 +3∗
True Averages 7.5∗ 9.5∗ +2∗
Observed 10 9
Causal Inference 33
◦ Unbeknown to us the doctor has a (good) intuition about whom the treatment will
benefit most: and he uses his intuition to assign treatments
◦ Hence the assignment mechanism
Pr(W |Y (1), Y (0), X)
cannot be simplified and is confounded with potential outcomes, such that if your
individual [Yi(1)− Yi(0)] is large you are more likely to be given the treatment
◦ In this case, if we had wrongly assumed an ignorable assignment mechanism
Pr(W |Y (1), Y (0), X) = Pr(W |Yobs, X)
we would estimate the average causal effect as −1 when it is in fact +2
◦ This highlights the strengths of experimental studies (we know(!) that Pr(W ) is
unconfounded by construction) and the dangers of interpreting observational
studies where we may have to assume the assignment is ignorable
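◦ The sign reversal in this example can be verified directly from the table (Python; the numbers are copied from the slide):

```python
# The doctor's table: treatment tends to go to the units with the
# largest individual effects Y(1) - Y(0), confounding the assignment.
units = [
    # (W, Y0, Y1)
    (0, 10, 11),
    (1, 3, 8),
    (1, 5, 9),
    (0, 11, 10),
    (0, 9, 9),
    (1, 7, 10),
]

def mean(xs):
    return sum(xs) / len(xs)

true_ace = mean([y1 - y0 for _, y0, y1 in units])          # +2.0

# Naive difference in observed group means, as if assignment were ignorable
naive = (mean([y1 for w, _, y1 in units if w == 1])
         - mean([y0 for w, y0, _ in units if w == 0]))     # 9 - 10 = -1.0

print(true_ace, naive)
```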
Causal Inference 34
Implications of the RCM II
2. We note that the Fisher and Neyman procedures are “model free”
nonparametric estimators of causal effects concerned with population statistics,
such as expectation.
◦ The RCM allows for the definition of a probabilistic model for
Pr(Y (1), Y (0), X) which captures a priori the uncertainty in the causal
effects and allows for any causal estimand to be inferred
Causal Inference 35
The scientific model
◦ The RCM leads to the specification of the joint distribution on potential
outcomes and covariates unaffected by treatment
Pr(Y (1), Y (0), X)
◦ It is natural to assume exchangeability between units
- you can think of exchangeability as implying “if I permuted the rows of the
data matrix you would not change your model”
◦ Hence by the de Finetti representation theorem we have, without loss of
generality, for any Pr(Y (1), Y (0), X)
Pr(Y (1), Y (0), X) = ∫ ∏i f(Yi(1), Yi(0), Xi | θ) p(θ) dθ
where f(·|θ) is an iid model (akin to a likelihood) for the potential outcomes and
covariates and p(θ) is a prior distribution on the model parameters interpreted
Causal Inference 36
very loosely as
Pr(θ ≤ θ′) = Pr[ F̂Y(1),Y(0) →n FY(1),Y(0)(· | θ∗) : θ∗ ≤ θ′ ]
That is, your subjective beliefs that the empirical distribution function converges
to the model’s sampling distribution with a parameter value less than θ′
◦ Hence the RCM naturally aligns to a Bayesian modelling perspective whereby
we use Pr(Y (1), Y (0), X) to capture our current state of uncertainty in the
causal effects
◦ One immediate beneficial consequence of this is that within Bayesian inference
when given a joint model and missing data captured in the form
Causal Inference 37
Unit Covariate X W Y (0) Y (1) Individual causal effect Y (1)− Y (0)
1 31 0 4 ? ?
2 20 1 ? 8 ?
3 27 1 ? 5 ?
4 34 0 7 ? ?
5 29 1 ? 11 ?
6 31 0 7 ? ?
True Averages ? ? ?
Observed 6 8
◦ We have well known imputation methods to proceed
Causal Inference 38
◦ This also highlights the key aspect played by ignorable assignment mechanisms
Pr(W |·) = Pr(W |X, Yobs)
as under this assumption we have
Pr(Ymis|X, Yobs,W ) ∝ Pr(Ymis|X, Yobs)
◦ which then allows inference to proceed via, say, simulation in a two stage
imputation process
(i) Update Ymis|Yobs, X, θ
(ii) Update θ|Yobs, Ymis, X
Repeat
◦ This procedure is known as Gibbs sampling. Under this scheme we know that
the collection of samples {θ(1), θ(2), . . . , θ(T )} from T -iterations (loops of the
Causal Inference 39
above) will look like samples from
Pr(θ|Yobs, X)
the posterior distribution for the causal effect
◦ We can use the samples {θ(i)}Ti=1 to calculate any statistic we wish about the
causal effect
◦ Note: it may also be that the marginal Pr(Ymis|X, Yobs) is available
Pr(Ymis|X, Yobs) = ∫ Pr(Ymis|X, Yobs, θ) Pr(θ|X, Yobs) dθ
which allows joint updating
◦ If the assignment is ignorable and the scientific model is simple, say,
WY (1) + (1−W )Y (0) ∼ N(Xβ + Wθ, σ2)
then we can simply regress Yobs on {X,W} to obtain an estimate of the
causal effect θ̂; though the Gibbs sampling approach is more general.
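◦ A minimal sketch of this regression route (Python, standard library only; the coefficient values, sample size and zero-intercept set-up are illustrative assumptions):

```python
import random

random.seed(7)

# Ignorable-assignment case from the slide:
# Yobs ~ N(X*beta + W*theta, sigma^2), with theta the causal effect.
# All numerical values here are illustrative assumptions.
n, beta, theta, sigma = 500, 1.5, 2.0, 1.0
x = [random.gauss(0, 1) for _ in range(n)]
w = [random.random() < 0.5 for _ in range(n)]   # randomised, hence ignorable
y = [beta * xi + theta * wi + random.gauss(0, sigma)
     for xi, wi in zip(x, w)]

# Least-squares regression of Yobs on {X, W}: solve the 2x2 normal equations
sxx = sum(xi * xi for xi in x)
sxw = sum(xi * wi for xi, wi in zip(x, w))
sww = sum(wi * wi for wi in w)
sxy = sum(xi * yi for xi, yi in zip(x, y))
swy = sum(wi * yi for wi, yi in zip(w, y))
det = sxx * sww - sxw * sxw
beta_hat = (sww * sxy - sxw * swy) / det
theta_hat = (sxx * swy - sxw * sxy) / det        # ~ theta

print(round(beta_hat, 2), round(theta_hat, 2))
```

◦ The full Gibbs scheme above generalises this to any scientific model where the complete-data updates are tractable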
Causal Inference 40
◦ Note: for an observational study (with non-randomized treatment assignment)
ignorable means we have accounted for all confounders, i.e. all covariates
which associate with treatment and potential outcomes are contained in X
Causal Inference 41
RCM Recap
◦ To recap, the essential features of the RCM are
(a) A joint model for potential outcomes, assignment mechanism and covariates
unaffected by treatment; which is factorised as
(b) A model of Nature on potential outcomes
(c) An explicit model for the assignment mechanism
(d) When the assignment mechanism is deemed ignorable we can impute Ymis
and thence infer any causal estimand
Causal Inference 42
Practical complications arising in Observational studies
◦ A number of complications can arise when dealing with observational studies
even when the assignment mechanism can be assumed ignorable
◦ In particular when the covariate distributions Pr(X|W = 1) and
Pr(X|W = 0) differ across the treated and untreated groups
◦ Two particular instances of this are
(i) Imbalance: when the distributions are dissimilar
and the more extreme case of....
(ii) Lack of overlap: where the samples {Xi}Wi=1 and {Xi}Wi=0 have
regions of X space where they share little coverage
◦ Both of these lead to large variation in the propensity scores over X space
Pr(Wi|Xi, Yobs)
◦ Imbalance will decrease the precision (increase variance) in the posterior
Causal Inference 43
uncertainty of causal effects
◦ Lack of overlap may lead to non-identifiability of causal effects as differences in
potential outcomes are equally attributable to X as to W
- again, sometimes a data set cannot support a causal analysis
◦ See Gelman and Hill, Chapter 10, (2007) for further details and possible
solutions
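◦ A lack of overlap can be checked directly. An illustrative sketch (Python; the two covariate distributions and the region inspected are assumptions made purely for illustration):

```python
import random

random.seed(3)

# Illustrative lack of overlap: treated units are drawn with systematically
# larger X, so Pr(X|W=1) and Pr(X|W=0) differ and share little coverage
# in the upper tail.
x_control = [random.gauss(0.0, 1.0) for _ in range(300)]
x_treated = [random.gauss(2.0, 1.0) for _ in range(300)]

def coverage(xs, lo, hi):
    """Fraction of the sample falling in [lo, hi]."""
    return sum(lo <= x <= hi for x in xs) / len(xs)

# A region of X space occupied almost exclusively by treated units:
# causal comparisons there rest on extrapolation, not data.
print(coverage(x_control, 3.0, 6.0), coverage(x_treated, 3.0, 6.0))
```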
Causal Inference 44
Dealing with variables on the causal pathway
◦ An interesting situation arises when we have additional intermediate variables
assumed to arise on the causal pathway, pictorially,
W → Z → Y
◦ If the assignment is ignorable, one might be tempted to treat Z as a covariate
and condition on it within a regression model, say,
Yobs ∼ N(Xβ + Zα + Wθ, σ2I)
and report θ̂ as the causal estimate for θ having controlled for X and Z
◦ This would be wrong!
- note even Fisher made this mistake; see Rubin (2005)
◦ In general you cannot treat Z as a covariate as you are conditioning on Zobs
and Z is affected by treatment
Causal Inference 45
◦ The solution is to consider {Z(1), Z(0)} as a potential intermediate variable;
only one of which is observed, the other being counterfactual once the
treatment has been assigned
◦ This causes no difficulties for the Bayesian simulation approach using Gibbs
Sampling which allows inference to proceed via, say, simulation in a three stage
imputation process
(i) Update Ymis | ·
(ii) Update Zmis | ·
(iii) Update θ | ·
Repeat
◦ A key feature of the RCM is that it allows us to treat all unknowns within a
common coherent framework
- Condition on all potential confounders X unaffected by treatment
- Treat potential outcomes, and intermediate outcomes on the causal pathway
Causal Inference 46
within a missing data framework
{Y (1), Y (0), Z(1)(1), Z(1)(0), . . . , Z(p)(1), Z(p)(0)} for p
intermediate potential outcomes
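◦ The pitfall of conditioning on Zobs can be demonstrated by simulation (Python; the linear coefficients and sample size are illustrative assumptions, with Z = a·W + noise and Y = b·Z + c·W + noise, so the total causal effect of W on Y is a·b + c):

```python
import random

random.seed(11)

# Sketch of the W -> Z -> Y pitfall: conditioning on the observed
# intermediate Z removes the part of W's effect that flows through Z.
# Illustrative coefficients: total effect = a*b + c = 1.0*2.0 + 0.5 = 2.5
n, a, b, c = 2000, 1.0, 2.0, 0.5
w = [random.random() < 0.5 for _ in range(n)]
z = [a * wi + random.gauss(0, 1) for wi in w]
y = [b * zi + c * wi + random.gauss(0, 1) for zi, wi in zip(z, w)]

def mean(xs):
    return sum(xs) / len(xs)

# Correct total effect under randomised W: difference in mean Y by group
total = (mean([yi for wi, yi in zip(w, y) if wi])
         - mean([yi for wi, yi in zip(w, y) if not wi]))  # ~ a*b + c

# Wrong analysis: regress Y on {Z, W}; the W coefficient is only the
# direct effect c, missing the pathway through Z.
szz = sum(zi * zi for zi in z)
szw = sum(zi * wi for zi, wi in zip(z, w))
sww = sum(wi * wi for wi in w)
szy = sum(zi * yi for zi, yi in zip(z, y))
swy = sum(wi * yi for wi, yi in zip(w, y))
det = szz * sww - szw * szw
w_coef = (szz * swy - szw * szy) / det                    # ~ c

print(round(total, 2), round(w_coef, 2))
```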
Causal Inference 47
Observational Studies Recap
When the assignment mechanism is not randomised by the experimenter, i.e. an
observational study, we need to be much more careful about statements of putative
causality. Important points to note:
◦ Carefully consider and condition on all possible confounders, variables which
associate with both assignment and potential outcomes, in an attempt to make
the assignment ignorable
◦ Include any available predictors of potential outcome, i.e. variables which are
independent of assignment but associated with [Y (1), Y (0)]. This will improve
the precision of causal estimates
- my rule of thumb: you get 1% increase in power for every 1% of variance
explained in Y
◦ Treat any additional variables on the causal pathway as potential intermediate
outcomes [Z(1), Z(0)]
Causal Inference 48
◦ Use Bayesian inference
Finally, an extremely useful piece of advice when considering causal analysis of an
observational study is to think about the idealized randomised experiment that you
would have liked to perform (putting budgetary and ethical considerations aside) in
order to learn about the causal estimand of interest. This thought experiment really
helps reveal underlying deficiencies in the data and concentrates the mind on
confounders and what exactly can and is being estimated. See Angrist and Pischke
(2009) and Gelman and Hill (2007).
Causal Inference 49
Instrumental Variables
◦ As we’ve seen, one real challenge in the assumptions needed to make causal
statements on non-randomised studies is the inclusion of all confounders
(including the “known unknowns and the unknown unknowns”)
◦ It is perhaps rare that we will be in a position of confidence when we state we
have included all variables that associate with assignment and potential
outcome
◦ However, in certain circumstances we are able to proceed, albeit with a loss of
information, if we can find an instrumental variable that associates with
treatment but not marginally with the unmeasured confounder or conditionally
(on treatment) with the outcome. That is, T is an instrumental variable if we
have
1. Pr(T,W ) ≠ Pr(T )Pr(W ) and
2. Pr(Y (0), Y (1)|W,T, U) = Pr(Y (0), Y (1)|W,U) and
Causal Inference 50
3. Pr(U, T ) = Pr(U)Pr(T ) for any confounder U
◦ In this case there is information in the data to infer a causal effect of treatment
even if we’ve not measured all confounders – see below
◦ Note: T does not have to be causal for W ; it simply has to be associated with it.
Causal Inference 51
Mendelian Randomization
◦ Mendelian Randomisation has received considerable attention and debate in
the epidemiology literature, see Didelez and Sheehan (2007).
◦ MR is an instantiation of an instrumental variable approach in which the
instrument is an allele (a genetic marker), where interest is not in the causal
effect of the allele itself on a phenotype
◦ In particular suppose we have a molecular biomarker, say a metabolite
expression level in blood {present, absent}, and we’re interested to see if it is
causally related to some clinical phenotype or anthropometric trait
◦ We are concerned that we have not recorded all potential confounders between
the biomarker and the phenotype
◦ However we know of a genetic marker (allele or SNP) which strongly associates
with the biomarker
◦ Then we can use the genotype as an instrumental variable for the biomarker
Causal Inference 52
◦ Because the genetic marker has been handed out randomly (Mendelian
inheritance) we may have greater confidence that as an instrument it is
unrelated to possible confounders. E.g. the biomarker might vary with (suppose
unmeasured) age, or diet, but as long as we can safely assume that the marker
you inherited is (marginally) unrelated with your age or diet then we’re safe to
proceed with a causal analysis
Causal Inference 53
Estimation of causal effect from IV
◦ Classical (non-Bayesian) estimation of the Average Causal Effect (ACE)
requires an additional assumption on the dependence structure in order to
make the estimate identifiable
Strict Monotonicity: which implies that E[W |T ] is either increasing with T
or decreasing with T – clearly this always holds for binary T
◦ In this case a consistent estimate of the ACE is
θ̂ = β̂Y,T / β̂W,T
where β̂Y,T is the estimate of the regression coefficient from a least squares
regression of Yobs on the instrument T , with abuse of notation
β̂Y,T = (T ′T )−1T ′Yobs
and β̂W,T is from the least squares regression of Wobs on the instrument T ,
Causal Inference 54
with abuse of notation
β̂W,T = (T ′T )−1T ′Wobs
◦ Note:
(i) the stronger the causal effect between treatment and Y the stronger the
association between instrument and Y
(ii) the weaker the association between instrument and treatment the greater the
attenuation in the estimate of the ACE
◦ From which we can see that we’d like a strong instrument, one strongly
correlated with treatment
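◦ The ratio estimator can be sketched on simulated data (Python; the instrument strength, confounder loadings and sample size are illustrative assumptions, and the slopes here include an intercept, a slight variation on the (T ′T )−1T ′Yobs notation above):

```python
import random

random.seed(5)

# Sketch of the IV ratio estimator theta_hat = beta_{Y,T} / beta_{W,T}
# in the presence of an unmeasured confounder U. Coefficients illustrative.
n, theta = 5000, 2.0
t = [random.random() < 0.5 for _ in range(n)]  # instrument (e.g. an allele)
u = [random.gauss(0, 1) for _ in range(n)]     # unmeasured confounder
# Treatment associates with T (condition 1) and with U (confounding),
# while T is independent of U (condition 3).
w = [0.8 * ti + 0.8 * ui + random.gauss(0, 1) for ti, ui in zip(t, u)]
# Outcome depends on W and U only, not directly on T (condition 2).
y = [theta * wi + 1.5 * ui + random.gauss(0, 1) for wi, ui in zip(w, u)]

def ls_slope(t, v):
    """Least-squares slope of v on t (with intercept)."""
    mt, mv = sum(t) / len(t), sum(v) / len(v)
    num = sum((ti - mt) * (vi - mv) for ti, vi in zip(t, v))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den

naive = ls_slope(w, y)                        # biased upwards by U
theta_iv = ls_slope(t, y) / ls_slope(t, w)    # ~ theta despite unmeasured U

print(round(naive, 2), round(theta_iv, 2))
```

◦ Swapping the simulated instrument for a genotype gives the Mendelian randomisation setting of the previous slides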
Causal Inference 55
Discussion: Is causal inference useful?
◦ When the experimentalist has the opportunity to intervene by prescribing
treatments, by designing Pr(W |X), there is no doubt that causal analysis is
highly informative
◦ For observational studies causal inference requires much more careful thought
than predictive, descriptive models of association Pr(Y |X,W )
◦ We might question whether in fact causal statements can ever be achieved
(Dawid, 2000)
- though note there are many situations where we’re happy to proceed as if
conditions are met – cf. Bayesian analysis, which requires with probability 1
that the true model is contained under your prior (M-closed)
◦ The question then remains: is the undertaking of a causal analysis useful?
◦ I believe, in many circumstances, the answer is yes. Clearly when designing
experiments or observational studies, thinking about the assignment
Causal Inference 56
mechanism is a highly useful exercise
◦ When presented with observational data to analyse, contemplating the idealized
randomized experiment to infer the causal estimand is an extremely useful
exercise
- Better understanding the limitations of your data is always a good thing.
◦ Moreover, causal inference pointing to a putative causal effect adds greater
weight to an association being true than a simple predictive model and hence is
more likely to hold up in practice
Causal Inference 57
In short when interest falls on understanding the effect of a particular explanatory
variable W on a response variable Y , causal analysis can help you plan, design,
understand and interpret both experimental and observational data even if the data
does not support a causal interpretation and you have to rely on predictive models
of dependence.