ENVIRONMETRICS
Environmetrics 2004; 15: 463–469 (DOI: 10.1002/env.676)
Instrumental variables in Gaussian directed acyclic graph models with an unobserved confounder
Elena Stanghellini*,†
Dipartimento di Scienze Statistiche, Università di Perugia, Italy
SUMMARY
We discuss the problem of identification of relevant parameters in a DAG model for Gaussian variables with one unobserved variable that acts as a confounder. We first make explicit what we mean by identification and then discuss an example where a modified notion of instrumental variable renders a system with a confounder identifiable. Copyright © 2004 John Wiley & Sons, Ltd.
key words: identification; latent variables; path analysis
1. INTRODUCTION
Recent literature has focused on the notion of identification of relevant parameters in probabilistic
models represented by directed acyclic graphs (DAGs). The renewed interest lies in the fact that, under
some assumptions, relevant parameters of a DAG model can be given a causal interpretation.
The debate about the relationship between causality and probability has raged in the statistical
literature for a long time (see Edwards, 2000, Ch. 8, for early references) but has become explicit after
the work of Pearl (1995). The motivation of Pearl’s work is that we are often interested in the effects of
an intervention that forces one or more of the variables in a system to take on some externally assigned
value or values. For instance, in a study on the relationship between the level of lead in the blood of
an individual and his/her psychological functioning, we want to establish how a hypothetical intervention
that forces the level of lead to become deliberately high affects some performance indicators.
Under some assumptions on how the distribution of the remaining variables is modified by the
intervention, Pearl derives conditions for identification of causal effects from DAG models. The
conditions are based on the graph and can be derived using graphical rules. As noticed by Dawid
(2002), this view coincides with enhancing a probabilistic model described by a DAG with a further
causal interpretation that is not implicit in its nature. Dawid calls the graph with this modified
semantics an intervention DAG and discusses further directed acyclic graphs from which a causal
interpretation can be extracted. A detailed account of the debate can also be found in the papers of
Pearl (1998), Lauritzen and Richardson (2002), Lindley (2002) and Singpurwalla (2002).
*Correspondence to: E. Stanghellini, Dipartimento di Scienze Statistiche, Università di Perugia, Italy.
†E-mail: [email protected]
In this article, conditions are discussed for identification of relevant parameters in a DAG model
for Gaussian variables with one unobserved variable that acts as a confounder. We first make
explicit what we mean by identification and then discuss an example where a modified notion of
instrumental variable renders a system with an unobserved confounder identifiable. This example is
a particular instance of systems with one latent variable as studied by Stanghellini and Wermuth
(2003).
In Section 2 the relationships between a univariate generating process and induced conditional
independence graphs are recalled. In Section 3 path analysis is introduced together with the
associated directed acyclic graph, and the problem of latent variables is framed within the
notion of identification. In Section 4 we present an example of a modified notion of
instrumental variables. The relationship between identification in the sense used in this article
and the identification of causal effects needs to be further investigated. Some remarks are made in
Section 5, in which we draw our conclusions.
2. UNIVARIATE GENERATING PROCESS AND GRAPHS
Given a vector $Y = (Y_1, \ldots, Y_k)'$ of random variables, a univariate generating process determines a
full ordering of the variables such that each variable in the ordering is potentially a response variable
for the previous ones and an explanatory variable for the following ones. The joint density of the
variables in $Y$ can be factorized accordingly into $k$ univariate (conditional) densities:
$$ f_{1,\ldots,k}(Y_1, \ldots, Y_k) = f_k(Y_k) \prod_{i=1}^{k-1} f_i\bigl(Y_i \mid Y_{\mathrm{par}(i)} = y_{\mathrm{par}(i)}\bigr) \tag{1} $$
where $\mathrm{par}(i)$ is the subset of $\{i+1, \ldots, k\}$ containing the variables that have a direct influence on $Y_i$.
The elements of $\{i+1, \ldots, k\}$ are called the potential ancestors of $i$. A univariate generating process is
represented by a directed acyclic graph. A DAG associated with (1) is a pair of vertices and edges,
$G^V_{\mathrm{dag}} = (V, E)$, where $V$ is the set of vertices or nodes corresponding to the variables $Y_1, \ldots, Y_k$ and $E$
is the set of directed edges drawn as arrows pointing from $j$ to $i$ whenever $j \in \mathrm{par}(i)$. Nodes with a
directed edge originating from $j$ are called the children of $j$ and their set is denoted by $\mathrm{chl}(j)$. The
defining independence structure of a DAG is
$$ \{\, i \perp\!\!\!\perp \text{potential ancestors of } i \text{ excluding } \mathrm{par}(i) \mid \mathrm{par}(i) \,\} \tag{2} $$
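In Figure 1 a DAG with $k = 6$ is represented. The joint distribution of $Y$ factorizes as follows:
$$ f_{1,\ldots,6}(Y_1, \ldots, Y_6) = f_6(Y_6)\, f_5(Y_5 \mid Y_6 = y_6)\, f_4(Y_4)\, f_3(Y_3 \mid Y_4 = y_4, Y_5 = y_5)\, f_2(Y_2 \mid Y_3 = y_3)\, f_1(Y_1 \mid Y_3 = y_3, Y_4 = y_4) \tag{3} $$
and the defining independence structure is:
$$ 1 \perp\!\!\!\perp \{2, 5, 6\} \mid \{3, 4\}, \quad 2 \perp\!\!\!\perp \{4, 5, 6\} \mid 3, \quad 3 \perp\!\!\!\perp 6 \mid \{4, 5\}, \quad 4 \perp\!\!\!\perp \{5, 6\} \tag{4} $$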
In this simple case it is not difficult to verify that $1 \perp\!\!\!\perp 2 \mid 3$ is also implied by (3). Given a generic
$G^V_{\mathrm{dag}} = (V, E)$, for a subset $C \subseteq V$, we may be interested in seeing if the independence $i \perp\!\!\!\perp j \mid C \setminus \{i, j\}$
is implied by (1). This may be derived by directly combining probability statements (see, for instance,
Dawid, 1979) or visually by using a separation criterion for DAGs (Pearl, 1988).
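As an illustration of a graphical check of this kind, the following Python sketch tests whether an independence is implied by the DAG with the parent sets of factorization (3). It is a minimal sketch under stated assumptions: it uses the moralisation form of the separation criterion (restrict to the smallest ancestral set, join parents of a common child, drop directions), which is known to be equivalent to Pearl's path-based criterion, and the names `PARENTS`, `ancestral_set` and `separated` are hypothetical helpers, not taken from the paper.

```python
from itertools import combinations

# Parent sets read off factorization (3); node labels follow the paper.
PARENTS = {1: {3, 4}, 2: {3}, 3: {4, 5}, 4: set(), 5: {6}, 6: set()}


def ancestral_set(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def separated(i, j, cond, parents):
    """True if cond separates i and j in the moral graph of the
    smallest ancestral set containing i, j and cond."""
    keep = ancestral_set({i, j} | set(cond), parents)
    # Undirected adjacency: drop directions and join parents of a common child.
    adj = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents[child], 2):
            adj[p].add(q)
            adj[q].add(p)
    # Depth-first search for a path from i to j that avoids cond.
    seen, stack = set(), [i]
    while stack:
        v = stack.pop()
        if v == j:
            return False
        if v in seen or v in cond:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return True


print(separated(1, 2, {3}, PARENTS))    # the statement 1 _||_ 2 | 3
print(separated(4, 5, {3}, PARENTS))    # conditioning on the common child 3
```

The second call illustrates why the ancestral set matters: nodes 4 and 5 are marginally separated, but once their common child 3 enters the conditioning set they are joined in the moral graph.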
3. PATH ANALYSIS AND DIRECTED ACYCLIC GRAPHS
We assume $Y$ to be a vector of $k$ mean-centered Gaussian random variables such that
$$ AY = \varepsilon \tag{5} $$
where $A = \{-\alpha_{ij}\}$ is an upper triangular matrix with ones along the diagonal, the errors $\varepsilon$ have zero
mean and are mutually independent such that $\mathrm{cov}(\varepsilon) = \Delta$ is a diagonal matrix. The linear system
(5), not necessarily with a joint Gaussian distribution, is known as path analysis, from the work of
Wright (1923, 1934). The DAG associated with (5), $G^V_{\mathrm{dag}} = (V, E)$, has $Y_i$ corresponding to node $i$ and
an arrow pointing from $j$ to $i$ whenever $\alpha_{ij}$ is a non-zero coefficient. If all the $\alpha_{ij}$ in (5) are different
from zero the model is saturated and the corresponding DAG is complete.
From (5) the covariance matrix $\Sigma$ and the concentration matrix $\Sigma^{-1}$ of $Y$ are:
$$ \Sigma = B \Delta B^T, \qquad \Sigma^{-1} = A^T \Delta^{-1} A \tag{6} $$
where $B = A^{-1}$. Therefore, given $A$ and $\Delta$ the matrix $\Sigma$ (or equivalently $\Sigma^{-1}$) is uniquely determined.
The converse also holds when the full ordering of the variables is given. A structural $(i, j)$-zero in $\Sigma$ means that $i \perp\!\!\!\perp j$ is implied by (1). Analogously, a structural $(i, j)$-zero in $\Sigma^{-1}$ means that
$i \perp\!\!\!\perp j \mid V \setminus \{i, j\}$ is implied by (1).
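The relations in (6) and the reading of structural zeros can be checked numerically. The sketch below builds a linear system of the form (5) for the DAG of Figure 1, with the non-zero positions taken from factorization (3). The coefficient and variance values are arbitrary illustrative choices (assumptions, not values from the paper), and the array name `alpha` simply stores the off-diagonal coefficients so that `A = I - alpha`.

```python
import numpy as np

k = 6
# alpha[i, j] holds the coefficient of Y_{j+1} in the equation for Y_{i+1}
# (zero-based indices). Non-zero positions follow factorization (3);
# the numerical values are arbitrary.
alpha = np.zeros((k, k))
alpha[0, 2], alpha[0, 3] = 0.8, -0.5   # Y1 on Y3 and Y4
alpha[1, 2] = 0.6                      # Y2 on Y3
alpha[2, 3], alpha[2, 4] = 0.7, 0.4    # Y3 on Y4 and Y5
alpha[4, 5] = -0.9                     # Y5 on Y6

A = np.eye(k) - alpha                  # unit upper triangular
Delta = np.diag([1.0, 0.5, 2.0, 1.5, 1.0, 0.8])

B = np.linalg.inv(A)
Sigma = B @ Delta @ B.T                # covariance matrix in (6)
K = A.T @ np.linalg.inv(Delta) @ A     # concentration matrix in (6)

# The two expressions in (6) are inverses of each other.
assert np.allclose(K, np.linalg.inv(Sigma))

# Structural zeros: 4 _||_ {5, 6} marginally, 1 _||_ 2 given all the rest.
print(np.round(Sigma[3, 4:], 12), np.round(K[0, 1], 12))

# Partial covariance of Y1 and Y2 given Y3 (the statement 1 _||_ 2 | 3).
partial_12_3 = Sigma[0, 1] - Sigma[0, 2] * Sigma[2, 1] / Sigma[2, 2]

# Converse: with the ordering fixed, regressing each Y_i on all later
# variables recovers the rows of alpha from Sigma alone.
alpha_rec = np.zeros((k, k))
for i in range(k - 1):
    later = slice(i + 1, k)
    alpha_rec[i, later] = np.linalg.solve(Sigma[later, later], Sigma[later, i])
```

The final loop is one way to see why the ordering is needed for the converse: each row of `alpha` is a population regression on the potential ancestors, which are only defined once the ordering is given.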
A univariate generating process may contain latent variables, where latent means hidden or
unobserved variables. In this article we consider a situation where a Gaussian latent variable, which we
will indicate with $Y_L$, acts as a confounder. More precisely, we focus on the identification of a
particular coefficient $\alpha_{ij}$ of system (5) in which $Y_L$ is an unobserved variable that is a parent of $i$ and $j$.
Unobserved variables are denoted in a graph by a double crossing over the corresponding node. In
Figure 2, $Y_4$ is an unobserved confounder of the effect $\alpha_{12}$.
Figure 1. An example of a DAG model

In this situation, $Y$ can be partitioned into $Y = (Y_O', Y_L)'$, with $Y_L$ a latent variable. The set
$V = \{O, L\}$ is partitioned accordingly. In the following we indicate with $M_{ab}$ the block $[M]_{a,b}$ of a
matrix $M$ and with $M^{ab}$ the block $[M^{-1}]_{a,b}$ of its inverse. The covariance matrix $\Sigma$ and the
concentration matrix $\Sigma^{-1}$ of $Y$ are therefore:
$$ \Sigma = \begin{pmatrix} \Sigma_{OO} & \Sigma_{OL} \\ \Sigma_{LO} & \sigma_{LL} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{OO} & \Sigma^{OL} \\ \Sigma^{LO} & \sigma^{LL} \end{pmatrix} \tag{7} $$
We now apply the well-known results for the inverse of partitioned matrices (see, for example,
Dempster, 1969). We set $\Sigma^{-1}_{OO.L} = \Sigma^{OO}$ and $\sigma^{-1}_{LL.O} = \sigma^{LL}$. Then
$$ \Sigma_{OO} = \Sigma_{OL}\, \sigma_{LL}^{-1}\, \Sigma_{LO} + \Sigma_{OO.L} \tag{8} $$
and
$$ \Sigma_{OO}^{-1} = -\delta\delta^T + \Sigma^{-1}_{OO.L} \tag{9} $$
in which $\delta = \sqrt{\sigma_{LL.O}}\, \Sigma^{OL}$. By identification of a coefficient $\alpha_{ij}$ we mean that the parameter $\alpha_{ij}$ can
be uniquely reconstructed from $\Sigma_{OO}$ or its inverse.
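Identities (8) and (9) are purely algebraic, so they can be checked on any positive definite matrix. The sketch below does this for an arbitrary 5 × 5 covariance matrix in which the last coordinate plays the role of $Y_L$. The matrix is randomly generated for illustration only, and the variable names are assumptions chosen to mirror the notation of (7).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
Sigma = M @ M.T + 5 * np.eye(5)     # an arbitrary positive definite covariance
K = np.linalg.inv(Sigma)            # concentration matrix

O, L = [0, 1, 2, 3], 4              # observed block and latent coordinate
S_OO, S_OL, s_LL = Sigma[np.ix_(O, O)], Sigma[O, L], Sigma[L, L]
K_OO, K_OL, k_LL = K[np.ix_(O, O)], K[O, L], K[L, L]

S_OO_given_L = np.linalg.inv(K_OO)  # Sigma_{OO.L}, since its inverse is Sigma^{OO}
s_LL_given_O = 1.0 / k_LL           # sigma_{LL.O}, since its inverse is sigma^{LL}

# Equation (8): marginal covariance = part explained by Y_L + residual part.
lhs8 = S_OO
rhs8 = np.outer(S_OL, S_OL) / s_LL + S_OO_given_L

# Equation (9): marginal concentration = conditional concentration
# minus a rank-one term.
delta = np.sqrt(s_LL_given_O) * K_OL
lhs9 = np.linalg.inv(S_OO)
rhs9 = -np.outer(delta, delta) + K_OO

print(np.allclose(lhs8, rhs8), np.allclose(lhs9, rhs9))
```

Element by element, (9) says that a marginal concentration among the observed variables equals the corresponding conditional concentration minus $\delta_i \delta_j$.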
A common assumption of latent variable models, which we will follow here, is that $E(Y_L) = 0$.
This renders $E(Y)$ immediately identifiable from the observable variables and offers a theoretical
justification for dealing with mean-centered Gaussian random variables. For the derivation in this
paper, we will not need the further normalising assumption that $\sigma_{LL} = 1$.
We note that if all the parameters in $A$ and $\Delta$ can be uniquely reconstructed from the distribution of
the observed variables, the model is globally identified (see Rothenberg, 1971). Global identification
of (5) when all variables are observed has been established by Wold (1960) (see also Goldberger, 1964,
p. 383). The problem of global identification of structural equation models with one unobserved
variable has been addressed by Stanghellini and Wermuth (2003) leading to sufficient conditions of
identification of DAG models with one unobserved variable.
4. INSTRUMENTAL VARIABLES: SOME NEW PERSPECTIVES
Suppose that the system is as the one represented in Figure 2, with $k = 4$ Gaussian variables such that
$\alpha_{13} = \alpha_{34} = 0$ only. Suppose that the aim is to estimate the parameter $\alpha_{12}$, but the variable $Y_4$ is not
observed. The marginal independence between $Y_3$ and $Y_4$ renders the parameter identified, as
$\alpha_{12} = \sigma_{13}\sigma_{23}^{-1}$, where $\sigma_{ij}$ denotes the $(i, j)$ element of $\Sigma$.
This problem is known in the econometric literature as a system with an instrumental variable.
In order to estimate the regression coefficient between two variables with an unobserved confounder,
it is crucial to have an ‘instrument’, that is a third variable that is marginally correlated with the
explanatory variable and marginally independent from the unobserved variable. This assumption can
never be tested from the marginal distribution of the observed variables.

Figure 2. A DAG with node 4 unobserved and node 3 acting as instrument for $\alpha_{12}$
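The classical instrumental variable argument can be illustrated numerically. The following sketch sets up a system of the form (5) for the four variables of Figure 2, taking the coefficients of $Y_1$ on $Y_3$ and of $Y_3$ on $Y_4$ to be zero as stated in the text. All numerical values are arbitrary assumptions. It compares the ratio of covariances of $Y_3$ with $Y_1$ and with $Y_2$ against the naive regression coefficient of $Y_1$ on $Y_2$, which ignores the confounder.

```python
import numpy as np

# Zero-based indices 0..3 stand for Y1..Y4; Y4 is the unobserved confounder.
alpha = np.zeros((4, 4))
alpha[0, 1] = 0.5    # effect of Y2 on Y1 (the parameter of interest)
alpha[0, 3] = 1.2    # Y4 -> Y1
alpha[1, 2] = 0.7    # Y3 -> Y2 (the instrument acts on the explanatory variable)
alpha[1, 3] = -0.8   # Y4 -> Y2
# The (Y1, Y3) and (Y3, Y4) coefficients stay at zero.

A = np.eye(4) - alpha
Delta = np.diag([1.0, 1.0, 2.0, 1.0])
B = np.linalg.inv(A)
Sigma = B @ Delta @ B.T

# Only the covariance of (Y1, Y2, Y3) is available to the analyst.
iv_ratio = Sigma[0, 2] / Sigma[1, 2]   # cov(Y1, Y3) / cov(Y2, Y3)
naive = Sigma[0, 1] / Sigma[1, 1]      # regression of Y1 on Y2 alone

print(iv_ratio, naive)
```

In this sketch the ratio recovers the coefficient exactly because $Y_3$ is uncorrelated with both $Y_4$ and the error of $Y_1$, while the naive coefficient absorbs the path through $Y_4$.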
Instrumental variables have also been used within factor analysis models (see Bollen, 1989,
p. 409). In this context the latent variable is such that the observed variables are independent
conditionally on that variable, that is, the covariance matrix $\Sigma_{OO.L}$, and therefore its inverse, is diagonal.
Here we consider more complex situations that may arise in observational studies. Suppose again
that we want to estimate the regression coefficient between the concentration level of lead in the blood
and some indicator of the psychological functioning. Suppose that we know that there is an unobserved
variable (the general level of air pollution, say) that influences both the explanatory and the
response variable. Suppose further that no suitable instrument can be assumed to be marginally
independent from the unobserved variable. More complex systems should therefore be taken into account.
We argue that the analysis of the implied structure of the concentration matrix of the observable
variables, $\Sigma^{-1}_{OO}$, may lead to the identification of relevant effects in complex systems with an unobserved variable
acting as a confounder. We do so by showing that the model represented in Figure 3 with
$O = \{1, 2, 3, 5\}$ and $L = \{4\}$ allows the effect $\alpha_{12}$ to be identified. We indicate with $\sigma^{ij.L}$ the
concentration between $Y_i$ and $Y_j$, $i, j \in O$, after conditioning on the variables in $V \setminus \{i, j\}$ and with
$\sigma^{ij(L)}$ the concentration between the observed variables $Y_i$ and $Y_j$, $i, j \in O$, after conditioning on the
variables in $O \setminus \{i, j\}$ (that is, after marginalizing over $L$). In other words, the first is an element of $\Sigma^{-1}_{OO.L}$,
whereas the second is an element of $\Sigma^{-1}_{OO}$.
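We note that $\alpha_{12} = -\sigma^{12.4}/\sigma^{11.4}$, but those elements involve conditioning on all the other variables,
which cannot be performed as $Y_4$ is not observed. However, writing $\delta_i$ for the $i$th element of $\delta$ in (9), after some calculations we see that
$$ (\delta_1)^2 = -\frac{\sigma^{13(4)}\, \sigma^{15(4)}}{\sigma^{53(4)}} $$
with the elements of the right-hand side identified, as they involve only the distribution of $Y_O$ after
integrating over $Y_L$, a distribution that is observed. Furthermore, by noting that $\sigma^{15(4)} = -\delta_1\delta_5$ and $\sigma^{25(4)} = -\delta_2\delta_5$
(the elements $\sigma^{15.4}$ and $\sigma^{25.4}$ being structural zeros), we
see that $\delta_1\delta_2 = (\delta_1)^2\, \sigma^{25(4)}/\sigma^{15(4)}$ is identified. Therefore, from (9) we see that
$$ \sigma^{12.4} = \sigma^{12(4)} + \delta_1\delta_2 $$
and
$$ \sigma^{11.4} = \sigma^{11(4)} + (\delta_1)^2 $$
which shows that $\alpha_{12}$ is identified.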
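The argument can be traced numerically on a small example. Figure 3 is not reproduced in this text, so the sketch below uses a hypothetical DAG (5 → 4, 4 → 3, 4 → 2, 4 → 1, 3 → 2, 2 → 1) chosen only because it has the zero pattern the derivation relies on: node 4 is linked to every observed node, and the pairs (1, 3), (1, 5), (2, 5) and (3, 5) have zero concentration given all other variables. The coefficient values are arbitrary, and the signs follow from applying (9) element by element.

```python
import numpy as np

# Zero-based indices 0..4 stand for Y1..Y5; Y4 (index 3) is unobserved.
# Hypothetical DAG, NOT taken from the paper's Figure 3.
alpha = np.zeros((5, 5))
alpha[0, 1], alpha[0, 3] = 0.5, 1.0    # Y1 on Y2 (target) and on Y4
alpha[1, 2], alpha[1, 3] = 0.7, -0.6   # Y2 on Y3 and on Y4
alpha[2, 3] = 0.9                      # Y3 on Y4
alpha[3, 4] = 0.8                      # Y4 on Y5

A = np.eye(5) - alpha
Delta = np.diag([1.0, 0.5, 2.0, 1.0, 1.5])
B = np.linalg.inv(A)
Sigma = B @ Delta @ B.T

obs = [0, 1, 2, 4]                     # observed nodes 1, 2, 3, 5
K_m = np.linalg.inv(Sigma[np.ix_(obs, obs)])   # marginal concentrations
n1, n2, n3, n5 = 0, 1, 2, 3            # positions of the nodes inside obs

# delta_1^2 from three marginal concentrations.
delta1_sq = -K_m[n1, n3] * K_m[n1, n5] / K_m[n5, n3]
# delta_1 * delta_2 via the ratio delta_2 / delta_1 read off node 5.
delta1_delta2 = delta1_sq * K_m[n2, n5] / K_m[n1, n5]

# Conditional concentrations recovered through (9).
s12 = K_m[n1, n2] + delta1_delta2
s11 = K_m[n1, n1] + delta1_sq

alpha12_rec = -s12 / s11
print(alpha12_rec)
```

Only the covariance matrix of the observed variables enters the computation after `K_m` is formed, which is the sense in which the coefficient is reconstructed from $\Sigma_{OO}$ or its inverse.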

Figure 3. A DAG with node 4 unobserved and nodes {3, 5} acting as instruments for $\alpha_{12}$

This result implies that, if more variables are available that act as ‘instruments’ in a general sense,
there are systems with an unobserved variable acting as a confounder in which the effect of interest
can be identified. Note that also in this case the conditional independencies involving Y4 in the
conditioning set cannot be tested from the observable distribution. Nevertheless, this model constitutes
an alternative model for solving the identification problem of $\alpha_{12}$.
Note that, to identify the system, it is crucial that the matrix $\Sigma_{OO.L}$, or its inverse, contains a
particular constellation of off-diagonal zeros. As zeros in those matrices imply some marginal or
conditional independencies, identified models can be visually characterized. This has been done in
Stanghellini and Wermuth (2003), where sufficient conditions for identifiability of Gaussian DAG
models having one hidden node are presented. These conditions are based on the properties of the
graph and can be derived using graphical rules (see also Stanghellini, 1997; Vicard, 2000 for a related
problem). Stanghellini and Wermuth (2003) treat further the identifiability of a model when the hidden
variable is conditioned on, by establishing a connection to the extended skew-normal distribution
(Capitanio et al., 2003).
Identification of some relevant parameters is the first step for solving the problem of inference on
those parameters, whether we take a frequentist or a Bayesian approach. If a parameter is
not identified, the likelihood function is flat along ridges in the parameter space.
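A small numerical illustration of non-identification, under assumed values: with only $Y_1$ and $Y_2$ observed and one latent confounder $Y_4$, two different parameter points of system (5) can imply exactly the same covariance matrix for the observed pair. Both points then have the same Gaussian likelihood for any data set. The function `observed_cov` and all numbers are hypothetical choices made for this sketch.

```python
import numpy as np


def observed_cov(a12, a14, a24, d1, d2, s44):
    """Covariance of (Y1, Y2) implied by A Y = eps with ordering (Y1, Y2, Y4)."""
    A = np.array([[1.0, -a12, -a14],
                  [0.0, 1.0, -a24],
                  [0.0, 0.0, 1.0]])
    Delta = np.diag([d1, d2, s44])
    B = np.linalg.inv(A)
    Sigma = B @ Delta @ B.T
    return Sigma[:2, :2]              # marginalize over the latent Y4


# Two parameter points with different effects of Y2 on Y1.
cov_a = observed_cov(a12=0.5, a14=1.0, a24=1.0, d1=1.0, d2=1.0, s44=1.0)
cov_b = observed_cov(a12=0.8, a14=0.4, a24=1.0, d1=1.42, d2=1.0, s44=1.0)

print(np.allclose(cov_a, cov_b))
```

Since a mean-centered Gaussian distribution is determined by its covariance matrix, no amount of data on $(Y_1, Y_2)$ alone can distinguish the two points.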
5. CONCLUSIONS
We have shown a particular instance of a system with one unobserved variable for which identification
of some relevant parameters can be established. The relationship between the notion of identification
used in this article and the one derived by Pearl is to be investigated. In particular, we need to
understand whether a partial regression coefficient in a Gaussian DAG with an unobserved variable
can summarise the intervention criterion described by Pearl. Furthermore, we should consider whether
the parametric assumptions made in this article enlarge the class of identified effects as defined by
Pearl’s criteria. However, we take the view of Cox and Wermuth (1996) that firm conclusions about
causality can rarely be drawn from one single study.
ACKNOWLEDGEMENTS
The author is grateful to Giovanni M. Marchetti, Mounir Mesbah and Nanny Wermuth for interesting and stimulating discussions and to the referees for important comments on a previous version of the paper.
REFERENCES
Bollen KA. 1989. Structural Equations with Latent Variables. Wiley: New York.
Capitanio A, Azzalini A, Stanghellini E. 2003. Graphical models for skew-normal variates. Scandinavian Journal of Statistics 30(1): 129–144.
Cox DR, Wermuth N. 1996. Multivariate Dependencies: Models, Analysis and Interpretation. Chapman & Hall: London.
Dawid AP. 1979. Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society B 41: 1–31.
Dawid AP. 2002. Influence diagrams for causal modelling and inference. International Statistical Review 70(2): 161–189.
Dempster AP. 1969. Elements of Continuous Multivariate Analysis. Addison-Wesley: Reading.
Edwards D. 2000. Introduction to Graphical Modelling, 2nd edn. Springer: New York.
Goldberger A. 1964. Econometric Theory. Wiley: New York.
Lauritzen SL, Richardson TS. 2002. Chain graph models and their causal interpretations (with discussion). Journal of the Royal Statistical Society B 64(3): 321–361.
Lindley DV. 2002. Seeing and doing: the concept of causation. International Statistical Review 70(2): 191–214.
Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann: San Mateo.
Pearl J. 1995. Causal diagrams for empirical research. Biometrika 82: 669–710.
Pearl J. 1998. Graphs, causality and structural equation models. Sociological Methods and Research 27(2): 226–284.
Rothenberg T. 1971. Identification in parametric models. Econometrica 39: 577–591.
Singpurwalla ND. 2002. On causality and causal mechanism. Comment on Dennis Lindley’s ‘Seeing and doing: the concept of causation’. International Statistical Review 70(2): 198–206.
Stanghellini E. 1997. Identification of a single-factor model using graphical Gaussian rules. Biometrika 84: 241–244.
Stanghellini E, Wermuth N. 2003. On the identification of path analysis models with one hidden variable. Biometrika, submitted.
Vicard P. 2000. On the identification of a single-factor model with correlated residuals. Biometrika 87: 199–205.
Wold HOA. 1960. A generalization of causal chain models. Econometrica 28: 443–463.
Wright S. 1923. The theory of path coefficients: a reply to Niles’ criticism. Genetics 8: 239–255.
Wright S. 1934. The method of path coefficients. Annals of Mathematical Statistics 5: 161–215.