ENVIRONMETRICS

Environmetrics 2004; 15: 463–469 (DOI: 10.1002/env.676)

Instrumental variables in Gaussian directed acyclic graph models with an unobserved confounder

Elena Stanghellini*,†

Dipartimento di Scienze Statistiche, Università di Perugia, Italy

SUMMARY

We discuss the problem of identification of relevant parameters in a DAG model for Gaussian variables with one unobserved variable that acts as a confounder. We first make explicit what we mean by identification and then discuss an example where a modified notion of instrumental variable renders a system with a confounder identifiable. Copyright © 2004 John Wiley & Sons, Ltd.

key words: identification; latent variables; path analysis

1. INTRODUCTION

Recent literature has focused on the notion of identification of relevant parameters in probabilistic

models represented by directed acyclic graphs (DAGs). The renewed interest lies in the fact that, under

some assumptions, relevant parameters of a DAG model can be given a causal interpretation.

The debate about the relationship between causality and probability has raged in the statistical

literature for a long time (see Edwards, 2000, Ch. 8, for early references) but has become explicit after

the work of Pearl (1995). The motivation of Pearl’s work is that we are often interested in the effects of

an intervention that forces one or more of the variables in a system to take on some externally assigned

values or value. As an instance, in a study on the relationship between the level of lead in the blood of

an individual and his/her psychological functioning, we want to establish how a hypothetical intervention

that forces the level of lead to become deliberately high affects some performance indicators.

Under some assumptions on how the distribution of the remaining variables is modified by the

intervention, Pearl derives conditions for identification of causal effects from DAG models. The

conditions are based on the graph and can be derived using graphical rules. As noticed by Dawid

(2002), this view coincides with enhancing a probabilistic model described by a DAG with a further

causal interpretation that is not implicit in its nature. Dawid calls the graph with this modified

semantics an intervention DAG and discusses further directed acyclic graphs from which a causal

interpretation can be extracted. A detailed account of the debate can also be found in the papers of

Pearl (1998), Lauritzen and Richardson (2002), Lindley (2002) and Singpurwalla (2002).


*Correspondence to: E. Stanghellini, Dipartimento di Scienze Statistiche, Università di Perugia, Italy.
†E-mail: [email protected]

In this article, conditions are discussed for identification of relevant parameters in a DAG model

for Gaussian variables with one unobserved variable that acts as a confounder. We first make

explicit what we mean by identification and then discuss an example where a modified notion of

instrumental variable renders a system with an unobserved confounder identifiable. This example is

a particular instance of systems with one latent variable as studied by Stanghellini and Wermuth

(2003).

In Section 2 the relationships between a univariate generating process and induced conditional

independence graphs are recalled and in Section 3 path analysis is introduced together with the

associated directed acyclic graph; there we also frame the problem of latent variables within the

notion of identification. In Section 4 we present an example of a modified notion of

instrumental variables. The relationship between the identification in the sense used in this article

and the identification of causal effects needs to be further investigated. Some remarks are made in

Section 5, in which we draw our conclusions.

2. UNIVARIATE GENERATING PROCESS AND GRAPHS

Given a vector $Y = (Y_1, \ldots, Y_k)'$ of random variables, a univariate generating process determines a

full ordering of the variables such that each variable in the ordering is potentially a response variable

for the previous ones and an explanatory variable for the following ones. The joint density of the

variables in Y can be factorized accordingly into k univariate (conditional) densities:

$$f_{1,\ldots,k}(Y_1, \ldots, Y_k) = f_k(Y_k) \prod_{i=1}^{k-1} f_i\left(Y_i \mid Y_{\mathrm{par}(i)} = y_{\mathrm{par}(i)}\right) \tag{1}$$

where par(i) is the subset of $\{i+1, \ldots, k\}$ containing the variables that have a direct influence on $Y_i$.

The subset $\{i+1, \ldots, k\}$ is called the potential ancestor of i. A univariate generating process is

represented by a directed acyclic graph. A DAG associated with (1) is a pair of vertices and edges,

$G^V_{\mathrm{dag}} = (V, E)$, where V is the set of vertices or nodes corresponding to the variables $Y_1, \ldots, Y_k$ and E

is the set of directed edges drawn as arrows pointing from j to i whenever $j \in \mathrm{par}(i)$. Nodes with a

directed edge originating from j are called the children of j and their set is denoted by $\mathrm{chl}(j)$. The

defining independence structure of a DAG is

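$$\{\, i \perp\!\!\!\perp \text{potential ancestor of } i \text{ excluding } \mathrm{par}(i) \mid \mathrm{par}(i) \,\} \tag{2}$$

In Figure 1 a DAG with $k = 6$ is represented. The joint distribution of Y factorizes as follows:

$$f_{1,\ldots,6}(Y_1, \ldots, Y_6) = f_6(Y_6)\, f_5(Y_5 \mid Y_6 = y_6)\, f_4(Y_4)\, f_3(Y_3 \mid Y_4 = y_4, Y_5 = y_5)\, f_2(Y_2 \mid Y_3 = y_3)\, f_1(Y_1 \mid Y_3 = y_3, Y_4 = y_4) \tag{3}$$

and the defining independence structure is:

$$1 \perp\!\!\!\perp \{2, 5, 6\} \mid \{3, 4\}, \qquad 2 \perp\!\!\!\perp \{4, 5, 6\} \mid 3, \qquad 3 \perp\!\!\!\perp 6 \mid \{4, 5\}, \qquad 4 \perp\!\!\!\perp \{5, 6\} \tag{4}$$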
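The step from the factorization to the defining independence structure can be sketched in a few lines of Python. This is only an illustration: the parent sets below are read off factorization (3), and the names `PARENTS` and `defining_independences` are introduced here and do not come from the article.

```python
# Parent sets read off factorization (3) for the DAG of Figure 1.
PARENTS = {1: {3, 4}, 2: {3}, 3: {4, 5}, 4: set(), 5: {6}, 6: set()}
K = 6


def defining_independences(parents, k):
    """Return {i: (independent_set, conditioning_set)} following statement (2)."""
    result = {}
    for i in range(1, k + 1):
        potential_ancestors = set(range(i + 1, k + 1))
        excluded = potential_ancestors - parents[i]
        if excluded:  # nothing to state when every potential ancestor is a parent
            result[i] = (excluded, parents[i])
    return result


INDEPENDENCES = defining_independences(PARENTS, K)
for node, (indep, cond) in sorted(INDEPENDENCES.items()):
    print(f"{node} _||_ {sorted(indep)} | {sorted(cond)}")
```

Nodes 5 and 6 produce no statement because all of their potential ancestors are parents (or there are none), which is why only four statements are listed.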


In this simple case it is not difficult to verify that $1 \perp\!\!\!\perp 2 \mid 3$ is also implied by (3). Given a generic

$G^V_{\mathrm{dag}} = (V, E)$, for a subset $C \subseteq V$, we may be interested in seeing if the independence $i \perp\!\!\!\perp j \mid C \setminus \{i, j\}$

is implied by (1). This may be derived by directly combining probability statements (see, for instance,

Dawid, 1979) or visually by using a separation criterion for DAGs (Pearl, 1988).
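A graphical check of this kind can be sketched as follows. The sketch does not implement Pearl's path-based criterion directly; it uses the moralization criterion (restrict to the smallest ancestral set, moralize, test ordinary separation), which is known to give the same answers for DAGs. The parent sets are those of Figure 1 as read off (3); the function names are hypothetical.

```python
from itertools import combinations

# Parent sets of the DAG in Figure 1, read off factorization (3).
PARENTS = {1: {3, 4}, 2: {3}, 3: {4, 5}, 4: set(), 5: {6}, 6: set()}


def ancestral_set(nodes, parents):
    """Smallest set containing `nodes` and all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def implied_independence(i, j, cond, parents):
    """True if i _||_ j | cond is implied by the DAG (moralization criterion)."""
    keep = ancestral_set({i, j} | set(cond), parents)
    # Moral graph: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        for p in parents[v]:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents[v], 2):
            adj[p].add(q)
            adj[q].add(p)
    # Look for a path from i to j that avoids the conditioning set.
    seen, stack = set(cond), [i]
    while stack:
        v = stack.pop()
        if v == j:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return True


print(implied_independence(1, 2, {3}, PARENTS))
```

Conditioning on a common child can create dependence: with these parent sets, nodes 4 and 5 are separated marginally but not given node 3.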

3. PATH ANALYSIS AND DIRECTED ACYCLIC GRAPHS

We assume Y to be a vector of k mean centered Gaussian random variables such that

$$AY = \varepsilon \tag{5}$$

where $A = \{-\beta_{ij}\}$ is an upper triangular matrix with ones along the diagonal, the errors $\varepsilon$ have zero

mean and are mutually independent such that $\mathrm{cov}(\varepsilon) = \Delta$ is a diagonal matrix. The linear system

(5)—not necessarily with a joint Gaussian distribution—is known as path analysis, from the work of

Wright (1923, 1934). The DAG associated with (5), $G^V_{\mathrm{dag}} = (V, E)$, has $Y_i$ corresponding to node i and

an arrow pointing from j to i whenever $\beta_{ij}$ is a non-zero coefficient. If all the $\beta_{ij}$ in (5) are different

from zero the model is saturated and the corresponding DAG is complete.

From (5) the covariance matrix $\Sigma$ and the concentration matrix $\Sigma^{-1}$ of Y are:

$$\Sigma = B \Delta B^T, \qquad \Sigma^{-1} = A^T \Delta^{-1} A \tag{6}$$

where $B = A^{-1}$. Therefore, given A and $\Delta$ the matrix $\Sigma$ (or equivalently $\Sigma^{-1}$) is uniquely determined.

The converse also holds when the full ordering of the variables is given. A structural (i, j)-zero in $\Sigma$ means that $i \perp\!\!\!\perp j$ is implied by (1). Analogously, a structural (i, j)-zero in $\Sigma^{-1}$ means that

$i \perp\!\!\!\perp j \mid V \setminus \{i, j\}$ is implied by (1).
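As a numerical illustration of (6), the sketch below builds A for the DAG of Figure 1 and lists the exact zeros of the covariance and concentration matrices. The path coefficients and error variances are made-up values chosen only for the example, and the helper `inverse` is a plain Gauss-Jordan routine written for this sketch; exact fractions are used so that zeros are not blurred by rounding.

```python
from fractions import Fraction


def inverse(m):
    """Gauss-Jordan inverse with exact Fraction arithmetic."""
    n = len(m)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical path coefficients beta[(i, j)] for the arrows j -> i of Figure 1.
BETA = {(1, 3): Fraction(1, 2), (1, 4): Fraction(1, 3), (2, 3): Fraction(2),
        (3, 4): Fraction(-1), (3, 5): Fraction(1, 4), (5, 6): Fraction(3)}
DELTA = [Fraction(v) for v in (1, 2, 1, 3, 1, 2)]  # hypothetical error variances
N = 6

# A has ones on the diagonal and -beta_ij above it, as in (5).
A = [[Fraction(int(i == j)) - BETA.get((i, j), 0) for j in range(1, N + 1)]
     for i in range(1, N + 1)]
B = inverse(A)

# Equation (6): Sigma = B Delta B^T and Sigma^{-1} = A^T Delta^{-1} A.
SIGMA = [[sum(B[r][m] * DELTA[m] * B[c][m] for m in range(N)) for c in range(N)]
         for r in range(N)]
CONC = [[sum(A[m][r] * A[m][c] / DELTA[m] for m in range(N)) for c in range(N)]
        for r in range(N)]


def zeros(matrix):
    """Pairs (i, j), i < j, in 1-based labels, whose entry is exactly zero."""
    return {(i + 1, j + 1) for i in range(N) for j in range(i + 1, N)
            if matrix[i][j] == 0}


print("zeros in Sigma:", sorted(zeros(SIGMA)))
print("zeros in Sigma^-1:", sorted(zeros(CONC)))
```

With these values the zeros of the covariance matrix correspond to the marginal independence of node 4 from nodes 5 and 6, and the zeros of the concentration matrix to the pairs that are not joined once co-parents are linked.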

A univariate generating process may contain latent variables, where latent means hidden or

unobserved variables. In this article we consider a situation where a Gaussian latent variable, which we

will indicate with $Y_L$, acts as a confounder. More precisely, we focus on the identification of a

particular coefficient $\beta_{ij}$ of system (5) in which $Y_L$ is an unobserved variable that is a parent of i and j.

Unobserved variables are denoted in a graph by a double crossing over the corresponding node. In

Figure 2, $Y_4$ is an unobserved confounder of the effect $\beta_{12}$.

In this situation, Y can be partitioned into $Y = (Y_O', Y_L)'$, with $Y_L$ a latent variable. The set

$V = \{O, L\}$ is partitioned accordingly. In the following we indicate with $M_{ab}$ the block $[M]_{a,b}$ of a

matrix M and with $M^{ab}$ the block $[M^{-1}]_{a,b}$ of its inverse. The covariance matrix $\Sigma$ and the

concentration matrix $\Sigma^{-1}$ of Y are therefore:

$$\Sigma = \begin{pmatrix} \Sigma_{OO} & \Sigma_{OL} \\ \Sigma_{LO} & \sigma_{LL} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{OO} & \Sigma^{OL} \\ \Sigma^{LO} & \sigma^{LL} \end{pmatrix} \tag{7}$$

Figure 1. An example of a DAG model

We now apply the well known results for the inverse of partitioned matrices (see, for example,

Dempster, 1969). We set $\Sigma^{-1}_{OO.L} = \Sigma^{OO}$ and $\sigma^{-1}_{LL.O} = \sigma^{LL}$. Then

$$\Sigma_{OO} = \Sigma_{OL}\, \sigma_{LL}^{-1}\, \Sigma_{LO} + \Sigma_{OO.L} \tag{8}$$

and

$$\Sigma_{OO}^{-1} = -\delta \delta^T + \Sigma^{-1}_{OO.L} \tag{9}$$

in which $\delta = \sqrt{\sigma_{LL.O}}\, \Sigma^{OL}$. By identification of a coefficient $\beta_{ij}$ we mean that the parameter $\beta_{ij}$ can

be uniquely reconstructed from $\Sigma_{OO}$ or its inverse.
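For readers who want the intermediate step, (9) can be obtained from the standard partitioned-inverse identity applied to the block of observed variables. The following is a short sketch in the notation above, with $\delta_i$ denoting the i-th component of $\delta$ (a label introduced here for convenience):

$$\Sigma_{OO}^{-1} = \Sigma^{OO} - \Sigma^{OL} (\sigma^{LL})^{-1} \Sigma^{LO} = \Sigma^{-1}_{OO.L} - \sigma_{LL.O}\, \Sigma^{OL} \Sigma^{LO} = \Sigma^{-1}_{OO.L} - \delta \delta^T$$

Read entry by entry, the (i, j) element of $\Sigma_{OO}^{-1}$ equals the (i, j) element of $\Sigma^{-1}_{OO.L}$ minus $\delta_i \delta_j$, so marginalizing over the latent variable perturbs the concentration matrix by a rank-one term only.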

A common assumption of latent variable models, which we will follow here, is that $E(Y_L) = 0$.

This renders $E(Y)$ immediately identifiable from the observable variables and offers a theoretical

justification for dealing with mean centered Gaussian random variables. For the derivation in this

paper, we will not need the further normalising assumption that $\sigma_{LL} = 1$.

We note that if all the parameters in A and $\Delta$ can be uniquely reconstructed from the distribution of

the observed variables, the model is globally identified (see Rothenberg, 1971). Global identification

of (5) when all variables are observed has been established by Wold (1960) (see also Goldberger, 1964,

p. 383). The problem of global identification of structural equation models with one unobserved

variable has been addressed by Stanghellini and Wermuth (2003) leading to sufficient conditions of

identification of DAG models with one unobserved variable.

4. INSTRUMENTAL VARIABLES: SOME NEW PERSPECTIVES

Suppose that the system is as the one represented in Figure 2, with $k = 4$ Gaussian variables such that

$\beta_{13} = \beta_{34} = 0$ only. Suppose that the aim is to estimate the parameter $\beta_{12}$, but the variable $Y_4$ is not

observed. The marginal independence between $Y_3$ and $Y_4$ renders the parameter identified, as

$\beta_{12} = \sigma_{13} \sigma_{23}^{-1}$.
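The identity can be checked on a small example. The sketch below assumes the arrows of Figure 2 are 2 → 1, 4 → 1, 3 → 2 and 4 → 2 (the only ones left once the two stated coefficients vanish) and uses made-up coefficient and variance values. It compares the instrumental-variable ratio with the naive regression slope of $Y_1$ on $Y_2$, which is distorted by the confounder.

```python
from fractions import Fraction


def inverse(m):
    """Gauss-Jordan inverse with exact Fraction arithmetic."""
    n = len(m)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical coefficients for the arrows j -> i; beta_13 = beta_34 = 0.
BETA = {(1, 2): Fraction(1, 2), (1, 4): Fraction(2),
        (2, 3): Fraction(3), (2, 4): Fraction(-1)}
DELTA = [Fraction(v) for v in (1, 1, 2, 1)]  # hypothetical error variances
N = 4

A = [[Fraction(int(i == j)) - BETA.get((i, j), 0) for j in range(1, N + 1)]
     for i in range(1, N + 1)]
B = inverse(A)
SIGMA = [[sum(B[r][m] * DELTA[m] * B[c][m] for m in range(N)) for c in range(N)]
         for r in range(N)]


def s(i, j):
    """Covariance sigma_ij with 1-based node labels."""
    return SIGMA[i - 1][j - 1]


IV_RATIO = s(1, 3) / s(2, 3)     # uses only the observed nodes 1, 2, 3
NAIVE_SLOPE = s(1, 2) / s(2, 2)  # regression of Y1 on Y2, ignoring node 4
print(IV_RATIO, NAIVE_SLOPE)
```

Only covariances among the observed nodes enter `IV_RATIO`, which is the point of the construction.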

This problem is known in the econometric literature as a system with an instrumental variable.

In order to estimate the regression coefficient between two variables with an unobserved confounder,

it is crucial to have an ‘instrument’, that is a third variable that is marginally correlated with the

explanatory variable and marginally independent from the unobserved variable. This assumption can

never be tested from the marginal distribution of the observed variables.

Figure 2. A DAG with node 4 unobserved and node 3 acting as instrument for $\beta_{12}$

Instrumental variables have also been used within the factor analysis models (see Bollen, 1989,

p. 409). In this context the latent variable is such that the observed variables are independent conditionally

on that variable, that is, the covariance matrix $\Sigma_{OO.L}$, and therefore its inverse, is diagonal.

Here we consider more complex situations that may arise in observational studies. Suppose again

that we want to estimate the regression coefficient between the concentration level of lead in the blood

and some indicator of the psychological functioning. Suppose that we know that there is an unobserved

variable (the general level of air pollution, say) that influences both the explanatory and the response

variable. Suppose further that any suitable instrument cannot be assumed to be marginally independent

from the unobserved variable. More complex systems should therefore be taken into account.

We argue that the analysis of the implied structure of the concentration matrix of the observable

variables, $\Sigma_{OO}^{-1}$, may lead to the identification of relevant effects in complex systems with an unobserved variable

acting as a confounder. We do it by showing that the model represented in Figure 3 with

$O = \{1, 2, 3, 5\}$ and $L = \{4\}$ allows the effect $\beta_{12}$ to be identified. We indicate with $\sigma^{ij.L}$ the

concentration between $Y_i$ and $Y_j$, $i, j \in O$, after conditioning on the variables in $V \setminus \{i, j\}$ and with

$\sigma^{ij(L)}$ the concentration between the observed variables $Y_i$ and $Y_j$, $i, j \in O$, after conditioning on the

variables in $O \setminus \{i, j\}$ (that is after marginalizing over L). In other words, the first is an element of $\Sigma^{-1}_{OO.L}$,

whereas the second is an element of $\Sigma_{OO}^{-1}$.

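We note that $\beta_{12} = -\sigma^{12.4} / \sigma^{11.4}$ but those elements involve conditioning on all the other variables,

which cannot be performed as $Y_4$ is not observed. However, after some calculations based on (9) and on the zeros of $\Sigma^{-1}_{OO.L}$ implied by the graph, we see that

$$(\delta_1)^2 = -\frac{\sigma^{13(4)}\, \sigma^{15(4)}}{\sigma^{53(4)}}$$

with the elements of the right-hand side identified, as they involve only the distribution of $Y_O$ after

integrating over $Y_L$, which is observable. Furthermore, by noting that $\sigma^{15(4)} = -\delta_1 \delta_5$ and $\sigma^{25(4)} = -\delta_2 \delta_5$ we

see that $\delta_1 \delta_2 = (\delta_1)^2\, \sigma^{25(4)} / \sigma^{15(4)}$ is identified. Therefore, from (9) we see that

$$\sigma^{12.4} = \sigma^{12(4)} + \delta_1 \delta_2$$

and

$$\sigma^{11.4} = \sigma^{11(4)} + (\delta_1)^2$$

which shows that $\beta_{12}$ is identified.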
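The argument can be traced numerically. Since Figure 3 is not reproduced here, the sketch below ASSUMES one DAG that has the zero pattern the derivation relies on (no direct links between the pairs (1, 3), (1, 5), (3, 5) and (2, 5) once node 4 is conditioned on): arrows 5 → 4, 4 → 3, 4 → 2, 4 → 1, 3 → 2 and 2 → 1. The coefficient values and unit error variances are made up, and the signs follow the convention of (9), in which the marginal concentration equals the conditional one minus $\delta_i \delta_j$.

```python
from fractions import Fraction


def inverse(m):
    """Gauss-Jordan inverse with exact Fraction arithmetic."""
    n = len(m)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# ASSUMED arrows j -> i, chosen only to match the zero pattern used in the text.
BETA = {(1, 2): Fraction(1, 2), (1, 4): Fraction(1), (2, 3): Fraction(1),
        (2, 4): Fraction(1), (3, 4): Fraction(2), (4, 5): Fraction(1)}
N = 5
A = [[Fraction(int(i == j)) - BETA.get((i, j), 0) for j in range(1, N + 1)]
     for i in range(1, N + 1)]
B = inverse(A)
# Unit error variances (hypothetical), so Sigma = B B^T.
SIGMA = [[sum(B[r][m] * B[c][m] for m in range(N)) for c in range(N)]
         for r in range(N)]

OBSERVED = [1, 2, 3, 5]  # node 4 is latent
SIGMA_OO = [[SIGMA[i - 1][j - 1] for j in OBSERVED] for i in OBSERVED]
CONC_OO = inverse(SIGMA_OO)  # entries sigma^{ij(4)}


def c(i, j):
    """Concentration of observed nodes i, j after marginalizing over node 4."""
    return CONC_OO[OBSERVED.index(i)][OBSERVED.index(j)]


# Every quantity below is computed from the observed block only.
DELTA1_SQ = -c(1, 3) * c(1, 5) / c(5, 3)
DELTA1_DELTA2 = DELTA1_SQ * c(2, 5) / c(1, 5)
BETA12_RECOVERED = -(c(1, 2) + DELTA1_DELTA2) / (c(1, 1) + DELTA1_SQ)
print(BETA12_RECOVERED)
```

In this assumed graph node 3 is a child of the latent node 4, so it is not marginally independent of it, yet the coefficient of the arrow 2 → 1 is still recovered from the covariance matrix of the four observed nodes.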

Figure 3. A DAG with node 4 unobserved and nodes {3, 5} acting as instruments for $\beta_{12}$



This result implies that, if more variables are available that act as ‘instruments’ in a general sense,

there are systems with an unobserved variable acting as a confounder in which the effect of interest

can be identified. Note that also in this case the conditional independencies involving $Y_4$ in the

conditioning set cannot be tested from the observable distribution. Nevertheless, this model constitutes

an alternative model for solving the identification problem of $\beta_{12}$.

Note that, to identify the system, it is crucial that the matrix $\Sigma_{OO.L}$, or its inverse, contains a

particular constellation of off-diagonal zeros. As zeros in those matrices imply some marginal or

conditional independencies, identified models can be visually characterized. This has been done in

Stanghellini and Wermuth (2003), where sufficient conditions for identifiability of Gaussian DAG

models having one hidden node are presented. These conditions are based on the properties of the

graph and can be derived using graphical rules (see also Stanghellini, 1997; Vicard, 2000 for a related

problem). Stanghellini and Wermuth (2003) treat further the identifiability of a model when the hidden

variable is conditioned on, by establishing a connection to the extended skew-normal distribution

(Capitanio et al., 2003).

Identification of some relevant parameters is the first step for solving the problem of inference on

those parameters, no matter whether we take a frequentist or a Bayesian approach. If a parameter is

not identified, the likelihood function is flat along ridges in the parameter space.
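A minimal illustration of what failure of identification looks like, under assumptions made only for this sketch: take two observed variables with an arrow 2 → 1 and a latent node 4 pointing to both, and no instrument. The function `observed_cov` and all numerical values are hypothetical. Two different settings of the effect of $Y_2$ on $Y_1$ then give exactly the same covariance matrix for the observed pair, so no amount of data on $(Y_1, Y_2)$ can tell them apart.

```python
from fractions import Fraction


def observed_cov(b12, b14, b24, d1, d2, s44=Fraction(1)):
    """Covariance of (Y1, Y2) when Y2 -> Y1 and a latent Y4 points to both.

    Model: Y2 = b24*Y4 + e2, Y1 = b12*Y2 + b14*Y4 + e1,
    with var(Y4) = s44, var(e1) = d1, var(e2) = d2.
    """
    var2 = b24 ** 2 * s44 + d2
    cov12 = b12 * var2 + b14 * b24 * s44
    var1 = b12 ** 2 * var2 + 2 * b12 * b14 * b24 * s44 + b14 ** 2 * s44 + d1
    return [[var1, cov12], [cov12, var2]]


# Two parameter points that differ in the effect of Y2 on Y1.
FIRST = observed_cov(Fraction(1, 2), Fraction(1, 2), Fraction(1),
                     Fraction(1), Fraction(1))
SECOND = observed_cov(Fraction(1, 4), Fraction(1), Fraction(1),
                      Fraction(5, 8), Fraction(1))
print(FIRST == SECOND)
```

Because the Gaussian likelihood of the observed variables depends on the parameters only through this covariance matrix, it takes the same value at both points.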

5. CONCLUSIONS

We have shown a particular instance of a system with one unobserved variable for which identification

of some relevant parameters can be established. The relationship between the notion of identification

used in this article and the one derived by Pearl is to be investigated. In particular, we need to

understand whether a partial regression coefficient in a Gaussian DAG with an unobserved variable

can summarise the intervention criterion described by Pearl. Furthermore, we should consider whether

the parametric assumptions made in this article enlarge the class of identified effects as defined by

Pearl’s criteria. However, we take the view of Cox and Wermuth (1996) that firm conclusions about

causality can rarely be drawn from one single study.

ACKNOWLEDGEMENTS

The author is grateful to Giovanni M. Marchetti, Mounir Mesbah and Nanny Wermuth for interesting and stimulating discussions and to the referees for important comments on a previous version of the paper.

REFERENCES

Bollen KA. 1989. Structural Equations with Latent Variables. Wiley: New York.
Capitanio A, Azzalini A, Stanghellini E. 2003. Graphical models for skew-normal variates. Scandinavian Journal of Statistics 30(1): 129–144.
Cox DR, Wermuth N. 1996. Multivariate Dependencies: Models, Analysis and Interpretation. Chapman & Hall: London.
Dawid AP. 1979. Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society B 41: 1–31.
Dawid AP. 2002. Influence diagrams for causal modelling and inference. International Statistical Review 70(2): 161–189.
Dempster AP. 1969. Elements of Continuous Multivariate Analysis. Addison-Wesley: Reading.
Edwards D. 2000. Introduction to Graphical Modelling, 2nd edn. Springer: New York.
Goldberger A. 1964. Econometric Theory. Wiley: New York.
Lauritzen SL, Richardson TS. 2002. Chain graph models and their causal interpretations (with Discussion). Journal of the Royal Statistical Society B 64(3): 321–361.
Lindley DV. 2002. Seeing and doing: the concept of causation. International Statistical Review 70(2): 191–214.
Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann: San Mateo.
Pearl J. 1995. Causal diagrams for empirical research. Biometrika 82: 669–710.
Pearl J. 1998. Graphs, causality and structural equation models. Sociological Methods and Research 27(2): 226–284.
Rothenberg T. 1971. Identification in parametric models. Econometrica 39: 577–591.
Singpurwalla ND. 2002. On causality and causal mechanism. Comment on Dennis Lindley’s ‘Seeing and doing: the concept of causation’. International Statistical Review 70(2): 198–206.
Stanghellini E. 1997. Identification of a single-factor model using graphical Gaussian rules. Biometrika 84: 241–244.
Stanghellini E, Wermuth N. 2003. On the identification of path analysis models with one hidden variable. Biometrika, submitted.
Vicard P. 2000. On the identification of a single-factor model with correlated residuals. Biometrika 87: 199–205.
Wold HOA. 1960. A generalization of causal chain models. Econometrica 28: 443–463.
Wright S. 1923. The theory of path coefficients: a reply to Niles’ criticism. Genetics 8: 239–255.
Wright S. 1934. The method of path coefficients. Annals of Mathematical Statistics 5: 161–215.


