Non-Gaussian structural equation models for causal discovery

Shohei Shimizu, Osaka University, Japan. Non-Gaussian structural equation models for causal discovery. 2016 Probabilistic Graphical Model Workshop: Sparsity, Structure and High-dimensionality. References: https://sites.google.com/site/sshimizu06/home/lingampapers


Post on 24-Jan-2017


TRANSCRIPT

Page 1: Non-Gaussian structural equation models for causal discovery

Shohei Shimizu

Osaka University, Japan

Non-Gaussian structural equation models for causal discovery

2016 Probabilistic Graphical Model Workshop: Sparsity, Structure and High-dimensionality

References: https://sites.google.com/site/sshimizu06/home/lingampapers

Page 2: Non-Gaussian structural equation models for causal discovery

Abstract

• Estimation of the causal direction and connection strength of two observed variables in the presence of hidden common causes
• This is a key challenge in causal discovery
• We propose a non-Gaussian model
– It does not require us to specify the number of hidden common causes

Page 3: Non-Gaussian structural equation models for causal discovery

Illustrative example

Page 4: Non-Gaussian structural equation models for causal discovery

Significant correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM)

[Figure: scatter plot, 2002-2011. x-axis: chocolate consumption (kg/yr/capita); y-axis: number of Nobel laureates per 10 million population. Corr. 0.791, P-value < 0.001]

Page 5: Non-Gaussian structural equation models for causal discovery

Eating more chocolate increases the number of Nobel laureates??

• Interpretational drift (Maurage+13, J. Nutrition)

[Figure: scatter plot of Nobel against Chocolate (Corr. 0.791, P-value < 0.001) beside three alternative causal graphs over Chclt, Nobel and GDP, where GDP is a hidden common cause. Label between the data and the graphs: "Manage this gap!"]

Page 6: Non-Gaussian structural equation models for causal discovery

Under what conditions can we manage this gap?

• We have shown that it is possible under three assumptions (Hoyer+08IJAR; Shimizu+14JMLR)
– Linearity
– Acyclicity
– Non-Gaussianity
• Performing interventions is often very hard
• The theory is closely related to independent component analysis (ICA) (Hyvarinen+01)

Page 7: Non-Gaussian structural equation models for causal discovery

Many application areas

• Epidemiology: do sleep problems cause depression mood, or does depression mood cause sleep problems? Improving health and QOL (Rosenstrom et al., 2012)
• Economics: relations among OpInc.gr, Empl.gr, Sales.gr and R&D.gr at times t, t+1 and t+2. Policy evaluation (Moneta et al., 2012)
• Neuroscience: causal information flow (Boukrina & Graves, 2013)
• Chemistry: what changes absorption spectra? (Campomanes et al., 2014)

Page 8: Non-Gaussian structural equation models for causal discovery

Brief review of structural causal models

Page 9: Non-Gaussian structural equation models for causal discovery

Structural causal models (Pearl, 2000)

• A framework for describing causal relations (or data generating processes)
• An example of linear cases:

$$x_1 := e_1, \qquad x_2 := b_{21} x_1 + e_2$$

[Graph: e1 → x1 → x2 ← e2; e1 and e2 dependent]

• Generally speaking, if the value of x1 has been changed and then that of x2 changes, then x1 causes x2

Page 10: Non-Gaussian structural equation models for causal discovery

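Changing the value of x1 from c to d

• Replace the function determining x1 with a constant c, denoted by do(x1=c), and then change the constant to d (Pearl, 2000)

Original model:

$$x_1 = e_1, \qquad x_2 = b_{21} x_1 + e_2$$

Intervention: do(x1=c)

$$x_1 = c, \qquad x_2 = b_{21} x_1 + e_2$$

[Graphs: before the intervention, e1 → x1 → x2 ← e2; after it, c → x1 → x2 ← e2]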

Page 11: Non-Gaussian structural equation models for causal discovery

Average causal effect (Rubin, 1974; Pearl, 2000)

• Average causal effect of x1 on x2 when changing x1 from c to d
– Computed based on the models with do(x1=d) and do(x1=c)

$$E(x_2 \mid do(x_1 = d)) - E(x_2 \mid do(x_1 = c)) = b_{21}(d - c)$$

If you have changed the value of x1 from c to d, then E(x2) will change by b21(d − c).
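The average causal effect in the linear model x1 = e1, x2 = b21 x1 + e2 can be checked numerically. The sketch below is only an illustration: the coefficient value, the Laplace error distribution, the sample size and the function name `simulate_x2_under_do` are all assumptions, not taken from the slides.

```python
import numpy as np

def simulate_x2_under_do(x1_value, b21, n, rng):
    """Sample x2 from the intervened model do(x1 = x1_value)."""
    e2 = rng.laplace(0.0, 1.0, size=n)   # non-Gaussian error (assumed Laplace)
    x1 = np.full(n, x1_value)            # x1 is fixed by the intervention
    return b21 * x1 + e2

rng = np.random.default_rng(0)
b21, c, d, n = 0.8, 1.0, 3.0, 200_000    # hypothetical values

# Monte Carlo estimate of E(x2 | do(x1=d)) - E(x2 | do(x1=c))
ace_hat = (simulate_x2_under_do(d, b21, n, rng).mean()
           - simulate_x2_under_do(c, b21, n, rng).mean())
ace_true = b21 * (d - c)                 # closed form from the slide
print(ace_hat, ace_true)
```

The estimate should be close to b21(d − c) regardless of the error distribution, since the error has mean zero in both intervened models.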

Page 12: Non-Gaussian structural equation models for causal discovery

Formulating the problem

Page 13: Non-Gaussian structural equation models for causal discovery

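Estimation of causal direction

• Suppose that data X was randomly generated from either of the following two models:

Model 1:

$$x_1 = e_1, \qquad x_2 = b_{21} x_1 + e_2 \qquad (b_{21} \neq 0)$$

or Model 2:

$$x_1 = b_{12} x_2 + e_1, \qquad x_2 = e_2 \qquad (b_{12} \neq 0)$$

[Graphs: Model 1 is x1 → x2 with coefficient b21; Model 2 is x1 ← x2 with coefficient b12; errors e1 → x1 and e2 → x2]

• Estimate which model generated the data X based on the data X only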

Page 14: Non-Gaussian structural equation models for causal discovery

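Major difficulty

• Errors e1 and e2 are often dependent
• The regression coefficient of x2 on x1 is not equal to b21 even if we know the right causal direction

Model 1:

$$x_1 = e_1, \qquad x_2 = b_{21} x_1 + e_2 \qquad (b_{21} \neq 0)$$

or Model 2:

$$x_1 = b_{12} x_2 + e_1, \qquad x_2 = e_2 \qquad (b_{12} \neq 0)$$

[Graphs: Model 1 is x1 → x2 with coefficient b21; Model 2 is x1 ← x2 with coefficient b12; the errors e1 and e2 are connected to indicate dependence]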
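A short derivation may make the regression claim concrete. It assumes Model 1 (x1 = e1, x2 = b21 x1 + e2) with zero-mean errors that are allowed to be correlated; the cov and var notation is introduced here only for illustration.

```latex
\begin{align*}
\mathrm{cov}(x_1, x_2)
  &= \mathrm{cov}(e_1,\; b_{21} e_1 + e_2)
   = b_{21}\,\mathrm{var}(e_1) + \mathrm{cov}(e_1, e_2), \\
\frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_1)}
  &= b_{21} + \frac{\mathrm{cov}(e_1, e_2)}{\mathrm{var}(e_1)}.
\end{align*}
```

The left-hand side of the second line is the least-squares coefficient of x2 on x1, so it equals b21 only when e1 and e2 are uncorrelated.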

Page 15: Non-Gaussian structural equation models for causal discovery

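Hidden common causes

• Such dependency is typically introduced by hidden common causes, say f1

Model 1':

$$x_1 = \underbrace{\lambda_{11} f_1 + e'_1}_{e_1}, \qquad x_2 = b_{21} x_1 + \underbrace{\lambda_{21} f_1 + e'_2}_{e_2}$$

or Model 2':

$$x_1 = b_{12} x_2 + \underbrace{\lambda_{11} f_1 + e'_1}_{e_1}, \qquad x_2 = \underbrace{\lambda_{21} f_1 + e'_2}_{e_2}$$

[Graphs: f1 → x1 and f1 → x2 in both models; x1 → x2 (b21) in Model 1', x1 ← x2 (b12) in Model 2'; errors e'1 → x1 and e'2 → x2]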

Page 16: Non-Gaussian structural equation models for causal discovery

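A well-known guideline (Pearl2000; Spirtes+1993)

• Observe the hidden common cause f1, incorporate it in the models, and carry out three-variable analysis
• Errors e'1, e'2 independent!

Model 1':

$$x_1 = \lambda_{11} f_1 + e'_1, \qquad x_2 = b_{21} x_1 + \lambda_{21} f_1 + e'_2$$

or Model 2':

$$x_1 = b_{12} x_2 + \lambda_{11} f_1 + e'_1, \qquad x_2 = \lambda_{21} f_1 + e'_2$$

[Graphs: f1 → x1 and f1 → x2 in both models; x1 → x2 (b21) in Model 1', x1 ← x2 (b12) in Model 2'; errors e'1 → x1 and e'2 → x2]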

Page 17: Non-Gaussian structural equation models for causal discovery

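Following the guideline is often very hard

• A large number of hidden common causes f1, f2, …, fQ may exist (Q unknown)
• Often no idea what they are

Model 1':

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1, \qquad x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

or Model 2':

$$x_1 = b_{12} x_2 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

[Graphs: f1, …, fQ point to both x1 and x2 in both models; x1 → x2 (b21) in Model 1', x1 ← x2 (b12) in Model 2'; errors e'1 → x1 and e'2 → x2]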

Page 18: Non-Gaussian structural equation models for causal discovery

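Estimation of causal direction in the presence of hidden common causes

• Estimate which model generated the data X

Model 1':

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1, \qquad x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

or Model 2':

$$x_1 = b_{12} x_2 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

[Graphs: f1, …, fQ point to both x1 and x2 in both models; x1 → x2 (b21) in Model 1', x1 ← x2 (b12) in Model 2'; errors e'1 → x1 and e'2 → x2]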

Page 19: Non-Gaussian structural equation models for causal discovery

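Note

• If we intervene on x1 (and x2), we have no hidden common causes
• But interventions are often difficult to carry out for ethical and cost reasons

Model 1':

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1, \qquad x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

Model 1'' (x1 set to the constant c):

$$x_1 = c, \qquad x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$$

[Graphs: in Model 1', f1, …, fQ point to both x1 and x2; in Model 1'', x1 is determined by c alone, so f1, …, fQ point only to x2]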

Page 20: Non-Gaussian structural equation models for causal discovery

Major challenges

1. Estimation of causal direction when temporal information is not available

[Graph: x1 → x2 or x1 ← x2?]

2. Managing hidden common causes

[Graph: x1 → x2 or x1 ← x2, with a hidden common cause f1 of x1 and x2?]

Page 21: Non-Gaussian structural equation models for causal discovery

Basic non-Gaussian model (No hidden common cause)

S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. Journal of Machine Learning Research, 2006.

Page 22: Non-Gaussian structural equation models for causal discovery

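Independent errors

• Implying no hidden common causes
• The two models are distinguishable if the errors e1 and e2 are non-Gaussian (Dodge+00CSTM; Shimizu+06JMLR)

Model 1:

$$x_1 = e_1, \qquad x_2 = b_{21} x_1 + e_2$$

or Model 2:

$$x_1 = b_{12} x_2 + e_1, \qquad x_2 = e_2$$

$$(b_{12}, b_{21} \neq 0)$$

[Graphs: Model 1 is x1 → x2 with coefficient b21; Model 2 is x1 ← x2 with coefficient b12; errors e1 → x1 and e2 → x2]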

Page 23: Non-Gaussian structural equation models for causal discovery

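Different directions give different data distributions

Model 1:

$$x_1 = e_1, \qquad x_2 = 0.8 x_1 + e_2$$

Model 2:

$$x_1 = 0.8 x_2 + e_1, \qquad x_2 = e_2$$

$$E(e_1) = E(e_2) = 0, \qquad \mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$$

[Figure: scatter plots of x1 and x2 for Model 1 and Model 2, once with Gaussian errors and once with non-Gaussian errors]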
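The difference between Gaussian and non-Gaussian data can be reproduced with a small simulation. The sketch below generates data from Model 1 (x1 = e1, x2 = 0.8 x1 + e2, unit variances), regresses in both directions, and measures how strongly the residual still depends on the regressor. The dependence measure (absolute correlation between squared regressor and squared residual), the uniform error distribution and the function names are assumptions chosen for illustration; this is not the estimator used in the cited papers.

```python
import numpy as np

def residual_dependence(cause, effect):
    """|corr(cause^2, residual^2)| after least-squares regression of effect on cause."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = np.dot(cause, effect) / np.dot(cause, cause)
    resid = effect - slope * cause       # uncorrelated with cause by construction
    return abs(np.corrcoef(cause**2, resid**2)[0, 1])

def simulate_model1(n, rng, gaussian):
    """x1 = e1, x2 = 0.8 x1 + e2 with var(x1) = var(x2) = 1."""
    if gaussian:
        e1 = rng.normal(0.0, 1.0, n)
        e2 = rng.normal(0.0, 0.6, n)                  # var 0.36
    else:
        half = np.sqrt(3.0)                           # uniform(-a, a) has var a^2/3
        e1 = rng.uniform(-half, half, n)
        e2 = rng.uniform(-0.6 * half, 0.6 * half, n)
    x1 = e1
    x2 = 0.8 * x1 + e2
    return x1, x2

rng = np.random.default_rng(0)

x1, x2 = simulate_model1(20_000, rng, gaussian=False)
forward = residual_dependence(x1, x2)    # correct direction: residual is e2
backward = residual_dependence(x2, x1)   # wrong direction: residual depends on x2

g1, g2 = simulate_model1(20_000, rng, gaussian=True)
g_forward = residual_dependence(g1, g2)
g_backward = residual_dependence(g2, g1)

print("non-Gaussian:", forward, backward)
print("Gaussian:    ", g_forward, g_backward)
```

With non-Gaussian errors only the correct direction leaves a residual that looks independent of the regressor; with Gaussian errors both directions do, so the measure cannot tell them apart.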

Page 24: Non-Gaussian structural equation models for causal discovery

Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994)

• Observed random vector x is modeled by

$$x = As \qquad \text{or} \qquad x_i = \sum_{j=1}^{p} a_{ij} s_j$$

where
– The mixing matrix A = [a_ij]
– The hidden variables (independent components) s_i are non-Gaussian and mutually independent
• Then, A is identifiable up to permutation and scaling of the columns
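The phrase "up to permutation and scaling of the columns" can be illustrated directly: permuting and rescaling the columns of A while applying the inverse operations to s gives exactly the same observed x, so the data alone cannot fix the column order or scale. The matrices and the Laplace components below are hypothetical values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

s = rng.laplace(size=(2, n))             # independent non-Gaussian components
A = np.array([[1.0, 0.0],
              [0.8, 1.0]])               # hypothetical mixing matrix
x = A @ s

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # permutation of the two columns
D = np.diag([2.0, -0.5])                 # arbitrary non-zero scaling

A_alt = A @ P @ D                        # permuted and rescaled mixing matrix
s_alt = np.linalg.inv(D) @ P.T @ s       # still independent and non-Gaussian
x_alt = A_alt @ s_alt

print(np.allclose(x, x_alt))
```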

Page 25: Non-Gaussian structural equation models for causal discovery

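Sketch of the identifiability proof

• Different directions give different zero/non-zero patterns of the mixing matrices
– No zeros on the diagonal in the causal model
– No permutation indeterminacy

Model 1 (x1 = e1, x2 = b21 x1 + e2), written as x = As:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

Model 2 (x1 = b12 x2 + e1, x2 = e2), written as x = As:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

[Graphs: Model 1 is x1 → x2 and Model 2 is x1 ← x2, each with errors e1 → x1 and e2 → x2; the zero entry of A is highlighted in each case]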
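The two facts in the proof sketch (no zeros on the diagonal, hence no permutation indeterminacy) suggest a simple way to undo the ICA ambiguity in the two-variable case. The sketch below starts from a mixing matrix of Model 1 whose columns were permuted and rescaled, picks the column order that keeps the diagonal away from zero, rescales the diagonal to 1, and reads off the direction. The function name `unscramble` and all numbers are assumptions; practical ICA-based LiNGAM estimation has to deal with estimation noise and more variables, which this sketch ignores.

```python
import numpy as np
from itertools import permutations

def unscramble(A_est):
    """Reorder and rescale columns so that the diagonal is non-zero and equal to 1."""
    p = A_est.shape[0]
    # Choose the column order whose diagonal is farthest from containing a zero.
    best = max(permutations(range(p)),
               key=lambda perm: np.prod(np.abs(np.diag(A_est[:, list(perm)]))))
    A_perm = A_est[:, list(best)]
    return A_perm / np.diag(A_perm)      # divides column j by its diagonal entry

A_true = np.array([[1.0, 0.0],
                   [0.8, 1.0]])          # Model 1 with hypothetical b21 = 0.8
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.diag([2.0, -0.5])
A_scrambled = A_true @ P @ D             # what ICA could return in the ideal case

A_rec = unscramble(A_scrambled)
b21, b12 = A_rec[1, 0], A_rec[0, 1]
direction = "x1 -> x2" if abs(b21) > abs(b12) else "x1 <- x2"
print(A_rec, direction)
```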

Page 26: Non-Gaussian structural equation models for causal discovery

Linear Non-Gaussian Acyclic Models (LiNGAM) (Shimizu+06JMLR)

$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$$

– Acyclicity
– Non-Gaussian errors ei
– Independence of errors ei (no hidden common causes)

• Identifiable: Directions, coefficients, and intercepts
– Can be uniquely estimated without knowing the causal structure

[Graph: x1 → x2 (b21), x3 → x1 (b13), x3 → x2 (b23), with errors e1 → x1, e2 → x2, e3 → x3]
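As an illustration of the model x_i = Σ_{j≠i} b_ij x_j + e_i, the sketch below generates data for a three-variable graph with edges x1 → x2 (b21), x3 → x1 (b13) and x3 → x2 (b23). The coefficient values and the uniform errors are hypothetical. Acyclicity shows up as the existence of a variable ordering under which the coefficient matrix B is strictly lower triangular.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
b21, b13, b23 = 0.8, -0.5, 0.3           # hypothetical coefficients

# B[i, j] holds b_ij, the direct effect of x_j on x_i (0-based indices).
B = np.array([[0.0, 0.0, b13],
              [b21, 0.0, b23],
              [0.0, 0.0, 0.0]])

e = rng.uniform(-1.0, 1.0, size=(3, n))  # independent non-Gaussian errors
x = np.linalg.solve(np.eye(3) - B, e)    # x = Bx + e  <=>  x = (I - B)^{-1} e

# Causal order x3, x1, x2: permuting rows and columns accordingly
# makes B strictly lower triangular.
order = [2, 0, 1]
B_ordered = B[np.ix_(order, order)]
is_acyclic = np.allclose(B_ordered, np.tril(B_ordered, k=-1))
print(is_acyclic)
```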

Page 27: Non-Gaussian structural equation models for causal discovery

Extensions

• Cyclic models (Lacerda+08UAI; Hyvarinen+13JMLR)

[Graph: x1 ⇄ x2 with errors e1 → x1 and e2 → x2]

• Time series (Hyvarinen+10JMLR; Huang+15IJCAI; Gong15ICML)

$$x(t) = \sum_{\tau=0}^{k} B_\tau\, x(t-\tau) + e(t)$$

• Nonlinearity (Zhang+09UAI; Peters+14JMLR; cf. Imoto02PSB)

$$x_i = f_{i,2}^{-1}\left( f_{i,1}(\text{parents of } x_i) + e_i \right)$$

• Discrete variables (Peters+11TPAMI; Park+15NIPS)

Page 28: Non-Gaussian structural equation models for causal discovery

LiNGAM with hidden common causes

P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen. Int. J. Approximate Reasoning, 2008.

Page 29: Non-Gaussian structural equation models for causal discovery

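LiNGAM with hidden common causes (Hoyer+08IJAR)

• Extension to incorporate non-Gaussian hidden common causes f_q

$$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$

where f_q (q = 1, …, Q) are independent.

Example:

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$

[Graph: f1 and f2 point to both x1 and x2; x1 → x2; errors e1 → x1 and e2 → x2]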

Page 30: Non-Gaussian structural equation models for causal discovery

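WLG, hidden common causes f_q are assumed to be independent

$$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$

Dependent hidden common causes:

$$\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} e_{f_1} \\ e_{f_2} \end{bmatrix}$$

Independent hidden common causes: redefine f1 := e_{f1} and f2 := e_{f2}

[Graphs: on one side, dependent f1 and f2 driven by independent e_{f1} and e_{f2}, pointing to x1 and x2; on the other, the same model redrawn with e_{f1} and e_{f2} acting as the independent hidden common causes; errors e1 → x1 and e2 → x2]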
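One way to spell out why independence costs no generality is to substitute the linear relation between dependent and independent hidden common causes into the model. The matrix notation below (Λ for the coefficients λ_iq, A for the coefficients a_rq, and the stacked vectors f and e_f) is introduced here only for illustration.

```latex
\begin{align*}
f &= A\, e_f, \qquad e_f \text{ with independent components}, \\
\sum_{q} \lambda_{iq} f_q
  &= \sum_{q} \Big( \sum_{r} \lambda_{ir}\, a_{rq} \Big) e_{f_q}
  \quad\Longleftrightarrow\quad
  \Lambda f = (\Lambda A)\, e_f .
\end{align*}
```

The model therefore keeps the same form, with the components of e_f playing the role of independent hidden common causes and ΛA as their coefficients.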

Page 31: Non-Gaussian structural equation models for causal discovery

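Different causal directions give different data distributions (Hoyer, Shimizu, Kerminen and Palviainen, 2008, IJAR)

• Faithfulness + number of hidden common causes “known”

Model with x1 → x2:

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$

or model with x1 ← x2:

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + b_{12} x_2 + e_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + e_2$$

[Figure: the two graphs with hidden common causes f1, …, fQ and errors e1, e2, each shown with a scatter plot of x1 and x2]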

Page 32: Non-Gaussian structural equation models for causal discovery

Previous estimation approaches

• Explicitly model hidden common causes and compare two models with opposite directions of causation
– Maximum likelihood principle (Hoyer+08IJAR)
– Bayesian model selection (Henao & Winther, 2011, JMLR)
• Require us to specify the number of hidden common causes, which is difficult in general

[Graphs: x1 → x2 or x1 ← x2, each with hidden common causes f1, …, fQ and errors e1, e2]

Page 33: Non-Gaussian structural equation models for causal discovery

Our proposal: a Bayesian approach

S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014.

Page 34: Non-Gaussian structural equation models for causal discovery

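Key idea (1/2)

• Another look at the LiNGAM with hidden common causes:

m-th obs.:

$$x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$$

[Figure: the graph x1 → x2 with hidden common causes f1, …, fQ and errors e1, e2, unrolled over observations 1, …, m, …; each observation has its own x1^(m), x2^(m), e1^(m), e2^(m) and intercept μ2^(m), and all observations share the same coefficient b21]

Observations are generated from the LiNGAM model with possibly different intercepts μ2^(m)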

Page 35: Non-Gaussian structural equation models for causal discovery

Key idea (2/2)

• Include the sums of hidden common causes as the observation-specific intercepts:

m-th obs.:

$$x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}:\ \text{obs.-specific intercept}} + b_{21} x_1^{(m)} + e_2^{(m)}$$

• Do not explicitly model hidden common causes
– It is necessary neither to specify the number of hidden common causes Q nor to estimate the coefficients λ2q
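The reparameterization can be checked on simulated data: generating x1 and x2 with Q explicit hidden common causes gives exactly the same values as generating them with observation-specific intercepts μ_i^(m) = Σ_q λ_iq f_q^(m). All numbers, the Laplace distributions and the variable names below are assumptions made for illustration; Q appears only in the data-generating step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q = 500, 3                            # Q is needed only to generate the data
b21 = 0.8                                # hypothetical coefficient

lam = rng.normal(size=(2, Q))            # lambda_{iq}, hypothetical values
f = rng.laplace(size=(n, Q))             # non-Gaussian hidden common causes
e = rng.laplace(size=(n, 2))             # independent non-Gaussian errors

# Explicit hidden-common-cause form.
x1 = f @ lam[0] + e[:, 0]
x2 = f @ lam[1] + b21 * x1 + e[:, 1]

# Observation-specific intercept form: mu[m, i] = sum_q lambda_{iq} f_q^{(m)}.
mu = f @ lam.T
x1_alt = mu[:, 0] + e[:, 0]
x2_alt = mu[:, 1] + b21 * x1_alt + e[:, 1]

print(np.allclose(x1, x1_alt), np.allclose(x2, x2_alt))
```

Once the data are written in the second form, a model only has to describe the n pairs of intercepts, not Q or the individual coefficients.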

Page 36: Non-Gaussian structural equation models for causal discovery

Bayesian model selection

• Compare the marginal likelihoods of these two models with opposite directions

Model 3 (x1 → x2):

$$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}, \qquad x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$$

Model 4 (x1 ← x2):

$$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}, \qquad x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$$

• Many additional parameters μ_i^(m) (i = 1, 2; m = 1, …, n)
– Similar to mixed models and multi-level models
– Informative prior for the observation-specific intercepts
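Once a log marginal likelihood is available for each direction, the comparison itself is a one-line Bayes rule. The sketch below turns two log marginal likelihoods into posterior model probabilities, assuming equal prior probabilities unless others are given. The function name and the numeric values are hypothetical placeholders; computing the marginal likelihoods of Models 3 and 4 is the hard part and is not shown.

```python
import numpy as np

def posterior_model_probs(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities from log marginal likelihoods."""
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    if prior_probs is None:
        prior_probs = np.full(log_ml.shape, 1.0 / log_ml.size)  # equal priors
    log_post = log_ml + np.log(prior_probs)
    log_post -= log_post.max()           # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical log marginal likelihoods of Model 3 and Model 4.
probs = posterior_model_probs([-1520.3, -1523.1])
print(probs)
```

Working on the log scale matters here because marginal likelihoods of n observations are typically far too small to represent directly.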

Page 37: Non-Gaussian structural equation models for causal discovery

Prior for the observation-specific intercepts

$$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}, \qquad \mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$$

• Motivation: Central limit theorem
– Sums of independent variables tend to be more Gaussian
• Approximate the density by a bell-shaped curve dist.:

$$\begin{bmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{bmatrix} \sim t\text{-distribution with given sds, correlation, and DOF } \nu$$

• Select the hyper-parameter values that maximize the marginal likelihood
– sd of μ_l^(m) (l = 1, 2) from {0, 0.2 sd(x_l), …, 1.0 sd(x_l)}
– Correlation of μ1^(m) and μ2^(m) from {0, 0.1, …, 0.9}
– DOF fixed to be 6 in the experiments below
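To get a feel for such a prior, one can draw observation-specific intercepts from a bivariate t-distribution with 6 degrees of freedom. The sketch below uses the standard construction (a Gaussian vector divided by the square root of a scaled chi-square variable) and treats "sd" as the actual standard deviation of the intercepts, which is an assumption about the slide's parameterization. The function name, the grids written as multipliers, and the chosen sd and correlation values are illustrative only.

```python
import numpy as np

def sample_intercepts(n, sd1, sd2, corr, dof, rng):
    """Draw n pairs (mu1, mu2) from a zero-mean bivariate t-distribution."""
    cov = np.array([[sd1**2, corr * sd1 * sd2],
                    [corr * sd1 * sd2, sd2**2]])
    # A multivariate t with scale matrix S has covariance dof/(dof-2) * S,
    # so shrink the scale matrix to hit the requested covariance (needs dof > 2).
    scale = cov * (dof - 2.0) / dof
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)
    u = rng.chisquare(dof, size=n)
    return z / np.sqrt(u / dof)[:, None]

# Candidate hyper-parameter grids, as multipliers of sd(x_l) and as correlations.
sd_grid = np.linspace(0.0, 1.0, 6)       # 0, 0.2, ..., 1.0
corr_grid = np.linspace(0.0, 0.9, 10)    # 0, 0.1, ..., 0.9

rng = np.random.default_rng(0)
mu = sample_intercepts(200_000, sd1=1.0, sd2=0.5, corr=0.3, dof=6, rng=rng)
print(mu.std(axis=0), np.corrcoef(mu.T)[0, 1])
```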

Page 38: Non-Gaussian structural equation models for causal discovery

The chocolate data revisited

[Figure: scatter plot of Nobel against Chocolate. Corr. 0.791, P-value < 0.001]

Gaussianity rejected for both "Chocolate consumption" and "Num. Nobel laureates"

Page 39: Non-Gaussian structural equation models for causal discovery

Model comparison

• No method was previously available to compare these two

Page 40: Non-Gaussian structural equation models for causal discovery

Conclusions

Page 41: Non-Gaussian structural equation models for causal discovery

Conclusions

• Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery
• Proposed a linear non-Gaussian SEM with possibly different intercepts
– It does not require us to specify the number of hidden common causes
• Future work
– Sensitivity to the choice of prior distributions
– Better estimation methods that are computationally and statistically efficient … and many others

Page 42: Non-Gaussian structural equation models for causal discovery


Page 43: Non-Gaussian structural equation models for causal discovery

High-dimensional cases

• Huge number of candidate networks
• Analyze every pair of variables and integrate the results to get an entire causal ordering
• Simpler than trying all the combinations of causal orders

[Figure: pairwise analysis of a graph over x1, x2, x3, x4 with hidden common causes f1 and f3; integrating the pairwise results gives a full graph, and pruning redundant edges gives the final graph]
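The integration step can be sketched as a topological sort over the pairwise decisions. The example below assumes that every pair of variables has already been assigned a direction (the list of decisions is hypothetical and not taken from the slides) and uses Python's standard `graphlib` module to produce a causal ordering; inconsistent (cyclic) pairwise decisions raise `graphlib.CycleError`. Pruning redundant edges is not covered.

```python
from graphlib import TopologicalSorter

def causal_order(pairwise_directions):
    """Combine pairwise (cause, effect) decisions into one causal ordering."""
    predecessors = {}
    for cause, effect in pairwise_directions:
        predecessors.setdefault(effect, set()).add(cause)
        predecessors.setdefault(cause, set())
    # static_order() lists every variable after all of its causes.
    return list(TopologicalSorter(predecessors).static_order())

# Hypothetical outcome of analyzing all six pairs of four variables.
pairwise = [("x3", "x1"), ("x3", "x2"), ("x3", "x4"),
            ("x1", "x2"), ("x1", "x4"), ("x2", "x4")]

order = causal_order(pairwise)
print(order)
```

With p variables this needs p(p − 1)/2 pairwise analyses, compared with p! candidate causal orders.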

Page 44: Non-Gaussian structural equation models for causal discovery

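Different zero/non-zero patterns of the mixing matrices (Hoyer+08IJAR)

• Faithfulness on x_i, f_i + number of f_i given

[Figure: three candidate models over x1, x2 and a hidden common cause f1 (Models 1., 2., 3.), each shown with the zero/non-zero pattern of its 2×3 mixing matrix A; the patterns are built from the rows (* 0 *), (0 * *) and (* * *), where * marks a non-zero entry. Scatter plots of x1 and x2 contrast Gaussian and non-Gaussian e1, e2, f1]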