Causal inference and complex network methods for the geosciences
Jakob Runge
September 28, 2017
https://jakobrunge.github.io/tigramite/
New research group (JENA)

• Climate sensitivity
• Extremes prediction
• Deep learning
• Climate model evaluation
• Data-driven dynamical models
• Time-scale causality
• Causal discovery theory

[Figure: example causal graphs illustrating the group's topics]

Open PhD and Postdoc positions → climateinformaticslab.com
Problem setting

Goal: Learn causal interactions from time series of complex dynamical systems

[Figure: four time series X1–X4 with linear and nonlinear lagged dependencies; the association graph contains spurious associations in addition to the causal links, and the series show strong autocorrelation]

1. How to formulate causal inference for complex dynamical systems?
2. How to detect causal links?
3. How to quantify causal interactions?
How to formulate causal inference for complex dynamical systems?
Time series graphs

Definition
[Figure: time series graph of processes X1–X4 at times t−3, …, t−1, t, with N variables, maximum lag τmax, and the parents of a node highlighted]

A lagged link $X^i_{t-\tau} \to X^j_t$ exists iff
  $X^i_{t-\tau} \not\perp\!\!\!\perp X^j_t \mid X^-_t$,
i.e., $X^i_{t-\tau}$ is not independent of $X^j_t$ given the past $X^-_t$ of the whole process. Assuming stationarity, links are repeated in time.

Contemporaneous links, defined as $X^i_t \not\perp\!\!\!\perp X^j_t \mid X^-_t$, are left undirected here.

[Eichler, 2012]
How to detect causal links?
Causal discovery
Causal discovery overview

[Figure: the approaches illustrated on an example graph over X, Y, Z1, Z2, Z3; model scores s = 0.03, 0.91, 0.87 for score-based search, and condition sets of growing cardinality p = 0, 1, 2 for the constraint-based PC algorithm]

• High-dimensional independence testing / Granger causality
• Score-based / Bayesian network estimation
• Constraint-based / PC algorithm
• Hybrid methods
Causal discovery
Granger causality

[Figure: time series graph with N variables and maximum lag τmax]

Lag-specific Granger causality: test $X^i_{t-\tau} \not\perp\!\!\!\perp X^j_t \mid X^-_t$, here implemented with ParCorr based on OLS / Ridge / Lasso, and a non-parametric Gaussian processes test → paper
PCMCI = PC1 condition selection + MCI test

1. Condition selection (Markov blanket)
For $j \in \{1, \dots, N\}$: estimate a superset of parents $\mathcal{P}(X^j_t)$ such that $X^j_t \perp\!\!\!\perp X^-_t \setminus \mathcal{P}(X^j_t) \mid \mathcal{P}(X^j_t)$, using the iterative PC1 algorithm, which removes links at the α significance level until convergence. PC1 is tuned to high power with a liberal α; false positives are controlled in the next step!

2. Momentary conditional independence (MCI) test
For $i, j \in \{1, \dots, N\}$ and $0 \le \tau \le \tau_{\max}$: test
MCI: $X^i_{t-\tau} \perp\!\!\!\perp X^j_t \mid \mathcal{P}(X^j_t), \mathcal{P}(X^i_{t-\tau})$

[Figure: animation of the PC1 iterations and of an MCI test on a time series graph with N variables and maximum lag τmax; in the example, $p_X = 2$ conditions come from the parents of $X^i_{t-\tau}$]

Flexible regarding conditional independence tests, here ParCorr (OLS) and Gaussian processes → paper
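For concreteness, here is a minimal, self-contained sketch of these two steps in Python. It is not the Tigramite implementation: the conditional independence test is plain partial correlation, the PC1 loop omits the sorting of conditions by test statistic, and parents of $X^i_{t-\tau}$ whose shifted lag exceeds τmax are simply dropped.

```python
import numpy as np
from scipy import stats

def lagged(data, var, lag, tau_max):
    """Column of `var` lagged by `lag` steps, aligned to length T - tau_max."""
    T = data.shape[0]
    return data[tau_max - lag : T - lag, var]

def ci_pvalue(x, y, Z):
    """p-value for x independent of y given the columns in list Z (ParCorr)."""
    Zm = np.column_stack(Z + [np.ones(len(x))])
    rx = x - Zm @ np.linalg.lstsq(Zm, x, rcond=None)[0]
    ry = y - Zm @ np.linalg.lstsq(Zm, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    dof = len(x) - Zm.shape[1] - 2
    t = r * np.sqrt(dof / max(1e-12, 1.0 - r ** 2))
    return 2 * stats.t.sf(abs(t), dof)

def pcmci_sketch(data, tau_max=5, pc_alpha=0.2, alpha=0.05):
    N = data.shape[1]
    col = lambda link: lagged(data, link[0], link[1], tau_max)
    # Step 1: PC1 condition selection -- iteratively shrink parent supersets,
    # removing links at the (liberal) pc_alpha level until convergence.
    parents = {j: [(i, tau) for i in range(N) for tau in range(1, tau_max + 1)]
               for j in range(N)}
    for j in range(N):
        y, p = lagged(data, j, 0, tau_max), 0
        while p < len(parents[j]):
            for link in list(parents[j]):
                conds = [l for l in parents[j] if l != link][:p]
                if ci_pvalue(col(link), y, [col(l) for l in conds]) > pc_alpha:
                    parents[j].remove(link)
            p += 1
    # Step 2: MCI test, conditioning on the parents of *both* variables.
    pvals = {}
    for j in range(N):
        y = lagged(data, j, 0, tau_max)
        for i in range(N):
            for tau in range(1, tau_max + 1):
                conds = [l for l in parents[j] if l != (i, tau)]
                conds += [(k, lg + tau) for k, lg in parents[i]
                          if lg + tau <= tau_max]
                pvals[(i, tau, j)] = ci_pvalue(col((i, tau)), y,
                                               [col(l) for l in conds])
    return {link: p for link, p in pvals.items() if p <= alpha}
```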
PCMCI = PC1 condition selection + MCI test

[Figure: methods ordered by condition-selection significance level, with correlation / mutual information at one extreme, Granger causality / transfer entropy at the other, and PCMCI in between with the level set by model selection; the selected conditions remove autocorrelation & measure causal strength, building on d-separation (used in the PC algorithm)]

More theory in the paper. If PC1 identifies the parents, then
• MCI has unbiased detection power for linear links in additive models:
  $I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, c\,\eta^X_{t-\tau} + \eta^Y_t)$
• MCI is well-calibrated also for autocorrelated data
• the effect size of MCI is larger than that of GC → more power also for low dimensions
Conditional independence tests

Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

Assuming a linear model: partial correlation (ParCorr)
1. Regress out the influence of Z with OLS:
   $X = Z\beta_X + \epsilon_X$, $Y = Z\beta_Y + \epsilon_Y$
   Ridge and Lasso are implemented with scikit-learn on standardized time series:
   • Ridge regularization: LOO-cross-validated regularization parameter $\alpha \in \{0.1, 1, 2, \dots, 500\}$
   • Lasso regularization: multi-task lasso, $\alpha \in \{0.0001, 0.001, 0.01, 0.1, 1\}$ using 5-fold cross-validation, max. iterations = 100
2. Test independence of the residuals with a t-test
   • OLS: $T - D_Z - 2$ degrees of freedom
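A minimal sketch of the ridge variant, assuming scikit-learn's RidgeCV (whose default cross-validation is leave-one-out) and reusing the slide's t-test; note that the degrees-of-freedom formula is the OLS one and only approximate under ridge regularization.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import RidgeCV

def parcorr_ridge(x, y, Z,
                  alphas=np.concatenate(([0.1], np.arange(1.0, 501.0)))):
    """ParCorr test of x independent of y given Z via LOO-CV ridge (sketch)."""
    x = (x - x.mean()) / x.std()            # standardize as on the slide
    y = (y - y.mean()) / y.std()
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    eps_x = x - RidgeCV(alphas=alphas).fit(Z, x).predict(Z)  # LOO-CV default
    eps_y = y - RidgeCV(alphas=alphas).fit(Z, y).predict(Z)
    r = np.corrcoef(eps_x, eps_y)[0, 1]
    dof = len(x) - Z.shape[1] - 2           # OLS formula, approximate here
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    return r, 2 * stats.t.sf(abs(t), dof)
```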
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

Assuming nonlinear additive Gaussian noise: GPDC
1. Regress out the influence of Z with a Gaussian process, assuming
   $X = f_X(Z) + \epsilon_X$, $Y = f_Y(Z) + \epsilon_Y$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$
   GP regression implemented using sklearn:
   • Radial Basis Function (RBF) + white-noise kernel
   • bandwidth estimated with MLE
2. Test independence of the uniformized residuals with the distance correlation coefficient [Szekely et al., 2007]
   $\mathcal{R}(r_X, r_Y)$
   using a pre-computed null distribution (for every T)
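A sketch of the two GPDC steps, assuming scikit-learn's GaussianProcessRegressor for step 1 (its default optimizer maximizes the marginal likelihood, matching the MLE bandwidth estimation above). The uniformization of residuals and the pre-computed null distribution are omitted here, and `gp_residuals` / `distance_correlation` are hypothetical helpers, the latter implementing the Székely et al. statistic directly.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_residuals(v, Z):
    """Residuals of v after GP regression on Z (RBF + white-noise kernel)."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    return v - gp.fit(Z, v).predict(Z)

def distance_correlation(a, b):
    """Sample distance correlation between two 1-D arrays (V-statistic)."""
    def centered(v):
        D = np.abs(v[:, None] - v[None, :])
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered(a), centered(b)
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

# usage: dependence of the residuals after removing Z
# r = distance_correlation(gp_residuals(x, Z), gp_residuals(y, Z))
```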
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

General: conditional mutual information (CMI)
  $I(X; Y \mid Z) = \int dz\, p(z) \iint dx\, dy\; p(x, y \mid z)\, \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)}$
Estimated with a kNN estimator [Kraskov et al., 2004, Frenzel and Pompe, 2007].
Free parameter: the number of nearest neighbors k ~ locally adaptive bandwidth
[Figure: sample cloud of X, Y, Z with kNN neighborhoods A and B around two points]
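A sketch of the Frenzel-Pompe estimator using SciPy's cKDTree; the max-norm k-th-neighbor distance in the joint space acts as the locally adaptive bandwidth mentioned above, and digamma terms combine the resulting neighbor counts. Tie and boundary handling are simplified relative to careful implementations.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def cmi_knn(x, y, z, k=10):
    """kNN estimate of I(X;Y|Z) in the style of Frenzel & Pompe (sketch)."""
    z = np.atleast_2d(z).reshape(len(x), -1)
    xyz = np.column_stack([x, y, z])
    xz, yz = np.column_stack([x, z]), np.column_stack([y, z])
    # max-norm distance to the k-th neighbor in the joint space:
    # a locally adaptive bandwidth around every sample point
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]
    def counts(pts):
        tree = cKDTree(pts)
        return np.array([len(tree.query_ball_point(pt, e - 1e-12, p=np.inf)) - 1
                         for pt, e in zip(pts, eps)])
    n_xz, n_yz, n_z = counts(xz), counts(yz), counts(z)
    return digamma(k) - np.mean(digamma(n_xz + 1) + digamma(n_yz + 1)
                                - digamma(n_z + 1))
```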
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

[Figure: overview of which test detects which type of dependency between X and Y given Z; linear dependencies with additive noise (ParCorr, GPACE / GPDC, CMIknn), nonlinear dependencies with additive noise (GPACE / GPDC, CMIknn), and nonlinear dependencies with multiplicative noise (CMIknn)]
Causal assumptions

Causal interpretation assumes [Spirtes et al., 2000]:
• Causal Markov condition: "All the relevant probabilistic information that can be obtained from the system is contained in its direct causes"
• Causal sufficiency: the measured variables include all of the common causes
• Faithfulness / stableness: "Independencies in data arise not from incredible coincidence, but rather from causal structure"; violated by purely deterministic dependencies
• No contemporaneous effects (but the framework can be extended)
• Stationarity (time series case)
• Parametric assumptions of the independence tests

More discussion → appendix
Numerical experiments

Model setup
Example model:
  $X^1_t = 0.20\, X^1_{t-1} + \eta^1_t$
  $X^2_t = 0.60\, X^2_{t-1} - 0.287\, f^{(1)}(X^5_{t-1}) + 0.287\, f^{(1)}(X^4_{t-1}) + \eta^2_t$
  $X^3_t = \eta^3_t$
  $X^4_t = 0.40\, X^4_{t-1} - 0.287\, f^{(1)}(X^1_{t-1}) + 0.287\, f^{(1)}(X^5_{t-1}) + \eta^4_t$
  $X^5_t = 0.60\, X^5_{t-1} - 0.287\, f^{(1)}(X^2_{t-1}) + \eta^5_t$
with $\eta \sim \mathcal{N}(0, 1)$ and coupling functions
  $f^{(1)}(x) = x$,  $f^{(2)}(x) = (1 - 4e^{-x^2/2})\, x$,  $f^{(3)}(x) = (1 - 4x^3 e^{-x^2/2})\, x$

[Figure: causal graph of the example (all links at τ = 1; node color = autocorrelation a, link color = coefficient) and the generated time series X1–X5]

• random coupling topologies, time lags, linear
• fixed link strength within each network
• different autocorrelations for variables
• τmax = 5, T = 150, varying N = 5..60 → N · τmax > T
Further random model draws shown on the same slide (same bullets and figure layout as above):

  $X^1_t = 0.40\, X^1_{t-1} + 0.287\, f^{(1)}(X^5_{t-1}) + 0.287\, f^{(1)}(X^4_{t-1}) + \eta^1_t$
  $X^2_t = 0.20\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-1}) + \eta^2_t$
  $X^3_t = \eta^3_t$
  $X^4_t = 0.80\, X^4_{t-1} + 0.287\, f^{(1)}(X^5_{t-1}) - 0.287\, f^{(1)}(X^3_{t-2}) + \eta^4_t$
  $X^5_t = 0.60\, X^5_{t-1} + \eta^5_t$

  $X^1_t = -0.287\, f^{(1)}(X^4_{t-1}) + \eta^1_t$
  $X^2_t = 0.40\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-1}) + 0.287\, f^{(1)}(X^1_{t-1}) + \eta^2_t$
  $X^3_t = 0.90\, X^3_{t-1} - 0.287\, f^{(1)}(X^1_{t-2}) + \eta^3_t$
  $X^4_t = 0.90\, X^4_{t-1} + \eta^4_t$
  $X^5_t = 0.80\, X^5_{t-1} - 0.287\, f^{(1)}(X^3_{t-2}) + \eta^5_t$

and a draw with N = 10 variables:
  $X^1_t = 0.20\, X^1_{t-1} - 0.287\, f^{(1)}(X^2_{t-1}) + \eta^1_t$
  $X^2_t = 0.80\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-2}) + \eta^2_t$
  $X^3_t = 0.80\, X^3_{t-1} + \eta^3_t$
  $X^4_t = 0.90\, X^4_{t-1} + \eta^4_t$
  $X^5_t = 0.40\, X^5_{t-1} + 0.287\, f^{(1)}(X^7_{t-2}) + \eta^5_t$
  $X^6_t = 0.40\, X^6_{t-1} + 0.287\, f^{(1)}(X^1_{t-1}) - 0.287\, f^{(1)}(X^2_{t-2}) + \eta^6_t$
  $X^7_t = 0.80\, X^7_{t-1} + \eta^7_t$
  $X^8_t = 0.20\, X^8_{t-1} + 0.287\, f^{(1)}(X^4_{t-1}) + 0.287\, f^{(1)}(X^3_{t-1}) + \eta^8_t$
  $X^9_t = 0.90\, X^9_{t-1} + 0.287\, f^{(1)}(X^8_{t-2}) - 0.287\, f^{(1)}(X^5_{t-1}) + \eta^9_t$
  $X^{10}_t = 0.20\, X^{10}_{t-1} + 0.287\, f^{(1)}(X^1_{t-2}) + \eta^{10}_t$
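A hedged sketch of this model setup: simulating the first 5-variable example with linear coupling $f^{(1)}(x) = x$ and standard-normal noise. The coefficients are those from the slide; the burn-in length and seed are my choices.

```python
import numpy as np

def simulate(T=150, burn=200, seed=0):
    """Simulate the 5-variable example model from the slide."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T + burn, 5))
    for t in range(1, T + burn):
        e = rng.standard_normal(5)
        X[t, 0] = 0.20 * X[t-1, 0] + e[0]
        X[t, 1] = 0.60 * X[t-1, 1] - 0.287 * X[t-1, 4] + 0.287 * X[t-1, 3] + e[1]
        X[t, 2] = e[2]
        X[t, 3] = 0.40 * X[t-1, 3] - 0.287 * X[t-1, 0] + 0.287 * X[t-1, 4] + e[3]
        X[t, 4] = 0.60 * X[t-1, 4] - 0.287 * X[t-1, 1] + e[4]
    return X[burn:]

data = simulate()   # (150, 5) array, usable e.g. with the PCMCI sketch above
```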
Numerical experiments

[Figure: true-positive and false-positive rates vs. the number of variables N for GC, Lasso, PC, and PCMCI; links are grouped by weak vs. strong autocorrelation of the linked variables]

• Similarly well-calibrated tests (GC and PCMCI control false positives)
• GC suffers from the curse of dimensionality and from power bias: links between weakly autocorrelated variables have unbiased power, strongly autocorrelated ones biased power
• Lasso is not well-calibrated and also shows power bias
• The PC algorithm also has low and biased power
• Key idea again: [figure revisiting the condition selection on the time series graph with N variables and maximum lag τmax]
Applications

• Causal hypothesis testing [Runge et al., 2014, Runge et al., 2015c, Kretschmer et al., 2016]
• Variable selection for model building ... or prediction schemes [Runge et al., 2015a, Kretschmer et al., 2017]
• Pathway analysis [Runge et al., 2015b, Runge, 2015]
Global sea-level pressure interactions

Sea-level pressure system [Kalnay et al., 1996]

[Figure: world map of 60 Varimax components (numbered 0–59) of sea-level pressure, built up over several frames together with the estimated causal links among them]

• detrended, anomalized, winter-only (DJF) 1981–2012
• dimension reduction using Varimax-rotated PCA [Vejmelka et al., 2014]
• time resolution: 3 days, τmax = 3 weeks
• N · τmax = 60 · 7 = 420 lagged variables with a comparably small sample size of about 950 samples and partially strong autocorrelations
Climate applications

Spurious correlation vs MCI
[Figure A: |MCI| vs. |correlation| for all links, colored by autocorrelation, with the FDR 5% threshold and individual links labeled (e.g. 26→42, 35→4, 20→48)]
• even strong correlations are spurious
Climate applications

Granger causality vs MCI
[Figure B: |MCI| vs. |GC| for all links, colored by autocorrelation, with the FDR 5% threshold and individual links labeled]
• many even strong causal links are overlooked with GC
How to quantify causal interactions?

Causal strength
MCI and causal strength

Defining causal strength
[Figure: time series graph highlighting the link $X_{t-\tau} \to Y_t$]
  $X_{t-\tau} = g_X(\mathcal{P}(X_{t-\tau})) + \eta^X_{t-\tau}$
  $Y_t = g_Y(\mathcal{P}(Y_t) \setminus \{X_{t-\tau}\}) + \tilde\eta^Y_t$
with the link $X_{t-\tau} \to Y_t$ represented as
  $\tilde\eta^Y_t = f(X_{t-\tau}) + \eta^Y_t$
Causal strength
  $I(\eta^X_{t-\tau};\, \tilde\eta^Y_t \mid \mathcal{P}(X_{t-\tau}))$
measures the "momentary" dependence of $\tilde\eta^Y_t$ on $X_{t-\tau}$ that does not come through the parents of $X_{t-\tau}$.
MCI and causal strength

Definition
  $I^{MCI}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\},\, \mathcal{P}(X_{t-\tau}))$   (1)

1. MCI measures causal strength:
  $I^{MCI}_{X\to Y} = I(g_X(\mathcal{P}_{X_{t-\tau}}) + \eta^X_{t-\tau};\; g_Y(\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) + \tilde\eta^Y_t \mid \dots) = I(\eta^X_{t-\tau};\, \tilde\eta^Y_t \mid \mathcal{P}_{X_{t-\tau}})$

2. MCI has unbiased detection power for linear links:
  $\tilde\eta^Y_t = c\, X_{t-\tau} + \eta^Y_t = c\,(g_X(\mathcal{P}_{X_{t-\tau}}) + \eta^X_{t-\tau}) + \eta^Y_t \implies I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, c\,\eta^X_{t-\tau} + \eta^Y_t)$

3. MCI leads to a well-calibrated test:
  $\tilde\eta^Y_t = \eta^Y_t \implies I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, \eta^Y_t) = 0$
MCI, GC, and PC

1. Generally GC ≤ MCI (→ GC has lower power):
  $I^{GC}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid X^-_t \setminus \{X_{t-\tau}\})$
  $I((X, Z); Y \mid W) = \underbrace{I(X; Y \mid W)}_{\text{MCI}} + \underbrace{I(Z; Y \mid W, X)}_{=0 \text{ (Markov)}} = \underbrace{I(Z; Y \mid W)}_{\ge 0} + \underbrace{I(X; Y \mid W, Z)}_{\text{GC}}$
  $\implies I^{MCI}_{X\to Y}(\tau) \ge I^{GC}_{X\to Y}(\tau)$

2. A single PC test has more power, but is non-iid:
  $I^{PC}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})$
  $= I(\eta^X_{t-\tau}, \mathcal{P}(X_{t-\tau});\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})$
  $= \underbrace{I(\mathcal{P}(X_{t-\tau});\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})}_{\text{typically non-iid}} + \underbrace{I(\eta^X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\},\, \mathcal{P}(X_{t-\tau}))}_{\text{MCI}}$
  $\implies I^{MCI}_{X\to Y}(\tau) \le I^{PC}_{X\to Y}(\tau)$
MCI and causal strength

Effect size analysis for a simple model

[Figure: panel A shows the model; panels B–D compare the effect sizes I [nats] of MI, PC / ITY, GC, and MCI as functions of the autocorrelation $a_X$ (B), the common-driver coefficient $c_{WX}$ (C), and the external-effect coefficient $c_{XZ}$ (D)]

General proof of "unbiased" detection power → paper
Quantifying causal pathways
Quantifying causal pathways

Linear approach: Mediated Causal Effect (MCE)
[Figure: path diagram from X via Z1, W1, W2 to Y (with Z2), with path coefficients α, β, γ, δ, ε on the direct links]
  $Y_t = f(\vec{P}_Y) + \text{error} = \vec{P}_Y \cdot \vec{B} + \text{error}$
• Direct links: path coefficients α, β, γ, δ, ε
• Indirect causal effect: $CE_{X\to Y} = \alpha\varepsilon + \beta\delta + \beta\gamma\varepsilon$
• Mediated causal effect: $MCE_{X\to Y|W_1} = \beta\delta + \beta\gamma\varepsilon$
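Numerically, summing path-coefficient products over all directed paths can be done with matrix powers of the path-coefficient matrix. The sketch below uses a hypothetical graph consistent with the CE and MCE formulas above; the variable ordering, the coefficient values, and the assignment of ε to both edges into Y are my assumptions, since the figure itself is not reproduced here.

```python
import numpy as np

# hypothetical ordering: X, Z1, W1, W2, Y; illustrative coefficients
alpha, beta, gamma, delta, eps = 0.5, 0.4, 0.3, 0.6, 0.7
Phi = np.zeros((5, 5))           # Phi[i, j]: path coefficient of i -> j
Phi[0, 1] = alpha                # X  -> Z1
Phi[0, 2] = beta                 # X  -> W1
Phi[2, 3] = gamma                # W1 -> W2
Phi[2, 4] = delta                # W1 -> Y
Phi[1, 4] = eps                  # Z1 -> Y  (assumption: both edges labeled
Phi[3, 4] = eps                  # W2 -> Y   eps, consistent with the CE formula)

def total_effects(Phi):
    """Sum of path-coefficient products over all directed paths (DAG)."""
    n = Phi.shape[0]
    return sum(np.linalg.matrix_power(Phi, k) for k in range(1, n))

CE = total_effects(Phi)[0, 4]          # alpha*eps + beta*delta + beta*gamma*eps
Phi_cut = Phi.copy()
Phi_cut[0, 2] = 0.0                    # cut X -> W1: removes all paths via W1
MCE_via_W1 = CE - total_effects(Phi_cut)[0, 4]   # beta*delta + beta*gamma*eps
print(CE, MCE_via_W1)                  # 0.674 0.324
```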
Quantifying causal pathways

Climate application: East Pacific → Monsoon pathway
Here whole-year analysis [Runge et al., 2015b]
[Figure: map of the teleconnection pathway]
• causal approach to atmospheric teleconnections
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: node X with incoming (IN) and outgoing (OUT) average causal effects]
• Average Causal Effect: $ACE(i) = \frac{1}{N-1}\sum_{j\neq i} CE^{\max}_{i\to j}$
• Average Causal Susceptibility: $ACS(j) = \frac{1}{N-1}\sum_{i\neq j} CE^{\max}_{i\to j}$
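Given a matrix of maximum (over lags) causal effects, both measures are plain off-diagonal row and column averages. A small sketch, with `CE_max` as a hypothetical precomputed (N, N) array:

```python
import numpy as np

def ace_acs(CE_max):
    """ACE (row averages) and ACS (column averages), excluding the diagonal."""
    N = CE_max.shape[0]
    off = ~np.eye(N, dtype=bool)
    ace = np.where(off, CE_max, 0.0).sum(axis=1) / (N - 1)  # driver strength
    acs = np.where(off, CE_max, 0.0).sum(axis=0) / (N - 1)  # susceptibility
    return ace, acs
```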
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: world map of the 60 components with ACS (inner node color) and ACE (outer ring), scale 0.00–0.06; fractions of in/out links (44% / 33%) annotated]
• uplifts over tropical oceans are major drivers
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: mediator node X with causal effects passing THROUGH it]
• Average Mediated Causal Effect (AMCE):
  $AMCE(k) = \frac{1}{|\mathcal{C}_k|}\sum_{(i,j)\in \mathcal{C}_k} \max_\tau |MCE_{i\to j|k}(\tau)|$
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: world map of AMCE for the 60 components, scale 0.0000–0.0020; maximum |C|/Cmax fraction (82%) annotated]
• uplifts over tropical oceans are also major mediators
Quantifying causal pathways

Information-theoretic approach [Runge, 2015]
[Figure: path diagram from X via W1, W2 to Y with side variables Z1, Z2, Z3; example quantities $X_t = f(\mathcal{P}_X) + \eta^X_t$, $I^{MITP}_{X\to Y} = I(\eta^X_{t-3};\, Y_t \mid \mathcal{P}_{paths})$, and $I^{MII}_{X\to Y|W_1} = I(\eta^X_{t-3};\, W_{1,t-2};\, Y_t \mid \mathcal{P}_{paths})$]

• Direct links: momentary information transfer (MIT)
  $I^{MIT}_{X\to Y} = I(X; Y \mid \mathcal{P}_Y, \mathcal{P}_X)$
• Indirect paths: momentary information transfer along paths (MITP)
  $I^{MITP}_{X\to Y}(\tau) = I(X; Y \mid \mathcal{P}_{paths})$
• Mediation: momentary interaction information (MII)
  $I^{MII}_{X\to Y|W}(\tau) = I^{MITP}_{X\to Y}(\tau) - \underbrace{I(X; Y \mid \mathcal{P}_{paths}, W)}_{\text{MITP conditioned on } W}$

[Figure: application example]
Discussion and conclusion

• framework for reliable large-scale time-lagged causal discovery
• flexible regarding (non-parametric) conditional independence tests → nodes can be multivariate, variables discrete, ...
• interpretable MCI statistic measures causal strength → ranking causal links in large-scale analyses
• causal pathway analysis [PRE Dec 2015]
• causal complex network measures [Nature Comm. 2015]
• optimal prediction [PRE May 2015]

Python code on https://jakobrunge.github.io/tigramite/
Paper: J. Runge, D. Sejdinovic, S. Flaxman (2017): https://arxiv.org/abs/1702.07007
Tigramite

Tigramite 3.0 → tigramite_tutorial.ipynb
https://jakobrunge.github.io/tigramite/

• pcmci.PCMCI(dataframe, cond_ind_test)
  run_pcmci(tau_max, pc_alpha, fdr_method) returns: p_matrix, q_matrix, val_matrix
• data_processing: preprocessing functions, DataFrame(data, mask, missing_flag)
• independence_tests.ParCorr / GPDC / CMIknn / CMIsymb(use_mask, mask_type, significance, confidence)
• models.Models(dataframe, model, data_transform, use_mask, mask_type, missing_flag): wrapper around sklearn;
  get_fit(all_parents) returns: fitted model
• models.LinearMediation(dataframe, data_transform, use_mask, mask_type, missing_flag)
• models.Prediction(dataframe, train_indices, test_indices, prediction_model, data_transform, cond_ind_model, missing_flag)
  get_predictors(steps_ahead, tau_max, pc_alpha) returns: predictors
  fit(target_predictors), predict(target) returns: prediction results
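A usage sketch assembled from the API overview above (Tigramite 3.0); module paths and keyword names follow the slide and may differ in other versions of the package.

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr

data = np.random.randn(150, 5)            # replace with real (T, N) data
dataframe = pp.DataFrame(data)
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.2, fdr_method='fdr_bh')
# p_matrix / val_matrix have shape (N, N, tau_max + 1):
# entry [i, j, tau] refers to the link X^i_{t-tau} -> X^j_t
print(results['p_matrix'].shape, results['val_matrix'].shape)
```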
New research group (JENA): closing slide repeating the group overview from the beginning.
Open PhD and Postdoc positions → climateinformaticslab.com
Thank you !!!
References i

Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields, 153(1):233–268.

Frenzel, S. and Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101.
References ii

Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Chelliah, M., Ebisuzaki, W., Higgins, W., Janowiak, J., Mo, K. C., Ropelewski, C., Wang, J., Leetmaa, A., Reynolds, R., Jenne, R., and Joseph, D. (1996). The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77(3):437–471.

Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
References iii

Kretschmer, M., Coumou, D., Donges, J. F., and Runge, J. (2016). Using causal effect networks to analyze different arctic drivers of midlatitude winter circulation. Journal of Climate, 29(11):4069–4081.

Kretschmer, M., Coumou, D., and Runge, J. (2017). Early prediction of weak stratospheric polar vortex states using causal precursors. Under review.

Runge, J. (2015). Quantifying information transfer and mediation along causal pathways in complex systems. Physical Review E, 92(6):062829.
References iv

Runge, J., Donner, R. V., and Kurths, J. (2015a). Optimal model-free prediction from multivariate time series. Physical Review E, 91(5):052909.

Runge, J., Petoukhov, V., Donges, J. F., Hlinka, J., Jajcay, N., Vejmelka, M., Hartman, D., Marwan, N., Paluš, M., and Kurths, J. (2015b). Identifying causal gateways and mediators in complex spatio-temporal systems. Nature Communications, 6:8502.

Runge, J., Petoukhov, V., and Kurths, J. (2014). Quantifying the strength and delay of climatic interactions: The ambiguities of cross correlation and a novel measure based on graphical models. Journal of Climate, 27(2):720–739.
References v

Runge, J., Riedl, M., Müller, A., Stepan, H., Wessel, N., and Kurths, J. (2015c). Quantifying the causal strength of multivariate cardiovascular couplings with momentary information transfer. Physiological Measurement, 36(4):813–825.

Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, volume 81. The MIT Press, Boston.

Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794.
References vi

Vejmelka, M., Pokorná, L., Hlinka, J., Hartman, D., Jajcay, N., and Paluš, M. (2014). Non-random correlation structures and dimensionality reduction in multivariate climate data. Climate Dynamics, 44(9-10):2663–2682.
Causal discovery challenges

• Latent variables
• Synergistic dependencies
• Contemporaneous effects
• Long-range memory
Causal assumptions

Causal Markov condition
"All the relevant probabilistic information that can be obtained from the system is contained in its direct causes"
Formally: upon specifying a complete graph that contains all common causes, separation in the graph entails at least the implied conditional independencies in the process.
[Figure: time series graph of X1–X4 over t−3, …, t]
Structural equation modeling framework:
  $X^j_t = f(X^-_t, \eta^j_t)$ with $\eta^j_t \perp\!\!\!\perp X^-_{t+1} \setminus X^j_t$
Causal assumptions

Causal Sufficiency
R. Scheines: "Theory of causal inference is about the inferential effect of a variety of assumptions far more than it is an endorsement of particular assumptions"
Given the estimate $X \perp\!\!\!\perp Z \mid Y$ and no other independencies, assuming only the Markov condition and faithfulness allows for several different graphs:
[Figure: the chain X → Y → Z over time, and alternative graphs with latent variables U and V that entail the same independencies; assuming sufficiency selects the chain]
Causal assumptions

Faithfulness
If there are any independence relations in the population that are not a consequence of the Causal Markov condition (or d-separation), then the population is unfaithful.
For example, given three variables and assuming the Causal Markov and Sufficiency conditions, suppose we measure these (in-)dependencies:
  $X \perp\!\!\!\perp Z$    $X \not\perp\!\!\!\perp Z \mid Y$
  $X \not\perp\!\!\!\perp Y$    $(X \not\perp\!\!\!\perp Y \mid Z)$
  $Y \not\perp\!\!\!\perp Z$    $Y \not\perp\!\!\!\perp Z \mid X$
[Figure: graph over X, Y, Z consistent with these relations]
Causal assumptions

Stationarity of causal structure
[Figure: a time axis partitioned in three ways: structurally stationary for all samples, structurally stationary within sliding windows, and periodically structurally stationary (alternating summer / winter regimes)]
• structurally stationary, not necessarily with the same strength / parameters
• masking implemented in Tigramite
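A hedged sketch of periodic masking (a winter-only analysis) with the Tigramite 3.0 API from the overview slide; the keyword names (mask, use_mask, mask_type) follow that slide, and the convention that True mask entries are excluded from the test samples is my assumption.

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr

T, N = 730, 5
data = np.random.randn(T, N)               # replace with real data
day_of_year = np.arange(T) % 365
winter = (day_of_year < 90) | (day_of_year > 335)
mask = np.repeat(~winter[:, None], N, axis=1)   # True = excluded (assumption)

dataframe = pp.DataFrame(data, mask=mask)
pcmci = PCMCI(dataframe=dataframe,
              cond_ind_test=ParCorr(use_mask=True, mask_type='y'))
results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.2)
```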