Causal inference and complex network methods for the geosciences
Jakob Runge
September 28, 2017
https://jakobrunge.github.io/tigramite/
New research group (JENA)

• Climate sensitivity
• Extremes prediction
• Deep learning
• Climate model evaluation
• Data-driven dynamical models
• Time-scale causality
• Causal discovery theory

[Figure: example causal graphs illustrating the group's topics]

Open PhD and Postdoc positions → climateinformaticslab.com
Problem setting

Goal: Learn causal interactions from time series of complex dynamical systems

[Figure: four time series X1–X4 with linear and nonlinear lagged dependencies; the association graph contains spurious associations in addition to the causal links, and the series show strong autocorrelation]

1. How to formulate causal inference for complex dynamical systems?
2. How to detect causal links?
3. How to quantify causal interactions?
How to formulate causal inference for complex dynamical systems?
Time series graphs

Definition
[Figure: time series graph of processes X1–X4 at times t−3, …, t−1, t, with N variables, maximum lag τmax, and the parents of a node highlighted]

A lagged link $X^i_{t-\tau} \to X^j_t$ exists iff
  $X^i_{t-\tau} \not\perp\!\!\!\perp X^j_t \mid X^-_t$,
i.e., $X^i_{t-\tau}$ is not independent of $X^j_t$ given the past $X^-_t$ of the whole process. Assuming stationarity, links are repeated in time.

Contemporaneous links, defined as $X^i_t \not\perp\!\!\!\perp X^j_t \mid X^-_t$, are left undirected here.

[Eichler, 2012]
How to detect causal links?
Causal discovery
Causal discovery overview

[Figure: the approaches illustrated on an example graph over X, Y, Z1, Z2, Z3; model scores s = 0.03, 0.91, 0.87 for score-based search, and condition sets of growing cardinality p = 0, 1, 2 for the constraint-based PC algorithm]

• High-dimensional independence testing / Granger causality
• Score-based / Bayesian network estimation
• Constraint-based / PC algorithm
• Hybrid methods
Causal discovery
Granger causality

[Figure: time series graph with N variables and maximum lag τmax]

Lag-specific Granger causality: test $X^i_{t-\tau} \not\perp\!\!\!\perp X^j_t \mid X^-_t$, here implemented with ParCorr based on OLS / Ridge / Lasso, and a non-parametric Gaussian processes test → paper
PCMCI = PC1 condition selection + MCI test

1. Condition selection (Markov blanket)
For $j \in \{1, \dots, N\}$: estimate a superset of parents $\mathcal{P}(X^j_t)$ such that $X^j_t \perp\!\!\!\perp X^-_t \setminus \mathcal{P}(X^j_t) \mid \mathcal{P}(X^j_t)$, using the iterative PC1 algorithm, which removes links at the α significance level until convergence. PC1 is tuned to high power with a liberal α; false positives are controlled in the next step!

2. Momentary conditional independence (MCI) test
For $i, j \in \{1, \dots, N\}$ and $0 \le \tau \le \tau_{\max}$: test
MCI: $X^i_{t-\tau} \perp\!\!\!\perp X^j_t \mid \mathcal{P}(X^j_t), \mathcal{P}(X^i_{t-\tau})$

[Figure: animation of the PC1 iterations and of an MCI test on a time series graph with N variables and maximum lag τmax; in the example, $p_X = 2$ conditions come from the parents of $X^i_{t-\tau}$]

Flexible regarding conditional independence tests, here ParCorr (OLS) and Gaussian processes → paper
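For concreteness, here is a minimal, self-contained sketch of these two steps in Python. It is not the Tigramite implementation: the conditional independence test is plain partial correlation, the PC1 loop omits the sorting of conditions by test statistic, and parents of $X^i_{t-\tau}$ whose shifted lag exceeds τmax are simply dropped.

```python
import numpy as np
from scipy import stats

def lagged(data, var, lag, tau_max):
    """Column of `var` lagged by `lag` steps, aligned to length T - tau_max."""
    T = data.shape[0]
    return data[tau_max - lag : T - lag, var]

def ci_pvalue(x, y, Z):
    """p-value for x independent of y given the columns in list Z (ParCorr)."""
    Zm = np.column_stack(Z + [np.ones(len(x))])
    rx = x - Zm @ np.linalg.lstsq(Zm, x, rcond=None)[0]
    ry = y - Zm @ np.linalg.lstsq(Zm, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    dof = len(x) - Zm.shape[1] - 2
    t = r * np.sqrt(dof / max(1e-12, 1.0 - r ** 2))
    return 2 * stats.t.sf(abs(t), dof)

def pcmci_sketch(data, tau_max=5, pc_alpha=0.2, alpha=0.05):
    N = data.shape[1]
    col = lambda link: lagged(data, link[0], link[1], tau_max)
    # Step 1: PC1 condition selection -- iteratively shrink parent supersets,
    # removing links at the (liberal) pc_alpha level until convergence.
    parents = {j: [(i, tau) for i in range(N) for tau in range(1, tau_max + 1)]
               for j in range(N)}
    for j in range(N):
        y, p = lagged(data, j, 0, tau_max), 0
        while p < len(parents[j]):
            for link in list(parents[j]):
                conds = [l for l in parents[j] if l != link][:p]
                if ci_pvalue(col(link), y, [col(l) for l in conds]) > pc_alpha:
                    parents[j].remove(link)
            p += 1
    # Step 2: MCI test, conditioning on the parents of *both* variables.
    pvals = {}
    for j in range(N):
        y = lagged(data, j, 0, tau_max)
        for i in range(N):
            for tau in range(1, tau_max + 1):
                conds = [l for l in parents[j] if l != (i, tau)]
                conds += [(k, lg + tau) for k, lg in parents[i]
                          if lg + tau <= tau_max]
                pvals[(i, tau, j)] = ci_pvalue(col((i, tau)), y,
                                               [col(l) for l in conds])
    return {link: p for link, p in pvals.items() if p <= alpha}
```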
PCMCI = PC1 condition selection + MCI test

[Figure: methods ordered by condition-selection significance level, with correlation / mutual information at one extreme, Granger causality / transfer entropy at the other, and PCMCI in between with the level set by model selection; the selected conditions remove autocorrelation & measure causal strength, building on d-separation (used in the PC algorithm)]

More theory in the paper. If PC1 identifies the parents, then
• MCI has unbiased detection power for linear links in additive models:
  $I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, c\,\eta^X_{t-\tau} + \eta^Y_t)$
• MCI is well-calibrated also for autocorrelated data
• the effect size of MCI is larger than that of GC → more power also for low dimensions
Conditional independence tests

Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

Assuming a linear model: partial correlation (ParCorr)
1. Regress out the influence of Z with OLS:
   $X = Z\beta_X + \epsilon_X$, $Y = Z\beta_Y + \epsilon_Y$
   Ridge and Lasso are implemented with scikit-learn on standardized time series:
   • Ridge regularization: LOO-cross-validated regularization parameter $\alpha \in \{0.1, 1, 2, \dots, 500\}$
   • Lasso regularization: multi-task lasso, $\alpha \in \{0.0001, 0.001, 0.01, 0.1, 1\}$ using 5-fold cross-validation, max. iterations = 100
2. Test independence of the residuals with a t-test
   • OLS: $T - D_Z - 2$ degrees of freedom
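A minimal sketch of the ridge variant, assuming scikit-learn's RidgeCV (whose default cross-validation is leave-one-out) and reusing the slide's t-test; note that the degrees-of-freedom formula is the OLS one and only approximate under ridge regularization.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import RidgeCV

def parcorr_ridge(x, y, Z,
                  alphas=np.concatenate(([0.1], np.arange(1.0, 501.0)))):
    """ParCorr test of x independent of y given Z via LOO-CV ridge (sketch)."""
    x = (x - x.mean()) / x.std()            # standardize as on the slide
    y = (y - y.mean()) / y.std()
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    eps_x = x - RidgeCV(alphas=alphas).fit(Z, x).predict(Z)  # LOO-CV default
    eps_y = y - RidgeCV(alphas=alphas).fit(Z, y).predict(Z)
    r = np.corrcoef(eps_x, eps_y)[0, 1]
    dof = len(x) - Z.shape[1] - 2           # OLS formula, approximate here
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    return r, 2 * stats.t.sf(abs(t), dof)
```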
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

Assuming nonlinear additive Gaussian noise: GPDC
1. Regress out the influence of Z with a Gaussian process, assuming
   $X = f_X(Z) + \epsilon_X$, $Y = f_Y(Z) + \epsilon_Y$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$
   GP regression implemented using sklearn:
   • Radial Basis Function (RBF) + white-noise kernel
   • bandwidth estimated with MLE
2. Test independence of the uniformized residuals with the distance correlation coefficient [Szekely et al., 2007]
   $\mathcal{R}(r_X, r_Y)$
   using a pre-computed null distribution (for every T)
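A sketch of the two GPDC steps, assuming scikit-learn's GaussianProcessRegressor for step 1 (its default optimizer maximizes the marginal likelihood, matching the MLE bandwidth estimation above). The uniformization of residuals and the pre-computed null distribution are omitted here, and `gp_residuals` / `distance_correlation` are hypothetical helpers, the latter implementing the Székely et al. statistic directly.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_residuals(v, Z):
    """Residuals of v after GP regression on Z (RBF + white-noise kernel)."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    return v - gp.fit(Z, v).predict(Z)

def distance_correlation(a, b):
    """Sample distance correlation between two 1-D arrays (V-statistic)."""
    def centered(v):
        D = np.abs(v[:, None] - v[None, :])
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered(a), centered(b)
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

# usage: dependence of the residuals after removing Z
# r = distance_correlation(gp_residuals(x, Z), gp_residuals(y, Z))
```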
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

General: conditional mutual information (CMI)
  $I(X; Y \mid Z) = \int dz\, p(z) \iint dx\, dy\; p(x, y \mid z)\, \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)}$
Estimated with a kNN estimator [Kraskov et al., 2004, Frenzel and Pompe, 2007].
Free parameter: the number of nearest neighbors k ~ locally adaptive bandwidth
[Figure: sample cloud of X, Y, Z with kNN neighborhoods A and B around two points]
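A sketch of the Frenzel-Pompe estimator using SciPy's cKDTree; the max-norm k-th-neighbor distance in the joint space acts as the locally adaptive bandwidth mentioned above, and digamma terms combine the resulting neighbor counts. Tie and boundary handling are simplified relative to careful implementations.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def cmi_knn(x, y, z, k=10):
    """kNN estimate of I(X;Y|Z) in the style of Frenzel & Pompe (sketch)."""
    z = np.atleast_2d(z).reshape(len(x), -1)
    xyz = np.column_stack([x, y, z])
    xz, yz = np.column_stack([x, z]), np.column_stack([y, z])
    # max-norm distance to the k-th neighbor in the joint space:
    # a locally adaptive bandwidth around every sample point
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]
    def counts(pts):
        tree = cKDTree(pts)
        return np.array([len(tree.query_ball_point(pt, e - 1e-12, p=np.inf)) - 1
                         for pt, e in zip(pts, eps)])
    n_xz, n_yz, n_z = counts(xz), counts(yz), counts(z)
    return digamma(k) - np.mean(digamma(n_xz + 1) + digamma(n_yz + 1)
                                - digamma(n_z + 1))
```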
Conditional independence tests $X \perp\!\!\!\perp Y \mid Z$

[Figure: overview of which test detects which type of dependency between X and Y given Z; linear dependencies with additive noise (ParCorr, GPACE / GPDC, CMIknn), nonlinear dependencies with additive noise (GPACE / GPDC, CMIknn), and nonlinear dependencies with multiplicative noise (CMIknn)]
Causal assumptions

Causal interpretation assumes [Spirtes et al., 2000]:
• Causal Markov condition: "All the relevant probabilistic information that can be obtained from the system is contained in its direct causes"
• Causal sufficiency: the measured variables include all of the common causes
• Faithfulness / stableness: "Independencies in data arise not from incredible coincidence, but rather from causal structure"; violated by purely deterministic dependencies
• No contemporaneous effects (but the framework can be extended)
• Stationarity (time series case)
• Parametric assumptions of the independence tests

More discussion → appendix
Numerical experiments

Model setup
Example model:
  $X^1_t = 0.20\, X^1_{t-1} + \eta^1_t$
  $X^2_t = 0.60\, X^2_{t-1} - 0.287\, f^{(1)}(X^5_{t-1}) + 0.287\, f^{(1)}(X^4_{t-1}) + \eta^2_t$
  $X^3_t = \eta^3_t$
  $X^4_t = 0.40\, X^4_{t-1} - 0.287\, f^{(1)}(X^1_{t-1}) + 0.287\, f^{(1)}(X^5_{t-1}) + \eta^4_t$
  $X^5_t = 0.60\, X^5_{t-1} - 0.287\, f^{(1)}(X^2_{t-1}) + \eta^5_t$
with $\eta \sim \mathcal{N}(0, 1)$ and coupling functions
  $f^{(1)}(x) = x$,  $f^{(2)}(x) = (1 - 4e^{-x^2/2})\, x$,  $f^{(3)}(x) = (1 - 4x^3 e^{-x^2/2})\, x$

[Figure: causal graph of the example (all links at τ = 1; node color = autocorrelation a, link color = coefficient) and the generated time series X1–X5]

• random coupling topologies, time lags, linear
• fixed link strength within each network
• different autocorrelations for variables
• τmax = 5, T = 150, varying N = 5..60 → N · τmax > T
Further random model draws shown on the same slide (same bullets and figure layout as above):

  $X^1_t = 0.40\, X^1_{t-1} + 0.287\, f^{(1)}(X^5_{t-1}) + 0.287\, f^{(1)}(X^4_{t-1}) + \eta^1_t$
  $X^2_t = 0.20\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-1}) + \eta^2_t$
  $X^3_t = \eta^3_t$
  $X^4_t = 0.80\, X^4_{t-1} + 0.287\, f^{(1)}(X^5_{t-1}) - 0.287\, f^{(1)}(X^3_{t-2}) + \eta^4_t$
  $X^5_t = 0.60\, X^5_{t-1} + \eta^5_t$

  $X^1_t = -0.287\, f^{(1)}(X^4_{t-1}) + \eta^1_t$
  $X^2_t = 0.40\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-1}) + 0.287\, f^{(1)}(X^1_{t-1}) + \eta^2_t$
  $X^3_t = 0.90\, X^3_{t-1} - 0.287\, f^{(1)}(X^1_{t-2}) + \eta^3_t$
  $X^4_t = 0.90\, X^4_{t-1} + \eta^4_t$
  $X^5_t = 0.80\, X^5_{t-1} - 0.287\, f^{(1)}(X^3_{t-2}) + \eta^5_t$

and a draw with N = 10 variables:
  $X^1_t = 0.20\, X^1_{t-1} - 0.287\, f^{(1)}(X^2_{t-1}) + \eta^1_t$
  $X^2_t = 0.80\, X^2_{t-1} - 0.287\, f^{(1)}(X^4_{t-2}) + \eta^2_t$
  $X^3_t = 0.80\, X^3_{t-1} + \eta^3_t$
  $X^4_t = 0.90\, X^4_{t-1} + \eta^4_t$
  $X^5_t = 0.40\, X^5_{t-1} + 0.287\, f^{(1)}(X^7_{t-2}) + \eta^5_t$
  $X^6_t = 0.40\, X^6_{t-1} + 0.287\, f^{(1)}(X^1_{t-1}) - 0.287\, f^{(1)}(X^2_{t-2}) + \eta^6_t$
  $X^7_t = 0.80\, X^7_{t-1} + \eta^7_t$
  $X^8_t = 0.20\, X^8_{t-1} + 0.287\, f^{(1)}(X^4_{t-1}) + 0.287\, f^{(1)}(X^3_{t-1}) + \eta^8_t$
  $X^9_t = 0.90\, X^9_{t-1} + 0.287\, f^{(1)}(X^8_{t-2}) - 0.287\, f^{(1)}(X^5_{t-1}) + \eta^9_t$
  $X^{10}_t = 0.20\, X^{10}_{t-1} + 0.287\, f^{(1)}(X^1_{t-2}) + \eta^{10}_t$
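A hedged sketch of this model setup: simulating the first 5-variable example with linear coupling $f^{(1)}(x) = x$ and standard-normal noise. The coefficients are those from the slide; the burn-in length and seed are my choices.

```python
import numpy as np

def simulate(T=150, burn=200, seed=0):
    """Simulate the 5-variable example model from the slide."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T + burn, 5))
    for t in range(1, T + burn):
        e = rng.standard_normal(5)
        X[t, 0] = 0.20 * X[t-1, 0] + e[0]
        X[t, 1] = 0.60 * X[t-1, 1] - 0.287 * X[t-1, 4] + 0.287 * X[t-1, 3] + e[1]
        X[t, 2] = e[2]
        X[t, 3] = 0.40 * X[t-1, 3] - 0.287 * X[t-1, 0] + 0.287 * X[t-1, 4] + e[3]
        X[t, 4] = 0.60 * X[t-1, 4] - 0.287 * X[t-1, 1] + e[4]
    return X[burn:]

data = simulate()   # (150, 5) array, usable e.g. with the PCMCI sketch above
```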
Numerical experiments

[Figure: true-positive and false-positive rates vs. the number of variables N for GC, Lasso, PC, and PCMCI; links are grouped by weak vs. strong autocorrelation of the linked variables]

• Similarly well-calibrated tests (GC and PCMCI control false positives)
• GC suffers from the curse of dimensionality and from power bias: links between weakly autocorrelated variables have unbiased power, strongly autocorrelated ones biased power
• Lasso is not well-calibrated and also shows power bias
• The PC algorithm also has low and biased power
• Key idea again: [figure revisiting the condition selection on the time series graph with N variables and maximum lag τmax]
Applications

• Causal hypothesis testing [Runge et al., 2014, Runge et al., 2015c, Kretschmer et al., 2016]
• Variable selection for model building ... or prediction schemes [Runge et al., 2015a, Kretschmer et al., 2017]
• Pathway analysis [Runge et al., 2015b, Runge, 2015]
Global sea-level pressure interactions

Sea-level pressure system [Kalnay et al., 1996]

[Figure: world map of 60 Varimax components (numbered 0–59) of sea-level pressure, built up over several frames together with the estimated causal links among them]

• detrended, anomalized, winter-only (DJF) 1981–2012
• dimension reduction using Varimax-rotated PCA [Vejmelka et al., 2014]
• time resolution: 3 days, τmax = 3 weeks
• N · τmax = 60 · 7 = 420 lagged variables with a comparably small sample size of about 950 samples and partially strong autocorrelations
Climate applications

Spurious correlation vs MCI
[Figure A: |MCI| vs. |correlation| for all links, colored by autocorrelation, with the FDR 5% threshold and individual links labeled (e.g. 26→42, 35→4, 20→48)]
• even strong correlations are spurious
Climate applications

Granger causality vs MCI
[Figure B: |MCI| vs. |GC| for all links, colored by autocorrelation, with the FDR 5% threshold and individual links labeled]
• many even strong causal links are overlooked with GC
How to quantify causal interactions?

Causal strength
MCI and causal strength

Defining causal strength
[Figure: time series graph highlighting the link $X_{t-\tau} \to Y_t$]
  $X_{t-\tau} = g_X(\mathcal{P}(X_{t-\tau})) + \eta^X_{t-\tau}$
  $Y_t = g_Y(\mathcal{P}(Y_t) \setminus \{X_{t-\tau}\}) + \tilde\eta^Y_t$
with the link $X_{t-\tau} \to Y_t$ represented as
  $\tilde\eta^Y_t = f(X_{t-\tau}) + \eta^Y_t$
Causal strength
  $I(\eta^X_{t-\tau};\, \tilde\eta^Y_t \mid \mathcal{P}(X_{t-\tau}))$
measures the "momentary" dependence of $\tilde\eta^Y_t$ on $X_{t-\tau}$ that does not come through the parents of $X_{t-\tau}$.
MCI and causal strength

Definition
  $I^{MCI}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\},\, \mathcal{P}(X_{t-\tau}))$   (1)

1. MCI measures causal strength:
  $I^{MCI}_{X\to Y} = I(g_X(\mathcal{P}_{X_{t-\tau}}) + \eta^X_{t-\tau};\; g_Y(\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) + \tilde\eta^Y_t \mid \dots) = I(\eta^X_{t-\tau};\, \tilde\eta^Y_t \mid \mathcal{P}_{X_{t-\tau}})$

2. MCI has unbiased detection power for linear links:
  $\tilde\eta^Y_t = c\, X_{t-\tau} + \eta^Y_t = c\,(g_X(\mathcal{P}_{X_{t-\tau}}) + \eta^X_{t-\tau}) + \eta^Y_t \implies I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, c\,\eta^X_{t-\tau} + \eta^Y_t)$

3. MCI leads to a well-calibrated test:
  $\tilde\eta^Y_t = \eta^Y_t \implies I^{MCI}_{X\to Y} = I(\eta^X_{t-\tau};\, \eta^Y_t) = 0$
MCI, GC, and PC

1. Generally GC ≤ MCI (→ GC has lower power):
  $I^{GC}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid X^-_t \setminus \{X_{t-\tau}\})$
  $I((X, Z); Y \mid W) = \underbrace{I(X; Y \mid W)}_{\text{MCI}} + \underbrace{I(Z; Y \mid W, X)}_{=0 \text{ (Markov)}} = \underbrace{I(Z; Y \mid W)}_{\ge 0} + \underbrace{I(X; Y \mid W, Z)}_{\text{GC}}$
  $\implies I^{MCI}_{X\to Y}(\tau) \ge I^{GC}_{X\to Y}(\tau)$

2. A single PC test has more power, but is non-iid:
  $I^{PC}_{X\to Y}(\tau) = I(X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})$
  $= I(\eta^X_{t-\tau}, \mathcal{P}(X_{t-\tau});\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})$
  $= \underbrace{I(\mathcal{P}(X_{t-\tau});\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\})}_{\text{typically non-iid}} + \underbrace{I(\eta^X_{t-\tau};\, Y_t \mid \mathcal{P}(Y_t)\setminus\{X_{t-\tau}\},\, \mathcal{P}(X_{t-\tau}))}_{\text{MCI}}$
  $\implies I^{MCI}_{X\to Y}(\tau) \le I^{PC}_{X\to Y}(\tau)$
MCI and causal strength

Effect size analysis for a simple model

[Figure: panel A shows the model; panels B–D compare the effect sizes I [nats] of MI, PC / ITY, GC, and MCI as functions of the autocorrelation $a_X$ (B), the common-driver coefficient $c_{WX}$ (C), and the external-effect coefficient $c_{XZ}$ (D)]

General proof of "unbiased" detection power → paper
Quantifying causal pathways
Quantifying causal pathways

Linear approach: Mediated Causal Effect (MCE)
[Figure: path diagram from X via Z1, W1, W2 to Y (with Z2), with path coefficients α, β, γ, δ, ε on the direct links]
  $Y_t = f(\vec{P}_Y) + \text{error} = \vec{P}_Y \cdot \vec{B} + \text{error}$
• Direct links: path coefficients α, β, γ, δ, ε
• Indirect causal effect: $CE_{X\to Y} = \alpha\varepsilon + \beta\delta + \beta\gamma\varepsilon$
• Mediated causal effect: $MCE_{X\to Y|W_1} = \beta\delta + \beta\gamma\varepsilon$
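Numerically, summing path-coefficient products over all directed paths can be done with matrix powers of the path-coefficient matrix. The sketch below uses a hypothetical graph consistent with the CE and MCE formulas above; the variable ordering, the coefficient values, and the assignment of ε to both edges into Y are my assumptions, since the figure itself is not reproduced here.

```python
import numpy as np

# hypothetical ordering: X, Z1, W1, W2, Y; illustrative coefficients
alpha, beta, gamma, delta, eps = 0.5, 0.4, 0.3, 0.6, 0.7
Phi = np.zeros((5, 5))           # Phi[i, j]: path coefficient of i -> j
Phi[0, 1] = alpha                # X  -> Z1
Phi[0, 2] = beta                 # X  -> W1
Phi[2, 3] = gamma                # W1 -> W2
Phi[2, 4] = delta                # W1 -> Y
Phi[1, 4] = eps                  # Z1 -> Y  (assumption: both edges labeled
Phi[3, 4] = eps                  # W2 -> Y   eps, consistent with the CE formula)

def total_effects(Phi):
    """Sum of path-coefficient products over all directed paths (DAG)."""
    n = Phi.shape[0]
    return sum(np.linalg.matrix_power(Phi, k) for k in range(1, n))

CE = total_effects(Phi)[0, 4]          # alpha*eps + beta*delta + beta*gamma*eps
Phi_cut = Phi.copy()
Phi_cut[0, 2] = 0.0                    # cut X -> W1: removes all paths via W1
MCE_via_W1 = CE - total_effects(Phi_cut)[0, 4]   # beta*delta + beta*gamma*eps
print(CE, MCE_via_W1)                  # 0.674 0.324
```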
Quantifying causal pathways

Climate application: East Pacific → Monsoon pathway
Here whole-year analysis [Runge et al., 2015b]
[Figure: map of the teleconnection pathway]
• causal approach to atmospheric teleconnections
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: node X with incoming (IN) and outgoing (OUT) average causal effects]
• Average Causal Effect: $ACE(i) = \frac{1}{N-1}\sum_{j\neq i} CE^{\max}_{i\to j}$
• Average Causal Susceptibility: $ACS(j) = \frac{1}{N-1}\sum_{i\neq j} CE^{\max}_{i\to j}$
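Given a matrix of maximum (over lags) causal effects, both measures are plain off-diagonal row and column averages. A small sketch, with `CE_max` as a hypothetical precomputed (N, N) array:

```python
import numpy as np

def ace_acs(CE_max):
    """ACE (row averages) and ACS (column averages), excluding the diagonal."""
    N = CE_max.shape[0]
    off = ~np.eye(N, dtype=bool)
    ace = np.where(off, CE_max, 0.0).sum(axis=1) / (N - 1)  # driver strength
    acs = np.where(off, CE_max, 0.0).sum(axis=0) / (N - 1)  # susceptibility
    return ace, acs
```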
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: world map of the 60 components with ACS (inner node color) and ACE (outer ring), scale 0.00–0.06; fractions of in/out links (44% / 33%) annotated]
• uplifts over tropical oceans are major drivers
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: mediator node X with causal effects passing THROUGH it]
• Average Mediated Causal Effect (AMCE):
  $AMCE(k) = \frac{1}{|\mathcal{C}_k|}\sum_{(i,j)\in \mathcal{C}_k} \max_\tau |MCE_{i\to j|k}(\tau)|$
Network analysis

Climate application
Here whole-year analysis [Runge et al., 2015b]
[Figure: world map of AMCE for the 60 components, scale 0.0000–0.0020; maximum |C|/Cmax fraction (82%) annotated]
• uplifts over tropical oceans are also major mediators
Quantifying causal pathways

Information-theoretic approach [Runge, 2015]
[Figure: path diagram from X via W1, W2 to Y with side variables Z1, Z2, Z3; example quantities $X_t = f(\mathcal{P}_X) + \eta^X_t$, $I^{MITP}_{X\to Y} = I(\eta^X_{t-3};\, Y_t \mid \mathcal{P}_{paths})$, and $I^{MII}_{X\to Y|W_1} = I(\eta^X_{t-3};\, W_{1,t-2};\, Y_t \mid \mathcal{P}_{paths})$]

• Direct links: momentary information transfer (MIT)
  $I^{MIT}_{X\to Y} = I(X; Y \mid \mathcal{P}_Y, \mathcal{P}_X)$
• Indirect paths: momentary information transfer along paths (MITP)
  $I^{MITP}_{X\to Y}(\tau) = I(X; Y \mid \mathcal{P}_{paths})$
• Mediation: momentary interaction information (MII)
  $I^{MII}_{X\to Y|W}(\tau) = I^{MITP}_{X\to Y}(\tau) - \underbrace{I(X; Y \mid \mathcal{P}_{paths}, W)}_{\text{MITP conditioned on } W}$

[Figure: application example]
Discussion and conclusion

• framework for reliable large-scale time-lagged causal discovery
• flexible regarding (non-parametric) conditional independence tests → nodes can be multivariate, variables discrete, ...
• interpretable MCI statistic measures causal strength → ranking causal links in large-scale analyses
• causal pathway analysis [PRE Dec 2015]
• causal complex network measures [Nature Comm. 2015]
• optimal prediction [PRE May 2015]

Python code on https://jakobrunge.github.io/tigramite/
Paper: J. Runge, D. Sejdinovic, S. Flaxman (2017): https://arxiv.org/abs/1702.07007
Tigramite

Tigramite 3.0 → tigramite_tutorial.ipynb
https://jakobrunge.github.io/tigramite/

• pcmci.PCMCI(dataframe, cond_ind_test)
  run_pcmci(tau_max, pc_alpha, fdr_method) returns: p_matrix, q_matrix, val_matrix
• data_processing: preprocessing functions, DataFrame(data, mask, missing_flag)
• independence_tests.ParCorr / GPDC / CMIknn / CMIsymb(use_mask, mask_type, significance, confidence)
• models.Models(dataframe, model, data_transform, use_mask, mask_type, missing_flag): wrapper around sklearn;
  get_fit(all_parents) returns: fitted model
• models.LinearMediation(dataframe, data_transform, use_mask, mask_type, missing_flag)
• models.Prediction(dataframe, train_indices, test_indices, prediction_model, data_transform, cond_ind_model, missing_flag)
  get_predictors(steps_ahead, tau_max, pc_alpha) returns: predictors
  fit(target_predictors), predict(target) returns: prediction results
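A usage sketch assembled from the API overview above (Tigramite 3.0); module paths and keyword names follow the slide and may differ in other versions of the package.

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr

data = np.random.randn(150, 5)            # replace with real (T, N) data
dataframe = pp.DataFrame(data)
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.2, fdr_method='fdr_bh')
# p_matrix / val_matrix have shape (N, N, tau_max + 1):
# entry [i, j, tau] refers to the link X^i_{t-tau} -> X^j_t
print(results['p_matrix'].shape, results['val_matrix'].shape)
```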
New research group (JENA): closing slide repeating the group overview from the beginning.
Open PhD and Postdoc positions → climateinformaticslab.com
Thank you !!!
References i

Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields, 153(1):233–268.

Frenzel, S. and Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101.
References ii

Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Chelliah, M., Ebisuzaki, W., Higgins, W., Janowiak, J., Mo, K. C., Ropelewski, C., Wang, J., Leetmaa, A., Reynolds, R., Jenne, R., and Joseph, D. (1996). The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77(3):437–471.

Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
References iii

Kretschmer, M., Coumou, D., Donges, J. F., and Runge, J. (2016). Using causal effect networks to analyze different arctic drivers of midlatitude winter circulation. Journal of Climate, 29(11):4069–4081.

Kretschmer, M., Coumou, D., and Runge, J. (2017). Early prediction of weak stratospheric polar vortex states using causal precursors. Under review.

Runge, J. (2015). Quantifying information transfer and mediation along causal pathways in complex systems. Physical Review E, 92(6):062829.
References iv

Runge, J., Donner, R. V., and Kurths, J. (2015a). Optimal model-free prediction from multivariate time series. Physical Review E, 91(5):052909.

Runge, J., Petoukhov, V., Donges, J. F., Hlinka, J., Jajcay, N., Vejmelka, M., Hartman, D., Marwan, N., Paluš, M., and Kurths, J. (2015b). Identifying causal gateways and mediators in complex spatio-temporal systems. Nature Communications, 6:8502.

Runge, J., Petoukhov, V., and Kurths, J. (2014). Quantifying the strength and delay of climatic interactions: The ambiguities of cross correlation and a novel measure based on graphical models. Journal of Climate, 27(2):720–739.
References v

Runge, J., Riedl, M., Müller, A., Stepan, H., Wessel, N., and Kurths, J. (2015c). Quantifying the causal strength of multivariate cardiovascular couplings with momentary information transfer. Physiological Measurement, 36(4):813–825.

Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, volume 81. The MIT Press, Boston.

Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794.
References vi

Vejmelka, M., Pokorná, L., Hlinka, J., Hartman, D., Jajcay, N., and Paluš, M. (2014). Non-random correlation structures and dimensionality reduction in multivariate climate data. Climate Dynamics, 44(9-10):2663–2682.
Causal discovery challenges

• Latent variables
• Synergistic dependencies
• Contemporaneous effects
• Long-range memory
Causal assumptions

Causal Markov condition
"All the relevant probabilistic information that can be obtained from the system is contained in its direct causes"
Formally: upon specifying a complete graph that contains all common causes, separation in the graph entails at least the implied conditional independencies in the process.
[Figure: time series graph of X1–X4 over t−3, …, t]
Structural equation modeling framework:
  $X^j_t = f(X^-_t, \eta^j_t)$ with $\eta^j_t \perp\!\!\!\perp X^-_{t+1} \setminus X^j_t$
Causal assumptions

Causal Sufficiency
R. Scheines: "Theory of causal inference is about the inferential effect of a variety of assumptions far more than it is an endorsement of particular assumptions"
Given the estimate $X \perp\!\!\!\perp Z \mid Y$ and no other independencies, assuming only the Markov condition and faithfulness allows for several different graphs:
[Figure: the chain X → Y → Z over time, and alternative graphs with latent variables U and V that entail the same independencies; assuming sufficiency selects the chain]
Causal assumptions

Faithfulness
If there are any independence relations in the population that are not a consequence of the Causal Markov condition (or d-separation), then the population is unfaithful.
For example, given three variables and assuming the Causal Markov and Sufficiency conditions, suppose we measure these (in-)dependencies:
  $X \perp\!\!\!\perp Z$    $X \not\perp\!\!\!\perp Z \mid Y$
  $X \not\perp\!\!\!\perp Y$    $(X \not\perp\!\!\!\perp Y \mid Z)$
  $Y \not\perp\!\!\!\perp Z$    $Y \not\perp\!\!\!\perp Z \mid X$
[Figure: graph over X, Y, Z consistent with these relations]
Causal assumptions

Stationarity of causal structure
[Figure: a time axis partitioned in three ways: structurally stationary for all samples, structurally stationary within sliding windows, and periodically structurally stationary (alternating summer / winter regimes)]
• structurally stationary, not necessarily with the same strength / parameters
• masking implemented in Tigramite
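A hedged sketch of periodic masking (a winter-only analysis) with the Tigramite 3.0 API from the overview slide; the keyword names (mask, use_mask, mask_type) follow that slide, and the convention that True mask entries are excluded from the test samples is my assumption.

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr

T, N = 730, 5
data = np.random.randn(T, N)               # replace with real data
day_of_year = np.arange(T) % 365
winter = (day_of_year < 90) | (day_of_year > 335)
mask = np.repeat(~winter[:, None], N, axis=1)   # True = excluded (assumption)

dataframe = pp.DataFrame(data, mask=mask)
pcmci = PCMCI(dataframe=dataframe,
              cond_ind_test=ParCorr(use_mask=True, mask_type='y'))
results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.2)
```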