causality analysis in large-scale time series data - cikm ...liu32/cause.pdf · causality analysis...

66
Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern California October 29, 2013 Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 1 / 66

Upload: ngohuong

Post on 07-Mar-2018

225 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Causality Analysis in Large-scale Time Series DataCIKM 2013 Tutorial

Yan Liu

University of Southern California

October 29, 2013

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 1 / 66

Page 2: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

History I

I would rather discover one causal lawthan be the king of Persia.

Democritus, (460-370 BC)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 2 / 66

Page 3: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

History II

Figure : The Illustrated Sutra of Cause and Effect. 8th century AD, Japan

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 3 / 66

Page 4: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 1: Introduction to Granger Causality (90 mins)

Overview for Causality Analysis from Time Series Data (20 mins)

Granger causality (40 mins)▸ Definition▸ Identification and learning▸ Examples in practical applications

Known Issues of Granger causality (30 mins)▸ Non-linear extensions▸ Latent factors▸ Instantaneous causation

Break (30min)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 4 / 66

Page 5: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 2: Alternative Approaches and New Trends (80 mins)

Practical Issues in Granger causality (30 mins)▸ Time lag▸ Group effect▸ Non-stationary▸ Collinearity▸ Scalability

Alternative Approaches (30 mins)▸ Randomization test▸ Auto-correlation and cross-correlation▸ Transfer entropy

Illustration Examples (20 mins)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 5 / 66

Page 6: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Disclaimer

This is not meant for a comprehensive tutorial for causality

Most techniques discussed in the tutorials are meant for facilitatingcausal discovery instead of automatic causal discovery

This is mostly a computational view of causal discovery, i.e., how tomake causal discovery practical

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 6 / 66

Page 7: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 1: Introduction to Granger Causality (90 mins)

Overview for Causality Analysis from Time Series Data (20 mins)

Granger causality (40 mins)▸ Definition▸ Identification and learning▸ Examples in practical applications

Known Issues of Granger causality (30 mins)▸ Non-linear extensions▸ Latent factors▸ Instantaneous causation

Break (30min)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 7 / 66

Page 8: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Motivation

Finding the answer to “why?” questions.

All physical laws are causal laws; for ex. The Gravity Law:

F ← Gm1m2

r2

It is not just an algebraic equation; it also implies cause and effectrelationship.

Treatment and Symptoms are correlated in the observations; but onlychanging Treatment has an impact on Symptoms.

Challenges

Identifying True causal relationship.

Designing Scalable algorithms for large datasets.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 8 / 66

Page 9: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Philosophical View: Causality and Counter-factual Analysis

Causal Processes: Mark Transmission Theory by Wesley Salmon (1984):

Let P be a process that, in the absence of interactions with other processeswould remain uniform with respect to a characteristic Q, which it wouldmanifest consistently over an interval that includes both of the space-timepoints A and B. Then, a mark (consisting of a modification of Q into Q⋆),which has been introduced into process P by means of a single localinteraction at a point A, is transmitted to point B if [and only if] Pmanifests the modification Q⋆ at B and at all stages of the process betweenA and B without additional interactions.

Counter-factual Analysis: David Hume (1711 - 1776):

where if the first object had not been, the second had neverexisted.

p→ q

¬p→ ¬q

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 9 / 66

Page 10: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Philosophical View: Counter-factual Analysis

Philosophical Definition: Usually in counter-factual language:

A→ B if ¬A→ ¬B.

It requires intervention in the world andchange of A to ¬A, while keeping the rest ofthe confounders unchanged. (aka “ControlledExperiment”)In many cases intervention is impossible: thecar accident example.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 10 / 66

Page 11: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Observational Studies: Randomization Test

Analyzing X → Y :

Fix all other factors.

Change X and record the response of Y .

Perform a statistical significance test.

𝑋 Y

𝑋 Y

Statistical Identifiability Problem: The joint distribution only encodes theassociation. Causality (in general) cannot be inferred from theobservational data: we need causal priors to resolve the directions.Advantage

⇒ By far, the most accurate methodNot comparable with non-experimental (Data centric) methods.

Limitations Sometimes randomized experiments are: Expensive, Immoralor Impossible.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 11 / 66

Page 12: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Time Series Data are Everywhere

One important task in “From Data to Knowledge”:

Discovery of Temporal Causal Relationships

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 12 / 66

Page 13: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Understanding Climate Change Data

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 13 / 66

Page 14: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Gene Regulatory Network Discovery

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 14 / 66

Page 15: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Social Influence Analysis

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 15 / 66

Page 16: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Major Challenges

Large-scale high-dimensional time series data with complex relations

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 16 / 66

Page 17: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 1: Introduction to Granger Causality (90 mins)

Overview for Causality Analysis from Time Series Data (20 mins)

Granger causality (40 mins)▸ Definition▸ Identification and learning▸ Examples in practical applications

Known Issues of Granger causality (30 mins)▸ Non-linear extensions▸ Latent factors▸ Instantaneous causation

Break (30min)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 17 / 66

Page 18: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality

A cause is prior to its effect.

If XGrangerÐÐÐÐÐ→ Y , then Xpast should

significantly help predicting Y future via Y past

alone.

Based on linear regression/prediction of timeseries (Granger 1969)

“Applied economists found the definitionunderstandable and useable”

Several writers stated that “of course, this isnot real causality, it is only GrangerCausality.”

“It is generally agreed that it does notcapture all aspects of causality, but enough tobe worth considering in an empirical test.”

Clive Granger, recipient of

the 2003 Nobel Prize in

Economics.Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 18 / 66

Page 19: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality: Formal Definition

Two Principles

1 The cause happens prior to the effect.

2 The cause makes unique changes in the effect. In other words, thecausal series contains unique information about the effect series thatis not available otherwise.

Define the following information sets:

I⋆(t) - the set of all information in the universe up to time t;

I⋆−X(t) - the set of all information in the universe excluding X up totime t.

Granger’s definition of causality (1969, 1980). Given two time seriesX and Y , we say X causes Y if

P[Y (t + 1) ∈ A∣I⋆(t)] ≠ P[Y (t + 1) ∈ A∣I⋆−X(t)],

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 19 / 66

Page 20: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality: Practical Definition

We perform two vector auto-regressions as follows:

Y (t) =L

∑l=1

alY (t − l) + ε1 (1)

Y (t) =L

∑l=1

a′lY (t − l) +L

∑l=1

b′lX(t − l) + ε2, (2)

where L is the maximal time lag. We say X causes Y if eq (2) isstatistically significantly better than eq (1).

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 20 / 66

Page 21: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Examples

(a) (b)

Figure : Illustration of the main principle behind Granger causality: in bothexamples the cause happens prior to its effect and its past values help predictingfuture values of the effect. (a) Plot of the values of two time series. (b) Plot oftwo point processes in which each event is shown with a mark at its happeningtime.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 21 / 66

Page 22: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality: Extension to Multivariate Time Series

Give p number of time series, X1, . . . ,Xp, we are interested in identifyingwhich time series Granger causes Xi.

Exhaustive-Granger Examine the pairwise relationships between pairs oftime seriesMultivariate Regression Perform vector auto-regression as follows:

Xi(t) =p

∑j=1

a⊺i,jXt,Laggedj + ε,

Xt,Laggedj = [Xj(t −L), . . . ,Xj(t − 1)] is the lagged time series, and

ai,j = [ai,j,1, . . . , ai,j,L] is the coefficient vector. Then we test the zeronessof ai,j by statistical significant tests.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 22 / 66

Page 23: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality: Creating the Causal Graph

Evolution Matrices ⇒ Granger Causality Graph

0.5

0.3

0.8

0.1

1 2

3 4

1

2

3

4

1 2 3 4

2

1

2

1

If αi,j ≠ 0 ⇒ Xj →Xi

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 23 / 66

Page 24: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality in High Dimensions

Granger Causality with significance test

T /L ≥ p + 1 P[Error] ≤ cL√T −L exp (− c22 (T −L))

T /L < p + 1 Inconsistent

Lasso-Granger the variable selection can be efficiently done in highdimensions: (Valdes-Sosa et al 2005 and Arnold et al 2007)

min{β}

T

∑t=L+1

p

∑i=1

∥Xi(t) −p

∑i=1

β⊺i,jXt,Laggedj ∥

2

2

+ λ ∥β∥1,

Lasso-GrangerP[Error] = o(c′L exp(−T ν)) for some 0 ≤ ν < 1.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 24 / 66

Page 25: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Causality in Practice

Advantages The simplicity, robustness, interpretability and scalabilitymakes Granger causality widely adopted

Philosophical Causal order defines time order; the reverse is not true ingeneral. Examples: Instant Causation, The forward looking behavior ofhuman beings

Spurious Causality Granger causality is sensitive to the absence ofpossible causes.

Direct Causes Granger causality only detects the direct causality. Forexample consider

XGrangerÐÐÐÐÐ→ Z

GrangerÐÐÐÐÐ→ Y

Given Z, Granger causality test will not detect the transferred causaleffects of X on Y .

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 25 / 66

Page 26: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Application 1: Brain Image Analysis

Threshold difference GCMs for a

face-selective region in the left fusiform

gyrus for subject 1 in (A) and subject 2 in

(B). Event-related BOLD responses are

shown for the circled areas in the calcarine

sulcus (in green), the fusiform gyrus (the

reference area, in red) and the intra-parital

sulcus (in blue) for both the Cue stimulus

and the Face stimulus.

Mapping directed influence over thebrain using Granger causality andfMRI. Roebroeck et al, 2005.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 26 / 66

Page 27: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Application 2: Climate Change Attribution

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 27 / 66

Page 28: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Application 3: Regulatory Network Discovery

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 28 / 66

Page 29: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Application 4: Twitter Influence Network Discovery

Twitter Meme:

Tiger Woods

Haiti

Scott Brown

Christmas

Avatar

Dubai Burj

Occupy WS:

O-LA

O-WS

O-SF

O-DC

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 29 / 66

Page 30: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 1: Introduction to Granger Causality (90 mins)

Overview for Causality Analysis from Time Series Data (20 mins)

Granger causality (40 mins)▸ Definition▸ Identification and learning▸ Examples in practical applications

Known Issues of Granger causality (30 mins)▸ Non-linear extensions▸ Latent factors▸ Instantaneous causation

Break (30min)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 30 / 66

Page 31: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Issue 1: Nonlinear Extensions

The assumption of linearity in VAR-based Granger causality testsignificantly simplifies the task of testing conditional independence, but itcan be easily violated in the real applications.

0 20 40 600

200

400

600

800

1000

Gust Speed

Freq

uenc

y

Distribution of Wind Gust Speed

0 1 2 3 4

2000

4000

6000

8000

10000

Rainfall

Freq

uenc

y

Distribution of Rainfall

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 31 / 66

Page 32: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Solutions to Nonlinearity

Many different approaches have been proposed to address the problem:

ARCH, GARCH

Non-parametric approach: Fourier or Wavelet transformation, Kernelmethods

Semi-parametric approach: copula-based approach

heuristic additive models

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 32 / 66

Page 33: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Nonlinear Extensions: Kernel Methods

Given the following predictors,

U(t) ≜ [X1(t −L), . . . ,X1(t − 1), . . . ,Xn(t −L), . . . ,Xn(t − 1)] (3)

V(t) ≜ [X1(t −L), . . . ,X1(t − 1), . . . ,Xj−1(t −L), . . . ,Xj−1(t − 1),. . . ,Xj+1(t −L), . . . ,Xj+1(t − 1), . . . ,Xn(t −L), . . . ,Xn(t − 1)]

(4)

where V(t) is U(t) excluding the observations of Xj , we can perform twokernel regressions [Marinazzo et al., 2008]:

Predicting Xi(t) using U(t),

Predicting Xi(t) using V(t).

Let σ̂1 and σ̂2 denote the variance of noise obtained by the regressionalgorithms. Significance test can be applied on 1 − σ̂1/σ̂2 to decide theedge Xj →Xi.Kernel functions: Gaussian kernel is a common choice.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 33 / 66

Page 34: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Nonlinear Extensions: Semi-parametric Solutions

Modeling dependency requires more samples than modeling the marginals.

Separate learning for marginals and dependency: (Liu et al 2009)

▸ A non-parametric estimator for modeling the marginals▸ The data-efficient VAR model for modeling dependency.

42

Sklar�s Theorem[Sklar 59]

= +

[Krishner’11]

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 34 / 66

Page 35: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Granger Non-paranormal (G-NPN) Model

We say a set of time series X = (X1, . . . ,Xp) has Granger-Nonparanormaldistribution G −NPN(X,A, F ) if:

1 There exist distribution functions {Fj}pj=1 such that Zj ≜ Fj(Xj) forj = 1, . . . , p are jointly Gaussian.

2 Zj for j = 1, . . . , p are factorized according to the VAR model withcoefficients A.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 35 / 66

Page 36: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Copula-Granger Algorithm

Learning Copula-Granger:

1 Find the empirical marginal distribution for each time series F̂i.

2 Map the observations into the copula space asf̂i(Xi(t)) = Φ−1 (F̂i(Xi(t))).

3 Find the Granger causality among f̂i(Xi(t)) using Lasso-Granger.

Winsorized Estimation of CDF:

F̃j =⎧⎪⎪⎪⎨⎪⎪⎪⎩

δT if F̂j(Xj) < δTF̂j(Xj) if δT ≤ F̂j(Xj) ≤ 1 − δT(1 − δT ) if F̂j(Xj) > 1 − δT

0 0.5 1-4

-2

0

2

4

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 36 / 66

Page 37: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Issue 2: Latent Factors

Latent Factors with GlobalImpact

10 20 30 40 50 60 700

1

2

3

4

Time (days)

# of

Tw

eets

(×10

4 )

Twitter Examples: Unobservedexternal factors impact the entirenetwork activities.Question: How can we discovercausal relationships in presence oflatent factors with global impacts?

Latent Factors with LocalImpact

X1

X4 X2

X3

2

1

1

The complex nature of the latentfactors makes the analysischallenging.Question: How can we accuratelymodel in presence of latent factorswith minor impacts?

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 37 / 66

Page 38: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Latent Factors with Local Impact

Common Approach

1 Explicitly model the hidden variables in the graphical model.

2 Apply the Expectation Maximization algorithm to learn the structure.

Drawbacks

1 EM algorithm is slow.

2 EM algorithm is trapped in the local minima.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 38 / 66

Page 39: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Latent Factors with Global Impact

Sparse plus Low-rank Decomposition

= +

Original Low Rank Sparse

There are convex optimization algorithms to decompose a matrix into asparse matrix and a low-rank matrix.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 39 / 66

Page 40: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Generalized Linear Auto-regressive Processes

Generalized Linear Model (GLM)

g(Ey∣x[y]) = Ax + b,

where the strictly monotone function g(.) is called the link function.Generalized Linear Auto-Regressive Processes (GLARP)

g(EH(t)[x(t)]) =K

∑`=1

A(`)x(t − `) + b.

where the matrices A(`) for ` = 1, . . . ,K are called the Evolution Matrices.GLARP with Latent Factors

g (EH(t) [x(t)z(t)]) =

K

∑`=1

[A(`) B(`)

C(`) D(`)] [x(t − `)

z(t − `)] + b

for t =K + 1, . . . , T . x(t), a p × 1 vector, represents the observedvariables, z(t), a r × 1 vector, denotes the unobserved variables.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 40 / 66

Page 41: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Key Models: Count Data

Commonly assumed Poisson distribution:

logλ(t) = log(EH(t)[x(t)]) =K

∑`=1

A(`)x(t − `) + b,

Conway-Maxwell Poisson distribution: The adjustable rate of decay:

P[X = k − 1]P[X = k] = (k

µ)ν

,

The model :

P[xi(t)∣µi(t), ν] =1

S(µi(t), ν)(µi(t)

xi(t)

xi(t)!)ν

log (µ(t) + 1

2ν− 1

2) = log (EH(t)[x(t)]) =

K

∑`=1

A(`)x(t − `) + b.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 41 / 66

Page 42: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Key Models: Climate Extreme Value Data

Gumbel Distribution

F (xi(t)∣µi(t), σ) =

exp(− exp(−xi(t) − µi(t)σ

))

The model

µ(t) + σγE = EH(t)[x(t)] =K

∑`=1

A(`)x(t − `) + b,

0 20 40 600

200

400

600

800

1000

Gust Speed

Freq

uenc

y

Marginal distribution of Wind

Gust Speed

−5 0 5 10 150

0.05

0.1

0.15

0.2

Heavy Tail

µ = 0.5σ = 2

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 42 / 66

Page 43: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Issue 3: Instantaneous causation

In some cases, there is a possibility that a time series has an instantaneouscausal effect on another time series.Solutions: Structural Vector Authoregressive (SVAR) formulated asfollows:

Xi(t) =n

∑j=1

a⊺i,jXi(t −L, . . . , t) + εi(t), (5)

where εi, i = 1, . . . , n are independent white noise processes.Similar to Independent Component Analysis (ICA), SVAR suffers fromidentifiability when the noise processes are Gaussian processes.Several attempts have been made to estimate the model above, such as[Hyvarinen et al, 2010] based on non-Gaussian assumption of the noiseprocesses.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 43 / 66

Page 44: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 2: Alternative Approaches and New Trends (80 mins)

Practical Issues in Granger causality (30 mins)▸ Time lag▸ Group effect▸ Non-stationary▸ Collinearity▸ Scalability

Alternative Approaches (30 mins)▸ Randomization test▸ Auto-correlation and cross-correlation▸ Transfer entropy

Illustration Examples (20 mins)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 44 / 66

Page 45: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Practical Issues: Time Lag

Observation: Granger causality requires prior knowledge about themaximum lag L.

Solutions:

Cross-validation

Modeling the distribution of maximum lag length

Autocorrelation function (ACF) and partical autocorrelations (PACF)

Other model selection techniques: AIC and BIC

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 45 / 66

Page 46: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Practical Issues: Group Effect

Observation: The existence of any non-zero element in the coefficientvector ai,j is interpreted as Xj is a Granger cause of Xi, the penalizationterm should shrink the group of coefficient ai instead of individuallysparsifying its elements.

Solutions: We can reformulates the Lasso-Granger via the Grouped-Lassopenalization as follows:

min{a}

T

∑t=L+1

∥X1(t) −n

∑i=1

a⊺iXt,Laggedi ∥

2

2

+ λn

∑i=1

∥ai∥2 , (6)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 46 / 66

Page 47: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Practical Issues: Non-stationary

Observation: Stationary assumptions can be easily violated in practicalapplications.

Solutions:

Window-based approach: divide the time series into sub-intervals,assume that in each sub-interval the time series are stationary andapply Granger causality

Markov model-based approach for recurrent patterns: associate eachtemporal structure with a hidden state and automatically infer thepossible state assignment of each time stamp and its temporalstructures via EM algorithm

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 47 / 66

Page 48: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Practical Issues: Collinearity

Observation: Some time series could be linearly correlated to each other,which raises identifiability issues for regression-based approaches

Solutions: feature selection methods, such as SCAD [Fan et al, 2001] andadaptive Lasso [Yuan & Zou, 2006], can be applied.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 48 / 66

Page 49: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Practical Issues: Scalability

Parallel Stochastic Optimization AlgorithmsStochastic Subgradient Langevin Dynamics (SGLD) [Welling & Teh, 2011]:combining mini-batch stochastic subgradient descent and Langevin Dynamics.

Parallel Stochastic Coordinate Descent (Shotgun) [Bradley et al, 2011]: a parallelimplementation of Stochastic Coordinate Descent.

Parallel Stochastic Gradient Descent (PSGD) [Zinkevich et al, 2010]: randomlypartitioning the data, giving one partition to each processor, which sequentiallyuses each data point of its own partition to update β using a constant step size η.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 49 / 66

Page 50: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 2: Alternative Approaches and New Trends (80 mins)

Practical Issues in Granger causality (30 mins)▸ Time lag▸ Group effect▸ Non-stationary▸ Collinearity▸ Scalability

Alternative Approaches (30 mins)▸ Randomization test▸ Auto-correlation and cross-correlation▸ Transfer entropy

Illustration Examples (20 mins)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 50 / 66

Page 51: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Randomization Tests

Also knowns as Exact Test or Permutation Test; is a computationalapproach for statistical significance test.

H0 ∶X → Y vs. H1 ∶X /→ Y

Species Richness Area

32 2.0

29 0.9

35 3.1

26 3.0

41 1.0

62 2.0

88 4.0

77 3.5

In 9.5% of permuted cases thecomputed correlation is higherthan the original case. s

⇒ “How likely is it that if the null hypothesis were true, I would observe avalue this extreme just due to chance?”

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 51 / 66

Page 52: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Significance Tests

Suppose We are testing H0 ∶ θ ∈ Θ0 vs H1 ∶ θ ∉ Θ0. We can define thefollowing statistics:

λn =supθ∈ΘLn(θ)supθ∈Θ0

Ln(θ)The Likelihood Ratio Test rejects H0 if λn ≥ c for some c.

Theorem Asymptotic Distribution of Generalized Likelihood RatioStatistics: Subject to many assumptions (mostly smoothness) forhypotheses like H0 ∶ θ1 = θ2 = . . . = θr = 0 for some 1 ≤ r ≤ k, then

2 logλn → χ2r as n→∞

Practically suppose we want to guarantee with 1 − α probability that thetest is correct. Use the αχ2

rquantile to find the appropriate level for c.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 52 / 66

Page 53: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Structural Equation Modeling (SEM): Concepts

Goals

Formulate a framework for counter-factual analysis

Design statistical tools for testing proposed causal structure.

Terminology

Confounder background (usually unobserved) factors which can affect thecause and effect under study.

Spurious Effects The paths between the proposed cause and effectthrough other factors (variables).

Key claims

Causality is associated with change.

Causality cannot be inferred from observational data without extraassumptions.

Given a graph, I can tell you whether you can judge about a certaincausal relationship.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 53 / 66

Page 54: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

SEM: The Model

Structural Equations A SEM is a set of equations describing the value ofeach variable in V as a function fX of its parents paX and a randomdisturbance term uX :

x = fX(paX , uX).Counter-Factual Analysis The Counter-Factual Yx(u), Y would be y (insituation u) had X been x, is given by:

Yx(u) ≜ YMx(u),Mx is a modified version of M with equations of X replaced by X = x.J. Pearl/Causal inference in statistics 123

X YZ X YZ

(a) (b)

U UU U UU

Z X Y

0x

YZ X

Fig 5. (a) Causal diagram representing a clinical trial with imperfect compliance. (b) Adiagram representing interventional treatment control.

the observed response. The UY term represents all factors (unobserved) thatinfluence the way a subject responds to treatments; hence, an arrow is drawnfrom UY to Y . Similarly, UX denotes all factors that influence the subject’scompliance with the assignment, and UZ represents the random device used indeciding assignment. The dependence between UX and UY allows for certain fac-tors (e.g., socio economic status or predisposition to disease and complications)to influence both compliance and response. In Eq. (5), fX represents the pro-cess by which subjects select treatment level and fY represents th process thatdetermines the outcome Y . Clearly, perfect compliance would amount to settingfX(z, uX) = z while any dependence on uX represents imperfect compliance.

The graphical model of Fig. 5(a) reflects two assumptions.

1. The assignment Z does not influence Y directly but rather through theactual treatment taken, X. This type of assumption is called “exclusion”restriction, for it excludes a variable (Z) from being a determining argu-ment of the function fY , as in Eq. (5).

2. The variable Z is independent of UY and UX ; this is ensured through therandomization of Z, which rules out a common cause for both Z and UY

(as well as for Z and UX).

By drawing the diagram of Fig. 5(a) an investigator encodes an unambiguousspecification of these two assumptions, and permits the technical part of theanalysis to commence, under the interpretation provided by Eq. (5).

The target of causal analysis in this setting is to estimate the causal effect ofthe treatment (X) on the outcome (Y ), as defined by the modified model of Eq.(6) and the corresponding distribution P (y|do(x0)). In words, this distributiondescribes the response of the population to a hypothetical experiment in whichwe administer treatment at level X = x0 uniformly to the entire populationand let x0 take different values on hypothetical copies of the population. Aninspection of the diagram in Fig. 5(a) reveals immediately that this distributionis not identifiable by adjusting for confounders. The graphical criterion for suchidentification (Definition 3) requires the existence of observed covariates on the“back-door” path X ← UX ↔ UY → Y , that blocks the spurious associationscreated by that path. Had UX (or UY ) been observable, the treatment effect

Figure : Pearl’s Graph Surgery

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 54 / 66

Page 55: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

SEM: Graph Surgery with Conditional Independence Tests

CI Test Principle Condition on adequate control variables, which willblock paths linking X and Y other than those which would exist in thesurgically-altered graph where all paths into X have been removed.Back-door criterion: (i) S blocks every path from X to Y that has anarrow into X , and (ii) no node in S is a descendant of X. Then

P[Y ∣do(X = x)] =∑s

P[Y ∣X = x,S = s]P(S = s)

Front-door criterion:(i) S blocks all directed paths from X to Y , (ii) thereare no unblocked back-door paths from X to S, and(iii) X blocks all back-door paths from S to Y . Forexample:

P[Y ∣do(X = x)] = P[Y ∣do(Z = z)]P[Z ∣do(X = x)]= P[Y ∣Z = z]P[Z ∣X = x]

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 55 / 66

Page 56: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

SEM: Theoretical Results

The back-door and front-door criteria are Sufficient for estimating thecausal effects from probabilistic distributions,

but they are not Necessary .

The necessary conditions don’t have nice forms.

When controlling all the variables is impossible we can still estimateor bound the P[Y ∣do(X = x)]. This is called Partial Identification.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 56 / 66

Page 57: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

SEM: A Practical Example

Figure : Effect of Warm-up on Injury (After Shrier & Platt, 2008)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 57 / 66

Page 58: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Partial Correlation Based Algorithms List

TC Algorithm (Pellet et al., 2007) Finds correlation of (X,Y ) given therest of the variables. Then identifies the colliders and finds the direction ofthe edges using them. Complexity O(n3).Many BN Learning algorithms:

L1MB (Schmidt et al., 2007) Maximizing L1 penalized likelihoodfunction to get an undirected graph. Prune the graph usinggreedy hill climbing.

IAMB (Tsamardinos et al., 2003) Uses Mutual Information insteadof correlation and thresholds for deciding which edges tokeep. Not necessarily returns DAG. Complexity O(n2).

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 58 / 66

Page 59: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Use of Asymmetry

Define higher order covariance as:

covjk[x1, x2] = E[(x1 −E(x1))j(x2 −E(x2))k],and higher order correlation as:

ρjk(x1, x2) =covjk(x1, x2)

σjx1σkx2

.

Consider two models:Model #1: x2 = b12x1 + ε1,Model #2: x1 = b21x2 + ε2,Prefer Model #1 if (as long as ρx1x2 ≠ 0, µ3(x1) ≠ 0 and µ3(x2) ≠ 0.):

ρ̂212 > ρ̂2

21

and Model #2 otherwise.If µ3(ε) = 0 then the test becomes:

γ̂2x1 > γ̂

2x2

The response variable should have lower skewness!Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 59 / 66

Page 60: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Transfer Entropy

Definition (Schreiber, 2000)

TX→Y = H (Y t∣Y t−1,Xt−1) −H (Y t∣Y t−1)The amount of resolved uncertainty in future of Y by past values of Xgiven past values of Y .Properties

Non-linear causation

Non-stationary time series

Conceptually more meaningful;since uses independence instead of correlation.

Limitations

Only well-defined for two time series

Computation of entropy from observations is challenging.

Relationship with Granger Causality (Barnett, 2009)

Transfer Entropy and Granger Causality are equivalent if the data isdistributed according to multivariate Gaussian distribution.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 60 / 66

Page 61: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Outline

Lecture 2: Alternative Approaches and New Trends (80 mins)

Practical Issues in Granger causality (30 mins)▸ Time lag▸ Group effect▸ Non-stationary▸ Collinearity▸ Scalability

Alternative Approaches (30 mins)▸ Randomization test▸ Auto-correlation and cross-correlation▸ Transfer entropy

Illustration Examples (20 mins)

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 61 / 66

Page 62: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Demo

Code is available at:http://www-bcf.usc.edu/~liu32/code.htm

Visualization is available at:http://beijing.usc.edu:22222/granger/granger.php

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 62 / 66

Page 63: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Demo 1: Synthetic Dataset

Synthetic Dataset generated from VAR models with 12 time series

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 63 / 66

Page 64: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Demo 2: Twitter Dataset

Twitter dataset generated from Tweet behaviors by top 20 Twitter userson Haiti-earthquake

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 64 / 66

Page 65: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Bibliography I

Philosophical Interpretations— Stanford Encyc. of Philosophy. Causal Processes; Counter-factual Theories of CausationRandomization Tests— Sheehan, N. A., Didelez, V., Burton, P. R., & Tobin, M. D. (2008). Mendelianrandomisation and causal inference in observational epidemiology. PLoS medicine.— La Fond, T., & Neville, J. (2010). Randomization tests for distinguishing social influence andhomophily effects. WWW’10SEM and Partial Correlation Algorithms— Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys.— Pellet;, J.-P., & Elisseeff, A. (2007). Partial Correlation- and Regression-Based Approachesto Causal Structure Learning. IBM Technical Report— Schmidt, M., Niculescu-Mizil, A., & Murphy, K. (2007). Learning graphical model structureusing L1-regularization paths.Theory— Robins, J. M., Scheines, R., Spirtes, P., & Wasserman, L. (2003). Uniform consistency incausal inference. Biometrika, 90(3), 491-515. doi:10.1093/biomet/90.3.491Non-Gaussian SEM— Shimizu, S., & Kano, Y. (2008). Use of non-normality in structural equation modeling:Application to direction of causation. Journal of Statistical Planning and Inference.— Shimizu, S., Hoyer, P. O., Hyvrinen, A., & Kerminen, A. (2006). A Linear Non-GaussianAcyclic Model for Causal Discovery. Journal of Machine Learning Research.

— Dodge, Y., & Rousson, V. (2001). On Asymmetric Properties of the Correlation Coefficient

in the Regression Setting. The Journal of American Statistical Association.

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 65 / 66

Page 66: Causality Analysis in Large-scale Time Series Data - CIKM ...liu32/cause.pdf · Causality Analysis in Large-scale Time Series Data CIKM 2013 Tutorial Yan Liu University of Southern

Bibliography II

Non-linear Transformations— Hoyer, P. O., Janzing, D., Mooij, J., Peters, J., & Schlkopf, B. (2008). Nonlinear causaldiscovery with additive noise models. NIPS— Daniusis, P., Janzing, D., Mooij, J., Zscheischler, J., Steudel, B., Zhang, K., & Schoelkopf,B. (2010). Inferring deterministic causal relations. UAIGranger Causality— Anil Seth (2007) Granger causality. Scholarpedia, 2(7):1667.— Michael Eichler. Causal inference from time series: What can be learned from grangercausal- ity? Technical report, 2008.— Valds-Sosa, P. A., Snchez-Bornot, J. M., Lage-Castellanos, A., Vega-Hernndez, M.,Bosch-Bayard, J., Melie-Garca, L., & Canales-Rodrguez, E. (2005). Estimating brain functionalconnectivity with sparse multivariate autoregression. Philosophical transactions of the RoyalSociety of London. Series B, Biological sciences.— Look into publications of Prof. LiuTransfer Entropy— Schreiber, T. (2000). Measuring Information Transfer. Physical Review Letters.— Barnett, L. (2009). Granger Causality and Transfer Entropy Are Equivalent for GaussianVariables. Physical Review Letters.General Reading— Cosma Shalizi, Causality Inference lecture note:http://www.stat.cmu.edu/~cshalizi/350/lectures/31/lecture-31.pdf.Causality Reference List. http://cscs.umich.edu/~crshalizi/notabene/causality.html

Yan Liu (USC) Causality Analysis in Large-scale Time Series Data October 29, 2013 66 / 66