
  • Discovering Causal Structure with the PC-algorithm

    CRG and MSDSlab Meeting

    Discussant: Oisín Ryan

    February 23, 2018

  • What is a DAG?

    - DAG = Directed Acyclic Graph
    - Nodes or vertices = {Author, Bar, V5, ...}
    - Directed edges (→)
    - No cycles: we cannot have A → B → C → A (checked in the code sketch below)
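A DAG is easy to hold in code as a map from each node to its parents. The sketch below is illustrative only (the node names and edges are made up) and checks the no-cycles requirement by depth-first search:

```python
# A minimal sketch: a DAG as a dict mapping each node to its parents,
# plus a depth-first check that the "no cycles" requirement holds.

dag = {
    "A": [],     # A has no parents
    "B": ["A"],  # A -> B
    "C": ["B"],  # B -> C
}

def is_acyclic(parents):
    """Return True if following parent links never revisits a node."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:   # back-edge: we have found a cycle
            return False
        visiting.add(node)
        ok = all(visit(p) for p in parents[node])
        visiting.remove(node)
        done.add(node)
        return ok

    return all(visit(n) for n in parents)

print(is_acyclic(dag))   # True
dag["A"] = ["C"]         # adding C -> A creates A -> B -> C -> A
print(is_acyclic(dag))   # False: the forbidden cycle from the slide
```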

  • Why DAGs?

    1. DAGs can be used to represent joint probability distributionsI Often called Bayesian networksI Nodes represent variablesI Edges represent dependencies between pairs of variables

    I A → B means A 6⊥⊥ BI Read off conditional dependency relationships

    2. DAGs + probability distribution used for causal inferenceI Edges represent direct causal links

    I A → B means A causes BI Counterfactual causality (Pearl, Rubin, Spirtes & Glymour)I Account for typical ideas about causality

    I Forward in time - acyclicalI Explains “paradoxes” - Simpsons, Lords


  • Some graph terminology

    - Parents(B) = {A, D}
    - Children(B) = {C}
    - A is an ancestor of C
    - C is a descendant of A (both relations are sketched in code after this list)
    - Path = a sequence of edges
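The terminology translates directly into code. A small sketch on a hypothetical DAG with edges A → B, D → B, and B → C (chosen to match Parents(B) = {A, D} and Children(B) = {C} on the slide):

```python
# Hypothetical DAG matching the slide: A -> B, D -> B, B -> C.
parents = {"A": set(), "D": set(), "B": {"A", "D"}, "C": {"B"}}

def ancestors(node):
    """All nodes reachable by following parent links into `node`."""
    result = set()
    for p in parents[node]:
        result |= {p} | ancestors(p)
    return result

def descendants(node):
    """All nodes reachable by following child links out of `node`."""
    children = {n: {m for m in parents if n in parents[m]} for n in parents}
    result, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in result:
                result.add(c)
                stack.append(c)
    return result

print(ancestors("C"))    # {'A', 'B', 'D'}: A is an ancestor of C
print(descendants("A"))  # {'B', 'C'}: C is a descendant of A
```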

  • DAGs and Probability distributions

    DAGs can be used to represent a joint density function using the Markov Condition:

    P(𝐕) = ∏_{V ∈ 𝐕} P(V | Parents(V))    (1)

    For example, for the DAG A → C ← B, C → D implied by the parent sets below:

    P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C)

    We can also read off conditional (in)dependence relationships not directly implied by the Markov Condition.
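Equation (1) can be checked numerically. The snippet below builds the joint for the four-variable example from made-up conditional probability tables (all numbers are arbitrary) and confirms that the factorization yields a proper distribution:

```python
import itertools

# P(A,B,C,D) = P(A) P(B) P(C|A,B) P(D|C), with arbitrary binary tables.
pA = {0: 0.6, 1: 0.4}
pB = {0: 0.7, 1: 0.3}
pC = {(a, b): {0: 0.9 - 0.3 * a - 0.3 * b, 1: 0.1 + 0.3 * a + 0.3 * b}
      for a in (0, 1) for b in (0, 1)}               # P(C | A, B)
pD = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}      # P(D | C)

def joint(a, b, c, d):
    return pA[a] * pB[b] * pC[(a, b)][c] * pD[c][d]  # the Markov factorization

total = sum(joint(*v) for v in itertools.product((0, 1), repeat=4))
print(round(total, 10))  # 1.0: the factorization defines a proper distribution
```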


  • DAGs and conditional dependencies

    [Figure: three DAGs over X, Z, Y in which Z links X and Y: the chain X → Z → Y, the reversed chain X ← Z ← Y, and the fork X ← Z → Y]

    In each case: X ⊥̸⊥ Y, but X ⊥⊥ Y | Z

  • DAGs and conditional dependencies

    [Figure: the collider X → Z ← Y]

    X ⊥⊥ Y, but X ⊥̸⊥ Y | Z (simulated in the code below)

    General rules for reading off conditional (in)dependencies from DAGs are known as d-separation rules.
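The collider behaviour is easy to see in simulation. Below, X and Y are generated independently and Z = X + Y + noise; conditioning on Z is approximated by partial correlation (regressing Z out of both sides). This is a sketch, not part of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = rng.normal(size=n)           # generated independently of x
z = x + y + rng.normal(size=n)   # the collider X -> Z <- Y

def residual(a, b):
    """Residual of a after least-squares regression on b."""
    beta = np.cov(a, b)[0, 1] / np.var(b)
    return a - beta * b

print(np.corrcoef(x, y)[0, 1])   # ~0: X and Y are marginally independent
print(np.corrcoef(residual(x, z), residual(y, z))[0, 1])
                                 # ~-0.5: dependent once we condition on Z
```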


  • What use is having a DAG?

    - Representation of causal relations amongst variables
    - Estimation of causal effects
      - Identify a sufficient conditioning set to control for confounding
      - I may not need to condition on all possible ancestors
      - I shouldn't condition on colliders (see the simulation after this list)
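Both conditioning bullets can be illustrated with one regression simulation. Below, C is a confounder (C → X, C → Y, X → Y) and M is a collider (X → M ← Y); all coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
c = rng.normal(size=n)                       # confounder: C -> X, C -> Y
x = 0.8 * c + rng.normal(size=n)
y = 0.5 * x + 0.8 * c + rng.normal(size=n)   # true causal effect of X: 0.5
m = x + y + rng.normal(size=n)               # collider: X -> M <- Y

def coef_of_x(covariates):
    """OLS coefficient on x when regressing y on x plus `covariates`."""
    X = np.column_stack([np.ones(n), x] + covariates)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(coef_of_x([]))      # ~0.89: confounding bias
print(coef_of_x([c]))     # ~0.50: adjusting for the confounder suffices
print(coef_of_x([c, m]))  # biased again: we conditioned on a collider
```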

  • Discovering DAGs from Data

    PC algorithm (Peter Spirtes & Clark Glymour)

    Assuming:

    - The set of observed variables is sufficient
      - No common causes are missing from the dataset
      - Extensions that account for latent variables do exist!
    - The distribution of the observed variables is faithful to a DAG

  • PC algorithm

    Two simple principles:

    1. There is an edge X − Y if and only if X and Y are dependent conditional on every possible subset of the other variables
       - For a graph G with vertex set V: X − Y iff X ⊥̸⊥ Y | S for all S ⊆ V \ {X, Y}
       - (Principle 1 is sketched in code after this list)

    2. If X − Y − Z, we can orient the arrows as X → Y ← Z if and only if X and Z are dependent conditional on every set containing Y
       - We only orient arrows if we can identify a collider
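Principle 1 translates almost verbatim into code. The `ci_test` argument below is a stand-in for any conditional-independence test; the toy oracle used in the demo (hypothetical, for a chain A → B → C) simply declares A and C independent given any set containing B:

```python
from itertools import combinations

def has_edge(x, y, variables, ci_test):
    """X - Y survives iff X, Y stay dependent given every subset S."""
    others = [v for v in variables if v not in (x, y)]
    for size in range(len(others) + 1):
        for s in combinations(others, size):
            if ci_test(x, y, set(s)):  # found S with X indep. Y | S
                return False           # so there is no edge X - Y
    return True

# Toy oracle for the chain A -> B -> C: only A and C are separated (by {B}).
oracle = lambda x, y, s: {x, y} == {"A", "C"} and "B" in s
print(has_edge("A", "B", "ABC", oracle))  # True: no subset separates them
print(has_edge("A", "C", "ABC", oracle))  # False: S = {B} separates them
```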


  • PC algorithm example

    [Figure: the true DAG over the five variables X, Y, Z, Q, W]

  • PC algorithm example

    [Figure: fully connected undirected graph over X, Y, Z, Q, W]

    Step 0

    - Start with a fully connected undirected graph

  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1a

    - Test marginal dependencies for each pair
    - E.g. the correlation between X and Z
    - If not significant, delete the edge
    - Repeat for all pairs (a code sketch of this step follows)
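Step 1a in code: test each pair's marginal correlation and drop edges whose test is not significant. A minimal sketch, assuming an (n samples × p variables) array and the conventional alpha = 0.05:

```python
import numpy as np
from scipy import stats

def step_1a(data, alpha=0.05):
    """Start fully connected; delete edges with non-significant correlation."""
    p = data.shape[1]
    edges = {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)}
    for i, j in [tuple(e) for e in edges]:
        if stats.pearsonr(data[:, i], data[:, j])[1] > alpha:
            edges.discard(frozenset((i, j)))   # cannot reject independence
    return edges

rng = np.random.default_rng(3)
a = rng.normal(size=2000)
b = 0.7 * a + rng.normal(size=2000)
c = rng.normal(size=2000)                      # unrelated to a and b
print(step_1a(np.column_stack([a, b, c])))     # typically {frozenset({0, 1})}
```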


  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1b

    - Take the result of Step 1a as input
    - For each adjacent pair in this graph (e.g. A, B):
      - Form the adjacency set of A and B, Adj(A, B): the set of variables connected to either A or B
      - Test the conditional independence of A and B given each size-1 subset of Adj(A, B)
      - Record the variables that separate A from B (sketched in code after this list)
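Step 1b in code: for each surviving edge, test conditional independence given each single member of the pair's adjacency set, removing the edge and recording the separating variable whenever independence cannot be rejected. Partial correlation by regressing out the conditioning variable is one simple (linear-Gaussian) choice of test:

```python
import numpy as np
from scipy import stats

def pcorr_pval(data, i, j, k):
    """p-value for corr(X_i, X_j) after regressing X_k out of both."""
    def resid(a, b):
        return a - (np.cov(a, b)[0, 1] / np.var(b)) * b
    return stats.pearsonr(resid(data[:, i], data[:, k]),
                          resid(data[:, j], data[:, k]))[1]

def step_1b(data, edges, alpha=0.05):
    sepset = {}                                   # records who separated whom
    for a, b in [tuple(e) for e in edges]:
        adj = {v for e in edges for v in e if a in e or b in e} - {a, b}
        for k in adj:
            if pcorr_pval(data, a, b, k) > alpha:
                edges.discard(frozenset((a, b)))
                sepset[frozenset((a, b))] = {k}   # k separates a from b
                break
    return edges, sepset
```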


  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1c

    - Take the result of Step 1b as input
    - Repeat the previous step, but for subsets of size 2
    - In general, keep increasing the number of variables you condition on until this is larger than the size of Adj(A, B) (the full loop is sketched in code after this list)
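Putting Steps 1a through 1c together gives the full skeleton phase: keep growing the conditioning-set size until it exceeds |Adj(A, B)| for every remaining edge. The sketch below assumes linear-Gaussian data, so conditioning on a set S is done by regressing S out of both variables:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def resid(y, Z):
    """Residual of y after least-squares regression on the columns of Z."""
    if Z.shape[1] == 0:
        return y
    Z = np.column_stack([np.ones(len(y)), Z])
    return y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

def ci_pval(data, i, j, S):
    S = sorted(S)
    return stats.pearsonr(resid(data[:, i], data[:, S]),
                          resid(data[:, j], data[:, S]))[1]

def skeleton(data, alpha=0.05):
    p = data.shape[1]
    edges = {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)}
    sepset = {}
    for size in range(p - 1):                 # growing |S|, as on the slides
        for a, b in [tuple(e) for e in edges.copy()]:
            adj = {v for e in edges for v in e if a in e or b in e} - {a, b}
            if len(adj) < size:
                continue                      # |S| now exceeds |Adj(a, b)|
            for S in combinations(sorted(adj), size):
                if ci_pval(data, a, b, set(S)) > alpha:
                    edges.discard(frozenset((a, b)))
                    sepset[frozenset((a, b))] = set(S)
                    break
    return edges, sepset

# Quick check on data from the collider X -> Z <- Y:
rng = np.random.default_rng(4)
x, y = rng.normal(size=(2, 5000))
z = x + y + rng.normal(size=5000)
edges, sepset = skeleton(np.column_stack([x, y, z]))
print(edges)   # typically {frozenset({0, 2}), frozenset({1, 2})}
print(sepset)  # {frozenset({0, 1}): set()}: X, Y separated by the empty set
```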


  • PC algorithm example

    [Figure: estimated skeleton over X, Y, Z, Q, W]

    - This is an estimate of the skeleton of the DAG
    - i.e. an estimate of the undirected version of the DAG

  • PC algorithm example

    [Figure: partially oriented graph over X, Y, Z, Q, W]

    Step 2

    - For each unshielded triple (open triangle) A − B − C
    - Orient A → B ← C if we did not condition on B to separate A and C (sketched in code after this list)
    - We now have a completed partially directed acyclic graph (CPDAG)
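Step 2 in code: scan every unshielded triple A − B − C and orient it as a collider exactly when B is not in the recorded separating set of A and C. Here `edges` and `sepset` are the skeleton-phase outputs from the sketch above:

```python
def orient_colliders(edges, sepset, p):
    """Return directed edges (tail, head) for every detected collider."""
    arrows = set()
    adj = lambda v: {w for e in edges for w in e if v in e} - {v}
    for b in range(p):
        for a in adj(b):
            for c in adj(b) - {a}:
                unshielded = frozenset((a, c)) not in edges
                if unshielded and b not in sepset.get(frozenset((a, c)), set()):
                    arrows |= {(a, b), (c, b)}   # orient a -> b <- c
    return arrows

edges = {frozenset((0, 2)), frozenset((1, 2))}   # skeleton X - Z - Y
sepset = {frozenset((0, 1)): set()}              # X, Y separated without Z
print(orient_colliders(edges, sepset, 3))        # {(0, 2), (1, 2)}: X -> Z <- Y
```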


  • Let’s test the PC algorithm out
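One way to try this out in Python is with simulated data and the third-party causal-learn package. The talk itself does not specify an implementation, and the import and call below reflect that package's documented interface as I understand it, so treat both as assumptions to verify against its current docs. The generating DAG here is invented, since the slide's true DAG is not recoverable from the transcript:

```python
import numpy as np
from causallearn.search.ConstraintBased.PC import pc  # assumed third-party API

rng = np.random.default_rng(2018)
n = 5_000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)             # X -> Z
q = 0.8 * y + rng.normal(size=n)             # Y -> Q
w = 0.6 * z + 0.6 * q + rng.normal(size=n)   # Z -> W <- Q: a collider

data = np.column_stack([x, y, z, q, w])
cg = pc(data, alpha=0.05)                    # PC with the default Fisher-z test
print(cg.G)                                  # estimated CPDAG over the 5 nodes
```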