
  • Discovering Causal Structure with the PC-algorithm

    CRG and MSDSlab Meeting

    Discussant: Oisín Ryan

    February 23, 2018

  • What is a DAG?

    - DAG = Directed Acyclic Graph
    - Nodes or vertices = {Author, Bar, V5, ...}
    - Directed edges (→)
    - No cycles: we cannot have A → B → C → A (checked in the code sketch below)
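A DAG is easy to hold in code as a map from each node to its parents. The sketch below is illustrative only (the node names and edges are made up) and checks the no-cycles requirement by depth-first search:

```python
# A minimal sketch: a DAG as a dict mapping each node to its parents,
# plus a depth-first check that the "no cycles" requirement holds.

dag = {
    "A": [],     # A has no parents
    "B": ["A"],  # A -> B
    "C": ["B"],  # B -> C
}

def is_acyclic(parents):
    """Return True if following parent links never revisits a node."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:   # back-edge: we have found a cycle
            return False
        visiting.add(node)
        ok = all(visit(p) for p in parents[node])
        visiting.remove(node)
        done.add(node)
        return ok

    return all(visit(n) for n in parents)

print(is_acyclic(dag))   # True
dag["A"] = ["C"]         # adding C -> A creates A -> B -> C -> A
print(is_acyclic(dag))   # False: the forbidden cycle from the slide
```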

  • Why DAGs?

    1. DAGs can be used to represent joint probability distributionsI Often called Bayesian networksI Nodes represent variablesI Edges represent dependencies between pairs of variables

    I A → B means A 6⊥⊥ BI Read off conditional dependency relationships

    2. DAGs + probability distribution used for causal inferenceI Edges represent direct causal links

    I A → B means A causes BI Counterfactual causality (Pearl, Rubin, Spirtes & Glymour)I Account for typical ideas about causality

    I Forward in time - acyclicalI Explains “paradoxes” - Simpsons, Lords


  • Some graph terminology

    - Parents(B) = {A, D}
    - Children(B) = {C}
    - A is an ancestor of C
    - C is a descendant of A (both relations are sketched in code after this list)
    - Path = a sequence of edges
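The terminology translates directly into code. A small sketch on a hypothetical DAG with edges A → B, D → B, and B → C (chosen to match Parents(B) = {A, D} and Children(B) = {C} on the slide):

```python
# Hypothetical DAG matching the slide: A -> B, D -> B, B -> C.
parents = {"A": set(), "D": set(), "B": {"A", "D"}, "C": {"B"}}

def ancestors(node):
    """All nodes reachable by following parent links into `node`."""
    result = set()
    for p in parents[node]:
        result |= {p} | ancestors(p)
    return result

def descendants(node):
    """All nodes reachable by following child links out of `node`."""
    children = {n: {m for m in parents if n in parents[m]} for n in parents}
    result, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in result:
                result.add(c)
                stack.append(c)
    return result

print(ancestors("C"))    # {'A', 'B', 'D'}: A is an ancestor of C
print(descendants("A"))  # {'B', 'C'}: C is a descendant of A
```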

  • DAGs and Probability distributions

    DAGs can be used to represent a joint density function using the Markov Condition:

    P(𝐕) = ∏_{V ∈ 𝐕} P(V | Parents(V))    (1)

    For example, for the DAG A → C ← B, C → D implied by the parent sets below:

    P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C)

    We can also read off conditional (in)dependence relationships not directly implied by the Markov Condition.
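Equation (1) can be checked numerically. The snippet below builds the joint for the four-variable example from made-up conditional probability tables (all numbers are arbitrary) and confirms that the factorization yields a proper distribution:

```python
import itertools

# P(A,B,C,D) = P(A) P(B) P(C|A,B) P(D|C), with arbitrary binary tables.
pA = {0: 0.6, 1: 0.4}
pB = {0: 0.7, 1: 0.3}
pC = {(a, b): {0: 0.9 - 0.3 * a - 0.3 * b, 1: 0.1 + 0.3 * a + 0.3 * b}
      for a in (0, 1) for b in (0, 1)}               # P(C | A, B)
pD = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}      # P(D | C)

def joint(a, b, c, d):
    return pA[a] * pB[b] * pC[(a, b)][c] * pD[c][d]  # the Markov factorization

total = sum(joint(*v) for v in itertools.product((0, 1), repeat=4))
print(round(total, 10))  # 1.0: the factorization defines a proper distribution
```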


  • DAGs and conditional dependencies

    [Figure: three DAGs over X, Z, Y in which Z links X and Y: the chain X → Z → Y, the reversed chain X ← Z ← Y, and the fork X ← Z → Y]

    In each case: X ⊥̸⊥ Y, but X ⊥⊥ Y | Z

  • DAGs and conditional dependencies

    [Figure: the collider X → Z ← Y]

    X ⊥⊥ Y, but X ⊥̸⊥ Y | Z (simulated in the code below)

    General rules for reading off conditional (in)dependencies from DAGs are known as d-separation rules.
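The collider behaviour is easy to see in simulation. Below, X and Y are generated independently and Z = X + Y + noise; conditioning on Z is approximated by partial correlation (regressing Z out of both sides). This is a sketch, not part of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = rng.normal(size=n)           # generated independently of x
z = x + y + rng.normal(size=n)   # the collider X -> Z <- Y

def residual(a, b):
    """Residual of a after least-squares regression on b."""
    beta = np.cov(a, b)[0, 1] / np.var(b)
    return a - beta * b

print(np.corrcoef(x, y)[0, 1])   # ~0: X and Y are marginally independent
print(np.corrcoef(residual(x, z), residual(y, z))[0, 1])
                                 # ~-0.5: dependent once we condition on Z
```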


  • What use is having a DAG?

    - Representation of causal relations amongst variables
    - Estimation of causal effects
      - Identify a sufficient conditioning set to control for confounding
      - I may not need to condition on all possible ancestors
      - I shouldn't condition on colliders (see the simulation after this list)
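Both conditioning bullets can be illustrated with one regression simulation. Below, C is a confounder (C → X, C → Y, X → Y) and M is a collider (X → M ← Y); all coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
c = rng.normal(size=n)                       # confounder: C -> X, C -> Y
x = 0.8 * c + rng.normal(size=n)
y = 0.5 * x + 0.8 * c + rng.normal(size=n)   # true causal effect of X: 0.5
m = x + y + rng.normal(size=n)               # collider: X -> M <- Y

def coef_of_x(covariates):
    """OLS coefficient on x when regressing y on x plus `covariates`."""
    X = np.column_stack([np.ones(n), x] + covariates)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(coef_of_x([]))      # ~0.89: confounding bias
print(coef_of_x([c]))     # ~0.50: adjusting for the confounder suffices
print(coef_of_x([c, m]))  # biased again: we conditioned on a collider
```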

  • Discovering DAGs from Data

    PC algorithm (Peter Spirtes & Clark Glymour)

    Assuming:

    - The set of observed variables is sufficient
      - No common causes are missing from the dataset
      - Extensions that account for latent variables do exist!
    - The distribution of the observed variables is faithful to a DAG

  • PC algorithm

    Two simple principles:

    1. There is an edge X − Y if and only if X and Y are dependent conditional on every possible subset of the other variables
       - For a graph G with vertex set V: X − Y iff X ⊥̸⊥ Y | S for all S ⊆ V \ {X, Y}
       - (Principle 1 is sketched in code after this list)

    2. If X − Y − Z, we can orient the arrows as X → Y ← Z if and only if X and Z are dependent conditional on every set containing Y
       - We only orient arrows if we can identify a collider
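Principle 1 translates almost verbatim into code. The `ci_test` argument below is a stand-in for any conditional-independence test; the toy oracle used in the demo (hypothetical, for a chain A → B → C) simply declares A and C independent given any set containing B:

```python
from itertools import combinations

def has_edge(x, y, variables, ci_test):
    """X - Y survives iff X, Y stay dependent given every subset S."""
    others = [v for v in variables if v not in (x, y)]
    for size in range(len(others) + 1):
        for s in combinations(others, size):
            if ci_test(x, y, set(s)):  # found S with X indep. Y | S
                return False           # so there is no edge X - Y
    return True

# Toy oracle for the chain A -> B -> C: only A and C are separated (by {B}).
oracle = lambda x, y, s: {x, y} == {"A", "C"} and "B" in s
print(has_edge("A", "B", "ABC", oracle))  # True: no subset separates them
print(has_edge("A", "C", "ABC", oracle))  # False: S = {B} separates them
```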


  • PC algorithm example

    [Figure: the true DAG over the five variables X, Y, Z, Q, W]

  • PC algorithm example

    [Figure: fully connected undirected graph over X, Y, Z, Q, W]

    Step 0

    - Start with a fully connected undirected graph

  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1a

    - Test marginal dependencies for each pair
    - E.g. the correlation between X and Z
    - If not significant, delete the edge
    - Repeat for all pairs (a code sketch of this step follows)
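Step 1a in code: test each pair's marginal correlation and drop edges whose test is not significant. A minimal sketch, assuming an (n samples × p variables) array and the conventional alpha = 0.05:

```python
import numpy as np
from scipy import stats

def step_1a(data, alpha=0.05):
    """Start fully connected; delete edges with non-significant correlation."""
    p = data.shape[1]
    edges = {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)}
    for i, j in [tuple(e) for e in edges]:
        if stats.pearsonr(data[:, i], data[:, j])[1] > alpha:
            edges.discard(frozenset((i, j)))   # cannot reject independence
    return edges

rng = np.random.default_rng(3)
a = rng.normal(size=2000)
b = 0.7 * a + rng.normal(size=2000)
c = rng.normal(size=2000)                      # unrelated to a and b
print(step_1a(np.column_stack([a, b, c])))     # typically {frozenset({0, 1})}
```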


  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1b

    - Take the result of Step 1a as input
    - For each adjacent pair in this graph (e.g. A, B):
      - Form the adjacency set of A and B, Adj(A, B): the set of variables connected to either A or B
      - Test the conditional independence of A and B given each size-1 subset of Adj(A, B)
      - Record the variables that separate A from B (sketched in code after this list)
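Step 1b in code: for each surviving edge, test conditional independence given each single member of the pair's adjacency set, removing the edge and recording the separating variable whenever independence cannot be rejected. Partial correlation by regressing out the conditioning variable is one simple (linear-Gaussian) choice of test:

```python
import numpy as np
from scipy import stats

def pcorr_pval(data, i, j, k):
    """p-value for corr(X_i, X_j) after regressing X_k out of both."""
    def resid(a, b):
        return a - (np.cov(a, b)[0, 1] / np.var(b)) * b
    return stats.pearsonr(resid(data[:, i], data[:, k]),
                          resid(data[:, j], data[:, k]))[1]

def step_1b(data, edges, alpha=0.05):
    sepset = {}                                   # records who separated whom
    for a, b in [tuple(e) for e in edges]:
        adj = {v for e in edges for v in e if a in e or b in e} - {a, b}
        for k in adj:
            if pcorr_pval(data, a, b, k) > alpha:
                edges.discard(frozenset((a, b)))
                sepset[frozenset((a, b))] = {k}   # k separates a from b
                break
    return edges, sepset
```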


  • PC algorithm example

    [Figure: working graph over X, Y, Z, Q, W]

    Step 1c

    - Take the result of Step 1b as input
    - Repeat the previous step, but for subsets of size 2
    - In general, keep increasing the number of variables you condition on until this is larger than the size of Adj(A, B) (the full loop is sketched in code after this list)
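Putting Steps 1a through 1c together gives the full skeleton phase: keep growing the conditioning-set size until it exceeds |Adj(A, B)| for every remaining edge. The sketch below assumes linear-Gaussian data, so conditioning on a set S is done by regressing S out of both variables:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def resid(y, Z):
    """Residual of y after least-squares regression on the columns of Z."""
    if Z.shape[1] == 0:
        return y
    Z = np.column_stack([np.ones(len(y)), Z])
    return y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

def ci_pval(data, i, j, S):
    S = sorted(S)
    return stats.pearsonr(resid(data[:, i], data[:, S]),
                          resid(data[:, j], data[:, S]))[1]

def skeleton(data, alpha=0.05):
    p = data.shape[1]
    edges = {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)}
    sepset = {}
    for size in range(p - 1):                 # growing |S|, as on the slides
        for a, b in [tuple(e) for e in edges.copy()]:
            adj = {v for e in edges for v in e if a in e or b in e} - {a, b}
            if len(adj) < size:
                continue                      # |S| now exceeds |Adj(a, b)|
            for S in combinations(sorted(adj), size):
                if ci_pval(data, a, b, set(S)) > alpha:
                    edges.discard(frozenset((a, b)))
                    sepset[frozenset((a, b))] = set(S)
                    break
    return edges, sepset

# Quick check on data from the collider X -> Z <- Y:
rng = np.random.default_rng(4)
x, y = rng.normal(size=(2, 5000))
z = x + y + rng.normal(size=5000)
edges, sepset = skeleton(np.column_stack([x, y, z]))
print(edges)   # typically {frozenset({0, 2}), frozenset({1, 2})}
print(sepset)  # {frozenset({0, 1}): set()}: X, Y separated by the empty set
```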


  • PC algorithm example

    [Figure: estimated skeleton over X, Y, Z, Q, W]

    - This is an estimate of the skeleton of the DAG
    - i.e. an estimate of the undirected version of the DAG

  • PC algorithm example

    [Figure: partially oriented graph over X, Y, Z, Q, W]

    Step 2

    - For each unshielded triple (open triangle) A − B − C
    - Orient A → B ← C if we did not condition on B to separate A and C (sketched in code after this list)
    - We now have a completed partially directed acyclic graph (CPDAG)
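Step 2 in code: scan every unshielded triple A − B − C and orient it as a collider exactly when B is not in the recorded separating set of A and C. Here `edges` and `sepset` are the skeleton-phase outputs from the sketch above:

```python
def orient_colliders(edges, sepset, p):
    """Return directed edges (tail, head) for every detected collider."""
    arrows = set()
    adj = lambda v: {w for e in edges for w in e if v in e} - {v}
    for b in range(p):
        for a in adj(b):
            for c in adj(b) - {a}:
                unshielded = frozenset((a, c)) not in edges
                if unshielded and b not in sepset.get(frozenset((a, c)), set()):
                    arrows |= {(a, b), (c, b)}   # orient a -> b <- c
    return arrows

edges = {frozenset((0, 2)), frozenset((1, 2))}   # skeleton X - Z - Y
sepset = {frozenset((0, 1)): set()}              # X, Y separated without Z
print(orient_colliders(edges, sepset, 3))        # {(0, 2), (1, 2)}: X -> Z <- Y
```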


  • Let’s test the PC algorithm out
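One way to try this out in Python is with simulated data and the third-party causal-learn package. The talk itself does not specify an implementation, and the import and call below reflect that package's documented interface as I understand it, so treat both as assumptions to verify against its current docs. The generating DAG here is invented, since the slide's true DAG is not recoverable from the transcript:

```python
import numpy as np
from causallearn.search.ConstraintBased.PC import pc  # assumed third-party API

rng = np.random.default_rng(2018)
n = 5_000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)             # X -> Z
q = 0.8 * y + rng.normal(size=n)             # Y -> Q
w = 0.6 * z + 0.6 * q + rng.normal(size=n)   # Z -> W <- Q: a collider

data = np.column_stack([x, y, z, q, w])
cg = pc(data, alpha=0.05)                    # PC with the default Fisher-z test
print(cg.G)                                  # estimated CPDAG over the 5 nodes
```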