Discovering the cause: Tools for structure learning in R · 2019-07-26

Discovering the cause: Tools for structure learning in R

Anne Helby Petersen
GitHub: annennenne, [email protected]

Section of Biostatistics, University of Copenhagen

useR! July 11, 2019

Looking for a cause

RQ: What factors influence development of dementia, depression and alcohol abuse?

More automation, please!

Q: Can we infer causal models from data?

A: Yes – sometimes!

Correlation does not imply causation

Source: www.xkcd.com/552/

... but causation may imply correlation

Reichenbach’s common cause principle: A correlation between X and Y occurs due to one of three possible mechanisms: X causes Y, Y causes X, or X and Y share a common cause.
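As a rough illustration (not part of the original slides), the sketch below simulates one data set per mechanism: X causes Y, Y causes X, and a common cause Z of both. The sample size, coefficients and variable names are arbitrary assumptions chosen for the example. All three data sets show a clear correlation between X and Y, which is why the correlation alone cannot tell the mechanisms apart.

```r
set.seed(2019)
n <- 5000  # arbitrary sample size for the illustration

# Mechanism 1: X causes Y
x1 <- rnorm(n)
y1 <- 0.7 * x1 + rnorm(n)

# Mechanism 2: Y causes X
y2 <- rnorm(n)
x2 <- 0.7 * y2 + rnorm(n)

# Mechanism 3: a common cause Z influences both X and Y
z  <- rnorm(n)
x3 <- 0.9 * z + rnorm(n)
y3 <- 0.9 * z + rnorm(n)

# Correlation between X and Y under each mechanism
cors <- c(x_causes_y   = cor(x1, y1),
          y_causes_x   = cor(x2, y2),
          common_cause = cor(x3, y3))
print(round(cors, 2))
```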

Causal discovery (aka structure learning)

Main idea: Causal relationships leave behind traces in data that can be used to reconstruct (parts of) the causal model.

Note: This detective work is a matter of data analysis.

Which R procedures can be applied depends on:

• What type of data you have - numerical? Categorical? Mixed?
• What you are willing to assume about the data generating mechanism
• What is feasible for your data size
• What is missing in your data - observations? Full variables?
• ...
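One kind of "trace" that many structure learning procedures look for is conditional independence. The base R sketch below is an assumed toy example, not taken from the talk: data are simulated from the chain X -> Z -> Y with linear Gaussian relations, so X and Y are correlated marginally but (approximately) uncorrelated once Z is accounted for. The coefficients and variable names are made up for the illustration.

```r
set.seed(1)
n <- 10000  # arbitrary sample size for the illustration

# Simulate a causal chain X -> Z -> Y
x <- rnorm(n)
z <- 0.8 * x + rnorm(n)
y <- 0.8 * z + rnorm(n)

# Marginal trace: X and Y are correlated
marginal_cor <- cor(x, y)

# Conditional trace: correlate what is left of X and Y after
# regressing each on Z (the partial correlation given Z)
partial_cor <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

print(c(marginal = marginal_cor, given_z = partial_cor))
```

A pattern like this (dependence that vanishes when conditioning on Z) is compatible with the chain, but also with X <- Z -> Y and X <- Z <- Y, which is one reason only parts of the causal model can be reconstructed.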

Causal discovery in R

• I have looked at 24 causal discovery procedures from 6 different packages: pcalg, bnstruct, bnlearn, catnet, stablespec, deal.
• Each procedure classified according to 14 properties.
• Minimal code example and description for each procedure.

[Figure: Input formats. Bar chart of the number of procedures (count axis from 0 to 12) per input format: matrix, score (RC), network (S3), BNDataset (S4), suff. stat (list), data.frame.]

Asking the right questions

Getting a proper overview of the answers

biostatistics.dk/causaldisco

Two restrictions:

1. Only consider procedures for observational data
2. Only consider procedures for acyclic models

The causalDisco web tool

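Learning the structure of numData

# Load the example data set numData from the causalDisco repository
load(url(paste(
  "https://github.com/annennenne/causalDisco/",
  "raw/master/data/exampledata_numData.rda",
  sep = "")))

library(pcalg)

# Sufficient statistics for the Gaussian conditional independence test:
# the correlation matrix and the number of observations
pcalg_suffstat_numData <- list(C = cor(numData),
                               n = nrow(numData))

# Run the PC algorithm with significance level 0.01
pcalg_pc_out <- pc(pcalg_suffstat_numData,
                   labels = names(numData),
                   indepTest = gaussCItest,
                   alpha = 0.01)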
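To give a feel for what a test such as gaussCItest works with, here is a base R sketch of a conditional independence test built on the same two sufficient statistics, a correlation matrix C and a sample size n. It assumes the standard approach of applying Fisher's z-transform to a partial correlation. The function fisher_z_ci_test and the simulated data are hypothetical and written only for this illustration; this is not pcalg's own code.

```r
# Hypothetical helper (not part of pcalg): p-value for the hypothesis that
# variables i and j are uncorrelated given the variables in S, computed from
# a correlation matrix C and the number of observations n.
fisher_z_ci_test <- function(C, n, i, j, S = integer(0)) {
  idx <- c(i, j, S)
  # Partial correlation of i and j given S, read off the inverse
  # of the relevant sub-matrix of C
  P <- solve(C[idx, idx, drop = FALSE])
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
  # Fisher's z-transform, scaled to be approximately standard normal
  z_stat <- sqrt(n - length(S) - 3) * 0.5 * log((1 + r) / (1 - r))
  2 * pnorm(-abs(z_stat))
}

# Toy data from the chain x -> z -> y (made up for the example)
set.seed(11)
n <- 2000
x <- rnorm(n)
z <- x + rnorm(n)
y <- z + rnorm(n)
C <- cor(data.frame(x = x, z = z, y = y))

p_marginal <- fisher_z_ci_test(C, n, i = 1, j = 3)         # x vs y
p_given_z  <- fisher_z_ci_test(C, n, i = 1, j = 3, S = 2)  # x vs y given z
print(c(marginal = p_marginal, given_z = p_given_z))
```

In a constraint-based procedure, a p-value above the chosen alpha (0.01 in the slide) would typically be read as "no evidence against conditional independence", and the corresponding edge would be removed.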

Look at the model graph

plot(pcalg_pc_out, main = "Model learned from data")

[Figure: Model learned from data. Graph over the nodes X1, X2, X3, Z and Y.]

Compare with true model

[Figure: graph over the nodes X1, X2, X3, Z and Y.]

Directions for future work

• Crowdsourcing: Make it easy for users to report – and for developers to see – what procedures are needed but not yet available
• Currently missing procedures for: categorical data with unobserved variables, numerical data with missing information, ...
• Implement one interface for all available methods
• Allow for hybrid queries combining methods from several backends
• Allow for dynamic manipulation of assumptions

Thank you!
