On Causal Analysis for Heterogeneous Networks Katerina Marazopoulou, David Arbour, David Jensen August 2017 University of Massachusetts Amherst College of Information and Computer Sciences KDD Workshop on Causal Discovery


Page 1: University of Massachusetts Amherst College of Information ...nugget.unisa.edu.au/CD2017/slides/KaterinaMarazopoulou.pdf · Katerina Marazopoulou On Causal Analysis for Heterogeneous

On Causal Analysis for Heterogeneous Networks

Katerina Marazopoulou, David Arbour, David Jensen

August 2017

University of Massachusetts Amherst College of Information and Computer Sciences

KDD Workshop on Causal Discovery


source: Visual Complexity

Causal inference in networks: How is the behavior of an individual affected by his/her peers?


source: Visual Complexity

How does the presence of multiple relationship types affect causal analysis?


Outline

• Background: Causal effect estimation on networks

• Causal effect estimation in heterogeneous networks

• Experiments on synthetic data

• Application on real-world dataset


Causal Effect Estimation in Networks

[Figure: example network with edges labeled "friends"]

Causal Effect Estimation in Networks

• Population of n individuals that form an undirected graph

• Binary treatment T and outcome O

• The outcome of a node depends on the global treatment assignment:

$O_i(\mathbf{T} = \mathbf{t})$ where $\mathbf{t} \in \{0, 1\}^n$

• ATE between global treatment and global control:

$\tau(\mathbf{1}, \mathbf{0}) = \frac{1}{n} \sum_{i=1}^{n} E[O_i(\mathbf{T} = \mathbf{1}) - O_i(\mathbf{T} = \mathbf{0})]$

Causal effect estimation

Estimation procedure for causal inference:

1. Treatment assignment

2. Exposure model: When an individual is considered to be treated

3. Analysis: How to estimate the causal quantity of interest

Causal effect estimation

Estimation procedure for causal inference:

1. Treatment assignment

2. Exposure model: Fraction neighborhood exposure [Gui et al. 2015]

3. Analysis: Linear regression adjustment [Gui et al. 2015]

Gui, Xu, Bhasin, Han. WWW 2015

The Gui et al. framework

• Fraction neighborhood exposure model: the response function depends on a node's own treatment assignment $T_i$ and the proportion $\sigma_i$ of its treated peers:

$g(T_i, \sigma_i) = \alpha + \beta T_i + \gamma \sigma_i$

• ATE:

$\tau(\mathbf{1}, \mathbf{0}) = g(T_i = 1, \sigma_i = 1) - g(T_i = 0, \sigma_i = 0) = \beta + \gamma$
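As a minimal sketch of the fraction neighborhood exposure model, the snippet below computes the proportion of treated peers for each node and evaluates the linear response function at global treatment and global control. The function names (`fraction_treated`, `response`, `ate`), the toy path graph, and the coefficient values are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, List


def fraction_treated(neighbors: Dict[int, List[int]],
                     treatment: Dict[int, int]) -> Dict[int, float]:
    """Proportion of each node's peers that are treated (0.0 for isolated nodes)."""
    sigma = {}
    for node, peers in neighbors.items():
        sigma[node] = sum(treatment[p] for p in peers) / len(peers) if peers else 0.0
    return sigma


def response(alpha: float, beta: float, gamma: float,
             t_i: int, sigma_i: float) -> float:
    """Linear response g(T_i, sigma_i) = alpha + beta * T_i + gamma * sigma_i."""
    return alpha + beta * t_i + gamma * sigma_i


def ate(alpha: float, beta: float, gamma: float) -> float:
    """Global treatment minus global control; equals beta + gamma."""
    return response(alpha, beta, gamma, 1, 1.0) - response(alpha, beta, gamma, 0, 0.0)


# Hypothetical toy example: path graph 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
treatment = {0: 1, 1: 0, 2: 1, 3: 1}
sigma = fraction_treated(neighbors, treatment)
effect = ate(alpha=0.5, beta=1.0, gamma=2.0)
```

In practice the coefficients would come from a regression of observed outcomes on the treatment and exposure columns; here they are fixed by hand only to show that the intercept cancels in the contrast.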

Heterogeneous Network

[Figure: example network with two edge types, "friends" and "coworkers"]

Response function

Homogeneous networks:

$g(T_i, \sigma_i) = \alpha + \beta T_i + \gamma \sigma_i$

Heterogeneous networks:

$g_{f,c}(T_i, \sigma_i) = \alpha + \beta T_i + \gamma_f \sigma_i^f + \gamma_c \sigma_i^c$

Sets of peers

• There are more options than friends and coworkers.

• We can consider any combination of non-overlapping sets of peers, for example:

  • friends and coworkers

  • friends only

  • friends or coworkers but not both

$g_{f,c}(T_i, \sigma_i) = \alpha + \beta T_i + \gamma_f \sigma_i^f + \gamma_c \sigma_i^c$


Peer-sets of interest

• Friends (homogeneous network)

• Coworkers (homogeneous network)

• Friends or coworkers (union as a homogeneous network)

• Disjoint

• Friends-coworkers


Sets of peers we consider

[Figure: five panels on example nodes A-D, one per peer set: Friends, Coworkers, Friends or coworkers, Disjoint, Friends-coworkers. Edge-set labels appearing in the panels: friends, coworkers, friends only, coworkers only, friends and coworkers, friends or coworkers.]
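The edge sets named in the figure can be derived from the two relationship types with plain set operations. The sketch below is an illustration under assumptions: the example edges among nodes A-D are hypothetical (the figure's actual edges are not recoverable from the transcript), and the reading that "Disjoint" uses the three non-overlapping sets (friends only, coworkers only, friends and coworkers) is inferred from the panel labels rather than stated on the slides.

```python
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Undirected edge, stored so that (u, v) and (v, u) compare equal."""
    return frozenset((u, v))


def peer_edge_sets(friends: Set[Edge], coworkers: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Derive the candidate edge sets from the two relationship types."""
    return {
        "friends": set(friends),
        "coworkers": set(coworkers),
        "friends_or_coworkers": friends | coworkers,   # union as one homogeneous network
        "friends_only": friends - coworkers,
        "coworkers_only": coworkers - friends,
        "friends_and_coworkers": friends & coworkers,
    }


# Hypothetical edges: B-C is both a friendship and a coworker tie.
friends = {edge("A", "B"), edge("B", "C")}
coworkers = {edge("B", "C"), edge("C", "D")}
sets = peer_edge_sets(friends, coworkers)

# The three "only / and" sets are pairwise non-overlapping and cover the union.
disjoint_parts = [sets["friends_only"], sets["coworkers_only"], sets["friends_and_coworkers"]]
```

Each resulting edge set can then be treated as its own network when computing the fraction of treated peers.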


Peer sets of interest: Where are they used?

Example peer set: Friends-coworkers

[Figure: Friends-coworkers panel on example nodes A-D with "Friends" and "Coworkers" edge sets]

Used for:

• Response functions:

$g_{f,c}(T_i, \sigma_i) = \alpha + \beta T_i + \gamma_f \sigma_i^f + \gamma_c \sigma_i^c$

• ATE estimators:

$\tau_{f,c} = \beta + \gamma_f + \gamma_c$

• Outcome generation:

$O_i = w_0 + w_1 T_i + w_2^f \frac{F[\cdot, i]^\top O_t}{D_i^F} + w_2^c \frac{C[\cdot, i]^\top O_t}{D_i^C} + \epsilon$
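The following sketch shows one way the Friends-coworkers ATE estimator could be computed by linear regression adjustment: fit the heterogeneous response function by ordinary least squares and sum the treatment and exposure coefficients. The helpers `solve`, `ols`, and `estimate_ate_fc`, as well as the toy data, are hypothetical; the slides do not specify an implementation.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def estimate_ate_fc(t: Sequence[int], sigma_f: Sequence[float],
                    sigma_c: Sequence[float], outcome: Sequence[float]) -> float:
    """Fit alpha + beta*T + gamma_f*sigma^f + gamma_c*sigma^c; return beta + gamma_f + gamma_c."""
    rows = [[1.0, float(ti), sf, sc] for ti, sf, sc in zip(t, sigma_f, sigma_c)]
    _alpha, beta, gamma_f, gamma_c = ols(rows, outcome)
    return beta + gamma_f + gamma_c


# Hypothetical noiseless data generated with alpha=1, beta=2, gamma_f=3, gamma_c=4.
t = [0, 1, 0, 1, 0, 1]
sigma_f = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
sigma_c = [0.0, 1.0, 1.0, 0.0, 0.5, 0.25]
outcome = [1 + 2 * ti + 3 * sf + 4 * sc for ti, sf, sc in zip(t, sigma_f, sigma_c)]
estimate = estimate_ate_fc(t, sigma_f, sigma_c, outcome)  # close to 2 + 3 + 4
```

Dropping one of the exposure columns from `rows` gives the corresponding mis-specified homogeneous estimator, which is the kind of comparison the experiments make.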


How does ignoring/mis-specifying the type of relationships affect estimation of causal effects?

Experiments (synthetic data)

Goal: measure the impact of ignoring or mis-specifying relationship types on the estimation of causal effects

• Generation of graphs

  • Erdős-Rényi

  • Watts-Strogatz

  • Stochastic block model

• Generation of treatment values

1. Independent assignment for every node

2. Graph cluster randomization [Ugander et al. 2013]

Ugander, Karrer, Backstrom, Kleinberg. KDD 2013
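The two treatment-assignment schemes differ only in the unit that receives the coin flip. The sketch below assumes a clustering of the graph is already available (how clusters are built is outside its scope) and assigns treatment once per cluster. `cluster_randomize` and the example clustering are hypothetical names and data.

```python
import random
from typing import Dict


def cluster_randomize(cluster_of: Dict[int, int], p: float, seed: int = 0) -> Dict[int, int]:
    """Flip one coin per cluster with probability p; every node inherits its cluster's value."""
    rng = random.Random(seed)
    cluster_treatment = {}
    for cluster in sorted(set(cluster_of.values())):
        cluster_treatment[cluster] = 1 if rng.random() < p else 0
    return {node: cluster_treatment[cluster] for node, cluster in cluster_of.items()}


# Hypothetical clustering: nodes 0-2 in cluster 0, nodes 3-4 in cluster 1, node 5 in cluster 2.
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
assignment = cluster_randomize(cluster_of, p=0.5, seed=42)
```

Independent assignment is the special case in which every node is its own cluster.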

Experiments (synthetic data)

• Generation of outcome values

1. Outcome interference:

$O_{i,t+1} \sim w_0 + w_1 T_i + f(O_{\text{peers of } i,\, t}) + \epsilon$

2. Treatment interference:

$O_i \sim w_0 + w_1 T_i + f(T_{\text{peers of } i}) + \epsilon$

where $\epsilon = \sigma_\epsilon \mathcal{N}(0, 1)$
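A minimal sketch of outcome generation under treatment interference follows. It assumes that f is a weight `w2` times the fraction of treated peers, which is one plausible choice and not necessarily the function used in the experiments; `generate_outcomes` and the toy graph are likewise hypothetical.

```python
import random
from typing import Dict, List


def generate_outcomes(neighbors: Dict[int, List[int]], treatment: Dict[int, int],
                      w0: float, w1: float, w2: float,
                      sigma_eps: float, seed: int = 0) -> Dict[int, float]:
    """O_i = w0 + w1*T_i + w2*(fraction of treated peers) + sigma_eps * N(0, 1)."""
    rng = random.Random(seed)
    outcomes = {}
    for node, peers in neighbors.items():
        frac = sum(treatment[p] for p in peers) / len(peers) if peers else 0.0
        noise = sigma_eps * rng.gauss(0.0, 1.0)
        outcomes[node] = w0 + w1 * treatment[node] + w2 * frac + noise
    return outcomes


# Hypothetical toy example: path graph 0 - 1 - 2, noise switched off.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
treatment = {0: 1, 1: 0, 2: 1}
noiseless = generate_outcomes(neighbors, treatment, w0=1.0, w1=2.0, w2=4.0, sigma_eps=0.0)
```

Setting `sigma_eps` to a positive value adds the Gaussian noise term; for heterogeneous networks the peer term would be split into one weighted fraction per peer set.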

Results

Experiment configuration:

• Graph model: Watts-Strogatz

• Treatment assignment: Graph cluster randomization

• Treatment probability: 0.5

• Outcome generation: Treatment interference

[Figure: heat map of the relative bias (%) of the estimated ATE for each combination of generative model (x-axis) and assumed model (y-axis), over the five peer sets Coworkers, Disjoint, Friends, Friends-Coworkers, and Friends or Coworkers. Color scale: absolute relative bias, with ticks at 10, 20, 30, 40. Cell values in extraction order (cell positions not recoverable): 0, −6.3, −30.2, 0.1, 3.8, −12.2, 2, −46.2, −11.2, −2.8, −16.7, −3.7, 0, −0.1, 2, −4.1, −6.4, −25, 0, 3.8, −19.6, −7.7, −19.5, −6.9, 0.]


Results

Experiment configuration:

• Graph model: Watts-Strogatz

• Treatment assignment: Graph cluster randomization

• Treatment probability: varying

• Outcome generation: Treatment interference

[Figure: relative bias (% over true ATE, y-axis ticks from −30 to 0) versus treatment probability (0.1 to 0.9). One panel per generative model: Coworkers, Disjoint, Friends, Friends-Coworkers, Friends or Coworkers. One series per exposure model: Coworkers, Disjoint, Friends, Friends-Coworkers, Friends or Coworkers.]

Model selection

Given a set of alternative models, is it possible to identify the true generating model?

Procedure:

• Generate synthetic networks and synthetic data (as before).

• Compute BIC for each of the five alternative models.

• Select model with the lowest BIC.
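The selection step can be sketched as follows, assuming each candidate response function is fit by least squares with Gaussian noise, in which case BIC reduces (up to an additive constant shared by all models) to n ln(RSS/n) + k ln(n). The residual sums of squares and parameter counts below are made-up placeholders, not results from the talk, and the parameter count of 5 for the Disjoint model is an assumption.

```python
import math
from typing import Dict, Tuple


def gaussian_bic(rss: float, n: int, k: int) -> float:
    """BIC of a Gaussian linear model, up to a constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def select_model(fits: Dict[str, Tuple[float, int]], n: int) -> str:
    """Return the name of the model with the lowest BIC; fits maps name -> (RSS, k)."""
    return min(fits, key=lambda name: gaussian_bic(fits[name][0], n, fits[name][1]))


# Hypothetical fits on n = 200 nodes: (residual sum of squares, number of coefficients).
fits = {
    "friends": (120.0, 3),
    "coworkers": (150.0, 3),
    "friends_or_coworkers": (110.0, 3),
    "friends_coworkers": (100.0, 4),
    "disjoint": (99.5, 5),
}
best = select_model(fits, n=200)
```

In this made-up example the Disjoint model has the smallest residual error, but its extra coefficient is penalized enough that the Friends-coworkers model is selected.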

Model selection

[Figure: accuracy of model selection (y-axis, 0.00 to 1.00) by configuration of coefficients (C0, C1, C2, C3). One panel per graph model: Erdős-Rényi, stochastic block model, Watts-Strogatz. Grouped by noise level: 0.5, 1.0, 2.0.]

Model selection

[Figure: accuracy of model selection (y-axis, 0.00 to 1.00) by configuration of coefficients (C0, C1, C2, C3). One panel per graph model: Erdős-Rényi, stochastic block model, Watts-Strogatz. Grouped by noise level: 0.5, 1.0, 2.0. Each panel carries a "Random" annotation.]

Real data

• Study on the diffusion of microfinance loans through various social networks

• Survey conducted in 75 villages in southern India

• Village-level survey and follow-up survey on a subsample of individuals in each village

• Individual surveys identify 13 types of social relationships (e.g., friends, relatives, borrowing money from, going to temple with)

• Individuals' attributes (age, gender, etc.)


Real heterogeneous network



Experimental setup for real data

• Several pairs of social relationships

• Combinations of treatment-outcome variables

• Estimate effect using different response functions


[Figure: estimated effect (y-axis, 0.0 to 0.8) for four combinations of Relation1-Relation2 and Treatment-Outcome: friends-relatives with gender-savings; friends-relatives with gender-working; helping with decisions-relatives with gender-working; borrowing money-relatives with gender-working. One point per assumed model: Rel1, Rel2, Rel1 or Rel2, Rel1-Rel2, Disjoint.]

Summary

• Recent work has extended causal inference frameworks to network data.

• We address causal effect estimation for heterogeneous networks within this framework.

• Mis-specifying the relational structure of causal dependence can lead to significant bias.

• Model selection can help distinguish among candidate response functions.


Directions for future work


• Formal characterization of bias and variance of ATE estimators for heterogeneous networks

• Interactions of relational semantics (effect present from multiple relational phenomena)

• Measure of model selection for relational data

• Fully automated methods for choosing appropriate response functions

• Extending A/B testing framework for heterogeneous networks


Questions?

Thank you!
