networks and complex data: discussion · networks and complex data: discussion . ... gotas de...
TRANSCRIPT
Edoardo Airoldi Department of Statistics
Harvard University
Becker Friedman Institute, University of Chicago, September 24th 2016
Networks and complex data: Discussion
Analysis of network data
Becker Friedman Institute, 2016 2
Remarks
• What researchers might do in practice
• Complications arising from a population network
• What we know and what we do not
Becker Friedman Institute, 2016 3
A field experiment in Honduras
• Education on standards of care for newborns (176 villages; 30,000 people; 3,000 target households)
• Design: 2 target nomination schemes and 8 levels of treatment
• Short- and long-term causal effects, including on social interactions 2-year post intervention
(N Christakis, J Fowler, D Spielman, R Negon, et al.)
4
Agua Buena
Agua Buena Norte
Agua Caliente
Agua Caliente
Agua Sucia
Agua Zarca
Aldea Nueva
Aldea Nueva
Banaderos
Barbascos
Barbascos Copan
Boca del monte
Buena Vista
Buena Vista Buenos Aires
Campamento 1
Carrizalito 1
Carrizalito 2
Carrizalon
Ceiba Segunda
ChimichalChonco
Colon Jubuco
Corralito
Crucitas
Delicias 1
Delicias 2
Descombros
Dos Quebradas
El Barrial
El Bonete
El Caobal
El Carrizal
El Cedral
El Chilar
El Chilcal
El Chorreron
El Cisne
El Conal
El Cordoncillo
El Eden/La Moya
El Encino
El Florido
El Guayabo
El Jaragua
El Jaral
El Llanetillo
El Llano
El Malcote
El Mirador
El Pedernal
El Pinabeton
El Porvenir 1
El Porvenir 2
El Prado
El Quebracho El Raizal
El Rosario
El Rosario
El Salitron
El sompopero
El Tamarindo
El Templador
El Tesorito
El Tigre
El Transito
El Trapiche
El Triunfo
El Volcan
El Zapote
El Zapote
Gotas de Sangre
Hacienda Grande
Hacienda San Juan
Ingenios
La Arada
La Canteada
La Casita
La Castellona
La Cuchilla
La Cuchilla
La cumbre
La Esperanza
La Esperanza
La Estanzuela
La Hermosura
La Huertona
La Laguna
La Leonita
La Libertad
La Linea/Pinalito
La Pintada
La Ramada 1
La Ramada 2
La Reforma
La Union 2
La Union, OtutaLa Vegona
La Vegona
La Zona
Lancetillal
Las Brisas
Las Delicias
Las Flores
Las Juntas 1
Las Juntas 2
Las Medias
Las Mesas
Las Pavas
Llano la Puerta
Londres
Los Achiotes
Los Amates
Los Arcos
Los Ranchos
mariposal
Mecatal
Mirasol
Mirasolito/Rio Negro
Monte Cristo
Monte los NegrosMoscarronal
Motagua
Mun. San Jeronimo
Naranjales
Naranjito
Nueva Alianza
Nueva Armenia
Nueva Esperanza Sur
Nueva Suyapa
Nuevo San Isidro
Penas 1
Penas 2
Piedras Coloradas
Pinabetal
Pinalito
Pinalito
Plan de Limon
Plan del Danto
Plan El Perico
Plan Grande
Planes de la Brea
Planon de la Libertad
Pueblo viejo
Queseras
Rastrojitos
Rincon del Buey
Rio Amarillo
Rio Blanco
Rio Negro
San Antonio del Norte
San Antonio Miramar
San Antonio Tapesco
San Cristobal
San Francisco
San Isidro 1
San Jeronimo
San Jose Miramar
San Juan
San Manuel
San Miguel
San Rafael
Santa Cruz
Santa Cruz Virginia
Santa Elena
Santa Rosita
Santo DomingoSesesmil 1
Sesesmil 2
Tierra Blanca
Tierra Fria 1
Tierra Fria 2
Union Cedral
Valle Maria
Vara de Cohete
Virginia 1
14.7
14.8
14.9
15.0
15.1
−89.2 −89.1 −89.0 −88.9lon
lat
Dosage
0
0.05
0.1
0.2
0.3
0.5
0.75
1
Treatment
random
friend nomination
Aldeas − Copan, Honduras − Treatment Blocking
Becker Friedman Institute, 2016 5
Networks mapped pre-intervention
• Trellis for mapping 20+ aspects of social relations (Arguably no measurement error, but missing data)
Becker Friedman Institute, 2016 6
Other illustrative applications
• LinkedIn – Labor mobility, employment histories, recruiting – Network only provides limited / partial view of
professional interactions – Connections may be interventions of interest
• Google display ads – Selective callouts problem (who to invite in auctions) – Many networks available – Causal mechanisms are not well developed
Becker Friedman Institute, 2016 7
Remarks
• What researchers might do in practice
• Complications arising from a population network
• What we know and what we do not
Becker Friedman Institute, 2016 8
Potential outcomes / table of science
Table of science Y is N=4 x |Z|=24, with element Yi(Z)
Causal inferential targets are defined as a function of Y Typically we assume Yi(Zi) – Y becomes Nx2 9
Network interference and exposure
• Given network G, if no interference is untenable, we must assume how ZNi affects Yi(Zi, g(ZNi), rest)
• Example: “i is exposed if it has 1+ treated neighbors” leads to 4 distinct potential outcomes
10
0
0
00
0
0
1
0
00
0
0
0
1
10
1
0
1
1
10
1
0
Control
Treated
Exposed
Treated & Exposedi
i
i
i
Becker Friedman Institute, 2016
What fails?
• ATE and TTE are no longer equal • Allocations Zj on G with the same (nt,nc) have diffe-
rent configurations of treatment, exposure and control
• Need randomization schemes that leverage G • Need to revisit causal interpretation of parameters
11
0
1
1
0
1
0
00
0
0
0
1
0 0
01
0
1
Becker Friedman Institute, 2016
Causal interpretation of parameters
Recall Yiobs = Zi Yi(1) + (1−Zi) Yi(0); assume SUTVA
and additivity of treatment effect Yi(1) = Yi(0) + β • Slope coefficient in a regression model for Yi
obs
equals the ATE; thus it has a clear interpretation as causal effect defined on table of science Y
In the presence of network interference? The practice is • State a model for Yi
obs and argue why its parameter(s) capture (aspects of) the causal effect(s) of interest
12 Becker Friedman Institute, 2016
Neighborhood interference assumption (__N__IA)
Convenient to define
And parameters
Additive model: Yi(Z) = αi + βiZi + Γi(ZNi) + ZiΔi(Zni)
Main assumption and model
13 Becker Friedman Institute, 2016
Structural assumptions
(_AN__IA)
(__N_SIA)
(S_N__IA)
(__NA_IA)
Relations among twelve unique models
15 Becker Friedman Institute, 2016
Unambiguous causal interpretation
• Can now write causal effects of interest in terms of parameters of the 12 additive models for Yi
obs, e.g., ATE = 1/n Σi (Yi(ei) − Yi(0))
= 1/n Σi βi (under NIA)
TTE ≡ 1/n Σi (Yi(1) − Yi(0)) = 1/n Σi (βi + Γi(di) + Δi(di)) (under NIA) = 1/n Σi (βi + γ di) (under SANASIA)
• We can now spell out assumptions that justify previously published estimators / estimates
16 Becker Friedman Institute, 2016
Remarks
• What researchers might do in practice
• Complications arising from a population network
• What we know and what we do not
Becker Friedman Institute, 2016 17
Complex landscape and set-up
• Inferential targets: ATE, TTE, AIE, …
• Assumptions about the network: fixed vs. model, observed pre-intervention or outcome; fully vs. partially observed, with and without errors, formal notion of interference
• Sampling mechanism; finite vs. infinite population inference; models for the outcomes; observational vs. experimental
• Treatment allocation strategy; estimator; complications …
18
What do we know? Not much …
• Network observed pre-intervention without error – Fisher tests for interference (beyond first order neighbors) – Durbin-Wu-Hausman style test SUTVA violations – Estimation theory (LUE / MIVE) for ATE and failures – New randomization and rerandomization strategies – Homophily vs peer influence in observational studies
• Network observed with error – Inference from non-ignorable network sampling designs – Partially revealed interference (network as outcome) – Condition on a model for the network
19
Analysis of text data
Becker Friedman Institute, 2016 20
Analysis of word counts
• Statistics and machine learning++
• Common elements Matrix of word counts W (n documents x v terms) Mixture models (k components, interpreted as ??) Parameters µ are rates of occurrence (v terms x k components)
• Problem specific Document covariates L (n documents x …; e.g., author(s), publication year, topic annotations by professional editors)
Becker Friedman Institute, 2016 21
Remarks
• Statistics and machine learning: Classic papers
• Evaluation and interpretation issues
• Bringing causality back
Becker Friedman Institute, 2016 22
An authorship attribution problem
Becker Friedman Institute, 2016 23
Parameterization and priors
24
• Counts for term v are Poisson with rates (µvH, µv
M) • Re-parameterize with total and differential rates
σv = µvH + µv
M τv = µv
H / (µvH + µv
M)
• Priors σv ∝ constant τv = symmetric beta (α1 + α2 σv)
Also see Negative-Binomial (MW) and COM Poisson distributions (Kadane, Shmueli, and co-authors)
Becker Friedman Institute, 2016 25
Remarks
• Authorship vector L is largely observed • Clear interpretation of the 2 mixture components • Evaluation
– In-sample using agreement between posterior odds of authorship for undisputed papers and Lobs
– Out-of-sample predictions for Lmis
26
Characterizing topics
27
Basic model
Data: count matrix W, document lengths N Re-parameterize rate matrix β where βvk = µvk / Σv µvk
For document d • θd ~ Dirichlet (α) where θd is a k x 1 vector • zd ~ Multinomial (θd,Nd) where zd is a k x 1 vector
• w'dk ~ Multinomial (β·k, zdk) • Wdv = Σk w'dkv
• Place symmetric Dirichlet prior on columns β·k 28
(Source: Blei, 2012) 29
Remarks
• Topic vector L is entirely unobserved • Often unclear interpretation of many of the k mixture
components • Evaluation
– Lists of most frequent words – Predictions for Lmis using cross-validation, held-out log-lik
30
Remarks
• Statistics and machine learning: Classic papers
• Evaluation and interpretation issues
• Bringing causality back
Becker Friedman Institute, 2016 31
Issues with evaluation standards
32
• Original papers Interpret most frequent words for most frequent topics Supplemental websites to explore the entire model output
• Follow-up papers Almost exclusively focus on frequency Qualitative and anecdotal evaluations
• We need exhaustive and quantitative evaluations How to quantify topic diversity and coherence? How to maximize interpretability of components/topics?
Hypotheses and MTurk experiments
1. Topic summaries based on frequency and exclu-sivity are more interpretable than frequency alone
2. Regularizing rates by word yields better estimates of FREX scores than regularizing rates by topic
• However, interpretability is hard to quantify
• We carry out two experiments on Amazon MTurk that enlist human evaluators to execute a comparative analysis of the interpretability of topic summaries
Becker Friedman Institute, 2016 33
Design of experiments
• Three strategies to generate topic summaries PCM FREX: Poisson, regularize by word, max FREX
LDA FREQ: Binomial, regularize by topic, most frequent
LDA FREX: Binomial, regularize by topic, max FREX (exclusivity estimated by renormalizing rates post inference)
• Two tasks: (i) word intrusion, (ii) topic coherence
• Top-5 words from models with 10, 25, 50, 100 topics
• 400 turkers for each model size; 2 replicates 34
Becker Friedman Institute, 2016 35
36
Remarks
• Statistics and machine learning: Classic papers
• Evaluation and interpretation issues
• Bringing causality back
Becker Friedman Institute, 2016 37
A simple idea
• Make the topic proportions and the rates a function of document level covariates
• JASA paper with Molly Roberts, Brandon Stewart
• R package stm, by Molly, Brandon and Dustin Tingley
• What causal questions can we answer leveraging text data? Text as outcome or covariates.
Becker Friedman Institute, 2016 38
Acknowledgements and pointers
Rubin, Basse, Karwa, Sussman, Toulis (Harvard), Aral (MIT), Imbens (Stanford), Christakis (Yale), Manski (Northwestern), Azari, Lambert (Google), Agarwal, Xu, Ghosh (LinkedIn).
Some fundamental ideas for causal inference on large networks. (Donald B Rubin) Soon on PNAS. Optimal design of experiments in the presence of network-correlated outcomes. (Guillaume Basse) arXiv paper no. 1507.00803 Optimal design of experiments in the presence of network interference. (with discussion) Soon on EJS. Papers on text are on the JASA’s “latest articles” page