Missing data in social networks: Problems and prospects for model-based inference
TRANSCRIPT
Johan Koskinen
The Social Statistics Discipline Area, School of Social Sciences
Mitchell Centre for Network Analysis
Tuesday, 20 December 2011
A relational perspective – networks matter
Vegetarian partner
• Ethical • Economics • Health • Taste
Dr D eats (predominantly) vegetarian food...
Dr Dean Lusher’s ([email protected]) relational take
A relational perspective – networks matter
Someone close to you is unhappy...
... will you remain unaffected?
A relational perspective – networks matter
Equal opportunities based on our individual qualities ...
...
A relational perspective – networks matter
Some people ...
... bowl alone
... others bowl in leagues
Social networks
We conceive of a network as a Relation defined on a collection of individuals:
mary → paul (“relates to”, e.g. “… go to for advice…”)
Social networks
We conceive of a network as a Relation defined on a collection of individuals:
mary → paul (“relates to”, e.g. “… consider a friend…”)
Social networks
We conceive of a network as a Relation defined on a collection of individuals:
mary → paul (“relates to”)
Generally binary: on = tie present, off = tie absent
Network representations
A non-directed graph: a social network of tertiary students (Kalish, 2003)
Network representations
Marriage ties in 15th-century Florence (Padgett and Ansell, 1993)
Network representations
A directed graph
Police training squad: confiding network (Pane, 2003)
Network representations: attributes
• The actors (nodes) in the network are individuals with attitudes, behaviours, and attributes
• These may guide them in their choices of partners, and be shaped (influenced) by their partners
• The actors may have individual and collective outcomes
Network representations: attributes
A non-directed graph: a social network of tertiary students (Kalish, 2003)
Legend: Jewish / Arab
Network representations: attributes
High School friendship, Moody, 2001
Legend: white / black / other
Network representations: attributes
Romantic/sexual relationships at a US high school (Bearman, Moody & Stovel, 2004)
Guess the blue and pink
Network representations: attributes
Legend: detached / team oriented / positive
Team structures in training squads (Pane, 2003)
(friendship network in 12th week of training)
Multiple relations – entailment, exchange, and generalized exchange
Physical violence
friend
Verbal violence
Violence & attitudes among school boys (Lusher, 2003)
Social networks
We conceive of the Graph as a collection of tie variables {X_ij : i, j ∈ V}
(actors: john, pete, mary, paul)

        i     j     k     l
  i     -   x_ij  x_ik  x_il
  j   x_ji    -   x_jk  x_jl
  k   x_ki  x_kj    -   x_kl
  l   x_li  x_lj  x_lk    -

For example:

        i  j  k  l
  i     -  1  1  0
  j     0  -  0  0
  k     0  1  -  0
  l     0  1  0  -
Social networks
The adjacency matrix: the matrix of the collection of tie variables {X_ij : i, j ∈ V}

        i     j     k     l
  i     -   x_ij  x_ik  x_il
  j   x_ji    -   x_jk  x_jl
  k   x_ki  x_kj    -   x_kl
  l   x_li  x_lj  x_lk    -
Social networks: adjacency matrix
Read Highland tribes
0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0
0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 0
0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0
0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Social networks: adjacency matrix
Read Highland tribes
Symmetric for a non-directed network
Zeroes along the diagonal – self ties not permitted
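The two properties just noted can be checked mechanically on any adjacency matrix. A minimal sketch in Python; the four-actor graph here is a hypothetical example, not data from the talk:

```python
# A small non-directed network on four actors, stored as an adjacency
# matrix (a list of rows). Hypothetical example data.
labels = ["i", "j", "k", "l"]
x = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(x)

# Non-directed network: the matrix is symmetric.
assert all(x[a][b] == x[b][a] for a in range(n) for b in range(n))
# Self ties not permitted: zeroes along the diagonal.
assert all(x[a][a] == 0 for a in range(n))

# Degree of each actor = its row sum.
degrees = {labels[a]: sum(x[a]) for a in range(n)}
print(degrees)  # {'i': 2, 'j': 2, 'k': 3, 'l': 1}
```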
Part 2
Analysing social networks – putting the building blocks of networks together using ERGM
Do we need to “analyse” networks?
- Is the network a unique narrative?
- Should we stick to an ethnography?
Possible answers:
- Detecting systematic tendencies
- Social mechanisms
- “Lift the description” to describe the network in generalizable terms
Networks matter – ERGMs matter
6018 grade 6 children, 1966 – 200 schools, Stockholm
[Figure: distributions of parameter estimates across schools – Density, Mutuality, Alt. in-stars, Alt. out-stars, Alt. trans. triangles, Alt. indep. 2-paths, Homophily girl, Main girl – comparing CI and WLS estimators]
Koskinen and Stenberg (in press) JEBS
Do we need to “analyse” networks?
- Is the network a unique narrative?
- Should we stick to an ethnography?
Possible answers:
- Detecting systematic tendencies
- Social mechanisms
- “Lift the description” to describe the network in generalizable terms
Conceptualising the “network” as a graph is what enables this.
ERGMs – modelling graphs
• We want to model tie variables
• But structure – the overall pattern – is evident
• What kind of structural elements can we incorporate in the model for the tie variables?
ERGMs – modelling graphs: example
Marriage network of Padgett’s Florentine families
Model this as a combination of 4 local structures:
log Pr(X = x) ∝ θ1 L(x) + θ2 S2(x) + θ3 S3(x) + θ4 T(x)
where L(x) = # edges, S2(x) = # 2-stars, S3(x) = # 3-stars, T(x) = # triangles.
Their importance is measured by their parameters.
ERGMs – modelling graphs: example

effect     MLE    S.E.
Edge      -4.14   1.07
2-star     0.97   0.59
3-star    -0.56   0.35
Triangle   1.26   0.61
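The four statistics entering the model can be counted directly from an adjacency matrix: star counts follow from the degrees, triangles from closed triples. A sketch in Python; the graph and the function name `ergm_stats` are illustrative, not from the talk:

```python
from itertools import combinations
from math import comb

def ergm_stats(x):
    """Counts of the four local configurations:
    edges L(x), 2-stars S2(x), 3-stars S3(x), triangles T(x)."""
    n = len(x)
    deg = [sum(row) for row in x]
    L = sum(deg) // 2                      # each edge counted twice in degrees
    S2 = sum(comb(d, 2) for d in deg)      # 2-stars: pairs of ties at a node
    S3 = sum(comb(d, 3) for d in deg)      # 3-stars: triples of ties at a node
    T = sum(1 for a, b, c in combinations(range(n), 3)
            if x[a][b] and x[b][c] and x[a][c])
    return L, S2, S3, T

# A triangle with a pendant vertex attached.
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(ergm_stats(x))  # (4, 5, 1, 1)
```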
Independence - Deriving the ERGM
[Figure: tie variables as coin tosses – for dyads {i,l} and {i,k}, heads = tie present, tails = tie absent]
Independence - Deriving the ERGM
Think of two fair coins, AUD and SEK – one for dyad {i,l}, one for dyad {i,k}:

           SEK=0   SEK=1
AUD=0       0.25    0.25    (margin 0.5)
AUD=1       0.25    0.25    (margin 0.5)
margin:     0.5     0.5
Independence - Deriving the ERGM
Knowledge of AUD, e.g., does not help us predict SEK – e.g. whether or not the other tie is present – even though dyad {i,l} and dyad {i,k} have vertex i in common.
Independence - Deriving the ERGM
May we find a model such that knowledge of AUD, e.g., does help us predict SEK – e.g. whether or not the other tie is present?

           SEK=0   SEK=1
AUD=0       0.4     0.1    (margin 0.5)
AUD=1       0.1     0.4    (margin 0.5)
margin:     0.5     0.5
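The contrast between the two tables can be made concrete: both have the same uniform margins, but only in the second does conditioning on one variable change the prediction for the other. A small sketch (the dictionary layout is mine):

```python
# Joint distributions for the two "coins" (tie variables); keys are
# (first outcome, second outcome), values are joint probabilities.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent   = {(0, 0): 0.4,  (0, 1): 0.1,  (1, 0): 0.1,  (1, 1): 0.4}

def conditional(joint, second, first):
    """Pr(second variable = second | first variable = first)."""
    margin = joint[(first, 0)] + joint[(first, 1)]
    return joint[(first, second)] / margin

# Both tables have the same 0.5 / 0.5 margins ...
for joint in (independent, dependent):
    assert abs(joint[(0, 0)] + joint[(0, 1)] - 0.5) < 1e-12

# ... but only in the second does one tie help predict the other:
print(conditional(independent, 1, 1))  # 0.5 – no help
print(conditional(dependent, 1, 1))    # 0.8 – ties tend to co-occur
```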
Deriving the ERGM: From Markov graph to Dependence graph
Consider the tie variables that have mary in common (among john, pete, mary, paul).
How may we make these “dependent”?
Deriving the ERGM: From Markov graph to Dependence graph
[Figure: the dependence graph for the tie variables among mary (m), pete (pe), paul (pa) and john (j) – nodes m,pa; pa,pe; pa,j; m,pe; pe,j; m,j]
The “probability structure” of a Markov graph is described by the cliques of the dependence graph (Hammersley-Clifford)….
From Markov graph to Dependence graph – distinct subgraphs?
Too many statistics (parameters)!
A log-linear model (ERGM) for ties
log Pr(X = x) ∝ θ1 z1(x) + θ2 z2(x) + … + θp zp(x)
”Aggregated” to a joint model for the entire adjacency matrix.
Interaction terms in the log-linear model of the types X_ij, X_ij X_ik, and X_ij X_ik X_jk.
A log-linear model (ERGM) for ties
By definition of (in)dependence:
Pr(X_ij = x_ij, X_ik = x_ik) = Pr(X_ij = x_ij) Pr(X_ik = x_ik)
E.g. ties i–j and i–k co-occurring:
main effects X_ij and X_ik, interaction term X_ij X_ik
– more than is explained by the margins.
Likelihood equations for exponential families
log Pr(X = x) ∝ θ1 z1(x) + θ2 z2(x) + … + θp zp(x)
”Aggregated” to a joint model for the entire adjacency matrix X.
The MLE solves the equation (cf. Lehmann, 1983):
E_θ̂{z(X)} = z(x_obs)
– a sum over all 2^{n(n-1)/2} graphs.
Likelihood equations for exponential families
Solving E_θ̂{z(X)} = z(x_obs):
• Using the cumulant generating function (Corander, Dahmström, and Dahmström, 1998)
• Stochastic approximation (Snijders, 2002, based on Robbins-Monro, 1951)
• Importance sampling (Handcock, 2003; Hunter and Handcock, 2006, based on Geyer-Thompson, 1992)
Robbins-Monro algorithm
Solving E_θ̂{z(X)} = z(x_obs) – Snijders’ (2002) algorithm:
- Initialisation phase
- Main estimation
- Convergence check and calculation of standard errors
MAIN: draw x^(m) using MCMC, then update
θ^(m+1) = θ^(m) − a_m D^{-1} {z(x^(m)) − z(x_obs)}
Geyer-Thompson
Solving E_θ̂{z(X)} = z(x_obs) – Handcock (2003), approximate Fisher scoring:
MAIN: the expectation is approximated using an importance sample from MCMC.
Bayes: dealing with the likelihood
The normalising constant of the posterior is not essential for Bayesian inference; all we need is:
π(θ | x) ∝ p(x; θ) π(θ)
where
p(x; θ) = exp{Σ_k θ_k z_k(x)} / Σ_y exp{Σ_k θ_k z_k(y)}
… but the denominator is a sum over all 2^{n(n-1)/2} graphs.
Bayes: MCMC?
Consequently, in e.g. Metropolis-Hastings, the acceptance probability of a move to θ* is
min{1, [p(x; θ*) π(θ*) q_prop(θ | θ*)] / [p(x; θ) π(θ) q_prop(θ* | θ)]}
… which contains the intractable ratio of normalising constants
Σ_y exp{Σ_k θ_k z_k(y)} / Σ_y exp{Σ_k θ*_k z_k(y)}.
Bayes: Linked Importance Sampler Auxiliary Variable MCMC
LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): based on Møller et al. (2006), we define an auxiliary variable and produce draws from the joint posterior of the parameters and the auxiliary variable given x_obs, using the proposal distributions
θ* | θ^(t) ~ N(θ^(t), Σ)
together with an auxiliary draw from an ERGM-type distribution at the proposed parameter.
Bayes: alternative auxiliary variable
LISA needs many linked chains – computation time, storage (memory and time issues).
Improvement: use the exchange algorithm (Murray et al., 2006). Propose
θ* | θ^(t) ~ N(θ^(t), Σ) and x* | θ* ~ ERGM(θ*),
and accept θ* with log-probability
min{0, (θ − θ*)^T (z(x*) − z(x_obs))}
(Caimo & Friel, 2011)
Bayes: implications of using the alternative auxiliary variable
Using the exchange algorithm (Murray et al., 2006; Caimo & Friel, 2011):
• Storing only parameters
• No pre-tuning – no need for good initial values
• Standard MCMC properties of the sampler
• Less sensitive to near-degeneracy in estimation
• Easier than anything else to implement
QUICK and ROBUST
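The exchange move can be sketched on the simplest possible "ERGM" – a one-parameter Bernoulli graph, where the auxiliary draw x* | θ* is exact rather than an MCMC draw. A flat prior and the proposal standard deviation are my simplifications:

```python
import math
import random

random.seed(2)
M, z_obs = 45, 15          # number of dyads and observed edge count
SIGMA = 0.5                # random-walk proposal s.d. (tuning choice)

def draw_graph_stat(theta):
    """Exact draw of z(x*) from the one-parameter Bernoulli graph."""
    p = 1 / (1 + math.exp(-theta))
    return sum(random.random() < p for _ in range(M))

theta, draws = 0.0, []
for _ in range(20000):
    theta_star = random.gauss(theta, SIGMA)
    z_star = draw_graph_stat(theta_star)      # auxiliary graph at theta*
    # Exchange acceptance: normalising constants cancel.
    log_alpha = min(0.0, (theta - theta_star) * (z_star - z_obs))
    if math.log(random.random()) < log_alpha:
        theta = theta_star
    draws.append(theta)

post_mean = sum(draws[5000:]) / len(draws[5000:])
print(round(post_mean, 2))  # near the MLE log(15/30) ≈ -0.69
```

Only parameters are stored, and no likelihood normalising constant is ever evaluated – the two properties the slide emphasises.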
Sampling in/on networks
[Figure: a sampled network and its adjacency matrix – observed tie variables entered as 0/1, diagonal as “-”]
Sampling in/on networks
[Figure: with only some actors sampled, tie variables involving non-sampled actors are unobserved – marked “?” in the adjacency matrix]
Ignoring non-sampled?
[Figure: the same partially observed adjacency matrix – do we simply drop the “?” entries?]
What about alter – alter across ego?
[Figure: the same adjacency matrix – ties between two alters of a sampled ego may also be unobserved, marked “?”]
School classes
Multilevel attribute models
If the network is treated like another level:
Groups:
Group indicators:
Networks in groups (scaled):
With random intercepts:
Problem of boundary specification
By design – children do not nominate alters outside of school class
Problem of boundary specification
By design – children do not nominate alters outside of school class
Out of school
To other school class
Multilevel autocorrelation/nef models
[Figure: adjacency matrix with the between-class block all zero]
Multilevel autocorrelation/nef models
[Figure: the same block with entries marked “?” – unobserved rather than zero by design]
Model assisted treatment of missing network data
missing data
observed data
If you don’t have a model for what you have observed, how are you going to be able to say something about what you have not observed, using what you have observed?
Model assisted treatment of missing network data
• Importance sampling (Handcock & Gile, 2010; Koskinen, Robins & Pattison, 2010)
• Stochastic approximation and the missing data principle (Orchard & Woodbury, 1972) (Koskinen & Snijders, forthcoming)
• Bayesian data augmentation (Koskinen, Robins & Pattison, 2010)
What about alter – alter across ego?
missing data
observed data
Available case analysis: pretend missing does not exist
The principled approach in the ERGM framework
missing data
observed data
We have to simulate the missing part (the complement) and “pool” our inferences:
a subgraph of an ERGM is not itself an ERGM.
Marginalisation (Snijders, 2010; Koskinen et al., 2010):
dependence in an ERGM may also run through unobserved tie variables – e.g. tie variables in the observed part may be dependent via a third actor even when the tie variables involving that actor are unobserved – so the model for the observed subgraph should include counts of the configurations that such dependence induces.
Bayesian Data Augmentation
With missing data: simulate parameters, and in each iteration simulate the missing part of the graph.
As before:
θ* | θ^(t) ~ N(θ^(t), Σ), x* | θ* ~ ERGM(θ*),
accept with log-probability min{0, (θ − θ*)^T (z(x*) − z(x_obs))}
Bayesian Data Augmentation
With missing data, alternate:
- simulate the missing tie variables, most likely given the current parameters;
- simulate the parameters, most likely given the current missing data;
and so on… until convergence.
Bayesian Data Augmentation
What does it give us?
- A distribution of the parameters
- A distribution of the missing data
Subtle point: the imputed missing data does not depend on one particular parameter value (we don’t have to choose parameters to simulate the missing data).
Bayesian Data Augmentation: Lazega’s (2001) Lawyers
Collaboration network among 36 lawyers in a New England law firm (Lazega, 2001)
Boston office:
Hartford office:
Providence off.:
least senior:
most senior:
Bayesian Data AugmentationLazega’s (2001) Lawyers
133
ijx
)( jiij aax )( jiij bbx
)( jiij bbx 1
)( jiij ccx 1
)( jiij ddx 1
323
12
1
)()1(
)()(3
n
nn xtxtxt
Edges:
Seniority:
Practice:
Homophily
Sex:
Office:
GWESP:
with 8 = log()
Practice:
Main effect
t1 :
t2 :
etc.
(bi = 1, if i corporate,0 litigation)
t3 :
Bayesian Data AugmentationLazega’s (2001) Lawyers – ERGM posteriors (Koskinen, 2008)
Bayesian Data Augmentation: cross-validation (Koskinen, Robins & Pattison, 2010)
Remove 200 of the 630 dyads at random.
- Fit an inhomogeneous Bernoulli model and obtain the posterior predictive tie probabilities for the missing tie variables.
- Fit the ERGM and obtain the posterior predictive tie probabilities for the missing tie variables (Koskinen et al., in press).
- Fit Hoff’s (2008) latent variable probit model with linear predictor θ^T z(x_ij) + w_i^T w_j.
Repeat many times.
Bayesian Data Augmentation: ROC curve for predictive probabilities combined over 20 replications (Koskinen et al., 2010)
[Figure: ROC curves – True Positive Rate against False Positive Rate]
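Given posterior predictive tie probabilities and the held-out truth, an ROC curve is traced by sweeping the classification threshold downward. A sketch; the probabilities below are made-up numbers, not those from the study:

```python
def roc_points(probs, truth):
    """(FPR, TPR) points as the classification threshold sweeps downward."""
    pairs = sorted(zip(probs, truth), reverse=True)
    pos = sum(truth)
    neg = len(truth) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, t in pairs:
        if t:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Predictive tie probabilities for six held-out dyads (hypothetical)
# against whether a tie was actually present.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [1,   1,   0,   1,   0,   0]
pts = roc_points(probs, truth)

# Area under the curve by the trapezoid rule.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(pts[-1], round(auc, 2))  # (1.0, 1.0) 0.89
```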
Part 8
Estimation of ERGM with missing data – sampled data and “covert” actors
Bayesian Data Augmentation: snowball sampling
• The snowball sampling design is ignorable for ERGM (Thompson and Frank, 2000; Handcock & Gile, 2010; Koskinen, Robins & Pattison, 2010)
• ... but snowball sampling is rarely used when the population size is known...
• Using the Sageman (2004) “clandestine” network as a test-bed for unknown N
Bayesian Data Augmentation: the Sageman (2004) N = 366 network
Take a seed of size n = 120; snowball out 1 wave, giving m = 160 additional nodes.
Bayesian Data Augmentation: the Sageman (2004) N = 366 network
Seed n = 120, first wave m = 160, so N ≥ 280.
Assume, in turn, N = 281, 291, 301, 311, ..., 391, 396, 399.
[Figure: posterior densities of the density, alt. star and alt. triangle parameters under each assumed N]
Bayesian Data Augmentation: the Sageman (2004) N = 366 network
Seed n = 120, first wave m = 160, N ≥ 280
[Figure: .95 credibility intervals for the edges, alt. star and alt. triangle parameters as functions of the assumed N (roughly 300–380)]
Bayesian Data Augmentation: the Sageman (2004) N = 366 network
Seed n = 120, first wave m = 160, N ≥ 280
[Figure: .95 prediction intervals for the degree distribution under N = 281, 366 and 396 – Bernoulli vs ERGM vs observed]
Bayesian Data Augmentation: the Sageman (2004) N = 366 network
Seed n = 120, first wave m = 160, N ≥ 280
[Figure: .95 prediction intervals for the geodesic distribution under N = 281, 366 and 396 – Bernoulli vs ERGM vs observed]
Bayesian Data Augmentation: snowball sampling – next steps
We can fit and predict the missing part conditional on N.
Next:
- Marginalise with respect to N
- Estimate N: use a path sampler; take the combinatorics of the zero block into account
How large networks can we allow for?
Large N:
• ERGMs do not scale up (cp. the missing data experiments)
• A lot of unobserved data – a lot of unobserved covariates
• Computational issues – time and memory
• Heterogeneity…
How large networks can we allow for?
ERGMs typically assume homogeneity: the same parameters govern every part of the graph.
(A) Block modelling and ERGM (Koskinen, 2009)
(B) Latent class ERGM (Schweinberger & Handcock)
Solutions and future directions
Ignoring unknown N:
- Conditional MLE for a snowball sample does not require knowledge of N (sic!) (Pattison et al., in preparation)
Estimating N:
- Bernoulli assumptions (Frank and Snijders, 1994, JOS)
- Using ERGM and Bayes factors? (Koskinen et al., in preparation)
- Using heuristic GOF: posterior predictive distributions, re-sampling and copula (?)
Wrap-up
ERGMs:
- Increasingly being used
- Increasingly being understood
- Increasingly able to handle imperfect data (also missing link prediction)
Methods:
- Plenty of open issues
- Bayes is the way of the future
Legitimacy and dissemination:
- e.g. Lusher, Koskinen, Robins, Exponential Random Graph Models for Social Networks, CUP, 2011