Page 1: Missing data in social networks - Problems and prospects for model-based inference Johan Koskinen johan.koskinen@manchester.ac.uk The Social Statistics

Missing data in social networks - Problems and prospects for model-based inference

Johan Koskinen

johan.koskinen@manchester.ac.uk

The Social Statistics Discipline Area, School of Social Sciences

Mitchell Centre for Network Analysis

Tuesday, 20 December 2011

A relational perspective – networks matter

Vegetarian partner

• Ethical
• Economics
• Health
• Taste

Dr D eats (predominantly) vegetarian food...

Dr Dean Lusher’s relational take

A relational perspective – networks matter

Someone close to you is unhappy...

... will you remain unaffected?

A relational perspective – networks matter

Equal opportunities based on our individual qualities ...

...

A relational perspective – networks matter

Some people ...

... bowl alone

... others bowl in leagues

Part 1

Network representations

Social networks

We conceive of a network as a relation defined on a collection of individuals

[Diagram: mary and paul, joined by a tie labelled "relates to"]

Examples of relations:

“… go to for advice …”

“… consider a friend …”

Generally binary:

on: tie present
off: tie absent

Network representations

A non-directed graph: a social network of tertiary students (Kalish, 2003)

Network representations

Marriage ties in 15th-century Florence (Padgett and Ansell, 1993)

Network representations

A directed graph: confiding network in a police training squad (Pane, 2003)

Network representations

World trade in 1992 (Plümper, 2003, JOSS)

Network representations: attributes

• The actors (nodes) in the network are individuals with
  – attitudes, behaviours, and attributes
• These may
  – guide them in their choices of partners
  – be shaped (influenced) by their partners
• The actors may have individual and collective outcomes

Network representations: attributes

A non-directed graph: a social network of tertiary students (Kalish, 2003)

Legend: Jewish, Arab

Network representations: attributes

High school friendship (Moody, 2001)

Legend: white, black, other

Network representations: attributes

Romantic/sexual relationships at a US high school (Bearman, Moody & Stovel, 2004)

Guess the blue and pink

Network representations: attributes

Team structures in training squads (Pane, 2003): friendship network in the 12th week of training

Legend: detached, team oriented, positive

Multiple relations – entrainment, exchange, and generalized exchange

Violence & attitudes among school boys (Lusher, 2003)

Legend: physical violence, verbal violence, friend

Social networks

We conceive of the graph as a collection of tie variables:

$$\{X_{ij} : i, j \in V\}$$

[Diagram: a directed graph on the actors john, pete, mary and paul, shown on later slides with the node labels i, j, k and l]

$$x = \begin{pmatrix} - & x_{ij} & x_{ik} & x_{il} \\ x_{ji} & - & x_{jk} & x_{jl} \\ x_{ki} & x_{kj} & - & x_{kl} \\ x_{li} & x_{lj} & x_{lk} & - \end{pmatrix} = \begin{pmatrix} - & 1 & 1 & 0 \\ 0 & - & 0 & 0 \\ 0 & 1 & - & 0 \\ 0 & 1 & 0 & - \end{pmatrix}$$

(rows and columns ordered i, j, k, l)

The adjacency matrix: the matrix of the collection of tie variables $\{X_{ij} : i, j \in V\}$
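As a minimal sketch of this representation, the four-actor example can be stored as a matrix of tie variables. The function name `adjacency_matrix` and the use of `None` for the undefined diagonal are illustrative assumptions, not something taken from the slides.

```python
def adjacency_matrix(nodes, ties):
    """Return the matrix of tie variables x[i][j] for a directed graph.

    Diagonal entries are None because self ties are not defined.
    """
    index = {v: r for r, v in enumerate(nodes)}
    n = len(nodes)
    x = [[None if r == c else 0 for c in range(n)] for r in range(n)]
    for sender, receiver in ties:
        x[index[sender]][index[receiver]] = 1
    return x


nodes = ["i", "j", "k", "l"]
# Ties read off the example matrix: i->j, i->k, k->j, l->j.
ties = [("i", "j"), ("i", "k"), ("k", "j"), ("l", "j")]
x = adjacency_matrix(nodes, ties)

# Print one row per sender, with "-" on the diagonal.
for label, row in zip(nodes, x):
    print(label, " ".join("-" if v is None else str(v) for v in row))
```

Each row lists the ties sent by one actor, so the row for i records the ties i->j and i->k.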

Social networks: adjacency matrix

Read Highland tribes

0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0
0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 0
0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0
0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0

[Diagram: the network drawn alongside the matrix, with individual nodes (among them 1, 2, 9, 10, 11, 13, 15 and 16) highlighted on successive slides]

Symmetric for a non-directed network

Zeroes along the diagonal – self ties not permitted
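The two properties noted on these slides can be checked mechanically. The sketch below reads the Read Highland tribes matrix as 16 rows of 16 entries (the row layout is a reconstruction of the flattened listing) and tests symmetry and the zero diagonal. The variable names are illustrative.

```python
ROWS = """
0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0
0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 0
0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0
0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
"""

# Parse the text block into a list of integer rows.
x = [[int(t) for t in line.split()] for line in ROWS.strip().splitlines()]
n = len(x)

# Non-directed network: x[i][j] must equal x[j][i] for every pair.
is_symmetric = all(x[i][j] == x[j][i] for i in range(n) for j in range(n))

# Self ties are not permitted: every diagonal entry must be zero.
zero_diagonal = all(x[i][i] == 0 for i in range(n))

# Each undirected tie appears twice in a symmetric matrix.
n_edges = sum(sum(row) for row in x) // 2


def neighbours(node):
    """Return the 1-based labels of the nodes tied to `node` (1-based)."""
    return [j + 1 for j, v in enumerate(x[node - 1]) if v == 1]


print(is_symmetric, zero_diagonal, n_edges, neighbours(1), neighbours(10))
```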

Part 2

Analysing social networks – Putting the building blocks of networks together using ERGM

Do we need to “analyse” networks?

- Is the network a unique narrative?
- Stick to an ethnography?

Possible answers

- Detecting systematic tendencies
- Social mechanisms
- “Lift the description” to describe the network in generalizable terms

Networks matter – ERGMs matter

6018 grade 6 children 1966

Legend: female, male

Networks matter – ERGMS matter

6018 grade 6 children 1966 – 300 schools Stockholm

Networks matter – ERGMs matter

6018 grade 6 children 1966 – 200 schools Stockholm

[Figure: panels of parameter estimates, one panel per effect: Density, Mutuality, Alt. in-stars, Alt. out-stars, Alt. trans. triangles, Alt. indep. 2-paths, Homophily girl, Main girl. Legend: CI, WLS. Successive slides add diagrams of the corresponding configurations on the nodes i, j and h1, h2, h3, ..., hk.]

Koskinen and Stenberg (in press) JEBS

Do we need to “analyse” networks?

- Is the network a unique narrative?
- Stick to an ethnography?

Possible answers

- Detecting systematic tendencies
- Social mechanisms
- “Lift the description” to describe the network in generalizable terms

Conceptualising the “network” as a graph is what enables this

ERGMs – modelling graphs

• We want to model tie variables
• But structure – overall pattern – is evident
• What kind of structural elements can we incorporate in the model for the tie variables?

ERGMs – modelling graphs: example

Marriage network of Padgett’s Florentine families

Model this as a combination of 4 local structures:

$$\log \Pr(X = x) = \theta_1 L(x) + \sigma_2 S_2(x) + \sigma_3 S_3(x) + \tau T(x) - \psi(\theta)$$

where $L(x)$ = # edges, $S_2(x)$ = # 2-stars, $S_3(x)$ = # 3-stars and $T(x)$ = # triangles in $x$

Their importance is measured by their parameters
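The four counts that enter this model can be computed directly from an adjacency matrix. The sketch below is illustrative only: the toy graph, the function names and the parameter names are assumptions. The numerical weights are the estimates reported for the Florentine example (edge -4.14, 2-star .97, 3-star -.56, triangle 1.26). The result is the log-probability up to the normalising term, which is not computed here.

```python
from itertools import combinations
from math import comb


def markov_statistics(x):
    """Count edges, 2-stars, 3-stars and triangles of an undirected graph.

    x is a symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(x)
    degrees = [sum(row) for row in x]
    edges = sum(degrees) // 2
    # A k-star is a choice of k ties that share one central node.
    two_stars = sum(comb(d, 2) for d in degrees)
    three_stars = sum(comb(d, 3) for d in degrees)
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if x[i][j] and x[i][k] and x[j][k]
    )
    return edges, two_stars, three_stars, triangles


def unnormalised_log_prob(x, w_edge, w_2star, w_3star, w_triangle):
    """Weighted sum of the four counts (normalising term omitted)."""
    edges, s2, s3, tri = markov_statistics(x)
    return w_edge * edges + w_2star * s2 + w_3star * s3 + w_triangle * tri


# Toy graph (an assumption for illustration): a triangle 0-1-2 plus the tie 2-3.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
stats = markov_statistics(toy)
value = unnormalised_log_prob(toy, -4.14, 0.97, -0.56, 1.26)
print(stats, value)
```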

ERGMs – modelling graphs: example

Effect     MLE     S.E.
Edge       -4.14   1.07
2-star      0.97   0.59
3-star     -0.56   0.35
Triangle    1.26   0.61

Part 3

Modelling graphs – deriving building blocks out of dependencies

Independence - Deriving the ERGM

[Diagram: nodes i, j, k, l, m and n. Whether dyad {i,l} has a tie is decided by a coin flip (heads or tails), and whether dyad {i,k} has a tie is decided by another coin flip.]

Two coins, one AUD and one SEK, with joint probabilities and margins:

             SEK heads   SEK tails   Margin
AUD heads    0.25        0.25        0.5
AUD tails    0.25        0.25        0.5
Margin       0.5         0.5

Knowledge of the AUD outcome does not help us predict the SEK outcome, e.g. whether the tie is present or absent,

even though dyad {i,l} and dyad {i,k} have vertex i in common

Independence - Deriving the ERGM

             SEK heads   SEK tails   Margin
AUD heads    0.4         0.1         0.5
AUD tails    0.1         0.4         0.5
Margin       0.5         0.5

May we find a model such that knowledge of the AUD outcome does help us predict the SEK outcome, e.g. whether the tie is present or absent?
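The difference between the two tables can be made concrete with a short check. This sketch assumes the 2x2 tables are read with the AUD outcome on the rows and the SEK outcome on the columns; the function names are illustrative. Independence holds when every cell equals the product of its margins.

```python
def margins(joint):
    """Return (row margins, column margins) of a joint probability table."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols


def is_independent(joint, tol=1e-12):
    """True if every cell equals the product of its two margins."""
    rows, cols = margins(joint)
    return all(
        abs(joint[a][s] - rows[a] * cols[s]) <= tol
        for a in range(len(rows))
        for s in range(len(cols))
    )


def conditional(joint, a):
    """Distribution of the column variable given row outcome a."""
    row_total = sum(joint[a])
    return [p / row_total for p in joint[a]]


independent_coins = [[0.25, 0.25], [0.25, 0.25]]
dependent_coins = [[0.4, 0.1], [0.1, 0.4]]

print(is_independent(independent_coins), conditional(independent_coins, 0))
print(is_independent(dependent_coins), conditional(dependent_coins, 0))
```

In the second table the margins are unchanged, but knowing the first coin moves the probability for the second coin away from 0.5, which is what dependence between two tie variables means.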

Deriving the ERGM: From Markov graph to Dependence graph

[Diagram: a graph on the actors john, pete, mary and paul]

Consider the tie-variables that have Mary in common

How may we make these “dependent”?

[Diagram, built up over successive slides: the dependence graph, with one node per tie-variable: (m,pe), (m,j), (m,pa), (pe,j), (pa,j) and (pa,pe). Two tie-variables are joined when they have an actor in common.]

The “probability structure” of a Markov graph is described by cliques of the dependence graph (Hammersley-Clifford) …

[Diagrams: subsets of the dependence graph highlighted in turn, e.g. {(m,pe)}, {(m,pe), (m,j)} and {(m,pe), (pe,j), (m,j)}, each next to the corresponding subgraph on john, pete, mary and paul]
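The link between cliques of the dependence graph and network configurations can be enumerated for the four actors. The sketch below assumes the Markov dependence rule that two tie-variables are dependent exactly when they share an actor; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

actors = ["mary", "pete", "john", "paul"]
# One node of the dependence graph per tie-variable (dyad).
tie_variables = list(combinations(actors, 2))


def dependent(a, b):
    """Markov dependence: two distinct dyads share exactly one actor."""
    return len(set(a) & set(b)) == 1


def is_clique(subset):
    """True if every pair of tie-variables in the subset is dependent."""
    return all(dependent(a, b) for a, b in combinations(subset, 2))


def configuration(subset):
    """Name the network configuration formed by a clique of tie-variables."""
    if len(subset) == 1:
        return "edge"
    common = set.intersection(*(set(t) for t in subset))
    if common:
        return f"{len(subset)}-star"
    # Three pairwise-adjacent ties with no common actor close a triangle.
    return "triangle"


counts = Counter()
for size in range(1, len(tie_variables) + 1):
    for subset in combinations(tie_variables, size):
        if is_clique(subset):
            counts[configuration(subset)] += 1

print(dict(counts))
```

Every clique found this way is a single tie, a star or a triangle, which is why these are the building blocks of a Markov graph model.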

From Markov graph to Dependence graph – distinct subgraphs?

Too many statistics (parameters)

The homogeneity assumption

[Diagram: configurations of the same type on different sets of actors are set equal (=)]

A log-linear model (ERGM) for ties

$$\log \Pr(X = x) = \theta_1 z_1(x) + \theta_2 z_2(x) + \cdots + \theta_p z_p(x) - \psi(\theta)$$

“Aggregated” to a joint model for the entire adjacency matrix

Interaction terms in the log-linear model are of the types

$$X_{ij}, \qquad X_{ij}X_{ik}, \qquad X_{ij}X_{ik}X_{jk}$$

A log-linear model (ERGM) for ties

By definition of (in-)dependence

$$\Pr(X_{ij} = x_{ij}, X_{ik} = x_{ik}) \neq \Pr(X_{ij} = x_{ij}) \Pr(X_{ik} = x_{ik})$$

(with equality under independence)

E.g. the ties {i,j} and {i,k} co-occurring, forming a 2-star centred on i, more often than is explained by the margins

Main effects: $X_{ij}$, $X_{ik}$

Interaction term: $X_{ij}X_{ik}$

Part 4

Estimation of ERGM

Likelihood equations for exponential families

$$\log \Pr(X = x) = \theta_1 z_1(x) + \theta_2 z_2(x) + \cdots + \theta_p z_p(x) - \psi(\theta)$$

“Aggregated” to a joint model for the entire adjacency matrix X

The MLE solves the equation (cf. Lehmann, 1983):

$$E_{\hat{\theta}_{MLE}}\{z(X)\} = z(x_{obs})$$

where the expectation is a sum over all $2^{n(n-1)/2}$ graphs
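For a very small network the likelihood equation can be solved by brute force, which illustrates why the sum over all graphs becomes infeasible as n grows. The sketch below is a simplification, not the general case: it assumes an edge-only model on n = 4 actors (6 dyads, 64 graphs), solves the equation by bisection, and compares the result with the closed-form solution available for this special model. All names are illustrative.

```python
from itertools import product
from math import exp, log

N_DYADS = 6  # n = 4 actors: n(n-1)/2 = 6 dyads, hence 2**6 = 64 graphs


def expected_edges(theta):
    """E_theta{L(X)} by explicit summation over every graph."""
    total_weight = 0.0
    weighted_edges = 0.0
    for graph in product((0, 1), repeat=N_DYADS):
        edges = sum(graph)
        weight = exp(theta * edges)  # unnormalised probability of this graph
        total_weight += weight
        weighted_edges += weight * edges
    return weighted_edges / total_weight


def solve_likelihood_equation(observed_edges, lo=-10.0, hi=10.0, n_steps=60):
    """Find theta with E_theta{L(X)} = observed_edges by bisection.

    Works because the expected statistic is increasing in theta.
    """
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        if expected_edges(mid) < observed_edges:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


observed_edges = 2
theta_hat = solve_likelihood_equation(observed_edges)
# Closed form for the edge-only model: the log-odds of the observed density.
theta_closed_form = log(observed_edges / (N_DYADS - observed_edges))
print(theta_hat, theta_closed_form, expected_edges(theta_hat))
```

With 16 actors the same sum would run over 2**120 graphs, which is the reason for the simulation-based methods listed next.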

Likelihood equations for exponential families

Solving $E_{\hat{\theta}_{MLE}}\{z(X)\} = z(x_{obs})$

• Using the cumulant generating function (Corander, Dahmström, and Dahmström, 1998)
• Stochastic approximation (Snijders, 2002, based on Robbins-Monro, 1951)
• Importance sampling (Handcock, 2003; Hunter and Handcock, 2006, based on Geyer-Thompson 1992)

Robbins-Monro algorithm

Solving $E_{\hat{\theta}_{MLE}}\{z(X)\} = z(x_{obs})$

Snijders, 2002, algorithm

- Initialisation phase
- Main estimation
- Convergence check and calculation of standard errors

MAIN:

$$\theta^{(m+1)} = \theta^{(m)} - a_r D_0^{-1}\{z(x^{(m)}) - z(x_{obs})\}$$

Draw $x^{(m)}$ using MCMC
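The main update can be illustrated on a model where no MCMC is needed. The sketch below is a deliberately simplified stand-in, not Snijders' algorithm: it assumes an edge-only model, so that the edge count can be drawn exactly from a binomial distribution; it uses the classic gain sequence 1/m instead of the phased gains of the actual algorithm; and it sets the scaling constant to the variance of the statistic at the starting value. The data (29 ties among 120 dyads) match the Read Highland tribes matrix; all names are illustrative.

```python
import random
from math import exp, log


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-t))


def draw_edge_count(theta, n_dyads, rng):
    """Exact draw of the edge count under an edge-only model."""
    p = expit(theta)
    return sum(1 for _ in range(n_dyads) if rng.random() < p)


def robbins_monro(z_obs, n_dyads, theta0=0.0, n_iter=2000, seed=1):
    """Iterate theta <- theta - a_m * (z(x_m) - z_obs) / d0 with a_m = 1/m."""
    rng = random.Random(seed)
    p0 = expit(theta0)
    d0 = n_dyads * p0 * (1.0 - p0)  # variance of the statistic at theta0
    theta = theta0
    for m in range(1, n_iter + 1):
        z = draw_edge_count(theta, n_dyads, rng)  # simulate at current theta
        theta -= (1.0 / m) / d0 * (z - z_obs)
    return theta


theta_hat = robbins_monro(z_obs=29, n_dyads=120)
theta_exact = log(29 / 91)  # closed-form MLE for the edge-only model
print(theta_hat, theta_exact)
```

The iterates move towards the value at which simulated and observed statistics agree on average, which is the moment equation the MLE solves.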

Geyer-Thompson

Solving $E_{\hat{\theta}_{MLE}}\{z(X)\} = z(x_{obs})$

Handcock, 2003, approximate Fisher scoring

MAIN:

$$\theta^{(g)} = \theta^{(g-1)} - I(\theta^{(g-1)})^{-1}\left\{\sum_{m=1}^{M} w^{(m)} z(x^{(m)}) - z(x_{obs})\right\}$$

Approximated using an importance sample from MCMC

Bayes: dealing with likelihood

The normalising constant of the posterior is not essential for Bayesian inference; all we need is:

$$\pi(\theta \mid x) = \frac{\ell(\theta; x)\,\pi(\theta)}{\int \ell(\theta; x)\,\pi(\theta)\,d\theta}$$

… but

$$\ell(\theta; x) = \frac{\exp\{\sum_{k=1}^{p} \theta_k z_k(x)\}}{\sum_{y} \exp\{\sum_{k=1}^{p} \theta_k z_k(y)\}}$$

where the denominator is a sum over all $2^{n(n-1)/2}$ graphs

Bayes: MCMC?

Consequently, in e.g. Metropolis-Hastings, the acceptance probability of a move to θ* is

$$\min\left\{1, \frac{\pi(\theta^* \mid x)\, q(\theta \mid \theta^*)}{\pi(\theta \mid x)\, q(\theta^* \mid \theta)}\right\} = \min\left\{1, \frac{\ell(\theta^*; x)\,\pi(\theta^*)\, q(\theta \mid \theta^*)}{\ell(\theta; x)\,\pi(\theta)\, q(\theta^* \mid \theta)}\right\}$$

where $q$ is the proposal distribution

… which contains

$$\frac{\sum_{y} \exp\{\sum_{k=1}^{p} \theta_k z_k(y)\}}{\sum_{y} \exp\{\sum_{k=1}^{p} \theta^*_k z_k(y)\}}$$

Bayes: Linked Importance Sampler Auxiliary Variable MCMC

LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): based on Møller et al. (2006), we define an auxiliary variable

[Equation: definition of the auxiliary variable]

and produce draws from the joint posterior

[Equation: joint posterior of θ and the auxiliary variable given $x_{obs}$]

using the proposal distributions

$$\theta^* \mid \theta^{(t)} \sim N(\theta^{(t)}, \Sigma)$$

and

[Equation: proposal distribution for the auxiliary variable given θ*]

Bayes: alternative auxiliary variable

LISA (Koskinen, 2008; Koskinen, Robins & Pattison, 2010): based on Møller et al. (2006), we define an auxiliary variable

Many linked chains:
- Computation time
- Storage (memory and time issues)

Improvement: use exchange algorithm (Murray et al. 2006)

$$\theta^* \mid \theta^{(t)} \sim N(\theta^{(t)}, \Sigma) \quad \text{and} \quad x^* \mid \theta^* \sim \mathrm{ERGM}(\theta^*)$$

Accept θ* with log-probability:

$$\min\{0, (\theta - \theta^*)^T (z(x^*) - z(x_{obs}))\}$$

Caimo & Friel, 2011

Bayes: Implications of using alternative auxiliary variable

• Storing only parameters
• No pre tuning – no need for good initial values
• Standard MCMC properties of sampler
• Less sensitive to near degeneracy in estimation
• Easier than anything else to implement

QUICK and ROBUST
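The exchange step is short enough to sketch in full. The example below makes several simplifying assumptions that are not in the slides: an edge-only model, so that the auxiliary graph can be drawn exactly instead of by MCMC; a flat prior on θ and a symmetric normal proposal, under which the acceptance log-probability reduces to the expression above; and arbitrary tuning values for the step size and burn-in. The data are 29 ties among 120 dyads, as in the Read Highland tribes matrix.

```python
import random
from math import exp, log


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-t))


def draw_edge_count(theta, n_dyads, rng):
    """Exact draw of z(x*) for an edge-only model with parameter theta."""
    p = expit(theta)
    return sum(1 for _ in range(n_dyads) if rng.random() < p)


def exchange_sampler(z_obs, n_dyads, n_iter=6000, step=0.3, seed=1):
    """Return posterior draws of theta and the acceptance rate."""
    rng = random.Random(seed)
    theta = 0.0
    draws = []
    accepted = 0
    for _ in range(n_iter):
        theta_star = rng.gauss(theta, step)                # propose theta*
        z_star = draw_edge_count(theta_star, n_dyads, rng)  # x* | theta*
        # log acceptance: min{0, (theta - theta*) (z(x*) - z(x_obs))}
        log_accept = min(0.0, (theta - theta_star) * (z_star - z_obs))
        if rng.random() < exp(log_accept):
            theta = theta_star
            accepted += 1
        draws.append(theta)  # only the parameter is stored
    return draws, accepted / n_iter


draws, acceptance_rate = exchange_sampler(z_obs=29, n_dyads=120)
burn_in = 1000
posterior_mean = sum(draws[burn_in:]) / len(draws[burn_in:])
print(posterior_mean, log(29 / 91), acceptance_rate)
```

No normalising constant is evaluated anywhere in the loop, and only the current parameter value is kept between iterations.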

Part 5

Types of missing data

Sampling in/on networks

[Figures: network diagrams; legend: missing data, observed data]

[Figures: the adjacency matrix x, built up over successive slides; diagonal entries are shown as -, observed tie-variables as 0 or 1, and tie-variables that were not observed as ?]
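One way to hold such a partially observed adjacency matrix in code is sketched below. It is a minimal illustration under stated assumptions: the graph is undirected, a tie-variable counts as observed when at least one of its two endpoints was sampled, and NaN plays the role of both the "?" entries and the "-" diagonal. The function name `observe_by_design` is hypothetical.

```python
import numpy as np

def observe_by_design(x_full, sampled):
    """Adjacency matrix as seen when only the `sampled` nodes report their ties.

    Entries are 0/1 where observed and NaN where unobserved; the diagonal is NaN.
    """
    n = x_full.shape[0]
    s = np.zeros(n, dtype=bool)
    s[list(sampled)] = True
    # a tie-variable (i, j) is observed if i or j was sampled
    seen = s[:, None] | s[None, :]
    x_obs = np.full((n, n), np.nan)
    x_obs[seen] = x_full[seen]
    np.fill_diagonal(x_obs, np.nan)
    return x_obs
```

With a single sampled node, every tie-variable between two non-sampled nodes stays NaN, which corresponds to the block of "?" entries in the matrices on these slides.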

Ignoring non-sampled?

[Figure: the adjacency matrix x with unobserved tie-variables marked ?]

What about alter – alter across ego?

[Figure: the adjacency matrix x with further tie-variables marked ?]

School classes

[Figure sequence for the school classes example; one slide carries an equation in x that did not survive extraction]


Multilevel attribute models

If the network is treated like another level:

Groups:

Group indicators:

Networks in groups (scaled):

With random intercepts:

Empirical setup

[Figure: matrix display; all entries that survive extraction are 0]

Problem of boundary specification

By design, children do not nominate alters outside of their school class:

• Out of school
• To other school class

Multilevel autocorrelation/nef models

[Figures: the matrix display, first with all entries shown as 0, then with all entries shown as ?]


Part 6

Estimation of ERGM with missing data


Model assisted treatment of missing network data

missing data

observed data

If you don’t have a model for what you have observed, how are you going to say something about what you have not observed using what you have observed?


• Importance sampling (Handcock & Gile 2010; Koskinen, Robins & Pattison, 2010)

• Stochastic approximation and the missing data principle (Orchard & Woodbury, 1972; Koskinen & Snijders, forthcoming)

• Bayesian data augmentation (Koskinen, Robins & Pattison, 2010)


What about alter – alter across ego?

missing data

observed data

Available case analysis: pretend the missing data do not exist


The principled approach in ERGM framework

missing data

observed data

We have to simulate the missing (complement) and “pool” our inferences


Subgraph of ERGM not ERGM

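Dependence in ERGM: [figure: ties among nodes i, j, k]

We may also have dependence: [figure: ties among nodes i, j, k, l]

But if [figure: nodes j and k with a tie-variable marked ?]

We should include counts of: [figure]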

Marginalisation (Snijders, 2010; Koskinen et al, 2010)

Bayesian Data Augmentation

In each iteration:

• Simulate parameters
• Simulate graphs
• With missing data: simulate the missing data

$\theta^{*} \mid \theta^{(t)} \sim N(\theta^{(t)}, \Sigma)$

$x^{*} \mid \theta^{*} \sim \mathrm{ERGM}(\theta^{*})$

$\min\{0,\ (\theta^{(t)} - \theta^{*})^{T}(z(x^{*}) - z(x_{\mathrm{obs}}))\}$

The sampler alternates between the two:

• Most likely missing given current θ
• Most likely θ given current missing
• Most likely missing given current θ
• and so on … until
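Written out, one cycle of the alternation can be stated as two conditional draws. This is the generic data augmentation scheme rather than a transcription of the slides; the subscripts "mis" and "obs" are notation introduced here for the missing and observed tie-variables, and the slides' "most likely" is read as "a draw from the conditional distribution".

```latex
\begin{align*}
x_{\mathrm{mis}}^{(t+1)} &\sim p\bigl(x_{\mathrm{mis}} \mid x_{\mathrm{obs}},\, \theta^{(t)}\bigr), \\
\theta^{(t+1)} &\sim \pi\bigl(\theta \mid x_{\mathrm{obs}},\, x_{\mathrm{mis}}^{(t+1)}\bigr).
\end{align*}
```

After convergence the pairs $(\theta^{(t)}, x_{\mathrm{mis}}^{(t)})$ are draws from the joint posterior $\pi(\theta, x_{\mathrm{mis}} \mid x_{\mathrm{obs}})$, so keeping only the θ values gives the posterior of the parameters and keeping only the $x_{\mathrm{mis}}$ values gives the predictive distribution of the missing data. In the ERGM case the second draw is not available in closed form; one option, assumed here, is to replace it by an exchange-type update in which the statistics of the completed graph $(x_{\mathrm{obs}}, x_{\mathrm{mis}}^{(t+1)})$ take the place of $z(x_{\mathrm{obs}})$.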

Bayesian Data Augmentation

What does it give us?

• Distribution of parameters
• Distribution of missing data

Subtle point: the distribution of the missing data does not depend on any particular parameter value. The parameters are simulated too, so we don’t have to choose parameters to simulate the missing data.


Part 7

Estimation of ERGM with missing data

- Example Missing ties

Bayesian Data Augmentation: Lazega’s (2001) Lawyers

Collaboration network among 36 lawyers in a New England law firm (Lazega, 2001)

Boston office:

Hartford office:

Providence off.:

least senior:

most senior:

Model terms:

• Edges: $\sum_{i<j} x_{ij}$
• Main effect, Seniority: $\sum_{i<j} x_{ij}(a_i + a_j)$
• Main effect, Practice: $\sum_{i<j} x_{ij}(b_i + b_j)$ ($b_i = 1$ if i corporate, 0 litigation)
• Homophily, Practice: $\sum_{i<j} x_{ij}\,\mathbf{1}(b_i = b_j)$
• Homophily, Sex: $\sum_{i<j} x_{ij}\,\mathbf{1}(c_i = c_j)$
• Homophily, Office: $\sum_{i<j} x_{ij}\,\mathbf{1}(d_i = d_j)$
• GWESP: $3t_1(x) - \dfrac{t_2(x)}{\lambda} + \dots + (-1)^{n-3}\dfrac{t_{n-2}(x)}{\lambda^{n-3}}$, with $\theta_8 = \log(\lambda)$

[Figure: the configurations t1, t2, t3, etc.]
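The alternating sum in the GWESP term can be computed directly from shared-partner counts. The sketch below is illustrative and assumes an undirected 0/1 adjacency matrix; it uses the standard identity that $\sum_{i<j} x_{ij}\binom{L_{ij}}{k}$, with $L_{ij}$ the number of shared partners of i and j, equals $3t_1$ for k = 1 and $t_k$ for k ≥ 2. Both function names are hypothetical.

```python
import numpy as np
from math import comb

def alt_k_triangles(x, lam):
    """Closed form: lam * sum over tied pairs of 1 - (1 - 1/lam)**L_ij."""
    shared = x @ x                         # shared[i, j] = number of shared partners
    iu = np.triu_indices_from(x, k=1)      # each dyad i < j once
    sp = shared[iu][x[iu] == 1]            # shared partners of each tied pair
    return lam * float(np.sum(1.0 - (1.0 - 1.0 / lam) ** sp))

def alt_k_triangles_series(x, lam):
    """Series form: 3 t_1 - t_2/lam + ... + (-1)^(n-3) t_(n-2) / lam^(n-3)."""
    n = x.shape[0]
    shared = x @ x
    iu = np.triu_indices_from(x, k=1)
    sp = shared[iu][x[iu] == 1]
    total = 0.0
    for k in range(1, n - 1):
        # equals 3 t_1 when k = 1 and t_k when k >= 2
        count = sum(comb(int(s), k) for s in sp)
        total += (-1) ** (k - 1) * count / lam ** (k - 1)
    return total
```

The two functions should agree on any graph; the closed form is the one to use in practice because it avoids the binomial sums.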

Bayesian Data Augmentation: Lazega’s (2001) Lawyers – ERGM posteriors (Koskinen, 2008)

Bayesian Data Augmentation: Cross validation (Koskinen, Robins & Pattison, 2010)

Remove 200 of the 630 dyads at random

Fit inhomogeneous Bernoulli model and obtain the posterior predictive tie-probabilities for the missing tie-variables

Fit ERGM and obtain the posterior predictive tie-probabilities for the missing tie-variables (Koskinen et al., in press)

Fit Hoff’s (2008) latent variable probit model with linear predictor $\theta^{T} z(x_{ij}) + w_i w_j^{T}$

Repeat many times

Bayesian Data Augmentation: ROC curve for predictive probabilities combined over 20 replications (Koskinen et al. 2010)

[Figure: ROC curves; x-axis: False Positive Rate (0 to 1), y-axis: True Positive Rate (0 to 1)]
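For readers who want to reproduce this kind of comparison, an ROC curve can be computed from held-out tie-variables and their posterior predictive tie-probabilities as sketched below. This is a generic illustration, not the code behind the slides: `y_true` holds the removed 0/1 tie-variables, `prob` the predicted probabilities, both names are hypothetical, and tied probabilities are treated as separate thresholds for simplicity.

```python
import numpy as np

def roc_curve(y_true, prob):
    """False and true positive rates as the decision threshold is lowered."""
    order = np.argsort(-np.asarray(prob), kind="stable")   # highest probability first
    y = np.asarray(y_true)[order]
    tp = np.concatenate([[0], np.cumsum(y)])        # true positives at each cut
    fp = np.concatenate([[0], np.cumsum(1 - y)])    # false positives at each cut
    return fp / fp[-1], tp / tp[-1]

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

Pooling the held-out dyads from all replications before calling `roc_curve` gives a single combined curve per model.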


Part 8

Estimation of ERGM with missing data

- Sampled data and “covert” actors

Bayesian Data Augmentation: Snowball sampling

• Snowball sampling design is ignorable for ERGM (Thompson and Frank, 2000; Handcock & Gile, 2010; Koskinen, Robins & Pattison, 2010)

• ... but snowball sampling is rarely used when the population size is known ...

• Using the Sageman (2004) “clandestine” network as a test-bed for unknown N

Bayesian Data Augmentation: the Sageman (2004) N = 366 network

[Figures: network diagrams]

Take a seed of size n = 120. Snowball out 1 wave: additional nodes m = 160. Hence N ≥ n + m = 280.
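The seed-plus-one-wave design can be mimicked on any fully known network, which is how a test-bed of this kind works. The sketch below is a hedged illustration: it assumes an undirected 0/1 adjacency matrix and that only seed nodes report their ties, so a tie-variable is observed when at least one endpoint is a seed. The function name `one_wave_snowball` is hypothetical.

```python
import numpy as np

def one_wave_snowball(x, seed_nodes):
    """Return boolean masks for the seed, the first wave, and the observed tie-variables."""
    n = x.shape[0]
    seed = np.zeros(n, dtype=bool)
    seed[list(seed_nodes)] = True
    # first wave: nodes tied to at least one seed node that are not seeds themselves
    wave1 = (x[seed].sum(axis=0) > 0) & ~seed
    # tie-variables reported by a seed node are observed; the rest are missing
    observed = seed[:, None] | seed[None, :]
    np.fill_diagonal(observed, False)
    return seed, wave1, observed
```

With this design the number of known nodes is `seed.sum() + wave1.sum()`, which is only a lower bound on N because nodes with no tie to any seed are never seen.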

Assume N = 281, 291, 301, 311, ..., 391, 396, 399.

[Figure: posterior densities for the density, alt. star and alt. triangle parameters, shown for successive assumed values of N]

[Figure: .95 credibility intervals for the edges, alt. star and alt. triangle parameters plotted against N (300 to 380)]

[Figure: .95 prediction intervals for the degree distribution (degrees 0 to 16) for N = 281, 366 and 396; legend: Bernoulli, ERGM, obs]

[Figure: .95 prediction intervals for the geodesic distribution (distances 1 to 20) for N = 281, 366 and 396; legend: Bernoulli, ERGM, obs]

Bayesian Data Augmentation: Snowball sampling – Next steps

We can fit and predict missing conditional on N

Next:
- Marginalise with respect to N, and
- Estimate N
  - Use path sampler
  - Take combinatorics of zero block into account


Part 9

Further issues

How large networks can we allow for?

Large N
• ERGMs do not scale up (cp. missing data experiments)
• Lot of unobserved data – lot of unobserved covariates
• Computational issues – time and memory
• Heterogeneity…

ERGMs typically assume homogeneity

[Figure]

(A) Block modelling and ERGM (Koskinen, 2009)
(B) Latent class ERGM (Schweinberger & Handcock)

Solutions and future directions

Ignoring unknown N:
- Conditional MLE for snowball sample does not require knowledge of N (sic!) (Pattison et al., in preparation)

Estimating N:
- Bernoulli assumptions (Frank and Snijders, 1994 JOS)
- Using ERGM and Bayes factors? (Koskinen et al., in preparation)
- Using heuristic GOF; posterior predictive distributions, re-sampling and copula (?)

Wrap-up

ERGMs
- Increasingly being used
- Increasingly being understood
- Increasingly being able to handle imperfect data (also missing link prediction)

Methods
- Plenty of open issues
- Bayes is the way of the future

Legitimacy and dissemination
- e.g. Lusher, Koskinen, Robins ERGMs for SN, CUP, 2011