
Statistical Analysis of Network Data with R
Network Cohesion & Graph Partitioning

Kim Seonghyeon

April 14, 2017

Kim Seonghyeon Statistical Analysis of Network Data with R April 14, 2017 1 / 27

Page 2: Statistical Analysis of Network Data with R - Network ...stat.snu.ac.kr/idea/seminar/20170414/network_cohesion_partition.pdf · Kim Seonghyeon Statistical Analysis of Network Data

Network Cohesion


subgraph & censuses

clique
clique: a complete subgraph
maximal clique: a clique that is not a subset of a larger clique


Figure 1: karate (network plot; vertex labels H, 2 to 33, A)


Table 1: number of cliques by size

size    1   2   3   4   5
count  34  78  45  11   2

Table 2: number of maximal cliques by size

size    2   3   4   5
count  11  21   2   2
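The two censuses above can be reproduced in miniature. The following is a minimal sketch, assuming a hypothetical toy graph (`toy_adj`, a triangle with one pendant edge) rather than the karate network, and using brute-force enumeration plus the basic Bron-Kerbosch recursion; the function names are illustrative and this is not necessarily how the tables in the slides were produced.

```python
from collections import Counter
from itertools import combinations


def is_clique(adj, nodes):
    """True if every pair of the given vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def clique_census(adj):
    """Count cliques of every size by brute force (only sensible for tiny graphs)."""
    counts = Counter()
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for nodes in combinations(vertices, size):
            if is_clique(adj, nodes):
                counts[size] += 1
    return dict(counts)


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical toy graph: triangle 0-1-2 plus the pendant edge 2-3.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
census = clique_census(toy_adj)
maximal = maximal_cliques(toy_adj)
```

In the toy graph the single edge 2-3 is a maximal clique, while the three triangle edges are cliques but not maximal ones, which is why the counts in Table 2 are smaller than those in Table 1.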


core & coreness
k-core: a weakened notion of a clique. A subgraph of $G$ in which all vertex degrees are at least $k$, and which is not contained in any other subgraph obeying the same condition (i.e., it is maximal in this property).
coreness: $\mathrm{coreness}(v) = \max\{k \mid H \text{ is a } k\text{-core},\ v \in V_H\}$
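One common way to obtain the coreness of every vertex is to peel off a minimum-degree vertex repeatedly. The sketch below assumes an undirected simple graph stored as a dictionary of neighbour sets; the name `coreness` and the toy graph are illustrative choices, not taken from the slides.

```python
def coreness(adj):
    """Coreness of each vertex, found by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])      # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1     # v is gone, so its neighbours lose one degree
    return core


# Hypothetical toy graph: triangle 0-1-2 plus the pendant edge 2-3.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_core = coreness(toy_adj)
```

Here the triangle forms a 2-core, while the pendant vertex 3 only belongs to the 1-core.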


Figure 2: visualization with coreness


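Censuses (directed graph)
mutual: $C_{mut} = \{\{u, v\} \subset V_G \mid (u, v), (v, u) \in E_G\}$
asymmetric: $C_{asym} = \{\{u, v\} \subset V_G \mid (u, v) \in E_G\} \setminus C_{mut}$
null: $C_{null} = \{\{u, v\} \mid \{u, v\} \subset V_G\} \setminus (C_{mut} \cup C_{asym})$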


## aidsblog
## $v
## [1] 146
##
## $e
## [1] 183
##
## $mut
## [1] 3
##
## $asym
## [1] 177
##
## $null
## [1] 10405
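The dyad census definitions translate directly into a loop over unordered vertex pairs. This is a minimal sketch, assuming arcs are given as ordered pairs and that self-loops and duplicate arcs are ignored; `dyad_census` and the four-vertex example are hypothetical. The last line only checks that the aidsblog counts reported above add up to the number of vertex pairs.

```python
from itertools import combinations


def dyad_census(vertices, arcs):
    """Classify every unordered vertex pair as mutual, asymmetric, or null."""
    arc_set = {(u, v) for u, v in arcs if u != v}   # drop self-loops and duplicates
    census = {"mut": 0, "asym": 0, "null": 0}
    for u, v in combinations(vertices, 2):
        forward, backward = (u, v) in arc_set, (v, u) in arc_set
        if forward and backward:
            census["mut"] += 1
        elif forward or backward:
            census["asym"] += 1
        else:
            census["null"] += 1
    return census


# Hypothetical directed toy graph: a <-> b, b -> c, and an isolated vertex d.
toy_census = dyad_census(["a", "b", "c", "d"], [("a", "b"), ("b", "a"), ("b", "c")])

# Consistency check on the reported aidsblog counts: all 146 * 145 / 2 pairs are covered.
aidsblog_pairs_ok = (3 + 177 + 10405) == 146 * 145 // 2
```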


Density and Related Notions of Relative Frequency

Density
Density:

$$\mathrm{den}(H) = \frac{|E_H|}{|V_H|(|V_H| - 1)/2}$$

*In the case that $G$ is a directed graph, the denominator is replaced by $|V_H|(|V_H| - 1)$.
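As a quick worked example of the density formula, the sketch below plugs in the karate counts from Table 1 (34 vertices, 78 edges). The function name `density` and its `directed` flag are illustrative; the function assumes at least two vertices.

```python
def density(n_vertices, n_edges, directed=False):
    """Edges present divided by edges possible among n_vertices vertices."""
    possible = n_vertices * (n_vertices - 1)
    if not directed:
        possible /= 2              # unordered pairs in the undirected case
    return n_edges / possible


# Karate network counts taken from Table 1: 34 vertices and 78 edges.
karate_density = density(34, 78)   # 78 / 561, roughly 0.139
```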


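clustering coefficient
global clustering coefficient:

$$\mathrm{cl}_T(G) = \frac{3\tau_\Delta(G)}{\tau_3(G)}$$

where $\tau_\Delta(G)$ is the number of triangles in $G$ and $\tau_3(G)$ is the number of connected triples.

local clustering coefficient:

$$\mathrm{cl}(v) = \frac{\tau_\Delta(v)}{\tau_3(v)}$$

where $\tau_\Delta(v)$ is the number of triangles containing $v$ and $\tau_3(v)$ is the number of connected triples whose two edges are both incident to $v$.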
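Both coefficients can be computed from per-vertex counts of triangles and connected triples. The following is a minimal sketch for an undirected simple graph stored as neighbour sets; the helper names and the toy graph are hypothetical. Summing triangles over vertices counts each triangle three times, which is where the factor 3 in the global formula comes from.

```python
from itertools import combinations


def local_counts(adj, v):
    """Return (triangles through v, connected triples centred at v)."""
    d = len(adj[v])
    triples = d * (d - 1) // 2
    triangles = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return triangles, triples


def local_clustering(adj, v):
    triangles, triples = local_counts(adj, v)
    return triangles / triples if triples else float("nan")


def global_clustering(adj):
    """cl_T(G): the per-vertex triangle counts sum to 3 * tau_Delta(G)."""
    corner_triangles = sum(local_counts(adj, v)[0] for v in adj)
    triples = sum(local_counts(adj, v)[1] for v in adj)
    return corner_triangles / triples if triples else float("nan")


# Hypothetical toy graph: triangle 0-1-2 plus the pendant edge 2-3.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cl_2 = local_clustering(toy_adj, 2)     # 1 triangle out of 3 triples at vertex 2
cl_global = global_clustering(toy_adj)  # 3 * 1 triangle over 5 connected triples
```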


Figure 3: transitivity (example graph on vertices a, b, c, d)


reciprocity (directed graph)
type 1:

$$\mathrm{rec}_1(G) = \frac{|C_{mut}|}{|C_{mut} \cup C_{asym}|}$$

type 2:

$$\mathrm{rec}_2(G) = \frac{2|C_{mut}|}{|E_G|}$$
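To make the difference between the two definitions concrete, the sketch below evaluates both on the aidsblog census counts reported earlier (3 mutual dyads, 177 asymmetric dyads, 183 edges). The function names are illustrative; since $C_{mut}$ and $C_{asym}$ are disjoint, the size of their union is just the sum of the two counts.

```python
def reciprocity_type1(n_mut, n_asym):
    """Share of connected dyads that are mutual."""
    return n_mut / (n_mut + n_asym)


def reciprocity_type2(n_mut, n_edges):
    """Share of directed edges that are reciprocated."""
    return 2 * n_mut / n_edges


# Counts from the aidsblog dyad census: mut = 3, asym = 177, e = 183.
rec1 = reciprocity_type1(3, 177)   # 3 / 180
rec2 = reciprocity_type2(3, 183)   # 6 / 183
```

The second value is roughly twice the first because each mutual dyad contributes two edges to the numerator.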


Figure 4: reciprocity (example graph on vertices a, b, c)


Connectivity, Cuts, and Flows

Connectivity
A graph $G$ is said to be connected if every vertex is reachable from every other.
A connected component of a graph is a maximally connected subgraph.
diameter: the length of the longest shortest path (geodesic) between any pair of vertices.
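Connected components and the diameter can both be obtained from breadth-first search. This is a minimal sketch for an unweighted, undirected graph stored as neighbour sets; the function names and toy graphs are hypothetical, and `diameter` assumes the graph it receives is connected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            components.append(comp)
    return components


def diameter(adj):
    """Longest shortest-path length; assumes the graph is connected."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


# Hypothetical toy graphs: a path 0-1-2-3, and the same path plus a separate edge 4-5.
path_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
two_part_adj = {**path_adj, 4: {5}, 5: {4}}
parts = connected_components(two_part_adj)
path_diameter = diameter(path_adj)
```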


k-vertex-connected
A graph $G$ is called k-vertex-connected if
(i) the number of vertices $N_v > k$, and
(ii) the removal of any subset of vertices $X \subset V$ of cardinality $|X| < k$ leaves a subgraph that is connected.
connectivity: $\mathrm{connectivity}(G) = \max\{k \mid G \text{ is } k\text{-vertex-connected}\}$
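The definition can be checked literally by trying every vertex subset of size less than $k$. The sketch below does exactly that, so it is exponential and only suitable for very small graphs; practical implementations rely on flow-based algorithms instead. All names and the example graphs are hypothetical, and the graph is assumed to be non-empty.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """True if the graph stays connected after deleting the vertices in removed."""
    keep = [v for v in adj if v not in removed]
    if not keep:
        return True
    seen, stack = {keep[0]}, [keep[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(keep)


def is_k_vertex_connected(adj, k):
    if len(adj) <= k:                                   # condition (i): N_v > k
        return False
    return all(is_connected(adj, frozenset(x))          # condition (ii): every |X| < k
               for size in range(k)
               for x in combinations(adj, size))


def vertex_connectivity(adj):
    """Largest k for which the graph is k-vertex-connected (brute force)."""
    return max(k for k in range(len(adj)) if is_k_vertex_connected(adj, k))


# Hypothetical examples: a 4-cycle, the complete graph K4, and a path on 3 vertices.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
```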


Graph Partitioning


Graph Partition

partition: $\mathcal{C} = \{C_1, \dots, C_K\}$, a partition of the vertex set $V_G$
$E(C_k, C_{k'})$: the set of edges connecting vertices in $C_k$ to vertices in $C_{k'}$

We seek a partition $\mathcal{C}$ in which $E(C_k, C_{k'})$ is relatively small in size compared to the set $E(C_k, C_k)$.


Hierarchical Clustering

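modularity

$$e_{ij} = \frac{|E(C_i, C_j)|}{2|E|}, \qquad a_i = \sum_{j=1}^{K} e_{ij}$$

$$\mathrm{mod}(\mathcal{C}) = \sum_{i=1}^{K} \left(e_{ii} - a_i^2\right)$$

$\mathrm{mod}(\mathcal{C}) = 0$ if $e_{ij} = a_i a_j$
$\mathrm{mod}(\mathcal{C})$ is large if $\sum_{i=1}^{K} e_{ii} = 1$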


## fraction1

##        1    2    3 sum
## 1   0.04 0.04 0.12 0.2
## 2   0.04 0.04 0.12 0.2
## 3   0.12 0.12 0.36 0.6
## sum 0.20 0.20 0.60 1.0

## fraction2

##       1   2   3 sum
## 1   0.2 0.0 0.0 0.2
## 2   0.0 0.2 0.0 0.2
## 3   0.0 0.0 0.6 0.6
## sum 0.2 0.2 0.6 1.0
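The two fraction tables illustrate the two cases named on the modularity slide. As a small worked check, the sketch below applies $\mathrm{mod}(\mathcal{C}) = \sum_i (e_{ii} - a_i^2)$ to each table, with $a_i$ taken as the row sums; the function name `modularity_from_fractions` is an illustrative choice.

```python
def modularity_from_fractions(e):
    """mod(C) = sum_i (e_ii - a_i^2), where a_i is the i-th row sum of e."""
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(len(e)))


# fraction1: e_ij = a_i * a_j, so edges fall between groups as if placed at random.
fraction1 = [[0.04, 0.04, 0.12],
             [0.04, 0.04, 0.12],
             [0.12, 0.12, 0.36]]

# fraction2: all edges lie inside the groups, so the diagonal sums to 1.
fraction2 = [[0.2, 0.0, 0.0],
             [0.0, 0.2, 0.0],
             [0.0, 0.0, 0.6]]

mod1 = modularity_from_fractions(fraction1)   # approximately 0
mod2 = modularity_from_fractions(fraction2)   # approximately 1 - (0.04 + 0.04 + 0.36) = 0.56
```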


Hierarchical methods
agglomerative: begin with the partition $\{\{v_1\}, \dots, \{v_{N_v}\}\}$ and successively merge groups
divisive: begin with the partition $\{V\}$ and successively split groups
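An agglomerative scheme driven by modularity can be sketched as follows: start from singletons and keep applying the single merge that raises modularity the most, stopping when no merge helps. This is a deliberately naive version that recomputes modularity for every candidate merge, and it is not necessarily the algorithm behind the figures in these slides. It assumes an undirected simple graph with sortable vertex labels, and it counts each within-group edge twice in $e_{ii}$ so that the $e_{ij}$ sum to 1; the function names and the example graph are hypothetical.

```python
def modularity(edges, membership):
    """mod(C) for an undirected simple graph; membership maps vertex -> group label."""
    two_m = 2 * len(edges)
    e_in, a = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        a[cu] = a.get(cu, 0) + 1 / two_m          # each edge end adds to its group's a_i
        a[cv] = a.get(cv, 0) + 1 / two_m
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 2 / two_m
    return sum(e_in.get(c, 0) - a_c ** 2 for c, a_c in a.items())


def greedy_agglomerate(vertices, edges):
    """Merge groups greedily while modularity improves; returns (membership, modularity)."""
    membership = {v: v for v in vertices}          # the partition into singletons
    best = modularity(edges, membership)
    while True:
        labels = sorted(set(membership.values()))
        candidate = None
        for i, ci in enumerate(labels):
            for cj in labels[i + 1:]:
                # Trial partition with group cj folded into group ci.
                trial = {v: (ci if c == cj else c) for v, c in membership.items()}
                q = modularity(edges, trial)
                if q > best + 1e-12:
                    best, candidate = q, trial
        if candidate is None:
            return membership, best
        membership = candidate


# Hypothetical example: two triangles joined by the bridge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
groups, q = greedy_agglomerate(range(6), toy_edges)
```

On this example the procedure stops with the two triangles as groups, since merging them across the bridge would lower modularity.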


Figure 5: Agglomerative Clustering (karate network; vertex labels H, 2 to 33, A in one panel and Mr Hi, Actor 2 to Actor 33, John A in the other)


Spectral Partitioning

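graph Laplacian
graph Laplacian: $L = D - A$, where $A$ is the adjacency matrix and $D = \mathrm{diag}[(d_v)]$ is the diagonal matrix of vertex degrees.
$\lambda_1 \le \dots \le \lambda_{N_v}$ are the eigenvalues of $L$.
A graph $G$ consists of $K$ connected components if and only if $\lambda_1(L) = \dots = \lambda_K(L) = 0$ and $\lambda_{K+1}(L) > 0$.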


Figure 6: Agglomerative Clustering (example graph on vertices A to L)


## [1] 0 0 0 1 2 2 2 2 3 3 4 5

##      A    B    C    D    E   F   G   H   I    J    K    L
## 1 0.00 0.00 0.00 0.00 0.00 0.5 0.5 0.5 0.5 0.00 0.00 0.00
## 2 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0.0 0.0 0.58 0.58 0.58
## 3 0.45 0.45 0.45 0.45 0.45 0.0 0.0 0.0 0.0 0.00 0.00 0.00
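The three rows above are consistent with unit-length indicator vectors of three components of sizes 4, 3, and 5, since $1/\sqrt{4} = 0.5$, $1/\sqrt{3} \approx 0.58$, and $1/\sqrt{5} \approx 0.45$. The sketch below builds $L = D - A$ for a graph with the same component memberships and confirms that each such vector is mapped to zero by $L$. The edges inside each component are not recoverable from the output, so the paths used here are purely hypothetical, as are the function names.

```python
from math import sqrt


def laplacian(vertices, edges):
    """Dense graph Laplacian L = D - A as a list of rows."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


vertices = list("ABCDEFGHIJKL")
components = ["ABCDE", "FGHI", "JKL"]
# Hypothetical wiring: each component is joined up as a simple path.
edges = [(c[i], c[i + 1]) for c in components for i in range(len(c) - 1)]
L = laplacian(vertices, edges)

# One unit-length indicator vector per component.
null_vectors = [[1 / sqrt(len(c)) if v in c else 0.0 for v in vertices]
                for c in components]
residuals = [max(abs(y) for y in matvec(L, x)) for x in null_vectors]
rounded_entries = [round(1 / sqrt(len(c)), 2) for c in components]
```

Three linearly independent vectors in the null space of $L$ match the three zero eigenvalues in the first line of the output.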


spectral bisection
If $\lambda_2(L)$ is close to zero, we might expect that there is a good candidate for bisection.
Partition the vertices by separating them according to the sign of their entries in the corresponding eigenvector $x_2$:

$$S = \{v \in V : x_2(v) \ge 0\}, \qquad \bar{S} = \{v \in V : x_2(v) < 0\}$$
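The sign rule can be tried without a linear algebra library by approximating $x_2$ with power iteration. The sketch below iterates with $cI - L$, where $c$ is twice the maximum degree (an upper bound on the largest eigenvalue of $L$), and removes the constant component at every step so that the iteration converges toward the eigenvector of $\lambda_2$. It assumes a connected graph on vertices $0, \dots, n-1$ with $n \ge 2$, a simple $\lambda_2$, and a start vector that is not orthogonal to $x_2$; in practice one would normally call an eigenvalue solver instead. All names and the example graph are hypothetical.

```python
from math import sqrt


def fiedler_vector(n, edges, iterations=500):
    """Approximate x2 by power iteration on (c*I - L), kept orthogonal to the constant vector."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    c = 2 * max(len(nbrs) for nbrs in adj)       # upper bound on the largest eigenvalue of L
    x = [i - (n - 1) / 2 for i in range(n)]      # arbitrary start vector that sums to zero
    for _ in range(iterations):
        # y = (c*I - L) x, using (L x)_i = d_i * x_i - sum of x over the neighbours of i
        y = [(c - len(adj[i])) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(y) / n
        y = [yi - mean for yi in y]              # project out the eigenvector of lambda_1 = 0
        norm = sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bisection(n, edges):
    """Split the vertices by the sign of their entry in the approximate x2."""
    x2 = fiedler_vector(n, edges)
    s = {v for v in range(n) if x2[v] >= 0}
    return s, set(range(n)) - s


# Hypothetical example: two triangles joined by the bridge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
s, s_bar = spectral_bisection(6, toy_edges)
```

Because an eigenvector is only determined up to sign, which triangle ends up in $S$ and which in $\bar{S}$ is arbitrary; only the split itself is meaningful.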


Figure 7: Agglomerative Clustering (two panels: Eigenvalues of Graph Laplacian against Index; Fiedler Vector Entry against Actor Number)
