Network Characterization via Random Walks

B. Ribeiro, D. Towsley, UMass-Amherst

Problem

Given a large, possibly dynamic, network, how does one efficiently sample or crawl it to accurately characterize it?

Characteristics of interest: degree distribution, centrality, clustering, …

Motivation

• understanding technological and social networks
  ◦ Internet, wireless networks
  ◦ online social networks (OSNs) such as Facebook, MySpace, Orkut, YouTube, …
• needed when the network dataset is not available, because of its size, the lack of a global view, or its dynamics

Outline

review of sampling

random walks (RWs)

multiple coupled RWs

results

Sampling methods

• random sampling
  ◦ uniform vertex sampling
    ◦ $\theta_i$: fraction of vertices with degree $i$
    ◦ the sampled vertex has degree $i$ with probability $\theta_i$
  ◦ uniform edge sampling (a uniformly chosen endpoint of a uniformly chosen edge)
    ◦ $\pi_i$: probability that the sampled vertex has degree $i$
    ◦ $\pi_i = i\,\theta_i / \bar{d}$, where $\bar{d}$ is the average degree
• crawling
  ◦ snowball sampling: commonly used, highly biased
  ◦ random walk
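The relation between $\theta_i$ and $\pi_i$ can be checked exactly on a small graph. The sketch below is a minimal Python illustration, assuming an undirected graph given as an edge list with no isolated vertices; the toy graph and the helper name `degree_fractions` are hypothetical. It enumerates every (edge, endpoint) pair, each of which uniform edge sampling picks with probability 1/(2|E|), and checks the result against $\pi_i = i\,\theta_i / \bar{d}$.

```python
from collections import Counter
from fractions import Fraction

def degree_fractions(edges):
    """Return (theta, pi) for an undirected edge list.

    theta[i]: fraction of vertices with degree i (uniform vertex sampling).
    pi[i]: probability that a uniform edge, followed by a uniformly chosen
    endpoint, lands on a degree-i vertex (uniform edge sampling).
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)  # assumes no isolated vertices
    theta = {i: Fraction(c, n) for i, c in Counter(deg.values()).items()}
    # Each (edge, endpoint) pair has probability 1 / (2|E|).
    hits = Counter(deg[x] for edge in edges for x in edge)
    pi = {i: Fraction(c, 2 * len(edges)) for i, c in hits.items()}
    return theta, pi

# Hypothetical toy graph: hub 0 with leaves 1, 2, 3, plus a triangle 4-5-6
# attached to the hub through edge (0, 4).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6), (6, 4)]
theta, pi = degree_fractions(edges)
avg_degree = sum(i * t for i, t in theta.items())  # average degree = sum_i i * theta_i
for i in sorted(theta):
    assert pi[i] == i * theta[i] / avg_degree  # pi_i = i * theta_i / avg degree
    print(f"degree {i}: theta = {theta[i]}, pi = {pi[i]}")
```

On this graph the hub (degree 4) is 1/7 of the vertices but 2/7 of the edge-sampled endpoints: the degree bias that makes edge sampling good at the tail.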

Random sampling: accuracy of estimates

• estimate $\theta_i$, the fraction of vertices with degree $i$
• budget: B samples
• accuracy: normalized root mean squared error (NMSE)

[Figure: accuracy of uniform vertex vs. uniform edge sampling]

• uniform vertex: head GOOD, tail BAD
• uniform edge: head BAD, tail GOOD
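A short derivation makes the head/tail pattern concrete for uniform vertex sampling. It assumes NMSE means the root mean squared error divided by the true value $\theta_i$ (the usual normalization; the slide does not spell it out) and that the B samples are independent, drawn with replacement, with $N_i$ of them having degree $i$.

```latex
\mathrm{NMSE}(\hat\theta_i) = \frac{\sqrt{\mathbb{E}\big[(\hat\theta_i - \theta_i)^2\big]}}{\theta_i},
\qquad
\hat\theta_i = \frac{N_i}{B}, \quad N_i \sim \mathrm{Binomial}(B, \theta_i)

\mathbb{E}[\hat\theta_i] = \theta_i, \qquad
\mathrm{Var}(\hat\theta_i) = \frac{\theta_i (1 - \theta_i)}{B}
\quad\Longrightarrow\quad
\mathrm{NMSE}(\hat\theta_i) = \sqrt{\frac{1 - \theta_i}{B\,\theta_i}}
```

For example, with B = 1000, a degree class holding 90% of the vertices has NMSE ≈ 0.011, while one holding 0.1% has NMSE ≈ 1.0: rare (tail) degrees are estimated poorly.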

Uniform vertex vs. edge sampling

• Flickr graph (1.7M vertices, 22M edges)
• budget: $B = |V|/100$

[Figure: NMSE vs. in-degree for uniform edge and uniform vertex sampling]

• vertex: head GOOD, tail BAD
• edge: head BAD, tail GOOD
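The head/tail trade-off can be reproduced with a small Monte Carlo sketch. It does not use the Flickr data: the degree histogram `COUNTS` (90,000 vertices of degree 1, 9,000 of degree 10, 1,000 of degree 1000) is hypothetical, and the budget follows the same B = |V|/100 rule. Because only the degree of each sampled vertex matters here, the sketch draws degree classes directly, with probability proportional to vertex count for uniform vertex sampling and proportional to count × degree for uniform edge sampling. Edge samples are reweighted by 1/degree and normalized, the same correction used for random walks.

```python
import math
import random
from collections import Counter

# Hypothetical degree histogram (degree -> number of vertices); not Flickr data.
COUNTS = {1: 90_000, 10: 9_000, 1000: 1_000}

def draw(rng, budget, edge):
    """Degrees of `budget` sampled vertices, drawn with replacement."""
    degrees = list(COUNTS)
    if edge:
        # Uniform edge, then uniform endpoint: P(vertex v) = deg(v) / (2|E|).
        weights = [COUNTS[d] * d for d in degrees]
    else:
        # Uniform vertex: P(vertex v) = 1 / |V|.
        weights = [COUNTS[d] for d in degrees]
    return rng.choices(degrees, weights=weights, k=budget)

def estimate(sample, edge):
    """theta_i estimate; edge samples are weighted by 1/degree before normalizing."""
    w = Counter()
    for d in sample:
        w[d] += 1.0 / d if edge else 1.0
    total = sum(w.values())
    return {d: x / total for d, x in w.items()}

def nmse(edge, budget, runs, rng):
    """Normalized root mean squared error of the theta_i estimate, per degree i."""
    n = sum(COUNTS.values())
    theta = {d: c / n for d, c in COUNTS.items()}
    sq = Counter()
    for _ in range(runs):
        est = estimate(draw(rng, budget, edge), edge)
        for d in theta:
            sq[d] += (est.get(d, 0.0) - theta[d]) ** 2
    return {d: math.sqrt(sq[d] / runs) / theta[d] for d in theta}

rng = random.Random(42)
budget = sum(COUNTS.values()) // 100  # B = |V|/100
vertex_err = nmse(edge=False, budget=budget, runs=300, rng=rng)
edge_err = nmse(edge=True, budget=budget, runs=300, rng=rng)
for d in sorted(COUNTS):
    print(f"degree {d:>4}: vertex NMSE {vertex_err[d]:.3f}, edge NMSE {edge_err[d]:.3f}")
```

Under these assumptions, uniform vertex sampling should show the lower NMSE at degree 1 (the head) and uniform edge sampling the lower NMSE at degree 1000 (the tail).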

Pros & Cons

uniform vertex
• Pros:
  ◦ independent sampling
  ◦ needs the OSN to use numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, …
• Cons:
  ◦ resource intensive (sparse user ID space)
  ◦ difficult to sample large-degree vertices

uniform edge
• Pros:
  ◦ independent sampling
  ◦ easy to sample high-degree vertices
• Cons:
  ◦ no public OSN interface to sample edges
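Why uniform vertex sampling is resource intensive with a sparse ID space can be seen from a rejection-sampling sketch: draw IDs uniformly from the whole numeric range and keep only those that belong to existing users. This is a minimal illustration, not any OSN's API; `sample_users`, the ID range, and the set `valid_ids` (standing in for a hypothetical "does this user ID exist?" query) are all assumptions. Accepted IDs are uniform over existing users, but the expected number of probes per sample is (ID space size) / (number of users).

```python
import random

def sample_users(valid_ids, max_id, budget, rng):
    """Uniform vertex sampling by probing random numeric user IDs.

    `valid_ids` stands in for a hypothetical "does this user ID exist?" query.
    Returns the sampled IDs and the number of probes spent.
    """
    samples, probes = [], 0
    while len(samples) < budget:
        uid = rng.randint(1, max_id)  # uniform over the whole ID space
        probes += 1
        if uid in valid_ids:  # accepted IDs are uniform over existing users
            samples.append(uid)
    return samples, probes

rng = random.Random(0)
# Hypothetical sparse ID space: 1,000 users spread over IDs 1..100,000.
users = set(rng.sample(range(1, 100_001), 1_000))
samples, probes = sample_users(users, 100_000, 50, rng)
print(f"{len(samples)} samples cost {probes} probes")
```

With 1% of the ID space in use, each sample costs about 100 probes on average.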

Random walk (RW)

• start at node v
• move to a neighbor of v chosen uniformly at random
• repeat until B samples are collected
• sampling is with replacement (vertices can be revisited)
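The walk above can be sketched in a few lines. This minimal version assumes an undirected graph stored as a Python dict mapping each vertex to its list of neighbors, and counts the start vertex as the first sample; the name `random_walk` and the toy graph are hypothetical.

```python
import random

def random_walk(adj, start, budget, rng):
    """Simple random walk: collect `budget` vertex samples (with replacement)."""
    v, samples = start, []
    while len(samples) < budget:
        samples.append(v)
        v = rng.choice(adj[v])  # move to a uniformly random neighbor
    return samples

# Hypothetical toy graph as an adjacency list.
ADJ = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
walk = random_walk(ADJ, start=0, budget=20, rng=random.Random(7))
print(walk)
```

Each step only needs the neighbor list of the current vertex, which is what a crawler can obtain from an OSN profile page.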

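RW sampling

• random walk sampling produces a biased estimate $\hat\theta_i^{RW}$ of $\theta_i$, since at steady state
  $\theta_i^{RW} = i\,\theta_i / \bar{d}$, where $\bar{d}$ is the average degree
• the bias is easily corrected by normalizing:
  $\hat\theta_i = \frac{\hat\theta_i^{RW} / i}{\sum_j \hat\theta_j^{RW} / j}$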
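The correction divides each RW frequency by its degree and renormalizes. A minimal sketch, assuming we only have the list of degrees observed along the walk; the four degree values and the helper name `correct_rw_estimate` are made up for illustration, and exact fractions keep the arithmetic visible.

```python
from collections import Counter
from fractions import Fraction

def correct_rw_estimate(sampled_degrees):
    """theta_hat_i = Norm(theta_hat_i^RW / i): weight each RW sample by 1/degree, normalize."""
    weights = Counter()
    for d in sampled_degrees:
        weights[d] += Fraction(1, d)
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Hypothetical degrees of the vertices visited by a walk.
walk_degrees = [1, 2, 2, 4]
n = len(walk_degrees)
theta_rw = {d: Fraction(c, n) for d, c in Counter(walk_degrees).items()}  # biased histogram
theta_hat = correct_rw_estimate(walk_degrees)
print(theta_rw)
print(theta_hat)
```

Here the walk gives the degree-4 vertex 1/4 of the samples, while the corrected estimate assigns degree 4 only 1/9; degree 1 rises from 1/4 to 4/9.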

Pros & Cons

uniform vertex
• Pros:
  ◦ independent sampling
  ◦ needs the OSN to use numeric user IDs, e.g., LiveJournal, Flickr, MySpace, Facebook, …
• Cons:
  ◦ resource intensive (sparse user ID space)
  ◦ difficult to sample large-degree vertices

random walk
• Pros:
  ◦ asymptotically unbiased
  ◦ easy to sample high-degree vertices
  ◦ low cost resource-wise
• Cons:
  ◦ graph must be connected
  ◦ large estimation errors when the graph is loosely connected
  ◦ length of the transient?

Hybrid sampling

• uniform vertex sampling samples both subgraphs A and C, but is expensive
• a RW samples A or C, but is cheap

[Figure: graph with subgraphs A and C]

Combine the advantages of uniform vertex sampling and RWs?

Multiple random walks

• m independent, uniformly placed RWs split the budget B among them
• Pros: cover all components w.h.p. as m increases
• Cons: bias due to the transient; estimates are difficult to combine

Couple the RWs?
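A minimal sketch of m independent, uniformly placed walkers, assuming the budget B is split evenly so each walker collects B // m samples (any remainder is simply left unused here). The function name `multiple_rws` and the two-component toy graph are hypothetical. A single walker would stay in whichever component it starts in; with 20 uniformly placed walkers, both components are covered with high probability. The sketch deliberately leaves open how the per-walker estimates should be combined.

```python
import random

def multiple_rws(adj, m, budget, rng):
    """m independent walkers, each started at a uniformly random vertex.

    The budget is split evenly: each walker collects budget // m samples.
    """
    vertices = list(adj)
    walks = []
    for _ in range(m):
        v = rng.choice(vertices)  # uniform placement, independent of other walkers
        walk = []
        for _ in range(budget // m):
            walk.append(v)
            v = rng.choice(adj[v])  # uniformly random neighbor
        walks.append(walk)
    return walks

# Hypothetical graph with two components, A = {0, 1, 2} and C = {3, 4, 5}.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
walks = multiple_rws(ADJ, m=20, budget=200, rng=random.Random(3))
visited = {v for walk in walks for v in walk}
print(sorted(visited))
```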

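Frontier Sampling (FS)

m coupled walkers

• B: sampling budget
• $S = \{v_1, \dots, v_m\}$: initial set of m vertices; $E' = \emptyset$

(1) select $v_r \in S$ with probability $\deg(v_r) / \sum_{u \in S} \deg(u)$
(2) walk one step from $v_r$ to a uniformly random neighbor $u$
(3) add the walked edge $(v_r, u)$ to $E'$ and replace $v_r$ by $u$ in S
(4) return to (1) until $m + |E'| = B$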
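Steps (1) to (4) translate almost line for line into Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is an undirected adjacency dict, the m initial vertices are drawn uniformly at random (as in the multiple-RW setup), and each initial vertex costs one unit of budget, which is why the loop stops at m + |E′| = B. The function name `frontier_sampling` and the toy graph are hypothetical.

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Frontier Sampling sketch; returns E', the list of walked (directed) edges."""
    vertices = list(adj)
    # S = {v_1, ..., v_m}; assumed drawn uniformly at random (hypothetical choice).
    frontier = [rng.choice(vertices) for _ in range(m)]
    walked = []  # E'
    while m + len(walked) < budget:  # (4) stop once m + |E'| = B
        # (1) pick walker r with probability deg(v_r) / sum of frontier degrees
        r = rng.choices(range(m), weights=[len(adj[v]) for v in frontier])[0]
        v = frontier[r]
        u = rng.choice(adj[v])  # (2) one step to a uniformly random neighbor
        walked.append((v, u))   # (3) add the walked edge to E' ...
        frontier[r] = u         # ... and move walker r to u
    return walked

# Hypothetical toy graph as an adjacency list.
ADJ = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
E_prime = frontier_sampling(ADJ, m=10, budget=1_000, rng=random.Random(11))
print(len(E_prime), E_prime[:5])
```

Recomputing the weights costs O(m) per step; for large m, a structure such as a sum tree over walker degrees would make step (1) cheaper.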

FS properties

• FS is a random walk on $G^m$
• at steady state, FS samples edges uniformly
• as $m \to \infty$, walkers are uniformly distributed in the graph
  ◦ so the m coupled RWs start approximately in steady state: short transient

Sample paths for the $\theta_1$ estimate (Flickr graph)

[Figure: evolution of $\hat\theta_1(n)$, where n is the number of steps]

Sampling errors

• large connected component of the Flickr graph
• accuracy metric: NMSE of the CCDF

[Figure: NMSE vs. in-degree]

Sampling errors: GAB graph

• GAB: two Barabási-Albert graphs, with average degrees 2 and 10, connected by a single edge

[Figure: NMSE vs. in-degree]
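The GAB construction can be sketched as follows. The slide gives neither the graph sizes nor the generator used, so this assumes a hypothetical size of n = 1,000 vertices per part and a plain preferential-attachment generator written from scratch (not any particular library's implementation). With m = 1 and m = 5 new edges per added vertex, the average degrees come out close to 2 and 10; the function names are hypothetical.

```python
import random

def barabasi_albert(n, m, rng, offset=0):
    """Preferential-attachment edge list on vertices offset .. offset + n - 1.

    Seeded with a clique on m + 1 vertices; every later vertex attaches to m
    distinct existing vertices chosen with probability proportional to degree.
    Average degree is close to 2m.
    """
    edges = [(offset + i, offset + j) for i in range(m + 1) for j in range(i)]
    ends = [x for e in edges for x in e]  # each vertex repeated once per incident edge
    for v in range(offset + m + 1, offset + n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # degree-proportional choice
        for t in targets:
            edges.append((v, t))
            ends.extend((v, t))
    return edges

def gab_graph(n, rng):
    """Two BA graphs (average degrees ~2 and ~10) joined by a single edge."""
    sparse = barabasi_albert(n, 1, rng)
    dense = barabasi_albert(n, 5, rng, offset=n)
    return sparse + dense + [(0, n)]  # the one bridging edge

edges = gab_graph(1_000, random.Random(5))
avg_degree = 2 * len(edges) / 2_000
print(len(edges), round(avg_degree, 2))
```

A single RW started in one half must cross the one bridging edge to reach the other, which is the loosely connected case listed among the RW cons.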

Distributed FS

• m independent walkers
• walker i takes its next step after an exponentially distributed time whose rate is its current node's degree (mean 1/degree)
• walkers run for time T, then report to a central site
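A minimal event-driven sketch, assuming each waiting time has rate equal to the degree of the walker's current vertex (mean 1/degree). Under that assumption, whenever some walker moves, the probability that it is walker r is $\deg(v_r) / \sum_u \deg(u)$ over the walkers' current vertices, which is the selection rule of FS step (1); the memoryless exponential makes this hold at every step. The heap only interleaves the walkers' events in time order for simulation; each trajectory is independent. The name `distributed_fs`, the toy graph, and the horizon T = 50 are hypothetical.

```python
import heapq
import random

def distributed_fs(adj, starts, horizon, rng):
    """Distributed FS sketch: independent walkers with exponential waiting times.

    A walker at vertex v waits Exp(rate = deg(v)), i.e., mean 1 / deg(v), before
    stepping. Walkers stop at time `horizon` (T) and report their edges.
    """
    pos = list(starts)
    events = [(rng.expovariate(len(adj[v])), i) for i, v in enumerate(pos)]
    heapq.heapify(events)  # (time of next step, walker id)
    reports = [[] for _ in pos]  # per-walker edge lists sent to the central site
    while events:
        t, i = heapq.heappop(events)
        if t > horizon:
            break  # every remaining event is later still
        v = pos[i]
        u = rng.choice(adj[v])  # uniformly random neighbor
        reports[i].append((v, u))
        pos[i] = u
        heapq.heappush(events, (t + rng.expovariate(len(adj[u])), i))
    return reports

# Hypothetical toy graph as an adjacency list.
ADJ = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
reports = distributed_fs(ADJ, starts=[0, 1, 2], horizon=50.0, rng=random.Random(2))
print([len(r) for r in reports])
```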

Future work

• analyzing and speeding up convergence
• other forms of coupling
• other graph statistics
• study how graph structure affects sampling efficiency
  ◦ power-law vs. exponential tail
  ◦ spatial correlation: independence vs. short-range dependence (SRD) vs. long-range dependence (LRD)
• application to different networks
  ◦ wireless, social, wireless/social