
1

A Random-Surfer Web-Graph Model

(Joint work with Avrim Blum & Hubert Chan)

Mugizi Rwebangira

2

The Web as a Graph

Consider the World Wide Web as a graph, with web pages as nodes and hyperlinks between pages as edges.

[Figure: example web graph whose nodes are the pages index.html, links.html, resume.html and http://cnn.com.]

3

Studying the Web

Since the Web emerged there has been a lot of interest in:

1. Empirically studying properties of the Web Graph.

2. Modeling the Web Graph mathematically.

Benefits of Generative Models:

1. Simulation – When real data is scarce

2. Extrapolation – How will the graph change?

3. Understanding – Inspire further research on real data

4

Power Law

The distribution of a random variable X follows a power law if

$\mathrm{Prob}[X = k] \sim C k^{-\alpha}$

where $f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$,

e.g., $(x+1) \sim (x+2)$.

Example: $\mathrm{Prob}[X = k] = k^{-2}$

5

Power Law: $\mathrm{Prob}[X = k] = k^{-2}$

6

Power Law

$\mathrm{Prob}[X = k] \sim C k^{-\alpha}$

$\log \mathrm{Prob}[X = k] \sim \log C - \alpha \log k$

$\mathrm{Prob}[X = k] = k^{-2}$

$\log \mathrm{Prob}[X = k] = -2 \log k$
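As a rough illustration of why a power law shows up as a straight line on log-log axes, the sketch below evaluates the example $\mathrm{Prob}[X = k] = k^{-2}$ and fits a least-squares line to the points (log k, log Prob). The helper name `loglog_slope` is hypothetical and not part of the talk.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

ks = list(range(1, 101))
probs = [k ** -2 for k in ks]      # the example from the slide
slope = loglog_slope(ks, probs)
print(round(slope, 6))             # the fitted slope estimates -alpha
```

For an exact power law the fitted slope recovers $-\alpha$; on empirical data the fit is only a rough diagnostic.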

7

Power Law: Log-Log plot

8

Power Law contd.

More general definition:

$\mathrm{Prob}[X \ge k] \sim C k^{-\alpha}$

Particularly useful if X takes on real values.

Sometimes referred to as “heavy tailed” or “scale free.”

9

Power Laws in Degree distribution

Let G be a graph.

Let $X_k$ be the proportion of nodes with degree k in G.

Then if $X_k \sim C k^{-\alpha}$

we say that G has power law degree distribution.
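A minimal sketch of how the proportions $X_k$ could be computed from an edge list. The function name `degree_proportions` and the edge-list representation are assumptions made for illustration.

```python
from collections import Counter

def degree_proportions(num_nodes, edges):
    """Return {k: X_k}, the proportion of nodes with degree k.

    Degree counts both endpoints of every edge, so a self-loop adds 2.
    """
    degree = [0] * num_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree)
    return {k: c / num_nodes for k, c in sorted(counts.items())}

# Star with centre 0 and three leaves.
star = [(1, 0), (2, 0), (3, 0)]
X = degree_proportions(4, star)
print(X)   # {1: 0.75, 3: 0.25}
```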

10

Properties of the Web Graph

A power-law degree distribution has been observed in a wide variety of graphs, including citation networks, social networks and protein-protein interaction networks.

It has also been observed in the Web Graph. [Barabási & Albert]

11

Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

12

Classic Random Graph Models

• In the G(n,p) random graph model:

1. There are n nodes.

2. Each pair of nodes is joined by an edge independently with probability p.

• It was proposed by Erdős and Rényi in the 1960s.

13

Online G(n,p)

In this model each new node makes k connections to existing nodes chosen uniformly at random.

For this talk we will focus on k = 1, hence the graph will be a tree.
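The k = 1 process can be sketched as follows. This is an illustrative simulation under the assumption that nodes are numbered 0, 1, 2, ... in arrival order; the names `online_random_tree` and `degrees` are hypothetical.

```python
import random
from collections import Counter

def online_random_tree(n, seed=0):
    """Grow a tree: node t (t >= 1) links to a uniformly random earlier node.

    Returns parent, where parent[t] is the node that t connected to
    (parent[0] is None for the first node).
    """
    rng = random.Random(seed)
    parent = [None]
    for t in range(1, n):
        parent.append(rng.randrange(t))   # uniform over nodes 0..t-1
    return parent

def degrees(parent):
    """Degree = in-degree + out-degree for every node."""
    deg = [0] * len(parent)
    for child, par in enumerate(parent):
        if par is not None:
            deg[child] += 1
            deg[par] += 1
    return deg

parent = online_random_tree(10_000)
deg = degrees(parent)
hist = Counter(deg)
for k in range(1, 6):
    print(k, hist[k] / len(deg))   # empirical proportion of nodes with degree k
```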

14

Online G(n,p)

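[Figure: the graph at T=1, T=2, T=3 and T=4. At T=3 the new node attaches to each of the two existing nodes with probability ½; at T=4, to each of the three existing nodes with probability ⅓.]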

15

Properties of Online G(n,p)

• $X_k$ = proportion of nodes with degree k

$E[X_k] = \Theta(1/2^k)$

• $E[\text{degree of first node}] = 1 + 1/2 + 1/3 + 1/4 + \dots + 1/n = \Theta(\log n)$

• $E[\text{max degree}] = \Theta(\log n)$

NOT POWER LAWED!!

16

Online G(n,p) (n=100,000, average of 100 runs)

17

Preferential Attachment

In the Preferential Attachment model, each new node connects to the existing nodes with probability proportional to their degree.

[Barabási & Albert]
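One common way to sample proportionally to degree is to keep a list holding both endpoints of every edge and pick a uniformly random entry. The sketch below uses that trick for the one-link-per-node case. It assumes the process starts from a single node with a self-loop; the function name `preferential_attachment` is hypothetical.

```python
import random

def preferential_attachment(n, seed=0):
    """Each new node links to an existing node chosen with probability
    proportional to its degree (in-degree + out-degree).

    Assumption: the process starts from one node with a self-loop,
    which therefore has degree 2.
    """
    rng = random.Random(seed)
    target = [0]          # target[v] = node that v links to; node 0 links to itself
    endpoints = [0, 0]    # every edge contributes both of its endpoints
    for v in range(1, n):
        u = rng.choice(endpoints)   # uniform endpoint == degree-proportional node
        target.append(u)
        endpoints.extend((v, u))
    return target, endpoints

target, endpoints = preferential_attachment(10_000)
deg = [0] * len(target)
for node in endpoints:
    deg[node] += 1
print("max degree:", max(deg), "degree of first node:", deg[0])
```

The endpoint list grows by two entries per new node, so each sampling step takes constant time.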

18

Preferential Attachment

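[Figure: preferential attachment at T=1 to T=4, where degree = in-degree + out-degree. At T=3 the existing nodes have degrees 3 and 1, so the new node attaches to them with probabilities ¾ and ¼. At T=4 the existing nodes have degrees 4, 1 and 1, so the attachment probabilities are 2/3, 1/6 and 1/6.]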

19


Preferential Attachment gives a power-law degree distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]

E[degree of 1st node] = √n
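A heuristic sketch of where the $\sqrt{n}$ growth comes from, writing $d_t$ for the degree of the first node at time t. It assumes the graph at time t has t nodes and t edges (counting an initial self-loop), so the degrees sum to 2t; the talk itself does not spell out this derivation.

```latex
% Assumption: the degrees at time t sum to 2t, so the first node is
% chosen by the next arrival with probability d_t / (2t).
\begin{align*}
E[d_{t+1} \mid d_t] &= d_t + \frac{d_t}{2t} = d_t\left(1 + \frac{1}{2t}\right), \\
E[d_n] &= d_1 \prod_{t=1}^{n-1}\left(1 + \frac{1}{2t}\right)
        = d_1 \exp\!\left(\sum_{t=1}^{n-1} \frac{1}{2t} + O(1)\right)
        = \Theta(\sqrt{n}).
\end{align*}
```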

20


21

Other Models

Kumar et al. proposed the “copying model.” [KRRSTU00]

Leskovec et al. proposed a “forest fire” model which has some similarities to this work. [LKF05]

22

Outline


• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

23

Motivating Questions

Why would a new node connect to nodes of high degree?

- Are high degree nodes more attractive?

- Or are there other explanations?

How does a new node find out what the high degree nodes are?

24

Motivating Questions

Motivating Observation:

• Suppose each page has a small probability p of being interesting.

• Suppose a user does an (undirected) random walk until they find an interesting page.

• If p is small then this is the same as preferential attachment.

• What about other processes and directed graphs?

25

Outline


• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

26

Directed 1-step Random Surfer, p=.5

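Start with a single node with a self-loop (T=1). Each new node then does the following:

1. Choose a node uniformly at random.

2. With probability p connect to it.

3. With probability (1-p) connect to its neighbor.

[Figure: the graph at T=1, T=2 and T=3. At T=3 the two possible attachment probabilities are (½)(½) + (½)(½) + (½)(½) = ¾ and ¼.]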
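The sketch below simulates this process and also computes the exact attachment probabilities for a given graph, which reproduces the ¾ and ¼ values at T=3. It assumes every node has out-degree 1, so "its neighbor" means the single node it links to; the names `one_step_surfer` and `attach_probs` are hypothetical.

```python
import random

def one_step_surfer(n, p, seed=0):
    """Directed 1-step random surfer with out-degree 1.

    out[v] is the single node v links to; node 0 has a self-loop.
    """
    rng = random.Random(seed)
    out = [0]
    for v in range(1, n):
        u = rng.randrange(v)          # choose an existing node uniformly
        if rng.random() < p:
            out.append(u)             # connect to the chosen node
        else:
            out.append(out[u])        # connect to its (only) out-neighbor
    return out

def attach_probs(out, p):
    """Exact probability that the next node links to each existing node."""
    t = len(out)
    probs = [p / t] * t               # chosen directly and kept
    for u in range(t):
        probs[out[u]] += (1 - p) / t  # reached by one step from u
    return probs

print(attach_probs([0, 0], 0.5))      # graph at T=3: [0.75, 0.25]
```

`attach_probs` makes the structure visible: each node receives a uniform share p/t plus a share proportional to its in-degree.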

27

Directed 1-step Random Surfer

It turns out this model is a mixture of connecting to nodes uniformly at random and preferential attachment.

Has a power-law degree distribution.

But taking one step is not very natural.

What about doing a real random walk?

28

Directed Coin Flipping model

1. Pick a node uniformly at random.

2. Flip a coin of bias p. If HEADS connect to current node, else walk to neighbor.

[Figure: a new node picks a random starting node in a graph with nodes A, B, C and D. Coin toss 1: TAIL (at node A). Coin toss 2: TAIL (at node B). Coin toss 3: HEAD (at node C).]

29

Directed Coin Flipping model

1. At time 1, we start with a single node with a self-loop.

2. At time t, we choose a node u uniformly at random.

3. We then flip a coin of bias p.

4. If the coin comes up heads, we connect to the current node.

5. Else we walk to a random neighbor and go to step 3.

“each page has equal probability p of being interesting to us”
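The five steps above can be sketched as a short simulation. It assumes out-degree 1, so "a random neighbor" is the single node the current node links to, and it requires p > 0 so that the walk stops. The names `directed_coin_flipping` and `in_degrees` are hypothetical.

```python
import random

def directed_coin_flipping(n, p, seed=0):
    """Simulate the directed coin-flipping model with out-degree 1.

    out[v] is the node that v links to; node 0 has a self-loop.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] so that the walk terminates")
    rng = random.Random(seed)
    out = [0]                        # step 1: single node with a self-loop
    for v in range(1, n):
        u = rng.randrange(v)         # step 2: uniform random starting node
        while rng.random() >= p:     # steps 3 and 5: tails, keep walking
            u = out[u]               # follow the only out-link of u
        out.append(u)                # step 4: heads, connect to current node
    return out

def in_degrees(out):
    """Number of links pointing at each node (the self-loop counts for node 0)."""
    indeg = [0] * len(out)
    for u in out:
        indeg[u] += 1
    return indeg

out = directed_coin_flipping(5_000, 0.5)
indeg = in_degrees(out)
print("largest in-degrees:", sorted(indeg, reverse=True)[:5])
```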

30

Outline


• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

31

Is Directed Coin-Flipping Power-lawed?

We don’t know … but we do have some partial results ...

32

Virtual Degree

Definitions:

Let $l_i(u)$ be the number of level i descendants of node u:

$l_1(u)$ = # of children

$l_2(u)$ = # of grandchildren, etc.

Let $\beta = (\beta_1, \beta_2, \dots)$ be a sequence of real numbers with $\beta_1 = 1$.

Then $v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \dots$

We’ll call v(u) the “virtual degree of u with respect to $\beta$.”

33

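Virtual Degree

[Figure: tree rooted at u, which has 2 children and 4 grandchildren.]

$v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \dots$

Here 2 is the number of children of u and 4 is the number of grandchildren.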
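A minimal sketch of computing the level counts and the virtual degree for a tree stored as an out-link array. The example tree has 2 children and 4 grandchildren, as in the figure; how the grandchildren are split between the children is an assumption, as are the names `level_counts` and `virtual_degree`. The sequence $\beta_i = (1-p)^i$ is used only as one possible choice.

```python
def level_counts(out, u):
    """Return [l_1(u), l_2(u), ...]: number of descendants of u at each level.

    out[v] is the node v links to; a self-loop (out[v] == v) is ignored.
    """
    children = {}
    for v, parent in enumerate(out):
        if parent != v:
            children.setdefault(parent, []).append(v)
    counts = []
    frontier = children.get(u, [])
    while frontier:
        counts.append(len(frontier))
        frontier = [c for v in frontier for c in children.get(v, [])]
    return counts

def virtual_degree(out, u, beta):
    """v(u) = 1 + sum_i beta(i) * l_i(u), with beta a function of the level i."""
    levels = level_counts(out, u)
    return 1 + sum(beta(i) * l for i, l in enumerate(levels, start=1))

# Node 0 is u, nodes 1-2 are its children, nodes 3-6 its grandchildren.
out = [0, 0, 0, 1, 1, 2, 2]
p = 0.5
print(level_counts(out, 0))                            # [2, 4]
print(virtual_degree(out, 0, lambda i: (1 - p) ** i))  # 1 + 0.5*2 + 0.25*4 = 3.0
```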

34

Virtual Degree

Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in deg(u) is proportional to v(u).

Expected increase in deg(u) $= \frac{p}{t} + \frac{(1-p)\,p\,l_1(u)}{t} + \frac{(1-p)^2\,p\,l_2(u)}{t} + \dots = \frac{p}{t}\,v(u)$

35

Virtual Degree

Theorem: There always exist $\beta_i$ such that

1. For $i \ge 1$, $|\beta_i| \le 1$.

2. As $i \to \infty$, $\beta_i \to 0$ exponentially.

3. The expected increase in v(u) is proportional to v(u).

Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$

E.g., for p = ¾, $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...

For p = ½, $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, ...
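The recurrence is easy to check with exact rational arithmetic. The sketch below reproduces the two example sequences; the helper name `beta_sequence` is hypothetical, and it simply generates the terms in the order listed on the slide.

```python
from fractions import Fraction

def beta_sequence(p, length):
    """First `length` terms of 1, p, then next = current - (1 - p) * previous."""
    p = Fraction(p)
    seq = [Fraction(1), p]
    while len(seq) < length:
        seq.append(seq[-1] - (1 - p) * seq[-2])
    return seq[:length]

half = beta_sequence(Fraction(1, 2), 8)
three_quarters = beta_sequence(Fraction(3, 4), 6)
print([str(b) for b in half])            # 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16
print([str(b) for b in three_quarters])  # 1, 3/4, 1/2, 5/16, 3/16, 7/64
```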

36

Virtual Degree, continued

Let $v_t(u)$ be the virtual degree of node u at time t and $t_u$ be the time when node u first appears.

Theorem: For any node u and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$

So, the expected virtual degrees follow a power law.
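A heuristic sketch of how a bound of this shape can arise. It assumes the expected increase in the virtual degree at step s is exactly $(p/s)\,v_s(u)$, mirroring the constant in the easy observation; this is an illustration, not the proof from the talk.

```latex
% Assumption: E[v_{s+1}(u) - v_s(u) | v_s(u)] = (p/s) v_s(u) for s >= t_u.
\begin{align*}
E[v_{s+1}(u)] &= \left(1 + \frac{p}{s}\right) E[v_s(u)], \\
E[v_t(u)] &= E[v_{t_u}(u)] \prod_{s=t_u}^{t-1}\left(1 + \frac{p}{s}\right)
           = E[v_{t_u}(u)] \exp\!\left(p \sum_{s=t_u}^{t-1} \frac{1}{s} + O(1)\right)
           = \Theta\!\left((t/t_u)^p\right).
\end{align*}
```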

37

Actual Degree

We can also obtain lower bounds on the expected values of the actual degrees:

Theorem: For any node u and time $t \ge t_u$, $E[\mathrm{degree}(u)] \ge \Omega((t/t_u)^{p(1-p)})$

38

Outline


• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

39

Experiments

• Random graphs of n=100,000 nodes

• Compute statistics averaged over 100 runs.

• k = 1 (every node has out-degree 1)

40

Online Erdős-Rényi

41

Directed 1-Step Random Surfer, p=3/4

42


43


44

Directed Coin Flipping, p=1/2

45

Directed Coin Flipping, p=1/4

46

Undirected coin flipping, p=1/2

47

Undirected Coin Flipping p=0.05

48

Outline


• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions

49

Conclusions

Directed random walk models appear to generate power laws, and we have partial theoretical results to support this.

Power laws can naturally emerge, even if all nodes have the same intrinsic “attractiveness”.

50

Open questions

• Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law?

• Can we analyze the degree distribution for the undirected coin-flipping model with p = 1/2?

• Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?

51

Questions?
