


A Random-Surfer Web-Graph Model

(Joint work with Avrim Blum & Hubert Chan)

Mugizi Rwebangira


The Web as a Graph

Consider the World Wide Web as a graph, with web pages as nodes and hyperlinks between pages as edges.

[Figure: example graph whose nodes are the pages index.html, links.html, resume.html and http://cnn.com]


Studying the Web

Since the Web emerged there has been a lot of interest in:

1. Empirically studying properties of the Web Graph.

2. Modeling the Web Graph mathematically.

Benefits of Generative Models:

1. Simulation – When real data is scarce

2. Extrapolation – How will the graph change?

3. Understanding – Inspire further research on real data


Power Law

The distribution of a random variable X follows a power law if

$\Pr[X = k] \sim C k^{-\alpha}$

where $f(x) \sim g(x)$ means $\lim_{x \to \infty} f(x)/g(x) = 1$,

e.g. $(x+1) \sim (x+2)$.

Example: $\Pr[X = k] = k^{-2}$


Power Law: $\Pr[X = k] = k^{-2}$


Power Law

$\Pr[X = k] \sim C k^{-\alpha}$

$\log \Pr[X = k] \sim \log C - \alpha \log k$

$\Pr[X = k] = k^{-2}$

$\log \Pr[X = k] = -2 \log k$
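The log form above says that a power law is a straight line of slope -α on a log-log plot. A minimal sketch of that check, using the example Prob[X=k] = k^-2 from this slide; the function name `loglog_slope` and the range of k are assumptions made for illustration only:

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) plotted against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

ks = list(range(1, 1001))          # illustrative range of k (an assumption)
probs = [k ** -2 for k in ks]      # the slide's example: Prob[X=k] = k^-2
slope = loglog_slope(ks, probs)
print(round(slope, 6))             # the fitted slope estimates -alpha
```

On measured degree data the points only approximately follow a line, so a fitted slope like this is a rough estimate of the exponent rather than a proof of power-law behaviour.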


Power Law: Log-Log plot


Power Law contd.

More general definition:

$\Pr[X \ge k] \sim C k^{-\alpha}$

Particularly useful if X takes on real values.

Sometimes referred to as “heavy tailed” or “scale free.”


Power Laws in Degree distribution

Let G be a graph.

Let $X_k$ be the proportion of nodes with degree k in G.

Then if $X_k \sim C k^{-\alpha}$

we say that G has power law degree distribution.


Properties of the Web Graph

A Power-law degree distribution has been observed in a wide variety of graphs including citation networks, social networks, protein-protein interaction networks and so on.

It has also been observed in the Web Graph. [Barabási & Albert]


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


Classic Random Graph Models

• In the G(n,p) random graph model:

1. There are n nodes.

2. There is an edge between any two nodes with probability p.

• It was proposed by Erdős and Rényi in the 1960s.
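The two rules above translate almost directly into code. A minimal sketch, assuming nodes are labelled 0..n-1 and each pair is decided by an independent coin; the function name `sample_gnp` and the parameter values are illustrative assumptions:

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Return the edge list of one G(n,p) sample on nodes 0..n-1."""
    # Every unordered pair becomes an edge independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

rng = random.Random(0)                 # fixed seed only for repeatability
edges = sample_gnp(200, 0.05, rng)
expected_edges = 0.05 * 200 * 199 / 2  # p times the number of pairs
print(len(edges), expected_edges)
```

This loops over all pairs, so it is only meant for small n.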


Online G(n,p)

In this model each new node makes k connections to existing nodes uniformly at random.

For this talk we will focus on k = 1,

hence the graph will be a tree.
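A minimal sketch of the k = 1 case described above, where every new node links to one existing node chosen uniformly at random. The names `online_uniform_tree` and `degree_counts`, and the choice to give node 0 no outgoing edge, are assumptions for illustration:

```python
import random
from collections import Counter

def online_uniform_tree(n, rng):
    """parent[t] is the node that node t linked to; node 0 has no link."""
    parent = [None]
    for t in range(1, n):
        parent.append(rng.randrange(t))   # uniform over existing nodes 0..t-1
    return parent

def degree_counts(parent):
    """Map each node to its degree (in-degree + out-degree)."""
    deg = Counter()
    for child, par in enumerate(parent):
        if par is not None:
            deg[child] += 1
            deg[par] += 1
    return deg

parent = online_uniform_tree(10_000, random.Random(0))
deg = degree_counts(parent)
histogram = Counter(deg.values())         # degree k -> number of nodes
for k in range(1, 6):
    print(k, histogram[k] / len(parent))  # empirical proportion of degree k
```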


Online G(n,p)

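[Figure: the tree at T=1 through T=4. At T=3 the new node attaches to each of the two existing nodes with probability ½; at T=4 it attaches to each of the three existing nodes with probability ⅓.]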


Properties of Online G(n,p)

• $X_k$ = proportion of nodes with degree k

$E[X_k] = \Theta((1/2)^k)$

• E[degree of first node] = $1 + 1/2 + 1/3 + 1/4 + \dots + 1/n = \Theta(\log n)$

• E[max degree] = $\Theta(\log n)$

NOT POWER LAWED!!
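The sum for the first node arises because a node arriving when t nodes exist picks the first node with probability 1/t. A small numerical check that this harmonic sum grows like log n; the comparison against ln n plus the Euler-Mascheroni constant is a standard approximation added here, not something stated on the slide:

```python
import math

def harmonic(n):
    """1 + 1/2 + ... + 1/n, the sum on this slide."""
    return sum(1.0 / i for i in range(1, n + 1))

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant
n = 100_000                        # same n as the experiments in this talk
print(harmonic(n), math.log(n) + EULER_GAMMA)
```

Compare this with the √n growth quoted for preferential attachment: a logarithmic maximum degree is far too small to produce a heavy tail.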


Online G(n,p) (n=100,000, average of 100 runs)


Preferential Attachment

In the Preferential Attachment model, each new node connects to the existing nodes with a probability proportional to their degree. [Barabási & Albert]
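A minimal sketch of this rule for the one-link-per-node case. It assumes the process starts from a single node with a self-loop counted as degree 2, and it uses a common trick that is an implementation choice rather than anything from the talk: keep a list in which each node appears once per unit of degree, so a uniform draw from the list is a degree-proportional draw. The function name is hypothetical.

```python
import random

def preferential_attachment(n, rng):
    """Grow an n-node graph; return (parent, degree) lists."""
    parent = [0]            # node 0 points to itself (self-loop)
    degree = [2]            # the self-loop counts as in-degree 1 + out-degree 1
    endpoints = [0, 0]      # node v appears degree[v] times in this list
    for t in range(1, n):
        target = rng.choice(endpoints)   # Pr[target = v] is degree[v] / sum of degrees
        parent.append(target)
        degree.append(1)                 # the new node's single out-link
        degree[target] += 1
        endpoints.extend((t, target))
    return parent, degree

parent, degree = preferential_attachment(10_000, random.Random(0))
print(max(degree), degree[0])
```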


Preferential Attachment

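Degree = in-degree + out-degree

[Figure: the graph at T=1 through T=4. At T=3 the existing nodes have degrees 3 and 1, so the new node attaches to them with probabilities ¾ and ¼. In the state shown at T=4 the existing nodes have degrees 4, 1 and 1, giving probabilities 2/3, 1/6 and 1/6.]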


Preferential Attachment

Preferential Attachment gives a power-law degree distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]

E[degree of 1st node] = √n


Preferential Attachment


Other Models

Kumar et al. proposed the “copying model.” [KRRSTU00]

Leskovec et al. proposed a “forest fire” model which has some similarities to this work. [LKF05]


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


Motivating Questions

Why would a new node connect to nodes of high degree?
- Are high-degree nodes more attractive?
- Or are there other explanations?

How does a new node find out what the high degree nodes are?


Motivating Questions

Motivating Observation:

• Suppose each page has a small probability p of being interesting.

• Suppose a user does an (undirected) random walk until they find an interesting page.

• If p is small then this is the same as preferential attachment.

• What about other processes and directed graphs?


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


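Directed 1-step Random Surfer, p=.5

T=1: Start with a single node with a self-loop.

From T=2 on, each new node does the following:

1. Choose a node uniformly at random.

2. With probability p connect to it.

3. With probability (1-p) connect to its neighbor.

[Figure: at T=3 the new node connects to the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and to the second node with probability ¼.]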
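The three steps above can be sketched as follows, assuming that "its neighbor" means the single node the chosen page links to (every node here has out-degree 1). The function names are hypothetical; `attach_probabilities` computes the exact attachment distribution for a given graph so the T=3 numbers on this slide can be reproduced.

```python
import random

def one_step_surfer(n, p, rng):
    """Return out_link, where out_link[t] is the node that node t points to."""
    out_link = [0]                          # T=1: node 0 with a self-loop
    for t in range(1, n):
        u = rng.randrange(t)                # 1. choose an existing node uniformly
        if rng.random() < p:
            out_link.append(u)              # 2. connect to u itself
        else:
            out_link.append(out_link[u])    # 3. connect to the node u points to
    return out_link

def attach_probabilities(out_link, p):
    """Exact probability that the next new node connects to each existing node."""
    t = len(out_link)
    probs = [0.0] * t
    for u in range(t):
        probs[u] += p / t                   # u was chosen and we stopped there
        probs[out_link[u]] += (1 - p) / t   # u was chosen and we took one step
    return probs

print(attach_probabilities([0, 0], 0.5))    # [0.75, 0.25], the T=3 case
```

The second term in `attach_probabilities` adds (1-p)/t once per incoming link, which is where the degree-proportional part of the attachment rule comes from.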


Directed 1-step Random Surfer

It turns out this model is a mixture of connecting to nodes uniformly at random and preferential attachment.

It has a power-law degree distribution.

But taking one step is not very natural.

What about doing a real random walk?


Directed Coin Flipping model

1. Pick a node uniformly at random.

2. Flip a coin of bias p. If HEADS, connect to the current node; else walk to its neighbor and flip again.

[Figure: existing nodes A, B, C, D and a new node. The walk starts at the random starting node A. Coin toss 1: TAILS at node A. Coin toss 2: TAILS at node B. Coin toss 3: HEADS at node C, so the new node connects to C.]


Directed Coin Flipping model

1. At time 1, we start with a single node with a self-loop.

2. At time t, we choose a node u uniformly at random.

3. We then flip a coin of bias p.

4. If the coin comes up heads, we connect to the current node.

5. Else we walk to a random neighbor and go to step 3.

“each page has equal probability p of being interesting to us”
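A minimal simulation sketch of steps 1 to 5, assuming every node has exactly one out-link, so "a random neighbor" is simply the node the current page points to. The function name, the seed and the graph size are illustrative assumptions, not the settings behind the plots in this talk.

```python
import random
from collections import Counter

def coin_flipping_graph(n, p, rng):
    """Return out_link, where out_link[t] is the node that node t points to."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] so that the walk terminates")
    out_link = [0]                       # step 1: single node with a self-loop
    for t in range(1, n):
        current = rng.randrange(t)       # step 2: uniform starting node
        while rng.random() >= p:         # steps 3 and 5: tails, so keep walking
            current = out_link[current]
        out_link.append(current)         # step 4: heads, connect here
    return out_link

graph = coin_flipping_graph(20_000, 0.5, random.Random(0))
in_degree = Counter(graph[1:])           # skip the root's self-loop
print(in_degree.most_common(3))          # the three most linked-to nodes
```

The walk always moves towards older nodes and the expected number of steps is (1-p)/p, so each insertion is cheap on average.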


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


Is Directed Coin-Flipping Power-lawed?

We don’t know … but we do have some partial results ...


Virtual Degree

Definitions:

Let $l_i(u)$ be the number of level-i descendants of node u:
$l_1(u)$ = # of children
$l_2(u)$ = # of grandchildren, etc.

Let $\beta = (\beta_1, \beta_2, \dots)$ be a sequence of real numbers with $\beta_1 = 1$.

Then $v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \dots$

We'll call v(u) the “virtual degree of u with respect to $\beta$.”


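Virtual Degree

[Figure: a tree rooted at node u, which has 2 children and 4 grandchildren]

$v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \dots$

where 2 is the # of children and 4 is the # of grandchildren.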


Virtual Degree

Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in deg(u) is proportional to v(u).

Expected increase in deg(u) $= p/t + (1-p)\,p\,l_1(u)/t + (1-p)^2\,p\,l_2(u)/t + \dots = (p/t)\,v(u)$
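The identity can be checked numerically on a small tree. The sketch below assumes a graph stored as `out_link` (each node's single out-link, with the root pointing to itself), takes u to be a non-root node, and uses a hand-built example in which u has 2 children and 4 grandchildren; all function names are hypothetical.

```python
def level_counts(out_link, u):
    """Return [l_1(u), l_2(u), ...]: descendants of u at each level."""
    children = {}
    for child, parent in enumerate(out_link):
        if child != parent:                      # ignore the root's self-loop
            children.setdefault(parent, []).append(child)
    counts, frontier = [], children.get(u, [])
    while frontier:
        counts.append(len(frontier))
        frontier = [g for c in frontier for g in children.get(c, [])]
    return counts

def virtual_degree(out_link, u, p):
    """v(u) = 1 + sum_i (1-p)^i * l_i(u), i.e. beta_i = (1-p)^i."""
    levels = level_counts(out_link, u)
    return 1 + sum((1 - p) ** i * l for i, l in enumerate(levels, start=1))

def connect_probability(out_link, u, p):
    """Exact chance that the next coin-flipping walk stops at non-root u."""
    t = len(out_link)
    total = 0.0
    for start in range(t):
        node, steps = start, 0
        while node != u and out_link[node] != node:   # walk up until u or root
            node = out_link[node]
            steps += 1
        if node == u:
            total += (1 - p) ** steps * p / t         # 'steps' tails, then heads
    return total

# Node 0 is the root, u = 1 has children 2 and 3 and grandchildren 4..7.
tree = [0, 0, 1, 1, 2, 2, 3, 3]
p, t = 0.5, len(tree)
print(virtual_degree(tree, 1, p))                     # 3.0
print(connect_probability(tree, 1, p), (p / t) * virtual_degree(tree, 1, p))
```

The root is excluded because its self-loop lets a walk pass through it more than once, so the one-pass argument behind the formula does not apply to it directly.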


Virtual Degree

Theorem: There always exist $\beta_i$ such that
1. For i ≥ 1, $|\beta_i| \le 1$.
2. As i → ∞, $\beta_i \to 0$ exponentially.
3. The expected increase in v(u) is proportional to v(u).

Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$

E.g., for p=¾, $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...

for p=½, $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, …
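The recurrence on this slide is easy to iterate exactly. A minimal sketch using rational arithmetic, which reproduces the two example sequences; the function name `beta_sequence` is an assumption for illustration:

```python
from fractions import Fraction

def beta_sequence(p, count):
    """First `count` terms of beta_1 = 1, beta_2 = p,
    beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

print([str(b) for b in beta_sequence(Fraction(3, 4), 6)])
# ['1', '3/4', '1/2', '5/16', '3/16', '7/64']
print([str(b) for b in beta_sequence(Fraction(1, 2), 8)])
# ['1', '1/2', '0', '-1/4', '-1/4', '-1/8', '0', '1/16']
```

Exact fractions avoid floating-point drift, which makes sign changes such as the zeros in the p=½ sequence easy to see.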


Virtual Degree, continued

Let $v_t(u)$ be the virtual degree of node u at time t and $t_u$ be the time when node u first appears.

Theorem: For any node u and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$

So, the expected virtual degrees follow a power law.


Actual Degree

We can also obtain lower bounds on the expected values of the actual degrees:

Theorem: For any node u and time $t \ge t_u$, $E[\mathrm{degree}(u)] \ge \Omega((t/t_u)^{p(1-p)})$


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


Experiments

• Random graphs of n=100,000 nodes

• Compute statistics averaged over 100 runs.

• K=1 (Every node has out-degree 1)


Online Erdős-Rényi


Directed 1-Step Random Surfer, p=3/4


Directed 1-Step Random Surfer, p=1/2


Directed 1-Step Random Surfer, p=1/4


Directed Coin Flipping, p=1/2


Directed Coin Flipping, p=1/4


Undirected coin flipping, p=1/2


Undirected Coin Flipping p=0.05


Outline

• Background/Previous Work

• Motivation

• Models

• Theoretical results

• Experimental results

• Conclusions


Conclusions

Directed random walk models appear to generate power laws (with partial theoretical results in support).

Power laws can naturally emerge, even if all nodes have the same intrinsic “attractiveness”.


Open questions

• Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law?

• Can we analyze the degree distribution for the undirected coin-flipping model with p=1/2?

• Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?


Questions?