1
A Random-Surfer Web-Graph Model
(Joint work with Avrim Blum & Hubert Chan)
Mugizi Rwebangira
2
The Web as a Graph
Consider the World Wide Web as a graph, with web pages as nodes and hyperlinks between pages as edges.
[Figure: example pages index.html, links.html, resume.html and http://cnn.com shown as nodes of the graph.]
3
Studying the Web
Since the Web emerged there has been a lot of interest in:
1. Empirically studying properties of the Web Graph.
2. Modeling the Web Graph mathematically.
Benefits of Generative Models:
1. Simulation – When real data is scarce
2. Extrapolation – How will the graph change?
3. Understanding – Inspire further research on real data
4
Power Law
The distribution of a random variable X follows a power law if
$\mathrm{Prob}[X = k] \sim C k^{-\alpha}$,
where $f(x) \sim g(x)$ means $\lim_{x \to \infty} f(x)/g(x) = 1$,
e.g. $(x+1) \sim (x+2)$.
Example: $\mathrm{Prob}[X = k] = k^{-2}$
5
Power Law: $\mathrm{Prob}[X = k] = k^{-2}$
6
Power Law
$\mathrm{Prob}[X = k] \sim C k^{-\alpha}$
$\log \mathrm{Prob}[X = k] \sim \log C - \alpha \log k$
$\mathrm{Prob}[X = k] = k^{-2}$
$\log \mathrm{Prob}[X = k] = -2 \log k$
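The straight-line behaviour on a log-log scale can be checked numerically. The sketch below is only an illustration: the helper names `loglog_slope`, `power_law` and `geometric` are made up here, and the geometric distribution is added purely as a contrast case.

```python
import math

def loglog_slope(pmf, k_lo, k_hi):
    """Slope of log pmf(k) against log k between two points k_lo < k_hi."""
    rise = math.log(pmf(k_hi)) - math.log(pmf(k_lo))
    run = math.log(k_hi) - math.log(k_lo)
    return rise / run

def power_law(k):
    # The example from the slide: Prob[X = k] = k^(-2).
    return k ** -2.0

def geometric(k):
    # Contrast case (an assumption of this sketch): Prob[X = k] = (1/2)^k.
    return 0.5 ** k

# For the power law the slope is -alpha = -2 wherever it is measured.
print(loglog_slope(power_law, 10, 1000))
# For the geometric decay the slope keeps getting steeper as k grows.
print(loglog_slope(geometric, 10, 20), loglog_slope(geometric, 100, 200))
```

A constant slope is what makes the log-log plot of a power law look like a line; an exponentially decaying distribution bends downward instead.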
7
Power Law: Log-Log plot
8
Power Law contd.
More general definition:
$\mathrm{Prob}[X \ge k] \sim C k^{-\alpha}$
Particularly useful if X takes on real values.
Sometimes referred to as “heavy tailed” or “scale free.”
9
Power Laws in Degree distribution
Let G be a graph.
Let $X_k$ be the proportion of nodes with degree k in G.
Then if $X_k \sim C k^{-\alpha}$
we say that G has power law degree distribution.
10
Properties of the Web Graph
A Power-law degree distribution has been observed in a wide variety of graphs including citation networks, social networks, protein-protein interaction networks and so on.
It has also been observed in the Web Graph. [Barabási & Albert]
11
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
12
Classic Random Graph Models
• In the G(n,p) random graph model:
1. There are n nodes.
2. There is an edge between any two nodes with probability p.
• It was proposed by Erdős and Rényi in the 1960s.
13
Online G(n,p)
In this model each new node makes k connections to existing nodes uniformly at random.
For this talk we will focus on k = 1,
hence the graph will be a tree.
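A minimal simulation sketch of the k = 1 case follows. It assumes the process starts from a single root node without a self-loop and counts in-degree only; the function names `online_gnp_tree` and `in_degree_fractions` are illustrative and not from the talk.

```python
import random
from collections import Counter

def online_gnp_tree(n, seed=0):
    """Grow a tree on n nodes: node t links to a uniformly random earlier node."""
    rng = random.Random(seed)
    parent = [None]  # node 0 is the root and has no out-link (assumption)
    for t in range(1, n):
        parent.append(rng.randrange(t))  # uniform over nodes 0..t-1
    return parent

def in_degree_fractions(parent):
    """Map each in-degree k to the fraction of nodes having that in-degree."""
    n = len(parent)
    in_degree = [0] * n
    for target in parent[1:]:
        in_degree[target] += 1
    counts = Counter(in_degree)
    return {k: c / n for k, c in sorted(counts.items())}

if __name__ == "__main__":
    fractions = in_degree_fractions(online_gnp_tree(20000, seed=1))
    for k, frac in fractions.items():
        print(k, round(frac, 4))
```

In a run of this sketch the fractions should fall off roughly geometrically in k, which is the opposite of a heavy tail.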
14
Online G(n,p)
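[Figure: the tree at T=1, 2, 3, 4. At T=3 the new node attaches to each of the two existing nodes with probability ½; at T=4 it attaches to each of the three existing nodes with probability ⅓.]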
15
Properties of Online G(n,p)
• $X_k$ = proportion of nodes with degree k
$E[X_k] = \Theta(1/2^k)$
• E[degree of first node] = 1 + 1/2 + 1/3 + 1/4 + … + 1/n = $\Theta(\log n)$
• E[max degree] = $\Theta(\log n)$
NOT POWER LAWED!!
16
Online G(n,p) (n=100,000, average of 100 runs)
17
Preferential Attachment
In the Preferential Attachment model, each new node connects to the existing nodes with a probability proportional to their degree.
[Barabási & Albert]
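A short sketch of this process for one link per new node is given below. It assumes, as in the worked figure for this model, that the graph starts from a single node with a self-loop and that degree means in-degree plus out-degree; the name `preferential_attachment` and the endpoint-list trick are choices of this sketch.

```python
import random

def preferential_attachment(n, seed=0):
    """Return the degree of each of n nodes, one out-link per new node."""
    rng = random.Random(seed)
    # Every edge contributes both of its endpoints to this list, so a
    # uniform draw from it picks a node with probability proportional
    # to its degree.
    endpoints = [0, 0]  # the initial self-loop gives node 0 degree 2
    degree = [2]
    for t in range(1, n):
        target = rng.choice(endpoints)
        endpoints.extend((target, t))
        degree[target] += 1
        degree.append(1)  # the new node's own out-link
    return degree

if __name__ == "__main__":
    degree = preferential_attachment(20000, seed=1)
    print("degree of first node:", degree[0])
    print("max degree:", max(degree))
```

Sampling from the endpoint list avoids recomputing a weighted distribution at every step, which keeps each insertion constant time.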
18
Preferential Attachment
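Degree = in-degree + out-degree
[Figure: the graph at T=1, 2, 3, 4. At T=3 the existing nodes have Deg = 3 and Deg = 1, so the new node attaches to them with probabilities ¾ and ¼. At T=4 the existing nodes have Deg = 4, Deg = 1 and Deg = 1, so the probabilities are 2/3, 1/6 and 1/6.]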
19
Preferential Attachment
Preferential Attachment gives a power-law degree distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]
E[degree of 1st node] = √n
20
Preferential Attachment
21
Other Models
Kumar et al. proposed the “copying model.” [KRRSTU00]
Leskovec et al. propose a “forest fire” model which has some similarities to this work. [LKF05]
22
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
23
Motivating Questions
Why would a new node connect to nodes of high degree?
- Are high-degree nodes more attractive?
- Or are there other explanations?
How does a new node find out what the high-degree nodes are?
24
Motivating Questions
Motivating Observation:
• Suppose each page has a small probability p of being interesting.
• Suppose a user does an (undirected) random walk until they find an interesting page.
• If p is small then this is the same as preferential attachment.
• What about other processes and directed graphs?
25
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
26
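Directed 1-step Random Surfer, p=.5
T=1: Start with a single node with a self-loop.
T=2, 3, …:
1. Choose a node uniformly at random.
2. With probability p connect to it.
3. With probability (1-p) connect to its neighbor.
[Figure: at T=3 the new node attaches to the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and to the second node with probability ¼.]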
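The T=3 probabilities in this example can be reproduced exactly. The sketch below assumes every node has a single out-link, stored in a list `out_link`; that representation and the function name `one_step_attach_probs` are illustrative choices, not part of the talk.

```python
from fractions import Fraction

def one_step_attach_probs(out_link, p):
    """Probability that the next node links to each existing node.

    out_link[v] is the one node that v points to (node 0 points to itself).
    """
    n = len(out_link)
    probs = [Fraction(0)] * n
    for v in range(n):
        # v is picked with probability 1/n ...
        probs[v] += Fraction(1, n) * p                   # ... and linked to directly
        probs[out_link[v]] += Fraction(1, n) * (1 - p)   # ... or its neighbor is
    return probs

# Graph at T=2: node 0 has a self-loop and node 1 points to node 0.
print(one_step_attach_probs([0, 0], Fraction(1, 2)))
# [Fraction(3, 4), Fraction(1, 4)]
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the ¾ and ¼ on the slide.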
27
Directed 1-step Random Surfer
It turns out this model is a mixture of connecting to nodes uniformly at random and preferential attachment.
It has a power-law degree distribution.
But taking one step is not very natural.
What about doing a real random walk?
28
Directed Coin Flipping model
1. Pick a node uniformly at random.
2. Flip a coin of bias p. If HEADS, connect to the current node; else walk to a neighbor.
[Figure: a NEW NODE and a graph with nodes A, B, C, D. The walk begins at the RANDOM STARTING NODE. 1. COIN TOSS: TAIL (at node A). 2. COIN TOSS: TAIL (at node B). 3. COIN TOSS: HEAD (at node C).]
29
Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node u uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else we walk to a random neighbor and go to step 3.
“each page has equal probability p of being interesting to us”
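The five steps above can be sketched as a short simulation. This is a sketch under stated assumptions rather than the authors' code: every node has out-degree 1, so the "random neighbor" is the node's single out-neighbor, and the names `directed_coin_flipping` and `in_degrees` are made up for illustration.

```python
import random

def directed_coin_flipping(n, p, seed=0):
    """Return out_link, where out_link[v] is the node that v points to."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] so that the walk terminates")
    rng = random.Random(seed)
    out_link = [0]  # time 1: a single node with a self-loop
    for t in range(1, n):
        u = rng.randrange(t)        # choose a start node uniformly at random
        while rng.random() >= p:    # tails, with probability 1 - p
            u = out_link[u]         # walk along the only out-link
        out_link.append(u)          # heads: connect to the current node
    return out_link

def in_degrees(out_link):
    """In-degree of every node (the root's self-loop is counted once)."""
    counts = [0] * len(out_link)
    for target in out_link:
        counts[target] += 1
    return counts

if __name__ == "__main__":
    degs = in_degrees(directed_coin_flipping(20000, p=0.5, seed=1))
    print("five largest in-degrees:", sorted(degs, reverse=True)[:5])
```

Because of the self-loop, a walk that reaches the first node simply stays there until the coin comes up heads.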
30
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
31
Is Directed Coin-Flipping Power-lawed?
We don’t know … but we do have some partial results ...
32
Virtual Degree
Definitions:
Let $l_i(u)$ be the number of level-i descendants of node u:
$l_1(u)$ = # of children, $l_2(u)$ = # of grandchildren, etc.
Let $\beta = (\beta_1, \beta_2, \ldots)$ be a sequence of real numbers with $\beta_1 = 1$.
Then $v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \cdots$
We’ll call v(u) the “virtual degree of u with respect to $\beta$.”
33
Virtual Degree
[Figure: node u with 2 children and 4 grandchildren.]
$v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \cdots$
where 2 = # of children and 4 = # of grandchildren.
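The example can be turned into a small computation. In the sketch below the tree is stored as a list `out_link` of parent pointers, the particular shape (two grandchildren under each child) is an assumption made only to match the counts 2 and 4, and the function names are illustrative. The weights $\beta_i = (1-p)^i$ with p = ½ are just one possible choice of sequence.

```python
from fractions import Fraction

def level_counts(out_link, u):
    """Return [l_1(u), l_2(u), ...]: descendants of u grouped by depth."""
    children = {}
    for v, w in enumerate(out_link):
        if v != w:  # ignore the root's self-loop
            children.setdefault(w, []).append(v)
    counts = []
    frontier = children.get(u, [])
    while frontier:
        counts.append(len(frontier))
        frontier = [c for v in frontier for c in children.get(v, [])]
    return counts

def virtual_degree(out_link, u, beta):
    """v(u) = 1 + sum_i beta(i) * l_i(u), where beta(i) gives beta_i."""
    levels = level_counts(out_link, u)
    return 1 + sum(beta(i) * l for i, l in enumerate(levels, start=1))

# Node 0 is u; nodes 1 and 2 are its children; nodes 3..6 its grandchildren.
tree = [0, 0, 0, 1, 1, 2, 2]
p = Fraction(1, 2)
print(level_counts(tree, 0))                              # [2, 4]
print(virtual_degree(tree, 0, lambda i: (1 - p) ** i))    # 3
```

With these weights the sum is 1 + ½·2 + ¼·4 = 3.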
34
Virtual Degree
Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in deg(u) is proportional to v(u).
Expected increase in deg(u) = $p/t + (1-p)\,p\,l_1(u)/t + (1-p)^2\,p\,l_2(u)/t + \cdots = (p/t)\,v(u)$
35
Virtual Degree
Theorem: There always exist $\beta_i$ such that
1. For i ≥ 1, $|\beta_i| \le 1$.
2. As i → ∞, $\beta_i \to 0$ exponentially.
3. The expected increase in v(u) is proportional to v(u).
Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$
E.g., for p=¾, $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...
for p=½, $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, …
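The two example sequences can be regenerated from the recurrence with exact rational arithmetic. The helper name `beta_sequence` is an illustrative choice.

```python
from fractions import Fraction

def beta_sequence(p, count):
    """First `count` terms of beta_1 = 1, beta_2 = p,
    beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

print([str(b) for b in beta_sequence(Fraction(3, 4), 6)])
# ['1', '3/4', '1/2', '5/16', '3/16', '7/64']
print([str(b) for b in beta_sequence(Fraction(1, 2), 8)])
# ['1', '1/2', '0', '-1/4', '-1/4', '-1/8', '0', '1/16']
```

Printing more terms for a given p is a quick way to watch the bound $|\beta_i| \le 1$ and the decay toward 0.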
36
Virtual Degree, continued
Let $v_t(u)$ be the virtual degree of node u at time t and $t_u$ be the time when node u first appears.
Theorem: For any node u and time t ≥ $t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$
So, the expected virtual degrees follow a power law.
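One way to see where the exponent p comes from is sketched below. It is not taken from the talk: it assumes that the constant of proportionality for the expected increase in v(u) is p/t, the same factor that appears in the degree computation, and that $v_{t_u}(u) = 1$ when u first appears.

```latex
\begin{align*}
E[v_{t+1}(u)] &= \Bigl(1 + \frac{p}{t}\Bigr)\, E[v_t(u)] \\
E[v_t(u)] &= \prod_{s=t_u}^{t-1} \Bigl(1 + \frac{p}{s}\Bigr)
  = \exp\Bigl(\sum_{s=t_u}^{t-1} \ln\bigl(1 + \tfrac{p}{s}\bigr)\Bigr) \\
  &= \exp\bigl(p \ln(t/t_u) + O(1)\bigr)
  = \Theta\bigl((t/t_u)^{p}\bigr)
\end{align*}
```

The middle step uses $\ln(1 + p/s) = p/s + O(1/s^2)$ and $\sum_{s=t_u}^{t-1} 1/s = \ln(t/t_u) + O(1)$.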
37
Actual Degree
We can also obtain lower bounds on the expected values of the actual degrees:
Theorem: For any node u and time t ≥ $t_u$, $E[\mathrm{degree}(u)] = \Omega((t/t_u)^{p(1-p)})$
38
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
39
Experiments
• Random graphs of n=100,000 nodes
• Compute statistics averaged over 100 runs.
• k=1 (every node has out-degree 1)
40
Online Erdős–Rényi
41
Directed 1-Step Random Surfer, p=3/4
42
Directed 1-Step Random Surfer, p=1/2
43
Directed 1-Step Random Surfer, p=1/4
44
Directed Coin Flipping, p=1/2
45
Directed Coin Flipping, p=1/4
46
Undirected coin flipping, p=1/2
47
Undirected Coin Flipping p=0.05
48
Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions
49
Conclusions
Directed random-walk models appear to generate power laws, and we have partial theoretical results to support this.
Power laws can naturally emerge, even if all nodes have the same intrinsic “attractiveness”.
50
Open questions
• Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law?
• Can we analyze the degree distribution for the undirected coin-flipping model with p=1/2?
• Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?
51
Questions?