Estimating PageRank on Graph Streams
Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi,
Rina Panigrahy (Microsoft Research)
PageRank
• PageRank: determine a ranking of the nodes in a graph
• Typically large graphs: the WWW, social networks
• Run daily by commercial search engines
PageRank computation
[Figure: example graph with nodes u, a, b, c]
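For contrast with the approach that follows, here is the conventional computation: PageRank by power iteration, i.e. repeated matrix-vector multiplication. This is a minimal sketch, not code from the talk; the damping factor 0.85, the convention that a dangling node spreads its mass uniformly, the function name `pagerank_power` and the tiny example graph are all assumptions for illustration.

```python
def pagerank_power(adj, damping=0.85, iters=100):
    """PageRank by power iteration (repeated matrix-vector products)."""
    nodes = sorted(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # One multiplication by the damped transition matrix.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            outs = adj[v]
            if outs:
                share = damping * pr[v] / len(outs)
                for w in outs:
                    nxt[w] += share
            else:
                # Dangling node (assumed convention): spread its mass uniformly.
                for w in nodes:
                    nxt[w] += damping * pr[v] / n
        pr = nxt
    return pr


# Hypothetical example graph: a, b and c link to u; u links back to a.
graph = {"u": ["a"], "a": ["u"], "b": ["u"], "c": ["u"]}
ranks = pagerank_power(graph)
```

Every iteration touches every edge, which is exactly the whole-graph matrix-vector step the streaming approach tries to avoid.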
PageRank Computation
Our approach: no matrix-vector multiplication!
[Figure: example graph with nodes u, a, b, c]
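A sketch of how PageRank can be read off random walks alone, without any matrix-vector product. This uses one standard Monte Carlo estimator (from every node, run walks that stop with probability 1 - damping at each step, and count visits); it is an assumed stand-in, and the estimator used in the talk may differ. The names `pagerank_monte_carlo` and `walks_per_node` are hypothetical, and the dangling-node convention matches the power-iteration sketch.

```python
import random


def pagerank_monte_carlo(adj, damping=0.85, walks_per_node=5000, seed=0):
    """Estimate PageRank from random-walk visit counts, with no matrix-vector products."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    n = len(nodes)
    visits = {v: 0 for v in nodes}
    for start in nodes:
        for _ in range(walks_per_node):
            v = start
            while True:
                visits[v] += 1
                # Continue with probability `damping`, stop otherwise.
                if rng.random() >= damping:
                    break
                outs = adj[v]
                # Dangling nodes jump to a uniform node (assumed convention).
                v = rng.choice(outs) if outs else rng.choice(nodes)
    # Each visit contributes (1 - damping) / (n * walks_per_node) to the estimate.
    scale = (1.0 - damping) / (n * walks_per_node)
    return {v: c * scale for v, c in visits.items()}


graph = {"u": ["a"], "a": ["u"], "b": ["u"], "c": ["u"]}
estimate = pagerank_monte_carlo(graph)
```

The estimate converges to the power-iteration values as `walks_per_node` grows; the hard part in a stream is generating many such walks with few passes.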
Our Result
Many random walk samples, efficiently.
Approximate PageRank
[Figure: random walks from node u]
Other results from Random Walks
We can estimate: mixing time, conductance
Using Streams
[Figure: graph G with node u]
Streaming
• Input is a “stream”: e1, e2, e3, e4, e5, e6, e7, …
• Small RAM working memory
• Few passes
• Examples: frequency moments, quantiles
• Graphs: the stream is the list of edges, in arbitrary order
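The basic primitive in this model is drawing a random out-edge of a given node in a single pass with constant memory. A minimal sketch using reservoir sampling with a reservoir of size 1; the function name `sample_out_neighbor` and the representation of the stream as an iterable of `(src, dst)` pairs are assumptions.

```python
import random
from collections import Counter


def sample_out_neighbor(stream, node, rng):
    """One pass over an edge stream with O(1) extra memory: return a uniformly
    random out-neighbor of `node` (reservoir sampling, reservoir of size 1),
    or None if `node` has no out-edges."""
    chosen = None
    seen = 0
    for src, dst in stream:
        if src == node:
            seen += 1
            # Keep the seen-th candidate with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = dst
    return chosen


edges = [(0, 1), (2, 0), (0, 2), (1, 2), (0, 3)]   # arbitrary order
rng = random.Random(1)
counts = Counter(sample_out_neighbor(edges, 0, rng) for _ in range(30000))
```

Over many calls, node 0's three out-neighbors 1, 2 and 3 are each returned about a third of the time, regardless of where their edges appear in the stream.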
Related Work
• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produce a sparse one
  – Approximately preserve x’Lx
  – Can be used to compute sparse cuts
• Streaming version of BK96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and O(n) space
• Accelerated PageRank (McSherry 08)
  – Heuristics
Key Idea
One walk of length l from u, efficiently
Later, extend to many walks
[Figure: a walk from u to v_l]
Single Random Walk: Naive Algorithm
One step with every pass!
Constant space, l passes
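The naive algorithm above, sketched under stated assumptions: each step of the walk costs one full pass over the edge stream (reservoir-sampling a random out-edge of the current node), so a walk of length l takes l passes with constant working memory apart from the output path. The name `naive_stream_walk` and the choice to stop early at a node without out-edges are assumptions.

```python
import random


def naive_stream_walk(edges, start, length, seed=0):
    """Random walk of `length` steps, one pass over the edge stream per step.
    Working memory is O(1) apart from the output path itself.
    Returns (path, number_of_passes)."""
    rng = random.Random(seed)
    path = [start]
    passes = 0
    current = start
    for _ in range(length):
        passes += 1
        chosen, seen = None, 0
        for src, dst in edges:          # one full pass over the stream
            if src == current:
                seen += 1
                if rng.randrange(seen) == 0:
                    chosen = dst
        if chosen is None:              # no out-edges: stop early (assumed behaviour)
            break
        current = chosen
        path.append(current)
    return path, passes


cycle = [(2, 0), (0, 1), (1, 2)]        # directed 3-cycle, arbitrary edge order
path, passes = naive_stream_walk(cycle, start=0, length=5)
```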
Second Naive Algorithm
Single pass: sample sufficiently many edges!
If …, then sample … out-edges from each node (and store their order).
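A sketch of the single-pass alternative, assuming l samples per node: a walk of length l leaves any node at most l times, so l independent out-edge samples per node (taken with replacement, kept in order) are enough to drive the whole walk from memory, at O(n·l) space. The function names `one_pass_samples` and `walk_from_samples` are hypothetical.

```python
import random
from collections import defaultdict


def one_pass_samples(edges, k, rng):
    """Single pass: for every node keep k independent uniform samples (with
    replacement) of its out-edges, using k size-1 reservoirs per node.
    Memory is O(n * k)."""
    degree = defaultdict(int)
    samples = {}
    for src, dst in edges:
        degree[src] += 1
        if src not in samples:
            samples[src] = [dst] * k            # first out-edge fills every reservoir
        else:
            slots = samples[src]
            for j in range(k):
                if rng.randrange(degree[src]) == 0:   # reservoir j keeps dst w.p. 1/degree
                    slots[j] = dst
    return samples


def walk_from_samples(samples, start, length):
    """Walk using the stored samples in order; each visit to a node consumes
    the next unused sample, so k = length samples per node always suffice."""
    used = defaultdict(int)
    path = [start]
    v = start
    for _ in range(length):
        slots = samples.get(v)
        if not slots:                  # no out-edges: stop early
            break
        nxt = slots[used[v]]
        used[v] += 1
        v = nxt
        path.append(v)
    return path


edges = [(0, 1), (1, 0), (1, 2), (2, 0), (0, 2)]
length = 8
samples = one_pass_samples(edges, k=length, rng=random.Random(3))
path = walk_from_samples(samples, start=0, length=length)
```

Because every visit uses a fresh, independent sample, the result is a genuine random walk; the price is the O(n·l) memory compared with one pass.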
Comparison
Naive (single walk): …
Our result: …
In fact, … walks!
Automatically: …
Insight: Merge Short Walks
Sample a fraction of the nodes (centers)
… passes: …-length walks from each center
Merge and extend the short walks!
Two problems:
  – End up at a node a second time
  – End up at a non-sampled node
[Figure: walk from s built by concatenating short walks w; nodes a, b]
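The merging idea, simulated in memory rather than over a stream: sample each node as a center with probability `alpha`, precompute one short walk of length `w` from each center (in the streaming setting this costs about w passes), then concatenate them from s. The loop stops exactly at the two problems listed above. The names `alpha`, `w`, `short_walk` and `stitch` are assumptions, and the start node is always treated as a center.

```python
import random


def short_walk(adj, v, w, rng):
    """A walk of at most w steps from v, stopping early at a node without out-edges."""
    path = [v]
    for _ in range(w):
        outs = adj.get(v)
        if not outs:
            break
        v = rng.choice(outs)
        path.append(v)
    return path


def stitch(adj, start, length, w, alpha, seed=0):
    """Concatenate precomputed short walks from sampled centers.
    Returns (path, reason_for_stopping)."""
    rng = random.Random(seed)
    centers = {v for v in adj if rng.random() < alpha} | {start}
    walks = {c: short_walk(adj, c, w, rng) for c in centers}
    used = set()
    path = [start]
    v = start
    while len(path) - 1 < length:
        if v not in centers:
            return path, "non-sampled node"     # problem 2
        if v in used:
            return path, "center reused"        # problem 1: its walk is already spent
        used.add(v)
        segment = walks[v][1:]
        if not segment:
            return path, "dead end"
        path.extend(segment[: length - (len(path) - 1)])
        v = path[-1]
    return path, "done"


cycle6 = {i: [(i + 1) % 6] for i in range(6)}
pair = {0: [1], 1: [0]}
done = stitch(cycle6, 0, length=4, w=2, alpha=1.0)
lost = stitch(cycle6, 0, length=10, w=2, alpha=0.0)
reused = stitch(pair, 0, length=10, w=2, alpha=1.0)
```

Each center's short walk is used at most once, because reusing it would make the concatenated walk dependent on itself; that is why reaching a used center counts as getting stuck.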
Stuck Nodes
Sample an edge from the stuck node.
Again. And again...
Slow?
If we reach new nodes: good, in … passes!
Stuck nodes
Stuck on the same nodes?
Sample s edges from each stuck node
Either s steps of progress OR a new node!
Previously seen centers must be included in the set
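One pass of the stuck-node handling, sketched in memory under assumptions: draw `s` out-edge samples for every node in the stuck set (which, per the slide, should include previously seen centers), then walk on those samples. Since at most `s` samples are consumed in total, the walk either makes `s` steps of local progress or leaves the set at a new node. The function name `local_progress` is hypothetical, and `rng.choice` stands in for the one-pass reservoir sampling shown earlier.

```python
import random


def local_progress(adj, stuck, current, s, rng):
    """Draw s out-edge samples for every node in `stuck`, then walk on them
    from `current`. Stops after s steps (local progress) or on reaching a
    node outside `stuck` (a new node). Returns the nodes stepped to."""
    samples = {v: [rng.choice(adj[v]) for _ in range(s)] for v in stuck if adj.get(v)}
    used = {v: 0 for v in samples}
    steps = []
    v = current
    # At most s samples are consumed in total, so no node runs out of samples.
    while len(steps) < s and v in samples:
        nxt = samples[v][used[v]]
        used[v] += 1
        steps.append(nxt)
        v = nxt
    return steps


rng = random.Random(0)
loop = {0: [1], 1: [0]}
exit_graph = {0: [1], 1: [2], 2: [0]}
progress = local_progress(loop, stuck={0, 1}, current=0, s=5, rng=rng)
escape = local_progress(exit_graph, stuck={0, 1}, current=0, s=5, rng=rng)
```

In `loop` the walk never leaves the stuck set and makes 5 steps of progress; in `exit_graph` it reaches the new node 2 after two steps.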
Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from the stuck nodes
• Make local progress until a new node is reached
• Local progress = s steps
• New node: a center with probability …
• Amortized progress: … every pass
Total number of passes:
Total space:
Set
Number of passes =
Space =
Many Walks
Naive space bound: …
Observation: many short walks are not used in a single RW.
We show: $\tilde{O}(n)$ space for $K \le n/l$
Many Random Walks
• $r_i$: probability that node $i$'s short walk is used in a single RW.
• If $r_i$ is known: save a lot of space!
  – Perform K random walks
  – The number of short walks required from node $i$ is about $K r_i$
• We don't know $r_i$, but we can estimate it.
Estimating $r_i$
• Run a small number K (on the order of log n) of walks of length l
• This gives a crude estimate of $r_i$
• The estimate is sufficient to double K
• Continue doubling K
• Gives K walks in space $\tilde{O}(\ldots)$
• Passes: $\tilde{O}(\ldots)$
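A sketch of the doubling scheme, under loud assumptions: $r_i$ is approximated by the fraction of K walks that visit node i (a stand-in for "node i's short walk is used"), and each round's estimate sets how many short walks to reserve per node for the next round of 2K walks, with a hypothetical `slack` factor. The names `estimate_visit_probability`, `doubling_plan`, `K_target` and `slack` are illustrative, not from the talk.

```python
import math
import random
from collections import Counter


def estimate_visit_probability(adj, start, length, K, rng):
    """Fraction of K independent walks of the given length from `start` that
    visit each node (an assumed stand-in for r_i)."""
    hits = Counter()
    for _ in range(K):
        v = start
        visited = {v}
        for _ in range(length):
            outs = adj.get(v)
            if not outs:
                break
            v = rng.choice(outs)
            visited.add(v)
        hits.update(visited)
    return {v: hits[v] / K for v in adj}


def doubling_plan(adj, start, length, K_target, slack=2.0, seed=0):
    """Start with about log n walks; each round's crude estimate of r_i decides
    how many short walks to reserve per node for the next round of 2K walks."""
    rng = random.Random(seed)
    K = max(1, math.ceil(math.log2(len(adj))))
    plan = []
    while K < K_target:
        r_hat = estimate_visit_probability(adj, start, length, K, rng)
        reserve = {v: math.ceil(slack * 2 * K * r) for v, r in r_hat.items()}
        plan.append((K, reserve))
        K *= 2
    return plan


cycle3 = {0: [1], 1: [2], 2: [0]}
plan = doubling_plan(cycle3, start=0, length=2, K_target=16)
```

On the 3-cycle every walk of length 2 visits all three nodes, so every estimate is 1 and the rounds run with K = 2, 4, 8.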
Distributions
• … samples from the walk distribution starting at u
• Space: …, passes: …
Mixing Time, Conductance
• Undirected graphs: compare the walk's distribution with the steady state.
  – Estimating the difference: … samples [Batu et al. ’01]
  – Gives an approximate mixing time.
• Directed graphs: walk until the distribution “stabilizes”: … samples.
• Conductance:
  – Recall the space for K walks: $\tilde{O}(n)$ for $K \le n/l$
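A sketch of the mixing-time idea for undirected graphs: sample walk endpoints after t steps, compare their empirical distribution with the steady state deg(v)/2m in total variation distance, and report the first t where the distance drops below a threshold. This plug-in estimate is a simpler substitute for the sample-efficient closeness test of Batu et al., not that test; the names `stationary`, `empirical_tv`, `estimate_mixing_time`, `eps` and `samples` are assumptions, and the graph is assumed connected and non-bipartite, given as symmetric adjacency lists.

```python
import random
from collections import Counter


def stationary(adj):
    """Steady state of a random walk on an undirected graph: deg(v) / 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / two_m for v, nbrs in adj.items()}


def empirical_tv(adj, start, t, samples, rng):
    """Total variation distance between the empirical distribution of t-step
    walk endpoints from `start` and the steady state."""
    ends = Counter()
    for _ in range(samples):
        v = start
        for _ in range(t):
            v = rng.choice(adj[v])
        ends[v] += 1
    pi = stationary(adj)
    return 0.5 * sum(abs(ends[v] / samples - pi[v]) for v in adj)


def estimate_mixing_time(adj, start, eps=0.1, samples=40000, max_t=50, seed=0):
    """Smallest t whose estimated distance to the steady state is at most eps."""
    rng = random.Random(seed)
    for t in range(1, max_t + 1):
        if empirical_tv(adj, start, t, samples, rng) <= eps:
            return t
    return None


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
t_mix = estimate_mixing_time(triangle, start=0)
```

On the triangle the exact distances after 1, 2 and 3 steps are 1/3, 1/6 and 1/12, so with eps = 0.1 the estimate should be 3.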
Results recap
• Mixing time for undirected graphs: …
• Quadratic approximation to conductance
• PageRank to accuracy …
• Space: $\tilde{O}(n)$
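For reference, the quantity the conductance result approximates: for an undirected graph, the conductance of a cut S is the number of edges leaving S divided by min(vol(S), vol(V - S)), and the graph's conductance is the minimum over all cuts. The brute-force sketch below only illustrates this definition on tiny graphs; it is not the streaming method. The names `conductance_of_set` and `graph_conductance` are hypothetical, and the graph is assumed to have no isolated nodes.

```python
from itertools import combinations


def conductance_of_set(adj, S):
    """phi(S) = edges leaving S / min(vol(S), vol(V - S)), undirected graph."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_s
    # Assumes no isolated nodes, so both volumes are positive.
    return cut / min(vol_s, vol_rest)


def graph_conductance(adj):
    """Exact conductance by brute force over all cuts (exponential; tiny graphs only)."""
    nodes = sorted(adj)
    return min(
        conductance_of_set(adj, S)
        for size in range(1, len(nodes))
        for S in combinations(nodes, size)
    )


# Two triangles joined by the single edge 2-3.
barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi = graph_conductance(barbell)
```

The minimizing cut separates the two triangles: one crossing edge over a volume of 7 on each side, so the conductance is 1/7.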
Open Questions
• Improve the number of passes for random walks; in particular, sublinear space and a constant number of passes.
• Graph Cuts and Graph Sparsification for directed graphs
• Better (streaming) algorithms for computing eigenvectors
Thank You!
Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from the stuck nodes
• Make local progress until a new node is reached
• Local progress = s steps
• New node: … nodes give a center
• Amortized progress: … every pass
Analysis
• Total number of passes:
• Total space:
• Set
• Number of passes =
• Space =