The Block Two-Level Erdős-Rényi (BTER) Graph Model

Ali Pinar, C. Seshadhri, and Tamara G. Kolda
Sandia National Laboratories

July 13, 2012 Pinar - MMDS12 1

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

U.S. Department of Energy, Office of Advanced Scientific Computing Research

U.S. Department of Defense, Defense Advanced Research Projects Agency


Why model massive graphs?

• Enable sharing of surrogate data
– Computer network traffic
– Social networks
– Financial transactions
• Testing graph algorithms
– Scalability
– Versatility (e.g., vary degree distributions)
– Characterizing algorithm performance
– Verification & validation
• Insight into…
– Generative process
– Community structure
– Comparison
– Evolution
– Uncertainty

Block Two-Level Erdős-Rényi (BTER) graph; image courtesy of Nurcan Durak.


Model Desiderata

• Captures heavy-tailed degree distribution
– Not necessarily power law
• Captures community structure
– Measured indirectly through clustering coefficient and other measures
• Able to “fit” real-world data
– Reproduce degree distribution
– Reproduce community structure
• Scales up to ~2^40 nodes
– Motivated by GRAPH500 benchmark

A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science, 286(5439):509-512, 1999.

[Figure panels: Actor Collaboration, WWW, Power Grid]

M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113, 2004.

http://www.graph500.org/

Preserving Degree Distribution

• Majority of models are amazingly bad at matching degree distributions
– Tend to focus on matching “spirit” of the distribution
– May not even have the capability of matching some distributions
• Exception is the Chung-Lu (CL) model [Chung & Lu, 2002]
– Also known as configuration model [Newman, 2003] or weighted Erdős-Rényi (ER) model
• Premise: Probability of edge is proportional to degrees of endpoints
– $\Pr(e_{ij}) = d_i d_j / (2m)$, where $d_i$ is the degree of vertex $i$ and $m$ is the number of edges
• Can be implemented in a scalable way by repeated edge insertions.

The CL model is nearly perfect at matching desired degree distributions.
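The repeated edge insertion scheme for the CL model can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `chung_lu_edges`, the choice to draw exactly m endpoint pairs, and the decision to discard self-loops and repeated edges are all assumptions of this sketch.

```python
import random
from collections import Counter

def chung_lu_edges(degrees, seed=None):
    """Draw m = sum(degrees) // 2 endpoint pairs; each endpoint is vertex i
    with probability d_i / (2m). Self-loops and duplicate edges are
    discarded (an assumption of this sketch), so realized degrees match
    the targets only approximately, in expectation."""
    rng = random.Random(seed)
    nodes = list(range(len(degrees)))
    m = sum(degrees) // 2
    edges = set()
    for _ in range(m):
        # Two independent draws weighted by target degree.
        i, j = rng.choices(nodes, weights=degrees, k=2)
        if i != j:
            edges.add((min(i, j), max(i, j)))
    return edges

target = [5, 3, 3, 2, 2, 1, 1, 1]
edges = chung_lu_edges(target, seed=0)
realized = Counter(v for e in edges for v in e)
print(sorted(edges))
print([realized[v] for v in range(len(target))])
```

Because each of the m insertions picks the ordered pair (i, j) with probability d_i d_j / (2m)^2, the expected number of insertions joining i and j is d_i d_j / (2m), which is the premise stated above.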

[Figure: Degree Distribution; x-axis: Degree; legend: cit-HepPh, CL]

Community Structure in Graphs

• Numerous community finding algorithms exist
– Difficult to validate
– Trouble in finding full range of sizes
• Instead, use related measures like clustering coefficient
– Triangles arise because of community structure
• What “community” structure must be present to ensure a high clustering coefficient, especially for low-degree nodes?
– How many communities?
– What do they look like?

M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113, 2004.

Clustering Coefficients

$t_i$ = number of triangles at vertex $i$
$d_i$ = degree of vertex $i$

[Formulas: Clustering Coefficient; Global Clustering Coeff.]
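The formulas on this slide were images and did not survive extraction. The sketch below uses the standard definitions, which we assume are the ones intended: the clustering coefficient of vertex i is t_i divided by the number of wedges centered at i, d_i(d_i - 1)/2, and the global clustering coefficient is the sum of t_i over all vertices divided by the total number of wedges. The function name and the adjacency-set representation are illustrative.

```python
from itertools import combinations

def clustering_coefficients(adj):
    """adj maps each vertex to the set of its neighbors (undirected, simple).
    Returns (per-vertex coefficients, global coefficient)."""
    local, total_tri, total_wedges = {}, 0, 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        wedges = d * (d - 1) // 2          # pairs of neighbors of v
        # t_v: neighbor pairs that are themselves connected
        t = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local[v] = t / wedges if wedges else 0.0
        total_tri += t
        total_wedges += wedges
    global_cc = total_tri / total_wedges if total_wedges else 0.0
    return local, global_cc

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
local, global_cc = clustering_coefficients(adj)
print(local, global_cc)
```

In this toy graph the degree-2 vertices have coefficient 1, while the degree-3 vertex has coefficient 1/3: attaching one extra neighbor outside the triangle adds wedges without adding triangles.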


Clustering coefficients are higher for lower degree vertices


Model building approach

• Find features that restrict the space and identify the structure imposed by these features
• Given
– a skewed degree distribution
– high clustering coefficients
• What structure should we expect?
• Let’s take it to an extreme, i.e., clustering coefficients ~1.
• Implication 1: The graph must be a union of cliques.
• Implication 2: The sizes of the cliques (communities) will have the same distribution as the degree distribution.
• The challenge is in characterizing the gray area: clustering coefficients are high but not 1

Building the basis for a model

Empirical Observations
• Degree distributions are heavy tailed.
• Clustering coefficients are highest for small degree vertices.

We are not only trying to build a formal model, we are trying to formalize the model building process itself.

Seshadhri, Kolda, & Pinar, Phys. Rev. E 2012

Theoretical Analysis
Theorem: If a community has s edges then there must be Ω(√s) vertices with degree Ω(√s).

Hypothesis: Real-world interaction networks consist of a scale-free collection of relatively dense Erdős-Rényi graphs.

Verifying the model

Prediction: Within a community, the degree variance should be small.
Verification: Vertex degree is correlated with the average degree of its neighbors, and the degrees of the vertices of a triangle are close.

Prediction: The number of communities grows with the number of nodes.
Verification: The sizes of the communities are small and do not grow (or grow very slowly) for larger graphs (e.g., Dunbar number).

Hypothesis: Real-world interaction networks consist of a scale-free collection of dense Erdős-Rényi graphs.

Matching the Clustering Coefficients

• Random pairings cannot close wedges!
– Models that achieve high clustering coefficients require history to guide future edge insertions
• To get high clustering coefficient without history, we have to pre-group the nodes into affinity blocks
– Our theory: CL graph with high clustering coefficient must have dense ER block at its core
– Each affinity block is a near-clique
• Simplifying assumption: Non-overlapping blocks
– Implies blocks must comprise nodes of similar degree
– Can tune clustering coefficient to be appropriate for the degree by changing connectivity of the block

If nodes of different degrees are in the same block, then the high-degree nodes will have low clustering coefficients.

Preprocessing: Determining Blocks

• Input: Desired degree distribution
– Assume nodes sorted by degree, least to greatest
– Ignore degree-1 vertices in this phase
• Algorithm:
– Group nodes in order
– Bulk assign, if possible
• Output: Compressed information about blocks
– Size of each block
– Starting index of each block
– “Weight” of each block (depends on connectivity)
– O(d_uniq) information, if compressed
• Observe: If degree distribution is power law, then so is the distribution of the block sizes
– Evocative of Dunbar’s number: Max block size = 100 for γ=2
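The grouping step can be sketched as below. The slide does not state the block size rule; this sketch assumes that a block starting at a node of degree d takes the next d + 1 nodes in sorted order, so that a fully connected block would satisfy the degree of its lowest-degree node. The function name is hypothetical, and bulk assignment and compressed output are omitted for clarity.

```python
def affinity_blocks(degrees):
    """Group nodes into consecutive blocks after sorting by degree.
    Returns a list of (start_index, size, min_degree) over the sorted
    list of nodes with degree >= 2; degree-1 nodes are skipped."""
    sorted_deg = sorted(d for d in degrees if d >= 2)
    blocks, start = [], 0
    while start < len(sorted_deg):
        d_min = sorted_deg[start]
        # Assumed rule: d_min + 1 nodes per block; the last block may be short.
        size = min(d_min + 1, len(sorted_deg) - start)
        blocks.append((start, size, d_min))
        start += size
    return blocks

degrees = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 9]
for start, size, d_min in affinity_blocks(degrees):
    print(start, size, d_min)
```

In this example the first block is homogeneous (three nodes of degree 2), while the last block mixes degrees 3 through 9, which is the situation where the higher-degree members need most of their edges from outside the block.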

[Figure: affinity blocks; labels: homogeneous, heterogeneous]

BTER: Block Two-Level Erdős-Rényi

Preprocessing: Create explicit affinity blocks of nodes with same (or nearly same) degree. The blocks are determined by the degree distribution.

Phase 1: Erdős-Rényi graphs in each block. User-specified connectivity determines number of links within each block. A formula is given that depends on lowest-degree node in block.

Phase 2: CL (aka weighted ER) model on “excess” degree, creates connections across blocks. Can run in tandem with Phase 1 by working with expected excess degree.
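A minimal sketch of the two levels for a single block, under stated assumptions: the connectivity `rho` is passed in directly rather than computed from the formula mentioned above (which is not reproduced on the slide), and the expected excess degree of a node is taken to be its target degree minus its expected Phase 1 degree, rho * (block size - 1), floored at zero. All names are illustrative.

```python
import random
from itertools import combinations

def phase1_block_edges(block_nodes, rho, rng):
    """Erdős-Rényi graph on one block: keep each pair with probability rho."""
    return [(u, v) for u, v in combinations(block_nodes, 2) if rng.random() < rho]

def expected_excess_degrees(block_nodes, target_degree, rho):
    """Degree left over for Phase 2, using the expected Phase 1 degree
    rho * (block size - 1) rather than the realized one (assumed rule)."""
    expected_inside = rho * (len(block_nodes) - 1)
    return {v: max(target_degree[v] - expected_inside, 0.0) for v in block_nodes}

rng = random.Random(0)
block = [10, 11, 12, 13]                 # four nodes placed in one block
target_degree = {10: 3, 11: 3, 12: 4, 13: 6}
inside = phase1_block_edges(block, rho=0.9, rng=rng)
excess = expected_excess_degrees(block, target_degree, rho=0.9)
print(inside)
print(excess)
```

Working with the expected rather than the realized Phase 1 degree is what lets the two phases run in tandem: the Phase 2 weights are known before any Phase 1 edge has been drawn.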

Scalable BTER: Independent Edges

• Choose phase 1 or 2?
• Phase 1: Choose block proportional to number of “samples” per block
– Create Block 1 edge per ER model with connectivity ρ_1: choose 1st endpoint, then choose 2nd endpoint
– …
– Create Block K edge per ER model with connectivity ρ_K: choose 1st endpoint, then choose 2nd endpoint
• Phase 2: Create edge using CL model on “excess degree”: choose 1st endpoint, then choose 2nd endpoint

Requires O(n) data to determine the various probabilities, and can be compressed to O(d_uniq).
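The independent-edge flow can be sketched as a single function that produces one edge per call. This is an illustration under assumptions, not the scalable implementation: the per-block sample counts `block_samples`, the Phase 2 sample count, and the `excess` weights are assumed to be precomputed, and all names are hypothetical.

```python
import random

def sample_edge(blocks, block_samples, excess, phase2_samples, rng):
    """Draw one edge independently of all others.
    blocks: list of node lists; block_samples: Phase 1 samples per block;
    excess: dict node -> excess degree; phase2_samples: Phase 2 sample count."""
    phase1_samples = sum(block_samples)
    if rng.random() < phase1_samples / (phase1_samples + phase2_samples):
        # Phase 1: pick a block in proportion to its sample count,
        # then two distinct endpoints uniformly inside it.
        (block,) = rng.choices(blocks, weights=block_samples, k=1)
        u, v = rng.sample(block, 2)
        return 1, (u, v)
    # Phase 2: CL model on excess degree; endpoints drawn independently,
    # so self-loops and repeats are possible and would need filtering later.
    nodes = list(excess)
    u, v = rng.choices(nodes, weights=[excess[n] for n in nodes], k=2)
    return 2, (u, v)

rng = random.Random(0)
blocks = [[0, 1, 2], [3, 4, 5, 6]]
block_samples = [3, 5]
excess = {0: 0.2, 1: 0.2, 2: 0.5, 3: 1.0, 4: 1.0, 5: 2.0, 6: 4.0}
edges = [sample_edge(blocks, block_samples, excess, 4, rng) for _ in range(12)]
print(edges)
```

Since no call depends on the result of another, the calls can be distributed across workers without coordination, which is the property the slide title refers to.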

Visualization of BTER Adjacency Matrix

Red = Phase 1, Blue = Phase 2

Visualization of BTER Graph

Nodes colored by degree (darker = higher)

Phase 1 = Blue, Phase 2 = Green

Image courtesy of Nurcan Durak.

Co-authorship (ca-AstroPh)

• 18,771 nodes, 396,100 edges; based on arXiv
• BTER Phase 1 edges: 290,268; Phase 2 edges: 112,808
• Global clustering coefficient: Original: 0.32, BTER: 0.31, CL: 0.01
• Normalized size of the largest connected component: Original: 0.95, BTER: 0.86, CL: 1.0

Eigenvalues are not determined by the degree distribution.

Trust Network

• 75,879 vertices, 811,480 edges; based on the Epinions web site; edges represent trust between two users
• BTER Phase 1 edges: 300,162; Phase 2 edges: 515,192
• Global clustering coefficient: Original: 0.07, BTER: 0.07, CL: 0.03
• Normalized size of the largest connected component: Original: 1.0, BTER: 0.96, CL: 0.98

BTER provides good matches to real data, even when the clustering coefficients are not too high.

Concluding Remarks

• Models are a critical challenge in network analysis
• The challenge is not in building a formal model, but in formalizing the modeling process itself
• We propose the Block Two-Level Erdős-Rényi (BTER) model
– New theory says there must be many dense subgraphs for high clustering coefficient
– New BTER model explicitly creates dense communities using ER
– Exceptional similarities to real data in terms of clustering coefficients and eigenvalues
• The code is available at http://www.sandia.gov/~tgkolda/bter_supplement/.
• Scalable versions (Hadoop and MPI) will be available soon
• For more information,
– Ali Pinar apinar@sandia.gov

A new workshop

• SIAM Workshop on Network Science
– Proposed; under review
• Dates: July 7-8, 2013
• Place: San Diego, CA
• Co-located with SIAM Annual Meeting
• Contact:
– Ali Pinar (apinar@sandia.gov), Sandia National Labs
– Madhav Marathe (mmarathe@vbi.vt.edu), Virginia Tech

Relevant Publications

• Modeling
– C. Seshadhri, T. Kolda, and A. Pinar, “The Blocked Two-Level Erdos Renyi Graph Model,” Phys. Rev. E 2012.
– C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth analysis of Stochastic Kronecker Graphs,” submitted.
– A. Pinar, C. Seshadhri, and T. Kolda, “The Similarity of Stochastic Kronecker Graphs to Edge-Configuration Models,” SDM’12.
– C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth study of Stochastic Kronecker Graphs,” ICDM’12.
• Generating a random graph
– J. Ray, A. Pinar, and C. Seshadhri, “Are we there yet? When to stop a Markov chain while generating random graphs,” submitted.
– I. Stanton and A. Pinar, “Constructing and uniform sampling graphs with prescribed joint degree distribution using Markov Chains,” to appear in ACM JEA.
– I. Stanton and A. Pinar, “Sampling graphs with prescribed joint degree distribution using Markov Chains,” ALENEX’11.
• Community structure
– C. Seshadhri, A. Pinar, and T. Kolda, “Fast Triangle Counting through Wedge Sampling,” submitted.
– M. Rocklin and A. Pinar, “On Clustering on Graphs with Multiple Edge Types,” Internet Math. 2012.
– M. Rocklin and A. Pinar, “Latent Clustering on Graphs with Multiple Edge Types,” Proc. 8th Workshop on Algorithms and Models for the Web Graph, WAW’11.
– M. Rocklin and A. Pinar, “Computing an Aggregate Edge-weight function for Clustering Graphs with Multiple Edge Types,” WAW’10.
