tools and primitives for high performance graph computationgilbert/talks/siamannual12... ·...

35
1 Tools and Primitives for High Performance Graph Computation John R. Gilbert University of California, Santa Barbara Aydin Buluç (LBNL) Adam Lugowski (UCSB) SIAM Minisymposium on Analyzing Massive Real-World Graphs July 12, 2010 Support: NSF, DARPA, DOE, Intel

Upload: others

Post on 26-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

1

Tools and Primitives for High Performance Graph Computation

John R. GilbertUniversity of California, Santa Barbara

Aydin Buluç (LBNL)Adam Lugowski (UCSB)

SIAM Minisymposium on Analyzing Massive Real-World GraphsJuly 12, 2010

Support: NSF, DARPA, DOE, Intel

Page 2: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

2

An analogy?

As the “middleware” of scientific computing,

linear algebra has suppliedor enabled:

• Mathematical tools

• “Impedance match” to computer operations

• High-level primitives

• High-quality software libraries

• Ways to extract performancefrom computer architecture

• Interactive environments

Computers

Continuousphysical modeling

Linear algebra

Page 3: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

3

An analogy?

Computers

Continuousphysical modeling

Linear algebra

Discretestructure analysis

Graph theory

Computers

Page 4: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

4

An analogy? Well, we’re not there yet ….

Discretestructure analysis

Graph theory

Computers

√ Mathematical tools

? “Impedance match” to computer operations

? High-level primitives

? High-quality software libs

? Ways to extract performancefrom computer architecture

? Interactive environments

Page 5: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

5

• By analogy to numerical scientific computing. . .

• What should the combinatorial BLAS look like?

The Primitives Challenge

C = A*B

y = A*x

μ = xT y

Basic Linear Algebra Subroutines (BLAS):Speed (MFlops) vs. Matrix Size (n)

Page 6: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

6

Primitives should …

• Supply a common notation to express computations

• Have broad scope but fit into a concise framework

• Allow programming at the appropriate level ofabstraction and granularity

• Scale seamlessly from desktop to supercomputer

• Hide architecture-specific details from users

Page 7: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

7

The Case for Sparse Matrices

Many irregular applications contain coarse-grained parallelism that can be exploited

by abstractions at the proper level.

Traditional graph computations

Graphs in the language of linear algebra

Data driven,unpredictable communication.

Fixed communication patterns

Irregular and unstructured, poor locality of reference

Operations on matrix blocks exploit memory hierarchy

Fine grained data accesses, dominated by latency

Coarse grained parallelism, bandwidth limited

The case for sparse matrices

Page 8: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

8

Identification of Primitives

Sparse matrix-matrix multiplication (SpGEMM)

Element-wise operations

x

Matrices on various semirings: (x, +) , (and, or) , (+, min) , …

Sparse matrix-dense vector multiplication

Sparse matrix indexing

x

.*

Sparse array-based primitives

Page 9: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

9

Multiple-source breadth-first search

X

1 2

3

4 7

6

5

AT

Page 10: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

10

XAT ATX

1 2

3

4 7

6

5

Multiple-source breadth-first search

Page 11: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

11

• Sparse array representation => space efficient

• Sparse matrix-matrix multiplication => work efficient

• Three possible levels of parallelism: searches, vertices, edges

XAT ATX

1 2

3

4 7

6

5

Multiple-source breadth-first search

Page 12: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

12

A Few Examples

Page 13: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

13

Betweenness Centrality (BC)What fraction of shortest paths pass through this node?

Brandes’ algorithm

A parallel graph library based on distributed-memory sparse arrays

and algebraic graph primitives

Typical software stack

Combinatorial BLAS[Buluc, G]

Page 14: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

14

BC performance in distributed memory

• TEPS = Traversed Edges Per Second

• One page of code using C-BLAS

0

50

100

150

200

250

25 36 49 64 81 100

121

144

169

196

225

256

289

324

361

400

441

484

TEPS

sco

reM

illio

ns

Number of Cores

BC performance

Scale 17

Scale 18

Scale 19

Scale 20

RMAT power-law graph,

2Scale vertices, avg degree 8

Page 15: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

15

KDT: A toolbox for graph analysis and pattern discovery[G, Reinhardt, Shah]

Layer 1: Graph Theoretic Tools

• Graph operations

• Global structure of graphs

• Graph partitioning and clustering

• Graph generators

• Visualization and graphics

• Scan and combining operations

• Utilities

Page 16: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

16

MATLAB®

Star-P architecture

Ordinary Matlab variables

Star-P

client manager

server manager

package manager

processor #0

processor #n-1

processor #1

processor #2

processor #3

. . .

ScaLAPACKFFTWFPGA interface

matrix manager Distributed matrices

sortdense/sparse

UPC user code

MPI user code

Page 17: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

17

Landscape connectivity modeling

• Habitat quality, gene flow, corridor identification, conservation planning

• Pumas in southern California: 12 million nodes, < 1 hour

• Targeting larger problems: Yellowstone-to-Yukon corridor

Figures courtesy of Brad McRae

Page 18: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

18

Circuitscape [McRae, Shah]

• Predicting gene flow with resistive networks

• Matlab, Python, and Star-P (parallel) implementations

• Combinatorics:– Initial discrete grid: ideally 100m resolution (for pumas)

– Partition landscape into connected components

– Graph contraction: habitats become nodes in resistive network

• Numerics:– Resistance computations for pairs of habitats in the landscape

– Iterative linear solvers invoked via Star-P: Hypre (PCG+AMG)

Page 19: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

19

A Few Nuts & Bolts

Page 20: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

20

Why focus on SpGEMM?

• Graph clustering (Markov, peer pressure)• Subgraph / submatrix indexing• Shortest path calculations • Betweenness centrality• Graph contraction• Cycle detection• Multigrid interpolation & restriction• Colored intersection searching• Applying constraints in

finite element computations• Context-free parsing ...

11

11

1 x x

SpGEMM: sparse matrix x sparse matrix

Page 21: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

21

Two Versions of Sparse GEMM

A1 A2 A3 A4 A7A6A5 A8 B1 B2 B3 B4 B7B6B5 B8 C1 C2 C3 C4 C7C6C5 C8

j

x =i

kk

Cij

Cij += Aik Bkj

Ci = Ci + A Bi

x =1D block-column distribution

2D block distribution

Page 22: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

22

Three levels of parallelism from 2-D data decomposition:• columns of X : over multiple simultaneous searches

• rows of X & columns of AT: over multiple frontier nodes

• rows of AT: over edges incident on high-degree frontier nodes

XAT ATX

1 2

3

4 7

6

5

Parallelism in multiple-source BFS

Page 23: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

23

Modeled limits on speedup, sparse 1-D & 2-D

• 1-D algorithms do not scale beyond 40x

• Break-even point is around 50 processors

N P

1-D algorithm

N P

2-D algorithm

Page 24: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

24

Submatrices are hypersparse (nnz << n)

p blocks

blocks

0→p

c

Total memory using compressed sparse

columns = )( nnzpn +Ο

Average nonzeros per column within block =

Average nonzeros per column = c

Any algorithm whose complexity depends on matrix dimension n is asymptotically too wasteful.

p

Page 25: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

25

Distributed-memory sparse matrix-matrix multiplication

j

* =i

kk

Cij

Cij += Aik * Bkj

2D block layout Outer product formulation Sequential “hypersparse” kernel

• Scales well to hundreds of processors

• Betweenness centrality benchmark: over 200 MTEPS

• Experiments: TACC Lonestar cluster

Time vs Number of cores -- 1M-vertex RMAT

Page 26: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

26

CSB: Compressed sparse block storage[Buluc, Fineman, Frigo, G, Leiserson]

Page 27: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

27

CSB for parallel Ax and ATx[Buluc, Fineman, Frigo, G, Leiserson]

• Efficient multiplication of a sparse matrix and its transpose by a vector

• Compressed sparse block storage

• Critical path never more than ~ sqrt(n)*log(n)

• Multicore / multisocket architectures

Page 28: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

28

From Semirings to Computational Patterns

Page 29: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

29

Matrices over semirings

• Matrix multiplication C = AB (or matrix/vector):

Ci,j = Ai,1×B1,j + Ai,2×B2,j + · · · + Ai,n×Bn,j

• Replace scalar operations × and + by

⊗ : associative, distributes over ⊕, identity 1

⊕ : associative, commutative, identity 0 annihilates under ⊗

• Then Ci,j = Ai,1⊗B1,j ⊕ Ai,2⊗B2,j ⊕ · · · ⊕ Ai,n⊗Bn,j

• Examples: (×,+) ; (and,or) ; (+,min) ; . . .

• No change to data reference pattern or control flow

Page 30: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

30

From semirings to computational patterns

Sparse matrix times vector as a semiring operation:

– Given vertex data xi and edge data ai,j

– For each vertex j of interest, compute

yj = ai1,j⊗xi1 ⊕ ai2,j⊗xi2 ⊕ · · · ⊕ aik,j⊗ xik

– User specifies: definition of operations ⊗ and ⊕

Page 31: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

31

Sparse matrix times vector as a computational pattern:

– Given vertex data and edge data

– For each vertex of interest, combine data from neighboring vertices and edges

– User specifies: desired computation on data from neighbors

From semirings to computational patterns

Page 32: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

32

• Explore length-two paths that use specified vertices

• Possibly do some filtering, accumulation, or other computation with vertex and edge attributes

• E.g. “friends of friends” (per Lars Backstrom)

• May or may not want to form the product graph explicitly

• Formulation as semiring matrix multiplication is often possible but sometimes clumsy

• Same data flow and communication patterns as in SpGEMM

SpGEMM as a computational pattern

Page 33: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

33

Graph BLAS: A pattern-based library

• User-specified operations and attributes give the performance benefits of algebraic primitives with a more intuitive and flexible interface.

• Common framework integrates algebraic (edge-based), visitor (traversal-based), and map-reduce patterns.

• 2D compressed sparse block structure supports user-defined edge/vertex/attribute types and operations.

• “Hypersparse” kernels tuned to reduce data movement.

• Initial target: manycore and multisocket shared memory.

Page 34: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

34

Challenge: Complete the analogy . . .

Discretestructure analysis

Graph theory

Computers

√ Mathematical tools

? “Impedance match” to computer operations

? High-level primitives

? High-quality software libs

? Ways to extract performancefrom computer architecture

? Interactive environments

Page 35: Tools and Primitives for High Performance Graph Computationgilbert/talks/SIAMannual12... · 2010-07-12 · 1 Tools and Primitives for High Performance Graph Computation. John R. Gilbert

35

Some challenges

• Fault tolerance

• Uncertainty and probabilistic attributes

• Fine-grained dynamic updates

• Interactive environments for exploration & productivity

• Hardware architecture