Computation on meshes, sparse matrices, and graphs

Some slides are from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267


Page 1: Computation on meshes, sparse matrices, and graphs (title slide)

Page 2: Parallelizing Stencil Computations

• Parallelism is simple
• Grid is a regular data structure
• Even decomposition across processors gives load balance
• Spatial locality limits communication cost
  • Communicate only boundary values from neighboring patches
• Communication volume: v = total # of boundary cells between patches

Page 3: Two-dimensional block decomposition

• n mesh cells, p processors
• Each processor has a patch of n/p cells
• Block row (or block column) layout: v = 2 * p * sqrt(n)
• Two-dimensional block layout: v = 4 * sqrt(p) * sqrt(n)
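A quick numerical check of the two formulas (a small script, not from the slides; the sizes are made up): for n = 10^6 cells on p = 100 processors, the 2D block layout moves five times less boundary data.

    import math

    n, p = 10**6, 100                          # example mesh size and processor count
    v_row   = 2 * p * math.sqrt(n)             # block row layout
    v_block = 4 * math.sqrt(p) * math.sqrt(n)  # 2D block layout
    print(v_row, v_block)                      # 200000.0 40000.0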

Page 4: Ghost Nodes in Stencil Computations

Comm cost = α * (#messages) + β * (total size of messages)

• Keep a ghost copy of neighbors’ boundary nodes (see the sketch after the figure legend)
• Communicate every second iteration, not every iteration
• Reduces #messages, not total size of messages
• Costs extra memory and computation
• Can also use more than one layer of ghost nodes

Green = my interior nodes

Yellow = neighbors’ boundary nodes = my “ghost nodes”

Blue = my boundary nodes
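As a concrete illustration, here is a minimal sketch (Python with mpi4py; sizes and iteration count are made up) of a stencil sweep with a 1D row decomposition and one ghost layer exchanged every iteration. The every-second-iteration trick above would need a second ghost layer.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    rows, cols = 100, 100                       # my local patch (hypothetical size)
    u = np.zeros((rows + 2, cols))              # +2 for ghost rows above and below

    up   = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for it in range(50):
        # fill my ghost rows with my neighbors' boundary rows
        comm.Sendrecv(u[1, :],  dest=up,   recvbuf=u[-1, :], source=down)
        comm.Sendrecv(u[-2, :], dest=down, recvbuf=u[0, :],  source=up)
        # 5-point stencil (Jacobi) update on interior cells
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])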

Page 5: Irregular mesh: NASA Airfoil in 2D

Page 6: Composite Mesh from a Mechanical Structure

Page 7: Adaptive Mesh Refinement (AMR)

• Adaptive mesh around an explosion

• Refinement done by calculating errors

Page 8: Adaptive Mesh

Shock waves in gas dynamics simulated using AMR (Adaptive Mesh Refinement); the figure shows fluid density. See: http://www.llnl.gov/CASC/SAMRAI/

Page 9: Irregular mesh: Tapered Tube (Multigrid)

Page 10: Challenges of Irregular Meshes for PDEs

• How to generate them in the first place
  • For example, Triangle, a 2D mesh generator (Shewchuk)
  • 3D mesh generation is harder! For example, QMD (Vavasis)
• How to partition them into patches
  • For example, ParMetis, a parallel graph partitioner (Karypis)
• How to design iterative solvers
  • For example, PETSc, Aztec, Hypre (all from national labs)
  • … Prometheus, a multigrid solver for finite elements on irregular meshes
• How to design direct solvers
  • For example, SuperLU, parallel sparse Gaussian elimination
• These are challenges to do sequentially, more so in parallel

Page 11: Converting the Mesh to a Matrix

Page 12: Sparse matrix data structure (stored by rows)

• Full:
  • 2-dimensional array of real or complex numbers
  • (nrows * ncols) memory

Example (3-by-3):

    31   0  53
     0  59   0
    41  26   0

• Sparse:
  • compressed row storage (CSR)
  • about (2*nzs + nrows) memory

CSR form of the example (1-based):

    value:     31 53 59 41 26
    col index:  1  3  2  1  2
    row start:  1  3  4
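For reference, the same arrays as scipy.sparse builds them (a check script, not from the slides; scipy uses 0-based indices and stores nrows+1 row pointers):

    import numpy as np
    from scipy.sparse import csr_matrix

    A = csr_matrix(np.array([[31,  0, 53],
                             [ 0, 59,  0],
                             [41, 26,  0]]))

    print(A.data)     # values:      [31 53 59 41 26]
    print(A.indices)  # col indices: [0 2 1 0 1]
    print(A.indptr)   # row starts:  [0 2 3 5]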

Page 13: Distributed row sparse matrix data structure

(Figure: the matrix is split into block rows owned by processors P0, P1, P2, …, Pn, each stored locally in CSR form.)

Each processor stores:
• # of local nonzeros
• range of local rows
• nonzeros in CSR form
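A minimal sketch of the per-processor piece in Python (the field names are hypothetical, not from the slides):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class LocalBlockRow:
        row_start: int       # global index of my first row
        row_end: int         # one past my last global row
        values: np.ndarray   # my nonzero values, row by row
        colidx: np.ndarray   # global column index of each nonzero
        rowptr: np.ndarray   # local CSR row pointers, length (row_end - row_start) + 1

        @property
        def local_nnz(self):
            # number of local nonzeros
            return len(self.values)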

Page 14: Conjugate gradient iteration to solve A*x = b

• One matrix-vector multiplication per iteration
• Two vector dot products per iteration
• Four n-vectors of working storage

x_0 = 0,  r_0 = b,  d_0 = r_0

for k = 1, 2, 3, …
    α_k = (r_{k-1}ᵀ r_{k-1}) / (d_{k-1}ᵀ A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                        approximate solution
    r_k = r_{k-1} − α_k A d_{k-1}                      residual
    β_k = (r_kᵀ r_k) / (r_{k-1}ᵀ r_{k-1})              improvement
    d_k = r_k + β_k d_{k-1}                            search direction
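The same iteration in Python/numpy (a compact sketch; a fixed iteration count stands in for a convergence test, and A @ d stands in for the sparse or distributed matvec):

    import numpy as np

    def conjugate_gradient(A, b, iters=50):
        # A: symmetric positive definite matrix; b: right-hand side
        x = np.zeros_like(b, dtype=float)
        r = b.astype(float)            # r_0 = b - A*x_0 = b
        d = r.copy()                   # d_0 = r_0
        rs = r @ r
        for _ in range(iters):
            Ad = A @ d                 # the one matvec per iteration
            alpha = rs / (d @ Ad)      # step length (first dot product)
            x = x + alpha * d          # approximate solution
            r = r - alpha * Ad         # residual
            rs_new = r @ r             # second dot product
            d = r + (rs_new / rs) * d  # new search direction
            rs = rs_new
        return x

The working storage is the four n-vectors x, r, d, and Ad, matching the count above.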

Page 15: Parallel Dense Matrix-Vector Product (Review)

• y = A*x, where A is a dense matrix
• Layout: 1D by rows
• Algorithm (see the sketch after the figure):
    For each processor i:
        Broadcast x(i)
        Compute y(i) = A(i)*x
• A(i) is the n/p-by-n block row that processor Pi owns
• Algorithm uses the formula
    y(i) = A(i)*x = Σ_j A(i,j)*x(j)

(Figure: block rows of A owned by P0–P3, with x and y partitioned conformally.)
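A sketch of the row-layout algorithm in mpi4py (hypothetical sizes; assumes n is divisible by p): every processor gathers all of x, then multiplies by its own block row.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    p, i = comm.Get_size(), comm.Get_rank()
    n = 8 * p                          # global size (assumed divisible by p)

    A_i = np.random.rand(n // p, n)    # my block row A(i)
    x_i = np.random.rand(n // p)       # my piece x(i)

    x = np.empty(n)
    comm.Allgather(x_i, x)             # every processor broadcasts its x(j)
    y_i = A_i @ x                      # y(i) = A(i) * x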

Page 16: Parallel sparse matrix-vector product

• Lay out matrix and vectors by rows
• y(i) = sum(A(i,j)*x(j))
• Only compute terms with A(i,j) ≠ 0
• Algorithm:
    Each processor i:
        Broadcast x(i)
        Compute y(i) = A(i,:)*x
• Optimizations:
  • Only send each processor the parts of x it needs, to reduce communication (see the sketch after the figure)
  • Reorder the matrix for better locality by graph partitioning
  • Worry about balancing the number of nonzeros per processor, if rows have very different nonzero counts

(Figure: as in the dense case, P0–P3 own block rows of the sparse matrix and the matching pieces of x and y.)
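A small serial sketch (scipy; the sizes are made up) of the first optimization above: the entries of x a block row actually needs are exactly the distinct column indices of its nonzeros.

    import numpy as np
    from scipy import sparse

    A = sparse.random(8, 8, density=0.2, format='csr', random_state=0)
    my_rows = A[0:4, :]                  # the block row this processor owns
    needed = np.unique(my_rows.indices)  # columns touched = entries of x to fetch
    print(needed)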

Page 17: Other memory layouts for matrix-vector product

• Column layout of the matrix eliminates the broadcast
  • But adds a reduction to update the destination: same total communication
• Blocked layout uses a broadcast and a reduction, each on only sqrt(p) processors: less total communication
• Blocked layout has advantages in multicore / Cilk++ too

(Figure: column layout across P0–P3; blocked layout on a 4-by-4 processor grid P0–P15.)

Page 18: Graphs and Sparse Matrices

(Figure: a 6-by-6 sparse matrix of 0/1 entries and the corresponding graph on vertices 1–6; each nonzero A(i,j) is an edge between vertices i and j.)

• Sparse matrix is a representation of a (sparse) graph
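To make the correspondence concrete, a few lines of scipy (a toy 4-vertex graph, not the one in the figure):

    import numpy as np
    from scipy.sparse import csr_matrix

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]    # directed edges i -> j
    rows, cols = zip(*edges)
    A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(4, 4))
    # A[i, j] != 0 exactly when the graph has an edge i -> j
    print(A.toarray())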

Page 19: Graph partitioning

• Assigns subgraphs to processors
• Determines parallelism and locality
• Tries to make subgraphs all the same size (load balance)
• Tries to minimize edge crossings (communication)
• Exact minimization is NP-complete

(Figure: two partitions of the same graph, one with edge crossings = 6 and one with edge crossings = 10.)
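Counting a given partition's edge crossings is simple (a toy helper, not from the slides); minimizing them is the NP-complete part.

    def edge_crossings(edges, part):
        # part[v] = processor that owns vertex v
        return sum(1 for u, v in edges if part[u] != part[v])

    # a 4-cycle split across 2 processors: 2 edges cross the cut
    print(edge_crossings([(0, 1), (1, 2), (2, 3), (3, 0)],
                         {0: 0, 1: 0, 2: 1, 3: 1}))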

Page 20: Link analysis of the web

• Web page = vertex

• Link = directed edge

• Link matrix: A(i,j) = 1 if page i links to page j

(Figure: a 7-page web graph and its 7-by-7 link matrix, rows and columns indexed 1–7.)

Page 21: Web graph: PageRank (Google) [Brin, Page]

• Markov process: follow a random link most of the time; otherwise, go to any page at random.

• Importance = stationary distribution of Markov process.

• Transition matrix is p*A + (1-p)*ones(size(A)), scaled so each column sums to 1.

• Importance of page i is the i-th entry in the principal eigenvector of the transition matrix.

• But the matrix is 1,000,000,000,000 by 1,000,000,000,000.

An important page is one that many important pages point to.
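The standard workaround is the power method with the teleport term folded in, so each iteration needs only a sparse matvec, never the dense ones(...) matrix. A sketch (assuming M is the column-normalized link matrix, i.e. the slide's A transposed with each column scaled to sum to 1, and p = 0.85 as a typical link-following probability):

    import numpy as np

    def pagerank(M, p=0.85, iters=100):
        # M: sparse column-stochastic matrix; M[j, i] > 0 if page i links to page j
        n = M.shape[0]
        x = np.full(n, 1.0 / n)            # start from the uniform distribution
        for _ in range(iters):
            x = p * (M @ x) + (1 - p) / n  # sparse matvec + uniform teleport
        return x                           # approximate stationary distribution

The (1 - p) / n term works because x sums to 1, so the dense teleport matrix applied to x reduces to a constant vector.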

Page 22: A PageRank Matrix

• Importance ranking of web pages
• Stationary distribution of a Markov chain
• Power method: matvec and vector arithmetic
• Matlab*P page-ranking demo (from SC’03) on a web crawl of mit.edu (170,000 pages)

Page 23: Social Network Analysis in Matlab: 1993

Co-author graph from the 1993 Householder symposium

Page 24: Social Network Analysis in Matlab: 1993

Which author has the most collaborators?

>> [count, author] = max(sum(A))
count = 32
author = 1

>> name(author, :)
ans = Golub

(Figure: the sparse adjacency matrix.)

Page 25: Social Network Analysis in Matlab: 1993

Have Gene Golub and Cleve Moler ever been coauthors?

>> A(Golub, Moler)
ans = 0

No. But how many coauthors do they have in common? Squaring the adjacency matrix counts paths of length two: (A^2)(i,j) = Σ_k A(i,k)*A(k,j) = number of common neighbors of i and j.

>> AA = A^2;
>> AA(Golub, Moler)
ans = 2

And who are those common coauthors?

>> name( find( A(:,Golub) .* A(:,Moler) ), : )
ans =
Wilkinson
VanLoan