global address space applications kathy yelick nersc/lbnl and u.c. berkeley

12
Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Upload: baldwin-burns

Post on 26-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Global Address Space Applications

Kathy Yelick

NERSC/LBNL and U.C. Berkeley

Page 2: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Algorithm Space

Regularity

Reuse

Two-sided dense linear algebra

One-sided dense linear algebra

FFTs

Sparse iterative solvers

Sparse direct solvers

Asynchronous discrete even simulation

Grobner Basis (“Symbolic LU”)

Search

Sorting

Page 3: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Scaling Applications

• Machine Parameters- Floating point performance

- Application dependent, not theoretical peak

- Amount of memory per processor- Use 1/10th for algorithm data

- Communication Overhead- Time processor is busy sending a message

- Cannot be overlapped

- Communication Latency- Time across the network (can be overlapped)

- Communication Bandwidth - Single node and bisection

• Back-of-the envelope calculations !

Page 4: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Running Sparse MVM on a Pflop

- 1 GHz * 8 pipes * 8 ALUs/Pipe = 64 GFLOPS/node peak

- 8 Address generators limit performance to 16 Gflops

- 500ns latency, 1 cycle put/get overhead, 100 cycle MP overhead

- Programmability differences too: packing vs. global address space

1.E+07

1.E+08

1.E+09

1.E+10

1.E+11

1.E+12

1.E+13

1.E+14

1.E+15

1.E+16

Op

s/s

ec

Put/Get

Blocking read/w rite

Synchronous MP

Asynch MP

Peak

Page 5: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Effect of Memory Size

• Low overhead is important for - Small memory nodes or smaller problem sizes- Programmability

1.E+07

1.E+08

1.E+09

1.E+10

1.E+11

1.E+12

1.E+13

1.E+14

1.E+15

1.E+16

0.3

0.5

1.0

2.1

4.1

8.2

16.4

32.9

65.8

131.

626

3.1

526.

3

1052

.5

2105

.0

4210

.0

MB/node of data

Op

s/s

ec

Put/Get

Blocking read/w rite

Synchronous MP

Asynch MP

Peak

Page 6: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Parallel Applications in Titanium

• Genome Application• Heart simulation • AMR elliptic and hyperbolic solvers• Scalable Poisson for infinite domains• Genome application• Several smaller benchmarks: EM3D, MatMul, LU,

FFT, Join

Page 7: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

MOOSE Application

• Problem: Microarray construction- Used for genome experiments- Possible medical applications long-term

• Microarray Optimal Oligo Selection Engine (MOOSE) - A parallel engine for selecting the best

oligonucleotide sequences for genetic microarray testing

- Uses dynamic load balancing within Titanium

Page 8: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Heart Simulation

• Problem: compute blood flow in the heart- Model as elastic structure in incompressible fluid.

- “Immersed Boundary Method” [Peskin and McQueen]- Particle/Mesh method stress communication performance- 20 years of development in model

- Many other applications: blood clotting, inner ear, insect flight, embryo growth,…

• Can be used for design of prosthetics- Artificial heart valves- Cochlear implants

Page 9: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Scalable Poisson Solver

• MLC for Finite-Differences by Balls and Colella• Poisson equation with infinite boundaries

- arise in astrophysics, some biological systems, etc.• Method is scalable

- Low communication • Performance on

- SP2 (shown) and t3e- scaled speedups- nearly ideal (flat)

• Currently 2D and non-adaptive

• Point charge example shown- Rings & star charges- Relative error shown

-6.4

7x10

-9

0

1.31

x10-9

Page 10: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

AMR Gas Dynamics

• Developed by McCorquodale and Colella• 2D Example (3D supported)

- Mach-10 shock on solid surface at oblique angle

• Future: Self-gravitating gas dynamics package

Page 11: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

UPC Application Investigations

• Pyramid- 3D Mesh generation [Shewchuk]- 2D version (triangle) critical in Quake project- Written in C, challenge to parallelize

• SuperLU- Sparse direct solver [Li,Demmel]- Written in C+MPI or threads- UPC may enable new algorithmic techniques

• N-Body simulation- “Simulating the Universe”

Page 12: Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley

Summary

• UPC Killer App should- Leverage programmability: hard in MPI- Use fine-grained, irregular, asynchronous

communication • Libraries

- Must allow for interface to libraries - MPI libraries, multithreaded libraries, serial

libraries• Compilation needs

- High performance on at least one machine- Portability across many machines