evaluation of blue gene performance and applicability · 2009-02-05 · san diego supercomputer...

SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO Evaluation of Blue Gene Performance and Applicability Wayne Pfeiffer July 21, 2006


Page 1: Evaluation of Blue Gene Performance and Applicability · 2009-02-05 · SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO We had several reasons for acquiring

SAN DIEGO SUPERCOMPUTER CENTER

at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Evaluation of Blue Gene Performance and Applicability

Wayne Pfeiffer
July 21, 2006

Page 2

SDSC has a single-rack Blue Gene system with 1,024 compute nodes & 128 I/O nodes, the maximum number of I/O nodes allowed
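The node counts above can be turned into processor counts and an I/O ratio with simple arithmetic. The sketch below assumes two processors per Blue Gene node, which is consistent with coprocessor mode using 1p/node and with the processor axes later in this deck reaching 2,048; the function name is hypothetical.

```python
def bg_rack_summary(compute_nodes: int, io_nodes: int, procs_per_node: int = 2) -> dict:
    """Derive processor counts and the compute-to-I/O-node ratio for one rack.

    procs_per_node=2 is an assumption about Blue Gene nodes, not a number
    stated on this slide.
    """
    return {
        # Total processors in the rack
        "processors": compute_nodes * procs_per_node,
        # Compute nodes served by each I/O node
        "compute_nodes_per_io_node": compute_nodes // io_nodes,
        # CO mode runs one MPI process per node
        "co_mode_mpi_processes": compute_nodes,
        # VN mode runs one MPI process per processor
        "vn_mode_mpi_processes": compute_nodes * procs_per_node,
    }


summary = bg_rack_summary(compute_nodes=1024, io_nodes=128)
print(summary)
```

Under that assumption, SDSC's configuration gives 2,048 processors and 8 compute nodes per I/O node.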

Page 3

We had several reasons for acquiring our Blue Gene system

• We wanted to see what fraction of our applications could be usefully run on Blue Gene to plan for the future

• We hoped to offload some of our major applications from DataStar, our large cluster with mostly 8-way p655 nodes

• We wanted to see how well Blue Gene could support I/O-intensive applications

Page 4

Blue Gene offers many pluses

+ Hardware is more reliable than that of other high-end systems installed at SDSC in recent years
+ Compute times are extremely reproducible
+ Networks scale well
+ I/O performance with GPFS is good (given SDSC’s max I/O-node configuration)
+ Price per peak flop/s is low
+ Power per flop/s is low
+ Footprint is small

Page 5

But there are also some minuses

- Processors are relatively slow
  • Clock speed is 700 MHz
  • Compilers make little use of second FPU in each processor (though optimized libraries do much better)
- Applications must scale well to get high absolute performance
- Memory is only 512 MB/node, so some problems don’t fit
  • Coprocessor mode can be used (with 1p/node), but this is inefficient
  • Some problems still don’t fit even in coprocessor mode
- Cross-compiling complicates software development for complex codes
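The memory limit above can be made concrete. With 512 MB per node, VN mode splits a node's memory between two MPI processes, while CO mode gives one process the whole node at the cost of one processor per node not computing. This is a minimal sketch, assuming two processors per node and ignoring memory used by the compute-node kernel and MPI buffers, so real usable amounts are somewhat lower; the function names and the per-process requirements in the example are hypothetical.

```python
NODE_MEMORY_MB = 512  # memory per Blue Gene node, as stated on the slide
PROCESSES_PER_NODE = {"CO": 1, "VN": 2}  # assumption: 2 processors per node


def memory_per_mpi_process(mode: str, node_memory_mb: int = NODE_MEMORY_MB) -> float:
    """Memory available to each MPI process, ignoring kernel and MPI overhead."""
    return node_memory_mb / PROCESSES_PER_NODE[mode]


def choose_mode(required_mb_per_process: float) -> str:
    """Return the most processor-efficient mode in which a problem fits."""
    if required_mb_per_process <= memory_per_mpi_process("VN"):
        return "VN"
    if required_mb_per_process <= memory_per_mpi_process("CO"):
        return "CO"  # fits, but only one processor per node computes
    return "does not fit"


for need in (200, 400, 600):  # hypothetical per-process requirements in MB
    print(need, "MB ->", choose_mode(need))
```

The three outcomes mirror the slide: a problem either runs efficiently in VN mode, falls back to the less efficient CO mode, or does not fit at all.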

Page 6

Compute, communication, & I/O speeds have been measured for many synthetic & application benchmarks on Blue Gene & DataStar (with 1.5-GHz Power4+ procs)

• Synthetics
  • sloops
  • HPL (Linpack)
  • HPC Challenge
  • NAS Parallel Benchmarks
  • IOR

• Applications
  • Amber 9 PMEMD (biophysics: molecular dynamics)
  • …
  • SPECFEM3D (geophysics: seismic wave propagation)

Page 7

Speed of BG relative to DataStar is as good as or better than the clock speed ratio for HPCC benchmarks on 1,024p; G-RandomAccess & RR latency are especially good on BG; CO & VN mode perform similarly (per MPI p)

[Figure: HPCC benchmark on 1,024p. Speed relative to 1.5-GHz DataStar (log scale, 0.1 to 10.0) for G-HPL (ESSL), G-PTRANS, G-FFTE, G-RandomAccess, EP-DGEMM (ESSL), EP-STREAM Triad, Random Ring Bandwidth, and Random Ring Latency; series: BG in CO mode, BG in VN mode; reference line: clock speed ratio = 0.47]
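The 0.47 reference line is simply the ratio of clock rates. As a side note, both processors can in principle complete two fused multiply-adds (4 flops) per cycle at peak, so the peak-flop ratio equals the clock ratio; if the second Blue Gene FPU goes unused, the effective peak halves. The 4 flops/cycle figure is an assumption from general knowledge of these processors, not a number from this deck.

```python
BG_CLOCK_GHZ = 0.7        # Blue Gene processor clock (700 MHz)
DATASTAR_CLOCK_GHZ = 1.5  # DataStar Power4+ clock (1.5 GHz)

# Assumption (not from the slides): both processors can complete two fused
# multiply-adds, i.e. 4 flops, per cycle at peak.
FLOPS_PER_CYCLE = 4

clock_ratio = BG_CLOCK_GHZ / DATASTAR_CLOCK_GHZ
bg_peak = BG_CLOCK_GHZ * FLOPS_PER_CYCLE        # GFlop/s per processor
ds_peak = DATASTAR_CLOCK_GHZ * FLOPS_PER_CYCLE  # GFlop/s per processor
bg_peak_one_fpu = bg_peak / 2                   # if the second FPU sits idle

print(f"clock ratio = {clock_ratio:.2f}")
print(f"peak per processor: BG {bg_peak:.1f}, DataStar {ds_peak:.1f} GFlop/s")
print(f"BG peak using one FPU: {bg_peak_one_fpu:.1f} GFlop/s")
```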

Page 8

Many applications have been ported to BG at SDSC; some run well enough that BG is attractive

Code name | Discipline | Description | Implementors
Amber 9 PMEMD | Biophysics | Molecular dynamics | Ross Walker (SDSC)
AWM | Geophysics | 3-D seismic wave propagation | Yifeng Cui (SDSC)
DNS (ESSL) | Engineering | Direct numerical simulation of 3-D turbulence | Diego Donzis (Georgia Tech) & Dmitry Pekurovsky (SDSC)
DOT (FFTW) | Biophysics | Protein docking | Susan Lindsey (SDSC) & Wayne Pfeiffer (SDSC)
MILC * | Physics | Quantum chromodynamics | Doug Toussaint (Arizona)
mpcugles | Engineering | 3-D fluid dynamics | Giri Chukkapalli (SDSC)
NAMD 2.6b1 (FFTW) * | Biophysics | Molecular dynamics | Sameer Kumar (IBM)
Rosetta * | Biophysics | Protein folding | Ross Walker (SDSC)
SPECFEM3D | Geophysics | 3-D seismic wave propagation | Brian Savage (Carnegie Institution)

* Most heavily used

Page 9

Speed of BG relative to DataStar varies around the clock speed ratio (0.47 = 0.7/1.5) for applications on ≥ 512p; CO & VN mode perform similarly (per MPI p)

[Figure: Speed relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) by application: Amber 9 PMEMD Cellulose: 768p; AWM 512^3 w/o I/O: 1,024p; DNS (ESSL) 1,024^3: 1,024p; DOT (FFTW) UDG/UGI 54k rots: 512p; MILC large: 1,024p; mpcugles forward prop w/o I/O: 512p; NAMD 2.6b1 (FFTW) ApoA1: 512p; Rosetta 5 structs: 1p; SPECFEM3D Tonga-Fiji: 1,024p. Series: BG in CO mode, BG in VN mode; reference line: clock speed ratio = 0.47]
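The deck does not spell out how "speed relative to DataStar" is normalized; a natural reading, assumed here, is the DataStar wall-clock time divided by the Blue Gene time for the same problem on the same number of processors. The sketch below computes that ratio and compares it with the clock speed ratio. The function names are hypothetical, and the timings are made up for illustration, not measured results.

```python
CLOCK_RATIO = 0.7 / 1.5  # Blue Gene / DataStar clock speed, about 0.47


def relative_speed(t_bluegene_s: float, t_datastar_s: float) -> float:
    """Speed of BG relative to DataStar for the same problem and processor count.

    Assumes speed is inversely proportional to wall-clock time.
    """
    return t_datastar_s / t_bluegene_s


def vs_clock_ratio(rel: float) -> str:
    """Classify a relative speed against the clock speed ratio."""
    return "at or above clock ratio" if rel >= CLOCK_RATIO else "below clock ratio"


# Hypothetical timings in seconds, for illustration only (not measured data).
rel = relative_speed(t_bluegene_s=200.0, t_datastar_s=100.0)
print(rel, vs_clock_ratio(rel))
```

A value near 0.47 means Blue Gene is doing about as well per processor as its clock rate alone would predict.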

Page 10

MILC strong scaling on BG is very good, but scaling on DataStar is superlinear for modest p, presumably because of better cache usage

[Figure: MILC medium. Speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 10.0) vs. processors (16 to 2,048); series: 1.5-GHz DataStar, Blue Gene VN mode]
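In a strong-scaling study the problem size is fixed while p grows, so ideal behavior keeps speed per processor constant. A common way to quantify this is parallel efficiency relative to the smallest run, sketched below; values above 1 indicate superlinear scaling of the kind the slide attributes to DataStar's cache behavior. The function name and timings are hypothetical (not MILC data).

```python
def strong_scaling_efficiency(timings: dict[int, float]) -> dict[int, float]:
    """Parallel efficiency relative to the smallest processor count p0.

    E(p) = (p0 * t(p0)) / (p * t(p)); E = 1 is ideal, E > 1 is superlinear.
    This also equals speed per processor at p divided by speed per processor at p0.
    """
    p0 = min(timings)
    base = p0 * timings[p0]
    return {p: base / (p * t) for p, t in sorted(timings.items())}


# Hypothetical fixed-size-problem timings in seconds (not MILC data).
example = {16: 800.0, 32: 380.0, 64: 210.0}
for p, eff in strong_scaling_efficiency(example).items():
    label = "superlinear" if eff > 1.0 else "not superlinear"
    print(f"p={p}: efficiency {eff:.2f} ({label})")
```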

Page 11

NAMD 2.6b1 is much faster than 2.5 on BG (thanks to Sameer Kumar) & has somewhat better strong scaling on BG than on DataStar; CO & VN mode perform similarly (per MPI p)

[Figure: NAMD 2.5 & 2.6b1 for ApoA1 with 92k atoms. Speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs. MPI processors (16 to 2,048); curves for 2.5 and 2.6b1 on 1.5-GHz DataStar, Blue Gene CO mode, and Blue Gene VN mode]

Page 12

DNS strong scaling on BG is generally better than on DataStar, but shows unusual variation; VN mode is somewhat slower than CO mode (per MPI p)

[Figure: DNS 1024^3. Speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs. MPI processors (16 to 2,048); series: 1.5-GHz DataStar, Blue Gene CO mode, Blue Gene VN mode. Data from Dmitry Pekurovsky]

Page 13

If the number of allocated processors is considered, then VN mode is faster than CO mode, and the unusual variation is aligned

[Figure: DNS 1024^3. Speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs. allocated processors (16 to 2,048); series: 1.5-GHz DataStar, Blue Gene CO mode, Blue Gene VN mode. Data from Dmitry Pekurovsky]
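Switching the x-axis from MPI processors to allocated processors is a simple rescaling: in CO mode each MPI process occupies a whole node, so two processors are allocated per MPI process, while in VN mode the two counts are equal. A minimal sketch, assuming two processors per node; the function names and input speeds are hypothetical.

```python
PROCS_PER_NODE = 2  # assumption: two processors per Blue Gene node


def allocated_processors(mpi_processes: int, mode: str) -> int:
    """Processors a job occupies: CO mode runs one MPI process per node (else VN)."""
    return mpi_processes * PROCS_PER_NODE if mode == "CO" else mpi_processes


def speed_per_allocated_processor(speed_per_mpi_process: float, mode: str) -> float:
    """Rescale per-MPI-process speed to per-allocated-processor speed."""
    return speed_per_mpi_process / (PROCS_PER_NODE if mode == "CO" else 1)


# Hypothetical relative speeds per MPI process: CO is faster per MPI p here,
# yet VN comes out ahead once allocated processors are counted.
co = speed_per_allocated_processor(0.50, "CO")
vn = speed_per_allocated_processor(0.40, "VN")
print(co, vn)
```

This is why a run that is slower per MPI process in VN mode can still be the better use of an allocation.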

Page 14

Some applications that have been ported to BG at SDSC need more memory at the scalability limit to run interesting problems

Code name | Discipline | Description | Issues
ASH | Astrophysics | Solar convection | Needs to run in CO mode at scalability limit
Enzo | Astrophysics | Cosmological simulation (both unigrid & AMR) | Not enough memory for large unigrid problems at scalability limit; modest scalability limit with AMR
NAMD | Biophysics | Molecular dynamics | Needs to run in CO mode for large problems at scalability limit
PARATEC | Materials science | Ab initio quantum mechanics | Not enough memory for large problems at scalability limit; modest scalability limit

Page 15

IOR weak scaling scans using GPFS-WAN show that BG in VN mode achieves 3.4 GB/s for writes (~DS) & 2.7 GB/s for reads (>DS)

[Figure: I/O rate (MB/s, log scale, 10 to 10,000) vs. MPI processors (1 to 2,048); series: CO peak, CO write, CO read, VN peak, VN write, VN read]

Blue Gene in CO & VN mode using gpfs-wan (default mapping)

Noncollective read/write via IOR (256 or 128 MB/p)

(-a POSIX -e -t 1m -b 256m or -b 128m)

Data from 3/7/06
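In IOR's option set, the flags listed above select the POSIX API (-a POSIX), an fsync after writing (-e), the transfer size (-t 1m), and the per-process block (-b 256m or -b 128m). Because this is a weak-scaling scan, the aggregate data moved grows linearly with p. The sketch below assembles that option list and estimates the time implied by an aggregate rate. The helper names and the 1,024-process example are hypothetical, treating 3.4 GB/s as 3,400 MB/s is an assumption about the slide's units, and the Blue Gene job launcher invocation is deliberately left out.

```python
def ior_args(block: str = "256m", transfer: str = "1m") -> list[str]:
    """IOR options as listed on the slide (POSIX API, fsync, 1m transfers)."""
    return ["-a", "POSIX", "-e", "-t", transfer, "-b", block]


def aggregate_mb(processes: int, mb_per_process: int) -> int:
    """Total data moved in one phase of a weak-scaling run: grows linearly with p."""
    return processes * mb_per_process


def seconds_at_rate(total_mb: float, rate_mb_per_s: float) -> float:
    """Time implied by an aggregate rate."""
    return total_mb / rate_mb_per_s


# Hypothetical point: 1,024 processes writing 256 MB each at 3.4 GB/s,
# taking 1 GB = 1,000 MB (an assumption about the slide's units).
total = aggregate_mb(1024, 256)
print(" ".join(ior_args()))
print(f"{total} MB in about {seconds_at_rate(total, 3400):.0f} s")
```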

Page 16

Blue Gene has more limited applicability than DataStar, but is a good choice if the application is right

+ Some applications run relatively fast & scale well
+ Turnaround is good with only a few users
+ Hardware is reliable & easy to maintain
- Some applications run slowly or don’t scale well
- Some typical problems need to run in CO mode to fit in memory
- Other typical problems won’t fit at all