TRANSCRIPT
SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Evaluation of Blue Gene Performance and Applicability
Wayne Pfeiffer
July 21, 2006
SDSC has a single-rack Blue Gene system with 1,024 compute nodes & 128 I/O nodes, the maximum allowed
We had several reasons for acquiring our Blue Gene system
• We wanted to see what fraction of our applications could be usefully run on Blue Gene to plan for the future
• We hoped to offload some of our major applications from DataStar, our large cluster with mostly 8-way p655 nodes
• We wanted to see how well Blue Gene could support I/O-intensive applications
Blue Gene offers many pluses
+ Hardware is more reliable than for other high-end systems installed at SDSC in recent years
+ Compute times are extremely reproducible
+ Networks scale well
+ I/O performance with GPFS is good (given SDSC's max I/O-node configuration)
+ Price per peak flop/s is low
+ Power per flop/s is low
+ Footprint is small
But there are also some minuses
- Processors are relatively slow
  • Clock speed is 700 MHz
  • Compilers make little use of the second FPU in each processor (though optimized libraries do much better)
- Applications must scale well to get high absolute performance
- Memory is only 512 MB/node, so some problems don't fit
  • Coprocessor mode can be used (with 1 p/node), but this is inefficient; see the launch sketch after this list
  • Some problems still don't fit even in coprocessor mode
- Cross-compiling complicates software development for complex codes
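CO & VN mode are selected at job launch. Below is a plausible sketch using the Blue Gene/L mpirun front end; -partition, -mode, -np, & -exe are its usual flags, but exact usage varies by site & driver level, & <block> is a placeholder partition name.

  # VN (virtual node) mode: both processors in each node run MPI processes,
  # so each process sees only about half of the 512 MB
  mpirun -partition <block> -mode VN -np 2048 -exe ./a.out

  # CO (coprocessor) mode: one MPI process per node; the second processor
  # assists with communication, & the process sees nearly all of the 512 MB
  mpirun -partition <block> -mode CO -np 1024 -exe ./a.out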
Compute, communication, & I/O speeds have been measured for many synthetic & application benchmarks on Blue Gene & DataStar (with 1.5-GHz Power4+ procs)
• Synthetics
  • sloops
  • HPL (Linpack)
  • HPC Challenge (its EP-STREAM Triad kernel is sketched after this list)
  • NAS Parallel Benchmarks
  • IOR
• Applications
  • Amber 9 PMEMD (biophysics: molecular dynamics)
  • …
  • SPECFEM3D (geophysics: seismic wave propagation)
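As a concrete example of what the simplest synthetics measure, here is a minimal STREAM-Triad-style kernel in C. This is a generic sketch of the technique, not any benchmark's actual source; the array size & timing method are illustrative.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (4 * 1024 * 1024)    /* three 32-MB arrays: far larger than cache */

  int main(void) {
      double *a = malloc(N * sizeof *a);
      double *b = malloc(N * sizeof *b);
      double *c = malloc(N * sizeof *c);
      const double s = 3.0;
      for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

      clock_t t0 = clock();
      for (long i = 0; i < N; i++)
          a[i] = b[i] + s * c[i];          /* Triad: two loads + one store */
      double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

      /* 24 bytes of memory traffic per iteration;
         printing a[0] keeps the loop from being optimized away */
      printf("Triad: %.0f MB/s (check %.1f)\n", 24.0 * N / sec / 1e6, a[0]);
      free(a); free(b); free(c);
      return 0;
  }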
Speed of BG relative to DataStar is as good as or better than the clock speed ratio for HPCC benchmarks on 1,024p; G-Random Access & RR Latency are especially good on BG; CO & VN mode perform similarly (per MPI p)

[Figure: HPCC benchmarks on 1,024p. Speed relative to 1.5-GHz DataStar (log scale, 0.1 to 10.0) for BG in CO mode & BG in VN mode across G-HPL (ESSL), G-PTRANS, G-FFTE, G-Random Access, EP-DGEMM (ESSL), EP-STREAM Triad, Random Ring Bandwidth, & Random Ring Latency; the clock speed ratio of 0.47 is marked for reference.]
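For context, the Random Ring tests measure communication between neighbors in a randomly permuted ring of MPI processes, so most neighbors land on different nodes. Below is a minimal MPI sketch of the underlying latency measurement in C; it uses a natural ring rather than HPCC's random permutations & averaging, so it illustrates the idea, not the actual HPCC code.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, np;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      int right = (rank + 1) % np;       /* ring neighbors; HPCC permutes ranks */
      int left  = (rank - 1 + np) % np;  /* randomly before forming the ring   */
      char sbuf = 0, rbuf;
      const int iters = 1000;

      MPI_Barrier(MPI_COMM_WORLD);
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++)    /* 1-byte messages expose latency */
          MPI_Sendrecv(&sbuf, 1, MPI_CHAR, right, 0,
                       &rbuf, 1, MPI_CHAR, left,  0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      double dt = MPI_Wtime() - t0;

      if (rank == 0)
          printf("avg ring-hop time: %.2f us\n", dt / iters * 1e6);
      MPI_Finalize();
      return 0;
  }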
Many applications have been ported to BG at SDSC; some run well enough that BG is attractive

Code name            Discipline         Description                                  Implementors
Amber 9 PMEMD        Biophysics         Molecular dynamics                           Ross Walker (SDSC)
AWM                  Geophysics         3-D seismic wave propagation                 Yifeng Cui (SDSC)
DNS (ESSL)           Engineering        Direct numerical simulation of 3-D turbulence  Diego Donzis (Georgia Tech) & Dmitry Pekurovsky (SDSC)
DOT (FFTW)           Biophysics         Protein docking                              Susan Lindsey (SDSC) & Wayne Pfeiffer (SDSC)
MILC *               Physics            Quantum chromodynamics                       Doug Toussaint (Arizona)
mpcugles             Engineering        3-D fluid dynamics                           Giri Chukkapalli (SDSC)
NAMD 2.6b1 (FFTW) *  Biophysics         Molecular dynamics                           Sameer Kumar (IBM)
Rosetta *            Biophysics         Protein folding                              Ross Walker (SDSC)
SPECFEM3D            Geophysics         3-D seismic wave propagation                 Brian Savage (Carnegie Institution)

* Most heavily used
Speed of BG relative to DataStar varies about the clock speed ratio (0.47 = 0.7/1.5) for applications on ≥ 512p; CO & VN mode perform similarly (per MPI p). (Relative speed here is the DataStar run time divided by the BG run time at the same MPI process count.)

[Figure: speed relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) by application for BG in CO & VN mode: Amber 9 PMEMD Cellulose (768p), AWM 512^3 w/o I/O (1,024p), DNS (ESSL) 1,024^3 (1,024p), DOT (FFTW) UDG/UGI 54k rots (512p), MILC large (1,024p), mpcugles forward prop w/o I/O (512p), NAMD 2.6b1 (FFTW) ApoA1 (512p), Rosetta 5 structs (1p), & SPECFEM3D Tonga-Fiji (1,024p); the clock speed ratio of 0.47 is marked for reference.]
MILC strong scaling on BG is very good, but is superlinear on DataStar for modest p, presumably because of better cache usage (as p grows, each processor's share of the lattice shrinks & fits better in cache)

[Figure: MILC medium; speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 10.0) vs processors (16 to 2,048); curves: 1.5-GHz DataStar & Blue Gene VN mode.]
NAMD 2.6b1 is much faster than 2.5 on BG (thanks to Sameer Kumar) & has somewhat better strong scaling on BG than DataStar; CO & VN mode perform similarly (per MPI p)

[Figure: NAMD 2.5 & 2.6b1 for ApoA1 with 92k atoms; speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs MPI processors (16 to 2,048); curves: 1.5-GHz DataStar, Blue Gene CO mode, & Blue Gene VN mode, each for versions 2.5 & 2.6b1.]
DNS strong scaling on BG is generally better than on DataStar, but shows unusual variation; VN mode is somewhat slower than CO mode (per MPI p)

[Figure: DNS 1024^3; speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs MPI processors (16 to 2,048); curves: 1.5-GHz DataStar, Blue Gene CO mode, & Blue Gene VN mode. Data from Dmitry Pekurovsky.]
If the number of allocated processors is considered instead, then VN mode is faster than CO mode, & the unusual variation lines up between the two modes (in CO mode each MPI process occupies a whole two-processor node, so p MPI processes allocate 2p processors)

[Figure: DNS 1024^3; speed/processor relative to 1.5-GHz DataStar (log scale, 0.1 to 1.0) vs allocated processors (16 to 2,048); curves: 1.5-GHz DataStar, Blue Gene CO mode, & Blue Gene VN mode. Data from Dmitry Pekurovsky.]
Some applications that have been ported to BG at SDSC need more memory at the scalability limit to run interesting problems (in VN mode each of a node's two processors gets only about half of the 512 MB; CO mode gives nearly all of it to one process)

Code name  Discipline         Description                                   Issues
ASH        Astrophysics       Solar convection                              Needs to run in CO mode at scalability limit
Enzo       Astrophysics       Cosmological simulation (both unigrid & AMR)  Not enough memory for large unigrid problems at scalability limit; modest scalability limit with AMR
NAMD       Biophysics         Molecular dynamics                            Needs to run in CO mode for large problems at scalability limit
PARATEC    Materials science  Ab initio quantum mechanics                   Not enough memory for large problems at scalability limit; modest scalability limit
IOR weak scaling scans using GPFS-WAN show BG in VN mode achieves 3.4 GB/s for writes (~DS) & 2.7 GB/s for reads (>DS)

[Figure: I/O rate (MB/s, log scale from 10 to 10,000) vs MPI processors (1 to 2,048) for Blue Gene in CO & VN mode using gpfs-wan (default mapping); curves: CO & VN peak, write, & read rates. Noncollective read/write via IOR with 256 or 128 MB/p (-a POSIX -e -t 1m -b 256m or -b 128m). Data from 3/7/06.]
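For reference, a plausible full invocation matching the flags above. IOR's -a (API), -e (fsync on write close), -t (transfer size), -b (per-process block size), & -o (test file) options are standard; the mpirun argument-passing style, process count, & output path are assumptions here.

  # hypothetical launch: 1,024 MPI processes in VN mode, writing to gpfs-wan
  mpirun -partition <block> -mode VN -np 1024 -exe ./IOR \
      -args "-a POSIX -e -t 1m -b 256m -o /gpfs-wan/<scratch>/ior.test"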
Blue Gene has more limited applicability than DataStar, but is a good choice if the application is right

+ Some applications run relatively fast & scale well
+ Turnaround is good with only a few users
+ Hardware is reliable & easy to maintain
- Some applications run slowly or don't scale well
- Some typical problems need to run in CO mode to fit in memory
- Other typical problems won't fit at all