High-Performance Computing at SGI and the Status of Climate and Weather Codes on the SGI Altix
Gerardo Cisneros, Ph.D.
Scientist
9/9/2004Slide 2
© 2004 Silicon Graphics, Inc. All rights reserved. Silicon Graphics, SGI, IRIX, Origin, Onyx, Onyx2, IRIS, Altix, InfiniteReality, Challenge, Reality Center, Geometry Engine, ImageVision Library, OpenGL, XFS, the SGI logo and the SGI cube are registered trademarks and CXFS, Onyx4, InfinitePerformance, IRIS GL, Power Series, Personal IRIS, Power Challenge, NUMAflex, REACT, Open Inventor, OpenGL Performer, OpenGL, Optimizer, OpenGL Volumizer, OpenGL Shader, OpenGL Multipipe, OpenGL Vizserver, SkyWriter, RealityEngine, SGI ProPack, Performance Co-Pilot, SGI Advanced Linux, UltimateVision and The Source of Innovation and Discovery are trademarks of Silicon Graphics, Inc., in the U.S. and/or other countries worldwide. Linux is a registered trademark of Linus Torvalds in several countries, used with permission by Silicon Graphics, Inc. MIPS is a registered trademark of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. Intel and Itanium are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries. Linux penguin logo created by Larry Ewing. All other trademarks mentioned herein are the property of their respective owners. (04/04)
Overview
• Company focus
• SGI Altix: present and future
• Performance of NWS codes
• Conclusions
Silicon Graphics
• Exclusively focused on the technical computing market
• Technology is designed to enable the most significant scientific and creative breakthroughs of the 21st century
• Products and services are mission critical to government and defense, science and research, manufacturing, energy and media industries
Providing the Industry’s Highest-Performing Compute, Storage and Visualization Products
Images courtesy of SCI Institute, University of Utah, Navy Rehearsal TOPSCENE Program, Magic Earth LLC, and WETA Digital
Strategic Focus Areas
High Performance Computing
High-performance NUMAflex™ architecture delivers unprecedented flexibility and performance.
Storage
CXFS™ shared file system allows transparent, heterogeneous file access everywhere, without copying data.
Advanced Visualization
Onyx4™ allows users to combine multiple industry-standard graphics cards in a high-bandwidth, low-latency architecture, for cost-effective high performance visualization.
Architecture Designed for HPC; Choice of Deployments
• NUMAflex™ Global Shared-Memory Architecture
• Balanced, scalable performance
• Operating environment optimized for HPC
• Low-latency memory access
• Easily Deployable
Choice of deployments:
• MIPS® and IRIX®: SGI® Origin® Family
• Intel® Itanium® 2 and Linux®: SGI® Altix® Family
Modular SGI® NUMAflex™ Architecture
• Scalability: No central bus or switch; just modules and NUMAlink™ cables
• Investment protection: Add new technologies as they evolve
• Flexibility: Tailored configurations for different dimensions of scalability
• Performance: High-bandwidth interconnect with very low latency
[Diagram labels: System Interconnect, CPU & Memory, Storage Expansion, System I/O, High-Bandwidth I/O Expansion, Standard I/O Expansion, Graphics Expansion]
SGI Family of Scalable Linux® Solutions
Image courtesy: NASA Ames
SGI® Altix® 3000: Servers and Superclusters
SGI® Altix® 350: Servers and Clusters
Mid-range Departmental Capability
SGI® Altix® 3000 Scaling Roadmap
World's Most Scalable Linux® Supercomputer
This slide contains forward-looking statements. The results and forecasts as stated may vary. Other risks and uncertainties relating to this slide may be found in the "Safe-Harbor" statement at the beginning of this presentation.
[Chart: roadmap timeline (Jan 2003, Apr 2003, Sept 2003, Oct 2003, Mar 2004, mid-2004, late 2004, 2005) showing SSI scalability and shared-memory scalability. Labeled sizes range from 64p to 2048p, with a final label of 16,384-processor capability.]
SGI® Altix® 3000 C-Brick Detail
• 4x Intel® Itanium® 2 processors
• 2 processors per 6.4GB/sec frontside bus
• 4–64GB memory C-brick
• SHUB memory controller: 8.51–10.2GB/sec memory bandwidth (varies with memory speed, 133 MHz vs. 166 MHz)
• 6.4GB/sec aggregate interconnect bandwidth
• 4.8GB/sec aggregate I/O bandwidth
[Diagram labels: 16 x PC2100 or PC2700 DDR SDRAM; 8 to 16GB of memory per node; 8.51–10.2GB/sec memory b/w; SHUB (two shown)]
Other Altix Bricks
• PX-brick: PCI-X expansion
• D-brick2: Disk expansion
• R-brick2: Router interconnect
• IX-brick: Base I/O module
• M-brick: Memory expansion
Altix 3700 — 128 processors
Altix 3700 — 512 processors
Next generation Altix
• (8) Fewer External Routers
• (1) Less Power Bay
• (1) Less Rack
• (20) Fewer Cables
Altix 3700 “Tornado”
The Altix® 350: “Expand on Demand” Growth Path
• Independently scale CPU, memory, I/O
• One Linux® instance to manage
• Investment protection & leverage current assets
• Allocate budget and resources to ongoing needs
Expansion modules: CPU Expansion Module, Memory Expansion Module, I/O Expansion (4-32) Module
Growth path:
• Base Unit: 1 or 2P/2-24GB
• 1-4P/2-48GB
• 1-8P/2-96GB
• 1-12P/2-144GB
• 1-16P/2-192GB
Right-size systems for the ultimate price/performance
Customers with Altix systems running climate and weather applications
• NASA – 10240p – ECCO, CAM, CCSM, etc.
• NCSA – 1024p – WRF
• GFDL – 608p (2x256, 1x96) – MOM4, CM2.1
• NRL – 384p (1x128, 1x256) – various
• ORNL – 256p – CCSM, CAM, POP
• LANL – 256p – POP, HYCOM
• BAMS – 20p – MM5 and MAQSIP
• CMMACS – 12p Altix 350 – MOM4
• APAT – 8p Altix 350 – MM5
• Romanian Nat’l Met Admin – 2p Altix 350 – HRM
IFS - Performance on SGI Altix 3000
T511 performance in Forecast days per day
[Chart: forecast days per day (axis 0 to 700) at 32, 64, 128, 256, 384 and 512 CPUs for IBM Power4 1.3GHz, SGI Altix 3000 1.3GHz and SGI Altix 3000 1.5GHz]
Altix 3000 @1.5GHz is 2.22x faster than IBM Power4 @1.3GHz
Source: Roland Richter, SGI, May 2004
IFS - Scalability on SGI Altix 3000
Itanium2 @1.5GHz is 1.3x faster than Itanium @1.3GHz because of the larger cache and higher clock rate
Source: Roland Richter, SGI, May 2004
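As a rough check on the 1.3x figure, the clock-rate ratio can be separated from the remainder. The sketch below assumes performance scales linearly with clock rate, which is an idealization not stated on the slide; the variable names are illustrative.

```python
# Split the reported 1.3x gain into a clock-rate part and a remainder.
# Assumption (not from the slide): performance scales linearly with clock rate.
clock_ratio = 1.5 / 1.3          # 1.5GHz vs. 1.3GHz
reported_gain = 1.3              # figure quoted on the slide
remainder = reported_gain / clock_ratio

print(f"clock-rate ratio: {clock_ratio:.3f}")
print(f"remaining factor: {remainder:.3f}")
```

Under that assumption, roughly 1.15x comes from the clock and a remaining factor of roughly 1.13x would be left for other differences such as the larger cache.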
HRM Performance on the Altix 350
78h forecast in 2340 time steps over a 181x217 horizontal grid with 26 vertical levels
[Chart: simulation speed (axis 0 to 350) versus Nproc (1, 2, 4, 8, 10, 12, 16) for Origin 350/600 MHz and Altix 350/1.4 GHz]
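The benchmark description implies a model time step and a total grid size. A minimal sketch of that arithmetic, using only the numbers quoted for the HRM case (the variable names are illustrative):

```python
# Derive the time step and grid size from the HRM benchmark description.
forecast_hours = 78
time_steps = 2340
nx, ny, nz = 181, 217, 26        # horizontal grid and vertical levels

time_step_s = forecast_hours * 3600 / time_steps
grid_points = nx * ny * nz

print(f"time step: {time_step_s:.0f} s")
print(f"grid points: {grid_points:,}")
```

This works out to a 120 s time step over about one million grid points.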
LM-RAPS 2.1 (Optimization using shared memory)
Benchmark case used by Emy (Greece) in 2003
Grid 363x263x45
Scalability on Altix using Intel Compiler and SGI’s MPT library
MM5 3.6.3 - Standard benchmark, 2004
Higher is better
Source: http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20040304a.html
Altix version was compiled with the Intel 8.1.007 (beta) ifort Fortran compiler and the Intel 8.1.010 (beta) icc C compiler, and linked with SGI's MPI from MPT 1.10.
WRF 1.3 - Scalability & Performance on Altix 3000
Source: http://www.mmm.ucar.edu/wrf/bench and Gerardo Cisneros, SGI
Altix 3000 is 1.28x faster than HP cluster at the same clock frequency.
WRF 2.0.2 on a Large Problem
WRF SI 5km CONUS 48h forecast (980x720x37, 5km, 30s)
[Chart: elapsed seconds (axis 0 to 50,000) versus NCPUs (32, 64, 128, 256, 512) for 1.5GHz/6MB L3 and 1.3GHz/3MB L3 processors]
[Chart: simulation speed (axis 0 to 40) versus NCPUs (32, 64, 128, 256, 512) for the same WRF 2.0.2 5km CONUS 48h forecast case, for 1.5GHz/6MB L3 and 1.3GHz/3MB L3 processors]
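The two WRF 2.0.2 charts report the same runs as elapsed seconds and as simulation speed. The sketch below shows how the two would relate, assuming simulation speed means simulated time divided by wall-clock time; the slides do not define it, so treat that as an assumption. The function names are illustrative.

```python
# Convert between elapsed wall-clock time and simulation speed for a 48h forecast.
# Assumption: simulation speed = simulated seconds / wall-clock seconds.
FORECAST_S = 48 * 3600           # 48h forecast expressed in seconds

def simulation_speed(elapsed_s):
    """Simulated seconds advanced per wall-clock second."""
    return FORECAST_S / elapsed_s

def elapsed_seconds(speed):
    """Wall-clock seconds needed at a given simulation speed."""
    return FORECAST_S / speed

# The axis maxima of the two charts, used only as round illustrative inputs.
print(simulation_speed(50_000))
print(elapsed_seconds(40))
```

Under this definition, a speed above 1 means the forecast runs faster than real time.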
GFS Performance on the Altix 3700 (1.5GHz)
GFS T240L30 (720x360x30) 120h forecast
NCPUs   Simulation speed
1       2.23
2       4.21
4       9.58
8       23.68
16      51.15
32      98.34
64      190.01
96      255.20
128     343.91
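Speedup and parallel efficiency follow directly from the GFS simulation-speed labels. This is a minimal sketch, assuming the nine data labels pair in order with the nine CPU counts on the axis; the dictionary and variable names are illustrative.

```python
# Simulation speed by CPU count, as labeled on the GFS chart
# (pairing of labels with CPU counts is assumed to be in order).
speed = {1: 2.23, 2: 4.21, 4: 9.58, 8: 23.68, 16: 51.15,
         32: 98.34, 64: 190.01, 96: 255.20, 128: 343.91}

baseline = speed[1]
speedup = {n: s / baseline for n, s in speed.items()}
efficiency = {n: speedup[n] / n for n in speed}

for n in speed:
    print(f"{n:4d} CPUs  speedup {speedup[n]:7.2f}  efficiency {efficiency[n]:5.2f}")
```

An efficiency above 1 means the run scaled better than linearly relative to the single-CPU baseline.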
MOM4 - Performance on SGI Altix 3700
0.25° Ocean Model (1440x600x24)
[Chart: elapsed time (left axis, 0 to 18,000) and speedup (right axis, 0 to 120) versus number of processors (16, 32, 64, 128). Series: Altix 3000 @1.5GHz w/ dynamic memory compilation, Altix 3000 @1.5GHz w/ static memory compilation, and speedup. Elapsed-time data labels: 15,476; 7,813; 3,822; 2,308 and 10,185; 4,640; 2,397.]
Source: Gerardo Cisneros, SGI, May 2004
POP 1.4.3 (Optimization by reducing the number of synchronizations)
Increasing message length and reducing synchronization frequency improved performance by 20% at 256 CPUs
Scalability on ALTIX using Intel Compiler and SGI’s MPT library
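Why fewer, longer messages help can be illustrated with a simple latency-plus-bandwidth cost model. This is a generic sketch and not the actual POP change; the latency, bandwidth and message counts are hypothetical values chosen only for illustration.

```python
def comm_time(n_messages, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Cost model: every message pays a fixed latency; all bytes pay for bandwidth."""
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s

# Hypothetical parameters, not measurements from the Altix.
LATENCY_S = 2e-6
BANDWIDTH = 1e9
TOTAL_BYTES = 64_000

# Same payload sent as 100 small messages or as 10 larger ones.
many_small = comm_time(100, TOTAL_BYTES, LATENCY_S, BANDWIDTH)
few_large = comm_time(10, TOTAL_BYTES, LATENCY_S, BANDWIDTH)

print(f"100 messages: {many_small * 1e6:.0f} us")
print(f" 10 messages: {few_large * 1e6:.0f} us")
```

The payload is identical in both cases, so the difference comes entirely from the per-message overhead, which is the cost that aggregation and less frequent synchronization reduce.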
POP 1.4.3 - Performance on 1 Degree “X1” Problem
[Chart: simulated years per wallclock day (axis 0 to 100) versus number of processors (0 to 384) for Altix 3000 @1.5GHz (SGI), Altix 3000 @1.3GHz (SGI) and Altix 3000 @1.5GHz (NASA)]
POP 1.4.3 - Performance on SGI Altix 3000 SSI
Source: Altix 3000 (SGI): version optimized by Roland Richter using MPT; Altix 3000 (NASA): Jim Taft’s version using MLP
Configuration: Altix 3000 @1.3GHz, 512P SSI, Intel compiler 7.1.035, MPT 1.9
Higher is better
[Chart: speedup (axis 0 to 96) versus number of processors (0 to 384), ideal versus Altix 3000]
CCSM 3.0 (beta08) Performance on SGI Altix 3000
Atmosphere-Land-Ice-Ocean Coupled Model
Forecast years/wallclock day (higher is better):
Processors   Forecast years/wallclock day
26           2.36
52           4.89
64           5.06
78           7.30
96           8.36
104          8.98
128          10.36
208          13.02
Run on 512P Altix 3000 @1.3GHz with efc 7.1.035 compiler
Speedup from 26P to 208P (SGI Altix 3000 @1.3GHz, plotted against ideal speedup):
Processors   Speedup
26           1.0
52           2.1
78           3.1
104          3.8
208          5.6
Source: Roland Richter, SGI
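The CCSM speedup labels can be cross-checked against the throughput labels by taking the 26-processor run as the baseline. This is a sketch only, assuming the eight throughput labels pair in order with the eight processor counts on the axis; the names are illustrative.

```python
# Forecast years per wallclock day by processor count, from the chart labels
# (pairing of labels with processor counts is assumed to be in order).
throughput = {26: 2.36, 52: 4.89, 64: 5.06, 78: 7.30,
              96: 8.36, 104: 8.98, 128: 10.36, 208: 13.02}

base_p = 26
speedup = {p: t / throughput[base_p] for p, t in throughput.items()}
ideal = {p: p / base_p for p in throughput}

for p in throughput:
    print(f"{p:4d}P  speedup {speedup[p]:4.2f}  ideal {ideal[p]:4.2f}  "
          f"fraction of ideal {speedup[p] / ideal[p]:4.2f}")
```

The recomputed values match the slide's speedup labels at 52P, 78P and 104P. At 208P the rounded throughput gives about 5.5 rather than the labeled 5.6, presumably because the chart labels are themselves rounded.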
Conclusions
The Altix is proving to be an excellent system for weather and climate codes
• Shared memory allows very large models to run
• The compilers are getting better
• The libraries are excellent and still improving
• SGI’s software layers on top of Linux help jobs attain their best performance
• SGI engineers have the know-how to make weather and climate codes run well on its customers’ SGI systems