Supercomputer Platforms and Its Applications Dr. George Chiu IBM T.J. Watson Research Center


Page 1: Supercomputer Platforms and Its Applications Dr. George Chiu IBM T.J. Watson Research Center

Supercomputer Platforms and Its Applications
Dr. George Chiu
IBM T.J. Watson Research Center

Page 2

Plasma Science – International Challenges

• Microturbulence & Transport

– What causes plasma transport?

• Macroscopic Stability

– What limits the pressure in plasmas?

• Wave-particle Interactions

– How do particles and plasma waves interact?

• Plasma-wall Interactions

– How can high-temperature plasma and material surfaces co-exist?

Page 3

IBM Confidential © 2007 IBM Corporation – IBM Systems

2007-2008 Deep Computing Roadmap Summary (1H07 / 2H07 / 1H08 / 2H08)

[Roadmap chart; recoverable detail by product line:]

• System p servers: p5 560Q+, PL4/ML16, PHV8 → p6 blade, p6 IH, PL4/ML32, PHV8 → P6H, HV4
• System p software: JS21 IB AIX solution (CSM 1.6/RSCT 2.4.7, GPFS 3.1, LoadLeveler 3.4.1, PESSL 3.3, PE 4.3.1), PERCS system design analysis → initial p6 support for SMPs & Ethernet (GPFS 3.2 filesystem mgt, CSM 1.7) → p6 IH/blades IB solutions on AIX 5.3 and SLES 10, initial AIX 6.1 support for SMPs & Ethernet → p6 IH/blades IB solutions on AIX 6.1 (CSM 1.7.0.x, GPFS 3.3, LoadLeveler 3.5, PE 5.1, ESSL 4.4, PESSL 3.3)
• System x servers: x3455 DC → x3455 QC (Barcelona), x3550 QC, x3850 QC, x3755 QC; HS21/LS21/LS41 LS blades → Barcelona QC; x3550 Harpertown/Greencreek refresh; iDPX Stoakley planar → iDPX Thurley planar
• System x software: CSM RHEL 5 support (CSM 1.6/RSCT 2.4.7*) → GPFS 3.2 support for System x/1350, RHEL 5 support, CSM 1.7 for System x/1350* → GPFS 3.3 and CSM 1.7.0.x support for System x/1350
• Blue Gene: BG/L (EOL) → BG/P, 1st petaflop* (* 1st petaflop dependent on BG client demand)
• Blue Gene software: BlueGene/P support: GPFS 3.2, CSM 1.7, LoadLeveler 3.4.2, ESSL 4.3.1
• Cell BE: QS20 → QS21 prototype → QS21 LA → QS22; SDK 2.1 → SDK 3.0 → SDK 4.0 → SDK 5.0
• System Storage: DDN OEM agreement; DS4800 → DCS9550 → EXP100 attach, DS4800 follow-on → DCS9550+, DS4700 for HPC
• Workstations: M50 R1, Z30 R1, Z40 → M60, Z40 R1 → M60 R1 (APro elimination impacts DCV); 11S0 → 11S2 system accept

Server & systems legend: specific & exclusive / repurposed (neither specific nor exclusive) / specific but not exclusive

Source: IBM Deep Computing Strategy – 7.18.07

Page 4

IBM HPC roadmap

Power 5

Power 6

Power 7

Blue Gene/L

Blue Gene/P

Blue Gene/Q

Clusters and Blades

Page 5

• The POWER series is IBM's mainstream computing offering
  – Market is about 60% commercial and 40% technical
  – Product line value proposition
    • General-purpose computing engine
    • Robustness, security & reliability fitting mission-critical requirements
    • Standard programming model and interfaces
    • Performance leadership with competitive performance/price value
    • Robust integration with industry standards (hardware and software)

• Current status
  – POWER6 announced
  – POWER7 is underway

IBM HPC conceptual roadmap: POWER

Power 5

Power 6

Power 7

Page 6

ASC Purple: 100 TF machine based on POWER5
~1,500 8-way POWER5 nodes (~12K CPUs), Federation (HPS) interconnect
(~1500 × 2 multi-plane fat-tree topology, 2×2 GB/s links)

Communication libraries: < 5 µs latency, 1.8 GB/s unidirectional

GPFS: 122 GB/s; supports NIF

#6 in Top500 (Nov. 2007)

www.top500.org
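The quoted interconnect figures can be folded into the standard alpha-beta (latency-bandwidth) cost model for point-to-point messages; a small sketch, using the 5 µs / 1.8 GB/s numbers above as inputs (this is an illustrative model, not a vendor formula):

```python
# Alpha-beta model of point-to-point message time, using the ASC Purple
# figures quoted above (5 us latency, 1.8 GB/s unidirectional bandwidth).

ALPHA = 5e-6   # latency in seconds
BETA = 1.8e9   # bandwidth in bytes/second

def transfer_time(nbytes: float) -> float:
    """Predicted time to send a message of nbytes."""
    return ALPHA + nbytes / BETA

# Message size at which the transfer achieves half of peak bandwidth:
n_half = ALPHA * BETA   # 9,000 bytes

print(f"8 B message:  {transfer_time(8) * 1e6:.2f} us")
print(f"1 MB message: {transfer_time(1e6) * 1e6:.1f} us")
print(f"half-bandwidth message size: {n_half / 1e3:.0f} kB")
```

Messages well below ~9 kB are latency-dominated on such a network, which is why collective-heavy codes care about the 5 µs figure as much as the bandwidth.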

[ASCI roadmap chart, 1995-2005, capability on a log scale (0.5-100+ TF): Option Red (1+ Tflop / 0.5 TB), Option Blue (3+ Tflop / 1.5 TB), Option White (10+ Tflop / 5 TB), 30+ Tflop / 10 TB, Purple (100+ Tflop / 50 TB), Turquoise – Accelerated Strategic Computing Initiative]

Page 7

POWER Server Roadmap – Autonomic Computing Enhancements

• 2001 – POWER4 (180 nm): two 1.3 GHz cores, shared L2, distributed switch; chip multiprocessing, dynamic LPARs (16)
• 2002-03 – POWER4+ (130 nm): two 1.7 GHz cores, shared L2, distributed switch; reduced size, lower power, larger L2, more LPARs (32)
• 2004 – POWER5 (130 nm, two 1.9 GHz cores) and 2005-06 – POWER5+ (90 nm, two 2.3 GHz cores): shared L2, distributed switch; simultaneous multithreading, sub-processor partitioning, dynamic firmware updates, enhanced scalability and parallelism, high throughput performance, enhanced memory subsystem
• 2007 – POWER6 (65 nm): 4.7 GHz cores, L2 caches, advanced system features & switch; ultra-high frequency, very large L2, robust error recovery, high single-thread and HPC performance, high throughput performance, more LPARs (1024), enhanced memory subsystem

**Planned to be offered by IBM. All statements about IBM's future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.

Page 8

MareNostrum at a Glance

Challenge
• Deliver world-class deep-computing and e-Science services with an attractive cost/performance ratio
• Enable collaboration among leading scientific teams in biology, chemistry, medicine, earth sciences and physics

Innovation
• Efficient integration of commercially available commodity components
• Modular and scalable open cluster architecture: computing, storage, networking, software, management, applications
• Diskless capability improves node reliability and reduces installation and maintenance costs
• Record cluster density and power efficiency; leading price/performance and TCO in high-performance computing

System: IBM e1350 capability Linux cluster platform comprising 42 IBM eServer p615 servers, 2,560 IBM eServer BladeCenter JS21 servers, and IBM TotalStorage hardware
• 94 TF DP (64-bit), 186 TF SP (32-bit), 376 Tops (8-bit)
• 20 TB RAM, 370 TB disk
• Linux 2.6; #1 in Europe, #9 in TOP500
• ~120 m², ~750 kW

Page 9

BlueGene/P packaging hierarchy

• Chip: 4 processors; 13.6 GF/s; 8 MB EDRAM
• Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s; 2.0 GB DDR2 (4.0 GB is an option)
• Node Card: 32 compute cards (32 chips, 4×4×2), 0-1 I/O cards; 435 GF/s; 64 GB
• Rack: 32 node cards, cabled 8×8×16; 13.9 TF/s; 2 TB
• System: 72 racks, 72×32×32; 1 PF/s; 144 TB
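As a sanity check, each level's compute and memory figures follow from simple multiplication up the packaging hierarchy; a sketch in Python using only numbers quoted on this slide:

```python
# Rebuild the BlueGene/P packaging numbers from the chip level up.

chip_gflops = 13.6           # one chip = 4 cores, 13.6 GF/s peak
card_gb = 2.0                # one compute card = 1 chip + 2.0 GB DDR2
cards_per_node_card = 32
node_cards_per_rack = 32
racks_per_system = 72

node_card_gflops = cards_per_node_card * chip_gflops        # 435.2 GF/s
node_card_gb = cards_per_node_card * card_gb                # 64 GB
rack_tflops = node_cards_per_rack * node_card_gflops / 1e3  # ~13.9 TF/s
rack_tb = node_cards_per_rack * node_card_gb / 1024         # 2 TB
system_pflops = racks_per_system * rack_tflops / 1e3        # ~1.0 PF/s
system_tb = racks_per_system * rack_tb                      # 144 TB

print(rack_tflops, rack_tb, system_pflops, system_tb)
```

The 72-rack system lands just above 1 PF/s peak (72 × 13.9 TF/s ≈ 1,003 TF/s), which is the "1 PF/s / 144 TB" headline above.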

Page 10

Page 11

November 28, 2005

HPC Challenge Benchmarks

Benchmark             | 64-rack BG/L (optimized) | Best posted, Cray XT3 (4600/5200 nodes)
HPL (TFlop/s)         | 259.213                  | 20.53
RandomAccess (GUP/s)  | 35.46                    | 0.687 (7.69 on Cray X1E)
FFT (GFlop/s)         | 2311.09                  | 905.57
STREAM Triad (GB/s)   | 160,064                  | 29,164.8
PTRANS (GB/s)         | 4600                     | 1800 (Red Storm)

Page 12

System Power Efficiency (Gflops/Watt)

[Bar chart: BG/L 0.23, BG/P 0.34, Red Storm 0.02, Thunderbird 0.02, Purple 0.02]
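For scale, peak efficiency can be estimated from rack-level figures (the ~13.9 TF/s and ~35 kW per BG/P rack quoted elsewhere in this deck); the plotted values are sustained and therefore somewhat lower than peak:

```python
# Cross-check of the chart's Gflops/Watt scale from rack-level figures.
# BG/P: ~13.9 TF/s peak and ~35 kW per rack (numbers quoted in this deck).

rack_tflops = 13.9   # peak TF/s per BG/P rack
rack_kw = 35.0       # kW per rack

peak_gflops_per_watt = rack_tflops * 1e3 / (rack_kw * 1e3)
print(f"peak: {peak_gflops_per_watt:.2f} GF/W")
# The chart's 0.34 GF/W for BG/P would then correspond to a
# sustained-to-peak fraction of roughly 85%.
```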

Page 13

Failures per Month at 100 TFlops (20 BG/L racks): unparalleled reliability

[Bar chart on a 0-800 scale for IA64, x86, POWER5 and Blue Gene clusters; the non-Blue Gene values are 127, 394 and 800 failures per month, while Blue Gene is at roughly 1]

Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak); excludes storage subsystem, management nodes, SAN network equipment, and software outages

Page 14

Classical MD – ddcMD: 2005 Gordon Bell Prize Winner

Scalable, general purpose code for performing classical molecular dynamics (MD) simulations using highly accurate MGPT potentials

MGPT semi-empirical potentials, based on a rigorous expansion of many-body terms in the total energy, are needed to quantitatively investigate the dynamic behavior of d-shell and f-shell metals.

524-million-atom simulations on 64K nodes achieved 101.5 TF/s sustained. Superb strong and weak scaling for the full machine ("very impressive machine," says the PI).

Visualization of important scientific findings already achieved on BG/L: Molten Ta at 5000K demonstrates solidification during isothermal compression to 250 GPa

2,048,000 Tantalum atoms
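The headline numbers above imply modest per-node loads; a back-of-envelope check using only the figures quoted on this slide:

```python
# Per-node figures for the ddcMD run quoted above:
# 524 million atoms, 101.5 TF/s sustained, on 64K (65,536) BG/L nodes.

atoms = 524e6
nodes = 64 * 1024
sustained_tflops = 101.5

atoms_per_node = atoms / nodes                    # ~8,000 atoms per node
gflops_per_node = sustained_tflops * 1e3 / nodes  # ~1.55 GF/s per node

print(f"{atoms_per_node:.0f} atoms/node, {gflops_per_node:.2f} GF/s per node")
```

About 8,000 atoms per node is a small working set, which is why strong scaling to the full machine (where per-node work shrinks) is the hard part.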

Page 15

Qbox: First-Principles Molecular Dynamics
Francois Gygi (UC Davis); Erik Draeger, Martin Schulz, Bronis de Supinski (LLNL); Franz Franchetti (Carnegie Mellon); John Gunnels, Vernon Austel, Jim Sexton (IBM)

Treats electrons quantum mechanically; treats nuclei classically. Developed at LLNL; BG support provided by IBM. Simulated 1,000 Mo atoms with 12,000 electrons. Achieves 207.3 teraflops sustained (56.8% of peak).

Qbox simulation of the transition from a molecular solid (top) to a quantum liquid (bottom) that is expected to occur in hydrogen under high pressure.
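The quoted sustained rate and fraction of peak pin down the machine size; a quick consistency check (the 2.8 GF/s per-core figure is the standard BG/L peak of 4 flops/cycle at 700 MHz):

```python
# Deriving machine peak from the Qbox numbers above: 207.3 TF/s sustained
# at 56.8% of peak implies roughly 365 TF/s peak, matching a 64-rack BG/L
# (65,536 nodes x 2 cores x 2.8 GF/s per core).

sustained = 207.3    # TF/s
fraction = 0.568

implied_peak = sustained / fraction    # ~365 TF/s
bgl_peak = 65536 * 2 * 2.8 / 1e3       # ~367 TF/s for 64 racks

print(f"implied peak {implied_peak:.0f} TF/s vs BG/L 64-rack peak {bgl_peak:.0f} TF/s")
```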

Page 16

Gordon Bell Special Achievement Award 2006, P. Vranas et al.: QCD speedup on BG/L to 70.5 sustained teraflops.

[Plot: sustained teraflops (0-80) vs. number of CPU cores (0-140,000); Dirac operator = 19.3%, CG inverter = 18.7%]

Page 17

Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second

[Log-log plot: compute power (million particles moved one step in one second, 10-100,000) vs. number of cores (10 to 10,000,000), for Cray XT3/XT4, BG/L, BG/L Optimal, and BG/L at Livermore]

Page 18

Compute Power of the Gyrokinetic Toroidal Code
Number of particles (in millions) moved 1 step in 1 second

BlueGene can reach 150 billion particles in 2008, >1 trillion in 2011. POWER6 can reach 1 billion particles in 2008, >0.3 trillion in 2011.

[Log-log plot as on the previous page (compute power vs. number of cores, for Cray XT3/XT4, BG/L, BG/L Optimal, and BG/L at Livermore), with added projections for BG/P at 3.5 PF and IBM POWER P6 at 300 TF]

Page 19

Rechenzentrum Garching at BG Watson: GENE

Strong scaling of GENE v11+ for a problem size of 300-500 GB, with measurement points at 1k, 2k, 4k, 8k and 16k processors, normalized to 1k processors.

Quasi-linear scaling has been observed, with a parallel efficiency of 95% on 8k processors and 89% on 16k processors. By Hermann Lederer*, Reinhard Tisma* and Frank Jenko+, RZG* and IPP+, March 21-22, 2007.
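The quoted efficiencies follow the usual strong-scaling definition: with runtime T(p) on p processors, efficiency relative to a p0-processor baseline is E(p) = T(p0)·p0 / (T(p)·p). A sketch, with hypothetical timings chosen only to reproduce the 95%/89% figures above:

```python
# Strong-scaling parallel efficiency relative to a baseline run.
# The timings below are hypothetical, picked to yield the quoted
# 95% (8k procs) and 89% (16k procs) GENE efficiencies.

def efficiency(t_base: float, p_base: int, t: float, p: int) -> float:
    """Efficiency of a run (t, p) vs. baseline (t_base, p_base)."""
    return (t_base * p_base) / (t * p)

t_1k = 800.0                 # hypothetical baseline runtime on 1k procs, s
t_8k = t_1k / (8 * 0.95)     # runtime giving 95% efficiency on 8k procs
t_16k = t_1k / (16 * 0.89)   # runtime giving 89% efficiency on 16k procs

print(round(efficiency(t_1k, 1024, t_8k, 8192), 2))    # 0.95
print(round(efficiency(t_1k, 1024, t_16k, 16384), 2))  # 0.89
```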

Page 20

Current HPC Systems Characteristics

                             | IBM POWER | IBM BlueGene/P | MareNostrum   | Intel (Clovertown)
                             |           | (#1 worldwide) | (#1 in Europe)|
GF/socket                    | 37.6      | 13.6           | 18.4          | 37.328
TF/rack                      | 1.8       | 13.9           | 3.1           | 4.48
GB/core                      | 4         | 0.5            | 2             | 1
GB/rack                      | 384       | 2048           | 672           | 480
Mem BW (byte/flop)           | 1.5       | 1              | -             | 0.34
Mem BW (TB/s)                | 2.7       | 13.9           | -             | 1.5
P-P interconnect (byte/flop) | 0.1       | 0.75           | 0.014         | 0.1
Kilowatt/rack                | 35        | 35             | 25            | 25
Space/100TF                  | 1500      | 170            | 1200          | 1200
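The memory byte/flop row is just the ratio of the bandwidth row to the peak row; checking the table's internal consistency (MareNostrum's bandwidth is not given, so it is omitted):

```python
# Verify: bytes/flop = (Mem BW in TB/s) / (peak TF/rack) for each system
# where both rows are stated in the table above.

systems = {
    # name: (mem_bw_tb_s, tf_per_rack, stated_byte_per_flop)
    "IBM POWER":        (2.7, 1.8, 1.5),
    "IBM BlueGene/P":   (13.9, 13.9, 1.0),
    "Intel Clovertown": (1.5, 4.48, 0.34),
}

for name, (bw, tf, stated) in systems.items():
    computed = bw / tf
    print(f"{name}: computed {computed:.2f} byte/flop (stated {stated})")
```

POWER and BG/P match exactly (1.5 and 1.0); Clovertown computes to ~0.33 against the stated 0.34, i.e. agreement to within rounding of the inputs.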

Page 21


Summary

IBM is deeply involved in ITER applications through its collaborations:
- Princeton Plasma Physics Laboratory
- Max-Planck-Institut für Plasmaphysik / Rechenzentrum Garching
- Barcelona Supercomputing Center
- Oak Ridge National Laboratory

IBM is also involved in laser-plasma fusion through its collaborations:
- Lawrence Livermore National Laboratory
- Forschungszentrum Jülich

IBM offers multiple platforms to address ITER needs:
- POWER: high memory capacity/node, moderate interprocessor bandwidth, moderate scalability – capability and capacity machine
- Blue Gene: low power, low memory capacity/node, high interprocessor bandwidth, highest scalability – capability and capacity applications
- System x and white box: moderate memory capacity/node, low interprocessor bandwidth, limited-to-moderate scalability – mostly capacity machine

Page 22


Backup

Page 23


What BG brings to Core Turbulence Transport

Benchmark case CYCLONE:
- GENE: < 1 day on 64 procs; a few hours on 1,024 procs BG/L
- GYSELA: ~2.5 days on 64 procs
- ORB5: < 1 day on 64 procs; a few hours on 1,024 procs BG/L

Similar ITER-size benchmark:
- GENE: ~½ day on 6K procs BG/L
- GYSELA: ~10 days on 1,024 procs
- ORB5: ~½ day on 16K procs BG/L; ~1 week on 256 procs PC cluster

Courtesy José Mª Cela, Director of Applications, BSC

Page 24

The Gyrokinetic Toroidal Code GTC

Description:
- Particle-in-cell (PIC) code developed by Zhihong Lin (now at UC Irvine)
- Non-linear gyrokinetic simulation of microturbulence [Lee, 1983]
- Particle-electric field interaction treated self-consistently
- Uses magnetic field line following coordinates
- Guiding center Hamiltonian [White and Chance, 1984]
- Non-spectral Poisson solver [Lin and Lee, 1995]
- Low numerical noise algorithm (δf method)
- Full torus (global) simulation
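GTC itself is a large gyrokinetic code; as a minimal illustration of the particle-in-cell idea it builds on, the sketch below performs one generic 1D electrostatic PIC step (deposit charge → solve field → push particles). Normalized units and a periodic domain; all parameters are illustrative choices, not GTC's.

```python
# Generic 1D electrostatic PIC step: cloud-in-cell deposit, FFT Poisson
# solve, and leapfrog push. Illustrative only; not the GTC algorithm.
import numpy as np

L, ng, n_p = 2 * np.pi, 64, 10000   # domain length, grid points, particles
dx, dt = L / ng, 0.1
rng = np.random.default_rng(0)

x = rng.uniform(0.0, L, n_p)        # particle positions
v = rng.normal(0.0, 1.0, n_p)       # particle velocities
w = L / n_p                         # particle weight -> mean density 1

def deposit(x):
    """Cloud-in-cell charge deposition onto the periodic grid."""
    g = x / dx
    i = np.floor(g).astype(int) % ng
    f = g - np.floor(g)
    rho = np.zeros(ng)
    np.add.at(rho, i, (1 - f) * w / dx)
    np.add.at(rho, (i + 1) % ng, f * w / dx)
    return rho - 1.0                # subtract neutralizing background

def solve_field(rho):
    """Periodic Poisson solve phi'' = -rho via FFT; returns E = -phi'."""
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_k = np.fft.fft(rho)
    phi_k = np.zeros_like(rho_k)
    phi_k[1:] = rho_k[1:] / k[1:] ** 2
    return np.fft.ifft(-1j * k * phi_k).real

def push(x, v, E):
    """Leapfrog: interpolate E to particles, kick, then drift (periodic)."""
    g = x / dx
    i = np.floor(g).astype(int) % ng
    f = g - np.floor(g)
    Ep = (1 - f) * E[i] + f * E[(i + 1) % ng]
    v = v - Ep * dt                 # electrons: charge -1, mass 1
    x = (x + v * dt) % L
    return x, v

rho = deposit(x)
E = solve_field(rho)
x, v = push(x, v, E)
```

The δf method noted above reduces noise by evolving only the deviation from a known background distribution rather than sampling the full distribution with particles, as this sketch does.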

Page 25

BlueGene Key Applications – Major Scientific Advances

1. Qbox (DFT) LLNL: 56.5%, 2006 Gordon Bell Award, 64 racks; CPMD IBM: 30%, highest scaling 64 racks
2. ddcMD (classical MD) LLNL: 27.6%, 2005 Gordon Bell Award, 64 racks; MDCASK LLNL: highest scaling 64 racks; SPaSM LANL: highest scaling 64 racks; LAMMPS SNL: highest scaling 16 racks; Blue Matter IBM: highest scaling 16 racks; Rosetta UW: highest scaling 20 racks; AMBER: 8 racks
3. Quantum chromodynamics IBM: 30%, 2006 GB Special Award, 64 racks; QCD at KEK: 10 racks
4. sPPM (CFD) LLNL: 18%, highest scaling 64 racks; Miranda LLNL: highest scaling 64 racks; Raptor LLNL: highest scaling 64 racks; DNS: highest scaling 16 racks; PETSc FUN3D ANL: 14.2%; NEK5 (thermal hydraulics) ANL: 22%
5. ParaDis (dislocation dynamics) LLNL: highest scaling 64 racks
6. GFMC (nuclear physics) ANL: 16%
7. WRF (weather) NCAR: 14%, highest scaling 64 racks; POP (oceanography): highest scaling 16 racks
8. HOMME (climate) NCAR: 12%, highest scaling 32 racks
9. GTC (plasma physics) PPPL: highest scaling 16 racks; ORB5 RZG: highest scaling 8 racks; GENE RZG: 12.5%, highest scaling 16 racks
10. Flash (Type Ia supernova): highest scaling 32 racks; Cactus (general relativity): highest scaling 16 racks
11. AWM (earthquake): highest scaling 20 racks

Page 26

[Diagram: Science, supported by Theory, Experiment, and Simulation]