high-performance computing at sgi and the status of climate … · 2015. 11. 24. · scalability on...

High-Performance Computing at SGI and the Status of Climate and Weather Codes on the SGI Altix Gerardo Cisneros, Ph.D. Scientist


Page 1: High-Performance Computing at SGI and the Status of Climate … · 2015. 11. 24. · Scalability on ALTIX using Intel Compiler and SGI’s MPT library. 9/9/2004 Slide 29 POP 1.4.3

High-Performance Computing at SGI and the Status of Climate and Weather Codes on the SGI Altix

Gerardo Cisneros, Ph.D.
Scientist

9/9/2004 Slide 2

© 2004 Silicon Graphics, Inc. All rights reserved. Silicon Graphics, SGI, IRIX, Origin, Onyx, Onyx2, IRIS, Altix, InfiniteReality, Challenge, Reality Center, Geometry Engine, ImageVision Library, OpenGL, XFS, the SGI logo and the SGI cube are registered trademarks and CXFS, Onyx4, InfinitePerformance, IRIS GL, Power Series, Personal IRIS, Power Challenge, NUMAflex, REACT, Open Inventor, OpenGL Performer, OpenGL, Optimizer, OpenGL Volumizer, OpenGL Shader, OpenGL Multipipe, OpenGL Vizserver, SkyWriter, RealityEngine, SGI ProPack, Performance Co-Pilot, SGI Advanced Linux, UltimateVision and The Source of Innovation and Discovery are trademarks of Silicon Graphics, Inc., in the U.S. and/or other countries worldwide. Linux is a registered trademark of Linus Torvalds in several countries, used with permission by Silicon Graphics, Inc. MIPS is a registered trademark of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. Intel and Itanium are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries. Linux penguin logo created by Larry Ewing. All other trademarks mentioned herein are the property of their respective owners. (04/04)

9/9/2004 Slide 3

Overview

• Company focus
• SGI Altix: present and future
• Performance of NWS codes
• Conclusions

9/9/2004 Slide 4

Silicon Graphics

• Exclusively focused on the technical computing market
• Technology is designed to enable the most significant scientific and creative breakthroughs of the 21st century
• Products and services are mission critical to government and defense, science and research, manufacturing, energy and media industries

Providing the Industry’s Highest-Performing Compute, Storage and Visualization Products

Images courtesy of SCI Institute, University of Utah, Navy Rehearsal TOPSCENE Program, Magic Earth LLC, and WETA Digital

9/9/2004 Slide 5

Strategic Focus Areas

High Performance Computing

High-performance NUMAflex™ architecture delivers unprecedented flexibility and performance.

Storage

CXFS™ shared file system allows transparent, heterogeneous file access everywhere, without copying data.

Advanced Visualization

Onyx4™ allows users to combine multiple industry-standard graphics cards in a high-bandwidth, low-latency architecture, for cost-effective high performance visualization.

9/9/2004 Slide 6

Architecture Designed for HPC; Choice of Deployments

• NUMAflex™ Global Shared-Memory Architecture
• Balanced, scalable performance
• Operating environment optimized for HPC
• Low-latency memory access
• Easily Deployable

SGI® Origin® Family: MIPS® and IRIX®
SGI® Altix® Family: Intel® Itanium® 2 and Linux®

9/9/2004 Slide 7

Modular SGI® NUMAflex™ Architecture

Modules: System Interconnect; CPU & Memory; Storage Expansion; System I/O; High-Bandwidth I/O Expansion; Standard I/O Expansion; Graphics Expansion

• Scalability: No central bus or switch; just modules and NUMAlink™ cables
• Investment protection: Add new technologies as they evolve
• Flexibility: Tailored configurations for different dimensions of scalability
• Performance: High-bandwidth interconnect with very low latency

9/9/2004 Slide 8

SGI Family of Scalable Linux® Solutions

Image courtesy: NASA Ames

SGI® Altix® 3000 Servers and Superclusters

SGI® Altix® 350 Servers and Clusters

Mid-range Departmental Capability

9/9/2004 Slide 9

SGI® Altix® 3000 Scaling Roadmap

World's Most Scalable Linux® Supercomputer

This slide contains forward-looking statements. The results and forecasts as stated may vary. Other risks and uncertainties relating to this slide may be found in the "Safe-Harbor" statement at the beginning of this presentation.

[Roadmap chart: SSI Scalability and Shared Memory Scalability by date. Dates: Jan 2003, Apr 2003, Sept 2003, Oct 2003, Mar 2004, mid-2004, late 2004, 2005. Chart labels, in extraction order: 64p, 64p, 64p, 128p, 64p, 256p, 64p, 512p, 256p, 512p, 256p, 1024p, 512p, 2048p, 512p, 16,384 processor capability.]

9/9/2004 Slide 10

SGI® Altix® 3000 C-Brick Detail

• 4x Intel® Itanium® 2 processors
• 2 processors per 6.4GB/sec frontside bus
• 4–64GB memory C-brick
• SHUB memory controller: 8.51–10.2GB/sec memory bandwidth (varies with memory speed, 133 MHz vs. 166 MHz)
• 6.4GB/sec aggregate interconnect bandwidth
• 4.8GB/sec aggregate I/O bandwidth

Diagram labels: SHUB, SHUB; 16 x PC2100 or PC2700 DDR SDRAM; 8 to 16GB of memory per node; 8.51–10.2GB/sec memory b/w

9/9/2004 Slide 11

Other Altix Bricks

• PX-brick: PCI-X expansion
• D-brick2: Disk expansion
• R-brick2: Router interconnect
• IX-brick: Base I/O module
• M-brick: Memory expansion

9/9/2004 Slide 12

Altix 3700 — 128 processors

9/9/2004 Slide 13

Altix 3700 — 512 processors

9/9/2004 Slide 14

Next generation Altix

• (8) Fewer External Routers
• (1) Less Power Bay
• (1) Less Rack
• (20) Fewer Cables

Altix 3700 “Tornado”

9/9/2004 Slide 15

Next generation Altix

9/9/2004 Slide 16

The Altix® 350: “Expand on Demand” Growth Path

• Independently scale CPU, memory, I/O
• One Linux® instance to manage
• Investment protection & leverage current assets
• Allocate budget and resources to ongoing needs

Expansion modules: CPU Expansion Module; Memory Expansion Module; I/O Expansion (4-32) Module

Growth path:

• Base Unit: 1 or 2P / 2-24GB
• 1-4P / 2-48GB
• 1-8P / 2-96GB
• 1-12P / 2-144GB
• 1-16P / 2-192GB

Right-size systems for the ultimate price/performance
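The growth path above implies a fixed memory ceiling per processor. A minimal Python sketch, using only the configuration labels on this slide, checks that each step tops out at the same ratio. The `configs` name and the tuple layout are illustrative choices, not SGI terminology.

```python
# Maximum processor count and maximum memory (GB) for each step of the
# Altix 350 growth path, as listed on the slide.
configs = [(2, 24), (4, 48), (8, 96), (12, 144), (16, 192)]

# Maximum memory per processor at each step.
gb_per_proc = [max_gb / max_procs for max_procs, max_gb in configs]

for (max_procs, max_gb), ratio in zip(configs, gb_per_proc):
    print(f"{max_procs:>2}P, {max_gb:>3} GB max -> {ratio:.0f} GB per processor")
```

Every step works out to 12 GB per processor at the top of its range.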

9/9/2004 Slide 17

Customers with Altix systems running climate and weather applications

• NASA – 10240p – ECCO, CAM, CCSM, etc.
• NCSA – 1024p – WRF
• GFDL – 608p (2x256, 1x96) – MOM4, CM2.1
• NRL – 384p (1x128, 1x256) – various
• ORNL – 256p – CCSM, CAM, POP
• LANL – 256p – POP, HYCOM
• BAMS – 20p – MM5 and MAQSIP
• CMMACS – 12p Altix350 – MOM4
• APAT – 8p Altix350 – MM5
• Romanian Nat’l Met Admin – 2p Altix350 – HRM

9/9/2004 Slide 18

IFS - Performance on SGI Altix 3000

T511 performance in Forecast days per day

[Bar chart: forecast days per day (y-axis, 0 to 700) for IBM Power4 1.3GHz, SGI Altix 3000 1.3GHz, and SGI 3000 1.5GHz at 32, 64, 128, 256, 384, and 512 CPUs]

Altix 3000 @1.5GHz is 2.22x faster than IBM Power4 @1.3GHz

Source: Roland Richter, SGI, May 2004

9/9/2004 Slide 19

IFS - Scalability on SGI Altix 3000

Itanium 2 @1.5GHz is 1.3x faster than Itanium 2 @1.3GHz because of the larger cache and higher clock rate

Source: Roland Richter, SGI, May 2004
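The 1.3x figure can be split into a clock-rate part and a remainder. The sketch below is an illustrative decomposition only: it assumes that performance scales linearly with clock rate and that the two effects multiply. Neither assumption comes from the slide, and the variable names are hypothetical.

```python
# Figures quoted on the slide.
observed_speedup = 1.3   # 1.5 GHz part relative to the 1.3 GHz part
fast_clock_ghz = 1.5
slow_clock_ghz = 1.3

# Assumption: performance scales linearly with clock rate.
clock_factor = fast_clock_ghz / slow_clock_ghz

# Assumption: the effects multiply, so the remainder is what the larger
# cache (and anything else that differs) would have to contribute.
residual_factor = observed_speedup / clock_factor

print(f"clock-rate factor: {clock_factor:.3f}")
print(f"residual factor:   {residual_factor:.3f}")
```

Under these assumptions the clock rate would account for roughly a 15% gain and the remainder for roughly 13%.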

9/9/2004 Slide 20

HRM Performance on the Altix 350

78h forecast in 2340 time steps over a 181x217 horizontal grid with 26 vertical levels

[Chart: simulation speed (y-axis, 0 to 350) vs. Nproc (1, 2, 4, 8, 10, 12, 16). Series: Origin 350/600 MHz; Altix 350/1.4 GHz.]
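The HRM benchmark description on this slide fixes the model time step and the problem size. A short Python sketch of that arithmetic follows; the variable names are illustrative, and the numbers are the ones quoted on the slide.

```python
# HRM benchmark configuration as quoted on the slide.
forecast_hours = 78
time_steps = 2340
nx, ny, levels = 181, 217, 26

# Model time advanced by each step, in seconds.
step_seconds = forecast_hours * 3600 / time_steps

# Total number of grid points in the three-dimensional domain.
grid_points = nx * ny * levels

print(f"time step:   {step_seconds:.0f} s")
print(f"grid points: {grid_points:,}")
```

This gives a 120 s time step on a grid of about one million points.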

9/9/2004 Slide 21

LM-RAPS 2.1 (Optimization using shared memory)

Benchmark case used by Emy (Greece) in 2003

Grid 363x263x45

Scalability on Altix using Intel Compiler and SGI’s MPT library

9/9/2004 Slide 22

MM5 3.6.3 - Standard benchmark, 2004

Higher is Better

Source: http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20040304a.html

Altix version was compiled with the Intel 8.1.007 (beta) ifort Fortran compiler and the Intel 8.1.010 (beta) icc C compiler, and linked with SGI's MPI from MPT 1.10.

9/9/2004 Slide 23

WRF 1.3 - Scalability & Performance on Altix 3000

Source: http://www.mmm.ucar.edu/wrf/bench and Gerardo Cisneros, SGI

Altix 3000 is 1.28x faster than HP cluster at the same clock frequency.

9/9/2004 Slide 24

WRF 2.0.2 on a Large Problem

WRF SI 5km CONUS 48h forecast (980x720x37, 5km, 30s)

[Chart: elapsed time in seconds (y-axis, 0 to 50000) vs. NCPUs (32, 64, 128, 256, 512). Series: 1.5GHz/6MB L3; 1.3GHz/3MB L3.]
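The problem description "(980x720x37, 5km, 30s)" gives a sense of how large this WRF case is. The sketch below works out the grid size and step count, assuming that "30s" is the model time step; the slide does not say so explicitly, and the variable names are illustrative.

```python
# WRF 5km CONUS case as quoted on the slide.
forecast_hours = 48
nx, ny, levels = 980, 720, 37
step_seconds = 30  # assumption: "30s" in the slide is the model time step

# Number of time steps needed to cover the forecast period.
time_steps = forecast_hours * 3600 // step_seconds

# Total number of grid points in the three-dimensional domain.
grid_points = nx * ny * levels

print(f"time steps:  {time_steps:,}")
print(f"grid points: {grid_points:,}")
print(f"grid-point updates per forecast: {time_steps * grid_points:,}")
```

Under that assumption the case has about 26 million grid points advanced over 5,760 steps.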

9/9/2004 Slide 25

WRF 2.0.2 on a Large Problem

WRF SI 5km CONUS 48h forecast (980x720x37, 5km, 30s)

[Chart: simulation speed (y-axis, 0 to 40) vs. NCPUs (32, 64, 128, 256, 512). Series: 1.5GHz/6MB L3; 1.3GHz/3MB L3.]

9/9/2004 Slide 26

GFS Performance on the Altix 3700 (1.5GHz)

GFS T240L30 (720x360x30) 120h forecast

[Chart: simulation speed (y-axis, 0.00 to 400.00) vs. NCPUs, with data labels:]

NCPUs   Simulation speed
    1               2.23
    2               4.21
    4               9.58
    8              23.68
   16              51.15
   32              98.34
   64             190.01
   96             255.20
  128             343.91
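The data labels on this chart allow speedup and parallel efficiency to be computed directly. The sketch below assumes that the nine labels pair, in order, with the nine NCPUs ticks (1 through 128), and measures efficiency relative to the single-CPU run. The list names are illustrative.

```python
# Assumption: the nine data labels pair in order with the nine NCPUs ticks.
ncpus = [1, 2, 4, 8, 16, 32, 64, 96, 128]
speed = [2.23, 4.21, 9.58, 23.68, 51.15, 98.34, 190.01, 255.20, 343.91]

# Speedup relative to the single-CPU run, and efficiency = speedup / CPUs.
baseline = speed[0]
speedup = [s / baseline for s in speed]
efficiency = [su / n for su, n in zip(speedup, ncpus)]

for n, su, eff in zip(ncpus, speedup, efficiency):
    print(f"{n:>4} CPUs: speedup {su:7.2f}, efficiency {eff:5.2f}")
```

With this pairing, efficiency stays above 1 from 4 CPUs upward, which means the scaling is superlinear relative to the single-CPU run. A common cause of such behaviour is the per-CPU working set fitting better in cache as the problem is divided, but the slide does not state a reason.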

9/9/2004 Slide 27

MOM4 - Performance on SGI Altix 3700

0.25° Ocean Model (1440x600x24)

[Chart: elapsed time (left axis, 0 to 18,000) and speedup (right axis, 0 to 120) vs. number of processors (16, 32, 64, 128). Series: Altix 3000 @1.5GHz w/ Dynamic memory compilation; Altix 3000 @1.5GHz w/ Static memory compilation; Speedup. Data labels, in extraction order: 15,476; 7,813; 3,822; 2,308; 10,185; 4,640; 2,397.]

Source: Gerardo Cisneros, SGI, May 2004

9/9/2004 Slide 28

POP 1.4.3 (Optimization by reducing the # of synchronizations)

Increasing message length and reducing synchronization frequency improved performance by 20% at 256 CPUs

Scalability on ALTIX using Intel Compiler and SGI’s MPT library
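Why longer messages and fewer synchronizations help can be illustrated with a generic latency-plus-bandwidth cost model. The sketch below is not POP or MPT code; the function `transfer_time` and all parameter values are hypothetical and chosen only to show the shape of the effect.

```python
def transfer_time(total_bytes, n_messages, latency_s, bandwidth_bytes_per_s):
    """Time to send total_bytes split evenly into n_messages.

    Linear cost model: each message pays a fixed latency plus its
    payload size divided by the bandwidth.
    """
    bytes_per_message = total_bytes / n_messages
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)


# Hypothetical parameters for illustration only; these are not
# measurements of NUMAlink, MPT, or POP.
LATENCY_S = 2e-6      # fixed per-message cost, in seconds
BANDWIDTH = 1e9       # bytes per second
TOTAL_BYTES = 1_000_000

many_small = transfer_time(TOTAL_BYTES, 1000, LATENCY_S, BANDWIDTH)
few_large = transfer_time(TOTAL_BYTES, 10, LATENCY_S, BANDWIDTH)

print(f"1000 small messages: {many_small * 1e3:.2f} ms")
print(f"  10 large messages: {few_large * 1e3:.2f} ms")
```

In this model the payload cost is the same in both cases; only the number of fixed per-message costs changes, which is why aggregating data into fewer, longer messages reduces total time.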

9/9/2004 Slide 29

POP 1.4.3 - Performance on 1 Degree “X1” Problem

[Chart: simulated years per wallclock day (y-axis, 0.0 to 100.0) vs. number of processors (0 to 384). Series: Altix 3000 @1.5GHz (SGI); Altix 3000 @1.3GHz (SGI); Altix 3000 @1.5GHz (NASA).]

Source: Altix 3000 (SGI) - version optimized by Roland Richter using MPT; Altix 3000 (NASA) - Jim Taft’s version using MLP

POP 1.4.3 - Performance on SGI Altix 3000 SSI

[Chart: speedup (y-axis, 0 to 96) vs. number of processors (0 to 384). Series: Ideal; Altix 3000.]

Configuration: Altix 3000 @1.3GHz, 512P SSI, Intel compiler 7.1.035, MPT 1.9

Higher is Better

9/9/2004 Slide 30

CCSM 3.0 (beta08) Performance on SGI Altix 3000

Atmosphere-Land-Ice-Ocean Coupled Model

[Chart: forecast years per wallclock day (y-axis, 0 to 14) vs. number of processors, with data labels. Higher is better.]

Number of Processors   Forecast years/wallclock day
                  26                           2.36
                  52                           4.89
                  64                           5.06
                  78                           7.30
                  96                           8.36
                 104                           8.98
                 128                          10.36
                 208                          13.02

Run on 512P Altix 3000 @1.3GHz with efc 7.1.035 compiler

Speedup from 26P to 208P

[Chart: speedup (y-axis, 0 to 8) vs. number of processors, with data labels. Series: Ideal speedup; SGI Altix 3000 @1.3GHz.]

Number of Processors   Speedup
                  26       1.0
                  52       2.1
                  78       3.1
                 104       3.8
                 208       5.6

Source: Roland Richter, SGI
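The speedup chart on this slide can be cross-checked against the throughput chart. The sketch below assumes that the eight throughput labels pair, in order, with the eight processor counts on the axis, and computes speedup relative to the 26-processor run. The names `procs`, `years_per_day`, and `speedup_by_procs` are illustrative.

```python
# Assumption: the eight data labels pair in order with the eight
# processor counts shown on the throughput chart's axis.
procs = [26, 52, 64, 78, 96, 104, 128, 208]
years_per_day = [2.36, 4.89, 5.06, 7.30, 8.36, 8.98, 10.36, 13.02]

base_procs, base_rate = procs[0], years_per_day[0]

# Speedup relative to the 26-processor run.
speedup_by_procs = {p: rate / base_rate for p, rate in zip(procs, years_per_day)}

for p in procs:
    ideal = p / base_procs
    print(f"{p:>3}P: speedup {speedup_by_procs[p]:.2f} (ideal {ideal:.2f})")
```

With this pairing the computed values for 52P, 78P, and 104P round to the 2.1, 3.1, and 3.8 shown on the speedup chart. The 208P value comes out at about 5.5, slightly below the charted 5.6, which may reflect rounding in the labels.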

9/9/2004 Slide 31

Conclusions

The Altix is proving to be an excellent system for weather and climate codes

• Shared memory allows very large models to run
• The compilers are getting better
• The libraries are excellent and still improving
• SGI’s software layers on top of Linux help jobs attain their best performance
• SGI engineers have the know-how to make weather and climate codes run well on its customers’ SGI systems
