Blue Gene Experience at the National Center for Atmospheric Research, October 4, 2006


Page 1: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006

Computer Science Section, National Center for Atmospheric Research

Department of Computer Science, University of Colorado at Boulder

Blue Gene Experience at the National Center for Atmospheric Research

October 4, 2006

Theron Voran
[email protected]

Page 2: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Why Blue Gene?

Extreme scalability, balanced architecture, simple design
Efficient energy usage
A change from IBM Power systems at NCAR, but familiar:
Programming model
Chip (similar to Power4)
Linux on front-end and IO nodes

Interesting research platform

Page 3: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Outline

System Overview
Applications
In the Classroom
Scheduler Development
TeraGrid Integration
Other Current Research Activities

Page 4: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Frost Fun Facts

Collaborative effort: Univ of Colorado at Boulder (CU), NCAR, Univ of Colorado at Denver

Debuted in June 2005, tied for 58th place on the Top500
5.73 Tflops peak, 4.71 Tflops sustained (see the arithmetic sketch below)

25 KW loaded power usage
4 front-ends, 1 service node
6 TB usable storage

Why is it leaning?

Henry Tufo and Rich Loft, with Frost
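Where the 5.73 Tflops figure comes from, as a quick sanity check: one BG/L rack is 1024 compute nodes with two PPC440 cores each, and the usual BG/L peak accounting assumes 4 floating-point operations per cycle per core (the 2-wide "double FPU" with fused multiply-add). A minimal sketch:

```python
# Back-of-the-envelope check of Frost's peak performance.
# Assumptions: one BG/L rack = 1024 compute nodes, 2 cores/node,
# 700 MHz clock, and 4 flops/cycle/core (2-wide "double FPU" with
# fused multiply-add) -- the standard BG/L peak accounting.
nodes = 1024
cores_per_node = 2
clock_hz = 700e6
flops_per_cycle = 4

peak = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"peak = {peak / 1e12:.2f} Tflops")  # -> peak = 5.73 Tflops
```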

Page 5: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


System Internals

[Block diagram: the Blue Gene/L system-on-a-chip. Two PPC440 cores (one usable as an I/O processor), each with a "double FPU" and 32k/32k L1 caches; L2 caches with snoop; a multiported shared SRAM buffer; a shared L3 directory (including ECC) for 4MB of EDRAM, usable as L3 cache or memory; DDR control with ECC driving 144-bit-wide DDR (256MB); and on-chip network interfaces: 3D torus (6 out and 6 in, each link at 1.4 Gbit/s), tree (3 out and 3 in, each link at 2.8 Gbit/s), global interrupt (4 global barriers or interrupts), Gbit Ethernet, and JTAG access.]

Page 6: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


More Details

Chips
PPC440 @ 700 MHz, 2 cores per node
512 MB memory per node
Coprocessor vs Virtual Node mode (see the sketch below)
1:32 IO-to-compute node ratio

Interconnects
3D Torus (154 MB/s in one direction)
Tree (354 MB/s)
Global Interrupt
GigE
JTAG/IDO

Storage
4 Power5 systems as a GPFS cluster
NFS export to BG/L IO nodes
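A minimal sketch of the Coprocessor (CO) vs Virtual Node (VN) trade-off, using only the per-node figures from this slide: in CO mode one core runs the application while the second handles communication; in VN mode both cores run MPI tasks and split the node's 512 MB.

```python
# Per-task memory in Coprocessor (CO) vs Virtual Node (VN) mode,
# using only the 512 MB/node figure from this slide.
NODE_MEM_MB = 512
for mode, tasks in [("CO", 1), ("VN", 2)]:
    # CO: one MPI task per node, second core offloads communication.
    # VN: two MPI tasks per node, each sees half the memory.
    print(f"{mode}: {tasks} MPI task(s)/node, ~{NODE_MEM_MB // tasks} MB each")
```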

Page 7: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Frost Utilization

[Chart: BlueGene/L (Frost) usage, January 1 through September 10, 2006; utilization on a 0-100% axis, sampled at two-week intervals.]

Page 8: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


HOMME

High Order Method Modeling Environment

Spectral element dynamical core
Proved scalable on other platforms
Cubed-sphere topology
Space-filling curves (see the partitioning sketch below)
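A toy illustration of the space-filling-curve idea, not HOMME's code: order the elements of a single square face by a Morton (Z-order) curve and hand each processor a contiguous chunk. Contiguous chunks of the curve are spatially compact, so most element neighbors stay on-processor. (HOMME itself uses Hilbert-type curves over the whole cubed sphere; Morton just keeps this sketch short.)

```python
# Toy space-filling-curve partitioning: order the elements of one
# square face by a Morton (Z-order) curve, then give each processor a
# contiguous chunk of the curve.

def morton(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x and y to get the Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def partition(n: int, nprocs: int) -> dict:
    """Assign an n x n grid of elements to nprocs contiguous chunks."""
    curve = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda e: morton(*e))
    chunk = len(curve) // nprocs
    return {p: curve[p * chunk:(p + 1) * chunk] for p in range(nprocs)}

if __name__ == "__main__":
    # 64 elements on 4 processors: each gets a compact 4x4 patch.
    for proc, elems in partition(8, 4).items():
        print(proc, elems[:4], "...")
```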

Page 9: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


HOMME Performance

Ported in 2004 to the BG/L prototype at TJ Watson, with the eventual goal of a Gordon Bell submission in 2005

Serial and parallel obstacles:
SIMD instructions
Eager vs Adaptive routing
Mapping strategies (a snake-mapping sketch follows the diagram below)

Result: good scalability out to 32,768 processors (3 elements per processor)

[Diagram: snake mapping on an 8x8x8 3D torus, with consecutive ranks 0-7 winding back and forth through the mesh.]
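A sketch of the snake mapping in the diagram: a boustrophedon walk of the torus that flips direction on alternate rows and planes, so consecutive MPI ranks land on physically adjacent nodes. Illustrative only; the mapping used in the actual runs may differ in detail.

```python
# Boustrophedon ("snake") walk of an X x Y x Z mesh: flip direction on
# alternate rows and planes so every hop between consecutive ranks is
# a single link.

def snake_map(X: int, Y: int, Z: int) -> list:
    """Return node coordinates in rank order; every hop is one link."""
    order = []
    for z in range(Z):
        ys = range(Y) if z % 2 == 0 else range(Y - 1, -1, -1)
        for y in ys:
            xs = range(X) if (y + z) % 2 == 0 else range(X - 1, -1, -1)
            for x in xs:
                order.append((x, y, z))
    return order

if __name__ == "__main__":
    path = snake_map(8, 8, 8)
    # Sanity check: consecutive ranks differ by one unit in one axis.
    assert all(sum(abs(a - b) for a, b in zip(p, q)) == 1
               for p, q in zip(path, path[1:]))
    print(path[:8])
```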

Page 10: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


HOMME Scalability on 32 Racks

Page 11: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Other Applications

Popular codes on Frost: WRF, CAM, POP, CICE, MPIKAIA, EULAG, BOB, PETSc

Used as a scalability test bed, in preparation for runs on the 20-rack BG/W system

Page 12: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Classroom Access

Henry Tufo’s ‘High Performance Scientific Computing’ course at University of Colorado

Let students loose on 2048 processors
Thinking BIG
Throughput and latency studies
Scalability tests: Conway's Game of Life (a serial sketch of the kernel follows below)
Final projects

Feedback from ‘novice’ HPC users
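For reference, a serial sketch of the Game of Life kernel behind those scalability tests. The classroom versions wrapped something like this in MPI with halo exchanges, which are omitted here; NumPy's periodic roll stands in for the boundary exchange.

```python
# Serial Game of Life kernel. np.roll gives periodic boundaries,
# mirroring the wrap-around that a distributed version would handle
# with halo exchanges between ranks.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """Advance one generation on a periodic grid of 0s and 1s."""
    neighbors = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    # Born with exactly 3 neighbors; survive with 2 or 3.
    alive = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive.astype(grid.dtype)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.integers(0, 2, size=(64, 64), dtype=np.int8)
    for _ in range(10):
        g = life_step(g)
    print(g.sum(), "live cells after 10 generations")
```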

Page 13: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Cobalt

Component-Based Lightweight Toolkit
Open-source resource manager and scheduler
Developed by ANL along with NCAR/CU

Component architecture
Communication via XML-RPC (see the sketch below)
Process manager, queue manager, scheduler

~3000 lines of Python code
Also manages traditional clusters

http://www.mcs.anl.gov/cobalt
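A minimal stand-in for the XML-RPC component pattern, using modern Python 3 standard-library modules. This is not Cobalt code; the component name, port, and method are invented for illustration.

```python
# Two "components" talking over XML-RPC, as Cobalt's components do
# (toy sketch; names, port, and method are illustrative).
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def start_queue_manager(port: int = 8765) -> SimpleXMLRPCServer:
    """Run a toy 'queue manager' component in a background thread."""
    jobs = []

    def add_job(spec):
        jobs.append(spec)
        return len(jobs)  # toy job id

    server = SimpleXMLRPCServer(("localhost", port), logRequests=False)
    server.register_function(add_job, "add_job")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    start_queue_manager()
    # Another component (say, the scheduler) talks to it via a proxy.
    qm = xmlrpc.client.ServerProxy("http://localhost:8765")
    print("job id:", qm.add_job({"nodes": 512, "walltime": "00:30:00"}))
```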

Page 14: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Cobalt Architecture

Page 15: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Cobalt Development Areas

Scheduler improvements
Efficient packing (see the toy simulation below)
Multi-rack challenges
Simulation ability
Tunable scheduling parameters

Visualization
Aid in scheduler development
Give users (and admins) better understanding of machine allocation

Accounting / project management and logging
Blue Gene/P
TeraGrid integration
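A toy simulation of the packing problem, in the spirit of the simulation ability listed above. BG/L space is allocated in power-of-two blocks, so rounding a job up to its block size leaves internal waste the scheduler wants to minimize. The 32-node minimum block and the first-fit policy are assumptions for illustration, not Cobalt's algorithm.

```python
# Toy first-fit packing simulator. Assumptions (illustration only):
# a 1024-node rack, a 32-node minimum block, and allocations rounded
# up to power-of-two multiples of that block.

MACHINE_NODES = 1024
MIN_BLOCK = 32

def first_fit(job_sizes):
    """Place jobs greedily; report placements, internal waste, free nodes."""
    free = MACHINE_NODES
    placed, waste = [], 0
    for want in job_sizes:
        alloc = MIN_BLOCK
        while alloc < want:
            alloc *= 2  # round up to the next power-of-two block
        if alloc <= free:
            free -= alloc
            placed.append((want, alloc))
            waste += alloc - want
    return placed, waste, free

if __name__ == "__main__":
    placed, waste, free = first_fit([100, 512, 30, 200, 64])
    print("placed:", placed)
    print("internal waste:", waste, "nodes; free:", free, "nodes")
```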

Page 16: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


NCAR joins the TeraGrid, June 2006

Page 17: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


TeraGrid Testbed

[Diagram: TeraGrid testbed. An experimental environment (CU experimental systems and a computational cluster of Enterprise 6000-class servers behind a CSS switch) and a production environment (storage cluster and datagrid behind a NETS switch), linked to the TeraGrid over NLR, with connections to the NCAR 1Gb network and the FRGP.]

Page 18: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


TeraGrid Activities

Grid-enabling Frost
Common TeraGrid Software Stack (CTSS)
Grid Resource Allocation Manager (GRAM) and Cobalt interoperability
Security infrastructure

Storage cluster
16 OSTs, 50-100 TB usable storage
10G connectivity
GPFS-WAN
Lustre-WAN

Page 19: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Other Current Research Activities

Scalability of CCSM components: POP, CICE

Scalable solver experiments: efficient communication mapping

Coupled climate models: petascale parallelism

Meta-scheduling: across sites; Cobalt vs other schedulers

Storage: PVFS2 + ZeptoOS; Lustre

Page 20: Blue Gene Experience at the National Center for Atmospheric Research October 4, 2006


Frost has been a success as a …

Research experiment: utilization rates

Educational tool: classroom use, fertile ground for grad students

Development platform: petascale problems, systems work

Questions?

[email protected]
http://wiki.cs.colorado.edu/BlueGeneWiki