IBM Deep Computing
© 2011 IBM Germany
SuperMUC HPC System
Klaus Gottschalk
HPC Systems Architect
IBM Germany
HPC Advisory Council
Switzerland Workshop 2011
Lugano, March 21-23, 2011
SuperMUC in Germany and Europe
The Leibniz Computing Center (LRZ) of the Bavarian Academy of Sciences
is a member of the Gauss Centre for Supercomputing.
GCS is the German federal computing centre with three sites:
– LRZ in Garching
– FZJ in Jülich
– HLRS in Stuttgart
GCS is a member of PRACE (Partnership for Advanced Computing in Europe)
– With 21 member states in Europe
– And four hosting partners (Germany, France, Italy, Spain) committing to fund EUR 100 million over 5 years
Access to SuperMUC will be regulated by the
– PRACE Scientific Steering Committee (with a peer review process)
– GCS Steering Committee
LRZ Supercomputer Performance (1990 - 2012)
[Chart: peak performance in GFlop/s and memory in GByte on a logarithmic scale from 1 to 10,000,000 (Giga, Tera, Peta), covering systems from the Cray Y-MP2, Cray Y-MP8, KSR, Fujitsu, Cray T90 and IBM SP2 through HLRB1 (Hitachi SR8000), HLRB2 (SGI Altix 4700) and Linux clusters up to SuperMUC (IBM); performance grows roughly 10-fold every 4 years, doubling about every 14.1 months.]
Source: LRZ – 13.12.2010
LRZ Power Consumption over the Years
[Chart: annual power consumption in MWh, scale 0 to 25,000.]
Source: LRZ – 13.12.2010
LRZ - Extension to the Facilities
Source: LRZ – 13.12.2010
LRZ Objective for SuperMUC
“To establish an integrated, highly energy efficient system and a programming environment which enable the solution of the most challenging scientific problems
from widely varying scientific areas.”*
* Ref [1]: RfP, Description of Goods and Services for the European High Performance Computer SuperMUC at LRZ, 23 September 2010, page 1
LRZ Procurement
Source: LRZ – 13.12.2010
Value of the SuperMUC System
SuperMUC represents a tightly integrated, innovative solution with a value proposition that
reduces the client's total cost of ownership and addresses the growth areas of x86 and green
computing.
The energy and cooling efficiency characteristics of the hardware and HPC software stack
provide a quantifiable cost reduction
Holistic view of the supercomputer hardware, software and applications
– Client running costs reduced by 40% compared to a standard HPC system of similar size
Scalability, functionality and quality of hardware, software and service provide a
qualifiable cost advantage
– Fewer problems by leveraging experience from other platforms
– Faster problem resolution by integrating development and support
– Client running costs reduced through less downtime
– Client running costs reduced through less management effort
SuperMUC Technical Highlights
Next PetaFLOP Computer in Germany within the Gauss Centre
Fastest Computer in Germany in 2011/2012
– 9414 Nodes with 2 Intel Sandy Bridge EP CPUs
– 209 Nodes with 4 Intel Westmere EX CPUs
– 9623 Nodes with 19660 CPUs / 158928 Cores in total
– 3 PetaFLOP/s Peak Performance
– 327 TB Memory
– InfiniBand Interconnect
– Large File Space for multiple purposes
10 PetaByte File Space based on IBM GPFS
with 200 GigaByte/s aggregated I/O Bandwidth
2 PetaByte NAS Storage with 10 GigaByte/s aggregated I/O Bandwidth
No GPGPUs or other Accelerator Technology
Innovative Technology for Energy Efficient Computing
– Hot Water Cooling
– Energy Aware Scheduling
Most Energy- and Cooling-Efficient High-End HPC System: Target PUE 1.1
LRZ Nodes
Planned 'thin' node design
– Based on IBM iDataPlex
– diskless
– 2 x Sandy Bridge-EP 8C
– 2 GB per core, 1600 MHz, 1 DIMM per channel (1DPC)
– 2 x InfiniBand mezzanine cards on PCIe
– 1 x simple 1 GigE for PXE
– 2 x PCIe Gen3 x16 slots available
Planned 'fat' node design
– native 4-socket Westmere-EX IBM HX5 Blade in IBM BladeCenter
– diskless
– 4 x Westmere-EX 10-Core, 2.40 GHz, 130 W
– 4 GB/core (the memory figures are cross-checked in the sketch below)
– 1 x InfiniBand QDR
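As a rough cross-check of the 327 TB headline figure from the technical highlights, the node counts and the per-core memory above can be combined. This is only a back-of-the-envelope sketch in Python; the binary TiB conversion and the assumption that all thin nodes carry 2 GB/core and all fat nodes 4 GB/core are mine, not an official LRZ calculation.

# Back-of-the-envelope memory total from the node configurations above
thin_gb = 9414 * 2 * 8 * 2   # nodes x CPUs x cores per CPU x GB per core
fat_gb = 209 * 4 * 10 * 4    # nodes x CPUs x cores per CPU x GB per core
print((thin_gb + fat_gb) / 1024)  # ~327, consistent with the 327 TB headline figure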
SuperMUC InfiniBand Interconnect: Compute + I/O Islands
[Diagram: a (pruned) InfiniBand interconnect links all islands. Each thin compute island (#1 ... #n) holds 518 nodes (node 1 ... node 518) behind its own InfiniBand switch. A migration/fat island contains the fat (f01, f02, ...) and migration nodes, and an I/O island combines I/O, login and management nodes behind an InfiniBand I/O fabric; GPFS connects to the LRZ backbone.]
Energy Aware LoadLeveler Features
1. Goals
• Identify idle nodes in the cluster and put them in the lowest power mode
• Provide system admins with query capabilities on historical power and energy usage by workload, user, etc.
• Reduce the energy consumption of workloads with minimal impact on performance
2. Choices for the system admin:
• Decide whether to use the Energy Optimize policy
• Decide the maximum power/performance/energy degradation an application may incur if the energy policy is applied
3. If the energy policy is on, it is applied only to jobs that match the performance degradation criteria
4. The system admin can query the LoadLeveler DB to evaluate the impact of the potential policy on performance degradation and energy saving (a sketch of this decision logic follows below)
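To make points 2 and 3 concrete, here is a minimal, purely illustrative Python sketch of the per-job decision. The Job fields, the 5% threshold and the profiling data are assumptions for the example, not the actual LoadLeveler interface.

from dataclasses import dataclass

MAX_DEGRADATION = 0.05  # admin-chosen limit: at most 5% slowdown allowed

@dataclass
class Job:
    name: str
    nominal_ghz: float
    slowdown_at_ghz: dict  # predicted relative slowdown per reduced frequency

def choose_frequency(job: Job) -> float:
    # Apply the energy policy only if some reduced frequency stays within
    # the degradation limit; otherwise keep the nominal frequency.
    candidates = [f for f, slow in job.slowdown_at_ghz.items()
                  if slow <= MAX_DEGRADATION]
    return min(candidates) if candidates else job.nominal_ghz

job = Job("cfd_run", 2.7, {2.3: 0.08, 2.5: 0.04, 2.6: 0.02})
print(choose_frequency(job))  # -> 2.5: lowest frequency within the 5% limit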
IBM xCAT and LoadLeveler Power and Energy Aware Goals
IBM xCAT
– Manage power consumption on an ad hoc (time) basis
For example, while the cluster is being installed, or when there is high power consumption in other parts of the lab for a period of time
Query: p-state, frequency, power consumption, CPU usage, fan speed, environment temperature
Set: p-state, idle mode
Tivoli LoadLeveler
– Report power and energy consumption per job
Energy report
– Optimize power and energy consumption:
When nodes are idle
» Set nodes to the least power-consuming mode: C6 state or shutdown
When nodes are running jobs
» For each job, set the frequency such that the power, performance or energy degradation is less than a given delta provided by the admin or user
[Diagram: Energy Report generated from the xCAT-LL DB.]
Smart Job Scheduling:
Energy Aware Application Scheduling and System Management
First Implementation of Energy Aware HPC Software Stack on x86
Application energy consumption will be monitored, stored and reported to the user
For a second run of the application, the scheduler will decide, based on administrative policies (sketched below),
– which processor frequency is optimal for the application
– a lower frequency reduces energy consumption
System nodes that are currently not in use will be put into sleep mode or shut down, based on the administrator's capacity expectations
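A minimal sketch of the "monitor the first run, optimize the second run" idea described above. The measurement table, the 10% runtime limit and the function name are invented for illustration and do not reflect the actual SuperMUC scheduler.

# Hypothetical measurements stored after a job's first run:
# frequency (GHz) -> (runtime in seconds, energy in kWh)
measured = {
    2.7: (3600, 12.0),
    2.5: (3750, 10.8),
    2.3: (4050, 10.5),
}

MAX_RUNTIME_INCREASE = 0.10  # administrative policy: at most 10% slower

def frequency_for_second_run(measured: dict) -> float:
    # Pick the frequency with the lowest measured energy whose runtime stays
    # within the allowed degradation relative to the nominal (highest) frequency.
    nominal_runtime = measured[max(measured)][0]
    allowed = [(energy, freq) for freq, (runtime, energy) in measured.items()
               if runtime <= nominal_runtime * (1 + MAX_RUNTIME_INCREASE)]
    return min(allowed)[1]

print(frequency_for_second_run(measured))  # -> 2.5 in this made-up example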
Green Datacenter Market Drivers and Trends
Increased green consciousness, and rising cost of power
IT demand outpaces technology improvements
– Server energy use doubled from 2000 to 2005;
expected to increase 15%/year
– 15% power growth per year is not sustainable
– Koomey study: servers use 1.2% of U.S. energy
ICT industries consume 2% of worldwide energy
– Carbon dioxide emissions comparable to global aviation
Source: IDC 2006, Document #201722, "The Impact of Power and Cooling on Datacenter Infrastructure", John Humphreys, Jed Scaramella; Brouillard, APC, 2006
Future datacenters dominated by energy cost;
half of the energy spent on cooling
Real Actions Needed
Aquasar – A Prototype of Hot Water cooling
Design and build a hot-water cooling system for the
- IBM BladeCenter H
- QS22 blade
- HS22 blade
The Design Goal and Challenge
- Build water cooling into BC-H blades, which had been designed for air cooling only.
- Design hot-water cooling for record-low emissions in data centers (MFlops/g CO2).
- Achieve a high energy efficiency of Aquasar (MFlops/W).
The project was funded by ETH Zurich and by IBM Research.
Aquasar Vision: The Zero-Emission Data Center
[Diagram: water loop at 60–65°C and 65–70°C; >95% of the heat is recovered.]
CAD Model of the Aquasar Water Cooled Blade HS22
Hot-Water Cooled Blade HS22
LRZ Water Cooled Node
[Photos: rack, node in rack, quick-disconnect couplers, and hot and cold manifolds attached to the rack.]
Cost Advantage with Target PUE 1.1
[Chart: stacked bars (scale 0 to 2) of system power consumption plus datacenter overhead, comparing a typical datacenter, a most efficient datacenter and SuperMUC; annotated with the 40% cost advantage.]
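For context on what the PUE target means: PUE is total facility power divided by IT equipment power, so the infrastructure overhead per watt of IT load is PUE - 1. A tiny sketch follows; the 1.5 comparison value is an assumed figure for a conventional datacenter, not taken from the slide.

def overhead_fraction(pue: float) -> float:
    # Cooling/infrastructure power per watt of IT load follows directly
    # from PUE = total facility power / IT power.
    return pue - 1.0

print(overhead_fraction(1.1))  # SuperMUC target: 0.1 W overhead per W of IT load
print(overhead_fraction(1.5))  # assumed conventional datacenter: 0.5 W per W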