IBM Deep Computing
© 2011 IBM Germany
SuperMUC HPC System
Klaus Gottschalk
HPC Systems Architect
IBM Germany
HPC Advisory Council
Switzerland Workshop 2011
Lugano, March 21-23, 2011
SuperMUC in Germany and Europe
The Leibniz Computing Center (LRZ) of the Bavarian Academy of Sciences
is a member of the Gauss Centre for Supercomputing.
GCS is the German federal computing centre with three sites:
– LRZ in Garching
– FZJ in Jülich
– HLRS in Stuttgart
GCS is a member of PRACE (Partnership for Advanced Computing in Europe)
– With 21 member states in Europe
– And four hosting partners (Germany, France, Italy, Spain) committing to fund EUR 100 million over 5 years
Access to SuperMUC will be regulated by the
– PRACE Scientific Steering Committee (with a peer review process)
– GCS Steering Committee
LRZ Supercomputer Performance (1990 - 2012)
[Chart: peak performance in GFlop/s and memory in GByte on a logarithmic scale from 1 to 10,000,000 (Giga, Tera, Peta), covering systems from the Cray Y-MP2, Cray Y-MP8, KSR, Fujitsu, Cray T90 and IBM SP2 through HLRB1 (Hitachi SR8000), HLRB2 (SGI Altix 4700) and Linux clusters up to SuperMUC (IBM); performance grows roughly 10-fold every 4 years, doubling about every 14.1 months.]
Source: LRZ – 13.12.2010
LRZ Power Consumption over the Years
[Chart: annual power consumption in MWh, scale 0 to 25,000.]
Source: LRZ – 13.12.2010
LRZ - Extension to the Facilities
Source: LRZ – 13.12.2010
LRZ Objective for SuperMUC
“To establish an integrated, highly energy efficient system and a programming environment which enable the solution of the most challenging scientific problems
from widely varying scientific areas.”*
* Ref [1]: RfP, Description of Goods and Services for the European High Performance Computer SuperMUC at LRZ, 23 September 2010, page 1
LRZ Procurement
Source: LRZ – 13.12.2010
Value of the SuperMUC System
SuperMUC represents a tightly integrated, innovative solution with a value proposition that
reduces the client's total cost of ownership and addresses the growth areas of x86 and green
computing.
The energy and cooling efficiency characteristics of the hardware and HPC software stack
provide a quantifiable cost reduction
Holistic view of the supercomputer hardware, software and applications
– Client running costs reduced by 40% compared to a standard HPC system of similar size
Scalability, functionality and quality of hardware, software and service provide a
qualifiable cost advantage
– Fewer problems by leveraging experience from other platforms
– Faster problem resolution by integrating development and support
– Client running costs reduced through less downtime
– Client running costs reduced through less management effort
SuperMUC Technical Highlights
Next PetaFLOP Computer in Germany within the Gauss Centre
Fastest Computer in Germany in 2011/2012
– 9414 Nodes with 2 Intel Sandy Bridge EP CPUs
– 209 Nodes with 4 Intel Westmere EX CPUs
– 9623 Nodes with 19660 CPUs / 158928 Cores in total
– 3 PetaFLOP/s Peak Performance
– 327 TB Memory
– InfiniBand Interconnect
– Large File Space for multiple purposes
10 PetaByte File Space based on IBM GPFS
with 200 GigaByte/s aggregated I/O Bandwidth
2 PetaByte NAS Storage with 10 GigaByte/s aggregated I/O Bandwidth
No GPGPUs or other Accelerator Technology
Innovative Technology for Energy Efficient Computing
– Hot Water Cooling
– Energy Aware Scheduling
Most Energy- and Cooling-Efficient High-End HPC System: Target PUE 1.1
LRZ Nodes
Planned 'thin' node design
– Based on IBM iDataPlex
– diskless
– 2 x Sandy Bridge-EP 8C
– 2 GB per core, 1600 MHz, 1 DIMM per channel (1DPC)
– 2 x InfiniBand mezzanine cards on PCIe
– 1 x simple 1 GigE for PXE
– 2 x PCIe Gen3 x16 slots available
Planned 'fat' node design
– native 4-socket Westmere-EX IBM HX5 Blade in IBM BladeCenter
– diskless
– 4 x Westmere-EX 10-Core, 2.40 GHz, 130 W
– 4 GB/core (the memory figures are cross-checked in the sketch below)
– 1 x InfiniBand QDR
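As a rough cross-check of the 327 TB headline figure from the technical highlights, the node counts and the per-core memory above can be combined. This is only a back-of-the-envelope sketch in Python; the binary TiB conversion and the assumption that all thin nodes carry 2 GB/core and all fat nodes 4 GB/core are mine, not an official LRZ calculation.

# Back-of-the-envelope memory total from the node configurations above
thin_gb = 9414 * 2 * 8 * 2   # nodes x CPUs x cores per CPU x GB per core
fat_gb = 209 * 4 * 10 * 4    # nodes x CPUs x cores per CPU x GB per core
print((thin_gb + fat_gb) / 1024)  # ~327, consistent with the 327 TB headline figure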
SuperMUC InfiniBand Interconnect: Compute + I/O Islands
[Diagram: a (pruned) InfiniBand interconnect links all islands. Each thin compute island (#1 ... #n) holds 518 nodes (node 1 ... node 518) behind its own InfiniBand switch. A migration/fat island contains the fat (f01, f02, ...) and migration nodes, and an I/O island combines I/O, login and management nodes behind an InfiniBand I/O fabric; GPFS connects to the LRZ backbone.]
Energy Aware LoadLeveler Features
1. Goals
• Identify idle nodes in the cluster and put them in the lowest power mode
• Provide system admins with query capabilities on historical power and energy usage by workload, user, etc.
• Reduce the energy consumption of workloads with minimal impact on performance
2. Choices for the system admin:
• Decide whether to use the Energy Optimize policy
• Decide the maximum power/performance/energy degradation an application may incur if the energy policy is applied
3. If the energy policy is on, it is applied only to jobs that match the performance degradation criteria
4. The system admin can query the LoadLeveler DB to evaluate the impact of the potential policy on performance degradation and energy saving (a sketch of this decision logic follows below)
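To make points 2 and 3 concrete, here is a minimal, purely illustrative Python sketch of the per-job decision. The Job fields, the 5% threshold and the profiling data are assumptions for the example, not the actual LoadLeveler interface.

from dataclasses import dataclass

MAX_DEGRADATION = 0.05  # admin-chosen limit: at most 5% slowdown allowed

@dataclass
class Job:
    name: str
    nominal_ghz: float
    slowdown_at_ghz: dict  # predicted relative slowdown per reduced frequency

def choose_frequency(job: Job) -> float:
    # Apply the energy policy only if some reduced frequency stays within
    # the degradation limit; otherwise keep the nominal frequency.
    candidates = [f for f, slow in job.slowdown_at_ghz.items()
                  if slow <= MAX_DEGRADATION]
    return min(candidates) if candidates else job.nominal_ghz

job = Job("cfd_run", 2.7, {2.3: 0.08, 2.5: 0.04, 2.6: 0.02})
print(choose_frequency(job))  # -> 2.5: lowest frequency within the 5% limit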
IBM xCAT and LoadLeveler Power and Energy Aware Goals
IBM xCAT
– Manage power consumption on an ad hoc (time) basis
For example, while the cluster is being installed, or when there is high power consumption in other parts of the lab for a period of time
Query: p-state, frequency, power consumption, CPU usage, fan speed, environment temperature
Set: p-state, idle mode
Tivoli LoadLeveler
– Report power and energy consumption per job
Energy report
– Optimize power and energy consumption:
When nodes are idle
» Set nodes to the least power-consuming mode: C6 state or shutdown
When nodes are running jobs
» For each job, set the frequency such that the power, performance or energy degradation is less than a given delta provided by the admin or user
[Diagram: Energy Report generated from the xCAT-LL DB.]
Smart Job Scheduling:
Energy Aware Application Scheduling and System Management
First Implementation of Energy Aware HPC Software Stack on x86
Application energy consumption will be monitored, stored and reported to the user
For a second run of the application, the scheduler will decide, based on administrative policies (sketched below),
– which processor frequency is optimal for the application
– a lower frequency reduces energy consumption
System nodes that are currently not in use will be put into sleep mode or shut down, based on the administrator's capacity expectations
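A minimal sketch of the "monitor the first run, optimize the second run" idea described above. The measurement table, the 10% runtime limit and the function name are invented for illustration and do not reflect the actual SuperMUC scheduler.

# Hypothetical measurements stored after a job's first run:
# frequency (GHz) -> (runtime in seconds, energy in kWh)
measured = {
    2.7: (3600, 12.0),
    2.5: (3750, 10.8),
    2.3: (4050, 10.5),
}

MAX_RUNTIME_INCREASE = 0.10  # administrative policy: at most 10% slower

def frequency_for_second_run(measured: dict) -> float:
    # Pick the frequency with the lowest measured energy whose runtime stays
    # within the allowed degradation relative to the nominal (highest) frequency.
    nominal_runtime = measured[max(measured)][0]
    allowed = [(energy, freq) for freq, (runtime, energy) in measured.items()
               if runtime <= nominal_runtime * (1 + MAX_RUNTIME_INCREASE)]
    return min(allowed)[1]

print(frequency_for_second_run(measured))  # -> 2.5 in this made-up example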
Green Datacenter Market Drivers and Trends
Increased green consciousness, and rising cost of power
IT demand outpaces technology improvements
– Server energy use doubled from 2000 to 2005;
expected to increase 15%/year
– 15% power growth per year is not sustainable
– Koomey study: servers use 1.2% of U.S. energy
ICT industries consume 2% of worldwide energy
– Carbon dioxide emissions comparable to global aviation
Source: IDC 2006, Document #201722, "The Impact of Power and Cooling on Datacenter Infrastructure", John Humphreys, Jed Scaramella; Brouillard, APC, 2006
Future datacenters dominated by energy cost;
half of the energy spent on cooling
Real Actions Needed
Aquasar – A Prototype of Hot Water cooling
Design and build a hot-water cooling system for the
- IBM BladeCenter H
- QS22 blade
- HS22 blade
The Design Goal and Challenge
- Build water cooling into BC-H blades, which had been designed for air cooling only.
- Design hot-water cooling for record-low emissions in data centers (MFlops/g CO2).
- Achieve a high energy efficiency of Aquasar (MFlops/W).
The project was funded by ETH Zurich and by IBM Research.
Aquasar Vision: The Zero-Emission Data Center
[Diagram: water loop at 60–65°C and 65–70°C; >95% of the heat is recovered.]
CAD Model of the Aquasar Water Cooled Blade HS22
Hot-Water Cooled Blade HS22
LRZ Water Cooled Node
[Photos: rack, node in rack, quick-disconnect couplers, and hot and cold manifolds attached to the rack.]
Cost Advantage with Target PUE 1.1
[Chart: stacked bars (scale 0 to 2) of system power consumption plus datacenter overhead, comparing a typical datacenter, a most efficient datacenter and SuperMUC; annotated with the 40% cost advantage.]
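For context on what the PUE target means: PUE is total facility power divided by IT equipment power, so the infrastructure overhead per watt of IT load is PUE - 1. A tiny sketch follows; the 1.5 comparison value is an assumed figure for a conventional datacenter, not taken from the slide.

def overhead_fraction(pue: float) -> float:
    # Cooling/infrastructure power per watt of IT load follows directly
    # from PUE = total facility power / IT power.
    return pue - 1.0

print(overhead_fraction(1.1))  # SuperMUC target: 0.1 W overhead per W of IT load
print(overhead_fraction(1.5))  # assumed conventional datacenter: 0.5 W per W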