Towards energy efficient HPC: HP Apollo 8000 at Cyfronet, Part I
Patryk Lasoń, Marek Magryś


Page 1:

Towards energy efficient HPC

HP Apollo 8000 at Cyfronet

Part I

Patryk Lasoń, Marek Magryś

Page 2:
Page 3:

ACC Cyfronet AGH-UST

• established in 1973
• part of AGH University of Science and Technology in Krakow, PL
• provides free computing resources for scientific institutions
• centre of competence in HPC and Grid Computing
• IT service management expertise (ITIL, ISO 20k)
• member of PIONIER
• operator of Krakow MAN
• home for Zeus

Page 4:

International projects

Page 5:

PL-Grid Consortium

• Consortium creation – January 2007
  • a response to requirements from Polish scientists
  • due to ongoing Grid activities in Europe (EGEE, EGI_DS)
• Aim: a significant extension of the computing resources provided to the scientific community (start of the PL-Grid Programme)
• Development based on:
  • projects funded by the European Regional Development Fund as part of the Innovative Economy Programme
  • close international collaboration (EGI, ….)
  • previous projects (FP5, FP6, FP7, EDA…)
• National network infrastructure available: PIONIER National Project
• computing resources: on the Top500 list
• Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities

PL-Grid Consortium members: 5 Polish High Performance Computing centres, representing the communities, coordinated by ACC Cyfronet AGH

Page 6:

PL-Grid infrastructure

• Polish national IT infrastructure supporting e-Science
  • based upon resources of the most powerful academic resource centres
  • compatible and interoperable with the European Grid
  • offering grid and cloud computing paradigms
  • coordinated by Cyfronet
• Benefits for users
  • one infrastructure instead of 5 separate compute centres
  • unified access to software, compute and storage resources
  • non-trivial quality of service
• Challenges
  • unified monitoring, accounting, security
  • creating an environment of cooperation rather than competition
• Federation – the key to success

Page 7:

PLGrid Core project
Competence Centre in the Field of Distributed Computing Grid Infrastructures

• Budget: 104 949 901,16 PLN in total, including EC funding of 89 207 415,99 PLN

• Duration: 01.01.2014 – 30.11.2015

• Project Coordinator: Academic Computer Centre CYFRONET AGH

The main objective of the project is to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.

Page 8:

PLGrid Core project – services

• Basic infrastructure services
  • uniform access to distributed data
  • PaaS Cloud for scientists
  • maintenance environment for MapReduce-type applications
• End-user services
  • technologies and environments implementing the Open Science paradigm
  • computing environment for interactive processing of scientific data
  • platform for development and execution of large-scale applications organized in workflows
  • automatic selection of scientific literature
  • environment supporting mass data-farming computations

Page 9:

HPC at Cyfronet

(Timeline of systems, 2007–2013: Mars, Baribal, Panda, Zeus, Zeus FPGA, Zeus GPU, Zeus vSMP, Platon U3)

Page 10:
Page 11:
Page 12:
Page 13:

Zeus

• over 1300 servers
• HP BL2x220c blades
• HP BL685c fat nodes (64 cores, 256 GB)
• HP BL490c vSMP nodes (up to 768 cores, 6 TB)
• HP SL390s GPGPU (2x, 8x) nodes
• InfiniBand QDR (Mellanox + QLogic)
• >3 PB of disk storage (Lustre + GPFS)
• Scientific Linux 6, Torque/Moab

Page 14:
Page 15:

Zeus - statistics

• 2400 registered users

• >2000 jobs running simultaneously

• >22000 jobs per day

• 96 000 000 computing hours in 2013

• jobs lasting from minutes to weeks

• jobs from 1 core to 4000 cores

Page 16:

Cooling

(Diagram: air-cooled hot/cold aisle layout; racks draw 20°C air from the cold aisle and exhaust 40°C air into the hot aisles)
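To put the aisle temperatures in perspective, a rough sketch (the 680 kW figure comes from the later system configuration slide; air properties are textbook values, not from the slides): removing a given load on a 20 K air-side temperature rise takes a large volume flow,

\[
\dot V = \frac{Q}{\rho c_p \,\Delta T}
= \frac{680\,000\ \mathrm{W}}{1.2\ \mathrm{kg\,m^{-3}} \times 1005\ \mathrm{J\,kg^{-1}K^{-1}} \times 20\ \mathrm{K}}
\approx 28\ \mathrm{m^3/s},
\]

and moving that much air is fan work that a liquid-cooled design largely avoids.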

Page 17:
Page 18:

Why upgrade?

• Jobs growing

• Users hate queuing

• New users, new requirements

• Technology moving forward

• Power bill staying the same

Page 19:

New building

Page 20:

Requirements

• Petascale system
• Lowest TCO
• Energy efficient
• Dense
• Good MTBF
• Hardware:
  • core count
  • memory size
  • network topology
  • storage

Page 21:

Cooling

Page 22:

Direct Liquid Cooling!

• Up to 1000x more efficient heat exchange than air
• Less energy needed to move the coolant
• Hardware can handle
  • CPUs ~70°C
  • memory ~80°C
• Hard to cool 100% of HW with liquid
  • network switches
  • PSUs
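Why liquid wins, as a back-of-envelope comparison (textbook property values, not from the slides): per unit volume and per degree of temperature rise, water absorbs roughly 3,500 times more heat than air, so much smaller flows, and far less pump energy, carry the same load:

\[
\frac{(\rho c_p)_{\text{water}}}{(\rho c_p)_{\text{air}}}
\approx \frac{4.18\ \mathrm{MJ\,m^{-3}K^{-1}}}{1.2\ \mathrm{kJ\,m^{-3}K^{-1}}}
\approx 3500
\]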

Page 23:

MTBF

• The less movement the better
  • fewer pumps
  • fewer fans
  • fewer HDDs
• Example
  • pump MTBF: 50 000 hrs
  • fan MTBF: 50 000 hrs
  • 1800-node system MTBF: 7 hrs
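The 7-hour figure is a series-reliability estimate: with exponential failure times, the MTBF of N identical parts in series is the part MTBF divided by N. A minimal sketch, assuming roughly four moving parts (pumps plus fans) per node; the per-node part count is an assumption, not stated on the slide:

\[
\mathrm{MTBF}_{\text{system}} \approx \frac{\mathrm{MTBF}_{\text{part}}}{N_{\text{parts}}}
= \frac{50\,000\ \mathrm{hrs}}{1800 \times 4} \approx 7\ \mathrm{hrs}
\]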

Page 24:

The topology

(Diagram: a service island with service and storage nodes, plus three computing islands of 576 nodes each, all connected through core IB switches)

Page 25:

It should count

• Max jobsize ~10k cores
• Fastest CPUs, but compatible with old codes
  • two sockets are enough
  • CPUs, not accelerators
• Newest memory
  • and more than before
• Fast interconnect
  • still InfiniBand
  • but no need for a full CBB fat tree: the largest jobs fit within a single island

Page 26:

The hard part

• Public institution, public tender
• Strict requirements
  • 1.65 PFLOPS, max. 1728 servers
  • 128 GB DDR4 per node
  • warm water cooling, no pumps inside nodes
  • InfiniBand topology
  • compute + cooling, dry-cooler only
• Criteria: price, power, space

Page 27:
Page 28:

And the winner is…

• HP Apollo 8000
• Most energy efficient
• The only solution with 100% warm water cooling
• Least floor space needed
• Lowest TCO

Page 29:

Even more Apollo

• Focuses also on the ‘1’ in PUE!
  • power distribution
  • fewer fans
  • detailed monitoring
    • ‘energy to solution’
• Safer maintenance
• Fewer cables
• Prefabricated piping
• Simplified management
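For context, the standard definition behind that ‘1’ (the PUE formula is general knowledge, not from the slides): PUE divides total facility power by the power reaching the IT gear, so its floor of 1.0 is the IT load itself, and Apollo shrinks that term too via fewer fans and leaner power distribution. Using the figures from the next slide:

\[
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}}, \qquad
P_{\text{IT}} \approx \frac{680\ \mathrm{kW}}{1.05} \approx 648\ \mathrm{kW}
\]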

Page 30:

System configuration

• 1.65 PFLOPS (within the first 30 of the current Top500 list)

• 1728 nodes, Intel Haswell E5-2680v3

• 41472 cores, 13824 per island

• 216 TB DDR4 RAM

• PUE ~1.05, 680 kW total power

• 15 racks, 12.99 m²

• System ready for non-disruptive upgrade

• Scientific Linux 6 or 7
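The headline numbers cross-check; a quick sanity calculation, assuming the E5-2680 v3's 12 cores per socket at 2.5 GHz and 16 double-precision FLOPs per cycle per core with AVX2 FMA:

\[
1728 \times 2 \times 12 = 41\,472\ \text{cores}, \qquad
41\,472 \times 2.5\ \mathrm{GHz} \times 16\ \mathrm{FLOP/cycle} \approx 1.66\ \mathrm{PFLOPS}
\]

Likewise, 1728 nodes at the tender-required 128 GB each gives the quoted 216 TB of DDR4.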

Page 31:

Prometheus

• Created human

• Gave fire to the people

• Accelerated innovation

• Defeated Zeus

Page 32:

Deployment plan

• Contract signed on 20.10.2014

• Installation of the primary loop started on 12.11.2014

• First delivery (service island) expected on 24.11.2014

• Apollo piping should arrive before Christmas

• Main delivery in January

• Installation and acceptance in February

• Production work starting in Q2 2015

Page 33:
Page 34:
Page 35:

Future plans

• Benchmarking and Top500 submission

• Evaluation of Scientific Linux 7

• Moving users from the previous system

• Tuning of applications

• Energy-aware scheduling

• First experience presented at HP-CAST 24

Page 36:

[email protected]

Page 37:

More information

• www.cyfronet.krakow.pl/en
• www.plgrid.pl/en