
Page 1: Trends and Perspectives for HPC infrastructures

Trends and Perspectives for HPC infrastructures

Carlo Cavazzoni, CINECA

Page 2: Trends and Perspectives for HPC infrastructures

Outline

- HPC resources in Europe (PRACE)
- Today's HPC architectures
- Technology trends
- CINECA roadmap (toward 50 PFlops)
- EuroExa project

Page 3: Trends and Perspectives for HPC infrastructures

The PRACE RI provides access to distributed, persistent, pan-European, world-class HPC computing and data management resources and services. Expertise in the efficient use of the resources is available through the participating centres throughout Europe. Available resources are announced for each Call for Proposals.

Peer-reviewed open access:
- PRACE Projects (Tier-0)
- PRACE Preparatory Access (Tier-0)
- DECI Projects (Tier-1)

[Pyramid diagram: Tier 0 = European systems (PRACE), Tier 1 = national systems, Tier 2 = local systems]

Page 4: Trends and Perspectives for HPC infrastructures

TIER-0 Systems, PRACE regular calls

- CURIE (GENCI, FR): Bull cluster, Intel Xeon, Nvidia cards, InfiniBand network
- FERMI (CINECA, IT) & JUQUEEN (Juelich, DE): IBM BG/Q, Power processors, custom 5D torus network
- HERMIT (HLRS, DE): Cray XE6, AMD processors, custom 3D torus network, 1 PFlops
- MARENOSTRUM (BSC, ES): IBM iDataPlex, Intel Xeon nodes, InfiniBand network
- SuperMUC (LRZ, DE): IBM iDataPlex, Intel Xeon nodes, InfiniBand network

Page 5: Trends and Perspectives for HPC infrastructures

DECI site | Machine name | System type | Chip | Peak perf. (TFlops) | GPU cards
Bulgaria (NCSA) | EA "ECNIS" | IBM BG/P | PowerPC 450 | 27 | -
Czech Republic (VSB-TUO) | Anselm | Bull Bullx | Intel Sandy Bridge-EP | 66 | 23 nVIDIA Tesla, 4 Intel Xeon Phi P5110
Finland (CSC) | Sisu | Cray XC30 | Intel Sandy Bridge | 244.9 | -
France (CINES) | Jade | SGI ICE EX8200 | Intel Quad-Core E5472/X5560 | 267.88 | -
France (IDRIS) | Babel | IBM BG/P | PowerPC 450 | 139 | -
Germany (Jülich) | JuRoPA | Intel cluster | Intel Xeon X5570 | 207 | -
Germany (RZG) | Genius | IBM BG/P | PowerPC 450 | 54 | -
Germany (RZG) | - | iDataPlex | Intel Sandy Bridge | 200 | -
Ireland (ICHEC) | Stokes | SGI ICE 8200EX | Intel Xeon E5650 | - | -
Italy (CINECA) | PLX | iDataPlex | Intel Westmere | 293 | 548 nVIDIA Tesla M2070/M2070Q
Norway (SIGMA) | Abel | MegWare cluster | Intel Sandy Bridge | 260 | -
Poland (WCSS) | Supernova | Cluster | Intel Westmere-EP | 51.58 | -
Poland (PSNC) | chimera | SGI UV1000 | Intel Xeon E7-8837 | 21.8 | -
Poland (PSNC) | cane | AMD & GPU cluster | AMD Opteron 6234 | 224.3 | 334 NVIDIA Tesla M2050
Poland (ICM) | boreasz | IBM Power 775 | IBM Power7 | 74.5 | -
Poland (Cyfronet) | Zeus-gpgpu | Linux cluster | Intel Xeon X5670/E5645 | 136.8 | 48 M2050, 160 M2090
Spain (BSC) | MinoTauro | Bull CUDA cluster | Intel Xeon E5649 | 182 | 256 nVIDIA Tesla M2090
Sweden (PDC) | Lindgren | Cray XE6 | AMD Opteron | 305 | -
Switzerland (CSCS) | Monte Rosa | Cray XE6 | AMD Opteron | 402 | -
The Netherlands (SARA) | Huygens | IBM pSeries 575 | Power6 | 65 | -
Turkey (UYBHM) | Karadeniz | HP cluster | Intel Xeon 5550 | 2.5 | -
UK (EPCC) | HECToR | Cray XE6 | AMD Opteron | 829.03 | -
UK (ICE-CSE) | ICE Advance | IBM BG/Q | PowerPC A2 | 1250 | -

TIER-1 Systems, DECI calls

Page 6: Trends and Perspectives for HPC infrastructures

HPC Architectures

Two models:

Hybrid:
- Server class nodes: server class processors
- Special purpose nodes
- Accelerator devices: Nvidia, Intel, AMD, FPGA

Homogeneous:
- Server class nodes: standard processors
- Special purpose nodes: special purpose processors

Page 7: Trends and Perspectives for HPC infrastructures

Networks

- Standard/switched: InfiniBand
- Special purpose/topology: IBM BG/Q, Cray, Tofu (Fujitsu), TH Express-2 (Tianhe-2) (see the sketch below)
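
Several of the custom networks above (BG/Q, Cray, Tofu) use torus topologies. A minimal sketch of torus addressing, assuming a hypothetical 4x4x4 node grid (illustrative, not from the slides): each node's neighbors are found by wrapping coordinates periodically.

```c
/* Neighbor ranks on a 3D torus (periodic wrap), the kind of topology
 * used by the BG/Q and Cray custom networks (illustrative sketch). */
#include <stdio.h>

#define NX 4
#define NY 4
#define NZ 4

/* Linearize torus coordinates, wrapping each dimension periodically. */
static int torus_rank(int x, int y, int z)
{
    x = (x + NX) % NX;
    y = (y + NY) % NY;
    z = (z + NZ) % NZ;
    return (z * NY + y) * NX + x;
}

int main(void)
{
    /* The six nearest neighbors of node (0,0,0) on a 4x4x4 torus. */
    printf("+x:%d -x:%d +y:%d -y:%d +z:%d -z:%d\n",
           torus_rank(1, 0, 0), torus_rank(-1, 0, 0),
           torus_rank(0, 1, 0), torus_rank(0, -1, 0),
           torus_rank(0, 0, 1), torus_rank(0, 0, -1));
    return 0;
}
```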

Page 8: Trends and Perspectives for HPC infrastructures

Programming Models

Fundamental paradigms:
- Message passing
- Multi-threading
- Consolidated standard: MPI & OpenMP (see the hybrid sketch below)
- New task-based programming models

Special purpose for accelerators:
- CUDA
- Intel offload directives
- OpenACC, OpenCL, etc.
- No consolidated standard

Scripting: Python
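
A minimal hybrid sketch of the consolidated MPI & OpenMP pair (illustrative, not from the slides): each MPI rank spawns a team of OpenMP threads, the common pattern on today's multi-core clusters.

```c
/* Minimal hybrid MPI + OpenMP sketch: one MPI rank per node,
 * one OpenMP thread per core (illustrative example). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for thread support so OpenMP regions are safe alongside MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        printf("rank %d/%d, thread %d/%d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Built with something like `mpicc -fopenmp`, and typically launched with one rank per node and OMP_NUM_THREADS set to the number of cores per node.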

Page 9: Trends and Perspectives for HPC infrastructures

Roadmap to Exascale (architectural trends)

Page 10: Trends and Perspectives for HPC infrastructures

Dennard scaling law (downscaling)

Old VLSI generation, where L = feature size, V = supply voltage, F = frequency, D = transistor density, P = power density:

L' = L / 2
V' = V / 2
F' = F * 2
D' = 1 / L'^2 = 4 * D
P' = P

It does not hold anymore: the power crisis!

New VLSI generation:

L' = L / 2
V' = ~V
F' = ~F * 2
D' = 1 / L'^2 = 4 * D
P' = 4 * P

The core frequency and performance no longer grow following Moore's law; instead the number of cores is increased, to keep the architectures' evolution on Moore's law.

The programming crisis!
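
The "4 * P" step follows from the standard CMOS dynamic-power model, which the slide leaves implicit; a short derivation under that assumption:

```latex
% Power density P = C V^2 f D, where C is the per-device capacitance
% (C' = C/2, since capacitance scales with the feature size L).
\begin{align*}
\text{Dennard:}      \quad P' &= \tfrac{C}{2}\left(\tfrac{V}{2}\right)^{2}(2f)(4D) = C V^{2} f D = P \\
\text{Post-Dennard:} \quad P' &= \tfrac{C}{2}\,V^{2}\,(2f)(4D) = 4\,C V^{2} f D = 4P
\end{align*}
```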

Page 11: Trends and Perspectives for HPC infrastructures

Moore's Law

An economic and market law: it is all about the number of chips per Si wafer (see the sketch below).

From the WSJ: Stacy Smith, Intel's chief financial officer, later gave some more detail on the economic benefits of staying in the Moore's Law race.

The cost per chip "is going down more than the capital intensity is going up," Smith said, suggesting Intel's profit margins should not suffer because of heavy capital spending. "This is the economic beauty of Moore's Law." And Intel has a good handle on the next production shift, shrinking circuitry to 10 nanometers. Holt said the company has test chips running on that technology. "We are projecting similar kinds of improvements in cost out to 10 nanometers," he said. So, despite the challenges, Holt could not be induced to say there's any looming end to Moore's Law, the invention race that has been a key driver of electronics innovation since first defined by Intel's co-founder in the mid-1960s.
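
A back-of-the-envelope check of the chips-per-wafer point, using the standard dies-per-wafer approximation (an assumption, not from the slides): halving the die area roughly doubles the chips cut from a 300 mm wafer.

```c
/* Dies per wafer, standard approximation (illustrative assumption):
 * dies = pi*(d/2)^2/A - pi*d/sqrt(2*A), wafer diameter d, die area A.
 * Shrinking the die (Moore's law) is what drives cost per chip down. */
#include <math.h>
#include <stdio.h>

static const double PI = 3.14159265358979;

static double dies_per_wafer(double wafer_d_mm, double die_area_mm2)
{
    double r = wafer_d_mm / 2.0;
    return PI * r * r / die_area_mm2
         - PI * wafer_d_mm / sqrt(2.0 * die_area_mm2);
}

int main(void)
{
    /* Hypothetical shrink from 200 mm^2 to 100 mm^2 on a 300 mm wafer. */
    printf("200 mm^2 die: %.0f dies/wafer\n", dies_per_wafer(300.0, 200.0));
    printf("100 mm^2 die: %.0f dies/wafer\n", dies_per_wafer(300.0, 100.0));
    return 0;
}
```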

Page 12: Trends and Perspectives for HPC infrastructures

But!

There will still be 4~6 cycles (or technology generations) left until we reach 11~5.5 nm technologies, at which point we will reach the downscaling limit, sometime between 2020 and 2030 (H. Iwai, IWJT 2008).

[Figure: Si lattice, lattice constant 0.54 nm; 14 nm VLSI corresponds to roughly 300 atoms.]

Page 13: Trends and Perspectives for HPC infrastructures

What about Applications?

In a massively parallel context, an upper limit for the scalability of parallel applications is determined by the fraction of the overall execution time spent in non-scalable operations (Amdahl's law).

The maximum speedup tends to 1 / (1 − P), where P is the parallel fraction.

To exploit 1,000,000 cores, P must be 0.999999, i.e. a serial fraction of only 0.000001 (checked in the sketch below).
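
A quick numerical check of these figures (illustrative sketch, not from the slides): the speedup on n cores is 1 / ((1 − P) + P/n), which tends to 1 / (1 − P) as n grows.

```c
/* Amdahl's law: speedup on n cores for parallel fraction p
 * (illustrative sketch checking the slide's numbers). */
#include <stdio.h>

static double amdahl(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    double p = 0.999999;  /* parallel fraction from the slide */
    printf("limit (n -> inf): %.0f\n", 1.0 / (1.0 - p));  /* 1000000 */
    printf("n = 1e6 cores:    %.0f\n", amdahl(p, 1e6));   /* ~500000 */
    return 0;
}
```

Note the sobering detail: even with P = 0.999999, one million cores deliver only about half of the 1,000,000x asymptotic limit.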

Page 14: Trends and Perspectives for HPC infrastructures

Architectural trends

- Peak performance: Moore's law
- FPU performance: Dennard's law
- Number of FPUs: Moore + Dennard
- Application parallelism: Amdahl's law

Page 15: Trends and Perspectives for HPC infrastructures

HPC Architectures

Two models:
- Hybrid, but…
- Homogeneous, but…

Which 100 PFlops systems will we see? My guess:
- IBM (hybrid): Power8 + Nvidia GPU
- Cray (homogeneous/hybrid): with Intel only!
- Intel (hybrid): Xeon + MIC
- ARM (homogeneous): ARM chips only, but…
- Nvidia/ARM (hybrid): ARM + Nvidia
- Fujitsu (homogeneous): SPARC, high density, low power
- China (homogeneous/hybrid): with Intel only
- Room for AMD console chips

Page 16: Trends and Perspectives for HPC infrastructures

Chip Architecture

Strongly market driven: mobile, TV sets, screens, video/image processing.

- Intel: new architectures to compete with ARM; less Xeon, but PHI
- ARM: main focus on low-power mobile chips (Qualcomm, Texas Instruments, Nvidia, ST, etc.); new HPC market, server market
- NVIDIA: GPU alone will not last long; ARM+GPU, Power+GPU
- Power: embedded market; Power+GPU, the only chance for HPC
- AMD: console market; still some chance for HPC

Page 17: Trends and Perspectives for HPC infrastructures

CINECA Roadmaps

Page 18: Trends and Perspectives for HPC infrastructures

Roadmap 50PFlops

Power consumption:
- EURORA 50 KW, PLX 350 KW, BGQ 1000 KW + ENI
- EURORA or PLX upgrade 400 KW; BGQ 1000 KW; data repository 200 KW; ENI

R&D:
- Eurora / EuroExa STM/ARM board
- EuroExa STM/ARM prototype
- PCP prototype: 1 PF in a rack
- EuroExa STM/ARM PF platform
- ETP prototype towards exascale board

Deployment:
- Eurora industrial prototype, 150 TF
- Eurora or PLX upgrade: 1 PF peak, 350 TF scalar
- Multi-petaflop system
- Tier-0 50 PF, Tier-1
- Towards exascale

Timeline: 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020

Page 19: Trends and Perspectives for HPC infrastructures

High-level system requirements

- Electrical power consumption: 400 KW
- Physical size of the system: 5 racks
- System peak performance (CPU+GPU): on the order of 1 PFlops
- System peak performance (CPU only): on the order of 300 TFlops

Tier 1 CINECA

Procurement Q2014

Page 20: Trends and Perspectives for HPC infrastructures

High-level system requirements

- CPU architecture: Intel Xeon Ivy Bridge
- Number of cores per CPU: 8 @ >3 GHz, or 12 @ 2.4 GHz

The choice of frequency and core count depends on the socket TDP, on the system density, and on the cooling capacity.

- Number of servers: 500 - 600
  (Peak perf = 600 servers * 2 sockets * 12 cores * 3 GHz * 8 Flop/clk ≈ 345 TFlops)
  The number of servers may depend on the cost or on the geometry of the configuration, in terms of the number of CPU-only nodes and of CPU+GPU nodes.

- GPU architecture: Nvidia K40
- Number of GPUs: >500
  (Peak perf = 700 GPUs * 1.43 TFlops ≈ 1 PFlops)
  The number of GPU cards may depend on the cost or on the geometry of the configuration, in terms of the number of CPU-only nodes and of CPU+GPU nodes.

Both peak figures are checked in the sketch after this list.
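
The two peak-performance figures can be reproduced with a few lines (an illustrative sketch; the counts are those quoted above):

```c
/* Checks the peak-performance arithmetic quoted above
 * (illustrative sketch; the counts come from the slide). */
#include <stdio.h>

int main(void)
{
    /* CPU partition: 600 servers, 2 sockets, 12 cores, 3 GHz, 8 Flop/clk */
    double cpu_peak = 600 * 2 * 12 * 3e9 * 8;  /* Flop/s */
    /* GPU partition: 700 Nvidia K40 cards at 1.43 TFlops each */
    double gpu_peak = 700 * 1.43e12;           /* Flop/s */

    printf("CPU peak: %.1f TFlops\n", cpu_peak / 1e12);  /* 345.6 */
    printf("GPU peak: %.2f PFlops\n", gpu_peak / 1e15);  /* 1.00 */
    return 0;
}
```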

Tier 1 CINECA

Page 21: Trends and Perspectives for HPC infrastructures

High-level system requirements

- Identified vendors: IBM, Eurotech
- DRAM memory: 1 GByte/core
  The option of having a subset of nodes with a larger amount of memory will be requested.
- Local non-volatile memory: >500 GByte SSD/HD, depending on the cost and on the system configuration
- Cooling: liquid cooling system with a free-cooling option
- Scratch disk space: >300 TByte (provided by CINECA)

Tier 1 CINECA

Page 22: Trends and Perspectives for HPC infrastructures

Thank you