Bridging the productivity gap in HPC Diego Rossinelli CSE Lab, ETH Zurich

Post on 04-Aug-2020

Page 1: Bridging the productivity gap in HPC · – Agile Manifesto Test-driven development Code refactors and hacks are a source of benefits Fine-grained Hack, debug, refactor – XP for

Bridging the productivity gap in HPC
Diego Rossinelli
CSE Lab, ETH Zurich

Unprecedented diversity

Industry switch to many-core architectures

● Unprecedented computing power for science/med apps

● Diverse architectural trends

● Unprecedented software challenges

Unprecedented divergence

Are these challenges properly addressed?

CUBISM/MPCF

Stanford CTR compressible turbulence

Shinjo, Umemura: Liquid jet breakup

PPM

Productivity-performance gap

Fast evolution of computing hardware may lead to

● Frequent rewrite of software

● Unsustainable development effort

● Or a suboptimal use of the hardware

(Side label: BERKELEY VIEW)

Dwarfs are gone

(Diagram labels: HARDWARE, APPLICATIONS)

● Idea: identify common patterns
– That are shared among different application domains
– That are difficult to execute efficiently

● Application developers rely on dwarf libraries
– Delegation of the development and optimization burden
– Increase of productivity
– HPC experts develop dwarf libraries
– Dwarf-specific software optimization techniques

Recent software from CSE Lab

● CUBISM-MPCF
– Started in 2012
– 6 months, 3 developers
– Shock-Bubble Interaction at Mach 3

● VP2.0
– Started in 2014
– 6 months, 1 developer
– Simulation of Collapsing Bubbles

● uDeviceX
– Started in 2014
– 8 months, 4 core developers, 4 HPC specialists
– Catching a Needle in a Flowing Haystack

In Silico Lab-On-A-Chip

● Focus: devices isolating CTCs (circulating tumor cells)
– CTC-iChip [M. Toner's Group]
  ● Deterministic Lateral Displacement
– Funnel ratchets [McFaul]
  ● Viscoelastic deformations

● Goal: numerical investigation

– Can we assess the device effectiveness?

– Can we improve it?

Dissipative Particle Dynamics

● N-body algorithm
● Short-range pairwise interactions
● Fluctuation-dissipation theorem
● Unified framework for complex FSI
– Walls as frozen particles
– Cells discretized as membranes
– Suspended RBCs, CTCs in the solvent
– Enables separation of scales

[Hoogerbrugge and Koelman, 1992] [Groot and Warren, 1997] [Espanol, 1995]
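To make "short-range pairwise interactions" concrete, here is a minimal sketch of the pairwise DPD force in the form popularized by Groot and Warren: a conservative, a dissipative and a random term, all acting along the line joining the two particles and vanishing beyond a cutoff. The function name `dpd_pair_force` and the default parameter values are illustrative assumptions, not taken from the slides or from uDeviceX.

```python
import math

def dpd_pair_force(ri, rj, vi, vj, xi,
                   a=25.0, gamma=4.5, kBT=1.0, rc=1.0, dt=0.01):
    """Force on particle i due to particle j (the force on j is the negative).

    xi is a zero-mean, unit-variance random number that must be the same
    for the pair (i, j) and (j, i), so that momentum is conserved.
    All default parameter values are illustrative assumptions.
    """
    rij = [p - q for p, q in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in rij))
    if r >= rc or r == 0.0:
        return [0.0, 0.0, 0.0]           # short-range: nothing beyond the cutoff

    e = [c / r for c in rij]             # unit vector from j to i
    vij = [p - q for p, q in zip(vi, vj)]

    w_r = 1.0 - r / rc                   # random-force weight
    w_d = w_r * w_r                      # dissipative weight, w_D = w_R^2
    sigma = math.sqrt(2.0 * gamma * kBT) # fluctuation-dissipation relation

    f_c = a * w_r                                                # conservative
    f_d = -gamma * w_d * sum(ec * vc for ec, vc in zip(e, vij))  # dissipative
    f_r = sigma * w_r * xi / math.sqrt(dt)                       # random

    magnitude = f_c + f_d + f_r
    return [magnitude * c for c in e]
```

The fluctuation-dissipation theorem appears here only as the constraint linking `sigma` to `gamma` and `kBT`; because every pair needs its own symmetric random number, the kernel is considerably heavier than a plain Lennard-Jones force.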

The In-Silico Lab-on-a-Chip: Petascale and High-Throughput Simulations of Microfluidics at Cell Resolution
Rossinelli, Tang, Lykov, Alexeev, Bernaschi, Hadjidoukas, Bisson, Joubert, Conti, Karniadakis, Fatica, Pivkin, Koumoutsakos
ACM Gordon Bell finalist 2015

Simulation Complexity

Unprecedented level of detail
● Up to 0.3 ml of blood / 1.4 billion RBCs
● Devices of up to 50 mm³
● Up to 1 trillion DPD particles
● 10-100 million steps
● 3-60 ms per step
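The step counts and step times above translate into wall-clock time by simple multiplication. The sketch below assumes, purely for illustration, that the extremes of the two quoted ranges can be paired; the slides do not say which step time goes with which step count, and the helper name `wall_clock_seconds` is hypothetical.

```python
def wall_clock_seconds(steps, seconds_per_step):
    """Total run time if every step takes the same time (an idealization)."""
    return steps * seconds_per_step

# Assumed pairings of the quoted ranges (not stated in the slides).
shortest = wall_clock_seconds(10e6, 3e-3)    # fewest steps, fastest step
longest = wall_clock_seconds(100e6, 60e-3)   # most steps, slowest step

print(f"shortest run: about {shortest / 3600:.1f} hours")
print(f"longest run:  about {longest / 86400:.1f} days")
```

Under these assumptions a run spans anything from hours to a couple of months of machine time, which is why the time per step matters so much.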

HPC Challenges

DPD:
● Irregular computation
● Irregular MPI messages
● Neighborhood changes at ~ every step
● About 10X more expensive than LJ (Lennard-Jones)
● Dominated by INT OPS

Consequences:
➢ Poor GPU execution
➢ Detrimental effect on the network (both bandwidth and latency)
➢ Verlet lists are not profitable here
➢ Increased TTS (time-to-solution)
➢ GK110 has a 3X slowdown for most of the integer instructions

(Figure: NV GK110)

Results

CTC-iChip1
● 13 tilted rows of egg-shaped obstacles
● Separates large cells from RBCs

Funnels ratchet
● 128 rows of shrinking funnels
● Separates large cells according to their viscoelastic properties

Supercomputing - uDeviceX

On Titan, considering 18,688 nodes:
● 30-40X faster than LAMMPS DPD with the GPU Package
  100X faster for blood simulations
● Achieving up to 65% of the nominal CPU+GPU peak
● 35% of peak overall, sustained
● Weak scaling efficiency of 99+%
  Strong scaling efficiency of 94% from 625 to 5,000 nodes
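For readers less familiar with the scaling figures, the sketch below spells out the usual definitions of strong and weak scaling efficiency and works out what the quoted 94% implies. The function names are hypothetical, and the definitions are the common textbook ones; the slides do not state exactly how the numbers were measured.

```python
def strong_scaling_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Fixed total problem size: achieved speedup divided by ideal speedup."""
    speedup = t_base / t_scaled
    ideal = n_scaled / n_base
    return speedup / ideal

def weak_scaling_efficiency(t_base, t_scaled):
    """Fixed work per node: ideally the time per step stays constant."""
    return t_base / t_scaled

def implied_speedup(efficiency, n_base, n_scaled):
    """Speedup implied by a given strong scaling efficiency."""
    return efficiency * n_scaled / n_base

# Going from 625 to 5,000 nodes is 8x the hardware; 94% efficiency
# therefore corresponds to a speedup of roughly 7.5x.
print(implied_speedup(0.94, 625, 5000))
```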

High-impact HPC software

Minimize risks
– Software should be built to answer one high-impact scientific question
– Domain-specific competences are a prerequisite (external is good too)
– Pick a suitable algorithm: advanced and established (trade-off?)
– Target an established platform/architecture. No future system!

HPC
– Life is 3D, computers are 1D/2D (for now) -> HPC will always make a difference
– When does HPC render computing game-changing?
– Next level: HW/SW codesign (Anton, bitcoin mining) and ASIC coprocessors (BGQ, web servers)

Primary goal and priorities
– Time-to-solution?
– Time-to-software?
➢ Time-to-science
   Papers, Proposals ( -> time-to-money )

Vertical solutions

● Horizontal solutions almost never pay off
– Absolutely avoid division of competences outside the group
– Avoid development of libraries
– Avoid homogeneity across developers

● Form a small group (5-10 people)
– Very heterogeneous competences
– Very high competences

● A-priori software design is a waste of time
– Except if you are a genius

● Mortals like us should embrace XP (Extreme Programming) principles
– Resources
– Quality
– Time
– Scope

(Diagram: stack of layers labeled Science/papers; SW interface; TTS, Accuracy; Numerical schemes; Map computation to ILP, DLP, TLP; Ninja/assembly; HW architecture. Side labels: "What we like most" and "Where credit goes".)

Give up on ambitions

Approach
– Start small, figure out all the details
  ● Nothing is easy, even fwrite
– Time-to-software
– Focus on just 1-2 platforms
– Agile Manifesto
  ● Test-driven development
  ● Code refactors and hacks are a source of benefits
  ● Fine-grained Hack, debug, refactor
– XP for team sizes < ~5
– Do force homogeneity inside the group
  ● Divergence of opinion is a source of benefits

And for the “puritans”...

● Embrace the sad truth
– We can't write HPC software with just one programming language
– Performance portability in HPC is a joke or a career
– Writing “clean code” is just a misleading feeling
– The standard package C/MPI/pthreads/OpenMP/CUDA will stay around

● Generality is nothing else than a big set of specific cases
– Generic programming does not make my software generic
– The most generic pieces of code around are written in C

● Do not underestimate vanilla codes
● Life is dirty, so is my code
● From the dirt, non-obvious beautiful things emerge
– Bitonic sort and recompaction at warp level in CUDA
– Distributed work stealing based on foMPI
– M4-unrolling and SSA for C kernels
– In-place 1D CUDA wavelet transforms

● Predicting re-usability and software design is almost impossible

Adopt UNIX and Linux philosophy

● Bell Labs
– Ritchie, Thompson, Sweldens
– UNIX, C, second-generation wavelets
– Embrace the UNIX philosophy

● Linux
– Very powerful thread scheduler, file system, fast I/O
– Powerful programming environment
– Good job in dealing with
  ● High oversubscription
  ● High performance I/O
  ● IPC
– Can't be ignored during the software development

● Cray and IBM supercomputers
– Put hard bounds on fanciness
– Example: aprun + fork + mpi init

End / Conversation

Abstracting the hardware

How can we characterize hardware performance?
● We target high throughput
● GFLOP/s (or other operations /s, /cycle)
● GB/s (or B/cycle)

Identify common traits:
● Increasing data-parallelism
● Small data cache units
● Superscalar execution
– Compute-Transfer overlap (+data prefetching & streaming stores)
– Dedicated floating-point units

Abstracting the software

Operational Intensity (OI)
● Operations per byte of DRAM traffic [FLOP/B]
● It measures the traffic between the DRAM and the LLC
● Irregularities and instruction-related problems are ignored
● One OI for each kernel, the spectrum is wide!

Software as a set of compute kernels

Each kernel is characterized by:
● Op count [FLOP]
● Compulsory off-chip memory traffic [B]
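As a worked example of the two per-kernel numbers, the sketch below computes the operational intensity of a DAXPY-style update `y[i] = a * x[i] + y[i]` in double precision. The kernel choice and the helper name `operational_intensity` are illustrative assumptions, and the traffic count covers compulsory traffic only (no write-allocate, no cache reuse).

```python
def operational_intensity(flop, dram_bytes):
    """Operations per byte of compulsory DRAM traffic [FLOP/B]."""
    return flop / dram_bytes

n = 1_000_000                  # number of elements (arbitrary)
flop = 2 * n                   # one multiply and one add per element
dram_bytes = 3 * 8 * n         # read x, read y, write y; 8 bytes per double

oi = operational_intensity(flop, dram_bytes)
print(f"OI = {oi:.3f} FLOP/B")
```

An intensity this low marks the kernel as memory-bound on essentially any current machine, while a dense matrix-matrix product sits at the opposite end of the spectrum.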

The roofline model

● It visually relates hardware with software
● Performance = min(PB × OI, PP), with PB the peak memory bandwidth and PP the peak performance
● The ridge point characterizes the model

(Plot: roofline. Vertical axis: Hardware (GFLOP/s); horizontal axis: Software (FLOP/B).)
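The roofline bound can be evaluated in a few lines. The sketch below reads PB as peak memory bandwidth (GB/s) and PP as peak compute performance (GFLOP/s), which is the usual interpretation; the machine numbers are made up for illustration and do not describe any system from the talk.

```python
def roofline(oi, peak_bandwidth, peak_performance):
    """Attainable performance [GFLOP/s] for a kernel of intensity oi [FLOP/B]."""
    return min(peak_bandwidth * oi, peak_performance)

def ridge_point(peak_bandwidth, peak_performance):
    """Intensity [FLOP/B] at which the two roofs meet."""
    return peak_performance / peak_bandwidth

# Hypothetical machine: 200 GB/s and 1000 GFLOP/s.
PB, PP = 200.0, 1000.0

print("ridge point:", ridge_point(PB, PP), "FLOP/B")
for oi in (1.0 / 12.0, 1.0, 5.0, 20.0):
    print(f"OI = {oi:6.3f} FLOP/B -> at most {roofline(oi, PB, PP):7.1f} GFLOP/s")
```

Kernels to the left of the ridge point are limited by memory bandwidth and kernels to the right by compute, which is why a single number, the ridge point, summarizes how demanding a machine is on software.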