IBM POWER8 as an HPC platform
Alexander Pozdneev, Georgy Pavlov, IBM
October 23, 2015 — IBM Linux on Power: Platform News
© 2015 IBM Corporation
What is HPC?
• HPC: High Performance Computing
• A.k.a. technical computing
• Aeroacoustics: Effects of chevrons on jet noise
• Supersonic jet engine noise computational fluid dynamics simulation
• 128k Blue Gene/P cores: ≈ 100 hours
• 1M Blue Gene/Q cores: ≈ 12 hours
http://youtu.be/cjoz5tncRUs http://youtu.be/uxT-VmY3OWc
Secrets of the Dark Universe
• Cosmology: Simulation of the evolution of the Universe
• Understanding the physics of dark matter and dark energy
• 1 BG/Q rack: 68B particles
• 32 BG/Q racks: 1.1T particles
http://www.youtube.com/watch?v=tdv8yrJk4VE http://www.youtube.com/watch?v=-S-T_iTiAxQ
Real-time modeling of human heart ventricles
• Physiology: Simulation of drug-induced arrhythmias
• Resolution — 0.1 mm
• 768k Blue Gene/Q cores
• 43% of peak performance
• http://dl.acm.org/citation.cfm?id=2388999
• LLNL, IBM Research, IBM Research Collaboratory for Life Sciences
Modelling of a complete human viral pathogen: poliovirus
• Molecular biology: Reconstruction and simulation of poliovirus
• Antiviral drugs, virus infection, modelling related viruses
• 3.3M–3.7M atoms
• Blue Gene/Q, Victorian Life Sciences Computing Initiative
• http://www.youtube.com/watch?v=Nih0Qa673FY
Speakers
• Alexander Pozdneev, Research Software Engineer, HPC
• Georgy Pavlov, Software Engineer, ESSL Russian team leader
Outline
1 Data centric computing as a new HPC paradigm
2 Architecture of IBM HPC systems based on POWER8+NVIDIA servers
3 Software stack of IBM HPC systems
4 IBM HPC mathematical libraries
5 Measuring efficiency of an HPC system on real applications
6 Summary
Application diversity
Image credit: http://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=POL03229USEN
Data centric computing as a new HPC paradigm
• It is all about moving data around
• Memory bandwidth
• Memory latency
• High ratio of memory access operations to computations
• Number of FLOPs¹ per cycle is no longer the relevant metric
• Offloading computing to memory (Active Memory Cube by Micron)
¹ FLOP: floating-point operation
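The point that memory traffic, rather than FLOPs per cycle, limits many HPC codes can be illustrated with a roofline-style estimate. The sketch below is only an illustration: the peak and bandwidth figures are hypothetical placeholders, not measured POWER8 values, and the function names are invented for this example.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline-style bound: performance is capped either by the compute
    peak or by memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


def triad_intensity(bytes_per_word=8):
    """Arithmetic intensity of a[i] = b[i] + s * c[i] in double precision:
    2 flops per element, 3 words moved (2 loads + 1 store).
    Extra traffic such as write-allocate is ignored here."""
    return 2.0 / (3 * bytes_per_word)


# Hypothetical machine parameters, chosen for illustration only.
PEAK_GFLOPS = 500.0
MEM_BW_GBS = 200.0

ai = triad_intensity()
bound = attainable_gflops(PEAK_GFLOPS, MEM_BW_GBS, ai)
print(f"intensity = {ai:.4f} flop/byte, bound = {bound:.1f} GFLOP/s")
```

With these placeholder numbers the kernel is limited by memory bandwidth to a small fraction of the compute peak, which is the situation the data centric view is concerned with.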
Overview of IBM Power System S822LC
Power S822LC model 8335-GTA
• POWER8 processor module:
  ▸ 8-core, 3.32 GHz
  ▸ 10-core, 2.92 GHz
• Two sockets
• Graphics processing units
  ▸ Two NVIDIA K80 GPUs
• Eight memory slots
• 2U height
System software
Software stack of IBM HPC systems
• System software
  ▸ Operating system: Linux, bare-metal (no virtualization)
  ▸ Drivers:
    • Mellanox InfiniBand OFED
    • NVIDIA
  ▸ Deployment: xCAT
• Parallel operating environment
  ▸ IBM Parallel Environment Runtime Edition (PE RTE)
  ▸ Workload scheduler: IBM Platform LSF
  ▸ Parallel filesystem: IBM Spectrum Scale (“GPFS”²)
² GPFS: General Parallel File System
Development tools
Software stack of IBM HPC systems
• Compilers
  ▸ IBM XL C/C++/Fortran compilers
  ▸ IBM Advance Toolchain, http://ibm.co/AdvanceToolchain
    • Fork of GNU compiler/tools optimized for POWER8
    • gcc, g++, gfortran
    • Analysis tools (oprofile, valgrind, itrace)
  ▸ Vanilla GCC, binutils, etc.
  ▸ CUDA Toolkit
• IBM Parallel Environment Developer Edition (PE DE)
• IBM Software Development Kit for Linux on Power
• Mathematical libraries
  ▸ Mathematical Acceleration Subsystem (MASS)
  ▸ IBM Engineering and Scientific Subroutine Library (ESSL)
  ▸ IBM Parallel ESSL
Engineering and Scientific Subroutine Library
• High-performance mathematical functions
  ▸ Scientific applications
  ▸ Engineering applications
• Platforms
  ▸ IBM POWER servers
  ▸ IBM POWER clusters
• Libraries
  ▸ ESSL Serial and SMP: 600+ subroutines (SMP: Symmetric Multi-Processing)
  ▸ Parallel ESSL: 125+ subroutines
• Languages:
  ▸ C
  ▸ C++
  ▸ Fortran
http://www.ibm.com/systems/power/software/essl
ESSL: Industry de facto standards
• ESSL implements the following interfaces:
  ▸ BLAS (linear algebra)
  ▸ LAPACK (linear algebra)
  ▸ FFTW (Fourier transformation)
• Parallel ESSL implements the following interfaces:
  ▸ ScaLAPACK
• Easy migration
• Just recompile!
http://fftw.org
What mathematical areas are supported?
ESSL
• Linear algebra subprograms
• Matrix operations
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms, convolution, correlation, ...
• Sorting and searching
• Interpolation
• Numerical quadrature
• Random number generation

Parallel ESSL
• BLACS
• Level 2 parallel BLAS
• Level 3 parallel BLAS
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms
• Random number generation
How to leverage the hardware?
Symmetric multiprocessing:
• Multiple hardware threads
• Multiple cores

POWER8+NVIDIA:
• Use multiple GPUs
• Select which GPU to use
• Run ESSL in a hybrid mode
Synthetic benchmarks vs. real apps
Measuring car pollution in official tests?
• You get low toxic nitrogen oxide emissions in a lab environment
• You cannot predict how much smoke you produce unless you test your own scenarios
• You would take a test drive before buying a car
Threads behavior: Typical vs. HPC
[Figure panels: Typical workload | HPC workload]
Importance of thread affinity
NAS Parallel Benchmarks, mg.C (peaks at SMT1), 20 cores
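To make the idea of thread affinity concrete, the sketch below builds an explicit list of logical CPUs so that each software thread can be pinned to its own hardware thread. It assumes that logical CPUs are numbered as core * 8 + hardware thread, which is an assumption about the system's numbering rather than something stated in these slides; the helper names are hypothetical.

```python
THREADS_PER_CORE_MAX = 8  # POWER8 supports up to eight hardware threads per core


def affinity_cpus(n_cores, smt):
    """Return one logical CPU id per software thread, filling each core
    with `smt` threads before moving on to the next core.

    Assumes logical CPU ids follow core * 8 + hw_thread (an assumption).
    """
    if smt not in (1, 2, 4, 8):
        raise ValueError("SMT mode must be 1, 2, 4, or 8")
    return [core * THREADS_PER_CORE_MAX + t
            for core in range(n_cores)
            for t in range(smt)]


def as_cpu_list(cpus):
    """Format CPU ids as a comma-separated list."""
    return ",".join(str(c) for c in cpus)


# Three cores with two threads per core.
print(as_cpu_list(affinity_cpus(n_cores=3, smt=2)))
```

A comma-separated list of this form is the kind of argument accepted by tools such as `taskset -c`; the actual numbering should be checked on the target system before relying on it.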
Choice of compilation parameters: -O5 -qnohot
NAS Parallel Benchmarks, bt.C, affinity, baseline: -O3 -qhot
Compilation parameters: -O3 -qhot, -O5 -qprefetch
NAS Parallel Benchmarks, mg.C, affinity, baseline: -O5 -qnohot
Choice of an SMT mode: SMT1
NAS Parallel Benchmarks, mg.C, affinity, baseline: SMT8
Choice of an SMT mode: SMT2, SMT4
NAS Parallel Benchmarks, bt.C, affinity, baseline: SMT1
Choice of an SMT mode: SMT8
NAS Parallel Benchmarks, cg.C, affinity, baseline: SMT1
Summary
• Technical computing problems ⇒ need for HPC
• Data centric computing as a new HPC paradigm
• CORAL project
• IBM Power System S822LC model 8335-GTA
• IBM HPC software stack
• High performance math libraries
• Leveraging performance
Publications
http://www.redbooks.ibm.com/abstracts/sg248263.html
Further reading
• XL C/C++ for Linux, http://www.ibm.com/support/knowledgecenter/SSXVZZ/
• XL Fortran for Linux, http://www.ibm.com/support/knowledgecenter/SSAT4T/
• XL C/C++ for Linux 13.1.2 Optimization and Programming Guide, http://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/proguide/optimization.html
• XL Fortran for Linux 15.1.2 Optimization and Programming Guide, http://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.2/com.ibm.xlf1512.lelinux.doc/proguide/optimization.html
Relevance of LINPACK
• Based on DGEMM()
• 80–90% of peak performance
• Commercial deployment verification test for large systems
• Proprietary binary files run by the installation team
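For readers unfamiliar with DGEMM, the sketch below shows what the operation computes and how a fraction-of-peak figure like the one above can be derived from a timing. It is a plain reference version for illustration, not how ESSL or any tuned library implements it; the helper names and the timing and peak numbers in the test are hypothetical.

```python
def dgemm(alpha, a, b, beta, c):
    """Reference (unoptimized) C := alpha * A * B + beta * C on lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    if (any(len(row) != k for row in a) or len(c) != m
            or any(len(row) != n for row in c)):
        raise ValueError("incompatible matrix shapes")
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]


def dgemm_flops(m, n, k):
    """Conventional flop count: one multiply and one add per inner-loop step."""
    return 2 * m * n * k


def fraction_of_peak(m, n, k, seconds, peak_flops):
    """Achieved flop rate divided by the theoretical peak flop rate."""
    return dgemm_flops(m, n, k) / seconds / peak_flops


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(dgemm(1.0, a, b, 0.5, c))
```

Because the flop count grows as m·n·k while the data grows only as the matrix sizes, DGEMM reuses each loaded value many times, which is one reason a DGEMM-based test can reach a high fraction of peak where memory-bound applications cannot.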
Benchmarking methodology options
1. Take one core for the initial tuning
  ▸ Try SMT1, SMT2, SMT4, SMT8 (number of threads + affinity + -qtune=pwr8:XXX)
  ▸ Try different optimization options (-O3, -O4, ...)
2. Choose the SMT mode and compiler options that provide the best timing
3. Take one core as a baseline
  ▸ Run on 1–5 cores (within one chip)
  ▸ Run on 5 and 10 cores (within one socket)
  ▸ Run on 10 and 20 cores
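The steps above can be sketched as a small bookkeeping script: collect single-core timings for each combination of SMT mode and optimization level, pick the fastest, then report speedup and parallel efficiency against the one-core baseline. All timings below are invented for illustration, and the function names are hypothetical; a real study would fill the table from actual benchmark runs.

```python
from itertools import product


def pick_best_config(timings):
    """Step 2: return the (smt_mode, options) key with the smallest run time."""
    return min(timings, key=timings.get)


def scaling_table(times_by_cores):
    """Step 3: speedup and parallel efficiency relative to the one-core run."""
    base = times_by_cores[1]
    return {cores: (base / t, base / t / cores)
            for cores, t in sorted(times_by_cores.items())}


# Hypothetical single-core timings in seconds (step 1), for illustration only.
smt_modes = ("SMT1", "SMT2", "SMT4", "SMT8")
opt_levels = ("-O3", "-O4")
fake_seconds = [100.0, 98.0, 80.0, 82.0, 70.0, 75.0, 90.0, 88.0]
timings = dict(zip(product(smt_modes, opt_levels), fake_seconds))

best = pick_best_config(timings)
print("best single-core configuration:", best)

# Hypothetical multi-core timings for the chosen configuration.
table = scaling_table({1: 70.0, 5: 16.0, 10: 8.75, 20: 5.0})
for cores, (speedup, efficiency) in table.items():
    print(f"{cores:2d} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Keeping the one-core run as the baseline makes it easy to see where efficiency drops, for example when a run first spans two chips or two sockets.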
POWER8 features
• Eight threads per core
  ▸ Hide memory latency (like GPU³)
  ▸ Instruction flow is arbitrary (unlike GPU)
• Memory bandwidth
• No sense in benchmarking only one thread (like in GPU)
• Scalability within a core depends only on the application
• Advanced features to try
  ▸ Transactional memory
  ▸ Relaxed memory model
  ▸ Decimal floating point unit
³ GPU: graphics processing unit
Disclaimer
All the information, representations, statements, opinions and proposals in this document are correct and accurate to the best of our present knowledge but are not intended (and should not be taken) to be contractually binding unless and until they become the subject of separate, specific agreement between us.
Any IBM Machines provided are subject to the Statements of Limited Warranty accompanying the applicable Machine.
Any IBM Program Products provided are subject to their applicable license terms.
Nothing herein, in whole or in part, shall be deemed to constitute a warranty.
IBM products are subject to withdrawal from marketing and/or service upon notice, and changes to product configurations, or follow-on products, may result in price changes.
Any references in this document to “partner” or “partnership” do not constitute or imply a partnership in the sense of the Partnership Act 1890.
IBM is not responsible for printing errors in this proposal that result in pricing or information inaccuracies.
Legal information
IBM, the IBM logo, BladeCenter, System Storage, and System x are trademarks of International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml.
Other company, product, and service names may be trademarks or service marks of others.
(c) 2015 International Business Machines Corporation. All rights reserved.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which it operates; information about the availability of products or services is subject to change without notice. For the most current information about the IBM products and services available in your region, contact your nearest IBM sales office or an authorized IBM Business Partner.
All statements regarding IBM's intentions and future plans are subject to change without notice.
Information about third-party products was obtained from the manufacturers of those products or from their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to third-party products. Questions about the capabilities of third-party products should be addressed to the suppliers of those products.
This information may contain technical inaccuracies or typographical errors. Changes may be made to the information in this publication; these changes will be incorporated in new editions of the publication. IBM may make changes to the products or services described in this publication at any time without notice.
Any references to third-party Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product.