High Performance Computing | Systems and Technology Group Scalability issues : HPC Applications & Performance Tools Chiranjib Sur HPC @ India Systems and Technology Lab [email protected]


Page 1: Scalability issues : HPC Applications & Performance Toolsspscicomp.org/wordpress/wp-content/uploads/2011/05/sur... · 2011-05-13 · Top 500 : Some statistics Top 500 - Domains Scalability

High Performance Computing | Systems and Technology Group

Scalability issues :HPC Applications & Performance Tools

Chiranjib Sur HPC @ India Systems and Technology Lab

[email protected]


Top 500 : Some statistics

[Figure: Top 500 - Domains. Panels: Top500 : Systems, Top500 : Performance. Other labels: Scalability, Performance. Values shown: 34%, 16.4%, 35.63%, 33.99%.]

Source : www.top500.org


Laboratory astrophysics - computational snapshot

Laboratory Astrophysics

Multi-phased, multi-level Massive computation

Computational challenge !!

Massive Parallelism Required


Scalability challenges – different aspects

- Parallel Algorithm
- Parallel Language
- Hardware Architecture, Threading, I/O
- Interconnects
- OS & Parallel Environment
- Compilers, Optimization & Debuggers
- Scalable parallel File System
- Performance Analysis tools - Single place to go !

High Throughput

Sustained Performance

Scalable High Performance Computing


Amdahl's law

[Figure: bar diagram with segments labelled 70% and 30%]

If the serial component remains proportionately equal, there is no inherent speedup !

Parallel component is 50x, max speed up is 3.25x

http://en.wikipedia.org/wiki/Amdahl's_law

Gustafson's law

[Figure: bar diagram with segments labelled 70% and 30%, and a second bar labelled 95% and 5%]

If the serial component shrinks in size as the problem scales, there is opportunity for speedup !

Parallel component is 50x, max speed up is 18.26x

http://en.wikipedia.org/wiki/Gustafson's_Law

High PERFORMANCE or High THROUGHPUT
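The two laws can be compared numerically with a minimal sketch. It uses the textbook forms of Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the machine). The function names and the reading of "50x" as the speedup factor of the parallel part are assumptions made for illustration. The slide's quoted 3.25x and 18.26x may rest on details of the original figure that the transcript does not preserve, so the sketch does not try to reproduce them.

```python
def amdahl_speedup(parallel_fraction: float, factor: float) -> float:
    """Fixed-size speedup when only the parallel part runs `factor` times faster.

    Textbook Amdahl form: 1 / (s + p / factor), with s = 1 - p.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / factor)


def gustafson_speedup(parallel_fraction: float, processors: float) -> float:
    """Scaled speedup when the parallel work grows with the processor count.

    Textbook Gustafson form: s + p * N, with fractions measured on the parallel run.
    """
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Illustrative inputs taken from the percentages on the slide (70% and 95%).
    for p in (0.70, 0.95):
        print(f"parallel fraction {p:.2f}: "
              f"Amdahl {amdahl_speedup(p, 50):.2f}x, "
              f"Gustafson {gustafson_speedup(p, 50):.2f}x")
```

With a 70% parallel fraction the Amdahl speedup stays below 1/0.30 (about 3.33x) no matter how fast the parallel part becomes, which is the "no inherent speedup" point of the slide. Under the Gustafson reading the same machine gives a scaled speedup that grows almost linearly with the processor count.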


Parametrization of Scalability

$$T_p = \frac{T_s}{p} + T_{Oh}(p)$$

T_p = parallel execution time

T_s = serial execution time

T_Oh = Overhead
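A small sketch of how this parametrization might be used to derive speedup and parallel efficiency. Here `p` is taken to be the number of processes, and `log_overhead` with its 0.05 constant is a purely hypothetical overhead model chosen for illustration; neither comes from the slides.

```python
import math
from typing import Callable


def parallel_time(t_serial: float, p: int, overhead: Callable[[int], float]) -> float:
    """T_p = T_s / p + T_Oh(p)."""
    return t_serial / p + overhead(p)


def speedup(t_serial: float, p: int, overhead: Callable[[int], float]) -> float:
    """Speedup T_s / T_p under the model above."""
    return t_serial / parallel_time(t_serial, p, overhead)


def efficiency(t_serial: float, p: int, overhead: Callable[[int], float]) -> float:
    """Parallel efficiency: speedup divided by the number of processes."""
    return speedup(t_serial, p, overhead) / p


def log_overhead(p: int) -> float:
    """Hypothetical overhead growing with log2(p), e.g. tree-like communication."""
    return 0.05 * math.log2(p)


if __name__ == "__main__":
    t_s = 100.0  # assumed serial execution time in seconds
    for p in (2, 4, 8, 16, 32, 64):
        print(f"p={p:3d}  T_p={parallel_time(t_s, p, log_overhead):8.3f}  "
              f"speedup={speedup(t_s, p, log_overhead):6.2f}  "
              f"efficiency={efficiency(t_s, p, log_overhead):.3f}")
```

Rearranging the same relation as T_Oh(p) = T_p - T_s / p gives a way to estimate overhead from measured timings, as long as T_s and T_p are measured on the same problem size.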


Scalability – algorithm / programming languages

Parallel algorithm

- Most legacy codes are not designed to work in parallel

- Mostly not designed to exploit modern day HPC architecture

Parallel languages

- Legacy codes contain language (version) specific syntax (e.g. dynamic memory in FORTRAN 77)

- Old codes need major revision to use modern features, e.g. handling of large arrays

- Not so easy to re-write old codes using new languages like X10, UPC etc.


Legacy code – Algorithm - a Case Study


Hardware – Scaling OUT or Scaling UP ?

Scalability – computing platform

Courtesy : Thomas Dunning, http://www.nsca.illinois.edu/BlueWaters


Hardware – what to look for ? / how to look for ?

Scalability – computing platform

Hardware Thread Management

- Usage of multiple lightweight concurrent threads
- Less switching overhead
- Addressing the issue of instruction and memory latency

Threading - Random Access to Global Memory

- Any thread can read/write any location(s)
- Sync with the system software
- Monolithic thread vs blocks (smaller in size) of threads

On-Chip Shared Memory

- Efficient management of Data @ cache
- Efficient thread communication / cooperation within blocks


[Figure: chart plotted against Opt level (O1, O2, O3, O4)]


Scalability – system software

[Figure: system software stack diagram, split into User Space and Kernel Space. Recoverable labels:]

- APPLICATION
- MPI; C, C++ OpenMP; Fortran (77, 95) OpenMP; ESSL; Parallel ESSL; MASS; UPC; CAF; SHMEM; GSM
- Eclipse PTP Framework; POE Runtime; Parallel Debugger; HPCS Toolkit; Eclipse Tools
- PNSD / NRT; Debug/Comm Infrastructure
- LAPI – Reliable FIFO, RDMA, Striping, Failover/Recovery, Checkpoint/Restart, Pre-emption, User Space Statistics, Multi-Protocol, Scalability
- GSM Infra-structure; Multi-Link, Superpkt; NM
- TCP, UDP, SOCKETS; IP; IF_LSDDHYP
- HAL – AIX & Linux; AIX & Linux Verbs
- LL / Resource Mgr; Pre-emption, C/R; xCat
- GPFS; NSD - VDISK; HCP
- Operating Systems: AIX / Linux
- Network(s); Network Adapter(s) – HFI, IB
- Hardware Platforms: pSeries / xSeries


Compilers (www.ibm.com/software/awdtools/fortran/xlfortran/library)

Five distinct optimization levels + many additional options

Code generation and tuning for specific hardware chipsets

Interprocedural optimization and inlining using IPA

Profile-directed feedback (PDF) optimization

User-directed optimization with directives and source-level intrinsic functions

Optimization of OpenMP programs and auto-parallelization capabilities to exploit SMP systems

Automatic parallelization of calculations using vector machine instructions and high-performance mathematical libraries

... and more

OS and Parallel Environment

Scalability – System Software stack



[Figure: Mflops/Sec versus Opt level]


Parallel Environment – what next ?

Memory - Using Remote Direct Memory Access (RDMA)

Interconnects - RDMA with proper interconnect

Parallel tuned library - Customized

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.pe432.opuse1.doc%2Fam102_scalaperf.html

Data intensive / Task intensive computing – Combining Massive Data parallelism and instruction level parallelism – heterogeneous model ?

Next generation – MPI 3 ..?

Scalability – System Software stack


The Computing cycle


The Performance Pie

Performance Dimensions

CPU Performance

MPI Performance

Threading Performance

I/O Performance


What this tool is all about ? – More on next few sessions

What we can do with a tool like this ?

What programming language ? - FORTRAN, C, C++ ...

Which platform we can use ? - Entire range of IBM HPC hardware portfolio

Which operating system ? - AIX & Linux M$

What we mean by Scalable Tools ?

Scalability – Performance Tools


Performance analysis in a nutshell – IBM HPC Toolkit

- HPM: Hardware Performance Monitoring
- MPI: Profiling MPI calls
- OpenMP: Profiling OpenMP directives
- MIO: I/O analysis and optimization
- Visualization: Eclipse Plug-in, PeekPerf, Xprof


Scalability – Performance tools

[Figure: NPB 3.3 - Fourier Transform - Class A. Execution time (axis 0 to 60) versus No of procs (2, 4, 8, 16, 32). Series: NonInst, Inst.]


Scalability – case studies : Timing and overhead

[Figure: Timing - ft.A. Timing (axis 0.0 to 20.0) versus No of procs (2, 4, 8, 16, 32, 64, 128, 256, 512). Series: Exec time (2), Initialization time (4), Overhead (4).]


Scalability – case studies : MPI communication

[Figure: MPI All-to-All communication - ft.A. Data transfer (bytes) (axis 0 to 40000000) versus No of procs (2, 4, 8, 16, 32, 64, 128).]

[Figure: Average Communication time (MPI) - ft.A. Time (s) (axis 0 to 3.5) versus No of Procs (2, 4, 8, 16, 32).]


Scalability – case studies : Hardware & I/O

[Figure: No. of pagefault without I/O - ft.A. Page faults (axis 0 to 6000) versus No of Procs (2, 4, 8, 16, 32, 64).]

[Figure: Context switch - ft.A. Context switch (axis 0 to 100) versus No of procs (2, 4, 8, 16, 32, 64).]


Summary : Performance analysis and next ...

What we can do now ?

What we need ?


What we are planning to do ?


Next few talks ..


Today

Tomorrow


The team working on performance tools @ IBM

Pidad

Aditya

Praful

Servesh

Dave

John

Chiranjib