adac6 · why hpc containers april 2017 cray inc. confidential and proprietary mobility of computing...

Post on 13-Sep-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

ADAC6Appropriate Use of Containers in HPC

Agenda

April 2017 Cray Inc. Confidential and Proprietary

● Why HPC containers● Investigate different container runtimes for HPC ● Integration with batch systems● Best practices for HPC● HPC Application performance

2

Why HPC containers

April 2017 Cray Inc. Confidential and Proprietary

● Mobility of computing to both users and HPC centers● Means to capture and distribute software and compute environments● Entire workflow can be contained for others to replicate no matter what

version of Linux they are running (Kernel ABI compatibility – syscall’s)● Reproducibility of results – may not mean same performance

● MPI-ABI, libraries, host drivers● Standard and simplified packaging

● Independent Software Vendors (ISV) codes● Delivered benchmarks● CRI standard and/or propriety formats

● Legacy workload packaging and execution

3

Container Runtime Environments

April 2017 Cray Inc. Confidential and Proprietary

● Enterprise, and HPC container environments● opencontainers/runc

● 236 contributors, 17 release. 3,613 commits● Standard image format (Open Container Initiative – OCI)

● rtk/rtk● 197 contributors, 65 releases , 5,539 commits● Standard image format (appc Specification – Application Container Image, ACI)

● NERSC Shifter (github)● 11 contributors, 5 releases, 1,712 commits● Propriety image format

● LBL Singularity (syslabs.io)● 63 contributors, 19 releases, 4,503 commits● Propriety image format

4

As of 06/18/18

Integration with batch systems

April 2017 Cray Inc. Confidential and Proprietary

● Examples of Moab/Torque and Cray ALPS● Application based utilities● Limited or no cgroup and kernel namespace support

① $ aprun -n N -b runc --bundle /tmp/cle.img run $(date +%Y%m%d%H%M)② $ aprun -n N… -b rkt run \

--stage1-name=coreos.com/rkt/stage1-fly:1.21.0 \registry-1.docker.io/library/cle:latest --net=host --exec=/bin/hostname

③ $ aprun -n N… -b shifter --image cle:latest /bin/hostname④ $ aprun -n N… -b singularity exec /global/cle.img /bin/hostname

5

Container Runtime Specifics

April 2017 Cray Inc. Confidential and Proprietary

● Container runtimes configuration and semantics● Unique image management● Runtime arguments are specific

6

Configuration & Runtime options

April 2017 Cray Inc. Confidential and Proprietary

● runc● Embedded in the OCI image definition: config.json

● rkt● /etc/rkt, /usr/lib/rkt, user-defined

● Repository authentication, data and image locations● Command line can override system configurations

● Shifter● System configuration (/etc/shifter)

● Authentication, data and image locations● Singularity

● System configuration $SYSCONFDIR/singularity/singularity.conf● Authentication, data and image locations

7

runc

April 2017 Cray Inc. Confidential and Proprietary

● Create a root file system, created via Docker container$ docker export alpine | tar xvfC - ./rootfs

● Create container configuration file config.json$ runc spec$ vi config.json # modify the args, see OCI specification

● Run the container$ runc --log /dev/stdout run $(date +%Y%m%d%H%M)

8

OCI specification: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md

rkt

April 2017 Cray Inc. Confidential and Proprietary9

● Pull an image locallydocker2aci$ docker2aci docker://jsparkscraycom/cle:6.0.2

● Run the container$ qsub –Isalloc: Granted job allocation 88$ rkt run --no-overlay=true --net=host \

--stage1-name=coreos.com/rkt/stage1-fly:1.22.0 \registry-1.docker.io/jsparkscraycom/cle:6.0.2 \${options} –inherit-env=true \--environment=LD_LIBRARY_PATH=$LD_LIBRARY_PATH \--exec=/lus/${USER}/a.out -- myargs

Shifter

April 2017 Cray Inc. Confidential and Proprietary

● Pull an image$ shifterimg pull jsparkscraycom/test:latest

● Run the container$ shifter --image jsparkscraycom/test:latest \/usr/bin/date

10

Singularity

April 2017 Cray Inc. Confidential and Proprietary

● Pull an image (if needed)$ singularity pull docker://jsparkscraycom/test:latest

● Run the container$ singularity exec \

docker://jsparkscraycom/test:latest \/usr/bin/date

11

Container Best Practices

April 2017 Cray Inc. Confidential and Proprietary

● Portability● Run anywhere vs. exploit host acceleration● Isolated contained environment, aka encapsulation – full vs. partial

● Image standardization compliance● OCI, ACI and custom (singularity)

● HPC Optimizations● Override container defaults via mounts and environment variables

● Libraries and configurations● Framework independence

● No vendor/runtime lock in● Stripped frameworks

● opencontainers/runc, Charliecloud, rtk

12

Containerization Models

April 2017 Cray Inc. Confidential and Proprietary13

● HPC case● HPC runtimes

(Shifter/Singularity/...)● Partial encapsulation –

limited namespaces● Access to host resources –

networks, storage● blurs the line between

container and host such that local directories can exist within the container.

● Multiple applications● Batch system integration● Image format for scale (pfs)

● Enterprise case● Standard runtimes

(docker/rkt)● Fully encapsulated

● Everything the application needs is in the container

● Single application● Orchestration software● Standard image formats

HPC Partial Encapsulation

April 2017 Cray Inc. Confidential and Proprietary14

Container

binaries, libraries, configurations

PFS

ISV app, libs, configuration

ISV app, libs,configuration,

directories

exec() / mount()

filesystem mount

ssh/sshd

/home, /var/lib/hugetlbfs, /opt/nvidia, /lustre

HPC Application performance

April 2017 Cray Inc. Confidential and Proprietary

● Launch times● Time to setup and launch via container runtime

● Application performance● Hugepage optimization● Environment pass-through

15

An Updated Performance Comparison of Virtual Machinesand Linux ContainersRef: http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf

Launch Times

April 2017 Cray Inc. Confidential and Proprietary16

0. 00

2. 00

4. 00

6. 00

8. 00

10 .0 0

12 .0 0

14 .0 0

16 .0 0

18 .0 0

20 .0 0

2 4 8 16 32 64

Sec

on

ds

N um ber of nodes

Container Execution OverheadExecution time of /bin/true

ba re O S

sh if te r

r kt

r unC w / Lu str e

si ngu la ri t y

r unC w / t m pf s

OSU Benchmarks

April 2017 Cray Inc. Confidential and Proprietary17

0. 00

10 0. 00

20 0. 00

30 0. 00

40 0. 00

50 0. 00

60 0. 00

0 2 832 12

851

220

4881

9232

768

1310

72

5242

88

2097

152

Lat

ency

(u

s)

S ize

OSU One Sided MPI_GET latency Test v3.8

r kt

S hif t er

C LE

S ing ul ar it y

0

10 00

20 00

30 00

40 00

50 00

60 00

70 00

80 00

90 00

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU One Sided MPI_GET Bandwidth Test v3.8

r kt

S hif t er

C LE

S ing ul ar it y

Hugepage / DMAPP

April 2017 Cray Inc. Confidential and Proprietary18

0

10 00

20 00

30 00

40 00

50 00

60 00

70 00

80 00

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU MPI One Slided MPI_Get Bandwidth

S hif t er w/ ohu gep age s

S hif t er whu gep age s

010 0020 0030 0040 0050 0060 0070 0080 0090 00

10 000

1 416 64 25

610

2440

9616

384

6553

626

214

410

485

7641

943

04

Ban

dw

idth

(M

B/s

)

S ize

OSU MPI One Sided MPI_Get Bandwidth

Hugepages + DMAPP

S hif t er w/ oD M AP P

S hif t er wD M AP P

NPB Single node

April 2017 Cray Inc. Confidential and Proprietary19

0. 0020 .0 0

40 .0 060 .0 0

80 .0 010 0. 00

12 0. 0014 0. 00

16 0. 0018 0. 00

20 0. 00

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

CLE

Shif

ter

rkt

Sing

ular

ity

bt cg dc ep f t i s l u sp ua

Tim

e in

sec

on

ds

B enchm ark

NAS Parallel Benchmarks 3.3Serial Single node CLASS=A

N PB - S in gle n ode

NPB Multi-node

April 2017 Cray Inc. Confidential and Proprietary20

0. 00

20 .0 0

40 .0 0

60 .0 0

80 .0 0

10 0. 00

12 0. 00

14 0. 00

CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC CLE

Shif

ter

rkt

Sing

ular

ityru

nC

bt cg ep f t i s l u m g sp

be nch m ar k

Tim

e in

sec

on

ds

NAS Parallel Benchmarks 3.3NPROCS=256 CLASS=D

N PB

Quantum ESPRESSO – optimization

April 2017 Cray Inc. Confidential and Proprietary21

0. 00

50 .0 0

10 0. 00

15 0. 00

20 0. 00

25 0. 00

30 0. 00

35 0. 00

36 72 10 8 14 4 18 0 21 6 25 2 28 8

Ex

ec

uti

on

tim

e (

se

cs

)

cores

Execution time of QUANTUM ESPRESSO 6.0

/ Broadwellhugepage

C LE

S hif t er

S ing ul ar it y

0. 00

50 .0 0

10 0. 00

15 0. 00

20 0. 00

25 0. 00

30 0. 00

35 0. 00

36 72 10 8 14 4 18 0 21 6 25 2 28 8

Ex

ec

uti

on

tim

e (

se

cs

)

cores

Execution time of QUANTUM ESPRESSO 6.0

/ Broadwellnone-hugepage

C LE

S hif t er

S ing ul ar it y

Radioss – optimization

April 2017 Cray Inc. Confidential and Proprietary22

0

10

20

30

40

50

60

36 72 10 8 14 4

Ela

pse

d T

ime

(sec

s)

N um ber of C ores

Radioss (Intel MPI): Normalized to CLEBroadwell ppn:36

w /o h uge pag es

hu gep age s

I nt el l ib ra r ies

en vir o nm en t pr op oga ti on

Conclusions and Future Work

April 2017 Cray Inc. Confidential and Proprietary

● Deployment● Enterprise frameworks required per-node image caches● Enterprise frameworks, as expected offer more runtime options● HPC frameworks enforce strict user security

● Performance● Native application performance can be achieved, requires host-level access

to resources (network, file system)● Host environment pass-through● Launch time dependent on container infrastructure and image size

● Future work● Scaling investigation of open container frameworks

● Shared image storage – faster startup times● Abstraction layer standardizing on applications packaging, deployed and run

in isolation – OCI.

23

Legal Disclaimer

April 2017 Cray Inc. Confidential and Proprietary24

Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document.

Cray Inc. may make changes to specifications and product descriptions at any time, without notice.

All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviatefrom published specifications. Current characterized errata are available on request.

Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user.

Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA and YARCDATA. The following are trademarks of Cray Inc.: CHAPEL, CLUSTER CONNECT, CLUSTERSTOR, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE, REVEAL. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT and XT. The registered trademark LINUX is used pursuant to a sublicense fromLMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used on this website are the property of their respective owners.

top related