Using Docker Containers for Scientific Environments — On-Premises and in the Cloud
Sergey Yakubov, Martin Gasthuber, Birgit Lewendel | KEK, Tsukuba, 18.10.2017
Page 2
Contents
• Introduction
• Scientific environments on-premises
  • IT-managed containers
  • Custom user containers
• Scientific environments in hybrid clouds
  • HNSciCloud project
  • Using cloud to extend local resources
• Conclusions and outlook
Page 3
Introduction: Compute resources at DESY
• Batch farm (HTCondor) – see talk by T. Finnern
• HPC cluster Maxwell (SLURM)
  • Large storage, fast network and CPUs: 12,000 cores, InfiniBand, 76 TB memory, 3.3 PB storage
  • Used mostly for offline data analysis and numerical simulations, but also for online analysis (more in the future)
• Docker containers
Page 4
Introduction: Containerized scientific environments
• Using Docker container technology we can create environments that allow us to:
  • separate IT and user requirements/dependencies
  • separate responsibilities: IT focuses on scaling and container template construction, physicists on application development
  • provide compute resources dynamically and quickly, whether on top of existing local resources or in the cloud
  • control provisioned resources: storage, CPUs, memory, networks, … (see the sketch below)
• Can we do this with OpenStack & Co? Probably yes, but …
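As a minimal sketch of the resource controls Docker offers for the last point; the registry path, image name and mount paths are assumptions, not from the talk:

#!/bin/sh
# Limit a container to 4 CPUs and 8 GB of RAM, give it host networking
# and a bind-mounted scratch directory (all names hypothetical):
docker run --cpus=4 --memory=8g --network=host \
    -v /scratch/exp1:/data \
    registry.desy.de/exp1/analysis:latest ./run_analysis.sh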
Page 5
Scientific environments on-premises: IT-managed containers
• A Dockerfile is created by IT/group admins (e.g. a Debian image with software for a specific experiment) and stored as a Puppet resource
• Puppet automatically rebuilds the image whenever the Dockerfile changes and pushes it to DESY's Docker registry
• Compute resources are reserved via SLURM
• At the specified time, the SLURM job starts a Docker container with an sshd daemon on each of the allocated compute nodes
• Users with the corresponding rights can log in and do their work (see the sketch below)
[Diagram: admin builds and pushes the image; users log in to the running containers via ssh]
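A minimal sketch of such an IT-managed job; the partition, registry path, image name and ssh port mapping are assumptions rather than DESY's actual setup:

#!/bin/sh
#SBATCH --partition=maxwell        # hypothetical partition name
#SBATCH --nodes=4
#SBATCH --time=08:00:00

# Start one container per allocated node, keeping sshd in the foreground
# so the containers live for the whole reservation; users ssh to port 2222.
srun docker run --rm -p 2222:22 \
    registry.desy.de/exp1/debian-env:latest /usr/sbin/sshd -D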
Page 6
Scientific environments on-premises: Custom user containers
• User submits a SLURM job script with Docker commands
• Compute resources are allocated via SLURM
• SLURM executes the specified Docker containers on each of the allocated compute nodes
• Any Docker image can be used
• A Docker authorization plugin takes care of security (see the sketch below)
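A minimal sketch of what such a user-submitted script could look like; the partition, image and analysis script names are assumptions:

#!/bin/sh
#SBATCH --partition=maxwell    # hypothetical partition name
#SBATCH --nodes=2

# Any public or private image may be requested; the site's Docker
# authorization plugin decides which options and mounts are permitted.
srun docker run --rm -v "$HOME":/work -w /work \
    python:3.6 python my_analysis.py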
Page 7
Scientific environments on-premises: Example – SIMEX
SimEx - photon science simulation platform
https://github.com/eucall-software/simex_platform
Page 8
Scientific environments on-premises: Example – SIMEX
X-ray wavefront propagation calculator
• Propagation of light through optical elements
• Utilizes the SRW (Synchrotron Radiation Workshop) library: C++ core + Python wrappers
• Hybrid OpenMP/MPI parallelization

[Figure: speed-up vs. number of cores (up to 40) for a single source file]

Scaling for 40 source files:

Threads x MPI processes | Number of nodes | Total time | Time/file
1x1                     | 1               | 11 h       | 1031 s
40x1                    | 1               | 65 min     | 98 s
4x10                    | 4               | 7.5 min    | 45 s
8x5                     | 8               | 4.2 min    | 51 s

~160x speed-up overall (11 h down to 4.2 min); a job-request sketch follows below.
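A hedged sketch of one possible reading of the 8x5 row above: 5 MPI ranks per node with 8 OpenMP threads each, on 8 nodes. The slides do not show the actual job script, and the application invocation is an assumption:

#!/bin/sh
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=5
#SBATCH --cpus-per-task=8

# One OpenMP thread per CPU allocated to each MPI rank.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun python wavefront_propagation.py    # hypothetical SimEx driver script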
Page 10
Helix Nebula Science Cloud: Joint Pre-Commercial Procurement
Procurers: CERN, CNRS, DESY, EMBL-EBI, ESRF, IFAE, INFN, KIT, STFC, SURFsara
Experts: Trust-IT & EGI.eu

The group of procurers has committed:
• Procurement funds
• Manpower for testing/evaluation
• Use cases with applications & data
• In-house IT resources

Resulting services will be made available to end users from many research communities.

Co-funded via H2020 Grant Agreement 687614. Total procurement budget >5M€.

Thanks to the CERN IT Group for the provided HNSciCloud slides.
Page 11
Helix Nebula Science Cloud: Technical challenges
• Compute and storage
  • support a range of virtual machine and container configurations, including HPC, working with datasets in the petabyte range
• Transparent data access
  • provide transparent access for users to on-premises data from the cloud
• Network connectivity
  • provide high-end network capacity via GEANT for the whole platform
• Federated identity management
  • provide common identity and access management
Page 12
Helix Nebula Science Cloud: Project phases
The project runs from Jan'16 to Dec'18. Each step is competitive: only contractors that successfully complete the previous step can bid in the next.

Preparation
• Analysis of requirements, current market offers and relevant standards
• Build stakeholder group
• Develop tender material

Implementation & Sharing: Tender (Jul'16) → 4 Designs → Call-off (Feb'17) → 3 Prototypes → Call-off (Dec'17) → 2 Pilots

We are here: in the prototype phase, between the Feb'17 and Dec'17 call-offs.
Page 13
Scientific environments in hybrid clouds: Using cloud to extend local resources
Combines resources, fast network and transparent data access from HNSciCloud with SLURM elastic computing (see the slurm.conf sketch below).
[Diagram: local cluster with a control node and compute nodes]
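SLURM's elastic computing (power saving) mechanism can drive this kind of cloud bursting. A minimal slurm.conf sketch, with node names, script paths and limits all assumed rather than taken from the talk:

# Hooks SLURM calls to provision/release cloud nodes (paths hypothetical)
ResumeProgram=/usr/local/sbin/cloud_resume.sh
SuspendProgram=/usr/local/sbin/cloud_suspend.sh
ResumeTimeout=600          # seconds to wait for a cloud node to boot
SuspendTime=300            # idle seconds before a cloud node is released

# Cloud nodes exist only on paper until SLURM resumes them
NodeName=cloud[001-100] State=CLOUD
PartitionName=cloud Nodes=cloud[001-100] MaxTime=24:00:00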
Page 14
Scientific environments in hybrid clouds: Using cloud to extend local resources
[Diagram: local control node and compute nodes extended by cloud compute nodes]
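For an OpenStack-backed cloud, the resume hook could look roughly like this sketch; SLURM passes the hostlist of nodes to power up as the first argument, and the image and flavor names are assumptions:

#!/bin/sh
# Sketch of a ResumeProgram: boot one cloud VM per requested node name.
for host in $(scontrol show hostnames "$1"); do
    openstack server create --image slurm-worker \
        --flavor m1.large "$host" &
done
wait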
Page 15
Scientific environments in hybrid clouds: Example

test.sh:
#!/bin/sh
#SBATCH --partition=cloudXXX
#SBATCH --workdir=/test_id
#SBATCH --nodes=1

id -u > cloud_id.txt
docker run centos:7 id -u > cloud_docker_id.txt
local-node$ sbatch test.sh
local-node$ id -u
12345
local-node$ cat cloud_id.txt
12345
local-node$ cat cloud_docker_id.txt
12345

The same uid appears on the local node, on the cloud node and inside the container: the user's identity is carried transparently across the hybrid setup.
Page 16
Conclusions and outlook

Containerized scientific environments
• Implemented via Docker
• Isolate the work of different users/groups
• Same performance as on the underlying infrastructure
• Portable
• More user experience to be gained

Hybrid clouds
• Dynamic cloud resource allocation/deallocation
• Transparent to the user
  • user submits jobs to the local scheduler
  • transparent data access from the cloud
  • thanks to Docker, no need to install user software on the cloud VMs
• Performance to be tested
Thank you for your attention!