TRANSCRIPT
Continuous Analysis: ATLAS User Containers on the Grid
Lukas Heinrich, Alessandra Forti, Tadashi Maeno, Paul Nilsson, on behalf of the ATLAS collaboration
ACAT 2019
HEP computation is fundamentally highly parallelizable
• loose coupling • good for distributed, bulk computation
• Worldwide LHC Computing Grid (WLCG) provides the global computing infrastructure • Storage and Compute • Idea: code goes to where the data is
• Used for pre-analysis processing (Reconstruction, Data Reduction) but also for analysis
[Diagram: events are grouped into files, and files into a dataset]
[Diagram: ATLAS Run-2 data flow: Reco produces xAOD (PB scale); the DerivationFramework reduces it to DxAOD (TB scale); AthAnalysis/AnalysisBase produce the final ntuple (GB scale), analysed with ROOT to give the results]
ATLAS Analysis Model during Run 2 • centralized data reduction (derivations) • User Analysis: from TB scale data to results
• dedicated analysis software releases • still includes a lot of data reduction, often on the Grid
Key Problem in Grid Computing: Software Distribution
• need to be able to instantiate desired runtime environment where the data is
Ingredients of the Software Stack
[Diagram: ingredients of the software stack]
• Operating system: the basics, OS + utilities (libc, openssl, sed, curl, ...)
• VO-specific software: common experiment frameworks, often with custom compilers and custom language runtimes (Python), LCG releases
• Application-specific s/w: your code
Previous Solutions
Three parties needed to get a functioning stack for your job.
User needs to rely on site admins and experiments to provide upstream layers (OS, frameworks)
[Diagram: view on the worker node]
• Operating system: provided by site admins
• VO-specific software: provided by experiments via CVMFS
• Application-specific s/w: provided by users via tarball
The only variability is in the analysis part:
• forces users to be compatible with upstream
• upstream might change, with no guarantee of reproducibility
[Diagram: the same worker-node view with multiple jobs (Job 1, Job 2, Job 3, Job 4, ...) sharing the operating-system and VO-software layers]
Current s/w distribution is optimized for upstream processing:
Reco, Simulation, and Reduction do not have app-specific runtime requirements; only the site-admin / VO interface matters.
Analysis code is more volatile and heterogeneous.
Q: How to make shared compute resources available while ensuring freedom for users to choose their software?
A: Containers
Industry Solution to Software distribution: Containers
Public / commercial clouds cannot rely on tight coupling between software interfaces.
• Let client define the full stack of their app (rootfs) • Interface pushed to Kernel / syscall level (very stable)
Advantage: Host can be very simplified. Only needs to run one type of workload: containers. (CoreOS)
[Diagram: a container host running application-specific software vs. the traditional stack of operating system + VO-specific software + application-specific s/w]
Bringing Containers to the Grid
Many existing use-cases of containers within WLCG focus on streamlining operations (SLC6 base images on CC7).
However, this does not change the UX for physicists and misses the big advantage of containers: self-defined environments. Here: let users build the image and run it for them. Need to
1. adapt worker-node code to be able to run user-defined containers
2. adapt user-facing clients to submit container-based jobs
[Diagram: stack with the base OS provided as a container, VO-specific software still from CVMFS, application-specific s/w still downloaded ad hoc]
Worker node structure
VOs schedule long-running "pilot" jobs onto site batch systems. Pilots connect to the VO queue and retrieve jobs.
The pilot handles data stage-in, payload execution, and data stage-out.
[Diagram: pilot process running stage-in, payload job, and stage-out; the payload uses either a user image or an ATLAS release environment version X.Y.Z]
[Diagram: pilot lifecycle before and after]
• before: stage-in, code download, payload job, stage-out; the full job lifecycle needed to work within the same environment
• after: stage-in, image pull, payload job, stage-out; the payload job runs entirely in its own environment
Details
• container jobs are a first-class concept: their own "transform"; since no s/w stack preparation is necessary, the transform code is very simplified
• data is mounted into the container at a well-defined path (/data by default)
• container runtime-agnostic: currently Singularity; investigating podman, docker (rootless); see the sketch after this list
• no assumptions on container content: e.g. users can run busybox, alpine, etc.
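A minimal sketch of what such a run can look like on the worker node, assuming Singularity as the runtime; the image name, staging directory, and payload script are hypothetical:

# hypothetical sketch, not the actual pilot/transform code
IMAGE="docker://gitlab-registry.cern.ch/someuser/myanalysis:latest"   # made-up user image
STAGED_INPUTS="$PWD/inputs"    # wherever the pilot staged the input files (assumption)

# run the user's payload inside the image, with the inputs bound at /data (the default path)
singularity exec --bind "$STAGED_INPUTS:/data" "$IMAGE" \
    /bin/bash -c "./run_selection.sh /data"    # run_selection.sh is a made-up user script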
User Interface
Integrated into main command line interface for Grid submission (PanDA):
prun \
  --containerImage docker://<image> \
  --exec "<shell script>" \
  --inDS <input dataset> --outDS <output dataset> \
  --outputs <output files> \
  --forceStaged
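For illustration, a hypothetical invocation with the placeholders filled in (image name, dataset names, and script are made up):

prun \
  --containerImage docker://gitlab-registry.cern.ch/someuser/myanalysis:latest \
  --exec "./run_selection.sh" \
  --inDS user.someuser.my_input_dataset \
  --outDS user.someuser.myanalysis_output \
  --outputs histograms.root \
  --forceStaged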
Eventually looking into providing a dedicated UI, pcontainer: an opportunity to drop all s/w-related flags.
Image Acquisition
Images currently pulled into local cache on-demand
ATLAS split between Reconstruction (Athena) and Analysis (AnalysisBase) releases helpful to not overload incoming bandwidth
Industry images often much smaller
[Chart: image sizes of the AnalysisBase, AthAnalysis, Athena, Tensorflow, ROOT, python, and python:alpine images]
Image Acquisition
Future possibilities:
• near-local caches (proxy requests on image download) • global layer caches
• unpacked.cern.ch project by CVMFS • investigating containerd "snapshotters"
Allows optimizing which layers are cached (long-lived, highly used layers) vs. downloaded on demand (analysis code)
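For example, with unpacked.cern.ch an image can be started directly from pre-unpacked layers on CVMFS instead of being pulled over the network; the repository path below is only illustrative:

# hypothetical example: run from an image already unpacked on CVMFS
singularity exec \
    /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/python:3.8 \
    python --version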
[Diagram: image layers 1 through N; long-lived base layers served from a local/global cache, the top layer downloaded on demand]
Integration into Analysis Workflow
Containers are entering the analysis workflow from various angles
• Continuous Integration: test analysis developments against production and testing releases
Integration into Analysis Workflow
Continuous Analysis: a new analysis workflow in which
• changes to the analysis come via Merge Request
• tests and image building run on CI
• the built image is used on distributed compute (clusters, grid); see the sketch below
[Diagram: user laptop (local dev) → unit & integration tests and image building (CI) → registry → Grid, Kubernetes, batch systems]
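A rough sketch of this loop in shell form, assuming a GitLab-style CI with a container registry; all image, dataset, and script names are made up:

# CI job triggered by a merge request: build and publish the analysis image
docker build -t gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA .
docker push gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA

# once the tests pass, submit the freshly built image to the Grid
prun --containerImage docker://gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA \
     --exec "./run_selection.sh" \
     --inDS user.someuser.my_input_dataset \
     --outDS user.someuser.myanalysis_$CI_COMMIT_SHORT_SHA \
     --outputs histograms.root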
Integration into Analysis Workflow
Containers and the ability to run them on LHC computing are crucial for Analysis Preservation / Reuse (e.g. RECAST).
• desire to re-execute the analysis at a later date for new studies
• preservation as images + workflow description
Container integration into existing computing infrastructure helps with adoption: images already used during analysis
[Diagram: an analysis composed of event selection code and stat analysis code]
ML Workloads
Machine Learning: an area in HEP with significant influence from industry and a non-traditional s/w stack:
• frameworks like Tensorflow, PyTorch not on CVMFS
• fast-moving field
With containers we can run large-scale jobs on WLCG infrastructure (hyper-parameter scans; see the sketch below)
New: GPU support (see Poster w/ M. Guth, A. Forti)
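As a sketch of how such a hyper-parameter scan might look with the same prun interface; the image tag, training script, and dataset names are made up:

# hypothetical scan: one Grid job per learning rate, all using the same Tensorflow image
for LR in 0.001 0.01 0.1; do
  prun --containerImage docker://tensorflow/tensorflow:latest-gpu \
       --exec "python train.py --lr $LR" \
       --inDS user.someuser.my_training_dataset \
       --outDS user.someuser.lr_scan_$LR \
       --outputs model.h5
done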
Outlook
• ATLAS Grid enabled users to run individually defined container images on distributed computing resources
• first-class concept in Grid middleware
• smooth "deployment" path for analysis code built in continuous integration
• Focus on generality
• Working on bringing more sites online, optimizing image distribution
• Hardware acceleration for container-based jobs