TRANSCRIPT
Continuous Analysis: ATLAS User Containers on the Grid
Lukas Heinrich, Alessandra Forti, Tadashi Maeno, Paul Nilsson, on behalf of the ATLAS collaboration
ACAT 2019
HEP computation is fundamentally highly parallelizable
• loose coupling • good for distributed, bulk computation
• Worldwide LHC Computing Grid (WLCG) provides the global computing infrastructure • Storage and Compute • Idea: code goes to where the data is
• Used for pre-analysis processing (Reconstruction, Data Reduction) but also for analysis
[Diagram: events are grouped into files, and files into a dataset]
[Diagram: ATLAS Run-2 data flow: Reco produces xAOD (PB scale); the DerivationFramework reduces it to DxAOD (TB scale); AthAnalysis/AnalysisBase produce the final ntuple (GB scale), analysed with ROOT to give the results]
ATLAS Analysis Model during Run 2 • centralized data reduction (derivations) • User Analysis: from TB scale data to results
• dedicated analysis software releases • still includes a lot of data reduction, often on the Grid
Key Problem in Grid Computing: Software Distribution
• need to be able to instantiate desired runtime environment where the data is
Ingredients of the Software Stack
[Diagram: ingredients of the software stack]
• Operating system: the basics, OS + utilities (libc, openssl, sed, curl, ...)
• VO-specific software: common experiment frameworks, often with custom compilers and custom language runtimes (Python), LCG releases
• Application-specific s/w: your code
Previous Solutions
Three parties needed to get a functioning stack for your job.
User needs to rely on site admins and experiments to provide upstream layers (OS, frameworks)
[Diagram: view on the worker node]
• Operating system: provided by site admins
• VO-specific software: provided by experiments via CVMFS
• Application-specific s/w: provided by users via tarball
The only variability is in the analysis part:
• forces users to be compatible with upstream
• upstream might change, with no guarantee of reproducibility
[Diagram: the same worker-node view with multiple jobs (Job 1, Job 2, Job 3, Job 4, ...) sharing the operating-system and VO-software layers]
Current s/w distribution is optimized for upstream processing:
Reco, Simulation, and Reduction do not have app-specific runtime requirements; only the site-admin / VO interface matters.
Analysis code is more volatile and heterogeneous.
Q: How to make shared compute resources available while ensuring freedom for users to choose their software?
A: Containers
Industry Solution to Software distribution: Containers
Public / commercial clouds cannot rely on tight coupling between software interfaces.
• Let client define the full stack of their app (rootfs) • Interface pushed to Kernel / syscall level (very stable)
Advantage: Host can be very simplified. Only needs to run one type of workload: containers. (CoreOS)
[Diagram: a container host running application-specific software vs. the traditional stack of operating system + VO-specific software + application-specific s/w]
Bringing Containers to the Grid
Many existing use-cases of containers within WLCG focus on streamlining operations (SLC6 base images on CC7).
However, this does not change the UX for physicists and misses the big advantage of containers: self-defined environments. Here: let users build the image and run it for them. Need to
1. adapt worker-node code to be able to run user-defined containers
2. adapt user-facing clients to submit container-based jobs
[Diagram: stack with the base OS provided as a container, VO-specific software still from CVMFS, application-specific s/w still downloaded ad hoc]
Worker node structure
VOs schedule long-running "pilot" jobs onto site batch systems. Pilots connect to the VO queue and retrieve jobs.
The pilot handles data stage-in, payload execution, and data stage-out.
[Diagram: pilot process running stage-in, payload job, and stage-out; the payload uses either a user image or an ATLAS release environment version X.Y.Z]
[Diagram: pilot lifecycle before and after]
• before: stage-in, code download, payload job, stage-out; the full job lifecycle needed to work within the same environment
• after: stage-in, image pull, payload job, stage-out; the payload job runs entirely in its own environment
Details
• container jobs are a first-class concept: their own "transform"; since no s/w stack preparation is necessary, the transform code is very simplified
• data is mounted into the container at a well-defined path (/data by default)
• container runtime-agnostic: currently Singularity; investigating podman, docker (rootless); see the sketch after this list
• no assumptions on container content: e.g. users can run busybox, alpine, etc.
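A minimal sketch of what such a run can look like on the worker node, assuming Singularity as the runtime; the image name, staging directory, and payload script are hypothetical:

# hypothetical sketch, not the actual pilot/transform code
IMAGE="docker://gitlab-registry.cern.ch/someuser/myanalysis:latest"   # made-up user image
STAGED_INPUTS="$PWD/inputs"    # wherever the pilot staged the input files (assumption)

# run the user's payload inside the image, with the inputs bound at /data (the default path)
singularity exec --bind "$STAGED_INPUTS:/data" "$IMAGE" \
    /bin/bash -c "./run_selection.sh /data"    # run_selection.sh is a made-up user script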
User Interface
Integrated into main command line interface for Grid submission (PanDA):
prun \
  --containerImage docker://<image> \
  --exec "<shell script>" \
  --inDS <input dataset> --outDS <output dataset> \
  --outputs <output files> \
  --forceStaged
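For illustration, a hypothetical invocation with the placeholders filled in (image name, dataset names, and script are made up):

prun \
  --containerImage docker://gitlab-registry.cern.ch/someuser/myanalysis:latest \
  --exec "./run_selection.sh" \
  --inDS user.someuser.my_input_dataset \
  --outDS user.someuser.myanalysis_output \
  --outputs histograms.root \
  --forceStaged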
Eventually looking into providing a dedicated UI, pcontainer: an opportunity to drop all s/w-related flags.
Image Acquisition
Images currently pulled into local cache on-demand
ATLAS split between Reconstruction (Athena) and Analysis (AnalysisBase) releases helpful to not overload incoming bandwidth
Industry images often much smaller
[Chart: image sizes of the AnalysisBase, AthAnalysis, Athena, Tensorflow, ROOT, python, and python:alpine images]
Image Acquisition
Future possibilities:
• near-local caches (proxy requests on image download) • global layer caches
• unpacked.cern.ch project by CVMFS • investigating containerd "snapshotters"
Allows optimizing which layers are cached (long-lived, highly used layers) vs. downloaded on demand (analysis code)
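For example, with unpacked.cern.ch an image can be started directly from pre-unpacked layers on CVMFS instead of being pulled over the network; the repository path below is only illustrative:

# hypothetical example: run from an image already unpacked on CVMFS
singularity exec \
    /cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/python:3.8 \
    python --version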
[Diagram: image layers 1 through N; long-lived base layers served from a local/global cache, the top layer downloaded on demand]
Integration into Analysis Workflow
Containers are entering the analysis workflow from various angles
• Continuous Integration: test analysis developments against production and testing releases
Integration into Analysis Workflow
Continuous Analysis: a new analysis workflow in which
• changes to the analysis come via Merge Request
• tests and image building run on CI
• the built image is used on distributed compute (clusters, grid); see the sketch below
[Diagram: user laptop (local dev) → unit & integration tests and image building (CI) → registry → Grid, Kubernetes, batch systems]
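A rough sketch of this loop in shell form, assuming a GitLab-style CI with a container registry; all image, dataset, and script names are made up:

# CI job triggered by a merge request: build and publish the analysis image
docker build -t gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA .
docker push gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA

# once the tests pass, submit the freshly built image to the Grid
prun --containerImage docker://gitlab-registry.cern.ch/someuser/myanalysis:$CI_COMMIT_SHORT_SHA \
     --exec "./run_selection.sh" \
     --inDS user.someuser.my_input_dataset \
     --outDS user.someuser.myanalysis_$CI_COMMIT_SHORT_SHA \
     --outputs histograms.root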
Integration into Analysis Workflow
Containers and the ability to run them on LHC computing are crucial for Analysis Preservation / Reuse (e.g. RECAST).
• desire to re-execute the analysis at a later date for new studies
• preservation as images + workflow description
Container integration into existing computing infrastructure helps with adoption: images already used during analysis
[Diagram: an analysis composed of event selection code and stat analysis code]
ML Workloads
Machine Learning: an area in HEP with significant influence from industry and a non-traditional s/w stack:
• frameworks like Tensorflow, PyTorch not on CVMFS
• fast-moving field
With containers we can run large-scale jobs on WLCG infrastructure (hyper-parameter scans; see the sketch below)
New: GPU support (see Poster w/ M. Guth, A. Forti)
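As a sketch of how such a hyper-parameter scan might look with the same prun interface; the image tag, training script, and dataset names are made up:

# hypothetical scan: one Grid job per learning rate, all using the same Tensorflow image
for LR in 0.001 0.01 0.1; do
  prun --containerImage docker://tensorflow/tensorflow:latest-gpu \
       --exec "python train.py --lr $LR" \
       --inDS user.someuser.my_training_dataset \
       --outDS user.someuser.lr_scan_$LR \
       --outputs model.h5
done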
Outlook
• ATLAS Grid enabled users to run individually defined container images on distributed computing resources
• first-class concept in Grid middleware
• smooth "deployment" path for analysis code built in continuous integration
• Focus on generality
• Working on bringing more sites online, optimizing image distribution
• Hardware acceleration for container-based jobs