TRANSCRIPT
PanDA: Exascale Federation of Resources for the ATLAS Experiment
Fernando Barreiro Megino (University of Texas at Arlington)
for the PanDA team
MMCP15, Stará Lesná, Slovakia
The LHC
The ATLAS detector
...~1/10th of its members
Distributed Computing: the WLCG
● Tier-0 (CERN): 15%
● Tier-1 (11 centres): 40%
● Tier-2 (~140 centres): 45%
Big Data?
[Chart, source: Wired magazine (Wired 4/2013), “Big Data in 2012”]
● Business emails sent: 3000 PB/year (doesn’t count; not managed as a coherent data set)
● Google search: 100 PB
● Facebook uploads: 180 PB/year
● Kaiser Permanente: 30 PB
● LHC data: 15 PB/year
● YouTube: 15 PB/year
● Also shown: US Census, Library of Congress, Climate DB, Nasdaq
Current ATLAS data set, all data products: 140 PB
~14x growth expected 2012-2020
What is PanDA?
● Production and Distributed Analysis system developed for ATLAS
● Now also used by AMS, ALICE, LSST and others
● Many international partners: DoE HEP, DoE ASCR, NSF, CERN IT, OSG, ASGC, NorduGrid, European grid projects, Russian grid projects…
http://news.pandawms.org/
PanDA at a glance
[Architecture diagram: users and a pilot factory interact with PanDA, which dispatches work to Tier-1 and Tier-2 sites grouped into clouds (Cloud A, Cloud B); Rucio provides Distributed Data Management.]
Orders of magnitude
http://bigpanda.cern.ch/
https://rucio-ui.cern.ch/
Paradigm Shift in HEP Computing

New ideas from PanDA
● Distributed resources are seamlessly integrated worldwide through a single submission system
● All users have access to the same resources
● Global fair share, priorities and policies allow efficient management of resources
● Automation, error handling and other features improve the user experience

Old HEP paradigm
• Distributed resources are independent entities
• Groups of users utilize specific resources (whether locally or remotely)
• Fair shares, priorities and policies are managed locally, for each resource
• Uneven user experience at different sites, based on local support and experience
• Privileged users have access to special resources
Core Ideas in PanDA
● Single entry point to the WLCG
  ○ Provide a central queue for users – similar to local batch systems
  ○ Make hundreds of distributed sites appear as local
  ○ Reduce site-related errors and reduce latency
● Build a pilot job system – late transfer of user payloads
  ○ Crucial for distributed infrastructure maintained by local experts
● Hide middleware while supporting diversity and evolution
  ○ PanDA interacts with the middleware – users see the high-level workflow
● Hide variations in infrastructure
  ○ PanDA presents uniform ‘job’ slots to users (with minimal sub-types)
  ○ Easy to integrate grid sites, clouds, HPC sites…
● Production and Analysis users see the same PanDA system
  ○ Same set of distributed resources available to all users
  ○ Highly flexible system, giving full control of priorities to the experiment
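The late-binding pilot idea above can be sketched in a few lines of Python. Everything here – the queue contents, the function names, the site names – is illustrative, not PanDA's actual API: the point is only that a pilot starts on a worker node first and binds to a payload afterwards.

```python
from collections import deque

# A toy central job queue. In PanDA this is a server-side, SQL-backed
# queue; here it is just a deque of payload descriptions (illustrative).
central_queue = deque([
    {"job_id": 1, "payload": lambda: "generated 1000 events"},
    {"job_id": 2, "payload": lambda: "reconstructed run 42"},
])

def pilot(site):
    """A pilot starts on a compute element first and only then pulls a
    real payload from the central queue (late binding)."""
    if not central_queue:
        return None                # empty queue: the pilot simply exits
    job = central_queue.popleft()  # payload is bound at execution time
    return {"site": site, "job_id": job["job_id"], "result": job["payload"]()}

results = [pilot("TIER2_A"), pilot("TIER2_B"), pilot("TIER2_C")]
# The third pilot finds an empty queue and exits without a payload.
```

Because the payload is chosen only when the pilot is already running, a broken site wastes a pilot rather than a user job – which is how this model reduces site-related errors and latency.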
Key Features of PanDA
● Workflow is maximally asynchronous
● Pilot-based job execution system
  ○ Condor-based pilot factory
  ○ Payload is sent only after execution begins on the CE
  ○ Minimize latency, reduce error rates
● Central job queue
  ○ Unified treatment of distributed resources
  ○ SQL DB keeps the state – a critical component
● Automatic error handling and recovery
● Extensive monitoring
● Modular design
● RESTful communications
● GSI authentication
● Use of Open Source components
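A central queue with global fair share, as listed above, could be modeled as a priority queue where under-served users get dispatched first. The formula and the share values below are made-up assumptions, not PanDA's actual scheduling policy:

```python
import heapq

# Toy global fair-share queue: lower priority value = dispatched first.
# Users running far below their allocated share get boosted.
queue = []

def submit(job_id, user_share, user_running_jobs):
    # Illustrative formula: usage relative to share; NOT PanDA's real policy.
    priority = user_running_jobs / user_share
    heapq.heappush(queue, (priority, job_id))

submit("analysis_01", user_share=0.2, user_running_jobs=10)  # priority 50.0
submit("prod_mc_07", user_share=0.7, user_running_jobs=10)   # priority ~14.3
first = heapq.heappop(queue)[1]  # the under-served production share runs first
```

The key point is that the share computation happens centrally, over all resources at once, instead of being configured per site.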
Task management
PanDA is not just a job execution engine: it manages complex tasks – groupings of jobs where a certain execution order may have to be respected.
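The ordering constraints within a task can be pictured as a small dependency graph. The job names below are illustrative production steps, not PanDA's real task schema; the sketch only shows how an execution order is derived from the dependencies:

```python
from graphlib import TopologicalSorter

# A toy task: four jobs with ordering constraints
# (job -> the set of jobs it depends on). Names are illustrative.
task = {
    "evgen": set(),      # event generation runs first
    "simul": {"evgen"},  # simulation needs generated events
    "recon": {"simul"},  # reconstruction needs simulated data
    "merge": {"recon"},  # merging of outputs runs last
}

execution_order = list(TopologicalSorter(task).static_order())
```

In a real task the graph is wider (many parallel jobs per step), but the principle is the same: a job is released to the central queue only once its inputs exist.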
Monitoring
http://bigpanda.cern.ch/
Evolution of the PanDA system
1. Integration of upcoming computing paradigms
   • Clouds
   • Leadership Computing Facilities
2. Integration of the network as a resource in workload management
3. PanDA beyond ATLAS: BigPanDA, MegaPanDA…
PanDA and upcoming computing paradigms
Overspilling into the cloud
Backfilling HPC
It is not about replacing the WLCG, but about integrating additional computing resources
Monte Carlo jobs as ideal candidates for external compute
PanDA and the Cloud
• ATLAS Cloud activity started in 2012
  – Commercial clouds frequently offer free allocations to entice research institutes
  – Research clouds: institutes serving multiple experiments wanted to increase flexibility by offering resources through a cloud interface
• Some questions we needed to solve
  – What is the best integration model for PanDA?
    • If we get any offering… we want to be ready!
    • Possibility of overspilling into the cloud in periods of high demand
  – Study the cost models of commercial providers… is running your own computing center really cheaper?
• A wide range of providers has been integrated and evaluated
• Most cloud providers have similar offerings
  – However, watch out for the lack of standardization
• Running jobs in the cloud is “easy”
  – Run condor workers in the cloud that join a centrally managed condor pool
  – With the current experience, new cloud providers can be plugged in with reduced effort
  – Sustained operation demonstrated
• The more difficult part is using permanent cloud storage
  – Monte Carlo jobs to the rescue: high CPU usage, low I/O
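The overspill policy described above – fill the grid first, then send only CPU-heavy, low-I/O work to the cloud – can be sketched as a simple routing function. The job records and the `io` flag are hypothetical simplifications:

```python
def route(pending_jobs, grid_slots):
    """Fill grid slots first, then overspill only low-I/O (Monte Carlo
    style) jobs to cloud resources; high-I/O jobs wait for the grid.
    Sketch under assumed job records, not PanDA's brokerage code."""
    to_grid = pending_jobs[:grid_slots]
    overflow = pending_jobs[grid_slots:]
    to_cloud = [j for j in overflow if j["io"] == "low"]
    still_queued = [j for j in overflow if j["io"] != "low"]
    return to_grid, to_cloud, still_queued

# Toy workload: odd ids are low-I/O Monte Carlo, even ids are high-I/O.
jobs = [{"id": i, "io": "low" if i % 2 else "high"} for i in range(6)]
to_grid, to_cloud, queued = route(jobs, grid_slots=2)
# ids 0-1 fill the grid; low-I/O ids 3 and 5 overspill to the cloud
```

Keeping high-I/O jobs on the grid sidesteps the hard problem noted above: cloud CPU is easy to use, permanent cloud storage is not.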
Example: PanDA on GCE
• We ran for about 8 weeks (2 weeks were planned for scaling up)
• Very stable running on the cloud side; most problems were on the ATLAS side
• Completed 458,000 jobs; generated and processed about 214 million events
PanDA and HPC
• Please see Ruslan’s presentation at this conference
Extending beyond the Grid
Example for 13-19 June 2015
Cloud and HPC resources are steadily gaining ground
Network as a resource in PanDA
● Network bandwidth has grown by a factor of O(1000) over the last 15 years
● Networking has transcended national boundaries
● With LHCOPN and LHCONE… do we need to keep the MONARC restrictions?
[Diagram: direct mesh of Tier-2 data flows; cloud boundaries loosened based on network metrics]
● Let’s relax the limitations defined back in the MONARC days
● Let’s use network measurements to do this gradually
● Better and more dynamic use of storage
● Reduced load on the Tier-1s for data serving
● Increased speed to populate analysis facilities
Sources of network information
● DDM Sonar: transfer stats covering the whole mesh, as reported by DDM/FTS
● perfSONAR: low-level network statistics
● FAX data: transfer stats covering federated XRootD sites
Faster user analysis through FAX
● First use case of network integration with PanDA
● Brokerage will use the concept of ‘nearby’ sites
● Calculate a weight based on brokerage criteria
  ○ availability of CPU, release, pilot rate…
  ○ add the network transfer cost to the brokerage weight
● Jobs will be sent to the site with the best weight – not necessarily the site holding the data locally
● If a nearby site has less wait time, access the data through FAX
[Screenshots: FAX transfer monitoring, historical job dashboard, FAX Kibana]
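Folding a network transfer cost into the brokerage weight, as described above, might look like the sketch below. The formula, the site names, and all the numbers are illustrative assumptions, not PanDA's real brokerage algorithm:

```python
def brokerage_weight(free_cpus, pilot_rate, transfer_cost):
    """Toy brokerage weight: more free CPUs and a healthy pilot rate
    favor a site, while the network cost of reaching the input data
    penalizes it. Higher weight = better candidate. Hypothetical formula."""
    return (free_cpus * pilot_rate) / (1.0 + transfer_cost)

weights = {
    # Site with the data locally, but few free CPUs:
    "SITE_WITH_DATA":  brokerage_weight(free_cpus=50,  pilot_rate=0.9, transfer_cost=0.0),
    # Nearby site with many free CPUs; FAX keeps its transfer cost modest:
    "NEARBY_FAX_SITE": brokerage_weight(free_cpus=400, pilot_rate=0.8, transfer_cost=0.5),
}
best = max(weights, key=weights.get)
# The nearby site wins despite holding no local data.
```

This captures the trade-off on the slide: with good network metrics, sending the job to a less-loaded nearby site and streaming the data via FAX beats queuing behind the data.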
Dynamic cloud selection
● A cloud is an aggregation of sites, usually delimited nationally
● Tasks are kept within a cloud and the output aggregated at the Tier-1
● Optimize and automate the choice of T1–T2 pairings
  ○ Currently a manual operation using suggestions
[Screenshot: dynamic cloud monitoring]
PanDA beyond ATLAS
• If PanDA works so well, why not use it for other experiments too?
• Collaborative work with other institutes: NRC KI, JINR
• Make PanDA accessible to everyone
  – Migrated code to GitHub: https://github.com/PanDAWMS
  – PanDA is now Oracle and MySQL compatible
  – Refactor the core: update the architecture to a plugin approach, where different communities can customize the components
  – Host a multi-VO instance on Amazon EC2
  – Redesigned, modular monitoring
• Experiments collaborating with PanDA
  – AMS
  – ALICE
  – COMPASS
  – LSST
Acknowledgements
Kaushik De, Alexei Klimentov, Tadashi Maeno, Paul Nilsson, Danila Oleynik, Sergey Panitkin, Artem Petrosyan, Ilija Vukotic, Torre Wenaus