TRANSCRIPT
PanDA: Exascale Federation of Resources for the ATLAS Experiment
Fernando Barreiro Megino (University of Texas at Arlington)
for the PanDA team
MMCP15, Stará Lesná, Slovakia
The LHC
The ATLAS detector
...~1/10th of its members
Distributed Computing: the WLCG
● Tier-0 (CERN): 15%
● Tier-1 (11 centres): 40%
● Tier-2 (~140 centres): 45%
Big Data?
[Chart, source: Wired magazine (Wired 4/2013), “Big Data in 2012”]
● Business emails sent: 3000 PB/year (doesn’t count; not managed as a coherent data set)
● Google search: 100 PB
● Facebook uploads: 180 PB/year
● Kaiser Permanente: 30 PB
● LHC data: 15 PB/year
● YouTube: 15 PB/year
● Also shown: US Census, Library of Congress, Climate DB, Nasdaq
Current ATLAS data set, all data products: 140 PB
~14x growth expected 2012-2020
What is PanDA?
● Production and Distributed Analysis system developed for ATLAS
● Now also used by AMS, ALICE, LSST and others
● Many international partners: DoE HEP, DoE ASCR, NSF, CERN IT, OSG, ASGC, NorduGrid, European grid projects, Russian grid projects…
http://news.pandawms.org/
PanDA at a glance
[Architecture diagram: users and a pilot factory interact with PanDA, which dispatches work to Tier-1 and Tier-2 sites grouped into clouds (Cloud A, Cloud B); Rucio provides Distributed Data Management.]
Orders of magnitude
http://bigpanda.cern.ch/
https://rucio-ui.cern.ch/
Paradigm Shift in HEP Computing

New ideas from PanDA
● Distributed resources are seamlessly integrated worldwide through a single submission system
● All users have access to the same resources
● Global fair share, priorities and policies allow efficient management of resources
● Automation, error handling and other features improve the user experience

Old HEP paradigm
• Distributed resources are independent entities
• Groups of users utilize specific resources (whether locally or remotely)
• Fair shares, priorities and policies are managed locally, for each resource
• Uneven user experience at different sites, based on local support and experience
• Privileged users have access to special resources
Core Ideas in PanDA
● Single entry point to the WLCG
  ○ Provide a central queue for users – similar to local batch systems
  ○ Make hundreds of distributed sites appear as local
  ○ Reduce site-related errors and reduce latency
● Build a pilot job system – late transfer of user payloads
  ○ Crucial for distributed infrastructure maintained by local experts
● Hide middleware while supporting diversity and evolution
  ○ PanDA interacts with the middleware – users see the high-level workflow
● Hide variations in infrastructure
  ○ PanDA presents uniform ‘job’ slots to users (with minimal sub-types)
  ○ Easy to integrate grid sites, clouds, HPC sites…
● Production and Analysis users see the same PanDA system
  ○ Same set of distributed resources available to all users
  ○ Highly flexible system, giving full control of priorities to the experiment
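The late-binding pilot idea above can be sketched in a few lines of Python. Everything here – the queue contents, the function names, the site names – is illustrative, not PanDA's actual API: the point is only that a pilot starts on a worker node first and binds to a payload afterwards.

```python
from collections import deque

# A toy central job queue. In PanDA this is a server-side, SQL-backed
# queue; here it is just a deque of payload descriptions (illustrative).
central_queue = deque([
    {"job_id": 1, "payload": lambda: "generated 1000 events"},
    {"job_id": 2, "payload": lambda: "reconstructed run 42"},
])

def pilot(site):
    """A pilot starts on a compute element first and only then pulls a
    real payload from the central queue (late binding)."""
    if not central_queue:
        return None                # empty queue: the pilot simply exits
    job = central_queue.popleft()  # payload is bound at execution time
    return {"site": site, "job_id": job["job_id"], "result": job["payload"]()}

results = [pilot("TIER2_A"), pilot("TIER2_B"), pilot("TIER2_C")]
# The third pilot finds an empty queue and exits without a payload.
```

Because the payload is chosen only when the pilot is already running, a broken site wastes a pilot rather than a user job – which is how this model reduces site-related errors and latency.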
Key Features of PanDA
● Workflow is maximally asynchronous
● Pilot-based job execution system
  ○ Condor-based pilot factory
  ○ Payload is sent only after execution begins on the CE
  ○ Minimize latency, reduce error rates
● Central job queue
  ○ Unified treatment of distributed resources
  ○ SQL DB keeps the state – a critical component
● Automatic error handling and recovery
● Extensive monitoring
● Modular design
● RESTful communications
● GSI authentication
● Use of Open Source components
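A central queue with global fair share, as listed above, could be modeled as a priority queue where under-served users get dispatched first. The formula and the share values below are made-up assumptions, not PanDA's actual scheduling policy:

```python
import heapq

# Toy global fair-share queue: lower priority value = dispatched first.
# Users running far below their allocated share get boosted.
queue = []

def submit(job_id, user_share, user_running_jobs):
    # Illustrative formula: usage relative to share; NOT PanDA's real policy.
    priority = user_running_jobs / user_share
    heapq.heappush(queue, (priority, job_id))

submit("analysis_01", user_share=0.2, user_running_jobs=10)  # priority 50.0
submit("prod_mc_07", user_share=0.7, user_running_jobs=10)   # priority ~14.3
first = heapq.heappop(queue)[1]  # the under-served production share runs first
```

The key point is that the share computation happens centrally, over all resources at once, instead of being configured per site.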
Task management
PanDA is not just a job execution engine: it manages complex tasks – groupings of jobs where a certain execution order may have to be respected.
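The ordering constraints within a task can be pictured as a small dependency graph. The job names below are illustrative production steps, not PanDA's real task schema; the sketch only shows how an execution order is derived from the dependencies:

```python
from graphlib import TopologicalSorter

# A toy task: four jobs with ordering constraints
# (job -> the set of jobs it depends on). Names are illustrative.
task = {
    "evgen": set(),      # event generation runs first
    "simul": {"evgen"},  # simulation needs generated events
    "recon": {"simul"},  # reconstruction needs simulated data
    "merge": {"recon"},  # merging of outputs runs last
}

execution_order = list(TopologicalSorter(task).static_order())
```

In a real task the graph is wider (many parallel jobs per step), but the principle is the same: a job is released to the central queue only once its inputs exist.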
Monitoring
http://bigpanda.cern.ch/
Evolution of the PanDA system
1. Integration of upcoming computing paradigms
   • Clouds
   • Leadership Computing Facilities
2. Integration of the network as a resource in workload management
3. PanDA beyond ATLAS: BigPanDA, MegaPanDA…
PanDA and upcoming computing paradigms
Overspilling into the cloud
Backfilling HPC
It is not about replacing the WLCG, but about integrating additional computing resources
Monte Carlo jobs as ideal candidates for external compute
PanDA and the Cloud
• ATLAS Cloud activity started in 2012
  – Commercial clouds frequently offer free allocations to entice research institutes
  – Research clouds: institutes serving multiple experiments wanted to increase flexibility by offering resources through a cloud interface
• Some questions we needed to solve
  – What is the best integration model for PanDA?
    • If we get any offering… we want to be ready!
    • Possibility of overspilling into the cloud in periods of high demand
  – Study the cost models of commercial providers… is running your own computing center really cheaper?
• A wide range of providers has been integrated and evaluated
• Most cloud providers have similar offerings
  – However, watch out for the lack of standardization
• Running jobs in the cloud is “easy”
  – Run condor workers in the cloud that join a centrally managed condor pool
  – With the current experience, new cloud providers can be plugged in with reduced effort
  – Sustained operation demonstrated
• The more difficult part is using permanent cloud storage
  – Monte Carlo jobs to the rescue: high CPU usage, low I/O
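The overspill policy described above – fill the grid first, then send only CPU-heavy, low-I/O work to the cloud – can be sketched as a simple routing function. The job records and the `io` flag are hypothetical simplifications:

```python
def route(pending_jobs, grid_slots):
    """Fill grid slots first, then overspill only low-I/O (Monte Carlo
    style) jobs to cloud resources; high-I/O jobs wait for the grid.
    Sketch under assumed job records, not PanDA's brokerage code."""
    to_grid = pending_jobs[:grid_slots]
    overflow = pending_jobs[grid_slots:]
    to_cloud = [j for j in overflow if j["io"] == "low"]
    still_queued = [j for j in overflow if j["io"] != "low"]
    return to_grid, to_cloud, still_queued

# Toy workload: odd ids are low-I/O Monte Carlo, even ids are high-I/O.
jobs = [{"id": i, "io": "low" if i % 2 else "high"} for i in range(6)]
to_grid, to_cloud, queued = route(jobs, grid_slots=2)
# ids 0-1 fill the grid; low-I/O ids 3 and 5 overspill to the cloud
```

Keeping high-I/O jobs on the grid sidesteps the hard problem noted above: cloud CPU is easy to use, permanent cloud storage is not.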
Example: PanDA on GCE
• We ran for about 8 weeks (2 weeks were planned for scaling up)
• Very stable running on the cloud side; most problems were on the ATLAS side
• Completed 458,000 jobs; generated and processed about 214 million events
PanDA and HPC
• Please see Ruslan’s presentation at this conference
Extending beyond the Grid
Example for 13-19 June 2015
Cloud and HPC resources are steadily gaining ground
Network as a resource in PanDA
● Network bandwidth has grown by a factor of O(1000) over the last 15 years
● Networking has transcended national boundaries
● With LHCOPN and LHCONE… do we need to keep the MONARC restrictions?
[Diagram: direct mesh of Tier-2 data flows; cloud boundaries loosened based on network metrics]
● Let’s relax the limitations defined back in the MONARC days
● Let’s use network measurements to do this gradually
● Better and more dynamic use of storage
● Reduced load on the Tier-1s for data serving
● Increased speed to populate analysis facilities
Sources of network information
● DDM Sonar: transfer stats covering the whole mesh, as reported by DDM/FTS
● perfSONAR: low-level network statistics
● FAX data: transfer stats covering federated XRootD sites
Faster user analysis through FAX
● First use case of network integration with PanDA
● Brokerage will use the concept of ‘nearby’ sites
● Calculate a weight based on brokerage criteria
  ○ availability of CPU, release, pilot rate…
  ○ add the network transfer cost to the brokerage weight
● Jobs will be sent to the site with the best weight – not necessarily the site holding the data locally
● If a nearby site has less wait time, access the data through FAX
[Screenshots: FAX transfer monitoring, historical job dashboard, FAX Kibana]
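Folding a network transfer cost into the brokerage weight, as described above, might look like the sketch below. The formula, the site names, and all the numbers are illustrative assumptions, not PanDA's real brokerage algorithm:

```python
def brokerage_weight(free_cpus, pilot_rate, transfer_cost):
    """Toy brokerage weight: more free CPUs and a healthy pilot rate
    favor a site, while the network cost of reaching the input data
    penalizes it. Higher weight = better candidate. Hypothetical formula."""
    return (free_cpus * pilot_rate) / (1.0 + transfer_cost)

weights = {
    # Site with the data locally, but few free CPUs:
    "SITE_WITH_DATA":  brokerage_weight(free_cpus=50,  pilot_rate=0.9, transfer_cost=0.0),
    # Nearby site with many free CPUs; FAX keeps its transfer cost modest:
    "NEARBY_FAX_SITE": brokerage_weight(free_cpus=400, pilot_rate=0.8, transfer_cost=0.5),
}
best = max(weights, key=weights.get)
# The nearby site wins despite holding no local data.
```

This captures the trade-off on the slide: with good network metrics, sending the job to a less-loaded nearby site and streaming the data via FAX beats queuing behind the data.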
Dynamic cloud selection
● A cloud is an aggregation of sites, usually delimited nationally
● Tasks are kept within a cloud and the output aggregated at the Tier-1
● Optimize and automate the choice of T1–T2 pairings
  ○ Currently a manual operation using suggestions
[Screenshot: dynamic cloud monitoring]
PanDA beyond ATLAS
• If PanDA works so well, why not use it for other experiments too?
• Collaborative work with other institutes: NRC KI, JINR
• Make PanDA accessible to everyone
  – Migrated code to GitHub: https://github.com/PanDAWMS
  – PanDA is now Oracle and MySQL compatible
  – Refactor the core: update the architecture to a plugin approach, where different communities can customize the components
  – Host a multi-VO instance on Amazon EC2
  – Redesigned, modular monitoring
• Experiments collaborating with PanDA
  – AMS
  – ALICE
  – COMPASS
  – LSST
Acknowledgements
Kaushik De, Alexei Klimentov, Tadashi Maeno, Paul Nilsson, Danila Oleynik, Sergey Panitkin, Artem Petrosyan, Ilija Vukotic, Torre Wenaus