![Page 1: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/1.jpg)
ATLAS Distributed Computing
Kors Bos, Annecy, 18 May 2009
![Page 2: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/2.jpg)
ATLAS Workflows
[Workflow diagram: the Tier-0 (with CASTOR and the CAF) runs Prompt Reconstruction, Calibration & Alignment, and Express Stream Analysis; the Tier-1's run RAW Re-processing and HITS Reconstruction; the Tier-2's run Simulation and Analysis.]
![Page 3: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/3.jpg)
At the Tier-0
- RAW, Data from the detector: 1.6 MB/ev
- ESD, Event Summary Data: 1.0 MB/ev
- AOD, Analysis Object Data: 0.2 MB/ev
- DPD, Derived Physics Data: 0.2 MB/ev
- TAG, Data tag: 0.01 MB/ev
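For scale, a short Python sketch turning these per-event sizes into data rates; the 200 Hz trigger rate is taken from the "Runs and RAW Merging" slide later in the deck, and the calculation is only back-of-envelope:

```python
# Per-event sizes from the list above, in MB.
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2, "DPD": 0.2, "TAG": 0.01}

# 200 Hz nominal trigger rate (quoted on the "Runs and RAW Merging" slide).
TRIGGER_RATE_HZ = 200

for fmt, size_mb in EVENT_SIZE_MB.items():
    print(f"{fmt}: {size_mb * TRIGGER_RATE_HZ:6.1f} MB/s at {TRIGGER_RATE_HZ} Hz")
# e.g. RAW alone is 1.6 MB/ev * 200 Hz = 320 MB/s out of the detector
```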
![Page 4: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/4.jpg)
Reality is more complicated
![Page 5: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/5.jpg)
From the detector
Data Streams
Physics streams
- egamma
- muon
- jet
- Etmiss
- tau
- Bphys
- minBias

Calibration streams
- Inner Detector Calibration Stream
  - Contains only partial events
- Muon Calibration Stream
  - Contains only partial events
  - Analyzed outside CERN
- Express line
  - Full events, 10% of data
Runs and RAW Merging

- A start/stop is between 2 Luminosity Blocks (~30 seconds per file)
- All files in a run form a dataset
- 200 Hz for 30 s is 6000 events, but split between ~10 streams
- Streams are unequal and some create too-small files
- Small RAW files are merged into 2 GB files (a merging sketch follows below)
- Only merged files are written to tape and exported
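A minimal sketch of that merging policy, assuming a simple greedy packer; the file names, sizes, and packing strategy are illustrative, not ATLAS's actual merging tool:

```python
TARGET_BYTES = 2 * 1024**3  # the 2 GB merge target from the slide

def merge_plan(files):
    """Greedily group (name, size_bytes) pairs into ~2 GB merge jobs."""
    jobs, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > TARGET_BYTES:
            jobs.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        jobs.append(current)
    return jobs

# Illustrative input: one small file per stream for one ~30 s luminosity block.
small_files = [(f"run123.lb0001.stream{i}.raw", 50 * 1024**2) for i in range(10)]
print(merge_plan(small_files))
```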
![Page 6: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/6.jpg)
Calibration and Alignment Facility (CAF)
Per run:
- Express line used for real-time processing
  - Initial calibration used
  - Verified by DQ shifters
- Calibration data processed in the CAF
  - Initial calibrations used
  - New calibrations into the offline DB
- Express line processed again
  - New calibrations used
  - Verified by DQ shifters
  - If necessary, fixes applied
- Express line processed again if necessary
  - Buffer for several days of data
- Reconstruction of all data triggered
  - Results archived on tape, and
  - made available at CERN, and
  - replicated to other clouds
![Page 7: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/7.jpg)
ATLAS Clouds
| Cloud      | Tier-1  | Share [%] | Tier-2's [#] |
|------------|---------|-----------|--------------|
| Asian Pac. | ASGC    | 5         | 0            |
| US         | BNL     | 25        | 6            |
| Italy      | CNAF    | 5         | 4            |
| German     | FZK     | 10        | 11           |
| French     | CCIN2P3 | 15        | 11           |
| Nordic     | NDGF    | 5         | 2            |
| Iberian    | PIC     | 5         | 5            |
| UK         | RAL     | 10        | 13           |
| Dutch      | SARA    | 15        | 9            |
| Canadian   | TRIUMF  | 5         | 4            |
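To illustrate how these shares could drive a proportional split of work, a small sketch; the weighted-random assignment is an illustration, not ATLAS's actual brokering logic:

```python
import random
from collections import Counter

# Tier-1 shares from the table above (percent; they sum to 100).
SHARES = {"ASGC": 5, "BNL": 25, "CNAF": 5, "FZK": 10, "CCIN2P3": 15,
          "NDGF": 5, "PIC": 5, "RAL": 10, "SARA": 15, "TRIUMF": 5}

def pick_cloud():
    """Pick a cloud with probability proportional to its Tier-1 share."""
    clouds, weights = zip(*SHARES.items())
    return random.choices(clouds, weights=weights, k=1)[0]

# Assign 1000 hypothetical tasks and inspect the resulting split.
print(Counter(pick_cloud() for _ in range(1000)))
```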
![Page 8: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/8.jpg)
French Tier-2
![Page 9: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/9.jpg)
Activity areas
1. Detector data distribution
2. Detector data re-processing (in the Tier-1's)
3. MC Simulation production (in the Tier-2's)
4. User analysis (in the Tier-2's)
![Page 10: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/10.jpg)
STEP09
A Functional and Performance Test for all 4 experiments simultaneously
![Page 11: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/11.jpg)
What we would like to test
- Full computing model
- Tape writing and reading simultaneously in Tier-1's and Tier-0
- Processing priorities and shares in Tier-1's and -2's
- Monitoring of all those activities
- Simultaneously with other experiments (test shares)
- All at nominal rates for 2 weeks: June 1-14
- Full shift schedule in place, as for cosmics data taking
- As little disruption as possible for detector commissioning
![Page 12: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/12.jpg)
Activity areas
1. Detector data distribution
2. Detector data re-processing (in the Tier-1's)
3. MC Simulation production (in the Tier-2's)
4. User analysis (in the Tier-2's)
![Page 13: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/13.jpg)
Detector Data Distribution
![Page 14: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/14.jpg)
The Common Computing Readiness Challenge of last year

[Plot: T0 → T1s throughput in MB/s. Subscriptions injected every 4 hours and immediately honored; a 12 h backlog fully recovered in 30 minutes. All experiments in the game.]
![Page 15: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/15.jpg)
Tier-0 → Tier-1 rates and volumes
![Page 16: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/16.jpg)
Activity areas
1. Detector data distribution
2. Detector data re-processing (in the Tier-1's)
3. MC Simulation production (in the Tier-2's)
4. User analysis (in the Tier-2's)
![Page 17: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/17.jpg)
2. Detector data re-processing (in the Tier-1’s)
- Each Tier-1 is responsible for re-processing its share
- Pre-stage RAW data back from tape to disk
- Re-process reconstruction (on average ~30 s per event)
- Output ESD, AOD, DPD archived to tape
- Copy AOD and DPD to all other 9 Tier-1's
- Distribute AOD and DPD over the Tier-2's of 'this' cloud
- Copy ESD to 1 other (sister) Tier-1 (see the fan-out sketch below)
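A minimal sketch of that distribution fan-out; the sister-Tier-1 pairing and the Tier-2 names are invented placeholders, and this is not the actual DDM subscription machinery:

```python
# Illustrative site lists; the sister pairing and Tier-2 names are assumptions.
TIER1S = ["ASGC", "BNL", "CNAF", "FZK", "CCIN2P3",
          "NDGF", "PIC", "RAL", "SARA", "TRIUMF"]
SISTER = {"CCIN2P3": "FZK"}                          # hypothetical sister Tier-1
CLOUD_TIER2S = {"CCIN2P3": ["GRIF", "LAPP", "LPC"]}  # hypothetical cloud Tier-2s

def subscriptions(site, outputs):
    """List (dataset, destination) transfers for one Tier-1's re-processed output."""
    subs = [(outputs["ESD"], SISTER[site])]          # ESD to the sister Tier-1
    for other in TIER1S:
        if other != site:                            # AOD+DPD to the 9 other Tier-1s
            subs += [(outputs["AOD"], other), (outputs["DPD"], other)]
    for t2 in CLOUD_TIER2S[site]:                    # and over this cloud's Tier-2s
        subs += [(outputs["AOD"], t2), (outputs["DPD"], t2)]
    return subs

plan = subscriptions("CCIN2P3", {"ESD": "esd.ds", "AOD": "aod.ds", "DPD": "dpd.ds"})
print(len(plan))  # 1 + 9*2 + 3*2 = 25 subscriptions
```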
![Page 18: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/18.jpg)
Re-processing workflow

Here mAOD and mDPD mean merged AOD/DPD files.
![Page 19: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/19.jpg)
Spring09 re-Processing Campaign
- Total input data (RAW)
  - 138 runs, 852 containers, 334,191 files, 520 TB
  - https://twiki.cern.ch/twiki/pub/Atlas/DataPreparationReprocessing/reproMarch09_inputnew.txt
- Total output data (ESD, AOD, DPD, TAG, NTUP, etc.)
  - 12,339 containers, 1,847,149 files, 133 TB
  - Compare with last time (116.8 TB); the increase is due to extra runs, DPD formats, etc.
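Back-of-envelope averages from these numbers (assuming decimal TB and MB):

```python
# Average file sizes implied by the campaign numbers above (decimal TB -> MB).
input_tb, input_files = 520, 334_191
output_tb, output_files = 133, 1_847_149

print(f"avg input file : {input_tb * 1e6 / input_files:7.0f} MB")   # ~1556 MB
print(f"avg output file: {output_tb * 1e6 / output_files:7.0f} MB") # ~72 MB
```

The ~1.6 GB average input file is consistent with the 2 GB merged-RAW target described earlier.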
![Page 20: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/20.jpg)
Simplified re-processing for STEP09
- The Spring09 campaign was too complicated
- Simplify by just running RAW → ESD
  - Using Jumbo tasks
- RAW staged from tape
- ESD archived back onto tape
  - Volume is smaller than with real data
- Increase the Data Distribution Functional Test (FT) to match the missing AOD/DPD traffic
![Page 21: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/21.jpg)
Re-Processing targets
- Re-processing at 5× the rate of nominal data taking
- Be aware: ESD is much smaller for cosmics than for real data
  - ESD file size is 140 MB instead of 1 GB
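What the 5× target implies in aggregate, assuming the 200 Hz nominal rate and 1.6 MB/ev RAW size quoted earlier, and assuming the factor applies to the sum over all Tier-1s (my reading):

```python
# "5x nominal" in aggregate, assuming the 200 Hz rate and 1.6 MB/ev RAW size
# quoted earlier, and assuming the target is summed over all Tier-1s.
nominal_hz, raw_mb_per_ev = 200, 1.6
target_hz = 5 * nominal_hz
print(f"{target_hz} Hz -> {target_hz * raw_mb_per_ev:.0f} MB/s RAW staged from tape")
```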
![Page 22: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/22.jpg)
Tier-1 → Tier-1 Volumes and Rates
- Re-processed data distributed like the original data from the Tier-0
  - ESD to 1 partner Tier-1
  - AOD and DPD to all other 9 Tier-1's (and CERN), and further to the Tier-2's
- AOD and DPD load simulated through the DDM FT
![Page 23: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/23.jpg)
Tier-1 → Tier-2 Volumes and Rates
- The Computing Model foresaw 1 copy of AOD+DPD per cloud
- Tier-2 sites vary hugely in size and many clouds export more than 1 copy
![Page 24: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/24.jpg)
Activity areas
1. Detector data distribution
2. Detector data re-processing (in the Tier-1's)
3. MC Simulation production (in the Tier-2's)
4. User analysis (in the Tier-2's)
![Page 25: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/25.jpg)
G4 Monte Carlo Simulation Production
- EVNT = 0.02 MB/event
- HITS = 2.0 MB/event
- RDO = 2.0 MB/event
- ESD = 1.0 MB/event
- AOD = 0.2 MB/event
- TAG = 0.01 MB/event

G4 simulation takes ~1000 s/event; digi+reco takes ~20-40 s/event.
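Rough throughput arithmetic from these timings; the 1000-core farm is an illustrative assumption:

```python
# Events per core per day from the timings above; the farm size is illustrative.
g4_s_per_ev, digi_reco_s_per_ev = 1000, 30   # ~1000 s G4, ~20-40 s digi+reco
cores = 1000                                  # hypothetical Tier-2 farm

ev_per_core_day = 86_400 / (g4_s_per_ev + digi_reco_s_per_ev)
print(f"{ev_per_core_day:.0f} ev/core/day -> {cores * ev_per_core_day:,.0f} ev/day")
# HITS output at 2.0 MB/ev: ~84 ev/core/day * 1000 cores * 2 MB ≈ 168 GB/day
```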
![Page 26: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/26.jpg)
MC Simulation Production statistics

- Only limited by requests and disk space
![Page 27: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/27.jpg)
G4 Simulation Volumes

- MC09 should have started during STEP09
- Exclusively run on Tier-2 resources
  - Rate will be lower because of other activities
  - Small HITS files produced in the Tier-2's are uploaded to the Tier-1
  - Merged into Jumbo HITS and written to tape in the Tier-1
- Merged MC08 data from tape will be used for reconstruction
  - AOD (and some ESD) written back to tape and distributed like data
![Page 28: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/28.jpg)
Activity areas
1. Detector data distribution
2. Detector data re-processing (in the Tier-1's)
3. MC Simulation production (in the Tier-2's)
4. User analysis (in the Tier-2's)
![Page 29: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/29.jpg)
4. User analysis
- Mainly done in the Tier-2's
- 50% of capacity should be reserved for user analysis
  - We already see at least 30% activity in some sites
- In addition, some Tier-1 sites have analysis facilities
  - Must make sure this does not disrupt scheduled Tier-1 activities
- We will also use the HammerCloud analysis test framework (see the sketch below)
  - Contains 4 different AOD analyses
  - Can generate a constant flow of jobs
  - Uses both WMS and PanDA back-ends in EGEE
- Tier-2's should install the following shares:
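In the spirit of the HammerCloud bullet above, a toy pacing loop for a constant job flow; this is not the HammerCloud API, and submit_job is a hypothetical callback:

```python
import time

def constant_flow(submit_job, jobs_per_hour, duration_s):
    """Submit jobs at a fixed rate for duration_s seconds (toy pacing loop)."""
    interval = 3600.0 / jobs_per_hour
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        submit_job()        # e.g. hand one AOD analysis to a WMS/PanDA backend
        time.sleep(interval)

# constant_flow(lambda: print("submit one job"), jobs_per_hour=60, duration_s=300)
```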
![Page 30: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/30.jpg)
Putting it all together: Tier-1 Volumes and Rates for STEP09

- For CCIN2P3:
  - ~10 TB for MCDISK, ~200 TB for DATADISK, and ~55 TB on tape
  - ~166 MB/s data in and 265 MB/s data out (!?)
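Converting those rates into volumes over the two-week test window (decimal units):

```python
# The CCIN2P3 STEP09 rates above converted to volumes (decimal units,
# two-week window from the June 1-14 test dates).
in_mb_s, out_mb_s, day_s = 166, 265, 86_400
for label, rate in (("in ", in_mb_s), ("out", out_mb_s)):
    per_day_tb = rate * day_s / 1e6
    print(f"{label}: {per_day_tb:5.1f} TB/day, {14 * per_day_tb:6.0f} TB over 2 weeks")
```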
![Page 31: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/31.jpg)
Tape Usage
For CCIN2P3:

- Reading: 143 MB/s
  - RAW for re-processing
  - Jumbo HITS for reconstruction
- Writing: 44 MB/s
  - RAW from the Tier-0
  - (Merged) HITS from the Tier-2's
  - Output from re-processing (ESD, AOD, DPD, ..)
  - Output from reconstruction (RDO, AOD, ..)
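For a feel of the drive occupancy these rates imply, a sketch assuming an illustrative ~100 MB/s sustained per tape drive (my assumption, not a slide figure):

```python
import math

# Rough drive count for the CCIN2P3 figures above, assuming an illustrative
# ~100 MB/s sustained per tape drive (my assumption, not a slide number).
read_mb_s, write_mb_s, drive_mb_s = 143, 44, 100
print(f"read : >= {math.ceil(read_mb_s / drive_mb_s)} drives busy")
print(f"write: >= {math.ceil(write_mb_s / drive_mb_s)} drives busy")
```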
![Page 32: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/32.jpg)
We ask of you ..

- That the sites check the numbers and the dates
  - We know that CCIN2P3 cannot do automatic pre-staging
  - Is sufficient cooling foreseen for the beginning of June?
  - What is the Tier-2 capacity in the French cloud?
- That one or more people keep watch
  - For system overload, slowdowns, errors, ..
  - For bandwidth saturation
  - At least 1 person per site, Tier-1 and Tier-2
  - We would like to collect names
- Help with gathering information for the final report
  - Post mortem July 9-10
![Page 33: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/33.jpg)
We offer you

- A twiki with detailed information
- The ATLAS meeting (also by phone) at 09:00
- The WLCG meeting (also by phone) at 15:00
- The operations meeting (also by phone) on Thursdays at 15:30
- The virtual Skype meeting (around the clock)
- Several mailing lists
- Private email addresses
- Telephone numbers
- Our goodwill
![Page 34: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/34.jpg)
The End
![Page 35: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/35.jpg)
[Diagram: Data Handling and Computation for Physics Analysis — detector → event filter (selection & reconstruction) → raw data → reconstruction → event summary data → batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis, with event reprocessing and event simulation feeding back in; the stages map onto Tier-0, Tier-1 and Tier-2.]
![Page 36: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/36.jpg)
[Diagram: Storage Areas @CERN — CASTOR pools t0atlas, t0merge, atldata, atlprod and atlcal; T0 and CAF CPU farms; tape; the Tier-1's. Data categories: detector data, re-processing data, MC data, calibration data, express stream. Spaces: managers space and users space (group, scratch, default, user) plus afs. Activities: calibration and alignment, physics group analysis, end-user analysis. Formats: AOD, DPD.]
![Page 37: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/37.jpg)
T1 Space Token Summary
![Page 38: ATLAS Distributed Computing](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813e85550346895da8bd05/html5/thumbnails/38.jpg)
T2 Space Token Summary