The ATLAS Computing Model
Eric Lançon, Saclay
LCG-France
Lyon, 14-15 Dec. 2005
Overview
• The ATLAS facilities and their roles
• Growth of resources
  • CPU, disk, mass storage
• Network requirements
  • CERN ↔ Tier 1 ↔ Tier 2
• Data & service challenges
Computing Resources
• Computing Model fairly well evolved, but still being revised
  • Documented in: http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf
• There are (and will remain for some time) many unknowns
  • Calibration and alignment strategy is still evolving
  • Physics data access patterns may start to be exercised this spring; unlikely to know the real patterns until 2007/2008!
  • Still uncertainties on the event sizes and reconstruction times
• Lesson from the previous round of experiments at CERN (LEP): reviews in 1988 underestimated the computing requirements by an order of magnitude!
ATLAS Facilities
• Event Filter Farm at CERN (pit)
  • Located near the experiment
  • Assembles data into a stream to the Tier 0 center
• Tier 0 Center at CERN (computer center)
  • Raw-data mass storage at CERN and distribution to Tier 1 centers
  • Prompt reconstruction producing Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ships ESD and AOD to Tier 1 centers
• Tier 1 Centers distributed worldwide (approximately 10 centers)
  • Re-reconstruction of raw data, producing new ESD and AOD
• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • Monte Carlo simulation, producing ESD and AOD shipped to Tier 1 centers
  • Physics analysis
• CERN Analysis Facility
• Tier 3 Centers distributed worldwide
  • Physics analysis
(The overall data flow between these facilities is sketched below.)
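To make the division of labour concrete, the flow between these facilities can be summarised in a few lines of Python. This is purely an illustrative sketch of the roles listed above; the names and structure are ours, not ATLAS software.

```python
# Schematic summary of the ATLAS facility roles listed above.
# Purely illustrative; not ATLAS software.
DATA_FLOW = [
    # (source,          product,               destination)
    ("Event Filter",    "RAW stream",          "Tier-0"),
    ("Tier-0",          "RAW",                 "mass storage + Tier-1s"),
    ("Tier-0",          "prompt ESD, AOD",     "Tier-1s"),
    ("Tier-1 (~10)",    "re-reco ESD, AOD",    "Tier-2s"),
    ("Tier-2 (~30)",    "simulated ESD, AOD",  "Tier-1s"),
]

for src, product, dst in DATA_FLOW:
    print(f"{src:>13} --[{product}]--> {dst}")
```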
Processing
• Tier-0:
  • First-pass processing on the express/calibration physics stream
  • 24-48 hours later, process the full physics data stream with reasonable calibrations
  ⇒ These imply large data movement from T0 to the T1s
• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations
  • Reprocess all resident RAW at year end with improved calibration and software
  ⇒ These imply large data movement from T1 to T1 and from T1 to T2
Processing cont’d
• Tier-1:
  • Holds 1/10 of the RAW data and the derived samples
  • Shadows the ESD for another Tier-1 (i.e. holds 2/10 of the whole ESD sample)
  • Holds the full AOD sample
  • Reprocesses 1-2 months after arrival with better calibrations (to produce a coherent dataset)
  • Reprocesses all resident RAW at year end with improved calibration and software
  • Provides scheduled access to ESD samples
• Tier-2s:
  • Provide access to AOD and group Derived Physics Datasets
  • Carry the full simulation load
(A back-of-envelope sketch of the per-Tier-1 volumes these fractions imply follows this list.)
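As a back-of-envelope illustration of what these fractions imply, the sketch below assumes event sizes and rates of the order quoted in the Computing TDR (RAW ~1.6 MB, ESD ~0.5 MB, AOD ~0.1 MB per event, 200 Hz trigger rate, 10^7 s of data taking per year). These inputs are assumptions for illustration, not numbers taken from this talk.

```python
# Back-of-envelope per-Tier-1 data volumes implied by the placement model above.
# Event sizes and rates are illustrative assumptions (order of the Computing
# TDR figures), not numbers from this talk.
RAW_MB, ESD_MB, AOD_MB = 1.6, 0.5, 0.1     # assumed per-event sizes
RATE_HZ, LIVE_SECONDS = 200, 1.0e7         # assumed trigger rate, live time/year
N_TIER1 = 10

events = RATE_HZ * LIVE_SECONDS            # ~2e9 events/year
MB_TO_PB = 1.0e-9

raw_per_t1 = events * RAW_MB * (1 / N_TIER1) * MB_TO_PB   # 1/10 of RAW
esd_per_t1 = events * ESD_MB * (2 / N_TIER1) * MB_TO_PB   # own share + shadow
aod_per_t1 = events * AOD_MB * MB_TO_PB                   # full AOD copy

print(f"Per Tier-1, per year (one processing pass): "
      f"RAW {raw_per_t1:.2f} PB, ESD {esd_per_t1:.2f} PB, AOD {aod_per_t1:.2f} PB")
# -> RAW 0.32 PB, ESD 0.20 PB, AOD 0.20 PB
```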
Analysis Model
The analysis model is broken into two components:
• Scheduled central production of augmented AOD, tuples & TAG collections from ESD; derived files are moved to the other T1s and to T2s
• Chaotic user analysis of augmented AOD streams, tuples, new selections, etc., plus individual user simulation and CPU-bound tasks matching the official MC production; modest job traffic between T2s
Inputs to the ATLAS Computing Model (1)
Inputs to the ATLAS Computing Model (2)
Data Flow from experiment to T2s
• T0 raw data → mass storage at CERN
• T0 raw data → Tier 1 centers
  • 1/10 to each T1
• T0 ESD → Tier 1 centers
  • 2 copies of the ESD distributed worldwide
  • 2/10 to each T1
• T0 AOD → each Tier 1 center (full copy)
• T1 → T2
  • Some ESD, ALL AOD?
• T0 → T2: calibration processing?
  • Dedicated T2s for some sub-detectors
(The nominal T0 → T1 rate implied by these fractions is sketched below.)
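Using the same illustrative event sizes and trigger rate as in the earlier sketch (again assumptions, not figures from this talk), the steady-state T0 → T1 export rate this placement implies is roughly:

```python
# Rough nominal T0 -> Tier-1 export rate implied by the placement above.
# Same illustrative per-event sizes and trigger rate as in the earlier sketch.
RAW_MB, ESD_MB, AOD_MB = 1.6, 0.5, 0.1
RATE_HZ, N_TIER1 = 200, 10

per_t1 = RATE_HZ * (RAW_MB / N_TIER1         # 1/10 of the RAW stream
                    + ESD_MB * 2 / N_TIER1   # 2 ESD copies over 10 T1s
                    + AOD_MB)                # full AOD copy to every T1
print(f"~{per_t1:.0f} MB/s per Tier-1, ~{per_t1 * N_TIER1:.0f} MB/s out of CERN")
# -> ~72 MB/s per Tier-1, ~720 MB/s aggregate (one pass, no overheads)
```

The ~720 MB/s aggregate is consistent in order of magnitude with the nominal CERN ↔ Tier-1 bandwidth profile shown later in this talk.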
Total ATLAS Requirements for 2008
Important points
• Storage of simulation data from Tier 2s
  • Assumed to be at T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites?
• Commissioning
  • These numbers are calculated for the steady state, but with the requirement of flexibility in the early stages
• The simulation fraction is an important tunable parameter in the T2 numbers!
  • Simulation / real data = 20% in the TDR
• Heavy-ion running is still under discussion.
ATLAS T0 Resources
ATLAS T1 Resources
ATLAS T2 Resources
Offered resources per Tier-1 (2008)
Tier 1 ↔ Tier 2 Bandwidth
[Figure: nominal bandwidth (MB/s) vs. month, July 2007 - December 2010, scale 0-250 MB/s; curves for ATLAS and ATLAS heavy-ion (HI) running.]
The projected time profile of the nominal aggregate bandwidth expected for an average ATLAS Tier-1 and its three associated Tier-2s.
Tier 1 ↔ CERN Bandwidth
[Figure: nominal bandwidth (MB/s) vs. month, July 2007 - December 2010, scale 0-1200 MB/s; curves for ATLAS and ATLAS HI.]
The projected time profile of the nominal bandwidth required between CERN and the Tier-1 cloud.
Tier 1 ↔ Tier 1 Bandwidth
[Figure: nominal bandwidth (MB/s) vs. month, July 2007 - December 2010, scale 0-1000 MB/s; curves for ATLAS and ATLAS HI.]
The projected time profile of the nominal bandwidth required between Tier-1s.
Conclusions
• Computing Model data flow understood for placing RAW, ESD and AOD at the tiered centers
  • Still need to understand the data-flow implications of physics analysis
  • How often do you need to "back navigate" from AOD to ESD?
  • How distributed is Distributed Analysis?
• Some of these issues will be addressed in the upcoming (early 2006) Computing System Commissioning exercise.
• Some will only be resolved with real data in 2007-08.
Key dates for Service Preparation
Timeline:
• Sep 05: SC3 service phase
• May 06: SC4 service phase
• Sep 06: initial LHC service in stable operation
• Apr 07: LHC service commissioned; cosmics, first beams, first physics and the full physics run follow in 2007-2008

• SC3: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 1 GB/s, including mass-storage throughput of 500 MB/s (150 MB/s and 60 MB/s at Tier-1s)
• SC4: all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput (~1.5 GB/s mass-storage throughput)
• LHC Service in Operation, September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
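To put these sustained-throughput targets in perspective, a one-line conversion to daily volumes (plain arithmetic, no ATLAS-specific inputs):

```python
# Sustained throughput -> daily volume for the service-challenge targets above.
SECONDS_PER_DAY = 24 * 3600   # 1 GB/s sustained = 86400 GB/day

for name, gb_per_s in [("SC3 grid target", 1.0), ("SC4 mass storage", 1.5)]:
    tb_per_day = gb_per_s * SECONDS_PER_DAY / 1000
    print(f"{name}: {gb_per_s} GB/s ~ {tb_per_day:.0f} TB/day")
# -> SC3 grid target: 1.0 GB/s ~ 86 TB/day
# -> SC4 mass storage: 1.5 GB/s ~ 130 TB/day
```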
Data Challenges
Statistics on LCG usage
Quite far from the ambitions; many technical problems at start-up. Towards the end, 10% of LCG jobs ran at the CC.
IN2P3 = Clermont + Lyon
305,410 jobs
SC3 : T0->T1 exercise
[Figure: SC3 ATLAS T0 → T1 dataflow.]
Distributed Data Management (DDM)
• Centralised global catalogues (LFC):
  • Contents of the datasets (1 dataset = several files)
  • Location of the datasets across the sites (T0-T1-T2)
  • List of dataset transfer requests, etc.
• Local catalogues (LFC):
  • Location, within the site, of the files of each dataset
• Each site takes charge, through agents (VOBox), of (see the sketch after this list):
  • Retrieving from the central catalogue the list of datasets and associated files to transfer
  • Managing the transfer
  • Registering the information in the local and central catalogues
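A minimal sketch of that site-agent loop, assuming hypothetical catalogue and transfer interfaces: the real system used LFC catalogues and FTS transfers, but the object and function names below are ours.

```python
# Minimal sketch of a DDM site agent, per the description above.
# `central_catalog`, `local_catalog` and `transfer` are hypothetical stand-ins
# for the real LFC/FTS interfaces, passed in by the caller.
import time

def site_agent(site, central_catalog, local_catalog, transfer):
    while True:
        # 1. Retrieve from the central catalogue the datasets subscribed to this site.
        for dataset in central_catalog.pending_subscriptions(site):
            for lfn in central_catalog.files_in(dataset):
                if not local_catalog.has(lfn):
                    transfer(lfn, dest=site)       # 2. manage the transfer
                    local_catalog.register(lfn)    # 3a. record in local catalogue
            central_catalog.mark_complete(dataset, site)  # 3b. and centrally
        time.sleep(60)  # poll periodically (cron-like loop on the VOBox)
```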
Problem: cron is not managed per grid user
DDM at CC-IN2P3
• 1 dCache server
  • 2 pools of 1-2 TB each + tape drive: 40 MB/s
• Transfers via FTS (GÉANT, 1 Gb/s)
  • When will we have 10 Gb/s?
• LCG VOBox, CC-IN2P3 style
  • Linux SL3 machine
  • LFC catalogue
  • FTS server (for future T1-T1 or T1-T2 transfers)
  • Grid certificates
  • gsissh access
  • DDM software installed
  • No root access needed; can be installed from CERN
Service Challenges
Integrated TB transferred
Backup Slides
From LATB…
Heavy Ion Running
Preliminary Tier-1 Resource Planning
Capacity at all Tier-1s in 2008
Tier-1 Planning for 2008      ALICE   ATLAS    CMS    LHCb   SUM 2008
CPU (MSI2K)
  Planned capacity              6.2    22.6    11.6    4.4     44.8
  Required                     12.3    24.0    15.2    4.4     55.9
  Balance                      -49%     -6%    -24%    -0%    -20%
Disk (PBytes)
  Offered                       2.7    12.6     5.5    2.2     23
  Planned capacity              7.4    14.4     7.0    2.4     31.2
  Balance                      -64%    -12%    -21%   -10%    -26%
Tape (PBytes)
  Planned capacity              3.1     9.2     8.1    1.9     22.3
  Required                      6.9     9.0    16.7    2.1     34.7
  Balance                      -55%      2%    -51%    -9%    -36%

Includes current planning for all Tier-1 centres.
Preliminary Tier-2 Resource Planning
Capacity at all Tier-2s in 2008
• Includes resource planning from 27 centres/federations
• 11 known Tier-2 federations have not yet provided data
  • These include potentially significant resources: USA CMS, Canada East+West, MPI Munich, …
Tier-2 Planning for 2008      ALICE   ATLAS    CMS    LHCb     SUM 2008
CPU (MSI2K)
  Planned capacity              5.0    19.2    10.3    4.4      38.9
  Required                     14.4    19.9    19.3    7.7      61.3
  Balance                      -65%     -4%    -46%   -42%     -36%
Disk (PBytes)
  Offered                       1.4     5.7     3.1    0.8      11
  Required                      5.1     8.7     4.9    0.023    18.723
  Balance                      -72%    -34%    -36%   n/a      -41%
Federations included
  (expected)                 13 (13)  20 (29) 17 (19) 11 (12)  27 (38)
‘Rome’ production: French sites
‘Rome’ production: Italian sites
Distributed Data Management (DDM)
• Avoid using a flat, fully centralised catalogue
• Hierarchical organisation of the data catalogues:
  • Dataset = collection of files
  • Datablock = collection of datasets
  • Datasets are tagged to maintain full consistency
• The information about each physical file is stored locally
• Data movements are done per dataset
• Movement to a site is triggered by a subscription, through a client program (see the sketch below)
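A compact sketch of the hierarchy just described, with hypothetical class names (this only illustrates the file/dataset/datablock nesting and the subscription trigger, not the real catalogue schema):

```python
# Illustrative data model for the hierarchical catalogues described above.
# Class and field names are ours, not the real DDM schema.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    version: str                                      # tag kept for consistency
    files: list[str] = field(default_factory=list)    # logical file names

@dataclass
class Datablock:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

# Central catalogue: dataset -> sites holding it (per-file info stays local).
central_locations: dict[str, set[str]] = {}

def subscribe(dataset: Dataset, site: str) -> None:
    """Subscribing a dataset to a site is what triggers data movement;
    a site agent then transfers the files and updates the catalogues."""
    central_locations.setdefault(dataset.name, set()).add(site)
```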