
Page 1: The ATLAS Computing Model

The ATLAS Computing Model

Eric Lançon, Saclay

LCG-France

Lyon, 14-15 Dec. 2005

Page 2: The ATLAS Computing Model


Overview

• The ATLAS Facilities and their roles
• Growth of resources
  • CPU, Disk, Mass Storage
• Network requirements
  • CERN → Tier 1 → Tier 2
• Data & Service challenges

Page 3: The ATLAS Computing Model


Computing Resources

• Computing Model fairly well evolved, but still being revised
  • Documented in:
    http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf

• There are (and will remain for some time) many unknowns
  • Calibration and alignment strategy is still evolving
  • Physics data access patterns MAY start to be exercised this spring
    • Unlikely to know the real patterns until 2007/2008!
  • Still uncertainties in event sizes and reconstruction times

• Lesson from the previous round of experiments at CERN (LEP)
  • Reviews in 1988 underestimated the computing requirements by an order of magnitude!

Page 4: The ATLAS Computing Model


ATLAS Facilities

• Event Filter Farm at CERN (pit)
  • Located near the experiment
  • Assembles data into a stream to the Tier 0 Center

• Tier 0 Center at CERN (computer centre)
  • Raw data → mass storage at CERN and to Tier 1 centers
  • Prompt reconstruction producing Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ship ESD, AOD to Tier 1 centers

• Tier 1 Centers distributed worldwide (approximately 10 centers)
  • Re-reconstruction of raw data, producing new ESD, AOD

• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • Monte Carlo simulation, producing ESD, AOD
  • ESD, AOD → Tier 1 centers
  • Physics analysis

• CERN Analysis Facility

• Tier 3 Centers distributed worldwide
  • Physics analysis
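Viewed as architecture, the tier hierarchy is a mapping from processing tasks to facility classes. A minimal sketch of that mapping (illustrative only, not ATLAS software; the role strings paraphrase the bullets above):

```python
# Minimal sketch of the tiered ATLAS facility roles described above.
# Illustrative only: names and structure are assumptions, not ATLAS code.

TIER_ROLES = {
    "Event Filter": ["stream raw data to Tier 0"],
    "Tier 0": ["archive RAW to mass storage", "prompt reconstruction (ESD, AOD)",
               "ship RAW/ESD/AOD to Tier 1s"],
    "Tier 1": ["custodial RAW fraction", "re-reconstruction (new ESD, AOD)",
               "scheduled ESD access"],
    "Tier 2": ["Monte Carlo simulation", "AOD-based physics analysis"],
    "Tier 3": ["end-user physics analysis"],
}

def tiers_responsible_for(task: str) -> list[str]:
    """Return the tiers whose role list mentions the given keyword."""
    return [tier for tier, roles in TIER_ROLES.items()
            if any(task.lower() in role.lower() for role in roles)]

if __name__ == "__main__":
    print(tiers_responsible_for("reconstruction"))  # ['Tier 0', 'Tier 1']
    print(tiers_responsible_for("analysis"))        # ['Tier 2', 'Tier 3']
```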

Page 5: The ATLAS Computing Model


Processing

• Tier-0:
  • First-pass processing on express/calibration physics stream
  • 24-48 hours later, process full physics data stream with reasonable calibrations

  These imply large data movement from T0 to T1s.

• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations
  • Reprocess all resident RAW at year end with improved calibration and software

  These imply large data movement from T1 to T1 and T1 to T2.

Page 6: The ATLAS Computing Model


Processing cont’d

• Tier-1:
  • 1/10 of RAW data and derived samples
  • Shadow the ESD for another Tier-1 (e.g. 2/10 of whole sample)
  • Full AOD sample
  • Reprocess 1-2 months after arrival with better calibrations (to produce a coherent dataset)
  • Reprocess all resident RAW at year end with improved calibration and software
  • Provide scheduled access to ESD samples

• Tier-2s
  • Provide access to AOD and group Derived Physics Datasets
  • Carry the full simulation load

Page 7: The ATLAS Computing Model


Analysis Model

Analysis model broken into two components:
• Scheduled central production of augmented AOD, tuples & TAG collections from ESD
  • Derived files moved to other T1s and to T2s
• Chaotic user analysis of augmented AOD streams, tuples, new selections etc. and individual user simulation and CPU-bound tasks matching the official MC production
  • Modest job traffic between T2s
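TAG collections are what make the scheduled/chaotic split workable: a cheap query over compact per-event records decides which few AOD files an analysis must actually open, with back navigation to ESD only when needed. A hypothetical sketch (TagRecord and its fields are invented for illustration, not the real TAG schema):

```python
# Sketch of TAG-based event selection, the access pattern behind the
# "scheduled" component above. Hypothetical structures, not ATLAS code.
from dataclasses import dataclass

@dataclass
class TagRecord:
    """One row of a TAG collection: compact per-event summary plus a
    pointer back to the AOD (and transitively the ESD) holding the event."""
    run: int
    event: int
    n_muons: int
    missing_et_gev: float
    aod_file: str  # back-navigation handle

def select(tags: list[TagRecord], min_muons: int, min_met: float) -> list[TagRecord]:
    """Cheap selection on TAG rows; only matching events touch the AOD."""
    return [t for t in tags
            if t.n_muons >= min_muons and t.missing_et_gev >= min_met]

tags = [
    TagRecord(run=1, event=7, n_muons=2, missing_et_gev=55.0, aod_file="aod_001.pool"),
    TagRecord(run=1, event=9, n_muons=0, missing_et_gev=12.0, aod_file="aod_001.pool"),
]
picked = select(tags, min_muons=1, min_met=30.0)
print({t.aod_file for t in picked})  # only these AOD files need to be read
```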

Page 8: The ATLAS Computing Model


Inputs to the ATLAS Computing Model (1)

Page 9: The ATLAS Computing Model


Inputs to the ATLAS Computing Model (2)

Page 10: The ATLAS Computing Model


Data Flow from experiment to T2s

• T0 raw data → mass storage at CERN
• T0 raw data → Tier 1 centers
  • 1/10 in each T1
• T0 ESD → Tier 1 centers
  • 2 copies of ESD distributed worldwide
  • 2/10 in each T1
• T0 AOD → each Tier 1 center
• T1 → T2
  • Some ESD, ALL AOD?
• T0 → T2: calibration processing?
  • Dedicated T2s for some sub-detectors
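These sharing fractions translate directly into per-T1 storage. A back-of-envelope sketch, assuming the nominal TDR-era figures (200 Hz over ~10^7 live seconds per year; RAW 1.6 MB, ESD 0.5 MB, AOD 0.1 MB per event; treat all of these as assumptions):

```python
# Back-of-envelope per-T1 storage implied by the sharing fractions above.
# Event sizes and rate are nominal TDR-era figures (assumptions here).
EVENTS_PER_YEAR = 200 * 1e7          # 200 Hz x ~1e7 live s = 2e9 events
SIZES_MB = {"RAW": 1.6, "ESD": 0.5, "AOD": 0.1}
T1_SHARE = {"RAW": 1 / 10,           # 1/10 of RAW per T1
            "ESD": 2 / 10,           # 2/10 of the full ESD sample per T1
            "AOD": 1.0}              # full AOD sample at every T1

for fmt, size_mb in SIZES_MB.items():
    one_copy_pb = EVENTS_PER_YEAR * size_mb / 1e9        # MB -> PB
    per_t1_pb = one_copy_pb * T1_SHARE[fmt]
    print(f"{fmt}: one full copy {one_copy_pb:.1f} PB/yr, "
          f"per-T1 share {per_t1_pb:.2f} PB/yr")
# RAW: 3.2 PB/yr -> 0.32 per T1; ESD: 1.0 -> 0.20; AOD: 0.2 -> 0.20
```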

Page 11: The ATLAS Computing Model


Total ATLAS Requirements for 2008

Page 12: The ATLAS Computing Model


Important points

• Storage of simulation data from Tier 2s
  • Assumed to be at T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites?

• Commissioning
  • These numbers are calculated for the steady state, but with the requirement of flexibility in the early stages

• Simulation fraction is an important tunable parameter in the T2 numbers (see the sketch after this list)
  • Simulation / Real Data = 20% in the TDR

• Heavy Ion running is still under discussion.
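To see why the simulation fraction is such a strong lever on the T2 numbers, a toy estimate of sustained simulation CPU; every constant is an assumed, order-of-magnitude figure for illustration (roughly TDR-era: 2×10^9 real events/year, ~100 kSI2K·s per fully simulated event), and digitisation plus reconstruction of the simulated events would add more on top:

```python
# Toy illustration of why the simulated fraction directly scales T2 CPU.
# All constants are order-of-magnitude assumptions, not TDR quotes.
REAL_EVENTS_PER_YEAR = 2e9
SIM_CPU_KSI2K_S = 100.0          # Geant4 full simulation per event (assumed)
WALL_SECONDS_PER_YEAR = 3.15e7

def t2_sim_cpu_msi2k(sim_fraction: float) -> float:
    """Sustained CPU (MSI2K) to simulate sim_fraction x real data volume."""
    events = sim_fraction * REAL_EVENTS_PER_YEAR
    ksi2k_seconds = events * SIM_CPU_KSI2K_S
    return ksi2k_seconds / WALL_SECONDS_PER_YEAR / 1e3   # kSI2K -> MSI2K

for f in (0.2, 0.4):
    print(f"sim/real = {f:.0%}: ~{t2_sim_cpu_msi2k(f):.1f} MSI2K sustained")
# Linear in the fraction: doubling 20% -> 40% doubles the simulation CPU.
```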

Page 13: The ATLAS Computing Model


ATLAS T0 Resources

Page 14: The ATLAS Computing Model


ATLAS T1 Resources

Page 15: The ATLAS Computing Model


ATLAS T2 Resources

Page 16: The ATLAS Computing Model


Offers / T1

2008

Page 17: The ATLAS Computing Model


Tier 1 ↔ Tier 2 Bandwidth

[Figure: monthly profile, July 2007 through December 2010; vertical axis MB/s (nominal), 0-250; curves for ATLAS and ATLAS HI (heavy ion).]

The projected time profile of the nominal aggregate bandwidth expected for an average ATLAS Tier-1 and its three associated Tier-2s.

Page 18: The ATLAS Computing Model


Tier 1 ↔ CERN Bandwidth

[Figure: monthly profile, July 2007 through December 2010; vertical axis MB/s (nominal), 0-1200; curves for ATLAS and ATLAS HI.]

The projected time profile of the nominal bandwidth required between CERN and the Tier-1 cloud.
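The ~1 GB/s scale of this profile can be cross-checked from the data-flow fractions, again using the assumed nominal event sizes from earlier:

```python
# Rough cross-check of the ~1 GB/s scale in the figure above, using the
# same assumed TDR-era figures as before (200 Hz; RAW 1.6, ESD 0.5,
# AOD 0.1 MB/event). Copies leaving CERN for the T1 cloud: one distributed
# RAW copy, two ESD copies, and a full AOD copy to each of ~10 T1s.
RATE_HZ = 200
mb_per_event = 1 * 1.6 + 2 * 0.5 + 10 * 0.1   # RAW + 2xESD + 10xAOD
print(f"T0 -> T1 cloud: ~{RATE_HZ * mb_per_event:.0f} MB/s nominal")
# ~720 MB/s before catch-up headroom and efficiency factors; the right
# order of magnitude for the ~1000 MB/s plateau in the plot.
```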

Page 19: The ATLAS Computing Model


Tier 1 ↔ Tier 1 Bandwidth

[Figure: monthly profile, July 2007 through December 2010; vertical axis MB/s (nominal), 0-1000; curves for ATLAS and ATLAS HI.]

The projected time profile of the nominal bandwidth required between a Tier-1 and the other Tier-1s.

Page 20: The ATLAS Computing Model


Conclusions

• Computing Model data flow understood for placing Raw, ESD and AOD at tiered centers
  • Still need to understand data flow implications of Physics Analysis
  • How often do you need to “back navigate” AOD to ESD?
  • How distributed is Distributed Analysis?

• Some of these issues will be addressed in the upcoming (early 2006) Computing System Commissioning exercise.

• Some will only be resolved with real data in 2007-8.

Page 21: The ATLAS Computing Model


Key dates for Service Preparation

[Timeline 2005-2008: SC3 and SC4 service phases, cosmics, first beams, first physics, full physics run.]

• Sep 05 – SC3 service phase
• May 06 – SC4 service phase
• Sep 06 – initial LHC service in stable operation
• Apr 07 – LHC service commissioned

• SC3 – reliable base service – most Tier-1s, some Tier-2s – basic experiment software chain – grid data throughput 1 GB/s, including mass storage 500 MB/s (150 MB/s & 60 MB/s at Tier-1s)
• SC4 – all Tier-1s, major Tier-2s – capable of supporting the full experiment software chain incl. analysis – sustain nominal final grid data throughput (~1.5 GB/s mass storage throughput)
• LHC Service in Operation – September 2006 – ramp up to full operational capacity by April 2007 – capable of handling twice the nominal data throughput

Page 22: The ATLAS Computing Model


Data Challenges

Page 23: The ATLAS Computing Model


Statistics on LCG usage

Quite far from the ambitions; many technical problems at start-up; towards the end, 10% of LCG at the CC.

IN2P3 = Clermont + Lyon

305,410 jobs

Page 24: The ATLAS Computing Model


SC3: T0 → T1 exercise

[Figure: T0 → T1 dataflow, SC3 ATLAS.]

Page 25: The ATLAS Computing Model


Distributed Data Management (DDM)

• Centralised global catalogues (LFC):
  • Contents of datasets (1 dataset = several files)
  • Location of datasets across sites (T0-T1-T2)
  • List of dataset transfer requests, etc.

• Local catalogues (LFC)
  • Location, within the site, of the files of each dataset

• The site takes charge, through agents (VOBox), of:
  • Fetching from the central catalogue the list of datasets and associated files to transfer
  • Managing the transfer
  • Registering the information in the local and central catalogues
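The division of labour above (central catalogues, local catalogues, site agents) amounts to a small subscription loop. A toy sketch of that flow, with invented class and dataset names (not the actual ATLAS DDM or LFC interfaces):

```python
# Minimal sketch of the subscription-driven flow described above: a site
# agent (VOBox) polls the central catalogue, transfers files, then
# registers them locally and centrally. Illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class CentralCatalogue:
    datasets: dict[str, list[str]]       # dataset name -> file names
    subscriptions: dict[str, set[str]]   # site -> subscribed datasets
    replicas: set[tuple[str, str]] = field(default_factory=set)

    def pending(self, site: str) -> list[tuple[str, str]]:
        """Files subscribed by `site` that it does not yet hold."""
        return [(ds, f) for ds in self.subscriptions.get(site, set())
                for f in self.datasets[ds] if (site, f) not in self.replicas]

@dataclass
class SiteAgent:
    site: str
    central: CentralCatalogue
    local_catalogue: set[str] = field(default_factory=set)

    def run_once(self) -> None:
        for ds, f in self.central.pending(self.site):
            # ... the actual transfer (e.g. via FTS) would happen here ...
            self.local_catalogue.add(f)                 # local registration
            self.central.replicas.add((self.site, f))   # central registration

central = CentralCatalogue(
    datasets={"csc.evgen.0001": ["f1.pool", "f2.pool"]},
    subscriptions={"IN2P3-CC": {"csc.evgen.0001"}},
)
agent = SiteAgent("IN2P3-CC", central)
agent.run_once()
print(sorted(agent.local_catalogue))  # ['f1.pool', 'f2.pool']
```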

Page 26: The ATLAS Computing Model


DDM at the CC

• 1 dCache server
  • 2 pools of 1-2 TB each + tape drive: 40 MB/s

• Transfers via FTS (GEANT, 1 Gb/s)
  • When do we get 10 Gb/s?

• VOBox, LCG mode, CC-IN2P3 flavour
  • Linux SL3 machine
  • LFC catalogue
  • FTS server (for future T1-T1 or T1-T2 transfers)
  • Grid certificates
  • gsissh access
  • DDM software installed
  • No root access needed
  • Can be installed from CERN

Problem: cron is not managed per grid user.

Page 27: The ATLAS Computing Model


Service challenges

Page 28: The ATLAS Computing Model


Integrated TB transferred

Page 29: The ATLAS Computing Model


Backup Slides

Page 30: The ATLAS Computing Model


From LATB…

Page 31: The ATLAS Computing Model


Heavy Ion Running

Page 32: The ATLAS Computing Model


Preliminary Tier-1 Resource Planning

Capacity at all Tier-1s in 2008

Tier-1 Planning for 2008      ALICE   ATLAS    CMS     LHCb    SUM 2008

CPU (MSI2K)
  Planned capacity              6.2    22.6    11.6     4.4     44.8
  Required                     12.3    24.0    15.2     4.4     55.9
  Balance                      -49%     -6%    -24%     -0%    -20%

Disk (PBytes)
  Offered                       2.7    12.6     5.5     2.2     23
  Required                      7.4    14.4     7.0     2.4     31.2
  Balance                      -64%    -12%    -21%    -10%    -26%

Tape (PBytes)
  Planned capacity              3.1     9.2     8.1     1.9     22.3
  Required                      6.9     9.0    16.7     2.1     34.7
  Balance                      -55%      2%    -51%     -9%    -36%

Includes current planning for all Tier-1 centres.
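The Balance rows are simply supply relative to requirement. A quick check against the ATLAS column:

```python
# The Balance rows above are just (supply / required) - 1.
rows = {  # (supply, required) per resource, ATLAS column
    "CPU (MSI2K)":   (22.6, 24.0),
    "Disk (PBytes)": (12.6, 14.4),
    "Tape (PBytes)": (9.2, 9.0),
}
for name, (supply, required) in rows.items():
    print(f"{name}: {supply / required - 1:+.0%}")
# CPU -6%, Disk -12%, Tape +2%, matching the table.
```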

Page 33: The ATLAS Computing Model


Preliminary Tier-2 Resource Planning

Capacity at all Tier-2s in 2008

• Includes resource planning from 27 centres/federations

• 11 known Tier-2 federations have not yet provided data
  • These include potentially significant resources: USA CMS, Canada East+West, MPI Munich …

Tier-2 Planning for 2008      ALICE   ATLAS    CMS     LHCb     SUM 2008

CPU (MSI2K)
  Planned capacity              5.0    19.2    10.3     4.4      38.9
  Required                     14.4    19.9    19.3     7.7      61.3
  Balance                      -65%     -4%    -46%    -42%     -36%

Disk (PBytes)
  Offered                       1.4     5.7     3.1     0.8      11
  Required                      5.1     8.7     4.9     0.023    18.723
  Balance                      -72%    -34%    -36%    n/a      -41%

Tier-2 federations included (expected): ALICE 13 (13), ATLAS 20 (29), CMS 17 (19), LHCb 11 (12), total 27 (38)

Page 34: The ATLAS Computing Model


‘Rome’ production: French sites

Page 35: The ATLAS Computing Model


‘Rome’ production: Italian sites

Page 36: The ATLAS Computing Model


Distributed Data Management (DDM)

• Avoid using a completely centralised flat catalogue

• Hierarchical organisation of the data catalogues
  • Dataset = collection of files
  • Datablock = collection of datasets
  • Datasets are tagged (versioned) to maintain complete consistency

• Information about each physical file is stored locally

• Data movements are done per dataset

• Movement to a site is triggered by a subscription, through a client program
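A minimal sketch of this hierarchy (files grouped into datasets, datasets into datablocks, with dataset-level version tags); class and field names are illustrative assumptions, not the real DDM schema:

```python
# Sketch of the hierarchical catalogue above: files -> datasets ->
# datablocks, with dataset versioning. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Dataset:
    name: str
    version: int                 # tag: bump on content change, never mutate
    files: tuple[str, ...]       # logical file names; physical locations
                                 # are resolved by each site's local catalogue

@dataclass
class Datablock:
    name: str
    datasets: list[Dataset] = field(default_factory=list)

    def latest(self, name: str) -> Dataset:
        """Highest version of a dataset; consistency comes from the tags."""
        return max((d for d in self.datasets if d.name == name),
                   key=lambda d: d.version)

block = Datablock("csc.0001")
block.datasets.append(Dataset("csc.0001.evgen", 1, ("f1.pool",)))
block.datasets.append(Dataset("csc.0001.evgen", 2, ("f1.pool", "f2.pool")))
print(block.latest("csc.0001.evgen").version)  # 2
```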