lhcb 2009-q4 report

11
LHCb 2009-Q4 report

Upload: margot

Post on 15-Jan-2016

31 views

Category:

Documents


0 download

DESCRIPTION

LHCb 2009-Q4 report. Activities in 2009-Q4. Core Software Stable versions of Gaudi and LCG-AA Applications Stable as of September for real data Fast minor releases to cope with reality of life… Monte-Carlo Some MC09 channel simulation Few events in foreseen 2009 configuration - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: LHCb 2009-Q4 report

LHCb 2009-Q4report

Page 2: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 2

Activities in 2009-Q4

m Core Softwareo Stable versions of Gaudi and LCG-AA

m Applicationso Stable as of September for real datao Fast minor releases to cope with reality of life…

m Monte-Carloo Some MC09 channel simulationo Few events in foreseen 2009 configurationo Minimum bias MC09 stripping

m Real data reconstructiono As of November 20th

Page 3: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 3

Jobs in 2009-Q4

Page 4: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 4

First experience with real data

m Very low crossing rateo Maximum 8 bunches colliding (88 kHz crossing)o Very low luminosityo Minimum bias trigger rate: from 0.1 to 10 Hzo Data taken with single beam and with collisions

Page 5: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 5

Real data processing

m Iterative processo Small changes in reconstruction applicationo Improved alignmento In total 5 sets of processing conditions

P Only last files were all processed twicem Processing submission

o Automatic job creation and submission after:P File is successfully migrated in CastorP File is successfully replicated at Tier1

o If job fails for a reason other than application crashP The file is reset as “to be processed”P New job is created / submitted

o Processing more efficient at CERN (see later)P Eventually after few trials at Tier1, the file is processed

at CERNo No stripping ;-)

P DST files distributed to all Tier1s for analysis

Page 6: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 6

Reconstruction jobs

Page 7: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 7

Issues with real data

m Castor migrationo Very low rate: had to change the migration algorithm

for more frequent migrationm Issue with large files (above 2 GB)

o Real data files are not ROOT files but open by ROOTo There was an issue with a compatibility library for

slc4-32 bit on slc5 nodesP Fixed within a day

m Wrong magnetic field signo Due to different coordinate systems for LHCb and

LHC ;-)o Fixed within hours

m Data access problem (by protocol, directly from server)o Still dCache issue at IN2P3 and NIKHEF

P dCache experts working on ito Moved to copy mode paradigm for reconstructiono Still a problem for user jobs

P Sites have been banned for analysis

Page 8: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 8

Transfers and job latency

m No problem observed during file transferso Files randomly distributed to Tier1o Will move to distribution by runs (few 100’s files)o For 2009, runs were not longer than 4-5 files!

m Very good Grid latencyo Time between submission and jobs starting running

Page 9: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 9

Brief digression on LHCb position w.r.t. MUPJs

m So-called Multi-User Pilot Jobs (MUPJ) are used by DIRAC on all sites that accept role=Piloto They are just regular jobs!

m MUPJs match any Dirac job in the central queueo Production or User analysis (single queue)o Each PJ can execute sequentially up to 5 jobs

P If remaining capabilities allow (e.g. CPU time left)P MUPJ has 5 tokens for matching jobs

o role=Pilot proxy can only retrieve jobs, limited to 5m A limited user proxy is used for DM operations of

the payloado Cannot be used for job submission

m Proxies can be hidden when not neededm DIRAC is instrumented for using gLexec (in any

mode)m First experience

o Problems are not with gLexec but with SCAS configuration

Page 10: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 10

MUPJs (cont’d)

m LHCb is not willing to loose efficiency due to the introduction of badly configured gLexeco Yet another point of failure!o Cannot afford testing individually all sites at once

m This topic has been lasting over 3 years nowo Where is the emergency? Why did it take so long if

so important?m Propose to reconsider pragmatically the policy

o They were defined when the frameworks had not been evaluated, and VOs had to swallow the bullet

o Re-assessing the risks was not really done in the TF (yet)

o Questionnaire leaves decisions to sitesP We got no message that sites are unhappy with current

situationo MUPJs are just jobs for which their owner is

responsibleP Move responsibility to MUPJ owner (“the VO”)P Two tier trust relation ( Sites / VO / User )P Apply a posteriori control and not a priori mistrust

o VOs should assess the risk on their side

Page 11: LHCb 2009-Q4 report

2009 Q

4 r

ep

ort

LHCb 2009-Q4 report, PhC 11

Conclusions

m Concentrating on real datao Very few data (200 GB)!o Very important learning exerciseo A few improvements identified for the 2010 running

P Run distribution (rather than files)P Conditions DB synchronization check

d Make sure Online Conditions are up-to-date

m Still some MC productionso With feedback from first real data

P E.g. final position of the VeLo (15 mm from beam)

m Analysiso First analysis of 2009 data made on the Grido Foresee a stripping phase for V0 physics publications

m LHCb definitely wants to continue using MUPJs!