Corrupted MC data chunks

Offline weekly, July 7, 2012



Page 1: Corrupted MC data chunks

Corrupted MC data chunks

Offline weekly, July 7, 2012

Page 2: Corrupted MC data chunks

The issue

• As reported by PWG-LF, numerous sub-jobs from the LHC11b10a MC production have no global tracks (back-propagated ITS tracks)
• This causes a drop in matching efficiency and incorrect normalization factors
• In the above production, the effect is 3.5% (+1.2%)
• Full report in Savannah
• The effect is only in MC

Page 3: Corrupted MC data chunks

Forensics

• If a file (Trigger.root) is not created during the simulation phase, the string of detectors in the trigger cluster is left empty and all ITS layers are skipped (no ITS tracks)
• The error generates only a warning in the reconstruction:

W-AliReconstruction::GetEventInfo: No trigger can be loaded! The trigger information will not be used!

• The conditions for this always arise in the late part of the simulation, usually, but not always, during digitisation

Page 4: Corrupted MC data chunks

Forensics (2)

• Two ‘events’ have been discovered so far
  • AliRoot aborts during a failed access to OCDB (biggest contributor)
  • Silent crash, no specific error
• The AliRoot abort generates an ‘Abort’ signal, which should have been printed in sim.log (redirected from the standard error stream)
  • However, in some cases it does not appear…
  • … and subsequently it is not caught by the job validation script
• The silent crash is not caught by any of the ‘per job’ validations

Page 5: Corrupted MC data chunks

Forensics (3)

• The defective jobs are not caught by
  • The validation script – parses only *.log, not stderr/stdout
  • The per job CheckESD macro – successful also in the ‘corrupted’ case
  • The per run QA – there is a ‘hint’, but it is diluted because the error is at the ~4% level
  • … In addition, the mean vertex cut eliminates the events

Page 6: Corrupted MC data chunks

Re-validation of the productions

• Fast and indirect method – size of the sim.log

[Figure: sim.log size; panels: Good production; LHC11b10a (bad chunks, 4.9%)]
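The log-size scan can be sketched as follows. This is a minimal illustration, not the script used for the re-validation: it assumes one sub-directory per chunk containing a `sim.log`, and it assumes that a defective job leaves a missing or unusually short log. The function names and the `min_bytes` threshold are hypothetical and would have to be tuned per production from the observed size distribution.

```python
from pathlib import Path


def find_small_sim_logs(production_dir, min_bytes):
    """Return names of chunk directories whose sim.log is missing or
    smaller than min_bytes (hypothetical, production-specific threshold)."""
    suspicious = []
    for chunk in sorted(Path(production_dir).iterdir()):
        if not chunk.is_dir():
            continue
        log = chunk / "sim.log"
        # Assumption: a crashed job leaves no log or a truncated one.
        if not log.is_file() or log.stat().st_size < min_bytes:
            suspicious.append(chunk.name)
    return suspicious


def bad_fraction_percent(n_bad, n_total):
    """Fraction of suspicious chunks, in percent."""
    return 100.0 * n_bad / n_total if n_total else 0.0
```

A size cut is only an indirect indicator, so chunks flagged this way are candidates to be confirmed by a direct check of the logs.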

Page 7: Corrupted MC data chunks

Re-validation of the productions (2)

• Other cases and Pb+Pb

[Figure panels: LHC11b10c – not straightforward; PbPb, OK period]

Page 8: Corrupted MC data chunks

‘Suspicious’ cycles

• Tested all 2010 (149 cycles), 2011 (104 cycles), 2012 (62 cycles)

Production cycle   Type                     Effect from log size scan [%]    Effect from stderr scan (Abort) [%]
LHC10a13           pp (early physics)       0.07
LHC10b4            pp (early physics)       0.03
LHC10b5            pp (early physics)       0.61
LHC10d5            pp (beauty)              0.94
LHC11a2h-j         pp (jet-jet)             2.0
LHC11a6a-b         pp (Xi, Omega)           12.0
LHC11b2            pp (rare charm)          1.9
LHC11b5-6          pp (Pythia, Phojet)      6.9
LHC11b10a          pp 2.76 TeV MB           4.9                              3.7
LHC11b10b          pp 2.76 HF               4.4                              12.5
LHC11b10c          pp 2.76 Flat             12.6                             14.4
LHC11b12a          pp 2.76 Pythia           0.9
LHC11b12b          pp 2.76 Phojet           3.6
LHC11c2a-b         pp PMD special           1.75
LHC12a2            pp, diffract.            0.65
LHC12a9            pp, 2.76 HF-e            41% (?) – two equal distr.
LHC12a13b          pp, 2.76, Omega          71% (?) – two equal distr.

Page 9: Corrupted MC data chunks

Past productions remedy

• For the cycles in the above table, scan rec.log for ‘W-AliReconstruction::GetEventInfo: No trigger…’ to positively identify affected chunks
  • Ongoing…
• Rename the ESDs and AODs in the catalogue to ‘something else’, which will not show up in the standard analysis searches
  • Mild danger for analyses that use ‘prepared’ collections – jobs will fail…
  • Merged AODs (deltas) will have to be re-merged
• For Pb+Pb, a cut on ‘zero ITS tracks’ will eliminate the bad chunks
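The rec.log scan described in the first bullet could look roughly like this. It is a sketch under stated assumptions: the layout of one sub-directory per chunk with its own `rec.log` is assumed, and the function names are hypothetical. The search string is the start of the warning quoted on the Forensics slide.

```python
from pathlib import Path

# Start of the warning printed by the reconstruction when Trigger.root is missing.
TRIGGER_WARNING = "W-AliReconstruction::GetEventInfo: No trigger"


def has_trigger_warning(rec_log):
    """True if the given rec.log contains the 'No trigger' warning."""
    with open(rec_log, "r", errors="replace") as f:
        return any(TRIGGER_WARNING in line for line in f)


def affected_chunks(production_dir):
    """Names of chunk directories (assumed layout: <production>/<chunk>/rec.log)
    whose rec.log contains the warning."""
    affected = []
    for rec_log in sorted(Path(production_dir).glob("*/rec.log")):
        if has_trigger_warning(rec_log):
            affected.append(rec_log.parent.name)
    return affected
```

Unlike the log-size scan, this check looks for the warning itself, so a hit identifies an affected chunk directly. The resulting list is what the renaming step in the catalogue would operate on.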

Page 10: Corrupted MC data chunks

Code fixes

• Job validation – scan all files (implemented)
• Per job ‘checkESD’ macro – strengthen the script, positive feedback to validate the job
• QA – to be discussed
• Reconstruction logic – abort in case the Trigger.root file is not found
• Follow-up by Offline, discussion in the weekly meetings
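As an illustration of the first and fourth fixes, a per-job check might combine a positive test for Trigger.root with a scan of the captured output streams in addition to the *.log files. This is a hedged sketch, not the implemented validation script: the file names `stdout` and `stderr`, the marker string, and the function name are assumptions, and the slides propose the Trigger.root check inside the reconstruction itself rather than in the validation.

```python
from pathlib import Path

# Assumption: the abort shows up as the word 'Abort' in one of the scanned files.
ERROR_MARKER = "Abort"


def validate_job(job_dir):
    """Return a list of problems found in job_dir; an empty list means pass."""
    job = Path(job_dir)
    problems = []

    # Positive check: the simulation must have produced Trigger.root.
    if not (job / "Trigger.root").is_file():
        problems.append("Trigger.root missing")

    # Scan the logs and the captured streams, not only *.log.
    # 'stdout' and 'stderr' are hypothetical file names for the job output.
    candidates = sorted(job.glob("*.log")) + [job / "stdout", job / "stderr"]
    for path in candidates:
        if not path.is_file():
            continue
        with open(path, "r", errors="replace") as f:
            if any(ERROR_MARKER in line for line in f):
                problems.append(f"{path.name}: contains '{ERROR_MARKER}'")
    return problems
```

Returning the list of problems rather than a boolean keeps the reason for a rejection visible, which helps when the same failure can surface in different files.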