v5-01-Release & v5-02-Release
Peter Hristov, 13/02/2012
Changes: v5-01-Rev-22
• #90817 Please commit PHOS trigger part in the AliAnalysisTaskESDfilter.cxx. From rev. 54439
• #90870 Request to port ZDC code to the release. From rev. 54133
• #90916 Request: porting to v5-01-Release of the new ESD->AOD filter. From rev. 52540, 53853, 54188, 54210
• #91005 Fix in AliCTPRawStream.cxx. From rev. 54234
• #91126: Request to commit a patch for AMPT. From rev. 54424
• #91159: Vertex generation in AliGenCorrHF. From rev. 54417
• #91030: 2.76 TeV LHC11a pass3 dca problem. From rev. 54304
Changes: v5-01-Rev-22
• #88827: Request for porting updates to TOF QA task into release. From rev. 54242
• #90320: Request to port additional consistency checks in HLT TPC cluster decoding to v5-01-Release. From rev. 54055, 54056
• #90546: Filtering crashes when processing 2010 MC data. From rev. 54115
• #90738 Request to port a fix to the release in AliZDCDigitizer. From rev. 54035
• #90749 ESD Porting Request: GetTPCClusterInfo with additional switch. From rev. 54081
• #90812 Porting overlap fixes to the release. From rev. 54092
v5-01-Rev-23
• #91061: AOD production very slow. From rev. 54460
Requests: v5-01-Release
• #24646 Re-produce AODs for cascades in pass2 PbPb 2010 (data & MC). Change in the configuration of vertexingHF from rev. 54561
v5-02-Release
• Branch v5-02-Release is created on 03/02
• All tests OK (Root v5-30-00-patches)
• Tests with Root v5-32-00-patches: ongoing
• Several modifications to be ported
• Issues
– Creation of AliMDC RPM with shared libraries
– Creation of PAR files
– Merging of raw tags
Old slides
Other reports
• #90615 Problems in the material budget, eta<0.9 and 0.9<eta<1.4
• #90944 adding alignment objects for the additional supermodules in the geometry setup 2012
• #90939 Request to include (anti)hypertriton in ALICE GEANT3
Requests/Additional fixes
• #90625 Memory problem in AliTPCtrackerMI
• #90622 Logic flaw in AliTPCseed. From rev. 53997
• #90616 Worrying message from TPC reconstruction. From rev. 54237
• Changes in RAW (TClonesArray usage)
v5-02-Release
• Coverity: 129 defects to be fixed
• AliRoot tests: mostly OK
• Root v5-32-00-patches: needs tests
• PWGs transition: PWG0 and PWG2 still ongoing
• One library per subdirectory: not yet ready
• Savannah bug reports: many old bugs are still open
GDB on Grid
• Some potential problems detected and fixed (ITS, TPC, HLT)
• Some jobs fail in the beginning (events 0-10), ~4%
– Not reproducible locally, even if we run many reconstruction jobs in parallel
– Always caused by std::bad_alloc in different places
• Other jobs are killed by the system (memory), ~20%
Changes: v5-01-Rev-21
• #90324: Exception in AliITStrackerMI::FollowProlongationTree. From rev. 53978
• #90549: Request to port r53948 to the release (MUON small leak fix)
• #90658: For v5-01: Option to isolate heavy flavor part of a Pythia event. From rev. 53959
• #84578: Request to extend AliGenBox for using Yrange. From rev. 53996
• Optional RB/PX 24 shielding and scoring. From rev. 53955, 53956
Changes: v5-01-Rev-21
• #90461: Request to port a new feature for ZDC to the release. From rev. 53705
• #90504: EVE muon_init.C update r53875
• #25142: Commit and porting to Release of the new ESD->AOD filter. From rev. 54021
• #90540: Port 53910, 53911 and 53912 to the Release (Full MC Header in the AOD)
Changes: OCDB
• #90756 Request to port object in RAW OCDB (for realistic MUON simulations)
• #90736 Calibration of the TRD cosmics of May, June and August
Reconstruction of RAW (LHC11h)
• Back trace problem solved
• Clean-up of the PATH and LD_LIBRARY_PATH on the GRID
• Clean-up of the AliEn libraries
• Deterministic splitting of the failed jobs (in preparation)
• New tests in parallel with the Grid production
Changes: v5-01-Rev-20
• #90319: Segmentation violation in AliPHOSRawFitterv1::~AliPHOSRawFitterv1. From rev. 53869
• #90053: Request: Port bug fix TRD calibration code to release. From rev. 53734
• #90292: Add line ConvertZDC() in AliAnalysisTaskESDfilter::ConvertESDtoAOD(). From rev. 53895
• #90307: ZDC QA update. From rev. 52738, 53081, 53271
• #90309: ZDC request to port code to the release. From rev. 52616
• #90024: port changes in PYTHIA6 for pyquen production (pyquen-1.5.F, CMakelib6.4.21.pkg updated), rev. 53645
• #90359: Request: fix cached values in ESD. From rev. 53900
• #90013: Vertexing task crashing in trunk. From rev. 53793
• Additional protection. From rev. 53904
LHC11h Pass2 – reconstruction details
• Use v5-01-Rev-19 in the production
• Start in inverse time order (last runs first, “LIFO”): OK
• Use MB trigger for CPass0: OK
• Exercise the full production setup on runs from “grey area”: special “gdb” production, run 170593: OK
• Run with TPC pools: OK
• Work on a local raw file: OK
• Use OCDB snapshot: OK
• Keep only the rec. points for the current event: OK
• Switch off QA: OK
• Switch off MUON, if the memory consumption is still too high
Results
• CPass0: 185 jobs, 523,509 out of 539,890 raw files successfully reconstructed => 97% efficiency
• All runs with mag. field configuration (+ +) ready (170593-169628)
• Details on losses follow
• Pass2 current status: 131 jobs, 225,568 out of 362,790 files successfully reconstructed => 62.2% efficiency
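The efficiencies quoted above follow directly from the file counts. A minimal Python sketch of that arithmetic is given here; the function name `efficiency` is illustrative only and is not part of AliRoot or AliEn.

```python
def efficiency(reconstructed, total):
    """Return the percentage of raw files that were successfully reconstructed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * reconstructed / total


# File counts taken from the Results slide.
cpass0 = efficiency(523_509, 539_890)
pass2 = efficiency(225_568, 362_790)

print(f"CPass0: {cpass0:.1f}%")
print(f"Pass2:  {pass2:.1f}%")
```

To one decimal place this gives about 97.0% for CPass0 and 62.2% for Pass2, matching the rounded figures on the slide.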
Losses – Pass2
• G_exception – average 6.5%
[Figure: G_exception (%) per run, runs 169835 to 170593; axis range 0% to 35%]
Strong run dependency
Losses – Pass2 (2)
• Memory overrun – average 16.8%
[Figure: Memory overrun (%) per run, runs 169835 to 170593; axis range 0% to 40%]
Strong run dependency
Function of number of events/chunk and data taking configuration
Losses
• G_exception
– Debugging is hard as there is no traceback
– Seems to be random (from syswatch.log)
– Irreproducible in local tests
– No related issues shown by Valgrind
– Appears in the first events of the chunks
– Working with ROOT experts, at least to get the exception in the logs => special “gdb” run
• Memory overrun
– Additional profiling ongoing
– All external sources are ruled out; a gain is possible only through changes in reconstruction
Special “gdb” run
• “catch throw” mode
• Several problems discovered, to be submitted to Savannah. Most probably uninitialized memory is used as an index in an array
– TClonesArray new with placement, where the index comes from GetEntriesFast
– corrupted (?) raw data
– deletion of arrays
Plans
• Continue the investigation of G__exception on the GRID
• Understand the difference between CPass0 and Pass2 (MB trigger, V0s, cascades?)
• Try to reproduce completely the GRID execution flow on a local machine
• Resubmit the failed jobs in “split” mode
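One way to picture the “split” resubmission listed above is sketched here in Python. It assumes that a failed job is defined by a list of input raw files and that “split” means dividing that list into smaller sub-jobs in a reproducible way. The function name `split_failed_job`, the file names, and the choice to sort before splitting are all hypothetical; they do not describe the actual AliEn implementation.

```python
def split_failed_job(input_files, n_parts):
    """Split a job's input files into at most n_parts contiguous sub-jobs.

    The files are sorted first, so the same set of inputs always yields
    the same sub-jobs regardless of the order in which they are listed.
    (Hypothetical scheme, for illustration only.)
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    files = sorted(input_files)
    size, extra = divmod(len(files), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts receive one additional file each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty parts when there are few files
            parts.append(files[start:end])
        start = end
    return parts


# Hypothetical chunk names; the order of the input list does not matter.
failed = ["chunk_c.root", "chunk_a.root", "chunk_e.root",
          "chunk_b.root", "chunk_d.root"]
for sub_job in split_failed_job(failed, 2):
    print(sub_job)
```

Because the split depends only on the set of input files, a sub-job that fails again can be split further and always covers the same files, which narrows down the problematic chunk.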
v5-02-Release
• Complete the transition of the analysis code to the new modules
• Move every library to a sub-directory and get rid of *.pkg (native CMake)
• Fix the Coverity defects and compilation warnings
• Solve as many Savannah issues as possible
• Create the branch at the end of January
• First stable tag in February