TRANSCRIPT
John Marshall, 1
John Marshall, University of Cambridge
LCD WG6 Meeting, April 18 2011
Reconstruction Efficiency
Overview
Reconstruction of events with overlaid background is a challenge for our reconstruction software.
Even functions that are intrinsically efficient cannot help but be affected by the huge increases in combinations of tracks and calorimeter hits.
CPU time is clearly important, so efforts have been made to specifically address the problems of overlaid γγ→hadrons background, with default NumberBackground=3.2.
André has been focusing on improving the performance of processors in the MarlinReco library, whilst I have examined PandoraPFANew.
There are some differences between the CPU times we report, but we are now satisfied that these are due to machine specifications rather than build configurations, etc.
André has been able to access the Intel VTune Amplifier XE 2011 package, which provides an impressive amount of profiling information. It can report the actual CPU time required by each function and can even provide a line-by-line breakdown of CPU time.
Start with a reminder of performance at the time of the previous meeting...
Status at last meeting
Processor Name          Seconds per event (10 event sample), 01/04/2011*
MarlinPandora           126.353
FullLDCTracking         35.552
LCIOOutputProcessor     7.964
LEPTrackingProcessor    7.039
SiliconTrackingCLIC     3.579
TPCDigiProcessor        2.107
KinkFinder              0.325
V0Finder                0.244
RecoMCTruthLinker       0.197
ILDCaloDigi             0.097
Total                   183.578
Division of total CPU time between Marlin processors for ten 91 GeV Z->uds events with overlaid γγ→hadrons background and NumberBackground=3.2.
Without background, total CPU time is just 0.565s per event.
* MarlinReco revision 2151, PandoraPFANew revision 1100
Costly functions
Function                                                CPU Time 01/04/2011
ClusterHelper::GetTrackClusterDistance                  42.839s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070s
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070s
CartesianVector::GetZ                                   16.702s
ClusterHelper::GetDistanceToClosestHit                  15.620s
ClusterContact::HitDistanceComparison                   11.077s
Cluster::GetCentroid                                    8.950s
CaloHitHelper::GetDensityWeightContribution             6.739s
operator- (CartesianVector)                             6.562s
ConeClusteringAlgorithm::FindHitsInSameLayer            5.399s
TrackClusterAssociationAlgorithm::Run                   4.279s
CaloHitHelper::IsolationCountNearbyHits                 4.139s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.870s
ClusterHelper::GetDistanceToClosestCentroid             3.661s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.330s
FragmentRemovalHelper::GetClusterContactDetails         2.591s
ClusterHelper::GetTrackClusterDistance                  2.461s
CartesianVector::GetUnitVector                          2.362s
TestPandora application used with input Pandora binary files to perform standalone Pandora reconstruction and concentrate purely on PandoraPFANew. MarlinPandora not considered.
Reduce function calls
Most costly function is GetTrackClusterDistance, used to help identify track-cluster associations. With background, this function is called for many track-cluster combinations.
For each combination, examine hits in first n cluster layers to find closest perpendicular distance between a straight line (defined by track state at calorimeter) and a hit in the cluster.
After basic C++ optimization, difficult to further reduce CPU time without changing function behaviour. Instead, try to avoid comparison of tracks and clusters with very different “expected directions”.
Similar cuts implemented in cone-based clustering algorithms, SoftClusterMerging, IsolatedHitMerging and FragmentRemoval algorithms. Potentially dangerous, but cut values are configurable and the default cut, cos(angle)>0, should be safe. Validation crucial.
[Figure: track direction and parallel distance region; find smallest perpendicular distance to hit within parallel distance region.]
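The direction pre-selection and the distance search described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the PandoraPFANew implementation: Vec3, PassesDirectionCut and GetClosestPerpendicularDistance are hypothetical names, the track direction is assumed to be a unit vector, and the cluster's “expected direction” is simply passed in as a vector.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical minimal 3-vector, standing in for a full vector class.
struct Vec3
{
    float x, y, z;
};

inline Vec3 Subtract(const Vec3 &a, const Vec3 &b)
{
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

inline float Dot(const Vec3 &a, const Vec3 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Cheap pre-selection: true if cos(angle between a and b) > minCosAngle.
// With the default cut of 0 this reduces to a test on the sign of the dot product.
inline bool PassesDirectionCut(const Vec3 &a, const Vec3 &b, const float minCosAngle = 0.f)
{
    return Dot(a, b) > minCosAngle * std::sqrt(Dot(a, a) * Dot(b, b));
}

// Smallest perpendicular distance between the straight line (trackPosition, unit trackDirection)
// and any hit whose projection onto the line lies within [0, maxParallelDistance].
// Returns false, leaving 'distance' untouched, if no hit lies in that region.
bool GetClosestPerpendicularDistance(const Vec3 &trackPosition, const Vec3 &trackDirection,
    const std::vector<Vec3> &hitPositions, const float maxParallelDistance, float &distance)
{
    float minDistanceSquared = std::numeric_limits<float>::max();
    bool found = false;

    for (const Vec3 &hit : hitPositions)
    {
        const Vec3 difference = Subtract(hit, trackPosition);
        const float parallelDistance = Dot(difference, trackDirection);

        // Only consider hits inside the parallel distance region.
        if ((parallelDistance < 0.f) || (parallelDistance > maxParallelDistance))
            continue;

        // |direction x difference| is the perpendicular distance for a unit direction.
        const Vec3 cross = Cross(trackDirection, difference);
        const float distanceSquared = Dot(cross, cross);

        if (distanceSquared < minDistanceSquared)
        {
            minDistanceSquared = distanceSquared;
            found = true;
        }
    }

    if (found)
        distance = std::sqrt(minDistanceSquared); // single square root, taken at the end

    return found;
}
```

A caller would evaluate PassesDirectionCut first and only run the hit loop for combinations that pass. For a track at the origin pointing along z and a parallel region of 10 units, a hit at (3, 4, 5) gives a perpendicular distance of 5.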
Change approach
Another costly function is used by the IsolatedHitMerging algorithm, which matches isolated hits to nearby clusters, based on the distance to the nearest hit in the cluster.
This algorithm is not unimportant, but is still a rather small part of Pandora reconstruction. That it is one of the most time-consuming processes justifies a change in approach.
Isolated hits are now matched to clusters based upon distances to the nearest layer centroid position. This allows the nested loop over hits in each layer to be avoided.
Small change to behaviour, but not obvious if any better/worse. Again, validation is crucial.
[Figure: get distance to nearest layer centroid, compared with get distance to nearest hit.]
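The difference between the two matching strategies can be sketched as below. This is an illustration only, under assumptions that are not taken from the slides: Position, LayerToHitsMap and the three functions are hypothetical names, the cluster is modelled as a map from layer number to hit positions, and the layer centroid is taken to be the unweighted mean of the hit positions in that layer.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <vector>

// Hypothetical stand-ins: a hit position and a cluster stored as layer -> hit positions.
struct Position
{
    float x, y, z;
};

typedef std::map<unsigned int, std::vector<Position> > LayerToHitsMap;

inline float DistanceSquared(const Position &a, const Position &b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Previous approach: nested loop over every hit in every layer of the cluster.
// For an empty cluster this returns a very large value.
float GetDistanceToNearestHit(const Position &isolatedHit, const LayerToHitsMap &cluster)
{
    float minDistanceSquared = std::numeric_limits<float>::max();

    for (const auto &layerEntry : cluster)
    {
        for (const Position &hit : layerEntry.second)
            minDistanceSquared = std::min(minDistanceSquared, DistanceSquared(isolatedHit, hit));
    }

    return std::sqrt(minDistanceSquared);
}

// One centroid per non-empty layer (assumed here: unweighted mean of the hit positions).
// Computed once per cluster, then reused for every isolated hit.
std::vector<Position> GetLayerCentroids(const LayerToHitsMap &cluster)
{
    std::vector<Position> centroids;

    for (const auto &layerEntry : cluster)
    {
        const std::vector<Position> &hits = layerEntry.second;
        if (hits.empty())
            continue;

        Position sum = {0.f, 0.f, 0.f};
        for (const Position &hit : hits)
        {
            sum.x += hit.x;
            sum.y += hit.y;
            sum.z += hit.z;
        }

        const float nHits = static_cast<float>(hits.size());
        centroids.push_back(Position{sum.x / nHits, sum.y / nHits, sum.z / nHits});
    }

    return centroids;
}

// New approach: a single loop over layer centroids, with no inner loop over hits.
float GetDistanceToNearestCentroid(const Position &isolatedHit, const std::vector<Position> &centroids)
{
    float minDistanceSquared = std::numeric_limits<float>::max();

    for (const Position &centroid : centroids)
        minDistanceSquared = std::min(minDistanceSquared, DistanceSquared(isolatedHit, centroid));

    return std::sqrt(minDistanceSquared);
}
```

The two functions generally return different values, which is why the change in behaviour needs validation. The cost per isolated hit now scales with the number of occupied layers rather than the number of hits.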
Change approach
The CartesianVector class is crucial to Pandora reconstruction and is used extensively throughout all algorithms. Even small efficiency improvements to this class can help.
Previously, this class offered a default constructor:
inline CartesianVector::CartesianVector() : m_x(0.f), m_y(0.f), m_z(0.f), m_isInitialized(false){}
This meant that each instance needed an initialization flag, set to true only when explicit component values were assigned. The flag needed to be checked in most member functions.
Removal of the default constructor means that the fully qualified constructor must be used:
inline CartesianVector::CartesianVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z){}
No longer any need for the initialization flag, removing checks from many important functions:
GetDotProduct, GetCrossProduct, GetMagnitude, GetOpeningAngle, GetX, GetY, GetZ, ...
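The effect of dropping the initialization flag can be illustrated with two toy classes. FlaggedVector and LeanVector are hypothetical names for this sketch, and reporting an uninitialized vector by throwing std::logic_error is an assumption; the slides do not say how the original class handled that case.

```cpp
#include <stdexcept>

// Before: default-constructible vector that must track whether it holds real values.
class FlaggedVector
{
public:
    FlaggedVector() : m_x(0.f), m_y(0.f), m_z(0.f), m_isInitialized(false) {}
    FlaggedVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z), m_isInitialized(true) {}

    float GetX() const
    {
        // This kind of check is repeated in every accessor and arithmetic function.
        if (!m_isInitialized)
            throw std::logic_error("FlaggedVector not initialized");

        return m_x;
    }

    float GetDotProduct(const FlaggedVector &rhs) const
    {
        if (!m_isInitialized || !rhs.m_isInitialized)
            throw std::logic_error("FlaggedVector not initialized");

        return m_x * rhs.m_x + m_y * rhs.m_y + m_z * rhs.m_z;
    }

private:
    float m_x, m_y, m_z;
    bool m_isInitialized;
};

// After: every instance is fully specified at construction, so no flag and no checks.
class LeanVector
{
public:
    LeanVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}

    float GetX() const { return m_x; }

    float GetDotProduct(const LeanVector &rhs) const
    {
        return m_x * rhs.m_x + m_y * rhs.m_y + m_z * rhs.m_z;
    }

private:
    float m_x, m_y, m_z;
};
```

In the second form a branch disappears from each call, which matters for functions invoked very many times per event. The trade-off is that code can no longer declare a vector first and fill it in later.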
General optimization
Two of the functions badly affected by the increased calorimeter occupancies are those used to calculate the “density weight” and “surrounding energy” values for each hit.
These quantities are intended for use with digital calorimeters and are not actually used in CLIC_CDR reconstruction. Can add entries to PandoraSettings to skip these calculations:
<pandora>
    …
    <CaloHitHelper>
        <ShouldCalculateDensityWeight>false</ShouldCalculateDensityWeight>
        <ShouldCalculateSurroundingEnergy>false</ShouldCalculateSurroundingEnergy>
    </CaloHitHelper>
    …
</pandora>
Finally, attempted a general optimization of remaining costly functions. Try to avoid square roots, avoid trigonometric functions and simply avoid unnecessary instructions.
However, not too much gained here; functions already designed for efficiency. There are still some potential further changes/savings, but now need more aggressive changes.
Such changes likely to make code less readable/maintainable (e.g. repeated code to avoid function calls) and/or introduce changes to physics output (require step-by-step validation).
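As an illustration of what avoiding square roots and trigonometric functions can look like, the sketch below expresses a distance cut on squared quantities and an angle cut on the cosine. Vector3, IsWithinDistance, CosOpeningAngle and IsWithinOpeningAngle are hypothetical names chosen for this example, not Pandora functions, and returning 0 for a zero-length vector is a convention assumed here.

```cpp
#include <cmath>

// Hypothetical minimal 3-vector used only for this example.
struct Vector3
{
    float x, y, z;
};

inline float Dot(const Vector3 &a, const Vector3 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline float DistanceSquared(const Vector3 &a, const Vector3 &b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Distance cut without a square root: compare squared quantities.
// Assumes maxDistance >= 0.
inline bool IsWithinDistance(const Vector3 &a, const Vector3 &b, const float maxDistance)
{
    return DistanceSquared(a, b) < maxDistance * maxDistance;
}

// Cosine of the opening angle: one square root and no trigonometric call.
// Returns 0 if either vector has zero length (convention chosen for this sketch).
inline float CosOpeningAngle(const Vector3 &a, const Vector3 &b)
{
    const float magnitudesSquared = Dot(a, a) * Dot(b, b);

    if (magnitudesSquared <= 0.f)
        return 0.f;

    return Dot(a, b) / std::sqrt(magnitudesSquared);
}

// Angle cut expressed on the cosine, so acos is never evaluated.
// For angles in [0, pi], angle < maxAngle is equivalent to cos(angle) > cos(maxAngle);
// cosMaxAngle can be computed once when the cut value is configured.
inline bool IsWithinOpeningAngle(const Vector3 &a, const Vector3 &b, const float cosMaxAngle)
{
    return CosOpeningAngle(a, b) > cosMaxAngle;
}
```

Both rewrites leave the accept/reject decision unchanged apart from floating-point rounding, so they fall on the safe side of the readability and physics-output concerns raised on this slide.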
Impact of changes
Function                                                CPU Time 01/04/2011   CPU Time 15/04/2011*
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802s               28.010s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070s               15.310s
ClusterHelper::GetTrackClusterDistance                  42.839s               10.450s
ClusterHelper::GetDistanceToClosestHit                  15.620s               10.150s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070s               9.742s
ClusterContact::HitDistanceComparison                   11.077s               9.089s
Cluster::GetCentroid                                    8.950s                8.730s
CaloHitHelper::GetDensityWeightContribution             6.739s                (6.230s)
CartesianVector::GetCosOpeningAngle                     0.190s                5.540s
ConeClusteringAlgorithm::FindHitsInSameLayer            5.399s                5.158s
CaloHitHelper::IsolationCountNearbyHits                 4.139s                4.040s
ClusterHelper::GetDistanceToClosestCentroid             3.661s                3.019s
CartesianVector::GetUnitVector                          0.070s                2.430s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.870s                2.160s
FragmentRemovalHelper::GetClusterContactDetails         2.591s                1.870s
ConeClusteringAlgorithm::FindHitsInPreviousLayers       2.360s                1.691s
CaloHitHelper::MipCountNearbyHits                       1.480s                1.660s
TrackClusterAssociationAlgorithm::Run                   4.279s                1.600s
CaloHitHelper::GetSurroundingEnergyContribution         1.781s                (1.529s)
Analysis of PandoraPFANew after efficiency improvements: it is interesting to see how the load has been redistributed. There is a large overall decrease in CPU time.
* MarlinReco revision 2179, PandoraPFANew revision 1137
A. Sailer
MarlinReco
Since the previous meeting, André’s examination of MarlinReco has focused on FullLDCTracking and, in particular, the assignment of TPC hits to tracks:
Current status
Processor Name          Seconds per event, 01/04/2011   Seconds per event, 15/04/2011
MarlinPandora           126.353                         76.690
FullLDCTracking         35.552                          18.917
LCIOOutputProcessor     7.964                           8.086
LEPTrackingProcessor    7.039                           7.064
SiliconTrackingCLIC     3.579                           3.557
TPCDigiProcessor        2.107                           2.109
KinkFinder              0.325                           0.328
V0Finder                0.244                           0.242
RecoMCTruthLinker       0.197                           0.198
ILDCaloDigi             0.097                           0.970
Total                   183.578                         117.410
Great success in improving efficiency of reconstruction software in presence of background.
For Pandora, declared “first pass” of efficiency improvements complete. Still some gains to be made, but becoming difficult to make changes.
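The size of the improvement follows directly from the per-event times in the table. As a worked check, with results rounded to two significant figures, the fractional reductions are:

```latex
\begin{align*}
\text{Total:} \quad & 1 - \frac{117.410}{183.578} \approx 0.36 \\
\text{MarlinPandora:} \quad & 1 - \frac{76.690}{126.353} \approx 0.39 \\
\text{FullLDCTracking:} \quad & 1 - \frac{18.917}{35.552} \approx 0.47
\end{align*}
```

That is roughly a 36% reduction in total CPU time per event, or a speed-up factor of about 1.56, with almost all of the gain coming from MarlinPandora and FullLDCTracking.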
Validation
E_j                     45 GeV          100 GeV         250 GeV         500 GeV
Status at 01/04/2011    3.71 ± 0.05     3.02 ± 0.04     2.99 ± 0.04     3.18 ± 0.06
Status at 15/04/2011    3.72 ± 0.05     3.03 ± 0.04     2.97 ± 0.04     3.17 ± 0.06
Efficiency changes carefully implemented to avoid affecting physics output. Have confirmed that Pandora jet energy reconstruction performance is unaffected.
Have also examined a number of low- and high-energy single particle files to help confirm that particle ID performance is unaffected.
Jacopo has performed a full validation of particle ID and reported good results.
All efficiency improvements now in Ilcsoft v01-11 pre-release 04. Remember to use up-to-date steering files. In particular, there are changes to the PandoraSettings.xml file. No other changes for CLIC_ILD or CLIC_SiD.
Have hopefully saved many CPU cycles!