Reconstruction Efficiency: John Marshall, University of Cambridge, LCD WG6 Meeting, April 18 2011


John Marshall, 1

John Marshall, University of Cambridge

LCD WG6 Meeting, April 18 2011

Reconstruction Efficiency

John Marshall, 2

Overview

Reconstruction of events with overlaid background is a challenge for our reconstruction software. Even functions that are intrinsically efficient cannot help but be affected by the huge increases in combinations of tracks and calorimeter hits.

CPU time is clearly important, so efforts have been made to specifically address the problems of overlaid γγ→hadrons background, with default NumberBackground=3.2.

André has been focusing on improving the performance of processors in the MarlinReco library, whilst I have examined PandoraPFANew.

There are some differences between the CPU times we report, but we are now satisfied that these are due to machine specifications rather than build configurations, etc.

André has been able to access the Intel VTune Amplifier XE 2011 package, which provides an impressive amount of profiling information. It can report the actual CPU time required by each function and can even provide a line-by-line breakdown of CPU time.

Start with a reminder of performance at the time of the previous meeting...

John Marshall, 3

Status at last meeting

Processor Name          Seconds per event (10-event sample), 01/04/2011*
MarlinPandora           126.353
FullLDCTracking         35.552
LCIOOutputProcessor     7.964
LEPTrackingProcessor    7.039
SiliconTrackingCLIC     3.579
TPCDigiProcessor        2.107
KinkFinder              0.325
V0Finder                0.244
RecoMCTruthLinker       0.197
ILDCaloDigi             0.097
Total                   183.578

Division of total CPU time between Marlin processors for ten 91 GeV Z→uds events with overlaid γγ→hadrons background and NumberBackground=3.2.

Without background, total CPU-time is just 0.565s per event.

* MarlinReco revision 2151, PandoraPFANew revision 1100

John Marshall, 4

Costly functions

Function                                                CPU Time 01/04/2011
ClusterHelper::GetTrackClusterDistance                  42.839s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070s
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070s
CartesianVector::GetZ                                   16.702s
ClusterHelper::GetDistanceToClosestHit                  15.620s
ClusterContact::HitDistanceComparison                   11.077s
Cluster::GetCentroid                                    8.950s
CaloHitHelper::GetDensityWeightContribution             6.739s
operator- (CartesianVector)                             6.562s
ConeClusteringAlgorithm::FindHitsInSameLayer            5.399s
TrackClusterAssociationAlgorithm::Run                   4.279s
CaloHitHelper::IsolationCountNearbyHits                 4.139s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.870s
ClusterHelper::GetDistanceToClosestCentroid             3.661s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.330s
FragmentRemovalHelper::GetClusterContactDetails         2.591s
ClusterHelper::GetTrackClusterDistance                  2.461s
CartesianVector::GetUnitVector                          2.362s

TestPandora application used with input Pandora binary files to perform standalone Pandora reconstruction and concentrate purely on PandoraPFANew. MarlinPandora not considered.

John Marshall, 5

Reduce function calls

Most costly function is GetTrackClusterDistance, used to help identify track-cluster associations. With background, this function is called for many track-cluster combinations.

For each combination, examine hits in the first n cluster layers to find the closest perpendicular distance between a straight line (defined by the track state at the calorimeter) and a hit in the cluster.

After basic C++ optimization, it is difficult to further reduce CPU time without changing function behaviour. Instead, try to avoid comparison of tracks and clusters with very different “expected directions”.

Similar cuts implemented in the cone-based clustering algorithms and in the SoftClusterMerging, IsolatedHitMerging and FragmentRemoval algorithms. Potentially dangerous, but cut values are configurable, and the default cut cos(angle)>0 should be safe. Validation crucial.

[Figure: track direction and parallel distance region. Find the smallest perpendicular distance to a hit within the parallel distance region.]
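The direction pre-cut and the perpendicular-distance search described on this slide can be sketched as below. This is a minimal illustration under stated assumptions, not the PandoraPFANew code: the `Vec3` type, the function names `PassesDirectionCut` and `ClosestPerpendicularDistance`, and the choice of a parallel distance window running from 0 to `maxParallelDistance` along the track direction are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-in for a three-vector; not the Pandora CartesianVector class.
struct Vec3 { float x, y, z; };

inline Vec3 operator-(const Vec3 &a, const Vec3 &b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Cheap pre-cut: keep a track-cluster pair only if cos(angle) > minCosAngle.
// The default of 0 mirrors the default cut quoted on the slide.
inline bool PassesDirectionCut(const Vec3 &trackDir, const Vec3 &clusterDir, float minCosAngle = 0.f)
{
    const float magProduct = std::sqrt(Dot(trackDir, trackDir) * Dot(clusterDir, clusterDir));
    return (magProduct > 0.f) && (Dot(trackDir, clusterDir) > minCosAngle * magProduct);
}

// Smallest perpendicular distance between the straight line (trackPos, trackUnitDir)
// and any hit whose parallel distance along the line lies in [0, maxParallelDistance].
// trackUnitDir must be a unit vector. Returns the largest float if no hit qualifies.
inline float ClosestPerpendicularDistance(const Vec3 &trackPos, const Vec3 &trackUnitDir,
    const std::vector<Vec3> &hits, float maxParallelDistance)
{
    const float noResult = std::numeric_limits<float>::max();
    float bestSquared = noResult;

    for (const Vec3 &hit : hits)
    {
        const Vec3 delta = hit - trackPos;
        const float parallel = Dot(delta, trackUnitDir);
        if (parallel < 0.f || parallel > maxParallelDistance)
            continue;

        // For a unit direction u, |u x delta| is the perpendicular distance.
        const Vec3 perp = Cross(trackUnitDir, delta);
        const float perpSquared = Dot(perp, perp);
        if (perpSquared < bestSquared)
            bestSquared = perpSquared;
    }

    return (bestSquared < noResult) ? std::sqrt(bestSquared) : noResult;
}
```

Calling `PassesDirectionCut` before `ClosestPerpendicularDistance` means the hit loop only runs for pairs that point into the same hemisphere, which is where the saving would come from when background inflates the number of combinations.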

John Marshall, 6

Change approach

Another costly function is used by the IsolatedHitMerging algorithm, which matches isolated hits to nearby clusters, based on the distance to the nearest hit in the cluster.

This algorithm is not unimportant, but is still a rather small part of Pandora reconstruction. That it is one of the most time-consuming processes justifies a change in approach.

Isolated hits are now matched to clusters based upon distances to the nearest layer centroid position. This allows the nested loop over hits in each layer to be avoided.

Small change to behaviour, but not obvious if any better/worse. Again, validation is crucial.

[Figure: get distance to nearest layer centroid (new approach) versus get distance to nearest hit (previous approach).]
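The difference between the two matching strategies can be sketched as below. This is an illustration only, assuming a cluster stored as a map from layer index to hit positions and an unweighted mean for the layer centroid; the names `Point`, `LayerToHits`, `DistanceToNearestHit`, `GetLayerCentroids` and `DistanceToNearestCentroid` are hypothetical and not taken from PandoraPFANew.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <vector>

// Hypothetical hit position type.
struct Point { float x, y, z; };

// Hypothetical cluster layout: layer index -> positions of hits in that layer.
typedef std::map<unsigned int, std::vector<Point> > LayerToHits;

inline float DistanceSquared(const Point &a, const Point &b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Previous approach: nested loop over every hit in every layer.
// For an empty cluster this returns a very large value.
inline float DistanceToNearestHit(const Point &isolatedHit, const LayerToHits &cluster)
{
    float bestSquared = std::numeric_limits<float>::max();
    for (const auto &layer : cluster)
        for (const Point &hit : layer.second)
            bestSquared = std::min(bestSquared, DistanceSquared(isolatedHit, hit));
    return std::sqrt(bestSquared);
}

// One centroid per layer (unweighted mean here, as an assumption).
// Computed once per cluster and reused for every isolated hit.
inline std::vector<Point> GetLayerCentroids(const LayerToHits &cluster)
{
    std::vector<Point> centroids;
    for (const auto &layer : cluster)
    {
        if (layer.second.empty())
            continue;
        Point sum{0.f, 0.f, 0.f};
        for (const Point &hit : layer.second)
        {
            sum.x += hit.x; sum.y += hit.y; sum.z += hit.z;
        }
        const float n = static_cast<float>(layer.second.size());
        centroids.push_back(Point{sum.x / n, sum.y / n, sum.z / n});
    }
    return centroids;
}

// New approach: single loop over layer centroids, no inner loop over hits.
inline float DistanceToNearestCentroid(const Point &isolatedHit, const std::vector<Point> &centroids)
{
    float bestSquared = std::numeric_limits<float>::max();
    for (const Point &centroid : centroids)
        bestSquared = std::min(bestSquared, DistanceSquared(isolatedHit, centroid));
    return std::sqrt(bestSquared);
}
```

The two functions generally return different numbers for the same hit and cluster, which is the small change in behaviour that the slide says needs validation.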

John Marshall, 7

Change approach

The CartesianVector class is crucial to Pandora reconstruction. Used extensively throughout all algorithms. Even small efficiency improvements to this class can help.

Previously, this class offered a default constructor:

inline CartesianVector::CartesianVector() : m_x(0.f), m_y(0.f), m_z(0.f), m_isInitialized(false){}

This meant that each instance needed an initialization flag, set to true only when explicit component values were assigned. The flag needed to be checked in most member functions.

Removal of the default constructor means that the fully qualified constructor must be used:

inline CartesianVector::CartesianVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z){}

No longer any need for the initialization flag, removing checks from many important functions:

GetDotProduct, GetCrossProduct, GetMagnitude, GetOpeningAngle, GetX, GetY, GetZ, ...
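The effect of dropping the initialization flag can be sketched with two toy classes. These are hypothetical illustrations (`FlaggedVector` and `PlainVector` are invented names), and throwing an exception on an uninitialized read is an assumption; the slides do not say how the original check reported failure.

```cpp
#include <cassert>
#include <stdexcept>

// Sketch of the old style: a default constructor forces a flag,
// and every accessor pays for a branch on it.
class FlaggedVector
{
public:
    FlaggedVector() : m_x(0.f), m_y(0.f), m_z(0.f), m_isInitialized(false) {}
    FlaggedVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z), m_isInitialized(true) {}

    float GetZ() const
    {
        if (!m_isInitialized)
            throw std::logic_error("FlaggedVector not initialized");
        return m_z;
    }

    float GetDotProduct(const FlaggedVector &rhs) const
    {
        if (!m_isInitialized || !rhs.m_isInitialized)
            throw std::logic_error("FlaggedVector not initialized");
        return m_x * rhs.m_x + m_y * rhs.m_y + m_z * rhs.m_z;
    }

private:
    float m_x, m_y, m_z;
    bool m_isInitialized;
};

// Sketch of the new style: no default constructor, so an instance
// cannot exist without component values and no check is needed.
class PlainVector
{
public:
    PlainVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}

    float GetZ() const { return m_z; }

    float GetDotProduct(const PlainVector &rhs) const
    {
        return m_x * rhs.m_x + m_y * rhs.m_y + m_z * rhs.m_z;
    }

private:
    float m_x, m_y, m_z;
};
```

With `PlainVector`, code such as `PlainVector v;` no longer compiles, so the "uninitialized" state is ruled out at compile time rather than tested at run time in every call.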

John Marshall, 8

General optimization

Two of the functions badly affected by the increased calorimeter occupancies are those used to calculate the “density weight” and “surrounding energy” values for each hit.

These quantities are intended for use with digital calorimeters and are not actually used in CLIC_CDR reconstruction. Can add entries to PandoraSettings to skip these calculations:

<pandora>
    …
    <CaloHitHelper>
        <ShouldCalculateDensityWeight>false</ShouldCalculateDensityWeight>
        <ShouldCalculateSurroundingEnergy>false</ShouldCalculateSurroundingEnergy>
    </CaloHitHelper>
    …
</pandora>

Finally, attempted a general optimization of remaining costly functions. Try to avoid square roots, avoid trigonometric functions and simply avoid unnecessary instructions.

However, not too much gained here; functions already designed for efficiency. There are still some potential further changes/savings, but these now need more aggressive changes.

Such changes are likely to make code less readable/maintainable (e.g. repeated code to avoid function calls) and/or introduce changes to physics output (requiring step-by-step validation).
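As an illustration of the advice to avoid square roots and trigonometric functions, threshold comparisons can often be done on squared quantities and on cosines directly. The sketch below is generic and not taken from PandoraPFANew; `Vector3`, `IsCloserThan` and `IsOpeningAngleSmallerThan` are hypothetical names, and the caller is assumed to precompute the cosine of the maximum angle once.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical three-vector type.
struct Vector3 { float x, y, z; };

inline float Dot(const Vector3 &a, const Vector3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// |a - b| < maxDistance, tested on squared values so no sqrt is needed.
// Assumes maxDistance >= 0.
inline bool IsCloserThan(const Vector3 &a, const Vector3 &b, float maxDistance)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return (dx * dx + dy * dy + dz * dz) < (maxDistance * maxDistance);
}

// angle(a, b) < maxAngle is equivalent to cos(angle) > cos(maxAngle) for angles in [0, pi].
// cosMaxAngle is precomputed by the caller. Because t -> t*|t| is monotonic,
// comparing the sign-preserving squares avoids both acos and sqrt.
inline bool IsOpeningAngleSmallerThan(const Vector3 &a, const Vector3 &b, float cosMaxAngle)
{
    const float dot = Dot(a, b);
    const float magSquaredProduct = Dot(a, a) * Dot(b, b);
    if (magSquaredProduct <= 0.f)
        return false;
    return (dot * std::fabs(dot)) > (cosMaxAngle * std::fabs(cosMaxAngle) * magSquaredProduct);
}
```

This kind of rewrite keeps the decision identical apart from floating-point rounding near the threshold, which is one reason step-by-step validation is still needed.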

John Marshall, 9

Impact of changes

Function                                                CPU Time 01/04/2011   CPU Time 15/04/2011*
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802s               28.010s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070s               15.310s
ClusterHelper::GetTrackClusterDistance                  42.839s               10.450s
ClusterHelper::GetDistanceToClosestHit                  15.620s               10.150s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070s               9.742s
ClusterContact::HitDistanceComparison                   11.077s               9.089s
Cluster::GetCentroid                                    8.950s                8.730s
CaloHitHelper::GetDensityWeightContribution             6.739s                (6.230s)
CartesianVector::GetCosOpeningAngle                     0.190s                5.540s
ConeClusteringAlgorithm::FindHitsInSameLayer            5.399s                5.158s
CaloHitHelper::IsolationCountNearbyHits                 4.139s                4.040s
ClusterHelper::GetDistanceToClosestCentroid             3.661s                3.019s
CartesianVector::GetUnitVector                          0.070s                2.430s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit   3.870s                2.160s
FragmentRemovalHelper::GetClusterContactDetails         2.591s                1.870s
ConeClusteringAlgorithm::FindHitsInPreviousLayers       2.360s                1.691s
CaloHitHelper::MipCountNearbyHits                       1.480s                1.660s
TrackClusterAssociationAlgorithm::Run                   4.279s                1.600s
CaloHitHelper::GetSurroundingEnergyContribution         1.781s                (1.529s)

Analysis of PandoraPFANew after efficiency improvements: it is interesting to see how the load has been redistributed. There is a large overall decrease in CPU time.

* MarlinReco revision 2179, PandoraPFANew revision 1137

John Marshall, 10

MarlinReco

Since the previous meeting, André’s examination of MarlinReco has focused on FullLDCTracking and, in particular, the assignment of TPC hits to tracks:

[Figure credit: A. Sailer]

John Marshall, 11

MarlinReco

[Figure (A. Sailer): MarlinReco revision 2161 and MarlinReco revision 2162]

John Marshall, 12

Current status

Processor Name          Seconds per event, 01/04/2011   Seconds per event, 15/04/2011
MarlinPandora           126.353                         76.690
FullLDCTracking         35.552                          18.917
LCIOOutputProcessor     7.964                           8.086
LEPTrackingProcessor    7.039                           7.064
SiliconTrackingCLIC     3.579                           3.557
TPCDigiProcessor        2.107                           2.109
KinkFinder              0.325                           0.328
V0Finder                0.244                           0.242
RecoMCTruthLinker       0.197                           0.198
ILDCaloDigi             0.097                           0.097
Total                   183.578                         117.410

Great success in improving efficiency of reconstruction software in presence of background.

For Pandora, declared “first pass” of efficiency improvements complete. Still some gains to be made, but becoming difficult to make changes.
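The size of the gain can be read off the table with two simple ratios. The helper below is only a worked-arithmetic sketch; `SpeedupFactor` and `FractionalReduction` are hypothetical names introduced here, and the inputs are the seconds-per-event figures quoted on this slide.

```cpp
#include <cassert>
#include <cmath>

// Ratio of CPU time before to CPU time after; values above 1 mean faster.
inline double SpeedupFactor(double secondsBefore, double secondsAfter)
{
    return secondsBefore / secondsAfter;
}

// Fraction of the original CPU time that has been saved.
inline double FractionalReduction(double secondsBefore, double secondsAfter)
{
    return 1.0 - secondsAfter / secondsBefore;
}

// Seconds per event from the table: 01/04/2011 and 15/04/2011.
const double totalBefore = 183.578, totalAfter = 117.410;
const double pandoraBefore = 126.353, pandoraAfter = 76.690;
const double trackingBefore = 35.552, trackingAfter = 18.917;
```

With these inputs the total comes out at roughly a factor 1.56 faster (about 36% less CPU time), MarlinPandora at roughly 1.65 (about 39% less) and FullLDCTracking at roughly 1.88 (about 47% less).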

John Marshall, 13

Validation

E_j                     45 GeV        100 GeV       250 GeV       500 GeV
Status at 01/04/2011    3.71 ± 0.05   3.02 ± 0.04   2.99 ± 0.04   3.18 ± 0.06
Status at 15/04/2011    3.72 ± 0.05   3.03 ± 0.04   2.97 ± 0.04   3.17 ± 0.06

Efficiency changes carefully implemented to avoid affecting physics output. Have confirmed that Pandora jet energy reconstruction performance is unaffected.

Have also examined a number of low- and high-energy single particle files to help confirm that particle id performance is unaffected.

Jacopo has performed a full validation of particle id and reported good results.

All efficiency improvements now in Ilcsoft v01-11 pre-release 04. Remember to use up-to-date steering files. In particular, there are changes to the PandoraSettings.xml file. No other changes for CLIC_ILD or CLIC_SiD.

Have hopefully saved many CPU cycles!