CA+KF Track Reconstruction in the STS

I. Kisel, GSI / KIP
CBM Collaboration Meeting, GSI, February 28, 2008


28 February 2008, GSI  Ivan Kisel, GSI  2/14

Track Finder: What Is the Next Step?

• Optimize the STS geometry (strips, sector navigation)
• Mathematical and computational optimization
• SIMDization of the algorithm (from scalars to vectors)
• MIMDization (multi-threads, multi-cores)

• High track density
• Non-homogeneous magnetic field
• Fake space points dominate
• Single-sided strip detectors
• Detector inefficiency
• Imperfectly aligned system
• On-line event selection
• Large PC farm
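The step "SIMDization of the algorithm (from scalars to vectors)" can be illustrated with a minimal sketch. It assumes a hypothetical 4-wide type `Float4` (one lane per track) and a hypothetical straight-line extrapolation `extrapolate`; neither name comes from the L1 code, and a real implementation would map the lanes onto hardware SIMD registers instead of a plain loop.

```cpp
#include <cassert>

// Hypothetical 4-wide "vector" of floats: one lane per track.
// Illustration only; real code would use SSE or similar SIMD registers.
struct Float4 {
    float v[4];
};

// Lane-wise addition.
inline Float4 operator+(const Float4& a, const Float4& b) {
    Float4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Lane-wise multiplication.
inline Float4 operator*(const Float4& a, const Float4& b) {
    Float4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Straight-line extrapolation x' = x + tx * dz, written once as a template
// so the same source serves the scalar case (T = float) and the packed
// case (T = Float4, four tracks at a time).
template <typename T>
T extrapolate(const T& x, const T& tx, const T& dz) {
    return x + tx * dz;
}
```

Writing the arithmetic against a template parameter is one way to move from scalars to vectors without duplicating the algorithm: only the type changes, not the formula.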


Data Acquisition System

[Diagram: Detector, Event Builder Network with readout units (RU), N x M Scheduler, and a PC Farm of Sub-Farms under a Farm Control System. Rates: 10^7 ev/s, 50 kB/ev, 100 ev/slice, 10^5 sl/s, 5 MB/slice. Further labels: "SFn available"; each SFn is shown with MAPS, STS, RICH, TRD, ECAL; farm size 10^? PCs.]
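The rates quoted on the Data Acquisition System slide are mutually consistent, which a short calculation makes explicit. The sketch below uses only the slide's three input figures; the function names are illustrative, decimal units (1 MB = 1000 kB) are an assumption, and the aggregate 500 GB/s is a derived value that does not appear on the slide.

```cpp
#include <cassert>

// Figures quoted on the slide.
constexpr long long kEventsPerSecond = 10000000; // 10^7 ev/s
constexpr long long kEventSizeKB     = 50;       // 50 kB/ev
constexpr long long kEventsPerSlice  = 100;      // 100 ev/slice

// Slices per second: 10^7 ev/s divided by 100 ev/slice.
constexpr long long slicesPerSecond() {
    return kEventsPerSecond / kEventsPerSlice;
}

// Slice size in MB, assuming 1 MB = 1000 kB.
constexpr long long sliceSizeMB() {
    return kEventSizeKB * kEventsPerSlice / 1000;
}

// Aggregate input rate in GB/s (derived, not stated on the slide),
// assuming 1 GB = 10^6 kB.
constexpr long long totalRateGBps() {
    return kEventsPerSecond * kEventSizeKB / 1000000;
}
```

The first two functions reproduce the slide's 10^5 sl/s and 5 MB/slice.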


Cell Blade – a Sub-Farm with (2+16) Cores

[Diagram: a Sub-Farm built from PCs and FPGAs, with Tracking and Vertexing Units, a Sub-Farm Management Unit, and a Sub-Farm Decision/Selection Unit.]


Welcome to the Era of Multicore HPC

• Gaming: STI Cell
• GP GPU: Nvidia Tesla
• GP CPU: Intel Larrabee
• CPU/GPU: AMD Fusion
• ??

• High performance computing (HPC)
• Clock rates have reached their limit
• Performance/power optimization
• Heterogeneous systems of many (>8) cores
• Similar programming languages (Ct and CUDA), but a common standard is unlikely
• We need a uniform approach to all CPU/GPU families
• How to take advantage of the additional cores?


NVIDIA GeForce 9600 GT GPU: 64 Cores

• 64 processors
• 1.625 GHz frequency
• Double precision (?)
• Price: 170 EUR


Intel Polaris: 80 Cores

3.16 GHz, 0.95 Volt, 62 Watt -> 1.01 Teraflops
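The Polaris figures can be turned into per-watt and per-core numbers with simple arithmetic. The sketch below takes the four quoted values as inputs; the function names are illustrative, and the derived values (roughly 16 GFLOPS per watt and about 4 floating-point operations per cycle per core) are computed here, not quoted from the slide.

```cpp
#include <cassert>
#include <cmath>

// Figures quoted on the slide.
constexpr double kCores   = 80.0;
constexpr double kClockHz = 3.16e9;  // 3.16 GHz
constexpr double kPowerW  = 62.0;    // 62 Watt
constexpr double kFlops   = 1.01e12; // 1.01 Teraflops

// Energy efficiency in GFLOPS per watt (derived).
inline double gflopsPerWatt() {
    return kFlops / kPowerW / 1e9;
}

// Floating-point operations per clock cycle per core (derived).
inline double flopsPerCyclePerCore() {
    return kFlops / (kCores * kClockHz);
}
```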


Cell Processor: 1+8 Cores


Computer Physics Communications 178 (2008) 374-383


Speed-up of the Kalman Filter Track Fit


Structure and Data: a Bottleneck

[Diagram: cbmroot/L1 and L1Algo, connected by L1Geometry, L1Event (L1Strips, L1Hits), and L1Tracks.]

Input:

    Strips:
    float vStripValues[NStrips];               // strip coordinates (32b)
    unsigned char vStripFlags[NStrips];        // strip iStation (6b) + used (1b) + used_by_dublets (1b)

    Hits:
    struct L1StsHit {
        unsigned short int f, b;               // front (16b) and back (16b) strip indices
    };
    L1StsHit vHits[NHits];

Output:

    unsigned short int vRecoHits[NRecoHits];   // hit index (16b)
    unsigned char vRecoTracks[NRecoTracks];    // N hits on track (8b)

Internal:

    class L1Triplet {
        unsigned short int w0;                 // left hit (16b)
        unsigned short int w1;                 // first neighbour (16b) or middle hit (16b)
        unsigned short int w2;                 // N neighbours (16b) or right hit (16b)
        unsigned char b0;                      // chi2 (5b) + level (3b)
        unsigned char b1;                      // qp (8b)
        unsigned char b2;                      // qp error (8b)
    };

• A standalone L1Algo module
• About 300 kB per central event
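The compact layouts above pack several fields into single bytes. The slide gives only the field widths, so the sketch below assumes a bit order (low bits first) and uses hypothetical helper names; it is meant to show the packing technique, not the actual L1 encoding.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of one vStripFlags byte (the slide gives widths only):
//   bits 0-5: station index, bit 6: used, bit 7: used_by_dublets
inline std::uint8_t packStripFlags(unsigned station, bool used, bool usedByDublets) {
    return static_cast<std::uint8_t>((station & 0x3Fu)
                                     | (used ? 0x40u : 0u)
                                     | (usedByDublets ? 0x80u : 0u));
}
inline unsigned stripStation(std::uint8_t f)   { return f & 0x3Fu; }
inline bool stripUsed(std::uint8_t f)          { return (f & 0x40u) != 0; }
inline bool stripUsedByDublets(std::uint8_t f) { return (f & 0x80u) != 0; }

// Assumed layout of L1Triplet::b0:
//   bits 0-4: quantized chi2, bits 5-7: level
inline std::uint8_t packChi2Level(unsigned chi2, unsigned level) {
    return static_cast<std::uint8_t>((chi2 & 0x1Fu) | ((level & 0x07u) << 5));
}
inline unsigned tripletChi2(std::uint8_t b0)  { return b0 & 0x1Fu; }
inline unsigned tripletLevel(std::uint8_t b0) { return b0 >> 5; }
```

With 6 bits the station index can take 64 values, and 3 bits of level allow chains of up to 8 levels; both limits follow directly from the widths on the slide.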


Parallelization of the CA Track Finder

1 Create tracklets
2 Collect tracks

GSI, KIP, CERN
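Step 2, "Collect tracks", can be sketched on a toy data set. The code below is not the L1 implementation: the `Triplet` fields, the assumption that a triplet's neighbours occupy a contiguous index range, the meaning of `level` as the length of the longest chain starting at a triplet, and the best-chi2 choice are all illustrative assumptions, loosely modelled on the "first neighbour", "N neighbours", "level" and "chi2" fields of L1Triplet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy triplet: field names are illustrative, not the L1Triplet bit layout.
struct Triplet {
    std::size_t firstNeighbour; // index of the first neighbouring triplet
    std::size_t nNeighbours;    // neighbours are [firstNeighbour, firstNeighbour + nNeighbours)
    unsigned level;             // assumed: length of the longest chain starting here
    float chi2;                 // fit quality, smaller is better
};

// Follow neighbours from a start triplet, always stepping to the best-chi2
// neighbour whose level is exactly one lower than the current one.
// Assumes all neighbour indices are valid positions in t.
std::vector<std::size_t> collectTrack(const std::vector<Triplet>& t, std::size_t start) {
    std::vector<std::size_t> chain{start};
    std::size_t cur = start;
    while (t[cur].level > 0) {
        bool found = false;
        std::size_t best = 0;
        for (std::size_t k = 0; k < t[cur].nNeighbours; ++k) {
            std::size_t n = t[cur].firstNeighbour + k;
            if (t[n].level + 1 != t[cur].level) continue;
            if (!found || t[n].chi2 < t[best].chi2) {
                best = n;
                found = true;
            }
        }
        if (!found) break; // inconsistent levels: stop instead of looping
        chain.push_back(best);
        cur = best;
    }
    return chain;
}
```

Because the level drops by one at every step, the walk always terminates, and independent start triplets could in principle be processed in parallel.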


Kalman Filter Track Fit on Multicore Systems: Multithreading

[Plot: real fit time per track (µs) versus number of tasks (1, 2, 4, 8, 16); the time axis is logarithmic, from 0.1 to 10. Curves: Cell SPE (approx), icc/[email protected], gcc4.1.2/[email protected], gcc3.4.6/[email protected], icc/[email protected]. Credit: Håvard Bjerke.]


Summary and Plans

• SIMDized CA track finder works well
• Work on single-sided strip detectors started
• Multithreaded Kalman filter track fit
• Learn Ct (Intel) and CUDA (Nvidia) programming languages
• Investigate large multi-core systems (CPU and GPU)
• Parallelize the CA track finder
• Parallel hardware -> parallel languages -> parallel algorithms