GPU Implementations of Online Track Finding Algorithms at PANDA
Talk at the spring meeting of the German Physical Society (DPG). Status of my PhD, more or less.
-
5/27/2018 GPU Implementations of Online Track Finding Algorithms at PANDA
Mitglied der Helmholtz-Gemeinschaft (Member of the Helmholtz Association)
GPU Implementations of Online Track Finding Algorithms at PANDA
HK 57.2, DPG-Frühjahrstagung 2014, Frankfurt, 21 March 2014
Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration
-
Andreas Herten, DPG Frühjahrstagung 2014, HK 57.2
PANDA: The Experiment
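Figure: the PANDA detector, with the magnet, the Straw Tube Tracker (STT), and the Micro Vertex Detector (MVD) labeled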
-
PANDA Event Reconstruction
Triggerless readout
Many benchmark channels; background and signal are similar
Event rate: 2 × 10^7/s
Raw data rate: 200 GB/s
Disk storage space for offline analysis: 3 PB/y
Reduce by ~1/1000 (reject background events, save interesting physics events)
→ Online reduction with GPUs
-
PANDA Tracking, Online Tracking
Figure: detector layers and trigger, usual HEP experiment compared with PANDA
PANDA: no hardware-based trigger,
but a computationally intensive software trigger
→ Online Tracking
-
GPUs @PANDA Online Tracking
Port tracking algorithms to GPU
Serial → parallel, C++ → CUDA
Investigate suitability for online performance
But also: find and invent tracking algorithms
Under investigation:
Hough Transformation
Riemann Track Finder
Triplet Finder
-
Algorithm: Hough Transform
Idea: Transform $(x,y)_i \rightarrow (\alpha,r)_{ij}$, find lines via the $(\alpha,r)$ space
Solve the line equation for $r_{ij}$, for lots of hits $(x,y,\rho)_i$ and many $\alpha_j \in [0^\circ, 360^\circ)$ each
Fill histogram → extract track parameters
$r_{ij} = \cos\alpha_j \, x_i + \sin\alpha_j \, y_i + \rho_i$
i: ~100 hits/event (STT)
j: every 0.2° → 1800 angles → 180 000 values of $r_{ij}$ per event
-
Algorithm: Hough Transform
Figure: Hough-transformed hits of one event (PANDA STT+MVD, 68 (x,y) points), filled into a 1800 × 1800 grid; x axis: angle / °, 0 to 180; y axis: r
-
Algorithm: Hough Transform
Two Implementations
Thrust:
Performance: 3 ms/event
Independent of granularity
Reduced to a set of standard routines
Fast (uses Thrust's optimized algorithms)
Inflexible (has its limits, hard to customize)
No peak finding included
Even possible?
Adds to time!
Plain CUDA:
Performance: 0.5 ms/event
Built completely for this task
Fits every problem; customizable
A bit more complicated in parts
Simple peak finder implemented (threshold)
Using: Dynamic Parallelism, Shared Memory
-
Algorithm: Riemann Track Finder
Idea: Don't fit lines (in 2D), fit planes (in 3D)!
Create seeds: all possible three-hit combinations
Grow seeds to tracks: continuously test whether the next hit fits
Use mapping to Riemann paraboloid
Summer student project (J. Timcheck)
-
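Algorithm: Riemann Track Finder
GPU Optimization: Unfolding loops
Three nested loops, for () { for () { for () { } } }, replaced by one thread index: int ijk = threadIdx.x + blockIdx.x * blockDim.x;
Index conversion:
$\mathrm{Layer}_x = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$
$\mathrm{pos}(n\mathrm{Layer}_x) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1}+27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1}+27x}} - 1$
→ 100× faster than CPU version
Time for one event (Tesla K20X): ~0.6 ms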
-
Algorithm: Triplet Finder
Idea: Use only a subset of the detector as seed
Combine 3 hits to a Triplet
Calculate circle from 3 Triplets (no fit)
Features:
Tailored for PANDA
Fast and robust algorithm, no t0
Ported to GPU together with NVIDIA Application Lab
-
Triplet Finder: Time
-
Triplet Finder: Optimizations
Bunching Wrapper
Hits from one event have similar timestamps
Combine hits into sets (bunches) which fill up the GPU best
Figure: hits, events, and bunches
$\mathcal{O}(N^2) \rightarrow \mathcal{O}(N)$
-
Triplet Finder: Bunching Performance
-
Triplet Finder: Optimizations
Compare kernel launch strategies
Figure: kernel launch strategies between CPU and GPU (TF = Triplet Finder)
Dynamic Parallelism: calling kernel with 1 thread/bunch, which launches the Triplet Finder
Joined Kernel: 1 block/bunch, TF stages joined into one kernel
Host Streams: 1 stream/bunch, with a calling stream and a combining stream
-
Triplet Finder: Kernel Launches
Preliminary (in publication)
-
Triplet Finder: Clock Speed / Chipset
Preliminary (in publication)
GPU    Memory Clock   Core Clock   GPU Boost
K40    3004 MHz       745 MHz      875 MHz
K20X   2600 MHz       732 MHz      784 MHz
-
Summary
Investigated different tracking algorithms
Best performance: 20 µs/event → online tracking is a feasible technique for PANDA
Multi-GPU system needed → $\mathcal{O}(100)$ GPUs
Still much optimization necessary (efficiency)
Collaboration with NVIDIA Application Lab
-
Thank you!
Andreas Herten