GPU Implementations of Online Track Finding Algorithms at PANDA

Upload: andiherten

Post on 16-Oct-2015


DESCRIPTION

Talk at spring meeting of German Physical Society. Status of my PhD, more or less.

TRANSCRIPT

  • GPU Implementations of Online Track Finding Algorithms at PANDA

    Member of the Helmholtz Association

    HK 57.2, DPG-Frühjahrstagung 2014, Frankfurt, 21 March 2014
    Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration

  • PANDA: The Experiment

    [Figure: the PANDA detector, with the Magnet, the STT and the MVD labelled]

  • PANDA Event Reconstruction

    Triggerless read-out
    Many benchmark channels
    Background & signal similar
    Event rate: 2 × 10^7 /s
    Raw data rate: 200 GB/s
    Disk storage space for offline analysis: 3 PB/y
    Reduce by ~1/1000 (reject background events, save interesting physics events)
    GPUs
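
    The numbers on this slide can be cross-checked with a short back-of-the-envelope calculation. This is only a sketch: it assumes the ~1/1000 reduction applies directly to the raw data rate and uses a full calendar year of about 3.15 × 10^7 s, neither of which the slide states.

```latex
\begin{align*}
\frac{200\,\mathrm{GB/s}}{2\times10^{7}\,\mathrm{events/s}} &= 10\,\mathrm{kB/event} \\
200\,\mathrm{GB/s}\times\frac{1}{1000} &= 200\,\mathrm{MB/s} \\
200\,\mathrm{MB/s}\times 3.15\times10^{7}\,\mathrm{s/y} &\approx 6.3\,\mathrm{PB/y}
\end{align*}
```

    The result is of the same order as the quoted 3 PB/y. The remaining factor of about two would fit an experiment that does not take data for the whole year, but that reading is an assumption.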

  • PANDA Tracking, Online Tracking

    [Diagram: detector layers and trigger, comparing a usual HEP experiment with PANDA]
    PANDA: no hardware-based trigger
    But: computationally intensive software trigger
    → Online Tracking

  • GPUs @ PANDA: Online Tracking

    Port tracking algorithms to GPU
    Serial → parallel
    C++ → CUDA
    Investigate suitability for online performance
    But also: find & invent tracking algorithms
    Under investigation:
    Hough Transformation
    Riemann Track Finder
    Triplet Finder

  • Algorithm: Hough Transform

    Idea: transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
    Solve the line equation r_ij for
    lots of hits (x,y,ρ)_i and
    many α_j ∈ [0°,360°) each
    Fill histogram
    Extract track parameters

    r_ij = cos α_j · x_i + sin α_j · y_i + ρ_i

    i: ~100 hits/event (STT)
    j: every 0.2°
    → r_ij: 180 000
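
    The histogram-filling step can be sketched on the CPU as follows. This is a minimal serial reference written for illustration, not the GPU code from the talk: the names `Hit`, `HoughHistogram` and `houghFill`, the row-major bin layout and the r range are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hit type: position (x, y) plus the offset rho that enters the line equation.
struct Hit { double x, y, rho; };

// Hypothetical 2D histogram over (angle, r), stored row-major as bins[angleBin * nR + rBin].
struct HoughHistogram {
    std::size_t nAngles, nR;
    double rMin, rMax;
    std::vector<unsigned> bins;
};

// For every hit i and every angle alpha_j in [0, 360 deg), evaluate
//   r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i
// and increment the matching histogram bin.
HoughHistogram houghFill(const std::vector<Hit>& hits, std::size_t nAngles,
                         std::size_t nR, double rMin, double rMax) {
    assert(nAngles > 0 && nR > 0 && rMax > rMin);
    HoughHistogram h{nAngles, nR, rMin, rMax, std::vector<unsigned>(nAngles * nR, 0u)};
    const double pi = std::acos(-1.0);
    for (const Hit& hit : hits) {
        for (std::size_t j = 0; j < nAngles; ++j) {
            const double alpha = 2.0 * pi * static_cast<double>(j) / static_cast<double>(nAngles);
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y + hit.rho;
            if (r < rMin || r >= rMax) continue;  // outside the histogram range
            std::size_t ir = static_cast<std::size_t>((r - rMin) / (rMax - rMin) * static_cast<double>(nR));
            if (ir >= nR) ir = nR - 1;            // guard against rounding at the upper edge
            ++h.bins[j * nR + ir];
        }
    }
    return h;
}
```

    With 1800 angle steps (one every 0.2°) and about 100 hits, the two loops evaluate the 180 000 values of r_ij quoted on the slide. Every (i, j) pair is independent of the others, which is what makes the step a natural fit for a GPU.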

  • Algorithm: Hough Transform

    [Figure: Hough-transformed hits, r versus angle (0° to 180°), filled on a 1800 × 1800 grid from 68 (x,y) points of the PANDA STT+MVD]

  • Algorithm: Hough Transform, Two Implementations

    Thrust
    Performance: 3 ms/event
    Independent of granularity
    Reduced to set of standard routines
    Fast (uses Thrust's optimized algorithms)
    Inflexible (has its limits, hard to customize)
    No peak finding included
    Even possible?
    Adds to time!

    Plain CUDA
    Performance: 0.5 ms/event
    Built completely for this task
    Fitting to every problem
    Customizable
    A bit more complicated at parts
    Simple peak finder implemented (threshold)
    Using: Dynamic Parallelism, Shared Memory
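
    A threshold peak finder of the kind mentioned for the plain CUDA version can be sketched in a few lines. This is a serial illustration under assumptions: the histogram is taken to be a flat row-major array, and `Peak` and `thresholdPeaks` are hypothetical names, not the talk's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: bin coordinates and content of one accepted bin.
struct Peak { std::size_t angleBin, rBin; unsigned count; };

// Report every bin whose content reaches the threshold.
// bins is assumed to be row-major: bins[angleBin * nR + rBin].
std::vector<Peak> thresholdPeaks(const std::vector<unsigned>& bins,
                                 std::size_t nAngles, std::size_t nR,
                                 unsigned threshold) {
    assert(bins.size() == nAngles * nR);
    std::vector<Peak> peaks;
    for (std::size_t a = 0; a < nAngles; ++a) {
        for (std::size_t r = 0; r < nR; ++r) {
            const unsigned c = bins[a * nR + r];
            if (c >= threshold) peaks.push_back({a, r, c});
        }
    }
    return peaks;
}
```

    A plain threshold is cheap but can return several neighbouring bins for one track, so a real peak finder would usually add some clustering or local-maximum step.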

  • Algorithm: Riemann Track Finder

    Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    Create seeds: all possible three-hit combinations
    Grow seeds to tracks: continuously test whether the next hit fits
    Use mapping to Riemann paraboloid
    Summer student project (J. Timcheck)
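
    The mapping to the paraboloid can be illustrated with a small sketch. Lifting a hit (x, y) to (x, y, x² + y²) turns every circle in the plane into a flat plane in 3D, so testing whether a further hit belongs to a seed reduces to a point-to-plane distance. The function names are hypothetical, and the plane is taken exactly through the three seed hits here, whereas a real track finder would fit it.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Lift a 2D hit onto the paraboloid z = x^2 + y^2.
Vec3 liftToParaboloid(double x, double y) { return {x, y, x * x + y * y}; }

// Plane in Hesse normal form: n . p + d = 0 with |n| = 1.
struct Plane { Vec3 n; double d; };

// Plane through the three lifted hits of a seed (no fit, just the cross product).
Plane planeThroughSeed(const Vec3& a, const Vec3& b, const Vec3& c) {
    const Vec3 u{b.x - a.x, b.y - a.y, b.z - a.z};
    const Vec3 v{c.x - a.x, c.y - a.y, c.z - a.z};
    Vec3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    const double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(len > 0.0);  // the three seed hits must be distinct
    n = {n.x / len, n.y / len, n.z / len};
    return {n, -(n.x * a.x + n.y * a.y + n.z * a.z)};
}

// Distance of a lifted candidate hit from the seed plane; a small value means "it fits".
double distanceToPlane(const Plane& p, const Vec3& q) {
    return std::fabs(p.n.x * q.x + p.n.y * q.y + p.n.z * q.z + p.d);
}
```

    For example, the hits (0,0), (2,0) and (1,1) lie on the circle of radius 1 around (1,0). The lifted hit (1,-1) then has distance zero from the seed plane, while (3,3) lies well away from it.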

  • Algorithm: Riemann Track Finder

    GPU optimization: unfolding loops

    for () { for () { for () {} } }
    int ijk = threadIdx.x + blockIdx.x * blockDim.x;

    $$ n_{\mathrm{Layer}x} = \frac{1}{2}\left(\sqrt{8x+1}-1\right) $$

    $$ \mathrm{pos}(n_{\mathrm{Layer}x}) = \frac{\sqrt[3]{\sqrt{3}\,\sqrt{243x^{2}-1}+27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\,\sqrt{243x^{2}-1}+27x}} - 1 $$

    → 100× faster than CPU version
    Time for one event (Tesla K20X): ~0.6 ms
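
    The idea of unfolding the three nested seed loops into one flat index can be sketched in host-side C++. Here `x` stands in for the per-thread index `ijk`, and `unrankTriple` is a hypothetical name. Instead of the closed-form inverses of the triangular and tetrahedral numbers, this sketch uses a floating-point estimate followed by an integer correction, a swapped-in choice that sidesteps rounding problems.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Triple { std::uint64_t i, j, k; };  // hit indices with i < j < k

// Binomial coefficients C(n,2) and C(n,3).
std::uint64_t choose2(std::uint64_t n) { return n < 2 ? 0 : n * (n - 1) / 2; }
std::uint64_t choose3(std::uint64_t n) { return n < 3 ? 0 : n * (n - 1) * (n - 2) / 6; }

// Map a flat index x = 0, 1, 2, ... to the x-th three-hit combination, in the same
// order as: for k in 2..; for j in 1..k-1; for i in 0..j-1.
Triple unrankTriple(std::uint64_t x) {
    // Largest k with C(k,3) <= x: cube-root estimate, then exact correction.
    std::uint64_t k = static_cast<std::uint64_t>(std::cbrt(6.0 * static_cast<double>(x))) + 1;
    while (choose3(k + 1) <= x) ++k;
    while (choose3(k) > x) --k;
    x -= choose3(k);
    // Largest j with C(j,2) <= remainder: square-root estimate, then exact correction.
    std::uint64_t j = static_cast<std::uint64_t>((std::sqrt(8.0 * static_cast<double>(x) + 1.0) + 1.0) / 2.0);
    while (choose2(j + 1) <= x) ++j;
    while (choose2(j) > x) --j;
    const std::uint64_t i = x - choose2(j);
    assert(i < j && j < k);
    return {i, j, k};
}
```

    In a kernel, each thread would call such a function on its own flat index and process exactly one seed, so all threads carry the same amount of work and no thread has to run the nested loops itself.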

  • Algorithm: Triplet Finder

    Idea: use only a sub-set of the detector as seed
    Combine 3 hits to a Triplet
    Calculate circle from 3 Triplets (no fit)

    Features
    Tailored for PANDA
    Fast & robust algorithm, no t0
    Ported to GPU together with NVIDIA Application Lab
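
    Calculating a circle from three points without a fit has a closed-form solution, sketched below. This is the generic circumcircle formula with hypothetical names, and it treats each Triplet as a single 2D point. How the Triplet Finder actually forms those points is not shown on the slide.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };
struct Circle { double cx, cy, r; };

// Circle through three points in closed form. Returns false if the points are
// (nearly) collinear, in which case no finite circle exists.
bool circleFromThreePoints(const Point& a, const Point& b, const Point& c,
                           Circle& out, double eps = 1e-12) {
    assert(eps > 0.0);
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < eps) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    out.cx = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    out.cy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out.r = std::hypot(a.x - out.cx, a.y - out.cy);
    return true;
}
```

    Because the result is a handful of multiplications and one division, every candidate combination can be evaluated by its own thread with no iteration.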

  • Triplet Finder: Time

  • Triplet Finder: Optimizations

    Bunching Wrapper
    Hits from one event have similar timestamps
    Combine hits to sets (bunches) which fill up the GPU best

    [Diagram: hits grouped into events, events grouped into bunches]

    O(N²) → O(N)
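
    The effect of bunching on the amount of work can be illustrated with a small sketch. It assumes the hits arrive sorted by timestamp and cuts them into bunches with a fixed time window. That is a simplification, since the slide describes bunches chosen to fill the GPU best. The names `Hit`, `Bunch`, `makeBunches` and `pairTests` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Hit { double t; };                  // only the timestamp matters for bunching
struct Bunch { std::size_t begin, end; };  // half-open index range into the hit array

// Cut a time-sorted hit stream into bunches: a new bunch starts as soon as a hit
// lies at least `window` later than the first hit of the current bunch.
std::vector<Bunch> makeBunches(const std::vector<Hit>& hits, double window) {
    assert(window > 0.0);
    std::vector<Bunch> bunches;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= hits.size(); ++i) {
        assert(i == hits.size() || hits[i].t >= hits[i - 1].t);  // input must be sorted
        if (i == hits.size() || hits[i].t - hits[begin].t >= window) {
            bunches.push_back({begin, i});
            begin = i;
        }
    }
    return bunches;
}

// Number of hit pairs that have to be compared when pairs are only formed inside a bunch.
std::size_t pairTests(const std::vector<Bunch>& bunches) {
    std::size_t total = 0;
    for (const Bunch& b : bunches) {
        const std::size_t n = b.end - b.begin;
        total += n * (n - 1) / 2;
    }
    return total;
}
```

    Comparing all N hits with each other costs about N²/2 pair tests. With bunches of bounded size b the cost is about (N/b) · b²/2 = N·b/2, which grows only linearly in N. That is the O(N²) → O(N) step noted on the slide.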

  • Triplet Finder: Bunching Performance

  • Triplet Finder: Optimizations

    Compare kernel launch strategies

    [Diagram: three launch strategies, with the CPU and GPU sides marked]
    Dynamic Parallelism: 1 thread/bunch, calling kernel, Triplet Finder (TF Stage #1)
    Joined Kernel: 1 block/bunch, joined kernel, Triplet Finder (TF Stage #1)
    Host Streams: 1 stream/bunch, calling stream, combining stream, Triplet Finder (TF Stage #1)

  • Triplet Finder: Kernel Launches

    Preliminary (in publication)

  • Triplet Finder: Clock Speed / Chipset

    Preliminary (in publication)

    GPU     Memory Clock   Core Clock   GPU Boost
    K40     3004 MHz       745 MHz      875 MHz
    K20X    2600 MHz       732 MHz      784 MHz

  • Summary

    Investigated different tracking algorithms
    Best performance: 20 µs/event
    → Online Tracking a feasible technique for PANDA
    Multi-GPU system needed: O(100) GPUs
    Still much optimization necessary (efficiency)
    Collaboration with NVIDIA Application Lab
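
    The O(100) estimate follows from the event rate and the best per-event time. As a rough sketch, assuming the best-case timing holds for every event and the load spreads evenly over all GPUs:

```latex
\begin{align*}
N_{\mathrm{GPU}} \approx 2\times10^{7}\,\mathrm{events/s}\times 20\,\mu\mathrm{s/event} = 400
\end{align*}
```

    That is a few hundred devices, consistent with the order of magnitude quoted above.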

    Thank you!

    Andreas Herten
    [email protected]