Machine Learning in PandaRoot
GlueX-PANDA Workshop, George Washington University, May 2019
Ralf Kliemt (GSI)
Motivation
• Machine Learning (ML) is about modelling data
• Self-learning algorithms gain knowledge from data to make predictions
Why?
• Gain computation speed in online scenarios
• More precision, e.g. by respecting correlations
• Let algorithms do the tedious tasks of recognising patterns, structures, principal components etc.
Which type of ML?
[Diagram: the PANDA data-processing chain (Sim → Digi → Local Reco → Global Reco → Event Building → Event Selection → Analysis → Storage → Paper), with the raw-data path through the FPGA-based readout, the online/offline/simulation domains, and alignment & calibration feeding in.]
ML Activities at PANDA
[The same processing-chain diagram, here indicating where ML activities at PANDA are ongoing.]
Key Concept
Boosted Decision Tree (BDT)
• Break the data down through a sequence of decisions
• Based on features in the training data
• Splits are chosen to maximise the information gain
→ Typical application: classification for PID
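As an illustration of how such a classifier could be trained, a minimal TMVA sketch in ROOT; the input file pid.root, the tree names, and the branch names (eoverp, dedx, thetaC) are hypothetical placeholders, not actual PandaRoot output:

#include "TFile.h"
#include "TTree.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void trainBDT()
{
  // Hypothetical input: signal and background TTrees with PID observables
  TFile *in  = TFile::Open("pid.root");
  TTree *sig = (TTree*) in->Get("signal");
  TTree *bkg = (TTree*) in->Get("background");

  TFile *out = TFile::Open("tmva_bdt.root", "RECREATE");
  TMVA::Factory    factory("PidBDT", out, "AnalysisType=Classification");
  TMVA::DataLoader loader("dataset");

  loader.AddVariable("eoverp", 'F'); // e.g. EMC E/p
  loader.AddVariable("dedx",   'F'); // e.g. STT dE/dx
  loader.AddVariable("thetaC", 'F'); // e.g. DIRC Cherenkov angle
  loader.AddSignalTree(sig, 1.0);
  loader.AddBackgroundTree(bkg, 1.0);
  loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

  // 200 AdaBoost-ed trees of depth 3; node splits use the Gini separation index
  factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                     "NTrees=200:MaxDepth=3:BoostType=AdaBoost:SeparationType=GiniIndex");
  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();
  out->Close();
}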
Artificial Neural Networks (ANN)
• Transform the data in a meaningful way
• Learn iteratively
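To make "learn iteratively" concrete, a tiny self-contained sketch (plain C++, not PandaRoot code) of a single sigmoid neuron trained by gradient descent; all data values are invented for illustration:

#include <cmath>
#include <cstdio>

int main()
{
  // Toy data: one feature x, label 1 if x > 0.5, else 0
  const double x[4] = {0.1, 0.3, 0.7, 0.9};
  const int    y[4] = {0, 0, 1, 1};
  double w = 0.0, b = 0.0;
  const double rate = 1.0; // learning rate

  for (int epoch = 0; epoch < 1000; ++epoch) {   // iterate: many small corrections
    for (int i = 0; i < 4; ++i) {
      double p    = 1.0 / (1.0 + std::exp(-(w * x[i] + b))); // sigmoid output
      double grad = p - y[i];  // gradient of the cross-entropy loss
      w -= rate * grad * x[i];
      b -= rate * grad;
    }
  }
  for (int i = 0; i < 4; ++i)
    std::printf("x = %.1f -> p = %.3f (label %d)\n",
                x[i], 1.0 / (1.0 + std::exp(-(w * x[i] + b))), y[i]);
  return 0;
}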
A lot to choose from…
Programming Frameworks & Packages
• ROOT / TMVA
• NumPy
• TensorFlow (Deep Learning)
• Keras (on top of TensorFlow, with GPU support)
• MLlib (Apache Spark)
• scikit-learn
• PyTorch (Deep Learning)
• DL4J (Deep Learning, Java)
• R implementations
Popular Choices
Machine Learning for Forward Tracking
Artificial Neural Networks, applied to the FTS:
• Create all possible combinations of hit pairs (adjacent layers).
• Train the network to predict whether a hit pair lies on the same track or not.
• Input observables:
  1) Hit-pair positions in the x-z projection (vertical layers).
  2) Drift radii (isochrones).
  3) Distance between the hits.
• Output:
  1) Probability that the hit pair is on the same track.
• Connect hits that pass the probability cut (threshold): e.g. if probability(h1-h2) > threshold and probability(h2-h3) > threshold, then h1, h2, h3 are on the same track (see the sketch below).
Machine Learning for Track Finding at PANDA FTS
Institut für Kernphysik (IKP)
Forschungszentrum Jülich
Waleed Esmail, Tobias Stockmanns, and James Ritman
On Behalf of the PANDA Collaboration
Contribution at “Connecting the Dots 2019”
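A sketch of this linking step; the pairProbability function below is a dummy, distance-based stand-in for the trained network, and the hit coordinates are toy values (plain C++, not the actual implementation from the contribution above):

#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

struct Hit { double x, z; };

// Stand-in for the ANN output P(same track); here: nearby hits score high.
double pairProbability(const Hit &a, const Hit &b)
{
  return std::exp(-std::hypot(a.x - b.x, a.z - b.z));
}

// Union-find root lookup with path compression.
int find(std::vector<int> &parent, int i)
{
  return parent[i] == i ? i : parent[i] = find(parent, parent[i]);
}

int main()
{
  // Hits grouped by detector layer (toy coordinates)
  std::vector<std::vector<Hit>> layers = {
      {{0.0, 0.0}, {5.0, 0.0}}, {{0.2, 0.5}, {5.1, 0.5}}, {{0.4, 1.0}}};
  const double threshold = 0.5;

  // Flatten hits, remembering each hit's layer
  std::vector<Hit> hits;
  std::vector<int> layerOf;
  for (std::size_t l = 0; l < layers.size(); ++l)
    for (const Hit &h : layers[l]) { hits.push_back(h); layerOf.push_back((int)l); }

  // Link hits in adjacent layers whose pair probability passes the cut
  std::vector<int> parent(hits.size());
  std::iota(parent.begin(), parent.end(), 0);
  for (int i = 0; i < (int)hits.size(); ++i)
    for (int j = i + 1; j < (int)hits.size(); ++j)
      if (layerOf[j] == layerOf[i] + 1 &&
          pairProbability(hits[i], hits[j]) > threshold)
        parent[find(parent, j)] = find(parent, i); // same track candidate

  for (int i = 0; i < (int)hits.size(); ++i)
    std::printf("hit %d -> track candidate %d\n", i, find(parent, i));
  return 0;
}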
[Figure: FTS layer groups: A) first layers, B) middle layers inside the magnet, C) last layers]
ANN Tracking: Pattern Recognition with parallel layers
ANN Tracking: Residuals including skewed layers
First promising results inside the magnetic field.
RNN (LSTM) Tracking: Residuals including skewed layers
Next Step: Track Fitting with RNN - stay tuned
Machine Learning for Particle Identification
PID: Usual Observables
[Six panels: EMC E/p vs. p; DRC θC vs. p; DSC θC vs. p; STT dE/dx vs. p; MUO Liron vs. p (μ); MUO Liron vs. p (non-μ).]
Figure 3: PID raw detector info for EMC, DIRC, DISC, STT and MUO. Distributions are superposed for all particle species (electrons, muons, pions, kaons, protons) as a function of momentum.
[Eight panels: P(e), P(π), P(K), P(p) vs. p, for the correct species (upper row: electrons, pions, kaons, protons) and the incorrect species (lower row: non-electrons, non-pions, non-kaons, non-protons).]
Figure 4: Graphical representation of the combined PID likelihood values (detectors: EMC, STT, DRC, DSC, MUO) for electrons, pions, kaons and protons as a function of particle momentum. The plots in the upper row show the distributions for the correct particle type, the lower row for the incorrect type. It can clearly be seen that for the correct type the distributions tend to higher likelihood values, while for the incorrect type they accumulate around P = 0. The PID preselection for the studies in this note was chosen to be P > 0.1, a very loose veto against wrong particle types.
(charged particles only)
PID Approaches
Bayes:

$$P(h \mid \vec{x}) = \frac{\mathcal{L}(\vec{x} \mid h)\, P(h)}{\sum_{h'=e,\mu,\pi,K,p} \mathcal{L}(\vec{x} \mid h')\, P(h')}$$

Combination of measurements:

$$\mathcal{L}(\vec{x} \mid h) = \prod_{k} p_k(\vec{x} \mid h), \qquad k = \mathrm{MVD}\ dE/dx,\ \mathrm{DIRC}\ \theta_C,\ \ldots$$

Probability that a given track with parameters x⃗ corresponds to particle type h.
Machine Learning:
A. Boosted Decision Tree (BDT)
B. "Deep Learning" Artificial Neural Network (ANN)
→ gain performance by taking correlations into account
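As a worked example of the Bayes combination above, a short sketch; the per-detector pdf values p_k(x⃗|h) would in practice be looked up from distributions like those in Fig. 3, and all numbers here are invented:

#include <cstdio>

int main()
{
  // Hypotheses h = e, mu, pi, K, p; two detectors k (e.g. MVD dE/dx, DIRC thetaC)
  const char *names[5] = {"e", "mu", "pi", "K", "p"};
  const double pdf[5][2] = {{0.02, 0.05},   // p_k(x|h), invented values
                            {0.10, 0.20},
                            {0.60, 0.55},
                            {0.20, 0.15},
                            {0.08, 0.05}};
  const double prior[5] = {0.2, 0.2, 0.2, 0.2, 0.2}; // flat priors P(h)

  double post[5], norm = 0.0;
  for (int h = 0; h < 5; ++h) {
    double L = 1.0;                        // L(x|h) = product over detectors k
    for (int k = 0; k < 2; ++k) L *= pdf[h][k];
    post[h] = L * prior[h];                // numerator of Bayes' rule
    norm += post[h];                       // denominator: sum over hypotheses
  }
  for (int h = 0; h < 5; ++h)
    std::printf("P(%s|x) = %.3f\n", names[h], post[h] / norm);
  return 0;
}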
Input Features
Input:
• Simulated reaction: p̄p → X Y, where X and Y = e∓, π∓, μ∓, K∓, p/p̄
• Beam momentum: 15 GeV/c
Performance Plots
[Panels: Boosted Decision Tree (left) vs. Artificial Neural Network (right)]

Confusion Matrices
[Panels: Boosted Decision Tree (left) vs. Artificial Neural Network (right)]

Pions
[Panels: Boosted Decision Tree (left) vs. Artificial Neural Network (right)]

Kaons
[Panels: Boosted Decision Tree (left) vs. Artificial Neural Network (right)]
Machine Learning for the Software Trigger
Expected Data Rates
• PANDA will run with a continuous beam
• Event rates will be high; some events will overlap
• Storage constraints in size & bandwidth
• The data rate has to be reduced to 1/1000
• No dedicated hardware trigger is possible
• The event topology of signals is similar to the background
→ no "jets" or similarly obvious features

Solution: an online physics filter (the "software trigger")
Event Generation (signal, background)
→ Simulation & Reconstruction
→ Event Filtering (combinatorics; mass window selection; trigger-specific selection → event tagging)
→ Global Trigger Tag
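A sketch of the mass-window tagging with the logical-OR trigger decision; the channel list, masses, and window widths here are illustrative only (the actual study uses 8σ mass windows per trigger line):

#include <cmath>
#include <cstdio>
#include <vector>

// One trigger line = a mass window (m0 +- halfWidth); the event is kept
// if ANY line tags it (logical OR), cf. the trigger scheme above.
struct TriggerLine { const char *name; double m0, halfWidth; };

bool tagsEvent(const TriggerLine &t, const std::vector<double> &candMasses)
{
  for (double m : candMasses)
    if (std::fabs(m - t.m0) < t.halfWidth) return true;
  return false;
}

int main()
{
  std::vector<TriggerLine> lines = {
      {"J/psi -> mu+ mu-", 3.097, 0.10},   // GeV/c^2, widths illustrative
      {"D0 -> K- pi+",     1.865, 0.08},
      {"phi -> K+ K-",     1.020, 0.03}};

  // Invariant masses of the combinatorial candidates in one event (toy values)
  std::vector<double> candMasses = {1.02, 1.87, 2.45};

  bool keep = false;
  for (const TriggerLine &t : lines) {
    bool tag = tagsEvent(t, candMasses);
    if (tag) std::printf("tagged by %s\n", t.name);
    keep = keep || tag;                    // global trigger tag = OR of all lines
  }
  std::printf("event %s\n", keep ? "kept" : "rejected");
  return 0;
}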
Present status of the PANDA software trigger (note, March 5, 2014)

Abstract: This note presents the current status of the PANDA software trigger project. Apart from the present results obtained from Monte Carlo simulated events for various PANDA physics channels of interest, the task is defined and intersections with the DAQ and detector projects are pointed out.
[Diagram: trigger scheme: events from EvtGen (physics channels 1 … m) and DPM (background) are passed through Toy MC or Full MC, evaluated by trigger lines 1 … n, and combined into a trigger decision (logical OR).]
Software trigger: reduce the background to 1/1000.
Software Trigger - Cuts
• Cut-based approach, optimised on signal & background MC
• Many observables taken into account
• Correlations are not respected
[Figure 9: Mass window cuts (ToyMC): simultaneous tagging for the ηc → KS K+ π− dataset at √s = 5.5 GeV. Per-line signal efficiencies ε after the mass window cuts: p̄p → e+e− 0.0%, φ → K+K− 0.3%, ηc → KS K+ π− 72.5%, J/ψ → e+e− 0.1%, J/ψ → μ+μ− 0.2%, D0 → K− π+ 77.2%, D+ → K− π+ π+ 45.7%, Ds+ → K+ K− π+ 6.4%, Λ → p π− 2.1%, Λc+ → p K− π+ 3.2%; the global efficiency for all 10 channels applied simultaneously is εt = 89.1%. For discussion see text.]
[Figure 10: Mass window cuts (ToyMC): simultaneous tagging for the DPM background dataset at √s = 5.5 GeV. Per-line efficiencies ε after the mass window cuts: p̄p → e+e− 0.0%, φ → K+K− 0.8%, ηc → KS K+ π− 3.4%, J/ψ → e+e− 0.0%, J/ψ → μ+μ− 0.0%, D0 → K− π+ 7.2%, D+ → K− π+ π+ 8.6%, Ds+ → K+ K− π+ 2.8%, Λ → p π− 7.6%, Λc+ → p K− π+ 5.2%; the global efficiency for all 10 channels applied simultaneously is εt = 21.9%. For discussion see text.]
Note that due to combinatorics, the 50k input events may result in a larger number of entries in the histograms, about a factor of three larger in this example. The other trigger lines cross-tag the channel at hand at various rates: e.g. the e+e− trigger accepts no event of this dataset, so its efficiency is ε = 0.0%, whereas ε = 43.6% of the events are accepted by the φ-trigger, and so on. In total, the 10 trigger lines tag εt = 90.4% of the events of the Ds+ data.

For the second example, the ηc dataset (Fig. 9), the 8σ mass cut applied on the ηc mass for the ηc → KS K+ π− trigger results in an efficiency ε = 72.5%. Here, too, the e+e− trigger does not accept any event of this dataset (ε = 0.0%), and e.g. ε = 0.3% of the events are accepted by the φ-tag, and so on. The total efficiency of the 10 simultaneous trigger lines for the ηc data is εt = 89.1%.

In the case of the DPM dataset (Fig. 10), applying the 8σ mass cuts for all 10 trigger lines results in a total efficiency εt = 21.9% (e.g. ε = 2.8% for the Ds+ tag, ε = 3.4% for the ηc tag, ε = 0.0% for …
2014
Software Trigger - Cuts
[Figure 17: FullMC: summary of signal (left) and background (right) efficiencies after mass window cuts only, vs. √s = 2–6 GeV, for the channels e+e−, φ, ηc, J/ψ(2e), J/ψ(2μ), D0, D±, Ds, Λ, Λc.]

[Figure 18: FullMC: summary of signal and background efficiencies after all cuts (mass window cuts plus further cuts optimised for signal efficiency).]

[Figure 19: FullMC: summary of signal and background efficiencies after all cuts (mass window cuts plus further cuts optimised for background suppression).]
With the optimisation for background suppression, the total simultaneous trigger efficiencies obtained are εt = 14.5% (Ds+ data set), εt = 6.4% (ηc data set) and, "by definition", εt = 0.1% (DPM data set), respectively (Fig. 29). Again, these efficiency values are summarised for all data sets at all five p̄p centre-of-mass energies under study in Tab. 8 and Tab. 9. The full information on the results of each individual trigger line for each data set is given for completeness in Tab. 23 and Tab. 24, respectively, in the Appendix (Sec. 8).

All the signal and background efficiencies obtained after the further cuts for the two optimisation approaches (Tab. 8 and Tab. 9) are in addition graphically compiled in Fig. 18 and Fig. 19.

While in the case of signal-efficiency-optimised cuts a significantly improved background suppression (factors roughly between 4 and 25) can be achieved, the signal efficiencies are basically …
• Trade-off between signal efficiency & background suppression
• Many channels = much feedthrough & cross-tagging
Goal: BG suppression 1/1000
2014
Software Trigger - TMVA
• First studies with many algorithms
• Dependence on the offered observables
→ output performance
→ calculation speed
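For the application step, a trained TMVA method is evaluated event by event through TMVA::Reader; a sketch, with the variable names and the weight-file path as placeholders (the weight file matches the hypothetical training sketch shown earlier):

#include <cstdio>
#include "TMVA/Reader.h"

void applyMVA()
{
  Float_t var1 = 0.f, var2 = 0.f;        // must be Float_t, bound by address
  TMVA::Reader reader("!Color:Silent");
  reader.AddVariable("var1", &var1);
  reader.AddVariable("var2", &var2);
  reader.BookMVA("BDT", "dataset/weights/PidBDT_BDT.weights.xml");

  // In the event loop: fill var1/var2 from the event, then evaluate
  var1 = 0.3f; var2 = 1.2f;              // example values
  double response = reader.EvaluateMVA("BDT");
  std::printf("BDT response = %.3f\n", response);
}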
[Figure 23: CFMlpANN with primary event-shape variables only (FullMC): summary of signal (left) and background (right) efficiencies after mass window cuts and the CFMlpANN selection, vs. √s = 2–6 GeV.]
Approach 3: CFMlpANN with primary event-shape variables. The third TMVA approach is based on the primary event-shape variables only. The number of input variables is fixed at 28 for each trigger category. To test the TMVA performance of global event-shape variables, no secondary variables from each resonance (such as p or pt) are required, so the variable list differs slightly from approach 1. A few more variables, such as the number of hits in the trackers and the mean scattering angle of all charged particles, are additionally introduced to improve the TMVA performance. In this case all generated events can be put into the training, so no MC-truth-matched samples are necessary at all; this has the advantage of reducing the statistical uncertainty coming from the size of the training samples. Again, all introduced variables are listed in Table ?? (Appendix Sec. 8). In Fig. 25, training responses and discriminator distributions for the approach with the few best variables are plotted for the 9 tagging categories at E = 5.5 GeV. A cut value around 0.5 should be suitable to separate signal and background for every category.

For the approach with global event-shape variables, the background reduction is slightly worse than for TMVA approaches 1 or 2: at E = 5.5 GeV the accepted background increases up to εt = 5.74%. However, the signal efficiencies are much higher than in both approaches 1 and 2. A special feature of the approach with only global event-shape variables is the enhanced signal-tagging efficiency for the ηc and Λc data, which recovers to εt = 33.64% and εt = 42.08%, respectively. Efficiencies and background reduction for TMVA with primary event-shape variables at all five centre-of-mass energies are summarised in Fig. 23 and Table 12. The full information on these results for each individual trigger line and each data set is given for completeness in Table 30.
Approach 4: Systematics and summary of the TMVA application. As a systematic check, another well-known non-linear classifier, the Boosted Decision Tree (BDT), has been tested. Several different algorithms for boosted classifiers exist; here a version of the adaptive boosted tree is taken, namely BDTD with variable transformation. To reduce the correlations among the variables for the boosted algorithms, it is suggested to transform all input variables into more appropriate shapes in advance. This preprocessing transformation may lead to better performance for the BDT method and reduce the training time.
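In TMVA, such a decorrelating preprocessing can be requested via the VarTransform option when booking the method; a sketch, reusing the factory and loader from the earlier training example, with the other option values illustrative:

#include "TMVA/DataLoader.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

// Book an AdaBoost BDT on decorrelated inputs ("BDTD"): TMVA applies a linear
// decorrelation transform to the input variables before the trees are grown.
void bookBDTD(TMVA::Factory &factory, TMVA::DataLoader &loader)
{
  factory.BookMethod(&loader, TMVA::Types::kBDT, "BDTD",
                     "NTrees=400:MaxDepth=3:BoostType=AdaBoost:VarTransform=Decorrelate");
}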
Table 12: CFMlpANN with primary event-shape variables only (FullMC): summary of the total simultaneous trigger efficiencies εt [%] for the different data sets, after mass window cuts and the CFMlpANN selection.

√s (GeV)   e+e−    φ      ηc     J/ψ(ee)  J/ψ(μμ)  D0     D±     Ds     Λ      Λc     DPM
2.4        50.38   34.35  -      -        -        -      -      -      18.68  -      0.81
3.77       42.36   41.29  33.64  39.48    54.76    44.90  32.57  -      19.01  -      1.58
4.5        44.22   40.06  45.38  37.59    53.96    50.51  43.45  45.05  18.88  -      2.70
5.5        38.84   37.43  50.82  42.80    57.08    52.44  47.60  51.05  19.81  42.08  5.74
CFMlpANN
Goal: BG suppression 1/1000
[Figure 30: TMVA overtraining checks (normalised classifier response for signal and background, test vs. training samples) for nine classifiers applied to the J/ψ selection in J/ψ → l+l− π+π− events at E = 5.5 GeV. Kolmogorov-Smirnov signal (background) probabilities: FDA_GA 0.321 (0.199), Fisher 0.443 (0.709), RuleFit 0.4 (0.919), KNN 0.671 (0.473), Likelihood 1 (0.712), MLP 0.832 (0.976), BDT 0.0238 (0.000372), SVM 0.654 (0.265), TMlpANN 0.073 (0.0268).]
[Figure 31: ROC curves (background rejection vs. signal efficiency) for the different algorithms of the J/ψ classification in J/ψ → l+l− π+π− events at E = 5.5 GeV; methods: BDT, SVM, RuleFit, MLP, KNN, TMlpANN, Likelihood, FDA_GA, Fisher.]
2014
Software Trigger on GPU?
FastSim - Principal Component Analysis - no GPU yet
Reaction: p̄p → D+ D− → (K− π+ π+) D−(incl.) (& c.c.)
                     Signal   Background
Total events          24713      52180
True selection        23555      50350
False selection        1158       1830
True selection rate   0.953      0.965
MC input
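ROOT provides a principal component analysis class, TPrincipal, which could serve such a study; a minimal sketch on correlated toy data (not the actual trigger code):

#include <cstdio>
#include "TPrincipal.h"
#include "TRandom3.h"

// PCA with ROOT's TPrincipal: feed rows of observables, diagonalise the
// covariance matrix, then project data onto the principal components.
void pcaSketch()
{
  const int nVar = 3;
  TPrincipal pca(nVar, "ND"); // N: normalise covariance, D: store the data
  TRandom3 rng(4357);

  for (int i = 0; i < 1000; ++i) {
    double x = rng.Gaus(0, 1);
    // Correlated toy observables
    Double_t row[nVar] = {x, 0.8 * x + rng.Gaus(0, 0.2), rng.Gaus(0, 1)};
    pca.AddRow(row);
  }
  pca.MakePrincipals(); // compute eigenvalues/eigenvectors of the covariance
  pca.Print("MSE");     // means, sigmas, eigenvalues

  Double_t point[nVar] = {1.0, 0.8, 0.0}, proj[nVar];
  pca.X2P(point, proj); // project one point onto the principal components
  std::printf("first principal component = %.3f\n", proj[0]);
}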
Software Trigger on GPU
FullSim - Artificial Neural Network - training on a GTX 1080 Ti
Reaction: p̄p → D+ D− → (K− π+ π+) D−(incl.) (& c.c.)
MC input. All mass spectra normalised to 1.
Summary
• Machine learning sees a comeback in physics
• Many available libraries with fresh concepts

• ML algorithms under development:
  • Forward tracking
  • Charged PID
  • Software trigger

• Parts with potential to benefit from ML:
  • EMC cluster-shape analysis
  • Event building
  • Physics analysis
Thanks for your attention.
PandaRoot Communication
• Code Repository: pandaatfair.githost.io
• Issue tracker, including discussions
• Wiki page: panda-wiki.gsi.de/foswiki/bin/view/Computing/PandaRoot
• Forums: forum.gsi.de & Slack: pandaroot.slack.com
• Bi-weekly online meetings: Thu. 10-11
• Dashboard:
https://cdash.gsi.de/index.php?project=PandaRoot
TString inputGenerator = "psi2s_Jpsi2pi_Jpsi_mumu.dec"; // "dpm", "ftf", or e.g. "box:type(211,1):p(1,1):tht(10,120):phi(0,360)"

PndMasterRunSim *fRun = new PndMasterRunSim();
fRun->SetInput(inputGenerator);
fRun->SetName("TGeant4");
fRun->SetOptions("");
fRun->SetParamAsciiFile("all.par");
fRun->SetBeamMom(7.0);
fRun->SetStoreTraj(kTRUE);

fRun->Setup("evtcomplete"); // file name prefix
fRun->CreateGeometry();
fRun->SetGenerator();
fRun->AddSimTasks();

fRun->Init();
fRun->Run(1000); // nEvents
fRun->Finish();
e.g. in macro/master
Short ROOT macros to start simulations
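Such a macro can then be run in batch mode, e.g. (assuming it is saved as sim_complete.C; the file name here is illustrative):

root -l -b -q sim_complete.C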