A neural network approach to high energy cosmic rays mass identification at the Pierre Auger Observatory
S. Riggi, R. Caruso, A. Insolia, M. Scuderi
Department of Physics and Astronomy, University of Catania
INFN, Section of Catania
ACAT 2007, Amsterdam - April 23-27, 2007
Open questions in UHECR physics
• Origin and nature of the cosmic radiation at the highest energy (AGNs? GRBs? Pulsars? Exotic scenarios?...)
• Cutoff or no cutoff?
3 principal research fields, interconnected with each other:
Energy spectra and mass composition
Propagation through galactic and intergalactic medium
Arrival direction and anisotropies
Why study mass composition?
• Discrimination between different models advanced to explain the origin of cosmic rays (different energy spectra are predicted to be observed at ground from model to model, according to the mass of the primary)
Importance of event-by-event mass analysis:
• Study possible correlations between the mass of the event and the arrival direction at ground
• Correct the reconstructed energy of the shower with the right missing energy factor (reduce systematic uncertainties in the measurement of the energy)
How to study mass composition?
• Indirect methods
Need some shower observables sensitive to the primary mass
Need to rely on simulation codes and parameterizations of the interactions in the low and high energy regime
Heavy nuclei-induced cascades develop faster in the atmosphere than light nuclei-induced ones (at the same energy and zenith), due to their higher interaction cross section with air. This behaviour results in a set of mass-discriminating parameters:
• Longitudinal shower profiles (number of particles in the cascade vs atmospheric depth)
Shifts of 100 g/cm² in the depth at which the cascade has its maximum
• Number of muons and electrons at a given distance from the shower core (usually 1000 m)
Fewer muons in a proton shower than in an iron one.
How to study mass composition?
Other parameters have been used so far: steepness of the lateral distribution function, rise time of the signals in ground detectors, shower curvature parameters, …
How to study mass composition?
Mass identification… a very difficult task:
• No single parameter shows a strong correlation with the mass
• Correlation with the mass is reduced by intrinsic shower-to-shower fluctuations and by detector response
• In any case, any prediction is always extremely dependent on the adopted interaction model
Combine different observables to perform a multidimensional analysis
Event-by-event case in a multicomponent primary flux is prohibitive.
The Pierre Auger Experiment
Auger Sud (Malargue, Argentina)
• 1600 Cherenkov detectors
• 4 fluorescence sites (6 telescopes each)
• Tank spacing: 1.5 km
100% efficiency above 10^18.5 eV
Extension: 3000 km²
Current status of Auger Sud
SD: about 1164 tanks running, to be completed at the end of 2007
FD: completed
Auger North (Lamar, USA)
• Still in project phase
Experimental techniques
Surface Detection
• Cherenkov detectors
• Shower front observation at ground
• 100% duty cycle
Fluorescence Detection
• Telescope with a PMT camera
• Fluorescence light observation in atmosphere
• 10% duty cycle
Hybrid Detection
Calorimetric energy calibration (FD) + high event collecting power (SD)
Cross-check between the two techniques
Mass Analysis
Outline: simulation strategy, parameters sensitive to the primary mass, neural network application
Simulation strategy
Data sets: 36000 protons, 34000 helium nuclei, 29000 oxygen nuclei, 32000 silicon nuclei, 29000 iron nuclei
Simulation code: CONEX 1.4 (1-dimensional shower simulation, appropriate for FD analysis)
Hadronic interaction model: QGSJET II-03
Energy range: 10^18-10^19 eV
Zenith range: 0-60 degrees
Uniform distributions
Mass Analysis
Heavy nuclei-induced cascades develop faster in the atmosphere than light nuclei-induced ones. The longitudinal profiles, measurable with the FD, could show this behaviour.
7 features as NN inputs:
• p10, p50, p90: depths at which 10%, 50%, 90% of the integral profile are reached;
• Xmax: depth of shower maximum;
• Nmax: number of charged particles at shower maximum;
• E, θ: primary energy and zenith angle.
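The profile-based inputs can be illustrated with a minimal sketch. It assumes a longitudinal profile sampled as lists of depths and particle numbers; the function name `profile_features`, the trapezoidal integration and the linear interpolation are illustrative choices, not the analysis code used for the talk, and the toy numbers are hypothetical.

```python
def profile_features(depths, n_particles):
    """Return (p10, p50, p90, xmax, nmax) for a sampled longitudinal profile.

    depths: increasing atmospheric depths (g/cm^2)
    n_particles: number of charged particles at each depth
    """
    # Shower maximum: the sampled point with the largest particle number.
    i_max = max(range(len(n_particles)), key=lambda i: n_particles[i])
    xmax, nmax = depths[i_max], n_particles[i_max]

    # Cumulative integral of the profile (trapezoidal rule).
    cumulative = [0.0]
    for k in range(1, len(depths)):
        step = 0.5 * (n_particles[k] + n_particles[k - 1]) * (depths[k] - depths[k - 1])
        cumulative.append(cumulative[-1] + step)
    total = cumulative[-1]

    def depth_at_fraction(frac):
        # Depth where the integral reaches frac * total (linear interpolation).
        target = frac * total
        for k in range(1, len(cumulative)):
            if cumulative[k] >= target:
                span = cumulative[k] - cumulative[k - 1]
                t = 0.0 if span == 0 else (target - cumulative[k - 1]) / span
                return depths[k - 1] + t * (depths[k] - depths[k - 1])
        return depths[-1]

    p10, p50, p90 = (depth_at_fraction(f) for f in (0.10, 0.50, 0.90))
    return p10, p50, p90, xmax, nmax


# Toy triangular profile (hypothetical numbers, for illustration only).
toy_depths = [0.0, 500.0, 1000.0]
toy_counts = [0.0, 1.0e9, 0.0]
features = profile_features(toy_depths, toy_counts)
```

Together with the primary energy and zenith angle, these five numbers would form the 7-component input vector of one event.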
Mass Analysis
Neural network application
Data sets: 3 input data sets (learn, cross validation, test), with randomly selected patterns
Feature preprocessing: normalization in the range [-1;1]
Error function: Mean Square Error
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - t_i\right)^2$$
Learning algorithm: quasi-Newton with BFGS minimization formula
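The preprocessing steps and the error function can be sketched as follows. This is a minimal illustration under stated assumptions: y_i is taken to be the network output and t_i the target for pattern i, the split fractions (50/25/25) are hypothetical since the talk does not give them, and the helper names are invented for the example.

```python
import random


def split_patterns(patterns, fractions=(0.5, 0.25, 0.25), seed=0):
    """Randomly split patterns into learn, cross validation and test sets.

    The fractions are an assumption made for this example.
    """
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    n_learn = int(fractions[0] * len(shuffled))
    n_cv = int(fractions[1] * len(shuffled))
    return (shuffled[:n_learn],
            shuffled[n_learn:n_learn + n_cv],
            shuffled[n_learn + n_cv:])


def normalize_feature(values):
    """Linearly rescale a list of values to the range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map to the centre
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def mean_square_error(outputs, targets):
    """MSE = (1/N) * sum_i (y_i - t_i)^2."""
    return sum((y - t) ** 2 for y, t in zip(outputs, targets)) / len(outputs)


learn, cross_validation, test = split_patterns(range(100))

# Hypothetical Xmax values in g/cm^2, for illustration only.
scaled = normalize_feature([650.0, 700.0, 750.0])

# Hypothetical network outputs vs. targets (0 = proton, 4 = iron).
mse = mean_square_error([0.0, 3.0, 4.0], [0.0, 4.0, 4.0])
```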
Mass Analysis
Net architecture:
• Optimize the net architecture (neurons per layer, number of hidden layers) for our specific problem;
• Use tanh as activation function in the hidden layers and a linear function in the output layer;
• No appreciable differences with logistic functions.
Identification procedure:
• Train the network to assign 0, 1, 2, 3, 4 to proton, helium, oxygen, silicon, iron events;
• Stop the training phase when overfitting appears in the cross validation set;
• Cut over the net outputs to separate the mass classes;
• Estimate the results in terms of identification efficiency and purity.
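The last two steps of the identification procedure can be sketched as below. The cut values are an assumption (half-integers between the targets 0..4; the talk does not state them), and efficiency and purity are taken in their usual sense: efficiency is the fraction of true class-i events identified as i, purity is the fraction of events identified as i that truly belong to class i. All names and numbers are illustrative.

```python
import math


def classify(output, n_classes=5):
    """Map a network output to a class index using cuts at half-integers."""
    # Assumed cuts at 0.5, 1.5, ...; outputs outside the range are clamped.
    index = int(math.floor(output + 0.5))
    return min(max(index, 0), n_classes - 1)


def classification_counts(outputs, true_labels, n_classes=5):
    """counts[rec][true]: events of true class `true` identified as `rec`."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for output, true in zip(outputs, true_labels):
        counts[classify(output, n_classes)][true] += 1
    return counts


def efficiency_and_purity(counts):
    """Per-class efficiency and purity; None where a class has no events."""
    n = len(counts)
    efficiency, purity = [], []
    for i in range(n):
        n_true_i = sum(counts[rec][i] for rec in range(n))
        n_rec_i = sum(counts[i][true] for true in range(n))
        efficiency.append(counts[i][i] / n_true_i if n_true_i else None)
        purity.append(counts[i][i] / n_rec_i if n_rec_i else None)
    return efficiency, purity


# Hypothetical net outputs for three protons (0) and three irons (4).
toy_outputs = [0.1, 0.4, 0.7, 3.8, 4.2, 3.4]
toy_labels = [0, 0, 0, 4, 4, 4]
efficiency, purity = efficiency_and_purity(
    classification_counts(toy_outputs, toy_labels))
```

Normalizing each column of `counts` by the number of true events in that class gives an estimate of a classification matrix of the kind used to determine the mean composition.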
Results – 2 components
NN design: 7-15-15-1
[Figure: efficiency and purity for protons and irons]
Very good identification
Good results even with only one hidden layer
Results – 5 components
[Figure: efficiency and purity for the 5 mass classes]
p/Fe better recognized
Stronger contamination in intermediate components
Determining the mean composition
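Given the classification matrix c_ij, we determine the mean composition of a data sample by solving this linear system:
$$
\begin{aligned}
n_{p}^{rec} &= c_{pp}\,n_{p}^{true} + c_{pHe}\,n_{He}^{true} + c_{pO}\,n_{O}^{true} + c_{pSi}\,n_{Si}^{true} + c_{pFe}\,n_{Fe}^{true}\\
n_{He}^{rec} &= c_{Hep}\,n_{p}^{true} + c_{HeHe}\,n_{He}^{true} + c_{HeO}\,n_{O}^{true} + c_{HeSi}\,n_{Si}^{true} + c_{HeFe}\,n_{Fe}^{true}\\
n_{O}^{rec} &= c_{Op}\,n_{p}^{true} + c_{OHe}\,n_{He}^{true} + c_{OO}\,n_{O}^{true} + c_{OSi}\,n_{Si}^{true} + c_{OFe}\,n_{Fe}^{true}\\
n_{Si}^{rec} &= c_{Sip}\,n_{p}^{true} + c_{SiHe}\,n_{He}^{true} + c_{SiO}\,n_{O}^{true} + c_{SiSi}\,n_{Si}^{true} + c_{SiFe}\,n_{Fe}^{true}\\
n_{Fe}^{rec} &= c_{Fep}\,n_{p}^{true} + c_{FeHe}\,n_{He}^{true} + c_{FeO}\,n_{O}^{true} + c_{FeSi}\,n_{Si}^{true} + c_{FeFe}\,n_{Fe}^{true}
\end{aligned}
$$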
$n_i^{rec}$: number of reconstructed events in the sample for the given i-th mass;
$c_{ij}$: elements of the classification matrix;
$n_i^{true}$: true number of events in the sample for the given i-th mass.
Passing to the "fraction" notation…
M. Ambrosio et al., Astropart. Phys. 24 (2005) 355
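As a minimal sketch of the linear system n_i^rec = Σ_j c_ij n_j^true, the example below folds a true composition through a classification matrix and then recovers it by direct Gaussian elimination. The matrix and event numbers are hypothetical (each column sums to 1, so every true event is assigned to exactly one class), and direct inversion is used only for illustration; it is not the MINUIT fit described in the talk.

```python
def fold(c, n_true):
    """n_rec[i] = sum_j c[i][j] * n_true[j]."""
    return [sum(c_ij * n_j for c_ij, n_j in zip(row, n_true)) for row in c]


def solve_linear_system(c, n_rec):
    """Solve c * n_true = n_rec by Gaussian elimination with partial pivoting."""
    size = len(n_rec)
    a = [list(row) + [rhs] for row, rhs in zip(c, n_rec)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size + 1):
                a[r][k] -= factor * a[col][k]
    solution = [0.0] * size
    for r in reversed(range(size)):  # back substitution
        tail = sum(a[r][k] * solution[k] for k in range(r + 1, size))
        solution[r] = (a[r][size] - tail) / a[r][r]
    return solution


# Hypothetical classification matrix, rows = reconstructed class,
# columns = true class, order p, He, O, Si, Fe.
C = [
    [0.75, 0.20, 0.05, 0.00, 0.00],
    [0.20, 0.55, 0.20, 0.05, 0.00],
    [0.05, 0.20, 0.50, 0.20, 0.05],
    [0.00, 0.05, 0.20, 0.55, 0.20],
    [0.00, 0.00, 0.05, 0.20, 0.75],
]
n_true = [300.0, 200.0, 100.0, 150.0, 250.0]  # hypothetical sample
n_rec = fold(C, n_true)
n_unfolded = solve_linear_system(C, n_rec)
```

With real data the reconstructed numbers fluctuate, so a direct solution can return unphysical (negative) abundances; a constrained fit avoids this.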
Determining the mean composition
We work with the fractions of events (abundances) for a given mass instead of the number of events, scaling the n_i by the total number of events N in the sample:
$$f_i = \frac{n_i}{N}$$
The linear system becomes:
$$f_i^{rec} = \sum_{j=1}^{5} c_{ij}\, f_j^{true}$$
with the constraints:
$$\sum_{j=1}^{5} f_j^{rec} = \sum_{j=1}^{5} f_j^{true} = 1$$
We solve the system by minimizing with MINUIT the following function:
$$\tilde{\chi}^2 = \sum_{i=1}^{5} \frac{\left(f_i^{rec} - \sum_{j=1}^{5} c_{ij}\, f_j^{true}\right)^2}{\sigma_i^2} + \lambda \left(1 - \sum_{i=1}^{5} f_i^{true}\right)$$
The first term is the standard chi-square; the second is the constraint term, with λ a Lagrange multiplier.
Determining the mean composition
where the error is given by:
$$\sigma_i^2 = \sum_{j=1}^{5} \frac{c_{ij}\,(1 - c_{ij})\, f_j^{true}}{N} + \sum_{j=1}^{5} \left(\delta c_{ij}\, f_j^{true}\right)^2$$
The first term is the variance of a multinomial distribution; the second is the uncertainty over the classification matrix.
MINUIT solves the non-linear fit with the given constraints and returns the estimates of the true abundances.
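The function handed to the minimizer can be sketched as follows. This is only an evaluation of a constrained chi-square of the form Σ_i (f_i^rec − Σ_j c_ij f_j^true)² / σ_i² + λ(1 − Σ_i f_i^true), with σ_i² built from a multinomial term and a classification-matrix term; the exact form of the second error term, the symbol `dc` for the matrix uncertainties, the fixed value of `lam` and the 2-component numbers are all assumptions made for the example. The minimization itself is left to a minimizer such as MINUIT and is not reproduced here.

```python
def fold(c, f_true):
    """f_rec[i] = sum_j c[i][j] * f_true[j]."""
    return [sum(c_ij * f_j for c_ij, f_j in zip(row, f_true)) for row in c]


def sigma_squared(c, dc, f_true, n_events):
    """sigma_i^2 = sum_j [ c_ij (1 - c_ij) f_j / N + (dc_ij f_j)^2 ]."""
    result = []
    for row, drow in zip(c, dc):
        multinomial = sum(c_ij * (1.0 - c_ij) * f_j
                          for c_ij, f_j in zip(row, f_true)) / n_events
        matrix_term = sum((dc_ij * f_j) ** 2
                          for dc_ij, f_j in zip(drow, f_true))
        result.append(multinomial + matrix_term)
    return result


def chi2_constrained(f_true, f_rec, c, dc, n_events, lam):
    """Chi-square plus constraint term lam * (1 - sum_i f_true[i])."""
    predicted = fold(c, f_true)
    sig2 = sigma_squared(c, dc, f_true, n_events)
    chi2 = sum((r - p) ** 2 / s for r, p, s in zip(f_rec, predicted, sig2))
    return chi2 + lam * (1.0 - sum(f_true))


# Hypothetical 2-component example (proton, iron).
C2 = [[0.9, 0.2],
      [0.1, 0.8]]
DC2 = [[0.01, 0.01],
       [0.01, 0.01]]
truth = [0.25, 0.75]
f_rec = fold(C2, truth)

chi2_at_truth = chi2_constrained(truth, f_rec, C2, DC2, 1000, 1.0)
chi2_off = chi2_constrained([0.5, 0.5], f_rec, C2, DC2, 1000, 1.0)
```

Because σ_i² depends on the fitted abundances themselves, the problem is non-linear in the parameters even though the folding is linear.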
Results – Composition 1
[Figure: reconstructed fractions vs mass classes]
Results – Composition 2
[Figure: reconstructed fractions vs mass classes]
Results – Composition 3 (iron most abundant)
[Figure: reconstructed fractions vs mass classes]
Results – Composition 4 (proton most abundant)
[Figure: reconstructed fractions vs mass classes]
Taking into account FD response
Shower simulation and reconstruction with the official Auger Offline tool:
• Simulate the shower core in the field of view of the FD (say Los Leones)
• Generation of fluorescence and Cherenkov light and propagation to the telescope aperture
• Simulation of PMT responses and trigger levels
• Reconstruction of shower parameters (energy, direction, longitudinal profile, …)
Several quality cuts have been applied to the reconstructed events: require a good fit of the longitudinal profiles, observation of Xmax, …
Results – 2 components
Early loss of NN generalization capabilities during the training
Add a regularization term to the MSE to avoid large weight values:
$$\mathrm{MSE}_{REG} = \gamma\,\mathrm{MSE} + (1 - \gamma)\,\frac{1}{N_w}\sum_{j=1}^{N_w} w_j^2$$
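A regularized error of this kind can be sketched as a weighted sum of the MSE and the mean squared network weight. The form γ·MSE + (1 − γ)·(1/N_w) Σ_j w_j² and the name `gamma` for the mixing parameter are assumptions made for the example, as are the numbers.

```python
def regularized_mse(outputs, targets, weights, gamma):
    """gamma * MSE + (1 - gamma) * mean of squared weights.

    gamma = 1 gives the plain MSE; smaller values penalize large weights more.
    """
    mse = sum((y - t) ** 2 for y, t in zip(outputs, targets)) / len(outputs)
    mean_sq_weight = sum(w * w for w in weights) / len(weights)
    return gamma * mse + (1.0 - gamma) * mean_sq_weight


# Hypothetical outputs, targets and weights.
plain = regularized_mse([0.0, 1.0], [0.0, 0.0], [1.0, -1.0, 2.0, 0.0], 1.0)
mixed = regularized_mse([0.0, 1.0], [0.0, 0.0], [1.0, -1.0, 2.0, 0.0], 0.5)
```

Penalizing large weights keeps the network response smoother, which is one common way to delay the loss of generalization during training.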
Results – 2 components
Deviations from the true fractions are around 5-6%
Conclusions and future plans
Pure simulated data
• Mass identification for p-Fe components performed with efficiency of nearly 100%
• Mass identification for 5 components performed with misclassification of 22%-30% for the p-Fe components and 40% for intermediate components
• Reconstructed mean mass composition deviates from the true one by about 5%
Reconstructed data
• Mass identification for p-Fe components performed with misclassification of 20-25%
• Reconstructed mean mass composition deviates from the true one by about 5%
Future plans
• Improve classification efficiency by adding parameters from SD; full hybrid simulation is required in this case, using Corsika or Aires codes
• Better event quality cuts definition, analysis with multi-component flux, restriction of the analysis to smaller energy bins, …
• Application of the method to the Auger experimental data
WORK IN PROGRESS…