Advanced Residual Analysis Techniques for Model Selection
A. Murari (1), D. Mazon (2), J. Vega (3), P. Gaudio (4), M. Gelfusa (4), A. Grognu (5), I. Lupelli (4), M. Odstrcil (5)
(4) University of Rome “Tor Vergata”
The Scientific Method and Models
Model building is an innate faculty of human beings because it allows information to be handled in a much more economical way.
• Model validation: process of assessing the quality of your model
• Model selection: process of selecting the best model, among many, to interpret the available data.
A “subject value” approach is advocated: both model selection and model validation are “utility” based
Model selection: no established and universal methodology is available
Model Falsification Criterion (MFC) :
Estimates the most appropriate model among a set of competing and independent ones
Based both on the accuracy and the robustness of the candidate models
Implements a form of the falsification principle rather than Occam's razor
A model is not penalised for its complexity but on the basis of its lack of robustness
Model selection: introduction
ROBUSTNESS: a model is penalised not for its complexity but for its lack of robustness, i.e. for the fact that its estimates degrade when errors are made in the parameters.
Small errors are introduced on each model parameter, and their repercussions on the global estimates are studied.
The repercussions of the parameter errors are quantified with an information-theoretic quantity (the Shannon entropy) calculated for the residuals.
MFC: THE BASIC PHILOSOPHY
Details in the paper: A. Murari et al., “Preliminary discussion on a new Model Selection Criterion, based on the statistics of the residuals and the falsification principle”, Conference FDT2 (Frontiers in Diagnostic Technologies)
The correlation tests method
Hypothesis: the noise is random and additive
Consequence: the residuals of a perfect model should be randomly distributed
The model with the distribution of the residuals closer to a random one is to be preferred
A random distribution of numbers (residuals) maximizes the Shannon entropy
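As a rough illustration of the entropy argument, the sketch below quantises a set of residuals into equal-width bins and computes the Shannon entropy of the bin occupancies. The binning scheme, the default of ten bins and the function name `shannon_entropy` are assumptions made for this example; the slides do not specify how the residuals are quantised.

```python
import math
from collections import Counter

def shannon_entropy(residuals, n_bins=10):
    """Shannon entropy (natural log) of residuals quantised into equal-width bins."""
    lo, hi = min(residuals), max(residuals)
    if hi == lo:
        return 0.0  # every residual falls into the same bin
    width = (hi - lo) / n_bins
    # Bin index of each residual; the largest value is clipped into the last bin.
    bins = [min(int((r - lo) / width), n_bins - 1) for r in residuals]
    n = len(residuals)
    # H = sum_k p_k * ln(1 / p_k), with p_k the fraction of residuals in bin k.
    return sum((c / n) * math.log(n / c) for c in Counter(bins).values())

evenly_spread = [float(i) for i in range(100)]  # fills all ten bins equally
clustered = [0.0] * 99 + [1.0]                  # almost everything in one bin
```

With these inputs, `shannon_entropy(evenly_spread)` reaches the maximum possible for ten bins, ln(10), while `shannon_entropy(clustered)` is close to zero.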
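Mathematical expression of the Model Falsification Criterion:
$$\mathrm{MFC}(r, r_{par}) = \underbrace{\frac{\sum_{i=1}^{n} r_i}{\sum_{i=1}^{n} p_{r,i} \ln\frac{1}{p_{r,i}}}}_{\mathrm{MFC}_1} + \underbrace{\frac{1}{2\,n_{par}} \sum_{par=1}^{n_{par}} \frac{\sum_{i=1}^{n} r_{par,i}}{\sum_{i=1}^{n} p_{par,i} \ln\frac{1}{p_{par,i}}}}_{\mathrm{MFC}_2}$$
$r_i$: the absolute value of the i-th residual
$p_{r,i}$: the quantised probability of the i-th residual
$r_{par,i}$: the i-th residual calculated after varying each parameter one at a time (± 10 %)
$p_{par,i}$: the quantised probability of this new residual
$n_{par}$: the number of model parameters
$1 \le i \le n$, where n is the total number of experimental points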
MFC: Mathematics, an example
A better model = smaller sum of residuals + higher entropy of residuals
A better model = smaller change when small errors are introduced on the various parameters
The best among the candidate models is the one that presents the lowest value of the MFC indicator
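The two statements above can be turned into a small numerical sketch. It assumes that the criterion is the sum of the absolute residuals divided by their Shannon entropy, plus the average of the same ratio over the residual sets obtained after each ± 10 % parameter variation (so that the average plays the role of the 1/(2 n_par) prefactor). The equal-width binning and the names `entropy`, `residual_ratio` and `mfc` are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def entropy(values, n_bins=10):
    """Shannon entropy of values quantised into equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return sum((c / n) * math.log(n / c) for c in Counter(bins).values())

def residual_ratio(residuals, n_bins=10):
    """Sum of absolute residuals divided by their Shannon entropy."""
    abs_r = [abs(r) for r in residuals]
    h = entropy(abs_r, n_bins)
    return math.inf if h == 0.0 else sum(abs_r) / h

def mfc(residuals, perturbed_sets, n_bins=10):
    """Accuracy term plus the mean ratio over the perturbed residual sets.

    perturbed_sets holds one residual list per (parameter, +/-10%) variation,
    so its length is 2 * n_par and the mean equals the 1 / (2 n_par) prefactor.
    """
    mfc1 = residual_ratio(residuals, n_bins)
    mfc2 = sum(residual_ratio(s, n_bins) for s in perturbed_sets) / len(perturbed_sets)
    return mfc1 + mfc2

nominal = [float(i) for i in range(1, 101)]
robust = mfc(nominal, [nominal, nominal])               # perturbations change nothing
fragile = mfc(nominal, [[3 * r for r in nominal]] * 2)  # perturbations triple the residuals
```

Both toy models have the same nominal residuals, but the one whose residuals grow under perturbation gets the larger (worse) value, which is the behaviour the criterion is meant to capture.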
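Mathematical expression of the Model Falsification Criterion:
$$\mathrm{MFC}(r, r_{par}) = \frac{\sum_{i} r_i}{\sum_{i} p_{r,i} \ln\frac{1}{p_{r,i}}} + \frac{1}{2\,n_{par}} \sum_{par=1}^{n_{par}} \frac{\sum_{i} r_{par,i}}{\sum_{i} p_{par,i} \ln\frac{1}{p_{par,i}}}$$
MFC: MATHEMATICS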
A purely numerical equation:
$y = x^2 + 2x$
exact solution + random noise of ± 10% = synthetic experimental data
Seven models created to fit the data:
$y_1 = x^2 + 2x$
$y_2 = x^2$
$y_3 = 2x^2 + 2x$
$y_4 = x^2 + 4x$
$y_5 = x^3 + 2x$
$y_6 = 7x^4$
$y_7 = x^2 + 4x \cdot \mathrm{abs}(\cos(10x + 5))$
NUMERICAL TESTS
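A minimal sketch of how such synthetic data could be produced is given below. It assumes that the exact solution is y = x^2 + 2x, that the ± 10 % noise is uniform and multiplicative, and that x is sampled on a regular grid up to a hypothetical `x_max`; none of these details is stated on the slides. The wrong candidate used for comparison is purely illustrative.

```python
import random

def exact(x):
    """Exact solution assumed for the synthetic data: y = x^2 + 2x."""
    return x * x + 2.0 * x

def make_synthetic_data(n_points=100, x_max=5.0, noise=0.10, seed=0):
    """Sample the exact solution and apply uniform multiplicative noise of +/- noise."""
    rng = random.Random(seed)
    xs = [x_max * (i + 1) / n_points for i in range(n_points)]
    ys = [exact(x) * (1.0 + rng.uniform(-noise, noise)) for x in xs]
    return xs, ys

def residuals(model, xs, ys):
    """Difference between the data and the model prediction at each point."""
    return [y - model(x) for x, y in zip(xs, ys)]

xs, ys = make_synthetic_data()
r_exact = residuals(exact, xs, ys)
r_wrong = residuals(lambda x: x * x + 4.0 * x, xs, ys)  # illustrative wrong candidate
```

The residuals of the exact model contain only the noise, whereas those of the wrong candidate also carry a systematic trend in x.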
[Figures: the synthetic data, $y_{data} = (x^2 + 2x) \pm 10\%$ noise, compared with each candidate model in turn: Models 1, 7, 4, 2, 6, 3 and 5]
NUMERICAL TESTS
An error of ±10% is introduced on each parameter, one at a time, and the MFC is evaluated for each model.
Model classification obtained (models classified in order of increasing MFC value)
INTUITIVE CLASSIFICATION: Model 1, Model 7, Model 4, Model 2, Model 6, Model 3, Model 5
NUMERICAL TESTS
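The one-at-a-time perturbation can be sketched as follows. The helper `one_at_a_time_residuals` and the two-parameter candidate `quadratic` are hypothetical names introduced for this example; the only elements taken from the slides are the ± 10 % shift and the fact that a single parameter is changed at a time.

```python
def one_at_a_time_residuals(model, params, xs, ys, rel_error=0.10):
    """Residual sets obtained by shifting each parameter by +/- rel_error in turn."""
    sets = []
    for k in range(len(params)):
        for sign in (+1.0, -1.0):
            shifted = list(params)
            shifted[k] = params[k] * (1.0 + sign * rel_error)  # only parameter k changes
            sets.append([y - model(x, shifted) for x, y in zip(xs, ys)])
    return sets  # 2 * n_par residual lists

def quadratic(x, p):
    """Hypothetical two-parameter candidate: p[0] * x^2 + p[1] * x."""
    return p[0] * x * x + p[1] * x

xs = [1.0, 2.0]
ys = [3.0, 8.0]  # noise-free values of x^2 + 2x
sets = one_at_a_time_residuals(quadratic, [1.0, 2.0], xs, ys)
```

Each of the returned lists can then be fed to whatever statistic is used to score the robustness of the model.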
Model      MFC    AIC    BIC
Model 1     41   -105   -549
Model 7    106    959    372
Model 4    120    889    435
Model 2    130    955    362
Model 6    309   1071    570
Model 3    493   1231    700
Model 5   5043   1634   1129
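$\mathrm{AIC} = 2\,n_{par} + n \ln(\mathrm{RSS})$, with $\mathrm{RSS} = \sum_{i=1}^{n} r_i^2$
$\mathrm{BIC} = n \ln(\sigma^2) + n_{par} \ln(n)$, with $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (r_i - r_m)^2$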
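For comparison with the MFC, the two traditional criteria can be computed from the residuals in a few lines. The sketch takes the definitions as AIC = 2 n_par + n ln(RSS) and BIC = n ln(sigma^2) + n_par ln(n), and assumes that r_m denotes the mean residual; the function names are chosen for this example only.

```python
import math

def aic(residuals, n_par):
    """AIC = 2 * n_par + n * ln(RSS), with RSS the sum of squared residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return 2 * n_par + n * math.log(rss)  # undefined if all residuals are zero

def bic(residuals, n_par):
    """BIC = n * ln(sigma^2) + n_par * ln(n), with sigma^2 the residual variance."""
    n = len(residuals)
    mean = sum(residuals) / n  # r_m, assumed to be the mean residual
    variance = sum((r - mean) ** 2 for r in residuals) / n
    return n * math.log(variance) + n_par * math.log(n)

example = [1.0, -1.0, 1.0, -1.0]
aic_value = aic(example, n_par=2)
bic_value = bic(example, n_par=2)
```

Unlike the MFC, both expressions penalise the number of parameters directly and do not look at how the residuals react to parameter errors.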
Various forms of MFC criteria seem to outperform traditional criteria, in particular for extrapolation and for high levels of noise
NUMERICAL TESTS: Results
Electron temperature required to access the H-mode of confinement in tokamak plasmas:
$T = 9.6 \cdot 10^{5}\, \frac{B_t^{0.95}\, a^{0.21}\, R^{0.85}\, n^{0.02}}{q^{0.85}}$
Variables scanned over their respective intervals (using 500 values):
       Bt    R     a     n    q
Min    2     0.8   0.2   1    2
Max    8     2     0.7   10   8
Synthetic experimental data generated by adding a random noise of ±10%
Five models considered to test the indicator:
$T_1 = 9.6 \cdot 10^{5}\, \frac{B_t^{0.95}\, a^{0.21}\, R^{0.85}\, n^{0.02}}{q^{0.85}}$
$T_2 = 1.0 \cdot 10^{6}\, \frac{B_t^{0.95}\, a^{0.28}\, R^{0.84}}{q^{0.85}}$
$T_3 = 3.8 \cdot 10^{6}\, \frac{a^{0.31}\, R^{0.87}}{q^{0.81}\, n^{0.02}}$
$T_4 = 9.7 \cdot 10^{5}\, \frac{B_t^{0.96}\, n^{0.01}}{q^{0.80}}$
$T_5 = 8.3 \cdot 10^{5}\, \frac{B_t^{0.95}\, R^{0.85}}{q^{0.86}}$
Scaling Laws: Numerical tests
An error of ±10% is introduced on each parameter, one at a time, and the MFC is evaluated for each model.
Model classification obtained (models classified in order of increasing MFC value)
INTUITIVE CLASSIFICATION:
Model 1: $T_1 = 9.6 \cdot 10^{5}\, \frac{B_t^{0.95}\, a^{0.21}\, R^{0.85}\, n^{0.02}}{q^{0.85}}$
Model 2: $T_2 = 1.0 \cdot 10^{6}\, \frac{B_t^{0.95}\, a^{0.28}\, R^{0.84}}{q^{0.85}}$
Model 5: $T_5 = 8.3 \cdot 10^{5}\, \frac{B_t^{0.95}\, R^{0.85}}{q^{0.86}}$
Model 3: $T_3 = 3.8 \cdot 10^{6}\, \frac{a^{0.31}\, R^{0.87}}{q^{0.81}\, n^{0.02}}$
Model 4: $T_4 = 9.7 \cdot 10^{5}\, \frac{B_t^{0.96}\, n^{0.01}}{q^{0.80}}$
SCALING LAWS
Results of the MFC criterion:
INTUITIVE CLASSIFICATION    MFC CLASSIFICATION    MFC value
Model 1                     Model 2               2.40*10^6
Model 2                     Model 1               3.21*10^6
Model 5                     Model 5               4.81*10^6
Model 3                     Model 3               1.12*10^7
Model 4                     Model 4               1.22*10^7
MFC Classification
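The classification step itself is only a sort on the MFC values. The snippet below reproduces it with the five values quoted on this slide and lists where the result departs from the intuitive ordering; the variable names are illustrative.

```python
# MFC values quoted on the slide for the five synthetic scaling laws.
mfc_values = {
    "Model 1": 3.21e6,
    "Model 2": 2.40e6,
    "Model 3": 1.12e7,
    "Model 4": 1.22e7,
    "Model 5": 4.81e6,
}

# Models classified in order of increasing MFC value (lowest is best).
mfc_ranking = sorted(mfc_values, key=mfc_values.get)

intuitive_ranking = ["Model 1", "Model 2", "Model 5", "Model 3", "Model 4"]

# Positions (0-based) at which the two classifications disagree.
disagreements = [i for i, (a, b) in enumerate(zip(intuitive_ranking, mfc_ranking)) if a != b]
```

Only the first two positions differ: the MFC prefers Model 2, which omits the density, to Model 1.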
Results of the MFC criterion: since the exponent of ne is very low, at realistic noise levels the MFC realises that the models containing this quantity are prone to overfitting and that models without this parameter are more robust.
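$T_2 = 1.0 \cdot 10^{6}\, \frac{B_t^{0.95}\, a^{0.28}\, R^{0.84}}{q^{0.85}}$
VS
$T_1 = 9.6 \cdot 10^{5}\, \frac{B_t^{0.95}\, a^{0.21}\, R^{0.85}\, n^{0.02}}{q^{0.85}}$
Model 1 has only a small dependence on the density. When this parameter is affected by an error, the MFC value becomes bigger, so the density is not identified as a fundamental variable.
[Figures: residuals of the two models]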
The MFC criterion also automatically penalises the major and minor radii in the scaling laws of individual devices, because they do not vary over a significant range.
Errors introduced = bounds of the 85% confidence interval
Theoretical models                                    Variables used    Models generated
Chankin                                               Bt, q, R          model 1
Kernel collisionless, Kernel collisional, Rogister    Bt, q, R, n       model 2
Scott                                                 Bt, n             model 3
Shaing & Crume                                        q, R, a, n        model 4
none                                                  Bt, q, R, a, n    model 5
none                                                  Bt, q, n          model 6
Linear regression, with the lower and upper bounds of the exponent of Bt:
$T = 1.09 \cdot 10^{5}\, \frac{B_t^{3.25}}{q^{0.41}\, R^{6.56}}$
lower bound    central value    upper bound
3.05           3.25             3.44
ITPA DATABASE
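A power-law scaling becomes a straight line after taking logarithms, which is why it can be fitted by linear regression. The sketch below shows this for a single variable using ordinary least squares; the choice of estimator, the function name `fit_power_law` and the sample values are assumptions for illustration, and the confidence bounds mentioned above are not computed here.

```python
import math

def fit_power_law(xs, ts):
    """Least-squares fit of T = C * x**alpha, done as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    lt = [math.log(t) for t in ts]
    n = len(lx)
    mx, mt = sum(lx) / n, sum(lt) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(lx, lt))
    alpha = sxt / sxx                  # slope of the line = exponent
    coeff = math.exp(mt - alpha * mx)  # intercept of the line = ln(C)
    return coeff, alpha

bt = [1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical field values
te = [2.0 * b ** 3 for b in bt]  # noise-free data with C = 2 and alpha = 3
coeff, alpha = fit_power_law(bt, te)
```

On noise-free data the fit recovers the coefficient and the exponent; with several variables the same idea leads to a multiple linear regression on the logarithms.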
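JET: linear regression VS well-known theoretical models
ITPA DATABASE
Model 1 (VS Chankin): $T_1 = 1.09 \cdot 10^{5}\, \frac{B_t^{3.25}}{q^{0.41}\, R^{6.56}}$
Model 2 (VS Kernel collisionless, Kernel collisional and Rogister): $T_2 = 4.05 \cdot 10^{4}\, \frac{B_t^{3.26}}{q^{0.44}\, R^{5.49}\, n^{0.12}}$
Model 3 (VS Scott): $T_3 = 77\, \frac{B_t^{3.16}}{n^{0.11}}$
Model 4 (VS Shaing & Crume): $T_4 = 1.3 \cdot 10^{2}\, \frac{R^{2.41}\, n^{0.35}\, a^{3.55}}{q^{0.19}}$
[Equations: the theoretical scalings $T_{Chankin}$, $T_{Kernel1}$, $T_{Kernel2}$, $T_{Rogister}$, $T_{Scott}$ and $T_{Shaing\,\&\,Crume}$ shown next to the regression models]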
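JET: 469 shots (models in order of increasing MFC)
Model 6    MFC = 4.49*10^4    $T_6 = 5.80 \cdot 10^{5}\, \frac{B_t^{3.22}}{q^{0.45}\, n^{0.19}}$
Model 3    MFC = 4.49*10^4    $T_3 = 77\, \frac{B_t^{3.16}}{n^{0.11}}$
Model 5    MFC = 2.08*10^5    $T_5 = 1.05 \cdot 10^{10}\, \frac{B_t^{3.42}}{q^{0.45}\, R^{6.11}\, n^{0.27}\, a^{1.48}}$
Model 2    MFC = 2.18*10^5    $T_2 = 4.05 \cdot 10^{4}\, \frac{B_t^{3.26}}{q^{0.44}\, R^{5.49}\, n^{0.12}}$
Model 1    MFC = 3.27*10^5    $T_1 = 1.09 \cdot 10^{5}\, \frac{B_t^{3.25}}{q^{0.41}\, R^{6.56}}$
Model 4    MFC = 7.88*10^5    $T_4 = 1.3 \cdot 10^{2}\, \frac{R^{2.41}\, n^{0.35}\, a^{3.55}}{q^{0.19}}$
ITPA DATABASE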
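ASDEX: 48 shots (models in order of increasing MFC)
Model 6    MFC = 4265         $T_6 = 10^{3}\, \frac{B_t^{0.94}}{q^{0.28}\, n^{0.79}}$
Model 3    MFC = 4664         $T_3 = 854\, \frac{B_t^{0.92}}{n^{0.92}}$
Model 5    MFC = 1.83*10^5    $T_5 = 7.3 \cdot 10^{9}\, \frac{B_t\, a^{10.77}}{q^{0.85}\, R^{10.34}\, n^{0.17}}$
Model 1    MFC = 2.22*10^5    $T_1 = 2.60 \cdot 10^{5}\, \frac{B_t^{0.75}}{q^{0.23}\, R^{12.03}}$
Model 2    MFC = 4.11*10^5    $T_2 = 10^{5}\, \frac{B_t^{0.80}}{q^{0.24}\, R^{9.7}\, n^{0.30}}$
Model 4    MFC = 1.05*10^6    $T_4 = 5.3 \cdot 10^{14}\, \frac{n^{1.85}\, a^{6.54}}{q^{0.40}\, R^{43.22}}$
ITPA DATABASE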
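CMOD: 98 shots (models in order of increasing MFC)
Model 6    MFC = 2.13*10^5     $T_6 = 1.39 \cdot 10^{3}\, \frac{n^{0.17}\, q^{2.35}}{B_t^{2.61}}$
Model 3    MFC = 2.21*10^5     $T_3 = 6.68 \cdot 10^{4}\, \frac{1}{B_t^{2.19}\, n^{0.27}}$
Model 2    MFC = 1.06*10^9     $T_2 = 7.21 \cdot 10^{-2}\, \frac{n^{0.19}\, q^{2.07}}{B_t^{2.43}\, R^{25.03}}$
Model 1    MFC = 1.24*10^9     $T_1 = 1.62 \cdot 10^{-1}\, \frac{q^{1.96}}{B_t^{2.33}\, R^{24.36}}$
Model 4    MFC = 1.17*10^10    $T_4 = 7.54 \cdot 10^{-8}\, \frac{q^{1.67}}{a^{4.94}\, R^{33.57}\, n^{0.05}}$
Model 5    MFC = 1.49*10^10    $T_5 = 8.03 \cdot 10^{-5}\, \frac{n^{0.18}\, q^{2.04}}{B_t^{2.44}\, R^{22.79}\, a^{5.11}}$
ITPA DATABASE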
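JET + ASDEX + CMOD: 615 shots (models in order of increasing MFC)
Model 4    MFC = 2.20*10^5    $T_4 = 4.06 \cdot 10^{3}\, \frac{a^{1.45}\, n^{0.09}}{R^{0.97}\, q^{0.07}}$
Model 3    MFC = 2.25*10^5    $T_3 = 386\, \frac{B_t^{2.07}}{n^{0.91}}$
Model 6    MFC = 2.37*10^5    $T_6 = 664\, \frac{B_t^{2.22}}{n^{0.92}\, q^{0.55}}$
Model 5    MFC = 2.68*10^5    $T_5 = 16\, \frac{B_t^{2.72}\, R^{2.46}}{q^{0.29}\, n^{0.37}\, a^{1.12}}$
Model 2    MFC = 2.72*10^5    $T_2 = 102\, \frac{B_t^{2.37}\, R^{1.07}}{q^{0.38}\, n^{0.36}}$
Model 1    MFC = 2.93*10^5    $T_1 = 53\, \frac{B_t^{2.19}\, R^{1.50}}{q^{0.28}}$
ITPA DATABASE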
Summary of the best results obtained:
JET: $T_6 = 5.80 \cdot 10^{5}\, \frac{B_t^{3.22}}{q^{0.45}\, n^{0.19}}$
ASDEX: $T_6 = 10^{3}\, \frac{B_t^{0.94}}{q^{0.28}\, n^{0.79}}$
CMOD: $T_6 = 1.39 \cdot 10^{3}\, \frac{n^{0.17}\, q^{2.35}}{B_t^{2.61}}$
All the database: $T_4 = 4.06 \cdot 10^{3}\, \frac{a^{1.45}\, n^{0.09}}{R^{0.97}\, q^{0.07}}$
ITPA DATABASE: Summary
The MFC determines that Te depends only on Bt, n and q, but with exponents that are not at all the same
Results are very different from the ones obtained with each independent tokamak
Analysis of the best results obtained:
ITPA: Comparison Exponents
Plasma radii, a and R, not evaluated as fundamental variables
Not the same exponents at all
JET: $T_6 = 5.80 \cdot 10^{5}\, \frac{B_t^{3.22}}{q^{0.45}\, n^{0.19}}$
ASDEX: $T_6 = 10^{3}\, \frac{B_t^{0.94}}{q^{0.28}\, n^{0.79}}$
CMOD: $T_6 = 1.39 \cdot 10^{3}\, \frac{n^{0.17}\, q^{2.35}}{B_t^{2.61}}$
      JET                  ASDEX                 CMOD
R     2.78 ≤ R ≤ 3         1.67 ≤ R ≤ 1.70       0.66 ≤ R ≤ 0.68
a     0.89 ≤ a ≤ 1.13      0.37 ≤ a ≤ 0.39       0.21 ≤ a ≤ 0.23
Bt    1 ≤ Bt ≤ 3.48        1.38 ≤ Bt ≤ 2.80      3.43 ≤ Bt ≤ 7.91
q     2.23 ≤ q ≤ 9.37      2.20 ≤ q ≤ 4.53       3 ≤ q ≤ 6
n     0.31 ≤ n ≤ 6.16      2.42 ≤ n ≤ 3.08       6 ≤ n ≤ 26
- The MFC criterion has some potential advantages compared to traditional criteria, particularly in the case of scaling laws and extrapolation
- The application to the ITPA database has given some interesting results (see also the talk by I. Lupelli)
- Model selection: various alternative MFC criteria are being applied to the power threshold to reach the H-mode of confinement
Summary and Future developments