3rd April 2002 Tim Adye 1
Report from Durham Statistics Workshop
Part 1
Tim Adye
Particle Physics Department
Rutherford Appleton Laboratory
HEP Seminar, 3rd April 2002
“Advanced Statistical Techniques in Particle Physics”
• The aim of the Conference was to discuss advanced statistical analysis techniques as used in measurements and searches in Particle Physics, including Astroparticle Physics.
  • Combination of analyses and results
  • Simulation issues and Monte Carlo theory
  • Treatment of systematics in theory and practice
  • Signal significance
  • Setting limits
  • Multivariate event classification
  • Convolution and deconvolution
  • Optimal measurements
  • Techniques for 'blind' analyses
  • Statistical issues to do with defining uncertainties on parton distributions extracted from global fits to data
• http://www.ippp.dur.ac.uk/statistics/
My overall impression
• I learnt a lot
  • Since I’m no expert, I found the pedagogical sessions particularly useful.
• Lots of useful techniques
  • Now all I need is applications to try them out!
• Also some good advice on what not to do
• Lots of good-natured discussion between Frequentists and Bayesians
  • Almost philosophy!
Plan of talk
• I will concentrate on the review talks.
  • Bill will summarise some of the presentations of new techniques.
• Overview of Bayesian and Frequentist principles [Fred James]
• Multivariate Analysis [Harrison Prosper]
• Choosing variables for a Multivariate Analysis [Sherry Towers]
• Blind analyses [Paul Harrison]
• Systematic errors: facts and fictions [Roger Barlow]
• Summary [Bob Cousins]
Overview of Bayesian and Frequentist principles
Fred James
CERN
Bayesian statistics require a prior PDF
Additional Comments
• Note: Bayes’ theorem applies for either definition
  • It’s just a consequence of the axioms of probability
• HEP is used to an ensemble of experiments (e.g. MC samples)
  • Thus the frequentist interpretation seems more natural to us
• Bayesian methods are hard to do right, but can be the only way to attack certain hard problems.
• The Bayesian interpretation may be more appropriate for systematics
Multivariate Analysis
A Unified Perspective
Harrison B. Prosper
Florida State University
Advanced Statistical Techniques in Particle Physics
Durham, UK, 20 March 2002
Introduction – i
• Multivariate analysis is hard!
  • Our mathematical intuition based on analysis in one dimension often fails rather badly for spaces of very high dimension.
• One should distinguish the problem to be solved from the algorithm to solve it.
• Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number whereas algorithms to solve them are invented every day.
Introduction – ii
• So why bother with multivariate analysis?
• Because:
  • The variables we use to describe events are usually statistically dependent.
  • Therefore, the N-dimensional density of the variables contains more information than is contained in the set of 1-d marginal densities f_i(x_i).
  • This extra information may be useful
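As an illustration of the point about marginal densities, here is a minimal Python sketch (not from the talk). It assumes a toy setup in which signal and background are both bivariate normal with identical standard-normal marginals and differ only in the sign of the correlation; the value 0.8, the sample size, and all function names are illustrative assumptions.

```python
import math
import random

def correlated_pair(rho, rng):
    """Draw (x, y) from a bivariate normal with unit variances and correlation rho."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    return u, rho * u + math.sqrt(1.0 - rho * rho) * v

def accuracy(events, classify):
    """Fraction of (x, y, label) events for which classify(x, y) returns the true label."""
    correct = sum(1 for x, y, label in events if classify(x, y) == label)
    return correct / len(events)

rng = random.Random(2002)  # fixed seed so the toy study is reproducible
events = []
for _ in range(20000):
    events.append((*correlated_pair(+0.8, rng), "S"))  # toy signal: positive correlation
    events.append((*correlated_pair(-0.8, rng), "B"))  # toy background: negative correlation

# A cut on x alone sees only the marginal density, which is the same for S and B.
acc_1d = accuracy(events, lambda x, y: "S" if x > 0 else "B")
# A cut that uses both variables together can exploit the correlation.
acc_2d = accuracy(events, lambda x, y: "S" if x * y > 0 else "B")

print(f"cut on x alone:   {acc_1d:.3f}")
print(f"cut on sign(x*y): {acc_2d:.3f}")
```

In this toy case the one-variable cut does no better than a coin toss, while the two-variable cut separates the classes well, because all of the discriminating information sits in the joint density.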
Some Multivariate Methods
• Fisher Linear Discriminant (FLD)
• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• Self Organizing Map (SOM)
• Random Grid Search (RGS)
• Probability Density Estimation (PDE)
• Artificial Neural Network (ANN)
• Support Vector Machine (SVM)
• There is considerable empirical evidence that, as yet, no uniformly most powerful method exists. Therefore, be wary of claims to the contrary!
Event classification – i (e.g. signal/background)
• Every classification task tries to solve the same fundamental problem, which is:
  • After adequately pre-processing the data
  • …find a good, and practical, approximation to the Bayes decision rule: Given X, if P(S|X) > P(B|X), choose hypothesis S, otherwise choose B.
• If we knew the densities p(X|S) and p(X|B) and the priors p(S) and p(B) we could compute the Bayes Discriminant Function (BDF):
  • D(X) = P(S|X)/P(B|X)
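The Bayes discriminant can be written out for a case where the densities are known. This is a minimal sketch, assuming one-dimensional Gaussian densities for p(X|S) and p(X|B) and made-up priors; the function names and numbers are illustrative and not from the talk. It uses Bayes' theorem, so that D(X) = p(X|S) p(S) / (p(X|B) p(B)).

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normal density, used here as a stand-in for p(X|S) and p(X|B)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_discriminant(x, p_x_given_s, p_x_given_b, prior_s, prior_b):
    """D(X) = P(S|X)/P(B|X) = p(X|S) p(S) / (p(X|B) p(B)); the common factor p(X) cancels."""
    return (p_x_given_s(x) * prior_s) / (p_x_given_b(x) * prior_b)

def classify(x, p_x_given_s, p_x_given_b, prior_s, prior_b):
    """Bayes decision rule: choose S if P(S|X) > P(B|X), i.e. if D(X) > 1."""
    d = bayes_discriminant(x, p_x_given_s, p_x_given_b, prior_s, prior_b)
    return "S" if d > 1.0 else "B"

# Toy densities (assumed): signal ~ N(+1, 1), background ~ N(-1, 1).
def p_signal(x):
    return gaussian_pdf(x, +1.0, 1.0)

def p_background(x):
    return gaussian_pdf(x, -1.0, 1.0)

# With a rare signal (assumed priors 0.1 and 0.9) the decision boundary moves
# away from x = 0 towards the signal side.
for x in (0.0, 1.0, 2.0):
    print(x, classify(x, p_signal, p_background, prior_s=0.1, prior_b=0.9))
```

In practice the densities are not known, which is why the methods listed earlier all end up approximating D(X) or a function of it.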
Event classification – ii
• The Fisher discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms to approximate the Bayes discriminant function D(X), or a function thereof.
• It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.
Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis
Sherry Towers
State University of New York at Stony Brook
The case for fewer discriminators…
• Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis
• Also, each added variable makes it more difficult to determine if modelling of data is sound, and makes analysis more difficult to understand
• And, each added variable adds statistical noise… This can degrade overall discrimination power!
Implementing the procedure…
• Choose the combination of variables that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) is X standard deviations better than S/sqrt(S+B) from the previous iteration)
• Very easy to implement in analysis code!
  • TerraFerMA, a program that interfaces to MLPfit, Jetnet, PDE methods, Fisher Discriminant, etc., includes this variable sorting method. User can quickly and easily sort potential discriminators.
  • http://www-d0.fnal.gov/~smjt/ferma.ps
• In the general case, variable deletion is safer than variable addition. – Michael Goldstein
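One way to read the selection rule above is as a greedy forward selection. This is a hedged Python sketch, not TerraFerMA's code. It assumes the caller supplies an `evaluate` function returning S/sqrt(S+B) and its uncertainty for a given variable list; the toy event counts, the fixed uncertainty of 0.3, and all names are hypothetical.

```python
import math

def significance(s, b):
    """Figure of merit S/sqrt(S+B)."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def select_variables(candidates, evaluate, x_sigma=1.0):
    """Greedy forward selection of discriminating variables.

    evaluate(variables) must return (fom, fom_error) for a classifier built
    on that list of variables. A variable is added only if the best new
    figure of merit beats the previous one by at least x_sigma * fom_error.
    """
    chosen, best_fom = [], 0.0
    remaining = list(candidates)
    while remaining:
        trials = [(evaluate(chosen + [v]), v) for v in remaining]
        (fom, err), var = max(trials, key=lambda t: t[0][0])
        if fom - best_fom < x_sigma * err:
            break  # no remaining variable gives a significant improvement
        chosen.append(var)
        remaining.remove(var)
        best_fom = fom
    return chosen, best_fom

# Hypothetical (S, B) counts after an optimised selection on each variable set.
TOY_COUNTS = {
    frozenset("a"): (80, 100),
    frozenset("b"): (70, 100),
    frozenset("c"): (60, 100),
    frozenset("ab"): (80, 40),
    frozenset("ac"): (80, 95),
    frozenset("abc"): (80, 39),
}

def toy_evaluate(variables):
    """Look up toy counts; the fixed uncertainty of 0.3 is an assumption."""
    s, b = TOY_COUNTS[frozenset(variables)]
    return significance(s, b), 0.3

chosen, best = select_variables(["a", "b", "c"], toy_evaluate, x_sigma=1.0)
print(chosen, round(best, 3))
```

With these toy numbers the third variable improves S/sqrt(S+B) by much less than its uncertainty, so the loop stops after two variables.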
A “real-world” example…
• A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background.
• Were all 7 needed?
• Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method…
[Plot: background efficiency (vertical axis, 0 to 1) versus signal efficiency (horizontal axis, 0 to 1), comparing “All seven original variables in Jetnet” with “Only the two best discriminators in Jetnet”]
Blind Analysis
Paul Harrison
Queen Mary University of London
Durham Workshop on Advanced Statistical Techniques in Particle Physics
Acknowledgements: B. Meadows, J. Richman, A. Roodman, A. Watson
An Example of Experimenters' Bias: the “Split A2”
• At CERN in the mid 1960's, a group using a missing mass spectrometer observed several new mesons in the MM spectrum from π⁻p → p + MM⁻
• The A2 (now known as the I=1 member of the 2+ nonet) was apparently split. It was fitted with a dipole form.
• The split A2 was discussed for several years, and generated considerable speculation by theorists on causality, non-local field theory etc.
The Split “A2” Contd.
• Similar experiments performed later found no evidence at all for a split.
• Other experiments gathered data on A2 via decays to K+K−. They found no evidence for a split.
• At the Washington APS meeting of 1971, B. Maglic, the spokesman of the original CERN experiment, revealed that several cuts which had been made on the data were unnecessary.
• One of the cuts was based on "running conditions". His group discarded whole runs in which the split did not show up!
• This was widely regarded as an example of “innocent bias”.
Do the LEP Experiments Agree Too Well?
Yes! (agreement with SM is also v. good cf. error.)
Is this because:
• experimenters' bias has pulled the results closer to each other and to SM value?
• systematic uncertainties are overestimated?
• Or have we simply chosen one result which happens to have fluctuated “down” in χ²?
χ²/dof = 0.92/7 (= 2.1/7 ignoring systematic errors!).
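To put a number on “too well”, one can ask how often a χ² this small or smaller would occur by chance. This is a minimal sketch, reading the quoted values as χ² = 0.92 (2.1 ignoring systematic errors) for 7 degrees of freedom and assuming the inputs are independent and Gaussian. The series implementation of the χ² cumulative distribution is a stand-in for a library routine.

```python
import math

def chi2_cdf(chi2, dof, terms=200):
    """P(chi-squared <= chi2) for dof degrees of freedom.

    Uses the power series for the regularised lower incomplete gamma
    function P(a, x) with a = dof/2 and x = chi2/2. The series converges
    quickly for the small chi2 values of interest here.
    """
    a, x = dof / 2.0, chi2 / 2.0
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

p_with_syst = chi2_cdf(0.92, 7)  # chi2/dof = 0.92/7 as quoted
p_stat_only = chi2_cdf(2.1, 7)   # chi2/dof = 2.1/7, ignoring systematic errors

print(f"P(chi2 <= 0.92 | 7 dof) = {p_with_syst:.4f}")
print(f"P(chi2 <= 2.1  | 7 dof) = {p_stat_only:.4f}")
```

Under these assumptions the probability of such good agreement comes out well under one percent with the full errors, and a few percent with statistical errors only. That is the sense in which the agreement looks too good.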
Blinding in Rare Decay Analyses
“Rare decays” ⇒
• BF not yet measured, or poorly known
• Backgrounds probably large: analysis must provide significant reduction factor
Then blind analysis highly desirable!
For “cut & count” analysis, the hidden “signal box” method is recommended.
Signal box defined by loose cuts.
Blinding means excluding events in signal box from analysis AND plots.
Sidebands are used to characterise the background in each variable.
Assumes that the variables are uncorrelated - checked in Monte Carlo.
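The hidden signal box idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions: a single discriminating variable, a background that is flat across the sidebands and the box, and made-up event values and region boundaries. None of the names come from BABAR software.

```python
def in_range(value, interval):
    """True if value lies in the half-open interval [lo, hi)."""
    lo, hi = interval
    return lo <= value < hi

def blind(events, signal_box):
    """Return only the events outside the signal box; these are the only
    events that may be used for cut optimisation and plots before unblinding."""
    return [x for x in events if not in_range(x, signal_box)]

def expected_background(events, signal_box, sidebands):
    """Extrapolate the sideband event density into the signal box,
    assuming (for this sketch) a flat background in the variable."""
    n_side = sum(1 for x in events if any(in_range(x, sb) for sb in sidebands))
    side_width = sum(hi - lo for lo, hi in sidebands)
    box_width = signal_box[1] - signal_box[0]
    return n_side * box_width / side_width

# Hypothetical values of one discriminating variable, in arbitrary units.
events = [2.5, 3.1, 3.9, 5.2, 5.7, 7.5, 8.8, 9.5]
signal_box = (5.0, 6.0)
sidebands = [(2.0, 4.0), (7.0, 9.0)]

visible = blind(events, signal_box)
n_bkg = expected_background(visible, signal_box, sidebands)
print(visible)
print(n_bkg)
```

The expected background is computed from the blinded sample only, so nothing inside the box influences the analysis choices.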
Unblinding
• In BABAR an analysis will normally have been presented to an Analysis Working Group before unblinding. The presentation will include:
  – Description of cut optimisation and BG characterisation.
  – Expected # of background events in signal box
  – Signal efficiency from MC or control samples
  – Expected statistical sensitivity
  – Estimation of systematic errors
• After discussion, it can be unblinded.
• For a rare decay mode, this is essentially just a counting exercise: how many events are inside the signal box?
Closing Comments
• BA brings particle physics into line with best practice from other branches of science.
• More a formalisation of good experimental practice than a radical new idea.
• An analysis which is not blind is not necessarily a wrong analysis
• An analysis which is blind is not necessarily a right analysis
• The field has its fair share of embarrassing wrong results
• Even the chance of experimenters' bias reduces our confidence in our results.
• If we can reduce risks of bias, why not do so?
Systematic errors: facts and fictions
Roger Barlow
• Errors, Mistakes and Uncertainties
  – Diversion: Effect of different Priors
• Evaluation of Uncertainty
  – Diversion: The errors on errors puzzle
• Checking for Mistakes
  – Diversion: Defining a ‘Small Difference’
• Systematics in Practice
Nice Java limits calculator – runs straight from Netscape!
http://www.slac.stanford.edu/~barlow/
Limits on […] from observed […]
Cousins and Highland: […]
BaBar: […]
Suppose you observe […] events
Uncertainty of […] on […]
[…] from […] is […] (Cousins and Highland) but […] (BaBar)
Why? Ambiguity in prior: Gaussian in […] is not Gaussian in […].
Jeffreys’ prior (uniform in […]) gives intermediate result.
• Thou shalt never say ‘systematic error’ when thou meanest ‘systematic effect’
• Thou shalt know at all times whether what thou performest is a check for a mistake or an evaluation of an uncertainty
• Thou shalt not incorporate successful check results into thy total systematic error and make thereby a shield behind which to hide thy dodgy result
• Thou shalt not incorporate failed check results unless thou art truly at thy wits’ end
• Thou shalt say what thou doest, and thou shalt be able to justify it out of thine own mouth; not the mouth of thy supervisor, nor thy colleague who did the analysis last time, nor thy local statistics guru, nor thy mate down the pub.
Do these, and thou shalt flourish, and thine analysis likewise
Uncertainty on systematic effect
Conference Summary
Bob Cousins, UCLA
22 March 2002
Educate Your Colleagues!
• The area under the likelihood function is meaningless.
• Mode of a probability density is metric-dependent, as are shortest intervals.
• A confidence interval is a statement about P(data | parameters), not P(parameters | data)
• Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself).
• P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics.
• The argument for coherence of Bayesian P is based on P = subjective degree of belief.
Bob Cousins
Not a PDF!
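The bullet on confidence intervals can be checked with a small toy Monte Carlo. This is a minimal sketch, assuming a single Gaussian measurement x of a fixed true value mu with known resolution sigma, and the usual interval x ± sigma. The true values, the seed and the number of toy experiments are arbitrary choices.

```python
import random

def coverage(true_mu, sigma, n_experiments, rng):
    """Fraction of repeated experiments in which the interval
    [x - sigma, x + sigma] contains the fixed true value true_mu."""
    covered = 0
    for _ in range(n_experiments):
        x = rng.gauss(true_mu, sigma)  # the data vary from experiment to experiment
        if x - sigma <= true_mu <= x + sigma:
            covered += 1
    return covered / n_experiments

rng = random.Random(1)  # fixed seed for reproducibility
fractions = {mu: coverage(mu, 1.0, 20000, rng) for mu in (-3.0, 0.0, 10.0)}
for mu, frac in fractions.items():
    print(f"true mu = {mu:5.1f}: interval covers in {frac:.3f} of experiments")
```

The coverage is close to 68% whatever the true value is. It is a property of the ensemble of possible data for fixed parameters, i.e. of P(data | parameters). No prior enters, and nothing here is a probability statement about mu given one particular data set.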