report from durham statistics workshop - part 1 · • statistical issues to do with defining...

Report from Durham Statistics Workshop

Part 1

Tim Adye
Particle Physics Department
Rutherford Appleton Laboratory

HEP Seminar
3rd April 2002


“Advanced Statistical Techniques in Particle Physics”

• The aim of the Conference was to discuss advanced statistical analysis techniques as used in measurements and searches in Particle Physics, including Astroparticle Physics.
• Combination of analyses and results
• Simulation issues and Monte Carlo theory
• Treatment of systematics in theory and practice
• Signal significance
• Setting limits
• Multivariate event classification
• Convolution and deconvolution
• Optimal measurements
• Techniques for 'blind' analyses
• Statistical issues to do with defining uncertainties on parton distributions extracted from global fits to data

• http://www.ippp.dur.ac.uk/statistics/


My overall impression

• I learnt a lot
• Since I’m no expert, I found the pedagogical sessions particularly useful.

• Lots of useful techniques
• Now all I need is applications to try them out!

• Also some good advice on what not to do
• Lots of good-natured discussion between Frequentists and Bayesians
• Almost philosophy!


Plan of talk

• I will concentrate on the review talks.
• Bill will summarise some of the presentations of new techniques.

• Overview of Bayesian and Frequentist principles [Fred James]

• Multivariate Analysis [Harrison Prosper]

• Choosing variables for a Multivariate Analysis [Sherry Towers]

• Blind analyses [Paul Harrison]

• Systematic errors: facts and fictions [Roger Barlow]

• Summary [Bob Cousins]


Overview of Bayesian and Frequentist principles

Fred James
CERN


Bayesian statistics require a prior PDF


Additional Comments

• Note: Bayes’ theorem applies for either definition

• It’s just a consequence of the axioms of probability

• HEP is used to an ensemble of experiments (e.g. MC samples)
• Thus the frequentist interpretation seems more natural to us

• Bayesian methods are hard to do right, but can be the only way to attack certain hard problems.

• A Bayesian interpretation may be more appropriate for systematics


Multivariate Analysis
A Unified Perspective

Harrison B. Prosper
Florida State University

Advanced Statistical Techniques in Particle Physics
Durham, UK, 20 March 2002

Harrison Prosper


Introduction – i

• Multivariate analysis is hard!
• Our mathematical intuition based on analysis in one dimension often fails rather badly for spaces of very high dimension.

• One should distinguish the problem to be solved from the algorithm to solve it.

• Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number whereas algorithms to solve them are invented every day.

Harrison Prosper


Introduction – ii

• So why bother with multivariate analysis?

• Because:

• The variables we use to describe events are usually statistically dependent.

• Therefore, the N-dimensional density of the variables contains more information than is contained in the set of 1-d marginal densities f_i(x_i).

• This extra information may be useful
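A toy illustration of this point, using invented event lists rather than anything shown at the workshop: the two classes below have identical 1-d marginals in both variables, yet their joint distributions do not overlap at all. The names `signal`, `background`, `marginal` and `joint` are assumptions made for this sketch.

```python
from collections import Counter

# Toy events (x1, x2); the values are invented purely for illustration.
signal = [(0, 0), (1, 1)] * 50
background = [(0, 1), (1, 0)] * 50


def marginal(events, index):
    """1-d marginal f_i(x_i) as a dict of relative frequencies."""
    counts = Counter(event[index] for event in events)
    return {value: n / len(events) for value, n in counts.items()}


def joint(events):
    """Full 2-d distribution as a dict of relative frequencies."""
    counts = Counter(events)
    return {value: n / len(events) for value, n in counts.items()}


for i in (0, 1):
    print(f"x{i + 1}: signal {marginal(signal, i)}, "
          f"background {marginal(background, i)}")
print("joint signal    :", joint(signal))
print("joint background:", joint(background))
```

Cutting on either variable alone cannot separate the two samples here, while a rule using both variables separates them completely.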

Harrison Prosper


Some Multivariate Methods

• Fisher Linear Discriminant (FLD)

• Principal Component Analysis (PCA)

• Independent Component Analysis (ICA)

• Self Organizing Map (SOM)

• Random Grid Search (RGS)

• Probability Density Estimation (PDE)

• Artificial Neural Network (ANN)

• Support Vector Machine (SVM)

• There is considerable empirical evidence that, as yet, no uniformly most powerful method exists. Therefore, be wary of claims to the contrary!

Harrison Prosper


Event classification – i (e.g. signal/background)

• Every classification task tries to solve the same fundamental problem, which is:
• After adequately pre-processing the data
• …find a good, and practical, approximation to the Bayes decision rule: Given X, if P(S|X) > P(B|X), choose hypothesis S, otherwise choose B.

• If we knew the densities p(X|S) and p(X|B) and the priors p(S) and p(B) we could compute the Bayes Discriminant Function (BDF):
• D(X) = P(S|X)/P(B|X)
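As a minimal sketch of the decision rule above: by Bayes' theorem, D(X) = p(X|S) p(S) / (p(X|B) p(B)), and the rule chooses S when D(X) > 1. The 1-d Gaussian densities, their parameters, the equal priors and all function names below are assumptions chosen for illustration. In a real analysis the densities are unknown and must be approximated.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density, used here as a stand-in for p(X|S) and p(X|B)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_discriminant(x, p_x_given_s, p_x_given_b, prior_s, prior_b):
    """D(X) = P(S|X)/P(B|X) = p(X|S) p(S) / (p(X|B) p(B))."""
    return (p_x_given_s(x) * prior_s) / (p_x_given_b(x) * prior_b)


def classify(x, p_x_given_s, p_x_given_b, prior_s, prior_b):
    """Bayes decision rule: choose S if P(S|X) > P(B|X), otherwise B."""
    d = bayes_discriminant(x, p_x_given_s, p_x_given_b, prior_s, prior_b)
    return "S" if d > 1.0 else "B"


# Assumed toy model: signal ~ N(+1, 1), background ~ N(-1, 1), equal priors.
def p_signal(x):
    return gaussian_pdf(x, 1.0, 1.0)


def p_background(x):
    return gaussian_pdf(x, -1.0, 1.0)


for x in (-0.5, 0.5):
    print(x, bayes_discriminant(x, p_signal, p_background, 0.5, 0.5),
          classify(x, p_signal, p_background, 0.5, 0.5))
```

For this particular toy model the discriminant reduces to exp(2x), so the decision boundary sits at x = 0. Changing the priors moves the boundary without changing the densities.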

Harrison Prosper


Event classification – ii

• The Fisher discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms to approximate the Bayes discriminant function D(X), or a function thereof.

• It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.

Harrison Prosper


Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis

Sherry Towers
State University of New York at Stony Brook

Sherry Towers


The case for fewer discriminators…

• Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis

• Also, each added variable makes it more difficult to determine if modelling of data is sound, and makes analysis more difficult to understand

• And, each added variable adds statistical noise… This can degrade overall discrimination power!

Sherry Towers


Implementing the procedure…

• choose the combination of variables that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) is X standard deviations better than S/sqrt(S+B) from the previous iteration)

• Very easy to implement in analysis code!
• TerraFerMA, a program that interfaces to MLPfit, Jetnet, PDE methods, Fisher Discriminant, etc., includes this variable sorting method. User can quickly and easily sort potential discriminators.
• http://www-d0.fnal.gov/~smjt/ferma.ps

• In the general case, variable deletion is safer than variable addition. – Michael Goldstein
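One way the iteration above might look in code. This is a sketch under stated assumptions and is not TerraFerMA's implementation. The slide does not say how candidate combinations are generated, so this version adds one variable per iteration and stops when the gain in S/sqrt(S+B) is no longer X standard deviations. `evaluate` is a hypothetical user-supplied callable that trains a classifier on the given variables and returns the figure of merit with its uncertainty. The toy gains are invented.

```python
import math


def significance(s, b):
    """Figure of merit S/sqrt(S+B) for s signal and b background events."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def greedy_select(candidates, evaluate, x_sigma=1.0):
    """Add one variable per iteration while the gain exceeds x_sigma errors.

    evaluate(variables) is a hypothetical callable returning
    (S/sqrt(S+B), its uncertainty) for a classifier using those variables.
    """
    selected, remaining = [], list(candidates)
    best_value, best_error = 0.0, 0.0
    while remaining:
        trials = [(evaluate(selected + [v]), v) for v in remaining]
        (value, error), variable = max(trials, key=lambda t: t[0][0])
        if value - best_value <= x_sigma * math.hypot(error, best_error):
            break  # improvement is not significant: stop adding variables
        selected.append(variable)
        remaining.remove(variable)
        best_value, best_error = value, error
    return selected, best_value


# Toy stand-in for a real classifier: each variable adds a fixed gain.
toy_gain = {"a": 3.0, "b": 1.5, "c": 0.05}


def toy_evaluate(variables):
    return sum(toy_gain[v] for v in variables), 0.1


chosen, value = greedy_select(toy_gain, toy_evaluate, x_sigma=1.0)
print(chosen, value)
```

A deletion-based variant, in the spirit of the Goldstein remark, would start from all variables and drop the one whose removal costs least.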

Sherry Towers


A “real-world” example…

• A Tevatron Run I analysis used a 7 variable NN to discriminate between signal and background.

• Were all 7 needed?

• Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method…

Sherry Towers


A “real-world” example…

[Figure: background efficiency (vertical axis, 0 to 1) against signal efficiency (horizontal axis, 0 to 1) for two Jetnet networks: all seven original variables, and only the two best discriminators.]

Sherry Towers


Blind Analysis

Paul Harrison

Queen Mary University of London

Durham Workshop on Advanced Statistical Techniques

in Particle Physics

Acknowledgements: B. Meadows, J. Richman, A. Roodman, A. Watson

Paul Harrison


An Example of Experimenters' Bias: the "Split A2"

• At CERN in the mid 1960's, a group using a missing mass spectrometer observed several new mesons in the MM spectrum from π⁻p → p + MM⁻

• The A2 (now known as the I=1 member of the 2+ nonet) was apparently split. It was fitted with a dipole form.

• The split A2 was discussed for several years, and generated considerable speculation by theorists on causality, non-local field theory etc.

Paul Harrison


The Split "A2" Contd.

• Similar experiments performed later found no evidence at all for a split.

• Other experiments gathered data on A2 via decays to K⁺K⁻. They found no evidence for a split.

• At the Washington APS meeting of 1971, B. Maglic, the spokesman of the original CERN experiment, revealed that several cuts which had been made on the data were unnecessary.

• One of the cuts was based on "running conditions". His group discarded whole runs in which the split did not show up!

• This was widely regarded as an example of "innocent bias".

Paul Harrison


Do the LEP Experiments Agree Too Well?

Yes! (agreement with SM is also v. good cf. error.)

Is this because:

• experimenters' bias has pulled the results closer to each other and to SM value?

• systematic uncertainties are overestimated?

• Or have we simply chosen one result which happens to have fluctuated "down" in χ²?

χ²/dof = 0.92/7 (= 2.1/7 ignoring systematic errors!).
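To put a number on "too well", one can ask how often a χ² this small or smaller would arise by chance for 7 degrees of freedom. The sketch below is an addition and not part of the talk. It assumes Gaussian, correctly estimated, uncorrelated errors, and evaluates the χ² cumulative distribution from the power series of the regularised incomplete gamma function using only the standard library. The function name `chi2_cdf` is an assumption.

```python
import math


def chi2_cdf(chi2, ndof, terms=200):
    """P(chi^2 <= chi2) for ndof degrees of freedom.

    Uses the power series of the regularised lower incomplete gamma
    function P(a, x) with a = ndof/2 and x = chi2/2. The series converges
    quickly for the small chi2 values of interest here.
    """
    if chi2 <= 0:
        return 0.0
    a, x = ndof / 2.0, chi2 / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


for chi2 in (0.92, 2.1):
    print(f"chi2 = {chi2} for 7 dof: P(chi2 or smaller) = {chi2_cdf(chi2, 7):.4f}")
```

A small probability here does not say which of the explanations listed above is the right one. It only quantifies how unusual the agreement is under the stated assumptions.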

Paul Harrison


Blinding in Rare Decay Analyses

"Rare decays" ⇒

• BF not yet measured, or poorly known

• Backgrounds probably large: analysis must provide significant reduction factor

Then blind analysis highly desirable!

For "cut & count" analysis, the hidden "signal box" method is recommended.

Signal box defined by loose cuts.

Blinding means excluding events in signal box from analysis AND plots.

Sidebands are used to characterise the background in each variable.

Assumes that the variables are uncorrelated - checked in Monte Carlo.
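A minimal sketch of the hidden signal box for two variables. The talk does not give a formula, so the sideband estimate used here (the product of the two one-variable sideband counts divided by the count outside both windows) is an assumption that is valid only when the variables are uncorrelated for background. The window values, function names and toy events are all invented for illustration.

```python
def in_window(value, window):
    low, high = window
    return low <= value <= high


def blind(events, x_window, y_window):
    """Return the events with everything inside the signal box removed."""
    return [(x, y) for x, y in events
            if not (in_window(x, x_window) and in_window(y, y_window))]


def expected_background(blinded, x_window, y_window):
    """Sideband estimate of the background in the hidden box.

    Valid only if x and y are uncorrelated for background, in which case
    N_box = N(x in, y out) * N(x out, y in) / N(x out, y out).
    """
    x_only = sum(in_window(x, x_window) and not in_window(y, y_window)
                 for x, y in blinded)
    y_only = sum(not in_window(x, x_window) and in_window(y, y_window)
                 for x, y in blinded)
    neither = sum(not in_window(x, x_window) and not in_window(y, y_window)
                  for x, y in blinded)
    if neither == 0:
        raise ValueError("no events outside both windows")
    return x_only * y_only / neither


# Toy data: a uniform 10 x 10 grid of background-like events (invented).
events = [(x, y) for x in range(10) for y in range(10)]
x_window, y_window = (4, 5), (4, 5)
visible = blind(events, x_window, y_window)
print(len(visible), expected_background(visible, x_window, y_window))
```

Only `visible` would be passed to plotting and cut-optimisation code before unblinding. The count inside the box is looked at once, at the end.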

Paul Harrison


Unblinding

• In BABAR an analysis will normally have been presented to an Analysis Working Group before unblinding. The presentation will include:

– Description of cut optimisation and BG characterisation.
– Expected # of background events in signal box
– Signal efficiency from MC or control samples
– Expected statistical sensitivity
– Estimation of systematic errors

• After discussion, it can be unblinded.

• For a rare decay mode, this is essentially just a counting exercise: how many events are inside the signal box?

Paul Harrison


Closing Comments

• BA brings particle physics into line with best practice from other branches of science.

• More a formalisation of good experimental practice than a radical new idea.

• An analysis which is not blind is not necessarily a wrong analysis

• An analysis which is blind is not necessarily a right analysis

• The field has its fair share of embarrassing wrong results

• Even the chance of experimenters' bias reduces our confidence in our results.

• If we can reduce risks of bias, why not do so?

Paul Harrison


Systematic Errors: Facts and Fictions

I Errors: Mistakes and Uncertainties
• Diversion: Effect of different priors
II Evaluation of Uncertainty
• Diversion: The Errors on Errors Puzzle
III Checking for Mistakes
• Diversion: Defining a "Small Difference"
IV Systematics in practice

Roger Barlow


Nice Java limits calculator – runs straight from Netscape!
http://www.slac.stanford.edu/~barlow/

• Limits on […] from observed […].
• Cousins and Highland: […]
• BaBar: […]
• Suppose you observe […] events
• Uncertainty of […] on […]
• […] is […] (C&H) but […] (BaBar SWG)
• Why? Ambiguity in prior: Gaussian in […] is not Gaussian in […].
• Jeffreys' prior (uniform in […]) gives intermediate result.

Roger Barlow


• Thou shalt never say 'systematic error' when thou meanest 'systematic effect'.

• Thou shalt know at all times whether what thou performest is a check for a mistake or an evaluation of an uncertainty.

• Thou shalt not incorporate successful check results into thy total systematic error and make thereby a shield behind which to hide thy dodgy result.

• Thou shalt not incorporate failed check results unless thou art truly at thy wits' end.

• Thou shalt say what thou doest, and thou shalt be able to justify it out of thine own mouth; not the mouth of thy supervisor, nor thy colleague who did the analysis last time, nor thy local statistics guru, nor thy mate down the pub. Do these, and thou shalt flourish, and thine analysis likewise.

Roger Barlow


Uncertainty on systematic effect


Conference Summary

Bob Cousins, UCLA
22 March 2002

Bob Cousins


Educate Your Colleagues!

• The area under the likelihood function is meaningless.

• Mode of a probability density is metric-dependent, as are shortest intervals.

• A confidence interval is a statement about P(data | parameters), not P(parameters | data)

• Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself).

• P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics.

• The argument for coherence of Bayesian P is based on P = subjective degree of belief.
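A short worked example of the metric-dependence point, added here for illustration and not taken from the talk. The density and the change of variable are arbitrary choices.

```latex
\begin{align}
p_x(x) &= x\,e^{-x}, \quad x \ge 0
  && \Rightarrow \quad \text{mode at } x = 1, \\
y &= x^2, \qquad
p_y(y) = p_x\!\left(\sqrt{y}\right)\left|\frac{dx}{dy}\right|
       = \sqrt{y}\,e^{-\sqrt{y}}\cdot\frac{1}{2\sqrt{y}}
       = \tfrac{1}{2}\,e^{-\sqrt{y}}
  && \Rightarrow \quad \text{mode at } y = 0 .
\end{align}
```

The same probability content therefore peaks at x = 1 in one metric and at y = 0, that is x = 0, in the other. The Jacobian factor moves the mode, which is why a mode (or a shortest interval) quoted without its metric is ambiguous.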

Bob Cousins

Not a PDF!
