Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets Advanced Statistical Techniques in Particle Physics Grey College, Durham 18 - 22 March 2002 M. J. Wang Institute of Physics Academia Sinica


Detect Unknown Systematic Effect: Diagnose bad fit to multiple data sets

Advanced Statistical Techniques in Particle Physics

Grey College, Durham

18 - 22 March 2002

M. J. Wang

Institute of Physics

Academia Sinica


Preface

• Motivation and gratitude

– Learned quite a lot at the workshop on confidence limits at Fermilab in 2000

– Thanks for hosting this conference

• Main title: Detect Unknown Systematic Effect

– More suitable to the aim of this conference

– Important for experimentalists

– Might be detectable in a global fit

• Sub-title: Diagnose bad fit to multiple data sets

– Global fit is not internally consistent

– Don't know which part is wrong

– Need to diagnose the data sample

Outline

• Introduction

• Global fit and its goodness of fit

• Parameter fitting criterion

• Diagnose bad fit to multiple data sets

• Conclusion

Introduction

• Knowledge of parton distribution function is essential for hadron collider research

• Global fit is used to obtain parton distribution function

• Uncertainties of parton distribution function parameters

– Precision hadron collider results require estimates of uncertainties of parton distribution function parameters

– Important for Fermilab Run II and LHC physics analyses

Introduction

• Knowledge of parton distribution function is essential for hadron collider research

– Interpretation of data with SM

– SM parameter precision measurement

– Search for beyond SM signal

• Global fit is used to obtain parton distribution function

– Non-perturbative parton distribution functions cannot be calculated in perturbative QCD (PQCD)

– Therefore, they are determined by a global fit to data

Global fit and goodness of fit

• Reliable parton distribution function parameter and uncertainty estimates require passing a goodness-of-fit criterion

– Total chi-square is used for goodness of fit

– ± sqrt(2N) is used as the accepted range (the standard deviation of a chi-square with N degrees of freedom)

• Is total chi-square good enough for goodness of fit?

– Total chi-square is insensitive to a small subset of data with a bad fit

• Is there a more stringent criterion?

– Need new idea
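The insensitivity of the total chi-square can be illustrated with expectation values alone. The sketch below is not from the talk: the numbers (1000 well-described points, one 20-point data set offset by 1.2 sigma) and the helper names `expected_chi2` and `within_accepted_range` are hypothetical, chosen only to show the effect.

```python
import math

def expected_chi2(n_points, shift_in_sigma=0.0):
    """Expected chi-square of n_points unit-variance Gaussian pulls
    whose mean is offset by shift_in_sigma: E[(z + s)^2] = 1 + s^2."""
    return n_points * (1.0 + shift_in_sigma ** 2)

def within_accepted_range(chi2, n_points):
    """Accepted range quoted on the slide: N +/- sqrt(2N)."""
    return abs(chi2 - n_points) <= math.sqrt(2.0 * n_points)

# Hypothetical global fit (assumed numbers, for illustration only).
n_good, n_bad, shift = 1000, 20, 1.2

chi2_bad = expected_chi2(n_bad, shift)           # the badly fitted subset
chi2_total = expected_chi2(n_good) + chi2_bad    # whole global fit
n_total = n_good + n_bad

total_passes = within_accepted_range(chi2_total, n_total)
subset_passes = within_accepted_range(chi2_bad, n_bad)

print(f"total : chi2 = {chi2_total:.1f} for N = {n_total}, passes = {total_passes}")
print(f"subset: chi2 = {chi2_bad:.1f} for N = {n_bad}, passes = {subset_passes}")
```

The same excess of chi-square is small compared with sqrt(2N) of the full fit but large compared with sqrt(2N) of the subset, so the subset fails while the total passes.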

Parameter fitting criterion

• Idea motivated by Louis Lyons's goodness of fit paradox at ACAT 2000

• J.C. Collins and J. Pumplin applied this idea to the goodness of fit for global fit

– Hypothesis-testing vs parameter-fitting criteria

– Subset chi-square against total chi-square

– Found inconsistent data sets in CTEQ5 data sets

• Still don't know which part is correct and which is wrong

Parameter fitting criterion

– Hypothesis-testing vs parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)

Parameter fitting criterion

– Subset chi-square against total chi-square (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 10)

Parameter fitting criterion

– Found inconsistent data sets in CTEQ5 data (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 13)

Diagnose bad fit to multiple data sets

• Importance of studying bad fit

– Is the inconsistent data set free of unknown systematic effects?

– Is the theoretical prediction adequate?

– Is there any hint for new physics?

• Any statistic for diagnostic purposes?

– Pull can be used to identify an inconsistent experiment or data point (thanks to F. James's "Statistical methods in experimental physics")

– But for real data, there is no measured pull distribution for each data point

– What should we do with pull?

Diagnose bad fit to multiple data sets

• Pull definition for each data point

Mi = Ti + ( random error )

Ri = Ti - Mi = -( random error )

Pi = Ri / sigma( Ri )

• Pull properties

– Gaussian shape

– Center at zero

– With unit variance

– Independence among pulls of different data points
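The pull properties listed above can be checked with a small toy simulation. This is a sketch under stated assumptions, not the author's code: it reads Mi as the measured value and Ti as the true value, takes sigma(Ri) to be the known random error, and uses made-up `truth` and `sigma` values for six channels with 10,000 pseudo-measurements each.

```python
import random
import statistics

def simulate_pulls(truth, sigma, n_entries, seed=0):
    """Return pulls[channel][entry] for repeated pseudo-measurements."""
    rng = random.Random(seed)
    pulls = [[] for _ in truth]
    for _ in range(n_entries):
        for ch, (t, s) in enumerate(zip(truth, sigma)):
            m = t + rng.gauss(0.0, s)   # Mi = Ti + (random error)
            r = t - m                   # Ri = Ti - Mi
            pulls[ch].append(r / s)     # Pi = Ri / sigma(Ri)
    return pulls

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))

# Hypothetical true values and random errors for six channels.
truth = [10.0, 8.0, 6.5, 5.0, 4.0, 3.2]
sigma = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

pulls = simulate_pulls(truth, sigma, n_entries=10_000)
means = [statistics.fmean(p) for p in pulls]     # expected near 0
widths = [statistics.stdev(p) for p in pulls]    # expected near 1
rho_01 = pearson(pulls[0], pulls[1])             # expected near 0

for ch, (mu, w) in enumerate(zip(means, widths)):
    print(f"channel {ch}: mean = {mu:+.3f}, width = {w:.3f}")
print(f"correlation between channels 0 and 1: {rho_01:+.3f}")
```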

Diagnose bad fit to multiple data sets

• Systematic effects introduce correlation among pulls

– Constant shift on all data points

– Correlated shift on all data points
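How a shift shared by all data points correlates their pulls can be seen in a two-point toy. The sketch assumes, purely for illustration, that the unknown shift is drawn once per pseudo-experiment from a Gaussian of width `sigma_shift` and added to both points, while the pull is still normalised by the random error alone. Under that assumption the expected correlation is sigma_shift² / (sigma_random² + sigma_shift²).

```python
import random
import statistics

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (statistics.stdev(a) * statistics.stdev(b))

def two_point_pulls(n_experiments, sigma_shift, sigma_random=1.0, seed=0):
    """Pulls of two data points that share one shift per pseudo-experiment."""
    rng = random.Random(seed)
    pulls_a, pulls_b = [], []
    for _ in range(n_experiments):
        shift = rng.gauss(0.0, sigma_shift)   # common, unknown to the analyst
        # Ri = Ti - Mi = -(random error) - S, normalised by the random error only
        pulls_a.append(-(rng.gauss(0.0, sigma_random) + shift) / sigma_random)
        pulls_b.append(-(rng.gauss(0.0, sigma_random) + shift) / sigma_random)
    return pulls_a, pulls_b

rho_clean = pearson(*two_point_pulls(10_000, sigma_shift=0.0))
rho_shifted = pearson(*two_point_pulls(10_000, sigma_shift=1.0))

print(f"no shift:     correlation = {rho_clean:+.3f}")
print(f"common shift: correlation = {rho_shifted:+.3f}")
```

With equal widths for the shift and the random error, the expected correlation is 0.5, against zero when no shift is present.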

Diagnose bad fit to multiple data sets

• Correlation among pulls is the key for detecting unknown systematic effects

• Pull correlation study

– Pull distribution built from all data points in one experiment (experiment pull distribution)

– Pull as a function of measurement variable X

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

Mi = Ti + ( random error ) + Si ( or S )

Ri = Ti - Mi = -( random error ) - Si ( or S )

Pi = Ri / sigma( Ri )
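The effect of a constant shift S on the pulls can be worked out directly from the definitions above. The derivation below is a sketch under two assumptions that the slide does not state: the random error is Gaussian with width σ_i, and sigma(Ri) is that same σ_i, since an unknown shift cannot enter the error estimate. The symbols ε_i, n and the experiment-averaged pull P-bar are introduced here for illustration.

```latex
\begin{aligned}
M_i &= T_i + \varepsilon_i + S, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_i^2) \\
P_i &= \frac{T_i - M_i}{\sigma_i} = -\frac{\varepsilon_i + S}{\sigma_i} \\
\mathrm{E}[P_i] &= -\frac{S}{\sigma_i}, \qquad \mathrm{Var}[P_i] = 1 \\
\bar{P} &= \frac{1}{n} \sum_{i=1}^{n} P_i, \qquad
\mathrm{E}[\bar{P}] = -\frac{S}{\sigma}, \qquad
\mathrm{Var}[\bar{P}] = \frac{1}{n} \quad \text{(for equal errors } \sigma_i = \sigma\text{)}
\end{aligned}
```

The offset of the experiment pull distribution is therefore significant at roughly sqrt(n) |S| / σ standard errors. With n = 100 entries, a three-standard-error effect needs |S| / σ of about 0.3.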

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (MC data vs true curve)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (residual distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (10% uncertainty on the error estimate of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (pull distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (effect of error estimate uncertainties of 0%, 10%, 20% on the pull distribution with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (experiment residual and pull distributions with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (experiment residual and pull profiles as a function of X with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (experiment residual and pull distributions with 100 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

1. Constant horizontal shift (experiment residual and pull profile as a function of X with 100 entries)
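The pull profile for a constant horizontal shift can be sketched with a toy Monte Carlo. Everything specific here is an assumption made for illustration and not taken from the talk: the falling curve in `true_curve`, the shift `dx = 0.1`, the constant error `sigma = 2.0`, and 10,000 entries per X value. The measurement is generated at X + dx but compared with the prediction at X.

```python
import math
import random
import statistics

def true_curve(x):
    """Hypothetical falling spectrum (an assumption, not from the talk)."""
    return 100.0 * math.exp(-0.5 * x)

def pull_profile(xs, dx, sigma, n_entries, seed=0):
    """Mean pull at each X when the data are shifted horizontally by dx."""
    rng = random.Random(seed)
    profile = []
    for x in xs:
        t = true_curve(x)                                   # Ti, evaluated at nominal X
        pulls = []
        for _ in range(n_entries):
            m = true_curve(x + dx) + rng.gauss(0.0, sigma)  # Mi, taken at shifted X
            pulls.append((t - m) / sigma)                   # Pi = (Ti - Mi) / sigma
        profile.append(statistics.fmean(pulls))
    return profile

xs = [0.5 * k for k in range(1, 11)]
dx, sigma = 0.1, 2.0

profile = pull_profile(xs, dx, sigma, n_entries=10_000)
# Expected mean pull without noise: (T(x) - T(x + dx)) / sigma
expected = [(true_curve(x) - true_curve(x + dx)) / sigma for x in xs]

for x, p, e in zip(xs, profile, expected):
    print(f"X = {x:4.1f}: mean pull = {p:+.3f} (expected {e:+.3f})")
```

In this toy the mean pull follows the local slope of the curve, so the profile varies with X instead of sitting at a single offset.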

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (MC data vs true curve)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (residual distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (pull distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (experiment residual and pull distributions with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (experiment residual and pull profile as a function of X with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (experiment residual and pull distributions as a function of X with 100 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

2. Constant vertical shift (experiment residual and pull profiles as a function of X with 100 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (MC data vs true curve)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (residual distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (pull distribution of the first 6 channels with 10,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (experiment residual and pull distributions with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (experiment residual and pull profiles as a function of X with 100,000 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (experiment residual and pull distributions as a function of X with 100 entries)

Diagnose bad fit to multiple data sets

• Naive case without known systematic uncertainties

– Representative systematic shifts

3. Combined horizontal and vertical shift (experiment residual and pull profiles as a function of X with 100 entries)

Diagnose bad fit to multiple data sets

• Real case with known systematic uncertainties

Mi = Ti + ( random error ) + ( systematic error ) + Si ( or S )

Ri = Ti - Mi = -( random error ) - ( systematic error ) - Si ( or S )

Pi = Ri / sigma( Ri )

Diagnose bad fit to multiple data sets

• Real case with known systematic uncertainties

– Need to take out known systematic uncertainty term in order to restore the independence property

– Need to fit the residual systematic effect with the aid of global fit

– Regain the naive case results
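One common way to take out a known correlated systematic is to fit a nuisance parameter with a unit-Gaussian penalty and subtract the fitted shift before forming pulls. The sketch below shows that approach; it is not necessarily what the talk intends. The names `beta` (the one-sigma shift of each point from the known systematic), `r_hat`, and all numerical values are hypothetical.

```python
def chi2(r, measured, theory, sigma, beta):
    """Chi-square with one known correlated systematic and penalty term r^2."""
    return sum(((m - t - r * b) / s) ** 2
               for m, t, s, b in zip(measured, theory, sigma, beta)) + r ** 2

def fit_known_systematic(measured, theory, sigma, beta):
    """Analytic minimum of chi2 with respect to the nuisance parameter r."""
    num = sum(b * (m - t) / s ** 2
              for m, t, s, b in zip(measured, theory, sigma, beta))
    den = 1.0 + sum((b / s) ** 2 for s, b in zip(sigma, beta))
    return num / den

def corrected_pulls(measured, theory, sigma, beta):
    """Pulls after subtracting the fitted known systematic shift."""
    r_hat = fit_known_systematic(measured, theory, sigma, beta)
    return [(t - (m - r_hat * b)) / s
            for m, t, s, b in zip(measured, theory, sigma, beta)]

# Hypothetical four-point data set: theory plus a one-sigma known systematic
# plus small leftover offsets standing in for random error.
theory = [10.0, 8.0, 6.0, 4.0]
sigma = [0.5, 0.5, 0.5, 0.5]
beta = [1.0, 0.8, 0.6, 0.4]
leftover = [0.10, -0.20, 0.15, -0.05]
measured = [t + 1.0 * b + e for t, b, e in zip(theory, beta, leftover)]

r_hat = fit_known_systematic(measured, theory, sigma, beta)
pulls = corrected_pulls(measured, theory, sigma, beta)

print(f"fitted nuisance parameter r_hat = {r_hat:.3f}")
print("corrected pulls:", [round(p, 3) for p in pulls])
```

The corrected pulls can then be studied as in the naive case. An unknown shift that resembles `beta` in shape would be partly absorbed into `r_hat`, so this subtraction is only a starting point.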

Conclusion

• Global fit is important in determining parton distribution function parameter and uncertainties

• There are inconsistent data samples found by the parameter fitting criterion

• Correlations among pulls could serve as a means of detecting unknown systematic effects

• Next step: implement this technique and apply it to the global fit