ES07
These slides can be found at http://www.hep.lu.se/staff/stenlund/Somethings.ppt (optimized for Windows)
The Gaussian distribution
f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
f(x)dx is the probability that an observation will fall between x − dx/2 and x + dx/2
Normally the Gaussian distribution is standardized by putting μ = 0 and σ = 1.
φ(λ) = (1/√(2π)) exp(−λ²/2) is called the frequency function.
Note that φ(λ) = φ(−λ).
The distribution function Φ(λ) is the primitive function of the frequency function.
The distribution function cannot be calculated analytically, but it is tabulated in most standard books, or it can be approximated.
Note that Φ(−λ) = 1 − Φ(λ) and that F(x) = Φ((x − μ)/σ).
The probability to obtain a value between λ1 and λ2 (λ1 < λ2) is given by Φ(λ2) − Φ(λ1).
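Φ is not part of the standard libraries under that name, but it can be expressed through the error function as Φ(λ) = ½(1 + erf(λ/√2)). A minimal sketch in Python (the language choice is ours, not the slides'):

```python
import math

def Phi(lam):
    # Standard normal distribution function via the error function:
    # Phi(lam) = (1 + erf(lam / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))

# Probability of an observation falling between lambda1 = 1 and lambda2 = 2:
p = Phi(2.0) - Phi(1.0)
print(round(p, 4))   # → 0.1359
```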
An approximation which can be used for Φ(λ) is:
Φ(λ) = 1 − φ(λ)(a1·t + a2·t² + a3·t³) + ε(λ)   (λ ≥ 0)
where t = (1 + pλ)⁻¹
with p = 0.33267
a1 = 0.4361836
a2 = −0.1201676
a3 = 0.9372980
giving |ε(λ)| < 1·10⁻⁵
Expectation value, variance and covariance
E[X] = (1/N) Σ xi
V[X] = E[(X − E[X])²] = E[X²] − (E[X])²
Cov[X,Y] = E[(X − E[X])(Y − E[Y])]
The sum is over the whole population.
Standard deviation: σ = √V[X]
Variance of the population mean value
V[X̄] = V[X]/N
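The relation V[X̄] = V[X]/N can be checked by simulation: draw many samples of size N, compute their means, and look at the spread of those means. A sketch in Python (the sample size and seed are our choices):

```python
import random

random.seed(1)
N = 25          # sample size
M = 20000       # number of repeated samples

# Draw M samples of size N from a unit Gaussian, so V[X] = 1
means = []
for _ in range(M):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    means.append(sum(sample) / N)

mbar = sum(means) / M
v_mean = sum((m - mbar) ** 2 for m in means) / M
print(v_mean)   # close to V[X]/N = 1/25 = 0.04
```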
Expectation value and variance from a sample
Estimates with correct expectation value are thus given by:
x̄ = (1/N) Σ xi   and   s² = (1/(N − 1)) Σ (xi − x̄)²
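The two estimates above translate directly into code; note the division by N − 1 in the variance, which makes the estimate unbiased. A sketch in Python with made-up data:

```python
def sample_mean(xs):
    # x-bar = (1/N) * sum(x_i)
    return sum(xs) / len(xs)

def sample_variance(xs):
    # s^2 = (1/(N-1)) * sum((x_i - x-bar)^2); the N-1 gives E[s^2] = V[X]
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

xs = [1.1, 0.9, 1.3, 0.7, 1.0]   # illustrative data
print(sample_mean(xs), sample_variance(xs))
```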
The variance of the variance
This leads to an estimate of the “error” in the estimate of the standard deviation of a distribution.
Beware! V[V[X]] is normally a small positive number, but the terms used in its calculation are normally very large. High precision is needed in the calculations.
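The precision warning applies already to the variance itself: the "small difference of large terms" problem is easy to provoke. A sketch in Python contrasting the single-pass formula Σx² − (Σx)²/N, which cancels catastrophically, with the two-pass formula that subtracts the mean first (the data values are our illustrative choice):

```python
def var_naive(xs):
    # Single-pass formula: (sum(x^2) - (sum x)^2 / N) / (N - 1)
    n = len(xs)
    s, s2 = sum(xs), sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)

def var_twopass(xs):
    # Two-pass formula: subtract the mean first, then square
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Small spread around a huge offset: the true variance is 1e-4
xs = [1e8 - 0.01, 1e8, 1e8 + 0.01]
print(var_twopass(xs))   # accurate, close to 1e-04
print(var_naive(xs))     # wrong: catastrophic cancellation
```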
Parameter fitting with the maximum likelihood method
If we know that the sample we want to study comes from a certain distribution, e.g. a Gaussian with unknown parameters, we can fit those using the maximum likelihood method.
Calculate the probability to obtain exactly the sample you have as a function of the parameters, and maximize this probability:
L(μ,σ) = Π f(xi)   or   l(μ,σ) = Σ ln f(xi)
The “error” Δp of a parameter p is estimated by
l(p ± Δp) = lmax − ½
The l-function is usually close to a parabola near its maximum, so
l(p ± Δp) = lmax − ½
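For a Gaussian sample with σ fixed, l(μ) is an exact parabola, so the half-unit drop can be checked directly: l falls by exactly ½ at μ̂ ± σ̂/√N. A sketch in Python (the sample parameters and seed are our choices):

```python
import math, random

random.seed(7)
xs = [random.gauss(5.0, 2.0) for _ in range(400)]
N = len(xs)

mu_hat = sum(xs) / N
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / N)   # ML estimate

def loglike(mu):
    # l(mu) = sum of ln f(x_i), with sigma fixed at its ML estimate
    return sum(-0.5 * ((x - mu) / sigma_hat) ** 2
               - math.log(sigma_hat * math.sqrt(2.0 * math.pi)) for x in xs)

# l(mu) peaks at mu_hat; moving sigma_hat/sqrt(N) away drops it by 1/2
l_max = loglike(mu_hat)
delta = sigma_hat / math.sqrt(N)
print(l_max - loglike(mu_hat + delta))   # 0.5 up to rounding
```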
χ² fitting and χ² testing
This method needs binning of the data.
In each bin we have (xi)min, (xi)max, yi = ni/N and σi, which can be taken as √(ni)/N as long as ni ≥ 5 (no less than five observations in a bin) and ni ≪ N.
Minimize the sum S:
S = Σ (yi − yith)²/σi²
yith is calculated from the tested distribution. If this is a Gaussian with parameters μG and σG we have
yith = Φ(((xi)max − μG)/σG) − Φ(((xi)min − μG)/σG)
S can now be minimized with respect to the parameters to be fitted.
When Smin is found, the “error” of a parameter can be estimated from (cf. the maximum likelihood method)
S(p ± Δp) = Smin + 1
S is in many cases approximately of parabolic shape close to the minimum.
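The whole procedure can be sketched end to end in Python: bin Gaussian data into 7 bins, form yi and σi, build yith from Φ over the bin edges, and minimize S. The true parameters, bin edges, grid, and seed are our illustrative choices; a real fit would use a proper minimizer rather than this crude grid search:

```python
import math, random

def Phi(lam):
    return 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))

random.seed(3)
data = [random.gauss(10.0, 2.0) for _ in range(1000)]
N = len(data)

# Bin the data: 7 equal-width bins spanning most of the sample
edges = [4.0 + i * 1.75 for i in range(8)]
counts = [sum(1 for x in data if lo <= x < hi)
          for lo, hi in zip(edges, edges[1:])]
y = [n / N for n in counts]
sig = [math.sqrt(n) / N for n in counts]   # valid while n >= 5 and n << N

def S(mu, sigma):
    # Chi-square sum: y_th per bin is the Gaussian probability content
    s = 0.0
    for (lo, hi), yi, si in zip(zip(edges, edges[1:]), y, sig):
        y_th = Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)
        s += ((yi - y_th) / si) ** 2
    return s

# Crude grid search for Smin; recovers mu, sigma near (10, 2)
best = min((S(mu, sg), mu, sg)
           for mu in [9.0 + 0.02 * i for i in range(100)]
           for sg in [1.5 + 0.02 * j for j in range(50)])
print(best)
```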
S is χ² distributed with ν degrees of freedom. The number of degrees of freedom is the number of bins minus the number of parameters that are fitted.
In the previous example we had 7 bins and two parameters, giving ν = 5.

S (ν = 5)   P(χ² > S)   Meaning
1.6         0.90125     in only about 10 % of the cases a smaller S-value would be obtained
3.2         0.66918     in about 33 % of the cases a smaller S-value would be obtained
5.0         0.41588     in about 42 % of the cases a larger S-value would be obtained
7.8         0.16761     in about 17 % of the cases a larger S-value would be obtained
9.8         0.08110     in only about 8 % of the cases a larger S-value would be obtained

Generally we expect S/ν to be close to 1 if the fluctuations in the data are of purely statistical origin and if the data is described by the distribution in question.
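The tabulated values can be reproduced without special libraries by integrating the χ² frequency function numerically. A sketch in Python using Simpson's rule (the step count is our choice; a statistics library would normally provide this as a survival function):

```python
import math

def chi2_sf(s, nu, steps=10000):
    # P(chi^2_nu > s): one minus the CDF, via Simpson integration of the pdf
    def pdf(x):
        if x <= 0.0:
            return 0.0
        return (x ** (nu / 2.0 - 1.0) * math.exp(-x / 2.0)
                / (2.0 ** (nu / 2.0) * math.gamma(nu / 2.0)))
    h = s / steps
    total = pdf(0.0) + pdf(s)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - total * h / 3.0

print(round(chi2_sf(5.0, 5), 5))   # cf. the table row for S = 5.0 (≈ 0.41588)
```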
Confidence levels and confidence intervals
Assume that we have estimated a parameter p and found that p = 1.23 with σp = 0.11.
Let's say that we want to construct an interval that covers the true value of p with 90 % confidence. This means that we leave out 5 % on each side.
Start by finding λ so that Φ(λ) = 0.95: λ = 1.6449
pmax = 1.6449 · 0.11 + 1.23 = 1.41 and
pmin = −1.6449 · 0.11 + 1.23 = 1.05
We have found the two-sided confidence interval of our estimate of p on the 90 % confidence level to be
1.05 – 1.41
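The only nontrivial step is inverting Φ to get λ from the desired confidence level. A sketch in Python that inverts the erf-based Φ by bisection and reproduces the interval above:

```python
import math

def Phi(lam):
    return 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))

def Phi_inv(q, lo=-10.0, hi=10.0):
    # Invert the (monotone) distribution function by bisection
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, sigma_p = 1.23, 0.11
lam = Phi_inv(0.95)   # 1.6449: leaves 5 % in each tail for 90 % coverage
print(round(p - lam * sigma_p, 2), round(p + lam * sigma_p, 2))   # 1.05 1.41
```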
If we want to state that p < x with some confidence we can construct a one-sided confidence region.
Let's say that we want to construct a region that covers the true value of p with 99 % confidence.
Start by finding λ so that Φ(λ) = 0.99: λ = 2.3263
pmax = 2.3263 · 0.11 + 1.23 = 1.49
We have found the one-sided confidence region of our estimate of p on the 99 % confidence level to be
p < 1.49
Hypothesis testing (simple case)
Let's again assume that we have estimated a parameter p and found that p = 1.23 with σp = 0.11.
Now we have a hypothesis stating that p = 1.4.
We now ask ourselves with what probability the hypothesis is wrong.
We calculate λ = (1.4 − 1.23)/0.11 = 1.5455, and the probability is given by Φ(λ) = 0.939, i.e. we can state with 94 % confidence that the hypothesis is wrong.
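This test is a one-liner once Φ is available; a sketch in Python reproducing the numbers above:

```python
import math

def Phi(lam):
    return 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))

p_est, sigma_p, p_hyp = 1.23, 0.11, 1.4

lam = (p_hyp - p_est) / sigma_p
print(round(lam, 4))        # 1.5455
print(round(Phi(lam), 3))   # 0.939
```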