Error analysis lecture 3
TRANSCRIPT
-
7/30/2019 Error analysis lecture 3
Physics 509
Common Probability Distributions
Scott Oser, Lecture #3
-
Last Time
We discussed various descriptive statistics (mean, variance,
mode, etc.), and studied a few of the most common probability
distributions:
Gaussian (normal) distribution
Cauchy/Lorentz/Breit-Wigner distribution
Binomial distribution
Multinomial distribution
Negative binomial distribution

TODAY
More common distributions: Poisson, exponential, χ²
Methods for manipulating and deriving new PDFs
Marginalizing and projecting multi-dimensional PDFs
-
Poisson Distribution
Suppose that some event happens at random times with a
constant rate R (probability per unit time). (For example,
supernova explosions.)

If we wait a time interval dt, then the probability of the event
occurring is R dt. If dt is very small, then there is negligible
probability of the event occurring twice in any given time interval.

We can therefore divide any time interval of length T into N = T/dt
subintervals. In each subinterval an event either occurs or
doesn't occur. The total number of events occurring therefore
follows a binomial distribution:

P(k | p = R dt, N) = [N! / (k! (N−k)!)] p^k (1−p)^(N−k)
-
Poisson Distribution
Let dt = T/N → 0, so that N goes to infinity. Then

P(k | p = R dt, N) = lim_{N→∞} [N! / (k! (N−k)!)] (RT/N)^k (1 − RT/N)^(N−k)

                   = lim_{N→∞} [N^k / k!] (RT/N)^k (1 − RT/N)^N (1 − RT/N)^(−k)

                   = (RT)^k e^(−RT) / k!  =  λ^k e^(−λ) / k!,   with λ ≡ RT

P(k|λ) is called the Poisson distribution. It is the probability
of seeing k events that happen randomly at constant rate R
within a time interval of length T.

From the derivation, it's clear that the binomial distribution
approaches a Poisson distribution when N is large and p is very small.

λ is the mean number of events expected in the interval T.
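The limit above is easy to check numerically. The following sketch (pure Python, nothing beyond the standard library; λ = 3 is an arbitrary choice) compares the binomial PMF with p = λ/N to the Poisson PMF as N grows:

```python
import math

def binom_pmf(k, N, p):
    # P(k | p, N) = N!/(k!(N-k)!) p^k (1-p)^(N-k)
    return math.comb(N, k) * p**k * (1 - p)**(N - k)

def poisson_pmf(k, lam):
    # P(k | lambda) = lambda^k e^(-lambda) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # lambda = R*T, the expected number of events
for N in (10, 100, 10000):
    worst = max(abs(binom_pmf(k, N, lam / N) - poisson_pmf(k, lam))
                for k in range(10))
    print(f"N={N:6d}  max |binomial - Poisson| = {worst:.2e}")
```

The discrepancy shrinks roughly like 1/N, as the derivation suggests.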
-
Properties of the Poisson distribution
Mean = λ
Variance = λ
Approaches a Gaussian distribution when λ gets large.
Note that in this case, the standard deviation is in fact equal to sqrt(N).
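Both moments follow directly from the PMF. A quick stdlib-only check (λ = 4 is an arbitrary choice; the sum is truncated far out in the tail, where the terms are negligible):

```python
import math

lam = 4.0
# Poisson PMF for k = 0 .. 99; for lambda = 4 the tail beyond this is negligible
pmf = [lam**k * math.exp(-lam) / math.factorial(k) for k in range(100)]

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean)**2 * p for k, p in enumerate(pmf))
print(mean, var)  # both come out equal to lambda = 4
```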
-
Poisson vs. Gaussian distribution
-
The sum of two Poisson variables is Poisson
Here we will consider the sum of two independent Poisson
variables X and Y. If the mean numbers of expected events
of each type are A and B, we naturally would expect that
the sum will be a Poisson with mean A+B.

Let Z = X + Y. Consider P(X,Y):

P(X,Y) = P(X) P(Y) = [e^(−A) A^X / X!] [e^(−B) B^Y / Y!] = e^(−(A+B)) A^X B^Y / (X! Y!)

To find P(Z), sum P(X,Y) over all (X,Y) satisfying X+Y=Z:

P(Z) = Σ_{X=0}^{Z} e^(−(A+B)) A^X B^(Z−X) / (X! (Z−X)!)
     = [e^(−(A+B)) / Z!] Σ_{X=0}^{Z} [Z! / (X! (Z−X)!)] A^X B^(Z−X)

P(Z) = e^(−(A+B)) (A+B)^Z / Z!   (by the binomial theorem)
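The algebra can be verified term by term. A stdlib sketch, with arbitrary example means A and B:

```python
import math

def pois(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

A, B = 2.0, 3.5  # arbitrary example means
for Z in range(15):
    # sum P(X)P(Y) over all X+Y = Z, exactly as in the derivation
    conv = sum(pois(X, A) * pois(Z - X, B) for X in range(Z + 1))
    assert abs(conv - pois(Z, A + B)) < 1e-12
print("convolution of Poisson(2.0) and Poisson(3.5) is Poisson(5.5)")
```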
-
Why do I labour the point?
First, calculating the PDF for a function of two other random
variables is good practice.

More importantly, I want you to develop some intuition for
how these distributions work. The sum of two Gaussians
is a Gaussian, even if they have different means and RMSs.
The sum of two Poissons is a Poisson, even if the means are
different.
What about the sum of two binomial random variables?
-
Sum of two binomials is binomial only if p1 = p2

I hope it's intuitively obvious that the number of heads from
N coin flips plus the number from M coin flips is equivalent
to the number from N+M coin flips, if you flip identical coins.

But what if p1 ≠ p2? Consider the mean and variance of the sum:

mean = N p1 + M p2
variance = N p1 (1 − p1) + M p2 (1 − p2)

This doesn't have the generic form of the mean and variance
formulas for a binomial distribution, unless p1 = p2.

Contrast with the case of summing two Poissons:

mean = λ1 + λ2
variance = λ1 + λ2
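The binomial mismatch can be seen directly by convolving the two PMFs. In this stdlib sketch (N = 5, M = 7 and the p values are arbitrary illustrations), identical coins reproduce a single binomial exactly, while different coins do not match any binomial with the same mean:

```python
import math

def bpmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def conv(N, p1, M, p2):
    # exact PMF of the sum of Binomial(N, p1) and Binomial(M, p2)
    return [sum(bpmf(i, N, p1) * bpmf(k - i, M, p2)
                for i in range(max(0, k - M), min(N, k) + 1))
            for k in range(N + M + 1)]

N, M = 5, 7
same = conv(N, 0.3, M, 0.3)            # identical coins
mixed = conv(N, 0.3, M, 0.8)           # different coins
p_avg = (N * 0.3 + M * 0.8) / (N + M)  # binomial candidate with matching mean
err_same = max(abs(same[k] - bpmf(k, N + M, 0.3)) for k in range(N + M + 1))
err_mixed = max(abs(mixed[k] - bpmf(k, N + M, p_avg)) for k in range(N + M + 1))
print(err_same, err_mixed)  # ~0 versus a clearly nonzero mismatch
```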
-
Things you might model with a Poisson:
Number of supernovas occurring per century
Number of Higgs particles produced in a detector during a collision
As an approximation for a binomial distribution where N is large and p is small
What about the number of people dying in traffic accidents each day in Vancouver?

WARNING: the number of events in a histogram bin often
follows a Poisson distribution. When that number is small,
a Gaussian distribution is a poor approximation to the
Poisson. Beware of statistical tools that assume Gaussian
errors when the number of events in a bin is small (e.g. a
χ² fit to a histogram)!
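To see how bad the Gaussian approximation gets at small counts, compare P(0) for a Poisson against a Gaussian with matching mean and variance (a stdlib sketch; μ = 2 is an arbitrary illustration):

```python
import math

mu = 2.0  # small expected count per bin
pois0 = math.exp(-mu)  # Poisson P(k=0) = e^(-mu)
# Gaussian with matching mean mu and variance mu, evaluated at k = 0
gauss0 = math.exp(-(0 - mu)**2 / (2 * mu)) / math.sqrt(2 * math.pi * mu)
print(pois0, gauss0)  # ~0.135 vs ~0.104: a ~23% discrepancy
# the Gaussian also puts probability on impossible negative counts:
# P(k < 0) is about 8% for a N(2, sqrt(2)) distribution
```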
-
An exponential distribution
Consider for example the distribution of measured lifetimes
for a decaying particle:

P(t | τ) = (1/τ) e^(−t/τ)   (both t, τ > 0)

mean: ⟨t⟩ = τ
RMS: σ = τ

HW question: Is the sum of two random variables that
follow exponential distributions itself exponential?
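The mean = RMS = τ property is easy to confirm by simulation (stdlib only; τ = 2.5, the seed, and the sample size are arbitrary choices):

```python
import math
import random

random.seed(509)
tau = 2.5
n = 200_000
# expovariate takes the rate 1/tau as its argument
samples = [random.expovariate(1 / tau) for _ in range(n)]

mean = sum(samples) / n
rms = math.sqrt(sum((t - mean)**2 for t in samples) / n)
print(mean, rms)  # both close to tau = 2.5
```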
-
The χ² distribution

Suppose that you generate N random numbers from a
normal distribution with μ=0, σ=1: Z_1 ... Z_N.

Let X be the sum of the squared variables:

X = Σ_{i=1}^{N} Z_i²

The variable X follows a χ² distribution with N degrees of freedom:

P(χ² | N) = [1 / (2^(N/2) Γ(N/2))] (χ²)^((N−2)/2) e^(−χ²/2)

Recall that Γ(N) = (N−1)! if N is an integer.
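As a sanity check on the formula, integrate it numerically (stdlib midpoint rule; N = 5 and the grid are arbitrary choices):

```python
import math

def chi2_pdf(x, N):
    # P(chi^2 | N) = x^((N-2)/2) e^(-x/2) / (2^(N/2) Gamma(N/2))
    return x**((N - 2) / 2) * math.exp(-x / 2) / (2**(N / 2) * math.gamma(N / 2))

N = 5
dx = 0.001
xs = [dx * (i + 0.5) for i in range(int(100 / dx))]  # tail beyond 100 is negligible
norm = sum(chi2_pdf(x, N) for x in xs) * dx
mean = sum(x * chi2_pdf(x, N) for x in xs) * dx
print(norm, mean)  # the PDF integrates to 1, with mean N = 5
```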
-
Properties of the χ² distribution

A χ² distribution has mean = N, but variance = 2N.

This makes it relatively easy to estimate probabilities on
the tail of a χ² distribution.
-
Properties of the χ² distribution

Since χ² is a sum of N independent and identical random
variables, it is true that it tends to be Gaussian in the limit
of large N (central limit theorem) ...

But the quantity sqrt(2χ²) is actually much more Gaussian,
as the plots to the left show! It has mean of sqrt(2N−1) and
unit variance.
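A simulation sketch of the sqrt(2χ²) trick for N = 50 (stdlib; the seed and sample sizes are arbitrary):

```python
import math
import random

random.seed(42)
N = 50
trials = 20_000
# each trial: a chi^2 as the sum of N squared unit Gaussians, then sqrt(2*chi^2)
vals = [math.sqrt(2 * sum(random.gauss(0, 1)**2 for _ in range(N)))
        for _ in range(trials)]

m = sum(vals) / trials
v = sum((x - m)**2 for x in vals) / trials
print(m, v)  # mean ~ sqrt(2N-1) = 9.95, variance ~ 1
```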
-
Calculating a χ² tail probability

You're sitting in a talk, and someone shows a dubious-looking
fit, and claims that the χ² for the fit is 70 for 50
degrees of freedom. Can you work out in your head how
likely it is to get that large of a χ² by chance?
-
Calculating a χ² tail probability

You're sitting in a talk, and someone shows a dubious-looking
fit, and claims that the χ² for the fit is 70 for 50
degrees of freedom. Can you work out in your head how
likely it is to get that large of a χ² by chance?

Estimate 1: The mean should be 50, and the RMS is
sqrt(2N) = sqrt(100) = 10, so this is a 2σ fluctuation. For a
normal distribution, the probability content above +2σ is 2.3%.

More accurate estimate: sqrt(2χ²) = sqrt(140) = 11.83. The mean
should be sqrt(2N−1) = 9.95. This is really more like a 1.88σ
fluctuation.

It is good practice to always report the P value, whether
good or bad.
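The two head-estimates can be checked against the exact tail probability by numerically integrating the χ² PDF (stdlib only; the integration grid is an arbitrary choice):

```python
import math

def chi2_pdf(x, N):
    return x**((N - 2) / 2) * math.exp(-x / 2) / (2**(N / 2) * math.gamma(N / 2))

def normal_sf(z):
    # upper-tail probability of a standard normal
    return 0.5 * math.erfc(z / math.sqrt(2))

N, x0 = 50, 70.0
dx = 0.01
# integrate the chi^2 PDF from 70 out to 270 (the rest of the tail is negligible)
exact = sum(chi2_pdf(x0 + dx * (i + 0.5), N) for i in range(int(200 / dx))) * dx

est1 = normal_sf((x0 - N) / math.sqrt(2 * N))               # the "2 sigma" estimate
est2 = normal_sf(math.sqrt(2 * x0) - math.sqrt(2 * N - 1))  # the sqrt(2 chi^2) estimate
print(exact, est1, est2)  # the sqrt(2 chi^2) estimate lands closer to the exact value
```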
-
Uses of the χ² distribution

The dominant use of the χ² statistic is for least squares fitting:

χ² = Σ_{i=1}^{N} [(y_i − f(x_i)) / σ_i]²

The best fit values of the parameters are those that minimize the χ².

If there are m free parameters, and the deviations of the
measured points from the model follow Gaussian
distributions, then this statistic should follow a χ² distribution
with N−m degrees of freedom. More on this later.

χ² is also used to test the goodness of the fit (Pearson's χ² test).
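A minimal worked example (the data values and uncertainties here are invented for illustration): fitting a single constant c to five measurements, where minimizing χ² gives the inverse-variance weighted mean.

```python
# hypothetical measurements y_i with Gaussian uncertainties sigma_i
y = [4.9, 5.2, 4.7, 5.4, 5.0]
sig = [0.2, 0.3, 0.2, 0.3, 0.25]

# model f(x) = c: the chi^2-minimizing c is the inverse-variance weighted mean
w = [1 / s**2 for s in sig]
c = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

chi2 = sum(((yi - c) / si)**2 for yi, si in zip(y, sig))
ndof = len(y) - 1  # N - m, with m = 1 free parameter
print(c, chi2, ndof)  # chi^2 ~ 4.6 for 4 degrees of freedom: a reasonable fit
```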
-
Limitations of the χ² distribution

The χ² distribution is based on the assumption of Gaussian
errors. Beware of using it in cases where this doesn't apply.

In the plot to the left, the black line is the fit while the red
is the true parent distribution.
-
Joint PDFs
We've already seen a few examples of multi-dimensional
probability distributions: P(x,y), where X and Y are two
random variables.

These have the obvious interpretation that
P(x,y) dx dy = probability that X is in the range x to x+dx while
simultaneously Y is in the range y to y+dy. This can
trivially be extended to multiple variables, or to the case
where one or more variables are discrete and not continuous.

The normalization condition still applies:

∫ (Π_i dx_i) P(x_1, ..., x_n) = 1
-
Marginalization vs. projection
Often we will want to determine the PDF for just one
variable without regard to the values of the other variables.
The process of eliminating unwanted parameters from the
PDF is called marginalization:

P(x) = ∫ dy P(x,y)

If P(x,y) is properly normalized, then so is P(x).

Marginalization should be very carefully distinguished from
projection, in which you calculate the distribution of x for fixed y:

P(x | y) = P(x,y) / ∫ dx P(x,y)
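A concrete stdlib example with the joint density P(x,y) = x + y on the unit square (an assumption chosen for illustration; it is already normalized): marginalizing gives P(x) = x + 1/2, while projecting at y = 0 gives P(x|y=0) = 2x.

```python
n = 1000
dx = 1.0 / n
xs = [dx * (i + 0.5) for i in range(n)]  # midpoint grid on (0,1)

def P(x, y):
    return x + y  # joint PDF on the unit square; integrates to 1

# marginalization: P(x) = integral dy P(x,y) = x + 1/2 (still normalized)
marg = [sum(P(x, y) for y in xs) * dx for x in xs]

# projection: P(x|y=0) = P(x,0) / integral dx P(x,0) = x / (1/2) = 2x
norm = sum(P(x, 0.0) for x in xs) * dx
proj = [P(x, 0.0) / norm for x in xs]

print(sum(marg) * dx)             # the marginal still integrates to 1
print(marg[500], xs[500] + 0.5)   # matches x + 1/2
print(proj[500], 2 * xs[500])     # matches 2x
```

Note that the projection has a different shape from the marginal: fixing y is not the same as integrating it out.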
-
PDFs for functions of random variables
Marginalization is related to calculating the PDF of some
function of random variables whose distributions are known.

Suppose you know the PDFs for two variables X and Y, and
you then want to calculate the PDF for some function Z = f(X,Y).

Basic idea: for all values of Z, determine the region R for
which Z < f < Z+dZ. Then integrate the probability over this
region to get the probability for Z < f < Z+dZ:

P(Z) dZ = ∫_R P(X,Y) dX dY
-
Change of variables: 1D
Suppose we have the probability distribution P(x). We want
to instead parameterize the problem by some other
parameter y, where y = f(x). How do we get P(y)?

P(x) dx = probability that X is in the range x to x+dx

This range of X maps to some range of Y: y to y+dy. Assume
for now a 1-to-1 mapping. The probability of X being in the
specified range must equal the probability of Y being in the
mapped range:

P(x) dx = P(y) dy

P(y) = P(x) |dx/dy| = P(x) / |f'(x)| = P(f⁻¹(y)) / |f'(f⁻¹(y))|
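A quick simulation check of the 1D rule (stdlib; the choice y = x² with X uniform on (0,1) is an illustrative assumption): the formula predicts P(y) = 1/(2√y), whose cumulative distribution is √y.

```python
import math
import random

random.seed(1)
# X uniform on (0,1), Y = f(X) = X^2 (a 1-to-1 mapping on this range)
# P(y) = P(x) / |f'(x)| = 1 / (2 sqrt(y)), so the CDF of Y is sqrt(y)
n = 100_000
ys = [random.random()**2 for _ in range(n)]

for y in (0.1, 0.25, 0.5, 0.9):
    frac = sum(1 for v in ys if v < y) / n
    print(y, frac, math.sqrt(y))  # empirical CDF matches sqrt(y)
```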
-
Change of variables: 1D example
We are told that the magnitude distribution for a group of
stars follows P(m) = B exp(m/A) over the range 0
-
Change of variables: 1D example
We are told that the magnitude distribution for a group of
stars follows P(m) = B exp(m/A) over the range 0
-
Change of variables: 1D example
P(L) = |dm/dL| P(m(L)) = (2.5 / ln 10) (1/L) B exp(−(2.5 ln L) / (A ln 10))

Top: P(m), simulated and theory
Bottom: P(L), simulated and theory

For discussion: when you assign probabilities, how do
you choose the parametrization you use?
-
Change of variables: multi-dimensional
To generalize to multi-dimensional PDFs, just apply a little calculus:

∫∫_R f(x,y) dx dy = ∫∫_{R'} f[x(u,v), y(u,v)] |∂(x,y)/∂(u,v)| du dv

This gives us a rule relating multi-dimensional PDFs after a
change of variables:

P(x,y) dx dy = P[x(u,v), y(u,v)] |∂(x,y)/∂(u,v)| du dv

Recall that the last term is the Jacobian:

∂(x,y)/∂(u,v) = det | ∂x/∂u   ∂x/∂v |
                    | ∂y/∂u   ∂y/∂v |
-
Change of variables: multi-dim example
Consider a 2D uniform distribution inside a square -1
-
Change of variables: multi-dim example
Consider a 2D uniform distribution inside a square -1
-
Change of variables: multi-dim example
The square region in the X,Y plane maps to the parabolic
region in the U,V plane.

g(u,v) = 1/(8u)   for any (u,v) in the shaded region.

Note that a lot of the complexity of the PDF is in the shape
of the boundary region --- for example, the marginalized PDF
G(u) is not simply proportional to 1/u.

But is this PDF properly normalized?
-
Change of variables: multi-dim normalization
∫_0^1 du ∫_{−√u}^{√u} dv (1/(8u)) = ∫_0^1 du (2√u)/(8u) = 1/2

The normalization is wrong! Why? The mapping is not 1-to-1.

For any given value of u, there are two possible values of x
that map to it. This doubles the PDF.

In reality we need to keep track of how many different
regions map to the same part of parameter space.
-
Change of variables: use in fitting when
dealing with limits of parameters

Sometimes in statistics problems the PDF only has physical meaning for
certain values of the parameters. For example, if fitting for the mass of an
object, you may want to require that the answer be positive.

Many minimizers run into trouble at fit boundaries, because they want to
evaluate derivatives but can't.

Some fitters get around this with a change of variables. If a parameter X is
restricted to the range (a,b), try doing your internal calculations using:

Y = arcsin( 2(X−a)/(b−a) − 1 )

The internal parameter Y is then nice and can take on any value.
Unfortunately the fit is nonlinear and becomes more subject to numerical
roundoff.
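A sketch of this transformation (this is essentially the bounded-parameter trick used by minimizers such as MINUIT; the helper names below are hypothetical):

```python
import math

def to_internal(x, a, b):
    # bounded external parameter x in (a,b) -> internal parameter y
    return math.asin(2 * (x - a) / (b - a) - 1)

def to_external(y, a, b):
    # any internal y maps back inside (a,b), since sin(y) is always in [-1,1]
    return a + (b - a) * (math.sin(y) + 1) / 2

a, b = 0.0, 10.0  # hypothetical fit bounds
x = 3.7
y = to_internal(x, a, b)
assert abs(to_external(y, a, b) - x) < 1e-12

# the minimizer can step y anywhere without ever leaving the allowed range:
for step in (-100.0, -1.0, 5.0, 1e6):
    assert a <= to_external(y + step, a, b) <= b
```

The round trip is exact, and arbitrarily large steps in the internal parameter never produce an unphysical external value, which is exactly why fitters use it.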