Error analysis lecture 3
TRANSCRIPT
-
7/30/2019 Error analysis lecture 3
Physics 509
Common Probability Distributions
Scott Oser, Lecture #3
-
Last Time
We discussed various descriptive statistics (mean, variance,
mode, etc.), and studied a few of the most common probability
distributions:
Gaussian (normal) distribution
Cauchy/Lorentz/Breit-Wigner distribution
Binomial distribution
Multinomial distribution
Negative binomial distribution

TODAY
More common distributions: Poisson, exponential, χ²
Methods for manipulating and deriving new PDFs
Marginalizing and projecting multi-dimensional PDFs
-
Poisson Distribution
Suppose that some event happens at random times with a
constant rate R (probability per unit time). (For example,
supernova explosions.)

If we wait a time interval dt, then the probability of the event
occurring is R dt. If dt is very small, then there is negligible
probability of the event occurring twice in any given time interval.

We can therefore divide any time interval of length T into N = T/dt
subintervals. In each subinterval an event either occurs or
doesn't occur. The total number of events occurring therefore
follows a binomial distribution:

P(k | p = R dt, N) = [N! / (k! (N−k)!)] p^k (1−p)^(N−k)
-
Poisson Distribution
Let dt = T/N → 0, so that N goes to infinity. Then

P(k | p = R dt, N) = lim_{N→∞} [N! / (k! (N−k)!)] (RT/N)^k (1 − RT/N)^(N−k)

                   = lim_{N→∞} [N^k / k!] (RT/N)^k (1 − RT/N)^N (1 − RT/N)^(−k)

                   = (RT)^k e^(−RT) / k!  =  λ^k e^(−λ) / k!,   with λ ≡ RT

P(k|λ) is called the Poisson distribution. It is the probability
of seeing k events that happen randomly at constant rate R
within a time interval of length T.

From the derivation, it's clear that the binomial distribution
approaches a Poisson distribution when N is large and p is very small.

λ is the mean number of events expected in the interval T.
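The limit above is easy to check numerically. The following sketch (pure Python, nothing beyond the standard library; λ = 3 is an arbitrary choice) compares the binomial PMF with p = λ/N to the Poisson PMF as N grows:

```python
import math

def binom_pmf(k, N, p):
    # P(k | p, N) = N!/(k!(N-k)!) p^k (1-p)^(N-k)
    return math.comb(N, k) * p**k * (1 - p)**(N - k)

def poisson_pmf(k, lam):
    # P(k | lambda) = lambda^k e^(-lambda) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # lambda = R*T, the expected number of events
for N in (10, 100, 10000):
    worst = max(abs(binom_pmf(k, N, lam / N) - poisson_pmf(k, lam))
                for k in range(10))
    print(f"N={N:6d}  max |binomial - Poisson| = {worst:.2e}")
```

The discrepancy shrinks roughly like 1/N, as the derivation suggests.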
-
Properties of the Poisson distribution
Mean = λ
Variance = λ
Approaches a Gaussian distribution when λ gets large.
Note that in this case, the standard deviation is in fact equal to sqrt(N).
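Both moments follow directly from the PMF. A quick stdlib-only check (λ = 4 is an arbitrary choice; the sum is truncated far out in the tail, where the terms are negligible):

```python
import math

lam = 4.0
# Poisson PMF for k = 0 .. 99; for lambda = 4 the tail beyond this is negligible
pmf = [lam**k * math.exp(-lam) / math.factorial(k) for k in range(100)]

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean)**2 * p for k, p in enumerate(pmf))
print(mean, var)  # both come out equal to lambda = 4
```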
-
Poisson vs. Gaussian distribution
-
The sum of two Poisson variables is Poisson
Here we will consider the sum of two independent Poisson
variables X and Y. If the mean numbers of expected events
of each type are A and B, we naturally would expect that
the sum will be a Poisson with mean A+B.

Let Z = X + Y. Consider P(X,Y):

P(X,Y) = P(X) P(Y) = [e^(−A) A^X / X!] [e^(−B) B^Y / Y!] = e^(−(A+B)) A^X B^Y / (X! Y!)

To find P(Z), sum P(X,Y) over all (X,Y) satisfying X+Y=Z:

P(Z) = Σ_{X=0}^{Z} e^(−(A+B)) A^X B^(Z−X) / (X! (Z−X)!)
     = [e^(−(A+B)) / Z!] Σ_{X=0}^{Z} [Z! / (X! (Z−X)!)] A^X B^(Z−X)

P(Z) = e^(−(A+B)) (A+B)^Z / Z!   (by the binomial theorem)
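The algebra can be verified term by term. A stdlib sketch, with arbitrary example means A and B:

```python
import math

def pois(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

A, B = 2.0, 3.5  # arbitrary example means
for Z in range(15):
    # sum P(X)P(Y) over all X+Y = Z, exactly as in the derivation
    conv = sum(pois(X, A) * pois(Z - X, B) for X in range(Z + 1))
    assert abs(conv - pois(Z, A + B)) < 1e-12
print("convolution of Poisson(2.0) and Poisson(3.5) is Poisson(5.5)")
```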
-
Why do I labour the point?
First, calculating the PDF for a function of two other random
variables is good practice.

More importantly, I want you to develop some intuition for
how these distributions work. The sum of two Gaussians
is a Gaussian, even if they have different means and RMSs.
The sum of two Poissons is a Poisson, even if the means are
different.
What about the sum of two binomial random variables?
-
Sum of two binomials is binomial only if p1 = p2

I hope it's intuitively obvious that the number of heads from
N coin flips plus the number from M coin flips is equivalent
to the number from N+M coin flips, if you flip identical coins.

But what if p1 ≠ p2? Consider the mean and variance of the sum:

mean = N p1 + M p2
variance = N p1 (1 − p1) + M p2 (1 − p2)

This doesn't have the generic form of the mean and variance
formulas for a binomial distribution, unless p1 = p2.

Contrast with the case of summing two Poissons:

mean = λ1 + λ2
variance = λ1 + λ2
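The binomial mismatch can be seen directly by convolving the two PMFs. In this stdlib sketch (N = 5, M = 7 and the p values are arbitrary illustrations), identical coins reproduce a single binomial exactly, while different coins do not match any binomial with the same mean:

```python
import math

def bpmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def conv(N, p1, M, p2):
    # exact PMF of the sum of Binomial(N, p1) and Binomial(M, p2)
    return [sum(bpmf(i, N, p1) * bpmf(k - i, M, p2)
                for i in range(max(0, k - M), min(N, k) + 1))
            for k in range(N + M + 1)]

N, M = 5, 7
same = conv(N, 0.3, M, 0.3)            # identical coins
mixed = conv(N, 0.3, M, 0.8)           # different coins
p_avg = (N * 0.3 + M * 0.8) / (N + M)  # binomial candidate with matching mean
err_same = max(abs(same[k] - bpmf(k, N + M, 0.3)) for k in range(N + M + 1))
err_mixed = max(abs(mixed[k] - bpmf(k, N + M, p_avg)) for k in range(N + M + 1))
print(err_same, err_mixed)  # ~0 versus a clearly nonzero mismatch
```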
-
Things you might model with a Poisson:
Number of supernovas occurring per century
Number of Higgs particles produced in a detector during a collision
As an approximation for a binomial distribution where N is large and p is small
What about the number of people dying in traffic accidents each day in Vancouver?

WARNING: the number of events in a histogram bin often
follows a Poisson distribution. When that number is small,
a Gaussian distribution is a poor approximation to the
Poisson. Beware of statistical tools that assume Gaussian
errors when the number of events in a bin is small (e.g. a
χ² fit to a histogram)!
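To see how bad the Gaussian approximation gets at small counts, compare P(0) for a Poisson against a Gaussian with matching mean and variance (a stdlib sketch; μ = 2 is an arbitrary illustration):

```python
import math

mu = 2.0  # small expected count per bin
pois0 = math.exp(-mu)  # Poisson P(k=0) = e^(-mu)
# Gaussian with matching mean mu and variance mu, evaluated at k = 0
gauss0 = math.exp(-(0 - mu)**2 / (2 * mu)) / math.sqrt(2 * math.pi * mu)
print(pois0, gauss0)  # ~0.135 vs ~0.104: a ~23% discrepancy
# the Gaussian also puts probability on impossible negative counts:
# P(k < 0) is about 8% for a N(2, sqrt(2)) distribution
```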
-
An exponential distribution
Consider for example the distribution of measured lifetimes
for a decaying particle:

P(t | τ) = (1/τ) e^(−t/τ)   (both t, τ > 0)

mean: ⟨t⟩ = τ
RMS: σ = τ

HW question: Is the sum of two random variables that
follow exponential distributions itself exponential?
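The mean = RMS = τ property is easy to confirm by simulation (stdlib only; τ = 2.5, the seed, and the sample size are arbitrary choices):

```python
import math
import random

random.seed(509)
tau = 2.5
n = 200_000
# expovariate takes the rate 1/tau as its argument
samples = [random.expovariate(1 / tau) for _ in range(n)]

mean = sum(samples) / n
rms = math.sqrt(sum((t - mean)**2 for t in samples) / n)
print(mean, rms)  # both close to tau = 2.5
```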
-
The χ² distribution

Suppose that you generate N random numbers from a
normal distribution with μ=0, σ=1: Z_1 ... Z_N.

Let X be the sum of the squared variables:

X = Σ_{i=1}^{N} Z_i²

The variable X follows a χ² distribution with N degrees of freedom:

P(χ² | N) = [1 / (2^(N/2) Γ(N/2))] (χ²)^((N−2)/2) e^(−χ²/2)

Recall that Γ(N) = (N−1)! if N is an integer.
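As a sanity check on the formula, integrate it numerically (stdlib midpoint rule; N = 5 and the grid are arbitrary choices):

```python
import math

def chi2_pdf(x, N):
    # P(chi^2 | N) = x^((N-2)/2) e^(-x/2) / (2^(N/2) Gamma(N/2))
    return x**((N - 2) / 2) * math.exp(-x / 2) / (2**(N / 2) * math.gamma(N / 2))

N = 5
dx = 0.001
xs = [dx * (i + 0.5) for i in range(int(100 / dx))]  # tail beyond 100 is negligible
norm = sum(chi2_pdf(x, N) for x in xs) * dx
mean = sum(x * chi2_pdf(x, N) for x in xs) * dx
print(norm, mean)  # the PDF integrates to 1, with mean N = 5
```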
-
Properties of the χ² distribution

A χ² distribution has mean = N, but variance = 2N.

This makes it relatively easy to estimate probabilities on
the tail of a χ² distribution.
-
Properties of the χ² distribution

Since χ² is a sum of N independent and identical random
variables, it is true that it tends to be Gaussian in the limit
of large N (central limit theorem) ...

But the quantity sqrt(2χ²) is actually much more Gaussian,
as the plots to the left show! It has mean of sqrt(2N−1) and
unit variance.
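A simulation sketch of the sqrt(2χ²) trick for N = 50 (stdlib; the seed and sample sizes are arbitrary):

```python
import math
import random

random.seed(42)
N = 50
trials = 20_000
# each trial: a chi^2 as the sum of N squared unit Gaussians, then sqrt(2*chi^2)
vals = [math.sqrt(2 * sum(random.gauss(0, 1)**2 for _ in range(N)))
        for _ in range(trials)]

m = sum(vals) / trials
v = sum((x - m)**2 for x in vals) / trials
print(m, v)  # mean ~ sqrt(2N-1) = 9.95, variance ~ 1
```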
-
Calculating a χ² tail probability

You're sitting in a talk, and someone shows a dubious-looking
fit, and claims that the χ² for the fit is 70 for 50
degrees of freedom. Can you work out in your head how
likely it is to get that large of a χ² by chance?
-
Calculating a χ² tail probability

You're sitting in a talk, and someone shows a dubious-looking
fit, and claims that the χ² for the fit is 70 for 50
degrees of freedom. Can you work out in your head how
likely it is to get that large of a χ² by chance?

Estimate 1: The mean should be 50, and the RMS is
sqrt(2N) = sqrt(100) = 10, so this is a 2σ fluctuation. For a
normal distribution, the probability content above +2σ is 2.3%.

More accurate estimate: sqrt(2χ²) = sqrt(140) = 11.83. The mean
should be sqrt(2N−1) = 9.95. This is really more like a 1.88σ
fluctuation.

It is good practice to always report the P value, whether
good or bad.
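The two head-estimates can be checked against the exact tail probability by numerically integrating the χ² PDF (stdlib only; the integration grid is an arbitrary choice):

```python
import math

def chi2_pdf(x, N):
    return x**((N - 2) / 2) * math.exp(-x / 2) / (2**(N / 2) * math.gamma(N / 2))

def normal_sf(z):
    # upper-tail probability of a standard normal
    return 0.5 * math.erfc(z / math.sqrt(2))

N, x0 = 50, 70.0
dx = 0.01
# integrate the chi^2 PDF from 70 out to 270 (the rest of the tail is negligible)
exact = sum(chi2_pdf(x0 + dx * (i + 0.5), N) for i in range(int(200 / dx))) * dx

est1 = normal_sf((x0 - N) / math.sqrt(2 * N))               # the "2 sigma" estimate
est2 = normal_sf(math.sqrt(2 * x0) - math.sqrt(2 * N - 1))  # the sqrt(2 chi^2) estimate
print(exact, est1, est2)  # the sqrt(2 chi^2) estimate lands closer to the exact value
```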
-
Uses of the χ² distribution

The dominant use of the χ² statistic is for least squares fitting:

χ² = Σ_{i=1}^{N} [(y_i − f(x_i)) / σ_i]²

The best fit values of the parameters are those that minimize the χ².

If there are m free parameters, and the deviations of the
measured points from the model follow Gaussian
distributions, then this statistic should follow a χ² distribution
with N−m degrees of freedom. More on this later.

χ² is also used to test the goodness of the fit (Pearson's χ² test).
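A minimal worked example (the data values and uncertainties here are invented for illustration): fitting a single constant c to five measurements, where minimizing χ² gives the inverse-variance weighted mean.

```python
# hypothetical measurements y_i with Gaussian uncertainties sigma_i
y = [4.9, 5.2, 4.7, 5.4, 5.0]
sig = [0.2, 0.3, 0.2, 0.3, 0.25]

# model f(x) = c: the chi^2-minimizing c is the inverse-variance weighted mean
w = [1 / s**2 for s in sig]
c = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

chi2 = sum(((yi - c) / si)**2 for yi, si in zip(y, sig))
ndof = len(y) - 1  # N - m, with m = 1 free parameter
print(c, chi2, ndof)  # chi^2 ~ 4.6 for 4 degrees of freedom: a reasonable fit
```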
-
Limitations of the χ² distribution

The χ² distribution is based on the assumption of Gaussian
errors. Beware of using it in cases where this doesn't apply.

In the plot to the left, the black line is the fit while the red
is the true parent distribution.
-
Joint PDFs
We've already seen a few examples of multi-dimensional
probability distributions: P(x,y), where X and Y are two
random variables.

These have the obvious interpretation that
P(x,y) dx dy = probability that X is in the range x to x+dx while
simultaneously Y is in the range y to y+dy. This can
trivially be extended to multiple variables, or to the case
where one or more variables are discrete and not continuous.

The normalization condition still applies:

∫ (Π_i dx_i) P(x_1, ..., x_n) = 1
-
Marginalization vs. projection
Often we will want to determine the PDF for just one
variable without regard to the values of the other variables.
The process of eliminating unwanted parameters from the
PDF is called marginalization:

P(x) = ∫ dy P(x,y)

If P(x,y) is properly normalized, then so is P(x).

Marginalization should be very carefully distinguished from
projection, in which you calculate the distribution of x for fixed y:

P(x | y) = P(x,y) / ∫ dx P(x,y)
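A concrete stdlib example with the joint density P(x,y) = x + y on the unit square (an assumption chosen for illustration; it is already normalized): marginalizing gives P(x) = x + 1/2, while projecting at y = 0 gives P(x|y=0) = 2x.

```python
n = 1000
dx = 1.0 / n
xs = [dx * (i + 0.5) for i in range(n)]  # midpoint grid on (0,1)

def P(x, y):
    return x + y  # joint PDF on the unit square; integrates to 1

# marginalization: P(x) = integral dy P(x,y) = x + 1/2 (still normalized)
marg = [sum(P(x, y) for y in xs) * dx for x in xs]

# projection: P(x|y=0) = P(x,0) / integral dx P(x,0) = x / (1/2) = 2x
norm = sum(P(x, 0.0) for x in xs) * dx
proj = [P(x, 0.0) / norm for x in xs]

print(sum(marg) * dx)             # the marginal still integrates to 1
print(marg[500], xs[500] + 0.5)   # matches x + 1/2
print(proj[500], 2 * xs[500])     # matches 2x
```

Note that the projection has a different shape from the marginal: fixing y is not the same as integrating it out.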
-
PDFs for functions of random variables
Marginalization is related to calculating the PDF of some
function of random variables whose distributions are known.

Suppose you know the PDFs for two variables X and Y, and
you then want to calculate the PDF for some function Z = f(X,Y).

Basic idea: for all values of Z, determine the region R for
which Z < f < Z+dZ. Then integrate the probability over this
region to get the probability for Z < f < Z+dZ:

P(Z) dZ = ∫_R P(X,Y) dX dY
-
Change of variables: 1D
Suppose we have the probability distribution P(x). We want
to instead parameterize the problem by some other
parameter y, where y = f(x). How do we get P(y)?

P(x) dx = probability that X is in the range x to x+dx

This range of X maps to some range of Y: y to y+dy. Assume
for now a 1-to-1 mapping. The probability of X being in the
specified range must equal the probability of Y being in the
mapped range:

P(x) dx = P(y) dy

P(y) = P(x) |dx/dy| = P(x) / |f'(x)| = P(f⁻¹(y)) / |f'(f⁻¹(y))|
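A quick simulation check of the 1D rule (stdlib; the choice y = x² with X uniform on (0,1) is an illustrative assumption): the formula predicts P(y) = 1/(2√y), whose cumulative distribution is √y.

```python
import math
import random

random.seed(1)
# X uniform on (0,1), Y = f(X) = X^2 (a 1-to-1 mapping on this range)
# P(y) = P(x) / |f'(x)| = 1 / (2 sqrt(y)), so the CDF of Y is sqrt(y)
n = 100_000
ys = [random.random()**2 for _ in range(n)]

for y in (0.1, 0.25, 0.5, 0.9):
    frac = sum(1 for v in ys if v < y) / n
    print(y, frac, math.sqrt(y))  # empirical CDF matches sqrt(y)
```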
-
Change of variables: 1D example
We are told that the magnitude distribution for a group of
stars follows P(m) = B exp(m/A) over the range 0
-
Change of variables: 1D example
We are told that the magnitude distribution for a group of
stars follows P(m) = B exp(m/A) over the range 0
-
Change of variables: 1D example
P(L) = |dm/dL| P(m(L)) = (2.5 / ln 10) (1/L) B exp(−(2.5 ln L) / (A ln 10))

Top: P(m), simulated and theory
Bottom: P(L), simulated and theory

For discussion: when you assign probabilities, how do
you choose the parametrization you use?
-
Change of variables: multi-dimensional
To generalize to multi-dimensional PDFs, just apply a little calculus:

∫∫_R f(x,y) dx dy = ∫∫_{R'} f[x(u,v), y(u,v)] |∂(x,y)/∂(u,v)| du dv

This gives us a rule relating multi-dimensional PDFs after a
change of variables:

P(x,y) dx dy = P[x(u,v), y(u,v)] |∂(x,y)/∂(u,v)| du dv

Recall that the last term is the Jacobian:

∂(x,y)/∂(u,v) = det | ∂x/∂u   ∂x/∂v |
                    | ∂y/∂u   ∂y/∂v |
-
Change of variables: multi-dim example
Consider a 2D uniform distribution inside a square -1
-
Change of variables: multi-dim example
Consider a 2D uniform distribution inside a square -1
-
Change of variables: multi-dim example
The square region in the X,Y plane maps to the parabolic
region in the U,V plane.

g(u,v) = 1/(8u)   for any (u,v) in the shaded region.

Note that a lot of the complexity of the PDF is in the shape
of the boundary region --- for example, the marginalized PDF
G(u) is not simply proportional to 1/u.

But is this PDF properly normalized?
-
Change of variables: multi-dim normalization
∫_0^1 du ∫_{−√u}^{√u} dv (1/(8u)) = ∫_0^1 du (2√u)/(8u) = 1/2

The normalization is wrong! Why? The mapping is not 1-to-1.

For any given value of u, there are two possible values of x
that map to it. This doubles the PDF.

In reality we need to keep track of how many different
regions map to the same part of parameter space.
-
Change of variables: use in fitting when
dealing with limits of parameters

Sometimes in statistics problems the PDF only has physical meaning for
certain values of the parameters. For example, if fitting for the mass of an
object, you may want to require that the answer be positive.

Many minimizers run into trouble at fit boundaries, because they want to
evaluate derivatives but can't.

Some fitters get around this with a change of variables. If a parameter X is
restricted to the range (a,b), try doing your internal calculations using:

Y = arcsin( 2(X−a)/(b−a) − 1 )

The internal parameter Y is then nice and can take on any value.
Unfortunately the fit is nonlinear and becomes more subject to numerical
roundoff.
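A sketch of this transformation (this is essentially the bounded-parameter trick used by minimizers such as MINUIT; the helper names below are hypothetical):

```python
import math

def to_internal(x, a, b):
    # bounded external parameter x in (a,b) -> internal parameter y
    return math.asin(2 * (x - a) / (b - a) - 1)

def to_external(y, a, b):
    # any internal y maps back inside (a,b), since sin(y) is always in [-1,1]
    return a + (b - a) * (math.sin(y) + 1) / 2

a, b = 0.0, 10.0  # hypothetical fit bounds
x = 3.7
y = to_internal(x, a, b)
assert abs(to_external(y, a, b) - x) < 1e-12

# the minimizer can step y anywhere without ever leaving the allowed range:
for step in (-100.0, -1.0, 5.0, 1e6):
    assert a <= to_external(y + step, a, b) <= b
```

The round trip is exact, and arbitrarily large steps in the internal parameter never produce an unphysical external value, which is exactly why fitters use it.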