Signal Modeling, Statistical Inference and Data Mining in Astrophysics
TRANSCRIPT
Source: hosting.astro.cornell.edu/~cordes/A6523/A6523_lecture_8.pdf
A6523 Signal Modeling, Statistical Inference
and Data Mining in Astrophysics Spring 2011
Reading: Chapter 5 (continued)
Lecture 8:
• Key points in probability
• CLT
• CLT examples
Prior vs Likelihood
Box & Tiao
“Learning” in Bayesian Estimation
Box & Tiao
I. Mutually exclusive events:
If a occurs then b cannot have occurred.
Let c = a + b + …  ("or"; same as a ∪ b)
  P(c) = P{a or b occurred} = P(a) + P(b)
Let d = a · b · …  ("and"; same as a ∩ b)
  P(d) = P{a and b occurred} = 0 if mutually exclusive
II. Non-mutually exclusive events:
  P(c) = P{a or b} = P(a) + P(b) − P(ab)
III. Independent events:
  P(ab) ≡ P(a)P(b)
Examples
I. Mutually exclusive events
Toss a coin once: 2 possible outcomes, H and T.
H and T are mutually exclusive.
H and T are not independent, because P(HT) = P{heads and tails} = 0, so P(HT) ≠ P(H)P(T).
II. Independent events
toss a coin twice = experiment
The outcomes of the experiment are
1st toss 2nd toss
H1 H2
H1 T2
T1 H2
T1 T2
events might be defined as:
H1H2 = event that H on 1st toss, H on 2nd
H1T2 = event that H on 1st toss, T on 2nd
T1H2 = event that T on 1st toss, H on 2nd
T1T2 = event that T on 1st toss, T on 2nd
note P (H1H2) = P (H1)P (H2) [as long as coin not altered between tosses]
Random Variables
Of interest to us is the distribution of probability along the real number axis.
Random variables assign numbers to events or, more precisely, map the event space into a set of numbers:
  a ↦ X(a)
  event ↦ number
The definition of probability translates directly over to the numbers that are assigned by random variables.
The following properties are true for a real random variable:
1. Let {X ≤ x} = the event that the r.v. X is less than or equal to the number x, defined for all x. [This defines all intervals on the real number line to be events.]
2. The events {X = +∞} and {X = −∞} have zero probability. (Otherwise, moments would generally not be finite.)
Distribution function (CDF = Cumulative Distribution Function):
  F_X(x) = P{X ≤ x} ≡ P{all events A : X(A) ≤ x}
Properties:
1. F_X(x) is a monotonically increasing function of x.
2. F(−∞) = 0, F(+∞) = 1.
3. P{x1 < X ≤ x2} = F(x2) − F(x1).
Probability Density Function (pdf):
  f_X(x) = dF_X(x)/dx
Properties:
1. f_X(x) dx = P{x ≤ X ≤ x + dx}
2. ∫_−∞^∞ dx f_X(x) = F_X(∞) − F_X(−∞) = 1 − 0 = 1.
The mean, median, and mode are all localization measures.
Other quantities are needed to measure the width and asymmetry of the PDF, etc.
Continuous r.v.'s: the derivative of F_X(x) exists for all x.
Discrete random variables: use delta functions to write the pdf in pseudo-continuous form.
E.g. coin flipping. Let
  X = { +1 heads; −1 tails }
Then
  f_X(x) = (1/2)[δ(x + 1) + δ(x − 1)]
  F_X(x) = (1/2)[U(x + 1) + U(x − 1)]
where U is the unit step function.
Functions of a random variable:
The function Y = g(X) is a random variable that is a mapping from some event A to a number Y according to
  Y(A) = g[X(A)]
Theorem: if Y = g(X), then the pdf of Y is
  f_Y(y) = Σ_{j=1}^n f_X(x_j) / |dg(x)/dx|_{x=x_j},
where x_j, j = 1, …, n are the solutions of y = g(x), i.e. the roots x_j = g^{−1}(y). Note that the normalization property is conserved (unit area).
This is one of the most important equations!
Example:
  Y = g(X) = aX + b
  dg/dx = a
  g^{−1}(y) = x1 = (y − b)/a
  f_Y(y) = f_X(x1) / |dg(x1)/dx| = |a|^{−1} f_X((y − b)/a).
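A quick numerical check of this transformation law (a sketch, not part of the lecture; it assumes NumPy and a Gaussian X) compares a histogram of Y = aX + b against the prediction |a|^{−1} f_X((y − b)/a):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0
x = rng.standard_normal(200_000)   # X ~ N(0, 1)
y = a * x + b                      # Y = g(X) = aX + b

def f_X(u):
    return np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)

def f_Y(v):
    # transformation law: f_Y(y) = |a|^-1 f_X((y - b)/a)
    return f_X((v - b) / a) / abs(a)

hist, edges = np.histogram(y, bins=100, range=(-7.0, 9.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - f_Y(centers)))
print(max_err)   # small; set by sampling noise
```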
Comment about “natural” random number generators
To check: show that ∫_−∞^∞ dy f_Y(y) = 1.
Example: Suppose we want to transform from a uniform distribution to an exponential distribution.
We want f_Y(y) = exp(−y), y ≥ 0. A typical random number generator gives f_X(x) with
  f_X(x) = { 1, 0 ≤ x < 1; 0, otherwise }
Choose y = g(x) = −ln(x). Then:
  dg/dx = −1/x
  x1 = g^{−1}(y) = e^{−y}
  f_Y(y) = f_X[exp(−y)] / |−1/x1| = x1 = e^{−y}.
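This is the standard inverse-transform trick for generating exponential deviates. A minimal simulation sketch (assuming NumPy; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(100_000)     # uniform on [0, 1); f_X(x) = 1
x = x[x > 0.0]              # guard against ln(0)
y = -np.log(x)              # y = g(x) = -ln x  =>  f_Y(y) = e^{-y}, y >= 0

print(y.mean(), y.var())    # unit-rate exponential: mean = 1, variance = 1
```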
Moments
We will always use angular brackets ⟨ ⟩ to denote an average over an ensemble (integrating over an ensemble); time averages and other sample averages will be denoted differently.
Expected value of a random variable:
  E(X) ≡ ⟨X⟩ = ∫ dx x f_X(x)
where ⟨ ⟩ denotes expectation w.r.t. the PDF of X.
Arbitrary power:
  ⟨X^n⟩ = ∫ dx x^n f_X(x)
Variance:
  σ_x² = ⟨X²⟩ − ⟨X⟩²
Function of a random variable: If Y = g(X) and ⟨Y⟩ ≡ ∫ dy y f_Y(y), then it is easy to show that
  ⟨Y⟩ = ∫ dx g(x) f_X(x).
Proof:
  ⟨Y⟩ ≡ ∫ dy y f_Y(y) = ∫ dy y Σ_{j=1}^n f_X[x_j(y)] / |dg[x_j(y)]/dx|
Factoid: Poisson events in time have spacings that are exponentially distributed.
A change of variable, dy = (dg/dx) dx, yields the result.
Central moments:
  µ_n = ⟨(X − ⟨X⟩)^n⟩
Moment Tests:
Moments are useful for testing hypotheses, such as whether a given PDF is consistent with data.
E.g. consistency with a Gaussian PDF:
  kurtosis: k = µ4/µ2² − 3 = 0
  skewness parameter: γ = µ3/µ2^{3/2} = 0
k > 0 ⇒ 4th moment proportionately larger ⇒ larger-amplitude tail than a Gaussian and less probable values near the mean.
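These moment tests are easy to apply to samples. A sketch (assuming NumPy; the sample-moment estimators here are the naive biased ones, which is fine for large N; an exponential r.v. has skewness 2 and excess kurtosis 6, used as a non-Gaussian comparison):

```python
import numpy as np

rng = np.random.default_rng(2)

def skew_kurt(x):
    """Sample skewness gamma = mu3/mu2^(3/2) and excess kurtosis k = mu4/mu2^2 - 3."""
    mu = x.mean()
    m2 = np.mean((x - mu) ** 2)
    m3 = np.mean((x - mu) ** 3)
    m4 = np.mean((x - mu) ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

g_gauss = rng.standard_normal(500_000)
g_exp = rng.exponential(size=500_000)

s_g, k_g = skew_kurt(g_gauss)
s_e, k_e = skew_kurt(g_exp)
print(s_g, k_g)   # both near 0 for Gaussian data
print(s_e, k_e)   # near (2, 6) for exponential data
```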
Uses of Moments:
Often one wants to infer the underlying PDF of an observable, e.g. because determination of the PDF is tantamount to understanding the underlying physics of some process.
Two approaches are:
1. Construct a histogram and compare its shape with a theoretical shape.
2. Determine some of the moments (usually low-order) and compare.
Suppose the data are {x_j, j = 1, …, N}.
1. One could form bins of size Δx and count how many x_j fall into each bin. If N is large enough that n_k = the number of points in the k-th bin is also large, then a reasonably good estimate of the PDF can be made. (But beware of the dependence of the results on the choice of binning.)
2. However, often N is too small, or one would like to determine only basic information about the shape of the distribution (is it symmetric?), or determine the mean and variance of the PDF, or test whether the data are consistent with a given PDF (hypothesis testing).
Some of the typical situations are:
i) Assume the data were drawn from a Gaussian parent PDF; estimate the mean and σ of the Gaussian [parameter estimation].
ii) Test whether the data are consistent with a Gaussian PDF [moment test].
Note that if the r.v. is zero mean, then the PDF is determined solely by one parameter, σ:
  f_X(x) = (1/√(2πσ²)) e^{−x²/2σ²}
The moments are
  ⟨X^n⟩ = { 1·3⋯(n − 1) σ^n ≡ (n − 1)!! σ^n,  n even; 0,  n odd }
Therefore the n = 2 moment is the first non-zero moment, and it determines all other moments.
This statement carries over to multi-dimensional Gaussian processes:
Any moment of order higher than 2 is redundant … or can be used as a test for Gaussianity.
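The even-moment formula ⟨X^n⟩ = (n − 1)!! σ^n can be checked by simulation. A sketch (assuming NumPy; σ = 1.5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.5
x = rng.normal(0.0, sigma, 1_000_000)

# Even moments: <X^2> = sigma^2, <X^4> = 3 sigma^4, <X^6> = 15 sigma^6
m2 = (x**2).mean()
m4 = (x**4).mean()
m6 = (x**6).mean()
m3 = (x**3).mean()   # odd moment, should vanish
print(m2, sigma**2)
print(m4, 3 * sigma**4)
print(m6, 15 * sigma**6)
print(m3)
```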
Characteristic Function:
Of considerable use is the characteristic function
  Φ_X(ω) ≡ ⟨e^{iωX}⟩ = ∫ dx f_X(x) e^{iωx}.
If we know Φ_X(ω), then we know all there is to know about the PDF, because
  f_X(x) = (1/2π) ∫ dω Φ_X(ω) e^{−iωx}
is the inversion formula.
If we know all the moments of f_X(x), then we can also completely characterize f_X(x). Similarly, the characteristic function is a moment-generating function:
  Φ_X(ω) = ⟨e^{iωX}⟩ = ⟨Σ_{n=0}^∞ (iωX)^n/n!⟩ = Σ_{n=0}^∞ (iω)^n/n! ⟨X^n⟩
because the expectation of a sum = the sum of the expectations.
By taking derivatives we can show that
  ∂Φ/∂ω |_{ω=0} = i⟨X⟩
  ∂²Φ/∂ω² |_{ω=0} = i²⟨X²⟩
  ∂^nΦ/∂ω^n |_{ω=0} = i^n⟨X^n⟩
or
  ⟨X^n⟩ = i^{−n} ∂^nΦ/∂ω^n |_{ω=0} = (−i)^n ∂^nΦ/∂ω^n |_{ω=0}   (Price's theorem)
Characteristic functions are useful for deriving PDFs of combinations of r.v.'s as well as for deriving particular moments.
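As a numerical illustration of the moment formula (a sketch, not from the lecture), take the standard normal characteristic function Φ(ω) = e^{−ω²/2} and recover ⟨X²⟩ by finite differences:

```python
import math

# Characteristic function of a zero-mean, unit-variance Gaussian: Phi(w) = exp(-w^2/2)
def phi(w):
    return math.exp(-w * w / 2.0)

# <X^2> = (-i)^2 * Phi''(0) = -Phi''(0); estimate Phi''(0) by a central difference
h = 1e-3
second = (phi(h) - 2.0 * phi(0.0) + phi(-h)) / h**2
x2 = -second
print(x2)   # close to <X^2> = 1
```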
Joint Random Variables
Let X and Y be two random variables with their associated sample spaces. The actual events associated with X and Y may or may not be independent (e.g. throwing a die may map into X; choosing colored marbles from a hat may map into Y). The relationship of the events will be described by the joint distribution function of X and Y:
  F_XY(x, y) ≡ P{X ≤ x, Y ≤ y}
and the joint probability density function is
  f_XY(x, y) ≡ ∂²F_XY(x, y)/∂x∂y   (a two-dimensional PDF)
Note that the one-dimensional PDF of X, for example, is obtained by integrating the joint PDF over all y:
  f_X(x) = ∫ dy f_XY(x, y)
which corresponds to asking what the PDF of X is given that the certain event for Y occurs.
Example: flip two coins a and b. Let heads = 1, tails = 0. Define 2 r.v.'s: X = a + b, Y = a. With these definitions X and Y are statistically dependent.
Characteristic function of joint r.v.'s:
  Φ_XY(ω1, ω2) = ⟨e^{i(ω1X + ω2Y)}⟩ = ∫∫ dx dy e^{i(ω1x + ω2y)} f_XY(x, y).
For X, Y independent,
  Φ_XY(ω1, ω2) = [∫ dx f_X(x) e^{iω1x}] · [∫ dy f_Y(y) e^{iω2y}] ≡ Φ_X(ω1) Φ_Y(ω2).
Example for independent r.v.'s: flip two coins a and b. As before, heads = 1 and tails = 0; let X = a, Y = b (X and Y are independent).
Independent random variables:
Two random variables are said to be independent if the events mapping into one r.v. are independent of those mapping into the other.
In this case, joint probabilities are factorable, so that
  F_XY(x, y) = F_X(x) F_Y(y)
  f_XY(x, y) = f_X(x) f_Y(y).
Such factorization is plausible if one considers moments of independent r.v.'s:
  ⟨X^n Y^m⟩ = ⟨X^n⟩⟨Y^m⟩
which follows from
  ⟨X^n Y^m⟩ ≡ ∫∫ dx dy x^n y^m f_XY(x, y) = [∫ dx x^n f_X(x)] · [∫ dy y^m f_Y(y)].
Convolution theorem for sums of independent RVs
If Z = X + Y, where X and Y are independent random variables, then the PDF of Z is the convolution of the PDFs of X and Y:
  f_Z(z) = f_X ∗ f_Y = ∫ dx f_X(x) f_Y(z − x) = ∫ dx f_X(z − x) f_Y(x).
Proof: By definition,
  f_Z(z) = (d/dz) F_Z(z)
Consider
  F_Z(z) = P{Z ≤ z}
Now, as before, this is
  F_Z(z) = P{X + Y ≤ z} = P{Y ≤ z − X}.
To evaluate this, first evaluate the probability P{Y ≤ z − x}, where x is just a number:
  P{Y ≤ z − x} ≡ F_Y(z − x) ≡ ∫_−∞^{z−x} dy f_Y(y)
but P{Y ≤ z − X} is the probability that Y ≤ z − x over all values of x, so we need to integrate over x and weight by the probability of x:
  P{Y ≤ z − X} = ∫_−∞^∞ dx f_X(x) ∫_−∞^{z−x} dy f_Y(y)
that is, P{Y ≤ z − X} is the expected value of F_Y(z − X). By the Leibniz integration formula,
  (d/db) ∫_a^{g(b)} dλ h(λ) = h(g(b)) dg(b)/db,
we obtain the convolution results.
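The convolution theorem can be seen directly by simulation: the sum of two independent uniform r.v.'s has the triangular PDF obtained by convolving two unit rectangles. A sketch (assuming NumPy; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.random(500_000)   # f_X uniform on [0, 1)
y = rng.random(500_000)   # f_Y uniform on [0, 1), independent of x
z = x + y

# Convolving two unit rectangles gives a triangle on [0, 2] peaking at z = 1
def f_Z(z):
    return np.where(z < 1.0, z, 2.0 - z)

hist, edges = np.histogram(z, bins=80, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = np.max(np.abs(hist - f_Z(centers)))
print(max_err)   # small; set by sampling noise
```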
Characteristic function of Z = X + Y
For X, Y independent we have
  f_Z = f_X ∗ f_Y  ⇔  Φ_Z(ω) = ⟨e^{iωZ}⟩ = Φ_X(ω) Φ_Y(ω)
Variance of Z: if the variances of X and Y are σ_X², σ_Y², then the variance of Z is σ_Z² = σ_X² + σ_Y².
Assume X and Y, and hence Z, are zero-mean r.v.'s. Then we have
  σ_X² = ⟨X²⟩ = i^{−2} ∂²Φ_X/∂ω² (ω = 0) = −∂²Φ_X/∂ω² (ω = 0)
  σ_Y² = ⟨Y²⟩ = −∂²Φ_Y/∂ω² (ω = 0)
Using Price's theorem:
  σ_Z² = ⟨Z²⟩ = −∂²Φ_Z/∂ω² (ω = 0)
       = −∂²/∂ω² [Φ_X(ω) Φ_Y(ω)]_{ω=0}
       = −∂/∂ω [Φ_X ∂Φ_Y/∂ω + Φ_Y ∂Φ_X/∂ω]_{ω=0}
       = −[Φ_X ∂²Φ_Y/∂ω² + Φ_Y ∂²Φ_X/∂ω² + 2(∂Φ_X/∂ω)(∂Φ_Y/∂ω)]_{ω=0}.
Since Φ_X(0) = Φ_Y(0) = 1 and the first derivatives vanish at ω = 0 for zero-mean r.v.'s, we have "discovered" that variances add (independent variables only):
  σ_Z² = σ_X² + σ_Y².
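A quick numerical check that variances add for independent r.v.'s (a sketch assuming NumPy; the Gaussian/exponential choice is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.normal(0.0, 2.0, n)          # sigma_X^2 = 4
y = rng.exponential(3.0, n) - 3.0    # zero-mean, sigma_Y^2 = 9, independent of x
z = x + y

print(z.var(), x.var() + y.var())    # both near sigma_X^2 + sigma_Y^2 = 13
```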
Multivariate random variables: N-dimensional
The results for the bivariate case are easily extrapolated. If
  Z = X_1 + X_2 + … + X_N = Σ_{j=1}^N X_j
where the X_j are all independent r.v.'s, then
  f_Z(z) = f_X1 ∗ f_X2 ∗ … ∗ f_XN
and
  Φ_Z(ω) = Π_{j=1}^N Φ_Xj(ω)
and
  σ_Z² = Σ_{j=1}^N σ_Xj².
Central Limit Theorem:
Let
  Z_N = (1/√N) Σ_{j=1}^N X_j
where the X_j are independent r.v.'s with means and variances
  µ_j ≡ ⟨X_j⟩
  σ_j² = ⟨X_j²⟩ − ⟨X_j⟩²
and the PDFs of the X_j's are almost arbitrary. Restrictions on the distribution of each X_j are that
  i) σ_j² > m > 0, m = constant
  ii) ⟨|X|^n⟩ < M = constant for n > 2
In the limit N → ∞, Z_N becomes a Gaussian random variable with mean
  ⟨Z_N⟩ = (1/√N) Σ_{j=1}^N µ_j
and variance
  σ_Z² = (1/N) Σ_{j=1}^N σ_j².
Example: suppose the X_j are all uniformly distributed between ±1/2, so
  f_X(x) = Π(x), the unit rectangle function, whose Fourier transform is sin(ω/2)/(ω/2).
Thus the characteristic function is
  Φ_j(ω) = ⟨e^{iωx_j}⟩ = sin(ω/2)/(ω/2)
[Figure: the PDF of the sum for N = 2, 3, …, ∞, with characteristic functions (sin(ω/2)/(ω/2))², (sin(ω/2)/(ω/2))³, …, approaching the Gaussian pair e^{−x²} ↔ e^{−ω²}.]
From the convolution results we have
  Φ_{√N Z_N}(ω) = [sin(ω/2)/(ω/2)]^N
From the transformation of random variables we have that
  f_{Z_N}(x) = √N f_{√N Z_N}(√N x)
and by the scaling theorem for Fourier transforms
  Φ_{Z_N}(ω) = Φ_{√N Z_N}(ω/√N) = [sin(ω/(2√N))/(ω/(2√N))]^N.
Now
  lim_{N→∞} Φ_{Z_N}(ω) = e^{−ω²σ_Z²/2}
or
  f_{Z_N}(x) = (1/√(2πσ_Z²)) e^{−x²/2σ_Z²}.
Consistency with this limiting form can be seen by expanding Φ_{Z_N} for small ω:
  Φ_{Z_N}(ω) ≈ [(ω/(2√N) − (1/3!)(ω/(2√N))³) / (ω/(2√N))]^N = [1 − (1/6)(ω/(2√N))²]^N ≈ 1 − ω²/24,
which is identical to the expansion of exp(−ω²σ_Z²/2), since σ_Z² = 1/12 for a uniform r.v. on ±1/2.
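The uniform-sum example is easy to simulate. A sketch (assuming NumPy; not from the slides) checking both the variance σ_Z² = 1/12 and a Gaussian interval probability:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
trials = 200_000

# Z_N = N^(-1/2) * sum of N uniforms on [-1/2, 1/2]; sigma_Z^2 = 1/12
xj = rng.random((trials, N)) - 0.5
zN = xj.sum(axis=1) / np.sqrt(N)

sigma_z = np.sqrt(1.0 / 12.0)
frac = np.mean(np.abs(zN) < sigma_z)   # Gaussian prediction: ~0.683 within 1 sigma
print(zN.var(), frac)
```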
CLT Comments
• A sum of Gaussian RVs is automatically a Gaussian RV (can show using characteristic functions)
• Convergence to a Gaussian form depends on the actual PDFs of the terms in the sum and their relative variances
• Exceptions exist!
CLT: Example of a PDF that does not work
The Cauchy distribution and its characteristic function are
  f_X(x) = (α/π) · 1/(α² + x²)
  Φ(ω) = e^{−α|ω|}
Now
  Z_N = (1/√N) Σ_{j=1}^N x_j
has characteristic function
  Φ_N(ω) = e^{−Nα|ω|/√N} = e^{−√N α|ω|}
By inspection the exponential will not converge to a Gaussian. Instead, the sum of N Cauchy RVs is a Cauchy RV.
Is the Cauchy distribution a legitimate PDF for the CLT? No!
The variance diverges:
  ⟨X²⟩ = ∫_−∞^∞ dx x² (α/π) · 1/(α² + x²) → ∞.
A CLT Problem
• Consider a set of N quantities {a_i, i = 1, …, N} that are i.i.d. (independently and identically distributed) with zero mean:
  ⟨a_i⟩ = 0
  ⟨a_i a_j⟩ = σ_a² δ_ij
• We are interested in the cross correlation between all unique pairs:
  C_N = (1/N_X) Σ_{i<j} a_i a_j = (1/N_X) Σ_{i=1}^{N−1} Σ_{j=i+1}^N a_i a_j
  N_X = N(N − 1)/2
• What do you expect ⟨C_N⟩ to be?
• What do you expect the PDF of C_N to be?
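A simulation sketch of C_N (assuming NumPy and, for concreteness, Gaussian a_i, which the problem does not require; the pair sum is computed via the identity Σ_{i<j} a_i a_j = [(Σ_i a_i)² − Σ_i a_i²]/2):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 64
trials = 100_000
NX = N * (N - 1) // 2

a = rng.standard_normal((trials, N))   # i.i.d., zero mean, sigma_a^2 = 1
s = a.sum(axis=1)
# sum over unique pairs: sum_{i<j} a_i a_j = ((sum_i a_i)^2 - sum_i a_i^2) / 2
CN = (s**2 - (a**2).sum(axis=1)) / 2.0 / NX

print(CN.mean())        # <C_N> = 0
print(NX * CN.var())    # var(C_N) = sigma_a^4 / N_X, so this is near 1
```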
A CLT Problem (2)
Note:
• The number of independent quantities (random variables) is N
• The sum C_N has terms that are products of i.i.d. variables
• Any given term in the sum is statistically independent of some of the other terms
• The PDF of products is different from the PDF of the individual factors
• In the limit N ≫ 1 there should be many independent terms in the sum
• N = 2: can show that the PDF is symmetric (odd-order moments = 0)
• N > 2: can show that the third moment ≠ 0
• What gives?
Conditional Probabilities & Bayes' Theorem
We have considered P(α), the probability of an event α. Also obeying the axioms of probability are conditional probabilities: P(β|α), the probability of the event β given that the event α has occurred:
  P(β|α) ≡ P(βα)/P(α)
Recast the axioms as
  I. P(β|α) ≥ 0
  II. P(β|α) + P(β̄|α) = 1
  III. P(βα|γ) = P(β|γ) P(α|βγ) = P(α|γ) P(β|αγ)
How does this relate to experiments? Use the product rule:
  P(α|βγ) = P(α|γ) P(β|αγ) / P(β|γ)
or, letting M = model (or hypothesis), D = data, and I = background information (assumptions),
  P(M|DI) = P(M|I) P(D|MI) / P(D|I)
Terms:
  prior: P(M|I)
  sampling distribution for D: P(D|MI) (also called the likelihood for M)
  prior predictive for D: P(D|I) (also called the global likelihood or evidence for M)
Particular strengths of the Bayesian method include:
1. One must often be explicit about what is assumed in I, the background information.
2. In assessing models, we get a PDF for parameters rather than just point estimates.
3. Occam's razor (simpler models win, all else being equal) is easily invoked when comparing models. We may have many different models M_i that we wish to compare. Form the odds ratio from the posterior PDFs P(M_i|DI):
  O_ij ≡ P(M_i|DI)/P(M_j|DI) = [P(M_i|I)/P(M_j|I)] · [P(D|M_i I)/P(D|M_j I)].
Example
Data: {k_i}, i = 1, …, n, drawn from a Poisson process.
Poisson PDF: P_k = λ^k e^{−λ}/k!
Want: mean of the process.
Frequentist approach:
We need an estimator for the mean; consider the likelihood
  f(λ) = Π_{i=1}^n P(k_i) = (1/Π_{i=1}^n k_i!) λ^{Σ_i k_i} e^{−nλ}.
Maximizing,
  df/dλ = 0 = f(λ) [−n + λ^{−1} Σ_{i=1}^n k_i]
we obtain as an estimator for the mean
  λ̂ = k̄ = (1/n) Σ_{i=1}^n k_i.
Bayesian approach:
Likelihood (as before):
  P(D|MI) = Π_{i=1}^n P(k_i) = (1/Π_{i=1}^n k_i!) λ^{Σ_i k_i} e^{−nλ}.
Prior:
  P(M|I) = P(λ|I)
Assume a uniform prior: P(λ|I) = U(λ).
Prior predictive:
  P(D|I) = ∫ dλ U(λ) P(D|MI) = (1/Π_{i=1}^n k_i!) · Γ(nk̄ + 1)/n^{nk̄+1}.
Combining all of the above, we find
  P(λ|{k_i}I) = [n^{nk̄+1}/Γ(nk̄ + 1)] λ^{nk̄} e^{−nλ} U(λ),
i.e. a Gamma distribution in λ with shape nk̄ + 1 and rate n. Note that rather than getting a point estimate for the mean, we get a PDF for its value. For hypothesis testing, this is much more useful than a point estimate.
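The posterior can be evaluated numerically. A sketch (assuming NumPy; the counts {k_i} below are hypothetical example data, not from the lecture):

```python
import numpy as np
from math import lgamma, log

k = np.array([3, 5, 2, 4, 6, 3, 4])   # hypothetical Poisson counts
n = len(k)
S = int(k.sum())                      # S = n * kbar

# Posterior with a uniform prior: p(lam | D) = n^(S+1) lam^S e^(-n lam) / Gamma(S+1)
lam = np.linspace(0.0, 15.0, 200_001)[1:]
log_p = (S + 1) * log(n) + S * np.log(lam) - n * lam - lgamma(S + 1)
p = np.exp(log_p)

dlam = lam[1] - lam[0]
norm = p.sum() * dlam                 # integrates to ~1
mean = (lam * p).sum() * dlam         # posterior mean = (S + 1)/n analytically
print(norm, mean, (S + 1) / n)
```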