Lecture 3:
Inferences using Least-Squares
Abstraction

A vector of N random variables, x, with joint probability density p(x), expectation x̄, and covariance Cx.

[Figure: a cloud of probability in the (x1, x2) plane. Shown as 2D here, but actually N-dimensional.]
the multivariate normal distribution

p(x) = (2π)^(-N/2) |Cx|^(-1/2) exp{ -½ (x - x̄)ᵀ Cx⁻¹ (x - x̄) }

has expectation x̄, covariance Cx, and is normalized to unit area.
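A minimal sketch of evaluating this density, assuming NumPy and SciPy are available (the mean and covariance values are borrowed from the examples below):

```python
# Minimal sketch: evaluate the multivariate normal density p(x)
# for a chosen mean xbar and covariance Cx.
import numpy as np
from scipy.stats import multivariate_normal

xbar = np.array([2.0, 1.0])            # expectation (as in the examples below)
Cx = np.array([[1.0, 0.5],
               [0.5, 1.0]])            # covariance: symmetric, positive definite

p = multivariate_normal(mean=xbar, cov=Cx)
print(p.pdf(xbar))                     # density at the mean
# The density integrates to 1 over the whole plane ("unit area").
```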
examples
[Figure: p(x,y) for x̄ = (2, 1)ᵀ, Cx = [1, 0; 0, 1].]
[Figure: p(x,y) for x̄ = (2, 1)ᵀ, Cx = [2, 0; 0, 1].]
[Figure: p(x,y) for x̄ = (2, 1)ᵀ, Cx = [1, 0; 0, 2].]
[Figure: p(x,y) for x̄ = (2, 1)ᵀ, Cx = [1, 0.5; 0.5, 1].]
[Figure: p(x,y) for x̄ = (2, 1)ᵀ, Cx = [1, -0.5; -0.5, 1].]
Remember this from the last lecture?

[Figure: a joint density in the (x1, x2) plane, with its two marginals plotted along the axes.]

p(x1) = ∫ p(x1, x2) dx2 … the distribution of x1, irrespective of x2

p(x2) = ∫ p(x1, x2) dx1 … the distribution of x2, irrespective of x1
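These marginals can be checked numerically. A minimal sketch, assuming the same hypothetical mean and covariance as the examples above, approximating the integrals with Riemann sums on a grid:

```python
# Minimal sketch: marginalize a gridded joint density numerically.
import numpy as np
from scipy.stats import multivariate_normal

x1 = np.linspace(-2.0, 6.0, 201)
x2 = np.linspace(-3.0, 5.0, 201)
dx1, dx2 = x1[1] - x1[0], x2[1] - x2[0]
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

joint = multivariate_normal([2.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
P = joint.pdf(np.dstack((X1, X2)))     # p(x1, x2) sampled on the grid

p_x1 = P.sum(axis=1) * dx2             # p(x1) = integral of p(x1,x2) dx2
p_x2 = P.sum(axis=0) * dx1             # p(x2) = integral of p(x1,x2) dx1
print(p_x1.sum() * dx1)                # ~1: the marginal is itself normalized
```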
[Figure: the joint p(x,y), and the marginal p(y) obtained by integrating over x.]

p(y) = ∫ p(x,y) dx
[Figure: the joint p(x,y), and the marginal p(x) obtained by integrating over y.]

p(x) = ∫ p(x,y) dy
Remember p(x,y) = p(x|y) p(y) = p(y|x) p(x) from the last lecture?

We can compute p(x|y) and p(y|x) as follows:

p(x|y) = p(x,y) / p(y)

p(y|x) = p(x,y) / p(x)
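A minimal sketch of these two formulas on a grid, reusing the hypothetical parameters above; each conditional slice should integrate to one:

```python
# Minimal sketch: conditional = joint / marginal, column by column.
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-2.0, 6.0, 201)
y = np.linspace(-3.0, 5.0, 201)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")
P = multivariate_normal([2.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]).pdf(np.dstack((X, Y)))

p_y = P.sum(axis=0) * dx               # marginal p(y) on the y grid
P_x_given_y = P / p_y                  # column j holds p(x | y = y[j])
print(P_x_given_y[:, 100].sum() * dx)  # ~1: each conditional is normalized
```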
[Figure: the joint p(x,y) alongside the conditionals p(x|y) and p(y|x).]
Any linear function of a normal distribution is a normal distribution.

If p(x) = (2π)^(-N/2) |Cx|^(-1/2) exp{ -½ (x - x̄)ᵀ Cx⁻¹ (x - x̄) }

and y = Mx, then

p(y) = (2π)^(-N/2) |Cy|^(-1/2) exp{ -½ (y - ȳ)ᵀ Cy⁻¹ (y - ȳ) }

with ȳ = Mx̄ and Cy = M Cx Mᵀ.

Memorize!
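A minimal Monte Carlo sketch of this rule; the particular map M here is an arbitrary, hypothetical choice:

```python
# Minimal sketch: check ybar = M xbar and Cy = M Cx M^T by sampling.
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Cx = np.array([[1.0, 0.5], [0.5, 1.0]])
M = np.array([[1.0, 1.0], [0.0, 2.0]])        # any linear map will do

xs = rng.multivariate_normal(xbar, Cx, size=200_000)
ys = xs @ M.T                                 # y = M x, row by row
print(ys.mean(axis=0), M @ xbar)              # sample mean vs. M xbar
print(np.cov(ys.T), M @ Cx @ M.T)             # sample covariance vs. M Cx M^T
```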
Do you remember this from a previous lecture?

If d = G m, then the standard least-squares solution is

mest = [GᵀG]⁻¹ Gᵀ d

and the rule for error propagation gives

Cm = σd² [GᵀG]⁻¹
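A minimal sketch of these two formulas on a hypothetical straight-line fit (the true intercept 1, slope 2, and σd = 0.5 are made-up values):

```python
# Minimal sketch: mest = (G^T G)^-1 G^T d and Cm = sigma_d^2 (G^T G)^-1.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
G = np.column_stack([np.ones_like(x), x])     # columns: intercept, slope
sigma_d = 0.5
d = G @ np.array([1.0, 2.0]) + sigma_d * rng.standard_normal(x.size)

GTG_inv = np.linalg.inv(G.T @ G)
mest = GTG_inv @ G.T @ d
Cm = sigma_d**2 * GTG_inv
print(mest)                                   # ~ [1, 2]
print(np.sqrt(np.diag(Cm)))                   # standard errors of the two parameters
```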
Example: all the data are assumed to have the same true value, m1, and each is measured with the same variance, σd².

[d1; d2; d3; …; dN] = [1; 1; 1; …; 1] m1, i.e. d = G m1 with G a column of N ones

GᵀG = N, so [GᵀG]⁻¹ = 1/N

Gᵀd = Σᵢ dᵢ

mest = [GᵀG]⁻¹ Gᵀd = (Σᵢ dᵢ) / N

Cm = σd² / N
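A minimal sketch of this special case, with hypothetical values; the least-squares estimate coincides with the sample mean:

```python
# Minimal sketch: G is a column of ones, so mest reduces to the mean.
import numpy as np

rng = np.random.default_rng(2)
N, m_true, sigma_d = 100, 4.0, 0.3
d = m_true + sigma_d * rng.standard_normal(N)
G = np.ones((N, 1))

mest = np.linalg.solve(G.T @ G, G.T @ d)      # [G^T G]^-1 G^T d
print(mest[0], d.mean())                      # identical values
print(sigma_d**2 / N)                         # Cm: variance of the estimated mean
```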
m1est = (Σᵢ dᵢ) / N … the traditional formula for the mean!

The estimated mean has variance Cm = σd² / N = σm².

Note then that σm = σd / √N.

The estimated mean is a normally-distributed random variable, and the width of this distribution, σm, decreases with the square root of the number of measurements.
Accuracy grows only slowly with N

[Figure: p(m1est) for N = 1, 10, 100, 1000; the peak narrows as N grows.]
Estimating the variance of the data

What σd² do you use in this formula?
Prior estimates of σd

Based on knowledge of the limits of your measuring technique …

my ruler has only mm tics, so I'm going to assume that σd = 0.5 mm

the manufacturer claims that the instrument is accurate to 0.1%, so since my typical measurement is 25, I'll assume σd = 0.025
Posterior estimate of the error, based on the error measured with respect to the best fit:

σd² = (1/N) Σᵢ (dᵢobs - dᵢpre)² = (1/N) Σᵢ eᵢ²
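A minimal sketch of this posterior estimate, using residuals from a hypothetical straight-line fit:

```python
# Minimal sketch: estimate sigma_d^2 from the misfit to the best-fitting model.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
G = np.column_stack([np.ones_like(x), x])
d_obs = G @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(x.size)

mest, *_ = np.linalg.lstsq(G, d_obs, rcond=None)
e = d_obs - G @ mest                          # errors e_i = d_i_obs - d_i_pre
sigma2_post = np.mean(e**2)                   # (1/N) * sum of e_i^2
print(sigma2_post)                            # ~0.25 for the 0.5 noise used here
```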
[1 x1; 1 x2; …; 1 xN] [a; b] = [y1; y2; …; yN], i.e. G m = d

mest = [GᵀG]⁻¹ Gᵀd is normally distributed with variance

Cm = σd² [GᵀG]⁻¹
[Figure: p(m) = p(a,b) = p(intercept, slope), plotted over the (intercept, slope) plane.]
How probable is a dataset?
N data d are all drawn from the same distribution p(d).

The probable-ness of a single measurement dᵢ is p(dᵢ), so the probable-ness of the whole dataset is

p(d1) p(d2) … p(dN) = Πᵢ p(dᵢ)

L = ln Πᵢ p(dᵢ) = Σᵢ ln p(dᵢ)

called the “likelihood” of the data
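A minimal sketch of computing L, assuming the data are drawn from a normal distribution with known (hypothetical) mean and variance:

```python
# Minimal sketch: the log-likelihood is a sum of log densities.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
d = rng.normal(loc=4.0, scale=0.3, size=100)    # N data from one distribution

L = np.sum(norm.logpdf(d, loc=4.0, scale=0.3))  # L = sum_i ln p(d_i)
print(L)
```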
Now imagine that the distribution p(d) is known up to a vector m of unknown parameters. Write p(d; m), with a semicolon as a reminder that it's not a joint probability.

Then L is a function of m:

L(m) = Σᵢ ln p(dᵢ; m)
The Principle of Maximum Likelihood

Choose m so that it maximizes L(m): the dataset that was in fact observed is the most probable one that could have been observed.

The best choice of parameters m is the one that makes the dataset likely.
the multivariate normal distribution for data, d:

p(d) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - d̄)ᵀ Cd⁻¹ (d - d̄) }

Let's assume that the expectation d̄ is given by a general linear model, d̄ = Gm, and that the covariance Cd is known (prior covariance).
Then we have a distribution p(d; m) with unknown parameters, m:

p(d; m) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - Gm)ᵀ Cd⁻¹ (d - Gm) }

We can now apply the principle of maximum likelihood to estimate the unknown parameters m.
Find the m that maximizes L(m) = ln p(d; m), with

p(d; m) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - Gm)ᵀ Cd⁻¹ (d - Gm) }
L(m) = ln p(d; m) = -½ N ln(2π) - ½ ln|Cd| - ½ (d - Gm)ᵀ Cd⁻¹ (d - Gm)

The first two terms do not contain m, so the principle of maximum likelihood says:

maximize -½ (d - Gm)ᵀ Cd⁻¹ (d - Gm)

or

minimize (d - Gm)ᵀ Cd⁻¹ (d - Gm)
Minimize (d - Gm)ᵀ Cd⁻¹ (d - Gm)

Special case of uncorrelated data with equal variance: Cd = σd² I.

Minimize σd⁻² (d - Gm)ᵀ (d - Gm) with respect to m, which is the same as

minimize (d - Gm)ᵀ (d - Gm) with respect to m.

This is the Principle of Least Squares.
But back to the general case … what formula for m does the rule

minimize (d - Gm)ᵀ Cd⁻¹ (d - Gm)

imply?
Answer (after a lot of algebra):

mest = [Gᵀ Cd⁻¹ G]⁻¹ Gᵀ Cd⁻¹ d

And then, by the usual rules of error propagation,

Cm = [Gᵀ Cd⁻¹ G]⁻¹
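A minimal sketch of these two formulas with a diagonal Cd, anticipating the σd = 5 versus σd = 100 example below:

```python
# Minimal sketch: weighted least squares with per-datum variances.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 100)
G = np.column_stack([np.ones_like(x), x])
sigma = np.where(x < 5.0, 5.0, 100.0)         # left half accurate, right half noisy
d = G @ np.array([1.0, 2.0]) + sigma * rng.standard_normal(x.size)

W = np.diag(1.0 / sigma**2)                   # Cd^-1 for uncorrelated data
A = G.T @ W @ G
mest = np.linalg.solve(A, G.T @ W @ d)        # [G^T Cd^-1 G]^-1 G^T Cd^-1 d
Cm = np.linalg.inv(A)
print(mest, np.sqrt(np.diag(Cm)))
```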
This special case is often called Weighted Least Squares.

Note that the total error is

E = eᵀ Cd⁻¹ e = Σᵢ σᵢ⁻² eᵢ²

Each individual error is weighted by the reciprocal of its variance (the weight, σᵢ⁻²), so errors involving data with SMALL variance get MORE weight.
Example: fitting a straight line

100 data; the first 50 have a different σd than the last 50
Equal variance. Left 50: σd = 5; right 50: σd = 5.
Left has smaller variance. First 50: σd = 5; last 50: σd = 100.
Right has smaller variance. First 50: σd = 100; last 50: σd = 5.
What can go wrong in least-squares

mest = [GᵀG]⁻¹ Gᵀ d

the matrix GᵀG is singular, so the inverse [GᵀG]⁻¹ does not exist
EXAMPLE - a straight line fit

[1 x1; 1 x2; 1 x3; …; 1 xN] m = [d1; d2; d3; …; dN]

GᵀG = [ N, Σᵢ xᵢ ; Σᵢ xᵢ, Σᵢ xᵢ² ]

det(GᵀG) = N Σᵢ xᵢ² - (Σᵢ xᵢ)²

[GᵀG]⁻¹ is singular when the determinant is zero.
det(GᵀG) = N Σᵢ xᵢ² - (Σᵢ xᵢ)² = 0

N = 1, only one measurement (x, d):
N Σᵢ xᵢ² - (Σᵢ xᵢ)² = x² - x² = 0
you can't fit a straight line to only one point

N > 1, but all data measured at the same x:
N Σᵢ xᵢ² - (Σᵢ xᵢ)² = N² x² - N² x² = 0
measuring the same point over and over doesn't help
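A minimal sketch demonstrating this failure mode: every datum is measured at the same x, so det(GᵀG) vanishes and the inverse does not exist:

```python
# Minimal sketch: a singular G^T G from repeated measurements at one x.
import numpy as np

x = np.full(10, 3.0)                          # every measurement at x = 3
G = np.column_stack([np.ones_like(x), x])
GTG = G.T @ G
print(np.linalg.det(GTG))                     # ~0: N*sum(x^2) - (sum x)^2 = 0
# np.linalg.inv(GTG) would raise LinAlgError (singular matrix) here.
```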
This sort of 'missing measurement' might be difficult to recognize in a complicated problem, but it happens all the time …
Example - Tomography
In this method, you try to plaster the subject with X-ray beams made at every possible position and direction, but you can easily wind up missing some small region …

[Figure: crossing beam paths, with a small region of no data coverage.]
What to do?

Introduce prior information: assumptions about the behavior of the unknowns that 'fill in' the data gaps.
Examples of Prior Information

The unknowns:

are close to some already-known value: the density of the mantle is close to 3000 kg/m³

vary smoothly with time or with geographical position: ocean currents have length scales of 10's of km

obey some physical law embodied in a PDE: water is incompressible, and thus its velocity satisfies div(v) = 0
Are you only fooling yourself?

It depends … are your assumptions good ones?
Application of the Maximum Likelihood Method to this problem

so, let's have a foray into the world of probability
Overall Strategy
1. Represent the observed data as a probability distribution
2. Represent prior information as a probability distribution
3. Represent the relationship between data and model parameters as a probability distribution
4. Combine the three distributions in a way that embodies combining the information that they contain
5. Apply maximum likelihood to the combined distribution
How to combine distributions in a way that embodies combining the information that they contain …

Short answer: multiply them.

[Figure: p1(x) confines x to between x1 and x3; p2(x) confines x to between x2 and x4; their product pT(x) confines x to between x2 and x3.]
Overall Strategy

1. Represent the observed data as a Normal probability distribution:

pA(d) ∝ exp{ -½ (d - dobs)ᵀ Cd⁻¹ (d - dobs) }

In the absence of any other information, the best estimate of the mean of the data is the observed data itself. Cd is the prior covariance of the data. (I don't feel like typing the normalization.)
Overall Strategy

2. Represent prior information as a Normal probability distribution:

pA(m) ∝ exp{ -½ (m - mA)ᵀ Cm⁻¹ (m - mA) }

mA is the prior estimate of the model: your best guess as to what it would be, in the absence of any observations. The prior covariance of the model, Cm, quantifies how good you think your prior estimate is …
example

one observation: dobs = 0.8 ± 0.4

one model parameter with mA = 1.0 ± 1.25
[Figure: pA(d) pA(m) over the (m, d) plane, axes running 0 to 2, centered at mA = 1, dobs = 0.8.]
Overall Strategy

3. Represent the relationship between data and model parameters as a probability distribution:

pT(d,m) ∝ exp{ -½ (d - Gm)ᵀ CG⁻¹ (d - Gm) }

Gm = d is the linear theory relating the data, d, to the model parameters, m. The prior covariance of the theory, CG, quantifies how good you think your linear theory is.
example

theory: d = m, but only accurate to ± 0.2
[Figure: pT(d,m) over the same (m, d) plane, axes 0 to 2, with mA = 1 and dobs = 0.8 marked: a ridge along the theory line d = m, with width set by the ± 0.2 accuracy.]
Overall Strategy

4. Combine the three distributions in a way that embodies combining the information that they contain:

p(m,d) = pA(d) pA(m) pT(m,d)
∝ exp{ -½ [ (d - dobs)ᵀ Cd⁻¹ (d - dobs) + (m - mA)ᵀ Cm⁻¹ (m - mA) + (d - Gm)ᵀ CG⁻¹ (d - Gm) ] }

a bit of a mess, but it can be simplified …
[Figure: the combined distribution p(d,m) = pA(d) pA(m) pT(d,m) over the same (m, d) plane.]
Overall Strategy

5. Apply maximum likelihood to the combined distribution, p(d,m) = pA(d) pA(m) pT(m,d).
[Figure: p(d,m) with its maximum likelihood point marked at m = mest, d = dpre.]
special case of an exact theory

Exact theory: the covariance CG is very small; take the limit CG → 0.

After projecting p(d,m) to p(m) by integrating over all d:

p(m) ∝ exp{ -½ [ (Gm - dobs)ᵀ Cd⁻¹ (Gm - dobs) + (m - mA)ᵀ Cm⁻¹ (m - mA) ] }
maximizing p(m) is equivalent to minimizing

(Gm - dobs)ᵀ Cd⁻¹ (Gm - dobs) + (m - mA)ᵀ Cm⁻¹ (m - mA)

that is, the weighted “prediction error” plus the weighted “distance of the model from its prior value”.
solution, calculated via the usual messy minimization process:

mest = mA + M [ dobs - G mA ]

where M = [Gᵀ Cd⁻¹ G + Cm⁻¹]⁻¹ Gᵀ Cd⁻¹

Don't memorize, but be prepared to use.
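A minimal sketch of this solution on the one-parameter example from earlier (theory d = m, dobs = 0.8 ± 0.4, mA = 1.0 ± 1.25):

```python
# Minimal sketch: generalized least squares with prior information.
import numpy as np

G = np.array([[1.0]])                         # theory: d = m
d_obs = np.array([0.8])
mA = np.array([1.0])
Cd_inv = np.array([[1.0 / 0.4**2]])           # data variance 0.4^2
Cm_inv = np.array([[1.0 / 1.25**2]])          # prior variance 1.25^2

M = np.linalg.inv(G.T @ Cd_inv @ G + Cm_inv) @ G.T @ Cd_inv
mest = mA + M @ (d_obs - G @ mA)
print(mest)                                   # ~0.82: pulled from mA toward dobs
```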
interesting interpretation

mest - mA = M [ dobs - G mA ]

On the left: the estimated model minus its prior. On the right: the observed data minus the prediction of the prior model. The linear connection between the two is a generalized form of least squares.
special uncorrelated case: Cm = σm² I and Cd = σd² I

M = [Gᵀ Cd⁻¹ G + Cm⁻¹]⁻¹ Gᵀ Cd⁻¹ = [ GᵀG + (σd/σm)² I ]⁻¹ Gᵀ

This formula is sometimes called “damped least squares”, with “damping factor” ε = σd/σm.
Damped Least Squares makes the process of avoiding the singular matrices associated with insufficient data trivially easy: you just add ε²I to GᵀG before computing the inverse.
GᵀG → GᵀG + ε²I

This process regularizes the matrix, so its inverse always exists. Its interpretation is: in the absence of relevant data, assume the model parameter has its prior value.
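A minimal sketch of damped least squares applied to the degenerate straight-line data from before; the damping factor ε is an arbitrary assumed value:

```python
# Minimal sketch: add eps^2 * I to G^T G so the inverse always exists.
import numpy as np

x = np.full(10, 3.0)                          # degenerate: all data at x = 3
G = np.column_stack([np.ones_like(x), x])
d = np.full(10, 7.0)
eps = 0.1                                     # damping factor (sigma_d / sigma_m)

A = G.T @ G + eps**2 * np.eye(2)              # regularized normal equations
mest = np.linalg.solve(A, G.T @ d)            # finite despite singular G^T G
print(mest)                                   # implicitly assumes a prior mA = 0
```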
Are you only fooling yourself?

It depends … is the assumption - that you know the prior value - a good one?