Lectures 3 & 4: Estimators
TRANSCRIPT
• Maximum likelihood estimation (MLE)
  • Likelihood function, information matrix
• Least Squares estimation
  • Total least squares
  • General least squares
  • Regularization
• Maximum a Posteriori estimation (MAP)
  • Posterior density via Bayes’ rule
• Confidence regions
Hilary Term 2007 A. Zisserman
Maximum Likelihood Estimation
In the line fitting (linear regression) example the estimate of the line parameters θ involved two steps:
1. Write down the likelihood function expressing the probability of the data z given the parameters θ
2. Maximize the likelihood to determine θ
i.e. choose the value of θ so as to make the data as likely as possible.
Note, the solution does not depend on the variance σ2 of the noise
[Figure: line fit y = ax + b to data points in the (x, y) plane]

$$\min_{a,b} \sum_i^n \left(y_i - (ax_i + b)\right)^2$$
Example application: Navigation
Take multiple measurements of beacons to improve location estimates
The Likelihood function
More formally, the likelihood is defined as

$$L(\theta) = p(z|\theta)$$

• the likelihood is the conditional probability of obtaining measurements z = (z1, z2, …, zn) given that the true value is θ

[Figure: p(z|θ) plotted as a function of θ]

Note, in general:
• the likelihood is a function of θ, but it is not a probability distribution over θ, and its integral with respect to θ does not (necessarily) equal one
• it would be incorrect to refer to this as “the likelihood of the data”
Example: Gaussian Sensors
Likelihood is a Gaussian response in 2D for true value θ = x

[Figure: 2D Gaussian surface L(θ) = p(z|θ) over the plane, with measurements z1, z2 and the true value x marked]

Given observations z and a likelihood function L(θ), the Maximum Likelihood estimate of θ is the value of θ which maximizes the likelihood function:

$$\hat{\theta} = \mathop{\mathrm{argmax}}_\theta L(\theta)$$
We now look at several problems involving combining multiple measurements from Gaussian sensors.
MLE Example 1
Suppose we have two independent sonar measurements (z1, z2) of position x, and the sensor error may be modelled as p(zi|θ) = N(θ, σ²). Determine the MLE of x.

Solution

Since the sensors are independent, the likelihood factorizes:

$$L(x) = p(z_1, z_2 \mid x) = p(z_1 \mid x)\, p(z_2 \mid x)$$

and since the sensors are Gaussian (ignoring irrelevant normalization constants):

$$L(x) \sim e^{-\frac{(z_1 - x)^2}{2\sigma^2}} \times e^{-\frac{(z_2 - x)^2}{2\sigma^2}} = e^{-\frac{(z_1 - x)^2 + (z_2 - x)^2}{2\sigma^2}}$$

[Figure: the sensor likelihoods p(z1|x), p(z2|x) centred at z1 and z2, and the combined likelihood L(x)]

The negative log likelihood is

$$-\ln L(x) = \frac{(z_1 - x)^2 + (z_2 - x)^2}{2\sigma^2} = \frac{2x^2 - 2x(z_1 + z_2) + z_1^2 + z_2^2}{2\sigma^2} = \frac{(x - \bar{x})^2}{\sigma^2} + c(z_1, z_2)$$

with

$$\bar{x} = \frac{z_1 + z_2}{2}$$

so that $L(x) \sim e^{-(x - \bar{x})^2/\sigma^2}$.
Note
• the likelihood is a Gaussian
• the variance is reduced to σ²/2 (cf the original sensor variance σ²)

[Figure: p(z1|x), p(z2|x) and the combined likelihood L(x) ∼ e^{−(x−x̄)²/σ²}]

Maximum likelihood estimate of x:

$$\hat{x} = \mathop{\mathrm{argmax}}_x L(x) = \mathop{\mathrm{argmin}}_x \{-\ln L(x)\}$$

Using the negative log likelihood

$$-\ln L(x) = (x - \bar{x})^2/\sigma^2 + c(z_1, z_2)$$

compute the minimum by differentiating w.r.t. x:

$$\frac{d\{-\ln L(x)\}}{dx} = 2(x - \bar{x})/\sigma^2 = 0 \quad\Rightarrow\quad \hat{x} = \bar{x} = \frac{z_1 + z_2}{2}$$
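The derivation can be checked numerically. A minimal Python sketch (the function names are mine, and the sample readings are arbitrary):

```python
def mle_equal_variance(z1, z2):
    """ML estimate of x from two measurements with equal Gaussian noise:
    minimising -ln L(x) = ((z1-x)^2 + (z2-x)^2)/(2 sigma^2) gives the mean."""
    return (z1 + z2) / 2

def neg_log_likelihood(x, z1, z2, sigma):
    # up to the additive constant c(z1, z2) dropped in the derivation
    return ((z1 - x) ** 2 + (z2 - x) ** 2) / (2 * sigma ** 2)

x_hat = mle_equal_variance(130.0, 170.0)
print(x_hat)  # 150.0, the mean of the two readings
```

The mean has lower cost than any other candidate, as the quadratic expansion above predicts.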
MLE Example 2: Gaussian fusion with different variances

Suppose we have two independent sonar measurements (z1, z2) of position x, and the sensor errors may be modelled as p(z1|θ) = N(θ, σ1²) and p(z2|θ) = N(θ, σ2²). Determine the MLE of x.

Solution

Again, as the sensors are independent,

$$L(x) = p(z_1, z_2 \mid x) = p(z_1 \mid x)\, p(z_2 \mid x) \sim e^{-\frac{(z_1 - x)^2}{2\sigma_1^2}} \times e^{-\frac{(z_2 - x)^2}{2\sigma_2^2}}$$

The negative log likelihood is then:

$$-\ln L(x) = \tfrac{1}{2}\left[\sigma_1^{-2}(z_1 - x)^2 + \sigma_2^{-2}(z_2 - x)^2\right] + \text{const}$$

$$= \tfrac{1}{2}\left[(\sigma_1^{-2} + \sigma_2^{-2})x^2 - 2(\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2)x\right] + \text{const}$$

$$= \tfrac{1}{2}(\sigma_1^{-2} + \sigma_2^{-2})\left[x - \frac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2}{\sigma_1^{-2} + \sigma_2^{-2}}\right]^2 + \text{const}$$

which is minimised w.r.t. x (i.e. the likelihood is maximised) when

$$x_{MLE} = \frac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2}{\sigma_1^{-2} + \sigma_2^{-2}}$$

e.g. if the sensors are p(z1|x) ∼ N(x, 10²), p(z2|x) ∼ N(x, 20²), and we obtain sensor readings of z1 = 130, z2 = 170, then

$$x_{MLE} = \frac{130/10^2 + 170/20^2}{1/10^2 + 1/20^2} = 138.0$$

• the ML estimate is closer to the more confident measurement
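The information-weighted mean is easy to verify numerically; a minimal sketch using the slide's readings (the helper name `fuse` is mine):

```python
import math

def fuse(z1, var1, z2, var2):
    """Information-weighted mean of two independent Gaussian measurements."""
    s1, s2 = 1.0 / var1, 1.0 / var2      # statistical information = inverse variance
    x_mle = (s1 * z1 + s2 * z2) / (s1 + s2)
    var_mle = 1.0 / (s1 + s2)            # information adds, like parallel resistors
    return x_mle, var_mle

x, var = fuse(130.0, 10.0 ** 2, 170.0, 20.0 ** 2)
print(x, math.sqrt(var))   # close to 138.0 and 8.94
```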
The Information Matrix

We have seen that for two independent Normal variables:

• information weighted mean: $\bar{x} = \dfrac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2}{\sigma_1^{-2} + \sigma_2^{-2}}$

• additivity of statistical information (like the parallel resistors formula): $\sigma_{\bar{x}}^{-2} = \sigma_1^{-2} + \sigma_2^{-2}$

This motivates the definition of the information matrix

$$S = \Sigma^{-1} \qquad\text{so that}\qquad S_{\bar{x}} = S_{z_1} + S_{z_2}$$

In 2D, with z = (z_x, z_y)^⊤:

$$S_{\bar{x}} = \Sigma_{\bar{x}}^{-1} = S_{z_1} + S_{z_2}, \qquad \bar{x} = \Sigma_{\bar{x}}\left(S_{z_1}z_1 + S_{z_2}z_2\right)$$

If $\Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}$ then $S = \begin{bmatrix} 1/\sigma_x^2 & 0 \\ 0 & 1/\sigma_y^2 \end{bmatrix}$

• large covariance → uncertainty, lack of information
• small covariance → some confidence, or strong information

A singular covariance matrix implies “infinite” or “perfect” information.
MLE Example 3: Gaussian fusion with 2D sensors

Suppose we have two independent 2D vision measurements (z1, z2) of position x, and the sensor error may be modelled as p(z1|θ) = N(θ, Σ1) and p(z2|θ) = N(θ, Σ2). Determine the MLE of x, where

$$z_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \; \Sigma_1 = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \qquad\text{and}\qquad z_2 = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \; \Sigma_2 = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$$

Solution

Again the sensors are independent, L(x) = p(z1|x) p(z2|x), so we sum their information:

$$\Sigma_{\bar{x}}^{-1} = \Sigma_1^{-1} + \Sigma_2^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 0.25 \end{bmatrix} + \begin{bmatrix} 0.25 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1.25 & 0 \\ 0 & 1.25 \end{bmatrix}$$

$$\bar{x} = \Sigma_{\bar{x}}\left(S_{z_1}z_1 + S_{z_2}z_2\right) = \begin{bmatrix} 0.8 & 0 \\ 0 & 0.8 \end{bmatrix}\left(\begin{bmatrix} 1 & 0 \\ 0 & 0.25 \end{bmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{bmatrix} 0.25 & 0 \\ 0 & 1 \end{bmatrix}\begin{pmatrix} 2 \\ -1 \end{pmatrix}\right) = \begin{pmatrix} 1.2 \\ -0.6 \end{pmatrix}$$

[Figure: the measurement ellipses N(z1, Σ1) and N(z2, Σ2) and the fused estimate in the (x, y) plane]
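The same fusion can be reproduced with hand-rolled 2×2 helpers (a sketch; in practice a linear algebra library would be used, and the helper names are mine):

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def add2(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# values from the slide
z1, Sigma1 = [1.0, 1.0], [[1.0, 0.0], [0.0, 4.0]]
z2, Sigma2 = [2.0, -1.0], [[4.0, 0.0], [0.0, 1.0]]

S1, S2 = inv2(Sigma1), inv2(Sigma2)   # information matrices
Sigma_x = inv2(add2(S1, S2))          # information adds
x = matvec(Sigma_x, [a + b for a, b in zip(matvec(S1, z1), matvec(S2, z2))])
print(x)   # close to [1.2, -0.6]
```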
Line fitting with errors in both coordinates

Suppose points are measured from a true line ȳ = a x̄ + b, with errors in both coordinates:

$$x_i = \bar{x}_i + v_i, \quad v_i \sim N(0, \sigma^2) \qquad y_i = \bar{y}_i + w_i, \quad w_i \sim N(0, \sigma^2)$$

In the case of linear regression, the ML estimate gave the cost function

$$\min_{a,b} \sum_i^n \left(y_i - (a x_i + b)\right)^2$$

(yi is the measured value, a xi + b the predicted value). In the case of errors in both coordinates, the ML cost function is

$$\min_{a,b} \sum_i^n (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \quad \text{subject to} \quad \hat{y}_i = a\hat{x}_i + b$$

i.e. it is necessary also to estimate (predict) the corrected points (x̂_i, ŷ_i).
This estimator is called total least squares, or orthogonal regression: each measured point is paired with a corrected point, the closest point on the line.

Solution [exercise – non-examinable]

With centroid and scatter matrix

$$\mu = \frac{1}{n}\sum_i^n x_i, \qquad \Sigma = \sum_i^n \left[(x_i - \mu)(x_i - \mu)^\top\right]$$

• the line passes through the centroid µ of the points
• the line's normal is parallel to the eigenvector of Σ corresponding to the least eigenvalue (equivalently, the line direction is the eigenvector with the largest eigenvalue)

$$\min_{a,b} \sum_i^n (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \quad \text{subject to} \quad \hat{y}_i = a\hat{x}_i + b$$
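A minimal total least squares sketch following this recipe. It uses the standard closed-form orientation of the principal axis of a 2×2 scatter matrix; the sample points are illustrative:

```python
import math

def tls_line(points):
    """Orthogonal regression: returns (centroid, unit direction).
    The line passes through the centroid; the direction is the eigenvector
    of the scatter matrix with the largest eigenvalue."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # closed-form principal-axis angle of the scatter [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

# noiseless points on y = 2x + 1 are recovered exactly
centroid, d = tls_line([(0, 1), (1, 3), (2, 5), (3, 7)])
slope = d[1] / d[0]
print(centroid, slope)   # centroid (1.5, 4.0), slope close to 2.0
```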
Application: Estimating Geometric Transformations

The line estimation problem is equivalent to estimating a 1D affine transformation from noisy point correspondences:

$$x' = \alpha x + \beta, \qquad \{x'_i \leftrightarrow x_i\}$$

Can also compute the ML estimate of 2D transformations.
Reminder: Plane Projective transformation

• A projective transformation is also called a “homography” and a “collineation”.
• H has 8 degrees of freedom.
• It can be estimated from 4 or more point correspondences.

Maximum Likelihood estimation of homography H

If the measurement error is Gaussian, then the ML estimate of H and the corrected correspondences is given by minimizing the sum of squared distances between the measured and corrected points in both images, subject to the corrected correspondences being related exactly by H.

Cost function minimization:
• for 2D affine transformations there is a matrix solution (non-examinable)
• for 2D projective transformations, minimize numerically, e.g. using non-linear gradient descent
e.g. Camera rotating about its centre
image plane 1
image plane 2
• The two image planes are related by a homography H
• H only depends on the camera centre, C, and the planes, not on the 3D structure
Example : Building panoramic mosaics
4 frames from a sequence of 30
The camera rotates (and zooms) with fixed centre
Example : Building panoramic mosaics
30 frames
The camera rotates (and zooms) with fixed centre
Homography between two frames
ML estimate of homography from these 100s of point correspondences
Choice of mosaic frame

[Figure: two mosaics. Registering to an end frame produces the classic “bow-tie” mosaic; choosing the central image as the reference frame avoids it.]
General Linear Least Squares

Starting from the line-fitting cost $\min_{a,b}\sum_i (y_i - ax_i - b)^2$, write the residuals as an n-vector w = y − Hθ:

$$\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$$

where H is an n × 2 matrix. Then the sum of squared residuals becomes

$$w^\top w = (y - H\theta)^\top (y - H\theta)$$

which we want to minimize w.r.t. θ. Differentiating and setting to zero,

$$\frac{d}{d\theta}(y - H\theta)^\top (y - H\theta) = -2H^\top (y - H\theta) = 0 \quad\Rightarrow\quad H^\top H\, \theta = H^\top y$$

Summary

• If the generative model is linear, y = Hθ + w,
• then the ML solution for Gaussian noise is

$$\hat{\theta} = (H^\top H)^{-1} H^\top y = H^+ y$$

• where the matrix $H^+ = (H^\top H)^{-1} H^\top$ is the pseudo-inverse of H.
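For the two-parameter line fit the normal equations can be solved in closed form; a minimal sketch (the function name and data are illustrative):

```python
def fit_line_ls(xs, ys):
    """Least-squares line fit y = a x + b by solving the normal equations
    H^T H theta = H^T y for the n x 2 design matrix H = [x_i, 1]."""
    n = len(xs)
    # H^T H = [[sum x^2, sum x], [sum x, n]],  H^T y = [sum x y, sum y]
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    det = sxx * n - sx * sx
    a = (n * sxy - sx * sy) / det     # Cramer's rule on the 2x2 system
    b = (sxx * sy - sx * sxy) / det
    return a, b

# noiseless data from y = 3x - 2 is recovered exactly
a, b = fit_line_ls([0.0, 1.0, 2.0, 3.0], [-2.0, 1.0, 4.0, 7.0])
print(a, b)   # 3.0 -2.0
```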
Generalization I: multi-linear regression

$$y_i = \theta_2 x_i + \theta_1 z_i + \theta_0 + w_i, \qquad H = \begin{bmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n & z_n & 1 \end{bmatrix}, \qquad \theta = (H^\top H)^{-1} H^\top y$$

Generalization II: regression with basis functions

for example, quadratic regression:

$$y_i = \theta_2 x_i^2 + \theta_1 x_i + \theta_0 + w_i, \qquad H = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{bmatrix}$$

More general still, with basis functions φ_j(x):

$$y_i = \sum_j \theta_j \phi_j(x_i) + w_i, \qquad H = \begin{bmatrix} \phi_2(x_1) & \phi_1(x_1) & 1 \\ \phi_2(x_2) & \phi_1(x_2) & 1 \\ \vdots & \vdots & \vdots \\ \phi_2(x_n) & \phi_1(x_n) & 1 \end{bmatrix}$$

• what is the correct function to fit?
• how is the quality of the estimate assessed?
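A sketch of basis-function least squares, forming HᵀH and Hᵀy and solving the normal equations with a small Gaussian elimination routine (all names and data are illustrative):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_basis(xs, ys, basis):
    """Least squares with basis functions: y_i = sum_j theta_j phi_j(x_i) + w_i."""
    H = [[phi(x) for phi in basis] for x in xs]
    k = len(basis)
    HtH = [[sum(H[i][a] * H[i][b] for i in range(len(xs))) for b in range(k)] for a in range(k)]
    Hty = [sum(H[i][a] * ys[i] for i in range(len(xs))) for a in range(k)]
    return solve(HtH, Hty)

# quadratic regression: basis (x^2, x, 1); noiseless data from y = 2x^2 - x + 3
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x * x - x + 3 for x in xs]
theta = fit_basis(xs, ys, [lambda x: x * x, lambda x: x, lambda x: 1.0])
print(theta)   # close to [2.0, -1.0, 3.0]
```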
Summary point

We have seen:

• An estimator is a function f(z) which computes an estimate $\hat{\theta} = f(z)$ for the parameters θ from measurements z = (z1, z2, …, zn)

• The Maximum Likelihood estimator is

$$\hat{\theta} = \mathop{\mathrm{argmax}}_\theta\, p(z|\theta)$$

Now, we will compute the posterior, and hence the Maximum a Posteriori (MAP) estimator.

Bayes’ rule and the Posterior distribution

Suppose we have
• p(z|θ), the conditional density of the sensor(s)
• additionally p(θ), a prior density for θ, the parameter to be estimated

Use Bayes’ rule to get the posterior density:

$$p(\theta|z) = \frac{p(z|\theta)\,p(\theta)}{p(z)} = \frac{\text{likelihood} \times \text{prior}}{\text{normalizing factor}}$$
[Figure: two examples showing the likelihood p(z|θ), the prior p(θ), and the posterior p(θ|z) ∼ p(z|θ)p(θ) as functions of θ]
Example: posterior for uniform distributions

[Figure: two examples with a uniform likelihood p(z|θ) and a uniform prior p(θ); the posterior p(θ|z) ∼ p(z|θ)p(θ) is uniform on their overlap, with the MAP estimate marked. Where is the MLE?]

MAP Estimator

$$\hat{\theta} = \mathop{\mathrm{argmax}}_\theta\, p(\theta|z)$$
MAP Example: Gaussian fusion

Consider again two sensors modelled as independent normal distributions:

$$z_1 = \theta + w_1, \quad p(z_1|\theta) \sim N(\theta, \sigma_1^2)$$
$$z_2 = \theta + w_2, \quad p(z_2|\theta) \sim N(\theta, \sigma_2^2)$$

so that the joint likelihood is

$$p(z_1, z_2|\theta) = p(z_1|\theta) \times p(z_2|\theta)$$

Suppose in addition we have prior information that p(θ) ∼ N(θp, σp²). Then the posterior density is

$$p(\theta|z_1, z_2) \sim p(z_1, z_2|\theta) \times p(\theta)$$

The posterior is a Gaussian, and we can write down its mean and variance simply by adding statistical information: p(θ|z1, z2) ∼ N(µpos, σ²pos), where

$$\sigma^{-2}_{pos} = \sigma_1^{-2} + \sigma_2^{-2} + \sigma_p^{-2}, \qquad \mu_{pos} = \frac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2 + \sigma_p^{-2}\theta_p}{\sigma_1^{-2} + \sigma_2^{-2} + \sigma_p^{-2}} = \theta_{MAP}$$

e.g. if the sensors are p(z1|x) ∼ N(x, 10²), p(z2|x) ∼ N(x, 20²), the prior is p(x) ∼ N(150, 30²), and we obtain sensor readings of z1 = 130, z2 = 170, then

$$x_{MAP} = \frac{130/10^2 + 170/20^2 + 150/30^2}{1/10^2 + 1/20^2 + 1/30^2} = 139.0, \qquad \sigma_{pos} = \frac{1}{\sqrt{1/10^2 + 1/20^2 + 1/30^2}} = 8.57$$
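A minimal numerical check of the MAP fusion above, treating the prior as one more information-weighted term (the function name is mine):

```python
import math

def map_fuse(readings, prior):
    """MAP estimate for independent Gaussian sensors plus a Gaussian prior:
    the prior N(theta_p, sigma_p^2) is simply one more (value, variance) term."""
    terms = list(readings) + [prior]
    info = sum(1.0 / var for _, var in terms)       # statistical information adds
    mean = sum(z / var for z, var in terms) / info  # information-weighted mean
    return mean, 1.0 / math.sqrt(info)              # (theta_MAP, sigma_pos)

x_map, sigma_pos = map_fuse([(130.0, 10.0 ** 2), (170.0, 20.0 ** 2)], (150.0, 30.0 ** 2))
print(round(x_map, 1), round(sigma_pos, 2))   # 139.0 8.57
```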
Matlab example

[Figure: the sensor likelihoods p(z1|x), p(z2|x), the prior p(θ), and the posterior p(θ|z1, z2) plotted over x ∈ [50, 250]]

% normal(t, mu, var) is assumed to be a helper returning the N(mu, var) pdf at t
inc = 1; t = [50:inc:250];
pz1 = normal(t, 130, 10^2); pz2 = normal(t, 170, 20^2); prx = normal(t, 150, 30^2);
posx = pz1 .* pz2 .* prx / (inc * sum(pz1 .* pz2 .* prx));  % normalize numerically
plot(t, pz1, 'r', t, pz2, 'g', t, prx, 'k:', t, posx, 'b');
MAP vs MLE

For the posterior

$$\sigma^{-2}_{pos} = \sigma_1^{-2} + \sigma_2^{-2} + \sigma_p^{-2}, \qquad \theta_{MAP} = \frac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2 + \sigma_p^{-2}\theta_p}{\sigma_1^{-2} + \sigma_2^{-2} + \sigma_p^{-2}}$$

Recall that for the likelihood

$$\sigma_L^{-2} = \sigma_1^{-2} + \sigma_2^{-2}, \qquad \theta_{MLE} = \frac{\sigma_1^{-2}z_1 + \sigma_2^{-2}z_2}{\sigma_1^{-2} + \sigma_2^{-2}}$$

so

$$\sigma^{-2}_{pos} = \sigma_L^{-2} + \sigma_p^{-2}, \qquad \theta_{MAP} = \frac{\sigma_p^{-2}\theta_p + \sigma_L^{-2}\theta_{MLE}}{\sigma_p^{-2} + \sigma_L^{-2}}$$

i.e. the prior acts as an additional sensor.
• Decrease in posterior variance relative to the prior and likelihood:

$$\frac{1}{\sigma^2_{pos}} = \frac{1}{\sigma^2_L} + \frac{1}{\sigma^2_p}$$

e.g.

$$\sigma_L = \frac{1}{\sqrt{1/10^2 + 1/20^2}} = 8.94, \qquad \sigma_p = 30, \qquad \sigma_{pos} = \frac{1}{\sqrt{1/10^2 + 1/20^2 + 1/30^2}} = 8.57$$

If σp → ∞ then σpos → σL (the prior is very uninformative).

• Effect of a uniform prior?

• Update to the prior:

$$\theta_{MAP} = \theta_p + \frac{\sigma_p^2}{\sigma_p^2 + \sigma_L^2}\left(\theta_{MLE} - \theta_p\right)$$

• If θMLE = θp then θMAP is unchanged from the prior (but its variance is decreased)
• If σL is large compared to σp (noisy sensor) then θMAP ≈ θp
• If σp is large compared to σL (weak prior) then θMAP ≈ θMLE
MAP Application 1: Berkeley traffic tracking system

Tracks cars by:
• computing corner features on each video frame
• tracking corners (2D) from frame to frame

Objective: estimate image corner position x(t)

[Figure: tracking corner features: the features, tracks, and grouping stages]
Berkeley visual traffic monitoring system
Traffic by day and by night
ground plane homography
Velocity measurement gate
Tracking Corners

[Figure: true position and measurement]

Corners need 2D Gaussians. A corner feature in 2D coordinates is a vector x = (x, y)^⊤. Suppose x ∼ N(0, σx²) and y ∼ N(0, σy²); then

$$p(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right\}$$

and the covariance matrix is

$$\Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}$$
Corner fusion in 2D

Given a vehicle at position x with prior distribution N(x0, P0) and a corner measurement z ∼ N(x, R) with isotropic measurement error, e.g.

Prior: $x_0 = \begin{pmatrix} -1 \\ -1 \end{pmatrix}, \quad P_0 = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$

Likelihood: $z = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad R = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Summing information:

$$S = P^{-1} = R^{-1} + P_0^{-1}$$

Information weighted mean:

$$\hat{x} = P\left(R^{-1}z + P_0^{-1}x_0\right)$$

Posterior:

$$P = \begin{pmatrix} 0.64 & 0.09 \\ 0.09 & 0.73 \end{pmatrix}, \qquad \hat{x} = \begin{pmatrix} 0.55 \\ 1.37 \end{pmatrix}$$

[Figure: prior distribution, measurement likelihood, and posterior distribution ellipses]

Where does the prior come from? Use the posterior from the previous time step, “updated” to take account of the dynamics:

$$x_t - x_{t-1} \sim N\left(\begin{pmatrix} 0 \\ v \end{pmatrix},\; \sigma^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)$$

Iterate:
• prediction (posterior + dynamics) to give the prior
• make measurement z
• fuse measurement and prior to give the new posterior
Compute estimate and covariance for tracked corner

Prior: $x_0 = \begin{pmatrix} -1 \\ -1 \end{pmatrix}, \; P_0 = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$; Dynamics: $v = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \; P_D = \begin{pmatrix} 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}$; Likelihood: $z_1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}, \; R = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

a) Prediction at t = 1:

$$\bar{P}_1 = P_0 + P_D = \begin{pmatrix} 2.3 & 1.0 \\ 1.0 & 3.3 \end{pmatrix}; \qquad \bar{x}_1 = x_0 + v = \begin{pmatrix} -1 \\ -1 \end{pmatrix}$$

b) Fuse measurement at t = 1:

$$P_1 = \left(\bar{P}_1^{-1} + R^{-1}\right)^{-1} = \begin{pmatrix} 0.674 & 0.076 \\ 0.076 & 0.750 \end{pmatrix}$$

$$x_1 = P_1\left(\bar{P}_1^{-1}\bar{x}_1 + R^{-1}z_1\right) = \begin{pmatrix} 3.95 \\ 1.78 \end{pmatrix}$$
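The predict-and-fuse step can be checked with hand-rolled 2×2 helpers (a sketch; the helper names are mine, the values from the slide):

```python
def inv2(M):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def add2(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# slide values: prior, dynamics noise, measurement
x0, P0 = [-1.0, -1.0], [[2.0, 1.0], [1.0, 3.0]]
v, PD = [0.0, 0.0], [[0.3, 0.0], [0.0, 0.3]]
z1, R = [6.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]

# a) prediction: posterior + dynamics gives the new prior
P1_pred = add2(P0, PD)
x1_pred = [x0[0] + v[0], x0[1] + v[1]]

# b) fusion: sum information, take the information-weighted mean
P1 = inv2(add2(inv2(P1_pred), inv2(R)))
x1 = matvec(P1, [a + b for a, b in zip(matvec(inv2(P1_pred), x1_pred), matvec(inv2(R), z1))])
print([round(c, 2) for c in x1])   # [3.95, 1.78]
```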
Berkeley tracker
In action …
Confidence regions

• To quantify confidence and uncertainty, define a confidence region R about a point x (e.g. the mode) such that, at a confidence level c ≤ 1,

$$p(x \in R) = c$$

• We can then say (for example) that there is a 99% probability that the true value is in R.

• e.g. for a univariate normal distribution N(µ, σ²):

$$p(|x - \mu| < \sigma) \approx 0.68 \qquad p(|x - \mu| < 2\sigma) \approx 0.95 \qquad p(|x - \mu| < 3\sigma) \approx 0.997$$

[Figure: the univariate normal density with these regions marked]

• In 2D, confidence regions are usually chosen as ellipses.

[Figure: 2D confidence ellipses; d = 1, 2, 3 give confidence c = 39%, 86%, 99%]

For an n-dimensional normal distribution with mean zero and covariance Σ, the quantity

$$x^\top \Sigma^{-1} x$$

is distributed as a χ² distribution on n degrees of freedom. In 2D, the region enclosed by the ellipse

$$x^\top \Sigma^{-1} x = d^2$$

defines a confidence region with

$$p(\chi^2_2 < d^2) = c$$

where c can be obtained from standard χ² tables (e.g. HLT) or computed numerically.
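For two degrees of freedom the χ² CDF has a simple closed form, P(χ²₂ < d²) = 1 − exp(−d²/2), so the confidence levels for d = 1, 2, 3 can be checked directly:

```python
import math

def conf_2d(d):
    """Confidence enclosed by the ellipse x^T Sigma^-1 x = d^2 for a 2D Gaussian:
    the chi-squared CDF with 2 dof is 1 - exp(-d^2 / 2)."""
    return 1.0 - math.exp(-d * d / 2.0)

for d in (1, 2, 3):
    print(d, round(100 * conf_2d(d)))   # 39, 86, 99 percent
```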
MAP Application 2: Image Restoration

Suppose the sensor model is

$$z_i = u_i + w_i, \qquad u_i \in \{0, 1\}, \quad w_i \sim N(0, \sigma^2)$$

Then the likelihood is

$$p(z|u) \sim \prod_i e^{-(z_i - u_i)^2/(2\sigma^2)}$$

Assume the prior has the form

$$p(u) \sim \prod_i e^{-\lambda(u_i - u_{i-1})^2}$$

Then the negative log of the posterior p(u|z) is, after scaling by 2σ², the cost function

$$C(u) = \sum_i (z_i - u_i)^2 + d\,(u_i - u_{i-1})^2$$

where the first term comes from the likelihood, the second from the prior, and d = 2σ²λ. The MAP estimate is

$$u_{MAP} = \mathop{\mathrm{argmin}}_u\, C(u)$$

[Figure: original binary signal, original plus noise, and the MAP restoration, for d = 10]
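Because the cost couples only neighbouring pixels, C(u) over binary u can be minimised exactly by dynamic programming. A sketch (the toy signal and parameter value are mine, not the lecture's data):

```python
def restore(z, d):
    """MAP restoration of a binary signal u (u_i in {0, 1}) minimising
    C(u) = sum_i (z_i - u_i)^2 + d (u_i - u_{i-1})^2 by dynamic programming."""
    cost = {u: (z[0] - u) ** 2 for u in (0, 1)}   # best cost ending in state u
    back = []
    for zi in z[1:]:
        new_cost, pointers = {}, {}
        for u in (0, 1):
            c, p = min((cost[p] + d * (u - p) ** 2, p) for p in (0, 1))
            new_cost[u] = (zi - u) ** 2 + c
            pointers[u] = p
        cost = new_cost
        back.append(pointers)
    # trace back the best final state
    u = min((0, 1), key=cost.get)
    seq = [u]
    for pointers in reversed(back):
        u = pointers[u]
        seq.append(u)
    return seq[::-1]

noisy = [0.1, -0.2, 0.9, 1.1, 0.8, 0.2]
print(restore(noisy, 0.5))   # [0, 0, 1, 1, 1, 0]
```

With a larger d the smoothness term dominates and isolated flips are suppressed, as in the d = 60 result below.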
[Figure: original binary signal, original plus noise, and the MAP restoration, for d = 60]
Loss Functions

Starting from the posterior p(θ|z) ∼ p(z|θ)p(θ), the MAP estimate picks its mode; more generally we can pick the estimate that minimizes an expected loss.

Example: hitting a target of half-width t (x̂ is the chosen value, x the actual value):

$$L(\hat{x}, x) = \begin{cases} 1 & \text{if } |\hat{x} - x| > t \\ 0 & \text{if } |\hat{x} - x| \le t \end{cases}$$

Expected loss:

$$\bar{L}(\hat{x}) = \int p(x|z)\, L(\hat{x}, x)\, dx$$

Choose x̂ to minimize the expected loss:

$$\hat{x} = \mathop{\mathrm{argmin}}_{\hat{x}} \int p(x|z)\, L(\hat{x}, x)\, dx$$

Example 1: the squared loss function

$$L(\hat{x}, x) = (\hat{x} - x)^2$$

results in choosing the mean value:

$$\mu = \mathop{\mathrm{argmin}}_u \int (x - u)^2\, p(x)\, dx$$

Sketch proof: differentiate w.r.t. u:

$$\int 2(u - x)\, p(x)\, dx = 0 \quad\Rightarrow\quad u = \int x\, p(x)\, dx$$
Example 2: the absolute loss function

$$L(\hat{x}, x) = |\hat{x} - x|$$

results in choosing the median value:

$$x_{med} = \mathop{\mathrm{argmin}}_u \int |x - u|\, p(x)\, dx$$

[proof: exercise]
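Both results can be illustrated numerically with a discrete stand-in for the posterior (the sample data and grid are arbitrary):

```python
import statistics

# equally weighted samples standing in for p(x|z); note the outlier
data = [1.0, 2.0, 3.0, 4.0, 100.0]

def expected_loss(u, loss):
    return sum(loss(u, x) for x in data) / len(data)

grid = [i * 0.1 for i in range(1001)]   # candidate estimates 0.0 .. 100.0
best_sq = min(grid, key=lambda u: expected_loss(u, lambda u, x: (u - x) ** 2))
best_abs = min(grid, key=lambda u: expected_loss(u, lambda u, x: abs(u - x)))

print(best_sq, statistics.mean(data))     # squared loss picks the mean (22.0)
print(best_abs, statistics.median(data))  # absolute loss picks the median (3.0)
```

The median is robust to the outlier at 100, while the mean is dragged towards it.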
Warning: MAP and change of parameterization

[Figure: a posterior p(θ|z) and the posterior p(θ'|z) after a non-linear map θ' = g(θ); the MAP estimates do not correspond]

Problem:
• MAP finds the maximum value of the posterior, but the posterior is a probability density: under a change of variables θ' = g(θ) the density picks up a Jacobian factor, p(θ'|z) = p(θ|z)/|g'(θ)|, so the location of the maximum can move.

Solution:
• Loss functions transform correctly under a change of coordinates (reparameterization).

Example of a non-linear map: perspective projection.