TRANSCRIPT
Bayesian Estimation & Information Theory
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314), Spring 2016
lecture 18
Bayesian Estimation

three basic ingredients:
1. Likelihood
2. Prior
3. Loss function $L(\hat{\theta}, \theta)$

• likelihood and prior jointly determine the posterior
• the loss function gives the “cost” of making an estimate $\hat{\theta}$ if the true value is $\theta$
• together, the three fully specify how to generate an estimate from the data

Bayesian estimator is defined as:
$$\hat{\theta}(m) = \arg\min_{\hat{\theta}} \int L(\hat{\theta}, \theta)\, p(\theta \mid m)\, d\theta$$
where the integral is the “Bayes’ risk”: the expected loss under the posterior.
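A minimal numerical sketch of this definition (not from the lecture; the grid, the toy posterior, and the helper name `bayes_estimate` are my own): discretize $\theta$, then return the candidate estimate with the smallest Bayes’ risk.

```python
import numpy as np

def bayes_estimate(theta_grid, posterior, loss):
    """Return the grid point minimizing the Bayes' risk
    ∫ L(est, θ) p(θ|m) dθ, approximated by a Riemann sum."""
    d = theta_grid[1] - theta_grid[0]
    posterior = posterior / (posterior.sum() * d)   # normalize p(θ|m)
    risks = np.array([(loss(est, theta_grid) * posterior).sum() * d
                      for est in theta_grid])
    return theta_grid[np.argmin(risks)]

# Toy posterior: a Gaussian bump centered at θ = 1
theta = np.linspace(-8.0, 8.0, 801)
post = np.exp(-0.5 * (theta - 1.0) ** 2)

sq_loss = lambda est, th: (est - th) ** 2           # squared-error loss
print(bayes_estimate(theta, post, sq_loss))         # ≈ 1.0 (the posterior mean)
```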
Typical Loss functions and Bayesian estimators
1. squared error loss: $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$

need to find the $\hat{\theta}$ minimizing the expected loss:
$$E[L] = \int (\hat{\theta} - \theta)^2\, p(\theta \mid m)\, d\theta$$

Differentiate with respect to $\hat{\theta}$ and set to zero:
$$\frac{\partial E[L]}{\partial \hat{\theta}} = 2 \int (\hat{\theta} - \theta)\, p(\theta \mid m)\, d\theta = 0 \quad\Longrightarrow\quad \hat{\theta}(m) = \int \theta\, p(\theta \mid m)\, d\theta$$

the “posterior mean”, also known as the Bayes’ Least Squares (BLS) estimator
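A quick numerical check of this result (my own sketch; the bimodal toy posterior is arbitrary): minimizing the expected squared error over a grid lands on the posterior mean.

```python
import numpy as np

theta = np.linspace(-5.0, 15.0, 4001)
d = theta[1] - theta[0]
# Deliberately asymmetric (bimodal) toy posterior
post = np.exp(-0.5 * (theta - 2.0) ** 2) \
     + 0.5 * np.exp(-0.5 * ((theta - 6.0) / 1.5) ** 2)
post /= post.sum() * d                               # normalize on the grid

posterior_mean = (theta * post).sum() * d            # BLS estimate
risks = [(((est - theta) ** 2) * post).sum() * d for est in theta]
print(posterior_mean, theta[np.argmin(risks)])       # the two agree (≈ 3.71)
```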
Typical Loss functions and Bayesian estimators
2. “zero-one” loss (1 unless $\hat{\theta} = \theta$): $L(\hat{\theta}, \theta) = 1 - \delta(\hat{\theta} - \theta)$

expected loss:
$$E[L] = \int \big(1 - \delta(\hat{\theta} - \theta)\big)\, p(\theta \mid m)\, d\theta = 1 - p(\hat{\theta} \mid m)$$

which is minimized by:
$$\hat{\theta}(m) = \arg\max_{\theta}\, p(\theta \mid m)$$

• the posterior maximum (or “mode”)
• known as the maximum a posteriori (MAP) estimate
MAP vs. Posterior Mean estimate:
[Figure: a gamma pdf posterior on $[0, 10]$ (y-axis 0 to 0.3), with the mode and the mean marked at different locations]
Note: posterior maximum and mean are not always the same!
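To make the note concrete, a small sketch (the gamma shape parameter is my own choice, not taken from the slide) where mode and mean visibly differ:

```python
import numpy as np

k = 3.0                                     # gamma shape parameter (hypothetical)
theta = np.linspace(1e-6, 20.0, 4001)
d = theta[1] - theta[0]
post = theta ** (k - 1) * np.exp(-theta)    # unnormalized gamma(k, 1) pdf
post /= post.sum() * d

map_est = theta[np.argmax(post)]            # posterior mode: k - 1 = 2
bls_est = (theta * post).sum() * d          # posterior mean: k = 3
print(f"MAP: {map_est:.2f}, posterior mean: {bls_est:.2f}")
```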
Typical Loss functions and Bayesian estimators
3. “L1” loss: $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$

expected loss:
$$E[L] = \int |\hat{\theta} - \theta|\, p(\theta \mid m)\, d\theta$$

HW problem: What is the Bayesian estimator for this loss function?
Simple Example: Gaussian noise & prior
1. Likelihood: additive Gaussian noise, $m = \theta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$
2. Prior: zero-mean Gaussian, $\theta \sim \mathcal{N}(0, \sigma_p^2)$
3. Loss function: doesn’t matter (all estimators agree here, since the posterior is Gaussian and its mean, mode, and median coincide)

posterior distribution:
$$p(\theta \mid m) = \mathcal{N}\!\left(\frac{\sigma_p^2}{\sigma_p^2 + \sigma^2}\, m,\;\; \frac{\sigma_p^2\, \sigma^2}{\sigma_p^2 + \sigma^2}\right)$$
the first argument is the MAP estimate (= posterior mean), the second the posterior variance
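A sketch of the closed-form result above (the parameter names `sig2` and `sigp2` for $\sigma^2$ and $\sigma_p^2$ are mine):

```python
def gaussian_posterior(m, sig2, sigp2):
    """Posterior mean and variance for m = θ + ε, ε ~ N(0, sig2), θ ~ N(0, sigp2)."""
    w = sigp2 / (sigp2 + sig2)      # shrinkage weight on the measurement
    return w * m, w * sig2          # mean w·m, variance sigp2·sig2/(sigp2 + sig2)

print(gaussian_posterior(m=4.0, sig2=1.0, sigp2=4.0))   # -> (3.2, 0.8)
```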
[Figures, built up over several slides: the likelihood $p(m \mid \theta)$ plotted over the $(\theta, m)$ plane with both axes running from $-8$ to $8$, followed by the prior $p(\theta)$ on the same axes]
Computing the posterior
$$p(\theta \mid m) \;\propto\; \underbrace{p(m \mid \theta)}_{\text{likelihood}} \times \underbrace{p(\theta)}_{\text{prior}}$$

[Figure: likelihood $\times$ prior $\propto$ posterior, plotted over $\theta$; the posterior peak ($m^*$) sits between the measurement $m$ and the prior mean at 0, and its offset from $m$ is labeled “bias”]
Making a Bayesian Estimate:
[Figure: likelihood $\times$ prior $\propto$ posterior with a broad likelihood; the posterior shifts far toward the prior, giving a larger bias]
High Measurement Noise: large bias
[Figure: likelihood $\times$ prior $\propto$ posterior with a narrow likelihood; the posterior stays close to the measurement, giving a small bias]
Low Measurement Noise: small bias
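The two figures can be summarized numerically. A sketch with assumed numbers (the measurement `m` and prior variance `sigp2` are illustrative): as the noise variance grows, the posterior-mean estimate is pulled further from the measurement toward the prior.

```python
m, sigp2 = 4.0, 4.0                       # measurement and prior variance (assumed)
for sig2 in (0.25, 1.0, 4.0, 16.0):       # increasing measurement noise
    est = sigp2 / (sigp2 + sig2) * m      # posterior-mean (BLS) estimate
    print(f"noise var {sig2:5.2f}: estimate {est:4.2f}, bias {m - est:4.2f}")
```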
Bayesian Estimation:
• Likelihood and prior combine to form posterior
• Bayesian estimate is always biased toward the prior, relative to the ML estimate
Application #1: Biases in Motion Perception

[Demo, shown twice at different contrasts: drifting gratings around a central fixation cross; “Which grating moves faster?”]
Explanation from Weiss, Simoncelli & Adelson (2002):
• In the limit of a zero-contrast grating, likelihood becomes infinitely broad ⇒ percept goes to zero-motion.
[Figure: prior, likelihood, and posterior compared at high and low contrast]
• Noisier measurements, so the likelihood is broader ⇒ the posterior has a larger shift toward 0 (the prior: no motion)
• Claim: explains why people actually speed up when driving in fog!
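A toy illustration of this account (my own sketch, not the actual Weiss et al. model; the contrast-to-noise mapping is an assumption): lower contrast broadens the likelihood, so the posterior-mean speed estimate shrinks toward the zero-motion prior.

```python
true_speed, sigp2 = 10.0, 25.0            # stimulus speed and prior variance (assumed)
for contrast in (1.0, 0.5, 0.1):
    sig2 = 1.0 / contrast ** 2            # assumed: noise variance grows as contrast falls
    percept = sigp2 / (sigp2 + sig2) * true_speed   # posterior-mean speed estimate
    print(f"contrast {contrast:4.1f}: perceived speed {percept:5.2f}")
```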
summary
• 3 ingredients for Bayesian estimation (prior, likelihood, loss)
• Bayes’ least squares (BLS) estimator (posterior mean)
• maximum a posteriori (MAP) estimator (posterior mode)
• accounts for the stimulus-quality-dependent bias in motion perception (Weiss, Simoncelli & Adelson 2002)