computational motor control: optimal estimation in noisy world (jaist summer course)
TRANSCRIPT
Computational Motor Control Summer School04: Optimal estimation
in noisy world.
Hirokazu Tanaka
School of Information Science
Japan Institute of Science and Technology
In this lecture, we will learn:
• Maximum-likelihood (ML) estimation
• Cramer-Rao’s lower bound
• Bayesian estimation
• Wiener filter
• Kalman filter
• Kalman smoother
• Causal inference
• Information integration
• Postdiction
Classical estimation (point) and Bayesian estimation (density).
sample space Χ parameter space Θ
sample space Χ parameter space Θ
ˆ xx
x
|P x
Maximum-likelihood (ML) estimation.
sample space Χ parameter space Θ
ˆ xx
MLˆ arg max |
arg max log |
x P x
P x
ML is …• Asymptotically unbiased (i.e., approaches to a true value).• Asymptotically efficient (i.e., achieves minimum variance, CRLB).
z
Maximum-likelihood (ML) estimation.
sample space Χ parameter space Θ
1ˆ x
1x
2x
2ˆ x
3ˆ x
3x 0
Kullback-Leibler divergence: a distance between two density functions.
KL ; log 0i
i
i
i
pD q p p
q
KL ; log 0
p xD q p dxp x
q x
KL divergence for discrete variable KL divergence for continuous variable
ML as a minimization of KL divergence.
KL ; | log|
=E log E log |
p xD p x q x dx p x
q x
p x q x
Minimization of KL divergence
Maximization of
KL ; |D p x q x
E log |q x
1
1E log | log |
N
i
iq x q xN
KL divergence between true density p(x) and parametrized density q(x|θ):
Sampling approximation:
ML as a minimization of KL divergence.
2T0 0
0
1T
T0
1E log | log loˆ g |ˆ ˆ ˆ ˆ|
ˆ ˆ ˆ
N
i
i
q x q x E q xN
I
2
T T
log | log |E E
ˆ ˆ
l ˆog |ˆq x q x
I q x
0 11ˆ ˆ, IN
Sampling approximation:
ML is …• Asymptotically unbiased (i.e., approaches to a true value).• Asymptotically efficient (i.e., achieves minimum variance, CRLB).
Fisher information:
In the limit of large samples (infinite N), the ML estimator is unbiased and efficient.
Cramer-Rao lower bound: scalar parameter case.
Suppose a random variable 𝑋~𝑝(𝑥; 𝜃), where 𝜃 is fixed but unknown. Assume that 𝑝(𝑥;𝜃) satisfies the “regularity” condition:
where the expectation is with respect to 𝑝(𝑥;𝜃). Then the variance of any unbiased estimator 𝜃 satisfies
E log | 0,p x
1ˆVar
I
2 2
2
logE
|E
| logp x p xI
Fisher information:
Cramer-Rao lower bound: vector case.
Suppose a random variable 𝑋~𝑝(𝑥|𝛉), where θ is fixed but unknown. Assume that 𝑝(𝑥|θ) satisfies the “regularity” condition:
where the expectation is with respect to 𝑝(𝑥;𝜃). Then the variance of any unbiased estimator 𝛉 satisfies
E log | 0,p x
θ
θ
1ˆCov θ I θ
2log | log |
E Elog |
iji j ji
p x p x p x
θ θθI
Fisher information matrix:
Width estimation: optimal integration of vision and haptics.
22
HV2 2 2 2
V V
2 2 2
V
VH
H H
H
1 1 1
ˆ x x
Ernst & Banks (2002) Nature
H V
2 2
V
V
V H
2
2
2
H
H
2
, | | |
exp2 2
exp2
ˆ
xp x p x p x
x x
ML estimation of object width from vision and haptics:
x1: visually perceived width,x2: haptically perceived width
σ1: visual uncertaintyσ2 : haptic uncertainty
Width estimation: optimal integration of vision and haptics.
Ernst & Banks (2002) Nature
Density functions (sensory uncertainty)
Distribution functions (human response)
By examining human forced choice, the sensory density function can be estimated.
Width estimation: optimal integration of vision and haptics.
Ernst & Banks (2002) Nature
mean (xV) and variance (σV2) of vision
mean (xH) and variance (σH2) of haptics
mean (x) and variance (σV) of multi-modal integration
Human performance in visual-haptic discrimination (RIGHT) is predicted by maximum-likelihood integration of those in visual and haptic discrimination measured separately (LEFT).
Width estimation: optimal integration of vision and haptics.
Ernst & Banks (2002) Nature
22
H1
V
H
2 2 2 2
V V H
Hx x x
V
2 2 2
H
1 1 1
Bayes’ theorem: mapping observation to probability density.
sample space Χ parameter space Θ
x
|P x
||
P x PP x
P x
| :
| :
:
:
P x
P x
P
P x
Posterior probability
Likelihood
Prior probability
Marginal probability
Bayesian estimation: Maximum A Posteriori (MAP).
sample space Χ parameter space Θ
ˆ xx
ˆ arg max
arg max
arg m
|
ax
|
|
P x
P x
P
P
P
x
P x
x
Bayesian estimation: Mean minimum-squared error estimator.
sample space Χ parameter space Θ
MMSEˆ x
x
2
MMSEˆ
2
ˆ
ˆ arg min ˆE
arg m ˆin
x x
p x d
MMSEˆ x p d Ex x
×: MAP, ×: MMSE
Motion illusion as optimal percepts.
http://www.cs.huji.ac.il/~yweiss/Rhombus/rhombus.html
Motion illusion as optimal percepts.
Weiss et al. (2002) Nature Neurosci
2
2
, , ,1, | , exp
2
, , ,, y
i i i i i i
i i x y x
I x t I xy y yy
t I x tP I x
x y tt v v v v
2
2 2
2
1
2, expx y x y
v
P vvv v
2
22 2
2 2
, , ,1 1, | , ex
,, p
,
2 2
,y
i i i i i i
x y i i
v i
x y x
y y yy v
x
I x t I x t I x tP v v I x t v
y tv v
Prior density: most of objects have small velocity (not moving).
Likelihood: Probability of observed image with given velocity (vx, vy).
, , , ,x yt y t t t yI x v v I tx Lucas-Kanade image model:
Posterior density
Motion illusion as optimal percepts.
Weiss et al. (2002) Nature Neurosci
Causal inference of sensory information.
Kording et al. (2007) PLoS One; Kayer & Shams (2015) PLoS Biol
Causal inference of sensory information.
Kording et al. (2007) PLoS One; Kayer & Shams (2015) PLoS Biol
Given visual (xV) and auditory (xA) observations…
Q1: Do they belong to a single object (C=1) or come from two different objects (C=2)?
Q2: What are the most likely positions for the visual and the auditory objects?
Ax
Vx
Ax
Vx
Ax
Vx
Causal inference of sensory information.
Kording et al. (2007) PLoS One; Kayer & Shams (2015) PLoS Biol
A V
A V
A V A V
,,
, ,
| 1 11|
| 1 1 | 2 2
xx
x
P x C P CP C x
P x C P C P x C P Cx
A V
A V
A V A V
,,
, ,
| 2 22 |
| 1 1 | 2 2
xx
x
P x C P CP C x
P x C P C P x C P Cx
Bayes’ theorem:
Causal inference of sensory information.
Kording et al. (2007) PLoS One; Kayer & Shams (2015) PLoS Biol
A A, 1 A V A, 2 A Vˆ ˆ ˆ1| , 2 | ,C Cs s P C x x s P C x x
V, 1 A V V, 2V A Vˆ ˆ ˆ1| , 2 | ,C Cs s P C x x s P C x x
Model averaging.:
Prediction, filtering and smoothing of a dynamic system.
Prediction
Filtering
Smoothing
0 k T
0 01: 1, ,k kz z z
1: 1, ,k kz z z
1: 1, ,T Tz z z
01:k kp x z
1:k kp x z
1:k Tp x z
Estimation of dynamic systems: Kalman filter.
1k k k
k k k
Ax
Cx
w
z v
x
Stochastic state-space model
Problem: Given a sequence of measurements up to step k,
then estimate the conditional mean
and the conditional covariance of the state vector at step k,
1: 1 2 ,, , , kk z z z z
| 1:ˆ ,k k k kE x x z
| 1 1:
T
: : :ˆ ˆCov E ,k k k k k k k k k k k
x z x x x x z
Kalman filter optimally integrates prediction and measurement.
Prediction
| 1
| 1
1| 1
T
1| 1
ˆ ˆ
k k
k
k
k
k
k k
x Ax
A A Q
Filtering
| |
| 1
1
| | 1ˆ ˆ ˆ
k k k k k k k k
k k k kk
x x K z Cx
I K C
1
T T
| 1 | 1k k k k k
K C C C RKalman gain
1| 1
1| 1
ˆk k
k k
x | 1
| 1
ˆk k
k k
x |
|
ˆk k
k k
xPrediction Filtering
kz
Kalman filter optimally integrates prediction and measurement.
1| 1
1| 1
ˆk k
k k
x | 1
| 1
ˆk k
k k
x |
|
ˆk k
k k
xPrediction Filtering
1: 1k kp x z
k kp z x
1:k kp x z
1: 1k kp x z k kp z x 1:k kp x z
prior
likelihood
posterior
Kalman filter: Derivation.
1| 1
1| 1
ˆk k
k k
x | 1
| 1
ˆk k
k k
x |
|
ˆk k
k k
xPrediction Filtering
kz
1: 1 1 1 1
1 1 1 1| 1 1| 1
| 1
1
|
: 1
1
ˆ; , ; ,
ˆ; ,
kk k k k kk
k k k k k k k k
k k k k k
p d p p
d
x z x x x zx
x x Ax Q x x
x x
1 1| 1 1| 1ˆ; ,k k k k k x x | 1 | 1; ˆ ,k k k k k x x | |; ,ˆ
k k k k kx x
Prediction
Filtering
1: 1
1: | |
1: 1
ˆ; ,k k k k
k k k k k k k
k k
p pp
p
z x x z
x z x xz z
Kalman filter: Derivation.
1| 1
1| 1
ˆk k
k k
x | 1
| 1
ˆk k
k k
x |
|
ˆk k
k k
xPrediction Filtering
kz
, ,
y y
x x xy
yx
x
y
μ
y μ
x
1 1, y yxx yy xx yyxy xyμ μx y y
1 1, x yy xyy xx xxyx yxμ μy x x
Lemma: When a joint density function of two variables x and y are given as
then conditional probability densities are given as follows:
Kalman filter explains smooth eye pursuit movements.
Burge (2007) J Vision
| | 1 | 1ˆ ˆ ˆ
k k k k k k k k x x K z Cx
noisy sensory feedback:Kalman filter predicts sloweradaptation.
accurate sensory feedback:Kalman filter predicts fasteradaptation.
Kalman filter explains smooth eye pursuit movements.
Orban de Xivry et al. (2013) J Neurosci
Kalman filter #1 for visual processing of retinal slip.Kalman filter #2 for remembered target dynamics.
Kalman filter explains smooth eye pursuit movements.
Sinusoidal target motion Anticipatory smooth pursuit
Orban de Xivry et al. (2013) J Neurosci
Smoothing of Dynamic systems: Kalman smoother.Smoothing consists of forward and backward paths.
Filtering
Smoothing
0 k T
1: 1, ,k kz z z
1: 1, ,T Tz z z
1:k kp x z
1:k Tp x z
0 k TForward path 1: 1, ,k kz z z
backward path 1: 1, ,k T k T z z z
Kalman smoother algorithm: RTS smoother.
Forward path: Using the standard algorithm of Kalman filter, compute the predicted mean and covariance,
and the filtered mean and covariance,
1| 1| , 1,ˆ , 0,k k k k k T x
| |ˆ , .0 ,,k k k k Tk x
Backward path: Then, the smoothed mean and covariance,
are solved backward in time as follows:
| |ˆ , .0 ,,k T k T Tk x
| | 1| 1|
| | 1| 1|
ˆ ˆ ˆ ˆk T k k k k T k k
T
k T k k k k T k k k
x x G x x
G G
1
| 1|
T
k k k k k
G Awhere
Flash-lag effect explained by Kalman smoother.
http://visionlab.harvard.edu/Members/Alumni/David/flash-lag.htm
Psychophysics Extrapolation model Latency difference model Optimal smoothing model
Tim
e
Nijhawan (1994) Whitney & Murakami (1998) Rao et al. (2001)
Rao et al. (2001) Neural Comput
Attractor neural networks for ML estimation.
1i ij j
j
u t w o t
Deneve et al. (1999) Nature Neurosci
0i io a
lim it
o t
Dynamics of recurrent neural network
2
2
11
1 1
i
j
i
j
u to t
u t
Local averaging
Divisive normalization (“winner-takes-all”)
Attractor neural networks for ML estimation.
Deneve et al. (1999) Nature Neurosci
Neural network algorithm for maximum likelihood.
Poissoni in f s
|!
ii s
i
i
nf
ifs
e sP r
n
1 1
|!
|
iinsN fN
i
i
i
ii
fs n
e sP P
ns
n
1
1 11
log
log
|
log
|
!
N
i
N N
i i
N
i ii i
i
i
n
sP
n sP
f nfs s
n
Jazayeri & Movshon (2006) Nature Neurosci
Probabilistic neural responses to stimulus s:
Likelihood function:
Log likelihood function:
Neural network algorithm for maximum likelihood.
Jazayeri & Movshon (2006) Nature Neurosci
1
1 1 1
log log
log
|
!
|N
i
N N N
i i i
i
i i i i
P P
f s r
r
r
s
f
s
s
r
Maximize 1
logi i
N
i
r f s
1
constantN
i
if s
under constraint
Neural implementation of multimodal integration.
Ma et al. (2006) Nature Neurosci
2
21 1 0
log | log const. const.2
N Ni
i i i
i i
nP s n f s s s
n
2| ; ,s sP n2
21 0
1 1
,
N
i ii
N N
i ii i
n
n n
s
If tuning function is Gaussian with mean si and variance σ02,
then, the likelihood function is also Gaussian:
where
2
2
0 2200
1, exp ,
22
i
i if sss s
Population activity Likelihood Population activity Likelihood
Neural implementation of multimodal integration.
Ma et al. (2006) Nature Neurosci
2
3
3 1 2
1 2
2 2
1 1 2
2
3
| |
, ,
| |
,
s s
s s
P P
P P
n n n
n n
2 2
23 12 2 2 2
1
1
1
2 2 2
1
2
2 2
3 2
1 1 1
x x
Summary
• Inference from noisy measurements is formulated as a statistical inference problem.
• Statistical inference is categorized into classical point estimation and Bayesian probability estimation.
• Human performance in psychophysical experiments matches the predictions of optimal estimation.
• The computation of optimal inference can be implemented by a few neural mechanisms.
References
• Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429-433.
• Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature neuroscience, 5(6), 598-604.
• Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244-247.
• Kording, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception.
• Kayser, C., & Shams, L. (2015). Multisensory causal inference in the brain. PLoS biology, 13(2).
• Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 20.
• de Xivry, J. J. O., Coppe, S., Blohm, G., & Lefevre, P. (2013). Kalman filtering naturally accounts for visually guided and predictive smooth pursuit dynamics. The Journal of Neuroscience, 33(44), 17301-17313.
• Rao, R. P., Eagleman, D. M., & Sejnowski, T. J. (2001). Optimal smoothing in visual motion perception. Neural Computation, 13(6), 1243-1253.
• Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: a neural implementation of ideal observers. Nature neuroscience, 2(8), 740-745.
• Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature neuroscience, 9(5), 690-696.
• Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature neuroscience, 9(11), 1432-1438.
Exercise
• Write a Matlab code for Kalman filter and Kalman smoothing, and examine the difference.
• Psychophysics: Smooth eye pursuit experiment using GazeParser (http://gazeparser.sourceforge.net/).