computational motor control: optimal estimation in noisy world (jaist summer course)

Computational Motor Control Summer School04: Optimal estimation

in noisy world.

Hirokazu Tanaka

School of Information Science

Japan Institute of Science and Technology

In this lecture, we will learn:

• Maximum-likelihood (ML) estimation

• Cramer-Rao’s lower bound

• Bayesian estimation

• Wiener filter

• Kalman filter

• Kalman smoother

• Causal inference

• Information integration

• Postdiction

Classical estimation (point) and Bayesian estimation (density).

sample space Χ parameter space Θ


ˆ xx

x

|P x

Maximum-likelihood (ML) estimation.


ˆ xx

MLˆ arg max |

arg max log |

x P x

P x

ML is …• Asymptotically unbiased (i.e., approaches to a true value).• Asymptotically efficient (i.e., achieves minimum variance, CRLB).

z

Maximum-likelihood (ML) estimation.


1ˆ x

1x

2x

2ˆ x

3ˆ x

3x 0

Kullback-Leibler divergence: a distance between two density functions.

KL ; log 0i

i

i

i

pD q p p

q

KL ; log 0

p xD q p dxp x

q x

KL divergence for discrete variable KL divergence for continuous variable

ML as a minimization of KL divergence.

2T0 0

0

1T

T0

1E log | log loˆ g |ˆ ˆ ˆ ˆ|

ˆ ˆ ˆ

N

i

i

q x q x E q xN

I

2

T T

log | log |E E

ˆ ˆ

l ˆog |ˆq x q x

I q x

0 11ˆ ˆ, IN

Sampling approximation:

ML is …• Asymptotically unbiased (i.e., approaches to a true value).• Asymptotically efficient (i.e., achieves minimum variance, CRLB).

Fisher information:

In the limit of large samples (infinite N), the ML estimator is unbiased and efficient.

Cramer-Rao lower bound: scalar parameter case.

Suppose a random variable 𝑋~𝑝(𝑥; 𝜃), where 𝜃 is fixed but unknown. Assume that 𝑝(𝑥;𝜃) satisfies the “regularity” condition:

where the expectation is with respect to 𝑝(𝑥;𝜃). Then the variance of any unbiased estimator 𝜃 satisfies

E log | 0,p x

1ˆVar

I

2 2

2

logE

|E

| logp x p xI

Fisher information:

Cramer-Rao lower bound: vector case.

Suppose a random variable 𝑋~𝑝(𝑥|𝛉), where θ is fixed but unknown. Assume that 𝑝(𝑥|θ) satisfies the “regularity” condition:

where the expectation is with respect to 𝑝(𝑥;𝜃). Then the variance of any unbiased estimator 𝛉 satisfies

E log | 0,p x

θ

θ

1ˆCov θ I θ

2log | log |

E Elog |

iji j ji

p x p x p x

θ θθI

Fisher information matrix:

Width estimation: optimal integration of vision and haptics.

22

HV2 2 2 2

V V

2 2 2

V

VH

H H

H

1 1 1

ˆ x x

Ernst & Banks (2002) Nature

H V

2 2

V

V

V H

2

2

2

H

H

2

, | | |

exp2 2

exp2

ˆ

xp x p x p x

x x

ML estimation of object width from vision and haptics:

x1: visually perceived width,x2: haptically perceived width

σ1: visual uncertaintyσ2 : haptic uncertainty



Density functions (sensory uncertainty)

Distribution functions (human response)

By examining human forced choice, the sensory density function can be estimated.



mean (xV) and variance (σV2) of vision

mean (xH) and variance (σH2) of haptics

mean (x) and variance (σV) of multi-modal integration

Human performance in visual-haptic discrimination (RIGHT) is predicted by maximum-likelihood integration of those in visual and haptic discrimination measured separately (LEFT).



22

H1

V

H

2 2 2 2

V V H

Hx x x

V

2 2 2

H

1 1 1

Bayes’ theorem: mapping observation to probability density.


x

|P x

||

P x PP x

P x

| :

| :

:

:

P x

P x

P

P x

Posterior probability

Likelihood

Prior probability

Marginal probability

Bayesian estimation: Maximum A Posteriori (MAP).


ˆ xx

ˆ arg max

arg max

arg m

|

ax

|

|

P x

P x

P

P

P

x

P x

x

Bayesian estimation: Mean minimum-squared error estimator.


MMSEˆ x

x

2

MMSEˆ

2

ˆ

ˆ arg min ˆE

arg m ˆin

x x

p x d

MMSEˆ x p d Ex x

×: MAP, ×: MMSE

Motion illusion as optimal percepts.

http://www.cs.huji.ac.il/~yweiss/Rhombus/rhombus.html


Weiss et al. (2002) Nature Neurosci

2

2

, , ,1, | , exp

2

, , ,, y

i i i i i i

i i x y x

I x t I xy y yy

t I x tP I x

x y tt v v v v

2

2 2

2

1

2, expx y x y

v

P vvv v

2

22 2

2 2

, , ,1 1, | , ex

,, p

,

2 2

,y

i i i i i i

x y i i

v i

x y x

y y yy v

x

I x t I x t I x tP v v I x t v

y tv v

Prior density: most of objects have small velocity (not moving).

Likelihood: Probability of observed image with given velocity (vx, vy).

, , , ,x yt y t t t yI x v v I tx Lucas-Kanade image model:

Posterior density


Weiss et al. (2002) Nature Neurosci

Causal inference of sensory information.

Kording et al. (2007) PLoS One; Kayer & Shams (2015) PLoS Biol



Given visual (xV) and auditory (xA) observations…

Q1: Do they belong to a single object (C=1) or come from two different objects (C=2)?

Q2: What are the most likely positions for the visual and the auditory objects?

Ax

Vx

Ax

Vx

Ax

Vx



A V

A V

A V A V

,,

, ,

| 1 11|

| 1 1 | 2 2

xx

x

P x C P CP C x

P x C P C P x C P Cx

A V

A V

A V A V

,,

, ,

| 2 22 |

| 1 1 | 2 2

xx

x

P x C P CP C x

P x C P C P x C P Cx

Bayes’ theorem:



A A, 1 A V A, 2 A Vˆ ˆ ˆ1| , 2 | ,C Cs s P C x x s P C x x

V, 1 A V V, 2V A Vˆ ˆ ˆ1| , 2 | ,C Cs s P C x x s P C x x

Model averaging.:

Prediction, filtering and smoothing of a dynamic system.

Prediction

Filtering

Smoothing

0 k T

0 01: 1, ,k kz z z

1: 1, ,k kz z z

1: 1, ,T Tz z z

01:k kp x z

1:k kp x z

1:k Tp x z

Estimation of dynamic systems: Kalman filter.

1k k k

k k k

Ax

Cx

w

z v

x

Stochastic state-space model

Problem: Given a sequence of measurements up to step k,

then estimate the conditional mean

and the conditional covariance of the state vector at step k,

1: 1 2 ,, , , kk z z z z

| 1:ˆ ,k k k kE x x z

| 1 1:

T

: : :ˆ ˆCov E ,k k k k k k k k k k k

x z x x x x z

Kalman filter optimally integrates prediction and measurement.

Prediction

| 1

| 1

1| 1

T

1| 1

ˆ ˆ

k k

k

k

k

k

k k

x Ax

A A Q

Filtering

| |

| 1

1

| | 1ˆ ˆ ˆ

k k k k k k k k

k k k kk

x x K z Cx

I K C

1

T T

| 1 | 1k k k k k

K C C C RKalman gain

1| 1

1| 1

ˆk k

k k

x | 1

| 1

ˆk k

k k

x |

|

ˆk k

k k

xPrediction Filtering

kz

Kalman filter: Derivation.

1| 1

1| 1

ˆk k

k k

x | 1

| 1

ˆk k

k k

x |

|

ˆk k

k k


kz

1: 1 1 1 1

1 1 1 1| 1 1| 1

| 1

1

|

: 1

1

ˆ; , ; ,

ˆ; ,

kk k k k kk

k k k k k k k k

k k k k k

p d p p

d

x z x x x zx

x x Ax Q x x

x x

1 1| 1 1| 1ˆ; ,k k k k k x x | 1 | 1; ˆ ,k k k k k x x | |; ,ˆ

k k k k kx x

Prediction

Filtering

1: 1

1: | |

1: 1

ˆ; ,k k k k

k k k k k k k

k k

p pp

p

z x x z

x z x xz z

Kalman filter: Derivation.

1| 1

1| 1

ˆk k

k k

x | 1

| 1

ˆk k

k k

x |

|

ˆk k

k k


kz

, ,

y y

x x xy

yx

x

y

μ

y μ

x

1 1, y yxx yy xx yyxy xyμ μx y y

1 1, x yy xyy xx xxyx yxμ μy x x

Lemma: When a joint density function of two variables x and y are given as

then conditional probability densities are given as follows:

Kalman filter explains smooth eye pursuit movements.

Burge (2007) J Vision

| | 1 | 1ˆ ˆ ˆ

k k k k k k k k x x K z Cx

noisy sensory feedback:Kalman filter predicts sloweradaptation.

accurate sensory feedback:Kalman filter predicts fasteradaptation.


Orban de Xivry et al. (2013) J Neurosci

Kalman filter #1 for visual processing of retinal slip.Kalman filter #2 for remembered target dynamics.


Sinusoidal target motion Anticipatory smooth pursuit

Orban de Xivry et al. (2013) J Neurosci

Smoothing of Dynamic systems: Kalman smoother.Smoothing consists of forward and backward paths.

Filtering

Smoothing

0 k T

1: 1, ,k kz z z

1: 1, ,T Tz z z

1:k kp x z

1:k Tp x z

0 k TForward path 1: 1, ,k kz z z

backward path 1: 1, ,k T k T z z z

Kalman smoother algorithm: RTS smoother.

Forward path: Using the standard algorithm of Kalman filter, compute the predicted mean and covariance,

and the filtered mean and covariance,

1| 1| , 1,ˆ , 0,k k k k k T x

| |ˆ , .0 ,,k k k k Tk x

Backward path: Then, the smoothed mean and covariance,

are solved backward in time as follows:

| |ˆ , .0 ,,k T k T Tk x

| | 1| 1|

| | 1| 1|

ˆ ˆ ˆ ˆk T k k k k T k k

T

k T k k k k T k k k

x x G x x

G G

1

| 1|

T

k k k k k

G Awhere

Flash-lag effect explained by Kalman smoother.

http://visionlab.harvard.edu/Members/Alumni/David/flash-lag.htm

Psychophysics Extrapolation model Latency difference model Optimal smoothing model

Tim

e

Nijhawan (1994) Whitney & Murakami (1998) Rao et al. (2001)

Rao et al. (2001) Neural Comput

Attractor neural networks for ML estimation.

1i ij j

j

u t w o t

Deneve et al. (1999) Nature Neurosci

0i io a

lim it

o t

Dynamics of recurrent neural network

2

2

11

1 1

i

j

i

j

u to t

u t

Local averaging

Divisive normalization (“winner-takes-all”)

Attractor neural networks for ML estimation.

Deneve et al. (1999) Nature Neurosci

Neural network algorithm for maximum likelihood.

Poissoni in f s

|!

ii s

i

i

nf

ifs

e sP r

n

1 1

|!

|

iinsN fN

i

i

i

ii

fs n

e sP P

ns

n

1

1 11

log

log

|

log

|

!

N

i

N N

i i

N

i ii i

i

i

n

sP

n sP

f nfs s

n

Jazayeri & Movshon (2006) Nature Neurosci

Probabilistic neural responses to stimulus s:

Likelihood function:

Log likelihood function:

Neural network algorithm for maximum likelihood.

Jazayeri & Movshon (2006) Nature Neurosci

1

1 1 1

log log

log

|

!

|N

i

N N N

i i i

i

i i i i

P P

f s r

r

r

s

f

s

s

r

Maximize 1

logi i

N

i

r f s

1

constantN

i

if s

under constraint

Neural implementation of multimodal integration.

Ma et al. (2006) Nature Neurosci

2

21 1 0

log | log const. const.2

N Ni

i i i

i i

nP s n f s s s

n

2| ; ,s sP n2

21 0

1 1

,

N

i ii

N N

i ii i

n

n n

s

If tuning function is Gaussian with mean si and variance σ02,

then, the likelihood function is also Gaussian:

where

2

2

0 2200

1, exp ,

22

i

i if sss s

Population activity Likelihood Population activity Likelihood

Neural implementation of multimodal integration.

Ma et al. (2006) Nature Neurosci

2

3

3 1 2

1 2

2 2

1 1 2

2

3

| |

, ,

| |

,

s s

s s

P P

P P

n n n

n n

2 2

23 12 2 2 2

1

1

1

2 2 2

1

2

2 2

3 2

1 1 1

x x

Summary

• Inference from noisy measurements is formulated as a statistical inference problem.

• Statistical inference is categorized into classical point estimation and Bayesian probability estimation.

• Human performance in psychophysical experiments matches the predictions of optimal estimation.

• The computation of optimal inference can be implemented by a few neural mechanisms.

References

• Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429-433.

• Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature neuroscience, 5(6), 598-604.

• Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244-247.

• Kording, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception.

• Kayser, C., & Shams, L. (2015). Multisensory causal inference in the brain. PLoS biology, 13(2).

• Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 20.

• de Xivry, J. J. O., Coppe, S., Blohm, G., & Lefevre, P. (2013). Kalman filtering naturally accounts for visually guided and predictive smooth pursuit dynamics. The Journal of Neuroscience, 33(44), 17301-17313.

• Rao, R. P., Eagleman, D. M., & Sejnowski, T. J. (2001). Optimal smoothing in visual motion perception. Neural Computation, 13(6), 1243-1253.

• Deneve, S., Latham, P. E., & Pouget, A. (1999). Reading population codes: a neural implementation of ideal observers. Nature neuroscience, 2(8), 740-745.

• Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature neuroscience, 9(5), 690-696.

• Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature neuroscience, 9(11), 1432-1438.

Exercise

• Write a Matlab code for Kalman filter and Kalman smoothing, and examine the difference.

• Psychophysics: Smooth eye pursuit experiment using GazeParser (http://gazeparser.sourceforge.net/).