Computer Vision: Models, Learning and Inference –
Tracking
Oren Freifeld and Ron Shapira-Weber
Computer Science, Ben-Gurion University
June 3, 2019
www.cs.bgu.ac.il/~cv192/
1 MMSE and Conditional Expectations
    The Gaussian Case
2 Non-visual Tracking
3 Visual Tracking
MMSE and Conditional Expectations
Reminder
Let $A$ be an $n \times n$ matrix.
$A$ is called Positive Definite (PD) if $x^T A x > 0$ for all non-zero $x \in \mathbb{R}^n$.
$A$ is called Positive Semidefinite (PSD) if $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$.
$A$ is called symmetric if $A = A^T$.
$A$ is called SPD if it is both symmetric and PD.
$A$ is called SPSD if it is both symmetric and PSD.
MMSE and Conditional Expectations
A ≺ B
We say that a matrix $A$ is "smaller" than $B$, and write
$$A \prec B$$
if $B - A$ is positive definite. Similarly, we write
$$A \preceq B$$
if $B - A$ is positive semidefinite.
In particular, if $\Sigma_1$ and $\Sigma_2$ are two covariance matrices such that $\Sigma_1 \prec \Sigma_2$, then the RV associated with $\Sigma_2$ has "more variance" than the RV associated with $\Sigma_1$.
MMSE and Conditional Expectations
Example
$$\Sigma_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \Sigma_2 = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix} \tag{1}$$
$$\Rightarrow \Sigma_2 - \Sigma_1 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} \tag{2}$$
is SPD, so $\Sigma_1 \prec \Sigma_2$.
MMSE and Conditional Expectations
Example
$$\Sigma_1 = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \tag{3}$$
$$\Rightarrow \Sigma_2 - \Sigma_1 = \begin{bmatrix} -3 & 0 \\ 0 & 3 \end{bmatrix} \tag{4}$$
is not SPD (it is symmetric, but not PD). Similarly, $\Sigma_1 - \Sigma_2$ is also not SPD. Thus, we cannot "order" the two matrices.
MMSE and Conditional Expectations
Let $X$ and $Y$ be two random variables, and let $g : \mathbb{R} \to \mathbb{R}$ be some function.
Definition (MSE)
Let $\hat{x} = g(y)$ be an estimate of $x$. Then
$$\mathrm{E}\big((X - g(Y))^2\big) = \iint (x - g(y))^2\, p(x, y)\, dx\, dy$$
is called the Mean Square Error (MSE) of the estimator $g$.
MMSE and Conditional Expectations
Definition (MMSE)
The Minimal Mean Square Error (MMSE) is
$$\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big)$$
(the optimization is over the space of all $\mathbb{R} \to \mathbb{R}$ functions) and the estimator that achieves it is the MMSE estimator:
$$\hat{X}_{\mathrm{MMSE}} \triangleq \arg\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big).$$
MMSE and Conditional Expectations
Note that
$$\int \mathrm{E}\big((X - g(y))^2 \mid Y = y\big)\, p(y)\, dy = \int \left( \int (x - g(y))^2\, p(x \mid Y = y)\, dx \right) p(y)\, dy = \iint (x - g(y))^2\, p(x, y)\, dx\, dy = \mathrm{E}\big((X - g(Y))^2\big).$$
MMSE and Conditional Expectations
Suppose we estimate $x$ by some constant number $m$. Let
$$h(m) \triangleq \mathrm{E}\big((X - m)^2\big) = \int (x - m)^2\, p(x)\, dx = \mathrm{E}(X^2) - 2m\,\mathrm{E}(X) + m^2.$$
Then $h'(m) = -2\,\mathrm{E}(X) + 2m$, so $h'(m) = 0 \Rightarrow m = \mathrm{E}(X)$. Thus:
$$\arg\min_m \mathrm{E}\big((X - m)^2\big) = \mathrm{E}(X) = \int x\, p(x)\, dx$$
$$\min_m \mathrm{E}\big((X - m)^2\big) = \mathrm{var}(X) = \int (x - \mathrm{E}(X))^2\, p(x)\, dx$$
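As a quick numerical sanity check (not from the original slides; the gamma distribution and grid below are arbitrary choices), we can verify in NumPy that the minimizing constant is the mean and the attained minimum is the variance:

```python
import numpy as np

# Minimal numerical sanity check: the constant m minimizing E((X - m)^2)
# is E(X), and the attained minimum is var(X). The gamma distribution here
# is an arbitrary choice; any distribution with finite variance works.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)

ms = np.linspace(x.min(), x.max(), 2001)          # candidate constants m
# h(m) = E(X^2) - 2 m E(X) + m^2, estimated from the sample
h = np.mean(x**2) - 2 * ms * np.mean(x) + ms**2

print(ms[np.argmin(h)], x.mean())  # ~equal: argmin_m h(m) = E(X)
print(h.min(), x.var())            # ~equal: min_m h(m) = var(X)
```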
MMSE and Conditional Expectations
Let $y$ be a realization of $Y$. Repeating the analysis, where now we allow $m$ to depend on $y$, and replacing means with conditional means:
$$h(m; y) \triangleq \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \int (x - m)^2\, p(x \mid Y = y)\, dx = \mathrm{E}(X^2 \mid Y = y) - 2m\,\mathrm{E}(X \mid Y = y) + m^2.$$
Then $h'(m; y) = -2\,\mathrm{E}(X \mid Y = y) + 2m$, so $h'(m; y) = 0 \Rightarrow m = \mathrm{E}(X \mid Y = y)$. Thus:
$$\arg\min_m \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y) = \int x\, p(x \mid Y = y)\, dx$$
$$\min_m \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \mathrm{var}(X \mid Y = y) = \int (x - \mathrm{E}(X \mid Y = y))^2\, p(x \mid Y = y)\, dx$$
MMSE and Conditional Expectations
If you get the lowest grade in every exam, then you also have the lowestaverage grade among all the students.
MMSE and Conditional Expectations
Similarly, since, for every $y$,
$$\arg\min_{m(y)} \mathrm{E}\big((X - m(y))^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y),$$
it follows that
$$\arg\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big) = \mathrm{E}(X \mid Y).$$
That is, the conditional mean achieves the MMSE; in other words, the conditional mean is the optimal estimator in the sense of MSE.
MMSE and Conditional Expectations
To Summarize:
Suppose $X$ and $Y$ are two jointly-distributed scalar RVs. Having observed $Y = y$, we want to estimate $x$:
$$\hat{x}_{\mathrm{MMSE}} \triangleq \arg\min_{g(y) \in \mathbb{R}} \mathrm{E}\big((X - g(y))^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y)$$
$$\min_{g(y) \in \mathbb{R}} \mathrm{E}\big((X - g(y))^2 \mid Y = y\big) = \mathrm{var}(X \mid Y = y)$$
$$\mathrm{var}(X \mid Y = y) \triangleq \mathrm{E}\big((X - \mathrm{E}(X \mid Y = y))^2 \mid Y = y\big)$$
MMSE and Conditional Expectations
Fact
This generalizes to the vector case, $X \in \mathbb{R}^n$ (regardless of whether $y$ is a scalar or a vector), where
$$\min_{g(y) \in \mathbb{R}^n} \mathrm{E}\big((X - g(y))(X - g(y))^T \mid Y = y\big) = \mathrm{cov}(X \mid Y = y) = \mathrm{E}\big([X - \mathrm{E}(X \mid Y = y)][X - \mathrm{E}(X \mid Y = y)]^T \mid Y = y\big)$$
where the notion of the "smallest matrix" is defined in terms of $\preceq$.
MMSE and Conditional Expectations
Fact
$$\mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\mathrm{E}(X \mid Y)) = \mathrm{E}(X) \tag{5}$$
(by the law of iterated expectation). In other words, the estimation error, $X - \hat{X}_{\mathrm{MMSE}}$, has zero mean:
$$\mathrm{E}(X - \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(X) - \mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = 0.$$
MMSE and Conditional Expectations
Let $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$. Then $\mathrm{E}(\varepsilon \mid Y) = 0$.
Proof.
$$\mathrm{E}(\varepsilon \mid Y) = \mathrm{E}(X - \hat{X}_{\mathrm{MMSE}} \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(\hat{X}_{\mathrm{MMSE}} \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(\mathrm{E}(X \mid Y) \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(X \mid Y) = 0.$$
MMSE and Conditional Expectations
Let $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$. For any function $g(Y)$, we have
$$\mathrm{E}(\varepsilon\, g(Y)) = 0.$$
Proof.
$\mathrm{E}(\varepsilon\, g(Y) \mid Y) = g(Y)\, \mathrm{E}(\varepsilon \mid Y) = g(Y) \cdot 0 = 0$. Then, by the law of iterated expectation:
$$\mathrm{E}(\varepsilon\, g(Y)) = \mathrm{E}(\mathrm{E}(\varepsilon\, g(Y) \mid Y)) = 0.$$
MMSE and Conditional Expectations
The estimation error, $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$, and the estimator, $\hat{X}_{\mathrm{MMSE}}$, are uncorrelated.
Proof.
$$\mathrm{cov}(\varepsilon, \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) - \mathrm{E}(\varepsilon)\,\mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) - 0 \cdot \mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon\, g(Y)) = 0,$$
where we used the fact that $\hat{X}_{\mathrm{MMSE}} = g(Y)$; i.e., it is a function of $Y$.
MMSE and Conditional Expectations
Since $\mathrm{cov}(\varepsilon, \hat{X}_{\mathrm{MMSE}}) = 0$, it follows that
$$\mathrm{var}(X) = \mathrm{var}(\hat{X}_{\mathrm{MMSE}}) + \mathrm{var}(\varepsilon) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}\big((X - \mathrm{E}(X \mid Y))^2\big) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}_Y\big(\mathrm{E}_{X|Y}[(X - \mathrm{E}(X \mid Y))^2 \mid Y = y]\big) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}(\mathrm{var}(X \mid Y))$$
(AKA the law of total variance).
Also, starting from the first line:
$$\mathrm{E}(X^2) - (\mathrm{E}(X))^2 = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) - (\mathrm{E}(\hat{X}_{\mathrm{MMSE}}))^2 + \mathrm{E}(\varepsilon^2) - (\mathrm{E}(\varepsilon))^2$$
$$\mathrm{E}(X^2) - (\mathrm{E}(X))^2 = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) - (\mathrm{E}(X))^2 + \mathrm{E}(\varepsilon^2)$$
$$\mathrm{E}(X^2) = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) + \mathrm{E}(\varepsilon^2)$$
This is just Pythagoras' theorem: note that $(X, Y) \mapsto \mathrm{E}(XY)$ is an inner product, so $X \mapsto \sqrt{\mathrm{E}(X^2)}$ is the corresponding induced norm.
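A short Monte Carlo check of the law of total variance and the "Pythagoras" identity (a sketch, not from the slides; the joint distribution below is an arbitrary choice with known $\mathrm{E}(X \mid Y)$ and $\mathrm{var}(X \mid Y)$):

```python
import numpy as np

# Monte Carlo check of the law of total variance. Take Y ~ N(0,1) and
# X | Y = y ~ N(y^2, 1), so E(X|Y) = Y^2 and var(X|Y) = 1 (an arbitrary
# illustrative choice).
rng = np.random.default_rng(1)
y = rng.standard_normal(1_000_000)
x = y**2 + rng.standard_normal(y.shape)

x_hat = y**2        # the MMSE estimator E(X|Y)
eps = x - x_hat     # estimation error

print(x.var(), x_hat.var() + eps.var())                     # law of total variance
print(np.mean(x**2), np.mean(x_hat**2) + np.mean(eps**2))   # "Pythagoras"
print(np.mean(eps * x_hat))                                 # ~0: uncorrelated
```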
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
Earlier we mentioned that Gaussians are closed under conditioning.
Fact (expressions for the conditional mean and conditional covariance)
If $X$ and $Y$ are jointly Gaussian, then $\mathrm{E}(X \mid Y = y)$ is a "linear" (affine, really) function of the measurement, $y$. Particularly:
$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_Y \end{bmatrix} \right) \;\Rightarrow\; X \mid Y = y \sim \mathcal{N}\big(\mu_{X|y}, \Sigma_{X|y}\big)$$
where
$$\mu_{X|y} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (y - \mu_Y)$$
and
$$\Sigma_{X|y} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^T.$$
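A minimal NumPy sketch of these formulas (the block sizes and values below are arbitrary illustrative choices):

```python
import numpy as np

# Sketch: conditional mean and covariance for jointly Gaussian (X, Y),
# directly from the formulas above. Values are arbitrary.
mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0])
S_x  = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
S_xy = np.array([[0.5],
                 [0.2]])
S_y  = np.array([[1.5]])

def gaussian_condition(y):
    """Return (mu_{X|y}, Sigma_{X|y}) for an observed y."""
    K = S_xy @ np.linalg.inv(S_y)      # Sigma_XY Sigma_Y^{-1}
    mu_cond = mu_x + K @ (y - mu_y)    # affine in y
    S_cond = S_x - K @ S_xy.T          # does not depend on y
    return mu_cond, S_cond

mu_c, S_c = gaussian_condition(np.array([2.7]))
print(mu_c)  # conditional mean
print(S_c)   # conditional covariance (the same for every y)
```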
MMSE and Conditional Expectations The Gaussian Case
$\Rightarrow$ In the Gaussian case, $\hat{x}_{\mathrm{MMSE}}$ is a "linear" function of $y$:
$$\mu_{X|y} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (y - \mu_Y) = \Sigma_{XY} \Sigma_Y^{-1} y + (\mu_X - \Sigma_{XY} \Sigma_Y^{-1} \mu_Y)$$
and $\Sigma_{X|y}$ does not depend on $y$:
$$\Sigma_{X|y} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^T.$$
Both these properties hold for Gaussians, but not in general.
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
Fact
The expression for $\Sigma_{X|y}$ is equivalent to taking
$$\begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_Y \end{bmatrix}^{-1},$$
dropping the rows and columns that correspond to $y$, and inverting the remaining block back.
In NumPy, an easy way to drop rows/columns is np.delete.
MMSE and Conditional Expectations The Gaussian Case
Example
Let $x = \begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}^T$ be a Gaussian RV with a precision matrix $Q$. Then the covariance of $\begin{bmatrix} X_1 & X_3 \end{bmatrix}^T$ conditioned on $X_2 = x_2$ is
$$\begin{bmatrix} Q_{11} & Q_{13} \\ Q_{31} & Q_{33} \end{bmatrix}^{-1}.$$
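A NumPy sketch verifying this example (using np.delete, as suggested above) on an arbitrary SPD covariance, by comparing the precision-matrix route against the Schur-complement formula for $\Sigma_{X|y}$:

```python
import numpy as np

# Verify: for a Gaussian with covariance S and precision Q = S^{-1},
# the covariance of (X1, X3) given X2 = x2 is obtained by deleting the
# row/column of X2 (index 1 in 0-based NumPy) from Q and inverting back.
# The covariance below is an arbitrary SPD example.
S = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.5, 0.4],
              [0.3, 0.4, 1.2]])
Q = np.linalg.inv(S)

# Route 1: drop X2's row/column from the precision matrix, invert back.
Q_sub = np.delete(np.delete(Q, 1, axis=0), 1, axis=1)
cov_a = np.linalg.inv(Q_sub)

# Route 2: the Schur-complement formula Sigma_X - Sigma_XY Sigma_Y^{-1} Sigma_XY^T.
idx = [0, 2]
S_x, S_xy, S_y = S[np.ix_(idx, idx)], S[np.ix_(idx, [1])], S[1:2, 1:2]
cov_b = S_x - S_xy @ np.linalg.inv(S_y) @ S_xy.T

print(np.allclose(cov_a, cov_b))  # True
```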
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
$\Sigma_{X|y} \preceq \Sigma_X$ (conditioning does not increase uncertainty). For Gaussians this holds for every $y$; for general distributions it holds on average, $\mathrm{E}(\mathrm{cov}(X \mid Y)) \preceq \mathrm{cov}(X)$ (by the law of total variance), although $\mathrm{cov}(X \mid Y = y)$ may exceed $\mathrm{cov}(X)$ for particular values of $y$.
Non-visual Tracking
Classical (Non-visual) Tracking
A continuous-state, discrete-time setting.
Time: $1, 2, \ldots, t$
Hidden state at time $t$: $x_t$
Hidden states up to time $t$: $x_{1:t} = [x_1, x_2, \ldots, x_t]$
Measurement at time $t$: $y_t$
Measurements up to time $t$: $y_{1:t} = [y_1, y_2, \ldots, y_t]$
Non-visual Tracking
Classical (Non-visual) Tracking
The simplest case in a continuous-state, discrete-time setting.
Linear dynamics with iid additive Gaussian noise:
$$x_t = A x_{t-1} + \eta_x, \quad \eta_x \sim \mathcal{N}(0, \Sigma_{\eta_x})$$
(AKA a first-order Auto-Regressive (AR) model), where the matrix $A$ is known, deterministic, and does not depend on $t$.
Linear observation model with iid additive Gaussian noise:
$$y_t = B x_t + \eta_y, \quad \eta_y \sim \mathcal{N}(0, \Sigma_{\eta_y})$$
where the matrix $B$ is known, deterministic, and does not depend on $t$.
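A minimal NumPy sketch simulating this model (the specific $A$, $B$, and noise covariances below are arbitrary illustrative choices; here, near-constant-velocity dynamics with position-only measurements):

```python
import numpy as np

# Simulate the linear-Gaussian state-space model above.
rng = np.random.default_rng(0)
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])    # state: [position, velocity]
B = np.array([[1.0, 0.0]])    # we observe position only
S_eta_x = 0.01 * np.eye(2)    # dynamics noise covariance
S_eta_y = 0.25 * np.eye(1)    # observation noise covariance

T = 100
x = np.zeros(2)
xs, ys = [], []
for _ in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), S_eta_x)  # x_t = A x_{t-1} + eta_x
    y = B @ x + rng.multivariate_normal(np.zeros(1), S_eta_y)  # y_t = B x_t + eta_y
    xs.append(x); ys.append(y)
xs, ys = np.array(xs), np.array(ys)
```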
Non-visual Tracking
Classical (Non-visual) Tracking
Gaussian RVs are closed under affine transformations, marginalization, and conditioning $\Rightarrow$ everything here is Gaussian, e.g.:
$p(x_{1:t}, y_{1:t})$
$p(x_{1:t})$
$p(y_{1:t})$
$p(x_{1:t} \mid y_{1:t})$
$p(x_t \mid y_{1:t})$
$p(x_t \mid x_{1:(t-1)})$
$p(x_t \mid x_{1:(t-1)}, y_{1:t})$
$p(x_t \mid x_{1:(t-1)}, y_{1:(t-1)})$
$p(x_t \mid y_{1:(t-1)})$
Moreover, all the associated means and covariances have closed forms.
Non-visual Tracking
Classical (Non-visual) Tracking
$p(x_{1:t}, y_{1:t})$: an MRF with an "HMM-like" graph – but the term "HMM" is usually used when the hidden states are discrete.
$p(x_{1:t})$: a Markov-chain structure. E.g.:
$$p(x_t \mid x_{1:(t-1)}) = p(x_t \mid x_{t-1})$$
$p(y_{1:t})$: its graph is fully connected.
Non-visual Tracking
Classical (Non-visual) Tracking
$p(x_{1:t} \mid y_{1:t})$: a Markov-chain structure. In fact:
$$p(x_t \mid x_{1:(t-1)}, y_{1:t}) \overset{\text{MC}}{=} p(x_t \mid x_{t-1}, y_{1:t}) \overset{x_t \perp\!\!\!\perp y_{1:(t-1)} \mid x_{t-1}}{=} p(x_t \mid x_{t-1}, y_t)$$
$p(x_{1:t} \mid y_{1:(t-1)})$: a Markov-chain structure. In fact:
$$p(x_t \mid x_{1:(t-1)}, y_{1:(t-1)}) \overset{\text{MC}}{=} p(x_t \mid x_{t-1}, y_{1:(t-1)}) \overset{x_t \perp\!\!\!\perp y_{1:(t-1)} \mid x_{t-1}}{=} p(x_t \mid x_{t-1})$$
Non-visual Tracking
Classical (Non-visual) Tracking
Because everything is Gaussian here, the MMSE estimators of $x_t \mid y_{1:t}$ and $x_t \mid y_{1:(t-1)}$ are given by the corresponding conditional expectations.
It turns out that:
$$\big(\mu_{x_t|y_{1:(t-1)}}, \Sigma_{x_t|y_{1:(t-1)}}\big) = \mathrm{func}\big(\mu_{x_{t-1}|y_{1:(t-1)}}, \Sigma_{x_{t-1}|y_{1:(t-1)}}\big)$$
$$\big(\mu_{x_t|y_{1:t}}, \Sigma_{x_t|y_{1:t}}\big) = \mathrm{func}\big(\mu_{x_{t-1}|y_{1:(t-1)}}, \Sigma_{x_{t-1}|y_{1:(t-1)}}, y_t\big)$$
and these recursive computations have closed forms (omitted here; see the sketch below). These computations are known as the Kalman Filter. Note it is a linear filter.
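For concreteness, a minimal sketch of the standard predict/update recursions (these are the well-known closed forms, written in this model's notation; the commented usage lines assume the $A$, $B$, noise covariances, and simulated ys from the earlier sketch):

```python
import numpy as np

def kalman_step(mu, Sigma, y, A, B, S_eta_x, S_eta_y):
    """One Kalman-filter step: standard predict/update recursions."""
    # Predict: p(x_t | y_{1:t-1})
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + S_eta_x

    # Update: p(x_t | y_{1:t}) -- Gaussian conditioning on y_t
    S = B @ Sigma_pred @ B.T + S_eta_y       # innovation covariance
    K = Sigma_pred @ B.T @ np.linalg.inv(S)  # Kalman gain
    mu_post = mu_pred + K @ (y - B @ mu_pred)
    Sigma_post = Sigma_pred - K @ B @ Sigma_pred
    return mu_post, Sigma_post

# Usage with the simulated measurements ys from the sketch above:
# mu, Sigma = np.zeros(2), np.eye(2)
# for y in ys:
#     mu, Sigma = kalman_step(mu, Sigma, y, A, B, S_eta_x, S_eta_y)
```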
Non-visual Tracking
Classical (Non-visual) Tracking
The Kalman Filter gives us more than mere point estimates of $x_t \mid y_{1:t}$ and $x_t \mid y_{1:(t-1)}$; rather, it gives us an entire posterior distribution that is propagated over time.
Note that this distribution, being Gaussian, is unimodal. This is a limitation.
Non-visual Tracking
Convolution of Probability Density Functions
Fact
If $X$ and $Y$ are two independent RVs, and $Z = X + Y$, then
$$p_Z = p_X * p_Y$$
where $*$ denotes convolution.
Particularly, if $x \sim p(x)$ and $\eta \sim \mathcal{N}(0, \sigma^2)$ are independent, then the density of $y = x + \eta$ is a "blurred" version of $p(x)$. This is sometimes called diffusion.
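A small numerical illustration of this "diffusion" (a sketch; the bimodal $p(x)$ below is an arbitrary choice that makes the blurring visible):

```python
import numpy as np

# The density of y = x + eta (eta Gaussian, independent of x) is p(x)
# convolved with the Gaussian pdf.
xs = np.linspace(-10, 10, 2001)
dx = xs[1] - xs[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# A bimodal p(x): a mixture of two narrow Gaussians.
p_x = 0.5 * gauss(xs, -3, 0.5) + 0.5 * gauss(xs, 3, 0.5)
p_eta = gauss(xs, 0, 2.0)  # N(0, sigma^2), sigma = 2

p_y = np.convolve(p_x, p_eta, mode='same') * dx  # p_y = p_x * p_eta
print(p_y.sum() * dx)  # ~1: still a valid density, but "blurred"
```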
Non-visual Tracking
Kalman Filter as Probability Density Propagation
Non-visual Tracking
Kalman Filter: Pros and Cons
Pros: optimal for linear dynamics and Gaussian models; simple, widely known, and efficient.
Cons: cannot handle multimodal distributions (supports only a single hypothesis); restricted to linear models.
Non-visual Tracking
Nonlinear Extensions of the Kalman Filter
The Extended Kalman Filter (EKF) is a nonlinear filter that is designed to handle nonlinear, differentiable dynamics and observation models; essentially, the system is linearized around the current estimate.
The Unscented Kalman Filter (UKF), which uses deterministic sampling, better handles highly-nonlinear dynamics and observation models (and differentiability is not assumed).
In both cases, however, the underlying distribution is still assumed to be unimodal.
Non-visual Tracking
More General Probability Density Propagation
Figure from Michael Isard and Andrew Blake, IJCV ’98
Non-visual Tracking
Particle Filter
The Particle Filter (AKA Sequential Monte Carlo) provides an alternative to the Kalman Filter that is also easy to implement, but can handle multiple modes and does not assume linearity/differentiability.
It is based on a discrete approximation of $p(x_t \mid x_{t-1}, y_t)$ via a set of "particles" which are propagated across time.
The main downside of the Particle Filter is that it does not scale well with the dimensionality of $x$.
It is usually more effective than the Kalman Filter in visual tracking.
Non-visual Tracking
Factored Sampling
Consider first the static case (i.e., there is no $t$).
$N$ points $(s^i)_{i=1}^N$, called "particles", are sampled iid from a prior, $p(x)$.
Each $s^i$ is assigned a weight, $\pi^i$ (depicted in the original figure by the blob's size), in proportion to the likelihood, $p(y \mid x = s^i)$.
The weighted point set then serves as an approximate representation of the posterior, $p(x \mid y)$.
Figure from Michael Isard and Andrew Blake, IJCV ’98
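A minimal sketch of factored sampling in 1D (the prior, likelihood, and sample size are arbitrary illustrative choices, picked so the posterior mean is known analytically):

```python
import numpy as np

# Factored sampling in the static case: sample particles from the prior,
# weight by the likelihood, and use the weighted set as a posterior
# approximation.
rng = np.random.default_rng(0)
N = 10_000

s = rng.normal(0.0, 2.0, size=N)    # particles s^i ~ prior p(x) = N(0, 4)
y = 1.5                             # one observation, with model y = x + N(0, 1)
lik = np.exp(-0.5 * (y - s) ** 2)   # likelihood p(y | x = s^i)
pi = lik / lik.sum()                # normalized weights pi^i

# E.g., the posterior mean; analytically 4/(4+1) * y = 1.2 for this model.
print(np.sum(pi * s))
```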
Non-visual Tracking
Propagating between Consecutive Times
Given a weighted set of particles, we would like to evolve it in time.
Resampling: sample $N$ new particles, $(s_t^i)_{i=1}^N$, with replacement, from $(s_{t-1}^i)_{i=1}^N$, according to the discrete distribution $(\pi_{t-1}^i)_{i=1}^N$.
Deterministic drift: e.g., apply a linear transformation to each particle, $s_t^i \leftarrow A s_t^i$.
Diffuse: i.e., add noise. E.g., $s_t^i \mathrel{+}= n_t^i$, where $n_t^i$ is Gaussian IID noise.
Weight by the new likelihood: $\pi_t^i \propto p_t(y_t \mid x = s_t^i)$.
Remark
Usually we will have expressions defined in terms of the log of the unnormalized $\pi_t^i$'s – so don't forget the log-sum-exp trick (see the sketch below).
Non-visual Tracking
CONDENSATION
Figure from Michael Isard and Andrew Blake, IJCV ’98
Visual Tracking
We will restrict discussion to:
2D-based tracking in a single camera;
a probabilistic formulation where the state of interest is defined via a small number of parameters. Example: track a bounding box, or another shape defined via a small number of parameters.
Visual Tracking
Visual Tracking
In Computer Vision, it is not always clear what $y_t$ is.
This leads to complicated expressions for $p(y_t \mid x_t)$, often with no closed form.
In turn, this complicates $p(x_t \mid x_{t-1}, y_t)$.
This motivates the need for more flexible methods.
The CONDENSATION¹ algorithm (Isard and Blake, IJCV ’98) is related to the Particle Filter.
It can handle multiple modes and recover from failures (not always...), and "only" needs a way to sample from $p(x_t \mid x_{t-1})$ and to evaluate a possibly-unnormalized $p(y_t \mid x_t)$.
There are also many other visual-tracking methods.
¹ Conditional Density Propagation for Visual Tracking
Visual Tracking
CONDENSATION Example
Figure from Isard and Blake, IJCV ’98
Visual Tracking
How to get Observations?
Some possible approaches:
Background modeling
Tracking lines/contours/features
Tracking-by-detection (e.g. using template matching)
Visual Tracking
Process used in Isard and Blake for Contour Tracking
Figure from Isard and Blake, IJCV ’98
Visual Tracking
Isard and Blake
See demos
Visual Tracking
Another Example for Parameterization
Articulated parts
Figure from Sidenbladh, Black and Fleet, ECCV 2000
Visual Tracking
Bigger Problem: Data Association
What if we want to explicitly model multiple objects being tracked? Which measurement goes with which track (or with clutter)?
Often, heuristics are used.
This can be done in a principled way, but the details are not trivial.
Visual Tracking
Version Log
3/6/2019, ver 1.00.