Computer Vision: Models, Learning and Inference –
Tracking
Oren Freifeld and Ron Shapira-Weber
Computer Science, Ben-Gurion University
June 3, 2019
www.cs.bgu.ac.il/~cv192/
1 MMSE and Conditional Expectations
    The Gaussian Case
2 Non-visual Tracking
3 Visual Tracking
MMSE and Conditional Expectations
Reminder
Let $A$ be an $n \times n$ matrix.
$A$ is called Positive Definite (PD) if $x^T A x > 0$ for all non-zero $x \in \mathbb{R}^n$.
$A$ is called Positive Semidefinite (PSD) if $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$.
$A$ is called symmetric if $A = A^T$.
$A$ is called SPD if it is both symmetric and PD.
$A$ is called SPSD if it is both symmetric and PSD.
MMSE and Conditional Expectations
A ≺ B
We say that a matrix $A$ is "smaller" than $B$, and write
$$A \prec B$$
if $B - A$ is positive definite. Similarly, we write
$$A \preceq B$$
if $B - A$ is positive semidefinite.
In particular, if $\Sigma_1$ and $\Sigma_2$ are two covariance matrices such that $\Sigma_1 \prec \Sigma_2$, then the RV associated with $\Sigma_2$ has "more variance" than the RV associated with $\Sigma_1$.
MMSE and Conditional Expectations
Example
$$\Sigma_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \Sigma_2 = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix} \tag{1}$$
$$\Rightarrow \Sigma_2 - \Sigma_1 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} \tag{2}$$
is SPD, so $\Sigma_1 \prec \Sigma_2$.
MMSE and Conditional Expectations
Example
$$\Sigma_1 = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \tag{3}$$
$$\Rightarrow \Sigma_2 - \Sigma_1 = \begin{bmatrix} -3 & 0 \\ 0 & 3 \end{bmatrix} \tag{4}$$
is not SPD (it is symmetric, but not PD). Similarly, $\Sigma_1 - \Sigma_2$ is also not SPD. Thus, we cannot "order" the two matrices.
MMSE and Conditional Expectations
Let $X$ and $Y$ be two random variables, and let $g : \mathbb{R} \to \mathbb{R}$ be some function.
Definition (MSE)
Let $\hat{x} = g(y)$ be an estimate of $x$. Then
$$\mathrm{E}\big((X - g(Y))^2\big) = \iint (x - g(y))^2\, p(x, y)\, dx\, dy$$
is called the Mean Square Error (MSE) of the estimator $g$.
MMSE and Conditional Expectations
Definition (MMSE)
The Minimal Mean Square Error (MMSE) is
$$\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big)$$
(the optimization is over the space of all $\mathbb{R} \to \mathbb{R}$ functions) and the estimator that achieves it is the MMSE estimator:
$$\hat{X}_{\mathrm{MMSE}} \triangleq \arg\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big).$$
MMSE and Conditional Expectations
Note that
$$\int \mathrm{E}\big((X - g(y))^2 \mid Y = y\big)\, p(y)\, dy = \int \left( \int (x - g(y))^2\, p(x \mid Y = y)\, dx \right) p(y)\, dy = \iint (x - g(y))^2\, p(x, y)\, dx\, dy = \mathrm{E}\big((X - g(Y))^2\big).$$
MMSE and Conditional Expectations
Suppose we estimate $x$ by some constant number $m$. Let
$$h(m) \triangleq \mathrm{E}\big((X - m)^2\big) = \int (x - m)^2\, p(x)\, dx = \mathrm{E}(X^2) - 2m\,\mathrm{E}(X) + m^2.$$
Then $h'(m) = -2\,\mathrm{E}(X) + 2m$, so $h'(m) = 0 \Rightarrow m = \mathrm{E}(X)$. Thus:
$$\arg\min_m \mathrm{E}\big((X - m)^2\big) = \mathrm{E}(X) = \int x\, p(x)\, dx$$
$$\min_m \mathrm{E}\big((X - m)^2\big) = \mathrm{var}(X) = \int (x - \mathrm{E}(X))^2\, p(x)\, dx$$
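As a quick numerical sanity check (not from the original slides; the gamma distribution and grid below are arbitrary choices), we can verify in NumPy that the minimizing constant is the mean and the attained minimum is the variance:

```python
import numpy as np

# Minimal numerical sanity check: the constant m minimizing E((X - m)^2)
# is E(X), and the attained minimum is var(X). The gamma distribution here
# is an arbitrary choice; any distribution with finite variance works.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)

ms = np.linspace(x.min(), x.max(), 2001)          # candidate constants m
# h(m) = E(X^2) - 2 m E(X) + m^2, estimated from the sample
h = np.mean(x**2) - 2 * ms * np.mean(x) + ms**2

print(ms[np.argmin(h)], x.mean())  # ~equal: argmin_m h(m) = E(X)
print(h.min(), x.var())            # ~equal: min_m h(m) = var(X)
```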
MMSE and Conditional Expectations
Let $y$ be a realization of $Y$. Repeating the analysis, where now we allow $m$ to depend on $y$, and replacing means with conditional means:
$$h(m; y) \triangleq \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \int (x - m)^2\, p(x \mid Y = y)\, dx = \mathrm{E}(X^2 \mid Y = y) - 2m\,\mathrm{E}(X \mid Y = y) + m^2.$$
Then $h'(m; y) = -2\,\mathrm{E}(X \mid Y = y) + 2m$, so $h'(m; y) = 0 \Rightarrow m = \mathrm{E}(X \mid Y = y)$. Thus:
$$\arg\min_m \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y) = \int x\, p(x \mid Y = y)\, dx$$
$$\min_m \mathrm{E}\big((X - m)^2 \mid Y = y\big) = \mathrm{var}(X \mid Y = y) = \int (x - \mathrm{E}(X \mid Y = y))^2\, p(x \mid Y = y)\, dx$$
MMSE and Conditional Expectations
If you get the lowest grade in every exam, then you also have the lowestaverage grade among all the students.
MMSE and Conditional Expectations
Similarly, since, for every $y$,
$$\arg\min_{m(y)} \mathrm{E}\big((X - m(y))^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y),$$
it follows that
$$\arg\min_{g(\cdot)} \mathrm{E}\big((X - g(Y))^2\big) = \mathrm{E}(X \mid Y).$$
That is, the conditional mean achieves the MMSE; in other words, the conditional mean is the optimal estimator in the sense of MSE.
MMSE and Conditional Expectations
To Summarize:
Suppose $X$ and $Y$ are two jointly-distributed scalar RVs. Having observed $Y = y$, we want to estimate $x$:
$$\hat{x}_{\mathrm{MMSE}} \triangleq \arg\min_{g(y) \in \mathbb{R}} \mathrm{E}\big((X - g(y))^2 \mid Y = y\big) = \mathrm{E}(X \mid Y = y)$$
$$\min_{g(y) \in \mathbb{R}} \mathrm{E}\big((X - g(y))^2 \mid Y = y\big) = \mathrm{var}(X \mid Y = y)$$
$$\mathrm{var}(X \mid Y = y) \triangleq \mathrm{E}\big((X - \mathrm{E}(X \mid Y = y))^2 \mid Y = y\big)$$
MMSE and Conditional Expectations
Fact
This generalizes to the vector case, $X \in \mathbb{R}^n$ (regardless of whether $y$ is a scalar or a vector), where
$$\min_{g(y) \in \mathbb{R}^n} \mathrm{E}\big((X - g(y))(X - g(y))^T \mid Y = y\big) = \mathrm{cov}(X \mid Y = y) = \mathrm{E}\big([X - \mathrm{E}(X \mid Y = y)][X - \mathrm{E}(X \mid Y = y)]^T \mid Y = y\big)$$
where the notion of the "smallest matrix" is defined in terms of $\preceq$.
MMSE and Conditional Expectations
Fact
$$\mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\mathrm{E}(X \mid Y)) = \mathrm{E}(X) \tag{5}$$
(by the law of iterated expectation). In other words, the estimation error, $X - \hat{X}_{\mathrm{MMSE}}$, has zero mean:
$$\mathrm{E}(X - \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(X) - \mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = 0.$$
MMSE and Conditional Expectations
Let $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$. Then $\mathrm{E}(\varepsilon \mid Y) = 0$.
Proof.
$$\mathrm{E}(\varepsilon \mid Y) = \mathrm{E}(X - \hat{X}_{\mathrm{MMSE}} \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(\hat{X}_{\mathrm{MMSE}} \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(\mathrm{E}(X \mid Y) \mid Y) = \mathrm{E}(X \mid Y) - \mathrm{E}(X \mid Y) = 0.$$
MMSE and Conditional Expectations
Let $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$. For any function $g(Y)$, we have
$$\mathrm{E}(\varepsilon\, g(Y)) = 0.$$
Proof.
$\mathrm{E}(\varepsilon\, g(Y) \mid Y) = g(Y)\, \mathrm{E}(\varepsilon \mid Y) = g(Y) \cdot 0 = 0$. Then, by the law of iterated expectation:
$$\mathrm{E}(\varepsilon\, g(Y)) = \mathrm{E}(\mathrm{E}(\varepsilon\, g(Y) \mid Y)) = 0.$$
MMSE and Conditional Expectations
The estimation error, $\varepsilon = X - \hat{X}_{\mathrm{MMSE}}$, and the estimator, $\hat{X}_{\mathrm{MMSE}}$, are uncorrelated.
Proof.
$$\mathrm{cov}(\varepsilon, \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) - \mathrm{E}(\varepsilon)\,\mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) - 0 \cdot \mathrm{E}(\hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon \hat{X}_{\mathrm{MMSE}}) = \mathrm{E}(\varepsilon\, g(Y)) = 0,$$
where we used the fact that $\hat{X}_{\mathrm{MMSE}} = g(Y)$; i.e., it is a function of $Y$.
MMSE and Conditional Expectations
Since $\mathrm{cov}(\varepsilon, \hat{X}_{\mathrm{MMSE}}) = 0$, it follows that
$$\mathrm{var}(X) = \mathrm{var}(\hat{X}_{\mathrm{MMSE}}) + \mathrm{var}(\varepsilon) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}\big((X - \mathrm{E}(X \mid Y))^2\big) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}_Y\big(\mathrm{E}_{X|Y}[(X - \mathrm{E}(X \mid Y))^2 \mid Y = y]\big) = \mathrm{var}(\mathrm{E}(X \mid Y)) + \mathrm{E}(\mathrm{var}(X \mid Y))$$
(AKA the law of total variance).
Also, starting from the first line:
$$\mathrm{E}(X^2) - (\mathrm{E}(X))^2 = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) - (\mathrm{E}(\hat{X}_{\mathrm{MMSE}}))^2 + \mathrm{E}(\varepsilon^2) - (\mathrm{E}(\varepsilon))^2$$
$$\mathrm{E}(X^2) - (\mathrm{E}(X))^2 = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) - (\mathrm{E}(X))^2 + \mathrm{E}(\varepsilon^2)$$
$$\mathrm{E}(X^2) = \mathrm{E}(\hat{X}_{\mathrm{MMSE}}^2) + \mathrm{E}(\varepsilon^2)$$
This is just Pythagoras' theorem: note that $(X, Y) \mapsto \mathrm{E}(XY)$ is an inner product, so $X \mapsto \sqrt{\mathrm{E}(X^2)}$ is the corresponding induced norm.
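A short Monte Carlo check of the law of total variance and the "Pythagoras" identity (a sketch, not from the slides; the joint distribution below is an arbitrary choice with known $\mathrm{E}(X \mid Y)$ and $\mathrm{var}(X \mid Y)$):

```python
import numpy as np

# Monte Carlo check of the law of total variance. Take Y ~ N(0,1) and
# X | Y = y ~ N(y^2, 1), so E(X|Y) = Y^2 and var(X|Y) = 1 (an arbitrary
# illustrative choice).
rng = np.random.default_rng(1)
y = rng.standard_normal(1_000_000)
x = y**2 + rng.standard_normal(y.shape)

x_hat = y**2        # the MMSE estimator E(X|Y)
eps = x - x_hat     # estimation error

print(x.var(), x_hat.var() + eps.var())                     # law of total variance
print(np.mean(x**2), np.mean(x_hat**2) + np.mean(eps**2))   # "Pythagoras"
print(np.mean(eps * x_hat))                                 # ~0: uncorrelated
```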
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
Earlier we mentioned that Gaussians are closed under conditioning.
Fact (expressions for the conditional mean and conditional covariance)
If $X$ and $Y$ are jointly Gaussian, then $\mathrm{E}(X \mid Y = y)$ is a "linear" (affine, really) function of the measurement, $y$. Particularly:
$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_Y \end{bmatrix} \right) \;\Rightarrow\; X \mid Y = y \sim \mathcal{N}\big(\mu_{X|y}, \Sigma_{X|y}\big)$$
where
$$\mu_{X|y} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (y - \mu_Y)$$
and
$$\Sigma_{X|y} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^T.$$
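A minimal NumPy sketch of these formulas (the block sizes and values below are arbitrary illustrative choices):

```python
import numpy as np

# Sketch: conditional mean and covariance for jointly Gaussian (X, Y),
# directly from the formulas above. Values are arbitrary.
mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0])
S_x  = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
S_xy = np.array([[0.5],
                 [0.2]])
S_y  = np.array([[1.5]])

def gaussian_condition(y):
    """Return (mu_{X|y}, Sigma_{X|y}) for an observed y."""
    K = S_xy @ np.linalg.inv(S_y)      # Sigma_XY Sigma_Y^{-1}
    mu_cond = mu_x + K @ (y - mu_y)    # affine in y
    S_cond = S_x - K @ S_xy.T          # does not depend on y
    return mu_cond, S_cond

mu_c, S_c = gaussian_condition(np.array([2.7]))
print(mu_c)  # conditional mean
print(S_c)   # conditional covariance (the same for every y)
```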
MMSE and Conditional Expectations The Gaussian Case
$\Rightarrow$ In the Gaussian case, $\hat{x}_{\mathrm{MMSE}}$ is a "linear" function of $y$:
$$\mu_{X|y} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (y - \mu_Y) = \Sigma_{XY} \Sigma_Y^{-1} y + (\mu_X - \Sigma_{XY} \Sigma_Y^{-1} \mu_Y)$$
and $\Sigma_{X|y}$ does not depend on $y$:
$$\Sigma_{X|y} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^T.$$
Both these properties hold for Gaussians, but not in general.
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
Fact
The expression for $\Sigma_{X|y}$ is equivalent to taking
$$\begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_Y \end{bmatrix}^{-1},$$
dropping the rows and columns that correspond to $y$, and inverting the remaining block back.
In NumPy, an easy way to drop rows/columns is np.delete.
MMSE and Conditional Expectations The Gaussian Case
Example
Let $x = \begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}^T$ be a Gaussian RV with a precision matrix $Q$. Then the covariance of $\begin{bmatrix} X_1 & X_3 \end{bmatrix}^T$ conditioned on $X_2 = x_2$ is
$$\begin{bmatrix} Q_{11} & Q_{13} \\ Q_{31} & Q_{33} \end{bmatrix}^{-1}.$$
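A NumPy sketch verifying this example (using np.delete, as suggested above) on an arbitrary SPD covariance, by comparing the precision-matrix route against the Schur-complement formula for $\Sigma_{X|y}$:

```python
import numpy as np

# Verify: for a Gaussian with covariance S and precision Q = S^{-1},
# the covariance of (X1, X3) given X2 = x2 is obtained by deleting the
# row/column of X2 (index 1 in 0-based NumPy) from Q and inverting back.
# The covariance below is an arbitrary SPD example.
S = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.5, 0.4],
              [0.3, 0.4, 1.2]])
Q = np.linalg.inv(S)

# Route 1: drop X2's row/column from the precision matrix, invert back.
Q_sub = np.delete(np.delete(Q, 1, axis=0), 1, axis=1)
cov_a = np.linalg.inv(Q_sub)

# Route 2: the Schur-complement formula Sigma_X - Sigma_XY Sigma_Y^{-1} Sigma_XY^T.
idx = [0, 2]
S_x, S_xy, S_y = S[np.ix_(idx, idx)], S[np.ix_(idx, [1])], S[1:2, 1:2]
cov_b = S_x - S_xy @ np.linalg.inv(S_y) @ S_xy.T

print(np.allclose(cov_a, cov_b))  # True
```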
MMSE and Conditional Expectations The Gaussian Case
Conditional Expectations For Gaussians
$\Sigma_{X|y} \preceq \Sigma_X$ (conditioning does not increase uncertainty). For Gaussians this holds for every $y$; for general distributions it holds on average, $\mathrm{E}(\mathrm{cov}(X \mid Y)) \preceq \mathrm{cov}(X)$ (by the law of total variance), although $\mathrm{cov}(X \mid Y = y)$ may exceed $\mathrm{cov}(X)$ for particular values of $y$.
Non-visual Tracking
Classical (Non-visual) Tracking
A continuous-state, discrete-time setting.
Time: $1, 2, \ldots, t$
Hidden state at time $t$: $x_t$
Hidden states up to time $t$: $x_{1:t} = [x_1, x_2, \ldots, x_t]$
Measurement at time $t$: $y_t$
Measurements up to time $t$: $y_{1:t} = [y_1, y_2, \ldots, y_t]$
Non-visual Tracking
Classical (Non-visual) Tracking
The simplest case in a continuous-state, discrete-time setting.
Linear dynamics with iid additive Gaussian noise:
$$x_t = A x_{t-1} + \eta_x, \quad \eta_x \sim \mathcal{N}(0, \Sigma_{\eta_x})$$
(AKA a first-order Auto-Regressive (AR) model), where the matrix $A$ is known, deterministic, and does not depend on $t$.
Linear observation model with iid additive Gaussian noise:
$$y_t = B x_t + \eta_y, \quad \eta_y \sim \mathcal{N}(0, \Sigma_{\eta_y})$$
where the matrix $B$ is known, deterministic, and does not depend on $t$.
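A minimal NumPy sketch simulating this model (the specific $A$, $B$, and noise covariances below are arbitrary illustrative choices; here, near-constant-velocity dynamics with position-only measurements):

```python
import numpy as np

# Simulate the linear-Gaussian state-space model above.
rng = np.random.default_rng(0)
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])    # state: [position, velocity]
B = np.array([[1.0, 0.0]])    # we observe position only
S_eta_x = 0.01 * np.eye(2)    # dynamics noise covariance
S_eta_y = 0.25 * np.eye(1)    # observation noise covariance

T = 100
x = np.zeros(2)
xs, ys = [], []
for _ in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), S_eta_x)  # x_t = A x_{t-1} + eta_x
    y = B @ x + rng.multivariate_normal(np.zeros(1), S_eta_y)  # y_t = B x_t + eta_y
    xs.append(x); ys.append(y)
xs, ys = np.array(xs), np.array(ys)
```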
Non-visual Tracking
Classical (Non-visual) Tracking
Gaussian RVs are closed under affine transformations, marginalization, and conditioning $\Rightarrow$ everything here is Gaussian, e.g.:
$p(x_{1:t}, y_{1:t})$
$p(x_{1:t})$
$p(y_{1:t})$
$p(x_{1:t} \mid y_{1:t})$
$p(x_t \mid y_{1:t})$
$p(x_t \mid x_{1:(t-1)})$
$p(x_t \mid x_{1:(t-1)}, y_{1:t})$
$p(x_t \mid x_{1:(t-1)}, y_{1:(t-1)})$
$p(x_t \mid y_{1:(t-1)})$
Moreover, all the associated means and covariances have closed forms.
Non-visual Tracking
Classical (Non-visual) Tracking
$p(x_{1:t}, y_{1:t})$: an MRF with an "HMM-like" graph – but the term "HMM" is usually used when the hidden states are discrete.
$p(x_{1:t})$: a Markov-chain structure. E.g.:
$$p(x_t \mid x_{1:(t-1)}) = p(x_t \mid x_{t-1})$$
$p(y_{1:t})$: its graph is fully connected.
Non-visual Tracking
Classical (Non-visual) Tracking
$p(x_{1:t} \mid y_{1:t})$: a Markov-chain structure. In fact:
$$p(x_t \mid x_{1:(t-1)}, y_{1:t}) \overset{\text{MC}}{=} p(x_t \mid x_{t-1}, y_{1:t}) \overset{x_t \perp\!\!\!\perp y_{1:(t-1)} \mid x_{t-1}}{=} p(x_t \mid x_{t-1}, y_t)$$
$p(x_{1:t} \mid y_{1:(t-1)})$: a Markov-chain structure. In fact:
$$p(x_t \mid x_{1:(t-1)}, y_{1:(t-1)}) \overset{\text{MC}}{=} p(x_t \mid x_{t-1}, y_{1:(t-1)}) \overset{x_t \perp\!\!\!\perp y_{1:(t-1)} \mid x_{t-1}}{=} p(x_t \mid x_{t-1})$$
Non-visual Tracking
Classical (Non-visual) Tracking
Because everything is Gaussian here, the MMSE estimators of $x_t \mid y_{1:t}$ and $x_t \mid y_{1:(t-1)}$ are given by the corresponding conditional expectations.
It turns out that:
$$\big(\mu_{x_t|y_{1:(t-1)}}, \Sigma_{x_t|y_{1:(t-1)}}\big) = \mathrm{func}\big(\mu_{x_{t-1}|y_{1:(t-1)}}, \Sigma_{x_{t-1}|y_{1:(t-1)}}\big)$$
$$\big(\mu_{x_t|y_{1:t}}, \Sigma_{x_t|y_{1:t}}\big) = \mathrm{func}\big(\mu_{x_{t-1}|y_{1:(t-1)}}, \Sigma_{x_{t-1}|y_{1:(t-1)}}, y_t\big)$$
and these recursive computations have closed forms (omitted here; see the sketch below). These computations are known as the Kalman Filter. Note it is a linear filter.
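For concreteness, a minimal sketch of the standard predict/update recursions (these are the well-known closed forms, written in this model's notation; the commented usage lines assume the $A$, $B$, noise covariances, and simulated ys from the earlier sketch):

```python
import numpy as np

def kalman_step(mu, Sigma, y, A, B, S_eta_x, S_eta_y):
    """One Kalman-filter step: standard predict/update recursions."""
    # Predict: p(x_t | y_{1:t-1})
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + S_eta_x

    # Update: p(x_t | y_{1:t}) -- Gaussian conditioning on y_t
    S = B @ Sigma_pred @ B.T + S_eta_y       # innovation covariance
    K = Sigma_pred @ B.T @ np.linalg.inv(S)  # Kalman gain
    mu_post = mu_pred + K @ (y - B @ mu_pred)
    Sigma_post = Sigma_pred - K @ B @ Sigma_pred
    return mu_post, Sigma_post

# Usage with the simulated measurements ys from the sketch above:
# mu, Sigma = np.zeros(2), np.eye(2)
# for y in ys:
#     mu, Sigma = kalman_step(mu, Sigma, y, A, B, S_eta_x, S_eta_y)
```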
Non-visual Tracking
Classical (Non-visual) Tracking
The Kalman Filter gives us more than mere point estimates of $x_t \mid y_{1:t}$ and $x_t \mid y_{1:(t-1)}$; rather, it gives us an entire posterior distribution that is propagated over time.
Note that this distribution, being Gaussian, is unimodal. This is a limitation.
Non-visual Tracking
Convolution of Probability Density Functions
Fact
If $X$ and $Y$ are two independent RVs, and $Z = X + Y$, then
$$p_Z = p_X * p_Y$$
where $*$ denotes convolution.
Particularly, if $x \sim p(x)$ and $\eta \sim \mathcal{N}(0, \sigma^2)$ are independent, then the density of $y = x + \eta$ is a "blurred" version of $p(x)$. This is sometimes called diffusion.
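A small numerical illustration of this "diffusion" (a sketch; the bimodal $p(x)$ below is an arbitrary choice that makes the blurring visible):

```python
import numpy as np

# The density of y = x + eta (eta Gaussian, independent of x) is p(x)
# convolved with the Gaussian pdf.
xs = np.linspace(-10, 10, 2001)
dx = xs[1] - xs[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# A bimodal p(x): a mixture of two narrow Gaussians.
p_x = 0.5 * gauss(xs, -3, 0.5) + 0.5 * gauss(xs, 3, 0.5)
p_eta = gauss(xs, 0, 2.0)  # N(0, sigma^2), sigma = 2

p_y = np.convolve(p_x, p_eta, mode='same') * dx  # p_y = p_x * p_eta
print(p_y.sum() * dx)  # ~1: still a valid density, but "blurred"
```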
Non-visual Tracking
Kalman Filter as Probability Density Propagation
Non-visual Tracking
Kalman Filter: Pros and Cons
Pros: optimal for linear dynamics and Gaussian models; simple, widely known, and efficient.
Cons: cannot handle multimodal distributions (supports only a single hypothesis); restricted to linear models.
Non-visual Tracking
Nonlinear Extensions of the Kalman Filter
The Extended Kalman Filter (EKF) is a nonlinear filter that is designed to handle nonlinear, differentiable dynamics and observation models; essentially, the system is linearized around the current estimate.
The Unscented Kalman Filter (UKF), which uses deterministic sampling, better handles highly-nonlinear dynamics and observation models (and differentiability is not assumed).
In both cases, however, the underlying distribution is still assumed to be unimodal.
Non-visual Tracking
More General Probability Density Propagation
Figure from Michael Isard and Andrew Blake, IJCV ’98
Non-visual Tracking
Particle Filter
The Particle Filter (AKA Sequential Monte Carlo) provides an alternative to the Kalman Filter that is also easy to implement, but can handle multiple modes and does not assume linearity/differentiability.
It is based on a discrete approximation of $p(x_t \mid x_{t-1}, y_t)$ via a set of "particles" which are propagated across time.
The main downside of the Particle Filter is that it does not scale well with the dimensionality of $x$.
It is usually more effective than the Kalman Filter in visual tracking.
Non-visual Tracking
Factored Sampling
Consider first the static case (i.e., there is no $t$).
$N$ points $(s^i)_{i=1}^N$, called "particles", are sampled iid from a prior, $p(x)$.
Each $s^i$ is assigned a weight, $\pi^i$ (depicted in the original figure by the blob's size), in proportion to the likelihood, $p(y \mid x = s^i)$.
The weighted point set then serves as an approximate representation of the posterior, $p(x \mid y)$.
Figure from Michael Isard and Andrew Blake, IJCV ’98
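A minimal sketch of factored sampling in 1D (the prior, likelihood, and sample size are arbitrary illustrative choices, picked so the posterior mean is known analytically):

```python
import numpy as np

# Factored sampling in the static case: sample particles from the prior,
# weight by the likelihood, and use the weighted set as a posterior
# approximation.
rng = np.random.default_rng(0)
N = 10_000

s = rng.normal(0.0, 2.0, size=N)    # particles s^i ~ prior p(x) = N(0, 4)
y = 1.5                             # one observation, with model y = x + N(0, 1)
lik = np.exp(-0.5 * (y - s) ** 2)   # likelihood p(y | x = s^i)
pi = lik / lik.sum()                # normalized weights pi^i

# E.g., the posterior mean; analytically 4/(4+1) * y = 1.2 for this model.
print(np.sum(pi * s))
```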
Non-visual Tracking
Propagating between Consecutive Times
Given a weighted set of particles, we would like to evolve it in time.
Resampling: sample $N$ new particles, $(s_t^i)_{i=1}^N$, with replacement, from $(s_{t-1}^i)_{i=1}^N$, according to the discrete distribution $(\pi_{t-1}^i)_{i=1}^N$.
Deterministic drift: e.g., apply a linear transformation to each particle, $s_t^i \leftarrow A s_t^i$.
Diffuse: i.e., add noise. E.g., $s_t^i \mathrel{+}= n_t^i$, where $n_t^i$ is Gaussian IID noise.
Weight by the new likelihood: $\pi_t^i \propto p_t(y_t \mid x = s_t^i)$.
Remark
Usually we will have expressions defined in terms of the log of the unnormalized $\pi_t^i$'s – so don't forget the log-sum-exp trick (see the sketch below).
Non-visual Tracking
CONDENSATION
Figure from Michael Isard and Andrew Blake, IJCV ’98
Visual Tracking
We will restrict discussion to:
2D-based tracking in a single camera;
a probabilistic formulation where the state of interest is defined via a small number of parameters. Example: track a bounding box, or another shape defined via a small number of parameters.
Visual Tracking
Visual Tracking
In Computer Vision, it is not always clear what $y_t$ is.
This leads to complicated expressions for $p(y_t \mid x_t)$, often with no closed form.
In turn, this complicates $p(x_t \mid x_{t-1}, y_t)$.
This motivates the need for more flexible methods.
The CONDENSATION¹ algorithm (Isard and Blake, IJCV ’98) is related to the Particle Filter.
It can handle multiple modes and recover from failures (not always...), and "only" needs a way to sample from $p(x_t \mid x_{t-1})$ and to evaluate a possibly-unnormalized $p(y_t \mid x_t)$.
There are also many other visual-tracking methods.
¹ Conditional Density Propagation for Visual Tracking
Visual Tracking
CONDENSATION Example
Figure from Isard and Blake, IJCV ’98
Visual Tracking
How to get Observations?
Some possible approaches:
Background modeling
Tracking lines/contours/features
Tracking-by-detection (e.g. using template matching)
Visual Tracking
Process used in Isard and Blake for Contour Tracking
Figure from Isard and Blake, IJCV ’98
Visual Tracking
Isard and Blake
See demos
Visual Tracking
Another Example for Parameterization
Articulated parts
Figure from Sidenbladh, Black and Fleet, ECCV 2000
Visual Tracking
Bigger Problem: Data Association
What if we want to explicitly model multiple objects being tracked? Which measurement goes with which track (or with clutter)?
Often, heuristics are used.
This can be done in a principled way, but the details are not trivial.
Visual Tracking
Version Log
3/6/2019, ver 1.00.