EECS 274 Computer Vision
Motion Estimation
Motion estimation
• Aligning images
• Estimate motion parameters
• Optical flow
– Lucas-Kanade algorithm
– Horn-Schunck algorithm
• Slides based on Szeliski's lecture notes
• Reading: FP Chapter 15, S Chapter 8
Why estimate visual motion?
• Visual motion can be annoying
– Camera instabilities, jitter
– Measure it; remove it (stabilize)
• Visual motion indicates dynamics in the scene
– Moving objects, behavior
– Track objects and analyze trajectories
• Visual motion reveals spatial layout
– Motion parallax
Classes of techniques
• Feature-based methods
– Extract visual features (corners, textured areas) and track them
– Sparse motion fields, but possibly robust tracking
– Suitable especially when image motion is large (10s of pixels)
• Direct methods
– Directly recover image motion from spatio-temporal image brightness variations
– Global motion parameters directly recovered without an intermediate feature motion calculation
– Dense motion fields, but more sensitive to appearance variations
– Suitable for video and when image motion is small (< 10 pixels)
Optical flow vs. motion field
• Optical flow does not always correspond to the motion field
• Optical flow is an approximation of the motion field
• The error is small at points with high spatial gradient under some simplifying assumptions
Patch matching
• How do we determine correspondences?
– block matching or SSD (sum of squared differences)
Correlation and SSD
• For larger displacements, do template matching
– Define a small area around a pixel as the template
– Match the template against each pixel within a search area in the next image
– Use a match measure such as correlation, normalized correlation, or sum-of-squared differences
– Choose the maximum (or minimum) as the match
– Sub-pixel estimate (Lucas-Kanade)
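The matching procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the lecture; the function name and parameters (`half` for template half-width, `search` for search radius) are my own.

```python
import numpy as np

def ssd_match(I0, I1, y, x, half=2, search=4):
    """Find the displacement (du, dv) of the template around (y, x) in I0
    that minimizes the sum-of-squared-differences against patches of I1."""
    template = I0[y - half:y + half + 1, x - half:x + half + 1]
    best, best_uv = np.inf, (0, 0)
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            yy, xx = y + dv, x + du
            patch = I1[yy - half:yy + half + 1, xx - half:xx + half + 1]
            err = np.sum((patch - template) ** 2)  # SSD match measure
            if err < best:                          # choose the minimum
                best, best_uv = err, (du, dv)
    return best_uv

# synthetic check: I1 is I0 shifted down 1 pixel and right 2 pixels
rng = np.random.default_rng(0)
I0 = rng.random((30, 30))
I1 = np.roll(np.roll(I0, 1, axis=0), 2, axis=1)
print(ssd_match(I0, I1, 15, 15))  # → (2, 1)
```

A real implementation would also handle image borders and could refine the winning integer displacement to sub-pixel accuracy with a Lucas-Kanade step.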
Discrete search vs. gradient based
• Consider image I translated by $(u_0, v_0)$:
$$I_0(x, y) = I(x, y)$$
$$I_1(x + u_0, y + v_0) = I(x, y) + \eta_1(x, y)$$
where $\eta_1$ is noise
• The discrete search method simply searches for the best estimate:
$$E(u, v) = \sum_{x,y} \left( I_1(x + u, y + v) - I_0(x, y) \right)^2 = \sum_{x,y} \left( I(x + u - u_0, y + v - v_0) - I(x, y) + \eta_1(x, y) \right)^2$$
• The gradient method linearizes the intensity function and solves for the estimate
Brightness constancy
• Brightness Constancy Equation ($I_0$ is the template image):
$$I_0(x, y) = I_1(x + u(x, y), y + v(x, y))$$
• Minimize:
$$E(u, v) = \sum_{x,y} \left( I_1(x + u, y + v) - I_0(x, y) \right)^2$$
• Linearize (assuming small $(u, v)$) using a Taylor series expansion:
$$I_1(x + u, y + v) \approx I_1(x, y) + I_{1x}(x, y)\, u(x, y) + I_{1y}(x, y)\, v(x, y)$$
Estimating optical flow
• Assume the image intensity I is constant: a point at $(x, y)$ at time $t$ moves to $(x + \delta x, y + \delta y)$ at time $t + \delta t$
$$I_0(x, y, t) = I_1(x + \delta x, y + \delta y, t + \delta t)$$
Optical flow constraint
$$I(x + \delta x, y + \delta y, t + \delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t$$
Let $u = \delta x / \delta t$, $v = \delta y / \delta t$, and let $\delta t \to 0$; brightness constancy then gives the optical flow constraint
$$I_x u + I_y v + I_t = 0$$
With $\mathbf{u} = (u, v)^T$ and $\nabla I = (I_x, I_y)^T$ this reads $\nabla I^T \mathbf{u} + I_t = 0$: one equation in two unknowns per pixel. The corresponding error function is
$$E(u, v) = (I_x u + I_y v + I_t)^2$$
Lucas-Kanade algorithm
Minimizing
$$E(u, v) = \sum_{x,y} \left( I_x(x, y)\, u + I_y(x, y)\, v + I_t \right)^2$$
Assume a single velocity for all pixels within an image patch. Setting the derivatives to zero gives, in matrix form,
$$\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$
i.e.
$$\left( \sum \nabla I\, \nabla I^T \right) \mathbf{u} = -\sum \nabla I\, I_t, \qquad A\mathbf{u} = \mathbf{b}, \qquad \mathbf{u} = A^{-1}\mathbf{b}$$
LHS: sum of the 2x2 outer product of the gradient vector (A is the Hessian)
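The single-patch Lucas-Kanade solve can be written directly from the normal equations. A minimal NumPy sketch (not from the lecture); gradients are taken with central differences via `np.gradient`, and the temporal derivative is approximated by the frame difference:

```python
import numpy as np

def lucas_kanade_patch(I0, I1):
    """Solve (sum grad gradT) u = -sum grad I_t for one constant
    velocity (u, v) over an entire patch."""
    Ix = np.gradient(I0, axis=1)   # spatial gradients of the template
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0                   # temporal derivative (frame difference)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)   # (u, v)

# sub-pixel check: a Gaussian blob shifted right by 0.3 pixels
y, x = np.mgrid[0:40, 0:40].astype(float)
I0 = np.exp(-((x - 20.0) ** 2 + (y - 20.0) ** 2) / 50.0)
I1 = np.exp(-((x - 20.3) ** 2 + (y - 20.0) ** 2) / 50.0)
u, v = lucas_kanade_patch(I0, I1)  # u close to 0.3, v close to 0
```

Note the solve fails when A is singular, which is exactly the aperture problem discussed later: along an edge or in a textureless region the patch does not constrain both components of the velocity.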
Matrix form
$$E(\mathbf{u}) = \sum_i \left( I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right)^2 \approx \sum_i \left( e_i + \mathbf{J}_1(\mathbf{x}_i)\, \mathbf{u} \right)^2$$
where
$$\mathbf{J}_1(\mathbf{x}_i) = \nabla I_1(\mathbf{x}_i) = \left( \frac{\partial I_1}{\partial x}, \frac{\partial I_1}{\partial y} \right)(\mathbf{x}_i), \qquad e_i = I_1(\mathbf{x}_i) - I_0(\mathbf{x}_i)$$
Matrix form
Setting the derivative of $E(\mathbf{u})$ to zero yields the normal equations
$$A\mathbf{u} = -\mathbf{b}$$
with
$$A = \sum_i \mathbf{J}_1^T(\mathbf{x}_i)\, \mathbf{J}_1(\mathbf{x}_i) = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}, \qquad \mathbf{b} = \sum_i e_i\, \mathbf{J}_1^T(\mathbf{x}_i) = \begin{bmatrix} \sum I_t I_x \\ \sum I_t I_y \end{bmatrix}$$
(A and b are accumulated over the patch for computational efficiency)
Computing gradients in X-Y-T
Gradients are estimated by averaging finite differences over a 2x2x2 cube of samples spanning $(i, i{+}1) \times (j, j{+}1) \times (k, k{+}1)$ in x, y, and time:
$$I_x \approx \frac{1}{4\,\delta x}\Big[ \left( I_{i+1,j,k} + I_{i+1,j+1,k} + I_{i+1,j,k+1} + I_{i+1,j+1,k+1} \right) - \left( I_{i,j,k} + I_{i,j+1,k} + I_{i,j,k+1} + I_{i,j+1,k+1} \right) \Big]$$
likewise for Iy and It
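This cube-averaged finite-difference scheme is easy to vectorize. A sketch under the assumption of unit grid spacing (function name and array layout are my own); each derivative is a forward difference averaged over the other two axes of the 2x2x2 cube:

```python
import numpy as np

def gradients_xyt(I_t0, I_t1):
    """Cube-averaged gradients from two consecutive frames.
    Returns Ix, Iy, It, each one pixel smaller in both dimensions."""
    Ix = 0.25 * ((I_t0[:-1, 1:] - I_t0[:-1, :-1]) + (I_t0[1:, 1:] - I_t0[1:, :-1])
               + (I_t1[:-1, 1:] - I_t1[:-1, :-1]) + (I_t1[1:, 1:] - I_t1[1:, :-1]))
    Iy = 0.25 * ((I_t0[1:, :-1] - I_t0[:-1, :-1]) + (I_t0[1:, 1:] - I_t0[:-1, 1:])
               + (I_t1[1:, :-1] - I_t1[:-1, :-1]) + (I_t1[1:, 1:] - I_t1[:-1, 1:]))
    It = 0.25 * ((I_t1[:-1, :-1] - I_t0[:-1, :-1]) + (I_t1[:-1, 1:] - I_t0[:-1, 1:])
               + (I_t1[1:, :-1] - I_t0[1:, :-1]) + (I_t1[1:, 1:] - I_t0[1:, 1:]))
    return Ix, Iy, It
```

On a linear ramp `I = 2x + 3y` whose brightness increases by 5 between frames, the scheme recovers `Ix = 2`, `Iy = 3`, `It = 5` exactly, which is a convenient sanity check.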
Local patch analysis
• How certain are the motion estimates?
The aperture problem
• Algorithm: at each pixel compute u by solving Au = b
• A is singular if all gradient vectors point in the same direction
– e.g., along an edge
– trivially singular if the summation is over a single pixel or there is no texture
– i.e., only normal flow is available (aperture problem)
• Corners and textured areas are OK
$$\text{Let } A = \sum \nabla I\, \nabla I^T \text{ and } \mathbf{b} = -\sum \nabla I\, I_t; \text{ solve } A\mathbf{u} = \mathbf{b}$$
SSD surface – textured area
SSD surface – edge
SSD surface – homogeneous area
Iterative refinement
• Estimate velocity at each pixel using one iteration of Lucas-Kanade estimation
• Warp one image toward the other using the estimated flow field (easier said than done)
• Refine the estimate by repeating the process
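The estimate-warp-refine loop can be sketched with a single global velocity. This is my own minimal illustration, not the lecture's code; the warp is rounded to whole pixels with `np.roll` purely to keep it short, whereas real implementations interpolate:

```python
import numpy as np

def lk_step(I0, I1w):
    """One Lucas-Kanade step for a single global velocity on a warped pair."""
    Ix = np.gradient(I0, axis=1)
    Iy = np.gradient(I0, axis=0)
    It = I1w - I0
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)

def iterative_flow(I0, I1, n_iter=5):
    """Estimate, warp I1 back by the current estimate, refine, repeat."""
    u = np.zeros(2)  # (u, v)
    for _ in range(n_iter):
        shift = np.round(u).astype(int)
        I1w = np.roll(I1, (-shift[1], -shift[0]), axis=(0, 1))  # undo motion
        u = shift + lk_step(I0, I1w)  # total = applied warp + residual
    return u
```

Because each iteration removes most of the remaining displacement before linearizing again, the loop recovers motions well beyond the range where a single linearized step is accurate, e.g. a 2.3-pixel shift of a smooth blob.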
Optical flow: iterative estimation
(figures: successive estimate and update steps starting from an initial guess at $x_0$ and converging toward the true displacement; d is used for the displacement here instead of u)
Optical flow: iterative estimation
• Some implementation issues:
– Warping is not easy (ensure that errors in warping are smaller than the estimate refinement)
– Warp one image, take derivatives of the other so you don't need to re-compute the gradient after each iteration
– Often useful to low-pass filter the images before motion estimation (for better derivative estimation, and better linear approximations to image intensity)
Error functions
• Robust error function:
$$E_{\mathrm{RSSD}}(\mathbf{u}) = \sum_i \rho\!\left( I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right), \qquad \rho(x) = \frac{x^2}{1 + x^2/a^2}$$
• Spatially varying weights:
$$E_{\mathrm{WSSD}}(\mathbf{u}) = \sum_i w_0(\mathbf{x}_i)\, w_1(\mathbf{x}_i + \mathbf{u}) \left( I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right)^2$$
• Bias and gain: images taken with different exposure
$$I_1(\mathbf{x} + \mathbf{u}) = (1 + \alpha)\, I_0(\mathbf{x}) + \beta \quad (\alpha \text{ is the gain}, \beta \text{ is the bias})$$
$$E_{\mathrm{BG}}(\mathbf{u}) = \sum_i \left( I_1(\mathbf{x}_i + \mathbf{u}) - (1 + \alpha) I_0(\mathbf{x}_i) - \beta \right)^2$$
• Correlation (and normalized cross-correlation):
$$E_{\mathrm{CC}}(\mathbf{u}) = \sum_i I_0(\mathbf{x}_i)\, I_1(\mathbf{x}_i + \mathbf{u})$$
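Normalized cross-correlation combines the bias-and-gain idea with correlation: subtracting the mean removes an additive bias and dividing by the norms removes a multiplicative gain. A small sketch (the function name is my own):

```python
import numpy as np

def ncc(p0, p1):
    """Normalized cross-correlation of two patches, in [-1, 1].
    Invariant to bias (additive offset) and gain (multiplicative scale)."""
    a = p0 - p0.mean()   # remove bias
    b = p1 - p1.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))
```

A patch compared against a bias-and-gain-distorted copy of itself (e.g. `2*p + 3`) scores exactly 1, which is why NCC is the usual match measure when the two images were taken with different exposures.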
Horn-Schunck algorithm
• Global method with a smoothness constraint to solve the aperture problem
• Minimize a global energy functional with the calculus of variations:
$$E = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx\, dy$$
• The Euler-Lagrange equations are
$$\frac{\partial L}{\partial u} - \frac{\partial}{\partial x}\frac{\partial L}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial u_y} = 0, \qquad \frac{\partial L}{\partial v} - \frac{\partial}{\partial x}\frac{\partial L}{\partial v_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial v_y} = 0$$
where L is the integrand of the energy function
Horn-Schunck algorithm
• This yields
$$I_x (I_x u + I_y v + I_t) - \alpha^2 \Delta u = 0$$
$$I_y (I_x u + I_y v + I_t) - \alpha^2 \Delta v = 0$$
where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplace operator
• Approximating $\Delta u(x, y) \approx \bar{u}(x, y) - u(x, y)$, the local average minus the value, gives the linear system
$$(I_x^2 + \alpha^2)\, u + I_x I_y\, v = \alpha^2 \bar{u} - I_x I_t$$
$$I_x I_y\, u + (I_y^2 + \alpha^2)\, v = \alpha^2 \bar{v} - I_y I_t$$
See Robot Vision by Horn for details
Horn-Schunck algorithm
• Iterative (Jacobi) scheme:
$$u^{k+1} = \bar{u}^k - \frac{I_x \left( I_x \bar{u}^k + I_y \bar{v}^k + I_t \right)}{\alpha^2 + I_x^2 + I_y^2}, \qquad v^{k+1} = \bar{v}^k - \frac{I_y \left( I_x \bar{u}^k + I_y \bar{v}^k + I_t \right)}{\alpha^2 + I_x^2 + I_y^2}$$
• Yields high-density flow
• Fills in missing information in homogeneous regions
• More sensitive to noise than local methods
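The iterative scheme above translates almost line-for-line into NumPy. A minimal sketch under my own assumptions (gradients are given, and the neighborhood average $\bar{u}$ is a 4-neighbor mean with boundary pixels averaging only their valid neighbors):

```python
import numpy as np

def horn_schunck(Ix, Iy, It, alpha=1.0, n_iter=60):
    """Jacobi iteration of the Horn-Schunck update:
    u <- u_bar - Ix (Ix u_bar + Iy v_bar + It) / (alpha^2 + Ix^2 + Iy^2)."""
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)

    def avg(f):
        """4-neighbor average; border pixels use their valid neighbors."""
        g = np.zeros_like(f)
        n = np.zeros_like(f)
        g[1:, :] += f[:-1, :]; n[1:, :] += 1
        g[:-1, :] += f[1:, :]; n[:-1, :] += 1
        g[:, 1:] += f[:, :-1]; n[:, 1:] += 1
        g[:, :-1] += f[:, 1:]; n[:, :-1] += 1
        return g / n

    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        ub, vb = avg(u), avg(v)
        t = (Ix * ub + Iy * vb + It) / denom
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

As a sanity check, with constant gradients Ix = 1, Iy = 0, It = -0.5 everywhere and alpha = 1, the update contracts geometrically toward the uniform fixed point u = 0.5, v = 0, illustrating how the smoothness term fills in a consistent flow.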
Optical flow: aliasing
Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which 'correspondence' is correct?
• nearest match is correct (no aliasing)
• nearest match is incorrect (aliasing)
To overcome aliasing: coarse-to-fine estimation.
Coarse-to-fine estimation
(figure: Gaussian pyramids of images I and J; at each level the current estimate a is used to warp J, the warped image Jw is compared against I to refine a by Δa, and the refined estimate is passed down to the next finer level, where the motion doubles: u = 1.25 pixels at the coarsest level becomes 2.5, 5, then 10 pixels at full resolution)
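The pyramid scheme can be sketched for the simplest case of a single global integer shift, using the SSD search from earlier as the per-level refiner. This is an illustrative sketch with my own function names, not the lecture's implementation:

```python
import numpy as np

def downsample(I):
    """2x2 box-filter downsampling for a simple image pyramid."""
    return 0.25 * (I[0::2, 0::2] + I[1::2, 0::2] + I[0::2, 1::2] + I[1::2, 1::2])

def coarse_to_fine_shift(I0, I1, levels=3, search=2):
    """Estimate a global shift coarse-to-fine: solve at the coarsest
    level, then at each finer level double the estimate and refine it
    with a small +/- search window."""
    pyr0, pyr1 = [I0], [I1]
    for _ in range(levels - 1):
        pyr0.append(downsample(pyr0[-1]))
        pyr1.append(downsample(pyr1[-1]))
    u = v = 0
    for P0, P1 in zip(reversed(pyr0), reversed(pyr1)):
        u, v = 2 * u, 2 * v              # motion doubles at each finer level
        best, best_uv = np.inf, (u, v)
        for dv in range(-search, search + 1):
            for du in range(-search, search + 1):
                shifted = np.roll(P1, (-(v + dv), -(u + du)), axis=(0, 1))
                err = np.sum((shifted - P0) ** 2)
                if err < best:
                    best, best_uv = err, (u + du, v + dv)
        u, v = best_uv
    return u, v
```

The point is that a 10-pixel motion at full resolution is only a 1.25-pixel motion three levels up, so a tiny search window per level suffices; circular `np.roll` warping keeps the sketch short.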
Global (parametric) motion models
• 2D models:
– Affine
– Quadratic
– Planar projective transform (homography)
• 3D models:
– Instantaneous camera motion models
– Homography + epipole
– Plane + parallax
Motion models
Translation
2 unknowns
Affine
6 unknowns
Perspective
8 unknowns
3D rotation
3 unknowns
Example: affine motion
$$u(x, y) = a_1 + a_2 x + a_3 y, \qquad v(x, y) = a_4 + a_5 x + a_6 y$$
• Substituting into the brightness constancy equation $I_x u + I_y v + I_t \approx 0$:
$$I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \approx 0$$
Each pixel provides 1 linear constraint in 6 global unknowns. Least-squares minimization (over all pixels):
$$\mathrm{Err}(\mathbf{a}) = \sum \left[ I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \right]^2$$
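Stacking one such constraint per pixel gives an overdetermined linear system in the six parameters. A minimal sketch (my own function name and row layout), solved with `np.linalg.lstsq`:

```python
import numpy as np

def affine_flow(Ix, Iy, It):
    """Least-squares affine motion: one row per pixel of
    [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] . a = -It, solved for a1..a6."""
    h, w = Ix.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    xr, yr = x.ravel(), y.ravel()
    M = np.stack([ix, ix * xr, ix * yr, iy, iy * xr, iy * yr], axis=1)
    a, *_ = np.linalg.lstsq(M, -it, rcond=None)
    return a   # u = a1 + a2 x + a3 y,  v = a4 + a5 x + a6 y
```

On synthetic data where It is generated exactly from a known affine field (It = -(Ix u + Iy v)), the fit recovers the six parameters to machine precision, since the system is consistent and of full rank for generic gradients.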
Parametric motion
$$\mathbf{x}'_i(\mathbf{x}_i; \mathbf{p}): \text{ motion field or correspondence map}$$
$$E(\mathbf{p}) = \sum_i \left( I_1(\mathbf{x}'_i) - I_0(\mathbf{x}_i) \right)^2$$
Linearizing about a current estimate,
$$\mathbf{x}'_i(\mathbf{x}_i; \mathbf{p} + \Delta\mathbf{p}) \approx \mathbf{x}'_i + \mathbf{J}(\mathbf{x}_i)\, \Delta\mathbf{p}, \qquad \mathbf{J}(\mathbf{x}_i) = \frac{\partial \mathbf{x}'}{\partial \mathbf{p}}(\mathbf{x}_i)$$
$$E(\mathbf{p} + \Delta\mathbf{p}) \approx \sum_i \left( \nabla I_1^T(\mathbf{x}'_i)\, \mathbf{J}(\mathbf{x}_i)\, \Delta\mathbf{p} + e_i \right)^2, \qquad e_i = I_1(\mathbf{x}'_i) - I_0(\mathbf{x}_i)$$
i.e., the product of image gradient with Jacobian of the correspondence field
Parametric motion
The normal equations are $A\, \Delta\mathbf{p} = -\mathbf{b}$ with
$$A = \sum_i \mathbf{J}^T(\mathbf{x}_i) \left[ \nabla I_1(\mathbf{x}'_i)\, \nabla I_1^T(\mathbf{x}'_i) \right] \mathbf{J}(\mathbf{x}_i), \qquad \mathbf{b} = \sum_i e_i\, \mathbf{J}^T(\mathbf{x}_i)\, \nabla I_1(\mathbf{x}'_i)$$
The expressions inside the brackets are the same as in the simpler translational motion case.
Other 2D Motion Models
Quadratic – instantaneous approximation to planar motion:
$$u = q_1 + q_2 x + q_3 y + q_7 x^2 + q_8 xy$$
$$v = q_4 + q_5 x + q_6 y + q_7 xy + q_8 y^2$$
Projective – exact planar motion:
$$x' = \frac{h_1 + h_2 x + h_3 y}{h_7 + h_8 x + h_9 y}, \qquad y' = \frac{h_4 + h_5 x + h_6 y}{h_7 + h_8 x + h_9 y}$$
and
$$u = x' - x, \qquad v = y' - y$$
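The projective model maps directly to code. A sketch assuming the slide's parameterization of the numerators and denominators (the function name is my own):

```python
import numpy as np

def homography_motion(h, x, y):
    """Motion induced by a planar projective transform:
    x' = (h1 + h2 x + h3 y) / (h7 + h8 x + h9 y), similarly for y';
    returns (u, v) = (x' - x, y' - y)."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    d = h7 + h8 * x + h9 * y
    xp = (h1 + h2 * x + h3 * y) / d
    yp = (h4 + h5 * x + h6 * y) / d
    return xp - x, yp - y

# identity homography -> zero motion; pure translation -> constant motion
u0, v0 = homography_motion((0, 1, 0, 0, 0, 1, 1, 0, 0), 3.0, 4.0)  # (0, 0)
u1, v1 = homography_motion((2, 1, 0, 3, 0, 1, 1, 0, 0), 3.0, 4.0)  # (2, 3)
```

Note the 9 parameters have only 8 degrees of freedom (the model is invariant to a common scale), matching the "8 unknowns" of the perspective model above.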
3D Motion Models
Instantaneous camera motion:
$$u = -xy\,\omega_X + (1 + x^2)\,\omega_Y - y\,\omega_Z + \frac{T_X - T_Z\, x}{Z}$$
$$v = -(1 + y^2)\,\omega_X + xy\,\omega_Y + x\,\omega_Z + \frac{T_Y - T_Z\, y}{Z}$$
Global parameters: $\omega_X, \omega_Y, \omega_Z, T_X, T_Y, T_Z$; local parameter: $Z(x, y)$

Homography + epipole:
$$x' = \frac{h_1 x + h_2 y + h_3 + \gamma t_1}{h_7 x + h_8 y + h_9 + \gamma t_3}, \qquad y' = \frac{h_4 x + h_5 y + h_6 + \gamma t_2}{h_7 x + h_8 y + h_9 + \gamma t_3}$$
and $u = x' - x$, $v = y' - y$
Global parameters: $h_1, \ldots, h_9, t_1, t_2, t_3$; local parameter: $\gamma(x, y)$

Residual planar parallax motion:
$$u = \frac{\gamma}{1 + \gamma t_3}\,(t_1 - t_3 x), \qquad v = \frac{\gamma}{1 + \gamma t_3}\,(t_2 - t_3 y)$$
Global parameters: $t_1, t_2, t_3$; local parameter: $\gamma(x, y)$
Shi-Tomasi feature tracker
1. Find good features (min eigenvalue of the 2x2 Hessian)
2. Use Lucas-Kanade to track with pure translation
3. Use affine registration with first feature patch
4. Terminate tracks whose dissimilarity gets too large
5. Start new tracks when needed
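Step 1, the "good features" test, scores a pixel by the smaller eigenvalue of the same 2x2 gradient matrix used by Lucas-Kanade. A sketch with my own function name and window size:

```python
import numpy as np

def shi_tomasi_score(I, y, x, half=3):
    """Min eigenvalue of the 2x2 gradient matrix summed over a window
    around (y, x): near zero for flat regions and edges, large at
    corners and in textured areas."""
    Ix = np.gradient(I, axis=1)
    Iy = np.gradient(I, axis=0)
    wx = Ix[y - half:y + half + 1, x - half:x + half + 1]
    wy = Iy[y - half:y + half + 1, x - half:x + half + 1]
    A = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                  [np.sum(wx * wy), np.sum(wy * wy)]])
    return np.linalg.eigvalsh(A)[0]  # eigenvalues sorted ascending
```

The score directly encodes the aperture-problem discussion above: a flat patch has both eigenvalues near zero, a straight edge has one large and one near-zero eigenvalue (only normal flow is constrained), and only a corner makes both large, so thresholding the minimum eigenvalue selects trackable points.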
Tracking results
Tracking - dissimilarity
Tracking results
Correlation window size
• Small windows lead to more false matches
• Large windows are better this way, but...
– Neighboring flow vectors will be more correlated (since the template windows have more in common)
– Flow resolution is also lower (same reason)
– More expensive to compute
• Small windows are good for local search: more detailed and less smooth (noisy?)
• Large windows are good for global search: less detailed and smoother
Robust estimation
Noise distributions are often non-Gaussian, having much heavier tails. Noise samples from the tails are called outliers.
• Sources of outliers (multiple motions):
– specularities / highlights
– jpeg artifacts / interlacing / motion blur
– multiple motions (occlusion boundaries, transparency)
(figure: two clusters u1 and u2 in velocity space)
How to handle background and foreground motion
Robust estimation
Standard least-squares estimation allows too much influence for outlying points:
$$E(m) = \sum_i \rho(x_i, m), \qquad \rho(x_i, m) = (x_i - m)^2$$
$$\text{Influence: } \psi(x_i, m) = \frac{\partial \rho}{\partial m} = -2\,(x_i - m)$$
Robust estimation
Robust gradient constraint:
$$E_R(u, v) = \sum_s \rho\!\left( I_x u + I_y v + I_t \right)$$
Robust SSD:
$$E_R(u, v) = \sum_s \rho\!\left( I(x + u, y + v) - I(x, y) \right)$$
Robust estimation
Problem: least-squares estimators penalize deviations between data and model with a quadratic error function, which is extremely sensitive to outliers:
$$\rho(s) = s^2 \; \text{(error penalty function)}, \qquad \psi(s) = 2s \; \text{(influence function)}$$
Redescending error functions (e.g., Geman-McClure) help to reduce the influence of outlying measurements:
$$\rho(s; \sigma) = \frac{s^2}{\sigma^2 + s^2}, \qquad \psi(s; \sigma) = \frac{2 s \sigma^2}{(\sigma^2 + s^2)^2}$$
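The two properties that matter, a bounded penalty and a redescending influence, are easy to verify numerically. A small sketch of the Geman-McClure pair (function name mine):

```python
import numpy as np

def geman_mcclure(s, sigma=1.0):
    """Geman-McClure penalty rho(s; sigma) = s^2 / (sigma^2 + s^2)
    and its influence function psi = d rho / d s."""
    s = np.asarray(s, dtype=float)
    rho = s ** 2 / (sigma ** 2 + s ** 2)          # saturates at 1
    psi = 2 * s * sigma ** 2 / (sigma ** 2 + s ** 2) ** 2  # redescends
    return rho, psi
```

Unlike the quadratic penalty, whose influence 2s grows without bound, psi peaks near s = sigma and then decays toward zero, so a gross outlier contributes almost nothing to the estimate.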
Performance evaluation
• See Baker et al., “A Database and Evaluation Methodology for Optical Flow”, ICCV 2007
• Algorithms:
– Pyramid LK: OpenCV-based implementation of Lucas-Kanade on a Gaussian pyramid
– Black and Anandan: authors' implementation
– Bruhn et al.: our implementation
– MediaPlayer: code used for video frame-rate upsampling in Microsoft MediaPlayer
– Zitnick et al.: authors' implementation
Performance evaluation
• Difficulty: Data substantially more challenging than Yosemite
• Diversity: Substantial variation in difficulty across the various datasets
• Motion GT vs Interpolation: Best algorithms for one are not the best for the other
• Comparison with Stereo: Performance of existing flow algorithms appears weak
Motion representations
• How can we describe this scene?
Block-based motion prediction
• Break the image up into square blocks
• Estimate translation for each block
• Use this to predict the next frame, code the difference (MPEG-2)
Layered motion
• Break the image sequence up into “layers”
• Describe each layer's motion
Layered motion
• Advantages:
– can represent occlusions / disocclusions
– each layer's motion can be smooth
– video segmentation for semantic processing
• Difficulties:
– how do we determine the correct number of layers?
– how do we assign pixels?
– how do we model the motion?
Layers for video summarization
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
What are layers?
• Intensities
• Velocities
• Layers
• Alpha matting
• Sprites
• Wang and Adelson, “Representing moving images with layers”, IEEE Transactions on Image Processing, 1994
How do we form them?
How do we estimate the layers?
1. compute coarse-to-fine flow
2. estimate affine motion in blocks (regression)
3. cluster with k-means
4. assign pixels to the best-fitting affine region
5. re-estimate affine motions in each region
...
Layer synthesis
• For each layer:
– stabilize the sequence with the affine motion
– compute the median value at each pixel
• Determine occlusion relationships
Results
Applications
• Motion analysis• Video coding• Morphing• Video denoising• Video stabilization • Medical image registration• Frame interpolation