video processing communications yao wang chapter6b

32
Two-Dimensional Motion Estimation (Continuing) Yao Wang olytechnic University, Brooklyn, NY11201

Upload: ashoka-vanjare

Post on 17-May-2017

236 views

Category:

Documents


11 download

TRANSCRIPT

Page 1: Video Processing Communications Yao Wang Chapter6b

Two-Dimensional Motion Estimation(Continuing)

Yao WangPolytechnic University, Brooklyn, NY11201

Page 2: Video Processing Communications Yao Wang Chapter6b

Outline (Chapter 6.3,6.4,6.9)

• Pixel-based motion estimation• Block-based motion estimation

– EBMA algorithm

• Problems with EBMA• Multiresolution motion estimation

Page 3: Video Processing Communications Yao Wang Chapter6b

3

Pixel-Based Motion Estimation

• Have talked about general ideas of motion estimation• Now consider specific approaches• Start with Pixel-based motion estimation• Then consider Block-based motion estimation• If using constant intensity assumption -> ill-defined as

many pixels have the same intensity• Hence the several different approaches:

1. Regularization to enforce smoothness2. Constant MV in a neighborhood assumption3. Additional invariance constraints: intensity gradient invariance4. Consider relationship between phase function of two frames

(using Fourier transform of each frame

Page 4: Video Processing Communications Yao Wang Chapter6b

4

Pixel-Based Motion Estimation

• Horn-Schunck method– OF + smoothness criterion

• Multipoint neighborhood method– Assuming every pixel in a small block surrounding a pixel

has the same MV

• Pel-recurrsive method (pel==pixel)– MV for a current pel is updated from those of its previous

pels, so that the MV does not need to be coded– Developed for early generation of video coder– Motion estimation is poor, hence poor compression

Page 5: Video Processing Communications Yao Wang Chapter6b

5

Pixel-Based Motion Estimation

• Horn-Schunck method

Page 6: Video Processing Communications Yao Wang Chapter6b

6

Multipoint Neighborhood Method

• Estimate the MV at each pixel independently, by minimizing the DFD (displaced frame difference) error over a neighborhood surrounding this pixel

• Every pixel in the neighborhood B is assumed to have the same MV

• Minimizing function:

• Optimization method:– Exhaustive search (feasible as one only needs to search one MV

at a time)• Need to select appropriate search range and search step-size

– Gradient-based method

smaller)(farther weight pixel - x)(

min)()()()()(

212nDFD

w

wEnB

nxx

xdxxd

Page 7: Video Processing Communications Yao Wang Chapter6b

7

Example: Gradient Descent Method

iterationsmany too take small toominimum thearound oscillatemay solution large too

econvergencfor carefully choose toneed size, step - numberiteration - where)(

:descentgradient order First

)()(),( where),()()(

min)()()()(

)(n

)(n

)1(n

122

)(nn

)(

212nDFD

l

eewE

wE

lll

nnB

n

Bn

nn

n

dgdd

xdxdxx

dxxd

dg

xdxxd

dxxx

xx

Page 8: Video Processing Communications Yao Wang Chapter6b

8

Example: Gradient Descent Method

pixels. at those valuesDFD weightedby the scaled pixels, at various gradients image of sum on the dependsiteration each at update The

again size step - )()(

)(

derivative second dropping

))()()((

)),()(()(

matrix)Hessian (needsRalphson -Newton

)(n

1)(n

)(n

)1(n

22

)(

22

222

)(

2

)(n2

n

2

n

llll

T

B

n

T

B

Bn

nn

nnn

n

w

eww

ewEE

dgdHdd

xxx

xdxx

xxx

xdxx

dddH

dxxx

dxdxxx

xx

Page 9: Video Processing Communications Yao Wang Chapter6b

9

Simplification Using OF Criterion

)(121

1

)(11optn,

1)(

121n

)(

2121nOF

)()()()()()()(

0)()()()()(

min)()()()()(

nn

n

n

BB

T

Bn

T

Bn

T

ww

wE

wE

xxxx

xx

xx

xxxxxxxd

xxxdxxd

xxdxxd

The solution is good only if the actual MV is small. When this is not the case, one should iterate the above solution, with the following update:

iterationat that found MV thedenote

)()(

)1(

)1()(n

)1(n

)(n2

)1(2

ln

ln

ll

ll

where

dd

dxx

Page 10: Video Processing Communications Yao Wang Chapter6b

10

Block-Based Motion Estimation:Overview

• One way to impose smoothness is use non-overlapping blocks• Assume all pixels in a block undergo a coherent motion, and

search for the motion parameters for each block independently• Blocks can be any shape – but usually squares• Simplest case: constant motion for the block• Block matching algorithm (BMA): assume translational motion, 1

MV per block (2 parameter)– Exhaustive BMA (EBMA)– Fast algorithms

• Deformable block matching algorithm (DBMA): allow more complex motion (affine, bilinear), to be discussed later.

Page 11: Video Processing Communications Yao Wang Chapter6b

11

Block Matching Algorithm

• Overview:– Assume all pixels in a block undergo a translation, denoted by a

single MV– Estimate the MV for each block independently, by minimizing the

DFD error over this block (as prediction error affects just this block)• Minimizing function:

• Optimization method:– Exhaustive search (feasible as one only needs to search one MV at

a time) within search region, using MAD criterion (p=1)– Fast search algorithms– Integer vs. fractional pel accuracy search

min)()()( 12mDFD mB

pmE

xxdxd

Page 12: Video Processing Communications Yao Wang Chapter6b

12

Exhaustive Block Matching Algorithm (EBMA)

Page 13: Video Processing Communications Yao Wang Chapter6b

13

Complexity of Integer-Pel EBMA

• Assumption– Image size: MxM– Block size: NxN– Search range: (-R,R) in each dimension– Search stepsize: 1 pixel (assuming integer MV)

• Operation counts (1 operation=1 “-”, 1 “+”, 1 “*”):– Each candidate position: N^2– Each block going through all candidates: (2R+1)^2 N^2– Entire frame: (M/N)^2 (2R+1)^2 N^2=M^2 (2R+1)^2

• Independent of block size!• Example: M=512, N=16, R=16, 30 fps

– Total operation count = 2.85x10^8/frame =8.55x10^9/second• Regular structure suitable for VLSI implementation• Challenging for software-only implementation

Page 14: Video Processing Communications Yao Wang Chapter6b

14

Sample Matlab Script for Integer-pel EBMA

%f1: anchor frame; f2: target frame, fp: predicted image;%mvx,mvy: store the MV image%widthxheight: image size; N: block size, R: search range

for i=1:N:height-N, for j=1:N:width-N %for every block in the anchor frame MAD_min=256*N*N;mvx=0;mvy=0; for k=-R:1:R, for l=-R:1:R %for every search candidate MAD=sum(sum(abs(f1(i:i+N-1,j:j+N-1)-f2(i+k:i+k+N-1,j+l:j+l+N-1))));

% calculate MAD for this candidate if MAD<MAX_min

MAD_min=MAD,dy=k,dx=l; end; end;end; fp(i:i+N-1,j:j+N-1)= f2(i+dy:i+dy+N-1,j+dx:j+dx+N-1);

%put the best matching block in the predicted image iblk=(floor)(i-1)/N+1; jblk=(floor)(j-1)/N+1; %block index mvx(iblk,jblk)=dx; mvy(iblk,jblk)=dy; %record the estimated MVend;end;

Note: A real working program needs to check whether a pixel in the candidate matching block falls outside the image boundary and such pixel should not count in MAD. This program is meant to illustrate the main operations involved. Not the actual working matlab script.

Page 15: Video Processing Communications Yao Wang Chapter6b

15

Fractional Accuracy EBMA

• Real MV may not always be multiples of pixels. To allow sub-pixel MV, the search step_size must be less than 1 pixel

• Half-pel EBMA: step_size=1/2 pixel in both dimension• Difficulty:

– Target frame only have integer pels• Solution:

– Interpolate the target frame by factor of two before searching– Bilinear interpolation is typically used

• Complexity: – 4 times of integer-pel, plus additional operations for interpolation.

• Fast algorithms:– Search in integer precisions first, then refine in a small search

region in half-pel accuracy.

Page 16: Video Processing Communications Yao Wang Chapter6b

16

Half-Pel Accuracy EBMA

Page 17: Video Processing Communications Yao Wang Chapter6b

17

Bilinear Interpolation

(x+1,y)(x,y)

(x+1,y+1)(x,y+!)

(2x,2y) (2x+1,2y)

(2x,2y+1) (2x+1,2y+1)

O[2x,2y]=I[x,y]O[2x+1,2y]=(I[x,y]+I[x+1,y])/2O[2x,2y+1]=(I[x,y]+I[x+1,y])/2O[2x+1,2y+1]=(I[x,y]+I[x+1,y]+I[x,y+1]+I[x+1,y+1])/4

Page 18: Video Processing Communications Yao Wang Chapter6b

Pre

dict

ed a

ncho

r fra

me

(29.

86dB

)an

chor

fram

e

targ

et fr

ame

Mot

ion

field

Example: Half-pel EBMA

Page 19: Video Processing Communications Yao Wang Chapter6b

19

Pros and Cons with EBMA

• Blocking effect (discontinuity across block boundary) in the predicted image– Because the block-wise translation model is not accurate– Fix: Deformable BMA

• Motion field somewhat chaotic– because MVs are estimated independently from block to block– Fix 1: Mesh-based motion estimation – Fix 2: Imposing smoothness constraint explicitly (over blocks)

• Wrong MV in the flat region– because motion is indeterminate when spatial gradient is near zero

• Nonetheless, widely used for motion compensated prediction in video coding– Because its simplicity and optimality in minimizing prediction error

Page 20: Video Processing Communications Yao Wang Chapter6b

20

Fast Algorithms for BMA

• Key idea to reduce the computation in EBMA: – Reduce # of search candidates:

• Only search for those that are likely to produce small errors. • Predict possible remaining candidates, based on previous search result

– Simplify the error measure (DFD) to reduce the computation involved for each candidate

• Classical fast algorithms– Three-step– 2D-log– Conjugate direction

• Many new fast algorithms have been developed since then– Some suitable for software implementation, others for VLSI

implementation (memory access, etc)

Page 21: Video Processing Communications Yao Wang Chapter6b

21

Fast Algorithms for BMA

-if center=best-reduce step size

-if still center = best-stop

-Similar but fixed # of steps-9 candidates-Step size reduced at each step

Page 22: Video Processing Communications Yao Wang Chapter6b

22

Problems with EBMA (section 6.9)

• Blocking effect (discontinuity across block boundary) in the predicted image– Because the block-wise translation model is not accurate– Real motion in a block may be more complicated than

translation• Fix: Deformable BMA = DBMA (next lecture)

– There may be multiple objects with different motions in a block

• Fix: – region-based motion estimation– mesh-based using adaptive mesh

– Intensity changes may be due to illumination effect• Should compensate for illumination effect before applying

“constant intensity assumption”

Page 23: Video Processing Communications Yao Wang Chapter6b

23

Problems with EBMA

• Motion field somewhat chaotic– because MVs are estimated independently from block to block– Fix:

• Imposing smoothness constraint explicitly• Multi-resolution approach• Mesh-based motion estimation

• Wrong MV in the flat region– because motion is indeterminate when spatial gradient is near zero– Ideally, should use non-regular partition– Fix: region based motion estimation

• Requires tremendous computation!– Fix:

• fast algorithms• Multi-resolution

Page 24: Video Processing Communications Yao Wang Chapter6b

24

Multi-resolution Motion Estimation

• Problems with BMA– Unless exhaustive search is used, the solution may not be global

minimum– Exhaustive search requires extremely large computation– Block wise translation motion model is not always appropriate

• Multi-resolution approach– Aim to solve the first two problems– First estimate the motion in a coarse resolution over low-pass

filtered, down-sampled image pair• Can usually lead to a solution close to the true motion field

– Then modify the initial solution in successively finer resolution within a small search range

• Reduce the computation– Can be applied to different motion representations, but we will focus

on its application to BMA

Page 25: Video Processing Communications Yao Wang Chapter6b

25

Hierarchical Block Matching Algorithm (HBMA)

Page 26: Video Processing Communications Yao Wang Chapter6b

26

Hierarchical Block Matching Algorithm (HBMA)

Benefits:1. - minimization problem at coarser resolution is better-posed than at a finer resolution: hence close to a true solution - interpolation of that solution to finer resolution provides good initial solution - more likely to reach global max2. - search range significantly reduced - reduced computation

Page 27: Video Processing Communications Yao Wang Chapter6b

27

estimate

Page 28: Video Processing Communications Yao Wang Chapter6b

28

Pre

dict

ed a

ncho

r fra

me

(29.

32dB

)

Example: Three-level HBMA

Page 29: Video Processing Communications Yao Wang Chapter6b

29

Pre

dict

ed a

ncho

r fra

me

(29.

86dB

)

Mot

ion

field

Example: Half-pel EBMA

Page 30: Video Processing Communications Yao Wang Chapter6b

30

Computation Requirement of HBMA

• Assumption– Image size: MxM; Block size: NxN at every level; Levels: L– Search range:

• 1st level: R/2^(L-1) (Equivalent to R in L-th level)• Other levels: R/2^(L-1) (can be smaller)

• Operation counts for EBMA – image size M, block size N, search range R– # operations:

• Operation counts at l-th level (Image size: M/2^(L-l))

• Total operation count

• Saving factor:

22)2(

1

21244

3112/22/ RMRM L

L

l

LlL

21212/22/ LlL RM

22 12 RM

)3(12);2(343 )2( LLL

Page 31: Video Processing Communications Yao Wang Chapter6b

31

Summary (6.1-6.4, 6.9)

• Optical flow equation– Derived from constant intensity and small motion assumption– Ambiguity in motion estimation

• How to represent motion:– Pixel-based, block-based, region-based, global, etc.

• Estimation criterion:– DFD (constant intensity)– OF (constant intensity+small motion)– Bayesian (MAP, DFD+motion smoothness)

• Search method:– Exhaustive search, gradient-descent, multi-resolution (next lecture)

• Pixel-based motion estimation– Most accurate representation, but also most costly to estimate

• Block-based motion estimation– Good trade-off between accuracy and speed– EBMA and its fast but suboptimal variant is widely used in video coding for

motion-compensated temporal prediction.• Multi-resolution methods

Page 32: Video Processing Communications Yao Wang Chapter6b

32

VcDemo Example

VcDemo: Image and Video Compression Learning Tool  Developed at Delft University of Technology

http://www-ict.its.tudelft.nl/~inald/vcdemo/

Use the ME tool to show the motion estimation results with different parameter choices