frame by frame bit allocation for motion-compensated video michael ringenburg may 9, 2003

Frame by Frame Bit Allocation for Motion-Compensated Video

Michael Ringenburg

May 9, 2003

The Problem…

• Given a maximum bit budget $B$ and a video with $F$ frames, how many bits $b_i$ should we allocate to each frame $f_i$ in order to maximize the overall quality of the video?

• Formally, our constraint is:

$$\sum_{i=1}^{F} b_i \le B$$

• We assume a lossy, embedded coding scheme. Thus we can choose the exact number of bits to allocate to each frame.

Rate-Distortion Curves

• If we increase the number of bits allocated to a frame (the frame’s bit rate) and hold everything else constant, the frame’s distortion (the mean squared error or MSE) decreases.

• The distortion decays exponentially in the number of bits $b$ (proportionally to $2^{-b}$).

Motion Compensation

• Each frame is predicted from the previous frame. We find blocks in the previous frame that are similar to blocks in the current frame, and calculate motion vectors that estimate the displacement between the previous and the current blocks. We encode only the difference, or “residue”, between the predicted and actual frame.

• This complicates the task of bit allocation, because the quality of a frame depends not only on its rate, but also on the rates of all of the previous frames.
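The block search described above can be sketched as follows. This is a minimal illustration, assuming an exhaustive search over a small window with the sum of absolute differences (SAD) as the similarity measure; the function names, the 2x2 block size, and the toy frames are hypothetical and not taken from any particular coder.

```python
def sad(prev, cur, bi, bj, di, dj, size):
    """Sum of absolute differences between the block of `cur` at (bi, bj)
    and the block of `prev` displaced by (di, dj)."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[bi + i][bj + j] - prev[bi + di + i][bj + dj + j])
    return total


def best_motion_vector(prev, cur, bi, bj, size=2, search=1):
    """Try every displacement within +/- `search` pixels and return the
    (di, dj) with the smallest SAD, skipping blocks that leave the frame."""
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            if not (0 <= bi + di and bi + di + size <= rows):
                continue
            if not (0 <= bj + dj and bj + dj + size <= cols):
                continue
            cost = sad(prev, cur, bi, bj, di, dj, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (di, dj), cost
    return best, best_cost


# Toy frames: a 2x2 patch moves one pixel to the right between frames.
prev = [[0, 0, 0, 0],
        [9, 8, 0, 0],
        [7, 6, 0, 0],
        [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]

mv, cost = best_motion_vector(prev, cur, 1, 1)
```

Here the best match for the block at row 1, column 1 of the current frame lies one column to the left in the previous frame, and the residue for that block is zero.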

Measuring Video Quality

• We can measure the quality of individual frames with MSE (distortion) or PSNR (Peak Signal-to-Noise Ratio), but how do we measure the overall quality of the whole video?

• Method 1 (MMSE): Minimize the Mean MSE.

• Method 2 (MINMAX): Minimize the Maximum MSE. Leads to constant quality, which may be more visually appealing.
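The two criteria can disagree about which allocation is better. The sketch below, with hypothetical function names and made-up per-frame MSE values, computes both; the PSNR helper assumes 8-bit samples (peak value 255).

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally long sample sequences."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)


def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given MSE (8-bit peak assumed)."""
    return 10.0 * math.log10(peak * peak / mse_value)


def mean_mse(frame_mses):
    """MMSE criterion: average distortion over all frames."""
    return sum(frame_mses) / len(frame_mses)


def max_mse(frame_mses):
    """MINMAX criterion: distortion of the worst frame."""
    return max(frame_mses)


# Two hypothetical outcomes for the same four-frame video.
steady = [20.0, 20.0, 20.0, 20.0]
uneven = [5.0, 5.0, 5.0, 45.0]
```

With these numbers, MMSE prefers the uneven result (mean 15 versus 20), while MINMAX prefers the steady one (worst frame 20 versus 45).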

Outline of Talk

• Cheng-Li-Kuo ‘97 algorithm for MMSE

• Yang-Hemami ‘99 algorithm for MINMAX

• If time permits:
– Adapting these algorithms for Group Testing for Video (GTV)
– A new algorithm for the MINMAX metric

Preliminaries

• I frames: “Independent” frames - not predicted. Distortion of an I frame depends only on its own bit rate.

• P frames: “Predicted” frames. Distortion depends on own bit rate, plus bit rate of most recent I frame and all P frames in between.

• Group of Pictures (GOP): I frame, followed by some number of P frames.

• Both algorithms optimize the allocation within each GOP separately.

Cheng-Li-Kuo ‘97 Algorithm

• From “Rate Control for an Embedded Wavelet Video Coder”, by Po-Yuen Cheng, Jin Li, and C.-C. Jay Kuo.

• Minimizes the MMSE of each Group of Pictures.

• Based on experimental observations of rate-distortion and motion-compensation behavior of typical videos.

• Uses Lagrange Multiplier method to derive optimal allocations.

Rate-Distortion Curves

• Authors experimentally determined that rate-distortion curves can be approximated by:

$$D = D_{max} \, 2^{-\beta R} = \sigma^2 \, 2^{-\beta R}$$

• $D_{max}$ is the distortion at rate 0. This is equivalent to the variance $\sigma^2$ of the wavelet coefficients.
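As a small illustration of this model, the sketch below evaluates the curve and its inverse. The function names and the sample numbers (variance 100, coding efficiency 1.5) are hypothetical.

```python
import math


def distortion(rate, variance, beta):
    """Model distortion at a given rate: variance * 2^(-beta * rate)."""
    return variance * 2.0 ** (-beta * rate)


def rate_for(target_distortion, variance, beta):
    """Inverse of the model: the rate needed to reach a target distortion."""
    return math.log2(variance / target_distortion) / beta


d0 = distortion(0.0, 100.0, 1.5)   # distortion at rate 0 equals the variance
d2 = distortion(2.0, 100.0, 1.5)   # 100 * 2^-3
r = rate_for(d2, 100.0, 1.5)       # recovers the rate of 2
```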

More on the β parameter

• Larger β indicates more efficient coding.

• $\beta_I$ is typically larger than $\beta_P$, because the I frame quality is dependent only on its allocation.

• The ratio $\beta_I / \beta_P$ is usually between 1.1 and 1.4.

• Examples:
– Flower: $\beta_I$ = 2.07, $\beta_P$ = 1.50
– Mobile: $\beta_I$ = 1.65, $\beta_P$ = 1.50
– Tennis: $\beta_I$ = 1.08, $\beta_P$ = 0.86
– Cheer: $\beta_I$ = 1.72, $\beta_P$ = 1.43
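A quick check of the stated ratio range against the four example sequences, using only the coding-efficiency values listed above (the dictionary layout is just for illustration):

```python
# (I-frame value, P-frame value) for each example sequence on the slide.
betas = {
    "Flower": (2.07, 1.50),
    "Mobile": (1.65, 1.50),
    "Tennis": (1.08, 0.86),
    "Cheer": (1.72, 1.43),
}

# Ratio of I-frame to P-frame coding efficiency, rounded to two decimals.
ratios = {name: round(b_i / b_p, 2) for name, (b_i, b_p) in betas.items()}
```

The rounded ratios are 1.38, 1.10, 1.26, and 1.20, all within the quoted range.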

Frame Dependency

• If e is the residue of motion compensation, f is the predicted frame, d is the displacement, and g is the reference frame after lossy encoding, then:

$$e(i,j) = f(i,j) - g[d(i,j)]$$

• Then if E represents the expected value, and the residue is assumed to have zero mean, the variance is:

$$\sigma^2 = E[e^2(i,j)] - E[e(i,j)]^2 = E(\{f(i,j) - g[d(i,j)]\}^2)$$

Frame Dependency

• Let a be the actual reference frame (as opposed to the reference frame after lossy encoding):

$$e(i,j) = (f(i,j) - a[d(i,j)]) + (a[d(i,j)] - g[d(i,j)])$$

• This is the residue with respect to the original reference frame, plus the coding error of the reference frame. We assume they are not correlated.

Frame Dependency

• Let $\sigma_a^2$ be the variance if the reference frame was perfectly coded:

$$\sigma^2 = \sigma_a^2 + E(\{a[d(i,j)] - g[d(i,j)]\}^2)$$

• The second part is the distortion of the motion-translated reference frame, which is linearly related to the actual distortion $D_f$ of the non-translated reference frame. Thus, with proportionality constant $\alpha$:

$$\sigma^2 = \sigma_a^2 + \alpha D_f$$
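To see how this dependency couples the frames, the sketch below chains the variance relation with the exponential rate-distortion model through a short GOP. The function name and all parameter values are hypothetical; the first frame is treated as an I frame with no inherited error.

```python
def gop_distortions(rates, var_a, alpha, beta):
    """Propagate sigma_i^2 = var_a[i] + alpha[i] * D_(i-1) through a GOP and
    return each frame's distortion D_i = sigma_i^2 * 2^(-beta[i] * rates[i])."""
    distortions = []
    prev_d = 0.0  # the I frame has no reference frame, so nothing is inherited
    for r, va, a, b in zip(rates, var_a, alpha, beta):
        sigma2 = va + a * prev_d
        prev_d = sigma2 * 2.0 ** (-b * r)
        distortions.append(prev_d)
    return distortions


var_a = [100.0, 40.0, 40.0]
alpha = [0.0, 0.8, 0.8]     # alpha[0] is unused for the I frame
beta = [2.0, 1.5, 1.5]

even = gop_distortions([2.0, 2.0, 2.0], var_a, alpha, beta)
starved_i = gop_distortions([1.0, 2.0, 2.0], var_a, alpha, beta)
```

Cutting the I frame's rate from 2 to 1 raises the distortion of both later frames even though their own rates are unchanged, which is the effect that makes the allocation problem coupled.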

More on the α parameter

• α typically ranges from 0.5 to 0.9.

• Higher α indicates better quality motion compensation.

• α decreases if there is violent motion or a scene change.

Lagrange multiplier method

• Using the experimentally observed rate-distortion model and the frame dependency we just derived, we can solve for the optimal allocation using the Lagrange multiplier method. Let $R_{GOP}$ be the number of bits assigned to a group of pictures. We minimize:

$$\sum_{i=1}^{N} D_i + \lambda \left( \left( \sum_{i=1}^{N} R_i \right) - R_{GOP} \right)$$

where

$$D_i = D_{max,i} \, 2^{-\beta_i R_i} = \sigma_i^2 \, 2^{-\beta_i R_i}$$
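As a sketch of how this minimization produces closed-form distortions, consider the last frame only. It assumes the variance model $\sigma_N^2 = \sigma_{a,N}^2 + \alpha_N D_{N-1}$ from the frame-dependency slides, and it writes the constant as $K = \lambda / \ln 2$, which is a naming assumption of this sketch rather than something stated in the paper. Inverting the rate-distortion model gives

$$R_N = \frac{1}{\beta_N} \log_2 \frac{\sigma_N^2}{D_N}$$

No frame is predicted from the last one, so $D_N$ appears only in its own distortion and rate terms. Setting the derivative to zero:

$$\frac{\partial}{\partial D_N} \left[ D_N + \lambda R_N \right] = 1 - \frac{\lambda}{\beta_N D_N \ln 2} = 0 \quad \Rightarrow \quad D_N = \frac{\lambda}{\beta_N \ln 2} = \frac{K}{\beta_N}$$

For an earlier frame, $D_i$ also enters the variance of frame $i+1$, which adds one more term to the derivative and turns the condition into a quadratic in $D_i$.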

Solution

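• The authors solve the minimization, and derive:

$$D_N = \frac{K}{\beta_N}, \qquad D_{N-1} = \frac{K}{\beta_{N-1} F_{N-1}}$$

$$D_i = \frac{-B_i + \sqrt{B_i^2 - 4 A_i C_i}}{2 A_i}$$

$$F_{N-1} = 1 + \alpha_N \frac{K}{\beta_N \sigma_{a,N}^2}$$

$$A_i = \beta_i \beta_{i+1} \alpha_{i+1}$$

$$B_i = \beta_i \beta_{i+1} \sigma_{a,i+1}^2 + K \alpha_{i+1} (\beta_i - \beta_{i+1})$$

$$C_i = -K \beta_{i+1} \sigma_{a,i+1}^2$$

$$R_1 = \frac{1}{\beta_1} \log_2 \frac{\sigma_{a,1}^2}{D_1}$$

$$R_i = \frac{1}{\beta_i} \log_2 \frac{\alpha_i D_{i-1} + \sigma_{a,i}^2}{D_i}$$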

Parameters

• We need to determine the variance $\sigma_a^2$, the coding efficiency β, and the dependency α for every frame in the Group of Pictures before we begin coding it. Alternatively, if this is too expensive, we can estimate the values using the previous GOP.

• K is an adjustable parameter. We perform a binary search on K until the rate constraint is met.
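The search over K can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it applies the quadratic for $D_i$ to every frame before the last (the slides list a separate closed form for $D_{N-1}$), it bisects K on a logarithmic scale between hypothetical bounds, and all function names and parameter values are made up for the example.

```python
import math


def distortions_for(k, beta, alpha, var_a):
    """Per-frame distortions for a given K: D_N = K / beta_N, and the
    positive root of A*D^2 + B*D + C = 0 for every earlier frame."""
    n = len(beta)
    d = [0.0] * n
    d[n - 1] = k / beta[n - 1]
    for i in range(n - 1):
        a = beta[i] * beta[i + 1] * alpha[i + 1]
        b = (beta[i] * beta[i + 1] * var_a[i + 1]
             + k * alpha[i + 1] * (beta[i] - beta[i + 1]))
        c = -k * beta[i + 1] * var_a[i + 1]
        d[i] = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d


def rates_for(d, beta, alpha, var_a):
    """Rates implied by the distortions, using the R_1 and R_i formulas."""
    rates = [math.log2(var_a[0] / d[0]) / beta[0]]
    for i in range(1, len(d)):
        sigma2 = alpha[i] * d[i - 1] + var_a[i]
        rates.append(math.log2(sigma2 / d[i]) / beta[i])
    return rates


def allocate(r_gop, beta, alpha, var_a, lo=1e-9, hi=1e9, iters=200):
    """Bisect K (geometrically) until the rates sum to the GOP budget."""
    k = math.sqrt(lo * hi)
    for _ in range(iters):
        k = math.sqrt(lo * hi)
        d = distortions_for(k, beta, alpha, var_a)
        total = sum(rates_for(d, beta, alpha, var_a))
        if total > r_gop:
            lo = k   # spending too many bits: a larger K raises distortion
        else:
            hi = k
    d = distortions_for(k, beta, alpha, var_a)
    return k, d, rates_for(d, beta, alpha, var_a)


# Hypothetical 4-frame GOP: one I frame followed by three P frames.
beta = [2.0, 1.5, 1.5, 1.5]
alpha = [0.0, 0.8, 0.8, 0.8]      # alpha[0] is unused for the I frame
var_a = [100.0, 40.0, 40.0, 40.0]
k, d, rates = allocate(8.0, beta, alpha, var_a)
```

The bounds on K are chosen wide enough that the total rate exceeds the budget at the lower bound and falls below it at the upper bound for these parameters; a real implementation would need to pick them from the data.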

Experimental Results

Yang-Hemami ‘99 Algorithm

• From “MINMAX Frame Rate Control Using a Rate-Distortion Optimized Wavelet Coder”, by Yan Yang and Sheila Hemami.

• Minimizes the maximum distortion of any frame in the Group of Pictures.

• Leads to constant quality within a GOP.

Outline of Algorithm

• Let N be the number of frames in a GOP.

• Recall the first frame is an I frame, the rest are P frames.

• Step 1: Find rates $R_I$ and $R_P$ such that $R_I + (N-1) R_P = R_t$ and $D_1(R_I) = D_2(R_P) = D$, where $R_t$ is the target rate for the GOP.

• Step 2: Code the rest of the frames to distortion D. If the rate starts to get too high, adaptively raise D. If the rate starts to get too low, adaptively lower D.

Finding the Initial Rates

• Initially assume that all P frames have identical Rate-Distortion curves.

• Binary search:
– Let $R(D) = R_1(D) + (N-1) R_2(D)$
– Find a D1 and D2 such that $R(D1) < R_t < R(D2)$
– Repeat until $|R(D) - R_t| \le 1\%$ of $R_t$:
  • Let D = (D1 + D2)/2
  • If $R(D) < R_t$, let D1 = D
  • Else, let D2 = D

• Variant: force $D_I$ to be slightly less than $D_P$
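The search above can be sketched in a few lines. The real algorithm measures R(D) by running the coder; this illustration substitutes a hypothetical closed-form curve (the inverse of the exponential model, $R = \frac{1}{\beta} \log_2 \frac{\sigma^2}{D}$) so that it is self-contained. All names and numbers are assumptions made for the example.

```python
import math


def rate_model(d, variance, beta):
    """Stand-in R(D) curve; a real coder would be run here instead."""
    return math.log2(variance / d) / beta


def initial_rates(r_t, n, var_i, beta_i, var_p, beta_p,
                  d_hi, d_lo, tol=0.01, max_iter=100):
    """Bisect on the common distortion D until the GOP rate is within
    `tol` (1% by default) of the target r_t. Returns (D, R_I, R_P)."""
    def gop_rate(d):
        return rate_model(d, var_i, beta_i) + (n - 1) * rate_model(d, var_p, beta_p)

    d1, d2 = d_hi, d_lo          # bracket with R(d1) < r_t < R(d2)
    if not gop_rate(d1) < r_t < gop_rate(d2):
        raise ValueError("d_hi and d_lo do not bracket the target rate")
    d = (d1 + d2) / 2.0
    for _ in range(max_iter):
        d = (d1 + d2) / 2.0
        r = gop_rate(d)
        if abs(r - r_t) <= tol * r_t:
            break
        if r < r_t:
            d1 = d               # rate too low: move toward lower distortion
        else:
            d2 = d
    return d, rate_model(d, var_i, beta_i), rate_model(d, var_p, beta_p)


# Hypothetical 10-frame GOP with a 20-bit budget.
d, r_i, r_p = initial_rates(20.0, 10, 100.0, 2.0, 40.0, 1.5,
                            d_hi=40.0, d_lo=0.01)
```

Because the rate falls as the distortion rises, the bracket endpoint with the lower rate is the one with the higher distortion, which is why D1 starts at `d_hi`.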

Adaptive Adjustment Algorithm

• In reality, the P frames are not identical.

• Iterate over the P frames, coding each to the current target distortion $D_P$.

• Let Δ be the mismatch between $j R_P$ and the number of bits actually used to code the first j P frames. Let up and low be upper and lower bounds on the allowable mismatch.

• If Δ > up then raise $D_P$ using the update algorithm.

• If Δ < low then lower $D_P$ using the update algorithm.

Update Algorithm

• Code the current P frame j at rate $R_j = R_{j-1} - \Delta/(N - j)$, where Δ is the current mismatch between the bits used and the bits planned.

• If Δ is still outside the allowable range, we use the update algorithm again on the next frame.

• Once Δ is back within the acceptable range, we update the target distortion $D_P$.
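The bookkeeping behind these two slides can be sketched as small helpers. The sign convention is an assumption inferred from the rules above: a positive mismatch means more bits were spent than planned, which is why exceeding the upper bound calls for a higher target distortion. Function names and numbers are hypothetical.

```python
def mismatch(bits_used, j, r_p):
    """Bits actually spent on the first j P frames minus the planned j * r_p."""
    return bits_used - j * r_p


def adjustment(delta, low, up):
    """Which way the target distortion should move for a given mismatch."""
    if delta > up:
        return "raise"   # overspent: accept more distortion
    if delta < low:
        return "lower"   # underspent: ask for less distortion
    return "keep"


def corrective_rate(prev_rate, delta, n, j):
    """Rate for P frame j: the previous rate minus the mismatch spread
    over the N - j remaining frames."""
    return prev_rate - delta / (n - j)


# Hypothetical state: 10-frame GOP, planned 2.0 bits per P frame,
# and 7.5 bits spent on the first three P frames.
delta = mismatch(7.5, 3, 2.0)
action = adjustment(delta, low=-1.0, up=1.0)
next_rate = corrective_rate(2.5, delta, n=10, j=4)
```

With these numbers the mismatch is 1.5 bits, which exceeds the upper bound, so the next frame is coded at 2.5 - 1.5/6 = 2.25 bits.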

Experimental Results

Adapting for GTV

• Both algorithms optimize GOPs.

• Group Testing for Video doesn’t have GOPs in the traditional sense: just a single I frame at the beginning of the video, and then only P frames.

• But GTV does have “pseudo”-I frames. When there is a scene change, or violent motion, the frame is not predicted very well, and thus it behaves like an I frame. They are usually much less frequent than traditional I frames, though, and not evenly distributed.

• Detect these frames by the residual magnitude.
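A minimal sketch of such a detector follows. It assumes the residual magnitude is measured as the mean squared residue and compared against a fixed threshold; both choices, along with the names and sample values, are assumptions made for illustration.

```python
def residual_energy(residue):
    """Mean squared value of a frame's motion-compensation residue."""
    return sum(r * r for r in residue) / len(residue)


def pseudo_i_frames(residues, threshold):
    """Indices of frames whose residual energy exceeds `threshold`."""
    return [i for i, res in enumerate(residues)
            if residual_energy(res) > threshold]


# Hypothetical residues for four frames; the third follows a scene change.
residues = [
    [1, -1, 0, 2],
    [0, 1, -1, 0],
    [30, -25, 40, -35],
    [2, 0, -1, 1],
]
detected = pseudo_i_frames(residues, threshold=100.0)
```

Only the third frame (index 2) is flagged, since its residual energy is far larger than that of its neighbours.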

Adapting Yang-Hemami ‘99

• Approach 1: Run initial rate algorithm on first two frames. Run adaptive adjustment algorithm until we encounter a “pseudo”-I frame. Repeat.

• Approach 2: Create N-frame GOPs at the beginning and at every “pseudo”-I frame. Use constant bit-rate coding outside the GOPs.

Adapting Cheng-Li-Kuo ‘97

• Approach 1: Start a new GOP after N frames or at the next “pseudo”-I frame, whichever comes first. Even if the first frame of the new GOP is a P frame, it will behave like an I frame with a low β value, because all previous allocations are fixed.

• Approach 2: Create N-frame GOPs at the beginning and at every “pseudo”-I frame. Use constant bit-rate coding outside the GOPs.

New algorithm

• Set targetD to a small value.

• Repeat:

– Encode all frames in GOP to distortion targetD.

– If |bits_used – max_bits| < ε (a small tolerance), break.

– Scale all allocations by max_bits/bits_used and encode GOP.

– If variance of the frame distortions is less than max_variance, break.

– Set targetD to average distortion.
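The loop above can be simulated on a toy model. This sketch replaces the real coder with the exponential rate-distortion curve and treats the frames as independent (it ignores the inter-frame dependency), so it only illustrates the control flow. The function names, the tolerance values, and the frame variances are all hypothetical.

```python
import math
from statistics import mean, pvariance


def bits_for(target_d, variance, beta):
    """Toy coder: bits needed to reach `target_d` under D = var * 2^(-beta R)."""
    return math.log2(variance / target_d) / beta


def distortion_at(bits, variance, beta):
    """Toy coder: distortion reached when spending `bits`."""
    return variance * 2.0 ** (-beta * bits)


def minmax_allocate(variances, beta, max_bits, target_d=0.01,
                    eps=0.05, max_variance=0.01, max_iter=50):
    """Alternate between coding to a common distortion and rescaling the
    allocation to the budget, as in the steps above."""
    alloc, dists = [], []
    for _ in range(max_iter):
        # Encode all frames to the current target distortion.
        alloc = [bits_for(target_d, v, beta) for v in variances]
        dists = [target_d] * len(variances)
        bits_used = sum(alloc)
        if abs(bits_used - max_bits) < eps:
            break
        # Rescale to the budget and re-encode.
        scale = max_bits / bits_used
        alloc = [b * scale for b in alloc]
        dists = [distortion_at(b, v, beta) for b, v in zip(alloc, variances)]
        if pvariance(dists) < max_variance:
            break
        target_d = mean(dists)
    return alloc, dists


alloc, dists = minmax_allocate([100.0, 40.0, 60.0, 50.0], beta=1.5, max_bits=8.0)
```

On this toy input the loop stops after two passes with a nearly flat distortion profile; how it behaves with a real, dependent coder is exactly what the project is meant to find out.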

My Project

• Implement Cheng-Li-Kuo, Yang-Hemami, and my new algorithm in the context of GTV.

• Compare the speed and quality of the three algorithms.
