EE591f Digital Video Processing

Roadmap
Introduction
Intra-frame coding
– Review of JPEG
Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)
Object-based and scalable video coding*
– Motion segmentation, scalability issues

Introduction to Video Coding
Lossless vs. lossy data compression
– Source entropy H(X)
– Rate-Distortion function R(D) or D(R)
Probabilistic modeling is at the heart of data compression
– What is P(X) for video source X?
– Modeling moving pictures is more difficult than modeling still images due to temporal dependency

Shannon's Picture
[Figure: rate-distortion curves of Coder A and Coder B; axes: Rate (bps) and Distortion]
Which coder wins, A or B?

Distortion Measures
Objective
– Mean Square Error (MSE)
– Peak Signal-to-Noise Ratio (PSNR)
– Measure the fidelity to the original video
Subjective
– Human Vision System (HVS) based
– Emphasize visual quality rather than fidelity
We only discuss objective measures in this course
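
As a minimal sketch of the two objective measures listed above, the following computes MSE and PSNR for 8-bit frames stored as nested lists. The function names and the peak value of 255 are assumptions made for illustration; the slides do not specify an implementation.

```python
import math

def mse(original, decoded):
    """Mean square error between two equally sized frames (lists of rows)."""
    total = 0.0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            total += (o - d) ** 2
            count += 1
    return total / count

def psnr(original, decoded, peak=255.0):
    """PSNR in dB; assumes 8-bit video (peak = 255). Infinite for identical frames."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)

frame = [[100, 110], [120, 130]]
noisy = [[101, 109], [121, 129]]  # every pixel is off by exactly one
print(mse(frame, noisy))          # 1.0
print(round(psnr(frame, noisy), 2))
```

Higher PSNR means higher fidelity to the original, which is not necessarily the same as better visual quality.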

Roadmap
Introduction
Intra-frame coding
– Review of JPEG
Inter-frame coding
– Conditional Replenishment (CR)
– Motion Compensated Prediction (MCP)
Scalable video coding
– 3D subband/wavelet coding and recent trend

A Tour of JPEG Coding Standard
Key Components
Transform
– 8×8 DCT
– Boundary padding
Quantization
– Uniform quantization
– DC/AC coefficients
Coding
– Zigzag scan
– Run-length/Huffman coding

JPEG Baseline Coder
Tour Example (8×8 block of pixel values):
183 160  94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169

Step 1: Transform
• DC level shifting
• 2D DCT

Original block:
183 160  94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169

After DC level shifting (−128):
 55  32 -34  25  66  35   4  37
 55  25 -12  48  59  38   2  41
 51  40  43  54  51  42   3  39
 49  49  51  49  51  37   3  39
 50  50  51  48  54  36   2  43
 51  52  52  51  55  41   4  41
 51  51  52  54  55  42   1  45
 52  51  53  51  53  42   2  41

After 2D DCT (rounded to integers):
313  56 -27  18  78 -60  27 -27
-38 -27  13  44  32  -1 -24 -10
-20 -17  10  33  21  -6 -16  -9
-10  -8   9  17   9 -10 -13   1
 -6   1   6   4  -3  -7  -5   5
  2   3   0  -3  -7  -4   0   3
  4   4  -1  -2  -9   0   2   4
  3   1   0  -4  -2  -1   3   1
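
The transform step can be sketched directly from its definition. The block below is the tour-example block transcribed into a Python list; `dct2_8x8` is a hypothetical helper name, and the direct quadruple loop is chosen for clarity rather than speed (real codecs use fast factorizations).

```python
import math

# Tour-example block, transcribed from the slide (an assumption of this sketch).
BLOCK = [
    [183, 160,  94, 153, 194, 163, 132, 165],
    [183, 153, 116, 176, 187, 166, 130, 169],
    [179, 168, 171, 182, 179, 170, 131, 167],
    [177, 177, 179, 177, 179, 165, 131, 167],
    [178, 178, 179, 176, 182, 164, 130, 171],
    [179, 180, 180, 179, 183, 169, 132, 169],
    [179, 179, 180, 182, 183, 170, 129, 173],
    [180, 179, 181, 179, 181, 170, 130, 169],
]

def dct2_8x8(block):
    """Orthonormal 2D DCT-II of an 8x8 block, computed from the definition."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * acc
    return out

# DC level shifting: subtract 128 so that 8-bit samples are centered on zero.
shifted = [[p - 128 for p in row] for row in BLOCK]
coeffs = dct2_8x8(shifted)
print(round(coeffs[0][0]))  # DC coefficient: 313
```

The DC coefficient equals the sum of the level-shifted samples divided by 8, which is why it dominates the block.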

Step 2: Quantization

DCT coefficients:
313  56 -27  18  78 -60  27 -27
-38 -27  13  44  32  -1 -24 -10
-20 -17  10  33  21  -6 -16  -9
-10  -8   9  17   9 -10 -13   1
 -6   1   6   4  -3  -7  -5   5
  2   3   0  -3  -7  -4   0   3
  4   4  -1  -2  -9   0   2   4
  3   1   0  -4  -2  -1   3   1

Q-table:
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

Quantized coefficients (Q):
 20   5  -3   1   3  -2   1   0
 -3  -2   1   2   1   0   0   0
 -1  -1   1   1   1   0   0   0
 -1   0   0   1   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Why do the Q-table entries increase from top-left to bottom-right?
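
A minimal sketch of the uniform quantization step, applied to the first row of the example. The coefficient and Q-table values are transcribed from the slide; the rounding rule (round half away from zero) is an assumption, since the slide does not state one.

```python
def quantize(coeff, q):
    """Uniform quantization: divide by the table entry and round to nearest.
    Halves are rounded away from zero (an assumption of this sketch)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / q + 0.5)

def dequantize(level, q):
    """Decoder side: scale the integer level back by the table entry."""
    return level * q

dct_row = [313, 56, -27, 18, 78, -60, 27, -27]   # first row of DCT coefficients
q_row = [16, 11, 10, 16, 24, 40, 51, 61]         # first row of the Q-table

levels = [quantize(c, q) for c, q in zip(dct_row, q_row)]
print(levels)
print([dequantize(l, q) for l, q in zip(levels, q_row)])
```

The decoder only sees the integer levels, so the difference between `dct_row` and the dequantized values is the irreversible loss of this step.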

Step 3: Entropy Coding
Zigzag Scan

Quantized coefficients:
 20   5  -3   1   3  -2   1   0
 -3  -2   1   2   1   0   0   0
 -1  -1   1   1   1   0   0   0
 -1   0   0   1   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

After zigzag scan:
(20,5,-3,-1,-2,-3,1,1,-1,-1,0,0,1,2,3,-2,1,1,0,0,0,0,0,0,1,1,0,1,EOB)

EOB (End Of the Block): all following coefficients are zero
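
The scan order can be generated rather than tabulated. The sketch below walks the anti-diagonals of the block, alternating direction, and replaces the trailing run of zeros with an EOB marker. The function names are illustrative, and the quantized block is transcribed from the example above.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):            # s = row + col indexes an anti-diagonal
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:                    # even diagonals run bottom-left to top-right
            coords.reverse()
        order.extend(coords)
    return order

def zigzag_scan(block):
    """Scan a block in zigzag order and replace the trailing zeros by 'EOB'."""
    seq = [block[i][j] for i, j in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq + ["EOB"]

QUANTIZED = [
    [20,  5, -3, 1, 3, -2, 1, 0],
    [-3, -2,  1, 2, 1,  0, 0, 0],
    [-1, -1,  1, 1, 1,  0, 0, 0],
    [-1,  0,  0, 1, 0,  0, 0, 0],
    [0] * 8, [0] * 8, [0] * 8, [0] * 8,
]

print(zigzag_scan(QUANTIZED))
```

Because quantization zeroes most high-frequency coefficients, the zigzag order tends to push the zeros to the end of the sequence, where a single EOB symbol covers them.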

Conditional Replenishment
Based on motion detection rather than motion estimation
Partition the current frame into "still areas" and "moving areas"
– Replenishment is applied to moving regions only
– Repetition is applied to still regions
Need to transmit the location of moving areas as well as new (replenishment) information
– No motion vectors transmitted
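
The scheme above can be sketched in a few lines. This is a toy illustration under stated assumptions: motion is detected per block by thresholding the sum of absolute frame differences, and the replenishment data is sent as raw pixels. The block size, threshold, and function names are all hypothetical.

```python
def detect_moving_blocks(prev, cur, block, threshold):
    """Return the top-left corners of blocks whose frame difference exceeds the threshold."""
    moving = []
    for by in range(0, len(cur), block):
        for bx in range(0, len(cur[0]), block):
            sad = sum(abs(cur[y][x] - prev[y][x])
                      for y in range(by, by + block)
                      for x in range(bx, bx + block))
            if sad > threshold:
                moving.append((by, bx))
    return moving

def cr_encode(prev, cur, block, threshold):
    """Send the location and new pixels of moving blocks only; no motion vectors."""
    return [((by, bx), [row[bx:bx + block] for row in cur[by:by + block]])
            for by, bx in detect_moving_blocks(prev, cur, block, threshold)]

def cr_decode(prev, updates, block):
    """Repeat the previous frame in still areas, replenish the moving areas."""
    out = [row[:] for row in prev]
    for (by, bx), pixels in updates:
        for dy in range(block):
            out[by + dy][bx:bx + block] = pixels[dy]
    return out

prev = [[10] * 4 for _ in range(4)]
cur = [row[:] for row in prev]
for y in (2, 3):
    for x in (2, 3):
        cur[y][x] = 50                     # only the bottom-right block changes

updates = cr_encode(prev, cur, block=2, threshold=0)
print([loc for loc, _ in updates])         # [(2, 2)]
print(cr_decode(prev, updates, block=2) == cur)
```

With a threshold above zero, small changes in "still" blocks are simply not transmitted, so the reconstruction is no longer exact.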

Conditional Replenishment
[Figure]

Motion Detection
[Figure]

From Replenishment to Prediction
Replenishment can be viewed as a degenerate case of prediction
– Only the zero motion vector is considered
– Discard the history
A more powerful approach to exploiting temporal dependency is prediction
– Locate the best match from the previous frame
– Use the history to predict the current frame

Differential Pulse Code Modulation
[Figure: DPCM encoder and decoder block diagrams, each with a delay element D; the encoder includes a quantizer Q]
Encoder: $y_n = x_n - \hat{x}_{n-1}$, $\hat{y}_n = Q(y_n)$
Decoder: $\hat{x}_n = \hat{x}_{n-1} + \hat{y}_n$
$x_n$, $y_n$: unquantized samples and prediction residues
$\hat{x}_n$, $\hat{y}_n$: decoded samples and quantized prediction residues
$x_n - \hat{x}_n = y_n - \hat{y}_n$
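
A minimal closed-loop DPCM sketch, assuming a previous-sample predictor and a uniform quantizer with a hypothetical step size. The encoder predicts from the decoded sample, not the original one, so the reconstruction error stays equal to the quantization error of the residue and does not accumulate.

```python
def dpcm_encode(samples, step):
    """Closed-loop DPCM: quantize the residue y_n = x_n - xhat_{n-1}."""
    pred = 0                      # xhat_{-1}, assumed 0 at both encoder and decoder
    indices = []
    for x in samples:
        y = x - pred              # prediction residue
        q = round(y / step)       # quantizer index (uniform quantizer, an assumption)
        indices.append(q)
        pred = pred + q * step    # track the decoder's reconstruction xhat_n
    return indices

def dpcm_decode(indices, step):
    """Decoder: xhat_n = xhat_{n-1} + yhat_n."""
    pred = 0
    out = []
    for q in indices:
        pred = pred + q * step
        out.append(pred)
    return out

signal = [100, 103, 109, 108, 120, 90]
step = 4
decoded = dpcm_decode(dpcm_encode(signal, step), step)
print(decoded)
print(max(abs(x - d) for x, d in zip(signal, decoded)))  # never exceeds step / 2
```

If the encoder predicted from the original samples instead, the decoder could not reproduce the predictor state and the errors would drift.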

Motion-Compensated Predictive Coding
[Figure]

A Closer Look
[Figure]

Key Components
Motion Estimation/Compensation
– At the heart of MCP-based coding
Coding of Motion Vectors (overhead)
– Lossless: errors in MV are catastrophic
Coding of MCP residues
– Lossy: distortion is controlled by the quantization step-size
Rate-Distortion optimization

Block-based Motion Model
Block size
– Fixed vs. variable
Motion accuracy
– Integer-pel vs. fractional-pel
Number of hypotheses
– Overlapped Block Motion Compensation (OBMC)
– Multi-frame prediction

Quadtree Representation of Motion Field with Variable Blocksize
[Figure]

Rate-Distortion Optimized BMA
– Conventional BMA: the matching criterion uses distortion alone
– RD-optimized BMA: the matching criterion uses rate and distortion, where the rate is the number of bits counted using a VLC table
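
A sketch of the two matching criteria under stated assumptions: distortion is measured as SAD, the motion-vector rate is the length of a signed Exp-Golomb codeword (a stand-in for "counted bits using a VLC table"), and the search is exhaustive. The names `block_match` and `mv_bits` and the Lagrangian weight `lam` are hypothetical.

```python
def mv_bits(v):
    """Bits of a signed Exp-Golomb codeword for one MV component (assumed VLC)."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1

def block_match(cur, ref, by, bx, n, search, lam):
    """Full search minimizing SAD + lam * rate; lam = 0 gives distortion-only BMA."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + n > len(ref) or x0 + n > len(ref[0]):
                continue                       # candidate falls outside the frame
            sad = sum(abs(cur[by + i][bx + j] - ref[y0 + i][x0 + j])
                      for i in range(n) for j in range(n))
            cost = sad + lam * (mv_bits(dy) + mv_bits(dx))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]

# Reference frame with a non-repeating pattern; current frame is shifted right by one pixel.
ref = [[16 * y + x * x for x in range(6)] for y in range(6)]
cur = [[ref[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]

print(block_match(cur, ref, 2, 2, 2, search=1, lam=0))    # (0, -1): best match
print(block_match(cur, ref, 2, 2, 2, search=1, lam=10))   # (0, 0): cheaper to code
```

With a large multiplier the search prefers the zero vector even though its distortion is higher, which is the tradeoff between motion bits and residue bits made explicit.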

Experimental Results
[Figure]
Cited from G. Sullivan and L. Baker, "Rate-Distortion optimized motion compensation for video compression using fixed or variable size blocks", Globecom 1991

Fractional-pel BMA
Recall the tradeoff between spending bits on motion and spending bits on MCP residues
Intuitively, going from integer-pel to fractional-pel accuracy helps because it can dramatically reduce the variance of MCP residues for some video sequences
The gain quickly saturates as the motion accuracy is refined further

Example
[Figure: MCP residue comparison for the first two frames of the Mobile sequence]
8-by-8 block, integer-pel: var(e)=220.8
8-by-8 block, half-pel: var(e)=123.8

Fractional-pel MCP
[Figure]

Multi-Hypothesis MCP
Using one block from one reference frame represents a single-hypothesis MCP
It is possible to formulate multiple hypotheses by considering
– Overlapped blocks
– More than one reference frame
Why multi-hypothesis?
– The benefit of reducing the variance of MCP residues outweighs the increased overhead on motion

Example: B-frame
[Figure: frames $f_{n-1}$, $f_n$, $f_{n+1}$; $f_n$ is predicted from both neighbors]
$\hat{f}_n(x,y) = a\,f_{n-1}(x,y) + (1-a)\,f_{n+1}(x,y), \quad a = 0 / 0.5 / 1$
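
The three choices of the weight correspond to backward-only, bidirectional, and forward-only prediction. A minimal sketch of the formula, with motion compensation left out for brevity (an assumption of this sketch; in a real coder the two reference blocks are motion-compensated first):

```python
def predict_b_frame(prev_frame, next_frame, a):
    """Weighted two-hypothesis prediction: a * f_{n-1} + (1 - a) * f_{n+1}."""
    return [[a * p + (1 - a) * q for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(prev_frame, next_frame)]

f_prev = [[10, 20], [30, 40]]
f_next = [[30, 40], [50, 60]]

print(predict_b_frame(f_prev, f_next, 0.5))  # average of the two references
print(predict_b_frame(f_prev, f_next, 1))    # forward prediction only
print(predict_b_frame(f_prev, f_next, 0))    # backward prediction only
```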

Generalized B-frame
[Figure: frames $f_{n-2}$, $f_{n-1}$, $f_n$, $f_{n+1}$, $f_{n+2}$; $f_n$ is predicted from all the other frames]
$\hat{f}_n(x,y) = \sum_{k \neq n} a_k\,f_k(x,y)$

Multi-Hypothesis MCP
[Figure]

Motion Vector Coding
2D lossless DPCM
– Spatially (temporally) adjacent motion vectors are correlated
– Use causal neighbors to predict the current one
– Code the Motion Vector Difference (MVD) instead of MVs
Entropy coding techniques
– Variable length codes (VLC)
– Arithmetic coding

MVD Example
[Figure: current motion vector MV and its causal neighbors MV1, MV2, MV3]
$MVD = MV - \mathrm{median}(MV_1, MV_2, MV_3)$
Due to the smoothness of the MV field, MVD usually has a smaller variance than MV
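
A small sketch of median prediction for motion vectors. Taking the median separately for the horizontal and vertical components is an assumption here (the slide only writes median(MV1, MV2, MV3)); the function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]

def motion_vector_difference(mv, mv1, mv2, mv3):
    """MVD = MV - median(MV1, MV2, MV3), with the median taken per component."""
    pred = (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))
    return (mv[0] - pred[0], mv[1] - pred[1])

# Neighbors of a block in a smooth motion field, plus one outlier component.
print(motion_vector_difference((3, -1), (2, 0), (4, -1), (3, 5)))  # (0, -1)
```

The median ignores the outlier component (5), so a single irregular neighbor does not spoil the prediction.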

VLC Example
MVx/MVy   symbol   codeword
 0        1        1
 1        2        010
-1        3        011
 2        4        00100
-2        5        00101
 3        6        00110
Exponential Golomb Codes: 0…0 1 x…x (m zeros, then a 1, then m information bits)
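
The codewords in the table can be generated mechanically: write the symbol in binary and prepend one zero per bit after the leading 1. The mapping from signed MV components to symbols (0, 1, -1, 2, -2, 3 to symbols 1 through 6) is an assumption of this sketch, as are the function names.

```python
def exp_golomb(symbol):
    """Exp-Golomb codeword for a symbol >= 1: m zeros, a 1, then m information bits."""
    bits = bin(symbol)[2:]                 # binary representation, leading bit is 1
    return "0" * (len(bits) - 1) + bits

def decode_exp_golomb(bitstring):
    """Decode one codeword from the front; return (symbol, remaining bits)."""
    m = 0
    while bitstring[m] == "0":             # count the leading zeros
        m += 1
    symbol = int(bitstring[m:2 * m + 1], 2)
    return symbol, bitstring[2 * m + 1:]

def mv_to_symbol(v):
    """Assumed mapping: 0 -> 1, 1 -> 2, -1 -> 3, 2 -> 4, -2 -> 5, 3 -> 6, ..."""
    if v == 0:
        return 1
    return 2 * v if v > 0 else -2 * v + 1

for v in (0, 1, -1, 2, -2, 3):
    print(v, mv_to_symbol(v), exp_golomb(mv_to_symbol(v)))
```

Small motion vector differences are the most frequent, so they receive the shortest codewords.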

MCP Residue Coding
Transform → Quantization → Coding
Conceptually similar to JPEG
Transform: unitary transform
Quantization: deadzone quantization
Coding: run-length coding

Transform
Unitary matrix: A is real, $A^{-1} = A^T$
Unitary transform: A is unitary, $Y = A X A^T$
Examples
– 8-by-8 DCT
– 4-by-4 integer transform
A =
[ 1  1  1  1 ]
[ 2  1 -1 -2 ]
[ 1 -1 -1  1 ]
[ 1 -2  2 -1 ]
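
As a quick check of the 4-by-4 integer transform, the sketch below applies Y = A X A^T with plain integer arithmetic. It also shows that the rows of A are orthogonal but not unit-norm, so A A^T is diagonal rather than the identity; in practice the missing scaling is folded into the quantization stage. The helper names are illustrative.

```python
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(p, q):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_transform(x):
    """Y = A X A^T, computed exactly in integers."""
    return matmul(matmul(A, x), transpose(A))

print(matmul(A, transpose(A)))       # diagonal: rows are orthogonal, norms 4 and 10
flat = [[1] * 4 for _ in range(4)]   # a constant block
print(forward_transform(flat))       # all energy goes to the DC coefficient
```

Integer arithmetic means encoder and decoder compute exactly the same values, which avoids the mismatch that floating-point DCT implementations can introduce.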

Deadzone Quantization
[Figure: deadzone quantizer on the real axis; the deadzone is the enlarged interval centered at 0 (its width is labeled 2 on the slide, presumably 2Δ), and the codewords are marked in the remaining intervals]
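
A minimal sketch of a deadzone quantizer, assuming a step size Δ (`step`), a zero bin of width 2Δ, and reconstruction at the midpoint of each nonzero bin. These choices and the function names are assumptions; the slide's figure does not pin them down.

```python
def deadzone_quantize(x, step):
    """Map x to an integer index; every |x| < step falls into the zero bin (width 2*step)."""
    k = int(abs(x) // step)
    return -k if x < 0 else k

def deadzone_dequantize(k, step):
    """Reconstruct at the midpoint of the bin (an assumption of this sketch)."""
    if k == 0:
        return 0.0
    sign = -1 if k < 0 else 1
    return sign * (abs(k) + 0.5) * step

for x in (-9, -3.9, 0, 3.9, 4, 9):
    k = deadzone_quantize(x, 4)
    print(x, k, deadzone_dequantize(k, 4))
```

MCP residues are concentrated around zero, so the wide zero bin turns many small residues into zeros that run-length coding represents cheaply.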

Lagrangian Multiplier Method
$J = D + \lambda R$
Motion estimation: $J = D_{dfd} + \lambda_{motion} R_{motion}$
Mode selection: $J = D_{REC} + \lambda_{Mode} R_{REC}$
$\lambda_{Motion} = \sqrt{\lambda_{Mode}}$
$\lambda_{Mode} = c \cdot QUANT^2$
QUANT: a user-specified parameter controlling the quantization step size

Summary
How does MCP coding work?
– The predictive model captures the slow-varying trend of the samples $\{f_n\}$
– The modeling of prediction residues $\{e_n\}$ is easier than that of the original samples $\{f_n\}$
Fundamental weakness
– Quantization error will propagate unless the memory of the predictor is refreshed
– Not suitable for scalable coding applications

Scalable vs. Multicast
What is scalable coding?
Multicast: foreman.yuv → foreman128k.cod, foreman256k.cod, foreman512k.cod, foreman1024k.cod (one bitstream per rate)
Scalable coding: foreman.yuv → foreman.cod (a single bitstream that serves the 128, 256, 512 and 1024 rate points)

Spatial scalability
Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0
[Figure: one embedded bitstream decoded at increasing spatial resolutions]

Temporal scalability
Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0
Frame 0,4,8,12,…: 7.5 Hz
Frame 0,2,4,6,8,…: 15 Hz
Frame 0,1,2,3,4,5,…: 30 Hz

SNR (Rate) scalability
Bitstream: 1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0
Decoding longer portions of the bitstream: PSNRavg=30dB, PSNRavg=35dB, PSNRavg=40dB
$PSNR_{avg} = \frac{1}{N} \sum_{i=1}^{N} PSNR_i$
$PSNR_i$: PSNR of frame i

Scalability via Bit-Plane Coding
$A = \pm (a_0 + a_1 2 + a_2 2^2 + \dots + a_7 2^7)$
$a_0$: Least Significant Bit (LSB); $a_7$: Most Significant Bit (MSB); the sign is coded as a separate sign bit
Example: sign=+, $a_0 a_1 a_2 \dots a_7$ = 10000001, A = 1+128 = 129
Example: sign=−, $a_0 a_1 a_2 \dots a_7$ = 00110011, A = −(4+8+64+128) = −204
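
The decomposition and its use for scalability can be sketched as follows. The sketch assumes 8 magnitude bits and a decoder that receives the bit-planes from the MSB downward, treating the planes it has not received as zero; the function names are illustrative.

```python
def to_bitplanes(value, nbits=8):
    """Sign-magnitude decomposition; bits[k] is a_k, so bits[0] is the LSB."""
    sign = "-" if value < 0 else "+"
    magnitude = abs(value)
    return sign, [(magnitude >> k) & 1 for k in range(nbits)]

def from_bitplanes(sign, bits, planes_received):
    """Reconstruct from the top planes only; missing lower planes count as zero."""
    nbits = len(bits)
    magnitude = sum(bits[k] << k for k in range(nbits - planes_received, nbits))
    return -magnitude if sign == "-" else magnitude

sign, bits = to_bitplanes(-204)
print(sign, "".join(str(b) for b in bits))   # - 00110011 (a0 first)
for planes in (2, 5, 8):
    print(planes, from_bitplanes(sign, bits, planes))
```

Each additional bit-plane refines the value, so truncating the bitstream after any plane still yields a usable, coarser reconstruction.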

Why Is DPCM Bad for Scalability?
[Figure: two layered prediction structures over frame numbers 1, 2, 3, …]
Base layer:          Ibase  P  P  P
Enhancement Layer 1: Ienh1  P  P  P
Enhancement Layer 2: Ienh2  P  P  P
One prediction structure suffers from the drifting problem; the other suffers from coding efficiency loss

3D Wavelet/Subband Coding
[Figure: video volume with axes x, y (spatial) and t (temporal)]
2D spatial WT + 1D temporal WT

Motion-Adaptive 3D Wavelet Transform
Recall Haar transform
$s(n) = \frac{1}{2}\,(x(2n) + x(2n+1)), \quad d(n) = x(2n+1) - x(2n)$
Lifting-based implementation
$d(n) = x(2n+1) - x(2n), \quad s(n) = x(2n) + \frac{1}{2}\,d(n)$
Motion-adaptive Haar transform
$d_n = f_{2n+1} - W[f_{2n}], \quad s_n = f_{2n} + \frac{1}{2}\,W^{-1}[d_n]$
$W$, $W^{-1}$: forward and backward motion vector
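
A minimal sketch of the lifting-based Haar transform along the temporal axis, with the motion warping taken as the identity (an assumption made to keep the example short; a motion-adaptive version would warp the even sample before the predict step and warp the difference back before the update step). The sign convention d(n) = x(2n+1) − x(2n) is also an assumption of this sketch.

```python
def haar_lifting_forward(x):
    """Predict step then update step; returns (lowpass s, highpass d)."""
    s, d = [], []
    for n in range(len(x) // 2):
        dn = x[2 * n + 1] - x[2 * n]      # predict: odd sample from even sample
        sn = x[2 * n] + dn / 2            # update: equals the pairwise average
        d.append(dn)
        s.append(sn)
    return s, d

def haar_lifting_inverse(s, d):
    """Undo the lifting steps in reverse order."""
    x = []
    for sn, dn in zip(s, d):
        even = sn - dn / 2                # undo update
        odd = dn + even                   # undo predict
        x.extend([even, odd])
    return x

samples = [4, 6, 10, 2]
low, high = haar_lifting_forward(samples)
print(low, high)                          # [5.0, 6.0] [2, -8]
print(haar_lifting_inverse(low, high))    # [4.0, 6.0, 10.0, 2.0]
```

Because each lifting step is undone exactly by its mirror image, the transform stays invertible whatever operator is used inside the predict and update steps, which is what makes the motion-adaptive variant possible.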