EE569 Digital Video Processing

Existing Video Coding Standards

ISO
– MPEG-1 (1992): 1.5 Mbps, VCD
– MPEG-2/H.262 (1996): 2-10 Mbps, DVD
– MPEG-4 (2000): 8-1024 Kbps

ITU
– H.120 (1984)
– H.261 (1990): p×64 Kbps
– H.263: 8-512 Kbps
– H.263+ (1998)

Windows Media Player or RealPlayer

H.264/AVC coding standard


H.261 Coding Standard

Background:
– Facilitate video conferencing and videophone service over ISDN
– p×64 kbps (p=1: videophone; p>5: videoconference; p=30: VHS-quality)
– Basis of MPEG-1 and MPEG-2

Features:
– Maximum coding delay of 150 ms
– Amenable to low-cost VLSI implementation


Input Image Formats

                        CIF                QCIF
# of pels/line (Y)      360 (352)          180 (176)
# of pels/line (U/V)    180 (176)          90 (88)
# of lines/pic (Y)      288                144
# of lines/pic (U/V)    144                72
Interlacing             1:1                1:1
Temporal rate           30, 15, 10, 7.5    30, 15, 10, 7.5
Aspect ratio            4:3                4:3


Video Multiplex

It defines a data structure so that a decoder can interpret the received bit stream without any ambiguity.

Hierarchical data structure:
– Picture layer
– Group of blocks (GOB) layer
– Macroblock (MB) layer
– Block layer

Each layer has a distinct header.


Picture and GOB Layers

Picture layer consists of a picture header followed by the data for GOBs
– Picture header contains data such as picture format (CIF or QCIF)

GOB layer is always composed of 33 macroblocks
– Each MB header contains a MB address and compression mode, followed by the data for the blocks
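The layer sizes can be checked with a little arithmetic. The sketch below is illustrative only: the function name `h261_layout` is hypothetical, and it assumes the coded luminance sizes of 352×288 (CIF) and 176×144 (QCIF), 16×16 macroblocks, 33 macroblocks per GOB and 6 blocks per macroblock.

```python
def h261_layout(width, height, mb_size=16, mbs_per_gob=33, blocks_per_mb=6):
    """Count macroblocks, GOBs and 8x8 blocks for a coded picture size."""
    mbs_per_row = width // mb_size
    mb_rows = height // mb_size
    total_mbs = mbs_per_row * mb_rows
    return {
        "macroblocks": total_mbs,
        "gobs": total_mbs // mbs_per_gob,
        "blocks": total_mbs * blocks_per_mb,
    }

cif = h261_layout(352, 288)    # 22 x 18 macroblocks
qcif = h261_layout(176, 144)   # 11 x 9 macroblocks
print(cif, qcif)
```

Under these assumptions a CIF picture holds 12 GOBs and a QCIF picture holds 3.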


Macroblock and Block Layers

Macroblock: the smallest unit to select the compression mode

Y1 Y2
Y3 Y4    Cr Cb

A MB always consists of 6 blocks (Y1 – Y4, Cr, Cb)

MB layer fields: MBA | MTYPE | MQUANT | MVD | CBP | Block Data


Compression Modes

Intra Mode
– Similar to JPEG coding
– Support two compression modes

Inter Mode
– ME is not specified (MC is optional)
– Usually, 16-by-16 BMA, integer-pel accuracy, search range [-15,15]
– Support various compression modes
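Since the standard does not specify motion estimation, the following is only one possible encoder-side sketch of a full-search block matching algorithm (BMA) with integer-pel accuracy. The names `sad` and `full_search`, the sum-of-absolute-differences cost, and the skipping of candidates that fall outside the reference picture are all assumptions made for illustration. The demo uses a small 8×8 block and a ±3 search range to keep it fast; the defaults correspond to the 16-by-16, [-15,15] case.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, n=16, search=15):
    """Exhaustive integer-pel search; returns ((dx, dy), cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would leave the reference picture.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic test frames: the current frame is the reference shifted by (2, 1).
ref = [[(7 * x * x + 13 * y * y + 3 * x * y) % 251 for x in range(24)] for y in range(24)]
cur = [[ref[min(y + 1, 23)][min(x + 2, 23)] for x in range(24)] for y in range(24)]
mv, cost = full_search(cur, ref, bx=8, by=8, n=8, search=3)
print(mv, cost)
```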


Selecting a Compression Mode

Should a MV be transmitted?
Should we use intra or inter compression mode?
Should the quantizer stepsize be changed?

We can choose the optimal compression mode based on the variance of the original MB, the MB difference (bd), the displaced MB difference (dbd) and the best MV estimate.


Selection Method

If the variance of dbd is smaller than that of bd, then we select Inter mode and MC is needed
– Need to transmit MVD
– The transmission of DCT coefficients is optional

Otherwise, no MV will be transmitted
– If the original MB has a smaller variance, select Intra mode; otherwise select Inter mode (but with a zero MV)

For MC blocks, prediction errors can be modified by a 2D spatial filter (the prototype of the deblocking filter)
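The decision rule can be sketched in a few lines. This is an illustration rather than a normative procedure: the names `variance` and `select_mode` are hypothetical, macroblocks are passed as flat lists of samples, and "the original MB has a smaller variance" is assumed to mean smaller than the variance of bd.

```python
def variance(samples):
    """Population variance of a flat list of samples."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def select_mode(orig_mb, bd, dbd):
    """Pick a compression mode from the original MB, the MB difference (bd)
    and the displaced MB difference (dbd)."""
    if variance(dbd) < variance(bd):
        return "Inter+MC"            # transmit MVD; DCT coefficients optional
    if variance(orig_mb) < variance(bd):   # assumed comparison target: bd
        return "Intra"
    return "Inter (zero MV)"

mode_a = select_mode([10, 200, 10, 200], [1, -1, 2, -2], [0, 0, 1, -1])
mode_b = select_mode([10, 200, 10, 200], [1, -1, 2, -2], [1, -1, 2, -2])
mode_c = select_mode([100, 100, 101, 100], [50, -50, 50, -50], [60, -60, 60, -60])
print(mode_a, mode_b, mode_c)
```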


H.261 Compression Modes

Mode            MQUANT  MVD  CBP  TCOEFF  VLC
Intra                             x       0001
Intra           x                 x       0000 001
Inter                        x    x       1
Inter           x            x    x       0000 1
Inter+MC                x                 0000 0000 1
Inter+MC                x    x    x       0000 0001
Inter+MC        x       x    x    x       0000 0000 01
Inter+MC+FIL            x                 001
Inter+MC+FIL            x    x    x       01
Inter+MC+FIL    x       x    x    x       0000 01


Interpretation

MQUANT: when it is on, a new value of quantizer stepsize will be transmitted;
MVD: when it is on, the motion vector difference will be transmitted;
CBP: when it is on, it means at least one transform coefficient in the MB will be transmitted;
TCOEFF: when it is on, transform coefficients will be transmitted


Variable Thresholding

Initialization: T=g, Tmax=g+g/2

For each coefficient:
– If Coeff<T: Q[Coeff]=0; then T=T+1 if T<Tmax, otherwise T=Tmax
– Otherwise: Q[Coeff]=g+g/2 and T=g

Motivation: to increase the number of zero coefficients


Example

Coeff       50   0   0   0  33  34   0  40  33
T           32  32  33  34  35  36  37  38  32
Q[Coeff]    48   0   0   0   0   0   0  48  48

Coeff>T for the first coefficient and the last two; Coeff<T for those in between.
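The example row can be reproduced with a short sketch of variable thresholding. Several points are assumptions: g = 32 is inferred from the first threshold value in the example, the function name `variable_threshold_quantize` is hypothetical, the test is applied to the coefficient magnitude, and a coefficient that passes the threshold is reconstructed at the midpoint of its uniform quantizer bin (which equals g+g/2 for magnitudes between g and 2g).

```python
def variable_threshold_quantize(coeffs, g):
    """Return (quantized values, threshold used for each coefficient)."""
    t_max = g + g // 2
    t = g
    quantized, thresholds = [], []
    for c in coeffs:
        thresholds.append(t)
        mag = abs(c)
        if mag < t:
            quantized.append(0)
            # Raise the threshold after each zeroed coefficient, up to Tmax.
            t = t + 1 if t < t_max else t_max
        else:
            # Assumed uniform quantizer: midpoint of the bin of width g.
            recon = (mag // g) * g + g // 2
            quantized.append(recon if c >= 0 else -recon)
            t = g  # reset the threshold after a nonzero output
    return quantized, thresholds

coeffs = [50, 0, 0, 0, 33, 34, 0, 40, 33]
q, t_used = variable_threshold_quantize(coeffs, g=32)
print(t_used)
print(q)
```

Note how the coefficients 33 and 34 in the middle are zeroed because the threshold has already grown past them, which lengthens the zero run.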


Run-Length Coding

[Figure: zigzag scan of an 8×8 block of quantized coefficients whose only nonzero values are 3, 2 and 1]

(run,level): (0,3) (1,2) (7,1) EOB
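A minimal sketch of zigzag scanning followed by (run, level) coding is given below. The block contents are a hypothetical example, chosen so that the conventional zigzag order yields the pairs listed above; the helper names `zigzag_order` and `run_level_pairs` are likewise illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals are traversed from bottom-left to top-right.
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order

def run_level_pairs(block):
    """Encode nonzero coefficients as (number of preceding zeros, value)."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are covered by the end-of-block (EOB) code

# Hypothetical block: nonzero values at rows 0, 1 and 4 of the first column.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[4][0] = 3, 2, 1
pairs = run_level_pairs(block)
print(pairs, "EOB")
```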


H.261 Rate/Buffer Control

The coded video data rate is controlled by
– Pre-processing
– Quantization step-size
– Block significance criterion (CBP flag)
– Temporal sampling ratio

The fullness of the buffer is controlled by
– Quantization step-size
– Maximum allowable coding delay (150 ms)


MPEG-I Standard

• Features
- Syntax based: no specific algorithm is standardized; the parameters defining the encoded bit stream and decoder are contained in the bit stream itself.
- Random access: allow independent access points (I-frames) to the bitstream.
- Fast forward and reverse search
- Reasonable coding/decoding delay


Input Video Format

• Progressive video (interlaced video is handled by MPEG-2)

• Input video is first converted into the MPEG standard input format (SIF).
SIF format: Y - 352×240, Cr/Cb - 176×120, 30 frames/sec


MPEG-I Constrained Parameter Set

- maximum number of pixels/line: 720
- maximum number of lines/picture: 576
- maximum number of pictures/sec: 30
- maximum number of macro-blocks/picture: 396
- maximum number of macro-blocks/sec: 9900
- maximum bit rate: 1.86 Mbps
- maximum decoder buffer size: 376,832 bits
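The limits interact: a picture cannot reach the maximum width and height at the same time, because the macroblock count would exceed 396. The sketch below checks a format against the limits listed above; the function name `meets_cps` is hypothetical, 16×16 macroblocks are assumed, and the buffer-size limit is left out for brevity.

```python
def meets_cps(width, height, fps, bit_rate):
    """Check a video format against the constrained parameter limits listed above."""
    mbs_per_picture = ((width + 15) // 16) * ((height + 15) // 16)
    return (
        width <= 720
        and height <= 576
        and fps <= 30
        and mbs_per_picture <= 396
        and mbs_per_picture * fps <= 9900
        and bit_rate <= 1.86e6
    )

sif_30 = meets_cps(352, 240, 30, 1.5e6)    # 330 MBs/picture, 9900 MBs/sec
sif_25 = meets_cps(352, 288, 25, 1.5e6)    # 396 MBs/picture, 9900 MBs/sec
full_d1 = meets_cps(720, 576, 30, 1.5e6)   # 1620 MBs/picture: too many
print(sif_30, sif_25, full_d1)
```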


Perspective Video Formats

Format     Resolution          Bit rate
SIF        352x240, 30Hz       1.2-3 Mbps
CCIR601    720x486, 30Hz       5-10 Mbps
EDTV       960x486, 30Hz       7-15 Mbps
HDTV       1920x1080, 30Hz     20-40 Mbps


Hierarchical Data Structure (I)

• Sequences are formed by groups of pictures (GOPs)

• GOPs are made up of pictures

• Pictures consist of slices

• Slices are made up of macro-blocks

• Macro-blocks (MB) consist of blocks

• Blocks are 8×8 pixel arrays


Hierarchical Data Structure (II)

[Figure: nesting of the layers, from GOP to frame to slice to MB to block]


Four Compression Modes

• I frame: Intra-frame JPEG-like coding
• P frame: forward prediction from previous frames
• B frame: forward, backward or bi-directional prediction
• D frame: contains only the DC component of each block

Frame type:   I B B B P B B B P
Frame index:  0 1 2 3 4 5 6 7 8   (one GOP)


GOP Reordering

Frame type:   I B B B P B B B P
Frame index:  0 1 2 3 4 5 6 7 8   (one GOP)

Processing order: 0,4,1,2,3,8,5,6,7
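The processing order follows from the rule that a B frame can only be coded after both of its anchor frames (I or P). A minimal sketch, assuming the display-order frame types I B B B P B B B P for frames 0 to 8 and a hypothetical helper name `coding_order`:

```python
def coding_order(frame_types):
    """Map display order to processing order: each anchor (I or P) frame is
    moved ahead of the B frames that precede it in display order."""
    order, pending_b = [], []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # B frames left at the end would wait for an anchor in the next GOP.
    order.extend(pending_b)
    return order

order = coding_order("IBBBPBBBP")
print(order)
```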


MB Types in MPEG-I

I-pictures    P-pictures    B-pictures
Intra         Intra         Intra
Intra-A       Intra-A       Intra-A
              Inter-D       Inter-F
              Inter-DA      Inter-FD
              Inter-F       Inter-FDA
              Inter-FD      Inter-B
              Inter-FDA     Inter-BD
              Skipped       Inter-BDA
                            Inter-I
                            Inter-ID
                            Inter-IDA
                            Skipped

A – adaptive quantization
D – DCT of prediction error will be coded
F – forward prediction with MC
B – backward prediction with MC
I – interpolated prediction with MC


Intra-frame Compression Mode

JPEG-like coder: 8×8 DCT → Quantization → Run-length coding

Default quantization matrix $Q_0$:

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Spatially adaptive quantization (MQUANT parameter): $Q = \mathrm{MQUANT} \times Q_0$

• MB types
- Intra: $Q_0$
- Intra-A: $Q$


Inter-frame Compression Mode (P)

• MB types
- Intra
- Intra-A
- Inter-D
- Inter-DA: a new MQUANT value and the DCT of the prediction error will be coded
- Inter-F
- Inter-FD: we need to transmit the MV and the DCT of the prediction error
- Inter-FDA: we need to transmit the MV, the DCT of the prediction error and a new MQUANT
- Skipped: directly copy from the block at the same position in the previous frame


Interframe Compression Mode (B)

$f_n(x,y) = a\,f_{n-1}(x,y) + (1-a)\,f_{n+1}(x,y), \quad a = 0 / 0.5 / 1$

• Advantages
- allow efficient handling of problems associated with covered/uncovered background
- MC averaging over two frames suppresses noise better than prediction from just one frame
- since B-frames are not used in predicting future frames, they can be coded with fewer bits without causing error propagation

• Disadvantages
- two frame buffers are needed
- longer coding delay
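The three B-frame prediction choices can be written as one weighted sum. The sketch below is illustrative: the name `bidirectional_prediction` is hypothetical, the weight a is assumed to apply to the previous frame (so a = 1 is forward, a = 0 is backward and a = 0.5 is interpolated prediction), and the two input blocks are assumed to be already motion compensated.

```python
def bidirectional_prediction(prev_block, next_block, a):
    """Weighted combination of the (motion-compensated) previous and next blocks."""
    if a not in (0, 0.5, 1):
        raise ValueError("a must be 0, 0.5 or 1")
    return [a * p + (1 - a) * q for p, q in zip(prev_block, next_block)]

prev_block = [100, 104, 96, 100]
next_block = [104, 100, 100, 96]
forward = bidirectional_prediction(prev_block, next_block, 1)
backward = bidirectional_prediction(prev_block, next_block, 0)
interpolated = bidirectional_prediction(prev_block, next_block, 0.5)
print(forward, backward, interpolated)
```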


Theoretical Framework behind B-frame Coding

Why does it improve coding efficiency?
– Multi-hypothesis motion compensation (MHMC)
– B frame is one of the simplest MHMC (two hypotheses: forward and backward)

Why does it facilitate scalable coding?
– Temporal scalability
– We can skip B-frames without affecting the decoding of other frames


MPEG-I Encoder and Decoder

• Encoder modules
motion estimation, selection of compression mode (MTYPE) per MB, setting MQUANT value, MCP, quantizer and dequantizer, DCT and IDCT, VLC, multiplexer, buffer and buffer regulator

• Decoder modules
demultiplexer, VLC decoder, MCP, dequantizer and IDCT

- the relative number of I, P and B pictures in a GOP is application dependent. The use of B-pictures is optional. There is at least one I picture every 132 pictures.
- half-pixel accuracy in motion estimation
- motion vectors that refer to pixels outside of the picture are not allowed


Software Implementations

Bellcore version
– ivy.ee.princeton.edu (not publicly accessible)

Berkeley version
– toe.cs.berkeley.edu (128.32.149.117)
– /pub/multimedia/mpeg/mpeg-2.0.tar.Z

Stanford version
– ftp://havefun.stanford.edu/ (36.2.0.35)
– /pub/mpeg/MPEGv1.2.tar.Z


MPEG-I vs. H.261

H.261                                        MPEG-1
Sequential access                            Random access
One basic frame rate                         Flexible frame rate
CIF and QCIF images only                     Flexible image size
I and P frames only                          I, P and B frames
MC over 1 frame                              MC over 1 or more frames
Integer-pel MV accuracy                      Half-pel MV accuracy
Spatial filtering in the loop                No filter
Variable threshold + uniform quantization    Quantization matrix
No GOP structure                             GOP structure
GOB structure                                Slice structure


MPEG-2 Standard

• Features
- it allows for interlaced input, higher-definition inputs and alternative subsampling of chrominance channels
- it offers a scalable bit stream
- it provides improved quantization and coding options

• Profiles
- simple profile, main profile, SNR scalable profile, spatially scalable profile and high profile


Chrominance Subsampling

• 4:2:0 (same as MPEG-I)

• 4:2:2 (chroma subsampled in the horizontal direction only)

• 4:4:4 (no chroma subsampling)

[Figure: luminance and chrominance sample positions for the subsampling formats]
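The effect of each format on the chrominance plane size can be sketched as follows. The function name `chroma_plane_size` is hypothetical, and even luminance dimensions are assumed.

```python
def chroma_plane_size(luma_width, luma_height, fmt):
    """Return (width, height) of each chrominance plane for a subsampling format."""
    if fmt == "4:2:0":   # subsampled horizontally and vertically
        return luma_width // 2, luma_height // 2
    if fmt == "4:2:2":   # subsampled horizontally only
        return luma_width // 2, luma_height
    if fmt == "4:4:4":   # no chroma subsampling
        return luma_width, luma_height
    raise ValueError(f"unknown format: {fmt}")

sizes = {fmt: chroma_plane_size(720, 576, fmt) for fmt in ("4:2:0", "4:2:2", "4:4:4")}
sif_chroma = chroma_plane_size(352, 240, "4:2:0")
print(sizes, sif_chroma)
```

For the SIF luminance size of 352×240, the 4:2:0 case gives the 176×120 chrominance planes quoted earlier for MPEG-I.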


Interlaced Video Coding

• Frame pictures
Interleave lines of even and odd fields to form composite frames

• Field pictures
Even and odd fields are treated as separate pictures

[Figure: 8×8 blocks taken from the odd field and the even field]

Q: For video containing significant motion, which format is preferred?


Frame and Field Pictures

GOP can be composed of a mixture of frame and field pictures
– Field pictures always appear in pairs (top field and bottom field)
– If the top field is a P-/B- picture, then the bottom field must also be a P-/B- picture
– If the top field is an I-picture, then the bottom field can be an I- or P- picture
– A pair of field pictures is encoded in the order in which they should appear at the output


Frame and Field DCT

[Figure: block organization for frame DCT versus field DCT]


Frame and Field Prediction

MC Prediction Modes
– Simple field prediction
– Simple frame prediction

Within a field picture, only simple field prediction is used.
Within a frame picture, either simple field prediction or simple frame prediction can be employed on a MB-by-MB basis.


Frame and Field Prediction (cont’d)

In the presence of motion, frame prediction suffers from strong motion artifacts; in the absence of motion, field prediction does not utilize all the available information.
16×8 MC mode: only used in the field pictures; two MVs are used for top and bottom fields respectively.
Dual-prime mode: used only for P-pictures; one MV and a small differential MV are encoded.


Spatial, Temporal and SNR Scalability in MPEG-2

• Spatial (resolution) scalability
- base layer is a low spatial resolution version of the video
- enhancement layers successively enhance the spatial resolution

• SNR (rate, quality) scalability
- base layer uses a coarse quantizer for DCT coefficients
- enhancement layer uses a fine quantizer for DCT coefficients

• Temporal scalability
- allows decodability at different frame rates

Note: the scalability feature provided by MPEG-2 is ad hoc in the sense of significantly sacrificing coding efficiency



Other Improvements (I)

optional alternate scan (said to fit interlaced video better)


Other Improvements (II)

Finer Quantization of the DCT Coefficients

                       MPEG-I         MPEG-II
Intra MB DC coeff.     8 bits         11 bits
Intra MB AC coeff.     [-256,255]     [-2048,2047]
Non-intra MB coeff.    [-256,255]     [-2048,2047]


Other Improvements (III)

Finer Adjustment of MQUANT

MQUANT in MPEG-I:
 1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0
 9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0
25.0  26.0  27.0  28.0  29.0  30.0  31.0

MQUANT in MPEG-II:
 0.5   1.0   1.5   2.0   2.5   3.0   3.5   4.0
 5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0
14.0  16.0  18.0  20.0  22.0  24.0  26.0  28.0
32.0  36.0  40.0  44.0  48.0  52.0  56.0


Implementation Issues (I)

Four levels defined by MPEG-II

Level        Max. pels/line    Max. lines/pic.    Max. frames/sec.
Low          352               288                30
Main         720               576                30
High-1440    1440              1152               60
High         1920              1152               60


Implementation Issues (II)

Five profiles defined by MPEG-II

Profile               Notes
Simple                Does not allow B-frames and only supports Main level
Main                  Does not support scalability; supports all four levels with upper bounds of 4, 15, 60 and 80 Mbps respectively
SNR scalable          Supports Low and Main levels with maximum bit rates 4(3) and 15(10) Mbps
Spatially scalable    Supports only High-1440 level with a maximum bitrate of 60(15) Mbps
High                  Supports Main, High-1440 and High levels with maximum bit rates of 20(4), 80(20) and 100(25) Mbps respectively


Hardware Implementations

C-Cube
– CL450: single-chip, MPEG-I, SIF rates
– CL950: MPEG-II
– CL4000: single-chip, MPEG-I/JPEG/H.261

SGS-Thomson
– STi3400: single-chip, MPEG-I, SIF rates
– STi3500: the first MPEG-II chip on the market

Motorola
– MCD2500: single-chip, MPEG-I, SIF rates


H.26x Standards

• H.261 (1983-1990)

• H.263/H.263+/H.263++ (1993-1999)
- Video conferencing, video email, video telephony over the Public Switched Telephone Network (PSTN) and wireless networks
- Based on H.261 but offers significant improvement in coding efficiency
- Adopted by several videophone terminal standards: H.324 (PSTN), H.320 (ISDN), H.310 (B-ISDN)

• H.264/AVC (1999-2003)


H.263 Input Image Formats

• sub-QCIF: 128×96

• QCIF: 176×144

• CIF: 352×288

• 4-CIF: 704×576

• 16-CIF: 1408×1152

Color format: 4:2:0 YUV

Temporal rate: 30, 15, 10, 7.5 Hz


H.263 Picture Structure

An example at QCIF resolution:
- Picture frame: 176 pels × 144 lines, divided into 9 groups of blocks (GOB1 to GOB9)
- Group of blocks (GOB): 11 macroblocks (MB1 to MB11)
- Macroblock: four luminance blocks (Y1, Y2, Y3, Y4) plus Cb and Cr
- Block: 8 pels × 8 lines


H.263 Baseline Coding Algorithm

• Video Frame Structure
- support sub-QCIF, QCIF, CIF, 4CIF and 16CIF

• Video Coding Tools
- Motion estimation and compensation: range [-16,15.5], accuracy half-pel
- Transform: 8×8 DCT
- Quantization: Q factor, $c^q_{m,n} = c_{m,n} / Q, \quad m,n = 0,\ldots,7$
- Entropy coding: 3D VLC (LAST, RUN, LEVEL)

• Coding Control
- Intra/Inter switch


Advanced Coding Modes in H.263

• Annex D: Unrestricted motion vector mode
- range: [-31.5,31.5]
- allow MV to point outside the picture boundaries

• Annex E: Syntax-based arithmetic coding mode
- about 5% savings over VLC

• Annex F: Advanced prediction mode
- Overlapped Block Motion Compensation (OBMC)

• Annex G: PB-frame mode
- I B P B P …


Unrestricted MV Mode

[Figure: motion vector (vx,vy) from a block in the current frame to a block in the reference frame]
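When a motion vector points outside the picture, the missing reference samples are taken from the nearest boundary pixels. The sketch below illustrates this edge extension by clamping coordinates; the names `ref_pixel` and `predict_block` are hypothetical, and only integer-pel vectors are handled to keep the example short.

```python
def ref_pixel(ref, x, y):
    """Fetch a reference sample, clamping coordinates to the picture boundary."""
    height, width = len(ref), len(ref[0])
    xc = min(max(x, 0), width - 1)
    yc = min(max(y, 0), height - 1)
    return ref[yc][xc]

def predict_block(ref, bx, by, vx, vy, n):
    """Build an n x n prediction for the block at (bx, by) using MV (vx, vy)."""
    return [[ref_pixel(ref, bx + vx + i, by + vy + j) for i in range(n)]
            for j in range(n)]

ref = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
top_left = predict_block(ref, 0, 0, -1, -1, 2)    # reaches above and left of the picture
right_edge = predict_block(ref, 1, 1, 1, 0, 2)    # reaches past the right edge
print(top_left, right_edge)
```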


Overlapped Block Motion Compensation (OBMC)


H.263+

• Annex I: Advanced intra coding mode

• Annex J: Deblocking filter mode

• Annex K: Slice structure mode

• Annex L: Supplemental enhancement information mode

• Annex M: Improved PB-frame mode

• Annex N: Reference picture selection mode

• Annex O: Temporal, SNR and spatial scalability mode

• Annex P: Reference picture resampling mode

• Annex Q: Reduced resolution update mode

• Annex R: Independently segmented decoding mode

• Annex S: Alternative Inter VLC mode

• Annex T: Modified quantization mode


Annex D: Unrestricted MV

Reversible Variable Length Codes (RVLC) are used to encode the MV differences


Why RVLC?


Annex I: Advanced Intra Coding

1. DC only
2. Vertical DC/AC
3. Horizontal DC/AC


Annex J: Deblocking Filter

[1 2 1]/4
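To give a feel for what a [1 2 1]/4 kernel does at a block edge, the sketch below applies it to one row of samples that crosses a boundary. This only illustrates the kernel shown above and is not the normative Annex J filter, whose exact operation is not given here; the name `smooth_121`, the rounding, and leaving the two end samples untouched are assumptions.

```python
def smooth_121(samples):
    """Apply the [1 2 1]/4 kernel to interior samples, with rounding."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) // 4
    return out

# One row of pixels across a block boundary (step from 10 to 20).
row = [10, 10, 10, 10, 20, 20, 20, 20]
filtered = smooth_121(row)
print(filtered)
```

Only the two samples next to the boundary change, which softens the visible step between the blocks.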


Annex M: Improved PB-Frames

[Figure: PB-frame structure (I, PB, B, P) in H.263 versus H.263+]


Annex O: Temporal, SNR and Spatial Scalability


Some Comparison Results
