optimizing baseline profile in h.264/avc video coding · web viewtable 3: 4x4 luma prediction...

By Vinoothna Gajula, ID 1000803103, MS in Electrical Engineering

Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel Programming and Fast Intra and Inter predictions

TABLE OF ACRONYMS

ASO - Arbitrary slice ordering
AVC - Advanced video coding
CABAC - Context adaptive binary arithmetic coding
CAVLC - Context adaptive variable length coding
CBP - Coded block pattern
CIF - Common intermediate format
DCT - Discrete cosine transform
FMO - Flexible macro block ordering
IEC - International Electrotechnical Commission
I-frame - Intra frame
ITU-T - International Telecommunication Union (Telecommunication Standardization Sector)
ISO - International Organization for Standardization
JM - Joint model
JVT - Joint video team
MB - Macro block
MPEG - Moving picture experts group
MSE - Mean square error
NAL - Network abstraction layer
PSNR - Peak signal to noise ratio
QCIF - Quarter common intermediate format
QP - Quantization parameter
RDO - Rate distortion optimization
RS - Redundant slices
SATD - Sum of absolute transformed differences
SSIM - Structural similarity index metric
VCEG - Video coding experts group
VLC - Variable length coding


OBJECTIVE:

In this project, the computational complexity and encoding time of the H.264 baseline profile are reduced, first by encoding video frames in parallel instead of sequentially [1], [7], and then by applying a fast adaptive termination (FAT) algorithm to intra and inter prediction [2], [15].

In intra prediction, the FAT algorithm uses simple directional masks and the modes of neighboring blocks [2], [8], [15]. In inter prediction, mode decision and motion estimation are accelerated by adaptively using the minimum rate distortion (RD) cost of both skip and non-skip modes: an early skip-mode detection test is proposed for skip mode, and a three-stage scheme is proposed to speed up the mode decision process for non-skip modes [3], [9], [15], [20].

INTRODUCTION:

H.264, also known as MPEG-4 Part 10 / AVC (Advanced Video Coding), was published in 2003 jointly by the international standards bodies ITU-T (International Telecommunication Union) [17] and ISO/IEC (International Organization for Standardization / International Electrotechnical Commission), through their partnership called the Joint Video Team (JVT) [4].

Compared with the previous coding standards MPEG-2 [13] and MPEG-4 [14], it offers significantly better rate distortion efficiency (a higher bit rate reduction at the same quality), improved error resilience and a more network-friendly design.

H.264 - PROFILES:

H.264 has three major profiles, baseline, main and extended, in addition to four high profiles: High, High 10, High 4:2:2 and High 4:4:4, as shown in figure 1 [5], [11].

-Baseline profile is applicable to real-time conversational services such as video conferencing and video phone [5] [11].

-Main profile is designed for digital storage media and television broadcasting [5] [11].

-Extended profile targets multimedia services over the internet [5] [11].

-High, High 10, High 4:2:2 and High 4:4:4 [11] are the fidelity range extension profiles, used for applications such as content contribution, content distribution, and studio editing and post-processing [5] [11].

Fig.1: Various profiles of H.264 [5]

Encoder and Decoder of H.264:

An H.264 codec is a combination of an encoder and a decoder that complement each other to achieve the required compression and picture quality. The H.264 encoder converts the video into a compressed format (.264 files), and the decoder converts the compressed video back into an uncompressed format with little loss.

An H.264 video encoder carries out prediction, transform and encoding processes to produce a compressed video bitstream, as shown in figure 2 [6].

Fig. 2 H.264 encoder block diagram [6]

After encoding, the coded video data is organized into network abstraction layer (NAL) units, each of which is effectively a packet that contains an integer number of bytes. A NAL unit can carry a coded slice, i.e., a group of macro blocks (MB), each containing the MB type, prediction information, coded block pattern (CBP), residual coefficients and quantization parameter (QP), as explained in figure 3 [4].

Fig. 3: NAL unit interface between encoder and decoder [4]
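To make the NAL packet structure concrete, here is a minimal Python sketch, assuming an Annex B byte stream in which every NAL unit is preceded by a 0x000001 or 0x00000001 start code. It splits the stream and decodes the one-byte NAL header (forbidden_zero_bit, nal_ref_idc, nal_unit_type). The function names and the demo bytes are hypothetical, and emulation-prevention bytes are left in place, so it is an illustration rather than a conforming parser.

```python
# Sketch: split an H.264 Annex B byte stream into NAL units and decode
# the one-byte NAL unit header of each unit.

NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list:
    """Return the NAL units found between 0x000001 start codes.

    Trailing zero bytes are stripped so that a 4-byte start code
    (0x00000001) does not leave a stray 0x00 on the previous unit.
    Emulation-prevention bytes (0x03) are NOT removed in this sketch.
    """
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


def parse_nal_header(unit: bytes) -> dict:
    """Decode forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits)."""
    first = unit[0]
    nal_type = first & 0x1F
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": nal_type,
        "name": NAL_TYPE_NAMES.get(nal_type, "other"),
    }


if __name__ == "__main__":
    # Hand-made stream: SPS, PPS and the start of an IDR slice (payloads truncated).
    demo = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
            b"\x00\x00\x01\x68\xce\x3c\x80"
            b"\x00\x00\x01\x65\x88\x84")
    for nal in split_annexb(demo):
        print(parse_nal_header(nal)["name"], len(nal))
```

Each unit's first byte tells the decoder what the packet carries before it parses the slice data inside it.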

The function of an H.264 video decoder is to reproduce the video sequence by carrying out the complementary functions of the encoder, i.e., decoding, inverse transformation and reconstruction, as shown in figure 3a [6].

Fig. 3a H.264 decoder block diagram [6]

Prediction Modes:

The prediction modes in H.264 can be categorized as intra prediction (I), inter prediction (P) and their combination.

INTRA PREDICTION MODE:

An intra (I) macro block is predicted only from previously decoded samples in the current slice. I macro blocks may occur in any slice type. In an intra MB, the luma component can be predicted in 3 ways, namely 16 × 16, 8 × 8 or 4 × 4. A single prediction block is generated for each chroma component, as shown in table 1 [4].

The modes of prediction for 16 × 16 MB are given in table 2 and figure 4 [4][5].

Table 1: Various intra prediction block sizes and properties. [4]

Table 2: 16x16 luma prediction modes and properties.[4]

Fig 4: 16x16 luma prediction modes, all predicted from pixels H and V [4]
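The 16x16 modes of table 2 can be sketched in a few lines of Python. This assumes that both the upper (H) and left (V) neighbours are available and uses the standard mode numbering (0 vertical, 1 horizontal, 2 DC, 3 plane); plane mode is omitted, and the helper `best_16x16_mode` is a hypothetical simplification that ranks modes by SAD instead of full RDO.

```python
# Sketch of three of the four 16x16 luma intra prediction modes.
# Assumes both neighbours are available; plane mode (3) is left out.

VERTICAL, HORIZONTAL, DC = 0, 1, 2


def predict_16x16(mode, top, left):
    """top: 16 reconstructed pixels above the MB (H); left: 16 pixels to its left (V)."""
    if mode == VERTICAL:      # each column copies the pixel above it
        return [list(top) for _ in range(16)]
    if mode == HORIZONTAL:    # each row copies the pixel to its left
        return [[left[y]] * 16 for y in range(16)]
    if mode == DC:            # rounded mean of the 32 neighbouring pixels
        dc = (sum(top) + sum(left) + 16) >> 5
        return [[dc] * 16 for _ in range(16)]
    raise NotImplementedError("plane mode is not implemented in this sketch")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_16x16_mode(block, top, left):
    """Pick the mode with the lowest SAD (a simplification of the RDO choice)."""
    return min((VERTICAL, HORIZONTAL, DC),
               key=lambda m: sad(block, predict_16x16(m, top, left)))
```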

8x8 (for Chroma) –

Mode 0 (DC): mean of upper and left-hand samples (H+V). [5]

Mode 1 (horizontal): extrapolation from left samples (V). [5]

Mode 2 (vertical): extrapolation from upper samples (H). [5]

Mode 3 (Plane): a linear "plane" function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance. [5]

The properties of the 4x4 luma prediction modes are given in table 3 [4] and their pictorial representation in figure 5 [4].

Table 3: 4x4 luma prediction modes and properties [4]

Fig.5: 4x4 luma prediction (intra-prediction) modes in H.264[1]

(Pixels A to M have been previously coded and reconstructed, and are used to form the prediction for the 4 x 4 block.)
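A similar sketch for the 4x4 luma modes uses the neighbouring pixels of figure 5, assuming the usual layout (A to D above, E to H above right, I to L to the left, M above left, unused here). Only modes 0 to 3 are shown, and the function name is illustrative.

```python
# Sketch of four of the nine 4x4 luma intra prediction modes.
# above = [A, B, C, D, E, F, G, H], left = [I, J, K, L]; all assumed available.

def predict_4x4(mode, above, left):
    """Return the 4x4 prediction block (list of rows) for modes 0-3."""
    if mode == 0:  # vertical: extrapolate A-D downward
        return [list(above[:4]) for _ in range(4)]
    if mode == 1:  # horizontal: extrapolate I-L to the right
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of A-D and I-L
        dc = (sum(above[:4]) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    if mode == 3:  # diagonal down-left: 3-tap filter along 45 degrees using A-H
        p = above
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                if x == 3 and y == 3:
                    pred[y][x] = (p[6] + 3 * p[7] + 2) >> 2
                else:
                    pred[y][x] = (p[x + y] + 2 * p[x + y + 1] + p[x + y + 2] + 2) >> 2
        return pred
    raise NotImplementedError("modes 4-8 are not implemented in this sketch")
```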

INTER PREDICTION MODE:

Inter prediction is the process of predicting a block of luma and chroma samples in the current frame from samples of a previously coded and transmitted frame, called a reference frame. First, a prediction region is selected in the reference frame to generate a prediction block; this block is subtracted from the original block of samples to form a residual, which is then transformed, coded and transmitted together with the information that identifies the prediction region (the motion vector and the reference frame) [4].
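The inter prediction steps above (find a prediction region, subtract it, code the residual) can be illustrated with integer-pixel full-search block matching. This is a hedged sketch: SAD as the matching cost, a square search window and frames stored as 2-D lists are assumptions made for brevity; the JM encoder uses faster searches, sub-pixel refinement and RD-based costs, and the function names are hypothetical.

```python
# Sketch of integer-pixel full-search block matching for one block.
# Only candidates that lie fully inside the reference frame are tested.

def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=16, search_range=8):
    """Return (dx, dy, sad) of the best match within +/- search_range pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block would leave the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def residual(cur, ref, bx, by, dx, dy, n):
    """Prediction error that would then be transformed and coded."""
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
            for y in range(n)]
```

The returned (dx, dy) pair is the motion vector that is transmitted with the residual.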

A macro block can be partitioned for inter prediction in the following ways [4], as shown in figures 6 and 7:

(a) One 16x16 partition.

(b) Two 8x16 partitions.

(c) Two 16x8 partitions.

(d) Four 8x8 partitions.

(e) In case (d), each 8x8 partition may be further split into 8x4, 4x8 or 4x4 sub-partitions.

Fig.6 Macro block partitions: 16x16, 8x16, 16x8, 8x8 [4]

Fig. 7 Macro block sub partitions: 8x8, 4x8, 8x4, 4x4 [4]

Rate Distortion Optimization (RDO) [6], [8], [9], [20]:

Once the prediction and residual have been calculated for all the modes, the encoder must choose the best one. Rather than simply picking the mode with the least residual, the H.264/AVC encoder applies the rate-distortion optimization (RDO) technique to each macro block to select the mode that best trades off distortion against bit rate [2].

Set the macro block parameters: the quantization parameter (QP) and the Lagrangian multiplier λ_MODE, calculated as [2]

$$\lambda_{MODE} = 0.85 \times 2^{(QP-12)/3} \qquad (1)$$

Then calculate the cost, which determines the best mode [2]:

$$Cost = D + \lambda_{MODE} \times R \qquad (2)$$

where D is the distortion and R is the bit rate for the given QP.

Distortion (D) is obtained as the SSD (sum of squared differences) between the original macro block and its reconstructed block.

Bit rate (R) includes the number of bits for the mode information and the transform coefficients of the macro block.

Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations in one macro block is

$$N_8 \times (16 \times N_4 + N_{16}) = 4 \times (16 \times 9 + 4) = 592$$

where

N8 – number of modes of an 8x8 chroma block (4)

N4 – number of modes of a 4x4 luma block (9)

N16 – number of modes of a 16x16 luma block (4)

With exhaustive RDO, the H.264/AVC encoder therefore carries out 592 RDO calculations to choose the best intra mode combination for each MB, which greatly increases the complexity of the encoder [16].
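Equations (1) and (2) and the 592 count can be checked with a small Python sketch. The rate R is assumed to be supplied by the entropy coder, and the function names are hypothetical; the sketch only shows how λ_MODE grows with QP and how the minimum-cost mode is chosen.

```python
# Sketch of Lagrangian mode selection with equations (1) and (2).

def lagrange_multiplier(qp):
    """Equation (1): lambda_MODE = 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r))


def rd_cost(distortion, rate_bits, qp):
    """Equation (2): Cost = D + lambda_MODE * R."""
    return distortion + lagrange_multiplier(qp) * rate_bits


def choose_mode(candidates, qp):
    """candidates maps a mode name to its (distortion, rate_bits) pair."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], qp))


# Number of exhaustive RDO evaluations for one intra MB: N8 * (16 * N4 + N16).
N8, N4, N16 = 4, 9, 4
print(N8 * (16 * N4 + N16))  # 592
```

At a high QP the multiplier is large, so a cheap-to-code mode wins even with more distortion; at a low QP the lower-distortion mode wins.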

INPUT FORMATS:

H.264 can compress planar and interleaved/packed raw image data (e.g., YUV, RGB) which, depending on the video, is in intermediate formats such as CIF (common intermediate format), QCIF (quarter common intermediate format), Sub-QCIF and 4CIF. CIF and QCIF are mainly used in this project. The resolutions of the different formats are shown in table 4 [4], and the resolutions of CIF and QCIF for 4:2:0 sampling are shown in figure 8 [18].

Table 4: Different intermediate formats[4]

Fig. 8 CIF and QCIF resolutions(Y, Cb, Cr), (4:2:0) [8][9]
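For the formats in table 4 and figure 8, the following sketch computes the 4:2:0 plane sizes and reads raw planar YUV frames. It assumes 8-bit samples stored as a full Y plane followed by the Cb and Cr planes, a layout typical of raw .yuv test sequences; the helper names are illustrative.

```python
import io

# Sketch: plane sizes for 4:2:0 sampling and a reader for raw planar
# YUV 4:2:0 frames (8 bits per sample, Y then Cb then Cr).

FORMATS = {
    "Sub-QCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
}


def plane_sizes_420(width, height):
    """Chroma planes are subsampled by 2 horizontally and vertically."""
    return {"Y": (width, height),
            "Cb": (width // 2, height // 2),
            "Cr": (width // 2, height // 2)}


def frame_bytes_420(width, height):
    """Bytes per 8-bit frame: one full-size luma plane plus two quarter-size chroma planes."""
    return width * height + 2 * (width // 2) * (height // 2)


def read_frames_420(f, width, height):
    """Yield (Y, Cb, Cr) byte strings for each complete frame in a binary file object."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    frame_size = y_size + 2 * c_size
    while True:
        data = f.read(frame_size)
        if len(data) < frame_size:
            return  # end of file; an incomplete trailing frame is ignored
        yield data[:y_size], data[y_size:y_size + c_size], data[y_size + c_size:]


if __name__ == "__main__":
    for name, (w, h) in FORMATS.items():
        print(name, plane_sizes_420(w, h), frame_bytes_420(w, h))
    # Two synthetic all-zero QCIF frames held in memory instead of a .yuv file.
    demo = io.BytesIO(bytes(2 * frame_bytes_420(176, 144)))
    print(sum(1 for _ in read_frames_420(demo, 176, 144)), "frames")
```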

QUALITY MEASUREMENT:

A major challenge is determining the quality of the decoded image or video. Objective quality criteria give accurate and repeatable results, but as yet no objective measurement system completely reproduces the human visual system [11].

PSNR

Peak signal to noise ratio (PSNR) is measured on a logarithmic scale and depends on the mean squared error (MSE) between an original and a decoded (lossy) image or video frame, relative to the square of the highest possible signal value in the image, (2^n − 1)^2, where n is the number of bits per image sample [11]:

$$PSNR_{dB} = 10 \log_{10} \frac{(2^n - 1)^2}{MSE}$$

PSNR is easy to calculate and is the most widely used measure of quality.
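The PSNR formula translates directly into code. Here is a minimal Python sketch, assuming frames given as equally sized 2-D lists of integer samples with n bits per sample (n = 8 by default); the function names are illustrative.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally sized 2-D sample arrays."""
    diffs = [(o - d) ** 2 for ro, rd in zip(original, decoded) for o, d in zip(ro, rd)]
    return sum(diffs) / len(diffs)


def psnr(original, decoded, bits_per_sample=8):
    """PSNR in dB with peak value (2^n - 1), as in the formula above."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    peak = (2 ** bits_per_sample - 1) ** 2
    return 10 * math.log10(peak / err)
```

For a video sequence, this is usually computed per frame (often for the Y plane) and averaged.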

SSIM

The structural similarity index (SSIM) is a method to measure the similarity between two images. While calculating SSIM, the reference image is assumed to be perfect, i.e., the original image without any artifacts. Hence, SSIM is measured against the original image, or the image closest to the original [4].
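As a rough illustration of what SSIM compares (the means, variances and covariance of the two images), the sketch below evaluates the SSIM expression over a single window covering the whole image. This is a simplification made for brevity: the standard metric averages the same expression over local sliding windows, and the constants C1 = (0.01L)^2 and C2 = (0.03L)^2 are the commonly used defaults, not values taken from this document.

```python
def ssim_single_window(x, y, bits_per_sample=8):
    """SSIM of two equally sized 2-D images computed over ONE window (the whole image).

    Uses population statistics and C1 = (0.01 L)^2, C2 = (0.03 L)^2 with L = 2^n - 1.
    The standard metric instead averages this value over local windows.
    """
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    dyn_range = 2 ** bits_per_sample - 1
    c1 = (0.01 * dyn_range) ** 2
    c2 = (0.03 * dyn_range) ** 2
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((a - mx) * (a - mx) for a in xs) / n
    var_y = sum((b - my) * (b - my) for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx * mx + my * my + c1) * (var_x + var_y + c2)))
```

Identical images give 1.0; structurally inverted content can even give a negative value, which PSNR alone would not reveal.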

OPTIMIZATION PROCESS OF BASELINE PROFILE:

H.264 provides better compression than previous codecs, but it is computationally much more complex and time-consuming for real-time applications. To make H.264 more suitable for practical applications, the encoding time must be reduced. In this project, the encoding time is reduced by applying the following methods together:

1. Parallel programming in baseline profile [7],

2. Fast algorithm for intra mode selection [8] and

3. Fast algorithm for inter mode selection [9] [20].

The baseline profile is selected because of its ease of implementation. The important features of the baseline profile are:

a) I and P slice coding.

b) Enhanced error resilience tools such as flexible macro block ordering (FMO), arbitrary slice ordering (ASO) and redundant slices (RS).

c) Context adaptive variable length coding (CAVLC)

The baseline profile is primarily used in low-cost applications that need robustness to data loss, such as video conferencing and videophone. The joint model (JM 18.0) implementation of the H.264 encoder is used in this project [10].

1. Parallel Programming in Baseline Profile [7]:

Parallelism is obtained by encoding several frames at the same time. The strategy adopted for encoding the frames in parallel is as follows:

Step 1. Divide the frames to be encoded into 2 equal sets.

Ex: If 30 frames are to be encoded, frames 1 to 15 are placed in set 1 and frames 16 to 30 in set 2.

Step 2. Intra code the first frame of each set in parallel.

Ex: Frames 1 and 16 are intra coded together. Frame 1 is then used as the reference frame for frame 2, frame 16 as the reference frame for frame 17, and so on.

Step 3. Inter code frames 2 and 17 in parallel, using OpenMP, which requires changes to the encoding algorithm. Repeat for frames 3 and 18, and so on, until all the frames are encoded, as shown in figure 9.

Fig.9: Parallel processing of frames to reduce encoding time[7]
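Steps 1 to 3 can be sketched in Python with `concurrent.futures`, standing in for the OpenMP changes made to the JM encoder. `encode_frame` is a hypothetical placeholder that only records which reference each frame would use. With real encoding work in CPython, threads would be limited by the global interpreter lock, so a process pool or native code would be needed for an actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_frame(frame_no, ref_no):
    """Hypothetical stand-in for the per-frame encoder call.

    Returns (frame number, 'I' or 'P', reference frame number).
    """
    return (frame_no, "I" if ref_no is None else "P", ref_no)


def encode_in_two_sets(total_frames, encode=encode_frame):
    """Encode frames 1..total_frames as two sets processed in lockstep pairs."""
    half = total_frames // 2
    sets = [list(range(1, half + 1)), list(range(half + 1, total_frames + 1))]
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for step in range(max(len(s) for s in sets)):
            futures = []
            for frames in sets:
                if step < len(frames):
                    # Step 2: the first frame of each set is intra coded (no reference);
                    # Step 3: every later frame uses the previous frame of its own set.
                    ref = frames[step - 1] if step > 0 else None
                    futures.append(pool.submit(encode, frames[step], ref))
            # Wait for both frames of this pair before starting the next pair.
            results.extend(f.result() for f in futures)
    return results
```

For 30 frames, the first pairs are (1, 16) as I frames and then (2, 17) as P frames referencing frames 1 and 16, matching the example above.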

2. Fast Algorithm for Intra Mode Selection[8]:

Proposed intra mode selection algorithm for a 4x4 luma block [12], [8]:

In figure 10, black dots indicate the positions of the pixels used to investigate the directional correlation in the 4x4 luma block, and arrows represent the direction of correlation associated with each mask. Since H.264/AVC 4x4 intra prediction has only 8 directions besides DC mode, 8 directional masks are proposed instead of a precise edge detector such as the Sobel operator [16]. The candidate mode with the minimum difference is selected [8].

Fig.10: The proposed directional masks for a 4x4 luma block. (a) Vertical, (b) Horizontal, (c) Diagonal down left, (d) Diagonal down right, (e) Vertical right, (f) Horizontal down, (g) Vertical left, (h) Horizontal up mask [8].

Fig. 11: Pixel indices and modes of adjacent blocks used in the proposed intra mode selection algorithm.

(a) Indices used in (3) to (10) for a 4x4 luma block, (b) Modes of upper and left blocks for additional

candidate modes [8].

Diff = |a – m| + |b – n| + |c – o| + |d – p|, for vertical direction, (3)

Diff = |a – d| + |e – h| + |i – l| + |m – p|, for horizontal direction, (4)

Diff = |c – i| + 2·|d – m| + |h – n|, for diagonal down left direction, (5)

Diff = |b – l| + 2·|a – p| + |e – o|, for diagonal down right direction, (6)

Diff = |a – n| + 2·|b – o| + |c – p|, for vertical right direction, (7)

Diff = |a – h| + 2·|e – l| + |i – p|, for horizontal down direction, (8)

Diff = |b – m| + 2·|c – n| + |d – o|, for vertical left direction, (9)

Diff = |e – d| + 2·|i – h| + |m – l|, for horizontal up direction, (10)

where a to p denote the pixels used to investigate the directional correlation of the corresponding mask; their positions are shown in figure 11(a). Diff is used as the criterion for correlation, i.e., the direction with the smaller Diff is the more correlated one. In addition, further candidate modes are obtained from the mode information of adjacent blocks: mode A of the upper block and mode B of the left block, as shown in figure 11(b) [8].
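Equations (3) to (10) can be evaluated directly. The sketch below assumes the pixels a to p are numbered in raster order (row by row), which is consistent with every one of those equations, and labels each direction with its H.264 4x4 intra mode number (DC, mode 2, has no mask). The function names are illustrative.

```python
# Equations (3) to (10) for one 4x4 luma block given in raster order:
#   a b c d
#   e f g h
#   i j k l
#   m n o p
# Keys are the H.264 4x4 intra mode numbers of the corresponding directions.

def directional_diffs(block):
    """Return {mode: Diff} for the 8 directional masks (f, g, j, k are unused)."""
    (a, b, c, d), (e, _f, _g, h), (i, _j, _k, l), (m, n, o, p) = block
    return {
        0: abs(a - m) + abs(b - n) + abs(c - o) + abs(d - p),  # vertical (3)
        1: abs(a - d) + abs(e - h) + abs(i - l) + abs(m - p),  # horizontal (4)
        3: abs(c - i) + 2 * abs(d - m) + abs(h - n),           # diagonal down left (5)
        4: abs(b - l) + 2 * abs(a - p) + abs(e - o),           # diagonal down right (6)
        5: abs(a - n) + 2 * abs(b - o) + abs(c - p),           # vertical right (7)
        6: abs(a - h) + 2 * abs(e - l) + abs(i - p),           # horizontal down (8)
        7: abs(b - m) + 2 * abs(c - n) + abs(d - o),           # vertical left (9)
        8: abs(e - d) + 2 * abs(i - h) + abs(m - l),           # horizontal up (10)
    }


def rank_directional_modes(block):
    """Directional modes sorted from most to least correlated (smallest Diff first)."""
    diffs = directional_diffs(block)
    return sorted(diffs, key=diffs.get)
```

A block whose columns are constant gives Diff = 0 for the vertical mask, so vertical prediction is ranked first.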

Modes A and B are added to the candidate modes for the RDO procedure because the directions in H.264/AVC intra prediction are defined by the directional relation between the current block and the boundary pixels of adjacent blocks, rather than by the direction within the current block only. One mode is included in the RDO procedure when mode A and mode B are the same, and two modes when they differ [8].

To determine whether DC mode is included in the RDO procedure, the sum S of the absolute differences between the average of the current block and each of its pixels p_i is computed:

$$S = \sum_{i=0}^{15} \left| avg - p_i \right|, \qquad avg = \frac{1}{16} \sum_{i=0}^{15} p_i \qquad (11)$$

where p_i is each pixel of the current 4x4 block.

Condition 1: If S is smaller than a threshold, T1, RDO is carried out for at most 4 candidate modes, i.e.,

one mode from the proposed masks, at most two modes from adjacent blocks, and DC mode [8].

Condition 2: If S is larger than a threshold, T1, RDO is performed for at most 4 candidate modes, i.e.,

two modes from the proposed masks (with minimum and second minimum Diff) and at most two modes

from adjacent blocks [8].

The proposed intra mode selection algorithm for a 4x4 luma block is summarized as follows:

Step 1 - For a 4x4 luma block, obtain avg and S by (11). [8]

Step 2a - If S is larger than a threshold, T1, carry out the RDO procedure for at most 4 candidate modes: the two modes with the minimum and second minimum Diff by (3) to (10), and at most two modes from adjacent blocks. In this case, a DC mode from the adjacent blocks is excluded from the RDO procedure [8].

Step 2b - If S is smaller than a threshold, T1, carry out RDO procedure for at most 4 candidate modes:

one mode with minimum Diff by (3) to (10), at most two modes from adjacent blocks, and DC mode [8].
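Steps 1, 2a and 2b can be combined into one candidate-list function. This is a sketch under stated assumptions: the threshold T1 is not given in the text, so it is a parameter; a block with S exactly equal to T1 is treated as in Step 2a; and the ranked directional modes (sorted by Diff from equations (3) to (10)) are passed in. The function names are hypothetical.

```python
DC_MODE = 2  # H.264 4x4 intra DC mode


def block_activity(block):
    """Equation (11): average of the 16 pixels and S = sum of |avg - p_i|."""
    pixels = [v for row in block for v in row]
    avg = sum(pixels) / len(pixels)
    return avg, sum(abs(avg - p) for p in pixels)


def candidate_modes_4x4(block, ranked_directional, mode_a, mode_b, t1):
    """Candidate list (at most 4 modes) for the RDO procedure.

    ranked_directional: directional modes sorted by increasing Diff.
    mode_a, mode_b: modes of the upper and left blocks (None if unavailable).
    t1: threshold T1 (value not specified in the text).
    """
    _, s = block_activity(block)
    neighbours = []
    for mode in (mode_a, mode_b):
        if mode is not None and mode not in neighbours:
            neighbours.append(mode)  # one mode if A == B, two if they differ
    if s < t1:
        # Step 2b: smooth block -> best mask mode, neighbour modes and DC mode
        candidates = ranked_directional[:1] + neighbours + [DC_MODE]
    else:
        # Step 2a: textured block -> two best mask modes and non-DC neighbour modes
        candidates = ranked_directional[:2] + [m for m in neighbours if m != DC_MODE]
    unique = []
    for mode in candidates:  # drop duplicates, keep order
        if mode not in unique:
            unique.append(mode)
    return unique
```

Instead of all 9 modes, at most 4 candidates go through RDO, which is the source of the speed-up for 4x4 blocks.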

Proposed intra mode selection algorithm for a 16x16 luma block [12], [8]:

Step 1 - Examine sizes of adjacent blocks: if both blocks (upper block and left block) are 16x16, go to

Step 2, otherwise go to Step 4 [8].

Step 2 - Examine the modes of adjacent blocks: if both modes are the same, go to Step 3; otherwise, select as the best mode for the 16x16 luma block whichever of the two adjacent modes, mode A or mode B, results in the minimum SATD (sum of absolute transformed differences) [8].

Step 3 - If both adjacent modes are DC mode, go to Step 4; otherwise, select as the best mode for the 16x16 luma block whichever of the adjacent mode and DC mode results in the minimum SATD [8].

Step 4 - Let ΔV be a vertical difference between upper boundary pixels of the current block and boundary

pixels of the upper block, and ΔH be a horizontal difference between left boundary pixels of the current

block and boundary pixels of the left block as follows [8].

Where, ΔV = Σ |u(i)-q(i)| for i =0 to 15.

ΔH = Σ |l(i)-r(i)| for i =0 to 15.

u(i) -> upper block boundary pixels,

q(i) -> upper boundary pixels of current block,

l(i) -> boundary pixels of the left block, and

r(i) -> left boundary pixels of the current block.

Fig. 12: Calculation for ΔV and ΔH in 16x16 luma block [2] [8].

Obtain the candidate modes from the two difference values ΔV and ΔH computed as in figure 12, where T2 is a positive threshold set equal to 32: if |ΔV − ΔH| is smaller than T2, the candidate modes are DC mode and plane mode; if ΔV − ΔH is larger than T2, the candidate modes are DC mode and horizontal mode; if ΔV − ΔH is smaller than −T2, the candidate modes are DC mode and vertical mode. Finally, select as the best mode the candidate with the minimum SATD.
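The 16x16 procedure relies on SATD and on the ΔV/ΔH test of Step 4. The sketch below computes SATD with a 4x4 Hadamard transform (unnormalised; encoders often scale it) and reads the Step 4 conditions as a symmetric test against ±T2 with T2 = 32. That reading, the handling of the boundary case |ΔV − ΔH| = T2, and all function names are assumptions made for illustration.

```python
DC, PLANE, HORIZONTAL, VERTICAL = 2, 3, 1, 0  # H.264 16x16 luma mode numbers

H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def satd_4x4(diff):
    """Sum of absolute values of the 4x4 Hadamard transform H * D * H^T."""
    tmp = [[sum(H4[r][k] * diff[k][c] for k in range(4)) for c in range(4)] for r in range(4)]
    out = [[sum(tmp[r][k] * H4[c][k] for k in range(4)) for c in range(4)] for r in range(4)]
    return sum(abs(v) for row in out for v in row)


def satd_16x16(block, pred):
    """SATD of a 16x16 block, accumulated over its sixteen 4x4 sub-blocks."""
    total = 0
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            diff = [[block[by + y][bx + x] - pred[by + y][bx + x] for x in range(4)]
                    for y in range(4)]
            total += satd_4x4(diff)
    return total


def candidates_16x16(upper_bottom_row, cur_top_row, left_right_col, cur_left_col, t2=32):
    """Step 4: candidate pair from Delta V and Delta H."""
    delta_v = sum(abs(u - q) for u, q in zip(upper_bottom_row, cur_top_row))
    delta_h = sum(abs(l - r) for l, r in zip(left_right_col, cur_left_col))
    if abs(delta_v - delta_h) <= t2:  # boundary case assigned here (assumption)
        return [DC, PLANE]
    if delta_v - delta_h > t2:        # weak vertical continuity
        return [DC, HORIZONTAL]
    return [DC, VERTICAL]             # delta_v - delta_h < -t2


def pick_by_satd(block, predictions):
    """predictions maps a candidate mode to its 16x16 prediction; return the minimum-SATD mode."""
    return min(predictions, key=lambda mode: satd_16x16(block, predictions[mode]))
```

Only two predictions per 16x16 block then need an SATD evaluation instead of four.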

3. Fast algorithm for Inter Mode Selection [9], [20]:

FAT for mode decision exploits the statistical similarity between the current macro block and its predicted macro block. The predicted mode is obtained from the spatially and temporally neighboring macro blocks.

For accuracy, the rate distortion cost is checked against two adaptive thresholds:

Adaptive Threshold I: $$RD_{thres} = RD_{pred} \times (1 - 8\beta)$$

Adaptive Threshold II: $$RD_{thres} = RD_{pred} \times (1 + 10\beta)$$

Such that

………. (4)

where β is the modulator, and N and M are the numbers of rows and columns, respectively, of the N × M block. If the predicted mode is P8x8 or smaller, it is checked whether the current macro block is homogeneous. If it is not homogeneous, it is further partitioned into 8x4, 4x8 and 4x4 blocks.

A mode histogram is obtained from the spatially and temporally neighboring macro blocks, and the best mode is selected as the index corresponding to the maximum value in the histogram. The average rate-distortion cost of the neighboring macro blocks coded in this best mode is then used as the prediction cost RD_pred for the current macro block [9], [20].
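The predicted mode, predicted cost and the two adaptive thresholds can be sketched as follows. Because the equation for β (labelled (4) above) is not reproduced in this document, β is simply an input; the neighbour list format and the function names are hypothetical.

```python
from collections import Counter


def predicted_mode_and_cost(neighbours):
    """neighbours: list of (mode, rd_cost) for spatial and temporal neighbouring MBs.

    Returns the histogram-maximum mode and the average RD cost of the
    neighbours coded in that mode (ties go to the mode seen first).
    """
    histogram = Counter(mode for mode, _ in neighbours)
    best = max(histogram, key=histogram.get)
    costs = [cost for mode, cost in neighbours if mode == best]
    return best, sum(costs) / len(costs)


def adaptive_thresholds(rd_pred, beta):
    """Adaptive Thresholds I and II; beta is an input because its formula is not given here."""
    return rd_pred * (1 - 8 * beta), rd_pred * (1 + 10 * beta)
```

Threshold I sits below the predicted cost (a strict early-exit test) and Threshold II above it (a looser test used in the second round).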

FAT Algorithm [9], [20]: The algorithm is shown in figure 13 and explained below:

Step 1: If the current macro block belongs to an I slice, check intra prediction using I4x4 and I16x16 and go to step 10; otherwise, go to step 2.

Step 2: If the current macro block is the first macro block in a P slice, check the inter and intra prediction modes and go to step 10; otherwise, go to step 3.

Step 3: Compute mode histogram from neighboring spatial and temporal macro blocks, go to step 4.

Step 4: Select prediction mode as the index corresponding to maximum in the mode histogram and

obtain values of adaptive Threshold I and adaptive Threshold II, go to step 5.

Step 5: Always check the P16x16 mode and test the skip-mode conditions; if the skip-mode conditions are satisfied, go to step 10; otherwise, go to step 6.

Step 6: If the left, up, up-left and up-right macro blocks all have skip mode, check the rate distortion cost of skip mode against adaptive Threshold I. If it is less than adaptive Threshold I, label the current macro block as skip mode and go to step 10; otherwise, go to step 7.

Step 7: First-round check of the predicted mode: if the predicted mode is P8x8, go to step 8; otherwise, check the rate distortion cost of the predicted mode against adaptive Threshold I. If the RD cost is less than adaptive Threshold I, go to step 10; otherwise, go to step 9.

Step 8: If the current P8x8 block is homogeneous, no further partitioning is required. Otherwise, it is further partitioned into the smaller blocks 8x4, 4x8 and 4x4. If the RD cost of P8x8 is less than adaptive Threshold I, go to step 10; otherwise, go to step 9.

Step 9: Second-round check of the remaining modes against adaptive Threshold II: as soon as the rate distortion cost of a mode is less than adaptive Threshold II, go to step 10; otherwise, continue checking all the remaining modes, then go to step 10.

Step 10: Save the best mode and rate distortion cost.

Fig 13: Flow chart for inter prediction [9] [20]
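Steps 5 to 10 can be summarized as a Python sketch. It assumes the caller has already handled steps 1 to 4, supplies a hypothetical `rd_cost(mode)` callback, and passes the skip-mode conditions of step 5 and the homogeneity test of step 8 as booleans, since the text does not detail them. Intra modes, which a P-slice encoder would also evaluate, are left out, and the mode names are illustrative strings.

```python
# Sketch of steps 5-10 of the FAT inter mode decision for one MB of a P slice.

SUB_8x8 = ["P8x4", "P4x8", "P4x4"]
ALL_MODES = ["SKIP", "P16x16", "P16x8", "P8x16", "P8x8"] + SUB_8x8  # intra modes omitted


def fat_mode_decision(rd_cost, pred_mode, rd_pred, beta,
                      skip_conditions_met, neighbours_all_skip, homogeneous):
    """Return (chosen mode, its RD cost, list of modes actually evaluated)."""
    thr1 = rd_pred * (1 - 8 * beta)   # adaptive Threshold I
    thr2 = rd_pred * (1 + 10 * beta)  # adaptive Threshold II
    checked = {}

    def check(mode):
        if mode not in checked:
            checked[mode] = rd_cost(mode)
        return checked[mode]

    def finish():                     # step 10: save the best mode and its cost
        best = min(checked, key=checked.get)
        return best, checked[best], list(checked)

    check("P16x16")                   # step 5: P16x16 is always checked
    if skip_conditions_met:
        return "SKIP", check("SKIP"), list(checked)
    if neighbours_all_skip and check("SKIP") < thr1:  # step 6: early skip
        return "SKIP", checked["SKIP"], list(checked)
    if pred_mode == "P8x8":           # step 8: split further unless homogeneous
        check("P8x8")
        if not homogeneous:
            for mode in SUB_8x8:
                check(mode)
        best_8x8 = min(checked[m] for m in ["P8x8"] + SUB_8x8 if m in checked)
        if best_8x8 < thr1:
            return finish()
    elif check(pred_mode) < thr1:     # step 7: first-round check of the predicted mode
        return finish()
    for mode in ALL_MODES:            # step 9: second-round check, stop below Threshold II
        if mode not in checked and check(mode) < thr2:
            break
    return finish()
```

When the predicted mode already falls below Threshold I, only one or two modes are evaluated; the early terminations are where the encoding-time saving comes from.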

CONCLUSION:

By implementing parallel programming in the baseline profile together with the FAT algorithm for intra and inter prediction, and by evaluating the result on numerous test sequences with quality measures such as PSNR and SSIM, an optimized baseline profile is expected to be obtained.

The performance of the optimized H.264 baseline profile will be compared with that of the standard H.264 baseline profile on various test sequences using these quality measurements, so that the gains in encoding speed, as well as the resulting video quality and bit rates, can be quantified.

REFERENCES:

[1] H. Kalva, "Parallel programming for multimedia applications", Springer Science and Business Media, Florida Atlantic University, Florida, USA, Dec. 2010.

[2] J. Kim, D. Kim, and J. Jeong, "Complexity reduction algorithm for intra mode selection in H.264/AVC video coding", in J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454-465, Springer-Verlag Berlin Heidelberg, 2006.

[3] J. Ren et al., "Computationally efficient mode selection in H.264/AVC video coding", IEEE Trans. on Consumer Electronics, vol. 54, pp. 877-886, May 2008.

[4] I. Richardson, "The H.264 advanced video compression standard", second edition, Wiley, 2010.

[5] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next generation multimedia", Wiley, 2nd edition, Aug. 2010.

[6] D. Marpe, T. Wiegand and G. J. Sullivan, "The H.264/MPEG-4 AVC standard and its applications", IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.

[7] T. Saxena, "Reducing the encoding time of H.264 baseline profile using parallel programming techniques", M.S. Thesis, EE, UTA, expected Dec. 2012.

[8] S. K. Muniyappa, "Implementation of complexity algorithm for intra mode selection in H.264/AVC video coding", M.S. Thesis, EE, UTA, Dec. 2011.

[9] A. Kulkarni, "Implementation of fast inter-prediction mode decision algorithm in H.264/AVC video encoder", M.S. Thesis, EE, UTA, May 2012.

[10] JM reference software, Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute. http://iphome.hhi.de/suehring/tml/


[11] G. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, 2004.

[12] F. Pan et al., "Fast intra mode decision algorithm for H.264/AVC video coding", in Proc. IEEE Int. Conf. Image Process., pp. 781-784, Singapore, Oct. 2004.

[13] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley, 2003.

[14] ISO/IEC 11172-5, Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps, Nov. 1998.

[15] M. Jafari and S. Kasaei, "Fast intra- and inter-prediction mode decision in H.264 advanced video coding", International Journal of Computer Science and Network Security, vol. 8, no. 5, pp. 1-6, May 2008.

[16] T. Stockhammer, D. Kontopodis, and T. Wiegand, "Rate-distortion optimization for H.26L video coding in packet loss environment", in Proc. Packet Video Workshop 2002, Pittsburgh, PA, April 2002.

[17] Draft ITU-T Recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC), Mar. 2003.

[18] YUV test video sequences: http://trace.eas.asu.edu/yuv/

[19] T. Wiegand et al., "Overview of the H.264/AVC video coding standard", IEEE Trans. Circuits and Syst. for Video Technol., vol. 13, pp. 560-576, July 2003.

[20] D. Han, A. Kulkarni and K. R. Rao, "Fast inter-prediction mode decision algorithm for H.264 video encoder", ECTICON 2012, Cha Am, Thailand, May 2012.
