[IEEE TENCON 2007 - 2007 IEEE Region 10 Conference - Taipei, Taiwan (2007.10.30-2007.11.2)] TENCON...


Hybrid Intermode Decision for H.264 Video Coding

Y.M. Lee and Y. Lin

Department of Communication Engineering

National Central University, Taiwan 32054, R.O.C.

Abstract- This paper proposes a hybrid intermode decision algorithm for H.264 video coding in which statistics of the sum of absolute differences (SAD) between macroblocks are used to describe the stationary characteristics of video objects. Two suitable algorithms, the zero-block detection algorithm and the bottom-up merging algorithm, are applied to the stationary and non-stationary regions of a video sequence, respectively, to reduce computation. Simulation results show that the proposed algorithm significantly reduces computation while maintaining high coding efficiency, and that it requires far less computation than the other investigated algorithms at every bit rate tested.

I. INTRODUCTION

The emerging H.264 video coding standard achieves significantly better PSNR and visual quality at the same bit rate than prior video coding standards such as MPEG-4 and H.26L. This is due to a number of new techniques employed in H.264/AVC. One important technique is variable block-size motion estimation/compensation. In H.264/AVC, interframe motion estimation is performed for 7 different block sizes, denoted as modes $m_1$ to $m_7$: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4. Rate distortion optimization (RDO) is used to check all possible inter modes and select the one with the best coding result, which yields the highest coding efficiency. However, this optimization over all the variable block-size modes dramatically increases the computational complexity of H.264.

To reduce the computational cost while maintaining coding performance, many fast and efficient intermode decision methods have been proposed in recent years. Some algorithms reduce computation by deriving motion vector candidates for different block-size modes from one another, using bottom-up merging or top-down splitting [1]. Other algorithms reduce computation by excluding less likely modes from the mode decision process [2]. This approach is based on the observation that if a larger block-size mode has a higher RD cost than the current block-size mode, then even larger block-size modes are also likely to have higher RD costs and can be excluded, and vice versa.

Other schemes use temporal and/or spatial correlations to classify video objects into stationary (and homogeneous) or non-stationary (and non-homogeneous) areas [3]-[4]. The sum of absolute differences $SAD_{16\times16}$ between the current macroblock (MB) and the co-located MB in the reference frame is commonly used to check whether the current MB is temporally stationary, by comparing $SAD_{16\times16}$ with a threshold $T_{sad16\times16}$. If the MB is stationary, only the large block-size modes are performed; otherwise, all 7 modes are performed. In this work, $SAD_{16\times16}$ is used to classify video objects into stationary and non-stationary regions. Two suitable algorithms, the zero-block detection algorithm and the bottom-up merging algorithm, are then employed to predict the modes to be performed for MBs in the stationary and non-stationary areas, respectively.

II. HYBRID INTERMODE DECISION ALGORITHM

Most video sequences contain a large amount of background or motionless video objects in stationary regions, and after the computationally expensive rate distortion optimization (RDO) most of these MBs are encoded with the SKIP mode or a large block-size mode. Conversely, areas of a video sequence that have fast motion or high detail should be split into smaller blocks to obtain the best coding efficiency. It is also observed that MBs encoded with the SKIP mode or the large block-size modes (the L1 modes $m_1$, $m_2$ and $m_3$) contain numerous zero blocks, i.e., 4x4 blocks whose quantized DCT coefficients are all zero, among the 4x4 blocks that make up $SAD_{16\times16}$, while MBs encoded with the L2 modes ($m_4$, $m_5$, $m_6$ and $m_7$) contain fewer zero blocks.

In the simple mode decision algorithm [4], a video sequence is divided into a stationary region and a non-stationary region by comparing $SAD_{16\times16}$ with the threshold $T_{sad16\times16}$. Only L1 modes are performed for MBs in the stationary region, while both L1 and L2 modes are performed for MBs in the non-stationary region.

To improve computational efficiency further, a different and suitable decision algorithm is needed for each region. We therefore propose a novel hybrid algorithm in which MBs in the stationary region (i.e., $SAD_{16\times16} \le T_{sad16\times16}$) are processed with a decision algorithm based on zero-block (ZB) detection of DCT coefficients, while MBs in the non-stationary region (i.e., $SAD_{16\times16} > T_{sad16\times16}$) are processed with a bottom-up merging algorithm that estimates the predicted motion vectors (PMVs) of the larger block-size modes from the MVs of the 4x4 mode. The hybrid decision algorithm is depicted in Fig. 1. Both algorithms are described below.

2.1. Zero-block detection algorithm for stationary region

In the simple mode decision algorithm, L1 modes are performed in the stationary region. In this section we propose an algorithm that further reduces computation by using the number of ZBs as a criterion to classify video objects in more detail. The algorithm uses the sum of absolute differences $SAD_{16\times16}$ between the current MB and the co-located MB in the reference frame to check whether an MB is temporally stationary. $SAD_{16\times16}$ is given by

$$SAD_{16\times16} = \sum_{i=0}^{15}\sum_{j=0}^{15}\left|MB_c(i,j) - MB_r(i,j)\right| \qquad (1)$$

where $MB_c(i,j)$ and $MB_r(i,j)$ are the pixel intensities of the current MB and the reconstructed co-located MB, respectively.
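As a minimal sketch, Eq. (1) can be computed directly, assuming each macroblock is given as 16 rows of 16 integer luma samples (the function name `sad_16x16` is ours, not taken from the JM86 encoder):

```python
def sad_16x16(mb_cur, mb_ref):
    """Eq. (1): sum of absolute differences between the current 16x16 MB and
    the reconstructed co-located MB, each given as 16 rows of 16 pixels."""
    if len(mb_cur) != 16 or len(mb_ref) != 16:
        raise ValueError("a macroblock must have 16 rows")
    total = 0
    for row_c, row_r in zip(mb_cur, mb_ref):
        if len(row_c) != 16 or len(row_r) != 16:
            raise ValueError("each macroblock row must have 16 pixels")
        total += sum(abs(c - r) for c, r in zip(row_c, row_r))
    return total


# Example: a flat MB whose co-located MB is uniformly 2 levels darker.
cur = [[100] * 16 for _ in range(16)]
ref = [[98] * 16 for _ in range(16)]
print(sad_16x16(cur, ref))  # 512
```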

In H.264/AVC, a 4x4 integer DCT is used instead of the 8x8 discrete cosine transform (DCT) of earlier standards, which reduces both ringing and blocking artifacts. Accordingly, $SAD_{16\times16}$ can be rewritten in terms of the sixteen 4x4 blocks of the MB as

$$SAD_{16\times16} = \sum_{i=0}^{15}\sum_{j=0}^{15}\left|MB_c(i,j)-MB_r(i,j)\right| = \sum_{l=0}^{3}\sum_{k=0}^{3}\sum_{i=0}^{3}\sum_{j=0}^{3} x_{l,k}(i,j) = \sum_{l=0}^{3}\sum_{k=0}^{3} X_{l,k} \qquad (2)$$

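where $x_{l,k}(i,j) = \left|MB_c(4l+i,\,4k+j) - MB_r(4l+i,\,4k+j)\right|$ is the absolute difference at position $(i,j)$ of the 4x4 block located at $(4l,4k)$ in the MB, and $X_{l,k}$ is the SAD of that 4x4 block. In this work, instead of always performing the L1 modes, we use the number of zero blocks (ZBs) of the 4x4 DCT among the blocks of $SAD_{16\times16}$ to decide which modes should be performed for an MB, thereby reducing computation.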
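The decomposition in Eq. (2) can be checked with a short sketch. It assumes that $l$ indexes block rows and $k$ block columns (the text only says the block is located at $(4l,4k)$), and the helper name `block_sads_4x4` is hypothetical:

```python
def block_sads_4x4(mb_cur, mb_ref):
    """Eq. (2): split the 16x16 absolute-difference array into sixteen 4x4
    blocks. Returns (X, x) where x[l][k][i][j] = |MB_c(4l+i, 4k+j) -
    MB_r(4l+i, 4k+j)| and X[l][k] is the SAD of that 4x4 block."""
    x = [[None] * 4 for _ in range(4)]
    X = [[0] * 4 for _ in range(4)]
    for l in range(4):
        for k in range(4):
            blk = [[abs(mb_cur[4 * l + i][4 * k + j] - mb_ref[4 * l + i][4 * k + j])
                    for j in range(4)]
                   for i in range(4)]
            x[l][k] = blk
            X[l][k] = sum(sum(row) for row in blk)
    return X, x


# Example: only the top-left 4x4 block differs, by 3 levels per pixel.
cur = [[100] * 16 for _ in range(16)]
ref = [row[:] for row in cur]
for i in range(4):
    for j in range(4):
        ref[i][j] = 97
X, x = block_sads_4x4(cur, ref)
print(X[0][0], sum(sum(row) for row in X))  # 48 48
```

The sum of all sixteen $X_{l,k}$ equals $SAD_{16\times16}$, which is the identity that Eq. (2) expresses.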

Moon et al. [5] derived a sufficient condition for ZBs in H.264 video coding, summarized as follows.

(1) If $X_{l,k} \le T(0)$, then the corresponding 4x4 block is a ZB, where

$$T(0) = \frac{5}{6}\cdot\frac{2^{15+\lfloor QP/6\rfloor}}{4\cdot M(QP\%6;\,0)}$$

and $M(QP\%6;\,r)$ is a multiplication factor for quantization.

(2) If $X_{l,k} > T(0)$ and $X_{l,k} \le \min\{T(0)+\gamma/2,\; T(1)\}$, then the block is also a ZB. The parameters $T(1)$ and $\gamma$ are given by

$$T(1) = \frac{5}{6}\cdot\frac{2^{15+\lfloor QP/6\rfloor}}{2\cdot M(QP\%6;\,1)}$$

and

$$\gamma = \min\left\{\sum_{j=0}^{3}\left[x_{l,k}(0,j)+x_{l,k}(3,j)\right],\ \sum_{j=0}^{3}\left[x_{l,k}(1,j)+x_{l,k}(2,j)\right]\right\}$$
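The two conditions can be sketched as a per-block test. This is an illustration under stated assumptions, not the implementation of [5]: we read $M(QP\%6;0)$ as the standard H.264 quantization multiplier for the coefficient positions (1,1), (1,3), (3,1) and (3,3), and $M(QP\%6;1)$ as the multiplier for positions with one even and one odd index. The function names are hypothetical.

```python
# H.264 forward-quantization multipliers indexed by QP % 6 (standard table).
# Assumption: r = 0 -> positions (1,1),(1,3),(3,1),(3,3); r = 1 -> positions
# with one even and one odd index, e.g. (0,1) or (2,3).
M = {
    0: (5243, 4660, 4194, 3647, 3355, 2893),
    1: (8066, 7490, 6554, 5825, 5243, 4559),
}


def zb_thresholds(qp):
    """T(0) and T(1) of the sufficient ZB condition, for 0 <= qp <= 51."""
    scale = 5 * 2 ** (15 + qp // 6) / 6
    t0 = scale / (4 * M[0][qp % 6])
    t1 = scale / (2 * M[1][qp % 6])
    return t0, t1


def gamma(x):
    """Minimum of the absolute-difference sums over rows {0,3} and rows {1,2}."""
    rows_03 = sum(x[0][j] + x[3][j] for j in range(4))
    rows_12 = sum(x[1][j] + x[2][j] for j in range(4))
    return min(rows_03, rows_12)


def is_zero_block(x, qp):
    """True means the 4x4 block with absolute differences x is guaranteed to
    quantize to all-zero coefficients (under the assumptions above)."""
    X = sum(sum(row) for row in x)
    t0, t1 = zb_thresholds(qp)
    if X <= t0:                                   # condition (1)
        return True
    return X <= min(t0 + gamma(x) / 2, t1)        # condition (2)


def count_zero_blocks(blocks, qp):
    """N: number of ZBs among the sixteen 4x4 blocks, blocks[l][k] = x_{l,k}."""
    return sum(is_zero_block(blocks[l][k], qp) for l in range(4) for k in range(4))


spread = [[5, 5, 0, 0] for _ in range(4)]                        # X = 40, gamma = 20
concentrated = [[10, 10, 10, 10]] + [[0] * 4 for _ in range(3)]  # X = 40, gamma = 0
print(is_zero_block(spread, 28), is_zero_block(concentrated, 28))  # True False
```

Both example blocks have the same SAD, but only the one whose differences are spread across the rows passes condition (2), which is what $\gamma$ captures. Because the test is only sufficient, a block that fails it may still quantize to zero; $N$ therefore counts guaranteed ZBs.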

The early ZB detection algorithm of [5] is used. Based on the number of ZBs in the MB, denoted as $N$, the ZB detection algorithm is as follows.

(1) For $N = 16$, perform mode $m_0$ (SKIP) and choose it as the best mode.

(2) For $8 \le N \le 15$, perform modes $m_0$ and $m_1$. If the RD cost of mode $m_1$ is less than that of mode $m_0$, further perform modes $m_2$ and $m_3$. Choose the best mode.

(3) For $5 \le N \le 7$, perform modes $m_0$, $m_1$, $m_2$ and $m_3$. Choose the best mode.

(4) For $0 \le N \le 4$, perform modes $m_0$, $m_1$, $m_2$, $m_3$ and $m_4$, and choose the best mode. If mode $m_4$ has the best (least) RD cost, further perform $m_5$, $m_6$ and $m_7$, and choose the best mode.
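The four rules map directly onto a small decision routine. In this sketch `rd_cost(m)` is a hypothetical callback that runs RDO for mode $m$ (0 = SKIP, 1 to 7 = 16x16 to 4x4) and returns its RD cost, and the `stationary` flag (our addition) skips $m_5$ to $m_7$ for $0 \le N \le 4$, since that step can be omitted when the algorithm is used only in the stationary region:

```python
def zb_mode_decision(n, rd_cost, stationary=True):
    """Choose an inter mode from N, the number of 4x4 zero blocks in the MB.
    rd_cost(m) is a caller-supplied function returning the RD cost of mode m
    (0 = SKIP, 1..7 = 16x16..4x4). Returns (best_mode, tested_modes)."""
    if not 0 <= n <= 16:
        raise ValueError("N must lie in [0, 16]")
    costs = {}

    def perform(*modes):
        for m in modes:
            if m not in costs:
                costs[m] = rd_cost(m)

    if n == 16:                       # rule (1): SKIP only
        perform(0)
        return 0, [0]
    if n >= 8:                        # rule (2): 8 <= N <= 15
        perform(0, 1)
        if costs[1] < costs[0]:
            perform(2, 3)
    elif n >= 5:                      # rule (3): 5 <= N <= 7
        perform(0, 1, 2, 3)
    else:                             # rule (4): 0 <= N <= 4
        perform(0, 1, 2, 3, 4)
        if not stationary and min(costs, key=costs.get) == 4:
            perform(5, 6, 7)
    best = min(costs, key=costs.get)
    return best, sorted(costs)


costs = {0: 120, 1: 100, 2: 90, 3: 95, 4: 130, 5: 80, 6: 85, 7: 70}
print(zb_mode_decision(10, costs.__getitem__))  # (2, [0, 1, 2, 3])
```

Only the modes in `tested_modes` incur RDO; with $N = 10$ the routine never evaluates $m_4$ to $m_7$ even though, in this made-up cost table, $m_7$ would have been cheaper.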

Note that when the zero-block detection algorithm is applied to the stationary region, the step of performing $m_5$, $m_6$ and $m_7$ for $0 \le N \le 4$ can be omitted.

2.2. Decision algorithms for non-stationary region

Many fast and efficient intermode decision algorithms have been proposed recently. Two popular types are the bottom-up merging algorithm and the mode exclusion algorithm. We modify both and apply them to the non-stationary region, where MBs are encoded with L1 and L2 modes in roughly equal proportions. The algorithms are summarized as follows.

2.2.1. Bottom-up merging algorithm

In the bottom-up merging algorithm, mode $m_7$ (4x4) is performed first, and the predicted motion vectors (PMVs) of the larger block-size modes are obtained with a merging procedure. Each PMV is used as the search center, and a full-search refinement within ±2 pixels around it is performed to obtain a more accurate MV. It has been shown that more than 90% accuracy can be achieved with ±2-pel refinement. The merging algorithm is as follows.

Step 1. Perform mode $m_7$ (4x4) and find its estimated MVs.
Step 2. Predict the MVs of the larger block-size modes using the merging procedure.
Step 3. Calculate the absolute difference, Diff, between the PMVs within the same block-size mode.
Step 4. If $Diff \le \beta$, perform ±2-pel refinement; otherwise, perform full-search estimation.

Note that, for comparison purposes, motion estimation for mode $m_7$ (4x4) is performed with full search instead of a fast search. The threshold $\beta$ is tunable: a larger $\beta$ yields more computation reduction but more severe quality degradation, and vice versa.
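The text does not spell out the merging procedure, so the following is only a sketch under our own assumptions: each larger partition's PMV is taken as the rounded mean of the 4x4 MVs it covers, and Diff is taken as the largest per-component spread (max minus min) among those MVs. Step 4 then chooses between ±2-pel refinement and full search per partition. All names are hypothetical, and MVs are assumed to be integer pairs in the encoder's MV units.

```python
# Partition size (width, height) in units of 4x4 blocks for modes m1..m6.
PARTITIONS = {1: (4, 4), 2: (4, 2), 3: (2, 4), 4: (2, 2), 5: (2, 1), 6: (1, 2)}


def merge_mvs(mv4x4, mode):
    """mv4x4[r][c] = (dx, dy) of the 4x4 block at block-row r, block-col c
    (found by full search for m7). Returns [(pmv, diff), ...], one entry per
    partition of `mode` in raster order."""
    w, h = PARTITIONS[mode]
    merged = []
    for top in range(0, 4, h):
        for left in range(0, 4, w):
            mvs = [mv4x4[r][c] for r in range(top, top + h) for c in range(left, left + w)]
            xs = [mv[0] for mv in mvs]
            ys = [mv[1] for mv in mvs]
            pmv = (round(sum(xs) / len(xs)), round(sum(ys) / len(ys)))  # assumed: mean
            diff = max(max(xs) - min(xs), max(ys) - min(ys))            # assumed: spread
            merged.append((pmv, diff))
    return merged


def plan_searches(mv4x4, beta):
    """Step 4: +/-2-pel refinement around the PMV if Diff <= beta, else full search."""
    return {
        mode: [(pmv, "refine_2pel" if diff <= beta else "full_search")
               for pmv, diff in merge_mvs(mv4x4, mode)]
        for mode in PARTITIONS
    }


uniform = [[(4, -2)] * 4 for _ in range(4)]
outlier = [row[:] for row in uniform]
outlier[0][0] = (12, -2)
print(plan_searches(uniform, beta=2)[1])  # [((4, -2), 'refine_2pel')]
print([a for _, a in plan_searches(outlier, beta=2)[4]])
# ['full_search', 'refine_2pel', 'refine_2pel', 'refine_2pel']
```

With uniform motion every partition is refined locally; a single outlying 4x4 MV forces a full search only for the partitions that contain it, which is how $\beta$ trades computation against accuracy.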

2.2.2. General mode exclusion algorithm

The general mode exclusion algorithm first performs three general modes ($m_1$, $m_4$ and $m_7$) and then excludes less likely modes to reduce computation. The algorithm is as follows.

Step 1. Perform modes $m_0$ (SKIP), $m_1$ (16x16), $m_4$ (8x8) and $m_7$ (4x4) for each MB, and calculate their RD costs.
Step 2. If mode $m_0$ has the least RD cost, the best mode is the SKIP mode and the process stops.
Step 3. If the two lowest RD costs belong to modes $m_1$ and $m_4$, further perform modes $m_2$ and $m_3$ and calculate their RD costs. Select the best mode among $m_1$, $m_2$, $m_3$ and $m_4$ according to their RD costs.
Step 4. If the two lowest RD costs belong to modes $m_4$ and $m_7$, further perform modes $m_5$ and $m_6$ and calculate their RD costs. Select the best mode among $m_4$, $m_5$, $m_6$ and $m_7$.
Step 5. If the two lowest RD costs belong to modes $m_1$ and $m_7$, further perform modes $m_2$, $m_3$, $m_5$ and $m_6$ and calculate their RD costs. Select the best mode among $m_1$, $m_2$, $m_3$, $m_5$, $m_6$ and $m_7$.

Both algorithms were implemented in the JM86 encoder with a full search algorithm. Experiments were conducted on several video sequences for various QPs. The simulation results show that both algorithms cause negligible PSNR degradation (within 0.05 dB), but the merging algorithm outperforms the general mode exclusion algorithm in bit-rate increase (0.5% versus 2.5%). This indicates that the general mode exclusion algorithm incurs a larger SNR loss (about 0.2 dB at the same rate). The merging algorithm is therefore applied to the non-stationary region in the proposed algorithm.
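For reference, the general mode exclusion steps of Section 2.2.2 can be sketched in the same style, again with a hypothetical `rd_cost(m)` callback. The text does not say what happens when $m_0$ is among the two lowest costs without being the best; in that case this sketch (our assumption) simply keeps the best mode tested so far:

```python
# Extra modes to perform for each pair of lowest-cost general modes (Steps 3-5).
EXTRA_MODES = {
    frozenset({1, 4}): (2, 3),
    frozenset({4, 7}): (5, 6),
    frozenset({1, 7}): (2, 3, 5, 6),
}


def general_mode_exclusion(rd_cost):
    """rd_cost(m) returns the RD cost of mode m (0 = SKIP, 1..7 = 16x16..4x4).
    Returns (best_mode, tested_modes)."""
    costs = {m: rd_cost(m) for m in (0, 1, 4, 7)}           # Step 1
    ranked = sorted(costs, key=costs.get)
    if ranked[0] == 0:                                       # Step 2
        return 0, sorted(costs)
    for m in EXTRA_MODES.get(frozenset(ranked[:2]), ()):     # Steps 3-5
        costs[m] = rd_cost(m)
    best = min(costs, key=costs.get)
    return best, sorted(costs)


costs = {0: 300, 1: 120, 2: 110, 3: 130, 4: 125, 5: 140, 6: 150, 7: 200}
print(general_mode_exclusion(costs.__getitem__))  # (2, [0, 1, 2, 3, 4, 7])
```

Taking the minimum over every tested mode is equivalent to the restricted selections in Steps 3 to 5, because the general modes outside the best pair already have higher costs than both members of the pair.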

III. EXPERIMENTAL RESULTS

The proposed algorithm and the other algorithms (simple mode decision, zero-block detection and merging) were implemented in the JM86 encoder for comparison. The experiments were carried out on 13 test video sequences, 6 in QCIF (foreman, carphone, claire, grandma, highway, mthr-dotr) and 7 in CIF (mobile, paris, salesman, tempete, football, container, news), ranging from high to low motion activity. The simulation conditions were as follows:

Profile: Baseline
Number of frames: 100
Reference frames: 5
Entropy coding: UVLC
RDO: on
QP: 14-42
Coding structure: IPPP
Motion vector resolution: 1/4 pixel
Search range: ±16
Hadamard: on

The experiments indicate that the threshold $T_{sad16\times16}$ should vary with the quantization parameter QP: a large QP requires a large threshold and a small QP requires a small one. The threshold $T_{sad16\times16}$ is determined experimentally as

$$T_{sad16\times16} = 76\cdot e^{0.0923\,QP} \qquad (3)$$

which ensures that at least 98% of the SKIP-mode MBs lie in the region $SAD_{16\times16} < T_{sad16\times16}$. The average PSNR, bit rate and computation saving of the proposed algorithm versus QP are compared with those of the other algorithms in Table 1, and the average computation saving over the 13 video sequences is also plotted in Fig. 2. As shown, the proposed hybrid algorithm saves significantly more computation than the other algorithms at every bit rate. The bottom-up merging algorithm performs well for small QPs (high bit rates), but its computation saving is not as significant for large QPs (low bit rates). In contrast, the zero-block detection algorithm is more suitable for low bit-rate coding, i.e., large QPs. For comparison, the results for the foreman sequence are shown in Figs. 3 and 4. All algorithms achieve coding efficiency as good as the original algorithm, but the proposed algorithm requires far less computation than the other investigated algorithms.

Figure 1 Proposed hybrid intermode decision algorithm. [Flowchart: for each MB, calculate $SAD_{16\times16}$ by Eq. (1). If $SAD_{16\times16} \le T_{sad16\times16}$ of Eq. (3) (Y, stationary MB), perform the zero-block detection algorithm; otherwise (N, non-stationary MB), perform the merging algorithm.]
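Putting Eq. (3) and Fig. 1 together, the per-MB dispatch reduces to a single comparison. A minimal sketch (the function names are ours; in a full encoder the two branches would invoke the zero-block detection and merging procedures of Section II):

```python
import math


def sad_threshold(qp):
    """Eq. (3): experimentally fitted threshold T_sad16x16 = 76 * e^(0.0923 QP)."""
    return 76.0 * math.exp(0.0923 * qp)


def route_macroblock(sad16x16, qp):
    """Fig. 1: stationary MBs (SAD16x16 <= threshold) use zero-block detection;
    non-stationary MBs use bottom-up merging."""
    if sad16x16 <= sad_threshold(qp):
        return "zero_block_detection"
    return "bottom_up_merging"


print(route_macroblock(900, 28), route_macroblock(1200, 28))
# zero_block_detection bottom_up_merging
```

At QP = 28 the threshold is roughly 1007, so an MB with $SAD_{16\times16} = 900$ is treated as stationary. Because the threshold grows exponentially with QP, more MBs are routed to zero-block detection at low bit rates.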

Figure 2 Average computation comparison. [Plot: time saving (%) versus QP (12 to 44) for Simple, ZBD, Merging and Proposed.]

Figure 3 RD curve for Foreman. [Plot: PSNR (dB) versus bit rate (kbps) for Orig., Simple, ZBD, Merging and Proposed.]

Figure 4 Computation comparison for Foreman. [Plot: time saving (%) versus QP (12 to 44) for Simple, ZBD, Merging and Proposed.]

IV. CONCLUSION

We presented a fast hybrid intermode decision algorithm in which statistics of the sum of absolute differences between macroblocks are used to classify video objects into stationary and non-stationary regions. Two suitable algorithms, the ZB detection algorithm and the bottom-up merging algorithm, are then applied to the stationary and non-stationary regions, respectively. The experimental results show that the proposed algorithm is suitable for any bit rate or QP. It requires significantly less computation than all the other investigated algorithms while maintaining high coding performance.

Table 1 Coding performance for various algorithms

Simple and ZBD:

QP | Simple ΔPSNR (dB) | Simple ΔBitrate (%) | Simple ΔTime (%) | ZBD ΔPSNR (dB) | ZBD ΔBitrate (%) | ZBD ΔTime (%)
14 | -0.027 | 0.10 | -18.66 | -0.007 | -0.07 | -5.26
16 | -0.028 | 0.16 | -21.89 | -0.006 | 0.03 | -7.94
18 | -0.033 | 0.15 | -25.15 | -0.014 | -0.01 | -10.90
20 | -0.028 | 0.45 | -27.22 | -0.014 | 0.12 | -14.63
22 | -0.033 | 0.34 | -29.24 | -0.019 | 0.02 | -18.09
24 | -0.034 | 0.30 | -31.72 | -0.030 | 0.13 | -24.25
26 | -0.042 | 0.31 | -33.81 | -0.037 | 0.14 | -28.80
28 | -0.032 | 0.32 | -35.48 | -0.037 | 0.27 | -33.63
30 | -0.044 | 0.04 | -37.16 | -0.056 | 0.19 | -40.58
32 | -0.044 | 0.15 | -38.62 | -0.055 | 0.07 | -45.11
34 | -0.038 | 0.32 | -39.66 | -0.074 | 0.08 | -50.90
36 | -0.046 | 0.00 | -40.57 | -0.065 | -0.05 | -57.78
38 | -0.018 | -0.17 | -41.31 | -0.070 | -0.64 | -62.53
40 | -0.014 | 0.36 | -41.72 | -0.078 | -0.19 | -67.31
42 | -0.012 | 0.17 | -42.26 | -0.101 | -0.57 | -72.14

Merging and Proposed:

QP | Merging ΔPSNR (dB) | Merging ΔBitrate (%) | Merging ΔTime (%) | Proposed ΔPSNR (dB) | Proposed ΔBitrate (%) | Proposed ΔTime (%)
14 | -0.006 | -0.05 | -32.95 | -0.024 | 0.05 | -37.83
16 | -0.001 | 0.03 | -34.60 | -0.026 | 0.10 | -40.26
18 | -0.006 | -0.05 | -35.54 | -0.040 | 0.08 | -42.27
20 | 0.000 | 0.07 | -35.72 | -0.034 | 0.30 | -43.93
22 | -0.006 | -0.06 | -36.50 | -0.036 | 0.28 | -45.32
24 | -0.012 | -0.08 | -36.44 | -0.044 | 0.20 | -47.53
26 | -0.021 | 0.00 | -36.93 | -0.062 | 0.14 | -49.29
28 | -0.016 | 0.00 | -37.59 | -0.051 | 0.10 | -51.42
30 | -0.029 | -0.07 | -38.03 | -0.069 | 0.16 | -54.84
32 | -0.029 | -0.18 | -38.44 | -0.070 | 0.05 | -57.22
34 | -0.032 | -0.26 | -38.95 | -0.073 | 0.03 | -60.36
36 | -0.034 | -0.09 | -39.69 | -0.081 | -0.13 | -64.17
38 | -0.018 | -0.08 | -40.45 | -0.063 | -0.54 | -67.43
40 | -0.021 | 0.16 | -41.16 | -0.079 | -0.41 | -70.64
42 | -0.030 | -0.17 | -41.37 | -0.094 | -0.64 | -74.28

V. REFERENCES

[1] Y. K. Tu, J. F. Yang, Y. N. Shen and M. T. Sun, “Fast variable-size block motion estimation using merging procedure with an adaptive threshold,” in Proc. IEEE ICME, vol. 2, July 2003, pp. 789-792.

[2] Z. Zhou and M. T. Sun, “Fast macroblock inter mode decision and motion estimation for H.264/MPEG-4 AVC,” in Proc. IEEE ICIP, vol. 2, Oct. 2004, pp. 789-792.

[3] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja and C. C. Ko, “Fast Intermode Decision in H.264/AVC Video Coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 6, pp. 953-958, July 2005.

[4] X. Jing and L. P. Chau, “An Efficient Inter Mode Decision Approach for H.264 Video Coding,” in Proc. IEEE ICM, July 2004, pp. 1111-1114.

[5] Y. H. Moon, G. Y. Kim and J. H. Kim, “An improved early detection algorithm for all-zero blocks in H.264 video encoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, pp. 1053-1057, Aug. 2005.