
A final project report on

Residual DPCM for improving Inter Prediction in HEVC for Lossless Screen Content Coding

Under the guidance of Dr. K. R. Rao 

For the fulfillment of the course Multimedia Processing (EE5359) Spring 2015

 

Submitted by Siddu Basawaraj Pratapur

  UTA ID: 1001053422


Acknowledgement

I would like to acknowledge Dr. Rao for his continuous support and guidance during the course of the project. I thank him for providing necessary feedback and dedicating his precious time in reviewing the reports and presentation slides at each step.

I would also like to extend my gratitude to Ms. S.C. Kodpadi and Ms. N.N. Mundgemane for helping me with the screen content test sequences and with other related issues that I faced during the course of the project.

This project wouldn’t have been successful without your guidance and efforts.


Table of Contents

Objective of the Project

Basic Concepts of Video Coding
• Color Spaces

H.265 / High Efficiency Video Coding
• Introduction
• Encoder and Decoder

Introduction to Screen Content Coding

Residual DPCM in HEVC Inter-prediction
• General considerations and the HEVC coding structure
• General method for inter RDPCM
• Additional tools for inter RDPCM

Test Configurations
• Intra only configuration
• Low delay configurations
• Random access configuration

Comparison Metrics
• Peak Signal to Noise Ratio
• Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR)
• Implementation Complexity

Test Sequences

Implementation
• Configuration profiles used for comparison
• Parameters modified
• Sample command line parameters for HM-16.4+SCM-4.0RC1

Graphs for test sequence parameters

References

List of Acronyms and Abbreviations

• AVC: Advanced Video Coding.
• CABAC: Context Adaptive Binary Arithmetic Coding.
• CTB: Coding Tree Block.
• CTU: Coding Tree Unit.
• CU: Coding Unit.
• CB: Coding Block.
• DCT: Discrete Cosine Transform.
• DBF: De-blocking Filter.
• FPGA: Field Programmable Gate Array.
• GPB: Generalized P and B picture.
• HEVC: High Efficiency Video Coding.
• HM: HEVC Test Model.
• HP: Hierarchical Prediction.
• JCT: Joint Collaborative Team.
• JCT-VC: Joint Collaborative Team on Video Coding.
• JM: H.264 Test Model.
• JPEG: Joint Photographic Experts Group.
• MV: Motion Vector.
• MC: Motion Compensation.
• ME: Motion Estimation.
• MPEG: Moving Picture Experts Group.
• PC: Prediction Chunking.
• PU: Prediction Unit.
• PB: Prediction Block.
• QP: Quantization Parameter.
• RDPCM: Residual Differential Pulse Code Modulation.
• SAO: Sample Adaptive Offset.
• TB: Transform Block.
• TU: Transform Unit.
• VCEG: Video Coding Experts Group.

Objective of the Project

• We propose the mathematical implementation of Inter Residual Differential Pulse Code Modulation (inter RDPCM) applied to motion compensated residuals in lossless screen content coding (SCC) scenarios.

• Two additional tools are proposed for inter RDPCM: Prediction Chunking (PC)[1] and Hierarchical Prediction (HP)[1].

• The simulation will be conducted using HM-16.4+SCM-4.0rc1 [2], with different video sequences [3], search range, block sizes and number of frames using GPU multi-core computing.


Basic Concepts of Video Coding

Color Spaces:

• RGB color space – Each pixel is represented by three numbers indicating the relative proportions of red, green and blue.

• YCrCb color space – Y is the luminance component, a monochrome version of the color image. Y is a weighted average of R, G and B:

Y = kr R + kg G + kb B

where kr, kg and kb are the weighting factors.
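The weighted average above can be sketched in a few lines. The slides do not give values for the weighting factors, so the ITU-R BT.601 weights used here are an assumption made only for illustration, and the function name luma is hypothetical.

```python
# Assumed ITU-R BT.601 weights; the slides leave kr, kg and kb unspecified.
KR, KG, KB = 0.299, 0.587, 0.114


def luma(r, g, b, kr=KR, kg=KG, kb=KB):
    """Return Y = kr*R + kg*G + kb*B for a single pixel."""
    return kr * r + kg * g + kb * b


if __name__ == "__main__":
    # A pure white and a pure green 8-bit pixel.
    print(luma(255, 255, 255))
    print(luma(0, 255, 0))
```

Because the three assumed weights sum to one, a gray pixel (R = G = B) keeps its value as luminance.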

 


The popular sampling patterns [4] are:

• 4:4:4 – The three components Y, Cr and Cb have the same resolution; that is, for every 4 luminance samples there are 4 Cr and 4 Cb samples.

• 4:2:2 – For every 4 luminance samples in the horizontal direction, there are 2 Cr and 2 Cb samples. This representation is used for high-quality color reproduction.

• 4:2:0 – Cr and Cb each have half the horizontal and half the vertical resolution of Y. This format is popular in applications such as video conferencing, digital television and DVD storage.
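To make the three patterns concrete, the sketch below counts the samples in one frame for each format. The function names and the default of one byte per sample (8-bit video) are assumptions for illustration, and even frame dimensions are assumed.

```python
# (horizontal divisor, vertical divisor) applied to each chroma plane.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def samples_per_frame(width, height, chroma_format):
    """Return (luma samples, samples in each of the Cr and Cb planes)."""
    dx, dy = CHROMA_DIVISORS[chroma_format]
    return width * height, (width // dx) * (height // dy)


def frame_bytes(width, height, chroma_format, bytes_per_sample=1):
    """Size of one raw frame: one luma plane plus two chroma planes."""
    luma, chroma = samples_per_frame(width, height, chroma_format)
    return (luma + 2 * chroma) * bytes_per_sample


if __name__ == "__main__":
    for fmt in ("4:4:4", "4:2:2", "4:2:0"):
        print(fmt, frame_bytes(352, 288, fmt))  # CIF resolution
```

For a CIF frame (352 x 288), 4:2:0 needs half the raw storage of 4:4:4, which is one reason it is common in consumer applications.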

 

 

Figure 1: 4:2:0 sub-sampling pattern [4]

Figure 2: 4:2:2 and 4:4:4 sub-sampling and sampling patterns [4]

H.265 / High Efficiency Video Coding

• High Efficiency Video Coding (HEVC) [5] is an international video compression standard developed jointly by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T (International Telecommunication Union) Video Coding Experts Group (VCEG).

• The main goal of the HEVC standard is to significantly improve compression performance over existing standards such as H.264/Advanced Video Coding [6], targeting a bit-rate reduction of about 50% at similar visual quality [7].

• The macroblocks and blocks of H.264 are replaced by coding tree units (CTUs), coding units (CUs), transform units (TUs) and prediction units (PUs) in H.265/HEVC.

Figure 3: Block Diagram of HEVC CODEC [10]

HEVC Encoder and Decoder

Figure 4: Block Diagram of the HEVC Encoder [6]

Figure 5: Block Diagram of the HEVC Decoder [11]

• Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence is coded using only intra-picture prediction.

• The encoder and decoder generate identical inter-picture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information.

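• The residual signal, that is, the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information.

• The encoder duplicates the decoder processing loop so that both generate identical predictions for subsequent data. The quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may then be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization.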

• The duplicate of the output of the decoder is stored in a decoded picture buffer to be used for the prediction of subsequent pictures.

Coding tree units and coding tree block (CTB) structure:

Figure 6: 64×64 CTBs split into CBs [13]

Coding units (CUs) and coding blocks (CBs):

One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU), as shown in Figure 7. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).

Figure 7: CUs split into CBs [13]

Prediction units and prediction blocks (PBs):

Figure 8: Partitioning of Prediction Blocks from Coding Blocks [13]

TUs and transform blocks:

Figure 9: Partitioning of Transform Blocks from Coding Blocks [13]

• Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, with quantization scaling matrices supported for the various transform block sizes.

• Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements to improve its throughput speed (especially for parallel-processing architectures).

• In-loop de-blocking filtering: A de-blocking filter similar to the one used in H.264/MPEG-4 AVC is operated within the inter-picture prediction loop. However, the design is simplified in regard to its decision-making and filtering processes, and is made more friendly to parallel processing.

• Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the inter-picture prediction loop after the de-blocking filter. Its goal is to better reconstruct the original signal amplitudes by using a look-up table that is described by a few additional parameters that can be determined by histogram analysis at the encoder side.


Introduction to Screen Content Coding

• Screen content refers to images and videos which contain computer generated objects or screen shots from computer applications.

• This kind of content requires efficient compression solutions as its use is becoming more popular in emerging technologies such as desktop sharing, video walls in control rooms, wireless display and digital remote operating rooms for surgeries [15], [16].

• Screen content differs significantly from the camera captured content due to the presence of high frequency features such as sharp edges and high contrast areas.

• The presence of these features reduces the coding efficiency of classical hybrid block-based image and video codecs which use spatial transforms to compact the energy of signals into a few lower frequency coefficients.


Residual DPCM in HEVC Inter-prediction

General considerations and the HEVC coding structure

• The HEVC standard performs inter-prediction by means of block-based motion compensation which assumes that all the pixels inside a block move approximately with the same motion. This assumption leads to poor prediction performance along sharp edges.

• In screen content it is reasonable to expect that inter-prediction residuals still present some correlation along image edges, which can be exploited by performing a spatial DPCM along the edge direction. This intuition is the basis for the proposed inter RDPCM.

• Several directions may be considered; however, to limit the computational complexity, only horizontal and vertical ones are included since they are predominant in screen content.

General method for inter RDPCM

• Let r(i, j) be the elements of an M×N residual block of inter-predicted luma or chroma samples, where M and N are the block height and width, respectively. The vertical inter RDPCM mode is defined as follows.

The horizontal inter RDPCM mode is defined in a similar way.

Figure 10: Hierarchical prediction (green line) - an additional step is applied on the top row after vertical RDPCM (red lines)[1].


• At the decoder side, when vertical RDPCM is selected, the residuals r(i, j) to be added to the motion compensated prediction are obtained as follows

 

For horizontal RDPCM, the summation is performed across the current row.
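A minimal sketch of the vertical mode follows, assuming the usual column-wise differencing form: the first row of the block is sent unchanged and every other sample is replaced by its difference from the sample directly above it. The decoder then recovers r(i, j) by summing the transmitted values down column j from row 0 to row i. The function names are hypothetical, and the block is held as a plain list of M rows.

```python
def forward_vertical_rdpcm(r):
    """Encoder side: difference each sample with the one directly above it."""
    out = [list(r[0])]  # row 0 has no sample above, so it is left unpredicted
    for i in range(1, len(r)):
        out.append([r[i][j] - r[i - 1][j] for j in range(len(r[i]))])
    return out


def inverse_vertical_rdpcm(d):
    """Decoder side: a running sum down each column restores r(i, j)."""
    out = [list(d[0])]
    for i in range(1, len(d)):
        # out[i - 1][j] already holds the sum of d[0..i-1][j].
        out.append([out[i - 1][j] + d[i][j] for j in range(len(d[i]))])
    return out


if __name__ == "__main__":
    block = [[5, 5, 0], [5, 5, 0], [5, 6, 0]]  # residual along a vertical edge
    coded = forward_vertical_rdpcm(block)
    print(coded)
    print(inverse_vertical_rdpcm(coded) == block)
```

When the residual is correlated along a vertical edge, as in this toy block, most transmitted values become zero, and the round trip is exact, which is what lossless coding requires. The horizontal mode would apply the same steps along rows instead of columns.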

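Additional tools for inter RDPCM:

• Two observations motivated the design of the two proposed tools, prediction chunking (PC) and hierarchical prediction (HP): the number of operations per sample required at the decoder side, and the samples in the first row or column that RDPCM leaves unpredicted.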

• The prediction chunking tool limits the residual DPCM prediction to groups of samples with a specified length L, denoted as chunking length. In this way the RDPCM process is reset every L samples so that the number of operations per sample at the decoder side is reduced. The vertical RDPCM prediction when the PC tool is used is defined as follows:

• At the decoder, the residuals can be reconstructed as follows:

where the floor operator ⌊·⌋ returns the largest integer smaller than or equal to its argument. Equivalent expressions for forward and inverse inter RDPCM can be easily derived for the horizontal mode when using PC.
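The chunked vertical mode can be sketched as below, under the assumption that the prediction is reset on every row whose index is a multiple of the chunking length L, and that the decoder sums only from the first row of the current chunk, L·⌊i/L⌋, down to row i. The function names are hypothetical.

```python
def forward_vertical_rdpcm_pc(r, chunk_len):
    """Vertical RDPCM whose prediction is reset every chunk_len rows."""
    out = []
    for i, row in enumerate(r):
        if i % chunk_len == 0:
            out.append(list(row))  # first row of a chunk: sent unpredicted
        else:
            out.append([row[j] - r[i - 1][j] for j in range(len(row))])
    return out


def inverse_vertical_rdpcm_pc(d, chunk_len):
    """Sum the coded values from the start of the current chunk down to row i."""
    out = []
    for i, row in enumerate(d):
        start = chunk_len * (i // chunk_len)  # L * floor(i / L)
        out.append([sum(d[k][j] for k in range(start, i + 1))
                    for j in range(len(row))])
    return out


if __name__ == "__main__":
    block = [[1, 2], [3, 5], [6, 9], [10, 14]]
    coded = forward_vertical_rdpcm_pc(block, 2)
    print(coded)
    print(inverse_vertical_rdpcm_pc(coded, 2) == block)
```

With chunking, no reconstructed sample depends on more than L coded values, which bounds the decoder-side work per sample at the cost of leaving one row per chunk unpredicted.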


• Once RDPCM is performed on a block, samples in the first column and the first row for horizontal and vertical RDPCM, respectively, are not predicted. Therefore it is beneficial to exploit redundancy by performing prediction on these samples in the direction orthogonal to the main RDPCM direction. The HP tool performs a RDPCM along the first column of samples when horizontal RDPCM is selected as the best mode or along the first row for vertical RDPCM. For the case of vertical RDPCM, the HP is defined as:
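A minimal sketch of HP for the vertical mode, assuming that the extra step is plain horizontal differencing applied to the first row, which vertical RDPCM leaves unpredicted. The decoder undoes the two steps in reverse order: first the row, then the columns. The function names are hypothetical.

```python
def forward_vertical_rdpcm_hp(r):
    """Vertical RDPCM on rows 1..M-1, then horizontal DPCM on row 0."""
    m, n = len(r), len(r[0])
    out = [list(row) for row in r]
    for i in range(1, m):
        for j in range(n):
            out[i][j] = r[i][j] - r[i - 1][j]  # main (vertical) direction
    for j in range(1, n):
        out[0][j] = r[0][j] - r[0][j - 1]      # orthogonal step on the top row
    return out


def inverse_vertical_rdpcm_hp(d):
    """Rebuild the top row first, then accumulate down each column."""
    m, n = len(d), len(d[0])
    out = [list(row) for row in d]
    for j in range(1, n):
        out[0][j] += out[0][j - 1]
    for i in range(1, m):
        for j in range(n):
            out[i][j] += out[i - 1][j]
    return out


if __name__ == "__main__":
    block = [[4, 4, 7], [4, 5, 7]]
    coded = forward_vertical_rdpcm_hp(block)
    print(coded)
    print(inverse_vertical_rdpcm_hp(coded) == block)
```

After both steps only the top-left sample of the block is transmitted without any prediction.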


Test Configurations

Intra only configuration

• All frames are encoded as independent I frames.
• Frames have no dependency on neighboring frames.
• Only spatial compression is exploited.

[Figure: pictures 0 to 8 on the time axis; legend: IDR picture, QPI.]

Figure 19: Graphical presentation of Intra-only configuration [34].

Low delay configuration

• Only the first frame in the video sequence is encoded as an I frame.
• Successive pictures are encoded as generalized P and B (GPB) pictures.
• Both spatial and temporal compression occur.

[Figure: pictures 0 to 8 on the time axis; an IDR or intra picture at QPI, followed by GPB pictures at QPBL1 = QPI+1, QPBL2 = QPI+2 and QPBL3 = QPI+3.]

Figure 20 : Graphical presentation of Low-delay configuration [34].

Random access configuration

• Only the first frame in the video sequence is encoded as an I frame.
• Successive pictures are encoded as generalized P and B (GPB) pictures.
• Pictures may be encoded out of display order, using an 'open GOP' structure.

[Figure: pictures 0 to 8 on the time axis; an IDR or intra picture at QPI, a GPB picture, and referenced and non-referenced B pictures at QPBL1 = QPI+1, QPBL2 = QPI+2, QPBL3 = QPI+3 and QPBL4 = QPI+4.]

Figure 21 : Graphical presentation of Random-access configuration [34].


Comparison Metrics

• Peak Signal to Noise Ratio: Peak signal-to-noise ratio (PSNR) [35] [36] is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. PSNR is usually expressed on the logarithmic decibel scale.

• Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR): BD metrics give the average gain in PSNR or the average percentage saving in bit-rate between two rate-distortion curves. However, BD-PSNR has a critical drawback: it does not take the coding complexity into account [37].

• Implementation Complexity: The computational time for the various configuration profiles in the HM-16.4+SCM-4.0rc1 software will be compared, and this serves as an indication of implementation complexity.
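The PSNR definition above can be sketched as follows, using the common form PSNR = 10·log10(MAX² / MSE). The function name, the flat-list input and the default peak of 255 (8-bit samples) are assumptions for illustration.

```python
import math


def psnr(original, decoded, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals, as in lossless coding
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [10, 20, 30, 40]))
    print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

Note that a lossless round trip gives a mean squared error of zero, so PSNR is unbounded in that case and bit-rate becomes the more informative figure.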

Test Sequences

Figure 22: Different video resolutions, ranging from mobile devices and tablets to advanced televisions [39].

• The following test sequences [23] of various resolutions are used to study the different configuration profiles of the HEVC codec:

Table 1: List of test sequences for different resolutions

Implementation

• Parameters modified

F - Number of frames to be encoded

Fr - Frame rate

Wdt - Width of the video sequence

Hgt - Height of the video sequence

Profile - encoder_intra_main / encoder_randomaccess_main / encoder_lowdelay_main

• Testing Platform 

Processor - Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz

Number of cores - 4

Memory - 4.00 GB

Operating System - 64-bit Windows(TM) 7 Ultimate

• Sample command line parameters for HM-16.4+SCM-4.0RC1

Encoding

C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\bin\vc9\Win32\Debug>TAppEncoder.exe -c C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\cfg\encoder_intra_main.cfg -wdt 352 -hgt 288 -fr 24 -f 90 -i C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\ElephantsDream_CIF_24fps.yuv >>C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Encoded_data\Log_Encoded_IM.txt

Decoding

C:\Users\Siddu_Pratapur\Desktop\HEVC16.4\bin\vc9\Win32\Debug>TAppDecoder.exe -b C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Encoded_data\str_IM.bin -o C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Decoded_data\Elephant_Decoded.yuv >>C:\Users\Siddu_Pratapur\Desktop\Test_sequences_n_results\Without_modifications\CIF\Elephant\Decoded_data\Elephant_Decoded_IM.txt
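When the same encoder call has to be repeated for several profiles and sequences, the argument list can be assembled programmatically. The sketch below uses only the options that appear in the sample command above (-c, -i, -wdt, -hgt, -fr, -f); the helper name and the short file names in the example are hypothetical placeholders, and nothing is executed.

```python
def encoder_args(cfg, yuv, width, height, frame_rate, frames,
                 exe="TAppEncoder.exe"):
    """Build the TAppEncoder argument list used in the sample command."""
    return [
        exe,
        "-c", cfg,               # configuration profile file
        "-i", yuv,               # input YUV sequence
        "-wdt", str(width),      # Wdt: width of the video sequence
        "-hgt", str(height),     # Hgt: height of the video sequence
        "-fr", str(frame_rate),  # Fr: frame rate
        "-f", str(frames),       # F: number of frames to be encoded
    ]


if __name__ == "__main__":
    # Hypothetical short paths standing in for the full paths above.
    for profile in ("encoder_intra_main", "encoder_randomaccess_main",
                    "encoder_lowdelay_main"):
        print(" ".join(encoder_args(profile + ".cfg", "sequence.yuv",
                                    352, 288, 24, 90)))
```

On a machine where HM has been built, such a list could be handed to subprocess.run, with standard output redirected to a log file as in the sample command.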

[Graph slides, figures not reproduced: Bit-rate Comparison; Size of the binary file; %BD Bit-rate; BD-PSNR; Encoding time; Decoding time.]

Conclusion and Further work:

From this project, we can conclude that the random access configuration in the HM-16.4+SCM-4.0RC1 [18] software gives the best encoding and decoding results for both natural and screen content across video sequences of various resolutions.

1. Other comparison metrics, such as SSIM, can be calculated and tabulated for each of the test sequences.

2. Test sequences with higher resolutions, such as 2048 x 872 (2K) and 4096 x 1744 (4K), can be used for comparison.

3. Even better encoding and decoding speeds can be obtained by using machines with higher processing speeds, or FPGA-implemented general processing units dedicated to encoding and decoding only.

REFERENCES

[1] M. Naccari et al, “Improving Inter Prediction in HEVC with Residual DPCM for Lossless Screen Content Coding”, Picture Coding Symposium (PCS), San Jose, CA, pp. 361-364, 8-11 Dec. 2013.

 

[2] HM-16.4+SCM-4.0rc1 software - https://hevc.hhi.fraunhofer.de/trac/hevc/milestone/HM-16.4+SCM-4.0rc1

[3] Video test sequences - https://media.xiph.org/video/derf/

 

[4] I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Wiley, 2002.

[5] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), Mar. 2013, available on http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243

[6] D. Marpe et al, “The H.264/MPEG4 advanced video coding standard and its applications”, IEEE Communications Magazine, Vol. 44, pp. 134-143, Aug. 2006.

 

[7] G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1649-1668, Dec. 2012.

 

[8] HEVC white paper-Ateme: http://www.ateme.com/an-introduction-to-uhdtv-and-hevc


 

[9] G. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal of selected topics in Signal Processing, Vol. 7, No. 6, pp. 1001-1016, Dec. 2013.

 

[10] HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html

[11] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T / ISO-IEC Document: JCTVC J0292r1, July 2012.

 [12] T. Wiegand et al, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, Jul. 2003.

 

[13] U.S.M. Dayananda, “Study and Performance comparison of HEVC and H.264 video codecs”, Final project report, EE Dept., UTA, Arlington, TX, Dec. 2011, available on http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

 [14] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), Mar. 2013 available on http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243

[15] T. Vermeir, “Use cases and requirements for lossless and screen content coding”, JCTVC-M0172, 13th JCT-VC meeting, Incheon, KR, Apr. 2013.

 


[16] J. Sole, R. Joshi and M. Karczewicz, “Requirements for wireless display applications”, JCTVC-M0315, 13th JCT-VC meeting, Incheon, KR, Apr. 2013.

[17] A. Gabriellini et al, “Combined Intra-Prediction for High-Efficiency Video Coding”, IEEE J. of Sel. Topics in Signal Processing. Vol. 5, no. 7; pp. 1282-1289, Nov. 2011.

 [18] Software repository for HEVC - http://hevc.hhi.fraunhofer.de/

 

[19] HM Software Manual - https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/

 

[20] Visual studio: http://www.dreamspark.com

[21] Tortoise SVN: http://tortoisesvn.net/downloads.html

[22] Multimedia processing course website: http://www.uta.edu/faculty/krrao/dip/

[23] K.R. Rao, D.N. Kim and J.J. Hwang, “Video Coding Standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1”, Springer, 2014.

[24] V. Sze, M. Budagavi and G.J. Sullivan, “High Efficiency Video Coding (HEVC): Algorithms and Architectures”, Springer, 2014.

 


 

[25] G. Braeckman et al, “Lossy-to-lossless screen content coding using an HEVC base-layer”, in Proceedings of IEEE International Conference on Signal Processing (DSP), Santorini, Greece, 1-3 July, 2013.

[26] M. Mrak and J.Z. Xu, "Improving screen content coding in HEVC by transform skipping," in Proceedings of 20th European Signal Processing Conference (EUSIPCO), August 2012.

 

[27] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,” Special issue on Scalable Video Coding (SVC), IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1194-1203, Sep. 2007.

 

[28] W. Zhu,et al, "Screen Content Coding Based on HEVC Framework," IEEE Transactions on Multimedia, vol.16, no.5, pp.1316-1326, Aug. 2014.

[29] I.E.G. Richardson, “ Coding Video: A practical guide to HEVC and beyond”, Wiley, 11 May 2015.

[30] Aggelos K. Katsaggelos, An online course on “Fundamentals of Image and Video Coding” ,Northwestern University - https://www.coursera.org/course/digital.

[31] N.N. Mundgemane, A thesis proposal on “Multi-stage prediction scheme for Screen Content based on HEVC” M.S. Thesis, EE Dept., UTA, Arlington, TX, Sep. 2014, available on http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

 

 


[32] S. Kodpadi, A thesis proposal on “Fast algorithms for Screen Content Coding in HEVC” M.S. Thesis, EE Dept., UTA, Arlington, TX, Sep. 2014, available on http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

 

[33] W. Zhu, et al, "Compound image compression by multi-stage prediction," IEEE Trans. on Visual Communications and Image Processing (VCIP), pp.1-6, 27-30 Nov. 2012.

[34] I.K. Kim et al , “Coding of moving pictures and audio”, ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Document: JCTVC-K1002-v1 ,11th Meeting: Shanghai, CN, 10–19 October 2012.

[35] White paper on PSNR-NI: http://www.ni.com/white-paper/13306/en/

[36] Website on PSNR: http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

[37] X. Li et al, “Rate-complexity-distortion evaluation for hybrid video coding”, IEEE International Conference on Multimedia and Expo (ICME), pp. 685-690, Jul. 2010.

[38] G. Bjontegaard, “Calculation of Average PSNR Differences between RD Curves”, document VCEG-M33, ITU-T SG 16/Q 6, Austin, TX, Apr. 2001.

[39] Different Video Resolutions: http://www.mediamerge.com/what-your-tech-team-needs-to-know-about-hd-video-projection/

[40] Video test sequences with screen content : http://trace.eas.asu.edu/yuv/

[41] G. Correa et al, “Fast  HEVC encoding decisions using data mining”, IEEE Trans. CSVT, vol.25, pp. 660-673, April 2015.

[42] JCT-VC documents are publicly available at http://ftp3.itu.ch/av-arch/jctvc-site and http://phenix.it-sudparis.eu/jct/