
3-D Direction Aligned Wavelet Transform for Scalable Video Coding

Yu Liu¹, King Ngi Ngan¹, and Feng Wu²

¹Department of Electronic Engineering, The Chinese University of Hong Kong

²Internet Media Group, Microsoft Research Asia, Beijing, China

ISCAS 2008, Seattle, USA, May 18-21, 2008

Y. Liu, K.N. Ngan and F. Wu, 3-D Direction Aligned Wavelet Transform for SVC

Outline

• Introduction
• 3-D Directional Threading
• 3-D Extension of Weighted Adaptive Lifting
• Experimental Results
• Conclusion

Introduction

• 3-D wavelet-based scalable video coding
  – full spatio-temporal-quality scalability
  – non-redundant 3-D subband decomposition
  – comparable with the H.264-based JSVM scheme
• In the temporal domain: Motion Aligned Temporal Filtering (MATF)
  – motion compensation is incorporated into the temporal wavelet transform
• In the spatial domain: conventional 2-D lifting-based wavelet transform
  – uses only neighboring elements in the horizontal or vertical direction
  – however, natural images and video are rich in directional features, such as linear edges that are neither horizontal nor vertical

• 2-D DWT with directional spatial prediction for image coding
  – Adaptive Directional Lifting (ADL)-based DWT [Ding2007]
  – Direction-Adaptive (DA) DWT [Chang2007]
  – Weighted Adaptive Lifting (WAL)-based DWT [Liu2007a]
• No prior work in the literature incorporates a directional spatial wavelet transform into a video coding framework, let alone into scalable video coding


3-D Directional Threading

• Temporal Motion Threading (MTh) [Liu2007b]
  – an efficient implementation of Motion Aligned Temporal Filtering (MATF)
• Direction Aligned Spatial Filtering (DASF), the spatial counterpart of MATF
  – aligns the direction of the spatial wavelet filtering with the direction of the edges
  – 2-D spatial directional threading, the spatial counterpart of temporal motion threading, is realized as two separable 1-D threadings:
    » horizontal directional threading
    » vertical directional threading



[Figure: Temporal motion threading over VOP 0, VOP 1, …, VOP n, with motion estimation (ME) between frames, low-pass frames L0–L3 and high-pass frames H0–H3. Legend: normal pixel, terminating pixel, non-referred pixel, many-to-one mapping.]

• 3-D direction coordinate system
  – a unified framework for temporal and spatial directions, where x, y, and z denote the horizontal, vertical, and temporal directions, respectively
  – 3-D direction vector dv = {dx, dy, dz}
    • dz = -1, dz = 1, and dz = 0 indicate that the current block is forward compensated, backward compensated, or not compensated in the temporal direction, respectively
    • {dx, dy} denote the displacements in the horizontal and vertical directions
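As an illustration of the direction-vector convention above, the following Python sketch stores dv = {dx, dy, dz} for one block and names the compensation type encoded by dz. The class name `DirectionVector` and the rule that the linked sample lies at (x+dx, y+dy, z+dz) are assumptions made for this sketch; the slides do not state how the displacement is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectionVector:
    """3-D direction vector dv = {dx, dy, dz} attached to one block."""

    dx: int  # horizontal displacement
    dy: int  # vertical displacement
    dz: int  # -1: forward, 1: backward, 0: no temporal compensation

    def __post_init__(self) -> None:
        if self.dz not in (-1, 0, 1):
            raise ValueError("dz must be -1, 0, or 1")

    def mode(self) -> str:
        """Name of the compensation type encoded by dz."""
        return {-1: "forward", 1: "backward", 0: "spatial only"}[self.dz]

    def reference(self, x: int, y: int, z: int) -> tuple[int, int, int]:
        """Assumed convention: the linked sample sits at (x+dx, y+dy, z+dz)."""
        return (x + self.dx, y + self.dy, z + self.dz)


dv = DirectionVector(dx=2, dy=-1, dz=-1)
print(dv.mode(), dv.reference(10, 5, 3))  # forward (12, 4, 2)
```

Restricting dz to {-1, 0, 1} lets one record type describe both temporal (motion) and purely spatial (edge) directions, which is the purpose of the unified coordinate system.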






[Figure: 3-D direction coordinate system with axes x, y, and z.]

• Generalized separable 3-D directional threading
  – unifies the concepts of temporal motion threading and 2-D spatial directional threading
  – along each direction axis, pixels on the same directional trajectory are linked into a directional thread according to the direction vectors of the blocks they belong to






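[Figure: Directional threading along one frame/row/column axis with positions 0, 1, 2, …, 2n, 2n+1; direction estimation (DE) links the pixels into threads, with low-pass coefficients L0 … Ln and high-pass coefficients H0 … Hn.]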
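The linking rule can be sketched for a single axis as follows. This is a minimal Python illustration, not the authors' implementation: the function `build_threads` and its input format (one integer displacement per sample, standing in for the block-wise direction vectors) are hypothetical. The conflict handling is also an assumption, modeled on the terminating and non-referred pixels of temporal motion threading: when several samples point at the same sample, the first keeps the link and the others terminate, and unreferenced samples start new threads.

```python
def build_threads(disp):
    """Link samples of consecutive lines into directional threads.

    disp[t][c] is a hypothetical integer displacement that links sample c
    of line t to sample c + disp[t][c] of line t + 1 (entries of the last
    line are ignored). Returns a list of threads, each a list of
    (line, position) pairs; every sample belongs to exactly one thread.
    """
    num_lines, width = len(disp), len(disp[0])
    threads = [[(0, c)] for c in range(width)]
    active = {c: c for c in range(width)}  # position in line t -> thread id
    for t in range(num_lines - 1):
        nxt = {}
        for c in range(width):  # ascending order: the first claimant wins
            target = c + disp[t][c]
            if 0 <= target < width and target not in nxt:
                tid = active[c]
                threads[tid].append((t + 1, target))
                nxt[target] = tid
            # otherwise this sample terminates its thread
            # (many-to-one mapping or trajectory leaving the line)
        for c in range(width):
            if c not in nxt:  # non-referred sample starts a new thread
                threads.append([(t + 1, c)])
                nxt[c] = len(threads) - 1
        active = nxt
    return threads


# Samples 0 and 1 of line 0 both point at sample 1 of line 1 (many-to-one).
for thread in build_threads([[1, 0, 0], [0, 0, 0]]):
    print(thread)
```

Here the call prints four threads: `[(0, 0), (1, 1)]`, `[(0, 1)]` (a terminating sample), `[(0, 2), (1, 2)]`, and `[(1, 0)]` (a non-referred sample that starts a new thread). Each thread would then be filtered by a 1-D lifting transform.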

3-D Extension of Weighted Adaptive Lifting

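• Original Weighted Adaptive Lifting (WAL) [Liu2007a]
  – Weighted function

$$f(x_{e,i}) = \sum_k w_{i,k}\, x_e[m+m_{i,k},\, n+n_{i,k}], \quad \text{subject to } W_i = \sum_k w_{i,k} = 1 \tag{1}$$

    • Integer-pixel precision:

$$f(x_{e,i}) = x_e[m+i,\, n+a], \quad w_{i,a} = 1,\; w_{i,k} = 0 \ (k \neq a)$$

    • Sub-pixel precision:

$$f(x_{e,i}) = \sum_k \alpha_{i,k}\, x_e[m+m_{i,k},\, n+n_{i,k}], \quad w_{i,k} = \alpha_{i,k} \tag{2}$$

      where $\alpha_{i,k}$ is the coefficient of a given interpolation filter.
  – Weighted lifting

$$d[m,n] = x_o[m,n] - p \sum_{i=0}^{1} \sum_k w_{i,k}\, x_e[m+m_{i,k},\, n+n_{i,k}] \tag{3}$$

$$c[m,n] = x_e[m,n] + u \sum_{j=0}^{1} \sum_l w_{j,l}\, d[m+m_{j,l},\, n+n_{j,l}] \tag{4}$$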

[Figure: panels (a)–(e) showing pixels xn-1, xn, xn+1 on rows/columns 2m, 2m+1, and 2m+2.]
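To make Eqs. (3) and (4) concrete, here is a minimal Python sketch of one vertical pass of weighted lifting at integer-pixel precision, using the usual 5/3 values p = 1/2 and u = 1/4. Assumptions for illustration only: each odd row 2m+1 has a single integer direction offset `a[m]` (instead of block-wise directions), so it is predicted from x_e[m, n+a[m]] and x_e[m+1, n-a[m]]; the update adds back the two high-pass samples whose predictions used each even sample; and borders are handled by clamping indices. Because the inverse recomputes exactly the same prediction and update terms, reconstruction is perfect for any choice of offsets.

```python
import numpy as np


def _clip(i, size):
    """Clamp an index into [0, size - 1] (simple border extension)."""
    return min(max(i, 0), size - 1)


def _predict_sum(xe, a, m, n):
    """Even-row samples on the direction through odd sample (m, n)."""
    rows, cols = xe.shape
    return (xe[m, _clip(n + a[m], cols)]
            + xe[_clip(m + 1, rows), _clip(n - a[m], cols)])


def _update_sum(d, a, m, n):
    """High-pass samples whose prediction used even sample (m, n)."""
    rows, cols = d.shape
    mp, mc = _clip(m - 1, rows), _clip(m, rows)
    return d[mp, _clip(n + a[mp], cols)] + d[mc, _clip(n - a[mc], cols)]


def wal53_forward(x, a, p=0.5, u=0.25):
    """Vertical directional 5/3 lifting of an image with an even number of rows.

    a[m] is the (hypothetical) integer column offset used for odd row 2m+1.
    """
    xe, xo = x[0::2].astype(float), x[1::2].astype(float)
    rows, cols = xo.shape
    d = np.array([[xo[m, n] - p * _predict_sum(xe, a, m, n) for n in range(cols)]
                  for m in range(rows)])
    c = np.array([[xe[m, n] + u * _update_sum(d, a, m, n) for n in range(cols)]
                  for m in range(rows)])
    return c, d


def wal53_inverse(c, d, a, p=0.5, u=0.25):
    """Undo the update, then the prediction, in reverse order."""
    rows, cols = d.shape
    xe = np.array([[c[m, n] - u * _update_sum(d, a, m, n) for n in range(cols)]
                   for m in range(rows)])
    xo = np.array([[d[m, n] + p * _predict_sum(xe, a, m, n) for n in range(cols)]
                   for m in range(rows)])
    x = np.empty((2 * rows, cols))
    x[0::2], x[1::2] = xe, xo
    return x


# Diagonal edge: pixel (r, n) is 1 where n >= r, else 0.
img = np.fromfunction(lambda r, n: (n >= r).astype(float), (8, 8))
for offsets in ([0, 0, 0, 0], [-1, -1, -1, -1]):
    c, d = wal53_forward(img, offsets)
    assert np.allclose(wal53_inverse(c, d, offsets), img)
    print(offsets, "high-band energy:", np.sum(d ** 2))
```

On this synthetic edge the aligned offsets (-1) leave a high-band energy of 0.75, against 2.5 for plain vertical filtering (offset 0); the remaining residual comes only from the clamped borders.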




• Original Weighted Adaptive Lifting (WAL)
  – Directional interpolation
  – Adaptive interpolation filter
    • The optimal filter minimizes the energy of the high subband, Eq. (5), and is obtained by solving the Wiener-Hopf equation:

$$\{w_{i,k}\}^{*} = \arg\min_{\{w_{i,k}\}} E\Big[\Big(x_o[m,n] - p \sum_{i=0}^{1} \sum_k w_{i,k}\, x_e[m+m_{i,k},\, n+n_{i,k}]\Big)^2\Big] \tag{5}$$

[Figure: Directional interpolation along rows/columns 2m-2, …, 2m+4 with pixels xn-3, …, xn+3 and integer-pixel taps h-3, …, h2. The Telenor 4-tap filter uses (h-2, h-1, h0, h1); the bilinear 2-tap filter uses (h-3, h2).]
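The minimization in Eq. (5) is an ordinary least-squares problem, so its Wiener-Hopf (normal) equations can be solved directly. The sketch below assumes that the candidate integer pixels for every odd sample have already been gathered into a matrix (six columns, e.g. positions h-3 … h2). The function name `wiener_hopf_taps`, the synthetic data, and the omission of the sum-to-one constraint of Eq. (1) are choices made for illustration.

```python
import numpy as np


def wiener_hopf_taps(support, target, p=0.5):
    """Interpolation taps minimising the high-band energy of Eq. (5).

    support: (num_pixels, num_taps) array; row j holds the integer pixels
             (e.g. the six samples h-3 ... h2) used to predict target[j].
    target:  (num_pixels,) array of the odd samples x_o being predicted.
    Minimises sum((target - p * support @ h) ** 2) by solving the normal
    (Wiener-Hopf) equations R h = r with R = S^T S and r = S^T target / p.
    No sum-to-one constraint is imposed in this sketch.
    """
    R = support.T @ support          # sample autocorrelation matrix
    r = support.T @ target / p       # sample cross-correlation vector
    return np.linalg.solve(R, r)


# Synthetic example: made-up "true" taps plus a little noise.
rng = np.random.default_rng(1)
S = rng.standard_normal((500, 6))
true_taps = np.array([-0.05, 0.1, 0.45, 0.45, 0.1, -0.05])
x_odd = 0.5 * S @ true_taps + 0.01 * rng.standard_normal(500)
print(np.round(wiener_hopf_taps(S, x_odd), 2))
```

Solving R h = r gives the same result as `np.linalg.lstsq(support, target / p)` when `support` has full column rank; forming R and r explicitly simply mirrors the autocorrelation and cross-correlation form of the Wiener-Hopf equation.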





• Improved Weighted Lifting for 3-D Transforms
  – Weighted lifting
    • works well for a single lifting stage, as in the (6,6) and 5/3-tap filters,
      – not only in the spatial transform [Chang2007, Liu2007a]
      – but also in the temporal transform [Xiong2004]
    • suffers from over-/under-weighted updates with multiple lifting stages, as in the 9/7-tap filter
      – reason: the update equation does not satisfy the constraint in Eq. (1)

  – Solutions to the over-/under-weighted update problems: a modified update step, Eq. (6), whose weights are set case by case according to the weight sums $W_0$ and $W_1$
    • Case 1: [equation not recoverable from the source]
    • Case 2: [equation not recoverable from the source]
    • Case 3: [equation not recoverable from the source]
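For reference, the multiple lifting stages mentioned above are visible in the ordinary (non-directional) 9/7 lifting scheme sketched below: two predict stages and two update stages followed by scaling, where each stage operates on samples already modified by the previous ones. The constants are the standard CDF 9/7 lifting coefficients, with K in one common normalization (low-pass DC gain of 1). This sketch does not implement the weighting corrections of Cases 1–3.

```python
import numpy as np

# CDF 9/7 lifting coefficients (two predict and two update stages) and the
# final scaling constant K, in one common normalisation.
ALPHA, BETA = -1.586134342059924, -0.052980118572961
GAMMA, DELTA = 0.882911075530934, 0.443506852043971
K = 1.230174104914001


def _next(v):
    """v[n + 1], repeating the last sample at the right border."""
    return np.append(v[1:], v[-1])


def _prev(v):
    """v[n - 1], repeating the first sample at the left border."""
    return np.insert(v[:-1], 0, v[0])


def cdf97_forward(x):
    """One level of 1-D 9/7 lifting on an even-length signal."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d = d + ALPHA * (s + _next(s))   # predict 1
    s = s + BETA * (_prev(d) + d)    # update 1
    d = d + GAMMA * (s + _next(s))   # predict 2
    s = s + DELTA * (_prev(d) + d)   # update 2
    return s / K, d * K


def cdf97_inverse(low, high):
    """Run the four lifting stages backwards with opposite signs."""
    s, d = low * K, high / K
    s = s - DELTA * (_prev(d) + d)
    d = d - GAMMA * (s + _next(s))
    s = s - BETA * (_prev(d) + d)
    d = d - ALPHA * (s + _next(s))
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = s, d
    return x


signal = np.sin(np.linspace(0.0, 3.0, 16))
low, high = cdf97_forward(signal)
print(np.allclose(cdf97_inverse(low, high), signal))  # True
```

A constant input yields a low band of ones and a high band that is zero up to rounding, and the inverse undoes the stages in reverse order with opposite signs, which guarantees perfect reconstruction.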

• Directional Adaptive Interpolation for 3-D Transforms
  – Directional interpolation extended to the temporal domain
    • For simplicity, the example is restricted to one spatial coordinate x: 1-D image lines replace 2-D images, so the 3-D filter reduces to a 2-D filter
    • To interpolate the sub-pixel $s_{i,2}$ ($i = 0, 1$) in frame $2m+1+(-1)^{i+1}$, the integer pixels used are
      – not only $\{h_{i,-2}, h_{i,-1}, h_{i,0}, h_{i,1}\}$ in frame $2m+1+(-1)^{i+1}$
      – but also $\{h_{i,-3}, h_{i,2}\}$ in frame $2m+1+3(-1)^{i+1}$
  – Adaptive interpolation filter
    • The adaptive filter coefficients for the temporal domain are likewise obtained by solving the Wiener-Hopf equation for the minimization problem in Eq. (5)






[Figure: Temporal directional interpolation across frames 2m-2, …, 2m+4 with pixels xn-3, …, xn+3: sub-pixel s0,2 is interpolated from h0,-3, …, h0,2 and sub-pixel s1,2 from h1,-3, …, h1,2.]
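The frame indices in the temporal interpolation rule are easy to misread, so a small Python helper evaluates them. The function names `support_frames` and `tap_frames` and the string labels `h_i,k` are illustrative only.

```python
def support_frames(m: int, i: int) -> tuple[int, int]:
    """Frames holding the integer pixels used to interpolate s_{i,2}.

    For odd frame 2m+1 and side i (0: preceding frames, 1: following frames),
    returns (near, far): the near frame 2m+1+(-1)^(i+1) holds
    {h_{i,-2}, h_{i,-1}, h_{i,0}, h_{i,1}} (and the sub-pixel s_{i,2});
    the far frame 2m+1+3(-1)^(i+1) holds {h_{i,-3}, h_{i,2}}.
    """
    if i not in (0, 1):
        raise ValueError("i must be 0 or 1")
    sign = (-1) ** (i + 1)
    return 2 * m + 1 + sign, 2 * m + 1 + 3 * sign


def tap_frames(m: int, i: int) -> dict[str, int]:
    """Map each tap name h_{i,k} (k = -3..2) to the frame it is read from."""
    near, far = support_frames(m, i)
    return {f"h_{i},{k}": (far if k in (-3, 2) else near) for k in range(-3, 3)}


for side in (0, 1):
    print(side, support_frames(2, side), tap_frames(2, side))
```

For m = 2 (odd frame 5), side i = 0 draws on frames 4 and 2, and side i = 1 on frames 6 and 8, matching the frame axis 2m-2, …, 2m+4 of the figure.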

• 3-D WAL-based Direction Aligned Wavelet Transform
  – Apply the 3-D directional threading technique to align a series of video frames into a fully direction-aligned 3-D video cube
  – Within each GOP, apply temporal WAL with the 5/3-tap filter to each temporal directional thread
    • Repeat this operation on the temporal low-pass bands until the desired level of temporal wavelet decomposition is reached
  – Within each frame, apply spatial WAL with the 5/3-tap or 9/7-tap filter to the 2-D spatial directional threads
    • Apply WAL to each horizontal directional thread
    • Apply WAL to each vertical directional thread
    • Repeat these operations on the low-low (LL) spatial band until the desired level of spatial wavelet decomposition is reached
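The decomposition order can be sketched in Python under a strong simplifying assumption: all direction vectors are zero, so every directional thread is a straight line along its axis and WAL reduces to ordinary 5/3 lifting (i.e. a conventional 3-D DWT). The sketch therefore shows only the schedule described above: temporal levels on the successive low-pass bands of a GOP, then horizontal and vertical spatial lifting on every frame, recursing on the LL band. The function names are hypothetical, borders are handled by repeating the edge sample, and only the 5/3 filter is used spatially.

```python
import numpy as np


def lift53(x, axis):
    """One level of 5/3 lifting along `axis` (even length, border repeated).

    With zero direction vectors every directional thread is a straight line
    along the axis, so WAL reduces to this ordinary 5/3 lifting.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    s, d = x[0::2], x[1::2]
    s_next = np.concatenate([s[1:], s[-1:]])   # s[n+1]
    d = d - 0.5 * (s + s_next)                 # predict
    d_prev = np.concatenate([d[:1], d[:-1]])   # d[n-1]
    s = s + 0.25 * (d_prev + d)                # update
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)


def spatial_dwt(frames, levels):
    """Separable 2-D decomposition of every frame in a (T, H, W) stack.

    Horizontal threads (axis 2) first, then vertical threads (axis 1);
    only the LL band is decomposed further.
    Band names: first letter horizontal, second letter vertical.
    """
    details, ll = [], frames
    for _ in range(levels):
        lo, hi = lift53(ll, axis=2)
        ll, lh = lift53(lo, axis=1)
        hl, hh = lift53(hi, axis=1)
        details.append({"LH": lh, "HL": hl, "HH": hh})
    return ll, details


def dawt3d_zero_direction(gop, t_levels=2, s_levels=2):
    """Temporal levels on the GOP low-pass band, then spatial levels per frame."""
    temporal_bands, low = [], gop
    for _ in range(t_levels):
        low, high = lift53(low, axis=0)
        temporal_bands.append(high)
    temporal_bands.append(low)  # coarsest temporal low-pass band
    return [spatial_dwt(band, s_levels) for band in temporal_bands]


gop = np.random.default_rng(0).standard_normal((8, 16, 16))
for band_ll, band_details in dawt3d_zero_direction(gop):
    print(band_ll.shape, [d["HH"].shape for d in band_details])
```

For an 8-frame GOP of 16 × 16 frames with two temporal and two spatial levels, this yields three temporal bands (4, 2, and 2 frames), each with a 4 × 4 LL band.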






Experimental Results

• The MSRA 3-D wavelet video coder VIDWAV 2.0 is used as the reference software
  – The 3-D DWT and MATF modules are replaced with the proposed 3-D WAL-based DAWT
  – Other modules, such as bit-plane coding and entropy coding, remain unchanged
• Two MPEG standard test sequences: (a) Carphone and (b) Foreman



• Performance comparison of 3-D WAL-based and 3-D DWT-based SVC: Y component of decoded Carphone and Foreman at CIF, 30 Hz, with 5/3- and 9/7-tap spatial filters

• Performance comparison of 3-D WAL-based and 3-D DWT-based SVC: Y component of decoded Carphone and Foreman at CIF, 15 Hz, with 5/3- and 9/7-tap spatial filters

• Performance comparison of 3-D WAL-based and 3-D DWT-based SVC: Y component of decoded Carphone and Foreman at QCIF, 15 Hz, with 5/3- and 9/7-tap spatial filters

• Performance comparison of 3-D WAL-based and 3-D DWT-based SVC: Y component of decoded Carphone and Foreman at QCIF, 7.5 Hz, with 5/3- and 9/7-tap spatial filters






• Coding performance comparison (3-D WAL vs. 3-D DWT)
  – PSNR gains of up to 1.62 dB
  – 5/3-tap spatial filter: average PSNR gains of 0.89 dB for Carphone and 1.12 dB for Foreman
  – 9/7-tap spatial filter: average PSNR gains of 0.69 dB for Carphone and 1.00 dB for Foreman
• Complexity comparison (3-D WAL vs. 3-D DWT)
  – Encoder side: considerably higher complexity, due to the 3-D direction estimation process
  – Decoder side: similar complexity, owing to the asymmetric design






Conclusion

• 3-D Direction Aligned Wavelet Transform for Scalable Video Coding
  – 3-D generalized directional threading
  – 3-D extension of weighted adaptive lifting

• References
  – [Ding2007] W. Ding, F. Wu, X. Wu, S. Li, and H. Li, "Adaptive directional lifting-based wavelet transform for image coding," IEEE Trans. Image Process., vol. 16, no. 2, pp. 416-427, Feb. 2007.
  – [Chang2007] C.-L. Chang and B. Girod, "Direction-adaptive discrete wavelet transform for image compression," IEEE Trans. Image Process., vol. 16, no. 5, pp. 1289-1302, May 2007.
  – [Liu2007a] Y. Liu and K. N. Ngan, "Weighted adaptive lifting-based wavelet transform," Proc. IEEE Int. Conf. Image Process. (ICIP 2007), San Antonio, USA, Sept. 2007.
  – [Liu2007b] Y. Liu, F. Wu, and K. N. Ngan, "3-D object-based scalable wavelet video coding with boundary effect suppression," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 639-644, May 2007.
  – [Xiong2004] R. Xiong, F. Wu, J. Xu, S. Li, and Y.-Q. Zhang, "Barbell lifting wavelet transform for highly scalable video coding," Picture Coding Symposium (PCS 2004), USA, Dec. 2004.




Thank You!

Q&A

