Suthaharan, S., et al. “Transform Domain Techniques for Multimedia Image and Video Coding.” In Multimedia Image and Video Processing, ed. Ling Guan et al. Boca Raton: CRC Press LLC, 2001.



© 2001 CRC Press LLC

Chapter 12

Transform Domain Techniques for Multimedia Image and Video Coding

S. Suthaharan, S.W. Kim, H.R. Wu, and K.R. Rao

12.1 Coding Artifacts Reduction

12.1.1 Introduction

Block-based transform coding is widely used for compressing digital video, which normally requires a large bandwidth during transmission. Compression of digital video is vital for reducing the bandwidth needed for effective storage and transmission. However, it introduces coding artifacts in the decoded video, especially at low bit rates. Techniques in either the spatial domain or the transform domain can be developed to reduce these artifacts. Several methods have been proposed in the spatial domain to reduce the so-called blocking artifact. However, none of these methods can reduce all the coding artifacts [1] at the same time. Some methods are image enhancement techniques, and others are intrinsically iterative, which makes them unsuitable for real-time applications [2]. Also, they do not completely eliminate the artifact. Hence, researchers are investigating new approaches.

The objective of the proposed approach is to present a new transform domain filtering technique that reduces a number of coding artifacts [1], including the well-known blocking artifact. In digital video coding, compression is achieved by first transforming the digital video from the spatial and temporal domains into the frequency domain using the block-based discrete cosine transform (DCT), then quantizing the transform coefficients, and finally applying variable-length coding [3].

The DCT is a block-based transform, and at low bit rates the noise caused by coarse quantization of the transform coefficients is visible in the form of a blocking artifact. To reduce this artifact while maintaining compatibility with current video coding standards, various spatial domain postfiltering techniques have been introduced, such as low-pass filtering (LPF) [4], projection onto convex sets (POCS) [5], maximum a posteriori (MAP) filters [6], and adaptive low-pass filters (ALPFs) [7, 8].

Because quantization of the transform coefficients is the main source of coding artifacts at low bit rates, it is more effective to tackle the artifacts in the transform domain than in the spatial domain. Recently, a weighted least squares (WLS) [2] method was introduced in the transform domain to estimate the transform coefficients from their quantized versions. It assumes a uniform probability density function for the quantization errors and estimates their variances from the step size of the corresponding quantizer. It also estimates the variances of the signal and the noise separately, which increases both the computational complexity and the computational error. Therefore, it is more sensible to


estimate the signal-to-noise ratio (SNR) as a single entity, because this ratio plays a major role in a number of image quality restoration techniques.

We have recently proposed an improved Wiener filter (IWF) [9] that estimates the SNR in an image restoration problem. The IWF is also suitable for reducing coding artifacts in compressed images. In this section, the IWF and the WLS are investigated and modified, and a new approach is developed in the transform domain to reduce the coding artifacts. First, the WLS and approximated WLS∗ methods are investigated and implemented to reduce blocking artifacts. Second, the IWF method is further investigated and the SNR of the quantized transform coefficients is estimated. This estimated SNR is used with the IWF to reduce the noise in the transform coefficients. This noise is the source of the coding artifacts, so reducing it yields a corresponding reduction of the artifacts.

12.1.2 Methodology

Let us first discuss the mathematical model used in image and video transform coding schemes. It is clear from the digital image and video coding literature that block-based transform coding can be modeled as follows:

Y = Q(T(x)) , (12.1)

where x, T , Q, and Y represent the input image, the discrete cosine transform, the quantizationprocess, and the quantized DCT coefficients, respectively.
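To make the model concrete, the following Python sketch (our illustration; the chapter itself presents no code) applies Y = Q(T(x)) to a single 8 × 8 block, using an explicitly constructed orthonormal DCT-II matrix and a uniform quantizer of step q. The function names and the default step size are our own choices, not part of the original text.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II matrix: entry (k, i) = c(k) * cos((2i+1)k*pi/(2n))
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)          # DC row: c(0) = sqrt(1/n)
    return T

def transform_code_block(x, q=16.0):
    # Y = Q(T(x)): 2D DCT of the block, then uniform quantization with step q
    T = dct_matrix(x.shape[0])
    coeffs = T @ x @ T.T
    return q * np.round(coeffs / q)

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(8, 8)).astype(float)   # one image block
Y = transform_code_block(x)                            # quantized DCT coefficients
```

The quantization error n = Y − T(x) of equation (12.2) then falls out directly by subtracting the unquantized coefficients from Y.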

Using this transform coding mechanism, a linear model for the quantization error can bedeveloped and written as follows:

n = Y − T(x) , (12.2)

where n is the quantization error, which introduces signal-independent noise on the transform coefficients T(x) and results in Y. Therefore, n is often called the quantization noise. To simplify the above linear noise model, we write it as follows:

Y = X + n , (12.3)

where X = T(x) and n is a zero-mean additive noise introduced by the quantization of the DCT coefficients. Without loss of generality, it can be assumed that the noise n is also uncorrelated with the transform coefficients X.

With a priori information about n and X, the latter can be estimated from Y using the Wiener filter technique [10, 11]. That is,

X̂ = IFFT{ [1 / (1 + |n|²/|X|²)] · FFT(Y) } , (12.4)

where FFT and IFFT represent the fast Fourier transform and the inverse fast Fourier transform, respectively. The symbol |·|² represents the power spectrum, and the ratio |n|²/|X|² is called the noise-to-signal power ratio (the a priori representation of the SNR).

As we can see from (12.4), computation of X̂ requires a priori knowledge of the power spectra of the nonquantized transform coefficients (X) and the quantization noise (n). In reality, such information is rarely available. We propose two approaches that estimate the noise-to-signal power ratio as a whole and use it in (12.4) to restore the transform coefficients from their corrupted version.
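Read operationally, (12.4) is a single multiplicative attenuation in the 2D FFT domain. A minimal sketch, assuming the noise-to-signal power ratio is already available as a scalar or per-frequency array (`nsr` is our name for it, not the text's):

```python
import numpy as np

def wiener_restore(Y, nsr):
    # Equation (12.4): X_hat = IFFT{ FFT(Y) / (1 + |n|^2/|X|^2) }
    # nsr may be a scalar or an array matching Y's FFT grid.
    return np.real(np.fft.ifft2(np.fft.fft2(Y) / (1.0 + nsr)))
```

With nsr = 0 the filter is the identity; larger ratios attenuate the corresponding frequencies more strongly.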

APPROACH I (IWF): Assuming Noise Power Spectrum is Known

In this approach, we use the assumption suggested by Choy et al. [2] that the quantization noise has a uniform probability density function. Thus, the quantization noise variance is given


by σn² = q²/12, where q is the known step size of the quantizer applied to the transform coefficients X to obtain Y. Since the quantization noise n is assumed to be zero mean and uncorrelated with X, using (12.3) we can derive [9, 12]:

|Y|² = |X|² + |n|² , (12.5)

where |n|² = σn², which can be calculated from the quantization step size.

This gives

|n|²/|X|² = |n|² / (|Y|² − |n|²) . (12.6)

The noise-to-signal power ratio can thus be calculated from the power spectra of the quantized transform coefficients Y and the quantization noise n.

Using this ratio in (12.4), we can reduce the quantization error in the transform coefficients and, in turn, a number of coding artifacts, including the blocking artifact, in the decompressed images.
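Approach I can then be sketched end to end. The uniform-quantizer variance q²/12 supplies |n|², and (12.6) supplies the ratio; the normalization of the power spectrum (dividing |FFT|² by the number of samples) and the flooring that keeps the denominator of (12.6) positive are our own implementation choices, not specified in the text:

```python
import numpy as np

def iwf_approach1(Y, q):
    # |n|^2 = sigma_n^2 = q^2 / 12 under the uniform quantization-noise assumption
    noise_power = q * q / 12.0
    F = np.fft.fft2(Y)
    # Power spectrum of the quantized coefficients (normalization is our choice)
    Py = np.abs(F) ** 2 / Y.size
    # Equation (12.6): |n|^2/|X|^2 = |n|^2 / (|Y|^2 - |n|^2), floored to stay positive
    nsr = noise_power / np.maximum(Py - noise_power, 1e-12)
    # Equation (12.4): Wiener attenuation and inverse transform
    return np.real(np.fft.ifft2(F / (1.0 + nsr)))
```

Choy et al.'s uniform-noise assumption enters only through `noise_power`; any other noise model could be substituted there.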

The advantage of this proposed method over WLS is that it uses only one assumption (the noise variance), whereas WLS uses one approximation (the mean of X) and one assumption (the noise variance). Because no approximation is used in our proposed approach, it gives better results in terms of peak signal-to-noise ratio (PSNR) as well as visual quality.

APPROACH II (IWF*): Estimate the Noise-to-Signal Power Ratio

In this approach, we propose a technique to estimate the noise-to-signal power ratio as a whole and then use it in the Wiener filter equation (12.4). We recently proposed the IWF technique to handle the image restoration problem [9]. In this technique, we do not need a priori information about the noise-to-signal power ratio in order to apply the Wiener filter; instead, it is estimated from the given degraded image. It has been used successfully in the image restoration problem. We use this approach to remove the coding artifacts introduced by the coarse quantization of DCT coefficients.

The IWF method needs two versions of an image so that the noise-to-signal power ratio can be estimated. In digital video we have a sequence of images (frames), and consecutive frames differ very little except when the scene changes. The decoded frames can still differ, however, because different quantization scalers are used, and thus the quantization error (noise) can differ. Let us assume that the DCT coefficients of the ith frame are to be restored from their quantization noise; then we can model the quantized DCT coefficients of the ith and (i + 1)th frames as follows [refer to (12.3)]:

Yi = X + ni

Yi+1 ≈ X + ni+1 (12.7)

The restriction of the method is that adjacent frames cannot be quantized with the same quantization scaler. In the above equations it has been assumed that there is no scene change, so the approximation in the second equation is valid; therefore, Yi+1 cannot be used when the scene changes. To overcome this problem, we suggest using the previous frame as the second frame when the scene changes; thus we can write:

Yi = X + ni

Yi−1 ≈ X + ni−1 (12.8)

To implement IWF we need only two versions of an image, and in digital video coding we can have two images using either the previous frame or the next frame for the second image as


shown in (12.7) and (12.8). If the scene changes across both the previous and the next frames, we can construct the second frame (i + 1 or i − 1) from the ith frame using the methods discussed in [10]. If we denote the quantized DCT coefficients of the frame to be restored by Y1 and those of the second frame by Y2, then we have

Y1 = X + n1

Y2 ≈ X + n2 (12.9)

In video compression, the quantization scaler is used as a quantization parameter, and it is valid to assume a linear relationship between the quantization noises n1 and n2. Thus the linear relationship n2 = a · n1 is acceptable (where a is a constant).

Therefore, equation (12.9) can be written as follows:

Y1 = X + n1

Y2 ≈ X + a · n1 (12.10)

From these two equations and using the definition of a power spectrum [12], we can easilyderive

|Y1 − Y2|² ≈ (1 − a)² · |n1|²

|Y2 − a · Y1|² ≈ (1 − a)² · |X|² (12.11)

Dividing the first equation by the second, the noise-to-signal power ratio of Y1 can be approximated as follows:

|n1|²/|X|² ≈ |Y1 − Y2|² / |Y2 − a · Y1|² , (12.12)

where the constant a can be calculated during the encoding process as follows:

a = Var(Y2 − X) / Var(Y1 − X) , (12.13)

and a single value for each frame can be transmitted to the decoder. Although this adds some overhead to the bit rate, the overhead is marginal compared to the PSNR and visual quality improvements, and it can be reduced further by transmitting only the differences in a.

On the other hand, to avoid this overhead entirely, the constant a can be approximated by the corresponding ratio calculated from the decoded versions of Y1 and Y2 as follows:

a = Var(decoded(Y2)) / Var(decoded(Y1)) . (12.14)

The Wiener filter in (12.4), with Y replaced by Y1 and n by n1, together with the above expressions, has been used to restore the transform coefficients X of Y1.

The important point to note here is that the proposed methods operate in the transform domain to reduce the noise present in the quantized DCT coefficients, which in turn reduces the blocking artifacts in the decompressed images. Also note that if we cannot assume that consecutive frames differ very little, we can still use equation (12.7) by constructing the second image from the first according to the method discussed in [9, 10], so that equation (12.9) remains valid.
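The two-frame estimate of (12.12) is easy to state in code. The sketch below assumes the constant a is already known (e.g., computed at the encoder as in (12.13)); the small floor on the denominator is our own safeguard against division by zero:

```python
import numpy as np

def iwf_star_nsr(Y1, Y2, a):
    # Equation (12.12): |n1|^2/|X|^2 ~= |Y1 - Y2|^2 / |Y2 - a*Y1|^2
    P_num = np.abs(np.fft.fft2(Y1 - Y2)) ** 2       # ~ (1-a)^2 |n1|^2
    P_den = np.abs(np.fft.fft2(Y2 - a * Y1)) ** 2   # ~ (1-a)^2 |X|^2
    return P_num / np.maximum(P_den, 1e-12)

def iwf_star_restore(Y1, Y2, a):
    # Equation (12.4) applied to Y1 with the estimated ratio
    nsr = iwf_star_nsr(Y1, Y2, a)
    return np.real(np.fft.ifft2(np.fft.fft2(Y1) / (1.0 + nsr)))
```

When Y1 = X + n1 and Y2 = X + a·n1 hold exactly, the (1 − a)² factors cancel and the estimate reduces to the true per-frequency ratio |n1|²/|X|².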


12.1.3 Experimental Results

In the simulation, the proposed methods (IWF and IWF∗) have been implemented using a number of test video sequences, including Flower Garden, Trevor, Footy, Calendar, and Cameraman. We show here the effectiveness of the proposed methods on the Trevor, Footy, and Cameraman images. Figure 12.1 displays the original Trevor image, its MPEG-coded image, and the images processed by the WLS method and the IWF∗ method, respectively. In Figure 12.1b the coding artifacts can be clearly seen. Although the images in Figures 12.1c and 12.1d look similar, a closer look on a region-by-region basis (and along the edges) shows that many of the coding artifacts have been removed in Figure 12.1d, which is supported by the higher PSNR value. Figures 12.2 and 12.3 show similar results for the Footy and Cameraman images. All these figures show that the proposed methods significantly reduce the coding artifacts by removing some of the noise introduced by the quantization process. This improves the PSNR by more than 1 dB and in turn improves the visual quality of the images. Note that in our simulation we have used the MPEG quantizer with an 8 × 8 block-based DCT [13].

It is clear from [2] that Choy et al.’s WLS filter gives better PSNR values than the LPF [4] and the POCS filter [5]. In our experiment, we compared the results of IWF and IWF∗ with those of WLS and WLS∗, respectively (Table 12.1).

Table 12.1 PSNR Improvements of the Images Reconstructed Using WLS and IWF

                        MPEG Encoded       PSNR Improvements over POCS [5]
Images           bpp     PSNR (dB)     WLS      WLS∗     IWF      IWF∗
Flower Garden   0.3585    23.1401     0.9668   0.5317   1.1439   1.2537
Trevor          0.1537    31.8739     0.8074   0.6009   1.0338   1.0239
Footy           0.2325    26.4812     0.9769   0.5958   1.0324   1.0253
Calendar        0.3398    22.4223     0.8777   0.6232   1.2363   1.1867
Cameraman       0.2191    27.1588     0.5485   0.3568   1.1002   0.9100

From the PSNR improvements shown in Table 12.1, we can conclude that the proposed methods give better restoration of the transform coefficients than the WLS and WLS∗ methods and yield better visual quality.

It is evident from our simulation that the proposed methods restore the transform coefficients from their quantized versions and thus can reduce a number of coding artifacts, such as

1. Blocking artifact: This is due to coarse quantization of the transform coefficients. The blocking artifacts can be clearly seen in all the images.

2. DCT basis images effect: This is due to the appearance of DCT basis images. For example, it can be seen in certain blocks in the background of the Footy image and the ground of the Cameraman image. In the filtered images of Footy and Cameraman, this effect has been reduced.

3. Ringing effect: This is due to quantization of the AC DCT coefficients along high-contrast edges. It is prominent in the Trevor and Cameraman images along the arm and shoulder, respectively, and in the filtered images it has been reduced.

4. Staircase effect: This is due to the appearance of DCT basis images along diagonal edges. It arises when higher-order DCT basis images are quantized and fail to be masked by the lower-order basis images.


FIGURE 12.1 Images of Trevor. (a) Original image; (b) MPEG-coded image (0.1537 bpp, 31.8739 dB); (c) processed by WLS (32.6813 dB); (Continued).


FIGURE 12.1 (Cont.) Images of Trevor. (d) Processed by the proposed IWF∗ method (32.9178 dB).

12.1.4 More Comparison

In this section we compare our proposed techniques with a recently published blocking artifact reduction method proposed by Kim et al. [14]. In their method, the artifact reduction operation is applied only to the neighborhood of each block boundary in the wavelet transform at the first and second scales. The technique removes the blocking component that reveals stepwise discontinuity at block boundaries. It is a blocking artifact reduction technique and does not necessarily reduce the other coding artifacts mentioned above. It is evident from the images in Figure 12.4 that this method still blurs the image significantly (Figure 12.4c), and thus some edge details that are important for visual perception are lost.

We have used the JPEG-coded Lena image provided by Kim et al. [14] to compare our results. This Lena image is JPEG coded at a 40:1 compression ratio. Enlarged portions of the original and JPEG-coded Lena images are given in Figures 12.4a and b. Figure 12.4c shows the image of Figure 12.4b processed by the method of Kim et al. As can be seen, their method still blurs the image significantly, and the sharpness of the image is lost. In addition, there are a number of other obvious problems: (1) the ringing effect along the right cheek edge, (2) the blurred stripes on the hat, and (3) the blurred edge between the hat and the forehead, to name just a few. The image processed by our proposed method (IWF∗) is presented in Figure 12.4d. In this image we can clearly see the sharpness of the edges, the reduction of a number of coding artifacts, and an overall improvement in the visual quality of the image.

12.2 Image and Edge Detail Detection

12.2.1 Introduction

The recent interest of the Moving Picture Experts Group (MPEG) is object-based image representation and coding. Compared to conventional frame-based compression techniques, object-based coding enables MPEG-4 to cover a wide range of emerging applications, including multimedia. MPEG-4 supports new tools and functionality not available in existing standards.


FIGURE 12.2 Images of Footy. (a) Original image; (b) MPEG-coded image (0.2325 bpp, 26.4812 dB); (c) processed by WLS (27.4581 dB); (Continued).


FIGURE 12.2 (Cont.) Images of Footy. (d) Processed by the proposed IWF∗ method (27.6095 dB).

One of the important tools needed to enhance and broaden its applications is an effective method for image segmentation. Image segmentation techniques not only enhance the MPEG standards but are also needed for many computer vision and image processing applications. The goal of image segmentation is to find regions that represent objects or meaningful parts of objects. Therefore, image segmentation methods look for objects that either have some measure of homogeneity within them or have some measure of contrast with the objects at their border.

To carry out image segmentation, we need effective image and edge detail detection and enhancement algorithms. Since edges often occur at image locations representing object boundaries, edge detection is used extensively in image segmentation when we want to divide the image into areas corresponding to different objects. The first stage in many edge detection algorithms is an enhancement process that generates an image in which ridges correspond to statistical evidence for an edge [15]. This process is achieved using linear operators, such as the Roberts, Prewitt, and Canny operators. These are called edge convolution enhancement techniques. They are based on convolution and are suitable for detecting edges in still images. These techniques do not exploit any visual perception properties but use only the statistical behavior of the edges. Thus, they cannot detect the edges that might contribute to edge fluctuations and coding artifacts in the temporal domain [16, 17].
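The convolution-based enhancement described above can be illustrated with the Prewitt operator; the hand-rolled 'valid' convolution below is only to keep the sketch self-contained, and the function names are ours:

```python
import numpy as np

def conv2_valid(img, kernel):
    # 2D correlation with 'valid' boundary handling (output shrinks by kernel size - 1)
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def prewitt_edges(img):
    kx = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to vertical edges
    ky = kx.T                                # responds to horizontal edges
    gx = conv2_valid(img, kx)
    gy = conv2_valid(img, ky)
    return np.hypot(gx, gy)                  # gradient magnitude (edge evidence)
```

As the text notes, such operators use only local intensity statistics; no visual perception model enters the computation.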

In this section, a transform domain technique is introduced to detect image and edge details suitable for image segmentation and for the reduction of coding artifacts in digital video coding. The method uses the perceptual properties and edge contrast information of the transform coefficients and thus gives meaningful edges that are correlated with human visual perception. The method also allows users to select suitable edge details, from the different levels of edge detail it detects, for applications in which human visual quality is an important factor.

12.2.2 Methodology

Let us consider an image I of size N1n × N2n, where I is divided into N1 × N2 blocks, each of n × n pixels. In transform coding, a block is transformed into the transform domain using a two-dimensional separable unitary transform such as the DCT; this process can be expressed by

Y = T ∗ X ∗ T ′ (12.15)


FIGURE 12.3 Images of Cameraman. (a) Original image; (b) MPEG-coded image (0.2191 bpp, 27.1588 dB); (c) processed by WLS (27.8365 dB); (Continued).


FIGURE 12.3 (Cont.) Images of Cameraman. (d) Processed by the proposed IWF∗ method (28.0718 dB).

where X and Y represent a block of I and its transform coefficients, respectively; T is the n × n unitary transform matrix; and T′ is the transpose of T. The operator ∗ denotes matrix multiplication. The 8 × 8 block-based DCT is used in the experiment; it is also the transform used in many international image and video coding standards.

Let a be the DC coefficient of X, and let U be the coefficient block obtained from the transform coefficients of X by setting the DC coefficient to zero. Then we have the following:

a = (1/n) Σ_{i=1..n} Σ_{j=1..n} x(i, j) (12.16)

X = T′ ∗ U ∗ T + (a/n) · 1_{n×n} (12.17)

(here 1_{n×n} denotes the n × n all-ones matrix)

where x(i, j) is the (i, j)th intensity value of image block X. It is known that two-dimensional transform coefficients, in general, carry different visual sensitivity and edge information. Thus, the transform coefficients in U can be decomposed into a number of regions based on frequency level and edge structure, called the frequency distribution decomposition and the structural decomposition, respectively [3] (see Figure 12.5). Using these regions of transform coefficients, we treat the edge details corresponding to the low- (and medium-) frequency transform coefficients separately from the edge details corresponding to the high-frequency coefficients.

From a visual perception viewpoint, the low- (and medium-) frequency coefficients are much more sensitive than the high-frequency coefficients. Our proposed algorithm uses this convention and separates the edge details falling in the low- (and medium-) frequency coefficients from those falling in the high-frequency coefficients. To carry out this task, let us first define a new image block X1 from the transform coefficients of X using the following equation, similar to equation (12.17):

X1 = T′ ∗ (L ·∗ U) ∗ T + (α · a/n) · 1_{n×n} (12.18)

where L is called an edge-enhancement matrix, and α can be chosen to adjust the DC level of X1 to obtain different levels of edge detail with respect to the average intensity of the block


FIGURE 12.4 Lena image coded by JPEG with a 40:1 compression ratio and the processed images. (a) Original image; (b) coded image; (c) processed by the Kim et al. method; (Continued).


FIGURE 12.4 (Cont.) Lena∗ image coded by JPEG with a 40:1 compression ratio and the processed images. (d) Processed by our proposed method. (∗Copyright © 1972 by Playboy magazine.)

FIGURE 12.5 Decomposition of transform coefficients. (a) Structural decomposition based on edge details (regions of vertical, horizontal, and diagonal edges); (b) frequency distribution decomposition (low-, medium-, and high-frequency regions).

X1. The operator ·∗ denotes element-by-element matrix multiplication, as defined in the MATLAB package [18].

The selection of L and α depends on the user’s application, which makes the algorithm flexible; suitable selections of L and α for different applications are a subject of ongoing research. In this approach we used a JPEG-based quantization table as the edge-enhancement matrix, and it


is given in the following equation:

L =

[  α   60   70   70   90  120  255  255
  60   60   70   96  130  255  255  255
  70   70   80  120  200  255  255  255
  70   96  120  145  255  255  255  255
  90  130  200  255  255  255  255  255
 120  255  255  255  255  255  255  255
 255  255  255  255  255  255  255  255
 255  255  255  255  255  255  255  255 ]   (12.19)

One can select different weightings for the edge-enhancement matrix L. During the selection of the weights L(i, j), we treat the DC and AC coefficients differently, because the DC coefficient carries the spatial average of the intensity values of an image block, whereas the AC coefficients carry information about the edges in the image. Using the above L and α = 1, the proposed method can easily identify vertical, horizontal, and diagonal edges that are vital to improving image quality with respect to visual perception. Significant separation of these two regions using an appropriate L causes the intensity of certain edge details (based on their contrast) in the low- (and medium-) frequency coefficients to be pushed below zero while keeping the intensity of the edge details in the high-frequency coefficients above zero.

To obtain different levels of edge detail based on contrast, α can be adjusted above 1. By increasing α from 1, the average edge intensity is lifted, and low- (and medium-) contrast edges can be raised above zero while the high-contrast edge details are kept below zero. Thus, we obtain different levels of edge detail with respect to the average edge details in a block.
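Equations (12.16)–(12.18) can be sketched per block as follows, with the orthonormal DCT matrix written out explicitly. A useful sanity check, implied by (12.17), is that an all-ones L with α = 1 reproduces the block exactly; the function names are our own:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II matrix (row k, column i)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)
    return T

def edge_details_block(Xb, L, alpha=1.0):
    n = Xb.shape[0]
    T = dct_matrix(n)
    C = T @ Xb @ T.T      # all DCT coefficients of the block
    a = C[0, 0]           # DC coefficient; equals (1/n) * sum of pixels, eq. (12.16)
    U = C.copy()
    U[0, 0] = 0.0         # AC coefficients only
    # Equation (12.18): weighted inverse transform plus adjusted DC level
    return T.T @ (L * U) @ T + (alpha * a / n) * np.ones((n, n))
```

With α = 0 the result is the zero-mean edge image; raising α lifts the average level, which is exactly the lever the text uses to separate edge details by contrast.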

12.2.3 Experimental Results

Experiments were carried out using a number of images to evaluate the performance of the proposed method; however, only the results for the Cameraman and Flower Garden images are given in this chapter. Figures 12.6a and 12.7a show the original Cameraman and Flower Garden images, respectively.

Figures 12.6b and 12.7b show the edge details obtained from the images in Figures 12.6a and 12.7a using the proposed method with the edge-enhancement matrix L in equation (12.19) and α = 1. We see from these images that the proposed method detects the edges and also enhances much of the image detail. For example, in the Cameraman image we can see the glove details, the pockets in the jacket, the windows in the tower, and so forth. Similarly, in the Flower Garden image we can see roof textures, branches in the trees, and many other subtle details.

The images in Figures 12.6c and d and 12.7c and d show the results with the same L as in equation (12.19) but with α = 25 and 50, respectively. This demonstrates that increasing α leaves only the high-contrast edge details. We can see from the Cameraman image that the proposed method can even highlight the mouth edge detail.

In general, edge detection algorithms give edge details suitable for determining image boundaries for segmentation. The proposed method not only gives edge details for segmentation but also provides a flexible (adaptive to the DC level) edge detail identification algorithm that is suitable for coding artifact reduction in transform-coding schemes, and it allows the user to control the DC level to obtain different levels of edge detail for different applications.


FIGURE 12.6 Images of Cameraman. (a) Original image; (b) edge details with L and α = 1; (c) edge details with L and α = 25; (Continued).


FIGURE 12.6 (Cont.) Images of Cameraman. (d) Edge details with L and α = 50.

FIGURE 12.7 Images of Flower Garden. (a) Original image; (b) edge details with L and α = 1; (Continued).


FIGURE 12.7 (Cont.) Images of Flower Garden. (c) Edge details with L and α = 25; (d) edge details with L and α = 50.

12.3 Summary

In Section 12.1, we introduced two approaches in the transform domain to estimate the quantization error (noise) in DCT coefficients. The IWF method uses the Wiener filter assuming uniform quantization noise, whereas IWF∗ uses the improved Wiener filter in the transform domain to correct the quantization error that causes the coding artifacts in decoded images. The advantage of these proposed methods over WLS and WLS∗ is that they use fewer approximations and assumptions and give better results even for low-bit-rate coded images. The IWF∗ method does not depend directly on the noise characteristics and thus can be used for any type of noise. The proposed methods give better results than the other two methods in terms of PSNR, bit rate, and visual image quality. In addition, they also give better results than the method recently introduced by Kim et al. [14].

In Section 12.2, a new approach was proposed to identify image and edge details suitable for image segmentation and coding artifact reduction in digital image and video coding. In contrast to existing methods, which are mainly edge convolution enhancement techniques using the statistical properties of edges, the proposed method uses human perceptual information and the edge details in the transform coefficients. The proposed method is suitable for many applications, including low-level image processing, medical imaging, image/video coding and transmission, multimedia image retrieval, and computer vision. It identifies the edge details


based on the human visual properties of the transform coefficients. Such edges can contribute to temporal edge fluctuations in digital video coding, which lead to unpleasant artifacts, and identifying them allows an early fix during encoding.

Acknowledgement

We thank Nam Chul Kim (Visual Communication Laboratory, Dept. of Electrical Engineering, Kyungpook National University, Korea), who kindly provided the simulation results of the JPEG-coded Lena images for comparison.

References

[1] M. Yuen and H.R. Wu, “A survey of hybrid MC/DPCM/DCT video coding distortions,” Signal Processing (EURASIP J.), vol. 70, pp. 247–278, Oct. 1998.

[2] S.S.O. Choy, Y.-H. Chan, and W.-C. Siu, “Reduction of block-transform image coding artifacts by using local statistics of transform coefficients,” IEEE Signal Processing Lett., vol. 4, no. 1, Jan. 1997.

[3] K.R. Rao and J.J. Hwang, Techniques and Standards for Image, Video, and Audio Coding, Prentice-Hall, Englewood Cliffs, NJ, 1996.

[4] H.C. Reeves and J.S. Lim, “Reduction of blocking effects in image coding,” Opt. Eng., vol. 23, pp. 34–37, Jan./Feb. 1984.

[5] Y. Yang, N.P. Galatsanos, and A.K. Katsaggelos, “Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images,” IEEE Trans. CSVT, vol. 3, pp. 421–432, Dec. 1993.

[6] R.L. Stevenson, “Reduction of coding artifacts in transform image coding,” Proc. ICASSP, vol. 5, pp. 401–404, 1993.

[7] S. Suthaharan and H.R. Wu, “Adaptive-neighbourhood image filtering for MPEG-1 coded images,” Proc. ICARCV’96: Fourth Int. Conf. on Control, Automation, Robotics and Vision, pp. 1676–1680, Singapore, Dec. 1996.

[8] S. Suthaharan, “Block-edge reduction in MPEG-1 coded images using statistical inference,” 1997 IEEE ISCAS, pp. 1329–1332, Hong Kong, June 1997.

[9] S. Suthaharan, “New SNR estimate for the Wiener filter to image restoration,” J. Electronic Imaging, vol. 3, no. 4, pp. 379–389, Oct. 1994.

[10] S. Suthaharan, “A modified Lagrange’s interpolation for image restoration,” Austr. J. Intelligent Information Processing Systems, vol. 1, no. 2, pp. 43–52, June 1994.

[11] J.M. Blackledge, Quantitative Coherent Imaging, Academic Press, New York, 1989.

[12] J.S. Lim, Two-Dimensional Signal and Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1990.


[13] Secretariat ISO/IEC JTC1/SC29, ISO CD 11172-2, “Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s,” MPEG-1, ISO, Nov. 1991.

[14] N.C. Kim, I.H. Jang, D.H. Kim, and W.H. Hong, “Reduction of blocking artifact in block-coded images using wavelet transform,” IEEE Trans. CSVT, vol. 8, no. 3, pp. 253–257, June 1998.

[15] S.E. Umbaugh, Computer Vision and Image Processing: A Practical Approach Using CVIP Tools, Prentice-Hall, Englewood Cliffs, NJ, 1998.

[16] S.M. Smith and J.M. Brady, “SUSAN — a new approach to low level image processing,” Int. J. Computer Vision, vol. 23, no. 1, pp. 45–78, May 1997.

[17] I. Sobel, “An isotropic 3 × 3 image gradient operator.” In Machine Vision for Three-Dimensional Scenes, H. Freeman, ed., pp. 376–379, Academic Press, New York, 1990.

[18] MATLAB (version 5), “Getting Started with MATLAB,” The MathWorks, Inc., 1996.