Image Compression using DCT


This tutorial is downloaded from https://sites.google.com/site/enggprojectece

Image Compression Part 1

Discrete Cosine Transform (DCT)

Compiled by: Sivaranjan Goswami, Pursuing M. Tech. (2013-15 batch) Dept. of ECE, Gauhati University, Guwahati, India

Contact: [email protected]

This tutorial covers the fundamentals of Image transformation, the concept of energy compaction in transformed domain an introduction to DCT and its application in image compression. Finally we will look into the standard schemes of JPEG image compression technique in brief.

Image Representation in Transformation Domain

An image in spatial domain can be represented in a transformed domain given by:

T(u,v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x,y) r(x,y,u,v)        (1)

Here, f(x,y) is the image in spatial domain and r(x,y,u,v) is called the forward transformation kernel.

The original image can be obtained from the transformed image using the relation:

f(x,y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} T(u,v) s(x,y,u,v)        (2)

Where s(x,y,u,v) is called the inverse transformation kernel.

The transformation can be represented in matrix form given by:

T(M×N) = R1(M×M) f(M×N) R2(N×N)        (3)

Where f(M×N) is the image matrix and R1(M×M) and R2(N×N) together represent the transformation kernel.
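The matrix form of equation (3) can be illustrated with a tiny pure-Python sketch. The 2×2 matrices below are hypothetical, chosen only to show how R1 acts on the rows and R2 on the columns (in practice these would be transform kernels such as the DCT basis):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical 2x2 image and kernel matrices, for illustration only
f  = [[1, 2],
      [3, 4]]
R1 = [[1, 0],
      [0, 1]]          # identity: leaves the rows unchanged
R2 = [[0, 1],
      [1, 0]]          # swaps the two columns

# T = R1 * f * R2, as in equation (3)
T = matmul(matmul(R1, f), R2)
print(T)  # [[2, 1], [4, 3]]
```

With a separable transform, R1 applies a 1D transform to every column and R2 applies one to every row, which is why the 2D transform can be written as two matrix products.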

Two Dimensional Discrete Fourier Transform

Two dimensional (2D) DFT is given by:

F(u,v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x,y) e^{−j2π(ux/M + vy/N)}        (4a)

Similarly, 2D IDFT is given by:

f(x,y) = (1/MN) Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u,v) e^{j2π(ux/M + vy/N)}        (4b)


We know that the Fourier transform gives the image in the frequency domain. Hence the independent variables u and v represent frequency along the x axis and y axis respectively. In an image, frequency is simply the rate of change of intensity.

Thus F(0,0) contains information about the regions where there is no variation. Similarly, F(M − 1, N − 1) contains information about the pixels where this variation is greatest, that is, where the intensity changes along both the x axis and the y axis at every neighboring pixel.

In MATLAB, a 2D FFT can be obtained using the function fft2.
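Equation (4a) can also be implemented directly. The sketch below is a naive O(M²N²) double loop in Python for illustration only (fft2 uses a fast algorithm, but computes the same values):

```python
import cmath

def dft2(f):
    """Naive 2D DFT of an image given as a list of rows (equation 4a)."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)] for u in range(M)]

# F(0,0) is the sum of all pixel values: the "no variation" (DC) term
img = [[1, 2],
       [3, 4]]
F = dft2(img)
print(abs(F[0][0]))  # 10.0
```

Note how F(0,0) collapses to a plain sum of the pixels, which is exactly why it carries the information about the constant (zero-frequency) part of the image.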

The Concept of Energy Compaction:

In an image, low frequencies correspond to the coarser details, where the intensity varies little, and high frequencies correspond to the finer details, where the intensity varies rapidly. In most images, the high-frequency portion therefore carries much less information than the low-frequency portion.

If the origin refers to the lowest frequency, then most of the information will be stored around the origin. This is called energy compaction.

It can be verified by passing an image through an LPF and an HPF in the transformed domain and then reconstructing the outputs in the spatial domain.

Note

It is to be noted that MATLAB computes the 1D FFT with the frequency ω discretized in the range [0, 2π]. To plot both the positive and negative spectrum in the range [−π, π], we must use the function fftshift.

The same holds for the 2D FFT. When we perform fft2, the low-frequency content goes to the four corners of the output and the high-frequency content goes to the centre. However, for most calculations we take the frequency to be 0 at the origin, increasing outwards along the x and y directions. That is why, before applying any filter or other algorithm, we have to use the function fftshift.

However, shift the spectrum back (using ifftshift) before performing ifft2 to ensure a correct result, because ifft2 expects the zero-frequency component at the upper left corner, just as the origin of the spatial domain is at the upper left corner of the image (we consider the image to lie entirely in the first quadrant, with the Cartesian plane rotated by 90 degrees).


Figure 1 (a): Passing an image through LPF – more information in low frequency region

Figure 1 (b): Passing an image through HPF – less information in high frequency region

From figures 1(a) and 1(b) it can be seen that the low-frequency components of the signal carry more information than the high-frequency components. The LPF output is intelligible, while the HPF output contains only some fine details of the image.

Here I have just shown the example of the 2D DFT, but in fact any transformation exhibits some energy compaction in the transformed domain. We shall discuss other transforms, and their performance in terms of energy compaction and other factors, in a later part of this tutorial. Now we will try to understand what image compression actually is and why it is required.

Overview of Image Compression

Let us consider a 3264×2448 color image (photographs taken by commonly used digital cameras such as the Nikon Coolpix are of such size – the actual size may vary from model to model). Every pixel requires 3 numbers to represent its color (RGB). We know that in a digital image the pixel values lie in the range 0 to 255, so we need 1 byte (8 bits) to store one number. Hence, the total amount of memory required to store the image is:

3264 × 2448 × 3 = 23970816 bytes = 22.86 megabytes (MB)
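The arithmetic above can be checked in a few lines of Python (using the convention 1 MB = 1024 × 1024 bytes):

```python
width, height, channels = 3264, 2448, 3   # pixels x pixels x (R, G, B)
bytes_total = width * height * channels   # 1 byte per color sample
mb = bytes_total / (1024 * 1024)          # convert bytes to megabytes
print(bytes_total, round(mb, 2))  # 23970816 22.86
```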


But practically we can see that the sizes of the pictures taken by such cameras are in the approximate range of 1.5 MB to 4 MB. This is possible only because of image compression.

How is an Image Compressed?

An image (like most of the signals we use) contains more data than is actually required to convey its information. There are basically three types of redundancy:

1. Coding Redundancy: This arises from an inefficient allocation of bits per symbol. This type of redundancy is eliminated using efficient coding schemes, such as Huffman coding and arithmetic coding, which are based on the probabilities of the symbols.

2. Inter-Sample Redundancy: Sometimes the samples of a signal can be predicted from previous samples. In that case it is not efficient to store or transmit all the samples; a code that carries only the data necessary to reconstruct the information is sufficient. An example is DPCM (Differential Pulse Code Modulation), where only the difference between two successive samples is encoded.

3. Perceptual Redundancy: Sometimes an image contains more detail than our eyes can perceive. Such redundancy is called perceptual redundancy, also known as psycho-visual redundancy.
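The DPCM scheme mentioned under inter-sample redundancy can be sketched in a few lines of Python (illustrative only – the signal values here are made up, and a real codec would also quantize the differences):

```python
def dpcm_encode(samples):
    """Store the first sample, then differences between neighbours."""
    return [samples[0]] + [samples[i] - samples[i - 1]
                           for i in range(1, len(samples))]

def dpcm_decode(codes):
    """Rebuild the signal by cumulative summation of the differences."""
    out = [codes[0]]
    for d in codes[1:]:
        out.append(out[-1] + d)
    return out

signal = [50, 52, 53, 53, 51, 48]     # hypothetical sample values
codes = dpcm_encode(signal)
print(codes)  # [50, 2, 1, 0, -2, -3]
```

The differences are much smaller than the raw samples, so they can be represented with fewer bits, which is where the saving comes from.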

Types of Compression

Compression algorithms are characterized by how much information they preserve. There are three types of compression:

1. Loss-less or Information Preserving: No loss of information. (text, legal or medical application) – But lossless compression provides only a modest amount of compression.

2. Lossy Compression: Sacrifice some information for better compression (web images) – In this tutorial we are mainly dealing with this type.

3. Near Lossless: No (or very little) perceptible loss of information. (increasingly accepted for legal, medical application).

Performance Evaluation of Compression

The performance of a compression technique is evaluated by comparing the reconstructed image after compression with the original uncompressed image. The performance is expressed in terms of root mean square (RMS) error and signal-to-noise ratio (SNR). They can be calculated using equations 5(a) and 5(b) respectively.


e_rms = [ (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x,y) − f(x,y) )² ]^{1/2}        (5a)

SNR_ms = [ Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f̂(x,y)² ] / [ Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x,y) − f(x,y) )² ]        (5b)

where f(x,y) is the original image and f̂(x,y) is the reconstructed image.
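Assuming the usual textbook definitions of these two measures, a minimal Python sketch (with the images flattened to lists of pixel values, and toy numbers for illustration) is:

```python
import math

def rms_error(orig, recon):
    """Root-mean-square error between original and reconstructed pixels."""
    n = len(orig)
    return math.sqrt(sum((r - o) ** 2 for o, r in zip(orig, recon)) / n)

def snr_ms(orig, recon):
    """Mean-square signal-to-noise ratio: signal power over error power."""
    err = sum((r - o) ** 2 for o, r in zip(orig, recon))
    sig = sum(r ** 2 for r in recon)
    return sig / err

orig  = [10, 20, 30, 40]   # hypothetical original pixel values
recon = [11, 19, 30, 42]   # hypothetical reconstruction after compression
print(round(rms_error(orig, recon), 3))  # 1.225
```

A smaller RMS error (and a larger SNR) indicates that the reconstruction is closer to the original.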

Steps in Image Compression and Decompression:

Figure 2: steps in image compression

Transformer: It transforms the input data into a format that reduces the interpixel redundancies of the input image. Transform coding techniques use a reversible, linear mathematical transform to map the pixel values onto a set of coefficients, which are then quantized and encoded. The key factor behind the success of transform-based coding schemes is that, for most natural images, many of the resulting coefficients have small magnitudes and can be quantized without causing significant distortion in the decoded image. For compression purposes, the better a transform packs the information into fewer coefficients, the better the transform; for that reason, the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) have become the most widely used transform coding techniques.

Transform coding algorithms usually start by partitioning the original image into subimages (blocks) of small size (usually 8 × 8). For each block the transform coefficients are calculated, effectively converting the original 8 × 8 array of pixel values into an array of coefficients within which the coefficients closer to the top-left corner usually contain most of the information needed to quantize and encode (and eventually perform the reverse process at the decoder's side) the image with little perceptual distortion. The resulting coefficients are then quantized, and the output of the quantizer is used by symbol encoding techniques to produce the output bitstream representing the encoded image. At the decoder's side, the reverse process takes place, with the obvious difference that the dequantization stage can only generate an approximated version of the original coefficient values: whatever was lost in quantization cannot be recovered.

Quantizer: It reduces the accuracy of the transformer's output in accordance with some pre-established fidelity criterion, thereby reducing the psychovisual (perceptual) redundancies of the input image. This operation is not reversible and must be omitted if lossless compression is desired. The quantization stage is at the core of any lossy image encoding algorithm. Quantization, at the encoder side, means partitioning the range of input values into a smaller set of values. There are two main types of quantizers: scalar quantizers and vector quantizers. A scalar quantizer partitions the domain of input values into a smaller number of intervals. If the output intervals are equally spaced, which is the simplest arrangement, the process is called uniform scalar quantization; otherwise, for reasons usually related to minimizing the total distortion, it is called non-uniform scalar quantization. One of the most popular non-uniform quantizers is the Lloyd-Max quantizer. Vector quantization (VQ) techniques extend the basic principles of scalar quantization to multiple dimensions.
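Uniform scalar quantization can be sketched in a few lines of Python. The step size below is a hypothetical value chosen for illustration; real codecs tune it per coefficient:

```python
def quantize(value, step):
    """Map a value to the index of its interval (encoder side)."""
    return round(value / step)

def dequantize(index, step):
    """Map the index back to the interval's representative value."""
    return index * step

step = 10                      # hypothetical uniform step size
x = 47
q = quantize(x, step)          # 47 falls in interval index 5
xr = dequantize(q, step)       # reconstructed as 50 (error of 3)
print(q, xr)  # 5 50
```

The loss of accuracy (here, 47 becomes 50) is exactly the irreversible part of the pipeline: many distinct inputs map to the same index, so the original value cannot be recovered.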

Symbol (entropy) encoder: It creates a fixed- or variable-length code to represent the quantizer's output and maps the output in accordance with the code. In most cases, a variable-length code is used. An entropy encoder compresses the quantized values further to provide more efficient compression. The most important types of entropy encoders used in lossy image compression techniques are the arithmetic encoder, the Huffman encoder and the run-length encoder.

Decompression is the reverse of all these steps, recovering the image in the spatial domain.


Discrete Cosine Transform (2D)

The 2D DCT is given by:

T(u,v) = α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x,y) cos[ (2x+1)uπ / 2M ] cos[ (2y+1)vπ / 2N ]        (6)

Where,

α(u) = √(1/M)   if u = 0

α(u) = √(2/M)   if u = 1, 2, 3, …, M − 1

and α(v) is defined similarly, with N in place of M.

2D Inverse DCT (IDCT) is given by:

f(x,y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} α(u) α(v) T(u,v) cos[ (2x+1)uπ / 2M ] cos[ (2y+1)vπ / 2N ]        (7)
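As a cross-check of equations (6) and (7), here is a naive pure-Python sketch of the 2D DCT and IDCT (MATLAB's dct2 and idct2 use fast algorithms; this direct double loop, with a made-up 2×2 block, is for illustration only):

```python
import math

def alpha(k, n):
    """Normalisation factor used in equations (6) and (7)."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2(f):
    """2D DCT of equation (6); f is a list of M rows of N pixels."""
    M, N = len(f), len(f[0])
    return [[alpha(u, M) * alpha(v, N) *
             sum(f[x][y]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * M))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for x in range(M) for y in range(N))
             for v in range(N)] for u in range(M)]

def idct2(T):
    """2D inverse DCT of equation (7)."""
    M, N = len(T), len(T[0])
    return [[sum(alpha(u, M) * alpha(v, N) * T[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * M))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(M) for v in range(N))
             for y in range(N)] for x in range(M)]

img = [[52, 55],
       [61, 59]]                 # hypothetical pixel values
rec = idct2(dct2(img))           # the round trip should recover the image
print([[round(p) for p in row] for row in rec])  # [[52, 55], [61, 59]]
```

The exact round trip confirms that, by itself, the DCT discards nothing: the loss only enters when the coefficients are quantized.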

Compression using Quantization

The DCT itself is almost lossless; the actual compression is achieved in the quantization step. As we have already discussed in the section on energy compaction, in the transformed domain the high-frequency components contain very little information. It is to be noted that the energy compaction of the DCT is much greater than that of the DFT. Thus, in the quantization stage, the image in the DCT domain is divided element-wise by a quantization matrix.
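The "divide by a quantization matrix" step can be sketched in Python. The 2×2 coefficient block and quantization matrix below are hypothetical, chosen for brevity (JPEG uses standard 8×8 tables):

```python
# Hypothetical DCT coefficients and quantization matrix (2x2 for brevity)
T = [[112.0, -30.0],
     [ 18.0,   4.0]]
Q = [[16, 11],
     [12, 10]]

# Encoder: divide element-wise by Q and round -> many small/zero entries
Tq = [[round(T[i][j] / Q[i][j]) for j in range(2)] for i in range(2)]
print(Tq)  # [[7, -3], [2, 0]]

# Decoder: multiply back by Q -> only an approximation of T survives
Tr = [[Tq[i][j] * Q[i][j] for j in range(2)] for i in range(2)]
print(Tr)  # [[112, -33], [24, 0]]
```

Large entries in Q (used for the high-frequency positions) drive those coefficients to zero, which is what the later entropy coding exploits.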

The following MATLAB program will give you a brief idea (here the framing into 8×8 blocks is skipped):

Example 1

clear all; close all; clc;
x = imread('lena_gray_256.tif');  % Read the original 256x256 image
X = dct2(x);                      % 2D DCT
Y = X(1:150,1:150);               % Keep only the 150x150 low-frequency DCT coefficients
Y1 = round(Y);                    % Quantization - practically some encoding is also required
y1 = idct2(Y1,[256,256]);         % IDCT, padding the size back to 256x256
y = uint8(y1);                    % Round off to integer values
figure
subplot 121
imshow(x), title('Original Image')
subplot 122
imshow(y), title('Compressed Image')
% Error between the original and the reconstruction (cast to double to avoid uint8 saturation)
mse = 1/(256*256)*sqrt(sum(sum((double(x)-double(y)).^2)))


Result:

Figure 3: Output of Example 1: Comparison of the original and the compressed image

MSE obtained = 0.0156

It can be seen that the compressed image is similar to the original image and the MSE is small, yet we reduced a 256×256 image to just 150×150 coefficients in the transformed (DCT) domain.

The encoding step simply represents the 150×150 quantized transform coefficients with some efficient scheme, usually a variable-length code such as Huffman coding.

The encoding part is important because, in the transformed domain, there are more than 10^4 levels in the image and we need approximately 16 bits to represent each coefficient. Thus, although the array is smaller, the number of bits per value is double that of a standard 8-bit image. If we simply reduce the number of quantization levels, the quality suffers.

Here I have used built-in functions to perform the DCT and IDCT. Students are encouraged to write their own programs to perform these using equations 6 and 7 respectively.



JPEG Compression

Figure 4: Block diagram of a JPEG encoder.

JPEG Compression is the name given to an algorithm developed by the Joint Photographic Experts Group, whose purpose is to minimize the file size of photographic image files. JPEG images are used extensively on the internet. Nowadays most commercially available digital cameras, including mobile phone cameras, give JPEG images. The extension of JPEG image files is ".jpeg" or, in most cases, ".jpg".

1. Convert the RGB image to YIQ or YUV format and subsample the color channels.

2. Divide the whole image into 8×8 blocks.

3. Perform the DCT of each 8×8 block.

Note: The result of a 64-element DCT transform is 1 DC coefficient and 63 AC coefficients. The DC coefficient represents the average color of the 8×8 region. The 63 AC coefficients represent color change across the block. Low-numbered coefficients represent low-frequency color change, or gradual color change across the region. High-numbered coefficients represent high-frequency color change, or color which changes rapidly from one pixel to another within the block. These 64 results are written in a zig-zag order, with the DC coefficient followed by the AC coefficients in order of increasing frequency.
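The zig-zag scan order can be generated with a short Python sketch. It is shown here for a 3×3 block to keep the output readable; JPEG applies the same rule to 8×8 blocks:

```python
def zigzag_order(n):
    """Index pairs of an n x n block in JPEG zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):          # s = i + j labels one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # odd diagonals run top-right -> bottom-left, even ones the reverse
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(zigzag_order(3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
```

Walking the anti-diagonals like this visits the low-frequency coefficients first and the high-frequency ones last, which groups the (mostly zero) high-frequency values into long runs.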

4. Quantize the DCT coefficients using a pre-defined quantization table.

5. Encode the DC part of the signal using DPCM (Differential Pulse Code Modulation): only the difference between the DC coefficients of two successive 8×8 blocks is quantized and encoded.

6. Encode the AC parts using RLE (Run-Length Encoding): each 8×8 block is traced in a zig-zag manner to make RLE more efficient.

7. Finally, the entropy encoder uses Huffman coding to compress the resulting symbols further.

Part 2 will cover image compression using the Discrete Wavelet Transform and JPEG 2000.

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.

For example:

2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 5 1 1 1

The above stream will be encoded as:

4 2 7 3 4 4 6 5 3 1
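The encoding above can be reproduced with a small Python run-length encoder (each run is emitted as a count followed by the repeated value, matching the example):

```python
def rle_encode(stream):
    """Collapse each run of equal values into count-value pairs."""
    out = []
    run_val, run_len = stream[0], 1
    for v in stream[1:]:
        if v == run_val:
            run_len += 1              # extend the current run
        else:
            out += [run_len, run_val] # emit the finished run
            run_val, run_len = v, 1
    out += [run_len, run_val]         # emit the final run
    return out

stream = [2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4,
          5, 5, 5, 5, 5, 5, 1, 1, 1]
print(rle_encode(stream))  # [4, 2, 7, 3, 4, 4, 6, 5, 3, 1]
```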

As the image is traced in a zig-zag manner, more neighboring pixels come together, and many of them are likely to have the same values after quantization, which makes the use of RLE more effective.