

EE 398A: IMAGE & VIDEO COMPRESSION PROJECT, MARCH 2011

Direction-Adaptive Karhunen-Loève Transform for Image Compression

Vinay Raj Hampapur, Wendy Ni

Abstract—Amongst transform coding techniques for image compression, the Karhunen-Loève Transform (KLT) uniquely achieves maximum coding gain by decorrelating the input image signals. However, the KLT's performance relies heavily on the accuracy of the estimated input signal distribution. On the other hand, directional transforms such as the Directional Discrete Cosine Transform (DDCT) exploit the similarity in image blocks with directional content, though the transform does not adapt to varying image content. In our studies, we designed and implemented an adaptive KLT dubbed the Direction-Adaptive Karhunen-Loève Transform (DA-KLT), a novel transform coding technique for lossy image compression. The DA-KLT combined the optimality of the KLT with direction-adaptive training to provide a more accurate estimate of the image signal statistics. Our method achieved a consistent improvement over the KLT, but fell short of the DDCT in performance. Nevertheless, the DA-KLT opened a new direction in image compression research by combining two approaches to further exploit statistical dependencies in images.

Index Terms—DA-KLT, DDCT, KLT, block classification, image compression

I. INTRODUCTION

TRANSFORM coding is a class of lossy image compression techniques that exploits the statistical dependencies between image pixels by converting them to coefficients with little or no dependencies [1]. Image transforms are generally carried out as block-wise decompositions of image pixels using a basis that is often pre-defined and independent of image content. Transforms with fixed bases include the Discrete Cosine Transform (DCT) and various Discrete Wavelet Transforms (DWTs). In comparison, the Karhunen-Loève Transform (KLT) improves compression by customising its basis vectors to the image content. For a random vector X representing the set of pixel values in image blocks, the matrix of basis vectors Φ is defined to be the matrix of eigenvectors of the covariance matrix C_X [1]:

C_X = E[(X − E[X])(X − E[X])^T] = Φ C_Y Φ^T    (1)

Because C_Y, the covariance matrix of the output transform coefficients, is a diagonal matrix, the KLT has completely decorrelated its input X. As this is a necessary condition for full exploitation of input signal dependencies, the KLT tends to perform very well in general [1]. Furthermore, eigendecomposition using the singular value decomposition (SVD) ensures that

V. Hampapur is with the Department of Electrical Engineering, Stanford University, Stanford, CA, 94305 USA, e-mail: [email protected].

W. Ni is with the Department of Electrical Engineering, Stanford University, Stanford, CA, 94305 USA, e-mail: [email protected].

Report submitted March 13, 2011.

the matrix Φ is orthonormal. For a Gaussian input, the output is then Gaussian and independent. Hence, for an image with Gaussian statistics, the KLT fully exploits statistical dependencies between pixels and is the optimal transform [1]. However, in order to calculate the covariance matrix C_X, it is essential that the image statistics are known a priori. This limits the use of the KLT, as such information is not usually available.
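To make (1) concrete, the following sketch (in NumPy; the authors' implementation was in MATLAB, and the function name here is ours) estimates C_X from a set of vectorized blocks and obtains an orthonormal basis Φ. For a symmetric covariance matrix, numpy.linalg.eigh yields the same eigenvectors as the SVD route mentioned above:

import numpy as np

def klt_basis(vectors):
    # vectors: array of shape (num_blocks, n), each row a vectorized N x N block (n = N*N).
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    cov = centered.T @ centered / max(len(vectors) - 1, 1)   # sample estimate of C_X
    eigvals, Phi = np.linalg.eigh(cov)                       # eigendecomposition of the symmetric C_X
    order = np.argsort(eigvals)[::-1]                        # order basis vectors by decreasing energy
    return Phi[:, order], eigvals[order]

Transforming a block with y = Phi.T @ x then yields coefficients whose sample covariance is (approximately) diagonal, as required by (1).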

Transforms may be improved in another manner by taking into account the directionality of the block content. For example, the Directional Discrete Cosine Transform (DDCT) operates in multiple modes, where each block can be vectorized differently for use with pre-defined basis vectors customised for various directions (Fig. 1 and 2) [2]. Then, one set of basis vectors would align most closely with the block content, improving energy compaction and hence compression ratio [2].

In this study, we combined the KLT and direction adaptation to design and implement a novel transform coding technique, the Direction-Adaptive KLT (DA-KLT). Instead of pre-defining

Fig. 1. Six of the eight directional modes, defined in a similar way as in H.264 and used in DDCT, for block size 8×8. Modes 1 and 2 (not shown) are horizontal and vertical respectively, corresponding to conventional DCT. [2]

Fig. 2. Performing DDCT in mode 5. (a) Variable-length 1D DCT along the vertical-right direction in the first step. (b) Variable-length 1D DCT in the second step. (c) Modified zigzag scanning for encoding of coefficients. [2]


basis vectors, the DA-KLT used training images to optimise the basis vectors. Training images were first partitioned into equal-sized square blocks, which were classified into classes depending on the presence and direction of dominant edges within the blocks. The classification algorithm we implemented utilised Canny's edge detection [3] and pixel variance. Block statistics for each class were then used to calculate multiple sets of basis vectors to be used for image compression. To compress a test image, the same partition and classification procedures were carried out. Then, the KLT was performed on each block using the appropriate basis vectors. The coefficients were finally quantised and encoded using a Huffman coder.

The DA-KLT was implemented in MATLAB and tested on 512 × 512 8-bit grayscale images. The results were benchmarked against the conventional KLT and the DDCT. We also investigated the effect of varying the quantisation step and block size, and of implementing the additional features of principal component truncation and DC separation. The outcome we obtained shows that the DA-KLT is a feasible transform coding technique. Hence our work signalled a new direction of exploration in the improvement of transform coding techniques.

In Section II of this report, we first describe the functional and programmatic components of the DA-KLT from the training stage to the compression stage. Then, in Section III, we present and discuss the trends, implications and performance assessment of some results. Section IV concludes this report by reviewing the benefits and significance of the DA-KLT.

II. DA-KLT

A. Training

To overcome the lack of a priori knowledge of the input image statistics, we processed a set of training images to approximate the image statistics and derive basis vectors for the transform. In order to exploit directionality, we first partitioned the images into blocks of size N, where N ∈ {4, 8, 16, 32}. These blocks were then classified into one of 10 classes using an algorithm based on Canny's edge detection [3]. In our algorithm, we first calculated the gradients δ_x and δ_y of the block in the horizontal and vertical directions respectively. From these gradients, the angle and magnitude for each pixel were calculated. The angle θ = tan⁻¹(δ_y/δ_x) was quantized into 8 different directions: 0°, ±22.5°, ±45°, ±67.5°, and 90°. In order to determine the dominant direction, we summed the magnitudes of the gradient vectors in each direction. If the maximum sum exceeded a threshold X_th, we defined the block as directional and assigned a class ID in the range 1, 2, ..., 8 corresponding to one of the directions. The choice of X_th is critical to the success of the DA-KLT. If the threshold is set too high, many blocks will be misclassified as belonging to the remaining two classes, thereby degrading the performance of the DA-KLT. Through visual assessment, we set the threshold for a block size of N with variance σ² as:

X_th = [10 · (2N√2 − 1) + σ²] / N    (2)

Fig. 3. Functional modules and data flow in the training stage of the DA-KLT. The block classifier, KLT and quantizer modules are also used in the compression stage of the DA-KLT. The VLC table was to be used to estimate the entropy of the transform coefficients, while the Huffman table would allow implementation of a Huffman encoder to demonstrate an achievable rate.

The first term in the numerator accounts for the minimum number of pixels that should contribute to the directionality of the block, while the block variance provides some adjustment. Experimentation showed that this threshold always identified strongly directional blocks, but missed some weakly directional ones. A class ID of 9 corresponded to a "flat" block whose variance was less than a threshold of 15, which was again determined using visual criteria. All remaining blocks were classified as class 10, which corresponded to "textured" blocks. This block classifier is demonstrated in Fig. 4. Following block classification for the entire training image set, we calculated the statistics for each class. For block size N, all blocks with the same class ID are vectorized column-wise and collated to calculate a covariance matrix of size N² × N². Thus, 10 covariance matrices are produced. The matrix of basis vectors for each class is then calculated as in (1). For a good block classification scheme, the basis vectors should demonstrate strong directionality (Fig. 5). For a block classification scheme with good energy compaction, truncation of the least dominant principal components in the training data could be carried out to reduce the number of coefficients. Here, the dominance of a principal component was measured by the relative size of the corresponding eigenvalue, which translated to the energy carried by the component.
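As an illustration only, the sketch below (NumPy; the function name is ours) follows the classification rule described above, using raw image gradients in place of the full Canny edge-detection stage, with the threshold X_th from (2) and the variance threshold of 15:

import numpy as np

def classify_block(block, flat_var_threshold=15.0):
    # Returns a class ID: 1-8 for the eight directions, 9 for "flat", 10 for "textured".
    # (The mapping from direction bin to the paper's class numbering is arbitrary here.)
    N = block.shape[0]
    var = block.var()
    if var < flat_var_threshold:
        return 9                                             # "flat" block
    dy, dx = np.gradient(block.astype(float))                # vertical and horizontal gradients
    theta = np.degrees(np.arctan2(dy, dx)) % 180.0           # edge orientation folded into [0, 180)
    bins = np.arange(8) * 22.5                               # 0, 22.5, ..., 157.5 degrees
    diff = np.abs(theta[..., None] - bins)
    diff = np.minimum(diff, 180.0 - diff)                    # circular angular distance to each bin
    nearest = np.argmin(diff, axis=-1)
    magnitude = np.hypot(dx, dy)
    sums = np.bincount(nearest.ravel(), weights=magnitude.ravel(), minlength=8)
    x_th = (10.0 * (2 * N * np.sqrt(2) - 1) + var) / N       # threshold X_th from (2)
    return int(np.argmax(sums)) + 1 if sums.max() > x_th else 10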


Fig. 4. Example output of the block classifier on part of the test image "Lena". The arrows indicate the dominant direction of each block as determined by the classifier. "T" labels blocks of class 10 (the "textured" class). Unlabelled blocks belong to class 9 (the "flat" class). Some blocks on Lena's hat consisted of two nearly equally dominant directions, producing varying classification in that region.

Fig. 5. Visualisation of trained basis vectors from classes with directionality (a) 90° and (b) 45°, for 8×8 blocks. The 64 basis vectors are ordered column-wise, with each small block representing the 64 elements of one basis vector.

With the basis vectors arranged in order of decreasing dominance, the number of truncated principal components was chosen for each class m so that the first L(m) (preserved) components carried energy at or just exceeding a given threshold. If the training image set were representative of the test images to be compressed, the preserved components would also be the most dominant components in the test images, hence preserving energy at or above the threshold.
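Continuing the sketch from above (the names and the default energy threshold are ours, not taken from the report), per-class basis training with optional principal component truncation could look like:

import numpy as np

def truncate_basis(Phi, eigvals, energy_fraction=0.95):
    # Keep the first L components whose eigenvalues carry at least the requested
    # fraction of the total energy; the remaining components are discarded.
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    L = int(np.searchsorted(cumulative, energy_fraction)) + 1
    L = min(L, Phi.shape[1])
    return Phi[:, :L], L

def train_class_bases(blocks_by_class, energy_fraction=None):
    # blocks_by_class: dict mapping class ID -> array (num_blocks, N*N) of vectorized blocks.
    bases = {}
    for class_id, vectors in blocks_by_class.items():
        Phi, eigvals = klt_basis(vectors)                    # from the earlier sketch
        if energy_fraction is not None:
            Phi, _ = truncate_basis(Phi, eigvals, energy_fraction)
        bases[class_id] = Phi
    return bases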

Following transformation of the images, the quantised transform coefficients and class IDs must be transmitted to reconstruct the images. Hence we needed to calculate the minimum average variable-length code (VLC) codeword length as an approximation of the entropy, the theoretical lower bound on the average number of bits required to transmit each pixel [1]. VLC tables were trained for each quantised coefficient of each class, and for the class IDs. We used a uniform mid-tread quantizer with quantisation step ∆. Using the VLC table for the i-th coefficient from class m with quantisation step ∆, and the probability mass function (PMF) p_{m,i,∆}(q) of the corresponding quantisation indices, the minimum codeword length for a quantisation index q is calculated as:

r(q) = −log₂(p_{m,i,∆}(q))    (3)

Similarly, the minimum codeword length for a block classification ID m, with PMF p_class(m) estimated over the set of all block classification IDs in the training image set, is calculated as:

r(m) = −log₂(p_class(m))    (4)

To account for differences in distributions across coefficients and classes, we implemented a separate VLC for each coefficient from each class by estimating the PMFs individually for each quantisation step. This required that we perform block-wise DA-KLT on the training set. As all blocks had already been classified, the transform coefficients were calculated by applying the KLT, using the corresponding basis vectors in Φ, to x, the vector of pixel values from the block (with vectorization done in a manner consistent with that used during the training of the basis vectors), as follows:

y = T · x = Φ^T x    (5)

The resulting vector of coefficients y has length N² when there is no principal component truncation, and L(m) when there is. The coefficients were then quantised. PMFs were estimated by calculating the frequency of occurrence of each value of the quantisation index. Furthermore, we also implemented a Huffman coder to demonstrate an achievable bit rate. We used MATLAB's implementation of the Huffman encoder to simulate our binary lossless entropy encoding system and hence generate the Huffman table (i.e. the codebook). We also made a minor adjustment and encoded coefficients with a single value in their PMF with a 0, i.e., deterministic coefficients are encoded with the codeword 0.
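A minimal sketch of the mid-tread quantizer and the PMF-based VLC training described above (counting-based PMF estimation; function names are ours):

import numpy as np
from collections import Counter

def quantize(y, delta):
    # Uniform mid-tread quantizer: coefficient values -> integer quantisation indices.
    return np.round(np.asarray(y) / delta).astype(int)

def train_vlc_table(indices):
    # indices: flat iterable of quantisation indices for one coefficient position.
    # Estimate the PMF and derive the minimum codeword length r(q) = -log2 p(q), as in (3).
    counts = Counter(int(q) for q in indices)
    total = sum(counts.values())
    pmf = {q: c / total for q, c in counts.items()}
    lengths = {q: -np.log2(p) for q, p in pmf.items()}
    return pmf, lengths

In the DA-KLT, one such table would be trained for every coefficient position of every class and every quantisation step ∆, plus one table for the class IDs as in (4).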

Because the performance of the DA-KLT depends strongly on the trained PMFs, the choice of training images is vital. This requires not only that all block classes be sufficiently represented in the training set, but also that the range of training coefficient values be representative of the expected coefficient values for test images. At the same time, to optimise the system for general photographic images, we limited the training images to 50 photos of mostly natural scenes and objects, man-made objects in natural environments, human faces, and aerial photos. While this led to a non-uniform distribution of blocks amongst the classes (with the "flat" and "textured" classes containing more samples than the directional classes), we felt this is a reasonable representation of most photographs, which have more surface pixels than edge pixels.

B. Compression

Image compression in the DA-KLT consisted of the same steps of block classification, KLT and quantisation of transform coefficients (Fig. 6). They were implemented using the same algorithms and code as during training. In order to gauge the performance of the DA-KLT and gain insight into the effect of changing various parameters, we observed the relationship between the peak signal-to-noise ratio (PSNR) and the minimum average bit rate (i.e. average codeword length). The PSNR is a measure of distortion, which arises due to the quantisation of the transform coefficients in the process of compression. The distortion d is measured as the mean squared error (MSE) of


Fig. 6. Functional modules and data flow in the compression stage of the DA-KLT. The test images are not included in the training image set. The block classifier, KLT and quantizer modules are reused from the training stage. Outputs from this stage allow the measurement of performance via the PSNR-rate curve, the Huffman rate and the reconstructed image quality.

the reconstructed image X̂ and the original image X:

d = E[(X̂ − X)²] = (1/N²) · Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (x̂_{i,j} − x_{i,j})²    (6)

where x̂_{i,j} and x_{i,j} are random variables representing the pixel values of the reconstructed and original images respectively. The KLT is an orthonormal transform, so Parseval's theorem could be used to calculate the distortion from the transform coefficients; however, the implementation of principal component truncation prevented this method from being used, due to the discarding of image energy. Hence the image was reconstructed using:

x̂ = T^T · y = Φ · y    (7)

which is clearly the inverse transformation of (5). Then the distortion was calculated using (6). For 8-bit grayscale images, the PSNR is calculated as:

PSNR = 10 · log₁₀(255² / d)    (8)
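The distortion and PSNR measurements of (6) and (8) reduce to a few lines (sketch, NumPy; the function name is ours):

import numpy as np

def psnr_8bit(original, reconstructed):
    # Mean squared error over the image, then PSNR for 8-bit data as in (8).
    d = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / d)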

The minimum average codeword lengths for the quantisation indices of individual transform coefficients were calculated using (3) and averaged across all coefficients to produce an overall minimum average codeword length. To complete the encoding half of the DA-KLT image compression system, we also examined the Huffman rates produced by the Huffman encoder. Our Huffman encoder used the Huffman codebook generated in the training stage to encode the quantized transform coefficients. However, since the codebook was built from approximate, trained PMFs, it may be incomplete, i.e. some DA-KLT coefficients from the test images may not be present in the codebook because they had zero probability in the trained PMFs. As this occurred relatively rarely (about 20 times per test image on average), we found the closest values represented in the codebook and used their codewords instead. These gaps in the PMFs should disappear as the number and diversity of the training images increase.
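The fallback for indices missing from the trained codebook can be expressed as follows (sketch; here codebook is assumed to map a quantisation index to its Huffman codeword, and the names are ours):

def lookup_codeword(q, codebook):
    # Use the trained codeword if q was seen during training; otherwise fall back
    # to the codeword of the nearest index that does appear in the codebook.
    if q in codebook:
        return codebook[q]
    nearest = min(codebook, key=lambda seen: abs(seen - q))
    return codebook[nearest]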

C. Measure of Performance

The performance of the DA-KLT was predominantly measured by observing the PSNR-rate curve. The minimum average codeword length calculated was intended to be an estimate of the entropy of the image. However, as it used trained PMFs, it was not the actual entropy, which is used in the following theoretical calculations. For an image whose pixels can be modelled by a Gaussian distribution with variance σ², the SNR is expected to approach the theoretical bound [1]:

SNR(R) = 10 · log₁₀(σ² / d) = 20 · R · log₁₀(2) ≅ 6R    (9)
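The 6 dB/bit figure follows from the high-rate distortion of a Gaussian source, d ≅ σ² · 2^(−2R), a step left implicit in (9); substituting this into 10 · log₁₀(σ²/d) gives 10 · log₁₀(2^(2R)) = 20 · R · log₁₀(2) ≅ 6.02R dB.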

From (8) and (9), the PSNR is found to have the following bound:

PSNR ≅ 6R + 10 · log₁₀(255² / σ²)    (10)

Hence, we expect the PSNR-rate curve to approach an asymptote with a gradient of 6 dB per bit, provided that the test image has Gaussian statistics or can be modelled as a Gaussian mixture [1], and that the PMFs are estimated accurately.

When inspecting the PSNR-rate curves, our region of interest is 30-40 dB of PSNR. A realistic image compression scheme would generally not operate in the region with lower PSNR, due to the high distortion, nor in the region with high rate, which translates to a low compression ratio.

The effectiveness of an orthonormal transform is measured by the amount of energy compaction it is able to achieve [1]. The concentration of energy in a few transform coefficients also leads to smaller variances for at least some of the non-energy-carrying coefficients. The coding gain G_T for an orthonormal transform is calculated as:

G_T = [ (1/N²) · Σ_{n=0}^{N²−1} σ²_{Y_n} ] / [ ∏_{n=0}^{N²−1} σ²_{Y_n} ]^(1/N²)    (11)

where the numerator is the arithmetic mean of the variances of the transform coefficients in a block of size N, and the denominator is the geometric mean of the same quantities. It is to be noted that even though different blocks have different basis functions applied to them based on their class ID, we essentially treat the blocks, and consequently the coefficients, as equivalent after the transformation. Ideally, we would weight the coefficients from each block in the image based on the occurrence(s) of similar blocks in the image. We did not pursue this because our transform method depends on the block classification in both the training and testing stages.

A small variance in at least one of the transform coefficients significantly reduces the geometric mean, resulting in a higher coding gain. Hence, a high coding gain is an indication of good energy compaction and consequently of effective coding.
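Computed from the per-coefficient variances of the transformed blocks, (11) becomes (sketch, NumPy; estimating the variances across blocks is our choice, not stated in the report):

import numpy as np

def coding_gain(coefficients):
    # coefficients: array of shape (num_blocks, n), one row of transform coefficients
    # per block; column k collects the samples of coefficient Y_k.
    # Assumes all coefficient variances are strictly positive.
    variances = coefficients.var(axis=0)
    arithmetic_mean = variances.mean()
    geometric_mean = np.exp(np.mean(np.log(variances)))   # stable form of the n-th root of the product
    return arithmetic_mean / geometric_mean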

While numerical measures such as the PSNR-rate curve andcoding gain provide an idea of the performance of compression


schemes, they do not always reflect the visual quality of thereconstructed images.

III. RESULTS & DISCUSSION

A. Overview of Performance

To examine the PSNR-rate behaviour of the DA-KLT, we began with a block size of 8 and quantisation steps ∆ ∈ {2^1, 2^1.5, ..., 2^9}. As the quantisation step decreased and the rate increased, the PSNR curves of the test images "Mandrill", "Lena", "Peppers" and "Goldhill" approached slopes of 6 dB/bit (Fig. 7), as predicted by theory (10). This suggests that the images could be modelled well as either Gaussian or a Gaussian mixture.
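Tracing a PSNR-rate curve then amounts to sweeping ∆ over this range. A sketch of the harness, where compress_and_reconstruct stands in for the DA-KLT pipeline (a hypothetical caller-supplied function, not one defined in the report) and psnr_8bit is from the earlier sketch:

import numpy as np

def psnr_rate_curve(image, bases, compress_and_reconstruct):
    # compress_and_reconstruct(image, bases, delta) -> (rate, reconstructed_image)
    deltas = 2.0 ** np.arange(1.0, 9.5, 0.5)      # quantisation steps 2^1, 2^1.5, ..., 2^9
    curve = []
    for delta in deltas:
        rate, reconstructed = compress_and_reconstruct(image, bases, delta)
        curve.append((rate, psnr_8bit(image, reconstructed)))
    return curve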

The relative displacement of the PSNR-rate curves was largely due to the variance σ², as shown in (10). The image "Mandrill", which contains significant variation in pixel values within each block, had a greater variance and hence a lower vertical position in comparison to "Lena" and "Peppers", both of which have large areas of relatively little pixel variation (Fig. 8).

Visual inspection of the reconstructed images showed few discernible artefacts for quantisation steps ∆ < 32. For higher ∆, blocking artefacts became increasingly noticeable. However, an exception was the highly textured furry regions of "Mandrill". Despite their relatively high distortion, those regions remained visually satisfactory at quantisation steps of 64 and 128 (Fig. 9). This clearly illustrates the visual redundancy present in some images.

We then investigated the performance of the DA-KLT by varying the block size and sweeping the PSNR-rate curve using the same set of quantisation steps. Changing the block size produced progressive improvement in the performance (Fig. 10). For example, for "Mandrill", increasing the block size from 4 to 8 produced a gain of about 1 dB. A further increase from 8 to 16 produced an additional gain of about 0.1 dB. This improvement was partly due to the decrease in

Fig. 7. Performance of the DA-KLT on four test images. The statistics of these four images can be modelled well using Gaussian distributions or Gaussian mixtures (e.g. Laplacian distributions), as indicated by the 6 dB/bit slope of the PSNR-rate curves.

Fig. 8. Original test images (from left to right): "Mandrill", "Lena", "Peppers" and "Goldhill".

Fig. 9. Reconstructed images "Mandrill" and "Lena" with a quantisation step of 128 and block size 8. Visually, the blocking artefacts in the fur of the mandrill are much harder to discern than those in relatively uniform regions.

overhead bits required to transmit class IDs. More importantly, large blocks were able to capture more joint information and allowed it to be exploited by the compression technique. The PSNR-rate curves for "Mandrill" remained roughly parallel, demonstrating that the class statistics (Gaussian or Gaussian mixture in this case) for various block sizes remained roughly constant. For "Lena", the statistics were not as invariant to block size, so the PSNR-rate curve at larger block sizes had a shallower slope (Fig. 10). Visually, larger blocks produced better reconstruction within the block boundaries (Fig. 11).

B. Effectiveness of Direction Adaptation

To isolate the improvement offered by exploiting the directionality present in the image blocks and adapting the basis vectors to each class, we compared the PSNR-rate curve of the DA-KLT to that of a simple trained KLT that uses only one set of basis vectors across all blocks. We found that the DA-KLT offers some improvement over the KLT. The amount of improvement varied between test images, corresponding to the number of strongly directional blocks that can take advantage of direction adaptation. For example, direction adaptation improved the high-rate performance for "Mandrill" by about 1 dB, for "Peppers" by 0.85 dB and for "Lena" by 1.15 dB (Fig. 12). Coding gain values for the images also confirmed that the DA-KLT was a more effective orthonormal transform coding scheme than the KLT (Table I).

At very low rates, the KLT outperformed the DA-KLT in terms of the PSNR-rate relationship. For such high distortion, no information was effectively exploited by either transform method, while the DA-KLT had the additional overhead of transmitting class IDs. Of course, since realistic compression routines would not be operating in this region, this point was mostly of academic interest.


Fig. 10. Performance of the DA-KLT on "Mandrill" and "Lena" for block sizes of 4, 8 and 16. Larger block sizes improved performance, though with decreasing benefit.

Fig. 11. Reconstructed images "Mandrill" and "Lena" with a quantisation step of 128 and block size 16. Reconstructed image quality is improved using the larger block size.

Fig. 12. PSNR-rate curve of the DA-KLT compared with a trained, non-direction-adaptive conventional KLT. Over the region of interest (30-40 dB), the DA-KLT consistently outperforms the KLT by exploiting the directionality of image blocks to estimate the pixel statistics for each block more accurately.

TABLE I
Coding gain for test images "Mandrill", "Lena" and "Peppers" for 8 × 8 blocks. The DA-KLT consistently achieves a higher coding gain than the KLT, showing that it is a more effective transform coding method.

Coding gain | Mandrill | Lena  | Peppers
DA-KLT      | 4.79     | 54.35 | 52.35
KLT         | 4.46     | 47.63 | 41.82

C. Comparison to DDCT

In the region of interest (30-40 dB PSNR), the DA-KLT consistently underperformed in comparison to the DDCT implemented by Chuo-ling Chang of Stanford University (Fig. 13). The main reason may be that the estimated minimum average codeword length was only an estimate of the entropy rate. Hence the theoretical bound prescribed by the PSNR-entropy curve should be located further to the left of both the DDCT and DA-KLT curves; poor estimation of the PMFs then led to a "poorer" apparent performance by the DA-KLT. Given that the DDCT algorithm proposed by Zeng and Fu [2] utilised a brute-force block classification technique that chooses the encoding mode (i.e. direction) with the lowest Lagrangian cost, it is also likely that our gradient- and variance-based classification algorithm was sub-optimal. The overhead in transmitting the class information of the blocks also contributed to the discrepancy. Hence the gap between the DA-KLT and the DDCT may be reduced through the use of more training images with more representative statistics, and the implementation of a better set of measures for block classification.

One advantage of the DA-KLT over the DDCT is that the gradient- and variance-based block classification method was much faster than the brute-force classification, making the former much more suitable for applications where the encoder has limited computing resources, e.g. mobile devices.

Fig. 13. PSNR-rate curves of the DA-KLT and the DDCT. The DDCT consistently performs better than the DA-KLT.


D. Huffman Encoder Performance

The performance of the Huffman encoder was as expected from theory. At high rates (small quantisation steps), the Huffman rate approached the estimated entropy (Fig. 14). It is important to note, however, that both the VLC tables and the Huffman tables were calculated using the same set of estimated PMFs. Depending on the quality of the PMF estimation, both the Huffman rate and the estimated entropy may still have ample room for improvement. At low rates (large quantisation steps), because our implementation used at least one bit to represent each symbol (transform coefficient or class ID), the Huffman rate was lower-bounded by 1 bit per symbol (Fig. 14).

E. Principal Component Truncation

Because truncation of principal components irreversibly discarded some energy-carrying coefficients from the test images, it placed an upper limit on the PSNR that could be achieved (Fig. 15). The actual values of the limit for different test images depended on the actual image statistics, as the truncation threshold applies exactly only to the training image set.

F. DC Separation and ∆DC Correction

The technique of DC separation and ∆DC correction was implemented in the forward and inverse transforms of the DDCT [2] to avoid the "mean weighting defect" associated with using variable-length DCTs [4], as shown in the Introduction. While it was proposed that this addition may be able to improve the performance of the DA-KLT, we hypothesised that it would in fact reduce the performance. Because the KLT is not a separable transform, encoding the mean value of the block is equivalent to assigning the constant vector as a basis vector. While the remaining data, now with N² − 1 degrees of freedom, was encoded using an optimal basis, the overall set of N² basis vectors was sub-optimal. Hence the DA-KLT performed significantly worse with DC correction (Fig. 16).

Fig. 14. Huffman rate and estimated entropy of the DA-KLT with respect to changing quantisation steps for test images "Lena" and "Peppers".

Fig. 15. PSNR-rate curves of the DA-KLT with principal component truncation at various thresholds. The curves are traced using the same range of quantisation steps. At high rates, the plateauing of the PSNR is caused by the irreversible discarding of image energy during the truncation process. As finer quantisation steps are used, the increase in rate is lower when more principal components have been discarded; this is due to the smaller number of transform coefficients produced.

Fig. 16. PSNR-rate curves of the DA-KLT with and without DC separation and ∆DC correction. Due to the optimality already achieved by the KLT, DC correction degrades the performance of the DA-KLT.

IV. CONCLUSION

In this report, we present the DA-KLT as a novel transform coding technique for image compression. The DA-KLT was designed to exploit the directionality in images as well as to optimise the transformation basis for the image content. PSNR-rate curves and coding gains showed that it consistently outperformed the KLT. However, the DA-KLT underperformed compared to the DDCT, possibly due to insufficient training and a sub-optimal classification algorithm. We further explored the effects of principal component truncation and DC separation, and found both modifications to be disadvantageous to the performance of the DA-KLT. Our work has combined two approaches that have hitherto remained separate. Hopefully this will become a feasible direction for future improvements in image compression


techniques.

Suggested future work may involve improving the block classification algorithm and increasing the size and diversity of the training set, so that coefficient distributions may be better estimated. We propose a Lagrangian cost-based framework for choosing the best trained basis from the trained set of bases. Performance of the DA-KLT may also be improved using adaptive block sizes.

APPENDIX A
WORK DISTRIBUTION

TABLE II
Task distribution between team members for results that made it into the report.

Task | Vinay | Wendy
Planning and code architecture | 50% | 50%
Block classification | 90% | 10%
KLT basis calculation (including class statistics calculation and principal component truncation) | 10% | 90%
KLT implementation | 0% | 100%
Quantisation | 100% | 0%
VLC training | 0% | 100%
Huffman coder training | 100% | 0%
Image reconstruction and distortion calculation | 0% | 100%
VLC rate calculation | 0% | 100%
Coding gain calculation | 100% | 0%
Huffman encoder | 100% | 0%
System integration and debugging | 20% | 80%
Presentation slides | 30% | 70%
Report | 45% | 55%

TABLE III
Task distribution between team members for investigative work and results that could not make it into the report owing to time constraints.

Task | Vinay | Wendy
Adaptive block size | 40% | 60%
Brute force classification | 90% | 10%
Synthetic directional block generation | 100% | 0%
DDCT implementation (rendered unnecessary by Chuo-ling's code) | 0% | 100%

ACKNOWLEDGMENT

The authors would like to thank Prof. Bernd Girod and Mina Makar for their invaluable guidance throughout this project.

REFERENCES

[1] D. S. Taubman, M. W. Marcellin, and M. Rabbani, JPEG2000: Image Compression Fundamentals, Standards and Practice. Norwell, MA: Kluwer, 2002, pp. 16, 151-154 and 185.

[2] B. Zeng and J. Fu, "Directional Discrete Cosine Transforms - a new framework for image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 305-313, Mar. 2008.

[3] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 6, pp. 679-698, Nov. 1986.

[4] P. Kauff and K. Schuur, "A shape-adaptive DCT with block-based DC separation and ∆DC correction," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 237-242, Jun. 1998.

[5] The training and test images come from the following sources:
• University of Wisconsin-Madison, (date unknown). Public-domain test images for homeworks and projects [Online]. Available: http://homepages.cae.wisc.edu/~ece533/images/
• University of Southern California, (date unknown). The USC-SIPI image database [Online]. Available: http://sipi.usc.edu/database/
• F. Petitcolas, (Jun 2009). Photo database [Online]. Available: http://www.petitcolas.net/fabien/watermarking/image database/index.html
  – Fontaine des Terreaux. Copyright photo courtesy of Eric Laboure
  – Pills. Copyright photo courtesy of Karel de Gendre
  – Brandy rose. Copyright photo courtesy of Toni Lankerd, 18347 Woodland Ridge Dr. Apt #7, Spring Lake, MI 49456, U.S.A. ([email protected])
  – Fourviere Cathedral, north wall. F. A. P. Petitcolas
  – Paper machine. Copyright photo courtesy of Karel de Gendre
  – Opera House of Lyon
  – New-York. Copyright photo courtesy of Patrick Loo, University of Cambridge ([email protected])
  – Arctic Hare. Copyright photos courtesy of Robert E. Barber, Barber Nature Photography ([email protected])