
Compression of Image Ensembles using Tensor Decomposition

Abo Talib Mahfoodh, IEEE Student, and Hayder Radha, IEEE Fellow

Department of Electrical and Computer Engineering

Michigan State University East Lansing, MI, USA

{mahfoodh,radha}@msu.edu

Abstract- In this paper we address the problem of compressing a collection of still images of the same type, such as an image database of faces or any other visual objects of similar shapes. We refer to such a collection of images as an image ensemble. The compression of image ensembles presents a unique set of requirements, such as random access to any image within the collection without the need to reconstruct other images in the same ensemble. Such a requirement is readily met by any still-image compression standard, simply by encoding each image in isolation of other images within the same ensemble. However, traditional approaches of still-image compression do not exploit the strong correlation that might exist among images within a given ensemble. In this paper we argue that a tensor-decomposition framework can achieve both: (a) random access to any image within an ensemble and (b) the exploitation of the correlation among all images within the same ensemble. To that end, we propose a progressive tensor-factorization approach that decomposes an image ensemble into a set of block-wise rank-one tensors. We derive and encode the rank-one tensors of an image ensemble using an optimization method for rank allocation among the original block tensors. Our simulation results show the viability of the proposed tensor framework when applied to image ensembles of faces.

I. INTRODUCTION

Image databases represent a core component of many well-established and emerging applications and services, including e-commerce and security. For example, image databases of faces, fingerprints, and eye retinas are used extensively for biometric and other security-related applications. Such databases store a vast number of images of the same type, and yet traditional compression standards are used to compress and store these images without exploiting the correlation that potentially exists among the images within the same database. For example, the ISO/IEC 19794 standard on biometric data interchange formats defines JPEG and JPEG2000 as admissible lossy compression methods. A key driver for encoding each image in isolation of other images within the same database is the ability to access and decode any image without the need to access/decode other images. Such a requirement eliminates popular video coding standards as viable candidates for coding still-image databases.

In this paper, we propose to employ a tensor-decomposition framework that can achieve both: (a) random access to any image within a collection of images coded jointly and (b) coding efficiency by exploiting any potential correlation that may exist among the images within the same database. To bring focus to the problem addressed here, we define an image ensemble as a set of images of the same type (e.g., images of human faces). Thus, our goal is to develop a compression approach for image ensembles while achieving full random access, and we argue that a tensor-based strategy is a viable solution. The proposed tensor-based framework can access any image within an ensemble at different levels of quality (and corresponding scalable bitrates) without the need to reconstruct or access any other image from the same ensemble. This is crucial, not only for storage efficiency, but also to reduce bandwidth across networks for scalable search and retrieval engines.

In particular, we treat an image ensemble as a 3D tensor. We employ a Progressive Canonical/Parallel (PCP) tensor decomposition, which is based on the popular CP tensor factorization [1], for decomposing the image-ensemble tensor. Similar to CP, PCP factors an N-dimensional tensor into R rank-one tensors, each of which is represented by N (one-dimensional) vectors. We apply PCP in a block-wise manner, and each tensor-block is decomposed into a different number R of rank-one tensors. This strategy leads to the need for deriving the optimal distribution of rank-one tensors among the different image-ensemble tensor blocks. Thus, we develop a greedy algorithm for identifying the optimal number of rank-one tensors that should be used for decomposing the image-ensemble blocks. (More detail regarding our tensor-based compression of image ensembles is provided further below.) We applied the proposed method to the Yale Face Database B [2] and compared the results with traditional compression standards. The experimental results show the viability of the proposed tensor-based framework for image-ensemble compression.

II. COMPRESSION SYSTEM ARCHITECTURE

A high-level architecture for the proposed image-ensemble coding system is shown in Figure 1. A collection of similar images is organized into a 3D image-ensemble tensor, which is divided into 3D tensor-blocks. A tensor factorization approach is applied to each tensor block $j$. This process generates $R_j$ rank-one tensors for the corresponding original tensor-block $j$. The resulting rank-one tensors are represented by 1D vectors that we refer to as eigenfibers. These eigenfibers, which contain a significantly smaller number of elements than the number of voxels in the original 3D image-ensemble, represent a compact representation of the entire 3D tensor data. It is crucial to note that any image (2D slice) within the 3D tensor-ensemble can be reconstructed entirely and independently of other images, directly from the eigenfibers (as explained further below). This compact representation provides the random-access capability mentioned earlier. To achieve high coding efficiency, the system includes two more components. First, an optimal rank-allocation process is applied to assign the appropriate number of rank-one tensors to the corresponding tensor blocks. This process is analogous to rate allocation (or rate control) in traditional image/video coding systems. Second, one can further exploit the correlation that exists among the eigenfibers by applying some form of compression and coding. As explained later, we simply align the eigenfibers into a 2D matrix/image and apply a 2D image compression method. Consequently, when access to a 2D image from the coded image-ensemble file is desired, only a 2D image decompression operation is needed, followed by very low-complexity eigenfiber multiplications. This is the same order of complexity required when reconstructing any 2D compressed image stored within a database using a traditional approach (e.g., JPEG or JPEG2000).
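To ground the first stage of Figure 1, the following is a minimal Python/NumPy sketch (our own illustration, not the authors' code) of stacking an ensemble into a 3D tensor and cutting it into equal-size tensor blocks; the block size matches the one used in Section III.

```python
import numpy as np

def make_blocks(images, v1=16, v2=21, v3=64):
    """Stack 2D images into a 3D ensemble tensor and split it into
    equal-size v1 x v2 x v3 tensor blocks (first stage of Figure 1)."""
    X = np.stack(images, axis=2)               # ensemble tensor, V1 x V2 x V3
    V1, V2, V3 = X.shape
    assert V1 % v1 == 0 and V2 % v2 == 0 and V3 % v3 == 0
    return [X[a:a + v1, b:b + v2, c:c + v3]
            for a in range(0, V1, v1)
            for b in range(0, V2, v2)
            for c in range(0, V3, v3)]

# e.g., 64 face images of size 192x168 -> 96 blocks of size 16x21x64
blocks = make_blocks([np.zeros((192, 168))] * 64)
assert len(blocks) == 96
```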

The remainder of this section provides details regarding the key components of the proposed system.

A. Tensor Decomposition

A tensor is a multi-dimensional array of data. For example, a vector is a one-dimensional tensor and a matrix is a two-dimensional tensor. An image ensemble is a 3D tensor (i.e., $\mathcal{X} \in \mathbb{R}^{V_1 \times V_2 \times V_3}$). Here, the tensor is partitioned into a set of equal-size (i.e., $v_1 \times v_2 \times v_3$) 3D tensor blocks. We employed PCP to factorize each block into a set of rank-one tensors, each of which can be written as an outer product of three vectors as in

$$\hat{\mathcal{X}}_j = \sum_{r=1}^{R_j} \lambda_{j,r} \left( b_{j,r}^{(1)} \circ b_{j,r}^{(2)} \circ b_{j,r}^{(3)} \right), \qquad (1)$$

where $\circ$ is the outer product, $j$ is the block index, $R_j$ is the number of rank-one tensors, and $\lambda_{j,r}$ is a normalization value.

The Alternating Least Squares (ALS) [3] algorithm is a method to compute $b_{j,r}^{(d)}$, where $d = 1, 2, 3$. It minimizes the reconstruction error as in

$$\min_{b^{(d)}} \left\| X_{(d)} - \sum_{k=1}^{R} X'_{(d),k} \right\|_F, \qquad (2)$$

where $z_r^{(d)} = b_{j,r}^{(d_1)} \otimes b_{j,r}^{(d_2)}$ is the Kronecker product, $d \in \{1,2,3\}$, $d_1 \in \{1,2,3\} \setminus \{d\}$, and $d_2 \in \{1,2,3\} \setminus \{d, d_1\}$. $X_{(d)}$ is the matrix that results from unfolding the block $j$ with respect to the $d$th dimension. For example, $X_{(1)} \in \mathbb{R}^{v_1 \times (v_2 v_3)}$ is the matrix that results from unfolding the block $j$ with respect to the first dimension (i.e., $v_1$) [1]. $X'_{(d),k}$ is the $k$th rank-one tensor unfolded over dimension $d$, with $k \in \{1, 2, \ldots, R\}$, $X'_{(d),0} = X$, and $X'_{(d),k} = \lambda_k b_k^{(d)} (z_k^{(d)})^T$. For a given rank parameter $R_j$, the ALS approach solves for $b_r^{(1)}$ by fixing $b_r^{(2)}$ and $b_r^{(3)}$, and similarly for $b_r^{(2)}$ and $b_r^{(3)}$.
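To make the ALS step concrete, here is a minimal Python/NumPy sketch (our own, not the authors' implementation) of a PCP-style progressive fit: each new rank-one term is fit to a block's current residual by alternating updates of the three mode vectors, which is the matricized least-squares problem in (2) written with `einsum`. The function names, iteration count, and tolerance are illustrative assumptions.

```python
import numpy as np

def fit_rank_one(residual, n_iters=50, tol=1e-6):
    """Fit one term lambda * (b1 o b2 o b3) to a 3D block by ALS."""
    v1, v2, v3 = residual.shape
    b1, b2, b3 = np.ones(v1), np.ones(v2), np.ones(v3)
    lam = 0.0
    for _ in range(n_iters):
        # Update each mode vector with the other two held fixed; each
        # einsum is the unfolded residual times the Kronecker vector z(d).
        b1 = np.einsum('ijk,j,k->i', residual, b2, b3)
        b1 /= np.linalg.norm(b1)
        b2 = np.einsum('ijk,i,k->j', residual, b1, b3)
        b2 /= np.linalg.norm(b2)
        b3 = np.einsum('ijk,i,j->k', residual, b1, b2)
        lam_new = np.linalg.norm(b3)           # lambda absorbs the scale
        b3 /= lam_new
        if abs(lam_new - lam) < tol * max(lam_new, 1.0):
            lam = lam_new
            break
        lam = lam_new
    return lam, b1, b2, b3

def pcp_block(block, R):
    """Greedily extract R rank-one terms from one tensor block."""
    residual = block.astype(float).copy()
    terms = []
    for _ in range(R):
        lam, b1, b2, b3 = fit_rank_one(residual)
        residual -= lam * np.einsum('i,j,k->ijk', b1, b2, b3)
        terms.append((lam, b1, b2, b3))
    return terms, residual
```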

A key issue with CP is how to choose $R$ [1]. Since PCP is progressive in nature, we employed a rank-distortion optimization algorithm to overcome this issue. For a given tensor with $N$ 3D blocks, the goal is to find the global optimum $R$, where $R$ is a vector of dimension $N$.


[Figure 1: block diagram showing a single image entering the block-wise tensor decomposition, the resulting eigenfibers, and random-access reconstruction.]

Figure 1. The architecture of the proposed compression method.

Each entry of $R = (R_1, R_2, \ldots, R_N)$ corresponds to the number of rank-one tensors which are used to reconstruct a 3D block of the tensor. We formulate the rank-distortion optimization problem to find the global optimum $R$ as in

$$\min_R \;\text{ s.t. }\; \frac{1}{N}\sum_{j=1}^{N}\left\| \mathcal{X}_j - \sum_{i=1}^{R_j}\lambda_{j,i}\left(b_{j,i}^{(1)} \circ b_{j,i}^{(2)} \circ b_{j,i}^{(3)}\right)\right\|_F \le E_{\max},$$

$$\sum_{j=1}^{N} R_j \;\le\; \frac{1}{\gamma}\sum_{j=1}^{N}\frac{v_1 v_2 v_3}{v_1+v_2+v_3} \;=\; R_{\max}, \qquad (3)$$

where $E_{\max}$ is the overall acceptable average error. The second inequality in (3) captures an upper bound for the total number of eigenfibers that can be used, and $\gamma$ is a regularization parameter with $\gamma > 1$. Assuming that we use the same precision for the original tensor entries and for the elements of the PCP decomposition (e.g., eight bits/element), using the eigenfibers instead of the original block results in a compaction ratio equal to $v_1 v_2 v_3 / (R_j (v_1 + v_2 + v_3))$. The second inequality in (3) comes from lower-bounding the compaction ratio by $\gamma$. Note that $R_{\max}$ simplifies to $(v_1 v_2 v_3 N)/(\gamma (v_1 + v_2 + v_3))$. For example, with the $16 \times 21 \times 64$ blocks used in Section III, a block holds $21504$ voxels while one rank-one term needs only $16 + 21 + 64 = 101$ elements, so the compaction ratio is roughly $213 / R_j$.
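The budget $R_{\max}$ is a one-line computation; a small helper of our own, evaluated with the block size and block count used in Section III (the value of $\gamma$ here is only illustrative):

```python
def rank_budget(v1, v2, v3, n_blocks, gamma):
    """Upper bound R_max on the total number of rank-one terms,
    from the second inequality in (3)."""
    return (v1 * v2 * v3 * n_blocks) / (gamma * (v1 + v2 + v3))

print(rank_budget(16, 21, 64, n_blocks=96, gamma=20.0))  # ~1022 terms total
```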

A solution to this optimization problem can be found by searching for the optimum $R_j$ that satisfies the constraints. We propose a greedy algorithm to solve (3). The algorithm starts with $R_j = 1$ for all blocks, in line with the fact that each 3D tensor block should be represented by at least one rank-one tensor. We define $D_j$ as the error decrement of block $j$ when $R_j$ is incremented by one (i.e., $D_j = E_{j,R_j} - E_{j,R_j+1}$); initially $D_j = E_{j,1} - E_{j,2}$ for $j = 1, \ldots, N$. Iteratively, we find the block $j$ with the maximum $D_j$ and increase its corresponding $R_j$ by one. This greedy choice provides the largest possible error reduction. After increasing $R_j$, the inequalities are checked to make sure they are satisfied. The details are outlined in the Algorithm block, with a code sketch after it.

Algorithm: Rank-Distortion Optimization

Data: a set of 3D blocks of a tensor (i.e., $\mathcal{X}_j$, $j = 1, \ldots, N$) and $\gamma$.
Result: a set of eigenfibers to represent the input tensor.
Initialization:
  $R_j = 1$ for $j = 1, \ldots, N$.
  Find $\lambda_{j,r}$, $b_{j,r}^{(1)}$, $b_{j,r}^{(2)}$, and $b_{j,r}^{(3)}$ for $r = 1, 2$ and $j = 1, \ldots, N$ from (2).
  $E_{j,r} = \left\| \mathcal{X}_j - \sum_{i=1}^{r} \lambda_{j,i} \left(b_{j,i}^{(1)} \circ b_{j,i}^{(2)} \circ b_{j,i}^{(3)}\right) \right\|_F$ for $r = 1, 2$ and $j = 1, \ldots, N$.
  $D_j = E_{j,1} - E_{j,2}$ for $j = 1, \ldots, N$.
While the first inequality in (3) is not satisfied and the second inequality in (3) is satisfied: do
  $j = \arg\max_j D_j$
  $r = R_j + 1$
  Find $\lambda_{j,r}$, $b_{j,r}^{(1)}$, $b_{j,r}^{(2)}$, and $b_{j,r}^{(3)}$ from (2) and store them in matrices $B^{(1)}$, $B^{(2)}$, and $B^{(3)}$.
  Find $\lambda_{j,r+1}$, $b_{j,r+1}^{(1)}$, $b_{j,r+1}^{(2)}$, and $b_{j,r+1}^{(3)}$ from (2).
  $E_{j,r} = \left\| \mathcal{X}_j - \sum_{i=1}^{r} \lambda_{j,i} \left(b_{j,i}^{(1)} \circ b_{j,i}^{(2)} \circ b_{j,i}^{(3)}\right) \right\|_F$
  $E_{j,r+1} = \left\| \mathcal{X}_j - \sum_{i=1}^{r+1} \lambda_{j,i} \left(b_{j,i}^{(1)} \circ b_{j,i}^{(2)} \circ b_{j,i}^{(3)}\right) \right\|_F$
  $D_j = E_{j,r} - E_{j,r+1}$
  $R_j = r$
end
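Below is a minimal Python/NumPy sketch of the greedy loop (our own illustrative code, not the authors'). It assumes a `pcp_block`-style routine like the one sketched in Section II.A, uses `eps_max` for $E_{\max}$, and keeps one look-ahead term per block to maintain $D_j$. For brevity it refits a block when its rank grows; a real PCP implementation would simply extend the existing decomposition by one term, since the fit is progressive.

```python
import numpy as np

def block_error(block, terms, r):
    """Frobenius error of reconstructing `block` from the first r terms."""
    approx = sum(lam * np.einsum('i,j,k->ijk', b1, b2, b3)
                 for lam, b1, b2, b3 in terms[:r])
    return np.linalg.norm(block - approx)

def allocate_ranks(blocks, gamma, eps_max):
    """Greedy rank-distortion optimization sketch of the Algorithm block."""
    N = len(blocks)
    v1, v2, v3 = blocks[0].shape
    r_max = (v1 * v2 * v3 * N) / (gamma * (v1 + v2 + v3))
    R = np.ones(N, dtype=int)
    terms = [pcp_block(b, 2)[0] for b in blocks]   # current + look-ahead term
    E = np.array([[block_error(b, t, r) for r in (1, 2)]
                  for b, t in zip(blocks, terms)])
    D = E[:, 0] - E[:, 1]                          # error decrement per block
    cur_err = E[:, 0].copy()
    while cur_err.mean() > eps_max and R.sum() < r_max:
        j = int(np.argmax(D))                      # largest error reduction
        R[j] += 1
        terms[j] = pcp_block(blocks[j], R[j] + 1)[0]
        E[j, 0] = block_error(blocks[j], terms[j], R[j])
        E[j, 1] = block_error(blocks[j], terms[j], R[j] + 1)
        cur_err[j] = E[j, 0]
        D[j] = E[j, 0] - E[j, 1]
    return R, terms
```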

B. Eigenfibers Arrangement

We arrange the decomposed eigenfibers $b_{j,r}^{(d)}$ into a matrix $B^{(d)}$, where $B^{(d)} \in \mathbb{R}^{v_d \times \sum_{j=1}^{N} R_j}$, $N$ is the number of blocks, and $d = 1, 2$. The vectors $\lambda_{j,r} b_{j,r}^{(3)}$ are arranged in the matrix $B^{(3)}$, which eliminates the need for coding the $\lambda_{j,r}$ parameters separately. Analogous to DC-AC separation, the eigenfibers corresponding to the first rank-one tensor of each block are arranged at the left of the matrices; this helps the coding due to the high correlation among these vectors. Figure 2 shows the eigenfibers arrangement obtained by decomposing an image ensemble of the Yale Face Database B [2]. The total number of blocks shown is 96 and the block size is $16 \times 21 \times 64$.

C. Header Data and Reconstruction

The image-ensemble size, the block size, and the $R$ vector are stored in the header. This information is required for decoding an image. Image $i$ of the ensemble can be decoded as in

$$\text{IMAGE}_{i,j} = \sum_{r=1}^{R_j} \left( B_{j,r}^{(1)} \circ B_{j,r}^{(2)} \right) B_{i,j,r}^{(3)}, \qquad (4)$$

where $j$ is the block's index, $B_{j,r}^{(d)}$ for $d = 1, 2$ is a vector, and $B_{i,j,r}^{(3)}$ is a single value from row $i$ of the matrix $B^{(3)}$. The corresponding column index is calculated based on the values of $j$ and $r$ as in

$$\text{Column Index} = \begin{cases} j & \text{if } r = 1 \\ N + r - 1 + \sum_{k=1}^{j-1} R_k & \text{if } r > 1. \end{cases} \qquad (5)$$
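As an illustration of the random-access decode in (4) and (5), here is a short sketch of our own (function names and the 0-based index handling are assumptions layered on the reconstructed column formula):

```python
import numpy as np

def column_index(j, r, R):
    """0-based column of eigenfiber (block j, rank r) per (5); j, r 1-based."""
    if r == 1:
        return j - 1                     # first-rank fibers sit at the left
    return len(R) + (r - 1) + sum(R[:j - 1]) - 1

def reconstruct_block(i, j, B1, B2, B3, R):
    """Block j of image i per (4): sum_r (b1 o b2) * B3[i, col]."""
    block = np.zeros((B1.shape[0], B2.shape[0]))
    for r in range(1, R[j - 1] + 1):
        c = column_index(j, r, R)
        block += np.outer(B1[:, c], B2[:, c]) * B3[i, c]
    return block
```

A full image is then assembled by tiling the reconstructed blocks at their spatial offsets; no other image in the ensemble is touched.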


[Figure 2: the eigenfiber matrices with the per-block rank values $R_j$ plotted for blocks 1 through 96.]

Figure 2. Eigenfibers arrangement and the rank values R for 96 blocks of the Yale Face Database B image ensemble.

D. Coding

The header information is entropy coded in a lossless manner. The decomposed vectors can be coded by any 2D image compression method; here, we used lossy JPEG2000 to code them. From experience, we found that the lossy mode with a compression ratio of 2 yields good compression without major loss in the final image quality.
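As a sketch of this coding step (the paper does not name an implementation; Pillow with OpenJPEG support is our own choice here), an eigenfiber matrix can be quantized to 8 bits and written as lossy JPEG2000 with a 2:1 rate:

```python
import numpy as np
from PIL import Image

def code_eigenfibers(B, path):
    """Quantize an eigenfiber matrix to 8 bits and write it as lossy
    JPEG2000 at compression ratio 2, mirroring Section II.D."""
    lo, hi = float(B.min()), float(B.max())
    q = np.round(255 * (B - lo) / (hi - lo)).astype(np.uint8)
    Image.fromarray(q).save(path, format="JPEG2000",
                            quality_mode="rates", quality_layers=[2])
    return lo, hi    # the decoder needs these to undo the quantization

lo, hi = code_eigenfibers(np.random.rand(192, 500), "B1.jp2")  # toy sizes
```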

III. SIMULATION RESULTS

The proposed method was applied to the Yale Face Database B [2]. The database has images of 38 persons, each of whom has 64 images of size 192x168. These images vary in expression and illumination conditions. After stacking the images on top of each other, we have a 3D tensor of size 192x168x2432. The resulting tensor is decomposed using PCP, and the eigenfibers are arranged in 2D matrices. Then the result is compressed by JPEG2000. Within the context of our proposed image-ensemble tensor-based compression, we compare the following tensor decomposition approaches and existing still-image compression standards used in image databases:

1- Block-wise PCP decomposition. The block size is 16x21x64, and the value of $\gamma$ is changed to control the final storage size.

2- Block-wise CP decomposition. The block size is the same as in method (1), and the value of $\gamma$ is changed to compare the results for different compaction ratios. Here, we used the same structure as in Figure 1, except that the decomposition method is replaced with CP and JPEG2000 lossless mode is used, since small changes in the CP-decomposed vectors can lead to large errors in reconstruction.

3- Storing each image separately using the JPEG2000 standard. The MATLAB implementation is used.

Figure 3 shows the reconstruction PSNR averaged over 38 persons versus the required space (in Bytes) for all 64 images of a person, averaged over 38 persons. Over a wide range of bit rates, PCP clearly outperforms the other methods.

Based on the progressive nature of PCP, its time complexity is linear as a function of the number of rank-one tensors. CP factorization (i.e., the encoding side) has quadratic complexity as a function of R. In either case (PCP or CP), the decoding complexity is of the same order as traditional JPEG2000 decoding. Figure 4 shows the time complexity, where the decomposition methods are evaluated on a desktop computer with 12 GB of memory and an Intel Core i7 2600 CPU (8 MB cache, 3.4 GHz).

Figure 5 shows the PCP R values for the blocks of an image ensemble encoded at two different bitrates. Figure 6 shows one of the reconstructed images using the above methods and standard JPEG, along with the MATLAB implementation of Motion JPEG2000. At higher bit rates, all the methods have similar PSNR values, and the visual quality is also similar. Figure 7 shows the reconstruction results at higher bit rates.

In conclusion, these simulations confirm the conjecture that one can achieve highly efficient progressive coding of image ensembles while maintaining low-complexity random access to any desired image when employing tensor-based decomposition. The proposed system, based on PCP factorization, optimal rank allocation, and eigenfiber coding, clearly shows its viability as an image-ensemble compression framework.

[Figure 3: rate-distortion curves for PCP, CP, and JPEG2000; x-axis: average Bytes per image (200-600); y-axis: average PSNR (dB).]

Figure 3. Average PSNR plot of 38 persons' face images from Yale Face Database B versus the average storage size required per image.

[Figure 4: two panels of timing curves versus average Bytes per image.]

Figure 4. Average a) encoding and b) decoding time of 38 persons' face images from Yale Face Database B versus the average storage size required per image.

[Figure 5: two heat maps of per-block rank values.]

Figure 5. PCP R values for the blocks of an image ensemble encoded at an average bit rate of a) 288 and b) 979 Bytes per image.


[Figure 6: a grid of reconstructed face images at the lower rate.]

Figure 6. From left to right, top to bottom: original image; PCP (288 Bytes, 30.1 dB); CP (286 Bytes, 28.98 dB); JPEG2000 (291 Bytes, 25.68 dB); Motion JPEG2000 (292 Bytes, 24.63 dB); JPEG (790 Bytes, 25.19 dB).

[Figure 7: a grid of reconstructed face images at the higher rate.]

Figure 7. From left to right, top to bottom: original image; PCP (979 Bytes, PSNR: 34.5 dB); CP (986 Bytes, PSNR: 32.6 dB); JPEG2000 (975 Bytes, PSNR: 35.27 dB); Motion JPEG2000 (990 Bytes, PSNR: 35.1 dB); JPEG (999 Bytes, PSNR: 29.1 dB).

REFERENCES

[1] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.

[2] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.

[3] J. Carroll and J. Chang, "Analysis of individual differences in multidimensional scaling via an N-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.

[4] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67-86, 2011.

[5] A. Mahfoodh and H. Radha, "Tensor video coding," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.