IEEE 2013 Picture Coding Symposium (PCS), San Jose, CA, USA, December 8-11, 2013
Compression of Image Ensembles using Tensor Decomposition
Abo Talib Mahfoodh, Student Member, IEEE, and Hayder Radha, Fellow, IEEE
Department of Electrical and Computer Engineering
Michigan State University East Lansing, MI, USA
{mahfoodh,radha}@msu.edu
Abstract- In this paper we address the problem of compressing a collection of still images of the same type, such as an image database of faces or of any other visual objects of similar shape. We refer to such a collection of images as an image ensemble. The compression of image ensembles presents a unique set of requirements, such as random access to any image within the collection without the need to reconstruct other images in the same ensemble. This requirement is readily met by any still-image compression standard, simply by encoding each image in isolation from the other images within the same ensemble. However, traditional approaches to still-image compression do not exploit the strong correlation that might exist among images within a given ensemble. In this paper we argue that a tensor-decomposition framework can achieve both: (a) random access to any image within an ensemble and (b) exploitation of the correlation among all images within the same ensemble. To that end, we propose a progressive tensor-factorization approach that decomposes an image ensemble into a set of block-wise rank-one tensors. We derive and encode the rank-one tensors of an image ensemble using an optimization method for rank allocation among the original block tensors. Our simulation results show the viability of the proposed tensor framework when applied to image ensembles of faces.
I. INTRODUCTION
Image databases represent a core component of many well-established and emerging applications and services, including e-commerce and security. For example, image databases of faces, fingerprints, and eye retinas are used extensively for biometric and other security-related applications. Such databases store a vast number of images of the same type, and yet traditional compression standards are used to compress and store these images without exploiting the correlation that potentially exists among the images within the same database. For example, the ISO/IEC 19794 standard on biometric data interchange formats defines JPEG and JPEG2000 as admissible lossy compression methods. A key driver for encoding each image in isolation from other images within the same database is the ability to access and decode any image without the need to access or decode other images. This requirement eliminates popular video coding standards as viable candidates for coding still-image databases. In this paper, we propose to employ a tensor-decomposition framework that can achieve both: (a) random access to any image within a collection of images coded jointly and (b) coding efficiency by exploiting any potential correlation that may exist among the images within the same database. To
978-1-4799-0294-1/13/$31.00 ©2013 IEEE
bring focus to the problem addressed here, we define an image ensemble as a set of images of the same type (e.g., images of human faces). Thus, our goal is to develop a compression approach for image ensembles that achieves full random access, and we argue that a tensor-based strategy is a viable solution. The proposed tensor-based framework can access any image within an ensemble at different levels of quality (and corresponding scalable bitrates) without the need to reconstruct or access any other image from the same ensemble. This is crucial not only for storage efficiency, but also for reducing bandwidth across networks for scalable search and retrieval engines. In particular, we treat an image ensemble as a 3D tensor. We employ a Progressive Canonical/Parallel (PCP) tensor decomposition, which is based on the popular CP tensor factorization [1], for decomposing the image-ensemble tensor. Similar to CP, PCP factors an N-dimensional tensor into R rank-one tensors, each of which is represented by N (one-dimensional) vectors. We apply PCP in a block-wise manner, and each tensor block is decomposed into a different number R of rank-one tensors. This strategy leads to the need to derive the optimal distribution of rank-one tensors among the different image-ensemble tensor blocks. Thus, we develop a greedy algorithm for identifying the optimal number of rank-one tensors that should be used for decomposing the image-ensemble blocks. (More detail regarding our tensor-based compression of image ensembles is provided further below.) We applied the proposed method to the Yale Face Database B [2] and compared the results with traditional compression standards. The experimental results show the viability of the proposed tensor-based framework for image-ensemble compression.
II. COMPRESSION SYSTEM ARCHITECTURE
A high-level architecture of the proposed image-ensemble coding system is shown in Figure 1. A collection of similar images is organized into a 3D image-ensemble tensor, which is divided into 3D tensor blocks. A tensor-factorization approach is applied to each tensor block j. This process generates Rj rank-one tensors for the corresponding original tensor block j. The resulting rank-one tensors are represented by 1D vectors that we refer to as eigenfibers. These eigenfibers, which contain a significantly smaller number of elements than the number of voxels in the original 3D image ensemble, represent a compact representation of the entire 3D tensor data. It is crucial to note that any image (2D slice) within the 3D tensor ensemble can be reconstructed entirely and independently of
other images, directly from the eigenfibers (as explained further below). This compact representation provides the random-access capability mentioned earlier. To achieve high coding efficiency, the system includes two more components. First, an optimal rank-allocation process is applied to assign the appropriate number of rank-one tensors to the corresponding tensor blocks. This process is analogous to rate allocation (or rate control) in traditional image/video coding systems. Second, one can further exploit the correlation that exists among the eigenfibers by applying some form of compression and coding. As explained later, we simply align the eigenfibers into a 2D matrix/image and apply a 2D image compression method. Consequently, when access to a 2D image from the coded image-ensemble file is desired, only a 2D image decompression operation is needed, followed by very low-complexity eigenfiber multiplications. This is the same order of complexity required when reconstructing any 2D compressed image stored within a database using a traditional approach (e.g., JPEG or JPEG2000).
The remainder of this section provides details regarding the key components of the proposed system.
A. Tensor Decomposition
A tensor is a multi-dimensional array of data. For example, a vector is a one-dimensional tensor and a matrix is a two-dimensional tensor. An image ensemble is a 3D tensor (i.e., $X \in \mathbb{R}^{V_1 \times V_2 \times V_3}$). Here, the tensor is partitioned into a set of equal-size (i.e., $v_1 \times v_2 \times v_3$) 3D tensor blocks. We employ PCP to factorize each block into a set of rank-one tensors, each of which can be written as an outer product of three vectors as in
$$\hat{X}_j = \sum_{r=1}^{R_j} \lambda_{j,r} \left( b^{(1)}_{j,r} \circ b^{(2)}_{j,r} \circ b^{(3)}_{j,r} \right), \quad (1)$$

where $\circ$ denotes the outer product, $j$ is the block index, $R_j$ is the number of rank-one tensors, and $\lambda_{j,r}$ is a normalization value.
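As a concrete illustration of (1), summing $R_j$ rank-one components into a 3D block can be sketched in a few lines of numpy (an illustrative sketch with random toy data, not the authors' implementation):

```python
import numpy as np

def rank_one_sum(lams, b1, b2, b3):
    """Sum of R rank-one tensors: sum_r lams[r] * (b1[:,r] o b2[:,r] o b3[:,r]).

    b1, b2, b3 have shapes (v1, R), (v2, R), (v3, R); lams has shape (R,).
    """
    # einsum forms the triple outer product per component r and sums over r
    return np.einsum('r,ir,jr,kr->ijk', lams, b1, b2, b3)

# Toy block of size 4x5x6 approximated with R = 2 rank-one tensors
rng = np.random.default_rng(0)
R = 2
lams = rng.standard_normal(R)
b1, b2, b3 = (rng.standard_normal((v, R)) for v in (4, 5, 6))
X_hat = rank_one_sum(lams, b1, b2, b3)
print(X_hat.shape)  # (4, 5, 6)
```

Each rank-one component costs only $v_1 + v_2 + v_3$ stored elements plus one scalar, which is the source of the compaction discussed below.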
The Alternating Least Squares (ALS) algorithm [3] is a method to compute $b^{(d)}_{j,r}$ for $d = 1, 2, 3$. It minimizes the reconstruction error as in

$$\min_{\lambda_{j,k},\, b^{(d)}_{j,k}} \left\| X_{j(d)} - \sum_{k=1}^{R_j} \lambda_{j,k}\, b^{(d)}_{j,k} \left( z^{(d)}_{k} \right)^{T} \right\|_F, \quad (2)$$

where $z^{(d)}_{k} = b^{(d_1)}_{j,k} \otimes b^{(d_2)}_{j,k}$ is the Kronecker product, $d \in \{1,2,3\}$, $d_1 \in \{1,2,3\} \setminus \{d\}$, and $d_2 \in \{1,2,3\} \setminus \{d, d_1\}$. $X_{j(d)}$ is the matrix that results from unfolding block $j$ with respect to the $d$-th dimension. For example, $X_{j(1)} \in \mathbb{R}^{v_1 \times (v_2 v_3)}$ is the matrix that results from unfolding block $j$ with respect to the first dimension (i.e., $v_1$) [1]. $X'_{(d),k} = \lambda_k b^{(d)}_k (z^{(d)}_k)^T$ is the $k$-th rank-one tensor unfolded over dimension $d$, with $k \in \{1, 2, \ldots, R\}$ and $X'_{(d),0} = 0$. For a given rank parameter $R_j$, the ALS approach solves for $b^{(1)}_r$ by fixing $b^{(2)}_r$ and $b^{(3)}_r$, and similarly for $b^{(2)}_r$ and $b^{(3)}_r$.
The first issue with CP is how to choose R [1]. Since PCP is progressive in nature, we employ a rank-distortion optimization algorithm to overcome this issue. For a given tensor with N 3D blocks, the goal is to find the global optimum R, where R is a vector of dimension N.
Figure 1. The architecture of the proposed compression method (single-image input, block-wise tensor decomposition, eigenfibers, and random-access reconstruction).
Each entry of $\mathbf{R} = (R_1, R_2, \ldots, R_N)$ corresponds to the number of rank-one tensors used to reconstruct a 3D block of the tensor. We formulate the rank-distortion optimization problem to find the global optimum $\mathbf{R}$ as in
$$\min_{\mathbf{R}} \sum_{j=1}^{N} R_j \quad \text{subject to} \quad \frac{1}{N} \sum_{j=1}^{N} \left\| X_j - \sum_{i=1}^{R_j} \lambda_{j,i} \left( b^{(1)}_{j,i} \circ b^{(2)}_{j,i} \circ b^{(3)}_{j,i} \right) \right\|_F \le E_{\max},$$
$$\sum_{j=1}^{N} R_j \le \frac{1}{\gamma} \sum_{j=1}^{N} \frac{v_1 v_2 v_3}{v_1 + v_2 + v_3} = R_{\max}, \quad (3)$$
where $E_{\max}$ is the overall acceptable average error. The second inequality in (3) captures an upper bound on the total number of eigenfibers that can be used. $\gamma$ is a regularization parameter with $\gamma > 1$. Assuming that we use the same precision for the original tensor entries and for the elements of the PCP decomposition (e.g., eight bits/element), using the eigenfibers instead of the original block results in a compaction ratio equal to $v_1 v_2 v_3 / \big( R_j (v_1 + v_2 + v_3) \big)$. The second inequality in (3) comes from lower-bounding the compaction ratio by $\gamma$. Note that $R_{\max}$ can be simplified as $(v_1 v_2 v_3 N) / \big( \gamma (v_1 + v_2 + v_3) \big)$.
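To make the compaction ratio concrete, here is a small calculation using the 16x21x64 block size adopted later in the paper (the value of gamma here is a hypothetical choice for illustration only):

```python
# Block dimensions used later in the paper, and the block count from Figure 2
v1, v2, v3 = 16, 21, 64
N = 96
gamma = 4          # hypothetical regularization parameter, gamma > 1

voxels_per_block = v1 * v2 * v3      # 21504 entries in a raw block
fiber_elems = v1 + v2 + v3           # 101 elements per rank-one tensor

def compaction_ratio(Rj):
    """Storage ratio of the raw block to its Rj rank-one tensors."""
    return voxels_per_block / (Rj * fiber_elems)

R_max = (voxels_per_block * N) / (gamma * fiber_elems)
print(compaction_ratio(8))   # ~26.6x for Rj = 8
print(R_max)                 # total rank budget across all blocks
```

So even a block represented by eight rank-one tensors is stored roughly 26 times more compactly than its raw voxels, before any entropy coding of the eigenfibers.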
A solution to this optimization problem can be found by searching for the optimum $R_j$ that satisfies the constraints. We propose a greedy algorithm to solve (3). The algorithm starts with $\mathbf{R} = \mathbf{1}$. This initialization reflects the fact that each 3D tensor block must be represented by at least one rank-one tensor. We define $D_j$ as the error decrement of block $j$ when $R_j$ is incremented by one (i.e., $D_j = E_{j,R_j} - E_{j,R_j+1}$). Initially, $D_j = E_{j,1} - E_{j,2}$ for $j = 1, \ldots, N$. Iteratively, we find the block $j$ with the maximum $D_j$ and increase its corresponding $R_j$ by one. This greedy choice provides the largest possible error reduction. After increasing $R_j$, the inequalities are checked to make sure they are satisfied. The details of the algorithm are outlined in the Algorithm block.
Algorithm: Rank-Distortion Optimization
Data: a set of 3D blocks of a tensor (i.e., $X_j$, $j = 1, \ldots, N$), and $\gamma$.
Result: a set of eigenfibers representing the input tensor.
Initialization:
  $\mathbf{R} = \mathbf{1}$.
  Find $\lambda_{j,r}$, $b^{(1)}_{j,r}$, $b^{(2)}_{j,r}$, and $b^{(3)}_{j,r}$ for $r = 1, 2$ and $j = 1, \ldots, N$ from (2).
  $E_{j,r} = \big\| X_j - \sum_{i=1}^{r} \lambda_{j,i} \big( b^{(1)}_{j,i} \circ b^{(2)}_{j,i} \circ b^{(3)}_{j,i} \big) \big\|_F$ for $r = 1, 2$ and $j = 1, \ldots, N$.
  $D_j = E_{j,1} - E_{j,2}$ for $j = 1, \ldots, N$.
While the first inequality in (3) is not satisfied and the second inequality in (3) is satisfied, do:
  $j = \arg\max_j D_j$
  $r = R_j + 1$
  Find $\lambda_{j,r}$, $b^{(1)}_{j,r}$, $b^{(2)}_{j,r}$, and $b^{(3)}_{j,r}$ from (2) and store them in matrices $B^{(1)}$, $B^{(2)}$, and $B^{(3)}$.
  Find $\lambda_{j,r+1}$, $b^{(1)}_{j,r+1}$, $b^{(2)}_{j,r+1}$, and $b^{(3)}_{j,r+1}$ from (2).
  $E_{j,r} = \big\| X_j - \sum_{i=1}^{r} \lambda_{j,i} \big( b^{(1)}_{j,i} \circ b^{(2)}_{j,i} \circ b^{(3)}_{j,i} \big) \big\|_F$
  $E_{j,r+1} = \big\| X_j - \sum_{i=1}^{r+1} \lambda_{j,i} \big( b^{(1)}_{j,i} \circ b^{(2)}_{j,i} \circ b^{(3)}_{j,i} \big) \big\|_F$
  $D_j = E_{j,r} - E_{j,r+1}$
  $R_j = r$
end
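The greedy loop can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the per-rank errors `E[j, r]` are assumed precomputed here as a stand-in for the ALS fits of (2), which the real system would run on demand:

```python
import numpy as np

def greedy_rank_allocation(E, E_max, R_max):
    """Greedy rank-distortion allocation over the blocks of a tensor.

    E[j, r] holds the reconstruction error of block j when represented with
    r + 1 rank-one tensors (0-based columns standing in for ranks 1, 2, ...).
    """
    N, max_rank = E.shape
    R = np.ones(N, dtype=int)          # every block starts with one rank-one tensor
    D = E[:, 0] - E[:, 1]              # error decrement if R_j is raised to 2
    # Loop while the average-error constraint is violated and the budget holds
    while E[np.arange(N), R - 1].mean() > E_max and R.sum() < R_max:
        j = int(np.argmax(D))          # block offering the largest error reduction
        if not np.isfinite(D[j]):
            break                      # no block can be refined further
        R[j] += 1
        # Decrement for the next potential increase (-inf once the table runs out)
        D[j] = (E[j, R[j] - 1] - E[j, R[j]]) if R[j] < max_rank else -np.inf
    return R

# Toy example: 3 blocks whose errors shrink at different rates per added rank
E = np.array([[8.0, 4.0, 2.0, 1.0],
              [6.0, 3.0, 1.5, 0.75],
              [4.0, 2.0, 1.0, 0.5]])
print(greedy_rank_allocation(E, E_max=1.5, R_max=10))  # [4 3 2]
```

Note that blocks whose error decays fastest end up with the most rank-one tensors, which is exactly the rate-control behavior the text describes.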
B. Eigenfibers Arrangement
We arrange the decomposed eigenfibers $b^{(d)}_{j,r}$ into matrices $B^{(d)} \in \mathbb{R}^{v_d \times \sum_{j=1}^{N} R_j}$ for $d = 1, 2$, where $N$ is the number of blocks. The vectors $\lambda_{j,r}\, b^{(3)}_{j,r}$ are arranged in matrix $B^{(3)}$; this eliminates the need to code the $\lambda_{j,r}$ parameters separately. Analogous to DC-AC separation, the eigenfibers corresponding to the first rank-one tensor of each block are arranged at the left of the matrices. This helps the coding due to the high correlation among these vectors. Figure 2 shows the eigenfiber arrangement obtained by decomposing an image ensemble from the Yale Face Database B [2]. The total number of blocks shown is 96 and the block size is 16x21x64.
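The packing described above can be sketched as follows (an illustrative sketch with hypothetical toy fibers; for mode 3 the fibers would be pre-multiplied by their $\lambda$ values before packing):

```python
import numpy as np

def arrange_eigenfibers(fibers, R):
    """Pack per-block fibers into one matrix, first-rank fibers leftmost.

    fibers[j][r] is the 1D eigenfiber of rank-one tensor r of block j.
    DC-AC style: columns 0..N-1 hold the r = 1 fiber of every block,
    followed by the remaining fibers of block 1, block 2, and so on.
    """
    N = len(R)
    cols = [fibers[j][0] for j in range(N)]        # first-rank fibers
    for j in range(N):
        cols.extend(fibers[j][1:R[j]])             # higher-rank fibers per block
    return np.column_stack(cols)

# Hypothetical toy data: N = 2 blocks with ranks R = [2, 3], fiber length 4
rng = np.random.default_rng(1)
R = [2, 3]
fibers = [[rng.standard_normal(4) for _ in range(Rj)] for Rj in R]
B = arrange_eigenfibers(fibers, R)
print(B.shape)  # (4, 5): one column per rank-one tensor
```

Grouping the first-rank fibers together keeps highly correlated columns adjacent, which is what makes the subsequent 2D image coding of these matrices effective.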
C. Header Data and Reconstruction
The image-ensemble size, the block size, and the vector $\mathbf{R}$ are stored in the header. This information is required for decoding an image. Image $i$ of the ensemble can be decoded as in

$$\mathrm{IMAGE}_{i,j} = \sum_{r=1}^{R_j} \left( B^{(1)}_{j,r} \circ B^{(2)}_{j,r} \right) B^{(3)}_{i,j,r}, \quad (4)$$

where $j$ is the block index, $B^{(d)}_{j,r}$ for $d = 1, 2$ is a column vector, and $B^{(3)}_{i,j,r}$ is a single value from row $i$ of matrix $B^{(3)}$. The corresponding column index is calculated from the values of $j$ and $r$ as in

$$\text{Column Index} = \begin{cases} j & \text{if } r = 1 \\ N - j + r + \sum_{k=1}^{j-1} R_k & \text{if } r > 1 \end{cases} \quad (5)$$
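The decoding step can be sketched in a few lines (a 0-based illustrative sketch consistent with the DC-AC-style packing of Section II-B, using hypothetical toy matrices; the paper's own indexing is 1-based):

```python
import numpy as np

def column_index(j, r, R):
    """Column of fiber r of block j (both 0-based) in the packed matrices."""
    N = len(R)
    if r == 0:
        return j                                   # first-rank fibers sit leftmost
    return N + sum(Rk - 1 for Rk in R[:j]) + (r - 1)

def reconstruct_block(i, j, B1, B2, B3, R):
    """Rebuild block j of image i from the packed matrices alone, as in (4)."""
    block = np.zeros((B1.shape[0], B2.shape[0]))
    for r in range(R[j]):
        c = column_index(j, r, R)
        # Rows of B3 index images; its entries already absorb the lambdas
        block += B3[i, c] * np.outer(B1[:, c], B2[:, c])
    return block

# Hypothetical toy data: 2 blocks with ranks [2, 1], 2 images in the ensemble
R = [2, 1]
rng = np.random.default_rng(2)
B1 = rng.standard_normal((3, sum(R)))
B2 = rng.standard_normal((4, sum(R)))
B3 = rng.standard_normal((2, sum(R)))
print(reconstruct_block(0, 0, B1, B2, B3, R).shape)  # (3, 4)
```

Only the columns belonging to block $j$ and row $i$ of $B^{(3)}$ are touched, which is what gives the claimed per-image random access.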
Figure 2. Eigenfibers arrangement and the rank values $R_j$ for 96 blocks of the Yale Face Database B image ensemble.
D. Coding
The header information is entropy coded losslessly. The decomposed vectors can be coded with any 2D image compression method; here, we used lossy JPEG2000. From experience, we found that the lossy mode with a compression ratio of 2 yields good compression without major loss in the final image quality.
III. SIMULATION RESULTS
The proposed method was applied to the Yale Face Database B [2]. The database contains images of 38 persons, each with 64 images of size 192x168. These images vary in expression and illumination conditions. After stacking the images on top of each other, we have a 3D tensor of size 192x168x2432. The resulting tensor is decomposed using PCP and the eigenfibers are arranged in 2D matrices. The result is then compressed with JPEG2000. Within the context of our proposed image-ensemble tensor-based compression, we compare the following tensor-decomposition approaches and existing still-image compression standards used in image databases:
1- Block-wise PCP decomposition. The block size is 16x21x64 and the value of $\gamma$ is varied to control the final storage size.
2- Block-wise CP decomposition. The block size is the same as in method (1) and the value of $\gamma$ is varied to compare the results at different compaction ratios. Here, we used the same structure as in Figure 1, except that the decomposition method is replaced with CP and JPEG2000 lossless mode is used, since small changes in the CP-decomposed vectors can lead to large reconstruction errors.
3- Storing each image separately using the JPEG2000 standard (MATLAB implementation).
Figure 3 shows the reconstruction PSNR averaged over 38 persons versus the required space (in bytes) for all 64 images of a person, also averaged over 38 persons. Over a wide range of bit rates, PCP clearly outperforms the other methods.
Based on the progressive nature of PCP, its time complexity is linear as a function of the number of rank-one tensors. CP factorization (i.e., the encoding side) has quadratic complexity as a function of R. In either case (PCP or CP), the decoding complexity is on the same order as traditional JPEG2000 decoding. Figure 4 shows the time complexity, where the decomposition methods are evaluated on a desktop computer with 12 GB of memory and an Intel Core i7 2600 CPU (8 MB cache, 3.4 GHz).
Figure 5 shows the PCP R values for the blocks of an image ensemble encoded at two different bitrates. Figure 6 shows one of the reconstructed images using the above methods, standard JPEG, and the MATLAB implementation of Motion JPEG2000. At higher bit rates, all the methods have similar PSNR values, and the visual quality is also similar. Figure 7 shows the reconstruction results at higher bit rates.
In conclusion, these simulations confirm the conjecture that one can achieve highly efficient progressive coding of image ensembles while maintaining low-complexity random access to any desired image when employing tensor-based decomposition. The proposed system, based on PCP factorization, optimal rank allocation, and eigenfiber coding, clearly shows its viability as an image-ensemble compression framework.
Figure 3. Average PSNR of the face images of 38 persons from the Yale Face Database B versus the average storage size required per image (PCP, CP, and JPEG2000).
Figure 4. Average (a) encoding and (b) decoding time for the face images of 38 persons from the Yale Face Database B versus the average storage size required per image.
Figure 5. PCP R values for the blocks of an image ensemble encoded at average bit rates of (a) 288 and (b) 979 Bytes per image.
Figure 6. From left to right, top to bottom: original image; PCP (288 Bytes, 30.1 dB); CP (286 Bytes, 28.98 dB); JPEG2000 (291 Bytes, 25.68 dB); Motion JPEG2000 (292 Bytes, 24.63 dB); JPEG (790 Bytes, 25.19 dB).
Figure 7. From left to right, top to bottom: original image; PCP (979 Bytes, 34.5 dB); CP (986 Bytes, 32.6 dB); JPEG2000 (975 Bytes, 35.27 dB); Motion JPEG2000 (990 Bytes, 35.1 dB); JPEG (999 Bytes, 29.1 dB).
REFERENCES
[1] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.
[2] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Anal. Mach. Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
[3] J. Carroll and J. Chang, "Analysis of individual differences in multidimensional scaling via an N-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283-319, 1970.
[4] E. Acar, D. M. Dunlavy, and T. G. Kolda, "A scalable optimization approach for fitting canonical tensor decompositions," Journal of Chemometrics, vol. 25, no. 2, pp. 67-86, 2011.
[5] A. Mahfoodh and H. Radha, "Tensor video coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013.