[ieee 2013 ieee global conference on signal and information processing (globalsip) - austin, tx, usa...
Single Image Super Resolution via Manifold Linear Approximation using Sparse Subspace Clustering
Chinh T. Dang, Student Member IEEE, M. Aghagolzadeh, A. A. Moghadam and Hayder Radha, Fellow IEEE Electrical and Computer Engineering Department, Michigan State University
Abstract-This paper considers the problem of single image super-resolution (SR). Previous example-based SR approaches mainly focus on analyzing the co-occurrence property of low resolution (LR) and high resolution (HR) patches via dictionary learning. In this paper, we propose a novel approach based on local linear approximation of the HR patch space using a sparse subspace clustering (SSC) algorithm. Our approach exploits the underlying HR patches' non-linear space by considering it as a low dimensional manifold in a high dimensional Euclidean space, and by employing each training HR patch as a sample from the manifold. We utilize the SSC algorithm to create the set of low dimensional linear spaces that are considered, approximately, as tangent spaces at the HR samples. Based on the obtained approximated tangent spaces, we examine the structure of the underlying HR manifold that allows locating the co-occurrence HR patch for a given LR one. The proposed approach requires a small number of training HR samples (about 1000 patches), without any prior assumption about the LR images. A comparison of the obtained results with other state-of-the-art methods clearly indicates the viability of the proposed approach.
Index Terms: Image super resolution; manifold; subspace
clustering.
I. INTRODUCTION
The area of image super-resolution (SR) handles the ill-posed problem of recovering a high-resolution (HR) image
from a given set of low-resolution (LR) images. It remains
an important research topic to overcome the limitations of
physical acquisition systems, such as consumer videos captured
by low-resolution camcorders, as well as to support the
development of HR displays. Although the SR problem has
been researched for years, it is still a challenging topic due
to loss of information (e.g., through unknown down-sampling,
blurring, and noise operators), as well as the lack of accurate
analytical models.
Conventional approaches demand multiple observed LR
images, and rely on the synthesis of essential non-redundant
information from these LR images to create the final HR one
[11]. However, in many practical cases, multiple LR images
are not available. Therefore, recent efforts have focused on
the problem of SR under the constraint of recovery from a
single LR image [1-6]. This problem area is also known as
image up-scaling. Current state-of-the-art methods for image up-scaling can be loosely divided into two groups [12]: (i)
This work was supported in part by the National Science Foundation under Grant CCF-1117709. Correspondence to: Chinh Dang; email: dangch[email protected]
learning edge model [6], and (ii) example-based SR [1-5]. The
former focuses on maintaining the sharpness and shape of the
edges by learning stable statistical relationships between HR
and LR images. On the other hand, the latter recovers missing
HR details from a training dataset by exploiting the property of co-occurrence/similarity between spaces of HR and LR
features. Our effort addresses the problem from the second
angle, which is the recovery of missing HR details based on
training of HR images.
The proposed approach requires a small amount of training
data (fewer than a thousand HR patches extracted randomly
from several HR images) when compared to the enormous databases of hundreds of thousands [1-2] or even millions of
HR and LR patch pairs required by other approaches [5].
The work by Freeman et al. [5] requires a huge database
since it considers each patch as a discrete point without any
consideration regarding the structure of the HR patch space.
The sparse representation-based method proposed in [2] could
work with a smaller database due to a linear space that can
be generated from a dictionary learning process. However, the
space of HR patches is a non-linear space, and therefore, the
HR dictionary generates LR patches as well. Our approach
considers the underlying non-linear space of HR patches as a
low dimensional manifold in the ambient Euclidean space, and each training HR patch as a sample that reveals the underlying
structure of the manifold. Instead of targeting the similar
local geometry of HR and LR manifolds, we aim directly
to analyze the space of HR manifold via a novel technique
of local linear approximation using sparse subspace clustering (SSC) [7]. This allows creating linear approximation spaces
(or approximated tangent spaces) for the nonlinear manifold.
The final set of approximated tangent spaces will be used to determine an HR patch that corresponds to a given LR patch.
The remainder of the paper is organized as follows. Section
2 briefly reviews SSC. Section 3 describes our proposed
manifold-based image super-resolution framework. In Section
4, we show some experimental results, and Section 5 outlines
some concluding remarks.
II. SPARSE SUBSPACE CLUSTERING OVERVIEW
Subspace clustering considers the problem of modeling a
collection of data points with a union of subspaces [13]. The
main idea of SSC was developed from the self-expressive property where the set of N high dimensional vector points in
$\mathbb{R}^D$ (normally $N \gg D$), denoted by $H = \{h_j \in \mathbb{R}^D\}_{j=1}^{N}$, is
used as a dictionary itself for sparse representation to create
978-1-4799-0248-4/13/$31.00 ©2013 IEEE 949 GlobalSIP 2013
an affinity matrix. In particular, the set of vector points could be written in a matrix form $H = [h_1, h_2, \ldots, h_N] \in \mathbb{R}^{D \times N}$. Let $H_{\hat{i}} = H \setminus \{h_i\} \in \mathbb{R}^{D \times (N-1)}$ be the matrix obtained
from H by removing its ith column. The self-expressive property implies looking for a sparse representation of $h_i$ from
its corresponding dictionary $H_{\hat{i}}$. According to the theory of
compressive sensing, a sparse solution could be found via $\ell_1$ norm minimization:
$$\min \|c_i\|_1 \quad \text{subject to} \quad h_i = H_{\hat{i}} c_i \quad (1)$$
Minimizing the $\ell_1$ norm with equality constraints could be
transformed into an equivalent form of a convex optimization
problem for a fixed dictionary $H_{\hat{i}}$:
$$c_i = \arg\min_{c_i \in \mathbb{R}^{N-1}} \|h_i - H_{\hat{i}} c_i\|_2 + \lambda \|c_i\|_1 \quad (2)$$
After solving (2) for every column of the data matrix H, the
coefficient vector $c_i \in \mathbb{R}^{N-1}$ is arranged into the ith column of a coefficient matrix $C \in \mathbb{R}^{N \times N}$ by inserting a zero entry at
the ith position of $c_i$ (i.e., diagonal elements of C are zeros).
Finally, the symmetric affinity matrix $\tilde{C} = [\tilde{c}_{ij}]_{N \times N}$ is defined
as $\tilde{c}_{ij} = |c_{ij}| + |c_{ji}|$, which represents a balanced graph.
An algorithm from spectral graph theory, e.g. the normalized cut
algorithm, has been exploited for data segmentation.
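As a rough illustration of equations (1)-(2) and the affinity construction, the following NumPy sketch solves one Lasso problem per column with a basic ISTA (proximal gradient) loop. It assumes the common squared-error form of the data term, and the names `lasso_ista`, `ssc_affinity`, `lam`, and `n_iter` are illustrative choices, not taken from the paper or from [7]. The spectral clustering step applied to the affinity matrix is not shown.

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Approximately minimize 0.5*||b - A c||_2^2 + lam*||c||_1 with ISTA."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - b))                        # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return c

def ssc_affinity(H, lam=0.01):
    """Return the coefficient matrix C (zero diagonal) and |C| + |C|^T.

    H is a D x N matrix whose columns are the data points h_1..h_N.
    """
    N = H.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)   # dictionary without column i
        C[others, i] = lasso_ista(H[:, others], H[:, i], lam)
    affinity = np.abs(C) + np.abs(C).T        # symmetric affinity matrix
    return C, affinity
```

For points drawn from two orthogonal lines, for example, the affinity between points on different lines stays at zero, which is the behavior the self-expressive property relies on.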
III. SUPER-RESOLUTION VIA SSC-BASED LINEAR
APPROXIMATION OF MANIFOLDS (SR-SLAM)
The problem of SR can be cast as an ill-posed inverse
problem. Given an LR patch $l_p$, an automated algorithm needs
to recover its corresponding HR patch $h_p$ that satisfies the
underdetermined linear equation:
$$l_p = D B h_p + \epsilon \quad (3)$$
Here, D and B are the decimation and blur (low pass filter)
operators, respectively, and $\epsilon$ denotes an additive noise, normally
i.i.d. white Gaussian noise. To eliminate the complexities of
dealing with different spatial resolutions (number of pixels)
between $l_p$ and $h_p$, the image/patch $l_p$ is scaled up to the target HR size by a simple interpolation operator Q, e.g. bicubic interpolation
[3]:
$$\tilde{l}_p = Q l_p = Q (D B h_p + \epsilon) = (Q D B) h_p + Q\epsilon \quad (4)$$
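The degradation and up-scaling chain of equations (3)-(4) can be simulated in a few lines. The sketch below is only a toy model: the 3-tap kernel standing in for B, the pixel-replication operator standing in for Q (the text mentions bicubic interpolation), and all function names are assumptions made for illustration.

```python
import numpy as np

def blur(img, kernel):
    """B: separable low-pass filter over rows, then columns (edge-padded)."""
    pad = len(kernel) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def decimate(img, s):
    """D: keep every s-th pixel in both directions."""
    return img[::s, ::s]

def upscale_nearest(img, s):
    """Stand-in for Q: pixel replication instead of bicubic interpolation."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def degrade(h, s=3, sigma=0.0, rng=None):
    """Simulate l = D B h + eps for an HR image or patch h."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0   # assumed blur kernel
    l = decimate(blur(h, kernel), s)
    if sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        l = l + sigma * rng.standard_normal(l.shape)  # i.i.d. Gaussian noise
    return l
```

Calling `upscale_nearest(degrade(h, s), s)` returns an array with the same size as `h` whenever the side lengths of `h` are multiples of `s`, which mirrors the role of the up-scaled patch in (4).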
Fig. 1. Illustration of high-resolution patch reconstruction. $\{h_j\}$ (the red points) represent the set of sampling HR patches from the curve $H_s$, and $l_1, l_2$ are input low resolution patches. The co-occurrence HR patches of $l_1, l_2$ are their projections onto the tangent space at $h_7$ and the space spanned by $h_5, h_6$, respectively.
Many example-based approaches focus on learning the co-occurrence model, specifically coupled dictionaries for both LR
and HR features (e.g. $A_l$ and $A_h$ respectively) [1][2][8]. However, as pointed out in [3], even if the dictionary $A_h$ of the
HR space has low coherence, the corresponding LR dictionary
$A_l = (QDB) A_h$ is not guaranteed to have low coherence
due to the undefined and unknown multiplication by (QDB).
Our proposed framework, which we refer to as image SR via
SSC-based Linear Approximation of Manifolds (SR-SLAM),
differs from prior works in two key aspects. First, we employ the SSC approach to construct a linear approximation for
the HR manifold. Having a linear approximation for the HR
manifold provides a viable and accurate search space for the HR solution that corresponds to the real-time LR-patch
observation. It is important to note that this strategy is different from prior SR efforts. In particular, in [4], the HR patch is
found based on searching for the nearest neighbor within the
LR space and then mapping the LR solution to the HR space.
This approach has the following potential shortcoming. The
nearest neighbor space is very limited and hence it might not lead to an accurate approximation of the optimal solution.
Second, it is crucial to recognize that many prior example-based
approaches are based on an underlying assumption that
the overall structure and shape of the LR and HR manifolds
are similar. Although this assumption might not have been
explicitly mentioned in some prior works, it represents a key foundation for example-based SR methods. Under our proposed
SR-SLAM framework, we depart from this underlying
assumption, and hence, the LR and HR manifolds do
not need to have similar structures or shapes. Therefore, our
method avoids the problem, mentioned above, of learned
dictionaries that lack low coherence, and it does not
require learning a new dictionary for better approximations
under different magnification factors.
Our approach considers the space of HR patches as a low
dimensional manifold in a high dimensional ambient space. Given an LR patch $l_p$, its HR patch $h_p$ will be determined as
the closest HR patch in the HR space. Denote the space of HR
patches for a given patch size as $H_s \subset \mathbb{R}^D$; then, the process
of finding $h_p$ can be defined as follows:
$$h_p = \arg\min_{v \in H_s} d(l_p, v) \quad (5)$$
Here, $d(l_p, v)$ denotes a distance measure. In the experiments,
$\tilde{l}_p$ will be used instead of $l_p$. In general, $H_s$ is infinite;
hence, we need a practical approach for searching that space
for the best result using moderate computation. Our proposed
solution to (5) includes two main steps. The underlying idea is to find a piecewise linear approximation of $H_s$ from which
we can infer the optimum solution for (5).
Fig. 2. The set of training images
i) Step 1 (training phase):
Denote $H = \{h_j \in \mathbb{R}^D\}_{j=1}^{N}$ as in the previous section, where
$h_j$ is an HR patch extracted from a set of training images.
The problem can be stated as follows: for a given set H of
HR patch samples, how can we partition H into (unknown) k non-overlapping subsets, $\{H_j\}_{j=1}^{k}$, and the corresponding low
dimensional linear subspaces, $\{S_{H_j} = \mathrm{span}(H_j)\}_{j=1}^{k}$, such
that $S_{H_j}$ ($1 \le j \le k$) could be considered as a set of tangent spaces at the points in $H_j$:
$$H = \bigcup_{j=1}^{k} H_j; \qquad H_s \approx \bigcup_{j=1}^{k} S_{H_j}$$
$$H_j \cap H_i = \emptyset \ \text{for} \ 1 \le i \ne j \le k; \qquad \mathrm{rank}(S_{H_j}) \le r_0 \ \text{for} \ 1 \le j \le k \quad (6)$$
Here, $r_0$ is the upper bound on the rank of a linear subspace. $H_s$ is
approximated by the union of the linear subspaces $\{S_{H_j}\}_{j=1}^{k}$.
The idea of using an underlying manifold approximation
by a set of linear subspaces was proposed in [10]. It has
also been applied for a pair of HR and LR patch manifolds
[4] by using Euclidean distance to define neighborhoods.
In our approach, we do not embed the manifold into a
low-dimensional space; instead, we approximate directly in
the high-dimensional space. In addition, as pointed out by
[2], an HR sparse representation over a learned dictionary leads to a better prediction of an HR patch, while in most
cases, elements in the sparse representation do not belong to
its neighborhood set. This implies the method of defining a
neighborhood using Euclidean distance does not perform well.
We exploit the technique of SSC to define a neighborhood
in the manifold Hs, where a subset Hj includes elements
that are neighbors to other elements inside the subset, but are
not neighbors to elements in different subsets. Consequently,
the linear space SHj can be considered as an approximate
tangent space (or a subspace in the tangent space if there are
not enough neighbors) of the manifold Hs. Details of the implementation of this step are shown in Algorithm 1.
Algorithm 1 Training phase
Inputs: Set of training images, patch size p, constant integer $r_0$,
initial number of clusters $n_1$, number of clusters in loop $n_2$
Outputs: Set of low dimensional subspaces $H_s$
Begin
  Extract HR patches of size p from training images, create H.
  Perform SSC to partition the data points in H into $n_1$ subspaces
  Loop
    If rank of any subspace > $r_0$
      Perform SSC to partition that subspace into $n_2$ subspaces
    End If
  End Loop
  Return $H_s \approx \{S_{H_j}\}_{j=1}^{k}$
End
ii) Step 2 (testing phase):
For a given LR patch $l_p$ in (5), the algorithm locates the co-occurrence HR patch $h_p$ based on the union of the linear
subspaces obtained from the training phase:
Here, $d(l_p, S_{H_j})$ is the distance from a point to a linear subspace, which could be computed simply via orthogonal
projection onto a linear space. The algorithm converts the
original intractable problem (5) that requires considering every
single point in the manifold $H_s$ to a tractable problem of
evaluating the distance from one point to a finite set of linear
subspaces. Next, the closest subspace will be selected:
$$j_{\min} = \arg\min_{1 \le j \le k} d(l_p, S_{H_j}) \quad (8)$$
Finally, the solution $h_p$ of equation (5) is the projection of
$l_p$ onto its closest linear subspace:
$$h_p = P_{S_{H_{j_{\min}}}}(l_p) \quad (9)$$
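The subspace selection in (8) and the final projection step can be sketched with orthonormal bases obtained from an SVD. This is a minimal illustration under the assumption that the distance d is Euclidean and that each subspace is stored as a matrix with orthonormal columns; the function names are hypothetical.

```python
import numpy as np

def orthonormal_basis(points, r0):
    """Orthonormal basis (D x r, r <= r0) for the span of the columns of `points`."""
    U, s, _ = np.linalg.svd(points, full_matrices=False)
    r = min(r0, int(np.sum(s > 1e-10 * s.max())))  # drop numerically zero directions
    return U[:, :r]

def project_to_closest_subspace(x, bases):
    """Return (j_min, projection of x onto the closest subspace).

    `bases` is a list of D x r_j matrices with orthonormal columns.
    """
    best_j, best_dist, best_proj = -1, np.inf, None
    for j, U in enumerate(bases):
        proj = U @ (U.T @ x)              # orthogonal projection onto span(U)
        dist = np.linalg.norm(x - proj)   # Euclidean point-to-subspace distance
        if dist < best_dist:
            best_j, best_dist, best_proj = j, dist, proj
    return best_j, best_proj
```

In the setting of the text, `x` would be the up-scaled LR patch in vectorized form and `bases` the subspaces returned by the training phase; the cost per patch grows linearly with the number of subspaces k.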
Fig. 1 illustrates a simple geometric intuition of the testing
phase including a simple curve $H_s$ and some HR samples
$\{h_j\}$. For example, one subspace containing two close points $h_5, h_6$ forms a tangent linear approximation of the continuous
$H_s$ manifold at $h_5$ and $h_6$. In our implementation,
for a given set of linear subspaces (obtained from the training
phase) and a low resolution input image, the algorithm detects
the corresponding HR patch for each LR patch. Overlapping LR patches lead to overlapping HR patches, and overlapped
pixel values will be averaged. We also exploit the technique of
back projection that enforces a global reconstruction constraint
[9] as many previous methods did.
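The averaging of overlapped pixel values mentioned above can be sketched as an accumulate-and-normalize pass over the reconstructed patches. The function name and the `positions` argument (top-left corner of each patch) are illustrative assumptions, and the back-projection step [9] is not included.

```python
import numpy as np

def assemble_from_patches(patches, positions, image_shape, p):
    """Place p x p patches at their (row, col) positions and average overlaps."""
    acc = np.zeros(image_shape)      # running sum of pixel estimates
    weight = np.zeros(image_shape)   # number of patches covering each pixel
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + p, c:c + p] += np.asarray(patch, dtype=float).reshape(p, p)
        weight[r:r + p, c:c + p] += 1.0
    return acc / np.maximum(weight, 1.0)   # uncovered pixels stay at zero
```

Each output pixel is the mean of all patch estimates that cover it, so patches that agree on their overlap reproduce the image exactly.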
IV. EXPERIMENTAL RESULTS
A. Training Phase
The set of training images is shown in Fig. 2 that includes a
total of 10 images. They are taken quite randomly from the set of training images that has been used in [2]. For each image,
only 80 HR patches are extracted for further processing; hence, we have a total number of 800 HR patches. At the first step,
all images are converted into YCbCr and our algorithm works
with luminance component only. All computations for PSNR and SSIM in our simulations are for the luminance channel.
There are four parameters in the training phase including the
patch size p, maximum rank in a subgroup $r_0$, initial number of clusters $n_1$ and the number of clusters in the loop $n_2$. In
our experiment, we work with p = 8 (so the extracted patches
belong to the Euclidean space $\mathbb{R}^{64}$), $r_0 = 2$, $n_1 = 10$ and $n_2 = 2$. In general, at the first step, Algorithm 1 partitions the
HR patches into 10 groups. Then in a recursive manner, each
group that has a rank greater than $r_0$ will be further partitioned
into two subgroups.
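The recursive partitioning described here can be sketched as a work-list loop. The clustering routine is passed in as `cluster_fn(H, n)`, a hypothetical callable that returns one label in `0..n-1` per column; in the paper this role is played by SSC, while any labeling function can be plugged in for experimentation. The forced halving used when the clustering fails to split a group is an added safeguard, not part of Algorithm 1.

```python
import numpy as np

def train_subspaces(H, cluster_fn, r0=2, n1=10, n2=2):
    """Split the columns of H into groups whose rank is at most r0."""
    assert r0 >= 1, "a single nonzero patch already has rank 1"
    labels = np.asarray(cluster_fn(H, n1))
    pending = [H[:, labels == c] for c in range(n1)]
    pending = [G for G in pending if G.shape[1] > 0]
    done = []
    while pending:
        G = pending.pop()
        if np.linalg.matrix_rank(G) <= r0:
            done.append(G)                 # group is low-rank enough: keep it
            continue
        labels = np.asarray(cluster_fn(G, n2))
        parts = [G[:, labels == c] for c in range(n2)]
        parts = [P for P in parts if P.shape[1] > 0]
        if len(parts) < 2:                 # safeguard: clustering did not split
            half = G.shape[1] // 2
            parts = [G[:, :half], G[:, half:]]
        pending.extend(parts)              # re-examine the smaller groups
    return done
```

Every split strictly reduces the number of columns in a group, so the loop terminates; the returned groups can then be converted to bases for the testing phase.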
(a) Input image (b) Neighbor Embedding (c) Sparse Representation (d) Manifold-SSC (e) Ground Truth
Fig. 3. Comparison of single image SR for the baby face image magnified by a factor of 3. Left to right: input low resolution image, Neighbor Embedding method [4] (PSNR = 31.663, SSIM = 0.7772), Sparse Representation [2] (PSNR = 33.14, SSIM = 0.8164), our proposed manifold and sparse subspace clustering-based method (PSNR = 33.428, SSIM = 0.8245), and the ground truth high resolution image. Numerical results are computed for the Y component.
Image      Metric      Bicubic   SR [2]    Zeyde [3]   Manifold-SSC
Lena       PSNR (dB)   30.0986   31.0517   29.3063     31.2649
           SSIM        0.8430    0.8662    0.8502      0.8691
Mountain   PSNR (dB)   27.0522   27.7505   26.4941     27.8713
           SSIM        0.708     0.7563    0.7475      0.7603
House      PSNR (dB)   24.4172   24.8523   24.1079     24.9365
           SSIM        0.6944    0.7285    0.7102      0.7291
Lion       PSNR (dB)   28.3921   28.9604   27.6353     29.0441
           SSIM        0.7245    0.7648    0.7561      0.7662
Car        PSNR (dB)   27.4234   28.1826   27.7151     28.3789
           SSIM        0.8412    0.8655    0.8689      0.8681
Table 1. Comparison (PSNR and SSIM) at magnification factor 3 for several other images (luminance channel only)
B. Testing Phase
We show experimental results, and we compare these results
with some related current methods, particularly the Neighbor
Embedding (NE) method [4] and the sparse representation
approach [2]. Some of the comparison results are taken from
[2]. Fig. 3 shows the results for the "baby face" image under these three different algorithms. In the NE method, a total of
100,000 training patch pairs are used for the method with
different parameters of k nearest neighbors to obtain the best
result [2]. The sparse representation based method trains two
dictionaries by using around 100,000 training HR/LR patch
pairs. In terms of numerical comparison, the proposed SR-SLAM
method leads to better results in both PSNR and SSIM.
The sparse representation based method has a tendency to
provide sharp images; however, it could lead to artifacts that
do not fit well with the ground truth HR image (e.g., Fig. 3).
Table 1 shows other experimental results and comparisons to
some other recent methods on a different set of images.
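For readers who want to reproduce the kind of luminance-only PSNR figures reported above, a minimal sketch follows. It assumes 8-bit images and full-range ITU-R BT.601 luma weights; the exact YCbCr conversion used in the experiments is not stated in the text, so small numerical differences are to be expected. SSIM is not covered here.

```python
import numpy as np

def rgb_to_luma(rgb):
    """Luma Y from an (H, W, 3) RGB array in [0, 255], BT.601 weights (assumed)."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized arrays."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A typical call would be `psnr(rgb_to_luma(ground_truth), rgb_to_luma(result))`, with both images cropped to the same size beforehand.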
V. CONCLUSION AND FUTURE WORK
In this paper, a method based on linear approximation of the
image manifold has been proposed to deal with the problem of
single image super-resolution. In the proposed method, SSC
has been exploited for manifold tangent space approximation.
The corresponding HR patch for a given LR patch is
then located by finding the closest patch in the HR space
(approximated by the tangent spaces). Our proposed method
requires a very small number of training patches and does not
rely on similarity structure of the LR/HR patch spaces. The
results clearly indicate the viability of the proposed approach. Future work will focus on learning the manifold structure of
HR patches, as well as validating the proposed approach for
super resolution of video and multiple reference images.
The authors would like to thank Li He from The University of Tennessee, Knoxville for providing some experimental results for comparison.
REFERENCES
[1] Yang, Jianchao, et al. "Image super-resolution as sparse representation of raw image patches." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
[2] Yang, Jianchao, et al. "Image super-resolution via sparse representation." Image Processing, IEEE Transactions on 19.11 (2010): 2861-2873.
[3] Zeyde, Roman, Michael Elad, and Matan Protter. "On single image scale-up using sparse-representations." In Curves and Surfaces, pp. 711-730. Springer Berlin Heidelberg, 2012.
[4] Chang, Hong, Dit-Yan Yeung, and Yimin Xiong. "Super-resolution through neighbor embedding." Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. Vol. 1. IEEE, 2004.
[5] Freeman, William T., Egon C. Pasztor, and Owen T. Carmichael. "Learning low-level vision." International Journal of Computer Vision 40.1 (2000): 25-47.
[6] Sun, Jian, Zongben Xu, and Heung-Yeung Shum. "Image super-resolution using gradient profile prior." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
[7] Elhamifar, E., and R. Vidal. "Sparse subspace clustering." Computer Vision and Pattern Recognition, CVPR 2009.
[8] Yang, Jianchao, et al. "Bilevel sparse coding for coupled feature spaces." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[9] Irani, Michal, and Shmuel Peleg. "Super resolution from image sequences." Pattern Recognition, 1990. Proceedings., 10th International Conference on. Vol. 2. IEEE, 1990.
[10] Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.
[11] Farsiu, Sina, M. Dirk Robinson, Michael Elad, and Peyman Milanfar. "Fast and robust multiframe super resolution." Image Processing, IEEE Transactions on 13, no. 10 (2004): 1327-1344.
[12] Glasner, Daniel, Shai Bagon, and Michal Irani. "Super-resolution from a single image." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[13] Vidal, Rene. "Subspace clustering." IEEE Signal Processing Magazine, 2011.