[ieee 2013 ieee global conference on signal and information processing (globalsip) - austin, tx, usa...

Single Image Super Resolution via Manifold Linear Approximation using Sparse Subspace Clustering

Chinh T. Dang, Student Member IEEE, M. Aghagolzadeh, A. A. Moghadam and Hayder Radha, Fellow IEEE

Electrical and Computer Engineering Department, Michigan State University

Abstract-This paper considers the problem of single image super-resolution (SR). Previous example-based SR approaches mainly focus on analyzing the co-occurrence property of low resolution (LR) and high resolution (HR) patches via dictionary learning. In this paper, we propose a novel approach based on local linear approximation of the HR patch space using a sparse subspace clustering (SSC) algorithm. Our approach exploits the underlying HR patches' non-linear space by considering it as a low dimensional manifold in a high dimensional Euclidean space, and by employing each training HR patch as a sample from the manifold. We utilize the SSC algorithm to create the set of low dimensional linear spaces that are considered, approximately, as tangent spaces at the HR samples. Based on the obtained approximated tangent spaces, we examine the structure of the underlying HR manifold that allows locating the co-occurrence HR patch for a given LR one. The proposed approach requires a small number of training HR samples (about 1000 patches), without any prior assumption about the LR images. A comparison of the obtained results with other state-of-the-art methods clearly indicates the viability of the proposed approach.

Index Terms: Image super resolution; manifold; subspace clustering.

I. INTRODUCTION

The area of image super-resolution (SR) handles the ill-posed problem of recovering a high-resolution (HR) image

from a given set of low-resolution (LR) images. It remains

an important research topic to overcome the limitations of

physical acquisition systems, such as consumer videos captured by low-resolution camcorders, as well as to support the

development of HR displays. Although the SR problem has

been researched for years, it is still a challenging topic due

to loss of information (e.g., through unknown down-sampling,

blurring, and noise operators), as well as the lack of accurate

analytical models.

Conventional approaches demand multiple observed LR

images, and rely on the synthesis of essential non-redundant

information from these LR images to create the final HR one

[11]. However, in many practical cases, multiple LR images are not available. Therefore, recent efforts have focused on the problem of SR under the constraint of recovery from a single LR image [1-6]. This problem area is also known as image up-scaling. Current state-of-the-art methods for image up-scaling can be loosely divided into two groups [12]: (i) learning an edge model [6], and (ii) example-based SR [1-5]. The former focuses on maintaining the sharpness and shape of the edges by learning stable statistical relationships between HR and LR images. On the other hand, the latter recovers missing HR details from a training dataset by exploiting the property of co-occurrence/similarity between spaces of HR and LR features. Our effort addresses the problem from the second angle, which is the recovery of missing HR details based on training of HR images.

This work was supported in part by the National Science Foundation under Grant CCF-1117709. Correspondence to: Chinh Dang; email: dangch[email protected]

The proposed approach requires a small amount of training

data (fewer than a thousand HR patches extracted randomly

from several HR images) when compared to the enormous databases of hundreds of thousands [1-2] or even millions of

HR and LR patch pairs required by other approaches [5].

The work by Freeman et al. [5] requires a huge database

since it considers each patch as a discrete point without any

consideration regarding the structure of the HR patch space.

The sparse representation-based method proposed in [2] could

work with a smaller database due to a linear space that can

be generated from a dictionary learning process. However, the

space of HR patches is a non-linear space, and therefore, the

HR dictionary generates LR patches as well. Our approach

considers the underlying non-linear space of HR patches as a

low dimensional manifold in the ambient Euclidean space, and each training HR patch as a sample that reveals the underlying

structure of the manifold. Instead of targeting the similar

local geometry of HR and LR manifolds, we aim directly

to analyze the space of HR manifold via a novel technique

of local linear approximation using sparse subspace clustering (SSC) [7]. This allows creating linear approximation spaces

(or approximated tangent spaces) for the nonlinear manifold.

The final set of approximated tangent spaces will be used to determine a HR patch that corresponds to a given LR patch.

The remainder of the paper is organized as follows. Section

2 briefly reviews SSC. Section 3 describes our proposed

manifold-based image super-resolution framework. In Section

4, we show some experimental results, and Section 5 outlines

some concluding remarks.

II. SPARSE SUBSPACE CLUSTERING OVERVIEW

Subspace clustering considers the problem of modeling a

collection of data points with a union of subspaces [13]. The

main idea of SSC was developed from the self-expressive property where the set of N high dimensional vector points in

$\mathbb{R}^D$ (normally $N \gg D$), denoted by $H = \{h_j \in \mathbb{R}^D\}_{j=1}^{N}$, is

used as a dictionary itself for sparse representation to create

978-1-4799-0248-4/13/$31.00 ©2013 IEEE 949 GlobalSIP 2013

an affinity matrix. In particular, the set of vector points can be written in matrix form as $H = [h_1, h_2, \ldots, h_N] \in \mathbb{R}^{D \times N}$. Let $H_{\hat{i}} = H \setminus \{h_i\} \in \mathbb{R}^{D \times (N-1)}$ be the matrix obtained from $H$ by removing its $i$th column. The self-expressive property implies looking for a sparse representation of $h_i$ from its corresponding dictionary $H_{\hat{i}}$. According to the theory of compressive sensing, a sparse solution can be found via $\ell_1$-norm minimization:

$$\min \|c_i\|_1 \quad \text{subject to} \quad h_i = H_{\hat{i}} c_i \qquad (1)$$

Minimizing the $\ell_1$ norm with equality constraints can be transformed into an equivalent form of a convex optimization problem for a fixed dictionary $H_{\hat{i}}$:

$$c_i = \arg\min_{c_i \in \mathbb{R}^{N-1}} \|h_i - H_{\hat{i}} c_i\|_2 + \lambda \|c_i\|_1 \qquad (2)$$

After solving (2) for every column of the data matrix $H$, the coefficient vector $c_i \in \mathbb{R}^{N-1}$ is arranged into the $i$th column of a coefficient matrix $C \in \mathbb{R}^{N \times N}$ by inserting a zero entry at the $i$th position of $c_i$ (i.e., the diagonal elements of $C$ are zeros). Finally, the symmetric affinity matrix $\tilde{C} = [\tilde{c}_{ij}]_{N \times N}$ is defined as $\tilde{c}_{ij} = |c_{ij}| + |c_{ji}|$, which represents a balanced graph. An algorithm from spectral graph theory, e.g. the normalized cut algorithm, is then used for data segmentation.
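As a rough illustration of the self-expressive step, the following Python sketch computes the sparse coefficients column by column and builds the symmetric affinity. It is a minimal sketch under stated assumptions, not the implementation used in this paper: it minimizes the common squared-residual Lasso form of (2), namely $\frac{1}{2}\|h_i - Xc\|_2^2 + \lambda\|c\|_1$ where $X$ is the data matrix with column $i$ removed, using a plain iterative soft-thresholding (ISTA) loop. The names `ssc_affinity`, `lam`, and `n_iter` are illustrative, and the spectral clustering step is omitted.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ssc_affinity(H, lam=0.05, n_iter=500):
    """H is a D x N matrix whose columns are data points.

    Returns the N x N symmetric affinity |C| + |C|^T, where column i of C
    sparsely represents h_i using all other columns (zero diagonal).
    Assumption: squared-residual Lasso solved by ISTA.
    """
    N = H.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        idx = np.delete(np.arange(N), i)      # every column except i
        X = H[:, idx]                         # dictionary without h_i
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
        c = np.zeros(N - 1)
        for _ in range(n_iter):
            grad = X.T @ (X @ c - H[:, i])    # gradient of the quadratic term
            c = soft_threshold(c - step * grad, step * lam)
        C[idx, i] = c                         # diagonal entry stays zero
    return np.abs(C) + np.abs(C).T
```

For points drawn from mutually orthogonal subspaces, the affinity produced this way has no edges between different subspaces, which is the property the subsequent graph segmentation relies on.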

III. SUPER-RESOLUTION VIA SSC-BASED LINEAR APPROXIMATION OF MANIFOLDS (SR-SLAM)

The problem of SR can be cast as an ill-posed inverse problem. Given an LR patch $l_p$, an automated algorithm needs to recover its corresponding HR patch $h_p$ that satisfies the underdetermined linear equation:

$$l_p = D B h_p + \epsilon \qquad (3)$$

Here, $D$ and $B$ are the decimation and blur (low pass filter) operators, respectively, and $\epsilon$ denotes additive noise, normally i.i.d. white Gaussian noise. To eliminate the complexities of dealing with different spatial resolutions (numbers of pixels) between $l_p$ and $h_p$, the image/patch $l_p$ is scaled up to the target HR size by a simple interpolation operator $Q$, e.g. bicubic interpolation [3]:

$$\tilde{l}_p = Q l_p = Q (D B h_p + \epsilon) = (Q D B) h_p + Q \epsilon \qquad (4)$$
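The degradation and interpolation chain described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blur is a 3x3 box filter, decimation is plain subsampling, and the interpolation step uses pixel replication as a simple stand-in for the bicubic operator mentioned in the text. The function names and the parameters `s` and `sigma` are hypothetical.

```python
import numpy as np

def blur(h):
    """3x3 box filter with edge padding (assumed stand-in for the blur operator)."""
    rows, cols = h.shape
    padded = np.pad(h, 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / 9.0

def decimate(x, s):
    """Keep every s-th pixel in both directions (decimation operator)."""
    return x[::s, ::s]

def degrade(h, s=3, sigma=0.0, rng=None):
    """LR observation: decimated, blurred HR patch plus white Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    l = decimate(blur(h), s)
    return l + sigma * rng.standard_normal(l.shape)

def upscale(l, s):
    """Pixel replication back to HR size (assumption: replaces bicubic interpolation)."""
    return np.kron(l, np.ones((s, s)))
```

With these helpers, `upscale(degrade(h, s), s)` has the same number of pixels as `h`, so LR observations and HR patches can be compared in the same ambient space.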

Many example-based approaches focus on learning the co-occurrence model, especially coupled dictionaries for both LR and HR features (e.g. $A_l$ and $A_h$, respectively) [1][2][8]. However, as pointed out in [3], even if the dictionary $A_h$ of the HR space has low coherence, the corresponding LR dictionary $A_l = (QDB) A_h$ is not guaranteed to have low coherence due to the undefined and unknown operator $(QDB)$.

Fig. 1. Illustration of high-resolution patch reconstruction. $\{h_j\}$ (the red points) represent the set of HR patches sampled from the curve $H_s$, and $l_1, l_2$ are input low resolution patches. The co-occurrence HR patches of $l_1, l_2$ are their projections onto the tangent space at $h_7$ and the space spanned by $h_5, h_6$, respectively.

Our proposed framework, which we refer to as image SR via SSC-based Linear Approximation of Manifolds (SR-SLAM), differs from prior works in two key aspects. First, we employ the SSC approach to construct a linear approximation for the HR manifold. Having a linear approximation for the HR manifold provides a viable and accurate search space for the HR solution that corresponds to the real-time LR-patch observation. It is important to note that this strategy is different from prior SR efforts. In particular, in [4], the HR patch is

found based on searching for the nearest neighbor within the

LR space and then mapping the LR solution to the HR space.

This approach has the following potential shortcoming. The

nearest neighbor space is very limited and hence it might not lead to an accurate approximation of the optimal solution.

Second, it is crucial to recognize that many prior example-based approaches are based on an underlying assumption that

the overall structure and shape of the LR and HR manifolds

are similar. Although this assumption might not have been

explicitly mentioned in some prior works, it represents a key foundation for example-based SR methods. Under our proposed SR-SLAM framework, we depart from such underlying

assumption, and hence, the two LR and HR manifolds do

not need to have similar structures or shapes. Therefore, our

method eliminates the problem of not having low-coherence in

the learned dictionaries as we mentioned above, as well as not

requiring learning a new dictionary for better approximations

under different magnification factors.

Our approach considers the space of HR patches as a low-dimensional manifold in a high dimensional ambient space. Given an LR patch $l_p$, its HR patch $h_p$ will be determined as the closest HR patch in the HR space. Denote the space of HR patches for a given patch size as $H_s \subset \mathbb{R}^D$; then, the process of finding $h_p$ can be defined as follows:

$$h_p = \arg\min_{v \in H_s} d(l_p, v) \qquad (5)$$

Here, $d(l_p, v)$ denotes a distance measure. In the experiment, the interpolated patch $\tilde{l}_p = Q l_p$ from (4) will be used instead of $l_p$. In general, $H_s$ is infinite;

hence, we need a practical approach for searching that space

for the best result using moderate computation. Our proposed

solution to (5) includes two main steps. The underlying idea is to find a piecewise linear approximation of Hs from which

we can infer the optimum solution for (5).

Fig. 2. The set of training images

i) Step 1 (training phase):

Denote $H = \{h_j \in \mathbb{R}^D\}_{j=1}^{N}$ as in the previous section, where $h_j$ is an HR patch extracted from a set of training images. The problem can be stated as follows: for a given set $H$ of HR patch samples, how can we partition $H$ into an (unknown) number $k$ of non-overlapping subsets, $\{H_j\}_{j=1}^{k}$, with corresponding low dimensional linear subspaces, $\{S_{H_j} = \mathrm{span}(H_j)\}_{j=1}^{k}$, such that each $S_{H_j}$ ($1 \le j \le k$) can be considered as a tangent space at the points in $H_j$?

$$H = \bigcup_{j=1}^{k} H_j; \qquad H_s \approx \bigcup_{j=1}^{k} S_{H_j}$$

$$H_j \cap H_i = \emptyset \quad \text{for } 1 \le i \ne j \le k$$

$$\mathrm{rank}(S_{H_j}) \le r_0 \quad \text{for } 1 \le j \le k \qquad (6)$$

Here, $r_0$ is the upper bound rank for a linear subspace. $H_s$ is approximated by the union of the linear subspaces $\{S_{H_j}\}_{j=1}^{k}$. The idea of using an underlying manifold approximation

by a set of linear subspaces was proposed in [10]. It has

also been applied for a pair of HR and LR patch manifolds

[4] by using Euclidean distance to define neighborhoods.

In our approach, we do not embed the manifold into a

low-dimensional space; instead, we approximate directly in

the high-dimensional space. In addition, as pointed out by

[2], an HR sparse representation over a learned dictionary leads to a better prediction of an HR patch, while in most

cases, elements in the sparse representation do not belong to

its neighborhood set. This implies the method of defining a

neighborhood using Euclidean distance does not perform well.

We exploit the technique of SSC to define a neighborhood

in the manifold Hs, where a subset Hj includes elements

that are neighbors to other elements inside the subset, but are

not neighbors to elements in different subsets. Consequently,

the linear space SHj can be considered as an approximate

tangent space (or a subspace in the tangent space if there are

not enough neighbors) of the manifold Hs. Details of the implementation of this step are shown in Algorithm 1.
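The rank-bounded splitting loop of the training phase (Algorithm 1) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the clustering step is passed in as a function `cluster_fn(X, n)` that returns one integer label per column of `X`, which in this paper's setting would wrap SSC followed by spectral clustering. The names `partition_by_rank`, `subspace_bases`, and `toy_cluster` are hypothetical, and `toy_cluster` is only a trivial stand-in used to make the example self-contained.

```python
import numpy as np

def partition_by_rank(H, cluster_fn, r0=2, n1=10, n2=2, tol=1e-8):
    """Split the columns of H (D x N) until every group has rank <= r0."""
    def split(idx, n):
        labels = np.asarray(cluster_fn(H[:, idx], n))
        return [idx[labels == c] for c in np.unique(labels)]

    pending = split(np.arange(H.shape[1]), n1)   # initial partition into n1 groups
    groups = []
    while pending:
        idx = pending.pop()
        if np.linalg.matrix_rank(H[:, idx], tol=tol) <= r0:
            groups.append(idx)                   # group is already low rank
            continue
        parts = split(idx, n2)                   # re-cluster into n2 subgroups
        if len(parts) < 2:
            groups.append(idx)                   # guard: no further split possible
        else:
            pending.extend(parts)
    return groups

def subspace_bases(H, groups, tol=1e-8):
    """Orthonormal basis of span(H_j) for each group, via the SVD."""
    bases = []
    for idx in groups:
        U, s, _ = np.linalg.svd(H[:, idx], full_matrices=False)
        bases.append(U[:, s > tol])
    return bases

def toy_cluster(X, n):
    """Hypothetical stand-in for SSC: label points by dominant coordinate pair."""
    pairs = X.shape[0] // 2
    energy = np.stack([np.sum(X[2 * b:2 * b + 2] ** 2, axis=0) for b in range(pairs)])
    block = np.argmax(energy, axis=0)
    return np.unique(block, return_inverse=True)[1].ravel() % n
```

The guard branch keeps the loop finite when the clustering routine cannot separate a group any further; in that case the returned group may still exceed the rank bound, which a practical implementation would need to handle explicitly.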

Algorithm 1 Training phase

Inputs: Set of training images, patch size $p$, constant integer $r_0$, initial number of clusters $n_1$, number of clusters in loop $n_2$
Outputs: Set of low dimensional subspaces $H_s$
Begin
    Extract HR patches of size $p$ from training images, create $H$.
    Perform SSC to partition the data points in $H$ into $n_1$ subspaces
    Loop
        If rank of any subspace > $r_0$
            Perform SSC to partition that subspace into $n_2$ subspaces
        End If
    End Loop
    Return $H_s \approx \{S_{H_j}\}_{j=1}^{k}$
End

ii) Step 2 (testing phase):

For a given LR patch $l_p$ in (5), the algorithm locates the co-occurrence HR patch $h_p$ based on the union of the linear subspaces obtained from the training phase:

(7)

Here, $d(l_p, S_{H_j})$ is the distance from a point to a linear subspace, which can be computed simply via orthogonal projection onto the linear subspace. The algorithm converts the original intractable problem (5), which requires considering every single point in the manifold $H_s$, into a tractable problem of evaluating the distance from one point to a finite set of linear subspaces. Next, the closest subspace is selected:

$$j_{\min} = \arg\min_{1 \le j \le k} d(l_p, S_{H_j}) \qquad (8)$$

Finally, the solution $h_p$ of equation (5) is the projection of $l_p$ onto its closest linear subspace:

$$h_p = \mathrm{proj}_{S_{H_{j_{\min}}}}(l_p) \qquad (9)$$
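The testing-phase search can be illustrated with a short Python sketch. It assumes each subspace is represented by a matrix with orthonormal columns (for example, obtained from an SVD of the patches in the corresponding subset) and that the distance measure is Euclidean; the function name `project_to_closest_subspace` is illustrative only.

```python
import numpy as np

def project_to_closest_subspace(l, bases):
    """Return (index of closest subspace, projection of l onto it).

    bases: list of D x r matrices with orthonormal columns (assumed).
    The distance to a subspace is the norm of the orthogonal residual.
    """
    best_j, best_dist, best_proj = -1, np.inf, None
    for j, U in enumerate(bases):
        proj = U @ (U.T @ l)                 # orthogonal projection onto span(U)
        dist = np.linalg.norm(l - proj)      # distance from l to the subspace
        if dist < best_dist:
            best_j, best_dist, best_proj = j, dist, proj
    return best_j, best_proj
```

For instance, with the coordinate plane spanned by the first two axes of $\mathbb{R}^3$ and the line along the third axis as the two candidate subspaces, the point $(1, 2, 0.1)$ is assigned to the plane and mapped to $(1, 2, 0)$.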

Fig. 1 illustrates a simple geometric intuition of the testing phase, including a simple curve $H_s$ and some HR samples $\{h_j\}$. For example, one subspace containing two close points $h_5, h_6$ forms a tangent linear approximation of the continuous $H_s$ manifold at $h_5$ and $h_6$. In our implementation,

for a given set of linear subspaces (obtained from the training

phase) and a low resolution input image, the algorithm detects

the corresponding HR patch for each LR patch. Overlapping LR patches lead to overlapping HR patches, and overlapped

pixel values will be averaged. We also exploit the technique of

back projection that enforces a global reconstruction constraint

[9] as many previous methods did.
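The averaging of overlapping HR patches can be sketched as below. This is an illustrative sketch only: patches are assumed to be square, vectorized in row-major order, and placed at known top-left positions; the name `average_overlapping` is hypothetical, and the back-projection step is not covered.

```python
import numpy as np

def average_overlapping(patches, positions, shape, p):
    """Assemble an image from p x p patches, averaging pixels covered more than once.

    patches:   iterable of length-p*p vectors (row-major, assumed)
    positions: iterable of (row, col) top-left corners
    shape:     (rows, cols) of the output image
    """
    acc = np.zeros(shape, dtype=float)   # running sum of pixel estimates
    cnt = np.zeros(shape, dtype=float)   # how many patches cover each pixel
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + p, c:c + p] += np.asarray(patch, dtype=float).reshape(p, p)
        cnt[r:r + p, c:c + p] += 1.0
    return acc / np.maximum(cnt, 1.0)    # uncovered pixels stay at zero
```

If the patches are exact crops of a single image, the averaged result reproduces that image, which gives a simple sanity check for the bookkeeping.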

IV. EXPERIMENTAL RESULTS

A. Training Phase

The set of training images, shown in Fig. 2, includes a

total of 10 images. They are selected at random from the set of training images that has been used in [2]. For each image,

only 80 HR patches are extracted for further processing; hence, we have a total number of 800 HR patches. At the first step,

all images are converted into YCbCr and our algorithm works

with luminance component only. All computations for PSNR and SSIM in our simulations are for the luminance channel.

There are four parameters in the training phase: the patch size $p$, the maximum rank in a subgroup $r_0$, the initial number of clusters $n_1$, and the number of clusters in the loop $n_2$. In our experiment, we work with $p = 8$ (so extracted patches belong to the Euclidean space $\mathbb{R}^{64}$), $r_0 = 2$, $n_1 = 10$ and $n_2 = 2$. In general, at the first step, Algorithm 1 partitions the

HR patches into 10 groups. Then in a recursive manner, each

group that has a rank greater than ro will be further partitioned

into two subgroups.
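The construction of the training matrix from these settings can be sketched as follows. This is a minimal sketch under stated assumptions: patch locations are drawn uniformly at random, each image is a 2-D luminance array, and the helper name `sample_patches` is hypothetical. With 10 images and 80 patches of size 8 x 8 per image, the resulting matrix has 64 rows and 800 columns.

```python
import numpy as np

def sample_patches(img, p=8, n=80, rng=None):
    """Draw n random p x p patches from a 2-D image, one vectorized patch per column."""
    if rng is None:
        rng = np.random.default_rng(0)
    rows = rng.integers(0, img.shape[0] - p + 1, size=n)   # top-left rows
    cols = rng.integers(0, img.shape[1] - p + 1, size=n)   # top-left columns
    return np.stack(
        [img[r:r + p, c:c + p].reshape(-1) for r, c in zip(rows, cols)], axis=1
    )
```

Stacking the per-image outputs side by side with `np.hstack` yields the data matrix whose columns are the HR samples used by the training phase.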


a) Input image   b) Neighbor Embedding   c) Sparse Representation   d) Manifold-SSC   e) Ground Truth

Fig. 3. Comparison of single image SR for baby face image magnified by a factor of 3. Left to right: input low resolution image, Neighbor Embedding method [4] (PSNR = 31.663, SSIM = 0.7772), Sparse Representation [2] (PSNR = 33.14, SSIM = 0.8164), our proposed manifold and sparse subspace clustering-based method (PSNR = 33.428, SSIM = 0.8245), and the ground truth high resolution image. Numerical results are computed for the Y component.

Image      Metric      Bicubic    SR [2]     Zeyde [3]   Manifold-SSC

Lena       PSNR (dB)   30.0986    31.0517    29.3063     31.2649
           SSIM        0.8430     0.8662     0.8502      0.8691

Mountain   PSNR (dB)   27.0522    27.7505    26.4941     27.8713
           SSIM        0.708      0.7563     0.7475      0.7603

House      PSNR (dB)   24.4172    24.8523    24.1079     24.9365
           SSIM        0.6944     0.7285     0.7102      0.7291

Lion       PSNR (dB)   28.3921    28.9604    27.6353     29.0441
           SSIM        0.7245     0.7648     0.7561      0.7662

Car        PSNR (dB)   27.4234    28.1826    27.7151     28.3789
           SSIM        0.8412     0.8655     0.8689      0.8681

Table 1. Comparison of PSNR and SSIM at magnification factor 3 for several other images (luminance channel only)
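For reference, a luminance-only PSNR of the kind reported here can be computed as in the sketch below. The exact color conversion used in the experiments is not specified in the text, so the full-range BT.601 luma weights and the peak value of 255 are assumptions, and the function names are illustrative. SSIM is not covered.

```python
import numpy as np

def luma(rgb):
    """Luminance from an RGB array (..., 3), using BT.601 weights (assumed)."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    diff = np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A typical call would be `psnr(luma(ground_truth_rgb), luma(result_rgb))`, where both inputs are hypothetical 8-bit RGB arrays of the same size.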

B. Testing Phase

We show experimental results, and we compare these results

with some related current methods, particularly the Neighbor

Embedding (NE) method [4] and the sparse representation

approach [2]. Some of the comparison results are taken from

[2]. Fig. 3 shows the results for the "baby face" image under these three different algorithms. In the NE method, a total of

100,000 training patch pairs are used for the method with

different parameters of k nearest neighbors to obtain the best

result [2]. The sparse representation based method trains two

dictionaries by using around 100,000 training HR/LR patch pairs. In terms of numerical comparison, the proposed SR-SLAM method leads to better results in both PSNR and SSIM. The sparse representation based method has a tendency to provide sharp images; however, it could lead to artifacts that do not fit well with the ground truth HR image (e.g. Fig. 3). Table 1 shows other experimental results and comparisons to some other recent methods on a different set of images.

V. CONCLUSION AND FUTURE WORK

In this paper, a method based on linear approximation of the image manifold has been proposed to deal with the problem of single image super-resolution. In the proposed method, SSC has been exploited for manifold tangent space approximation. The corresponding HR patch for a given LR patch is then located by finding the closest patch in the HR space (approximated by the tangent spaces). Our proposed method requires a very small number of training patches and does not rely on similarity in structure of the LR/HR patch spaces. The results clearly indicate the viability of the proposed approach. Future work will focus on learning the manifold structure of HR patches, as well as validating the proposed approach for super resolution of video and multiple reference images.

The authors would like to thank Li He from The University of Tennessee, Knoxville, for providing some experimental results for comparison.

REFERENCES

[1] Yang, Jianchao, et al. "Image super-resolution as sparse representation of raw image patches." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

[2] Yang, Jianchao, et al. "Image super-resolution via sparse representation." Image Processing, IEEE Transactions on 19.11 (2010): 2861-2873.

[3] Zeyde, Roman, Michael Elad, and Matan Protter. "On single image scale-up using sparse-representations." In Curves and Surfaces, pp. 711-730. Springer Berlin Heidelberg, 2012.

[4] Chang, Hong, Dit-Yan Yeung, and Yimin Xiong. "Super-resolution through neighbor embedding." Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. Vol. 1. IEEE, 2004.

[5] Freeman, William T., Egon C. Pasztor, and Owen T. Carmichael. "Learning low-level vision." International Journal of Computer Vision 40.1 (2000): 25-47.

[6] Sun, Jian, Zongben Xu, and Heung-Yeung Shum. "Image super-resolution using gradient profile prior." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

[7] E. Elhamifar and R. Vidal. "Sparse subspace clustering." Computer Vision and Pattern Recognition, CVPR 2009.

[8] Yang, Jianchao, et al. "Bilevel sparse coding for coupled feature spaces." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

[9] Irani, Michal, and Shmuel Peleg. "Super resolution from image sequences." Pattern Recognition, 1990. Proceedings., 10th International Conference on. Vol. 2. IEEE, 1990.

[10] Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.

[11] Farsiu, Sina, M. Dirk Robinson, Michael Elad, and Peyman Milanfar. "Fast and robust multiframe super resolution." Image Processing, IEEE Transactions on 13, no. 10 (2004): 1327-1344.

[12] Glasner, Daniel, Shai Bagon, and Michal Irani. "Super-resolution from a single image." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.

[13] Rene Vidal. "Subspace clustering." IEEE Signal Processing Magazine, 2011.
