15arspc submission 217



IMPROVING IMAGE QUALITY: A TECHNIQUE FOR LONG DISTANCE EARTH-BASED OBJECTS

Gabriel Scarmana

Department of Transport and Main Roads Queensland Government

[email protected]

Abstract

A technique for improving images of earth-based long-distance objects is discussed. The input data are a large number of short-exposure image frames of an object which appears distant within an image of a scene taken from a remote location. The reconstruction of a sharper, improved image is achieved in two steps:

(1) Sub-sets of short-exposure images of the same scene are merged separately using a method referred to as image stacking. This preliminary step increases the signal-to-noise ratio while effectively freezing atmospheric distortions, thereby retaining high-frequency spatial information.

(2) Once the stacking process is complete, the composites obtained from each subset in step (1) are combined via image super-resolution techniques. Super-resolution is a term given to single-image products produced by combining images of the same scene using algorithms that purport to increase the resolution of the final product. The theory is that subtle sub-pixel shifts in each image will, when combined, provide improved spatial resolution, as if the images were sampled at more points than detected by the sensor array. In this second step the final higher-resolution image is obtained by mapping a model of the image formation process using local translations, or shifts, among the composite images of step (1). These pixel shifts, if they exist, are determined by way of a rigorous least-squares area-based image matching scheme.

This paper discusses the development of the above two-step process in detail and concludes with an evaluation of its implementation using practical examples and experiments. The aim is to demonstrate the potential application of this technique for long-range surveillance systems.


1. Introduction

Image sequences are degraded by the loss of resolution due to down-sampling (not meeting the sampling theorem) and to the integration over the sensor area. However, knowledge of the sub-pixel motions between frames depicting the same scene usually allows the reconstruction of high-resolution images from low-resolution image sequences. The author acknowledges the rapid advances that have been made in hardware solutions for image sensors to address the problem of increasing the resolution of digital imagery. The work presented here does not detract from those advances; rather, it provides a complementary technique that is hardware independent and can increase the resolution of an image sequence taken from any digital image capturing device.

In this work the sequence of images is acquired using an off-the-shelf digital camera, which under-samples an object of interest within the scene. The reconstruction of a sharper, improved image is achieved in two steps:

1. A large number of images of a scene of interest are taken with a digital camera. Sub-sets of these images are then pre-processed using a method referred to as image stacking. Image stacking is the process of merging aligned images in order to enhance detail, suppress noise, remove undesired motion effects, or otherwise leverage data contained across multiple exposures of the same scene. In general terms, image stacking consists of taking several hundred images of the same scene, registering them (aligning their centres) and then stacking them (adding them together pixel by pixel and dividing each pixel sum by the number of images).

2. Once the stacking process is complete, the improved images generated by each subset in step 1 are combined via image Super-Resolution (SR) techniques. SR is a term given to a single image product, or composite, that has been produced by combining images of the same scene using procedures that increase the resolution of the final product. The theory is that accurate sub-pixel shifts in each image will, when the images are combined, provide a higher spatial resolution, as if the images were sampled at more points than were detected by the sensor array (Zhouchen and Heung-Yeung, 2004).
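Step 1 above can be sketched in a few lines of numpy. This is a minimal illustration, assuming the frame-to-frame offsets are integer and already known (in the paper they come from the normalised cross-correlation registration of Section 2); `np.roll` merely stands in for a proper resampling step:

```python
import numpy as np

def stack_images(frames, offsets):
    """Image stacking: align the frames, add them pixel by pixel,
    then divide by the number of frames (i.e. take the mean)."""
    aligned = [np.roll(f, shift=(-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, offsets)]
    return np.mean(aligned, axis=0)

# Toy demonstration: 50 noisy copies of the same pattern average
# towards the clean signal as the (assumed random) noise cancels.
rng = np.random.default_rng(0)
truth = np.outer(np.hanning(32), np.hanning(32))
frames = [truth + rng.normal(0.0, 0.2, truth.shape) for _ in range(50)]
stacked = stack_images(frames, [(0, 0)] * len(frames))

noise_single = np.abs(frames[0] - truth).mean()
noise_stacked = np.abs(stacked - truth).mean()
print(noise_stacked < noise_single)  # True: stacking reduces the noise
```

With zero offsets the function reduces to a plain pixel-wise average, which is exactly the add-then-divide operation described in step 1.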

2. Notes on image stacking

Image stacking is a popular image-processing method among astronomical photographers. The same technique can be applied to any situation where very similar but not identical images can be captured over a period of time; in other words, in situations where the scene is not changing drastically due to motion or varying light and shadow. As mentioned earlier, this pre-processing step can be used to: (1) reduce artifacts created by compression; (2) increase the signal-to-noise ratio without compromising detail in the image; (3) increase the dynamic range of an image; and (4) effectively freeze atmospheric distortions while retaining high-frequency spatial information.

Image stacking works on the assumption that the noise in an image is truly random. This way, random fluctuations above and below the actual image data gradually even out as more and more images are stacked. In order to perform a stacking operation the images must first be aligned. Image alignments of between 0.5 and 1 pixel may be sufficient to carry out this operation (Bovik, 2005). Alignment techniques compare portions of images against one another. This is carried out by way of a registration procedure referred to as normalised cross-correlation (Russ, 2007). The technique allows images to be aligned without using control points in the registration procedure. Tests by the author showed that the alignment process is satisfactory when the correlation coefficient between two aligned images is greater than or equal to 0.999. Images that do not comply with this figure are simply discarded.

As mentioned earlier, image stacking is also useful for reducing the effects of compression. This is the case for imagery which is compressed in a lossy manner, such as by the JPEG (Joint Photographic Experts Group) protocol, in order to reduce storage requirements. Lossy compression means that data is lost during compression, so the quality after decoding is less than that of the original picture (Gonzalez and Woods, 2008). Lossy compression protocols introduce several distortions which can complicate the proposed enhancement process. For example, most compression algorithms divide the original image into blocks which are processed independently, thus creating problems of continuity between blocks after decompression. Moreover, at high compression ratios (>20:1) the blocking effect is especially obvious in flat areas of an image. In areas with much detail, artefacts referred to as ringing or mosquito noise also become noticeable (Zhouchen and Heung-Yeung, 2004).

3. Notes on image Super-Resolution (SR)

The majority of the literature on SR describes three basic steps: (1) accurate estimation of the shifts among the different low-resolution images at a sub-pixel level; (2) projecting or mapping the pixels of the low-resolution images onto a higher-resolution grid using the detected shifts; and (3) interpolating or solving sets of equations derived from the geometric relationships existing between low- and high-resolution pixels. The method used here for estimating sub-pixel shifts between images of the same scene is based on a first-order Taylor series (Vandewalle et al., 2005) and can determine sub-pixel shifts with an accuracy of 0.1 pixels.
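The two ingredients just described, the correlation-coefficient acceptance test of Section 2 and a first-order Taylor shift estimate, can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation; the function names and the synthetic Gaussian scene are assumptions made for the example:

```python
import numpy as np

def correlation_coefficient(a, b):
    """Normalised cross-correlation coefficient of two equal-size images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def taylor_shift(ref, img):
    """First-order Taylor estimate of the (dy, dx) shift between images.

    Solves, in a least-squares sense,
        img - ref ~= dy * d(ref)/dy + dx * d(ref)/dx,
    which holds for small (sub-pixel) shifts on smooth images.
    """
    gy, gx = np.gradient(ref)
    A = np.column_stack([gy.ravel(), gx.ravel()])
    rhs = (img - ref).ravel()
    (dy, dx), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(dy), float(dx)

# Toy demonstration on a smooth synthetic scene shifted by (0.3, -0.2).
y, x = np.mgrid[0:64, 0:64]
scene = lambda yy, xx: np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / 200.0)
ref = scene(y, x)
img = scene(y + 0.3, x - 0.2)

print(correlation_coefficient(ref, img) > 0.99)       # True
dy, dx = taylor_shift(ref, img)
print(abs(dy - 0.3) < 0.05, abs(dx + 0.2) < 0.05)     # True True
```

The gradient-based least-squares solve is one common way to realise a first-order Taylor shift estimator; the frequency-domain approach of Vandewalle et al. (2005) cited above is a different realisation of the same idea.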


For a correct detection of the shifts between two images, the images must contain features that make it possible to match two under-sampled images. Very sharp edges and small details are most affected by aliasing, so they are not reliable for estimating these shifts. Uniform areas are also ineffective, since they are translation invariant (Farsiu et al., 2004). The best features are slow transitions between two areas of grey values, as these are generally unaffected by aliasing. Such portions of an image need not be detected specifically, although their presence is very important for an accurate result. Hence, before attempting to match a given sequence of images of the same scene to a sub-pixel level, it is recommended to uniformly apply a low-pass filter to each image. The purpose of the low-pass filter, as shown in Figure 1, is to smooth:

• Sharp edges and small details

• Sudden changes of intensity values and

• Aliasing effects
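A smoothing pre-filter of this kind can be sketched with a separable Gaussian kernel. This is an illustrative numpy sketch; the `sigma` and `radius` values are assumptions, not taken from the paper:

```python
import numpy as np

def gaussian_lowpass(img, sigma=1.5, radius=4):
    """Separable Gaussian low-pass filter (numpy only).

    Smooths sharp edges, small details and aliasing effects before
    sub-pixel matching; sigma and radius are illustrative values.
    """
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    # Filter the rows, then the columns (the 2-D Gaussian is separable).
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, out)
    return out

# A sharp step edge becomes a slow transition after filtering.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
smooth = gaussian_lowpass(step)
print(step[8, 7], step[8, 8])                    # 0.0 1.0
print(0.0 < smooth[8, 7] < smooth[8, 8] < 1.0)   # True
```

The step edge, the least reliable feature for shift estimation, is turned into exactly the kind of slow grey-value transition the matching favours.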


Figure 1 – (a) A low-resolution image of the aerial view and (b) the effect of applying a low-pass filter.

The sub-pixel motion estimator adopted here determines the x- and y-shifts and slight rotations between any two images, but what is really required is the accurate relative position of each image in a sequence. By calculating the shifts with respect to a single reference image, only one realisation of the relative positions is obtained. By repeating the procedure for another reference image, a second estimate of the relative positions is made. By continuing to repeat this process for all images in the sequence, a better estimate of the relative shifts, image to image, can be found. The statistical measure used to determine the 'best' possible value among all possible combinations of the motion vectors between a set of shifted low-resolution images is the vector median. If the vector mean were taken instead of the median, the final motion vector would be an entirely new vector, and not one of the vectors originally estimated. In addition, the mean is less robust than the median when outliers are present (Spiegel and Stephens, 1999).

4. Image reconstruction

Once all the improved low-resolution images (as obtained from the stacking operation) have been processed and matched to a sub-pixel level, they are projected, or mapped, onto a uniformly spaced high-resolution grid (see Figure 2). The values and shifts of the randomly distributed pixels of these images can then be processed to generate an image with a higher resolution. A weighted arithmetic mean can be used for this purpose, relating each known pixel of the low-resolution images to the high-resolution pixels. For example, in Figure 2 the low-resolution pixel C1 is related to the pixels of the high-resolution grid by way of Equation 1, where the Xi (i=1…25) represent the high-resolution pixels and the Cn (n=1…6) are the low-resolution pixels.

Figure 2: An idealized image enhancement set-up.

After C1 is related to the high-resolution grid, the process moves on to the next low-resolution data pixel (i.e., C2), where another equation is constructed. This sequence of equations may be thought of as "observation equations", in which the unknowns are the values of the high-resolution pixels (Xi). These linear equations can be solved as a traditional system of simultaneous equations (Fryer and McIntosh, 2001).

C1 = (w12·X12 + w13·X13 + w16·X16 + w17·X17 + … + w23·X23) / (w12 + w13 + w16 + w17 + … + w23)        (1)
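Equation 1 and its companions form an over-determined linear system. The sketch below builds one weighted-mean observation row per low-resolution sample, using the inverse-distance weights and search radius R described in the text; the function name, the toy ramp image and all parameter values are assumptions made for the example:

```python
import numpy as np

def build_observation_system(lr_samples, hr_shape, radius):
    """Build one weighted-mean observation equation (cf. Equation 1)
    per low-resolution sample.

    lr_samples : list of (y, x, value), with (y, x) already expressed
                 in high-resolution grid coordinates
    hr_shape   : (rows, cols) of the unknown high-resolution image
    radius     : search radius R centred on each low-resolution sample
    """
    hy, hx = np.mgrid[0:hr_shape[0], 0:hr_shape[1]]
    hy, hx = hy.ravel().astype(float), hx.ravel().astype(float)
    A, b = [], []
    for (y, x, value) in lr_samples:
        d = np.hypot(hy - y, hx - x)
        inside = (d < radius) & (d > 1e-9)   # HR pixels within the circle
        w = np.zeros(hy.size)
        w[inside] = 1.0 / d[inside]          # inverse-distance weights
        A.append(w / w.sum())                # one row of Equation 1
        b.append(value)
    return np.array(A), np.array(b)

# Toy example: 25 scattered samples of a simple ramp image, solved
# for a 4x4 high-resolution grid by linear least squares.
rng = np.random.default_rng(1)
samples = []
for _ in range(25):
    y, x = rng.uniform(0.0, 3.0, 2)
    samples.append((y, x, x))                # ramp: value equals x
A, b = build_observation_system(samples, (4, 4), radius=2 * np.sqrt(2))
X, *_ = np.linalg.lstsq(A, b, rcond=None)

print(A.shape)                               # (25, 16)
print(np.allclose(A.sum(axis=1), 1.0))       # True: each row is a weighted mean
```

A least-squares solve of this system also yields per-pixel residuals, which is the basis of the error estimates mentioned at the end of this section.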


The weights (w) are defined by the inverse of the distance separating the low-resolution pixel from the unknown high-resolution pixels that fall within a circle of constant radius (R). This circle is centred on each low-resolution pixel, as shown in Figure 2. The dimension of the radius R depends on the magnification factor required. As a general rule, if the magnification factor is chosen to be 2, then the minimum radius of the circle required to search all the high-resolution pixels is 2√2; more generally, for a magnification factor of n the minimum search radius is taken as n√n. The example in Figure 2 relates to a magnification factor of 4, where the final high-resolution composite has four times as many pixels in each dimension as any of the low-resolution images. To comply with sampling theory, R must ensure that the circles overlap, as it is important that each of the unknown high-resolution pixels (Xi) appears at least twice in different observation equations (Scarmana, 2009).

Note that there is an equation for each low-resolution pixel, so the number of equations is at least equal to the number of desired high-resolution pixels in the final enhanced image. Hence, when (say) 10 suitably overlapping images, each of modest size 320x320, are considered, it becomes apparent that 320x320x10 = 1,024,000 observation equations could be formed. If a magnification factor of 2 is chosen, the resultant resolution-enhanced image may require twice as many equations. Although more computationally expensive than alternative reconstruction techniques based on direct interpolation methods, this reconstruction system gives accurate estimates of the error at each computed point, thus providing a measure of confidence and reliability for the accuracy and precision of each high-resolution pixel of the enhanced image.

5. Reconstruction with synthetic images

The proposed image enhancement process was first tested for digital imaging applications using synthetic data. The performance of the method was thereafter tested with real data extracted from a sequence of images of an object within a static scene. In the synthetic experiment, the 'true' image was known prior to the enhancement and thus the accuracy of the enhancement could be investigated and quantified. A set of 600 grey-scale images of the aerial view (256 grey levels) was derived by down-sampling this image using a weighted mean average of neighbouring pixels and pre-assigned sub-pixel shift values. Each of the 600 images was also JPEG compressed using the same compression ratio (10:1). The size of the original image was 512x512 pixels, whereas the size of the under-sampled images was 128x128. Random 'salt and pepper' noise was then added to each of the 600 images: 3% of the total number of pixels was changed to either totally black or white. As illustrated in Figure 3(a), the effect is similar to sprinkling white and black dots on the image. One example where salt-and-pepper noise arises is in transmitting images over noisy digital links (Russ, 2007).
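A synthetic degradation of this kind can be sketched as follows. This sketch uses block averaging in place of the weighted-mean down-sampling and omits the JPEG compression step; the function name and parameter defaults are illustrative assumptions:

```python
import numpy as np

def degrade(image, factor=4, noise_fraction=0.03, rng=None):
    """Down-sample by block averaging and add salt-and-pepper noise,
    mimicking the synthetic test data (JPEG compression omitted)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape
    # Block mean stands in for the weighted-mean down-sampling.
    low = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Salt & pepper: force the chosen fraction of pixels to black or white.
    n = int(noise_fraction * low.size)
    ys = rng.integers(0, low.shape[0], n)
    xs = rng.integers(0, low.shape[1], n)
    low[ys, xs] = rng.choice([0.0, 1.0], n)
    return low

rng = np.random.default_rng(0)
truth = rng.random((512, 512))               # stands in for the 512x512 aerial view
low = degrade(truth, factor=4, rng=rng)
print(low.shape)                             # (128, 128)
print(((low == 0.0) | (low == 1.0)).any())   # True: noise pixels present
```

Because the true image is known, the error of any subsequent reconstruction can be measured directly, which is the point of the synthetic test.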


Figure 3 - (a) one of the 600 low-resolution, blurred and noisy images of the aerial view and (b) one of the 20 improved images obtained from the stacking process.

No rotations were applied in this test. Rotations in the images would have added extra parameters to the enhancement process and may have detracted from the strength of the conclusions reached in the experiments. Correlation obviously exists between the image plane of a digital camera and orientation parameters such as tilts, rotations and affinity/obliquity of the sensor. In a controlled experiment where the aim is to demonstrate the use of a process to enhance image resolution per se, it was thought unwise to introduce such complications.

The 600 images were then divided into subsets of 30 images each. Image stacking was applied to each subset so as to obtain 20 composites of improved quality (see the example in Figure 3(b)). Each of these composites displayed an improved image in which the effects of compression and noise were virtually eliminated. Note that the stacking process was not used to increase image resolution; its purpose is to improve quality and to eliminate unwanted distorting effects (i.e., noise) so as to prepare the images for the SR step. Subsets of 30 images were selected because this number proved sufficiently effective in a number of synthetic experiments conducted on images with the same characteristics and distortions as used in this test. Figure 4 shows the result of combining the 20 composites via SR.


Figure 4 - The final high-resolution image (512x512) as constructed using SR. The enhancement is the result of combining the 20 (128x128) image composites such as that shown in Figure 3(b). The enhanced image contains four times as many pixels in each dimension as any of the original low-resolution images.

6. Application to real imaging

A set of 350 images was taken of a weather radar station located approximately 1 km from the camera position. The scene is shown in Figure 5. The camera resolution was set to its maximum quality (5 megapixels), but only a section of the scene containing the dome of the radar station, shown in the box, was enhanced in this experiment. The dome is approximately 12 metres in diameter and is located about 150 m above mean sea level. The images were taken with the camera fixed on a tripod. An enlarged view of the radar dome, as extracted from one of the raw images taken by the camera, is shown in Figure 5(b). The proposed enhancement technique produced the results shown in Figures 5(c) and 5(d). This enhancement was obtained following the same steps applied in the synthetic example outlined in the previous section, the only difference being that 7 subsets of 50 low-resolution images were created. The resulting enhanced image of the radar station contains four times as many pixels in each dimension (240x400) as any of the original low-resolution images (60x100).




Figure 5 – (a) A view of the scene as taken by a 5-megapixel digital camera located 1 km away from the object of interest, and (b) the area of interest; (c) is the improved view of the same scene shown in (a), and (d) is an enlarged improved view of the radar dome after processing 350 low-resolution images using the proposed method.

Although this experiment relates to a grey-scale sequence, the same process can be applied to colour. Colour images can be considered as three separate images containing red, green and blue components (RGB). Each of these components, or channels, can be enhanced independently and then fused to produce a colour image with enhanced resolution (Bovik, 2005; Rees, 2007). The underlying premise is that for any colour image sequence the motion between adjacent frames should be exactly the same for each colour channel (Farsiu et al., 2004). In other words, there is only one actual motion field describing the sub-pixel shifts from one frame to the next. In practice, however, when the motion estimation is performed on each channel independently, the motion vectors may differ slightly between the colour channels, thus requiring more complex statistical computations (Scarmana, 2009).
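The per-channel approach can be sketched as below. The `enhance_colour` helper and the `double` stand-in are illustrative assumptions; `double` merely takes the place of the full stacking-plus-SR pipeline, which is not reproduced here:

```python
import numpy as np

def enhance_colour(rgb, enhance_channel):
    """Enhance the R, G and B channels independently, then fuse them.

    enhance_channel stands for the full stacking + SR pipeline applied
    to one grey-scale channel; any 2-D -> 2-D function will do here.
    """
    return np.stack([enhance_channel(rgb[..., c]) for c in range(3)], axis=-1)

# Stand-in "enhancement": nearest-neighbour 2x upsampling per channel.
double = lambda channel: np.kron(channel, np.ones((2, 2)))

rgb = np.random.default_rng(0).random((60, 100, 3))
out = enhance_colour(rgb, double)
print(out.shape)                             # (120, 200, 3)
```

Fusing the independently enhanced channels assumes a single motion field shared by R, G and B; as noted above, small per-channel differences in the estimated motion are what complicate the colour case in practice.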


7. Number of low-resolution images required

The required number of low-resolution images generally depends on the distribution of the sub-pixel shifts as well as on the signal-to-noise ratio and the amount of noise present in the imagery. For instance, to minimise the influence of noise it is important that the distribution of the shifts between the low-resolution images be as complete as possible. This is illustrated in an analogous manner in the example of Figure 6 (after Hendriks and van Vliet, 1999). In this figure, any three exact samples define a circle (left diagram).

Figure 6 - The effect of noise on geometry determination.

However, if the samples contain noise and are situated close to one another, almost any circle will fit (middle). Positioning these samples far apart ensures a more correct representation in spite of the noise (right). The reconstruction of a higher-resolution image from the minimum number of low-resolution images is possible, but it should not be expected always to achieve high accuracy, especially for higher magnification factors (>4). High magnification factors require large numbers of low-resolution images, meaning that these low-resolution images must be relatively close to one another, that is, have relatively small shifts. The accuracy of detecting those offsets clearly affects the accuracy of the final high-resolution image, as the uncertainty in a sub-pixel shift's determination may be of the same magnitude as the shift itself.

Conclusions

The problem of aliasing reduction and resolution enhancement can be addressed by exploiting multiple frames that offer unique perspectives of a specific scene of interest. The focus here was to exploit frame-to-frame translational shifts, or motions, that may result from line-of-sight jitter of a sensor mounted on a still platform. Exploiting these sub-pixel motions, however, requires accurate estimates of them. In this context, a method for enhancing the resolution of a compressed and noisy sequence using a multi-frame image enhancement approach has been presented. The process uses two techniques, referred to as image stacking followed by Super-Resolution. The technique does not rely on control points for the accurate matching, or registration, of the images. The registration methodology and subsequent use of the proposed enhancement technique may lead to a general approach to the problem of generating a higher-resolution image from compressed and noisy sequences of slightly offset images.

The application and effectiveness of the enhancement process has been demonstrated in 2D tests, and refinements to the technique are being undertaken to increase the accuracy achievable for larger image magnifications. This may extend the range of applications which could benefit from this device-independent image enhancement process, possibly adapting the method to a generalised scheme whereby both sensors and objects of interest are dynamic and the illumination is non-uniform. It is the author's belief that this enhancement technique may be ideal for surveillance systems.

References:

Farsiu S., Robinson D., Elad M. and Milanfar P., 2004. "Advances and Challenges in Super-Resolution", International Journal of Imaging Systems and Technology, Vol. 14, No. 2, pp. 47-57, August.

Fryer J. and McIntosh K.L., 2001. "Enhancement of Image Resolution in Digital Photogrammetry", Photogrammetric Engineering & Remote Sensing, Vol. 67, No. 6, pp. 741-749.

Gonzalez R.C. and Woods R.E., 2008. "Digital Image Processing", 3rd Edition, Prentice Hall, 954 pages.

Hendriks L.C. and van Vliet L.J., 1999. "Resolution Enhancement of a Sequence of Undersampled Shifted Images", Proceedings of the 5th Annual Conference of the Advanced School for Computing and Imaging (Heijen, NL, June 15-17), ASCI, Delft, pp. 95-102.

Rees W.G., 2007. "Physical Principles of Remote Sensing", Second Edition, Cambridge University Press.

Russ C.J., 2007. "The Image Processing Handbook", CRC Press, ISBN 0849372542, 817 pages.

Spiegel M.R. and Stephens L., 1999. "Theory and Problems of Statistics", Schaum's Outline Series, McGraw-Hill, 556 pages.

Scarmana G., 2009. "High-resolution Image Generation Using Warping Transformations", SIGMAP 2009, Proceedings of the International Conference on Signal Processing and Multimedia Applications, Milan, Italy, July 7-10.

Vandewalle P., Susstrunk S. and Vetterli M., 2005. "A Frequency Domain Approach to Registration of Aliased Images with Application to Super-Resolution", EURASIP Journal on Applied Signal Processing.

Zhouchen L. and Heung-Yeung S., 2004. "Fundamental Limits of Reconstruction-Based Super-Resolution Algorithms under Local Translation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1, January.