High Dynamic Range Image Compression of Color Filter Array Data for the Digital Camera Pipeline
by
Dohyoung Lee
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2011 by Dohyoung Lee
Abstract
High Dynamic Range Image Compression of
Color Filter Array Data for the Digital Camera Pipeline
Dohyoung Lee
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2011
Typical consumer digital cameras capture the scene by generating a mosaic-like grayscale image, known as a color filter array (CFA) image. One obvious challenge in digital photography is the storage of image data, which requires the development of an efficient compression solution. This issue has become more significant due to the growing demand for high dynamic range (HDR) imaging technology, which requires increased bandwidth to allow realistic presentation of the visual scene.

This thesis proposes two digital camera pipelines that efficiently encode CFA image data represented in HDR format. First, a lossless compression scheme consisting of a predictive coding step followed by a JPEG XR encoding module is introduced; it achieves efficient data reduction without loss of quality. Second, a lossy compression scheme consisting of a series of processing operations and a JPEG XR encoding module is introduced. Performance evaluation indicates that the proposed methods deliver high quality images at low computational cost.
Contents

1 INTRODUCTION
  1.1 Motivation
  1.2 Key Challenges
  1.3 Thesis Scope and Contributions
    1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline
    1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline
  1.4 Thesis Organization

2 BACKGROUND
  2.1 Digital Camera Design
    2.1.1 Digital Camera Architecture
    2.1.2 Image Processing Pipeline
    2.1.3 Color Demosaicking
    2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras
  2.2 Image Compression
    2.2.1 Common Image Compression Techniques
    2.2.2 Image Compression Standards: JPEG family
    2.2.3 Prior arts on Bayer CFA compression
  2.3 Image Quality Assessment Metrics
    2.3.1 Non-perceptual Quality Metrics
    2.3.2 Perceptual Quality Metrics

3 Lossless CFA Compression using Prediction
  3.1 Introduction
  3.2 Proposed Algorithm
    3.2.1 Deinterleaving Bayer CFA
    3.2.2 Green sub-image prediction
    3.2.3 Non-Green sub-image prediction
    3.2.4 Compression of prediction error
  3.3 Experimental Results
    3.3.1 Primary color channel and color difference channel
    3.3.2 Green channel interpolation method
    3.3.3 Dissimilarity measure in template matching
    3.3.4 Prediction algorithm
  3.4 Chapter Summary

4 Lossy CFA Compression using Colorspace Conversion
  4.1 Introduction
  4.2 Proposed Algorithm
    4.2.1 Interpolation of missing green components
    4.2.2 Interpolation of color difference components
    4.2.3 Correction of green and color difference components
    4.2.4 YCoCg color conversion
    4.2.5 Structure conversion
  4.3 Experimental Results
    4.3.1 Edge Sensing Mechanism (ESM) and Compression
    4.3.2 Color Space and Compression
    4.3.3 Proposed Pipeline and Conventional Pipelines
  4.4 Chapter Summary

5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
    5.2.1 Potential extensions on the proposed systems
    5.2.2 General future work

Bibliography
List of Tables

3.1 Lossless bitrate of proposed compression scheme with primary channel and color difference channel
3.2 Lossless bitrate of proposed compression scheme with various G interpolation schemes
3.3 Lossless bitrate of proposed compression scheme with SAD and SSE dissimilarity metrics
3.4 Lossless bitrate of various CFA compression schemes (direct CFA encoding schemes)
3.5 Lossless bitrate of various CFA compression schemes (predictive coding schemes)
3.6 Number of operations per pixel required for the proposed scheme
4.1 Encoding time for different pipelines and codecs
List of Figures

2.1 Typical optical path for single sensor cameras
2.2 Bayer CFA arrangement
2.3 Conventional Image Processing Pipeline
2.4 Alternative Image Processing Pipeline
2.5 Typical images with limited dynamic range and a HDR image
2.6 HDR image acquisition by capturing multiple images
2.7 HDR image acquisition by estimation
2.8 Image pipeline design with raw CFA image storage
2.9 Image pipeline design exploiting HDR contents compression
2.10 Block diagram of JPEG XR encoding process
2.11 CFA deinterleave process
2.12 CFA deinterleave process: G subimage
3.1 Overview of the proposed lossless CFA compression pipeline
3.2 Bayer CFA deinterleave method
3.3 Current pixel to be predicted and its 4 closest neighborhood pixels in a quincunx G sub-image
3.4 Template of G sub-image centered at (i,j). 'o' indicates pixels in the template region
3.5 Pixel values required for the prediction of G pixel at (i,j)
3.6 Weight computation for the prediction of G pixel at (i,j)
3.7 Current pixel to be predicted and its closest neighborhood pixels in a red difference (dr) sub-image
3.8 Template of red difference (dr) sub-image centered at (i,j). 'o' indicates pixels in the template region
3.9 Weight computation for the prediction of red difference (dr) pixel at (i,j)
3.10 Test digital color images (referred to as image 1 to image 31, from left to right and top to bottom)
3.11 2D autocorrelation graphs for image 4 in the database: (a) original images, R and B; (b) color difference images, dr and db
3.12 Entropy of sample images from the database with various prediction methods
4.1 Overview of the proposed lossy HDR CFA image compression pipeline
4.2 Indexing of the samples within a 5x5 window of Bayer CFA
4.3 Two versions of color space conversion
4.4 Rate-distortion curves of proposed pipelines with different ESMs for various quality metrics
4.5 Rate-distortion curves of proposed pipelines with different color spaces for various quality metrics
4.6 Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics
4.7 Full color images obtained from four examined IPPs with JPEG XR codec at bit rates between 1 and 2 bpp. The first 4 images are sub-regions of image 18, the next 4 are from image 21, and the last 4 are from image 1 in the database
Acronyms

ALCM Activity level classification model
ASIC Application-specific integrated circuit
BPP Bits per pixel
CCD Charge-coupled device
CDM Color demosaicking
CFA Color filter array
CMBP Context matching based prediction
CMOS Complementary metal oxide semiconductor
DCT Discrete cosine transform
DSP Digital signal processor
DWT Discrete wavelet transform
ESM Edge-sensing mechanism
EXIF Exchangeable image file
HDR High dynamic range
HDRI High dynamic range imaging
HVS Human visual system
JPEG Joint Photographic Experts Group
JPEG XR JPEG extended range
LBT Lapped bi-orthogonal transform
LDR Low dynamic range
MOS Mean opinion score
MSE Mean square error
PSNR Peak signal-to-noise ratio
RCT Reversible color transform
SAD Sum of absolute differences
SM Spectral model
SSE Sum of squared errors
SSIM Structural similarity index
UHDTV Ultra-high-definition television
VDP Visual difference predictor
Chapter 1
INTRODUCTION
1.1 Motivation
Over the past years, advancements in color imaging technology have reduced the complexity, size, and cost of color devices, such as digital cameras, monitors, and printers, allowing more convenient access to them in various environments. One of the rapidly evolving fields in color imaging technology is digital photography, which has gained significant popularity in recent years. To create an image of a scene, digital cameras use a sensor, an array of light-sensitive spots called photosites, each of which records the total intensity of the light reaching its surface. Commonly used image sensors are monochromatic and cannot record color information directly. Among existing solutions, single-sensor imaging technology, which captures visual scenes in color using a monochrome sensor in conjunction with a color filter array (CFA), offers a good tradeoff among cost, performance, and complexity. Thus, the single-sensor solution is widely adopted in typical consumer-grade digital cameras. Due to the advancement and proliferation of emerging digital camera based applications and commercial devices, such as multimedia mobile phones, sensor networks, and personal digital assistants (PDAs), the demand for single-sensor imaging and digital camera image processing solutions will grow considerably in
the next decade. [1]
Digital cameras embed a series of signal processing operations in their processors to produce digital images; this sequence is called an image processing pipeline. The three main components of the image processing pipeline are image acquisition, image transmission/storage, and image visualization. Since the pipeline design is a key factor determining the image quality and computational efficiency of digital cameras, a significant amount of research effort has been devoted to it. At the first stage of the pipeline, single-sensor cameras produce a mosaic-like image formed by intermixing samples from the RGB channels, also called a raw CFA image. The CFA image differs from a full color RGB image in that it contains only one color component at each pixel. To convert the CFA image to a full color RGB image, the two missing components of each pixel are estimated by a demosaicking operation. Various image processing techniques are then applied to the full color demosaicked image to enhance image quality. Finally, the enhanced image is compressed to reduce memory consumption. Recently, this demosaicking-first approach has been found to be sub-optimal in terms of compression efficiency. An alternative solution, which performs compression prior to demosaicking, has been proposed; it raises an issue specific to single-sensor cameras, namely the compression of a mosaic-like CFA image.
One of the most challenging and rapidly emerging issues for digital cameras is supporting high dynamic range imaging (HDRI) technology. HDRI uses a larger number of bits to represent each pixel of a digital image than conventional systems and thus provides increased tonal resolution. As a result, it achieves a more realistic representation of the visual scene with smoother gradation. It is foreseeable that the imaging industry will inevitably transition to HDRI technology in the near future. This change will affect all stages of the digital camera image processing pipeline, from data acquisition to visualization. In particular, the increase in dynamic range leads to an increased number of bits in the image data. For example, many digital cameras have started to produce CFA images in high bit-depth formats, typically between 10 and 16 bits per pixel (replacing the conventional 8 bits). Therefore, it has become highly important to develop efficient compression techniques for HDR CFA images in order to use costly storage effectively.
The purpose of this thesis is to propose efficient compression schemes for single-sensor digital cameras to encode CFA images given in HDR format. The proposed systems are designed to minimize the amount of memory required to store HDR CFA data, while keeping computational requirements low due to the limited resources in digital cameras. The development of an efficient HDR CFA compression scheme will ultimately enable ordinary users to experience promising HDRI technology in consumer level cameras and allow considerable improvement of the visual realism of digital visual content.
1.2 Key Challenges
In designing an efficient CFA image compression scheme, a number of engineering decisions need to be made. This section lists the general challenges and considerations associated with such a design for digital cameras. The main concerns are cost, image quality, operational/power efficiency, and portability. [2]
• Dynamic Range (Image Precision): Recently, high dynamic range imaging (HDRI) technologies have gained significant popularity in various fields, such as the movie, digital photography, and computer graphics industries. The research trend in digital photography is shifting from the enhancement of spatial resolution to tonal resolution, and significant emphasis is given to the incorporation of HDRI technologies into consumer level digital cameras. HDRI addresses the limitations of traditional low dynamic range imaging (LDRI) by providing a wider range of luminance information to achieve a more precise representation of real visual scenes. Consequently, HDRI technology can represent the entire dynamic range of luminance that humans can perceive. [3] In order to support HDRI in digital cameras, each stage in the image processing pipeline (IPP), from image acquisition to visualization, should be updated to handle image data in HDR format. In particular, the proposed CFA compression scheme should retain the high bit-depth of the given CFA data.
• Cost/Operational Efficiency: Production cost and operational efficiency are two closely related factors in IPP design. The proposed scheme should efficiently manage expensive camera on-board memory and other computational resources. Embedding sophisticated algorithms in a resource-constrained system is a challenging task due to hardware limitations and cost. The ideal solution exploits low complexity techniques in on-board processors and offloads high complexity algorithms to end devices, where sufficient processing power is available.

For optimum computational efficiency, computing hardware on a camera can be explicitly designed to implement a given processing algorithm in the form of an application-specific integrated circuit (ASIC). However, developing a new ASIC is an expensive process requiring relatively high usage volumes to make the approach financially attractive, and once constructed, the image processing chain in the ASIC cannot be changed. On the other hand, a digital signal processor (DSP) provides a significant degree of freedom over an ASIC block, as the DSP is a programmable device; the DSP is also advantageous in terms of production cost. In terms of processing speed, the ASIC is a better choice than the DSP, as the ASIC is dedicated to a given task and thus more optimized.
• Image Quality: The proposed compression scheme should be able to reproduce color with great fidelity and high accuracy. The quality of the final images is affected by the selection of processing algorithms. There are two categories of approaches depending on the nature of compression: lossless and lossy. A lossless compression algorithm does not allow any loss of image quality: the regenerated image after decompression is an exact replica of the original. Lossless compression is applicable to areas such as medical imaging, image archiving systems, cultural heritage preservation, and surveillance systems. On the other hand, a lossy compression algorithm aims to achieve a higher compression ratio than a lossless one by allowing marginal image distortion. Thus, part of the original data can be lost with a lossy approach, but it should maintain good perceptual quality in the reconstructed image.
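The lossless requirement above can be stated operationally: decompression must reproduce the input bit for bit. A minimal round-trip check using Python's general-purpose zlib codec illustrates the idea (purely illustrative; the schemes proposed in this thesis use JPEG XR, not zlib):

```python
import zlib

import numpy as np

# 16-bit sensor-like values standing in for an HDR CFA image
rng = np.random.default_rng(0)
data = rng.integers(0, 65536, size=1024, dtype=np.uint16).tobytes()

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

# A lossless codec must reproduce the input exactly, bit for bit
assert restored == data
```

Note that random data is nearly incompressible; real CFA images contain the spatial redundancy that predictive coding exposes before entropy coding.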
1.3 Thesis Scope and Contributions
This thesis focuses on implementing color filter array (CFA) compression schemes for the digital camera pipeline that efficiently encode CFA images given in high dynamic range format (high bit-depth). Although various other CFA patterns exist, we focus only on the Bayer CFA, since it is the most commonly used one in the industry due to its optimal spatial arrangement [4]. Hereafter, whenever a CFA image is mentioned, a Bayer CFA image is meant unless specifically stated otherwise. Two different types of compression schemes are proposed in this thesis. The first proposed solution encodes HDR CFA data without loss of quality, referred to as a lossless scheme. The other solution is a lossy scheme that compresses the HDR CFA image with marginal quality loss to enhance compression efficiency.
1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline

The first contribution of this thesis is a lossless Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing operations followed by a JPEG XR encoding module. A deinterleaving step separates the CFA image into sub-images of a single color channel, and each sub-image is processed by a proposed weighted template matching based prediction. The utilized JPEG XR codec allows the compression of HDR data at low computational cost. Extensive experimentation is performed using sample test HDR images to validate performance, and the proposed pipeline outperforms existing lossless CFA compression solutions in terms of compression efficiency.
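The deinterleaving step mentioned above can be sketched as splitting the mosaic into four quarter-size planes, one per position in the 2x2 Bayer block (a simplified illustration assuming a GRBG layout; the actual sub-image structure is detailed in Chapter 3):

```python
import numpy as np

def deinterleave_bayer(cfa):
    """Split a Bayer CFA image into four quarter-size sub-images,
    one for each sample position of the 2x2 block (GRBG assumed)."""
    g1 = cfa[0::2, 0::2]  # green samples on even rows
    r = cfa[0::2, 1::2]   # red samples
    b = cfa[1::2, 0::2]   # blue samples
    g2 = cfa[1::2, 1::2]  # green samples on odd rows
    return g1, r, b, g2

cfa = np.arange(16, dtype=np.uint16).reshape(4, 4)
g1, r, b, g2 = deinterleave_bayer(cfa)
# The four 2x2 sub-images together cover every CFA sample exactly once
```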
1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline

The second contribution of this thesis is a lossy Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing steps followed by a JPEG XR encoding module. An 8-directional edge sensing mechanism and an inter-channel correlator are used to reduce estimation errors and preserve edge related information when estimating missing color components. The utilized YCoCg color space allows for a simplified pipeline implementation and the delivery of high quality results. The proposed solution is tested using sample HDR images, and performance is validated using three image quality assessment metrics: composite peak signal-to-noise ratio (CPSNR), multi-scale structural similarity index (MSSIM), and HDR visual difference predictor (HDR-VDP). Extensive experimentation reported in this thesis indicates that the proposed lossy compression solution is suitable for resource-limited environments due to its low complexity and high performance.
1.4 Thesis Organization
The remainder of this thesis is organized as follows. Chapter 2 provides the necessary background information and a review of previous work related to single-sensor imaging technology, HDRI technology, image compression techniques, and image quality assessment metrics. Our proposed CFA compression schemes are presented in Chapters 3 and 4. In Chapter 5, we conclude this thesis and discuss some limitations and practical issues to be considered in future research.
Chapter 2
BACKGROUND
In this chapter, we introduce technical concepts and existing research on the digital camera processing pipeline, high dynamic range imaging, the fundamentals of image compression, and image quality assessment metrics.
2.1 Digital Camera Design
2.1.1 Digital Camera Architecture
In digital cameras, the color information of a real-world scene is acquired through an image sensor, usually a charge-coupled device (CCD) [5] or complementary metal oxide semiconductor (CMOS) sensor [6], as a superimposition of three primary colors: red (R), green (G), and blue (B). Commonly used image sensors are monochromatic devices that sense light within a limited frequency range and therefore cannot acquire color information directly. Due to the monochromatic nature of the image sensor, digital camera manufacturers have implemented several solutions to capture the visual scene in color. The most straightforward approach is to use three separate sensors to capture the R, G, and B light: a beam splitter projects the light through three color filters toward the three sensors. However, a sensor is one of the most expensive components of a digital camera, usually accounting for up to 25 percent of the total production cost [7], and thus the three-sensor method is only used for high-end professional cameras. The cost effective alternative to the three-sensor approach is single-sensor imaging technology. To reduce cost and complexity, most digital cameras are equipped with a sensor coupled with a color filter array (CFA). A CFA is a mosaic of color filters placed on top of a conventional CCD/CMOS image sensor to filter out two of the R, G, and B components at each pixel position. Consequently, a digital image acquired through a CFA, called a raw CFA image, stores only a single R, G, or B measurement at each pixel, and the missing components are regenerated through a color demosaicking (CDM) process, also known as CFA interpolation. [1] The typical optical path for a single sensor camera is shown in Figure 2.1.
Figure 2.1: Typical optical path for single sensor cameras
Figure 2.2: Bayer CFA arrangement
A number of RGB CFAs with various layouts of color filters are used in practice. Since the CFA is placed at an early stage of the image acquisition pipeline, it determines the maximum resolution, image quality, and computational efficiency achievable by the subsequent processing pipeline. The most common CFA design is the Bayer pattern [8], which contains two green, one blue, and one red sample arranged in a 2x2 block, as shown in Figure 2.2. The green component in the Bayer CFA is measured at double the sampling rate because the human visual system (HVS) is more sensitive to the green portion of the spectrum.
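The 2x2 sampling pattern described above can be made concrete with a short sketch that subsamples a full-color image into a Bayer mosaic (an illustration assuming a GRBG layout; actual cameras perform this sampling optically with the filter array):

```python
import numpy as np

def bayer_mosaic(rgb):
    """Subsample a full-color H x W x 3 image into a single-channel
    Bayer CFA mosaic (GRBG: G and R on even rows, B and G on odd rows)."""
    h, w, _ = rgb.shape
    cfa = np.zeros((h, w), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 1]  # green
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 0]  # red
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 2]  # blue
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 1]  # green
    return cfa

rgb = np.random.default_rng(1).integers(0, 65536, (4, 4, 3), dtype=np.uint16)
cfa = bayer_mosaic(rgb)
# Half of the retained samples are green, matching the Bayer design
```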
2.1.2 Image Processing Pipeline
Digital cameras embed a series of signal processing operations in their processors to produce images; this sequence is called an image processing pipeline (IPP). The pipeline design plays a key role in generating high quality images in digital camera systems. Although the sequence of operations differs from manufacturer to manufacturer, a general image pipeline consists of a series of processing functions, as shown in Figure 2.3. In a typical digital camera pipeline architecture, CDM is one of the first operations performed after CFA image acquisition. CDM is a mandatory process that restores the color information from the original CFA image. The demosaicked RGB images are then modified by adjusting white balance and performing color and gamma correction, so that the colors of the input scene are matched when rendered on a display device. White balancing removes the color tint of an image to make white objects appear white. Color correction transforms the CFA sensor color space to a standard RGB space, such as linear sRGB [9]. Gamma correction adjusts the image intensity to compensate for the non-linearity of CRT or LCD displays. Once the adjustment and correction processes are completed, the enhanced image is compressed for storage or transmission. Typical cameras commonly store the image in a compressed format using the Joint Photographic Experts Group (JPEG) standard [10]. The exchangeable image file (EXIF) format [11] allows the storage of additional metadata about the camera and the image characteristics along with the JPEG-compressed image data. [1] A drawback of the conventional IPP in Figure 2.3 is that CDM does not increase the information content of the original image, but introduces redundancies by estimating missing pixels, consuming substantial camera storage. Since the objective of image compression is to reduce redundancies in image data, compressing demosaicked images can be counterproductive. To avoid this issue, an alternative IPP, shown in Figure 2.4, which reverses the CDM and compression stages, can be utilized. [12]
Figure 2.3: Conventional Image Processing Pipeline
Figure 2.4: Alternative Image Processing Pipeline
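The white balance and gamma correction stages described above amount to simple per-pixel operations. The sketch below uses hypothetical channel gains and a generic 2.2 power law; real cameras derive their gains from scene statistics and use device-specific response curves:

```python
import numpy as np

def white_balance(rgb, gains=(1.8, 1.0, 1.4)):
    """Scale each channel by a gain so that white objects appear white.
    The gains here are illustrative placeholders, not calibrated values."""
    return np.clip(rgb * np.asarray(gains), 0.0, 1.0)

def gamma_correct(rgb, gamma=2.2):
    """Apply a power-law encoding to compensate for display non-linearity."""
    return rgb ** (1.0 / gamma)

linear = np.full((2, 2, 3), 0.25)  # linear-light RGB values in [0, 1]
out = gamma_correct(white_balance(linear))
# Gamma encoding lifts mid-tones: 0.25 maps to roughly 0.53 on the G channel
```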
In the alternative scheme, the CFA image is compressed prior to converting it to a full color image. The main advantage of the alternative IPP is that the number of CFA samples is only one third of that in the full color image, thus requiring less computational resources and storage capacity. In addition, this approach allows CDM and other enhancement/correction operations to be performed in the end device rather than inside the camera. Offloading CDM from the camera to an end device, such as a personal computer (PC), allows the use of a highly sophisticated CDM algorithm to produce a more visually pleasing color output, because computational cost is less of an issue in that setting. Moreover, it simplifies the hardware architecture and reduces the cost, processing delay, and power consumption of digital cameras. Experimental results from the literature [13, 14, 15] suggest that the alternative IPP can generate images of similar or higher quality than the conventional chain at low compression ratios.
2.1.3 Color Demosaicking
Color demosaicking (CDM) [16, 17] is a crucial operation in the single-sensor imaging pipeline that restores the color image from the raw mosaic sensor data. The image acquired through a CFA appears as an interleaved mosaic, similar to a grayscale image, and the missing components in the CFA image are reconstructed through CDM to produce a complete RGB image. Thus, the objective of CDM is to transform a K1 × K2 grayscale image z : Z² → Z into a K1 × K2 full color image x : Z² → Z³. The CDM process can be modeled as an interpolation function fϕ, which defines the relationship between the output image x and the input CFA image z as follows:
x = fϕ(Λ, Ψ, ζ, z)    (2.1)

where Λ is the ESM (edge-sensing mechanism) operator, Ψ is the SM (spectral model) operator, ζ is the local neighborhood area, and z is the CFA image.
The edge-sensing mechanism (ESM) operator Λ = {w(i,j); (i, j) ∈ ζ} generates an edge-sensing weight w(i,j) for each individual neighborhood pixel on the basis of edge direction, so that the structural information of the input image z is preserved when estimating missing information. Non-data-adaptive ESM operators use simple linear averaging models and fixed weights for all surrounding pixels, resulting in blurred edges. Data-adaptive ESM operators, on the other hand, produce better quality full-color images with enhanced fine details by adjusting the edge-sensing weight factors of the surrounding pixels.

The spectral model (SM) operator Ψ uses the correlation between color channels to eliminate spectral artifacts in the output image x. There are two fundamental inter-channel correlation models: the color ratio rule [18] and the color difference rule [19]. The first model employs the property that the ratio of two color channels is constant over local regions; it assumes that within a given object, the ratios R/G and B/G are locally stationary.
The second model is based on the property that the color difference signals between the R, G, and B images vary slowly and can thus be regarded as locally constant. Instead of estimating the original intensities of the two chromatic color channels, R and B, color difference based algorithms estimate the difference signals, R−G or B−G, in order to derive the missing values.

It is essential to use appropriate ESM and SM operators in order to reduce excessive blur, color shifts, and visible aliasing effects during the demosaicking process. Equation (2.1) reflects important characteristics of natural scenes, namely: i) non-stationarity due to the existence of edges and fine details, ii) the existence of inter-channel correlation among the RGB channels, and iii) the existence of intra-channel correlation among spatially neighboring pixels. [20]
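As a concrete illustration of the color difference rule, the following sketch estimates a missing R value at a green-sampled pixel by averaging the R−G differences observed at nearby red-sampled positions (a simplified scalar example, not the demosaicking algorithm proposed later in this thesis):

```python
def estimate_red_at_green(g_center, neighbors):
    """Apply the color difference rule: R - G is assumed locally constant,
    so the missing R equals the local G plus the average R - G offset.
    `neighbors` holds (r_value, g_estimate) pairs at red-sampled positions."""
    diffs = [r - g for r, g in neighbors]
    return g_center + sum(diffs) / len(diffs)

# Hypothetical neighborhood: red samples with interpolated green values
neighbors = [(120, 100), (130, 108), (118, 101), (126, 103)]
r_hat = estimate_red_at_green(g_center=105, neighbors=neighbors)
# Average R-G offset is 20.5, so the estimate is 105 + 20.5 = 125.5
```

Estimating the slowly varying difference signal rather than raw R intensity is what suppresses the color artifacts mentioned above.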
2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras
Currently, the research emphasis in digital photography is shifting from spatial resolution to tonal resolution, and a significant amount of research effort has been devoted to HDRI. HDRI is an imaging technology that enables a more realistic representation of the visual scene than conventional technologies by increasing the dynamic range of the image data. The dynamic range of a digital camera refers to the ratio between the maximum charge that the sensor can collect and the minimum detectable charge that just overcomes sensor noise. Once the light intensities of a real world scene are measured by the sensor, they are quantized to produce digital data, traditionally 8 bits per component, which gives 256 distinct levels. [21] However, the 8-bit representation is often not sufficient to represent the range of intensity levels in visual scenes containing both very bright and very dark areas at the same time, and this limitation often results in improper exposure in captured images. For instance, in a digital image captured with low exposure settings, dark areas in the scene will be recorded as black (underexposure). Conversely, with high exposure
settings, bright areas will be saturated (overexposure). HDRI performs operations on color data with more than 8 bits per component to represent more tonal levels over a much wider dynamic range. For example, a 16-bit format can be used to represent pixels in an HDR image, which provides 65,536 (= 2^16) tonal levels. This is sufficient to reveal more detail in complex scene lighting conditions. Figure 2.5 demonstrates poorly captured images due to limited dynamic range and an HDR image that preserves a wide dynamic range of light intensities. It can be seen that the texture pattern on the wall is hidden in dimly illuminated areas in the low exposure image, while detail of the stained glass is not visible due to saturation in the high exposure image. On the other hand, the final HDR image reveals all details without loss of information.
(a) image taken with low exposure time (b) image taken with high exposure time
(c) HDR image
Figure 2.5: Typical images with limited dynamic range and a HDR image
This section provides a brief overview of the three major components in the HDR
image processing pipeline for digital cameras: image acquisition, compression, and visualization. Particular emphasis is given to the acquisition and compression of HDR images, which are generally performed on digital cameras rather than on end devices.
HDR Content Acquisition
There are two common approaches to produce HDR images in single sensor digital cameras: i) capture images directly with an HDR sensor, or ii) generate HDR images by combining multiple low dynamic range (LDR) images taken at more than one exposure level using a regular sensor. Due to the high production cost associated with an HDR sensor, the latter approach is more practical for consumer level cameras. In order to generate an HDR image, multiple photos at different exposure values are captured and combined together to obtain good detail in all areas of a scene. The merging of multiple LDR images, the so-called HDR reconstruction process, involves the characterization of the sensor's intensity response function f, which relates an image pixel value zij to an actual scene radiance value Eij as follows:
zij,k = f(Eij∆tk + ηij) (2.2)
A collection of k differently exposed pictures of a scene acquired with known variable exposure times ∆tk and sensor noise ηij gives a set of zij,k values for each pixel ij, where k is the index over exposure times. Once f is recovered, the actual scene radiance values are obtained by applying its inverse f−1 to the set of corresponding brightness values zij,k observed in the differently exposed images. One of the most popular techniques for HDR reconstruction is the Debevec and Malik method, shown in Figure 2.6 [22]. It is a two-stage HDR reconstruction algorithm that estimates a non-parametric response function from image pixels and then recovers the radiance map.
Figure 2.6: HDR image acquisition by capturing multiple images
The input to the algorithm is a number of digital images taken from the same vantage point with different known exposure durations ∆tk. It is assumed that the scene is static, the sensor noise ηij is negligible, the irradiance values Eij,k for each pixel ij are constant, and f is monotonic, thus invertible. The camera response function f then satisfies

zij,k = f(Eij ∆tk)
f−1(zij,k) = Eij ∆tk
ln f−1(zij,k) = ln Eij + ln ∆tk
g(zij,k) = ln Eij + ln ∆tk , where g = ln f−1    (2.3)
The algorithm finds the function g and the radiances Eij that best satisfy an objective function in a least-squared error sense. Once g is obtained, it can be used to convert pixel values to relative radiance values Eij using the known ∆tk. For multiple capture approaches, it is essential that the scene is completely static during the captures. Otherwise, misalignment between images, due to movement of either objects in the scene or the camera, causes a ghosting effect, which introduces blurry or transparent artifacts in the generated HDR image. Several techniques have been proposed to reduce the ghosting problem: i) use a tripod to eliminate camera movement, ii) capture the scene with a faster shutter speed to freeze the motion of objects, and iii) exploit anti-ghosting techniques [23, 24].
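Once the response curve g is known, the second stage of the reconstruction reduces to a per-pixel weighted average of log-radiance estimates over the exposures. The sketch below illustrates that step under stated assumptions: a grayscale 8-bit sensor, g supplied as a precomputed lookup table, and a simple hat-shaped weighting that trusts mid-range pixel values most. The function and variable names are illustrative, not taken from [22].

```python
import numpy as np

def recover_radiance(images, exposure_times, g, z_max=255):
    """Recover a relative radiance map from differently exposed LDR images.

    images:         list of K grayscale images (2-D integer arrays)
    exposure_times: list of K exposure durations dt_k (seconds)
    g:              lookup table with g[z] = ln f^-1(z), length z_max + 1
    """
    z_mid = z_max / 2.0
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros(images[0].shape, dtype=np.float64)
    for img, dt in zip(images, exposure_times):
        z = img.astype(np.int64)
        w = z_mid - np.abs(z - z_mid)       # hat weight: zero at the extremes
        num += w * (g[z] - np.log(dt))      # per-exposure log-radiance estimate
        den += w
    ln_E = num / np.maximum(den, 1e-9)      # weighted average, avoid div by zero
    return np.exp(ln_E)                      # relative radiance map E_ij
```

With a linear response (g[z] = ln z) and two exposures of the same scene, the recovered map reproduces the underlying radiances up to scale.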
Recently, a new HDR acquisition technique [25] has been proposed that does not require multiple captures. This method, shown in Figure 2.7, generates multiple LDR
images at different exposure levels from a single input Bayer CFA image using predefined look-up tables (LUTs), and merges the original and generated LDR images together to produce a final HDR image. Since this method removes the need for iterative capture and avoids the ghosting artifacts caused by moving objects, it is a reasonable solution that makes HDRI technology feasible in single sensor imaging devices, alongside the multiple LDR capture method.
Figure 2.7: HDR image acquisition by estimation
HDR Image Compression
As discussed in the previous section, there are different techniques to create HDR images in digital cameras. Compression of the acquired HDR content forms the next component in the processing chain. Nowadays, high-end/professional cameras allow the storage of the raw CFA data at high bit-depth, typically between 10 and 16 bits per pixel. For example, a popular high-end camera, the Canon EOS 5D Mark II, provides raw CFA images at a bit depth of 14 bits. The increase in bit depth leads to an increased amount of image data and calls for more efficient encoding algorithms. JPEG compression, the most widely used image compression solution, disallows the future manipulation offered by high bit-depth data since it is limited to an 8-bit representation. Therefore, original HDR content must be squashed into 8 bits prior to applying JPEG compression, causing a loss of precision. Current high-end cameras address this issue by allowing the storage of the raw CFA image without compression, as illustrated in Figure 2.8.
Figure 2.8: Image pipeline design with raw CFA image storage
In such a design, the user can retrieve CFA images from the digital camera and perform high quality post-processing operations on a PC without loss of HDR content. Camera manufacturers provide different types of raw files, such as CR2 (Canon), NEF (Nikon), ORF (Olympus), PEF (Pentax), RW2 (Panasonic) and SR2 (Sony), mostly based on the TIFF file format. However, preserving CFA images in a raw format leads to excessive consumption of the camera's storage memory. Figure 2.9 presents an image processing pipeline that addresses the storage inefficiency issue associated with HDR data.
Figure 2.9: Image pipeline design exploiting HDR contents compression
In the proposed IPP, an image compression standard capable of handling high bit-depth data, such as JPEG XR or JPEG 2000, is applied immediately after raw CFA image acquisition. It allows the CFA image to be compressed while retaining the high bit-depth data necessary for future manipulation. Ultimately, the user is offered efficient usage of expensive memory resources while maintaining superior image quality during various post-processing operations.
HDR Image Display
Displaying HDR content is the last component of the HDR image processing chain. HDR content usually cannot be directly displayed on common display devices, such as LCD or CRT monitors, as the dynamic ranges of such devices are limited to the conventional 8-bit representation. The tone mapping process performs a conversion that takes the luminance of an HDR image as input and produces output pixel intensities that can be displayed on standard display devices. Several tone mapping algorithms have been proposed in the literature, and they are categorized in two classes: global approaches [26, 27] and local approaches [28]. Global tone-mapping algorithms apply the same transfer function to all pixels. On the other hand, local tone-mapping algorithms adapt the mapping function depending on local statistics and pixel contexts. Generally, there is no single method that produces the best result for all images and thus, users need to select an optimal algorithm based on their particular requirements and available computational resources.
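As an illustration of the global class, the sketch below applies a Reinhard-style global operator: every pixel passes through the same compressive transfer function L/(1+L) after the scene luminance is scaled by a key value. This is a simplified sketch for illustration; the function name, the key parameter `a`, and the choice of operator are assumptions, not a method prescribed in this thesis.

```python
import numpy as np

def tone_map_global(luminance, a=0.18, eps=1e-6):
    """Global tone mapping sketch (Reinhard-style operator).

    Maps HDR scene luminance to display range [0, 1) by applying one
    transfer function to all pixels, after normalizing the scene to a
    chosen key value `a`.
    """
    # log-average ("key") of the scene luminance
    L_avg = np.exp(np.mean(np.log(luminance + eps)))
    L = a * luminance / L_avg          # scale the scene to the chosen key
    return L / (1.0 + L)               # global, monotonic compression into [0, 1)
```

Because the same monotonic function is applied everywhere, relative ordering of intensities is preserved, which is the defining property of the global class.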
2.2 Image Compression
In digital imaging, each pixel is a sample of an original image, and its intensity is typically represented with a fixed number of bits. Statistical analysis indicates that digital images contain a significant amount of spatial and spectral redundancy. Image compression aims at taking advantage of these redundancies to reduce the number of bits needed to represent an image. In addition, the insensitivity of the HVS allows a further reduction of bandwidth by discarding certain signals that are not perceptible to humans. This section elaborates on fundamental image compression techniques, common image compression standards, and various CFA compression algorithms for single sensor imaging devices.
2.2.1 Common Image Compression Techniques
Color Space Conversion
A digital image generally has three color components per pixel: R, G, and B. Instead of coding RGB data directly, common compression standards exploit a color space conversion to transform it into a luminance/chrominance system. The luminance/chrominance system defines a color space in terms of one luminance and two chrominance components. Luminance is the perceived brightness of the light, while chrominance is defined as the characteristic of light that produces the sensation of color apart from luminance [1]. Luminance/chrominance spaces are advantageous over RGB for two major reasons. Firstly, for general color images, inter-channel correlation can be reduced by converting RGB images to luminance/chrominance images; thus, the color space conversion allows better compression performance. Secondly, it is a more convenient form in which to apply subsampling, which allows the reduction of visually redundant content that is less perceptible to humans. The most commonly used luminance/chrominance system in multimedia compression is the YCbCr space. The forward and inverse conversions between RGB and
YCbCr are defined in the JPEG 2000 specification as follows [29]:Y
Cb(U)
Cr(V )
=
0.299 0.587 0.144
−0.169 −0.331 0.5
0.5 −0.4187 −0.08
R
G
B
⇐⇒
R
G
B
=
1 0 1.402
1 −0.344 −0.714
1 1.772 0
Y
Cb
Cr
(2.4)
The conversion process in (2.4) is computationally expensive due to its floating point arithmetic. Recently, the YCoCg color space was introduced to simplify the color transformation by avoiding the use of floating point coefficients and the associated rounding errors. This new color space defines two chrominance channels, Co and Cg, which can be regarded as excess orange and excess green. The transform matrix of YCoCg was derived as a close approximation of the Karhunen-Loeve transform (KLT) computed on the standard Kodak image set, and the transform can be implemented using simple additions and right shifts as follows [30]:
    | Y  |   |  1/4   1/2   1/4 | | R |
    | Co | = |  1/2    0   -1/2 | | G |
    | Cg |   | -1/4   1/2  -1/4 | | B |

    | R |   | 1    1   -1 | | Y  |
    | G | = | 1    0    1 | | Co |
    | B |   | 1   -1   -1 | | Cg |
                                      (2.5)
The reversible form of the YCoCg transform, referred to as YCoCg-R, is used in the JPEG XR standard and in recent editions of the H.264/MPEG-4 AVC standard.
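Note that (2.5) shows the plain YCoCg matrix; the reversible variant replaces it with a sequence of lifting steps using only integer additions and arithmetic shifts, which is what guarantees exact inversion for lossless coding. The sketch below illustrates this lifting structure; function names are illustrative.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform (integer adds and shifts only).

    Each step is individually invertible, so the whole transform is
    exactly reversible on integer inputs.
    """
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every lifting step is undone exactly, the round trip reproduces the input RGB triple bit-for-bit, unlike the floating point conversion in (2.4).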
Predictive Coding
Instead of encoding the original signal directly, a predictive coding technique, also known as differential coding, encodes the difference between the original signal and its prediction. Since pixels in a natural image are highly correlated with each other, a pixel can be predicted with good accuracy from its adjacent pixels. The predicted value is then subtracted from the original value of the corresponding pixel to obtain a prediction error, also called a prediction residue. The performance of predictive coding is significantly affected by the accuracy of the prediction algorithm. If the predictor is well designed, the distribution of the prediction error signal will be closely concentrated around zero and the variance of the error signal will be much lower than that of the original signal. Consequently, applying entropy coding to the prediction error signal will improve compression efficiency.
Predictive coding is often used in lossless compression standards. One of the most popular compression standards making use of predictive coding is JPEG-LS [31]. The JPEG-LS standard exploits a predictor called the Median Edge Detector (MED), which provides a good balance between prediction accuracy and computational simplicity. It predicts the value of the current pixel by examining its three neighboring pixels in the North, West, and North-west directions. Another lossless image codec, CALIC [32], employs an advanced predictor called the Gradient Adjusted Predictor (GAP) that provides higher prediction performance by using seven neighboring pixels.
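The MED predictor described above can be sketched in a few lines; the function name is illustrative.

```python
def med_predict(north, west, northwest):
    """Median Edge Detector (MED) predictor used by JPEG-LS.

    Predicts the current pixel from its North, West, and North-west
    neighbors: near a horizontal or vertical edge it picks the neighbor
    on the smooth side of the edge, otherwise it falls back to the
    planar estimate N + W - NW.
    """
    if northwest >= max(north, west):
        return min(north, west)            # edge detected above or to the left
    elif northwest <= min(north, west):
        return max(north, west)
    else:
        return north + west - northwest    # smooth region: planar prediction
```

The three branches are what make MED adaptive: the comparison against the North-west neighbor is a cheap edge detector that steers the prediction without any explicit gradient computation.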
2.2.2 Image Compression Standards: JPEG family
In digital photography, there are many different formats for compressing raw images. However, the most frequently used compression standards are the ones established by the Joint Photographic Experts Group. These standards are widely adopted by manufacturers to ensure compatibility across their products. The first standard released by the JPEG group is the original JPEG standard [10], finalized in 1992. JPEG's baseline mode, the most dominantly used operation mode, is a lossy compression scheme based on the two dimensional Discrete Cosine Transform (DCT). Its workflow consists of color space conversion, DCT transform, quantization, and entropy coding. Although JPEG has been successful in the industry for a long period, its limitations in rate-distortion performance and its lack of support for a unified pipeline for both lossy and lossless coding raised the need for an advanced compression standard. To overcome the limitations of JPEG, JPEG 2000 [33] was released in 2000, built on the Discrete Wavelet Transform (DWT). JPEG 2000 provides not only higher rate-distortion performance than the original JPEG standard but also a single pipeline for both lossy and lossless encoding. Its spatial and quality scalability allows decoding of the compressed bitstream at different resolution and precision configurations to meet different application requirements. In addition, JPEG 2000 can handle high bit-depth data, such as 16-bit integer or 32-bit floating point per component, enabling compression of HDR images. However, the main disadvantage of JPEG 2000 compared with JPEG is its complex architecture, which has resulted in limited industrial adoption.
JPEG XR (extended range) [34], released in 2009, is a new image compression standard based on Microsoft coding technology known as HD Photo [35]. JPEG XR provides many of the convenient features offered in JPEG 2000 while keeping its architecture substantially simpler than that of JPEG 2000, since it only uses integer based computations internally.
Figure 2.10: Block diagram of JPEG XR encoding process
JPEG XR supports a wide range of input bit-depths, from 1 bit through 32 bits per component. The 8-bit and 16-bit formats are supported for both lossy and lossless compression, while the 32-bit format is only supported for lossy compression, as only 24 bits are typically retained through the internal operations. Following the conventional image compression structure, JPEG XR's coding path, shown in Figure 2.10, includes color space conversion, a block transform based on a reversible lapped bi-orthogonal transform (LBT), quantization, and entropy coding. The LBT converts image data from the spatial domain to the frequency domain. As a result of the LBT, the coefficients are grouped into three subbands: DC, lowpass (LP), and highpass (HP). The DC, LP and HP subbands are then quantized and entropy coded independently.
The performance of JPEG XR has been compared with other compression standards in the literature. [36] evaluates the rate-distortion performance of JPEG XR against JPEG, JPEG 2000 and AVC/H.264 HP 4:4:4 intra coding using objective quality metrics, such as PSNR and the MSSIM index. It concludes that the performances of JPEG XR and JPEG 2000 are very close to each other, with JPEG 2000 outperforming JPEG XR slightly in some cases. [37] performs perceptual quality assessments to compare the rate-distortion performance of JPEG, JPEG 2000, and JPEG XR. The experimental results drew similar conclusions to the objective assessments.
2.2.3 Prior Art on Bayer CFA Compression
As discussed in Section 2.1.4, storage of raw CFA images leads to excessive usage of the camera's on-board memory, which raises the problem of efficient CFA image compression. This section summarizes various CFA image compression schemes in the literature that follow the alternative processing workflow in which compression is performed at an earlier stage than CDM. The most straightforward approach is the direct application of a standard image compression scheme, such as JPEG, JPEG-LS, or JPEG 2000, to raw CFA images [38, 39]. However, direct compression of raw CFA images is found to be inefficient, since existing compression solutions are generally optimized for continuous tone images and do not work as effectively for mosaic-like images. Due to the nonuniform spectral sensitivity of the image sensor, pixels from different color channels have different average intensity levels. Therefore, intermixing pixels from different color channels generates artificial discontinuities. In order to address this issue, advanced CFA compression schemes typically exploit various pre-processing operations prior to image encoding for optimal use of the compression tools.
Current state-of-the-art single sensor camera designs utilize compression schemes of three different types: lossless [13, 39, 40, 41], lossy [14, 15, 38, 42, 43, 44, 45], and near-lossless [40], depending on the nature of the pre-processing algorithms and compression tools. Lossless compression is used when an exact replica of the original image data is preferred over a high compression ratio. It is crucial in the fields of medical imaging, the cinema industry, and image archiving systems for museum arts and relics. On the other hand, lossy approaches aim to minimize the amount of image data by discarding visually redundant content. They are suitable for areas where the efficient usage of memory and computational resources is paramount. Near-lossless schemes lie somewhere in-between the two other classes; their algorithms achieve perceptually lossless compression by limiting the distortion in the compressed image to pre-defined threshold values.
Figure 2.11: CFA deinterleave process
In the following, a number of pre-processing techniques exploiting a pixel rearrangement strategy are discussed. Commonly, prior-art solutions deinterleave the CFA image into sub-images, each consisting of samples from a single color channel. The resulting R and B sub-images form rectangular lattices which can be easily encoded by common standards. However, the quincunx lattice of the G sub-image needs to be further processed for subsequent compression. There are three popular approaches to transform the quincunx G sub-image into a form more convenient for compression: i) merge, ii) separation, and iii) rotation. Some CFA compression techniques [14, 42] employ a color space conversion to convert the CFA image from the RGB domain to a luminance-chrominance domain prior to deinterleaving. In such a scenario, the deinterleave operation produces a quincunx luminance (Y) channel and rectangular chrominance (C) channels, and thus the following techniques can be applied to the Y channel instead of G.
Firstly, the merge method [14, 43, 44] shifts either even pixel rows up or even pixel columns left by one pixel. This produces a rectangular grid where one dimension is equal to and the other one is half of the corresponding CFA dimension. The generated rectangular
Figure 2.12: CFA deinterleave process : G subimage
images are compressed by JPEG or JPEG 2000. Since a simple shift can introduce distortion, causing suboptimal compression, [14] applies directional lowpass filtering prior to compression. This is only suitable for lossy approaches, as lowpass filtering removes edges and fine details. Secondly, the separation method [14, 38, 40, 42] splits the quincunx lattice into two rectangular lattices and compresses them separately. Independent encoding of the two sublattices is inefficient as it disregards the spatial correlation between them, and therefore, [40] applies a predictive coding technique to improve compression efficiency. Lastly, the rotation method [45] rotates the quincunx grid by 45 degrees and removes blank pixel positions. However, the resulting image forms a rhombus, and standard encoders such as JPEG and JPEG 2000 cannot be applied directly.
Instead of performing color channel deinterleaving, [13] applies a wavelet decomposition followed by entropy coding directly to CFA images to alleviate the aliasing issue in direct CFA encoding. In this scheme, the Mallat wavelet transform decorrelates CFA images by efficiently packing the signal energy into subbands. Overall, there exist various CFA compression schemes, and the experimental results indicate that there is no single best method for all test images. Therefore, the ultimate design goal is to decide on appropriate pre-processing operations and compression standards to meet a set of requirements: rate-distortion performance and computational cost.
2.3 Image Quality Assessment Metrics
With the advent of various multimedia compression standards, it has become increasingly important for industry to devise standardized quality assessment tools for compressed digital content. Since human observers are the ultimate receivers in image processing applications, the most reliable way to evaluate quality is to conduct a survey, where a group of humans is asked to rate the perceived quality of presented images on a numerical scale. The average of the obtained values is called the mean opinion score (MOS), and such an assessment technique is referred to as subjective quality assessment (QA). However, the impracticality of subjective QA raised the need for objective QA, which measures the perceived quality of visual content using automated algorithms. Such metrics can be employed to benchmark image processing systems and can also be embedded into a system to optimize its parameter settings. Generally, objective QA metrics are categorized in three classes [46]: i) full-reference (FR), ii) no-reference (NR), and iii) reduced-reference (RR). FR algorithms require an original (non-distorted) version of the image to predict the perceived quality of a distorted sample image. NR algorithms do not need access to the original image, and RR algorithms lie somewhere in-between, where they only require some characteristics of a reference image. This section focuses on image QA metrics implementing FR algorithms, which are mainly used in this thesis research.
2.3.1 Non-perceptual Quality Metrics
One of the most common objective QA metrics, the Mean Square Error (MSE), is defined as

    MSE = (1 / MN) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − Y(i,j))^2    (2.6)

where X denotes a reference image, Y denotes the distorted image to be compared, and M, N denote the image dimensions. The MSE is basically a normalized Minkowski distance with order p equal to 2, where the Minkowski distance is defined as follows:

    Ep = ( Σ_{i=1..M} Σ_{j=1..N} |X(i,j) − Y(i,j)|^p )^{1/p}    (2.7)

In addition, setting p = 1 yields the mean absolute error (MAE), and p = ∞ yields the maximum absolute difference (MAD). In practice, MSE's variant, the peak signal-to-noise ratio (PSNR), is often used on a dB scale, and is defined as follows:

    PSNR = 10 log10( (2^B − 1)^2 / MSE )    (2.8)
where B represents the bit depth. MSE, PSNR and their variants can be easily implemented in real world applications but often do not reflect the way that the HVS perceives images. Therefore, a major emphasis in recent research has been given to image QA algorithms based on explicit modeling of the HVS, such as the structural similarity index (SSIM) and the Visible Difference Predictor (VDP).
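A minimal implementation of (2.6) and (2.8) can be sketched as follows; function names are illustrative.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between reference x and distorted y, as in (2.6)."""
    return np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)

def psnr(x, y, bit_depth=8):
    """Peak signal-to-noise ratio in dB, as in (2.8)."""
    peak = (2 ** bit_depth - 1) ** 2       # squared peak value (2^B - 1)^2
    err = mse(x, y)
    if err == 0:
        return float("inf")                # identical images: unbounded PSNR
    return 10.0 * np.log10(peak / err)
```

For an 8-bit image with a uniform error of 5 gray levels, the MSE is 25 and the PSNR is about 34.15 dB, which illustrates why PSNR is reported on a logarithmic scale.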
2.3.2 Perceptual Quality Metrics
The SSIM index [47] is a widely used FR algorithm based on the idea that the HVS is highly adapted to extract structural information from visual scenes. It separates the task of image similarity measurement into three components: luminance, contrast, and structure. The luminance and contrast distortions are affected by illuminance variations, while the structure information of the objects is independent of the illuminance. Hence, the SSIM algorithm performs an independent structure distortion measurement along with luminance and contrast analysis. Similarly to other FR approaches, the SSIM index is a function of two images, denoted as X and Y, such that if one of the images is assumed to be the reference image, the SSIM index can be regarded as a quality measure of the other image.
Initially, the algorithm estimates the local luminance of each image signal by its mean intensity. The local luminance of image X, µx, is obtained by

    µx = (1 / MN) Σ_{i=1..M} Σ_{j=1..N} X(i,j)    (2.9)

Secondly, the mean intensity is removed from the signal and the standard deviation is used as a rough estimate of the contrast information. The contrast of image X, σx, is estimated as follows:

    σx = { (1 / (MN − 1)) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − µx)^2 }^{1/2}    (2.10)
Next, the signal is normalized by its own mean and standard deviation. This normalized signal, (X − µx)/σx, is used as a structure estimate of image X. The parameters for local luminance, contrast, and structure information are obtained for each image signal, and they formulate the luminance comparison function l(X, Y), contrast comparison function c(X, Y), and structure comparison function s(X, Y) as follows:

    l(X, Y) = (2µxµy + C1) / (µx^2 + µy^2 + C1)
    c(X, Y) = (2σxσy + C2) / (σx^2 + σy^2 + C2)
    s(X, Y) = (σxy + C3) / (σxσy + C3)
    where σxy = (1 / (MN − 1)) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − µx)(Y(i,j) − µy)    (2.11)

C1, C2, and C3 are defined as C1 = (K1 L)^2, C2 = (K2 L)^2, and C3 = C2/2, where L denotes the dynamic range of the pixel values, and K1, K2 are small positive constants generally set to 0.01 and 0.03, respectively. Finally, the three components are combined to yield
Chapter 2. BACKGROUND 29
an overall similarity measure SSIM(X, Y):

    SSIM(X, Y) = [l(X, Y)]^α · [c(X, Y)]^β · [s(X, Y)]^γ    (2.12)

where α, β and γ are positive parameters that adjust the relative importance of the three components. Typically, the SSIM method is applied locally rather than globally using a support window, producing an SSIM index quality map of the image. In practice, when a single quality measure of the entire image is preferred over the quality map, a mean SSIM index is often used, using (2.13):

    SSIM(X, Y) = (1 / M) Σ_{i=1..M} SSIM(xi, yi)    (2.13)

where xi and yi are the image pixel values of the reference and distorted images at the i-th local window, and M is the number of local windows in the image.
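The per-window computation in (2.9)-(2.12) can be sketched as a simplified, single-window SSIM. This is an illustration only: it treats the whole image as one window with α = β = γ = 1, whereas a practical implementation slides a small local window and averages the results as in (2.13). Function and variable names are assumptions.

```python
import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Simplified single-window SSIM following (2.9)-(2.12)."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    # unbiased (N - 1) estimates, matching (2.10) and (2.11)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    sxy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)   # luminance
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)           # contrast
    s = (sxy + C2 / 2) / (sx * sy + C2 / 2)                     # structure, C3 = C2/2
    return l * c * s
```

Comparing an image against itself yields an index of 1, and any luminance shift or structural change pulls the index below 1, which is the behavior the three-component decomposition is designed to capture.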
An advanced SSIM metric, called the multi-scale SSIM (MSSIM) [48], is often used due to its robustness to variations in viewing conditions. MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. Finally, all of the data is pooled into a single number. MSSIM provides good correlation with subjective measurements at a reasonable computational cost.
Another widely used image QA metric is the Visible Difference Predictor (VDP). The VDP metric predicts the percentage of pixels of a test image that standard observers would perceive as different from the original. In other words, VDP does not try to judge how irritating the image artifacts introduced by compression are; it only tries to predict whether they are detectable. VDP deploys a highly complex model of the HVS and is thus more computationally intensive than MSSIM. The VDP algorithm customized for HDR images is called HDR-VDP [49]. It deploys several modifications to the VDP to improve its prediction accuracy over a wider range of luminance and under adaptation conditions corresponding to real scene observation.
Chapter 3
Lossless CFA Compression using Prediction
3.1 Introduction
In this chapter, a new lossless CFA compression method capable of handling HDR representation is presented. We focus on the Bayer CFA structure as it is the dominant CFA arrangement in the industry. The proposed scheme consists of color channel deinterleave, weighted template matching prediction, and lossless image compression operations. There are two main differences between the proposed method and prior art solutions. Firstly, it introduces a weighted template matching prediction to increase the accuracy of prediction and achieve high compression efficiency. Our method is similar to the context matching based prediction (CMBP) presented in [41], but is more advantageous in terms of computational complexity, because the proposed method does not require the generation of the direction vector map that is necessary to carry out predictive coding in CMBP. Secondly, we make use of the JPEG XR codec [34] to facilitate lossless compression of CFA images in an HDR representation, such as a 16 bit per pixel format. Although other codecs, such as JPEG 2000 or JPEG-LS, are also capable of handling HDR input,
JPEG XR’s balance between performance and complexity makes it a suitable solution
for digital camera implementation.
The rest of this chapter is structured as follows. The proposed lossless CFA compres-
sion pipeline is presented in Section 3.2. Experiment results and analysis are demon-
strated in Section 3.3. Finally the chapter summary is given in Section 3.4.
3.2 Proposed Algorithm
Figure 3.1 illustrates the proposed CFA compression method for both the encoding and the decoding processes. The proposed scheme employs a structure separation to extract three sub-images, each of a single color component, from the original CFA layout. Then, each sub-image undergoes a predictive coding process. The predictive coding forms a prediction for the current pixel based on a linear combination of previously coded neighboring pixels, and encodes the prediction error signal to remove spatial redundancies. Initially, we process the G sub-image using the weighted template matching prediction technique in raster scan order, and generate the prediction error of the G channel, eg. After completion of the G channel prediction, the non-green sub-images are processed. Instead of carrying out the prediction on the R and B samples directly, we use color difference domain signals, dr (G-R) and db (G-B), for the non-green components. This allows us to reduce spectral (inter-channel) redundancies in the data, leading to higher compression efficiency. In order to obtain the color difference signals, the estimation of missing G values at the non-green pixel positions is necessary. In the proposed algorithm, we perform a bilinear interpolation on the quincunx G sub-image, which delivers satisfactory performance at low computational cost. Again, the prediction errors of the color difference signals, edr and edb, are obtained by the proposed predictor. The generated error signals constitute standard 4:2:2 formatted data. Therefore, they are encoded by the JPEG XR codec using its 4:2:2 lossless encoding mode.
In the companion decoding pipeline, compressed prediction error signals are decoded.
Then the decoder forms the identical prediction as the one from the encoding pipeline using the decompressed error signals to reconstruct the individual sub-images. Finally, we combine the generated sub-images to recover the original CFA layout.
(a) Encoding process
(b) Decoding process
Figure 3.1: Overview of the proposed lossless CFA compression pipeline
3.2.1 Deinterleaving Bayer CFA
The proposed pipeline initially deinterleaves the Bayer CFA image into three sub-images, r, g, and b, as shown in Figure 3.2. As previously mentioned in Section 2.2.3, the direct application of a compression solution to the CFA image is inefficient, as CFA data are formed by intermixing samples from different color channels. Although for most natural
Chapter 3. Lossless CFA Compression using Prediction 33
images, there still exist spatial correlations between CFA samples, pixels from different
channels contain high frequency discontinuities, disallowing high compression ratio. By
deinterleaving the CFA image, three downsampled sub-images, each of which consists of
pixels in a single color channel, are extracted.
Figure 3.2: Bayer CFA deinterleave method
Let us consider a $K_1 \times K_2$ grayscale CFA image $z_{(i,j)} : \mathbb{Z}^2 \rightarrow \mathbb{Z}$ representing a two-dimensional input image to encode. The deinterleaving process can be formulated as follows:
$$
g_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m-1,\,2n),\,(2m,\,2n-1)\} \\ 0, & \text{otherwise} \end{cases}
$$
$$
r_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m-1,\,2n-1)\} \\ 0, & \text{otherwise} \end{cases}
$$
$$
b_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m,\,2n)\} \\ 0, & \text{otherwise} \end{cases}
\quad (3.1)
$$
where $m = 1, 2, \ldots, K_1/2$ and $n = 1, 2, \ldots, K_2/2$. The obtained R and B sub-images form square lattices, while the obtained G sub-image constitutes a quincunx lattice. Each sub-image contains pixels from the same color component; thus, the subsequent prediction process can effectively remove spatial redundancies to achieve high compression performance.
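The deinterleaving of equation (3.1) can be sketched in a few lines (an illustrative sketch, not the thesis implementation; `deinterleave_bayer` is a hypothetical name, and indices are converted from the thesis's 1-based convention to 0-based arrays):

```python
import numpy as np

def deinterleave_bayer(z):
    """Split a Bayer CFA image into R, G, B sub-images per Eq. (3.1).
    With 1-based indices, (2m-1, 2n-1) are R sites and (2m, 2n) are B sites;
    in 0-based terms R sits at even/even and B at odd/odd positions. R and B
    are downsampled to square lattices; G is kept in place as a quincunx."""
    r = z[0::2, 0::2]                 # R: even rows, even cols (0-based)
    b = z[1::2, 1::2]                 # B: odd rows, odd cols
    g = np.zeros_like(z)              # G: quincunx lattice, zeros elsewhere
    g[0::2, 1::2] = z[0::2, 1::2]
    g[1::2, 0::2] = z[1::2, 0::2]
    return r, g, b
```
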
3.2.2 Green sub-image prediction
The compression efficiency of predictive coding depends on the accuracy of a prediction
model. Simple linear predictors often yield poor performance at image edge regions.
The proposed adaptive predictor exploits a template matching technique to achieve high
prediction performance. It measures the dissimilarity between the template of a current
pixel to predict and the template of candidate pixels in neighbor to determine weight
factors of candidate pixels. The weight factors adaptively increases the influence of
candidate pixel whose associated template closely resembles the template of the pixel to
predict and located closer from current spatial position. The proposed scheme handles
the pixels in a raster scan order, which means from left pixel to right and from top to
bottom.
Figure 3.3: Current pixel to be predicted and its 4 closest neighborhood pixels in a
quincunx G sub-image
Figure 3.3 illustrates the current G pixel $g_{(i,j)}$ to predict and its 4 candidate pixels, which are the 4 previously scanned neighboring G pixels. The predicted value of $g_{(i,j)}$, denoted as $\hat{g}_{(i,j)}$, is given by
$$
\hat{g}_{(i,j)} = \sum_{(p,q)\in\zeta_1} w'_{g(p,q)} \cdot g_{(p,q)} \quad (3.2)
$$
where $\zeta_1$ is the set of the 4 closest neighboring pixels of $g_{(i,j)}$, $\zeta_1 = \{(i, j-2),\,(i-1, j-1),\,(i-2, j),\,(i-1, j+1)\}$. The normalized weight factors $w'_{g(p,q)}$ are given by
$$
w'_{g(p,q)} = w_{g(p,q)} \Big/ \sum_{(m,n)\in\zeta_1} w_{g(m,n)} \quad (3.3)
$$
The original weight factor $w_{g(p,q)}$ is defined as follows:
$$
w_{g(p,q)} = \left\{1 + \mathrm{Diff}\!\left(T_{g(i,j)}, T_{g(p,q)}\right) \big/ D\!\left(g_{(i,j)}, g_{(p,q)}\right)\right\}^{-1} \quad (3.4)
$$
where $T_{g(p,q)}$ is the template of the G prediction centered at pixel $(p,q)$, consisting of the positions $\{(p, q-2),\,(p-1, q-1),\,(p-2, q),\,(p-1, q+1)\}$, the operator $\mathrm{Diff}(\cdot)$ is a dissimilarity metric, and the operator $D(\cdot)$ is the spatial distance between the two pixel positions. We add 1 to the denominator to avoid the singularity that would occur when $\mathrm{Diff}(T_{g(i,j)}, T_{g(p,q)})$ becomes zero [50].
Figure 3.4: Template of G sub-image centered at (i,j). ’o’ indicates pixels in the
template region
The template used for G prediction is shown in Figure 3.4. Although using a larger template in the matching process improves prediction performance, the 4-pixel template offers a good trade-off between prediction accuracy and computational cost.
Typically, prediction techniques use the sum of absolute differences (SAD) or the sum of squared errors (SSE) between two templates to determine the degree of dissimilarity. We use the SAD due to its simplicity of implementation. Therefore, $\mathrm{Diff}(T_{g(p,q)}, T_{g(r,s)})$ is defined as follows:
$$
\mathrm{Diff}(T_{g(p,q)}, T_{g(r,s)}) = |g_{(p,q-2)} - g_{(r,s-2)}| + |g_{(p-1,q-1)} - g_{(r-1,s-1)}| + |g_{(p-2,q)} - g_{(r-2,s)}| + |g_{(p-1,q+1)} - g_{(r-1,s+1)}| \quad (3.5)
$$
Figure 3.5: Pixel values required for the prediction of G pixel at (i,j)
As shown in Figure 3.5, the proposed predictor requires a 5$\times$7 support window centered at pixel location $(i-2, j-1)$ to calculate $\hat{g}_{(i,j)}$. The weight factors $w_{g(i,j-2)}$, $w_{g(i-1,j-1)}$, $w_{g(i-2,j)}$, and $w_{g(i-1,j+1)}$, corresponding to the west, northwest, north, and northeast candidates of the $g_{(i,j)}$ pixel, are obtained using equation (3.6):
$$
\begin{aligned}
w_{g(i,j-2)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i,j-4)}| + |g_{(i-1,j-1)} - g_{(i-1,j-3)}| + |g_{(i-2,j)} - g_{(i-2,j-2)}| + |g_{(i-1,j+1)} - g_{(i-1,j-1)}|\right)\big/2\right\}^{-1} \\
w_{g(i-1,j-1)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-1,j-3)}| + |g_{(i-1,j-1)} - g_{(i-2,j-2)}| + |g_{(i-2,j)} - g_{(i-3,j-1)}| + |g_{(i-1,j+1)} - g_{(i-2,j)}|\right)\big/\sqrt{2}\right\}^{-1} \\
w_{g(i-2,j)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-2,j-2)}| + |g_{(i-1,j-1)} - g_{(i-3,j-1)}| + |g_{(i-2,j)} - g_{(i-4,j)}| + |g_{(i-1,j+1)} - g_{(i-3,j+1)}|\right)\big/2\right\}^{-1} \\
w_{g(i-1,j+1)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-1,j-1)}| + |g_{(i-1,j-1)} - g_{(i-2,j)}| + |g_{(i-2,j)} - g_{(i-3,j+1)}| + |g_{(i-1,j+1)} - g_{(i-2,j+2)}|\right)\big/\sqrt{2}\right\}^{-1}
\end{aligned}
\quad (3.6)
$$
Figure 3.6 demonstrates the weight factor computation sequence for the G pixel at location $(i,j)$. In the diagrams, the template region for the current pixel to predict is indicated with red boxes and the template regions for the candidate pixels are indicated with blue boxes.
(a) weight factor for west (b) weight factor for northwest
(c) weight factor for north (d) weight factor for northeast
Figure 3.6: Weight computation for the prediction of G pixel at (i,j)
Once $\hat{g}_{(i,j)}$ is obtained, the G prediction error, $e_{g(i,j)}$, is determined by $e_{g(i,j)} = g_{(i,j)} - \hat{g}_{(i,j)}$ and coded in the encoding module. Since the decoder can form the same prediction $\hat{g}_{(i,j)}$ as the encoder, the original G sub-image can be reconstructed without loss by adding the decoded prediction error, $e'_g$, to $\hat{g}_{(i,j)}$.
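The G prediction of equations (3.2) through (3.6) can be sketched as follows (a simplified illustration, not the thesis implementation; border handling is omitted, `predict_g` is a hypothetical name, and the quincunx G plane is assumed to be stored at full resolution with zeros at non-G sites):

```python
import numpy as np

# West, northwest, north, and northeast offsets; used both as the candidate
# set zeta_1 and as the causal template shape (Figs. 3.3 and 3.4).
ZETA1 = [(0, -2), (-1, -1), (-2, 0), (-1, 1)]

def template(g, i, j):
    """4-pixel causal template centered at (i, j) on the quincunx G plane."""
    return np.array([g[i + di, j + dj] for di, dj in ZETA1], dtype=float)

def predict_g(g, i, j):
    """Weighted template-matching prediction of g[i, j] with the SAD metric.
    (i, j) must be far enough inside the image for the 5x7 support window."""
    t_cur = template(g, i, j)
    weights, values = [], []
    for di, dj in ZETA1:
        p, q = i + di, j + dj
        sad = float(np.abs(t_cur - template(g, p, q)).sum())  # Diff(.), Eq. 3.5
        dist = np.hypot(di, dj)        # spatial distance D(.): 2 or sqrt(2)
        weights.append(1.0 / (1.0 + sad / dist))              # weight, Eq. 3.6
        values.append(g[p, q])
    w = np.array(weights) / np.sum(weights)                   # Eq. 3.3
    return float(np.dot(w, values))                           # Eq. 3.2
```

The lossless round trip then follows from $e_g = g - \hat{g}$ at the encoder and $g = \hat{g} + e'_g$ at the decoder.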
3.2.3 Non-Green sub-image prediction
Independent encoding of the deinterleaved sub-images yields suboptimal compression efficiency, since data redundancy in the form of inter-channel correlation is disregarded during compression. In order to take inter-channel correlation into account, we perform the prediction of the non-green sub-images in the color difference domain rather than the original intensity domain. To obtain the color difference images, we need to estimate G samples at the original R and B pixel locations, which are unavailable in the original CFA layout. The missing G values are estimated from the available G samples of the CFA image by interpolation. Various interpolation schemes are available, from the low-complexity bilinear method to more complex methods utilizing a variety of estimation operators and edge-sensing mechanisms. Our simulation results have shown that advanced interpolation techniques typically improve the compression efficiency only marginally; thus, we use the simple bilinear approach.
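The bilinear estimation of missing G samples on the quincunx lattice can be sketched as follows (an illustrative sketch under the assumption that the G plane is stored at full resolution with zeros at non-G sites; `interpolate_g_bilinear` is a hypothetical name):

```python
import numpy as np

def interpolate_g_bilinear(g, g_mask):
    """Fill missing G values at R/B sites by averaging the four horizontally
    and vertically adjacent G samples. `g` holds zeros at non-G sites and
    `g_mask` is True where an original G sample exists. Image borders are
    skipped here for brevity."""
    out = g.astype(float).copy()
    h, w = g.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if not g_mask[i, j]:
                # the 4 nearest neighbors of an R/B site are all G sites
                out[i, j] = (g[i - 1, j] + g[i + 1, j] +
                             g[i, j - 1] + g[i, j + 1]) / 4.0
    return out
```

The color difference signals of equations (3.7) and (3.8) then follow by subtracting the original R and B samples from the interpolated G values at the corresponding sites.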
Figure 3.7: Current pixel to be predicted and its closest neighborhood pixels in a red
difference (dr) sub-image
Two color difference images, $d_{r(i,j)}$ and $d_{b(i,j)}$, are defined as follows:
$$
d_{r(i,j)} = \hat{G}_{(i,j)} - r_{(i,j)}, \quad (i,j) \in \{(2m-1,\,2n-1)\} \quad (3.7)
$$
$$
d_{b(i,j)} = \hat{G}_{(i,j)} - b_{(i,j)}, \quad (i,j) \in \{(2m,\,2n)\} \quad (3.8)
$$
where $\hat{G}$ denotes the interpolated G channel. Since the prediction procedures for the two color difference images are essentially identical, this section only presents the prediction procedure for the red difference image, using the generalized difference signal $d_{(i,j)}$. Similarly to the G case, the proposed scheme predicts a current pixel $d_{(i,j)}$ using its four closest candidate pixels placed in the directions of west, northwest, north, and northeast, as shown in Figure 3.7. However, unlike the G component, the non-green components form square lattices rather than quincunx ones; hence, the candidate pixels are defined as $\zeta_2 = \{(i, j-2),\,(i-2, j-2),\,(i-2, j),\,(i-2, j+2)\}$.
The prediction of the color difference sub-images is also performed in raster-scan order using the weighted template matching technique. The template for the color difference sub-image is defined in Figure 3.8 using G samples, since edges and fine detail are typically deemphasized in the color difference domain, while they are well preserved in the G channel due to its doubled sampling rate.
Figure 3.8: Template of red difference (dr) sub-image centered at (i,j). ’o’ indicates
pixels in the template region
The original weight factor of the difference sub-image, $w_{d(p,q)}$, is defined as follows:
$$
w_{d(p,q)} = \left\{1 + \mathrm{Diff}\!\left(T_{d(i,j)}, T_{d(p,q)}\right) \big/ D\!\left(d_{(i,j)}, d_{(p,q)}\right)\right\}^{-1} \quad (3.9)
$$
where $T_{d(p,q)}$ denotes the template of the color difference image at $(p,q)$, defined over the G positions $\{(p, q+1),\,(p, q-1),\,(p+1, q),\,(p-1, q)\}$. The weight factors $w_{d(i,j-2)}$, $w_{d(i-2,j-2)}$, $w_{d(i-2,j)}$, and $w_{d(i-2,j+2)}$, corresponding to the west, northwest, north, and northeast candidates of the $d_{(i,j)}$ pixel, are obtained using equation (3.10).
$$
\begin{aligned}
w_{d(i,j-2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i,j-3)}| + |g_{(i-1,j)} - g_{(i-1,j-2)}| + |g_{(i,j+1)} - g_{(i,j-1)}| + |g_{(i+1,j)} - g_{(i+1,j-2)}|\right)\big/2\right\}^{-1} \\
w_{d(i-2,j-2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j-3)}| + |g_{(i-1,j)} - g_{(i-3,j-2)}| + |g_{(i,j+1)} - g_{(i-2,j-1)}| + |g_{(i+1,j)} - g_{(i-1,j-2)}|\right)\big/2\sqrt{2}\right\}^{-1} \\
w_{d(i-2,j)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j-1)}| + |g_{(i-1,j)} - g_{(i-3,j)}| + |g_{(i,j+1)} - g_{(i-2,j+1)}| + |g_{(i+1,j)} - g_{(i-1,j)}|\right)\big/2\right\}^{-1} \\
w_{d(i-2,j+2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j+1)}| + |g_{(i-1,j)} - g_{(i-3,j+2)}| + |g_{(i,j+1)} - g_{(i-2,j+3)}| + |g_{(i+1,j)} - g_{(i-1,j+2)}|\right)\big/2\sqrt{2}\right\}^{-1}
\end{aligned}
\quad (3.10)
$$
(a) weight factor for west (b) weight factor for northwest
(c) weight factor for north (d) weight factor for northeast
Figure 3.9: Weight computation for the prediction of red difference (dr) pixel at (i,j)
Figure 3.9 demonstrates the weight factor computation sequence for the red difference pixel at location $(i,j)$. In the diagrams, the template region for the current pixel to predict is indicated with yellow boxes and the template regions for the candidate pixels are indicated with blue boxes.
Once the weight factors for all directions are computed, the predicted value is obtained using the normalized weights $w'_{d(p,q)}$ as follows:
$$
\hat{d}_{(i,j)} = \sum_{(p,q)\in\zeta_2} w'_{d(p,q)} \cdot d_{(p,q)} \quad (3.11)
$$
The prediction error of the color difference images, $e_d$, is determined by $e_{d(i,j)} = d_{(i,j)} - \hat{d}_{(i,j)}$ and coded in the encoding module. Again, the decoder has all the information needed to form the same prediction as the encoder and can thus reconstruct the R and B sub-images without loss.
3.2.4 Compression of prediction error
The prediction errors for the three sub-images, $e_g$, $e_{dr}$, and $e_{db}$, are obtained from the previous stages. To compress them without loss, various existing image codecs with lossless encoding capability, such as JPEG-LS, JPEG 2000, and JPEG XR, are considered. In our proposed pipeline, we make use of the JPEG XR standard for the following reasons: i) JPEG XR supports channel bit-depths of up to 24 bits for lossless compression, allowing efficient storage of HDR format data, and ii) JPEG XR yields a balanced trade-off between compression efficiency and computational complexity. In our experiments, JPEG XR provides coding efficiency almost comparable to that of the high performance JPEG 2000. In terms of complexity, JPEG XR has a considerably simpler architecture than JPEG 2000 and is comparable to the low complexity JPEG-LS. Therefore, we believe that JPEG XR is an ideal compression solution for resource constrained environments such as digital cameras. The number of samples to compress in $e_g$ is twice that in $e_{dr}$ and $e_{db}$. This implies that the prediction error signals form a standard 4:2:2 arrangement, and thus the YCC 4:2:2 encoding mode of JPEG XR can be applied to compress them. JPEG XR performs a lapped bi-orthogonal transform (LBT), quantization, and adaptive Huffman coding to compress the given input.
3.3 Experimental Results
Experiments are carried out using 31 RGB images from the Para-Dice Insight Compression Database [51], shown in Figure 3.10. This database is chosen because it is a publicly available dataset containing a wide variety of RGB images in 16-bit HDR representation, varying in edge content and color appearance, and is thus suitable for the evaluation of our proposed solution. The three-channel RGB images in the database are initially resized to 960$\times$640 and sampled by the Bayer CFA to produce the grayscale CFA images $z : \mathbb{Z}^2 \rightarrow \mathbb{Z}$. The CFA images $z$ are then processed by the proposed pipeline and compressed into the JPEG XR format $c$ by the JPEG XR reference software [52]. The reconstructed CFA images $x : \mathbb{Z}^2 \rightarrow \mathbb{Z}$ are generated by applying JPEG XR decompression to the compressed data $c$, followed by the processing operations of the decoding pipeline. As all intermediate steps are lossless, the reconstructed CFA images $x$ should be identical to the original CFA images $z$.
The performance of different solutions is evaluated by comparing lossless compression bitrates. The compression bitrate is reported in bits per pixel (bpp), computed as $(8 \times B)/n$, where $B$ is the file size in bytes of the compressed image, including the image header, and $n$ is the number of pixels in the image.
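As a quick illustration of the bitrate formula (`bitrate_bpp` and the example file size are hypothetical, used only to show the arithmetic):

```python
def bitrate_bpp(file_size_bytes, width, height):
    """Lossless compression bitrate in bits per pixel: (8 * B) / n,
    where B is the compressed file size including the image header."""
    return 8.0 * file_size_bytes / (width * height)

# For the 960x640 images used here, a hypothetical 911,040-byte file
# corresponds to 8 * 911040 / 614400 = 11.8625 bpp.
```
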
The JPEG XR codec is operated in lossless mode as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, and ii) quantization is disabled by setting the quantization parameters to 1 for all subbands and color channels.
Figure 3.10: Test digital color images (referred to as image 1 to image 31, from left to
right and top to bottom)
3.3.1 Primary color channel and color difference channel
This section compares the compression performance of the original R/B channels and the color difference channels.
(a) original channels (b) color difference channels
Figure 3.11: 2D autocorrelation graphs for the image 4 in database (a) original images,
R and B, (b) color difference images, dr and db
Figure 3.11 shows the two-dimensional autocorrelation of the primary color images R and B, and of the color difference images dr and db, for image 4 in our database. The height at each position indicates the correlation between the original image and a spatially shifted version of itself, as defined in equation (3.12):
$$
\mathrm{Corr}(m,n) = \frac{\sum_i \sum_j \left(X_{(i,j)} - \overline{X_{(i,j)}}\right)\left(X_{(i+m,j+n)} - \overline{X_{(i+m,j+n)}}\right)}{\sqrt{\sum_i \sum_j \left(X_{(i,j)} - \overline{X_{(i,j)}}\right)^2} \cdot \sqrt{\sum_i \sum_j \left(X_{(i+m,j+n)} - \overline{X_{(i+m,j+n)}}\right)^2}} \quad (3.12)
$$
where $X_{(i,j)}$ is the original image, $X_{(i+m,j+n)}$ is the shifted version of itself, $\overline{X}$ represents the mean value of the given image, and $m$, $n$ denote the spatial shifts in the horizontal and vertical directions. The value at the center of the graph is always 1, as it corresponds to the zero-shift case.
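Equation (3.12) can be computed on the overlapping region of an image and its shifted copy as follows (an illustrative sketch; `norm_autocorr` is a hypothetical name):

```python
import numpy as np

def norm_autocorr(x, m, n):
    """Normalized autocorrelation of Eq. (3.12) for a shift (m, n), computed
    over the region where the image and its shifted copy overlap."""
    h, w = x.shape
    a = x[max(0, -m):h - max(0, m), max(0, -n):w - max(0, n)].astype(float)
    b = x[max(0, m):h + min(0, m), max(0, n):w + min(0, n)].astype(float)
    a -= a.mean()                      # subtract the per-region means
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom)
```
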
The figure shows that, as the shifting distance increases, the level of similarity drops off more rapidly for the color difference images than for the primary color images. This observation holds true for the other images in the database. It implies that dr and db have lower spatial correlation between neighboring pixels than R and B. Since spatial redundancy is reduced by using color difference images, more efficient entropy coding is expected. As shown in Table 3.1, the proposed scheme yields average lossless compression bitrates of 12.340 bits per pixel (bpp) for the primary color images and 11.875 bpp for the color difference images, respectively.
Image RB dRdB Image RB dRdB
1 10.405 10.069 17 12.189 11.671
2 13.500 13.040 18 13.560 13.113
3 13.468 13.041 19 12.830 12.215
4 11.182 10.676 20 11.737 11.192
5 12.024 11.736 21 12.568 12.094
6 10.397 10.126 22 12.306 11.756
7 10.278 10.079 23 11.126 10.758
8 11.115 10.659 24 11.469 10.98
9 13.420 12.939 25 12.090 11.335
10 13.820 13.338 26 12.639 12.103
11 14.404 13.872 27 13.022 12.525
12 11.421 11.097 28 12.669 12.069
13 13.369 12.841 29 10.475 10.157
14 14.497 14.004 30 13.667 13.198
15 13.748 13.268 31 12.059 11.548
16 11.075 10.622 Avg 12.340 11.875
Table 3.1: Lossless bitrate of proposed compression scheme with primary channel and
color difference channel
3.3.2 Green channel interpolation method
Img BI SPL EDI NEDI Img BI SPL EDI NEDI
1 10.069 10.109 10.023 9.995 17 11.671 11.679 11.715 11.662
2 13.040 13.064 13.075 13.049 18 13.113 13.133 13.168 13.116
3 13.041 13.068 13.064 13.048 19 12.215 12.227 12.276 12.220
4 10.676 10.668 10.680 10.655 20 11.192 11.226 11.251 11.172
5 11.736 11.784 11.791 11.751 21 12.094 12.118 12.145 12.096
6 10.126 10.144 10.136 10.129 22 11.756 11.761 11.771 11.736
7 10.079 10.121 10.119 10.100 23 10.758 10.787 10.817 10.756
8 10.659 10.674 10.672 10.646 24 10.980 10.994 11.035 10.942
9 12.939 12.947 12.971 12.936 25 11.335 11.328 11.404 11.354
10 13.338 13.333 13.345 13.308 26 12.103 12.120 12.118 12.074
11 13.872 13.879 13.901 13.894 27 12.525 12.563 12.555 12.493
12 11.097 11.122 11.114 11.091 28 12.069 12.082 12.103 12.016
13 12.841 12.851 12.879 12.822 29 10.157 10.230 10.250 10.161
14 14.004 13.997 14.029 14.011 30 13.198 13.223 13.242 13.209
15 13.268 13.281 13.315 13.274 31 11.548 11.582 11.578 11.546
16 10.622 10.661 10.647 10.614 Avg 11.875 11.895 11.909 11.867
Table 3.2: Lossless bitrate of proposed compression scheme with various G
interpolation schemes
Since we perform the weighted template matching prediction in the color difference domain, the estimation of missing G samples at R and B pixel positions is necessary. This is essentially achieved by interpolating the quincunx G image. In order to investigate the influence of the interpolation technique on coding performance, we examined several interpolation methods, including bilinear (BI), cubic spline interpolation (SPL), edge-directed interpolation (EDI) [16], and new edge-directed interpolation (NEDI) [53], which vary in estimation accuracy and computational complexity. For BI, missing G samples are estimated by taking the average value of the four surrounding pixels. In SPL, a piecewise continuous curve, passing through each of the given samples in the G sub-image, is defined to determine missing pixel values. EDI is an adaptive approach that measures horizontal and vertical gradients at missing G samples to decide the direction in which to perform interpolation. NEDI initially computes the local covariance coefficients and uses them to adapt the interpolation direction.
Table 3.2 lists the lossless compression bitrates of the proposed scheme for the different interpolation methods. The bitrates for BI are identical to the bitrates of the color difference images in Table 3.1. On average, the lossless bitrates are 11.875, 11.895, 11.909, and 11.867 bpp for BI, SPL, EDI, and NEDI, respectively. These results show that the use of advanced interpolation does not significantly improve compression efficiency and sometimes even degrades performance. Therefore, the low complexity bilinear interpolation is sufficient for our proposed scheme.
3.3.3 Dissimilarity measure in template matching
The dissimilarity measure is a key element in template matching during prediction, since
the choice of dissimilarity metric in equation(3.4) and equation(3.9) affects computational
complexity and the accuracy of the prediction process. Table 3.3 presents the lossless
compression bitrates of the proposed scheme for the images from our database using two
commonly used dissimilarity metrics, SAD and SSE. They are defined as follows:
SAD(i,j) = |i− j| (3.13)
SSE(i,j) = (i− j)2 (3.14)
According to Table 3.3, the lossless bitrates for SAD and SSE are almost identical as
11.875 bpp and 11.874 bpp, respectively. We can conclude that selection of dissimilarity
Chapter 3. Lossless CFA Compression using Prediction 48
measure does not significantly affect compression performance and therefore, SAD is
preferred to SSE due to its low complexity in implementation.
Image SAD SSE Image SAD SSE
1 10.069 10.083 17 11.671 11.669
2 13.040 13.036 18 13.113 13.114
3 13.041 13.039 19 12.215 12.211
4 10.676 10.672 20 11.192 11.189
5 11.736 11.735 21 12.094 12.094
6 10.126 10.122 22 11.756 11.754
7 10.079 10.077 23 10.758 10.756
8 10.659 10.659 24 10.98 10.974
9 12.939 12.938 25 11.335 11.329
10 13.338 13.339 26 12.103 12.101
11 13.872 13.871 27 12.525 12.524
12 11.097 11.095 28 12.069 12.069
13 12.841 12.839 29 10.157 10.161
14 14.004 14.001 30 13.198 13.198
15 13.268 13.269 31 11.548 11.550
16 10.622 10.620 Avg 11.875 11.874
Table 3.3: Lossless bitrate of proposed compression scheme with SAD and SSE
dissimilarity metrics
3.3.4 Prediction algorithm
We compared the performance of our proposed method with other methods described in the literature. The methods in comparison are: i) method 1: direct CFA image encoding using JPEG XR, ii) method 2: direct CFA image encoding using JPEG 2000, iii) method 3: direct CFA image encoding using JPEG-LS, iv) method 4: prediction based on the separation method [40] in conjunction with JPEG XR compression, v) method 5: the CMBP predictor based method [41] in conjunction with JPEG XR compression, vi) method 6: the activity level classification model (ALCM) [54] predictor based method combined with JPEG XR compression, and vii) method 7: our proposed method.
As a basis for performance comparison, we used representative lossless compression schemes, namely JPEG XR, JPEG 2000, and JPEG-LS, directly on the CFA image in the first three methods. The Kakadu v6.4 software implementation is used for JPEG 2000 coding and the FFmpeg software is used for JPEG-LS coding. Methods 4 to 7 are considered to demonstrate the relationship between the accuracy of the predictor and the compression efficiency. In method 4, the quincunx G channel is separated into two rectangular lattices, G1 and G2, and the prediction is carried out by estimating G1 from G2. The non-green channels are directly encoded in the color difference domain. The CMBP predictor in method 5 is essentially very similar to our proposed predictor. It initially generates a direction vector map of the sample image to determine homogeneous regions, and only performs prediction in nonhomogeneous regions with pre-defined weight factors for neighboring pixels. The ALCM predictor in method 6 estimates a current pixel using a weighted combination of neighboring pixels. Initially, equal weights are assigned to all pixels; if the previous prediction was higher than the actual pixel value, then the weight of the largest neighboring pixel is decreased by 1/256 and that of the smallest neighboring pixel is increased by the same amount. If the previous prediction was lower than the actual pixel value, then the weights of the largest and the smallest neighboring pixels are adjusted in the opposite way.
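The ALCM weight adaptation described above can be sketched as follows (a simplified illustration of the idea in [54] as summarized here, not the reference implementation; `alcm_update` is a hypothetical name):

```python
def alcm_update(weights, neighbors, actual, step=1.0 / 256):
    """One ALCM-style adaptation step: predict from a weighted combination of
    neighbors, then shift weight between the largest and smallest neighbor
    depending on whether the prediction over- or undershot the actual value."""
    pred = sum(w * x for w, x in zip(weights, neighbors))
    hi = neighbors.index(max(neighbors))   # largest neighboring pixel
    lo = neighbors.index(min(neighbors))   # smallest neighboring pixel
    if pred > actual:                      # overshoot: reduce the pull upwards
        weights[hi] -= step
        weights[lo] += step
    elif pred < actual:                    # undershoot: increase it
        weights[hi] += step
        weights[lo] -= step
    return pred, weights
```
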
Figure 3.12 shows the entropy of sample images from our database associated with the different prediction schemes, methods 4 to 7. The entropy of an image can be determined by the formula
$$
H = -\sum_{i=1}^{n} P_i \log_2 P_i \quad (3.15)
$$
where $P_i$ is the probability of occurrence of pixel value $i$ and $H$ is the entropy of the image. The entropy is evaluated by generating the image histogram from the prediction error image of each sample image. Since the entropy of image data determines the theoretical lower bound achievable by lossless compression, it allows us to evaluate the effectiveness of the different prediction algorithms. The average entropies of the various prediction methods are 12.956, 11.637, 11.704, and 11.395 for methods 4, 5, 6, and 7, respectively. The proposed method shows the lowest average entropy value, indicating potentially high compression efficiency.
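The histogram-based entropy of equation (3.15) can be computed as follows (a small sketch; `image_entropy` is a hypothetical name):

```python
from collections import Counter
from math import log2

def image_entropy(pixels):
    """First-order entropy of Eq. (3.15), estimated from the histogram of a
    flattened (prediction error) image given as a sequence of integers."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```
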
Figure 3.12: Entropy of sample images from the database with various prediction
methods
The output compression bitrates of the CFA images from our database achieved by the various methods are presented in Table 3.4 and Table 3.5. The results clearly show that direct compression of the CFA mosaic image is not efficient. In the direct CFA compression scenario, JPEG 2000 is superior to JPEG XR and JPEG-LS in terms of compression efficiency, outperforming them in average bitrate by 0.5 and 1.1 bpp, respectively. However, as can be seen, exploiting an accurate prediction method allows the JPEG XR equipped pipeline to achieve a higher compression ratio than JPEG 2000. On
Img M1 M2 M3 Img M1 M2 M3
1 11.351 9.393 10.657 17 13.090 12.868 13.444
2 14.290 14.200 14.590 18 14.804 14.334 15.287
3 14.406 14.322 15.080 19 14.035 13.579 15.066
4 13.964 12.856 15.766 20 12.774 12.436 13.192
5 13.162 12.961 13.828 21 13.810 13.308 14.192
6 11.270 10.775 10.890 22 13.779 13.250 14.693
7 12.971 11.883 15.130 23 11.998 12.096 12.362
8 13.269 12.258 14.571 24 13.410 12.389 14.283
9 14.224 14.122 14.860 25 13.495 13.098 14.506
10 14.636 14.552 15.231 26 14.236 13.488 14.883
11 15.155 15.143 15.554 27 14.203 13.665 14.956
12 12.938 12.349 13.650 28 13.184 13.145 13.245
13 14.047 14.047 14.252 29 12.891 11.508 14.110
14 15.433 15.319 16.156 30 14.328 14.347 14.524
15 14.452 14.470 14.971 31 13.235 12.030 12.914
16 12.633 11.958 13.388 Avg 13.596 13.102 14.201
Table 3.4: Lossless bitrate of various CFA compression schemes (direct CFA encoding
schemes)
average, our proposed scheme yields a lossless compression bitrate of 11.875 bpp for the images in our database. The average compression bitrates obtained by the other reviewed
Img M4 M5 M6 M7 Img M4 M5 M6 M7
1 13.701 10.204 10.311 10.069 17 12.796 11.900 11.844 11.671
2 13.962 13.223 13.241 13.040 18 14.407 13.338 13.319 13.113
3 14.031 13.264 13.268 13.041 19 13.699 12.388 12.441 12.215
4 12.763 10.957 10.949 10.676 20 12.589 11.405 11.448 11.192
5 12.672 11.967 11.925 11.736 21 13.812 12.310 12.279 12.094
6 12.411 10.297 10.255 10.126 22 13.543 11.923 11.989 11.756
7 12.150 10.325 10.266 10.079 23 11.496 11.034 10.972 10.758
8 12.572 10.841 10.906 10.659 24 12.196 11.228 11.149 10.980
9 13.759 13.190 13.148 12.939 25 13.174 11.590 11.526 11.335
10 14.261 13.553 13.545 13.338 26 13.357 12.278 12.280 12.103
11 14.643 14.103 14.055 13.872 27 13.816 12.691 12.832 12.525
12 12.849 11.313 11.319 11.097 28 13.272 12.250 12.300 12.069
13 13.605 13.091 13.042 12.841 29 11.405 10.381 10.389 10.157
14 14.790 14.238 14.173 14.004 30 14.114 13.410 13.444 13.198
15 14.095 13.534 13.436 13.268 31 12.930 11.703 11.821 11.548
16 12.021 10.906 10.843 10.622 Avg 13.255 12.091 12.088 11.875
Table 3.5: Lossless bitrate of various CFA compression schemes (predictive coding
schemes)
predictors with JPEG XR compression are 13.255, 12.091, and 12.088 bpp for methods 4, 5, and 6, respectively. For most of the images in the database, the proposed method consistently achieves the lowest lossless compression bitrate, demonstrating the robustness of the solution in terms of compression efficiency.
Apart from the lossless bitrate performance of the proposed solution, its computational complexity is also analyzed in terms of normalized operations, namely addition (ADD), bit shift (SHF), multiplication (MUL), absolute value (ABS), and comparison (CMP). Table 3.6 presents a summary of the number of operations per pixel required to carry out each stage of the prediction process. In this analysis, bilinear interpolation is used for the missing G pixel estimation and the SAD metric is used for dissimilarity measurement during prediction. It can be seen that performing the non-green prediction in the color difference domain instead of the intensity domain increases the number of operations for the proposed scheme by 2 additions and 0.5 shifts per pixel, since the G interpolation and difference signal estimation stages would be unnecessary in the intensity domain. Such a marginal increase in computational cost is considered tolerable, given that the use of the color difference domain reduces the average lossless bitrate by 0.5 bpp, as shown in Section 3.3.1.
Stage ADD SHF MUL ABS CMP
G sub-image prediction 19.5 1 7 8 0
G interpolation (BI) 1.5 0.5 0 0 0
Diff R/B channel estimation 0.5 0 0 0 0
Diff R sub-image prediction 9.75 0.5 3.5 4 0
Diff B sub-image prediction 9.75 0.5 3.5 4 0
Total 41 2.5 14 16 0
Table 3.6: Number of operations per pixel required for the proposed scheme
3.4 Chapter Summary
In this chapter, a lossless Bayer CFA compression scheme capable of handling HDR representation is presented. In summary, the following conclusions can be drawn from this chapter: i) the structure separation step reduces high frequency artifacts, leading to high compression efficiency, ii) the proposed weighted template matching predictor exploits inter-channel and spatial correlations to achieve high compression performance, and iii) the proposed scheme utilizes low complexity building blocks, such as bilinear interpolation, the SAD dissimilarity measure, and the JPEG XR encoding module, to minimize the computational cost. The image entropy analysis and experimental results indicate that the proposed scheme delivers higher lossless compression performance than other prior-art solutions.
Chapter 4
Lossy CFA Compression using
Colorspace Conversion
4.1 Introduction
The previous chapter presented an HDR CFA compression solution which is reversible, so that the original CFA image can be perfectly reconstructed. Despite its advantage of having no loss of information, the proposed lossless scheme does not provide adequate compression ratios for target devices with low data storage. This chapter presents a lossy CFA compression pipeline capable of handling HDR representation, which provides greater compression ratio gains than the lossless scheme at the expense of marginal quality loss.
We focus on the Bayer CFA structure as it is the most widely utilized CFA arrangement in the industry. The proposed scheme consists of a color space conversion module and a structure conversion step, and is thus similar to the approaches discussed in [14, 38, 43, 55]. However, there are three important differences between the proposed scheme and the prior art solutions. First, a novel color space, namely YCoCg, is used instead of YCbCr in order to offer higher compression at reduced computational cost. YCoCg, another variation of a luminance-chrominance based color space, offers simplified implementation due
to its integer based operations [30]. Secondly, we introduce a data adaptive edge-sensing mechanism into the encoding pipeline in order to enhance the quality of the reconstructed images, which are generated by the companion decoding pipeline. Contrary to most of the prior art solutions, which utilize non-data-adaptive or 4-direction based mechanisms, the proposed pipeline uses an 8-directional approach to generate higher quality images at fractionally higher computational cost. Lastly, we make use of the recently standardized image compression format JPEG XR in the pipeline to facilitate CFA compression with HDR representation [34]. HDR imaging typically requires 10 to 16 bits per color component to represent image scenes, whereas conventional low dynamic range (LDR) imaging only requires up to 8 bits. Due to its higher precision, HDR capability has recently become one of the key features of high-end digital cameras. However, most of the prior art in CFA compression is limited to codecs applied to conventional 8 bit per color channel image inputs. Such a conventional pipeline discards the rich visual content afforded by the HDR CFA data, as the original HDR data stream is mapped onto an 8 bit equivalent representation prior to applying compression. It will be shown that the proposed CFA compression pipeline produces high quality compressed images while using expensive memory resources efficiently.
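To illustrate why an integer luminance-chrominance conversion is cheap, the following is a sketch of the reversible YCoCg-R lifting transform from the literature (an assumption for illustration; this is one common integer variant of YCoCg and not necessarily the exact transform used in the proposed pipeline):

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples; it uses only
    additions, subtractions, and bit shifts, and is exactly invertible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every lifting step is individually invertible, the round trip recovers the original integers exactly, regardless of bit depth.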
The rest of this chapter is organized as follows. Section 4.2 presents the new CFA
compression pipeline in detail. Experimental results are reported in Section 4.3 and the
chapter summary is provided in Section 4.4.
4.2 Proposed Algorithm
The proposed CFA compression schemes require a series of reversible pre-processing operations prior to applying JPEG XR compression. The pre-processing operations give us full control over the color space conversion and pixel arrangement of the input images, allowing highly efficient compression performance.
Figure 4.1: Overview of the proposed lossy HDR CFA image compression pipeline
Initially the CFA image is transformed from the RGB domain into the YCoCg domain to reduce inter-channel redundancy. We advocate the use of the YCoCg color space over the commonly used YCbCr since the YCoCg transform has been shown to provide higher coding gain at lower computational complexity [30]. The color space conversion requires all three RGB components at each pixel location, but the CFA image contains only one per pixel, so the two missing components must be estimated from adjacent pixels. Our methods use an 8-directional data adaptive CDM to interpolate the missing pixels. Following conventional CDM approaches for the Bayer CFA, the algorithm first interpolates the G pixels and then the color difference signals, R-G and B-G. We then immediately compute the YCoCg image from the interpolated G, R-G, and B-G signals. As illustrated in Figure 4.1, two versions of the image processing pipeline (IPP) are proposed, differing in the number of Y pixels calculated during this stage. Namely, IPP1 computes Y values at all pixel locations to preserve complete edge information. On the other hand, IPP2 reduces computational complexity by keeping only half of the Y values. In both IPPs, only one chrominance pixel is computed for each 2x2 pixel block of the original CFA image. Once color conversion is completed, the YCoCg image
is rearranged into a shape more appropriate for the subsequent compression. Since this structure conversion step produces output data formatted as YUV 4:2:0 for IPP1 and YUV 4:2:2 for IPP2, the matching encoding modes provided by the JPEG XR codec are applied to each.
In the companion decoding pipeline, where a final reconstructed RGB image is pro-
duced to be rendered in display devices, the sequence of encoding pipeline is reversed.
Unlike the encoding pipeline which has to be implemented on the camera, the decoding
pipeline can be off-loaded to the end device, such as a personal computer (PC). A PC-based decoding pipeline can include advanced algorithms that produce high fidelity reconstructed images thanks to ample resources, whereas a camera on-chip solution typically exploits less complex algorithms to reduce computational cost and power consumption.
4.2.1 Interpolation of missing green components
In the Bayer CFA, G is the dominant component among the three primary colors and suffers the least from aliasing. For this reason, it is common to start the estimation of missing pixels from the G components [50, 56, 57]. In our method, we employ an ESM operator and an inter-channel correlator to reconstruct the missing G components. Among several ESM operators, we found that the 8-directional data adaptive algorithm [50, 56] offers high performance at low computational cost, and it is therefore exploited in our pipeline. In this algorithm, a missing pixel value is computed as a weighted sum of neighboring pixels from 8 directions. The estimation of G pixels is formulated as follows:
y(i,j)G = { x(i,j)G                                 if z(i,j) ≅ x(i,j)G
          { Σ_(p,q)∈ζ w′(p,q) · x′(p,q)G            otherwise        (4.1)

where the operator ≅ denotes a one-to-one relationship, z is the pixel value of the original grayscale CFA image, x(i,j)G is the G pixel value at position (i,j), ζ = {(i−1,j), (i,j−1), (i,j+1), (i+1,j), (i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1)} is the set of 8 neighborhood pixels of (i,j), and x′(p,q)G are the predicted G values of the neighborhood pixels obtained using local edge information. The normalized edge-sensing weights w′(p,q)
are given by,
w′(p,q) = w(p,q) / Σ_(m,n)∈ζ w(m,n)        (4.2)

The original edge-sensing weight factor w(p,q) is defined in equation (4.3) using inverse gradients:

w(p,q) = {1 + Σ_(r,s)∈ζ |z(p,q) − z(r,s)| / D(z(p,q), z(r,s))}^(−1)        (4.3)
where D(z(p,q), z(r,s)) represents the spatial distance between the two pixel locations, and the 1 in the denominator avoids singularities. The weight factor adaptively reduces the influence of pixels that lie across an edge, or farther from the current spatial location, to enhance estimation performance.
Figure 4.2: Indexing of the samples within a 5x5 window of Bayer CFA
Using the 8-directional data adaptive system, the calculation of G components requires a 5x5 support window centered at the missing G location. In Figure 4.2, the estimation of the G component at location (i,j) requires edge-sensing weight coefficients and predicted G values for the 8 adjacent pixels. The weights w(i−1,j), w(i,j+1), w(i+1,j), and w(i,j−1), corresponding to the north, east, south, and west directions of the pixel x(i,j), are defined as:

w(i−1,j) = {1 + (|z(i,j) − z(i−2,j)| + |z(i−1,j) − z(i+1,j)|)/2}^(−1)
w(i,j+1) = {1 + (|z(i,j) − z(i,j+2)| + |z(i,j+1) − z(i,j−1)|)/2}^(−1)
w(i+1,j) = {1 + (|z(i,j) − z(i+2,j)| + |z(i+1,j) − z(i−1,j)|)/2}^(−1)
w(i,j−1) = {1 + (|z(i,j) − z(i,j−2)| + |z(i,j−1) − z(i,j+1)|)/2}^(−1)        (4.4)
The weights w(i−1,j−1), w(i−1,j+1), w(i+1,j+1), and w(i+1,j−1), corresponding to the north-west, north-east, south-east, and south-west directions, are defined as:

w(i−1,j−1) = {1 + (|z(i,j) − z(i−2,j−2)| + |z(i−1,j−1) − z(i+1,j+1)|)/(2√2)}^(−1)
w(i−1,j+1) = {1 + (|z(i,j) − z(i−2,j+2)| + |z(i−1,j+1) − z(i+1,j−1)|)/(2√2)}^(−1)
w(i+1,j+1) = {1 + (|z(i,j) − z(i+2,j+2)| + |z(i+1,j+1) − z(i−1,j−1)|)/(2√2)}^(−1)
w(i+1,j−1) = {1 + (|z(i,j) − z(i+2,j−2)| + |z(i+1,j−1) − z(i−1,j+1)|)/(2√2)}^(−1)        (4.5)
Similar to the computation of the edge-sensing weights, the computation of the predicted G values around x(i,j) distinguishes between the horizontal/vertical and diagonal directions. For the horizontal and vertical directions, the predicted G pixel values are given by:

x′(i−1,j)G = x(i−1,j)G + (z(i−2,j) − z(i,j) + z(i−1,j) − z(i+1,j))/4
x′(i,j+1)G = x(i,j+1)G + (z(i,j+2) − z(i,j) + z(i,j+1) − z(i,j−1))/4
x′(i+1,j)G = x(i+1,j)G + (z(i+2,j) − z(i,j) + z(i+1,j) − z(i−1,j))/4
x′(i,j−1)G = x(i,j−1)G + (z(i,j−2) − z(i,j) + z(i,j−1) − z(i,j+1))/4        (4.6)
For the diagonal directions, they are defined as follows:

x′(i−1,j−1)G = {x(i−1,j)G + x(i,j−1)G + (z(i−1,j−1) − z(i+1,j+1))/(2√2) + (z(i−2,j) + z(i,j−2) − 2z(i,j))/4}/2
x′(i−1,j+1)G = {x(i−1,j)G + x(i,j+1)G + (z(i−1,j+1) − z(i+1,j−1))/(2√2) + (z(i−2,j) + z(i,j+2) − 2z(i,j))/4}/2
x′(i+1,j+1)G = {x(i+1,j)G + x(i,j+1)G + (z(i+1,j+1) − z(i−1,j−1))/(2√2) + (z(i+2,j) + z(i,j+2) − 2z(i,j))/4}/2
x′(i+1,j−1)G = {x(i+1,j)G + x(i,j−1)G + (z(i+1,j−1) − z(i−1,j+1))/(2√2) + (z(i+2,j) + z(i,j−2) − 2z(i,j))/4}/2        (4.7)
By substituting the normalized edge-sensing weight factors and the predicted G values into equation (4.1), the missing G pixels are estimated and the full G channel is constructed.
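The green estimation of equations (4.1)-(4.7) can be sketched for a single missing-G location as follows. This is an illustrative transcription, not the thesis reference implementation; the interface (a 2-D mosaic `z` plus an array `g` holding the known G samples, with a full 5x5 neighborhood available) is an assumption.

```python
# Hedged sketch of the 8-directional data adaptive green estimation,
# equations (4.1)-(4.7). Array names and layout are assumptions.
import numpy as np

def estimate_green(z, g, i, j):
    """Estimate the missing G value at an R/B location (i, j).

    z : 2-D grayscale CFA mosaic; g : 2-D array whose entries are valid
    only where the CFA actually has a G sample.
    """
    s2 = np.sqrt(2.0)

    # Edge-sensing weights, equations (4.4) and (4.5), keyed by offset.
    w = {
        (-1, 0): 1.0 / (1 + (abs(z[i, j] - z[i-2, j]) + abs(z[i-1, j] - z[i+1, j])) / 2),
        (0, 1):  1.0 / (1 + (abs(z[i, j] - z[i, j+2]) + abs(z[i, j+1] - z[i, j-1])) / 2),
        (1, 0):  1.0 / (1 + (abs(z[i, j] - z[i+2, j]) + abs(z[i+1, j] - z[i-1, j])) / 2),
        (0, -1): 1.0 / (1 + (abs(z[i, j] - z[i, j-2]) + abs(z[i, j-1] - z[i, j+1])) / 2),
        (-1, -1): 1.0 / (1 + (abs(z[i, j] - z[i-2, j-2]) + abs(z[i-1, j-1] - z[i+1, j+1])) / (2 * s2)),
        (-1, 1):  1.0 / (1 + (abs(z[i, j] - z[i-2, j+2]) + abs(z[i-1, j+1] - z[i+1, j-1])) / (2 * s2)),
        (1, 1):   1.0 / (1 + (abs(z[i, j] - z[i+2, j+2]) + abs(z[i+1, j+1] - z[i-1, j-1])) / (2 * s2)),
        (1, -1):  1.0 / (1 + (abs(z[i, j] - z[i+2, j-2]) + abs(z[i+1, j-1] - z[i-1, j+1])) / (2 * s2)),
    }

    # Predicted G values at the four axial neighbours, equation (4.6).
    gp = {
        (-1, 0): g[i-1, j] + (z[i-2, j] - z[i, j] + z[i-1, j] - z[i+1, j]) / 4,
        (0, 1):  g[i, j+1] + (z[i, j+2] - z[i, j] + z[i, j+1] - z[i, j-1]) / 4,
        (1, 0):  g[i+1, j] + (z[i+2, j] - z[i, j] + z[i+1, j] - z[i-1, j]) / 4,
        (0, -1): g[i, j-1] + (z[i, j-2] - z[i, j] + z[i, j-1] - z[i, j+1]) / 4,
    }
    # Diagonal predictions, equation (4.7), built from the axial G samples.
    gp[(-1, -1)] = (g[i-1, j] + g[i, j-1] + (z[i-1, j-1] - z[i+1, j+1]) / (2 * s2)
                    + (z[i-2, j] + z[i, j-2] - 2 * z[i, j]) / 4) / 2
    gp[(-1, 1)] = (g[i-1, j] + g[i, j+1] + (z[i-1, j+1] - z[i+1, j-1]) / (2 * s2)
                   + (z[i-2, j] + z[i, j+2] - 2 * z[i, j]) / 4) / 2
    gp[(1, 1)] = (g[i+1, j] + g[i, j+1] + (z[i+1, j+1] - z[i-1, j-1]) / (2 * s2)
                  + (z[i+2, j] + z[i, j+2] - 2 * z[i, j]) / 4) / 2
    gp[(1, -1)] = (g[i+1, j] + g[i, j-1] + (z[i+1, j-1] - z[i-1, j+1]) / (2 * s2)
                   + (z[i+2, j] + z[i, j-2] - 2 * z[i, j]) / 4) / 2

    # Normalize the weights (equation (4.2)) and form the weighted sum (4.1).
    total = sum(w.values())
    return sum(w[d] * gp[d] for d in w) / total
```

In a flat region all eight weights are equal and every prediction equals the local intensity, so the estimate reduces to that intensity, as expected.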
4.2.2 Interpolation of color difference components
We perform interpolation in the color difference domain, R-G and B-G, instead of the original R and B intensity domain. Image signals in the color difference domain are generally smoother than those in the intensity domain and are thus more suitable for linear interpolation. The difference signal R-G is obtained as follows:
y(i,j)RG = { x(i,j)R − y(i,j)G                       if z(i,j) ≅ x(i,j)R
           { Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)RG           if z(i,j) ≅ x(i,j)G
           { Σ_(p,q)∈ζ2 w‴(p,q) · y(p,q)RG           if z(i,j) ≅ x(i,j)B

ζ1 = {(i−1,j), (i,j−1), (i,j+1), (i+1,j)}
ζ2 = {(i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1)}        (4.8)
where y(i,j)RG is the estimated R-G value at pixel (i, j), y(i,j)G is the estimated G value
from previous stage, and ζ1 and ζ2 are horizontal/vertical and diagonal neighbor pixels
Chapter 4. Lossy CFA Compression using Colorspace Conversion 62
of (i, j) respectively. Here, w′′ and w′′′ are renormalized edge-sensing weights for hor-
izontal/vertical and diagonal directions, respectively. The B-G signal, y(i,j)BG, can be
calculated using the same technique as follows:
y(i,j)BG = { Σ_(p,q)∈ζ2 w‴(p,q) · y(p,q)BG           if z(i,j) ≅ x(i,j)R
           { Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)BG           if z(i,j) ≅ x(i,j)G
           { x(i,j)B − y(i,j)G                       if z(i,j) ≅ x(i,j)B        (4.9)
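The G-position case of equation (4.8), a weighted average over the four axial neighbors, can be sketched as follows. The dictionary interface for the un-normalized weights is a hypothetical convenience, not the thesis code.

```python
# Hedged sketch of the R-G interpolation at a G location, equation (4.8).
# The renormalization over the axial subset zeta1 yields the weights w''.
import numpy as np

def interp_rg_at_green(rg, w, i, j):
    """Estimate R-G at a G pixel from its four axial neighbours.

    rg : 2-D array of known/previously estimated R-G values,
    w  : dict of un-normalized edge-sensing weights keyed by offset.
    """
    zeta1 = [(-1, 0), (0, -1), (0, 1), (1, 0)]   # horizontal/vertical set
    total = sum(w[d] for d in zeta1)             # renormalize over zeta1 only
    return sum(w[d] * rg[i + d[0], j + d[1]] for d in zeta1) / total
```

The diagonal (ζ2) case of equations (4.8) and (4.9) is identical in structure, with the four diagonal offsets substituted.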
4.2.3 Correction of green and color difference components
The correction operation utilizes correlation between color channels and edge information
to enhance estimation accuracy. The correction mechanism initially updates G as follows:
y(i,j)G = { x(i,j)R − Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)RG    if z(i,j) ≅ x(i,j)R
          { x(i,j)B − Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)BG    if z(i,j) ≅ x(i,j)B        (4.10)
Then, the corresponding color difference signals, R-G and B-G, at the corrected G pixel positions are also updated as follows:

y(i,j)RG = x(i,j)R − y(i,j)G    if z(i,j) ≅ x(i,j)R
y(i,j)BG = x(i,j)B − y(i,j)G    if z(i,j) ≅ x(i,j)B        (4.11)

Finally, the R-G and B-G planes are corrected using the same formulas given by equations (4.8) and (4.9). This simple iteration reduces false color estimation and blurred edges while preserving the original z values of the CFA data [50].
4.2.4 YCoCg color conversion
The G, R-G, and B-G planes are fully populated through the previous stages. The color space conversion from the RGB domain to the YCoCg domain is given by:
y(i,j)Y  = (1/4)·y(i,j)R + (1/2)·y(i,j)G + (1/4)·y(i,j)B = (y(i,j)RG + y(i,j)BG + 4·y(i,j)G)/4
y(i,j)Co = (1/2)·y(i,j)R − (1/2)·y(i,j)B = (y(i,j)RG − y(i,j)BG)/2
y(i,j)Cg = −(1/4)·y(i,j)R + (1/2)·y(i,j)G − (1/4)·y(i,j)B = −(y(i,j)RG + y(i,j)BG)/4        (4.12)
It should be noted that calculating all three channels, Y, Co, and Cg, at full resolution would triple the number of pixels to compress relative to the original CFA image. We propose two methods to reduce the number of pixels to compress.
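The right-hand sides of equation (4.12) map the interpolated planes to YCoCg without ever reconstructing R and B explicitly; a direct transcription (element-wise, so it works on scalars or NumPy arrays alike):

```python
# Forward YCoCg conversion of equation (4.12), written directly in terms
# of the G, R-G, and B-G planes produced by the interpolation stages.
def ycocg_from_diff(g, rg, bg):
    y = (rg + bg + 4.0 * g) / 4.0    # Y  =  R/4 + G/2 + B/4
    co = (rg - bg) / 2.0             # Co =  R/2 - B/2
    cg = -(rg + bg) / 4.0            # Cg = -R/4 + G/2 - B/4
    return y, co, cg
```

For example, R = 10, G = 20, B = 30 gives rg = −10, bg = 10 and hence Y = 20, Co = −10, Cg = 0, matching the RGB-side definitions.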
(a) Color space conversion for IPP1 (b) Color space conversion for IPP2
Figure 4.3: Two versions of color space conversion
The first method, IPP1, preserves four Y, one Co, and one Cg components for every 2x2 block of CFA pixels. This process reduces the spatial resolution of the chrominance (chroma) channels by 75 percent, but still maintains high image quality by keeping the full Y plane, which is perceptually more significant than the chroma planes. In order to reduce the spatial resolution of Co and Cg, chroma subsampling is applied; here, we discard three chroma pixels from each 2x2 block for simplicity. After subsampling, the spatial resolutions of the chroma channels are halved in both the horizontal and vertical directions.
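The subsampling step amounts to strided slicing; which of the four positions per 2x2 block is retained (top-left below) is an assumption for illustration.

```python
# Chroma subsampling: keep one Co/Cg sample per 2x2 block, halving both
# dimensions. Retaining the top-left position is an assumed convention.
import numpy as np

def subsample_chroma(c):
    """Return a half-resolution chroma plane (one sample per 2x2 block)."""
    return c[0::2, 0::2]
```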
The second method, IPP2, further reduces the number of pixels to compress by discarding half of the Y pixels. It calculates Y pixels only at the G positions of the original CFA image. This is because G is the dominant color in the Y calculation, so distortion can be minimized by using the reliable original G samples instead of interpolated ones [43]. The two chroma channels of IPP2 are subsampled in the same manner as in IPP1.
4.2.5 Structure conversion
Since image compression standards typically only accept rectangular arrays as inputs, a structure conversion process is necessary. During this stage, the quincunx Y channel in IPP2 is rearranged into a rectangular array by up-shifting every Y pixel located in an even row by one pixel. Note that this step is unnecessary for IPP1, as its Y pixels are already arranged on a rectangular grid. For both IPP1 and IPP2, the Co and Cg pixels are packed together to form rectangular arrays. After structure conversion, the YCoCg data in IPP1 constitutes the standard YUV 4:2:0 format and can thus be compressed using the YCC 4:2:0 mode of JPEG XR encoding. Similarly, the rearranged YCoCg data in IPP2, formatted as YUV 4:2:2, can be compressed using the YCC 4:2:2 mode of JPEG XR encoding.
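The IPP2 rearrangement, merging each even row's Y samples up into the odd row above it (1-based rows), can be sketched with numpy slicing. The assumed Bayer phase (G samples where row + column is even, 0-indexed) is an assumption; other phases just swap the slice offsets.

```python
# Hedged sketch of the IPP2 structure conversion: quincunx Y samples are
# packed into a (K1/2) x K2 rectangle by shifting alternate rows up.
import numpy as np

def pack_quincunx_y(y):
    """Pack a quincunx Y plane (samples where row+col is even, 0-indexed)
    into a rectangular array of half the height."""
    k1, k2 = y.shape
    out = np.empty((k1 // 2, k2), dtype=y.dtype)
    out[:, 0::2] = y[0::2, 0::2]   # samples already in rows 0, 2, ...
    out[:, 1::2] = y[1::2, 1::2]   # samples shifted up from rows 1, 3, ...
    return out
```

The resulting full-width, half-height Y plane together with the quarter-resolution chroma planes is what lets the bundle masquerade as YUV 4:2:2 for the codec.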
4.3 Experimental Results
The performance of the proposed solution is examined as follows. RGB images with 16-bit-per-component representation from the Para-Dice Insight Compression Database [51], shown in Figure 3.10, are initially resized to 960x640. The resized test images o : Z2 → Z3 are sampled with the Bayer CFA to produce the CFA images z : Z2 → Z. The CFA images z are then preprocessed using the proposed pipelines and compressed into JPEG XR format c by the JPEG XR reference software [52]. The reconstructed RGB images x : Z2 → Z3 to be displayed to the end user are generated by applying JPEG XR decompression to the compressed data c, followed by the processing operations in reverse order. In our experiments, we apply bilinear interpolation to estimate the missing Y, Co, and Cg components in the decoding pipeline. The reconstructed image x should be as
close as possible to the desired RGB image o.
We modified the reference software to accept 16-bit-per-component YUV 4:2:0 data as input for the raw encoding mode. This modification allows us to simulate IPP1. The JPEG XR codec is configured as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, ii) the first-level overlapping mode is used for the pre-filter function, and iii) the bit rate of the encoded image is controlled by adjusting the quantization variables. Uniform quantization parameters are used for all three subbands and color channels.
To evaluate the performance of the proposed solutions, image quality is measured by comparing o and x using three quality assessment (QA) metrics: i) the Composite Peak Signal-to-Noise Ratio (CPSNR), ii) the Multi-scale Structural Similarity Index (MSSIM) [48], and iii) the High Dynamic Range Visible Difference Predictor (HDR-VDP) [49]. CPSNR is defined as follows:
CPSNR = 10 log10( (2^B − 1)^2 / ( (1/(3·K1·K2)) Σ_{k=1}^{3} Σ_{r=1}^{K1} Σ_{s=1}^{K2} (o(r,s)k − x(r,s)k)^2 ) )        (4.13)
where B stands for the bit depth (B = 16 for our test data). Although CPSNR is widely used in the literature, it correlates poorly with perceived quality. Therefore, we include two human visual system (HVS) oriented metrics, the multi-scale MSSIM and the HDR-VDP.
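Equation (4.13) is a composite PSNR over all three channels; a direct numpy transcription:

```python
# CPSNR of equation (4.13) for B-bit data.
import numpy as np

def cpsnr(o, x, bits=16):
    """Composite PSNR between reference o and reconstruction x,
    both of shape (K1, K2, 3)."""
    mse = np.mean((o.astype(np.float64) - x.astype(np.float64)) ** 2)
    peak = (2 ** bits - 1) ** 2
    return 10.0 * np.log10(peak / mse)
```

For instance, a uniform error of one code value on 16-bit data gives 20·log10(65535) ≈ 96.3 dB.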
MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. It is generally evaluated by assigning different weights to the color channels, and is represented on a dB scale as follows:

MSSIM = 20 log10{(wY · MSSIM_Y) + (wCb · MSSIM_Cb) + (wCr · MSSIM_Cr)}^(−1)        (4.14)

In this report, the weight coefficients for the channels of MSSIM are selected as wY = 0.95, wCb = 0.02, and wCr = 0.03, following the suggested usage in previous publications [36, 58].
The VDP metric predicts the percentage of pixels in a test image that standard observers would perceive as different from the original. The HDR-VDP incorporates several HVS characteristics into VDP to enhance its prediction accuracy over the full visible range of luminance. It is specifically tuned to support HDR images and is widely adopted for the comparison of HDR images; we therefore make use of HDR-VDP in reporting the experimental results. Similar to the MSSIM metric, HDR-VDP is plotted on a dB scale as follows:

HDR-VDP = 20 log10(1/r)        (4.15)

where r denotes the ratio of pixels that standard observers would perceive as different from the original.
Results reported in the following sections are obtained over a wide range of compression ratios by averaging the computed image quality over the test images.
4.3.1 Edge Sensing Mechanism (ESM) and Compression
The rate-distortion performance of the CFA compression pipelines with various ESMs is illustrated in Figure 4.4. The ESMs under consideration include bilinear interpolation (BI), Laplacian interpolation (HA) [59], and the 8-directional data adaptive interpolation (ESCC) deployed in our proposed pipeline. These ESM schemes range from simple to complicated in terms of computational cost and vary in the quality of the images they produce. BI is a typical example of a non-data-adaptive estimator that uses fixed edge-sensing weight factors for missing pixel estimation. HA is a classical edge-directed interpolator that uses second-order gradients as correction terms. These two algorithms are often used as benchmarks in the literature [14, 38, 43], and thus we compare the performance of our solution against them.
For the IPP1 pipeline, the ESCC outperforms the other ESMs over almost the entire bit rate range in all three quality metrics. The HA provides a slightly higher CPSNR gain than the ESCC at low bit rates, but the two perceptual metrics, which correlate strongly with visual perception, indicate that the ESCC is superior to the HA. This implies that utilizing a sophisticated ESM enhances the rate-distortion performance of the CFA compression pipeline. However, as the bit rate decreases, the selection of ESM has less impact on performance. At low bit rates, as shown in Figure 4.4, the ESCC provides almost identical compression performance to the HA, and is still more efficient than the BI, although the improvement is not as significant as at high bit rates. This is because the advanced ESMs are more sensitive to fine edge detail, which is susceptible to compression errors, than the low complexity ones.
For the IPP2 pipeline, not all error criteria show consistent results. The CPSNR and MSSIM metrics indicate that the ESCC achieves the best performance over almost the entire range of bit rates, while the HDR-VDP metric indicates that the HA outperforms the ESCC at bit rates higher than 3 bits per pixel (bpp). This observation shows that the ESCC ESM is better optimized for IPP1 than for IPP2. The suboptimal compression performance of IPP2 is caused by the artificial high frequency components introduced in the structure conversion stage. Similar to the IPP1 case, advanced ESMs in IPP2 provide less benefit in terms of compression efficiency as the bit rate decreases.
4.3.2 Color Space and Compression
Figure 4.5 shows the rate-distortion curves of the proposed scheme in conjunction with the RGB-YCoCg conversion and two other variants, the RGB-YCbCr conversion and the JPEG 2000 reversible color transform (RCT). The RGB-YCbCr conversion is commonly used in CFA compression pipelines [14, 43], and we therefore consider it a reference method. The JPEG 2000 RCT is considered in the comparison since it features low complexity like the YCoCg, requiring only addition and shift operations.
Our experimental results show that all three color space variants produce nearly
Figure 4.4: Rate-distortion curves of the proposed pipelines with different ESMs for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.
Figure 4.5: Rate-distortion curves of the proposed pipelines with different color spaces for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.
identical performance for both IPP1 and IPP2. The YCoCg slightly outperforms the other two methods in the MSSIM and HDR-VDP metrics, but results in a small loss (at most 0.2 dB) in the CPSNR measure compared to the YCbCr. Since the YCoCg space offers marginally higher perceptual metric performance at low complexity, it is, among the reviewed options, the most efficient choice for our CFA compression pipeline implementation.
4.3.3 Proposed Pipeline and Conventional Pipelines
Figure 4.6 compares the rate-distortion performance of our proposed pipelines, IPP1 and IPP2, against other variants. Namely, IPP3 represents the conventional workflow that first demosaicks the CFA image via the ESCC CDM and then compresses the resultant RGB image; the compressed image is then decoded and displayed. IPP4 encodes the CFA image directly without any pre-processing operations; the full RGB image is obtained by demosaicking the decoded CFA image using the ESCC CDM. Combining these two pipelines with two codecs, JPEG XR and JPEG 2000, allows us to test four additional solutions alongside our methods. For JPEG 2000 coding, the JasPer software implementation [60] is used. Comparison to conventional JPEG is omitted due to its lack of support for 16-bit-per-component input.
Experimental results show that IPP1 consistently outperforms IPP3 and IPP4 in all three quality measures at high bit rates, above 8 bpp, regardless of the codec used. IPP1 also substantially outperforms IPP2 at high bit rates. For the mid-range bit rates, between 2 and 8 bpp, IPP4 provides the best image quality. At low bit rates, all three metrics show that IPP3 produces images of superior quality to the other pipelines.
At low bit rates IPP2 outperforms IPP1 in terms of rate-distortion performance. There are two reasons for this. Higher compression removes more texture and edge detail, and thus reduces the high frequency artifacts generated during the quincunx-to-rectangular array conversion of Y pixels in IPP2. Consequently, the reduction of the high frequency components improves the compression efficiency. In addition,
Figure 4.6: Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics. Panels: (a) CPSNR for various IPPs, (b) MSSIM for various IPPs, (c) HDR-VDP for various IPPs below 4 bpp, (d) HDR-VDP for various IPPs above 4 bpp.
the smaller input size of IPP2 results in better performance. Conversely, at high bit rates the aliasing in IPP2 prevents efficient coding, and the reduction of Y pixels results in poor edge restoration. Thus, high bit rates favor IPP1 while low bit rates favor IPP2.
Figure 4.7 allows a visual evaluation of the pipelines via sub-regions of reconstructed images generated at low bit rates, between 1 and 2 bpp. We can observe that IPP2 and IPP3 maintain acceptable visual quality even under a high compression ratio, whereas IPP1 and IPP4 suffer from various visual artifacts. Images generated by IPP4 at low
bit rates are significantly distorted by lattice-patterned artifacts. This unpleasant texture appears for IPP4 with both JPEG 2000 and JPEG XR. Applying high compression to CFA data removes the edge information required for CDM and introduces noise that can mislead the ESM operators into generating false weight factors. Thus, advanced ESMs, which are typically more sensitive to edge detail, may not produce images of acceptable quality from highly compressed CFA data. The conventional workflow, IPP3, does not suffer from this problem at low bit rates since CDM is performed prior to compression. In addition, demosaicked data typically have higher inter-pixel correlation than CFA data, enabling more efficient compression. For these reasons, IPP3 works well at low bit rates, providing almost the same rate-distortion performance as our proposed IPP2 pipeline. In Figure 4.6, the perceptual metrics, MSSIM and HDR-VDP, indicate that IPP4 yields a lower quality gain than IPP2 and IPP3, consistent with the visual inspection.
Our experimental results show that the compression performance of JPEG XR and JPEG 2000 is very close; JPEG 2000 is generally slightly superior, but the gain is marginal. It can be seen in Figure 4.6 that for both IPP3 and IPP4, the use of JPEG 2000 instead of JPEG XR slightly improves the rate-distortion performance over a wide bit rate range in all three metrics.
Apart from the rate-distortion performance, we also report the average encoding time per image in milliseconds for different combinations of pipelines and codecs in Table 4.1. Experimental results, averaged over the image sets, are obtained on an Intel Core 2 Duo 2.53 GHz CPU with 4 GB RAM running the Windows 7 operating system. For a CFA input of size K1 × K2, the numbers of pixels to encode in IPP1, IPP2, IPP3, and IPP4 are 1.5 × K1 × K2, K1 × K2, 3 × K1 × K2, and K1 × K2 pixels, respectively. The results show that the encoding delay of each pipeline is proportional to the number of pixels in the input data. This observation clearly shows a trade-off between quality and complexity. At low bit rates, as shown in Figure 4.6, IPP2 performs significantly better than IPP1 and is almost comparable to IPP3 in terms of image quality. The average encoding speed
Figure 4.7: Full color images obtained from the four examined IPPs with the JPEG XR codec at bit rates between 1 and 2 bpp. The first 4 images are sub-regions of image 18, the next 4 are from image 21, and the last 4 are from image 1 in the database; the four panels in each row correspond to IPP1, IPP2, IPP3, and IPP4.
of IPP2 is considerably faster than that of either IPP1 or IPP3 with JPEG XR encoding. Therefore, when a small quality loss is tolerable in exchange for a reduction in encoding delay, the low complexity IPP2 solution is desirable.
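The pixel counts driving these encoding-time differences are easy to reproduce; the following is a trivial transcription of the counts discussed above, not measured data.

```python
# Pixels fed to the codec for a K1 x K2 CFA input, per pipeline.
def pixels_to_encode(k1, k2):
    n = k1 * k2
    return {"IPP1": n * 3 // 2,  # full Y plane + quarter-resolution Co and Cg
            "IPP2": n,           # half Y plane + quarter-resolution Co and Cg
            "IPP3": 3 * n,       # fully demosaicked RGB image
            "IPP4": n}           # raw CFA mosaic
```

For the 960x640 test images this gives 921,600 pixels for IPP1 versus 1,843,200 for the conventional IPP3 workflow, in line with the encoding delays in Table 4.1.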
According to Table 4.1, JPEG XR is substantially faster than JPEG 2000, by a factor of 3.5 to 4 in compression speed. It is important to note that a direct comparison of compression
Pipeline - Codec     Number of pixels to encode     Encoding time
                     (for K1 × K2 CFA image)        (ms/frame)
IPP1 - JPEG XR       1.5 × K1 × K2                  216.91
IPP2 - JPEG XR       K1 × K2                        133.09
IPP3 - JPEG XR       3 × K1 × K2                    280.15
IPP4 - JPEG XR       K1 × K2                        126.47
IPP3 - JPEG 2000     3 × K1 × K2                    1152.94
IPP4 - JPEG 2000     K1 × K2                        429.41

Table 4.1: Encoding time for different pipelines and codecs
speed may not be meaningful, as different codecs implementing the same coding standard may produce different results. However, given that the computational complexity of JPEG XR is much lower than that of JPEG 2000, it is reasonable to assume that JPEG XR encoding remains faster than JPEG 2000 in real world applications as well. The simplified architecture and fast encoding speed are significant advantages for consumer level devices, justifying the use of JPEG XR over JPEG 2000 in the proposed CFA compression pipeline at the expense of a small loss in quality performance.
4.4 Chapter Summary
In this chapter, a lossy Bayer CFA compression scheme capable of handling HDR representation is presented. In summary, the following conclusions can be drawn: i) use of the 8-directional data adaptive ESM as an alternative to simple ESMs in the proposed pipeline yields high quality reconstructed images, especially at high bit rates; ii) the YCoCg is a low complexity alternative to the conventional YCbCr color space, offering identical or slightly better perceptual quality and enabling a more efficient implementation of the workflow; iii) the proposed IPP1 offers the highest image quality among the reviewed pipelines at high bit rates, while at low bit rates IPP2 produces visually pleasing images with the lowest processing delay; for bit rates between these extremes, the direct CFA encoding method provides the most satisfactory results in terms of coding efficiency, which suggests a selective use of pipelines in digital cameras depending on the target bit rate; iv) although JPEG 2000 can provide marginally higher coding efficiency, JPEG XR is a light-weight image codec capable of supporting the HDR format and is therefore suitable for resource constrained systems. Combining a series of pre-processing operations with a JPEG XR encoding module delivers a complete, novel, cost-effective solution for the efficient storage of HDR CFA image data.
Chapter 5
Conclusions and Future Work
5.1 Conclusions
Over the past years, advancements in single-sensor digital cameras have offered more convenient access to digital images in various environments. These consumer level cameras capture the natural scene by generating a mosaic-like grayscale image, also known as a CFA image. One major challenge in this field is supporting HDRI technology to achieve a more accurate representation of real visual scenes. Since digital images in HDR format require a larger amount of data than the conventional 8 bit representation, efficient compression of HDR content has become a critical issue. This thesis introduces both lossless and lossy compression schemes for the digital camera pipeline which efficiently encode CFA images provided in HDR format. Both systems combine a series of pre-processing operations with a JPEG XR encoding module. The pre-processing operations exploit spatial and spectral (inter-channel) correlations in the original CFA image to achieve optimal compression performance. The JPEG XR codec enables compression of HDR data at reasonable processing cost.
In Chapter 3, we proposed a lossless compression scheme for Bayer CFA images. The proposed scheme deinterleaves the input CFA image into sub-images of a single color component and adopts a predictor depending on local image statistics. The generated prediction error signals of each sub-image are then encoded by JPEG XR compression. Experimental results confirm that the proposed scheme effectively removes spatial and spectral redundancies, delivering higher compression efficiency than prior-art solutions.
In Chapter 4, we proposed a novel cost-effective lossy CFA encoding pipeline capable of handling HDR image representations. This scheme combines color space conversion, structure conversion, and a JPEG XR encoding module. Experimental analysis and comparative evaluations using objective quality metrics indicate that the proposed pipeline outperforms state-of-the-art CFA compression solutions that deploy low complexity edge sensing mechanisms and conventional color space conversions. The results suggest that the proposed schemes offer superior performance at low and high bit rates.

The proposed lossless scheme can be utilized in high-end/professional photography applications where the original CFA image needs to be preserved. On the other hand, the proposed lossy scheme provides greater compression gains at the expense of information integrity and is suitable for general consumer-level cameras with limited data storage.
5.2 Future Work
Although significant achievements have been made in this research toward improving the compression performance of digital cameras, there is still room for further improvement. This section discusses potential technical improvements and problems that can be further explored.
5.2.1 Potential extensions on the proposed systems
• The results provided by the empirical evaluations are limited to a specific database and are therefore inconclusive. Further experiments are needed to analyze the behavior of the proposed schemes with diverse sample sets including both natural and synthetic images. In addition, we can further investigate the influence of various image characteristics, such as spatial resolution, edge strength, and edge orientation, on the compression performance of the proposed scheme.
• The proposed lossless CFA compression scheme in Chapter 3 can be extended to
support lossy compression. This can be achieved by enabling the quantization module
in the JPEG XR codec and embedding a de-quantization module in the proposed pre-
dictor to undo quantization during the prediction process. Eventually, this modification
would allow us to build a unified compression pipeline that offers high compression
efficiency for both lossy and lossless encoding.
• The proposed compression schemes can be improved to support near-lossless
compression. This would require identifying a suitable function of the allowable encoding
error [40]. A near-lossless compression scheme that exploits the characteristics of
human visual perception is expected to achieve much higher compression efficiency
at the cost of marginal encoding errors.
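One common form of such an error function is the uniform residual quantizer used in JPEG-LS near-lossless coding [31], which bounds the per-sample reconstruction error by a chosen tolerance δ. The sketch below is illustrative and not part of the proposed pipeline:

```python
def quantize_residual(r, delta):
    """Map a prediction residual to a quantization index such that the
    reconstruction error is guaranteed to stay within +/- delta."""
    step = 2 * delta + 1
    if r >= 0:
        return (r + delta) // step
    return -((-r + delta) // step)

def dequantize_residual(q, delta):
    """Invert the mapping; delta = 0 degenerates to lossless coding."""
    return q * (2 * delta + 1)
```

Setting delta = 0 gives a step size of 1 and hence exact reconstruction, while larger values of delta shrink the residual alphabet and improve compression at a strictly controlled per-sample cost.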
5.2.2 General future work
In the past few years, new research problems in the area of digital photography have been
identified. In particular, demands for advanced technologies that increase the amount of
image data, such as 3D imaging, HDR imaging, and ultrahigh-definition video, have
raised the need for efficient encoding algorithms in digital photography.
• One of the new challenges in digital photography is the adaptation of 3D image pro-
cessing technology to mobile devices equipped with single-sensor imaging technol-
ogy [61]. In general, 3D visual content is generated by capturing the visual scene
from multiple viewpoints. This leads to a larger amount of image data compared
to conventional 2D technology and thus requires a high-performance compression
scheme. For commercial viability, new implementations need to be interactive,
portable, and embedded-system friendly.
• Ultrahigh-definition television (UHDTV) is a digital video format that supports
16 times the resolution of high-definition television (HDTV) at a frame rate of 60 Hz
in progressive scan mode. The development of a UHDTV camera utilizing single-
sensor imaging technology constitutes another emerging research direction in this
area [62]. Realization of such cameras requires an efficient compression technique
to handle high-data-rate visual content.
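As a rough illustration of the data rates involved, the raw single-sensor output at the resolution described above can be estimated as follows (the 12-bit sample depth is an assumed figure for illustration, not taken from [62]):

```python
# Back-of-envelope raw CFA data rate for a single-sensor UHDTV camera.
width, height = 7680, 4320      # 16x the 1920x1080 HDTV raster (~33 Mpixels)
fps = 60                        # progressive scan at 60 Hz
bits_per_sample = 12            # assumed sensor bit depth (illustrative)

raw_gbps = width * height * fps * bits_per_sample / 1e9
print(f"raw CFA rate: {raw_gbps:.1f} Gbit/s")   # about 23.9 Gbit/s
```

Even before demosaicking triples the channel count, the uncompressed sensor stream is on the order of tens of gigabits per second, which underlines why efficient CFA-domain compression is essential at this scale.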
Bibliography
[1] K. N. Plataniotis and A. N. Venetsanopoulos, Color image processing and applica-
tions. New York, NY, USA: Springer-Verlag New York, Inc., 2000.
[2] M. Mancuso and S. Battiato, “An introduction to the digital still camera technol-
ogy,” in ST Journal of System Research - Special Issue on Image Processing for
Digital Still Camera, pp. 200–1, 2001.
[3] K. Myszkowski, R. Mantiuk, and G. Krawczyk, High Dynamic Range Video. Synthe-
sis Lectures on Computer Graphics and Animation, Morgan & Claypool Publishers,
2008.
[4] D. Alleysson, S. Susstrunk, and J. Herault, “Linear demosaicing inspired by the
human visual system,” Image Processing, IEEE Transactions on, vol. 14, pp. 439
–449, Apr. 2005.
[5] B. Turko and G. Yates, “Low smear CCD camera for high frame rates,” Nuclear
Science, IEEE Transactions on, vol. 36, pp. 165–169, Feb. 1989.
[6] A. J. Blanksby and M. J. Loinaz, “Performance analysis of a color CMOS photogate
image sensor,” IEEE Transactions on Electron Devices, vol. 47, pp. 55–64, Jan.
2000.
[7] J. Adams, K. Parulski, and K. Spaulding, “Color processing in digital cameras,”
Micro, IEEE, vol. 18, pp. 20–30, Nov./Dec. 1998.
[8] B. E. Bayer, “Color imaging array,” U.S. Patent 3 971 065, July 1976.
[9] International Electrotechnical Commission, “Colour measurement and management
in multimedia systems and equipment - part 2-1: Default RGB colour space - sRGB,” 1999.
[10] G. K. Wallace, “The JPEG still picture compression standard,” Commun. ACM,
vol. 34, pp. 30–44, Apr. 1991.
[11] “Exchangeable image file format for digital still cameras: Exif version 2.2,” 2002.
Standard of Japan Electronics and Information Technology Industries Association.
[12] R. Lukac, Single-Sensor Imaging: Methods and Applications for Digital Cameras.
Boca Raton, FL, USA: CRC Press, Inc., 1 ed., 2008.
[13] N. Zhang and X. Wu, “Lossless compression of color mosaic images,” Image Pro-
cessing, IEEE Transactions on, vol. 15, pp. 1379 –1388, June 2006.
[14] C. C. Koh, J. Mukherjee, and S. Mitra, “New efficient methods of image compression
in digital cameras with color filter array,” Consumer Electronics, IEEE Transactions
on, vol. 49, pp. 1448 – 1456, Nov. 2003.
[15] N.-X. Lian, L. Chang, V. Zagorodnov, and Y.-P. Tan, “Reversing demosaicking
and compression in color filter array image processing: Performance analysis and
modeling,” Image Processing, IEEE Transactions on, vol. 15, pp. 3261 –3278, Nov.
2006.
[16] B. Gunturk, J. Glotzbach, Y. Altunbasak, R. Schafer, and R. Mersereau, “Demo-
saicking: color filter array interpolation,” Signal Processing Magazine, IEEE, vol. 22,
pp. 44 – 54, Jan. 2005.
[17] X. Li, B. Gunturk, and L. Zhang, “Image demosaicing: a systematic survey,”
vol. 6822, p. 68221J, SPIE, 2008.
[18] R. Kimmel, “Demosaicing: image reconstruction from color CCD samples,” Image
Processing, IEEE Transactions on, vol. 8, pp. 1221–1228, Sept. 1999.
[19] S.-C. Pei and I.-K. Tam, “Effective color interpolation in CCD color filter arrays using
signal correlation,” Circuits and Systems for Video Technology, IEEE Transactions
on, vol. 13, pp. 503–513, June 2003.
[20] R. Lukac, B. Smolka, K. Martin, K. Plataniotis, and A. Venetsanopoulos, “Vector
filtering for color imaging,” Signal Processing Magazine, IEEE, vol. 22, pp. 74–86,
Jan. 2005.
[21] S. Battiato, A. Castorina, and M. Mancuso, “High dynamic range imaging for digital
still camera: an overview,” Journal of Electronic Imaging, vol. 12, no. 3, pp. 459–469,
2003.
[22] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from
photographs,” in Proceedings of the 24th annual conference on Computer graphics
and interactive techniques, SIGGRAPH ’97, (New York, NY, USA), pp. 369–378,
ACM Press/Addison-Wesley Publishing Co., 1997.
[23] E. Khan, A. Akyuz, and E. Reinhard, “Ghost removal in high dynamic range im-
ages,” in Image Processing, 2006 IEEE International Conference on, pp. 2005–2008,
Oct. 2006.
[24] O. Gallo, N. Gelfandz, W.-C. Chen, M. Tico, and K. Pulli, “Artifact-free high
dynamic range imaging,” in Computational Photography (ICCP), 2009 IEEE Inter-
national Conference on, pp. 1–7, Apr. 2009.
[25] T.-H. Lee, W.-J. Kyung, C.-H. Lee, and Y.-H. Ha, “Estimation of low dynamic range
images from single Bayer image using exposure look-up table for high dynamic range
image,” vol. 7866, p. 78660B, SPIE, 2011.
[26] G. Qiu, J. Guan, J. Duan, and M. Chen, “Tone mapping for HDR image using opti-
mization: a new closed form solution,” in Proceedings of the 18th International Con-
ference on Pattern Recognition - Volume 01, ICPR ’06, (Washington, DC, USA),
pp. 996–999, IEEE Computer Society, 2006.
[27] J. Duan, M. Bressan, C. Dance, and G. Qiu, “Tone-mapping high dynamic range
images by novel histogram adjustment,” Pattern Recogn., vol. 43, pp. 1847–1862,
May 2010.
[28] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction
for digital images,” ACM Trans. Graph., vol. 21, pp. 267–276, July 2002.
[29] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression Funda-
mentals, Standards and Practice. Norwell, MA, USA: Kluwer Academic Publishers,
2001.
[30] H. S. Malvar, G. J. Sullivan, and S. Srinivasan, “Lifting-based reversible color trans-
formations for image compression,” vol. 7073, p. 707307, SPIE, 2008.
[31] M. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression
algorithm: principles and standardization into JPEG-LS,” Image Processing, IEEE
Transactions on, vol. 9, pp. 1309–1324, Aug. 2000.
[32] X. Wu and N. Memon, “CALIC: a context based adaptive lossless image codec,” in
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceed-
ings., 1996 IEEE International Conference on, vol. 4, pp. 1890–1893, May 1996.
[33] M. Rabbani and R. Joshi, “An overview of the JPEG 2000 still image compression
standard,” Signal Processing: Image Communication, vol. 17, no. 1, pp. 3–48, 2002.
[34] ITU-T Rec. T.832 and ISO/IEC 29199-2, “Information technology - JPEG XR image
coding system - part 2: Image coding specification,” 2009.
[35] S. Srinivasan, C. Tu, S. L. Regunathan, and G. J. Sullivan, “HD Photo: a new image
coding technology for digital photography,” vol. 6696, p. 66960A, SPIE, 2007.
[36] F. De Simone, M. Ouaret, F. Dufaux, A. G. Tescher, and T. Ebrahimi, “A com-
parative study of JPEG 2000, AVC/H.264, and HD Photo,” in SPIE Optics and
Photonics, Applications of Digital Image Processing XXX, vol. 6696, 2007.
[37] T. Bruylants, J. Barbarien, A. Munteanu, and P. Schelkens, “Perceptual quality
assessment of JPEG, JPEG 2000, and JPEG XR,” vol. 7723, p. 77230E, SPIE, 2010.
[38] R. Lukac and K. Plataniotis, “Single-sensor camera image compression,” Consumer
Electronics, IEEE Transactions on, vol. 52, pp. 299 – 307, May 2006.
[39] G. Schaefer, R. Nowosielski, and R. Starosolski, “Evaluation of lossless image com-
pression algorithms for CFA data,” in ELMAR, 2008. 50th International Symposium,
vol. 1, pp. 57–60, Sept. 2008.
[40] A. Bazhyna and K. Egiazarian, “Lossless and near lossless compression of real color
filter array data,” Consumer Electronics, IEEE Transactions on, vol. 54, pp. 1492–1500,
Nov. 2008.
[41] K.-H. Chung and Y.-H. Chan, “A lossless compression scheme for Bayer color filter
array images,” Image Processing, IEEE Transactions on, vol. 17, pp. 134–144, Feb.
2008.
[42] A. Bazhyna, K. Egiazarian, S. Mitra, and C. Koh, “A lossy compression algorithm
for Bayer pattern color filter array data,” in Signals, Circuits and Systems, 2007.
ISSCS 2007. International Symposium on, vol. 2, pp. 1–4, July 2007.
[43] H. Chen, M. Sun, and E. Steinbach, “Compression of Bayer-pattern video sequences
using adjusted chroma subsampling,” Circuits and Systems for Video Technology,
IEEE Transactions on, vol. 19, pp. 1891–1896, Dec. 2009.
[44] S. H. Lee and N. I. Cho, “H.264/AVC based color filter array compression with inter-
channel prediction model,” in Image Processing (ICIP), 2010 17th IEEE Interna-
tional Conference on, pp. 1237–1240, Sept. 2010.
[45] S.-Y. Lee and A. Ortega, “A novel approach of image compression in digital cam-
eras with a Bayer color filter array,” in Image Processing, 2001. Proceedings. 2001
International Conference on, vol. 3, pp. 482–485, 2001.
[46] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Synthesis Lectures
on Image, Video, and Multimedia Processing, Morgan & Claypool Publishers, 2006.
[47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment:
From error visibility to structural similarity,” IEEE Transactions on Image
Processing, vol. 13, no. 4, pp. 600–612, 2004.
[48] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structural similarity for
image quality assessment,” in Proc. IEEE Asilomar Conf. on Signals, Systems,
and Computers, pp. 1398–1402, 2003.
[49] R. Mantiuk, K. Myszkowski, and H.-P. Seidel, “Visible difference predicator for
high dynamic range images,” in Proceedings of IEEE International Conference
on Systems, Man and Cybernetics, pp. 2763–2769, 2004.
[50] R. Lukac and K. Plataniotis, “Data adaptive filters for demosaicking: a framework,”
Consumer Electronics, IEEE Transactions on, vol. 51, pp. 560 – 570, May 2005.
[51] “Para-dice in sight - compression database.” http://cdb.paradice-insight.us/.
[52] ITU-T Rec. T.832 and ISO/IEC 29199-5, “Information technology - JPEG XR image
coding system - part 5: Reference software,” 2010.
[53] X. Li and M. Orchard, “New edge-directed interpolation,” Image Processing, IEEE
Transactions on, vol. 10, pp. 1521–1527, Oct. 2001.
[54] K. Subbalakshmi, Lossless Compression Handbook, ch. Lossless Image Compression.
No. ISBN 0-12-620861-1 in Communications, Networking, and Multimedia, Aca-
demic Press, 2003.
[55] C. Doutre and P. Nasiopoulos, “An efficient compression scheme for colour filter
array images using estimated colour differences,” in Electrical and Computer Engi-
neering, 2007. CCECE 2007. Canadian Conference on, pp. 24 –27, Apr. 2007.
[56] R. Lukac, K. Plataniotis, D. Hatzinakos, and M. Aleksic, “A novel cost effective de-
mosaicing approach,” Consumer Electronics, IEEE Transactions on, vol. 50, pp. 256
– 261, Feb. 2004.
[57] X. Li, “Demosaicing by successive approximation,” Image Processing, IEEE Trans-
actions on, vol. 14, pp. 370 –379, Mar. 2005.
[58] D. Schonberg, S. Sun, G. J. Sullivan, S. Regunathan, Z. Zhou, and S. Srinivasan,
“Techniques for enhancing JPEG XR / HD Photo rate-distortion performance for
particular fidelity metrics,” in Society of Photo-Optical Instrumentation Engineers
(SPIE) Conference Series, vol. 7073 of Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference Series, Oct. 2008.
[59] J. F. Hamilton and J. E. Adams, “Adaptive color plane interpolation in single sensor
color electronic camera,” U.S. Patent 5 629 734, 1997.
[60] M. Adams and F. Kossentini, “JasPer: a software-based JPEG-2000 codec implemen-
tation,” in Image Processing, 2000. Proceedings. 2000 International Conference on,
vol. 2, pp. 53–56, Sept. 2000.
[61] K. Atanassov, V. Ramachandra, S. R. Goma, and M. Aleksic, “3D image process-
ing architecture for camera phones,” in Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference Series, vol. 7864 of Society of Photo-Optical Instru-
mentation Engineers (SPIE) Conference Series, Jan. 2011.
[62] R. Funatsu, T. Yamashita, K. Mitani, and Y. Nojiri, “Single-chip color imaging
for UHDTV camera with a 33M-pixel CMOS image sensor,” in Society of Photo-
Optical Instrumentation Engineers (SPIE) Conference Series, vol. 7875 of Society
of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Feb. 2011.