High Dynamic Range Image Compression of Color Filter Array Data for the Digital Camera Pipeline
by
Dohyoung Lee
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2011 by Dohyoung Lee
Abstract
High Dynamic Range Image Compression of
Color Filter Array Data for the Digital Camera Pipeline
Dohyoung Lee
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2011
Typical consumer digital cameras capture the scene by generating a mosaic-like grayscale image, known as a color filter array (CFA) image. One obvious challenge in digital photography is the storage of image data, which requires the development of an efficient compression solution. This issue has become more significant due to the growing demand for high dynamic range (HDR) imaging technology, which requires increased bandwidth to allow realistic presentation of the visual scene.

This thesis proposes two digital camera pipelines that efficiently encode CFA image data represented in HDR format. First, a lossless compression scheme consisting of a predictive coding step followed by a JPEG XR encoding module is introduced; it achieves efficient data reduction without loss of quality. Second, a lossy compression scheme consisting of a series of processing operations and a JPEG XR encoding module is introduced. Performance evaluation indicates that the proposed methods deliver high quality images at low computational cost.
Contents

1 INTRODUCTION
  1.1 Motivation
  1.2 Key Challenges
  1.3 Thesis Scope and Contributions
    1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline
    1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline
  1.4 Thesis Organization

2 BACKGROUND
  2.1 Digital Camera Design
    2.1.1 Digital Camera Architecture
    2.1.2 Image Processing Pipeline
    2.1.3 Color Demosaicking
    2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras
  2.2 Image Compression
    2.2.1 Common Image Compression Techniques
    2.2.2 Image Compression Standards: JPEG family
    2.2.3 Prior arts on Bayer CFA compression
  2.3 Image Quality Assessment Metrics
    2.3.1 Non-perceptual Quality Metrics
    2.3.2 Perceptual Quality Metrics

3 Lossless CFA Compression using Prediction
  3.1 Introduction
  3.2 Proposed Algorithm
    3.2.1 Deinterleaving Bayer CFA
    3.2.2 Green sub-image prediction
    3.2.3 Non-Green sub-image prediction
    3.2.4 Compression of prediction error
  3.3 Experimental Results
    3.3.1 Primary color channel and color difference channel
    3.3.2 Green channel interpolation method
    3.3.3 Dissimilarity measure in template matching
    3.3.4 Prediction algorithm
  3.4 Chapter Summary

4 Lossy CFA Compression using Colorspace Conversion
  4.1 Introduction
  4.2 Proposed Algorithm
    4.2.1 Interpolation of missing green components
    4.2.2 Interpolation of color difference components
    4.2.3 Correction of green and color difference components
    4.2.4 YCoCg color conversion
    4.2.5 Structure conversion
  4.3 Experimental Results
    4.3.1 Edge Sensing Mechanism (ESM) and Compression
    4.3.2 Color Space and Compression
    4.3.3 Proposed Pipeline and Conventional Pipelines
  4.4 Chapter Summary

5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
    5.2.1 Potential extensions on the proposed systems
    5.2.2 General future work

Bibliography
List of Tables

3.1 Lossless bitrate of proposed compression scheme with primary channel and color difference channel
3.2 Lossless bitrate of proposed compression scheme with various G interpolation schemes
3.3 Lossless bitrate of proposed compression scheme with SAD and SSE dissimilarity metrics
3.4 Lossless bitrate of various CFA compression schemes (direct CFA encoding schemes)
3.5 Lossless bitrate of various CFA compression schemes (predictive coding schemes)
3.6 Number of operations per pixel required for the proposed scheme
4.1 Encoding time for different pipelines and codecs
List of Figures

2.1 Typical optical path for single sensor cameras
2.2 Bayer CFA arrangement
2.3 Conventional Image Processing Pipeline
2.4 Alternative Image Processing Pipeline
2.5 Typical images with limited dynamic range and a HDR image
2.6 HDR image acquisition by capturing multiple images
2.7 HDR image acquisition by estimation
2.8 Image pipeline design with raw CFA image storage
2.9 Image pipeline design exploiting HDR contents compression
2.10 Block diagram of JPEG XR encoding process
2.11 CFA deinterleave process
2.12 CFA deinterleave process: G subimage
3.1 Overview of the proposed lossless CFA compression pipeline
3.2 Bayer CFA deinterleave method
3.3 Current pixel to be predicted and its 4 closest neighborhood pixels in a quincunx G sub-image
3.4 Template of G sub-image centered at (i,j). 'o' indicates pixels in the template region
3.5 Pixel values required for the prediction of G pixel at (i,j)
3.6 Weight computation for the prediction of G pixel at (i,j)
3.7 Current pixel to be predicted and its closest neighborhood pixels in a red difference (dr) sub-image
3.8 Template of red difference (dr) sub-image centered at (i,j). 'o' indicates pixels in the template region
3.9 Weight computation for the prediction of red difference (dr) pixel at (i,j)
3.10 Test digital color images (referred to as image 1 to image 31, from left to right and top to bottom)
3.11 2D autocorrelation graphs for image 4 in the database: (a) original images, R and B; (b) color difference images, dr and db
3.12 Entropy of sample images from the database with various prediction methods
4.1 Overview of the proposed lossy HDR CFA image compression pipeline
4.2 Indexing of the samples within a 5x5 window of Bayer CFA
4.3 Two versions of color space conversion
4.4 Rate-distortion curves of proposed pipelines with different ESMs for various quality metrics
4.5 Rate-distortion curves of proposed pipelines with different color spaces for various quality metrics
4.6 Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics
4.7 Full color images obtained from four examined IPPs with JPEG XR codec at bit rates between 1 and 2 bpp. The first 4 images are sub-regions of image 18, the next 4 are from image 21, and the last 4 are from image 1 in the database
Acronyms

ALCM Activity level classification model
ASIC Application-specific integrated circuit
BPP Bits per pixel
CCD Charge-coupled device
CDM Color demosaicking
CFA Color filter array
CMBP Context matching based prediction
CMOS Complementary metal oxide semiconductor
DCT Discrete cosine transform
DSP Digital signal processor
DWT Discrete wavelet transform
ESM Edge-sensing mechanism
EXIF Exchangeable image file
HDR High dynamic range
HDRI High dynamic range imaging
HVS Human visual system
JPEG Joint Photographic Experts Group
JPEG XR JPEG extended range
LBT Lapped bi-orthogonal transform
LDR Low dynamic range
MOS Mean opinion score
MSE Mean square error
PSNR Peak signal-to-noise ratio
RCT Reversible color transform
SAD Sum of absolute differences
SM Spectral model
SSE Sum of squared errors
SSIM Structural similarity index
UHDTV Ultra-high-definition television
VDP Visual difference predictor
Chapter 1
INTRODUCTION
1.1 Motivation
Over the past years, advancements in color imaging technology have reduced the complexity, size, and cost of color devices, such as digital cameras, monitors, and printers, allowing more convenient access to them in various environments. One of the rapidly evolving fields in color imaging technology is digital photography, which has gained significant popularity in recent years. To create an image of a scene, digital cameras use a sensor, an array of light-sensitive spots called photosites, each of which records the total intensity of the light reaching its surface. Commonly used image sensors are monochromatic and cannot record color information directly. Among existing solutions, single-sensor imaging technology, which captures visual scenes in color using a monochrome sensor in conjunction with a color filter array (CFA), offers a good tradeoff among cost, performance, and complexity. Thus, the single-sensor solution is widely adopted in typical consumer-grade digital cameras. Due to the advancement and proliferation of emerging digital camera based applications and commercial devices, such as multimedia mobile phones, sensor networks, and personal digital assistants (PDAs), the demand for single-sensor imaging and digital camera image processing solutions will grow considerably in
the next decade. [1]
Digital cameras embed a series of signal processing operations in their processors to produce digital images; this sequence is called an image processing pipeline. The three main components of the image processing pipeline are image acquisition, image transmission/storage, and image visualization. Since the pipeline design is a key factor determining the image quality and computational efficiency of digital cameras, a significant amount of research effort has been devoted to it. At the first stage of the pipeline, single-sensor cameras produce a mosaic-like image formed by intermixing samples from the RGB channels, also called a raw CFA image. The CFA image differs from a full color RGB image in that it contains only one color component at each pixel. To convert the CFA image to a full color RGB image, the two missing components of each pixel are estimated by a demosaicking operation. Various image processing techniques are then applied to the full color demosaicked image to enhance image quality. Finally, the enhanced image is compressed to reduce memory consumption. Recently, this demosaicking-first approach has been found to be sub-optimal in terms of compression efficiency. An alternative solution, which performs compression prior to demosaicking, has been proposed; it raises an issue specific to single-sensor cameras, namely the compression of a mosaic-like CFA image.
One of the most challenging and rapidly emerging issues for digital cameras is supporting high dynamic range imaging (HDRI) technology. HDRI uses a larger number of bits to represent each pixel of a digital image than conventional systems and thus provides increased tonal resolution. As a result, it achieves a more realistic representation of the visual scene with smoother gradation. It is foreseeable that the imaging industry will inevitably transition to HDRI technology in the near future. This change will affect all stages of the digital camera image processing pipeline, from data acquisition to visualization. In particular, the increase in dynamic range leads to an increased number of bits in the image data. For example, many digital cameras have started to produce CFA images in high bit-depth formats, typically between 10 and 16 bits per pixel (replacing the conventional 8 bits). Therefore, it has become highly important to develop efficient compression techniques for HDR CFA images in order to use costly storage effectively.
The purpose of this thesis is to propose efficient compression schemes for single-sensor digital cameras to encode CFA images given in HDR format. The proposed systems are designed to minimize the amount of memory required to store HDR CFA data, while keeping computational requirements low due to the limited resources in digital cameras. The development of an efficient HDR CFA compression scheme will ultimately enable ordinary users to experience promising HDRI technology in consumer level cameras and allow considerable improvement of the visual realism of digital visual content.
1.2 Key Challenges
In designing an efficient CFA image compression scheme, a number of engineering decisions need to be made. This section lists the general challenges and considerations associated with such a design for digital cameras. The main concerns are cost, image quality, operational/power efficiency, and portability. [2]
• Dynamic Range (Image Precision): Recently, high dynamic range imaging (HDRI) technologies have gained significant popularity in various fields, such as the movie, digital photography, and computer graphics industries. The research trend in digital photography is shifting from the enhancement of spatial resolution to tonal resolution, and significant emphasis is given to the incorporation of HDRI technologies into consumer level digital cameras. HDRI addresses the limitations of traditional low dynamic range imaging (LDRI) by providing a wider range of luminance information to achieve a more precise representation of real visual scenes. Consequently, HDRI technology can represent the entire dynamic range of luminance that humans can perceive. [3] In order to support HDRI in digital cameras, each stage in the image processing pipeline (IPP), from image acquisition to visualization, should be updated to handle image data in HDR format. In particular, the proposed CFA compression scheme should retain the high bit-depth of the given CFA data.
• Cost/Operational Efficiency: Production cost and operational efficiency are two closely related factors in IPP design. The proposed scheme should efficiently manage expensive camera on-board memory and other computational resources. Embedding sophisticated algorithms in a resource-constrained system is a challenging task due to hardware limitations and cost. The ideal solution exploits low complexity techniques in on-board processors and offloads high complexity algorithms to end devices, where sufficient processing power is available.

For optimum computational efficiency, computing hardware on a camera can be explicitly designed to implement a given processing algorithm in the form of an application-specific integrated circuit (ASIC). However, developing a new ASIC is an expensive process requiring relatively high usage volumes to make the approach financially attractive, and once constructed, the image processing chain in the ASIC cannot be changed. On the other hand, a digital signal processor (DSP) provides a significant degree of freedom over an ASIC block, as the DSP is a programmable device; the DSP is also advantageous in terms of production cost. In terms of processing speed, the ASIC is a better choice than the DSP, as the ASIC is dedicated to a given task and thus more optimized.
• Image Quality: The proposed compression scheme should be able to reproduce color with great fidelity and high accuracy. The quality of the final images is affected by the selection of processing algorithms. There are two categories of approaches depending on the nature of compression: lossless and lossy. A lossless compression algorithm does not allow any loss of image quality: the regenerated image after decompression is an exact replica of the original. Lossless compression is applicable to areas such as medical imaging, image archiving systems, cultural heritage preservation, and surveillance systems. On the other hand, a lossy compression algorithm aims to achieve a higher compression ratio than a lossless one by allowing marginal image distortion. Thus, part of the original data can be lost with a lossy approach, but it should maintain good perceptual quality in the reconstructed image.
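The lossless requirement above can be stated operationally: decompression must reproduce the input bit for bit. A minimal round-trip check using Python's general-purpose zlib codec illustrates the idea (purely illustrative; the schemes proposed in this thesis use JPEG XR, not zlib):

```python
import zlib

import numpy as np

# 16-bit sensor-like values standing in for an HDR CFA image
rng = np.random.default_rng(0)
data = rng.integers(0, 65536, size=1024, dtype=np.uint16).tobytes()

compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

# A lossless codec must reproduce the input exactly, bit for bit
assert restored == data
```

Note that random data is nearly incompressible; real CFA images contain the spatial redundancy that predictive coding exposes before entropy coding.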
1.3 Thesis Scope and Contributions
This thesis focuses on implementing color filter array (CFA) compression schemes for the digital camera pipeline that efficiently encode CFA images given in high dynamic range format (high bit-depth). Although various other CFA patterns exist, we focus only on the Bayer CFA, since it is the most commonly used one in the industry due to its optimal spatial arrangement [4]. Hereafter, whenever a CFA image is mentioned, a Bayer CFA image is meant unless specifically stated otherwise. Two different types of compression schemes are proposed in this thesis. The first proposed solution encodes HDR CFA data without loss of quality, referred to as a lossless scheme. The other solution is a lossy scheme that compresses the HDR CFA image with marginal quality loss to enhance compression efficiency.
1.3.1 Lossless HDR CFA compression scheme for the digital camera pipeline

The first contribution of this thesis is a lossless Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing operations followed by a JPEG XR encoding module. A deinterleaving step separates the CFA image into sub-images of a single color channel, and each sub-image is processed by a proposed weighted template matching based prediction. The utilized JPEG XR codec allows the compression of HDR data at low computational cost. Extensive experimentation is performed using sample test HDR images to validate performance, and the proposed pipeline outperforms existing lossless CFA compression solutions in terms of compression efficiency.
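The deinterleaving step mentioned above can be sketched as splitting the mosaic into four quarter-size planes, one per position in the 2x2 Bayer block (a simplified illustration assuming a GRBG layout; the actual sub-image structure is detailed in Chapter 3):

```python
import numpy as np

def deinterleave_bayer(cfa):
    """Split a Bayer CFA image into four quarter-size sub-images,
    one for each sample position of the 2x2 block (GRBG assumed)."""
    g1 = cfa[0::2, 0::2]  # green samples on even rows
    r = cfa[0::2, 1::2]   # red samples
    b = cfa[1::2, 0::2]   # blue samples
    g2 = cfa[1::2, 1::2]  # green samples on odd rows
    return g1, r, b, g2

cfa = np.arange(16, dtype=np.uint16).reshape(4, 4)
g1, r, b, g2 = deinterleave_bayer(cfa)
# The four 2x2 sub-images together cover every CFA sample exactly once
```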
1.3.2 Lossy HDR CFA compression scheme for the digital camera pipeline

The second contribution of this thesis is a lossy Bayer CFA image compression scheme capable of handling HDR representation. The proposed pipeline consists of a series of pre-processing steps followed by a JPEG XR encoding module. An 8-directional edge sensing mechanism and an inter-channel correlator are used to reduce estimation errors and preserve edge related information when estimating missing color components. The utilized YCoCg color space allows for a simplified pipeline implementation and the delivery of high quality results. The proposed solution is tested using sample HDR images, and performance is validated using three image quality assessment metrics: composite peak signal-to-noise ratio (CPSNR), multi-scale structural similarity index (MSSIM), and HDR visual difference predictor (HDR-VDP). Extensive experimentation reported in this thesis indicates that the proposed lossy compression solution is suitable for resource-limited environments due to its low complexity and high performance.
1.4 Thesis Organization
The remainder of this thesis is organized as follows. Chapter 2 provides the necessary background information and a review of previous work related to single-sensor imaging technology, HDRI technology, image compression techniques, and image quality assessment metrics. Our proposed CFA compression schemes are presented in Chapters 3 and 4. In Chapter 5, we conclude this thesis and discuss some limitations and practical issues to be considered in future research.
Chapter 2
BACKGROUND
In this chapter, we introduce technical concepts and existing research on the digital camera processing pipeline, high dynamic range imaging, the fundamentals of image compression, and image quality assessment metrics.
2.1 Digital Camera Design
2.1.1 Digital Camera Architecture
In digital cameras, the color information of a real-world scene is acquired through an image sensor, usually a charge-coupled device (CCD) [5] or complementary metal oxide semiconductor (CMOS) sensor [6], as a superimposition of three primary colors: red (R), green (G), and blue (B). Commonly used image sensors are monochromatic devices that sense light within a limited frequency range and therefore cannot acquire color information directly. Due to the monochromatic nature of the image sensor, digital camera manufacturers have implemented several solutions to capture the visual scene in color. The most straightforward approach is to use three separate sensors to capture the R, G, and B light: a beam splitter projects the light through three color filters toward the three sensors. However, a sensor is one of the most expensive components of a digital camera, usually accounting for up to 25 percent of the total production cost [7], and thus the three-sensor method is only used for high-end professional cameras. The cost effective alternative to the three-sensor approach is single-sensor imaging technology. To reduce cost and complexity, most digital cameras are equipped with a sensor coupled with a color filter array (CFA). A CFA is a mosaic of color filters placed on top of a conventional CCD/CMOS image sensor to filter out two of the R, G, and B components at each pixel position. Consequently, a digital image acquired through a CFA, called a raw CFA image, stores only a single R, G, or B measurement at each pixel, and the missing components are regenerated through a color demosaicking (CDM) process, also known as CFA interpolation. [1] The typical optical path for a single sensor camera is shown in Figure 2.1.
Figure 2.1: Typical optical path for single sensor cameras
Figure 2.2: Bayer CFA arrangement
A number of RGB CFAs with various layouts of color filters are used in practice. Since the CFA is placed at an early stage of the image acquisition pipeline, it determines the maximum resolution, image quality, and computational efficiency achievable by the subsequent processing pipeline. The most common CFA design is the Bayer pattern [8], which contains two green, one blue, and one red sample arranged in a 2x2 block, as shown in Figure 2.2. The green component in the Bayer CFA is measured at double the sampling rate because the human visual system (HVS) is more sensitive to the green portion of the spectrum.
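The 2x2 sampling pattern described above can be made concrete with a short sketch that subsamples a full-color image into a Bayer mosaic (an illustration assuming a GRBG layout; actual cameras perform this sampling optically with the filter array):

```python
import numpy as np

def bayer_mosaic(rgb):
    """Subsample a full-color H x W x 3 image into a single-channel
    Bayer CFA mosaic (GRBG: G and R on even rows, B and G on odd rows)."""
    h, w, _ = rgb.shape
    cfa = np.zeros((h, w), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 1]  # green
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 0]  # red
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 2]  # blue
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 1]  # green
    return cfa

rgb = np.random.default_rng(1).integers(0, 65536, (4, 4, 3), dtype=np.uint16)
cfa = bayer_mosaic(rgb)
# Half of the retained samples are green, matching the Bayer design
```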
2.1.2 Image Processing Pipeline
Digital cameras embed a series of signal processing operations in their processors to produce images; this sequence is called an image processing pipeline (IPP). The pipeline design plays a key role in generating high quality images in digital camera systems. Although the sequence of operations differs from manufacturer to manufacturer, a general image pipeline consists of a series of processing functions, as shown in Figure 2.3. In a typical digital camera pipeline architecture, CDM is one of the first operations performed after CFA image acquisition. CDM is a mandatory process that restores the color information from the original CFA image. The demosaicked RGB images are then modified by adjusting white balance and performing color and gamma correction, so that the colors of the input scene are matched when rendered on a display device. White balancing removes the color tint of an image to make white objects appear white. Color correction transforms the CFA sensor color space to a standard RGB space, such as linear sRGB [9]. Gamma correction adjusts the image intensity to compensate for the non-linearity of CRT or LCD displays. Once the adjustment and correction processes are completed, the enhanced image is compressed for storage or transmission. Typical cameras commonly store the image in a compressed format using the Joint Photographic Experts Group (JPEG) standard [10]. The exchangeable image file (EXIF) format [11] allows the storage of additional metadata about the camera and the image characteristics along with the JPEG-compressed image data. [1] A drawback of the conventional IPP in Figure 2.3 is that CDM does not increase the information content of the original image, but introduces redundancies by estimating missing pixels, consuming substantial camera storage. Since the objective of image compression is to reduce redundancies in image data, compressing demosaicked images can be counterproductive. To avoid this issue, an alternative IPP, shown in Figure 2.4, which reverses the CDM and compression stages, can be utilized. [12]
Figure 2.3: Conventional Image Processing Pipeline
Figure 2.4: Alternative Image Processing Pipeline
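The white balance and gamma correction stages described above amount to simple per-pixel operations. The sketch below uses hypothetical channel gains and a generic 2.2 power law; real cameras derive their gains from scene statistics and use device-specific response curves:

```python
import numpy as np

def white_balance(rgb, gains=(1.8, 1.0, 1.4)):
    """Scale each channel by a gain so that white objects appear white.
    The gains here are illustrative placeholders, not calibrated values."""
    return np.clip(rgb * np.asarray(gains), 0.0, 1.0)

def gamma_correct(rgb, gamma=2.2):
    """Apply a power-law encoding to compensate for display non-linearity."""
    return rgb ** (1.0 / gamma)

linear = np.full((2, 2, 3), 0.25)  # linear-light RGB values in [0, 1]
out = gamma_correct(white_balance(linear))
# Gamma encoding lifts mid-tones: 0.25 maps to roughly 0.53 on the G channel
```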
In the alternative scheme, the CFA image is compressed prior to converting it to a full color image. The main advantage of the alternative IPP is that the number of CFA samples is only one third of that in the full color image, thus requiring less computational resources and storage capacity. In addition, this approach allows CDM and other enhancement/correction operations to be performed in the end device rather than inside the camera. Offloading CDM from the camera to an end device, such as a personal computer (PC), allows the use of a highly sophisticated CDM algorithm to produce a more visually pleasing color output, because computational cost is less of an issue in that setting. Moreover, it simplifies the hardware architecture and reduces the cost, processing delay, and power consumption of digital cameras. Experimental results from the literature [13, 14, 15] suggest that the alternative IPP can generate images of similar or higher quality than the conventional chain at low compression ratios.
2.1.3 Color Demosaicking
Color demosaicking (CDM) [16, 17] is a crucial operation in the single-sensor imaging pipeline that restores the color image from the raw mosaic sensor data. The image acquired through a CFA appears as an interleaved mosaic, similar to a grayscale image, and the missing components in the CFA image are reconstructed through CDM to produce a complete RGB image. Thus, the objective of CDM is to transform a K1 × K2 grayscale image z : Z² → Z into a K1 × K2 full color image x : Z² → Z³. The CDM process can be modeled as an interpolation function fϕ, which defines the relationship between the output image x and the input CFA image z as follows:
x = fϕ(Λ, Ψ, ζ, z)    (2.1)

where Λ is the ESM (edge-sensing mechanism) operator, Ψ is the SM (spectral model) operator, ζ is the local neighborhood area, and z is the CFA image.
The edge-sensing mechanism (ESM) operator Λ = {w(i,j); (i, j) ∈ ζ} generates an edge-sensing weight w(i,j) for each individual neighborhood pixel on the basis of edge direction, so that the structural information of the input image z is preserved when estimating missing information. Non-data-adaptive ESM operators use simple linear averaging models and fixed weights for all surrounding pixels, resulting in blurred edges. Data-adaptive ESM operators, on the other hand, produce better quality full-color images with enhanced fine details by adjusting the edge-sensing weight factors of the surrounding pixels.

The spectral model (SM) operator Ψ uses the correlation between color channels to eliminate spectral artifacts in the output image x. There are two fundamental inter-channel correlation models: the color ratio rule [18] and the color difference rule [19]. The first model employs the property that the ratio of two color channels is constant over local regions; it assumes that within a given object, the ratios R/G and B/G are locally stationary.
The second model is based on the property that the color difference signals between the R, G, and B images vary slowly and can thus be regarded as locally constant. Instead of estimating the original intensities of the two chromatic color channels, R and B, color difference based algorithms estimate the difference signals, R−G or B−G, in order to derive the missing values.

It is essential to use appropriate ESM and SM operators in order to reduce excessive blur, color shifts, and visible aliasing effects during the demosaicking process. Equation (2.1) reflects important characteristics of natural scenes, namely: i) non-stationarity due to the existence of edges and fine details, ii) the existence of inter-channel correlation among the RGB channels, and iii) the existence of intra-channel correlation among spatially neighboring pixels. [20]
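As a concrete illustration of the color difference rule, the following sketch estimates a missing R value at a green-sampled pixel by averaging the R−G differences observed at nearby red-sampled positions (a simplified scalar example, not the demosaicking algorithm proposed later in this thesis):

```python
def estimate_red_at_green(g_center, neighbors):
    """Apply the color difference rule: R - G is assumed locally constant,
    so the missing R equals the local G plus the average R - G offset.
    `neighbors` holds (r_value, g_estimate) pairs at red-sampled positions."""
    diffs = [r - g for r, g in neighbors]
    return g_center + sum(diffs) / len(diffs)

# Hypothetical neighborhood: red samples with interpolated green values
neighbors = [(120, 100), (130, 108), (118, 101), (126, 103)]
r_hat = estimate_red_at_green(g_center=105, neighbors=neighbors)
# Average R-G offset is 20.5, so the estimate is 105 + 20.5 = 125.5
```

Estimating the slowly varying difference signal rather than raw R intensity is what suppresses the color artifacts mentioned above.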
2.1.4 High Dynamic Range Imaging in Single Sensor Digital Cameras
Currently, the research emphasis in digital photography is shifting from spatial resolution to tonal resolution, and a significant amount of research effort has been devoted to HDRI. HDRI is an imaging technology that enables a more realistic representation of the visual scene than conventional technologies by increasing the dynamic range of the image data. The dynamic range of a digital camera refers to the ratio between the maximum charge that the sensor can collect and the minimum detectable charge that just overcomes sensor noise. Once the light intensities of a real world scene are measured by the sensor, they are quantized to produce digital data, traditionally 8 bits per component, which gives 256 distinct levels. [21] However, the 8-bit representation is often not sufficient to represent the range of intensity levels in visual scenes containing both very bright and very dark areas at the same time, and this limitation often results in improper exposure in captured images. For instance, in a digital image captured with low exposure settings, dark areas in the scene will be recorded as black (underexposure). Conversely, with high exposure
settings, bright areas will be saturated (overexposure). HDRI performs operations on color data with more than 8 bits per component to represent more tonal levels over a much wider dynamic range. For example, a 16-bit format can be used to represent pixels in an HDR image, which provides 65,536 (= 2^16) tonal levels. This is sufficient to reveal more detail in complex scene lighting conditions. Figure 2.5 demonstrates poorly captured images due to limited dynamic range and an HDR image that preserves a wide dynamic range of light intensities. It can be seen that the texture pattern on the wall is hidden in dimly illuminated areas in the low exposure image, while detail of the stained glass is not visible due to saturation in the high exposure image. On the other hand, the final HDR image reveals all details without loss of information.
(a) image taken with low exposure time (b) image taken with high exposure time
(c) HDR image
Figure 2.5: Typical images with limited dynamic range and a HDR image
This section provides a brief overview of the three major components in the HDR
image processing pipeline for digital cameras: image acquisition, compression, and visualization. Particular emphasis is given to the acquisition and compression of HDR images, which are generally performed on digital cameras rather than on end devices.
HDR Content Acquisition
There are two common approaches to produce HDR images in single sensor digital cameras: i) capture images directly with an HDR sensor, or ii) generate HDR images by combining multiple low dynamic range (LDR) images taken at more than one exposure level using a regular sensor. Due to the high production cost associated with an HDR sensor, the latter approach is more practical for consumer level cameras. In order to generate an HDR image, multiple photos at different exposure values are captured and combined together to obtain good detail in all areas of a scene. The merging of multiple LDR images, the so-called HDR reconstruction process, involves the characterization of the sensor's intensity response function f, which relates an image pixel value zij to an actual scene radiance value Eij as follows:
zij,k = f(Eij∆tk + ηij) (2.2)
A collection of k differently exposed pictures of a scene acquired with known variable exposure times ∆tk and sensor noise ηij gives a set of zij,k values for each pixel ij, where k is the index over exposure times. Once f is recovered, the actual scene radiance values are obtained by applying its inverse f−1 to the set of corresponding brightness values zij,k observed in the differently exposed images. One of the most popular techniques for HDR reconstruction is the Debevec and Malik method, shown in Figure 2.6 [22]. It is a two-stage HDR reconstruction algorithm that estimates a non-parametric response function from image pixels and then recovers the radiance map.
Figure 2.6: HDR image acquisition by capturing multiple images
The input to the algorithm is a number of digital images taken from the same vantage point with different known exposure durations ∆tk. It is assumed that the scene is static, the sensor noise ηij is negligible, the irradiance values Eij,k for each pixel ij are constant, and f is monotonic, thus invertible. The camera response function f then satisfies

zij,k = f(Eij ∆tk)
f−1(zij,k) = Eij ∆tk
ln f−1(zij,k) = ln Eij + ln ∆tk
g(zij,k) = ln Eij + ln ∆tk , where g = ln f−1    (2.3)
The algorithm finds the function g and the radiances Eij that best satisfy an objective function in a least-squared error sense. Once g is obtained, it can be used to convert pixel values to relative radiance values Eij using the known ∆tk. For multiple capture approaches, it is essential that the scene is completely static during the captures. Otherwise, misalignment between images, due to movement of either objects in the scene or the camera, causes a ghosting effect, which introduces blurry or transparent artifacts in the generated HDR image. Several techniques have been proposed to reduce the ghosting problem: i) use a tripod to eliminate camera movement, ii) capture the scene with a faster shutter speed to freeze the motion of objects, and iii) exploit anti-ghosting techniques [23, 24].
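Once the response curve g is known, the second stage of the reconstruction reduces to a per-pixel weighted average of log-radiance estimates over the exposures. The sketch below illustrates that step under stated assumptions: a grayscale 8-bit sensor, g supplied as a precomputed lookup table, and a simple hat-shaped weighting that trusts mid-range pixel values most. The function and variable names are illustrative, not taken from [22].

```python
import numpy as np

def recover_radiance(images, exposure_times, g, z_max=255):
    """Recover a relative radiance map from differently exposed LDR images.

    images:         list of K grayscale images (2-D integer arrays)
    exposure_times: list of K exposure durations dt_k (seconds)
    g:              lookup table with g[z] = ln f^-1(z), length z_max + 1
    """
    z_mid = z_max / 2.0
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros(images[0].shape, dtype=np.float64)
    for img, dt in zip(images, exposure_times):
        z = img.astype(np.int64)
        w = z_mid - np.abs(z - z_mid)       # hat weight: zero at the extremes
        num += w * (g[z] - np.log(dt))      # per-exposure log-radiance estimate
        den += w
    ln_E = num / np.maximum(den, 1e-9)      # weighted average, avoid div by zero
    return np.exp(ln_E)                      # relative radiance map E_ij
```

With a linear response (g[z] = ln z) and two exposures of the same scene, the recovered map reproduces the underlying radiances up to scale.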
Recently, a new HDR acquisition technique [25] has been proposed that does not require multiple captures. This method, shown in Figure 2.7, generates multiple LDR
images at different exposure levels from a single input Bayer CFA image using predefined look-up tables (LUTs), and merges the original and generated LDR images together to produce a final HDR image. Since this method removes the need for iterative capture and avoids the ghosting artifacts caused by moving objects, it is a reasonable solution that makes HDRI technology feasible in single sensor imaging devices, alongside the multiple LDR capture method.
Figure 2.7: HDR image acquisition by estimation
HDR Image Compression
As discussed in the previous section, there are different techniques to create HDR images in digital cameras. Compression of the acquired HDR content forms the next component in the processing chain. Nowadays, high-end/professional cameras allow the storage of the raw CFA data at high bit-depth, typically between 10 and 16 bits per pixel. For example, a popular high-end camera, the Canon EOS 5D Mark II, provides raw CFA images at a bit depth of 14 bits. The increase in bit depth leads to an increased amount of image data and calls for more efficient encoding algorithms. JPEG compression, the most widely used image compression solution, disallows the future manipulation offered by high bit-depth data since it is limited to an 8-bit representation. Therefore, original HDR content must be squashed into 8 bits prior to applying JPEG compression, causing a loss of precision. Current high-end cameras address this issue by allowing the storage of the raw CFA image without compression, as illustrated in Figure 2.8.
Figure 2.8: Image pipeline design with raw CFA image storage
In such a design, the user can retrieve CFA images from the digital camera and perform high quality post-processing operations on a PC without loss of HDR content. Camera manufacturers provide different types of raw files, such as CR2 (Canon), NEF (Nikon), ORF (Olympus), PEF (Pentax), RW2 (Panasonic) and SR2 (Sony), mostly based on the TIFF file format. However, preserving CFA images in a raw format leads to excessive consumption of the camera's storage memory. Figure 2.9 presents an image processing pipeline that addresses the storage inefficiency issue associated with HDR data.
Figure 2.9: Image pipeline design exploiting HDR contents compression
In the proposed IPP, an image compression standard capable of handling high bit-depth data, such as JPEG XR or JPEG 2000, is applied immediately after raw CFA image acquisition. It allows the CFA image to be compressed while retaining the high bit-depth data necessary for future manipulation. Ultimately, the user is offered efficient usage of expensive memory resources while maintaining superior image quality during various post-processing operations.
HDR Image Display
Displaying HDR content is the last component of the HDR image processing chain. HDR content usually cannot be directly displayed on common display devices, such as LCD or CRT monitors, as the dynamic ranges of such devices are limited to the conventional 8-bit representation. The tone mapping process performs a conversion that takes the luminance of an HDR image as input and produces output pixel intensities that can be displayed on standard display devices. Several tone mapping algorithms have been proposed in the literature, and they are categorized in two classes: global approaches [26, 27] and local approaches [28]. Global tone-mapping algorithms apply the same transfer function to all pixels. On the other hand, local tone-mapping algorithms adapt the mapping function depending on local statistics and pixel contexts. Generally, there is no single method that produces the best result for all images and thus, users need to select an optimal algorithm based on their particular requirements and available computational resources.
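As an illustration of the global class, the sketch below applies a Reinhard-style global operator: every pixel passes through the same compressive transfer function L/(1+L) after the scene luminance is scaled by a key value. This is a simplified sketch for illustration; the function name, the key parameter `a`, and the choice of operator are assumptions, not a method prescribed in this thesis.

```python
import numpy as np

def tone_map_global(luminance, a=0.18, eps=1e-6):
    """Global tone mapping sketch (Reinhard-style operator).

    Maps HDR scene luminance to display range [0, 1) by applying one
    transfer function to all pixels, after normalizing the scene to a
    chosen key value `a`.
    """
    # log-average ("key") of the scene luminance
    L_avg = np.exp(np.mean(np.log(luminance + eps)))
    L = a * luminance / L_avg          # scale the scene to the chosen key
    return L / (1.0 + L)               # global, monotonic compression into [0, 1)
```

Because the same monotonic function is applied everywhere, relative ordering of intensities is preserved, which is the defining property of the global class.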
2.2 Image Compression
In digital imaging, each pixel is a sample of an original image, and its intensity is typically represented with a fixed number of bits. Statistical analysis indicates that digital images contain a significant amount of spatial and spectral redundancy. Image compression aims at taking advantage of these redundancies to reduce the number of bits needed to represent an image. In addition, the insensitivity of the HVS allows a further reduction of bandwidth by discarding certain signals that are not perceptible to humans. This section elaborates on fundamental image compression techniques, common image compression standards, and various CFA compression algorithms for single sensor imaging devices.
2.2.1 Common Image Compression Techniques
Color Space Conversion
A digital image generally has three color components per pixel: R, G, and B. Instead of coding RGB data directly, common compression standards exploit a color space conversion to transform it into a luminance/chrominance system. The luminance/chrominance system defines a color space in terms of one luminance and two chrominance components. Luminance is the perceived brightness of the light, while chrominance is defined as the characteristic of light that produces the sensation of color apart from luminance [1]. Luminance/chrominance spaces are advantageous over RGB for two major reasons. Firstly, for general color images, inter-channel correlation can be reduced by converting RGB images to luminance/chrominance images; thus, the color space conversion allows better compression performance. Secondly, it is a more convenient form in which to apply subsampling, which allows the reduction of visually redundant content that is less perceptible to humans. The most commonly used luminance/chrominance system in multimedia compression is the YCbCr space. The forward and inverse conversions between RGB and
YCbCr are defined in the JPEG 2000 specification as follows [29]:Y
Cb(U)
Cr(V )
=
0.299 0.587 0.144
−0.169 −0.331 0.5
0.5 −0.4187 −0.08
R
G
B
⇐⇒
R
G
B
=
1 0 1.402
1 −0.344 −0.714
1 1.772 0
Y
Cb
Cr
(2.4)
The conversion process in (2.4) is computationally expensive due to its floating point arithmetic. Recently, the YCoCg color space was introduced to simplify the color transformation by avoiding the use of floating point coefficients and the associated rounding errors. This new color space defines two chrominance channels, Co and Cg, which can be regarded as excess orange and excess green. The transform matrix of YCoCg was derived as a close approximation of the Karhunen-Loeve transform (KLT) computed on the standard Kodak image set, and the transform can be implemented using simple additions and right shifts as follows [30]:
    | Y  |   |  1/4   1/2   1/4 | | R |
    | Co | = |  1/2    0   -1/2 | | G |
    | Cg |   | -1/4   1/2  -1/4 | | B |

    | R |   | 1    1   -1 | | Y  |
    | G | = | 1    0    1 | | Co |
    | B |   | 1   -1   -1 | | Cg |
                                      (2.5)
The reversible form of the YCoCg transform, referred to as YCoCg-R, is used in the JPEG XR standard and in recent editions of the H.264/MPEG-4 AVC standard.
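Note that (2.5) shows the plain YCoCg matrix; the reversible variant replaces it with a sequence of lifting steps using only integer additions and arithmetic shifts, which is what guarantees exact inversion for lossless coding. The sketch below illustrates this lifting structure; function names are illustrative.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform (integer adds and shifts only).

    Each step is individually invertible, so the whole transform is
    exactly reversible on integer inputs.
    """
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every lifting step is undone exactly, the round trip reproduces the input RGB triple bit-for-bit, unlike the floating point conversion in (2.4).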
Predictive Coding
Instead of encoding the original signal directly, a predictive coding technique, also known as differential coding, encodes the difference between the original signal and its prediction. Since pixels in a natural image are highly correlated with each other, a pixel can be predicted with good accuracy from its adjacent pixels. The predicted value is then subtracted from the original value of the corresponding pixel to obtain a prediction error, also called a prediction residue. The performance of predictive coding is significantly affected by the accuracy of the prediction algorithm. If the predictor is well designed, the distribution of the prediction error signal will be closely concentrated around zero and the variance of the error signal will be much lower than that of the original signal. Consequently, applying entropy coding to the prediction error signal will improve compression efficiency.
Predictive coding is often used in lossless compression standards. One of the most popular compression standards making use of predictive coding is JPEG-LS [31]. The JPEG-LS standard exploits a predictor called the Median Edge Detector (MED), which provides a good balance between prediction accuracy and computational simplicity. It predicts the value of the current pixel by examining its three neighboring pixels in the North, West, and North-west directions. Another lossless image codec, CALIC [32], employs an advanced predictor called the Gradient Adjusted Predictor (GAP) that provides higher prediction performance by using seven neighboring pixels.
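The MED predictor described above can be sketched in a few lines; the function name is illustrative.

```python
def med_predict(north, west, northwest):
    """Median Edge Detector (MED) predictor used by JPEG-LS.

    Predicts the current pixel from its North, West, and North-west
    neighbors: near a horizontal or vertical edge it picks the neighbor
    on the smooth side of the edge, otherwise it falls back to the
    planar estimate N + W - NW.
    """
    if northwest >= max(north, west):
        return min(north, west)            # edge detected above or to the left
    elif northwest <= min(north, west):
        return max(north, west)
    else:
        return north + west - northwest    # smooth region: planar prediction
```

The three branches are what make MED adaptive: the comparison against the North-west neighbor is a cheap edge detector that steers the prediction without any explicit gradient computation.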
2.2.2 Image Compression Standards: JPEG family
In digital photography, there are many different formats for compressing raw images. However, the most frequently used compression standards are the ones established by the Joint Photographic Experts Group. These standards are widely adopted by manufacturers to ensure compatibility across their products. The first standard released by the JPEG group is the original JPEG standard [10], finalized in 1992. JPEG's baseline mode, the most dominantly used operation mode, is a lossy compression scheme based on the two dimensional Discrete Cosine Transform (DCT). Its workflow consists of color space conversion, DCT transform, quantization, and entropy coding. Although JPEG has been successful in the industry for a long period, its limitations in rate-distortion performance and its lack of support for a unified pipeline for both lossy and lossless coding raised the need for an advanced compression standard. To overcome the limitations of JPEG, JPEG 2000 [33] was released in 2000, built on the Discrete Wavelet Transform (DWT). JPEG 2000 provides not only higher rate-distortion performance than the original JPEG standard but also a single pipeline for both lossy and lossless encoding. Its spatial and quality scalability allows decoding of the compressed bitstream at different resolution and precision configurations to meet different application requirements. In addition, JPEG 2000 can handle high bit-depth data, such as 16-bit integer or 32-bit floating point per component, enabling compression of HDR images. However, the main disadvantage of JPEG 2000 compared with JPEG is its complex architecture, which has resulted in limited industrial adoption.
JPEG XR (extended range) [34], released in 2009, is a new image compression standard based on Microsoft coding technology known as HD Photo [35]. JPEG XR provides many of the convenient features offered in JPEG 2000 while keeping its architecture substantially simpler than that of JPEG 2000, since it only uses integer based computations internally.
Figure 2.10: Block diagram of JPEG XR encoding process
JPEG XR supports a wide range of input bit-depths, from 1 bit through 32 bits per component. The 8-bit and 16-bit formats are supported for both lossy and lossless compression, while the 32-bit format is only supported for lossy compression, as only 24 bits are typically retained through the internal operations. Following the conventional image compression structure, JPEG XR's coding path, shown in Figure 2.10, includes color space conversion, a block transform based on a reversible lapped bi-orthogonal transform (LBT), quantization, and entropy coding. The LBT converts image data from the spatial domain to the frequency domain. As a result of the LBT, the coefficients are grouped into three subbands: DC, lowpass (LP), and highpass (HP). The DC, LP and HP subbands are then quantized and entropy coded independently.
The performance of JPEG XR has been compared with other compression standards in the literature. [36] evaluates the rate-distortion performance of JPEG XR against JPEG, JPEG 2000 and AVC/H.264 HP 4:4:4 intra coding using objective quality metrics, such as PSNR and the MSSIM index. It concludes that the performances of JPEG XR and JPEG 2000 are very close to each other, with JPEG 2000 outperforming JPEG XR slightly in some cases. [37] performs perceptual quality assessments to compare the rate-distortion performance of JPEG, JPEG 2000, and JPEG XR. The experimental results drew similar conclusions to the objective assessments.
2.2.3 Prior Art on Bayer CFA Compression
As discussed in Section 2.1.4, storage of raw CFA images leads to excessive usage of the camera's on-board memory, which raises the problem of efficient CFA image compression. This section summarizes various CFA image compression schemes in the literature that follow the alternative processing workflow in which compression is performed at an earlier stage than CDM. The most straightforward approach is the direct application of a standard image compression scheme, such as JPEG, JPEG-LS, or JPEG 2000, to raw CFA images [38, 39]. However, direct compression of raw CFA images is found to be inefficient, since existing compression solutions are generally optimized for continuous tone images and do not work as effectively for mosaic-like images. Due to the nonuniform spectral sensitivity of the image sensor, pixels from different color channels have different average intensity levels. Therefore, intermixing pixels from different color channels generates artificial discontinuities. In order to address this issue, advanced CFA compression schemes typically exploit various pre-processing operations prior to image encoding for optimal use of the compression tools.
Current state-of-the-art single sensor camera designs utilize compression schemes of three different types: lossless [13, 39, 40, 41], lossy [14, 15, 38, 42, 43, 44, 45], and near-lossless [40], depending on the nature of the pre-processing algorithms and compression tools. Lossless compression is used when an exact replica of the original image data is preferred over a high compression ratio. It is crucial in the fields of medical imaging, the cinema industry, and image archiving systems for museum arts and relics. On the other hand, lossy approaches aim to minimize the amount of image data by discarding visually redundant content. They are suitable for areas where the efficient usage of memory and computational resources is paramount. Near-lossless schemes lie somewhere in-between the two other classes; their algorithms achieve perceptually lossless compression by limiting the distortion in the compressed image to pre-defined threshold values.
Figure 2.11: CFA deinterleave process
In the following, a number of pre-processing techniques exploiting a pixel rearrangement strategy are discussed. Commonly, prior-art solutions deinterleave the CFA image into sub-images, each consisting of samples from a single color channel. The resulting R and B sub-images form rectangular lattices which can be easily encoded by common standards. However, the quincunx lattice of the G sub-image needs to be further processed for subsequent compression. There are three popular approaches to transform the quincunx G sub-image into a form more convenient for compression: i) merge, ii) separation, and iii) rotation. Some CFA compression techniques [14, 42] employ a color space conversion to convert the CFA image from the RGB domain to a luminance-chrominance domain prior to deinterleaving. In such a scenario, the deinterleave operation produces a quincunx luminance (Y) channel and rectangular chrominance (C) channels, and thus the following techniques can be applied to the Y channel instead of G.
Firstly, the merge method [14, 43, 44] shifts either even pixel rows up or even pixel columns left by one pixel. This produces a rectangular grid where one dimension is equal to and the other one is half of the corresponding CFA dimension. The generated rectangular
Figure 2.12: CFA deinterleave process : G subimage
images are compressed by JPEG or JPEG 2000. Since a simple shift can introduce distortion, causing suboptimal compression, [14] applies directional lowpass filtering prior to compression. This is only suitable for lossy approaches, as lowpass filtering removes edges and fine details. Secondly, the separation method [14, 38, 40, 42] splits the quincunx lattice into two rectangular lattices and compresses them separately. Independent encoding of the two sublattices is inefficient as it disregards the spatial correlation between them, and therefore, [40] applies a predictive coding technique to improve compression efficiency. Lastly, the rotation method [45] rotates the quincunx grid by 45 degrees and removes blank pixel positions. However, the resulting image forms a rhombus, and standard encoders such as JPEG and JPEG 2000 cannot be applied directly.
Instead of performing color channel deinterleaving, [13] applies a wavelet decomposition followed by entropy coding directly to CFA images to alleviate the aliasing issue in direct CFA encoding. In this scheme, the Mallat wavelet transform decorrelates CFA images by efficiently packing the signal energy into subbands. Overall, there exist various CFA compression schemes, and the experimental results indicate that there is no single best method for all test images. Therefore, the ultimate design goal is to decide on appropriate pre-processing operations and compression standards to meet a set of requirements: rate-distortion performance and computational cost.
2.3 Image Quality Assessment Metrics
With the advent of various multimedia compression standards, it has become increasingly important for industry to devise standardized quality assessment tools for compressed digital content. Since human observers are the ultimate receivers in image processing applications, the most reliable way to evaluate quality is to conduct a survey, where a group of humans is asked to rate the perceived quality of presented images on a numerical scale. The average of the obtained values is called the mean opinion score (MOS), and such an assessment technique is referred to as subjective quality assessment (QA). However, the impracticality of subjective QA raised the need for objective QA, which measures the perceived quality of visual content using automated algorithms. Such metrics can be employed to benchmark image processing systems and can also be embedded into a system to optimize its parameter settings. Generally, objective QA metrics are categorized in three classes [46]: i) full-reference (FR), ii) no-reference (NR), and iii) reduced-reference (RR). FR algorithms require an original (non-distorted) version of the image to predict the perceived quality of a distorted sample image. NR algorithms do not need access to the original image, and RR algorithms lie somewhere in-between, where they only require some characteristics of a reference image. This section focuses on image QA metrics implementing FR algorithms, which are mainly used in this thesis research.
2.3.1 Non-perceptual Quality Metrics
One of the most common objective QA metrics, the Mean Square Error (MSE), is defined as

    MSE = (1 / MN) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − Y(i,j))^2    (2.6)

where X denotes a reference image, Y denotes the distorted image to be compared, and M, N denote the image dimensions. The MSE is basically a normalized Minkowski distance with order p equal to 2, where the Minkowski distance is defined as follows:

    Ep = ( Σ_{i=1..M} Σ_{j=1..N} |X(i,j) − Y(i,j)|^p )^{1/p}    (2.7)

In addition, setting p = 1 yields the mean absolute error (MAE), and p = ∞ yields the maximum absolute difference (MAD). In practice, MSE's variant, the peak signal-to-noise ratio (PSNR), is often used on a dB scale, and is defined as follows:

    PSNR = 10 log10( (2^B − 1)^2 / MSE )    (2.8)
where B represents the bit depth. MSE, PSNR and their variants can be easily implemented in real world applications but often do not reflect the way that the HVS perceives images. Therefore, a major emphasis in recent research has been given to image QA algorithms based on explicit modeling of the HVS, such as the structural similarity index (SSIM) and the Visible Difference Predictor (VDP).
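A minimal implementation of (2.6) and (2.8) can be sketched as follows; function names are illustrative.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between reference x and distorted y, as in (2.6)."""
    return np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)

def psnr(x, y, bit_depth=8):
    """Peak signal-to-noise ratio in dB, as in (2.8)."""
    peak = (2 ** bit_depth - 1) ** 2       # squared peak value (2^B - 1)^2
    err = mse(x, y)
    if err == 0:
        return float("inf")                # identical images: unbounded PSNR
    return 10.0 * np.log10(peak / err)
```

For an 8-bit image with a uniform error of 5 gray levels, the MSE is 25 and the PSNR is about 34.15 dB, which illustrates why PSNR is reported on a logarithmic scale.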
2.3.2 Perceptual Quality Metrics
The SSIM index [47] is a widely used FR algorithm based on the idea that the HVS is highly adapted to extract structural information from visual scenes. It separates the task of image similarity measurement into three components: luminance, contrast, and structure. The luminance and contrast distortions are affected by illuminance variations, while the structure information of the objects is independent of the illuminance. Hence, the SSIM algorithm performs an independent structure distortion measurement along with luminance and contrast analysis. Similarly to other FR approaches, the SSIM index is a function of two images, denoted as X and Y, such that if one of the images is assumed to be the reference image, the SSIM index can be regarded as a quality measure of the other image.
Initially, the algorithm estimates the local luminance of each image signal by its mean intensity. The local luminance of image X, µx, is obtained by

    µx = (1 / MN) Σ_{i=1..M} Σ_{j=1..N} X(i,j)    (2.9)

Secondly, the mean intensity is removed from the signal and the standard deviation is used as a rough estimate of the contrast information. The contrast of image X, σx, is estimated as follows:

    σx = { (1 / (MN − 1)) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − µx)^2 }^{1/2}    (2.10)
Next, the signal is normalized by its own mean and standard deviation. This normalized signal, (X − µx)/σx, is used as a structure estimate of image X. The parameters for local luminance, contrast, and structure information are obtained for each image signal, and they formulate the luminance comparison function l(X, Y), contrast comparison function c(X, Y), and structure comparison function s(X, Y) as follows:

    l(X, Y) = (2µxµy + C1) / (µx^2 + µy^2 + C1)
    c(X, Y) = (2σxσy + C2) / (σx^2 + σy^2 + C2)
    s(X, Y) = (σxy + C3) / (σxσy + C3)
    where σxy = (1 / (MN − 1)) Σ_{i=1..M} Σ_{j=1..N} (X(i,j) − µx)(Y(i,j) − µy)    (2.11)

C1, C2, and C3 are defined as C1 = (K1 L)^2, C2 = (K2 L)^2, and C3 = C2/2, where L denotes the dynamic range of the pixel values, and K1, K2 are small positive constants generally set to 0.01 and 0.03, respectively. Finally, the three components are combined to yield
Chapter 2. BACKGROUND 29
an overall similarity measure SSIM(X, Y):

    SSIM(X, Y) = [l(X, Y)]^α · [c(X, Y)]^β · [s(X, Y)]^γ    (2.12)

where α, β and γ are positive parameters that adjust the relative importance of the three components. Typically, the SSIM method is applied locally rather than globally using a support window, producing an SSIM index quality map of the image. In practice, when a single quality measure of the entire image is preferred over the quality map, a mean SSIM index is often used, using (2.13):

    SSIM(X, Y) = (1 / M) Σ_{i=1..M} SSIM(xi, yi)    (2.13)

where xi and yi are the image pixel values of the reference and distorted images at the i-th local window, and M is the number of local windows in the image.
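The per-window computation in (2.9)-(2.12) can be sketched as a simplified, single-window SSIM. This is an illustration only: it treats the whole image as one window with α = β = γ = 1, whereas a practical implementation slides a small local window and averages the results as in (2.13). Function and variable names are assumptions.

```python
import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Simplified single-window SSIM following (2.9)-(2.12)."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    # unbiased (N - 1) estimates, matching (2.10) and (2.11)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    sxy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)   # luminance
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)           # contrast
    s = (sxy + C2 / 2) / (sx * sy + C2 / 2)                     # structure, C3 = C2/2
    return l * c * s
```

Comparing an image against itself yields an index of 1, and any luminance shift or structural change pulls the index below 1, which is the behavior the three-component decomposition is designed to capture.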
An advanced SSIM metric, called the multi-scale SSIM (MSSIM) [48], is often used due to its robustness to variations in viewing conditions. MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. Finally, all of the data is pooled into a single number. MSSIM provides good correlation with subjective measurements at a reasonable computational cost.
Another widely used image QA metric is the Visible Difference Predictor (VDP). The VDP metric predicts the percentage of pixels of a test image that standard observers would perceive as different from the original. In other words, VDP does not try to judge how irritating the image artifacts introduced by compression are; it only tries to predict whether they are detectable. VDP deploys a highly complex model of the HVS and is thus more computationally intensive than MSSIM. The VDP algorithm customized for HDR images is called HDR-VDP [49]. It deploys several modifications to the VDP to improve its prediction accuracy over a wider range of luminance and under adaptation conditions corresponding to real scene observation.
Chapter 3
Lossless CFA Compression using Prediction
3.1 Introduction
In this chapter, a new lossless CFA compression method capable of handling HDR representation is presented. We focus on the Bayer CFA structure as it is the dominant CFA arrangement in the industry. The proposed scheme consists of color channel deinterleave, weighted template matching prediction, and lossless image compression operations. There are two main differences between the proposed method and prior art solutions. Firstly, it introduces a weighted template matching prediction to increase the accuracy of prediction and achieve high compression efficiency. Our method is similar to the context matching based prediction (CMBP) presented in [41], but is more advantageous in terms of computational complexity, because the proposed method does not require the generation of the direction vector map that is necessary to carry out predictive coding in CMBP. Secondly, we make use of the JPEG XR codec [34] to facilitate lossless compression of CFA images in an HDR representation, such as a 16 bit per pixel format. Although other codecs, such as JPEG 2000 or JPEG-LS, are also capable of handling HDR input,
JPEG XR’s balance between performance and complexity makes it a suitable solution
for digital camera implementation.
The rest of this chapter is structured as follows. The proposed lossless CFA compres-
sion pipeline is presented in Section 3.2. Experiment results and analysis are demon-
strated in Section 3.3. Finally the chapter summary is given in Section 3.4.
3.2 Proposed Algorithm
Figure 3.1 illustrates the proposed CFA compression method for both the encoding and the decoding processes. The proposed scheme employs a structure separation to extract three sub-images, each of a single color component, from the original CFA layout. Then, each sub-image undergoes a predictive coding process. The predictive coding forms a prediction for the current pixel based on a linear combination of previously coded neighboring pixels, and encodes the prediction error signal to remove spatial redundancies. Initially, we process the G sub-image using the weighted template matching prediction technique in raster scan order, and generate the prediction error of the G channel, eg. After completion of the G channel prediction, the non-green sub-images are processed. Instead of carrying out the prediction on the R and B samples directly, we use color difference domain signals, dr (G-R) and db (G-B), for the non-green components. This allows us to reduce spectral (inter-channel) redundancies in the data, leading to higher compression efficiency. In order to obtain the color difference signals, the estimation of missing G values at the non-green pixel positions is necessary. In the proposed algorithm, we perform a bilinear interpolation on the quincunx G sub-image, which delivers satisfactory performance at low computational cost. Again, the prediction errors of the color difference signals, edr and edb, are obtained by the proposed predictor. The generated error signals constitute standard 4:2:2 formatted data. Therefore, they are encoded by the JPEG XR codec using its 4:2:2 lossless encoding mode.
In the companion decoding pipeline, compressed prediction error signals are decoded.
Then the decoder forms the identical prediction as the one from the encoding pipeline using the decompressed error signals to reconstruct the individual sub-images. Finally, we combine the generated sub-images to recover the original CFA layout.
(a) Encoding process
(b) Decoding process
Figure 3.1: Overview of the proposed lossless CFA compression pipeline
3.2.1 Deinterleaving Bayer CFA
The proposed pipeline initially deinterleaves the Bayer CFA image into three sub-images, r, g, and b, as shown in Figure 3.2. As previously mentioned in Section 2.2.3, the direct application of a compression solution to the CFA image is inefficient, as CFA data are formed by intermixing samples from different color channels. Although for most natural
Chapter 3. Lossless CFA Compression using Prediction 33
images, there still exist spatial correlations between CFA samples, pixels from different
channels contain high frequency discontinuities, disallowing high compression ratio. By
deinterleaving the CFA image, three downsampled sub-images, each of which consists of
pixels in a single color channel, are extracted.
Figure 3.2: Bayer CFA deinterleave method
Let us consider a $K_1 \times K_2$ grayscale CFA image $z_{(i,j)} : \mathbb{Z}^2 \rightarrow \mathbb{Z}$ representing a two-dimensional input image to encode. The deinterleaving process can be formulated as follows:
$$
g_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m-1,\,2n),\,(2m,\,2n-1)\} \\ 0, & \text{otherwise} \end{cases}
$$
$$
r_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m-1,\,2n-1)\} \\ 0, & \text{otherwise} \end{cases}
$$
$$
b_{(i,j)} = \begin{cases} z_{(i,j)}, & (i,j) \in \{(2m,\,2n)\} \\ 0, & \text{otherwise} \end{cases}
\quad (3.1)
$$
where $m = 1, 2, \ldots, K_1/2$ and $n = 1, 2, \ldots, K_2/2$. The obtained R and B sub-images form square lattices, while the obtained G sub-image constitutes a quincunx lattice. Each sub-image contains pixels from the same color component; thus, the subsequent prediction process can effectively remove spatial redundancies to achieve high compression performance.
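The deinterleaving of equation (3.1) can be sketched in a few lines (an illustrative sketch, not the thesis implementation; `deinterleave_bayer` is a hypothetical name, and indices are converted from the thesis's 1-based convention to 0-based arrays):

```python
import numpy as np

def deinterleave_bayer(z):
    """Split a Bayer CFA image into R, G, B sub-images per Eq. (3.1).
    With 1-based indices, (2m-1, 2n-1) are R sites and (2m, 2n) are B sites;
    in 0-based terms R sits at even/even and B at odd/odd positions. R and B
    are downsampled to square lattices; G is kept in place as a quincunx."""
    r = z[0::2, 0::2]                 # R: even rows, even cols (0-based)
    b = z[1::2, 1::2]                 # B: odd rows, odd cols
    g = np.zeros_like(z)              # G: quincunx lattice, zeros elsewhere
    g[0::2, 1::2] = z[0::2, 1::2]
    g[1::2, 0::2] = z[1::2, 0::2]
    return r, g, b
```
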
3.2.2 Green sub-image prediction
The compression efficiency of predictive coding depends on the accuracy of a prediction
model. Simple linear predictors often yield poor performance at image edge regions.
The proposed adaptive predictor exploits a template matching technique to achieve high
prediction performance. It measures the dissimilarity between the template of a current
pixel to predict and the template of candidate pixels in neighbor to determine weight
factors of candidate pixels. The weight factors adaptively increases the influence of
candidate pixel whose associated template closely resembles the template of the pixel to
predict and located closer from current spatial position. The proposed scheme handles
the pixels in a raster scan order, which means from left pixel to right and from top to
bottom.
Figure 3.3: Current pixel to be predicted and its 4 closest neighborhood pixels in a
quincunx G sub-image
Figure 3.3 illustrates the current G pixel $g_{(i,j)}$ to predict and its 4 candidate pixels, which are the 4 previously scanned neighboring G pixels. The predicted value of $g_{(i,j)}$, denoted as $\hat{g}_{(i,j)}$, is given by
$$
\hat{g}_{(i,j)} = \sum_{(p,q)\in\zeta_1} w'_{g(p,q)} \cdot g_{(p,q)} \quad (3.2)
$$
where $\zeta_1$ is the set of the 4 closest neighboring pixels of $g_{(i,j)}$, $\zeta_1 = \{(i, j-2),\,(i-1, j-1),\,(i-2, j),\,(i-1, j+1)\}$. The normalized weight factors $w'_{g(p,q)}$ are given by
$$
w'_{g(p,q)} = w_{g(p,q)} \Big/ \sum_{(m,n)\in\zeta_1} w_{g(m,n)} \quad (3.3)
$$
The original weight factor $w_{g(p,q)}$ is defined as follows:
$$
w_{g(p,q)} = \left\{1 + \mathrm{Diff}\!\left(T_{g(i,j)}, T_{g(p,q)}\right) \big/ D\!\left(g_{(i,j)}, g_{(p,q)}\right)\right\}^{-1} \quad (3.4)
$$
where $T_{g(p,q)}$ is the template of the G prediction centered at pixel $(p,q)$, consisting of the positions $\{(p, q-2),\,(p-1, q-1),\,(p-2, q),\,(p-1, q+1)\}$, the operator $\mathrm{Diff}(\cdot)$ is a dissimilarity metric, and the operator $D(\cdot)$ is the spatial distance between the two pixel positions. We add 1 to the denominator to avoid the singularity that would occur when $\mathrm{Diff}(T_{g(i,j)}, T_{g(p,q)})$ becomes zero [50].
Figure 3.4: Template of G sub-image centered at (i,j). ’o’ indicates pixels in the
template region
The template used for G prediction is shown in Figure 3.4. Although using a larger template in the matching process improves prediction performance, the 4-pixel template offers a good trade-off between prediction accuracy and computational cost.
Typically, prediction techniques use the sum of absolute differences (SAD) or the sum of squared errors (SSE) between two templates to determine the degree of dissimilarity. We use the SAD due to its simplicity of implementation. Therefore, $\mathrm{Diff}(T_{g(p,q)}, T_{g(r,s)})$ is defined as follows:
$$
\mathrm{Diff}(T_{g(p,q)}, T_{g(r,s)}) = |g_{(p,q-2)} - g_{(r,s-2)}| + |g_{(p-1,q-1)} - g_{(r-1,s-1)}| + |g_{(p-2,q)} - g_{(r-2,s)}| + |g_{(p-1,q+1)} - g_{(r-1,s+1)}| \quad (3.5)
$$
Figure 3.5: Pixel values required for the prediction of G pixel at (i,j)
As shown in Figure 3.5, the proposed predictor requires a 5$\times$7 support window centered at pixel location $(i-2, j-1)$ to calculate $\hat{g}_{(i,j)}$. The weight factors $w_{g(i,j-2)}$, $w_{g(i-1,j-1)}$, $w_{g(i-2,j)}$, and $w_{g(i-1,j+1)}$, corresponding to the west, northwest, north, and northeast candidates of the $g_{(i,j)}$ pixel, are obtained using equation (3.6):
$$
\begin{aligned}
w_{g(i,j-2)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i,j-4)}| + |g_{(i-1,j-1)} - g_{(i-1,j-3)}| + |g_{(i-2,j)} - g_{(i-2,j-2)}| + |g_{(i-1,j+1)} - g_{(i-1,j-1)}|\right)\big/2\right\}^{-1} \\
w_{g(i-1,j-1)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-1,j-3)}| + |g_{(i-1,j-1)} - g_{(i-2,j-2)}| + |g_{(i-2,j)} - g_{(i-3,j-1)}| + |g_{(i-1,j+1)} - g_{(i-2,j)}|\right)\big/\sqrt{2}\right\}^{-1} \\
w_{g(i-2,j)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-2,j-2)}| + |g_{(i-1,j-1)} - g_{(i-3,j-1)}| + |g_{(i-2,j)} - g_{(i-4,j)}| + |g_{(i-1,j+1)} - g_{(i-3,j+1)}|\right)\big/2\right\}^{-1} \\
w_{g(i-1,j+1)} &= \left\{1 + \left(|g_{(i,j-2)} - g_{(i-1,j-1)}| + |g_{(i-1,j-1)} - g_{(i-2,j)}| + |g_{(i-2,j)} - g_{(i-3,j+1)}| + |g_{(i-1,j+1)} - g_{(i-2,j+2)}|\right)\big/\sqrt{2}\right\}^{-1}
\end{aligned}
\quad (3.6)
$$
Figure 3.6 demonstrates the weight factor computation sequence for the G pixel at location $(i,j)$. In the diagrams, the template region for the current pixel to predict is indicated with red boxes and the template regions for the candidate pixels are indicated with blue boxes.
(a) weight factor for west (b) weight factor for northwest
(c) weight factor for north (d) weight factor for northeast
Figure 3.6: Weight computation for the prediction of G pixel at (i,j)
Once $\hat{g}_{(i,j)}$ is obtained, the G prediction error, $e_{g(i,j)}$, is determined by $e_{g(i,j)} = g_{(i,j)} - \hat{g}_{(i,j)}$ and coded in the encoding module. Since the decoder can form the same prediction $\hat{g}_{(i,j)}$ as the encoder, the original G sub-image can be reconstructed without loss by adding the decoded prediction error, $e'_g$, to $\hat{g}_{(i,j)}$.
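The G prediction of equations (3.2) through (3.6) can be sketched as follows (a simplified illustration, not the thesis implementation; border handling is omitted, `predict_g` is a hypothetical name, and the quincunx G plane is assumed to be stored at full resolution with zeros at non-G sites):

```python
import numpy as np

# West, northwest, north, and northeast offsets; used both as the candidate
# set zeta_1 and as the causal template shape (Figs. 3.3 and 3.4).
ZETA1 = [(0, -2), (-1, -1), (-2, 0), (-1, 1)]

def template(g, i, j):
    """4-pixel causal template centered at (i, j) on the quincunx G plane."""
    return np.array([g[i + di, j + dj] for di, dj in ZETA1], dtype=float)

def predict_g(g, i, j):
    """Weighted template-matching prediction of g[i, j] with the SAD metric.
    (i, j) must be far enough inside the image for the 5x7 support window."""
    t_cur = template(g, i, j)
    weights, values = [], []
    for di, dj in ZETA1:
        p, q = i + di, j + dj
        sad = float(np.abs(t_cur - template(g, p, q)).sum())  # Diff(.), Eq. 3.5
        dist = np.hypot(di, dj)        # spatial distance D(.): 2 or sqrt(2)
        weights.append(1.0 / (1.0 + sad / dist))              # weight, Eq. 3.6
        values.append(g[p, q])
    w = np.array(weights) / np.sum(weights)                   # Eq. 3.3
    return float(np.dot(w, values))                           # Eq. 3.2
```

The lossless round trip then follows from $e_g = g - \hat{g}$ at the encoder and $g = \hat{g} + e'_g$ at the decoder.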
3.2.3 Non-Green sub-image prediction
Independent encoding of the deinterleaved sub-images yields suboptimal compression efficiency, since data redundancy in the form of inter-channel correlation is disregarded during compression. In order to take inter-channel correlation into account, we perform the prediction of the non-green sub-images in the color difference domain rather than the original intensity domain. To obtain the color difference images, we need to estimate G samples at the original R and B pixel locations, which are unavailable in the original CFA layout. The missing G values are estimated from the available G samples of the CFA image by interpolation. Various interpolation schemes are available, from the low-complexity bilinear method to more complex methods utilizing a variety of estimation operators and edge-sensing mechanisms. Our simulation results have shown that advanced interpolation techniques typically improve the compression efficiency only marginally; thus, we use the simple bilinear approach.
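The bilinear estimation of missing G samples on the quincunx lattice can be sketched as follows (an illustrative sketch under the assumption that the G plane is stored at full resolution with zeros at non-G sites; `interpolate_g_bilinear` is a hypothetical name):

```python
import numpy as np

def interpolate_g_bilinear(g, g_mask):
    """Fill missing G values at R/B sites by averaging the four horizontally
    and vertically adjacent G samples. `g` holds zeros at non-G sites and
    `g_mask` is True where an original G sample exists. Image borders are
    skipped here for brevity."""
    out = g.astype(float).copy()
    h, w = g.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if not g_mask[i, j]:
                # the 4 nearest neighbors of an R/B site are all G sites
                out[i, j] = (g[i - 1, j] + g[i + 1, j] +
                             g[i, j - 1] + g[i, j + 1]) / 4.0
    return out
```

The color difference signals of equations (3.7) and (3.8) then follow by subtracting the original R and B samples from the interpolated G values at the corresponding sites.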
Figure 3.7: Current pixel to be predicted and its closest neighborhood pixels in a red
difference (dr) sub-image
Two color difference images, $d_{r(i,j)}$ and $d_{b(i,j)}$, are defined as follows:
$$
d_{r(i,j)} = \hat{G}_{(i,j)} - r_{(i,j)}, \quad (i,j) \in \{(2m-1,\,2n-1)\} \quad (3.7)
$$
$$
d_{b(i,j)} = \hat{G}_{(i,j)} - b_{(i,j)}, \quad (i,j) \in \{(2m,\,2n)\} \quad (3.8)
$$
where $\hat{G}$ denotes the interpolated G channel. Since the prediction procedures for the two color difference images are essentially identical, this section only presents the prediction procedure for the red difference image, using the generalized difference signal $d_{(i,j)}$. Similarly to the G case, the proposed scheme predicts a current pixel $d_{(i,j)}$ using its four closest candidate pixels placed in the directions of west, northwest, north, and northeast, as shown in Figure 3.7. However, unlike the G component, the non-green components form square lattices rather than quincunx ones; hence, the candidate pixels are defined as $\zeta_2 = \{(i, j-2),\,(i-2, j-2),\,(i-2, j),\,(i-2, j+2)\}$.
The prediction of the color difference sub-images is also performed in raster-scan order using the weighted template matching technique. The template for the color difference sub-image is defined in Figure 3.8 using G samples, since edges and fine detail are typically deemphasized in the color difference domain, while they are well preserved in the G channel due to its doubled sampling rate.
Figure 3.8: Template of red difference (dr) sub-image centered at (i,j). ’o’ indicates
pixels in the template region
The original weight factor of the difference sub-image, $w_{d(p,q)}$, is defined as follows:
$$
w_{d(p,q)} = \left\{1 + \mathrm{Diff}\!\left(T_{d(i,j)}, T_{d(p,q)}\right) \big/ D\!\left(d_{(i,j)}, d_{(p,q)}\right)\right\}^{-1} \quad (3.9)
$$
where $T_{d(p,q)}$ denotes the template of the color difference image at $(p,q)$, defined over the G positions $\{(p, q+1),\,(p, q-1),\,(p+1, q),\,(p-1, q)\}$. The weight factors $w_{d(i,j-2)}$, $w_{d(i-2,j-2)}$, $w_{d(i-2,j)}$, and $w_{d(i-2,j+2)}$, corresponding to the west, northwest, north, and northeast candidates of the $d_{(i,j)}$ pixel, are obtained using equation (3.10).
$$
\begin{aligned}
w_{d(i,j-2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i,j-3)}| + |g_{(i-1,j)} - g_{(i-1,j-2)}| + |g_{(i,j+1)} - g_{(i,j-1)}| + |g_{(i+1,j)} - g_{(i+1,j-2)}|\right)\big/2\right\}^{-1} \\
w_{d(i-2,j-2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j-3)}| + |g_{(i-1,j)} - g_{(i-3,j-2)}| + |g_{(i,j+1)} - g_{(i-2,j-1)}| + |g_{(i+1,j)} - g_{(i-1,j-2)}|\right)\big/2\sqrt{2}\right\}^{-1} \\
w_{d(i-2,j)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j-1)}| + |g_{(i-1,j)} - g_{(i-3,j)}| + |g_{(i,j+1)} - g_{(i-2,j+1)}| + |g_{(i+1,j)} - g_{(i-1,j)}|\right)\big/2\right\}^{-1} \\
w_{d(i-2,j+2)} &= \left\{1 + \left(|g_{(i,j-1)} - g_{(i-2,j+1)}| + |g_{(i-1,j)} - g_{(i-3,j+2)}| + |g_{(i,j+1)} - g_{(i-2,j+3)}| + |g_{(i+1,j)} - g_{(i-1,j+2)}|\right)\big/2\sqrt{2}\right\}^{-1}
\end{aligned}
\quad (3.10)
$$
(a) weight factor for west (b) weight factor for northwest
(c) weight factor for north (d) weight factor for northeast
Figure 3.9: Weight computation for the prediction of red difference (dr) pixel at (i,j)
Figure 3.9 demonstrates the weight factor computation sequence for the red difference pixel at location $(i,j)$. In the diagrams, the template region for the current pixel to predict is indicated with yellow boxes and the template regions for the candidate pixels are indicated with blue boxes.
Once the weight factors for all directions are computed, the predicted value is obtained using the normalized weights $w'_{d(p,q)}$ as follows:
$$
\hat{d}_{(i,j)} = \sum_{(p,q)\in\zeta_2} w'_{d(p,q)} \cdot d_{(p,q)} \quad (3.11)
$$
The prediction error of the color difference images, $e_d$, is determined by $e_{d(i,j)} = d_{(i,j)} - \hat{d}_{(i,j)}$ and coded in the encoding module. Again, the decoder has all the information needed to form the same prediction as the encoder and can thus reconstruct the R and B sub-images without loss.
3.2.4 Compression of prediction error
The prediction errors for the three sub-images, $e_g$, $e_{dr}$, and $e_{db}$, are obtained from the previous stages. To compress them without loss, various existing image codecs with lossless encoding capability, such as JPEG-LS, JPEG 2000, and JPEG XR, are considered. In our proposed pipeline, we make use of the JPEG XR standard for the following reasons: i) JPEG XR supports channel bit-depths of up to 24 bits for lossless compression, allowing efficient storage of HDR format data, and ii) JPEG XR yields a balanced trade-off between compression efficiency and computational complexity. In our experiments, JPEG XR provides coding efficiency almost comparable to that of the high performance JPEG 2000. In terms of complexity, JPEG XR has a considerably simpler architecture than JPEG 2000 and is comparable to the low complexity JPEG-LS. Therefore, we believe that JPEG XR is an ideal compression solution for resource constrained environments such as digital cameras. The number of samples to compress in $e_g$ is twice that in $e_{dr}$ and $e_{db}$. This implies that the prediction error signals form a standard 4:2:2 arrangement, and thus the YCC 4:2:2 encoding mode of JPEG XR can be applied to compress them. JPEG XR performs a lapped bi-orthogonal transform (LBT), quantization, and adaptive Huffman coding to compress the given input.
3.3 Experimental Results
Experiments are carried out using 31 RGB images from the Para-Dice Insight Compression Database [51], shown in Figure 3.10. This database is chosen because it is a publicly available dataset containing a wide variety of RGB images in 16-bit HDR representation, varying in edge content and color appearance, and is thus suitable for the evaluation of our proposed solution. The three-channel RGB images in the database are initially resized to 960$\times$640 and sampled by the Bayer CFA to produce the grayscale CFA images $z : \mathbb{Z}^2 \rightarrow \mathbb{Z}$. The CFA images $z$ are then processed by the proposed pipeline and compressed into the JPEG XR format $c$ by the JPEG XR reference software [52]. The reconstructed CFA images $x : \mathbb{Z}^2 \rightarrow \mathbb{Z}$ are generated by applying JPEG XR decompression to the compressed data $c$, followed by the processing operations of the decoding pipeline. As all intermediate steps are lossless, the reconstructed CFA images $x$ should be identical to the original CFA images $z$.
The performance of different solutions is evaluated by comparing lossless compression bitrates. The compression bitrate is reported in bits per pixel (bpp), computed as $(8 \times B)/n$, where $B$ is the file size in bytes of the compressed image, including the image header, and $n$ is the number of pixels in the image.
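As a quick illustration of the bitrate formula (`bitrate_bpp` and the example file size are hypothetical, used only to show the arithmetic):

```python
def bitrate_bpp(file_size_bytes, width, height):
    """Lossless compression bitrate in bits per pixel: (8 * B) / n,
    where B is the compressed file size including the image header."""
    return 8.0 * file_size_bytes / (width * height)

# For the 960x640 images used here, a hypothetical 911,040-byte file
# corresponds to 8 * 911040 / 614400 = 11.8625 bpp.
```
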
The JPEG XR codec is operated in lossless mode as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, and ii) quantization is disabled by setting the quantization parameters to 1 for all subbands and color channels.
Figure 3.10: Test digital color images (referred to as image 1 to image 31, from left to
right and top to bottom)
3.3.1 Primary color channel and color difference channel
This section compares the compression performance of the original R/B channels and the color difference channels.
(a) original channels (b) color difference channels
Figure 3.11: 2D autocorrelation graphs for the image 4 in database (a) original images,
R and B, (b) color difference images, dr and db
Figure 3.11 shows the two-dimensional autocorrelation of the primary color images R and B, and of the color difference images dr and db, for image 4 in our database. The height at each position indicates the correlation between the original image and a spatially shifted version of itself, as defined in equation (3.12):
$$
\mathrm{Corr}(m,n) = \frac{\sum_i \sum_j \left(X_{(i,j)} - \overline{X_{(i,j)}}\right)\left(X_{(i+m,j+n)} - \overline{X_{(i+m,j+n)}}\right)}{\sqrt{\sum_i \sum_j \left(X_{(i,j)} - \overline{X_{(i,j)}}\right)^2} \cdot \sqrt{\sum_i \sum_j \left(X_{(i+m,j+n)} - \overline{X_{(i+m,j+n)}}\right)^2}} \quad (3.12)
$$
where $X_{(i,j)}$ is the original image, $X_{(i+m,j+n)}$ is the shifted version of itself, $\overline{X}$ represents the mean value of the given image, and $m$, $n$ denote the spatial shifts in the horizontal and vertical directions. The value at the center of the graph is always 1, as it corresponds to the zero-shift case.
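Equation (3.12) can be computed on the overlapping region of an image and its shifted copy as follows (an illustrative sketch; `norm_autocorr` is a hypothetical name):

```python
import numpy as np

def norm_autocorr(x, m, n):
    """Normalized autocorrelation of Eq. (3.12) for a shift (m, n), computed
    over the region where the image and its shifted copy overlap."""
    h, w = x.shape
    a = x[max(0, -m):h - max(0, m), max(0, -n):w - max(0, n)].astype(float)
    b = x[max(0, m):h + min(0, m), max(0, n):w + min(0, n)].astype(float)
    a -= a.mean()                      # subtract the per-region means
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom)
```
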
The figure shows that, as the shifting distance increases, the level of similarity drops off more rapidly for the color difference images than for the primary color images. This observation holds true for the other images in the database. It implies that dr and db have lower spatial correlation between neighboring pixels than R and B. Since spatial redundancy is reduced by using color difference images, more efficient entropy coding is expected. As shown in Table 3.1, the proposed scheme yields average lossless compression bitrates of 12.340 bits per pixel (bpp) for the primary color images and 11.875 bpp for the color difference images, respectively.
Image RB dRdB Image RB dRdB
1 10.405 10.069 17 12.189 11.671
2 13.500 13.040 18 13.560 13.113
3 13.468 13.041 19 12.830 12.215
4 11.182 10.676 20 11.737 11.192
5 12.024 11.736 21 12.568 12.094
6 10.397 10.126 22 12.306 11.756
7 10.278 10.079 23 11.126 10.758
8 11.115 10.659 24 11.469 10.98
9 13.420 12.939 25 12.090 11.335
10 13.820 13.338 26 12.639 12.103
11 14.404 13.872 27 13.022 12.525
12 11.421 11.097 28 12.669 12.069
13 13.369 12.841 29 10.475 10.157
14 14.497 14.004 30 13.667 13.198
15 13.748 13.268 31 12.059 11.548
16 11.075 10.622 Avg 12.340 11.875
Table 3.1: Lossless bitrate of proposed compression scheme with primary channel and
color difference channel
3.3.2 Green channel interpolation method
Img BI SPL EDI NEDI Img BI SPL EDI NEDI
1 10.069 10.109 10.023 9.995 17 11.671 11.679 11.715 11.662
2 13.040 13.064 13.075 13.049 18 13.113 13.133 13.168 13.116
3 13.041 13.068 13.064 13.048 19 12.215 12.227 12.276 12.220
4 10.676 10.668 10.680 10.655 20 11.192 11.226 11.251 11.172
5 11.736 11.784 11.791 11.751 21 12.094 12.118 12.145 12.096
6 10.126 10.144 10.136 10.129 22 11.756 11.761 11.771 11.736
7 10.079 10.121 10.119 10.100 23 10.758 10.787 10.817 10.756
8 10.659 10.674 10.672 10.646 24 10.980 10.994 11.035 10.942
9 12.939 12.947 12.971 12.936 25 11.335 11.328 11.404 11.354
10 13.338 13.333 13.345 13.308 26 12.103 12.120 12.118 12.074
11 13.872 13.879 13.901 13.894 27 12.525 12.563 12.555 12.493
12 11.097 11.122 11.114 11.091 28 12.069 12.082 12.103 12.016
13 12.841 12.851 12.879 12.822 29 10.157 10.230 10.250 10.161
14 14.004 13.997 14.029 14.011 30 13.198 13.223 13.242 13.209
15 13.268 13.281 13.315 13.274 31 11.548 11.582 11.578 11.546
16 10.622 10.661 10.647 10.614 Avg 11.875 11.895 11.909 11.867
Table 3.2: Lossless bitrate of proposed compression scheme with various G
interpolation schemes
Since we perform the weighted template matching prediction in the color difference domain, the estimation of missing G samples at R and B pixel positions is necessary. This is essentially achieved by interpolating the quincunx G image. In order to investigate the influence of the interpolation technique on coding performance, we examined several interpolation methods, including bilinear (BI), cubic spline interpolation (SPL), edge-directed interpolation (EDI) [16], and new edge-directed interpolation (NEDI) [53], which vary in estimation accuracy and computational complexity. For BI, missing G samples are estimated by taking the average value of the four surrounding pixels. In SPL, a piecewise continuous curve, passing through each of the given samples in the G sub-image, is defined to determine missing pixel values. EDI is an adaptive approach that measures horizontal and vertical gradients at missing G samples to decide the direction in which to perform interpolation. NEDI initially computes the local covariance coefficients and uses them to adapt the interpolation direction.
Table 3.2 lists the lossless compression bitrates of the proposed scheme for the different interpolation methods. The bitrates for BI are identical to the bitrates of the color difference images in Table 3.1. On average, the lossless bitrates are 11.875, 11.895, 11.909, and 11.867 bpp for BI, SPL, EDI, and NEDI, respectively. These results show that the use of advanced interpolation does not significantly improve compression efficiency and sometimes even degrades performance. Therefore, the low complexity bilinear interpolation is sufficient for our proposed scheme.
3.3.3 Dissimilarity measure in template matching
The dissimilarity measure is a key element in template matching during prediction, since
the choice of dissimilarity metric in equation(3.4) and equation(3.9) affects computational
complexity and the accuracy of the prediction process. Table 3.3 presents the lossless
compression bitrates of the proposed scheme for the images from our database using two
commonly used dissimilarity metrics, SAD and SSE. They are defined as follows:
SAD(i,j) = |i− j| (3.13)
SSE(i,j) = (i− j)2 (3.14)
According to Table 3.3, the lossless bitrates for SAD and SSE are almost identical as
11.875 bpp and 11.874 bpp, respectively. We can conclude that selection of dissimilarity
Chapter 3. Lossless CFA Compression using Prediction 48
measure does not significantly affect compression performance and therefore, SAD is
preferred to SSE due to its low complexity in implementation.
Image SAD SSE Image SAD SSE
1 10.069 10.083 17 11.671 11.669
2 13.040 13.036 18 13.113 13.114
3 13.041 13.039 19 12.215 12.211
4 10.676 10.672 20 11.192 11.189
5 11.736 11.735 21 12.094 12.094
6 10.126 10.122 22 11.756 11.754
7 10.079 10.077 23 10.758 10.756
8 10.659 10.659 24 10.98 10.974
9 12.939 12.938 25 11.335 11.329
10 13.338 13.339 26 12.103 12.101
11 13.872 13.871 27 12.525 12.524
12 11.097 11.095 28 12.069 12.069
13 12.841 12.839 29 10.157 10.161
14 14.004 14.001 30 13.198 13.198
15 13.268 13.269 31 11.548 11.550
16 10.622 10.620 Avg 11.875 11.874
Table 3.3: Lossless bitrate of proposed compression scheme with SAD and SSE
dissimilarity metrics
3.3.4 Prediction algorithm
We compared the performance of our proposed method with other methods described in the literature. The methods in comparison are: i) method 1: direct CFA image encoding using JPEG XR, ii) method 2: direct CFA image encoding using JPEG 2000, iii) method 3: direct CFA image encoding using JPEG-LS, iv) method 4: prediction based on the separation method [40] in conjunction with JPEG XR compression, v) method 5: the CMBP predictor based method [41] in conjunction with JPEG XR compression, vi) method 6: the activity level classification model (ALCM) [54] predictor based method combined with JPEG XR compression, and vii) method 7: our proposed method.
As a basis for performance comparison, we used representative lossless compression schemes, namely JPEG XR, JPEG 2000, and JPEG-LS, directly on the CFA image in the first three methods. The Kakadu v6.4 software implementation is used for JPEG 2000 coding and the FFmpeg software is used for JPEG-LS coding. Methods 4 to 7 are considered to demonstrate the relationship between the accuracy of the predictor and the compression efficiency. In method 4, the quincunx G channel is separated into two rectangular lattices, G1 and G2, and the prediction is carried out by estimating G1 from G2. The non-green channels are directly encoded in the color difference domain. The CMBP predictor in method 5 is essentially very similar to our proposed predictor. It initially generates a direction vector map of the sample image to determine homogeneous regions, and only performs prediction in nonhomogeneous regions with pre-defined weight factors for neighboring pixels. The ALCM predictor in method 6 estimates a current pixel using a weighted combination of neighboring pixels. Initially, equal weights are assigned to all pixels; if the previous prediction was higher than the actual pixel value, then the weight of the largest neighboring pixel is decreased by 1/256 and that of the smallest neighboring pixel is increased by the same amount. If the previous prediction was lower than the actual pixel value, then the weights of the largest and the smallest neighboring pixels are adjusted in the opposite way.
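The ALCM weight adaptation described above can be sketched as follows (a simplified illustration of the idea in [54] as summarized here, not the reference implementation; `alcm_update` is a hypothetical name):

```python
def alcm_update(weights, neighbors, actual, step=1.0 / 256):
    """One ALCM-style adaptation step: predict from a weighted combination of
    neighbors, then shift weight between the largest and smallest neighbor
    depending on whether the prediction over- or undershot the actual value."""
    pred = sum(w * x for w, x in zip(weights, neighbors))
    hi = neighbors.index(max(neighbors))   # largest neighboring pixel
    lo = neighbors.index(min(neighbors))   # smallest neighboring pixel
    if pred > actual:                      # overshoot: reduce the pull upwards
        weights[hi] -= step
        weights[lo] += step
    elif pred < actual:                    # undershoot: increase it
        weights[hi] += step
        weights[lo] -= step
    return pred, weights
```
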
Figure 3.12 shows the entropy of sample images from our database associated with the different prediction schemes, methods 4 to 7. The entropy of an image can be determined by the formula
$$
H = -\sum_{i=1}^{n} P_i \log_2 P_i \quad (3.15)
$$
where $P_i$ is the probability of occurrence of pixel value $i$ and $H$ is the entropy of the image. The entropy is evaluated by generating the image histogram from the prediction error image of each sample image. Since the entropy of image data determines the theoretical lower bound achievable by lossless compression, it allows us to evaluate the effectiveness of the different prediction algorithms. The average entropies of the various prediction methods are 12.956, 11.637, 11.704, and 11.395 for methods 4, 5, 6, and 7, respectively. The proposed method shows the lowest average entropy value, indicating potentially high compression efficiency.
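The histogram-based entropy of equation (3.15) can be computed as follows (a small sketch; `image_entropy` is a hypothetical name):

```python
from collections import Counter
from math import log2

def image_entropy(pixels):
    """First-order entropy of Eq. (3.15), estimated from the histogram of a
    flattened (prediction error) image given as a sequence of integers."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```
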
Figure 3.12: Entropy of sample images from the database with various prediction
methods
The output compression bitrates of the CFA images from our database achieved by the various methods are presented in Table 3.4 and Table 3.5. The results clearly show that direct compression of the CFA mosaic image is not efficient. In the direct CFA compression scenario, JPEG 2000 is superior to JPEG XR and JPEG-LS in terms of compression efficiency, outperforming them in average bitrate by 0.5 and 1.1 bpp, respectively. However, as can be seen, exploiting an accurate prediction method allows the JPEG XR equipped pipeline to achieve a higher compression ratio than JPEG 2000. On
Img M1 M2 M3 Img M1 M2 M3
1 11.351 9.393 10.657 17 13.090 12.868 13.444
2 14.290 14.200 14.590 18 14.804 14.334 15.287
3 14.406 14.322 15.080 19 14.035 13.579 15.066
4 13.964 12.856 15.766 20 12.774 12.436 13.192
5 13.162 12.961 13.828 21 13.810 13.308 14.192
6 11.270 10.775 10.890 22 13.779 13.250 14.693
7 12.971 11.883 15.130 23 11.998 12.096 12.362
8 13.269 12.258 14.571 24 13.410 12.389 14.283
9 14.224 14.122 14.860 25 13.495 13.098 14.506
10 14.636 14.552 15.231 26 14.236 13.488 14.883
11 15.155 15.143 15.554 27 14.203 13.665 14.956
12 12.938 12.349 13.650 28 13.184 13.145 13.245
13 14.047 14.047 14.252 29 12.891 11.508 14.110
14 15.433 15.319 16.156 30 14.328 14.347 14.524
15 14.452 14.470 14.971 31 13.235 12.030 12.914
16 12.633 11.958 13.388 Avg 13.596 13.102 14.201
Table 3.4: Lossless bitrate of various CFA compression schemes (direct CFA encoding
schemes)
average, our proposed scheme yields a lossless compression bitrate of 11.875 bpp for the images in our database. The average compression bitrates obtained by the other reviewed
Img M4 M5 M6 M7 Img M4 M5 M6 M7
1 13.701 10.204 10.311 10.069 17 12.796 11.900 11.844 11.671
2 13.962 13.223 13.241 13.040 18 14.407 13.338 13.319 13.113
3 14.031 13.264 13.268 13.041 19 13.699 12.388 12.441 12.215
4 12.763 10.957 10.949 10.676 20 12.589 11.405 11.448 11.192
5 12.672 11.967 11.925 11.736 21 13.812 12.310 12.279 12.094
6 12.411 10.297 10.255 10.126 22 13.543 11.923 11.989 11.756
7 12.150 10.325 10.266 10.079 23 11.496 11.034 10.972 10.758
8 12.572 10.841 10.906 10.659 24 12.196 11.228 11.149 10.980
9 13.759 13.190 13.148 12.939 25 13.174 11.590 11.526 11.335
10 14.261 13.553 13.545 13.338 26 13.357 12.278 12.280 12.103
11 14.643 14.103 14.055 13.872 27 13.816 12.691 12.832 12.525
12 12.849 11.313 11.319 11.097 28 13.272 12.250 12.300 12.069
13 13.605 13.091 13.042 12.841 29 11.405 10.381 10.389 10.157
14 14.790 14.238 14.173 14.004 30 14.114 13.410 13.444 13.198
15 14.095 13.534 13.436 13.268 31 12.930 11.703 11.821 11.548
16 12.021 10.906 10.843 10.622 Avg 13.255 12.091 12.088 11.875
Table 3.5: Lossless bitrate of various CFA compression schemes (predictive coding
schemes)
predictors with JPEG XR compression are 13.255, 12.091, and 12.088 bpp for methods 4, 5, and 6, respectively. For most of the images in the database, the proposed method consistently achieves the lowest lossless compression bitrate, demonstrating the robustness of the solution in terms of compression efficiency.
Apart from the lossless bitrate performance of the proposed solution, its computational complexity is also analyzed in terms of normalized operations, namely addition (ADD), bit shift (SHF), multiplication (MUL), absolute value (ABS), and comparison (CMP). Table 3.6 presents a summary of the number of operations per pixel required to carry out each stage of the prediction process. In this analysis, bilinear interpolation is used for the missing G pixel estimation and the SAD metric is used for dissimilarity measurement during prediction. It can be seen that performing the non-green prediction in the color difference domain instead of the intensity domain increases the number of operations for the proposed scheme by 2 additions and 0.5 shifts per pixel, since the G interpolation and difference signal estimation stages would be unnecessary in the intensity domain. Such a marginal increase in computational cost is considered tolerable, given that the use of the color difference domain reduces the average lossless bitrate by 0.5 bpp, as shown in Section 3.3.1.
Stage ADD SHF MUL ABS CMP
G sub-image prediction 19.5 1 7 8 0
G interpolation (BI) 1.5 0.5 0 0 0
Diff R/B channel estimation 0.5 0 0 0 0
Diff R sub-image prediction 9.75 0.5 3.5 4 0
Diff B sub-image prediction 9.75 0.5 3.5 4 0
Total 41 2.5 14 16 0
Table 3.6: Number of operations per pixel required for the proposed scheme
3.4 Chapter Summary
In this chapter, a lossless Bayer CFA compression scheme capable of handling HDR representation is presented. In summary, the following conclusions can be drawn from this chapter: i) the structure separation step reduces high frequency artifacts, leading to high compression efficiency, ii) the proposed weighted template matching predictor exploits inter-channel and spatial correlations to achieve high compression performance, and iii) the proposed scheme utilizes low complexity building blocks, such as bilinear interpolation, the SAD dissimilarity measure, and the JPEG XR encoding module, to minimize the computational cost. The image entropy analysis and experimental results indicate that the proposed scheme delivers higher lossless compression performance than other prior-art solutions.
Chapter 4
Lossy CFA Compression using
Colorspace Conversion
4.1 Introduction
The previous chapter presented an HDR CFA compression solution which is reversible, so that the original CFA image can be perfectly reconstructed. Despite its advantage of having no loss of information, the proposed lossless scheme does not provide adequate compression ratios for target devices with low data storage. This chapter presents a lossy CFA compression pipeline capable of handling HDR representation, which provides greater compression ratio gains than the lossless scheme at the expense of marginal quality loss.
We focus on the Bayer CFA structure as it is the most widely utilized CFA arrangement in the industry. The proposed scheme consists of a color space conversion module and a structure conversion step, and is thus similar to the approaches discussed in [14, 38, 43, 55]. However, there are three important differences between the proposed scheme and the prior art solutions. First, a novel color space, namely YCoCg, is used instead of YCbCr in order to offer higher compression at reduced computational cost. YCoCg, another variation of a luminance-chrominance based color space, offers simplified implementation due
to its integer based operations [30]. Secondly, we introduce a data adaptive edge-sensing mechanism into the encoding pipeline in order to enhance the quality of the reconstructed images, which are generated by the companion decoding pipeline. Contrary to most of the prior art solutions, which utilize non-data-adaptive or 4-direction based mechanisms, the proposed pipeline uses an 8-directional approach to generate higher quality images at fractionally higher computational cost. Lastly, we make use of the recently standardized image compression format JPEG XR in the pipeline to facilitate CFA compression with HDR representation [34]. HDR imaging typically requires 10 to 16 bits per color component to represent image scenes, whereas conventional low dynamic range (LDR) imaging only requires up to 8 bits. Due to its higher precision, HDR capability has recently become one of the key features of high-end digital cameras. However, most of the prior art in CFA compression is limited to codecs applied to conventional 8 bit per color channel image inputs. Such a conventional pipeline discards the rich visual content afforded by the HDR CFA data, as the original HDR data stream is mapped onto an 8 bit equivalent representation prior to applying compression. It will be shown that the proposed CFA compression pipeline produces high quality compressed images while using expensive memory resources efficiently.
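To illustrate why an integer luminance-chrominance conversion is cheap, the following is a sketch of the reversible YCoCg-R lifting transform from the literature (an assumption for illustration; this is one common integer variant of YCoCg and not necessarily the exact transform used in the proposed pipeline):

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integer samples; it uses only
    additions, subtractions, and bit shifts, and is exactly invertible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every lifting step is individually invertible, the round trip recovers the original integers exactly, regardless of bit depth.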
The rest of this chapter is organized as follows. Section 4.2 presents the new CFA
compression pipeline in detail. Experimental results are reported in Section 4.3 and the
chapter summary is provided in Section 4.4.
4.2 Proposed Algorithm
The proposed CFA compression schemes require a series of reversible pre-processing operations prior to applying JPEG XR compression. The pre-processing operations give us full control over the color space conversion and pixel arrangement of the input images, allowing highly efficient compression performance.
Figure 4.1: Overview of the proposed lossy HDR CFA image compression pipeline
Initially the CFA image is transformed from the RGB domain into the YCoCg domain to reduce inter-channel redundancy. We advocate the use of the YCoCg color space over the commonly used YCbCr since the YCoCg transform has been shown to provide higher coding gain at lower computational complexity [30]. The color space conversion requires all three RGB components at each pixel location, but the CFA image contains only one per pixel, so the two missing components must be estimated from adjacent pixels. Our methods use an 8-directional data adaptive CDM to interpolate the missing pixels. Following conventional CDM approaches for the Bayer CFA, the algorithm first interpolates the G pixels and then the color difference signals, R-G and B-G. We then immediately compute the YCoCg image from the interpolated G, R-G, and B-G signals. As illustrated in Figure 4.1, two versions of the image processing pipeline (IPP) are proposed, differing in the number of Y pixels calculated during this stage. Namely, IPP1 computes Y values at all pixel locations to preserve complete edge information. On the other hand, IPP2 reduces computational complexity by keeping only half of the Y values. In both IPPs, only one chrominance pixel is computed for each 2x2 pixel block of the original CFA image. Once color conversion is completed, the YCoCg image
is rearranged into a shape more appropriate for the subsequent compression. Since this structure conversion step produces output data formatted as YUV 4:2:0 for IPP1 and YUV 4:2:2 for IPP2, the matching encoding modes provided by the JPEG XR codec are applied to each.
In the companion decoding pipeline, where a final reconstructed RGB image is pro-
duced to be rendered in display devices, the sequence of encoding pipeline is reversed.
Unlike the encoding pipeline which has to be implemented on the camera, the decoding
pipeline can be off-loaded to the end device, such as a personal computer (PC). A PC-based decoding pipeline can include advanced algorithms that produce high fidelity reconstructed images thanks to ample resources, whereas a camera on-chip solution typically exploits less complex algorithms to reduce computational cost and power consumption.
4.2.1 Interpolation of missing green components
In the Bayer CFA, G is the dominant component among the three primary colors and suffers the least from aliasing. For this reason, it is common to start the estimation of missing pixels from the G components [50, 56, 57]. In our method, we employ an ESM operator and an inter-channel correlator to reconstruct the missing G components. Among several ESM operators, we found that the 8-directional data adaptive algorithm [50, 56] offers high performance at low computational cost, and it is therefore exploited in our pipeline. In this algorithm, a missing pixel value is computed as a weighted sum of neighboring pixels from 8 directions. The estimation of G pixels is formulated as follows:
y(i,j)G = { x(i,j)G                                 if z(i,j) ≅ x(i,j)G
          { Σ_(p,q)∈ζ w′(p,q) · x′(p,q)G            otherwise        (4.1)

where the operator ≅ denotes a one-to-one relationship, z is the pixel value of the original grayscale CFA image, x(i,j)G is the G pixel value at position (i,j), ζ = {(i−1,j), (i,j−1), (i,j+1), (i+1,j), (i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1)} is the set of 8 neighborhood pixels of (i,j), and x′(p,q)G are the predicted G values of the neighborhood pixels obtained using local edge information. The normalized edge-sensing weights w′(p,q)
are given by,
w′(p,q) = w(p,q) / Σ_(m,n)∈ζ w(m,n)        (4.2)

The original edge-sensing weight factor w(p,q) is defined in equation (4.3) using inverse gradients:

w(p,q) = {1 + Σ_(r,s)∈ζ |z(p,q) − z(r,s)| / D(z(p,q), z(r,s))}^(−1)        (4.3)
where D(z(p,q), z(r,s)) represents the spatial distance between the two pixel locations, and the 1 in the denominator avoids singularities. The weight factor adaptively reduces the influence of pixels that lie across an edge, or farther from the current spatial location, to enhance estimation performance.
Figure 4.2: Indexing of the samples within a 5x5 window of Bayer CFA
Using the 8-directional data adaptive system, the calculation of G components requires a 5x5 support window centered at the missing G location. In Figure 4.2, the estimation of the G component at location (i,j) requires edge-sensing weight coefficients and predicted G values for the 8 adjacent pixels. The weights w(i−1,j), w(i,j+1), w(i+1,j), and w(i,j−1), corresponding to the north, east, south, and west directions of the pixel x(i,j), are defined as:

w(i−1,j) = {1 + (|z(i,j) − z(i−2,j)| + |z(i−1,j) − z(i+1,j)|)/2}^(−1)
w(i,j+1) = {1 + (|z(i,j) − z(i,j+2)| + |z(i,j+1) − z(i,j−1)|)/2}^(−1)
w(i+1,j) = {1 + (|z(i,j) − z(i+2,j)| + |z(i+1,j) − z(i−1,j)|)/2}^(−1)
w(i,j−1) = {1 + (|z(i,j) − z(i,j−2)| + |z(i,j−1) − z(i,j+1)|)/2}^(−1)        (4.4)
The weights w(i−1,j−1), w(i−1,j+1), w(i+1,j+1), and w(i+1,j−1), corresponding to the north-west, north-east, south-east, and south-west directions, are defined as:

w(i−1,j−1) = {1 + (|z(i,j) − z(i−2,j−2)| + |z(i−1,j−1) − z(i+1,j+1)|)/(2√2)}^(−1)
w(i−1,j+1) = {1 + (|z(i,j) − z(i−2,j+2)| + |z(i−1,j+1) − z(i+1,j−1)|)/(2√2)}^(−1)
w(i+1,j+1) = {1 + (|z(i,j) − z(i+2,j+2)| + |z(i+1,j+1) − z(i−1,j−1)|)/(2√2)}^(−1)
w(i+1,j−1) = {1 + (|z(i,j) − z(i+2,j−2)| + |z(i+1,j−1) − z(i−1,j+1)|)/(2√2)}^(−1)        (4.5)
Similar to the computation of the edge-sensing weights, the computation of the predicted G values around x(i,j) distinguishes between the horizontal/vertical and diagonal directions. For the horizontal and vertical directions, the predicted G pixel values are given by:

x′(i−1,j)G = x(i−1,j)G + (z(i−2,j) − z(i,j) + z(i−1,j) − z(i+1,j))/4
x′(i,j+1)G = x(i,j+1)G + (z(i,j+2) − z(i,j) + z(i,j+1) − z(i,j−1))/4
x′(i+1,j)G = x(i+1,j)G + (z(i+2,j) − z(i,j) + z(i+1,j) − z(i−1,j))/4
x′(i,j−1)G = x(i,j−1)G + (z(i,j−2) − z(i,j) + z(i,j−1) − z(i,j+1))/4        (4.6)
For the diagonal directions, they are defined as follows:

x′(i−1,j−1)G = {x(i−1,j)G + x(i,j−1)G + (z(i−1,j−1) − z(i+1,j+1))/(2√2) + (z(i−2,j) + z(i,j−2) − 2z(i,j))/4}/2
x′(i−1,j+1)G = {x(i−1,j)G + x(i,j+1)G + (z(i−1,j+1) − z(i+1,j−1))/(2√2) + (z(i−2,j) + z(i,j+2) − 2z(i,j))/4}/2
x′(i+1,j+1)G = {x(i+1,j)G + x(i,j+1)G + (z(i+1,j+1) − z(i−1,j−1))/(2√2) + (z(i+2,j) + z(i,j+2) − 2z(i,j))/4}/2
x′(i+1,j−1)G = {x(i+1,j)G + x(i,j−1)G + (z(i+1,j−1) − z(i−1,j+1))/(2√2) + (z(i+2,j) + z(i,j−2) − 2z(i,j))/4}/2        (4.7)
By substituting the normalized edge-sensing weight factors and the predicted G values into equation (4.1), the missing G pixels are estimated and the full G channel is constructed.
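The green estimation of equations (4.1)-(4.7) can be sketched for a single missing-G location as follows. This is an illustrative transcription, not the thesis reference implementation; the interface (a 2-D mosaic `z` plus an array `g` holding the known G samples, with a full 5x5 neighborhood available) is an assumption.

```python
# Hedged sketch of the 8-directional data adaptive green estimation,
# equations (4.1)-(4.7). Array names and layout are assumptions.
import numpy as np

def estimate_green(z, g, i, j):
    """Estimate the missing G value at an R/B location (i, j).

    z : 2-D grayscale CFA mosaic; g : 2-D array whose entries are valid
    only where the CFA actually has a G sample.
    """
    s2 = np.sqrt(2.0)

    # Edge-sensing weights, equations (4.4) and (4.5), keyed by offset.
    w = {
        (-1, 0): 1.0 / (1 + (abs(z[i, j] - z[i-2, j]) + abs(z[i-1, j] - z[i+1, j])) / 2),
        (0, 1):  1.0 / (1 + (abs(z[i, j] - z[i, j+2]) + abs(z[i, j+1] - z[i, j-1])) / 2),
        (1, 0):  1.0 / (1 + (abs(z[i, j] - z[i+2, j]) + abs(z[i+1, j] - z[i-1, j])) / 2),
        (0, -1): 1.0 / (1 + (abs(z[i, j] - z[i, j-2]) + abs(z[i, j-1] - z[i, j+1])) / 2),
        (-1, -1): 1.0 / (1 + (abs(z[i, j] - z[i-2, j-2]) + abs(z[i-1, j-1] - z[i+1, j+1])) / (2 * s2)),
        (-1, 1):  1.0 / (1 + (abs(z[i, j] - z[i-2, j+2]) + abs(z[i-1, j+1] - z[i+1, j-1])) / (2 * s2)),
        (1, 1):   1.0 / (1 + (abs(z[i, j] - z[i+2, j+2]) + abs(z[i+1, j+1] - z[i-1, j-1])) / (2 * s2)),
        (1, -1):  1.0 / (1 + (abs(z[i, j] - z[i+2, j-2]) + abs(z[i+1, j-1] - z[i-1, j+1])) / (2 * s2)),
    }

    # Predicted G values at the four axial neighbours, equation (4.6).
    gp = {
        (-1, 0): g[i-1, j] + (z[i-2, j] - z[i, j] + z[i-1, j] - z[i+1, j]) / 4,
        (0, 1):  g[i, j+1] + (z[i, j+2] - z[i, j] + z[i, j+1] - z[i, j-1]) / 4,
        (1, 0):  g[i+1, j] + (z[i+2, j] - z[i, j] + z[i+1, j] - z[i-1, j]) / 4,
        (0, -1): g[i, j-1] + (z[i, j-2] - z[i, j] + z[i, j-1] - z[i, j+1]) / 4,
    }
    # Diagonal predictions, equation (4.7), built from the axial G samples.
    gp[(-1, -1)] = (g[i-1, j] + g[i, j-1] + (z[i-1, j-1] - z[i+1, j+1]) / (2 * s2)
                    + (z[i-2, j] + z[i, j-2] - 2 * z[i, j]) / 4) / 2
    gp[(-1, 1)] = (g[i-1, j] + g[i, j+1] + (z[i-1, j+1] - z[i+1, j-1]) / (2 * s2)
                   + (z[i-2, j] + z[i, j+2] - 2 * z[i, j]) / 4) / 2
    gp[(1, 1)] = (g[i+1, j] + g[i, j+1] + (z[i+1, j+1] - z[i-1, j-1]) / (2 * s2)
                  + (z[i+2, j] + z[i, j+2] - 2 * z[i, j]) / 4) / 2
    gp[(1, -1)] = (g[i+1, j] + g[i, j-1] + (z[i+1, j-1] - z[i-1, j+1]) / (2 * s2)
                   + (z[i+2, j] + z[i, j-2] - 2 * z[i, j]) / 4) / 2

    # Normalize the weights (equation (4.2)) and form the weighted sum (4.1).
    total = sum(w.values())
    return sum(w[d] * gp[d] for d in w) / total
```

In a flat region all eight weights are equal and every prediction equals the local intensity, so the estimate reduces to that intensity, as expected.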
4.2.2 Interpolation of color difference components
We perform interpolation in the color difference domain, R-G and B-G, instead of the original R and B intensity domain. Image signals in the color difference domain are generally smoother than those in the intensity domain and are thus more suitable for linear interpolation. The difference signal R-G is obtained as follows:
y(i,j)RG = { x(i,j)R − y(i,j)G                       if z(i,j) ≅ x(i,j)R
           { Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)RG           if z(i,j) ≅ x(i,j)G
           { Σ_(p,q)∈ζ2 w‴(p,q) · y(p,q)RG           if z(i,j) ≅ x(i,j)B

ζ1 = {(i−1,j), (i,j−1), (i,j+1), (i+1,j)}
ζ2 = {(i−1,j−1), (i−1,j+1), (i+1,j−1), (i+1,j+1)}        (4.8)
where y(i,j)RG is the estimated R-G value at pixel (i, j), y(i,j)G is the estimated G value
from previous stage, and ζ1 and ζ2 are horizontal/vertical and diagonal neighbor pixels
Chapter 4. Lossy CFA Compression using Colorspace Conversion 62
of (i, j) respectively. Here, w′′ and w′′′ are renormalized edge-sensing weights for hor-
izontal/vertical and diagonal directions, respectively. The B-G signal, y(i,j)BG, can be
calculated using the same technique as follows:
y(i,j)BG = { Σ_(p,q)∈ζ2 w‴(p,q) · y(p,q)BG           if z(i,j) ≅ x(i,j)R
           { Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)BG           if z(i,j) ≅ x(i,j)G
           { x(i,j)B − y(i,j)G                       if z(i,j) ≅ x(i,j)B        (4.9)
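The G-position case of equation (4.8), a weighted average over the four axial neighbors, can be sketched as follows. The dictionary interface for the un-normalized weights is a hypothetical convenience, not the thesis code.

```python
# Hedged sketch of the R-G interpolation at a G location, equation (4.8).
# The renormalization over the axial subset zeta1 yields the weights w''.
import numpy as np

def interp_rg_at_green(rg, w, i, j):
    """Estimate R-G at a G pixel from its four axial neighbours.

    rg : 2-D array of known/previously estimated R-G values,
    w  : dict of un-normalized edge-sensing weights keyed by offset.
    """
    zeta1 = [(-1, 0), (0, -1), (0, 1), (1, 0)]   # horizontal/vertical set
    total = sum(w[d] for d in zeta1)             # renormalize over zeta1 only
    return sum(w[d] * rg[i + d[0], j + d[1]] for d in zeta1) / total
```

The diagonal (ζ2) case of equations (4.8) and (4.9) is identical in structure, with the four diagonal offsets substituted.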
4.2.3 Correction of green and color difference components
The correction operation utilizes correlation between color channels and edge information
to enhance estimation accuracy. The correction mechanism initially updates G as follows:
y(i,j)G = { x(i,j)R − Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)RG    if z(i,j) ≅ x(i,j)R
          { x(i,j)B − Σ_(p,q)∈ζ1 w″(p,q) · y(p,q)BG    if z(i,j) ≅ x(i,j)B        (4.10)
Then, the corresponding color difference signals, R-G and B-G, at the corrected G pixel positions are also updated as follows:

y(i,j)RG = x(i,j)R − y(i,j)G    if z(i,j) ≅ x(i,j)R
y(i,j)BG = x(i,j)B − y(i,j)G    if z(i,j) ≅ x(i,j)B        (4.11)

Finally, the R-G and B-G planes are corrected using the same formulas given by equations (4.8) and (4.9). This simple iteration reduces false color estimation and blurred edges while preserving the original z values of the CFA data [50].
4.2.4 YCoCg color conversion
The G, R-G, and B-G planes are fully populated through the previous stages. The color space conversion from the RGB domain to the YCoCg domain is given by:
y(i,j)Y  = (1/4)·y(i,j)R + (1/2)·y(i,j)G + (1/4)·y(i,j)B = (y(i,j)RG + y(i,j)BG + 4·y(i,j)G)/4
y(i,j)Co = (1/2)·y(i,j)R − (1/2)·y(i,j)B = (y(i,j)RG − y(i,j)BG)/2
y(i,j)Cg = −(1/4)·y(i,j)R + (1/2)·y(i,j)G − (1/4)·y(i,j)B = −(y(i,j)RG + y(i,j)BG)/4        (4.12)
It should be noted that calculating all three channels, Y, Co, and Cg, at full resolution would triple the number of pixels to compress relative to the original CFA image. We propose two methods to reduce the number of pixels to compress.
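The right-hand sides of equation (4.12) map the interpolated planes to YCoCg without ever reconstructing R and B explicitly; a direct transcription (element-wise, so it works on scalars or NumPy arrays alike):

```python
# Forward YCoCg conversion of equation (4.12), written directly in terms
# of the G, R-G, and B-G planes produced by the interpolation stages.
def ycocg_from_diff(g, rg, bg):
    y = (rg + bg + 4.0 * g) / 4.0    # Y  =  R/4 + G/2 + B/4
    co = (rg - bg) / 2.0             # Co =  R/2 - B/2
    cg = -(rg + bg) / 4.0            # Cg = -R/4 + G/2 - B/4
    return y, co, cg
```

For example, R = 10, G = 20, B = 30 gives rg = −10, bg = 10 and hence Y = 20, Co = −10, Cg = 0, matching the RGB-side definitions.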
(a) Color space conversion for IPP1 (b) Color space conversion for IPP2
Figure 4.3: Two versions of color space conversion
The first method, IPP1, preserves four Y, one Co, and one Cg components for every 2x2 block of CFA pixels. This process reduces the spatial resolution of the chrominance (chroma) channels by 75 percent, but still maintains high image quality by keeping the full Y plane, which is perceptually more significant than the chroma planes. In order to reduce the spatial resolution of Co and Cg, chroma subsampling is applied; here, we discard three chroma pixels from each 2x2 block for simplicity. After subsampling, the spatial resolutions of the chroma channels are halved in both the horizontal and vertical directions.
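The subsampling step amounts to strided slicing; which of the four positions per 2x2 block is retained (top-left below) is an assumption for illustration.

```python
# Chroma subsampling: keep one Co/Cg sample per 2x2 block, halving both
# dimensions. Retaining the top-left position is an assumed convention.
import numpy as np

def subsample_chroma(c):
    """Return a half-resolution chroma plane (one sample per 2x2 block)."""
    return c[0::2, 0::2]
```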
The second method, IPP2, further reduces the number of pixels to compress by discarding half of the Y pixels. It calculates Y pixels only at the G positions of the original CFA image. This is because G is the dominant color in the Y calculation, so distortion can be minimized by using the reliable original G samples instead of interpolated ones [43]. The two chroma channels of IPP2 are subsampled in the same manner as in IPP1.
4.2.5 Structure conversion
Since image compression standards typically only accept rectangular arrays as inputs, a structure conversion process is necessary. During this stage, the quincunx Y channel in IPP2 is rearranged into a rectangular array by up-shifting every Y pixel located in an even row by one pixel. Note that this step is unnecessary for IPP1, as its Y pixels are already arranged on a rectangular grid. For both IPP1 and IPP2, the Co and Cg pixels are packed together to form rectangular arrays. After structure conversion, the YCoCg data in IPP1 constitutes the standard YUV 4:2:0 format and can thus be compressed using the YCC 4:2:0 mode of JPEG XR encoding. Similarly, the rearranged YCoCg data in IPP2, formatted as YUV 4:2:2, can be compressed using the YCC 4:2:2 mode of JPEG XR encoding.
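The IPP2 rearrangement, merging each even row's Y samples up into the odd row above it (1-based rows), can be sketched with numpy slicing. The assumed Bayer phase (G samples where row + column is even, 0-indexed) is an assumption; other phases just swap the slice offsets.

```python
# Hedged sketch of the IPP2 structure conversion: quincunx Y samples are
# packed into a (K1/2) x K2 rectangle by shifting alternate rows up.
import numpy as np

def pack_quincunx_y(y):
    """Pack a quincunx Y plane (samples where row+col is even, 0-indexed)
    into a rectangular array of half the height."""
    k1, k2 = y.shape
    out = np.empty((k1 // 2, k2), dtype=y.dtype)
    out[:, 0::2] = y[0::2, 0::2]   # samples already in rows 0, 2, ...
    out[:, 1::2] = y[1::2, 1::2]   # samples shifted up from rows 1, 3, ...
    return out
```

The resulting full-width, half-height Y plane together with the quarter-resolution chroma planes is what lets the bundle masquerade as YUV 4:2:2 for the codec.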
4.3 Experimental Results
The performance of the proposed solution is examined as follows. RGB images with 16-bit-per-component representation from the Para-Dice Insight Compression Database [51], shown in Figure 3.10, are initially resized to 960x640. The resized test images o : Z2 → Z3 are sampled with the Bayer CFA to produce the CFA images z : Z2 → Z. The CFA images z are then preprocessed using the proposed pipelines and compressed into JPEG XR format c by the JPEG XR reference software [52]. The reconstructed RGB images x : Z2 → Z3 to be displayed to the end user are generated by applying JPEG XR decompression to the compressed data c, followed by the processing operations in reverse order. In our experiments, we apply bilinear interpolation to estimate the missing Y, Co, and Cg components in the decoding pipeline. The reconstructed image x should be as
close as possible to the desired RGB image o.
We modified the reference software to accept 16-bit-per-component YUV 4:2:0 data as input for the raw encoding mode. This modification allows us to simulate IPP1. The JPEG XR codec is configured as follows: i) all subbands (DC, LP, and HP) and flexbits are preserved during encoding, ii) the first-level overlapping mode is used for the pre-filter function, and iii) the bit rate of the encoded image is controlled by adjusting the quantization variables. Uniform quantization parameters are used for all three subbands and color channels.
To evaluate the performance of the proposed solutions, image quality is measured by comparing o and x using three quality assessment (QA) metrics: i) the Composite Peak Signal-to-Noise Ratio (CPSNR), ii) the Multi-scale Structural Similarity Index (MSSIM) [48], and iii) the High Dynamic Range Visible Difference Predictor (HDR-VDP) [49]. CPSNR is defined as follows:
CPSNR = 10 log10( (2^B − 1)^2 / ( (1/(3·K1·K2)) Σ_{k=1}^{3} Σ_{r=1}^{K1} Σ_{s=1}^{K2} (o(r,s)k − x(r,s)k)^2 ) )        (4.13)
where B stands for the bit depth (B = 16 for our test data). Although CPSNR is widely used in the literature, it correlates poorly with perceived quality. Therefore, we include two human visual system (HVS) oriented metrics, the multi-scale MSSIM and the HDR-VDP.
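Equation (4.13) is a composite PSNR over all three channels; a direct numpy transcription:

```python
# CPSNR of equation (4.13) for B-bit data.
import numpy as np

def cpsnr(o, x, bits=16):
    """Composite PSNR between reference o and reconstruction x,
    both of shape (K1, K2, 3)."""
    mse = np.mean((o.astype(np.float64) - x.astype(np.float64)) ** 2)
    peak = (2 ** bits - 1) ** 2
    return 10.0 * np.log10(peak / mse)
```

For instance, a uniform error of one code value on 16-bit data gives 20·log10(65535) ≈ 96.3 dB.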
MSSIM initially decomposes a test image into several scales and provides statistics by measuring the luminance, contrast, and structure information of each sub-scale image. It is generally evaluated by assigning different weights to the color channels, and is represented on a dB scale as follows:

MSSIM = 20 log10{(wY · MSSIM_Y) + (wCb · MSSIM_Cb) + (wCr · MSSIM_Cr)}^(−1)        (4.14)

In this report, the weight coefficients for the channels of MSSIM are selected as wY = 0.95, wCb = 0.02, and wCr = 0.03, following the suggested usage in previous publications [36, 58].
The VDP metric predicts the percentage of pixels in a test image that standard observers would perceive as different from the original. The HDR-VDP incorporates several HVS characteristics into VDP to enhance its prediction accuracy over the full visible range of luminance. It is specifically tuned to support HDR images and is widely adopted for the comparison of HDR images; we therefore make use of HDR-VDP in reporting the experimental results. Similar to the MSSIM metric, HDR-VDP is plotted on a dB scale as follows:

HDR-VDP = 20 log10(1/r)        (4.15)

where r denotes the ratio of pixels that standard observers would perceive as different from the original.
Results reported in the following sections are obtained over a wide range of compression ratios by averaging the computed image quality over the test images.
4.3.1 Edge Sensing Mechanism (ESM) and Compression
The rate-distortion performance of the CFA compression pipelines with various ESMs is illustrated in Figure 4.4. The ESMs under consideration include bilinear interpolation (BI), Laplacian interpolation (HA) [59], and the 8-directional data adaptive interpolation (ESCC) deployed in our proposed pipeline. These ESM schemes range from simple to complicated in terms of computational cost and vary in the quality of the images they produce. BI is a typical example of a non-data-adaptive estimator that uses fixed edge-sensing weight factors for missing pixel estimation. HA is a classical edge-directed interpolator that uses second-order gradients as correction terms. These two algorithms are often used as benchmarks in the literature [14, 38, 43], and thus we compare the performance of our solution against them.
For the IPP1 pipeline, the ESCC outperforms the other ESMs over almost the entire bit rate range in all three quality metrics. The HA provides a slightly higher CPSNR gain than the ESCC at low bit rates, but the two perceptual metrics, which correlate strongly with visual perception, indicate that the ESCC is superior to the HA. This implies that utilizing a sophisticated ESM enhances the rate-distortion performance of the CFA compression pipeline. However, as the bit rate decreases, the selection of ESM has less impact on performance. At low bit rates, as shown in Figure 4.4, the ESCC provides almost identical compression performance to the HA, and is still more efficient than the BI, although the improvement is not as significant as at high bit rates. This is because the advanced ESMs are more sensitive to fine edge detail, which is susceptible to compression errors, than the low complexity ones.
For the IPP2 pipeline, not all error criteria show consistent results. The CPSNR and MSSIM metrics indicate that the ESCC achieves the best performance over almost the entire range of bit rates, while the HDR-VDP metric indicates that the HA outperforms the ESCC at bit rates higher than 3 bits per pixel (bpp). This observation shows that the ESCC ESM is better optimized for IPP1 than for IPP2. The suboptimal compression performance of IPP2 is caused by the artificial high frequency components introduced in the structure conversion stage. Similar to the IPP1 case, advanced ESMs in IPP2 provide less benefit in terms of compression efficiency as the bit rate decreases.
4.3.2 Color Space and Compression
Figure 4.5 shows the rate-distortion curves of the proposed scheme in conjunction with the RGB-YCoCg conversion and two other variants, the RGB-YCbCr conversion and the JPEG 2000 reversible color transform (RCT). The RGB-YCbCr conversion is commonly used in CFA compression pipelines [14, 43], and we therefore consider it a reference method. The JPEG 2000 RCT is considered in the comparison since it features low complexity like the YCoCg, requiring only addition and shift operations.
Our experimental results show that all three color space variants produce nearly
Figure 4.4: Rate-distortion curves of the proposed pipelines with different ESMs for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.
Figure 4.5: Rate-distortion curves of the proposed pipelines with different color spaces for various quality metrics. Panels: (a) CPSNR for IPP1, (b) MSSIM for IPP1, (c) HDR-VDP for IPP1, (d) CPSNR for IPP2, (e) MSSIM for IPP2, (f) HDR-VDP for IPP2.
identical performance for both IPP1 and IPP2. The YCoCg slightly outperforms the other two methods in the MSSIM and HDR-VDP metrics, but results in a small loss (at most 0.2 dB) in the CPSNR measure compared to the YCbCr. Since the YCoCg space offers marginally higher perceptual metric performance at low complexity, it is, among the reviewed options, the most efficient choice for our CFA compression pipeline implementation.
4.3.3 Proposed Pipeline and Conventional Pipelines
Figure 4.6 compares the rate-distortion performance of our proposed pipelines, IPP1 and IPP2, against other variants. Namely, IPP3 represents the conventional workflow that first demosaicks the CFA image via the ESCC CDM and then compresses the resultant RGB image; the compressed image is then decoded and displayed. IPP4 encodes the CFA image directly without any pre-processing operations; the full RGB image is obtained by demosaicking the decoded CFA image using the ESCC CDM. Combining these two pipelines with two codecs, JPEG XR and JPEG 2000, allows us to test four additional solutions alongside our methods. For JPEG 2000 coding, the JasPer software implementation [60] is used. Comparison to conventional JPEG is omitted due to its lack of support for 16-bit-per-component input.
Experimental results show that IPP1 consistently outperforms IPP3 and IPP4 in all three quality measures at high bit rates, above 8 bpp, regardless of the codec used. IPP1 also substantially outperforms IPP2 at high bit rates. For the mid-range bit rates, between 2 and 8 bpp, IPP4 provides the best image quality. At low bit rates, all three metrics show that IPP3 produces images of superior quality to the other pipelines.
At low bit rates IPP2 outperforms IPP1 in terms of rate-distortion performance. There are two reasons for this. Higher compression removes more texture and edge detail, and thus reduces the high frequency artifacts generated during the quincunx-to-rectangular array conversion of Y pixels in IPP2. Consequently, the reduction of the high frequency components improves the compression efficiency. In addition,
Figure 4.6: Rate-distortion curves of the proposed pipelines and 4 other pipelines for various image quality metrics. Panels: (a) CPSNR for various IPPs, (b) MSSIM for various IPPs, (c) HDR-VDP for various IPPs below 4 bpp, (d) HDR-VDP for various IPPs above 4 bpp.
the smaller input size of IPP2 results in better performance. Conversely, at high bit rates the aliasing in IPP2 prevents efficient coding, and the reduction of Y pixels results in poor edge restoration. Thus, high bit rates favor IPP1 while low bit rates favor IPP2.
Figure 4.7 allows a visual evaluation of the pipelines via sub-regions of reconstructed images generated at low bit rates, between 1 and 2 bpp. We can observe that IPP2 and IPP3 maintain acceptable visual quality even under a high compression ratio, whereas IPP1 and IPP4 suffer from various visual artifacts. Images generated by IPP4 at low
bit rates are significantly distorted by lattice-patterned artifacts. This unpleasant texture appears for IPP4 with both JPEG 2000 and JPEG XR. Applying high compression to CFA data removes the edge information required for CDM and introduces noise that can mislead the ESM operators into generating false weight factors. Thus, advanced ESMs, which are typically more sensitive to edge detail, may not produce images of acceptable quality from highly compressed CFA data. The conventional workflow, IPP3, does not suffer from this problem at low bit rates since CDM is performed prior to compression. In addition, demosaicked data typically have higher inter-pixel correlation than CFA data, enabling more efficient compression. For these reasons, IPP3 works well at low bit rates, providing almost the same rate-distortion performance as our proposed IPP2 pipeline. In Figure 4.6, the perceptual metrics, MSSIM and HDR-VDP, indicate that IPP4 yields a lower quality gain than IPP2 and IPP3, consistent with the visual inspection.
Our experimental results show that the compression performance of JPEG XR and JPEG 2000 is very close; JPEG 2000 is generally slightly superior, but the gain is marginal. It can be seen in Figure 4.6 that for both IPP3 and IPP4, the use of JPEG 2000 instead of JPEG XR slightly improves the rate-distortion performance over a wide bit rate range in all three metrics.
Apart from the rate-distortion performance, we also report the average encoding time per image in milliseconds for different combinations of pipelines and codecs in Table 4.1. Experimental results, averaged over the image sets, are obtained on an Intel Core 2 Duo 2.53 GHz CPU with 4 GB RAM running the Windows 7 operating system. For a CFA input of size K1 × K2, the numbers of pixels to encode in IPP1, IPP2, IPP3, and IPP4 are 1.5 × K1 × K2, K1 × K2, 3 × K1 × K2, and K1 × K2 pixels, respectively. The results show that the encoding delay of each pipeline is proportional to the number of pixels in the input data. This observation clearly shows a trade-off between quality and complexity. At low bit rates, as shown in Figure 4.6, IPP2 performs significantly better than IPP1 and is almost comparable to IPP3 in terms of image quality. The average encoding speed
Figure 4.7: Full color images obtained from the four examined IPPs with the JPEG XR codec at bit rates between 1 and 2 bpp. The first 4 images are sub-regions of image 18, the next 4 are from image 21, and the last 4 are from image 1 in the database; the four panels in each row correspond to IPP1, IPP2, IPP3, and IPP4.
of IPP2 is considerably faster than that of either IPP1 or IPP3 with JPEG XR encoding. Therefore, when a small quality loss is tolerable in exchange for a reduction in encoding delay, the low complexity IPP2 solution is desirable.
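The pixel counts driving these encoding-time differences are easy to reproduce; the following is a trivial transcription of the counts discussed above, not measured data.

```python
# Pixels fed to the codec for a K1 x K2 CFA input, per pipeline.
def pixels_to_encode(k1, k2):
    n = k1 * k2
    return {"IPP1": n * 3 // 2,  # full Y plane + quarter-resolution Co and Cg
            "IPP2": n,           # half Y plane + quarter-resolution Co and Cg
            "IPP3": 3 * n,       # fully demosaicked RGB image
            "IPP4": n}           # raw CFA mosaic
```

For the 960x640 test images this gives 921,600 pixels for IPP1 versus 1,843,200 for the conventional IPP3 workflow, in line with the encoding delays in Table 4.1.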
According to Table 4.1, JPEG XR is substantially faster than JPEG 2000, by a factor of 3.5 to 4 in compression speed. It is important to note that a direct comparison of compression
Pipeline - Codec     Number of pixels to encode     Encoding time
                     (for K1 × K2 CFA image)        (ms/frame)
IPP1 - JPEG XR       1.5 × K1 × K2                  216.91
IPP2 - JPEG XR       K1 × K2                        133.09
IPP3 - JPEG XR       3 × K1 × K2                    280.15
IPP4 - JPEG XR       K1 × K2                        126.47
IPP3 - JPEG 2000     3 × K1 × K2                    1152.94
IPP4 - JPEG 2000     K1 × K2                        429.41

Table 4.1: Encoding time for different pipelines and codecs
speed may not be meaningful, as different codecs implementing the same coding standard may produce different results. However, given that the computational complexity of JPEG XR is much lower than that of JPEG 2000, it is reasonable to assume that JPEG XR encoding remains faster than JPEG 2000 in real world applications as well. The simplified architecture and fast encoding speed are significant advantages for consumer level devices, justifying the use of JPEG XR over JPEG 2000 in the proposed CFA compression pipeline at the expense of a small loss in quality performance.
4.4 Chapter Summary
In this chapter, a lossy Bayer CFA compression scheme capable of handling HDR representation is presented. In summary, the following conclusions can be drawn: i) use of the 8-directional data adaptive ESM as an alternative to simple ESMs in the proposed pipeline yields high quality reconstructed images, especially at high bit rates; ii) the YCoCg is a low complexity alternative to the conventional YCbCr color space, offering identical or slightly better perceptual quality and enabling a more efficient implementation of the workflow; iii) the proposed IPP1 offers the highest image quality among the reviewed pipelines at high bit rates, while at low bit rates IPP2 produces visually pleasing images with the lowest processing delay; for bit rates between these extremes, the direct CFA encoding method provides the most satisfactory results in terms of coding efficiency, which suggests a selective use of pipelines in digital cameras depending on the target bit rate; iv) although JPEG 2000 can provide marginally higher coding efficiency, JPEG XR is a light-weight image codec capable of supporting the HDR format and is therefore suitable for resource constrained systems. Combining a series of pre-processing operations with a JPEG XR encoding module delivers a complete, novel, cost-effective solution for the efficient storage of HDR CFA image data.
Chapter 5
Conclusions and Future Work
5.1 Conclusions
Over the past years, advancements in single-sensor digital cameras have offered more convenient access to digital images in various environments. These consumer level cameras capture the natural scene by generating a mosaic-like grayscale image, also known as a CFA image. One major challenge in this field is supporting HDRI technology to achieve a more accurate representation of real visual scenes. Since digital images in HDR format require a larger amount of data than the conventional 8 bit representation, efficient compression of HDR content has become a critical issue. This thesis introduces both lossless and lossy compression schemes for the digital camera pipeline which efficiently encode CFA images provided in HDR format. Both systems combine a series of pre-processing operations with a JPEG XR encoding module. The pre-processing operations exploit spatial and spectral (inter-channel) correlations in the original CFA image to achieve optimal compression performance. The JPEG XR codec enables compression of HDR data at reasonable processing cost.
In Chapter 3, we proposed a lossless compression scheme for Bayer CFA images. The proposed scheme deinterleaves the input CFA image into sub-images of a single color component and adopts a predictor depending on local image statistics. The generated prediction error signals of each sub-image are then encoded by JPEG XR compression. Experimental results confirm that the proposed scheme effectively removes spatial and spectral redundancies, delivering higher compression efficiency than prior-art solutions.
In Chapter 4, we proposed a novel cost-effective lossy CFA encoding pipeline capable of handling HDR image representations. This scheme combines color space conversion, structure conversion, and a JPEG XR encoding module. Experimental analysis and comparative evaluations using objective quality metrics indicate that the proposed pipeline outperforms state-of-the-art CFA compression solutions that deploy low complexity edge sensing mechanisms and conventional color space conversions. The results suggest that the proposed schemes offer superior performance at low and high bit rates.

The proposed lossless scheme can be utilized in high-end/professional photography applications where the original CFA image needs to be preserved. On the other hand, the proposed lossy scheme provides greater compression gains at the expense of information integrity and is suitable for general consumer-level cameras with limited data storage.
5.2 Future Work
Although significant achievements have been made in this research toward improving the compression performance of digital cameras, there is still room for further improvement. This section discusses potential technical improvements and problems that can be further explored.
5.2.1 Potential extensions on the proposed systems
• The results provided by the empirical evaluations are limited to a specific database and are therefore inconclusive. Further experiments are needed to analyze the behavior of the proposed schemes with diverse sample sets including both natural and synthetic images. In addition, we can further investigate the influence of various image characteristics, such as spatial resolution, edge strength, and edge orientation, on the compression performance of the proposed scheme.
• The proposed lossless CFA compression scheme in Chapter 3 can be extended to
support lossy compression. This can be achieved by enabling the quantization module
in the JPEG XR codec and embedding a de-quantization module in the proposed pre-
dictor to undo quantization during the prediction process. Eventually, this modification
would allow us to build a unified compression pipeline that offers high compression
efficiency for both lossy and lossless encoding.
• The proposed compression schemes can be improved to support near-lossless
compression. This would require identifying a suitable function of the allowable encoding
error [40]. A near-lossless compression scheme that exploits the characteristics of
human visual perception is expected to achieve much higher compression efficiency
at the cost of marginal encoding errors.
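One common form of such an error function is the uniform residual quantizer used in JPEG-LS near-lossless coding [31], which bounds the per-sample reconstruction error by a chosen tolerance δ. The sketch below is illustrative and not part of the proposed pipeline:

```python
def quantize_residual(r, delta):
    """Map a prediction residual to a quantization index such that the
    reconstruction error is guaranteed to stay within +/- delta."""
    step = 2 * delta + 1
    if r >= 0:
        return (r + delta) // step
    return -((-r + delta) // step)

def dequantize_residual(q, delta):
    """Invert the mapping; delta = 0 degenerates to lossless coding."""
    return q * (2 * delta + 1)
```

Setting delta = 0 gives a step size of 1 and hence exact reconstruction, while larger values of delta shrink the residual alphabet and improve compression at a strictly controlled per-sample cost.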
5.2.2 General future work
In the past few years, new research problems in the area of digital photography have been
identified. In particular, demands for advanced technologies that increase the amount of
image data, such as 3D imaging, HDR imaging, and ultrahigh-definition video, have
raised the need for efficient encoding algorithms in digital photography.
• One of the new challenges in digital photography is the adaptation of 3D image pro-
cessing technology to mobile devices equipped with single-sensor imaging technol-
ogy [61]. In general, 3D visual content is generated by capturing the visual scene
from multiple viewpoints. This leads to a larger amount of image data compared
to conventional 2D technology and thus requires a high-performance compression
scheme. For commercial viability, new implementations need to be interactive,
portable, and embedded-system friendly.
• Ultrahigh-definition television (UHDTV) is a digital video format that supports
16 times the resolution of high-definition television (HDTV) at a frame rate of 60 Hz
in progressive scan mode. The development of a UHDTV camera utilizing single-
sensor imaging technology constitutes another emerging research direction in this
area [62]. Realization of such cameras requires an efficient compression technique
to handle high-data-rate visual content.
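As a rough illustration of the data rates involved, the raw single-sensor output at the resolution described above can be estimated as follows (the 12-bit sample depth is an assumed figure for illustration, not taken from [62]):

```python
# Back-of-envelope raw CFA data rate for a single-sensor UHDTV camera.
width, height = 7680, 4320      # 16x the 1920x1080 HDTV raster (~33 Mpixels)
fps = 60                        # progressive scan at 60 Hz
bits_per_sample = 12            # assumed sensor bit depth (illustrative)

raw_gbps = width * height * fps * bits_per_sample / 1e9
print(f"raw CFA rate: {raw_gbps:.1f} Gbit/s")   # about 23.9 Gbit/s
```

Even before demosaicking triples the channel count, the uncompressed sensor stream is on the order of tens of gigabits per second, which underlines why efficient CFA-domain compression is essential at this scale.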
Bibliography
[1] K. N. Plataniotis and A. N. Venetsanopoulos, Color image processing and applica-
tions. New York, NY, USA: Springer-Verlag New York, Inc., 2000.
[2] M. Mancuso and S. Battiato, “An introduction to the digital still camera technol-
ogy,” in ST Journal of System Research - Special Issue on Image Processing for
Digital Still Camera, pp. 200–1, 2001.
[3] K. Myszkowski, R. Mantiuk, and G. Krawczyk, High Dynamic Range Video. Synthe-
sis Lectures on Computer Graphics and Animation, Morgan & Claypool Publishers,
2008.
[4] D. Alleysson, S. Susstrunk, and J. Herault, “Linear demosaicing inspired by the
human visual system,” Image Processing, IEEE Transactions on, vol. 14, pp. 439
–449, Apr. 2005.
[5] B. Turko and G. Yates, “Low smear CCD camera for high frame rates,” Nuclear
Science, IEEE Transactions on, vol. 36, pp. 165–169, Feb. 1989.
[6] A. J. Blanksby and M. J. Loinaz, “Performance analysis of a color CMOS photogate
image sensor,” IEEE Transactions on Electron Devices, vol. 47, pp. 55–64, Jan.
2000.
[7] J. Adams, K. Parulski, and K. Spaulding, “Color processing in digital cameras,”
Micro, IEEE, vol. 18, pp. 20–30, Nov./Dec. 1998.
[8] B. E. Bayer, “Color imaging array,” U.S. Patent 3 971 065, July 1976.
[9] International Electrotechnical Commission, “Colour measurement and management
in multimedia systems and equipment - part 2-1: Default RGB colour space - sRGB,” 1999.
[10] G. K. Wallace, “The JPEG still picture compression standard,” Commun. ACM,
vol. 34, pp. 30–44, Apr. 1991.
[11] “Exchangeable image file format for digital still cameras: Exif version 2.2,” 2002.
Standard of Japan Electronics and Information Technology Industries Association.
[12] R. Lukac, Single-Sensor Imaging: Methods and Applications for Digital Cameras.
Boca Raton, FL, USA: CRC Press, Inc., 1 ed., 2008.
[13] N. Zhang and X. Wu, “Lossless compression of color mosaic images,” Image Pro-
cessing, IEEE Transactions on, vol. 15, pp. 1379 –1388, June 2006.
[14] C. C. Koh, J. Mukherjee, and S. Mitra, “New efficient methods of image compression
in digital cameras with color filter array,” Consumer Electronics, IEEE Transactions
on, vol. 49, pp. 1448 – 1456, Nov. 2003.
[15] N.-X. Lian, L. Chang, V. Zagorodnov, and Y.-P. Tan, “Reversing demosaicking
and compression in color filter array image processing: Performance analysis and
modeling,” Image Processing, IEEE Transactions on, vol. 15, pp. 3261 –3278, Nov.
2006.
[16] B. Gunturk, J. Glotzbach, Y. Altunbasak, R. Schafer, and R. Mersereau, “Demo-
saicking: color filter array interpolation,” Signal Processing Magazine, IEEE, vol. 22,
pp. 44 – 54, Jan. 2005.
[17] X. Li, B. Gunturk, and L. Zhang, “Image demosaicing: a systematic survey,”
vol. 6822, p. 68221J, SPIE, 2008.
[18] R. Kimmel, “Demosaicing: image reconstruction from color CCD samples,” Image
Processing, IEEE Transactions on, vol. 8, pp. 1221–1228, Sept. 1999.
[19] S.-C. Pei and I.-K. Tam, “Effective color interpolation in CCD color filter arrays using
signal correlation,” Circuits and Systems for Video Technology, IEEE Transactions
on, vol. 13, pp. 503–513, June 2003.
[20] R. Lukac, B. Smolka, K. Martin, K. Plataniotis, and A. Venetsanopoulos, “Vector
filtering for color imaging,” Signal Processing Magazine, IEEE, vol. 22, pp. 74–86,
Jan. 2005.
[21] S. Battiato, A. Castorina, and M. Mancuso, “High dynamic range imaging for digital
still camera: an overview,” Journal of Electronic Imaging, vol. 12, no. 3, pp. 459–469,
2003.
[22] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from
photographs,” in Proceedings of the 24th annual conference on Computer graphics
and interactive techniques, SIGGRAPH ’97, (New York, NY, USA), pp. 369–378,
ACM Press/Addison-Wesley Publishing Co., 1997.
[23] E. Khan, A. Akyuz, and E. Reinhard, “Ghost removal in high dynamic range im-
ages,” in Image Processing, 2006 IEEE International Conference on, pp. 2005–2008,
Oct. 2006.
[24] O. Gallo, N. Gelfandz, W.-C. Chen, M. Tico, and K. Pulli, “Artifact-free high
dynamic range imaging,” in Computational Photography (ICCP), 2009 IEEE Inter-
national Conference on, pp. 1–7, Apr. 2009.
[25] T.-H. Lee, W.-J. Kyung, C.-H. Lee, and Y.-H. Ha, “Estimation of low dynamic range
images from single Bayer image using exposure look-up table for high dynamic range
image,” vol. 7866, p. 78660B, SPIE, 2011.
[26] G. Qiu, J. Guan, J. Duan, and M. Chen, “Tone mapping for HDR image using opti-
mization: a new closed form solution,” in Proceedings of the 18th International Con-
ference on Pattern Recognition - Volume 01, ICPR ’06, (Washington, DC, USA),
pp. 996–999, IEEE Computer Society, 2006.
[27] J. Duan, M. Bressan, C. Dance, and G. Qiu, “Tone-mapping high dynamic range
images by novel histogram adjustment,” Pattern Recogn., vol. 43, pp. 1847–1862,
May 2010.
[28] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction
for digital images,” ACM Trans. Graph., vol. 21, pp. 267–276, July 2002.
[29] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression Funda-
mentals, Standards and Practice. Norwell, MA, USA: Kluwer Academic Publishers,
2001.
[30] H. S. Malvar, G. J. Sullivan, and S. Srinivasan, “Lifting-based reversible color trans-
formations for image compression,” vol. 7073, p. 707307, SPIE, 2008.
[31] M. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression
algorithm: principles and standardization into JPEG-LS,” Image Processing, IEEE
Transactions on, vol. 9, pp. 1309–1324, Aug. 2000.
[32] X. Wu and N. Memon, “CALIC: a context based adaptive lossless image codec,” in
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceed-
ings., 1996 IEEE International Conference on, vol. 4, pp. 1890–1893, May 1996.
[33] M. Rabbani and R. Joshi, “An overview of the JPEG 2000 still image compression
standard,” Signal Processing: Image Communication, vol. 17, no. 1, pp. 3–48, 2002.
[34] ITU-T Rec. T.832 and ISO/IEC 29199-2, “Information technology - JPEG XR image
coding system - part 2: Image coding specification,” 2009.
[35] S. Srinivasan, C. Tu, S. L. Regunathan, and G. J. Sullivan, “HD Photo: a new image
coding technology for digital photography,” vol. 6696, p. 66960A, SPIE, 2007.
[36] F. De Simone, M. Ouaret, F. Dufaux, A. G. Tescher, and T. Ebrahimi, “A com-
parative study of JPEG 2000, AVC/H.264, and HD Photo,” in SPIE Optics and
Photonics, Applications of Digital Image Processing XXX, vol. 6696, 2007.
[37] T. Bruylants, J. Barbarien, A. Munteanu, and P. Schelkens, “Perceptual quality
assessment of JPEG, JPEG 2000, and JPEG XR,” vol. 7723, p. 77230E, SPIE, 2010.
[38] R. Lukac and K. Plataniotis, “Single-sensor camera image compression,” Consumer
Electronics, IEEE Transactions on, vol. 52, pp. 299 – 307, May 2006.
[39] G. Schaefer, R. Nowosielski, and R. Starosolski, “Evaluation of lossless image com-
pression algorithms for CFA data,” in ELMAR, 2008. 50th International Symposium,
vol. 1, pp. 57–60, Sept. 2008.
[40] A. Bazhyna and K. Egiazarian, “Lossless and near lossless compression of real color
filter array data,” Consumer Electronics, IEEE Transactions on, vol. 54, pp. 1492–1500,
Nov. 2008.
[41] K.-H. Chung and Y.-H. Chan, “A lossless compression scheme for Bayer color filter
array images,” Image Processing, IEEE Transactions on, vol. 17, pp. 134–144, Feb.
2008.
[42] A. Bazhyna, K. Egiazarian, S. Mitra, and C. Koh, “A lossy compression algorithm
for Bayer pattern color filter array data,” in Signals, Circuits and Systems, 2007.
ISSCS 2007. International Symposium on, vol. 2, pp. 1–4, July 2007.
[43] H. Chen, M. Sun, and E. Steinbach, “Compression of Bayer-pattern video sequences
using adjusted chroma subsampling,” Circuits and Systems for Video Technology,
IEEE Transactions on, vol. 19, pp. 1891–1896, Dec. 2009.
[44] S. H. Lee and N. I. Cho, “H.264/AVC based color filter array compression with inter-
channel prediction model,” in Image Processing (ICIP), 2010 17th IEEE Interna-
tional Conference on, pp. 1237–1240, Sept. 2010.
[45] S.-Y. Lee and A. Ortega, “A novel approach of image compression in digital cam-
eras with a Bayer color filter array,” in Image Processing, 2001. Proceedings. 2001
International Conference on, vol. 3, pp. 482–485, 2001.
[46] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Synthesis Lectures
on Image, Video, and Multimedia Processing, Morgan & Claypool Publishers, 2006.
[47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment:
From error visibility to structural similarity,” IEEE Transactions on Image
Processing, vol. 13, no. 4, pp. 600–612, 2004.
[48] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structural similarity for
image quality assessment,” in Proc. IEEE Asilomar Conf. on Signals, Systems,
and Computers, pp. 1398–1402, 2003.
[49] R. Mantiuk, K. Myszkowski, and H.-P. Seidel, “Visible difference predicator for
high dynamic range images,” in Proceedings of IEEE International Conference
on Systems, Man and Cybernetics, pp. 2763–2769, 2004.
[50] R. Lukac and K. Plataniotis, “Data adaptive filters for demosaicking: a framework,”
Consumer Electronics, IEEE Transactions on, vol. 51, pp. 560 – 570, May 2005.
[51] “Para-dice in sight - compression database.” http://cdb.paradice-insight.us/.
[52] ITU-T Rec. T.832 and ISO/IEC 29199-5, “Information technology - JPEG XR image
coding system - part 5: Reference software,” 2010.
[53] X. Li and M. Orchard, “New edge-directed interpolation,” Image Processing, IEEE
Transactions on, vol. 10, pp. 1521–1527, Oct. 2001.
[54] K. Subbalakshmi, Lossless Compression Handbook, ch. Lossless Image Compression.
No. ISBN 0-12-620861-1 in Communications, Networking, and Multimedia, Aca-
demic Press, 2003.
[55] C. Doutre and P. Nasiopoulos, “An efficient compression scheme for colour filter
array images using estimated colour differences,” in Electrical and Computer Engi-
neering, 2007. CCECE 2007. Canadian Conference on, pp. 24 –27, Apr. 2007.
[56] R. Lukac, K. Plataniotis, D. Hatzinakos, and M. Aleksic, “A novel cost effective de-
mosaicing approach,” Consumer Electronics, IEEE Transactions on, vol. 50, pp. 256
– 261, Feb. 2004.
[57] X. Li, “Demosaicing by successive approximation,” Image Processing, IEEE Trans-
actions on, vol. 14, pp. 370 –379, Mar. 2005.
[58] D. Schonberg, S. Sun, G. J. Sullivan, S. Regunathan, Z. Zhou, and S. Srinivasan,
“Techniques for enhancing JPEG XR / HD Photo rate-distortion performance for
particular fidelity metrics,” in Society of Photo-Optical Instrumentation Engineers
(SPIE) Conference Series, vol. 7073 of Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference Series, Oct. 2008.
[59] J. F. Hamilton and J. E. Adams, “Adaptive color plane interpolation in single sensor
color electronic camera,” U.S. Patent 5 629 734, 1997.
[60] M. Adams and F. Kossentini, “JasPer: a software-based JPEG-2000 codec implemen-
tation,” in Image Processing, 2000. Proceedings. 2000 International Conference on,
vol. 2, pp. 53–56, Sept. 2000.
[61] K. Atanassov, V. Ramachandra, S. R. Goma, and M. Aleksic, “3D image process-
ing architecture for camera phones,” in Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference Series, vol. 7864 of Society of Photo-Optical Instru-
mentation Engineers (SPIE) Conference Series, Jan. 2011.
[62] R. Funatsu, T. Yamashita, K. Mitani, and Y. Nojiri, “Single-chip color imaging
for UHDTV camera with a 33M-pixel CMOS image sensor,” in Society of Photo-
Optical Instrumentation Engineers (SPIE) Conference Series, vol. 7875 of Society
of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Feb. 2011.