ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004: Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University
Image Processing Architecture, © 2001-2004 Oleh Tretiak

TRANSCRIPT

Page 1: ECEC 453 Image Processing Architecture

ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004: Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University

Page 2: ECEC 453 Image Processing Architecture

Quality-Rate Tradeoff
• Given: 512x512 picture, 8 bits per pixel
• Bit reduction
  - Fewer bits per pixel
  - Fewer pixels
  - Both
• Issues:
  - How do we measure compression?
    o Bits/pixel: does not work when we change the number of pixels
    o Total bits: valid, but hard to interpret
  - How do we measure quality?
    o RMS noise
    o Peak signal-to-noise ratio (PSNR) in dB
    o Subjective quality

Page 3: ECEC 453 Image Processing Architecture

Comparison, Bit and Pixel Reduction
[Chart: quality (vertical axis, 0-60) vs. total bits in image (0 to 2,000,000), comparing a "Subsample" curve and a "Drop bits" curve]

Page 4: ECEC 453 Image Processing Architecture

Quantizer Performance
• Questions:
  - How much error does the quantizer introduce (Distortion)?
  - How many bits are required for the quantized values (Rate)?
• Rate:
  - 1. No compression. If there are N possible quantizer output values, then it takes ceiling(log2 N) bits per sample.
  - 2(a). Compression. Compute the histogram of the quantizer output. Design a Huffman code for the histogram. Find the average length.
  - 2(b). Find the entropy of the quantizer distribution.
  - 2(c). Preprocess quantizer output, ....
• Distortion: Let x be the input to the quantizer, x* the de-quantized value. Quantization noise n = x* − x. Quantization noise power is D = Average(n²).

Page 5: ECEC 453 Image Processing Architecture

Quantizer: Practical Lossy Encoding
• Quantizer
  - Symbols: x is the input to the quantizer, q the output of the quantizer, S the quantizer step
  - Quantizer: q = round(x/S)
  - Dequantizer characteristic: x* = Sq
  - Typical noise power added by the quantizer-dequantizer combination: D = S²/12, noise standard deviation = sqrt(D) = 0.289S
  - Example: S = 8, D = 8²/12 = 5.3, rms quantization noise = sqrt(D) = 2.3. If the input is 8 bits, the max input is 255. There are 255/8 ≈ 32 quantizer output values. PSNR = 20 log10(255/2.3) = 40.8 dB
[Figure: quantizer characteristic (staircase, q vs. x, step S) and dequantizer characteristic (x* vs. q, step S)]
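The example on this slide can be checked numerically. Below is a minimal sketch: the helper names are mine; only q = round(x/S), x* = Sq, and D = S²/12 come from the slide.

```python
import numpy as np

def quantize(x, S):
    return np.round(x / S)        # quantizer: q = round(x/S)

def dequantize(q, S):
    return S * q                  # dequantizer: x* = S*q

rng = np.random.default_rng(0)
S = 8.0
x = rng.uniform(0, 255, size=1_000_000)    # 8-bit-range test signal
n = dequantize(quantize(x, S), S) - x      # quantization noise x* - x

D = np.mean(n ** 2)                        # noise power, close to S^2/12 = 5.33
psnr = 20 * np.log10(255 / np.sqrt(D))     # close to the slide's 40.8 dB
```

For a uniformly distributed input the measured noise power lands very close to the S²/12 model, which is why the slide treats it as the "typical" value.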

Page 6: ECEC 453 Image Processing Architecture

Rate-Distortion Theorem
• When long sequences (blocks) are encoded, it is possible to construct a coder-decoder pair that achieves the specified distortion whenever the bits per sample exceed R(D) + ε, for any ε > 0
• Formula: X ~ Gaussian random variable, Q = E[X²] ~ signal power
• D = E[(X−Y)²] ~ noise power

  p(x) = (1/√(2πQ)) exp(−x²/2Q)

  R(D) = (1/2) log2(Q/D)  if D ≤ Q
  R(D) = 0                if D > Q
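The Gaussian rate-distortion formula above is easy to turn into a small helper (the function name is illustrative, not from the slide):

```python
import numpy as np

# R(D) = 0.5*log2(Q/D) for D <= Q, else 0 (Gaussian source, power Q)
def rate_bits(Q, D):
    return 0.5 * np.log2(Q / D) if D <= Q else 0.0

Q = 100.0                 # signal power E[X^2]
r1 = rate_bits(Q, 25.0)   # quartering the distortion relative to Q costs 1 bit
r2 = rate_bits(Q, 200.0)  # D > Q: zero rate suffices (reconstruct everything as 0)
```

Note the 0.5 log2 factor: every extra bit per sample cuts the achievable distortion by a factor of 4.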

Page 7: ECEC 453 Image Processing Architecture

This Lecture
• Decorrelation and Bit Allocation
• Discrete Cosine Transform
• Video Coding

Page 8: ECEC 453 Image Processing Architecture

Coding Correlated Samples
• How to code correlated samples
  - Decorrelate
  - Code
• Methods for decorrelation
  - Prediction
  - Transformation
    o Block transform
    o Wavelet transform

Page 9: ECEC 453 Image Processing Architecture

Prediction Rules
• Simplest: previous value

  p_i = x̂_{i−1}
  x̂_i = p_i + q_i = x̂_{i−1} + q_i

• Two-dimensional prediction from neighbors:

  p_ij = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}

[Diagram: encoder loop in which the quantizer output q_i is formed from x_i minus the prediction p_i, with a delay feeding the prediction back; neighbor layout x̂_{i−1,j−1}, x̂_{i,j−1}, x̂_{i−1,j} around p_ij]
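A minimal previous-value DPCM loop matching x̂_i = p_i + S·q_i from the slide can be sketched as follows; the test signal and step size are my choices, and the key point is that the predictor runs on *reconstructed* values, so encoder and decoder stay in sync:

```python
import numpy as np

def dpcm(x, S):
    xhat = np.zeros(len(x))
    prev = 0.0                      # predictor state: p_i = xhat_{i-1}
    for i in range(len(x)):
        e = x[i] - prev             # prediction error
        q = np.round(e / S)         # quantize the *error*, not the sample
        xhat[i] = prev + S * q      # decoder-side reconstruction
        prev = xhat[i]              # predict from the reconstructed value
    return xhat

S = 4.0
x = np.cumsum(np.ones(100))         # slowly varying ramp signal
xhat = dpcm(x, S)                   # per-sample error stays within S/2
```

Because the quantizer sits inside the prediction loop, the reconstruction error is bounded by S/2 per sample and does not accumulate.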

Page 10: ECEC 453 Image Processing Architecture

General Predictive Coding
• General system: the encoder subtracts the predictor output p_i from x_i, quantizes the error e_i to e_i*, and feeds the reconstruction x̂_i back into the predictor; the decoder adds e_i* to its own prediction p_i
[Diagram: encoder (x_i minus p_i → e_i → Quantizer → e_i*, with Predictor in the feedback path); decoder (e_i* plus p_i from Predictor → x̂_i)]
• Example of a linear predictive image coder:

  p_ij = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}

[Diagram: neighbor layout x̂_{i−1,j−1}, x̂_{i,j−1}, x̂_{i−1,j} around p_ij]

Page 11: ECEC 453 Image Processing Architecture

Rate-Distortion Theory: Correlated Samples
• Given: x = (x1, x2, ..., xn), a sequence of Gaussian correlated samples
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ) that decorrelates the samples. This is called a Karhunen-Loeve transformation
• Perform lossy encoding of (y1, y2, ..., yn); get y* = (y1*, y2*, ..., yn*) after decoding
• Reconstruct: x* = A⁻¹y*
[Diagram: Signal → Pre-processor → Encoder → Bits (transmit/receive) → Decoder → Reconstructor → Signal]
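The Karhunen-Loeve step above can be sketched with an eigendecomposition of the sample covariance: the rows of A are eigenvectors, so y = Ax has a (numerically) diagonal covariance. The mixing matrix and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 5000
M = rng.normal(size=(n, n))
X = M @ rng.normal(size=(n, N))     # columns: correlated Gaussian samples x

C = np.cov(X)                       # sample covariance of x
_, eigvecs = np.linalg.eigh(C)      # C is symmetric: use eigh
A = eigvecs.T                       # orthogonal: A^{-1} = A^T
Y = A @ X                           # decorrelated samples y = Ax
Cy = np.cov(Y)                      # diagonal up to floating-point error
```

This also shows why the K-L transform is data-dependent: A is built from the covariance of this particular signal.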

Page 12: ECEC 453 Image Processing Architecture

Block-Based Coding
• Discrete Cosine Transform (DCT) is used instead of the K-L transform
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  - Image divided into 'small' blocks
  - Each block is decorrelated separately
• Block decorrelation performs almost as well as (better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks

Page 13: ECEC 453 Image Processing Architecture

Rate-Distortion Theory: Non-Uniform Random Variables
• Given (x1, x2, ..., xn), use an orthogonal transform to obtain (y1, y2, ..., yn).
• Sequence of independent Gaussian variables (y1, y2, ..., yn), Var[yi] = Qi.
• Distortion allocation: allocate distortion Di to the variable with variance Qi
• Rate (bits) for the i-th variable: Ri = max[0.5 log2(Qi/Di), 0]
• Total distortion: D = Σ_{i=1..n} Di
• Total rate (bits): R = Σ_{i=1..n} Ri
• We specify R. What values of Di give the minimum total distortion D?

Page 14: ECEC 453 Image Processing Architecture

Bit Allocation Solution
[Figure: bar chart of variances Q1 ... Q16 with horizontal "water level" q; Di is the part of each bar below the level]
• Implicit solution (water-filling construction)
• Choose Q (parameter)
• Di = min(Qi, Q)
  - If Qi > Q then Di = Q, else Di = Qi
• Ri = max[0.5 log2(Qi/Di), 0]
  - If Qi > Q then Ri = 0.5 log2(Qi/Q), else Ri = 0.
• Find the value of Q that gives the specified R
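The construction above can be sketched directly. The slide gives the implicit rules Di = min(Qi, Q) and Ri = max(0.5·log2(Qi/Q), 0); the bisection search for the level Q that meets a target rate is my addition, not from the slide.

```python
import numpy as np

def waterfill(variances, R_target, iters=200):
    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        Q = 0.5 * (lo + hi)         # candidate water level
        R = sum(max(0.5 * np.log2(Qi / Q), 0.0) for Qi in variances)
        if R > R_target:
            lo = Q                  # too many bits: raise the water level
        else:
            hi = Q
    D = [min(Qi, Q) for Qi in variances]
    Rb = [max(0.5 * np.log2(Qi / Q), 0.0) for Qi in variances]
    return Q, D, Rb

Q, D, Rb = waterfill([16.0, 4.0, 1.0, 0.25], R_target=2.0)
# level Q -> 2, distortions D -> [2, 2, 1, 0.25], rates Rb -> [1.5, 0.5, 0, 0]
```

Variables with Qi below the water level get zero bits; their full variance is simply accepted as distortion.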

Page 15: ECEC 453 Image Processing Architecture


Page 16: ECEC 453 Image Processing Architecture

Wavelet Transform
• Filterbank and wavelets
• 2D wavelets
• Wavelet pyramid

Page 17: ECEC 453 Image Processing Architecture

Filterbank and Wavelets
• Put signal (sequence) through two filters
  - Low frequencies
  - High frequencies
• Downsample both by a factor of 2
• Do it in such a way that the original signal can be reconstructed!
[Diagram: analysis bank x(i) → L (l(k)) and H (h(k)), each downsampled by 2 (100 samples → 50 + 50); synthesis filters LR, HR and an adder reconstruct x(i)]
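The Haar filters are the simplest instance of the two-band filterbank above: one low-pass and one high-pass branch, downsampling by 2, and exact reconstruction. This sketch (helper names mine) shows the 100 → 50 + 50 split and the perfect-reconstruction property:

```python
import numpy as np

def analysis(x):
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # L branch, already downsampled
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # H branch, already downsampled
    return lo, hi

def synthesis(lo, hi):
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.arange(8.0)            # 8 samples -> two bands of 4 each
lo, hi = analysis(x)
xr = synthesis(lo, hi)        # reconstructs x exactly
```

Longer filters (e.g. Daubechies) keep the same structure but need overlapping taps; Haar makes the downsample-and-reconstruct idea visible in a few lines.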

Page 18: ECEC 453 Image Processing Architecture

Filterbank Pyramid
[Diagram: cascade of L/H splits applied repeatedly to the low band: x(i) (1000 samples) → 500 → 250 → 125 + 125]

Page 19: ECEC 453 Image Processing Architecture

2D Wavelets
• Apply wavelet processing along the rows of the picture
• Apply wavelet processing along the columns of the picture
• Pyramid processing
[Diagram: 2D subband layout with horizontal/vertical band combinations (LHLV, LHHV, HHLV, HHHV), the LHLV band split again at the next level]

Page 20: ECEC 453 Image Processing Architecture

Lena: Top Level, Next Level
[Figure: top two levels of the Lena wavelet pyramid, with per-band values annotated: 48.81, 15.45, 9.23, 6.48, 2.52, 1.01, 0.37]

Page 21: ECEC 453 Image Processing Architecture


Lena, more levels

Page 22: ECEC 453 Image Processing Architecture

Decorrelation of Images
• x = (x1, x2, ..., xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  - Images are not Gaussian processes
  - The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  - Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  - Computationally efficient
  - Almost as good as the K-L transformation

Page 23: ECEC 453 Image Processing Architecture

DPCM
• Simple to implement (low complexity)
  - Prediction: 3 multiplications and 2 additions
  - Estimation: 1 addition
  - Encoding: 1 addition + quantization
• Performance for 2-D coding not as good as block quantization
  - In theory, for a large past history the performance (rate-distortion) should be as good as other linear methods, but in that case there is no computational advantage
• Bottom line: useful when complexity is limited
• Important idea: lossy predictive encoding.

Page 24: ECEC 453 Image Processing Architecture

Review: Image Decorrelation
• x = (x1, x2, ..., xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  - Images are not Gaussian processes
  - The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  - Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  - Computationally efficient
  - Almost as good as the K-L transformation

Page 25: ECEC 453 Image Processing Architecture

Rate-Distortion: 1D vs. 2D Coding
• Theory on the tradeoff between distortion and the least number of bits
• Interesting tradeoff only if samples are correlated
• "Water-filling" construction to compute R(D)

Page 26: ECEC 453 Image Processing Architecture

Review: Block-Based Coding
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  - Image divided into 'small' blocks
  - Each block is decorrelated separately
• Block decorrelation performs almost as well as (better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks

Page 27: ECEC 453 Image Processing Architecture

What is the DCT?
• One-dimensional 8-point DCT: input x0, ..., x7; output y0, ..., y7

  y_k = (c(k)/2) Σ_{i=0..7} x_i cos((2i+1)kπ/16),  k = 0, 1, ..., 7
  c(k) = 1/√2 if k = 0, 1 otherwise

• One-dimensional inverse DCT: input y0, ..., y7; output x0, ..., x7

  x_k = Σ_{i=0..7} y_i (c(i)/2) cos((2k+1)iπ/16),  k = 0, 1, ..., 7

• Matrix form of the equations: x, y are column vectors

  y = Tx,  x = Tᵀy,  t_ki = (c(k)/2) cos((2i+1)kπ/16)
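The matrix form above is easy to verify numerically: T built from t_ki = (c(k)/2)·cos((2i+1)kπ/16) is orthogonal, so the inverse DCT is just multiplication by Tᵀ.

```python
import numpy as np

# 8-point DCT matrix from the slide: c(0) = 1/sqrt(2), c(k) = 1 otherwise
c = lambda k: 1 / np.sqrt(2) if k == 0 else 1.0
T = np.array([[c(k) / 2 * np.cos((2 * i + 1) * k * np.pi / 16)
               for i in range(8)] for k in range(8)])

x = np.arange(8.0)
y = T @ x            # forward DCT: y = Tx
xr = T.T @ y         # inverse DCT: x = T^T y recovers the input
```

Orthogonality (T·Tᵀ = I) is exactly what lets the decoder undo the transform without solving a linear system.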

Page 28: ECEC 453 Image Processing Architecture

• Forward 2D DCT. Input x_ij, i = 0, ..., 7, j = 0, ..., 7. Output y_kl, k = 0, ..., 7, l = 0, ..., 7

  y_kl = (c(k)c(l)/4) Σ_{i=0..7} Σ_{j=0..7} x_ij cos((2i+1)kπ/16) cos((2j+1)lπ/16)
  c(k) = 1/√2 if k = 0, 1 otherwise

• Matrix form, X, Y ~ 8x8 matrices with coefficients x_ij, y_kl

  Y = TXTᵀ,  X = TᵀYT,  t_ki = (c(k)/2) cos((2i+1)kπ/16)

• The 2D DCT is separable!

Two-Dimensional DCT
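The separability claim can be checked directly: Y = TXTᵀ equals a 1D DCT applied to every column followed by a 1D DCT applied to every row, and X = TᵀYT inverts it.

```python
import numpy as np

# Same 8-point DCT matrix as on the previous slide
c = lambda k: 1 / np.sqrt(2) if k == 0 else 1.0
T = np.array([[c(k) / 2 * np.cos((2 * i + 1) * k * np.pi / 16)
               for i in range(8)] for k in range(8)])

X = np.arange(64.0).reshape(8, 8)   # any 8x8 block
Y = T @ X @ T.T                     # 2D DCT in matrix form

Y_sep = X.copy()
for j in range(8):
    Y_sep[:, j] = T @ Y_sep[:, j]   # 1D DCT on each column ...
for i in range(8):
    Y_sep[i, :] = T @ Y_sep[i, :]   # ... then 1D DCT on each row

Xr = T.T @ Y @ T                    # inverse 2D DCT
```

Separability is why an 8x8 DCT costs 16 length-8 transforms rather than one dense 64x64 multiply.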

Page 29: ECEC 453 Image Processing Architecture

General DCT
• One dimension:

  y(k) = Σ_{i=0..N−1} t(k,i) x(i),  k = 0, 1, ..., N−1

  t(k,i) = 1/√N              if k = 0
  t(k,i) = √(2/N) cos((2i+1)kπ/(2N))  if k ≠ 0

• Two dimensions:

  y(k,l) = Σ_{i=0..N−1} Σ_{j=0..N−1} x(i,j) t(k,i) t(l,j)

Page 30: ECEC 453 Image Processing Architecture
