ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004: Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University
Image Processing Architecture, © 2001-2004 Oleh Tretiak

TRANSCRIPT

Page 1: ECEC 453 Image Processing Architecture

ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004: Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University

Page 2: ECEC 453 Image Processing Architecture

Quality-Rate Tradeoff
• Given: 512x512 picture, 8 bits per pixel
• Bit reduction
  - Fewer bits per pixel
  - Fewer pixels
  - Both
• Issues:
  - How do we measure compression?
    o Bits/pixel: does not work when we change the number of pixels
    o Total bits: valid, but hard to interpret
  - How do we measure quality?
    o RMS noise
    o Peak signal-to-noise ratio (PSNR) in dB
    o Subjective quality

Page 3: ECEC 453 Image Processing Architecture

Comparison, Bit and Pixel Reduction
[Chart: quality (vertical axis, 0-60) vs. total bits in image (0 to 2,000,000), comparing a "Subsample" curve and a "Drop bits" curve]

Page 4: ECEC 453 Image Processing Architecture

Quantizer Performance
• Questions:
  - How much error does the quantizer introduce (Distortion)?
  - How many bits are required for the quantized values (Rate)?
• Rate:
  - 1. No compression. If there are N possible quantizer output values, then it takes ceiling(log2 N) bits per sample.
  - 2(a). Compression. Compute the histogram of the quantizer output. Design a Huffman code for the histogram. Find the average length.
  - 2(b). Find the entropy of the quantizer distribution.
  - 2(c). Preprocess quantizer output, ....
• Distortion: Let x be the input to the quantizer, x* the de-quantized value. Quantization noise n = x* − x. Quantization noise power is D = Average(n²).

Page 5: ECEC 453 Image Processing Architecture

Quantizer: Practical Lossy Encoding
• Quantizer
  - Symbols: x is the input to the quantizer, q the output of the quantizer, S the quantizer step
  - Quantizer: q = round(x/S)
  - Dequantizer characteristic: x* = Sq
  - Typical noise power added by the quantizer-dequantizer combination: D = S²/12, noise standard deviation = sqrt(D) = 0.289S
  - Example: S = 8, D = 8²/12 = 5.3, rms quantization noise = sqrt(D) = 2.3. If the input is 8 bits, the max input is 255. There are 255/8 ≈ 32 quantizer output values. PSNR = 20 log10(255/2.3) = 40.8 dB
[Figure: quantizer characteristic (staircase, q vs. x, step S) and dequantizer characteristic (x* vs. q, step S)]
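The example on this slide can be checked numerically. Below is a minimal sketch: the helper names are mine; only q = round(x/S), x* = Sq, and D = S²/12 come from the slide.

```python
import numpy as np

def quantize(x, S):
    return np.round(x / S)        # quantizer: q = round(x/S)

def dequantize(q, S):
    return S * q                  # dequantizer: x* = S*q

rng = np.random.default_rng(0)
S = 8.0
x = rng.uniform(0, 255, size=1_000_000)    # 8-bit-range test signal
n = dequantize(quantize(x, S), S) - x      # quantization noise x* - x

D = np.mean(n ** 2)                        # noise power, close to S^2/12 = 5.33
psnr = 20 * np.log10(255 / np.sqrt(D))     # close to the slide's 40.8 dB
```

For a uniformly distributed input the measured noise power lands very close to the S²/12 model, which is why the slide treats it as the "typical" value.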

Page 6: ECEC 453 Image Processing Architecture

Rate-Distortion Theorem
• When long sequences (blocks) are encoded, it is possible to construct a coder-decoder pair that achieves the specified distortion whenever the bits per sample exceed R(D) + ε, for any ε > 0
• Formula: X ~ Gaussian random variable, Q = E[X²] ~ signal power
• D = E[(X−Y)²] ~ noise power

  p(x) = (1/√(2πQ)) exp(−x²/2Q)

  R(D) = (1/2) log2(Q/D)  if D ≤ Q
  R(D) = 0                if D > Q
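The Gaussian rate-distortion formula above is easy to turn into a small helper (the function name is illustrative, not from the slide):

```python
import numpy as np

# R(D) = 0.5*log2(Q/D) for D <= Q, else 0 (Gaussian source, power Q)
def rate_bits(Q, D):
    return 0.5 * np.log2(Q / D) if D <= Q else 0.0

Q = 100.0                 # signal power E[X^2]
r1 = rate_bits(Q, 25.0)   # quartering the distortion relative to Q costs 1 bit
r2 = rate_bits(Q, 200.0)  # D > Q: zero rate suffices (reconstruct everything as 0)
```

Note the 0.5 log2 factor: every extra bit per sample cuts the achievable distortion by a factor of 4.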

Page 7: ECEC 453 Image Processing Architecture

This Lecture
• Decorrelation and Bit Allocation
• Discrete Cosine Transform
• Video Coding

Page 8: ECEC 453 Image Processing Architecture

Coding Correlated Samples
• How to code correlated samples
  - Decorrelate
  - Code
• Methods for decorrelation
  - Prediction
  - Transformation
    o Block transform
    o Wavelet transform

Page 9: ECEC 453 Image Processing Architecture

Prediction Rules
• Simplest: previous value

  p_i = x̂_{i−1}
  x̂_i = p_i + q_i = x̂_{i−1} + q_i

• Two-dimensional prediction from neighbors:

  p_ij = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}

[Diagram: encoder loop in which the quantizer output q_i is formed from x_i minus the prediction p_i, with a delay feeding the prediction back; neighbor layout x̂_{i−1,j−1}, x̂_{i,j−1}, x̂_{i−1,j} around p_ij]
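A minimal previous-value DPCM loop matching x̂_i = p_i + S·q_i from the slide can be sketched as follows; the test signal and step size are my choices, and the key point is that the predictor runs on *reconstructed* values, so encoder and decoder stay in sync:

```python
import numpy as np

def dpcm(x, S):
    xhat = np.zeros(len(x))
    prev = 0.0                      # predictor state: p_i = xhat_{i-1}
    for i in range(len(x)):
        e = x[i] - prev             # prediction error
        q = np.round(e / S)         # quantize the *error*, not the sample
        xhat[i] = prev + S * q      # decoder-side reconstruction
        prev = xhat[i]              # predict from the reconstructed value
    return xhat

S = 4.0
x = np.cumsum(np.ones(100))         # slowly varying ramp signal
xhat = dpcm(x, S)                   # per-sample error stays within S/2
```

Because the quantizer sits inside the prediction loop, the reconstruction error is bounded by S/2 per sample and does not accumulate.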

Page 10: ECEC 453 Image Processing Architecture

General Predictive Coding
• General system: the encoder subtracts the predictor output p_i from x_i, quantizes the error e_i to e_i*, and feeds the reconstruction x̂_i back into the predictor; the decoder adds e_i* to its own prediction p_i
[Diagram: encoder (x_i minus p_i → e_i → Quantizer → e_i*, with Predictor in the feedback path); decoder (e_i* plus p_i from Predictor → x̂_i)]
• Example of a linear predictive image coder:

  p_ij = w1·x̂_{i−1,j} + w2·x̂_{i,j−1} + w3·x̂_{i−1,j−1}

[Diagram: neighbor layout x̂_{i−1,j−1}, x̂_{i,j−1}, x̂_{i−1,j} around p_ij]

Page 11: ECEC 453 Image Processing Architecture

Rate-Distortion Theory: Correlated Samples
• Given: x = (x1, x2, ..., xn), a sequence of Gaussian correlated samples
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ) that decorrelates the samples. This is called a Karhunen-Loeve transformation
• Perform lossy encoding of (y1, y2, ..., yn); get y* = (y1*, y2*, ..., yn*) after decoding
• Reconstruct: x* = A⁻¹y*
[Diagram: Signal → Pre-processor → Encoder → Bits (transmit/receive) → Decoder → Reconstructor → Signal]
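The Karhunen-Loeve step above can be sketched with an eigendecomposition of the sample covariance: the rows of A are eigenvectors, so y = Ax has a (numerically) diagonal covariance. The mixing matrix and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 5000
M = rng.normal(size=(n, n))
X = M @ rng.normal(size=(n, N))     # columns: correlated Gaussian samples x

C = np.cov(X)                       # sample covariance of x
_, eigvecs = np.linalg.eigh(C)      # C is symmetric: use eigh
A = eigvecs.T                       # orthogonal: A^{-1} = A^T
Y = A @ X                           # decorrelated samples y = Ax
Cy = np.cov(Y)                      # diagonal up to floating-point error
```

This also shows why the K-L transform is data-dependent: A is built from the covariance of this particular signal.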

Page 12: ECEC 453 Image Processing Architecture

Block-Based Coding
• Discrete Cosine Transform (DCT) is used instead of the K-L transform
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  - Image divided into 'small' blocks
  - Each block is decorrelated separately
• Block decorrelation performs almost as well as (better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks

Page 13: ECEC 453 Image Processing Architecture

Rate-Distortion Theory: Non-Uniform Random Variables
• Given (x1, x2, ..., xn), use an orthogonal transform to obtain (y1, y2, ..., yn).
• Sequence of independent Gaussian variables (y1, y2, ..., yn), Var[yi] = Qi.
• Distortion allocation: allocate distortion Di to the variable with variance Qi
• Rate (bits) for the i-th variable: Ri = max[0.5 log2(Qi/Di), 0]
• Total distortion: D = Σ_{i=1..n} Di
• Total rate (bits): R = Σ_{i=1..n} Ri
• We specify R. What values of Di give the minimum total distortion D?

Page 14: ECEC 453 Image Processing Architecture

Bit Allocation Solution
[Figure: bar chart of variances Q1 ... Q16 with horizontal "water level" q; Di is the part of each bar below the level]
• Implicit solution (water-filling construction)
• Choose Q (parameter)
• Di = min(Qi, Q)
  - If Qi > Q then Di = Q, else Di = Qi
• Ri = max[0.5 log2(Qi/Di), 0]
  - If Qi > Q then Ri = 0.5 log2(Qi/Q), else Ri = 0.
• Find the value of Q that gives the specified R
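The construction above can be sketched directly. The slide gives the implicit rules Di = min(Qi, Q) and Ri = max(0.5·log2(Qi/Q), 0); the bisection search for the level Q that meets a target rate is my addition, not from the slide.

```python
import numpy as np

def waterfill(variances, R_target, iters=200):
    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        Q = 0.5 * (lo + hi)         # candidate water level
        R = sum(max(0.5 * np.log2(Qi / Q), 0.0) for Qi in variances)
        if R > R_target:
            lo = Q                  # too many bits: raise the water level
        else:
            hi = Q
    D = [min(Qi, Q) for Qi in variances]
    Rb = [max(0.5 * np.log2(Qi / Q), 0.0) for Qi in variances]
    return Q, D, Rb

Q, D, Rb = waterfill([16.0, 4.0, 1.0, 0.25], R_target=2.0)
# level Q -> 2, distortions D -> [2, 2, 1, 0.25], rates Rb -> [1.5, 0.5, 0, 0]
```

Variables with Qi below the water level get zero bits; their full variance is simply accepted as distortion.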

Page 15: ECEC 453 Image Processing Architecture


Page 16: ECEC 453 Image Processing Architecture

Wavelet Transform
• Filterbank and wavelets
• 2D wavelets
• Wavelet pyramid

Page 17: ECEC 453 Image Processing Architecture

Filterbank and Wavelets
• Put signal (sequence) through two filters
  - Low frequencies
  - High frequencies
• Downsample both by a factor of 2
• Do it in such a way that the original signal can be reconstructed!
[Diagram: analysis bank x(i) → L (l(k)) and H (h(k)), each downsampled by 2 (100 samples → 50 + 50); synthesis filters LR, HR and an adder reconstruct x(i)]
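The Haar filters are the simplest instance of the two-band filterbank above: one low-pass and one high-pass branch, downsampling by 2, and exact reconstruction. This sketch (helper names mine) shows the 100 → 50 + 50 split and the perfect-reconstruction property:

```python
import numpy as np

def analysis(x):
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # L branch, already downsampled
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # H branch, already downsampled
    return lo, hi

def synthesis(lo, hi):
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.arange(8.0)            # 8 samples -> two bands of 4 each
lo, hi = analysis(x)
xr = synthesis(lo, hi)        # reconstructs x exactly
```

Longer filters (e.g. Daubechies) keep the same structure but need overlapping taps; Haar makes the downsample-and-reconstruct idea visible in a few lines.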

Page 18: ECEC 453 Image Processing Architecture

Filterbank Pyramid
[Diagram: cascade of L/H splits applied repeatedly to the low band: x(i) (1000 samples) → 500 → 250 → 125 + 125]

Page 19: ECEC 453 Image Processing Architecture

2D Wavelets
• Apply wavelet processing along the rows of the picture
• Apply wavelet processing along the columns of the picture
• Pyramid processing
[Diagram: 2D subband layout with horizontal/vertical band combinations (LHLV, LHHV, HHLV, HHHV), the LHLV band split again at the next level]

Page 20: ECEC 453 Image Processing Architecture

Lena: Top Level, Next Level
[Figure: top two levels of the Lena wavelet pyramid, with per-band values annotated: 48.81, 15.45, 9.23, 6.48, 2.52, 1.01, 0.37]

Page 21: ECEC 453 Image Processing Architecture


Lena, more levels

Page 22: ECEC 453 Image Processing Architecture

Decorrelation of Images
• x = (x1, x2, ..., xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  - Images are not Gaussian processes
  - The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  - Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  - Computationally efficient
  - Almost as good as the K-L transformation

Page 23: ECEC 453 Image Processing Architecture

DPCM
• Simple to implement (low complexity)
  - Prediction: 3 multiplications and 2 additions
  - Estimation: 1 addition
  - Encoding: 1 addition + quantization
• Performance for 2-D coding not as good as block quantization
  - In theory, for a large past history the performance (rate-distortion) should be as good as other linear methods, but in that case there is no computational advantage
• Bottom line: useful when complexity is limited
• Important idea: lossy predictive encoding.

Page 24: ECEC 453 Image Processing Architecture

Review: Image Decorrelation
• x = (x1, x2, ..., xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ..., yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  - Images are not Gaussian processes
  - The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  - Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  - Computationally efficient
  - Almost as good as the K-L transformation

Page 25: ECEC 453 Image Processing Architecture

Rate-Distortion: 1D vs. 2D Coding
• Theory on the tradeoff between distortion and the least number of bits
• Interesting tradeoff only if samples are correlated
• "Water-filling" construction to compute R(D)

Page 26: ECEC 453 Image Processing Architecture

Review: Block-Based Coding
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  - Image divided into 'small' blocks
  - Each block is decorrelated separately
• Block decorrelation performs almost as well as (better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks

Page 27: ECEC 453 Image Processing Architecture

What is the DCT?
• One-dimensional 8-point DCT: input x0, ..., x7; output y0, ..., y7

  y_k = (c(k)/2) Σ_{i=0..7} x_i cos((2i+1)kπ/16),  k = 0, 1, ..., 7
  c(k) = 1/√2 if k = 0, 1 otherwise

• One-dimensional inverse DCT: input y0, ..., y7; output x0, ..., x7

  x_k = Σ_{i=0..7} y_i (c(i)/2) cos((2k+1)iπ/16),  k = 0, 1, ..., 7

• Matrix form of the equations: x, y are column vectors

  y = Tx,  x = Tᵀy,  t_ki = (c(k)/2) cos((2i+1)kπ/16)
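The matrix form above is easy to verify numerically: T built from t_ki = (c(k)/2)·cos((2i+1)kπ/16) is orthogonal, so the inverse DCT is just multiplication by Tᵀ.

```python
import numpy as np

# 8-point DCT matrix from the slide: c(0) = 1/sqrt(2), c(k) = 1 otherwise
c = lambda k: 1 / np.sqrt(2) if k == 0 else 1.0
T = np.array([[c(k) / 2 * np.cos((2 * i + 1) * k * np.pi / 16)
               for i in range(8)] for k in range(8)])

x = np.arange(8.0)
y = T @ x            # forward DCT: y = Tx
xr = T.T @ y         # inverse DCT: x = T^T y recovers the input
```

Orthogonality (T·Tᵀ = I) is exactly what lets the decoder undo the transform without solving a linear system.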

Page 28: ECEC 453 Image Processing Architecture

• Forward 2D DCT. Input x_ij, i = 0, ..., 7, j = 0, ..., 7. Output y_kl, k = 0, ..., 7, l = 0, ..., 7

  y_kl = (c(k)c(l)/4) Σ_{i=0..7} Σ_{j=0..7} x_ij cos((2i+1)kπ/16) cos((2j+1)lπ/16)
  c(k) = 1/√2 if k = 0, 1 otherwise

• Matrix form, X, Y ~ 8x8 matrices with coefficients x_ij, y_kl

  Y = TXTᵀ,  X = TᵀYT,  t_ki = (c(k)/2) cos((2i+1)kπ/16)

• The 2D DCT is separable!

Two-Dimensional DCT
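The separability claim can be checked directly: Y = TXTᵀ equals a 1D DCT applied to every column followed by a 1D DCT applied to every row, and X = TᵀYT inverts it.

```python
import numpy as np

# Same 8-point DCT matrix as on the previous slide
c = lambda k: 1 / np.sqrt(2) if k == 0 else 1.0
T = np.array([[c(k) / 2 * np.cos((2 * i + 1) * k * np.pi / 16)
               for i in range(8)] for k in range(8)])

X = np.arange(64.0).reshape(8, 8)   # any 8x8 block
Y = T @ X @ T.T                     # 2D DCT in matrix form

Y_sep = X.copy()
for j in range(8):
    Y_sep[:, j] = T @ Y_sep[:, j]   # 1D DCT on each column ...
for i in range(8):
    Y_sep[i, :] = T @ Y_sep[i, :]   # ... then 1D DCT on each row

Xr = T.T @ Y @ T                    # inverse 2D DCT
```

Separability is why an 8x8 DCT costs 16 length-8 transforms rather than one dense 64x64 multiply.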

Page 29: ECEC 453 Image Processing Architecture

General DCT
• One dimension:

  y(k) = Σ_{i=0..N−1} t(k,i) x(i),  k = 0, 1, ..., N−1

  t(k,i) = 1/√N              if k = 0
  t(k,i) = √(2/N) cos((2i+1)kπ/(2N))  if k ≠ 0

• Two dimensions:

  y(k,l) = Σ_{i=0..N−1} Σ_{j=0..N−1} x(i,j) t(k,i) t(l,j)

Page 30: ECEC 453 Image Processing Architecture
