Digital Image Representation - RWTH Aachen University

Lehrstuhl für Informatik 4 (Chair of Computer Science 4)

Communication and Distributed Systems

Chapter 2.2: Images and Graphics

2.2: Images and Graphics
• Digital image representation
• Image formats and color models
• JPEG, JPEG2000
• Image synthesis and graphics systems
• Image analysis

Chapter 2: Basics
• Audio Technology
• Images and Graphics
• Video and Animation

Chapter 3: Multimedia Systems - Communication Aspects and Services

Chapter 4: Multimedia Systems – Storage Aspects

Chapter 5: Multimedia Usage




Digital Image Representation

A digital image is a spatial representation of an object (a 2D or 3D scene or another image, real or virtual).

Definition of "digital image": Let I, J, K ⊆ Z be finite intervals. Let G ⊂ N0 with |G| < ∞ be the set of grey-scale levels / the color depth (the intensity values of a picture element, i.e. a pixel) of the image.

(1) A 2D image is a function f: I × J → G
(2) A 3D image is a function f: I × J × K → G
(3) If G = {0,1}, the function is a binary (or bit) image, otherwise it is a pixel image

The resolution depends on the size of I and J (and K) and describes the number of pixels per row and per column.

Example

To display a 525-line television picture (NTSC) without noticeable degradation with a Video Graphics Array (VGA) video controller, 640×480 pixels and 256 discrete grey levels give an array of 307,200 8-bit numbers and a total of 2,457,600 bits.
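The arithmetic of the VGA example can be checked with a few lines of Python. This is only a sketch; the function name `raw_image_bits` is an assumption made for illustration.

```python
def raw_image_bits(width, height, bits_per_pixel):
    """Number of bits needed to store an uncompressed raster image."""
    return width * height * bits_per_pixel

# 256 grey levels need 8 bits per pixel
pixels = 640 * 480
bits = raw_image_bits(640, 480, 8)
print(pixels, bits)
```

The same function reproduces the size of any other uncompressed image, given its resolution and color depth.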




Image Representation

An Image Capturing Format is specified by:

spatial resolution (pixel × pixel) and color encoding (bits per pixel)

Example: captured image of a DVD video with 4:3 picture size:

• spatial resolution: 768 × 576 pixels
• color encoding: 1-bit (binary image), 8-bit (color or grayscale), 24-bit (color, RGB)

An Image Storing Format is a 2-dimensional array of values representing the image as a bitmap or pixmap, respectively. This is also called raster graphics. Each field of a bitmap holds a single binary digit; each field of a pixmap may hold one of the following:

• 3 numbers representing the intensities of the red, green, and blue components of the color
• 3 numbers representing indices into tables of red, green and blue intensities
• A single number used as an index into a table of color triples
• A single number used as an index into any other data structure that represents a color / color system

Further properties can be assigned to the whole image: width, height, depth, version, etc.




Color Models

Why store values for red, green, and blue?

Color perception by the human brain is possible through the additive composition of red, green and blue light (RGB system). The relative intensities of the RGB values are transmitted to the monitor, where they are reproduced at each point in time.

On a computer monitor, each pixel is displayed as an overlay of those three primary colors with different intensities; in this way, a wide range of colors can be reproduced.

But there is another possible color model: CMYK

When printing an image, other color components are used: cyan, magenta, yellow, and key (black). Together they can also reproduce a wide range of colors. Thus, many image processing programs and some image storing formats support this model as well.




Another possibility is to represent the color information in the YUV system, where

• Y is the brightness (or luminance) information
• U and V are color difference signals (chrominance)
• Y, U and V are functions of R, G and B

Why? The human eye is more sensitive to brightness than to chrominance. Therefore, separate the brightness information from the color information and code the more important luminance with more bits than the chrominance. This can save bits in the representation format.




Usual scheme:

• Y = 0.30 · R + 0.59 · G + 0.11 · B (the color sensitivity of the human eye is considered)
• U = c1 · (B - Y); V = c2 · (R - Y)
• c1, c2 = constants reflecting perception aspects of the human eye and the human brain

Possible coding:

• YUV signal
  – Y = 0.30 · R + 0.59 · G + 0.11 · B
  – U = (B - Y) · 0.493 = -0.148 · R - 0.29 · G + 0.439 · B
  – V = (R - Y) · 0.877 = 0.614 · R - 0.517 · G - 0.096 · B
• This is a system of 3 equations for determining Y, U, V from R, G, B or for recalculating R, G, B from Y, U, V
• The resolution of Y is more important than the resolution of U and V
• Spend more bits on Y than on U and V (Y : U : V = 4 : 2 : 2)
• The weighting factors in the calculation of the Y signal compensate for the imbalance in the color perception of the human eye
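The three YUV equations can be written down directly and solved for R, G, B again. The following is a minimal sketch using the constants 0.493 and 0.877 from the coding above; the function names are assumptions chosen for illustration.

```python
def rgb_to_yuv(r, g, b):
    """Forward conversion with the weights given in the slides."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = 0.493 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Solve the same three equations for R, G and B."""
    b = y + u / 0.493
    r = y + v / 0.877
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b

y, u, v = rgb_to_yuv(200, 120, 40)
r, g, b = yuv_to_rgb(y, u, v)
print((y, u, v), (r, g, b))
```

For a grey pixel (R = G = B) both color difference signals vanish, which is why a grey-scale image needs only the Y component.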




Image Formats

Lots of different image formats are in use today, e.g.

• GIF (Graphics Interchange Format)
  Compressed with a basic lossless compression technique (LZW) to 20 – 25% of the original picture size without loss. Supports a palette of up to 256 colors, each entry being a 24-bit RGB value.
• BMP (Bitmap)
  Device-independent representation of an image: uses the RGB color model, without compression. Color depth up to 24 bits, with the additional option of specifying a color table to use.
• TIFF (Tagged Image File Format)
  Supports grey levels, RGB, and the CMYK color model. Also supports lots of different compression methods. Additionally contains a descriptive part with properties a display should provide to show the image.
• PostScript
  Images are described without reference to special properties such as resolution. Nice feature for printers, but hard to include in documents where you have to know the image size.
• JPEG (Joint Photographic Experts Group)
  Lots of possible compression variants, mostly lossy.




Why Compression?

High-resolution image: e.g. 1024×768 pixels, 24-bit color depth
→ 1024 · 768 · 24 = 18,874,368 bits

Image formats like GIF:

• Lossless compression (entropy encoding) for reducing the amount of data while keeping image quality

JPEG:

• Lossy compression: remove some image details to achieve a higher compression rate by suppressing higher frequencies
• Combined with lossless techniques
• Trade-off between file size and quality
• JPEG is a joint standard of ISO and ITU-T
  – In June 1987, an adaptive transformation coding technique based on DCT was adopted for JPEG
  – In 1992, JPEG became an ISO international standard




JPEG

Implementation
• Independent of image size
• Applicable to any image and pixel aspect ratio

Color representation
• JPEG applies to color and grey-scale still images

Image content
• Of any complexity, with any statistical characteristics

Properties of JPEG
• State of the art regarding compression factor and image quality
• Runs on as many available standard processors as possible
• Compression mechanisms are available as software-only packages or together with specific hardware support; the use of specialized hardware should speed up image decompression
• Encoded data stream has a fixed interchange format
• Fast coding is also used for video sequences: Motion JPEG




How Could We Compress?

Entropy encoding
• Data stream is considered to be a simple digital sequence without semantics
• Lossless coding, decompression process regenerates the data completely
• Used regardless of the media's specific characteristics
• Examples: Run-length encoding, Huffman encoding, Arithmetic encoding

Source encoding
• Semantics of the data are taken into account
• Lossy coding (encoded data are not identical with original data)
• Degree of compression depends on the data contents
• Example: Discrete Cosine Transformation (DCT) as a transformation of the spatial domain into the two-dimensional frequency domain

Hybrid encoding
• Used by most multimedia systems
• Combination of entropy and source encoding
• Examples: JPEG, MPEG, H.261




Compression Steps in JPEG

[Figure: JPEG compression pipeline. Uncompressed image → Image Preparation (Pixel; Block, MCU) → Image Processing (Predictor; DCT) → Quantization (approximation of real numbers by rational numbers) → Entropy Encoding (Run-length; Huffman; Arithmetic) → Compressed image]

MCU: Minimum Coded Unit
DCT: Discrete Cosine Transform





Image Preparation
• Analog-to-digital conversion
• Image division into blocks of N×N pixels
• Suitable structuring and ordering of image information

Image Processing - Source Encoding
• Transformation from the spatial domain to the frequency domain using DCT
• In principle no compression itself, but computation of new coefficients as input for the compression process

Quantization
• Mapping of real numbers to rational numbers (approximation)
• A certain loss of precision will in general be unavoidable

Entropy Encoding
• Lossless compression of a sequential digital data stream




The Principle

• Without "quantization", the encoding gain would be very poor (or nonexistent)
• Transformation and retransformation must be inverse to each other
• Task of the transformation: produce a picture representation which can be encoded with a high reduction gain

[Figure: Encoder: Original → Transformation → Quantization (uses the quantization table; gets rid of "invisible details") → Encode (Huffman, run-length encoding) → JPEG picture. The opposite direction: JPEG picture → JPEG decoder → Dequantization (the "details" cannot be reconstructed) → Retransformation → "Original"]




Variants of Image Compression

JPEG is not a single format; one of several modes can be chosen:

Lossy sequential DCT-based mode (baseline process)
• Must be supported by every JPEG implementation
• Block, MCU, FDCT, Run-length, Huffman

Expanded lossy DCT-based mode
• Enhancement to the baseline process by adding progressive encoding

Lossless mode
• Low compression ratio → "perfect" reconstruction of the original image
• No DCT, but differential encoding by prediction

Hierarchical mode
• Accommodates images of different resolutions
• Selects its algorithms from the three other modes




First Step: Image Preparation

General image model

• Independence from image parameters like size and pixel ratio
• Description of most of the well-known picture representations
• Source picture consists of 1 to 255 components (planes) Ci
• Components may be assigned to RGB or YUV values
• For example, C1 may be assigned to 'red color information'
• Each component Ci can have a different number of superpixels Xi, Yi
  (A superpixel is a rectangle of pixels which all have the same value)

[Figure: components C1, C2, C3, ..., CN drawn as stacked planes; each component Ci is a grid of Xi × Yi superpixels]




Picture Preparation - Components

Resolution of the components may be different:

[Figure: three components with samples A1 ... AN (size X1 × Y1), B1 ... BM (size X2 × Y2) and D1 ... DM (size X3 × Y3), where X1 = 2 · X2 = 2 · X3 and Y1 = Y2 = Y3]

• A grey-scale image consists (in most cases) of a single component
• RGB color representation has three components with equal resolution
• YUV color image processing uses Y1 = 4 · Y2 = 4 · Y3 and X1 = 4 · X2 = 4 · X3




Image Preparation - Dimensions

Dimensions of a compressed image are defined by

• X (maximum of all Xi),
• Y (maximum of all Yi),
• Hi and Vi (relative horizontal and vertical sampling ratios for each component i) with

$$H_i = \frac{X_i}{\min_j X_j} \quad \text{and} \quad V_i = \frac{Y_i}{\min_j Y_j}$$

• Hi and Vi must be integers in the range of 1 to 4. This restriction is needed for the interleaving of components

Example:

Y = 4 pixels, X = 6 pixels

Component   Xi   Yi   Hi   Vi
C1          6    4    2    2
C2          6    2    2    1
C3          3    2    1    1

[Figure: the three components C1 (6 × 4), C2 (6 × 2) and C3 (3 × 2) drawn as grids of samples]
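The sampling ratios of the example follow mechanically from the component sizes. The sketch below applies the two formulas for Hi and Vi; the function name `sampling_factors` and the error handling are assumptions for illustration.

```python
def sampling_factors(sizes):
    """sizes: list of (Xi, Yi) per component -> list of (Hi, Vi)."""
    min_x = min(x for x, _ in sizes)
    min_y = min(y for _, y in sizes)
    factors = []
    for x, y in sizes:
        if x % min_x or y % min_y:
            raise ValueError("sampling ratios must be integers")
        h, v = x // min_x, y // min_y
        if not (1 <= h <= 4 and 1 <= v <= 4):
            raise ValueError("Hi and Vi must lie in the range 1 to 4")
        factors.append((h, v))
    return factors

# Components C1, C2, C3 of the example
example = sampling_factors([(6, 4), (6, 2), (3, 2)])
print(example)
```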




Image Preparation – Data Ordering

An image is divided into several components which can be processed one by one. But: how to prepare a component for processing?

• Observation for most parts of an image: not so much difference between the values in a rectangle of N×N pixels

• For further processing: divide each component of an image into blocks of N×N pixels

• Thus, the image is divided into data units (blocks):
  – Lossless mode uses one pixel as one data unit
  – Lossy mode uses blocks of 8×8 pixels (with 8 or 12 bits per pixel)




Image Preparation - Data Ordering

Non-interleaved data ordering:
• The easiest but not the most convenient sequence of data processing
• Data units are processed component by component
• For one component, the processing order is left-to-right and top-to-bottom

[Figure: data units of one component, scanned row by row from left to right and from top to bottom]

With the non-interleaved technique, an RGB-encoded image is processed as follows:
• First the red component only
• Then the blue component, followed by the green component

This is (for speed reasons) less suitable than data unit interleaving




Interleaved Data Ordering

Often more suitable: interleave data units

• Interleaving means: don't process all blocks component by component, but mix data units from all components
• Interleaved data units of different components:
  – Combination to Minimum Coded Units (MCUs)
• If all components have the same resolution
  → MCU consists of one data unit for each component
• If components have different resolutions
  1. For each component, regions of data units are determined; data units in one region are ordered left-to-right and top-to-bottom
  2. Each component consists of the same number of regions
  3. MCU consists of one region in each component
• Up to 4 components can be encoded in interleaved mode (according to JPEG)
• Each MCU consists of at most ten data units




Image Preparation - MCUs

MCU example: four components C1, C2, C3, C4

C1: H1 = 2, V1 = 2 (data units aij; grid with columns 0 ... 5 and rows 0 ... 3)
C2: H2 = 2, V2 = 1 (data units bij)
C3: H3 = 1, V3 = 2 (data units cij)
C4: H4 = 1, V4 = 1 (data units dij)

[Figure: data-unit grids of the four components; the region of each component that belongs to the first MCU is marked (a00 a01 a10 a11, b00 b01, c00 c10, d00)]

MCUs: 9 data units per MCU

MCU1 = a00 a01 a10 a11 b00 b01 c00 c10 d00
MCU2 = a02 a03 a12 a13 b02 b03 c01 c11 d01
MCU3 = a04 a05 a14 a15 b04 b05 c02 c12 d02
MCU4 = a20 a21 a30 a31 b10 b11 c20 c30 d10

$$H_i = \frac{x_i}{\min_j x_j}, \qquad V_i = \frac{y_i}{\min_j y_j}$$

where
aij: data units of C1
bij: data units of C2
cij: data units of C3
dij: data units of C4
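The MCU listing above can be generated from the sampling factors alone. The following sketch assumes that data units are named by a component letter followed by row and column index, as in the example; `build_mcus` is a hypothetical helper, not part of any JPEG library.

```python
def build_mcus(components, mcus_per_row, mcu_rows):
    """components: list of (letter, (H, V)); returns one list of data units per MCU."""
    mcus = []
    for mr in range(mcu_rows):
        for mc in range(mcus_per_row):
            units = []
            for name, (h, v) in components:
                # Each component contributes a region of V rows and H columns,
                # ordered left-to-right and top-to-bottom.
                for r in range(v):
                    for c in range(h):
                        units.append(f"{name}{mr * v + r}{mc * h + c}")
            mcus.append(units)
    return mcus

components = [("a", (2, 2)), ("b", (2, 1)), ("c", (1, 2)), ("d", (1, 1))]
mcus = build_mcus(components, mcus_per_row=3, mcu_rows=2)
for i, units in enumerate(mcus[:4], start=1):
    print(f"MCU{i} =", " ".join(units))
```

With C1 being 6 data units wide and 4 high, three MCUs fit into one row, so MCU4 starts the second row of MCUs.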






• Result of image preparation: sequence of 8×8 blocks, the order is defined by MCUs

• The samples are encoded with 8 bit/pixel

• Next step: image processing by source encoding






Source Encoding – Transformation

Encoding by transformation: Data are transformed into another mathematical domain, which is more suitable for compression.

• The inverse transformation must exist and must be easy to calculate
• Most widely known example: Fourier transformation

$$F_{uv} = \sum_{x=0}^{m-1} \sum_{y=0}^{n-1} f_{xy} \, e^{-\frac{2\pi i}{m} u x} \, e^{-\frac{2\pi i}{n} v y}$$

• The parameters m and n indicate the 'granularity'

Most effective transformations for image compression:

• Discrete Cosine Transformation (DCT)

$$F_{uv} = \delta_{nm} \sum_{x=0}^{m-1} \sum_{y=0}^{n-1} f_{xy} \cos\frac{\pi u (2x+1)}{2m} \cos\frac{\pi v (2y+1)}{2n}$$

• Fast Fourier Transformation (FFT)




Discrete Cosine Transformation

Let $f_{xy}$ be a pixel (x, y) in the original picture ($0 \le x \le N-1$; $0 \le y \le N-1$).

$$F_{uv} := \gamma_N \cdot c_u \cdot c_v \cdot \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f_{xy} \cdot \cos\left(\frac{(2x+1) \cdot u \cdot \pi}{2N}\right) \cdot \cos\left(\frac{(2y+1) \cdot v \cdot \pi}{2N}\right), \quad u, v \in \{0, \dots, N-1\},$$

$$c_u = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & u > 0 \end{cases}$$

fxy ↔ space domain (i.e. "geometric")

Fuv ↔ "frequency domain" (indicates how fast the information changes inside the rectangle)

F00 is the lowest frequency in both directions, i.e. a measure of the average pixel value

Fuv with small total frequency (i.e. u+v small) are (in general) larger than Fuv with large u+v




Retransformation: Inverse Cosine Transformation

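$$f_{xy} = \delta_N \cdot \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} c_u \cdot c_v \cdot F_{uv} \cdot \cos\left(\frac{(2x+1) \cdot u \cdot \pi}{2N}\right) \cdot \cos\left(\frac{(2y+1) \cdot v \cdot \pi}{2N}\right)$$

Simplest example (just for demonstration): Let $f_{xy} = f$ = constant

$$\Rightarrow F_{00} = \gamma_N \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \cdot \sum_x \sum_y f \cdot \cos(0) \cdot \cos(0) = \gamma_N \cdot \frac{N^2}{2} \cdot f, \qquad \text{all other } F_{uv} = 0$$

Retransformation of this example:

$$f_{00} = \delta_N \sum_u \sum_v c_u \cdot c_v \cdot F_{uv} \cdot \cos(\dots) \cdot \cos(\dots) = \delta_N \cdot c_0 \cdot c_0 \cdot F_{00} \cdot 1 \cdot 1 = \delta_N \cdot \frac{1}{2} \cdot \gamma_N \cdot \frac{N^2}{2} \cdot f \overset{!}{=} f$$

$$\text{If } \delta_N = \gamma_N \text{ then } \gamma_N = \frac{2}{N}$$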
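The transformation pair can be tried out numerically. The sketch below is a direct (slow) implementation of the two double sums, assuming the normalization γ_N = δ_N = 2/N derived above; the function names `dct2` and `idct2` are chosen here for illustration.

```python
import math

def _c(k):
    """Normalization factor c_u / c_v of the DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _cos(pos, freq, n):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * n))

def dct2(f):
    """Forward DCT of an N x N block f[x][y] with gamma_N = 2/N."""
    n = len(f)
    return [[(2 / n) * _c(u) * _c(v) * sum(
                f[x][y] * _cos(x, u, n) * _cos(y, v, n)
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2(F):
    """Inverse DCT with delta_N = 2/N."""
    n = len(F)
    return [[(2 / n) * sum(
                _c(u) * _c(v) * F[u][v] * _cos(x, u, n) * _cos(y, v, n)
                for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

# Constant block: only F00 is non-zero and equals gamma_N * N^2 / 2 * f = N * f
constant = [[5.0] * 4 for _ in range(4)]
F_const = dct2(constant)

# Arbitrary block: retransformation restores the original values
block = [[1.0, 2.0], [3.0, 5.0]]
restored = idct2(dct2(block))
```

For the constant 4×4 block with f = 5 the only non-zero coefficient is F00 = 20, as predicted by the demonstration example.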




Example

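• N = 8 (standard):

$$F_{uv} = \frac{1}{4} \cdot c_u \cdot c_v \cdot \sum \sum \dots$$

• N = 2:

$$F_{uv} = \frac{2}{2} \cdot c_u \cdot c_v \cdot \sum_{x=0}^{1} \sum_{y=0}^{1} f_{xy} \cdot \cos\left(\frac{(2x+1) \cdot u \cdot \pi}{4}\right) \cdot \cos\left(\frac{(2y+1) \cdot v \cdot \pi}{4}\right)$$

$$= c_u \cdot c_v \cdot \left[ f_{00} \cos\frac{u\pi}{4} \cos\frac{v\pi}{4} + f_{01} \cos\frac{u\pi}{4} \cos\frac{3v\pi}{4} + f_{10} \cos\frac{3u\pi}{4} \cos\frac{v\pi}{4} + f_{11} \cos\frac{3u\pi}{4} \cos\frac{3v\pi}{4} \right]$$

$$F_{00} = \frac{1}{2} \cdot \left[ f_{00} + f_{01} + f_{10} + f_{11} \right], \quad \text{i.e. } \approx 2 \cdot f \text{ if } f_{xy} \approx f$$

$$F_{01} = \frac{1}{\sqrt{2}} \left[ f_{00} \cos\frac{\pi}{4} + f_{01} \cos\frac{3\pi}{4} + f_{10} \cos\frac{\pi}{4} + f_{11} \cos\frac{3\pi}{4} \right] = \frac{1}{2} \left[ f_{00} - f_{01} + f_{10} - f_{11} \right]$$

with $\cos\frac{\pi}{4} = \frac{1}{2}\sqrt{2}$ and $\cos\frac{3\pi}{4} = -\frac{1}{2}\sqrt{2}$

2 positive + 2 negative terms, i.e. if fxy ≈ f ⇒ F01 ≈ 0

Transformed values can be much smaller than the original values.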




Baseline Process - Image Processing

First step of image processing:
• Samples are encoded with 8 bits/pixel; each pixel is an integer in the range [0, 255]
• Pixel values are shifted to the range [-128, 127] (two's complement representation)
• Data units of 8 × 8 pixel values are defined by fxy ∈ [-128, 127], where x, y are in the range [0, 7]
• Each value is transformed using the Forward DCT (FDCT):

$$F_{uv} = \frac{1}{4} c_u c_v \sum_{x=0}^{7} \sum_{y=0}^{7} f_{xy} \cos\frac{(2x+1) u \pi}{16} \cos\frac{(2y+1) v \pi}{16}$$

$$\text{where } c_u, c_v = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} \quad \text{and } u, v \in [0, 7]$$

• Cosine expressions are independent of fxy → fast calculation is possible
• Result: From 64 coefficients fxy we get 64 coefficients Fuv in the frequency domain

How can DCT be useful for JPEG? Fuv for larger values of u and v are often very small!




Meaning of Coefficients

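[Figure: an 8×8 block is transformed to frequencies, i.e. written as a weighted sum of basis patterns (F00, F01, F10, F12, F23, ...); in the coefficient matrix the frequencies increase from low to high both horizontally and vertically]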




Baseline Process - Image Processing

• Coefficient F00: DC-coefficient
  – Corresponds to the lowest frequency in both dimensions
  – Determines the fundamental color of the data unit of 64 pixels
  – Normally the values for F00 are very similar in neighboring blocks
• Other coefficients (Fuv for u+v > 0): AC-coefficients
  – Non-zero frequency in one or both dimensions
• Reconstruction of the image: Inverse DCT (IDCT)
  If FDCT and IDCT could be calculated with full precision → DCT would be lossless
• In practice: precision is restricted (real numbers!), thus DCT is lossy
  → different implementations of a JPEG decoder may produce different images
• Reason for the transformation:
  – Experience shows that many AC-coefficients have a value of almost zero, i.e. they are zero after quantization → entropy encoding may lead to significant data reduction.






• Result of image processing: 8×8 blocks of DC/AC coefficients

• Till now, no compression is done – this task is enabled by quantization






Quantization

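How to enforce that even more values are zero?
Answer: by quantization.

Divide $F_{uv}$ by Quantum$_{uv}$ = $Q_{uv}$ and take the nearest integer as the result:

$$F^Q_{uv} = \left[ F_{uv} / Q_{uv} \right]$$

Dequantization:

$$F^*_{uv} = F^Q_{uv} \cdot Q_{uv} \quad \text{(only an approximation of } F_{uv}\text{)}$$

Example: N = 8; quantization step = 2, $Q_{uv} = 2 \cdot (u+v) + 3$

$$Q_{uv} = \begin{bmatrix} 3 & 5 & 7 & 9 & \cdots & 17 \\ 5 & 7 & 9 & & & \vdots \\ 7 & 9 & & & & \\ 9 & & & & & \\ \vdots & & & & & \\ 17 & \cdots & & & & 31 \end{bmatrix} \quad \text{(rows and columns indexed } 0, 1, \dots, N-1\text{)}$$

Observation: Fuv → smaller values; most values are zero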




Baseline Process - Quantization

• Quantization process:

– Divide the DCT-coefficient value Fuv by an integer number Quv and round the result to the nearest integer

• Quantization of all DCT-coefficients results in a lossy transformation: some image details given by higher frequencies are cut off.

• The JPEG application provides a table with 64 entries, each used for quantization of one DCT-coefficient → each coefficient can be adjusted separately

• A high compression factor is achievable at the expense of image quality
  → large quantization numbers: high data reduction, but information loss increases

• No default values for quantization tables are specified in JPEG
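Quantization and dequantization are simple element-wise operations. The sketch below uses the table Quv = 2·(u+v)+3 and the first row of FDCT output values from the worked example in these slides; the function names are assumptions, and "nearest integer" is implemented as rounding half away from zero.

```python
import math

def quantize(F, Q):
    """Divide by the quantum and round to the nearest integer."""
    return int(math.copysign(math.floor(abs(F) / Q + 0.5), F))

def dequantize(FQ, Q):
    """Multiply back; the result only approximates the original F."""
    return FQ * Q

# Quantization table of the example: Q_uv = 2 * (u + v) + 3
Q = [[2 * (u + v) + 3 for v in range(8)] for u in range(8)]

first_row = [186, -18, 15, -9, 23, -9, -14, -19]
quantized = [quantize(F, Q[0][v]) for v, F in enumerate(first_row)]
restored = [dequantize(FQ, Q[0][v]) for v, FQ in enumerate(quantized)]
print(quantized)
print(restored)
```

The second coefficient illustrates the loss: -18 is quantized to -4 and comes back as -20.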




Example

1. Input values from exemplary grey-scale image

140 144 147 140 140 155 179 175
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136

First: subtract 128 from each element

Then: perform FDCT
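These two steps can be sketched in Python. Because the cosine expressions do not depend on the pixel values, they are computed once in a table (`COS`) and reused for every coefficient. The names `BLOCK`, `COS` and `fdct8` are assumptions for illustration; the block holds the 64 input values of the example, arranged as an 8×8 matrix.

```python
import math

N = 8
# cos((2x+1) * u * pi / 16) depends only on x and u, so tabulate it once
COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
       for u in range(N)]

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def fdct8(block):
    """Level shift by 128, then forward DCT of an 8x8 block."""
    f = [[p - 128 for p in row] for row in block]
    return [[0.25 * c(u) * c(v) * sum(
                f[x][y] * COS[u][x] * COS[v][y]
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

BLOCK = [
    [140, 144, 147, 140, 140, 155, 179, 175],
    [144, 152, 140, 147, 140, 148, 167, 179],
    [152, 155, 136, 167, 163, 162, 152, 172],
    [168, 145, 156, 160, 152, 155, 136, 160],
    [162, 148, 156, 148, 140, 136, 147, 162],
    [147, 167, 140, 155, 155, 140, 136, 162],
    [136, 156, 123, 167, 162, 144, 140, 147],
    [148, 155, 136, 155, 152, 147, 147, 136],
]

F = fdct8(BLOCK)
print(round(F[0][0]))  # DC coefficient
```

The DC coefficient equals one eighth of the sum of all level-shifted pixel values, which makes it easy to check by hand.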




Example

2. FDCT output values Fuv (for space reasons only the part before the decimal point); the top-left entry, 186, is the DC coefficient

186  -18   15   -9   23   -9  -14  -19
 21  -34   26   -9  -11   11   14    7
-10  -24   -2    6  -18    3  -20   -1
 -8   -5   14  -15   -8   -3   -3    8
 -3   10    8    1  -11   18   18   15
  4   -2  -18    8    8   -4    1   -7
  9    1   -3    4   -1   -7   -1   -2
  0   -8   -2    2    1    4   -6    0

3. Quantization matrix for quality level 2

 3   5   7   9  11  13  15  17
 5   7   9  11  13  15  17  19
 7   9  11  13  15  17  19  21
 9  11  13  15  17  19  21  23
11  13  15  17  19  21  23  25
13  15  17  19  21  23  25  27
15  17  19  21  23  25  27  29
17  19  21  23  25  27  29  31




Example

Effects of Quantization

4. Quantized matrix

62  -4   2  -1   2  -1  -1  -1
 4  -5   3  -1  -1   1   1   0
-1  -3   0   0  -1   0  -1   0
-1   0   1  -1   0   0   0   0
 0   1   1   0  -1   1   1   1
 0   0  -1   0   0   0   0   0
 1   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

5. F*uv: reconstruction after dequantization

186  -20   14   -9   22  -13  -15  -17
 20  -35   27  -11  -13   15   17    0
 -7  -27    0    0  -15    0  -19    0
 -9    0   13  -15    0    0    0    0
  0   13   15    0  -19   21   23   25
  0    0  -17    0    0    0    0    0
 15    0    0    0    0    0    0    0
  0    0    0    0    0    0    0    0

Indication of "quality loss": e.g. -20 instead of -18; for another entry the correct value was -11.




Example

6. Reconstructed image after performing the inverse DCT:

146  157  161  155  149  162  181  185
152  176  148  176  167  176  186  209
161  183  161  193  188  187  180  203
169  170  180  178  175  176  169  183
172  174  175  169  163  161  163  186
158  188  152  188  181  162  159  190
143  180  140  186  186  160  155  171
157  179  159  172  181  167  175  168

Error in reconstruction:

 6  13  14  15   9   7   2  10
 8  24   8  29  27  28  19  30
 9  28  25  26  25  25  28  31
 1  25  24  18  23  21  33  23
10  26  19  21  23  25  16  24
11  21  12  33  26  22  23  28
 7  24  17  19  24  16  15  24
 9  24  23  17  29  20  28  32




Problem of Quantization

Cutting off higher frequencies leads to partly wrong color information

→ the higher the quantization coefficients, the more distortion there is in an 8×8 block

Result: edges of blocks can be seen






• Result of quantization: 8×8 blocks of DC/AC coefficients with lots of zeros

• How to process and encode the data efficiently?






Baseline Process - Entropy Encoding

• Initial step: map the 8×8 block of transformed values FQuv to a 64-element vector which can be further processed by entropy encoding
• DC-coefficients determine the basic color of the data units in a frame; the variation between the DC-coefficients of successive data units is typically small
  → The DC-coefficient is encoded as the difference between the current coefficient and the previous one
• AC-coefficients: the processing order uses the zig-zag sequence

[Figure: zig-zag scan through the 8×8 block, starting at the DC-coefficient in the top-left corner and proceeding towards the AC-coefficients of higher frequencies]

• Coefficients with lower frequencies are encoded first, followed by higher frequencies. Result: sequence of similar data bytes → efficient entropy encoding




Example

Quantized matrix:

62  -3   2  -1   2  -1  -1   1
 4  -5   3  -1  -1   1   1   0
-1  -3   0   0  -1   0  -1   0
-1   0   1  -1   0   0   0   0
 0   1   0   0  -1   1   1   1
 0   0  -1   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

Zig-zag ordering

DC-coefficient: code the coefficient of one block as the difference to that of the previous block

AC-coefficients: consider each block separately, order the data using the zig-zag sequence to achieve long sequences of zero values:

-3 4 -1 -5 2 -1 3 -3 -1 0 0 0 -1 2 -1 -1 0 1 1 0 0 0 0 -1 -1 1 -1 1 1 0 0 0 -1 0 0 0 0 0 -1 0 -1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0

Entropy encoding:

• Run-length encoding of zero values of quantized AC-coefficients
• Huffman encoding on DC- and AC-coefficients
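The zig-zag ordering can be sketched as a walk over the anti-diagonals of the block, alternating the direction on each diagonal. The function name `zigzag` is an assumption; the matrix is the quantized block of this example, written row by row.

```python
def zigzag(block):
    """Return the entries of a square block in zig-zag order."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):            # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                    # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        out.extend(block[r][s - r] for r in rows)
    return out

quantized = [
    [62, -3,  2, -1,  2, -1, -1,  1],
    [ 4, -5,  3, -1, -1,  1,  1,  0],
    [-1, -3,  0,  0, -1,  0, -1,  0],
    [-1,  0,  1, -1,  0,  0,  0,  0],
    [ 0,  1,  0,  0, -1,  1,  1,  1],
    [ 0,  0, -1,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0,  0],
]

sequence = zigzag(quantized)
dc, ac = sequence[0], sequence[1:]
print(dc, ac)
```

The first element is the DC-coefficient; the remaining 63 values are the AC-coefficients, ending in a long run of zeros.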




Run-length Encoding

Run-length encoding is a content-dependent coding technique

• Sequences of the same bytes are replaced by the number of their occurrences
• A special flag byte is used which doesn't occur in the byte stream itself
• Coding procedure:
  – If a byte occurs at least four consecutive times, the "number of occurrences – 4" (offset = 4) is counted
  – The compressed data contain this byte followed by the special flag and the "number of occurrences – 4"
• As a consequence: Representation of 4 to 259 bytes with three bytes is possible (with corresponding compression effect)

Example with '!' as special flag:
Uncompressed sequence: ABCCCCCCCCDEFGGG
Run-length coded sequence: ABC!4DEFGGG

Offset of 4, since for smaller blocks there would be no reduction effect; e.g. with offset 3:
• D!0 → DDD (both strings have the same length)
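The coding procedure can be sketched as follows. As a simplification for readability, the count is written as a decimal number in a text string, whereas a real encoder would store it in a single byte; the function name `rle_encode` is an assumption.

```python
def rle_encode(data, flag="!", offset=4, max_run=259):
    """Replace runs of at least `offset` equal characters by char + flag + (run - offset)."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == ch and run < max_run:
            run += 1
        if run >= offset:
            out.append(f"{ch}{flag}{run - offset}")
        else:
            out.append(ch * run)   # short runs are copied unchanged
        i += run
    return "".join(out)

encoded = rle_encode("ABCCCCCCCCDEFGGG")
print(encoded)
```

The run of eight C characters shrinks to three symbols, while the three G characters stay as they are because the run is shorter than the offset.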




Run-length Encoding

JPEG proceeds in a similar way:
• The zero value is the only one appearing in longer sequences, thus use a more efficient coding by only "compressing" zero sequences: code nonzero coefficients together with their run-length, i.e. the number of zeros preceding the nonzero value
• Run-length ∈ {0,...,15}, i.e. 4 bits for representing the length of zero sequences
• Coded sequence: run-length, size, amplitude, with
  – run-length: number of subsequent zero-coefficients
  – size: number of bits used for representing the following coefficient
  – amplitude: value of that following coefficient using size bits
• Adapting the size of the representation of a coefficient to its value achieves further compression, because most coefficients for higher frequencies have very small values
• If (run-length, size) = (15, 0) then there are more than 15 zeros after each other.
• (0,0) = EoB symbol (End of Block) indicates the termination of the current rectangle (EoB is very frequently used)




Size i    Amplitude
1         -1 ↔ 1
2         -3, -2 ↔ 2, 3
3         -7, ..., -4 ↔ 4, ..., 7
4         -15, ..., -8 ↔ 8, ..., 15
i         -2^i + 1, ..., -2^(i-1) ↔ 2^(i-1), ..., 2^i - 1
10        -1023, ..., -512 ↔ 512, ..., 1023

4 bits, 1-complement representation (other representations are possible):

0000 ≙ 8      1000 ≙ -15
0001 ≙ 9      1001 ≙ -14
0010 ≙ 10     1010 ≙ -13
0011 ≙ 11     1011 ≙ -12
0100 ≙ 12     1100 ≙ -11
0101 ≙ 13     1101 ≙ -10
0110 ≙ 14     1110 ≙ -9
0111 ≙ 15     1111 ≙ -8

11 is for instance represented by: size = 4, amplitude = 0011

Example

The sequence 0 . . . 0 121 0 . . . (35 zeros followed by the value 121) is encoded by

15, 0, 15, 0, 5, 7, 57

• 15, 0, 15, 0, 5: 35 zeros in all, followed by a value represented using 7 bits
• 7, 57: with 7 bits, 121 is 64 + 57

In a second step, the string may be reduced further by Huffman encoding principles
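The example can be reproduced with a small encoder. This sketch follows the conventions of these slides, not necessarily the bit patterns of the JPEG standard: each (15, 0) symbol accounts for 15 zeros here (in the standard, the ZRL symbol stands for a run of 16 zeros), and amplitudes use the table above. The function names are assumptions.

```python
def amplitude_code(value):
    """Return (size, code) for a nonzero coefficient, following the table above."""
    size = abs(value).bit_length()
    if value > 0:
        code = value - (1 << (size - 1))                  # e.g. 11 -> 0011
    else:
        code = (1 << (size - 1)) + ((1 << size) - 1 + value)  # e.g. -15 -> 1000
    return size, code

def encode_ac(ac):
    """Encode AC coefficients as (run-length, size, amplitude) symbols."""
    last = len(ac)
    while last > 0 and ac[last - 1] == 0:   # trailing zeros are covered by EoB
        last -= 1
    symbols, run = [], 0
    for value in ac[:last]:
        if value == 0:
            run += 1
            if run == 15:                   # slide convention: (15, 0) = 15 zeros
                symbols.append((15, 0))
                run = 0
        else:
            symbols.append((run, *amplitude_code(value)))
            run = 0
    if last < len(ac):
        symbols.append((0, 0))              # EoB
    return symbols

symbols = encode_ac([0] * 35 + [121] + [0] * 27)
print(symbols)
```

The EoB symbol at the end replaces all remaining zeros of the block, which is why it occurs so frequently.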




Huffman Encoding

• The Huffman code is an optimal code using the minimum number of bits for a string of data with given probabilities per character

• Statistical encoding method:
  – For each character, a probability of occurrence is known by encoder and decoder
  – Frequently occurring characters are coded with shorter strings than seldom occurring characters
  – Successive characters are coded independently of each other
• Resulting code is prefix free → unique decoding is guaranteed
• A binary tree is constructed to determine the Huffman codewords of the characters:
  – Leaves represent the characters that are to be encoded
  – Nodes contain the occurrence probability of the characters belonging to the subtree
  – Edges of the tree are assigned 0 and 1




Huffman Encoding

Algorithm for computing the Huffman code:
1.) List all characters as well as their frequencies
2.) Select the two list elements with the smallest frequency and remove them from the list
3.) Make them the leaves of a tree whose root carries the sum of the probabilities of both elements; place the tree into the list
4.) Repeat steps 2 and 3 until the list contains only one element
5.) Mark all edges:
    Parent → left child with "0"
    Parent → right child with "1"

The code words result from the path from the root to the leaves




Huffman Encoding Example

Suppose that characters A, B, C, D and E occur with probabilities
p(A) = 0.27, p(B) = 0.36, p(C) = 0.16, p(D) = 0.14, p(E) = 0.07

Huffman tree (edge label: subtree):

p(ADCEB) = 1.00
  0: p(CED) = 0.37
     0: p(C) = 0.16
     1: p(ED) = 0.21
        0: p(E) = 0.07
        1: p(D) = 0.14
  1: p(AB) = 0.63
     0: p(A) = 0.27
     1: p(B) = 0.36

Resulting code:

x    w(x)
A    10
B    11
C    00
D    011
E    010
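The algorithm can be sketched with a priority queue. The version below assumes that, of the two subtrees removed in each step, the one with the smaller probability becomes the left child (edge "0"); with this convention it reproduces the code table of the example. The function name `huffman_code` is chosen for illustration.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """probabilities: dict symbol -> probability; returns dict symbol -> codeword."""
    tie = count()  # tie-breaker so that trees themselves are never compared
    heap = [(p, next(tie), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_left, _, left = heapq.heappop(heap)     # smallest probability
        p_right, _, right = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (p_left + p_right, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")           # left edge
            walk(node[1], prefix + "1")           # right edge
        else:
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

codes = huffman_code({"A": 0.27, "B": 0.36, "C": 0.16, "D": 0.14, "E": 0.07})
print(codes)
```

A different assignment of 0 and 1 to the edges gives different codewords with the same lengths, so the code is optimal but not unique.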




Huffman Encoding in JPEG

Coding of run-length (∈ {0, …, 15}), size (∈ {0, …, 10})

• (i,j): i preceding zeroes (0≤ i ≤15) in front of a nonzero value coded with j bits

• The table has 10·16+2 = 162 entries with significantly different occurrence probabilities

• EoB is relatively frequent
• ZRL: at least 16 successive zeroes, i.e. ZRL = (15,0)
• Some values such as (15,10) are extremely rare: 15 preceding zeros in front of a very large value is practically impossible! The same holds for most of the combinations in the table.
• Thus: Huffman coding of the table entries will lead to significant further compression!

[Table: rows = run-length 0 ... 15, columns = size 0 ... 10, entry (i,j). In the column size = 0, run-length 0 is EoB, run-lengths 1 to 14 are impossible, and run-length 15 is ZRL; (1,3) and (15,10) are marked as example entries]




Huffman Encoding in JPEG

• Different Huffman tables for (run-length, size) are used for different 8×8 blocks, based on their contents

• Thus the coding begins with an HTN (Huffman table number)

• The coding of amplitudes may also change from block to block

• Amplitude codes are stored in the preceding (run-length, size) coding table

An 8×8 block is thus coded as follows:

[VLC, DC coefficient, sequence of (run-length, size, amplitude) for the AC coefficients]

VLC = variable length code: contains actual HTN + actual VLI (Variable Length Integer), i.e. coding method for next amplitude




Alternative to Huffman: Arithmetic Coding

Characteristics:

• Achieves optimality (coding rate) like Huffman coding

• Difference to Huffman: the entire data stream has an assigned probability, which consists of the probabilities of the contained characters. Coding a character takes place with consideration of all previous characters.

• The data are coded as an interval of real numbers between 0 and 1. Each value within the interval can be used as code word.

• The minimum length of the code is determined by the assigned probability.

• Disadvantage: the data stream can be decoded only as a whole.




Arithmetic Coding: Example

Code data ACAB with pA = 0.5, pB = 0.2, pC = 0.3

Step   Prefix   Subintervals (probability)
1      (none)   A: [0, 0.5) (0.5)           B: [0.5, 0.7) (0.2)              C: [0.7, 1) (0.3)
2      A        AA: [0, 0.25) (0.25)        AB: [0.25, 0.35) (0.1)           AC: [0.35, 0.5) (0.15)
3      AC       ACA: [0.35, 0.425) (0.075)  ACB: [0.425, 0.455) (0.03)       ACC: [0.455, 0.5) (0.045)
4      ACA      ACAA: [0.35, 0.3875) (0.0375)  ACAB: [0.3875, 0.4025) (0.015)  ACAC: [0.4025, 0.425) (0.0225)

ACAB can be coded by each binary number from the interval [0.3875, 0.4025). The code length is -log2(pACAB) = 6.06 rounded up, i.e. 7 bits, e.g. 0.0110010
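The interval refinement of this example can be reproduced exactly with rational arithmetic. The sketch below only computes the final interval (it is not a complete arithmetic coder); the function name `arithmetic_interval` and the symbol order A, B, C within each interval are assumptions taken from the example.

```python
from fractions import Fraction

def arithmetic_interval(message, model):
    """model: ordered list of (symbol, probability); returns (low, high) of the message."""
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        offset = Fraction(0)
        for sym, p in model:
            if sym == symbol:
                low += width * offset     # move to the start of the subinterval
                width *= p                # shrink to the subinterval's width
                break
            offset += p
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
    return low, low + width

model = [("A", Fraction(1, 2)), ("B", Fraction(1, 5)), ("C", Fraction(3, 10))]
low, high = arithmetic_interval("ACAB", model)
codeword = Fraction(0b0110010, 2 ** 7)    # the binary fraction 0.0110010
print(float(low), float(high), float(codeword))
```

The width of the final interval equals the probability of the whole message, which is what determines the minimum code length.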




Variants of Image Compression

JPEG is not a single format; one of several modes can be chosen:

Lossy sequential DCT-based mode (baseline process)
• Presented before, but not the only method

Expanded lossy DCT-based mode
• Enhancement to the baseline process by adding progressive encoding

Lossless mode
• Low compression ratio → "perfect" reconstruction of the original image
• No DCT, but differential encoding

Hierarchical mode
• Accommodates images of different resolutions
• Selects its algorithms from the three other modes




Variants: Expanded Lossy DCT-based Mode

With sequential encoding as in the baseline process, the whole image is coded and decoded in a single run. An alternative to sequential encoding is progressive encoding, done in the entropy encoding step.

Two alternatives for progressive encoding are possible:

• Spectral selection
  At first, coefficients of low frequencies are passed to entropy encoding; coefficients of higher frequencies are processed in successive runs
• Successive approximation
  All coefficients are transferred in one run, but the most-significant bits are encoded prior to the less-significant bits.

12 possible coding alternatives in the expanded mode:
• Using sequential encoding, spectral selection, or successive approximation (3 variants)
• Using Huffman or Arithmetic encoding (2 variants)
• Using 8 or 12 bits for representing the samples (2 variants)

Most popular mode: sequential display mode with 8 bits/sample and Huffman encoding




Expanded Lossy DCT-based Mode (Example)

Progressive encoding: image is coded and decoded in refining steps
[Figure: Step 1, Step 2, Step 3]

Sequential encoding: image is coded and decoded in a single run
[Figure: Step 1, Step 2, Step 3]




Variants: Lossless Mode

Lossless mode uses differential encoding (also known as prediction or relative encoding)
• Suited for sequences of characters whose values are different from zero, but which do not differ much
→ Calculate only the difference with respect to the previous value (also used for DC coefficients)

Differential encoding for still images:
• Avoid using DCT/quantization
• Instead: calculation of differences between nearby pixels or pixel groups
• Edges are represented by large values
• Areas with similar luminance and chrominance are represented by small values
• A homogeneous area is represented by a large number of zeros → further compression with run-length encoding is possible, as for DCT




Variants: Lossless Mode

• Uses data units of single pixels for image preparation
• Any precision between 2 and 16 bits/pixel can be used
• Image processing and quantization use a predictive technique instead of transformation encoding
• 8 predictors are specified for each pixel X by means of a combination of the already known adjacent samples A, B, and C

[Figure: position of the adjacent samples A, B, C relative to X]

predictor   predicted value for X
0           no prediction
1           A
2           B
3           C
4           A+B-C
5           A+(B-C)/2
6           B+(A-C)/2
7           (A+B)/2

The actual predictor should give the best approximation of X by the already known values A, B, C

The number of the chosen predictor and the difference of the prediction to the actual value are passed to entropy encoding (Huffman or arithmetic encoding)

Example:
• (4,0): X is exactly given by A+B-C
• (7,1): X is (A+B)/2+1

Processing chain: Uncompressed data → Predictor → Entropy encoder → Compressed data
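The predictor table can be turned into a small Python sketch. The function names `predict`, `encode_pixel` and `decode_pixel` are hypothetical, the integer halving with `//` is an assumption of this sketch, and choosing the predictor with the smallest absolute difference per pixel is only one simple interpretation of "best approximation".

```python
def predict(selector, a, b, c):
    """Predicted value for X from the known neighbours A, B, C (predictors 0..7)."""
    table = {
        0: 0,                  # no prediction
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,   # integer halving: assumption of this sketch
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return table[selector]

def encode_pixel(a, b, c, x):
    """Return (predictor number, difference) for the best predictor 1..7."""
    selector = min(range(1, 8), key=lambda s: abs(x - predict(s, a, b, c)))
    return selector, x - predict(selector, a, b, c)

def decode_pixel(selector, difference, a, b, c):
    """Invert encode_pixel: prediction plus transmitted difference."""
    return predict(selector, a, b, c) + difference

pair = encode_pixel(100, 104, 102, 103)
print(pair, decode_pixel(*pair, 100, 104, 102))
```

The pair produced by `encode_pixel` is what would be handed to the entropy encoder; the decoder only needs the same neighbours A, B, C to recover X exactly.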




Variants: Hierarchical Mode

The Hierarchical mode uses either the lossy DCT-based algorithms or the lossless compression technique

The idea: encoding of an image at different resolutions

Algorithm:
• Image is initially sampled at a low resolution
• Subsequently, the resolution is raised and the compressed image is subtracted from the previous result
• The process is repeated until the full resolution of the image is obtained in a compressed form

Disadvantage:
• Requires substantially more storage capacity

Advantage:
• Compressed image is immediately available at different resolutions → scaling becomes cheap




JPEG 2000

Improvement of the original JPEG standard:

• 16 bit color depth per channel (48 bit in total, i.e. up to about 281 trillion colors)
• Progressive mode is not only an option, but mandatory

• Definition of “Regions of Interest” – choose an image region which will be less compressed than the rest of the image. Thus, images can be individually coded, improving the subjective image quality

• Integration of watermarks – invisibly embed additional information which can be recognized by certain programs; watermarks cannot be removed from the image

• Resync markers – improve fault tolerance and error correction by setting markers; if a transmission error corrupts the data, not all following data are lost as in normal JPEG

• Most important: better compression and a faster coding and decoding process by using wavelets instead of DCT. Wavelets are transform functions that describe how fast the image information changes. No pixel blocks are used in compression; infinite functions are applied to finite image regions. Thus wavelets can describe edges and hard changes better than DCT.




Wavelets

Wavelets are mathematical tools which allow functions to be decomposed in a hierarchical way, i.e. a function is described by means of:

[overall shape, first detail, second detail, ...], e.g. by (“most important features, refinements, less important topics, ...”)

Thus, the most important aspects are available with a relatively small number of bits

Application of wavelets:
• Image compression
• Image editing
• Animation
• Signal processing
• ...




Haar Transforms

The simplest example of wavelet technology is the Haar transform. There are
• one-dimensional Haar wavelet transforms (which allow the compression of the representation of piecewise constant functions)
• two-dimensional Haar wavelet transforms (for image compression, ...)
• higher-dimensional Haar wavelet transforms

Example (one-dimensional):

• Simple example to show how we can reduce the amount of “bits needed for representation” by transformation and compression

• Let a string of pixels be given as follows:

9 7 3 5 6 8 6 4

• Then we do (recursively):
  - calculate the average of successive pairs
  - calculate the distance from the average (“detail”)




1D Haar Transforms

• In the first step of this procedure we get: [average1; average2; ...; detail1; detail2; ...]

• Thereafter we get: [average1,2; ...; detail1,2; ...; detail1; detail2; ...]

where average1,2 is the mean value of average1 and average2, and detail1,2 is the distance of average1 from average1,2. The details of the first step are kept unchanged

• This procedure may be continued. The detail values become less and less important

• From the transformed strings we can fully reconstruct the original
• Very often the details are very small and may be suppressed. In such a case we cannot exactly reconstruct the original, but the errors are comparatively small.




Resolution   Averages            Details
8            9 7 3 5 6 8 6 4
4            8 4 7 5             +1 -1 -1 +1
2            6 6                 2 1
1            6                   0

Thus the sequence [9, 7, 3, 5, 6, 8, 6, 4] has been transformed to [6; 0; 2, 1; 1, -1, -1, 1]: the global average 6, the detail coefficient of the coarsest resolution (0), the next finer details (2, 1), and the detail coefficients of the finest resolution (1, -1, -1, 1)

Example
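The averaging and differencing scheme of this example can be sketched in Python as follows. The function name `haar_decompose` is hypothetical, and the sketch assumes that the length of the input is a power of two.

```python
def haar_decompose(values):
    """Unnormalized 1D Haar transform: [global average; details from coarse to fine]."""
    averages = list(values)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail = distance of the first element of each pair from the pair average.
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        # Coarser details are placed in front of the finer ones.
        details = level_details + details
    return averages + details

print(haar_decompose([9, 7, 3, 5, 6, 8, 6, 4]))
```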




Example

Reconstruction from 6; 0; 2, 1; 1, -1, -1, 1:

6 = 6 + 0, 6 = 6 - 0

8 = 6 + 2, 4 = 6 - 2, 7 = 6 + 1, 5 = 6 - 1

9 = 8 + 1, 7 = 8 - 1, 3 = 4 + (-1), 5 = 4 - (-1), 6 = 7 + (-1), 8 = 7 - (-1), 6 = 5 + 1, 4 = 5 - 1

If we suppressed the finest detail coefficients (i.e., “1, -1, -1, 1”), we would reconstruct 8 8 4 4 7 7 5 5 instead of 9 7 3 5 6 8 6 4
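The reconstruction can be sketched in the same style. The function name `haar_reconstruct` is hypothetical; the coefficient layout [global average; details from coarse to fine] is the one used in the example.

```python
def haar_reconstruct(coefficients):
    """Invert the unnormalized Haar transform: each average a with detail d gives a+d, a-d."""
    values = [coefficients[0]]
    position = 1
    while position < len(coefficients):
        details = coefficients[position:position + len(values)]
        values = [v for avg, d in zip(values, details) for v in (avg + d, avg - d)]
        position += len(details)
    return values

exact = haar_reconstruct([6, 0, 2, 1, 1, -1, -1, 1])
lossy = haar_reconstruct([6, 0, 2, 1, 0, 0, 0, 0])   # finest details suppressed
print(exact, lossy)
```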




Generalization

Generalization to piecewise constant functions (instead of strings):

• Let $V^j$ be the vector space of all functions which are piecewise constant on the $2^j$ equal subintervals of [0, 1]

• Every one-dimensional image with $2^j$ pixels can be considered as an element of $V^j$ (e.g. a two-pixel image has two constant parts over the intervals [0, 0.5) and [0.5, 1))

• The figure shows an element of $V^3$

• Obviously: $V^3$ is a refinement of $V^2$ etc.: $V^0 \subset V^1 \subset V^2 \subset V^3 \subset \dots$

• A “basis” for the vector space $V^j$ is a set of functions by which all functions of $V^j$ may be represented as linear combinations of the basis functions.

[Figure: a function f(x) together with piecewise constant sections; the piecewise constant sections might be an approximation of the function f(x) according to a given norm]




Generalization

A simple basis of $V^j$ (there are many other alternatives for a basis) is:

$$\phi_i^j(x) = \begin{cases} 1 & \text{if } x \in \left[\frac{i}{2^j}, \frac{i+1}{2^j}\right) \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } i = 0, \dots, 2^j - 1$$

[Figure: the box functions $\phi_0^3$ and $\phi_4^3$ on [0, 1]]

Definition:

1. The inner product of two functions f, g ∈ $V^j$ is defined as $\langle f \mid g \rangle := \int_0^1 f(x) \cdot g(x)\, dx$
2. The L2 norm is defined by $\|u(x)\|_2 := \langle u \mid u \rangle^{1/2} = \left( \int_0^1 u(x)^2\, dx \right)^{1/2}$

Very often we try to approximate a given function f(x) by a function $\tilde f(x)$ according to the L2 norm, i.e. find $\tilde f(x)$ such that $\| f(x) - \tilde f(x) \|_2$ is minimal over all $\tilde f(x)$




Generalization

• In the following we will use another basis for $V^j$ than the very simple basis which consists only of the elementary components shown before

• This will lead us to the concept of a Haar wavelet basis

• Construction procedure of a suitable basis for $V^{j+1}$ from a basis of $V^j$:
  - Suppose that we already have a basis for $V^j$
  - Then define a new vector space $W^j$ as the orthogonal complement of $V^j$ in $V^{j+1}$, i.e. $W^j$ := vector space of functions in $V^{j+1}$ which are orthogonal to all functions in $V^j$ (orthogonal means that <u|v> = 0 if u ∈ $V^j$, v ∈ $W^j$)

• $W^j$ is the detail of $V^{j+1}$ which cannot be expressed by $V^j$

• The basis functions of $V^j$ together with the basis functions of $W^j$ form a basis of $V^{j+1}$




Generalization

Definition: The elements $\psi_i^j(x)$ of a basis of $W^j$ (i.e. linearly independent functions which span $W^j$) are called wavelets.

Immediate consequence:

1. The wavelets $\psi_i^j(x)$ together with the basis of $V^j$ are a basis of $V^{j+1}$

2. $\psi_i^j(x)$ is orthogonal to $\phi_k^j(x)$, i.e. $\int_0^1 \psi_i^j(x) \cdot \phi_k^j(x)\, dx = 0$

→ The “detail coefficients“ are coefficients of the wavelet basis functions.




Generalization

• Wavelets for $W^j$ (i.e. elements of $V^{j+1}$ which are orthogonal to basis elements of $V^j$) are denoted as Haar wavelets

• Haar wavelets of $W^j$ (there would be other possibilities for basis functions):

$$\psi_i^j(x) := \begin{cases} +1 & \text{if } x \in \left[\frac{i}{2^j}, \frac{i + \frac{1}{2}}{2^j}\right) \\ -1 & \text{if } x \in \left[\frac{i + \frac{1}{2}}{2^j}, \frac{i+1}{2^j}\right) \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } i = 0, \dots, 2^j - 1$$




Example

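Example: $V^2$ basis functions and Haar wavelets for $W^2$

[Figure: the four box functions $\phi_0^2, \phi_1^2, \phi_2^2, \phi_3^2$ of $V^2$ (value 1 on one quarter of [0, 1]) and the four Haar wavelets $\psi_0^2, \psi_1^2, \psi_2^2, \psi_3^2$ of $W^2$ (values +1 and -1)]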




Example

We can (using these functions) redo the string example 9 7 3 5 6 8 6 4 as follows, where S(x) = string (written as a function of x):

$$\begin{aligned}
S(x) &= 9\phi_0^3(x) + 7\phi_1^3(x) + 3\phi_2^3(x) + 5\phi_3^3(x) + 6\phi_4^3(x) + 8\phi_5^3(x) + 6\phi_6^3(x) + 4\phi_7^3(x) \\
&= 8\phi_0^2(x) + 4\phi_1^2(x) + 7\phi_2^2(x) + 5\phi_3^2(x) + 1\psi_0^2(x) - 1\psi_1^2(x) - 1\psi_2^2(x) + 1\psi_3^2(x) \\
&= 6\phi_0^1(x) + 6\phi_1^1(x) + 2\psi_0^1(x) + 1\psi_1^1(x) + 1\psi_0^2(x) - 1\psi_1^2(x) - 1\psi_2^2(x) + 1\psi_3^2(x) \\
&= 6\phi_0^0(x) + 0\psi_0^0(x) + 2\psi_0^1(x) + 1\psi_1^1(x) + 1\psi_0^2(x) - 1\psi_1^2(x) - 1\psi_2^2(x) + 1\psi_3^2(x)
\end{aligned}$$




Example

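Graphical representation:

[Figure: the string 9 7 3 5 6 8 6 4 drawn as the weighted sum $6 \cdot \phi_0^0 + 0 \cdot \psi_0^0 + 2 \cdot \psi_0^1 + 1 \cdot \psi_1^1 + 1 \cdot \psi_0^2 - 1 \cdot \psi_1^2 - 1 \cdot \psi_2^2 + 1 \cdot \psi_3^2$ of the plotted basis functions]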




Example

Starting with the basis function $\phi_0^0$ for $V^0$ we can refine it by the $W^0$ detail coefficient, i.e. by $\psi_0^0$, to the basis for $V^1$

This in turn can be refined by two detail coefficients of $W^1$, i.e. by $\psi_0^1, \psi_1^1$, to the basis of $V^2$: $\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1$

With four more detail coefficients of $W^2$ we get a basis for $V^3$ consisting of the eight basis functions $\phi_0^0; \psi_0^0; \psi_0^1, \psi_1^1; \psi_0^2, \psi_1^2, \psi_2^2, \psi_3^2$

With eight more detail coefficients of $W^3$ we get a basis for $V^4$ etc.

This procedure leads to a better approximation of a given function (or of a 1D image)




Approximation Of Continuous Functions

Approximation of a continuous function f(x) (dotted line) by averages and detail coefficients

• We start with V0. Then:
  V0-appr. + W0-detail = V1-appr.
  V1-appr. + W1-detail = V2-appr.
  V2-appr. + W2-detail = V3-appr.
  V3-appr. + W3-detail = V4-appr.

• We can also work backwards, i.e. start with the V4 approximation and go to the V3 approximation by suppressing the W3 detail coefficients etc.

[Figure: the V0 approximation (average of f(x)) and the V1, V2, V3 and V4 approximations of f(x) on [0, 1], together with the W3 detail coefficients]




Approximation Of Continuous Functions

• All Haar basis functions $\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1, \dots$ are orthogonal to each other (of course not to themselves), i.e. <f|g> = 0 if f ≠ g are basis functions

• Orthogonality to all other basis functions is not necessarily valid for other systems of basis functions

• In addition to orthogonality we can provide orthonormality, i.e. <f|f> = 1

• The basis functions

$$\phi_i^{j*}(x) := 2^{j/2} \cdot \phi_i^j(x), \qquad \psi_i^{j*}(x) := 2^{j/2} \cdot \psi_i^j(x)$$

are orthonormal

• If the basis functions are orthonormalized, then the detail coefficients and the average coefficient have to be multiplied by $2^{-j/2}$

• In our example we then get:

$$\begin{aligned}
S(x) &= 6\phi_0^0 + 0\psi_0^0 + 2\psi_0^1 + 1\psi_1^1 + 1\psi_0^2 - 1\psi_1^2 - 1\psi_2^2 + 1\psi_3^2 \\
&= 6\phi_0^{0*} + 0\psi_0^{0*} + \tfrac{2}{\sqrt{2}}\psi_0^{1*} + \tfrac{1}{\sqrt{2}}\psi_1^{1*} + \tfrac{1}{2}\psi_0^{2*} - \tfrac{1}{2}\psi_1^{2*} - \tfrac{1}{2}\psi_2^{2*} + \tfrac{1}{2}\psi_3^{2*}
\end{aligned}$$




Decomposition

Decomposition of a sequence of h numbers together with a normalization: each original coefficient with superscript j is multiplied by 2^(-j/2)

proc DecompositionStep(C: array[1..h] of reals)
  for i := 1 to h/2 do
    C'[i] := (C[2i-1] + C[2i]) / sqrt(2)
    C'[h/2+i] := (C[2i-1] - C[2i]) / sqrt(2)
  end for
  C := C'
end proc

proc Decomposition(C: array[1..h] of reals)
  C := C / sqrt(h)   // normalize input coefficients
  while h > 1 do
    DecompositionStep(C[1..h])
    h := h/2
  end while
end proc
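A possible Python rendering of the two procedures is sketched below. It assumes that sums and differences are divided by sqrt(2), which is the scaling that yields the normalized coefficients 6; 0; 2/sqrt(2), 1/sqrt(2); 1/2, -1/2, -1/2, 1/2 of the string example. The function names are hypothetical and the input length is assumed to be a power of two.

```python
import math

def decomposition_step(c):
    """One pass: normalized pairwise sums in the first half, differences in the second."""
    half = len(c) // 2
    root2 = math.sqrt(2)
    sums = [(c[2 * i] + c[2 * i + 1]) / root2 for i in range(half)]
    differences = [(c[2 * i] - c[2 * i + 1]) / root2 for i in range(half)]
    return sums + differences

def decomposition(values):
    """Normalized 1D Haar decomposition of a sequence whose length is a power of two."""
    c = [v / math.sqrt(len(values)) for v in values]   # normalize input coefficients
    h = len(c)
    while h > 1:
        c[:h] = decomposition_step(c[:h])
        h //= 2
    return c

normalized = decomposition([9, 7, 3, 5, 6, 8, 6, 4])
print([round(v, 4) for v in normalized])
```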




Usage for Data Compression

Haar wavelets for data compression:
• Suppose that a function f(x) is given by a linear combination of m basis functions, e.g. by a linear combination of m Haar wavelet functions:

$$f(x) = \sum_{i=1}^{m} c_i \cdot u_i(x) \qquad \text{for example: } \left[ u_1(x) = \phi_0^0(x), \dots, u_8(x) = \psi_3^2(x) \right]$$

• We want to reduce the number of coefficients, i.e. we seek a function $\tilde f(x)$ which:

1. Is similar to f(x), i.e. $\| f(x) - \tilde f(x) \| \le \varepsilon$ for some norm

2. May be represented by fewer coefficients (possibly with other basis functions $\tilde u_i(x)$ instead of $u_i(x)$) → $\tilde f(x) = \sum_{i=1}^{\tilde m} \tilde c_i \cdot \tilde u_i(x)$ with $\tilde m < m$

• Finding the best $\tilde f(x)$ is a difficult problem if all possible $\tilde u_i(x)$ are taken into account. Thus we restrict ourselves to the Haar wavelet basis




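Data Compression

If we do not change the basis functions but reduce the number of coefficients then it is easy to show that:

• If the basis functions are orthonormal, then for $\tilde m < m$ coefficients and for the L2 norm, i.e. $\| f - \tilde f \|^2 = \langle f(x) - \tilde f(x) \mid f(x) - \tilde f(x) \rangle$, the error is minimized if we keep the $\tilde m$ coefficients of $c_1, \dots, c_m$ whose absolute values are the largest ones

Proof: Let $\sigma_1, \dots, \sigma_m$ be a permutation of 1, ..., m

Let $\tilde f(x) = \sum_{i=1}^{\tilde m} c_{\sigma_i} \cdot u_{\sigma_i}$

$$\begin{aligned}
\Rightarrow \| f(x) - \tilde f(x) \|_2^2 &= \Big\langle \sum_{i=\tilde m+1}^{m} c_{\sigma_i} u_{\sigma_i} \,\Big|\, \sum_{j=\tilde m+1}^{m} c_{\sigma_j} u_{\sigma_j} \Big\rangle \\
&= \sum_{i=\tilde m+1}^{m} \sum_{j=\tilde m+1}^{m} c_{\sigma_i} c_{\sigma_j} \langle u_{\sigma_i} \mid u_{\sigma_j} \rangle \\
&= \sum_{i=\tilde m+1}^{m} (c_{\sigma_i})^2 \qquad \text{since the } u_i \text{ are orthonormal}
\end{aligned}$$

→ The error is minimized if for $c_{\sigma_{\tilde m+1}}, \dots, c_{\sigma_m}$ the smallest coefficients in absolute value are used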
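The statement can be illustrated with a short Python sketch that keeps the largest coefficients of an orthonormal expansion and reports the resulting L2 error as the square root of the sum of the discarded squares. The function name `keep_largest` is hypothetical, and the coefficients are assumed to belong to an orthonormal basis.

```python
import math

def keep_largest(coefficients, keep):
    """Zero all but the `keep` largest-magnitude coefficients; also return the L2 error."""
    order = sorted(range(len(coefficients)),
                   key=lambda i: abs(coefficients[i]), reverse=True)
    kept = set(order[:keep])
    truncated = [c if i in kept else 0.0 for i, c in enumerate(coefficients)]
    # For an orthonormal basis the squared error is the sum of the discarded squares.
    error = math.sqrt(sum(c * c for i, c in enumerate(coefficients) if i not in kept))
    return truncated, error

# Normalized Haar coefficients of the string example 9 7 3 5 6 8 6 4.
coeffs = [6, 0, 2 / math.sqrt(2), 1 / math.sqrt(2), 0.5, -0.5, -0.5, 0.5]
truncated, error = keep_largest(coeffs, 3)
print(truncated, error)
```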




Data Compression

Approximation of f(x) by $V^3$, $W^3$, i.e. by 16 coefficients:
• 8 averages: $\phi_0^3(x), \dots, \phi_7^3(x)$
• 8 details: $\psi_0^3(x), \dots, \psi_7^3(x)$

And (further on) by: $\phi_0^0; \psi_0^0; \psi_0^1, \psi_1^1; \psi_0^2, \dots, \psi_3^2; \psi_0^3, \dots, \psi_7^3$, where $\phi_0^0$ carries the global average (orthogonal functions; for simplicity the * has been suppressed)

• The sequence of pictures shows the effect (i.e. loss of exactitude) if more and more coefficients are eliminated




2D Haar Wavelets

• We have shown how to use one-dimensional Haar wavelets for representation and compression of one-dimensional functions

• Now we generalize to 2D images
• First we show how to apply the (“averaging“ + “detail“) technique to 2D images

First approach: Standard decomposition
1. Apply the 1D approach to each row. Result: average + details for each row
2. Apply the 1D approach to each column of the results obtained by step 1

I.e.: first the rows, then the columns

[Figure: a 2D image of pixels; after step 1 each row holds its average followed by details of different resolution (see previous examples); after step 2 the result consists of the global average and detail coefficients]




2D Haar Wavelets

Second Approach: Nonstandard decomposition

• Alternate between rows and columns
1. Out of the 2n values per row: calculate n averages + n details
2. Take the result and do the same for the columns, i.e. calculate m averages + m details for the 2m values per column

Result of steps 1 and 2:

[Figure: after step 1 the image is split into averages and detail coefficients; step 2 is shown for the left half only]

3. Repeat steps 1 and 2 for the “averages” part, i.e. for the upper left corner
4. Do that recursively until only one global average is left over

• Remark: This technique works only for square images, i.e. $2^n \times 2^n$ pixels




2D Haar Wavelet Basis

There are two methods for the construction of a two-dimensional basis:
1. The standard construction
2. The nonstandard construction

Regarding $V^i$, which has $2^i$ basis functions in 1D, namely

$$\phi_0^0; \psi_0^0; \psi_0^1, \psi_1^1; \psi_0^2, \dots, \psi_3^2; \dots; \psi_0^{i-1}, \dots, \psi_{2^{i-1}-1}^{i-1}$$

we get $2^i \times 2^i$ basis functions for $V^i$ in 2D by combining the 1D basis functions with each other.

Example for $V^2$, where the 2D basis will consist of 4 x 4 = 16 basis functions (since there are 4 basis functions $\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1$ in $V^2$)




The Standard Construction

Let $u_1, \dots, u_{2^i}$ be the 1D basis functions

Then $v_{j_1 j_2}(x, y) = u_{j_1}(x) \cdot u_{j_2}(y)$ for $j_1, j_2 \in [1 : 2^i]$ is a basis function for 2D images

→ There are $2^i \times 2^i = 2^{2i}$ basis functions for $V^i$ in 2D.

• Example: $\psi_0^1(x) \cdot \psi_1^1(y)$

[Figure: $\psi_0^1(x)$, $\psi_1^1(y)$ and their product on the unit square; + means +1, - means -1, 0 elsewhere]

• Doing that for all pairs of $\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1$, i.e. for $\phi_0^0(x)\,\phi_0^0(y), \dots, \psi_1^1(x)\,\psi_1^1(y)$, we get all the 16 basis functions for $V^2$ in 2D
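As a sketch of the standard construction, the 1D Haar basis of V2 can be sampled on four pixels and multiplied pairwise. The names `BASIS_1D`, `standard_basis_2d` and `inner` are hypothetical, and the discrete sum over pixels is used here as a stand-in for the integral inner product.

```python
from itertools import product

# 1D Haar basis of V^2 (unnormalized), sampled on the four pixels of [0, 1).
BASIS_1D = {
    "phi00": [1, 1, 1, 1],
    "psi00": [1, 1, -1, -1],
    "psi10": [1, -1, 0, 0],
    "psi11": [0, 0, 1, -1],
}

def standard_basis_2d(basis_1d):
    """All products u(x) * v(y), stored as 2D grids indexed [y][x]."""
    return {
        (ux, uy): [[fx * fy for fx in basis_1d[ux]] for fy in basis_1d[uy]]
        for ux, uy in product(basis_1d, repeat=2)
    }

def inner(f, g):
    """Discrete inner product of two sampled 2D functions."""
    return sum(a * b for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g))

basis_2d = standard_basis_2d(BASIS_1D)
print(len(basis_2d))
```

Because the 1D functions are mutually orthogonal, all 16 products are mutually orthogonal as well, which the sketch allows checking numerically.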




The Nonstandard Construction

The nonstandard construction of a two-dimensional basis defines

• a two-dimensional scaling function $\phi\phi(x, y) := \phi(x)\,\phi(y)$
• and three wavelet functions

$$\phi\psi(x, y) := \phi(x)\,\psi(y), \qquad \psi\phi(x, y) := \psi(x)\,\phi(y), \qquad \psi\psi(x, y) := \psi(x)\,\psi(y)$$

Furthermore we define scaling by a superscript j and horizontal and vertical translations by a pair of subscripts k and l:

$$\begin{aligned}
\phi\psi_{kl}^j(x, y) &:= 2^j\, \phi\psi(2^j x - k, 2^j y - l) \\
\psi\phi_{kl}^j(x, y) &:= 2^j\, \psi\phi(2^j x - k, 2^j y - l) \\
\psi\psi_{kl}^j(x, y) &:= 2^j\, \psi\psi(2^j x - k, 2^j y - l)
\end{aligned}$$

The nonstandard basis consists of $\phi\phi_{00}^0(x, y) := \phi\phi(x, y)$ as well as of scaled and translated versions of the wavelet functions $\phi\psi$, $\psi\phi$, and $\psi\psi$.

The result is:




Nonstandard Construction Of 2D Haar Wavelets

[Figure: the 16 nonstandard 2D Haar basis functions for $V^2$, drawn as +/- patterns on the unit square: $\phi\phi_{0,0}^0(x, y)$ (function for the global average), $\phi\psi_{0,0}^0(x, y)$, $\psi\phi_{0,0}^0(x, y)$, $\psi\psi_{0,0}^0(x, y)$, and $\phi\psi_{k,l}^1(x, y)$, $\psi\phi_{k,l}^1(x, y)$, $\psi\psi_{k,l}^1(x, y)$ for k, l ∈ {0, 1}]




The Standard Construction

The standard construction of a 2D Haar wavelet basis for $V^2$

• We calculate the products of the one-dimensional basis functions $\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1$ for both dimensions

• A detail: The portion of the square [0:1]x[0:1] where a basis function is different from zero is not a square for every one of the 16 functions

• If we apply the standard construction to an orthonormal basis in one dimension, we get an orthonormal basis in two dimensions

[Figure: the 16 standard 2D Haar basis functions for $V^2$ as +/- patterns on the unit square, i.e. all products u(x)·v(y) with u, v ∈ {$\phi_0^0, \psi_0^0, \psi_0^1, \psi_1^1$}; $\phi_0^0(x)\,\phi_0^0(y)$ is the function for the global average]



Comparison

The standard decomposition of an m x m image requires one-dimensional transforms on all rows and then on all columns. $4(m^2 - m)$ assignment operations are needed

The nonstandard decomposition requires only $\frac{8}{3}(m^2 - 1)$ assignment operations

In the nonstandard case, all the nonstandard basis functions have square „support“ („support“ = area where the function is nonzero). This is not the case for standard basis functions.




Decomposition

As a reminder: decomposition of a sequence of h numbers together with a normalization.

These procedures are needed in the 2-dimensional decomposition.

proc DecompositionStep(C: array[1..h] of reals)
  for i := 1 to h/2 do
    C'[i] := (C[2i-1] + C[2i]) / sqrt(2)
    C'[h/2+i] := (C[2i-1] - C[2i]) / sqrt(2)
  end for
  C := C'
end proc

proc Decomposition(C: array[1..h] of reals)
  C := C / sqrt(h)   // normalize input coefficients
  while h > 1 do
    DecompositionStep(C[1..h])
    h := h/2
  end while
end proc




Decomposition

The standard decomposition of a 2D picture of h×w pixels

proc StandardDecompositionStep(C: array[1..h,1..w] of reals)
  for row := 1 to h do
    Decomposition(C[row,1..w])
  end for
  for col := 1 to w do
    Decomposition(C[1..h,col])
  end for
end proc

The nonstandard decomposition of a 2D picture of h×h pixels

proc NonstandardDecompositionStep(C: array[1..h,1..h] of reals)
  C := C/h   // normalize input coefficients
  while h > 1 do
    for row := 1 to h do
      DecompositionStep(C[row,1..h])
    end for
    for col := 1 to h do
      DecompositionStep(C[1..h,col])
    end for
    h := h/2
  end while
end proc
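Both 2D procedures can be sketched in Python on lists of rows. The sketch assumes that the 1D step divides sums and differences by sqrt(2), that the column loop of the standard decomposition runs over all w columns, and that the nonstandard loop halves h after each pass; all function names are hypothetical and image sizes are assumed to be powers of two.

```python
import math

def decomposition_step(c):
    """One 1D pass: normalized pairwise sums, then normalized differences."""
    half = len(c) // 2
    root2 = math.sqrt(2)
    return ([(c[2 * i] + c[2 * i + 1]) / root2 for i in range(half)]
            + [(c[2 * i] - c[2 * i + 1]) / root2 for i in range(half)])

def decomposition(values):
    """Full normalized 1D Haar decomposition."""
    c = [v / math.sqrt(len(values)) for v in values]
    h = len(c)
    while h > 1:
        c[:h] = decomposition_step(c[:h])
        h //= 2
    return c

def standard_decomposition(image):
    """First transform every row completely, then every column completely."""
    rows = [decomposition(row) for row in image]
    cols = [decomposition(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def nonstandard_decomposition(image):
    """Alternate one row pass and one column pass on the shrinking upper left square."""
    h = len(image)
    c = [[v / h for v in row] for row in image]   # normalize input coefficients
    while h > 1:
        for r in range(h):
            c[r][:h] = decomposition_step(c[r][:h])
        for col in range(h):
            column = decomposition_step([c[r][col] for r in range(h)])
            for r in range(h):
                c[r][col] = column[r]
        h //= 2
    return c

flat = [[5.0] * 4 for _ in range(4)]
print(standard_decomposition(flat)[0][0], nonstandard_decomposition(flat)[0][0])
```

For a constant image both variants leave only the global average in the upper left corner; for larger non-constant images the two coefficient layouts differ.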




Decomposition

Instead of the unnormalized Haar wavelet functions

$$\phi_i^j(x) := \phi(2^j x - i) \quad \text{where} \quad \phi(x) := \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\psi_i^j(x) := \psi(2^j x - i) \quad \text{where} \quad \psi(x) := \begin{cases} 1 & \text{for } 0 \le x < \frac{1}{2} \\ -1 & \text{for } \frac{1}{2} \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

we can use the normalized Haar wavelet functions

$$(*) \qquad \phi_i^{j*}(x) := 2^{j/2} \cdot \phi(2^j x - i), \qquad \psi_i^{j*}(x) := 2^{j/2} \cdot \psi(2^j x - i)$$

As a compensation we have to multiply each unnormalized coefficient with superscript j by 2^(-j/2)

Example:

Unnormalized version:
• coefficients = (6; 0; 2, 1; 1, -1, -1, 1)
• basis functions = ($\phi_0^0; \psi_0^0; \psi_0^1, \psi_1^1; \psi_0^2, \psi_1^2, \psi_2^2, \psi_3^2$)

Normalized version:
• coefficients = (6; 0; $\frac{2}{\sqrt{2}}, \frac{1}{\sqrt{2}}; \frac{1}{2}, -\frac{1}{2}, -\frac{1}{2}, \frac{1}{2}$)
• basis functions = ($\phi_0^{0*}; \psi_0^{0*}; \psi_0^{1*}, \psi_1^{1*}; \psi_0^{2*}, \psi_1^{2*}, \psi_2^{2*}, \psi_3^{2*}$), normalized according to (*)




Decomposition

Comment on input normalization

• C'[i] := (C[2i-1] + C[2i]) / sqrt(2), i.e. the average (C[2i-1] + C[2i]) / 2 is multiplied by $\sqrt{2}$

• At the beginning we normalize by $C := \frac{C}{\sqrt{h}}$

Example: Let $h = 2^n$, e.g. $h = 2^4$; basis functions $(\phi_0^0; \psi_0^0; \dots; \psi_7^3)$. Then the highest superscript is n - 1, e.g. 3

• We normalize first: $C := \frac{C}{\sqrt{h}} = \frac{C}{\sqrt{2^n}} = \frac{C}{2^{n/2}}$

• Then in the first run we multiply the first detail by $\sqrt{2}$; the coefficient is then normalized by $\left( \frac{C}{2^{n/2}} \right) \cdot \sqrt{2} = \frac{C}{2^{(n-1)/2}}$

• The same holds for the averages. In successive runs the other detail levels and the global average are correctly evaluated by means of successive multiplications with $\sqrt{2}$




2D Image Compression

2D-image compression with 2D Haar basis functions (generalization of the corresponding technique for 1D images)

• Step 1: Compute coefficients c1, ..., cm which represent an image in a normalized Haar basis (Image = c1·u1(x,y) + c2·u2(x,y) + ... + cm·um(x,y))

• Step 2: Sort c1, ..., cm in order of decreasing magnitude. Result: $c_{\sigma_1}, \dots, c_{\sigma_m}$

• Step 3: Find the smallest k with $\sum_{i=k+1}^{m} (c_{\sigma_i})^2 \le \varepsilon^2$, where ε is the allowed error regarding the L2 norm, i.e. $\| f - \tilde f \| \le \varepsilon$




Decomposition

A fast procedure (by binary search) for finding the threshold below which coefficients are negligible

proc Compress(C: array[1..m] of reals, ε: real)
  τmin := min{|C[i]|}
  τmax := max{|C[i]|}
  do
    τ := (τmin + τmax)/2
    s := 0
    for i := 1 to m do
      if |C[i]| < τ then s := s + (C[i])^2
    end for
    if s < ε^2 then
      τmin := τ
    else
      τmax := τ
  until τmin ≈ τmax
  for i := 1 to m do
    if |C[i]| < τ then C[i] := 0
  end for
end proc
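The binary search can be sketched in Python as follows. Two choices are assumptions of this sketch rather than part of the procedure above: the loop stops when the interval is shorter than a `tolerance` parameter, and the final cut uses τmin (for which the squared error is known to be below ε²) instead of the last midpoint τ. The function returns a new list instead of overwriting its input.

```python
def compress(coefficients, epsilon, tolerance=1e-9):
    """Zero the small coefficients so that the squared L2 error stays below epsilon**2."""
    magnitudes = [abs(c) for c in coefficients]
    tau_min, tau_max = min(magnitudes), max(magnitudes)
    while tau_max - tau_min > tolerance:
        tau = (tau_min + tau_max) / 2
        # Squared error if every coefficient smaller in magnitude than tau is discarded.
        squared_error = sum(m * m for m in magnitudes if m < tau)
        if squared_error < epsilon ** 2:
            tau_min = tau      # error still small: continue with the right half
        else:
            tau_max = tau      # error too large: continue with the left half
    return [c if abs(c) >= tau_min else 0.0 for c in coefficients]

coeffs = [6.0, 0.0, 2.0, 1.0, 0.5, -0.5, -0.5, 0.5]
print(compress(coeffs, 1.1))
print(compress(coeffs, 0.1))
```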




Decomposition

• This procedure starts with coefficients C[1], …, C[m] and finds the smallest $\tilde m$ for which

$$\sum_{i=\tilde m+1}^{m} (c_{\sigma_i})^2 \le \varepsilon^2$$

where $\sigma_1, \dots, \sigma_m$ is a permutation of 1, …, m such that $|c_{\sigma_1}| \ge |c_{\sigma_2}| \ge \dots \ge |c_{\sigma_m}|$ and where ε is a tolerable L2 error

The algorithm works as follows:

1. It starts with a threshold τ which is the average of the smallest and the largest coefficient magnitude

2. It computes the squared L2 error that results if all coefficients smaller in magnitude than τ are discarded

3. If “squared error < ε²” then we continue with the right half of the interval

4. If “squared error ≥ ε²” then we continue with the left half of the interval


Decomposition

Let (without any loss of generality) c1, c2, …, cm be already sorted such that |c1| (τmax) ≥ |c2| ≥ … ≥ |cm| (τmin)

Let s := “squared error by discarding all coefficients smaller than τ, i.e. all coefficients on the left side of τ”

Case 1: s < ε² (still very small error?)

We could (possibly) discard even more than those coefficients. Hence we drop the part of the interval on the left of τ: we set τnew,min := τ and restart with the new interval [τ, τmax].

Case 2: s ≥ ε² (error too large?)

By discarding the elements left of the current τ we would discard too many elements. The elements right of τ must be kept anyway. Set τnew,max := τ and restart with the new interval [τmin, τ].




Multiresolution Analysis

Multiresolution analysis means analyzing a signal at different frequencies giving different resolutions

• Consider a nested set of vector spaces $V^0 \subset V^1 \subset V^2 \subset \dots$
• Basis functions of $V^j$ are sometimes called scaling functions
• $W^j$ := orthogonal complement of $V^j$ in $V^{j+1}$ [orthogonal according to some definition of an “inner product“]
• The functions which we choose as a basis for $W^j$ are called wavelets

• Example (Haar wavelets):

[Figure: the Haar wavelets $\psi_0^2, \psi_1^2, \psi_2^2, \psi_3^2$ on [0, 1]]

• Matrix notation: If the bases of $V^j$ and $W^j$ consist of $m_j$ and $n_j$ elements respectively, then we may combine the basis functions into single row matrices

$$\Phi^j(x) := [\phi_0^j(x), \dots, \phi_{m_j - 1}^j(x)]; \qquad \Psi^j(x) := [\psi_0^j(x), \dots, \psi_{n_j - 1}^j(x)]$$




Matrix Formulation

Some immediate consequences:

1. $V^j \oplus W^j = V^{j+1}$; $W^j$ orthogonal to $V^j$ → $m_j + n_j = m_{j+1}$

2. $V^{j-1} \subset V^j$ → The elements of $V^{j-1}$ must be expressible as linear combinations of the “finer“ functions of $V^j$, i.e. the scaling functions must be “refinable”

Example:

[Figure: scaling functions of $V^1$: $\phi_0^1, \phi_1^1$; scaling functions of $V^2$: $\phi_0^2, \phi_1^2, \phi_2^2, \phi_3^2$]

We have:

$$\phi_0^1 = \phi_0^2 + \phi_1^2, \qquad \phi_1^1 = \phi_2^2 + \phi_3^2$$

There must be a matrix $P^j$ with

$$(a) \qquad \Phi^{j-1}(x) = \Phi^j(x) \cdot P^j$$

where $P^j$ is an $m_j \times m_{j-1}$ matrix


Matrix Formulation

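3. $W^{j-1} \subset V^j$ → Elements of $W^{j-1}$ can be written as linear combinations of scaling functions of $V^j$

→ There is an $m_j \times n_{j-1}$ matrix $Q^j$ with

$$(b) \qquad \Psi^{j-1}(x) = \Phi^j(x) \cdot Q^j$$

Example:

$$\Phi^1(x) = \Phi^2(x) \cdot P^2 \ \text{ and } \ \Psi^1(x) = \Phi^2(x) \cdot Q^2 \quad \text{with: } P^2 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \text{ and } Q^2 = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix}$$

4. (a) and (b) can be combined to:

$$\left[ \Phi^{j-1} \mid \Psi^{j-1} \right] = \Phi^j \cdot \left[ P^j \mid Q^j \right]$$

In our example:

$$\left[ \phi_0^1\ \phi_1^1\ \psi_0^1\ \psi_1^1 \right] = \left[ \phi_0^2\ \phi_1^2\ \phi_2^2\ \phi_3^2 \right] \cdot \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{pmatrix}$$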




Matrix Formulation

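5. All functions of $\Phi^{j-1}(x)$ must be orthogonal to all functions of $\Psi^{j-1}(x)$.

→ $\left[ \langle \Phi^{j-1} \mid \Psi^{j-1} \rangle \right] = 0$ (matrix whose (k,l) entry is $\langle \phi_k^{j-1} \mid \psi_l^{j-1} \rangle$)

→ (from (b)): $\left[ \langle \Phi^{j-1} \mid \Phi^j \rangle \right] \cdot Q^j = 0$

6. $\left[ \langle \Phi^{j-1} \mid \Phi^j \rangle \right] \cdot Q^j = 0$ is a homogeneous system of equations. The set of solutions is called the “null space“ of $\left[ \langle \Phi^{j-1} \mid \Phi^j \rangle \right]$.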




Matrix Formulation

The columns of $Q^j$ are a basis for the null space. There are many alternatives for $Q^j$, thus there are many different wavelet bases for a given $W^j$. The Haar wavelets are defined by the additional requirement that the number of consecutive nonzero values in $Q^j$ per column is minimal:

$$Q^2 = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix}$$

Definition
• Orthogonal wavelet basis: all functions (wavelets + scaling functions) are orthogonal to each other
• Semi-orthogonal wavelet basis: wavelets are orthogonal to the scaling functions but not to each other

• The Haar basis is orthogonal; the spline basis is semi-orthogonal


Matrix Formulation

A matrix notation for the approximation of functions

• We have shown earlier that (and how) it is possible to approximate a function f(x) in $V^j$ by

$$f(x) = c_0^j \cdot u_0(x) + \dots + c_{m_j - 1}^j \cdot u_{m_j - 1}(x)$$

where $u_i(x)$ are basis functions of $V^j$

• The coefficients $c_i^j$ might be, for example, pixel colors or y-coordinates of the function f(x)
• We can write the coefficients $c_i^j$ as

$$C^j = \left[ c_0^j, \dots, c_{m_j - 1}^j \right]^T$$

$$f(x) = U^j \cdot C^j \quad \text{where} \quad U^j = [u_0(x), \dots, u_{m_j - 1}(x)]$$




Matrix Formulation

• We may want to express f(x) within $V^{j-1}$ (a lower resolution version, i.e. less accurate, with a lower number of coefficients, i.e. $m_{j-1}$ instead of $m_j$ coefficients)

• The standard procedure of creating the $m_{j-1}$ coefficients of $C^{j-1}$ out of the $m_j$ coefficients of $C^j$ consists of:
  - linear filtering
  - down sampling

• This may be expressed as a matrix equation: $C^{j-1} = A^j \cdot C^j$, where $A^j$ is an $m_{j-1} \times m_j$ matrix

• Since $C^{j-1}$ is smaller than $C^j$, this “filtering process“ loses some detail.

The lost detail may be expressed in a second matrix $D^{j-1}$ with $D^{j-1} = B^j \cdot C^j$, where $B^j$ is an $n_{j-1} \times m_j$ matrix


Matrix Formulation

$C^{j-1} = A^j \cdot C^j$ and $D^{j-1} = B^j \cdot C^j$

If $A^j$ and $B^j$ are appropriately chosen, $C^j$ may be recovered from $C^{j-1}$ and $D^{j-1}$: $C^j = P^j \cdot C^{j-1} + Q^j \cdot D^{j-1}$

• The process of calculating $C^{j-1}$ and $D^{j-1}$ from $C^j$ is called decomposition of $C^j$ or analysis

• The process of calculating $C^j$ from $C^{j-1}$ and $D^{j-1}$ is called reconstruction of $C^j$ or synthesis

The global procedure looks as follows:

  C^j --A^j--> C^(j-1) --A^(j-1)--> C^(j-2) ... C^1 --A^1--> C^0
  C^j --B^j--> D^(j-1),  C^(j-1) --B^(j-1)--> D^(j-2),  ...,  C^1 --B^1--> D^0

where $C^{j-1}$ is the low resolution version of $C^j$ and $D^{j-1}$ is the detail of $C^j$ which cannot be expressed by $C^{j-1}$

This recursive procedure is called a “filter bank“

The coefficients $C^j$ can be reconstructed from: $C^0, D^0, D^1, D^2, \dots, D^{j-1}$

This sequence is called the wavelet transform. It has the same size as $C^j$.




Example

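For the Haar basis of $V^2$ we have

$$A^2 = \frac{1}{2} \cdot \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \qquad B^2 = \frac{1}{2} \cdot \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}$$

• $A^2$ is the averaging operation: $C^1 = A^2 \cdot C^2$

• $B^2$ is the differencing operation (this explains the factor ½): $D^1 = B^2 \cdot C^2$

• A general relation which must be satisfied by the matrices $A^j$ and $B^j$ is:

$$\left[ \Phi^{j-1} \mid \Psi^{j-1} \right] \cdot \begin{bmatrix} A^j \\ B^j \end{bmatrix} = \Phi^j$$

• Thus, since $\left[ \Phi^{j-1} \mid \Psi^{j-1} \right] = \Phi^j \left[ P^j \mid Q^j \right]$ (shown earlier), we must have

$$\begin{bmatrix} A^j \\ B^j \end{bmatrix} = \left[ P^j \mid Q^j \right]^{-1}$$

i.e. $\begin{bmatrix} A^j \\ B^j \end{bmatrix}$ must be invertible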
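One analysis and synthesis step with the Haar matrices of this example can be checked with a small Python sketch. The helper names `mat_vec`, `analyze` and `synthesize` are hypothetical; the matrices are the A2, B2, P2 and Q2 of the Haar basis of V2.

```python
def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

A2 = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]      # averaging
B2 = [[0.5, -0.5, 0, 0], [0, 0, 0.5, -0.5]]    # differencing
P2 = [[1, 0], [1, 0], [0, 1], [0, 1]]
Q2 = [[1, 0], [-1, 0], [0, 1], [0, -1]]

def analyze(c):
    """Decomposition: C1 = A2 * C2 and D1 = B2 * C2."""
    return mat_vec(A2, c), mat_vec(B2, c)

def synthesize(c_low, d):
    """Reconstruction: C2 = P2 * C1 + Q2 * D1."""
    return [p + q for p, q in zip(mat_vec(P2, c_low), mat_vec(Q2, d))]

c1, d1 = analyze([8, 4, 7, 5])
print(c1, d1, synthesize(c1, d1))
```

Applied to the averages 8 4 7 5 of the string example, the step yields the averages 6 6 and the details 2 1, and the synthesis step restores the input.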





How to choose a suitable technique for transformation? “Suitable“ for a particular application.

The following steps have to be made:

1. Select the scaling functions $\Phi^j(x)$ (j = 0, 1, ...) (this defines $V^j$ and $P^j$)

2. Select an inner product for $V^0, V^1, \dots$ (this defines the L2 norm as well as $W^j$)

3. Select a set of wavelets $\Psi^j(x)$ which are a basis for $W^j$ (j = 0, 1, ...) (this defines $Q^j$)

Remark: $P^j$ and $Q^j$ define $A^j$ and $B^j$ since

$$\begin{bmatrix} A^j \\ B^j \end{bmatrix} = \left[ P^j \mid Q^j \right]^{-1}$$





It is desirable for reasons of data compression that:

1. Wavelets are an orthogonal basis for $W^j$

2. Wavelets have a small support (the support of a function f(x) is the set of points where f(x) ≠ 0)

But: orthogonality very often comes with “large support“. Thus it may be better to “sacrifice“ the orthogonality. An example of that are the spline wavelets which:
  - have minimum support
  - are not orthogonal to each other (except for degree = 0)




B-Spline Wavelets

Haar type wavelets have many advantages:
• simplicity
• orthogonality
• very small supports
• non-overlapping scaling functions for a given level j
• non-overlapping wavelets for a given level j

However, Haar transforms are not well suited for animation and for curve editing since they do not have enough “smoothness“

• Therefore, we are interested in wavelets which have several continuous derivatives
• Such wavelets can be derived from piecewise polynomial splines (B-splines)
• B-spline scaling functions and B-spline wavelets can be defined for different degrees d = 0, d = 1, d = 2, ...
• The higher the degree, the smoother the transform
• With degree d, the scaling functions have d - 1 continuous derivatives
• d = 0 is the Haar transform




Non-Uniform B-Spline Scaling Functions

The nonuniform B-spline basis functions for degree d are constructed as follows:

• Choose positive integers k and d (k ≥ d) and values x_0, ..., x_{k+d+1}. These values are called knots

• Then we define recursively the non-uniform B-spline basis functions of degree d:

$$N_i^0(x) := \begin{cases} 1 & x_i \le x < x_{i+1} \\ 0 & \text{otherwise} \end{cases}$$

$$N_i^r(x) := \frac{x - x_i}{x_{i+r} - x_i}\, N_i^{r-1}(x) + \frac{x_{i+r+1} - x}{x_{i+r+1} - x_{i+1}}\, N_{i+1}^{r-1}(x) \qquad (i = 0, \ldots, k;\ r = 1, \ldots, d)$$

Remark: If the denominator is 0 then the whole term is defined to be zero
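The recursive definition can be sketched in a few lines of Python. This is only an illustration, not part of the lecture material: the function name `bspline_basis` and the representation of the knots as a plain list are assumptions, and the knots are assumed to be in nondecreasing order.

```python
def bspline_basis(i, r, x, knots):
    """Value of the B-spline basis function N_i^r at x (hypothetical helper).

    knots is the list [x_0, ..., x_{k+d+1}], assumed to be nondecreasing.
    """
    if r == 0:
        # Degree 0: indicator function of the half-open interval [x_i, x_{i+1})
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0

    def term(numerator, denominator, value):
        # Remark from the slide: a term with denominator 0 is defined to be zero
        return 0.0 if denominator == 0 else numerator / denominator * value

    left = term(x - knots[i], knots[i + r] - knots[i],
                bspline_basis(i, r - 1, x, knots))
    right = term(knots[i + r + 1] - x, knots[i + r + 1] - knots[i + 1],
                 bspline_basis(i + 1, r - 1, x, knots))
    return left + right


# Degree-1 "hat" function on the knots 0, 1, 2 (peak at x = 1)
hat_knots = [0.0, 1.0, 2.0, 3.0]
print(bspline_basis(0, 1, 0.5, hat_knots), bspline_basis(0, 1, 1.0, hat_knots))
```

The plain recursion recomputes lower-degree values many times; for the small degrees discussed here (d ≤ 3) that is acceptable, and it keeps the code close to the formula.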




Uniform B-Spline Scaling Functions

• Endpoint-interpolating B-splines of degree d on the interval [0,1] are defined by setting the first d + 1 knots to 0 and the last d + 1 knots to 1

• The functions N_0^d(x), ..., N_k^d(x) form a basis for the space of piecewise polynomials of degree d with d-1 continuous derivatives

• Finally, uniformly spaced B-splines are constructed by selecting k = 2^j + d - 1 and by making the interior knots x_{d+1}, ..., x_k equally spaced.

• This gives 2^j + d B-spline functions N_0^d(x), ..., N_{2^j+d-1}^d(x), which are a basis for a particular degree d and a level j, i.e. for V^j(d).

• If V^j(d) denotes the space which is spanned by the B-spline scaling functions of degree d with 2^j uniform intervals, then the spaces V^0(d), V^1(d), ... are nested: V^0(d) ⊂ V^1(d) ⊂ V^2(d) ⊂ ...
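As a small illustration of the construction above, the knot sequence for a given level j and degree d can be built as follows. The function name `uniform_endpoint_knots` is an assumption made for this sketch.

```python
def uniform_endpoint_knots(j, d):
    """Knots x_0, ..., x_{k+d+1} on [0, 1] for level j and degree d (hypothetical helper)."""
    intervals = 2 ** j
    k = intervals + d - 1
    # Interior knots x_{d+1}, ..., x_k: equally spaced, 2^j - 1 of them
    interior = [n / intervals for n in range(1, intervals)]
    # First d + 1 knots are 0, last d + 1 knots are 1 (endpoint interpolation)
    knots = [0.0] * (d + 1) + interior + [1.0] * (d + 1)
    assert len(knots) == k + d + 2
    return knots


knots = uniform_endpoint_knots(j=1, d=2)
print(knots)                # seven knots: 0, 0, 0, 0.5, 1, 1, 1
print(len(knots) - 2 - 1)   # number of basis functions k + 1 = 2^j + d
```

With k + d + 2 knots there are k + 1 = 2^j + d basis functions, which matches the count stated in the bullets.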




B-Spline Scaling Functions

Example: j = 1, i.e. 2^1 = 2 subintervals [0 : 1/2), [1/2 : 1); degrees d = 0, d = 1, d = 2, d = 3

[Figure: B-spline scaling functions on [0, 1], one panel per degree. Degree 0: N_0^0, N_1^0. Degree 1: N_0^1, N_1^1, N_2^1. Degree 2: N_0^2, ..., N_3^2. Degree 3: N_0^3, ..., N_4^3]

The functions N_i^d(x) have d - 1 continuous derivatives. N_0^d(x), ..., N_{2^j+d-1}^d(x) = N_{d+1}^d(x) are a basis for V^1(d)




Scaling and Wavelet Functions for V3(d)

In the following we show B-spline scaling functions for j = 3 and for degrees

• d = 0 (Haar wavelet functions)
• d = 1 (linear splines)
• d = 2 (quadratic splines)
• d = 3 (cubic splines)

We get 2^j + d = 8 + d scaling functions and 2^j = 8 wavelets for each case

• The wavelets are determined by matrices which satisfy

$$\left[\langle \Phi^{j-1} \mid \Phi^{j} \rangle\right] \cdot Q^{j} = 0$$

• The solution of this equation system is not unique




B-Spline Scaling Functions for V3(0)

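Degree d = 0 (Haar wavelets): 8 scaling functions, 8 wavelets. Not a continuous function:

$$\lim_{\varepsilon \to 0} f(x + \varepsilon) \ne \lim_{\varepsilon \to 0} f(x - \varepsilon) \quad \text{if } x = \tfrac{1}{8}, \tfrac{2}{8}, \ldots, \tfrac{7}{8}$$

[Figure: the 8 scaling functions and the 8 wavelets, plotted over the interval from 0 to 1]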





Degree d = 1 (linear B-spline wavelets)







Degree d = 2 (quadratic B-spline) - can be continuously differentiated once






B-Spline Scaling Functions for V3(d)

Degree d = 3 (cubic B-spline) - can be differentiated continuously two times:


There are 2^j + d scaling functions if the unit interval [0:1) is subdivided into 2^j subintervals. The scaling functions have d-1 continuous derivatives




Wavelets in JPEG2000

Using wavelets, images can be compressed better than with DCT. Used in compression here: nonstandard decomposition




Decomposition

[Figure: decomposition of an image, with the regions labelled as follows]

• Coarser details
• Horizontal information (fine details)
• Vertical information (fine details)
• Diagonal information (finest details)
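The split into coarser details plus horizontal, vertical and diagonal detail can be illustrated with one decomposition step on 2 x 2 pixel blocks. The sketch below is an assumption-laden illustration: it uses plain Haar averaging and differencing because it is the simplest case, not the wavelet filters that JPEG2000 actually specifies, and the function name `haar_step_2d` is hypothetical.

```python
def haar_step_2d(image):
    """One Haar-style decomposition step on a grey image with even side lengths.

    Returns four quarter-size arrays: coarse averages, left-right differences,
    top-bottom differences and diagonal differences.
    """
    half_rows, half_cols = len(image) // 2, len(image[0]) // 2
    coarse = [[0.0] * half_cols for _ in range(half_rows)]
    diff_lr = [[0.0] * half_cols for _ in range(half_rows)]
    diff_tb = [[0.0] * half_cols for _ in range(half_rows)]
    diff_diag = [[0.0] * half_cols for _ in range(half_rows)]
    for r in range(half_rows):
        for c in range(half_cols):
            a, b = image[2 * r][2 * c], image[2 * r][2 * c + 1]
            e, f = image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]
            coarse[r][c] = (a + b + e + f) / 4       # coarser details
            diff_lr[r][c] = (a - b + e - f) / 4      # change from left to right
            diff_tb[r][c] = (a + b - e - f) / 4      # change from top to bottom
            diff_diag[r][c] = (a - b - e + f) / 4    # diagonal change
    return coarse, diff_lr, diff_tb, diff_diag


coarse, diff_lr, diff_tb, diff_diag = haar_step_2d([[4, 0], [4, 0]])
print(coarse, diff_lr, diff_tb, diff_diag)
```

Applying the same step again to the `coarse` array yields the next, coarser level. The top-left pixel of each block can be recovered as the sum of the four values, so no information is lost before quantisation.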




JPEG vs. JPEG2000

[Figure: three versions (a), (b), (c) of the same image]

(a) Original image, 256x256 pixels, 24-bit RGB

(b) JPEG, compressed with compression ratio 43:1

(c) JPEG2000, compressed with compression ratio 43:1




But…

Disadvantages of using wavelets

• The cost of computing may be higher than for DCT
• The use of larger basis functions or wavelet filters produces blurring and ringing noise near edge regions in images or video frames
• Longer compression time
• Lower quality than JPEG at low compression rates




Graphics Format

Graphics image formats are specified through:

• Graphics primitives: lines, rectangles, circles, ellipses, text strings (2D), polyhedra (3D), vectors

• Attributes: line style, line width, color (these affect how a primitive is drawn)

Graphics primitives and their attributes raise the level of image representation: the image is described abstractly rather than pixel by pixel. The graphics package determines which primitives are supported.

• Formats
  - IGES (Initial Graphics Exchange Specification), for transferring 2D/3D CAD data
  - DXF: developed for 2D/3D AutoCAD data
  - HPGL (Hewlett-Packard Graphics Language), developed for addressing plotters

• Also possible: mixed formats (vector and raster graphics), e.g. WMF (Windows Metafile)




Raster and Graphics Formats

Graphics (vector) format

Raster format

Advantages of graphics format

• Reduction of the graphical image data

• Easier manipulation of graphical images (enlarging, scaling, ...)

Disadvantages of graphics format

• Higher computation effort for representation on a monitor – transformation to a raster representation




Image Synthesis

Image synthesis (generation) is an integral part of all computer graphical user interfaces and indispensable for visualising 2D, 3D and higher dimensional objects. E.g.:

• Graphical User Interface (GUI)
  - Desktop windows system with icons and menu items
  - Office Automation and Electronic Publishing: desktop publishing, Hypermedia systems
  - Simulation and Animation for Scientific Visualisation and Entertainment

• Pictures can be dynamically varied by adjusting the animation speed, the portion of the total scene in view, the amount of detail shown, etc.
  - Motion Dynamics: objects are moved with respect to a stationary or moving observer, e.g. flight simulator
  - Update Dynamics: objects being viewed are changed in shape, color, or other properties, e.g. deformation of an in-flight aeroplane structure




Components of Interactive Graphics Systems

• Application model
  - Represents data or objects to be pictured (stored in an application database)
  - Stores graphics image formats and connectivity relationships of the components
  - Should be application specific and independent of any particular display system
  - Converts image database representations to the graphics system format

• Application program
  - Handles user inputs by sending commands to the graphics system describing what to display and how these objects should appear

• Graphics system
  - Intermediary component between application programs and the display
  - Performs an output transformation on objects in the application model
  - Performs an input transformation on user actions to the application
  - Consists of output subroutines collected in a graphics package to display images

• Graphics hardware
  - Receives input from interaction devices and outputs images to the display device




Input: mouse, keyboard, data tablet, touch-sensitive panel on the screen (2D input); track-balls, space-balls, data glove, etc. (3D and higher-dimensional input)

Output: raster display

[Figure: block diagram of the graphics hardware. Components: Computer-Host Interface, Display Controller (DC), Refresh Buffer (holding the sampled image as a bit pattern), Video Controller, Mouse, Keyboard. Data flows: Display Commands, Interaction Data]

Graphics Hardware




[Figure: raster scan pattern. Labels: start of raster scan, raster lines, pixel, horizontal retrace, vertical retrace]

• To avoid flickering of images, a 60 Hz (or higher) refresh rate is recommended
• For each pixel, the beam's intensity is set to reflect the pixel's intensity
• In color screens, three beams are controlled (red, green, blue)
• Monochrome displays use achromatic light that is determined by the quality of the light (intensity and luminance)

Raster Display




Object Representation - Bresenham Line

Problem: an exact representation of a geometrical object on a raster display is impossible! How to determine a good approximation by choosing "near" pixels?

→ Bresenham Algorithm. Determine the pixels nearest to a real line as follows:

Construct a line from left to right with gradient 0 ≤ m ≤ 1 (≤ 45°); all other lines are achieved through reflections.

• Next pixel is the one in the east: d_{j+1} ≤ 1/2 ⇒ y_{j+1} = y_j
• Next pixel is the one in the north-east: d_{j+1} > 1/2 ⇒ y_{j+1} = y_j + 1

[Figure: two pixel-grid sketches with unit spacing, one per case; the sketches also mark the offset at the current pixel, d_j = 0 and 0 < d_j ≤ 1/2]




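Let Y_i be the true value of the curve, then Y_{i+1} = y_i + d_{i+1} with 0 ≤ d_{i+1} < 1, y_i ∈ ℤ, Y_i ∈ ℝ

[Figure: a line crossing the pixel columns i, i+1, i+2, i+3 on a unit grid, with true values Y_i, ..., Y_{i+3}, chosen pixels y_i, ..., y_{i+3} and offsets d_i, ..., d_{i+3}]

$$\begin{aligned}
d_{i+1} &= d_i + m \\
d_{i+2} &= d_{i+1} + m = d_i + 2m \\
d_{i+3} &= d_{i+2} + m - 1 = d_i + 3m - 1 \quad \left(\text{since } d_{i+2} > \tfrac{1}{2}\right)
\end{aligned}$$

$$\begin{aligned}
y_{i+1} &= y_i \\
y_{i+2} &= y_{i+1} + 1 \quad \left(\text{since } d_{i+2} > \tfrac{1}{2}\right) \\
y_{i+3} &= y_{i+2} \quad \left(\text{since } d_{i+3} \le \tfrac{1}{2}\right)
\end{aligned}$$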




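Y_{i+1} = Y_i + m; Y_{i+1} = y_i + d_{i+1}

⇒ y_i + d_{i+1} = y_{i-1} + d_i + m

$$d_{i+1} = d_i + m + (y_{i-1} - y_i) = \begin{cases} d_i + m & \text{if } y_i = y_{i-1} \\ d_i + m - 1 & \text{if } y_i = y_{i-1} + 1 \end{cases} = \begin{cases} d_i + m \ (\text{and } y_i = y_{i-1}) & \text{if } d_i \le \tfrac{1}{2} \\ d_i + m - 1 \ (\text{and } y_i = y_{i-1} + 1) & \text{if } d_i > \tfrac{1}{2} \end{cases}$$

Changes of y_i:

$$y_{i+1} = \begin{cases} y_i & \text{if } d_{i+1} \le \tfrac{1}{2} \\ y_i + 1 & \text{if } d_{i+1} > \tfrac{1}{2} \end{cases}$$

Easier calculation by D instead of d as follows:

$$D_i = \left(d_i - \tfrac{1}{2}\right) \cdot 2\Delta x, \qquad D_0 = -\Delta x, \qquad M = m \cdot 2\Delta x = 2\Delta y$$

$$D_{i+1} = \begin{cases} D_i + M & \text{if } D_i \le 0 \\ D_i + M - 2\Delta x & \text{if } D_i > 0 \end{cases} \quad \text{and} \quad y_{i+1} = \begin{cases} y_i & \text{if } D_{i+1} \le 0 \\ y_i + 1 & \text{if } D_{i+1} > 0 \end{cases}$$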




Example:

$$m = 0.2 \;\rightarrow\; \Delta x = 1,\ \Delta y = 0.2, \qquad M = 2\Delta y = 0.4, \qquad D_0 = -\Delta x = -1$$

x_i   1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17
y_i   0      0      0      1      1      1      1      1      2      2      2      2      2      3      3      3      3
D_i   -1.00  -0.60  -0.20  0.20   -1.40  -1.00  -0.60  -0.20  0.20   -1.40  -1.00  -0.60  -0.20  0.20   -1.40  -1.00  -0.60

[Figure: the chosen pixels y_i (0 to 3) plotted against x_i (1 to 17)]
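The recurrence for D translates directly into an integer-only loop. The following is a minimal sketch under stated assumptions: the function name `bresenham_line` is hypothetical, the endpoints are integer pixel coordinates, and only the case 0 ≤ m ≤ 1 drawn from left to right is handled (other lines would be obtained by reflection, as the slides note).

```python
def bresenham_line(x0, y0, x1, y1):
    """Pixels of a line with slope 0 <= m <= 1, drawn from left to right."""
    dx, dy = x1 - x0, y1 - y0
    if dx <= 0 or not 0 <= dy <= dx:
        raise ValueError("only lines with 0 <= m <= 1 from left to right are handled")
    big_m = 2 * dy          # M = 2 * delta y
    d = -dx                 # D_0 = -delta x
    y = y0
    pixels = [(x0, y0)]
    for x in range(x0 + 1, x1 + 1):
        # D_{i+1} = D_i + M, minus 2 * delta x if the previous step went north-east
        d = d + big_m - 2 * dx if d > 0 else d + big_m
        if d > 0:           # y_{i+1} = y_i + 1 if D_{i+1} > 0
            y += 1
        pixels.append((x, y))
    return pixels


# Slope 0.2 as in the example: delta x = 15, delta y = 3
print([y for _, y in bresenham_line(1, 0, 16, 3)])
```

Scaling by delta x keeps every value an integer, so no comparison against 1/2 and no floating-point arithmetic is needed. For the line from (1, 0) to (16, 3) the y values agree with the first 16 columns of the example table.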





Dithering is a kind of cheating: it achieves more than two intensity levels when only two intensity levels can be represented, or increases the apparent color depth. It exploits the human eye's inability to distinguish neighbouring pixels when the resolution is fine enough.

• Dispersed-dot ordered dithering (monochrome dithering) is used e.g. to extend the number of available colors of displays (at the expense of resolution).
  Example: consider 3 bits (red, green, blue) per pixel and a 2 x 2 pattern area. Each pattern can display 5 intensities for each color, resulting in 5 x 5 x 5 = 125 colors.

• Halftoning (clustered-dot ordered dithering) uses the human eye's weakness to produce different intensity levels, e.g. used by laser printers that are not able to display individual dots.
  Example: 2 x 2 dither patterns (5 intensity levels) represented in dither matrices

Dithering
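A 2 x 2 dither matrix with 5 intensity levels can be sketched as follows. The concrete matrix values and the function names `dither_pattern` and `ordered_dither` are assumptions chosen for illustration; the slides only state that 2 x 2 patterns give 5 levels.

```python
# Assumed 2 x 2 dither matrix: a pixel is switched on if the level exceeds its entry
DITHER_2X2 = [[0, 2],
              [3, 1]]


def dither_pattern(level):
    """2 x 2 on/off pattern for an intensity level 0..4 (exactly `level` pixels on)."""
    if not 0 <= level <= 4:
        raise ValueError("level must be between 0 and 4")
    return [[1 if level > threshold else 0 for threshold in row] for row in DITHER_2X2]


def ordered_dither(image, max_value=255):
    """Turn a grey image (list of rows, values 0..max_value) into a two-level image."""
    result = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            level = value * 5 // (max_value + 1)    # map the grey value to 0..4
            out_row.append(1 if level > DITHER_2X2[y % 2][x % 2] else 0)
        result.append(out_row)
    return result


for level in range(5):
    print(level, dither_pattern(level))
```

Applied to each of the three color channels separately, the 5 levels per channel give the 5 x 5 x 5 = 125 colors from the example above.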




Pixel information is very often insufficient! There is no information about:
• shape of an object
• orientation of an object
• three-dimensional interpretation of an object

→ Image Analysis (Image Processing) is required.

Some applications or special topics are:
• Image enhancement
• Pattern recognition
• Scene analysis
• Computer vision

Example: Traffic scenes taken by a camera installed in a car
Problems:
- Is there a traffic sign visible? Which traffic sign?
- Is a moving car in front of our car? Which type of car?
- What is its speed relative to ours?

Image Analysis




Image analysis is concerned with techniques for extracting descriptions from images:
• Computation of perceived brightness and color
• Partial or complete recovery of 3D data in a scene
• Location of discontinuities corresponding to objects in a scene
• Characterisation of the properties of uniform regions in an image

Image Enhancement
• Improves image quality by eliminating noise (extraneous or missing pixels) or by enhancing contrast, e.g. X-ray images, computerized axial tomography (CAT)

Scene Analysis and Computer Vision
• Deals with recognising and reconstructing 3D models of a scene from several 2D images, e.g. industrial robot sensing (relative sizes, shapes, positions, colors)

Pattern Detection and Recognition
• Deals with detecting and clarifying standard patterns and finds distortions from these patterns, e.g. Optical Character Recognition (OCR)
• Static character recognition (OCR), dynamic recognition (handwriting)

Image Analysis




Turning radar detector:

000000000000000000000000111111111111111111100000000000000000000000000000000

(the run of 1s is an object)

Imperfect radar: some 0/1 mistakes:

000000010000110000000000111111001101111110010000000001000001110000000000000

Elimination of mistakes by digital filtering

Simplest version: symmetrical (2m-1, m) filter

$$a_1 a_2 \ldots a_n \;\rightarrow\; a'_1 a'_2 \ldots a'_n \quad \text{with} \quad a'_h = \begin{cases} 1 & \text{if } \sum_{i=h-m+1}^{h+m-1} a_i \ge m \\ 0 & \text{else} \end{cases}$$

Example: Filtering with (5,3)-filter

000001000010001110110110111100010

→ 000000000000001111111111111100000

Target Detection
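The symmetrical (2m-1, m) filter is a sliding majority vote and can be sketched as below. The function name `majority_filter` and the treatment of positions beyond the ends of the sequence (counted as 0) are assumptions; the slides do not say how the borders are handled.

```python
def majority_filter(bits, m):
    """Symmetrical (2m-1, m) filter: output 1 where at least m of the 2m-1
    neighbouring input bits are 1. Positions outside the sequence count as 0."""
    n = len(bits)
    filtered = []
    for h in range(n):
        window = bits[max(0, h - m + 1):min(n, h + m)]   # a_{h-m+1} .. a_{h+m-1}
        filtered.append(1 if sum(window) >= m else 0)
    return filtered


noisy = "000001000010001110110110111100010"
cleaned = majority_filter([int(ch) for ch in noisy], m=3)
print("".join(str(bit) for bit in cleaned))
```

With m = 3 this is the (5,3)-filter from the example: isolated 1s disappear and short gaps inside the object are closed.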




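Image Recognition Steps

[Figure: the recognition steps lead from the source of the image to the target image (symbolic representation): Formatting → Conditioning → Labelling → Grouping → Extracting → Matching. The earlier steps turn an unimproved digital image data structure into an improved one; the later steps work on a logical data structure]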




Digital Image Data Structures

• Formatting
  - Capturing of an image and transforming it to a digital representation

• Conditioning
  - Based on a model that assumes that the observed image is composed of an informative pattern modified by uninteresting variations
  - Estimates the informative pattern on the basis of the observed image
  - Suppresses noise and performs background normalization by suppressing uninteresting systematic or patterned variations
  - Applied uniformly and context-independent

• Labelling
  - Based on a model that assumes that the informative pattern has structure as a spatial arrangement of events
  - Determines in what kind of spatial events each pixel participates
  - Labelling operations:
    - edge detection, corner detection
    - thresholding (e.g. filters significant edges and labels them)
    - identification of pixels that participate in various shape primitives




Image Transformation

Digital Image Data Structures are produced through image transformations:

Let I, J, G ⊆ ℕ and f: I × J → G an image. τ: G^{I×J} → G^{I×J} is an image transformation.

Examples

1. Location dependent weakening of intensity (e.g. effect with lens):

$$\tau(f(x,y)) = f(x,y) + \mathrm{intensityfunction}(x,y)$$

2. Threshold transformation (e.g. filtering of greyscales/colors):

Let G := {0, ..., r}, r ∈ ℕ the maximum intensity and s ∈ G the threshold, f: I × J → G an image and t: G → G' (here G' := {0,1}) is defined as:

$$t(i) := \begin{cases} 0 & \text{for } 0 \le i \le s \\ 1 & \text{for } s < i \le r \end{cases}$$

so that h_{t(f)} is the new histogram, where the histogram h_f of an image f is defined as:

$$h_f : G \to \mathbb{N} \quad \text{with} \quad h_f(i) = |f^{-1}(i)| = \text{number of pixels with grey value } i$$

[Figure: step function t(i) plotted over i from 0 to r, jumping from 0 to 1 at the threshold s]
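The threshold transformation and the histogram can be written down almost literally. In this sketch an image is assumed to be a list of rows of integer grey values; the names `histogram` and `threshold` are chosen for illustration only.

```python
from collections import Counter


def histogram(image):
    """h_f: number of pixels for each grey value that occurs in the image."""
    return Counter(value for row in image for value in row)


def threshold(image, s):
    """Apply t(i) = 0 for i <= s and t(i) = 1 for i > s to every pixel."""
    return [[0 if value <= s else 1 for value in row] for row in image]


image = [[0, 5, 10],
         [3, 7, 10]]
binary = threshold(image, s=5)
print(binary)
print(histogram(image), histogram(binary))
```

The histogram of the thresholded image has at most two entries, one for grey value 0 and one for grey value 1.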




3. Histogram Spreading (e.g. to increase the contrast of an image)

Let G := {0, ..., r}, r ∈ ℕ and f: I × J → G an image.

Assume the image f uses only grey values in the range 0 < a ≤ i ≤ b < r.

With t defined as follows, the whole range of greyscale values of G is used:

$$t(i) := \begin{cases} 0 & \text{for } 0 \le i \le a \\ (i - a) \cdot \dfrac{r}{b - a} & \text{for } a < i < b \\ r & \text{for } b \le i \le r \end{cases}$$

so that h_{t(f)} is the new histogram, t(i) ∈ [0, r]

[Figure: t(i) plotted over i. The range of old values from a to b on the horizontal axis is mapped to the range of new greyscale values from 0 to r on the vertical axis]

Analogous to spreading, the greyscale can be compressed and used only in partial intervals. A combination of both is e.g. useful to manipulate film materials.

Image Transformation
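Histogram spreading is a piecewise linear map of the grey values. The sketch below follows the definition of t(i); rounding the result to the nearest integer is an added assumption, since the definition itself can produce non-integer values, and the function names are hypothetical.

```python
def spread(i, a, b, r):
    """t(i) for histogram spreading, rounded to an integer grey value (assumption)."""
    if i <= a:
        return 0
    if i >= b:
        return r
    return round((i - a) * r / (b - a))


def spread_image(image, a, b, r=255):
    """Apply the spreading function to every pixel of a grey image (list of rows)."""
    return [[spread(value, a, b, r) for value in row] for row in image]


# An image that only uses grey values between a = 50 and b = 200
print(spread_image([[50, 110, 200]], a=50, b=200))
```

After spreading, the darkest used value maps to 0 and the brightest to r, so the contrast between the remaining values increases.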




Grouping
• Identifies events by collecting or identifying maximal connected sets of pixels participating in the same kind of event (neural networks!)
• Determines new sets of entities
• Changes the logical data structure
• Entities of interest after grouping are sets of pixels
• E.g. line-fitting is a grouping operation, where edges are grouped into lines

Extracting
• Computes for each group of pixels a list of properties: centroid, area, orientation, spatial moments, grey tone moments, spatial-grey tone moments, circumscribing circle, inscribing circle, number of holes in a region, average curvature in an arc, etc.
• Measures topological or spatial relationships between two or more groupings, i.e. clarifies whether two groupings touch, are spatially close or layered

Grouping and Extracting
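Grouping and extracting can be illustrated on a small label image. The sketch assumes that labelling has already assigned each pixel an integer event kind (0 meaning no event) and that "connected" means 4-connected; both are assumptions made here, as are the function names.

```python
from collections import deque


def group_pixels(labels):
    """Collect maximal 4-connected sets of pixels that carry the same non-zero label."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0 or seen[r][c]:
                continue
            kind, group, queue = labels[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:                     # breadth-first search over neighbours
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == kind):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            groups.append(group)
    return groups


def extract_properties(group):
    """Two of the listed properties for one group: area and centroid (row, column)."""
    area = len(group)
    return {"area": area,
            "centroid": (sum(y for y, _ in group) / area,
                         sum(x for _, x in group) / area)}


labels = [[1, 1, 0],
          [0, 0, 0],
          [0, 1, 1]]
for group in group_pixels(labels):
    print(extract_properties(group))
```

Each group is a set of pixels, which matches the statement that the entities of interest after grouping are sets of pixels; the extracted properties then describe each set as a whole.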




Matching
• Determines the interpretation of some related set of image events recognised previously in the extracting step
• Associates events with some given 3D objects or 2D shapes
• Template matching is a classical example of a wide variety of matching operations: it compares the examined pattern with known and stored models and chooses the best match

Conclusion
• Conditioning, labelling, grouping, extracting and matching constitute a canonical decomposition of the image recognition problem
• Each step prepares and transforms the data to facilitate the next step
• On any level the transformation is a unit process and data are prepared for the unit transformation to the next higher level
• Depending on the application, the sequence of steps has more than one level of recognition and description process

Matching
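Template matching in its simplest form compares the examined pattern with every stored model and keeps the closest one. The sketch below uses the sum of absolute pixel differences as the distance, which is one possible choice and not prescribed by the slides; the function name `best_match` and the tiny 3 x 3 templates are made up for illustration.

```python
def best_match(pattern, templates):
    """Name of the stored template with the smallest sum of absolute differences."""
    def distance(a, b):
        return sum(abs(p - q)
                   for row_a, row_b in zip(a, b)
                   for p, q in zip(row_a, row_b))
    return min(templates, key=lambda name: distance(pattern, templates[name]))


# Hypothetical 3 x 3 binary templates
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
examined = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]   # an "I" with one extra pixel
print(best_match(examined, templates))
```

The examined pattern and all templates are assumed to have the same size and alignment; real systems normalise position and scale first.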




Detection of License Plates

[Figure: the recognition steps applied to the letter A on a license plate]

• Formatting
• Conditioning: some purification has been done
• Labelling: structures such as corners, edges, ... are detected
• Grouping: a maximal group of events forms a group
• Extracting: properties such as size, orientation, ... are detected
• Matching: meaning detected: it is the letter A




Conclusion

Images are represented by...
• Resolution (pixel x pixel)
• Color depth (bit)
• Color model: greyscale, RGB, CMYK
• Graphics primitives / vectors in graphics image formats
• Lots of image formats, with lossless as well as lossy compression schemes
• JPEG / JPEG2000 as examples of common formats

Image synthesis...
• Is needed to bring an image to the monitor
• Can use dithering to artificially enhance image quality

Image analysis...
• Tries to detect patterns in an image representation
• Can be used for improving image quality by suppressing noise or rescaling
• Can be used to recognize characters / text
