
VIZA 654 / CPSC 646 – The Digital Image
Course Notes

Donald H. House
Visualization Laboratory
College of Architecture
Texas A&M University

2 Sept 2002


Contents

Preface

1 Digital Image Basics
  1.1 What is a Digital Image?
  1.2 Bitmaps and Pixmaps
    1.2.1 Bitmap - the simplest image storage mechanism
    1.2.2 Pixmap - Representing Grey Levels or Color
  1.3 The RGB Color Space

2 Simple Image File Formats
  2.1 Introduction
  2.2 PPM file format
    2.2.1 PPM header block
  2.3 Homework Nuts and Bolts
    2.3.1 OpenGL and PPM
    2.3.2 Logical shifts, and, or
    2.3.3 Dynamic memory allocation
    2.3.4 Reading from the image file
    2.3.5 C command-line interface

3 Display Devices
  3.1 CRT's - Cathode Ray Tubes
    3.1.1 Vacuum tube basics
    3.1.2 CRT's
  3.2 Framebuffers
  3.3 Gamma Correction

4 Color
  4.1 What is Color?
    4.1.1 Physical vs. physiological models
    4.1.2 Color as a contextual phenomenon
    4.1.3 Tri-stimulus theory
  4.2 HSV Color Space
  4.3 CIE Color Space

5 Simple Image File Compression Schemes
  5.1 Run-Length Encoding
    5.1.1 Silicon Graphics Iris RGB File Format
    5.1.2 SoftImage Picture Files
  5.2 Color Tables
  5.3 Lempel–Ziv–Welch Encoding
    5.3.1 The LZW encoding algorithm
    5.3.2 LZW Decoding Algorithm
    5.3.3 GIF - Graphics Interchange File Format

6 The PostScript Page Description Language
  6.1 Basic Notions of the PostScript Language
  6.2 Structure of PostScript Language
  6.3 PostScript Language Syntax and Semantics
  6.4 Execution of PostScript Programs
    6.4.1 Procedure definition and execution
    6.4.2 Control flow
  6.5 Graphics in PostScript
    6.5.1 Graphics State
    6.5.2 Working with Text

7 Compositing
  7.1 Alpha
  7.2 The Over Operator
  7.3 Associated Colors
  7.4 Operations on Associated-Color Images
  7.5 Other Compositing Operations

8 Filtering
  8.1 Global Filters
    8.1.1 Normalization
    8.1.2 Histogram equalization
  8.2 Local Filters
    8.2.1 Construction of local neighborhoods
    8.2.2 Median filter
    8.2.3 Convolution
    8.2.4 Important convolution filters

9 Image Warps
  9.1 Forward Map
  9.2 Inverse Map
  9.3 Warping Artifacts
  9.4 Affine Maps or Warps
  9.5 Composing affine warps
  9.6 Forward and Inverse Maps
  9.7 Perspective Warps

10 Inverse Projective Maps and Bilinear Maps
  10.1 Projective Maps
  10.2 Inverse Projective Maps
  10.3 Bilinear Interpolation
    10.3.1 Algebra of Bilinear Map
  10.4 Scanline Approach to Inverse Mapping
  10.5 Homework Nuts and Bolts: Projective Transformation
    10.5.1 Building the transformation matrix M
    10.5.2 Constructing the output pixmap
    10.5.3 Finding the inverse transformation
    10.5.4 Constructing an edge table for the inverse map
    10.5.5 Painting in the output image
    10.5.6 Data structures and algorithms

11 Filtering to Avoid Artifacts
  11.1 The Warping Process
  11.2 Sampling and Reconstruction
  11.3 Resampling and Aliasing
  11.4 Sampling and Filtering
  11.5 The Warping Pipeline
  11.6 Filtering as Convolution
  11.7 Ideal vs. Practical Filter Convolution Kernels
    11.7.1 The Ideal Low Pass Filter
    11.7.2 Practical Filter Kernels
    11.7.3 Practical Convolution Filtering
  11.8 Antialiasing: Spatially Varying Filtering and Sampling
    11.8.1 The Aliasing Problem
    11.8.2 Adaptive Sampling
    11.8.3 Spatially Variable Filtering
    11.8.4 MIP Maps and Pyramid Schemes

12 Scanline Warping Algorithms
  12.1 Scanline Algorithms
  12.2 Separable Scanline Algorithms
    12.2.1 Separation of projective warps
    12.2.2 Paeth-Tanaka rotation algorithm
  12.3 Mesh Warp Algorithm
    12.3.1 Separable mesh warp
    12.3.2 Mesh Warp Using Spline Interpolation

13 Morphing
  13.1 Morphing Algorithms
    13.1.1 A Scanline Morphing Algorithm
    13.1.2 Triangle Mesh Warp Morph
    13.1.3 Feature Based Morph

14 Frequency Analysis of Images
  14.1 Sampling, Aliasing and Reconstruction
  14.2 Frequency Analysis
    14.2.1 Fourier series
    14.2.2 Fourier transform
    14.2.3 Discrete Fourier transform
    14.2.4 Discrete Fourier Transform of an image
    14.2.5 A demonstration of frequency based image construction
  14.3 The Sampling Theorem
  14.4 Ideal Low-Pass Filter
  14.5 Fast Fourier Transform

Preface

This book would never have been born without the frustrating, volatile, yet somehow wonderful Visualization Sciences Program at Texas A&M University. This program brings artists, scientists, designers and engineers together in a unified curriculum, dedicated to building a core expertise in all aspects of electronic visualization, from image making, to animation, to the science of computer graphics, to videography. Our Program demands that students not only learn to use the latest high-end software for computer graphics production but also learn how it works and do their own algorithm and software development. Our hope is to turn out researchers, developers, artists and designers who have a real appreciation for not only their own specialty but the broad foundations on which their work is based. Students admitted to the program all have shown real excellence in one of the fields of mathematics, computer science, art or design. They have also shown a proclivity and talent for the other aspects of the field, even if they may not have had much earlier formal training. Thus, there is an abundance of talent, at various levels of development. As a computer scientist teaching in this program, my constant challenge is to stimulate, encourage and develop latent technical talents in the artists while at the same time challenging, stretching and expanding the knowledge of the computer scientists.

The reality of today's world is demanding that images move. Making or admiring a single image is not really where it is at in computer graphics these days. The hot topics are animation, morphs, integration of CG with live action, virtual reality – all with an emphasis on action. It is also true that much of the commercial computer graphics work today is not done by three-dimensional animation, but by image manipulation. How are computers used in movie making? Much of the time they are used to manipulate, process, modify, compose, warp, and otherwise manhandle images. 3D computer animation has its uses, and very powerful ones at that, but this has not diminished the use of computers to simply work on the images themselves.


This book, then, is my attempt to collect together a wide variety of both practical and theoretical information on digital images within the context of Computer Graphics, covering their handling, storage, transmission, manipulation and display. This material has been used successfully for eight years now in a course for our students. It is intended to be taught as a project course, with each subject area accompanied by an assignment that requires the students to actually implement algorithms, or experiment with tools and techniques. Students should have access to computers with a good program development environment, supporting the OpenGL or similar graphics standard. They should also have access to PhotoShop or Gimp, and some means of recording reasonable image hardcopy. An ideal system would have a good quality color printer, a 35mm film recorder, and means of recording sequences of images on videotape; although the course can be run well with just the medium quality color printer and the tools and patience to shoot slides from the screen.

The course should be of interest to students majoring in computer science, the biological and physical sciences, engineering, art, architecture, and design. Recommended background would be computer science through the second course (typically Data Structures), and enough mathematical sophistication to understand the concepts of functions, vector algebra and linear transformations in two dimensions.

It is the common wisdom in the Image Processing community that to really understand the processing of digital images one needs to understand the transformation of images to the frequency domain using Fourier analysis. This relegates the study of images to advanced students with a solid foundation in complex analysis. This book, however, departs from this path, recognizing that there is a wealth of interesting and useful material regarding digital images that does not require such background, and that in fact an algorithmic approach can be taken. Instead of starting with frequency domain analysis, the book reserves this material to the final chapters. It attempts to build a strong intuitive foundation, based on the idea that a digital image is a two-dimensional collection of uniform samples, taken through a point-spread function, from a (hypothetical) continuous two-dimensional image. Image reconstruction for display and/or resampling is done through another point-spread function. Without frequency domain analysis it is not possible to present a mathematically rigorous argument for choosing one point-spread function shape over another or for discovering what sampling rate is adequate to prevent artifacts. However, it is possible to make very strong intuitive arguments for good shapes and rates that will work well while minimizing artifacts. In fact, an argument can be made that the approach taken in this book builds a strong intuitive sense of good practice in working with images, which will greatly enhance the student's appreciation for the deep insights that are gained when digital images are studied using Fourier analysis.

The course, as it stands, would make an excellent first course in Computer Graphics in a Computer Science program. It can be placed in the curriculum so that it precedes a traditional course in 3D Modeling and Graphics. If this course were made a prerequisite for the 3D course, it would be possible to make the 3D course much more focused on the numerous issues in 3D modeling and graphics that sometimes get lost in a one semester stand-alone course. It would also be possible to design the curriculum so that both courses function independently, which has certain administrative advantages. 3D graphics demands a sophisticated knowledge of data structures and algorithms, and thus is best taught as an advanced computer science elective. In this course, on the other hand, the algorithms are considerably easier to implement and understand. Yet the visual results can be as interesting and compelling as those obtained in a 3D graphics course, and the potential audience for the course can be much broader.


Chapter 1

Digital Image Basics

1.1 What is a Digital Image?

To understand what a digital image is, we have to first realize that what we see when we look at a "digital image" is actually a physical image reconstructed from a digital image. The digital image itself is really a data structure within the computer, containing a number or code for each pixel or picture element in the image. This code determines the color of that pixel. Each pixel can be thought of as a discrete sample of a continuous real image.

It is helpful to think about the common ways that a digital image is created. Some of the main ways are via a digital camera, a page or slide scanner, a 3D rendering program, or a paint or drawing package. The simplest process to understand is the one used by the digital camera.

Figure 1.1 diagrams how a digital image is made with a digital camera. The camera is aimed at a scene in the world, and light from the scene is focused onto the camera's picture plane by the lens (Figure 1.1a). The camera's picture plane contains photosensors arranged in a grid-like array, with one sensor for each pixel in the resulting image (Figure 1.1b). Each sensor emits a voltage proportional to the intensity of the light falling on it, and an analog to digital conversion circuit converts the voltage to a binary code or number suitable for storage in a cell of computer memory. This code is called the pixel's value. The typical storage structure is a 2D array of pixel values, arranged so that the layout of pixel values in memory is organized into a regular grid with row and column numbers corresponding with the row and column numbers of the photosensor reading this pixel's value (Figure 1.1c).


Figure 1.1: Capturing a 2D Continuous Image of a Scene

Since each photosensor has a finite area, as indicated by the circles in Figure 1.1b, the reading that it makes is a weighted average of the intensity of the light falling over its surface. So, although each pixel is conceptually the sample of a single point on the image plane of the camera, in reality it represents a spread of light over a small area centered at the sample point. The weighting function that is used to describe how the weighted average is obtained over the area of the sample is called a point spread function. The exact form of the point spread function is a complex combination of photosensor size and shape, focus of the camera, and photoelectric properties of the photosensor surface. Sampling through a point spread function of a shape that might be encountered in a digital camera is shown in diagram form in Figure 1.2.
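To make the sampling idea concrete, here is a small sketch (an illustration only, not anything a camera actually runs) of sampling through a box-shaped point spread function: the pixel value is just the average of a hypothetical continuous image function scene(x, y) over a small square centered at the sample point.

double scene(double x, double y)             /* stand-in for the continuous image */
{
   return (x + y) / 2.0;                     /* a simple diagonal ramp */
}

/* average scene() over a width-by-width square centered at (cx, cy),
   approximating what a photosensor with a box point spread function reads */
double sample_box(double cx, double cy, double width, int steps)
{
   double sum = 0.0;
   int i, j;

   for(i = 0; i < steps; i++)
      for(j = 0; j < steps; j++){
         double x = cx + width * ((i + 0.5) / steps - 0.5);
         double y = cy + width * ((j + 0.5) / steps - 0.5);
         sum += scene(x, y);                 /* box PSF: every point weighted equally */
      }
   return sum / (steps * steps);             /* the weighted average */
}

A tent or Gaussian point spread function would differ only in weighting scene(x, y) by a factor that falls off with distance from (cx, cy) before summing.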

Figure 1.2: Sampling Through a Point-Spread Function

A digital image is of little use if it cannot be viewed. To recreate the discretely sampled image from a real continuous scene, there must be a reconstruction process to invert the sampling process. This process must convert the discrete image samples back into a continuous image suitable for output on a device like a CRT or LCD for viewing, or a printer or film recorder for hardcopy. This process can also be understood via the notion of the point spread function. Think of each sample (i.e. pixel) in the digital image being passed back through a point spread function that spreads the pixel value out over a small region.

[Figure: box, tent, Gaussian, and hat point-spread functions, each drawn over sample points]

Figure 1.3: Some Typical Point-Spread Functions

1.2 Bitmaps and Pixmaps

1.2.1 Bitmap - the simplest image storage mechanism

A bitmap is a simple black and white image, stored as a 2D array of bits (ones and zeros). In this representation, each bit represents one pixel of the image. Typically, a bit set to zero represents black and a bit set to one represents white. The left side of Figure 1.4 shows a simple block letter U laid out on an 8 × 8 grid. The right side shows the 2-dimensional array of bit values that would correspond to the image, if it were stored as a bitmap. Each row or scanline on the image corresponds to a row of the 2D array, and each element of a row corresponds with a pixel on the scanline.

1 1 1 1 1 1 1 1
1 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1
1 1 0 1 1 0 1 1
1 1 0 0 0 0 1 1
1 1 1 1 1 1 1 1

Figure 1.4: Image of Black Block Letter U and Corresponding Bitmap

Although our experience with television, the print media, and computers leads us to feel that the natural organization of an image is as a 2D grid of dots or pixels, this notion is simply a product of our experience. In fact, although images are displayed as 2D grids, most image storage media are not organized in this way. For example, the computer's memory is organized into a long linear array of addressable bytes (8 bit groups) of storage. Thus, somewhere in the memory of a typical computer, the block letter U of Figure 1.4 might be represented as the following string of contiguous bytes:

11111111 11011011 11011011 11011011 11011011 11011011 11000011 11111111

Since the memory is addressable only at the byte level, the color of each pixel (black or white) must be extracted from the byte holding the pixel's value. And, since the memory is addressed as a linear array, rather than as a 2D array, a computation must be made to determine which byte in the representation contains the pixel that we wish to examine, and which bit in that byte corresponds with the pixel.

The procedure print_bitmap() in Figure 1.5 will print the contents of the image stored in the array named bitmap. We assume that the image represented by bitmap contains exactly width * height pixels, organized into height scanlines, each of length width. In other words, the number of pixels vertically along the image is height, and the number of pixels horizontally across the image is width. The print_bitmap() procedure assumes that each scanline in memory is padded out to a multiple of 8 bits (pixels), so that it exactly fits into an integer number of bytes. The variable w gives the width of a scanline in bytes.

Another issue is that the representation of groups of pixels in terms of lists of ones and zeros is extremely difficult for humans to deal with cognitively. To convince yourself of this, try looking at a group of two or more bytes of information, remembering what you see, and then writing down the numbers from memory. To make the handling of this binary encoded information more manageable, it is convenient to think of each group of 4 bits as encoding a hexadecimal number.


void print_bitmap(unsigned char *bitmap, int width, int height){
   int w = (width + 7) / 8;   // number of bytes per scanline
   int y;                     // scanline number
   int x;                     // pixel number on scanline
   int byte;                  // byte number within bitmap array
   int bit;                   // bit number within byte
   int value;                 // value of bit (0 or 1)

   for(y = 0; y < height; y++){              // loop for each scanline
      for(x = 0; x < width; x++){            // loop for each pixel on line
         byte = y * w + x / 8;
         bit = 7 - x % 8;
         value = bitmap[byte] >> bit & 1;    // isolate bit
         printf("%1d", value);
      }
      printf("\n");
   }
}

Figure 1.5: Procedure to Print the Contents of a Bitmap
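As a quick check of the addressing arithmetic, the following hypothetical usage (assuming print_bitmap() from Figure 1.5 and <stdio.h> are in scope) prints the block letter U of Figure 1.4, stored as one byte per scanline:

unsigned char letterU[8] = {0xFF, 0xDB, 0xDB, 0xDB, 0xDB, 0xDB, 0xC3, 0xFF};

int main(void)
{
   print_bitmap(letterU, 8, 8);    /* prints eight scanlines of 1's and 0's */
   return 0;
}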

The hexadecimal numbers are the numbers written using a base of 16, as opposed to the usual decimal numbers that use base 10, or the binary numbers of the computer that use base 2. Since 16 is the 4th power of 2, each hexadecimal digit can be represented exactly by a unique pattern of 4 binary digits. These patterns are given in Table 1.1, and because of their regular organization they can be easily memorized. With the device of hexadecimal notation, we can now display the internal representation of the block letter U, by representing each 8-bit byte by two hexadecimal digits. This reduces the display to:

FF DB DB DB DB DB C3 FF

1.2.2 Pixmap - Representing Grey Levels or Color

If the pixels of an image can be arbitrary grey tones, rather than simply black or white, we could allocate enough space in memory to store a real number, rather than a single bit, for each pixel.


Table 1.1: Hexadecimal Notation

Binary   Hexadecimal   Decimal
0000     0             0
0001     1             1
0010     2             2
0011     3             3
0100     4             4
0101     5             5
0110     6             6
0111     7             7
1000     8             8
1001     9             9
1010     A             10
1011     B             11
1100     C             12
1101     D             13
1110     E             14
1111     F             15

Then arbitrary levels of grey could be represented as a 2D array of real numbers, say between 0 and 1, with pixel color varying smoothly from black at 0.0 through mid-grey at 0.5 to white at 1.0. However, this scheme would be very inefficient, since floating point numbers (the computer equivalent of real numbers) typically take 32 or more bits to store. Thus image size would grow 32 times from that needed to store a simple bitmap. The pixmap is an efficient alternative to the idea of using a full floating point number for each pixel. The main idea is that we can take advantage of the eye's finite ability to discriminate levels of grey.

It is a simple mathematical fact that in a group of n bits, the number of distinct combinations of 1's and 0's is 2^n. In other words, n bits of storage will allow us to represent and discriminate among exactly 2^n different values or pieces of information. This relationship is shown in tabular form in Table 1.2. If, in our image representation, we use 1 byte (8 bits) to represent each pixel, then we can represent up to 256 different grey levels. This turns out to be enough to "fool" the eye of most people. If these 256 different grey levels are drawn as vertical lines across a computer screen, people will think that they are seeing a smoothly varying grey scale.

The structure of a pixmap, then, is a 2D array of pixel values, with each pixel's value stored as a group of 2 or more bits.


Table 1.2: Combinations of Bits

Bits   # of Combinations   Combinations
1      2^1 = 2             0, 1
2      2^2 = 4             00, 01, 10, 11
3      2^3 = 8             000, 001, 010, 011, 100, 101, 110, 111
...    ...                 ...
8      2^8 = 256           00000000, 00000001, ... , 11111110, 11111111

To conform to byte boundaries, the number of bits used is typically 8, 16, 24 or 32 bits per pixel, although any size is possible. If we think of the bits within a byte as representing a binary number, we can store grey levels between 0 and 255 in 8 bits. We can easily convert the pixel value in each byte to a grey level between 0.0 and 1.0 by dividing the pixel value by the maximum grey value of 255.

Assuming that we have a pixmap storing grey levels in eight bits per pixel, the procedure print_greymap() in Figure 1.6 will print the contents of the image stored in the array named greymap. We assume that the image represented by greymap contains exactly width * height pixels, organized into height scanlines, each of length width.

void print_greymap(unsigned char *greymap, int width, int height){
   int y;        // scanline number
   int x;        // pixel number on scanline
   int value;    // value of pixel (0 to 255)

   for(y = 0; y < height; y++){              // loop for each scanline
      for(x = 0; x < width; x++){            // loop for each pixel on line
         value = greymap[y * width + x];     // fetch pixel value
         printf("%5.3f ", value / 255.0);
      }
      printf("\n");
   }
}

Figure 1.6: Procedure to Print the Contents of an 8 Bit/Pixel Greylevel Pixmap


1.3 The RGB Color Space

If we want to store color images, we need a scheme of color representation that will allow us to represent color in a pattern of bits (just like we represented grey levels as patterns of bits). Fortunately, many such representations exist, and the most common one used for image storage is the RGB or Red-Green-Blue system. This takes advantage of the fact that we can "fool" the human eye into "seeing" most of the colors that we can recognize perceptually by superimposing 3 lights colored red, green and blue. The level or intensity of each of the three lights determines the color that we perceive.


Figure 1.7: Additive Color Mixing for the Red-Green-Blue System

If we think of red, green, and blue levels as varying from 0 (off) to 1 (full brightness), then a color can be represented as a red, green, blue triple. Some example color representations using this on/off scheme are shown in Figure 1.8. It is interesting and somewhat surprising that yellow is made by combining red and green!

Now, we can extend this idea by allowing a group of bits to represent one pixel. We can assign some of these bits to the red level, some to green, and some to blue, using a binary encoding scheme like we used to store grey level. For example, if we have only 8 bits per pixel, we might use 3 for the red level, 3 for green, and 2 for blue (since our eye discriminates blue much more weakly than red or green). Figure 1.9 shows how a muted green color could be stored using this kind of scheme.


(1, 0, 0) = red      (0, 1, 0) = green    (0, 0, 1) = blue
(1, 1, 0) = yellow   (0, 1, 1) = cyan     (1, 0, 1) = magenta
(0, 0, 0) = black    (1, 1, 1) = white    (0.5, 0.5, 0.5) = grey

Figure 1.8: Example Colors Encoded as RGB Triples

The value actually stored is hexadecimal 59, which is then shown in binary, broken into red, green and blue binary fields. Each of these binary numbers is divided by the maximum unsigned number possible in the designated number of bits, and finally shown represented as an (R, G, B) triple of color primary values, each on a scale of 0 – 1.

On a high end graphics computer, it is not unusual to allocate 24 bits per pixel for color representation, allowing 8 bits for each of the red, green and blue components. This is more than enough to allow for perceptually smooth color gradations, and fits nicely into a computer whose memory is organized into 8-bit bytes. If you read specifications for computer displays or use graphics software, you will have noticed that many of these systems use red, green, and blue levels between 0-255. These are obviously systems that use an 8-bit per color primary representation.

59 (hexadecimal) = 010 110 01 = R G B = (2/7, 6/7, 1/3) = (0.286, 0.857, 0.333)

Figure 1.9: 8-Bit Encoding of a Muted Green
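A short sketch (an illustration, not a required interface) of packing and unpacking this 3-3-2 encoding in C might look like the following; the field layout, with red in the top three bits, matches Figure 1.9.

unsigned char pack332(int r3, int g3, int b2)      /* r3, g3 in 0-7, b2 in 0-3 */
{
   return (unsigned char)((r3 << 5) | (g3 << 2) | b2);
}

void unpack332(unsigned char pixel, double *r, double *g, double *b)
{
   *r = ((pixel >> 5) & 0x7) / 7.0;   /* top 3 bits, scaled to 0 - 1 */
   *g = ((pixel >> 2) & 0x7) / 7.0;   /* middle 3 bits */
   *b = (pixel & 0x3) / 3.0;          /* bottom 2 bits */
}

With these definitions, pack332(2, 6, 1) yields hexadecimal 59, the muted green of Figure 1.9.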

Since the RGB system organizes color into three primaries, and allows us to scale each primary independently, we can think of all of the colors that are represented by the system as being organized in the shape of a cube, as shown in Figure 1.10. We call this the RGB color cube, or the RGB color space (when we add coordinate axes to measure R, G and B levels). Note that the corners of the RGB color cube represent pure black and pure white, the three primaries red, green and blue, and the 3 secondary colors yellow, cyan and magenta. The diagonal from the black corner to the white corner represents all of the grey levels. Other locations within the cube correspond with all of the other colors that can be displayed.

A pixmap storing RGB levels using eight bits per primary, with an additional eight bits per pixel reserved, is called an RGBA (or Red, Green, Blue, Alpha) pixmap.



Figure 1.10: RGB Color Cube

The procedure print_pixmap() in Figure 1.11 will print the contents of the RGBA image stored in the array named pixmap. We assume that the image represented by pixmap contains exactly width * height pixels, organized into height scanlines, each of length width.


void print_pixmap(unsigned int *pixmap, int width, int height){
   int y;                // scanline number
   int x;                // pixel number on scanline
   unsigned int value;   // pixel as fetched from pixmap
   int r, g, b;          // RGB values of pixel (0 to 255)

   for(y = 0; y < height; y++){              // loop for each scanline
      for(x = 0; x < width; x++){            // loop for each pixel on line
         value = pixmap[y * width + x];      // fetch pixel value
         r = value >> 24;
         g = (value >> 16) & 0xFF;
         b = (value >> 8) & 0xFF;
         printf("(%5.3f,%5.3f,%5.3f) ", r / 255.0, g / 255.0, b / 255.0);
      }
      printf("\n");
   }
}

Figure 1.11: Procedure to Print the RGB Values in a 32 Bit/Pixel RGBA Pixmap


Chapter 2

Simple Image File Formats

2.1 Introduction

The purpose of this lecture is to acquaint you with the simplest ideas in image file format design, and to get you ready for this week's assignment – which is to write a program to read, display, and write a file in the PPM rawbits format.

Image file storage is obviously an important issue. A TV resolution greyscale image has about 1/3 million pixels – so a full color RGB image will contain 3 × 1/3 = 1 million bytes of color information. Now, at 1,800 frames (or images) per minute in a computer animation, we can expect to use up nearly 2 gigabytes of disk storage for each minute of animation we produce! Fortunately, we can do somewhat better than this using various file compression techniques, but disk storage space remains a crucial issue. Related to the space issue is the speed of access issue – that is, the bigger an image file, the longer it takes to read, write and display.

But, for now let us start with looking at the simplest of formats, before moving on to compression schemes and other issues.

2.2 PPM file format

The PPM, or Portable Pixmap, format was devised to be an intermediate format for use in developing file format conversion systems. Most of you know that there are numerous image file formats, with names like GIF, Targa, RLA, SGI, PICT, RLE, RLB, etc. Converting images from one format to another is one of the common tasks in visualization work, since different software packages and hardware units require different file formats. If there were N different file formats, and we wanted to be able to convert any one of these formats into any of the other formats, we would have to have N × (N − 1) conversion programs – or about N^2. The PPM idea is that we have one format that any other format can be converted into: we write N programs to convert all formats into PPM, and then N more programs to convert PPM files into any format. In this way, we need only 2 × N programs – a huge savings if N is a large number (and it is!).
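To put numbers on this, with N = 20 formats the direct approach needs 20 × 19 = 380 conversion programs, while the PPM route needs only 2 × 20 = 40.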

The PPM format is not intended to be an archival format, so it does not need to be too storage efficient. Thus, it is one of the simplest formats. Nevertheless, it will still serve to illustrate features common to many image file formats.

Most file formats are variants of the organization shown in Figure 2.1. The file will typically contain some indication of the file type, a block of header or control information, and the image description data. The header block contains descriptive information necessary to interpret the data in the image data block. The image data block is usually an encoding of the pixmap or bitmap that describes the image. Some formats are fancier, some are extremely complex, but this is the basic layout. Also, most (but not all) formats have some kind of identifier – called the magic number – at the start, that identifies the file type. Often the magic number is not a number at all, but is a string of characters. But in any case, that is what it is called.


Figure 2.1: Typical Image File Layout

In the PPM format, the magic number is either the ASCII character string "P1", "P2", "P3", "P4", "P5", or "P6" depending upon the storage method used. "P1" and "P4" indicate that the image data is in a bitmap.


These files are called PBM (portable bitmap) files. "P2" and "P5" are used to indicate greyscale images or PGM (portable greymap) files. "P3" and "P6" are used to indicate full color PPM (portable pixmap) files. The lower numbers – "P1", "P2", "P3" – indicate that the image data is stored as ASCII characters; i.e., all numbers are stored as character strings. This is a real space waster but has the advantage that you can read the file in a text editor. The higher numbers – "P4", "P5", "P6" – indicate that image data is stored in a binary encoding – affectionately known as Portable Pixmap rawbits format. In our study of the PPM format, we will look only at "P6" type files.

2.2.1 PPM header block

The header for a PPM file consists of the information shown in Figure 2.2, stored as ASCII characters in consecutive bytes in the file. The image width and height determine the length of a scanline, and the number of scanlines. The maximum color value cannot exceed 255 (8 bits of color information) but may be less, if less than 8 bits of color information per primary are available. In the header, all white-space (blanks, carriage returns, newlines, tabs, etc.) is ignored, so the program that writes the file can freely intersperse spaces and line breaks. Exceptions to this are that following an end-of-line character (decimal 10 or hexadecimal 0A) in the PPM header, the character # indicates the start of a text comment, and another end-of-line character ends the comment. Also, the maximum color value at the end of the header must be terminated by a single white-space character (typically an end-of-line).

P6                  -- magic number
# comment           -- comment lines begin with #
# another comment   -- any number of comment lines
200 300             -- image width & height
255                 -- max color value

Figure 2.2: PPM Rawbits Header Block Layout
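A minimal sketch of reading these header fields in C is given below. It is only one possible approach: it assumes the file has already been opened with fopen(), treats any # encountered between fields as starting a comment, and omits error handling. It needs <stdio.h> and <ctype.h>.

/* skip white-space and # comments between header fields */
void skip_space_and_comments(FILE *fp)
{
   int ch = fgetc(fp);
   while(isspace(ch) || ch == '#'){
      if(ch == '#')                          /* comment runs to end of line */
         while(ch != '\n' && ch != EOF)
            ch = fgetc(fp);
      ch = fgetc(fp);
   }
   ungetc(ch, fp);                           /* put back first non-blank character */
}

int read_ppm_header(FILE *fp, int *width, int *height, int *maxval)
{
   char magic[3];

   if(fscanf(fp, "%2s", magic) != 1 || magic[0] != 'P' || magic[1] != '6')
      return 0;                              /* not a P6 file */
   skip_space_and_comments(fp); fscanf(fp, "%d", width);
   skip_space_and_comments(fp); fscanf(fp, "%d", height);
   skip_space_and_comments(fp); fscanf(fp, "%d", maxval);
   fgetc(fp);                                /* the single white-space after maxval */
   return 1;
}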

The PPM P6 data block begins with the first pixel of the top scanline of the image (upper lefthand corner), and pixel data is stored in scanline order from left to right in 3 byte chunks giving the R, G, B values for each pixel, encoded as binary numbers. There is no separator between scanlines, and none is needed as the image width given in the header block exactly determines the number of pixels per scanline. Figure 2.3a shows a red cube on a mid-grey background, and Figure 2.3b gives the first several lines of a hexadecimal dump (text display) of the contents of the PPM file describing the image. Each line of this dump has the hexadecimal byte count on the left, followed by 16 bytes of hexadecimal information from the file, and ends with the same 16 bytes of information displayed in ASCII (non-printing characters are displayed using the character !). Except for the first line of the dump, which contains the file header information, the ASCII information is meaningless, since the image data in the file is binary encoded. A line in the dump containing only a * indicates a sequence of lines all containing exactly the same information as the line above.

2.3 Homework Nuts and Bolts

2.3.1 OpenGL and PPM

Display of an image using the OpenGL library is done most easily using the procedure glDrawPixels(), that takes an image pixmap and displays it in the graphics window. Its calling sequence is

glRasterPos2i(0, 0);
glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, Pixmap);

The call glRasterPos2i(0, 0) assures that the image will be drawn in the window starting at the lower left corner (pixel 0 on scanline 0). The width and height parameters to glDrawPixels() specify the width, in pixels, of a scanline, and the height of the image, in number of scanlines. The GL_RGBA parameter indicates that the pixels are stored as RGBA quadruples, with color primaries stored in the order red, green, blue, alpha, and the GL_UNSIGNED_BYTE parameter indicates that each color primary is stored in a single byte and treated as an unsigned number between zero and 255. Finally, the parameter Pixmap is a pointer to an array of integers (unsigned int *Pixmap) that is used to store the image's pixmap. Since an unsigned integer is 32 bits long, each element of the Pixmap array has room for four bytes of information, exactly what is required to store one pixel.

Please make special note of the following complications:

1. By default, OpenGL wants the image raster stored from the bottom scanline of the image to the top scanline, whereas PPM stores the image from the top scanline to the bottom.

2. Each 32 bit int in the pixmap stores 1 pixel value in the order R, G, B, α. α is an opacity value that you can set to 0 for now. Each of R, G, B, and α takes 1 byte.


a) red cube on a mid-grey (0.5, 0.5, 0.5) background

000000 5036 0a33 3030 2032 3030 0a32 3535 0a7f P6!300 200!255!!
000010 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f !!!!!!!!!!!!!!!!
*
009870 886d 6d92 5959 8a69 6984 7676 817d 7d80 !mm!YY!ii!vv!}}!
009880 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f !!!!!!!!!!!!!!!!
*
009bf0 7f80 7e7e 9d41 41b5 0909 b211 11a9 2424 !!~~!AA!!!!!!!$$
009c00 9f3b 3b94 5454 8b68 6886 7272 827b 7b80 !;;!TT!hh!rr!{{!
009c10 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f !!!!!!!!!!!!!!!!
*
009f70 7f7f 7f7f 7f82 7979 a72b 2bb9 0000 ba00 !!!!!!yy!++!!!!!
009f80 00ba 0000 b901 01b7 0606 b20f 0faf 1d1d !!!!!!!!!!!!!!!!
009f90 a532 3297 4d4d 8d62 6286 7272 827a 7a80 !22!MM!bb!rr!zz!
009fa0 7e7e 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f ~~!!!!!!!!!!!!!!
009fb0 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f !!!!!!!!!!!!!!!!
*
00a2f0 7f7f 7f7f 7f7f 7f7f 7f8a 6969 b014 14ba !!!!!!!!!!ii!!!!
00a300 0000 ba00 00ba 0000 ba00 00ba 0000 ba00 !!!!!!!!!!!!!!!!
00a310 00ba 0000 ba00 00b9 0505 b60d 0daf 1d1d !!!!!!!!!!!!!!!!
00a320 a62f 2f9d 4141 915b 5b88 6d6d 8279 7980 !//!AA![[!mm!yy!
00a330 7e7e 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f ~~!!!!!!!!!!!!!!
00a340 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f 7f7f !!!!!!!!!!!!!!!!
*

b) several lines in a dump of PPM P6 red-cube image file

Figure 2.3: Example PPM P6 Data


3. PPM stores pixel primaries in R, G, B order, and there is no provision for an α value. (A sketch combining these three points follows.)
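One possible way (a sketch, not the required solution) of dealing with all three points while reading the P6 pixel data is to flip scanlines and pack bytes as you go; here fp is assumed to be positioned just past the header, and error handling is omitted.

void read_p6_pixels(FILE *fp, unsigned int *Pixmap, int width, int height)
{
   int row, col;

   for(row = 0; row < height; row++){        /* PPM rows arrive top first */
      int flipped = height - 1 - row;        /* OpenGL wants bottom first */
      for(col = 0; col < width; col++){
         int r = fgetc(fp);
         int g = fgetc(fp);
         int b = fgetc(fp);
         /* pack R, G, B into the top three bytes; alpha stays 0 for now */
         Pixmap[flipped * width + col] =
            (unsigned int)r << 24 | (unsigned int)g << 16 | (unsigned int)b << 8;
      }
   }
}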

2.3.2 Logical shifts, and, or

Reading a PPM file and displaying it requires you to pack the red, green and blue information for a single pixel into one 32 bit unsigned integer. When you have to write the file back out, it is necessary to unpack the red, green and blue components. I recommend that you do this using the following approach:

int red, green, blue;
unsigned int pixel;

/* packing */
pixel = red << 24 | green << 16 | blue << 8;

/* unpacking */
red = (pixel >> 24);
green = (pixel >> 16) & 0xff;
blue = (pixel >> 8) & 0xff;

The operator <<, when used in an arithmetic expression, causes the bits in the memory cell specified to the left of the << symbol to be shifted to the left by the number of positions specified to the right of the << symbol. 0's are shifted into the rightmost end of the cell to fill vacated positions. Likewise, when used in an arithmetic expression, >> causes a shift to the right by the specified number of bits, with 0's shifted into the leftmost end of the cell. The operator | specifies a logical or operation between the value specified to its left and the value specified to its right. Similarly, the operator & specifies a logical and operation between its left and right operands. Both the logical or and and operations operate bit by bit between corresponding bit positions in their two operands. A logical or of two bits yields a 1 if either of the operand bits is a 1, otherwise it yields a 0. In other words, if either one or the other or both operands are 1 the result is 1. A logical and of two bits yields a 0 if either of the operand bits is a 0, otherwise it returns a 1. In other words, the result is 1 only if the left operand and the right operand are 1. These operations are diagrammed in Figure 2.4.

There are many practical uses of these logical operations, but for our purposes, the most important is that they allow us to do selective operations on groups of bits (fields) within a byte or word.


OR            AND
0 | 0 = 0     0 & 0 = 0
0 | 1 = 1     0 & 1 = 0
1 | 0 = 1     1 & 0 = 0
1 | 1 = 1     1 & 1 = 1

  01100000      01101010
| 00001010    & 00001111
  01101010      00001010

Figure 2.4: Logical or and and Operations

Logical or allows superposition of fields within a word, and logical and allows masking off of fields, leaving only the values in the bit positions that have been logically anded with 1 bits. Figure 2.5 gives a few examples to show how these logical operations work. The examples use only 8 bits for simplicity, but the same principles hold for a 32 bit quantity.

unsigned char a = 6;      a:    00000110
unsigned char b = 10;     b:    00001010
unsigned char c = 106;    c:    01101010
unsigned char byte;       byte: xxxxxxxx

a) starting values in variables a, b, c, and byte

byte = a << 4;            byte: 01100000
byte = (a << 4) | b;      byte: 01101010
byte = c & 0x0f;          byte: 00001010
byte = (c >> 4) & 0x0f;   byte: 00000110

b) examples of shifts, or, and

Figure 2.5: Logical Shift, Or and And Operations in C

2.3.3 Dynamic memory allocation

You will not know how big to make the pixmap storage array until you have read the PPM header information to get the image width and height. Once you have that, in C++ simply do

unsigned long *Pixmap;
Pixmap = new unsigned long[width * height];


or in traditional C simply do

unsigned long *Pixmap;
Pixmap = (unsigned long *)malloc(width * height * sizeof(long));

to allocate exactly enough space to store the image when it is read in. Note that Pixmap will just be a big 1D array, not a 2D array nicely arranged by scanline. Thus, your program will have to figure out where each scanline starts and ends using the image width.
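For example (a sketch, assuming scanline 0 is stored first in the array), the pixel in column x of scanline y is found with

unsigned long value = Pixmap[y * width + x];   /* scanline y starts at index y * width */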

2.3.4 Reading from the image file

Unless you are quite experienced with C++ I/O facilities for handling files, I recommend that you use the file input/output routines from standard C, rather than using the stream I/O facilities of C++. To make use of them you will have to

#include <stdio.h>

Once you have opened a file for reading via

FILE *infile;
infile = fopen(infilename, "r");

you can read individual bytes from the file via

int ch;
ch = fgetc(infile);

Once you have opened a file for writing via

FILE *outfile;
outfile = fopen(outfilename, "w");

you can write individual bytes to the file via:

int ch;
fputc(ch, outfile);

For reading and writing ASCII data from the file, fscanf() and fprintf() are very handy. Note: fscanf() ignores white-space.


2.3.5 C command-line interface

The C command-line interface allows you to determine what was typed on the command line to run your program. It works as follows – declare your main() procedure like this:

int main(int argc, char *argv[]){
   ...
}

Unix parses the command line and places strings from it into the array argv. It also sets argc to be the number of strings parsed. If the command line were:

ppmview in.ppm out.ppm

then the argv and argc data structures would be built as shown in Figure 2.6. Thus, the file name of the input file would be given by argv[1], and the output file by argv[2]. argv[0] contains the name of the program itself.

argc: 3

argv[0] --> "ppmview"
argv[1] --> "in.ppm"
argv[2] --> "out.ppm"

Figure 2.6: argc and argv data structures in C
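A minimal sketch of using these structures (the usage message is only illustrative) is:

#include <stdio.h>

int main(int argc, char *argv[])
{
   if(argc != 3){                            /* program name plus two file names */
      fprintf(stderr, "usage: %s in.ppm out.ppm\n", argv[0]);
      return 1;
   }
   printf("input: %s   output: %s\n", argv[1], argv[2]);
   return 0;
}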


Chapter 3

Display Devices

3.1 CRT’s - Cathode Ray Tubes

Commonly called a picture tube, from the days when most people only encountered them in their television sets, the cathode ray tube or CRT is the primary device used to display digital images. Before the invention of transistors, chips and integrated circuits, electronic devices depended upon vacuum tubes as "electronic valves" and "switches". Since CRT's are simply highly specialized vacuum tubes, it will be useful to understand the basic concepts employed in vacuum tubes before studying CRT's.

3.1.1 Vacuum tube basics

A simple vacuum tube has four active elements enclosed in an evacuated tube, as shown in the schematic diagram of Figure 3.1. Wires extending to the base of the tube allow a voltage to be applied to the heater coil, and a second voltage across the cathode and plate. A control voltage can be applied to the grid. The device can be thought of as a "valve" or a "switch", since small changes in control voltage can produce a large change in current through the cathode/plate circuit. The tiny control voltage regulates like a valve, or can turn the current on or off like a switch. Here is how it works:

1. The cathode is negatively charged, giving it an excess of electrons

2. The heater heats the cathode, imparting energy that causes the release of some of the excess electrons

3. The free electrons are attracted to the positively charged plate, resulting in a current through the cathode/plate circuit


4. Since the electrical charge on the grid is nearer to the cathode than the plate is, it can prevent electrons from passing on to the plate or "encourage" them. A more positive grid voltage attracts electrons, increasing their acceleration towards the plate. A more negative grid voltage inhibits electrons.


Figure 3.1: Schematic Diagram of Vacuum Tube

3.1.2 CRT’s

A CRT or Cathode Ray Tube works on exactly the same principle as a simple vacuum tube but the internal organization is somewhat different. A schematic diagram showing the organization of a simple CRT is shown in Figure 3.2. As the electrons travel from cathode to plate they are focused into a beam and directed onto precise positions on the plate. In a CRT, the plate is a glass screen, coated with phosphor. The phosphor on the screen glows more or less brightly depending on the intensity of the beam impinging on it.

The flow of electrons from cathode to plate works like in a regular vacuum tube. However, focusing coils align the electrons into a beam, like a lens focuses light into a beam. Steering coils push the beam left/right and up/down so that it is directed to a particular spot on the screen. The grid control voltage adjusts the intensity of the beam, and thus the brightness of the glowing phosphor dot where the beam hits the screen.

A CRT can be used to display a picture in two different ways. The electron beam can be directed to "draw" a line-drawing on the screen – much like a high-speed electronic "Etch-a-Sketch". The picture is drawn over and over on the screen at very high speed, giving the illusion of a permanent image.



Figure 3.2: Schematic Diagram of CRT

This type of device is known as a vector display, and was quite popular for use in Computer Graphics and Computer Aided Design up until the early 1980's. By far the most popular type of CRT-based display device today is the raster display. They work by scanning the electron beam across the screen in a regular pattern of scanlines to "paint" out a picture, as shown in Figure 3.3. As a scanline is traced across the screen by the beam, the beam is modulated proportional to the intended brightness of the corresponding point on the picture. After a scanline is drawn, the beam is turned off, and brought back to the starting point of the next scanline.

The resulting pattern of scanlines is known as a raster. The NTSC broadcast TV standard that is used throughout most of America uses 525 scanlines, with 486 of these in the visible raster. The extra scanlines are used to transmit additional information, like control signals and closed caption titling, along with the picture. The NTSC standard specifies a framerate of 30 frames per second, with each frame (single image) broadcast as two interlaced fields. The first of each pair of fields contains every even numbered scanline, and the second every odd numbered scanline. In this way the screen is refreshed 60 times every second, giving the illusion of a solid flicker-free image. In actuality, most of the screen is blank (or dark) most of the time!

Please see Foley, vanDam, Feiner & Hughes for many more details on CRT's.

A color CRT works like a monochrome CRT, but the tube has three separately controllable electron beams – we say it has three electron guns. The screen has dots of red, green and blue colored phosphors, and each of the three beams is calibrated to illuminate only one of the phosphor colors. Thus, even though beams of electrons have no color, we can think of the CRT as having red, green and blue electron guns.



Figure 3.3: Raster Scan Pattern

Colors are made using the RGB system, as optical mixing of the colors of the adjacent tiny dots takes place in the eye.

Figure 3.4 shows the most typical triangular pattern or triad arrangement of phosphors on the back of the glass screen of color CRT screens. On the inside of the screen, an opaque shadow mask is placed between the three electron guns and the phosphors to assure that each gun excites only the phosphors of its appropriate color. High precision color CRT's, that require the ability to draw a fine horizontal scanline, use an inline rather than a triad phosphor arrangement, to keep the scanline confined to a more finely focused vertical area.

3.2 Framebuffers

A framebuffer is simply an array of computer memory, large enough to hold the color information for one frame (i.e., one screenful), and display hardware to convert the frame into control signals to drive a CRT. The simple framebuffer schematized in Figure 3.5 holds a monochrome (black & white) image in a bitmap. The circuitry that controls the electron gun on the CRT loops through each row of the image array, fetching each pixel value (1 or 0) in turn and using it to either turn on the electron gun for white or turn it off for black. Of course the timing has to be such that the memory fetches and conversion to grid voltages is synchronized exactly with the trace of the beam across the corresponding screen scanline.

Figure 3.6 shows a greyscale framebuffer with three bits per pixel, that uses a DAC (digital to analog converter) to convert the numeric grey level to one of 2^3 = 8 different voltages.



Figure 3.4: Phosphor Triad Arrangement in Color CRT's (redrawn from the Meko website, The European Source for Display Data and Market Research, http://www.meko.co.uk)

Figure 3.7 introduces the notion of a look-up table. Each pixel value from the framebuffer is used as an index into a table of 2^n entries (8 in the n = 3 example). Each table entry has a stored value whose precision is usually greater than the framebuffer resolution. This gives a palette of only 2^n colors, but the palette can be drawn from 2^m greylevels, where m is the number of bits per entry in the lookup table. One way this could be used in a greyscale framebuffer would be to correct for non-linearities in the display and in human perception so that each step in grey level would result in a uniform perceptual step in luminance level.

Figure 3.8 shows how a framebuffer can be arranged to drive a color display via three lookup tables. Virtually all 8-bit per pixel color displays – like the ones in older Macintoshes, PC's or workstations – utilize a 24 bit/pixel lookup table with 256 entries (2^8). This gives a palette of 256 colors per frame, but the colors are drawn from a selection of nearly 17 million (2^24 = 16,777,216).
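In software terms (purely an illustration; the display hardware does this on the fly as pixels are scanned out), pushing an 8-bit pixel through the three tables of Figure 3.8 amounts to:

unsigned char lutR[256], lutG[256], lutB[256];   /* 8 bits per table entry */

void pixel_through_luts(unsigned char pixel,
                        unsigned char *r, unsigned char *g, unsigned char *b)
{
   *r = lutR[pixel];    /* the pixel value is only an index into the tables */
   *g = lutG[pixel];
   *b = lutB[pixel];
}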

A full color resolution framebuffer, called a truecolor framebuffer, is shown in Figure 3.9. This type of device will have at least 24 bits per pixel (8 bits per color primary), either driving 3 color guns directly or (as shown in the figure) through a separate lookup table per color primary that can be used to correct nonlinearities or to obtain certain effects (like overlay planes).



Figure 3.5: Monochrome Framebuffer


Figure 3.6: 3-Bit Per Pixel Greyscale Framebuffer

Very high end graphic displays may have more than 24 bits allocated per pixel, to handle such tasks as color compositing, depth-buffering for hidden surface resolution, double buffering for real-time animated display, and overlays.

3.3 Gamma Correction

So far we have treated the transfer of information from the framebuffer to the CRT as if it were linear. For example, in an eight-bit grey-scale framebuffer without a lookup table, pixel values can range from 0, representing black, to 255, representing white. Thus, we would expect that a pixel value of 127 would represent a middle grey, and that even increments of pixel values would represent even increments of grey. In fact, this is not the case.

Figure 3.7: Greyscale Framebuffer with Lookup Table

There are two reasons why even increments of pixel values do not result in even increments of perceived grey. The first is that the human eye registers equal increments of color intensity, not as a function of the difference between intensities (luminance), but as a function of the ratio between intensities. In other words, if I_1 < I_2 < I_3 are three intensities, the step between I_2 and I_1 would look the same as the step between I_3 and I_2 if

I2/I1 = I3/I2,

but the steps would look unequal if

I2 − I1 = I3 − I2.

Because of the ratio law, in fact, the step between I2 and I1 would appear to be greater than the step between I3 and I2. This relationship is shown in Figure 3.10, which plots perceived intensity Ip in dimensionless units versus actual intensity Ia on a unit scale. The solid line indicates how perceived intensity will vary with actual intensity, and the dashed line shows how perceived intensity would vary if the relationship were linear.

In practice, the story is further complicated by the fact that the relationship between CRT grid voltage and phosphor luminance is also non-linear. The actual phosphor luminance Ia due to grid voltage Iv is given by a curve of the form

Ia = Iv^γ,

where γ is a positive constant greater than 1. This relationship is shown in Figure 3.11 for a CRT with γ = 2. For most CRTs in common usage, values of γ range between 1.6 and 2.4.

If we take both the nonlinearity of the CRT and the nonlinear perceptual response into account, the two effects interact to make a more complex relationship. This effect is shown in Figure 3.12.

Figure 3.8: 8-Bit Color Framebuffer with 3 Lookup Tables

Figure 3.9: Truecolor Framebuffer with Three Lookup Tables

Figure 3.10: Perceived Intensity vs. Actual Intensity

Figure 3.11: Actual Intensity vs. Stored Intensity (voltage), gamma = 2

Figure 3.12: Perceived Intensity vs. Stored Intensity (voltage)

The process of correcting for the nonlinearities introduced by the nature of human perception and the physics of CRTs is called gamma correction. If we did not have to take the perceptual issue into account, gamma correction would simply consist of scaling each pixel value to the scale 0...1, raising it to the power 1/γ, and then restoring it to its original scale before using it to determine grid voltage. Actually, the problem is complicated by the fact that perceived intensity varies as the ratio of actual intensity. In most cases a suitable value for γ can be found experimentally that will give an even perceived gradation across the full range of greys. However, the γ value used for gamma correction and the γ of the CRT will differ somewhat.

Gamma correction can be done on a per-pixel basis at the time of display, replacing a stored image with a gamma corrected image. On a framebuffer with a lookup table, however, the lookup table can be loaded with gamma corrected intensity values that are simply indexed by the colors in the image. This approach has the virtues that it need only be done once, even if we are displaying a sequence of images, that it requires much less computation even for a single image, and that it does not require modifying the image itself. When we study compositing, we will see that pre-gamma correcting images is a very bad idea if we plan on doing any work on the image that requires adding or averaging pixels. We will see that this includes compositing, spray painting, filtering and many other common operations.
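A minimal sketch of the lookup-table approach, assuming an 8-bit framebuffer and an experimentally chosen value of γ (the helper name is hypothetical):

#include <math.h>
#include <stdint.h>

/* Fill a 256-entry lookup table with gamma-corrected values.
   gamma is the experimentally chosen display gamma (e.g. 2.2). */
void build_gamma_lut(uint8_t lut[256], double gamma)
{
    for (int v = 0; v < 256; v++) {
        double scaled = v / 255.0;                   /* scale to 0..1          */
        double corrected = pow(scaled, 1.0 / gamma); /* raise to power 1/gamma */
        lut[v] = (uint8_t)(corrected * 255.0 + 0.5); /* restore original scale */
    }
}

At display time each stored pixel value v is simply replaced by lut[v]; the table is computed once, no matter how many images are shown.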


Chapter 4

Color

4.1 What is Color?

There is probably nothing more important to the study of images than the notion of color, what it means exactly, how it can be encoded, and how it can be manipulated. In our attempt to answer the question "what is color?", we will examine several ways of looking at the phenomenon. We will first take the point of view of the physicist, later the physiologist, and finally the artist. Each point of view has its own validity within its own context, and knowledge of each will help us to develop a more complete understanding of this highly complex question.

4.1.1 Physical vs. physiological models

The physical model of color is directly related to the physical phenomenon of light, and is actually a way of describing distributions of light energy. Light is energy in the form of electromagnetic waves, with wavelengths in the visible range from 400 to 700 nanometers. The energy of an electromagnetic wave is generally not concentrated at a single wavelength but is distributed across a range of wavelengths. This distribution is known as the spectrum of the wave, and it is this spectrum that is interpreted by our eyes as the color of the light. Knowing only this distribution, we can make reasonable predictions about perception of color.

Graphs of two light energy spectra are shown in Figure 4.1. The area under each curve gives the total energy per unit illuminated surface area. This total energy relates directly to the luminance or perceived brightness of the color. Both spectra shown in Figure 4.1 have about the same luminance. The "dominant" wavelength of the curve determines the hue or color name of the color. Both of the spectra in Figure 4.1 have a maximum concentration of energy around 560 nm, and energy at this wavelength is perceived by humans to have a yellow hue. The peakedness of the spectrum, i.e. the percentage of the total energy concentrated in a narrow band around the dominant wavelength, determines the saturation or purity of the color. The spectrum in Figure 4.1 that is labeled "yellow" will look like a fairly pure yellow or orange, whereas the spectrum labeled "brown" will look muddy, like a yellow ochre.

Figure 4.1: Physics of Color

However, from the point of view of the physiology of human vision, we can find no "machinery" in the eye that does anything like measure the shape of light spectra. Instead, perception of color is the result of various mental processes that receive their initial input from broadly-tuned photo sensors in the retina. These sensors are of two main types.

Retinal rod cells measure illumination level (grey level) and have a spectral sensitivity or efficiency function of the shape shown in Figure 4.2a. Even though the rod cells convey no color information, they are maximally sensitive to the greens and yellows. This means that under very low illumination, when there is no color perception, we still see yellow and green objects as being brighter than red or blue objects.

Retinal cone cells provide differential measurements of illumination, with spectral tuning that is much finer than that exhibited by rod cells. The cone cells give us the ability to discriminate hue. There are three distinct cone types, that can be thought of as corresponding very roughly to the color primaries red, green, and blue. Figure 4.2b shows the relative sensitivities of the three cone types to light across the visible spectrum of wavelengths.

Figure 4.2: Spectral Sensitivity of Retinal Rod and Cone Cells (a: rods, b: cones); copied from Foley, van Dam, Feiner and Hughes, Computer Graphics Principles and Practice, Addison-Wesley, 1990, pg. 577.

However, curves of rod and cone sensitivity are only measures at the eye. They say a lot about input to brain processes, but very little about the use to which the brain puts these inputs. Mental processing is complex and very ill understood. Judging from the amount of brain area devoted to it, visual processing is one of the most complex tasks undertaken by the brain. Thus, we would be making a gross oversimplification of the problem if we tried to relate eye physiology directly to color perception.

4.1.2 Color as a contextual phenomenon

The artist Josef Albers elevated the art of perceptual color manipulation to a very high level in his development of color-field painting. His working thesis was that color is a relative perceptual phenomenon that transcends attempts to exactly quantify or measure individual colors. In other words, the color we will perceive cannot be predicted outside of the context within which the color will be presented. A memorable painting by Albers is simply a violet square immersed in a field of yellow. On close inspection, however, the violet square is seen to be slate grey. The presentation of the grey square in the context of a brilliant yellow surround dramatically alters our perception. I will show two simple studies in class that demonstrate some of the ideas that Albers worked with in his painting. For more on the contextual nature of color, see Josef Albers, Interaction of Color, Yale Univ. Press, 1963.

The point is that, in the end, color perception transcends the physical and the physiological, involving the integration of context into the "reading" of a presentation.

4.1.3 Tri-stimulus theory

Nevertheless, there is much evidence that a tri-stimulus theory of color (like the RGB system) provides a very useful color model, and if we ignore contextual issues, allows us to represent specific colors in a very compact form. This form is highly suitable for manipulation in a computer. In general, tri-stimulus theory says that color can be quantified by a 3-parameter system. There is much experimental and experiential evidence to back this theory, and it underlies most of current color technology in the print, broadcast, and film industries. The foundation principle of tri-stimulus theory is that most perceptual colors can be produced by presenting a mixture of three primary colors. A direct and important consequence of this theory is that many different light energy distributions will be read as the same perceptual color.

Some color systems extend the three color primary idea, and use a tri-stimulus system where the three parameters are more abstract color measures.

4.2 HSV Color Space

One of the most widely used examples of an abstract tri-stimulus system is the HSV color space. The HSV system attempts to represent all perceptual colors using three measures that relate directly to how artists often think about color. It provides separate measures of hue (corresponding to dominant color name), saturation (purity of color), and value (brightness on grey scale). Its structure is derived directly from the RGB system, and in fact there is a simple translation from RGB to HSV and back. Figure 4.3 diagrams the relationship between the two systems. If the RGB color cube of Figure 4.3a is viewed along its white-black diagonal, it presents the hexagonal silhouette shown in Figure 4.3b. The complete HSV system is a cone-shaped space derived from this projection and shown in Figure 4.4.

Page 46: The Digital Image - Clemson University

4.3. CIE COLOR SPACE 5

Yellow

CyanB lue

White

B lack

R ed

Green

M agenta

a) RGB

Red

Green

Blue

Y ellow

Cyan

M agenta

W hite

b) HSV

Figure 4.3: Relationship Between RGB and HSV Color Systems

The HSV color space is parameterized by color coordinates (h, s, v), standing for hue, saturation and value. This parameterization is shown in the three diagrams of Figure 4.5. Hue h is measured by angular position around the face of the cone. As shown in Figure 4.5a, it goes from 0 to 360 degrees, starting with red at 0°, and proceeding counterclockwise around the color-wheel through yellow at 60°, green at 120°, cyan at 180°, blue at 240° and magenta at 300°. Saturation s is measured by distance from the central axis of the cone. As shown in Figure 4.5b, it is a fraction that goes from 0 for grey at the center to 1 for fully saturated at the boundary of the cone. Value is measured by v. As shown in Figure 4.5c, it is a fraction that goes from 0 for black at the apex or point of the cone to 1 for full intensity at the base of the cone, measured along the axis of the cone. For example, a full intensity, fully saturated green would be (120, 1, 1). A very dark but fully saturated green would be (120, 1, 0.3), a pastel green would be (120, 0.3, 1), and a neutral green brown would be (120, 0.3, 0.3).
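The translation mentioned above is easy to sketch in code. The following is the standard hexcone HSV-to-RGB conversion (a sketch, not code from these notes), assuming h is in degrees in [0, 360) and s, v are in [0, 1]:

#include <math.h>

/* Convert HSV (h in degrees [0,360), s and v in [0,1]) to RGB in [0,1]. */
void hsv_to_rgb(double h, double s, double v,
                double *r, double *g, double *b)
{
    if (s == 0.0) {               /* grey: hue is irrelevant */
        *r = *g = *b = v;
        return;
    }
    h = fmod(h, 360.0) / 60.0;    /* which sextant of the color wheel */
    int i = (int)floor(h);
    double f = h - i;             /* fractional position within the sextant */
    double p = v * (1.0 - s);
    double q = v * (1.0 - s * f);
    double t = v * (1.0 - s * (1.0 - f));

    switch (i) {
        case 0:  *r = v; *g = t; *b = p; break;   /* red -> yellow    */
        case 1:  *r = q; *g = v; *b = p; break;   /* yellow -> green  */
        case 2:  *r = p; *g = v; *b = t; break;   /* green -> cyan    */
        case 3:  *r = p; *g = q; *b = v; break;   /* cyan -> blue     */
        case 4:  *r = t; *g = p; *b = v; break;   /* blue -> magenta  */
        default: *r = v; *g = p; *b = q; break;   /* magenta -> red   */
    }
}

For example, the fully saturated, full intensity green (120, 1, 1) of the text maps to RGB (0, 1, 0).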

4.3 CIE Color Space

The final space we will look at is the CIE system. This system was developed in the 1930's to place the determination of color and illumination on a more scientific basis with respect to human perception. One problem with both the RGB and HSV systems is that they do not allow for precise color specification, because they have no set basis for the colors that can be formed. In other words, the pure red color on one display or printer may look quite different from pure red on another device, although they both have the same RGB specification (1, 0, 0) or HSV specification (0, 1, 1). The CIE system, on the other hand, makes it possible to specify exactly reproducible colors. Thus, CIE colors can be catalogued and then displayed on CRT's, printed in ink, mixed as paint - always giving the same result.

Figure 4.4: HSV Color Cone

Figure 4.5: Parameterization of HSV Color Space (a: hue, b: saturation, c: value)

Page 48: The Digital Image - Clemson University

4.3. CIE COLOR SPACE 7

The CIE system was developed using the experimental configurationshown in Figure 4.6. Observers were presented with a split screen. Acolored test light illuminated one side of the screen, and a set of three pureprimary lights illuminated the other side. The observer could adjust theintensity of each primary with a dial. The task was to adjust the screencolor produced by the three primaries, to match the color projected by thetest light.

Figure 4.6: CIE Test Apparatus (primaries R 645 nm, G 526 nm, B 444 nm)

Using a range of pure single-wavelength test lights, the curves of primary intensities shown in Figure 4.7 were obtained. For each wavelength of test light, the exact setting of each primary necessary to reproduce the perceived color of that test light was determined. A complication was that for some colors the primary settings needed to be negative – in other words it was not possible to reproduce all test colors with the three additive primaries. To deal with this complication, the nonreproducible colors were matched by adding primary colors to the test light. This was considered to be equivalent to subtraction of the corresponding primary from the additive mixture and shows up as negative values on the curves.

From the test data, three functions of wavelength, known as the CIE x̄, ȳ and z̄ color matching functions, were developed. These functions are graphed in Figure 4.8. They do not correspond to realizable light sources, but can be thought of mathematically as if each were a primary light source. The corresponding three light levels of the matching functions necessary to match a given color are called the color's CIE XYZ coordinates. Thus, any color C can be represented by a weighted sum of these 3 primaries

C = X x̄ + Y ȳ + Z z̄.    (4.1)

Figure 4.7: CIE Color Matching Experiment Data; copied from Wikipedia, CIE 1931 Color Space, http://en.wikipedia.org/wiki/CIE_1931_color_space.

Usually, however, the CIE coordinate system is given in modified form, by defining the color's chromaticity coordinates (x, y, z). These coordinates measure the color's chromatic content, and are given by

x = X/(X + Y + Z),                    (4.2)
y = Y/(X + Y + Z),                    (4.3)
z = Z/(X + Y + Z) = 1 − (x + y).      (4.4)

The CIE matching function ȳ was chosen to be identical to the human response to luminance (compare Figures 4.2a and 4.8). Thus, the CIE Y coordinate does not contribute to the chromatic content of a color and can be thought of as the luminance or brightness of the color. Since the z component of chromaticity can be calculated directly from the x and y components, a unique color specification is given by (x, y, Y). This is known as the CIE xyY color specification. It has the advantage that all of the chromatic information is contained in the coordinate pair (x, y) and all of the luminance information is given by one coordinate Y. A cross section through the CIE xyY space for a fixed luminance (i.e. a fixed value of Y) looks like that shown in Figure 4.9.
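As a small sketch (the helper name is hypothetical), computing the xyY specification from XYZ tristimulus values is just the arithmetic of equations (4.2) to (4.4):

/* Convert CIE XYZ tristimulus values to the xyY specification.
   Assumes X + Y + Z > 0. */
void xyz_to_xyY(double X, double Y, double Z,
                double *x, double *y, double *bigY)
{
    double sum = X + Y + Z;
    *x = X / sum;        /* chromaticity x */
    *y = Y / sum;        /* chromaticity y */
    *bigY = Y;           /* luminance is carried through unchanged */
    /* z is redundant: z = 1 - (x + y) */
}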

There are methods of going from a catalogued CIE color to an RGB triple that require knowing some characteristics of the display device (e.g. the CIE coordinates of the phosphors of a CRT). The science of using CIE information fits into the broad area of colorimetry, and is something that you may want to study further if you have a deep interest in color.

Figure 4.8: The CIE XYZ Color Matching Functions; copied from Wikipedia, CIE 1931 Color Space, http://en.wikipedia.org/wiki/CIE_1931_color_space.

Page 51: The Digital Image - Clemson University

10 CHAPTER 4. COLOR

Figure 4.9: Cross Section of the CIE Space for Fixed Luminance Yfrom Schubert, E.F., Light Emitting Diodes, Colorimetry, 2003.¡http://www.ecse.rpi.edu/ schubert/Light-Emitting-Diodes-dot-org/¿


Chapter 5

Simple Image File Compression Schemes

We have noted earlier that one of the big issues that we will need to deal with in treating the subject of image files is file size. What ways are available to store accurate image information in a file, while minimizing the storage used for the file and at the same time keeping file access time to a minimum? This issue is clearly of importance in the animation and film industries, where it is often necessary to store many thousands of images to fully describe a scene. It is also of crucial importance in multimedia, where it is often necessary to squeeze the maximum out of the limited storage capacity of a CD-ROM or other personal computer storage medium. And, not of least significance, image storage size has crucial implications for the speed of transmittal of the image between computers, especially within the context of visually-oriented computer networking such as is done over the Internet using the World Wide Web. The formula for transmission time is quite simple – it is exactly proportional to the size of the file as it is encoded for transmission.

All compression schemes exploit the fact that most images contain a good deal of redundant information, and that this redundancy manifests itself in various kinds of coherency in the image. We will define coherency as the tendency for one portion of an image to be similar to other portions. Often coherency is most apparent in regions of the image that are physically near each other. For example, a digital image containing a pale yellow wall in the background will have broad expanses of nearly identical colors, and in fact it may have many thousands of pixels with exactly the same color value. This is the simplest kind of coherency – many copies of the same color.


Coherency, however, can be a more sophisticated concept. For example, we could have the same wall, but now with a light gradation over its surface, so that we have a smooth and predictable variation in light intensity (value) across the wall. Another more sophisticated example could be a picture containing a wall in the background that is covered with a patterned wallpaper, so that we have large regions that contain the same repeating texture. There might be schemes for exploiting this coherency, for big space savings.

In this chapter, we will look at several of the simpler file compression schemes, namely run-length encoding, the use of color tables, and a more sophisticated table-based encoding scheme. In a later chapter, we will look at two much more powerful but also much more complex schemes – the JPEG protocol and Wavelet compression.

5.1 Run-Length Encoding

Run-length encoding, in its simplest form, takes advantage of color coherency along a scanline. It is a readily observable fact that in certain images there are many regions along a scanline where adjacent color values are repeated across several pixels. This group of identical pixels, taken together, is called a run. Runs appear frequently in synthesized computer graphic images, where it is not unusual for large areas to tend to have the same color. Runs appear less frequently in photographic images, especially when they are of natural scenes.

In the simplest run-length encoding scheme, instead of storing each pixel value along a scanline, a pair of numbers is stored for each run along the scanline. The first number in the pair is a repeat count, and the second is the pixel value. For example, suppose we had the scanline

2 2 2 2 2 3 4 1 1 1

which is ten pixels long. The run-length encoding of this scanline could be written

(5 2) (1 3) (1 4) (3 1)

which contains one pair for each of the four runs in the scanline, requiring only eight distinct values, and saving two units of storage. (Note that in the above example, the parentheses are used only for clarity, and would not actually need to be stored.) The savings of 20% in total space is about what can be expected for a typical computer graphic image. However, there are no guarantees. In the worst case, where no adjacent pixels have the same color (i.e. all runs are of length one), this scheme actually doubles the space used, since a pair of numbers would be required to represent each pixel!


Fortunately, there is an easy improvement that can be made on the simple run-length encoding scheme that solves this problem. The trick is to somehow "flag" the repeat count, such that the repeat count without the flag is just the same as in the simple scheme, but the count with the flag is used to prefix a string of pixels that have different values. Then, to encode the scanline, each run of length two or more would be output using standard run-length encoding, but all consecutive runs of length one would be grouped together and output as a group prefixed by a flagged repeat count. Using this scheme, the scanline from the example above would be encoded

(5 2) (2 3 4) (3 1)

which requires only seven values to encode the original ten pixels. This line would be decoded as follows:

(5 2) ⇒ 2 2 2 2 2

(2 3 4) ⇒ 3 4 the flag indicates that 2 explicitly given values follow

(3 1) ⇒ 1 1 1

With this modified runlength encoding scheme, we add only one extra value per scanline, even in the worst case when every pixel has a different value. In actual implementation, worst-case performance might be a bit worse than this, since we will use a limited storage space (typically one byte) for the runlength count, so that the length of a run or an unencoded group of pixels will typically be limited to less than the length of a scanline. Many popular computer graphics image file formats use this type of run-length encoding scheme. Important examples are Wavefront RLA (.rla) files, Silicon Graphics Iris RGB (.rgb or .sgi), and Softimage Picture files (.pic).
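A minimal sketch of the flagged scheme in C appears below. The flag convention chosen here is only one possibility (it happens to resemble the SGI scheme described in the next section): a count byte of 1..127 introduces a run, and a count byte of 128 + n introduces n literal pixel values. The names are hypothetical and the output buffer is assumed large enough.

#include <stddef.h>
#include <stdint.h>

/* Encode one scanline of 8-bit pixels with flagged run-length encoding.
   Counts 1..127 introduce a run; counts 128+n introduce n literal pixels.
   Returns the number of bytes written to out. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        /* measure the run starting at pixel i (cap the count at 127) */
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 127)
            run++;
        if (run >= 2) {                  /* worth encoding as a run */
            out[o++] = (uint8_t)run;
            out[o++] = in[i];
            i += run;
        } else {                         /* gather literals until the next run */
            size_t start = i, lits = 0;
            while (i < n && lits < 127 &&
                   !(i + 1 < n && in[i + 1] == in[i]))
                { i++; lits++; }
            out[o++] = (uint8_t)(128 + lits);
            for (size_t k = 0; k < lits; k++)
                out[o++] = in[start + k];
        }
    }
    return o;
}

Run on the example scanline 2 2 2 2 2 3 4 1 1 1, this produces the byte sequence 5 2, 130 3 4, 3 1, i.e. the encoding (5 2) (2 3 4) (3 1) with the literal count flagged by adding 128.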

5.1.1 Silicon Graphics Iris RGB File Format

Silicon Graphics developed its own image file format, known as the Iris RGB file format, and supported by the SGI GL 3D interactive graphics standard. It is not the standard in OpenGL (which supports no particular file format), but nevertheless, the Iris image file format remains in wide use. Iris RGB files typically carry the file name suffix .rgb, although the suffix .sgi is also common. An Iris RGB file can be written in either verbatim or run-length encoded modes. Overall file layout is shown in Figure 5.1, and depends upon which of these two modes is chosen. The verbatim file format of Figure 5.1a contains only the header and the uncompressed image data.


a) verbatim file:             header block; unencoded image data
b) run-length encoded file:   header block; scanline offset tables; run-length encoded image data

Figure 5.1: SGI Iris RGB File Layout

A run-length encoded file of Figure 5.1b contains tables giving scanline offsets, in addition to the header and the run-length encoded image data.

The header is laid out according to the C++ data-structure shown in Figure 5.2. The magic number of an Iris RGB file is 01DA (hex), and is stored in the first two bytes of the file. The storage byte indicates whether the file is stored verbatim or run-length encoded. The image is organized into channels, with one channel per color primary and additional channels for any auxiliary values, such as an alpha value. The bpc byte indicates whether one or two bytes per pixel are used for channel data. The dimensions byte indicates how many scanlines and channels are stored in the file, and will normally be set to 3, indicating that the ysize and zsize entries below give the number of scanlines and channels. If dimensions is set to 1, the image file consists of just one scanline and one channel, so ysize and zsize are ignored. If dimensions is set to 2 there is only one channel, so zsize is ignored. The xsize, ysize and zsize entries are each two bytes long, and give the scanline length and image height in pixels, and the number of channels. The pixmin and pixmax entries are each four bytes long, and are used to hold the minimum and the maximum channel values in the image. The imagename entry is an 80 byte area that is used to store any text description of the image that is desired. Typically this will store the filename of the image. The colormap entry is normally set to 0. Non-zero entries were used in the past to give indexed colormap information for files with pixels stored as indices into a color table.

Verbatim format

Image data in verbatim Iris RGB files is stored starting with the bottom image scanline, working to the top of the image. Thus, the first pixel stored is in the bottom lefthand corner and the last stored is in the upper righthand corner. All of the entries for channel 1 are stored, then all of the entries for channel 2, etc. For a four channel RGBα image, channel 1 is used for red, 2 for green, 3 for blue and 4 for α. Channel data will be stored using either one or two bytes per pixel, depending on the value of the bpc entry in the file header.

struct Iris_Header {
    short magic;             // 01DA(hex) Iris RGB magic number
    char storage;            // 0 = verbatim, 1 = run length encoded
    char bpc;                // 1/2 bytes per pixel per channel
    short dimensions;        // 1 = 1 scanline, one channel
                             // 2 = ysize scanlines, one channel
                             // 3 = ysize scanlines, zsize channels
    unsigned short xsize;    // image (scanline) width in pixels
    unsigned short ysize;    // image height (number of scanlines)
    unsigned short zsize;    // number of channels
    long pixmin;             // minimum channel value in image
    long pixmax;             // maximum channel value in image
    long unused;             // unused spacer
    char imagename[80];      // ASCII text describing image
    long colormap;           // 0 = normal, other values obsolete
    char spacer[404];        // unused spacer
};

Figure 5.2: SGI Iris RGB File Header Format

Run-length encoded format

Scanlines of image data in run-length encoded Iris RGB files can be stored in any order. However, access to the scanline data is done through the scanline offset tables. There are two such tables, immediately following the file header. The first is the scanline start table, which gives the location in the file (i.e. the starting byte number from the start of the file) for each channel of each scanline. The second table is the scanline length table, which gives the encoded length of each channel of each scanline. There are four bytes per table entry, i.e. each entry is an unsigned long. Both tables are ordered using the same ordering scheme as for verbatim image data. Starting from the bottom of the image, there will be one entry per scanline for channel 1, then channel 2, etc.

Channel data is run-length encoded on a scanline by scanline basis. Encoding is done using the scheme shown in Figure 5.3. Each channel is run-length encoded using an unsigned 8-bit (one byte) value for the repeat count. (Recall that in 8 bits there is the possibility to store 2^8 = 256 different values. For unsigned 8-bit numbers, this available space is used to encode values from 0 to 255.) If this value is between 1 and 127 it stands for the length of a run, and is followed by a one or two byte channel value that is to be repeated to form the run. Thus, if this repeat count is 10, it means that the channel value following the count should be repeated 10 times to form a run of 10 identical values. However, if the repeat count is between 128 and 255, the count is followed by a sequence of nonrepeating channel values. The number of nonrepeating values will be the count minus 128. A repeat count of 0, indicating end of data, is used to mark the end of each scanline.

Figure 5.3: SGI Iris RGB Run-length Encoding Scheme (a count byte n with 1 ≤ n ≤ 127 is followed by one channel value, one or two bytes wide depending on bpc, repeated to form a run of length n; a count byte n with 128 ≤ n ≤ 255 is followed by n − 128 nonrepeating channel values; a count of 0 marks the end of the data for a scanline)
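A sketch of a decoder for this scheme, for the bpc = 1 case and with hypothetical buffer arguments, is a direct transcription of the rules above (it is not SGI's code, and it does no bounds checking; a real reader would also consult the scanline length table):

#include <stddef.h>
#include <stdint.h>

/* Decode one run-length encoded channel of one scanline (bpc = 1).
   src points at the encoded bytes for this channel/scanline (located
   via the scanline start table); dst receives the decoded pixels.
   Returns the number of pixels written. */
size_t iris_decode_channel(const uint8_t *src, uint8_t *dst)
{
    size_t out = 0;
    for (;;) {
        uint8_t count = *src++;
        if (count == 0)                 /* 0 marks end of scanline data */
            return out;
        if (count < 128) {              /* 1..127: run of identical values */
            uint8_t value = *src++;
            for (int i = 0; i < count; i++)
                dst[out++] = value;
        } else {                        /* 128..255: count-128 literal values */
            int n = count - 128;
            for (int i = 0; i < n; i++)
                dst[out++] = *src++;
        }
    }
}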

5.1.2 SoftImage Picture Files

The SoftImage animation system also uses its own image file format, known as the SoftImage Picture format. Files in this format usually carry the filename suffix .pic. In Picture files, image compression is done using the run-length encoding scheme shown in Figure 5.4. Unlike the Wavefront RLA scheme, the Picture file scheme treats all count bytes as unsigned numbers. In this scheme, the run-length count is stored in either an eight or a 24 bit field. A non-repeating string of explicitly stored values is preceded by a single byte count between one and 127 that indicates the length of the nonrepeating sequence minus one. Short runs, of between two and 128 pixels, are encoded by preceding the repeated value with a single byte count between 129 and 255, which is interpreted as the length of the run plus 127. Longer runs, of from 128 to 65,535 pixels, are encoded by a single byte containing the value 128, followed by a two byte field giving the explicit repeat count. The scheme is made a bit more complicated by the fact that encoded values may be either 24 bit RGB triples or single byte values. Generally, in a full RGB image with an alpha channel, the RGB values will be encoded as triples, and the alpha values will be single bytes encoded as a separate channel from the color information. A channel information block in the image file tells the decoding program how to interpret the channels of image data.

SoftImage Picture files are organized into a header and an image data section. The header has two parts, the picture information block and the channel information block, as shown in Figure 5.5.

C++ code detailing the picture information block is shown in Figure 5.6, and is reasonably self explanatory. The magic number for a Picture file is 5380F634 (hex) and is stored in the first four bytes of the file. Floats are stored as four-byte SGI format floating point numbers. The aspect field is typically 1.0. The fields_code field will be 0 if there is no image data, 1 if only the odd numbered scanlines are stored for the image, 2 if only the even numbered scanlines are stored, and 3 if all scanlines for the image are stored. The channel information section specifies what channels are present in the image data, and how they are organized. Usually, the R, G and B channels are grouped together into 24 bit values per pixel and the alpha values (if present) occupy a second group.

The channel information block immediately follows the picture information block in the file, and has one entry per channel of image data. Each entry in the channel information block is four bytes long, with the format shown in Figure 5.7. The chained byte is set to 1 except for the last entry. So if there are two image channels, the first entry would contain a 1 in the first byte, and the second entry would contain a 0. The size byte will typically be set to 8, indicating the number of bits allocated to each value in a channel. The type field indicates whether or not the channel is run-length encoded. The four upper bits of the primaries byte are used to indicate which of the three color primaries or alpha are encoded in the channel. Typically, this byte will be set to 0xe0 to indicate that the channel contains R, G and B data, or to 0x10 to indicate that it contains alpha data only.

Figure 5.4: Softimage Picture File Runlength Encoding Scheme (a count byte n with 0 ≤ n ≤ 127 is followed by n + 1 nonrepeating values; a count byte n with 129 ≤ n ≤ 255 is followed by a single value repeated n − 127 times, i.e. short runs of 2 ... 128; a count byte of 128 is followed by a two-byte count n, 129 ≤ n ≤ 65,535, and a single value repeated n times; each value is either a single byte or a 24-bit RGB triple, depending on the channel)

Figure 5.5: SoftImage Picture File Format (header: picture information block, then channel information block; followed by the image data)

Page 60: The Digital Image - Clemson University

5.2. COLOR TABLES 9

struct PIC_Header {
    long magic;          // 0x5380f634 -- PIC magic number
    float version;       // PIC file format version number
    char comment[80];    // any text describing the image
    char id[4];          // "PICT"
    short width;         // image width in pixels
    short height;        // image height in scanlines
    float aspect;        // pixel aspect ratio (width/height)
    short fields_code;   // 3 for full frame image
    short unused;        // should be 0 (but might not be!)
};

Figure 5.6: SoftImage Picture File Picture Information Block

typedef struct {
    unsigned char chained;    // 0 = last, 1 = more channels follow
    unsigned char size;       // 8 -- number of bits/channel value
    unsigned char type;       // 2 = run-length encoded, 0 = unencoded
    unsigned char primaries;  // bits indicate R, G, B, alpha:
                              // R 0x80, G 0x40, B 0x20, alpha 0x10
} PIC_CHANNEL_INFO;

Figure 5.7: SoftImage Picture File Channel Information Block

The image data immediately follows the channel information block, and is organized by scanline, with the top scanline of the image appearing first in the file. Each scanline is encoded in channels as indicated by the channel information block, typically as an RGB channel followed by an alpha channel, as shown in Figure 5.8.

Figure 5.8: SoftImage Picture File Scanline Organization (red, green, blue channel followed by the alpha channel)

5.2 Color Tables

Another compression scheme that can be quite effective, if a picture is known to have a limited color palette, is to store a table of the RGB colors in the image file, and then for each pixel store the index of the table entry that holds the RGB color for that pixel. For example, if an image has 256 or fewer colors, a one byte (eight bit) entry for each pixel will suffice to index into a table containing full 24 bit RGB information. This reduces the image file size by 2/3, less the space required to store the 256 table entries.

This scheme can even work for images with large numbers of colors, if one is willing to take some degradation of image quality. There are various ways to quantize an image to reduce the number of distinct colors so that the color table method is effective. A simple way of reducing the number of bits per pixel is by discarding low order bits for each color primary. This essentially "throws away" the least significant part of the primary color value, leaving a picture with a reduced color palette. For example, discarding the bottom two bits of each primary reduces the number of distinct levels per primary by a factor of 2^2 = 4. For natural scenes, with high image complexity, this often is hardly noticeable. However, for computer graphic images, especially ones with smoothly varying color gradations, color quantization leads to very noticeable banding of colors. To avoid artifacts such as banding, more sophisticated techniques for reducing the number of colors in an image can be used. These schemes attempt to choose the colors that will be preserved in an optimal way. One example is a scheme for choosing those elements of the RGB color cube which approximate the colors in the original image with the least variance between the original and quantized images.
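As a minimal sketch of the bit-discarding idea, the following assumes 8-bit primaries and keeps the top 3 bits of red and green and the top 2 bits of blue (a common 3-3-2 split), so that the quantized pixel fits in one byte and can serve directly as an index into a 256-entry color table. The names are hypothetical.

#include <stdint.h>

/* Quantize a 24-bit RGB pixel to a single byte by discarding low-order
   bits: 3 bits of red, 3 of green, 2 of blue. */
uint8_t quantize_332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

/* Rebuild the table entry (the representative color) for an index. */
void index_to_rgb(uint8_t index, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)(index & 0xE0);         /* top 3 bits: red    */
    *g = (uint8_t)((index & 0x1C) << 3);  /* next 3 bits: green */
    *b = (uint8_t)((index & 0x03) << 6);  /* last 2 bits: blue  */
}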

5.3 Lempel–Ziv–Welch Encoding

Another very popular compression scheme is also based on a table, but it involves 1) no loss of data, and 2) no need to store the table! It is known as the Lempel-Ziv-Welch (LZW) algorithm and forms the basis of the well-known Unix compress utility. In a nutshell, this technique works by "discovering" and remembering patterns of colors (or any other data you may want to compress), and storing these patterns in a table – then only table indices are stored in the file. Unlike the color table technique, table entries can grow arbitrarily long, so that one table index can stand for a long string of data in the file. And, by a clever construction, it turns out that the table itself never needs to be stored in the file.


5.3.1 The LZW encoding algorithm

To see how the LZW encoding algorithm works, we will follow a simple example. To begin with, let us assume that there is a fixed number of color values (like 256 for 8-bit pixels). For our example, let us represent a space of only 4 possible colors – A, B, C, or D. Allocate a table called T, a current-input pixel register called in, a current table index register called index, and a string variable called prefix, and consider the example input scanline

A B A C A B A

Then the algorithm works like this

1. Initialize the table T with the possible 1-pixel color values, and the prefix variable with the empty string.

   T:      value  A  B  C  D
           index  0  1  2  3

   in:     (empty)
   index:  (empty)
   prefix: ""

2. while (input not exhausted)
   a. load input register in with next input pixel value
   b. append in to string in prefix
   c. if (prefix not in table T)
      i.   output index
      ii.  insert prefix in table T
      iii. set prefix = in
   d. set index = position of prefix in table

3. output index

Figure 5.9 shows how this algorithm would progress when run on the sample input. In this example, it compresses the original seven input pixels into only 6 output values.
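The same algorithm is easy to express in C. The sketch below follows the pseudocode literally for the toy 4-color alphabet, using a linear search of the string table for clarity rather than speed (all names are hypothetical, and no table-overflow checking is done):

#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4096
#define MAX_LEN     256

/* The string table T, seeded with the single-pixel values A..D. */
static char table[MAX_ENTRIES][MAX_LEN];
static int  nentries;

static int lookup(const char *s)               /* index of s in T, or -1 */
{
    for (int i = 0; i < nentries; i++)
        if (strcmp(table[i], s) == 0)
            return i;
    return -1;
}

void lzw_encode(const char *input)
{
    char prefix[MAX_LEN] = "";
    int  index = -1;

    nentries = 0;                               /* step 1: seed the table */
    for (char c = 'A'; c <= 'D'; c++) {
        char s[2] = { c, '\0' };
        strcpy(table[nentries++], s);
    }

    for (const char *p = input; *p; p++) {      /* step 2: scan the input */
        char in[2] = { *p, '\0' };
        strcat(prefix, in);                     /* append in to prefix    */
        if (lookup(prefix) < 0) {               /* prefix not yet in T    */
            printf("%d ", index);               /* output current index   */
            strcpy(table[nentries++], prefix);  /* insert prefix into T   */
            strcpy(prefix, in);                 /* start a new prefix     */
        }
        index = lookup(prefix);
    }
    printf("%d\n", index);                      /* step 3: final index    */
}

Calling lzw_encode("ABACABA") prints 0 1 0 2 4 0, matching the trace in Figure 5.9.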

5.3.2 LZW Decoding Algorithm

The genius of Lempel-Ziv is in the decoding algorithm. This converts the encoded file back into the original sequence, without having the table that was built upon encoding. In fact, the decoding algorithm recreates this table as it runs.

The algorithm utilizes the following logic:


Sequence of Prefixes and Outputs

   prefix   in   output
   ""       A    -
   A        B    0
   B        A    1
   A        C    0
   C        A    2
   A        B    -
   AB       A    4
   A        -    0

Resulting Table T

   index   value
   0       A
   1       B
   2       C
   3       D
   4       AB
   5       BA
   6       AC
   7       CA
   8       ABA

Figure 5.9: LZW Encoding of the Input String: A B A C A B A


1. We know the first, 1-pixel, table entries. In our example, these were (0, A) (1, B) (2, C) (3, D). Thus, if we read any input from the encoded file that has a value in this range (e.g. 0 through 3), we can directly translate it.

2. We keep track of the previous matched table entry, and when we input the next code from the file, note whether or not it is already in the table.

3. If the new code is in the table, then we know what it stands for, and we also know that the previously translated code, plus the first character of this code, must be the next entry to be added to the table if it is not already there.

4. If the new code is not in the table, then the previously translated code, with its own first character appended to the end, is the translation for the input, and must also be the next entry to be added to the table.

In pseudocode form, the LZW decoding algorithm is given by:

1. Initialize the table T, e.g. T = [(0, A) (1, B) (2, C) (3, D)]

2. Read 1st value from the file, code = getcode()

3. Output(T[code])

4. oldcode = code

5. for(code = getcode(); code != End-of-File; code = getcode())

a if(code < tablelength) /* i.e. code already in T */

i. output(T[code])
ii. insert T[oldcode] with T[code][0] appended, into table T

b else /* code not in table yet */

i. translation = T[oldcode] with T[oldcode][0] appended
ii. output(translation)
iii. insert translation into table T

c oldcode = code
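A matching decoder sketch, again following the pseudocode for the toy alphabet (hypothetical names; the codes are supplied as a simple int array rather than read from a file):

#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4096
#define MAX_LEN     256

void lzw_decode(const int *codes, int ncodes)
{
    static char T[MAX_ENTRIES][MAX_LEN];
    int nentries = 0;

    /* 1. initialize the table with the single-pixel values A..D */
    for (char c = 'A'; c <= 'D'; c++) {
        char s[2] = { c, '\0' };
        strcpy(T[nentries++], s);
    }

    /* 2-4. read and translate the first code */
    int oldcode = codes[0];
    printf("%s", T[oldcode]);

    /* 5. main loop */
    for (int i = 1; i < ncodes; i++) {
        int code = codes[i];
        char entry[MAX_LEN];
        if (code < nentries) {                  /* code already in T         */
            strcpy(entry, T[code]);
            printf("%s", entry);
            strcpy(T[nentries], T[oldcode]);    /* new entry = T[oldcode]    */
            strncat(T[nentries], T[code], 1);   /*   + first char of T[code] */
            nentries++;
        } else {                                /* code not in the table yet */
            strcpy(entry, T[oldcode]);
            strncat(entry, T[oldcode], 1);      /* append its own first char */
            printf("%s", entry);
            strcpy(T[nentries++], entry);
        }
        oldcode = code;
    }
    printf("\n");
}

Calling lzw_decode with the six codes 0 1 0 2 4 0 prints ABACABA, rebuilding the table of Figure 5.9 as it goes.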

Figure 5.10 shows how the example encoded file from Figure 5.9 would be decoded. Recall that the encoded file contains the sequence (0 1 0 2 4 0), which is the encoding of the original scanline (A B A C A B A).


Sequence of Inputs, Outputs, Actions

   oldcode   code          translation   action
   -         0             A             T = [(0,A) (1,B) (2,C) (3,D)]
   0         1             B             add AB to T
   1         0             A             add BA to T
   0         2             C             add AC to T
   2         4             AB            add CA to T
   4         0             A             add ABA to T
   0         end of file   —             —

Figure 5.10: LZW Decoding of the Encoded String: 0 1 0 2 4 0

The way to understand why this algorithm works correctly is to think about how the original encoding algorithm builds its table. Imagine the decoding algorithm as using exactly the same table building logic. However, instead of reading pixels directly from the input file, it gets its input pixels from reading codes and decoding them via the partially built table. As long as a code that is read is in the table already, simply take its translation pixel by pixel and feed it to the encoding algorithm's table building logic. The only time this has a problem is when a code is read that is not in the table. In this case, we know that the missing code must be the next one to be added to the table, and even better, we even have the missing entry partially built. In the encoding algorithm, the partially built table entry is the "prefix" string. Equivalently, in the decoding algorithm, it is the table entry for oldcode. So, instead of being stopped by the missing table entry, we can proceed on using the characters in the partially built new table entry. If we do this, we see that the first character in the partially built entry (i.e. the first character of the translation for oldcode) becomes the last character of the new entry that gets added to the table. Working through the following simple example will demonstrate this to you.

Try the following – both encode and decode the sequence (A B A B A B A)¹. You should get the encoding (0 1 4 6), reducing the original seven input values to four output values. When you try decoding this sequence, you will see that the code 6 is read before the corresponding table entry is built. Note that results obtained with the LZW algorithm are more impressive when there are numerous repeated patterns in the input.

¹also try to pronounce it :)


5.3.3 GIF - Graphics Interchange File Format

The GIF format was developed by CompuServe, Inc. to meet the demand for a highly compressed image file format for storage in remote archives and downloading over networks. The format assumes that display devices are at most 8-bit/pixel resolution, and employ a color-lookup-table. Accordingly, GIF stores a color table along with an image, and the image itself is stored as a sequence of indices into the color table. GIF gets further compression by using LZW encoding to compress the image data.

A GIF file for a single RGB image is organized as follows:

1. Three bytes containing the magic number “GIF”

2. Three bytes containing the GIF version number. This defines the GIF standard employed by the file; it typically is "87a".

3. Seven bytes containing the logical screen descriptor for the file. The first four bytes contain the window width and height, two bytes provide information on color table format, and the last byte is the color table index for the background color. Note, the window width and height are stored in byte reversed form, least significant byte followed by most significant byte (a small sketch of reading these fields follows this list).

4. Global color table. The color table is stored as a consecutive list of RGB values. There is no accommodation for an alpha channel. Although it can be varied, typically the table has 256 entries, each containing three bytes, one for each of the Red, Green, and Blue components of the color.

5. Image descriptor. This begins with a byte containing the identifying code 0x2C. This is followed by eight bytes giving the coordinates of the upper left hand and lower right hand corners of the image in the display window (in byte reversed order), and typically a single byte containing 0 (although there may be much more graphical control information).

6. Image data, in LZW encoded form. When decoded, this gives the color table index for each pixel in the image. Indices are stored in scanline order starting at the upper lefthand corner of the image.

7. Trailer byte. This single byte contains the code 0x3b, and indicates the end of the GIF file.
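As a small illustration of the byte-reversed fields mentioned in item 3 (the function name is hypothetical), the screen width and height can be pulled out of the logical screen descriptor like this:

#include <stdint.h>

/* Extract the screen width and height from the logical screen
   descriptor, which starts at byte 6 of a GIF file.  Multi-byte
   values are stored least significant byte first. */
void gif_screen_size(const uint8_t *file, int *width, int *height)
{
    const uint8_t *lsd = file + 6;          /* logical screen descriptor  */
    *width  = lsd[0] | (lsd[1] << 8);       /* e.g. 2c 01 -> 0x012c = 300 */
    *height = lsd[2] | (lsd[3] << 8);       /* e.g. c8 00 -> 0x00c8 = 200 */
}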

As an example of the GIF format, Figure 5.11 contains portions of an annotated hexadecimal dump of a GIF file encoding of the red cube image, cube.gif, of Figure 2.3. This is the same image that we saw in the PPM assignment.

Byte     Values                                     Decoding

Header
000000   4749 4638 3761                             GIF87a

Logical Screen Descriptor
000006   2c01 c800                                  byte reversed w x h
00000a   f700                                       CMap: 8-bits x 256
00000c   00                                         background index

Global Color Table
00000d   7a 0000
000010   7a01 017b 0000 7b02 027b 0303 7b04 047b    reds and red-greys
000020   0606 7b09 097b 0d0d 7b11 117c 0000 7c01
000030   017c 0606 7c09 097c 0d0d 7c15 157c 1c1c
000040   7c1d 1d7d 0000 7d01 017d 0202 7d04 047d
         ... etc. etc. ...
000100   7f7a 7a7f 7b7b 7f7c 7c7f 7d7d 7f7e 7e
00010f   7f 7f7f                                    middle grey
000112   8000 0080 0202 8004 0480 0909 8022         reds and red-greys
000120   2280 5050 8051 5180 5b5b 8062 6280 6f6f
         ... etc. etc. ...
000300   0000 0000 0000 0000 0000 0000 00

Image Descriptor
00030d   2c
00030e   0000 0000 2c01 c800 00                     (0, 0), (300, 200), 0

Table Based Image Data
000317   08                                         LZW min code size = 8
000318   fe                                         254
000319   00 ad08 1c48 b0a0                          packed data
000320   c183 0813 2a5c c8b0 a1c3 8710 234a 9c48
000330   b1a2 c58b 1833 6adc c8b1 a3c7 8f20 438a
000340   1c49 b2a4 c993 2853 aa5c c9b2 a5cb 9730
000350   63ca 9c49 b3a6 cd9b 3873 eadc c9b3 a7cf
         ... etc. etc. ...
001060   bcda abbe faab c01a acc2 6aa6 0101 00

Trailer
00106f   3b

Figure 5.11: Hexadecimal Dump of cube.gif, with Explanatory Comments

There are a lot of other details and features supported by GIF that we will leave for the interested student to study on his own. For that purpose, you can pick up the file gif89a.doc in /usr/local/misc/courses/viza654/documentation/ if you want more information on GIF. In the same directory you will find lzw.and.gif.doc, by Steve Blakestock, which gives more detail on LZW encoding and the GIF implementation of it.


Chapter 6

The PostScript Page Description Language

PostScript¹ is the most widely known and used alternative to the raster-oriented pixmap-style image description technique. Although PostScript can support images stored as bitmaps or pixmaps, its main use is as a language for describing how a page should be drawn. In other words, PostScript "thinks" of an image as a sequence of steps – an algorithm – for drawing that image from a small set of drawing primitives.

PostScript became widely known and used when several laser-printer manufacturers agreed to build printers that could "understand" the PostScript language. Such printers look at a PostScript file to be printed as a program. The printer loads the program, and executes it – thus drawing out the image. Such printers are really small computers, with several megabytes of memory, that are attached to an electrostatic copy machine, a laser scanner, and an input port through which they can download PostScript files from the computer that is requesting the printing.

Our object in this course will be to achieve a basic level of understanding of the structure of PostScript. We will learn just enough so that we have a feel for this very important file format, and can look at a PostScript file and understand basically what it is all about. We will not attempt to become expert PostScript programmers, but will develop a little of our own PostScript code for a simple problem. Students should not look to these brief notes as anything but an outline of the highpoints of PostScript.

For a complete description of PostScript please see the pair of PostScript reference books in the laboratory. The PostScript Language Reference Manual is a red covered book, and gives a complete description of the details of PostScript. The blue covered PostScript Language Tutorial and Cookbook is a "how to" guide that has lots of fancy examples of things that you can do with PostScript. Both books are by Adobe Systems Incorporated, and are published by Addison-Wesley, copyright 1985, 1986. In addition, a small group of example PostScript programs can be found in

/usr/local/misc/courses/viza654/handouts/postscript/.

¹PostScript is a trademark of Adobe Systems Incorporated.

Figure 6.1a contains a listing – in plain text form – of the PostScript file (i.e. the PostScript program) that will produce the Hello World target picture shown in Figure 6.1b. The file target.ps can be found in the handouts directory listed above. If you have the file in your local directory, then the Unix command

lpr target.ps

will cause the target picture to be sent to the laser printer and printed. The first line of the file (actually a PostScript comment) tells the printer that this is a PostScript file, and not simply a plain text file, causing the printer to read and interpret the file as a PostScript program, rather than just printing it as pages of text.

There are two handy tools on our system for examining PostScript files, to see what they will draw without having to go to the printer. These are the programs xpsview and ghostview. Simply type

xpsview target.ps or ghostview target.ps

to display the image produced by the target program on the screen. Both programs come with a simple interactive interface that should be reasonably self explanatory. My personal preference is for xpsview, as I find that it is faster and gives a truer representation of what finally ends up on the printer. However, xpsview is not available on all machines, so you should know about both programs.

Sometimes, for debugging purposes, you may want to obtain a printout of the text of a PostScript file. You cannot do this by simply sending the file to the printer, as the printer will interpret the file as PostScript. To print the text of a PostScript file, use the program psf, which formats text files for printing, to change the ascii text of the file into a PostScript program that will "draw" the contents of your file. The output of psf can then be sent to the printer via lpr.

PostScript is not a particularly compact representation, so .ps files tend to be large. Fortunately, they are all text, so programs like the Unix compress utility are quite successful in reducing the amount of space needed to store them.


%!PS-Adobe-

/circle {newpath                % procedure to draw a circle
    0 360 arc
    closepath
} def

/inch {72 mul} def              % procedure to convert pts. to inches

/pg save def                    % current state saved in pg

90 rotate                       % page rotated clockwise to landscape mode
0 -8.5 inch translate           % restore y-coordinate to bottom edge

% draw the target
5.5 inch 4.25 inch 3 inch circle stroke
5.5 inch 4.25 inch 2.5 inch circle fill
5.5 inch 4.25 inch 2.0 inch circle 0.5 setgray fill

% draw the text into the target
/Helvetica-Bold findfont 32 scalefont setfont
(Hello World) dup stringwidth pop
11 inch exch sub 2 div 8.5 inch 32 sub 2 div moveto
1 setgray show

showpage pg restore             % restore state after page drawn

a) Program

Hello World

b) Resulting Page at 1/5 Scale

Figure 6.1: PostScript Program to Make Hello World Target Picture


6.1 Basic Notions of the PostScript Language

The basic underlying notion in PostScript is that of the page. A PostScript program works a page at a time, creating a blank virtual page, drawing into the page to produce an image, and finally outputting the page to an output device. Measurements within a page are done in points, a measurement used by typographers; 1 pt. = 1/72 inch. By default, the origin of a page is its lower lefthand corner, as shown in Figure 6.2. The horizontal direction from this corner is the x coordinate, and the vertical direction is the y coordinate.

Figure 6.2: Origin of a Page (the origin (0,0) is at the lower left corner, with x to the right and y upward)

All drawing commands are done in a continuous coordinate system – not a discrete system like in a pixmap (i.e. there are no pixels). Because of this, PostScript images are continuously scalable to an arbitrary size without loss of detail.

PostScript maintains a current path – i.e. a sequence of points, curves, and lines – that can be added to by drawing commands. When a path is completed, it can be painted into the current page by either stroking the path (drawing its outline) or filling the inside of the path with a color.

PostScript also maintains a current transformation matrix (CTM), that acts to transform the original coordinate system of the page into a system more convenient for drawing a portion of the current path. Translate, rotate and scale operators allow one to manipulate the CTM.

6.2 Structure of PostScript Language

The basic data structure of PostScript is the stack. This is a first-in/last-out structure that works just like a stack of cards or books. New things are added to the top. Removals are also from the top. These basic operations on a stack are called push and pop, and are illustrated in Figure 6.3.


Figure 6.3: Operations on a Stack (push adds an item to the top; pop removes the item at the top)

PostScript maintains the following stacks:

• operand stack – results of executing program objects,

• systemdict – system dictionary stack of predefined PostScript operators,

• userdict – user dictionary stack of user defined operators,

• execution stack – call stack for procedure calls,

• graphics state stack – top item is the current context for drawing.

6.3 PostScript Language Syntax and Semantics

The following brief outline of PostScript Language syntax and semantics lists the valid objects in the language, and the action taken by the PostScript interpreter upon encountering the object.

• Numbers: examples of standard notations are 123, -123, -0.002, 1e14. Radix notation follows the base#digits form, as in the base 16 example 16#FF73. Action: push the numeric value onto the stack.

• Strings: delimited by parentheses, as in the example (this is a string). Parentheses can be inserted in a string if they are balanced, like (more (stuff)), or can be escaped by preceding them by a \ character – i.e. \( and \). Standard C special characters like \n can also be used. Action: push the entire string onto the stack as a single unit.

• Names: a sequence of characters containing no whitespace and without any of the syntax characters like (, ), [, ], \, /; for example abc, @nyc, 2for1. Action: the name should be bound to either a procedure or a value. The associated procedure is executed or the associated value is returned.


• Literal Names: any name immediately preceded by a / character, like /abc. action: evaluates to the name itself, i.e. the name abc in the above example.

• Arrays: a sequence of objects delimited by [ and ], like [123 abc (xyz)]. action: the [ symbol marks the start of an array, and the ] symbol marks its end. The objects between [ and ] are evaluated one by one, then the entire result is pushed onto the stack as a single unit.

• Procedures: a sequence of objects delimited by { and }, such as {5 10 moveto}. action: acts just like an array, only the objects between the delimiters { and } are left unevaluated.

• Comments: from a % symbol to the end of the current line, for example % - this is a comment. action: comments are not evaluated and have no effect.

6.4 Execution of PostScript Programs

Execution of a PostScript program is stack oriented. As each expression is evaluated, its operands are popped from the operand stack, and its result is pushed back onto the operand stack. Figure 6.4 shows example execution of program text that adds the numbers 5 and 10 together. Following the example, we see that program execution is from left to right through the program text. The execution of a number from the program text places the number's numeric value on the operand stack. The execution of a procedure, such as add, pops the procedure's operands from the top of the stack, performs the designated operation, and pushes the result back onto the stack. We see that add retrieves the top 2 values from the stack, adds them, and pushes their sum back onto the stack.

Because of the stack orientation of PostScript, arithmetic is naturally done in postfix form. What this means is that operators follow their operands, rather than being between their operands as in normal arithmetic (e.g. 5 10 add rather than 5 + 10). This takes a bit of getting used to, but is actually just as convenient as the usual infix scheme. One consequence of the scheme is that order of evaluation is strictly dependent upon position within a sequence, and thus correct evaluation of expressions does not require the use of parentheses.

The arithmetic operators that are built into PostScript are: add, sub, mul, div, idiv, and mod. They work as shown in Figure 6.5. As an example, the standard infix expression


program text:  ... 5 10 add ...

operand stack      program execution
...                before the 5 is executed
... 5              after executing 5
... 5 10           after executing 10
... 15             after executing add

Figure 6.4: Example PostScript Program Execution Sequence

(3x + 2y)/4, is equivalent to the PostScript postfix notation

3 x mul 2 y mul add 4 div.

a b add   →  a + b
a b sub   →  a − b
a b mul   →  ab
a b div   →  a/b      (real result)
a b idiv  →  a/b      (integer result)
a b mod   →  a mod b

Figure 6.5: PostScript Arithmetic Operators

6.4.1 Procedure definition and execution

The operator def turns a name into a variable, by binding the name to either a value or a procedure. It takes the top 2 stack elements – the top element being the value to be bound or assigned to the literal name which is the second element on the stack. To define a procedure, the top element must be a procedure { . . . }. def binds the name to the procedure, so that when the name is executed in a program, the procedure bound to it is evaluated. The following example defines a procedure with the name sqr, that computes the square of the current top stack element.

/sqr {dup mul} def % squares the top stack element

An example use of the sqr procedure would be

4 sqr

which would leave the value 16 (4 squared) on the top of the stack after execution.

So, a procedure can be looked at as requiring a certain stack configuration when it is called, and as leaving the stack in a new configuration after its execution. Typically, a procedure will remove its arguments from the stack (pop them from the stack), compute one or more results and place the result value(s) on the stack (push them onto the stack). Thus, to fully document what a procedure does, one must specify 1) what operands it expects and in what order they must appear on the stack, 2) what values are returned to the stack by the procedure, and 3) what side effects (if any) the procedure has. Side effects are effects of running the procedure, other than changes to the operand stack. For example, some procedures affect other stacks, like the graphics stack, and others cause graphics to be drawn.

6.4.2 Control flow

Normally, PostScript programs are executed in sequence. Flow of control operators break the normal sequence of execution, and allow selective execution and looping. The most important control flow operators supported by PostScript are if, ifelse, and for.

Before describing these operators, it is necessary to show how logical expressions are built up. As in any programming language, there are operators to test relationships. These relational operators are shown in Figure 6.6.

The control flow operators are summarized in Figure 6.7. The if operator in PostScript pops a boolean expression and a procedure from the stack, and executes the procedure only if the boolean expression is true. Similarly, the ifelse operator pops a boolean expression and two procedures from the stack. It executes the top procedure if the boolean expression is false, and the second procedure if the boolean expression is true. The for operator provides a looping mechanism. It pops an initial counter value, a counter increment value, a counter limit, and a procedure from the stack. The procedure is executed once for each counter value, from the initial value up to the limit, with the counter incremented by the increment value after each execution.


operators
operator        gt   ge   lt   le   eq   ne
relationship    >    ≥    <    ≤    =    ≠

relational operator examples
prior stack     operator     ending stack
6 2             gt           true
6 2             lt           false

Note that true & false are defined symbols in PostScript

Figure 6.6: Relational Operators in PostScript

The current counter value is pushed onto the stack just before the procedure is executed. The procedure is responsible for removing this counter value from the stack. If the increment value is negative, the counter is decremented by this value on each iteration, and termination is when the counter is less than the limit. If the initial value exceeds the limit the procedure is never called.

control flow operator examples
prior stack                            operator    ending stack
boolean-expr proc-true                 if          results of proc
6 2 gt {(big)}                         if          (big)
6 2 lt {(small)}                       if          —
boolean-expr proc-true proc-false      ifelse      results of proc
6 2 gt {(big)} {(small)}               ifelse      (big)
6 2 lt {(big)} {(small)}               ifelse      (small)
initial incr limit procedure           for         results of proc
0 0 2 10 {add}                         for         30

Figure 6.7: Control Flow Operators in PostScript


6.5 Graphics in PostScript

6.5.1 Graphics State

The graphics state in PostScript is a description or list of parameters that control how drawing commands will be interpreted. This includes:

• CTM – current transformation matrix, determines the drawing coordinate system.

• point – where the virtual pen is that is “tracing out” the drawing

• current path – accumulation of lines and curves defining a boundary that will be drawn in by a stroke or fill command.

• clip path – defines a boundary, outside of which stroke and fill commands will have no effect.

• grey level (or color for color PostScript)

• line width – width in points of a stroke

• cap – 0, 1, 2

• join – 0, 1, 2

• miter limit – limit on length of sharp miters (when join = 0)

• dash – dash pattern for dashed lines

• flatness – measure of how accurately curved lines are rasterized when filled or stroked.

The graphics state can be saved, in its entirety, on the graphics state stack via the operator gsave. It can be restored from this stack via grestore. gsave and grestore give one the ability to, for example, scale an individual object while it is being drawn without affecting the rest of the page. Simply save the graphics state on the stack, change the scale and draw the object, and then restore the graphics state from the stack before continuing.

Commands that affect the parameters of the current graphics state are


n setgray                black 0 ≤ n ≤ 1 white
n setlinewidth           n is line width in points
n setlinecap             n = 0, 1, 2
n setlinejoin            n = 0, 1, 2
n setmiterlimit

pattern offset setdash   pattern = array of dashes and gaps,
                         offset = offset from start of line

n setflat                n < 1: very fine, n larger: cruder

Drawing operators are those that have side effects that alter the current path. These are


x y moveto                   move current drawing point to (x, y)
dx dy rmoveto                move current drawing point by increment (dx, dy)
x y lineto                   draw line from current point to (x, y)
dx dy rlineto                same as lineto, but incremental movement by (dx, dy)
x y r θ0 θ1 arc              counterclockwise arc; circular arc measured from the
                             x axis, with center (x, y) and radius r
x y r θ0 θ1 narc             clockwise arc
x1 y1 x2 y2 x3 y3 curveto    Bezier curve; the Bezier control points are the
                             current point and the 3 (x, y) argument pairs
x1 y1 x2 y2 r arcto          curve tangent to 2 lines; determines the curve of
                             radius r that is exactly tangent to the lines defined
                             by the current point and (x1, y1), and by (x1, y1)
                             and (x2, y2). A straight line is drawn from the
                             current point to the first tangent point, and an arc
                             is drawn between the two tangent points
newpath                      initialize the path to be empty
closepath                    connect start and end of current path

There are three operators that affect or make use of the clip path. These are


initclip      set clip path to device outer boundary
clip          clip path = intersection of clip and current paths; the clip path
              is replaced by the intersection of the clip path and the current
              path. This means that all of the interior of the current drawing
              path that lies within the clip path becomes the new clip path.
clippath      set current path to clip path

Operators that cause the current path to be drawn into the device's raster, or otherwise affect the output page are

erasepage     that's what it does!
fill          fill inside current path with gray (or color)
stroke        outline current path using line style parameters

The command that causes the current page to be sent to the output device (i.e. printed on the page for a laser printer) is

showpage

Commands that affect the current transformation matrix are

dx dy translate
sx sy scale
angle rotate      angle degrees counterclockwise

The way to think about these commands is that they operate on the coordinate axes, as shown in Figure 6.8. The effect of these operations on the image is to cause all commands that affect the current path to be executed in the transformed coordinate frame.

6.5.2 Working with Text

Characters in PostScript are actually drawn using the drawing commands described above. However, it is convenient to have a special set of commands to handle text characters. Characters are stored in fonts. You can think of a font as a dictionary that associates each drawable character (like the letter ‘a’) with a procedure for drawing that character. The names for the fonts that are typically found in all PostScript implementations are: Times-Roman, Helvetica, Courier, and Symbol.


3 2 translate

30 rotate

0.5 0.5 scale

Figure 6.8: Translate, Rotate and Scale Transform the Coordinate Axes

There are also bold, italic, and bold-italic versions: Times-Bold, Times-Italic, Times-BoldItalic, Helvetica-Bold, Helvetica-Oblique, Courier-Bold, Courier-Oblique. Of course, most implementations have many more fonts available.

The following operators operate on fonts and display text

key findfont             looks up the font with the name key and pushes it onto
                         the stack, e.g. /Times-Roman findfont

font scale scalefont     applies scale to font, returning the scaled font to the
                         stack

font setfont             makes font the current font, e.g.
                         /Courier findfont 14 scalefont setfont
                         makes Courier 14 the current font

Once a font is set, the command

string show

will draw the string into the page starting at the current point. If character outlines are needed, then

string boolean charpath

will append the outlines of the characters in string to the current path. If the boolean operand is false, the outline will be suitable for stroking. If true, the outline will be suitable for filling or as a clip boundary.


Chapter 7

Compositing

Image compositing has been used for many years in traditional film, graphics and animation to achieve a variety of effects. Compositing can be as simple an operation as to lay down a “masked out” set of pixels from one image onto a background image. This is similar to the process shown in Figure 7.1 that is used in classical animation. Foreground characters are painted with opaque ink onto transparent cels, laid down over a background painting, and photographed to make a composite. This allows the animators to use a single background image for an entire animated scene. For most animation work, this idea is pushed quite far, with a typical frame being the composite of multiple cel layers. Other techniques, such as blue screening, use the same kind of idea to composite film or video imagery together – again superimposing the image of an opaque foreground character over a background. Modifications of this idea are to make the foreground characters semi-transparent, blending foreground characters with background images, to create a variety of effects.

Compositing is particularly useful in computer graphics, and its ease of use in this medium is largely responsible for the explosion of interest in digital techniques in film special effects production. The technique of morphing is usually implemented by combining image warping with compositing to blend one image into another in such a way that it looks like the first image is “turning into” the second image. Other uses of compositing are to integrate animated imagery with recorded live action imagery, to produce effects that would be impossible to stage, such as characters with missing limbs, action in fantastic settings, and to integrate new footage with stock footage.


Figure 7.1: Compositing Foreground Cel with Background Image

7.1 Alpha

Recall that in our early discussion of framebuffers and image storage, we noted that besides storing red, green and blue primary values for each pixel, a full-color framebuffer, a pixel of which is diagrammed in Figure 7.2, will often also have eight bits per pixel available to store an alpha value. So far we have not dealt with this quantity other than to mention its presence. In principle, this alpha value could be used for any convenient purpose in an image manipulation algorithm, but its usual use is for compositing.

α  R  G  B

Figure 7.2: Four Component Pixel Value

When used for compositing, the alpha value of a pixel is interpreted as the pixel's opacity. On a scale from 0 to 1, an alpha value of 0 indicates that the pixel is fully transparent and a value of 1 indicates that it is fully opaque.


7.2 The Over Operator

The over operator, operating between two images, is the basic method for compositing a foreground image F over a background image G to produce a composite image P. The entire process would be written symbolically as

P = F over G.

To define the over operator, let us let C represent a color channel value (i.e. a red, green or blue value) for a specific single pixel in one of the images, and let a subscript on C, such as CF, indicate which image the color channel value is from. Similarly, let α represent the alpha value for the same pixel, again with a subscript to indicate the image. Then, the per pixel operation implied by the over operator is given by

CP = αF CF + (1 − αF)CG. (7.1)

This is often rewritten for implementation as

CP = αF(CF − CG) + CG

to save a multiply. Note that all we are saying by this formulation is that the output color is computed by a linear interpolation between foreground and background pixel color channel values. The computation for each of the red, green and blue color channels is identical. If it is desired to composite multiple images over the same background image, this process can be repeated, compositing each foreground image into the background image, one at a time.
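To make the arithmetic concrete, here is a minimal sketch of Equation 7.1 applied to one pixel in C. The Pixel struct, and the assumptions that channels are floats in [0, 1] and that the background is fully opaque (as in Equation 7.3), are illustrative choices, not part of the notes.

typedef struct { float r, g, b, a; } Pixel;   /* channels assumed in [0, 1] */

/* Composite foreground F over an opaque background G (Equation 7.1,
   written in the one-multiply form used in the text). */
Pixel over(Pixel F, Pixel G)
{
    Pixel P;
    P.r = F.a * (F.r - G.r) + G.r;
    P.g = F.a * (F.g - G.g) + G.g;
    P.b = F.a * (F.b - G.b) + G.b;
    P.a = 1.0f;   /* background assumed fully opaque, so alphaP = 1 (Eq. 7.3) */
    return P;
}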

However, it is often desirable to composite multiple foreground images together and then composite the result with a background. In other words, we would like the over operator to be associative, so that we can group compositing operations in any way and still get the same end result. More precisely, if we let A and B represent two different foreground images, and G the background image, we would like the over operator to be defined such that it obeys the associative law

A over (B over G) = (A over B) over G.

The questions to be dealt with are 1) how do we reformulate the over operator to use the two alpha values in the foreground images to composite the color-primary values into the intermediate composited foreground image, and 2) how do we extend the over operator to combine the alpha values from the two foreground images to provide an alpha value for the intermediate image?


The problem is simplified somewhat by the fact that we know what the final result of the multiple compositing should be for the color-primary values. This is simply the result obtained by compositing each of the foreground images into the background image one at a time, using the process given by Equation 7.1. If we first composite image B with the background, and then composite image A with the result, by Equation 7.1 we would have

CP = αACA + (1 − αA)[αBCB + (1 − αB)CG]. (7.2)

Also, it is clear that the final composited image would have an alpha value of 1 for all pixels, since the compositing is being done into the fully opaque background image G. Thus, we have

αP = 1. (7.3)

Now, let

H = A over B (7.4)

stand for the unknown intermediate foreground image that would be formed if we first composited the two original foreground images to form an intermediate image H. Then the final operation to composite the intermediate image H with the background image G is described by

CP = αHCH + (1 − αH)CG. (7.5)

Equation 7.2 can be rearranged to yield

CP = αACA + (1 − αA)αBCB + (1 − αA)(1 − αB)CG. (7.6)

Comparing Equations 7.5 and 7.6, we see that for associativity to hold we must have

αHCH = αACA + (1 − αA)αBCB, (7.7)

and

(1 − αH) = (1 − αA)(1 − αB). (7.8)

Equation 7.7 can be solved for CH to yield

CH = (αA/αH)CA + (1 − αA)(αB/αH)CB, (7.9)

and Equation 7.8 can be solved for αH to yield

αH = αA + (1 − αA)αB. (7.10)


7.3 Associated Colors

Now, Equations 7.9 and 7.10 give us methods for computing both the color-primary and alpha values for the intermediate image H. The problem with this formulation is that the two equations give us different calculations for opacity (alpha) and color value. It would be handy to have a unified scheme, so that color-primary and alpha values could all be calculated by the same method, meaning that we would not have to differentiate between the two kinds of quantities used in representing a pixel. The idea that will provide the desired unification is to use Equation 7.7, where each pixel color-primary value appears premultiplied by its associated alpha value.

The association of each color-primary with its alpha value can be done initially in the framebuffer by a simple multiplication. When an image is stored in this way we say it is an associated color image. Let the two foreground images A and B be stored in associated-color form, and let H be the intermediate associated-color image obtained by compositing A over B. Then by Equation 7.7 our formulation for the color primary value reduces to

CH = CA + (1 − αA)CB, (7.11)

which would be computed in the same way as we computed alpha in Equation 7.10.

Equation 7.11 then is the equation governing our unified over operator and fully describes the computation needed to composite both color-primary and alpha values between pixels of associated-color images to produce an associated-color output image. One nice result of this formulation is that it is no longer necessary to differentiate between a foreground and a background image, except to note that a background image would be an image all of whose pixels have alpha values of one.

Two final things about associated-color images are reasonably obvious but worthy of note. The first is that when all alpha values are 1, the original image and its corresponding associated-color image are identical. The second is that before displaying an associated-color image, it is important to remember to divide each stored associated-color color-primary value by its associated alpha value to recover the original color. Otherwise, images will appear washed out.
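A minimal sketch of the unified over operator of Equation 7.11 in C follows; it assumes associated-color (alpha-premultiplied) float pixels, and the struct and function names are illustrative only. Note that the alpha channel is combined by exactly the same rule as the color channels, which is the point of the unification.

typedef struct { float r, g, b, a; } APixel;   /* premultiplied color */

/* Composite associated-color pixel A over associated-color pixel B:
   every channel, alpha included, uses C = CA + (1 - alphaA) * CB. */
APixel over_assoc(APixel A, APixel B)
{
    float k = 1.0f - A.a;
    APixel H;
    H.r = A.r + k * B.r;
    H.g = A.g + k * B.g;
    H.b = A.b + k * B.b;
    H.a = A.a + k * B.a;   /* reproduces Equation 7.10 */
    return H;
}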

7.4 Operations on Associated-Color Images

The desire to unify the color-primary and alpha value calculations was motivated by more than aesthetics. It turns out that the associated color image representation, together with the over operator of Equation 7.11, gives us exactly the formulation that is needed to do other operations on images whose pixels have opacity information stored with them. For example, if we wish to blur a foreground image before compositing it with another image, we would want the blurred image to maintain a reasonable set of alpha values, i.e. we would want the alpha channel to be in some sense “blurred” along with the image color channels. If we did not do this, our resulting image would have clean hard edges where the original alpha mask was, although the rest of the image was blurred. It turns out that if we store our images using the associated-color representation, then operations like blurring, and other image manipulations like size and shape changes, will work correctly if we apply the same operations to both the color channels and the alpha channel.

To verify this, let us look at a combined process of shrinking (minifying) two images and compositing. If we take two images and composite them first, and then follow this composition by the minification, it would be desirable that the result would be the same as if we first minified each image separately and then composited them. Let us say that our minification is simply a scale by one-half in the horizontal direction, so that the resulting image is one-half of the original width, but the height remains the same. To understand what color computations happen when we combine shrinking with compositing, we need only look at two adjacent pixels on a scanline in one image and the corresponding two adjacent pixels in the other image. The shrinking operation will be to replace each adjacent pair of pixels on a scanline in the original image with a single pixel whose color value is the average of the two original pixels. Call the two pixel associated-color color-primary values in the first image P and Q, both with alpha value α1, and let the primaries in the second image be R and S, both with alpha values α2. Now if we were to minify both pictures first, averaging the adjacent pixels into a single output pixel as shown in Figure 7.3a, then the resulting composite would have the value

(1/2 P + 1/2 Q) over (1/2 R + 1/2 S),

which yields

1/2 [(P + Q) + (1 − α1)(R + S)].

If we were to composite first and then minify, as shown in Figure 7.3b, we would have

1/2 (P over R) + 1/2 (Q over S),

which reduces to the same form. A careful check will show that this same property holds over image magnification. In fact, a whole variety of image warping and filtering operations can be performed within a consistent context, given that the images are stored as associated color images. It is also easy to show that this is not true with normal color images. If the same process is attempted, changing the order of operations will result in different colors in the final image.

a) minification first                         b) compositing first

Figure 7.3: Combining Minification and Compositing

7.5 Other Compositing Operations

Once the basic notion of image storage in associated color form has been established, it becomes easy to define additional image combination operators. These are shown in Table 7.1, and are the standard image combination operators.

Table 7.1: Associated Color Image Combination Operations

op          per pixel operation
A over B    CA + (1 − αA)CB
A in B      αB CA
A out B     (1 − αB)CA
A atop B    αB CA + (1 − αA)CB
A xor B     (1 − αB)CA + (1 − αA)CB
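As a hedged illustration of Table 7.1, the sketch below applies each operator channel-wise (alpha included) to associated-color values; the enum and helper function are invented for the example and are not part of the notes.

typedef enum { OP_OVER, OP_IN, OP_OUT, OP_ATOP, OP_XOR } CompOp;

/* Combine one channel value ca (with alpha aa) against cb (with alpha ab).
   For associated-color images the same call is used for r, g, b and a. */
float combine(float ca, float aa, float cb, float ab, CompOp op)
{
    switch (op) {
    case OP_OVER: return ca + (1.0f - aa) * cb;
    case OP_IN:   return ab * ca;
    case OP_OUT:  return (1.0f - ab) * ca;
    case OP_ATOP: return ab * ca + (1.0f - aa) * cb;
    case OP_XOR:  return (1.0f - ab) * ca + (1.0f - aa) * cb;
    }
    return 0.0f;
}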


Chapter 8

Filtering

There is a large body of literature on digital images that has come out of the engineering and scientific communities. This is largely motivated by the desire to more effectively transmit, store, enhance, and reconstruct visual information in a variety of applications. Most notable in this area is work done to enhance and analyze images from microscopy, astronomy, satellites and space probes. NASA missions during the 1970's depended heavily on the ability of digital image processing techniques to extract clean images from data transmitted over many millions of miles of space from very weak space vehicle transmitters. Much of what is now common practice in working with images came out of this work. In this course we will not attempt to exhaustively cover this broad field, but will focus on several well understood procedures that have especially widespread use in the manipulation of images for graphics, special effects, digital painting and photographic enhancement.

Digital image processing algorithms tend to fall into two broad categories, those of filtering and warping. Warping is simply any change to an image that operates on the image's geometric structure. Algorithms for warping will be covered in a later chapter. Here, we will focus on image filtering.

We define a filtering operation to be one in which we modify an image based on image color content, without any overt change in the underlying geometry of the image. The resulting image will have essentially the same size and shape as the original. Let Pi be the single input pixel with index i, whose color is Ci, and let P′i be the corresponding output pixel, whose color is C′i. We can think of an image filtering operation as one that associates with each pixel i a neighborhood or set of pixels Ni and determines a filtered output pixel color via a filter function f such that

C′i = f(Ni).

This process is diagrammed in Figure 8.1.


Figure 8.1: Diagram of the Filtering Process

8.1 Global Filters

A special class of filtering operations modifies an image's color values by taking the entire image as the neighborhood about each pixel. These can be thought of as global filters or simply recoloring algorithms. Of these techniques, the two most commonly encountered are normalization and histogram equalization.

8.1.1 Normalization

The goal of image normalization is to adjust the range of image colors to make sure that all colors fall within the range of colors allowed by the framebuffer or file format being used, and at the same time assure that the full range of color intensities or values is used. A typical and simple normalization process would be to examine an image to determine the minimum and maximum values in the image, and then to rescale all of the colors using a linear map which places the minimum value at 0 and the maximum value at 1 (or 0 and 255 on an 8-bit integer scale).

We first find Cmin and Cmax, the minimum and maximum image channel values, by scanning all of the pixel channel values in the entire image. Then for each channel j of each pixel i in the image we compute a normalized value

C′ij = (Cij − Cmin) / (Cmax − Cmin).


This process guarantees that the resulting image uses the full range of values available, and can be very helpful in the process of repairing a badly underexposed or overexposed image. Later we will see that many image processing algorithms do not guarantee that the resulting pixel colors will lie within the allowable range. Normalization will prove to be a very handy technique for readjusting the colors.
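A minimal sketch of this normalization in C is given below, assuming the image channel values are stored as floats in a single array; the array layout and the function name are illustrative assumptions.

/* Rescale every channel value linearly so the minimum maps to 0 and the
   maximum maps to 1. */
void normalize(float *img, int nvalues)
{
    float cmin = img[0], cmax = img[0];
    for (int i = 1; i < nvalues; i++) {
        if (img[i] < cmin) cmin = img[i];
        if (img[i] > cmax) cmax = img[i];
    }
    if (cmax > cmin)
        for (int i = 0; i < nvalues; i++)
            img[i] = (img[i] - cmin) / (cmax - cmin);
}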

Figure 8.2 shows the effect of the normalization process. Figure 8.2a is a photograph of a natural scene that is exposed normally. Figure 8.2b shows the same scene but with the color values in the image artificially shifted toward the darks so that the maximum color value is only 1/2 of the full possible range. Finally, Figure 8.2c shows what this image would look like after the normalization process. Some artifacts are introduced due to the loss of color accuracy when the dark image was made, but otherwise the resulting image recovers most of the value information in the original image.

a) original image b) image at 1/2 brightness c) image renormalized

Figure 8.2: Image Normalization

As an aside, normalization, as well as all of the image manipulation procedures discussed in these notes, should be done before any attempt at gamma correction of the image for the display. If gamma correction is done before normalization, for example, the linear normalization operation becomes highly nonlinear, failing to preserve either the correct relationships among colors in the image or the correct gamma correction.


8.1.2 Histogram equalization

An image histogram is simply a set of tabulations recording how many pixels in the image have particular attributes. Most commonly, such a histogram records pixel count for a set of discretized brightnesses or values. For example, examine the original natural scene image in Figure 8.3a, and its histogram in Figure 8.3b. In this case, the histogram has 256 bins, each corresponding with one of the integer values from 0 to 255. The height of the vertical line at each of these values indicates how many pixels in the image have a value that, when placed on a scale from 0 to 255 and rounded to the nearest integer, equals the value on the horizontal axis. Although the image of Figure 8.3a is normally exposed, the histogram shows that it is strongly dominated by darks, while still having a good number of pixels that are very bright. In fact, we see that all 256 possible values that can appear in the image are being used. A look at the image itself will verify the information shown in the histogram.

a) original image                         b) histogram

Figure 8.3: Normally Exposed Image and its Histogram

The construction of the image histogram shown in Figure 8.3b is based on brightness values. We begin by initializing the 256 entries in integer array H to 0. This array will be used to collect value counts for the histogram. Recall that given the red Ri, green Gi, and blue Bi components of pixel i, its perceived value is given by

Yi = 0.30Ri + 0.59Gi + 0.11Bi (8.1)


Then for each pixel i in the image, we compute its value Yi using Equation 8.1, scale to the range 0 to 255 and round to produce the discrete pixel value

Ŷi = round(255 Yi).

This is done across the image, keeping a count of the number of pixels having each of the 256 possible discrete values, by incrementing H[Ŷi] by 1 for each discrete pixel value computed.
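The histogram construction can be sketched in C as below; the channel arrays and the assumption that r, g and b are floats in [0, 1] are illustrative choices rather than anything specified in the notes.

/* Build the 256-bin value histogram of an npixels image (Equation 8.1). */
void build_histogram(const float *r, const float *g, const float *b,
                     int npixels, int H[256])
{
    for (int k = 0; k < 256; k++)
        H[k] = 0;
    for (int i = 0; i < npixels; i++) {
        float y = 0.30f * r[i] + 0.59f * g[i] + 0.11f * b[i]; /* value Yi */
        int yi = (int)(255.0f * y + 0.5f);                    /* scale and round */
        if (yi < 0) yi = 0;
        if (yi > 255) yi = 255;
        H[yi]++;                                              /* count this value */
    }
}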

The goal of histogram equalization is to not only assure that the full range of values is used, but that the distribution of values in the image is as uniform as possible across the range of allowable values. To take a more extreme case, consider the image of Figure 8.4a. It was made from the image of Figure 8.3a by artificially enhancing the darks and suppressing the lights, leading to the histogram shown in Figure 8.4b. Clearly, this process has resulted in an image in which it is difficult to see detail, since so many of the values are compressed into a small range. However, since the image still uses the full range of values, a simple normalization would not improve the image. Histogram equalization will act to more uniformly spread the values in the image across the full range, brightening the lighter darks, while leaving bright pixels relatively unchanged.

a) darkened image                         b) histogram

Figure 8.4: Darkened Image and its Histogram

Given an image and its histogram H, it is a simple matter to adjust the image colors so that they make a much more uniform use of the available pixel colors. Simply compute a new value Y′i for each pixel i in the image using its discretized value Ŷi and the histogram by the formula

Y′i = (1/T) ∑_{j=0}^{Ŷi} H[j], (8.2)

where T is the total number of pixels in the image. The original pixel red, green and blue color channel values would then be rescaled by the scale factor

Si = Y′i / Yi. (8.3)

Equation 8.2 simply guarantees that each remapped pixel color will be set such that its value is proportional to the count of all pixels dimmer than or as bright as the pixel being remapped. In this way, pixel colors maintain the same relative brightness ordering with respect to each other, but will be shifted up or down the value axis to adjust the histogram. After remapping, it may be the case that by increasing a pixel's value, one or more of its channel values may exceed the maximum of 255. To correct this, the histogram equalization can be followed by normalization.
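A sketch of the remapping of Equations 8.2 and 8.3 in C follows, reusing the conventions of the histogram sketch above; the cumulative-sum array and all names are illustrative, and values are assumed to be floats in [0, 1].

/* Remap the r, g, b channels of an npixels image so that values are spread
   more uniformly, given the 256-bin histogram H built as above. */
void equalize(float *r, float *g, float *b, int npixels, const int H[256])
{
    float cum[256];                 /* cum[k] = H[0] + ... + H[k] */
    float run = 0.0f;
    for (int k = 0; k < 256; k++) {
        run += H[k];
        cum[k] = run;
    }
    for (int i = 0; i < npixels; i++) {
        float y = 0.30f * r[i] + 0.59f * g[i] + 0.11f * b[i];
        int yi = (int)(255.0f * y + 0.5f);
        if (yi > 255) yi = 255;
        float ynew = cum[yi] / (float)npixels;      /* Equation 8.2 */
        float s = (y > 0.0f) ? ynew / y : 1.0f;     /* Equation 8.3 */
        r[i] *= s;
        g[i] *= s;
        b[i] *= s;
    }
}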

Figure 8.5a shows the dark image of Figure 8.4a after histogram equalization is performed. Comparing the histograms of Figure 8.5b and Figure 8.4b, it is clear how effective the algorithm is in its attempt to more evenly utilize the available color values. The histogram equalized image of Figure 8.5a is clearly not successful as an image from an artistic point of view, but it is very successful compared with Figure 8.4a in terms of our ability to see and correctly interpret the details in the image.

One final note on histogram equalization applies if you are using a framebuffer with a color lookup table. A histogram equalized image can be produced for display much more quickly simply by using Equations 8.2 and 8.3 to modify color lookup table entries rather than the image pixels themselves.

8.2 Local Filters

A local image filter, as opposed to a global filter, is simply any operation that modifies an image's color content by replacing each pixel's value with a new value obtained by examining only pixels in a local neighborhood about the pixel. Image content outside of the local neighborhood has no effect on the pixel's new value.


a) equalized image                         b) histogram

Figure 8.5: Histogram Equalized Image and its Histogram

8.2.1 Construction of local neighborhoods

For all of the filters being discussed here, the neighborhood about a pixel will be taken to be a rectangular region centered on the pixel. The size of the rectangular neighborhood will be determined by the requirements of the filter function, but will be subject to the constraint that its width w and height h must be odd so that the neighborhood's center can lie on a pixel. Thus, in general

w = 2n + 1,

and

h = 2m + 1,

where n and m are positive integers. Figure 8.6 diagrams the relationships between w and n, and h and m.


Figure 8.6: Rectangular Local Neighborhood About Pixel i


When using a rectangular neighborhood around a pixel for purposes of filtering, the question of what to do around the edges of the image invariably arises. For instance, when the pixel under consideration is less than n pixels from the left edge of the image, the neighborhood extends beyond the image's boundary, and it becomes unclear what pixel colors should be included. There are two common ways of addressing this dilemma, by either shrinking the output image or expanding the input image.

Shrinking the output image is the simplest approach. Here we simply ignore the n pixel columns on the left and right of the image and the m scanlines on the top and bottom of the image. If we do this, we will compute no values for the corresponding pixels in the output image, producing an output image that is smaller than the input image by a few scanlines and columns.

If it is important that the output image be the same size as the input image, expanding the input image is the best method. The idea is to think of the image as lying on a 2D plane of infinite extent, and conceptually tile all of this plane with pixels. We then need to consider what color values the pixels external to the image should have. The three most commonly used choices are 1) to give all of the external pixels a fixed value (typically black), 2) to tile the space external to the image with copies of the image, and 3) to tile the external space with alternating mirrored copies of the image. Setting the pixels outside of the image's boundary to a fixed value is clearly the easiest, but will give poor results for many filters or images, since there may be no reasonable correspondence between this fixed color and the image colors. The option of tiling space with the image can produce good results where the image is periodic in nature, with its right edge matching its left, and its top edge matching its bottom. For example, the tiling shown in Figure 8.7 appears to be reasonable in the horizontal direction, due to the repetitive nature of the image. However, the tiling in the vertical direction clearly leads to discontinuities that may produce bad results with many filters. The option of tiling with mirrored copies of the image is shown in Figure 8.8. This procedure nearly always gives good results, as it tends to result in very clean seams between the real image and its replications. As a rule, this method is recommended.
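The mirrored tiling can be implemented by folding any out-of-range index back into the image, as in the small sketch below. The function name and the choice to repeat the edge sample at each reflection are assumptions made for the example.

/* Map an arbitrary integer coordinate i into the range [0, n-1] by
   reflecting it back and forth across the image edges (mirrored tiling). */
int mirror_index(int i, int n)
{
    int period = 2 * n;                       /* image followed by its mirror */
    i = ((i % period) + period) % period;     /* wrap into [0, 2n-1] */
    return (i < n) ? i : period - 1 - i;      /* reflect the mirrored half */
}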

8.2.2 Median filter

The median filter is a local filter that is typically used as a way to improve an image that suffers from shot noise. Shot noise is random dark or light spots, usually a single pixel in size, caused by dirt, electronic noise or image transmission errors.

Figure 8.7: Image Tiling Space

The median filter works as follows. For each pixel in the image, choose a square neighborhood with the pixel being considered at its center. The output pixel value should be the median of the values in the neighborhood. The median is determined by taking each color value in the neighborhood, sorting them into an ordered list, and choosing the value that sorts to the middle position in the list.

The logic of the median filter stems from the observation that a small region of an image will usually contain pixel values that are closely related to each other. Further, one would expect that the pixel in the center of the region would contain a color value very close to the average of the colors surrounding it. Now, if we know that an image contains shot noise, and if the pixel at the center of the region is significantly different from the average, we can consider its value to be “suspect”. In this case, the best we can do is to replace its value by the average of the “non-suspect” pixels in the region. The median filter accomplishes this by sorting pixels with very high or low values to the two extreme ends of the sorted list of colors. However, the color in the center of the list, i.e. the median color, will be near to the mean of the uncorrupted pixels in the region.
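A minimal 3×3 median filter on a single float channel might look like the sketch below; it reuses the mirror_index helper sketched above for the borders, and the row-major layout and all names are illustrative assumptions.

/* Return the median of the 9 values in v (v is reordered in place). */
float median9(float v[9])
{
    for (int i = 1; i < 9; i++) {             /* simple insertion sort */
        float key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; j--; }
        v[j + 1] = key;
    }
    return v[4];                              /* middle of the sorted list */
}

/* Median filter one channel of a width x height image (row-major layout). */
void median_filter3x3(const float *in, float *out, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            float v[9];
            int k = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    v[k++] = in[mirror_index(y + dy, height) * width +
                                mirror_index(x + dx, width)];
            out[y * width + x] = median9(v);
        }
}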

Figure 8.8: Mirrored Image Tiling Scheme

Figure 8.9 shows the kind of results that can be obtained with a median filter. Figure 8.9a is a clean original image, and Figure 8.9b is a version of this image degraded with shot noise. Finally, Figure 8.9c shows that image after applying a median filter employing a 3×3 square neighborhood. Note that virtually all of the noise is entirely removed, and the image suffers only minimal degradation.

a) original image    b) with shot noise    c) after filtering

Figure 8.9: Median Filtering

8.2.3 Convolution

Probably the most important set of filtering operations is implemented by taking a weighted sum or difference of pixels within a neighborhood to produce each output pixel. This approach implements the mathematical concept of the convolution of two functions. The idea is easy to follow, as it can be thought of as sliding a rectangular array of weights over an image, replacing the pixel positioned at the center of the array by the appropriately weighted sum of the pixels under the array. The array of weights is commonly called the convolution kernel. Figure 8.10 shows a snapshot in time of the convolution process, with a kernel of weights centered over the pixel in the fourth scanline from the top and the fourth column from the left of the image. Pixel values are multiplied by kernel values, and the sum replaces the pixel in the fourth scanline and fourth column in the output image.

original image G, kernel H = 1/16 [1 2 1 / 2 4 2 / 1 2 1], neighborhood Ni, filtered image G∗H; new pixel color = 27/16

Figure 8.10: Convolution Filtering

Since the convolution process is a fixed process, a convolution filter is identified by its particular kernel. The simplest of the convolution filters is a uniform filter, or box filter. It is simply one in which all of the weights are the same, and that sum to 1. For each pixel Pi, it produces an output that is the numerical average of the pixels in its neighborhood.

Before proceeding, we should make our terminology and mathematical description more precise. In mathematical notation, it is usual to represent the filter kernel by the letter H. We will call our input image G and the output image G′. For the time being, let us examine the weighting process in one dimension (i.e. along a scanline) since it will be easier to draw and describe. Later, the notation can be easily extended to two dimensions.

The discrete convolution process, involving “sliding” H over G to produce a weighted output, is indicated by the mathematical symbol ∗. Assume that both G and H are represented as arrays with indices beginning at 0 as in the C Programming Language. Then their discrete convolution would be written mathematically as

G′[i] = (G ∗ H)[i] = ∑_{j=−∞}^{∞} G[j] H[i − j]. (8.4)

There are several things to note about Equation 8.4. First, the summation is infinite. However, we may assume that the kernel width is the small odd integer w = 2n + 1, and we can assume that all kernel elements outside of the array bounds are 0. To simplify the notation it will be convenient to reindex the kernel H from −n to n instead of from 0 to w − 1. The second is that the subscript on H decrements from ∞ to −∞ as the index of summation j increases. One way to view this is that according to the precise mathematical definition of convolution, we mirror H before multiplying in the summation. This is so that if we convolve a single point, the convolution will reproduce the point-spread function represented by the convolution kernel. Finally, we note that Equation 8.4 requires the extension of G by exactly n elements on either end to produce valid outputs for each of the N elements in the original input G. Thus, if the input G has width N, it must be expanded, for purposes of the convolution operation, to have N + 2n elements, and indices from −n to N + n − 1. Since we are not interested in the output of the convolution operation beyond the original extent of input G, after these modifications, Equation 8.4 can be rewritten

G′[i] = (G ∗ H)[i] = ∑_{j=i−n}^{i+n} G[j] H[i − j],    i ∈ [0, N − 1]. (8.5)

Figure 8.11 shows the operation implied by Equation 8.5 for a kernel of width w = 3 and an image scanline of width N = 7. The figure clearly shows the weighting process, and emphasizes that since the array indexing for H is reversed from the index of the summation, we have to think of the kernel as being reflected about its center before being superimposed over the image.
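Equation 8.5 translates almost directly into C. In the sketch below the kernel h of width 2n+1 is stored with index 0 corresponding to H[−n], and the input g is assumed to have already been extended by n samples on each end as described in Section 8.2.1; the names are illustrative.

/* 1-D discrete convolution (Equation 8.5).  g has N + 2n entries, with
   g[n + i] holding pixel i of the scanline; h has 2n + 1 entries, with
   h[n + k] holding kernel weight H[k] for k in [-n, n]. */
void convolve1d(const float *g, float *out, int N, const float *h, int n)
{
    for (int i = 0; i < N; i++) {
        float sum = 0.0f;
        for (int j = i - n; j <= i + n; j++)
            sum += g[n + j] * h[n + (i - j)];   /* kernel applied reflected */
        out[i] = sum;
    }
}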

Input:
    kernel H (indices −1..1):               1/4  1/2  1/4
    extended scanline G (indices −1..7):    3  3  2  4  2  1  3  1  1
Output:
    G∗H (indices 0..6):                     11/4  11/4  3  9/4  7/4  2  3/2

Figure 8.11: Convolution G ∗ H

All of this can be extended to two dimensions in a straightforward way. Assume that the kernel H is a rectangular array of width w = 2n + 1 and height h = 2m + 1, and that the image G is of width M and height N. Then we may write

G′[i, j] = (G ∗ H)[i, j] = ∑_{l=i−m}^{i+m} ∑_{k=j−n}^{j+n} G[l, k] H[i − l, j − k],    i ∈ [0, M − 1], j ∈ [0, N − 1]. (8.6)

Equation 8.6 assumes that two-dimensional array indexing is row major, where the first index is the row (scanline) index and the second is the column index.

If, for some reason, it is desired to use a non-rectangular kernel, this can be modeled by making the kernel H a rectangle, but filling its entries with 0's outside of the boundary of the desired kernel shape.

A final note on convolution kernels has to do with kernel construction and efficiency of processing. Many of the important convolution kernels have the separability property that they can be represented as the outer product of two one dimensional vectors. If we take the 1-dimensional convolution kernel H1 and express it as the column vector H1, then the matrix

H2 = H1 H1^t

can be used as the 2-dimensional convolution kernel H2. An example would be the 1-dimensional tent filter kernel

1 2 1

which can be used to construct the 2-dimensional tent filter kernel

1 2 1       1
2 4 2   =   2  [1 2 1].
1 2 1       1

Thus, if we can imagine a 1-dimensional filter and its properties, we can often easily create a 2-dimensional filter that will give similar results.


Along with ease of construction, the separability property leads to an important efficiency consideration. This is that we can get the same result by filtering twice using the 1-dimensional filter as we would get by applying the 2-dimensional filter once. If the filter kernel is small, this has no obvious advantages, but can have a real efficiency payoff for large kernels. Consider a square filter kernel of width w. To filter an image using this kernel requires w² operations per pixel. If the kernel were separable, then an alternative would be to process the scanlines of the image using the 1-dimensional factor of the kernel to produce an intermediate image, and then process the columns of the intermediate image using the same 1-dimensional kernel. Now, the 1-dimensional convolution process takes only w operations per pixel and it is done twice, requiring a total of 2w operations. Using the separability property results in a filtering algorithm whose time grows only linearly with the size of the kernel, whereas the direct 2-dimensional convolution takes time that grows with the square of w. There is clearly a big speed advantage as w becomes large.
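The two-pass idea can be sketched in C as below: one horizontal pass with the 1-dimensional kernel, then one vertical pass over the intermediate result. Borders are handled here by simply clamping indices (a simpler stand-in for the image-extension schemes of Section 8.2.1), and the layout and names are illustrative assumptions.

#include <stdlib.h>

static int clampi(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Separable filtering: apply a 1-D kernel h of width 2n+1 (h[n+k] = H[k])
   along the rows, then along the columns, of a W x Ht row-major image. */
void separable_filter(const float *in, float *out, int W, int Ht,
                      const float *h, int n)
{
    float *tmp = malloc(sizeof(float) * W * Ht);

    for (int y = 0; y < Ht; y++)            /* horizontal pass: w ops per pixel */
        for (int x = 0; x < W; x++) {
            float s = 0.0f;
            for (int k = -n; k <= n; k++)
                s += in[y * W + clampi(x - k, W)] * h[n + k];
            tmp[y * W + x] = s;
        }

    for (int y = 0; y < Ht; y++)            /* vertical pass: another w ops */
        for (int x = 0; x < W; x++) {
            float s = 0.0f;
            for (int k = -n; k <= n; k++)
                s += tmp[clampi(y - k, Ht) * W + x] * h[n + k];
            out[y * W + x] = s;
        }

    free(tmp);
}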

8.2.4 Important convolution filters

There are two main categories of convolution filter. Those that tend to smooth or blur the image are known as low-pass filters and those that tend to enhance edges, noise and detail are known as high-pass filters. In general, a low-pass filter is typified by a kernel dominated by positive weights, whereas a high-pass filter will have more of a balance between positive and negative weights. To gain an intuitive feeling for this terminology, think of an image of the surface of a pond covered with ripples of varying size. There are large smooth ripples covering large areas, and smaller ripples riding on top of the bigger ones. The large ripples provide the general contour to the water's surface and the little ones provide the detail to the surface. What a low-pass filter does is that it tends to remove detail by averaging over a local area, analogous to letting only the large smooth ripples through. What a high-pass filter does is that it tends to flatten out large ripples while enhancing the small ones. They do this by taking differences over a local area, so that the output in a region of nearly identical values goes to zero, while the output in a region with rapidly varying detail will be enhanced.

An important class or family of low-pass or smoothing filters can be obtained by starting with a 1-dimensional box or pulse filter kernel of width three,

1 1 1

Convolving with this box filter yields the simple average of the three pixels in a scanline neighborhood. Figure 8.12a shows this filter kernel graphed.


This can be extended to the 2-dimensional filter kernel

1 1 1
1 1 1
1 1 1

by the principle of separability. It is easy to verify for yourself that if a scanline were convolved with this kernel, and then the resulting scanline were convolved with this kernel again, that the result would be the same as if the original scanline had been convolved with the kernel

1 2 3 2 1

This is simply the convolution of the original kernel with itself. This kernel has the shape shown in Figure 8.12b, and is known as the tent filter. What it does is to apply linearly decreasing weights to pixels with distance from the center. Again, by the separability principle it can be used to form the 2-dimensional filter

1 2 3 2 1
2 4 6 4 2
3 6 9 6 3
2 4 6 4 2
1 2 3 2 1

The final filter kernel shown in Figure 8.12c is the result of convolving the box kernel with itself two more times. The kernel is bell shaped, and is given by

1 4 10 16 19 16 10 4 1

which can again be extended to 2-dimensions by separability. Further repetitions of convolution by the box kernel will tend more and more towards a gaussian curve.

a) box filter          b) tent filter          c) bell filter

Figure 8.12: Filters Formed From Successive Convolution of the Box Filter

The filters resulting from successive convolutions of the box filter tend to give better image smoothing results than a simple box filter of the same width. Figure 8.13 shows how these filters compare on an image containing a white square on a black background. Figure 8.14 makes the same comparisons on an image consisting of a series of vertical lines, and Figure 8.15 makes comparisons on a ripple image. In each of these figures image a) is the original image and images b), c) and d) are versions of image a) filtered by box filters of kernel width 3, 5 and 9 respectively. Image e) is filtered with a tent filter of width 5 and should be compared with image c). Image f) is filtered with a bell-shaped filter of width 9 and should be compared with image d).

There is a noticeable improvement in image quality in the bottom row, with edges smoothly and cleanly blurred rather than blended together with the muddy appearance visible in the second row. This is especially apparent when comparing images d) and f) in Figures 8.14 and 8.15. Another thing to note is that a simple box filter has a strong directional bias in the horizontal and vertical directions that is largely corrected in the higher order filters. This is most obvious when comparing images d) and f) in Figure 8.15.

Figure 8.15 is an especially useful image in demonstrating why this family of filters are called low-pass filters. In the center, where the ripple size is large, there is very little difference between images a) and f). However, in the corners, where the ripples are small and closely spaced, there is a marked blur and decrease in intensity. The low-pass filter passes the low-frequency (long-wavelength) ripples relatively unchanged, but nearly eliminates the high-frequency (short-wavelength) ripples.

The final filter to be considered here is the high-pass filter with the kernel

1  1  1
1 −8  1
1  1  1

This filter acts to suppress large areas of uniform or smoothly varying color, while emphasizing edges, fine texture, and noise. Because the sum of the kernel weights is 0, an area of constant color will result in a filter output of 0. However, along edges or wherever the image has high detail the filter output will be a large positive or negative value. Figure 8.16 was made using this high-pass filter.

The images a), c) and e) are original unfiltered images and images b), d) and f) are the corresponding high-pass filtered images. Due to the high-pass filtering, only the edges in image a) pass through to image b). Image d) replaces each black/white or white/black boundary in image c) with a thin line, and replaces each black or white interior with black. Image f) is perhaps the most interesting, especially compared with the low-pass filtered image f) in Figure 8.15. Here the smooth low-frequency ripples in the center of the image are suppressed, leaving only the effects of the fine high-frequency ripples in the corners.


Figure 8.13: Square Image Low-Pass Filtered Versions


Figure 8.14: Vertical Line Image and Low-Pass Filtered Versions


Figure 8.15: Ripple Image and Low-Pass Filtered Versions



Note that since the filter can produce negative as well as positive values, filter outputs must be adjusted to the range of the framebuffer. The filtered images in Figure 8.16 were made by scaling the absolute value of the filter output to fit the range of allowable values. Another approach would have been to compute the output image and then normalize the result.

There are many other possibilities for convolution filters. These include filters that do a combination of low and high-pass filtering, thus smoothing the image while emphasizing edges, filters that are directionally sensitive, emphasizing edges oriented in a particular direction, and filters that detect and emphasize particular patterns in the image. The convolution filtering homework assignment will give the student an opportunity to experiment with some of these filters on their own.


Figure 8.16: Original Images and High-Pass Filtered Images


Chapter 9

Image Warps

The word warp brings to mind images of large shape distortions, but the technical term image warp is used to refer to any transformation on an image that changes any of its geometric properties. Thus a transformation that simply changes the size of an image and a transformation that makes complex twists and bends to an image are both referred to as warps. Other image operations, even ones that make drastic changes, are not referred to as warps as long as they only change non-geometric attributes such as color, texture, graininess, etc.

We often refer to the operation that does the warp as a map or function that sends an input image into a warped output image. To nail this idea down more precisely, let us start by measuring the input image in coordinates (u, v), and our resulting warped image in coordinates (x, y). Then any warp can be thought of as a mapping that sends each coordinate pair (u, v) into a corresponding pair (x, y). Any such map can be expressed as two functions, X, which determines the transformed x coordinate from (u, v), and Y, which determines the transformed y coordinate from (u, v). This concept is illustrated in Figure 9.1, and is written mathematically as

x = X(u, v),
y = Y(u, v),

or in vector notation as

\[
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} X(u, v) \\ Y(u, v) \end{bmatrix}. \qquad (9.1)
\]

Figure 9.1: Image Map defined by functions X(·,·) and Y(·,·)

9.1 Forward Map

The function pair [X(), Y()] defines what is known as a forward map from the input image to the output image. If the input image were continuous, and the forward map were reasonably smooth (i.e. it had no discontinuities), it would suffice to allow one to "paint in" the warped image from the input image. For each coordinate pair (u, v) in the input image, we would simply color the point [X(u, v), Y(u, v)] in the output image the same color as the point (u, v) in the input image. However, if you reflect a bit you will see that this process is only of hypothetical interest, as it implies that we paint an infinitely finely represented image into another infinitely finely represented image using an arbitrarily large number of painting steps – it works great but it takes forever!

Realistically, digital images are not infinitely fine continuous representations but in fact consist of a finite number of discrete samples. We view an image by spreading the color of each sample over a small area – so that each sample corresponds with a pixel when we view the image. This seems to solve our problem, and allows us to write the simple algorithm

for(v = 0; v < in_height; v++)
    for(u = 0; u < in_width; u++)
        Out[round(X(u,v))][round(Y(u,v))] = In[u][v];

that will perform the forward map with only one operation per pixel in the input image. Note that the round() operations in the assignment statement are necessary to provide integer pixel coordinates. Unfortunately, the application of this simple forward map algorithm to a sampled image will result in the kinds of situations shown in Figure 9.2. The output image will be left with "holes" (i.e. pixels with unknown values) where the output is scaled up compared with the input, and multiple pixel overlaps where the output is scaled down with respect to the input.

Figure 9.2: Forward Map Leaves Holes and Overlaps

The problem is that each pixel in the input image represents a finite (non-zero) area, and actually projects an area of the input image onto the output, as shown in Figure 9.3. If our forward mapping algorithm carefully projected each pixel onto the output, we would see that each input pixel might fully cover some output pixels and partially cover several other output pixels, or it might only partially cover a single pixel. If we did this kind of projection, saving the percentage coverage of each output pixel by the input pixels, we could use these percentages to compute a weighted average color for each output pixel. This would give us an output image without holes or overlaps. But the calculation of the correct percentages could get very complex, especially if the forward map is highly nonlinear (i.e. straight lines get mapped to curved lines). If the map is linear, a simple way to do the projection of a pixel is to project each of its four corners and then connect these projected corners by straight lines. However, if the mapping is non-linear, we would have to resort to other methods to get accurate results, such as mapping many sample points around the pixel contour. No matter what, this method is bound to be slow to compute and much harder to implement than our original simple pixel-to-pixel algorithm.

9.2 Inverse Map

The problems with the forward image mapping process can be solved by simply inverting the problem, turning it into an inverse map. Instead of sending each input pixel to an output pixel, we look at each output pixel and determine what input pixel(s) map to it, as shown in Figure 9.4. To do this we need to invert the mapping functions X and Y. We will name these inverse mapping functions U(x, y) and V(x, y), and define them as

u = U(x, y),
v = V(x, y),

or in vector notation as

\[
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} U(x, y) \\ V(x, y) \end{bmatrix}, \qquad (9.2)
\]

with U and V chosen such that

\[
\begin{aligned}
u &= U[X(u, v), Y(u, v)],\\
v &= V[X(u, v), Y(u, v)]. \qquad (9.3)
\end{aligned}
\]

Figure 9.3: Projection of pixel area onto output raster

In other words, the pair [U(), V()] is chosen such that it exactly inverts or "undoes" the mapping due to the pair [X(), Y()]. Thus, if one samples the output image at point (x, y), then (u, v) = [U(x, y), V(x, y)] gives the coordinates (u, v) in the input image that map to position (x, y) in the output. As in the forward map, we have to round to give integer pixel coordinates, but here we round the coordinates into the input image, not the output image. Thus, even with a sampled digital image, we can compute our output image from the input image using the inverse map, without fear of holes or overlaps in the output image. The mapping algorithm is simply

for(y = 0; y < out_height; y++)
    for(x = 0; x < out_width; x++)
        Out[x][y] = In[round(U(x,y))][round(V(x,y))];

The inverse mapping method is simple and fast to compute, and thus is used extensively in most image warping tasks such as morphing and 3D texture mapping. It also has some nice features, such as the ability to sample through a mask to provide automatic clipping of the output image, as shown in Figure 9.5.


Figure 9.4: Inverse Map gives Complete Covering

Figure 9.5: Inverse Mapping Through a Clip Mask

9.3 Warping Artifacts

However, nothing comes for free, and in fact the inverse mapping process, while solving the serious problems of the forward map, still leaves us with artifacts that can degrade output image quality. These artifacts are due to the fact that in dealing with digital images we are dealing with sampled, not continuous, data. We will treat this issue in detail later, but for now it is enough to know that they fall into two important categories:

• aliasing artifacts, caused by sampling the input too coarsely in some places, leading to missed features and unwanted patterning (Figure 9.6a) – aliasing will occur where a region of the output image is minified (scaled down) with respect to the corresponding region of the input image.

• reconstruction artifacts, caused by using too crude a method (point-spread function, or psf) to reconstruct the area around the input pixel, leading to staircasing or "jaggies" in the output (Figure 9.6b) – reconstruction artifacts will appear where a region of the output image is magnified (scaled up) with respect to the corresponding region of the input image.


We can minimize aliasing by doing a careful job of smoothing the input image before we sample it to produce the output image. We can minimize reconstruction artifacts by using a more sophisticated reconstruction scheme than the square flat areas that we get with pixels. However, these issues will require a careful treatment, and we will not begin to build up the necessary tools until the next chapter.

Figure 9.6: Aliasing and Reconstruction Artifacts

9.4 Affine Maps or Warps

An affine map on an image is of the general form

\[
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} a_{11}u + a_{12}v + a_{13} \\ a_{21}u + a_{22}v + a_{23} \end{bmatrix}.
\]


In other words

X(u, v) = a11u + a12v + a13,
Y(u, v) = a21u + a22v + a23,

where the coefficients aij are constants. Affine maps have several nice properties. They always have an inverse (except in some uninteresting degenerate cases), and they can be represented in matrix form. The map above can be written

\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad (9.4)
\]

where a 1 is padded onto the end of each vector to allow for the translation constants a13 and a23. Multiplying the top row of the matrix by the vector, one can see that the effect of the constant a13 is to translate the x coordinate by an amount equal to the value of a13, i.e.

\[
x = a_{11}u + a_{12}v + a_{13} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}. \qquad (9.5)
\]
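As a small illustration of how Equation 9.4 is used, the sketch below applies a 3 × 3 homogeneous matrix to a point in C. The Matrix3 type and the function name are assumptions made for this example.

typedef double Matrix3[3][3];

/* Minimal sketch: apply a 3x3 homogeneous transform to the point (u,v).
   The implicit third vector element is 1, so the translation terms
   M[0][2] and M[1][2] are simply added in. */
void affine_apply(const Matrix3 M, double u, double v, double *x, double *y)
{
  *x = M[0][0] * u + M[0][1] * v + M[0][2];
  *y = M[1][0] * u + M[1][1] * v + M[1][2];
}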

We have already seen some of the important affine transformations when we studied PostScript. These were scale, translate and rotate. An additional, highly useful, affine transformation is shear. In the matrix form of Equation 9.4 these transformations are:

• scale: (x, y) = (a11u, a22v)

\[
\begin{bmatrix} a_{11}u \\ a_{22}v \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]

• translation: (x, y) = (u + a13, v + a23)

\[
\begin{bmatrix} u + a_{13} \\ v + a_{23} \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & a_{13} \\ 0 & 1 & a_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]


• rotation: (x, y) = (u cos θ − v sin θ, u sin θ + v cos θ)

\[
\begin{bmatrix} u\cos\theta - v\sin\theta \\ u\sin\theta + v\cos\theta \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]

• shear: (x, y) = (u + a12v, a21u + v)

\[
\begin{bmatrix} u + a_{12}v \\ a_{21}u + v \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & a_{12} & 0 \\ a_{21} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]

The structure of the scale, translation and shear transforms is reasonably obvious. The derivation of the rotation transform goes as follows. Refer to Figure 9.7, which shows the rotation of point (x0, y0) through angle θ to make the new point (x1, y1). Since the rotation is about the origin, both points are at distance r = √(x0² + y0²) = √(x1² + y1²) from the origin. The initial angle of rotation of point (x0, y0) from the x axis is α. Then from the definitions of the sine and cosine functions we have

x0 = r cos α,    x1 = r cos(α + θ),
y0 = r sin α,    y1 = r sin(α + θ).

The well known trigonometric identity

cos(α + θ) = cos α cos θ − sin α sin θ,

may be multiplied by r to yield

r cos(α + θ) = r cos α cos θ − r sin α sin θ,

so that by the definitions of x0 and x1 above we may obtain

x1 = x0 cos θ − y0 sin θ.

Likewise, the identity

sin(α + θ) = sin α cos θ + cos α sin θ,

may be multiplied by r to yield

r sin(α + θ) = r sin α cos θ + r cos α sin θ,

so that by the definitions of y0 and y1 above we may obtain

y1 = y0 cos θ + x0 sin θ.


Figure 9.7: Rotation of a Point Around the Origin

9.5 Composing affine warps

In situations where one wants to do a series of affine transformations to one image, the matrix representation of the transform gives us an easy way to compose the transforms.

Let S be a scale, T a translation, R a rotation, and H a shear matrix. Let's do a rotation, followed by a scale, followed by a translation:

1. \( R \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} \)

2. \( S \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} u'' \\ v'' \\ 1 \end{bmatrix} \)

3. \( T \begin{bmatrix} u'' \\ v'' \\ 1 \end{bmatrix} = \begin{bmatrix} u''' \\ v''' \\ 1 \end{bmatrix} \).

This sequence of operations can be written in one equation as

\[
T\!\left(S\!\left(R \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}\right)\right) =
\begin{bmatrix} u''' \\ v''' \\ 1 \end{bmatrix},
\]

but by the associative law, this gives the same result as

\[
((TS)R) \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u''' \\ v''' \\ 1 \end{bmatrix}.
\]


Thus, we see that we can premultiply the matrices to get a single composite transformation matrix

M = TSR,

that will do all 3 transformations (warps) in the specified order, in the single operation

\[
M \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u''' \\ v''' \\ 1 \end{bmatrix}.
\]

This is a wonderfully compact and unified way of constructing a large variety of very useful warps in a very intuitive way – i.e. by simple composition of very easy to understand operations. Figure 9.8 shows an example of a composite operation that first scales an image by 1/2 and then translates it so that it is still centered in the original image rectangle.
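A small sketch of this composition in C follows; the Matrix3 type and the function name are assumptions made for the example.

typedef double Matrix3[3][3];

/* Minimal sketch: multiply two 3x3 matrices, C = A * B. Applying C to a
   point is then the same as applying B first and A second, which is why the
   composite of Figure 9.8 is built as M = T * S (scale first, translate second). */
void mat3_multiply(const Matrix3 A, const Matrix3 B, Matrix3 C)
{
  for(int i = 0; i < 3; i++)
    for(int j = 0; j < 3; j++){
      C[i][j] = 0.0;
      for(int k = 0; k < 3; k++)
        C[i][j] += A[i][k] * B[k][j];
    }
}

With S and T filled in with the values shown in Figure 9.8, calling mat3_multiply(T, S, M) builds the composite warp in a single matrix M.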

\[
S = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
T = \begin{bmatrix} 1 & 0 & 1/4 \\ 0 & 1 & 1/4 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M = TS = \begin{bmatrix} 1/2 & 0 & 1/4 \\ 0 & 1/2 & 1/4 \\ 0 & 0 & 1 \end{bmatrix}
\]

Figure 9.8: Affine Composition – 1/2 Scale Followed by Centering Translation

9.6 Forward and Inverse Maps

A matrix M, which is a composite affine transformation built up as the product of a series of affine transformation matrices, is all we need to compute the forward map

\[
M \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\]

Thus the forward map is implemented by simply multiplying the matrix M by the pixel coordinates (u, v) of each pixel in the input image to produce transformed pixel coordinates (x, y) in the output image.

But, we already know that a forward map is not really what we want, as it will produce holes and overlaps in the output image. Fortunately, it is relatively easy to find the correct inverse map for any affine transformation expressed as a matrix M. All that we need to do is to find the matrix M⁻¹

with the property given by Equations 9.3 that

\[
M^{-1} M \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\]

i.e. that M⁻¹ exactly cancels the effect of M. Now, when applied directly to output pixel coordinates (x, y), M⁻¹ gives us the coordinates (u, v) in the input image that map to (x, y) under the forward map M,

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M^{-1}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\]

If we can find such a matrix, the inverse map would be implemented by simply multiplying the matrix M⁻¹ by the pixel coordinates (x, y) of each pixel in the output image to produce transformed pixel coordinates (u, v) in the input image.

The matrix M⁻¹ is known (not surprisingly) as the inverse of the original matrix M. For a 3 × 3 matrix M of the form

\[
M = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix},
\]

its inverse M⁻¹ is given by

\[
M^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\begin{bmatrix}
a_{22} & -a_{12} & a_{12}a_{23} - a_{13}a_{22} \\
-a_{21} & a_{11} & a_{13}a_{21} - a_{11}a_{23} \\
0 & 0 & a_{11}a_{22} - a_{12}a_{21}
\end{bmatrix}.
\]

Thus, the terms of the inverse are easy to compute from the terms of the original affine transformation matrix, and it is a simple matter to write a C function that, given the affine transformation matrix M, returns its inverse M⁻¹.
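A sketch of such a function appears below; it follows the closed form above directly. The Matrix3 type, the function name, and the zero-determinant guard are assumptions for this example.

#include <math.h>

typedef double Matrix3[3][3];

/* Minimal sketch: invert an affine matrix whose bottom row is (0, 0, 1),
   using the closed form given above. Returns 0 if the matrix is singular. */
int affine_invert(const Matrix3 M, Matrix3 Minv)
{
  double det = M[0][0] * M[1][1] - M[0][1] * M[1][0];
  if(fabs(det) < 1.0e-12)
    return 0;                       /* degenerate map, no inverse */

  Minv[0][0] =  M[1][1] / det;
  Minv[0][1] = -M[0][1] / det;
  Minv[0][2] = (M[0][1] * M[1][2] - M[0][2] * M[1][1]) / det;
  Minv[1][0] = -M[1][0] / det;
  Minv[1][1] =  M[0][0] / det;
  Minv[1][2] = (M[0][2] * M[1][0] - M[0][0] * M[1][2]) / det;
  Minv[2][0] = 0.0;
  Minv[2][1] = 0.0;
  Minv[2][2] = 1.0;
  return 1;
}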

In the next section we will begin to explore a class of nonaffine image warps that can nevertheless be expressed as matrices. We will again have the problem of determining the inverse matrix, but the compact formula above will not hold, since the bottom row of the forward matrix will not, in general, be simply [0 0 1]. In this case we will need to have a convenient algorithm for computing the inverse of such a matrix. The general formula for such an inverse is

\[
M^{-1} = \frac{A(M)}{|M|},
\]

where |M| is the determinant of the matrix M, and A(M) is the adjoint of M. Students with mathematical curiosity can check out a linear algebra book for details on how the adjoint and determinant are computed, or wait for the appendix to these notes, which will be done in about a year!

9.7 Perspective Warps

Perspective warps are popular and useful warps that give the illusion that the image is receding off into space, as if it had been rotated out of the picture plane. Think of the image being painted onto a plane in three-dimensional space, and imagine a second plane, the plane of projection, which cuts through the image plane. A perspective warp can be thought of as a three-dimensional rotation of the image about the line of intersection between the two planes. It distorts the image's shape in such a way that groups of parallel lines that are also parallel to the line of intersection remain parallel to each other, but other groups of parallel lines recede to a common vanishing point. Figure 9.9 shows an example of a perspective warp in which it appears that the image has been rotated around its central horizontal axis. Note that all horizontal lines remain parallel to each other, but that the vertical lines (if extended upwards) would all intersect at a common point. Horizontal lines rotated to be "nearer" to the viewer appear longer than in the original image, and horizontal lines rotated to be "farther" from the viewer are shorter. This change in scale with perceived distance is known as foreshortening, and helps to create an illusion of depth in the perspective image.

Perspective warps extend the affine transforms by adding a non-linear scaling after the matrix multiplication, and make use of the two terms in the third row of the transform matrix that are 0 in the affine warps. They are obtained by setting the terms a31 and/or a32 to non-zero values.


Figure 9.9: A Perspective Warp

When this is done, the product of the matrix by pixel coordinates yields a result where the third coordinate of the resulting vector is not one. This can be seen in the example

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u \\ v \\ a_{31}u + a_{32}v + 1 \end{bmatrix}.
\]

It is common practice to call this third element the w-coordinate of the vector. For all of the affine warps, if the w-coordinate was one before the warp, it remained at one after the warp. For the perspective warps, w is in general not one after the matrix multiplication is done. In fact, the w coordinate is related to the distance of the point (u, v) in the rotated picture plane from the original picture plane – i.e. it can be thought of as a non-linear measure of the distance of the pixel from the viewer.

Abstractly, when we originally formed the vector (u, v, 1) from the two-dimensional coordinate pair (u, v), we were treating each two-dimensional point as if it were a point in three-dimensional (u, v, w) space, lying on the plane w = 1. This is shown pictorially in Figure 9.10, and is known as a homogeneous coordinate system (i.e. a system in which every point has an identical third coordinate).

Returning to the perspective transformation process, we noted that after the matrix multiplication, our homogeneous image coordinates no longer had a third coordinate of one. To complete a perspective warp, and restore our points to homogeneous coordinates with w = 1, we follow the matrix multiplication by a scaling operation. We divide each vector by its own w coordinate, i.e.

\[
\frac{1}{w} \begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} u/w \\ v/w \\ 1 \end{bmatrix}.
\]

This can be seen to be equivalent to connecting the three-dimensional point

with the origin by a line, and determining the intersection of that line with the plane w = 1.

Figure 9.10: Homogeneous Coordinates Lie on the Plane w = 1

So, the process for computing the forward transformation for a perspective warp is to 1) multiply each pair of image coordinates by the perspective warp matrix, then 2) scale each coordinate triple by the reciprocal of its own w coordinate. An example of this process is worked out in Figure 9.11, where the perspective warp matrix is

\[
P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
\]

Note in this example that, no matter how large u is, the projected u coordinate cannot exceed 1. Similarly, for large u, the projected v coordinate tends towards 0 – i.e. the vanishing point is at (1, 0).
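As a concrete sketch (not part of the assignment code), the forward perspective map for a matrix such as P might be coded as below; the Matrix3 type and the function name are assumptions, and the final divide is exactly the homogeneous scaling step described above.

typedef double Matrix3[3][3];

/* Minimal sketch: forward perspective map of the point (u,v). Multiply by
   the warp matrix, then divide through by the resulting w coordinate to
   return to the w = 1 plane. */
void perspective_apply(const Matrix3 M, double u, double v,
                       double *x, double *y)
{
  double xp = M[0][0] * u + M[0][1] * v + M[0][2];
  double yp = M[1][0] * u + M[1][1] * v + M[1][2];
  double w  = M[2][0] * u + M[2][1] * v + M[2][2];

  *x = xp / w;     /* for the example P, x = u / (u + 1) */
  *y = yp / w;     /* for the example P, y = v / (u + 1) */
}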

Unlike the affine warps, however, the perspective warp involves a non-linear operation (dividing by w), and so it may be more difficult to find a suitable inverse function for the inverse mapping operation. Fortunately, it is possible to find the inverse to within a scale, and this will turn out to be all that we need. We shall see about this in the next chapter.

Figure 9.11: Example Perspective Warp, in which

\[
P \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u \\ v \\ u + 1 \end{bmatrix} \Longrightarrow
\begin{bmatrix} u/(u+1) \\ v/(u+1) \\ 1 \end{bmatrix},
\]

so the unit square with corners (0,0), (1,0), (0,1), (1,1) maps to the quadrilateral with corners (0,0), (.5,0), (0,1), (.5,.5).

Chapter 10

Inverse Projective Maps and Bilinear Maps

10.1 Projective Maps

A useful tool for doing warps would allow the user to change the shape of a rectangular image by moving corner points, as shown in Figure 10.1. This would allow us to make any arbitrary shape distortion that preserves linear edges. The problem is: what is the map from input to output?

Figure 10.1: Interactive Image Warping by Moving Corners of Rectangle

The answer is that the map

\[
M \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} \Longrightarrow
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad (10.1)
\]


with x = u′/w′, y = v′/w′,

that we used to implement a perspective transform, will in fact warp a rectangle into an arbitrary convex quadrilateral. We call this entire class of operations the projective transforms.

The elements of the transformation matrix M for a general quadrilateral warp are not hard to compute. It turns out that the lower righthand corner term a33 of M simply acts to provide a uniform scaling. Since this is redundant with the scaling effects of the other diagonal terms a11 and a22, we can arbitrarily set a33 to one without any loss of generality. Having done this, each application of Equation 10.1 to a vertex ui of the starting rectangle gives two equations

\[
x_i = \frac{a_{11}u_i + a_{12}v_i + a_{13}}{a_{31}u_i + a_{32}v_i + 1}, \qquad
y_i = \frac{a_{21}u_i + a_{22}v_i + a_{23}}{a_{31}u_i + a_{32}v_i + 1},
\]

for vertex xi of the warped quadrilateral corresponding to vertex ui. Since we have four corresponding coordinate pairs (ui, vi), (xi, yi), this gives us the eight necessary correspondences to solve for the eight unknown coefficients aij of the transformation matrix M.
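One way to set up and solve these eight equations is sketched below. Rearranging each equation so the unknowns appear linearly (for example, a11 ui + a12 vi + a13 − a31 ui xi − a32 vi xi = xi) gives an 8 × 8 linear system, which is solved here with a small Gaussian elimination. The function names and the use of plain elimination with partial pivoting are assumptions for this example only.

#include <math.h>

static int solve8(double A[8][9]);   /* forward declaration */

/* Minimal sketch: find the eight coefficients of the projective warp that
   sends (u[i], v[i]) to (x[i], y[i]) for the four corner correspondences
   i = 0..3. Coefficients are returned in the order
   a11, a12, a13, a21, a22, a23, a31, a32 (a33 is fixed at 1).
   Returns 0 if the system is singular. */
int projective_coeffs(const double u[4], const double v[4],
                      const double x[4], const double y[4], double a[8])
{
  double A[8][9] = {{0}};

  for(int i = 0; i < 4; i++){
    /* a11*u + a12*v + a13 - a31*u*x - a32*v*x = x */
    A[2*i][0] = u[i];  A[2*i][1] = v[i];  A[2*i][2] = 1.0;
    A[2*i][6] = -u[i] * x[i];  A[2*i][7] = -v[i] * x[i];  A[2*i][8] = x[i];
    /* a21*u + a22*v + a23 - a31*u*y - a32*v*y = y */
    A[2*i+1][3] = u[i];  A[2*i+1][4] = v[i];  A[2*i+1][5] = 1.0;
    A[2*i+1][6] = -u[i] * y[i];  A[2*i+1][7] = -v[i] * y[i];  A[2*i+1][8] = y[i];
  }

  if(!solve8(A))
    return 0;
  for(int i = 0; i < 8; i++)
    a[i] = A[i][8];
  return 1;
}

/* Gauss-Jordan elimination with partial pivoting on an 8x9 augmented
   matrix; on success the solution is left in column 8. */
static int solve8(double A[8][9])
{
  for(int col = 0; col < 8; col++){
    int pivot = col;
    for(int r = col + 1; r < 8; r++)
      if(fabs(A[r][col]) > fabs(A[pivot][col]))
        pivot = r;
    if(fabs(A[pivot][col]) < 1.0e-12)
      return 0;
    for(int c = 0; c < 9; c++){
      double tmp = A[col][c]; A[col][c] = A[pivot][c]; A[pivot][c] = tmp;
    }
    for(int r = 0; r < 8; r++){
      if(r == col) continue;
      double f = A[r][col] / A[col][col];
      for(int c = col; c < 9; c++)
        A[r][c] -= f * A[col][c];
    }
  }
  for(int r = 0; r < 8; r++)
    A[r][8] /= A[r][r];
  return 1;
}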

10.2 Inverse Projective Maps

For purposes of image warping, we have defined a projective map to be any map that can be defined by a 3 × 3 homogeneous transformation matrix. The map itself consists of the following steps for transforming a two-dimensional coordinate pair (u, v):

1. extend (u, v) to a three-dimensional homogeneous vector

\[
(u, v) \longrightarrow \begin{bmatrix} u \\ v \end{bmatrix} \Longrightarrow
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]

2. multiply by the transformation matrix to get the transformed homogeneous vector

\[
M \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix}
\]


3. scale the result by 1/w′ (project onto the plane w = 1) and discard the w coordinate, giving a transformed 2D coordinate pair (x, y)

\[
\frac{1}{w'} \begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} u'/w' \\ v'/w' \\ 1 \end{bmatrix} \Longrightarrow
\begin{bmatrix} u'/w' \\ v'/w' \end{bmatrix} \longrightarrow (x, y).
\]

Now, this whole process is invertible, to within a scale applied to the matrix M. If we compute M⁻¹ (or any scale of M⁻¹), then if we invert the process for the forward map, we get the inverse map:

1. extend (x, y) to three-dimensional homogeneous form

\[
(x, y) \longrightarrow \begin{bmatrix} x \\ y \end{bmatrix} \Longrightarrow
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

2. multiply by M⁻¹

\[
M^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} x'' \\ y'' \\ w'' \end{bmatrix}
\]

3. scale by 1/w′′ and discard the w coordinate

\[
\frac{1}{w''} \begin{bmatrix} x'' \\ y'' \\ w'' \end{bmatrix} =
\begin{bmatrix} x''/w'' \\ y''/w'' \\ 1 \end{bmatrix} \Longrightarrow
\begin{bmatrix} x''/w'' \\ y''/w'' \end{bmatrix} \longrightarrow (u, v)
\]

Note, in the above process, that any arbitrary scale factor is "washed out" when we scale by 1/w′′ in the last step. Thus, although the general projective mapping process contains a non-linear operation (divide by w), it is still possible to use a simple matrix inverse to compute the inverse map. This is of great importance to image warping algorithms, since it means that for any projective map we can always find an inverse map, making it a simple matter to use an inverse mapping process to compute the warped image.

Figure 10.2 gives the computation of the inverse of a general 3 × 3 projective transformation matrix. It is taken directly from the Wolberg text on pages 52-53. One important note is that since we just need M⁻¹ to within a scale, we can skip the divide by the determinant |M|, and simply use the adjoint A(M) in place of the inverse. This gives a nice time savings in the original calculation of the inverse.
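The sketch below puts the three inverse-mapping steps together for a single point, given a precomputed inverse (or adjoint) matrix; the Matrix3 type and the function name are assumptions made for the example.

typedef double Matrix3[3][3];

/* Minimal sketch: inverse projective map of an output pixel (x,y) back to
   input coordinates (u,v). Minv may be the true inverse of M or, since the
   divide by w'' washes out any overall scale, simply its adjoint. */
void projective_unmap(const Matrix3 Minv, double x, double y,
                      double *u, double *v)
{
  double xpp = Minv[0][0] * x + Minv[0][1] * y + Minv[0][2];
  double ypp = Minv[1][0] * x + Minv[1][1] * y + Minv[1][2];
  double wpp = Minv[2][0] * x + Minv[2][1] * y + Minv[2][2];

  *u = xpp / wpp;
  *v = ypp / wpp;
}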


\[
M = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

\[
|M| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{32}a_{21} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{32}a_{23}
\]

\[
A(M) = \begin{bmatrix}
a_{22}a_{33} - a_{23}a_{32} & a_{13}a_{32} - a_{12}a_{33} & a_{12}a_{23} - a_{13}a_{22} \\
a_{23}a_{31} - a_{21}a_{33} & a_{11}a_{33} - a_{13}a_{31} & a_{13}a_{21} - a_{11}a_{23} \\
a_{21}a_{32} - a_{22}a_{31} & a_{12}a_{31} - a_{11}a_{32} & a_{11}a_{22} - a_{12}a_{21}
\end{bmatrix}
\]

\[
M^{-1} = \frac{A(M)}{|M|}
\]

Figure 10.2: 3 × 3 Matrix Inverse

10.3 Bilinear Interpolation

We have already seen that a projective warp can map a rectangular image to any quadrilateral shape. If the warp maps a rectangle into a parallelogram, then the projection is a simple affine projection with elements a31 and a32 of matrix M both set to zero. However, if these terms are non-zero, we have a perspective warp, and the resulting shape is a quadrilateral whose opposite sides are not necessarily parallel. This is fine if we want a true perspective of the original image, but there are times when we would like to do a warp which preserves uniform spacing between image features. A perspective warp will have a foreshortening effect, making image features shrink as they "recede" away from the viewer. This effect can be seen in Figure 10.3.

Often this foreshortening effect is not desired, and a mapping like that shown in Figure 10.4 would be preferred. This kind of map is known as a bilinear warp. Just like a perspective warp, it will allow for arbitrary mapping from any quadrilateral shape to any other quadrilateral shape, but it will preserve spacing.

The bilinear warp uses linear interpolation to map from the (u, v) source image plane to the (x, y) destination image plane. It is obtained by the following construction:


Figure 10.3: Foreshortening due to Perspective

Figure 10.4: Even Spacing under Bilinear Warp

1. number the corners of the source image, and the corresponding corners of the destination image, with the same subscripts, as shown in Figure 10.5.

Now, for each point (u, v) in the source image, find the corresponding point (x, y) in the destination image, as shown in Figure 10.6:

2. measure the fraction of distance u along the u-axis of the source, and the same fraction along the [x0, x3] and [x1, x2] edges of the destination, to place points x03 and x12.

3. measure the fraction of distance v along the v-axis, and the same fraction along the [x0, x1] and [x3, x2] edges, to place points x01 and x23.

4. the lines [x03, x12] and [x01, x23] intersect at the point (x, y) that the input point (u, v) maps to.

Although the bilinear map eliminates the foreshortening effect, it is not the best for all purposes. Its main problem is that, unlike the perspective warp, which maps straight lines to straight lines, the bilinear warp maps straight lines to curved (parabolic) lines.


Figure 10.5: Establishing Correspondence Between Corners

Figure 10.6: Example Map of Point (u, v) = (2/3, 2/3) to Point (x, y)

This can be seen in Figure 10.7, which shows how the diagonals of a rectangle are mapped under both perspective and bilinear maps. The mapping of straight lines to curved lines can lead to quite annoying image distortions for certain kinds of images.

10.3.1 Algebra of Bilinear Map

In order to fully understand the bilinear warp, it is necessary to explore its algebra. For simplicity, we will first normalize the coordinates of the input image to the range [0, 1]. If the original coordinates (s, t) are on the range

s0 ≤ s ≤ s1, t0 ≤ t ≤ t1,

then let

\[
u = \frac{s - s_0}{s_1 - s_0},
\]

and

\[
v = \frac{t - t_0}{t_1 - t_0}.
\]


Figure 10.7: Mapping of Diagonals Under Perspective and Bilinear Warps

Now the u and v coordinates can be thought of as fractions of the distance along the original s and t axes.

Note that in the original construction of the bilinear map, steps 2, 3 and 4 are equivalent to constructing the line [x03, x12] and then measuring fraction v along the line from x03. This is given by the equations for the forward map:

x03 = (x3 − x0)u + x0,

x12 = (x2 − x1)u + x1,

and
x = (x12 − x03)v + x03,

where each vector equation really stands for two scalar equations, one for the x coordinate and one for the y coordinate. To compute the inverse map, first combine these six equations into the two equations

x = a0 + a1u + a2v + a3uv

y = b0 + b1u + b2v + b3uv,

where a0 = x0, a1 = x3 − x0, a2 = x1 − x0, a3 = x2 − x1 − x3 + x0, b0 = y0, b1 = y3 − y0, b2 = y1 − y0, b3 = y2 − y1 − y3 + y0.


With some effort, these equations can be solved for u and v to yield

\[
v = \frac{-c_1}{2c_2} \pm \frac{1}{2c_2}\sqrt{c_1^2 - 4c_2c_0}, \qquad (10.2)
\]

\[
u = \frac{x - a_0 - a_2v}{a_1 + a_3v}, \qquad (10.3)
\]

subject to the constraints 0 < u < 1, 0 < v < 1 and where

c0 = a1(b0 − y) + b1(x − a0),

c1 = a3(b0 − y) + b3(x − a0) + a1b2 − a2b1,

and
c2 = a3b2 − a2b3.

The decision about adding or subtracting the second term in Equation 10.2 for v is made by applying the constraints that u and v lie in the range [0, 1].
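A sketch of the inverse bilinear map for a single point follows, coded directly from Equations 10.2 and 10.3 and the coefficient definitions above. The corner arrays, the function name, and the fallback to the linear case when c2 is nearly zero are assumptions made for this example.

#include <math.h>

/* Minimal sketch: inverse bilinear map. Given the four destination corners
   xc[0..3], yc[0..3] (corresponding to source corners (0,0), (0,1), (1,1),
   (1,0), as in Figure 10.5) and a destination point (x,y), recover the
   normalized source coordinates (u,v). Returns 0 if no root lies in [0,1]. */
int bilinear_unmap(const double xc[4], const double yc[4],
                   double x, double y, double *u, double *v)
{
  double a0 = xc[0], a1 = xc[3] - xc[0], a2 = xc[1] - xc[0];
  double a3 = xc[2] - xc[1] - xc[3] + xc[0];
  double b0 = yc[0], b1 = yc[3] - yc[0], b2 = yc[1] - yc[0];
  double b3 = yc[2] - yc[1] - yc[3] + yc[0];

  double c0 = a1 * (b0 - y) + b1 * (x - a0);
  double c1 = a3 * (b0 - y) + b3 * (x - a0) + a1 * b2 - a2 * b1;
  double c2 = a3 * b2 - a2 * b3;

  double roots[2];
  int nroots;

  if(fabs(c2) < 1.0e-12){               /* quadratic degenerates to linear */
    if(fabs(c1) < 1.0e-12) return 0;
    roots[0] = -c0 / c1;
    nroots = 1;
  }
  else{
    double disc = c1 * c1 - 4.0 * c2 * c0;
    if(disc < 0.0) return 0;
    roots[0] = (-c1 + sqrt(disc)) / (2.0 * c2);
    roots[1] = (-c1 - sqrt(disc)) / (2.0 * c2);
    nroots = 2;
  }

  for(int i = 0; i < nroots; i++){      /* keep the root with u,v in [0,1] */
    double vv = roots[i];
    double denom = a1 + a3 * vv;
    if(fabs(denom) < 1.0e-12) continue;
    double uu = (x - a0 - a2 * vv) / denom;       /* Equation 10.3 */
    if(uu >= 0.0 && uu <= 1.0 && vv >= 0.0 && vv <= 1.0){
      *u = uu;  *v = vv;
      return 1;
    }
  }
  return 0;
}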

10.4 Scanline Approach to Inverse Mapping

Whether we are using a projective map or a bilinear map, we are still left with the problem of how to efficiently compute the output image from the input. Inverse mapping is usually the best approach, and this can be done using a scanline algorithm. First, determine the bounds of the output pixmap by finding the minimum and maximum x and y coordinates of the corners, as shown in Figure 10.8. This will have to be done using the forward map, or may already be known if this is part of an interactive warping tool.

Now, represent each edge in parametric form. For example, for edge [x0, x1] we write

x = (x1 − x0)t + x0,

and
y = (y1 − y0)t + y0,

where t is a parameter measuring the fraction of the distance along the edge, measured from x0. The intersection of this edge with a specific scanline at y = yi is then given by solving for t,

\[
t = \frac{y_i - y_0}{y_1 - y_0},
\]


Figure 10.8: Bounding Pixmap Around Corners of Image

and then applying this to the equation for x, giving

\[
x = (x_1 - x_0)\frac{y_i - y_0}{y_1 - y_0} + x_0,
\]

which is the equation for the x coordinate of the intersection of scanline yi with the line.

Given this simple way to find the intersection between scanlines and image edges, we can compute the output image with the following algorithm:

for(y = y_min; y < y_max; y++){
    list the 2 intersections with scanline y;
    sort intersections by x coordinate, giving x_low, x_high;
    for(x = x_min; x < x_low - 1; x++)
        pixel[x][y] = black;
    for(x = x_low; x < x_high; x++){
        use inverse map to compute (u,v) coordinates in input image
            that correspond to coordinates (x,y) in output image;
        pixel[x][y] = pixel value at location (u,v) in source image;
    }
    for(x = x_high + 1; x < x_max; x++)
        pixel[x][y] = black;
}

This simple-looking algorithm has a few pitfalls. First, x_low and x_high, as well as u and v, should be rounded to the nearest integer value to form proper indices. Second, care must be taken to deal with situations when a scanline intersects exactly with a corner, so that only one intersection is recorded for a corner whose edges continue across the scanline, but two or zero intersections are recorded for a corner that is a local minimum or maximum in y. If two intersections are recorded, we can draw the corner pixel, and if zero intersections are recorded, we can skip the corner pixel. This can be a stylistic decision. There are also some nice ways to organize the algorithm and the data to get some efficiencies (see Homework Nuts and Bolts below).

This approach works for maps from a rectangular image to an arbitrary quadrilateral or from a quadrilateral to a rectangle (inverse). This leads to a general quadrilateral-to-quadrilateral algorithm with an intermediate rectangular step.

10.5 Homework Nuts and Bolts: Projective Transformation

10.5.1 Building the transformation matrix M

A composite transformation matrix can easily be built up from a series of simpler transformations. Start by initializing M to the identity matrix

\[
M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Then, for each new operation T that is to be concatenated onto the transform, premultiply the matrix M by T, replacing M by the product TM:

M ⇐ TM.

10.5.2 Constructing the output pixmap

Let u0, u1, u2, u3 be the coordinates of the four corners of the input image. Apply transform M to each of these in turn, using the steps of a projective transformation covered last class. This will give you the output corners x0, x1, x2, x3, as shown in Figure 10.5.


Now, as long as the transformation M is projective, it is certain that the minimum and maximum x and y coordinates of the corners x0, x1, x2 and x3 will determine the bounding box around the output image, as shown in Figure 10.9. The bounding box can then be used to determine the necessary width and height of the pixmap for the output image.

Figure 10.9: Constructing a Bounding Box

10.5.3 Finding the inverse transformation

This is simply the inverse of the matrix M , as given in Figure 10.2.

10.5.4 Constructing an edge table for the inverse map

We color in the output image by using the inverse map to determine which input pixel maps to each output pixel, by iterating over the output pixels. This can cause problems for certain inverse maps, which will send pixels in the output pixmap that are not within the boundaries of the projected image outline back to the input pixmap.

To prevent coloring in output pixels that are outside of the image boundary, you will need to be sure to do the inverse map only on pixels within the projected image boundary. The most convenient way to do this is with an edge table.

For each of the projected edges [x0, x1], [x1, x2], [x2, x3] and [x3, x0], store the following:

• ymin — minimum y coordinate of the edge


• ymax — maximum y coordinate of the edge

• dx/dy — inverse of the slope of the edge, e.g., (x1 − x0)/(y1 − y0)

• x0 — x coordinate of the endpoint having y coordinate ymin

Since a horizontal edge, i.e. an edge ij with (yi − yj) = 0, will not intersect a scanline, each such edge should be discarded. Now, sort the remaining edges by ymin, breaking ties using dx/dy.

Note: in determining ties and checking for horizontal lines, it is important to allow for some roundoff or truncation error in the floating point numbers. The test x1 == x2 will only be true if x1 exactly equals x2. Instead, use the test

fabs(x1 - x2) < EPSILON

where EPSILON is some very small number. I would suggest

#define EPSILON 1.0e-5

i.e. one hundred thousandth.

10.5.5 Painting in the output image

The first 2 entries in your edge table will mark the leftmost and rightmost pixels in the projected image on the bottom scan line. For each scan line, color in black up to the first edge, inverse map between edges, and color in black from the second edge, as shown in Figure 10.10.

Figure 10.10: Coloring a Scanline in the Output Image

In order to do the bookkeeping on this, with each "active" edge (i.e. each edge which the current scan line crosses) store the x coordinate of the crossing point of the scanline with the edge. This is easy to do. At the start, the crossing point is given by x0. Each time the scanline is advanced,


add dx/dy to the crossing x coordinate – that's it! The only remaining bit of bookkeeping is to note when you have reached the top of an "active" edge. At this point, deactivate the edge, and activate the next edge in the table.

10.5.6 Data structures and algorithms

The following data structures will be convenient for storing edge information:

typedef struct _edge{
    int Ymin, Ymax;
    int X0;
    double dxdy;
    double X;
} Edge;

Edge edge_table[4];

The following selection sort algorithm on an array of N integers will put the smallest value in table[0], the second smallest in table[1], etc.

int table[N], temp;
int i, j, k;

for(i = 0; i < N - 1; i++){
    for(k = i, j = i + 1; j < N; j++)
        if(table[j] < table[k])
            k = j;
    if(k != i){
        temp = table[i];
        table[i] = table[k];
        table[k] = temp;
    }
}


Chapter 11

Filtering to Avoid Artifacts

11.1 The Warping Process

If you have made some progress on the warping assignment, you will have noticed some artifacts that are introduced into the warped image. Figure 11.1 depicts a typical image warp, and shows how the same warp can produce both magnification and minification in the output image. Where the warped image is magnified compared with the input, the artifacts are the result of oversampling. Where the warped image is minified compared with the input, the artifacts are aliasing errors due to undersampling. In order to understand where these errors are coming from and how to correct them, we need to have a better conceptual view of what we are doing when we warp a digital image.

11.2 Sampling and Reconstruction

First, remember that the original image consists of samples of some real-world (or virtual) scene. Each pixel value is simply a point sample of the scene. When we view such an image on a screen, we are really viewing a reconstruction that is done by spreading each sample out over the rectangular area of a pixel, and then tiling the image with these rectangles. Figures 11.2a-c show the steps in the process from sampling to reconstruction, looked at in one-dimensional form.


Figure 11.1: Magnification and Minification in the Same Warp

When viewed from a distance, or when the scene is sufficiently smooth, the rectangular artifacts in the reconstruction are negligible. However, under magnification they become very noticeable as jaggies and blockiness in the output. Figure 11.2d shows how the extra resampling when magnifying results in multiple samples with the same value, effectively spreading single pixel values in the original image over multiple pixels in the magnified image.

Lesson 1: To reduce magnification artifacts we need to do a better job of reconstruction.

11.3 Resampling and Aliasing

When we minify an image, however, the resampling process causes us to miss many of the reconstructed samples in the original image, leading to missing details, ropiness of fine lines, and other aliasing artifacts which will appear as patterning in the output. In the worst case, the result can look amazingly unlike the original, like the undersampled reconstruction shown in Figure 11.2e. The problem here is that the resampling is being done too coarsely to pick up all the detail in the reconstructed image, and worse, the regular sampling can often pick up high frequency patterns in the original image and reintroduce them as low frequency patterns in the output image.

Lesson 2: To reduce minification artifacts we must either 1) sample more finely than once for each output pixel, or 2) smooth the reconstructed input before sampling.


a) brightness along a scanline across the original scene
b) brightness samples along the same scanline
c) pixel-like reconstruction of original line from the samples
d) resampling under magnification
e) resampling under minification

Figure 11.2: The Sampling, Reconstruction and Resampling Processes


11.4 Sampling and Filtering

Although we have not yet developed the mathematics to state this precisely, it will be shown in a later chapter that under certain conditions it is theoretically possible to exactly recover an original unsampled image given only the samples in the pixmap. It will be possible to do this when the original image is smooth enough, given the sampling density that we used to sample the original image to capture it in the pixmap. The criterion, loosely stated, is that the original image should not have any fluctuations that occur at a rate greater than 1/2 of the sampling rate. In other words, if the original image has an intensity change that goes from dark to light and back to dark again (or vice versa), then there should be at least two samples taken in the period of this fluctuation. If we have more, that is even better. Another way of saying this is that any single dark-to-light or light-to-dark transition should be wide enough that at least one sample is taken in the transition. This criterion must hold over the whole image, for the shortest transition period in the image.

If we let the sampling period (i.e. the space between samples) be TS, then this is useful in two ways in considering the image warping problem:

1. A perfect reconstruction could be obtained by prefiltering the sampled image with a filter that smooths out all fluctuations that occur in a space smaller than 2TS.

2. Aliasing under resampling could be avoided by filtering the reconstructed image to remove all fluctuations with periods smaller than twice the resampling period, before doing the resampling.

In simple terms, we need to: 1) do a nice smooth reconstruction, not the crude one obtained by spreading samples over a pixel area, and 2) possibly further smooth the reconstruction so that when we resample, we are always doing the resampling finely enough to pick up all remaining detail.

11.5 The Warping Pipeline

Conceptually, what we are trying to do when we warp a digital image is the process diagrammed in Figure 11.3. It consists of the following steps:

1. reconstruct a continuous image from the sampled input image, by filtering out all fluctuations with period less than twice the original sampling period TS,

2. warp the continuous reconstruction into a new shape,


Figure 11.3: A Conceptual View of Digital Image Warping

3. filter the warped image to eliminate fluctuations with period less than twice the resampling period to be used in the next step, and

4. resample the warped image with a resampling period TR to form a new discrete digital image.

However, in fact we do not really have any way in the computer of reconstructing and manipulating a continuous image, since by its nature everything in the computer is discrete. Now, since the reconstructed and warped continuous versions of the image never really get constructed, it makes sense to think of combining the reconstruction and low-pass filtering operations of steps 1 and 3 into a single step that somehow would take into account the warp done in step 2. To understand just how this might work, it would help to have a view of the filtering process that would operate over the original spatial domain image, rather than on the image after it has been transformed into the frequency domain.

11.6 Filtering as Convolution

In order to understand what a filter might look like in the spatial domain, let us start by looking at the process of reconstruction – remembering that in the frequency domain this turned out to be a filtering process that discarded high frequencies. To do this we will need to expand our notion of convolution to include the convolution of continuous functions. Converting our image samples into pixels – i.e. reconstructing an image from samples – is mathematically a convolution of the samples with a continuous square unit pulse function. A one-dimensional analogy follows.

Define a one-dimensional unit pulse function of width T to be

\[
P_T(t) = \begin{cases} 1, & -T/2 < t < T/2 \\ 0, & \text{otherwise}, \end{cases}
\]


a) unit pulse function
b) sliding unit pulse over samples
c) resulting reconstruction

Figure 11.4: Reconstruction by Convolution of Samples with Unit Pulse

as shown in Figure 11.4a. Now, think of the pulse as sliding back and forth along the scanline, as in Figure 11.4b. Wherever you want a reconstructed value from the samples, simply center the pulse over the point along the scanline at which you want the image value and select the value of the sample that the pulse overlaps. Another way of thinking of this is that we multiply the value of the shifted pulse by all of the samples that it overlaps and add all of the products. If we choose a pulse whose width is exactly the same as the sampling period used to capture the original samples, the non-zero portion of the pulse will overlap only one of the samples at a time, and the reconstruction process will simply spread each sample value out 1/2 the sampling period T in each direction. This will result in a reconstructed scanline as shown in Figure 11.2c. The unit pulse serves as a weighting function, weighting each sample's contribution to the final reconstruction. A weighting function used in this way is also known as a convolution kernel.

The convolution process is described mathematically by the integral

\[
f_R(t) = \int_{-\infty}^{\infty} f_T(\lambda)\, h(t - \lambda)\, d\lambda,
\]


where: fR is the reconstructed function, fT is the function f(t) sampled with sampling period T, h is the convolution kernel, and λ is a shift parameter that determines where the kernel h is centered.

If fT is zero except at integer multiples of its sampling period T, and there are N samples numbered 0 through N − 1, then the integral reduces to the sum

\[
f_R(t) = \sum_{n=0}^{N-1} f_T(nT)\, h(t - nT).
\]

For example, if T = 1, then

\[
f_R(2) = f_T(0)h(2) + f_T(1)h(1) + f_T(2)h(0) + \cdots + f_T(N-1)h[2-(N-1)].
\]

Now, in this example, if the kernel function h is of width 1, then all the terms in the sum will be zero except the term fT(2)h(0), giving the same kind of squared-off result as in Figures 11.2c and 11.4c.

If we choose the "tent" function of base width two shown in Figure 11.5a as our convolution kernel h, then doing a convolution as in Figure 11.5b, all terms would be zero except fT(2)h(0) only if the kernel is centered exactly over a sample. If we choose a value of t that is not exactly at a sample point, the tent function will overlap two samples, and we will get a weighted average of adjacent sample points. For example,

\[
f_R(2.5) = f_T(2)h(0.5) + f_T(3)h(-0.5) = 0.5 f_T(2) + 0.5 f_T(3).
\]

This yields a linear weighted reconstruction as shown in Figure 11.5c, essentially connecting the samples by straight lines. Obviously this is a much "smoother" result than that obtained with the unit pulse kernel, and in practical applications it is often enough to eliminate the most obvious reconstruction artifacts in a magnified image.
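A small sketch of this idea in C appears below: it reconstructs a value at an arbitrary position t from uniformly spaced samples using the width-2 tent kernel, which amounts to linear interpolation between the two neighboring samples. The function name and the clamping at the ends of the sample array are assumptions for the example.

#include <math.h>

/* Minimal sketch: reconstruct f(t) from N samples taken with period T = 1,
   using the tent kernel of base width 2. Equivalent to linear interpolation
   between the samples on either side of t. */
double tent_reconstruct(const double *samples, int nsamples, double t)
{
  int n = (int)floor(t);              /* sample just to the left of t */
  double frac = t - n;                /* fractional offset, in [0,1) */

  if(n < 0) return samples[0];        /* clamp at the ends */
  if(n >= nsamples - 1) return samples[nsamples - 1];

  /* tent weights: h(frac) = 1 - frac for the left sample,
     h(frac - 1) = frac for the right sample */
  return (1.0 - frac) * samples[n] + frac * samples[n + 1];
}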

There is a whole science behind selecting and designing convolution kernels for filtering, and this will be the subject of the following sections.

11.7 Ideal vs. Practical Filter Convolution Kernels

11.7.1 The Ideal Low Pass Filter

Assume that we have sampled a continuous function f(t) with a sampling interval T, to obtain the sampled signal fT(t). Assuming that the original continuous signal was smoothed so that it has no fluctuations with size smaller than 2T, then we know that we should be able to reconstruct f(t)


a) unit tent function
b) sliding unit tent over samples
c) resulting reconstruction

Figure 11.5: Reconstruction by Convolution of Samples with Unit Tent


Figure 11.6: Unit Sinc Function of Period 2T

exactly. Although the mathematics behind precisely how to do this must wait for a later chapter, we can, at this point, develop a good working technique for at least doing a good approximate reconstruction.

It is a well-known result in the theory of filtering that an ideal low-pass filter, which cuts off all fluctuations of period less than 2T, is given by the convolution kernel

\[
h(t) = \frac{\sin(\pi t / T)}{\pi t},
\]

where the function sin θ/θ is known as the sinc function. The graph of this function is shown in Figure 11.6. Now, to filter out fluctuations that are shorter than 2T we need to convolve with the octopus-like sinc function of period 2T. This is not such a happy result, because the sinc never goes to a constant zero value. Thus, we are left with the need to convolve all of our image samples with a convolution kernel of infinite extent – a bit hard to do!

11.7.2 Practical Filter Kernels

Thus, practical image filtering cannot be done using the ideal low-pass filter. However, there are many ways in which we can approximate the ideal sinc kernel. In fact, the quality of a low-pass filter will be judged by how nearly it approximates the sinc, recognizing that there is always the trade-off between wanting the computational speed advantages of a narrow convolution kernel versus wanting a high quality result. Some of the ways of approximating the sinc are shown in Figure 11.7. The simplest is to truncate the sinc spatially, as in Figure 11.7a. A second approach is to use another function that has the same general local shape as the sinc, such as the spatial pulse and the tent kernels that we discussed in the last section. Figure 11.7b shows that these can both be seen as approximations to the sinc, albeit poor ones! The trick that tends to yield the best tradeoff between simplicity and quality is to window the sinc by multiplying by a


smooth function that goes to zero beyond a certain distance, as shown in Figure 11.7c.

Another technique is to approximate the windowed sinc by a spline curve. These and other ideas are given a very thorough treatment in Wolberg, Chapter 5.

11.7.3 Practical Convolution Filtering

We have seen the mathematical representation of the convolution, but now need to move to a more practical level and learn how to efficiently compute a convolution. The convolution problem is shown in Figure 11.8. Put in simple terms, the problem is: given a convolution kernel h, and a set of uniform samples fT, compute the value of the convolution of h and fT for an arbitrary coordinate t. This essentially allows us, in one step, to both reconstruct and resample our function at any arbitrary resampling period.

The mathematical expression for the convolution of kernel h with sampled function fT, evaluated at coordinate t, is

\[
f^*(t) = \sum_{n=-\infty}^{\infty} f_T(nT)\, h(t - nT),
\]

where n is the sample number, T is the sample period, and fT is taken to be zero beyond the region for which samples are stored.

Now, if we assume that h(t) has a finite extent, say 3 units of the original sampling period T on either side of t, then the sum reduces to centering the convolution kernel over the position t, multiplying it by each of the 3 samples on either side of t, and then summing the result. This can be written as the sum

\[
f^*(t) = \sum_{m=-2}^{3} f_T[(\lfloor t/T \rfloor + m)T]\; h[\,t - (\lfloor t/T \rfloor + m)T\,].
\]

Here, ⌊t/T⌋ is the greatest integer less than or equal to t/T, or in other words it is simply the sample number of the sample just to the left of position t. Thus, fT[(⌊t/T⌋ + m)T] gives the sample value just to the left of t when m = 0, the two to the left of this when m = −1, −2, and the three to the right when m = 1, 2, 3. This concept is diagrammed in Figure 11.9.

Now, with respect to the array indices in a pixmap, the sample period T is just 1, i.e. horizontally or vertically a move of 1 unit moves 1 pixel across a scan line or along a column.


a) simple truncation of sinc to width 4T
b) unit pulse and tent as approximations to the sinc
c) windowing the sinc with a finite function

Figure 11.7: Approximating the Infinite Sinc Function to Achieve a Finite Convolution Kernel


Figure 11.8: Convolution of Kernel h with samples fT at Position t

Figure 11.9: Values of h are Needed Only at Integer Valued Offsets from t


Thus, sampling the filtered pixmap is simple. If W(u, v) is the warp function, and W⁻¹(x, y) is its inverse, then

\[
\begin{bmatrix} u \\ v \end{bmatrix} = W^{-1}(x, y).
\]

For a kernel of width 6, we sample the input image along scanline ⌊v⌋ at positions ⌊u⌋ − 3, ⌊u⌋ − 2, ⌊u⌋ − 1, ⌊u⌋, ⌊u⌋ + 1, ⌊u⌋ + 2, and ⌊u⌋ + 3, and multiply each of the samples by

h[u − (⌊u⌋ + n)],

or, equivalently, h[(u − ⌊u⌋) − n],

where n is the sample index from −2 to 3. (Note that (u − ⌊u⌋) is simply the fractional part of the real number u.) The sum of all of these six products is our approximation to the pixel in the reconstructed warped image resampled at the point (x, y). Of course, this operation needs to be done both along the columns and along the scanlines that fall under the filter kernel if the image warp involves distortion in both directions. The horizontal filter should be done to an intermediate image, and then that image should be vertically filtered into the final image, so that the actual convolution is over a rectangular area about the point (u, v).

You should note that this calculation requires computing the kernel function h six times for each (x, y) coordinate pair under the inverse map. This can be avoided by an initialization step that samples h finely, computing h once for each of these samples and storing the resulting sample values in a table, as diagrammed in Figure 11.10. This can be done very finely (e.g. 500 samples) since it only need be done once. Then when a value for h is needed, it can be obtained from the table by simply selecting the sample value in the table whose index is nearest to the coordinate for which a value for h is needed. Now, the whole process of taking a single filtered sample is simply a series of multiplies and adds.
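A sketch of this table-driven scheme in C follows. The tabulated kernel (a simple tent is used as a stand-in here; any finite kernel could be tabulated), the table size, the function names, and the normalization of the summed weights near the image ends are all assumptions made for this example, not part of the text's prescription.

#include <math.h>

#define KERNEL_HALFWIDTH 3                 /* kernel support is (-3, 3) */
#define TABLE_SIZE       500

static double h_table[TABLE_SIZE];

/* Build the lookup table once, sampling the kernel on [0, HALFWIDTH).
   Here a tent is tabulated as a stand-in for whatever kernel is chosen. */
void build_kernel_table(void)
{
  for(int i = 0; i < TABLE_SIZE; i++){
    double t = (double)i * KERNEL_HALFWIDTH / TABLE_SIZE;
    h_table[i] = (t < 1.0) ? 1.0 - t : 0.0;
  }
}

/* Look up h(t) by nearest table entry, using symmetry h(-t) = h(t). */
static double h_lookup(double t)
{
  int i = (int)(fabs(t) * TABLE_SIZE / KERNEL_HALFWIDTH + 0.5);
  return (i < TABLE_SIZE) ? h_table[i] : 0.0;
}

/* One horizontally filtered sample at real-valued position u on a scanline
   of "width" samples: a series of multiplies and adds over the samples
   under the kernel. */
double filtered_sample(const double *scanline, int width, double u)
{
  int base = (int)floor(u);
  double sum = 0.0, wsum = 0.0;

  for(int n = -2; n <= 3; n++){
    int i = base + n;
    if(i < 0 || i >= width) continue;        /* skip samples off the ends */
    double w = h_lookup(u - (double)i);      /* kernel weight for sample i */
    sum  += w * scanline[i];
    wsum += w;
  }
  return (wsum > 0.0) ? sum / wsum : 0.0;    /* normalize the weights */
}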

11.8 Antialiasing: Spatially Varying Filtering and Sampling

11.8.1 The Aliasing Problem

Remember that in the image warping problem there are two places where filtering is important:


Figure 11.10: Precalculation of Many Samples of Convolution Kernel h

Figure 11.11: Area Sampling, Pixel Area Projected Back Into Input Image

1. reconstruction filtering (critical when magnifying)

2. antialiasing filtering (critical when minifying).

In the last section we looked at the general idea of practical filtering, with special attention paid to the reconstruction problem. Here we will focus specifically on the aliasing problem.

Recall that aliasing will occur when resampling an image whose warped reconstruction has frequency content above 1/2 the resampling frequency. To prevent this we can either:

1. filter the reconstructed and warped image so that its highest frequencies are below 1/2 the resampling rate, or

2. adjust the resampling rate to be at least twice the highest frequency in the reconstructed warped image.

In general, the problem in both cases is that a warp is non-uniform, meaning that the highest frequency will vary across the image, as schematized in Figure 11.11.


We can deal with this by filtering the entire image to reduce the highest frequencies found anywhere in the image, or sample everywhere at a rate twice the highest frequency found anywhere in the image. Either solution can be very wasteful (sometimes horribly wasteful) of computation time in those areas of the image that do not require the heavy filtering or excess sampling. The answer is to use either spatially varying filtering or adaptive sampling.

11.8.2 Adaptive Sampling

We can think of the inverse mapping process described in Chapter 7, and shown in Figure 11.12a, as a point sampling of the input image, over the inverse map, guided by a uniform traversal over the output image. This technique is simple to implement but, as we have discovered, leaves many artifacts, and we need to look at more powerful approaches.

One method for doing sampling without aliasing is area sampling. This technique is diagrammed in Figure 11.12b. Area sampling projects the area of an output pixel back into the input image and takes a weighted average of the covered and partially covered pixels in the input. Area sampling produces good results under minification, i.e. aliasing is eliminated. However, accurate area sampling is very time consuming to compute and hard to implement. Fortunately, experience shows that good success can be had with simpler, less expensive approximations to area sampling.

A common approximation to area sampling is known as supersampling. Here, we still take only point samples, but instead of sampling the input only once, or over an entire area as with area sampling, multiple samples are taken per output pixel. This process is diagrammed in Figure 11.12c. A sum or weighted average of all of these samples is computed before storing in the output pixel. This gives results that can be nearly as good as with area sampling, but results vary with the amount of minification. The problem with this method is that if you take enough samples to eliminate aliasing everywhere, you will be wasting calculations over parts of the image that do not need to be sampled so finely. Although supersampling can be wasteful, some economies can be had. Figure 11.13 shows how some careful bookkeeping can be done to minimize the number of extra samples per pixel. If samples are shared at the corners and along the edges of pixels, there is no need to recompute these samples when advancing to a new pixel.

Adaptive supersampling improves on simple supersampling by attempting to adjust the sampling density for each pixel to the density needed to avoid aliasing. The process used in adaptive supersampling is as follows:


Figure 11.12: Sampling Techniques Under Inverse Mapping. a) point sampling; b) area sampling; c) supersampling.

Figure 11.13: Sharing of Samples Under Inverse Mapping. Scan is Left to Right, Top to Bottom (samples already calculated are reused; only new samples are calculated).


Figure 11.14: Adaptive Supersampling (circled samples differ from the average)

1. Do a small number of test samples for a pixel, and look at the maximum difference between the samples and their average.

2. If any samples differ from the average by more than some predefined threshold value, subdivide the pixel and repeat the process for each sub-pixel that has an extreme sample.

3. Continue until all samples are within the threshold for their sub-pixel(s) or some maximum allowable pixel subdivision is reached.

4. Compute the output pixel value as an area-weighted average of the samples collected above (a code sketch of this process follows the list).
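A minimal sketch of this subdivision process is given below. The sampling routine sample_input (which point samples the input under the inverse map), the threshold, and the maximum subdivision depth are assumptions made for illustration; they are not part of the procedure specified above.

#include <math.h>

#define THRESHOLD 0.05      /* assumed difference threshold */
#define MAXDEPTH  3         /* assumed maximum pixel subdivision depth */

/* point sample the reconstructed input under the inverse map, at an
   output-image position; assumed to be provided elsewhere */
extern double sample_input(double x, double y);

/* recursively sample the square region of the output image with lower
   left corner (x, y) and side s, returning its average sample value */
double adaptive_sample(double x, double y, double s, int depth){
   double v[4], avg = 0, result = 0;
   int i, j, k;

   /* take a 2x2 pattern of test samples at the quadrant centers */
   for(k = 0, i = 0; i < 2; i++)
      for(j = 0; j < 2; j++)
         avg += (v[k++] = sample_input(x + (i + 0.5) * s / 2,
                                       y + (j + 0.5) * s / 2));
   avg /= 4;

   /* subdivide quadrants whose sample is extreme, up to MAXDEPTH */
   for(k = 0, i = 0; i < 2; i++)
      for(j = 0; j < 2; j++, k++){
         if(depth < MAXDEPTH && fabs(v[k] - avg) > THRESHOLD)
            result += adaptive_sample(x + i * s / 2, y + j * s / 2,
                                      s / 2, depth + 1);
         else
            result += v[k];
      }
   return result / 4;   /* area-weighted: each quadrant has equal area */
}

An output pixel at (x, y) would then be computed as adaptive_sample(x, y, 1.0, 0).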

Any regular sampling approach will be susceptible to aliasing artifacts introduced by the regular repeating sampling pattern. A final refinement to the adaptive supersampling technique is to jitter the samples so that they no longer are done on a regular grid. As shown in Figure 11.15, an adaptive sampling approach is taken but each sample is moved slightly by a random perturbation from its regular position within the output pixel before the inverse map is calculated. Irregular sampling effectively replaces the low frequency patterning artifacts that arise from regular sampling with high frequency noise. This noise is usually seen as a kind of graininess in the image and is usually much less visually objectionable than the regular patterning from aliasing.

11.8.3 Spatially Variable Filtering

Antialiasing via filtering is a fundamentally different approach from antialiasing via adaptive supersampling, and leads to another basic problem. Since the frequency content of an image will vary due to the warp applied, efficiency issues require that whatever filtering is done be adapted to this changing frequency content. The idea behind spatially varying filtering is


Figure 11.15: Jittered Adaptive Supersampling

Figure 11.16: Spatially Varying Filtering. Kernel size expands to match the sampling rate: for a high sampling rate the kernel is small, for a low rate it is large.

to filter a region of a picture only as much as needed, varying the size of the filter convolution kernel with the variation in frequency content.

Spatially variable filtering schemes attempt to pre-filter each area of the input picture, using only the level of filtering necessary to remove frequencies above either 1/2 the sampling rate or 1/2 the resampling rate, whichever is lower under the inverse map. The idea here is that the same convolution kernel is used across the whole image but it changes scale to accommodate the resampling rate. Where the output is minified, the scale of the kernel being convolved with the input image is increased. Where the output is magnified, it is decreased. Remember, of course, that the bigger the kernel the more computation is involved, so the problem with this approach is that it gets very slow under heavy minification.

A crude but efficient way of implementing spatially varying filtering is the summed area table scheme. It is crude in that it only allows for a box convolution kernel, but fast in that it does a 128 × 128 convolution as fast as a 4 × 4 convolution! The trick is to first transform the input image into a data structure called a summed area table (SAT). The SAT is a rectangular array that has one cell per pixel of the input image. Each cell in the SAT


input image:

2 1 3 2
1 3 2 1
3 2 1 1
1 2 1 3

summed area table:

7 15 22 29
5 12 16 21
4  8 10 14
1  3  4  7

Figure 11.17: Summed Area Table

corresponds spatially with a pixel in the input, and contains the sum of the color primary values in that pixel and all other pixels below and to the left of it. An example of a 4 × 4 SAT is shown in Figure 11.17.

The SAT can be computed very efficiently by first computing its left column and bottom row, and then traversing the remaining scan lines doing the following calculation:

SAT[row][col] = Image[row][col] + SAT[row-1][col]
              + SAT[row][col-1] - SAT[row-1][col-1];

where SAT is the summed area table array and Image is the input image pixmap, and array indexing out of array bounds is taken to yield a result of 0.
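A sketch of the SAT construction for one channel is given below; starting the running sums at zero takes care of the out-of-bounds convention. The array types and the choice of row 0 as the bottom scanline are illustrative assumptions.

/* build the summed area table for one channel of an h x w image;
   SAT[row][col] holds the sum of Image over all pixels at or below
   row and at or to the left of col (row 0 is the bottom scanline) */
void build_sat(double **Image, double **SAT, int h, int w){
   int row, col;
   for(row = 0; row < h; row++)
      for(col = 0; col < w; col++){
         double below = (row > 0) ? SAT[row-1][col] : 0;
         double left  = (col > 0) ? SAT[row][col-1] : 0;
         double diag  = (row > 0 && col > 0) ? SAT[row-1][col-1] : 0;
         SAT[row][col] = Image[row][col] + below + left - diag;
      }
}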

Once the SAT is built, we can compute any box-filtered pixel value for any integer sized box filter with just 2 subtracts, 1 add and 1 divide. The computation of the convolution for one pixel is

BoxAvg = (SAT[row][col] - SAT[row-w][col]
        - SAT[row][col-w] + SAT[row-w][col-w]) / (w*w);

where w is the desired convolution kernel width and (row, col) are the coordinates of the upper right hand corner of the filter kernel. Note that the size w of the convolution kernel has no effect on the time to compute the convolution. The required size of the convolution can be approximated by back projecting a pixel and getting the size of its bounding rectangle in the input, as shown in Figure 11.18.
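Here is a sketch of the box-filter lookup with the out-of-bounds convention made explicit; the helper name sat_at is an assumption for illustration.

/* SAT entry, with indices below the array treated as 0 */
static double sat_at(double **SAT, int row, int col){
   return (row < 0 || col < 0) ? 0 : SAT[row][col];
}

/* average of the w x w box whose upper right corner is (row, col) */
double box_avg(double **SAT, int row, int col, int w){
   return (sat_at(SAT, row,   col)   - sat_at(SAT, row-w, col)
         - sat_at(SAT, row,   col-w) + sat_at(SAT, row-w, col-w))
         / (double)(w * w);
}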

Looked at mathematically, if the inverse map is given by functions u(x, y) and v(x, y), then the bounding rectangle size is given by

du = max(∂u/∂x, ∂u/∂y)

in the horizontal direction, and

dv = max(∂v/∂x, ∂v/∂y)

in the vertical direction, as shown in Figure 11.19. Alternatively, the Euclidean distances

(du)² = (∂u/∂x)² + (∂u/∂y)²


Figure 11.18: Finding the Input Bounding Rectangle for an Output Pixel

Figure 11.19: Measuring the Required Kernel Size Under an Inverse Map

and

(dv)² = (∂v/∂x)² + (∂v/∂y)²

can be used.

11.8.4 MIP Maps and Pyramid Schemes

A final and very popular technique for speeding the calculation of a varying filter kernel size is to use a Multiresolution Image Pyramid or MIP-Map scheme for storing the image. The idea here is to get quick access to the image prefiltered to any desired resolution by storing the image and a sequence of minified versions of the image in a pyramid-like structure, as diagrammed in Figure 11.20. Each level in the pyramid stores a filtered version of the image at 1/2 the scale of the image below it in the pyramid. In the extreme, the top level of the pyramid is simply a single pixel containing the average color value across the entire original image. The pyramid can be stored very neatly in the data structure shown in Figure 11.21, if the image is stored as an array of RGB values.

The MIP-Map is somewhat costly to compute, so the scheme finds most use in dealing with textures for texture mapping for three-dimensional com-


Figure 11.20: An Image Pyramid (full scale, 1/2 scale, 1/4 scale, ...)

Figure 11.21: Data Structure for a MIP-Map


puter graphics rendering. Here, the MIP-Map can be computed once for a texture and stored in a texture library to be used as needed later. Then when the texture is used, there is very little overhead required to complete filtering operations to any desired scale.

From the MIP-Map itself, it may seem that it is only possible to do magnification or minification to scales that are powers of 2 from the original scale. For example if we want to scale the image to 1/3 its original size, the 1/2 scale image in the MIP-Map will be too fine, and the 1/4 scale image will be too coarse. The solution to this problem is to interpolate across pyramid levels, to get an approximation to any resolution image. We would sample at the levels above and below the desired scale, and then take a weighted average of the color values retrieved. The idea is simply to find the pyramid level where an image pixel is just larger than the projected area of the output pixel, and the level where a pixel is just smaller than this area. Then we compute the pixel coordinates at each level, do a simple point sample at each level, and finally take the weighted average of the two samples.
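A sketch of this cross-level lookup follows. It assumes a helper mip_sample(level, u, v) that point samples one pyramid level (level 0 being full scale) at coordinates given at full-image scale; the helper, the level numbering, and the use of the back-projected footprint width d to pick the level are illustrative assumptions.

#include <math.h>

extern int nlevels;                               /* number of pyramid levels */
extern double mip_sample(int level, double u, double v);

/* sample the pyramid for an output pixel whose back-projected footprint
   in the input is d input pixels across (d >= 1 means minification) */
double mip_lookup(double u, double v, double d){
   double lambda, frac;
   int lo, hi;

   lambda = log2(d > 1 ? d : 1);       /* desired level, possibly fractional */
   lo = (int)floor(lambda);            /* level whose pixels are just smaller */
   hi = lo + 1;                        /* level whose pixels are just larger */
   if(hi >= nlevels) return mip_sample(nlevels - 1, u, v);

   frac = lambda - lo;                 /* blend factor between the two levels */
   return (1 - frac) * mip_sample(lo, u, v) + frac * mip_sample(hi, u, v);
}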


Chapter 12

Scanline Warping Algorithms

12.1 Scanline Algorithms

A scanline algorithm is one in which the main control loop iterates over the scanlines of an image and an inner loop iterates across each scanline, as in the code fragment of Figure 12.1. Often these algorithms alternate passes over the scanlines and then the columns of the image.

...                              // loop initialization
for(y = 0; y < height; y++){
   ...                           // scanline initialization for scanline y
   for(x = 0; x < width; x++){
      ...                        // operation on pixel x on scanline y
   }
   ...                           // scanline y wrap-up
}

Figure 12.1: Basic Organization of a Scanline Algorithm

A scanline approach to constructing image manipulation algorithms can often allow the realization of speedups that exploit coherencies that may exist along scanlines or down columns. The observation that neighboring pixels have similar values is just one such coherency that can lead to important savings in computation. Another idea that can save time is to


assume that a movement across a pixel is so small that certain nonlinearities can be ignored and sometimes simple linear interpolations can be used in advancing from pixel to pixel across a scanline.

For example, one such trick would be to assume that an inverse map is linear across a short span. Then, as diagrammed in Figure 12.2, instead of computing the inverse map for every pixel, compute it for the two ends of a scanline span (x0 and x1 in the figure), yielding two corresponding (u, v) coordinate pairs in the input image (u0 and u1 in the figure). Then, while advancing across the output scanline, calculate intermediate input (u, v) values by linear interpolation, as shown in the code fragment in Figure 12.2.


for(y = 0; y < height; y++){
   determine x0 and x1 of a span on scanline y;
   (u0, v0) = inv_map(x0, y);      // inverse map of span left end
   (u1, v1) = inv_map(x1, y);      // inverse map of span right end
   dudx = (u1 - u0) / (x1 - x0);   // change in u per pixel
   dvdx = (v1 - v0) / (x1 - x0);   // change in v per pixel
   u = u0;
   v = v0;
   for(x = x0; x <= x1; x++){
      out[x][y] = in[u][v];
      u += dudx;                   // interpolate to next u value
      v += dvdx;                   // interpolate to next v value
   }
}

Figure 12.2: Approximation of the Inverse Map by Linear Interpolation

This trick will work exactly for affine warps, since these maps and their inverses are simply linear functions and translations in x and y. For non-affine warps, this approach will lead to errors in the image mapping that will be more or less severe, depending upon the degree of the nonlinearity


and the length of the span being interpolated in the input image space (u, v). However, the approach tends to be much faster than the exact computation, which involves computing the inverse map for each pixel.

A further trick that will often make the linear interpolation idea work well even for nonlinear warps is to subdivide the input image, using the forward projection of the subdivision to subdivide the output image. This idea is illustrated in Figure 12.3. When interpolations are done, they are done across the subregions, thus making the spans shorter and minimizing the effects of nonlinearities.

Figure 12.3: Subdivision to Minimize the Effect of Nonlinearities

For the special case of a perspective warp, a trick which allows interpolation to compute the exact inverse map is to do the interpolation in (u, v, w) space, before dividing by w to normalize the (u, v) coordinates. In other words, instead of interpolating only u and v, interpolate u, v and w. This trick works because the inverse map in three-dimensional homogeneous coordinates is linear, up until normalization by the w coordinate. Figure 12.4 shows the code of Figure 12.2 modified to do interpolation in (u, v, w) space.

12.2 Separable Scanline Algorithms

A separable algorithm is one in which the map can be factored into two or more separate maps, each of which operates only over scanlines or over columns. This idea is diagrammed in Figure 12.5. The scanline operations change only the horizontal coordinate and the column operations change only the vertical coordinate.

Algebraically, what is meant here is this. Given a warp

(x, y) = f(u, v) = [X(u, v), Y (u, v)],


for(y = 0; y < height; y++){
   determine x0 and x1 of a span on scanline y;
   (u0, v0, w0) = inv_map(x0, y);  // inverse map of span left end
   (u1, v1, w1) = inv_map(x1, y);  // inverse map of span right end
   dudx = (u1 - u0) / (x1 - x0);   // change in u per pixel
   dvdx = (v1 - v0) / (x1 - x0);   // change in v per pixel
   dwdx = (w1 - w0) / (x1 - x0);   // change in w per pixel
   u = u0;
   v = v0;
   w = w0;
   for(x = x0; x <= x1; x++){
      nu = u / w;
      nv = v / w;
      out[x][y] = in[nu][nv];
      u += dudx;                   // interpolate to next u value
      v += dvdx;                   // interpolate to next v value
      w += dwdx;                   // interpolate to next w value
   }
}

Figure 12.4: Linear Interpolation in (u, v, w) Space before Normalization by w

we say that the warp is two-pass separable if we can find a function G(x, v) such that

f(u, v) = [X(u, v), G(X(u, v), v)].

In other words, if we can find a function G such that

Y (u, v) = G(X(u, v), v).

If we can find such a function, then we can first apply the map

(u, v) ↦ (X(u, v), v) = (x, v),

keeping the vertical coordinate fixed and changing only the horizontal coordinate, and then apply the map

(x, v) ↦ (x, G(x, v)) = (x, y)

to the result, keeping the horizontal coordinate fixed and changing only the vertical.

In general, the separable property holds when the function X(u, v) can be partially inverted to yield a solution for u in terms of x and v. Whenever


Figure 12.5: Separation of a Warp f(u, v) into Sequential Scanline (X(u, v)) and Column (G(x, v)) Warps

this can be done, then it is a simple matter to use this solution for u as the input to the function Y(u, v), to yield G(x, v).

12.2.1 Separation of projective warps

The projective warps can all be implemented via separable algorithms. Note that all of the projective warps are expressed as a multiplication of the input image coordinates (u, v) by a 3 × 3 homogeneous projection matrix M, i.e.

M [u, v, 1]ᵀ = [x, y, w]ᵀ.

For the special case of an affine warp, where w is 1, separation is achieved by simply factoring the matrix M into the product of two matrices, the first of which affects only the horizontal coordinate u, and the second of which affects only the vertical coordinate v. The portion of the matrix which affects the output w coordinate can be placed in either of the two factors, or distributed between them. If the matrix M is given by

M =
⎡ a  b  c ⎤
⎢ d  e  f ⎥
⎣ 0  0  1 ⎦ ,


then one possible factorization would be

M = V U,

where

U =
⎡ a  b  c ⎤
⎢ 0  1  0 ⎥
⎣ 0  0  1 ⎦ ,

and

V =
⎡ 1     0             0           ⎤
⎢ d/a  (ea − bd)/a   (af − dc)/a  ⎥
⎣ 0     0             1           ⎦ .

In the case of a general projective warp, where we cannot assume that w is 1,

M =
⎡ a  b  c ⎤
⎢ d  e  f ⎥
⎣ g  h  1 ⎦ .

Now we have

X(u, v) = (au + bv + c) / (gu + hv + 1)    (12.1)

and

Y(u, v) = (du + ev + f) / (gu + hv + 1).    (12.2)

Equation 12.1 can be solved for u to yield

u = [v(b − hx) + c − x] / (gx − a),

which when substituted for u in equation 12.2 gives

G(x, v) = [d(v(b − hx) + c − x) + ev(gx − a) + f(gx − a)] / [g(v(b − hx) + c − x) + hv(gx − a) + (gx − a)].

Thus, any of the projective warps can be seen to be separable.

12.2.2 Paeth-Tanaka rotation algorithm

The Paeth-Tanaka algorithm is a very clever example of a separable warp algorithm. It handles only pure image rotations, which is a special case of the projective warps, and operates in three phases – 1) scanline, 2) column, and 3) scanline. Paeth and Tanaka realized that any rotation in two dimensions can be factored into three successive shears, as illustrated in Figure 12.6.


Figure 12.6: Image Rotation Implemented as a Sequence of Three Shears (horizontal shear, vertical shear, horizontal shear)

The factorization of the rotation matrix used in the algorithm is

Rθ =
⎡ cos θ   −sin θ ⎤
⎣ sin θ    cos θ ⎦
=
⎡ 1  −tan θ/2 ⎤ ⎡   1     0 ⎤ ⎡ 1  −tan θ/2 ⎤
⎣ 0      1    ⎦ ⎣ sin θ   1 ⎦ ⎣ 0      1    ⎦ .    (12.3)

Checking this factorization is left as an exercise to the student. While checking, you will need the trigonometric identities

tan θ/2 = sin θ / (1 + cos θ),

and

tan θ/2 = (1 − cos θ) / sin θ.

Note that under the factorization of Equation 12.3, the multiplication

⎡ u ⎤
⎣ v ⎦

becomes

⎡ 1  −tan θ/2 ⎤ ⎛ ⎡   1     0 ⎤ ⎛ ⎡ 1  −tan θ/2 ⎤ ⎡ u ⎤ ⎞ ⎞
⎣ 0      1    ⎦ ⎝ ⎣ sin θ   1 ⎦ ⎝ ⎣ 0      1    ⎦ ⎣ v ⎦ ⎠ ⎠ ,

which is a horizontal shear, followed by a vertical shear, followed by a second horizontal shear.

The beauty of this formulation is that under each of the shears, no scale is required and at each step in the algorithm, pixel motion is in one direction


only. Thus, each output scanline from the first shear is just a horizontal displacement of the entire input scanline. The output from the second shear is simply a vertical displacement of each column of the intermediate result, and the output from the third shear is again a horizontal displacement of each scanline of the second intermediate result. Since the forward map does no scales and only applies translations, this allows us to use a forward mapping algorithm rather than an inverse map to compute the rotation. Further, we can apply a simple tent reconstruction filter very efficiently by simply averaging each adjacent pair of input pixels in the source into a single pixel in the destination according to a fixed ratio on each scanline, as depicted in Figure 12.7.

Figure 12.7: Image of Input Scanline Overlaid on Output Scanline (adjacent input pixels overlap each output pixel with fractions f and g)

The entire algorithm reduces to three simple steps to compute the three successive shears:

1. Loop across each scanline, displacing each pixel on the line and weighting each adjacent pair of pixels into each output pixel,

2. Loop on each column, displacing each pixel in the column and weighting each adjacent pair of pixels into each output pixel,

3. Repeat Step 1.

The only complication is that before each step, it is necessary to compute the output array size to store the intermediate image. This can be done quite simply, since at each step either the number of rows or number of columns stays fixed so only the extents of the other coordinates need to be determined. The geometry of this calculation is shown in Figure 12.8a, and Figure 12.8b gives a detailed algorithm for implementing step 1.


a) geometry of step 1 of the Paeth-Tanaka algorithm: a w × h image shears horizontally, each scanline offset by up to dx = h · tan(θ/2)

x_off = height * tan(theta/2);      // offset of lower lefthand corner
if(x_off < 0) x_off = 0;

for(v = 0; v < height; v++){        // v = input scanline number
   x0 = -v * tan(theta/2) + x_off;  // starting x, for u = 0
   f = fabs(x0 - (int)x0);          // overlap fraction
   g = 1.0 - f;

   for(j = 0; j < (int)x0; j++)     // zero output scanline up to x0
      out[j][v] = 0;

   out[j++][v] = g * in[0][v];      // tent filter input pixels into output
   for(i = 0; i < width - 1; i++)
      out[j++][v] = f * in[i][v] + g * in[i+1][v];
   out[j++][v] = f * in[width-1][v]; // last output pixel overlaps only the
                                     // last input pixel

   for(; j < width + x_off; j++)    // zero rest of scanline
      out[j][v] = 0;
}

b) algorithm for step 1 of the Paeth-Tanaka algorithm

Figure 12.8: Implementation Details for Step 1 of the Paeth-Tanaka Rotation Algorithm


12.3 Mesh Warp Algorithm

In a mesh warp, we superimpose a regular grid over the input image, and then allow the user to move grid corners to deform the grid. The image is then warped, usually using forward bilinear interpolation, to map the input subimage in each grid square to the corresponding deformed quadrilateral in the deformed grid.

Figure 12.9: Image Deformation via a Mesh Warp (input image with superimposed grid, output image with deformed grid)

Typical constraints in a mesh warp are that

1. image corners cannot be moved,

2. grid vertices on horizontal and vertical edges may only move along the edge,

3. grid edges cannot be made to cross over each other.

These constraints assure that input and output images are the same size and shape, and that the image will deform like a rubber sheet, without tearing or folding over itself (i.e., it keeps its original topology).

The edges of the mesh can either be straight lines or spline curves. We will look first at a mesh warp where mesh edges are straight lines.

12.3.1 Separable mesh warp

The mesh warp can be implemented easily using a two-pass separable scanline algorithm. The idea is to treat each vertex perturbation as a translation in the horizontal direction, followed by a translation in the vertical direction. This leads quite naturally into a decomposition of the warp into a set of horizontal movements followed by a set of vertical movements as diagrammed in Figure 12.10a, which leads to the full mesh warp shown in Figure 12.10b.

The algorithm, based on this idea, goes as follows:


Figure 12.10: Mesh Warp Implementation Using Horizontal and Vertical Displacements. a) displacement of a single mesh point (x, y) by (δx, δy) in two steps; b) full mesh warp in two steps, one moving points only in the u direction and one moving points only in the v direction.

1. Apply horizontal movement to each mesh point in the u direction to give (x, v) coordinates for each mesh point.

2. For each scanline in the intermediate output image, note the intersection of all grid lines with this scanline as shown in Figure 12.11, and build an array of these intersection points. If there are M vertical grid lines, and N scan lines, then the array will be N × M in size, as there will be M entries for each of N scanlines.

3. Resample across each scanline to the full image resolution, using forward mapping, accounting for overlap by using box-filter averaging where the scanline is being minified (compressed) and filling in holes using tent-filter resampling where the scanline is being magnified (stretched). A simplified sketch of this resampling step appears after this list.


Figure 12.11: Crossings of Mesh Lines with Scanlines (e.g. at positions 0, 0.8, 2.3, 3)

4. Repeat this process from the intermediate image to the final output image, but doing the process over image columns rather than scanlines, and doing movements in the vertical direction to change (x, v) coordinates into (x, y) coordinates.
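The sketch below shows a simplified, point-sampling version of the per-scanline resampling in steps 2 and 3; it ignores the box-filter and tent-filter weighting called for above, and it assumes the intersection array spans the whole scanline. The names are illustrative.

/* Resample one scanline of the intermediate image by point sampling.
   X[k] is the output position where vertical grid line k crosses this
   scanline, U[k] is the input position of that (regular) grid line, and
   M is the number of vertical grid lines.  X is assumed to span the
   scanline, i.e. X[0] = 0 and X[M-1] = width - 1. */
void resample_scanline(double *in, double *out, int width,
                       double *X, double *U, int M){
   int x, k = 0;
   double t, u;
   for(x = 0; x < width; x++){
      while(k < M - 2 && X[k+1] < x)        /* find the grid span holding x */
         k++;
      t = (x - X[k]) / (X[k+1] - X[k]);     /* fraction of the way across it */
      u = U[k] + t * (U[k+1] - U[k]);       /* corresponding input position */
      out[x] = in[(int)(u + 0.5)];          /* nearest-neighbor point sample */
   }
}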

12.3.2 Mesh Warp Using Spline Interpolation

The warp algorithm as described can easily be extended to warp an image via a mesh with smoothly curved edges. The only extension needed here is to interpolate between mesh control points using a higher order interpolating curve. A typical choice is a cubic interpolation polynomial. The difference between simple linear interpolation and cubic interpolation is illustrated in Figure 12.13.

A cubic interpolating polynomial is constructed by fitting curves described by cubic polynomials between the control points in such a way that they join together smoothly to create a continuous curve across all of the points. Choosing to use cubic curve segments between mesh control points allows us to organize the calculation in a straightforward way. In reading the following description, please refer to Figure 12.14, which shows how such an interpolating curve would be constructed down a column (in the v direction) in the input image to govern horizontal displacements (in the x direction) along a scanline in the output image. For each segment i of the curve (e.g. segment 1 in the figure) define a distance parameter

ti = (v − vi) / (vi+1 − vi)    (12.4)

which varies from 0 to 1 as v varies from vi to vi+1. Then the displacement


Figure 12.12: Filtering Input Image into Output Image Along a Scanline. a) filters for magnification and minification: minify with a box filter (weighted average of the pixels overlapped), magnify with a tent filter (weighted average of the nearest neighbors); b) different filters applied along the scanline.

of the curve in the x direction for segment i is given by

Xi(ti) = ai ti³ + bi ti² + ci ti + di,    (12.5)

i.e. a cubic curve in ti. The coefficients ai, bi, ci, and di are determined by imposing constraints on the curve segment.

Certainly we want the curve segment to pass through the control points at each of its two ends. Letting xi and xi+1 be the control point horizontal coordinates for vi and vi+1, we have

Xi(0) = xi and Xi(1) = xi+1.

Also, we would like the curve segment to join smoothly with its neighbors to the left and right. This is often enforced by requiring that the slopes and curvatures must match where the segments join. In other words, their first and second derivatives must match. This gives

X′i−1(1) = X′i(0) and X″i−1(1) = X″i(0).

These constraints, plus some additional ones to handle conditions on the extreme left and right ends of the entire curve, serve to completely determine the coefficients ai, bi, ci and di for each curve segment.


Figure 12.13: Linear Interpolation vs. Cubic Polynomial Interpolation

Figure 12.14: Cubic Spline Down Input Column Giving Output Horizontal Displacements (control points at v0, v1, v2, v3, v4; segment 1 parameterized by t1 from 0 to 1)

Once these coefficients are determined, Equations 12.4 and 12.5 can be used to find the intersections of the curves with the scanlines and columns. Equations 12.4 and 12.5 are set up to determine horizontal displacement x in the output image as a function of vertical displacement v in the input image. They can be used for pass-one of the algorithm to determine where each curve crosses the scan lines of the image. From here, it is an easy exercise to construct similar functions to be used for pass-two, which give the vertical displacements y in the output image as a function of horizontal displacements u in the input image.
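As a sketch of how the pass-one displacement curve might be evaluated at an input position v, using Equations 12.4 and 12.5: the coefficient arrays are assumed to have been computed already from the end-point and smoothness constraints above, and all names are illustrative.

/* Evaluate the pass-one horizontal displacement X at input position vpos.
   vk[0..nsegs] holds the control point ordinates v0, v1, ..., and
   a, b, c, d hold the per-segment cubic coefficients determined by the
   constraints described above. */
double spline_x(double vpos, double *vk, double *a, double *b,
                double *c, double *d, int nsegs){
   int i = 0;
   double t;
   while(i < nsegs - 1 && vpos > vk[i+1])   /* find the segment holding vpos */
      i++;
   t = (vpos - vk[i]) / (vk[i+1] - vk[i]);            /* Equation 12.4 */
   return ((a[i] * t + b[i]) * t + c[i]) * t + d[i];  /* Equation 12.5 (Horner form) */
}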


Chapter 13

Morphing

A morph from one image to another involves 1) warping both of the images to some intermediate shape where they can be superimposed upon each other, and 2) blending the two images together to produce a third image. An example is given in Figure 13.1, which shows a step in a morph from a brick into a ball. First, we find an intermediate shape that we can deform both into, warp both images to this intermediate shape, and then blend the intermediate images to produce an image that is a mix of the two originals.

Figure 13.1: A Step in a Morph of a Brick into a Ball (original brick, intermediate warped brick, blend of the two intermediate images into the morphed brick-ball, intermediate warped ball, original ball)

Five steps in the complete brick-to-ball morph sequence are shown in Figure 13.2. The complete sequence consists of a series of intermediate


warped shapes that go from one of the original shapes to the other, and the blend is controlled by giving proportionally more strength to the original image that is closest to the deformed shape. If this is done with some artistry, when the sequence is played back it can look like a smooth transition from one shape and coloring to the other.

Figure 13.2: Steps in Morph Sequence from Brick to Ball (brick weights 1.00, 0.75, 0.50, 0.25, 0.00; ball weights 0.00, 0.25, 0.50, 0.75, 1.00)

If the two objects being morphed are also moving, then each warp in the sequence must be done between corresponding frames in the original motion sequence. For this reason, it is important that a morph tool for animation or video gives the animator good control over the warp between the images while allowing for the exploitation of continuity from frame to frame – to minimize the amount of hand work that must be done for each frame.


13.1 Morphing Algorithms

We will look at three schemes for doing morphing: a scanline approach, a triangular area fill approach, and a feature based approach.

13.1.1 A Scanline Morphing Algorithm

The scanline algorithm extends the separable mesh warping algorithm that we studied in the last chapter. The idea is to allow the user to deform a mesh over each of the two images, so that the mesh approximately fits over the image features that are to be warped and so that corresponding mesh elements enclose areas of the two images that are to be warped and blended into each other. Figure 13.3 shows how a pair of meshes might be set up for the brick-to-ball morph, although generally this will require a much finer grid and careful line placement than that shown in the figure.

Figure 13.3: Rectangular Meshes Morphing a Brick into a Ball (starting mesh, intermediate mesh for the frame in the middle of the sequence, ending mesh)

Now, each image will be warped to an intermediate form that can be derived by interpolating corresponding vertices in the mesh over image 1 into the vertices in the mesh over image 2. This is very simple to compute – for the frame at fractional position f along the sequence (f = 0 at the starting frame, and f = 1 at the ending frame), simply set each vertex Pi,f in the mesh for frame f to

Pi,f = f Pi,1 + (1 − f) Pi,0.


This process of mesh interpolation is also shown in Figure 13.3, which shows how the starting and ending meshes are interpolated to form the mesh in the middle of the sequence.

Then, each image is warped to this intermediate mesh using the separable scan line mesh warp algorithm, and the images are blended, also using f as the blending factor. The interested student is referred to the text by Wolberg for complete details of the algorithm.
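The per-frame work can be summarized by the sketch below: interpolate the two meshes, warp each original image to the interpolated mesh, and blend with factor f. The helper mesh_warp stands in for the separable mesh warp of Chapter 12, and all names and types here are illustrative assumptions.

typedef struct { double x, y; } Point;

/* assumed helper: warp src so that features under src_mesh land under
   dst_mesh in dst, using the separable mesh warp of Chapter 12 */
extern void mesh_warp(double **src, double **dst, Point *src_mesh,
                      Point *dst_mesh, int w, int h, int nverts);

/* produce the morph frame at fractional position f
   (f = 0 gives image 0, f = 1 gives image 1) */
void morph_frame(double **im0, double **im1, double **out,
                 Point *mesh0, Point *mesh1, Point *meshf,
                 double **warp0, double **warp1,
                 int w, int h, int nverts, double f){
   int i, x, y;

   for(i = 0; i < nverts; i++){              /* interpolate the two meshes */
      meshf[i].x = f * mesh1[i].x + (1 - f) * mesh0[i].x;
      meshf[i].y = f * mesh1[i].y + (1 - f) * mesh0[i].y;
   }

   mesh_warp(im0, warp0, mesh0, meshf, w, h, nverts);  /* warp both originals */
   mesh_warp(im1, warp1, mesh1, meshf, w, h, nverts);

   for(y = 0; y < h; y++)                    /* blend, weighting by f */
      for(x = 0; x < w; x++)
         out[y][x] = (1 - f) * warp0[y][x] + f * warp1[y][x];
}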

13.1.2 Triangle Mesh Warp Morph

The rectangular mesh technique is by no means the only mesh-oriented technique that can be used to do a morph. For example, a triangle mesh warp algorithm was developed by Darrin Butts for the Midas morphing program. It gives the user free control over the number and placement of the vertices of a polygonal outline in the initial image, and then requires the user to triangulate the region enclosed by the polygon by connecting groups of three vertices with edges. It then copies the outline to the final image and allows the user to reposition the vertices, thus warping and repositioning the triangles. The program creates intermediate meshes by linear interpolation based on frame number, just as with the rectangular mesh warp. Triangles in the original two images are warped into the triangles in the intermediate images using an approach very similar to bilinear interpolation for rectangular grid elements. The triangular mesh warp process used by Midas is illustrated in Figure 13.4.

13.1.3 Feature Based Morph

A feature based approach to defining a morph was developed by Thad Beier of SGI and Shawn Neely of PDI (SIGGRAPH ’92). This technique was used for the ending segment of the Michael Jackson “Black and White” music video, which features a sequence of morphs of dancers’ heads into each other. This technique works by having the user draw pairs of lines, one in each of the two images that are to be morphed, that locate and identify significant features of the two images that should correspond to each other in the morphed image. The idea is that corresponding lines are both to map to an “averaged” line in the intermediate image. This is implemented mathematically by using these lines to define an influence function which spreads a kind of gravitational influence of each line out across the image. At each pixel in the image, the sum of the influence functions of all of the feature lines determines the value of a global “field”, which is used to warp the image into the intermediate image. The basic idea of this algorithm is depicted in Figure 13.5.


Figure 13.4: Triangular Meshes Morphing a Brick into a Ball (brick and ball outlined by 8 mesh points, triangular mesh over the control points, intermediate warped mesh)

Figure 13.5: Feature Lines used to Morph a Brick into a Ball (brick and ball with feature lines, intermediate “averaged” feature lines)


Chapter 14

Frequency Analysis of Images

14.1 Sampling, Aliasing and Reconstruction

We have already informally looked at the problem of doing any sort of image warp (change in geometry) when the image is a discretely sampled raster representation. When the image is warped, changes in size and orientation make the map from input pixels to output pixels complicated to determine, and in general not 1-to-1, i.e. there is not usually precisely one input pixel corresponding to each output pixel.

This problem is fundamental to digital processing of any sort of information such as images, sound, simulation data, etc., that is sampled as part of the information capture process. We can view such sampling abstractly as the selection of a sequence of evenly spaced values from a continuous signal or function, as depicted in the graphs of Figure 14.1. When we take any sort of continuous signal and store a representation of it as a sequence of samples, we have made some very fundamental changes in the information. Obviously, we have thrown away a lot of potential detail. It is this potential for loss of important information that leads to problems later. Sampling leads to two general classes of problem, reconstruction errors and aliasing.

We can recognize aliasing as the introduction of pattern artifacts into the image or sound that we are trying to reproduce. Moire patterns (that are often made for artistic effect) are the result of the same kind of error that causes aliasing – basically that we are sampling at a rate that is below the level of detail of what we are trying to capture. Such aliasing artifacts are symptomatic of the sampling process itself, and cannot be avoided unless


Figure 14.1: Sampling a Continuous Function (original signal and stored samples)

care is taken to do sampling in a way that is appropriate to the data being sampled. A demonstration of Moire patterns created by radial lines drawn on a CRT raster can be seen by running the program moire that can be found in the directory /usr/local/misc/courses/viza654/examples/.

Another problem with sampled data is that in order to reproduce the original input from the samples, we must somehow reconstruct the input using only the samples as our information source. For example, in a digital image we reconstruct an original continuous image by spreading each sample value across the area of a rectangular pixel on the screen. The artifacts due to the reconstruction process are known as reconstruction artifacts, and for digital images result in such errors as staircasing or jaggies, as shown in Figure 14.2.

Figure 14.2: Jaggies Due to Rectangular Pixel Reconstruction

The questions that we need to address if we are to be able to produce high quality warped images are

1. How should we do sampling to ensure that we have retained enough data to avoid aliasing artifacts?

2. How should we do reconstruction, given a set of samples, to avoid reconstruction artifacts?

Our goal in attempting to answer both of these questions will be to see how successful we can be in avoiding – or at least minimizing – both aliasing and reconstruction artifacts.


14.2 Frequency Analysis

Before we can answer either of these questions, we need to develop an understanding of how the smoothness of a data set relates to how finely it must be sampled, and how much information is contained in a set of samples. Frequency analysis will give us the tools we need to do this.

We are used to thinking of sound as having a frequency spectrum, and can readily identify the low frequency tuba from the high frequency piccolo. The notion of frequency is normally associated with a time varying signal (like sound), but finds important use in both developing tools to operate on images and in providing a framework for the investigation of image qualities. In fact, it provides the foundation of theory supporting the design and refinement of most image processing techniques.

14.2.1 Fourier series

What do we mean by frequency analysis? The notion here is that any finite duration or periodic “signal” can be represented as an infinite weighted sum of sines and/or cosines. This representation is known as the Fourier Series representation. Given

f(t) = { 0 for t > T;  g(t) for 0 ≤ t ≤ T;  0 for t < 0 },

then, on the interval (0, T ) we have

f(t) = (1/2)C0 + ∑_{n=1}^{∞} Cn cos((2πn/T)t − φn)    (14.1)

where the amplitude coefficients Cn and the phase angles φn are determined by first computing auxiliary sine and cosine coefficients

An = (2/T) ∫₀ᵀ f(t) cos((2πn/T)t) dt,    (14.2)

Bn = (2/T) ∫₀ᵀ f(t) sin((2πn/T)t) dt,    (14.3)

and then computing

Cn = √(An² + Bn²),    (14.4)

and

φn = tan⁻¹(Bn/An).    (14.5)


Figure 14.3 shows how the Fourier Series representation can be used to represent a square wave. It shows how the coefficients for the series are calculated, and shows how the series gradually converges to the shape of the original wave as more and more terms are added to the sum.

What does this mean intuitively?

f(t) = (1/2)C0 + ∑_{n=1}^{∞} Cn cos((2πn/T)t − φn)

On the range from 0 to T, our function f(t) can be represented as the sum of a constant (1/2)C0, which is simply the average value of f(t), and a sum of sine waves (shifted to the left by π/2 which changes the sines to cosines), each weighted by the weights Cn and shifted to the right by the distance (T/2πn)φn, and with frequencies that are integer multiples of the fundamental frequency 2π/T. The integer multiples of the fundamental frequency (shown in Figure 14.4) are known as its harmonics.

To determine the cosine weight for a particular value of n: 1) multiply the function f(t) by the cosine, 2) find the area under the resulting curve, 3) divide by 1/2 the fundamental interval T. In other words,

An = (2/T) ∫₀ᵀ f(t) cos((2πn/T)t) dt.

A similar process is used to find the sine weights, yielding the equation

Bn = (2/T) ∫₀ᵀ f(t) sin((2πn/T)t) dt.

If f(t) is our signal, we can now think of the signal in terms of its frequency content. The combined sine and cosine coefficient

Cn = √(An² + Bn²)

gives the total weight of the nth harmonic in determining the shape of the signal f(t). In other words, the weights Cn tell you how important the nth harmonic is in determining the overall shape of the signal or function.
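These formulas translate directly into code. The sketch below approximates the integrals of Equations 14.2 and 14.3 with a simple rectangle rule over K samples of f on (0, T); the sampled representation of f and the use of atan2 for the phase are illustrative assumptions.

#include <math.h>

#define PI 3.14159265358979323846

/* Approximate Cn and phi_n for harmonic n of a signal represented by K
   evenly spaced samples f[0..K-1] on the interval (0, T). */
void fourier_coeffs(double *f, int K, double T, int n,
                    double *Cn, double *phin){
   double An = 0, Bn = 0, dt = T / K, t;
   int k;
   for(k = 0; k < K; k++){
      t = (k + 0.5) * dt;
      An += f[k] * cos(2 * PI * n * t / T) * dt;   /* Equation 14.2 integrand */
      Bn += f[k] * sin(2 * PI * n * t / T) * dt;   /* Equation 14.3 integrand */
   }
   An *= 2 / T;
   Bn *= 2 / T;
   *Cn   = sqrt(An * An + Bn * Bn);     /* Equation 14.4 */
   *phin = atan2(Bn, An);               /* Equation 14.5, quadrant-safe */
}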

14.2.2 Fourier transform

A more general transformation from a time varying to a frequency-based or frequency domain signal is given by the Fourier Transform. Unlike the Fourier series, the Fourier transform does not require the signal to be of


The square wave on (0, T) takes the value 1 for 0 ≤ t < T/2 and −1 for T/2 ≤ t < T. Its coefficients are

An = (2/T) ∫₀ᵀ f(t) cos((2πn/T)t) dt
   = (2/T) [ ∫₀^{T/2} cos((2πn/T)t) dt − ∫_{T/2}^{T} cos((2πn/T)t) dt ]
   = (1/πn) [ sin((2πn/T)t) |₀^{T/2} − sin((2πn/T)t) |_{T/2}^{T} ] = 0,

i.e. all cosine coefficients are 0.

Bn = (2/T) ∫₀ᵀ f(t) sin((2πn/T)t) dt
   = (2/T) [ ∫₀^{T/2} sin((2πn/T)t) dt − ∫_{T/2}^{T} sin((2πn/T)t) dt ]
   = (1/πn) [ −cos((2πn/T)t) |₀^{T/2} + cos((2πn/T)t) |_{T/2}^{T} ]
   = 2(1 − cos πn)/(πn),

i.e. all sine coefficients are 0 for n even, and 4/(πn) for n odd. Thus

f(t) = ∑_{n odd} (4/πn) sin((2πn/T)t) = ∑_{n odd} (4/πn) cos((2πn/T)t − π/2).

Figure 14.3: Square Wave Example of Fourier Series (partial sums with 1 term only, 2 terms, 4 terms, 6 terms, 50 terms, and 500 terms)
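The convergence shown in Figure 14.3 can be reproduced with the short sketch below, which sums the first nterms odd harmonics of the series just derived; the sampled output representation is an assumption for illustration.

#include <math.h>

#define PI 3.14159265358979323846

/* Evaluate the square-wave Fourier series truncated to nterms odd
   harmonics, at K evenly spaced points on (0, T). */
void square_wave_series(double *out, int K, double T, int nterms){
   int k, i, n;
   double t, sum;
   for(k = 0; k < K; k++){
      t = k * T / K;
      sum = 0;
      for(i = 0, n = 1; i < nterms; i++, n += 2)     /* n = 1, 3, 5, ... */
         sum += (4 / (PI * n)) * sin(2 * PI * n * t / T);
      out[k] = sum;
   }
}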


Figure 14.4: Fundamental Sine Wave sin((2π/T)t) of Period T

finite duration or to be periodic, and it produces a continuous frequency spectrum, rather than discrete harmonic weights. It is given by

F(ω) = ∫_{−∞}^{∞} f(t) e^{−iωt} dt,    (14.6)

and its inverse is

f(t) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{iωt} dω,    (14.7)

where t is the measure used along the horizontal axis, ω is frequency measured in radians per unit of the horizontal axis, i = √−1, and e^{iθ} is the complex exponential. The complex exponential is defined by its complex number equivalent in trigonometric form:

e^{iθ} = cos θ + i sin θ.

The relationship between the Fourier Transform and the Fourier Series may seem obscure, but due to the equivalence between the complex exponential and the trigonometric form, we could rewrite Equation 14.6 as

F(ω) = ∫_{−∞}^{∞} f(t) cos ωt dt − i ∫_{−∞}^{∞} f(t) sin ωt dt.    (14.8)

In this form, the Fourier Transform bears a close resemblance to the Fourier series (compare Equation 14.8 with Equations 14.2 and 14.3 for the Fourier series coefficients). Anyway, the point is that we are back to integrals of products of our function and sines and cosines. The principal differences between the Fourier Series and Fourier Transform representations are that

1. F(ω) is continuous, rather than a collection of discrete harmonic weights, and


2. F (ω) is a complex function, rather than a real function.

If F(ω) is the Fourier Transform of function f(t), we say that |F(ω)| (i.e. the magnitude of the Fourier Transform) is its Fourier Spectrum. Figure 14.5 shows what such a spectrum might look like for a typical input signal. Like the coefficients of the Fourier Series, the Fourier Spectrum gives a measure of how much each frequency contributes to the overall shape of the signal or function.

Figure 14.5: |F(ω)| is the Fourier Spectrum

Fourier transforms of real signals obey the following laws:

• If f(t) is a real signal, |F (ω)| is symmetric about 0.

• f(t) and F(ω) are two complete representations of the same signal and are completely recoverable from each other.

• If f(t) is periodic then F (ω) is discrete.


• If f(t) is discrete (sampled), then F(ω) is periodic with period equal to the sampling rate of f(t).



• If f(t) is both discrete and periodic then F(ω) is also both discrete and periodic.

14.2.3 Discrete Fourier transform

There is a discrete version of the Fourier Transform based on the last law, that is especially useful for dealing with images or any kind of sampled data. When our function f(t) is both discrete and periodic, then its Fourier Transform F(ω) is also. It is clear that any discrete periodic function requires only a fixed number of samples or values to represent it. If the function is periodic, the discrete values that describe it repeat over and over, and there is no need to record the repetitions.

The same is true for the function’s Fourier Transform. Take a function f(t) sampled on a unit interval and periodic with period N; then it is described by N discrete sample values, and its Fourier transform is also described by N discrete sample values. Also, if there is a finite number of samples, then the Fourier Transform is given by a finite sum rather than an integral. This form of the Fourier Transform is called the Discrete Fourier Transform (DFT) and is given by

F(u) = (1/N) ∑_{t=0}^{N−1} f(t) e^{−i2πut/N};   u = 0, 1, 2, ..., N − 1,    (14.9)

with inverse

f(t) = ∑_{u=0}^{N−1} F(u) e^{i2πut/N};   t = 0, 1, 2, ..., N − 1.    (14.10)

Note that in these equations t is still used as the time or distance measure, but it is a sequence of integers, and u is an integer frequency measure in cycles per unit of the total span N, unlike ω that was measured in radians per unit.

Figure 14.6 depicts a sampled periodic signal and its DFT. Note that the DFT itself is sampled and periodic with period N, but more – it is symmetric about the origin, u = 0. This symmetry is an attribute of the Fourier Transform of any real valued signal. Since F(u) is both periodic and symmetric, we need to store only 1/2 of the N samples to completely describe F(u), but remember that F(u) is complex and so carries both amplitude and phase information at each sample. Thus we need two numbers to describe each sample so that we still have to store a total of N numbers – we haven’t gained any “compression” simply by doing the DFT. However, this symmetry is very important in giving us a way to compute


the DFT efficiently, as we will see later when we discuss the fast algorithm for computing the DFT, known as the Fast Fourier Transform.
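Before turning to the FFT, Equation 14.9 can be implemented directly. The O(N²) sketch below computes the real and imaginary parts of F(u) for a real signal f; the Fourier Spectrum is then |F(u)| = √(Fre[u]² + Fim[u]²). Array names are illustrative.

#include <math.h>

#define PI 3.14159265358979323846

/* Naive DFT of a real signal f of length N (Equation 14.9):
   Fre[u] + i Fim[u] = (1/N) sum_t f(t) e^{-i 2 pi u t / N} */
void dft(double *f, double *Fre, double *Fim, int N){
   int u, t;
   double ang;
   for(u = 0; u < N; u++){
      Fre[u] = Fim[u] = 0;
      for(t = 0; t < N; t++){
         ang = 2 * PI * u * t / N;
         Fre[u] += f[t] * cos(ang);
         Fim[u] -= f[t] * sin(ang);
      }
      Fre[u] /= N;
      Fim[u] /= N;
   }
}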

Figure 14.6: A Discrete Periodic Function and its Fourier Transform. a) f(t) sampled on a unit interval and with period 4; b) F(u) also has period 4 and is symmetric about the origin.

14.2.4 Discrete Fourier Transform of an image

This discrete form of the Fourier Transform permits us to transform a digital image. Given an image sampled with N samples per scanline, and M scanlines, and letting x represent horizontal distance, y vertical distance (i.e. column and row numbers) and u and v represent frequency in cycles per scanline or cycles per column, the two-dimensional Discrete Fourier Transform is given by

F(u, v) = (1/MN) ∑_{x=0}^{N−1} ∑_{y=0}^{M−1} f(x, y) e^{−i2π(ux/N + vy/M)},

with inverse transform

f(x, y) = ∑_{u=0}^{N−1} ∑_{v=0}^{M−1} F(u, v) e^{i2π(ux/N + vy/M)}.

Here x, y, u, and v are integers, with 0 ≤ x, u < N , and 0 ≤ y, v < M .


Because of the law x^(a+b) = x^a x^b governing sums of exponents, these equations can be factored into a handy computational form that allows computation of the DFT in two parts – first by computing the DFT of each scanline, and then by computing the DFT of each column of this intermediate form. This factorization gives us

F(u, v) = (1/MN) ∑_{y=0}^{M−1} [ ∑_{x=0}^{N−1} f(x, y) e^{−i2πux/N} ] e^{−i2πvy/M},    (14.11)

and

f(x, y) = ∑_{v=0}^{M−1} [ ∑_{u=0}^{N−1} F(u, v) e^{i2πux/N} ] e^{i2πvy/M}.    (14.12)

Now, we have waded through a lot of mathematics, and you might be asking yourself at this point, what does the Discrete Fourier Transform buy us? In a word, it gives us a new way of looking at our image data. Specifically,

1. it gives us a different space within which to study images, known as the frequency domain.

2. |F(u, v)| gives the frequency spectrum, or spectral content of the image, i.e. how much “information” in the image is concentrated at various frequencies. This can tip us off as to what periodicities exist in the image, and what ranges of frequencies are most represented.

3. The frequency content of an image is strongly related to what kind of features might exist in the image. In general, hard edges and high contrast in an image are equivalent to having strong high frequency content in the Fourier Transform. Soft edges and low contrast are equivalent to having strong low frequency content.

4. Certain filtering operations become trivial to implement in the frequency domain. The filtering process can be seen as

f(t) =⇒ [Fourier transform] =⇒ F(ω) −→ [apply filter] −→ F′(ω) =⇒ [inverse transform] =⇒ f′(t)

5. The Fourier transform holds the key to understanding and coping with sampling issues.


14.2.5 A demonstration of frequency based image construction

The demonstration program frequency in the examples directory under the course directory allows the user to construct images by specifying the amplitude and phase of the harmonic frequencies along scanlines and down columns. The program sums all of the harmonics, using one such sum for all of the scanlines, and another such sum for all of the columns. The interface to the program is simply a menu of three buttons. The open button is used to select an input data file describing the image size and harmonic content. The save button allows the user to save an image to a file once it is created, and the quit button is used to exit the program.

Input to the program is via a data file organized as shown in Figure 14.7. The first line of the file contains the image width and height, which must be integer numbers between 1 and 1,024. The second line contains a number between 0.0 and 1.0 which gives the average image intensity. The third line contains triples describing scanline (horizontal) harmonics and the fourth line has an identical format but describes column harmonics. Each harmonic triple consists of a positive integer harmonic number between 1 and one-half the scanline (or column) width, a number between 0.0 and 1.0 giving the amplitude of the harmonic, and a number between 0 and 360 giving the phase angle of the harmonic in degrees. The two lines describing harmonics each start with an interpolation flag that must be either 0 or 1. If the flag is 0, each specified harmonic is taken as an explicit discrete value. If the flag is 1, harmonics are taken in pairs, with amplitude and phase of the harmonics lying between these pairs linearly interpolated between the two harmonics in the pair.

Data                                               Description
width height                                       image size
midlevel                                           average grey level
interpolation-flag harmonic amplitude phase ...    scanline harmonics
interpolation-flag harmonic amplitude phase ...    column harmonics

Figure 14.7: frequency Program Input Data File Format

The image is constructed by taking products across the scanline and column harmonic sums. Given the arrays S and C containing the computed sums of scanline and column harmonics, the pixel in image I at location (row, col) is computed by


I[row][col] = S[col] * C[row];

Figure 14.8 shows how this computation works. The pixel with the • in the output image on the left is the product of the scanline and column array cells containing •’s. Likewise, the pixel with the # is the product of the scanline and column cells containing #’s.

Figure 14.8: Row-Column Product used by Demonstration Program frequency

This is not the same as the fully general approach to building up an image, given by the Fourier Transform, since unlike in a two-dimensional Fourier Transform, each row has the same spectrum, and each column has the same spectrum. Nevertheless, one gets the idea of how patterns can be built up, and how a Fourier Transform can represent an image.

14.3 The Sampling Theorem

The Sampling theorem gives the following very surprising result. It states that if a signal is sampled at a sampling rate greater than twice the highest frequency contained in the signal, then it is possible to perfectly reproduce the signal from the samples. This is precisely what we want to be able to do with images – we would like our pixels (samples) to provide enough information so that we can exactly reproduce the original unsampled scene (or image). The reasoning in the theorem is as follows.

Assume that we start with a signal f(t) that is band-limited, i.e. it has a maximum frequency umax beyond which its Fourier Spectrum |F(u)| is zero. Such a band limited signal and its spectrum are shown in Figure 14.9a. Now, sample that signal at some sampling frequency us to form a new discrete signal f∗(t) and take its Fourier Transform. It turns out that the Fourier Transform F∗(u) of the sampled signal f∗(t) is the mathematical sum of an infinite number of copies of the Fourier Transform F(u) of the original signal f(t), with each copy of F(u) shifted from its neighboring copies by exactly the sampling frequency, as shown in Figure 14.9b. Now, since F(u) is band limited, the copies of F(u) in the sampled transform


F*(u) will not overlap as long as the sampling frequency u_s is greater than twice the highest frequency in F(u). If the sampling frequency is less than this value, as depicted in Figure 14.9c, the lobes overlap and when they are summed, the shape of the Fourier Transform is distorted in the region of overlap. You can see from this figure that high frequencies get shifted into and added together with low frequencies. This is the source of the aliasing phenomenon – because of sampling, artifacts get introduced into the sampled data that cannot be removed – they come from the sampling process itself.

[Figure: three panels showing a) a band-limited signal f(t) and its Fourier spectrum |F(u)| with maximum frequency u_max; b) a well sampled signal f*(t) and its Fourier spectrum |F*(u)|; c) an undersampled signal and its Fourier spectrum, with overlapping copies.]

Figure 14.9: Fourier Spectra of Continuous and Sampled Band-limited Signals

The reasoning in the Sampling Theorem continues – if we truncate the Fourier Transform of a sampled signal to a band of width exactly u_s centered at the origin (i.e. cut it off at ±u_s/2), as shown in Figure 14.10a, to obtain the filtered Fourier Transform F*'(u), then we have back the Fourier Transform F(u) of the original signal! In other words,

F*'(u) = F(u),  if u_s/2 > u_max.


All we have to do is to take the inverse Fourier Transform to perfectly recover the full original signal from our sampled data. However, if we have not sampled fast enough, so that u_s/2 is not larger than u_max, as is the case in Figure 14.10b, the resulting Fourier Transform F*'(u) will be deformed, and our reconstructed signal will have aliasing artifacts no matter what we do.

[Figure: two panels showing a) filtering to remove shifted copies of F(u) by truncating |F*(u)| at ±u_s/2 to obtain |F*'(u)|; b) deformation of the result when the sampling frequency is too low.]

Figure 14.10: Filtering in Frequency to Recover Fourier Transform of Original Signal

Here is the moral again:

If you want to sample sound, image data, anything at all, you had better sample at more than twice the highest frequency in the original signal, otherwise you will have aliasing artifacts in the result no matter what you do later.

Figure 14.11 shows this result in simple graphical form.
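A tiny numerical experiment makes the same point. In the sketch below (illustrative only, not part of any course program), a 6 Hz sine is sampled at 8 samples per second; since 6 Hz exceeds the Nyquist limit of 4 Hz, the samples are exactly those of a phase-reversed 2 Hz sine, and nothing done after sampling can tell the two apart.

    #include <cmath>
    #include <cstdio>

    // Sample a 6 Hz sinusoid at 8 samples/second (Nyquist limit is 4 Hz).
    // The samples coincide with those of a 2 Hz sinusoid: the 6 Hz
    // component aliases down to |6 - 8| = 2 Hz.
    int main()
    {
        const double PI = 3.14159265358979323846;
        const double fs = 8.0;                       // sampling frequency
        for(int n = 0; n < 8; n++){
            double t = n / fs;
            double high  = sin(2.0 * PI * 6.0 * t);  // undersampled 6 Hz signal
            double alias = sin(-2.0 * PI * 2.0 * t); // 2 Hz alias (note sign flip)
            printf("t = %.3f   6 Hz sample = %+.4f   2 Hz alias = %+.4f\n",
                   t, high, alias);
        }
        return 0;
    }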

14.4 Ideal Low-Pass Filter

In the chapter on removing aliasing and reconstruction artifacts, we stated that the ideal low-pass filter was the sinc function. The secret to the derivation of this result is the rather interesting fact that convolution in the spatial domain is equivalent to multiplication in the frequency domain.


[Figure: a) plenty of samples per period – a well sampled signal, no aliasing; b) not enough samples per period – an undersampled signal, aliased result.]

Figure 14.11: Aliasing Due to Undersampling

It is left as an exercise for the student to prove this by taking the Fourier Transform of the convolution of two functions.

If we denote the convolution operation by the operator ∗, and the pairing of a function with its Fourier transform by the operator ⇔, this result can be expressed algebraically by

h(t) ∗ f(t) ⇐⇒ H(u)F (u),

where f(t) is the function, F(u) its Fourier transform, h(t) is the convolution kernel and H(u) its Fourier transform. The law

h(t)f(t) ⇐⇒ H(u) ∗ F (u)

also holds but is not useful here.

Now in the frequency domain, an ideal low-pass filter is easy to describe.

It is simply a unit pulse, centered at the origin in the frequency domain, with width twice the cutoff frequency. If we multiply any frequency domain function by the unit pulse, it will be set to zero outside of the extent of the pulse, and will be unchanged under the pulse. The relationship between multiplication and convolution means that if we multiply the Fourier Transform F_T(u) of a function f(t) sampled at sampling period T by the unit pulse and then take the inverse transform, we should get the same result as if we convolved the inverse Fourier Transform of the pulse with the original sampled signal f_T(t).


p_T(t) = \int_{-\infty}^{\infty} P_T(u) e^{i 2\pi u t} du
       = \int_{-1/(2T)}^{1/(2T)} e^{i 2\pi u t} du
       = \frac{e^{i\pi t/T} - e^{-i\pi t/T}}{i 2\pi t}
       = \frac{\sin(\pi t/T)}{\pi t},

so for T = 1, we have

p_1(t) = \frac{\sin \pi t}{\pi t},

which is the sinc function with period 2 and maximum amplitude 1 at t = 0.

Figure 14.12: Inverse Fourier Transform of Unit Pulse of Width 2π/T Centered at Origin

Thus, the way to get the correct convolution kernel for the ideal low-pass filter in the spatial domain would be to compute the inverse Fourier Transform of the unit pulse in frequency. This derivation is shown in Figure 14.12.
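To connect this back to reconstruction in the spatial domain, here is a brief sketch of sinc reconstruction of a sampled signal. It is an illustration only, not code from the notes: in principle the sum is infinite, and truncating it to the available samples (and the boundary handling that implies) is an assumption.

    #include <cmath>

    // Ideal low-pass (sinc) reconstruction of a signal sampled with period T.
    // f[] holds N samples taken at times 0, T, 2T, ...; the reconstructed
    // value at time t is the convolution of the samples with sinc(t/T).
    double sinc(double x)
    {
        const double PI = 3.14159265358979323846;
        return x == 0.0 ? 1.0 : sin(PI * x) / (PI * x);
    }

    double reconstruct(const float f[], int N, double T, double t)
    {
        double sum = 0.0;
        for(int n = 0; n < N; n++)              // in principle an infinite sum,
            sum += f[n] * sinc((t - n * T) / T); // truncated to the samples we have
        return sum;
    }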


14.5 Fast Fourier Transform

The Discrete Fourier Transform in the form given by Equations 14.9 and 14.10, and the two-dimensional version given by Equations 14.11 and 14.12, are easy to program. Let us assume that we have defined a type Complex for working with complex numbers, and that we have defined a function

Complex W(int n, int N)

which returns e^{-i 2\pi n/N} as a complex number. Then, an algorithm based on Equation 14.9 to compute the Discrete Fourier Transform of the array f of N elements is:

Complex *DFT(float f[], int N)
{
    int u, n;
    Complex *dft;

    dft = new Complex[N];
    for(u = 0; u < N; u++){
        dft[u] = 0;
        // accumulate the sum of f(n) W(nu, N) over all N samples
        for(n = 0; n < N; n++)
            dft[u] += f[n] * W(u * n, N);
    }
    return dft;
}
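The Complex type and the W helper are assumed rather than given in the notes. One possible minimal sketch, built on std::complex, is:

    #include <complex>
    #include <cmath>

    typedef std::complex<float> Complex;

    // Returns e^{-i 2 pi n / N}, the complex exponential used throughout the DFT.
    Complex W(int n, int N)
    {
        const double PI = 3.14159265358979323846;
        double angle = -2.0 * PI * n / N;
        return Complex((float)cos(angle), (float)sin(angle));
    }

Any definition that supports assignment from a number, addition of two Complex values, and multiplication by a float would work equally well with the code above.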

Now, this algorithm is extremely simple, but it has a very real problem. That is that the two nested for-loops, each iterating N times, mean that the algorithm will have a computation time proportional to the square of N. In the parlance of complexity analysis, we say that the algorithm is O(N^2). Now, this is not a problem if the number of samples N is small, but imagine a two dimensional version of this algorithm operating on a 512 × 512 image. Since the total number of pixels is approximately 1/4 of a million, then if we know the computation time to compute a one pixel Fourier Transform, we would have to expect that the total computation time will be on the order of 60 billion times greater! This is not likely to be a useful algorithm for working on real images.

Fortunately, there are several hidden symmetries in the calculation of the Discrete Fourier Transform that can be exploited for big time savings. The name given to the fast version of the DFT algorithm that exploits these is the Fast Fourier Transform. This is a bit of a misnomer, since what it computes is exactly the Discrete Fourier Transform; it just does it in an amount of time proportional to N log N. This means that our 512 × 512 DFT that took 60 billion units of time using the naive algorithm above,


can be done in about 1.5 million units of time, a speed-up of about 40,000 times! Clearly, we need this algorithm.

The Fast Fourier Transform algorithm uses a Divide and Conquer strategy to exploit symmetries in the Discrete Fourier Transform calculation. What we mean by this is that at each stage in the calculation, the problem is separated into two problems, each 1/2 the size of the original problem. This separation is done recursively, for example breaking a size 4 problem into two size 2 problems, then dividing each of these size 2 problems into two size 1 problems. When the problem is reduced to a small enough size (size 1 in this case) so that it can be computed easily, the computation is done. Then, after computing each subproblem separately, the results are recombined to eventually form the final solution. By doing this division in a clever way, redundancies in the calculation of the DFT are uncovered and easily exploited.

First, let us define the function

W_N = e^{-i 2\pi / N}.

Then, substituting W_N for the complex exponential and the variable n for t, we can rewrite Equation 14.9 for the DFT as

F(u) = \sum_{n=0}^{N-1} f(n) W_N^{nu}.    (14.13)

The summation in Equation 14.13 can be subdivided into two summations, one that sums over the even numbered terms and one that sums over the odd numbered terms. This gives

F(u) = \sum_{n=0}^{N/2-1} f(2n) W_N^{2nu} + \sum_{n=0}^{N/2-1} f(2n+1) W_N^{(2n+1)u},

the second summation of which can be factored to yield

F(u) = \sum_{n=0}^{N/2-1} f(2n) W_N^{2nu} + W_N^{u} \sum_{n=0}^{N/2-1} f(2n+1) W_N^{2nu}.    (14.14)

But

W_N^2 = e^{-i 2\pi/(N/2)} = W_{N/2},

so Equation 14.14 can be rewritten

F(u) = \sum_{n=0}^{N/2-1} f(2n) W_{N/2}^{nu} + W_N^{u} \sum_{n=0}^{N/2-1} f(2n+1) W_{N/2}^{nu}.    (14.15)


The summations in Equation 14.15 can be recognized as simply two DFT's, each of a function 1/2 the size of the original function. Let us define these two DFT's to be

F_e(u) = \sum_{n=0}^{N/2-1} f(2n) W_{N/2}^{nu},

representing the even numbered terms in the original DFT and

F_o(u) = \sum_{n=0}^{N/2-1} f(2n+1) W_{N/2}^{nu}

representing the odd numbered terms in the original DFT. Then we can rewrite Equation 14.15 to give the equation that forms the heart of the Fast Fourier Transform Algorithm

F(u) = F_e(u) + W_N^{u} F_o(u).    (14.16)
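For example, with N = 4 the even samples are f(0) and f(2), the odd samples are f(1) and f(3), and Equation 14.16 becomes

F(u) = [f(0) + f(2) W_2^u] + W_4^u [f(1) + f(3) W_2^u],

where the first bracketed sum is F_e(u) and the second is F_o(u).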


Equation 14.16 gives us a recursive formulation of the DFT calculation, i.e. the DFT is defined in terms of itself. All that we need to complete this recursive formulation is to determine a base case to end the recursion. Note that when N = 1, the DFT of the single sample is just the single sample itself, since W_1 = e^{-i 2\pi} = 1. Thus the base case is that the DFT of a signal consisting of only a single sample is the sample itself. The following version of the DFT algorithm (which has a hidden flaw) is based on Equation 14.16 and this base case:

Complex *FFT(float f[], int N)
{
    int n, u;
    float *fe, *fo;
    Complex *Fe, *Fo;
    Complex *fft;

    fft = new Complex[N];
    if(N == 1)
        fft[0] = f[0];        // base case: DFT of one sample is the sample
    else{
        // split f into its even and odd numbered samples
        fe = new float[N/2];
        fo = new float[N/2];
        for(n = 0; n < N/2; n++){
            fe[n] = f[2*n];
            fo[n] = f[2*n + 1];
        }

        // transform each half recursively
        Fe = FFT(fe, N/2);
        Fo = FFT(fo, N/2);

        // recombine using Equation 14.16
        for(u = 0; u < N; u++)
            fft[u] = Fe[u] + W(u, N) * Fo[u];

        delete [] fe; delete [] fo; delete [] Fe; delete [] Fo;
    }
    return fft;
}

Did you find the flaw? On careful analysis it is made obvious by the array sizes – the DFT's F_e and F_o of the even and odd halves are represented in the algorithm by the arrays Fe and Fo, each of which has only N/2 elements. However, the for-loop that merges the two partial calculations of the FFT needs to fill in N entries in the output array fft. Three simple


observations will give us all that we need to fix the problem, and give us a working algorithm.

First, note that the DFT of a function containing N samples is really an infinite periodic function, with period N. This relationship can be seen in Figure 14.13, which depicts the complex exponential function W_4 as a sequence of unit vectors rotating around the origin of the complex plane. This periodicity can also be seen algebraically, because W_N^{n+N} = W_N^n W_N^N, but

W_N^N = e^{-i 2\pi} = 1,

yielding

W_N^{n+N} = W_N^n.

Now, if W_N^n is periodic in n with period N, then W_{N/2}^n must be periodic with period N/2, and we can simply index the arrays Fe and Fo modulo N/2.

Second, note that W_N^n and W_N^{n+N/2} are mirror reflections of each other. Like the periodicity, this relationship can also be seen in Figure 14.13. The mirroring is because W_N^{n+N/2} = W_N^n W_N^{N/2}, but

W_N^{N/2} = e^{-i\pi} = -1,

yielding

W_N^{n+N/2} = -W_N^n.

[Figure: the unit vectors W_4^0, W_4^1, W_4^2, W_4^3 plotted in the complex plane (Re and Im axes), with W_4^{4+3} = W_4^3.]

Figure 14.13: W_4^n is Periodic in n with Period 4

Finally, note that choosing a base case of size 2 instead of size 1 also leaves us with only a simple calculation. Since W_2^0 = 1 and W_2^1 = -1, we have

F[0] = f[0] + f[1],

and

F[1] = f[0] - f[1].


Using these three observations, we can recode the FFT algorithm to get the final working version:

Complex *FFT(float f[], int N)
{
    int n, u;
    float *fe, *fo;
    Complex *Fe, *Fo;
    Complex *fft;

    fft = new Complex[N];
    if(N == 2){
        // base case of size 2: W(0, 2) = 1 and W(1, 2) = -1
        fft[0] = f[0] + f[1];
        fft[1] = f[0] - f[1];
    }
    else{
        // split f into its even and odd numbered samples
        fe = new float[N/2];
        fo = new float[N/2];
        for(n = 0; n < N/2; n++){
            fe[n] = f[2*n];
            fo[n] = f[2*n + 1];
        }

        // transform each half recursively
        Fe = FFT(fe, N/2);
        Fo = FFT(fo, N/2);

        // recombine: periodicity gives the first N/2 entries,
        // mirroring gives the second N/2 entries
        for(u = 0; u < N/2; u++){
            fft[u] = Fe[u] + W(u, N) * Fo[u];
            fft[u + N/2] = Fe[u] - W(u, N) * Fo[u];
        }

        delete [] fe; delete [] fo; delete [] Fe; delete [] Fo;
    }
    return fft;
}
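A short calling sketch (assuming the std::complex based Complex type suggested earlier; the test signal, its length, and the output format are invented for illustration):

    #include <complex>
    #include <cstdio>

    int main()
    {
        float f[8] = {0, 1, 2, 3, 3, 2, 1, 0};   // any power-of-2 length will do
        int N = 8;

        Complex *F = FFT(f, N);                  // forward transform
        for(int u = 0; u < N; u++)
            printf("|F(%d)| = %f\n", u, (double)std::abs(F[u]));

        delete [] F;
        return 0;
    }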

There is one last thing to note about our algorithm. It is really only good for values of N that are integer powers of 2. This is because our recursive subdivision and recombination process expects to be able to eventually subdivide the array f down to chunks, each of which is of size 2. This can only be done if N is a power of 2. If N is not a power of 2, then one or the other of the chunks will eventually reach an odd size and when it is subdivided the resulting chunks will not be of the same size. This restriction is not really a big limitation of the algorithm,


since we can always pad our image out so that it fits into the smallest rectangle the lengths of whose sides are powers of 2.

Now, let us look at the computational complexity of the algorithm. My claim was that we would be able to achieve O(N log N) performance. The old algorithm had two nested loops, each of which iterated N times, yielding O(N^2) performance. In the new algorithm the nesting of loops is gone, being replaced by subdivisions and recursive calls. At each level of recursion, the main calculation loop iterates N/2^L times, where N is the size of the original problem, and L is the level of recursion, i.e. L = 1 at the top level, 2 after one subdivision, etc. But, at level L the problem is subdivided into 2^{L-1} chunks, so the total number of iterations at each level is (N/2^L) · 2^{L-1} = N/2. Now, if N = 2^M, then the deepest level of subdivision will be L_max = log_2 N = M. So, the algorithm does N/2 iterations at each of log_2 N levels, yielding an algorithm whose performance will be proportional to N log N.
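Summing the per-level work makes the same count explicit:

\sum_{L=1}^{\log_2 N} \frac{N}{2^L} \cdot 2^{L-1} = \sum_{L=1}^{\log_2 N} \frac{N}{2} = \frac{N}{2} \log_2 N,

which is proportional to N log N.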
