
Richard Lancaster
Churchill College

[email protected]

Target location in aerial photography

Cambridge University Computer Science Tripos

Part II project dissertation 1999


Word count: 11,700
Project Overseers: J G Daugman and L C Paulson
Project Supervisor: Andrew Penrose

Original aims

To design and implement an Automatic Target Recognition (ATR) system capable of locating cars in vertical aerial photographs.

Work completed

The original aims have been met. A functional ATR system capable of locating cars in vertical aerial photographs with reasonable accuracy has been designed and successfully implemented.

Special difficulties

None.


Copyright notices

The aerial photographs reproduced in this document are copyright the University of Cambridge Committee for Aerial Photography. They are reproduced here with permission, but may not be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the copyright holder.

All other parts of this document are copyright Richard Lancaster and may not be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the copyright holder.


Contents

1 Introduction
  1.1 Specification
  1.2 Background
  1.3 Previous related work

2 Preparation
  2.1 Requirements analysis
  2.2 Obtaining a dataset
  2.3 Research
  2.4 The generic architecture of an ATR system
  2.5 Instantiating the architecture
  2.6 Testing

3 Implementation
  3.1 Low level objects
  3.2 Top level structure
  3.3 Preprocessor
  3.4 Segmenter
  3.5 Fourier invariance transforms
  3.6 Neural net classifier
  3.7 Facilities for testing
  3.8 Third party code
  3.9 Source statistics

4 Evaluation
  4.1 Segmenter
  4.2 Fourier invariance transforms
  4.3 Neural net classifier
  4.4 The complete system

5 Conclusions
  5.1 Comparison with aims
  5.2 Design improvements

A Header file for InplaceVector object
B Definition of ImageSegment structure
C Header file for GFX module
D Neural net feature vector evaluation function
E Fourier log polar invariance transform testing
F Fourier polar invariance transform testing

G Project proposal
  G.1 Introduction
  G.2 Detailed specification
  G.3 Metrics of success
  G.4 Project content
  G.5 Implementation environment
  G.6 Starting point
  G.7 Special resources required
  G.8 Timetable


Chapter 1

Introduction

Automatic Target Recognition (ATR) is a branch of the fields of Computer Vision and Pattern Recognition. In general an ATR system is one that reads in real world data from sensors, then locates and identifies any objects of interest contained in that data. Classic ATR applications range from the detection and identification of incoming aircraft, to searching spy satellite imagery for the military hardware of an adversary.

1.1 Specification

The intention of this project was to implement an ATR system that located cars in vertical [1] aerial photography [2]. So for example, if given an aerial photograph of a town, it would return the coordinates of all the cars that are visible in the image.

The aim was to produce a back end library that conceptually had a single function call. The format of this function call was to be along the lines of the specification in Figure 1.1.

Potential applications of such a system would include aiding in the investigation of traffic congestion, or if suitably adapted, tracking the movements of military vehicles.

1.2 Background

ATR is a field in which there is a large body of past and ongoing research. There therefore exists a wide range of building block techniques and system designs.

[1] In the aerial photography industry, vertical refers to photographs taken looking down from directly above the object.

[2] Aerial photography is taken here to mean data obtained from satellites, conventional aircraft, airships or balloons.

AnalyseImage(<image>) returns <array of object locations>

PARAMETERS:
    <image> : The image to be processed.

RETURNS:
    An array of (x,y) coordinates describing where all the
    cars detected are to be found in the image.

Figure 1.1: Specification of function call to be provided by back end library
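For illustration only, such a call might be declared in C along the following lines; the type and function names here are hypothetical, and the interface actually implemented is the CarScanOracle object described in Chapter 3.

    /* Illustrative sketch of the single-call interface described in Figure 1.1.
     * The type and function names are hypothetical, not the library's real API. */

    typedef struct {
        int x;
        int y;
    } ObjectLocation;

    typedef struct {
        int width;
        int height;
        unsigned char *rgb;        /* 24 bit RGB pixel data, row major */
    } Image;

    /* Analyses the image and returns an array of car locations. The number of
     * locations found is written to *num_found; the caller frees the array. */
    ObjectLocation *AnalyseImage(const Image *image, int *num_found);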

The problem, however, is that there are no general solutions, so a design that works well for a particular application will probably require fairly heavy modification, at least of its low level components, to enable it to solve even a quite closely related problem. Hence constructing an ATR system is a matter of selecting and hand tailoring a number of basic building blocks, then adding in some original ideas and techniques to solve the problems specific to the particular application.

1.3 Previous related work

It is highly likely that the military has systems along the lines of this project to track the movement of military vehicles. These are also undoubtedly classified. In the academic domain there is a lot of research available on the building blocks of such a vision system. However, while there is a reasonable probability that they exist, the author has yet to locate data on any complete systems that perform a function similar to the specified task.


Chapter 2

Preparation

This chapter details the research, thought processes and design that took place before implementation of the project commenced.

2.1 Requirements analysis

One of the first tasks was to generate a precise set of requirements for the system. These requirements were designed to be general enough that the library developed during the project could be used for any non real-time application that required the location of cars in top down aerial imagery. They were as follows:

• The system must be in the form of a black box library.

• When passed a colour top down aerial image, the system must return a vector of coordinates specifying the locations of cars within the image.

• The system must locate a high percentage of the cars that are visible in the image.

• The number of objects that are not cars, but are marked as being such, must be as low as possible.

• The system does not need to locate partially occluded cars [1].

• The system will be used for offline image analysis.

• The system must be able to process an image at least as fast as the same task could be performed by a human.

• The system must be executable on high end UNIX platforms so that optimum performance can be obtained.

[1] This is a problem well beyond the scope of this project.

As well as influencing decisions made throughout the design and implementation stages of the project, these requirements immediately led to a number of global decisions:

• The system had to execute fast on UNIX platforms, so it was clear that either C or C++ would need to be used for its implementation, as other languages such as Java simply do not provide the performance required for this type of application or are not available on all of the target platforms. Further, it was decided to use C for the implementation, as full C++ support on UNIX is patchy at best [2]. The use of C does not however preclude the system being designed on OO principles.

• To execute on high end UNIX platforms the system would therefore have to be POSIX [1] compliant.

2.2 Obtaining a dataset

A dataset of aerial photographs was obtained and used with permission from the University of Cambridge Committee for Aerial Photography. These were high resolution, vertical, colour images of parts of the town of Cambridge, examples of which are included [3] throughout this document.

[2] Indeed, even mainstream UNIX variants such as Linux only obtained good C++ support six months before this project commenced.

[3] See the copyright notice in the front matter.


2.3 Research

At the start of the project the author, while familiar with the language and operating system to be used, had very little knowledge of computer vision systems and techniques. A two month period between October and December 1998 was therefore spent researching techniques such as pattern recognition, segmentation, Fourier domains, neural networks and other classification methods. The results of this research will be referred to throughout the following sections.

2.4 The generic architecture of an ATR system

Early in the research period it became clear that there was a generic architecture common to most ATR systems of the type being developed. This architecture is illustrated in Figure 2.1. The function of each of its modules is as follows.

Preprocessor

This module cleans up the input image and converts it into formats suitable for processing by the downstream modules.

Detector/Segmenter

This module is a coarse grain first pass filter. Its function is to analyse the image passed in and then pass out only a small number of segments [4] of the image, each of which it believes might contain an object that is being searched for. The downstream modules can then look at each of the segments in turn and simply say, “Yes, this is an object that I’m looking for” or “No, this is not what I’m looking for”.

Figure 2.2 shows an example input image from this particular ATR problem. The white polygons overlaid on it are the boundaries of the segments that might be output by a typical segmenter designed for this application.

The operation of this module is intended to achieve two important goals:

[4] A segment is defined in this context as a contiguous area of an image.

Figure 2.2: Possible output from an ATR segmenter of a car location application

• By selecting the segments to be passed downstream using a computationally inexpensive set of heuristics, the module can dramatically reduce the search space that needs considering at a very low cost. Hence the computationally expensive downstream modules have to do as little work as possible. The heuristics used must of course err on the side of passing too much downstream and therefore never discard something that is actually being searched for.

• If the segmenter is sufficiently clever, then the segments which contain objects of interest can contain only the objects of interest and no background scenery at all. Notice that this is the case in Figure 2.2, where those objects that are cars are exactly wrapped by their segments. This means that the downstream modules don’t get misled during classification by background data surrounding objects of interest. Indeed the Fourier invariance transforms discussed later rely on the segment containing only the object of interest and no background information.

Invariance transforms

It is often the case that the objects being searched for in the image can occur in any orientation or scale. This causes problems for any program that is trying to decide whether the image segment it is looking at matches the database of examples that it is using to make a classification.


Figure 2.1: The generic architecture of an ATR system of the type being constructed (input image, then preprocessor, detector/segmenter, invariance transforms and classifier, producing a vector of coordinates)

Figure 2.3: Example of a rotated image

For example, Figure 2.3 contains images of the same car differing only by rotations of 90, 180 and 270 degrees. It is obvious to the human observer that they are the same car, but when the images are stored as arrays in a computer’s memory there will be almost no correlation between the contents of the arrays whatsoever.

It is therefore usual to perform a transformation of each image segment to a different physical representation in memory [5], such that in this representation the data stored is invariant under rotations, scalings and translations of the original image segment. Hence the program can compare the data to examples in its database without consideration of such variations in the image held in the segment.

Classifier

This module makes the decision about whether each of the segments it has been passed contains an object that is being searched for or not.

Data representation

It’s worth noting that the invariance transforms are not the only place that the physical representation of the data in memory might be transformed. For example the preprocessor might perform an edge detection operation on the data, or further down the pipeline a wavelet transform may be performed. The usual reason for such transforms is that the resultant representation provides some form of advantageous characteristics that make the processing being performed simpler. These characteristics will be discussed later.

[5] Examples of such representations are discussed later.

2.5 Instantiating the architecture

It was decided to adopt the architecture detailed in Section 2.4 for the system being designed, as it provided a good framework to work from. The research then turned to searching for possible methods of implementing the modules of this architecture. The options uncovered and the decisions made are now discussed.

2.5.1 Detector/Segmenter

It was clear from the literature on this topic that segmentation algorithms are in general domain specific. In other words a segmentation algorithm that works for one particular application is unlikely to be applicable to another. Therefore in general each application requires the development of its own segmentation algorithm tailored to its needs.

The description of the algorithm developed for this particular application is therefore left until Chapter 3, which deals with the system’s implementation. There were however some useful concepts/building blocks discovered during the research and these are detailed here.

Segment growing

This is a common method of generating a simple segment. Its operation is analogous to the flood fill operation of a graphics package. Initially a single pixel (the seed pixel) must be specified in the area in which a segment is to be formed. A segment is then created containing only this pixel. It is then grown outwards, as if a flood fill was occurring from the seed pixel, until the growing area hits some kind of boundary. What constitutes a boundary is application dependent; however, it might for example be that the intensity of the pixels reached is more than a certain number of intensity levels away from the seed pixel.

Segment aggregation

This is a general concept that refers to any technique that analyses a set of segments and merges any that together form a more useful segment than each of the component segments on their own. What kind of segment is deemed to be more useful is of course application dependent.

2.5.2 Invariance transforms

Research [2] suggested that two major methods were commonly applied in ATR applications to make the data representing a segment invariant under rotation, scaling and translation of the segment.

Fourier log polar invariance techniques

This technique, first implemented using purely optical systems [3], relies on the mathematical properties of the Fourier transform to obtain the invariance. Full details of the theory are given in the relevant texts [2, 3, 4, 5]. However the basic result is that an invariant representation of a segment can be obtained by the following process (which is also illustrated in Figure 2.4):

• Perform a 2D Fourier transform on the segment’s image data.

• Compute the power spectrum of the Fourier transform.

• Map the power spectrum of the Fourier transform onto a log polar basis.

• Treating the log polar coordinates as standard Cartesian coordinates, perform a further 2D Fourier transform on the data.

• Compute the power spectrum of this Fourier transform.

It must be noted that each segment that this set of transforms is performed on must contain only the object of interest and no background information. This is because background information would propagate through into the invariant representation, making the invariant representation vary depending on which background the object of interest is against. This obviously defeats the whole point of having an invariance transform.
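Purely as an illustration of the five steps above, a sketch in C using the FFTW 3 library might look as follows. The segment size, the log polar sampling resolution and the helper names are all assumptions made for this example; this is not the implementation described in Chapter 3.

    /* Sketch of the Fourier log polar invariance transform using FFTW 3.
     * Assumes a square N x N greyscale segment; sampling choices are illustrative.
     * Link with -lfftw3 -lm.                                                    */
    #include <math.h>
    #include <fftw3.h>

    #define N 64   /* assumed segment size */

    static const double PI = 3.14159265358979323846;

    /* Power spectrum of the 2D Fourier transform of a real N x N input. */
    static void power_spectrum(const double *in, double *spec)
    {
        fftw_complex *buf = fftw_malloc(sizeof(fftw_complex) * N * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N * N);
        fftw_plan p = fftw_plan_dft_2d(N, N, buf, out, FFTW_FORWARD, FFTW_ESTIMATE);

        for (int i = 0; i < N * N; i++) { buf[i][0] = in[i]; buf[i][1] = 0.0; }
        fftw_execute(p);
        for (int i = 0; i < N * N; i++)
            spec[i] = out[i][0] * out[i][0] + out[i][1] * out[i][1];

        fftw_destroy_plan(p);
        fftw_free(buf);
        fftw_free(out);
    }

    /* Resample a power spectrum onto a log polar grid (angle by log radius). */
    static void log_polar_map(const double *spec, double *lp)
    {
        double max_r = log((double)N / 2.0);
        for (int ti = 0; ti < N; ti++) {        /* angle axis      */
            for (int ri = 0; ri < N; ri++) {    /* log radius axis */
                double theta = 2.0 * PI * ti / N;
                double r = exp(max_r * ri / N);
                int u = (int)lround(r * cos(theta));
                int v = (int)lround(r * sin(theta));
                /* FFT output is periodic: negative frequencies wrap around. */
                int x = ((u % N) + N) % N;
                int y = ((v % N) + N) % N;
                lp[ti * N + ri] = spec[y * N + x];
            }
        }
    }

    /* Full pipeline: FFT, power spectrum, log polar map, FFT, power spectrum. */
    void invariance_transform(const double *segment, double *invariant)
    {
        double spec[N * N], lp[N * N];
        power_spectrum(segment, spec);
        log_polar_map(spec, lp);
        power_spectrum(lp, invariant);
    }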

High order neural networks (HONN)

This method [6] relies on a neural net being used as the classifier in the system. The basic principle is that extra layers of neurons are added to the input of the net. These layers conceptually allow the net to perform transformations on the data such that it has an invariant representation before it attempts to classify it in the final layers of the net.

This method can also be used in combination with selected stages from the Fourier log polar technique to produce a hybrid system [7].

The decision

Some of the research that has been performed on these systems [6, 7] suggests that using a HONN provides an approximately 10% better classification accuracy than Fourier methods followed by a classical three layer neural net. It is also suggested that they are more resilient to noise in the input image. However, it was decided that Fourier invariance techniques would be used to obtain invariance.

This was because HONNs are relatively complex structures which need to be carefully designed and trained or they simply will not solve the given problem. Debugging a non-functional net is then problematic due to the difficulties involved in understanding what is going on inside the net. Therefore, as the author had no knowledge of neural nets at the start of the project, an engineering decision was made that a HONN was something that could get completely out of hand and not achieve any successful results by the end of the project.

Keeping the invariance transforms separate from the classifier would also aid implementation, in that each of the modules could be tested and validated independently before being connected together. This would not be possible with a HONN, which would have to be tested as a whole.


Figure 2.4: Graphical illustration of the Fourier log polar invariance technique (the input image in x-y coordinates is Fourier transformed, its power spectrum is mapped onto log R and theta axes, and a further Fourier transform yields the invariant data).


2.5.3 Classification methods

Two major classification methods were deemed to be applicable to this problem. These were nearest neighbour techniques and neural nets used solely as a classifier (as opposed to the HONNs discussed earlier that achieved both invariance and classification). However, before discussing these methods a few preliminary concepts need introduction.

Feature vectors

A feature vector is a list of parameters, representing an object, that is fed into the classifier for classification. So for example if the problem being tackled was the classification of rectangles into those that are squares and those that are simply rectangles, then the obvious feature vector to feed into the classifier would be one containing the width and height of the rectangle to be classified.

The process of converting the data held about the object to be classified into a feature vector is termed feature extraction. So if the rectangles in the example above were input as images, then feature extraction would involve the measurement of the width and height of the rectangles.

However, in real world problems such as the one being tackled by this project, it is often the case that all the data that we have about the object is simply placed into the feature vector. This is because with complex real world data it is often difficult to pass the classifier all the data it needs to perform classification without giving it all of the data. Hence, due to the sheer complexity of the data to be output by the Fourier invariance transforms [6], it was deemed necessary to simply take this output as the input feature vector.

Training sets

The classifier in the rectangles example simply needed the knowledge that a square has a width that equals its height to make a classification. However, in a real world problem it is common for a classifier to be fed a large set of examples of the things it is trying to classify, each tagged with their respective class [7]. It can then use these in some way to make classifications of unseen objects it gets presented with. This data set is known as the training set.

[6] See the example in Figure 2.4.

[7] A class is simply the group of things to which an object belongs. So in the rectangles example there would be two classes: rectangles and squares.

Nearest neighbour techniques

In this style of classifier, a feature vector is taken to directly represent a location in n dimensional space, where n is the number of parameters in each feature vector. The idea is that if all of the feature vectors from the training set are plotted into the space, then it becomes possible to classify a new object by plotting it into the space and working out which of the training set points it is closest to in terms of Euclidean distance. The class of the new object is then taken to be the same as the class of this nearest training set point. A variation of this is to take a majority vote on the class using the classes of the k nearest training set points, where k is an arbitrary number.
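As a concrete illustration, a minimal 1-nearest-neighbour classifier over feature vectors could be sketched in C as follows; the structure and function names are illustrative only, not part of the system that was built.

    /* Minimal 1-nearest-neighbour classifier sketch.
     * Each training example is a feature vector plus a class label. */
    #include <float.h>

    typedef struct {
        const double *features;   /* n dimensional feature vector  */
        int class_label;          /* e.g. 1 = car, 0 = not a car   */
    } Example;

    /* Squared Euclidean distance between two n dimensional vectors. */
    static double squared_distance(const double *a, const double *b, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /* Returns the class of the training example closest to the unseen vector. */
    int classify_nearest_neighbour(const Example *training_set, int set_size,
                                   const double *unseen, int n)
    {
        double best = DBL_MAX;
        int best_class = -1;
        for (int i = 0; i < set_size; i++) {
            double d = squared_distance(training_set[i].features, unseen, n);
            if (d < best) {
                best = d;
                best_class = training_set[i].class_label;
            }
        }
        return best_class;
    }

A k nearest variant would instead keep the k smallest distances found and take a majority vote over their class labels.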

Neural networks

This style of classifier attempts to simulate the pattern recognition abilities of the brain by using artificial representations of biological neurons. A comprehensive coverage of this topic is impossible in the space available here, so I refer the reader to any good book [8, 9] on the topic for the full details.

However, in brief, a network of artificial neurons is constructed. These are then trained by repeatedly presenting them with all of the feature vectors from the training set. The error that they make in classifying each feature vector is used to alter weighting functions on the synapses [8] between each of the neurons and the activation threshold of the neurons themselves, using an algorithm such as back propagation [8, chapters 4, 5 and 6].

Unfortunately, if the net is trained for too long then it learns every noise perturbation in the training set. This noise generally makes the objects in the training set slightly atypical of the objects to be classified by the net. This means that knowing about all these noise perturbations makes the net worse at classifying unseen objects than a net that hasn’t been trained for so long and hence has a slightly more general knowledge of the training set.

[8] A synapse is a connection between two neurons.


To prevent this over generalisation, a second, unseen, annotated training set known as the validation set is employed. This is not used to train the net. Instead, at the end of each training pass through the training set (called an epoch) the total error in classifying the validation set is computed. If this falls after every epoch then the net is obviously getting better at classifying unseen data. However, if after a period of decline the error starts to rise, then it is likely that the net has started to over generalise and hence is becoming worse at classifying unseen data. At this point training is stopped, as the net has reached its maximum performance.

A trained net can then be presented with previously unseen objects which, provided the net has actually managed to get a feel for the underlying problem [9], it should be able to classify correctly.
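The following toy C sketch illustrates the early stopping idea on a single sigmoid neuron trained by gradient descent; the data, learning rate and network here are invented purely for the example and bear no relation to the project's net.

    /* Toy early stopping demonstration: train one sigmoid neuron by gradient
     * descent and halt when the error on a separate validation set rises.    */
    #include <stdio.h>
    #include <math.h>

    #define N_FEATURES 2
    #define N_TRAIN 4
    #define N_VALID 4

    static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

    /* Mean squared error of the neuron over a labelled set. */
    static double set_error(double x[][N_FEATURES], const double *t, int n,
                            const double *w, double bias)
    {
        double err = 0.0;
        for (int i = 0; i < n; i++) {
            double a = bias;
            for (int j = 0; j < N_FEATURES; j++) a += w[j] * x[i][j];
            double d = sigmoid(a) - t[i];
            err += d * d;
        }
        return err / n;
    }

    int main(void)
    {
        /* Tiny synthetic training and validation sets (hypothetical data). */
        double train_x[N_TRAIN][N_FEATURES] = {{0,0},{0,1},{1,0},{1,1}};
        double train_t[N_TRAIN] = {0, 1, 1, 1};
        double valid_x[N_VALID][N_FEATURES] = {{0.1,0},{0,0.9},{0.9,0.1},{1,0.8}};
        double valid_t[N_VALID] = {0, 1, 1, 1};

        double w[N_FEATURES] = {0.1, -0.1}, bias = 0.0, rate = 0.5;
        double best_valid = 1e9;

        for (int epoch = 0; epoch < 1000; epoch++) {
            /* One pass of gradient descent over the training set. */
            for (int i = 0; i < N_TRAIN; i++) {
                double a = bias;
                for (int j = 0; j < N_FEATURES; j++) a += w[j] * train_x[i][j];
                double y = sigmoid(a);
                double delta = (y - train_t[i]) * y * (1.0 - y);
                for (int j = 0; j < N_FEATURES; j++)
                    w[j] -= rate * delta * train_x[i][j];
                bias -= rate * delta;
            }

            /* Early stopping: halt when the validation error starts to rise. */
            double v = set_error(valid_x, valid_t, N_VALID, w, bias);
            if (v > best_valid) {
                printf("stopping at epoch %d, validation error %.4f\n", epoch, v);
                break;
            }
            best_valid = v;
        }
        return 0;
    }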

The decision

It was decided that a neural net would be used to perform classification. This was because neural nets intrinsically exhibit two properties necessary for this application that are more difficult to obtain with nearest neighbour techniques.

The first is that during training, a neural net generates a weight for each of the parameters that make up a feature vector. Hence some parameters have more bearing on the classification result than others. Nearest neighbour techniques are not able to do this by default, as all of the parameters of a feature vector have a direct and equally important bearing on the Euclidean distance between the points plotted in feature space. It is possible to perform an analysis of the training set to determine which of the parameters is of more importance and then generate weights for each of them. However, neural nets give you this by default.

This weighting ability is essential if classification is to be performed directly on the raw data rather than on carefully extracted features of the data [10]. To understand why, consider the fact that the feature vectors contain the power spectrums of Fourier transforms. It is highly likely that a particular parameter of these feature vectors bears no relation to whether a feature vector is a car or not. Therefore giving this parameter the same importance as the other parameters will simply confuse the classification.

[9] This is termed having generalised.

[10] Such as the width and height in the rectangles example.

Secondly, during training a neural net classifier is able to bind together several input parameters in its first layer of neurons and set up a simple relation function on them, the results of which it can then use to make the classification decision in its final layer of neurons. This is important because a particular feature that is important for classification may move around between several different feature vector parameters for different objects of the same class. This is rather more difficult to obtain using nearest neighbour techniques, and hence is the main reason for selecting a neural net.

Which training set

Thought was then given to what exactly should be in the training set used to train the net, and how this training set was to be obtained. As the job of the net was to be to classify the output from the segmenter, it seemed logical that the training set should be the segments output from the segmenter when run on example images. Each of the segments would then be tagged by a human user with its class, to generate an “annotated” training set. The hope was that the net would then be able to learn the difference between a car and the other objects that the segmenter output [11].

This therefore generated the requirement that it must be possible to execute the pipeline in two modes other than the linear mode that has been discussed so far. Firstly, it must be possible to cut the pipeline just before the classifier and write out all of the segments generated to disc as a training set. It must then be possible for the user to annotate the segments with their classes and use this annotated training set to train the neural net. The data flow diagrams illustrating this are included in Figure 2.5.

2.5.4 Discontinuities equal information

One of the points that came out of the literature [5] was that classification should not be performed by looking at the pixel intensities of the segment to be classified.

[11] Of course if the other objects were not sufficiently different from the cars then the classifier would never be able to tell them apart.


Figure 2.5: Execution configurations of the main pipeline. Mode A (training set generation): input image through preprocessor, detector/segmenter and invariance transforms to a training or validation set. Mode B (neural net training): annotated training and validation sets into the neural net, producing a knowledge set. Mode C (image analysis): input image through preprocessor, detector/segmenter, invariance transforms and the neural net (with its knowledge set) to a vector of coordinates.


Figure 2.6: Illustration of edge detection. (a) An image of a white car and a black car on a grey background, with scanline AB marked. (b) The pixel intensities along scanline AB. (c) The magnitude of the edges in (a). (d) The same scanline taken across the edge data.

Instead it should be performed on the intensity differences between neighbouring pixels. These differences are more commonly known in computer science as the edge data, or mathematically as the two dimensional first derivative of the image.

This is because “Discontinuities equal Information” [5]. What this means is that it’s not really the intensity of each pixel in an image that allows you to identify a car as a car, but the differences in intensity between neighbouring pixels, i.e. it is the edges that are important.

To demonstrate this point take a look at Figure 2.6.a, which contains an image of a white and a black car on a grey background. Figure 2.6.b shows the intensity of the pixels along the scanline AB. It can be seen that the white car produces a large peak in the intensity and the black car a large depression. So while they are both cars, they have completely different signatures in the intensity domain.

Figure 2.6.c is the magnitude of the edges in Figure 2.6.a. Figure 2.6.d is a plot of the same scanline, but this time taken across the edge data. Now, accepting the fact that the data is noisy because this is a real image, it can be seen that both of the cars are represented by a pair of spikes. So by moving into the edge domain the signatures of cars with different intensity signatures have been given the same form.

To obtain this edge data it was therefore logical to place an edge detection module into the preprocessor. Hence segments output by the segmenter could contain edge information rather than pixel intensity information.

2.5.5 Noise filtering

The final major point that came out of the research was that the preprocessor should contain a filter to remove the noise that is present in the majority of images. Failure to do this could result in the segmenter or classifier being overwhelmed or confused by a spike on a single pixel.

2.5.6 Putting it all together

Taking all these design decisions and putting them together resulted in the core pipeline structure shown in Figure 2.7.

2.6 Testing

One advantageous side effect of the linear pipeline design is that it makes testing very easy. This is a result of it consisting of a number of totally independent modules, each of which can be independently exercised with test data and checked to make sure they are producing the correct results.


Figure 2.7: The core structure of the system (the input image passes through the preprocessor's noise filter and edge detector, then the segmenter, the Fourier invariance engine and the neural net classifier with its knowledge set, producing a vector of segments).


Chapter 3

Implementation

This chapter details the system that was constructed to implement the top level design detailed in Chapter 2. It contains both a detailed description of the system and some of the reasons it was constructed in this way.

3.1 Low level objects

Figure 3.1 contains a diagram of the low level utility modules and data structure objects that are part of the system implemented. Each of them is annotated with the methods it supports. The following sections contain a discussion of the more interesting ones.

InplaceVector & IndirectVector

One of the major data structures used in the system is that of the simple direct access list, or array. These are used to hold lists of things like pixels, segments and coordinates. An array is easy to create in C by making a simple call to malloc. The problem is that if you run out of space in the array then you start having to call realloc and handle error conditions. This code is then going to be duplicated everywhere that the list is written to, resulting in an error prone mess.

InplaceVector and IndirectVector hence provide encapsulated vector structures that simply resize themselves when necessary [1]. On construction they are passed a default size [2] and an inc step. Then, when elements are added or removed using their get, set, append or cut methods, they ensure that their size is always the default size plus the minimum number of inc steps required to contain the elements they are holding. Note that this means that the vectors also contract, saving memory, as elements are removed. The reason for using size increments of inc step is that calling realloc every time you need to add a single element would severely damage performance and fragment memory.

[1] The term vector is taken here to mean an array that dynamically resizes itself.

[2] A size of 1 would mean enough space to store a single element, rather than 1 byte of memory.

The difference between the two vectors is that the InplaceVector stores elements one after another in a contiguous block of memory, whereas the IndirectVector only stores pointers to the elements that are placed into it. This difference is illustrated in Figure 3.2. Both of the vectors also have some interesting extra features. For example the InplaceVector has push and pop methods that allow it to be used as a stack, while the IndirectVector is selectively able to call the destructors of the objects it has pointers to when it itself is destroyed. This greatly reduces the code complexity of data structure deallocation.

The declaration of InplaceVector is included in Appendix A.
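Purely to illustrate the resizing scheme described above, a heavily cut-down sketch of such a structure might look like the following; the field and function names are simplified and are not the interface given in Appendix A.

    /* Cut-down sketch of a self-resizing, in-place vector. Element storage
     * grows and shrinks in steps of inc_step above default_size. Storage is
     * released by calling free(v->data) when the vector is no longer needed. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *data;          /* contiguous element storage            */
        size_t element_size;  /* size of one element in bytes          */
        size_t count;         /* number of elements currently held     */
        size_t capacity;      /* elements that fit without a realloc   */
        size_t default_size;  /* minimum capacity, in elements (>= 1)  */
        size_t inc_step;      /* capacity is adjusted in these steps   */
    } Vector;

    void vector_init(Vector *v, size_t element_size,
                     size_t default_size, size_t inc_step)
    {
        v->data = NULL;
        v->element_size = element_size;
        v->count = 0;
        v->capacity = 0;
        v->default_size = default_size;
        v->inc_step = inc_step;
    }

    /* Resize storage to default_size plus the minimum number of inc_steps
     * needed to hold count elements. Returns 0 on success, -1 on failure. */
    static int vector_fit(Vector *v, size_t count)
    {
        size_t needed = v->default_size;
        while (needed < count)
            needed += v->inc_step;
        if (needed != v->capacity) {
            char *p = realloc(v->data, needed * v->element_size);
            if (p == NULL)
                return -1;
            v->data = p;
            v->capacity = needed;
        }
        return 0;
    }

    /* Append one element, growing the storage if necessary. */
    int vector_append(Vector *v, const void *element)
    {
        if (vector_fit(v, v->count + 1) != 0)
            return -1;
        memcpy(v->data + v->count * v->element_size, element, v->element_size);
        v->count++;
        return 0;
    }

    /* Remove the last element (caller must ensure the vector is non-empty);
     * the vector also contracts, saving memory, as elements are removed.    */
    void vector_pop(Vector *v, void *element_out)
    {
        v->count--;
        memcpy(element_out, v->data + v->count * v->element_size, v->element_size);
        vector_fit(v, v->count);
    }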

BitMatrix, IntegerMatrix & DoubleMatrix

These structures are simply handy wrappers of two dimensional arrays of bits, ints and doubles respectively. They perform both memory handling and range checking on arguments. The BitMatrix is interesting in that it only uses one bit to store each element, rather than the entire machine word per element found in most naive C bit array implementations. The DoubleMatrix is able to store and load its contents to and from a file, hence facilitating the storing and retrieval of the feature vectors that make up the training sets.
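As an illustration of the one-bit-per-element storage mentioned above, a minimal BitMatrix-style structure could be sketched as follows; this is a simplified sketch, not the project's actual declaration.

    /* Minimal sketch of a matrix of bits packed one bit per element. */
    #include <stdlib.h>

    typedef struct {
        unsigned char *bits;   /* packed bit storage */
        int width;             /* number of columns  */
        int height;            /* number of rows     */
    } BitMatrixSketch;

    BitMatrixSketch *bitmatrix_create(int width, int height)
    {
        BitMatrixSketch *m = malloc(sizeof(*m));
        if (m == NULL)
            return NULL;
        m->width = width;
        m->height = height;
        /* One byte holds eight elements; round up to cover every bit. */
        m->bits = calloc((size_t)((width * height + 7) / 8), 1);
        if (m->bits == NULL) {
            free(m);
            return NULL;
        }
        return m;
    }

    void bitmatrix_set(BitMatrixSketch *m, int x, int y, int value)
    {
        size_t index = (size_t)y * m->width + x;
        if (value)
            m->bits[index / 8] |= (unsigned char)(1u << (index % 8));
        else
            m->bits[index / 8] &= (unsigned char)~(1u << (index % 8));
    }

    int bitmatrix_get(const BitMatrixSketch *m, int x, int y)
    {
        size_t index = (size_t)y * m->width + x;
        return (m->bits[index / 8] >> (index % 8)) & 1;
    }

    void bitmatrix_destroy(BitMatrixSketch *m)
    {
        free(m->bits);
        free(m);
    }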


Figure 3.1: Low level utility modules and data structure objects (InplaceVector, IndirectVector, BitMatrix, IntegerMatrix, DoubleMatrix, ImageSegment, RawPixmap, String, IntGeometry, IntMath, Draw, GFX and IO), each annotated in the original figure with the methods it supports (Create, Destroy, element accessors, and module-specific operations such as Push/Pop, SetBit/GetBit, CreateFromPPMFile and Convolve/ParallelConvolve).


Figure 3.2: A: The memory map of an InplaceVector that contains five XY coordinates (the coordinate values stored contiguously, followed by spare capacity). B: The memory map of an IndirectVector that contains pointers to four ImageSegment structures (the pointers followed by spare capacity).


RawPixmap

This is a wrapper for 24 bit RGB image data. Individual pixels can be read and written, and the object is capable of reading from and writing to PPM format [10] image files.
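A minimal sketch of reading a binary (P6) PPM file, assuming a well-formed header with a maximum channel value of 255 and no comment lines, is shown below; the real RawPixmap object is more thorough.

    /* Minimal sketch of loading a binary (P6) PPM file into a 24 bit RGB buffer.
     * Header comments and partial reads are not handled.                      */
    #include <stdio.h>
    #include <stdlib.h>

    unsigned char *load_ppm(const char *filename, int *width, int *height)
    {
        FILE *f = fopen(filename, "rb");
        if (f == NULL)
            return NULL;

        int maxval;
        /* Header: "P6" magic, then width, height and maximum channel value. */
        if (fscanf(f, "P6 %d %d %d", width, height, &maxval) != 3 || maxval != 255) {
            fclose(f);
            return NULL;
        }
        fgetc(f);   /* consume the single whitespace byte after the header */

        size_t bytes = (size_t)(*width) * (size_t)(*height) * 3;
        unsigned char *rgb = malloc(bytes);
        if (rgb == NULL || fread(rgb, 1, bytes, f) != bytes) {
            free(rgb);
            fclose(f);
            return NULL;
        }
        fclose(f);
        return rgb;
    }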

ImageSegment

This is a simple C struct that is used to store data about a segment that is defined on an image. Its definition is given in Appendix B. It is designed to store three important sets of data about a segment. These are:

• A list of the XY coordinates of the pixels of the image covered by the segment.

• Statistics about the segment. These include its width, height, mid point, average colour and idensity [3]. These statistics are included in the segment definition so that procedures that process the segment after it has been created don't have to keep recomputing them from the list of pixels covered.

• Status flags. These are used when the segment is being processed to indicate whether the segment has already been discarded or is part of an object that has been classified (and hence can't be part of another object).

[3] This is the inverse density, which is defined as (width * height)/area, where area is the number of pixels in the segment. Its purpose is to distinguish compact, dense segments, which have an idensity close to 1, from segments which are fibrous or consist of a ring of pixels and hence have an idensity much greater than 1.

GFX

In the discrete two dimensional case applicable to image processing, the mathematical operator of convolution is defined as:

out(x, y) = Σ_m Σ_n kern(m, n) · in(x − m, y − n)

where in is the input image, out is the output image and kern is a small convolution kernel to be applied. By the application of appropriate kernels it is possible to perform processing operations such as edge detection and noise filtering. For example Figure 3.3 contains Sobel kernels that perform detection of horizontal and vertical edges and a Gaussian filter that performs a blurring operation.
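A direct C implementation of this definition, ignoring the in-place operation, sign planes and multi-kernel features of the real GFX module and simply leaving border pixels at zero, might look like this:

    /* Straightforward 2D convolution sketch: one square kernel, separate output
     * buffer, border pixels where the kernel would fall off the image left at
     * zero. Illustrates the definition only; the real GFX module works in place. */
    void convolve_sketch(const double *in, double *out, int width, int height,
                         const double *kern, int ksize /* odd, e.g. 3 */)
    {
        int half = ksize / 2;

        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                out[y * width + x] = 0.0;

        for (int y = half; y < height - half; y++) {
            for (int x = half; x < width - half; x++) {
                double sum = 0.0;
                /* out(x, y) = sum over m, n of kern(m, n) * in(x - m, y - n) */
                for (int n = -half; n <= half; n++)
                    for (int m = -half; m <= half; m++)
                        sum += kern[(n + half) * ksize + (m + half)]
                               * in[(y - n) * width + (x - m)];
                out[y * width + x] = sum;
            }
        }
    }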

The GFX module contains a very powerful convolution function called ParallelConvolve. The declaration of this function is included in Appendix C. The function has the following features


Figure 3.3: A selection of common convolution kernels.

    Sobel vertical:      Sobel horizontal:     Gaussian blur:
      -1  0  1              1  2  1              1  2  1
      -2  0  2              0  0  0              2  4  2
      -1  0  1             -1 -2 -1              1  2  1

which are not common to standard implementations of the convolution operation:

• It is capable of performing the convolution in place, i.e. the output image can be the same block of memory as the input image, and a minimal quantity of workspace is used in the computation.

• As well as reading in and outputting an image, it can read in and output a sign plane which contains the sign of each of the pixels in the image. This is useful for performing an edge detect from which you need the direction of the changes in intensity as well as the magnitudes. Indeed it is essential to have this information in order to perform a Laplacian edge detection operation.

• It is capable of handling kernels of any shape and size; however, it is inefficient for kernels above approximately 5 by 5 [5].

• It is capable of applying more than one kernel at a time and combining the results in a variety of ways. This is useful for performing operations such as a Sobel edge detection that finds both horizontal and vertical edges. Normally the horizontal kernel from Figure 3.3 would be applied to the image and the result stored in a block of temporary workspace. The vertical kernel would then be applied and the results of the two applications then merged. However, with the ParallelConvolve function it is possible to simply pass in both the kernels and all of these operations are performed simultaneously, in place and with sign planes if necessary.

Figure 3.4: The CarScanOracle object, annotated with its methods: Create, Destroy, GenerateTrainingSet, Learn, AnalyseImage, LoadKnowledge and StoreKnowledge.

3.2 Top level structure

Exported interface

In accordance with the original specification, the entire processing pipeline was wrapped in a black box library. This library exports the single object illustrated in Figure 3.4. The function of each of its methods is as follows:

• Create: Creates an untrained CarScanOracle object.

• Destroy: Destroys a CarScanOracle object.

• GenerateTrainingSet: Accepts an example image, the name of a directory to write an unannotated training set into and a collection of user settings from the caller. It passes the image through the pipeline configured in mode A, the training set generation mode from Figure 2.5, hence writing out the training set produced from the image to the specified directory.

• Learn: Accepts the name of a directory containing an annotated training set and the name of a directory containing an annotated validation set from the caller. It passes these into the pipeline configured in mode B, the neural net training mode from Figure 2.5, hence training the neural net.

• AnalyseImage: Accepts an image to be analysed and a collection of user settings from the caller. It passes the image through the pipeline configured in mode C, the image analysis mode from Figure 2.5. Hence the image is analysed and a vector of segments that cover all of the cars in the image is returned to the user. It is also worth mentioning that this actually returns more useful information than was originally specified, as it returns the list of pixels that each car occupies, wrapped in an ImageSegment, rather than just the XY coordinates of each car.

• LoadKnowledge & StoreKnowledge: Loads and stores the knowledge held in the CarScanOracle's neural net.

Data flow

The flow of data inside this object in each of the three processing modes is illustrated in Figure 3.5. The key point from this diagram is that although conceptually the main pipeline is linear, it is not implemented in a purely linear fashion. This is for performance reasons: if the segmenter were to complete its processing and then pass on a large list of segments to the downstream modules, it would require a huge quantity of memory to store this list. Therefore, instead, every time the segmenter generates a segment it is immediately passed to the downstream modules to either be classified or written out as part of a training set. If the segment is classified and is found to contain a car then the segmenter places it into the output vector of segments that contain cars that is to be returned to the caller, otherwise it discards it immediately.

3.3 Preprocessor

Noise filter

The noise filter is implemented simply by applying the Gaussian blur filter from Figure 3.3 using the Convolve function from the GFX module. This has the effect of rounding off any noise spikes in the input image but not completely removing them.

Edge detector

This sub module is implemented by simultaneously applying the two Sobel kernels shown in Figure 3.3 to the input image, discarding the signs and taking the maximum of the two results. This is achieved using a single call to the ParallelConvolve function in the GFX module. The output image therefore contains the magnitudes of the edges at each location, irrespective of the direction of the edge.
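Conceptually the operation is equivalent to the following sketch, which applies the two Sobel kernels separately and keeps the larger absolute response at each pixel; the convolve_sketch function is the illustrative one from Section 3.1, and the real module does all of this in a single ParallelConvolve call.

    /* Sketch of the edge detector: apply both Sobel kernels to a greyscale image
     * and keep, at each pixel, the larger of the two absolute responses.
     * Buffers are width * height doubles; error checking omitted for brevity. */
    #include <math.h>
    #include <stdlib.h>

    /* Illustrative convolution sketch from Section 3.1. */
    void convolve_sketch(const double *in, double *out, int width, int height,
                         const double *kern, int ksize);

    void sobel_edge_magnitude(const double *in, double *edges, int width, int height)
    {
        static const double sobel_h[9] = {  1,  2,  1,
                                            0,  0,  0,
                                           -1, -2, -1 };
        static const double sobel_v[9] = { -1,  0,  1,
                                           -2,  0,  2,
                                           -1,  0,  1 };

        double *h = malloc(sizeof(double) * width * height);
        double *v = malloc(sizeof(double) * width * height);

        convolve_sketch(in, h, width, height, sobel_h, 3);
        convolve_sketch(in, v, width, height, sobel_v, 3);

        /* Discard the sign and take the maximum of the two responses, so the
         * output holds edge magnitude irrespective of the edge direction.    */
        for (int i = 0; i < width * height; i++)
            edges[i] = fmax(fabs(h[i]), fabs(v[i]));

        free(h);
        free(v);
    }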

3.4 Segmenter

Inputs and outputs

As illustrated in Figure 3.5, the segmenter accepts as input the blurred image and the edge image as RawPixmaps. If executing in image analysis mode, it outputs a vector containing ImageSegment definitions of all the segments that were deemed by the classifier to contain cars. However, if executing in training set generation mode it has no output, as the segments generated are written out to disc after the Fourier invariance transforms. Each segment that is generated is passed to the Fourier invariance transforms wrapped in an ImageSegment definition.

Method overview

A number of experiments led to the conclusion that it is not possible to create a useful segmentation system that directly produces segments that might contain cars. This is because cars are so diverse that any heuristic that finds all cars also appears to find everything else in the image, which defeats the point of the segmenter. However, further experimentation led to the development of a two stage process that is capable of locating all the cars in the image without outputting everything. Stage one is to locate objects that might be components of cars, such as the boot, roof and bonnet. Stage two is an aggregation phase where a scan is performed for clusters of these candidate components that are stereotypical of the presence of a car.


Figure 3.5: Data flow within the CarScanOracle object for the three processing modes (A: training set generation, B: neural net training, C: image analysis). The preprocessor's noise filter and edge detector turn the raw image RawPixmap into blurred and edge image RawPixmaps for the segmenter; each ImageSegment produced is passed to the Fourier invariance transforms, whose DoubleMatrix output (the invariant representation of the segment) is written to a disc file as a training set in mode A, used together with the annotated training and validation sets in mode B, or classified by the neural net in mode C, which returns a Boolean indicating whether the segment contained a car, allowing an IndirectVector of car-containing segments to be built.


Figure 3.6: Some different representations of an image containing a car: the raw image, the edge image and a 3D plot of the edge image.

Component location theory

After looking at car components in a number of different representations, such as those shown in Figure 3.6, it became clear that they generally have two things in common:

• Most components are areas of relatively continuous colour surrounded by areas of different colours.

• The centre of a component is usually a local minimum in the edge domain.

This led immediately to a method for finding car components using segment growth [4]: if segment growth is started from all local minima and stopped when the pixels reached have a colour that is deemed to be significantly different from the seed pixel, then a collection of segments that represent car components is produced. In this document these segments will be termed “micro segments”.

Of course, the definition of a car component that is being used is so general that other objects such as trees, bits of road and houses will also be output as micro segments. Hence the need for the aggregation phase to work out which of the micro segment groups actually represent cars.

[4] See Section 2.5.1.

Aggregation theory

It was determined that cars have two different micro segment signatures. They are either a single micro segment that is approximately rectangular and of the correct length and width to be a car [5], or they are a cluster of micro segments that together cover an area that is again approximately rectangular and of the correct length and width to be a car. Both these signature types are illustrated in Figure 3.7. The aggregation system therefore simply needs to look for these signatures to find things that are possibly cars.

Note that for all the cars in Figure 3.7 that have signatures that are clusters of micro segments, regardless of how many segments there are in the cluster, there is always a segment that spans across the whole back of the car and another that spans across the whole front of the car. These lateral spanning segments are a feature common to almost 100% of cars that are made up of a cluster of micro segments, and are generally the boot and the bonnet of the car. As will be seen later, this fact is useful when trying to perform aggregation.

Internal pipeline

The implementation of the segmenter hence has three conceptual stages to its internal pipeline. Firstly it generates the micro segments. It then performs aggregation. Finally, it converts any aggregates found into individual segments that represent possible cars, to be passed downstream for classification. The data flow and structure of the pipeline that achieves this is illustrated in Figure 3.8. The next few sections describe the function of each of its modules.

Micro segmenter

This module is responsible for generating the micro segments. It outputs two data structures, the micro segment list and the micro segment map.

The micro segment list is an IndirectVector containing a list of ImageSegments, each of which defines a micro segment.

[5] The size of a car in the image can trivially be computed from the altitude at which the image was obtained and the lens size of the camera. This system is not concerned with this computation and so reads in the approximate size of a car from the caller. Hence the size can either be computed or entered directly by the user.


Figure 3.7: An illustration of the different micro segment signatures of cars: a single micro segment, or a cluster of micro segments.

However, while the micro segment list holds definitions of all the segments generated by the micro segmenter, it is not efficient to use it to work out which micro segment covers a particular pixel of the input image [6]. To do this would require a linear search through all the pixel lists of all the ImageSegments held in the micro segment list.

Therefore the location of a micro segment in the micro segment list is defined as the ID of the micro segment, and a pixel to segment lookup table is then generated. This lookup table, the micro segment map, is an IntegerMatrix of the same dimensions as the input image. Each element contains the ID of the micro segment that covers the corresponding pixel, or -1 if there is no segment covering the pixel.
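A sketch of how the two structures work together, with illustrative names standing in for the IntegerMatrix, is given below: writes happen once per grown segment, after which the covering segment for any pixel can be found in constant time rather than by a linear search.

    /* Sketch of the micro segment map: one integer per pixel holding the ID of
     * the covering micro segment, or -1 where no segment covers the pixel.   */

    #define NO_SEGMENT (-1)

    /* Record that every pixel in the list belongs to segment segment_id. */
    void map_add_segment(int *segment_map, int map_width,
                         const int (*pixels)[2], int pixel_count, int segment_id)
    {
        for (int i = 0; i < pixel_count; i++) {
            int x = pixels[i][0];
            int y = pixels[i][1];
            segment_map[y * map_width + x] = segment_id;
        }
    }

    /* Constant time lookup of the segment covering pixel (x, y). */
    int map_segment_at(const int *segment_map, int map_width, int x, int y)
    {
        return segment_map[y * map_width + x];
    }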

The micro segmenter implements the component location technique discussed earlier. The algorithm designed to estimate the local minima is given in Figure 3.9 and the algorithm to grow the micro segments from each local minimum is given in Figure 3.10. Each segment growth that occurs generates an ImageSegment with all its statistics appropriately filled in. This ImageSegment is then added to the two output data structures.

The reason for the algorithm in Figure 3.9 only estimating the location of local minima is that finding local minima is a time consuming process. Therefore it simply starts segment growth from all

⁶Micro segments are defined as being mutually exclusive of one another, hence there exists a mapping from each input image pixel to zero or one micro segments.

It then starts segment growth from all points with a value of one that have no segment covering them, and so on. This doesn't exactly achieve the effect of growing segments only from the local minima. For example it tends to produce ring segments around the rims of the volcano-like structures in Figure 3.6. However this doesn't matter, as the results are close enough, faster to obtain, and the aggregation routines that follow can actually make use of these extra segments.

A number of other local minima estimation techniques were also experimented with, but this one tended to give the best set of micro segments to work with during aggregation.

An example of the typical output generated is given in Figure 3.11. The different tones of shading represent different micro segments. It should be visible from this that the majority of the components of the cars have been detected.

Segment culler

It can be seen from Figure 3.11 that the micro segmenter outputs segments that are obviously not components of cars. For example in Figure 3.11 the road that the cars are sitting on has been output as a segment. The purpose of the culler is therefore to discard any of the micro segments that are obviously not relevant.

It achieves this by scanning through the micro segment list and setting the discarded flag on any of the segments that have a width or height greater than 1.5 times the approximate length of a car.


Figure 3.8: The internal pipeline of the segmenter. (The blurred image and edge image, held as RawPixmaps, feed the micro segmenter, which produces the micro segment list (an IndirectVector) and the micro segment map (an IntegerMatrix). These pass through the segment culler to the correct area detector and cluster detector, each of which outputs aggregates (InplaceVectors of micro segment IDs to be converted into a segment). The segment generator converts an aggregate into an ImageSegment, which is then classified or written to a file depending on the processing mode.)

For gradient = 0 to gradient_threshold
do
  For every pixel in input image
  do
    If value stored for pixel in the edge image equals gradient
       and pixel is not already part of a micro segment
    then
      - Commence a region growth from pixel
    endif
  done
done

Figure 3.9: Simple local minima estimation algorithm.

- Generate a new micro segment
- Generate an empty pixel stack
- Push seed pixel onto stack
While pixel stack not empty
do
  - Pop pixel from top of pixel stack
  If difference in colour between pixel and seed pixel is less than user threshold
     and pixel is not already part of a micro segment
  then
    - Add pixel to micro segment
    - Push the 8 pixels surrounding the pixel onto the pixel stack
  endif
done
- Write statistics into micro segment

Figure 3.10: Segment growth algorithm.
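A rough C rendering of the growth procedure in Figure 3.10 is sketched below. The stack uses the InplaceVector push/pop interface from Appendix A, but the XY type, the colour_difference() helper and the flat seg_map array of micro segment IDs are illustrative assumptions rather than the project's actual code, and the ImageSegment statistics bookkeeping is omitted.

#include "InplaceVector.h"

typedef struct { int x, y; } XY;

/* Hypothetical helper: colour distance between two pixels of the blurred image */
extern double colour_difference(int x1, int y1, int x2, int y2);

void grow_segment(int seed_x, int seed_y, int seg_id, int *seg_map,
                  int width, int height, double threshold)
{
    InplaceVector *stack = InplaceVectorCreate(sizeof(XY), 64, 64);
    boolean empty;
    XY p = { seed_x, seed_y };

    InplaceVectorPush(stack, &p);

    for (;;) {
        InplaceVectorPop(stack, &p, &empty);
        if (empty) break;

        /* Skip pixels outside the image or already claimed by a micro segment */
        if (p.x < 0 || p.x >= width || p.y < 0 || p.y >= height) continue;
        if (seg_map[p.y * width + p.x] != -1) continue;

        /* Accept the pixel if its colour is close enough to the seed pixel's */
        if (colour_difference(p.x, p.y, seed_x, seed_y) < threshold) {
            seg_map[p.y * width + p.x] = seg_id;

            /* Push the 8 surrounding pixels for later examination */
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    if (dx == 0 && dy == 0) continue;
                    XY q = { p.x + dx, p.y + dy };
                    InplaceVectorPush(stack, &q);
                }
            }
        }
    }
    InplaceVectorDestroy(stack);
}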


Figure 3.11: Example of output from the micro segmenter and culler (panels: input image, micro segmenter output, culler output).

The results of this are also shown in Figure 3.11, the black areas representing pixels with no micro segments covering them. From this it can be seen that the components of the cars have become very obvious.

Aggregation

To perform the micro segment aggregation the segmenter pipeline splits into two, a branch for each of the different micro segment signatures a car can have. The two branches are executed serially, one after the other. When a branch locates something that it thinks might be a car, it outputs a list of the IDs of the micro segments that might make up the car. This list is termed here an aggregate, and is passed on down the segmenter pipeline for conversion into a true segment and then on to the invariance transforms etc.

Correct area detector

This module tries to locate the micro segments that on their own could represent a car. It does this by scanning through the micro segment list looking for segments that have not been discarded, have an idensity less than three⁷ and have an area greater than a caller specified threshold. Any segments that meet these criteria are passed out as a single segment aggregate.

Cluster detector

This module tries to locate the clusters of micro segments that could represent cars. The ideal cluster detection algorithm would be to locate clusters of micro segments that will fit inside a rectangle of the size of a car and fill more than a threshold percentage of the rectangle. However it was decided that the mathematical and computational complexity of trying to fit an arbitrarily oriented rectangle to a cluster of micro segments, where there are a large number of possible combinations of micro segments, makes the method infeasible.

Fortunately it was realised that the feature noted earlier, that cluster cars have segments that laterally span their back and front, can be used to make the problem tractable.

⁷This is a purely heuristic number that was found to be about the upper bound for cars consisting of a single micro segment.


In brief, the operation of the implemented cluster detector is as follows (how it performs some of these tasks will be explained in a moment):

• Initially it finds a pair of micro segments which look like they could be the front and back (the ends) of a car.

• It then derives from them the location of the bounding rectangle that the car would have if the pair actually were the ends of a car.

• It then tests the hypothesis that the pair are the ends of a car by scanning the bounding rectangle for other micro segments that could represent components of the car, such as the roof and the windows. If a sufficient quantity of other components is found, it accepts the pair and the other components as a car, wraps them together as an aggregate and passes them downstream for further processing. Otherwise, if an insufficient number of other components is located, it concludes that the pair are not the ends of a car and moves on.

• It then repeats this process until there are no more pairs of segments that look like they could be the ends of a car.

Some notes on this process are:

• To locate pairs of segments which look like they could be the ends of a car, the cluster detector looks for pairs where the segments are both about the right size to be a bonnet or boot, approximately the correct distance apart and of similar colour⁸.

• To derive the location of a car bounding rectangle from a pair of end segments, the cluster detector makes use of the fact that if the segments are the ends of a car then they also generally laterally span the car. This means that their midpoints will lie on the longitudinal centerline of the car. The sides of the bounding rectangle are therefore parallel to this centerline at half a car width on either side.

⁸Although this is not rigorously enforced, to allow for multi coloured cars.

The ends of the rectangle are then lines perpendicular to the centerline that push hard up against the two end segments (a geometric sketch of this derivation is given below).
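The derivation in the second note can be sketched as follows. The Point type and function name are illustrative, and the two midpoints are used directly as the rectangle ends for brevity, whereas the real detector pushes the ends out against the outer edges of the end segments.

#include <math.h>

typedef struct { double x, y; } Point;

/* Sketch: derive the four corners of the hypothesised car bounding rectangle
 * from the midpoints of a candidate back/front segment pair. */
void car_bounding_rect(Point back_mid, Point front_mid,
                       double car_width, Point corners[4])
{
    double dx = front_mid.x - back_mid.x;
    double dy = front_mid.y - back_mid.y;
    double len = sqrt(dx * dx + dy * dy);
    if (len == 0.0)
        return;                               /* degenerate pair */

    /* Unit vector perpendicular to the longitudinal centerline */
    double px = -dy / len;
    double py =  dx / len;
    double half = car_width / 2.0;

    corners[0].x = back_mid.x  + px * half;  corners[0].y = back_mid.y  + py * half;
    corners[1].x = back_mid.x  - px * half;  corners[1].y = back_mid.y  - py * half;
    corners[2].x = front_mid.x - px * half;  corners[2].y = front_mid.y - py * half;
    corners[3].x = front_mid.x + px * half;  corners[3].y = front_mid.y + py * half;
}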

Segment generator

The purpose of the segment generator is to convert an aggregate, which is just a list of micro segments, into a single segment that can be passed to the invariance transforms. As a car viewed from above in real life is a convex object, the segment generator computes the convex hull of the aggregate. It then converts the hull into an ImageSegment by performing a polygon scan conversion operation to compute the pixels that the hull encloses. Some of the special cases involved in polygon scan conversion are helpfully avoided as the hull is convex. Finally the segment is expanded by a couple of pixels in all directions to ensure that the car is completely included in the segment.
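The project's own hull routine is not reproduced in the text, but the convex hull of an aggregate's pixels can be computed with a standard method such as Andrew's monotone chain, sketched below under the assumption of a simple integer XY point type.

#include <stdlib.h>

typedef struct { int x, y; } XY;

static long cross(XY o, XY a, XY b)
{
    return (long)(a.x - o.x) * (b.y - o.y) - (long)(a.y - o.y) * (b.x - o.x);
}

static int cmp_xy(const void *pa, const void *pb)
{
    const XY *a = pa, *b = pb;
    if (a->x != b->x) return a->x - b->x;
    return a->y - b->y;
}

/* Monotone chain convex hull. "hull" must have room for 2 * n points;
 * the number of hull vertices is returned. */
int convex_hull(XY *pts, int n, XY *hull)
{
    if (n < 3) {
        for (int i = 0; i < n; i++) hull[i] = pts[i];
        return n;
    }

    qsort(pts, n, sizeof(XY), cmp_xy);

    int k = 0;
    /* Lower hull */
    for (int i = 0; i < n; i++) {
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) k--;
        hull[k++] = pts[i];
    }
    /* Upper hull */
    for (int i = n - 2, t = k + 1; i >= 0; i--) {
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) k--;
        hull[k++] = pts[i];
    }
    return k - 1;   /* the last point repeats the first, so drop it */
}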

The segment definition is then passed on down the main pipeline to the invariance transforms.

3.5 Fourier invariance transforms

Inputs and outputs

As illustrated in Figure 3.5, the Fourier invariance module accepts as input a segment defined by an ImageSegment and the edge image as a RawPixmap. Its output is a DoubleMatrix which contains a rotation, scaling and translation invariant representation of the segment.

Internal pipeline

The data flow and structure of the internal pipeline of this module is illustrated in Figure 3.12.

Masker and range normalisation

The invariance transforms need to work on the data held in the region of the edge image defined by the ImageSegment passed in. However the ImageSegment only contains a list of the XY coordinates of the pixels that make up the segment, not the data itself. The masker therefore cuts out the region of the edge image defined by the ImageSegment and passes it on down the invariance transform pipeline as a 64 by 64 element DoubleMatrix. This process is illustrated in Figure 3.13.


Figure 3.12: Internal pipeline of the Fourier invariance module. (The masker and range normalisation stage takes the ImageSegment to be processed and the edge image (a RawPixmap), producing a DoubleMatrix of data to be made invariant to segment transformations; a 2D FFT plus power spectrum gives a translation invariant representation; a log polar mapping yields a representation in which scalings and rotations are manifested as translations; a second 2D FFT plus power spectrum then produces the transformation invariant representation of the segment.)


The shift to a real valued DoubleMatrix is necessary because the Fourier transform and classifier routines all operate on real valued data rather than discrete pixel values. It is of fixed 64 by 64 size because all the feature vectors that are passed to the classifier must be the same size, hence it is logical to fix the size at this point and gain performance through the Fourier transforms by using matrix dimensions of a power of 2, which are optimal for an FFT. Segments that are too large to fit into the 64 by 64 matrix are scaled to fit. This can be performed without loss of generality as the data is about to be made invariant to scalings.

Finally this module performs range normalisation, meaning that each DoubleMatrix object output has its elements scaled so that they range from 0.0 to 1.0. Hence a dark car on a dark background, which will have very weak edges, has the magnitude of its edges increased so that it has the same representation as a bright car on a dark background, which will have very strong edges.
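A minimal sketch of this range normalisation, assuming the matrix is held as a flat array of doubles rather than the project's DoubleMatrix object:

/* Scale every element of a w-by-h matrix so the values span 0.0 to 1.0. */
void range_normalise(double *m, int w, int h)
{
    double lo = m[0], hi = m[0];

    for (int i = 1; i < w * h; i++) {
        if (m[i] < lo) lo = m[i];
        if (m[i] > hi) hi = m[i];
    }

    double span = hi - lo;
    if (span <= 0.0) return;               /* flat data: leave unchanged */

    for (int i = 0; i < w * h; i++)
        m[i] = (m[i] - lo) / span;
}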

2D FFT + power spectrum

The FFTs are implemented simply by making calls to the MIT “Fastest Fourier Transform in the West” (FFTW) library. This provides the exceptionally high performance necessary to perform the large number of Fourier transforms required to process all of the segments output by the segmenter. A power spectrum computation is then performed on the results.
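The sketch below shows the general shape of such a 2D FFT followed by a power spectrum computation. It uses the modern FFTW3 interface rather than the FFTW calls available in 1999, so the function names are illustrative of the technique, not of the project's actual source.

#include <fftw3.h>

/* Compute the power spectrum of an n-by-n real-valued matrix "data",
 * writing |F(u,v)|^2 into "power" (both n*n doubles, row major). */
void power_spectrum_2d(int n, const double *data, double *power)
{
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n * n);
    fftw_plan plan = fftw_plan_dft_2d(n, n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

    /* Load the real data into the complex input buffer */
    for (int i = 0; i < n * n; i++) {
        in[i][0] = data[i];   /* real part      */
        in[i][1] = 0.0;       /* imaginary part */
    }

    fftw_execute(plan);

    /* Power spectrum: squared magnitude of every Fourier coefficient */
    for (int i = 0; i < n * n; i++)
        power[i] = out[i][0] * out[i][0] + out[i][1] * out[i][1];

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
}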

Log polar mapping

The mapping of the output from the first Fourier transform onto a log polar basis utilises super sampling to ensure that as accurate a representation of the data as possible is obtained.
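A sketch of such a log polar resampling with super sampling is given below. It assumes (this is not stated in the text) that the power spectrum has already been shifted so the DC term sits at the centre of the matrix, and it uses nearest-neighbour lookup for each sub-sample; for the plain polar mapping eventually adopted (Section 4.2), the exponential radius spacing would simply become linear.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SS 4   /* sub-samples per output bin, per axis */

/* Resample a centred n-by-n power spectrum onto a (log r, theta) grid,
 * averaging SS x SS sub-samples per output bin. */
void log_polar_map(int n, const double *spectrum,
                   int nr, int ntheta, double *out)
{
    double rmax = n / 2.0;

    for (int ri = 0; ri < nr; ri++) {
        for (int ti = 0; ti < ntheta; ti++) {
            double acc = 0.0;

            for (int s = 0; s < SS * SS; s++) {
                double fr = (ri + ((s / SS) + 0.5) / SS) / nr;       /* 0..1 */
                double ft = (ti + ((s % SS) + 0.5) / SS) / ntheta;   /* 0..1 */
                double r = pow(rmax, fr) - 1.0;   /* exponential radius spacing */
                double theta = ft * 2.0 * M_PI;
                int x = (int)(n / 2 + r * cos(theta));
                int y = (int)(n / 2 + r * sin(theta));

                if (x >= 0 && x < n && y >= 0 && y < n)
                    acc += spectrum[y * n + x];
            }
            out[ri * ntheta + ti] = acc / (SS * SS);
        }
    }
}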


Figure 3.13: Illustration of masker operation (the segment plus the edge image give a 64x64 DoubleMatrix holding the section of the edge image described by the segment).

3.6 Neural net classifier

Inputs and outputs

As illustrated in Figure 3.5, the classifier module accepts as input a transformation invariant representation of a segment in a DoubleMatrix. It then classifies the segment and outputs a boolean that specifies whether the segment contains a car or not.

Overview

The classifier is implemented using the classic three layer neural net architecture illustrated in Figure 3.14. The implementation has the following features⁹:

• Each element of the input DoubleMatrix is taken as a parameter of the feature vector fed into the net.

• The net has a single output neuron that outputs the likelihood of the feature vector being fed into the net being a representation of a car.

• The number of neurons in the hidden layer is configurable by the user, hence it can be tailored to the complexity of the user's data.

• Training is via gradient descent methods [8, chapter 4], using back propagation [8, chapter 6] to feed output errors back to the hidden layer nodes.

⁹Some of the points in this list assume a good familiarity with neural net techniques. The reader is asked to refer to a good text [8, 9] on the subject for more details.

Figure 3.14: Architecture of neural net classifier (input layer / feature vector, hidden layer, output layer and inference module, which outputs a boolean specifying whether the segment holds a car or not; every element of the input layer is connected to every node of the hidden layer).



• The output of each node is fed through a sigmoid function to allow a smooth gradient descent [8, page 19].

• The net is trained by use of the “online” technique, in which the synapse weights and neuron activation thresholds are updated after presentation of each training set feature vector.

• The training of the net makes use of simulated momentum to allow the gradient descent to cross local minima [8, section 6.5] (a sketch of this update rule is given after this list).

• Due to the sigmoid function the single output layer node outputs a number between 0.0 and 1.0. This is converted into a boolean by the inference module, which is in fact simply a configurable threshold. If the output from the output node is greater than the threshold then the input is deemed to be a car and the inference module outputs true, otherwise it outputs false. After the net has been trained the threshold is set simply by selecting the value that gives the best tradeoff between the number of cars missed and the number of other objects incorrectly identified as cars.
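As promised above, a minimal sketch of the standard online update with momentum for a single weight is given here. The parameter names are generic rather than taken from the project's training code; "gradient" is the error derivative for this synapse produced by the back propagation pass.

/* One online gradient descent step with momentum for a single weight.
 * "velocity" is the weight velocity kept only during training. */
void update_weight(double *weight, double *velocity,
                   double gradient, double rate, double momentum)
{
    *velocity = momentum * (*velocity) - rate * gradient;
    *weight += *velocity;
}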

Data structures

Training a neural net is a very computationally expensive task. Therefore the data structures used to implement the net were influenced purely by performance considerations. To this end the data structures consist solely of two linear arrays of doubles. One of these arrays stores the synapse weights and activation thresholds of the hidden layer neurons¹⁰; the other stores the synapse weights and activation thresholds of the output layer neurons (which in this case is only a single neuron). Two other arrays are then used during training to store the rate at which each of the weights in both the layers is changing. These rates are known as weight velocities and are used in computing the momentum of a weight. The format of these arrays is illustrated in Figure 3.15.

¹⁰The implementation makes use of the mathematical properties of the artificial neurons that allow an activation threshold to be treated as a weight on an extra synapse that has its input permanently set to a value of 1.0.

The use of linear arrays rather than, say, an object for each neuron allows fast pointer-arithmetic traversal of the structures during evaluation and training. This provides a huge performance advantage over making multiple indirect reads and writes to the elements of objects.

Implementation

The evaluation and training/back propagation algorithms are not presented here as they are implementations of standard textbook [8, 9] algorithms. However, for the interested reader, the feature vector evaluation function is included in Appendix D.

3.7 Facilities for testing

Most of the objects in the system have extra debugging methods that allow their internal state to be dumped to disc.

3.8 Third party code

The third party code and libraries used in the system are:

• Clib: The standard C library.

• FFTW: The MIT “Fastest Fourier Transform in the West” library. Used for performing two dimensional FFTs.

• Integer sqrt: An integer square root routine from Dr. Dobb's Journal [11].

3.9 Source statistics

The statistics of the source code produced to implement the system are:

Lines of source   : 15654
  Lines in headers :  3541
    Comments       :  3099
    Code           :   442
  Lines in body    : 12113
    Comments       :  2374
    Code           :  9739


Figure 3.15: Illustration of the arrays used to implement the neural net: the hidden layer weights and activation thresholds, the output layer weights and activation thresholds, and (training only) the corresponding weight and activation threshold velocity arrays. Key: w(n,m) is the weight on synapse m of node n of the layer; a(n) is the activation threshold of node n of the layer; wv(n,m) is the velocity of the weight on synapse m of node n of the layer; av(n) is the velocity of the activation threshold of node n of the layer. Shown here is the layout for a net with a 4096 parameter feature vector input, a 10 node hidden layer and a 1 node output layer. This is in fact the configuration that was found to work best for this application.


Chapter 4

Evaluation

This chapter contains an evaluation of the performance of the system and each of its sub modules. Details of the testing of every low level object in the system have not been included.

4.1 Segmenter

A small section of test code was written to output segments generated by the segmenter to an image file for analysis. Figure 4.1 contains a test image and the segments produced for it. The white areas are the segments output. The black line a few pixels in from the boundary of each segment is the convex hull that is computed as part of the segmentation process¹. From this and other such test images it is clear that the segmenter is performing exactly to specification. Specifically:

• Every whole car in the image has a segment which exactly covers it.

• The number of other segments defined on the image is low. This image contains only 104 segments. This means only 104 invariance transforms and classifications have to be performed, which turns out to be about the ballpark needed to get the performance of the system to meet its specifications.

¹Remembering that the segments are grown by a few pixels in all directions after they have been generated from the convex hull. Hence the convex hulls lie several pixels in from the boundaries of each segment.

4.2 Fourier invariance transforms

Tests performed

Initially the transforms were tested with data that checked that they met their mathematical specifications. So, for example, Fourier basis functions were fed into the FFTs to check that they output a single point in the Fourier plane, etc. From these tests it was determined that the transforms were functioning to their mathematical specification.

Tests were then performed to find out how well the transforms actually achieved invariance. Details and results of this testing are given in Appendix E. From these it was determined that while translation invariance worked perfectly, rotation and scaling invariance hardly worked at all.

Discretisation errors

It was decided that these problems were due to small discretisation errors that were being amplified by the log polar transforms. To explain this, Figure 4.2 shows the first two stages of the processing of two images that are rotations of each other. Note the way that the Fourier plane is simply rotated by the same amount as the images. Now consider the discretisation errors that must necessarily occur in the eight elements that surround the element at the origin of this Fourier plane. For example, if the image, and hence the Fourier plane, are rotated by 40 degrees, then there is no possibility that these eight elements can correctly represent the rotation and so the errors are going to be large. Now note the way that the log polar transform maps these elements onto large areas of the log polar plane.


Figure 4.1: Example of the segmenter's output (panels: input image, segments generated).


Figure 4.2: Illustration of discretisation errors. (An input image in (x, y) is Fourier transformed into the (µ, ν) plane and then log polar transformed into (Log R, Theta); the elements with the greatest discretisation errors in the Fourier plane are mapped to the largest areas of the log polar plane.)


Hence the elements of the Fourier plane that have the largest discretisation error are mapped onto the largest areas of the log polar plane.

Reference back to the original papers [2, 3, 4, 7, 12] revealed that those that had performed the transforms optically using lenses, and hence at an effectively infinite resolution, had reported good results. Meanwhile those that had implemented the technique in the discrete case had been remarkably quiet about how well it had actually worked. This led to the conclusion that perhaps the discrete case had a few more implementation problems than the papers extolling its virtues were willing to let on.

The polar mapping

The simple solution implemented was to convert the log polar mapping into a simple polar mapping. This served to remove the massive dilation of the central elements along the r axis of the polar plane caused by the log function. Of course the polar mapping still dilates these elements along the theta axis, but the overall amplification is greatly reduced.

With this new implementation the rotation invariance tests were repeated, and the results are included in Appendix F. These results suggested that rotation invariance was showing signs of functioning. However, presumably due to the discretisation errors that are intrinsic to the system, its performance was still poor.

To overcome the fact that removing the log function also removes the ability of the transforms to achieve scaling invariance, the segments being fed into the transformations were all prescaled so that they had the same dimensions. This is an exceptionally crude way of achieving scaling invariance, but as will be seen in Section 4.4, even though the rotation invariance has poor performance and the scaling invariance is crude, the complete system actually manages to achieve surprisingly good results.

4.3 Neural net classifier

The neural net was tested on the well defined problem of recognising hand written characters such as those shown in Figure 4.3. It obtained almost perfect performance on the classification of unseen examples², although it did understandably have a small amount of confusion between letters such as capital Q and O. This test therefore proved that the net is capable of solving complex classification problems.

Figure 4.3: An example of the hand written characters used to test the neural net.


4.4 The complete system

Average complexity images

Even though there appear to be some problems with the Fourier invariance transforms, the system as a whole produces remarkably good results for the majority of the images given to it for processing. For example, Figure 4.4 contains an image in which the system has located³ all but two of the un-occluded cars and has not misclassified a single piece of background scenery as a car. This is the typical level of performance obtained from the system when it is used to process images of this level of complexity.

²Note that it was trained directly on the characters with no invariance transforms involved anywhere, and hence would only classify characters in one orientation. However, this is enough to show that the net itself is functioning correctly.

³Note that the test harness that was written to call the CarScanOracle object has rendered the list of segments containing cars returned as rectangles centered on the midpoints of these segments.


Figure 4.4: Example of the output from the system for an input image of average complexity.

High complexity images

In this application a high complexity image is an image which contains a large quantity of background features, such as buildings and other objects, that bear a close resemblance to cars. Figure 4.5 contains an example of such an image.

When the system was run on this image the segmenter output 627 segments. The performance of the classifier at classifying these segments is detailed in Figure 4.6⁴.

These results show that the classifier correctly classified 92% of the segments that contained cars and 96% of the segments that contained scenery. This represents amazingly high accuracy for a complex real world problem such as this.

There is, however, a problem. Due to the complexity of the background data in this image, the segmenter output many times more segments that contained scenery than it did segments that contained cars. The results above show that 4% of these scenery segments were misclassified as cars. In real terms this means that almost as many scenery segments were misclassified as cars as there were cars in the image. Figure 4.7 contains the output from a section of the image which contains some of these misclassifications.

⁴Note that these figures have been corrected by subtracting special cases, such as the line of cars outside the car showroom and the yellow box junction, so that they are not unnaturally skewed.


Hence even though the classifier has a very high accuracy, the sheer complexity of certain input images, which causes the segmenter to output a lot of scenery segments, can make the classifier's inaccuracies significant. However this really is a worst case situation. For the majority of images the complexity is low enough that the inaccuracies don't become significant.

The “Yellow box” problem

During testing it was found that the system does have one specific problem: it gets completely confused by “Yellow box” junctions and the other forms of cross hatch marking that are often found at road intersections. This problem is illustrated in Figure 4.8. Note that it is not confused by the dashed lines normally found in the middle of a road, only by markings that form particular box shapes in the image.

Attempts to solve this problem by adding extra examples of these road markings to the training set resulted in an overall reduction in the number of cars detected. Hence these road markings evidently contain some of the features that the net is using to identify cars, which causes the net to have difficulty telling them apart.


Figure 4.5: Example of a high complexity image.

Segments containing scenery that were correctly classified as such (correct rejections): 582
Segments containing cars that were correctly classified as such (correct accepts)     :  24
Segments containing cars that were classified as containing scenery (false rejections):   2
Segments containing scenery that were classified as containing cars (false accepts)   :  19

Figure 4.6: The performance of the classifier when classifying the segments produced for Figure 4.5.

Figure 4.7: Example of misclassifications in a complex scene.


Figure 4.8: An example of how the system is confused by “Yellow box” junctions.

Machine         Figure 4.4   Figure 4.5
PentiumII-350      37s          186s
Pentium-166        88s          395s

Figure 4.9: Rough performance results for the system, giving the time taken to process some of the figures in this chapter.

Speed

Some rough performance figures for the system are given in Figure 4.9. These show that the system is very close to its target of being able to perform the task at least as fast as a human. For example, a human probably wouldn't be able to sustain an average of 186 seconds for images such as Figure 4.5, given that a human is only likely to work a maximum of 12 hours in a 24 hour day.


Chapter 5

Conclusions

5.1 Comparison with aims

The basic aim of this project was to implement a back end library that was capable of locating cars in vertical aerial photography. As detailed in the Implementation and Evaluation sections, a back end library has been implemented and, for the majority of images, it accurately locates the cars contained within them.

5.2 Design improvements

Invariance transforms

It is clear from the Evaluation that there are some serious problems with the Fourier log polar invariance technique. It is therefore possible that High Order Neural Nets (HONN) or hybrid Fourier/HONN techniques could be utilised to obtain higher performance.

Handling complex images

There is a possibility that improvements to the invariance transforms could improve the accuracy of the classifier, hence alleviating the problem of the classifier inaccuracies becoming significant in complex images. However, without such improvements, experience with training the classifier suggests that it is unlikely that its accuracy can be improved past the 96% mark. This means that the solution would be to improve the segmenter by adding further heuristics to cut down the number of scenery-containing segments output. It is likely that this would be an achievable objective.

Solving the “Yellow box” problem

The road markings that confuse the classifier are always yellow or white lines on a dark background. It should therefore be possible to add special case heuristics to the segmenter that detect and discard all segments containing such features before they are passed to the classifier.
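One possible form such a heuristic could take is sketched below. The RGB thresholds and the 50% cut-off are illustrative guesses rather than tested values, and the function name is hypothetical.

#include "StdDef.h"

/* Sketch of a possible special case heuristic: flag a segment whose pixels
 * are mostly bright yellow or near-white on the assumption that it is a
 * road marking rather than a car. */
boolean looks_like_road_marking(const byte *red, const byte *green,
                                const byte *blue, int num_pixels)
{
    int bright = 0;

    for (int i = 0; i < num_pixels; i++) {
        boolean near_white = red[i] > 200 && green[i] > 200 && blue[i] > 200;
        boolean yellowish  = red[i] > 180 && green[i] > 160 && blue[i] < 120;
        if (near_white || yellowish)
            bright++;
    }

    /* Treat the segment as a road marking if more than half its pixels match */
    return bright * 2 > num_pixels;
}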


Bibliography

[1] Donald Lewine. POSIX programmer's guide. O'Reilly, 1994.

[2] Invariant pattern recognition: A review. Pattern Recognition, 29:1–17, 1996.

[3] D Casasent and D Psaltis. Position, rotation and scale invariant optical correlation. Applied Optics, 15:1793–1799, 1976.

[4] D Asselin and H H Arsenault. Rotation and scale invariance with polar and log polar coordinate transformations. Optics Comm., 104:391–404, January 1994.

[5] J G Daugman. Computer vision. Lecture notes from Part II of the Cambridge University Computer Science Tripos, 1999.

[6] Rotation and scale invariant pattern recognition using a multistaged neural network. SPIE, 1606:241.

[7] H Y Kwan, B C Kim, D S Cho, and H Y Hwang. Scale and rotation invariant pattern recognition using complex-log mapping and augmented second order neural network. Electronics Letters, 29(7):620–621, 1993.

[8] Kevin Gurney. An introduction to Neural Networks. UCL Press, 1997.

[9] Christopher M Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[10] Portable pixmap file format. WWW: http://www.dcs.ed.ac.uk/˜mxr/gfx/2d/PPM.txt.

[11] Peter Heinrich. Algorithm alley. Dr. Dobb's Journal, April 1996.

[12] R A Messner and H H Szu. An image processing architecture for real time generation of scale and rotation invariant patterns. CVGIP, 31:50–66, 1985.

[13] Application of wavelet and neural processing to automatic target recognition. SPIE, 3069, ?

[14] Giorgio Bonmassar and Eric L. Schwartz. Space-variant Fourier analysis: The exponential chirp transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(10):1080–1089, October 1997.

[15] Mary L Boas. Mathematical methods in the physical sciences. Wiley, 1983.

[16] Brian W Kernighan and Dennis M Ritchie. The C programming language. Prentice Hall, 1988.


Appendix A

Header file for InplaceVector object

/********************************************************************/

/* Source : */

/* InplaceVector.h */

/* */

/* Description : */

/* A self maintaining inplace vector that increases and decreases */

/* its size as required. */

/* */

/* Language : */

/* C */

/* */

/* Authors & Modifications : */

/* 1999, Richard Lancaster, original. */

/* */

/* Copyright (c) 1999 Richard Lancaster. */

/* See LICENSE file for full license details. */

/********************************************************************/

#ifndef __InplaceVector_h

#define __InplaceVector_h

#include "StdDef.h"

/* STRUCTURE AND DATATYPE DEFINITIONS */

/*

* InplaceVector descriptor

*

* THIS STRUCTURE CONTAINS PRIVATE DATA AND SHOULD NOT BE ACCESSED

* DIRECTLY. ACCESS SHOULD ONLY BE THROUGH THE METHODS PROVIDED .

*/

struct InplaceVectorStruct

{

/* The size of each element held in the vector in bytes */

int element_size;

/* The minimum number of spaces that are to be allocated for */

/* elements in the vector and the number of elements by which */

/* the size should be incremented when the vector becomes full */

int default_size;

int inc_step;

/* The number of elements currently held in the vector and the */

/* number of elements the vector currently has the capacity */

/* to hold without memory reallocation */

int number_of_elements;

int capacity;

/* The actual data */

/* If capacity is zero then this pointer is NULL */

byte *data;

};

typedef struct InplaceVectorStruct InplaceVector;

/* CONSTRUCTOR AND DESTRUCTOR ROUTINES */

/*

* This routine creates an inplace vector.

*

* NOTES:

*

* - The vector produced uses zero based addressing.

*

* - The vector automatically reallocates the amount of memory it

* is using depending on the number of elements stored in it.

*

* PARAMETERS

*

* element_size:

* The size of the elements to be stored in the vector

* in bytes.

*

* default_size:

* The initial number of elements to be allocated in the

* vector.

*

* inc_step:

* The step size by which to increase and decrease the

* size of the vector when required.

*

* RETURNS

*

* The vector or NULL on failure.

*/

extern InplaceVector *InplaceVectorCreate

(int element_size, int default_size, int inc_step);

/*

* This routine destroys a vector.

* If the vector is NULL then it is ignored.

*

* PARAMETERS

*

* vector:

* The vector to be destroyed.

*/

extern void InplaceVectorDestroy(InplaceVector *vector);

/* PROPERTY MANIPULATION ROUTINES */

/*

* Returns the number of elements in the vector.

* Note that this is not the current capacity of the vector,

* which is private data.

*

* PARAMETERS

*

* vector:

* The vector to query size of.

*

* RETURNS

*

* The size of the vector

*/

extern int InplaceVectorGetSize(InplaceVector *vector);

/* VECTOR STYLE DATA MANIPULATION ROUTINES */

/*

* Places an new element into a vector at a specified index.

*

* PARAMETERS

*

* vector:

* The vector to place element into.

*

* index:

* The index to place element at.

*

* value:

* A pointer to the value to be placed into the index.


*

* RETURNS

*

* True if successful or false otherwise.

*/

extern boolean InplaceVectorPut

(InplaceVector *vector,

int index, void *value);

/*

* Gets an element from a vector.

*

* PARAMETERS

*

* vector:

* The vector to get an element from.

*

* index:

* The index to get the element from.

*

* value:

* A pointer to the location to copy the value into.

*

* RETURNS

*

* True if successful or false otherwise.

*/

extern boolean InplaceVectorGet

(InplaceVector *vector,

int index, void *value);

/*

* Appends an element to the end of the data currently

* held by the vector.

*

* PARAMETERS

*

* vector:

* The vector to append an element to.

*

* value:

* A pointer to the value to append.

*

* RETURNS

*

* true if successful or false otherwise.

*/

extern boolean InplaceVectorAppend

(InplaceVector *vector,

void *value);

/*

* Removes an element from a vector, elements above will be shuffled

* down. The memory held by the vector will also be compressed

* if that is a sensible thing to do.

*

* NOTES:

*

* - If memory compression fails then no error is returned,

* the memory just remains uncompressed.

*

* PARAMETERS

*

* vector:

* The vector to remove an element from.

*

* index:

* The index to remove.

*

* RETURNS

*

* True on success or false otherwise.

*/

extern boolean InplaceVectorCut

(InplaceVector *vector,

int index);

/*

* Flushes all the elements from a vector.

*

* This function is designed to be far more efficient than

* calling "Cut" on every single element.

*

* NOTES:

*

* - If memory compression fails then no error is returned,

* the memory just remains uncompressed.

*

* PARAMETERS

*

* vector:

* The vector to flush.

*

* RETURNS

*

* True on success or false otherwise.

*/

extern boolean InplaceVectorFlush

(InplaceVector *vector);

/* STACK STYLE DATA MANIPULATION ROUTINES */

/*

* Pushes a value onto the end of a vector.

* Eg. This is a stack Push operation.

*

* PARAMETERS

*

* vector:

* The vector to push element onto.

*

* value:

* A pointer to the value to push onto the vector.

*

* RETURNS

*

* true if successful or false otherwise.

*/

extern boolean InplaceVectorPush

(InplaceVector *vector,

void *value);

/*

* Reads the last element from a vector and then removes it

* from the vector. Eg. This is a stack pop operation.

*

* Popping from an empty stack results in true being returned,

* no value is written out and the stack_empty boolean is set.

*

* PARAMETERS

*

* vector:

* The vector to pop element from.

*

* value:

* A pointer to the location to place the popped element.

* If the stack is empty then this value is unaltered.

*

* stack_empty:

* A pointer to a boolean to be set to true if an attempt

* is made to pop from an empty stack. Otherwise it is set to

* false.

*

* A NULL pointer can be passed in if you don’t want to know.

* However that would be considered bad programming style.

*

* RETURNS

*

* True if successful or false on an error.

* Popping from an empty stack is defined as successful.

*/

extern boolean InplaceVectorPop

(InplaceVector *vector,

void *value, boolean *stack_empty);

#endif


Appendix B

Definition of ImageSegmentstructure

/*

* ImageSegment descriptor

*

* This structure provides public data and so may be accessed

* directly.

*/

struct ImageSegmentStruct

{

/* -- FLAGS -- */

/* Segment status flags */

boolean discarded;

boolean checked;

boolean classified;

int class;

int parent_object;

/* -- DATA -- */

/* List of the pixels that make up the segment */

/* Each pixel is stored in an XYCoordinate structure */

InplaceVector *pixels;

/* -- STATISTICS -- */

/* Average colour of the segment */

byte avg_red;

byte avg_green;

byte avg_blue;

/* Segment bounds */

int min_x;

int max_x;

int min_y;

int max_y;

/* Segment dimensions */

int width;

int height;

/* Segment center point */

int mid_x;

int mid_y;

/* Density details */

/* NOTE: The idensity is the bounding volume of the segment */

/* (max_x - min_x + 1) * (max_y - min_y + 1), divided by the */

/* number of pixels in the segment. I.e. it is the inverse */

/* density */

int idensity;

};

typedef struct ImageSegmentStruct ImageSegment;


Appendix C

Header file for GFX module

/********************************************************************/

/* Source : */

/* GFX.h */

/* */

/* Description : */

/* Graphics effects routines. */

/* */

/* Language : */

/* C */

/* */

/* Authors & Modifications : */

/* 1999 Richard Lancaster, original. */

/* */

/* Copyright (c) 1999 Richard Lancaster. */

/* See LICENSE file for full license details. */

/********************************************************************/

#ifndef __GFX_h

#define __GFX_h

#include "BitMatrix.h"

#include "RawPixmap.h"

#include "StdDef.h"

#include "String.h"

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

/* Constant definitions */

#define GFX_EDGE_EFFECT_BLACK (0)

#define GFX_EDGE_EFFECT_WHITE (1)

#define GFX_EDGE_EFFECT_WRAP (2)

#define GFX_EDGE_EFFECT_EXTEND (3)

/* CONVOLUTION KERNELS */

/*

* A selection of standard 3x3 convolution kernels

*/

extern int GFX_kernel_gaussian[3][3];

extern int GFX_kernel_laplacian[3][3];

extern int GFX_kernel_sobelh[3][3];

extern int GFX_kernel_sobelv[3][3];

/* PROCESSING ROUTINES */

/*

* Apply a convolution kernel to a pixmap.

*

* NOTES:

*

* - The result obtained for each channel of each destination pixel

* after applying the kernel is truncated to the range -255 to 255

* inclusive.

*

* - The result stored in each output element of the dest_pixmap

* is abs(x).

*

* - The sign of each output element is discarded unless a sign

* BitMatrix is passed to the function.

*

* PARAMETERS

*

* source_pixmap:

* The pixmap to apply the kernel to.

*

* source_sign_plane:

* A BitMatrix of size:

*

* (source_pixmap->width * 3) by (source_pixmap->height)

*

* That holds the sign of each element in source_pixmap.

* A 0 represents positive or zero, 1 represents negative.

*

* If a NULL is passed as this parameter then all values in

* the source pixmap are considered to be positive.

*

* dest_pixmap:

* The pixmap to place the result of the convolution into.

* This must be the same size as the source pixmap.

* This will work if it is the same pixmap as the source_pixmap

* but it is likely to be slightly slower (probably not by

* very much though).

*

* dest_sign_plane:

* A BitMatrix of size:

*

* (source_pixmap->width * 3) by (source_pixmap->height)

*

* to hold the sign of each element in dest_pixmap.

* A 0 represents positive or zero, 1 represents negative.

*

* If a NULL is passed as this parameter then the sign

* is discarded.

*

* kernel:

* Pointer to a 2D C array of integers that contains the

* convolution kernel.

*

* divisor:

* The divisor to apply to each result pixel.

*

* kernel_width:

* kernel_height:

* The width and height of the kernel array.

* Both must be greater than 0 and odd.

*

* edge_effect:

* Specification of how to handle edges.

*

* RETURNS

*

* true if successful or false otherwise.

*/

extern boolean GFXConvolve

(RawPixmap *source_pixmap, BitMatrix *source_sign_plane,

RawPixmap *dest_pixmap, BitMatrix *dest_sign_plane,

int *kernel, int divisor,

int kernel_width, int kernel_height,

int edge_effect);

/*

* Apply a set of convolution kernels to a pixmap in parallel.

*

* If used with n kernels then the following effect is

* obtained:

*

* - n copies of the image are made.

* - One of the n kernels is applied to each image.

* - The n images are merged back together according to a

* merging function.

*

* In reality no copies of the image are made.

*

* The reason for this is to allow something along the lines of

* a horizontal edge detect and a vertical edge detect to be


* performed. Then have their results merged together without

* creating needless copies of the image.

*

* PARAMETERS:

*

* (The parameters are as for GFXConvolve with the following

* exceptions)

*

* kernels:

* Points to an array of kernels to apply. All kernels must

* be of the size specified in kernel_width and kernel_height.

* Each kernel is of the format accepted by GFXConvolve.

*

* divisors:

* Points to an array of divisors to apply to the result

* produced by each kernel. divisors[n] is applied to

* the result from kernels[n].

*

* num_kernels:

* The number of elements in the kernels and divisors arrays.

*

* merge_style:

* How to merge the results of each kernel to produce the output

* pixel.

*

* RETURNS

*

* true if successful or false otherwise.

*/

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_ADD_ABS (0)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_ADD_SIGNED (1)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_MAX_ABS (2)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_MAX_SIGNED (3)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_MIN_ABS (4)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_MIN_SIGNED (5)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_AVG_ABS (6)

#define GFX_PARALLEL_CONVOLVE_MERGE_STYLE_AVG_SIGNED (7)

extern boolean GFXParallelConvolve

(RawPixmap *source_pixmap, BitMatrix *source_sign_plane,

RawPixmap *dest_pixmap, BitMatrix *dest_sign_plane,

int **kernels, int *divisors, int num_kernels,

int kernel_width, int kernel_height,

int merge_style, int edge_effect);

/*

* Apply a threshold to a pixmap.

*

* This takes in a pixmap and for each pixel:

*

* - Reads the peak intensity.

*

* - If peak intensity is greater than or equal to the value of

* ’threshold’ then 255 is output into all 3 channels of the

* corresponding pixel of the destination pixmap.

*

* - If peak intensity is less than the value of

* ’threshold’ then 0 is output into all 3 channels of the

* corresponding pixel of the destination pixmap.

*

* PARAMETERS

*

* source_pixmap:

* The pixmap to apply the threshold to.

*

* dest_pixmap:

* The pixmap to output the bitmap image to.

* This must be the same size as the source pixmap.

* This will work if it is the same pixmap as the source_pixmap.

*

* threshold:

* The threshold to apply. This does _not_ have to be in the

* range 0 to 255.

*

* RETURNS

*

* true if successful or false otherwise.

*/

extern boolean GFXApplyThreshold

(RawPixmap *source_pixmap,

RawPixmap *dest_pixmap,

int threshold);

#endif


Appendix D

Neural net feature vector evaluationfunction

Neural net structure definition

/*

* The ThreeLayerNN structure.

*

* This structure contains private data and should not be

* accessed directly.

*

* The nodes in one layer are assumed to be fully connected to the

* nodes in the layer above.

*

* This structure is designed with simple flat arrays so that

* it is _fast_.

*/

struct ThreeLayerNNStruct

{

/* The number of nodes in each layer. The input layer are just */

/* distribution rather than processing nodes */

int output_nodes;

int hidden_nodes;

int input_nodes;

/* The weights of the node inputs */

/* These are arrays of doubles. The first n elements are the */

/* weights of the first node in that layer. The next n are the */

/* weights of the next node etc. Their lengths are therefore */

/* the number of nodes in the layer multiplied by the number of */

/* nodes in the layer below plus 1. The plus 1 is for the */

/* threshold pseudo weight, which adds a weight to each node in */

/* the layer */

double *output_node_weights;

double *hidden_node_weights;

};

typedef struct ThreeLayerNNStruct ThreeLayerNN;

Evaluation function declaration

/*

* Evaluates a feature vector presented to it. It only returns

* the raw output of the net. Deciding how to classify the

* results is left to the calling procedure.

*

* PARAMETERS

*

* net:

* The network to perform evaluation with.

*

* input_vector:

* The input vector presented as an array.

* This must be of length net->input_nodes.

*

* hidden_vector:

* A vector to hold the outputs from the hidden nodes.

* This is passed into the routine rather than being allocated

* internally so that it doesn’t need to be continuously

* reallocated and so that these outputs can be interrogated.

* This must be of length net->hidden_nodes.

*

* output_vector:

* The vector to place the processing results into.

* This must be of length net->output_nodes.

*

* hidden_raw:

* If not NULL this vector is filled with the raw version of

* hidden_vector that has not been passed through the sigmoid.

* This must be of length net->hidden_nodes or NULL.

*

* output_raw:

* If not NULL this vector is filled with the raw version of

* output_vector that has not been passed through the sigmoid.

* This must be of length net->output_nodes or NULL.

*/

extern void ThreeLayerNNEvaluateVector

(ThreeLayerNN *net,

double *input_vector,

double *hidden_vector,

double *output_vector,

double *hidden_raw,

double *output_raw);

Evaluation function definition

/*

* See header file for description of routine

*/

extern void ThreeLayerNNEvaluateVector

(ThreeLayerNN *net,

double *input_vector,

double *hidden_vector,

double *output_vector,

double *hidden_raw,

double *output_raw)

{

int k;

int j;

int i;

double *base_weight;

double a;

double y;

/* Assertions */

assert(net != (ThreeLayerNN *) NULL);

/* Processing */

/* Evaluate hidden nodes */

for(k = 0; k < net->hidden_nodes; k++)

{

/* Set activation to zero */

a = 0.0;

/* Find the base of the weight list for this node */

base_weight = net->hidden_node_weights

+ (k * (net->input_nodes + 1));

/* Compute the scalar product of the inputs and the weights */

for(i = 0; i < net->input_nodes; i++)

{

a += (*(input_vector + i)) * (*(base_weight + i));

}

/* Add in the threshold pseudo weight */

a += (-1.0) * (*(base_weight + net->input_nodes));


/* If requested store the raw activation energy */

if(hidden_raw != (double *) NULL)

{

*(hidden_raw + k) = a;

}

/* Pass ’a’ through sigmoid */

y = ThreeLayerNNSigma(a);

/* Store the output */

*(hidden_vector + k) = y;

}

/* Evaluate output nodes */

for(j = 0; j < net->output_nodes; j++)

{

/* Set activation to zero */

a = 0.0;

/* Find the base of the weight list for this node */

base_weight = net->output_node_weights

+ (j * (net->hidden_nodes + 1));

/* Compute the scalar product of the inputs and the weights */

for(i = 0; i < net->hidden_nodes; i++)

{

a += (*(hidden_vector + i)) * (*(base_weight + i));

}

/* Add in the threshold pseudo weight */

a += (-1.0) * (*(base_weight + net->hidden_nodes));

/* If requested store the raw activation energy */

if(output_raw != (double *) NULL)

{

*(output_raw + j) = a;

}

/* Pass ’a’ through sigmoid */

y = ThreeLayerNNSigma(a);

/* Store the output */

*(output_vector + j) = y;

}

}


Appendix E

Fourier log polar invariancetransform testing

This appendix details the testing performed on the Fourier log polar invariance transforms to discover how well they actually achieved invariance.

Method

The set of hand written characters¹ shown in Figure E.1 was created². In theory, if the invariance transforms are actually working, then when the invariant representation of a transformed example of one of these characters is compared with the invariant representations of all the base characters, the difference between the transformed example's representation and the representation of the character it is a transform of should be less than the difference between the transformed example's representation and the representations of any of the other characters. Rotated, scaled and translated versions of the characters were therefore created.
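The dissertation does not record which metric was used to compare two invariant representations; the sketch below simply assumes a Euclidean distance over the flattened 64 by 64 matrices.

#include <math.h>

/* Euclidean distance between two invariant representations held as flat
 * arrays of n doubles (n = 64 * 64 here). The actual metric used for the
 * tests is an assumption, not taken from the project's source. */
double representation_difference(const double *a, const double *b, int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sqrt(sum);
}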

Results

The results obtained from performing the difference test are shown in Figure E.2.

Interpretation of results

These results showed two things:

• Translation invariance was working perfectly, as the results showed that there was never any difference between a base character and its translated variation.

¹Hand written characters are well defined objects and hence provide a better test case than cars.

²These tests have been performed on larger data sets, but this small subset is illustrative of the kind of results obtained from the larger sets.

Figure E.1: The base set of characters used for testing the invariance transforms (a-base, c-base, e-base, g-base, h-base, k-base, l-base, m-base, t-base).


                  Base characters
Variations      a-base c-base e-base g-base h-base k-base l-base m-base t-base

a-rotated         1.15   2.21   2.86   1.93   2.23   2.21   2.48   2.98   2.48
c-rotated         1.70   1.72   2.74   1.51   1.81   2.01   1.64   2.84   1.63
e-rotated         1.91   2.08   2.43   1.69   2.15   1.82   1.89   2.58   2.27
g-rotated         1.58   1.53   2.49   1.30   1.65   1.74   1.82   2.63   1.79
h-rotated         2.04   2.08   2.56   1.65   2.08   1.93   1.92   2.65   2.21
k-rotated         1.81   2.14   2.76   1.67   2.06   2.01   1.63   2.90   1.88
l-rotated         2.37   1.68   2.19   1.57   2.08   1.48   1.70   2.34   1.99
m-rotated         1.83   1.94   2.42   1.62   1.98   1.80   1.94   2.54   2.14
t-rotated         2.42   2.67   3.98   2.22   2.68   3.13   0.76   4.10   0.79

a-scaled          2.62   2.91   4.33   2.50   2.88   3.47   1.01   4.45   0.78
c-scaled          2.89   3.08   4.49   2.67   3.06   3.61   1.38   4.61   1.19
e-scaled          2.59   2.56   3.77   2.24   2.59   2.90   1.46   3.90   1.37
g-scaled          3.02   3.28   4.69   2.86   3.25   3.83   1.33   4.81   1.22
h-scaled          2.98   3.05   4.37   2.73   2.96   3.54   1.52   4.47   1.17
k-scaled          2.69   2.89   4.22   2.50   2.85   3.35   1.23   4.34   1.07
l-scaled          3.55   3.85   5.23   3.41   3.84   4.36   1.69   5.36   1.77
m-scaled          2.62   2.59   3.80   2.24   2.46   2.96   1.39   3.91   1.32
t-scaled          3.86   4.19   5.61   3.77   4.14   4.74   2.10   5.73   1.80

a-translated      0.00   1.98   2.86   1.56   2.22   2.15   2.27   3.00   2.46
c-translated      1.98   0.00   1.59   1.10   1.65   1.23   2.45   1.64   2.65
e-translated      2.86   1.59   0.00   2.01   2.21   1.16   3.63   6.83   3.93
g-translated      1.56   1.10   2.01   0.00   1.56   1.26   1.96   2.09   2.29
h-translated      2.22   1.65   2.21   1.56   0.00   1.64   2.52   2.14   2.44
k-translated      2.15   1.23   1.16   1.26   1.64   0.00   2.76   1.26   3.09
l-translated      2.27   2.45   3.63   1.96   2.52   2.76   0.00   3.75   9.53
m-translated      3.00   1.64   0.84   2.09   2.14   1.26   3.75   0.00   4.00
t-translated      2.46   2.65   3.93   2.29   2.44   3.09   0.95   4.00   0.00

Figure E.2: This table shows the differences between the invariant representations of the base set of characters and the rotated, scaled and translated versions of this base set of characters. The differences are in arbitrary units, but the higher the number the greater the difference between the two representations. The minimum difference in each row is marked in bold type. The element that should be the minimum difference in each row if the invariance transforms are working is underlined.




Appendix F

Fourier polar invariance transform testing

This appendix details the results of repeating the rotation invariance test from Appendix E on the polar (i.e. non log) invariance transform.

Results

The results obtained from performing this test are shown in Figure F.1.

Interpretation of results

These results showed that the use of the polar transform results in some correlation between the base characters and their variations. However, it is still a fairly poor result.


                                       Base characters
Variations      a-base  c-base  e-base  g-base  h-base  k-base  l-base  m-base  t-base

a-rotated         0.53    1.15    1.10    1.09    1.30    1.04    1.69    1.19    1.67
c-rotated         1.12    0.52    1.03    0.58    0.80    0.79    1.01    1.00    0.95
e-rotated         1.06    1.00    0.83    0.90    0.90    0.74    1.48    0.83    1.55
g-rotated         0.97    0.49    0.90    0.45    0.78    0.66    0.98    0.90    1.00
h-rotated         1.32    0.85    0.96    0.75    0.74    0.82    1.38    0.92    1.46
k-rotated         0.97    0.72    0.82    0.55    0.80    0.65    1.06    0.85    1.16
l-rotated         1.60    0.92    1.33    0.92    1.29    1.11    0.47    1.34    0.52
m-rotated         1.02    0.81    0.86    0.75    0.80    0.72    1.31    0.80    1.35
t-rotated         1.60    1.03    1.64    1.03    1.48    1.37    0.38    1.63    0.34

Figure F.1: This table shows the differences between the invariant representations of the base set of characters and the rotations of this base set of characters. The differences are in arbitrary units, but the higher the number the greater the difference between the two representations. The minimum difference in each row is marked in bold type. The element that should be the minimum difference in each row if the invariance transforms are working is underlined.


Appendix G

Project proposal

G.1 Introduction

At all hours of the day a huge quantity of aerial surveillance photography is gathered by aircraft and satellites. Extracting all of the available information content from this data by hand is impossible. Some form of automated processing of this data would therefore be beneficial.

The intention of this project is to develop a system to detect the locations of cars in aerial photographs. So, for example, if given an aerial photo of a town it would return the coordinates of all the cars that are visible in the photo. Applications of such a system would include aiding in the investigation of traffic congestion or, if suitably adapted1, tracking the movements of military vehicles.

G.2 Detailed specification

The core aim of this project is to implement a back end library which conceptually will have a single function call. The format of this function call will be along the lines of:

analyseImage(<image>)
    returns <array of object locations>

PARAMETERS:
    <image> : The image to be processed.

RETURNS:
    An array of (x,y) coordinates describing where all
    the objects detected are to be found in the image.

1 By retraining it to recognize military vehicles.

This low level implementation will allow the system to be integrated easily into any proposed applications. An example GUI and/or command line interface will also need to be implemented for testing purposes.
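As a rough sketch of how such a back end call might be declared, the C-style fragment below gives one possible shape for the interface; the structure names, the stub body and the exact signature are invented for illustration and do not represent the interface as finally implemented.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of the proposed interface; names are illustrative. */

typedef struct {
    int x;   /* pixel column of the detected object */
    int y;   /* pixel row of the detected object    */
} ObjectLocation;

typedef struct {
    const unsigned char *pixels;  /* greyscale, row major, width*height bytes */
    int width;
    int height;
} Image;

/* Analyse 'image' and write up to 'max_locations' detections into
 * 'locations'.  Returns the number of objects found.  The real system
 * would run its feature extraction and classification stages here;
 * this stub simply returns no detections. */
static size_t analyseImage(const Image *image, ObjectLocation *locations,
                           size_t max_locations)
{
    (void)image;
    (void)locations;
    (void)max_locations;
    return 0;
}

int main(void)
{
    unsigned char pixels[64 * 64] = {0};
    Image img = { pixels, 64, 64 };
    ObjectLocation found[128];

    size_t n = analyseImage(&img, found, 128);
    printf("objects detected: %zu\n", n);
    return 0;
}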

G.3 Metrics of success

When evaluating the success of the system there will be two levels of testing.

The first of these is unit testing. Here each of the components that make up the system will be fed artificial test data. If the results from the component match its design specification, then the unit will be deemed to be functioning correctly.

The second level of testing is system testing. Here the whole system will be tested on its ability to perform its specified function, e.g. to locate cars. This will be achieved by feeding the system carefully selected test images that exercise its ability to detect cars in different circumstances and to reject other objects that are not cars.

When evaluating the success at this level there are four metrics of importance:

Correct accept accuracy: The percentage of test images shown to the system that contain a car, in which the system correctly locates the car. This figure should be as high as possible.

Correct reject accuracy: The percentage of test images that contain no car, which the system rejects as containing no car. This figure should be as high as possible.

False accept accuracy: The percentage of test images shown to the system that do not contain a car, that the system reports as containing a car. This figure should be as low as possible.

False reject accuracy: The percentage of test images shown to the system that contain a car, in which the system fails to locate the car. This figure should be as low as possible.
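Since all four figures are percentages taken over the two groups of test images (those that contain a car and those that do not), they can be derived from four raw counts, and the correct accept and false reject figures necessarily sum to 100%, as do the correct reject and false accept figures. The sketch below illustrates the arithmetic with made-up counts; the numbers are not measured results.

#include <stdio.h>

int main(void)
{
    /* Example counts only; real values would come from the test runs. */
    int car_images          = 200;  /* test images containing a car     */
    int car_images_found    = 150;  /* ... in which the car was located */
    int empty_images        = 100;  /* test images containing no car    */
    int empty_images_passed = 90;   /* ... correctly rejected           */

    double correct_accept = 100.0 * car_images_found / car_images;
    double false_reject   = 100.0 * (car_images - car_images_found) / car_images;
    double correct_reject = 100.0 * empty_images_passed / empty_images;
    double false_accept   = 100.0 * (empty_images - empty_images_passed) / empty_images;

    printf("Correct accept: %.1f%%\n", correct_accept);
    printf("False reject:   %.1f%%\n", false_reject);
    printf("Correct reject: %.1f%%\n", correct_reject);
    printf("False accept:   %.1f%%\n", false_accept);
    return 0;
}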

This is a computer vision problem, and computer vision problems are notorious for not being able to achieve a 100% object recognition rate. For example, if a car is partially obscured by a tree it is very unlikely that the system will be able to detect that car, whilst a human observer is likely to be able to infer that it is a car. Therefore the percentages that will be set as being "acceptable" will be selected after the initial reading weeks have determined what an acceptable success rate is for this type of application.

G.4 Project content

A basic design for the system will not be arrived at until this proposal has been submitted and several weeks of research and design have been completed. However, it is likely that the system will involve some or all of the following components:

Basic image processing techniques, such as edge detection and image sharpening. These are likely to be used as a preprocessing operation to prepare the images for analysis.

Analytical image processing techniques, such as searching for rectangles in the image of approximately the correct size to be a car.

Domain conversion techniques, such as transformations into the wavelet or Fourier domains. These will be used to extract the main information components from the image or to obtain characteristics such as invariance of the data to rotation and scaling.

Feature classification systems, such as statistical decision, Bayesian inference or neural net classifiers. These will be used to take the features we've extracted from the image and decide whether the image is a car or not.

G.5 Implementation environment

The system will be implemented on the UNIX platform in the C++ language. This is to give a good combination of speed, expressive power and flexibility. Also, implementation on a commonly available platform such as this will allow the project to continue even if the main development system suffers a total hardware failure.

G.6 Starting point

The language (C++) and the platform (UNIX) are already known by the author.

At the start of the project proposal period the author knew very little about computer vision and had not yet attended any substantial courses on the subject. Therefore, before design begins, the author will need to do substantial reading in the areas of computer vision, image analysis and target recognition.

G.7 Special resources required

• Aerial photographs for analysis have already been provided by the Cambridge University Committee for Aerial Photography.


• Development of the system will be on the author's personal machine.

G.8 Timetable

Weeks 1 and 2: 26th October 1998 - 8th November 1998

Investigation into image processing and writing of small experimental routines for familiarization with techniques.

Weeks 3 and 4: 9th November 1998 - 22nd November 1998

Further investigation, writing of experimental routines and high level design.

Deliverables:

• High level design

• Module interface specifications

• Basic design of modules

Weeks 5 to 10: 23rd November 1998 - 3rd January 1999

Implementation of feature extraction modules.

Deliverables:

• Feature extraction modules

Weeks 11 to 13: 4th January 1999 - 24th January 1999

Implementation of feature classification modules.

Deliverables:

• Feature classification modules.

Week 14: 25th January 1999 - 31st January 1999

Preparing for and writing progress report.

Deliverables:

• Progress report

Weeks 15 and 16: 1st February 1999 - 14th February 1999

Unit testing.

Deliverables:

• Unit testing results.

Week 17: 15th February 1999 - 21st February 1999

System integration.

Deliverables:

• Integrated system.

Weeks 18 to 20: 22nd February 1999 - 14th March 1999

Final testing and problem resolution.

Deliverables:

• Completed system

• Complete testing results

Weeks 21 to 23: 15th March 1999 - 4th April 1999

Writing draft dissertation.

Deliverables:

• Draft dissertation

Weeks 24 and 25: 5th April 1999 - 18th April 1999

Completing dissertation.

Deliverables:

• Completed dissertation
