
Bachelor thesis

True orthophoto generation

Rupert Wimmer

March 25th, 2010

Abstract

In this bachelor thesis, methods for generating true orthophoto imagery from aerial and satellite imagery based on digital elevation models are investigated and compared, and an application for generating true orthophotos is developed.

The term ”True Orthophoto” is based on a generation process that tries to restore any

occluded objects in aerial imagery while at the same time including as many objects as

possible in the surface model.

New developments in digital image processing continue to raise interest in orthophotos and create a demand for higher orthophoto quality. However, occlusions due to rough terrain or significant differences in elevation lead to inconsistencies in accuracy and scale. True orthophotos eliminate these inconsistencies, but most existing approaches to generating true orthophotos require a 3D model of the desired earth surface area, which is time-consuming and costly to produce. The German Aerospace Center takes another route and aims to generate true orthophotos quickly and cheaply from fully automatically generated elevation models.

The four general steps of the true orthophoto generation process are (1) rectification of the

source images and locating occluded areas, (2) seamline placement based on a distance-to-

blindspot algorithm, combined with (3) mosaicking and (4) feathering of seamlines with

multiresolution splines.

The overall goals of this thesis are to investigate the problems of orthophotos and to devise solutions, in order to implement methods that are capable of creating true orthophoto imagery fully automatically. The results show that the investigated and implemented methods compare reasonably with other true orthophoto applications in terms of image quality and computation time. A tenfold performance gain is achieved.


Contents

Abstract
Contents
1 Introduction
1.1 Motivation
1.2 Problem definition
1.3 Outline and structure
1.3.1 General overview of the chapters
2 Orthophotos
2.1 Creating orthophotos
2.1.1 Reprojection
2.1.2 Mosaicking
2.2 Relief displacements
2.3 True orthophotos
2.4 Accuracy of orthophotos
2.5 Summary
3 The Camera Model
3.1 Interior orientation
3.2 Exterior orientation
3.3 Summary
4 Digital Elevation Models
4.1 Elevation models
4.2 Data collection for digital elevation models
4.3 Surface representation
4.3.1 Regular Raster Grid
4.3.2 Triangulated Irregular Network
4.4 DEM generation by stereo image matching
4.5 Summary
5 Design description
5.1 Limits of other true orthophoto applications
5.2 Creating true orthophotos - Step by step
5.2.1 Rectification
5.2.2 Locating occluded pixels
5.2.3 Seamline placement
5.2.4 Mosaicking
5.3 Implementation
6 Raytracing the elevation model
6.1 Data storage
6.2 Bounding box optimization
6.3 Global and local maximum heights
6.4 Raytracing with the Bresenham algorithm
6.5 Parallel processing
6.6 Rectification
6.7 Summary
7 Mosaicking
7.1 Mosaicking and Merging methods
7.2 Mosaicking by Nearest Feature Transform
7.3 Seamline feathering
7.3.1 Generating the Gaussian Pyramid
7.3.2 Generating the Laplacian pyramids
7.3.3 Summation and splinning overlapped images
7.4 Summary
8 Experimentation and Evaluation
8.1 Performance
8.2 Pros...
8.3 ...and cons
8.4 Using a simpler DEM
8.5 Considering all images for Nearest Feature Transform
8.6 Summary
9 Conclusion
9.1 Evaluation
9.2 Outlook
Acknowledgements
References
List of Abbreviations
List of Figures
A Appendix
A.1 Content of companion CD
A.2 Enblend user guide
A.2.1 Raytracing user guide

1 Introduction

This chapter gives an overview of the general purpose for and objectives of this thesis.

The motivation and goals of the project are presented along with a brief description of the

following chapters of the thesis.

1.1 Motivation

New developments in digital image processing continue to raise interest in digital, accurate, undistorted and true-to-scale images, so-called orthophotos, which are a very common part of spatial datasets. Orthophotos can be used to measure true distances and are commonly used for tasks that require greater detail and timeliness than maps provide. The availability of higher-resolution imagery creates a demand for greater quality and accuracy of orthophotos.

With today's high resolution aerial photography, traditional orthophoto production provides only limited accuracy. Rough terrain or significant differences in elevation lead to inconsistencies in accuracy and scale with the normal orthophoto method, which cannot handle occlusions. These limitations might cause problems for users who are unaware of them and incorrectly use the orthorectified imagery as a true and accurate map.

The increasing detail of orthophotos makes the limitations more and more evident. The demand for greater quality and accuracy requires new methods and algorithms to overcome these limitations of normal orthophotos. Ever-increasing computer processing power makes it more feasible to create true orthophotos on a large scale, and hence the German Aerospace Center wanted to extend its existing image processing software by true orthophoto generation to meet the demand for accurate true orthophotos.

Various researchers have recently investigated true orthophoto generation, but most of their approaches require a manually created 3D model of the desired earth surface area, whose production is very time-consuming and costly. To meet the demand for fast and cheap true orthophotos, this study takes another route and works with fully automatically generated elevation models.

1.2 Problem definition

The aim of this bachelor thesis is the design and implementation of software for true orthophoto generation. The generation process, based on aerial or satellite images and digital elevation models, is supposed to be as automatic as possible. The overall goals of this thesis are:

- Devise a method to create true orthophotos.

- Investigate problems and solutions for generating orthophotos.

- Implement methods, optimized with regard to quality and computing time, that are capable of creating true orthophoto imagery fully automatically.

- Evaluate the solutions through test methods.

1.3 Outline and structure

This bachelor thesis is partially based on the master thesis of M. O. Nielsen [Nie04]. Whenever that thesis is referenced, its important results are presented, so this thesis can be read without prior knowledge of [Nie04].

The first chapters cover the basic theory for generating orthophotos and the difference

to true orthophotos. Next, methods to create true orthophotos are introduced. The key

steps are explained, tested and evaluated independently in the following chapters.


During the work on this thesis, a software module for producing true orthophotos was developed as part of the existing image processing software XDibias, and the existing software enblend [Md04] was used for mosaicking. The developed software can be found on the companion CD.

1.3.1 General overview of the chapters

Chapter 2, Orthophotos: Introduces the concept of orthophotos and the procedure to

create them. Afterwards, this is extended to true orthophotos and the differences

are pointed out. In the end, the accuracy of orthophotos is analyzed.

Chapter 3, The Camera model: The mathematical model for the interior and exterior

orientation of a camera lens system, important for the true orthophoto generation,

is presented.

Chapter 4, Digital Elevation Models: The basic concept of digital surface models and the different model representations are described. A description of stereo-matched elevation models, which are used in this project, is given.

Chapter 5, Design description: A step-by-step method to create true orthophotos is in-

vestigated and specified.

Chapter 6, Raytracing the elevation model: Efficient methods for tracing rays between the camera and the surface model are developed in this chapter. Since a tremendous number of calculations is required for processing large aerial images, performance is an important issue.

Chapter 7, Mosaicking: Methods for seamline placement and feathering are presented,

tested and evaluated.

Chapter 8, Experimentation and Evaluation: The implemented method is tested on a set of data. Pros and cons are illustrated with close-ups and the results are discussed.

Chapter 9, Conclusion: This chapter considers the thesis as a whole, summarizes it and draws the final conclusions. In addition, it presents suggestions for future work and performance optimizations.


2 Orthophotos

A photograph shows an image of the world projected through a perspective center onto the image plane. Because of this so-called central projection, and because aerial images are normally shot vertically, objects at the same horizontal position but at different heights are placed at different positions in the photograph (figure 2.1). As an effect of these relief displacements, objects at a higher position (and consequently closer to the camera) appear bigger in the photograph and occlude objects at a lower height.

Figure 2.1: Illustrating the difference between orthographic and perspective projection (panels: orthographic projection, perspective projection; labelled elements: datum, terrain, image plane, perspective center)

Aerial and satellite images are often used in combination with spatial data in Geographic Information Systems (GIS), as reference maps in city planning, or as part of realistic terrain visualizations in flight simulators. For these uses the images have to be corrected for topographic relief, lens distortion and camera tilt, and resampled using an underlying Digital Elevation Model (DEM). All this is done in the ortho rectification process, which tries to eliminate the perspective of the image by computing an orthogonal projection for every single point of the image instead of projecting the rays through one point onto the image plane. The orthophoto is true to scale and referenced to the world coordinate system and can consequently function as an uninterpreted map. Orthophotos are highly up to date, can be merged into one large photo of an enormous area, and can be generated more often than typical topographic maps because of their low cost.


Figure 2.2: Illustrating the cause of relief displacements [M+01]

2.1 Creating orthophotos

The orthophoto generation process requires knowledge of the terrain as well as of the camera model and of the camera position and orientation during exposure. The position and orientation can be determined in several ways; the most common is to use digital cameras with direct georeferencing by GPS and IMU measurements (investigated in chapter 3). Another way is photogrammetry, which provides algorithms known as bundle adjustment to minimize the errors of an image and to determine the required parameters. An older, now obsolete way to extract the parameters is to manually fit the image over some known Ground Control Points (GCP) without considering the camera model (sampling). The GCPs relate unique points in the source images to points in the terrain whose positions are known from a GIS. GCPs are typically used in bundle adjustment, too.


2.1.1 Reprojection

Reprojection is the first step of orthophoto rectification, where rays are reprojected from

the image onto the model of the terrain. It is possible to do the reprojection in two ways:

Forward or backward reprojection.

The forward method projects the source image back onto the terrain (figure 2.3). The in-

tersection point of the projection with the terrain (X,Y,Z) is then stored in the orthophoto.

If the upper left corner of the orthoimage is placed at X0, Y0 the pixel coordinate of a

point in the orthoimage is at:

\[
\begin{bmatrix} \mathrm{column} \\ \mathrm{row} \end{bmatrix}
= \frac{1}{GSD}
\begin{bmatrix} X - X_0 \\ Y_0 - Y \end{bmatrix}
\tag{2.1}
\]

Figure 2.3: Main principle of forward and backward projection. [Nie04]

where GSD is the Ground Sample Distance, which is the pixel size and consequently the distance between two neighboring pixel centers. This equation also takes into consideration that the world coordinate system has its Y-axis pointing upwards (north), whereas the pixel coordinate system has its Y-axis pointing downwards.

With forward projection, regularly spaced points in the source image are projected to a set of irregularly spaced points on the terrain. To store them, they have to be interpolated into the regular pixel array of a digital image. This interpolation is the reason why backward projection is preferred. Instead of projecting a point of the source image onto the terrain, a pixel of the output orthoimage is projected back to the source image. In this case the interpolation is done in the source image, which is easier to implement, and it can be done right away for each output pixel. In addition, only the pixels that are needed in the orthophoto are reprojected.


For the backward projection a row / column coordinate of a pixel of the orthophoto needs

to be converted to the world coordinate system. The Z coordinate is found at this point

in the terrain. The pixel-to-world transformation is done by:

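\[
\begin{bmatrix} X \\ Y \end{bmatrix}
= \begin{bmatrix} X_0 \\ Y_0 \end{bmatrix}
+ GSD \cdot \begin{bmatrix} \mathrm{column} \\ -\mathrm{row} \end{bmatrix}
\tag{2.2}
\]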

To identify the point in the source image that corresponds with the found X,Y, Z co-

ordinate, the camera needs to be modeled. A description of the camera model and the

equations needed for this calculation can be found in chapter 3.
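The two coordinate transformations of equations 2.1 and 2.2 can be illustrated with a minimal Python sketch. The function names and the example values (corner coordinates, GSD) are assumptions chosen for illustration only; they are not taken from the XDibias module.

```python
def world_to_pixel(x, y, x0, y0, gsd):
    """Equation 2.1: world coordinates -> (column, row) in the orthoimage."""
    column = (x - x0) / gsd
    row = (y0 - y) / gsd  # rows grow downwards, world Y grows northwards
    return column, row


def pixel_to_world(column, row, x0, y0, gsd):
    """Equation 2.2: (column, row) in the orthoimage -> world coordinates."""
    x = x0 + gsd * column
    y = y0 - gsd * row
    return x, y


# Hypothetical orthoimage: upper left corner at (500000 m, 5400000 m), GSD 0.25 m
X0, Y0, GSD = 500000.0, 5400000.0, 0.25
x, y = pixel_to_world(40, 100, X0, Y0, GSD)
print(x, y)                               # world position of pixel (40, 100)
print(world_to_pixel(x, y, X0, Y0, GSD))  # transforms back to the same pixel
```

Both functions are exact inverses of each other, which is what allows backward projection to start from an output pixel and look up the terrain height at the corresponding world position.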

2.1.2 Mosaicking

Orthophotos often cover an enormous area and will therefore require the rectification of

several source images that are merged together afterwards. This process is called mosaick-

ing and involves several steps:

- Seamline generation

- Color matching

- Feathering and dodging of the seamlines

The line along which the images are stitched together is called a seamline and can be generated either automatically or manually. The aim is to mosaic the images along places where they look very similar, so that in the best case the seamlines are not recognizable. Manual seamlines are often placed along the centerlines of roads. There are several ways to place seamlines automatically. The simplest is to place the lines along the center of the overlap. Another way is to subtract the images from each other and place the line along the minimum difference between the two images, doing a so-called least-cost trace [Nie04].
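One possible reading of such a least-cost trace is a dynamic programming search for a top-to-bottom path through the difference image. The following Python sketch is an illustration under that assumption; it is neither the algorithm of [Nie04] nor the distance-to-blindspot seamline placement used later in this thesis, and the function name and the tiny gray-value images are hypothetical.

```python
def least_cost_seam(img_a, img_b):
    """Return one column index per row, tracing a top-to-bottom seam through
    the overlap along which the absolute difference |img_a - img_b| is small."""
    rows, cols = len(img_a), len(img_a[0])
    diff = [[abs(img_a[r][c] - img_b[r][c]) for c in range(cols)]
            for r in range(rows)]

    # cost[r][c]: cheapest accumulated difference of any seam ending at (r, c);
    # a seam may move at most one column left or right per row
    cost = [diff[0][:]]
    for r in range(1, rows):
        prev = cost[-1]
        cost.append([diff[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
                     for c in range(cols)])

    # backtrack from the cheapest cell in the last row
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        c = min(range(lo, hi), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


# Two hypothetical 3 x 4 gray-value overlaps
a = [[10, 20, 30, 40], [10, 20, 30, 40], [10, 20, 30, 40]]
b = [[15, 20, 35, 49], [12, 25, 30, 45], [18, 26, 31, 40]]
print(least_cost_seam(a, b))  # seam follows the pixels where both images agree
```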

To create a high quality orthophoto, the mosaicked images should have the same color and brightness near the seamlines in order to conceal them. Several techniques can be used to hide seamlines. Color matching and dodging try to remove the radiometric differences between the images by analyzing and comparing the overlapping sections. Feathering tries to hide the remaining differences by making a smooth cut that slowly fades from one image to the other.
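The fading described above can be sketched, in its simplest form, as a linear blend across a transition zone around the seam. The Python sketch below works on a single image row; the function name, the seam position and the zone width are assumptions for illustration. It shows plain linear feathering only, not the multiresolution splines used for feathering in this thesis.

```python
def feather_row(row_a, row_b, seam, width):
    """Blend two overlapping image rows around column `seam`.

    Left of the transition zone the result equals row_a, right of it row_b;
    inside the zone the weight of row_b rises linearly from 0 to 1.
    """
    out = []
    for c, (a, b) in enumerate(zip(row_a, row_b)):
        # weight of row_b: 0 at seam - width/2, 1 at seam + width/2
        w = (c - seam) / width + 0.5
        w = min(1.0, max(0.0, w))
        out.append((1.0 - w) * a + w * b)
    return out


# Hypothetical rows: image A is darker (100) than image B (200)
print(feather_row([100] * 9, [200] * 9, seam=4, width=4))
# [100.0, 100.0, 100.0, 125.0, 150.0, 175.0, 200.0, 200.0, 200.0]
```

A wider transition zone hides brightness differences better but blurs details that do not line up exactly in both images, which is one motivation for blending different spatial frequencies separately.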

2.2 Relief displacements

Figure 2.4: Relief displacements [Nie04]

The earth curvature for satellite pictures and the flight altitude for aerial images cause relief displacements due to central projection. At the nadir point there are no relief displacements, but they increase with the distance to nadir. In addition, errors in the elevation model lead to horizontal errors caused by "uncontrolled" relief displacements. The horizontal error ∆hor (relief displacement) can be found through a geometric analysis of a vertical offset ∆ver (building or object), the flight altitude H above the base of the object, the distance rt to the image center and the camera constant f, as illustrated in figure 2.4. With D denoting the horizontal ground distance from the nadir to the object, the following relation is derived from this figure:

\[
\frac{f}{r_t} = \frac{H}{D + \Delta_{hor}} = \frac{H - \Delta_{ver}}{D}
\tag{2.3}
\]

Isolating ∆hor results in:

\[
\Delta_{hor} = \frac{r_t \cdot \Delta_{ver}}{f}
\tag{2.4}
\]

Figure 2.5 illustrates that a higher flying altitude results in smaller relief displacements.

A real-world example is illustrated on figure 2.6.
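To give a feeling for the magnitude of equation 2.4, the following Python sketch evaluates it for one configuration. All numbers (flight altitude, camera constant, ground distance) are assumed values chosen for illustration; only the 70 meter object height is borrowed from the building in figure 2.6.

```python
def relief_displacement(r_t, delta_ver, f):
    """Equation 2.4: horizontal displacement on the ground.

    r_t and f must share one unit (image space); the result has the
    unit of delta_ver (object space).
    """
    return r_t * delta_ver / f


H = 1000.0        # flight altitude above the base of the object [m] (assumed)
delta_ver = 70.0  # height of the object [m]
f = 0.1           # camera constant [m] (assumed)
D = 465.0         # horizontal ground distance from nadir to the object [m] (assumed)

# radial distance in the image, from f / r_t = (H - delta_ver) / D (equation 2.3)
r_t = f * D / (H - delta_ver)
delta_hor = relief_displacement(r_t, delta_ver, f)
print(round(r_t, 4), round(delta_hor, 2))
```

With these assumed values the top of the object is imaged 0.05 m from the image center and is displaced by 35 m on the ground, which corresponds to 140 pixels at a GSD of 0.25 m.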

Figure 2.5: For images taken with the same kind of lens, the relief displacements decrease with a higher altitude but increase with the distance to the nadir point [Nie04].

Figure 2.6: The two images are taken from roughly the same position but with different altitudes and lenses. The building is about 70 meters tall, and the relief displacements differ significantly due to the flight altitudes and lenses. [Nie04]


2.3 True orthophotos

Orthophotos are usually created using a base earth elevation model and do not consider occlusions. However, where the elevation changes rapidly, for example at tall buildings, the relief displacements can become so large that the buildings occlude the terrain and objects next to them (figure 2.8).

Figure 2.7: Object stretching with forward projection due to occluded areas

At the German Aerospace Center the image processing software XDibias is used. It consists of several modules for almost any kind of image processing. The orthophoto module is one of them: by means of forward projection it generates a normal orthophoto on the basis of a DEM. This approach is capable of handling different types of cameras, but leads to unwanted stretched objects (figure 2.7) in occluded areas, resulting from interpolation in the orthophoto. The interpolation is necessary because occluded areas, which arise from the different heights of objects, leave gaps in the orthophoto.

In this project, backward projection is used in order to avoid the interpolation in the orthophoto and to simplify the detection of occluded areas. The backward projection rectifies buildings and objects back to their original position, but also leaves a "copy" of the object on the terrain. This copy, a so-called "ghost image", is caused by a lack of information: rays are projected back from the elevation model to both the occluded area and the occluding object without detecting that occluded data is being rectified. As a result, the "wrong" image data is placed in the occluded areas, as illustrated in figure 2.11b. A true orthophoto reprojects the source images over a digital elevation model as well, but takes occluded areas into account and fills them with data from other images by means of raytracing, seamline placement and mosaicking.
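The idea of detecting occluded areas can be sketched in one dimension: a surface cell is visible if no other cell between it and the camera rises above the connecting ray. The following Python sketch is a brute-force illustration along a single height profile; the function name, the profile and the camera position are assumptions, and it is not the optimized raytracer developed in chapter 6.

```python
def visible_cells(heights, gsd, cam_x, cam_z):
    """Return a list of booleans: True where the surface cell is seen by the camera.

    heights[i] is the surface height at ground position i * gsd;
    the camera sits at ground position cam_x and height cam_z.
    """
    visible = []
    for i, z in enumerate(heights):
        x = i * gsd
        seen = True
        # walk from the cell towards the camera, one cell at a time
        step = 1 if cam_x > x else -1
        j = i + step
        while 0 <= j < len(heights) and (j * gsd - cam_x) * step < 0:
            # height of the ray from (x, z) to the camera above cell j
            t = (j * gsd - x) / (cam_x - x)
            ray_z = z + t * (cam_z - z)
            if heights[j] > ray_z:
                seen = False  # a higher cell blocks the ray: blindspot
                break
            j += step
        visible.append(seen)
    return visible


# Hypothetical profile: a 10 m building on flat ground, camera far off to the left
profile = [0, 0, 0, 10, 10, 0, 0, 0]
print(visible_cells(profile, gsd=1.0, cam_x=0.0, cam_z=20.0))
```

In this example the three ground cells behind the building are reported as occluded. In a backward projection without such a test, exactly these cells would receive the "ghost image" of the roof.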

An orthophoto is understood as "true" when the generation process tries to restore any occluded objects while at the same time including as many objects as possible in the surface model. Including everything that is visible in the source images, such as vegetation, people, cars or traffic lights, in the surface model would be an impossible task. In general, true orthophotos are therefore understood to be based on surface models that only include terrain, buildings and bridges. A similar definition is found in [A+98]:

Figure 2.8: Because of the perspective projection, rapid elevation changes and tall buildings hide objects next to them.

[...] the term true orthophotos is generally used for an orthophoto where surface

elements that are not included in the digital terrain model are also rectified to

the orthogonal projection. Those elements are usually buildings and bridges.

A different definition, which defines the true orthophoto only on basis of removing ghost-

image artifacts, is found in [K+04]:

[...] the term ”True Ortho” means a processing technique to compensate for

double mapping effects caused by hidden areas. It is possible to fill the hidden

areas by data from overlapping aerial photo images or to mark them by a

specified solid color.

Figure 2.10: Combination of several images for full coverage

In order to restore the occluded areas (or blindspots) correctly and to fill them with data automatically, imagery of these areas is required. This supplemental information can be gained from pictures of the same area taken from different perspectives (figure 2.10). These pictures show the occluded areas, and by combining them full coverage can be achieved. Aerial images are typically captured with sufficient overlap, as illustrated in figure 2.9. This means that seamlines have to be generated for every blindspot, which results in a significantly higher number of seamlines compared to regular orthophotos. The consequence is a high demand on the mosaicking process and on a good color matching algorithm, since the match must fit around all the numerous seamlines.

Figure 2.9: Possible seamline placement in some orthophotos

Before choosing true orthophoto generation over ordinary orthophoto generation, it should be considered whether the effort is justified. For images taken at a high altitude with a small scale or low resolution, true orthophoto generation makes little sense, because relief displacements at the subpixel level, or even displacements of 2-3 pixels, do not really matter. Consequently, true orthophoto generation is mainly of interest for images of high detail or low altitude, for tall buildings and rough terrain, and for off-nadir images, which are often captured by high resolution satellites. Additionally, the kind of lens matters: normal-angle lenses produce smaller relief displacements than wide-angle lenses. A further interesting field for true orthophoto generation is sideways-looking satellite imagery, especially of mountains.

2.4 Accuracy of orthophotos

The accuracy of an orthophoto depends on several parameters. An orthophoto is a product derived from other data, and its accuracy consequently depends on the quality of that data. In detail, these are:

- the quality and resolution of the source images,
- the inner and outer orientation of the images and
- the accuracy of the digital elevation model.

Figure 2.11: a) Original source image. The building has not been moved to its correct position yet. b) Image orthorectified with the existing orthophoto generation process. The building is rectified to its correct position, but a "ghost image" is left at the source position. c) Image with visibility mask. d) True orthophoto with merged imagery.

The general visual quality of a true orthophoto mainly depends on the source images.

Some of the parameters that affect the quality of the images are:

- the weather, which cannot be influenced,

- quality of the camera and lens and

- resolution, precision and overall quality of digital scanning (if film is used)

Nowadays, cameras and lenses used for mapping are of very high quality and offer resolutions of up to 100 megapixels with a ground resolution of eight centimeters per pixel. For this project, the imagery used was taken with digital cameras and has a resolution of either 8 or 25 centimeters per pixel. The error of the inner orientation of these aerial cameras is negligible, and for the outer orientation the deviation is at most about 1 pixel, resulting from bundle adjustment, transformations and different sources. This project works with a DEM from stereo-processed imagery. The advantage of such a DEM is that it is generated from the source images themselves and is consequently consistent with them for true orthophoto generation. On top of that, stereo-matched DEM generation is very cheap compared to most previous approaches in other true orthophoto generation processes [Nie04], which use manually modelled buildings.

According to equation 2.4, horizontal inaccuracies caused by a poor DEM increase linearly with the distance from the nadir point, so a single constant error cannot be given for an orthophoto. Vertical errors in the DEM may occur, for example, when the surface is measured with a laser: sharp edges, such as the edge of a roof, are not hit exactly and therefore do not return the correct altitude. Ordinary orthophotos often use only the central part of each image, which firstly follows from the overlap of neighboring images and secondly, and more importantly, removes most of the effect of the "uncontrolled" relief displacements. For true orthophotos, however, it is difficult to give a good overall estimate of the mean accuracy because they are normally heavily mosaicked. Hence, it all depends on the final mosaic pattern.


One method to give an estimate of a mean standard deviation integrated over the entire

image area is given by [Nie04]:

\[
\sigma_{dg} = \frac{\Delta_{ver}}{f} \cdot \sqrt{\frac{a^2 + b^2}{3}}
\tag{2.5}
\]

The parameters a and b describe the extent of the image area that is used: for ordinary orthophotos this is the clipping area of the smaller central part. For a true orthophoto the effective area is much larger, and the side lengths of the image are 2a and 2b, because the edges of an image are as likely to be used as its central part. Therefore it is not possible to predict a good measure for the standard deviation prior to the mosaicking process.
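The effect of the used image area on equation 2.5 can be illustrated numerically. In the Python sketch below, the vertical DEM error, the camera constant and the two image extents are assumed values for illustration, and a and b are interpreted as half side lengths of the used area, as suggested by the side lengths 2a and 2b mentioned above.

```python
import math


def mean_displacement_sigma(delta_ver, f, a, b):
    """Equation 2.5: mean standard deviation over an image area
    with half side lengths a and b (same unit as f)."""
    return delta_ver / f * math.sqrt((a * a + b * b) / 3.0)


delta_ver = 0.5  # vertical error of the DEM [m] (assumed)
f = 0.1          # camera constant [m] (assumed)

# only the central part of the image is used (ordinary orthophoto, assumed extent)
print(round(mean_displacement_sigma(delta_ver, f, 0.015, 0.015), 4))
# the whole image may be used (true orthophoto, assumed extent)
print(round(mean_displacement_sigma(delta_ver, f, 0.03, 0.03), 4))
```

Doubling a and b doubles the estimate, which shows why an accuracy figure derived for the central clipping area of an ordinary orthophoto cannot simply be carried over to a heavily mosaicked true orthophoto.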

2.5 Summary

In this chapter, the concept of orthophotos and the steps needed to generate them were introduced and, most importantly, the cause and the problem of relief displacements before and after the ortho rectification were explained. Next, true orthophotos were defined as the result of an ortho rectification that detects occluded areas and fills them with overlapping imagery. Finally, the accuracy of orthophotos and true orthophotos was described, and the difficulty of estimating that accuracy was pointed out.


3 The Camera Model

To work with remote sensing imagery, for instance to merge images into a large mosaic or for cartographic purposes, a relation to a world coordinate system is required. Therefore, the light rays need to be modelled in order to trace them from the object space to the image plane or the other way around. Knowledge of the orientation and position of the camera and of its inner geometry is needed to accomplish raytracing. Normally, the camera model is split into two sets of orientations: the interior and the exterior orientation.

The relationship between the image coordinates ξ and η of an image point P ′ and the coor-

dinates X,Y, Z of an object point P is illustrated in figure 3.2 and is generally formulated

in the following equation:

\[
(\xi, \eta) = f\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\tag{3.1}
\]

The best way to take aerial images would be with a pinhole camera, which lets light in through a very small hole and projects the image of the world, scaled by f/H, onto a surface at the back of the camera (figure 3.1a). The distance from the pinhole to the back of the camera is f (also known as the focal length or the camera constant [Nie04]) and H is the distance from the pinhole to the imaged object. However, the smaller the pinhole, and hence the better the resolution, the longer the exposure time. The exposure time can increase to several hours, which makes the pinhole camera practically unusable for most types of photography, and especially for aerial images.

Besides the pinhole camera, push broom scanning is another technology for obtaining images with optical cameras. It is usually used for passive remote sensing from space. A push broom sensor uses a line of sensors arranged perpendicular to the flight direction of the spacecraft. Different areas of the surface are imaged as the spacecraft flies forward. Subsequently, the single lines are merged into a two-dimensional picture (figure 3.1b). [Wik09]

The parametric description of cameras is depicted at the interior orientation section below,

and the exterior orientation section at the end of this chapter describes algorithms to

eliminate the distortion based on the orientation and position of the camera as well as on

the earth curvature.

Figure 3.1: (a) Pinhole camera: rays simply pass through the hole without any bending, which makes the camera simple to model and results in a clear image. (b) Pushbroom camera (labels: optics, line array, ground track): single lines of the earth surface are imaged while the spacecraft flies forward and are merged into a two-dimensional picture afterwards.

3.1 Interior orientation

The interior orientation was a very important part of the camera model for analog and early digital cameras. With new developments and technologies, manufacturers offer camera systems with negligible distortions due to interior orientation. The camera systems used for this study (UltraCam X or DMC) are highly accurate and correct the already small distortions on the fly. They thereby provide imagery for which only the exterior orientation has to be considered, so only a brief description is given here. The interior orientation is described in more detail in [Kra07] and [Nie04].


Within a camera system, distortions may occur due to the lens, the focal length and the offset between the principal point in the image plane and the image center. To eliminate them, the interior orientation of the camera has to be known. These three quantities are specific to the camera and are normally determined by the manufacturer in the laboratory or during test flights.

The center of the photograph is found by intersecting lines between opposite pairs of

fiducial marks, also referred to as the fiducial center. The Principal Point is given with

respect to the center of the photograph. The manufacturer ensures that, as closely as

possible, the fiducial center coincides with the Principal Point (ξ0 = η0 = 0), also known

as Principal Point of Autocollimation (PPAC), so that the origin of the image coordinate

system is the center of the image plane.

When the image space rays are not parallel to the incoming object space rays, the cause is a distortion in the lens. The distortion consists of several components, of which the radial distortion is usually the largest. The radial distortion can be modelled with an odd polynomial. Measuring several points in the image yields a set of distortions as a function of the distance to the Principal Point of Best Symmetry (PPBS), which is the origin of the radial distortion and is located very close to the fiducial center and the PPAC.

The camera constant f is determined during the calibration process as well and is the length that produces a mean overall distribution of lens distortion [Kra07]. The focal point is therefore located directly above the PPBS at a distance corresponding to the focal length [Nie04].

3.2 Exterior orientation

To be able to reconstruct the rays, the geometry of the image forming system must be known. The exterior orientation of a camera specifies the orientation and position of the camera in the object space and can be determined in several ways. The one used in this project is based on a Global Positioning System (GPS) receiver combined with an Inertial Measurement Unit (IMU), which is highly accurate and fast. The GPS provides an absolute position in the object space every second or faster, and the IMU measures the orientation of the camera. Control points [Kra07], for which both the image coordinates and the object coordinates are known, and bundle adjustment can be used to further increase the accuracy of the exterior and interior orientation. Frame cameras like the UltraCam X have one exterior orientation per image, whereas line cameras have an exterior orientation for each line, since the satellite is moving while the single lines are acquired.

Figure 3.2: Relation between image and object coordinates. [Kra07]

With O (coordinates X0, Y0, Z0) as the position of the perspective center (camera location) of a three-dimensional bundle of rays, PP as the principal point with coordinates (ξ0, η0), f as the focal length and M as the fiducial center, the relation between the camera space (ξ, η) and the object space (X, Y, Z) consists of a scale, a translation and a rotation in three dimensions (ω, φ, κ). These operations are expressed in the collinearity equations [Kra07]:

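$$
\begin{aligned}
\xi &= \xi_0 - f\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} \\
\eta &= \eta_0 - f\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}
\end{aligned}
\tag{3.2}
$$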


The parameters r_ik appearing in equation 3.2 are the elements of the rotation matrix R, which describes the three-dimensional attitude, or orientation, of the image with respect to the XYZ object coordinate system. The individual values of R and how to determine them depend on the GPS/IMU system used.

If at least one coordinate is known in the object coordinate system (here the height Z), the reverse calculation from the camera to the object coordinate system can be done with equation 3.3 [Kra07]:

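$$
\begin{aligned}
X &= X_0 + (Z - Z_0)\,\frac{r_{11}(\xi - \xi_0) + r_{12}(\eta - \eta_0) - r_{13} f}{r_{31}(\xi - \xi_0) + r_{32}(\eta - \eta_0) - r_{33} f} \\
Y &= Y_0 + (Z - Z_0)\,\frac{r_{21}(\xi - \xi_0) + r_{22}(\eta - \eta_0) - r_{23} f}{r_{31}(\xi - \xi_0) + r_{32}(\eta - \eta_0) - r_{33} f}
\end{aligned}
\tag{3.3}
$$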
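The two collinearity relations can be sketched in C as follows. This is only an illustration, not the XDibias implementation: the `Camera` struct and the function names `object_to_image` and `image_to_object` are hypothetical, and the rotation matrix is assumed to be given already (in practice it would come from the GPS/IMU data).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical container for the orientation parameters of one frame image. */
typedef struct {
    double r[3][3];     /* rotation matrix R, r[i-1][k-1] corresponds to r_ik */
    double x0, y0, z0;  /* perspective center O = (X0, Y0, Z0) */
    double xi0, eta0;   /* principal point (xi0, eta0) */
    double f;           /* camera constant (focal length) */
} Camera;

/* Equation 3.2: object point (X, Y, Z) -> image point (xi, eta).
 * Returns 0 on success, -1 if the denominator vanishes. */
int object_to_image(const Camera *c, double X, double Y, double Z,
                    double *xi, double *eta)
{
    assert(c != NULL && xi != NULL && eta != NULL);
    double dx = X - c->x0, dy = Y - c->y0, dz = Z - c->z0;
    double den = c->r[0][2] * dx + c->r[1][2] * dy + c->r[2][2] * dz;
    if (fabs(den) < 1e-12)
        return -1;
    *xi  = c->xi0  - c->f * (c->r[0][0] * dx + c->r[1][0] * dy + c->r[2][0] * dz) / den;
    *eta = c->eta0 - c->f * (c->r[0][1] * dx + c->r[1][1] * dy + c->r[2][1] * dz) / den;
    return 0;
}

/* Equation 3.3: image point (xi, eta) plus known height Z -> object (X, Y).
 * Returns 0 on success, -1 if the denominator vanishes. */
int image_to_object(const Camera *c, double xi, double eta, double Z,
                    double *X, double *Y)
{
    assert(c != NULL && X != NULL && Y != NULL);
    double u = xi - c->xi0, v = eta - c->eta0;
    double den = c->r[2][0] * u + c->r[2][1] * v - c->r[2][2] * c->f;
    if (fabs(den) < 1e-12)
        return -1;
    *X = c->x0 + (Z - c->z0) * (c->r[0][0] * u + c->r[0][1] * v - c->r[0][2] * c->f) / den;
    *Y = c->y0 + (Z - c->z0) * (c->r[1][0] * u + c->r[1][1] * v - c->r[1][2] * c->f) / den;
    return 0;
}
```

For a nadir-looking camera (R equal to the identity) at 1000 m above a point with horizontal offset (100 m, 50 m) and f = 0.1, the first function gives ξ = 0.01 and η = 0.005, and the second function maps these image coordinates back to (100, 50) for Z = 0.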

3.3 Summary

The focus of this chapter was on the camera model in general and on its two main parts, the interior and exterior orientation, in detail. Both orientations are mandatory for an accurate trace of rays from the object space to the camera, through the lens and onto the image plane. Lens distortion is accounted for by the interior orientation, while the exterior orientation accounts for the position and attitude of the camera.


4 Digital Elevation Models

Information about the geometric shape and altitude of the objects contained in the source images is mandatory for the ortho rectification process. The imagery and the camera model describe the orientation of the camera during the exposure and the distortions within the images. To determine occluded areas, however, a georeferenced model of the earth surface including the altitudes of the objects is required, with which the rays of the camera can be intersected.

4.1 Elevation models

A digital elevation model (DEM) is a mathematical representation of an existing or virtual object and its environment, in the case of this thesis the earth surface. According to [KE02], DEM is a generic concept that may refer to ground elevation but also to any layer above the ground such as vegetation, bridges or buildings.

Depending on the usage of an elevation model, there are different levels of detail. When

the information is limited to ground elevation, the DEM is called a Digital Terrain Model

(DTM) and only provides information about the elevation of any point on ground or water

surface.

If the pixel information contains the highest elevation of each point, coming from ground

or above ground area, the DEM is called a Digital Surface Model (DSM). Simple DSMs

only contain the roof edges and ignore roof constructions. More advanced DSMs give a

more exact representation of the surface by considering chimneys and the ridges on the

roof as well.



An even more advanced surface model that considers eaves and details on the walls would

require terrestrial photogrammetry, which would be very expensive and only necessary for

3D imagery. For the case of true orthophotos, it is an unimportant detail because only

the topmost object is visible for orthogonal aerial images and needed for this project. If

for example the roof covers a balcony below it, this object will not be visible in a correct

true orthophoto. Figure 4.1 illustrates these different types of surface models.

(a) Terrain (b) Roof edges (c) Roof ridges, chimneys and edge of eaves (d) Wall details and eaves

Figure 4.1: Four levels of detail of surface models

4.2 Data collection for digital elevation models

This section gives a brief description of the numerous ways digital elevation models may

be prepared. They are frequently obtained by remote sensing rather than direct survey.

One powerful and common technique for generating DEMs is scanning the earth surface with a laser. Using the time-of-flight principle, the laser illuminates a point on the earth surface and the distance to the object point is measured from the travel time of the reflected light. Further explanations can be found in [ST08]. Alternatively, stereoscopic pairs of images can be employed using the digital image correlation method. For this purpose, two optical images are acquired at different angles during the same pass of an airplane or an earth observation satellite. Analog camera images normally have 30 percent sidelap and 60 percent forward overlap, as figure 4.2a illustrates. For dense city areas this coverage is often not sufficient for stereo matching, due to the lack of available perspectives. With digital cameras and their easy and convenient way of taking images, a sidelap of 60-80 percent and a forward overlap of 60-80 percent have become standard. Consequently, for any point (except the corners marked in figure 4.2b) several stereoscopic pairs of images are available, and the coordinates of any point can be derived from the exposure positions and camera angles known from GPS/IMU. Therefore a significantly cheaper elevation model can be generated. Older methods for generating DEMs often involve interpolating digital contour maps that may have been produced by direct survey of the land surface or manual stereo plotting of aerial imagery.

Figure 4.2: (a) 30 percent sidelap and 60 percent forward overlap. (b) 60 percent sidelap and 60 percent forward overlap. The flight direction is marked in both panels.

The quality of a digital elevation model is a measure of how accurate the elevation is at each pixel (absolute accuracy) and how accurately the morphology is represented (relative accuracy). Numerous factors play an important role for the quality of DEM-derived products:

- terrain roughness,

- sampling density,

- grid resolution / pixel size,

- interpolation algorithm (e.g. for vegetation),

- point location accuracy,

- grid structure.

4.3 Surface representation

For data processing purposes the DEM has to be represented in a way that allows the information of each pixel to be read easily and quickly, since raytracing involves a very large number of picture points. Two surface representations are well known and most common: the Regular Raster Grid (RG) and the Triangulated Irregular Network (TIN).

4.3.1 Regular Raster Grid

According to [KE02], one of the main advantages of RGs is that they have the geometry of an image in which the pixels are the nodes of the regular raster grid (figure 4.3) and the gray values of the pixels represent the elevations. Therefore, for reasons of data size and reading performance, grids should preferably be stored as images. The transformation from the image coordinates of pixel (i, j) to the corresponding 3D coordinates (x, y, z) can be expressed as:

$$
\begin{aligned}
x &= i \cdot \Delta x + x_0 \\
y &= j \cdot \Delta y + y_0 \\
z &= f(i, j)
\end{aligned}
\tag{4.1}
$$

Figure 4.3: Raster image of a grid

Here f(i, j) is the height at pixel (i, j), (x0, y0) are the spatial coordinates of the first pixel of the image (first row and first column), and (∆x, ∆y) are the spatial sampling intervals of the grid, or grid size, along the x and y axes respectively. These simple calculations are the major advantage of RGs, because the correct grid point can be located simply and quickly, instead of interpolating between the vertices of a triangle as with TINs [M+01]. The regular spacing of grid cells that makes the calculations fast and simple is also the major limitation of RGs. The grid has only one height at any point, and therefore rapid elevation changes cannot be represented within one grid cell. Consequently, the accuracy depends on the spatial sampling. In addition, flat terrain is split into many grid cells instead of being merged into one large cell.



Figure 4.4: DEM image, where the gray level is related to the altitude of the pixel (dark is low altitude, in this case 280 m; white is high, in this case 320 m).

4.3.2 Triangulated Irregular Network

Another method of representing geographical features is a triangulated irregular network, which connects irregularly spaced spot elevations in an area with lines (edges) to form a continuous system of triangles [E+00]. Each point (vertex) is connected to at least three other points. The network is most commonly generated by Delaunay triangulation, a method which attempts to assure a most efficient triangulation by connecting each point only to its nearest neighbors (figure 4.5). This approach maximizes the minimum angles of all triangles and thereby avoids long and narrow triangles. To handle abrupt changes in the surface such as cliffs, either a very dense network is needed (figure 4.6) or a modified algorithm that is capable of dealing with breaklines, which supplement the points in the surface with lines. Breaklines are placed along edges in the terrain, and a constraint is added to the algorithm that prevents triangle edges from crossing them.
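The Delaunay criterion, namely that no other point may lie inside the circumcircle of a triangle, can be checked with the standard in-circle determinant. The following C sketch illustrates that test only; it is not taken from the thesis software, the names `Point2`, `orient2d` and `in_circumcircle` are hypothetical, and plain floating-point arithmetic is assumed to be accurate enough (robust implementations use exact predicates).

```c
#include <assert.h>

typedef struct { double x, y; } Point2;

/* Twice the signed area of triangle abc; positive if abc is counter-clockwise. */
static double orient2d(Point2 a, Point2 b, Point2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Returns 1 if d lies strictly inside the circumcircle of triangle abc,
 * 0 otherwise. The triangle must not be degenerate. */
int in_circumcircle(Point2 a, Point2 b, Point2 c, Point2 d)
{
    double o = orient2d(a, b, c);
    assert(o != 0.0); /* collinear vertices have no circumcircle */

    double adx = a.x - d.x, ady = a.y - d.y;
    double bdx = b.x - d.x, bdy = b.y - d.y;
    double cdx = c.x - d.x, cdy = c.y - d.y;
    double ad2 = adx * adx + ady * ady;
    double bd2 = bdx * bdx + bdy * bdy;
    double cd2 = cdx * cdx + cdy * cdy;

    /* 3x3 in-circle determinant, expanded along the first row. */
    double det = adx * (bdy * cd2 - bd2 * cdy)
               - ady * (bdx * cd2 - bd2 * cdx)
               + ad2 * (bdx * cdy - bdy * cdx);

    /* The sign convention depends on the orientation of the triangle. */
    return (o > 0.0) ? (det > 0.0) : (det < 0.0);
}
```

A triangulation is a valid Delaunay triangulation if this function returns 0 for every triangle and every point that is not one of its vertices.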

Having triangles and irregularly spaced points overcomes both of the RG disadvantages. Since data points are placed irregularly, they only need to be collected where there is a variation in terrain. Over a large, relatively flat or low-slope area, only a few points will serve to describe the form; in areas of greater relief and steeper, changing slopes, more frequent points can be measured and stored. A further benefit of TINs is that important points such as local high points and peaks, low points, stream centerlines, etc. can be measured and incorporated into the model.

Figure 4.5: In a valid Delaunay triangulation (b), the circumcircle of a triangle does not contain any other points. Therefore (a) is not a valid Delaunay triangulation.

Figure 4.6: An example of a simple TIN without breaklines (a) and a surface with breaklines (b). [EH01]

TIN models have two downsides. Firstly, the data structure describing the triangulation is relatively complex. Tables of points, lines and faces have to be maintained, points have to be linked up to lines (edges), and lines to faces (triangles). On top of that, the elevation z has to be interpolated for a given (x, y), since the position most likely lies inside a triangle rather than on a vertex. Secondly, TINs cannot handle vertical objects, since this requires more than one height per point. A local height for the triangle and a global height for calculations would be needed.
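The interpolation step mentioned above can be illustrated with barycentric weights. The sketch below is an assumption about how such an interpolation could look, not the method of any particular TIN software; `Vertex3` and `tin_interpolate_z` are hypothetical names, and locating the triangle that contains (x, y) is assumed to have happened already.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vertex3;

/* Interpolates the height at (x, y) inside triangle abc with barycentric
 * weights. Returns 0 and stores the height in *z, or -1 if the triangle is
 * degenerate or (x, y) lies outside of it. */
int tin_interpolate_z(Vertex3 a, Vertex3 b, Vertex3 c,
                      double x, double y, double *z)
{
    assert(z != NULL);
    double det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
    if (fabs(det) < 1e-12)
        return -1;

    double wa = ((b.y - c.y) * (x - c.x) + (c.x - b.x) * (y - c.y)) / det;
    double wb = ((c.y - a.y) * (x - c.x) + (a.x - c.x) * (y - c.y)) / det;
    double wc = 1.0 - wa - wb;

    /* A small tolerance keeps points on an edge inside the triangle. */
    const double eps = 1e-12;
    if (wa < -eps || wb < -eps || wc < -eps)
        return -1;

    *z = wa * a.z + wb * b.z + wc * c.z;
    return 0;
}
```

Compared with the single array access of a regular grid, every height query on a TIN needs a triangle search followed by this weighted sum, which is the extra cost referred to in the text.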


4.4 DEM generation by stereo image matching

The requirement for this project was a true orthophoto generation that is easy, fast and as fully automated as possible. To meet this requirement, the true orthophotos generated with the application developed in this project are based on regular raster grids. The software is independent of the elevation model generation, but works best with digital elevation models generated from stereo satellite or aerial imagery. The benefit of this approach is that the source images of the true orthophoto are also the source images of the DEM generation and consequently match perfectly for the ortho rectification and the visibility mask generation. The DEM generation process consists of the following main steps, which are implemented as parts of XDibias [d+09].

1. Stereo matching in epipolar geometry

2. Forward intersection and outlier removal

3. Interpolation and orthorectification

According to [Tao09], the stereo matching is done pixelwise with Semi-Global Matching (SGM), using Mutual Information (MI) to compensate for radiometric differences between the input images. MI is a cost function that provides, for every possible gray value combination, a pixelwise probability that indicates how well these gray values correlate in the stereo images. Pixelwise matching is, however, generally ambiguous, and wrong matches can easily have a lower cost than correct ones, for instance due to noise. Therefore, an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The pixelwise cost and the smoothness constraints are expressed by defining the energy E(D) that depends on the disparity image D [Hir07]:

$$
E(D) = \sum_{p} \Big( C(p, D_p) + \sum_{q \in N_p} P_1 \, T\big[\,|D_p - D_q| = 1\,\big] + \sum_{q \in N_p} P_2 \, T\big[\,|D_p - D_q| > 1\,\big] \Big) \tag{4.2}
$$

The first term is the sum of all pixel matching costs for the disparities of D. The second term adds a constant penalty P1 for all pixels q in the neighborhood Np of p for which the disparity changes only slightly, that is by one pixel. The third term adds a larger constant penalty P2 for all larger disparity changes [Tao09]. The operator T[·] evaluates to 1 if its argument is true and to 0 otherwise.
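To make the three terms of equation 4.2 concrete, the following C sketch evaluates E(D) for a given disparity image. It only evaluates the energy and does not perform the SGM optimization itself. The function name `sgm_energy`, the layout of the cost array and the choice of a 4-connected neighborhood Np are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Evaluates E(D) of equation 4.2.
 * cost:  matching costs C(p, d), stored as cost[(y * width + x) * ndisp + d]
 * disp:  disparity image D, stored as disp[y * width + x]
 * p1/p2: penalties for disparity changes of exactly 1 and of more than 1
 * The neighborhood Np is assumed to be the 4-connected neighborhood. */
double sgm_energy(const double *cost, const int *disp,
                  int width, int height, int ndisp, double p1, double p2)
{
    static const int nx[4] = {1, -1, 0, 0};
    static const int ny[4] = {0, 0, 1, -1};
    double energy = 0.0;

    assert(cost != NULL && disp != NULL);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int dp = disp[y * width + x];
            assert(dp >= 0 && dp < ndisp);

            /* First term: pixelwise matching cost. */
            energy += cost[((size_t)y * (size_t)width + (size_t)x) * (size_t)ndisp + (size_t)dp];

            /* Second and third term: smoothness penalties over Np. */
            for (int k = 0; k < 4; ++k) {
                int qx = x + nx[k], qy = y + ny[k];
                if (qx < 0 || qy < 0 || qx >= width || qy >= height)
                    continue;
                int diff = abs(dp - disp[qy * width + qx]);
                if (diff == 1)
                    energy += p1;
                else if (diff > 1)
                    energy += p2;
            }
        }
    }
    return energy;
}
```

Because the sum runs over every pixel p and all of its neighbors q, each pair of neighboring pixels contributes its penalty twice, once from each side, exactly as written in the equation.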



Figure 4.7: Stereo matching results from aerial UltraCam X images. (a) Small part of aerial image. (b) Disparity against one image. (c) Reprojected disparity. (d) Merged reprojection.

After the stereo matching, the disparity is reprojected into a cartographic coordinate system (figure 4.7c). The reprojections of all disparity images are merged using a median filter (figure 4.7d). Occlusions, matching failures or moved objects lead to holes in the merged DEM, which are filled by inverse distance weighted interpolation. SGM is a good trade-off between reconstruction quality and computation speed.
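The hole filling by inverse distance weighted interpolation can be sketched as follows. This is a generic illustration and not the XDibias code: the function name `idw_fill`, the square search window, the weight 1/d² and the use of a `nodata` marker value for holes are all assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Estimates the height of the hole pixel (hx, hy) from all valid pixels in a
 * square window of the given radius, weighted by the inverse squared distance.
 * Pixels equal to nodata are treated as holes and skipped.
 * Returns NAN if the window contains no valid pixel. */
double idw_fill(const float *dem, int width, int height,
                int hx, int hy, int radius, float nodata)
{
    double wsum = 0.0, zsum = 0.0;

    assert(dem != NULL && radius > 0);
    for (int y = hy - radius; y <= hy + radius; ++y) {
        for (int x = hx - radius; x <= hx + radius; ++x) {
            if (x < 0 || y < 0 || x >= width || y >= height)
                continue;
            float z = dem[y * width + x];
            if (z == nodata)
                continue;
            double d2 = (double)(x - hx) * (x - hx) + (double)(y - hy) * (y - hy);
            if (d2 == 0.0)
                return z; /* the pixel itself is valid, nothing to fill */
            double w = 1.0 / d2;
            wsum += w;
            zsum += w * z;
        }
    }
    return wsum > 0.0 ? zsum / wsum : NAN;
}
```

Close valid pixels dominate the estimate: with a neighbor of 300 m at distance 1 and a neighbor of 320 m at distance 3, the filled value is 302 m rather than the plain average of 310 m.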

4.5 Summary

This chapter introduced the concept of digital elevation and surface models. First, the different DEMs, their levels of detail and the objects they include were described, followed by the ways of collecting data for them. Next, the most common surface representations, the regular raster grid and the triangulated irregular network, were described. Finally, the approach of creating a DEM through stereo matching was presented.


5 Design description

This chapter characterizes the general methods of true orthophoto generation devised and implemented in this project. The process is a step-by-step procedure, and each step is described in detail in the following chapters. The approach in this study is to use regular grids instead of the triangulated networks used in [Nie04]. In addition, the images are not matched with a conventional color-matching algorithm but blended with multiresolution splines, so that the limitations of other true orthophoto generation processes are overcome.

5.1 Limits of other true orthophoto applications

In [Nie04] and in most other true orthophoto applications, manually created TINs are used for the true orthophoto generation. Creating a 3D TIN is very time consuming and very expensive, because automated generation is not possible and the model therefore has to be created manually. As a result of the manual generation, the model is very accurate and includes the undersides of eaves and walls. For the generation of true orthophotos, however, undersides of eaves and walls are not necessary, since they are not visible in the final true orthophoto anyway. It is desirable to reduce the cost and effort of true orthophoto generation by using DSMs generated automatically by stereo matching or laser scanning.

Existing applications that work with elevation models often cannot handle eaves or require a preprocessing step that eliminates them. Furthermore, they are based on the Microsoft Windows operating system, whereas the true orthophoto application, as part of the image processing software XDibias, had to be Linux-based.

Since commonly used color-matching algorithms, like histogram matching and hue matching, need a reference image to which the other images are adjusted, the process is not fully automated: the matching algorithm needs to be told which image is the reference image. Multiresolution spline mosaicking, on the other hand, treats every image equally, and therefore no manual adjustment is needed.

One of the most difficult tasks in creating true orthophotos is feathering the seamlines. In [Nie04] a 3x3 mean filter is applied several times to smooth the seamlines. The final image is smooth and the seamlines are feathered, but also blurred. To overcome the blurriness, this study feathers seamlines with multiresolution splines, which adapt the transition zone from one image to the other to the different spatial frequencies.

5.2 Creating true orthophotos - Step by step

The complete true orthophoto generation process can be broken down into these crucial steps:

1. Rectifying the images to orthophotos.

2. Locating the occluded pixels (visibility mask).

3. Seamline placement.

4. Mosaicking.

The true orthophoto process is illustrated in figure 5.1.

5.2.1 Rectification

Orthophoto rectification is a commonly accepted method that traces each pixel of the output image back to the corresponding position in the input image. When resampling an image, the trace rarely hits the exact center of a pixel in the input image. Therefore, methods are required that interpolate between pixels. Among them are nearest neighbor, bilinear and bicubic interpolation, all of which can be found in [C+01]. In this project, it is possible to choose between nearest neighbor and bilinear interpolation. Nearest neighbor was implemented due to its simplicity, since it selects the value of the pixel that is closest to the incoming ray. Bilinear interpolation uses the four pixel values nearest to the position hit by the ray in order to find the appropriate value of the desired output pixel.

Figure 5.1: General approach of the true ortho rectification process.

The rectification method is illustrated in chapter 2. The required mathematics and knowledge of the camera are described in chapter 3. In chapter 6, the actual raytracing implementation is characterized.

5.2.2 Locating occluded pixels

Figure 5.2: Possible case for occluded pixels

The most important step of the true orthophoto generation is to locate the occluded pixels. A regular orthophoto can be generated without this information, but for mosaicking purposes and to guarantee high accuracy and consistent scale, the location of every occluded pixel is mandatory. Therefore, any ray that is "blocked" by another object of the DEM on its path from the point on the surface to the camera has to be registered. Since the rays have to be traced for the ortho rectification process as well, it makes sense to combine the creation of the visibility mask with the rectification step. The raytracing of the elevation model is described in chapter 6.

5.2.3 Seamline placement

To merge the images satisfactorily, a transition line or seamline has to be placed between the images. The placement is usually based on a scoring algorithm and can be done in various ways. In this study, the Nearest Feature Transform, also known as Distance Transformation, is used, so that the transition line is placed as far as possible from the blindspots and, to be able to fade out in all directions, as near as possible to the middle of the intersection of the images. The seamline placement is treated in chapter 7 as part of the mosaicking process.


5.2.4 Mosaicking

Figure 5.3: Possible case of a mosaicked image

The final true orthophoto will be heavily mosaicked due to all the occluded pixels. If the processed images have relatively large differences in color and brightness, the seamlines will be visible and the final result will be poor. The images could be color matched prior to the rectification process, but to accelerate and optimize the process, multiresolution splines are used. In chapter 7, multiresolution spline mosaicking is investigated and its advantages over regular color matching and feathering are pointed out.

5.3 Implementation

The true orthophoto application is implemented in two parts, in order to separate the two main tasks, orthorectification and mosaicking, and to provide the opportunity to mosaic orthorectified imagery of different age. The orthorectification and visibility mask generation are implemented as an XDibias module. The mosaicking is performed using the Enblend [Md04] program.

Initially the goal was to merge the visibility mask generation into the existing orthophoto generation module. Since the approaches for ortho rectification and ray tracing differ in reprojection, the true orthophoto generation became a separate module.

The software is written in C and runs on a Linux-based operating system. The first part is designed for multithreading because of its large number of extensive calculations. The module takes the source image and the DEM as input and creates a rectified image with marked occluded areas as output.

The second part takes the generated orthorectified images with marked occluded areas as input and handles the crucial steps of seamline placement, feathering and mosaicking. This module merges all input images into one large ortho image, while trying to fill the occluded areas with information from overlapping images.



The true orthophoto application can be found on the enclosed CD.


6 Raytracing the elevation model

Figure 6.1: When raytracing the output pixel back to the source image, the DEM provides the topmost surface point (Z coordinate) of a certain pixel (X, Y coordinate). The ray checks for visibility between the point and the camera. The rightmost ray is occluded by the tower. (Labels in the figure: topmost surface point, output pixel, digital elevation model, rays, X, Y, Z coordinates.)

Orthophotos can be generated in two ways: with forward and with backward projection. Because it is simpler to implement and because it is more important to interpolate in the source images than in the output image, backward projection is used for raytracing in this project. The output pixels are thereby traced from the object space through the camera lens and onto the image plane. For each output pixel, the digital elevation model provides the X and Y as well as the Z coordinate. At this position the raytracing back to the camera and onto the image plane starts. When creating a true orthophoto, it also has to be checked whether the ray intersects another point in the object space. These steps are illustrated in figure 6.1.

A true orthophoto of 1 km² at 8 cm resolution contains roughly 156 million pixels (12,500 × 12,500). Even at 25 cm resolution, 1 km² contains 16 million pixels and requires the same number of ray traces. Therefore, an efficient way of performing the ray tracing is needed. Some orthophoto applications speed up this process by tracing a ray only for every second or third pixel and interpolating between them. This can result in jagged lines along roof edges, where there are rapid changes of height in the surface model. The method is sufficient for DTMs, where the surface is smoother, and it increases the speed significantly [Nie04].

The earlier thesis [Nie04] uses TINs in a binary tree data structure to perform the raytracing and achieves acceptable results. This bachelor thesis is based on elevation models using regular data grids. Therefore no binary tree has to be built, and the output pixels can simply be iterated over. To speed up raytracing, several optimizations are devised: bounding boxes, multithreading, global and local maximum heights as well as a modified version of the Bresenham algorithm [Bre65]. All but the local maximum heights are implemented.

6.1 Data storage

This section gives a brief description of how the elevation models, the source images and the true orthophoto are stored. All three are stored as folders on the hard drive, each of which contains the actual image and some metadata, for example the world coordinates of the upper left pixel of the DEM. Each image consists of channels, lines and columns, which are imported into a one-dimensional array in the application. A regular image consists of at least three channels (red, green, blue) and stores for each pixel in each channel the intensity of the respective color. Through a formula the three channels are merged and a true color image is generated. The DEM has only one channel, which stores the altitude. To get the value of a certain pixel in one of the image arrays, the index has to be calculated with the following equation:


$$
\mathit{idx} = r \cdot w \cdot \mathit{tch} + w \cdot \mathit{ch} + c \tag{6.1}
$$

Here idx is the index, r is the row of the pixel, w is the width of a row, tch is the total number of channels of the image, ch is the channel of the pixel and c is the column of the pixel. Accessing a flat one-dimensional array through this equation is faster than managing a three-dimensional array and therefore speeds up the application.
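Equation 6.1 translates directly into a small C helper. The function name `pixel_index` is hypothetical; the parameter names follow the symbols of the equation.

```c
#include <assert.h>
#include <stddef.h>

/* Equation 6.1: index of pixel (row r, column c) in channel ch within a
 * one-dimensional array holding an image of width w with tch channels. */
size_t pixel_index(size_t r, size_t c, size_t ch, size_t w, size_t tch)
{
    assert(c < w && ch < tch);
    return r * w * tch + w * ch + c;
}
```

With this layout each image row is stored channel by channel, so for w = 4 and tch = 3 the red values of row 0 occupy indices 0 to 3, the green values indices 4 to 7, the blue values indices 8 to 11, and row 1 starts at index 12.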

6.2 Bounding box optimization

Figure 6.2: The orthophotos (A-D) and the DEM can be arranged in various ways relative to each other. This image shows some arrangements and the required offsets for the bounding boxes (gray areas).

To avoid useless calculations, to save memory and to speed up the raytracing, a bounding box of the DEM and the output image is devised. To create the bounding box, the minimum and maximum X and Y coordinates of the source image are calculated first, so that the arrangement of DEM and orthophoto relative to each other can be determined (figure 6.2). Only the rows and columns within the bounding box are required for the raytracing. Besides the intersection of DEM and orthophoto, the bounding box also takes the airplane or satellite position into account. Offsets are implemented so that if, for example, the airplane position is within the DEM but not within the bounding box, only the pixels within the bounding box are traced.



If, for instance, the bounding box is only a fourth of the size of the elevation model, only a fourth of the pixels are traced. This simple modification therefore speeds up the raytracing significantly.
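The core of the bounding box, the intersection of the DEM extent with the orthophoto extent, can be sketched in C as shown below. The sketch covers only this intersection and leaves out the offsets for the airplane or satellite position; the type `Extent` and the function `extent_intersect` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned extent in world coordinates. */
typedef struct { double xmin, ymin, xmax, ymax; } Extent;

/* Computes the intersection of two extents.
 * Returns 1 and stores the result in *out if the extents overlap,
 * returns 0 and leaves *out unchanged otherwise. */
int extent_intersect(Extent a, Extent b, Extent *out)
{
    Extent r;

    assert(out != NULL);
    r.xmin = a.xmin > b.xmin ? a.xmin : b.xmin;
    r.ymin = a.ymin > b.ymin ? a.ymin : b.ymin;
    r.xmax = a.xmax < b.xmax ? a.xmax : b.xmax;
    r.ymax = a.ymax < b.ymax ? a.ymax : b.ymax;
    if (r.xmin >= r.xmax || r.ymin >= r.ymax)
        return 0;
    *out = r;
    return 1;
}
```

For a DEM covering (0, 0) to (100, 100) and an orthophoto covering (50, 50) to (150, 150), the bounding box is (50, 50) to (100, 100), which is a fourth of the DEM area, as in the example of the text.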

6.3 Global and local maximum heights

Figure 6.3: Illustration of different heights to trace (flight height, global maximum height, local maximum heights).

Raytracing consists of a tremendous number of calculations, since for every point or pixel intersected by a ray it has to be checked whether it belongs to an object in the object space or is just "air". As described above, images may have over 100 million pixels, each of which has to be traced, and tracing involves one calculation per intersection. The regular flight height for aerial images is about 1500 meters. If, for instance, the traced pixel is at a height of 280 meters and its distance in X and Y direction to the airplane position is 100 meters, then at a ground resolution of 25 cm tracing this one pixel requires up to 400 calculations (if the ray does not intersect an object). For 100 million pixels this amounts in the worst case to 40 billion calculations and, assuming 1 µs per calculation, to an overall computation time of about 11 hours. Therefore, a method is needed to speed up the raytracing.


By applying a global maximum height, the computation time can be decreased significantly. The global maximum height is the maximum altitude of any object in the bounding box, and since the DEM has to be imported anyway, determining it is cheap. Instead of checking the entire ray up to the airplane, the ray only has to be checked until it intersects the maximum height layer. For an image with flat terrain and two-story buildings, the height differs by about 30 meters throughout the image and the mean distance to the intersection point with the maximum height layer is 5 meters. Doing the same calculation as above leads to 2 billion calculations for 100 million pixels, a computation time of 33 minutes and a performance gain of around 20 times. However, the optimization only works well if the altitudes of the objects differ by just a few meters; for rough terrain or skyscrapers the computation time would rapidly increase again.
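The effect of the height limit on the number of tracing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the thesis application: the function name ray_steps and its parameters are hypothetical, and the ray is assumed to rise linearly from the traced pixel to the camera.

```c
/* Number of tracing steps along a ray that starts at a ground pixel and
 * points to the camera, until the ray reaches the height z_limit.
 * horiz_dist: horizontal distance pixel -> camera in meters
 * z_pixel:    height of the traced pixel in meters
 * z_camera:   flight height in meters
 * z_limit:    height at which tracing may stop (flight height or
 *             global maximum height)
 * resolution: ground resolution in meters per pixel
 * All names are hypothetical and chosen for this sketch. */
long ray_steps(double horiz_dist, double z_pixel, double z_camera,
               double z_limit, double resolution)
{
    if (z_limit <= z_pixel)
        return 0;               /* the pixel is already at the limit */
    if (z_limit > z_camera)
        z_limit = z_camera;     /* never trace beyond the camera */

    /* similar triangles: horizontal distance covered while the ray
     * climbs from z_pixel to z_limit */
    double d = horiz_dist * (z_limit - z_pixel) / (z_camera - z_pixel);

    /* round up to whole steps without depending on the math library */
    double q = d / resolution;
    long n = (long)q;
    if ((double)n < q)
        ++n;
    return n;
}
```

With the numbers from the text, tracing up to the flight height of 1500 m gives 400 steps. A global maximum height of 341 m (a value picked here only so that the ray reaches the layer after 5 m) gives 20 steps, which corresponds to the gain of around 20 times.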

Figure 6.4: With local maximum heights only pixels of certain cells have to be checked for occlusions instead of all points on the ray back to the camera.

To compensate for rough terrain, local maximum heights are introduced. The idea behind local maxima is to break the bounding box down into cells of an adjustable size, for which the local maximum heights are stored. It would be a mistake to check the ray only up to the local height of the cell that contains the traced point and not also against the local heights of the cells the ray intersects (figure 6.3). Figure 6.4 illustrates rays for certain points and the cells which have to be checked for those points.

Prior to the actual raytracing of each point P, the cells C_P intersected by the ray R_P are calculated. To decide whether a cell C_{R_P}(i) has to be traced point by point, the height RH of the ray at the minimum intersection point MP_{R_P} of the cell C_{R_P}(i) and the ray R_P is compared with the maximum local height MLH(C_{R_P}(i)) of that cell. The cell only has to be traced if the ray is not higher than the maximum local height at this point:

\[
C_{R_P}(i) =
\begin{cases}
0 & \text{if } RH(MP_{R_P}(C_{R_P}(i))) > MLH(C_{R_P}(i)) \\
1 & \text{if } RH(MP_{R_P}(C_{R_P}(i))) \le MLH(C_{R_P}(i))
\end{cases}
\tag{6.2}
\]



The cell is checked point by point only if C_{R_P}(i) is 1. This approach provides the opportunity to gain performance by checking only certain points along the ray instead of all of them. The raytracing is accelerated especially for rough terrain or large altitude differences.
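The cell test of equation 6.2 can be sketched as follows. Since local maximum heights are not part of the released application, this is only an illustration under assumptions: the DEM is taken to be a row-major float array, and the names build_local_max and cell_needs_trace are hypothetical.

```c
/* Store the maximum DEM height of every cell in mlh.
 * dem:  row-major height array of w x h pixels
 * cell: edge length of a cell in pixels (cells at the right and lower
 *       border may be smaller)
 * mlh:  output, ceil(w / cell) * ceil(h / cell) entries, row-major */
void build_local_max(const float *dem, int w, int h, int cell, float *mlh)
{
    int cells_x = (w + cell - 1) / cell;
    int cells_y = (h + cell - 1) / cell;

    for (int cy = 0; cy < cells_y; ++cy) {
        for (int cx = 0; cx < cells_x; ++cx) {
            int x0 = cx * cell, y0 = cy * cell;
            int x1 = x0 + cell < w ? x0 + cell : w;
            int y1 = y0 + cell < h ? y0 + cell : h;
            float m = dem[y0 * w + x0];

            for (int y = y0; y < y1; ++y)
                for (int x = x0; x < x1; ++x)
                    if (dem[y * w + x] > m)
                        m = dem[y * w + x];

            mlh[cy * cells_x + cx] = m;
        }
    }
}

/* Equation 6.2: a cell has to be traced point by point (1) only if the
 * ray height at its lowest point inside the cell does not exceed the
 * maximum local height of the cell; otherwise it can be skipped (0). */
int cell_needs_trace(float ray_height_at_min_point, float mlh_cell)
{
    return ray_height_at_min_point <= mlh_cell;
}
```

The cell size trades memory and setup time against the number of cells that can be skipped; the text leaves it adjustable.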

Local maximum heights are not implemented in the application version on the enclosed CD and were not considered for the experimentation in chapter 8, since not enough testing could be done before the deadline to ensure that the algorithm is bug-free. Depending on the source images, short tests indicated a performance gain of up to 10 times. This would result in a computation time of about 3-8 minutes for 100 million pixels.

6.4 Raytracing with the Bresenham algorithm

The Bresenham algorithm is an algorithm from computer graphics for rasterizing straight lines or circles onto bitmapped graphics [Bre65]. To trace the ray, the continuous world coordinates of the object space have to be transformed into discrete pixel coordinates. If this were done for each pixel and each step, a large number of slow floating-point multiplications and divisions would be required. An algorithm that transforms the coordinates once and then traces in pixel coordinates, with integer additions as the most complex operations, therefore reduces the computing time significantly. The Bresenham algorithm fulfills these requirements, is easy to implement and minimizes rounding errors.

Figure 6.5: Slopes in different octants [Wik10b]

The basic variant of this algorithm expects a straight line in the first octant (octant 0), that is, a line with a slope between 0 and 1 from (xstart, ystart) to (xend, yend) (figure 6.5). Then dx = xend − xstart and dy = yend − ystart with 0 ≤ dy ≤ dx. For octant 1 the iteration is not based on dx, as in octant 0, but on dy. If the slope lies in octants 2-7, x and y are not incremented by 1 but by the sign of ∆x and ∆y, respectively. Furthermore, for these octants the iteration is done backwards instead of forwards [Wik10b].

A step in the "fast" direction (the one with the larger difference between end and start point; dx in the case of figure 6.6a) is done in every iteration. Once the deviation from the ideal line becomes too large, a step in the slow direction is performed as well. When this step is due is determined by means of an error variable e, which is decreased by the smaller value (dy) with every step in x direction. If e < 0, a step in y has to be done and the larger value dx is added to e. Through these alternating subtractions and additions, the division of the slope triangle is broken down into basic operations [Wik10a]. Furthermore, the error variable has to be initialized wisely. Consider the case of dy = 1, for which the step in y direction has to be done at the middle of the line, that is, at or shortly after dx/2.

Figure 6.6: (a) and (b) describe the raster process of a straight line with the Bresenham algorithm. (b) illustrates the states of the error variable [Wik10a]

Mathematically, this means that the line equation

\[
y = y_{start} + (x - x_{start}) \cdot \frac{dy}{dx} \tag{6.3}
\]

gets transformed to

\[
e = dx \cdot (y - y_{start}) - dy \cdot (x - x_{start}) \tag{6.4}
\]

If, for instance, one step in x direction is done, the error variable is decreased by 1 · dy. If e < 0 after the decrease, a step in y direction is taken and e is increased by dx, which results in e ≥ 0 again because dx ≥ dy.

The following listing describes the Bresenham algorithm for all octants, written out as C code and adapted from the pseudo code in [Wik10a]. SetPixel stands for the action performed at each traced pixel, here the height check; the function name TraceLine and the helper sgn are added for illustration only:

    /* per-pixel action, provided by the caller: check height of this pixel */
    void SetPixel(int x, int y);

    /* sign of v: -1, 0 or 1 */
    static int sgn(int v) { return (v > 0) - (v < 0); }

    void TraceLine(int istartx, int istarty, int iendx, int iendy)
    {
        int dx, dy, incx, incy, pdx, pdy, ddx, ddy, ef, es, err, ix, iy, i;

        /* measure distances */
        dx = iendx - istartx; dy = iendy - istarty;

        /* determine direction (sign) and absolute distances */
        incx = sgn(dx); incy = sgn(dy);
        if (dx < 0) dx = -dx;
        if (dy < 0) dy = -dy;

        /* determine greater distance */
        if (dx > dy)
        {
            /* x is fast */
            pdx = incx; pdy = 0;     /* parallel step */
            ddx = incx; ddy = incy;  /* diagonal step */
            ef = dy; es = dx;        /* error steps fast, slow */
        }
        else
        {
            /* y is fast */
            pdx = 0; pdy = incy;
            ddx = incx; ddy = incy;
            ef = dx; es = dy;        /* error steps fast, slow */
        }

        /* initialize */
        ix = istartx; iy = istarty;
        err = es / 2;

        for (i = 0; i < es; ++i)
        {
            /* update error term */
            err -= ef;
            if (err < 0)
            {
                err += es;
                /* diagonal step: fast and slow direction */
                ix += ddx; iy += ddy;
            }
            else
            {
                /* parallel step: fast direction only */
                ix += pdx; iy += pdy;
            }
            SetPixel(ix, iy); /* check height of this pixel */
        }
    }
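How such a line traversal can serve as an occlusion test is sketched below. This is an illustration under assumptions rather than the implementation of the thesis: the camera is assumed to be projected onto a pixel position of the DEM grid, the ray height is interpolated linearly between pixel and camera, and the name is_occluded and its parameters are hypothetical.

```c
/* Returns 1 if the DEM blocks the ray from the ground pixel (px, py) at
 * height pz to the camera above pixel (cx, cy) at height cz, else 0.
 * dem is a row-major height array of w x h pixels. */
int is_occluded(const float *dem, int w, int h,
                int px, int py, float pz,
                int cx, int cy, float cz)
{
    int dx = cx - px, dy = cy - py;
    int incx = (dx > 0) - (dx < 0);
    int incy = (dy > 0) - (dy < 0);
    if (dx < 0) dx = -dx;
    if (dy < 0) dy = -dy;

    int pdx, pdy, ef, es;
    if (dx > dy) { pdx = incx; pdy = 0;    ef = dy; es = dx; } /* x is fast */
    else         { pdx = 0;    pdy = incy; ef = dx; es = dy; } /* y is fast */

    int ix = px, iy = py, err = es / 2;
    for (int i = 1; i <= es; ++i) {
        err -= ef;
        if (err < 0) { err += es; ix += incx; iy += incy; } /* diagonal step */
        else         { ix += pdx; iy += pdy; }              /* parallel step */

        if (ix < 0 || ix >= w || iy < 0 || iy >= h)
            break;                       /* the ray has left the DEM */

        /* height of the ray above the current pixel */
        float ray_z = pz + (cz - pz) * (float)i / (float)es;
        if (dem[iy * w + ix] > ray_z)
            return 1;                    /* surface is above the ray */
    }
    return 0;
}
```

Only the height comparison uses floating-point arithmetic; the traversal itself stays in integer additions, as described above.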


6.5 Parallel processing

Figure 6.7: Illustration of the fork and join procedure with OpenMP for 3 threads. After the global and local height maxima are determined, the rays to trace are distributed over threads A, B and C.

Nowadays not only supercomputers have more than one computing core; regular computers and notebooks also have at least two cores and in some cases four or eight. The advantage of multi-core systems is that processes and threads can be computed simultaneously and therefore in a fraction of the time a single-core system needs. To use the extra cores efficiently, the software has to be designed for parallel processing, since operating systems and CPU instructions are not yet capable of distributing operations on their own.

A process is an instance of a computer program that is being executed. It contains the

program code and its current activity. Depending on the operating system, a process may

consist of multiple threads of execution that execute instructions concurrently. A thread

results from a fork of a computer program into two or more concurrently running tasks.

The implementation of threads and processes differs from one operating system to another,

but in most cases, a thread is contained inside a process. The differences between threads and processes in a multitasking operating system are [N+96]:

- processes are typically independent, while threads exist as subsets of a process

- processes carry considerable state information, whereas multiple threads within a

process share state as well as memory and other resources

- processes have separate address spaces; threads share theirs

- processes interact only through system-provided inter-process communication mechanisms (like semaphores or message queues [Nor96])



- context switching between threads in the same process is typically faster than context

switching between processes, due to less overhead.

More recently another technology was introduced, called Hyper-Threading. It is an approach by Intel to hardware-based multithreading. The idea is to utilize the cores of a CPU better by filling the gaps in the pipeline with instructions of another thread. Such gaps occur, for instance, due to a cache miss, and a second process or thread can compute in the meantime. According to Intel, a performance gain of up to 33 percent is possible [Cor04].

Not all algorithms are suitable for parallel processing. If, for instance, the calculations in an iteration depend on each other, they have to be synchronized very often. In this case, synchronization and thread switching produce a large overhead, and the multi-threaded program will be slower than a single-threaded one. The ideal data for parallel processing are totally independent and only have to be synchronized at the end of all calculations.

In this study, parallel processing is used to determine the global and local maximum altitudes as well as for the raytracing itself. The Bresenham algorithm includes calculations based on prior calculations and cannot be parallelized efficiently. Instead, multiple rays are traced concurrently. The columns of a row of the bounding box are processed in parallel, and at the end of the row the tracing results are joined in order to export the finished row to the true orthophoto image file. Since shared data is only read, race conditions do not have to be considered. All other variables are created within a thread; they are therefore not shared with the other threads and cannot be manipulated unnoticed.

The implementation was easily done with the application programming interface OpenMP (Open Multi-Processing), which supports multi-platform shared memory multiprocessing programming in C on many architectures and operating systems, including Linux. It consists of a set of compiler directives, library routines and environment variables that influence run-time behavior. OpenMP was defined by a group of major computer hardware and software vendors and gives programmers a simple and flexible interface for developing applications for desktops as well as supercomputers [NF02]. Most compilers implement OpenMP, and the iterations of a for-loop are computed concurrently with the compiler directive #pragma omp parallel for. After the for-loop the threads are joined and only the master thread remains to continue with the single-threaded parts of the software.
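The row-wise parallelization described above can be sketched as follows. The per-pixel function trace_pixel is a hypothetical stand-in for the actual raytracing (here a simple height threshold), and trace_row is likewise a name chosen for this sketch. With GCC the directive takes effect when compiling with -fopenmp; without that flag it is ignored and the loop runs sequentially.

```c
/* Hypothetical stand-in for the raytracing of one pixel: returns 1 if
 * the pixel counts as occluded. The real test would trace a ray. */
static int trace_pixel(const float *dem, int w, int col, int row)
{
    return dem[row * w + col] < 1.0f;
}

/* Process one row of the bounding box. The columns are independent of
 * each other, so the loop iterations can be distributed over threads.
 * dem is only read, and every iteration writes a different element of
 * mask_row, so no synchronization is needed inside the loop. */
void trace_row(const float *dem, int w, int row, unsigned char *mask_row)
{
    #pragma omp parallel for
    for (int col = 0; col < w; ++col)
        mask_row[col] = (unsigned char)trace_pixel(dem, w, col, row);
    /* implicit join: all threads have finished at this point, so the
     * row can be written to the output file by the calling thread */
}
```

The loop variable col is private to each thread by the rules of the directive, which matches the statement above that variables created within a thread are not shared.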


6.6 Rectification

Figure 6.8: Illustration of orthorectification (source image and rectified image). The ray does not always hit the center of the pixel of the source image.

If a pixel is not occluded, it has to be filled with the correct data. With the equations of chapters 3 and 4, the target position of the ray in the source image is determined. Often the ray does not hit the center of the source pixel due to distortions, as figure 6.8 illustrates. Therefore, the pixel value of the output image has to be transformed or resampled. Several methods are common; Nearest Neighbor and Bilinear Interpolation are used in this thesis [dB+08].

The nearest neighbor algorithm simply selects the value of the nearest point and does not consider the values of other neighboring points at all. In this thesis, the nearest point is determined by rounding the floating-point position to integer values. Figure 6.9a illustrates the equation below.

\[
o(ox, oy) =
\begin{cases}
i(ix, iy) & \text{if } fx - ix < 0.5 \text{ and } fy - iy < 0.5 \\
i(ix + 1, iy) & \text{if } fx - ix \ge 0.5 \text{ and } fy - iy < 0.5 \\
i(ix, iy + 1) & \text{if } fx - ix < 0.5 \text{ and } fy - iy \ge 0.5 \\
i(ix + 1, iy + 1) & \text{if } fx - ix \ge 0.5 \text{ and } fy - iy \ge 0.5
\end{cases}
\tag{6.5}
\]

where o is the output image with (ox, oy) as the X, Y coordinates, and i is the source image with (ix, iy) as the integer parts of the resampled position (fx, fy).
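Equation 6.5 amounts to rounding both coordinates half up, which the following sketch shows. It assumes an 8-bit single-channel image stored row by row and a position inside the image; the name sample_nearest is hypothetical.

```c
/* Nearest neighbor resampling following equation 6.5.
 * src:    row-major 8-bit image of w x h pixels
 * fx, fy: position in the source image,
 *         assumed to satisfy 0 <= fx <= w - 1 and 0 <= fy <= h - 1 */
unsigned char sample_nearest(const unsigned char *src, int w, int h,
                             double fx, double fy)
{
    /* for non-negative values, truncation after adding 0.5 selects
     * ix + 1 exactly when the fractional part is >= 0.5 */
    int ix = (int)(fx + 0.5);
    int iy = (int)(fy + 0.5);

    /* guard against positions slightly outside the assumed range */
    if (ix > w - 1) ix = w - 1;
    if (iy > h - 1) iy = h - 1;

    return src[iy * w + ix];
}
```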

Figure 6.9: (a) Nearest Neighbor Transformation. (b) Bilinear Interpolation.

The bilinear interpolation calculates the value of a point P = (x, y) by means of the four neighboring points Q11, Q12, Q21 and Q22. It is an extension of linear interpolation for interpolating functions of two variables on a regular grid. A linear interpolation is first performed in one direction and then again in the other. The value of the unknown function f at the point P is found by

\[
f(x, y) \approx f(0, 0)(1 - x)(1 - y) + f(1, 0)\,x\,(1 - y) + f(0, 1)(1 - x)\,y + f(1, 1)\,x\,y \tag{6.6}
\]

if a coordinate system is chosen in which the four neighboring points Q11, Q12, Q21 and Q22, where f is known, have the coordinates (0, 0), (0, 1), (1, 0) and (1, 1). This is accomplished by computing f(x, y) only with the fractional digits of x and y.
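Equation 6.6 translates directly into code. The sketch below assumes an 8-bit single-channel image stored row by row and repeats the border pixel where a neighbor would fall outside the image; the border handling and the name sample_bilinear are assumptions of this sketch.

```c
/* Bilinear interpolation following equation 6.6.
 * src:    row-major 8-bit image of w x h pixels
 * fx, fy: position in the source image,
 *         assumed to satisfy 0 <= fx <= w - 1 and 0 <= fy <= h - 1 */
double sample_bilinear(const unsigned char *src, int w, int h,
                       double fx, double fy)
{
    int ix = (int)fx, iy = (int)fy;      /* integer parts */
    int ix1 = ix + 1 < w ? ix + 1 : ix;  /* repeat the border pixel */
    int iy1 = iy + 1 < h ? iy + 1 : iy;
    double x = fx - ix, y = fy - iy;     /* fractional parts in [0, 1) */

    double f00 = src[iy * w + ix],  f10 = src[iy * w + ix1];
    double f01 = src[iy1 * w + ix], f11 = src[iy1 * w + ix1];

    return f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
         + f01 * (1 - x) * y       + f11 * x * y;
}
```

Compared with the nearest neighbor variant, this needs four memory reads and eight multiplications per pixel instead of one read.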

For evaluation purposes the nearest neighbor algorithm was used, since it requires fewer calculations and the quality gain from bilinear interpolation is negligible, as figure 6.10 illustrates.

6.7 Summary

One way of intersecting a surface model was introduced and highly optimized through bounding boxes, global and local maximum heights and parallel processing. Two methods to resample the values of pixels that are not occluded were described. The performance is highly dependent on the source images, but compared to the raytracing library of [Nie04] a performance gain of 20 times was accomplished for city imagery with altitude differences of 60 meters. This optimization makes raytracing feasible even for large images.

Figure 6.10: (a) Nearest Neighbor Transformation Example. (b) Bilinear Interpolation Example.


7 Mosaicking

To generate large scale ortho imagery, multiple images have to be merged to form a mosaic

of images. Adjacent images are usually assembled along seamlines that are automatically

or manually placed roughly along the middle of the overlapping areas. For orthophotos,

the seamlines are often placed along roads or flat terrain so that no buildings or other

objects are intersected, which would result in visible seams due to relief displacements.

Figure 7.1: The overall brightness of an image relies on the reflection of surfaces and, more importantly, on the angles of airplane, surface and sun to each other. The gray-scale below shows the amount of light reflected.

True orthophotos, in contrast, have the advantage that relief displacements are mostly removed, depending on the quality of the DEM. Therefore, placing the seamlines along roads is not necessary. However, seamline placement is more crucial for true orthophotos than for orthophotos, since they have significantly more seamlines due to the large number of occluded areas. Radiometric differences are an inherent part of imagery and the reason for clearly visible seamlines. Since the images rely on the light from the sun, the relative angle to the sun may also have great influence, as illustrated in figure 7.1. To avoid poor results, the seamlines have to be feathered and a smooth transition has to be guaranteed.


7.1 Mosaicking and Merging methods

The mosaicking methods presented in this section rely on a pixel-by-pixel score method, inspired by methods presented in [Nie04] and [BA83]. The methods used in this thesis are cutline generation by Nearest Feature Transform (NFT), also known as Distance Transformation, and multiresolution spline merging as implemented in the open source enblend program [Md04].

7.2 Mosaicking by Nearest Feature Transform

The first step after rectifying is to mosaick the images. In this study the nearest feature transform is used due to its simplicity and overall good results. Moreover, the NFT ensures that the pixel information is taken from the image for which the distance to the blindspots is the largest, and therefore inaccuracies in the surface model are compensated. It is a method that maps binary images into distance images (1 channel), where the distance to the nearest object corresponds to the color level. In this case, the visibility mask is the binary image with blindspots as the objects (figure 7.2).
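A distance map of this kind can be computed in two passes over the image, as the sketch below shows. It is only an illustration: the thesis relies on enblend for this step and does not state the distance metric, so the city-block metric is an assumption chosen for brevity, and the names distance_map and pick_source are hypothetical.

```c
/* City-block distance transform of a blindspot mask.
 * mask: row-major w x h array, non-zero marks a blindspot
 * dist: output, distance in pixels (4-neighborhood) to the nearest
 *       blindspot; w + h if the mask contains no blindspot at all */
void distance_map(const unsigned char *mask, int w, int h, int *dist)
{
    const int far = w + h;  /* larger than any possible distance */

    /* forward pass: propagate distances from the left and from above */
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            int d = mask[i] ? 0 : far;
            if (x > 0 && dist[i - 1] + 1 < d) d = dist[i - 1] + 1;
            if (y > 0 && dist[i - w] + 1 < d) d = dist[i - w] + 1;
            dist[i] = d;
        }

    /* backward pass: propagate distances from the right and from below */
    for (int y = h - 1; y >= 0; --y)
        for (int x = w - 1; x >= 0; --x) {
            int i = y * w + x;
            if (x < w - 1 && dist[i + 1] + 1 < dist[i]) dist[i] = dist[i + 1] + 1;
            if (y < h - 1 && dist[i + w] + 1 < dist[i]) dist[i] = dist[i + w] + 1;
        }
}

/* Select the source image for one output pixel: 0 for image A, 1 for
 * image B, preferring the image that is farther from its blindspots. */
int pick_source(int dist_a, int dist_b)
{
    return dist_a >= dist_b ? 0 : 1;
}
```

Taking the pixel from the image with the larger distance keeps the cutline away from blindspots, which leaves room for the feathering step.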

Figure 7.2: Nearest Feature Transformation of blindspot image (superimposed in white). The more red the greater the distance to a blindspot.

For each source image, a corresponding blindspot distance map is created, where the score of each pixel indicates the distance to blindspots. The distance maps are used to determine the source image to take the pixel data from. For now it is only possible to merge two images at a time with enblend [Md04], and therefore the order of the merging process could be of some importance, since no color-matching or histogram-matching algorithm is used in this study. If, for instance, the perspectives of the merged images differ only a little, the final true orthophoto will be heavily mosaicked and the feathering algorithm might be overstrained, which could result in a poor final image. But if the two images with the most different perspectives are always merged first, the blindspots will mainly be filled by one image instead of five or ten. Consequently, less feathering has to be done. However, the tested imagery in chapter 8 shows that the feathering algorithm smooths every seamline precisely, and therefore the order is not important. The only requirement is that the images overlap. Figure 7.3 illustrates an image mosaic of two images.

Figure 7.3: Distance map for joining two images at a late stage, where the dark areas correspond to one image and the white areas to the other (surface outlines are superimposed).

7.3 Seamline feathering
As figure 7.4a shows, without any adjustment and feathering the seamlines are clearly visible and the overall result is poor. Therefore, a method is required which joins images smoothly. The merging thus needs to handle radiometric differences in the input images. In traditional orthorectification mosaicking, color values are first adjusted to match a reference image and then merged with a simple feathering. The disadvantages of this method are, firstly, that additional calculations for the matching have to be done besides the feathering and, secondly, that the reference image might have radiometric distortions, which would then be added to all other images. In this thesis, the promising multiresolution splines [BA83] are used instead. The approach is to distort the surfaces gently, so that they can be joined together with a smooth seam while still preserving as much of the original image information as possible [BA83]. This means that no reference image is required and a smooth transition is guaranteed.

Figure 7.4: (a) is a true orthophoto without seamline feathering, so that the seamlines are obvious. (b) is feathered by multiresolution spline [BA83] and no seamlines are visible.

Figure 7.5: The weighted average method may be used to avoid seams. Example weighting functions are shown here in one dimension. The width of the transition zone T is a critical parameter for this method [BA83].


An image consists of different spatial frequencies. Spatial frequency is a measure of how often a certain structure repeats per unit of distance. In image processing applications, the spatial frequency is often measured in lines per millimeter, and different frequencies convey different information about the appearance of a stimulus. High spatial frequencies represent abrupt spatial changes in the image, such as edges, and generally correspond to fine detail. Low spatial frequencies, on the other hand, represent global information about the shape and smooth areas like grass [Bar04].

So, to make a seamline really smooth, the various spatial frequencies of an image have to be joined differently. Figure 7.5 describes the merging of two images through a weighted average (Hl(i) for the left image and Hr(i) for the right image) within a transition zone T. If the transition zone is the same for every spatial frequency, the resulting image will be highly distorted: high spatial frequencies have to be joined in a smaller zone than low spatial frequencies like the grass, which have to be joined slowly and smoothly. Therefore, the image should be decomposed into a set of band-pass component images for the different spatial frequencies. A separate spline with an appropriately selected T can then be performed in each band. Finally, the splined band-pass components are recombined into the desired mosaic image [BA83].

7.3.1 Generating the Gaussian Pyramid

The key to a good overall result is to blend image features across a transition zone proportional in size to the spatial frequency of the features. This is accomplished by blending two images together, one spatial frequency level at a time. Each level uses a different blending mask or distance map. At the bottom level of the pyramid, a sharp blend mask is used so that high-frequency details are blended over a narrow region. At the top level, a wide blend mask is used so that low-frequency details are blended over a large region.

Figure 7.6: A one-dimensional graphical representation of the iterative REDUCE operation used in pyramid construction [BA83].

To do so, a sequence of low-pass filtered images G0, G1, ..., GN is obtained by repeatedly convolving a small weighting function with an image [BA83]. As figure 7.6 shows, G0 is the original image, and from there on the value of each node in the next level (G1 from G0, G2 from G1 and so on) is computed as a weighted average of a 5 x 5 subarray of the current level. Stacked on top of each other, the levels look like a pyramid.

Sample density and resolution decrease from level to level of the pyramid, which can be described for 0 < l < N by

\[
G_l(i, j) = \mathrm{REDUCE}[G_{l-1}(i, j)] = \sum_{m,n=1}^{5} w(m, n)\, G_{l-1}(2i + m, 2j + n) \tag{7.1}
\]

where i and j identify the pixel and w(m, n) is the pattern of weights used to generate each pyramid level. Figure 7.7 illustrates different levels and an expanded version of the top level, which clearly shows a really smooth transition zone for the lowest spatial frequency.
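In one dimension, the REDUCE operation can be sketched as below. Several points are assumptions of this sketch and not taken from the thesis: the weights 0.05, 0.25, 0.4, 0.25, 0.05 are one commonly used choice for the generating kernel, the kernel is indexed from -2 to 2 so that it is centered on sample 2i (equation 7.1 indexes it from 1 to 5), samples beyond the border are replaced by the nearest valid sample, and the name reduce_1d is hypothetical.

```c
/* Assumed 5-tap weighting function; the weights sum to 1. */
static const double W5[5] = {0.05, 0.25, 0.40, 0.25, 0.05};

/* One-dimensional REDUCE: every output sample is a weighted average of
 * five input samples centered on position 2 * i.
 * in:  n samples of the current level
 * out: receives (n + 1) / 2 samples of the next level
 * Returns the number of output samples. */
int reduce_1d(const double *in, int n, double *out)
{
    int n_out = (n + 1) / 2;

    for (int i = 0; i < n_out; ++i) {
        double sum = 0.0;
        for (int m = -2; m <= 2; ++m) {
            int k = 2 * i + m;
            if (k < 0) k = 0;          /* clamp at the left border */
            if (k > n - 1) k = n - 1;  /* clamp at the right border */
            sum += W5[m + 2] * in[k];
        }
        out[i] = sum;
    }
    return n_out;
}
```

Applying the function repeatedly to its own output yields the levels G1, G2 and so on of a one-dimensional Gaussian pyramid; the two-dimensional case applies the same weighting along rows and columns.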

Figure 7.7: (a) original image (sharpest); (b), (c) and (d) are intermediate levels; (e) is the top level for the smoothest transition; (f) is the mask (e), but scaled up to the original size using multiple EXPAND operations [Md04].

7.3.2 Generating the Laplacian pyramids

Images broken up into components based on spatial frequency are called Laplacian pyramids, which contain the highest spatial frequency components at the lowest level and the lowest spatial frequency components at the top level. Intermediate levels contain features gradually decreasing in one-octave steps in spatial frequency from high to low.


A Laplacian pyramid is made by repeatedly applying a high-pass filter to the image. The high-pass filter picks out all of the high spatial frequency components of the image and passes everything else to the next level. This process can be compared to subtracting each level of the pyramid from the next lowest level. Because these arrays differ in sample density, it is necessary to interpolate new samples between those of a given array before it is subtracted from the next lowest array [BA83]. Let G_{l,k} be the image obtained by expanding G_l k times. Then

\[
G_{l,0} = G_l \tag{7.2}
\]

and the interpolation can be described as

\[
G_{l,k}(i, j) = \mathrm{EXPAND}[G_{l,k-1}(i, j)] = 4 \sum_{m,n=-2}^{2} G_{l,k-1}\!\left(\frac{2i + m}{2}, \frac{2j + n}{2}\right) \tag{7.3}
\]

Only terms for which (2i + m)/2 and (2j + n)/2 are integers contribute to the sum. Expanding G_l l times yields G_{l,l}, which has the same size as the original image G_0.

L_0, ..., L_N are defined as a sequence of band-pass images, for 0 < l < N with

\[
L_l = G_l - \mathrm{EXPAND}[G_{l+1}] = G_l - G_{l+1,1}
\qquad \text{and} \qquad
L_N = G_N \tag{7.4}
\]
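The relation between EXPAND and a band-pass level can be illustrated in one dimension as follows. The sketch uses a common weighted formulation of EXPAND, in which the source index is (i - m)/2 and the weights are the assumed generating kernel 0.05, 0.25, 0.4, 0.25, 0.05; whether this matches the implementation in enblend is an assumption, and the border clamping and the names expand_1d and laplacian_level are hypothetical as well.

```c
/* Assumed 5-tap weighting function; the weights sum to 1. */
static const double W5[5] = {0.05, 0.25, 0.40, 0.25, 0.05};

/* One-dimensional EXPAND: interpolates n_out samples from the coarser
 * array in (n_in samples). Only terms with an integer source index
 * (i - m) / 2 contribute; the factor 2 compensates for the skipped
 * terms. Source indices beyond the border are clamped. */
void expand_1d(const double *in, int n_in, double *out, int n_out)
{
    for (int i = 0; i < n_out; ++i) {
        double sum = 0.0;
        for (int m = -2; m <= 2; ++m) {
            if ((i - m) % 2 != 0)
                continue;                    /* non-integer index */
            int k = (i - m) / 2;
            if (k < 0) k = 0;
            if (k > n_in - 1) k = n_in - 1;
            sum += W5[m + 2] * in[k];
        }
        out[i] = 2.0 * sum;
    }
}

/* Band-pass level following equation 7.4:
 * lap = g - EXPAND[g_next], where g has n and g_next n_next samples. */
void laplacian_level(const double *g, int n,
                     const double *g_next, int n_next, double *lap)
{
    expand_1d(g_next, n_next, lap, n);
    for (int i = 0; i < n; ++i)
        lap[i] = g[i] - lap[i];
}
```

Adding the expanded coarser level back to the band-pass level recovers the finer level, which is the property the summation step relies on.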

7.3.3 Summation and splining overlapped images

The final image is obtained by a combination of expanding and summing. Starting with the image at the top pyramid level, L_N is first expanded and added to L_{N-1} to recover G_{N-1}, and so forth. This can be written as [BA83]:

\[
G_0 = \sum_{l=0}^{N} L_{l,l} \tag{7.5}
\]

Figure 7.8: (a) highest spatial frequency; (b), (c) and (d) are intermediate levels; (e) is the top level for the smoothest transition, as (f) shows [Md04].

The complete algorithm consists of the following steps:

1. Construct Laplacian pyramids LA and LB for images A (left) and B (right).

2. If the center line for level l of the final image is at i = 2^{N-1}, then the final Laplacian pyramid LS is calculated by:

\[
LS_l(i, j) =
\begin{cases}
LA_l(i, j) & \text{if } i < 2^{N-1} \\
(LA_l(i, j) + LB_l(i, j))/2 & \text{if } i = 2^{N-1} \\
LB_l(i, j) & \text{if } i > 2^{N-1}
\end{cases}
\tag{7.6}
\]

3. The final image is then obtained by expanding and summing the levels of LS.
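Step 2 of the list above can be sketched for a single pyramid level as follows. The arrays are assumed to be stored row by row with i as the column index, the center column is passed in as a parameter, and the name splice_level is hypothetical.

```c
/* Combine one pyramid level of images A (left) and B (right) following
 * equation 7.6: columns left of the center take A, columns right of it
 * take B, and the center column takes the average of both.
 * la, lb, ls: row-major arrays of w x h samples; center: column index */
void splice_level(const double *la, const double *lb, double *ls,
                  int w, int h, int center)
{
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i) {
            int p = j * w + i;
            if (i < center)
                ls[p] = la[p];
            else if (i == center)
                ls[p] = (la[p] + lb[p]) / 2.0;
            else
                ls[p] = lb[p];
        }
}
```

The hard switch per level is what produces the frequency-dependent transition width: each level is spliced at its own sample density, so after expanding and summing the low-frequency levels blend over a wide zone and the high-frequency levels over a narrow one.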

7.4 Summary

In this chapter, a process for seamline placement and for mosaicking images to form a seamless true orthophoto was described. The NFT is a good foundation for the multiresolution spline, since it places the transition line as far as possible from the blindspots, so that the spline has enough space to fade out and feather the seamlines smoothly. The enblend [Md04] program contains an efficient implementation of this algorithm, which can be applied to large images.

Figure 7.9: (a) shows the two original images and (b) the final image. (e), (i), (m), (q), (u) are the Laplacian pyramid of image A. (f), (j), (n), (r), (v) are the Laplacian pyramid of image B. (d), (h), (l), (p), (t) are the Gaussian pyramid of the distance map. (c), (g), (k), (o), (s) are the Laplacian pyramid of the final image [Md04].

8 Experimentation and Evaluation

The previous chapters described the approach investigated in this thesis to generate true orthophotos. Throughout this chapter the developed application is tested for the city of Terrassa in Spain and for Vaihingen an der Enz in Germany. The problems are illustrated and commented on. The tested imagery was captured with the UltraCam X camera model or the DMC and has a ground resolution of either 8 cm or 25 cm. The software is tested with stereo-matching based and laser-scanned DEMs, and differences caused by the DEM are pointed out.

Due to the size and resolution of the images, only close-up results are shown in this thesis.

The true orthophotos in full resolution can be found on the companion CD.

Below an overview of the generated true orthophotos is shown.

(a) Terrassa: 34 million pixels. Pixel size: 0.25 m

(b) Vaihingen: 3.18 million pixels. Pixel size: 0.08 m


8.1 Performance

As mentioned in chapter 6, the feasibility of large-scale true orthophoto imagery depends on the computation time of the orthorectification. To evaluate the individual performance optimizations, the computation time for some reference images was recorded. The reference images had a size between 3.18 million pixels and 34.50 million pixels and a ground resolution of either 0.08 m or 0.25 m. Additionally, an image that covered an area of about 630 m x 1550 m with a ground resolution of 0.25 m and roughly 21.86 million pixels was raytraced for a comparison with the raytracing library of [Nie04]. Table 8.1 lists the individual computation times and points out that a performance gain of 20 times compared to [Nie04] was accomplished.

The experimentation was done on what is comparable to a standard laptop at the time of

writing this thesis. The overall specifications are:

Processor: Intel Centrino Core 2 Duo, 2.4 GHz

800 MHz Front Side Bus, 4MB L2 Cache

Memory: 2048 MB DDR2 RAM

Operating System: Ubuntu 9.10

Table 8.1 lists the processing time of some images depending on the optimizations activated. From left to right another optimization is applied on top:

Size                  Res      Master thesis   BBox      GMH      MT
3.18 million pixel    0.08 m   -               719 s     132 s    91 s
6.75 million pixel    0.25 m   -               2596 s    101 s    64 s
15.7 million pixel    0.25 m   -               -         424 s    256 s
21.86 million pixel   0.25 m   1 hr            -         615 s    371 s
34.50 million pixel   0.25 m   -               -         703 s    624 s

Size (the image size) and Res (the ground resolution of the image) describe the processed photograph. Column Master thesis contains the computation time of the raytracing library of [Nie04]. BBox is the computation time of the Bresenham algorithm plus bounding box, GMH is the computation time with global maximum height, and MT is the computation time with parallel processing and all optimizations activated. A dash marks a combination for which no time is given.


The Bresenham algorithm was tested without any optimization only for images with less than 7 million pixels, since the computation time was not feasible for any larger imagery. The biggest performance gain is achieved by means of global maximum heights. For a 3.18 million pixel image, the computation was 7 times faster, for a 6.75 million pixel image 26 times, with the gain increasing further for larger images. The performance gain due to the global maximum altitude grows faster than linearly with the image size, since both the number of pixels to trace and the number of tracing steps per pixel increase. Also of interest is the performance gain of the 7 million pixel image compared to the 3 million pixel image. It points out that the shape of the elevation model has a huge impact on the performance. If the difference between the global mean height and the global maximum height is small, raytracing is very fast, since only a few steps have to be done.

To test the orthorectification as well as the mosaicking with regard to the processing time of the true orthophoto generation, 15 images of Vaihingen an der Enz with a size of 3.18 million pixels and a ground resolution of 0.08 m were used. The processing time is given below:

Orthorectification and locating blindspots: 25 minutes (∼ 100 seconds per image)

Mosaicking and feathering: 2 minutes

8.2 Pros...

The overall result of the true orthophoto generation is very good. Narrow backyards are fully visible, all roof tops are moved to their correct position and no walls are visible. Even tall objects with large relief displacements are rectified correctly, and the large occluded areas are filled with data from other images. Figure 8.1a shows a tall rectified object. [Nie04] is based on 3D models which only consist of buildings and terrain, but do not include trees or cars. Stereo-matched DEMs even include trees and cars, provided they are visible in both source images of SGM. Therefore, trees are not distorted and cars are not cut through. Narrow backyards of Terrassa are also clearly visible, which in normal orthophotos is only the case for areas very close to the nadir point (figure 8.1b).

Figure 8.1: (a) Rectified tall building (50 meters tall); some occluded areas are left because too few perspectives are available. (b) Narrow backyards visible in a true orthophoto.

8.3 ...and cons

Some minor problems are still left in the true orthophoto generation. These remaining

errors are small and usually only noticeable if they are investigated.

Deviations in the DEM can cause poor rectifications. Terrain pixels might be treated as roof pixels, in which case they are also resampled to the roof's rectified position (figure 8.2a). The opposite is possible too: if the SGM process identifies a roof point incorrectly, the result is holes in the roof, as figure 8.2b illustrates.

Figure 8.2: (a) Terrain pixels in roof. (b) Holes in roof.

If sufficient overlap and perspectives are not available, occluded areas may remain occluded

in the final true orthophoto. This is a huge problem as illustrated below. Some city parts

of Terrassa have many occluded pixels, because only one image provides coverage here.

Figure 8.3: Remaining blindspots in one part of Terrassa, due to only one image coverage.

8.4 Using a simpler DEM

The developed application was also evaluated on other sets of source data. The true orthophoto generation based on a laser-scanned DEM points out the importance of a correct and accurate elevation model (figure 8.4). The visibility mask based on the DEM is correct, as figure 8.4a illustrates. But laser-scanned DEMs cut off sharp edges, so that objects on the terrain are modeled smaller than they really are. Therefore, the source image does not fit properly over the DEM, as figure 8.4b illustrates. The calculated visibility masks are smaller than they are supposed to be and roof parts are pulled down to the terrain level. On top of that, the "center" of the roof is not placed at the actual center of the building and walls are visible. The mosaicking process treats the pulled-down roof parts as correct data and might use these parts to fill occluded areas. The result is a poor true orthophoto that has plenty of ghost images left ??.

Figure 8.4: (a) The DEM merged with the visibility mask. (b) The displacement of the visibility mask. Because the DEM and the orthoimage come from different sources and the DEM has poor accuracy at sharp edges, objects tend to appear smaller.

8.5 Considering all images for Nearest Feature Transform

It was interesting to see what the mosaic pattern would look like if enblend [Md04]
considered all images at the same time instead of two at a time. As expected, the image
would be even more fragmented, because each pixel is taken from the image in which it is
farthest from any blindspot. The result could be poorer than with the current approach,
since even single pixels are taken from different images instead of whole areas. A master
image cannot be clearly identified, as figure 8.5 shows, but the order in which the images
are considered could be investigated. To reduce the fragmentation, including the distance
to nadir, as mentioned in [Nie04], could be helpful.
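The selection rule discussed in this section can be sketched in a few lines of Python. This is only an illustration and not the enblend implementation: it assumes that every rectified image comes with a boolean visibility mask (True for visible, False for blindspot), it uses a city-block distance computed by breadth-first search instead of an exact Euclidean distance transform, and it ignores the image borders. The function names blindspot_distance and select_sources are hypothetical.

```python
from collections import deque


def blindspot_distance(mask):
    """City-block distance of every pixel to the nearest blindspot.

    mask[r][c] is True for a visible pixel and False for a blindspot
    (assumed input format). Blindspot pixels get the distance 0.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search starting at all blindspots.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    # An image without blindspots: use a value larger than any real distance.
    far = rows + cols
    return [[far if d is None else d for d in row] for row in dist]


def select_sources(masks):
    """Per pixel, pick the index of the image that is farthest from its
    own blindspots; -1 marks pixels that are occluded in every image."""
    dists = [blindspot_distance(m) for m in masks]
    rows, cols = len(masks[0]), len(masks[0][0])
    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            best = max(range(len(masks)), key=lambda i: dists[i][r][c])
            row.append(best if dists[best][r][c] > 0 else -1)
        labels.append(row)
    return labels
```

Because the decision is made independently for every pixel, neighbouring pixels can easily end up with different source images, which is the fragmentation described above. Ties are resolved in favour of the image with the lowest index, so the order of the images matters in this sketch as well.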

8.6 Summary

The implemented methods were tested on different sets of source images and elevation
models. The overall result is good, and the small remaining errors were pointed out in this
chapter. The significant errors are all caused by limitations or inaccuracies of the DEM
used. Sufficient overlap and enough available perspectives are crucial to avoid remaining
occluded areas.

Figure 8.5: Illustration of a Nearest Feature Transform of 15 images.

9 Conclusion

The overall goal of this thesis was to investigate different methods for true orthophoto
generation and to implement an efficient method as part of the image processing software
XDibias. The crucial steps of true orthophoto generation are rectification, mosaicking
and feathering. The rectification was implemented as an easy-to-use module of XDibias,
but it can be used stand-alone as well. The rectified images are mosaicked and feathered
with enblend.

9.1 Evaluation

The rectification was done by raytracing from the surface model back to the camera,
taking the camera model into account and registering rays that were blocked by objects
in the surface model. Several methods were implemented to optimize the rectification
process. First, a bounding box based on the size and coordinates of the output image
and the DEM was determined, so that only points available in both datasets are traced.
Subsequently, while the required DEM points are imported into a data structure, the height
of each point in the bounding box is determined, and the global altitude maximum as well
as local altitude maxima for areas of predefined size are stored. The stored altitude
maxima speed up the raytracing: instead of tracing each ray all the way to the camera,
the ray is traced only up to the local altitude maximum in the best case and up to the
global altitude maximum in the worst case. The software is implemented for parallel
processing in order to distribute the large number of calculations across multi-core
systems. These optimizations are built around the simple but sufficient Bresenham
algorithm and reduce the raytracing time, compared to the raytracing library used in
[Nie04], from 1 hour to 5 minutes for a 22.7 million pixel image.
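The following Python sketch illustrates the idea of the occlusion test with an early stop. It is a simplified illustration and not the code of the developed application: the DEM is assumed to be a small list of lists of heights, the camera is assumed to sit at integer grid coordinates (row, column) with a height in the same unit, the camera model is reduced to a straight ray, and only the global altitude maximum is used as the stop height. The names bresenham, is_visible and visibility_mask are hypothetical.

```python
def bresenham(r0, c0, r1, c1):
    """Yield the grid cells on the line from (r0, c0) to (r1, c1)."""
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 >= r0 else -1
    sc = 1 if c1 >= c0 else -1
    err = dc - dr
    r, c = r0, c0
    while True:
        yield r, c
        if (r, c) == (r1, c1):
            return
        e2 = 2 * err
        if e2 > -dr:
            err -= dr
            c += sc
        if e2 < dc:
            err += dc
            r += sr


def is_visible(dem, r, c, cam, stop_height):
    """Trace from DEM cell (r, c) towards the camera and report whether
    the ray stays above the surface model."""
    cam_r, cam_c, cam_h = cam
    z0 = dem[r][c]
    total = max(abs(cam_r - r), abs(cam_c - c))  # number of Bresenham steps
    if total == 0:
        return True  # cell directly below the camera
    rows, cols = len(dem), len(dem[0])
    for step, (rr, cc) in enumerate(bresenham(r, c, cam_r, cam_c)):
        if step == 0:
            continue  # skip the start cell itself
        if not (0 <= rr < rows and 0 <= cc < cols):
            break  # the ray left the DEM, nothing can block it any more
        ray_h = z0 + (cam_h - z0) * step / total
        if ray_h > stop_height:
            break  # the ray is above the highest DEM point: early stop
        if dem[rr][cc] > ray_h:
            return False  # the surface model blocks the ray
    return True


def visibility_mask(dem, cam):
    """Visibility of every DEM cell, using the global maximum as stop height."""
    stop_height = max(max(row) for row in dem)
    return [[is_visible(dem, r, c, cam, stop_height)
             for c in range(len(dem[0]))] for r in range(len(dem))]
```

Replacing stop_height by the maximum of the tiles that the ray actually crosses would correspond to the local altitude maxima described above and would end most rays even earlier. Since every cell is tested independently of all others, the loop in visibility_mask is also the natural place for splitting the work between several processor cores.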

The method used in this thesis to mosaick the images relies on a pixel-by-pixel scoring
method. Seamline placement is done with the Nearest Feature Transform (NFT), since it
ensures that each pixel is taken from the image in which its distance to the blindspots
is largest. The experiments in chapter 8 show that NFT compensates for inaccuracies in the
surface model and provides a good foundation for the multiresolution splines.

Feathering and merging were based on multiresolution splines, which produced very good
overall image quality. Although no color-matching or histogram-matching algorithm was
used, radiometric differences are balanced and seamlines are not obvious. The merging was
done with transition zones of different widths depending on spatial frequency. High
spatial frequencies, such as sharp edges, were joined in a narrower zone than low spatial
frequencies, such as grass, which were joined gradually and smoothly. The mosaicking and
feathering methods were able to assign images in such a way that inaccuracies of the
surface model were less likely to cause problems in the final result.
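To make the frequency-dependent transition zones more concrete, the following Python sketch applies the multiresolution spline of [BA83] to one-dimensional signals. It is a simplified illustration and not the enblend code: it assumes signals of length 2^N + 1, the generating kernel (1, 4, 6, 4, 1)/16, clamped borders and a mask with the value 1 where the first signal should be used. All function names are hypothetical.

```python
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def reduce_level(sig):
    """REDUCE: low-pass filter and keep every second sample."""
    n = len(sig)
    out = []
    for i in range((n - 1) // 2 + 1):
        acc = 0.0
        for m, w in zip(range(-2, 3), KERNEL):
            idx = min(max(2 * i + m, 0), n - 1)  # clamp at the borders
            acc += w * sig[idx]
        out.append(acc)
    return out


def expand_level(sig, size):
    """EXPAND: interpolate a coarse level back to `size` samples."""
    n = len(sig)
    out = []
    for i in range(size):
        j = i // 2
        if i % 2 == 0:
            left = sig[max(j - 1, 0)]
            right = sig[min(j + 1, n - 1)]
            out.append((left + 6 * sig[j] + right) / 8)
        else:
            out.append((sig[j] + sig[min(j + 1, n - 1)]) / 2)
    return out


def gaussian_pyramid(sig, levels):
    pyr = [[float(v) for v in sig]]
    for _ in range(levels):
        pyr.append(reduce_level(pyr[-1]))
    return pyr


def laplacian_pyramid(sig, levels):
    """Band-pass levels plus the coarsest Gaussian level as the last entry."""
    g = gaussian_pyramid(sig, levels)
    lap = [[a - b for a, b in zip(g[k], expand_level(g[k + 1], len(g[k])))]
           for k in range(levels)]
    lap.append(g[-1])
    return lap


def blend(a, b, mask, levels):
    """Merge a and b: each frequency band is mixed with its own mask level."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)
    merged = [[m * x + (1 - m) * y for x, y, m in zip(la[k], lb[k], gm[k])]
              for k in range(levels + 1)]
    out = merged[-1]
    for k in range(levels - 1, -1, -1):  # collapse from coarse to fine
        up = expand_level(out, len(merged[k]))
        out = [u + l for u, l in zip(up, merged[k])]
    return out
```

The mask is smoothed once more on every pyramid level, so the coarse levels, which carry the low spatial frequencies, are mixed over a wide zone, while the finest level is mixed over only a few samples. Merging a signal with itself returns the signal unchanged, which is a quick plausibility check for such an implementation.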

The methods devised are considered applicable to large-scale true orthophoto mosaics.

9.2 Outlook

Improvements to the true orthophoto generation process can still be made. As pointed out
in the test results, the mosaicking process could be improved by combining the
distance-to-blindspots and nearest-to-nadir algorithms with an algorithm based not on a
pixel-score method but on an area-score method, so that a given occluded area is filled
with the largest locally coherent, visible area of one image instead of pixels from ten
different images.

For now, enblend merges only two images at a time. Considering all images for each
occluded pixel, or determining a merge sequence in which images with opposed or most
different perspectives are always merged together, could improve the mosaicking process.

Depending on the quality of the DEM and the number of viewing directions, some blindspots
might remain in the final image. Since the blindspots have the pixel value 0 and are
therefore painted black, they are very obvious. To hide the leftover blindspots, their
color could be interpolated from the adjacent pixels.
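One simple way to do this is sketched below in Python. The sketch is only one of many possible interpolation schemes and is not part of the developed application: it assumes a single-channel image stored as a list of lists, treats every pixel with the value 0 as a blindspot (so genuinely black pixels would be overwritten too) and fills the blindspots from the border inwards with the mean of their already known 8-neighbours. The function name fill_blindspots is hypothetical.

```python
def fill_blindspots(img, blind=0):
    """Fill blindspot pixels with the mean of their known 8-neighbours.

    Larger blindspots are filled ring by ring from the outside inwards.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    pending = {(r, c) for r in range(rows) for c in range(cols)
               if img[r][c] == blind}
    while pending:
        filled = {}
        for r, c in pending:
            vals = [out[nr][nc]
                    for nr in range(max(r - 1, 0), min(r + 2, rows))
                    for nc in range(max(c - 1, 0), min(c + 2, cols))
                    if (nr, nc) not in pending]
            if vals:
                filled[(r, c)] = sum(vals) / len(vals)
        if not filled:
            break  # no known pixel anywhere: nothing to interpolate from
        for (r, c), value in filled.items():
            out[r][c] = value
        pending.difference_update(filled)
    return out
```

For small leftover blindspots this hides the black pixels well enough, whereas large blindspots turn into smooth patches without any texture, so the approach only conceals missing information and does not restore it.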

Some imagery shows a strong contrast between sunlit and shadowed areas. Since the
exact time and position of each image are known, the shadows cast by buildings and
objects can be calculated by means of the elevation model. The scoring method could take
into account whether an image pixel is shadowed, or shadowed areas could be brightened by
histogram equalization or by matching their histograms with those of sunlit areas.

The rectification process is by far the slowest part of the true orthophoto generation.
Recently, the processors on 3D graphics cards, so-called Graphics Processing Units (GPUs),
have come into use for fast calculations in various fields as an alternative to
supercomputers. These processors are highly optimized for a small set of operations such
as floating-point calculations. With Compute Unified Device Architecture (CUDA) from
nVidia or Stream from AMD/ATI, GPUs can be used as co-processors, and since raytracing is
a core 3D graphics task, it may be possible to speed up the process significantly with
this hardware-based approach.

Up to now, an alternative to the true orthophoto was the creation of normal orthophotos
based on imagery with a large overlap of more than 80 percent. Using only the central part
of each image eliminates the worst relief displacements but still produces orthophotos
with significant relief displacements, especially in dense city areas. This method is, of
course, expensive in flight hours and image preprocessing, but most likely cheaper than
manually creating a detailed surface model first. With new technologies, higher resolutions
and more detail, formerly negligible subpixel-level displacements become pixel-level
displacements and could only be compensated with an even larger and more expensive overlap
of the imagery. This study devised a method to generate true orthophotos without expensive
3D surface models, using cheap, fully automatically generated stereo-matched elevation
models instead. This leads to the conclusion that, given their decreasing cost and the
demand for greater detail and accuracy, true orthophotos will displace normal orthophotos
in the near future.

Acknowledgements

At this point I would like to thank all the people who contributed with their support to
the success of my Bachelor thesis.

I am deeply indebted to Dr.-Ing. Pablo d’Angelo at the German Aerospace Center in

Oberpfaffenhofen for the offer, the mentoring and assistance during the development of

the thesis and the subsequent proofreading.

My thanks also go to Prof. Andreas Siebert PhD at the University of Applied Sciences
for mentoring and for providing the impetus for this thesis.

I want to thank all members of staff at the department of applied remote sensing cluster,
especially Dr. Peter Reinartz and Dr. Danielle Hoja, who made it possible for me to write
my Bachelor thesis at the German Aerospace Center. I also thank Dr. Danielle Hoja for
proofreading certain chapters.

Additionally, I would like to thank my American friend Amar H Patel M. Eng. for proofreading.

The greatest debt of appreciation I will forever owe my family, who always supported me

in my professional career.

References

[A+98] F. Amhar et al. The Generation of True Orthophotos Using a 3D Building Model in Conjunction With a Conventional DTM. IAPRS, Vol. 32, p. 16-22, 1998.

[BA83] P. J. Burt and E. H. Adelson. A multiresolution spline with application to image mosaics. ACM, 1983.

[Bar04] M. Bar. Visual objects in context. Nature Reviews Neuroscience 5, 2004.

[Bre65] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal 4, 1, 1965.

[C+01] J. M. Carstensen et al. Image analysis, vision and computer graphics. Technical University of Denmark, 2001.

[Cor04] Corbet. Scheduling domains, http://lwn.net/Articles/80911, 2004.

[d+09] P. d’Angelo et al. Towards automated DEM generation from high resolution stereo satellite images. Commission IV, 2009.

[dB+08] M. de Berg et al. Computational Geometry. 2008.

[E+00] S. Easa et al. Urban planning and development applications of GIS. ASCE Publications, 2000.

[EH01] S. M. Ervin and H. H. Hasbrouck. Landscape modeling: digital techniques for landscape visualization. McGraw-Hill Professional, 2001.

[Hir07] H. Hirschmueller. Stereo Processing by Semiglobal Matching and Mutual Information. IEEECS, 2007.

[K+04] Y. Kuzmin et al. Polygon-based True Orthophoto Generation. Proceedings of the 20th ISPRS congress: 405, 2004.

[KE02] M. Kasser and Y. Egels. Digital photogrammetry. CRC Press, 2002.

[Kra07] K. Kraus. Photogrammetry. Walter de Gruyter, 2007.

[M+01] E. M. Mikhail et al. Introduction to Modern Photogrammetry. Wiley and Sons Inc., 2001.

[Md04] A. Mihal and P. d’Angelo. http://enblend.sourceforge.net/details.htm, 2004.

[N+96] B. Nichols et al. Pthreads Programming. O’Reilly And Associates, 1996.

[NF02] O. Nommensen and M. Firuziaan. Parallel Processing via MPI & OpenMP. Linux Enterprise, 2002.

[Nie04] M. O. Nielsen. True orthophoto generation. Technical University of Denmark, 2004. Master’s thesis, Preparatory thesis.

[Nor96] J. Northrup. Programming with UNIX Threads. John Wiley And Sons, 1996.

[ST08] J. Shan and C. K. Toth. Topographic laser ranging and scanning: principles and processing. CRC Press, 2008.

[Tao09] J. Tao. Generierung von 3D-Oberflaechenmodellen aus stark ueberlappenden Bildsequenzen eines Weitwinkel-Kamerasystems. University of Stuttgart, 2009. Diploma thesis.

[Wik09] Wikipedia.org. http://de.wikipedia.org/wiki/Zeilenkamera, 2009.

[Wik10a] Wikipedia.org. http://de.wikipedia.org/wiki/Bresenham-Algorithmus, 2010.

[Wik10b] Wikipedia.org. http://de.wikipedia.org/wiki/Rasterung_von_Linien, 2010.

List of Abbreviations

CUDA Compute Unified Device Architecture

DEM Digital Elevation Model

DSM Digital Surface Model

DTM Digital Terrain Model

GCP Ground Control Point

GIS Geographic Information System

GPS Global Positioning System

GSD Ground Sample Distance

IMU Inertial Measurement Unit

MI Mutual Information

NFT Nearest Feature Transform

PPAC Principal Point of Autocollimation

PPBS Principal Point of Best Symmetry

RG Regular Raster Grid

SGM Semi-Global-Matching

TIN Triangulated Irregular Network

List of Figures

2.1 Distinction of orthographic and perspective projection

2.2 Cause of relief displacements in perspective images

2.3 Forward and backward projection

2.4 Relief displacements

2.5 Impact of flight altitude and distance to nadir point on relief displacements

2.6 Real-world example of relief displacement

2.7 Object stretching with forward projection due to occluded areas

2.8 Occluded areas

2.9 Possible seamline placement in some orthophotos

2.10 Combination of several images for full coverage

2.11 Imagery showing the different stages of true orthophoto generation

3.1 Camera types

3.2 Exterior orientation

4.1 Levels of DEM

4.2 Illustration of image overlapping

4.3 Grid example

4.4 DEM image

4.5 Delaunay Triangulation

4.6 TIN Examples

4.7 DEM Generation Process

5.1 General approach of the true ortho rectification process

5.2 Possible case for occluded pixels

5.3 Possible case of a mosaicked image

6.1 Raytracing of output pixels to camera in object space

6.2 Possible arrangements of DEM and Orthophoto

6.3 Different heights to trace to

6.4 Raytracing with local maximum heights

6.5 Slopes in different octants

6.6 Raster process with Bresenham

6.7 Multi-processing illustration

6.8 Illustration of orthorectification

6.9 Nearest Neighbor Transformation and Bilinear Interpolation

6.10 Nearest Neighbor Transformation and Bilinear Interpolation Example

7.1 Sunlight impact

7.2 Distance transformation

7.3 Image mosaic

7.4 Seamline feathering

7.5 Weighted Average

7.6 REDUCE operation

7.7 Gaussian pyramid

7.8 Laplacian pyramid

7.9 Multiresolution spline

8.1 Illustration of rectified tall building and narrow backyard

8.2 Impact of deviations in DEM

8.3 Remaining blindspots

8.4 Laser Scanner DEM

8.5 Mosaic of 15 images

A Appendix

A.1 Content of companion CD

For reference, the developed application and the current version of enblend [Md04] are
included on the companion CD. The true orthophotos used throughout the thesis and the
source code of the developed application can also be found on the enclosed CD.

Below is a listing of the contents:

Folder:              Description:

/dem imgs            Elevation models for raytracing

/enblend             Compiled version of enblend [Md04] plus a python script to prepare XDibias imagery for enblend.

/findocc             The software for raytracing and rectifying a surface model. A brief user guide can be found in Appendix ??.

/src                 Source code of findocc and enblend [Md04].

/src imgs            Source images for true orthophoto generation

/thesis              PDF version of this thesis.

/true orthophotos    True orthophotos in full resolution created using the findocc application and enblend [Md04]. Format: Tiff.

/vis masks           Rectified and raytraced images

A.2 Enblend user guide

To mosaick the images, the enclosed python script has to be used, since the XDib images
have to be converted to Tiff images and the script already sets the parameters for
enblend [Md04].

The shell command executed at the root directory of the CD is:

./enblend/enblend xdibias.py <output image> <vis masks/input image> <vis masks/input image>+

A.2.1 Raytracing user guide

To run the raytracing application without XDibias the following command has to be

executed in the shell at the root directory of the CD:

./findocc/findocc -i l=<src imgs/input image[,xs=,ys=,width=,height=]> -i l=<dem imgs/dem image> -o l=<output image>

It is possible to set a bounding box for the input image by appending the start
coordinates xs and ys as well as the width (width) and height (height) of the box to the
input image, so that only the box is rectified.
