Bachelor thesis
True orthophoto generation
Rupert Wimmer
March 25th, 2010
Abstract
In this Bachelor thesis, methods for generating true orthophoto imagery from aerial and satellite imagery based on digital elevation models are investigated and compared, and an application for generating true orthophotos is developed.
The term "True Orthophoto" refers to a generation process that tries to restore all occluded objects in aerial imagery while at the same time including as many objects as possible in the surface model.
Recent developments in digital image processing continue to increase the interest in orthophotos and create a demand for orthophotos of greater quality. However, occlusions due to rough terrain or significant differences in elevation lead to inconsistencies in accuracy and scale. True orthophotos eliminate these inconsistencies, but most existing approaches to generating true orthophotos require a 3D model of the desired area of the earth's surface, which is time-consuming and costly to produce. The German Aerospace Center takes another route and aims to generate true orthophotos quickly and cheaply from fully automatically generated elevation models.
The four general steps of the true orthophoto generation process are (1) rectification of the
source images and locating occluded areas, (2) seamline placement based on a distance-to-
blindspot algorithm, combined with (3) mosaicking and (4) feathering of seamlines with
multiresolution splines.
The overall goals of this thesis are to investigate the problems of orthophotos and to devise solutions in order to implement methods that can create true orthophoto imagery fully automatically. The results show that the investigated and implemented methods give reasonable results compared with other true orthophoto applications in terms of image quality and computation time. A performance gain by a factor of 10 is achieved.
Contents
Abstract
Contents
1 Introduction
1.1 Motivation
1.2 Problem definition
1.3 Outline and structure
1.3.1 General overview of the chapters
2 Orthophotos
2.1 Creating orthophotos
2.1.1 Reprojection
2.1.2 Mosaicking
2.2 Relief displacements
2.3 True orthophotos
2.4 Accuracy of orthophotos
2.5 Summary
3 The Camera Model
3.1 Interior orientation
3.2 Exterior orientation
3.3 Summary
4 Digital Elevation Models
4.1 Elevation models
4.2 Data collection for digital elevation models
4.3 Surface representation
4.3.1 Regular Raster Grid
4.3.2 Triangulated Irregular Network
4.4 DEM generation by stereo image matching
4.5 Summary
5 Design description
5.1 Limits of other true orthophoto applications
5.2 Creating true orthophotos - Step by step
5.2.1 Rectification
5.2.2 Locating occluded pixels
5.2.3 Seamline placement
5.2.4 Mosaicking
5.3 Implementation
6 Raytracing the elevation model
6.1 Data storage
6.2 Bounding box optimization
6.3 Global and local maximum heights
6.4 Raytracing with the Bresenham algorithm
6.5 Parallel processing
6.6 Rectification
6.7 Summary
7 Mosaicking
7.1 Mosaicking and Merging methods
7.2 Mosaicking by Nearest Feature Transform
7.3 Seamline feathering
7.3.1 Generating the Gaussian Pyramid
7.3.2 Generating the Laplacian pyramids
7.3.3 Summation and splinning overlapped images
7.4 Summary
8 Experimentation and Evaluation
8.1 Performance
8.2 Pros...
8.3 ...and cons
8.4 Using a simpler DEM
8.5 Considering all images for Nearest Feature Transform
8.6 Summary
9 Conclusion
9.1 Evaluation
9.2 Outlook
Acknowledgements
References
List of Abbreviations
List of Figures
A Appendix
A.1 Content of companion CD
A.2 Enblend user guide
A.2.1 Raytracing user guide
1 Introduction
This chapter gives an overview of the general purpose and objectives of this thesis. The motivation and goals of the project are presented, along with a brief description of the following chapters.
1.1 Motivation
Recent developments in digital image processing continue to increase the interest in digital, accurate, undistorted and true-to-scale images, so-called orthophotos, which are a very common part of spatial datasets. Orthophotos can be used to measure true distances and are commonly used for tasks that require more detail and timeliness than maps provide. The availability of higher-resolution imagery in turn creates a demand for greater quality and accuracy of orthophotos.
With today's high-resolution aerial photography, traditional orthophoto production provides only limited accuracy. Rough terrain or significant differences in elevation lead to inconsistencies in accuracy and scale with the normal orthophoto method, which cannot handle occlusions. These limitations can cause problems for users who are unaware of them and incorrectly use the orthorectified imagery as a true and accurate map.
The increasing detail of orthophotos makes these limitations more and more evident. The demand for greater quality and accuracy requires new methods and algorithms to overcome the limitations of normal orthophotos. Ever-increasing computer processing power makes it feasible to create true orthophotos on a large scale, and hence the German Aerospace Center wanted to extend its existing image processing software with true orthophoto generation to meet the demand for accurate true orthophotos.
Various researchers have recently investigated true orthophoto generation, but most of their approaches require a manually created 3D model of the desired area of the earth's surface, whose production is very time-consuming and costly. To meet the demand for fast and cheap true orthophotos, this study takes another route and works with fully automatically generated elevation models.
1.2 Problem definition
The aim of this bachelor thesis is the design and implementation of software for true orthophoto generation. The generation process, based on aerial or satellite images and digital elevation models, is supposed to be as automatic as possible. The overall goals of this thesis are:
- Devise a method to create true orthophotos.
- Investigate problems and solutions for generating orthophotos.
- Implement methods, optimized for quality and computing time, that can create true orthophoto imagery fully automatically.
- Evaluate the solutions through tests.
1.3 Outline and structure
This bachelor thesis is partially based on the master thesis of M. O. Nielsen [Nie04]. Whenever that thesis is referenced, its important results are presented here, so this thesis can be read without prior knowledge of [Nie04].
The first chapters cover the basic theory of generating orthophotos and how they differ from true orthophotos. Next, methods to create true orthophotos are introduced. The key steps are explained, tested and evaluated independently in the following chapters.
During the work on this thesis, a software module for producing true orthophotos was developed as part of the existing image processing software XDibias, and the existing software enblend [Md04] was used for mosaicking. The developed software can be found on the companion CD.
1.3.1 General overview of the chapters
Chapter 2, Orthophotos: Introduces the concept of orthophotos and the procedure to create them. Afterwards, the concept is extended to true orthophotos and the differences are pointed out. Finally, the accuracy of orthophotos is analyzed.
Chapter 3, The Camera Model: The mathematical model for the interior and exterior orientation of a camera lens system, which is important for true orthophoto generation, is presented.
Chapter 4, Digital Elevation Models: The basic concept of digital surface models and the different model representations are described. A description is given of stereo-matched elevation models, which are used in this project.
Chapter 5, Design description: A step-by-step method to create true orthophotos is in-
vestigated and specified.
Chapter 6, Raytracing the elevation model: Efficient methods for tracing rays between the camera and the surface model are developed in this chapter. Since processing large aerial images requires a tremendous number of calculations, performance is an important issue.
Chapter 7, Mosaicking: Methods for seamline placement and feathering are presented,
tested and evaluated.
Chapter 8, Experimentation and Evaluation: The implemented method is tested on a dataset. Pros and cons are illustrated with close-ups, and the results are discussed.
Chapter 9, Conclusion: This chapter reviews the entire thesis, summarizes it and draws the final conclusions. In addition, it presents suggestions for future work and performance optimizations.
2 Orthophotos
A photograph shows an image of the world projected through a perspective center onto the image plane. Because of this so-called central projection, and because aerial images are normally taken vertically, objects at the same horizontal position but at different heights are placed at different positions in the photograph (figure 2.1). As a result of these relief displacements, objects at a high position (and consequently closer to the camera) appear larger in the photograph and occlude objects at a lower height.
Figure 2.1: Illustrating the difference between orthographic and perspective projection
Aerial and satellite images are often combined with spatial data in Geographic Information Systems (GIS), used as reference maps in city planning, or used as part of realistic terrain visualizations in flight simulators. For these purposes, the images have to be corrected for topographic relief, lens distortion and camera tilt, and recalculated with an underlying Digital Elevation Model (DEM). All of this is done in the orthorectification process, which tries to eliminate the perspective effects of the image by computing an orthogonal projection for every single point of the image instead of projecting the rays through one point onto the image plane. The orthophoto is true to scale and referenced to the world coordinate system, and can consequently function as an uninterpreted map. Orthophotos are highly up to date, can be merged into one large photo of an enormous area, and can be generated more often than typical topographic maps because of their low cost.
Figure 2.2: Illustrating the cause of relief displacements [M+01]
2.1 Creating orthophotos
The orthophoto generation process requires knowledge of the terrain as well as of the camera model and the camera position and orientation during exposure. The position and orientation can be determined in several ways; the most common is to use digital cameras with direct georeferencing by GPS and IMU measurements (see chapter 3). Another way is photogrammetry, which provides algorithms known as bundle adjustment to minimize the errors of an image and to determine the required parameters. A further, now obsolete, way to extract the parameters is to fit the image manually over some known Ground Control Points (GCPs) without considering the camera model (sampling). GCPs relate unique points in the source images to points on the terrain whose positions are known, for example from a GIS. GCPs are typically used in bundle adjustment, too.
2.1.1 Reprojection
Reprojection is the first step of orthophoto rectification, in which rays are reprojected from the image onto the terrain model. The reprojection can be done in two ways: forward or backward reprojection.
The forward method projects the source image back onto the terrain (figure 2.3). The intersection point (X, Y, Z) of the projection with the terrain is then stored in the orthophoto. If the upper left corner of the orthoimage is placed at (X0, Y0), the pixel coordinates of a point in the orthoimage are:

$$
\begin{bmatrix} \text{column} \\ \text{row} \end{bmatrix}
= \frac{1}{GSD} \begin{bmatrix} X - X_0 \\ Y_0 - Y \end{bmatrix}
\qquad (2.1)
$$

where GSD is the Ground Sample Distance, i.e. the pixel size on the ground and consequently the distance from one pixel center to the next. The equation also accounts for the fact that the Y axis of the world coordinate system points upwards (north), whereas the Y axis of the pixel coordinate system points downwards.

Figure 2.3: Main principle of forward and backward projection [Nie04]
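Equation 2.1 translates directly into code. The Python sketch below is illustrative only: the function name `world_to_pixel` and the example coordinates are hypothetical, and it returns fractional pixel coordinates without rounding, since the text does not fix a pixel-center convention.

```python
# Illustrative sketch; function name and example values are hypothetical.
def world_to_pixel(X, Y, X0, Y0, gsd):
    """Equation 2.1: map world coordinates (X, Y) to orthoimage (column, row).

    (X0, Y0) is the world position of the upper left corner and gsd the
    Ground Sample Distance. World Y points north while pixel rows count
    downwards, hence Y0 - Y in the row term.
    """
    column = (X - X0) / gsd
    row = (Y0 - Y) / gsd
    return column, row


# Hypothetical example: 25 cm GSD, upper left corner at (1000, 2000)
print(world_to_pixel(1010.0, 1995.0, 1000.0, 2000.0, 0.25))  # (40.0, 20.0)
```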
With forward projection, regularly spaced points in the source image are projected to a set of irregularly spaced points on the terrain. To store them, these points have to be interpolated into the regular pixel array of a digital image. This interpolation is the reason why backward projection is preferred: instead of projecting a point of the source image onto the terrain, a pixel of the output orthoimage is projected back into the source image. In this case, the interpolation is done in the source image, which is easier to implement, and it can be done right away for each output pixel. In addition, only the pixels actually needed for the orthophoto are reprojected.
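Backward projection can be sketched as a loop over output pixels that samples the source image at a non-integer position. The sketch assumes bilinear interpolation (the text does not name an interpolation method) and a hypothetical callback `to_source` standing in for equation 2.2, the DEM lookup and the camera model of chapter 3.

```python
# Illustrative sketch; function names and example values are hypothetical.
import math


def bilinear(img, x, y):
    """Sample img[row][col] at the fractional position (x=column, y=row).

    Returns None outside the image. Bilinear interpolation is an assumption
    made for this sketch.
    """
    rows, cols = len(img), len(img[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return None
    c0, r0 = math.floor(x), math.floor(y)
    c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
    fx, fy = x - c0, y - r0
    top = (1 - fx) * img[r0][c0] + fx * img[r0][c1]
    bottom = (1 - fx) * img[r1][c0] + fx * img[r1][c1]
    return (1 - fy) * top + fy * bottom


def backward_rectify(src, out_rows, out_cols, to_source):
    """Fill every output pixel by projecting it back into the source image.

    to_source(row, col) -> (x, y) is a hypothetical stand-in for equation
    2.2, the DEM lookup and the camera model.
    """
    return [[bilinear(src, *to_source(r, c)) for c in range(out_cols)]
            for r in range(out_rows)]


src = [[0, 10], [20, 30]]
print(backward_rectify(src, 1, 1, lambda r, c: (0.5, 0.5)))  # [[15.0]]
```

Because the loop runs over output pixels, every orthophoto pixel receives exactly one value and no gaps have to be filled afterwards.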
For the backward projection, the row/column coordinate of a pixel of the orthophoto needs to be converted to the world coordinate system. The Z coordinate is then found at this position in the terrain model. The pixel-to-world transformation is:

$$
\begin{bmatrix} X \\ Y \end{bmatrix}
= \begin{bmatrix} X_0 \\ Y_0 \end{bmatrix}
+ GSD \cdot \begin{bmatrix} \text{column} \\ -\text{row} \end{bmatrix}
\qquad (2.2)
$$
To identify the point in the source image that corresponds to the X, Y, Z coordinate found in this way, the camera needs to be modeled. A description of the camera model and the equations needed for this calculation can be found in chapter 3.
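A minimal Python sketch of equation 2.2 followed by the Z lookup. It assumes, hypothetically, that the DEM is stored as a 2D list on exactly the same grid as the orthophoto (same corner and GSD), so the DEM does not need to be resampled; the function name and values are illustrative.

```python
# Illustrative sketch; function name, DEM layout and values are hypothetical.
def pixel_to_world(column, row, X0, Y0, gsd, dem):
    """Equation 2.2 plus a Z lookup for one orthophoto pixel.

    dem is a hypothetical 2D list dem[row][column] assumed to share the
    orthophoto grid, so no DEM resampling is needed.
    """
    X = X0 + gsd * column
    Y = Y0 - gsd * row  # world Y grows northwards, rows grow downwards
    Z = dem[row][column]
    return X, Y, Z


dem = [[5.0, 6.0], [7.0, 8.0]]
print(pixel_to_world(1, 1, 1000.0, 2000.0, 0.5, dem))  # (1000.5, 1999.5, 8.0)
```

The resulting (X, Y, Z) triple is what the camera model then maps into the source image.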
2.1.2 Mosaicking
Orthophotos often cover large areas and therefore require the rectification of several source images, which are merged afterwards. This process is called mosaicking and involves several steps:
- Seamline generation
- Color matching
- Feathering and dodging of the seamlines
The line along which the images are stitched together is called a seamline and can be generated either automatically or manually. The aim is to join the images where they look very similar, so that in the best case the seamlines are not noticeable. Manual seamlines are often placed along road centerlines. There are several ways to place seamlines automatically. The simplest is to place them along the center of the overlap. Another way is to subtract the images from each other and place the line along the minimum difference between the two images, a so-called least-cost trace [Nie04].
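The least-cost trace can be illustrated with a small dynamic-programming sketch. This is one possible formulation, not necessarily the one used in [Nie04]: the cost of a pixel is the absolute difference between the two overlapping images, and the seam runs from top to bottom, moving at most one column between consecutive rows.

```python
# Illustrative sketch; function name and test images are hypothetical.
def least_cost_seam(img_a, img_b):
    """Top-to-bottom seam through two co-registered, equally sized images.

    Pixel cost is |a - b|; the seam may shift by at most one column per row.
    Returns one column index per row.
    """
    rows, cols = len(img_a), len(img_a[0])
    cost = [[abs(img_a[r][c] - img_b[r][c]) for c in range(cols)]
            for r in range(rows)]
    acc = [cost[0][:]]   # accumulated cost of the best path ending here
    back = [[0] * cols]  # column in the previous row of that path
    for r in range(1, rows):
        acc_row, back_row = [], []
        for c in range(cols):
            best_cost, best_col = min((acc[r - 1][k], k)
                                      for k in (c - 1, c, c + 1) if 0 <= k < cols)
            acc_row.append(cost[r][c] + best_cost)
            back_row.append(best_col)
        acc.append(acc_row)
        back.append(back_row)
    c = min(range(cols), key=lambda k: acc[-1][k])  # cheapest end point
    seam = [c]
    for r in range(rows - 1, 0, -1):                # trace back to the top
        c = back[r][c]
        seam.append(c)
    return seam[::-1]


a = [[0, 0, 0, 0]] * 3
b = [[5, 0, 5, 5], [5, 5, 0, 5], [5, 5, 0, 5]]
print(least_cost_seam(a, b))  # [1, 2, 2]
```

The seam follows the zero-difference pixels, i.e. the places where the two images look identical.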
To create a high-quality orthophoto, the mosaicked images should have the same color and brightness near the seamlines in order to conceal them. Several techniques can be used to hide seamlines. Color matching and dodging try to remove the radiometric differences between the images by analyzing and comparing the overlapping sections. Feathering tries to hide the remaining differences by making a smooth transition that gradually fades from one image to the other.
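As a rough illustration of color matching and feathering, the sketch below uses two deliberately simple techniques chosen for clarity: a gain/offset correction that matches the mean and standard deviation of the overlap, and a linear feathering ramp across the overlap of one image row. Both function names are hypothetical; according to the abstract, the thesis itself feathers seamlines with multiresolution splines.

```python
# Illustrative sketch; function names and example values are hypothetical.
import statistics


def match_mean_std(values, src_overlap, ref_overlap):
    """Gain/offset color matching: rescale `values` of the source image so
    that its overlap has the mean and standard deviation of the reference."""
    mu_s, sd_s = statistics.mean(src_overlap), statistics.pstdev(src_overlap)
    mu_r, sd_r = statistics.mean(ref_overlap), statistics.pstdev(ref_overlap)
    gain = sd_r / sd_s if sd_s > 0 else 1.0
    return [(v - mu_s) * gain + mu_r for v in values]


def feather_row(left, right, start, end):
    """Linear feathering of two equally long rows overlapping in [start, end).

    Before the overlap only `left` is used, after it only `right`; inside,
    the weight of `right` rises linearly from pixel center to pixel center.
    """
    out = []
    for c, (a, b) in enumerate(zip(left, right)):
        if c < start:
            w = 0.0
        elif c >= end:
            w = 1.0
        else:
            w = (c - start + 0.5) / (end - start)
        out.append((1.0 - w) * a + w * b)
    return out


print(feather_row([100] * 6, [200] * 6, 2, 4))
# [100.0, 100.0, 125.0, 175.0, 200.0, 200.0]
```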
2.2 Relief displacements
Figure 2.4: Relief displacements [Nie04]
Relief displacements are caused by the central projection; for aerial images they depend on the flight altitude, and for satellite images the earth curvature plays a role as well. At the nadir point there are no relief displacements, but they increase with the distance from the nadir. In addition, errors in the elevation model cause horizontal errors through "uncontrolled" relief displacements. The horizontal error ∆hor (relief displacement) can be found through a geometric analysis of the vertical offset ∆ver (the height of a building or object), the flight altitude H above the base of the object, the distance rt from the image center and the camera constant f, as illustrated in figure 2.4. With D denoting the horizontal distance between the nadir point and the object, the following relation is derived from this figure:

$$
\frac{f}{r_t} = \frac{H}{D + \Delta_{hor}} = \frac{H - \Delta_{ver}}{D}
\qquad (2.3)
$$

Equating the last two terms gives ∆hor = ∆ver · D / (H − ∆ver); substituting D / (H − ∆ver) = rt / f from the first equality and isolating ∆hor results in:
$$
\Delta_{hor} = \frac{r_t \cdot \Delta_{ver}}{f}
\qquad (2.4)
$$
Figure 2.5 illustrates that a higher flying altitude results in smaller relief displacements. A real-world example is shown in figure 2.6.
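Equations 2.3 and 2.4 can be combined into a small worked example. The numbers (a 100 mm camera constant and an object 500 m from the nadir and 70 m tall, loosely echoing the building in figure 2.6) are hypothetical and only illustrate the trend shown in figure 2.5.

```python
# Illustrative sketch; function names and example values are hypothetical.
def radial_distance(D, H, d_ver, f):
    """From equation 2.3: f / r_t = (H - d_ver) / D, so r_t = f * D / (H - d_ver)."""
    return f * D / (H - d_ver)


def relief_displacement(r_t, d_ver, f):
    """Equation 2.4: horizontal error in ground units (r_t and f share a unit)."""
    return r_t * d_ver / f


f_mm, D, d_ver = 100.0, 500.0, 70.0  # hypothetical camera and object
for H in (1000.0, 3000.0):
    r_t = radial_distance(D, H, d_ver, f_mm)
    print(H, round(relief_displacement(r_t, d_ver, f_mm), 1))
# 1000.0 37.6
# 3000.0 11.9
```

Tripling the flying height over the same ground point reduces the displacement from about 37.6 m to about 11.9 m.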
Figure 2.5: For images taken with the same kind of lens, the relief displacements decrease with a higher altitude, but increase with the distance to the nadir point [Nie04].
Figure 2.6: The two images are taken from roughly the same position but at different altitudes and with different lenses. The building is about 70 meters tall, and the relief displacements differ significantly due to the flight altitudes and lenses [Nie04].
2.3 True orthophotos
Orthophotos are usually created using a base earth elevation model and do not consider occlusions. However, with rapid changes in elevation, the resulting relief displacements of tall buildings can be so large that the buildings occlude the terrain and objects next to them (figure 2.8).
Figure 2.7: Object stretching with forward projection due to occluded areas
At the German Aerospace Center, the image processing software XDibias is used. It consists of several modules for almost any kind of image processing. The orthophoto module is one of them; it uses forward projection to generate a normal orthophoto based on a DEM. This approach can handle different types of cameras, but leads to unwanted stretched objects (figure 2.7) in occluded areas, resulting from interpolation in the orthophoto. This interpolation is necessary because occluded areas, caused by the different heights of objects, leave gaps in the orthophoto.
In this project, backward projection is used because it avoids interpolation in the orthophoto and simplifies the detection of occluded areas. Backward projection rectifies buildings and objects back to their true position, but also leaves a "copy" of the object on the terrain. This copy, a so-called "ghost image", is caused by a lack of information: rays are projected back from the elevation model both to the occluded area and to the occluding object, without detecting that occluded data is being rectified. Therefore, the "wrong" image data is placed in the occluded areas, as illustrated in figure 2.11b. A true orthophoto also reprojects the source images over a digital elevation model, but takes occluded areas into account and fills them with data from other images by means of raytracing, seamline placement and mosaicking.
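Detecting occluded data amounts to checking whether the ray from a DEM cell to the projection center is blocked by the terrain. The sketch below uses a simple uniform-sampling ray march on a hypothetical DEM grid; it is not the Bresenham-based raytracing that the contents list for chapter 6, and the step size and the "no occluders outside the DEM" rule are assumptions.

```python
# Illustrative sketch; function name, step size and DEM are hypothetical.
import math


def is_visible(dem, col, row, cam, step=0.5):
    """Ray-march from DEM cell (col, row) towards the projection center.

    dem  -- 2D list of heights, dem[row][col]
    cam  -- (x, y, z): projection center in grid units (x, y) and height (z)
    step -- hypothetical sampling distance along the ray, in cells
    Returns False as soon as the terrain rises above the ray.
    """
    z0 = dem[row][col]
    cx, cy, cz = cam
    dx, dy = cx - col, cy - row
    n = max(1, math.ceil(math.hypot(dx, dy) / step))
    for i in range(1, n + 1):
        t = i / n
        c = math.floor(col + t * dx + 0.5)  # nearest DEM cell
        r = math.floor(row + t * dy + 0.5)
        if not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
            break  # left the DEM: assume nothing further can occlude
        if dem[r][c] > z0 + t * (cz - z0):
            return False
    return True


dem = [[0, 0, 0, 0, 0, 10, 0, 0, 0, 0]]  # one row, a 10 m "building" at col 5
cam = (20, 0, 20)
print([is_visible(dem, c, 0, cam) for c in range(10)])
# [False, False, False, False, False, True, True, True, True, True]
```

Cells behind the building (as seen from the camera) are flagged as occluded; these are exactly the pixels that would otherwise receive ghost-image data.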
An orthophoto is considered "true" when the generation process tries to restore all occluded objects while at the same time including as many objects as possible in the surface model. Including everything that is visible in the source images, such as vegetation, people, cars and traffic lights, in the surface model would be an impossible task. In general, true orthophotos are therefore based on surface models that include only terrain, buildings and bridges. A similar definition is found in [A+98]:
Figure 2.8: Because of the perspective projection, rapid elevation changes and tall buildings hide objects next to them.
[...] the term true orthophotos is generally used for an orthophoto where surface
elements that are not included in the digital terrain model are also rectified to
the orthogonal projection. Those elements are usually buildings and bridges.
A different definition, which defines the true orthophoto solely by the removal of ghost-image artifacts, is found in [K+04]:
[...] the term "True Ortho" means a processing technique to compensate for
double mapping effects caused by hidden areas. It is possible to fill the hidden
areas by data from overlapping aerial photo images or to mark them by a
specified solid color.
Figure 2.10: Combination of several images for full coverage
To restore the occluded areas, or blindspots, correctly and to fill them automatically with data, imagery of these areas is required. This supplemental information can be obtained from pictures of the same area taken from different perspectives (figure 2.10). These pictures show the occluded areas, and by combining them, full coverage can be achieved. Aerial images are typically captured with sufficient overlap, as illustrated in figure 2.9. However, seamlines have to be generated for every blindspot, which results in a significantly larger number of seamlines than in regular orthophotos. The consequence is a high demand on the mosaicking process and on a good color-matching algorithm, since the match must fit around all of these numerous seamlines.

Figure 2.9: Possible seamline placement in some orthophotos
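How several visibility masks combine into full coverage can be sketched with a simple priority rule: each pixel takes its value from the first image that sees it. This rule and the function name are hypothetical simplifications; the abstract states that the thesis places seamlines with a distance-to-blindspot algorithm instead.

```python
# Illustrative sketch; function name, priority rule and masks are hypothetical.
def combine_visibility(masks):
    """Pick, for each pixel, the first image (in priority order) that sees it.

    masks -- list of equally sized 2D lists of bools, True = visible.
    Returns (source, holes): source[r][c] is an image index or None, and
    holes counts pixels that no image covers.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    source = [[None] * cols for _ in range(rows)]
    holes = 0
    for r in range(rows):
        for c in range(cols):
            for k, mask in enumerate(masks):
                if mask[r][c]:
                    source[r][c] = k
                    break
            else:  # no image sees this pixel
                holes += 1
    return source, holes


left = [[True, False, True]]   # blindspot in the middle
right = [[False, True, True]]  # sees the middle from another perspective
print(combine_visibility([left, right]))  # ([[0, 1, 0]], 0)
```

Every change of the source index between neighboring pixels (here 0 to 1 and back) marks a place where a seamline has to run, which is why blindspots multiply the number of seamlines.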
Before choosing true orthophoto generation over ordinary orthophoto generation, a decision based on the characteristics of the data has to be made. For images taken at a high altitude with a small scale or low resolution, true orthophoto generation makes no sense, because relief displacements at the subpixel level, or even displacements of 2-3 pixels, do not really matter. Consequently, true orthophoto generation is only worthwhile for highly detailed or low-altitude images, for tall buildings and rough terrain, or for off-nadir images, which are often captured by high-resolution satellites. The type of lens matters as well: normal-angle lenses produce smaller relief displacements than wide-angle lenses. Sideways-looking satellite images, especially of mountains, are a further interesting application for true orthophoto generation.
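The decision can be made concrete by evaluating equation 2.4 at the image edge and dividing by the GSD to obtain the worst-case displacement in pixels. The tolerance of 3 pixels, the function names and the two example cases are hypothetical, inspired by the "2-3 pixels" remark above.

```python
# Illustrative sketch; function names, threshold and cases are hypothetical.
def max_displacement_px(r_max, d_ver, f, gsd):
    """Equation 2.4 at the image edge (radial distance r_max), in pixels.

    r_max and f share a unit (e.g. mm); d_ver and gsd share a unit (e.g. m).
    """
    return r_max * d_ver / f / gsd


def true_ortho_worthwhile(r_max, d_ver, f, gsd, tolerance_px=3.0):
    """Hypothetical rule of thumb: displacements up to tolerance_px pixels
    do not really matter."""
    return max_displacement_px(r_max, d_ver, f, gsd) > tolerance_px


print(true_ortho_worthwhile(50.0, 20.0, 100.0, 0.08))  # True  (about 125 px)
print(true_ortho_worthwhile(50.0, 1.0, 100.0, 2.0))    # False (0.25 px)
```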
2.4 Accuracy of orthophotos
The accuracy of an orthophoto depends on several parameters. Orthophotos are derived from other data and therefore depend on the quality of this data, namely:
- the quality and resolution of the source images,
- the inner and outer orientation of the images, and
- the accuracy of the digital elevation model.

Figure 2.11: a) Original source image. The building has not yet been moved to its correct position. b) Image orthorectified with the existing orthophoto generation process. The building is rectified to its correct position, but a "ghost image" is left at the source position. c) Image with visibility mask. d) True orthophoto with merged imagery.
The general visual quality of a true orthophoto mainly depends on the source images.
Some of the parameters that affect the quality of the images are:
- the weather, which cannot be influenced,
- the quality of the camera and lens, and
- the resolution, precision and overall quality of the digital scanning (if film is used).
Nowadays, the cameras and lenses used for mapping are of very high quality, with resolutions of up to 100 megapixels and a ground resolution of eight centimeters per pixel. For this project, the imagery was taken with digital cameras and has a resolution of either 8 or 25 centimeters per pixel. The errors of the inner orientation of these aerial cameras are negligible, and for the outer orientation the deviation is at most about 1 pixel because of bundle adjustment, transformations and different data sources. This project works with a DEM derived from stereo-processed imagery. The advantage of such a DEM is that it is generated from the source images themselves and consequently matches them very well for true orthophoto generation. In addition, stereo-matched DEM generation is very cheap compared with most previous true orthophoto generation approaches [Nie04], which use manually modelled buildings.
According to equation 2.4, inaccuracies due to a poor DEM increase linearly with the distance from the nadir point, so a constant error cannot be assumed for orthophotos. (For example, when the surface is measured with a laser, vertical errors in the DEM may occur at sharp edges such as roofs, which are not hit exactly and therefore do not return the correct altitude.) Ordinary orthophotos often use only the central part of each image; this is possible because neighboring images overlap, and, more importantly, it removes most of the "uncontrolled" relief displacement effect. However, for true orthophotos it is difficult to give a good overall estimate of the mean accuracy, because they are normally heavily mosaicked; the accuracy depends on the final mosaic pattern.
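One method to estimate a mean standard deviation integrated over the entire image area is given by [Nie04]:

$$
\sigma_{dg} = \frac{\Delta_{ver}}{f} \cdot \sqrt{\frac{a^2 + b^2}{3}}
\qquad (2.5)
$$

Here the square root is the root mean square of the radial distance rt over a rectangle with side lengths 2a and 2b, so equation 2.5 averages equation 2.4 over that area.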
For ordinary orthophotos, a and b describe the clipping area, i.e. the smaller central part that is used. For a true orthophoto, the effective area is much larger: the sides of the whole image have lengths 2a and 2b. Since the edges of an image are as likely to be used in the final mosaic as its central part, it is not possible to predict a good measure of the standard deviation before the mosaicking process.
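The square-root term in equation 2.5 can be checked numerically: averaging the squared displacement of equation 2.4 over a 2a × 2b rectangle with a fine midpoint grid should reproduce the closed form. The function names and example values (1 m vertical DEM error, f = 100 mm, half-sides of 50 mm) are hypothetical.

```python
# Illustrative sketch; function names and example values are hypothetical.
import math


def sigma_dg(d_ver, f, a, b):
    """Equation 2.5: mean standard deviation over an image of size 2a x 2b."""
    return d_ver / f * math.sqrt((a * a + b * b) / 3.0)


def sigma_numeric(d_ver, f, a, b, n=200):
    """Midpoint-rule check: root mean square of equation 2.4 over the rectangle."""
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * 2 * a / n
        for j in range(n):
            y = -b + (j + 0.5) * 2 * b / n
            total += (d_ver / f) ** 2 * (x * x + y * y)
    return math.sqrt(total / (n * n))


print(round(sigma_dg(1.0, 100.0, 50.0, 50.0), 3))       # 0.408
print(round(sigma_numeric(1.0, 100.0, 50.0, 50.0), 3))  # 0.408
```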
2.5 Summary
In this chapter, the concept of orthophotos and the steps needed to generate them were introduced, and, most importantly, the cause and the problem of relief displacements before and after orthorectification were explained. Next, true orthophotos were defined as an orthorectification that determines occluded areas and fills them by mosaicking with overlapping imagery. Finally, the accuracy of orthophotos and true orthophotos was described, and the difficulty of estimating this accuracy was pointed out.
3 The Camera Model
To work with remote sensing imagery, for instance to merge images into a large mosaic or for cartographic purposes, a relation to a world coordinate system is required. Therefore, the light rays need to be modelled in order to trace them from the object space to the image plane or the other way around. Knowledge of the orientation and position of the camera and of its inner geometry is needed to accomplish this raytracing. The camera model is normally split into two sets of parameters: the interior and the exterior orientation.
The relationship between the image coordinates ξ and η of an image point P′ and the coordinates X, Y, Z of an object point P is illustrated in figure 3.2 and can generally be formulated as:

$$
(\xi, \eta) = f(X, Y, Z)
\qquad (3.1)
$$

where f here denotes a general mapping function, not the camera constant f introduced below.
Ideally, aerial images would be taken with a pinhole camera, which lets light in through a very small hole and projects an image of the world, scaled down by the factor f/H, onto a surface at the back of the camera (figure 3.1a). The distance from the pinhole to the back of the camera is f (also known as the focal length or the camera constant [Nie04]), and H is the distance from the pinhole to the imaged object. However, the smaller the pinhole, and hence the better the resolution, the longer the exposure time. The exposure time can reach several hours, which makes pinhole cameras practically unusable for most types of photography, and especially for aerial images.
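For an ideal, perfectly vertical pinhole camera, equation 3.1 reduces to the scale factor f/H described above. The sketch assumes a positive image plane whose axes are parallel to the world X and Y axes, the principal point at the origin and no lens distortion; the function name and the example values are hypothetical, and camera rotations (the exterior orientation) are ignored.

```python
# Illustrative sketch; function name, axis convention and values are hypothetical.
def project_vertical(X, Y, Z, X0, Y0, Z0, f):
    """Ideal pinhole camera looking straight down from (X0, Y0, Z0).

    Assumes a positive image plane with axes parallel to the world X/Y axes,
    the principal point at the origin and no lens distortion. Returns the
    image coordinates (xi, eta) in the unit of f.
    """
    depth = Z0 - Z
    if depth <= 0:
        raise ValueError("point must lie below the projection center")
    scale = f / depth  # the f/H scale factor of the pinhole camera
    return scale * (X - X0), scale * (Y - Y0)


# Hypothetical: f = 100 mm, camera 1000 m above flat ground
print(project_vertical(500.0, 0.0, 0.0, 0.0, 0.0, 1000.0, 100.0))  # (50.0, 0.0)
```

Projecting the top of a 70 m object at the same ground position (Z = 70) gives a larger ξ than its base, which is exactly the relief displacement of chapter 2.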
Like the pinhole camera, the push broom scanner is a technology for obtaining images with optical cameras. It is usually used for passive remote sensing from space. A push broom sensor uses a line of sensors arranged perpendicular to the flight direction of the spacecraft. Different areas of the surface are imaged as the spacecraft flies forward, and the single lines are subsequently merged into a two-dimensional picture (figure 3.1b) [Wik09].
The parametric description of cameras is given in the interior orientation section below, and the exterior orientation section at the end of this chapter describes algorithms that eliminate distortions caused by the orientation and position of the camera as well as by the earth curvature.
Figure 3.1: (a) Pinhole camera: rays simply pass through the hole without being bent, which makes the camera simple to model and results in a clear image. (b) Pushbroom camera: single lines of the earth surface along the ground track are imaged through the optics onto a line array while the spacecraft flies forward, and are merged into a two-dimensional picture afterwards.
3.1 Interior orientation
The interior orientation was a very important part of the camera model for analog cameras and early digital cameras. With new developments and technologies, manufacturers offer camera systems with negligible distortions due to interior orientation. The camera systems used for this study, UltraCam X or DMC, are highly accurate and implement algorithms and procedures that correct the already small remaining distortions on the fly. They thereby provide imagery for which only the exterior orientation has to be considered, so only a brief description of the interior orientation is given here. It is described in more detail in [Kra07] and [Nie04].
Within a camera system, distortions may occur due to the lens, the focal length and the offset between the principal point in the image plane and the image center. To eliminate them, the interior orientation of the camera has to be known. These three parameters are specific to the camera and are normally determined by the manufacturer in the laboratory or during test flights.
The center of the photograph, also referred to as the fiducial center, is found by intersecting the lines between opposite pairs of fiducial marks. The Principal Point is given with respect to the center of the photograph. The manufacturer ensures that the fiducial center coincides as closely as possible with the Principal Point (ξ0 = η0 = 0), also known as the Principal Point of Autocollimation (PPAC), so that the origin of the image coordinate system is the center of the image plane.
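The fiducial center described above is the intersection of two lines, each running through an opposite pair of fiducial marks. A minimal C sketch, assuming the mark coordinates have already been measured in image coordinates; the type and function names are hypothetical:

```c
typedef struct { double x, y; } Point2;

/* Intersect the line through the fiducial marks a1-a2 with the line through
 * the opposite pair b1-b2 (standard two-line intersection formula).
 * Returns 0 if the lines are (nearly) parallel, 1 otherwise. */
int fiducial_center(Point2 a1, Point2 a2, Point2 b1, Point2 b2, Point2 *center)
{
    double denom = (a1.x - a2.x) * (b1.y - b2.y) - (a1.y - a2.y) * (b1.x - b2.x);
    if (denom > -1e-12 && denom < 1e-12)
        return 0;
    double da = a1.x * a2.y - a1.y * a2.x;   /* cross product of first pair */
    double db = b1.x * b2.y - b1.y * b2.x;   /* cross product of second pair */
    center->x = (da * (b1.x - b2.x) - (a1.x - a2.x) * db) / denom;
    center->y = (da * (b1.y - b2.y) - (a1.y - a2.y) * db) / denom;
    return 1;
}
```

For marks at the corners (0, 0), (2, 2) and (0, 2), (2, 0), the sketch returns the center (1, 1).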
If the image space rays are not parallel to the incoming object space rays, this is caused by distortion in the lens. The distortion consists of several components, of which the radial distortion is usually the largest. The radial distortion can be described by an odd polynomial: by measuring several points in the image, a set of distortions is obtained with respect to the distance to the Principal Point of Best Symmetry (PPBS), which is the origin of radial distortions and is located very close to the fiducial center and the PPAC.
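To make the odd polynomial concrete, the sketch below assumes the common three-term form Δr = k1 r³ + k2 r⁵ + k3 r⁷; the number of terms and the coefficients are assumptions, since the thesis does not specify them. It applies a first-order correction that moves a measured point towards the PPBS, evaluating the polynomial at the measured radius, which is an approximation for small distortions. The names are hypothetical.

```c
/* Radial distortion as an odd polynomial in the radial distance r from the
 * principal point of best symmetry (PPBS):
 *     dr(r) = k1*r^3 + k2*r^5 + k3*r^7
 * k1..k3 are hypothetical calibration coefficients. */
typedef struct { double k1, k2, k3; } RadialModel;

double radial_distortion(const RadialModel *m, double r)
{
    double r2 = r * r;
    return r * r2 * (m->k1 + r2 * (m->k2 + r2 * m->k3));
}

/* First-order removal of radial distortion from the measured image point
 * (xi, eta); (x0, y0) is the PPBS. The point is shifted towards the PPBS
 * by dr along the radial direction. */
void correct_radial(const RadialModel *m, double x0, double y0,
                    double *xi, double *eta)
{
    double dx = *xi - x0, dy = *eta - y0;
    double r2 = dx * dx + dy * dy;
    if (r2 == 0.0)
        return;                 /* no radial direction at the PPBS itself */
    /* dr / r = k1*r^2 + k2*r^4 + k3*r^6, which avoids a square root */
    double scale = m->k1 * r2 + m->k2 * r2 * r2 + m->k3 * r2 * r2 * r2;
    *xi  -= dx * scale;
    *eta -= dy * scale;
}
```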
The camera constant f is determined during the calibration process as well and is the
length that produces a mean overall distribution of lens distortion [Kra07]. The focal
point is therefore located directly above PPBS at a distance corresponding to the focal
length. [Nie04]
3.2 Exterior orientation
To be able to reconstruct the rays, the geometry of the image forming system must be known. The exterior orientation of a camera specifies the orientation and position of the camera in the object space and can be determined in several ways. The method used in this project is based on a Global Positioning System (GPS) combined with an Inertial Measurement Unit (IMU), which is highly accurate and fast. The GPS, for example, provides an absolute position in the object space every second or faster, and the IMU measures the orientation of the camera. Control points [Kra07], for which both the image coordinates and the object coordinates are known, and bundle adjustment can be used to further increase the accuracy of the exterior and interior orientation. Frame cameras like UltraCam X have one exterior orientation per image, whereas line cameras have an exterior orientation for each line, since the satellite moves while the single lines are acquired.
Figure 3.2: Relation between image and object coordinates. [Kra07]
With O (X0, Y0, Z0) as the position of the perspective center (camera location) of a three-dimensional bundle of rays, PP as the principal point with coordinates (ξ0, η0), f as the focal length and M as the fiducial center, the relation between the camera space (ξ, η) and the object space (X, Y, Z) consists of a scale, a translation and a rotation in three dimensions (ω, φ, κ). These operations are expressed in the collinearity equations [Kra07]:
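$$\begin{aligned}
\xi &= \xi_0 - f\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}\\
\eta &= \eta_0 - f\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}
\end{aligned} \tag{3.2}$$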
The parameters r_ik appearing in equation 3.2 are the elements of the rotation matrix R, which describes the three-dimensional attitude, or orientation, of the image with respect to the XYZ object coordinate system. The individual values of R, and how to determine them, depend on the GPS/IMU system used.
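Equation 3.2 translates directly into code. The following C sketch assumes that R is already available as a 3x3 array with R[i][k] holding r_(i+1)(k+1), since how R is obtained depends on the GPS/IMU system; the function name and argument order are illustrative only.

```c
/* Collinearity equations (3.2): project the object point (X, Y, Z) into
 * image coordinates (xi, eta). R[i][k] holds r_(i+1)(k+1).
 * Returns 0 if the point lies in the plane through the projection center
 * parallel to the image plane (zero denominator), 1 otherwise. */
int collinearity_project(const double R[3][3],
                         double X0, double Y0, double Z0,   /* projection center O */
                         double xi0, double eta0, double f, /* interior orientation */
                         double X, double Y, double Z,
                         double *xi, double *eta)
{
    double dX = X - X0, dY = Y - Y0, dZ = Z - Z0;
    double den = R[0][2] * dX + R[1][2] * dY + R[2][2] * dZ;  /* r13, r23, r33 */
    if (den == 0.0)
        return 0;
    *xi  = xi0  - f * (R[0][0] * dX + R[1][0] * dY + R[2][0] * dZ) / den; /* r11, r21, r31 */
    *eta = eta0 - f * (R[0][1] * dX + R[1][1] * dY + R[2][1] * dZ) / den; /* r12, r22, r32 */
    return 1;
}
```

With R set to the identity, a camera at (0, 0, 1000), ξ0 = η0 = 0 and f = 0.1, the object point (10, 20, 0) is projected to (ξ, η) = (0.001, 0.002).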
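If one coordinate in the object coordinate system is known, in equation 3.3 the height Z, the reverse calculation from the camera to the object coordinate system can be done with equation 3.3 [Kra07]:
$$\begin{aligned}
X &= X_0 + (Z - Z_0)\,\frac{r_{11}(\xi - \xi_0) + r_{12}(\eta - \eta_0) - r_{13} f}{r_{31}(\xi - \xi_0) + r_{32}(\eta - \eta_0) - r_{33} f}\\
Y &= Y_0 + (Z - Z_0)\,\frac{r_{21}(\xi - \xi_0) + r_{22}(\eta - \eta_0) - r_{23} f}{r_{31}(\xi - \xi_0) + r_{32}(\eta - \eta_0) - r_{33} f}
\end{aligned} \tag{3.3}$$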
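The reverse direction, equation 3.3, can be sketched the same way. Again R[i][k] = r_(i+1)(k+1), and the known coordinate is assumed to be the height Z, for example taken from a DEM; the function name is hypothetical.

```c
/* Inverse relation (3.3): intersect the image ray through (xi, eta) with
 * the horizontal plane at the known height Z. R[i][k] holds r_(i+1)(k+1).
 * Returns 0 if the ray is parallel to that plane, 1 otherwise. */
int image_to_object(const double R[3][3],
                    double X0, double Y0, double Z0,   /* projection center O */
                    double xi0, double eta0, double f, /* interior orientation */
                    double xi, double eta, double Z,
                    double *X, double *Y)
{
    double u = xi - xi0, v = eta - eta0;
    double den = R[2][0] * u + R[2][1] * v - R[2][2] * f;   /* r31, r32, r33 */
    if (den == 0.0)
        return 0;
    *X = X0 + (Z - Z0) * (R[0][0] * u + R[0][1] * v - R[0][2] * f) / den;
    *Y = Y0 + (Z - Z0) * (R[1][0] * u + R[1][1] * v - R[1][2] * f) / den;
    return 1;
}
```

With R set to the identity, a camera at (0, 0, 1000), f = 0.1 and Z = 0, the image point (0.001, 0.002) is mapped back to (X, Y) = (10, 20), which is consistent with equation 3.2.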
3.3 Summary
The focus of this chapter was on the camera model in general and on its two main parts, the exterior and interior orientation, in detail. The two orientations are mandatory for an accurate trace of rays from the object space to the camera, through the lens and onto the image plane. With the two orientations known, the distortion of the lens can be removed.
4 Digital Elevation Models
Information on the geometric shape and altitude of the objects contained in the source images is mandatory for the orthorectification process. The imagery and the knowledge of the camera model describe the orientation of the camera during the exposure and the distortions within the images, but to determine occluded areas, a georeferenced model of the earth surface that includes the altitudes of the objects is required, with which the rays of the camera can be intersected.
4.1 Elevation models
A digital elevation model is a mathematical representation of an existing or virtual object and its environment, in the case of this thesis the earth surface. According to [KE02], a DEM is a generic concept that may refer to ground elevation but also to any layer above the ground, such as vegetation, bridges or buildings.
Depending on the usage of an elevation model, there are different levels of detail. When
the information is limited to ground elevation, the DEM is called a Digital Terrain Model
(DTM) and only provides information about the elevation of any point on ground or water
surface.
If each pixel contains the highest elevation at that point, whether on the ground or above it, the DEM is called a Digital Surface Model (DSM). Simple DSMs only contain the roof edges and ignore roof constructions. More advanced DSMs give a more exact representation of the surface by also considering chimneys and the ridges of the roof.
An even more advanced surface model that considers eaves and details on the walls would require terrestrial photogrammetry, which would be very expensive and is only necessary for 3D imagery. For true orthophotos, these details are unimportant, because only the topmost object is visible in orthogonal aerial images and needed for this project. If, for example, a roof covers a balcony below it, the balcony will not be visible in a correct true orthophoto. Figure 4.1 illustrates these different types of surface models.
Figure 4.1: Four levels of detail of surface models: (a) terrain, (b) roof edges, (c) roof ridges, chimneys and edge of eaves, (d) wall details and eaves.
4.2 Data collection for digital elevation models
This section gives a brief description of the numerous ways in which digital elevation models may be prepared. They are frequently obtained by remote sensing rather than by direct survey. One powerful and common technique for generating DEMs is scanning the earth surface with a laser: based on the time-of-flight principle, the laser illuminates a point on the earth surface, and the distance to the object point is derived from the travel time of the reflected light. Further explanations can be found in [ST08]. Alternatively, stereoscopic pairs of images can be employed using the digital image correlation method. Two optical images are acquired from different angles during the same pass of an airplane or an earth observation satellite. Analog camera images normally have 30 percent sidelap and 60 percent forward overlap, as figure 4.2a illustrates. For dense city areas this coverage is often not enough for stereo matching, due to the lack of available perspectives. With digital cameras and their easy and convenient way of taking images, 60-80 percent sidelap and 60-80 percent forward overlap have become standard. Consequently, every point (except in the corners marked in figure 4.2b) is covered by several stereoscopic image pairs, and its coordinates can be derived from the exposure positions known from GPS/IMU and the known camera angles. Therefore, a significantly cheaper elevation model can be generated. Older methods for generating DEMs often involve interpolating digital contour maps that may have been produced by direct survey of the land surface or by manual stereo plotting of aerial imagery.

Figure 4.2: (a) 30 percent sidelap and 60 percent forward overlap. (b) 60 percent sidelap and 60 percent forward overlap. The arrows indicate the flight direction.
The quality of a digital elevation model is a measure of how accurate the elevation is at each pixel (absolute accuracy) and how accurately the morphology is represented (relative accuracy). Numerous factors play an important role for the quality of DEM-derived products:
- terrain roughness,
- sampling density,
- grid resolution / pixel size,
- interpolation algorithm (e.g. for vegetation),
- point location accuracy,
- grid structure.
4.3 Surface representation
For data processing purposes, the DEM has to be represented in a way that allows the information of each pixel to be read easily and quickly, since raytracing involves a large number of picture points. Two surface representations are well known and most common: the Regular Raster Grid (RG) and the Triangulated Irregular Network (TIN).
4.3.1 Regular Raster Grid
According to [KE02], one of the main advantages of RGs is that they have the geometry of an image, where the pixels are the nodes of the regular raster grid (figure 4.3) and the gray values of the pixels represent the elevations. Therefore, for data size and reading performance reasons, grids should preferably be stored as images. The transformation from the image coordinates of pixel (i, j) to the corresponding 3D coordinates (x, y, z) can be expressed as:
$$x = i\,\Delta x + x_0, \qquad y = j\,\Delta y + y_0, \qquad z = f(i, j) \tag{4.1}$$
Figure 4.3: Raster image of a grid
f(i, j) is the height at pixel (i, j). (x0, y0) are the spatial coordinates of the image's first pixel (first row and column). (∆x, ∆y) are the spatial sampling of the grid, or grid size, along the x and y axes respectively. These simple calculations are the major advantage of RGs, because the correct grid point is located simply and quickly instead of interpolating between a triangle's vertices as with TINs [M+01]. The benefit of fast and simple calculations due to regularly spaced grid cells is also the main limitation of RGs. The grid has only one height at any point, and therefore rapid elevation changes cannot occur within one grid cell. Consequently, the accuracy depends on the spatial sampling. In addition, flat terrain areas are split into several grid cells instead of being merged into one large cell.
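Equation 4.1 and its inverse can be sketched in C as follows. The Grid structure, the row-by-row storage of the heights and the rounding to the nearest grid node are assumptions made for this illustration, not the XDibias data format.

```c
/* Regular raster grid, equation (4.1). Hypothetical layout: heights are
 * stored row by row, height[j * ni + i] = f(i, j). */
typedef struct {
    double x0, y0;        /* spatial coordinates of pixel (0, 0) */
    double dx, dy;        /* grid size along x and y */
    int ni, nj;           /* number of grid nodes along i and j */
    const float *height;  /* f(i, j) */
} Grid;

/* Pixel (i, j) -> 3D coordinates (x, y, z), equation (4.1). */
void grid_to_world(const Grid *g, int i, int j, double *x, double *y, double *z)
{
    *x = i * g->dx + g->x0;
    *y = j * g->dy + g->y0;
    *z = g->height[j * g->ni + i];
}

/* Inverse: world (x, y) -> nearest grid node, found with one division per
 * axis instead of a search. Returns 0 if (x, y) lies outside the grid. */
int world_to_grid(const Grid *g, double x, double y, int *i, int *j)
{
    double fi = (x - g->x0) / g->dx + 0.5;   /* +0.5 rounds to nearest */
    double fj = (y - g->y0) / g->dy + 0.5;
    if (fi < 0.0 || fj < 0.0)
        return 0;
    int ii = (int)fi, jj = (int)fj;          /* truncation equals floor here */
    if (ii >= g->ni || jj >= g->nj)
        return 0;
    *i = ii;
    *j = jj;
    return 1;
}
```

The inverse direction shows why RGs are fast to query: locating the grid point for a world position costs two divisions, whereas a TIN first has to find the enclosing triangle.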
Figure 4.4: DEM image, where the gray level is related to the altitude of the pixel (dark is low altitude, in this case 280 m; white is high altitude, in this case 320 m).
4.3.2 Triangulated Irregular Network
Another method of representing geographical features is a triangulated irregular network, which connects irregularly spaced and located spot elevations in an area with lines (edges) to form a continuous system of triangles [E+00]. Each point (vertex) is connected to at least three other points. The network is most commonly generated by Delaunay triangulation, a method that attempts to ensure an efficient triangulation by connecting each point only to its nearest neighbors (figure 4.5). This approach maximizes the minimum angle over all triangles instead of creating long and narrow triangles. To handle abrupt changes in the surface, such as cliffs, either a very dense network is needed (figure 4.6) or a modified algorithm that is capable of dealing with breaklines, which supplement the points of the surface with lines. Breaklines are placed along edges in the terrain, and a constraint is added to the algorithm that prevents the edges of the triangles from crossing them.
Having triangles and irregularly spaced points overcomes both disadvantages of RGs. Since data points are placed irregularly, they only need to be collected where there is variation in the terrain. Over a large, relatively flat or low-slope area, only a few points serve to describe the form; in areas of greater relief and steeper, changing slopes, points can be measured and stored more densely. A further benefit of TINs is that important points such as local high points and peaks, or low points, stream centerlines, etc. can be measured and incorporated into the model.

Figure 4.5: In a correct Delaunay triangulation (b), the circumcircle of a triangle does not contain any other points. Therefore, (a) is not a valid Delaunay triangulation.
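The empty-circumcircle property shown in figure 4.5 is usually tested with the standard in-circle determinant. The C sketch below is a plain floating-point version with hypothetical names; robust triangulators often use exact or adaptive arithmetic for this predicate, which is omitted here.

```c
typedef struct { double x, y; } Vertex;

/* Twice the signed area of triangle (a, b, c); > 0 if counterclockwise. */
static double orient2d(Vertex a, Vertex b, Vertex c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Returns 1 if d lies strictly inside the circumcircle of triangle (a, b, c).
 * A triangle belongs to a Delaunay triangulation only if no other point of
 * the set passes this test (empty-circumcircle criterion). */
int in_circumcircle(Vertex a, Vertex b, Vertex c, Vertex d)
{
    double adx = a.x - d.x, ady = a.y - d.y;
    double bdx = b.x - d.x, bdy = b.y - d.y;
    double cdx = c.x - d.x, cdy = c.y - d.y;
    double ad = adx * adx + ady * ady;
    double bd = bdx * bdx + bdy * bdy;
    double cd = cdx * cdx + cdy * cdy;
    /* 3x3 determinant, expanded along the first row */
    double det = adx * (bdy * cd - bd * cdy)
               - ady * (bdx * cd - bd * cdx)
               + ad  * (bdx * cdy - bdy * cdx);
    /* The determinant's sign flips with the triangle's orientation. */
    return orient2d(a, b, c) > 0 ? det > 0 : det < 0;
}
```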
Figure 4.6: An example of a simple TIN without breaklines (a) and a surface with breaklines (b). [EH01]
The downsides of TIN models are, firstly, that the data structure describing the triangulation is relatively complex. Tables of points, lines and faces have to be maintained, points have to be linked up to lines (edges), and lines into faces (triangles). In addition, the z elevation value has to be interpolated for a given (x, y), since that position most likely lies inside a triangle rather than on a vertex. Secondly, TINs cannot handle vertical objects, since this requires more than one height per point; a local height for the triangle and a global height for calculations would be needed.
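The interpolation of the z value mentioned above is commonly done linearly on the plane of the enclosing triangle, using barycentric weights. A minimal C sketch, assuming the enclosing triangle has already been found (locating it in the TIN data structure is the expensive part and is not shown); the names are hypothetical:

```c
typedef struct { double x, y, z; } TinVertex;

/* Interpolate the elevation at (x, y) inside triangle (a, b, c) with
 * barycentric weights, i.e. linearly on the triangle's plane.
 * Returns 0 if (x, y) lies outside the triangle or the triangle is
 * degenerate, 1 otherwise. */
int tin_interpolate(TinVertex a, TinVertex b, TinVertex c,
                    double x, double y, double *z)
{
    double det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
    if (det == 0.0)
        return 0;
    double wa = ((b.y - c.y) * (x - c.x) + (c.x - b.x) * (y - c.y)) / det;
    double wb = ((c.y - a.y) * (x - c.x) + (a.x - c.x) * (y - c.y)) / det;
    double wc = 1.0 - wa - wb;
    const double eps = 1e-12;   /* tolerate points lying on an edge */
    if (wa < -eps || wb < -eps || wc < -eps)
        return 0;
    *z = wa * a.z + wb * b.z + wc * c.z;
    return 1;
}
```

For the triangle (0, 0, 100), (10, 0, 110), (0, 10, 120), which lies in the plane z = 100 + x + 2y, the point (2, 3) is interpolated to 108.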
4.4 DEM generation by stereo image matching
The requirement for this project was a true orthophoto generation that is easy, fast and as fully automated as possible. To meet this requirement, the true orthophotos generated with the application developed in this project are based on regular raster grids. The software is independent of the elevation model generation, but works best with digital elevation models generated from stereo satellite or aerial imagery. The benefit of this approach is that the same source images are used for the DEM generation and consequently match perfectly for the orthorectification and visibility mask generation. The DEM generation process consists of the following main steps, which are implemented as parts of XDibias [d+09].
1. Stereo matching in epipolar geometry
2. Forward intersection and outlier removal
3. Interpolation and orthorectification
According to [Tao09], the stereo matching is done pixelwise with Semi-Global Matching (SGM), using Mutual Information (MI) to compensate for radiometric differences between the input images. MI is a cost function that provides a pixelwise probability for every possible gray value combination, indicating how well these gray values correspond in the stereo images. This pixelwise cost, however, is generally ambiguous, and wrong matches can easily have a lower cost than correct ones, for instance due to noise. Therefore, an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The pixelwise cost and the smoothness constraints are expressed by defining the energy E(D), which depends on the disparity image D [Hir07]:
$$E(D) = \sum_{p}\Big( C(p, D_p) + \sum_{q \in N_p} P_1\, T\big[|D_p - D_q| = 1\big] + \sum_{q \in N_p} P_2\, T\big[|D_p - D_q| > 1\big] \Big) \tag{4.2}$$
The first term is the sum of all pixel matching costs for the disparities of D. The second term adds a constant penalty P1 for all pixels q in the neighborhood Np of p for which the disparity changes by exactly one. The third term adds a larger constant penalty P2 for all larger disparity changes [Tao09]. T[·] equals 1 if its argument is true and 0 otherwise.
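To illustrate equation 4.2, the following C sketch evaluates E(D) for a given disparity image and cost volume. It assumes a 4-neighborhood for N_p and a cost volume layout chosen for this example. It only evaluates the energy; SGM itself approximates the minimization of E(D) by aggregating costs along several 1D paths, which is not shown here.

```c
#include <stdlib.h>

/* Energy E(D) of equation (4.2) for a disparity image D (w x h, row-major)
 * and a pixelwise matching cost volume with the assumed layout
 * cost[(y * w + x) * ndisp + d], where 0 <= D[i] < ndisp.
 * N_p is taken to be the 4-neighborhood. As in the formula, each
 * neighboring pair is visited from both sides. */
double sgm_energy(const int *D, const double *cost, int w, int h, int ndisp,
                  double P1, double P2)
{
    static const int nx[4] = { 1, -1, 0, 0 };
    static const int ny[4] = { 0, 0, 1, -1 };
    double E = 0.0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int dp = D[y * w + x];
            E += cost[(y * w + x) * ndisp + dp];   /* C(p, D_p) */
            for (int k = 0; k < 4; k++) {
                int qx = x + nx[k], qy = y + ny[k];
                if (qx < 0 || qy < 0 || qx >= w || qy >= h)
                    continue;
                int jump = abs(dp - D[qy * w + qx]);
                if (jump == 1)
                    E += P1;                       /* small disparity change */
                else if (jump > 1)
                    E += P2;                       /* larger disparity change */
            }
        }
    }
    return E;
}
```

For a 3x1 image with disparities (0, 1, 3), matching costs 1, 2 and 3, P1 = 10 and P2 = 100, the sketch returns 6 + 2·10 + 2·100 = 226.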
Figure 4.7: Stereo matching results from aerial UltraCam X images. (a) Small part of an aerial image. (b) Disparity against one image. (c) Reprojected disparity. (d) Merged reprojection.
After the stereo matching, the disparity is reprojected into a cartographic coordinate system (figure 4.7c). The reprojections of all disparity images are merged using a median filter (figure 4.7d). Occlusions, matching failures or moved objects lead to holes in the merged DEMs, which are filled by inverse distance weighted interpolation. SGM offers a good trade-off between reconstruction quality and computation speed.
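The merging and hole-filling steps can be sketched in C as follows. The NAN marking of holes, the square search window and the inverse-square weights are assumptions for this illustration; the thesis only states that a median filter and inverse distance weighting are used.

```c
#include <math.h>
#include <stdlib.h>

/* Holes (occlusions, matching failures) are marked with NAN. */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of the valid (non-NAN) heights of the n reprojected DEMs at one
 * cell. vals is used as scratch space and reordered. Returns NAN if no
 * DEM has a value there. */
double merge_median(double *vals, int n)
{
    int m = 0;
    for (int k = 0; k < n; k++)
        if (!isnan(vals[k]))
            vals[m++] = vals[k];
    if (m == 0)
        return NAN;
    qsort(vals, m, sizeof *vals, cmp_double);
    return (m % 2) ? vals[m / 2] : 0.5 * (vals[m / 2 - 1] + vals[m / 2]);
}

/* Fill every NAN cell of a w x h DEM by inverse distance weighted
 * interpolation from the valid cells within a square window of the given
 * radius, using weights 1 / d^2. Values are read from src and written to
 * dst, so filled cells do not feed back into other holes. */
void fill_holes_idw(const double *src, double *dst, int w, int h, int radius)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double v = src[y * w + x];
            if (!isnan(v)) { dst[y * w + x] = v; continue; }
            double sw = 0.0, sv = 0.0;
            for (int j = y - radius; j <= y + radius; j++)
                for (int i = x - radius; i <= x + radius; i++) {
                    if (i < 0 || j < 0 || i >= w || j >= h) continue;
                    double s = src[j * w + i];
                    if (isnan(s)) continue;   /* also skips the hole itself */
                    double d2 = (double)(i - x) * (i - x) + (double)(j - y) * (j - y);
                    sw += 1.0 / d2;
                    sv += s / d2;
                }
            dst[y * w + x] = (sw > 0.0) ? sv / sw : NAN;
        }
}
```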
4.5 Summary
This chapter introduced the concept of digital elevation and surface models. In the beginning, the different DEMs, their levels of detail and the objects they include were described. The most common surface representations, the regular grid and the triangulated irregular network, were then described. Finally, the approach of creating a DEM through stereo matching was presented.
5 Design description
This chapter characterizes the general methods of the true orthophoto generation devised and implemented in this project. The process is a step-by-step procedure, and each step is described in detail in the following chapters. The approach in this study uses regular grids instead of the triangulated networks used in [Nie04]. In addition, the images are not matched with a regular color-matching algorithm but mosaicked with multiresolution splines, so that the limitations of other true orthophoto generation processes are overcome.
5.1 Limits of other true orthophoto applications
In [Nie04] and in most other true orthophoto applications, manually created TINs are used for the true orthophoto generation. Creating a 3D TIN is very time-consuming and very expensive, because it cannot be generated automatically and therefore has to be created manually. As a result of the manual generation, the model is very accurate and includes undersides of eaves and walls. However, undersides of eaves and walls are not necessary for the generation of true orthophotos, since they are not visible in the final true orthophoto anyway. It is desirable to reduce the cost and effort of true orthophoto generation by using DSMs generated automatically by stereo matching or laser scanning.
Existing applications that work with elevation models often cannot handle eaves or require a preprocessing step that eliminates them. Furthermore, they are based on the Microsoft Windows operating system, but as part of the image processing software XDibias, the true orthophoto application had to be Linux-based.
Since commonly used color-matching algorithms, such as histogram matching and hue matching, need a reference image to which the other images are adjusted, the process is not fully automated: the matching algorithm has to be told which image is the reference image. Multiresolution spline mosaicking, on the other hand, treats every image equally, and therefore no manual adjustment is needed.
One of the most difficult tasks in creating true orthophotos is feathering the seamlines. In [Nie04], a 3x3 mean filter is applied several times to smooth the seamlines. The final image is smooth and the seamlines are feathered, but the image is also blurred. To avoid this blurriness, this study feathers seamlines based on multiresolution splines, which adapt the transition zone from one image to the other to the different spatial frequencies.
5.2 Creating true orthophotos - Step by step
The complete true orthophoto generation process can be broken down into these crucial steps:
1. Rectify the images to orthophotos.
2. Locate the occluded pixels (visibility mask).
3. Place the seamlines.
4. Mosaic the images.
The true orthophoto process is illustrated in figure 5.1.
5.2.1 Rectification
Orthophoto rectification commonly traces each pixel of the output image back to a pixel in the input image. When resampling an image, the trace rarely hits the center of a pixel in the input image. Therefore, methods that interpolate between pixels are required; common ones are nearest neighbor, bilinear and bicubic interpolation, all of which can be found in [C+01]. In this project, it is possible to choose between nearest neighbor and bilinear interpolation. Nearest neighbor was implemented due to its simplicity, since it selects the value of the pixel closest to the incoming ray. Bilinear interpolation uses the four nearest pixel values, located diagonally around the point hit by the ray, to compute the value of the desired output pixel.

Figure 5.1: General approach of the true ortho rectification process.
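Both interpolation methods used in this project can be sketched for a single-channel image. The convention that integer coordinates address pixel centers, and the function names, are assumptions for this example:

```c
/* Sample a single-channel image (w x h, row-major) at the non-integer
 * position (u, v), where (0, 0) is the center of the top-left pixel.
 * Both functions return 0 if the position falls outside the image. */

int sample_nearest(const float *img, int w, int h, double u, double v, float *out)
{
    if (u < -0.5 || v < -0.5)
        return 0;
    int c = (int)(u + 0.5), r = (int)(v + 0.5);   /* round to nearest pixel */
    if (c >= w || r >= h)
        return 0;
    *out = img[r * w + c];
    return 1;
}

/* Requires w, h >= 2 so that a 2x2 neighborhood always exists. */
int sample_bilinear(const float *img, int w, int h, double u, double v, float *out)
{
    if (w < 2 || h < 2 || u < 0.0 || v < 0.0 || u > w - 1 || v > h - 1)
        return 0;
    int c = (int)u, r = (int)v;          /* top-left pixel of the 2x2 cell */
    if (c == w - 1) c--;                 /* stay inside on the last column */
    if (r == h - 1) r--;                 /* and on the last row */
    double a = u - c, b = v - r;         /* fractional offsets in [0, 1] */
    double top = (1 - a) * img[r * w + c]       + a * img[r * w + c + 1];
    double bot = (1 - a) * img[(r + 1) * w + c] + a * img[(r + 1) * w + c + 1];
    *out = (float)((1 - b) * top + b * bot);
    return 1;
}
```

For the 2x2 image (0, 10; 20, 30), bilinear sampling at (0.5, 0.5) gives 15, while nearest neighbor sampling at (0.6, 0.2) picks the value 10.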
The rectification method is illustrated in chapter 2. The required mathematics and knowledge of the camera are described in chapter 3. The actual raytracing implementation is described in chapter 6.
5.2.2 Locating occluded pixels
Figure 5.2: Possible case for occluded pixels
The most important step of the true orthophoto generation is locating the occluded pixels. A regular orthophoto can be generated without this information, but for mosaicking purposes and to guarantee high accuracy and consistent scale, the location of every occluded pixel is mandatory. Therefore, every ray in the DEM that is "blocked" by another object on its path from the point on the surface to the camera has to be registered. Since the rays have to be traced for the orthorectification process anyway, it makes sense to combine the creation of the visibility mask with the rectification step. The raytracing of the elevation model is described in chapter 6.
5.2.3 Seamline placement
To merge the images properly, a transition line, or seamline, has to be placed between the images. Its placement is usually based on a scoring algorithm and can be done in various ways. In this study, the Nearest Feature Transform, also known as the Distance Transform, is used, so that the seamline is placed as far as possible from the blind spots and, to allow fading in all directions, as close as possible to the middle of the overlapping area. Seamline placement is treated in chapter 7 as part of the mosaicking process.
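A simple way to compute the distance of every pixel to the nearest blind spot is a two-pass chamfer distance transform. The sketch below uses the city-block metric, which is an assumption: the thesis does not state which metric or implementation is used, and in the thesis the seamline placement is part of the mosaicking step performed with Enblend. The function name is hypothetical.

```c
#include <limits.h>

/* Two-pass city-block (L1) distance transform. mask[i] != 0 marks a blind
 * spot (occluded pixel); dist[i] receives the distance in pixels to the
 * nearest blind spot, or INT_MAX / 2 if the image contains none. */
void distance_transform(const unsigned char *mask, int *dist, int w, int h)
{
    const int inf = INT_MAX / 2;   /* large, but safe to add 1 to */
    for (int i = 0; i < w * h; i++)
        dist[i] = mask[i] ? 0 : inf;
    /* forward pass: propagate distances from the top and left neighbors */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int *d = &dist[y * w + x];
            if (x > 0 && d[-1] + 1 < *d) *d = d[-1] + 1;
            if (y > 0 && d[-w] + 1 < *d) *d = d[-w] + 1;
        }
    /* backward pass: propagate distances from the bottom and right neighbors */
    for (int y = h - 1; y >= 0; y--)
        for (int x = w - 1; x >= 0; x--) {
            int *d = &dist[y * w + x];
            if (x < w - 1 && d[1] + 1 < *d) *d = d[1] + 1;
            if (y < h - 1 && d[w] + 1 < *d) *d = d[w] + 1;
        }
}
```

Computed for the occlusion mask of each image, such a distance map could serve as a score that pushes the seamline away from the blind spots.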
5.2.4 Mosaicking
Figure 5.3: Possible case of a mosaicked image
The final true orthophoto will be heavily mosaicked due to all the occluded pixels. If the processed images have relatively large differences in color and brightness, the seamlines will be visible and the final result will be poor. The images could be color-matched prior to the rectification process, but to accelerate and optimize the process, multiresolution splines are used instead. In chapter 7, multiresolution spline mosaicking is investigated and its advantages over regular color matching and feathering are pointed out.
5.3 Implementation
The true orthophoto application is implemented in two parts, separating the two main steps, orthorectification and mosaicking, which also makes it possible to mosaic orthorectified imagery of different ages. The orthorectification and visibility mask generation are implemented as an XDibias module. The mosaicking is performed using the Enblend [Md04] program.

Initially, the goal was to merge the visibility mask generation into the existing orthophoto generation module. Since the approaches to orthorectification and ray tracing differ in the reprojection, the true orthophoto generation process became a separate module.

The software is written in C and runs on a Linux-based operating system. The first part is designed for multithreading because of its large number of computations. The module needs the source image and the DEM as input and creates a rectified image with marked occluded areas as output.

The second part takes the generated orthorectified images with marked occluded areas as input and handles the crucial steps of seamline placement, feathering and mosaicking. This module merges all input images into one large ortho image, while trying to fill the occluded areas with information from overlapping images.
The true orthophoto application can be found on the enclosed CD.
6 Raytracing the elevation model
Figure 6.1: When raytracing the output pixel back to the source image, the DEM provides the topmost surface point (Z coordinate) of a certain pixel (X, Y coordinate). The ray checks for visibility between the point and the camera. The rightmost ray is occluded by the tower.
Orthophotos can be generated in two ways: with forward or with backward projection. Because it is simpler to implement and because interpolation should take place in the source images rather than in the output image, backward projection is used for raytracing in this project. The output pixels are thereby traced from the object space through the camera lens and onto the image plane. For each output pixel, the digital elevation model provides the X and Y as well as the Z coordinate. At this position, the raytracing back to the camera and onto the image plane starts. When creating a true orthophoto, the ray must also be checked for intersection with another point in the object space. These steps are illustrated in figure 6.1.
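The occlusion check described above can be sketched as a simple ray march over the regular grid. The sketch assumes that the camera position has already been converted to (fractional) grid coordinates, samples the ray once per cell along its dominant axis and uses nearest-neighbor DEM lookups; it is not the modified Bresenham implementation of the thesis. Since the ray is a straight line, its height grows linearly with the step parameter, so no square roots are needed. The max_height parameter allows the trace to stop early, as with the global maximum height in section 6.3; the names are hypothetical.

```c
/* Backward ray trace from the surface point of DEM pixel (px, py) towards
 * the camera at grid position (cx, cy) and height cz. dem holds w x h
 * heights in meters, row-major. Tracing stops once the ray is higher than
 * max_height (pass cz to trace all the way up).
 * Returns 1 if the surface point is occluded, 0 if it is visible. */
int is_occluded(const float *dem, int w, int h, int px, int py,
                double cx, double cy, double cz, double max_height)
{
    double z0 = dem[py * w + px];
    double dx = cx - px, dy = cy - py;
    double adx = dx < 0 ? -dx : dx, ady = dy < 0 ? -dy : dy;
    int steps = (int)(adx > ady ? adx : ady);  /* one sample per cell on the major axis */
    for (int s = 1; s <= steps; s++) {
        double t = (double)s / steps;          /* 0 at the surface point, 1 at the camera */
        double ray_z = z0 + t * (cz - z0);     /* height grows linearly along the ray */
        if (ray_z > max_height)
            return 0;                          /* above every possible obstacle */
        double fx = px + t * dx + 0.5, fy = py + t * dy + 0.5;
        if (fx < 0.0 || fy < 0.0)
            return 0;                          /* ray has left the DEM */
        int qx = (int)fx, qy = (int)fy;        /* nearest DEM cell */
        if (qx >= w || qy >= h)
            return 0;                          /* ray has left the DEM */
        if (dem[qy * w + qx] > ray_z)
            return 1;                          /* ray blocked by the surface */
    }
    return 0;
}
```

With a DEM row of 280 m containing a 320 m tower in cell 5 and a camera far to the right at 1500 m, the surface point in cell 2 is reported as occluded, while cell 7, on the camera side of the tower, is visible.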
A true orthophoto of 1 km² at 8 cm resolution would contain roughly 156 million pixels; even at 25 cm resolution, 1 km² would contain 16 million pixels and require the same number of ray traces. Therefore, an efficient way of performing the ray tracing is needed. Some orthophoto applications speed up this process by doing a ray trace only for every 2-3 pixels and interpolating between them. This can result in jagged lines along the roof edges, where there are rapid changes of height in the surface model. The method is sufficient for DTMs, where the surface is smoother, and it increases the speed significantly. [Nie04]
The preparatory thesis [Nie04] uses TINs in a binary tree data structure to perform the raytracing and achieves acceptable results. This bachelor thesis is based on elevation models using regular data grids. Therefore, no binary tree has to be built and the output pixels can easily be iterated through. To speed up the raytracing, several optimizations were devised: bounding boxes, multithreading, global and local maximum heights, and a modified version of the Bresenham raytracing algorithm [Bre65]. All but the local maximum heights are implemented.
6.1 Data storage
This section gives a brief description of how the elevation models, the source images and the true orthophotos are stored. All three are stored as folders on the hard drive that contain the actual image and some metadata, for example the world coordinates of the upper left pixel of the DEM. Each image consists of channels, lines and columns, which are imported into a one-dimensional array in the application. A regular image consists of at least three channels (red, green, blue) and stores the intensity of the respective color for each pixel in each channel. The three channels are combined by a formula to generate a true color image. The DEM has only one channel, which stores the altitude. To get the value of a certain pixel in one of the image arrays, its index is calculated with the following equation:
$$\mathit{idx} = r \cdot w \cdot tch + w \cdot ch + c \tag{6.1}$$
where idx is the index, r is the row of the pixel, w is the width of a row, tch is the total number of channels of the image, ch is the channel of the pixel and c is the column of the pixel. Addressing the data through this equation in one contiguous array makes memory management faster than creating a three-dimensional array and therefore speeds up the application.
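Equation 6.1 can be wrapped in a small helper; the name is hypothetical. It describes a layout in which, for every row, all columns of channel 0 are followed by those of channel 1 and so on (often called band interleaved by line). size_t is used so that the index does not overflow for images with several hundred million values.

```c
#include <stddef.h>

/* Index of the value at row r, channel ch and column c in the
 * one-dimensional image array, equation (6.1): w is the row width and
 * tch the total number of channels. */
static inline size_t pixel_index(size_t r, size_t ch, size_t c,
                                 size_t w, size_t tch)
{
    return r * w * tch + w * ch + c;
}
```

For an image of width 4 with 3 channels, row 1, channel 2, column 3 maps to index 1·4·3 + 4·2 + 3 = 23.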
6.2 Bounding box optimization
Figure 6.2: The orthophotos (A-D) and the DEM can be arranged in various ways relative to each other. This image shows some arrangements and the required offsets for the bounding boxes (gray areas).
To avoid useless calculations, save memory and speed up the raytracing, a bounding box of the DEM and the output image is devised. To create the bounding box, first the minimum and maximum X, Y coordinates of the source image are calculated, so that the arrangement of DEM and orthophoto relative to each other can be determined (figure 6.2). Only the rows and columns within the bounding box are required for the raytracing. Besides the intersection of DEM and orthophoto, the bounding box also considers the airplane/satellite position. Offsets are implemented so that, if for example the airplane position is within the DEM but not within the bounding box, only the pixels within the bounding box are traced and none outside of it.
If for instance the bounding box is only a fourth of the size of the elevation model, only
a fourth of the pixels are traced and therefore this simple modification improves the
raytracing significantly.
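The core of the bounding box is the intersection of two axis-aligned rectangles in world coordinates, for instance the DEM extent and the footprint of the source image. The sketch below shows only that step; the offsets for the airplane/satellite position described above are not modelled, and the names are hypothetical.

```c
typedef struct { double xmin, ymin, xmax, ymax; } BBox;

/* Intersection of two axis-aligned boxes in world coordinates, e.g. the
 * DEM extent and the ground footprint of the source image. Returns 0 if
 * they do not overlap, in which case no pixel has to be traced. */
int bbox_intersect(BBox a, BBox b, BBox *out)
{
    BBox r = {
        a.xmin > b.xmin ? a.xmin : b.xmin,
        a.ymin > b.ymin ? a.ymin : b.ymin,
        a.xmax < b.xmax ? a.xmax : b.xmax,
        a.ymax < b.ymax ? a.ymax : b.ymax
    };
    if (r.xmin >= r.xmax || r.ymin >= r.ymax)
        return 0;
    *out = r;
    return 1;
}
```

For a 100 m x 100 m DEM and a footprint covering x from 50 to 200 m and y from 25 to 75 m, only the 50 m x 50 m overlap, a fourth of the DEM, would have to be traced.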
6.3 Global and local maximum heights
Figure 6.3: Illustration of the different heights to trace: flight height, global maximum height and local maximum heights.
Raytracing consists of a tremendous amount of calculations, since every intersected point or pixel has to be checked whether it is an object in the object space or just "air". As described above, images may have over 100 million pixels, each of which has to be traced, and each trace involves intersection calculations. The regular flight height for aerial images is about 1500 meters. If, for instance, the traced pixel is at a height of 280 meters, its horizontal distance to the airplane position is 100 meters and the ground resolution is 25 cm, tracing one pixel requires up to 400 calculations (if the ray does not intersect an object). For 100 million pixels this amounts to 40 billion calculations in the worst case and, assuming 1 µs per calculation, about 11 hours of total computation time. Therefore, a method to speed up the raytracing is needed.
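Written out with the numbers from the text (calculations per pixel, total number of calculations and computation time), the worst-case estimate reads:

$$\begin{aligned}
n_{\text{pixel}} &= \frac{100\ \text{m}}{0.25\ \text{m}} = 400\\
n_{\text{total}} &= 10^{8} \cdot 400 = 4 \cdot 10^{10}\\
t &= 4 \cdot 10^{10} \cdot 1\ \mu\text{s} = 4 \cdot 10^{4}\ \text{s} \approx 11.1\ \text{h}
\end{aligned}$$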
By applying a global maximum height, the computation time can be significantly decreased. The global height is the maximum altitude of any object in the bounding box, and since the DEM has to be imported anyway, determining the maximum height is easy. Instead of checking the entire ray up to the airplane, the ray only has to be checked until it intersects the maximum height layer. For an image with flat terrain and two-story buildings, the height varies by about 30 meters throughout the image, and the mean distance to the intersection point with the maximum height layer is 5 meters. Repeating the calculation above leads to 2 billion calculations for 100 million pixels, a computation time of 33 minutes and a performance gain of around 20 times. However, the optimization only works if the altitudes of the objects differ by a few meters; for rough terrain or skyscrapers, the computation time would rapidly increase again.
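The global maximum height and the resulting early stop can be sketched as follows. The step count follows from similar triangles along the ray; the function names and parameters are assumptions for this illustration.

```c
/* Global maximum height of the DEM cells inside the bounding box
 * (rows r0..r1 and columns c0..c1 inclusive, row-major with width w). */
float global_max_height(const float *dem, int w, int r0, int r1, int c0, int c1)
{
    float zmax = dem[r0 * w + c0];
    for (int r = r0; r <= r1; r++)
        for (int c = c0; c <= c1; c++)
            if (dem[r * w + c] > zmax)
                zmax = dem[r * w + c];
    return zmax;
}

/* Number of one-cell steps a ray needs to reach the layer at height zmax.
 * The ray starts at height z0 and rises to the camera at height cam_z
 * (assumed above zmax) over the horizontal distance horiz_dist; cells have
 * size gsd. Beyond this many steps no object can block the ray. */
int steps_to_max_layer(double z0, double zmax, double cam_z,
                       double horiz_dist, double gsd)
{
    if (z0 >= zmax)
        return 0;                                           /* already at the top */
    double dist = horiz_dist * (zmax - z0) / (cam_z - z0);  /* similar triangles */
    return (int)(dist / gsd) + 1;        /* +1 so the last partial cell is checked */
}
```

With the example values above (surface point at 280 m, camera at 1500 m, 100 m horizontal distance, 25 cm resolution) and an assumed maximum height of 310 m, the ray reaches the maximum height layer after about 2.5 m, so roughly 10 steps have to be checked instead of 400.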
Figure 6.4: With local maximum heights, only pixels of certain cells have to be checked for occlusions instead of all points on the ray back to the camera.
To cope with rough terrain, local maximum heights are introduced. The idea behind local
maxima is to break the bounding box down into cells of adjustable size, for which the local
maximum heights are stored. It would be a mistake to check the ray only up to the local
maximum height and not also against the local maximum heights of the other cells it
intersects (figure 6.3). Figure 6.4 illustrates rays for certain points and the cells which
have to be checked for those points.
Prior to the actual raytracing of each point $P$, the cells $C_{R_P}$ intersected by the ray
$R_P$ are calculated. To decide whether a cell $C_{R_P}(i)$ has to be traced point by point,
the ray height $RH$ at the minimum intersection point $MP_{R_P}$ of the ray $R_P$ with the
cell $C_{R_P}(i)$ is compared with the maximum local altitude $MLH(C_{R_P}(i))$ of that cell,
i.e. it is checked whether $MLH(C_{R_P}(i))$ is higher or lower than the ray height:

$$
C_{R_P}(i) =
\begin{cases}
0 & \text{if } RH\big(MP_{R_P}(C_{R_P}(i))\big) > MLH(C_{R_P}(i)) \\
1 & \text{if } RH\big(MP_{R_P}(C_{R_P}(i))\big) \le MLH(C_{R_P}(i))
\end{cases}
\tag{6.2}
$$
A cell is checked point by point only if $C_{R_P}(i)$ is 1. This approach offers the
opportunity to gain performance by checking only certain points along the ray instead of
all of them. Especially for rough terrain or large altitude differences, the raytracing is
accelerated. Local maximum heights are not implemented in the application version on the
enclosed CD and were not considered in the experiments of chapter 8, since there was not
enough time before the deadline to test the algorithm thoroughly enough to ensure that it is
bug-free. Depending on the source images, short tests indicated a performance gain of up to
10 times. This would result in a computation time of about 3 to 8 minutes for 100 million
pixels.
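Since local maximum heights are not part of the delivered application, the following is only a minimal C sketch of the idea. It assumes square cells of cs × cs DEM pixels, a row-major DEM, and that the ray height at the lowest intersection point with a cell has already been computed. LocalMax, local_max_build and cell_needs_check are hypothetical names; cell_needs_check returns the value of equation (6.2).

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical grid of local maximum heights (MLH) over cs x cs cells. */
typedef struct {
    float *mlh;  /* cw * ch maxima, row-major */
    int cw, ch;  /* number of cells per row / column */
    int cs;      /* cell size in DEM pixels */
} LocalMax;

/* Builds the grid from a row-major DEM of w x rows heights. */
LocalMax local_max_build(const float *dem, int w, int rows, int cs)
{
    assert(dem != NULL && w > 0 && rows > 0 && cs > 0);
    LocalMax lm;
    lm.cs = cs;
    lm.cw = (w + cs - 1) / cs;    /* last cell may be partial */
    lm.ch = (rows + cs - 1) / cs;
    lm.mlh = malloc(sizeof(float) * (size_t)lm.cw * (size_t)lm.ch);
    assert(lm.mlh != NULL);
    for (int i = 0; i < lm.cw * lm.ch; ++i)
        lm.mlh[i] = -FLT_MAX;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < w; ++x) {
            float hgt = dem[(size_t)y * w + x];
            float *m = &lm.mlh[(y / cs) * lm.cw + x / cs];
            if (hgt > *m)
                *m = hgt;
        }
    return lm;
}

/* Equation (6.2): returns 1 if cell (cx, cy) has to be traced point by
 * point, i.e. if the ray height at its lowest intersection point with the
 * cell does not exceed the cell's local maximum height. */
int cell_needs_check(const LocalMax *lm, int cx, int cy, float ray_h)
{
    return ray_h <= lm->mlh[cy * lm->cw + cx];
}
```

Only the cells for which cell_needs_check returns 1 would then be traced pixel by pixel; all other cells along the ray are skipped.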
6.4 Raytracing with the Bresenham algorithm
The Bresenham algorithm is an image processing algorithm for rasterizing straight lines or
circles onto bitmap graphics [Bre65]. To trace the ray, the continuous world coordinates of
the object space have to be transformed into discrete pixel coordinates. If this were done
for each pixel and each step, a large number of slow floating-point multiplications and
divisions would be required. Therefore, an algorithm that transforms the coordinates once
and then traces in pixel coordinates, with integer additions as the most complex operations,
accelerates the computation significantly. The Bresenham algorithm fulfills these
requirements, is easy to implement and minimizes rounding errors.
Figure 6.5: Slopes in different octants [Wik10b]
The basic variant of this algorithm expects a straight line in the first octant, that is, a
line with a slope between 0 and 1 from (x_start, y_start) to (x_end, y_end) (figure 6.5).
Then dx = x_end − x_start and dy = y_end − y_start with 0 ≤ dy ≤ dx. In octant 1, the
iteration is based on dy instead of dx as in octant 0. If the slope lies in octants 2 to 7,
x and y are not incremented by 1 but by the sign of ∆x and ∆y, respectively. Furthermore,
for these octants the iteration is done backwards instead of forwards [Wik10b].
A step in the "fast" direction (the direction with the larger difference between end and
start point; in the case of figure 6.6a, dx) is done in every iteration. Once the deviation
from the ideal line becomes too large, a step in the slow direction is performed as well.
The iteration in which the slow step is due is determined by means of an error variable e,
which is decreased by the smaller value (dy) for every step in x direction. If e < 0, a step
in y has to be done and the larger value dx is added to e. Because of these alternating
subtractions and additions, the division in the slope triangle is reduced to basic
operations [Wik10a]. Furthermore, the error variable has to be initialized carefully.
Consider the case dy = 1, for which the step in y direction has to be done in the middle,
i.e. at or shortly after dx/2.

Figure 6.6: (a) and (b) describe the raster process of a straight line with the Bresenham algorithm. (b) illustrates the states of the error variable [Wik10a]
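Mathematically, this means that the line equation

$$
y = y_{\text{start}} + (x - x_{\text{start}}) \cdot \frac{dy}{dx}
\tag{6.3}
$$

gets transformed to

$$
e = dx \cdot (y - y_{\text{start}}) - dy \cdot (x - x_{\text{start}})
\tag{6.4}
$$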
If, for instance, one step in x direction is done, the error variable is decreased by dy. If
e < 0 after the decrease, a step in y direction is made and e is increased by dx, which makes
e ≥ 0 again because dx ≥ dy.
The following listing describes the Bresenham algorithm for all octants [Wik10a]:
```c
/* Bresenham line rasterization for all octants [Wik10a].
 * visit() is called for every pixel after the start pixel; in the
 * raytracer this is where the height of the pixel is checked. */
static int sgn(int v) { return (v > 0) - (v < 0); }

void bresenham(int istartx, int istarty, int iendx, int iendy,
               void (*visit)(int x, int y, void *ctx), void *ctx)
{
    /* measure distances */
    int dx = iendx - istartx, dy = iendy - istarty;

    /* determine direction and prefix */
    int incx = sgn(dx), incy = sgn(dy);
    if (dx < 0) dx = -dx;
    if (dy < 0) dy = -dy;

    /* determine greater distance */
    int pdx, pdy, ddx, ddy, ef, es;
    if (dx > dy) {
        /* x is fast */
        pdx = incx; pdy = 0;     /* parallel step */
        ddx = incx; ddy = incy;  /* diagonal step */
        ef = dy; es = dx;        /* error steps: slow distance, fast distance */
    } else {
        /* y is fast */
        pdx = 0; pdy = incy;
        ddx = incx; ddy = incy;
        ef = dx; es = dy;
    }

    /* initialize */
    int ix = istartx, iy = istarty;
    int err = es / 2;

    for (int i = 0; i < es; ++i) {
        /* update error term */
        err -= ef;
        if (err < 0) {
            err += es;
            /* diagonal step: also advance in the slow direction */
            ix += ddx; iy += ddy;
        } else {
            /* parallel step: advance in the fast direction only */
            ix += pdx; iy += pdy;
        }
        visit(ix, iy, ctx); /* check height of this pixel */
    }
}
```
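In the listing, the height check is only indicated by a comment. One way to fill it in is sketched below: the same Bresenham stepping walks from a ground pixel towards the pixel where the ray reaches a given end height (for example the global maximum height), and the ray height is interpolated linearly over the steps, which matches the ideal line along the fast axis because the ray height changes linearly with horizontal position. The Dem type, ray_occluded and the decision to treat pixels outside the DEM as unobstructed are assumptions of this sketch, not the thesis implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical DEM layout: row-major heights in meters. */
typedef struct {
    const float *h;
    int w, rows;
} Dem;

/* Returns 1 if the ray from ground pixel (x0, y0) at height z0 towards
 * pixel (x1, y1), where the ray has height z1, hits a higher DEM pixel. */
int ray_occluded(const Dem *dem, int x0, int y0, float z0,
                 int x1, int y1, float z1)
{
    assert(dem != NULL && dem->h != NULL);
    int dx = abs(x1 - x0), dy = abs(y1 - y0);
    int incx = (x1 > x0) - (x1 < x0), incy = (y1 > y0) - (y1 < y0);
    int fast = dx > dy ? dx : dy;   /* number of Bresenham steps */
    int slow = dx > dy ? dy : dx;
    int err = fast / 2, x = x0, y = y0;

    for (int i = 1; i <= fast; ++i) {
        err -= slow;
        if (err < 0) {              /* diagonal step */
            err += fast;
            x += incx;
            y += incy;
        } else if (dx > dy) {       /* parallel step, x is fast */
            x += incx;
        } else {                    /* parallel step, y is fast */
            y += incy;
        }
        if (x < 0 || y < 0 || x >= dem->w || y >= dem->rows)
            return 0;               /* left the DEM: assumed unobstructed */
        /* ray height above this pixel, linear in the step index */
        float zr = z0 + (z1 - z0) * (float)i / (float)fast;
        if (dem->h[(size_t)y * (size_t)dem->w + (size_t)x] > zr)
            return 1;               /* an object rises above the ray */
    }
    return 0;
}
```

The loop stops at the first blocking pixel, so occluded rays are usually much cheaper than visible ones, which have to be traced up to the end point.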
6.5 Parallel processing
Figure 6.7: Illustration of the fork and join procedure with OpenMP for 3 threads: after the global and local height maxima are determined, the rays are traced in parallel by threads A, B and C.
Nowadays, not only supercomputers have more than one computing core; regular desktop
computers and notebooks also have at least two cores and in some cases four or eight. The
advantage of multi-core systems is that processes and threads can be computed
simultaneously, and therefore in a fraction of the time a single-core system would need. To
use the extra cores efficiently, the software has to be designed for parallel processing,
since operating systems and CPUs are not yet capable of distributing the operations of a
single program across cores on their own.
A process is an instance of a computer program that is being executed. It contains the
program code and its current activity. Depending on the operating system, a process may
consist of multiple threads of execution that execute instructions concurrently. A thread
results from a fork of a computer program into two or more concurrently running tasks.
The implementation of threads and processes differs from one operating system to another,
but in most cases, a thread is contained inside a process. The differences between threads
and processes in a multitasking operating system are [N+96]:
- processes are typically independent, while threads exist as subsets of a process
- processes carry considerable state information, whereas multiple threads within a
process share state as well as memory and other resources
- processes have separate address spaces; threads share theirs
- processes interact only through system-provided inter-process communication mechanisms
(such as semaphores or message queues [Nor96])
- context switching between threads in the same process is typically faster than context
switching between processes, due to less overhead.
Another technology is Hyper-Threading, Intel's approach to hardware-based multithreading.
The idea is to make better use of the cores of a CPU by filling gaps in the pipeline with
instructions of another thread. Such gaps occur, for instance, due to a cache miss, and a
second process or thread can compute in the meantime. According to Intel, a performance gain
of up to 33 percent is possible [Cor04].
Not all algorithms are suitable for parallel processing. If, for instance, the calculations
in an iteration depend on each other, they have to be synchronized very often. In this case,
synchronization and thread switching produce a large overhead, and the multithreaded
program will be slower than a single-threaded one. Ideal data for parallel processing is
completely independent and only has to be synchronized at the end of all calculations.
In this study, parallel processing is used to determine the global and local maximum
altitudes as well as for the raytracing itself. The Bresenham algorithm includes
calculations that depend on prior calculations and therefore cannot be parallelized
efficiently. Instead, multiple rays are traced concurrently: the columns of each row of the
bounding box are processed in parallel, and at the end of the row the tracing results are
joined in order to export the finished row to the true orthophoto image file. Since shared
data is only read, race conditions do not have to be considered. All other variables are
created within a thread and are therefore not shared with the other threads, so they cannot
be modified unnoticed.
The implementation was straightforward with OpenMP (Open Multi-Processing), an application
programming interface that supports multi-platform shared-memory multiprocessing
programming in C on many architectures and operating systems, including Linux. It consists
of a set of compiler directives, library routines, and environment variables that influence
run-time behavior. OpenMP was defined by a group of major computer hardware and software
vendors and gives programmers a simple and flexible interface for developing applications
for desktops as well as supercomputers [NF02]. Most compilers implement OpenMP, and the
iterations of a for loop are computed concurrently with the compiler directive #pragma omp
parallel for. After the loop, the threads are joined and only the master thread remains to
continue with the single-threaded parts of the software.
6.6 Rectification
Figure 6.8: Illustration of orthorectification (source image and rectified image). The ray does not always hit the center of the pixel of the source image.
If a pixel is not occluded, it has to be filled with the correct data. The equations of
chapters 3 and 4 determine the target position of the ray in the source image. Often the
ray does not hit the center of the source pixel due to distortions, as figure 6.8
illustrates. Therefore, the pixel value of the output image has to be resampled. Several
methods are common; this thesis uses nearest neighbor and bilinear interpolation [dB+08].
The nearest neighbor algorithm simply selects the value of the nearest point and does not
consider the values of other neighboring points at all. In this thesis, the nearest point is
found by rounding the float coordinates to integers. Figure 6.9a illustrates the equation
below.
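$$
o(o_x, o_y) =
\begin{cases}
i(i_x, i_y) & \text{if } f_x - i_x < 0.5 \text{ and } f_y - i_y < 0.5 \\
i(i_x + 1, i_y) & \text{if } f_x - i_x \ge 0.5 \text{ and } f_y - i_y < 0.5 \\
i(i_x, i_y + 1) & \text{if } f_x - i_x < 0.5 \text{ and } f_y - i_y \ge 0.5 \\
i(i_x + 1, i_y + 1) & \text{if } f_x - i_x \ge 0.5 \text{ and } f_y - i_y \ge 0.5
\end{cases}
\tag{6.5}
$$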
where o is the output image with the X, Y coordinates (o_x, o_y), and i is the source image,
with (i_x, i_y) as the integer parts of the resampled position (f_x, f_y).
The bilinear interpolation calculates the value at point P = (x, y) by means of the four
neighboring points Q11, Q12, Q21 and Q22. Bilinear interpolation is an extension of linear
interpolation to functions of two variables on a regular grid: a linear interpolation is
first performed in one direction and then again in the other. The value of the unknown
function f at the point P is found by

$$
f(x, y) \approx f(0,0)(1-x)(1-y) + f(1,0)\,x(1-y) + f(0,1)(1-x)\,y + f(1,1)\,xy
\tag{6.6}
$$

if the coordinate system is chosen so that the four neighboring points Q11, Q12, Q21 and Q22,
at which f is known, lie at (0, 0), (0, 1), (1, 0) and (1, 1). This is accomplished by
computing f(x, y) with only the fractional digits of x and y.

Figure 6.9: (a) Nearest Neighbor Transformation. (b) Bilinear Interpolation.
For evaluation purposes, the nearest neighbor algorithm was used, since it requires fewer
calculations and the quality gain of bilinear interpolation is negligible, as figure 6.10
illustrates.
6.7 Summary
A method for intersecting rays with a surface model was introduced and highly optimized
through bounding boxes, global and local maximum heights and parallel processing. Two
methods to resample the values of pixels that are not occluded were described. The
performance depends strongly on the source images, but compared to the raytracing library
of [Nie04], a performance gain of 20 times was accomplished for city imagery with altitude
differences of 60 meters. This optimization makes raytracing feasible even for large images.

Figure 6.10: (a) Nearest Neighbor Transformation Example. (b) Bilinear Interpolation Example.
7 Mosaicking
To generate large scale ortho imagery, multiple images have to be merged to form a mosaic
of images. Adjacent images are usually assembled along seamlines that are automatically
or manually placed roughly along the middle of the overlapping areas. For orthophotos,
the seamlines are often placed along roads or flat terrain so that no buildings or other
objects are intersected, which would result in visible seams due to relief displacements.
True orthophotos, in contrast, have the advantage that relief displacements are mostly
removed, depending on the quality of the DEM. Therefore, placing the seamlines along roads is
not necessary. However, seamline placement is more crucial for true orthophotos than for
orthophotos, since they have significantly more seamlines due to the large number of
occluded areas. Radiometric differences are an inherent part of imagery and the reason for
clearly visible seamlines. Since the images rely on sunlight, the relative angle to the sun
may also have a great influence, as illustrated in figure 7.1. To avoid poor results, the
seamlines have to be feathered and a smooth transition has to be guaranteed.

Figure 7.1: The overall brightness of an image depends on the reflectance of the surfaces and, more importantly, on the angles between airplane, surface and sun. The gray scale below shows the amount of light reflected.
7.1 Mosaicking and Merging methods
The mosaicking methods presented in this section rely on a pixel-by-pixel scoring method
inspired by methods presented in [Nie04] and [BA83]. The methods used in this thesis are
cutline generation by Nearest Feature Transform (NFT), also known as distance
transformation, and multiresolution spline merging as implemented in the open source
program enblend [Md04].
7.2 Mosaicking by Nearest Feature Transform
The first step after rectification is to mosaic the images. In this study, the nearest
feature transform is used due to its simplicity and overall good results. Moreover, the NFT
ensures that each pixel is taken from the image in which its distance to the blindspots is
largest, so inaccuracies in the surface model are compensated. It is a method that maps
binary images into distance images (1 channel), where the distance to the nearest object
corresponds to the color level. In this case, the visibility mask is the binary image, with
blindspots as the objects (figure 7.2).
Figure 7.2: Nearest Feature Transformation of blindspot image (superimposed in white). The more red, the greater the distance to a blindspot.
For each source image, a corresponding blindspot distance map is created, where the score
of each pixel indicates the distance to blindspots. The distance maps are used to determine
the source image from which the pixel data is taken. For now, enblend [Md04] can only merge
two images at a time, and therefore the order of the merging process could be of some
importance, since no color-matching or histogram-matching algorithm is used in this study.
If, for instance, the perspectives of the merged images differ only slightly, the final true
orthophoto will be heavily mosaicked and the feathering algorithm might be overstrained,
which could result in a poor final image. But if the two images with the most different
perspectives are always merged, the blindspots will mainly be filled by one image instead of
five or ten. Consequently, less feathering has to be done. However, the imagery tested in
chapter 8 shows that the feathering algorithm smooths every seamline precisely, and
therefore the order is not important. The only requirement is that the images overlap.
Figure 7.3 illustrates a mosaic of two images.
Figure 7.3: Distance map for joining two images at a late stage, where the dark areas correspond to one image and the white areas to the other (surface outlines are superimposed).
7.3 Seamline feathering
As figure 7.4a shows, without any adjustment and feathering the seamlines are clearly
visible and the overall result is poor. Therefore, a method is required which joins images
seamlessly; the merging needs to handle radiometric differences in the input images. In
traditional orthophoto mosaicking, color values are first adjusted to match a reference
image and then merged with a simple feathering. This method has two disadvantages: firstly,
the matching requires calculations in addition to the feathering, and secondly, the
reference image might have radiometric distortions, which would then be added to all other
images. Instead, this thesis uses the promising multiresolution splines [BA83]. The approach
is to distort the surfaces gently, so that they can be joined together with a smooth seam
while still preserving as much of the original image information as possible [BA83]. This
means that no reference image is required and a smooth transition is guaranteed.

Figure 7.4: (a) is a true orthophoto without seamline feathering, so the seamlines are obvious. (b) is feathered by multiresolution spline [BA83] and no seamlines are visible.

Figure 7.5: The weighted average method may be used to avoid seams. Example weighting functions are shown here in one dimension. The width of the transition zone T is a critical parameter for this method [BA83].
An image consists of different spatial frequencies. Spatial frequency is a measure of how
often a certain structure repeats per unit of distance. In image processing applications,
spatial frequency is often measured in lines per millimeter, and differences in these
frequencies convey different information about the appearance of a stimulus. High spatial
frequencies represent abrupt spatial changes in the image, such as edges, and generally
correspond to fine detail. Low spatial frequencies, on the other hand, represent global
information about shape and smooth areas like grass [Bar04].
Therefore, to make a seamline truly smooth, the various spatial frequencies of an image have
to be joined differently. Figure 7.5 describes the merging of two images through a weighted
average (H_l(i) for the left image and H_r(i) for the right image) within a transition zone
T. If the transition zone is the same for every spatial frequency, the resulting image will
be distorted, since high spatial frequencies, for instance, have to be joined within a
narrower zone than low spatial frequencies such as grass, which have to be joined slowly and
smoothly. Therefore, the image should be decomposed into a set of band-pass component images
for the different spatial frequencies. A separate spline with an appropriately selected T can
then be performed in each band. Finally, the splined band-pass components are recombined
into the desired mosaic image [BA83].
7.3.1 Generating the Gaussian Pyramid
The key to a good overall result is to blend image features across a transition zone whose
size is proportional to the wavelength of the features, i.e. inversely proportional to their
spatial frequency. This is accomplished by blending two images together one spatial frequency
level at a time. Each level uses a different blending mask or distance map. At the top level,
a sharp blend mask is used so that high-frequency details are blended over a narrow region.
At the bottom level, a wide blend mask is used so that low-frequency details are blended over
a large region.
To do so, a sequence of low-pass filtered images $G_0, G_1, \dots, G_N$ is obtained by
repeatedly convolving a small weighting function with an image [BA83]. As figure 7.6 shows,
$G_0$ is the original image, and from there on the value of each node in the next level
($G_1$ from $G_0$, $G_2$ from $G_1$, and so on) is computed as a weighted average of a
5 × 5 subarray of the current level. Stacked on top of each other, the levels form a pyramid.

Figure 7.6: A one-dimensional graphical representation of the iterative REDUCE operation used in pyramid construction [BA83].
Sample density and resolution decrease from level to level of the pyramid, which can be
described for $0 < l \le N$ by

$$
G_l(i, j) = \mathrm{REDUCE}[G_{l-1}(i, j)] = \sum_{m=1}^{5} \sum_{n=1}^{5} w(m, n)\, G_{l-1}(2i + m,\, 2j + n)
\tag{7.1}
$$

where i and j identify the pixel and w(m, n) is the pattern of weights used to generate each
pyramid level. Figure 7.7 illustrates different levels and a collapsed version of the highest
level, which clearly shows a very smooth transition zone for the lowest spatial frequency.
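Equation (7.1) leaves the weights w(m, n) unspecified. A common choice, stated here as an assumption (the text does not say which kernel enblend uses), is the separable five-tap generating kernel from Burt and Adelson's pyramid work: a symmetric vector with one free parameter a, constrained so that the weights are normalized and every node of level l − 1 contributes the same total weight to level l.

```latex
\begin{aligned}
w(m, n) &= \hat{w}(m)\,\hat{w}(n), \qquad
\hat{w} = \left(\tfrac{1}{4} - \tfrac{a}{2},\ \tfrac{1}{4},\ a,\ \tfrac{1}{4},\ \tfrac{1}{4} - \tfrac{a}{2}\right), \qquad m, n = 1, \dots, 5 \\
\text{normalization:} &\quad 2\left(\tfrac{1}{4} - \tfrac{a}{2}\right) + 2 \cdot \tfrac{1}{4} + a = 1 \\
\text{equal contribution:} &\quad a + 2\left(\tfrac{1}{4} - \tfrac{a}{2}\right) = 2 \cdot \tfrac{1}{4} = \tfrac{1}{2}
\end{aligned}
```

With a = 0.4 the vector is (0.05, 0.25, 0.4, 0.25, 0.05), which approximates a Gaussian; a = 0.375 gives the binomial weights (1, 4, 6, 4, 1)/16.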
Figure 7.7: (a) original image, the sharpest; (b), (c) and (d) are intermediate levels; (e) is the top level for the smoothest transition; and (f) is the mask (e) scaled up to the original size using multiple EXPAND operations [Md04].
7.3.2 Generating the Laplacian pyramids
Images broken up into components based on spatial frequency are called Laplacian pyra-
mids, which contain the highest spatial frequency components at the lowest level and the
lowest spatial frequency components at the top level. Intermediate levels contain features
gradually decreasing in one-octave steps in spatial frequency from high to low.
A Laplacian pyramid is made by repeatedly applying a high-pass filter to the image. The
high-pass filter picks out all of the high spatial frequency components of the image and
passes everything else to the next level. This process can be compared to subtracting
each level of the pyramid from the next lowest level. Because these arrays differ in sample
density, it is necessary to interpolate new samples between those of a given array before
it is subtracted from the next lowest array [BA83]. Let $G_{l,k}$ be the image obtained by
expanding $G_l$ $k$ times. Then

$$
G_{l,0} = G_l
\tag{7.2}
$$

and the interpolation can be described as

$$
G_{l,k}(i, j) = \mathrm{EXPAND}[G_{l,k-1}(i, j)] = 4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} G_{l,k-1}\!\left(\frac{2i + m}{2},\, \frac{2j + n}{2}\right)
\tag{7.3}
$$

Only terms for which (2i + m)/2 and (2j + n)/2 are integers contribute to the sum. Expanding
$G_l$ $l$ times yields $G_{l,l}$, which has the same size as $G_0$.

The band-pass images $L_0, \dots, L_N$ are then defined for $0 \le l < N$ as

$$
L_l = G_l - \mathrm{EXPAND}[G_{l+1}] = G_l - G_{l+1,1}, \qquad L_N = G_N
\tag{7.4}
$$
7.3.3 Summation and splining of overlapped images
The final image is obtained by a combination of expanding and summing. Starting at the top
pyramid level, $L_N$ is first expanded and added to $L_{N-1}$ to recover $G_{N-1}$, and so
forth. This can be written as [BA83]:

$$
G_0 = \sum_{l=0}^{N} L_{l,l}
\tag{7.5}
$$

Figure 7.8: (a) highest spatial frequency; (b), (c) and (d) are intermediate levels; (e) is the top level for the smoothest transition, as (f) shows [Md04].
The complete algorithm consists of the following steps:
1. Construct Laplacian pyramids LA and LB for images A (left) and B (right).
2. If the center line for level l of the final image is at $i = 2^{N-1}$, the final Laplacian
pyramid LS is calculated by

$$
LS_l(i, j) =
\begin{cases}
LA_l(i, j) & \text{if } i < 2^{N-1} \\
\big(LA_l(i, j) + LB_l(i, j)\big)/2 & \text{if } i = 2^{N-1} \\
LB_l(i, j) & \text{if } i > 2^{N-1}
\end{cases}
\tag{7.6}
$$

3. The final image is then obtained by expanding and summing the levels of LS.
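To make the three steps concrete, the following C sketch splines two one-dimensional signals. It is a simplification under stated assumptions: 1-D instead of 2-D, the binomial kernel (1, 4, 6, 4, 1)/16 as an assumed choice of w, centred offsets −2 to 2 instead of 1 to 5, clamped borders, signal lengths of 2^K + 1 samples, and a fixed seam in the middle of each level as in equation (7.6). enblend [Md04] works on 2-D images with an arbitrary mask, so this is not its implementation, and all names are hypothetical.

```c
#include <assert.h>

#define MAXLEV 8    /* hypothetical limits of this sketch */
#define MAXN   257  /* at most 2^8 + 1 samples per signal */

typedef struct { double v[MAXLEV][MAXN]; int n[MAXLEV]; } Pyramid;

/* assumed generating kernel: binomial weights (1, 4, 6, 4, 1) / 16 */
static const double W[5] = {1 / 16.0, 4 / 16.0, 6 / 16.0, 4 / 16.0, 1 / 16.0};

static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* REDUCE (cf. eq. 7.1) in 1-D: low-pass filter, then keep every 2nd sample */
static int reduce(const double *in, int n, double *out)
{
    int m = (n + 1) / 2;
    for (int i = 0; i < m; ++i) {
        out[i] = 0.0;
        for (int k = -2; k <= 2; ++k)
            out[i] += W[k + 2] * in[clampi(2 * i + k, n)];
    }
    return m;
}

/* EXPAND (cf. eq. 7.3) in 1-D, with kernel weights: interpolate to n_out
 * samples; only terms whose source position (i - k) / 2 is an integer
 * contribute */
static void expand(const double *in, int n_in, double *out, int n_out)
{
    for (int i = 0; i < n_out; ++i) {
        out[i] = 0.0;
        for (int k = -2; k <= 2; ++k)
            if ((i - k) % 2 == 0)
                out[i] += 2.0 * W[k + 2] * in[clampi((i - k) / 2, n_in)];
    }
}

/* Laplacian pyramid (eq. 7.4): L_l = G_l - EXPAND[G_{l+1}], L_N = G_N */
static void build_laplacian(const double *x, int n, int lev, Pyramid *L)
{
    Pyramid G;
    double up[MAXN];
    G.n[0] = n;
    for (int i = 0; i < n; ++i) G.v[0][i] = x[i];
    for (int l = 1; l < lev; ++l)
        G.n[l] = reduce(G.v[l - 1], G.n[l - 1], G.v[l]);
    for (int l = 0; l < lev; ++l) {
        if (l + 1 < lev)
            expand(G.v[l + 1], G.n[l + 1], up, G.n[l]);
        else
            for (int i = 0; i < G.n[l]; ++i) up[i] = 0.0; /* top: L_N = G_N */
        L->n[l] = G.n[l];
        for (int i = 0; i < G.n[l]; ++i) L->v[l][i] = G.v[l][i] - up[i];
    }
}

/* Multiresolution spline (eqs. 7.5, 7.6): splice the Laplacian levels of
 * a (left) and b (right) at the centre, then expand and sum the levels */
void multires_spline(const double *a, const double *b, double *out,
                     int n, int lev)
{
    assert(n >= 1 && n <= MAXN && lev >= 1 && lev <= MAXLEV);
    Pyramid LA, LB, LS;
    double up[MAXN];
    build_laplacian(a, n, lev, &LA);
    build_laplacian(b, n, lev, &LB);
    for (int l = 0; l < lev; ++l) {
        int c = (LA.n[l] - 1) / 2;               /* centre line of level l */
        LS.n[l] = LA.n[l];
        for (int i = 0; i < LA.n[l]; ++i)
            LS.v[l][i] = i < c ? LA.v[l][i]
                       : i > c ? LB.v[l][i]
                       : (LA.v[l][i] + LB.v[l][i]) / 2.0;
    }
    /* collapse from the top: G_l = L_l + EXPAND[G_{l+1}] */
    for (int l = lev - 2; l >= 0; --l) {
        expand(LS.v[l + 1], LS.n[l + 1], up, LS.n[l]);
        for (int i = 0; i < LS.n[l]; ++i) LS.v[l][i] += up[i];
    }
    for (int i = 0; i < n; ++i) out[i] = LS.v[0][i];
}
```

Splicing a constant signal of 10 with a constant signal of 0 over 17 samples and 4 levels produces a monotone transition that passes through 5 at the centre, while splicing a signal with itself reproduces it up to rounding, since collapsing inverts the Laplacian construction.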
7.4 Summary
In this chapter, a process for seamline placement and image mosaicking to form a seamless
true orthophoto was described. The NFT used here is a good foundation for the
multiresolution spline, since it places the transition line as far as possible from the
blindspots, so that the spline has enough space to fade out and feather the seamlines very
smoothly. The enblend [Md04] program contains an efficient implementation of this
algorithm, which can be applied to large images.
Figure 7.9: (a) shows the two original images and (b) the final image. (e), (i), (m), (q), (u) are the Laplacian pyramid of image A. (f), (j), (n), (r), (v) are the Laplacian pyramid of image B. (d), (h), (l), (p), (t) are the Gaussian pyramid of the distance map. (c), (g), (k), (o), (s) are the Laplacian pyramid of the final image [Md04].
8 Experimentation and Evaluation
The previous chapters described the approach investigated in this thesis for generating
true orthophotos. Throughout this chapter, the developed application is tested on the city
of Terrassa in Spain and on Vaihingen an der Enz in Germany. The problems are illustrated
and commented on. The tested imagery was captured with the UltraCam X or the DMC camera and
has a ground resolution of either 8 cm or 25 cm. The software is tested with
stereo-matching-based and laser-scanned DEMs, and the differences caused by the DEM are
discussed.
Due to the size and resolution of the images, only close-up results are shown in this thesis.
The true orthophotos in full resolution can be found on the companion CD.
An overview of the generated true orthophotos is shown below.
(a) Terrassa: 34 million pixels, pixel size 0.25 m
(b) Vaihingen: 3.18 million pixels, pixel size 0.08 m
8.1 Performance
As mentioned in chapter 6, the feasibility of large-scale true orthophoto imagery depends on
the computation time of the orthorectification. To evaluate the individual performance
optimizations, the computation time was recorded for several reference images. The reference
images had sizes between 3.18 million and 34.50 million pixels and a ground resolution of
either 0.08 m or 0.25 m. Additionally, an image covering an area of about 630 m x 1550 m with
a ground resolution of 0.25 m and roughly 21.86 million pixels was raytraced for a comparison
with the raytracing library of [Nie04]. Table 8.1 lists the individual computation times and
shows that a performance gain of 20 times compared to [Nie04] was accomplished.
The experiments were carried out on a machine comparable to a standard laptop at the time
of writing. Its specifications are:
Processor: Intel Centrino Core 2 Duo, 2.4 GHz
800 MHz Front Side Bus, 4MB L2 Cache
Memory: 2048 MB DDR2 RAM
Operating System: Ubuntu 9.10
Table 8.1 lists the processing times of several images depending on the optimizations
activated. From left to right, each column applies one more optimization on top of the
previous ones:
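| Size | Res | Master thesis | BBox | GMH | MT |
|---|---|---|---|---|---|
| 3.18 million pixels | 0.08 m | - | 719 s | 132 s | 91 s |
| 6.75 million pixels | 0.25 m | - | 2596 s | 101 s | 64 s |
| 15.7 million pixels | 0.25 m | - | - | 424 s | 256 s |
| 21.86 million pixels | 0.25 m | 1 hr | - | 615 s | 371 s |
| 34.50 million pixels | 0.25 m | - | - | 703 s | 624 s |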
Size (the image size) and Res (the ground resolution of the image) describe the processed
photograph. Column Master thesis contains the computation time of the raytracing library of
[Nie04]. The computation time of the Bresenham algorithm plus bounding box is given in
BBox, GMH is the computation time with global maximum height, and MT is the computation
time with parallel processing and all optimizations activated.
The Bresenham algorithm without further optimizations was only tested on images with fewer
than 7 million pixels, since the computation time was not feasible for any larger imagery.
The biggest performance gain is achieved by means of global maximum heights. For a 3.18
million pixel image, the computation was 7 times faster, and for a 6.75 million pixel image
26 times faster, with the gain increasing further. The performance gain due to the global
maximum height increases with image size, since without it both the number of pixels to
trace and the number of tracing steps per pixel grow. Also interesting is the performance
gain for the 6.75 million pixel image compared to the 3.18 million pixel image. It shows
that the shape of the elevation model has a huge impact on the performance: if the
difference between the global mean height and the global maximum height is small,
raytracing is very fast, since only a few steps have to be done.
To measure the processing time of the complete true orthophoto generation, including
orthorectification and mosaicking, 15 images of Vaihingen an der Enz with 3.18 million
pixels each and a ground resolution of 0.08 m were used. The processing times are given
below:
Orthorectification and locating blindspots: 25 minutes (∼ 100 seconds per image)
Mosaicking and feathering: 2 minutes
8.2 Pros...
The overall result of the true orthophoto generation is very good. Narrow backyards are
fully visible, all rooftops are moved to their correct position, and no walls are visible.
Even tall objects with large relief displacements are rectified correctly, and the large
occluded areas are filled with data from other images. Figure 8.1a shows a tall rectified
object. [Nie04] is based on 3D models which only consist of buildings and terrain but do
not include trees or cars. Stereo-matched DEMs also include trees and cars that are visible
in both source images of the SGM. Therefore, trees do not look distorted and there are no
cut-through cars. The narrow backyards of Terrassa are also clearly visible, which in normal
orthophotos is only the case for areas very close to the nadir point (figure 8.1b).

Figure 8.1: (a) Rectified tall building (50 meters tall); some occluded areas remain due to too few perspectives. (b) Narrow backyards visible in a true orthophoto.
8.3 ...and cons
Some minor problems still remain in the true orthophoto generation. These remaining errors
are small and usually only noticeable on close inspection.
Deviations in the DEM can cause poor rectifications. Terrain pixels might be treated as roof
pixels and are then also resampled to the roof's rectified position (figure 8.2a). The
reverse is possible too, as figure 8.2b illustrates: if the SGM process identifies a roof
point incorrectly, the result is a hole in the roof.
Figure 8.2: (a) Terrain pixels in roof. (b) Holes in roof.
If sufficient overlap and perspectives are not available, occluded areas may remain occluded
in the final true orthophoto. This is a serious problem, as figure 8.3 illustrates: some
parts of Terrassa have many occluded pixels, because only one image covers them.

Figure 8.3: Remaining blindspots in one part of Terrassa, due to coverage by only one image.
8.4 Using a simpler DEM
The developed application was also evaluated with other sets of source data. The true
orthophoto generation based on a laser-scanned DEM demonstrates the importance of a correct
and accurate elevation model (figure 8.4). The visibility mask based on the DEM is correct,
as figure 8.4a illustrates. However, laser-scanned DEMs cut off sharp edges, so objects on
the terrain are modeled smaller than they really are. Therefore, the source image does not
fit properly over the DEM, as figure 8.4b illustrates. The calculated visibility masks are
smaller than they are supposed to be, and roof parts are pulled down to terrain level. In
addition, the "center" of the roof is not placed at the actual center of the building, and
walls are visible. The mosaicking process treats the pulled-down roof parts as correct data
and might use these parts to fill occluded areas. The result is a poor true orthophoto with
plenty of ghost images left ??.
Figure 8.4: (a) shows the DEM merged with the visibility mask. (b) points out the displacement of the visibility mask. Due to the different sources of DEM and orthoimage and the poor accuracy at sharp edges, the objects tend to be smaller.
8.5 Considering all images for Nearest Feature Transform
We also examined what the mosaic pattern would look like if enblend [Md04] considered
all images at once instead of two at a time. As expected, the image becomes even more
fragmented, because each output pixel is taken from the image in which it lies farthest
from any blindspot. The result could be poorer than with the current approach, since
single pixels, rather than only coherent areas, are taken from different images. As figure
8.5 shows, no master image can be clearly identified, but the order in which the images
are considered could be investigated. Including the distance to nadir, as mentioned in
[Nie04], could help to reduce the fragmentation.
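Considering all images at once amounts to a per-pixel argmax over one distance map per image. The following is a minimal pure-Python sketch, assuming boolean visibility masks on a common grid. For simplicity it uses a 4-neighbour (city-block) distance computed by breadth-first search, which is not necessarily the metric enblend uses. The function names are hypothetical.

```python
from collections import deque


def distance_to_blindspots(mask):
    """City-block distance from every pixel to the nearest blindspot (False).

    Multi-source BFS seeded with all blindspot pixels; if the mask contains
    no blindspot at all, every pixel keeps the distance float("inf").
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] > dist[r][c] + 1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def nearest_feature_labels(masks):
    """Label each output pixel with the index of the image in which that pixel
    lies farthest from any blindspot (ties go to the lower image index)."""
    dists = [distance_to_blindspots(m) for m in masks]
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[max(range(len(masks)), key=lambda i: dists[i][r][c]) for c in range(cols)]
            for r in range(rows)]


a = [[True, True, True, True, False]]   # image a: blindspot at the right end
b = [[False, True, True, True, True]]   # image b: blindspot at the left end
print(nearest_feature_labels([a, b]))    # [[0, 0, 0, 1, 1]]
```

In the example, the two images split the strip roughly in the middle. With many images, the winning index can change from one pixel to the next, which illustrates why the pattern becomes fragmented.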
8.6 Summary
The implemented methods were tested on different sets of source images and elevation
models. The overall result is good, and the small remaining errors were pointed out in this
chapter. The significant errors are all caused by limitations or inaccuracies of the DEM
used. Sufficient overlap and enough viewing directions are crucial to avoid remaining
occluded areas.
Figure 8.5: Illustration of a Nearest Feature Transform of 15 images.
9 Conclusion
The overall goal of this thesis was to investigate different methods for true orthophoto
generation and to implement an efficient method as part of the image processing software
XDibias. The crucial steps of true orthophoto generation are rectification, mosaicking
and feathering. The rectification was implemented as an easy-to-use module of XDibias,
but can also be used stand-alone. The rectified images are mosaicked and feathered
with enblend.
9.1 Evaluation
The rectification was done by raytracing from the surface model back to the camera,
taking the camera model into account and registering rays that were blocked by objects
in the surface model. Several methods were implemented to optimize the rectification
process. First, a bounding box is computed from the size and coordinates of the output
image and the DEM, so that only points available in both datasets are traced. Next,
while the required DEM points are imported into a data structure, the height of each
point in the bounding box is determined, and the local altitude maxima of areas of
predefined size are stored in addition to the global altitude maximum. The stored altitude
maxima are used to speed up the raytracing: instead of tracing a ray all the way to the
camera, it is traced only up to the local altitude maximum in the best case and up to the
global altitude maximum in the worst case. The software is also parallelized, so that the
large number of calculations can be distributed over the cores of multi-core systems.
The implemented optimizations build on the simple and efficient Bresenham algorithm
and reduce the raytracing time for a 22.7-million-pixel image from 1 hour with the
raytracing library of [Nie04] to 5 minutes.
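The traversal described above can be sketched in a few lines. This is a minimal pure-Python sketch under simplifying assumptions, not the findocc code:

- The DEM is a 2-D list of heights on a regular grid.
- The camera position has already been converted to grid coordinates (row, column, height).
- Lens distortion and Earth curvature are ignored.
- The tile maxima are used in one plausible way: per-cell lookups are skipped while the ray is above the current tile's maximum, and the trace stops once the ray rises above the global maximum.

All function names are hypothetical.

```python
import math


def bresenham(r0, c0, r1, c1):
    """Yield the grid cells on the line from (r0, c0) to (r1, c1); works in all octants."""
    dr, dc = abs(r1 - r0), -abs(c1 - c0)
    sr = 1 if r0 < r1 else -1
    sc = 1 if c0 < c1 else -1
    err = dr + dc
    while True:
        yield r0, c0
        if r0 == r1 and c0 == c1:
            return
        e2 = 2 * err
        if e2 >= dc:
            err += dc
            r0 += sr
        if e2 <= dr:
            err += dr
            c0 += sc


def tile_maxima(dem, tile):
    """Maximum height of each tile x tile block plus the global maximum (computed once)."""
    rows, cols = len(dem), len(dem[0])
    tmax = [[max(dem[r][c]
                 for r in range(tr, min(tr + tile, rows))
                 for c in range(tc, min(tc + tile, cols)))
             for tc in range(0, cols, tile)]
            for tr in range(0, rows, tile)]
    return tmax, max(max(row) for row in tmax)


def is_visible(dem, tmax, global_max, tile, r0, c0, cam):
    """Trace the ray from DEM cell (r0, c0) toward cam = (row, col, height).

    Returns False as soon as a DEM cell rises above the ray (occluded).
    """
    cam_r, cam_c, cam_z = cam
    z0 = dem[r0][c0]
    total = math.hypot(cam_r - r0, cam_c - c0)
    if total == 0:
        return True  # camera straight above the cell
    rows, cols = len(dem), len(dem[0])
    for r, c in bresenham(r0, c0, round(cam_r), round(cam_c)):
        if (r, c) == (r0, c0):
            continue
        if not (0 <= r < rows and 0 <= c < cols):
            return True  # the ray left the DEM unobstructed
        # Height of the ray above this cell, by linear interpolation.
        z = z0 + (cam_z - z0) * math.hypot(r - r0, c - c0) / total
        if z > global_max:
            return True  # nothing in the DEM can block the ray any more
        if z > tmax[r // tile][c // tile]:
            continue  # ray is above this whole tile: skip the per-cell lookup
        if dem[r][c] > z:
            return False  # blocked: (r0, c0) is a blindspot for this camera
    return True


dem = [[0, 0, 0, 10, 0, 0, 0, 0]]  # one DEM row with a 10 m "building" at column 3
tmax, gmax = tile_maxima(dem, tile=2)
cam = (0, 20, 20)                  # camera above column 20, 20 m high
print(is_visible(dem, tmax, gmax, 2, 0, 0, cam))  # False: hidden behind the building
print(is_visible(dem, tmax, gmax, 2, 0, 5, cam))  # True
```

Every output pixel is traced independently of the others, which is why this step parallelizes well across cores.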
The mosaicking method used in this thesis relies on a pixel-by-pixel scoring method.
Seamlines are placed with the Nearest Feature Transform (NFT), since it ensures that each
pixel is taken from the image in which its distance to the blindspots is largest. The
experiments in chapter 8 show that NFT compensates for inaccuracies in the surface model
and provides a good foundation for the multiresolution splines.
Feathering and merging were based on multiresolution splines, which yielded a very good
overall image quality. Although no color-matching or histogram-matching algorithm was
used, radiometric differences are balanced and seamlines are not obvious. The merging
used transition zones whose width depends on the spatial frequency: high spatial
frequencies, such as sharp edges, were joined within a narrow zone, while low spatial
frequencies, such as those of grass areas, were joined slowly and smoothly over a wider
zone. Because the mosaicking and feathering methods assign each area to a suitable image,
inaccuracies of the surface model were less likely to cause problems in the final result.
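The frequency-dependent blending described above is the multiresolution spline of Burt and Adelson [BA83]. The following is a minimal 1-D pure-Python sketch, assuming two co-registered signals and a seam mask (1 = take the first signal, 0 = take the second). It is not enblend's implementation, which works on 2-D images. The function names are hypothetical.

```python
# 5-tap generating kernel [1, 4, 6, 4, 1] / 16 (the a = 0.375 member of the Burt-Adelson family).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def smooth(signal):
    """Convolve with KERNEL, clamping indices at the borders."""
    n = len(signal)
    return [sum(w * signal[min(max(i + k - 2, 0), n - 1)] for k, w in enumerate(KERNEL))
            for i in range(n)]


def reduce_level(signal):
    """REDUCE: low-pass filter, then keep every second sample."""
    return smooth(signal)[::2]


def expand_level(signal, n):
    """EXPAND to length n: insert zeros between samples, low-pass filter, scale by 2."""
    up = [0.0] * n
    for i, v in enumerate(signal):
        up[2 * i] = v
    return [2 * v for v in smooth(up)]


def gaussian_pyramid(signal, levels):
    pyr = [list(signal)]
    for _ in range(levels):
        pyr.append(reduce_level(pyr[-1]))
    return pyr


def laplacian_pyramid(signal, levels):
    """Band-pass levels G_i - EXPAND(G_i+1), plus the coarsest Gaussian level."""
    gauss = gaussian_pyramid(signal, levels)
    bands = [[a - b for a, b in zip(g, expand_level(g_next, len(g)))]
             for g, g_next in zip(gauss, gauss[1:])]
    return bands + [gauss[-1]]


def multiresolution_spline(a, b, mask, levels=3):
    """Blend each band with the mask's Gaussian level, then collapse the pyramid."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)
    blended = [[m * x + (1 - m) * y for m, x, y in zip(gm_i, la_i, lb_i)]
               for gm_i, la_i, lb_i in zip(gm, la, lb)]
    out = blended[-1]
    for band in reversed(blended[:-1]):  # collapse: EXPAND and add, coarse to fine
        out = [x + y for x, y in zip(band, expand_level(out, len(band)))]
    return out


step = multiresolution_spline([10.0] * 16, [0.0] * 16, [1.0] * 8 + [0.0] * 8)
```

The mask is blurred more strongly at coarser levels, so low frequencies are blended over wide transition zones and high frequencies over narrow ones. This matches the behavior described above.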
The devised methods are assessed to be applicable to large-scale true orthophoto
mosaics.
9.2 Outlook
The true orthophoto generation process can still be improved. As the test results showed,
the mosaicking process could benefit from combining the distance-to-blindspot and
nearest-to-nadir criteria with an area-score method instead of a pixel-score method. For
a given occluded area, the largest locally coherent visible area of a single image would
then be taken, instead of pixels from ten different images.
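One possible reading of such an area score is sketched below in pure Python. It assumes that the occluded area is given as a collection of pixel coordinates and that the visibility masks are boolean grids. This is a hypothetical illustration, not part of the implemented software.

```python
from collections import deque


def largest_component(pixels):
    """Largest 4-connected component of a set of (row, col) pixels."""
    remaining, best = set(pixels), set()
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if n in remaining:
                    remaining.remove(n)
                    comp.add(n)
                    queue.append(n)
        if len(comp) > len(best):
            best = comp
    return best


def pick_source_by_area(region, masks):
    """Area score: choose the image whose visible pixels inside `region` form the
    largest coherent patch. Returns (image index, patch); (None, set()) if no
    image sees any pixel of the region."""
    best_index, best_patch = None, set()
    for i, mask in enumerate(masks):
        patch = largest_component({(r, c) for r, c in region if mask[r][c]})
        if len(patch) > len(best_patch):
            best_index, best_patch = i, patch
    return best_index, best_patch


region = [(0, 0), (0, 1), (0, 2), (0, 3)]
a = [[True, False, True, True]]   # 3 visible pixels, but split into patches of 1 and 2
b = [[True, True, True, False]]   # 3 visible pixels in one coherent patch
print(pick_source_by_area(region, [a, b])[0])  # 1
```

Both images see three pixels of the region, so a pixel count alone would not separate them. The area score prefers image b because its visible pixels form one coherent patch.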
Currently, enblend merges only two images at a time. Considering all images for each
occluded pixel, or determining a merge sequence in which images with opposed or most
different perspectives are always merged, could improve the mosaicking process.
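A merge sequence of this kind could be derived greedily from the viewing directions. The following hypothetical sketch assumes that each image is summarized by the azimuth of its viewing direction in degrees. How perspectives would actually be characterized is left open in the text.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two azimuths in degrees (0..180)."""
    d = abs(a - b) % 360
    return 360 - d if d > 180 else d


def merge_order(azimuths):
    """Greedy sequence: after each image, merge the remaining image whose
    viewing azimuth differs most from the image merged last
    (ties go to the lower image index)."""
    remaining = list(range(len(azimuths)))
    order = [remaining.pop(0)]
    while remaining:
        last = azimuths[order[-1]]
        nxt = max(remaining, key=lambda i: angular_difference(azimuths[i], last))
        remaining.remove(nxt)
        order.append(nxt)
    return order


print(merge_order([0, 90, 180, 270]))  # [0, 2, 1, 3]
```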
Depending on the quality of the DEM and the number of viewing directions, some blindspots
might remain in the final image. Since blindspots have the pixel value 0 and are therefore
rendered black, they are very conspicuous. To hide the remaining blindspots, their color
could be interpolated from the adjacent pixels.
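Such an interpolation could work inward from the border of each blindspot. The following is a minimal sketch. It assumes a single-band image and an explicit blindspot mask, which is safer than testing for the value 0 because a valid pixel may also have that value. The function is hypothetical.

```python
def fill_blindspots(image, blind):
    """Fill blindspot pixels from the outside in with the mean of their
    already-valid 4-neighbours (simple onion-peel interpolation).

    image: 2-D list of grey values; blind: 2-D list of booleans (True = blindspot).
    Returns a new image; pixels with no valid neighbour anywhere stay unchanged.
    """
    out = [list(row) for row in image]
    rows, cols = len(image), len(image[0])
    todo = {(r, c) for r in range(rows) for c in range(cols) if blind[r][c]}
    while todo:
        layer = {}
        for r, c in todo:
            vals = [out[nr][nc]
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in todo]
            if vals:
                layer[(r, c)] = sum(vals) / len(vals)
        if not layer:
            break  # the remaining pixels cannot be reached from any valid pixel
        for (r, c), value in layer.items():  # update only after the whole layer is computed
            out[r][c] = value
        todo.difference_update(layer)
    return out


print(fill_blindspots([[10, 0, 20]], [[False, True, False]]))  # [[10, 15.0, 20]]
```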
Some imagery shows a strong contrast between sunlit and shadowed areas. Since the exact
acquisition time and camera position of each image are known, the shadows cast by
buildings and other objects can be calculated from the elevation model. The scoring
method could then take into account whether an image pixel is shadowed, or shadowed
areas could be brightened by histogram equalization or by matching their histogram to
that of sunlit areas.
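Matching the histogram of shadowed pixels to that of sunlit pixels can be sketched as follows. The sketch assumes that shadowed and sunlit pixels have already been separated, for example with a shadow mask computed from the DEM and the sun position, and that the values are single-band grey values. This is the generic histogram-matching technique, not code from the thesis software.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each value in `source` to the `reference` value with the same
    cumulative frequency (histogram matching via the two empirical CDFs)."""
    src_sorted, ref_sorted = sorted(source), sorted(reference)
    n, m = len(src_sorted), len(ref_sorted)
    mapping = {}
    for v in set(source):
        count = bisect_right(src_sorted, v)  # number of source pixels <= v
        k = -(-count * m // n) - 1           # ceil(count * m / n) - 1, integer arithmetic
        mapping[v] = ref_sorted[k]           # smallest reference value reaching that CDF
    return [mapping[v] for v in source]


shadow = [10, 20, 20, 30]      # grey values of shadowed pixels
sunlit = [100, 150, 200, 250]  # grey values of sunlit pixels
print(match_histogram(shadow, sunlit))  # [100, 200, 200, 250]
```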
The rectification process is by far the slowest part of true orthophoto generation.
Recently, the processors on 3D graphics cards, so-called Graphics Processing Units (GPUs),
have been used in various fields for fast calculations as an alternative to supercomputers.
GPUs are highly optimized for a few kinds of calculations, such as floating-point
arithmetic. With the Compute Unified Device Architecture (CUDA) from NVIDIA or Stream
from AMD/ATI, GPUs can be used as co-processors. Since raytracing consists of a very large
number of independent floating-point calculations that map well onto graphics hardware,
this hardware-based approach may speed up the process significantly.
Until now, an alternative to true orthophotos has been to create normal orthophotos from
imagery with a large overlap of more than 80 percent. Using only the central part of each
image eliminates the worst relief displacements, but still produces orthophotos with
significant relief displacements, especially in dense urban areas. This method is, of
course, expensive in flight hours and image preprocessing, but most likely cheaper than
first creating a detailed surface model manually. With new technologies, higher resolutions
and more detail, formerly negligible subpixel displacements become pixel-level
displacements, which could only be compensated by an even larger and more expensive
image overlap. This study devised a method to generate true orthophotos without
expensive 3D surface models, using inexpensive, fully automatically generated
stereo-matched elevation models instead. This leads to the conclusion that the decreasing
cost of true orthophotos, together with the demand for greater detail and accuracy, will
lead true orthophotos to displace normal orthophotos in the near future.
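A short calculation shows why finer resolution turns negligible displacements into visible ones. Consider a vertical pinhole camera at flying height H above flat ground. The ray through the top of an object of height h, at horizontal distance R from the nadir point, meets the ground at R·H/(H − h), so the top is shifted by R·h/(H − h). Dividing by the ground sample distance (GSD) gives the shift in pixels. The numbers below are illustrative and not taken from the thesis test data.

```python
def relief_displacement_px(height, distance, flying_height, gsd):
    """Ground shift of a vertical object's top relative to its base, in pixels.

    Assumes a vertical (nadir-looking) pinhole camera over flat ground and
    height < flying_height; all lengths in metres, gsd in metres per pixel.
    """
    shift = distance * height / (flying_height - height)  # metres on the ground
    return shift / gsd


# A 20 m building 20 m from the nadir point, photographed from 1000 m:
print(round(relief_displacement_px(20, 20, 1000, 0.5), 2))  # 0.82 px at 0.5 m GSD
print(round(relief_displacement_px(20, 20, 1000, 0.1), 2))  # 4.08 px at 0.1 m GSD
```

The same building near the image center is a subpixel error at 0.5 m GSD, but a shift of several pixels at 0.1 m GSD.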
Acknowledgements
At this point I would like to thank all the people who contributed to the success of my
Bachelor thesis with their support.
I am deeply indebted to Dr.-Ing. Pablo d’Angelo at the German Aerospace Center in
Oberpfaffenhofen for the offer, the mentoring and assistance during the development of
the thesis and the subsequent proofreading.
My thanks also go to Prof. Andreas Siebert PhD at the University of Applied Sciences
for his mentoring and for providing the impetus for this thesis.
I want to thank all members of staff at the department of applied remote sensing cluster,
especially Dr. Peter Reinartz and Dr. Danielle Hoja, who made it possible to write
my Bachelor thesis at the German Aerospace Center and Dr. Danielle Hoja also for
proofreading certain chapters.
Additionally, I would like to thank my American friend Amar H Patel M. Eng. for
proofreading.
The greatest debt of appreciation I will forever owe my family, who always supported me
in my professional career.
References
[A+98] F. Amhar et al. The Generation of True Orthophotos Using a 3D Building Model
in Conjunction With a Conventional DTM. IAPRS, Vol.32, p.16-22, 1998.
[BA83] P. J. Burt and E. H. Adelson. A multiresolution spline with application to image
mosaics. ACM, 1983.
[Bar04] M. Bar. Visual objects in context. Nature Reviews Neuroscience 5, 2004.
[Bre65] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM
Systems Journal 4, 1, 1965.
[C+01] J.M. Carstensen et al. Image analysis, vision and computer graphics. Technical
University of Denmark, 2001.
[Cor04] Corbet. Scheduling domains, http://lwn.net/Articles/80911, 2004.
[d+09] P. d’Angelo et al. Towards automated DEM generation from high resolution
stereo satellite images. Commission IV, 2009.
[dB+08] M. de Berg et al. Computational Geometry. 2008.
[E+00] S. Easa et al. Urban planning and development applications of GIS. ASCE
Publications, 2000.
[EH01] S. M. Ervin and H. H. Hasbrouck. Landscape modeling: digital techniques for
landscape visualization. McGraw-Hill Professional, 2001.
[Hir07] H. Hirschmueller. Stereo Processing by Semiglobal Matching and Mutual Information. IEEECS, 2007.
[K+04] Y. Kuzmin et al. Polygon-based True Orthophoto Generation. Proceedings of
the 20th ISPRS congress: 405, 2004.
[KE02] M. Kasser and Y. Egels. Digital photogrammetry. CRC Press, 2002.
[Kra07] K. Kraus. Photogrammetry. Walter de Gruyter, 2007.
[M+01] E.M. Mikhail et al. Introduction to Modern Photogrammetry. Wiley and Sons
Inc., 2001.
[Md04] A. Mihal and P. d’Angelo. http://enblend.sourceforge.net/details.htm, 2004.
[N+96] B. Nichols et al. Pthreads Programming. O’Reilly And Associates, 1996.
[NF02] O. Nommensen and M. Firuziaan. Parallel Processing via MPI & OpenMP.
Linux Enterprise, 2002.
[Nie04] M. O. Nielsen. True orthophoto generation. Technical University of Denmark,
2004. Master’s thesis, Preparatory thesis.
[Nor96] J. Northrup. Programming with UNIX Threads. John Wiley And Sons, 1996.
[ST08] J. Shan and C. K. Toth. Topographic laser ranging and scanning: principles
and processing. CRC Press, 2008.
[Tao09] J. Tao. Generierung von 3D-Oberflaechenmodellen aus stark ueberlappenden
Bildsequenzen eines Weitwinkel-Kamerasystems. University of Stuttgart, 2009.
Diploma thesis.
[Wik09] Wikipedia.org. http://de.wikipedia.org/wiki/Zeilenkamera, 2009.
[Wik10a] Wikipedia.org. http://de.wikipedia.org/wiki/Bresenham-Algorithmus, 2010.
[Wik10b] Wikipedia.org. http://de.wikipedia.org/wiki/Rasterung_von_Linien, 2010.
List of Abbreviations
CUDA Compute Unified Device Architecture
DEM Digital Elevation Model
DSM Digital Surface Model
DTM Digital Terrain Model
GCP Ground Control Point
GIS Geographic Information System
GPS Global Positioning System
GSD Ground Sample Distance
IMU Inertial Measurement Unit
MI Mutual Information
NFT Nearest Feature Transform
PPAC Principal Point of Autocollimation
PPBS Principal Point of Best Symmetry
RG Regular Raster Grid
SGM Semi-Global-Matching
TIN Triangulated Irregular Network
List of Figures
2.1 Distinction of orthographic and perspective projection
2.2 Cause of relief displacements in perspective images
2.3 Forward and backward projection
2.4 Relief displacements
2.5 Impact of flight altitude and distance to nadir point on relief displacements
2.6 Real-world example of relief displacement
2.7 Object Stretching with forward projection due to occluded areas
2.8 Occluded areas
2.9 Possible seamline placement in some orthophotos
2.10 Combination of several images for full coverage
2.11 Imagery showing the different stages of true orthophoto generation
3.1 Camera types
3.2 Exterior orientation
4.1 Levels of DEM
4.2 Illustration of image overlapping
4.3 Grid example
4.4 DEM image
4.5 Delaunay Triangulation
4.6 TIN Examples
4.7 DEM Generation Process
5.1 General approach of the true ortho rectification process
5.2 Possible case for occluded pixels
5.3 Possible case of a mosaicked image
6.1 Raytracing of output pixels to camera in object space
6.2 Possible arrangements of DEM and Orthophoto
6.3 Different heights to trace to
6.4 Raytracing with local maximum heights
6.5 Slopes in different octants
6.6 Raster process with Bresenham
6.7 Multi-processing illustration
6.8 Illustration of orthorectification
6.9 Nearest Neighbor Transformation and Bilinear Interpolation
6.10 Nearest Neighbor Transformation and Bilinear Interpolation Example
7.1 Sunlight impact
7.2 Distance transformation
7.3 Image mosaic
7.4 Seamline feathering
7.5 Weighted Average
7.6 REDUCE operation
7.7 Gaussian pyramid
7.8 Laplacian pyramid
7.9 Multiresolution spline
8.1 Illustration of rectified tall building and narrow backyard
8.2 Impact of deviations in DEM
8.3 Remaining blindspots
8.4 Laser Scanner DEM
8.5 Mosaic of 15 images
A Appendix
A.1 Content of companion CD
For reference, the developed application and the current version of enblend [Md04] are
included on the companion CD, together with the true orthophotos used throughout the
thesis and the source code of the developed application.
Below is a listing of the contents:
Folder:             Description:
/dem imgs           Elevation models for raytracing
/enblend            Compiled version of enblend [Md04] plus a Python script to prepare XDibias imagery for enblend
/findocc            The software for raytracing and rectification based on a surface model. A brief user guide can be found in Appendix ??.
/src                Source code of findocc and enblend [Md04]
/src imgs           Source images for true orthophoto generation
/thesis             PDF version of this thesis
/true orthophotos   Full-resolution true orthophotos created with the findocc application and enblend [Md04]; format: TIFF
/vis masks          Rectified and raytraced images
A.2 Enblend user guide
To mosaic the images, the enclosed Python script has to be used, since the XDib images
have to be converted to TIFF images and the script already sets the parameters for
enblend [Md04].
The following shell command has to be executed in the root directory of the CD:
./enblend/enblend xdibias.py <output image> <vis masks/input image> <vis masks/input image>+
A.2.1 Raytracing user guide
To run the raytracing application without XDibias, the following command has to be
executed in the shell in the root directory of the CD:
./findocc/findocc -i l=<src imgs/input image[,xs=,ys=,width=,height=]> -i l=<dem imgs/dem image> -o l=<output image>
A bounding box can be set for the input image by appending the start coordinates xs
and ys and the box dimensions width and height to the input image argument, so that
only the box is rectified.