CGI2012 manuscript No. (will be inserted by the editor)
GPU based Single-Pass Ray Casting of Large Heightfields Using Clipmaps
Dirk Feldmann · Klaus Hinrichs
Abstract Heightfields have proved to be useful for
rendering terrains or polygonal surfaces with fine-
structured details. While GPU-based ray casting has
become popular for the latter setting, terrains are com-
monly rendered by using mesh-based techniques, be-
cause the heightfields can be very large and hence ray
casting on these data is usually less efficient. Compared
to mesh-based techniques, ray casting is attractive, for
it does not require dealing with mesh-related problems
such as tessellation of the heightfield, frustum culling
or mesh optimizations. In this paper we present an ap-
proach to render heightfields of almost arbitrary size at
real-time frame rates by means of GPU-based ray cast-
ing and clipmaps. Our technique uses level-of-detail de-
pendent early ray termination to accelerate ray casting
and avoids aliasing caused by texture sampling or spa-
tial sampling. Furthermore, we use two different meth-ods to improve the visual quality of the reconstructed
surfaces obtained from point sampled data. We evalu-
ate our implementation for four different data sets and
two different hardware configurations.
Keywords ray casting · rendering · single-pass · clipmap · heightfield · terrain
1 Introduction
Heightfield rendering has numerous applications in sci-
ence and entertainment. One major application is ter-
rain rendering, which is increasingly used to map high-resolution aerial photographs acquired by airplanes,
satellites or unmanned aerial vehicles onto a digital sur-
face model (DSM) of the covered area. This approach
Dirk Feldmann · Klaus Hinrichs, VisCG, Department of Computer Science, University of Münster, Germany
preserves depth perception and provides context and
other information to the viewer. Popular examples are
NASA World Wind [23] or Google Earth [15]. Since
textured polygonal meshes can be processed and ren-
dered by GPUs at high speed, a widely used rendering
technique stores a DSM in (grayscale) texture maps
(so called heightmaps or heightfields) and uses them
to displace the vertices of the corresponding polygonal
mesh [6].
However, most renderers accept only triangle meshes
which can become rather complex and may easily con-
sist of millions of triangles. During mesh generation par-
ticular attention has to be paid to different issues, e. g.,
to not produce any cracks, to choose appropriate tessel-
lations and to avoid aliasing caused by small or distant
triangles.
Therefore it appears attractive to bypass the entire process of converting a heightfield into a mesh which is finally rasterized and in which many triangles yield at most a few pixels whose corresponding fragments succeed in passing all of the numerous tests encountered on their way through the rendering pipeline.
Techniques like relief mapping [27] or parallax occlusion
mapping [33] can make use of pixel shaders on modern
GPUs to perform real time ray casting on heightfields in
order to calculate the displaced sample positions in cor-
responding color textures which contribute to the final
fragment color. During this ray casting fine-structured
details can be added to surfaces without further tessel-
lating the underlying polygonal mesh. In many cases
this even makes it possible to reduce the polygonal mesh to a single planar quadrilateral which usually consists of only
two triangles.
In order to speed up the ray casting and to achieve
real-time frame rates, many GPU-based heightfield ren-
dering techniques employ maximum mipmaps to access
the DSM. As the size of texture maps that can be han-
dled by GPUs is currently limited by vendor specific
restrictions and ultimately by the amount of available
video memory, large DSMs cannot be stored in a single
heightfield texture for direct access during GPU-based
ray casting.
In this paper we present a GPU-based heightfield ray
casting technique which performs single-pass rendering
of heightfields of almost arbitrary sizes in real time. Our
main contribution is to demonstrate how clipmaps and
current graphics hardware can be used to speed up the
ray casting and improve the image quality by early ray
termination based on level of detail selection while alle-
viating the aforementioned video memory limitations.
Additionally we use two different refinement methods
to improve the appearance of the reconstructed surfaces
in our renderings. We demonstrate the performance of
our technique for four large data sets of up to 31 GB
size.
2 Related Work
Much research has been performed on CPU-based ray
casting of heightfields as well as on terrain rendering
based on polygonal meshes. Since summarizing these
two areas would exceed the scope of this paper, we
confine ourselves to an overview of recent GPU-based
heightfield ray casting methods related to our work.
Qu et al. [28] presented one of the first GPU-based ray
casting schemes for heightfields which primarily aims at
accurate surface reconstruction of heightfields but does
not use any sophisticated structures for acceleration.
Relief mapping [25] and parallax (occlusion) mapping
[16] are techniques for adding structural details to polyg-
onal surfaces, which have their origin in CPU-based ren-
dering and improve upon the disadvantages of bump
mapping [4]. Both techniques have been implemented
for GPUs (e. g. [27,33]) and benefit from programmable
graphics pipelines. But as most of these implementa-
tions resemble the strategies used in CPU-based ray
casting, like iterative and/or binary search to detect
heightfield intersections, they are prone to the same
kind of rendering artifacts caused by missed intersec-
tions in highly spatially variant data sets. An introduc-
tion to these closely related techniques can be found
for instance in [1], and more details are given in the
comprehensive state-of-the-art report by Szirmay-Kalos
and Umenhoffer [30] which focuses on GPU-based im-
plementations.
Oh et al. [24] accelerate ray casting and achieve real-
time frame rates by creating a bounding volume hier-
archy (BVH) of the heightfield, which is stored in a
maximum mipmap and makes it possible to advance safely along
the ray over long distances (see section 3.3). They also
present a method based on bilinear interpolation of
heightfield values to improve the quality of the recon-
structed surface obtained from point-sampled data. The
method presented by Tevs et al. [34] also relies on BVHs
stored in maximum mipmaps, but uses a different sam-
pling strategy. Their method advances along the ray
from one intersection of the projected ray with a texel
boundary to the next such intersection, whereas Oh
et al. use a constant step size to advance along the ray.
In addition, Tevs et al. store in each heightfield texel
the height values at the four corners of a quadrilateral
encoded as an RGBA value instead of point samples,
which allows surface reconstruction on parametric de-
scriptions.
Compared to other techniques which also rely on pre-
processed information about the heightfield and accel-
eration data structures, like for instance relaxed cone
step mapping [10,26,19], maximum mipmap creation is
much faster and can be performed on the GPU [34].
All these methods have in common that they operate
on single heightfields of relatively small extents which
are intended to add details to surfaces at meso- or mi-
croscales instead of representing vast surfaces them-
selves. Recently Dick et al. [8] have presented a method
for ray casting terrains of several square kilometers ex-
tent at real-time frame rates. Their method also em-
ploys maximum mipmaps to accelerate the ray casting
process and a tiling approach to render data sets of
several hundred GB size. They also presented a faster
hybrid method which uses ray casting or rasterization-
based rendering, but requires knowledge of the employed
GPU or, alternatively, a training phase to decide whether to
use rasterization or ray casting [9].
Our method presented in this paper also aims at ren-
dering very large heightfields only by means of GPU
ray casting. It has been inspired in large parts by the
works of Dick et al. and Tevs et al. as we also employ a
tile-based approach and their cell-precise ray traversal
scheme. But in contrast to the technique by Dick et al.,
which creates a complete mipmap for each tile and re-
quires additional rendering passes to determine the vis-
ibility of the tiles, our method further accelerates the
ray casting process and requires only a single rendering
pass by using a tile-based clipmap implementation.
The clipmap, as introduced by Tanner et al. [32], is
based on mipmaps [35] in order to handle very large
textures at several levels of detail which would exceed
the available video or main memory. While the orig-
inal version requires special hardware, modern GPU
features have superseded these requirements and other
clipmap implementations (or virtual textures) have be-
come available [11,7,29,20,17,31,13], most of which rely on texture tiles and permit handling of
arbitrarily large textures as briefly described in sec-
tion 3.1. Geometry clipmaps as introduced by Losasso
et al. [18], and derived GPU-based variations [3,5] have
also been used in terrain rendering, but according to our
knowledge only in the context of mesh-based rendering
and not for accelerating ray casting.
3 GPU-based Single-pass Ray Casting Using
Clipmaps
In this section we briefly present our tile-based clipmap
implementation, followed by a description of the used
storage scheme for heightfields. Next we describe the
employed ray traversal method, which is basically the
same as the one described in [8], and we discuss how
we accelerate it and avoid aliasing by using clipmaps.
Finally, we present two refinement methods which we
use to improve the appearance of the reconstructed sur-
faces.
3.1 Tile-based Clipmap Implementations
Clipmaps are storage schemes for texture maps (tex-
tures) which are based on mipmaps and rely like these
on the principle of using pre-filtered data to avoid alias-
ing artifacts when multiple texels are mapped to one
pixel or less in screen space due to perspective projec-
tion (texture minification) [35]. In contrast to mipmaps,
clipmaps only keep those data in memory which are
relevant for rendering the current frame, and they use
caching techniques to reload and update these data.
This reduces the amount of (video) memory occupied
by texture data and also makes it possible to handle textures which would far exceed the limits of video or main mem-
ory. The clipmap by Tanner et al. [32] relies on special
hardware to update the texels in video memory when
the viewer’s eye point is moved. Modern GPUs make it possible to implement clipmaps by using texture tiles and accessing
them in fragment shaders, e. g., by means of texture ar-
rays. Our implementation uses a Flexible Clipmap [13]
which is constructed as described in the following.
At the level l = 0, which corresponds to the finest res-
olution, the original virtual texture is partitioned into
smaller tiles of n×m texels (tile size). Like in a mipmap,
each group of 2 × 2 neighboring texels at level l is combined in a certain way into a single texel at the next coarser level l + 1, which implies that 2 × 2 neighboring tiles at level l correspond to one tile of the same tile size at level l + 1. With color textures for instance, the combination
may simply be an averaging operation on the values of
the four texels, but the operation depends on the kind
of texture. This process is repeated until the original
texture is completely covered by a single tile of n × m texels at the least detailed level l = L − 1, which can be
used to derive an ordinary mipmap. In the following we
use the term “clipmap” to refer only to these lower L
levels of a complete, tile-based clipmap but stick with
the terminology as used by Tanner et al. [32].
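As an aside, the number of clip levels L follows directly from this construction. The sketch below (ours, not from the paper) computes the smallest L for which the coarsest level covers the whole virtual texture with one tile; the tile size n = m = 512 is an assumption, borrowed from the paper's later example.

```python
import math

def clip_level_count(W, H, n, m):
    """Smallest number of clip levels L such that at the coarsest
    level L-1 the whole W x H virtual texture fits into one n x m tile
    (each level halves the resolution along both axes)."""
    return 1 + max(0, math.ceil(math.log2(max(W / n, H / m))))

# Grid sizes of the paper's four data sets, tile size 512 assumed:
print(clip_level_count(5600, 4000, 512, 512))     # 5
print(clip_level_count(83600, 105200, 512, 512))  # 9
print(clip_level_count(21600, 10800, 512, 512))   # 7
print(clip_level_count(86400, 43200, 512, 512))   # 9
```

With a 512-texel tile size these values reproduce the level counts L listed in table 1 for the four evaluation data sets.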
A clip center depending on the current location and
viewing direction of the virtual scene camera is used
to determine for each level the tiles that are needed
in the current frame. This group of neighboring tiles
is called the active area and located in video memory.
The clip area formed by a larger superset of tiles is kept
in main memory, and the remaining tiles are stored in
secondary memory, e. g., on hard disk. Since the lower
levels of a corresponding mipmap are effectively clipped
to smaller areas, this data structure is called clipmap.
Figure 1 illustrates the principle of a tile-based clipmap.

Fig. 1: Structure of a tile-based clipmap with L = 4 clip levels, with active areas of at most 3 × 3 tiles (dark gray) and clip areas of at most 5 × 5 tiles (light gray).

Once the tiles have been uploaded to video memory, they can be accessed by shaders for rendering. When
the clip center is relocated, i. e., the virtual camera is
moved, tiles stored in video memory and main memory
can be replaced by neighboring ones from main memory or secondary memory, respectively, if necessary. If the
virtual camera is located for instance far away from the
textured surface currently visible, only the coarser res-
olution (higher) levels are required, as the texels from
the lower levels would cause aliasing. Hence it is not al-
ways required to keep the active areas of all clip levels
in video or main memory. Of course the tile size has to
be chosen carefully to ensure that the tiles themselves
are manageable by the graphics hardware. More details
on clipmap specific issues can be found in [32,7].
3.2 Clipmaps for DSM Storage
Due to their relation, clipmaps and mipmaps can be
created and used in very similar ways. To use a digital
surface model (DSM) for rendering, in our approach
the heightfield values are stored in the clipmap tiles at
the finest resolution (lowest) level l = 0. A texel at
level l > 0 obtains as height value the maximum height
value of the corresponding 2 × 2 subordinate texels at
level l − 1. If we identify each texel with a bounding
box defined by its height value and its grid cell in the
texture, we obtain a bounding volume hierarchy (BVH)
of the underlying DSM as illustrated in figure 2.

Fig. 2: BVH derived from a heightfield on a regular grid. Gray boxes correspond to samples at level 0. Bounding boxes on higher levels and their maximum value are highlighted by the same color.

This is the same construction scheme as used with maxi-
mum mipmaps [24,34,8]. In the method presented by
Dick et al. [8], the heightfield is split into tiles as well,
but a separate maximum mipmap is created for each
tile. To render vast DSMs, this approach may require
either many tiles and thus mipmaps to be present in
video memory or additional rendering passes, especially
if the heightfield is shallow and there is little occlu-
sion between tiles. Furthermore, the tiles located far
away from the viewer may contain fine spatial details,
e. g., steep summits of distant mountains, which are not
only not perceivable from far away but may also expose spatial aliasing artifacts due to minification caused by
perspective projection. The latter aspect is the same
which motivated the development of mipmaps for tex-
ture mapping and also applies to mesh-based rendering
techniques which therefore strive to determine an ap-
propriate level of detail (LOD) in order to avoid ras-
terizing triangles that would become projected to less
than one pixel in screen space.
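The construction of this maximum-value hierarchy can be sketched as follows. This is our illustration in Python/NumPy, assuming a square heightfield with power-of-two extent; in practice it would run on the GPU, one mipmap level at a time.

```python
import numpy as np

def build_max_pyramid(height):
    """Maximum mipmap (BVH) over a 2^k x 2^k heightfield: each texel
    at level l+1 stores the maximum of the corresponding 2x2 texels
    at level l, so it bounds all level-0 samples beneath it."""
    levels = [np.asarray(height, dtype=np.float32)]
    while levels[-1].shape[0] > 1:
        h = levels[-1]
        # maximum over each non-overlapping 2x2 block
        coarser = np.maximum.reduce([h[0::2, 0::2], h[0::2, 1::2],
                                     h[1::2, 0::2], h[1::2, 1::2]])
        levels.append(coarser)
    return levels

hf = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=np.float32)
pyr = build_max_pyramid(hf)
print([lv.shape for lv in pyr])  # [(4, 4), (2, 2), (1, 1)]
print(pyr[1])  # [[ 6.  8.] [14. 16.]]
print(pyr[2])  # [[16.]]
```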
The important difference between the usage of clipmaps
and multiple mipmaps is that in the case of clipmaps the
BVH spans the entire domain at the topmost level. A
proper placement of the clip center results in the se-
lection of only those tiles of highest resolution at level
l = 0 which are closest to the virtual camera and thus
potentially have to be rendered in full detail. Compared
to level l, at level l+1 the area of the heightfield covered
by a tile is four times larger, and the spatial resolution
is divided in half along each direction of the grid. Thus
the entire domain is spatially pre-filtered and the level
of detail of the heightfield decreases with increasing distance from the viewer. Because higher clipmap levels
also correspond to larger bounding boxes, we can ex-
ploit this fact to accelerate GPU ray casting in the far
range of the scene as described in the following section.
3.3 Rendering and Accelerating Ray Casting
Given a DSM stored in a clipmap of L levels, we set
the clip center simply by projecting the center of the
viewport into the scene. We also ensure that all tiles in
the active areas of all clip levels or at least the highest
(coarsest) ones are stored in video memory by choosing
appropriate sizes for the tiles and the active area. The
axis-aligned bounding box of the entire DSM, which is
associated with the topmost tile, is based in the xz-
plane of a left-handed world coordinate system. It is
represented by a polygonal mesh consisting of 12 trian-
gles which serves as proxy geometry for the ray casting
process. A vertex shader calculates normalized 3D tex-
ture coordinates from the vertex coordinates of the box
corners, and the clipmap is positioned at the bottom
of the box corresponding to the minimum height value
y = Hmin of the DSM. Hmin and the maximum height
value Hmax are both determined during loading of the
topmost clipmap tile on the CPU. By rendering the
back faces of the proxy geometry we obtain each ray’s
exit point e, and we pass the camera position and the
geometry of the bounding box in world coordinates to
the fragment shader which calculates each ray’s direc-
tion d = (dx, dy, dz) and entry point s to the proxy
geometry and transforms them into normalized 3D tex-
ture space. If the camera is located within the bound-
ing box the entry point s becomes the camera position
(cf. [19]). To prevent faces of the proxy geometry from being clipped against the far plane of the view frustum of the virtual camera, which would result in missing exit points, the box is fitted into the view frustum when the camera is translated.
The actual ray traversal is performed by projecting the
ray onto a clip level dependent 2D grid. For a given
clip level 0 ≤ l < L the extensions of this grid are
determined by (Gu(l), Gv(l)) = (W/2^l, H/2^l), with (W, H) being the extensions of the DSM in sample points, i.e.,
texels. Hence, the grid at level l has the same size as a single texture containing the entire DSM at mipmap
level l would have. The current height py of a loca-
tion p = (px, py, pz) = s + k · d on the ray is retained
and updated in world coordinates to test for intersec-
tions with the heightfield. During ray traversal we move
from one intersection of the projected ray dp = (dx, dz)
with a texel boundary to the next such intersection,
i. e., from the projected ray’s entry point enp into a
grid cell directly to its exit point exp as shown in figure 3. The only exception is at the first entry point, which is the projection of s.

Fig. 3: Rays are traversed from one intersection of the projected ray with a texel boundary to the next such intersection.

We start ray casting at the coarsest (highest) clip level L − 1 of the BVH at
which the entire DSM is given in a single tile and each
texel corresponds to the maximum value and thus the bounding box of 2^(L−1) × 2^(L−1) texels at level 0. To de-
termine whether a ray hits a bounding box at level l,
the clipmap tile containing the grid cell which belongs
to the current enp and exp has to be sampled for the
associated height value h. Since the direction of the ray
is needed to determine this grid cell we store the sign
bits of the components of d in the lower three bits of
an integer. This bit mask is created once for each ray
using bit-wise operations in the fragment shader, and
it is evaluated as needed by switch-statements to de-
termine the direction of a ray instead of duplicating
the shader code for the ray casting loop for each of the
overall eight possible branches.
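The sign-bit mask can be sketched as follows (our Python illustration; the particular bit assignment is an assumption, since the paper does not specify one).

```python
def direction_mask(d):
    """Pack the sign bits of ray direction d = (dx, dy, dz) into the
    lower three bits of an integer, built once per ray; bit i is set
    when the corresponding component is negative. A switch over the
    eight mask values then selects the traversal direction without
    duplicating the ray casting loop."""
    dx, dy, dz = d
    return (dx < 0) | ((dy < 0) << 1) | ((dz < 0) << 2)

print(direction_mask((1.0, -2.0, 0.5)))    # 2 (only dy negative)
print(direction_mask((-1.0, -1.0, -1.0)))  # 7 (all components negative)
```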
When moving along the ray from point en to point ex
we hit the box surface if the ray is directed downwards
(resp. upwards) and ex (resp. en) lies below the top of
the box (at height h). If a ray hits a bounding box B
at the current level l, it may also hit a bounding box
contained in B at a lower level of the BVH. Therefore
the ray casting process is repeated at the next lower
level l′ = l − 1 from the current position en of the ray,
but only if it is possible and reasonable to proceed as
described in section 3.4. Otherwise the lowest possible
level l = lmin has been reached, and the exact inter-
section i on the bounding box surface is calculated by
i = en,  if dy ≥ 0
i = en + d · max((h − eny)/dy, 0),  if dy < 0
If a ray does not intersect a bounding box B at level l,
it cannot intersect any of the bounding boxes contained
in B at any lower level either, and we therefore advance
along the ray to ex which becomes the entry point en
of the next cell. Compared to a ray traversal performed
just on level 0, only one instead of 2^l × 2^l samples have to
be tested for intersection, which results in a significant
speed up of the process (cf. [34], [8]).
If a ray hits a bounding box B at some level l > 0
it does not necessarily have to hit any bounding boxes
contained in B at level l−1. This cannot be determined
without descending to the lower level. In order to avoid
using the smaller step size over longer distances when it
is not really necessary, we move up again to level l if we
detect that the ray does not hit any bounding box at
level l−1 (cf. [34,8]). These three different cases for the
intersection of a ray with a bounding box are illustrated
in figure 4.

Fig. 4: Intersection of a ray with a heightfield. The green ray hits the left red box, but none of the black boxes contained in it.

The ray casting process is terminated if either a valid intersection point i on a bounding box
has been found, the ray leaves the domain of the DSM,
or the maximal number of ray casting steps exceeds
2 · max(n,m) with n, m as the tile size in texels. In
the latter two cases, the fragment from which the ray
originates is discarded by the shader.
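The traversal with its descend/ascend logic can be illustrated in a reduced setting. The sketch below (ours, not the shader code) uses a 1D heightfield and a ray with dx > 0, so cells are intervals and a 1D maximum pyramid plays the role of the BVH; boundary handling via a small epsilon and the function names are our simplifying choices.

```python
def build_max_pyramid_1d(h):
    """Maximum pyramid over a 1D heightfield of power-of-two length."""
    levels = [list(h)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def cast_ray(pyr, sx, sy, dx, dy, eps=1e-9):
    """Cell-precise traversal of the pyramid by the ray
    p(k) = (sx + k*dx, sy + k*dy), assuming dx > 0.
    Returns the level-0 cell index hit, or None if the ray leaves."""
    n = len(pyr[0])
    l = len(pyr) - 1            # start at the coarsest level
    k = 0.0                     # ray parameter at the current entry point
    while True:
        x = sx + k * dx
        if x >= n:
            return None         # ray left the domain of the DSM
        cell = int(x / (1 << l))                 # cell index at level l
        k_exit = (((cell + 1) << l) - sx) / dx   # parameter at cell exit
        y_entry = sy + k * dy
        y_exit = sy + k_exit * dy
        top = pyr[l][cell]
        if min(y_entry, y_exit) <= top:   # ray dips below the box top
            if l == 0:
                return cell     # hit at full resolution
            l -= 1              # descend: a contained box may be hit
        else:
            k = k_exit + eps    # miss: advance to the next cell ...
            if l < len(pyr) - 1:
                l += 1          # ... and ascend to take larger steps

pyr = build_max_pyramid_1d([1, 3, 2, 5, 0, 1, 4, 2])
print(cast_ray(pyr, 0.0, 6.0, 1.0, -1.0))  # 3 (first column tall enough)
print(cast_ray(pyr, 0.0, 10.0, 1.0, 0.0))  # None (ray passes above all)
```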
3.4 LOD-determined Ray Termination
To decide whether we can terminate ray casting at the
current level, we check the following two conditions.
First, we determine at each intersection of a bounding
box the highest resolution available, i. e., the lowest clip
level llow of a tile which covers the corresponding area of
the DSM and is present in video memory. The clipmap
tiles from the active areas of all clip levels are stored in a
texture array which is accessed by the fragment shader.
The Flexible Clipmap uses a certain tile layout and an
additional texture, the tile map [7], to find llow and the
index in the texture array where the corresponding tile
has been stored during its upload into video memory
(see [13] for details). The tile map covers the entire do-
main of the DSM as well, but each texel corresponds to
one tile of n×m texels at the lowest level l = 0. Each
texel stores the lowest clip level of the tile which cov-
ers the corresponding area of the DSM and is currently
present in video memory. For instance, given a tile size
of n = m = 512 texels, a tile map of 512 × 512 texels
holds information about the clip levels of 512² × 512²
heightfield samples. When tiles at and above level l ≥ 0
are available in video memory, the tile map contains a
square region of 2^l × 2^l texels with value l (cf. [32]). The
tile map is created on the CPU whenever the cache for
the clipmap tiles is updated due to relocations of the
clip center, and tiles are uploaded in top-down order
to ensure that at least the highest levels are present if
secondary caching structures cause a delay, e. g., when
tiles have to be loaded from hard disk. Thus, by trans-
forming the hit point i on the bounding box surface to
normalized texture coordinates the shader can deter-
mine llow by a single texel-precise texture lookup in the
tile map.
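The texel-precise tile map lookup can be sketched as follows (our Python illustration; the 4 × 4 tile map contents are a made-up example of resident levels, with a 2 × 2 block of level-1 tiles near the clip center).

```python
def lookup_llow(tile_map, u, v):
    """Lowest resident clip level l_low for a hit point (u, v) in
    normalized texture coordinates. tile_map[i][j] holds the lowest
    clip level whose tile covering that area is in video memory."""
    rows, cols = len(tile_map), len(tile_map[0])
    j = min(int(u * cols), cols - 1)  # clamp for u == 1.0
    i = min(int(v * rows), rows - 1)  # clamp for v == 1.0
    return tile_map[i][j]

tm = [[2, 2, 2, 2],
      [2, 1, 1, 2],
      [2, 1, 1, 2],
      [2, 2, 2, 2]]
print(lookup_llow(tm, 0.4, 0.4))   # 1 (level-1 tile resident here)
print(lookup_llow(tm, 0.95, 0.1))  # 2 (only level 2 resident here)
```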
Second, the optimal clip level lopt at the current hit
point i = (u, hgrid, v) is determined by the minification
of the corresponding box at level l = 0 in screen space
(cf. [12]). We project the four corners of the cell’s box
κ = (⌊u⌋ · Rx, ⌊hgrid⌋ · Ry, ⌊v⌋ · Rz), λ = κ + (Rx, 0, 0), µ = κ + (0, 0, Rz) and ν = κ + (0, Ry, 0) from world
space into normalized screen space using the model,
view and projection matrix combined in M followed by
perspective division to obtain the vectors a,b, c and f ,
where Rx, Ry, Rz are the numbers of world space units
per heightfield sample along the respective direction.
Then we calculate the areas A1, A2 and A3 of the pro-
jected faces of a box in screen space:
p = b − a, q = c − a, r = f − a
A1 = |p × q| = |px · qy − py · qx|
A2 = |p × r| = |px · ry − py · rx|
A3 = |q × r| = |qx · ry − qy · rx|
We want the largest face of one box in screen space A =
max (A1, A2, A3) to correspond to one texel of a tile at
level lopt in texture space which itself has an area of
P = 1/(n · m). Hence 2^lopt = P/A, and thus lopt = −log2(A · n · m).
Instead of descending to a full resolution mipmap level
which may cause aliasing we can now terminate ray
casting already at level lmin = max (llow, lopt). The two
different LODs llow and lopt are visualized in figure 5
where each level is coded by a different color.
(a) llow (b) lopt
Fig. 5: The two different LODs llow and lopt are used
to terminate the ray traversal and to avoid aliasing.
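The computation of lopt can be sketched as follows (our Python illustration; the corner projections a, b, c, f are assumed to be already transformed by M and perspective-divided, which keeps the sketch independent of a particular projection matrix).

```python
import math

def optimal_level(a, b, c, f, n, m):
    """Optimal clip level l_opt from the normalized-screen-space
    projections a, b, c, f of the box corners kappa, lambda, mu, nu.
    The largest projected face area A should correspond to one texel
    of an n x m tile, whose area in texture space is P = 1/(n*m)."""
    p = (b[0] - a[0], b[1] - a[1])
    q = (c[0] - a[0], c[1] - a[1])
    r = (f[0] - a[0], f[1] - a[1])
    A1 = abs(p[0] * q[1] - p[1] * q[0])   # |p x q|
    A2 = abs(p[0] * r[1] - p[1] * r[0])   # |p x r|
    A3 = abs(q[0] * r[1] - q[1] * r[0])   # |q x r|
    A = max(A1, A2, A3)
    return -math.log2(A * n * m)

# A box whose largest face projects to 1/16 of a texel's area on a
# 512 x 512 tile: the ray may already stop 4 levels above level 0.
a, b, c, f = (0.0, 0.0), (1/2048, 0.0), (0.0, 1/2048), (0.0, 1/4096)
print(optimal_level(a, b, c, f, 512, 512))  # 4.0
```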
3.5 Sampling Color Textures
In our implementation, each clipmap tile can consist of
several different texture layers which are handled identi-
cally and only differ by the stored data and their texel
aggregation scheme. For each tile we provide an ad-
ditional layer for a registered color texture to texture
the DSM. This color texture layer is uploaded along
with the heightfield layer and accessed in the fragment
shader via a second texture array. As long as they cover
the same area in world space, the different layers of the
tiles do not even need to be of the same resolution. How-
ever, we have not yet implemented this, and therefore
one heightfield sample corresponds to one color sam-
ple. In general, to avoid aliasing when sampling the
color texture layer, we would have to determine the
ideal LOD ltex at the final hit point i in the height-
field separately and transform it to the corresponding
tile which holds the color texture layer. This LOD ltex can be calculated in the same way as lopt during ray casting (see section 3.4), but in case of a 1:1 relation of heightfield and color samples we can directly use lopt and the texture coordinate for the heightfield layer ob-
tained during ray casting to sample the color texture.
The final fragment color is obtained by linear interpola-
tion between the linearly interpolated color values from
the two LODs adjacent to ltex (trilinear interpolation).
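This trilinear blend can be sketched as follows (our Python illustration; `color_at` stands for a hypothetical bilinear sampler of the color layer at an integer LOD).

```python
import math

def trilinear(color_at, u, v, l_tex):
    """Blend the bilinearly filtered colors at the two LODs adjacent
    to l_tex by its fractional part (trilinear interpolation)."""
    l0 = math.floor(l_tex)
    t = l_tex - l0
    c0 = color_at(u, v, l0)
    c1 = color_at(u, v, l0 + 1)
    return tuple((1 - t) * x0 + t * x1 for x0, x1 in zip(c0, c1))

def sampler(u, v, l):
    # made-up sampler: constant red at level 2, blue at level 3
    return (1.0, 0.0, 0.0) if l == 2 else (0.0, 0.0, 1.0)

print(trilinear(sampler, 0.5, 0.5, 2.25))  # (0.75, 0.0, 0.25)
```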
3.6 Refinement of Block-sampled Heightfield
Reconstruction
As pointed out by Oh et al. in [24], point-sampled DSMs and their treatment as boxes result in blocky images which, viewed from close up, are reminiscent of models built of bricks (see figure 6a).

(a) none (b) linear (c) bicubic
Fig. 6: Demonstration of the improvement in surface quality achieved by the different refinement methods.

Because this effect may be unwanted in most applications, we also implemented two
refinement methods to obtain smooth surfaces. Both
refinement methods are applied after the intersection i
on the bounding box surface has been determined as
described in section 3.3.
The first method is the one presented by Oh et al. [24]
and relies on linear interpolation of two samples ob-
tained from the linearly interpolated heightfield, which
are taken at a distance of one half cell from i in the forward and backward directions along the ray, respectively. This method works quite well and hardly slows down the overall performance on modern GPUs,
but in our implementation, some defects – presumably
caused by numerical inaccuracies – on surfaces with
steep slopes remain, as shown in figure 6b. Despite these
small defects, which are barely noticeable during ani-
mations or from farther viewing distances, the surfaces
look much smoother.
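Our reading of this refinement can be sketched as follows (Python illustration; `height_at` stands for a hypothetical bilinearly interpolated heightfield sampler, and the half-cell offset is applied along the ray parameter, which is a simplifying assumption).

```python
def refine_linear(height_at, i, d, half_cell):
    """Linear refinement in the spirit of Oh et al. [24]: sample the
    interpolated heightfield half a cell before and after the block
    hit i along the ray d, then place the hit where the line between
    the two signed ray-surface differences crosses zero."""
    def diff(k):
        # signed distance of the ray above the surface at p(k) = i + k*d
        p = (i[0] + k * d[0], i[1] + k * d[1], i[2] + k * d[2])
        return p[1] - height_at(p[0], p[2]), p
    fb, _ = diff(-half_cell)   # backward sample (expected above surface)
    ff, _ = diff(+half_cell)   # forward sample (expected below surface)
    if fb == ff:
        return i               # degenerate case: keep the block hit
    k = -half_cell + (2 * half_cell) * fb / (fb - ff)  # zero crossing
    return diff(k)[1]

# surface h(x, z) = x; a ray dropping diagonally onto it
plane = lambda x, z: x
print(refine_linear(plane, (1.0, 1.5, 0.0), (1.0, -1.0, 0.0), 0.5))
# (1.25, 1.25, 0.0)
```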
Our second method uses Hermite bicubic surfaces to
improve the reconstruction of the heightfield. Let (u, v)
denote the projection of i onto the grid of the height-
field where ray casting has been terminated. We inter-
pret the junctions at the four corners of the grid cell
containing (u, v) and its eight neighbors as the corners
of a bicubic surface patch. The four junctions are given
by
α = (⌊u⌋, min(SW, S, C, W), ⌊v⌋)
β = (⌊u⌋ + 1, min(S, SE, E, C), ⌊v⌋)
γ = (⌊u⌋, min(W, C, N, NW), ⌊v⌋ + 1)
δ = (⌊u⌋ + 1, min(C, E, NE, N), ⌊v⌋ + 1)
with C as the height value of the cell containing (u, v)
and SW,S, SE,E,NE,N,NW,W as the height values
of the neighboring cells, starting at the left lower cell
adjacent to α and enumerating them in counterclock-
wise order (see figure 7).

Fig. 7: Construction scheme for a Hermite bicubic patch from the 3 × 3 heightfield samples surrounding the projection of intersection point i on the bounding box.

Each patch is parametrized along the grid axes by (s, t) ∈ [0, 1], and the height
h(s, t) on the surface patch is given by
h(s, t) = (s³ s² s 1) · H · G · Hᵀ · (t³ t² t 1)ᵀ

H = |  2  −2   1   1 |
    | −3   3  −2  −1 |
    |  0   0   1   0 |
    |  1   0   0   0 |

G = | αy       βy       ∂αy/∂v      ∂βy/∂v      |
    | γy       δy       ∂γy/∂v      ∂δy/∂v      |
    | ∂αy/∂u   ∂βy/∂u   ∂²αy/∂u∂v   ∂²βy/∂u∂v   |
    | ∂γy/∂u   ∂δy/∂u   ∂²γy/∂u∂v   ∂²δy/∂u∂v   |
(cf. [14]). The partial derivatives which define the tan-
gential planes on the patch are approximated by using forward and backward differences, respectively, and by
making the following simplifications for the first order
derivatives:
∂αy/∂u = ∂γy/∂u ≈ C − W,   ∂βy/∂u = ∂δy/∂u ≈ E − C
∂αy/∂v = ∂βy/∂v ≈ C − S,   ∂γy/∂v = ∂δy/∂v ≈ N − C
Although the matrix G is constant at each grid cell, i.e., texel, of the clipmap storing the heightfield, we calculate it directly in the fragment shader as needed.
The pair of parameters (s, t), which corresponds to an
intersection with the bicubic patch instead of the bound-
ing box, is determined by a second ray casting. Starting
at i on the bounding box surface, the ray p = i + k · d is advanced at a fixed step width until it either hits
the bicubic patch, i. e., py ≤ h(s, t), or it leaves the do-
main of the box without intersection. In the latter case,
we treat i as an entry point s on the proxy geometry
and proceed with the accelerated ray casting process
described in section 3 from the current level. We found
a subdivision into 16 steps for traversing the bounding
box of a cell to be completely sufficient, independent of
the clip level l. Fewer subdivision steps expose defects
by missed intersections, whereas increasing the number
of subdivision steps only reduces frame rates without
further improving the reconstruction of the surface.
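The patch setup and evaluation can be sketched as follows (our Python/NumPy illustration). The index convention of G is chosen here so that h(0,0) = αy, h(1,0) = βy, h(0,1) = γy, which may be transposed relative to the paper's matrix layout, and the mixed derivatives are taken as zero, consistent with the constant first derivatives above.

```python
import numpy as np

H = np.array([[ 2., -2.,  1.,  1.],
              [-3.,  3., -2., -1.],
              [ 0.,  0.,  1.,  0.],
              [ 1.,  0.,  0.,  0.]])

def hermite_patch(C, N, S, E, W, NE, NW, SE, SW):
    """Return h(s, t) for the Hermite bicubic patch over the hit cell:
    corner heights are minima over 2x2 neighborhoods (alpha..delta),
    first derivatives are forward/backward differences, mixed terms 0."""
    a_y = min(SW, S, C, W); b_y = min(S, SE, E, C)
    g_y = min(W, C, N, NW); d_y = min(C, E, NE, N)
    du_ag = C - W; du_bd = E - C   # d/du at alpha,gamma resp. beta,delta
    dv_ab = C - S; dv_gd = N - C   # d/dv at alpha,beta resp. gamma,delta
    G = np.array([[a_y,   g_y,   dv_ab, dv_gd],
                  [b_y,   d_y,   dv_ab, dv_gd],
                  [du_ag, du_ag, 0.,    0.],
                  [du_bd, du_bd, 0.,    0.]])
    def h(s, t):
        sv = np.array([s**3, s**2, s, 1.])
        tv = np.array([t**3, t**2, t, 1.])
        return float(sv @ H @ G @ H.T @ tv)
    return h

# heightfield rising linearly along u: the patch reproduces h = s
h = hermite_patch(C=1, N=1, S=1, E=2, W=0, NE=2, NW=0, SE=2, SW=0)
print(round(h(0.0, 0.0), 9), round(h(1.0, 0.0), 9), round(h(0.5, 0.7), 9))
```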
Besides their simplicity and the possibility of calculating all the relevant information in the fragment shader, we decided to use Hermite bicubic patches because we
wanted to ensure that the surface remains inside the
bounding boxes of the BVH. By constructing the patches
as described above, we can ensure that they stay com-
pletely inside the bounding boxes as we control the
defining tangential planes. The direct usage of forward
and backward differences in our implementation avoids
any scaling of the tangents and therefore leads to the desired C1 continuity between neighboring patches, be-
cause their tangents have the same direction and magni-
tude (cf. [14]). The most severe drawback of this method
is its high computational cost, although we may still
achieve interactive frame rates (see section 4.2). Furthermore,
as this method ensures that the height of
each patch is less than or equal to the height of its bounding
box, and the tangents are not scaled, isolated peaks
in the heightfield become clearly flattened, as can be
seen in figure 6c.
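The continuity argument can be checked with a 1D analogue (hypothetical height values; the bicubic case factors into such curves along u and v): two neighboring Hermite segments that both use the same unscaled difference E − C as the tangent at their shared grid point join with identical value and slope.

```python
def hermite(p0, p1, m0, m1, u):
    """1D cubic Hermite segment with endpoint values p0, p1 and tangents m0, m1."""
    return ((2*u**3 - 3*u**2 + 1) * p0 + (-2*u**3 + 3*u**2) * p1
            + (u**3 - 2*u**2 + u) * m0 + (u**3 - u**2) * m1)

# Heights at four consecutive texels W, C, E, EE (hypothetical values):
W, C, E, EE = 3.0, 5.0, 4.0, 6.0

# The segment ending at E uses E - C as its outgoing tangent; the next
# segment uses the same unscaled difference E - C as its incoming tangent.
def left(u):   # segment from C to E
    return hermite(C, E, C - W, E - C, u)

def right(u):  # segment from E to EE
    return hermite(E, EE, E - C, EE - E, u)

eps = 1e-6
slope_left  = (left(1.0) - left(1.0 - eps)) / eps   # slope just before the joint
slope_right = (right(eps) - right(0.0)) / eps       # slope just after the joint
```

Both segments meet at height E with slope E − C, i.e., the joint is C1.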
However, both refinement methods presented in this
section rely on interpolation of point sampled data on
a regular grid and only serve to make the resulting
renderings visually more appealing. Besides, even though it
might appear sufficient to apply refinement only in
cases when the viewer is close to a highly detailed area
where the block-sampled nature of the data becomes
apparent, we refine the surface at all discrete LODs,
because the transition between large distant boxes and
smooth surfaces is rather disturbing during animations.
In addition, the lighting conditions on smooth surfaces
and blocks differ due to distinct surface normals.

name         extent [km]           W × H            L  scale  size DSM  size color texture  time [min]
City 1       1.4 × 1.0             5600 × 4000      5  1.0    133 MB    99 MB               0:54
City 2       20.9 × 26.3           83600 × 105200   9  1.0    31.6 GB   –                   3:34
ETOPO1       ≈ 40075.0 × 19970.0   21600 × 10800    7  10.0   1.3 GB    –                   9:53
Blue Marble  ≈ 40075.0 × 19970.0   86400 × 43200    9  10.0   19.2 GB   14.4 GB             13:10

Table 1: Properties of the different data sets used to evaluate performance. L denotes the total number of clip
levels which have been created; W × H is the grid size at level 0, i.e., the size a single texture would have.
Column time contains the durations of the virtual camera flights for our evaluation in minutes.
4 Performance Results and Discussion
The implementation of our technique relies on OpenGL
and GLSL 1.50 shaders, and we demonstrate its per-
formance by means of renderings of the four different
data sets listed in table 1. The data set City 2 was
acquired by means of photogrammetric methods from
aerial images. City 1 depicts a small area within City 2
for which a color texture derived from orthographic
aerial images is available. The data sets
ETOPO1 [2] and Blue Marble [21] depict the entire
earth and are both derived in large parts from SRTM
data [22], but ETOPO1 also contains bathymetric data,
whereas Blue Marble possesses a color texture derived
from satellite images.
When sampled in the fragment shader, the height
values are scaled by the factors given in column scale in
order to avoid flattened surfaces. Shallow surfaces do
not challenge our ray caster, because fewer mutual
occlusions lead to fewer level changes in the BVH during ray
traversal. Renderings of three data sets are shown in
figure 8.
4.1 Evaluation Setup and Results
We used tile sizes of 512× 512 texels, active area sizes
of 5 × 5 tiles and clip area sizes of 7 × 7 for all data
sets in our tests. The near and far planes of the virtual
camera were set to 1.0 and 2000.0 units, respectively.
Heightfield layers consist of single-channel 32-bit floating point
textures, and color texture layers consist of 24-bit RGB
textures. The results were recorded during virtual cam-
era flights along fixed paths over the heightfields on
a desktop computer with an Intel i7 860 CPU at 2.8
GHz, 6 GB RAM, an NVIDIA GeForce GTX 470 graphics
adapter with 1280 MB dedicated VRAM and Windows
7 OS (system A). To make our results comparable
to the results reported in [8], we additionally ran the
same tests on a second desktop computer (system B)
with a hardware configuration more similar to theirs
(Intel Q6600 CPU at 2.4 GHz, 4 GB RAM, NVIDIA
GeForce GTX 285 with 1024 MB dedicated VRAM and
Windows 7 OS). Table 2 shows the results for different
screen resolutions on system A and system B in terms
of frames per second (fps). The frame rates take into
account the delays caused by updating the tile caches
in main memory and video memory as described in section
3.1. The times for rendering the given number of
frames are denoted by column time in table 1.

(a) System A

data set     resolution [pixel]  frames  min. [fps]  avg. [fps]
City 1       1024 × 768          5450    6.2         100.9
             1280 × 1024         3464    5.8         64.1
             1920 × 1080         2217    5.1         41.0
City 2       1024 × 768          25624   5.6         119.7
             1280 × 1024         16700   5.6         78.0
             1920 × 1080         11189   5.1         52.3
ETOPO1       1024 × 768          105747  6.3         178.5
             1280 × 1024         66869   5.9         112.9
             1920 × 1080         42907   4.9         72.4
Blue Marble  1024 × 768          121844  3.3         154.2
             1280 × 1024         75721   4.0         95.8
             1920 × 1080         50028   1.8         63.3

(b) System B

data set     resolution [pixel]  frames  min. [fps]  avg. [fps]
City 1       1024 × 768          3848    5.1         71.2
             1280 × 1024         2450    4.5         45.3
             1920 × 1080         1574    5.1         29.1
City 2       1024 × 768          18174   3.6         84.9
             1280 × 1024         12327   3.6         57.6
             1920 × 1080         8290    3.3         38.7
ETOPO1       1024 × 768          75790   5.2         127.9
             1280 × 1024         48027   5.2         81.1
             1920 × 1080         31134   4.7         52.5
Blue Marble  1024 × 768          95285   2.8         120.6
             1280 × 1024         63409   2.5         80.3
             1920 × 1080         43314   1.0         54.8

Table 2: Performance results of our rendering technique.
4.2 Performance with Surface Refinement
All values given in table 2 were obtained without any of
the surface refinement methods described in section 3.6.
The impact on rendering speed and the relative loss
in performance when using surface refinement in our
implementation is shown in table 3. These data were
acquired from another evaluation of the same camera
flight through the City 2 data set on system A and system
B, because this data set has high spatial frequencies
in the rendered regions and is the most challenging for
our ray caster.

(a) City 2   (b) ETOPO1   (c) Blue Marble

Fig. 8: Example renderings of the data sets which we used in our performance evaluations. Color textures are only
available for City 1 and Blue Marble; ETOPO1 was rendered using a pseudo topographic color map.

(a) System A

method   1024 × 768      1280 × 1024     1920 × 1080
linear   96.2 (−19.6%)   62.8 (−19.5%)   42.0 (−19.7%)
bicubic  36.8 (−69.3%)   24.3 (−68.8%)   16.6 (−68.3%)

(b) System B

method   1024 × 768      1280 × 1024     1920 × 1080
linear   75.4 (−12.4%)   49.8 (−13.5%)   33.7 (−12.9%)
bicubic  36.4 (−57.1%)   17.4 (−69.8%)   11.9 (−69.3%)

Table 3: Impact of the surface refinement methods on performance in terms of average frames per second for the
City 2 data set, and the loss compared to unrefined rendering.
4.3 Discussion
The results in table 2 show that, in accordance with
the results of Dick et al. [8], very large DSMs can be
rendered in real time by using only ray casting and
acceleration data structures. Although the hybrid approach
of Dick et al. [9] renders faster, it
appears to be less flexible, because it requires selecting
representative tiles from the data set and views of the
scene during its training phase.
As expected, table 3 shows that the loss in performance
with bicubic surface refinement is much larger
than with the linear method, but even at the highest
resolution we still achieve interactive frame rates.
The linear method may expose some defects, but offers
a good compromise between quality and speed at
higher resolutions. Besides, refining the reconstructed
surface only pays off for coarse-resolution DSMs, i.e.,
low grid densities, where the block structure
becomes apparent. The differences in the frame
rates between the two city data sets and the two earth
data sets result from different grid densities.
5 Conclusions and Future Work
In this paper we have shown that by combining clipmaps
and ray casting very large DSMs can be rendered at
real-time frame rates in a single rendering pass. Our ap-
proach eliminates aliasing caused by texture sampling
or spatial sampling. The same LOD selection method
also avoids unnecessary ray casting steps
in regions distant from the viewer. The size of the
rendered DSMs is mainly limited by the amount of
secondary memory available. We also used surface refinement
based on Hermite bicubic patches to improve the
renderings of point sampled data and still achieved
interactive frame rates at high screen resolutions. However,
assigning color values to reconstructed surfaces by
using orthographic photo textures suffers from the
problem that such textures do not contain information about
surfaces oriented obliquely to the ground plane. This
becomes especially apparent for our City 1 data set if
we lower the camera to street level, where the facades
of buildings are missing. In the case of untextured DSMs,
the transition from one LOD to another in the heightfield
layer can be perceived as disturbing, and therefore
a method for smooth transitions between different LODs
in this layer is needed. Besides, it would be desirable to have
direct comparisons of the performance and rendering
quality of our implementation with rasterization-based
techniques and CPU ray casting implementations.
Acknowledgements Our work has been conducted within the project AVIGLE, which is part of the Hightech.NRW initiative funded by the Ministry of Innovation, Science and Research of the German State of North Rhine-Westphalia. AVIGLE is a cooperation of several academic and industrial partners, and we thank all partners for their work and contributions to the project, with special thanks to Aerowest GmbH, Dortmund, Germany, for providing us with data for our City data sets. We would further like to thank the anonymous reviewers for their valuable advice and all those who were involved in providing the original data for our data sets ETOPO1 and Blue Marble.
References
1. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. A K Peters, Ltd. (2008)
2. Amante, C., Eakins, B.W.: ETOPO1 1 Arc-Minute Global Relief Model: Procedures, Data Sources and Analysis. NOAA Technical Memorandum NESDIS NGDC-24, 19 pp. (2009)
3. Asirvatham, A., Hoppe, H.: GPU Gems 2, chap. Terrain Rendering Using GPU-Based Geometry Clipmaps. Addison-Wesley Longman (2005)
4. Blinn, J.F.: Simulation of Wrinkled Surfaces. In: SIGGRAPH '78: Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques, pp. 286–292. ACM (1978)
5. Clasen, M., Hege, H.C.: Terrain Rendering using Spherical Clipmaps. In: EuroVis06 Joint Eurographics - IEEE VGTC Symposium on Visualization, pp. 91–98. Eurographics Association (2006)
6. Cook, R.L.: Shade Trees. In: SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 223–231. ACM (1984)
7. Crawfis, R., Noble, E., Ford, M., Kuck, F., Wagner, E.: Clipmapping on the GPU. Tech. rep., Ohio State University, Columbus, OH, USA (2007)
8. Dick, C., Krüger, J., Westermann, R.: GPU Ray-Casting for Scalable Terrain Rendering. In: Proceedings of Eurographics 2009 - Areas Papers, pp. 43–50 (2009)
9. Dick, C., Krüger, J., Westermann, R.: GPU-Aware Hybrid Terrain Rendering. In: Proceedings of IADIS Computer Graphics, Visualization, Computer Vision and Image Processing 2010, pp. 3–10 (2010)
10. Dummer, J.: Cone Step Mapping: An Iterative Ray-Heightfield Intersection Algorithm. http://www.lonesock.net/files/ConeStepMapping.pdf (2006)
11. Ephanov, A., Coleman, C.: Virtual Texture: A Large Area Raster Resource for the GPU. In: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2006, pp. 645–656 (2006)
12. Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map Level Selection for Texture Mapping. IEEE Transactions on Visualization and Computer Graphics 4(4), 317–329 (1998)
13. Feldmann, D., Steinicke, F., Hinrichs, K.: Flexible Clipmaps for Managing Growing Textures. In: Proceedings of the International Conference on Computer Graphics Theory and Applications (GRAPP) (2011)
14. Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, Second Edition in C. Addison-Wesley (1995)
15. Google Inc.: Google Earth. http://earth.google.com/ (2005)
16. Kaneko, T., Takahei, T., Inami, M., Kawakami, N., Yanagida, Y., Maeda, T., Tachi, S.: Detailed Shape Representation with Parallax Mapping. In: Proceedings of ICAT 2001, pp. 205–208 (2001)
17. Li, Z., Li, H., Zeng, A., Wang, L., Wang, Y.: Real-Time Visualization of Virtual Huge Texture. In: ICDIP '09: Proceedings of the International Conference on Digital Image Processing, pp. 132–136. IEEE Computer Society (2009)
18. Losasso, F., Hoppe, H.: Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids. ACM Transactions on Graphics (TOG) (2004)
19. Microsoft: DirectX SDK Documentation: RaycastTerrain Sample. http://msdn.microsoft.com/en-us/library/ee416425(v=vs.85).aspx (2008)
20. Mittring, M., Crytek GmbH: Advanced Virtual Texture Topics. In: SIGGRAPH '08: ACM SIGGRAPH 2008 Classes, pp. 23–51. ACM (2008)
21. NASA: Visible Earth: Earth - The Blue Marble. http://visibleearth.nasa.gov/view.php?id=54388 (1997)
22. NASA: Shuttle Radar Topography Mission. http://www2.jpl.nasa.gov/srtm/ (2000)
23. NASA: World Wind. http://worldwind.arc.nasa.gov/ (2004). http://www.goworldwind.org
24. Oh, K., Ki, H., Lee, C.H.: Pyramidal Displacement Mapping: A GPU-based Artifacts-free Ray Tracing through an Image Pyramid. In: VRST '06: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 75–82. ACM (2006)
25. Oliveira, M.M., Bishop, G., McAllister, D.: Relief Texture Mapping. In: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 359–368. ACM Press/Addison-Wesley Publishing Co. (2000)
26. Policarpo, F., Oliveira, M.M.: GPU Gems 3, chap. Relaxed Cone Stepping for Relief Mapping. Addison-Wesley Professional (2007)
27. Policarpo, F., Oliveira, M.M., Comba, J.L.D.: Real-Time Relief Mapping on Arbitrary Polygonal Surfaces. In: I3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pp. 155–162. ACM (2005)
28. Qu, H., Qiu, F., Zhang, N., Kaufman, A., Wan, M.: Ray Tracing Height Fields. In: Proceedings of Computer Graphics International, pp. 202–207 (2003)
29. Seoane, A., Taibo, J., Hernández, L.: Hardware-Independent Clipmapping. In: Journal of WSCG 2007, pp. 177–183 (2007)
30. Szirmay-Kalos, L., Umenhoffer, T.: Displacement Mapping on the GPU - State of the Art (2006)
31. Taibo, J., Seoane, A., Hernández, L.: Dynamic Virtual Textures. In: Journal of WSCG 2009, pp. 25–32. Eurographics Association (2009)
32. Tanner, C.C., Migdal, C.J., Jones, M.T.: The Clipmap: A Virtual Mipmap. In: SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 151–158. ACM (1998)
33. Tatarchuk, N.: Dynamic Parallax Occlusion Mapping with Approximate Soft Shadows. In: SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, pp. 63–69. ACM (2006)
34. Tevs, A., Ihrke, I., Seidel, H.P.: Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering. In: I3D '08: Proceedings of the 2008 Symposium on Interactive 3D Graphics and Games, pp. 183–190. ACM (2008)
35. Williams, L.: Pyramidal Parametrics. In: SIGGRAPH '83: Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques, pp. 1–11. ACM (1983)