CGI2012 manuscript No. (will be inserted by the editor)
GPU based Single-Pass Ray Casting of Large Heightfields Using Clipmaps
Dirk Feldmann · Klaus Hinrichs
Abstract Heightfields have proved to be useful for
rendering terrains or polygonal surfaces with fine-
structured details. While GPU-based ray casting has
become popular for the latter setting, terrains are com-
monly rendered by using mesh-based techniques, be-
cause the heightfields can be very large and hence ray
casting on these data is usually less efficient. Compared
to mesh-based techniques, ray casting is attractive, for
it does not require dealing with mesh-related problems
such as tessellation of the heightfield, frustum culling
or mesh optimizations. In this paper we present an ap-
proach to render heightfields of almost arbitrary size at
real-time frame rates by means of GPU-based ray cast-
ing and clipmaps. Our technique uses level-of-detail de-
pendent early ray termination to accelerate ray casting
and avoids aliasing caused by texture sampling or spa-
tial sampling. Furthermore, we use two different meth-ods to improve the visual quality of the reconstructed
surfaces obtained from point sampled data. We evalu-
ate our implementation for four different data sets and
two different hardware configurations.
Keywords ray casting · rendering · single-pass · clipmap · heightfield · terrain
1 Introduction
Heightfield rendering has numerous applications in sci-
ence and entertainment. One major application is ter-
rain rendering, which is increasingly used to map high-resolution aerial photographs acquired by airplanes,
satellites or unmanned aerial vehicles onto a digital sur-
face model (DSM) of the covered area. This approach
Dirk Feldmann · Klaus Hinrichs, VisCG, Department of Computer Science, University of Münster, Germany
preserves depth perception and provides context and
other information to the viewer. Popular examples are
NASA World Wind [23] or Google Earth [15]. Since
textured polygonal meshes can be processed and ren-
dered by GPUs at high speed, a widely used rendering
technique stores a DSM in (grayscale) texture maps
(so called heightmaps or heightfields) and uses them
to displace the vertices of the corresponding polygonal
mesh [6].
However, most renderers accept only triangle meshes
which can become rather complex and may easily con-
sist of millions of triangles. During mesh generation par-
ticular attention has to be paid to different issues, e. g.,
to not produce any cracks, to choose appropriate tessel-
lations and to avoid aliasing caused by small or distant
triangles.
Therefore it appears attractive to bypass the entire process of converting a heightfield into a mesh which is finally rasterized and in which many triangles yield at most a few pixels whose corresponding fragments succeed in passing all of the numerous tests encountered on their way through the rendering pipeline.
Techniques like relief mapping [27] or parallax occlusion
mapping [33] can make use of pixel shaders on modern
GPUs to perform real time ray casting on heightfields in
order to calculate the displaced sample positions in cor-
responding color textures which contribute to the final
fragment color. During this ray casting fine-structured
details can be added to surfaces without further tessel-
lating the underlying polygonal mesh. In many cases
this even makes it possible to reduce the polygonal mesh to a single planar quadrilateral which usually consists of only
two triangles.
In order to speed up the ray casting and to achieve
real-time frame rates, many GPU-based heightfield ren-
dering techniques employ maximum mipmaps to access
the DSM. As the size of texture maps that can be han-
dled by GPUs is currently limited by vendor specific
restrictions and ultimately by the amount of available
video memory, large DSMs cannot be stored in a single
heightfield texture for direct access during GPU-based
ray casting.
In this paper we present a GPU-based heightfield ray
casting technique which performs single-pass rendering
of heightfields of almost arbitrary sizes in real time. Our
main contribution is to demonstrate how clipmaps and
current graphics hardware can be used to speed up the
ray casting and improve the image quality by early ray
termination based on level of detail selection while alle-
viating the aforementioned video memory limitations.
Additionally we use two different refinement methods
to improve the appearance of the reconstructed surfaces
in our renderings. We demonstrate the performance of
our technique for four large data sets of up to 31 GB
size.
2 Related Work
Much research has been performed on CPU-based ray
casting of heightfields as well as on terrain rendering
based on polygonal meshes. Since summarizing these
two areas would exceed the scope of this paper, we
confine ourselves to an overview of recent GPU-based
heightfield ray casting methods related to our work.
Qu et al. [28] presented one of the first GPU-based ray
casting schemes for heightfields which primarily aims at
accurate surface reconstruction of heightfields but does
not use any sophisticated structures for acceleration.
Relief mapping [25] and parallax (occlusion) mapping
[16] are techniques for adding structural details to polyg-
onal surfaces, which have their origin in CPU-based ren-
dering and improve upon the disadvantages of bump
mapping [4]. Both techniques have been implemented
for GPUs (e. g. [27,33]) and benefit from programmable
graphics pipelines. But as most of these implementa-
tions resemble the strategies used in CPU-based ray
casting, like iterative and/or binary search to detect
heightfield intersections, they are prone to the same
kind of rendering artifacts caused by missed intersec-
tions in highly spatially variant data sets. An introduc-
tion to these closely related techniques can be found
for instance in [1], and more details are given in the
comprehensive state-of-the-art report by Szirmay-Kalos
and Umenhoffer [30] which focuses on GPU-based im-
plementations.
Oh et al. [24] accelerate ray casting and achieve real-
time frame rates by creating a bounding volume hier-
archy (BVH) of the heightfield, which is stored in a
maximum mipmap and makes it possible to advance safely along
the ray over long distances (see section 3.3). They also
present a method based on bilinear interpolation of
heightfield values to improve the quality of the recon-
structed surface obtained from point-sampled data. The
method presented by Tevs et al. [34] also relies on BVHs
stored in maximum mipmaps, but uses a different sam-
pling strategy. Their method advances along the ray
from one intersection of the projected ray with a texel
boundary to the next such intersection, whereas Oh
et al. use a constant step size to advance along the ray.
In addition, Tevs et al. store in each heightfield texel
the height values at the four corners of a quadrilateral
encoded as an RGBA value instead of point samples,
which allows surface reconstruction on parametric de-
scriptions.
Compared to other techniques which also rely on pre-
processed information about the heightfield and accel-
eration data structures, like for instance relaxed cone
step mapping [10,26,19], maximum mipmap creation is
much faster and can be performed on the GPU [34].
All these methods have in common that they operate
on single heightfields of relatively small extents which
are intended to add details to surfaces at meso- or mi-
croscales instead of representing vast surfaces them-
selves. Recently Dick et al. [8] have presented a method
for ray casting terrains of several square kilometers ex-
tent at real-time frame rates. Their method also em-
ploys maximum mipmaps to accelerate the ray casting
process and a tiling approach to render data sets of
several hundred GB size. They also presented a faster
hybrid method which uses ray casting or rasterization-
based rendering, but requires knowledge of the employed
GPU or, alternatively, a training phase to decide whether to
use rasterization or ray casting [9].
Our method presented in this paper also aims at ren-
dering very large heightfields only by means of GPU
ray casting. It has been inspired in large parts by the
works of Dick et al. and Tevs et al. as we also employ a
tile-based approach and their cell-precise ray traversal
scheme. But in contrast to the technique by Dick et al.,
which creates a complete mipmap for each tile and re-
quires additional rendering passes to determine the vis-
ibility of the tiles, our method further accelerates the
ray casting process and requires only a single rendering
pass by using a tile-based clipmap implementation.
The clipmap, as introduced by Tanner et al. [32], is
based on mipmaps [35] in order to handle very large
textures at several levels of detail which would exceed
the available video or main memory. While the orig-
inal version requires special hardware, modern GPU
features have superseded these requirements and other
clipmap implementations (or virtual textures) have be-
come available [11,7,29,20,17,31,13], most of which rely on texture tiles and permit handling of
arbitrarily large textures as briefly described in sec-
tion 3.1. Geometry clipmaps as introduced by Losasso
et al. [18], and derived GPU-based variations [3,5] have
also been used in terrain rendering, but according to our
knowledge only in the context of mesh-based rendering
and not for accelerating ray casting.
3 GPU-based Single-pass Ray Casting Using
Clipmaps
In this section we briefly present our tile-based clipmap
implementation, followed by a description of the used
storage scheme for heightfields. Next we describe the
employed ray traversal method, which is basically the
same as the one described in [8], and we discuss how
we accelerate it and avoid aliasing by using clipmaps.
Finally, we present two refinement methods which we
use to improve the appearance of the reconstructed sur-
faces.
3.1 Tile-based Clipmap Implementations
Clipmaps are storage schemes for texture maps (tex-
tures) which are based on mipmaps and rely like these
on the principle of using pre-filtered data to avoid alias-
ing artifacts when multiple texels are mapped to one
pixel or less in screen space due to perspective projec-
tion (texture minification) [35]. In contrast to mipmaps,
clipmaps only keep those data in memory which are
relevant for rendering the current frame, and they use
caching techniques to reload and update these data.
This reduces the amount of (video) memory occupied
by texture data and also makes it possible to handle textures which would far exceed the limits of video or main mem-
ory. The clipmap by Tanner et al. [32] relies on special
hardware to update the texels in video memory when
the viewer’s eye point is moved. Modern GPUs make it possible to implement clipmaps by using texture tiles and accessing
them in fragment shaders, e. g., by means of texture ar-
rays. Our implementation uses a Flexible Clipmap [13]
which is constructed as described in the following.
At the level l = 0, which corresponds to the finest res-
olution, the original virtual texture is partitioned into
smaller tiles of n×m texels (tile size). Like in a mipmap,
each group of 2 × 2 neighboring texels at level l is combined in a certain way into a single texel at the next coarser level l + 1, which implies that 2 × 2 neighboring tiles at level l correspond to one tile of the same tile size at level l + 1. With color textures for instance, the combination
may simply be an averaging operation on the values of
the four texels, but the operation depends on the kind
of texture. This process is repeated until the original
texture is completely covered by a single tile of n × m texels at the least detailed level l = L − 1, which can be
used to derive an ordinary mipmap. In the following we
use the term “clipmap” to refer only to these lower L
levels of a complete, tile-based clipmap but stick with
the terminology as used by Tanner et al. [32].
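As an aside, the number of clip levels L follows directly from this construction. The sketch below (ours, not from the paper) computes the smallest L for which the coarsest level covers the whole virtual texture with one tile; the tile size n = m = 512 is an assumption, borrowed from the paper's later example.

```python
import math

def clip_level_count(W, H, n, m):
    """Smallest number of clip levels L such that at the coarsest
    level L-1 the whole W x H virtual texture fits into one n x m tile
    (each level halves the resolution along both axes)."""
    return 1 + max(0, math.ceil(math.log2(max(W / n, H / m))))

# Grid sizes of the paper's four data sets, tile size 512 assumed:
print(clip_level_count(5600, 4000, 512, 512))     # 5
print(clip_level_count(83600, 105200, 512, 512))  # 9
print(clip_level_count(21600, 10800, 512, 512))   # 7
print(clip_level_count(86400, 43200, 512, 512))   # 9
```

With a 512-texel tile size these values reproduce the level counts L listed in table 1 for the four evaluation data sets.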
A clip center depending on the current location and
viewing direction of the virtual scene camera is used
to determine for each level the tiles that are needed
in the current frame. This group of neighboring tiles
is called the active area and located in video memory.
The clip area formed by a larger superset of tiles is kept
in main memory, and the remaining tiles are stored in
secondary memory, e. g., on hard disk. Since the lower
levels of a corresponding mipmap are effectively clipped
to smaller areas, this data structure is called clipmap.
Figure 1 illustrates the principle of a tile-based clipmap.

Fig. 1: Structure of a tile-based clipmap with L = 4 clip levels, with active areas of at most 3 × 3 tiles (dark gray) and clip areas of at most 5 × 5 tiles (light gray).

Once the tiles have been uploaded to video memory, they can be accessed by shaders for rendering. When
the clip center is relocated, i. e., the virtual camera is
moved, tiles stored in video memory and main memory
can be replaced by neighboring ones from main memory or secondary memory, respectively, if necessary. If the
virtual camera is located for instance far away from the
textured surface currently visible, only the coarser res-
olution (higher) levels are required, as the texels from
the lower levels would cause aliasing. Hence it is not al-
ways required to keep the active areas of all clip levels
in video or main memory. Of course the tile size has to
be chosen carefully to ensure that the tiles themselves
are manageable by the graphics hardware. More details
on clipmap specific issues can be found in [32,7].
3.2 Clipmaps for DSM Storage
Due to their relation, clipmaps and mipmaps can be
created and used in very similar ways. To use a digital
surface model (DSM) for rendering, in our approach
the heightfield values are stored in the clipmap tiles at
the finest resolution (lowest) level l = 0. A texel at
level l > 0 obtains as height value the maximum height
value of the corresponding 2 × 2 subordinate texels at
level l − 1. If we identify each texel with a bounding
box defined by its height value and its grid cell in the
texture, we obtain a bounding volume hierarchy (BVH)
of the underlying DSM as illustrated in figure 2.

Fig. 2: BVH derived from a heightfield on a regular grid. Gray boxes correspond to samples at level 0. Bounding boxes on higher levels and their maximum value are highlighted by the same color.

This is the same construction scheme as used with maxi-
mum mipmaps [24,34,8]. In the method presented by
Dick et al. [8], the heightfield is split into tiles as well,
but a separate maximum mipmap is created for each
tile. To render vast DSMs, this approach may require
either many tiles and thus mipmaps to be present in
video memory or additional rendering passes, especially
if the heightfield is shallow and there is little occlu-
sion between tiles. Furthermore, the tiles located far
away from the viewer may contain fine spatial details,
e. g., steep summits of distant mountains, which are not
only not perceivable from far away but may also expose spatial aliasing artifacts due to minification caused by
perspective projection. The latter aspect is the same
which motivated the development of mipmaps for tex-
ture mapping and also applies to mesh-based rendering
techniques which therefore strive to determine an ap-
propriate level of detail (LOD) in order to avoid ras-
terizing triangles that would become projected to less
than one pixel in screen space.
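The construction of this maximum-value hierarchy can be sketched as follows. This is our illustration in Python/NumPy, assuming a square heightfield with power-of-two extent; in practice it would run on the GPU, one mipmap level at a time.

```python
import numpy as np

def build_max_pyramid(height):
    """Maximum mipmap (BVH) over a 2^k x 2^k heightfield: each texel
    at level l+1 stores the maximum of the corresponding 2x2 texels
    at level l, so it bounds all level-0 samples beneath it."""
    levels = [np.asarray(height, dtype=np.float32)]
    while levels[-1].shape[0] > 1:
        h = levels[-1]
        # maximum over each non-overlapping 2x2 block
        coarser = np.maximum.reduce([h[0::2, 0::2], h[0::2, 1::2],
                                     h[1::2, 0::2], h[1::2, 1::2]])
        levels.append(coarser)
    return levels

hf = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=np.float32)
pyr = build_max_pyramid(hf)
print([lv.shape for lv in pyr])  # [(4, 4), (2, 2), (1, 1)]
print(pyr[1])  # [[ 6.  8.] [14. 16.]]
print(pyr[2])  # [[16.]]
```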
The important difference between the usage of clipmaps
and multiple mipmaps is that in the case of clipmaps the
BVH spans the entire domain at the topmost level. A
proper placement of the clip center results in the se-
lection of only those tiles of highest resolution at level
l = 0 which are closest to the virtual camera and thus
potentially have to be rendered in full detail. Compared
to level l, at level l+1 the area of the heightfield covered
by a tile is four times larger, and the spatial resolution
is divided in half along each direction of the grid. Thus
the entire domain is spatially pre-filtered and the level
of detail of the heightfield decreases with increasing distance from the viewer. Because higher clipmap levels
also correspond to larger bounding boxes, we can ex-
ploit this fact to accelerate GPU ray casting in the far
range of the scene as described in the following section.
3.3 Rendering and Accelerating Ray Casting
Given a DSM stored in a clipmap of L levels, we set
the clip center simply by projecting the center of the
viewport into the scene. We also ensure that all tiles in
the active areas of all clip levels or at least the highest
(coarsest) ones are stored in video memory by choosing
appropriate sizes for the tiles and the active area. The
axis-aligned bounding box of the entire DSM, which is
associated with the topmost tile, is based in the xz-
plane of a left-handed world coordinate system. It is
represented by a polygonal mesh consisting of 12 trian-
gles which serves as proxy geometry for the ray casting
process. A vertex shader calculates normalized 3D tex-
ture coordinates from the vertex coordinates of the box
corners, and the clipmap is positioned at the bottom
of the box corresponding to the minimum height value
y = Hmin of the DSM. Hmin and the maximum height
value Hmax are both determined during loading of the
topmost clipmap tile on the CPU. By rendering the
back faces of the proxy geometry we obtain each ray’s
exit point e, and we pass the camera position and the
geometry of the bounding box in world coordinates to
the fragment shader which calculates each ray’s direc-
tion d = (dx, dy, dz) and entry point s to the proxy
geometry and transforms them into normalized 3D tex-
ture space. If the camera is located within the bound-
ing box the entry point s becomes the camera position
(cf. [19]). To prevent faces of the proxy geometry from being clipped against the far plane of the view frustum of the virtual camera, which would result in missing exit points, the box is fitted into the view frustum when the camera is translated.
The actual ray traversal is performed by projecting the
ray onto a clip level dependent 2D grid. For a given
clip level 0 ≤ l < L the extensions of this grid are
determined by (Gu(l), Gv(l)) = (W/2^l, H/2^l), with (W, H) being the extensions of the DSM in sample points, i.e.,
texels. Hence, the grid at level l has the same size as a single texture containing the entire DSM at mipmap
level l would have. The current height py of a loca-
tion p = (px, py, pz) = s + k · d on the ray is retained
and updated in world coordinates to test for intersec-
tions with the heightfield. During ray traversal we move
from one intersection of the projected ray dp = (dx, dz)
with a texel boundary to the next such intersection,
i. e., from the projected ray’s entry point enp into a
grid cell directly to its exit point exp as shown in figure 3. The only exception is at the first entry point, which is the projection of s.

Fig. 3: Rays are traversed from one intersection of the projected ray with a texel boundary to the next such intersection.

We start ray casting at the coarsest (highest) clip level L − 1 of the BVH at
which the entire DSM is given in a single tile and each
texel corresponds to the maximum value and thus the bounding box of 2^(L−1) × 2^(L−1) texels at level 0. To de-
termine whether a ray hits a bounding box at level l,
the clipmap tile containing the grid cell which belongs
to the current enp and exp has to be sampled for the
associated height value h. Since the direction of the ray
is needed to determine this grid cell we store the sign
bits of the components of d in the lower three bits of
an integer. This bit mask is created once for each ray
using bit-wise operations in the fragment shader, and
it is evaluated as needed by switch-statements to de-
termine the direction of a ray instead of duplicating
the shader code for the ray casting loop for each of the
overall eight possible branches.
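The sign-bit mask can be sketched as follows (our Python illustration; the particular bit assignment is an assumption, since the paper does not specify one).

```python
def direction_mask(d):
    """Pack the sign bits of ray direction d = (dx, dy, dz) into the
    lower three bits of an integer, built once per ray; bit i is set
    when the corresponding component is negative. A switch over the
    eight mask values then selects the traversal direction without
    duplicating the ray casting loop."""
    dx, dy, dz = d
    return (dx < 0) | ((dy < 0) << 1) | ((dz < 0) << 2)

print(direction_mask((1.0, -2.0, 0.5)))    # 2 (only dy negative)
print(direction_mask((-1.0, -1.0, -1.0)))  # 7 (all components negative)
```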
When moving along the ray from point en to point ex
we hit the box surface if the ray is directed downwards
(resp. upwards) and ex (resp. en) lies below the top of
the box (at height h). If a ray hits a bounding box B
at the current level l, it may also hit a bounding box
contained in B at a lower level of the BVH. Therefore
the ray casting process is repeated at the next lower
level l′ = l − 1 from the current position en of the ray,
but only if it is possible and reasonable to proceed as
described in section 3.4. Otherwise the lowest possible
level l = lmin has been reached, and the exact inter-
section i on the bounding box surface is calculated by
i = en,  if dy ≥ 0
i = en + d · max((h − eny)/dy, 0),  if dy < 0
If a ray does not intersect a bounding box B at level l,
it cannot intersect any of the bounding boxes contained
in B at any lower level either, and we therefore advance
along the ray to ex which becomes the entry point en
of the next cell. Compared to a ray traversal performed
just on level 0, only one instead of 2^l × 2^l samples have to
be tested for intersection, which results in a significant
speed up of the process (cf. [34], [8]).
If a ray hits a bounding box B at some level l > 0
it does not necessarily have to hit any bounding boxes
contained in B at level l−1. This cannot be determined
without descending to the lower level. In order to avoid
using the smaller step size over longer distances when it
is not really necessary, we move up again to level l if we
detect that the ray does not hit any bounding box at
level l−1 (cf. [34,8]). These three different cases for the
intersection of a ray with a bounding box are illustrated
in figure 4.

Fig. 4: Intersection of a ray with a heightfield. The green ray hits the left red box, but none of the black boxes contained in it.

The ray casting process is terminated if either a valid intersection point i on a bounding box
has been found, the ray leaves the domain of the DSM,
or the maximal number of ray casting steps exceeds
2 · max(n,m) with n, m as the tile size in texels. In
the latter two cases, the fragment from which the ray
originates is discarded by the shader.
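The traversal with its descend/ascend logic can be illustrated in a reduced setting. The sketch below (ours, not the shader code) uses a 1D heightfield and a ray with dx > 0, so cells are intervals and a 1D maximum pyramid plays the role of the BVH; boundary handling via a small epsilon and the function names are our simplifying choices.

```python
def build_max_pyramid_1d(h):
    """Maximum pyramid over a 1D heightfield of power-of-two length."""
    levels = [list(h)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def cast_ray(pyr, sx, sy, dx, dy, eps=1e-9):
    """Cell-precise traversal of the pyramid by the ray
    p(k) = (sx + k*dx, sy + k*dy), assuming dx > 0.
    Returns the level-0 cell index hit, or None if the ray leaves."""
    n = len(pyr[0])
    l = len(pyr) - 1            # start at the coarsest level
    k = 0.0                     # ray parameter at the current entry point
    while True:
        x = sx + k * dx
        if x >= n:
            return None         # ray left the domain of the DSM
        cell = int(x / (1 << l))                 # cell index at level l
        k_exit = (((cell + 1) << l) - sx) / dx   # parameter at cell exit
        y_entry = sy + k * dy
        y_exit = sy + k_exit * dy
        top = pyr[l][cell]
        if min(y_entry, y_exit) <= top:   # ray dips below the box top
            if l == 0:
                return cell     # hit at full resolution
            l -= 1              # descend: a contained box may be hit
        else:
            k = k_exit + eps    # miss: advance to the next cell ...
            if l < len(pyr) - 1:
                l += 1          # ... and ascend to take larger steps

pyr = build_max_pyramid_1d([1, 3, 2, 5, 0, 1, 4, 2])
print(cast_ray(pyr, 0.0, 6.0, 1.0, -1.0))  # 3 (first column tall enough)
print(cast_ray(pyr, 0.0, 10.0, 1.0, 0.0))  # None (ray passes above all)
```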
3.4 LOD-determined Ray Termination
To decide whether we can terminate ray casting at the
current level, we check the following two conditions.
First, we determine at each intersection of a bounding
box the highest resolution available, i. e., the lowest clip
level llow of a tile which covers the corresponding area of
the DSM and is present in video memory. The clipmap
tiles from the active areas of all clip levels are stored in a
texture array which is accessed by the fragment shader.
The Flexible Clipmap uses a certain tile layout and an
additional texture, the tile map [7], to find llow and the
index in the texture array where the corresponding tile
has been stored during its upload into video memory
(see [13] for details). The tile map covers the entire do-
main of the DSM as well, but each texel corresponds to
one tile of n×m texels at the lowest level l = 0. Each
texel stores the lowest clip level of the tile which cov-
ers the corresponding area of the DSM and is currently
present in video memory. For instance, given a tile size
of n = m = 512 texels, a tile map of 512 × 512 texels
holds information about the clip levels of 512² × 512²
heightfield samples. When tiles at and above level l ≥ 0
are available in video memory, the tile map contains a
square region of 2^l × 2^l texels with value l (cf. [32]). The
tile map is created on the CPU whenever the cache for
the clipmap tiles is updated due to relocations of the
clip center, and tiles are uploaded in top-down order
to ensure that at least the highest levels are present if
secondary caching structures cause a delay, e. g., when
tiles have to be loaded from hard disk. Thus, by trans-
forming the hit point i on the bounding box surface to
normalized texture coordinates the shader can deter-
mine llow by a single texel-precise texture lookup in the
tile map.
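The texel-precise tile map lookup can be sketched as follows (our Python illustration; the 4 × 4 tile map contents are a made-up example of resident levels, with a 2 × 2 block of level-1 tiles near the clip center).

```python
def lookup_llow(tile_map, u, v):
    """Lowest resident clip level l_low for a hit point (u, v) in
    normalized texture coordinates. tile_map[i][j] holds the lowest
    clip level whose tile covering that area is in video memory."""
    rows, cols = len(tile_map), len(tile_map[0])
    j = min(int(u * cols), cols - 1)  # clamp for u == 1.0
    i = min(int(v * rows), rows - 1)  # clamp for v == 1.0
    return tile_map[i][j]

tm = [[2, 2, 2, 2],
      [2, 1, 1, 2],
      [2, 1, 1, 2],
      [2, 2, 2, 2]]
print(lookup_llow(tm, 0.4, 0.4))   # 1 (level-1 tile resident here)
print(lookup_llow(tm, 0.95, 0.1))  # 2 (only level 2 resident here)
```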
Second, the optimal clip level lopt at the current hit
point i = (u, hgrid, v) is determined by the minification
of the corresponding box at level l = 0 in screen space
(cf. [12]). We project the four corners of the cell’s box
κ = (⌊u⌋ · Rx, ⌊hgrid⌋ · Ry, ⌊v⌋ · Rz), λ = κ + (Rx, 0, 0), µ = κ + (0, 0, Rz) and ν = κ + (0, Ry, 0) from world
space into normalized screen space using the model,
view and projection matrix combined in M followed by
perspective division to obtain the vectors a,b, c and f ,
where Rx, Ry, Rz are the numbers of world space units
per heightfield sample along the respective direction.
Then we calculate the areas A1, A2 and A3 of the pro-
jected faces of a box in screen space:
p = b − a, q = c − a, r = f − a
A1 = |p × q| = |px · qy − py · qx|
A2 = |p × r| = |px · ry − py · rx|
A3 = |q × r| = |qx · ry − qy · rx|
We want the largest face of one box in screen space A =
max (A1, A2, A3) to correspond to one texel of a tile at
level lopt in texture space which itself has an area of
P = 1/(n · m). Hence 2^lopt = P/A, and thus lopt = −log2(A · n · m).
Instead of descending to a full resolution mipmap level
which may cause aliasing we can now terminate ray
casting already at level lmin = max (llow, lopt). The two
different LODs llow and lopt are visualized in figure 5
where each level is coded by a different color.
(a) llow (b) lopt
Fig. 5: The two different LODs llow and lopt are used
to terminate the ray traversal and to avoid aliasing.
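The computation of lopt can be sketched as follows (our Python illustration; the corner projections a, b, c, f are assumed to be already transformed by M and perspective-divided, which keeps the sketch independent of a particular projection matrix).

```python
import math

def optimal_level(a, b, c, f, n, m):
    """Optimal clip level l_opt from the normalized-screen-space
    projections a, b, c, f of the box corners kappa, lambda, mu, nu.
    The largest projected face area A should correspond to one texel
    of an n x m tile, whose area in texture space is P = 1/(n*m)."""
    p = (b[0] - a[0], b[1] - a[1])
    q = (c[0] - a[0], c[1] - a[1])
    r = (f[0] - a[0], f[1] - a[1])
    A1 = abs(p[0] * q[1] - p[1] * q[0])   # |p x q|
    A2 = abs(p[0] * r[1] - p[1] * r[0])   # |p x r|
    A3 = abs(q[0] * r[1] - q[1] * r[0])   # |q x r|
    A = max(A1, A2, A3)
    return -math.log2(A * n * m)

# A box whose largest face projects to 1/16 of a texel's area on a
# 512 x 512 tile: the ray may already stop 4 levels above level 0.
a, b, c, f = (0.0, 0.0), (1/2048, 0.0), (0.0, 1/2048), (0.0, 1/4096)
print(optimal_level(a, b, c, f, 512, 512))  # 4.0
```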
3.5 Sampling Color Textures
In our implementation, each clipmap tile can consist of
several different texture layers which are handled identi-
cally and only differ by the stored data and their texel
aggregation scheme. For each tile we provide an ad-
ditional layer for a registered color texture to texture
the DSM. This color texture layer is uploaded along
with the heightfield layer and accessed in the fragment
shader via a second texture array. As long as they cover
the same area in world space, the different layers of the
tiles do not even need to be of the same resolution. How-
ever, we have not yet implemented this, and therefore
one heightfield sample corresponds to one color sam-
ple. In general, to avoid aliasing when sampling the
color texture layer, we would have to determine the
ideal LOD ltex at the final hit point i in the height-
field separately and transform it to the corresponding
tile which holds the color texture layer. This LOD ltex can be calculated in the same way as lopt during ray casting (see section 3.4), but in case of a 1:1 relation of heightfield and color samples we can directly use lopt and the texture coordinate for the heightfield layer ob-
tained during ray casting to sample the color texture.
The final fragment color is obtained by linear interpola-
tion between the linearly interpolated color values from
the two LODs adjacent to ltex (trilinear interpolation).
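This trilinear blend can be sketched as follows (our Python illustration; `color_at` stands for a hypothetical bilinear sampler of the color layer at an integer LOD).

```python
import math

def trilinear(color_at, u, v, l_tex):
    """Blend the bilinearly filtered colors at the two LODs adjacent
    to l_tex by its fractional part (trilinear interpolation)."""
    l0 = math.floor(l_tex)
    t = l_tex - l0
    c0 = color_at(u, v, l0)
    c1 = color_at(u, v, l0 + 1)
    return tuple((1 - t) * x0 + t * x1 for x0, x1 in zip(c0, c1))

def sampler(u, v, l):
    # made-up sampler: constant red at level 2, blue at level 3
    return (1.0, 0.0, 0.0) if l == 2 else (0.0, 0.0, 1.0)

print(trilinear(sampler, 0.5, 0.5, 2.25))  # (0.75, 0.0, 0.25)
```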
3.6 Refinement of Block-sampled Heightfield
Reconstruction
As pointed out by Oh et al. in [24], point-sampled DSMs and their treatment as boxes result in blocky images which, viewed from close up, are reminiscent of models built of bricks (see figure 6a).

(a) none (b) linear (c) bicubic
Fig. 6: Demonstration of the improvement in surface quality achieved by the different refinement methods.

Because this effect may be unwanted in most applications, we also implemented two
refinement methods to obtain smooth surfaces. Both
refinement methods are applied after the intersection i
on the bounding box surface has been determined as
described in section 3.3.
The first method is the one presented by Oh et al. [24]
and relies on linear interpolation of two samples ob-
tained from the linearly interpolated heightfield, which
are taken at a distance of one half cell from i in the forward and backward directions along the ray, respectively. This method works quite well and hardly slows down the overall performance on modern GPUs,
but in our implementation, some defects – presumably
caused by numerical inaccuracies – on surfaces with
steep slopes remain, as shown in figure 6b. Despite these
small defects, which are barely noticeable during ani-
mations or from farther viewing distances, the surfaces
look much smoother.
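Our reading of this refinement can be sketched as follows (Python illustration; `height_at` stands for a hypothetical bilinearly interpolated heightfield sampler, and the half-cell offset is applied along the ray parameter, which is a simplifying assumption).

```python
def refine_linear(height_at, i, d, half_cell):
    """Linear refinement in the spirit of Oh et al. [24]: sample the
    interpolated heightfield half a cell before and after the block
    hit i along the ray d, then place the hit where the line between
    the two signed ray-surface differences crosses zero."""
    def diff(k):
        # signed distance of the ray above the surface at p(k) = i + k*d
        p = (i[0] + k * d[0], i[1] + k * d[1], i[2] + k * d[2])
        return p[1] - height_at(p[0], p[2]), p
    fb, _ = diff(-half_cell)   # backward sample (expected above surface)
    ff, _ = diff(+half_cell)   # forward sample (expected below surface)
    if fb == ff:
        return i               # degenerate case: keep the block hit
    k = -half_cell + (2 * half_cell) * fb / (fb - ff)  # zero crossing
    return diff(k)[1]

# surface h(x, z) = x; a ray dropping diagonally onto it
plane = lambda x, z: x
print(refine_linear(plane, (1.0, 1.5, 0.0), (1.0, -1.0, 0.0), 0.5))
# (1.25, 1.25, 0.0)
```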
Our second method uses Hermite bicubic surfaces to
improve the reconstruction of the heightfield. Let (u, v)
denote the projection of i onto the grid of the height-
field where ray casting has been terminated. We inter-
pret the junctions at the four corners of the grid cell
containing (u, v) and its eight neighbors as the corners
of a bicubic surface patch. The four junctions are given
by
α = (⌊u⌋, min(SW, S, C, W), ⌊v⌋)
β = (⌊u⌋ + 1, min(S, SE, E, C), ⌊v⌋)
γ = (⌊u⌋, min(W, C, N, NW), ⌊v⌋ + 1)
δ = (⌊u⌋ + 1, min(C, E, NE, N), ⌊v⌋ + 1)
with C as the height value of the cell containing (u, v)
and SW,S, SE,E,NE,N,NW,W as the height values
of the neighboring cells, starting at the left lower cell
adjacent to α and enumerating them in counterclock-
wise order (see figure 7).

Fig. 7: Construction scheme for a Hermite bicubic patch from the 3 × 3 heightfield samples surrounding the projection of intersection point i on the bounding box.

Each patch is parametrized along the grid axes by (s, t) ∈ [0, 1], and the height
h(s, t) on the surface patch is given by
h(s, t) = (s³ s² s 1) · H · G · Hᵀ · (t³ t² t 1)ᵀ

H = |  2  −2   1   1 |
    | −3   3  −2  −1 |
    |  0   0   1   0 |
    |  1   0   0   0 |

G = | αy       βy       ∂αy/∂v      ∂βy/∂v      |
    | γy       δy       ∂γy/∂v      ∂δy/∂v      |
    | ∂αy/∂u   ∂βy/∂u   ∂²αy/∂u∂v   ∂²βy/∂u∂v   |
    | ∂γy/∂u   ∂δy/∂u   ∂²γy/∂u∂v   ∂²δy/∂u∂v   |
(cf. [14]). The partial derivatives which define the tan-
gential planes on the patch are approximated by using forward and backward differences, respectively, and by
making the following simplifications for the first order
derivatives:
∂αy/∂u = ∂γy/∂u ≈ C − W,   ∂βy/∂u = ∂δy/∂u ≈ E − C
∂αy/∂v = ∂βy/∂v ≈ C − S,   ∂γy/∂v = ∂δy/∂v ≈ N − C
Although the matrix G is constant at each grid cell, i.e., texel, of the clipmap storing the heightfield, we calculate it directly in the fragment shader as needed.
The pair of parameters (s, t), which corresponds to an
intersection with the bicubic patch instead of the bound-
ing box, is determined by a second ray casting. Starting
at i on the bounding box surface, the ray p = i + k · d is advanced at a fixed step width until it either hits
the bicubic patch, i. e., py ≤ h(s, t), or it leaves the do-
main of the box without intersection. In the latter case,
we treat i as an entry point s on the proxy geometry
and proceed with the accelerated ray casting process
described in section 3 from the current level. We found
a subdivision into 16 steps for traversing the bounding
box of a cell to be completely sufficient, independent of
the clip level l. Fewer subdivision steps expose defects
by missed intersections, whereas increasing the number
of subdivision steps only reduces frame rates without
further improving the reconstruction of the surface.
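The patch setup and evaluation can be sketched as follows (our Python/NumPy illustration). The index convention of G is chosen here so that h(0,0) = αy, h(1,0) = βy, h(0,1) = γy, which may be transposed relative to the paper's matrix layout, and the mixed derivatives are taken as zero, consistent with the constant first derivatives above.

```python
import numpy as np

H = np.array([[ 2., -2.,  1.,  1.],
              [-3.,  3., -2., -1.],
              [ 0.,  0.,  1.,  0.],
              [ 1.,  0.,  0.,  0.]])

def hermite_patch(C, N, S, E, W, NE, NW, SE, SW):
    """Return h(s, t) for the Hermite bicubic patch over the hit cell:
    corner heights are minima over 2x2 neighborhoods (alpha..delta),
    first derivatives are forward/backward differences, mixed terms 0."""
    a_y = min(SW, S, C, W); b_y = min(S, SE, E, C)
    g_y = min(W, C, N, NW); d_y = min(C, E, NE, N)
    du_ag = C - W; du_bd = E - C   # d/du at alpha,gamma resp. beta,delta
    dv_ab = C - S; dv_gd = N - C   # d/dv at alpha,beta resp. gamma,delta
    G = np.array([[a_y,   g_y,   dv_ab, dv_gd],
                  [b_y,   d_y,   dv_ab, dv_gd],
                  [du_ag, du_ag, 0.,    0.],
                  [du_bd, du_bd, 0.,    0.]])
    def h(s, t):
        sv = np.array([s**3, s**2, s, 1.])
        tv = np.array([t**3, t**2, t, 1.])
        return float(sv @ H @ G @ H.T @ tv)
    return h

# heightfield rising linearly along u: the patch reproduces h = s
h = hermite_patch(C=1, N=1, S=1, E=2, W=0, NE=2, NW=0, SE=2, SW=0)
print(round(h(0.0, 0.0), 9), round(h(1.0, 0.0), 9), round(h(0.5, 0.7), 9))
```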
Besides their simplicity and the possibility of calculating all the relevant information in the fragment shader, we decided to use Hermite bicubic patches because we
wanted to ensure that the surface remains inside the
bounding boxes of the BVH. By constructing the patches
as described above, we can ensure that they stay com-
pletely inside the bounding boxes as we control the
defining tangential planes. The direct usage of forward
and backward differences in our implementation avoids
any scaling of the tangents and therefore leads to the desired C1 continuity between neighboring patches, be-
cause their tangents have the same direction and magni-
tude (cf. [14]). The most severe drawback of this method
is its high computational cost, although we may still
achieve interactive frame rates (see section 4.2). Furthermore,
as this method ensures that the height of
each patch is less than or equal to the height of its bounding
box, and the tangents are not scaled, isolated peaks
in the heightfield become clearly flattened, as can be
seen in figure 6c.
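The continuity argument can be checked with a 1D analogue (hypothetical height values; the bicubic case factors into such curves along u and v): two neighboring Hermite segments that both use the same unscaled difference E − C as the tangent at their shared grid point join with identical value and slope.

```python
def hermite(p0, p1, m0, m1, u):
    """1D cubic Hermite segment with endpoint values p0, p1 and tangents m0, m1."""
    return ((2*u**3 - 3*u**2 + 1) * p0 + (-2*u**3 + 3*u**2) * p1
            + (u**3 - 2*u**2 + u) * m0 + (u**3 - u**2) * m1)

# Heights at four consecutive texels W, C, E, EE (hypothetical values):
W, C, E, EE = 3.0, 5.0, 4.0, 6.0

# The segment ending at E uses E - C as its outgoing tangent; the next
# segment uses the same unscaled difference E - C as its incoming tangent.
def left(u):   # segment from C to E
    return hermite(C, E, C - W, E - C, u)

def right(u):  # segment from E to EE
    return hermite(E, EE, E - C, EE - E, u)

eps = 1e-6
slope_left  = (left(1.0) - left(1.0 - eps)) / eps   # slope just before the joint
slope_right = (right(eps) - right(0.0)) / eps       # slope just after the joint
```

Both segments meet at height E with slope E − C, i.e., the joint is C1.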
However, both refinement methods presented in this
section rely on interpolation of point sampled data on
a regular grid and only serve to make the resulting
renderings visually more appealing. Besides, even though it
might appear sufficient to apply refinement only in
cases when the viewer is close to a highly detailed area
where the block-sampled nature of the data becomes
apparent, we refine the surface at all discrete LODs,
because the transition between large distant boxes and
smooth surfaces is rather disturbing during animations.
In addition, the lighting conditions on smooth surfaces
and blocks differ due to distinct surface normals.

name         extent [km]           W × H            L  scale  size DSM  size color texture  time [min]
City 1       1.4 × 1.0             5600 × 4000      5  1.0    133 MB    99 MB               0:54
City 2       20.9 × 26.3           83600 × 105200   9  1.0    31.6 GB   –                   3:34
ETOPO1       ≈ 40075.0 × 19970.0   21600 × 10800    7  10.0   1.3 GB    –                   9:53
Blue Marble  ≈ 40075.0 × 19970.0   86400 × 43200    9  10.0   19.2 GB   14.4 GB             13:10

Table 1: Properties of the different data sets used to evaluate performance. L denotes the total number of clip
levels which have been created; W × H is the grid size at level 0, i.e., the size a single texture would have.
Column time contains the durations of the virtual camera flights for our evaluation in minutes.
4 Performance Results and Discussion
The implementation of our technique relies on OpenGL
and GLSL 1.50 shaders, and we demonstrate its per-
formance by means of renderings of the four different
data sets listed in table 1. The data set City 2 was
acquired by means of photogrammetric methods from
aerial images. City 1 depicts a small area within City 2
for which a color texture derived from orthographic
aerial images is available. The data sets
ETOPO1 [2] and Blue Marble [21] depict the entire
earth and are both derived in large parts from SRTM
data [22], but ETOPO1 also contains bathymetric data,
whereas Blue Marble possesses a color texture derived
from satellite images.
When sampled in the fragment shader, the height
values are scaled by the factors given in column scale in
order to avoid flattened surfaces. Shallow surfaces do
not challenge our ray caster, because fewer mutual
occlusions lead to fewer level changes in the BVH during ray
traversal. Renderings of three data sets are shown in
figure 8.
4.1 Evaluation Setup and Results
We used tile sizes of 512× 512 texels, active area sizes
of 5 × 5 tiles and clip area sizes of 7 × 7 for all data
sets in our tests. The near and far planes of the virtual
camera were set to 1.0 and 2000.0 units, respectively.
Heightfield layers consist of single-channel 32-bit floating point
textures, and color texture layers consist of 24-bit RGB
textures. The results were recorded during virtual cam-
era flights along fixed paths over the heightfields on
a desktop computer with an Intel i7 860 CPU at 2.8
GHz, 6 GB RAM, an NVIDIA GeForce GTX 470 graphics
adapter with 1280 MB dedicated VRAM and Windows
7 OS (system A). To make our results comparable
to the results reported in [8], we additionally ran the
same tests on a second desktop computer (system B)
with a hardware configuration more similar to theirs
(Intel Q6600 CPU at 2.4 GHz, 4 GB RAM, NVIDIA
GeForce GTX 285 with 1024 MB dedicated VRAM and
Windows 7 OS). Table 2 shows the results for different
screen resolutions on system A and system B in terms
of frames per second (fps). The frame rates take into
account the delays caused by updating the tile caches
in main memory and video memory as described in section
3.1. The times for rendering the given number of
frames are denoted by column time in table 1.

(a) System A

data set     resolution [pixel]  frames  min. [fps]  avg. [fps]
City 1       1024 × 768          5450    6.2         100.9
             1280 × 1024         3464    5.8         64.1
             1920 × 1080         2217    5.1         41.0
City 2       1024 × 768          25624   5.6         119.7
             1280 × 1024         16700   5.6         78.0
             1920 × 1080         11189   5.1         52.3
ETOPO1       1024 × 768          105747  6.3         178.5
             1280 × 1024         66869   5.9         112.9
             1920 × 1080         42907   4.9         72.4
Blue Marble  1024 × 768          121844  3.3         154.2
             1280 × 1024         75721   4.0         95.8
             1920 × 1080         50028   1.8         63.3

(b) System B

data set     resolution [pixel]  frames  min. [fps]  avg. [fps]
City 1       1024 × 768          3848    5.1         71.2
             1280 × 1024         2450    4.5         45.3
             1920 × 1080         1574    5.1         29.1
City 2       1024 × 768          18174   3.6         84.9
             1280 × 1024         12327   3.6         57.6
             1920 × 1080         8290    3.3         38.7
ETOPO1       1024 × 768          75790   5.2         127.9
             1280 × 1024         48027   5.2         81.1
             1920 × 1080         31134   4.7         52.5
Blue Marble  1024 × 768          95285   2.8         120.6
             1280 × 1024         63409   2.5         80.3
             1920 × 1080         43314   1.0         54.8

Table 2: Performance results of our rendering technique.
4.2 Performance with Surface Refinement
All values given in table 2 were obtained without any of
the surface refinement methods described in section 3.6.
The impact on rendering speed and the relative loss
in performance when using surface refinement in our
implementation is shown in table 3. These data were
acquired from another evaluation of the same camera
flight through the City 2 data set on system A and system
B, because this data set has high spatial frequencies
in the rendered regions and is the most challenging for
our ray caster.

(a) City 2   (b) ETOPO1   (c) Blue Marble

Fig. 8: Example renderings of the data sets which we used in our performance evaluations. Color textures are only
available for City 1 and Blue Marble; ETOPO1 was rendered using a pseudo topographic color map.

(a) System A

method   1024 × 768      1280 × 1024     1920 × 1080
linear   96.2 (−19.6%)   62.8 (−19.5%)   42.0 (−19.7%)
bicubic  36.8 (−69.3%)   24.3 (−68.8%)   16.6 (−68.3%)

(b) System B

method   1024 × 768      1280 × 1024     1920 × 1080
linear   75.4 (−12.4%)   49.8 (−13.5%)   33.7 (−12.9%)
bicubic  36.4 (−57.1%)   17.4 (−69.8%)   11.9 (−69.3%)

Table 3: Impact of the surface refinement methods on performance in terms of average frames per second for the
City 2 data set, and the loss compared to unrefined rendering.
4.3 Discussion
The results in table 2 show that, in accordance with
the results of Dick et al. [8], very large DSMs can be
rendered in real time by using only ray casting and
acceleration data structures. Although the hybrid approach
of Dick et al. [9] renders faster, it
appears to be less flexible, because it requires selecting
representative tiles from the data set and views of the
scene during its training phase.
As expected, table 3 shows that the loss in performance
with bicubic surface refinement is much larger
than with the linear method, but even at the highest
resolution we still achieve interactive frame rates.
The linear method may expose some defects, but offers
a good compromise between quality and speed at
higher resolutions. Besides, refining the reconstructed
surface only pays off for coarse-resolution DSMs, i.e.,
low grid densities, where the block structure
becomes apparent. The differences in the frame
rates between the two city data sets and the two earth
data sets result from different grid densities.
5 Conclusions and Future Work
In this paper we have shown that by combining clipmaps
and ray casting very large DSMs can be rendered at
real-time frame rates in a single rendering pass. Our ap-
proach eliminates aliasing caused by texture sampling
or spatial sampling. The same LOD selection method
also avoids unnecessary ray casting steps
in regions distant from the viewer. The size of the
rendered DSMs is mainly limited by the amount of
secondary memory available. We also used surface refinement
based on Hermite bicubic patches to improve the
renderings of point sampled data and still achieved
interactive frame rates at high screen resolutions. However,
assigning color values to reconstructed surfaces by
using orthographic photo textures suffers from the
problem that such textures do not contain information about
surfaces oriented obliquely to the ground plane. This
becomes especially apparent for our City 1 data set if
we lower the camera to street level, where the facades
of buildings are missing. In the case of untextured DSMs,
the transition from one LOD to another in the heightfield
layer can be perceived as disturbing, and therefore
a method for smooth transitions between different LODs
in this layer is needed. Besides, it would be desirable to have
direct comparisons of the performance and rendering
quality of our implementation with rasterization-based
techniques and CPU ray casting implementations.
Acknowledgements Our work has been conducted within the project AVIGLE, which is part of the Hightech.NRW initiative funded by the Ministry of Innovation, Science and Research of the German State of North Rhine-Westphalia. AVIGLE is a cooperation of several academic and industrial partners, and we thank all partners for their work and contributions to the project, with special thanks to Aerowest GmbH, Dortmund, Germany, for providing us with data for our City data sets. We would further like to thank the anonymous reviewers for their valuable advice and all those who were involved in providing the original data for our data sets ETOPO1 and Blue Marble.
References
1. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. A K Peters, Ltd. (2008)
2. Amante, C., Eakins, B.W.: ETOPO1 1 Arc-Minute Global Relief Model: Procedures, Data Sources and Analysis. NOAA Technical Memorandum NESDIS NGDC-24, 19 pp. (2009)
3. Asirvatham, A., Hoppe, H.: GPU Gems 2, chap. Terrain Rendering Using GPU-Based Geometry Clipmaps. Addison-Wesley Longman (2005)
4. Blinn, J.F.: Simulation of Wrinkled Surfaces. In: SIGGRAPH '78: Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques, pp. 286–292. ACM (1978)
5. Clasen, M., Hege, H.C.: Terrain Rendering using Spherical Clipmaps. In: EuroVis06 Joint Eurographics - IEEE VGTC Symposium on Visualization, pp. 91–98. Eurographics Association (2006)
6. Cook, R.L.: Shade Trees. In: SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 223–231. ACM (1984)
7. Crawfis, R., Noble, E., Ford, M., Kuck, F., Wagner, E.: Clipmapping on the GPU. Tech. rep., Ohio State University, Columbus, OH, USA (2007)
8. Dick, C., Krüger, J., Westermann, R.: GPU Ray-Casting for Scalable Terrain Rendering. In: Proceedings of Eurographics 2009 - Areas Papers, pp. 43–50 (2009)
9. Dick, C., Krüger, J., Westermann, R.: GPU-Aware Hybrid Terrain Rendering. In: Proceedings of IADIS Computer Graphics, Visualization, Computer Vision and Image Processing 2010, pp. 3–10 (2010)
10. Dummer, J.: Cone Step Mapping: An Iterative Ray-Heightfield Intersection Algorithm. http://www.lonesock.net/files/ConeStepMapping.pdf (2006)
11. Ephanov, A., Coleman, C.: Virtual Texture: A Large Area Raster Resource for the GPU. In: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2006, pp. 645–656 (2006)
12. Ewins, J.P., Waller, M.D., White, M., Lister, P.F.: MIP-Map Level Selection for Texture Mapping. IEEE Transactions on Visualization and Computer Graphics 4(4), 317–329 (1998)
13. Feldmann, D., Steinicke, F., Hinrichs, K.: Flexible Clipmaps for Managing Growing Textures. In: Proceedings of the International Conference on Computer Graphics Theory and Applications (GRAPP) (2011)
14. Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, Second Edition in C. Addison-Wesley (1995)
15. Google Inc.: Google Earth. http://earth.google.com/ (2005)
16. Kaneko, T., Takahei, T., Inami, M., Kawakami, N., Yanagida, Y., Maeda, T., Tachi, S.: Detailed Shape Representation with Parallax Mapping. In: Proceedings of ICAT 2001, pp. 205–208 (2001)
17. Li, Z., Li, H., Zeng, A., Wang, L., Wang, Y.: Real-Time Visualization of Virtual Huge Texture. In: ICDIP '09: Proceedings of the International Conference on Digital Image Processing, pp. 132–136. IEEE Computer Society (2009)
18. Losasso, F., Hoppe, H.: Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids. ACM Transactions on Graphics (TOG) (2004)
19. Microsoft: DirectX SDK Documentation: RaycastTerrain Sample. http://msdn.microsoft.com/en-us/library/ee416425(v=vs.85).aspx (2008)
20. Mittring, M., Crytek GmbH: Advanced Virtual Texture Topics. In: SIGGRAPH '08: ACM SIGGRAPH 2008 Classes, pp. 23–51. ACM (2008)
21. NASA: Visible Earth: Earth - The Blue Marble. http://visibleearth.nasa.gov/view.php?id=54388 (1997)
22. NASA: Shuttle Radar Topography Mission. http://www2.jpl.nasa.gov/srtm/ (2000)
23. NASA: World Wind. http://worldwind.arc.nasa.gov/ (2004). http://www.goworldwind.org
24. Oh, K., Ki, H., Lee, C.H.: Pyramidal Displacement Mapping: A GPU-based Artifacts-free Ray Tracing through an Image Pyramid. In: VRST '06: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 75–82. ACM (2006)
25. Oliveira, M.M., Bishop, G., McAllister, D.: Relief Texture Mapping. In: SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 359–368. ACM Press/Addison-Wesley Publishing Co. (2000)
26. Policarpo, F., Oliveira, M.M.: GPU Gems 3, chap. Relaxed Cone Stepping for Relief Mapping. Addison-Wesley Professional (2007)
27. Policarpo, F., Oliveira, M.M., Comba, J.L.D.: Real-Time Relief Mapping on Arbitrary Polygonal Surfaces. In: I3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pp. 155–162. ACM (2005)
28. Qu, H., Qiu, F., Zhang, N., Kaufman, A., Wan, M.: Ray Tracing Height Fields. In: Proceedings of Computer Graphics International, pp. 202–207 (2003)
29. Seoane, A., Taibo, J., Hernández, L.: Hardware-Independent Clipmapping. In: Journal of WSCG 2007, pp. 177–183 (2007)
30. Szirmay-Kalos, L., Umenhoffer, T.: Displacement Mapping on the GPU - State of the Art (2006)
31. Taibo, J., Seoane, A., Hernández, L.: Dynamic Virtual Textures. In: Journal of WSCG 2009, pp. 25–32. Eurographics Association (2009)
32. Tanner, C.C., Migdal, C.J., Jones, M.T.: The Clipmap: A Virtual Mipmap. In: SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 151–158. ACM (1998)
33. Tatarchuk, N.: Dynamic Parallax Occlusion Mapping with Approximate Soft Shadows. In: SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, pp. 63–69. ACM (2006)
34. Tevs, A., Ihrke, I., Seidel, H.P.: Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering. In: I3D '08: Proceedings of the 2008 Symposium on Interactive 3D Graphics and Games, pp. 183–190. ACM (2008)
35. Williams, L.: Pyramidal Parametrics. In: SIGGRAPH '83: Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques, pp. 1–11. ACM (1983)