
Hierarchical Visibility and Tiling

Ned Greene

The Hierarchical Visibility Algorithm

• Accelerates visibility computation in densely occluded scenes.

• Challenge: Efficiently cull hidden regions of the scene.

Hierarchical Visibility

• employs object-space and image-space hierarchies

• enables hierarchical culling of hidden geometry

• result: finds visible geometry by logarithmic search

Object Space
• organize scene model in an octree
• traverse octree cubes front-to-back, culling hidden cubes

Image Space
• maintain visibility info. in a pyramid
• perform image-space culling hierarchically


• Hierarchical Z-Buffer Visibility (Siggraph ’93)
• Error-Bounded Antialiased Rendering (Siggraph ’94)
• Hierarchical Polygon Tiling (Siggraph ’96)

An Ideal Visibility Algorithm Should

1) Quickly cull most hidden geometry, so rendering time is independent of the complexity of hidden geometry.
2) Exploit available coherence to accelerate rendering of visible geometry.

Types of Coherence

• Object-space
• Image-space
• Temporal

Object-Space Coherence

Often a single visibility computation can resolve the visibility of a collection of objects that are near each other in space.

Image-Space Coherence

A single visibility computation can often resolve visibility within a region of the screen.

Temporal Coherence

Knowing the visibility of a collection of objects in one frame can often accelerate visibility computation for those objects in the next frame.


• Exploits object-space coherence with an octree.

• Exploits image-space coherence with a z-pyramid.

• Exploits temporal coherence by using the geometry that was visible in the preceding frame to construct a starting point.

Object-Space Coherence

• If an octree cube is hidden by a z-buffer, all geometry inside the cube is also hidden.

• Apply recursively through the octree.


Recursive Subdivision Algorithm

Start with root cube.
• If octree cube is outside the viewing frustum, done.
• Scan convert faces of the octree cube to see if it’s visible; if hidden, done.
• Render geometry associated with octree cube.
• Subdivide octree node in front-to-back order, applying same algorithm to children.


Properties of the Algorithm

• Only visits visible octree nodes and their children.

• Octree nodes are visited at most once.

• Only renders geometry in visible octree nodes.

Scan Conversion Bottleneck

• Testing octree cubes for visibility requires scan converting many large cube faces.

• Solution: exploit image-space coherence with a z-pyramid.

A scene and its z-pyramid

z-pyramid

• Finest level is a standard z-buffer.
• Each pyramid sample is the farthest sample in the corresponding 2x2 window at the next finer level.

• Each sample represents the farthest z for a square window of the screen.

Does a Z-pyramid completely hide a primitive?
Step 1: Find the finest-level pyramid value whose corresponding image region encloses the primitive.

[Figure: screen and z-pyramid]

Does a Z-pyramid completely hide a primitive?
Step 2: If the nearest z of the primitive is farther away than this value, the primitive is completely hidden.

[Figure labels: z-pyramid value for square window of screen; nearest z of object; occluder]

Depth Complexity of the Visibility Computation

Naive Z-Buffering: 83.7

Hierarchical Visibility (log scale):
• z-pyramid tests: .45
• tiling polygons: 2.51
• total: 2.96

• Object-space octree rapidly culls hidden regions of space.

• Z-pyramid rapidly establishes visibility of octree cubes and primitives.

For animation, we can also exploit temporal coherence.

Exploiting Temporal Coherence

• First frame: Render as before. Keep a list of visible octree cubes.
• Subsequent frames: Z-buffer geometry in previously visible cubes. Build Z-pyramid. Render newly visible geometry with H.V. Update list of visible cubes.

[Figure: first frame; second frame (with temporal coherence)]

Office Scene Performance

• 538 million polygons, 60 million in viewing frustum.

• 75 minutes to render using the hardware pipeline on Crimson.

• 4 seconds to render using H.V. on the same machine.

(on SGI R4000 Crimson Elan)

Hardware Acceleration

• With the temporal coherence option, most rendering of primitives can be done with hardware scan conversion.

• Testing visibility of octree cubes is problematic with existing accelerators.

Kubota Pacific Titan workstation

• “V-Query” function reports whether a polygon or a set of polygons is visible (has at least one visible pixel).

• Use V-Query to determine visibility of octree cubes.

• Faster than software scan conversion, but pipeline delay is a bottleneck.

Hardware Acceleration

• V-Query is an important feature for visibility algorithms in general.

• Candidates for hardware acceleration:
  – Z-pyramid
  – octree subdivision
  – temporal coherence scheme

Estimating Parallel Performance

• Screen divided into grid of windows.
• Ran rendering algorithm sequentially in each window.
• Overhead of rendering in windows was low for 32x32 windows and larger.

[Figure: Total rendering time as a function of the size of the image window. Vertical axis: rendering time in seconds; horizontal axis: window width in pixels.]

Hierarchical Visibility - Conclusion

• Works on arbitrary models.
• Exploits object-space, image-space, and temporal coherence for substantial acceleration.
• Suitable for parallel implementation.
• Suitable for hardware acceleration.

Hierarchical Rendering with Antialiasing

Error-Bounded Antialiased Rendering of Complex Environments

Ned Greene, Michael Kass, Siggraph ’94

• For many shading functions, can antialias with guaranteed accuracy.

• Too slow for interactive applications.

[Figure: quadtree subdivision; log-scale quadtree subdivision]

Z-Buffer Antialiasing by Oversampling

Method: Generate higher-resolution image and filter subpixel samples.

Problems: Much slower, uses much more image memory.

Hierarchical Tiling

• tiling by recursive subdivision
• subdivision driven by 3-state coverage masks
• tiling and visibility done with bitmask operations (fast!)

Advantage is Speed

Particularly fast:
• high-resolution tiling (well suited to antialiasing)
• densely occluded scenes (use “hierarchical visibility”)

Disadvantage: Polygons must be processed in front-to-back order.

Topics

• tiling by recursive subdivision
• coverage masks for pyramids
• rendering a scene
• object-space culling

Tiling by Recursive Subdivision
Warnock (1969)

Novel Variation: Process polygons front to back
• no lists of polygons kept with cells
• no depth comparisons
• only subdivides along visible edges

Warnock Subdivision

Classify quadtree cells versus polygons:
• cell outside polygon: ignore
• cell inside polygon: polygon covers cell, done
• cell intersects polygon: subdivide cell
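The classification above can be sketched in Python for a single convex polygon. This is a minimal illustration under our own assumptions, not the tiling code behind these slides: the polygon is given as counterclockwise vertices in pixel units, pixels are sampled at their centers, and a cell counts as "outside" only when a single edge separates it from the polygon. That conservative test may send a few outside cells down the "intersecting" branch, which costs extra subdivision but never changes the result. The helper names (`edge_value`, `classify`, `warnock_tile`) are hypothetical.

```python
def edge_value(p, q, x, y):
    """>= 0 when (x, y) lies on the inside (left) of edge p->q of a CCW polygon."""
    return (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])

def edges(poly):
    return list(zip(poly, poly[1:] + poly[:1]))

def classify(poly, x, y, s):
    """Classify the square cell [x, x+s] x [y, y+s] against a convex CCW polygon."""
    corners = [(x, y), (x + s, y), (x, y + s), (x + s, y + s)]
    inside = True
    for p, q in edges(poly):
        vals = [edge_value(p, q, cx, cy) for cx, cy in corners]
        if all(v < 0 for v in vals):
            return "outside"            # one edge separates the cell from the polygon
        if any(v < 0 for v in vals):
            inside = False
    return "inside" if inside else "intersecting"

def warnock_tile(poly, x, y, s, covered):
    """Collect pixels (i, j) whose centers lie inside poly, by recursive subdivision."""
    kind = classify(poly, x, y, s)
    if kind == "outside":
        return                          # ignore the cell
    if kind == "inside":
        covered.update((i, j) for i in range(x, x + s) for j in range(y, y + s))
        return                          # polygon covers the cell: done
    if s == 1:                          # intersected pixel: sample at its center
        if all(edge_value(p, q, x + 0.5, y + 0.5) >= 0 for p, q in edges(poly)):
            covered.add((x, y))
        return
    h = s // 2                          # intersected cell: subdivide into quadrants
    for dx in (0, h):
        for dy in (0, h):
            warnock_tile(poly, x + dx, y + dy, h, covered)

tri = [(1, 1), (14, 3), (5, 13)]
cells = set()
warnock_tile(tri, 0, 0, 16, cells)
brute = {(i, j) for i in range(16) for j in range(16)
         if all(edge_value(p, q, i + 0.5, j + 0.5) >= 0 for p, q in edges(tri))}
print(cells == brute)   # True
```

The final comparison against a per-pixel scan checks that the subdivision finds exactly the same pixels; the savings come from resolving whole inside and outside cells with one test each.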

Tiling by Warnock Subdivision

Advantages:
• finds visible geometry by logarithmic search
• hierarchically culls hidden geometry

Disadvantage: Subdivision is too slow.

Conventional Coverage Masks

Application: filter pixels

Carpenter (1984)
Sabella and Wozny (1983)
Fiume, Fournier, Rudolph (1983)
Abram, Westover, Whitted (1985)


[Figure: polygon mask and edge mask for a pixel]

Filtering operations done with bitmasks and table lookup:

tiling • visibility • filtering

Edge masks are precomputed and stored in a table.

[Figure: edge masks combined with & to form a polygon mask]

Making a Polygon Mask
• look up edge masks
• AND them together
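A minimal Python sketch of the polygon-mask construction, assuming 8x8 subpixel samples packed into one 64-bit integer per pixel and a convex, counterclockwise polygon given in subpixel units. `edge_mask` is a hypothetical stand-in for the precomputed table: it computes each mask on demand and memoizes it with `functools.lru_cache`, whereas a real table would be indexed by a quantized edge description. The last lines show how visibility and a box-filter coverage fraction also reduce to bit operations when polygons arrive front to back.

```python
from functools import lru_cache

N = 8  # 8x8 subpixel samples per pixel, one bit each

@lru_cache(maxsize=None)
def edge_mask(p, q):
    """Bits set for samples on the inside (left) of edge p->q (stand-in for a table)."""
    mask = 0
    for j in range(N):
        for i in range(N):
            x, y = i + 0.5, j + 0.5     # sample centers in subpixel units
            if (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0]) >= 0:
                mask |= 1 << (j * N + i)
    return mask

def polygon_mask(poly):
    """AND together the edge masks of a convex, counterclockwise polygon."""
    mask = (1 << (N * N)) - 1
    for p, q in zip(poly, poly[1:] + poly[:1]):
        mask &= edge_mask(p, q)
    return mask

def coverage(mask):
    """Fraction of samples set, i.e. a box-filter weight for the pixel."""
    return bin(mask).count("1") / (N * N)

# Front-to-back visibility within one pixel: only uncovered samples are visible.
tri = polygon_mask(((0, 0), (8, 0), (0, 8)))
square = polygon_mask(((0, 0), (8, 0), (8, 8), (0, 8)))
covered = 0
visible_tri = tri & ~covered
covered |= visible_tri
visible_sq = square & ~covered
covered |= visible_sq
print(bin(visible_tri).count("1"), bin(visible_sq).count("1"), coverage(covered))
# 36 28 1.0
```

The triangle covers the 36 samples with i + j <= 7; the square behind it receives only the remaining 28, with no depth comparison.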



“Triage” coverage masks drive subdivision.

Classify cells versus polygons:
• inside
• outside
• intersecting

[Figure: inside and outside cells of a triage mask, represented as two bitmasks]

Triage Polygon Mask

Making a Triage Polygon Mask
• look up triage edge masks
• AND together inside masks
• OR together outside masks
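The triage combination rule can be sketched the same way. In this hypothetical Python version, a 4x4 grid of cells (each 2 pixels on a side) gets one bit per cell in each of two masks: an edge's "inside" mask marks cells whose four corners all lie inside the edge, and its "outside" mask marks cells whose corners all lie outside. The cell grid, the corner test, and the function names are assumptions for illustration; cells left in neither polygon mask are the intersecting cells that drive subdivision.

```python
N = 4  # a 4x4 grid of cells, one bit per cell

def edge_triage(p, q, cell=2):
    """Return (inside, outside) bitmasks of cells versus the half-plane left of p->q."""
    inside = outside = 0
    for j in range(N):
        for i in range(N):
            x0, y0 = i * cell, j * cell
            corners = [(x0, y0), (x0 + cell, y0), (x0, y0 + cell), (x0 + cell, y0 + cell)]
            vals = [(q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])
                    for x, y in corners]
            bit = 1 << (j * N + i)
            if all(v >= 0 for v in vals):
                inside |= bit
            elif all(v < 0 for v in vals):
                outside |= bit
    return inside, outside

def triage_polygon(poly):
    """Combine edge triage masks of a convex CCW polygon into polygon triage masks."""
    full = (1 << (N * N)) - 1
    inside, outside = full, 0
    for p, q in zip(poly, poly[1:] + poly[:1]):
        e_in, e_out = edge_triage(p, q)
        inside &= e_in      # inside the polygon: inside every edge
        outside |= e_out    # outside the polygon: outside at least one edge
    intersecting = full & ~(inside | outside)
    return inside, outside, intersecting

masks = triage_polygon(((0, 0), (8, 0), (0, 8)))
print([bin(m).count("1") for m in masks])   # [6, 3, 7]
```

For this triangle only the hypotenuse splits cells, so the 7 intersecting cells are exactly those straddling the line x + y = 8.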



Data Structures

• polygons in BSP tree (Fuchs, Kedem, Naylor ’80)

• accumulation buffer supports oversampling (Carpenter ’84)

• coverage pyramid

Coverage Pyramid (pyramid of triage coverage masks)

[Figure: coverage pyramid, zooming from the whole screen down to a pixel; conventional coverage masks at finest level only (masks are really 8x8)]

Recursive Tiling Procedure
• ignore outside cells
• write image in covered cells
• subdivide intersected cells

[Figure: recursive tiling, zooming from the whole screen down to a pixel]

Rendering a Scene

• traverse polygons front to back
• tile by recursive subdivision
• where covered regions are found, update accumulation buffer and coverage pyramid
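The procedure above can be sketched in Python as follows. This is a simplified illustration, not the published implementation: it samples one point per pixel instead of using 8x8 coverage masks, writes a polygon ID instead of accumulating filtered color, and keeps a per-cell count of covered pixels as a stand-in for the coverage pyramid. Polygons are assumed convex, counterclockwise, and listed nearest first; `render_front_to_back` and its helpers are hypothetical names. Because polygons arrive front to back, no depth comparisons are needed and each pixel is written at most once.

```python
def edge_value(p, q, x, y):
    """>= 0 when (x, y) is on the inside (left) of edge p->q of a CCW polygon."""
    return (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])

def classify(poly, x, y, s):
    """Conservatively classify the cell [x, x+s] x [y, y+s] against poly."""
    corners = [(x, y), (x + s, y), (x, y + s), (x + s, y + s)]
    inside = True
    for p, q in zip(poly, poly[1:] + poly[:1]):
        vals = [edge_value(p, q, cx, cy) for cx, cy in corners]
        if all(v < 0 for v in vals):
            return "outside"
        if any(v < 0 for v in vals):
            inside = False
    return "inside" if inside else "intersecting"

def render_front_to_back(polys, size):
    """Tile convex CCW polygons (nearest first) into a size x size image.
    size must be a power of two."""
    top = size.bit_length() - 1
    # count[lv][j][i] = covered pixels in the cell of side 2**lv at (i, j)
    count = [[[0] * (size >> lv) for _ in range(size >> lv)] for lv in range(top + 1)]
    image = [[None] * size for _ in range(size)]
    writes = 0

    def cover(pid, x, y, lv):
        """Write pid into a whole, previously empty cell and update ancestor counts."""
        nonlocal writes
        s = 1 << lv
        for j in range(y, y + s):
            for i in range(x, x + s):
                image[j][i] = pid
        writes += s * s
        for up in range(lv, top + 1):
            count[up][y >> up][x >> up] += s * s

    def tile(pid, poly, x, y, lv):
        s = 1 << lv
        covered = count[lv][y >> lv][x >> lv]
        if covered == s * s:
            return                      # cell already fully covered: cull
        kind = classify(poly, x, y, s)
        if kind == "outside":
            return
        if kind == "inside" and covered == 0:
            cover(pid, x, y, lv)        # polygon covers an empty cell
        elif lv == 0:                   # intersected pixel: sample its center
            if all(edge_value(p, q, x + 0.5, y + 0.5) >= 0
                   for p, q in zip(poly, poly[1:] + poly[:1])):
                cover(pid, x, y, 0)
        else:                           # intersected or partly covered: subdivide
            h = s // 2
            for dy in (0, h):
                for dx in (0, h):
                    tile(pid, poly, x + dx, y + dy, lv - 1)

    for pid, poly in enumerate(polys):
        tile(pid, poly, 0, 0, top)
    return image, writes

polys = [
    [(2, 2), (9, 2), (9, 9), (2, 9)],        # nearest: a square
    [(0, 0), (16, 4), (6, 15)],              # a triangle behind it
    [(0, 0), (16, 0), (16, 16), (0, 16)],    # background quad
]
image, writes = render_front_to_back(polys, 16)
reference = [[next(pid for pid, poly in enumerate(polys)
                   if all(edge_value(p, q, i + 0.5, j + 0.5) >= 0
                          for p, q in zip(poly, poly[1:] + poly[:1])))
              for i in range(16)] for j in range(16)]
print(image == reference, writes)   # True 256
```

Counts below a fully covered cell are never updated, which is safe here because every traversal starts at the root and stops at the first fully covered cell it meets.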

Properties of Hierarchical Tiling

• no overwrite of raster samples
• minimal subdivision
• maximal culling efficiency
• low memory requirements

Frame time: .36 seconds
Model: 192 presorted quads
Raster res.: 4096 x 4096
Image res.: 512 x 512
Filter kernel: box



Modifications to Hierarchical Z-Buffering

(Greene, Kass, Miller ’93)

• substitute hierarchical tiling for z-buffering
• substitute coverage pyramid for z-pyramid
• substitute octree of BSP trees for octree

Advantages of Hierarchical Tiling over Hierarchical Z-Buffering

• faster
• uses much less memory when oversampling (e.g. 4%)
• no overwrite of raster samples

Work Image
Shows number of times cells in coverage pyramid were visited (log scale).

• 1.1 cells visited per pixel (average)
• work tiling cubes: .09 cells visited per pixel
• work tiling polygons: 1.01 cells visited per pixel

Processing Dynamic Scenes

Novel Methods

• “lazy z-buffering”
• merging octrees

.36 seconds to tile and filter on 4096 x 4096 grid (75 MHz)

3.1 seconds to tile and filter on 4096 x 4096 grid (75 MHz) (image on right)

Frame time: 5 minutes
Model: 167 million quads (octree of BSP trees)
Raster res.: 4096 x 4096
Image res.: 512 x 512
Filter kernel: cosine-hump in 3x3 pixel neighborhood

Conclusion

• Warnock subdivision is a great algorithm.
• Much more practical with coverage masks.
• A very efficient visibility algorithm.


Hierarchical Rendering of Complex Environments

Over the last few years I’ve presented three Siggraph papers and written a PhD thesis on hierarchical visibility and tiling methods. The two papers included in these course notes are the most relevant of this work for interactive systems that require high performance. The first paper is a longer version of the 1993 Siggraph paper that introduced hierarchical z-buffering, “Hierarchical Z-Buffer Visibility,” written with Michael Kass and Gavin Miller. This paper introduced pyramidal z-buffers and hierarchical occlusion culling of models organized in an octree. Although this algorithm generates standard z-buffer images very efficiently, it is not well suited to high-quality antialiasing. This shortcoming is addressed by the second paper, “Hierarchical Polygon Tiling with Coverage Masks,” reprinted from the proceedings of Siggraph ’96. This paper introduced using coverage masks to perform tiling hierarchically, permitting very efficient generation of antialiased images of polygonal models that can be traversed in front-to-back order. These methods are presented in a larger context in my PhD thesis, Hierarchical Rendering of Complex Environments, which is available by ftp from the University of California at Santa Cruz (ftp.cse.ucsc.edu).

Ned Greene
Portola Valley, California
April 1997

Hierarchical Z-Buffer Visibility

This paper is a slightly modified version of a chapter from Ned Greene’s PhD thesis, Hierarchical Rendering of Complex Environments, University of California at Santa Cruz, 1995. The thesis chapter is an extended version of the 1993 Siggraph paper “Hierarchical Z-Buffer Visibility” by Ned Greene, Michael Kass, and Gavin Miller of Apple Computer.

Abstract

We present a very efficient algorithm for generating z-buffer images of complex, densely occluded scenes. The algorithm achieves efficiency by applying hierarchical methods to culling hidden geometry in both object space and image space, and by rendering visible geometry with the speed of traditional z-buffer scan conversion. To enable hierarchical culling of geometry in hidden regions of object space, we organize scene geometry in an octree. To render a frame, we recursively subdivide the octree in front-to-back order as with Meagher’s volume rendering algorithm [Meagher82b]. During traversal of the octree, octree nodes hidden by the z-buffer are culled, and geometry inside visible octree nodes is rendered into the z-buffer. Culling hidden octree nodes removes most hidden geometry, but some geometry still overlaps when projected onto the screen. To accelerate culling of remaining hidden geometry, z-buffer depth samples are maintained in a pyramid, which permits hierarchical culling in image space. Thus, the algorithm exploits coherence with both object-space and image-space hierarchies. For animation, we are also able to exploit temporal coherence by using geometry that was visible in the preceding frame to construct a starting point for the algorithm. This appears to be the first visibility algorithm that materially profits from object-space, image-space, and temporal coherence simultaneously. For very densely occluded scenes, the algorithm sometimes achieves orders of magnitude acceleration compared with ordinary z-buffer scan conversion.

1 Introduction

Extremely complex scenes offer interesting challenges for visibility algorithms. Consider, for example, an interactive walk-through of a detailed geometric model describing an entire city complete with gardens and trees, buildings with furnishings, etc. Traditional visibility algorithms running on contemporary computers cannot come close to rendering scenes of this complexity at interactive rates, and it will be a long time before faster hardware alone provides the needed performance. In order to get the most out of available hardware, we need faster algorithms that exploit properties of the visibility computation itself. Our ultimate objective is a visibility algorithm that does work proportional to the visible complexity of the scene in the output image, rather than the complexity of the overall geometric model. While this objective is unattainable because it would allow no work to be expended in culling hidden geometry, culling can be done very efficiently using hierarchical methods.

The key to accelerating visibility computations is exploiting three forms of coherence: object-space coherence, image-space coherence, and for animation sequences, temporal coherence. Ideally, a visibility algorithm should be able to exploit all of these forms of coherence. However, no traditional visibility algorithm succeeds in doing this. Traditional z-buffering effectively exploits image-space coherence but not object-space coherence. Conversely, ray casting through a spatial subdivision effectively exploits object-space coherence but not image-space coherence. Although image-space culling with a ZZ-buffer [Salesin-Stolfi89] improves the efficiency of ray casting, it is still necessary to traverse every primitive in the scene at every frame, which impairs performance for densely occluded scenes. Prior work includes various attempts to simultaneously exploit both object-space and image-space coherence. In the domain of volume rendering, Meagher’s algorithm effectively exploits both of these forms of coherence, but this method is not directly applicable to rendering geometric models [Meagher82b]. Potentially visible set methods effectively exploit both object-space and image-space coherence (assuming z-buffering of primitives), but their usefulness is limited to models having certain geometric characteristics [Airey90, Teller-Sequin91, Teller92]. Naylor’s BSP-tree visibility algorithm culls in both object space and image space, but it has not actually been shown to efficiently process densely occluded scenes [Naylor92b]. In short, no existing visibility algorithm has been demonstrated to effectively exploit both object-space and image-space coherence in processing arbitrary geometric models. For animation, there is the additional challenge of harnessing temporal coherence, which traditional algorithms rarely exploit in practice.

In this chapter, we present a z-buffer visibility algorithm that exploits object-space coherence as effectively as ray casting through a spatial subdivision, exploits image-space coherence even more effectively than traditional incremental scan conversion, and for animation sequences is also able to exploit frame-to-frame coherence. To exploit object-space coherence, we use an octree spatial subdivision of the type commonly used to accelerate ray tracing [Reddy-Rubin78, Rubin-Whitted80, Glassner84, Kay-Kajiya86, Kaplan87, Jevans-Wyvill89]. To exploit image-space coherence, we augment traditional z-buffer scan conversion with an image-space z-pyramid that allows us to cull hidden geometry very rapidly. To exploit temporal coherence, we use the geometry that was visible in the preceding frame to construct a starting point for the algorithm. The result is a z-buffer visibility algorithm that is orders of magnitude faster than traditional z-buffering for some densely occluded scenes we have experimented with. The algorithm is not difficult to implement and it works for arbitrary geometric models consisting of polygons and other primitives that can be efficiently scan converted. Moreover, the algorithm has modest memory requirements and it is amenable to parallel computation and to implementation in hardware.

In §2 we present the hierarchical z-buffer visibility algorithm, beginning with the data structures that it employs to exploit object-space, image-space, and temporal coherence. In §3 we discuss methods for building and maintaining the octree spatial subdivision. In §4 we describe our implementation and show results for some complex models containing hundreds of millions of polygons. Finally, in §5 we state our conclusions.

2 The Hierarchical Z-Buffer Visibility Algorithm

The hierarchical z-buffer visibility algorithm uses an octree spatial subdivision to exploit object-space coherence, a z-pyramid to exploit image-space coherence, and, to exploit temporal coherence in animation sequences, a record of the octree nodes that were visible in the preceding frame. While the full advantage of the algorithm is realized by using all three of these together, the octree and the z-pyramid can also be used separately. Whether used separately or together, these data structures make it possible to compute exactly the same result as ordinary z-buffering at less computational expense.

2.1 The Object-Space Octree

Octrees have been used previously with great effectiveness to accelerate ray casting and ray tracing [Rubin-Whitted80, Glassner84, Kay-Kajiya86, Kaplan87, Jevans-Wyvill89] and rendering of volume models [Meagher82b]. With some important modifications, the principles of this previous work can be applied to z-buffer rendering. The result is an algorithm that can accelerate z-buffering by orders of magnitude for models with sufficient depth complexity. By depth complexity, we mean the average number of primitives that overlap at each pixel on the screen.

2.1.1 Testing Octree Cubes for Visibility

In order to be precise about the algorithm, we begin with some simple definitions. We will say that a polygon or other primitive is hidden with respect to a z-buffer if no depth samples on the tiled primitive are closer to the observer than the corresponding values already in the z-buffer. Similarly, we will say that a cube is hidden with respect to a z-buffer if all of its faces are hidden polygons. Finally, we will call an octree node hidden if its bounding cube is hidden. Note that these definitions depend on the sampling of the z-buffer. A primitive that is hidden at one z-buffer resolution may not be hidden at another.

With these definitions, we can state the basic observation that makes it possible to combine z-buffering with an octree spatial subdivision. As schematically illustrated in figure 1, if a cube is hidden with respect to a z-buffer, then all geometry fully contained in the cube is also hidden. Actually, this principle applies to any bounding volume, not just a cube. It follows that if we tile the faces of a cube and determine that it is hidden, we can safely ignore all the geometry contained in that cube. The following pseudocode outlines a procedure for determining whether an octree cube is visible, returning either TRUE or FALSE.

    IsCubeVisible(OctreeNode N)
    {
        if N is completely outside the viewing frustum
            then return FALSE
        if the viewpoint is inside N
            then return TRUE
        if N intersects the "near" face of the viewing frustum
            then return TRUE
        for each front face F of N {
            if F is visible at one or more pixels
                then return TRUE
        }
        return FALSE
    }

Figure 1: If a cube (or other bounding volume) is hidden, then all geometry it contains is also hidden.

This procedure’s first step is determining whether a node’s bounding cube intersects the viewing frustum. We have developed a fast algorithm for the more general problem of detecting intersection of an axis-aligned rectangular solid and a convex polyhedron. Our implementation uses this method, which is presented in the appendix and also in [Greene94]. With this method, determining whether a cube intersects the viewing frustum requires evaluating between one and thirty inequalities derived from line and plane equations.

2.1.2 The Basic Rendering Algorithm

Given this cube-visibility test, the basic rendering algorithm is easy to construct. We begin by organizing scene geometry into an octree. There are various ways of building an octree, but for the moment let’s assume that each primitive is associated with the smallest enclosing octree cube. At the beginning of a frame, we clear the image buffer to the background color (or image) and clear the z-buffer to the far clipping plane. Then, starting at the root node, we process the octree according to the following recursive steps. If the octree node is not visible, either outside the viewing frustum or hidden by the z-buffer, we are done. Otherwise, we tile any primitives associated with the octree node into the z-buffer and then recursively process its children, if any, in front-to-back order using this same procedure. When the recursion finishes, we have a standard z-buffer image of the scene. It should be noted that Meagher’s volume rendering algorithm also uses this depth-first recursive subdivision procedure to traverse an octree [Meagher82b]. The procedure for rendering a scene is outlined in pseudocode below.

    RenderScene(OctreeNode Root)
    {
        clear image buffer to background
        clear z-buffer to far clipping plane
        ProcessOctreeNode(Root)
    }

    ProcessOctreeNode(OctreeNode N)
    {
        if IsCubeVisible(N) returns FALSE
            then return
        for each primitive P associated with N
            tile P into the z-buffer
        for each child C of N in front-to-back order
            ProcessOctreeNode(C)
    }

Note that procedure ProcessOctreeNode() requires front-to-back ordering of the octants of an octree node with respect to the viewpoint. This ordering during recursive subdivision guarantees strict front-to-back traversal of nodes in the octree. This is crucial to the algorithm’s efficiency, because it guarantees that any geometry that can occlude a node is tiled into the z-buffer before that node is processed. As a result, even though the z-buffer is usually only partially formed when cube-visibility tests are performed, these tests are definitive and succeed in culling all octree nodes that are hidden.
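The recursion in RenderScene and ProcessOctreeNode can be sketched in Python on a deliberately tiny model. Every simplification here is our own assumption, not part of the paper: the view is orthographic along +z, so a cube's only front face is the one at its minimum z and no frustum or near-plane test is needed; primitives are hypothetical screen-parallel rectangles (`Card`) rather than general polygons; and octree construction uses the "smallest enclosing cube" rule from the text with a made-up leaf threshold. Children are ordered by the number of axes on which an octant differs from the one containing the corner nearest the eye, which yields the four priority clusters of sizes 1, 3, 3, 1. All function names are hypothetical.

```python
import random
from collections import namedtuple
from dataclasses import dataclass, field

# Hypothetical primitive: a screen-parallel rectangle at depth z covering pixels
# x0 <= i < x1, y0 <= j < y1 (orthographic view looking along +z).
Card = namedtuple("Card", "x0 y0 x1 y1 z ident")

@dataclass
class Node:
    ox: int
    oy: int
    oz: int
    size: int
    prims: list = field(default_factory=list)
    children: list = field(default_factory=list)   # (octant index, Node) pairs

def octant_origin(ox, oy, oz, h, k):
    # bit 0 of k selects the upper x half, bit 1 upper y, bit 2 upper z
    return ox + (k & 1) * h, oy + ((k >> 1) & 1) * h, oz + ((k >> 2) & 1) * h

def fits(c, ox, oy, oz, s):
    return (ox <= c.x0 and c.x1 <= ox + s and oy <= c.y0 and c.y1 <= oy + s
            and oz <= c.z < oz + s)

def build(ox, oy, oz, size, prims, max_per_leaf=2):
    """Associate each card with the smallest enclosing octree cube."""
    node = Node(ox, oy, oz, size)
    if len(prims) <= max_per_leaf or size == 1:
        node.prims = list(prims)
        return node
    h = size // 2
    buckets = {}
    for c in prims:
        k = next((k for k in range(8) if fits(c, *octant_origin(ox, oy, oz, h, k), h)), None)
        if k is None:
            node.prims.append(c)                # straddles a dividing plane: keep here
        else:
            buckets.setdefault(k, []).append(c)
    for k, lst in buckets.items():
        node.children.append((k, build(*octant_origin(ox, oy, oz, h, k), h, lst, max_per_leaf)))
    return node

def front_to_back(node, eye):
    """Octant nearest the eye first, then by number of axes differing from it."""
    h = node.size / 2
    center = (node.ox + h, node.oy + h, node.oz + h)
    front = sum(1 << a for a in range(3) if eye[a] >= center[a])
    return sorted(node.children, key=lambda kc: bin(kc[0] ^ front).count("1"))

def tile(c, zbuf, image):
    """Ordinary z-buffer scan conversion of one card."""
    for j in range(c.y0, c.y1):
        for i in range(c.x0, c.x1):
            if c.z < zbuf[j][i]:
                zbuf[j][i], image[j][i] = c.z, c.ident

def render(root, W, eye=(0.0, 0.0, -1e9)):
    zbuf = [[float("inf")] * W for _ in range(W)]   # cleared to the far plane
    image = [[None] * W for _ in range(W)]           # cleared to background
    tiled = 0

    def cube_visible(n):
        # Orthographic view along +z: only the face z = oz faces the eye.
        return any(n.oz < zbuf[j][i] for j in range(n.oy, n.oy + n.size)
                   for i in range(n.ox, n.ox + n.size))

    def process(n):
        nonlocal tiled
        if not cube_visible(n):
            return
        for c in n.prims:
            tile(c, zbuf, image)
            tiled += 1
        for _, child in front_to_back(n, eye):
            process(child)

    process(root)
    return image, tiled

def zbuffer_reference(cards, W):
    zbuf = [[float("inf")] * W for _ in range(W)]
    image = [[None] * W for _ in range(W)]
    for c in cards:
        tile(c, zbuf, image)
    return image

rng = random.Random(1)
W = 16
cards = [Card(0, 0, W, W, 1.5, 0), Card(2, 3, 6, 7, 0.5, 1)]   # occluder + front card
for k in range(2, 60):
    x0, y0 = rng.randrange(12), rng.randrange(12)
    cards.append(Card(x0, y0, x0 + rng.randint(1, 4), y0 + rng.randint(1, 4), 3 + 0.2 * k, k))
root = build(0, 0, 0, W, cards)
image, tiled = render(root, W)
print(image == zbuffer_reference(cards, W), tiled, "of", len(cards), "cards tiled")
```

The image matches plain z-buffering of every card, while cards in hidden cubes behind the full-screen occluder are never tiled.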

The nested, rectilinear structure of an octree makes it easy to establish front-to-back ordering of octants. We use the method described in [Foley-et-al90], whereby the octant corresponding to the nearest corner of the cube is known to be “frontmost,” and the three octants which share a face with the frontmost octant all have the same visibility priority, just behind frontmost. Symmetrically, the octant opposite frontmost is “backmost,” and the three octants which share a face with the backmost octant all have the same visibility priority, in front of backmost and behind all the other octants. Thus, this procedure clusters the eight octants of a cube into four groups of equivalent visibility priority, as illustrated in figure 2. This ordering algorithm works regardless of the cube’s orientation and whether or not the viewpoint lies inside the cube.

Figure 2: Ordering of a cube’s octants into four clusters of equivalent visibility priority: {A}, {B, C, D}, {E, F, G}, {H}.

The basic rendering algorithm outlined above has some desirable properties. First of all, the algorithm only traverses and tiles primitives contained in octree nodes that are visible. Some of the tiled primitives may be hidden, but as illustrated in figure 3, each primitive in a visible octree cube is “nearly visible” in the following sense: there is some place we could move it where it would be visible which is no farther away than the length of the diagonal of its bounding cube. Thus, the algorithm only tiles primitives which are visible or nearly visible. In addition, the algorithm only visits visible octree nodes and their children, some of which may be hidden, so it does not waste time on irrelevant portions of the octree. Finally, the algorithm never visits an octree node more than once during the rendering of a frame. This stands in marked contrast to ray casting through an octree, where the root node is visited for every pixel rendered, and other nodes may be visited thousands of times. As a result of these properties, the algorithm is very efficient at both culling hidden geometry and traversing visible geometry.

2.1.3 Building the Octree

Recall that scene geometry must be organized into an octree prior to rendering. We can construct the octree with a simple recursive procedure. Beginning with a root cube large enough to enclose the entire model and the complete list of geometric primitives, we perform the following steps recursively. If the number of primitives is sufficiently small (at or below a fixed threshold), we associate all of the primitives with the cube and return. Otherwise, we associate with the cube any primitive that intersects any of the three axis-aligned planes that bisect the cube. We then subdivide the octree cube and call the procedure recursively with each of the eight child cubes and the list of primitives that fit entirely in that cube. When this recursive procedure finishes, each primitive is associated with the smallest enclosing octree cube in the hierarchy and each leaf node contains at most that threshold number of primitives.

One weakness of this algorithm for building octrees is that it associates some small primitives with large cubes if the primitives happen to intersect the planes that separate the cube’s children. For example, a small triangle that crosses the center of the root cube will be associated with the root cube and it will need to be rendered anytime the entire model is not hidden. To avoid this behavior, there are two basic choices. One alternative is to clip the problematic small primitives so they fit into much smaller octree cubes. This has the disadvantage of increasing the number of primitives in the model. The other alternative is to place some primitives in multiple octree cubes. We chose to implement the latter alternative. To do this, we modify the recursive construction of the octree as follows. If we find that a primitive intersects a cube’s dividing planes but is small compared to the cube, then we no longer associate the primitive with the whole cube. Instead we associate it with each of the cube’s children that the primitive intersects.¹ This subdivision procedure continues recursively until the primitive is associated with cubes of the appropriate size. The same strategy can also be used to place long, skinny primitives into multiple cubes.

¹For polygonal primitives, we test for cube-polygon intersection using the method presented in the appendix.

Primitives that are associated with more than one octree node may be encountered more than once during rendering. To avoid rendering a primitive more than once, we mark it with the frame number when it is rendered. This permits us to know whether a primitive has already been rendered in the current frame, without having to clear a flag for each primitive at the beginning of each frame.

Figure 3: Any primitive inside a visible cube is within the length of the cube’s diagonal of being visible.

2.2 The Image-Space Z-Pyramid

The basic rendering algorithm described in the preceding section spends most of its time tiling cube faces and primitives. The object-space octree allows us to cull large hidden portions of the model at the cost of tiling the faces of the visible octree cubes and their hidden children. Actually, only hidden cube faces need to be tiled completely, since encountering a visible pixel on a face establishes that a cube is visible, permitting tiling to stop. Even so, tiling of cube faces requires considerable computation because many faces are large, and since cubes are nested, faces may overlap densely on the screen. Primitives associated with visible cubes must also be tiled, and they also overlap on the screen, sometimes densely. Even if we employ fast incremental methods to accelerate scan conversion, tiling of cube faces and primitives is costly, because it requires traversing each primitive pixel by pixel, even if it is entirely or mostly hidden. Thus, there is a need to reduce the tiling requirements of the algorithm. To accomplish this, we apply hierarchical methods to accelerate culling of hidden geometry in image space.

To support hierarchical culling in image space, we maintain z-buffer depth samples in a z-pyramid, an image pyramid similar to those used in texture mapping (e.g. [Williams83]) and image processing (e.g. [Burt-Adelson83]). Frequently, the z-pyramid makes it possible to conclude very quickly that a cube face or primitive is hidden, making pixel-by-pixel scan conversion unnecessary.

Figure 4 shows a z-buffer image of a densely occluded scene and its corresponding z-pyramid. The finest level of the pyramid is an ordinary z-buffer. At all other levels, each sample is the farthest z from the observer in the corresponding 2×2 window of the next finer level. Every entry in the pyramid therefore represents the farthest sample for a square region of the screen. At the coarsest level of the pyramid there is a single value that is the farthest sample from the observer in the whole image. A z-pyramid requires only one-third more memory than a conventional z-buffer, since each coarser level holds a quarter as many samples as the level below it (1/4 + 1/16 + ... of the finest level’s samples).

Figure 4: A scene and its corresponding z-pyramid. The finest level of the pyramid is the ordinary z-buffer. At all other levels, each sample is the farthest z from the observer in the corresponding 2×2 window of the next finer level. Every entry in the pyramid therefore represents the farthest z for a square region of the screen.

Maintaining the z-pyramid is an easy matter. Every time we modify the z-buffer, we propagatethe new value through to coarser levels of the pyramid. As soon as we reach a level where theentry in the pyramid is already as far away as the new value, propagation can stop, since coarserlevels of the pyramid will not be affected.
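The construction and update rules above can be sketched in Python. This is an illustrative sketch under our own assumptions: depth grows with distance (so "farthest" means the maximum z), the z-buffer is square with a power-of-two side, and levels are stored as nested lists with `levels[0]` as the z-buffer. `build_pyramid` and `write_depth` are hypothetical names. `write_depth` realizes the early termination described above by recomputing each parent’s 2×2 maximum and stopping at the first level whose entry does not change.

```python
import random

def build_pyramid(zbuf):
    """levels[0] is the z-buffer; each coarser sample is the farthest (max) z
    of the 2x2 window beneath it."""
    levels = [[row[:] for row in zbuf]]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        n = len(fine) // 2
        levels.append([[max(fine[2 * j][2 * i], fine[2 * j][2 * i + 1],
                            fine[2 * j + 1][2 * i], fine[2 * j + 1][2 * i + 1])
                        for i in range(n)] for j in range(n)])
    return levels

def write_depth(levels, i, j, z):
    """Store z at pixel (i, j) and propagate toward the coarsest level."""
    levels[0][j][i] = z
    for lv in range(1, len(levels)):
        i, j = i // 2, j // 2
        fine = levels[lv - 1]
        far = max(fine[2 * j][2 * i], fine[2 * j][2 * i + 1],
                  fine[2 * j + 1][2 * i], fine[2 * j + 1][2 * i + 1])
        if levels[lv][j][i] == far:
            break                       # unchanged here, so coarser levels are too
        levels[lv][j][i] = far

rng = random.Random(0)
W = 8
zbuf = [[1.0] * W for _ in range(W)]    # cleared to the far plane (z = 1)
levels = build_pyramid(zbuf)
for _ in range(200):
    i, j = rng.randrange(W), rng.randrange(W)
    z = rng.random()
    if z < levels[0][j][i]:             # ordinary z-buffer depth test
        zbuf[j][i] = z
        write_depth(levels, i, j, z)
print(levels == build_pyramid(zbuf))    # True
```

The final check rebuilds the pyramid from scratch and confirms that incremental propagation with early termination kept every level consistent.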

The method we use to test the visibility of primitives with respect to the z-pyramid is illustrated schematically in figure 5. First, we find the finest-level sample of the pyramid whose corresponding image region encloses the primitive. Then, if the nearest z value of the primitive is farther away than this z-pyramid sample, we know immediately that the entire primitive is hidden. Very often, we are able to show with this single depth comparison that an entire cube, cube face, or primitive is hidden.
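The single-comparison test can be sketched as follows, again assuming a nested-list pyramid with `levels[0]` finest and larger z meaning farther. The primitive is summarized by a hypothetical pixel bounding box `(x0, y0, x1, y1)` with exclusive upper bounds and its nearest depth `z_near`; `finest_enclosing` and `hidden_by_one_test` are illustrative names. We treat equality as hidden, matching the definition of a hidden primitive (no sample closer than the z-buffer); "farther away" in the text is the strict form of the same test.

```python
def build_pyramid(zbuf):
    """levels[0] is the z-buffer; each coarser sample is the max of a 2x2 window."""
    levels = [[row[:] for row in zbuf]]
    while len(levels[-1]) > 1:
        f = levels[-1]
        n = len(f) // 2
        levels.append([[max(f[2 * j][2 * i], f[2 * j][2 * i + 1],
                            f[2 * j + 1][2 * i], f[2 * j + 1][2 * i + 1])
                        for i in range(n)] for j in range(n)])
    return levels

def finest_enclosing(levels, x0, y0, x1, y1):
    """Finest level whose single-sample window covers pixels x0..x1-1, y0..y1-1."""
    for lv in range(len(levels)):
        if (x0 >> lv) == ((x1 - 1) >> lv) and (y0 >> lv) == ((y1 - 1) >> lv):
            return lv, x0 >> lv, y0 >> lv
    raise ValueError("bounding box lies outside the screen")

def hidden_by_one_test(levels, bbox, z_near):
    """One depth comparison against the enclosing pyramid sample."""
    lv, i, j = finest_enclosing(levels, *bbox)
    return z_near >= levels[lv][j][i]

zbuf = [[0.2] * 8 for _ in range(8)]    # a near occluder everywhere...
zbuf[7][7] = 1.0                        # ...except one far pixel in the corner
levels = build_pyramid(zbuf)
print(hidden_by_one_test(levels, (0, 0, 2, 2), 0.5))   # True
print(hidden_by_one_test(levels, (3, 3, 5, 5), 0.5))   # False
```

The second call shows the weakness of a single test: the box straddles the screen center, so it is compared with the coarsest sample, which the one far pixel drives to 1.0, even though every pixel the box covers holds 0.2.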

While the basic z-pyramid test can cull a substantial fraction of hidden primitives, it suffers from a difficulty similar to that of the basic octree method. Because of the structure of the pyramid, a small primitive covering the center of the image will be compared to the value at the coarsest level of the pyramid. While the test is still accurate in this case, it is not very powerful.

A definitive visibility test can be constructed by applying the basic test recursively through the pyramid. If the basic test fails to show that a primitive is hidden, we go to the next finer level in the pyramid, where the parent pyramid region is divided into four quadrants. Here we attempt to prove that the primitive is hidden in each of the quadrants that it intersects. For each of these quadrants, we compare the closest value of the primitive within the quadrant to the value in the pyramid. If the pyramid value is closer, we know that the primitive is hidden in the quadrant. If we fail to prove that the primitive is hidden in one of the quadrants, we go to the next finer level of the pyramid for that quadrant and try again. Ultimately, we either prove that the entire primitive is hidden, or we recurse down to the finest level of the pyramid and find a visible pixel. This recursive procedure is essentially a logarithmic search for a visible pixel. Note that if we find all visible pixels on a primitive this way, we are tiling the primitive hierarchically. The polygon tiling algorithm we present in chapter 4, while not a z-buffer algorithm, performs tiling hierarchically and finds visible samples by logarithmic search.

Figure 5: The procedure for determining whether the z-pyramid hides a primitive. Step 1: Find the finest-level pyramid sample whose corresponding window of the screen encloses the primitive. Step 2: If the nearest z of the primitive is farther away than this sample, the primitive is completely hidden. Using the z-pyramid, often a single depth comparison can show that an entire cube, cube face, or primitive is hidden. If a single test fails to show that one of these geometric entities is hidden, we can apply the same test recursively in smaller windows of the screen. (Diagram labels: screen; z-pyramid; previously rendered occluder; nearest z of primitive; z-pyramid value for square window of screen.)

Note: The znear pyramid can also establish that a primitive is entirely visible within a region of the screen, in which case scan conversion can proceed without depth comparisons.

In practice, it is expensive to perform definitive visibility tests on primitives in this hierarchical manner because it is necessary to determine which quadrants intersect the primitive and to find the closest value of the primitive within each intersected quadrant. An alternative is to perform a much faster but non-definitive test, which is often able to detect hidden primitives. With this method, we construct a primitive's screen-space bounding rectangle and set its depth to the primitive's closest value. Then the recursive subdivision procedure described above can quickly establish whether there is at least one visible pixel on the bounding rectangle. If not, we know that the corresponding primitive is hidden. When this test fails to prove that a primitive is hidden, we revert to ordinary z-buffer scan conversion to establish visibility of cube faces and to tile primitives into the z-buffer. Our current implementation uses this method.
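The faster, non-definitive test can be sketched as a recursion over the z-pyramid on the bounding rectangle held at the primitive's nearest depth. This is an illustration only: it assumes levels stored finest first, larger z meaning farther away, and nonnegative pixel coordinates clipped to a power-of-two screen. `zpyr_rect_hidden` and `rect_visible` are hypothetical names.

```c
/* Finest level at which one sample's window encloses the rectangle. */
static int enclosing_level(int x0, int y0, int x1, int y1) {
    int l = 0;
    while ((x0 >> l) != (x1 >> l) || (y0 >> l) != (y1 >> l)) l++;
    return l;
}

/* Returns 1 if some pixel of rectangle [x0,x1] x [y0,y1] may be visible at
   depth zmin, searching cell (cx, cy) of level l and its descendants. */
static int rect_visible(float *const levels[], const int size[], int l, int cx, int cy,
                        int x0, int y0, int x1, int y1, float zmin) {
    int px0 = cx << l, py0 = cy << l;                  /* cell's pixel bounds */
    int px1 = px0 + (1 << l) - 1, py1 = py0 + (1 << l) - 1;
    if (px1 < x0 || px0 > x1 || py1 < y0 || py0 > y1) return 0;  /* misses rectangle */
    if (zmin >= levels[l][cy * size[l] + cx]) return 0;          /* hidden in this cell */
    if (l == 0) return 1;                                        /* found a visible pixel */
    for (int dy = 0; dy < 2; dy++)
        for (int dx = 0; dx < 2; dx++)
            if (rect_visible(levels, size, l - 1, 2 * cx + dx, 2 * cy + dy,
                             x0, y0, x1, y1, zmin))
                return 1;
    return 0;
}

/* 1 means the primitive is certainly hidden; 0 means "possibly visible",
   in which case the caller falls back to ordinary scan conversion. */
int zpyr_rect_hidden(float *const levels[], const int size[], int nlevels,
                     int x0, int y0, int x1, int y1, float zmin) {
    int l = enclosing_level(x0, y0, x1, y1);
    if (l >= nlevels) return 0;
    return !rect_visible(levels, size, l, x0 >> l, y0 >> l, x0, y0, x1, y1, zmin);
}
```

Starting at the enclosing sample makes the first comparison identical to the basic test; the recursion only descends where that comparison fails, which is what lets it cull a rectangle straddling a window boundary.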

One interesting variation of this hierarchical culling procedure is to maintain a “znear pyramid” of depth samples in addition to the “zfar pyramid” described above. Like the zfar pyramid, the finest level of the znear pyramid is the ordinary z-buffer, which both pyramids share. At all other levels, each sample in the znear pyramid is the nearest sample from the observer in the corresponding 2 × 2 window of the next finer level. Just as the zfar pyramid can establish that a primitive is hidden in a region of the screen, the znear pyramid can establish that a primitive is at least partially visible in a region of the screen. In particular, scan conversion within a region can proceed without depth comparisons if the primitive's farthest value within the region is nearer than the corresponding sample in the znear pyramid. Thus, visible primitives can usually be identified without subdividing down to the pixel level, which is always necessary if we only consult the zfar pyramid. We implemented dual z-pyramids and found that the overhead of maintaining the second pyramid cancelled out the time saved by improved culling efficiency, so overall performance remained about the same. However, this was not a careful implementation, so no definitive conclusion should be drawn.

It is worth mentioning one other variation of the hierarchical culling procedure. If, during pixel-by-pixel scan conversion of front cube faces, a z-buffer depth comparison shows that a face is hidden at a particular pixel, we know that we are completely done with that pixel, because front-to-back traversal of octree cubes guarantees that any primitives subsequently traversed will be farther away at that pixel. We can keep track of these “completed” pixels (e.g., by assigning them a special value in the z-pyramid), and this status information can be propagated to coarser cells in the pyramid, since a pyramid cell is “complete” if all of its child cells are complete. This can accelerate culling because it often eliminates the need to perform depth comparisons. For example, if the z-pyramid sample for the screen region that encloses an entire primitive is complete, we know immediately that the primitive is hidden, and there is no need even to establish the primitive's depth. We have not implemented this variation of hierarchical culling, so we cannot report whether it improves the algorithm's overall performance.

2.3 Exploiting Temporal Coherence

We turn now to a method for exploiting frame-to-frame coherence during generation of animation sequences that can be applied if a z-buffer hardware accelerator is available. Note that the basic hierarchical visibility algorithm cannot make use of ordinary z-buffer accelerators because they do not maintain a z-pyramid and cannot perform visibility tests on cube faces.

When we render an image of a densely occluded scene with the hierarchical visibility algorithm, typically only a small fraction of the cubes in the octree are visible. When we render the next frame, most of the cubes that were visible in the last frame will probably still be visible. Some of the cubes visible in the last frame will become hidden and some cubes hidden in the last frame will become visible, but the frame-to-frame coherence that is typical of most animation ensures that there will be relatively few changes in cube visibility for most frames, except for scene changes and camera cuts. When a z-buffer hardware accelerator is available, we can exploit this fact in a simple way with the hierarchical visibility algorithm.

We associate a frame number with each octree cube, indicating the last frame in which it was known to be visible. Frame numbers can be initialized to any number that is not a legal frame number. We render the first frame of an animation sequence with the usual algorithm, marking visible cubes with the number of the first frame. On all subsequent frames, we use the following two-pass algorithm to render a frame. The first pass begins by initializing the image and z-buffers. Then, we traverse all cubes in the octree that were visible in the preceding frame and render all of their primitives into a conventional z-buffer using the hardware accelerator. This procedure is very fast because we are simply traversing lists of primitives maintained in the octree and rendering them with the graphics accelerator. The final step of the first pass is to build a z-pyramid from the resulting z-buffer. At this point, the only thing missing from the image is geometry contained in cubes that have come into view since the last frame, and typically this is only a small fraction of all visible geometry. Likewise, the depth image stored in the z-pyramid is usually almost complete.
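The frame-number bookkeeping of the two-pass procedure (the first pass just described and the second pass described in the following paragraph) can be sketched in C. To keep the example self-contained, the frustum and z-pyramid tests are replaced by a precomputed `occluded` flag and rendering by counters; the `Cube` fields, `NO_FRAME`, and the function names are hypothetical. A real renderer would clear the buffers, call its accelerator, build the z-pyramid, and scan convert at the marked points.

```c
#include <stddef.h>

#define NO_FRAME (-1)  /* any value that is not a legal frame number */

typedef struct Cube {
    int last_visible;       /* last frame in which the cube was known to be visible */
    int drawn_frame;        /* frame in which pass 1 already drew it in hardware */
    int nprims;             /* primitives stored in this node */
    int occluded;           /* stand-in for the frustum and z-pyramid tests */
    int hw_draws, sw_draws; /* stand-ins for hardware / software rendering */
    struct Cube *child[8];  /* assumed already in front-to-back order; NULL if absent */
} Cube;

/* Pass 1: replay the cubes visible in the preceding frame with the accelerator. */
static void pass1(Cube **prev, int nprev, int frame) {
    for (int i = 0; i < nprev; i++) {
        if (prev[i]->nprims > 0) prev[i]->hw_draws++;
        prev[i]->drawn_frame = frame;
    }
    /* ...then build the z-pyramid from the resulting hardware z-buffer. */
}

/* Pass 2: front-to-back traversal with culling; only geometry not already
   drawn in pass 1 is scan converted in software. Visible cubes are marked
   with the current frame and collected in vis for the next frame. */
static void pass2(Cube *c, int frame, Cube **vis, int *nvis) {
    if (c == NULL || c->occluded) return;
    c->last_visible = frame;
    vis[(*nvis)++] = c;
    if (c->nprims > 0 && c->drawn_frame != frame) c->sw_draws++;
    for (int i = 0; i < 8; i++) pass2(c->child[i], frame, vis, nvis);
}

/* Renders one frame; returns the number of cubes marked visible. */
int render_frame(Cube *root, Cube **prev, int nprev, int frame, Cube **vis) {
    int nvis = 0;
    pass1(prev, nprev, frame);
    pass2(root, frame, vis, &nvis);
    return nvis;
}
```

For the first frame `nprev` is 0, so pass 1 does nothing and pass 2 reduces to the usual algorithm while marking visible cubes.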

We render the “missing” geometry in the second pass, which is very similar to the original recursive subdivision procedure for rendering a frame. We recursively subdivide the octree and traverse cubes in front-to-back order, testing octree cubes for visibility and culling hidden cubes; if the primitives associated with a visible cube have not already been rendered by the graphics accelerator, we render them into the z-pyramid with software scan conversion. Visible cubes are marked with the current frame number. When traversal of the octree finishes, we have a standard z-buffer image of the scene and all visible cubes have been marked with the current frame number. Typically, the second pass runs very rapidly because only a small amount of missing geometry needs to be rendered. Moreover, since the z-pyramid is usually almost complete, hierarchical image-space culling of hidden cubes often requires less subdivision than it normally would, resulting in better performance. When there is a high degree of frame-to-frame coherence, this algorithm for exploiting temporal coherence is much faster than the original algorithm because it renders nearly all primitives with the graphics accelerator rather than with software scan conversion, and because it permits more efficient hierarchical image-space culling with the z-pyramid.

One way of thinking about how this temporal-coherence procedure accelerates culling is that we begin by guessing the final solution. If our guess is very close to the actual solution, the hierarchical visibility algorithm can use the z-pyramid to verify the portions of the guess that are correct faster than it can construct them from scratch. Only the portions of the image that it cannot verify as being correct require further processing.

Building and Maintaining the Octree

Since the hierarchical visibility algorithm requires that scene primitives be organized into an octree, the cost of building and maintaining octrees is an important practical consideration. In this section we show that the cost of building an octree from $n$ unorganized primitives is normally proportional to $n \log(n)$. By comparison, the cost of tiling a fixed-resolution image of a scene with naive z-buffering is linear in the number of primitives, since each primitive can be tiled in constant time. This disparity in complexity shows that an unorganized list of primitives is not a good representation for very complex models, and underscores the need to understand the asymptotic performance of the algorithms applied.

We begin our discussion of this topic by considering the cost of building an octree from an unorganized list of $n$ primitives using either variation of the algorithm presented in section 2.1. To estimate asymptotic cost, it is necessary to make some assumptions. First, we assume that the average depth of a leaf node in the octree is $O(\log(n))$ and that, on average, the depth of insertion of a primitive is also $O(\log(n))$. We also assume that each primitive is inserted into no more than some constant number of nodes at each level. Given these assumptions, the cost of building the octree is $O(n \log(n))$, since each of the $n$ primitives must be inserted into, on average, $O(\log(n))$ levels. Our assumption that the average depth of a leaf node is $O(\log(n))$ is valid unless the octree is very poorly balanced, which is unlikely to occur unless the underlying geometry is distributed in a very unusual way. Average depth is lowest for a perfectly balanced octree. In this case, assuming that all primitives are associated with leaf nodes and each leaf node has $k$ primitives, the tree's depth $d$ satisfies $\log_8(n/k) \le d \le \log_8(n/k) + 1$. Although average depth is higher for octrees that are not so well balanced, as Aho et al. show for binary trees [Aho-et-al83], even octrees with “random” branching patterns have $O(\log(n))$ average depth. In the worst case, the octree's branching structure is “linear,” average depth is $O(n)$, and the cost of building the octree is $O(n^2)$.
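The estimate above can be condensed into a worked form, with $n$ primitives, $k$ primitives per leaf in the balanced case, and a constant number of nodes touched per level:

$$
\text{cost} = \underbrace{n}_{\text{primitives}} \times \underbrace{O(\log(n))}_{\text{levels per insertion}} \times \underbrace{O(1)}_{\text{nodes per level}} = O(n \log(n))
$$

$$
8^{d} \approx \frac{n}{k} \;\Rightarrow\; d \approx \log_8\!\left(\frac{n}{k}\right) \quad \text{(balanced)}, \qquad n \times O(n) = O(n^2) \quad \text{(linear branching)}
$$

For example, with $n = 8^6 = 262{,}144$ primitives and $k = 8^2 = 64$ primitives per leaf, a balanced octree has depth about $\log_8(8^4) = 4$.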

When the scene model is static, the cost of building the octree is usually not an issue, because it may be considered a precomputation expense that can be amortized over all of the frames. For most interactive applications, the primary concern is maintaining rapid frame updates, and it may be acceptable to pay a high precomputation cost. The same considerations apply if the scene is predominantly static, with only a small number of moving objects. In this case, we can precompute an octree for the static components, render this part of the scene with the usual recursive subdivision procedure, and then complete the frame by tiling the moving objects into the z-buffer. The ability to render objects in any order is one of the advantages of z-buffering.

The cost of building and maintaining an octree is more problematic when animating dynamic scenes having numerous moving primitives. Although the static component of the scene, if any, can be handled as described above with a static octree, this leaves numerous dynamic primitives that must be organized each frame. One strategy is to build a dynamic octree from the dynamic primitives at the beginning of each frame and, assuming that it is registered with the static octree, traverse the static and dynamic primitives simultaneously during the recursive rendering procedure. However, building the dynamic octree from $n$ unorganized primitives normally requires $O(n \log(n))$ time, and we would like to reduce this cost if possible. Fortunately, octree construction can often be accelerated by exploiting frame-to-frame coherence or by using lazy evaluation.

To exploit coherence in building the dynamic octree, we note that due to frame-to-frame coherence in the motion of primitives, it is usually faster to update the dynamic octree for the preceding frame than to build a new octree from scratch. Typically, a dynamic primitive moves only a short distance from one frame to the next, and in this case the primitive usually remains in the same octree node or moves to a nearby octree node. On average, such repositioning would be expected to require much less work than insertion into the $O(\log(n))$ levels of a new octree. While this approach can save considerable computation, the cost of updating an octree is at least linear in the number of dynamic primitives, since each primitive must be individually processed.

Now let's consider lazy evaluation as a strategy for reducing the cost of building an octree from unorganized primitives. With lazy evaluation, instead of building the complete octree and then traversing its visible nodes, we organize primitives within octree nodes as they are encountered during the recursive rendering procedure. We begin by associating all scene primitives with a root node having no children. During recursive subdivision, we determine that an octree node is visible before organizing its primitives into octants. This method avoids organizing primitives in hidden octree nodes, so in densely occluded scenes it can avoid most of the work required to build the fully subdivided octree. However, even with lazy evaluation the work required to build the octree is at least linear with respect to the number of primitives, since each primitive must be considered when the root node is processed, except in the trivial case that the root cube is hidden.
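A minimal sketch of lazy evaluation, assuming primitives are represented by axis-aligned bounding boxes and a node is split into its eight octants only the first time traversal finds it visible. A primitive is copied into every octant its box overlaps, in line with the assumption above that each primitive lands in a bounded number of nodes per level. `Box`, `Node`, `expand`, and the leaf-size threshold `min_prims` are hypothetical.

```c
#include <stdlib.h>

typedef struct { float lo[3], hi[3]; } Box;

typedef struct Node {
    Box bounds;
    int *prims;             /* indices into the scene's primitive boxes */
    int nprims;
    int expanded;           /* 0 until the node is first found visible */
    struct Node *child[8];
} Node;

static int overlaps(const Box *a, const Box *b) {
    for (int i = 0; i < 3; i++)
        if (a->hi[i] < b->lo[i] || a->lo[i] > b->hi[i]) return 0;
    return 1;
}

/* Organize a node's primitives into octants. Intended to be called only
   after the node has passed its visibility test during traversal. */
void expand(Node *n, const Box *scene, int min_prims) {
    if (n->expanded) return;
    n->expanded = 1;
    if (n->nprims <= min_prims) return;      /* few primitives: leave as a leaf */
    float mid[3];
    for (int i = 0; i < 3; i++) mid[i] = 0.5f * (n->bounds.lo[i] + n->bounds.hi[i]);
    for (int o = 0; o < 8; o++) {             /* bit i of o picks the upper half on axis i */
        Node *c = calloc(1, sizeof *c);
        for (int i = 0; i < 3; i++) {
            int upper = (o >> i) & 1;
            c->bounds.lo[i] = upper ? mid[i] : n->bounds.lo[i];
            c->bounds.hi[i] = upper ? n->bounds.hi[i] : mid[i];
        }
        c->prims = malloc((size_t)n->nprims * sizeof *c->prims);
        for (int p = 0; p < n->nprims; p++)   /* a primitive may land in several octants */
            if (overlaps(&scene[n->prims[p]], &c->bounds))
                c->prims[c->nprims++] = n->prims[p];
        n->child[o] = c;
    }
    free(n->prims);                           /* primitives now live in the children */
    n->prims = NULL;
    n->nprims = 0;
}
```

During front-to-back traversal, a renderer would call `expand` on a node only after it passes the visibility test and then recurse into its children; hidden nodes are never organized.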

Thus far, the discussion of building and updating octrees has presumed that scenes are represented as an unorganized list of primitives. Consequently, the methods presented need to consider every dynamic primitive at every frame, and their performance is, at best, linear in the number of dynamic primitives. Fortunately, it is usually possible to obtain better performance, but this generally requires some higher-level structure in the scene model, some form of coherence that can be exploited. For example, some primitives could be organized in bounding volumes, the model could include instances of replicated geometry, there could be limits on the range of motion of primitives or instances, and so forth. Let's consider each of these circumstances in turn, bearing in mind that the proposed methods may require modifying the basic rendering algorithm.

If the model includes bounding volumes, it's not necessary to insert primitives individually into the octree. Rather, the cluster of primitives bounded by a volume can be inserted all at once into the octree node that encloses the volume. If the model includes instances of a repeating module, the module can be organized into bounding volumes, and transformed instances of the module can then be organized within the octree by transforming and inserting their component bounding volumes. To exploit limited range of motion, if we know that an object moves at most some fixed distance each frame, we can construct a bounding volume that is guaranteed to enclose the object over a sequence of frames. It follows that the octree entry for this object only needs to be updated once per such sequence of frames. In the case that the range of motion of a cluster of objects is limited to a particular bounding volume over all frames, the objects can be inserted into the static octree and never updated. This would be appropriate, for example, for an anchored, articulated object such as a robot arm.

In principle, it should be possible to perform space-time culling of hidden dynamic geometry by combining some of the methods outlined above. The central idea is to cull geometry within regions of space that are hidden over intervals of time. For example, if the motion of a cluster of primitives is predetermined, then for any sequence of frames we can construct a bounding volume for the primitives. Then, when processing any frame in the sequence, we can safely cull all primitives in the cluster if that volume is hidden (or if an octree node enclosing the volume is hidden). This approach can be applied hierarchically in both space and time. If the bounding volume is visible in a given frame, we can subdivide in either space or time as appropriate to construct a smaller bounding volume and then proceed as before. While the details remain to be worked out, this appears to be a promising approach to culling dynamic geometry.

In conclusion, a variety of strategies for exploiting coherence can accelerate building and maintaining octrees for dynamic scenes, and avoid the expense of considering every dynamic primitive at every frame of animation. The effectiveness of particular methods depends a great deal on geometric characteristics of the model and how the model is represented, so it is difficult to draw general conclusions. But regardless of the specific circumstances, it is often possible to find methods that effectively exploit coherence in order to avoid wasting time on geometry within hidden regions of space, whether or not that geometry is moving.

Implementation and Results

Our implementation of the hierarchical visibility algorithm uses the object-space octree, the image-space z-pyramid, and optionally, the temporal-coherence procedure. The software is written in C. To test the algorithm, we constructed a modular polygonal model of an office interior. The model is organized in cubic modules constructed from transformed instances of office cubicles and stairwells, each module consisting of approximately 15,000 polygons. To construct complex office interiors, instances of the module are replicated in a three-dimensional grid. The repeating module was designed to create environments having very complex occlusion relationships in which it is possible to see deep into the scene from most vantage points. This was accomplished by making support columns thin, limiting the height of cubicle walls, and including large open stairwells that make it possible to see parts of neighboring floors.

Given our simple replication scheme, it was possible to represent the model as an “octree of octrees” where the super-octree references translated instances of a conventional octree for the repeating module. Thus, it was only necessary to build one conventional octree from the 15,000 polygons in the repeating module. To build this octree, we used the method described in section 2.1.3 that avoids placing small polygons in large octree cubes. Assignment of polygons to octree cubes obeyed the following recursively applied rule: if the area of a polygon was smaller than one-tenth the area of one of a cube's faces, then the polygon was associated with the cube's children that it intersected. After building the complete octree, if there were fewer than a total of 20 polygons in any octree node and its children, we eliminated that node, associating its polygons with the parent node. In the final octree, between 20 and 66 polygons were associated with each leaf node, the average being 31.3, and each polygon was associated with an average of 1.9 octree nodes.
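The assignment rule and the pruning step can be sketched in C as follows. This is an illustration, not the original code: the tree tracks only polygon counts, it ignores the fact that a polygon may be associated with several nodes, and `Node`, `pass_to_children`, and `prune` are hypothetical names. The constants 0.1 and 20 are the values quoted above.

```c
#include <stddef.h>

#define AREA_FRACTION 0.1  /* "one-tenth the area of one of a cube's faces" */
#define MIN_POLYS 20       /* subtrees holding fewer polygons are merged upward */

typedef struct Node {
    int npolys;              /* polygons associated directly with this node */
    struct Node *child[8];   /* NULL where a child is absent */
} Node;

/* Assignment rule: a polygon smaller than one-tenth of a face of this cube
   is passed down to the children it intersects; otherwise it stays here. */
int pass_to_children(double poly_area, double cube_edge) {
    return poly_area < AREA_FRACTION * cube_edge * cube_edge;
}

/* Bottom-up pruning: any child whose subtree holds fewer than MIN_POLYS
   polygons in total is eliminated and its polygons reassigned to this node.
   Only counts are tracked; a real octree would move polygon lists (and free
   the detached nodes, omitted here). Returns the subtree's polygon total. */
int prune(Node *n) {
    int total = n->npolys;
    for (int i = 0; i < 8; i++) {
        Node *c = n->child[i];
        if (c == NULL) continue;
        int sub = prune(c);
        if (sub < MIN_POLYS) {
            n->npolys += sub;
            n->child[i] = NULL;
        }
        total += sub;
    }
    return total;
}
```

Because pruning runs bottom-up, every surviving leaf ends up with at least 20 polygons, matching the lower bound reported for the final octree.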

4.1 A Simple Scene

For simple models with low depth complexity, the hierarchical visibility method can be expected to take somewhat longer than traditional scan conversion due to the overhead of performing visibility tests on octree cubes and the cost of maintaining a z-pyramid. To measure the algorithm's overhead on simple models, we rendered a single module consisting of approximately 15,000 polygons at a viewpoint from which a high proportion of the model was visible. On a 50 MHz R4000 SGI Crimson, rendering time for a 512 × 512 image was 1.52 seconds with the hierarchical visibility method and 1.30 seconds with traditional scan conversion, indicating a performance penalty of 17%. When we rendered three instances of the module (45,000 polygons), the running time was 3.05 seconds for both methods, indicating that this level of complexity was the break-even point for this particular model.

Note: It should be noted that our unoptimized scan conversion code only renders about 5,000 polygons a second. A careful implementation would probably run several times faster.

4.2 A Complex Scene

The chief value of the hierarchical visibility algorithm is, of course, for scenes of much higher depth complexity. To illustrate the point, we constructed a 33 × 33 × 33 replication of the office-interior module, producing a model having approximately 538 million polygons. Figure 6 shows two consecutive frames of animation of the office environment. The frame on the right was rendered with the temporal-coherence procedure (shading anomalies are explained below). In figure 7, we show a top view of the viewing frustum and the octree subdivision. 59.7 million polygons lie inside the viewing frustum, about one-tenth of the entire model. The cubes shown in wireframe in figure 7 are the octree nodes visited during rendering, that is, the visible octree nodes and their hidden children. Note that the algorithm is able to prove that many large octree nodes in the background are hidden. The z-pyramid for the scene is shown in the left panel of figure 4. Even at fairly coarse resolutions, the z-pyramid contains a recognizable representation of the major occluders in the scene.

During recursive subdivision of the octree, our implementation of the hierarchical visibility algorithm culls hidden cubes using the fast z-pyramid test described in section 2.2, which determines whether a cube face's screen-space bounding rectangle is visible at the depth of its nearest vertex. If this test fails to show that a cube's faces are all hidden, we assume that the cube is visible without performing a definitive visibility test by tiling the cube's front faces. We found that performing definitive tests took more time than they saved. For the scene of figure 6 (left panel), the z-pyramid test was invoked on 5910 octree cubes and succeeded in culling 3450 of them. The remaining 2460 potentially visible cubes contained approximately 47,900 polygons. It follows that the z-pyramid test on octree cubes culled .9992 of all polygons inside the viewing frustum, leaving only .0008 of the polygons to be rendered. Of the 47,900 polygons in potentially visible cubes, approximately 19,300 were front facing. Each front-facing polygon was tested for visibility with the fast z-pyramid test, and if the test failed to prove that the polygon was hidden, the polygon was tiled pixel by pixel into the z-pyramid using software scan conversion.
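The culling fractions quoted above follow directly from the reported counts:

$$
5910 - 3450 = 2460, \qquad \frac{47{,}900}{59.7 \times 10^{6}} \approx 8.0 \times 10^{-4}, \qquad 1 - 8.0 \times 10^{-4} \approx 0.9992, \qquad \frac{19{,}300}{47{,}900} \approx 0.40
$$

so roughly 40% of the polygons in potentially visible cubes were front facing and went on to the per-polygon z-pyramid test.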

Rendering this frame at 512 × 512 resolution using only software scan conversion took 4.88 seconds on a 50 MHz R4000 SGI Crimson. Of the 4.88-second frame time, 15% was consumed by z-pyramid tests on octree cubes and polygons (.73 seconds), 82% was consumed by software scan conversion of the polygons inside potentially visible octree cubes (4.02 seconds), which left 3% (.13 seconds) for all other operations: clearing the image buffer and z-pyramid, testing octree cubes for intersection with the viewing frustum, etc. Note that the hierarchical visibility algorithm spent the vast proportion of its time (82%) tiling visible or nearly visible polygons, indicating that the efficiency of traversing the octree and culling hidden octree cubes and polygons with the z-pyramid was very high.

Figure 6: Two consecutive frames from animation of an office environment rendered with the hierarchical visibility algorithm. The model is organized as an octree of octrees and contains 538 million replicated polygons. The frame on the left was rendered at 512 × 512 resolution with all-software scan conversion in 4.88 seconds on a 50 MHz R4000 SGI Crimson. The frame on the right was rendered with the temporal-coherence procedure, which begins by using the Crimson's hardware accelerator to render all primitives inside octree cubes that were visible in the preceding frame. The polygons rendered with hardware acceleration are Gouraud shaded. The remaining polygons, shown in magenta to distinguish them, were rendered in a second pass using software scan conversion. Using this temporal-coherence procedure, this 512 × 512 frame took 2.02 seconds to render on the Crimson, which has a VGX hardware accelerator. The high degree of frame-to-frame coherence of visible octree cubes that is apparent in this example is typical of walk-through animation of this environment.

To compare the performance of the hierarchical visibility algorithm with naive z-buffering, we produced the identical z-buffer image by performing software scan conversion of all 59.7 million polygons lying inside the viewing frustum. This process took 33.6 minutes on our 50 MHz Crimson, 413 times longer than the hierarchical visibility algorithm.

4.3 Depth Complexity of Tiling Operations

As the preceding example shows, the hierarchical visibility algorithm spends nearly all of its time performing tiling: hierarchically tiling cube faces to determine whether cubes are visible, and tiling the model primitives inside visible or nearly visible cubes. Consequently, we can visualize where the algorithm is spending its time by constructing images that indicate the number of times that each pixel is traversed during tiling operations. We will say that such images depict the “depth complexity” of tiling operations. Constructing depth-complexity images for conventional z-buffer scan conversion is very straightforward: we simply count the number of times each pixel is traversed in the course of tiling and encode these numbers in an image. We can also depict how much work is done during hierarchical culling with the z-pyramid by keeping track of the number of times each z-pyramid sample is accessed and amortizing each access over the corresponding square region of the screen. Specifically, when a pyramid sample representing an $n \times n$ block of pixels is accessed, we add $1/n^2$ to the value for depth complexity at each pixel in that region of the screen.
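The accounting can be sketched in C, assuming that amortizing one z-pyramid access over its square region means spreading one unit of work evenly across the region's pixels, so a sample covering an n × n block adds 1/n² to each. `DepthComplexity` and the `dc_` functions are hypothetical names; the screen is assumed square.

```c
#include <stdlib.h>

/* Per-pixel tally of tiling work on a side x side screen. */
typedef struct {
    int side;
    double *count;
} DepthComplexity;

DepthComplexity dc_create(int side) {
    DepthComplexity d = { side, calloc((size_t)side * side, sizeof(double)) };
    return d;
}

/* Ordinary scan conversion touches one pixel: add 1 there. */
void dc_pixel(DepthComplexity *d, int x, int y) {
    d->count[y * d->side + x] += 1.0;
}

/* A z-pyramid sample at level l covers an n x n block (n = 2^l) whose
   top-left pixel is (cx * n, cy * n); one access adds 1/n^2 per pixel. */
void dc_pyramid_access(DepthComplexity *d, int l, int cx, int cy) {
    int n = 1 << l;
    double share = 1.0 / ((double)n * n);
    for (int y = cy * n; y < (cy + 1) * n; y++)
        for (int x = cx * n; x < (cx + 1) * n; x++)
            d->count[y * d->side + x] += share;
}

/* Average depth complexity over the screen. */
double dc_average(const DepthComplexity *d) {
    double sum = 0.0;
    for (int i = 0; i < d->side * d->side; i++) sum += d->count[i];
    return sum / ((double)d->side * d->side);
}
```

With this convention one access to the coarsest sample of a 512 × 512 pyramid contributes only 1/512² to the screen-wide average, which is why thousands of z-pyramid tests can sum to an average below 1.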

Figure 8 shows depth-complexity images of the complete scene model and various tiling operations performed by the hierarchical visibility algorithm. These are log-scale images in which intensity encodes the log of the number of times each pixel is traversed. The upper-left panel is a z-buffer image of the scene, also shown in figure 6. The upper-right panel shows the depth complexity of the geometry in the entire scene, which is 83.7, meaning that on average, 83.7 polygons overlap at each pixel. In other words, if we were to render the scene by casting a single ray through each pixel center, the average number of polygons intersected by each ray would be 83.7. As previously mentioned, we obtained this image by z-buffering the 59.7 million polygons that lie inside the viewing frustum, which took 33.6 minutes.

Figure 7: Top view of viewing frustum and octree cubes visited while rendering the office model of figure 6.

The bottom tier of panels in figure 8 shows the depth complexity of tiling operations performed by the hierarchical visibility algorithm in the course of producing the same z-buffer image. The bottom-left panel shows the depth complexity of z-pyramid tests on cube faces and polygons (.45), encoded as described above. As previously mentioned, our implementation uses the fast z-pyramid test described in section 2.2 to cull hidden cubes and polygons, and the low average depth complexity of .45 for these operations (.20 for cubes, .25 for polygons) is an indication of how efficiently the z-pyramid performs hierarchical culling. The bottom-center panel shows the depth complexity of scan converting the polygons inside potentially visible cubes (2.51). The bottom-right panel shows the sum of these two images, and therefore the total depth complexity of all tiling operations performed by the hierarchical visibility algorithm in rendering this scene, which is 2.96. The depth complexity of naive z-buffering, 83.7, is higher by a factor of 28.3.

We also created depth-complexity images for running the hierarchical visibility algorithm without the z-pyramid in order to estimate the value of hierarchical culling in image space. The results are shown in the middle tier of panels in figure 8. When a z-pyramid is not available, visibility of cube faces and polygons must be determined by conventional pixel-by-pixel scan conversion. The middle-left panel shows the depth complexity of tiling cube faces (25.6), the middle-center panel shows the depth complexity of tiling the polygons that are inside visible cubes (3.6), and the middle-right panel shows the sum of these two images. Thus, without the hierarchical image-space culling that is enabled by a z-pyramid, the depth complexity of all tiling operations for this scene is 29.2, nearly ten times higher than when we use a z-pyramid.

Summing up, we have compared the depth complexity of three different tiling methods for producing a standard z-buffer image of a densely occluded scene having complex occlusion relationships. The depth complexities of tiling operations for hierarchical visibility, hierarchical visibility without hierarchical image-space culling, and naive z-buffering are 2.96, 29.2, and 83.7, respectively.

Figure 8: Log-scale depth-complexity images that help to visualize where various tiling procedures are spending their time. Top row: reference image; naive z-buffering (83.7). Middle row, hierarchical visibility algorithm without z-pyramid: tiling of cube faces (25.6), tiling of polygons (3.6), total (29.2). Bottom row, standard hierarchical visibility algorithm: z-pyramid tests (.45), tiling of polygons (2.51), total (2.96). The right column of panels shows the total depth complexity of tiling for naive z-buffering (83.7), the hierarchical visibility algorithm without z-pyramid culling (29.2), and the standard hierarchical visibility algorithm (2.96).

The office environment of figure 6 was chosen in part because of the difficulties it presents to potentially visible set methods [Airey90, Teller-Sequin91, Teller92]. Recall that these methods partition model space into disjoint regions and establish the “potentially visible set” of primitives that are visible from each region. Then, only primitives in the potentially visible set for the region that contains the current viewpoint need to be rendered for a given frame. While potentially visible set methods often work well for architectural models, they would not work effectively with our model because within every office cubicle there are viewpoints from which almost every other cubicle on the same floor is visible. As a result, if the cubicles were used as cells, the potentially visible set for each cell would have to include nearly all the cells on its floor and many on other floors. Since each floor contains about four million polygons, potentially visible set methods would probably have to render many more polygons than the hierarchical visibility method.


4.4 An Outdoor Scene

Figure 9: Terrain models rendered with the hierarchical visibility algorithm.

Figure 9 shows the hierarchical visibility method applied to outdoor scenes consisting of a terrain mesh with vegetation replicated on a two-dimensional grid. The model shown in the left panel consists of approximately 53 million polygons, but only about 25,000 polygons are visible from this point of view. Most of the model is hidden by the hill or is outside the viewing frustum. This frame took approximately 7 seconds to render with software scan conversion on our 50 MHz R4000 SGI Crimson. On the right, we show a model consisting of approximately 5 million polygons. Even though this model has fewer primitives, the image took longer to render because a much larger fraction of the model is visible from this point of view. This image took approximately 40 seconds to render with software scan conversion on the Crimson. These outdoor scenes have very different characteristics from the building interiors shown in figure 6 and are poorly suited to potentially visible set methods, because cell-to-cell visibility is not nearly as limited as in an architectural interior. Nonetheless, the hierarchical visibility algorithm continues to work effectively.

4.5 Parallel Performance

We have made our hierarchical visibility implementation capable of dividing the image into a grid of smaller windows, rendering them individually, and combining them into a final image. The performance of the algorithm as the window size is varied tells us about the parallel performance of the algorithm and the extent to which it is able to exploit image-space coherence. If, like most ray tracers, the algorithm made no use of image-space coherence, we could render each pixel separately at no extra cost, and the algorithm would be fully parallelizable. At the other extreme, if the algorithm made the best possible use of image-space coherence, it would render a sizeable region of pixels with only slightly more computation than required to render a single pixel, and it would then be difficult to parallelize. Note that if we shrink the window size down to a single pixel, the hierarchical visibility algorithm resembles a ray caster using an octree subdivision.
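The window-grid experiment described above can be pictured with a minimal driver. This is a sketch under stated assumptions, not the implementation used for these measurements. `render_window` is a hypothetical callback that stands in for running the hierarchical visibility algorithm restricted to one window. Border windows are clipped to the image.

```python
from typing import Callable, List

Image = List[List[float]]

def render_in_windows(width: int, height: int, win: int,
                      render_window: Callable[[int, int, int, int], Image]) -> Image:
    """Render a width x height image as a grid of win x win windows.

    render_window(x0, y0, w, h) is a hypothetical per-window renderer that
    returns h rows of w pixel values. The calls are independent of one
    another, so they could be dispatched to separate processors.
    """
    image: Image = [[0.0] * width for _ in range(height)]
    for y0 in range(0, height, win):
        for x0 in range(0, width, win):
            # Windows on the right and bottom borders may be narrower or shorter.
            w = min(win, width - x0)
            h = min(win, height - y0)
            block = render_window(x0, y0, w, h)
            # Combine: copy the window's pixels into the final image.
            for dy in range(h):
                image[y0 + dy][x0:x0 + w] = block[dy]
    return image
```

Setting `win = 1` makes every pixel its own window, which is the ray-caster extreme mentioned above. Larger windows let one traversal serve many pixels, at the price of fewer independent tasks.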

Figure 10 graphs the rendering time for a frame from a walk-through of the model shown in figure 6 as a function of the window size. For window sizes of 32 × 32 and larger, the curve is relatively flat, indicating that the algorithm should parallelize fairly well. For window sizes smaller than 32 × 32, however, the slope of the curve indicates that the time required to render a window is almost independent of its size. For example, it takes only about four times as long to render a 32 × 32 region as it does to ray-cast a single pixel with this algorithm.

Figure 10: Total time in seconds to render all windows of a frame versus window size, expressed as the number of pixels on the side of each window. [Plot: rendering time in seconds versus window size.]

4.6 Use of Graphics Hardware

In addition to the pure software implementation, we have attempted to modify the algorithm to make effective use of available commercial hardware graphics accelerators. This raises some difficult challenges because the hierarchical visibility algorithm makes somewhat different demands of scan-conversion hardware than traditional z-buffering. In particular, our octree culling procedure depends on being able to determine quickly whether a polygon would be visible if it were scan converted. Unfortunately, the commercial hardware graphics pipelines we have examined are either unable to answer this query at all, or can take milliseconds to answer it. One would certainly expect some delay in getting information back from a graphics pipeline, but hardware designed with this type of query in mind should be able to return a result in microseconds rather than milliseconds.

We have implemented a modified version of the hierarchical visibility algorithm on a Kubota Pacific Titan 3000 workstation with Denali GB graphics hardware. The Denali hardware supports an unusual graphics library call that determines whether or not any pixels in a set of polygons are visible given the current z-buffer. We use this feature to determine the visibility of octree cubes. The cost of a z-query depends on the screen size of the cube, and it can take up to several milliseconds to determine whether or not a cube is visible. Our implementation does not use a z-pyramid because the Denali hardware does not support one. During walk-through animation of a version of the office environment with 1.9 million polygons, the Titan took an average of .54 seconds per frame to render 512 × 512 images. Because of the cost of doing the z-query, we only tested visibility of octree cubes containing at least 800 polygons. Even so, 36.5% of the running time was taken up by z-queries. If the z-query were faster, we could use it effectively on octree cubes containing many fewer polygons and achieve substantial further acceleration. The Titan implementation has not been fully optimized for the Denali hardware and makes no use of temporal coherence, so these performance figures should be considered only suggestive of the machine's capabilities.
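Without such hardware, the same question can be answered in software: would any pixel of these polygons be visible given the current z-buffer? The sketch below is an illustration only. It is not the Denali library call, whose interface is not described here, and the function names are hypothetical. It assumes triangles already projected to screen space with per-vertex depth, where a smaller depth is nearer. It also assumes samples at pixel centers, a z-buffer stored as a list of rows, and a strict less-than depth test.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]  # (x, y, depth) in screen space; smaller depth is nearer

def _edge(a: Vertex, b: Vertex, px: float, py: float) -> float:
    # Twice the signed area of (a, b, p): tells which side of edge a->b the point is on.
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def z_query(zbuf: List[List[float]],
            triangles: Sequence[Tuple[Vertex, Vertex, Vertex]]) -> bool:
    """Return True if any sample of any triangle would pass the depth test.

    The z-buffer is only read, never written.
    """
    height, width = len(zbuf), len(zbuf[0])
    for a, b, c in triangles:
        area = _edge(a, b, c[0], c[1])
        if area == 0:
            continue  # degenerate triangle covers no samples
        # Clamp the triangle's bounding box to the screen.
        x_lo = max(0, int(min(a[0], b[0], c[0])))
        x_hi = min(width - 1, int(max(a[0], b[0], c[0])))
        y_lo = max(0, int(min(a[1], b[1], c[1])))
        y_hi = min(height - 1, int(max(a[1], b[1], c[1])))
        for y in range(y_lo, y_hi + 1):
            for x in range(x_lo, x_hi + 1):
                px, py = x + 0.5, y + 0.5
                # Barycentric weights of a, b, c (valid for either winding).
                w0 = _edge(b, c, px, py) / area
                w1 = _edge(c, a, px, py) / area
                w2 = _edge(a, b, px, py) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # sample lies outside the triangle
                depth = w0 * a[2] + w1 * b[2] + w2 * c[2]
                if depth < zbuf[y][x]:
                    return True  # one visible sample settles the query
    return False
```

The scan stops at the first visible sample, so the query is cheapest when the answer is "visible." Proving a large cube hidden means examining every sample it covers, which matches the observation that the cost of a z-query grows with the cube's screen size.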

The other implementation we have that makes use of graphics hardware runs on SGI workstations. On these workstations there is no way to inquire whether or not a polygon is visible without actually rendering it, so we use the hybrid hardware/software strategy described in §2.3. To restate that discussion briefly: we render the first frame of an animation sequence entirely with software scan conversion. Starting with the second frame, we use the hardware pipeline to render the polygons contained in octree nodes that were visible in the preceding frame. Then we read the image and the z-buffer from the hardware, build a z-pyramid, and continue with the second pass of the temporal-coherence procedure, filling in geometry that has come into view since the last frame with software scan conversion. With this implementation, temporal coherence typically reduces frame time by a factor of approximately two and one-half for animation of the office environment.

In the course of walk-through animation of the office environment, we rendered the frame in the left panel of figure 6 without the temporal-coherence procedure, and then rendered the next frame, shown in the right panel, with it. In the temporal-coherence frame, polygons rendered with the hardware pipeline are Gouraud shaded. The remaining polygons, rendered with software scan conversion, are shown in magenta to distinguish them. For the most part, these are polygons that came into view as a result of panning the camera. These magenta polygons are the only geometry visible in this frame that is not associated with an octree node that was visible in the preceding frame. This high degree of frame-to-frame coherence of visible octree nodes is typical of walking through this environment.

Current graphics accelerators are not designed to support the rapid feedback from the pipeline needed to realize the full potential of octree culling in the hierarchical visibility algorithm. Hardware designed to take full advantage of the algorithm, however, could make it possible to interact very effectively with extremely complex environments, as long as only a manageable number of the polygons are visible from any point of view. The octree subdivision, the z-pyramid, and the temporal-coherence procedure are all suitable for hardware implementation.

Footnote 4: No doubt this idea has occurred to many practitioners, but I first heard it articulated by Lance Williams in the early 1980's when we worked together at the NYIT Computer Graphics Lab. When I moved to Apple in 1989, I was reminded of the strategy by Gavin Miller, who had used this method to accelerate z-buffering.

5 Conclusion

Underlying our basic algorithm is a very simple idea: we organize scene geometry in bounding boxes, and before doing any work on the geometry inside a box, we first verify that the box itself is visible by tiling its faces. In the context of z-buffering, visibility of boxes can be established by tiling their faces with ordinary scan conversion, but as we have seen, hierarchical tiling enabled by a z-pyramid permits box-visibility tests to be performed much more efficiently. And if bounding boxes are arranged hierarchically, as they are in an octree, culling of hidden geometry can be performed hierarchically in both object space and image space. Conversely, we find visible geometry by logarithmic search in both object space and image space. This simple strategy encapsulates our basic algorithm. As previously noted, Meagher has applied a similar approach to volume rendering [Meagher82b].
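The hierarchical box test can be illustrated with a small z-pyramid. Each coarser entry stores the farthest depth of the region it covers, so one comparison can reject a whole region. This sketch simplifies the procedure described above in one important way: instead of tiling a box's faces, it tests the box's screen-space bounding rectangle against the box's nearest depth. That test is conservative. It may report a hidden box as visible, but never the reverse. The sketch also assumes a square, power-of-two z-buffer in which a smaller depth is nearer. The function names are hypothetical.

```python
from typing import List, Optional

Pyramid = List[List[List[float]]]

def build_z_pyramid(zbuf: List[List[float]]) -> Pyramid:
    """Build a z-pyramid from a square z-buffer whose side is a power of two.

    Level 0 is the z-buffer itself. Each entry of a coarser level stores the
    farthest (largest) depth of its 2x2 children, so it bounds every depth
    in the screen region it covers.
    """
    levels = [zbuf]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        half = len(fine) // 2
        levels.append([[max(fine[2 * y][2 * x], fine[2 * y][2 * x + 1],
                            fine[2 * y + 1][2 * x], fine[2 * y + 1][2 * x + 1])
                        for x in range(half)] for y in range(half)])
    return levels

def rect_maybe_visible(pyr: Pyramid, x0: int, y0: int, x1: int, y1: int,
                       znear: float, level: Optional[int] = None,
                       cx: int = 0, cy: int = 0) -> bool:
    """Conservative visibility test for the pixel rectangle [x0, x1) x [y0, y1)
    whose nearest point has depth znear, starting at the pyramid's root."""
    if level is None:
        level = len(pyr) - 1
    size = 1 << level                            # cell side in pixels
    if cx * size >= x1 or (cx + 1) * size <= x0 or cy * size >= y1 or (cy + 1) * size <= y0:
        return False                             # cell does not overlap the rectangle
    if znear >= pyr[level][cy][cx]:
        return False                             # everything drawn here is nearer
    if level == 0:
        return True                              # a pixel where the box could be seen
    # Descend only into the children that could still contain a visible pixel.
    return any(rect_maybe_visible(pyr, x0, y0, x1, y1, znear, level - 1,
                                  2 * cx + dx, 2 * cy + dy)
               for dy in (0, 1) for dx in (0, 1))
```

In an octree traversal, a cube that fails this test can be culled together with everything inside it. A cube that passes is only possibly visible, so the traversal goes on to process its contents.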

From the standpoint of coherence, the algorithm exploits object-space coherence as effectively as ray casting through a spatial subdivision, and it exploits image-space coherence even more effectively than traditional incremental scan conversion. For animation sequences it is also able to exploit frame-to-frame coherence. In fact, the hierarchical visibility algorithm appears to be the first practical visibility algorithm that materially profits from object-space, image-space, and temporal coherence simultaneously. The algorithm has been tested and shown to work effectively on complex, densely occluded indoor and outdoor scenes with up to half a billion polygons. While the algorithm can make use of existing graphics accelerators without modification, small changes in the design of graphics accelerators would dramatically improve its performance. We hope that the appeal of this algorithm will induce hardware designers to adapt future graphics hardware to facilitate hierarchical visibility computations.

References

[Aho-et-al83] A. Aho, J. Hopcroft, and J. Ullman, Data Structures and Algorithms, Addison-Wesley, Reading, MA, 1983.

[Airey90] J. M. Airey, "Increasing Update Rates in the Building Walkthrough System with Automatic Model-Space Subdivision and Potentially Visible Set Calculations," PhD Thesis, Technical Report TR90-027, Computer Science Dept., U.N.C. Chapel Hill, 1990.

[Burt-Adelson83] P. J. Burt and E. H. Adelson, "A Multiresolution Spline with Applications to Image Mosaics," ACM Transactions on Graphics, 2(4), Oct. 1983, 217–236.

[Foley-et-al90] J. Foley, A. van Dam, S. Feiner, and J. Hughes, Computer Graphics, Principles and Practice, 2nd edition, Addison-Wesley, Reading, MA, 1990.

[Glassner84] A. S. Glassner, "Space Subdivision for Fast Ray Tracing," IEEE Computer Graphics and Applications, 4(10), Oct. 1984, 15–22.

[Greene-Kass-Miller93] N. Greene, M. Kass, and G. Miller, "Hierarchical Z-Buffer Visibility," Proceedings of SIGGRAPH '93, July 1993, 231–238.

[Greene94] N. Greene, "Detecting Intersection of a Rectangular Solid and a Convex Polyhedron," Graphics Gems IV, Ed: P. Heckbert, 1994, 71–79.

[Jevans-Wyvill89] D. Jevans and B. Wyvill, "Adaptive Voxel Subdivision for Ray Tracing," Proceedings of Graphics Interface '89, June 1989, 164–172.

[Kaplan87] M. R. Kaplan, "The Use of Spatial Coherence in Ray Tracing," in Techniques for Computer Graphics, Ed: D. F. Rogers and R. A. Earnshaw, Springer-Verlag, New York, 1987, 173–193.

[Kay-Kajiya86] T. Kay and J. Kajiya, "Ray Tracing Complex Scenes," Proceedings of SIGGRAPH '86, Aug. 1986, 269–278.

[Meagher82b] D. Meagher, "The Octree Encoding Method for Efficient Solid Modeling," PhD Thesis, Electrical Engineering Dept., Rensselaer Polytechnic Institute, Troy, New York, Aug. 1982.

[Naylor92b] B. Naylor, "Partitioning Tree Image Representation and Generation from 3D Geometric Models," Proceedings of Graphics Interface '92, May 1992, 201–212.

[Reddy-Rubin78] D. R. Reddy and S. M. Rubin, "Representation of Three-Dimensional Objects," Technical Report CMU-CS-78-113, Computer Science Dept., Carnegie-Mellon University, April 1978.

[Rubin-Whitted80] S. M. Rubin and T. Whitted, "A 3-Dimensional Representation for Fast Rendering of Complex Scenes," Proceedings of SIGGRAPH '80, July 1980, 110–116.

[Salesin-Stolfi89] D. Salesin and J. Stolfi, "The ZZ-buffer: A Simple and Efficient Rendering Algorithm with Reliable Antialiasing," Proceedings of the PIXIM '89 Conference, Hermes Editions, Paris, Sept. 1989, 451–466.

[Teller-Sequin91] S. Teller and C. Sequin, "Visibility Preprocessing for Interactive Walkthroughs," Proceedings of SIGGRAPH '91, July 1991, 61–69.

[Teller92] S. Teller, "Visibility Computations in Densely Occluded Polyhedral Environments," PhD Thesis, Univ. of California at Berkeley, Report No. UCB/CSD 92/708, Oct. 1992.

[Williams83] L. Williams, "Pyramidal Parametrics," Proceedings of SIGGRAPH '83, July 1983, 1–11.

Hierarchical Polygon Tiling with Coverage Masks

Ned Greene
Apple Computer

Contact author at [email protected], Apple Computer, 1 Infinite Loop, Cupertino, CA 95014

Abstract

We present a novel polygon tiling algorithm in which recursive subdivision of image space is driven by coverage masks that classify a convex polygon as inside, outside, or intersecting cells in an image hierarchy. This approach permits Warnock-style subdivision, with its logarithmic search properties, to be driven very efficiently by bit-mask operations. The resulting hierarchical polygon tiling algorithm performs subdivision and visibility computations very rapidly while visiting only those cells in the image hierarchy that are crossed by visible edges in the output image. Visible samples are never overwritten. At 512 × 512 resolution, the algorithm tiles as rapidly as traditional incremental scan conversion, and at high resolution (e.g. 4096 × 4096) it is much faster, making it well suited to antialiasing by oversampling and filtering. For densely occluded scenes, we combine hierarchical tiling with the hierarchical visibility algorithm to enable hierarchical object-space culling. When we tested this combination on a densely occluded model, it computed visibility on a 4096 × 4096 grid as rapidly as hierarchical z-buffering [Greene-Kass-Miller93] tiled a 512 × 512 grid, and it effectively antialiased scenes containing hundreds of thousands of visible polygons. The algorithm requires strict front-to-back traversal of polygons, so we represent a scene as a BSP tree or as an octree of BSP trees. When maintaining depth order of polygons is not convenient, we combine hierarchical tiling with hierarchical z-buffering, resorting to z-buffering only in regions of the screen where the closest object is not encountered first.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Hidden line/surface removal; I.3.3 [Computer Graphics]: Picture/Image Generation.

Keywords: tiling, coverage mask, antialiasing, visibility, BSP tree, octree, recursive subdivision.

1 INTRODUCTION

Polygon tiling algorithms have been an important topic in computer image synthesis since the advent of raster graphics some two decades ago. Their purpose is to determine which point samples on an image raster are covered by the visible portion of each of the polygons composing a scene. Currently, polygon tiling software running on inexpensive computers can render point-sampled images of simple scenes at interactive rates. The fastest tiling algorithms have been carefully tuned to exploit image-space coherence by using incremental methods wherever possible. However, they fail to exploit opportunities for precomputation, and they waste time tiling hidden geometry. There is a need for more efficient tiling algorithms that effectively exploit coherence and precomputation to enable efficient culling of hidden geometry and efficient tiling of visible geometry.

The dominant polygon tiling algorithm in use today is incremental scan conversion. Typically, raster samples on a polygon's perimeter are traversed with an incremental line-tiling algorithm. Edge samples on each intersected scan line define spans within a polygon, which are then traversed pixel by pixel, permitting incremental update of shading parameters and, in the case of z-buffering, depth values. Visibility of samples can be determined by a) maintaining a z-buffer and performing depth comparisons [Catmull74], b) traversing primitives back to front and writing every pixel tiled [Foley-et-al90], or c) traversing primitives front to back and overwriting only vacant pixels [Foley-et-al90]. With incremental scan conversion, the cost per pixel tiled is very low because incremental edge and span traversal effectively exploits image-space coherence.
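Strategy c) can be illustrated with a small span-based tiler. This is a sketch under stated assumptions, not a tuned rasterizer. It assumes convex polygons in screen coordinates, already sorted nearest first, with samples at pixel centers and vacant pixels marked `None`. For brevity, it finds each scanline's span directly from the polygon's edge equations rather than stepping along the edges incrementally. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def half_planes(poly: Sequence[Point]) -> List[Tuple[float, float, float]]:
    """Edge equations (a, b, c) with a*x + b*y + c >= 0 inside a convex polygon.
    Works for either vertex winding by checking the polygon's signed area."""
    n = len(poly)
    area2 = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    sign = 1.0 if area2 > 0 else -1.0
    eqs = []
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        a, b = -(y1 - y0), x1 - x0          # left-hand normal of edge i
        eqs.append((sign * a, sign * b, sign * -(a * x0 + b * y0)))
    return eqs

def tile_front_to_back(frame: List[List[Optional[int]]],
                       polys: Sequence[Sequence[Point]]) -> int:
    """Scan-convert convex polygons given nearest first, writing polygon ids
    only into vacant (None) pixels; returns the number of pixels written."""
    height, width = len(frame), len(frame[0])
    written = 0
    for pid, poly in enumerate(polys):
        eqs = half_planes(poly)
        ys = [p[1] for p in poly]
        for y in range(max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))):
            py = y + 0.5                           # sample at pixel centers
            lo, hi = -math.inf, math.inf
            for a, b, c in eqs:                    # span = intersection of half-planes
                r = -(b * py + c)                  # need a*x >= r on this scanline
                if a > 0:
                    lo = max(lo, r / a)
                elif a < 0:
                    hi = min(hi, r / a)
                elif r > 0:                        # horizontal edge excludes the row
                    lo, hi = math.inf, -math.inf
            if lo > hi or math.isinf(lo) or math.isinf(hi):
                continue                           # scanline misses the polygon
            x_start = max(0, math.ceil(lo - 0.5))
            x_end = min(width - 1, math.floor(hi - 0.5))
            for x in range(x_start, x_end + 1):    # traverse the span pixel by pixel
                if frame[y][x] is None:            # rule c): write only vacant pixels
                    frame[y][x] = pid
                    written += 1
    return written
```

Note that every pixel of every span is still visited, including the pixels of a farther polygon that are already hidden. That wasted work is the shortcoming discussed next.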

One problem with traditional incremental scan conversion is that it must tile every sample on every primitive, whether or not it is visible, and so it wastes time tiling hidden geometry. This is not a big problem for simple scenes, but for densely occluded scenes it severely impairs efficiency. Ideally, a tiling algorithm should cull hidden geometry efficiently so that running time is proportional to the visible complexity of the scene and independent of the complexity of hidden geometry.

The Warnock subdivision algorithm [Warnock69] approaches this goal, performing logarithmic search for visible tiles in the quadtree subdivision of a polygon. If scene primitives are processed front to back, only visible tiles and their children in the quadtree are visited. Although Warnock subdivision satisfies our desire to work only on visible regions of primitives, the traditional subdivision procedure is relatively slow. Consequently, this approach is slower than incremental scan conversion except for densely occluded scenes. Neither traditional incremental scan conversion nor Warnock subdivision is well suited to tiling scenes of moderate depth complexity.
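The front-to-back case of Warnock subdivision mentioned above can be sketched with a quadtree of boolean "occupied" flags. This is an illustrative sketch, not code from the paper. It assumes a 2^k × 2^k pixel grid, point samples at pixel centers, and convex polygons supplied nearest first. `DepthPriorityWarnock` and its methods are hypothetical names. Each cell is classified by evaluating every edge equation at the cell's extreme sample centers. For the samples, this classification is exact, because a linear function attains its extremes at those corners.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def half_planes(poly: Sequence[Point]) -> List[Tuple[float, float, float]]:
    """Edge equations (a, b, c) with a*x + b*y + c >= 0 inside a convex polygon."""
    n = len(poly)
    area2 = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    sign = 1.0 if area2 > 0 else -1.0
    eqs = []
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        a, b = -(y1 - y0), x1 - x0
        eqs.append((sign * a, sign * b, sign * -(a * x0 + b * y0)))
    return eqs

class DepthPriorityWarnock:
    """Quadtree of occupied flags over a 2**k x 2**k pixel grid."""

    def __init__(self, k: int):
        self.k = k
        side = 1 << k
        # occupied[level][y][x]: level 0 is single pixels, level k is the whole screen
        self.occupied = [[[False] * (side >> lv) for _ in range(side >> lv)]
                         for lv in range(k + 1)]
        self.owner = [[-1] * side for _ in range(side)]

    def tile(self, pid: int, poly: Sequence[Point]) -> None:
        """Insert one convex polygon; polygons must arrive nearest first."""
        self._subdivide(pid, half_planes(poly), self.k, 0, 0)

    def _subdivide(self, pid, eqs, level, cx, cy) -> None:
        if self.occupied[level][cy][cx]:
            return                                 # hidden by nearer polygons: cull
        s = 1 << level
        xa, ya = cx * s + 0.5, cy * s + 0.5        # first sample center in the cell
        xb, yb = xa + s - 1, ya + s - 1            # last sample center in the cell
        inside = True
        for a, b, c in eqs:
            vals = (a * xa + b * ya + c, a * xb + b * ya + c,
                    a * xa + b * yb + c, a * xb + b * yb + c)
            if max(vals) < 0:
                return                             # cell is outside this edge
            if min(vals) < 0:
                inside = False                     # edge crosses the cell
        if inside:
            self._fill(pid, level, cx, cy)         # polygon covers every sample here
            return
        # Intersecting: subdivide (never reached at level 0, where xa == xb).
        for dy in (0, 1):
            for dx in (0, 1):
                self._subdivide(pid, eqs, level - 1, 2 * cx + dx, 2 * cy + dy)
        if all(self.occupied[level - 1][2 * cy + dy][2 * cx + dx]
               for dy in (0, 1) for dx in (0, 1)):
            self.occupied[level][cy][cx] = True    # propagate coverage upward

    def _fill(self, pid, level, cx, cy) -> None:
        """Claim every vacant pixel in a cell and mark the whole cell occupied."""
        if self.occupied[level][cy][cx]:
            return
        if level == 0:
            self.owner[cy][cx] = pid
        else:
            for dy in (0, 1):
                for dx in (0, 1):
                    self._fill(pid, level - 1, 2 * cx + dx, 2 * cy + dy)
        self.occupied[level][cy][cx] = True
```

Occupied cells are never entered, so a farther polygon does no work inside a block that a nearer polygon already covers. The per-cell classification here uses floating-point edge evaluations at every level. That per-cell cost is what makes traditional subdivision relatively slow.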

A second shortcoming of incremental scan conversion is that it spends most of its time tiling edges and spans, traversing these features pixel by pixel. This happens even though all possible tiling patterns for an edge crossing a block of samples can be precomputed and stored as bit masks called coverage masks. Then the samples that a convex polygon covers within a block can be quickly found by compositing the coverage masks of its edges. Previously, this technique has been used to estimate coverage of polygonal fragments within a pixel to accelerate filtering [Carpenter84, Sabella-Wozny83, Fiume-et-al83, Fiume91].

Here we present a polygon tiling algorithm that combines the best features of traditional algorithms. The key innovation that makes this integration possible is the generalization of coverage masks to permit their application to image hierarchies. The generalized masks, which we call triage coverage masks, classify cells in the image hierarchy as inside, outside, or intersecting an edge. This enables them to drive Warnock-style subdivision of image space. The result is a hierarchical tiling algorithm with three properties. It finds visible geometry by logarithmic search, as with the Warnock algorithm. It exploits precomputation of tiling patterns, as with filtering with coverage masks. It also uses incremental methods to exploit image-space coherence, as with incremental scan conversion. The algorithm efficiently performs high-resolution tiling (e.g. 4096 × 4096), so it naturally supports high-quality antialiasing by oversampling and filtering. A-buffer-style antialiasing with coverage masks [Carpenter84] is particularly convenient.

For densely occluded scenes we combine hierarchical tiling with the hierarchical visibility algorithm [Greene-Kass-Miller93, Greene-Kass94, Greene95] to permit hierarchical culling of hidden regions of object space. This combination of algorithms enables very rapid rendering of complex polygonal scenes with high-quality antialiasing. The method has been tested and shown to work effectively on densely occluded scenes. On a test scene containing upwards of 167 million replicated polygons, the algorithm computed visibility on a 4096 × 4096 grid as rapidly as hierarchical z-buffering [Greene-Kass-Miller93] tiled a 512 × 512 grid.

In §2, we survey previous work on efficient polygon tiling. In §3, we introduce triage coverage masks, and in §4 we present the rendering algorithm in which they are applied. In §5, we discuss how rendering of densely occluded scenes can be accelerated with object-space culling methods. In §6, we discuss strategies for efficiently processing dynamic scenes. In §7, we compare the hierarchical tiling algorithm to hierarchical z-buffering. In §8, we describe hierarchical tiling of polyhedra. In §9, we describe our implementation and show results for both simple and densely occluded scenes. Finally, we present our conclusions in §10.

2 PREVIOUS WORK

2.1 Warnock Subdivision

Our tiling algorithm is loosely based on the Warnock algorithm [Warnock69], a recursive subdivision procedure that finds the quadtree subdivision of visible edges in a scene by logarithmic search. Scene primitives are inserted into a quadtree data structure beginning at the root cell, which represents the whole screen. At each level of subdivision, the algorithm classifies the quadrants of the current quadtree cell as inside, outside, or intersecting the primitive being processed, and only intersected quadrants are subdivided. Quadrants that are entirely covered by one or more primitives are identified, permitting hidden geometry within them to be culled. The Warnock algorithm is actually a family of algorithms based on a common subdivision procedure, and the control structure varies from implementation to implementation [Rogers85]. A typical implementation processes primitives in no particular order, maintains lists of potentially visible primitives at quadtree cells, and expends considerable work performing depth comparisons in order to cull hidden geometry.

When circumstances permit convenient front-to-back traversal of primitives, as with a presorted static polygonal scene, a simpler and more efficient variation of the Warnock algorithm can be employed. In this case, we insert primitives into the quadtree one at a time in front-to-back order. As subdivision proceeds, we mark cells that primitives completely cover as occupied and ignore cells that are already occupied, since any geometry that projects to them is known to be hidden. We complete subdivision of one primitive down to the finest level of the quadtree before processing the next. This version of the Warnock algorithm is simpler because it need not maintain lists of primitives or perform depth comparisons. It is more efficient because, unlike the traditional algorithm, it only subdivides cells crossed by edges that are visible in the output image. Our tiling algorithm is based on this variation of the Warnock algorithm, which we will refer to as the depth-priority Warnock algorithm. Meagher's volume rendering algorithm uses this procedure to tile faces of octree cubes [Meagher82]. However, to the best of our knowledge, this variation of the Warnock algorithm has not previously been applied to rendering geometric models. Incidentally, front-to-back traversal of primitives would accelerate Warnock-style subdivision in the error-bounded rendering algorithm described in [Greene-Kass94].

2.2 Coverage Masks

We turn now to how filtering algorithms exploit precomputation with coverage masks [Carpenter84, Sabella-Wozny83, Fiume-et-al83, Fiume91]. The underlying idea is that all possible tiling patterns for a single edge crossing a grid of raster samples within a pixel can be precomputed. They can later be retrieved, indexed by the points where the edge intersects the pixel's border [Fiume-et-al83, Sabella-Wozny83]. These tiling patterns can be stored as bit masks, permitting samples inside a convex polygon to be determined by ANDing together the coverage masks for its edges. Moreover, if polygons are processed front to back or back to front, visible-surface determination within a pixel can also be performed with bit-mask operations. For example, Carpenter's A-buffer algorithm [Carpenter84] clips polygons to pixel borders and sorts the polygonal fragments front to back. It then determines the visible samples on each fragment on a 4 × 8 grid by compositing coverage masks. The A-buffer algorithm also uses coverage masks to accelerate filtering. For each visible fragment, a single shading value is computed, weighted by the bit count of its mask, and added to pixel color. This shading method efficiently approximates area sampling [Catmull78], and it effectively antialiases edges. Abram, Westover, and Whitted advance similar methods that permit jitter, convolution with arbitrary filter kernels, and evaluation of simple shading functions to be performed by table lookup [Abram-et-al85].

3 TRIAGE COVERAGE MASKS

To accelerate polygon tiling, the hierarchical tiling algorithm generalizes coverage masks to operate on image hierarchies, thereby enabling Warnock-style subdivision of image space to be driven by bit-mask operations. A conventional coverage mask for an edge classifies each grid point within a square region of the screen as inside or outside the edge, as shown in figure 1a. In the context of Warnock subdivision, the analogous operation is classifying subcells of an image hierarchy as inside, outside, or intersecting an edge. Figure 1b shows this for an edge crossing a square containing a 4 × 4 grid of subcells. We call such masks triage masks because the three states that they distinguish correspond to trivial rejection, trivial acceptance, and "do further work." We represent each triage mask as a pair of bit masks, one indicating inside subcells and the other indicating outside subcells, as shown in figures 1c and 1d. We will refer to the bit mask for inside subcells as the "C" mask (for covered) and the bit mask for outside subcells as the "V" mask (for vacant). We call the intersected subcells the active region of the mask, because the corresponding regions of the screen require further work and will later be subdivided. The bit mask for the active region is A = ~(C | V), as shown in figure 1e. In practice, we use 8 × 8 masks rather than the illustrated 4 × 4 masks.

Conventional coverage masks perform two basic tiling and visibility operations. Operation (1) finds the mask of a convex polygon from the masks of its edges. Operation (2) finds the visible samples on a polygon within a pixel by compositing the polygon's mask with the pixel's mask, which represents previously tiled samples. In the context of the hierarchical tiling algorithm, tiling and visibility operations performed by triage masks are entirely analogous, except that compositing is performed recursively on an image hierarchy rather than a single square region of the screen. The image hierarchy is a "coverage pyramid" constructed from both conventional and triage coverage masks, as schematically illustrated in figure 2 (see caption). Operations (1) and (2) for triage masks are easily understood by analogy with conventional coverage masks, as outlined below. See [Greene95] for derivations of the formulas for triage masks and examples illustrating compositing of triage masks.

Tiling a convex polygon into a square region of the screen using coverage masks. The existing coverage mask for a screen cell represents previously tiled polygons, which are in front of the polygon being tiled.

Conventional Coverage Masks
Existing pixel mask: C
(1) Find intercepts of edges with pixel border and look up edge masks (call them E1, E2, ..., EN). Find mask P of convex polygon from edge masks:
    P = E1 & E2 & ... & EN
(2) Find mask W of visible samples on polygon within C:
    W = P & ~C
    Update C:
    C' = C | P

Triage Coverage Masks
Existing triage mask for cell in the coverage pyramid: (Cc, Cv), the covered and vacant bit masks
(1) Find intercepts of edges with cell border and look up edge masks ((E1c, E1v), ..., (ENc, ENv)). Find mask (Pc, Pv) of polygon from edge masks:
    Pc = E1c & E2c & ... & ENc
    Pv = E1v | E2v | ... | ENv
(2) Find mask W of entirely visible cells on polygon within (Cc, Cv):
    W = Cv & Pc
    Find mask A of active cells on polygon in (Cc, Cv):
    A = ~(W | Pv | Cc)
    Update (Cc, Cv):
    Cc' = Cc | W
    Cv' = Cv & ~W
    (Note: (Cc, Cv) may also be modified by propagation from finer levels.)

We use standard notation for bit-mask operations: & for bitwise AND, | for bitwise OR, and ~ for bitwise complement.

Figure 1: A conventional coverage mask classifies grid points as inside or outside an edge (panel a). A triage coverage mask classifies subcells as inside, outside, or intersecting an edge (panel b). We refer to these regions as covered (panel c), vacant (panel d), and active (panel e), respectively. We represent triage masks as the pair of bit masks (C, V) indicating the covered and vacant regions. In practice, we use 8 × 8 masks rather than 4 × 4 masks.

Figure 2: Schematic diagram of a pyramid of masks with L levels for an image with N × N oversampling at each pixel. This coverage pyramid is built from triage masks, except at the finest level, where a conventional one-bit coverage mask is associated with each pixel. In this hierarchical representation of the screen, the C and V bits for each subcell in triage masks indicate whether a square region of the screen is covered, vacant, or active. At the coarsest level, a single triage mask represents the whole screen (left), and at the finest level, a single one-bit mask represents the raster samples within a pixel (right). A four-level pyramid of 8 × 8 masks corresponds to a 512 × 512 image with 8 × 8 oversampling at each pixel. The corresponding diagram for a point-sampled image is the same, except that the masks represent an N × N block of pixels, an N² × N² block of pixels, and so forth.
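Operations (1) and (2), applied recursively as in Listing 1, can be exercised with a small runnable sketch. It is illustrative rather than the paper's implementation, and it rests on several simplifying assumptions:

- It uses point sampling only. Each sample records a polygon id instead of accumulating A-buffer shading.
- N × N masks are stored in Python integers, with bit index row·N + column.
- Edges are given as half-planes a·x + b·y + c ≥ 0 in sample coordinates.
- Each edge's triage mask is computed directly by testing the extreme sample centers of every subcell. This replaces the precomputed table indexed by edge intercepts.
- Every edge is passed down the recursion, not only the edges that cross the subcell.

The class name `TriageTiler` is hypothetical. Levels are numbered as in Listing 1, with 1 as the root.

```python
from typing import Dict, Sequence, Tuple

Edge = Tuple[float, float, float]  # half-plane a*x + b*y + c >= 0 is "inside"

class TriageTiler:
    """Hierarchical tiling into a coverage pyramid of N x N triage masks.

    The screen holds n**levels point samples per side. Level 1 is the root;
    level `levels` holds conventional one-bit masks (one bit per sample).
    """

    def __init__(self, n: int, levels: int):
        self.n, self.levels = n, levels
        self.full = (1 << (n * n)) - 1
        side = n ** levels
        self.owner = [[-1] * side for _ in range(side)]   # polygon id per sample
        # (level, cx, cy) -> (Cc, Cv); unvisited cells are entirely vacant
        self.masks: Dict[Tuple[int, int, int], Tuple[int, int]] = {}

    def _edge_mask(self, e: Edge, x0: int, y0: int, s: int) -> Tuple[int, int]:
        """Triage mask (Ec, Ev) of one edge over the N x N subcells (side s
        samples) of the cell whose first sample is (x0, y0)."""
        a, b, c = e
        ec = ev = 0
        for row in range(self.n):
            for col in range(self.n):
                # Extreme sample centers of the subcell; exact since the edge is linear.
                xs = (x0 + col * s + 0.5, x0 + col * s + s - 0.5)
                ys = (y0 + row * s + 0.5, y0 + row * s + s - 0.5)
                vals = [a * x + b * y + c for x in xs for y in ys]
                if min(vals) >= 0:
                    ec |= 1 << (row * self.n + col)       # inside: covered
                elif max(vals) < 0:
                    ev |= 1 << (row * self.n + col)       # outside: vacant
        return ec, ev

    def tile(self, pid: int, edges: Sequence[Edge]) -> None:
        """Tile one convex polygon; polygons must arrive in front-to-back order."""
        self._tile(pid, edges, 1, 0, 0)

    def _tile(self, pid: int, edges: Sequence[Edge], level: int,
              cx: int, cy: int) -> Tuple[int, int]:
        n, full = self.n, self.full
        cc, cv = self.masks.get((level, cx, cy), (0, full))
        s = n ** (self.levels - level)        # subcell side in samples
        x0, y0 = cx * s * n, cy * s * n       # first sample of this cell
        # Operation (1): polygon mask from edge masks.
        pc, pv = full, 0
        for e in edges:
            ec, ev = self._edge_mask(e, x0, y0, s)
            if ev == full:
                return cc, cv                 # polygon misses this cell entirely
            pc &= ec
            pv |= ev
        # Operation (2): entirely visible subcells, then update the cell's mask.
        w = cv & pc
        cc, cv = cc | w, cv & ~w
        for bit in range(n * n):
            if w >> bit & 1:
                row, col = divmod(bit, n)
                for y in range(y0 + row * s, y0 + row * s + s):
                    for x in range(x0 + col * s, x0 + col * s + s):
                        self.owner[y][x] = pid   # every sample here was vacant
        if level < self.levels:
            active = ~(w | pv | cc) & full
            for bit in range(n * n):
                if active >> bit & 1:
                    row, col = divmod(bit, n)
                    sc, sv = self._tile(pid, edges, level + 1, cx * n + col, cy * n + row)
                    # Propagate coverage status to this coarser level.
                    if sc == full:
                        cc |= 1 << bit        # subcell now fully covered
                    if sv != full:
                        cv &= ~(1 << bit)     # subcell no longer fully vacant
        self.masks[(level, cx, cy)] = (cc, cv)
        return cc, cv
```

With `n = 8` and `levels = 4`, the screen holds 8^4 = 4096 samples per side: the 512 × 512 image with 8 × 8 oversampling from the Figure 2 caption. Suppose a polygon's footprint falls within subcells already marked covered at the root. Then it is rejected there with no further work, because both W and A are empty at that level.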

3.1 Tiling by Recursive Subdivision

4 RENDERING A SCENE

4.1 Data Structures

LISTING 1 (pseudocode)

/* Recursive subdivision procedure for tiling a convex polygon P.

   After clipping P to the near clipping plane in object space, if necessary,
   and projecting P’s vertices into the image plane, we call tile_poly with the
   root mask of the mask pyramid, P’s edge list, and "level" set to 1.

   arguments:
     (Cc,Cv):   pyramid mask (input and output)
     edge_list: P’s edges that intersect pyramid mask
     level:     pyramid level: 1 is root, 2 is next coarsest, etc.
*/

tile_poly((Cc,Cv), edge_list, level) {
    set active_edge_list to nil

    /* build P’s mask (Pc,Pv) */
    Pc = all_ones
    Pv = all_zeros
    for each edge on edge_list {
        find intercepts on square perimeter of mask
        if square is outside edge
            then return   /* polygon doesn’t intersect mask */
        if edge intersects square, then {
            append edge to active_edge_list
            /* Note: at the pixel level, Ec is a conventional
               coverage mask and Ev = ~Ec */
            look up edge mask (Ec,Ev)
            Pc = Pc & Ec
            Pv = Pv | Ev
        }
    }

    /* make "write" bit mask and update pyramid mask */
    W  = Cv & Pc
    Cc = Cc | W
    Cv = Cv & ~W

    if level is the pixel level, then {
        /* filter pixel using coverage mask W */
        /* to perform A-buffer box filtering:
           add bitcount(W)*color to accumulation buffer */
        evaluate shading and update accumulation buffer
        return
    }

    for each TRUE bit in W {
        for each pixel in this square region of screen {
            /* to perform A-buffer box filtering:
               add 64*(polygon color) to accumulation buffer */
            evaluate shading and update accumulation buffer
        }
    }

    /* Recursive Subdivision */

    /* make "active" bit mask */
    A = ~(W | Pv | Cc)

    /* subdivide active subcells */
    for each TRUE bit in A {
        /* call corresponding subcell S;
           call its pyramid mask (Sc,Sv) */
        copy all edges on active_edge_list that intersect S to S_edge_list
        tile_poly((Sc,Sv), S_edge_list, level+1)
        /* propagate coverage status to coarser levels of mask pyramid */
        if Sc is all_ones
            then Cc = Cc | active_bit    /* set covered status */
        if Sv is not all_ones
            then Cv = Cv & ~active_bit   /* clear vacant status */
    }
}

Now that the primitive tiling and visibility operations have been described, we are ready to outline the recursive procedure for tiling a convex polygon into the coverage pyramid. Initially, the masks in the coverage pyramid are a hierarchical representation of regions of the image raster that are already occupied by previously tiled polygons. To make the discussion more concrete, the following outline assumes 8×8 oversampling and filtering.

To tile polygon P, we begin by finding the triage mask for each of its edges that crosses the screen by finding its intercepts on the screen border and looking up the corresponding mask in a precomputed table. Then we composite the edge masks according to operation (1) above in order to construct P’s triage mask. Next, beginning at the root cell of the coverage pyramid, we composite P’s mask with cells in the pyramid, using triage mask operations to distinguish three classes of cells: where P is entirely hidden, where P is entirely visible, and where P’s visibility is uncertain, i.e. “active” cells (operation (2)). We ignore cells where P is entirely hidden, we display (or tag) cells where P is entirely visible (mask W), and we recursively subdivide active cells (mask A). During subdivision, edge intercepts used to look up edge masks are computed incrementally. In regions of the screen where P’s edges cross vacant or active cells, subdivision continues, ultimately down to all vacant and active pixels crossed by P’s edges. At the pixel level, coverage masks in the pyramid are conventional one-bit masks. If we are box filtering, operations may follow the traditional A-buffer algorithm: we find P’s visible samples, compute their contribution to pixel value and add it to the accumulation buffer, and then update the pixel’s coverage mask. If the status of a pixel changes from vacant or active to active or covered, the status of masks in coarser levels of the pyramid may also change, so whenever this occurs, we propagate coverage information to coarser levels by performing simple bit-mask operations during recursive traversal of the pyramid. When this recursive tiling procedure finishes, all visible samples on P have been tiled and the coverage pyramid has been updated. This procedure is outlined in Listing 1.
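The two triage-mask operations in this outline, together with the status propagation at the end of Listing 1, reduce to a few 64-bit word operations. The C sketch below illustrates this. It assumes an 8×8 subcell grid packed into a pair of uint64_t words, with a C bit meaning covered, a V bit meaning vacant, and neither meaning active, as in Listing 1. The names Triage, polygon_mask, composite, and propagate are hypothetical; they are not taken from the author's implementation.

```c
#include <stdint.h>

/* Hypothetical triage mask for an 8x8 grid of subcells: bit i of c set means
   subcell i is covered, bit i of v set means it is vacant, neither means active. */
typedef struct { uint64_t c, v; } Triage;

#define ALL_ONES (~UINT64_C(0))

/* Operation (1): a convex polygon covers a subcell only if every edge covers it,
   and misses a subcell if any single edge excludes it. */
static Triage polygon_mask(const Triage *edges, int n) {
    Triage p = { ALL_ONES, 0 };
    for (int i = 0; i < n; i++) {
        p.c &= edges[i].c;
        p.v |= edges[i].v;
    }
    return p;
}

/* Operation (2): composite polygon mask p into a pyramid cell mask.
   Returns the write mask W (vacant subcells entirely covered by P, where P is
   visible) and stores in *active the subcells that must be subdivided. */
static uint64_t composite(Triage *cell, Triage p, uint64_t *active) {
    uint64_t w = cell->v & p.c;
    cell->c |= w;                      /* written subcells become covered */
    cell->v &= ~w;                     /* ...and are no longer vacant */
    *active = ~(w | p.v | cell->c);    /* not written, not missed by P, not already covered */
    return w;
}

/* After subdividing subcell `bit` of a parent mask, propagate the child's status upward. */
static void propagate(Triage *parent, uint64_t bit, Triage child) {
    if (child.c == ALL_ONES) parent->c |= bit;   /* child became fully covered */
    if (child.v != ALL_ONES) parent->v &= ~bit;  /* child is no longer fully vacant */
}
```

When the cell is entirely vacant, W is exactly P's covered subcells, and the active mask is the band of subcells crossed by P's edges.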

Now that the procedure for tiling a polygon has been described, we are ready to place it in the context of rendering a frame. But first we describe the underlying data structures: the coverage pyramid, the image array, and the model tree.

To permit Warnock subdivision to be driven by bit-mask operations, we maintain visibility information about previously tiled polygons in an image-space pyramid of coverage masks. As schematically illustrated in figure 2, a single triage mask represents the whole screen, triage masks at the next level of the pyramid correspond to subcells in the root mask, and so forth. Thus, this is a hierarchical representation of the screen, with the C and V bits for each subcell in the triage masks indicating whether a square region of the screen is covered, vacant, or active. Within a covered region, all corresponding samples in the underlying image raster are covered; within a vacant region, all corresponding raster samples are vacant; and within an active region, at least one but not all corresponding raster samples are covered. At the finest level of the pyramid only, we use conventional one-bit coverage masks to indicate whether or not point samples in the image raster have been covered. If we are oversampling and filtering, each of these one-bit masks corresponds to the 8×8 grid of raster samples within a pixel. The appropriate pyramid for a 512×512 image with 8×8 oversampling at each pixel has four levels: three arrays of triage masks with dimensions 1×1, 8×8, and 64×64, and one 512×512 array of one-bit masks. Alternatively, if we are point sampling rather than filtering, each one-bit mask corresponds to an 8×8 block of pixels. In this case, the pyramid for a 512×512 image would have two arrays of triage masks with dimensions 1×1 and 8×8, and one 64×64 array of one-bit masks.

4.2 Precomputation Step

4.3 Generating a Frame

4.4 Other Filtering Methods

4.5 Point Sampling

5 HIERARCHICAL OBJECT-SPACE CULLING

Memory requirements for the coverage pyramid are very modest. Since the finest level requires only one bit per raster sample and the vast majority of cells in the pyramid are in the finest level, total memory requirements are only slightly more than one bit per raster sample. The actual number of bits per raster sample required for an n-level pyramid lies in the range [1 + 1/32, 1 + 2/63) for n > 1. Note that a z-buffer requires a great deal more memory because it stores a depth value for each raster sample.
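These bounds follow from summing over the levels. The derivation assumes, as in the 8×8 configuration above, that each triage-mask subcell stores two bits and covers 64 times as many raster samples as a subcell one level finer, and that finest-level cells are single one-bit samples. Under that assumption, level k above the finest contributes two bits for every 64^k raster samples:

```latex
b(n) \;=\; 1 + \sum_{k=1}^{n-1} \frac{2}{64^{k}}
     \;=\; 1 + \frac{2}{63}\left(1 - 64^{-(n-1)}\right),
\qquad b(2) = 1 + \tfrac{1}{32},
\qquad \lim_{n\to\infty} b(n) = 1 + \tfrac{2}{63}.
```

Since b(n) increases with n, every pyramid with n > 1 falls in [1 + 1/32, 1 + 2/63). The four-level pyramid of the 512×512 example gives b(4) ≈ 1.0317 bits per sample.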

The other image-space data structure that our algorithm requires is an image array with an element for each color component at each pixel. If we perform A-buffer-style filtering [Carpenter84], shading contributions from 64 subpixel samples accumulate in each array element. Thus, elements in this accumulation buffer require considerable depth. We use 16 bits per pixel per color channel. When filtering with a convolution kernel that overlaps multiple pixels, we store color components as floating-point values in the accumulation buffer. If no filtering is performed, pixel values do not accumulate, so a conventional image array is employed.

Now for representing the model. Our algorithm requires front-to-back traversal of polygons in the scene, so we represent the scene as a binary space partitioning tree (BSP tree) [Fuchs-Kedem-Naylor80], which permits very efficient traversal in depth order. Strategies for handling dynamic scenes are discussed in §6.

In a precomputation step, we build a BSP tree for the model. We also build lookup tables for both conventional and triage coverage masks. In building mask tables, we divide the perimeter of a canonical square into some number of equal intervals (e.g. 64) and create an entry in a two-dimensional table for each pair of intervals not lying on a common edge. Once this table has been constructed, to obtain the mask for an arbitrary edge we determine which two intervals it crosses and look up the corresponding table entry. To conserve storage, we can use the same table entry for edges with opposite directions, because swapping the C and V bit masks of a triage mask corresponds to reversing an edge. Hierarchical tiling depends on accurate classification of covered and vacant regions in triage masks, so we construct them with the following conservative procedure. The endpoints of the pair of intervals used to index a coverage mask define a quadrilateral. Any subcells intersected by the quadrilateral are classified active, guaranteeing that cells classified covered are completely covered and cells classified vacant are completely vacant.
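The conservative classification for a single table entry can be sketched in C under several assumptions that are not from the paper. The canonical square is the unit square divided into 8×8 subcells, and the inside of a directed edge is its left side. The interval pair is given by its endpoints: the edge starts between a0 and a1 and ends between b0 and b1. The side-of-edge test is bilinear in the two endpoint positions along their intervals, so a subcell corner lies strictly on one side of every edge in the family exactly when it does so for the four extreme edges a0b0, a0b1, a1b0, and a1b1. Any subcell that fails this test at some corner is left active, which amounts to the quadrilateral rule above. edge_triage is a hypothetical name.

```c
#include <stdint.h>

typedef struct { double x, y; } Pt;
typedef struct { uint64_t c, v; } Triage;   /* covered / vacant bits; neither = active */

/* Positive when x lies to the left of the directed edge p -> q. */
static double side(Pt p, Pt q, Pt x) {
    return (q.x - p.x) * (x.y - p.y) - (q.y - p.y) * (x.x - p.x);
}

/* Conservative triage mask, on the unit square divided into 8x8 subcells, for
   every directed edge that starts on segment a0-a1 and ends on segment b0-b1.
   Bit (row * 8 + col) addresses the subcell in that row and column. */
static Triage edge_triage(Pt a0, Pt a1, Pt b0, Pt b1) {
    const Pt ends[4][2] = { {a0, b0}, {a0, b1}, {a1, b0}, {a1, b1} };
    Triage m = { 0, 0 };
    for (int row = 0; row < 8; row++)
        for (int col = 0; col < 8; col++) {
            int pos = 0, neg = 0;
            for (int k = 0; k < 4; k++) {           /* the four subcell corners */
                Pt x = { (col + (k & 1)) / 8.0, (row + (k >> 1)) / 8.0 };
                for (int e = 0; e < 4; e++) {       /* the four extreme edges */
                    double s = side(ends[e][0], ends[e][1], x);
                    if (s > 0) pos++;
                    else if (s < 0) neg++;
                }
            }
            uint64_t bit = UINT64_C(1) << (row * 8 + col);
            if (pos == 16) m.c |= bit;              /* strictly inside every edge */
            else if (neg == 16) m.v |= bit;         /* strictly outside every edge */
            /* otherwise some edge may cross the subcell: leave it active */
        }
    return m;
}
```

Calling edge_triage with the endpoints reversed exchanges the covered and vacant classifications. That is why one stored entry can serve both edge directions.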

We begin a frame by clearing the accumulation buffer and the coverage pyramid. We traverse polygons in the model’s BSP tree in front-to-back order. We clip each polygon to the front clipping plane, if necessary, and project its vertices into the image plane. There is no need to preserve depth information. Before tiling a polygon, we first determine whether its bounding box is visible. If this procedure fails to prove that the polygon is hidden, we then tile it into the smallest enclosing cell in the coverage pyramid using the procedure outlined in Listing 1. In regions of the screen where the polygon is visible, this procedure updates pixel values in the image buffer and updates coverage status in the coverage pyramid. After all polygons have been processed, the scene is complete and we display the image buffer.
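To find the smallest enclosing cell, we can descend the pyramid for as long as the bounding box stays inside a single cell. The minimal C sketch below makes several assumptions. The screen is square with a side of 8^(levels-1) pixels (512 for the four-level example above). Bounding boxes are integer pixel coordinates already clipped to the screen. Each level splits a cell into 8×8 subcells. Cell and enclosing_cell are hypothetical names.

```c
#include <assert.h>

/* Hypothetical pyramid cell: level 0 is the root (whole screen); each level
   splits a cell into 8x8 subcells; (cx, cy) indexes the cell within its level. */
typedef struct { int level, cx, cy; } Cell;

/* Deepest cell (at most levels-1) containing the box [x0,x1] x [y0,y1].
   screen must equal 8^(levels-1) pixels, e.g. 512 for 4 levels. */
static Cell enclosing_cell(int x0, int y0, int x1, int y1, int screen, int levels) {
    assert(0 <= x0 && x0 <= x1 && x1 < screen);
    assert(0 <= y0 && y0 <= y1 && y1 < screen);
    Cell best = { 0, 0, 0 };          /* the root cell always encloses the box */
    int width = screen;               /* cell width at the current level, in pixels */
    for (int l = 1; l < levels; l++) {
        width /= 8;
        if (x0 / width != x1 / width || y0 / width != y1 / width)
            break;                    /* the box straddles a subcell boundary */
        best.level = l;
        best.cx = x0 / width;
        best.cy = y0 / width;
    }
    return best;
}
```

For example, a box from (70, 10) to (120, 60) on a 512×512 screen fits in the level-1 cell (1, 0). The next level splits that box across several 8-pixel cells, so the descent stops at level 1.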

We have already discussed A-buffer-style filtering by area sampling, a term used to describe convolution of visible samples with a pixel-sized box filter [Catmull78]. Abram, Westover, and Whitted extended coverage-mask techniques to include jitter, table-driven convolution with arbitrary filter kernels, and evaluation of simple shading functions by table lookup [Abram-et-al85]. All of these methods are compatible with hierarchical tiling. To perform table-driven convolution, the contribution of each subpixel sample to neighboring pixels is precomputed and stored in a table of filtering coefficients. For some simple shading functions, the contribution of arbitrary collections of samples can be stored as precomputed coefficients, which enables, for example, efficient byte-by-byte processing of coverage masks. We use this method when filtering 3×3 pixel neighborhoods with a one-pixel radius cosine-hump kernel.
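Byte-by-byte processing works because a coverage mask over an 8×8 sample grid splits into eight bytes, one per row of samples. Per-row tables indexed by byte value can therefore replace per-sample loops. The C sketch below shows only the single-destination-pixel case; a 3×3 neighborhood would presumably need one such table per neighboring pixel. It assumes that bit (row × 8 + col) addresses a sample and that the filter coefficients are supplied as an 8×8 array. The weights are placeholders, not the cosine-hump kernel. build_table and mask_weight are hypothetical names.

```c
#include <stdint.h>

/* table[r][b]: summed filter coefficient of the samples selected by byte
   pattern b in sample row r, for one destination pixel. */
static float table[8][256];

/* Precompute the table from per-sample coefficients weights[row][col]. */
static void build_table(float weights[8][8]) {
    for (int r = 0; r < 8; r++)
        for (int b = 0; b < 256; b++) {
            float sum = 0.0f;
            for (int c = 0; c < 8; c++)
                if (b & (1 << c)) sum += weights[r][c];
            table[r][b] = sum;
        }
}

/* Filter weight of the samples in a 64-bit coverage mask, one byte (one row
   of samples) at a time; zero bytes are skipped. */
static float mask_weight(uint64_t mask) {
    float sum = 0.0f;
    for (int r = 0; r < 8; r++) {
        unsigned byte = (unsigned)(mask >> (8 * r)) & 0xFFu;
        if (byte) sum += table[r][byte];
    }
    return sum;
}
```

With all weights equal to one, mask_weight reduces to a bit count, which is the A-buffer box-filter case. The shaded color times this weight is what gets added to the accumulation buffer.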

Modifying the algorithm to produce point-sampled rather than filtered images is straightforward. In this case, each mask at the finest level of the pyramid corresponds to an 8×8 block of pixels. So for each subcell in the “write” mask (see pseudocode), we evaluate the shading function at the corresponding pixel and write the result to the image buffer. Since pixel values correspond to point samples, color values do not accumulate, so we use a conventional image array rather than an accumulation buffer. Note that it is not necessary to clear the image array at the beginning of a frame. Instead, after tiling all scene polygons, we composite a screen-sized polygon of the desired background color (or texture) with the root mask, thereby tiling all remaining vacant pixels in the image.

Because of its ability to cull hierarchically in image space, the hierarchical tiling algorithm processes densely occluded scenes much more efficiently than conventional tiling methods, which must traverse all hidden geometry pixel by pixel. Nonetheless, it must still consider every polygon in a scene, doing some work even on those that are entirely hidden. To avoid this behavior, we integrate our algorithm with the hierarchical visibility algorithm [Greene-Kass-Miller93, Greene-Kass94, Greene95] to enable hierarchical object-space culling of hidden regions of the model. This can be done by substituting hierarchical tiling for z-buffering in the hierarchical z-buffer algorithm of [Greene-Kass-Miller93], although this requires some changes in both the object-space and image-space hierarchies. In image space, instead of using a z-pyramid of depth samples to maintain visibility information, we use a coverage pyramid. In object space, we modify the octree to permit strict front-to-back traversal of polygons. Note that the z-buffer algorithm traverses octree cubes in front-to-back order, but not the primitives contained within them. And since octree cubes are nested, it is not sufficient to simply organize the primitives inside each cube into a BSP tree. Instead we use the following algorithm for building an octree of BSP trees that permits strict front-to-back traversal.

5.1 Building an Octree of BSP Trees

5.2 Combining Hierarchical Tiling with Hierarchical Visibility

6 HANDLING DYNAMIC SCENES

6.1 Lazy Z-Buffering

Starting with a root cube which bounds model space, we insert polygons one at a time into the cube. If the polygon count in the cube reaches a specified threshold (e.g. 30), we subdivide the cube into eight octants and insert each of its polygons into each octant that it intersects, clipping to the cube’s three median planes. When all polygons in the scene have been inserted into the root cube and propagated through the tree, we have an octree where all polygons are associated only with leaf nodes, thereby circumventing the ordering problem caused by nesting. The last step is to organize the polygons in each leaf node of the octree into a BSP tree [Foley-et-al90]. Now scene polygons can be traversed in strict front-to-back order by traversing octree cubes front to back and traversing their BSP trees front to back.
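The insertion procedure can be sketched in C. To keep the sketch short, it departs from the text in three labeled ways. First, each polygon is represented only by its axis-aligned bounding box (Box), and a polygon that overlaps several octants is referenced from each of them instead of being clipped to the median planes. Second, a hypothetical MAX_DEPTH guard stops subdivision where overlapping items would otherwise recurse indefinitely. Third, building the per-leaf BSP trees is omitted. THRESHOLD uses the text's example value of 30, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define THRESHOLD 30   /* polygon count that triggers subdivision (example value from the text) */
#define MAX_DEPTH 8    /* hypothetical guard against unbounded subdivision */

typedef struct { float lo[3], hi[3]; } Box;   /* stand-in for a polygon: its bounding box */

typedef struct Node {
    Box cube;
    struct Node *child[8];       /* all NULL for a leaf */
    int *items, count, cap;      /* polygon indices; non-empty only in leaves */
} Node;

static const Box *g_polys;       /* polygon table used by insert() */

static int overlaps(const Box *a, const Box *b) {
    for (int k = 0; k < 3; k++)
        if (a->hi[k] < b->lo[k] || a->lo[k] > b->hi[k]) return 0;
    return 1;
}

static Node *new_node(Box cube) {
    Node *n = calloc(1, sizeof *n);
    n->cube = cube;
    return n;
}

static void insert(Node *n, int id, int depth);

/* Split a leaf into octants and push its polygons down, so only leaves hold polygons. */
static void subdivide(Node *n, int depth) {
    for (int i = 0; i < 8; i++) {
        Box c;
        for (int k = 0; k < 3; k++) {
            float mid = 0.5f * (n->cube.lo[k] + n->cube.hi[k]);
            int upper = (i >> k) & 1;            /* bit k selects the half along axis k */
            c.lo[k] = upper ? mid : n->cube.lo[k];
            c.hi[k] = upper ? n->cube.hi[k] : mid;
        }
        n->child[i] = new_node(c);
    }
    int *items = n->items, count = n->count;
    n->items = NULL;
    n->count = n->cap = 0;
    for (int j = 0; j < count; j++)
        insert(n, items[j], depth);              /* now routed to the octants */
    free(items);
}

static void insert(Node *n, int id, int depth) {
    if (n->child[0]) {                           /* interior: route to every overlapped octant */
        for (int i = 0; i < 8; i++)
            if (overlaps(&g_polys[id], &n->child[i]->cube))
                insert(n->child[i], id, depth + 1);
        return;
    }
    if (n->count == n->cap) {                    /* grow the leaf's polygon list */
        n->cap = n->cap ? 2 * n->cap : 8;
        n->items = realloc(n->items, (size_t)n->cap * sizeof *n->items);
    }
    n->items[n->count++] = id;
    if (n->count >= THRESHOLD && depth < MAX_DEPTH)
        subdivide(n, depth);
}

/* Counts polygon references in leaves and checks that interior nodes hold none. */
static int leaf_entries(const Node *n) {
    if (!n->child[0]) return n->count;
    assert(n->count == 0);
    int sum = 0;
    for (int i = 0; i < 8; i++) sum += leaf_entries(n->child[i]);
    return sum;
}
```

Because subdivide hands a leaf's polygons back to insert once children exist, the invariant that polygons live only in leaves holds throughout construction. That invariant is the property the text relies on to avoid the nesting problem.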

Now that we have established how to traverse scene polygons in front-to-back order, combining hierarchical tiling with the basic hierarchical visibility algorithm is straightforward. As with hierarchical z-buffering, we traverse octree cubes in front-to-back order, testing them for visibility and culling those that are hidden. Also as with hierarchical z-buffering, we determine whether a cube is visible by tiling it, stopping if a visible sample is found. Note that it is only necessary to tile a cube’s polygonal silhouette (unless it intersects the front clipping plane), rather than tiling its front faces. By comparison, z-buffering often needs to tile three faces of a cube to establish its visibility. To test cube silhouettes for visibility, we modify the tiling procedure of Listing 1 to report visibility status, returning TRUE whenever a polygon’s mask indicates that it covers a vacant subcell or a vacant grid point in the image raster. Once we have established that an octree cube is visible, we traverse the polygons in its BSP tree in front-to-back order, tiling them into the coverage pyramid. When we finish traversing the octree, all visible polygons have been tiled and the image is complete.
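Front-to-back traversal of octree cubes needs, at each node, an order for the eight children that never visits a child after one it could hide. One standard way to obtain such an order is sketched in C below; this ordering is an assumption here, since the paper does not spell out its own. Number the children so that bit k is set for the upper half along axis k, find the child octant containing the eye, and visit children in the order eye ^ 0, eye ^ 1, ..., eye ^ 7. A child can be hidden only by children that lie on the eye's side of every median plane separating them. The XOR of such a child's index with the eye octant is a proper subset of the hidden child's own XOR, and a proper bit subset is always numerically smaller, so those children come first. eye_octant and front_to_back are hypothetical names.

```c
/* Octant containing the eye, relative to the cube's center: bit k is set when
   the eye lies on the upper side of the median plane perpendicular to axis k. */
static int eye_octant(const float center[3], const float eye[3]) {
    int o = 0;
    for (int k = 0; k < 3; k++)
        if (eye[k] >= center[k]) o |= 1 << k;
    return o;
}

/* Fills order[] with a front-to-back visiting order for the eight children. */
static void front_to_back(const float center[3], const float eye[3], int order[8]) {
    int e = eye_octant(center, eye);
    for (int i = 0; i < 8; i++)
        order[i] = e ^ i;
}
```

Applying this order recursively in each octree node, and within each leaf using its BSP tree, would give the strict front-to-back polygon order that hierarchical tiling requires.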

This version of the hierarchical visibility algorithm has very efficient traversal properties in both object space and image space. Like the hierarchical z-buffer algorithm, in object space the algorithm only visits visible octree nodes and their children, and it only renders polygons that are in visible octree nodes. In image space, when tiling polygons into the coverage pyramid, hierarchical tiling only visits cells that are crossed by visible edges in the output image. Visible samples are never overwritten. As a result of these properties, this variation of the hierarchical visibility algorithm is very efficient at both culling hidden geometry and tiling visible geometry.

If a hardware graphics accelerator is available to perform shading operations such as texture mapping, we can perform visibility operations with software and shading with hardware. We would use the usual hierarchical tiling algorithm to maintain the coverage pyramid and perform object-space culling, and we would render visible polygons with the graphics accelerator, using an accumulation buffer, if available, to perform antialiasing [Haeberli-Akeley90]. This would be a fast way to produce texture-mapped images of densely occluded scenes.

One weakness of the hierarchical tiling algorithm is that it requires strict front-to-back traversal of polygons. This does not present a problem for a static model, since it may be represented as a BSP tree [Fuchs-Kedem-Naylor80], and if only a relatively small number of polygons are moving, the tree can be efficiently maintained [Naylor92a]. However, in scenes with numerous moving polygons, maintaining depth order can impose a severe computational burden. Here we consider two different methods that address this problem.

The following “lazy z-buffering” algorithm is an attractive alternative whenever at least part of the model can be conveniently traversed in approximate front-to-back order. For convenience, the following discussion assumes that we are oversampling and box filtering. With this variation of hierarchical tiling, we make the following changes to the basic algorithm. For every cell in the coverage pyramid, we maintain znear and zfar depth values for all potentially visible polygons thus far encountered that intersect the cell. Instead of automatically culling a portion of a polygon that intersects a covered cell, it is culled only if it lies behind the cell’s zfar value. At a pixel, we assume that fragments arrive in an order that permits tiling with coverage masks, i.e., one or more non-overlapping fragments cover all of the pixel’s samples before any other fragments arrive. These conditions are easily monitored using the pixel’s coverage mask and znear/zfar values. Unless and until a fragment violating the conditions arrives, we perform filtering like the usual algorithm, adding shading contributions to the accumulation buffer and updating the pixel’s coverage mask. We also cache information about each fragment in case we need it later. If and when the conditions are violated, we discard the current accumulated color value for the pixel and revert to ordinary z-buffering, allocating the memory required for storing color and depth at each subpixel sample, and then tiling the cached fragments. This produces the same image samples as if we had been maintaining an oversampled z-buffer all along. The last step, after all polygons in the scene have been tiled, is to filter the z-buffered pixels. This procedure produces the same image as hierarchical tiling would have produced if polygons had been traversed in depth order.
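The per-pixel bookkeeping can be sketched as a small state machine in C. This is one reading of the conditions stated above, not the author's code. Pixel, Action, and lazy_fragment are hypothetical names, and depth is assumed to increase away from the viewer, with zfar initialized to the nearest possible depth (0 here). Fragment caching and the actual z-buffer fallback are left out. Only the pixel level is shown, although the text keeps znear/zfar values at every pyramid cell; the sketch needs only the pixel's zfar.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t covered;   /* samples claimed by fragments accepted so far */
    float zfar;         /* farthest depth among accepted fragments (start at 0) */
    bool lazy;          /* false once the pixel has reverted to z-buffering */
} Pixel;

typedef enum { ACCUMULATE, CULL, ZBUFFER } Action;

/* Classify a fragment with coverage mask m and depth range [znear, zfar]. */
static Action lazy_fragment(Pixel *px, uint64_t m, float znear, float zfar) {
    if (!px->lazy)
        return ZBUFFER;                       /* already reverted */
    if (px->covered == ~UINT64_C(0)) {        /* pixel fully covered */
        if (znear >= px->zfar)
            return CULL;                      /* behind every accepted fragment */
        px->lazy = false;                     /* might be visible: ordering assumption broken */
        return ZBUFFER;
    }
    if (m & px->covered) {                    /* overlaps an earlier fragment before the pixel is full */
        px->lazy = false;
        return ZBUFFER;
    }
    px->covered |= m;                         /* non-overlapping: tile with coverage masks */
    if (zfar > px->zfar)
        px->zfar = zfar;
    return ACCUMULATE;
}
```

When ZBUFFER is returned for the first time, the caller would discard the pixel's accumulated color, allocate per-sample color and depth, and replay the cached fragments, as the text describes.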

This simple strategy exploits whatever depth coherence is present in the scene being processed. If polygons are mostly in front-to-back order, lazy z-buffering will not do much more work than the usual hierarchical tiling algorithm. This would occur, for example, if a few small dynamic objects were positioned in front of a static background model that was traversed in depth order. In the worst case, when frontmost objects are never processed first, lazy z-buffering does only slightly more work than hierarchical z-buffering.

6.2 Merging Octrees

7 HIERARCHICAL TILING VERSUS HIERARCHICAL Z-BUFFERING

8 TILING POLYHEDRA

9 IMPLEMENTATION AND RESULTS

Thing Being Compared                         Hierarchical Tiling              Hierarchical Z-Buffering
object-space hierarchy                       BSP tree / octree of BSP trees   octree
image-space hierarchy                        pyramid of coverage masks        z-pyramid
front-to-back polygon traversal required?    yes                              no
visibility information per raster sample     < 1 + 2/63 coverage-mask bits    Z (usu. 24-32 bits)
color information per raster sample          none                             RGB (usu. 24-36 bits)
type of output-image buffer                  accumulation (deep)              standard
need to store coverage-mask LUTs?            yes                              no
pixel overwrite?                             no                               yes
mask support for filtering built-in?         yes                              no
identifies covered image-pyramid cells?      yes                              no

Table 1: Some points of comparison between hierarchical polygon tiling and hierarchical z-buffering.

For polygonal scenes consisting of independently moving rigid bodies, another strategy can be employed that guarantees front-to-back traversal of polygons, permitting us to render polygons with the standard hierarchical tiling procedure. According to this method, each rigid body is represented as an octree of BSP trees. To render a frame, we simultaneously traverse all octrees front to back, culling any octree cubes which are hidden by the coverage pyramid, and using the following strategy to synchronize traversal of octrees. For each octree, we determine the current frontmost leaf cube and then determine the frontmost leaf cube of all octrees. If this single frontmost cube does not intersect a leaf cube in any other octree, we can safely render its BSP tree. If this cube does intersect other leaf cubes, we clip their polygons to the frontmost cube, insert the clipped fragments into the frontmost cube’s BSP tree, and then render that BSP tree. This procedure ultimately will cull or render all octree leaf nodes, whereupon rendering of the scene is finished. This procedure for rendering dynamic scenes is nearly as fast as the standard hierarchical tiling algorithm, except for the time spent merging octree leaf nodes. Although merging operations can require considerable computation, for many scenes merging will only rarely be required, and in such cases this algorithm will run efficiently.

Table 1 summarizes some points of comparison between hierarchical polygon tiling and hierarchical z-buffering. As the table points out, hierarchical tiling requires strict front-to-back traversal of polygons, which complicates the object-space hierarchy, assuming that we are maintaining an octree of BSP trees to enable object-space culling. Another point in favor of hierarchical z-buffering is that it does not need to build or store lookup tables for coverage masks. The other points of comparison strongly favor hierarchical tiling. One big advantage is that its memory requirements are much lower. Whereas hierarchical z-buffering needs to store depth and color information for each raster sample, hierarchical tiling only needs to store slightly more than one bit of coverage information for each raster sample. The resulting memory savings can be very substantial. In fact, if we are rendering a 512×512 image with 8×8 oversampling at each pixel, hierarchical tiling requires only about 3.7% of the image memory required for z-buffering. Other points in favor of hierarchical tiling are that it never overwrites visible samples, it has built-in support for filtering with coverage masks, and it facilitates exploiting image-space coherence by identifying regions of the image-space pyramid that are completely covered by individual polygons.
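The 3.7% figure can be reproduced under one set of assumptions, which are ours rather than stated in the text. The hierarchical tiler's image memory is taken to be the coverage pyramid plus the 16-bit-per-channel RGB accumulation buffer described earlier for box filtering. The oversampled z-buffer is taken to store 24 bits of depth and 24 bits of RGB at each raster sample, the low ends of the ranges in Table 1. pyramid_bits and memory_ratio are hypothetical helpers.

```c
/* Bits in a coverage pyramid over a side x side grid of raster samples: one bit
   per sample at the finest level (64-bit masks, one per 8x8 block of samples),
   plus 128-bit triage masks (64 C bits + 64 V bits) at each coarser level. */
static double pyramid_bits(long side) {
    double bits = (double)side * side;
    long masks = side / 8;                    /* one-bit masks per side at the finest level */
    while (masks > 1) {
        masks /= 8;                           /* each triage mask covers 8x8 cells below it */
        bits += (double)masks * masks * 128.0;
    }
    return bits;
}

/* Ratio of hierarchical-tiling image memory to oversampled z-buffer memory. */
static double memory_ratio(long pixels, int oversample, int accum_bits_per_pixel,
                           int zbuffer_bits_per_sample) {
    long side = pixels * oversample;          /* raster samples per side */
    double tiling = pyramid_bits(side) + (double)pixels * pixels * accum_bits_per_pixel;
    double zbuffer = (double)side * side * zbuffer_bits_per_sample;
    return tiling / zbuffer;
}
```

memory_ratio(512, 8, 3 * 16, 24 + 24) evaluates to about 0.0371. Using the upper ends of the Table 1 ranges (32-bit depth and 36-bit color) would bring the ratio below 3%.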

Hierarchical tiling with coverage masks can also be applied to Warnock subdivision in three dimensions to tile convex polyhedra into a voxel grid. In this case, 64-bit triage masks would classify cells within a 4×4×4 subdivision of a cube as inside, outside, or intersecting a plane. The triage mask for a convex polyhedron within a cube would be obtained by compositing the triage masks of its face planes. The recursive subdivision procedure for tiling a polyhedron into a 3D pyramid of coverage masks would be analogous to hierarchical polygon tiling, and it would only visit cells in the pyramid that are intersected by the polyhedron’s faces. The speed and modest memory requirements of this volume tiling algorithm make it an attractive alternative to traditional methods [Kaufman86].
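The 3D analogue of the mask operations can be sketched in C. First, classify the 4×4×4 subcells of a unit cube against each face plane by testing subcell corners. Then composite the face masks exactly as edge masks are composited in 2D: AND the covered bits and OR the vacant bits. The bit layout (bit z·16 + y·4 + x), the convention that a·x + b·y + c·z + d < 0 is inside, and the direct corner test in place of a lookup table are all assumptions of this sketch. plane_triage and polyhedron_triage are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint64_t c, v; } Triage;   /* bit z*16 + y*4 + x of a 4x4x4 subdivision */

/* Classify the 64 subcells of the unit cube against the half-space a*x + b*y + c*z + d < 0. */
static Triage plane_triage(double a, double b, double c, double d) {
    Triage m = { 0, 0 };
    for (int i = 0; i < 64; i++) {
        int x = i & 3, y = (i >> 2) & 3, z = i >> 4;
        int in = 0, out = 0;
        for (int k = 0; k < 8; k++) {                /* the eight subcell corners */
            double px = (x + (k & 1)) / 4.0;
            double py = (y + ((k >> 1) & 1)) / 4.0;
            double pz = (z + (k >> 2)) / 4.0;
            double f = a * px + b * py + c * pz + d;
            if (f < 0) in++;
            else if (f > 0) out++;
        }
        if (in == 8) m.c |= UINT64_C(1) << i;        /* entirely inside this face plane */
        else if (out == 8) m.v |= UINT64_C(1) << i;  /* entirely outside it */
    }
    return m;
}

/* Composite face-plane masks into the triage mask of a convex polyhedron. */
static Triage polyhedron_triage(double planes[][4], int n) {
    Triage p = { ~UINT64_C(0), 0 };
    for (int i = 0; i < n; i++) {
        Triage f = plane_triage(planes[i][0], planes[i][1], planes[i][2], planes[i][3]);
        p.c &= f.c;
        p.v |= f.v;
    }
    return p;
}
```

For an axis-aligned box spanning 0.2 to 0.8 on each axis, the eight central subcells come out covered. Every other subcell is active, because it straddles at least one face plane.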

Our implementation of hierarchical polygon tiling is programmed in C and renders either point-sampled or filtered images of scenes composed of flat-shaded convex polygons. Our polygon tiling program follows the pseudocode outline, except that we tile a polygon into the smallest enclosing cell in the coverage pyramid after first testing its bounding box for visibility, as described in §4.3. As described in §3 and §4, filtering is performed by box filtering according to the A-buffer method, or by table-driven convolution with a one-pixel radius cosine-hump kernel. In the latter case, kernel coefficients are precomputed for all byte patterns and accessed by table lookup for each non-zero byte within a polygon’s coverage mask at a pixel. Color components in the accumulation buffer are represented as 16-bit integer values when box filtering, and as 32-bit floating-point values when filtering with a cosine-hump kernel. Tables of coverage masks are constructed with 64 intervals along each edge of the bounding square. One-bit coverage masks for filtering pixels are constructed with jitter, using random placement of raster samples within the corresponding sub-pixel square [Dippe-Wold85, Cook86]. All of the following tests were performed on an SGI Indigo2 with a 75-megahertz R8000 processor, which performs atomic 64-bit mask operations.

To compare the efficiency of hierarchical tiling to traditional incremental scan conversion for tiling simple polygonal scenes, we employed the color-cube model of figure 5, composed of 192 presorted front-facing squares. We rendered this model with hierarchical tiling and with a back-to-front “painter’s” algorithm [Foley-et-al90]. The painter’s algorithm maintained a color triplet for each point in the image raster and performed tiling by incremental scan conversion, overwriting the image at every pixel encountered. On a 512×512 grid, hierarchical tiling tiled the color-cube model approximately ten percent faster than the painter’s algorithm (.087 seconds versus .097 seconds). At higher resolution, the speed advantage of hierarchical tiling was much more pronounced. For example, hierarchical tiling took .357 seconds to tile the model on a 4096×4096 grid and produce the 512×512 box-filtered image of figure 5. By comparison, the painter’s algorithm took 5.3 times longer (1.91 seconds) to tile this scene on a 2048×2048 grid without filtering (our Indigo didn’t have enough memory to render a 4096×4096 RGB image). By timing the painter’s algorithm at various resolutions, we found that it was only able to tile a 910×910 grid in the .357 seconds it took hierarchical tiling to tile and filter the image of figure 5. This example illustrates that for software tiling at sufficient resolution to enable high-quality antialiasing by oversampling and filtering, hierarchical tiling is much more efficient than traditional incremental scan conversion, even for simple scenes.

Our accounting of work done on visibility does not include clearing of the coverage pyramid. Clearing the pyramid at the beginning of a frame visits each pyramid cell once, but this is not necessary if a “lazy clearing” strategy is employed.

To test the effectiveness of hierarchical tiling on densely occluded scenes, we integrated hierarchical tiling with hierarchical visibility as described in §5, performing tiling of both model polygons and octree-cube silhouettes with the hierarchical tiling method. For a test model, we used a version of the modular office building described in [Greene-Kass-Miller93]. We built an octree of BSP trees for the repeating module using the method described in §5.1, each BSP tree containing approximately 16,000 quadrilaterals. We replicated this octree within the shell of a 408-story building resembling the Empire State Building to create a model consisting of approximately 167 million replicated quadrilaterals. Figures 4, 6, and 7 show various views of this model.

To compare the relative speed of hierarchical tiling and hierarchical z-buffering, we rendered an animation of a building walk-through. We found that hierarchical tiling was able to perform tiling on a 4096×4096 grid and produce box-filtered 512×512 frames as fast as hierarchical z-buffering produced 512×512 point-sampled frames. On viewing the animation produced with the z-buffer algorithm, we observed considerable aliasing, as expected. By comparison, we observed high-quality antialiasing with the box-filtered animation generated with hierarchical tiling.

Next, we compared the speed of various rendering options. Hierarchical tiling took 3.21 seconds to tile the scene of figure 4 on a 4096×4096 grid and produce the pictured box-filtered image. When we rendered this same box-filtered image without using the bounding-box culling method of §4.3 and instead tiled all polygons into the root cell of the coverage pyramid, rendering took 1.28 seconds longer, indicating that the bounding-box culling strategy provides significant acceleration. Next we rendered the scene of figure 4 with higher-quality antialiasing using cosine-hump filtering within a 3×3-pixel neighborhood. With this filtering method, it took 4.42 seconds to tile the scene on a 4096×4096 grid and produce the filtered image. Finally, we used the point-sampling variation of hierarchical tiling to render a 512×512 image of the scene, which took 1.71 seconds.

To compare the algorithmic efficiency of hierarchical tiling to hierarchical z-buffering, we constructed “work images” that show the number of times during frame generation that each cell in the coverage pyramid is visited (not counting subpixel samples), with an access to a coarser-than-pixel cell being amortized over the corresponding window of the screen. Work images show the “depth complexity” of the visibility computation and indicate where the algorithm is working hardest [Greene95, Greene-Kass94, Greene-Kass-Miller93]. An average intensity of one in a work image means that, on average, only a single pyramid cell is accessed in the coverage pyramid during visibility operations for each pixel in the output image. With hierarchical tiling, except for very complex scenes or finely tessellated models, average intensity is usually less than one because visibility at most pixels is established at a coarser level in the hierarchy. For example, for the simple model of figure 5, an average of only .123 cells in the coverage pyramid are traversed per pixel in the output image.

Figure 3 shows log-scale work images corresponding to figure 4. Left to right, the images show work tiling cube silhouettes during visibility tests (.09 cells visited per pixel, on average), work tiling model polygons into the coverage pyramid (1.01 cells visited per pixel, on average), and the sum of these two images, showing total work performed on tiling (1.10 cells visited per pixel, on average). In other words, hierarchical tiling visited an average of only 1.10 cells in the coverage pyramid for each pixel in the 512×512 output image, even though tiling and filtering were performed on a 4096×4096 grid. Far fewer cells in the image pyramid are visited with hierarchical tiling than with hierarchical z-buffering [Greene95] because it only visits cells in the image pyramid that are crossed by visible edges in the output image. When we performed a motion test on the scene of figure 4, we found that the number of pyramid cells visited was approximately one for frames rendered with hierarchical tiling and approximately three for frames rendered with hierarchical z-buffering. The lower figure for hierarchical tiling is particularly impressive considering that it resolves visibility at 64 times as many raster samples and many more polygons are visible. Of course, the depth complexity of visibility computations for the scene of figure 4 is far lower for both hierarchical tiling and hierarchical z-buffering than for naive z-buffering, which visits each pixel dozens of times on average [Greene95].

To explore how well hierarchical tiling can filter images of very complex scenes, we rendered a motion sequence in which the camera flies around and through the 408-story model of the Empire State Building [Greene96]. Figures 6 and 7 are 512×512 frames from this animation, which was produced by tiling on a jittered 4096×4096 grid and filtering with a cosine-hump kernel as previously described. From the viewpoint of figure 6, this scene poses a formidable challenge to effective filtering, since approximately 765,000 polygons are visible, and dozens of polygons are visible within some pixels. Nonetheless, we observed high-quality antialiasing in the motion sequence. Jittering of sub-pixel samples effectively converted aliasing to noise [Dippe-Wold85, Cook86], which was noticeable only in frames having hundreds of thousands of visible polygons. For this motion sequence, we observed subtle patterned aliasing artifacts when the same jitter pattern was employed at all pixels, a problem that was overcome by using several different jitter patterns. To reduce temporal aliasing we rendered each video field separately, and to reduce flicker we applied a 1-4-6-4-1 filter to every other scanline (not applied to figure 6 or 7). With cosine-hump filtering and multiple jitter patterns, rendering times were 5.15 minutes for figure 6 and 34 seconds for figure 7, which has approximately 81,300 visible polygons. We also recorded the motion sequence with box filtering, but observed noticeably worse image quality, particularly the characteristic “ropyness” of area sampling. When box filtering with a single jitter pattern, rendering time for the scene of figure 6 was 4.28 minutes. We also rendered the version of this model shown in figure 1 of [Greene-Kass94], which took the error-bounded rendering algorithm described in that article one hour to produce on a 50-megahertz workstation. By comparison, this same scene took hierarchical tiling 34 seconds to render. These examples illustrate that hierarchical tiling with object-space culling can produce high-quality animation of very complex scenes in reasonable frame times. Adding a level to the coverage pyramid would permit the algorithm to accurately filter even more complex scenes.

10 CONCLUSION

Warnock subdivision, with its elegant simplicity and logarithmic search properties, endures as one of the great computer-graphics algorithms. Although polygon tiling by Warnock subdivision is well known, it has rarely been used in practice due to the inefficiency of the traditional subdivision procedure. Here we have shown that Warnock-style subdivision can be driven very efficiently with triage coverage masks. The resulting hierarchical polygon tiling algorithm is very efficient, visiting only cells in the image hierarchy that are crossed by visible edges in the output image and never overwriting a visible image sample. At high resolution, hierarchical tiling is much faster than traditional incremental scan conversion, so it is well suited to antialiasing by oversampling and filtering. Moreover, hierarchical tiling with object-space culling can process densely occluded scenes extremely efficiently, considerably faster than hierarchical z-buffering, while facilitating high-quality filtering. Although the practicality of the basic algorithm for dynamic scenes is constrained by the requirement that polygons be traversed front to back, whenever at least part of the model can be traversed in approximate front-to-back order, “lazy z-buffering” helps to overcome this shortcoming. The algorithm is compact, straightforward to implement, and has very modest memory requirements. In short, hierarchical tiling offers the prospect of generating high-quality animation at reasonable frame rates with modest computing resources.

11 Acknowledgments

Gavin Miller suggested the method of combining hardware and software tiling discussed in §5.2. Gavin also contributed to making the test model of figure 4, as did Eric Chen and Steve Rubin.

A discussion with Piter van Zee was my impetus for working out the algorithm for merging octrees presented in §6.2. Finally, I gratefully acknowledge Siggraph reviewer #4 for a thoughtful critique of this article.

References

[Abram-et-al85] G. Abram, L. Westover, and T. Whitted, “Efficient Alias-Free Rendering using Bit-Masks and Look-Up Tables,” Proceedings of SIGGRAPH ’85, July 1985, 53–59.

[Carpenter84] L. Carpenter, “The A-buffer, an Antialiased Hidden Surface Method,” Proceedings of SIGGRAPH ’84, July 1984, 103–108.

[Catmull74] E. Catmull, “A Subdivision Algorithm for Computer Display of Curved Surfaces,” PhD Thesis, Report UTEC-CSc-74-133, Computer Science Dept., University of Utah, Salt Lake City, Utah, Dec. 1974.

[Catmull78] E. Catmull, “A Hidden-Surface Algorithm with Anti-Aliasing,” Proceedings of SIGGRAPH ’78, Aug. 1978, 6–11.

[Cook86] R. Cook, “Stochastic Sampling in Computer Graphics,”, Jan. 1986, 51–72.

[Dippe-Wold85] M. A. Z. Dippe and E. H. Wold, “Antialiasingthrough Stochastic Sampling,”

, July 1985, 69–78.

[Fiume-et-al83] E. Fiume, A. Fournier, and L. Rudolph, “A ParallelScan Conversion Algorithm with Anti-Aliasing for a General-Purpose Ultracomputer,” , July1983, 141–150.

[Fiume91] E. Fiume, “Coverage Masks and Convolution Tables forFast Area Sampling,” ,53(1), Jan. 1991, 25–30.

[Foley-et-al90] J. Foley, A. van Dam, S. Feiner, and J. Hughes,, 2nd edition, Addison-

Wesley, Reading, MA, 1990.

[Fuchs-Kedem-Naylor80] H. Fuchs, J. Kedem, and B. Naylor, “OnVisible Surface Generation by a Priori Tree Structures,”

, June 1980, 124–133.

[Greene-Kass-Miller93] N. Greene, M. Kass, and G. Miller, “Hier-archical Z-Buffer Visibility,” ,July 1993, 231–238.

[Greene-Kass94] N. Greene and M. Kass, “Error-Bounded An-tialiased Rendering of Complex Environments,”

, July 1994, 59–66.

[Greene95] N. Greene, “Hierarchical Rendering of Complex Environ-ments,” PhD Thesis, Univ. of California at Santa Cruz, ReportNo. UCSC-CRL-95-27, June 1995.

[Greene96] N. Greene, “Naked Empire,” ACM Siggraph Video Re-view Issue 115: The Siggraph ‘96 Electronic Theater, August1996.

[Haeberli-Akeley90] P. Haeberli and K. Akeley, “The AccumulationBuffer: Hardware Support for High-Quality Rendering,”

, Aug. 1990, 309–318.

[Kaufman86] A. Kaufman, “3D Scan Conversion Algorithms forVoxel-Based Graphics,”

, Oct. 1986, 45–75.

[Meagher82] D. Meagher, “The Octree Encoding Method for EfficientSolid Modeling,” PhD Thesis, Electrical Engineering Dept.,Rensselaer Polytechnic Institute, Troy, New York, Aug. 1982.

[Naylor92a] B. Naylor, “Interactive Solid Geometry Via PartitioningTrees,” , May 1992, 11–18.

[Naylor92b] B. Naylor, “Partitioning Tree Image Representation andGeneration from 3D Geometric Models,”

, May 1992, 201–212.

[Rogers85] D. Rogers,McGraw-Hill, New York, 1985.

[Sabella-Wozny83] P. Sabella and M. Wozny, “Toward Fast Color-Shaded Images of CAD/CAM Geometry,”

, 3(8), Nov. 1983, 60–71.

[Teller92] S. Teller, “Visibility Computations in Densely OccludedPolyhedral Environments,” PhD Thesis, Univ. of California atBerkeley, Report No. UCB/CSD 92/708, Oct. 1992.

[Warnock69] J. Warnock, “A Hidden Surface Algorithm for Com-puter Generated Halftone Pictures,” PhD Thesis, ComputerScience Dept., University of Utah, TR 4-15, June 1969.


Figure 3: Log-scale work images showing the number of times that cells in the coverage pyramid were visited while tiling the frame of figure 4. These images depict the “depth complexity” of the visibility computation, showing where the algorithm is working hardest. Left: work tiling cubes, .09 cells visited per pixel (avg). Middle: work tiling polygons, 1.01 cells visited per pixel (avg). Right: total work on tiling, 1.10 cells visited per pixel (avg).
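Figure 3 measures work as the number of coverage-pyramid cells visited. The following is a minimal sketch, not the paper's implementation, of Warnock-style subdivision that tiles convex polygons front to back into a quadtree over the sample grid and counts cell visits the same way. It makes simplifying assumptions: polygons are convex and counterclockwise, samples are point-sampled at their centers, and the paper's triage coverage masks are replaced by conservative corner tests plus a single "fully covered" flag per cell. The names `HierTiler`, `classify`, and `visits` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, x: float, y: float) -> float:
    """>= 0 when (x, y) is on or left of the directed edge a->b (inside a CCW polygon)."""
    return (b[0] - a[0]) * (y - a[1]) - (b[1] - a[1]) * (x - a[0])


def inside(poly: List[Point], x: float, y: float) -> bool:
    """Point-sample test against a convex, counterclockwise polygon."""
    n = len(poly)
    return all(edge(poly[i], poly[(i + 1) % n], x, y) >= 0 for i in range(n))


def classify(poly: List[Point], x0: float, y0: float, s: float) -> str:
    """Conservatively classify the square [x0, x0+s] x [y0, y0+s]: 'in', 'out' or 'cross'."""
    corners = [(x0, y0), (x0 + s, y0), (x0, y0 + s), (x0 + s, y0 + s)]
    result = "in"
    n = len(poly)
    for i in range(n):
        vals = [edge(poly[i], poly[(i + 1) % n], cx, cy) for cx, cy in corners]
        if all(v < 0 for v in vals):
            return "out"      # the whole cell lies outside this edge
        if any(v < 0 for v in vals):
            result = "cross"  # this edge passes through the cell
    return result


def children(ix: int, iy: int) -> List[Tuple[int, int]]:
    return [(2 * ix + dx, 2 * iy + dy) for dy in (0, 1) for dx in (0, 1)]


class HierTiler:
    """Hypothetical quadtree tiler; polygons must be submitted front to back."""

    def __init__(self, levels: int):
        self.levels = levels
        self.n = 1 << levels  # samples per side
        # covered[k][iy][ix] is True once every sample in that level-k cell is written
        self.covered = [[[False] * (self.n >> k) for _ in range(self.n >> k)]
                        for k in range(levels + 1)]
        self.image: List[List[Optional[int]]] = [[None] * self.n for _ in range(self.n)]
        self.visits = 0  # cells visited, analogous to the work measure of Figure 3

    def tile(self, poly: List[Point], pid: int) -> None:
        self._visit(poly, pid, self.levels, 0, 0)

    def _visit(self, poly, pid, k, ix, iy):
        self.visits += 1
        if self.covered[k][iy][ix]:
            return  # already hidden by nearer polygons: cull the whole cell
        if k == 0:
            if inside(poly, ix + 0.5, iy + 0.5):  # sample at the cell center
                self.image[iy][ix] = pid
                self.covered[0][iy][ix] = True
            return
        s = 1 << k  # cell size in samples
        c = classify(poly, ix * s, iy * s, s)
        if c == "out":
            return
        for cx, cy in children(ix, iy):
            if c == "in":
                self._fill(pid, k - 1, cx, cy)  # no further edge tests needed
            else:
                self._visit(poly, pid, k - 1, cx, cy)
        # a cell is covered once all four of its children are covered
        self.covered[k][iy][ix] = all(self.covered[k - 1][cy][cx]
                                      for cx, cy in children(ix, iy))

    def _fill(self, pid, k, ix, iy):
        """Write pid into every still-uncovered sample of a cell known to be inside."""
        self.visits += 1
        if self.covered[k][iy][ix]:
            return
        if k == 0:
            self.image[iy][ix] = pid
        else:
            for cx, cy in children(ix, iy):
                self._fill(pid, k - 1, cx, cy)
        self.covered[k][iy][ix] = True
```

Because polygons arrive front to back, a sample is never overwritten once written, and a polygon lying entirely behind fully covered cells costs a single visit at the root. Dividing `visits` by the number of output pixels gives a per-pixel figure comparable in spirit, though not in detail, to the averages in the caption.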

Figure 4: Interior view of the Empire State Building model. Hierarchical tiling took 3.21 seconds to tile this scene on a 4096 × 4096 grid and produce this 512 × 512 box-filtered image (75 MHz processor).
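Tiling on a 4096 × 4096 grid and producing a 512 × 512 image means a reduction of 4096 / 512 = 8 per axis, so with a box filter each output pixel is the unweighted mean of an 8 × 8 block of 64 samples. A minimal sketch of that step, assuming the samples are stored as a square list of lists of intensities (the function name `box_downsample` is hypothetical):

```python
from typing import List


def box_downsample(samples: List[List[float]], factor: int) -> List[List[float]]:
    """Average each factor x factor block of a square sample grid into one pixel."""
    n = len(samples)
    if n % factor != 0:
        raise ValueError("grid size must be a multiple of the filter factor")
    m = n // factor
    out = [[0.0] * m for _ in range(m)]
    for py in range(m):
        for px in range(m):
            total = 0.0
            # sum the factor x factor block of samples under output pixel (px, py)
            for sy in range(py * factor, (py + 1) * factor):
                total += sum(samples[sy][px * factor:(px + 1) * factor])
            out[py][px] = total / (factor * factor)
    return out
```

Called with `factor=8` on a 4096 × 4096 grid, this yields a 512 × 512 image. Pure Python is far too slow at that size, so treat the sketch as a statement of the arithmetic rather than a practical filter.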


Figure 5: Hierarchical tiling took .36 seconds to tile this simple model on a 4096 × 4096 grid and produce this 512 × 512 box-filtered image (75 MHz processor).

Figure 6: A frame from “Naked Empire,” an animation produced for the Siggraph ’96 Electronic Theater [Greene96]. The model of this 408-story building consists of approximately 167 million quadrilaterals, 765,000 of which are visible in this frame. This 512 × 512 frame was produced by tiling and filtering on a jittered 4096 × 4096 grid. Jitter converted aliasing to noise, which is evident in complex regions of the image. Rendering took 5.15 minutes on a 75 MHz processor.

Figure 7: Another frame from “Naked Empire.” Note that the building model has no outer shell, making it possible to see deep inside. Rendering time for this frame was 34 seconds (75 MHz processor).
