accumulative anti-aliasing - nvidia...accumulative anti-aliasing eric enderton eric lum christian...

2
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). SIGGRAPH 2015 Talks, August 09 – 13, 2015, Los Angeles, CA. ACM 978-1-4503-3636-9/15/08. http://dx.doi.org/10.1145/2775280.2792515 Accumulative Anti-Aliasing Eric Enderton Eric Lum Christian Rouet Oleg Kuznetsov NVIDIA (a) (b) (c) Figure 1: An ACAA frame, with close-ups comparing (a) Single-sampled image. (b) 8x ACAA image. (c) 8x MSAA image. ACAA renders opaque surfaces nearly identically to MSAA but uses half the framebuffer memory. Scene courtesy of Kishonti Informatics. Abstract Accumulative anti-aliasing (ACAA) is a simple modification of forward-rendered multi-sample anti-aliasing (MSAA). It produces the same image quality but consumes half as much multi-sample framebuffer memory, and reduces both render time and off-chip bandwidth by 20% to 30%. ACAA stores multiple depth samples, computed by a depth-only pre-pass, but stores only one color sam- ple per pixel, which is used to accumulate final color as the sum of shaded fragment colors weighted by visibility. ACAA makes higher sample rates practical, improving image quality. 1 Algorithm and Results Many anti-aliasing schemes track color and coverage information for multiple visible fragments per pixel, resolving this data to a final color after all scene geometry has been processed. We observe that if visibility is computed and stored in an earlier pass, then color storage can be reduced to one color accumulator per pixel. Each pixel’s final color is summed on the fly: As each visible fragment is shaded, its contribution is weighted by the stored visibility. The ubiquitous example is multi-sample anti-aliasing (MSAA), which stores both depth and color at each of several samples within a pixel. While modern hardware graphics pipelines support MSAA with up to 8 samples per pixel, the memory and performance penal- ties are generally considered prohibitive for consumer applications such as computer games, and even 4 samples is considered a high- end setting. For forward-rendered game engines, we propose ac- cumulative anti-aliasing (ACAA), which computes multi-sampled color using one color per pixel. First, multi-sampled depth infor- mation is rendered during a z pre-pass; this pass is already a typi- cal game engine optimization, to avoid overshading. Second, dur- ing the shading pass, each visible fragment is tested against that z-buffer. The resulting post-z coverage indicates the weight of the fragment’s color in the accumulated pixel color. The result is improved performance. Memory savings are half or more the size of the MSAA framebuffer, which can be quite sub- stantial: at 4K resolution, an 8x MSAA framebuffer consumes 759 MB, while 8x ACAA requires only 316 MB. ACAA is particularly efficient on recent GPUs that include hardware support for both post-z coverage and target-independent rasterization (TIR). Here, ACAA boosts performance significantly, recovering 50% of the added frame time of MSAA rendering vs single-sampled rendering, a savings of 20-30% of total render time in our tests. We also mea- sured total bytes read and written to the framebuffer by the GPU during one frame. ACAA recovers 50% of the MSAA penalty here as well, which was 30% of total bandwidth used. This is significant because GPU performance is limited by total power dissipation, and off-chip memory access is one of the highest-power operations. 2 Limitations and Extensions Without adequate bit precision, intermediate sums of colors can introduce numerical errors. More significantly, if two fragments have numerically coincident z values at a sample, the sample may be double counted. We describe two efficient solutions, one using stencil tests and one using saturated alpha blending. It is not clear how to anti-alias transparent surfaces with ACAA, as it does not readily distinguish between fragments that abut and fragments that overlap. The water effects in Figure 1 are rendered single-sampled. ACAA does not work well for deferred rendering, as it would merge shader inputs rather than shader outputs. For mobile GPUs that use tiled rendering to combat MSAA penal- ties, ACAA lets larger tiles fit in memory, improving efficiency. ACAA extends naturally to higher image quality. Better, wider fil- ters can be implemented by allowing each fragment to contribute to a neighborhood of pixels. High sample rates can be achieved with- out extra memory, either with hardware support, or by accumulating multiple ACAA passes with custom sampling patterns.

Upload: others

Post on 31-Aug-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accumulative Anti-Aliasing - NVIDIA...Accumulative Anti-Aliasing Eric Enderton Eric Lum Christian Rouet Oleg Kuznetsov NVIDIA Appendix: Source Code Source code for the basic ACAA algorithm

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). SIGGRAPH 2015 Talks, August 09 – 13, 2015, Los Angeles, CA. ACM 978-1-4503-3636-9/15/08. http://dx.doi.org/10.1145/2775280.2792515

Accumulative Anti-Aliasing

Eric Enderton Eric Lum Christian Rouet Oleg Kuznetsov

NVIDIA

(a) (b) (c)

Figure 1: An ACAA frame, with close-ups comparing (a) Single-sampled image. (b) 8x ACAA image. (c) 8x MSAA image. ACAA rendersopaque surfaces nearly identically to MSAA but uses half the framebuffer memory. Scene courtesy of Kishonti Informatics.

Abstract

Accumulative anti-aliasing (ACAA) is a simple modification offorward-rendered multi-sample anti-aliasing (MSAA). It producesthe same image quality but consumes half as much multi-sampleframebuffer memory, and reduces both render time and off-chipbandwidth by 20% to 30%. ACAA stores multiple depth samples,computed by a depth-only pre-pass, but stores only one color sam-ple per pixel, which is used to accumulate final color as the sumof shaded fragment colors weighted by visibility. ACAA makeshigher sample rates practical, improving image quality.

1 Algorithm and Results

Many anti-aliasing schemes track color and coverage informationfor multiple visible fragments per pixel, resolving this data to a finalcolor after all scene geometry has been processed. We observe thatif visibility is computed and stored in an earlier pass, then colorstorage can be reduced to one color accumulator per pixel. Eachpixel’s final color is summed on the fly: As each visible fragmentis shaded, its contribution is weighted by the stored visibility.

The ubiquitous example is multi-sample anti-aliasing (MSAA),which stores both depth and color at each of several samples withina pixel. While modern hardware graphics pipelines support MSAAwith up to 8 samples per pixel, the memory and performance penal-ties are generally considered prohibitive for consumer applicationssuch as computer games, and even 4 samples is considered a high-end setting. For forward-rendered game engines, we propose ac-cumulative anti-aliasing (ACAA), which computes multi-sampledcolor using one color per pixel. First, multi-sampled depth infor-mation is rendered during a z pre-pass; this pass is already a typi-cal game engine optimization, to avoid overshading. Second, dur-ing the shading pass, each visible fragment is tested against that

z-buffer. The resulting post-z coverage indicates the weight of thefragment’s color in the accumulated pixel color.

The result is improved performance. Memory savings are half ormore the size of the MSAA framebuffer, which can be quite sub-stantial: at 4K resolution, an 8x MSAA framebuffer consumes 759MB, while 8x ACAA requires only 316 MB. ACAA is particularlyefficient on recent GPUs that include hardware support for bothpost-z coverage and target-independent rasterization (TIR). Here,ACAA boosts performance significantly, recovering 50% of theadded frame time of MSAA rendering vs single-sampled rendering,a savings of 20-30% of total render time in our tests. We also mea-sured total bytes read and written to the framebuffer by the GPUduring one frame. ACAA recovers 50% of the MSAA penalty hereas well, which was 30% of total bandwidth used. This is significantbecause GPU performance is limited by total power dissipation, andoff-chip memory access is one of the highest-power operations.

2 Limitations and Extensions

Without adequate bit precision, intermediate sums of colors canintroduce numerical errors. More significantly, if two fragmentshave numerically coincident z values at a sample, the sample maybe double counted. We describe two efficient solutions, one usingstencil tests and one using saturated alpha blending.

It is not clear how to anti-alias transparent surfaces with ACAA,as it does not readily distinguish between fragments that abut andfragments that overlap. The water effects in Figure 1 are renderedsingle-sampled. ACAA does not work well for deferred rendering,as it would merge shader inputs rather than shader outputs.

For mobile GPUs that use tiled rendering to combat MSAA penal-ties, ACAA lets larger tiles fit in memory, improving efficiency.

ACAA extends naturally to higher image quality. Better, wider fil-ters can be implemented by allowing each fragment to contribute toa neighborhood of pixels. High sample rates can be achieved with-out extra memory, either with hardware support, or by accumulatingmultiple ACAA passes with custom sampling patterns.

Page 2: Accumulative Anti-Aliasing - NVIDIA...Accumulative Anti-Aliasing Eric Enderton Eric Lum Christian Rouet Oleg Kuznetsov NVIDIA Appendix: Source Code Source code for the basic ACAA algorithm

Accumulative Anti-Aliasing

Eric Enderton Eric Lum Christian Rouet Oleg Kuznetsov

NVIDIA

Appendix: Source Code

Source code for the basic ACAA algorithm is publicly avail-able as part of NVIDIA’s GameWorks SDK. The codesample is named BlendedAA, and is described here:http://docs.nvidia.com/gameworks/index.html#gameworkslibrary/graphicssamples/opengl_samples/blendedaasample.htm.

To download, click on ”OpenGL Graphics and ComputeSamples” here: http://developer.nvidia.com/gameworksdownload.