GDC March 1999 Scalability - R Huddy
Scalability
Advanced D3D Programming
Richard Huddy
Basic Objectives
• To produce the best experience on every user’s machine
• To exploit all of the resources available
• To cope with a broad spread of hardware
• To avoid ‘bottoming out’ during the shelf-life of the game / engine
What is a high-end PC?
A 125+ mega-texel device
A 125+ mega-pixel device
A fast CPU ( >= 350MHz)
AGP 2X/4X Bus
Lots of system RAM ( >= 64MB)
Huge frame buffers (16 to 32 MB)
Multi-Texture at low cost
Power Trends
[Chart: CPU Speed and Fill Rate (vertical axis 0 to 500) for each hardware generation: 1st Gen (Virge), 2nd Gen (Voodoo), 3rd Gen (TNT), 4th Gen (???)]
Appreciate the absolute values and the ratios.
So what’s the problem?
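Second generation hardware:
[Timeline diagram: CPU work (A, B, C) and graphics work (a, b, c) against time, from BeginScene() to EndScene()]
Third generation hardware:
[Timeline diagram: CPU work (A, B, C) and graphics work (a, b, c) against time, ending at EndScene()]
Wow, 10% faster!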
What can you do to help?
Scalability is the key:
• Run at higher screen resolutions
• Run at higher color depths
• Use more complex rendering techniques on good hardware
• Ship multiple geometry models
• Protect your CPU
• Unlock the frame rate
Higher Screen Resolutions
1) Include direct support for higher resolution modes (uses lots of disk space).
2) Store high resolution art and filter down to produce lower resolution art.
3) Store low resolution art and pixel double:
– If you have art at 512x384, use it for 1024x768
– If you have art at 640x480, use it on 1280x1024 (but only use a 1280x960 viewport)
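Option 3 can be sketched as a nearest-neighbour 2x upscale. This is only an illustration: the function name pixel_double and the 8-bit-per-pixel buffer layout are assumptions, not something taken from the talk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy each source pixel into a 2x2 block of the
   destination. dst must have room for (2*w) * (2*h) pixels. */
static void pixel_double(const uint8_t *src, size_t w, size_t h, uint8_t *dst)
{
    size_t dw = 2 * w;  /* destination row length in pixels */
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            uint8_t p = src[y * w + x];
            uint8_t *out = dst + (2 * y) * dw + 2 * x;
            out[0] = p;       /* top-left */
            out[1] = p;       /* top-right */
            out[dw] = p;      /* bottom-left */
            out[dw + 1] = p;  /* bottom-right */
        }
    }
}
```

Doubling 640x480 art this way yields 1280x960, which is why the 1280x1024 mode is used with a 1280x960 viewport.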
Higher Color Depths
• Runs at much the same speed but gives the user a much richer experience
• Uses frame buffer memory constructively
• You can re-use the previous 16 bit assets
• The main performance loss in true color is often due to texture management
But beware the Frame Buffer + Z Buffer depth constraint on Riva TNT
Complex Rendering Techniques - I
• Environment Mapping
– Beware of spending too much CPU on this.
• Dual Texture Lighting
• Bump Mapping
• Use more alpha transparency
– But see also “Alpha sort issues” later on…
Please try to use the extra fill rate!
Complex Rendering Techniques - II
• Trilinear mipmapping for almost everything
• Use Detail textures
• Large textures for extra realism
• 32 bit textures - where it’s a quality win
• Compressed textures as long as quality is not compromised
Protect your CPU
The big ones:
• __ftol and other ‘type conversion’ nightmares
• sqrt()
– that’ll be seventy cycles please...
• Reciprocal square root
– One hundred and nine cycles through the FPU…
• Transform and lighting (more on that later)
Removing __ftol
• Remember that the compiler doesn’t have a choice but you can check the output
• Write your own inline assembler conversion routine if…
– You can accept differing rounding rules
This doesn’t break the optimiser!
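As a portable stand-in for such a routine, here is a minimal sketch of the well-known "magic number" conversion. It is not the inline assembler the slide has in mind. The name fast_ftoi is hypothetical, and the code assumes IEEE 754 single precision, the default round-to-nearest mode, and inputs with |f| below 2^22. Note the differing rounding rule: it rounds to nearest (ties to even), where a C cast truncates.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float -> int without a C cast.
   Adding 1.5 * 2^23 pushes the value into a range where one unit in the
   last place equals 1.0, so the rounded integer ends up in the low
   mantissa bits. Valid only for |f| < 2^22. */
static int32_t fast_ftoi(float f)
{
    volatile float biased = f + 12582912.0f;  /* 1.5 * 2^23; volatile forces a float store */
    float tmp = biased;
    int32_t bits;
    memcpy(&bits, &tmp, sizeof bits);         /* read the raw bit pattern */
    return bits - 0x4B400000;                 /* subtract the bits of 12582912.0f */
}
```

For example, fast_ftoi(2.7f) gives 3 whereas (int)2.7f gives 2, so this only fits code that can live with round-to-nearest.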
Replacement for sqrt()
• Sqrt seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances - but it’s sloooow
• Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
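The sample code mentioned above is not reproduced here. As an illustration of the kind of replacement involved, the sketch below uses the widely known integer-seed approximation followed by one Newton-Raphson step. The names fast_rsqrt and fast_normalize are hypothetical, IEEE 754 single precision is assumed, and the result is only accurate to roughly 0.2%.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/sqrt(x) for x > 0. */
static float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);  /* integer initial guess */
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - half * y * y);     /* one Newton-Raphson refinement */
    return y;
}

/* Normalise a 3-vector in place without calling sqrt(). */
static void fast_normalize(float v[3])
{
    float r = fast_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= r;
    v[1] *= r;
    v[2] *= r;
}
```

Whether the reduced accuracy is acceptable depends on the use: it is usually fine for lighting normals and environment map coordinates, less so for physics.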
Saturation Arithmetic (C)
Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):
if (f < 0.0)
    f = 0.0;
else if (f > 1.0)
    f = 1.0;
Saturation Arithmetic (Pentium)
if (*(long *)&f < 0)
    *(long *)&f = 0;
else if (*(long *)&f > 0x3f800000)
    *(long *)&f = 0x3f800000;
• This is faster on a Pentium class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.
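The pointer cast above assumes a 32-bit long and IEEE 754 floats. A sketch of the same integer comparison written with a fixed-width type and memcpy follows; the function name saturate_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clamp f to [0.0, 1.0] by comparing its bit pattern
   as a signed integer. Assumes IEEE 754 single precision. */
static float saturate_bits(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)                 /* sign bit set: negative value (or -0.0) */
        bits = 0;                 /* bit pattern of 0.0f */
    else if (bits > 0x3f800000)   /* larger than the bit pattern of 1.0f */
        bits = 0x3f800000;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

This works because non-negative IEEE floats order the same way as their bit patterns read as integers.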
Saturation Arithmetic (Pentium II)
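• Use the “cmov” instructions:
    mov   eax,[f]        ; load the float's bit pattern
    xor   ecx,ecx        ; ecx = bits of 0.0
    mov   edx,3f800000h  ; edx = bits of 1.0
    cmp   eax,ecx
    cmovl eax,ecx        ; negative (sign bit set) -> 0.0
    cmp   eax,edx
    cmovg eax,edx        ; above 1.0 -> 1.0
    mov   [f],eax        ; store the clamped value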
Faster since unpredictable branches are the bottleneck here. Unavailable on a Pentium.
Unlock the Frame Rate
• It’s essential that your physics model can run at high refresh rates.
– At least 100fps
• 30 or 60 fps limits are not acceptable and lead to flat performance on high end hardware
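One common way to let rendering run unlocked while the physics stays well behaved is a fixed-timestep accumulator. The sketch below is an assumption about how this could be structured, not the method from the talk; Body, PHYSICS_STEP, advance_frame and simulate are all hypothetical names, and the 200 Hz physics rate is just an example value.

```c
/* Hypothetical one-dimensional body under gravity. */
typedef struct { double pos, vel; } Body;

#define PHYSICS_STEP (1.0 / 200.0)  /* example physics rate: 200 updates per second */

static void physics_step(Body *b, double h)
{
    b->vel += -9.8 * h;
    b->pos += b->vel * h;
}

/* Advance the simulation by one rendered frame lasting frame_dt seconds.
   Leftover time stays in *accumulator for the next frame.
   Returns the number of physics steps taken. */
static int advance_frame(Body *b, double *accumulator, double frame_dt)
{
    int steps = 0;
    *accumulator += frame_dt;
    while (*accumulator >= PHYSICS_STEP) {
        physics_step(b, PHYSICS_STEP);
        *accumulator -= PHYSICS_STEP;
        ++steps;
    }
    return steps;
}

/* Run the body for a number of seconds at a given render rate. */
static double simulate(double fps, double seconds)
{
    Body b = { 0.0, 10.0 };
    double acc = 0.0;
    int frames = (int)(fps * seconds + 0.5);
    for (int i = 0; i < frames; ++i)
        advance_frame(&b, &acc, 1.0 / fps);
    return b.pos;
}
```

With this structure simulate(30.0, 1.0) and simulate(100.0, 1.0) land on nearly the same position, so the frame rate can rise with the hardware without changing gameplay.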
The Value of Batching
Case Specifics:
• The average # of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40fps
• Removing state changes to raise the average PPC to ~50 produced 58fps
– Most of the removed state changes were “reasonable”, i.e. not logically redundant
– The changes did not reduce visual quality at all
– PPC of 200 is optimal
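One way to raise the polys-per-call figure is to sort what you draw by render state so that neighbouring items can share a call. The sketch below only counts the resulting batches and makes no Direct3D calls; DrawItem, by_texture and count_batches are hypothetical names, and a single texture id stands in for the full render state.

```c
#include <stdlib.h>

/* Hypothetical draw record: one texture id stands in for all render state. */
typedef struct { int texture_id; int poly_count; } DrawItem;

static int by_texture(const void *a, const void *b)
{
    int ta = ((const DrawItem *)a)->texture_id;
    int tb = ((const DrawItem *)b)->texture_id;
    return (ta > tb) - (ta < tb);
}

/* Number of draw calls needed if consecutive items that share a texture
   are merged into a single call. */
static int count_batches(const DrawItem *items, size_t n)
{
    int batches = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++batches;
    return batches;
}
```

Calling qsort(items, n, sizeof items[0], by_texture) before submission turns an interleaved list such as textures 1, 2, 1, 2, 1, 2 from six batches into two.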
Alpha Sort Issues
The “standard” solution is…
1) Draw all non-alpha polys (sort by texture)
2) Draw all alpha polys in back to front order with Z compare enabled and Z update disabled. This copes with overlapping alpha polys but you can’t sort by texture. (Intersection requires decimation).
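Step 2 can be sketched as a depth sort. AlphaPoly and its fields are hypothetical, and the code assumes a larger view_z means farther from the camera; the Z compare and Z update states themselves are set through the API and are not shown.

```c
#include <stdlib.h>

/* Hypothetical alpha polygon record; larger view_z = farther away. */
typedef struct { float view_z; int id; } AlphaPoly;

/* Order farthest first so nearer polys blend over the ones behind them. */
static int back_to_front(const void *a, const void *b)
{
    float za = ((const AlphaPoly *)a)->view_z;
    float zb = ((const AlphaPoly *)b)->view_z;
    return (za < zb) - (za > zb);
}

static void sort_alpha_polys(AlphaPoly *polys, size_t n)
{
    qsort(polys, n, sizeof polys[0], back_to_front);
}
```

Because the order is now dictated by depth, texture changes between neighbouring polys can no longer be sorted away, which is the cost the slide points out.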
Alpha Sort with Bounding Boxes
When you are ready to draw your alpha polys, draw non-overlapping sets using the sort-by-texture technique as before.
[Diagram: alpha poly sets A, B and C inside the viewport]
Here, you can safely draw all of A before any of B or C…
B&C need sorting
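The overlap check can be sketched with screen-space bounding boxes. ScreenBox, boxes_overlap and is_isolated are hypothetical names, and the boxes are assumed to be axis-aligned rectangles in viewport coordinates.

```c
/* Hypothetical screen-space bounding box of one set of alpha polys. */
typedef struct { float min_x, min_y, max_x, max_y; } ScreenBox;

static int boxes_overlap(const ScreenBox *a, const ScreenBox *b)
{
    return a->min_x <= b->max_x && b->min_x <= a->max_x &&
           a->min_y <= b->max_y && b->min_y <= a->max_y;
}

/* A set is isolated if its box touches no other set's box. An isolated
   set can be drawn on its own, sorted by texture instead of by depth. */
static int is_isolated(const ScreenBox *boxes, int n, int i)
{
    for (int j = 0; j < n; ++j)
        if (j != i && boxes_overlap(&boxes[i], &boxes[j]))
            return 0;
    return 1;
}
```

Sets that fail the test, like B and C in the picture, still need the back-to-front sort between them.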
Geometry - Part 1
• Use the DX6 Transform and Clip engine - it’ll be nearly as fast as your best efforts
• It takes advantage of CPU specific optimisations done by Intel, AMD etc.
• It uses the guard band clipping region to enhance performance
• Use the DX7 interface ASAP
Geometry - Part 2
• This gets you ready for hardware which can do the job much faster than the CPU
• Tell the chip designers if you need anything non-standard
• If you think DX is too slow then use a run-time benchmark to select between DX and your own code
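A run-time benchmark of this kind could look like the sketch below. Everything in it is an assumption for illustration: the two transform functions are trivial stand-ins for the DX path and your own code, and clock() is used only because it is portable, not because it is a good high-resolution timer.

```c
#include <time.h>

/* Signature shared by both candidate transform paths (hypothetical). */
typedef void (*TransformFn)(const float *in, float *out, int count);

/* Stand-in for the DX pipeline path: here it just scales each value. */
static void transform_path_dx(const float *in, float *out, int count)
{
    for (int i = 0; i < count; ++i)
        out[i] = in[i] * 2.0f;
}

/* Stand-in for your own transform code: same result, different arithmetic. */
static void transform_path_custom(const float *in, float *out, int count)
{
    for (int i = 0; i < count; ++i)
        out[i] = in[i] + in[i];
}

static clock_t time_path(TransformFn fn, const float *in, float *out,
                         int count, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; ++i)
        fn(in, out, count);
    return clock() - start;
}

/* Time both paths on the same data and return the faster one.
   Ties go to the first path. */
static TransformFn pick_faster(TransformFn a, TransformFn b,
                               const float *in, float *out,
                               int count, int reps)
{
    clock_t ta = time_path(a, in, out, count, reps);
    clock_t tb = time_path(b, in, out, count, reps);
    return (tb < ta) ? b : a;
}
```

Run the selection once at start-up on representative geometry and keep the returned function pointer for the rest of the session.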
Geometry - Part 3
DIPVB()
• Use the DX pipeline for geometry which may be rendered
• Use your own transform for bounding boxes, collisions, portals etc
• Treat hardware T&L as
– Write only
– Not necessarily pixel identical to CPU T&L
Geometry - Part 4
• Consider choosing between models at game start-up time
• More complex Geometry should be several times more complex
• Introduce some LOD management
• Your artists are probably generating more complex models and then throwing them away
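A minimal sketch of distance-based LOD selection follows. LodLevel and select_lod are hypothetical names, and the polygon counts and distances in the usage note are made-up example values that keep each level several times more complex than the next.

```c
/* Hypothetical LOD table entry: use this model while the object is no
   farther away than max_distance. */
typedef struct { int poly_count; float max_distance; } LodLevel;

/* Return the index of the first level whose range covers the distance.
   Levels are ordered from most to least detailed; the last level is the
   fallback for anything farther away. */
static int select_lod(const LodLevel *levels, int n, float distance)
{
    for (int i = 0; i < n - 1; ++i)
        if (distance <= levels[i].max_distance)
            return i;
    return n - 1;
}
```

For example, with levels of 4000, 1000 and 250 polys covering distances up to 10, up to 50 and beyond, an object at distance 30 gets the 1000 poly model. The same table could be trimmed at game start-up so that low-end machines never load the most detailed level.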
Lighting - Part 1
• If the DX Lighting model is good enough then there are people who want to help you
• Multi-texture shadow maps and light maps can be very fast now
– remember that (multi-pass != multi-texture)
• Tell the chip companies what you need
Lighting - Part 2
• Support more lights
• Use a richer set of light types
• Scale with available power
• If you have more complex geometry you get better lighting quality
Summary
• Use the D3D pipeline as much as possible
• ‘Use’ the CPU carefully - ‘Abuse’ the fill rate
• Get on board with DX7
• Offer the richest experience possible
• You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’
Questions
Richard Huddy
www.nvidia.com