xenos: xbox360 gpu - lthfileadmin.cs.lth.se/cs/personal/michael_doggett/talks/unc-xenos...xxx 48...

36
Xenos: XBOX360 GPU Michael Doggett Architect November 14, 2005

Upload: lamphuc

Post on 09-Jun-2018

397 views

Category:

Documents


11 download

TRANSCRIPT

Page 1: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

Xenos: XBOX360 GPU

Michael DoggettArchitectNovember 14, 2005

Page 2: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

2

Overview

• Xenos• Rendering performance• GPU architecture• Unified shader• Memory Export• Texture/Vertex Fetch• HDR rendering • Displaced subdivision surfaces

• Graphics Hardware • GPU Realities• Graphics APIs• GPU Research

Page 3: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

3

ATI - Driving the Visual Experience Everywhere

• Products from cell phones to super computers

ColorPhoneDisplay

Gaming

Gaming Console

Multimedia

Workstation Multi Monitor Display

Integrated

Embedded Display

Digital TV

Notebook

Page 4: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

4

System architecture

GPU UNIFIEDMEMORY

32GB/s

700MHz128bit

GDDR3

22.4GB/s

Northbridge

Southbridge2x PCIE500MB/s

2x 10.8 GB/s

DAUGHTERDIE

CPU

Page 5: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

5

Rendering performance

• GPU to Daughter Die interface• 8 pixels/clk

• 32BPP color• 4 samples Z - Lossless compression

• 16 pixels/clk – Double Z• 4 samples Z - Lossless compression

DAUGHTERDIE

GPU32GB/s

Page 6: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

6

Rendering performance

• Alpha and Z logic to EDRAM interface• 256GB/s• Color and Z - 32 samples

• 32bit color, 24bit Z, 8bit stencil

• Double Z - 64 samples• 24bit Z, 8bit stencil

DAUGHTER DIE

10MB EDRAM256GB/s

8pix/clk, 4x MSAA, Stencil and Z test, Alpha blending

Page 7: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

7

Unified Shader

GPU architecture

Vertex Pipeline

Pixel Pipeline

Display Pixels

Primitive SetupClipper

RasterizerHierarchical Z/S

Index Stream GeneratorTessellator

OutputBuffer

GPU

DAUGHTERDIE

Memory Export

UNIFIEDMEMORY

Texture/Vertex Fetch

Page 8: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

8

Unified Shader

• A revolutionary step in Graphics Hardware

• One hardware design that performs both Vertex and Pixel shaders

• Vertex processing power

Pixel Shader

Vertex Shader

Vertices

Pixels

PixelsVertices

Unified Shader

Page 9: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

9

Unified Shader

• GPU based vertex and pixel load balancing• Better vertex and pixel resource usage

• Union of features • E.g. Control flow, indexable constant, …

• DX9 Shader Model 3.0+

PixelsVertices

Unified Shader3 SIMDs

Vec4 Scalar

48 ALUs

Page 10: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

10

Memory Export

• Shader output to a computed address• Virtualize shader resources - multipass• Shader debug• Randomly update data structures from

Vertex or Pixel Shader• Scatter write

GPU

EDRAM

MemoryExport

UNIFIEDMEMORY

Page 11: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

11

Texture/Vertex Fetch

• Shader fetch can be either:• Texture fetch (16 units)

• LOD computation• Linear, Bi-linear, Tri-linear Filtering• Uses cache optimized for 2D, 3D texture data

with varying pixel sizes• Unified texture cache

• Vertex fetch (16 units)• Uses cache optimized for vertex-style data

Page 12: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

12

Texture Arrays

• Generalization of 6 faced cube maps to 64 faces

• Each face is a 2D mipmapped surface

• Not volume texture• Applications

• Animation frames• Varying skins for

instanced characters / objects

• Character shadow texture flipbook animations

0

1

63

Page 13: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

13

Texture array application :Unique seeds for instanced shading

Page 14: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

14

Texture array application :Hundreds of instanced characters

Page 15: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

15

Texture compression

• All of the old DXT formats• DXT1, DXT2/3, DXT4/5

• Several new formats (variations on above formats)• DXT3A• DXT3A as 1111• DXT5A• DXN • CTX1

Page 16: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

16

RRRRRGGGGGGBBBBB

RRRRRGGGGGGBBBBB32 Bits

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

32 Bits

Color Block

DXT1

• Two 565 colors• Each texel lies on line

segment connecting these two colors in RGB space

• Optionally has 1-bit alpha channel depending on relative ordering of RGB colors

Page 17: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

17

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Alpha Block

RRRRRGGGGGGBBBBB

RRRRRGGGGGGBBBBB

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

Color Block

64 Bits

DXT2 DXT3

32 Bits

32 Bits

Page 18: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

18

DXT4 DXT5

AAAAAAAA

AAAAAAAA16 Bits

xxx

48 Bitsxxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

RRRRRGGGGGGBBBBB

RRRRRGGGGGGBBBBB

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

Color Block Alpha Block

32 Bits

32 Bits

Page 19: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

19

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Scalar Block

64 Bits

DXT3A

• 4-bit scalar replicated into all channels in pixel shader

Page 20: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

20

RGBA

64 BitsRGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

RGBA

• 1 bit per channel texture• Filters as you would expect• Useful for concise masks / logical textures

DXT3A as 1111

Page 21: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

21

DXT5A

XXXXXXXX

XXXXXXXX16 Bits

xxx

48 Bitsxxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

Scalar Block

Page 22: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

22

DXN

YYYYYYYY

YYYYYYYY16 Bits

xxx

48 Bitsxxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

y Block

XXXXXXXX

XXXXXXXX

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

xxx

x Block

• Same as 3Dc normal compression • Two-channel version of DXT5A• Good for tangent-space normal maps

16 Bits

48 Bits

Page 23: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

23

XXXXXXXXYYYYYYYY

XXXXXXXXYYYYYYYY

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

xx

XY Block

CTX1

• Two-channel format with channels sharing two-bit interpolation value

• Lower quality but more concise tangent-space normal maps than DXN

32 Bits

32 Bits

Page 24: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

24

• Both have 8.8 anchor points• CTX1

• x and y share 2-bit interpolation value

• 4 representable normals per 4×4 block of texels

• 4 bits per texel• DXN

• Independent 3-bit interpolation values for x and y

• 64 representable normals per 4×4 block of texels

• 8 bits per texel

CTX1 vs. DXN

Upper hemisphere of normals

CTX1

DXN

Page 25: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

25

High Dynamic Range Rendering

• Special compact HDR render target format:• Just 32 bits: 7e3 7e3 7e3 2• Compatible with multisample antialiasing• R, G and B are unsigned floating point

numbers• 7 bits of mantissa• 3 bits of exponent• Range of 0..16

• 2 bits of alpha channel

• 16-bit fixed point at half speed• With full blending

Page 26: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

26

Displaced subdivision surfaces

• Prototype algorithm• Vineet Goel, ATI research Orlando

Page 27: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

27

Displaced subdivision surface algorithm

• Tessellator: • Generates 64 vertices for each patch that are

fed into the VS.

• Vertex Shader: • Reads in one-ring, computes Stam’s method

using precomputed table lookup • Adds Displacement map

• Pixel Shader• Adds bump mapping and surface color

Page 28: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

28

Displaced subdivision surfaces

Base mesh• Used by Tessellatorto generate vertices

Subdivision surface

Displaced subdivision surface

Page 29: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

29

Graphics Hardware

Page 30: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

30

GPU realities

• Die size/cost/yield• One die, multiple products• Decreasing technology, increasing

power• How to cool large chips ?• Ensure design is scalable• Refinements and enhancements of the

current pipeline

Page 31: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

31

Graphics APIs

• Windows Vista• Virtual Memory• Improved state change efficiency

• Vista DirectX9• ClearType

• DirectX10• Unified shader programming model• Geometry Shader

• Access to entire triangle and adjacent vertices• Output to array of render targets, cube maps

• Stream output from Geometry Shader

Page 32: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

32

GPU research

• Improved multi-GPU performance and antialiasing• CrossFire• Multi-chip, multi-core

• Higher Order Surfaces • Subdivision surfaces, NURBS

• General Purpose GPU (GPGPU)• Driving improved 32bit float performance• Xenos shader scatter write

Page 33: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

33

GPU research - Shadows

• Performance enhancements for stencil shadows and shadow buffers

• Hierarchical rendering• User low resolution shadow map to find areas

that require detail rasterization• Chan, Durand EGSR04

• Stencil shadow volumes using 8x8 pixel tiles• Aila, Akenine-Möller GH04

• Improving Z, stencil performance

Page 34: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

34

GPU research – Ray Tracing

• Accelerating ray tracing on GPUs• Build and update acceleration structures

• Xenos shader scatter write

• GPU evolution affected by ray tracing

Page 35: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

35

Conclusion

• Xenos• Architecture• Features

• Graphics Hardware• Future• Research

Page 36: Xenos: XBOX360 GPU - LTHfileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos...xxx 48 Bits xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx RRRRRGGGGGGBBBBB

36

Questions ?