![Page 1: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/1.jpg)
Department of Computer Science
Beyond CUDA/GPUs and Future Graphics Architectures
Karu Sankaralingam
University of Wisconsin-Madison
Adapted from “Toward A Multicore Architecture for Real-time Raytracing,” MICRO-41, 2008, by Venkatraman Govindaraju, Peter Djeu, Karthikeyan Sankaralingam, Mary Vernon, and William R. Mark.
![Page 2: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/2.jpg)
Real-time Graphics Rendering
Today
![Page 3: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/3.jpg)
Real-time Graphics Rendering
Today / Future
![Page 4: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/4.jpg)
Real-time Graphics Rendering
What are the problems?
How can we get there?
![Page 5: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/5.jpg)
What is wrong with this picture?
![Page 6: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/6.jpg)
GPU/CUDA
Z-buffer
![Page 7: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/7.jpg)
“Ptolemaic” Graphics Universe
[Diagram labels: Z-buffer, Arch, Application]
• Architecture and application are both optimized for the Z-buffer
• Difficult to render images with realistic effects
– Self-reflection, soft shadows, ambient occlusion
• Problems:
– Scene constraints, artist and programmer productivity
![Page 8: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/8.jpg)
Current Graphics Architectures
Courtesy: ACM Queue
![Page 9: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/9.jpg)
How did we get here?
• Hardware rasterizers and perspective-correct texture mapping (RIVA 128)
• Single-pass multitexture (TNT / TNT2)
• Register combiners: a generalization of multitexture (GeForce 256)
• Per-pixel shading (GeForce 2 GTS)
• Programmable hardware pixel shading
• Programmable vertex shading
• CUDA
![Page 10: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/10.jpg)
“Copernican” Graphics Universe
[Diagram labels: Algorithm (Ray-tracing), Arch, Application]
• Architecture and application revolve around the algorithm
• More general-purpose algorithm
• Easier to provide realistic effects
• Architecture can support other applications
![Page 11: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/11.jpg)
Future Graphics Architectures
Courtesy: ACM Queue
![Page 12: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/12.jpg)
Executive Summary: Copernicus System
Co-designed application, architecture and analysis framework
Path from specialized graphics architecture to more general purpose architecture.
A detailed characterization and analysis framework
Real-time frame rates possible for high quality dynamic scenes
![Page 13: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/13.jpg)
Outline
• Motivation
• Copernicus system
– Graphics Algorithm: Razor
– Architecture
– Evaluation and Results
• Summary
![Page 14: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/14.jpg)
Ray-tracing
[Diagram labels: Full scene, Cube, Cylinder]
• Simulates the behavior of light rays through a 3D scene
• Rays from eye to scene (primary rays)
• Rays from hit point to light (secondary rays)
• Acceleration structure (e.g., BSP tree) for efficiency
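The primary/secondary ray split above can be sketched in a few lines. This is only an illustration, not the Razor implementation: the scene is a list of spheres, there is no acceleration structure, and the names `hit_sphere` and `trace` are hypothetical.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Nearest positive distance along a unit-length ray, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c  # quadratic with a == 1 (unit direction)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-6:  # skip hits behind or at the ray origin
            return t
    return None

def trace(eye, direction, spheres, light):
    """Primary ray from the eye, then one secondary (shadow) ray to the light."""
    best = None
    for center, radius in spheres:
        t = hit_sphere(eye, direction, center, radius)
        if t is not None and (best is None or t < best):
            best = t
    if best is None:
        return "miss"
    hit = [e + best * d for e, d in zip(eye, direction)]
    to_light = [l - h for l, h in zip(light, hit)]
    dist = math.sqrt(sum(v * v for v in to_light))
    shadow_dir = [v / dist for v in to_light]
    for center, radius in spheres:
        t = hit_sphere(hit, shadow_dir, center, radius)
        if t is not None and t < dist:
            return "shadowed"  # something sits between the hit point and the light
    return "lit"
```

Every ray here tests every sphere; an acceleration structure exists precisely to avoid that linear scan.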
![Page 15: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/15.jpg)
Disadvantages of Raytracing
• The acceleration structure must be rebuilt every frame for dynamic scenes
• Irregular data accesses when traversing the acceleration structure
• Higher-resolution secondary-ray tracing computation
![Page 16: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/16.jpg)
Razor: A Dynamic Multiresolution Raytracer
[Diagram labels: Cube, Cylinder; Thread 1, Thread 2]
• Packet ray-tracer: traces a beam of rays instead of a single ray
– Opportunity for data-level parallelism
• Each thread lazily builds its own acceleration structure (KD-tree)
– Builds only the portion of the structure it needs
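A minimal sketch of lazy construction, assuming point primitives and a median split; Razor's actual KD-tree builder is not described on the slide, so `LazyKDNode`, `leaf_for`, and `count_built` are hypothetical names. A node is split only the first time a query walks through it.

```python
class LazyKDNode:
    def __init__(self, prims, depth=0):
        self.prims = prims      # primitives not yet split (here: 3D points)
        self.depth = depth
        self.axis = None
        self.split = None
        self.left = None
        self.right = None
        self.built = False

    def _build(self, leaf_size):
        """Split this node once; children stay unbuilt until visited."""
        self.built = True
        if len(self.prims) <= leaf_size:
            return              # small enough: remains a leaf
        self.axis = self.depth % 3
        ordered = sorted(self.prims, key=lambda p: p[self.axis])
        mid = len(ordered) // 2
        self.split = ordered[mid][self.axis]
        self.left = LazyKDNode(ordered[:mid], self.depth + 1)
        self.right = LazyKDNode(ordered[mid:], self.depth + 1)
        self.prims = None

    def leaf_for(self, point, leaf_size=2):
        """Walk toward `point`, building nodes on demand along the way."""
        node = self
        while True:
            if not node.built:
                node._build(leaf_size)
            if node.left is None:
                return node.prims
            node = node.left if point[node.axis] < node.split else node.right

def count_built(node):
    """Number of nodes that have actually been built so far."""
    if node is None or not node.built:
        return 0
    return 1 + count_built(node.left) + count_built(node.right)
```

With 16 points and a leaf size of 2, a complete tree has 15 nodes, but a single query builds only the 4 nodes on its path.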
![Page 17: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/17.jpg)
Razor: A Dynamic Multiresolution Raytracer
• Multi-level resolution to reduce secondary-ray computation
• Replicates the KD-tree to reduce synchronization across threads
– Hypothesis: duplication across threads will be limited
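One way to put a number on the duplication hypothesis is to compare the nodes built across all threads with the distinct nodes built. The sketch below assumes each thread reports the set of node IDs it built; the metric and the name `duplication_factor` are illustrative, not taken from the paper.

```python
def duplication_factor(nodes_per_thread):
    """Total nodes built across threads divided by distinct nodes built.

    1.0 means no thread repeated another's work; N means every one of
    N threads built the same nodes.
    """
    total = sum(len(nodes) for nodes in nodes_per_thread)
    unique = len(set().union(*nodes_per_thread))
    return total / unique if unique else 1.0
```

For example, two threads that built `{1, 2, 3}` and `{3, 4}` share only node 3, giving a factor of 5 / 4 = 1.25.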
![Page 18: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/18.jpg)
Razor Implementation
• Linux/x86
– Implemented Razor on Intel Clovertown
– Parallelized using pthreads
• Optimized with SSE instructions
• Sustains 1 FPS on this prototype system
• Helps develop algorithms
• Designed with future hardware in mind
![Page 19: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/19.jpg)
Razor’s Memory Usage
[Figure: memory footprint vs. number of threads]
![Page 20: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/20.jpg)
Parallel Scalability
[Figure: speedup (1 to 6) vs. number of threads (1 to 8) for the Courtyard, Fairyforest, Forest, Juarez, and Saloon scenes]
![Page 21: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/21.jpg)
Outline
• Motivation
• Copernicus system
– Graphics Algorithm: Razor
– Architecture
– Evaluation and Results
• Summary
![Page 22: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/22.jpg)
Architecture: Core
• In-order core
• Private L1 data and instruction caches
• Supports SIMD instructions
• SMT threads to hide memory latency
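Why SMT hides memory latency can be seen with a back-of-the-envelope model. This is a textbook-style approximation, not the model used in the paper: each thread is assumed to alternate between `compute_cycles` of useful work and `stall_cycles` waiting on memory, and the core can switch to any ready thread for free.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles a single-issue core does useful work.

    One thread is busy compute / (compute + stall) of the time; extra
    threads fill the stall slots until the core saturates at 1.0.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)
```

With a hypothetical 10 compute cycles per 30-cycle stall, one thread keeps the core 25% busy and four threads saturate it; more threads beyond that point add nothing under this model.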
![Page 23: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/23.jpg)
Architecture: Tile
• Shared L2 cache
• Shared accelerator for specialized instructions
![Page 24: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/24.jpg)
Architecture: Chip
![Page 25: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/25.jpg)
Architecture Razor Mapping
Assigned to Tile
Assigned to Core
![Page 26: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/26.jpg)
Outline
• Motivation
• Copernicus system
– Graphics Algorithm: Razor
– Architecture
– Evaluation and Results
• Summary
![Page 27: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/27.jpg)
Benchmark Scenes
[Images: Courtyard, Fairyforest, Forest, Juarez, Saloon]
![Page 28: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/28.jpg)
Evaluation Methodology
• Simulation with Multifacet/GEMS
– Simulate SSE instructions
– Simulate a full tile
– Validated with prototype data
  • Pin-based and PAPI-based performance counters
– Randomly selected regions of scenes
• Full chip
– Simulating the full chip is too slow
– Build a customized analytic model
![Page 29: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/29.jpg)
Analytical Model
• Core level
– Pipeline stalls
– Multiple threads
• Tile level
– L2 contention
• Chip level
– Main memory contention
• Compared with our simulation results
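The chip-level idea, compute throughput capped by main-memory bandwidth, can be sketched as a simple bottleneck model. This is an assumption-laden illustration and not the paper's analytic model: it ignores pipeline stalls and L2 contention, and every parameter name and number below is hypothetical.

```python
def chip_rays_per_second(tiles, rays_per_tile, dimms,
                         bytes_per_second_per_dimm, bytes_per_ray):
    """Chip throughput as the smaller of compute and memory limits."""
    compute_bound = tiles * rays_per_tile
    memory_bound = dimms * bytes_per_second_per_dimm / bytes_per_ray
    return min(compute_bound, memory_bound)
```

With made-up inputs (16 tiles at 5 million rays/s each, 8 GB/s per DIMM, 200 bytes of memory traffic per ray), one DIMM limits the chip to 40 million rays/s, while four DIMMs leave it compute-bound at 80 million rays/s.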
![Page 30: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/30.jpg)
Single Core Performance (Single Issue)
[Figure: IPC (0 to 2) for the Courtyard, Fairyforest, Forest, Juarez, and Saloon scenes with No SMT, 2 SMT, and 4 SMT]
![Page 31: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/31.jpg)
Single Core Performance (Dual Issue)
[Figure: IPC (0 to 2) for the Courtyard, Fairyforest, Forest, Juarez, and Saloon scenes with No SMT, 2 SMT, and 4 SMT]
![Page 32: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/32.jpg)
Single Tile Performance
[Figure: IPC (0 to 8) for the Courtyard, Fairyforest, Forest, Juarez, and Saloon scenes with No SMT, 2 SMT, and 4 SMT]
![Page 33: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/33.jpg)
Full Chip Performance
[Figure: million rays/second (0 to 80) vs. number of tiles (0 to 16) for Ideal and 1, 2, 3, and 4 DIMMs]
![Page 34: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/34.jpg)
So, Are we there yet?
![Page 35: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/35.jpg)
Results
• Goal: 100 million rays per second
• Achieved: 50 million rays per second
– With 16 tiles and 4 DIMMs
• Insights:
– 4 SMT single issue is ideal for this workload
– Good parallel scalability
– Razor’s physically-motivated optimizations work
• Potential for further architectural optimizations
– Shared accelerator
– Wide SIMD bundles
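To relate a rays-per-second figure to frame rate, a rough budget calculation helps. The resolution and rays-per-pixel values used below are assumptions for illustration only; the slides do not state them.

```python
def rays_per_second(width, height, fps, rays_per_pixel):
    """Ray throughput needed to sustain a given frame rate."""
    return width * height * fps * rays_per_pixel

def achievable_fps(ray_rate, width, height, rays_per_pixel):
    """Frame rate a given ray throughput can sustain."""
    return ray_rate / (width * height * rays_per_pixel)
```

Under the assumed 1024x768 frame with 4 rays per pixel, 30 frames per second needs about 94 million rays per second, which is close to the 100 million goal; 50 million rays per second would sustain just under 16 frames per second.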
![Page 36: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/36.jpg)
Outline
• Motivation
• Copernicus system
– Graphics Algorithm: Razor
– Architecture
– Evaluation and Results
• Summary
![Page 37: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/37.jpg)
Summary
• A transformation path to ray-tracing
– Ptolemaic universe to Copernican graphics universe
• Unique architecture design point
– Favors data redundancy and re-computation over synchronization
• Evaluation methodology interesting in its own right
– Prototype, simulation and analytical framework to design and evaluate future systems
• Future work
– Instruction specialization and shared accelerator design
– Tradeoffs with SIMD width and area
– Memory system
![Page 38: Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649f305503460f94c4a9a2/html5/thumbnails/38.jpg)
Other Questions?