GPU Architecture - Rochester Institute of...
![Page 1: GPU Architecture - Rochester Institute of Technologymeseec.ce.rit.edu/551-projects/fall2016/2-3.pdf · 3dfx Voodoo (1996) NVIDIA’s GeForce 256 (1999) ... + Fermi architecture: More](https://reader034.vdocuments.site/reader034/viewer/2022051523/5a7a75d07f8b9a05348bd0b1/html5/thumbnails/1.jpg)
GPU Architecture
Chris Vuong, Long Pham
Agenda
1. What is a GPU?
a. Dedicated vs Integrated GPUs
b. GPU structure vs CPU
2. How does a GPU work?
3. History & Evolution of GPUs
a. Background
b. 1980’s
c. 1990’s
d. 2000’s
e. 2010’s and beyond
4. OpenGL vs DirectX
5. Recent and Future Trends
1. What is a GPU?
- A graphics processing unit.
- Accelerates the creation of images.
- Used in embedded systems, mobile phones, desktops, workstations and game consoles.
a. Dedicated Card vs Integrated Card
Dedicated card:
- Interfaces with the motherboard through an expansion slot such as PCIe or AGP
- Easily replaceable or upgradeable
- Has its own RAM
- Typically produces much more heat than integrated GPUs (IGPs)
Multiprocessor Structure:
- N multiprocessors with M cores each
- SIMD (Single Instruction Multiple Data): cores share an instruction unit with the other cores in the same multiprocessor
- Shared memory, constant cache, and texture cache
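The SIMD arrangement above can be sketched in a few lines of Python. This is a toy model rather than a description of any real GPU: the `Multiprocessor` class, its `issue` method, and the choice of M = 4 are illustrative assumptions. One shared instruction unit issues a single operation, and each core applies it to its own data element.

```python
from typing import Callable, List


class Multiprocessor:
    """Toy model: M cores that share one instruction unit (hypothetical names)."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores

    def issue(self, instruction: Callable[[float], float],
              data: List[float]) -> List[float]:
        # The shared instruction unit issues ONE instruction; each core applies
        # it to its own element, so the data is consumed in groups of M.
        results: List[float] = []
        for start in range(0, len(data), self.num_cores):
            group = data[start:start + self.num_cores]
            results.extend(instruction(x) for x in group)
        return results


mp = Multiprocessor(num_cores=4)  # M = 4 is an arbitrary choice
doubled = mp.issue(lambda x: 2 * x, [1, 2, 3, 4, 5, 6, 7, 8])
print(doubled)
```

The Python loop runs the elements of a group one after another; on real hardware the cores of a multiprocessor would execute them at the same time.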
How is a pixel drawn on the screen?
Example: 1 million triangles * 100 pixels per triangle * 10 lights * 4 cycles per light computation = 4 billion cycles
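The arithmetic in this example can be checked directly. The sketch below uses the four figures from the slide; the 1 GHz clock and the core counts are assumptions added only to show why this workload calls for parallel hardware, and the scaling is idealized (no memory or scheduling overhead).

```python
# Figures taken from the slide's example.
triangles = 1_000_000
pixels_per_triangle = 100
lights = 10
cycles_per_light = 4

total_cycles = triangles * pixels_per_triangle * lights * cycles_per_light
print(f"{total_cycles:,} cycles")

# Assumption (not from the slides): a hypothetical 1 GHz clock and
# perfect scaling across cores.
clock_hz = 1_000_000_000
for cores in (1, 100):
    seconds = total_cycles / (clock_hz * cores)
    print(f"{cores:>3} core(s): {seconds:.2f} s")
```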
3. History & Evolution of GPUs
a) Background Information
b) 1980’s
c) 1990’s
d) 2000’s
e) 2010’s and beyond
f) Trends
a) Background Information
- Graphics pipeline: the sequence of stages through which graphics data passes
+ Usually implemented as a combination of CPU software and GPU cores
+ Transforms 3D coordinates into 2D pixel space
+ Stages in between: Geometry, Rendering
- Adopted by major GPU manufacturers such as NVIDIA and ATI
- The original GPUs performed only the Rendering stage of the pipeline
- Later GPUs took on more of the pipeline's tasks
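The step from 3D coordinates to 2D pixel space can be illustrated with a minimal pinhole-camera sketch. The function name `project_to_pixel`, the focal length, and the 640x480 screen are assumptions made for illustration; a real pipeline uses 4x4 matrices, clipping, and rasterization on top of this.

```python
from typing import Tuple


def project_to_pixel(point: Tuple[float, float, float],
                     focal_length: float = 1.0,
                     width: int = 640,
                     height: int = 480) -> Tuple[int, int]:
    """Map a 3D camera-space point to 2D pixel coordinates (simple pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Geometry stage: the perspective divide; visible points land in [-1, 1].
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Viewport transform: map [-1, 1] to pixel indices, with y pointing down.
    px = int((ndc_x + 1.0) * 0.5 * (width - 1))
    py = int((1.0 - ndc_y) * 0.5 * (height - 1))
    return px, py


print(project_to_pixel((0.0, 0.0, 5.0)))  # a point straight ahead of the camera
print(project_to_pixel((1.0, 1.0, 2.0)))  # a point up and to the right
```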
Early GPU Pipeline
b) 1980’s
- GPUs were little more than “integrated frame buffers”
- IBM Professional Graphics Controller (PGA)
+ One of the first 2D/3D video cards for the PC
+ Despite failing in the mass market, it became pivotal in GPU evolution
- More features were added to early GPUs by 1987
- Emergence of Silicon Graphics Inc. (SGI)
+ Created the graphics API (IRIS GL) that later led to OpenGL
c) 1990’s
- Generation 0:
+ SGI’s RealityEngine
+ Cheap Hardware & Games Combo
+ Performance improvements
- Generation I:
+ 3dfx Voodoo (1996)
c) 1990’s (continued)
- Generation II: Breakthroughs in the field
+ Released cards could perform the entire pipeline
+ Used Accelerated Graphics Port (AGP) in place of PCI
+ New graphics features
+ Propelled computer gaming and GPU hardware markets
+ Still left room for improvement: the pipeline was fixed-function
3dfx Voodoo (1996)
- 1 million transistors
- 4 MB of 64-bit DRAM
- Core clock 50 MHz
NVIDIA’s GeForce 256 (1999)
- 23 million transistors
- 32 MB of 128-bit DRAM
- Core clock 120 MHz
d) 2000’s
- Generation III:
+ GeForce 3, Radeon 8500: First GPUs with a programmable pipeline
+ Still limited in programmability
- Generation IV:
+ 2002 - GeForce FX, Radeon 9700: Fully programmable
- Generation V:
+ GeForce 6, Radeon X800
Improved GPU Pipeline
d) 2000’s (continued)
- Generation VI:
+ GeForce 8 series (namely GeForce 8800): Unified shaders
+ SM (Streaming Multiprocessor): performs vertex, pixel, and geometry calculations
- Generation VII:
+ Fermi architecture: More programmable
+ GPGPU (General Purpose GPU)
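One way to see what unified shaders buy is a small, idealized load-balancing model. Everything in this sketch is an assumption for illustration: the function names, the unit counts, and the workload numbers are hypothetical, and real scheduling is far more involved.

```python
import math


def frame_time_fixed(vertex_work: int, pixel_work: int,
                     vertex_units: int, pixel_units: int) -> int:
    # Dedicated units: each stage can only use its own hardware,
    # so the slower stage determines the total time.
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))


def frame_time_unified(vertex_work: int, pixel_work: int, units: int) -> int:
    # Unified units: any unit can take any kind of work.
    return math.ceil((vertex_work + pixel_work) / units)


# Hypothetical pixel-heavy workload on 8 units in total.
fixed = frame_time_fixed(vertex_work=20, pixel_work=140,
                         vertex_units=4, pixel_units=4)
unified = frame_time_unified(vertex_work=20, pixel_work=140, units=8)
print(fixed, unified)
```

In this model the dedicated vertex units sit idle while the pixel units are overloaded, whereas the unified pool spreads the same work across all eight units.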
Parallelism in CPUs vs GPUs
CPUs
- Task parallelism
- Multiple tasks map to multiple threads
- Tasks run different instructions
- 10s of relatively heavyweight threads run on 10s of cores
- Each thread is managed and scheduled explicitly
- Each thread has to be individually programmed
GPUs
- Data parallelism
- SIMD model
- Same instruction on different data
- 10,000s of lightweight threads on 100s of cores
- Threads are managed and scheduled by hardware
- Programming is done for batches of threads (e.g., one pixel shader per group of pixels, or per draw call)
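The two programming styles compared above can be contrasted in a short sketch. The task names `load_level` and `mix_audio` and the `shade` function are hypothetical; Python threads and `map` only imitate the structure of each model and say nothing about real performance.

```python
from concurrent.futures import ThreadPoolExecutor


# Task parallelism (CPU style): a few threads, each running DIFFERENT code.
def load_level() -> str:
    return "level loaded"


def mix_audio() -> str:
    return "audio mixed"


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(load_level), pool.submit(mix_audio)]
    task_results = [f.result() for f in futures]


# Data parallelism (GPU style): ONE function, written once,
# applied to every element of a batch.
def shade(pixel: int) -> int:
    return min(255, pixel * 2)  # hypothetical "brighten" shader


pixels = [10, 100, 200]
shaded = list(map(shade, pixels))  # a GPU would run these in parallel in hardware

print(task_results)
print(shaded)
```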
Why Unify?
e) 2010’s and beyond
- GPUs now consist of highly parallel, programmable cores
+ Essentially multi-core, general-purpose processors
- Products that exemplify this:
+ NVIDIA’s Fermi-based GTX 580
+ AMD’s Fusion (CPU + GPU = APU)
+ Intel’s Larrabee, and Sandy Bridge CPUs with an integrated GPU
4. OpenGL vs DirectX
- Both APIs rely on the traditional graphics pipeline.
- DirectX is more than a graphics API (OpenGL is graphics only): it also has tools for sound, music, input, networking and multimedia.
- DirectX is exclusive to the Windows platform, whereas OpenGL is cross-platform.
- OpenGL is often claimed to be faster because of a smoother, more efficient pipeline, although in practice this depends on the driver and the workload.
5. Recent and Future Trends
- Moore’s Law applies to GPU transistor counts as well
- Growth in the number of transistors has slowed recently due to manufacturing constraints
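As a rough worked example of this growth, the transistor counts quoted earlier in these slides for the 3dfx Voodoo (1996) and the GeForce 256 (1999) can be turned into a doubling time. Two data points are only an illustration, not a fit, and the variable names are chosen here for readability.

```python
import math

# Transistor counts quoted earlier in these slides.
voodoo_transistors, voodoo_year = 1_000_000, 1996
geforce256_transistors, geforce256_year = 23_000_000, 1999

growth = geforce256_transistors / voodoo_transistors
doublings = math.log2(growth)
months_per_doubling = 12 * (geforce256_year - voodoo_year) / doublings

print(f"{growth:.0f}x in {geforce256_year - voodoo_year} years, "
      f"one doubling every {months_per_doubling:.1f} months")
```

Between these two cards the count doubled considerably faster than the two-year period usually associated with Moore's Law.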
5. Recent and Future Trends (continued)
- Unified Shader Architecture (centered on a flexible processor core).
- Extremely highly parallel stream processing.
- Greater programmability.
References
Sources:
http://mcclanahoochie.com/blog/wp-content/uploads/2011/03/gpu-hist-paper.pdf
http://www.cs.virginia.edu/~gfx/papers/pdfs/59_HowThingsWork.pdf
http://s09.idav.ucdavis.edu/talks/02_kayvonf_gpuArchTalk09.pdf
http://cs.nyu.edu/courses/fall15/CSCI-GA.3033-004/
http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Images:
http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
http://www.hardwarezone.com.sg/feature-nvidia-geforce-8800-gtx-gts-g80-worlds-first-dx10-gpu/embracing-unified-shader-architecture
https://www.cs.utah.edu/~jeffp/teaching/MCMD/S20-GPU.pdf
https://www.directron.com/blog/what-is-pcie/