Throughput Oriented Architectures
DESCRIPTION
Computer architecture article related to throughput-oriented architectures.
TRANSCRIPT
Throughput Oriented Architectures
Contents
• Throughput-oriented processors
• Hardware multithreading
• Many simple processing units
• SIMD execution
• GPUs
• NVIDIA GPU architecture
• Throughput-oriented programming
• Conclusion
Key Points:
• Throughput-oriented processors tackle problems where parallelism is abundant.
• Due to their design, programming throughput-oriented processors requires much more emphasis on parallelism and scalability than programming sequential processors.
• GPUs are the leading exemplars of modern throughput-oriented architectures.
Throughput-Oriented Architectures:
• Throughput and latency are two fundamental measures for processor performance.
• Traditional scalar microprocessors are latency-oriented architectures.
• Throughput-oriented processors arise from the assumption that the workloads they run contain abundant parallelism.
• Throughput-oriented architectures rely on three key architectural techniques:
1. Emphasis on many simple processing cores
2. Extensive hardware multithreading
3. SIMD execution
Hardware Multithreading:
• A computation in which parallelism is abundant can be decomposed into a collection of concurrent sequential tasks that can execute in parallel across many threads.
• A thread executes the instruction stream corresponding to a single sequential task.
• Multithreading, whether in hardware or software, provides a way of tolerating latency.
• Hardware multithreading as a design strategy for improving aggregate performance on parallel workloads has a long history.
Hardware Multithreading:
• Tera, Sun Niagara, and NVIDIA GPUs use multithreading for high-throughput performance.
• Simultaneous multithreading is used to improve the efficiency of superscalar sequential processors.
• HEP, Tera, and NVIDIA GPUs show characteristics of throughput-oriented processors.
Many simple processing units:
• High transistor density makes it possible to place many simple processing units on a single chip.
• Throughput-oriented architectures achieve a higher level of performance by using many simple processing units.
• Instructions execute in the order they appear in the program, which keeps each core simple.
• The resulting savings in chip area allow many parallel processing units and give higher throughput on parallel workloads.
SIMD execution:
• Parallel processors use some form of SIMD execution to improve aggregate throughput.
• The two basic categories of SIMD machines are SIMD processor arrays and vector processors.
• A SIMD processor array consists of many processing units and a single control unit.
• A vector processor provides traditional scalar instructions plus vector instructions that operate on data vectors of a fixed width.
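A minimal sketch of the SIMD idea described above, not taken from the slides: a single "vector instruction" applies the same operation to every lane of a fixed-width vector in lockstep (the width `VLEN` and the function name `vadd` are illustrative assumptions).

```c
/* Sketch of SIMD-style execution: one instruction stream, many data lanes.
 * VLEN plays the role of the fixed vector width of a vector processor. */
#define VLEN 8

/* One "vector instruction": c = a + b across all lanes in lockstep. */
void vadd(const float *a, const float *b, float *c) {
    for (int lane = 0; lane < VLEN; lane++)
        c[lane] = a[lane] + b[lane];   /* same operation, different data */
}
```

On real hardware the per-lane work runs simultaneously under one control unit; amortizing instruction fetch and decode across all lanes is what makes SIMD so area- and energy-efficient for parallel workloads.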
GPU:
• GPUs are similar in purpose to a computer's CPU. A GPU, however, is designed specifically for performing the complex mathematical and geometric calculations that are necessary for graphics rendering.
CPU And GPU:
• A CPU comprises a few cores optimized for sequential serial processing.
• A GPU comprises thousands of smaller, more efficient cores designed for handling multiple tasks concurrently.
CPU And GPU:
NVIDIA Fermi Graphics Processing Unit:
• Floating-point performance is 1000 GFLOPS.
• On-chip scratchpad is 48 KB per SM.
• Off-chip memory bandwidth is 100 GB/s.
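A back-of-envelope check of the figures above, worked here as an illustration (the function name `flops_per_byte` is ours, not the slides'): with about 1000 GFLOPS of compute against about 100 GB/s of off-chip bandwidth, a kernel must perform roughly 10 floating-point operations per byte fetched to keep the arithmetic units busy.

```c
/* Arithmetic intensity needed to balance the Fermi figures above:
 * ~1000 GFLOPS of compute vs. ~100 GB/s of off-chip bandwidth. */
double flops_per_byte(double gflops, double gbytes_per_s) {
    return gflops / gbytes_per_s;   /* both in units of 1e9 per second */
}
/* flops_per_byte(1000.0, 100.0) = 10 FLOPs per byte,
 * i.e. 40 FLOPs per 4-byte float loaded from off-chip memory. */
```

This gap between compute and memory bandwidth is exactly why the on-chip scratchpads (48 KB per SM) matter: data reused from the scratchpad does not consume off-chip bandwidth.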
NVIDIA vs. Intel:
Performance per watt:
Microarchitecture of GPU
Reduction tree:
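The slide's figure is not reproduced in this transcript; the following sketch (our illustration, with the hypothetical name `tree_reduce`) shows the pattern such a figure typically depicts: pairwise combining halves the number of active elements at each step, so n values are reduced in log2(n) steps.

```c
/* Sketch of a tree reduction: at each step, element i adds in element
 * i + stride, halving the active count each step. */
long tree_reduce(long *x, int n) {        /* n assumed to be a power of two */
    for (int stride = n / 2; stride > 0; stride /= 2)
        for (int i = 0; i < stride; i++)  /* these adds are independent: on a
                                             GPU each runs on its own thread */
            x[i] += x[i + stride];
    return x[0];
}
```

Because every addition within a step is independent, the step maps naturally onto the many simple processing units and SIMD lanes described earlier, which is why the reduction tree is a canonical throughput-oriented kernel.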
Conclusion
• Throughput-oriented processors assume parallelism is abundant rather than scarce, and their target is maximizing the total throughput of all tasks rather than minimizing the latency of a single task.
• A fully general-purpose chip cannot afford to aggressively trade away single-thread performance for increased total throughput.