Throughput Oriented Architectures
DESCRIPTION
Computer architecture article related to throughput-oriented architectures.
TRANSCRIPT
Throughput Oriented Architectures
Contents
• Throughput-oriented processors
• Hardware multithreading
• Many simple processing units
• SIMD execution
• GPUs
• NVIDIA GPU architecture
• Throughput-oriented programming
• Conclusion
Key Points:
• Throughput-oriented processors tackle problems where parallelism is abundant.
• Due to their design, programming throughput-oriented processors requires much more emphasis on parallelism and scalability than programming sequential processors.
• GPUs are the leading exemplars of modern throughput-oriented architectures.
Throughput-Oriented Architectures:
• Throughput and latency are two fundamental measures for processor performance.
• Traditional scalar microprocessors are latency-oriented architectures.
• Throughput-oriented processors arise from the assumption that the workloads they run contain abundant parallelism.
• Throughput-oriented architectures rely on three key architectural techniques:
1. Emphasis on many simple processing cores
2. Extensive hardware multithreading
3. SIMD execution
Hardware Multithreading:
• A computation in which parallelism is abundant can be decomposed into a collection of concurrent sequential tasks that can execute in parallel across many threads.
• A thread executes the instruction stream corresponding to a single sequential task.
• Multithreading, whether in hardware or software, provides a way of tolerating latency.
• Hardware multithreading as a design strategy for improving aggregate performance on parallel workloads has a long history.
Hardware Multithreading:
• Tera, Sun Niagara, and NVIDIA GPUs use multithreading for high-throughput performance.
• Simultaneous multithreading is used to improve the efficiency of superscalar sequential processors.
• HEP, Tera, and NVIDIA GPUs show characteristics of throughput-oriented processors.
Many simple processing units:
• High transistor density makes it possible to place many simple processing units on a single chip.
• Throughput-oriented architectures achieve a higher level of performance by using many simple processing units.
• Instructions execute in the order they appear in the program, which keeps each core simple.
• The resulting savings in chip area allow many parallel processing units and give higher throughput on parallel workloads.
SIMD execution:
• Parallel processors use some form of SIMD execution to improve aggregate throughput.
• The two basic categories of SIMD machines are SIMD processor arrays and vector processors.
• A SIMD processor array consists of many processing units and a single control unit.
• A vector processor provides traditional scalar instructions plus vector instructions that operate on data vectors of a fixed width.
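A minimal sketch of the SIMD idea described above, not taken from the slides: a single "vector instruction" applies the same operation to every lane of a fixed-width vector in lockstep (the width `VLEN` and the function name `vadd` are illustrative assumptions).

```c
/* Sketch of SIMD-style execution: one instruction stream, many data lanes.
 * VLEN plays the role of the fixed vector width of a vector processor. */
#define VLEN 8

/* One "vector instruction": c = a + b across all lanes in lockstep. */
void vadd(const float *a, const float *b, float *c) {
    for (int lane = 0; lane < VLEN; lane++)
        c[lane] = a[lane] + b[lane];   /* same operation, different data */
}
```

On real hardware the per-lane work runs simultaneously under one control unit; amortizing instruction fetch and decode across all lanes is what makes SIMD so area- and energy-efficient for parallel workloads.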
GPU:
• GPUs are similar in purpose to a computer's CPU. A GPU, however, is designed specifically for performing the complex mathematical and geometric calculations that are necessary for graphics rendering.
CPU And GPU:
• A CPU comprises a few cores optimized for sequential serial processing.
• A GPU comprises thousands of smaller, more efficient cores designed for handling multiple tasks concurrently.
CPU And GPU:
NVIDIA Fermi Graphics Processing Unit:
• Floating-point performance is 1000 GFLOPS.
• On-chip scratchpad is 48 KB per SM.
• Off-chip memory bandwidth is 100 GB/s.
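A back-of-envelope check of the figures above, worked here as an illustration (the function name `flops_per_byte` is ours, not the slides'): with about 1000 GFLOPS of compute against about 100 GB/s of off-chip bandwidth, a kernel must perform roughly 10 floating-point operations per byte fetched to keep the arithmetic units busy.

```c
/* Arithmetic intensity needed to balance the Fermi figures above:
 * ~1000 GFLOPS of compute vs. ~100 GB/s of off-chip bandwidth. */
double flops_per_byte(double gflops, double gbytes_per_s) {
    return gflops / gbytes_per_s;   /* both in units of 1e9 per second */
}
/* flops_per_byte(1000.0, 100.0) = 10 FLOPs per byte,
 * i.e. 40 FLOPs per 4-byte float loaded from off-chip memory. */
```

This gap between compute and memory bandwidth is exactly why the on-chip scratchpads (48 KB per SM) matter: data reused from the scratchpad does not consume off-chip bandwidth.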
NVIDIA vs. Intel:
Performance per watt:
Microarchitecture of GPU
Reduction tree:
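The slide's figure is not reproduced in this transcript; the following sketch (our illustration, with the hypothetical name `tree_reduce`) shows the pattern such a figure typically depicts: pairwise combining halves the number of active elements at each step, so n values are reduced in log2(n) steps.

```c
/* Sketch of a tree reduction: at each step, element i adds in element
 * i + stride, halving the active count each step. */
long tree_reduce(long *x, int n) {        /* n assumed to be a power of two */
    for (int stride = n / 2; stride > 0; stride /= 2)
        for (int i = 0; i < stride; i++)  /* these adds are independent: on a
                                             GPU each runs on its own thread */
            x[i] += x[i + stride];
    return x[0];
}
```

Because every addition within a step is independent, the step maps naturally onto the many simple processing units and SIMD lanes described earlier, which is why the reduction tree is a canonical throughput-oriented kernel.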
Conclusion
• Throughput-oriented processors assume parallelism is abundant rather than scarce, and their target is maximizing the total throughput of all tasks rather than minimizing the latency of a single task.
• A fully general-purpose chip cannot afford to aggressively trade away single-thread performance for increased total throughput.