Parallel Computing Architectures Marcus Chow


Page 1: Computing Parallel Architectures

Parallel Computing Architectures
Marcus Chow

Page 2: Computing Parallel Architectures

Logistics

● Please sign up for a week to present
○ One presentation per group for now

● Emerson will present DVFS in Heterogeneous Embedded Systems next week
● As always, feel free to message or email me if you have any questions or concerns
● Let’s start working!!!!

Page 3: Computing Parallel Architectures

Today

● Provide you with a helpful resource for some concepts, terminology, and history
● Accelerated workshop on various computer architectures
● Hope to pique your interest enough for you to start asking questions

Page 4: Computing Parallel Architectures

Parallel Computing Architectures
How is a computing machine designed?

Page 5: Computing Parallel Architectures

COMPUTING ARCHITECTURES


A Computing Architecture is the design of functionality, organization, and implementation of a computing system

Hardware Software User

Page 6: Computing Parallel Architectures

COMPUTING HARDWARE


Physical components of a computing system

Electrical, Biological, or Mechanical
Designed to solve some computational problem

Application Specific to General Purpose

Page 7: Computing Parallel Architectures

COMPUTING SOFTWARE


A set of instructions that directs a computer’s hardware to perform a task

Biological Mechanical Electrical

Page 8: Computing Parallel Architectures

THE USERS


The people who create software and operate the hardware

Melba Roy Mouton Dorothy Vaughan

Margaret Hamilton

Page 9: Computing Parallel Architectures

WEAVING AS CODE


https://www2.cs.arizona.edu/patterns/weaving/webdocs/gre_dda1.pdf
http://blog.blprnt.com/blog/blprnt/infinite-weft-exploring-the-old-aesthetic

A warp thread is woven either above or below the weft thread
This can be visualized through a binary representation (drawdown)
A drawdown pattern can be made to produce a non-repeating automaton
It is possible to generate a Turing-complete program woven into the fabric itself

Page 10: Computing Parallel Architectures

A History of Women’s Role in Computing Architectures

https://www.youtube.com/watch?v=GuEyXXZA3Ms&t=18s

Page 11: Computing Parallel Architectures

Threads as streams of information

● Visualize two separate flows of information through a computer
● One for instructions (the Thread) and one for the Data that is computed on
● Mixing the two streams produces a single output

Page 12: Computing Parallel Architectures

A thread as a Von Neumann processor

Page 13: Computing Parallel Architectures

5-stage Compute pipeline

Page 14: Computing Parallel Architectures

5-stage Compute pipeline

Page 15: Computing Parallel Architectures

Increased Complexity

Page 16: Computing Parallel Architectures

Memory Performance Gap

Page 17: Computing Parallel Architectures

Typical Memory Hierarchy

Page 18: Computing Parallel Architectures

Principle of Locality

Page 19: Computing Parallel Architectures

Principle of Locality

Page 20: Computing Parallel Architectures

Temporal locality in instructions

Page 21: Computing Parallel Architectures

Temporal locality in Data

Page 22: Computing Parallel Architectures

Spatial locality in Instructions

Page 23: Computing Parallel Architectures

Spatial locality in Data

Page 24: Computing Parallel Architectures

Abstracted CPU

Page 25: Computing Parallel Architectures

CPU in a Computer System

CPU

Disk

Memory

I/O

GPU / Accelerator

Page 26: Computing Parallel Architectures

Software-Hardware Stack

Page 27: Computing Parallel Architectures

The Process in Operating Systems

Page 28: Computing Parallel Architectures

The Process in Operating Systems

Page 29: Computing Parallel Architectures

Process vs Thread

Page 30: Computing Parallel Architectures

Process vs Thread

Page 31: Computing Parallel Architectures

Parallel Computing Architectures
How do we process more data at once?

Page 32: Computing Parallel Architectures

Multicore Designs

● Gain parallelism by adding additional cores
● Each core is independent of one another
● Multiple Instruction, Multiple Data (MIMD)
● How do you connect each core together?

Page 33: Computing Parallel Architectures

Interconnection Network

● Independent cores communicate through higher-level caches (L2 and L3)
● They are connected through a bus system interconnect

Page 34: Computing Parallel Architectures

Utilizing multiple cores

● A multithreaded process can map each thread to a single core
● Multiple processes can run on different cores

Page 35: Computing Parallel Architectures

Tiled Mesh Networks

● Can connect cores through a Mesh Network
● Each tile is very tiny
● Each tile routes messages to its neighbors
● Hammerblade Manycore project
● How to efficiently write parallel algorithms for it?
● How can we utilize the fact that each tile is independent of one another?

Page 36: Computing Parallel Architectures

Parallel Other than MIMD

● What if we don’t need to have independent cores?
● How can we still increase parallelization within the hardware?

Page 37: Computing Parallel Architectures

Vectorization

● We want to be able to process more data
● Add more physical compute units
● This is how a loom works

Page 38: Computing Parallel Architectures

Back to Theory

● Single Data -> Multiple Data

Page 39: Computing Parallel Architectures

Vector Processors

Page 40: Computing Parallel Architectures

Vector Processors

Page 41: Computing Parallel Architectures

Vector Operations

Page 42: Computing Parallel Architectures

Von Neumann Model with SIMD Units

SIMD Unit

Page 43: Computing Parallel Architectures

Pipeline with SIMD Units

Increased Complexity

Vector RF Vector Unit Pipeline

Page 44: Computing Parallel Architectures

Abstracted CPU with Vector Units

Vector Units

Vector Units

Page 45: Computing Parallel Architectures

Raspberry Pi QPU

● Single instruction stream has mix of scalar and vector instructions

● How do we program this style of SIMD unit?

● What are the overheads of needing to copy data to other parts of the chip?

Page 46: Computing Parallel Architectures

Massively Parallel Vector Processors

Increased Complexity

Vector RF

Vector Unit Pipeline

Vector Unit Pipeline

Vector Unit Pipeline

● Give more space to vector units, reduce the number of scalar units

Page 47: Computing Parallel Architectures

Massively Parallel Vector Processors

Increased Complexity

Vector RF

Vector Unit Pipeline

Vector Unit Pipeline

Vector Unit Pipeline

Compute Unit / Core

Page 48: Computing Parallel Architectures

GPU Architecture
● Now we can handle thousands of threads!!
● How do we organize thread execution?

Page 49: Computing Parallel Architectures

Vector Operations

Page 50: Computing Parallel Architectures

Arrays of Parallel Threads

Page 51: Computing Parallel Architectures

Thread Blocks: Scalable Cooperation

Page 52: Computing Parallel Architectures

Transparent Scalability

Page 53: Computing Parallel Architectures

blockIdx and threadIdx

Page 54: Computing Parallel Architectures

GPU Hierarchies

Page 55: Computing Parallel Architectures

CU Scaling
● Is using this many cores efficient?
● How should we manage all of the CUs/SEs?
● How do you keep track of utilization?

8% less energy by using only 32 CUs out of 56 total

Page 56: Computing Parallel Architectures

GPU in a Computer System

CPU

Disk

Memory

I/O

GPU / Accelerator

Page 57: Computing Parallel Architectures

Kernel Launching and Tasking
● How are kernels launched from the CPU to the GPU?
● How to enforce dependencies between kernels?
● What kind of parallel algorithms does this mechanism allow us to make?

Page 58: Computing Parallel Architectures

Integrated GPU Architectures

● What if CPU and GPU are on the same chip?

● How does that affect power consumption?

● How do we control both sets of cores now?

Page 59: Computing Parallel Architectures

Sustainable Computing Architectures

● Parallel architectures are energy efficient
● Everyone is trying to make using them more efficient and cost effective