
PART 1: Paradigms of Computing: Synchronous – Vector/Array, SIMD, Systolic; Asynchronous – MIMD, Reduction Paradigm. Hardware taxonomy: Flynn's classification. Software taxonomy: Kung's taxonomy, SPMD.

PART 2: Parallel Computing Models. Parallelism in Uniprocessor Systems: Trends in Parallel Processing, Basic Uniprocessor Architecture, Parallel Processing Mechanism.

PART 3: Parallel Computer Structures: Pipeline Computers, Array Computers, Multiprocessor Systems. Architectural Classification Schemes: Multiplicity of Instruction-Data Streams, Serial versus Parallel Processing, Parallelism versus Pipelining.

PART 4: Pipelining: An Overlapped Parallelism, Principles of Linear Pipelining, Classification of Pipeline Processors, General Pipelines and Reservation Tables.

Hardware taxonomy: Flynn's classification

SISD, SIMD, MISD, MIMD

Systolic array

Flynn's taxonomy is a classification of computer architectures proposed by Michael J. Flynn in 1966.

Two types of information flow into a processor: instructions and data.

The instruction stream is defined as the sequence of instructions performed by the processing unit.

The data stream is defined as data traffic exchanged between the memory and the processing unit.

Single instruction stream single data stream (SISD)

Single instruction stream, multiple data streams (SIMD)

Multiple instruction streams, single data stream (MISD)

Multiple instruction streams, multiple data streams (MIMD)

One processing element which has access to a single program and data storage.

It loads an instruction and the corresponding data and executes the instruction.

The result is stored back in the data storage. An SISD machine has a single processor, and data is stored in a single memory.


Figure: SISD organization. The Control Unit issues the instruction stream (IS) to the Processing Element (PE), which exchanges a data stream (DS) with the Main Memory (M).

ADVANTAGES OF SISD: Fast sequential execution. No extra resources required.

DISADVANTAGES OF SISD: Slow execution of a large instruction stream.

In MISD there are multiple processing elements each of which has a private program memory.

Each processing element obtains the same data element from the data memory.

Loads an instruction from its private program memory.

The different instructions are then executed in parallel.
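As a rough illustration (an editor's sketch, not part of the original slides), the C fragment below imitates MISD behaviour with two POSIX threads acting as processing elements: each runs a different "instruction" (a sum and a maximum) from its own program and applies it to the same data stream in parallel. All names and values are illustrative.

```c
#include <pthread.h>
#include <stdio.h>

#define N 5
static const int shared_data[N] = {4, 9, 1, 7, 3};  /* the single data stream */

/* Instruction stream 1: sum the shared data. */
static void *sum_instruction(void *arg) {
    (void)arg;
    int s = 0;
    for (int i = 0; i < N; i++) s += shared_data[i];
    printf("sum = %d\n", s);
    return NULL;
}

/* Instruction stream 2: find the maximum of the same data. */
static void *max_instruction(void *arg) {
    (void)arg;
    int m = shared_data[0];
    for (int i = 1; i < N; i++)
        if (shared_data[i] > m) m = shared_data[i];
    printf("max = %d\n", m);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    /* Different instruction streams execute in parallel on the same data. */
    pthread_create(&t1, NULL, sum_instruction, NULL);
    pthread_create(&t2, NULL, max_instruction, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```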

A single instruction is applied to different data simultaneously.

SIMD machines have more than one processing element (PE).

General characteristics of SIMD computers are:
– They distribute processing over a large amount of hardware.
– They operate concurrently on many different data elements.
– They perform the same computation on all data elements.

A single instruction operates on all loaded data elements in one operation.
Processing multiple data elements at the same time with a single instruction can give a significant performance boost where SIMD techniques can be utilized.

Not everything is suitable for SIMD processing, and not all parts of an application need to be SIMD accelerated to realize significant improvements.
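As a minimal sketch of this idea (an editor's illustration, not from the slides), the following C fragment uses x86 SSE intrinsics so that one add instruction operates on four float data elements per iteration; the arrays and their size are illustrative, and the code assumes an SSE-capable x86 processor.

```c
#include <immintrin.h>  /* x86 SSE intrinsics */
#include <stdio.h>

#define N 8  /* illustrative size, a multiple of 4 */

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* One instruction (addps) operates on four data elements per iteration. */
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);  /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);   /* single instruction, multiple data */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);            /* every element is 9 */
    printf("\n");
    return 0;
}
```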

Multiple processing elements each of which has a separate instruction and data memory.

Each processing element loads a separate instruction and a separate data element, applies the instruction to the data element, and stores any result back into the data storage.

Processing elements work asynchronously to each other.

MIMD machines distribute processing over a number of independent processors.
They share resources among the component processors.
Each processor operates independently.
Each processor runs its own program.
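A hedged sketch of MIMD-style execution (not from the slides): two POSIX threads stand in for independent processors, each running its own program on its own data while sharing the process's memory and other resources. Function and variable names are made up for the example.

```c
#include <pthread.h>
#include <stdio.h>

/* Program 1: sum an integer array. */
static void *sum_program(void *arg) {
    int *data = (int *)arg;
    int s = 0;
    for (int i = 0; i < 4; i++) s += data[i];
    printf("sum program: %d\n", s);
    return NULL;
}

/* Program 2: scale a double array in place. */
static void *scale_program(void *arg) {
    double *data = (double *)arg;
    for (int i = 0; i < 4; i++) data[i] *= 2.0;
    printf("scale program done\n");
    return NULL;
}

int main(void) {
    int a[4] = {1, 2, 3, 4};
    double b[4] = {0.5, 1.5, 2.5, 3.5};
    pthread_t t1, t2;

    /* Two independent instruction streams on two independent data streams,
       running asynchronously within one shared address space. */
    pthread_create(&t1, NULL, sum_program, a);
    pthread_create(&t2, NULL, scale_program, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```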

Systolic arrays are arrays of processors which are connected to a small number of nearest neighbours in a mesh-like topology. Processors perform a sequence of operations on data that flows between them. Generally the operations will be the same in each processor, with each processor performing an operation on a data item and then passing it on to its neighbour.

Use of a large number of PEs arranged in a well-organized structure.

In a hexagonal array, each PE has a simple function and communicates with neighbour PEs in a pipelined fashion.

Example: multiplication of two 3-by-3 matrices A and B. Each circle represents a PE that has three inputs and three outputs. The input and output values move through the PEs at every clock pulse.

In this arrangement, the input values a11 and b11 and the output value c11 arrive at the same processing element (PE) after two clock pulses.

Once all these values have arrived, the PE computes a new value for c11 by performing the following operation:

c11 = c11 + a11 * b11.
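The per-PE step above can be sketched in C as follows (an editor's illustration, not the slide's code): each element C[i][j] is the accumulator held by one PE, and on clock pulse k that PE receives a[i][k] and b[k][j] and performs the same update c = c + a * b. Only the accumulation is modelled; the movement of operands between neighbouring PEs in the hexagonal array is not simulated.

```c
#include <stdio.h>

#define N 3

int main(void) {
    int A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int B[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int C[N][N] = {{0}};  /* each C[i][j] is the accumulator held by one PE */

    /* Clock pulse k: PE(i,j) receives a[i][k] and b[k][j] and performs the
       operation the slide describes, c = c + a * b.  In the real hexagonal
       array the operands would also be passed on to neighbouring PEs; that
       data movement is not modelled here. */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%4d", C[i][j]);
        printf("\n");
    }
    return 0;
}
```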

Types of systolic arrays: LINEAR ARRAY, ORTHOGONAL SYSTOLIC ARRAY, HEXAGONAL SYSTOLIC ARRAY, TRIANGULAR ARRAY.

LINEAR ARRAY
Processing elements are arranged in one dimension.
Each PE is interconnected with its nearest elements only.
Linear arrays differ relative to the number of data flows.
Applications of linear systolic arrays are matrix-vector multiplication and one-dimensional convolution (see the sketch below).
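As a hedged sketch of the one-dimensional convolution case (all names, sizes, and coefficients are illustrative assumptions), the C fragment below keeps one filter weight resident in each PE of a linear array, shifts the input samples from PE to PE on every clock pulse, and has every PE perform the same multiply-accumulate on the sample it currently holds. For simplicity the partial sums are combined within the same clock pulse rather than flowing through the array.

```c
#include <stdio.h>

#define K  3   /* number of PEs = number of filter taps */
#define NX 8   /* number of input samples */

int main(void) {
    double w[K]  = {0.25, 0.5, 0.25};  /* one weight stays resident in each PE */
    double x[NX] = {1, 2, 3, 4, 5, 6, 7, 8};
    double xr[K] = {0};                /* the sample currently held by each PE */
    double y[NX + K - 1] = {0};

    /* One loop iteration = one clock pulse.  Samples move from PE 0 towards
       PE K-1 using nearest-neighbour links only, and every PE performs the
       same multiply-accumulate on the sample it holds. */
    for (int t = 0; t < NX + K - 1; t++) {
        for (int k = K - 1; k > 0; k--)   /* each PE passes its sample on */
            xr[k] = xr[k - 1];
        xr[0] = (t < NX) ? x[t] : 0.0;    /* a new sample enters PE 0 */

        for (int k = 0; k < K; k++)       /* same operation in every PE */
            y[t] += w[k] * xr[k];
    }

    for (int n = 0; n < NX + K - 1; n++)
        printf("y[%d] = %.2f\n", n, y[n]);
    return 0;
}
```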

ORTHOGONAL SYSTOLIC ARRAY
PEs are arranged in a 2D grid.
Each PE is interconnected with its nearest neighbours in each direction.
Orthogonal systolic arrays differ relative to the number and direction of data flows.

HEXAGONAL SYSTOLIC ARRAY
PEs are arranged in a two-dimensional grid.
PEs are connected to their nearest neighbours with interconnections that have hexagonal symmetry.
Mapping the matrix-matrix multiplication algorithm onto a systolic structure results in a hexagonal array.

TRIANGULAR ARRAY
PEs are arranged in triangular form.
Triangular arrays differ from linear arrays.
They are used for two algorithms: Gaussian elimination and decomposition.

Features of systolic arrays: network synchrony, modularity, regularity, locality, extensibility, pipelinability.

ADVANTAGES OF SYSTOLIC ARRAYS
High speed and low cost.
Simple I/O subsystem.
Regularity and modular design.
Local interconnections.
High degree of pipelining.
Highly synchronized multiprocessing.

DISADVANTAGES OF SYSTOLIC ARRAYS
Expensive.
Highly specialized, custom hardware is required, often application-specific.
Not widely implemented.
Limited code base of programs and algorithms.

APPLICATIONS OF SYSTOLIC ARRAYS
Matrix inversion and decomposition.
Polynomial evaluation.
Systolic arrays for matrix multiplication.
Image processing.
Systolic lattice filters used for speech and seismic signal processing.
Artificial neural networks.

Reference: Advanced Computer Architecture: Parallelism, Scalability, Programmability by Kai Hwang.
