
Page 1:

INTRODUCTION TO PARALLEL PROCESSING

CHAPTER - 12

SHOBHANA RAJAN

7/16/2001

Page 2:

The Primary Objective of any Computer Design:

• To correctly fetch,
• decode, and
• execute every instruction in its instruction set, producing correct results.

Beyond this, computer architects may seek to maximize system performance.

Page 3:

What is Parallel Processing?

• A method used to improve performance in a computer system.

• A uniprocessor system can achieve parallelism, but most parallel processing systems are multiprocessor systems.

Page 4:

What is Parallel Processing?

• Parallelism means two or more events happening at the same time.

• A system that processes two different instructions simultaneously could be considered to perform parallel processing, but a system that performs different operations on the same instruction would not.

Page 5:

What is Parallel Processing?

Example: A relatively simple CPU includes the following RTL statement as part of its instruction FETCH routine:

FETCH2: DR ← M, PC ← PC + 1

Two micro-operations, copying the contents of M to DR and the value PC + 1 to PC, occur during this state, but both are used to process the same instruction. Therefore, this is not parallel processing. (A small simulation of this state follows.)
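A minimal Python sketch (my illustration, not from the chapter; the register and memory values are invented) of the FETCH2 state. Both transfers sample their sources at the start of the clock cycle and commit together, modeling the two micro-operations occurring simultaneously within one state:

    # Hypothetical register file and memory for illustration.
    regs = {"PC": 0x10, "DR": 0x00}
    memory = {0x10: 0xA5}  # M[PC] holds the next word to fetch

    def fetch2(regs, memory):
        """One clock state: DR <- M[PC] and PC <- PC + 1, committed together."""
        new_dr = memory[regs["PC"]]   # sample M[PC] before PC changes
        new_pc = regs["PC"] + 1
        regs["DR"], regs["PC"] = new_dr, new_pc   # both transfers commit at once

    fetch2(regs, memory)
    print(regs)   # {'PC': 17, 'DR': 165}

Both micro-operations complete in the same state, yet they serve a single instruction's fetch, which is exactly why the slide does not count this as parallel processing.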

Page 6:

Parallelism in Uniprocessor Systems:

A Uniprocessor System may achieve parallelism in any of the following ways:

• Instruction Pipelines:
– Each instruction requires three cycles to be fetched, decoded, and executed.
– By overlapping these stages for successive instructions, the processor completes one instruction per clock cycle.

Example: The IBM 801, using a four-stage instruction pipeline (simulated in the sketch below):
1. Fetch Instruction
2. Decode Instruction and Select Registers
3. Execute Instruction
4. Store Result
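A short Python sketch (my illustration, not from the chapter) of the four-stage pipeline listed above. Once the pipeline fills, one instruction completes per clock cycle even though each instruction spends four cycles in flight:

    STAGES = ["Fetch", "Decode/Select Registers", "Execute", "Store Result"]

    def simulate(instructions):
        pipeline = [None] * len(STAGES)   # one slot per stage
        pending = list(instructions)
        completed, cycle = [], 0
        while pending or any(stage is not None for stage in pipeline):
            cycle += 1
            # Advance every instruction one stage; fetch the next into stage 0.
            pipeline = [pending.pop(0) if pending else None] + pipeline[:-1]
            if pipeline[-1] is not None:          # leaving Store Result
                completed.append((pipeline[-1], cycle))
                pipeline[-1] = None
        return completed

    for instr, cyc in simulate(["i1", "i2", "i3", "i4"]):
        print(instr, "completes at cycle", cyc)
    # i1 completes at cycle 4; every later instruction finishes one cycle apart.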

Page 7:

Parallelism in Uniprocessor Systems:

• Reconfigurable Arithmetic Pipelines:
– Each stage has a multiplexer at its input.
– The control unit of the CPU sets the select signals of the multiplexers to control data flow (a software sketch follows the figure below).

[Figure: A reconfigurable arithmetic pipeline. Each stage (*, -, +, and a final pass-through) is preceded by a latch fed from a four-input MUX with select signals S1 S0; the last MUX output goes to memory and registers. The select-signal settings shown are 0 0, X X, 0 1, 1 1.]
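A small Python sketch of the idea behind the figure (my illustration; the stage operations and select encoding are invented, not the slide's exact wiring): each stage's input multiplexer selects where that stage's operand comes from, so changing the select signals reconfigures the same hardware into a different pipeline.

    # MUX select meanings for this sketch: 0 = external input A,
    # 1 = external input B, 2 = output latched from the previous stage.
    def run_pipeline(stages, a, b):
        prev = None
        for op, sel in stages:               # each stage: (function, mux select)
            x = {0: a, 1: b, 2: prev}[sel]   # the stage's input MUX
            prev = op(x)                     # latch the stage's result
        return prev

    inc = lambda x: x + 1
    dbl = lambda x: 2 * x
    neg = lambda x: -x

    # Configuration 1: A -> inc -> dbl
    print(run_pipeline([(inc, 0), (dbl, 2)], a=3, b=7))   # 8
    # New select signals give configuration 2: B -> neg -> inc
    print(run_pipeline([(neg, 1), (inc, 2)], a=3, b=7))   # -6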

Page 8:

Parallelism in Uniprocessor Systems:

• Vectored Arithmetic Unit:
– Arithmetic pipelines cannot perform different operations simultaneously.
– A vectored arithmetic unit contains multiple functional units to perform different operations (a sketch follows the figure below).

[Figure: A vectored arithmetic unit. Data input connections route operands to separate +, -, *, and / functional units.]
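A minimal sketch of the property the figure shows, using Python's standard concurrent.futures module (the operand values are invented): separate functional units carry out different operations at the same time, which a single arithmetic pipeline cannot do.

    import concurrent.futures
    import operator

    # One "functional unit" per operation, as in the figure above.
    UNITS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

    jobs = [("+", 8, 2), ("-", 8, 2), ("*", 8, 2), ("/", 8, 2)]

    with concurrent.futures.ThreadPoolExecutor(max_workers=len(UNITS)) as pool:
        futures = [pool.submit(UNITS[op], x, y) for op, x, y in jobs]
        results = [f.result() for f in futures]

    print(results)   # [10, 6, 16, 4.0]: four different operations in flight at once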

Page 9:

Parallelism in Uniprocessor Systems:

• Vectored Arithmetic Unit:
• Problem:
– Getting all the data to the vectored arithmetic unit.
– The CPU can address this issue by using multiple buses or very wide data buses.

• The system can improve performance:
– by allowing multiple, simultaneous memory accesses.
– by getting the memory chips to handle multiple transfers simultaneously.

Page 10:

Multiport Memory:

• Multiport memory is designed to handle multiple transfers within the memory itself.
– A multiport memory chip has two sets of address, data, and control pins for simultaneous data transfers.
– The CPU and DMA controller can transfer data concurrently.
– A system with more than one CPU can handle simultaneous requests from two different processors.

Page 11:

Multiport Memory:

• Advantage:
– Multiport memory can handle two requests to read data from the same location at the same time.

• Disadvantage:
– It cannot process two simultaneous requests to write data to the same memory location, or to read from and write to the same memory location. (A toy model of this rule follows.)
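A toy Python model of the rule above (my illustration; the class and its conflict policy are invented to mirror the slide, not a real chip interface): two reads of the same location succeed together, but a write conflicts with any other access to that location.

    class DualPortMemory:
        def __init__(self, size):
            self.cells = [0] * size

        def cycle(self, req_a, req_b):
            """Each request is (op, addr, value), op being 'R' or 'W'.
            Performs both in one cycle, or raises on a conflict."""
            if req_a[1] == req_b[1] and "W" in (req_a[0], req_b[0]):
                raise RuntimeError("conflict: write to a location being accessed")
            return self._do(req_a), self._do(req_b)

        def _do(self, req):
            op, addr, value = req
            if op == "R":
                return self.cells[addr]
            self.cells[addr] = value

    mem = DualPortMemory(16)
    print(mem.cycle(("R", 5, None), ("R", 5, None)))   # two reads of cell 5: OK
    mem.cycle(("W", 5, 42), ("R", 9, None))            # different cells: OK
    # mem.cycle(("W", 5, 1), ("R", 5, None))           # would raise: read + write of cell 5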

Page 12:

Organization of Multiprocessor Systems:

• There are many ways to organize the processors and memory within a multiprocessor system, and different ways to classify these systems, including:
– Flynn’s Classification,
– System Topologies, and
– MIMD System Architectures.

Page 13:

Flynn’s Classification

• It is a commonly accepted taxonomy of computer organization proposed by researcher Michael J. Flynn.

• This classification is based on the flow of instructions and data processing within the computer.

Page 14:

Flynn’s Classification (contd.)

A computer is classified by whether it processes a single instruction or multiple instructions at a time, and whether it operates on one or multiple data sets.

The four categories are as follows:
1) SISD: Single Instruction, Single Data
2) SIMD: Single Instruction, Multiple Data
3) MISD: Multiple Instruction, Single Data
4) MIMD: Multiple Instruction, Multiple Data

Page 15:

Flynn’s Classification (contd.)

SISD Machines:
• SISD machines consist of a single CPU executing individual instructions on individual data.
• This is the classic von Neumann architecture studied in this text.

MISD Machines:
• The MISD classification is not practical to implement.
• No significant MISD machines have been built to date.

SIMD Machines:
• SIMD machines execute a single instruction on multiple data values simultaneously, using many processors.
• SIMD machines have been built and can serve a practical purpose. (A small software analogy follows.)
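A small Python analogy (mine, not from the slides) for the SIMD idea: a single instruction, here "multiply by 2", is applied in lockstep across many data values, where an SISD machine would issue one instruction per datum.

    data = [1, 2, 3, 4, 5, 6, 7, 8]   # one element per processing element

    # SISD style: one instruction handles one datum at a time.
    sisd_result = []
    for x in data:
        sisd_result.append(x * 2)

    # SIMD style: one "multiply by 2" instruction is broadcast to all
    # processing elements, each applying it to its local datum.
    simd_result = list(map(lambda x: x * 2, data))

    assert sisd_result == simd_result == [2, 4, 6, 8, 10, 12, 14, 16]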

Page 16:

[Figure: The CPU connects to a memory subsystem over address, data, and control buses; I/O devices on the same buses form the I/O subsystem.]

A Generic SISD Organization

Page 17:

[Figure: A single control unit with main memory directs multiple processors, each with its own memory, joined by a communication network.]

A Generic SIMD Organization

Page 18:

MIMD Machines

• MIMD machines are referred to as multiprocessors or multicomputers.

• As these machines have multiple processors, each processor (CPU) includes its own control unit.

• The processors can be assigned to parts of the same task or to completely separate tasks (as in the sketch below).
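A short sketch using Python's standard multiprocessing module (the two task functions are invented for illustration): each worker process has its own control flow, and the workers run completely different tasks at once, which is the defining MIMD property.

    from multiprocessing import Process, Queue

    def sum_task(out):
        out.put(("sum", sum(range(1000))))       # one task: arithmetic

    def text_task(out):
        out.put(("text", "parallel".upper()))    # an unrelated task: text handling

    if __name__ == "__main__":
        results = Queue()
        procs = [Process(target=sum_task, args=(results,)),
                 Process(target=text_task, args=(results,))]
        for p in procs: p.start()
        for p in procs: p.join()
        while not results.empty():
            print(results.get())   # both tasks ran under independent control units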

Page 19:

Topology of a Multiprocessor System: Some definitions:

• Topology:
– The topology of a multiprocessor system refers to the pattern of connections between its processors.

• Diameter:
– The diameter is the maximum distance between two processors in the computer system.
– It is the maximum distance a message must pass through to reach its final destination.

• Bandwidth:
– The total bandwidth is the capacity of a communications link multiplied by the number of such links in the system.
– It is a best-case figure, achieved only when every link is active simultaneously, which almost never occurs.

Page 20:

Topology of a Multiprocessor System: Some definitions (contd.):

• Bisection Bandwidth:
– When a network is divided into two halves with an equal number of processors (within one if the number of processors is odd), the total bandwidth of the links connecting the two halves is the bisection bandwidth.
– It is close to a worst-case figure: it represents the maximum data transfer that could occur at the bottleneck in the topology. (The sketch below computes these metrics for a small network.)
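The Python sketch below (my illustration; it assumes every link has capacity l = 1 and takes the topology as an adjacency list) computes all three metrics for a small network, using breadth-first search for distances and a brute-force split for the bisection:

    from collections import deque
    from itertools import combinations

    def distances_from(graph, start):
        dist, queue = {start: 0}, deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def diameter(graph):
        return max(max(distances_from(graph, p).values()) for p in graph)

    def total_bandwidth(graph, l=1):
        return l * sum(len(nbrs) for nbrs in graph.values()) // 2

    def bisection_bandwidth(graph, l=1):
        nodes = list(graph)
        cuts = (sum(1 for u in half for v in graph[u] if v not in half)
                for half in combinations(nodes, len(nodes) // 2))
        return l * min(cuts)

    ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-processor ring
    print(diameter(ring4))              # 2 = floor(4/2)
    print(total_bandwidth(ring4))       # 4 = n * l
    print(bisection_bandwidth(ring4))   # 2 = 2 * l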

Page 21:

Types of System Topologies:

• Shared Bus Topology
• Ring Topology
• Tree Topology
• Mesh Topology
• Hypercube
• Completely Connected

Page 22:

Shared Bus Topology:

• Processors communicate with each other exclusively via this bus.

• The bus can only handle one data transmission at a time.

• Its diameter is 1, its total bandwidth is 1*l, and its bisection bandwidth is also 1*l (where l is the bandwidth of the bus).

Page 23:

[Figure: Processors P, each with a local memory M, and a global memory all attach to a single shared bus.]

SHARED BUS TOPOLOGY

Page 24:

Ring Topology:

• Processors communicate with each other directly rather than over a shared bus.

• All communication links can be active simultaneously.

• A ring with n processors has a diameter of ⌊n/2⌋, a total bandwidth of n*l, and a bisection bandwidth of 2*l (where l is the bandwidth of one link).

Page 25:

[Figure: Six processors connected in a closed loop.]

RING TOPOLOGY

Page 26:

Tree Topology:

• Processors communicate with each other directly like in ring topology.

• Each processor has three connections.

• It has an advantageously low diameter of 2*|_log n_| , total bandwidth of (n-1)*l and bisection bandwidth of 1*l (where l is the bandwidth).

Page 27:

[Figure: Seven processors in a binary tree.]

TREE TOPOLOGY

Page 28:

Mesh Topology:

• Every processor connects to the processors above and below it , and to its left and right.

• It has a diameter of 2n , total bandwidth of (2n - 2n) and bisection bandwidth of 2n*l (where l is the bandwidth).

Page 29:

[Figure: Nine processors in a 3 × 3 grid.]

MESH TOPOLOGY

Page 30:

Hypercube:

• Is a multidimensional mesh.

• It has n processors with nlogn connections.

• It has a relatively low diameter of logn , total bandwidth of (n/2)*logn*l and a bisection bandwidth of (n/2)*l (where l is the bandwidth).

Page 31:

[Figure: Sixteen processors forming a four-dimensional hypercube (two linked three-dimensional cubes).]

HYPERCUBE

Page 32:

Completely Connected:

• Every processor has n-1 connections , one to each of the other processors.

• Its diameter is 1 , a total bandwidth of (n/2)*(n-1)*l and bisection bandwidth of (|_n/2_| * n/2 )*l (where l is the bandwidth).
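The per-topology figures quoted on the preceding slides can be collected into one table. The Python sketch below is my consolidation; it assumes n is a perfect square for the mesh and a power of two for the hypercube, l defaults to 1, and the mesh and completely-connected entries follow the reconstructed formulas above.

    from math import ceil, floor, log2, sqrt

    def topology_metrics(name, n, l=1):
        """(diameter, total bandwidth, bisection bandwidth) per the slides."""
        table = {
            "shared bus": (1, l, l),
            "ring":       (floor(n / 2), n * l, 2 * l),
            "tree":       (2 * floor(log2(n)), (n - 1) * l, l),
            "mesh":       (2 * (sqrt(n) - 1), (2 * n - 2 * sqrt(n)) * l, sqrt(n) * l),
            "hypercube":  (log2(n), (n / 2) * log2(n) * l, (n / 2) * l),
            "completely connected":
                          (1, (n / 2) * (n - 1) * l, floor(n / 2) * ceil(n / 2) * l),
        }
        d, total, bisection = table[name]
        return {"diameter": d, "total_bw": total, "bisection_bw": bisection}

    print(topology_metrics("ring", n=6))        # diameter 3, total 6, bisection 2
    print(topology_metrics("hypercube", n=16))  # diameter 4, total 32, bisection 8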

Page 33:

[Figure: Eight processors, each linked directly to every other processor.]

COMPLETELY CONNECTED

Page 34:

MIMD System Architectures:

• The Architecture of an MIMD system refers to its connections with respect to system memory.

• A Symmetric Multiprocessor ( SMP ) is a computer system that has two or more processors with comparable capabilities.– The processors are capable of performing the

same functions ; this is the symmetry of the SMPs.

Page 35:

Types of SMP:

• Uniform Memory Access (UMA).

• Non-Uniform Memory Access (NUMA).

• Cache Coherent NUMA (CC-NUMA).

• Cache Only Memory Access (COMA).

Page 36:

Uniform Memory Access (UMA):

• UMA gives all CPUs equal access to all locations in shared memory (a sketch follows the figure below).

[Figure: Processors 1 through n reach one shared memory through a common communications mechanism.]
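A brief sketch with Python's multiprocessing shared-memory primitives (illustrative only; on real UMA hardware every CPU reaches all of physical memory with equal latency): several processes update one shared location through the same communications mechanism.

    from multiprocessing import Process, Value, Lock

    def worker(counter, lock, times):
        for _ in range(times):
            with lock:                # every process accesses the same shared cell
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)       # one cell of shared memory
        lock = Lock()
        procs = [Process(target=worker, args=(counter, lock, 1000))
                 for _ in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()
        print(counter.value)          # 4000: all processes saw the same memory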

Page 37:

Non-Uniform Memory Access (NUMA):

• NUMA architectures do not allow uniform access to all shared memory locations.

• Each processor can access the memory module closest to it (its local shared memory) faster than the other modules; hence memory access times are nonuniform.

Example: The Cray T3E supercomputer.

[Figure: Processors 1 through n each have a local memory module (Memory 1 through Memory n), all joined by a communications mechanism.]

Page 38:

Cache Coherent NUMA (CC-NUMA):
• It is similar to the NUMA architecture.
• In addition, each processor includes cache memory.
Example: Silicon Graphics' SGI Origin.

Cache Only Memory Access (COMA):
• In this architecture, each processor's local memory is treated as a cache.
Examples: 1) Kendall Square Research's KSR1 and KSR2.
2) The Swedish Institute of Computer Science's Data Diffusion Machine (DDM).

Page 39:

Multicomputers:

• A multicomputer is an MIMD machine in which the processors are not all under the control of one operating system.

• Each processor or group of processors is controlled by its own operating system.

• One centralized scheduler allocates tasks to processors and processors to tasks.

Page 40:

Multicomputers:

Network of Workstations (NOW) or Cluster of Workstations (COW):
– NOWs and COWs are more than a group of workstations on a local area network (LAN).
– They have a master scheduler, which matches tasks and processors together (sketched below).
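A toy Python sketch of the master scheduler (names invented; a real NOW/COW scheduler must also handle node failures, priorities, and data placement). Tasks sit in one queue, and idle "workstations", modeled here as threads, pull the next available task:

    import queue
    import threading

    tasks = queue.Queue()
    for t in ["build", "simulate", "render", "index", "test"]:
        tasks.put(t)

    def workstation(name):
        while True:
            try:
                task = tasks.get_nowait()   # the master queue matches tasks to nodes
            except queue.Empty:
                return
            print(name, "runs", task)
            tasks.task_done()

    nodes = [threading.Thread(target=workstation, args=(f"node{i}",))
             for i in range(3)]
    for n in nodes: n.start()
    for n in nodes: n.join()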

Page 41:

Massively Parallel Processor (MPP):

• An MPP consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications.

• The nodes communicate with each other by passing messages over the internal interconnection network rather than through shared memory.

Example: IBM's Blue Gene.