
Parallel Processing: Architecture and System Overview

Rajkumar Buyya

Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Australia
www.gridbus.org/~raj


Serial Vs. Parallel

[Slide diagram: a queue of customers ("Q Please") served by a single counter (serial) versus split across COUNTER 1 and COUNTER 2 (parallel).]

Overview of the Talk

Introduction
Why Parallel Processing?
Parallel System H/W Architecture
Parallel Operating Systems
Parallel Programming Models
Summary

Computing Elements

[Slide diagram: a multi-processor computing system shown as a layered stack. From top to bottom: Applications; Programming paradigms; Threads interface; Operating system (microkernel); Hardware, i.e., processors P P P P P P. Legend: Process, Processor, Thread.]

Two Eras of Computing

[Slide diagram: two overlapping timelines spanning 1940 to 2030. The Sequential Era and the Parallel Era each progress through the same stages: Architectures, System Software/Compiler, Applications, Problem Solving Environments (P.S.Es), and each stage moves through R&D, commercialization, and commodity phases.]

History of Parallel Processing

The notion of parallel processing can be traced to a tablet dated around 100 BC. The tablet has three calculating positions capable of operating simultaneously. From this we can infer that the design was aimed at "speed" or "reliability".

Motivating Factor: Human Brain

The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction among all cells and their parallel processing makes the brain's abilities possible. The response speed of an individual neuron is slow (on the order of milliseconds), yet the aggregate speed with which billions of neurons carry out complex calculations demonstrates the feasibility of parallel processing.

Why Parallel Processing?

Computation requirements are ever increasing: simulations, scientific prediction (earthquakes), distributed databases, weather forecasting (will it rain tomorrow?), search engines, e-commerce, Internet service applications, data center applications, finance (investment risk analysis), oil exploration, mining, etc.

Silicon-based (sequential) architectures are reaching their limits in processing capability (clock speed), as they are constrained by the speed of light and thermodynamics.

Human Architecture! Growth Performance

[Slide graph: human growth vs. age (5 to 45+): vertical growth dominates up to early adulthood, after which growth continues horizontally. The analogy: performance growth must eventually come from adding processors (horizontal) rather than making one processor faster (vertical).]

Computational Power Improvement

[Slide graph: computational power vs. number of processors: a multiprocessor's power grows with each added processor, while a uniprocessor's stays flat.]

Why Parallel Processing?

Hardware improvements such as pipelining and superscalar execution are not scaling well and require sophisticated compiler technology to extract performance from them.

Techniques such as vector processing work well only for certain kinds of problems.

Why Parallel Processing?

Significant developments in networking technology are paving the way for network-based, cost-effective parallel computing.

Parallel processing technology is mature and is being exploited commercially.

Processing Elements Architecture

Processing Elements

Flynn proposed a classification of computer systems based on the number of instruction and data streams that can be processed simultaneously.

They are:
SISD (Single Instruction and Single Data): conventional computers
SIMD (Single Instruction and Multiple Data): data-parallel, vector computing machines
MISD (Multiple Instruction and Single Data): systolic arrays
MIMD (Multiple Instruction and Multiple Data): general-purpose machines

SISD: A Conventional Computer

Speed is limited by the rate at which the computer can transfer information internally.

[Slide diagram: a single processor receives an instruction stream and a data input stream and produces a data output stream.]

Ex: PCs, Workstations

The MISD Architecture

More of an intellectual exercise than a practical configuration. A few were built, but none were commercially available.

[Slide diagram: a single data input stream passes through Processors A, B, and C, each driven by its own instruction stream (A, B, C), producing a single data output stream.]

SIMD Architecture

Ex: Cray vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support)

Ci <= Ai * Bi

[Slide diagram: one instruction stream drives Processors A, B, and C; each processor has its own data input stream (A, B, C) and data output stream (A, B, C).]
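To make the SIMD idea concrete, here is a minimal C sketch (an illustration added here, not part of the original slides) of the elementwise operation Ci <= Ai * Bi; the OpenMP simd directive asks the compiler to map the single multiply instruction across many data elements using vector hardware:

```c
#include <stdio.h>

#define N 8

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* One instruction (multiply), many data elements: C[i] = A[i] * B[i].
       The simd directive hints that the loop should be vectorized. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```

Compiled with, e.g., gcc -fopenmp, the loop becomes a candidate for vectorization; without OpenMP the pragma is simply ignored and the code still runs serially.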

MIMD Architecture

Unlike SISD and MISD machines, a MIMD computer works asynchronously. It comes in two forms:

Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD

[Slide diagram: Processors A, B, and C, each with its own instruction stream (A, B, C), data input stream (A, B, C), and data output stream (A, B, C), connected by a memory bus.]

Shared Memory MIMD machine

Communication: a source PE writes data to global memory and the destination PE retrieves it.
Easy to build; conventional OSes written for SISD machines can be ported easily.
Limitations: reliability and expandability. The failure of a memory component or any processor affects the whole system, and increasing the number of processors leads to memory contention.

Ex.: Silicon Graphics supercomputers, ...

[Slide diagram: Processors A, B, and C each connect through a memory bus to a single Global Memory System.]
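As a shared-memory sketch (illustrative C, not from the slides), the fragment below mimics the communication pattern described above using POSIX threads: a "source" thread writes a value into global memory and a "destination" thread retrieves it, with a mutex and condition variable providing the synchronization:

```c
#include <pthread.h>
#include <stdio.h>

/* Shared ("global") memory visible to all threads. */
static int shared_value;
static int ready = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Source PE: writes data into global memory. */
static void *writer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;
    ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Destination PE: retrieves the data once it is available. */
static void *reader(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)
        pthread_cond_wait(&cond, &lock);
    printf("read %d from shared memory\n", shared_value);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t w, r;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}
```

Compile with gcc -pthread. Note how communication happens purely through the shared variable; only the synchronization is explicit.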

Distributed Memory MIMD

● Communication: IPC (Inter-Process Communication) via a high-speed network.
● The network can be configured as a tree, mesh, cube, etc.
● Unlike shared-memory MIMD: easily/readily expandable, and highly reliable (a CPU failure does not affect the whole system).

[Slide diagram: Processors A, B, and C each have a private Memory System (A, B, C) attached over a local memory bus; the processors communicate with one another over IPC channels.]
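The message-passing counterpart (again an illustrative sketch, assuming an MPI installation) sends the value over the IPC channel instead of through shared memory; each process keeps the data in its own private memory system:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Source PE: send over the IPC channel (network). */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Destination PE: receive into its own local memory. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., mpirun -np 2 ./a.out.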

Types of Parallel Systems

Tightly coupled systems:

Shared memory parallel: smallest extension to existing systems; program conversion is incremental.
Distributed memory parallel: completely new systems; programs must be reconstructed.

Loosely coupled systems:

Clusters (now Clouds): built using commodity systems; centralised management.
Grids: aggregation of distributed systems; decentralised management.

Laws of caution.....

● Speed of computation is proportional to the square root of system cost, i.e., Speed = √Cost. Doubling the speed thus requires quadrupling the cost.

● Speedup by a parallel computer increases as the logarithm of the number of processors: Speedup = log2(no. of processors). By this law, 1024 processors would yield a speedup of only log2(1024) = 10.

[Slide graphs: S vs. C following S = √C, and S vs. P following S = log2 P.]

Caution....

Very fast development in network computing and related areas has blurred concept boundaries, causing a lot of terminological confusion: concurrent computing, parallel computing, multiprocessing, supercomputing, massively parallel processing, cluster computing, distributed computing, Internet computing, grid computing, Cloud computing, etc.

At the user level, even well-defined distinctions such as shared memory and distributed memory are disappearing due to new advances in technology.

Good tools for parallel program development and debugging are yet to emerge.

Caution....

There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, algorithms, databases, computer networks, …

All have a role to play.

Operating Systems for High Performance Computing

Operating Systems for PP

MPP systems with thousands of processors require an OS radically different from current ones.

Every CPU needs an OS: to manage its resources and to hide its details.

Traditional OSes are heavy and complex, and are not suitable for MPP.

Operating System Models

A framework that unifies the features, services, and tasks performed.

Three approaches to building an OS:
Monolithic OS
Layered OS
Microkernel-based (client-server) OS: suitable for MPP systems

Simplicity, flexibility, and high performance are crucial for an OS.

Monolithic Operating System

[Slide diagram: application programs in user mode invoke system services that run, together, in kernel mode directly on the hardware.]

❃ Better application performance
❃ Difficult to extend

Ex: MS-DOS

Layered OS

● Easier to enhance
● Each layer of code accesses the lower-level interface
● Lower application performance

[Slide diagram: application programs in user mode; in kernel mode, system services sit above process scheduling and memory & I/O device management, which sit on the hardware.]

Ex: UNIX

Traditional OS

[Slide diagram: a monolithic OS, built by a single OS designer, sits in kernel mode between the hardware and the application programs running in user mode.]

New trend in OS design

[Slide diagram: only a small microkernel runs in kernel mode on the hardware; servers and application programs all run in user mode.]

Microkernel/Client-Server OS (for MPP Systems)

● Tiny OS kernel providing basic primitives (process, memory, IPC)
● Traditional services become user-level subsystems
● Application performance competitive with a monolithic OS
● OS = Microkernel + User Subsystems

[Slide diagram: a client application (with a thread library), a file server, a network server, and a display server all run in user mode; they exchange Send/Reply messages through the microkernel, which runs in kernel mode on the hardware.]
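To illustrate the Send/Reply pattern in the diagram (a toy sketch only; the "ports" here are ordinary POSIX pipes, not any real microkernel's API), a client thread sends a request message to a user-level file server and blocks until the reply arrives:

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Two pipes stand in for microkernel IPC ports:
   req_pipe carries Send messages, rep_pipe carries Replies. */
static int req_pipe[2], rep_pipe[2];

/* The "file server": receives a request, sends back a reply. */
static void *file_server(void *arg) {
    char path[64], reply[128];
    (void)arg;
    read(req_pipe[0], path, sizeof path);          /* receive Send */
    snprintf(reply, sizeof reply, "contents of %s", path);
    write(rep_pipe[1], reply, strlen(reply) + 1);  /* issue Reply  */
    return NULL;
}

int main(void) {
    pthread_t srv;
    char path[64] = "/etc/motd", reply[128];

    pipe(req_pipe);
    pipe(rep_pipe);
    pthread_create(&srv, NULL, file_server, NULL);

    /* Client: Send a request through the "kernel", block on Reply. */
    write(req_pipe[1], path, sizeof path);
    read(rep_pipe[0], reply, sizeof reply);
    printf("client got: %s\n", reply);

    pthread_join(srv, NULL);
    return 0;
}
```

In a real microkernel the pipes would be kernel IPC ports and the server a separate user-mode process, but the control flow is the same: every OS service becomes a message exchange.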

Few Popular Microkernel Systems

✌ MACH, CMU
✌ PARAS, C-DAC
✌ Chorus
✌ QNX
✌ (Windows)

Parallel Programs

Parallel programs consist of multiple active "processes" simultaneously solving a given problem.

The communication and synchronization between these parallel processes form the core of parallel programming effort.

Parallel Programming Models

Shared Memory Model: DSM, Threads/OpenMP (enabled for clusters), Java threads (HKU JESSICA, IBM cJVM)

Message Passing Model: PVM, MPI

Hybrid Model: mixing the shared and distributed memory models, e.g., using OpenMP and MPI together (see the sketch below)

Object and Service Oriented Models: wide-area distributed computing technologies; OO: CORBA, DCOM, etc.; Services: Web Services-based service composition
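As an example of the hybrid model (a sketch assuming both MPI and OpenMP are available; compile with, e.g., mpicc -fopenmp), MPI splits a global sum across the nodes of a cluster while OpenMP threads share memory within each node:

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Message-passing level: each process owns a slice of the range. */
    long begin = (long)N * rank / size;
    long end   = (long)N * (rank + 1) / size;

    /* Shared-memory level: threads on this node sum the slice. */
    double local = 0.0, total = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = begin; i < end; i++)
        local += (double)i;

    /* Combine per-node partial sums across the cluster. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}
```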

Summary/Conclusions

Parallel processing has become a reality:

SMPs are used extensively, e.g., as (Web) servers.
The threads concept is utilized everywhere.
Clusters have emerged as popular data centers and processing engines, e.g., the Google search engine.

The emergence of commodity high-performance CPUs, networks, and OSes has made parallel computing applicable to enterprise applications, e.g., Oracle {9i,10g} databases on Clusters/Grids.