Large Computer Systems CE 140 A1/A2 27 August 2003


Page 1: Large Computer Systems

Large Computer Systems

CE 140 A1/A2, 27 August 2003

Page 2: Large Computer Systems

Rationale

- Although computers are getting faster, the demands are also increasing at least as fast
- High-performance applications: simulations and modeling
- Circuit speed cannot be increased indefinitely; eventually, physical limits will be reached, and quantum mechanical effects will become a problem

Page 3: Large Computer Systems

Rationale

- To handle larger problems, parallel computers are used
- Machine-level parallelism: replicates entire CPUs or portions of them

Page 4: Large Computer Systems

Design Issues

- What are the nature, size, and number of the processing elements?
- What are the nature, size, and number of the memory modules?
- How are the processing and memory elements interconnected?
- What applications are to be run in parallel?

Page 5: Large Computer Systems

Grain Size

- Coarse-grained parallelism: the unit of parallelism is larger; large pieces of software run in parallel with little or no communication between the pieces. Example: large time-sharing systems
- Fine-grained parallelism: parallel programs with a high degree of communication with each other

Page 6: Large Computer Systems

Tightly Coupled versus Loosely Coupled

- Loosely coupled: a small number of large, independent CPUs that have relatively low-speed connections to each other
- Tightly coupled: smaller processing units that work closely together over high-bandwidth connections

Page 7: Large Computer Systems

Design Issues

In most cases:
- Coarse-grained parallelism is well suited for loosely coupled systems
- Fine-grained parallelism is well suited for tightly coupled systems

Page 8: Large Computer Systems

Communication Models

In a parallel computer system, CPUs communicate with each other to exchange information

Two general types:
- Multiprocessors
- Multicomputers

Page 9: Large Computer Systems

Multiprocessors

Shared Memory System
- All processors may share a single virtual address space
- Easy model for programmers
- Global memory: any processor can access any memory module without intervention by another processor

Page 10: Large Computer Systems

Uniform Memory Access (UMA) Multiprocessor

[Diagram: processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mk all attached to a common interconnection network]

Page 11: Large Computer Systems

Non-Uniform Memory Access (NUMA) Multiprocessor

[Diagram: each processor Pi paired with a local memory module Mi; processors reach remote memory modules through the interconnection network]

Page 12: Large Computer Systems

Multiprocessor

Page 13: Large Computer Systems

Multicomputers

Distributed Memory System
- Each CPU has its own private memory
- Local/private memory: a processor cannot access a remote memory without the cooperation of the remote processor
- Cooperation takes place in the form of a message passing protocol
- Programming a multicomputer is much more difficult than programming a multiprocessor

Page 14: Large Computer Systems

Distributed Memory System

[Diagram: processors P1, P2, ..., Pn, each with its own private memory module M1, M2, ..., Mn, communicating over the interconnection network]

Page 15: Large Computer Systems

Distributed Memory System

Page 16: Large Computer Systems

Multiprocessors versus Multicomputers

- Multiprocessors are easier to program
- But multicomputers are much simpler and cheaper to build
- Goal: large computer systems that combine the best of both worlds

Page 17: Large Computer Systems

Taxonomy of Large Computer Systems

Instruction Streams | Data Streams | Name | Examples
------------------- | ------------ | ---- | --------
1                   | 1            | SISD | Classical von Neumann machine
1                   | Multiple     | SIMD | Vector supercomputer, array processor
Multiple            | 1            | MISD | None
Multiple            | Multiple     | MIMD | Multiprocessor, multicomputer
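The SISD/SIMD distinction in the table can be given a rough software analogy (real SIMD is done in hardware vector registers; this Python fragment only conveys the idea of one instruction stream applied to many data elements):

```python
# Rough illustration of the SIMD idea: a single operation ("add 1")
# is applied uniformly across a whole vector of data elements, rather
# than being re-decided per element as in an SISD instruction stream.
data = [1, 2, 3, 4, 5, 6, 7, 8]      # multiple data streams
result = [x + 1 for x in data]       # one "instruction" applied to all elements
print(result)                        # [2, 3, 4, 5, 6, 7, 8, 9]
```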

Page 18: Large Computer Systems

Taxonomy of Large Computer Systems

Page 19: Large Computer Systems

Symmetric MultiProcessors (SMP)

- Multiprocessor architecture where all processors can access all memory locations uniformly
- Processors also share I/O
- SMP is classified as a UMA architecture
- SMP is the simplest multiprocessor system
- Any processor can execute either the OS kernel or user programs

Page 20: Large Computer Systems

SMP

- Performance improves if programs can be run in parallel
- Increased availability: if one processor breaks down, the system does not stop running
- Performance is also improved incrementally by adding processors
- Does not scale well beyond 16 processors

Page 21: Large Computer Systems

SMP

Page 22: Large Computer Systems

SMP

Page 23: Large Computer Systems

Clusters

A group of whole computers connected together to function as a parallel computer

Popular implementation: Linux computers using Beowulf clustering software

Page 24: Large Computer Systems

Clusters

- High availability: redundant resources
- Scalability
- Affordable: off-the-shelf parts

Page 25: Large Computer Systems

Clusters

Cyborg Cluster, Drexel University
32 nodes, dual P3 per node

Page 26: Large Computer Systems

Clusters

Page 27: Large Computer Systems

Memory Organization

Shared Memory System (Multiprocessors)
- Each processor may also have a cache
- Convenient to have a global address space
- For NUMA, accesses to remote memory may be slower than accesses to local memory

Distributed Memory System (Multicomputers)
- Private address space for each processor
- Easiest way to connect computers into a large system
- Data sharing is implemented through message passing

Page 28: Large Computer Systems

Issues

- When processors share data, all processors must see the same value for a given data item
- When a processor updates its cache, it must also update the caches of the other processors, or invalidate the other processors' copies
- Shared data must be coherent

Page 29: Large Computer Systems

Cache Coherence

All cached copies of shared data must have the same value at all times

Page 30: Large Computer Systems

Snooping Caches

So-called because individual caches “snoop” on the bus

Page 31: Large Computer Systems

Write-Through Protocol

- Write-Through with Update (Write Update): update the cache and memory, and update the caches of the other processors
- Write-Through without Update (Write Invalidate): update the cache and memory, and invalidate the copies in the caches of the other processors
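The write-invalidate variant can be sketched with toy classes (the `Bus` and `Cache` names here are hypothetical, not any real simulator's API): every cache watches writes on a shared bus, and on seeing another cache's write it drops its own copy of that address while memory is updated write-through.

```python
# Toy sketch of write-through with invalidate ("snooping" on a bus):
# a write goes through to memory and invalidates every other cache's
# copy of that address.
class Bus:
    def __init__(self):
        self.caches = []
        self.memory = {}

    def write(self, writer, addr, value):
        self.memory[addr] = value               # write-through to memory
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)     # snooped write: invalidate copy

class Cache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)       # broadcast the write on the bus

bus = Bus()
c1, c2 = Cache(bus), Cache(bus)
bus.memory[0x10] = 1
c1.read(0x10)
c2.read(0x10)                 # both caches now hold the line
c1.write(0x10, 2)             # c2's stale copy is invalidated
print(c2.read(0x10))          # 2: c2 misses and re-fetches the new value
```

The invalidate on every write is exactly the broadcast traffic that the later slide on snoopy cache issues complains about.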

Page 32: Large Computer Systems

Write-Back Protocol

- When a processor wants to write to a block, it must acquire exclusive control/ownership of the block
- All other copies are invalidated
- The block's contents may then be changed at any time
- When another processor requests to read the block, the owner sends the block to the requesting processor and returns control of the block to the memory module, which updates the block to contain the latest value

Page 33: Large Computer Systems

MESI Protocol

Popular write-back cache coherence protocol, named after the initials of the four possible states of each cache line:
- Modified: entry is valid; memory is invalid; no other copies exist
- Exclusive: no other cache holds the line; memory is up to date
- Shared: multiple caches may hold the line; memory is up to date
- Invalid: cache entry does not contain valid data
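The four states can be tied together with a small transition table for a single cache line. This is a simplified sketch, not the full protocol: bus transactions and data transfers are omitted, and a read miss is assumed to fill the line in the Shared state (a real MESI cache fills in Exclusive when no other cache holds the line).

```python
# Compact sketch of MESI state transitions for one cache line,
# covering the common local and snooped events. Events not listed
# leave the state unchanged.
TRANSITIONS = {
    # (current state, event) -> next state
    ("I", "local_read"):  "S",   # read miss; simplified: assume a shared fill
    ("I", "local_write"): "M",   # write miss: take ownership, others invalidate
    ("S", "local_write"): "M",   # must invalidate the other copies first
    ("S", "snoop_write"): "I",   # another cache is writing: drop our copy
    ("E", "local_write"): "M",   # already exclusive: silent upgrade, no bus traffic
    ("E", "snoop_read"):  "S",   # another cache reads the line: now shared
    ("M", "snoop_read"):  "S",   # supply the data, write back, become shared
    ("M", "snoop_write"): "I",   # another writer takes ownership
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)

line = "I"
for event in ["local_read", "snoop_write", "local_write"]:
    line = next_state(line, event)
print(line)   # "M": invalid -> shared -> invalidated by a peer -> modified locally
```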

Page 34: Large Computer Systems
Page 35: Large Computer Systems

Snoopy Cache Issues

Snoopy caches require broadcasting information over the bus, which leads to increased bus traffic as the system grows in size

Page 36: Large Computer Systems

Directory Protocols

- Uses a directory that keeps track of the locations where copies of a given data item are present
- Eliminates the need for broadcasts
- If the directory is centralized, the directory becomes a bottleneck
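The broadcast-elimination point can be made concrete with a toy directory (a hypothetical structure, not any real machine's): the directory records which caches hold each block, so a write only notifies the recorded sharers point-to-point instead of broadcasting to every cache on a bus.

```python
# Toy sketch of a directory protocol: the directory maps each block
# address to the set of caches holding a copy, so invalidations on a
# write go only to actual sharers.
class Directory:
    def __init__(self):
        self.sharers = {}                       # block address -> set of cache ids

    def record_read(self, addr, cache_id):
        self.sharers.setdefault(addr, set()).add(cache_id)

    def invalidate_for_write(self, addr, writer_id):
        # Return only the caches that actually hold the block
        # (no broadcast), then leave the writer as the sole holder.
        targets = self.sharers.get(addr, set()) - {writer_id}
        self.sharers[addr] = {writer_id}
        return targets

directory = Directory()
directory.record_read(0x40, "cache0")
directory.record_read(0x40, "cache1")
directory.record_read(0x40, "cache2")
to_invalidate = directory.invalidate_for_write(0x40, "cache0")
print(sorted(to_invalidate))   # ['cache1', 'cache2']: point-to-point, not broadcast
```

A single shared dictionary like this also makes the centralized-directory bottleneck visible: every read and write must consult the same structure.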

Page 37: Large Computer Systems

Performance

According to Amdahl’s law, introducing machine parallelism will not have a significant effect on performance if the program cannot take advantage of the parallel architecture

Not all programs parallelize well
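Amdahl's law can be stated as a one-line formula: if a fraction f of a program is inherently serial, the speedup on n processors is at most 1 / (f + (1 - f) / n). A quick computation shows why a small serial fraction dominates at scale.

```python
# Amdahl's law: the speedup on n processors of a program whose
# serial fraction is f is bounded by 1 / (f + (1 - f) / n).
def amdahl_speedup(serial_fraction, n_processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with 95% of the work parallelizable, 1024 processors
# deliver less than a 20x speedup:
print(round(amdahl_speedup(0.05, 1024), 1))   # 19.6
```

As n grows, the speedup approaches 1/f (here 1/0.05 = 20) no matter how many processors are added, which is the slide's point that machine parallelism only helps programs that parallelize well.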

Page 38: Large Computer Systems

Performance

Page 39: Large Computer Systems

Scalability Issues

Page 40: Large Computer Systems

Scalability Issues

- Bandwidth
- Latency
- Both depend on the interconnection topology