ece 669 parallel computer architecture - university of · pdf file · 2012-01-12ece...

38
ECE669 L1: Course Introduction January 29, 2004 ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction Prof. Russell Tessier Department of Electrical and Computer Engineering

Upload: nguyenkien

Post on 07-Mar-2018

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

ECE 669

Parallel Computer Architecture

Lecture 1

Course Introduction

Prof. Russell Tessier

Department of Electrical and Computer Engineering

Page 2: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Welcome to ECE 669/CA720-A

° Parallel computer architectures

° Interactive: your questions welcome!

° Grading• 6 Homework assignments (30%)• Mid-term exam (35%)• Final exam (35%)

° Experiments with network and cache simulators

° Culler text – research papers

° Acknowledgment • Prof Anant Agarwal (MIT)• Prof David Culler (California – Berkeley)

Page 3: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Parallel Architectures

Why build parallel machines?

° To help build even bigger parallel machines

° To help solve important problems

° Speed – more trials, less time

° Cost

° Larger problems

° Accuracy

Must understand typical problems

Page 4: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

MIT Computer Architecture Group – early 1990’s

Alewife Machine J-Machine

NuMesh

Page 5: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Applications ---> Requirements

• Processing• Communication• Memory• I/O• Synchronization

° Architect must provide all of the above

NumericSymbolicCombinatorial

Page 6: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Requirements ---> Examples

• Relaxation - near-neighbor communication• Multigrid - 1, 2, 4, 8, ... communication• Numeric comp - Floating point• Symbolic - Tags, branches• Database - I/O• Data parallel - Barrier synchronicity• Dictionary - Memory, I/O

Page 7: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Communication requirements ---> Example

° Relaxation

° So, let’s build a special machine!

° But ... pitfalls!

i, j+1

i+1, ji-1, j

i, j-1

i, jOnly near-neighbor communication

Page 8: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Specialized machines: Pitfalls

• Faster algorithms appear … with different communication requirements

• Cost effectiveness- Economies of scale

• Simpler hardware & software mechanisms- More flexible- May even be faster!

– e.g. Specialized support for synchronization across multiple processors

Page 9: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Technology ---> Limitations & Opportunities

• Wires- Area- Propogation speed

• Clock• Power• VLSI

- I/O pin limitations- Chip area- Chip crossing delay- Power

• Can not make light go any faster• Three dimensions max• KISS rule

Page 10: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Major theme

• Look at typical applications• Understand physical limitations• Make tradeoffs

ARCHITECTURE

Application requirements

Technological constraints

Page 11: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Unfortunately

° Requirements and constraints are often at odds with each other!

° Architecture ---> making tradeoffs

Full connectivity !

Gasp!!!

Page 12: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Putting it all together

° The systems approach

• Lesson from RISCs• Hardware software tradeoffs• Functionality implemented at the right level

- Hardware- Runtime system- Compiler- Language, Programmer- Algorithm

Page 13: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

What will you get out of this course?

° In-depth understanding of the design and engineering of modern parallel computers

• Technology forces• Fundamental architectural issues

- naming, replication, communication, synchronization• Basic design techniques

- cache coherence, protocols, networks, pipelining, …• Underlying engineering trade-offs• Case studies• Influence of applications

° From moderate to very large scale

° Across the hardware/software boundary

Page 14: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Speedup

° Speedup (p processors) =

° For a fixed problem size (input data set), performance = 1/time

° Speedup fixed problem (p processors) =

Performance (p processors)

Performance (1 processor)

Time (1 processor)

Time (p processors)

Page 15: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Is Parallel Computing Inevitable?

° Application demands: Our insatiable need for computing cycles

° Technology Trends

° Architecture Trends

° Economics

° Current trends:• Today’s microprocessors have multiprocessor support• Servers and workstations becoming MP: Sun, SGI, DEC, HP!...• Tomorrow’s microprocessors are multiprocessors

Page 16: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Application Trends

° Application demand for performance fuels advances in hardware, which enables new appl’ns, which...

• Cycle drives exponential increase in microprocessor performance• Drives parallel architecture harder

- most demanding applications

° Range of performance demands• Need range of system performance with progressively increasing

cost

New ApplicationsMore Performance

Page 17: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Commercial Computing

° Relies on parallelism for high end• Computational power determines scale of business that can be

handled

° Databases, online-transaction processing, decision support, data mining, data warehousing ...

° TPC benchmarks Explicit scaling criteria provided• Size of enterprise scales with size of system• Problem size not fixed as p increases.• Throughput is performance measure (transactions per

minute or tpm)

Page 18: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Thr

ough

put (

tpm

C)

Number of processors

u

u

uu

uu

uuuuuuu

uuuuuu

uu

u

u

u

uuu

uu

u

u

u

u

u

6

nnnnnnn

n

nn

n

ll

ll

lll

mmmml

]

6

6

6

6

s

s

s

s

s

0

5,000

10,000

15,000

20,000

25,000

0 20 40 60 80 100 120

s Tandem Himalaya6 DEC Alpha] SGI PowerChallengel HP PAn IBM PowerPCu Other

TPC-C Results for March 1996

° Parallelism is pervasive

° Small to moderate scale parallelism very important

° Difficult to obtain snapshot to compare across vendor platforms

Page 19: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Scientific Computing Demand

Page 20: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

1980 1985 1990 1995

1 MIPS

10 MIPS

100 MIPS

1 GIPS

Sub-BandSpeech Coding

200 WordsIsolated SpeechRecognition

SpeakerVeri¼cation

CELPSpeech Coding

ISDN-CD StereoReceiver

5,000 WordsContinuousSpeechRecognition

HDTV Receiver

CIF Video

1,000 WordsContinuousSpeechRecognitionTelephone

NumberRecognition

10 GIPS

• Also CAD, Databases, . . .

• 100 processors gets you 10 years, 1000 gets you 20 !

Applications: Speech and Image Processing

Page 21: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Is better parallel arch enough?

° AMBER molecular dynamics simulation program

° Starting point was vector code for Cray-1

° 145 MFLOP on Cray90, 406 for final version on 128-processor Paragon, 891 on 128-processor Cray T3D

Page 22: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Summary of Application Trends

° Transition to parallel computing has occurred for scientific and engineering computing

° In rapid progress in commercial computing• Database and transactions as well as financial• Usually smaller-scale, but large-scale systems also used

° Desktop also uses multithreaded programs, which are a lot like parallel programs

° Demand for improving throughput on sequential workloads

• Greatest use of small-scale multiprocessors

° Solid application demand exists and will increase

Page 23: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Per

form

ance

0.1

1

10

100

1965 1970 1975 1980 1985 1990 1995

Supercomputers

Minicomputers

Mainframes

Microprocessors

Technology Trends

° Today the natural building-block is also fastest!

Page 24: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Proc $

Interconnect

Technology: A Closer Look

° Basic advance is decreasing feature size ( λ )• Circuits become either faster or lower in power

° Die size is growing too• Clock rate improves roughly proportional to improvement in λ• Number of transistors improves like λ2 (or faster)

° Performance > 100x per decade• clock rate < 10x, rest is transistor count

° How to use more transistors?• Parallelism in processing

- multiple operations per cycle reduces CPI• Locality in data access

- avoids latency and reduces CPI- also improves processor utilization

• Both need resources, so tradeoff

° Fundamental issue is resource distribution, as in uniprocessors

Page 25: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

• 30% per year

uuuuuuu

uuu

uu

uuuu

uuuu

uu

uu

u

u

uuu

uuu

u

u

uu

u uu

u

u uu

uuuu

u

uuuuu

uuuuuuuuuuuuuuuuuuuu

uuuuu

0.1

1

10

100

1,000

19701975

19801985

19901995

20002005

Clo

ck r

ate

(MH

z)

i4004i8008

i8080

i8086 i80286i80386

Pentium100

R10000

Growth Rates

Tran

sist

ors

uuuuuuuuuu

uu

uuuuuu

uu

uuu

uuuu

uu

u

u

u u

uu

u

u

uuu

uuuuuuuuuuuuuuuuuuu

uuu

uuuuu

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

19701975

19801985

19901995

20002005

i4004i8008

i8080

i8086

i80286i80386

R2000

Pentium R10000

R3000

40% per year

Page 26: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Architectural Trends

° Architecture translates technology’s gifts into performance and capability

° Resolves the tradeoff between parallelism and locality

• Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect

• Tradeoffs may change with scale and technology advances

° Understanding microprocessor architectural trends

=> Helps build intuition about design issues or parallel machines

=> Shows fundamental role of parallelism even in “sequential” computers

Page 27: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Tran

sist

ors

uuuuuuu

uu

u

uu

u

u

u uuu

u

u

u

uu

uuu u

uu

u

u

u u

u

u

u

u

u

uu

u u

uuuuuu u

uu uu u

u

uuu u

uuu

uu uuu

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1970 1975 1980 1985 1990 1995 2000 2005

Bit-level parallelism Instruction-level Thread-level (?)

i4004

i8008i8080

i8086

i80286

i80386

R2000

Pentium

R10000

R3000

Phases in “VLSI” Generation

Page 28: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Architectural Trends

° Greatest trend in VLSI generation is increase in parallelism

• Up to 1985: bit level parallelism: 4-bit -> 8 bit -> 16-bit- slows after 32 bit - adoption of 64-bit now under way, 128-bit far (not

performance issue)- great inflection point when 32-bit micro and cache fit on a

chip• Mid 80s to mid 90s: instruction level parallelism

- pipelining and simple instruction sets, + compiler advances (RISC)

- on-chip caches and functional units => superscalar execution

- greater sophistication: out of order execution, speculation, prediction

– to deal with control transfer and latency problems• Next step: thread level parallelism

Page 29: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

0 1 2 3 4 5 6+0

5

10

15

20

25

30

l

l

ll l

0 5 10 150

0.5

1

1.5

2

2.5

3

Fra

ctio

n of

tota

l cyc

les

(%)

Number of instructions issued

Spe

edup

Instructions issued per cycle

How far will ILP go?

° Infinite resources and fetch bandwidth, perfect branch prediction and renaming

– real caches and non-zero miss latencies

Page 30: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Threads Level Parallelism “on board”

° Micro on a chip makes it natural to connect many to shared memory

– dominates server and enterprise market, moving down to desktop

° Faster processors began to saturate bus, then bus technology advanced

– today, range of sizes for bus-based systems, desktop to large servers

Proc Proc Proc Proc

MEM

Page 31: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

What about Multiprocessor Trends?

l

l

l

l

l

l

l

l

ll

l l

l l

l l

l

l

l

ll

l

l

l

l

l

l

0

10

20

30

40

CRAY CS6400

SGI Challenge

Sequent B2100

Sequent B8000

Symmetry81

Symmetry21

Power

SS690MP 140 SS690MP 120

AS8400

HP K400AS2100SS20

SE30SS1000E

SS10

SE10

SS1000

P-ProSGI PowerSeries

SE60

SE70

Sun E6000

SC2000ESun SC2000SGI PowerChallenge/XL

Sun E10000

50

60

70

1984 1986 1988 1990 1992 1994 1996 1998

Num

ber o

f pro

cess

ors

Page 32: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

What about Storage Trends?

° Divergence between memory capacity and speed even more pronounced

• Capacity increased by 1000x from 1980-95, speed only 2x

• Gigabit DRAM by c. 2000, but gap with processor speed much greater

° Larger memories are slower, while processors get faster• Need to transfer more data in parallel

• Need deeper cache hierarchies

• How to organize caches?

° Parallelism increases effective size of each level of hierarchy,without increasing access time

° Parallelism and locality within memory systems too• New designs fetch many bits within memory chip; follow with fast

pipelined transfer across narrower interface

• Buffer caches most recently accessed data

° Disks too: Parallel disks plus caching

Page 33: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Economics

° Commodity microprocessors not only fast but CHEAP• Development costs tens of millions of dollars

• BUT, many more are sold compared to supercomputers

• Crucial to take advantage of the investment, and use the commodity building block

° Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors

° Standardization makes small, bus-based SMPs commodity

° Desktop: few smaller processors versus one larger one?

° Multiprocessor on a chip?

Page 34: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Consider Scientific Supercomputing

° Proving ground and driver for innovative architecture andtechniques

• Market smaller relative to commercial as MPs become mainstream

• Dominated by vector machines starting in 70s• Microprocessors have made huge gains in floating-point

performance- high clock rates - pipelined floating point units (e.g., multiply-add every

cycle)- instruction-level parallelism- effective use of caches (e.g., automatic blocking)

• Plus economics

° Large-scale multiprocessors replace vector supercomputers

Page 35: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

LIN

PA

CK

(GF

LOP

S)

n CRA Y peakl MPP peak

Xmp /416(4)

Ymp/832(8) nCUBE/2(1024)iPSC/860

CM-2CM-200

Delta

Paragon XP/S

C90(16)

CM-5

ASCI Red

T932(32)

T3D

Paragon XP/S MP (1024)

Paragon XP/S MP (6768)

n

n

n

n

l

l

nl

l

l

ll

l

ll

0.1

1

10

100

1,000

10,000

1985 1987 1989 1991 1993 1995 1996

Raw Parallel Performance: LINPACK

° Even vector Crays became parallel• X-MP (2-4) Y-MP (8), C-90 (16), T94 (32)

° Since 1993, Cray produces MPPs too (T3D, T3E)

Page 36: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Where is Parallel Arch Going?

Application Software

SystemSoftware SIMD

Message Passing

Shared MemoryDataflow

SystolicArrays Architecture

• Uncertainty of direction paralyzed parallel software development!

Old view: Divergent architectures, no predictable pattern of growth.

Page 37: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Modern Layered Framework

CAD

Multiprogramming Sharedaddress

Messagepassing

Dataparallel

Database Scientific modeling Parallel applications

Programming models

Communication abstractionUser/system boundary

Compilationor library

Operating systems support

Communication hardware

Physical communication medium

Hardware/software boundary

Page 38: ECE 669 Parallel Computer Architecture - University of · PDF file · 2012-01-12ECE 669 Parallel Computer Architecture Lecture 1 Course Introduction ... • Simpler hardware & software

ECE669 L1: Course Introduction January 29, 2004

Summary: Why Parallel Architecture?

° Increasingly attractive• Economics, technology, architecture, application demand

° Increasingly central and mainstream

° Parallelism exploited at many levels• Instruction-level parallelism• Multiprocessor servers• Large-scale multiprocessors (“MPPs”)

° Focus of this class: multiprocessor level of parallelism

° Same story from memory system perspective• Increase bandwidth, reduce average latency with many local

memories

° Spectrum of parallel architectures make sense• Different cost, performance and scalability