6.189 iap 2007 - mit csail

53
Dr. Rodric Rabbah, IBM. 1 6.189 IAP 2007 MIT 6.189 IAP 2007 Lecture 17 The Raw Experience

Upload: others

Post on 13-May-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 6.189 IAP 2007 - MIT CSAIL

Dr. Rodric Rabbah, IBM. 1 6.189 IAP 2007 MIT

6.189 IAP 2007

Lecture 17

The Raw Experience

Page 2: 6.189 IAP 2007 - MIT CSAIL

2 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Chips

October 02

Page 3: 6.189 IAP 2007 - MIT CSAIL

3 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Microprocessor

● Tiled microprocessor with point-to-point pipelined scalar operand network

● Each tiles is 4 mm x 4mmMIPS-style compute processor– Single-issue 8-stage pipe– 32b FPU– 32K D Cache, I Cache

● 4 on-chip mesh networksTwo for operandsOne for cache misses, I/OOne for message passing

Page 4: 6.189 IAP 2007 - MIT CSAIL

4 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Microprocessor

● 16 tiles (16 issue)● 180 nm ASIC (IBM SA-27E)● ~100 million transistors● 1 million gates

● 3-4 years of development● 1.5 years of testing● 200K lines of test code

● Core Frequency:425 MHz @ 1.8 V500 MHz @ 2.2 V

● Frequency competitive with IBM-implemented PowerPCs in same process

● 18W average power

Page 5: 6.189 IAP 2007 - MIT CSAIL

5 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

One Cycle in the Life of a Tiled Processor

● Application uses as many tiles as needed to exploit its parallelism

httpd

4-way automaticallyparallelizedC program

2-thread MPI app

mem

mem

mem

Zzz...

Page 6: 6.189 IAP 2007 - MIT CSAIL

6 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Motherboard

Page 7: 6.189 IAP 2007 - MIT CSAIL

7 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw in Action

Page 8: 6.189 IAP 2007 - MIT CSAIL

8 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

1

4

8

16

1 4 8 16

10

30

60

Spee

dup Fram

es/s

# of Tiles

350 x 240 Images 720 x 480 Images

1

4

8

16

4 8

10

30

60

Spee

dup Fram

es/s

# of Tiles

MPEG-2 Encoder Performance

Square – Linear speedupDiamond – Hand-optimized, slice parallel implementationCircle – Slice parallel implementationTriangle – Baseline macroblock parallel implementation

Page 9: 6.189 IAP 2007 - MIT CSAIL

9 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

MPEG-2 Encoder Performance

51.90158*64

30*103*32

14.5716.7458.6516

7.528.6930.828

3.844.4516.184

1.972.248.482

1.001.144.301

720 x 480640 x 480352 x 240# Tiles

Encoding Rate (frames/s)

* Estimated data rates

Page 10: 6.189 IAP 2007 - MIT CSAIL

10 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Programmable Graphics Pipeline

screenshot from Counterstrike

Input

Vertex Vertex

Sync

Triangle Setup

Pixel Pixel

V

P

simplified graphics pipeline

Page 11: 6.189 IAP 2007 - MIT CSAIL

11 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Phong Shading

● Per-pixel phong-shaded polyhedron

● 162 vertices, 1 light

Output, rendered using Raw simulator

Page 12: 6.189 IAP 2007 - MIT CSAIL

12 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Reconfigurable pipeline

Fixed pipeline

● 33% faster● 150% better

utilization

Phong Shading (64-tiles)

Page 13: 6.189 IAP 2007 - MIT CSAIL

13 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Shadow Volumes

● 4 textured triangles● 1 point light● Rendered in 3 passes

Output, rendered using Raw simulator

Page 14: 6.189 IAP 2007 - MIT CSAIL

14 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Fixed pipelinePass 1 Pass 2 Pass 3

Pass 1 Pass 2 Pass 3Reconfigurable pipeline

● 40% faster

Shadow Volumes (64-tiles)

cycles

Page 15: 6.189 IAP 2007 - MIT CSAIL

15 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

1020 Element Microphone Array

Page 16: 6.189 IAP 2007 - MIT CSAIL

16 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Case Study: Beamformer

240

19

640

1,430

0

200

400

600

800

1,000

1,200

1,400

1,600

C program C program UnoptimizedStreamIt

Optimized StreamIt

1 GHz Pentium III 420 MHz single tileRaw

420 MHz 64 tileRaw

420 MHz 16 tileRaw

MFL

OPS

Page 17: 6.189 IAP 2007 - MIT CSAIL

17 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

The Raw Experience

● Insights into the design Raw architecture● Raw parallelizing compiler● StreamIt language and Compiler

Page 18: 6.189 IAP 2007 - MIT CSAIL

18 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ControlWideFetch

(16 inst)

UnifiedLoad/Store

Queue

PC

RF

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

Bypass Net

Scalability Problems in Wide Issue Processors

Page 19: 6.189 IAP 2007 - MIT CSAIL

19 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

Bypass Net

RF

~N3 ~N2N ALUs

Area and Frequency Scalability Problems

Page 20: 6.189 IAP 2007 - MIT CSAIL

20 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

Bypass Net

RF>>

+

Operand Routing is Global

Page 21: 6.189 IAP 2007 - MIT CSAIL

21 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

Bypass Net

RF

Idea: Make Operand Routing Local

Page 22: 6.189 IAP 2007 - MIT CSAIL

22 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RF

Bypass Net

Idea: Exploit Locality

Page 23: 6.189 IAP 2007 - MIT CSAIL

23 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RF

Replace Crossbar with Point-To-Point Network

Page 24: 6.189 IAP 2007 - MIT CSAIL

24 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RF>>

+

Replace Crossbar with Point-To-Point Network

Page 25: 6.189 IAP 2007 - MIT CSAIL

25 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

~ 1~ NLocality-driven Placement

~ N½~ NNon-local Placement

Point-to-Point NetworkCrossbar

Operand Transport Latency

Page 26: 6.189 IAP 2007 - MIT CSAIL

26 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RF

RFRF RFRF

RFRF RFRF

RFRF RFRF

RFRF RFRF

Distribute the Register File

Page 27: 6.189 IAP 2007 - MIT CSAIL

27 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RFRF RFRF

RFRF RFRF

RFRF RFRF

RFRF RFRF

ControlWideFetch

(16 inst)

UnifiedLoad/Store

Queue

PC

SCALABLE

More Scalability Problems

Page 28: 6.189 IAP 2007 - MIT CSAIL

28 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ControlWideFetch

(16 inst)

UnifiedLoad/Store

Queue

PC

More Scalability Problems

Page 29: 6.189 IAP 2007 - MIT CSAIL

29 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RFRF RFRF

RFRF RFRF

RFRF RFRF

RFRF RFRF

Control

WideFetch

(16 inst)

UnifiedLoad/Store

Queue

PC I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

Distribute Everything

Page 30: 6.189 IAP 2007 - MIT CSAIL

30 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

RFRF RFRF

RFRF RFRF

RFRF RFRF

RFRF RFRF

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

I$PC

D$I$PC

D$I$PC

D$I$PC

D$

Tiled Processor

Page 31: 6.189 IAP 2007 - MIT CSAIL

31 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Tiled Processor

Page 32: 6.189 IAP 2007 - MIT CSAIL

32 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Tiled Processor (Taylor PhD 2007)

● Fast inter-tile communication through point-to-point pipelined scalar operand network (SON)

● Easy to scale for the same reasons as multicores

Page 33: 6.189 IAP 2007 - MIT CSAIL

33 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

IF RFDA TL

M1 M2

F P

E

U

r26

r27

r25

r24

r26

r27

r25

r24

Raw Compute Processor Internals

Page 34: 6.189 IAP 2007 - MIT CSAIL

34 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

add $25,$1,$2 Route $P->$E Route $W->$P

Tile-Tile Communication

sub $20,$1,$25

Page 35: 6.189 IAP 2007 - MIT CSAIL

35 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Multiprocessor Node 1 Multiprocessor Node 2

Transport Cost

sendoverhead

receiveoverhead

sendoccupancy

sendlatency

receiveoccupancy

receivelatency

Why Communication Is Expensive on Multicores

Page 36: 6.189 IAP 2007 - MIT CSAIL

36 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

sendoccupancy

sendlatency

Destination node nameSequence numberValueLaunch sequence

Commit LatencyNetwork injection

Multiprocessor Node 1

Why Communication Is Expensive on Multicores

Page 37: 6.189 IAP 2007 - MIT CSAIL

37 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Multiprocessor Node 2

receiveoccupancy

receivelatency

receive sequence

injection cost

Why Communication Is Expensive on Multicores

Page 38: 6.189 IAP 2007 - MIT CSAIL

38 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

A Figure of Merit for SONs

● 5-tuple <SO, SL, NHL, RL, RO> Send occupancySend latencyNetwork hop latencyReceive latencyReceive occupancy

● Tip: Ordering follows timing of message from sender to receiver

Page 39: 6.189 IAP 2007 - MIT CSAIL

39 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

The Interesting Region

ScalableMultiprocessor <2, 14, 3, 14,4>(on-chip)

Raw SON < 0, 0, 1, 2, 0>(scalable)

Superscalar < 0, 0, 0, 0, 0>(not scalable)

● Where is Cell in this space?

Page 40: 6.189 IAP 2007 - MIT CSAIL

40 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

The Raw Experience

● Insights into the design Raw architecture● Raw parallelizing compiler

Page 41: 6.189 IAP 2007 - MIT CSAIL

41 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Parallelizing Compiler (Lee PhD 2005)

● Data distribution● Instruction distribution● Coordination

CommunicationControl flow

SequentialProgram

Compiler

Fine-grainedOrchestrated

Parallel Program

Page 42: 6.189 IAP 2007 - MIT CSAIL

42 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

load r1 <- addradd r1 <- r1, 1

MM

MM?

??

?

Data Distribution

Page 43: 6.189 IAP 2007 - MIT CSAIL

43 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Instruction Distribution

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

tmp0=tmp0.1

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

basic block

Page 44: 6.189 IAP 2007 - MIT CSAIL

44 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Clustering: Parallelism vs. Communication

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

tmp0=tmp0.1

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

tmp0=tmp0.1

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

Page 45: 6.189 IAP 2007 - MIT CSAIL

45 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

tmp0=tmp0.1

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

tmp0=tmp0.1

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

Adjusting Granularity: Load Balancing

Page 46: 6.189 IAP 2007 - MIT CSAIL

46 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

tmp0=tmp0.1

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

Placement

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v3.10=tmp3.6-v2.7

v3=v3.10

v2.4=v2

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

v2=v2.7

seed.0=seed

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

tmp0=tmp0.1

v1.2=v1

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

tmp1=tmp1.3

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

Page 47: 6.189 IAP 2007 - MIT CSAIL

47 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Communication Coordination

sendroutesreceive

Proc Switch ProcSwitchProcProc

Page 48: 6.189 IAP 2007 - MIT CSAIL

48 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Instruction Scheduling

● Inter-tile cycle scheduling schedules communication and can guarantee deadlock freedom

seed.0=recv()pval5=seed.0*6.0

pval4=pval5+2.0tmp3.6=pval4/3.0

tmp3=tmp3.6

v2.7=recv()v3.10=tmp3.6-v2.7

v3=v3.10

route(W,t)

route(W,S)

route(S,t)

send(seed.0)pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

send(tmp0.1)tmp0=tmp0.1 route(t,E)

v2.4=v2

seed.0=recv(0)pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5send(tmp2.5)

tmp1.3=recv()pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

Send(v2.7)v2=v2.7

route(t,E)route(E,t)

route(t,E)

v1.2=v1

seed.0=recv()pval2=seed.0*v1.2

tmp1.3=pval2+2.0

send(tmp1.3)tmp1=tmp1.3

tmp2.5=recv()

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8v0.9=tmp0.1-v1.8

v0=v0.9

route(N,t)

seed.0=seed

tmp0.1=recv()

route(t,W)route(W,t)

route(W,N)

route(t,E)route(t,E)

route(W,t)

Page 49: 6.189 IAP 2007 - MIT CSAIL

49 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Final Code Representation

seed.0=recv()

pval5=seed.0*6.0

pval4=pval5+2.0

tmp3.6=pval4/3.0

tmp3=tmp3.6

v2.7=recv()

v3.10=tmp3.6-v2.7

v3=v3.10

route(W,t)

route(W,S)

route(S,t)

send(seed.0)

pval1=seed.0*3.0

pval0=pval1+2.0

tmp0.1=pval0/2.0

send(tmp0.1)

tmp0=tmp0.1

route(t,E)

v2.4=v2

seed.0=recv(0)

pval3=seed.o*v2.4

tmp2.5=pval3+2.0

tmp2=tmp2.5

send(tmp2.5)

tmp1.3=recv()

pval6=tmp1.3-tmp2.5

v2.7=pval6*5.0

Send(v2.7)

v2=v2.7

route(t,E)

route(E,t)

route(t,E)

v1.2=v1

seed.0=recv()

pval2=seed.0*v1.2

tmp1.3=pval2+2.0

send(tmp1.3)

tmp1=tmp1.3

tmp2.5=recv()

pval7=tmp1.3+tmp2.5

v1.8=pval7*3.0

v1=v1.8

v0.9=tmp0.1-v1.8

v0=v0.9

route(N,t) seed.0=seed

tmp0.1=recv()

route(t,W)

route(W,t)

route(W,N)

route(t,E)route(t,E)

route(W,t)

Page 50: 6.189 IAP 2007 - MIT CSAIL

50 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Control Coordination

Page 51: 6.189 IAP 2007 - MIT CSAIL

51 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Asynchronous Global Branching

br x

br x

br xbr x

br x

Page 52: 6.189 IAP 2007 - MIT CSAIL

52 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

YesNoYesClean semantics

Yes

Yes

Explicit

Expensive

Multicore

Yes

Yes

Both

Cheap

Tiled Processorwith SON

Noscalable

Nopower efficient

Implicitexploitation of parallelism

FreePE-PE communication

Superscalar

Summary

● Tiled microprocessors incorporate the best elements of superscalars and multiprocessors

Page 53: 6.189 IAP 2007 - MIT CSAIL

53 6.189 IAP 2007 MITDr. Rodric Rabbah, IBM.

Raw Project Contributors

Anant AgarwalSaman Amarasinghe

Jonathan BabbRajeev BaruaIan BrattJonathan EastepMatt Frank Ben GreenwaldHenry HoffmannPaul JohnsonJason KimTheo Konstantakopoulos

Walter LeeJason MillerJames PsotaArvind SarafVivek SarkarNathan ShnidmanVolker StrumpenMichael TaylorElliot WaingoldDavid Wentzlaff