1/10/2000 HPCA6
Flit-Reservation Flow Control
Li-Shiuan Peh and William J. Dally
Stanford University
Motivation for on-chip networks
• Dedicated global wires are inefficient
• On-chip interconnection network
• Switch is “free”
Flit-Reservation Flow Control
• Fast control wires on thick upper metal layer
• Control flits traverse the fast wires ahead of data flits, reserving channel bandwidth and buffers in advance
• Data flits forwarded or buffered according to reservation
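Benefit: Reduces latency
• Virtual-channel: decision latency + wire delay at each hop
• Flit-reservation: only wire delay
[Figure: per-hop timelines. Virtual-channel: decision, wire delay, decision, wire delay, decision, wire delay. Flit-reservation: the data path sees only wire delay, wire delay, wire delay, while the decisions are made separately by the control flits.]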
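The per-hop arithmetic behind this comparison can be sketched in Python. This is a first-order model under our own assumptions, not the authors' timing model: every hop costs the same `decision` and `wire` cycles, and in flit-reservation the control flit runs far enough ahead that each decision is finished before its data flit arrives. The function names are hypothetical.

```python
def vc_latency(hops: int, decision: int, wire: int) -> int:
    """Virtual-channel: each hop pays the routing/arbitration decision, then the wire delay."""
    return hops * (decision + wire)


def fr_latency(hops: int, wire: int) -> int:
    """Flit-reservation (assumed ideal): the control flit already made every decision,
    so the data flit pays only the wire delay at each hop."""
    return hops * wire


# Hypothetical example: 7 hops, 1-cycle decision, 2-cycle wire delay.
print(vc_latency(7, 1, 2), fr_latency(7, 2))  # prints: 21 14
```

Under this model the gap grows linearly with hop count, because the decision latency is removed from every hop of the data path.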
Benefit: Increases throughput through good buffer utilization
• Virtual-channel: leaves buffers idle between use
• Flit-reservation: immediate buffer re-use
[Figure: buffer timelines. Virtual-channel: buffer use, then the buffer is held through a wire delay and a credit delay before release and the next use. Flit-reservation: buffer use back to back, with each hold followed immediately by release and re-use.]
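A rough way to see the utilization difference is to count the cycles between two successive flits occupying the same buffer. The sketch below assumes, as a simplification, that in virtual-channel flow control a freed buffer sits idle for one credit delay plus one wire delay (the credit travels upstream, then the next flit crosses the wire), while in flit-reservation the next flit is already scheduled into it. By Little's law, the number of buffers a channel needs is its flit rate times that turnaround. All function and parameter names are hypothetical.

```python
import math


def buffer_turnaround(use: int, wire_delay: int, credit_delay: int, flit_reservation: bool) -> int:
    """Cycles from one flit entering a buffer until the next flit can enter it (first-order model)."""
    if flit_reservation:
        # The next flit's reservation already names this buffer, so it is re-used immediately.
        return use
    # Virtual-channel: the credit must reach the upstream node, and only then can
    # the next flit be sent and cross the wire.
    return use + credit_delay + wire_delay


def buffers_for_rate(turnaround: int, flits_per_cycle: float = 1.0) -> int:
    """Little's law: buffers in rotation = flit rate x cycles each buffer is tied up."""
    return math.ceil(turnaround * flits_per_cycle)


# Hypothetical numbers: each flit sits in a buffer for 2 cycles; 1-cycle wire and credit delays.
vc = buffer_turnaround(2, 1, 1, flit_reservation=False)  # 4 cycles
fr = buffer_turnaround(2, 1, 1, flit_reservation=True)   # 2 cycles
print(buffers_for_rate(vc), buffers_for_rate(fr))        # prints: 4 2
```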
Dynamic vs. static scheduling
• Control flits dynamically schedule movements of data flits at packet injection time
• Statically-scheduled networks schedule movements of data flits at compile time
• Flit-reservation approaches the benefits of statically-scheduled networks without sacrificing flexibility
Packet Composition
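[Figure: a packet of data flits d0, d1, d2, d3, d4, led by a control head flit carrying VC ID, dest, and the data-flit timestamps td0, td1, td2, td3, td4. A control body flit carries a VC ID.]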
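The packet layout above can be modeled as plain Python data classes. This is a sketch under stated assumptions: the slide does not say how many timestamps fit in one control flit (`TIMESTAMPS_PER_CONTROL_FLIT` is a hypothetical value), and timestamps are written here as absolute arrival cycles of data flits that follow back to back. Following the bandwidth comparison later in the talk, data flits carry no VC ID, and each control flit carries a VC ID plus the timestamps of its data flits. `compose_packet` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical limit on timestamps per control flit (not given on the slide).
TIMESTAMPS_PER_CONTROL_FLIT = 5


@dataclass
class DataFlit:
    payload: int  # no VC ID: the control flit's timestamp identifies this flit


@dataclass
class ControlHeadFlit:
    vcid: int
    dest: Tuple[int, int]  # destination (x, y) in the mesh
    timestamps: List[int] = field(default_factory=list)


@dataclass
class ControlBodyFlit:
    vcid: int
    timestamps: List[int] = field(default_factory=list)


def compose_packet(vcid, dest, payloads, first_arrival):
    """Split a packet into control flits (VC ID, dest, data-flit timestamps) and data flits.

    Assumes data flits follow on consecutive cycles starting at first_arrival.
    """
    data = [DataFlit(p) for p in payloads]
    times = [first_arrival + i for i in range(len(data))]
    groups = [times[i:i + TIMESTAMPS_PER_CONTROL_FLIT]
              for i in range(0, len(times), TIMESTAMPS_PER_CONTROL_FLIT)] or [[]]
    control = [ControlHeadFlit(vcid, dest, groups[0])]
    control += [ControlBodyFlit(vcid, g) for g in groups[1:]]
    return control, data


control, data = compose_packet(vcid=3, dest=(2, 5), payloads=[10, 11, 12, 13, 14], first_arrival=9)
print(control[0])  # ControlHeadFlit(vcid=3, dest=(2, 5), timestamps=[9, 10, 11, 12, 13])
```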
Router Architecture
[Figure: router block diagram. Components: Routing Logic (VCID → Port), Output Scheduler, Output Reservation Table, Input Reservation Table, Input Scheduler, and Input Buffer Pool (wr and rd ports). Interfaces: control flits in and out, data flits in and out, credits in and out, and connections to the other input and output channels.]
Routing and Arbitration
• Control head flit obtains output port(s) from the routing unit
• Control body/tail flits obtain their output port based on VCID
• Control flits arbitrate for output virtual channels and for passage through the control switch
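The split between head and body/tail flits amounts to a small per-input table from VCID to output port, filled when the head flit is routed. The sketch below assumes dimension-order (XY) routing on the mesh and an N/S/E/W port naming, neither of which is stated on the slide; `xy_route` and `RoutingLogic` are hypothetical names, and arbitration for output virtual channels and the control switch is left out.

```python
def xy_route(cur, dest):
    """Dimension-order routing (assumed): correct x first, then y."""
    (cx, cy), (dx, dy) = cur, dest
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"  # arrived: eject at this node


class RoutingLogic:
    """Per-input VCID -> output port table, filled by control head flits."""

    def __init__(self, node):
        self.node = node
        self.vc_port = {}

    def route(self, flit):
        vcid = flit["vcid"]
        if flit["kind"] == "head":
            # Only the head flit carries the destination, so it consults the routing unit.
            self.vc_port[vcid] = xy_route(self.node, flit["dest"])
            return self.vc_port[vcid]
        # Body and tail flits reuse the port recorded for their VCID.
        port = self.vc_port[vcid]
        if flit["kind"] == "tail":
            del self.vc_port[vcid]  # packet done: free the table entry
        return port


r = RoutingLogic(node=(2, 2))
print([r.route(f) for f in (
    {"kind": "head", "vcid": 1, "dest": (5, 2)},
    {"kind": "body", "vcid": 1},
    {"kind": "tail", "vcid": 1},
)])  # prints: ['E', 'E', 'E']
```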
Output Scheduling
• Finds the earliest departure time for a data flit, given its arrival time
• Consults the output reservation table
• Increments the number of free buffers on the next node upon receiving credits
[Figure: output reservation table for the East output channel over cycles 8 to 17, with rows "Channel busy" and "Free buffers on next node".]
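The output scheduler's search can be sketched over a table that stores, for each future cycle, whether the output channel is reserved and how many buffers are free on the next node. Assumptions, all ours: a reserved downstream buffer counts as occupied from the departure cycle to the end of the horizon until a credit arrives; a credit carries the cycle at which that buffer is released; and the horizon does not slide. `OutputReservationTable` and its methods are hypothetical names.

```python
class OutputReservationTable:
    """Per-output-channel reservation table over a fixed scheduling horizon (sketch)."""

    def __init__(self, start, free_buffers, busy=()):
        self.start = start                    # absolute cycle of the first entry
        self.free = list(free_buffers)        # free buffers on the next node, per cycle
        self.busy = [False] * len(self.free)  # output channel already reserved in that cycle?
        for t in busy:
            self.busy[t - start] = True

    def earliest_departure(self, arrival):
        """Earliest cycle >= arrival with an idle channel and a free downstream buffer
        from that cycle to the end of the horizon (the buffer is held until a credit)."""
        for t in range(max(arrival, self.start), self.start + len(self.free)):
            i = t - self.start
            if not self.busy[i] and min(self.free[i:]) > 0:
                return t
        return None  # no slot inside the horizon

    def reserve(self, departure):
        """Claim the channel in the departure cycle and one downstream buffer from then on."""
        i = departure - self.start
        self.busy[i] = True
        for j in range(i, len(self.free)):
            self.free[j] -= 1

    def credit(self, release_cycle):
        """A credit reports that a downstream buffer is free from release_cycle onward."""
        for j in range(max(release_cycle - self.start, 0), len(self.free)):
            self.free[j] += 1


table = OutputReservationTable(start=8, free_buffers=[2, 1, 1, 0, 1, 2, 3, 4, 4, 4])
d = table.earliest_departure(9)
table.reserve(d)
print(d, table.free)  # prints: 12 [2, 1, 1, 0, 0, 1, 2, 3, 3, 3]
```

The usage lines start from the before-reservation counts in the walk-through that follows and reproduce its after-reservation row.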
Input Scheduling
• Forwards or buffers a data flit when it arrives
• Determines which buffer is read onto which output port at each cycle
• Sends credits to the previous node
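A matching sketch of the input side: the reservation made by a control flit is only a place-holder, a physical buffer from the pool is bound when the data flit arrives, and a flit whose departure cycle equals its arrival cycle is forwarded without being buffered. We assume, without confirmation from the slide, that the credit goes upstream at reservation time and carries the release cycle. `InputScheduler`, `reserve`, and `tick` are hypothetical names, and at most one flit arrives per cycle on this input.

```python
from collections import defaultdict


class InputScheduler:
    """Sketch of one input channel: reservation table, buffer pool, and credits."""

    def __init__(self, num_buffers):
        self.pool = list(range(num_buffers))  # free physical buffer IDs
        self.arrivals = {}                    # arrival cycle -> (departure cycle, output port)
        self.reads = defaultdict(list)        # departure cycle -> [(buffer ID, output port)]
        self.storage = {}                     # buffer ID -> flit payload
        self.credits_out = []                 # release cycles reported upstream

    def reserve(self, arrival, departure, port):
        # Place-holder only: no physical buffer is bound at reservation time.
        self.arrivals[arrival] = (departure, port)
        # Assumption: the credit is sent now, carrying the cycle the buffer frees up.
        self.credits_out.append(departure)

    def tick(self, cycle, flit=None):
        """Advance one cycle; returns [(output port, flit)] sent this cycle."""
        sent = []
        # 1. Read out buffered flits scheduled to leave now; their buffers free at once.
        for buf, port in self.reads.pop(cycle, []):
            sent.append((port, self.storage.pop(buf)))
            self.pool.append(buf)
        # 2. Handle an arriving data flit according to its reservation.
        if cycle in self.arrivals:
            departure, port = self.arrivals.pop(cycle)
            if departure == cycle:
                sent.append((port, flit))  # forward straight through, no buffer
            else:
                # Bind a buffer only now; upstream scheduling ensures the pool is not empty.
                buf = self.pool.pop(0)
                self.storage[buf] = flit
                self.reads[departure].append((buf, port))
        return sent


s = InputScheduler(num_buffers=1)
s.reserve(10, 12, "E")  # buffered for 2 cycles
s.reserve(11, 11, "N")  # forwarded straight through
s.reserve(12, 14, "E")  # re-uses the buffer freed at cycle 12
for cycle, flit in [(10, "a"), (11, "b"), (12, "c"), (13, None), (14, None)]:
    print(cycle, s.tick(cycle, flit))
```

With a single buffer, flit "c" lands in the buffer that flit "a" leaves in the same cycle, which is the immediate re-use the throughput slide describes.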
Walk-through: Output Scheduler
• Example: data flit arriving in 9 cycles
• Before reservation (East channel):
    Time                        8  9 10 11 12 13 14 15 16 17
    Free buffers on next node   2  1  1  0  1  2  3  4  4  4
• After reservation (East channel):
    Time                        8  9 10 11 12 13 14 15 16 17
    Free buffers on next node   2  1  1  0  0  1  2  3  3  3
[Each table also has a "Channel busy" row, shown graphically.]
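The departure cycle in this walk-through can be checked by hand. Write $F(t)$ for the free-buffer count on the next node and $B(t)$ for the channel-busy flag at cycle $t$, read the example as a flit arriving at cycle $t_a = 9$, and assume (our reading of the tables, not stated on the slide) that a reserved downstream buffer stays counted as occupied through the last cycle shown, 17. The channel must be idle at cycle 12, since the after-reservation row starts its decrement there.

```latex
t_d = \min\{\, t \ge t_a \;:\; B(t) = 0 \ \text{and}\ F(t') \ge 1 \ \ \forall\, t' \in [t, 17] \,\}

F(11) = 0 \;\Rightarrow\; t_d \notin \{9, 10, 11\}, \qquad
\bigl(F(12), \dots, F(17)\bigr) = (1, 2, 3, 4, 4, 4) \ge 1 \;\Rightarrow\; t_d = 12

F'(t) = F(t) - 1 \ \ (t \ge 12) \;\Rightarrow\; \bigl(F'(12), \dots, F'(17)\bigr) = (0, 1, 2, 3, 3, 3)
```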
Walk-through: Input Scheduler
• Place-holder at reservation time
• Actual buffer allocation just before flit arrival
[Figure: input reservation table for the West input channel over cycles 8 to 17, with rows "Flit Arriving?", "Buffer in", "Departure Time", "Buffer out", and "Output Channel"; example entries: output channel E, departure time +2, buffer 5.]
Storage overhead
[Figure: storage overhead of VC and FR broken down into data buffers, control buffers, queue pointers, output reservation table, and input reservation table.]
• Configurations with equal storage overhead:
    VC           FR
    8 buffers    6 buffers
    16 buffers   13 buffers
    32 buffers   28 buffers
Bandwidth overhead
• VC: VCID for each data flit
• FR: VCID for each control flit + timestamps for its data flits
• FR incurs 5 bits/256 bits = 2% additional bandwidth overhead over VC
[Figure: VC flits each carry a VC ID alongside their data; FR control flits carry a VC ID and timestamps, and FR data flits carry only data.]
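The 2% figure is the extra 5 bits divided by the 256-bit flit width used in the simulations:

```latex
\frac{5\ \text{bits}}{256\ \text{bits}} \approx 0.0195 \approx 2\%
```

If those 5 bits are read as one timestamp per data flit (our assumption; the slide does not break them down), a 5-bit field distinguishes $2^5 = 32$ cycles, which lines up with the 32-cycle scheduling horizon in the simulation parameters.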
Simulation parameters
• 8-by-8 mesh
• Each flit 256 bits wide
• Control wires 4x faster than data wires
• 1-cycle routing and arbitration delay
• VC: number of VCs chosen to realize the best performance
• FR: 1 control flit leads 1 data flit, 2 control flits injected per cycle, 32-cycle scheduling horizon
• Average latency over 100,000 packets
• Latency spans the time from packet creation until the last data flit is ejected at the destination
Fast control (5-flit packets)
• Base latency: VC8 (32 cycles), FR6 (27 cycles)
• Throughput: VC8 (63% capacity), FR6 (77% capacity); VC16 (80% capacity), FR13 (85% capacity)
[Figure: latency (cycles, 0 to 100) vs. traffic (fraction of capacity, 0 to 1) for VC8, FR6, VC16, and FR13.]
Fast control (21-flit packets)
• Base latency: VC16 (55 cycles), FR13 (46 cycles)
• Throughput: VC16 (65% capacity), FR13 (75% capacity); VC32 (65% capacity), FR28 (82% capacity)
[Figure: latency (cycles, 0 to 300) vs. traffic (fraction of capacity, 0 to 1) for VC16, VC32, FR13, and FR28.]
Flit-reservation for off-chip networks
• No fast control wires for off-chip networks
• Defer data flits behind control flits
• Ensure excess capacity on control network
Leading control (5-flit packets)
• Base latency: VC8 (15 cycles), FR6 (15 cycles)
• Latency at 50% capacity: VC8 (21 cycles), FR6 (19 cycles)
• Throughput: VC8 (65% capacity), FR6 (75% capacity); VC16 (80% capacity), FR13 (85% capacity)
[Figure: latency (cycles, 0 to 100) vs. traffic (fraction of capacity, 0 to 1) for VC8, VC16, FR6, and FR13.]
Conclusion
• Flit-Reservation Flow Control
  – Reduces latency and increases throughput
  – Approaches the performance of statically-scheduled networks
  – Dynamically reserves buffers and channel bandwidth
  – Applies to on-chip and off-chip networks
• Current status
  – Detailed Verilog model of the router
  – Deadlock analysis