Routing Algorithms
ECE 284 On-Chip Interconnection Networks
Spring 2014
Routing
• Will assume a 2D mesh in this talk
• How flits are routed from source to destination can greatly impact network congestion
• Two types of routing:
  – Oblivious routing: routing does not consider or depend on the current traffic condition
  – Adaptive routing: takes the current traffic condition into consideration to determine the routing path (tries to get around congested areas)
• Oblivious routing is simpler (less expensive) to implement
• This talk reviews existing oblivious routing algorithms
Routing Algorithm Objectives
• Maximize throughput
  – How much load the network can handle
• Minimize latency
  – Minimize routing delay between source and destination
Dimension-Ordered Routing (DOR) (also called XY routing)
Minimal XY (or YX) routing to the destination; the example here uses the XY route with probability 1.0.
[Figure: XY route from source S to destination D]
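A minimal Python sketch of XY routing, assuming each node is identified by an (x, y) tuple on the mesh; the function name `xy_route` is hypothetical. It fully corrects the X coordinate before moving in Y and returns every node visited, endpoints included.

```python
def xy_route(src, dst):
    """Nodes visited by minimal XY (dimension-ordered) routing, endpoints included.

    Nodes are (x, y) tuples on a 2D mesh (an assumption of this sketch).
    """
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # route in the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then route in the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))   # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the route is fully determined by (src, dst), DOR needs no routing state or randomness, which is what makes it cheap in hardware.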
DOR (XY) Routing with Uniform Traffic
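• For an N = K x K mesh (K even), N/2 nodes are in the top half.
• Under uniform traffic, 1/2 of each top-half node's traffic crosses the bisection.
• The traffic crossing the bisection is uniformly distributed across the K channels that cross it in that direction.
• Therefore, with each node injecting 1 flit/cycle, the maximum channel load for DOR with uniform traffic is:
$$\gamma(\text{DOR, uniform}) = \frac{(N/2)\cdot(1/2)}{K} = \frac{K^2/4}{K} = \frac{K}{4}$$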
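The bisection argument can be checked by brute force. This sketch (hypothetical helper names; nodes as (x, y) tuples, each node injecting 1 flit/cycle spread evenly over all N destinations including itself) routes every source/destination pair with XY routing and accumulates the load on each directed channel.

```python
from collections import defaultdict
from itertools import product


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def dor_uniform_max_load(k):
    """Max load on any directed channel of a k x k mesh (k >= 2) under XY routing.

    Assumes every node injects 1 flit/cycle, spread evenly over all N = k*k
    destinations (itself included).
    """
    nodes = list(product(range(k), repeat=2))
    share = 1.0 / len(nodes)          # fraction of a node's traffic per destination
    load = defaultdict(float)         # (from_node, to_node) -> flits/cycle
    for src in nodes:
        for dst in nodes:
            path = xy_route(src, dst)
            for link in zip(path, path[1:]):
                load[link] += share
    return max(load.values())


for k in (4, 8, 3):
    print(k, dor_uniform_max_load(k))
```

For even K this reproduces K/4 (1.0 for K = 4, 2.0 for K = 8). For odd K the half-and-half bisection no longer exists, and the enumeration gives (K^2 - 1)/(4K) instead, e.g. 2/3 for K = 3.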
Problem with DOR (XY) Routing
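• Minimal hop count
• But in the worst case, the links can become heavily congested, e.g., under a transpose traffic pattern:
  – (0, 0) -> (3, 3)
  – (1, 0) -> (3, 2)
  – (2, 0) -> (3, 1)
• Worst-case channel load:
$$\gamma_{wc}(\text{DOR}) = K - 1 \gg \gamma(\text{DOR, uniform}) = \frac{K}{4}$$
For large K, this is roughly four times the uniform-traffic load.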
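A small sketch that reproduces the K - 1 worst case. It assumes coordinates are (x, y) and reads the slide's transpose as the mapping (x, y) -> (K-1-y, K-1-x), which matches the three flows listed; `anti_transpose` and `dor_max_load` are hypothetical names. Under that reading, all three listed flows share the link from (2, 0) to (3, 0).

```python
from collections import Counter
from itertools import product


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def anti_transpose(k, x, y):
    """Hypothetical pattern (x, y) -> (k-1-y, k-1-x).

    For k = 4 it sends (0, 0) -> (3, 3), (1, 0) -> (3, 2), (2, 0) -> (3, 1).
    """
    return (k - 1 - y, k - 1 - x)


def dor_max_load(k, pattern):
    """Max load on any directed channel when every node sends 1 flit/cycle
    to pattern(k, x, y) using XY routing."""
    load = Counter()
    for x, y in product(range(k), repeat=2):
        path = xy_route((x, y), pattern(k, x, y))
        load.update(zip(path, path[1:]))   # +1 on each link the flow uses
    return max(load.values())


print([dor_max_load(k, anti_transpose) for k in (4, 5, 8)])   # [3, 4, 7]
```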
Valiant Load-Balancing (VAL) [1981]
Minimal XY routing to a randomly chosen intermediate node, then minimal XY routing to the destination node
[Figure: route from source S through a randomly chosen intermediate node to destination D]
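A sketch of Valiant routing built on XY routing, assuming (x, y) node tuples and an intermediate drawn uniformly from all K x K nodes; `val_route` and its `rng` parameter are hypothetical. The two XY phases are concatenated, dropping the duplicated intermediate node.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def val_route(src, dst, k, rng=random):
    """Valiant routing on a k x k mesh: XY to a uniformly random intermediate
    node, then XY from there to the destination. Returns (intermediate, path)."""
    mid = (rng.randrange(k), rng.randrange(k))
    # Join the two phases without repeating the intermediate node.
    path = xy_route(src, mid) + xy_route(mid, dst)[1:]
    return mid, path


rng = random.Random(7)   # seeded for reproducibility
mid, path = val_route((0, 0), (3, 3), k=4, rng=rng)
print(mid, len(path) - 1, "hops")
```

The path is usually non-minimal, since the intermediate can lie outside the rectangle spanned by source and destination; that is the source of VAL's hop-count penalty.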
Valiant Load-Balancing (VAL)
• Works by turning any traffic pattern, even an adversarial or worst-case one, into two phases of uniform traffic.
• In effect, it evenly load-balances the traffic.
• Worst-case channel load:
$$\gamma_{wc}(\text{VAL}) = 2\,\gamma(\text{DOR, uniform}) = 2 \cdot \frac{K}{4} = \frac{K}{2}$$
which corresponds to 1/2 of the network capacity (the throughput DOR achieves on uniform traffic).
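The factor of 2 can be checked by computing VAL's expected channel loads exactly, weighting every intermediate node by 1/N. This sketch assumes (x, y) node tuples, 1 flit/cycle per node, and reuses the hypothetical `anti_transpose` permutation (x, y) -> (K-1-y, K-1-x) as the adversarial pattern.

```python
from collections import defaultdict
from itertools import product


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def anti_transpose(k, x, y):
    """Hypothetical example permutation: (x, y) -> (k-1-y, k-1-x)."""
    return (k - 1 - y, k - 1 - x)


def val_max_load(k, pattern):
    """Expected max channel load of VAL on a k x k mesh when each node sends
    1 flit/cycle to pattern(k, x, y); each intermediate has probability 1/N."""
    nodes = list(product(range(k), repeat=2))
    p_mid = 1.0 / len(nodes)
    load = defaultdict(float)
    for src in nodes:
        dst = pattern(k, *src)
        for mid in nodes:
            # Phase 1 (src -> mid) and phase 2 (mid -> dst), both XY-routed.
            for a, b in ((src, mid), (mid, dst)):
                path = xy_route(a, b)
                for link in zip(path, path[1:]):
                    load[link] += p_mid
    return max(load.values())


print(val_max_load(4, anti_transpose), val_max_load(8, anti_transpose))
```

For a permutation pattern, each phase carries exactly the uniform-traffic load on every channel, so the result is 2 x K/4 = K/2 (2.0 for K = 4, compared with 3 for DOR on the same pattern).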
Valiant Load-Balancing (VAL)
• The effective network capacity (normalized worst-case throughput) is therefore 1/2.
• However, the average hop count is 2× that of DOR.
• 1/2 capacity was thought to be the optimal worst-case throughput for any routing algorithm.
ROMM [1995]
Minimal XY routing to a randomly chosen intermediate node that lies only in the minimal direction to the destination, then minimal XY routing to the final destination node
[Figure: route from source S through a randomly chosen intermediate node, restricted to the minimal direction, to destination D]
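A sketch of two-phase ROMM under the assumption that "only in the minimal direction" means the intermediate is drawn uniformly from the rectangle spanned by source and destination; `romm_route` is a hypothetical name. Because both phases move monotonically toward the destination, the combined path is always minimal.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def romm_route(src, dst, rng=random):
    """Two-phase ROMM sketch: pick a random intermediate inside the minimal
    rectangle spanned by src and dst, then route XY to it and XY onward."""
    (sx, sy), (dx, dy) = src, dst
    mid = (rng.randint(min(sx, dx), max(sx, dx)),
           rng.randint(min(sy, dy), max(sy, dy)))
    path = xy_route(src, mid) + xy_route(mid, dst)[1:]
    return mid, path


rng = random.Random(3)
mid, path = romm_route((0, 0), (3, 2), rng)
print(mid, len(path) - 1)   # second value is always 5 (= 3 + 2 hops)
```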
ROMM
• Tries to load-balance traffic by randomly distributing traffic along all possible minimal paths.
• Minimal hop count is guaranteed, which is good.
• But it turns out that, in the worst case, ROMM performs about as badly as DOR.
O1TURN [2005]
Uses both minimal XY and minimal YX routing to the destination, each with probability 0.5 (0.5 XY + 0.5 YX)
[Figure: XY and YX routes from source S to destination D]
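A sketch of O1TURN's path choice, assuming (x, y) node tuples; `yx_route` and `o1turn_route` are hypothetical names. It only shows route selection: a hardware implementation also keeps XY and YX packets on separate virtual-channel classes to stay deadlock-free, which this sketch ignores.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def yx_route(src, dst):
    """Nodes visited by minimal YX routing (Y first, then X)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    return path


def o1turn_route(src, dst, rng=random):
    """O1TURN: take the XY path or the YX path, each with probability 0.5."""
    return xy_route(src, dst) if rng.random() < 0.5 else yx_route(src, dst)


print(o1turn_route((0, 0), (2, 3), random.Random(0)))
```

Both candidate paths are minimal, which is why O1TURN has no hop-count penalty.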
O1TURN
• Even though it only considers the XY and YX paths, rather than all possible paths (as in VAL) or all possible minimal paths (as in ROMM), it is guaranteed to achieve 1/2 capacity in the worst case for the even radix case, which has been shown to be optimal.
• For the odd radix case, O1TURN comes very close to 1/2 capacity, which was then believed to be optimal.
• Unlike VAL, O1TURN only uses minimal routing paths, so there is no penalty in hop count.
Comparison
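O1TURN worst-case throughput relative to the optimum (Opt):
• Even radix: $\text{Opt} \times 1$
• Odd radix: $\text{Opt} \times (1 - 1/K^2)$
[Figure: throughput comparison of VAL, DOR, ROMM, and O1TURN (y-axis from 0.1 to 0.5)]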
[adapted from Seo et al., O1TURN talk, ISCA 2005]
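The two radix cases above as a small exact-arithmetic sketch, assuming Opt = 1/2 of network capacity (the value the earlier slides treat as optimal); `o1turn_worst_case` is a hypothetical name. With that assumption it yields 4/9 ≈ 0.44 for 3 x 3 and 12/25 = 0.48 for 5 x 5, the O1TURN worst-case values in the later throughput tables.

```python
from fractions import Fraction

OPT = Fraction(1, 2)   # assumption: Opt = 1/2 of network capacity


def o1turn_worst_case(k, opt=OPT):
    """Worst-case throughput of O1TURN on a k x k mesh, as a fraction of capacity."""
    if k % 2 == 0:
        return opt                           # even radix: Opt * 1
    return opt * (1 - Fraction(1, k * k))    # odd radix: Opt * (1 - 1/K^2)


for k in (3, 4, 5, 6):
    w = o1turn_worst_case(k)
    print(k, w, float(w))
```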
Simulation Results
• 4 x 4 2D mesh, uniform random traffic pattern
[Figure: average latency (cycles) vs. throughput (flits/node/cycle) for DOR, ROMM, O1TURN, and DUATO]
[adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results
• 4 x 4 2D mesh, matrix transpose traffic pattern
  – One of the worst-case traffic patterns for DOR
[Figure: average latency (cycles) vs. throughput (flits/node/cycle) for DOR, ROMM, O1TURN, and DUATO]
[adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results
• 4 x 4 2D mesh, bit complement traffic pattern
  – An already balanced traffic pattern
[Figure: average latency (cycles) vs. throughput (flits/node/cycle) for DOR, ROMM, O1TURN, and DUATO]
[adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results
• 4 x 4 2D mesh, hot spot traffic pattern
  – 2 nodes have 20% of the traffic
[Figure: average latency (cycles) vs. throughput (flits/node/cycle) for DOR, ROMM, O1TURN, and DUATO]
[adapted from Seo et al., O1TURN talk, ISCA 2005]
Simulation Results
• Delay penalty of adaptive routing
  – How the complexity of the router implementation affects latency
  – Hot spot traffic pattern
[Figure: average latency (FO4) vs. throughput (flits/node/cycle) for DOR, ROMM, O1TURN, and DUATO]
[adapted from Seo et al., O1TURN talk, ISCA 2005]
U2TURN [2012]
• 1/2 capacity had been thought to be the optimal worst-case throughput for both odd and even radices, and O1TURN is the state of the art for achieving it in the even radix case.
• But it turns out that 1/2 capacity is not optimal for the odd radix case.
U2TURN
• U2TURN considers all possible XYX and YXY 2-turn paths, and selects among these paths with equal probability.
• XYX paths: randomly select a node in the same row as the source and route to it, followed by minimal YX routing to the final destination.
• YXY paths: randomly select a node in the same column as the source and route to it, followed by minimal XY routing to the final destination.
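A sketch of U2TURN path generation, assuming (x, y) node tuples and reading "equal probability" as picking XYX or YXY with probability 1/2 and then one of the K turning columns (XYX) or rows (YXY) uniformly; `u2turn_route` and its helpers are hypothetical names. Every generated path has at most two turns.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing (X first, then Y)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def yx_route(src, dst):
    """Nodes visited by minimal YX routing (Y first, then X)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    return path


def u2turn_route(src, dst, k, rng=random):
    """U2TURN sketch on a k x k mesh: one of k XYX or k YXY paths,
    each with probability 1/(2k)."""
    sx, sy = src
    pivot = rng.randrange(k)
    if rng.random() < 0.5:
        # XYX: move along the source row to column `pivot`, then minimal YX.
        turn = (pivot, sy)
        return xy_route(src, turn) + yx_route(turn, dst)[1:]
    # YXY: move along the source column to row `pivot`, then minimal XY.
    turn = (sx, pivot)
    return yx_route(src, turn) + xy_route(turn, dst)[1:]


print(u2turn_route((0, 0), (3, 2), k=5, rng=random.Random(4)))
```

This sketch does not special-case source/destination pairs that share a row or column, where some choices double back along the same line; how the original design handles those pairs is not covered here.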
Analytical Results
• For the even radix case, the worst-case capacity of U2TURN is 1/2, the same as VAL and O1TURN, which is optimal.
• But for the odd radix case, the worst-case capacity of U2TURN is
$$\frac{K+1}{2K+1} > \frac{1}{2},$$
which is better than any existing routing algorithm.
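The capacity formula evaluated exactly; `u2turn_worst_case` is a hypothetical name. For odd K the gain over 1/2 is (K+1)/(2K+1) - 1/2 = 1/(2(2K+1)), so it is largest for small meshes: 4/7 ≈ 0.57 for 3 x 3 and 6/11 ≈ 0.55 for 5 x 5, matching the U2TURN worst-case entries in the throughput tables.

```python
from fractions import Fraction


def u2turn_worst_case(k):
    """Worst-case throughput of U2TURN on a k x k mesh, as a fraction of capacity."""
    if k % 2 == 0:
        return Fraction(1, 2)
    return Fraction(k + 1, 2 * k + 1)


for k in (3, 5, 7, 9):
    w = u2turn_worst_case(k)
    print(k, w, round(float(w), 3))
```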
Worst-Case Throughput
[Figure: worst-case throughput of VAL, DOR, ROMM, O1TURN, U2TURN, and optimal routing]
Throughput Comparison for Odd Radix
Normalized throughput (fraction of network capacity)

3 x 3 mesh         VAL   DOR    O1TURN  U2TURN
Worst-case         0.5   0.33   0.44    0.57
Average-case       0.5   0.405  0.477   0.604
Transpose          0.5   0.33   0.67    0.8
Random             0.5   1      1       0.72
DOR-WC             0.5   0.33   0.67    0.8
Complement         0.5   0.67   0.67    0.57
Nearest-Neighbor   0.5   1.33   1.33    0.75

5 x 5 mesh         VAL   DOR    O1TURN  U2TURN
Worst-case         0.5   0.3    0.48    0.55
Average-case       0.5   0.44   0.53    0.632
Transpose          0.5   0.3    0.6     0.75
Random             0.5   1      1       0.685
DOR-WC             0.5   0.3    0.6     0.75
Complement         0.5   0.6    0.6     0.55
Nearest-Neighbor   0.5   2.4    2.4     1.17
Throughput Comparison for Even Radix
Normalized throughput (fraction of network capacity)

4 x 4 mesh         VAL   DOR    O1TURN  U2TURN
Worst-case         0.5   0.33   0.5     0.5
Average-case       0.5   0.48   0.54    0.64
Transpose          0.5   0.33   0.67    0.8
Random             0.5   1      1       0.7
DOR-WC             0.5   0.33   0.67    0.8
Complement         0.5   0.5    0.5     0.5
Nearest-Neighbor   0.5   2      2       1.1

6 x 6 mesh         VAL   DOR    O1TURN  U2TURN
Worst-case         0.5   0.3    0.5     0.5
Average-case       0.5   0.47   0.556   0.65
Transpose          0.5   0.3    0.6     0.75
Random             0.5   1      1       0.682
DOR-WC             0.5   0.3    0.6     0.75
Complement         0.5   0.5    0.5     0.5
Nearest-Neighbor   0.5   3      3       1.27
Latency
• A potential concern about U2TURN is that it uses non-minimal routing paths.
• However, U2TURN does a better job of load-balancing traffic than O1TURN for difficult traffic patterns.
• Consequently, queuing delay can be very high for O1TURN on difficult traffic patterns, resulting in longer latency despite fewer hops.
• Surprisingly, U2TURN's latency is better for both odd and even radix cases on difficult traffic patterns.
Latency for 7 x 7 Mesh
Latency for 8 x 8 Mesh
References
• Valiant [L. G. Valiant et al., ACM, 1981]
• ROMM [T. Nesson et al., ACM, 1995]
• O1TURN [D. Seo et al., ISCA 2005]
• U2TURN [G. Sun et al., ICCD 2012]