Ikki Fujiwara, Michihiro Koibuchi (National Institute of Informatics)
Hiroki Matsutani (Keio University)
Henri Casanova (University of Hawaii at Manoa)
IPDPS 2014 / May 20th, 2014 / Phoenix, Arizona, USA
The Light Speed is Fixed
Koibuchi Lab @ National Institute of Informatics
In vacuum: c ≈ 0.3 m/ns
In a cable: c ≈ 0.2 m/ns = 5.00 ns/m
Switch Delay is Continuously Decreasing
1 hop = switch delay ÷ 5 ns/m = equivalent cable length
Switch                 Switch delay   Equivalent cable length
QLogic 12300           140 ns         28 m
Cisco SFS7000D         200 ns         40 m
A future product (?)   60 ns          12 m
Switch delay will no longer dominate the end-to-end communication latency
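The conversion on this slide is a single division, so it can be checked in a few lines. This is a minimal sketch; the function name `equivalent_cable_length_m` is ours, and the only inputs are the 5 ns/m cable delay and the three switch delays quoted above.

```python
NS_PER_M = 5.0  # cable propagation delay quoted in the talk (c ≈ 0.2 m/ns in a cable)

def equivalent_cable_length_m(switch_delay_ns, ns_per_m=NS_PER_M):
    """Length of cable whose propagation delay equals one switch traversal."""
    return switch_delay_ns / ns_per_m

# Switch delays as listed on the slide.
switches = [("QLogic 12300", 140), ("Cisco SFS7000D", 200), ("A future product", 60)]
for name, delay_ns in switches:
    print(f"{name}: {delay_ns} ns is worth {equivalent_cable_length_m(delay_ns):.0f} m of cable")
```

With a 60 ns switch, one hop costs about as much as 12 m of cable, which is why cable length starts to matter for the end-to-end latency.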
What Happens in the Future
[Figure: maximum latency [μs] vs. switch delay [ns] (0 to 180 ns) for Random (degree=11, diameter=5) and Hypercube (degree=11, diameter=11)]
Traditional Hypercube outperforms the same-degree
Random topology!
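One way to see why a higher-diameter topology can win is a toy latency model: path latency = hops × switch delay + cable length × 5 ns/m. The sketch below is our own simplification (it ignores injection delay and routing), and the cable lengths of 60 m and 150 m are hypothetical values chosen for illustration, not numbers read from the plot. Only the hop counts (11 and 5) come from the diameters on the slide.

```python
NS_PER_M = 5.0  # cable propagation delay quoted in the talk

def path_latency_ns(hops, cable_m, switch_delay_ns):
    """Toy model: every hop pays one switch delay, every metre of cable pays 5 ns."""
    return hops * switch_delay_ns + cable_m * NS_PER_M

def crossover_switch_delay_ns(hops_a, cable_a_m, hops_b, cable_b_m):
    """Switch delay at which paths a and b have equal latency (None if hop counts match)."""
    if hops_a == hops_b:
        return None
    return (cable_b_m - cable_a_m) * NS_PER_M / (hops_a - hops_b)

# Hypothetical worst-case paths: many short hops vs. few long hops.
HYPERCUBE = (11, 60.0)   # (hops, total cable in metres), cable length is an assumption
RANDOM = (5, 150.0)      # (hops, total cable in metres), cable length is an assumption

crossover = crossover_switch_delay_ns(*HYPERCUBE, *RANDOM)
print(f"crossover at {crossover:.0f} ns switch delay")
for delay in (0, 60, 180):
    h = path_latency_ns(*HYPERCUBE, delay)
    r = path_latency_ns(*RANDOM, delay)
    print(f"switch delay {delay:3d} ns: hypercube {h:.0f} ns, random {r:.0f} ns")
```

Under these assumed lengths the many-hop path is faster below the crossover and slower above it, which is the qualitative effect the slide points at.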
Topology Design Trends
Geometrical Design Topological Design
Ring+Random [Koibuchi et al. ISCA12]
HyperX [Ahn et al. SC09]
Jellyfish [Singla et al. NSDI12]
Skywalk
Torus / Hypercube
Agenda
Introduction
Skywalk construction
  Intra-cabinet links
  Inter-cabinet links
Graph analysis
Cycle-accurate simulation
Conclusion
Intra-cabinet Links
[Figure: a cabinet containing switches, each with attached hosts (compute nodes); hereafter the hosts are omitted]
1. Randomly connect the switches in each cabinet (possibly fully connected)
Inter-cabinet Links
[Figure: cabinets laid out on the machine-room floor]
2. Randomly connect the cabinets in each row
Inter-cabinet Links
2. Randomly connect the cabinets in each row
3. Randomly connect the cabinets in each column
4. Randomly connect the remaining cabinets (optional)
Skywalk Construction
1. Randomly connect the switches in each cabinet (possibly fully connected)
2. Randomly connect the cabinets in each row
3. Randomly connect the cabinets in each column
4. Randomly connect the remaining cabinets (optional)
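The four construction steps can be sketched as a small generator. This is an illustration under stated assumptions, not the authors' exact procedure: each row, column, and optional extra round contributes one random pairing of cabinets, the intra-cabinet step uses the fully connected option, and each inter-cabinet link is handed to the next switch of its cabinet in round-robin order. The names `build_skywalk`, `random_pairs`, and `extra_rounds` are ours.

```python
import random
from itertools import combinations

def random_pairs(items, rng):
    """Shuffle the items and pair them up; with an odd count one item stays unpaired."""
    items = list(items)
    rng.shuffle(items)
    return [(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]

def build_skywalk(rows, cols, z, extra_rounds=0, seed=0):
    """Return a set of links. A cabinet is (row, col); a switch is (cabinet, index)."""
    rng = random.Random(seed)
    cabinets = [(r, c) for r in range(rows) for c in range(cols)]
    links = set()
    next_switch = {cab: 0 for cab in cabinets}  # round-robin pointer per cabinet

    def attach(cab):
        # Assumption: inter-cabinet links go to the cabinet's switches in turn.
        sw = (cab, next_switch[cab] % z)
        next_switch[cab] += 1
        return sw

    def connect(groups):
        # One random pairing of cabinets inside each group (assumption).
        for group in groups:
            for a, b in random_pairs(group, rng):
                links.add(frozenset((attach(a), attach(b))))

    # Step 1: intra-cabinet links (here: the fully connected option).
    for cab in cabinets:
        for i, j in combinations(range(z), 2):
            links.add(frozenset(((cab, i), (cab, j))))
    # Step 2: random links among the cabinets of each row.
    connect([[(r, c) for c in range(cols)] for r in range(rows)])
    # Step 3: random links among the cabinets of each column.
    connect([[(r, c) for r in range(rows)] for c in range(cols)])
    # Step 4 (optional): random links among all cabinets on the floor.
    for _ in range(extra_rounds):
        connect([cabinets])
    return links

links = build_skywalk(rows=4, cols=4, z=2, seed=1)
print(len(links), "links")
```

Restricting steps 2 and 3 to rows and columns is what keeps most random links short on the floor; step 4 is the only one that can produce long diagonal cables.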
Skywalk Details
Parameters
  z = Number of switches in each cabinet
  c = Number of cabinets
  di = Number of intra-cabinet links at a switch
  do = Number of inter-cabinet links at a switch
  d = di + do = Total degree
Cyclic linking
  Each inter-cabinet link of a cabinet is attached to one of that cabinet's switches, chosen in a cyclic manner
Fastest routing
  Packets choose lowest-latency paths (not shortest-hop paths)
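The difference between lowest-latency and shortest-hop routing shows up as soon as link weights include cable delay. The sketch below runs Dijkstra's algorithm with each link weighted by cable length × 5 ns/m plus one switch delay. The weighting scheme, the function name `lowest_latency_path`, and the three-switch example are our assumptions for illustration; the talk does not specify how the routing is computed.

```python
import heapq

NS_PER_M = 5.0  # cable propagation delay quoted in the talk

def lowest_latency_path(links, src, dst, switch_delay_ns=60.0):
    """Dijkstra over links given as (switch_a, switch_b, cable_length_m)."""
    adj = {}
    for a, b, length_m in links:
        weight = length_m * NS_PER_M + switch_delay_ns  # cable delay + one switch
        adj.setdefault(a, []).append((b, weight))
        adj.setdefault(b, []).append((a, weight))
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u == dst:
            break
        if dist > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, weight in adj.get(u, []):
            cand = dist + weight
            if cand < best.get(v, float("inf")):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand, v))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]

# Hypothetical example: one long direct cable vs. two short cables via switch C.
links = [("A", "B", 100.0), ("A", "C", 2.0), ("C", "B", 2.0)]
latency_ns, path = lowest_latency_path(links, "A", "B")
print(latency_ns, path)  # 140.0 ['A', 'C', 'B']
```

Here the direct link is the shortest-hop path (1 hop, 560 ns), but the two-hop detour is four times faster, so a latency-aware router takes it.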
Standpoints of Skywalk and Dragonfly
Geometrical Design Topological Design
Ring+Random [Koibuchi et al. ISCA12]
HyperX [Ahn et al. SC09]
Jellyfish [Singla et al. NSDI12]
Torus / Hypercube
Dragonfly: a 2-layer hierarchical meta-topology with intra-group and inter-group sub-topologies
Skywalk: a Dragonfly instance
• group = cabinet
• intra-group: random
• inter-group: random
Agenda
Introduction
Skywalk construction
Graph analysis
  Switch delay vs. latency
  Degree vs. latency
  Total cable length vs. latency
  Network size vs. latency
  Cabinet size vs. latency
Cycle-accurate simulation
Conclusion
Graph Analysis: Setup
Parameters (unless otherwise specified):
  z = 8 switches/cabinet
  c = 256 cabinets arranged in a 16×16 grid
  N = 2,048 switches in total
  Switch delay = 60 ns
  Packet injection delay = 300 ns
Featured topologies:
  Skywalk: fully connected for intra-cabinet
  Random: d-degree uniform random
  Torus: 3-D (8×16×16) or 5-D (8×4×4×4×4)
  HyperX: tailored to map onto the floorplan
  Dragonfly: group = cabinet, fully connected for both intra- and inter-group
See the proceedings for average latency
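The setup numbers are mutually consistent, which a short check makes explicit. The switch count, grid, and torus shapes come from the slide. The Dragonfly degree formula, (z − 1) intra-group links plus ceil((c − 1) / z) inter-group links per switch, is our inference for a fully connected Dragonfly; it is not stated in the talk, although it does reproduce the d=39 and d=19 labels that appear on the result plots.

```python
import math

z, grid = 8, (16, 16)          # switches per cabinet, cabinet grid (from the slide)
c = math.prod(grid)            # 256 cabinets
n = z * c                      # 2,048 switches

torus_3d = (8, 16, 16)
torus_5d = (8, 4, 4, 4, 4)

# A hypercube on n = 2^k switches has degree k.
hypercube_degree = n.bit_length() - 1

def dragonfly_degree(z, c):
    """Assumed formula: full mesh inside a group, one link to every other group."""
    return (z - 1) + math.ceil((c - 1) / z)

print("switches:", n)
print("3-D torus size:", math.prod(torus_3d), "degree:", 2 * len(torus_3d))
print("5-D torus size:", math.prod(torus_5d))
print("hypercube degree:", hypercube_degree)
print("dragonfly degree:", dragonfly_degree(z, c))
```

Both torus shapes multiply out to the same 2,048 switches, so all topologies are compared at equal network size.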
Switch Delay vs. Latency
[Figure: maximum latency [μs] vs. switch delay [ns] (0 to 500 ns, with a zoomed view of 0 to 60 ns) for 3-D Torus (d=6), Hypercube (d=11), Random (d=11), Skywalk (d=11), and Dragonfly (d=39). HyperX is omitted; see the proceedings for complete results.]
Skywalk leads to the lowest latency with ultra-low-delay
switches and also with high-delay switches
d = degree
Degree vs. Latency
[Figure: maximum latency [μs] vs. degree (0 to 40) for 5-D Torus, HyperX, Random, Skywalk, and Dragonfly. Skywalk with di = {1, 4} and Hypercube are omitted; see the proceedings for complete results.]
Skywalk leads to a desirable tradeoff between degree and
latency
Total Cable Length vs. Latency
[Figure: maximum latency [μs] vs. total cable length [km] (0 to 600 km) for 5-D Torus, HyperX, Random, Skywalk, and Dragonfly. Skywalk with di = {1, 4} and Hypercube are omitted; see the proceedings for complete results.]
Skywalk saves 90% cable length over Dragonfly with only
19% higher maximum latency
Network Size vs. Latency
[Figure: maximum latency [μs] vs. number of switches (128 to 8,192) for Skywalk (d=8, 16, 32, 64), 3-D Torus (d=6), and Dragonfly (d=9, 15, 39, 135). Hypercube is omitted; see the proceedings for complete results.]
Skywalk scales well with relatively low degree
d = degree
Cabinet Size vs. Latency
[Figure: maximum latency [μs] vs. number of switches per cabinet (2 to 128) for Skywalk with d=8, d=16, and d=32]
Skywalk has an optimal cabinet size because it becomes
similar to Random with very large or very small cabinets
d = degree
Agenda
Introduction
Skywalk construction
Graph analysis
Cycle-accurate simulation
  Throughput vs. latency
Conclusion
Cycle-accurate Simulation: Setup
Topology parameters:
  h = 8 hosts/switch
  z = 4 switches/cabinet
  c = 64 cabinets arranged in an 8×8 grid
  N = 256 switches in total
  Switch delay = 60 ns
Simulation parameters:
  Adaptive deadlock-free routing
  4 virtual channels
  256 bits/flit × 33 flits/packet = 8,448 bits/packet
  96 Gbps/switch ÷ 8 hosts/switch = 12 Gbps/host max.
  Random uniform traffic
See the proceedings for:
  Bit reversal traffic
  Matrix transpose traffic
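The simulation parameters reduce to a few products and quotients, reproduced here as a quick check. Packet size and per-host bandwidth are the slide's own figures. The host count and the per-packet serialization time are derived by us from those figures and are not stated in the talk.

```python
FLIT_BITS = 256
FLITS_PER_PACKET = 33
SWITCH_GBPS = 96
HOSTS_PER_SWITCH = 8
SWITCHES_PER_CABINET = 4
CABINETS = 8 * 8

packet_bits = FLIT_BITS * FLITS_PER_PACKET           # 8,448 bits/packet
host_gbps = SWITCH_GBPS / HOSTS_PER_SWITCH            # 12 Gbps/host at most
n_switches = SWITCHES_PER_CABINET * CABINETS          # 256 switches
n_hosts = n_switches * HOSTS_PER_SWITCH               # derived, not on the slide

# bits / (Gbit/s) gives nanoseconds: time for one host to inject a full packet
# at its maximum rate (derived, not on the slide).
serialization_ns = packet_bits / host_gbps

print(packet_bits, "bits/packet")
print(host_gbps, "Gbps/host")
print(n_switches, "switches,", n_hosts, "hosts")
print(serialization_ns, "ns to serialize one packet at full host rate")
```

At this packet size, serializing one packet takes several hundred nanoseconds, which is the same order of magnitude as the latencies being measured.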
Cycle-accurate Simulation: Result
[Figure: latency [μs] vs. accepted traffic [Gbit/sec/host] (0 to 12) for Skywalk (d=11), Hypercube (d=8), Dragonfly (d=19), 3-D Torus (d=6), and Random (d=11). HyperX is omitted; see the proceedings for complete results.]
Skywalk achieves low latency and higher throughput than Random, at a lower degree than Dragonfly
d = degree
Agenda
Introduction
Skywalk construction
Graph analysis
Cycle-accurate simulation
Conclusion
Wrap-up
The speed of light affects topology design once ultra-low-delay switches are put into practical use
We propose the “Skywalk” topology, which uses randomness in a layout-conscious way
Skywalk achieves desirable tradeoffs between end-to-end latency and degree or cable length
Cycle-accurate simulations show that Skywalk provides not only low latency but also high throughput at low degree
Geometrical Design Topological Design
Skywalk