Department of Computer Science, Jinan University, Guangzhou, P.R. China
Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.
• Motivation
• Challenges
• Related work
• Our idea
• System architecture
• Evaluation
• Conclusion
• The Explosive Growth of Data
  • IDC: 1,800 EB of data in 2011, 40-60% annual increase
  • Larger data centers. Google: 19 data centers, more than 1 million servers
  • Higher traffic. Cisco forecasts that annual traffic in global data centers will nearly triple over the next 5 years and reach 7.7 ZB by the end of 2017
Figure: Google Data Center
• Data Center Network
  • Node increment: scalability?
  • Failures are common: fault tolerance?
    • Google MapReduce in a 4,000-node cluster: 5 nodes fail during a job, and 1 disk fails every 6 hours
  • Bandwidth-hungry services: network capacity?
    • Infrastructure services: MapReduce, GFS, …
    • Network applications: cloud disk, video, …
• Tree-based Structure
  • Traditional tree: bandwidth bottleneck, single points of failure, expensive
  • Modified tree (Fat-tree): high capacity, limited scalability
Figures: Traditional Tree-based Structure; Fat-tree
• Other novel, hybrid network structures
  • Physical topology: level-based, but not tree-based; recursively defined
  • Routing mechanism
    • No routers and no traditional Internet routing mechanism
    • Routing intelligence is placed on servers
    • Takes advantage of structural properties
  • Typical structures: DCell, FiConn, BCube, Totoro…
• Physical structures (one figure per structure)
  • DCell
  • Totoro
  • FiConn
  • BCube
• Routing mechanisms: comparison of DCell, Totoro, FiConn and BCube (values are listed left to right; a value may span several structures)
  • Core idea: Divide-and-Conquer | Correct different address digits
  • Calculation: Hop by hop | Full path
  • Link state: Broadcast domain | Path probing
  • Path selection: Dijkstra + Rerouting | Greedy | Available one
  • Traffic-aware: No mention | Yes | No mention
  • Shortest distance: No | Yes
• What we achieve: Athena Routing Mechanism
  • Routing algorithm
    • Based on Dynamic Programming
    • Finds the shortest path with lower complexity than classic algorithms
    • Supports multi-path
  • Path probing mechanism
    • Bypasses failed nodes and links
    • Traffic-aware
  • Properties: more resilient, shorter latency, higher capacity, lower complexity
• Athena Routing Mechanism
  • Implemented on the Totoro structure
  • Compared with the original Totoro Fault-tolerant Routing algorithm (TFR) and the Shortest Path Algorithm (SPA, based on Floyd-Warshall)
  • Applicable to DCell, FiConn, BCube…
    • Similar topology: level-based, recursively defined
    • Routing intelligence is placed on servers
• Totoro
  • Two-port servers (two-port NIC)
  • Low-end switches
  • Level-based
  • Recursively defined
Figure: Totoro Structure of One Level
• Building Totoro
  • Connect N servers to an N-port switch (here, N = 4)
  • Basic partition: Totoro0
Figure: A Totoro0 Structure (switch labeled "Intra-switch")
• Building Totoro
  • Available ports in Totoro0: c (here, c = 4)
  • Connect n Totoro0s to n-port switches by using c/2 ports
  • A Totoro1 structure consists of n Totoro0s.
Figure: A Totoro1 structure (switches labeled "Inter-switch")
• Building Totoro
  • Connect n Totoro_{i-1}s to n-port switches to build a Totoro_i
  • Recursively defined
  • Only half of the available ports are used ⇒ open and scalable
  • The number of paths among Totoro_i s is n/2 times the number of paths among Totoro_{i-1}s ⇒ multi-redundant links ⇒ high network capacity
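The recursive construction above can be turned into a small size calculation. The following is a minimal sketch, not code from the talk: the function names are hypothetical, and it assumes that a Totoro0 exposes N free server ports and that each higher level consumes half of the free ports of every sub-structure, as the slides describe.

```python
def totoro_servers(N: int, n: int, K: int) -> int:
    """Servers in a level-K structure.

    A Totoro0 holds N servers and every higher level is made of
    n copies of the level below it.
    """
    return N * n ** K


def available_ports(N: int, n: int, level: int) -> int:
    """Free server ports left in one structure of the given level.

    Assumption: a Totoro0 has N free ports, and building each higher
    level uses half of the free ports of each of its n sub-structures.
    """
    free = N
    for _ in range(level):
        free = n * (free // 2)
    return free


if __name__ == "__main__":
    # The example structure from the slides: N = 4, n = 4, K = 2.
    print(totoro_servers(4, 4, 2))
    print(available_ports(4, 4, 2))
```

With N = n = 4 this gives 64 servers at level 2. One parameter choice that yields the 4096 servers used in the evaluation is N = n = 16, although the slides do not state which values were used.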
Figure: Totoro2 structure with N = 4, n = 4, K = 2. Servers are labeled with three-digit addresses from 0, 0, 0 to 3, 3, 3; level-1 switches are labeled 1-0, 0 to 1-3, 1; level-2 switches are labeled 2-0 to 2-3. The legend distinguishes Level-1 links from Level-2 links.
• Athena Routing Algorithm (ARA)
  • Based on Dynamic Programming (DP)
  • DP is applicable to problems that exhibit overlapping subproblems and optimal substructure
  • Paths are calculated recursively
Steps of ARA:
1. Suppose src and dst belong to two partitions.
2. Get all paths connecting these two partitions.
3. For each path, recursively calculate it.
4. Store all paths.
5. Sort all paths by length.
6. Remove the extra paths.

Slide annotations: the function that gets the connecting paths is based on the corresponding structural properties; sub-paths are combined by Cartesian product.
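The six steps can be illustrated with a toy model. This is a hedged sketch and not the authors' implementation: servers are `(partition, index)` tuples in a two-level structure, `bridges` is a hypothetical stand-in for the structure-specific function, and its linking rule (server `(p, j)` reaches `(q, j)` for the first N/2 indices) is an assumption chosen only to keep the example small. Path length is counted in servers, ignoring switches.

```python
from functools import lru_cache
from itertools import product

N = 4            # servers per basic partition (toy value)
MAX_PATHS = 2    # size of the result set kept for each (src, dst) pair


def bridges(src_part, dst_part):
    """Hypothetical structure-specific rule: server pairs linking two partitions.

    Toy assumption: server (p, j) reaches server (q, j) through a shared
    upper-level switch for j = 0 .. N/2 - 1.
    """
    return [((src_part, j), (dst_part, j)) for j in range(N // 2)]


@lru_cache(maxsize=None)   # memoize sub-results (overlapping subproblems)
def ara(src, dst):
    """Return up to MAX_PATHS paths (tuples of servers), shortest first."""
    if src == dst:
        return ((src,),)
    if src[0] == dst[0]:                       # same basic partition: one hop
        return ((src, dst),)
    paths = []
    for u, v in bridges(src[0], dst[0]):       # step 2: connecting paths
        heads = ara(src, u)                    # step 3: recurse on each side
        tails = ara(v, dst)
        # step 4: store every head/tail combination (Cartesian product)
        paths.extend(h + t for h, t in product(heads, tails))
    paths.sort(key=len)                        # step 5: sort by length
    return tuple(paths[:MAX_PATHS])            # step 6: drop the extra paths


if __name__ == "__main__":
    for path in ara((0, 3), (1, 0)):
        print(path)
```

Setting `MAX_PATHS = 1` keeps only the shortest path found, while a larger value gives the multi-path behavior mentioned earlier in the talk.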
• Case study of ARA: work out the path from src to dst
  • Step 1: src and dst belong to two different sub-partitions.
  • Step 2: there exist two paths between these two sub-partitions.
  • Step 3: for Path 1, recursively work out the sub-paths in these sub-partitions, and join them into a full path.
  • Step 4: similarly, work out the full path for Path 2.
  • Step 5: add all paths into the result set.
  • Step 6: sort the paths by length.
  • Step 7: remove the extra paths (here, we suppose the size of the set to return is 1, i.e., only the shortest path is kept).
• Path Probing Mechanism
  • The source host sends probing request packets
  • The destination host sends probing reply packets
  • Intermediate servers record the link capacities in the probing packets and forward them
  • Detects failed paths: no extra rerouting technique is required
  • Detects link capacity: supports load balancing…
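The probing idea can be sketched as a small simulation. This is an illustration under stated assumptions, not the protocol from the talk: `probe` and `select_path` are hypothetical names, a failed link or node is modeled as a missing entry in the capacity map (so no reply comes back), and choosing the path with the largest bottleneck capacity is one possible traffic-aware policy rather than the one the authors necessarily use.

```python
def probe(path, capacity):
    """Simulate a probing request along `path`.

    `capacity` maps an undirected link (a frozenset of two servers) to its
    available capacity. Returns the smallest capacity recorded along the
    path, or None if some link is missing (no probing reply arrives).
    """
    recorded = []
    for a, b in zip(path, path[1:]):
        cap = capacity.get(frozenset((a, b)))
        if cap is None:          # failed link or node: the probe is lost
            return None
        recorded.append(cap)     # intermediate server records the capacity
    return min(recorded) if recorded else float("inf")


def select_path(candidates, capacity):
    """Pick the live candidate with the largest bottleneck capacity."""
    best, best_cap = None, -1.0
    for path in candidates:
        cap = probe(path, capacity)
        if cap is not None and cap > best_cap:
            best, best_cap = path, cap
    return best                  # None when every candidate has failed


if __name__ == "__main__":
    links = {
        frozenset(("A", "B")): 10, frozenset(("B", "D")): 2,
        frozenset(("A", "C")): 5, frozenset(("C", "D")): 5,
    }
    print(select_path([("A", "B", "D"), ("A", "C", "D")], links))
```

Because failed candidates simply never produce a reply, the source can fall back to the next candidate without a separate rerouting step.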
• Protocol Implementation
  • ARM packet format: path-probing packet and data packet
  • Protocol: 2.5-layer protocol
  • How does an intermediate server determine the next hop?
    • A fact: two adjacent servers in a path differ in only one "bit" (one position of the address).
    • Hence, only the differing "bit"s are stored in the vector.
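The "store only the differing bits" idea can be sketched as a simple encode/decode pair. This is an assumed illustration, not the ARM packet format: addresses are tuples of digits, each hop is stored as a hypothetical `(position, new_digit)` pair, and the hop counter passed to `next_hop` is an assumption about how a server finds its entry in the vector.

```python
def encode_path(path):
    """Compress a path into (source address, list of (position, new_digit))."""
    diffs = []
    for cur, nxt in zip(path, path[1:]):
        changed = [i for i, (a, b) in enumerate(zip(cur, nxt)) if a != b]
        if len(cur) != len(nxt) or len(changed) != 1:
            raise ValueError(f"{cur} -> {nxt} must differ in exactly one position")
        diffs.append((changed[0], nxt[changed[0]]))
    return path[0], diffs


def next_hop(current, diffs, hop_index):
    """Address of the next server, given the current address and hop counter."""
    position, digit = diffs[hop_index]
    nxt = list(current)
    nxt[position] = digit        # overwrite the single differing position
    return tuple(nxt)


def decode_path(src, diffs):
    """Rebuild the full path by applying the stored differences in order."""
    path = [src]
    for k in range(len(diffs)):
        path.append(next_hop(path[-1], diffs, k))
    return path


if __name__ == "__main__":
    src, vector = encode_path([(0, 3), (0, 0), (1, 0)])
    print(src, vector)
    print(decode_path(src, vector))
```

Each hop costs one small pair instead of a full address, which is the saving the slide points at.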
• Evaluating Path Failure & Average Path Lengths
  • ARM vs. TFR vs. SPA
    • TFR: the original Totoro Fault-tolerant Routing algorithm
    • SPA: Shortest Path Algorithm (Floyd-Warshall), used as the performance bound
• Evaluating Resource Usage
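For reference, the SPA baseline is described as Floyd-Warshall. Below is a minimal textbook sketch of that algorithm on an unweighted, undirected graph; the node numbering and the `edges` input are assumptions for illustration, and this is not the evaluation code from the talk. Its three nested loops over all nodes explain why it serves as a bound rather than a practical routing method at this scale.

```python
import math


def floyd_warshall(num_nodes, edges):
    """All-pairs shortest hop counts for an undirected, unweighted graph."""
    dist = [[0 if i == j else math.inf for j in range(num_nodes)]
            for i in range(num_nodes)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    # Allow each node k in turn to act as an intermediate hop.
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


if __name__ == "__main__":
    # A chain 0 - 1 - 2 - 3 plus an isolated node 4 (e.g. a failed server).
    d = floyd_warshall(5, [(0, 1), (1, 2), (2, 3)])
    print(d[0][3], d[0][4])
```

A pair whose distance stays infinite has no valid path, which corresponds to a path failure in the evaluation.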
• Evaluating Path Failure & Average Path Lengths: experimental parameters

| Parameter | Value |
| --- | --- |
| Types of failures | Link, node, switch and rack failures |
| Platform | Totoro2 (4096 servers) |
| Failure ratios | 2% - 20% |
| Communication mode | All-to-all |
| Simulation times | 20 |
• Evaluating Path Failure
  • Path failure ratio vs. server/rack failure ratio
  • The performance of ARM and TFR is almost identical to that of SPA.
• Evaluating Path Failure
  • Path failure ratio vs. switch failure ratio
  • The performance of ARM is almost identical to that of SPA, but that of TFR is not.
• Evaluating Path Failure
  • Path failure ratio vs. link failure ratio
  • When the link failure ratio is high:
    • ARM achieves slightly better capacity than TFR.
    • A performance gap between ARM and SPA still exists, because SPA traverses all feasible links in the whole structure until it finds a valid path.
    • This is a tradeoff that ARM makes to reduce algorithmic complexity and save computation resources.
• Evaluating Average Path Lengths
  • ARM:
    1. Better than TFR.
    2. Almost identical to SPA.
    3. Where it is shorter than SPA, this is because the path failure ratio of ARM is a bit higher than that of SPA, so fewer paths are counted and the total path length is shorter.
• Evaluating Resource Usage: experimental parameters

| Parameter | Value |
| --- | --- |
| Testbed | Lenovo T350, quad-core, 8 GB memory |
| Platform | Totoro2 (4096 servers) |
| Size of each result | 10 paths |
| Communication mode | One-to-all in 4 Totoro1 |
• Evaluating Resource Usage
  • CPU:
    1. The number of nodes increases by 10 per second.
    2. Usage reaches a peak value of 28% at 18 s.
    3. Benefits from the cache.
  • Memory: each host costs at most 164 KB.
Figure: resource usage chart (annotations: +10 nodes/s, 28%, 18 s, 0%)
• More resilient
• Shorter latency
• Higher capacity
• Lower complexity
• In future work, we will focus on implementing ARM in DCell, FiConn and other structures.