Department of Computer Science, Jinan University, Guangzhou, P.R. China
Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.
• Motivation
• Challenges
• Related work
• Our idea
• System architecture
• Evaluation
• Conclusion
• The explosive growth of data: IDC reports 1,800 EB of data in 2011, with 40-60% annual growth
• Larger data centers: Google has 19 data centers with more than 1 million servers
• Higher traffic: Cisco forecasts that annual traffic in global data centers will nearly triple over the next 5 years, reaching 7.7 ZB by the end of 2017
[Figure: Google data center]
• Data center network challenges
• Node counts keep increasing: scalability?
• Failures are common: fault tolerance? In a 4,000-node Google MapReduce cluster, 5 nodes fail during a job and 1 disk fails every 6 hours.
• Bandwidth-hungry services: network capacity? These include infrastructure services (MapReduce, GFS, …) and network applications (cloud disk, video, …).
• Tree-based structures
• Traditional tree: bandwidth bottleneck, single points of failure, expensive
• Modified tree (fat-tree): high capacity, but limited scalability
[Figures: traditional tree-based structure; fat-tree]
• Other novel, hybrid network structures
• Physical topology: level-based but not tree-based; recursively defined
• Routing mechanism: no routers and no traditional Internet routing mechanism; routing intelligence is put on servers, taking advantage of structural properties
• Typical structures: DCell, FiConn, BCube, Totoro, …
[Figures: DCell, Totoro, FiConn, BCube]
• Physical structures
• Routing mechanisms
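Comparison of routing mechanisms. Columns: DCell, Totoro, FiConn, BCube. On the original slide several cells span more than one column; values are listed left to right.
• Core idea: divide-and-conquer | correct the differing address digits
• Calculation: hop by hop | full path
• Link state: broadcast domain | path probing
• Path selection: Dijkstra + rerouting | greedy | available one
• Traffic-aware: no mention | yes | no mention
• Shortest distance: no | yes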
• What we achieve: the Athena Routing Mechanism (ARM)
• Routing algorithm: based on dynamic programming; finds the shortest path with lower complexity than classic algorithms; supports multiple paths
• Path probing mechanism: bypasses failed nodes and links; traffic-aware
• Properties: more resilient, shorter latency, higher capacity, lower complexity
• Athena Routing Mechanism: implemented on the Totoro structure and compared with the original Totoro Fault-tolerant Routing algorithm (TFR) and a Shortest Path Algorithm (SPA, based on Floyd-Warshall)
• Applicable to DCell, FiConn, BCube, …, which have a similar topology (level-based, recursively defined) and put routing intelligence on servers
• Totoro: two-port servers (two-port NIC), low-end switches, level-based, recursively defined
[Figure: Totoro structure of one level]
• Building Totoro: connect N servers to an N-port switch (here, N = 4). This basic partition is a Totoro0.
[Figure: a Totoro0 structure (intra-switch links)]
• Building Totoro: a Totoro0 has c available ports (here, c = 4). Connect n Totoro0s to n-port switches, using c/2 ports of each Totoro0. A Totoro1 structure consists of n Totoro0s.
[Figure: a Totoro1 structure (inter-switch links)]
• Building Totoro: connect n Totoro_(i-1)s to n-port switches to build a Totoro_i
• Recursively defined; only half of the available ports are used at each level ⇒ open and scalable
• The number of paths among Totoro_i's is n/2 times the number of paths among Totoro_(i-1)s ⇒ multiple redundant links ⇒ high network capacity
[Figure: Totoro2 structure with N = 4, n = 4, K = 2. Servers are labelled with addresses from 0, 0, 0 to 3, 3, 3; level-1 switches are labelled 1-0, 0 through 1-3, 1 and level-2 switches 2-0 through 2-3; legend: level-1 link, level-2 link.]
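The construction rules fix the size of a Totoro_K. The sketch below counts servers, spare ports and switches per level, assuming (as the slides describe) that each Totoro_(i-1) gives exactly half of its c_(i-1) spare ports to the level-i switches and that each n-port level-i switch takes one port from each of the n sub-partitions. The names `totoro_size` and `TotoroSize` are hypothetical. For N = 4, n = 4, K = 2 it should report 64 servers and 16, 8 and 4 switches at levels 0, 1 and 2, which lines up with the 16 Totoro0s, the eight level-1 switches 1-0, 0 to 1-3, 1 and the four level-2 switches 2-0 to 2-3 in the figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TotoroSize:
    servers: int
    spare_ports: int      # c_K: ports still free for building a larger Totoro
    switches: List[int]   # switches[i] = total number of level-i switches


def totoro_size(N: int, n: int, K: int) -> TotoroSize:
    """Count servers, spare ports and switches in a Totoro_K (hypothetical helper).

    Assumes each Totoro_(i-1) gives half of its c_(i-1) spare ports to the
    level-i switches, and each n-port level-i switch takes one port from each
    of the n sub-partitions.
    """
    c = N                 # c_0 = N: each Totoro0 server has one unused NIC port
    switches = [n ** K]   # one N-port switch per Totoro0
    for i in range(1, K + 1):
        if c % 2:
            raise ValueError(f"c_{i - 1} = {c} is odd; half of it cannot be used")
        used = c // 2                      # ports each Totoro_(i-1) contributes
        # A Totoro_i gathers n * used such ports, i.e. `used` n-port switches,
        # and the whole Totoro_K contains n ** (K - i) Totoro_i's.
        switches.append(n ** (K - i) * used)
        c = n * used                       # c_i = n * c_(i-1) / 2
    return TotoroSize(servers=N * n ** K, spare_ports=c, switches=switches)


print(totoro_size(4, 4, 2))
# TotoroSize(servers=64, spare_ports=16, switches=[16, 8, 4])
```

Under these assumptions each Totoro_i has c_(i-1)/2 level-i switches, and c_i/2 divided by c_(i-1)/2 is n/2, which is the per-level growth in redundant paths claimed on the slide.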
• Athena Routing Algorithm (ARA): based on dynamic programming (DP), which applies to problems that exhibit overlapping subproblems and optimal substructure; paths are calculated recursively
Steps of ARA:
1. Suppose src and dst belong to two different partitions.
2. Get all paths connecting these two partitions.
3. For each such path, recursively calculate its sub-paths.
4. Store all resulting paths.
5. Sort all paths by length.
6. Remove the extra paths.
Slide annotations: "This function is based on the corresponding structural properties" (finding the connecting paths) and "Cartesian product" (joining the recursively computed sub-paths).
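The six steps can be sketched in Python under explicit assumptions. The wiring in `build_totoro` is a hypothetical rule chosen to match the construction described above (at each level, every sub-partition hands the first half of its still-free backup ports, in address order, to the level's switches); it is not the published Totoro wiring. `build_totoro`, `make_ara` and the result-size parameter `k` are illustrative names. Path length is counted in server-to-server hops, each hop going through one switch. The memoised recursion is the DP part: a sub-path such as src to u is computed once and reused by every request that needs it.

```python
from functools import lru_cache
from itertools import product


def build_totoro(N, n, K):
    """Hypothetical Totoro_K wiring, used only to exercise the routing sketch.

    Servers are address tuples (a_K, ..., a_1, a_0): a_0 in range(N) picks the
    server inside its Totoro0, the other digits in range(n) pick sub-partitions.
    Returns {(level, Totoro_i prefix, j): [one server per sub-partition]}.
    """
    free = {p: [p + (s,) for s in range(N)] for p in product(range(n), repeat=K)}
    switches = {}
    for i in range(1, K + 1):
        new_free = {}
        for q in product(range(n), repeat=K - i):
            subs = [free[q + (d,)] for d in range(n)]
            half = len(subs[0]) // 2          # each sub-partition uses half
            for j in range(half):
                switches[(i, q, j)] = [sub[j] for sub in subs]
            new_free[q] = [s for sub in subs for s in sub[half:]]
        free = new_free
    return switches


def make_ara(switches, K, k):
    """Return ara(src, dst) -> up to k server paths, shortest first."""

    @lru_cache(maxsize=None)  # DP table: sub-paths are shared by many requests
    def ara(src, dst):
        if src == dst:
            return ((src,),)
        L = 0                          # length of the common address prefix
        while L < K and src[L] == dst[L]:
            L += 1
        if L == K:                     # same Totoro0: one hop via its switch
            return ((src, dst),)
        i, q = K - L, src[:L]          # lowest common partition is this Totoro_i
        candidates = []
        j = 0
        # Step 2: every level-i switch links src's and dst's sub-partitions.
        while (i, q, j) in switches:
            members = switches[(i, q, j)]
            u, v = members[src[L]], members[dst[L]]
            # Step 3: solve both halves recursively, join every combination.
            for head, tail in product(ara(src, u), ara(v, dst)):
                candidates.append(head + tail)
            j += 1
        candidates.sort(key=len)       # Steps 4-5: collect and sort by length
        return tuple(candidates[:k])   # Step 6: drop the extra paths

    return ara


switches = build_totoro(N=4, n=4, K=2)   # the Totoro2 of the figure
ara = make_ara(switches, K=2, k=1)       # k = 1: shortest path only
print(ara((0, 0, 0), (1, 0, 0)))
# (((0, 0, 0), (0, 0, 2), (1, 0, 2), (1, 0, 0)),)
```

Raising `k` returns several candidate paths per request (multi-path), which is what the path probing mechanism can then test.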
• Case study of ARA: work out the path from src to dst
• Case study of ARA, step 1: src and dst belong to two different sub-partitions
• Case study of ARA, step 2: there are two paths between these two sub-partitions
• Case study of ARA, step 3: for Path 1, recursively work out the sub-paths in the two sub-partitions and join them into a full path
• Case study of ARA, step 4: similarly, work out the full path for Path 2
• Case study of ARA, step 5: add all paths to the result set
• Case study of ARA, step 5 (cont.): sort the paths by length
• Case study of ARA, step 5 (cont.): remove the extra paths (here, we assume the returned set has size 1, i.e., only the shortest path is kept)
• Path probing mechanism: the source host sends probing request packets and the destination host sends probing reply packets; intermediate servers record link capacities in the probing packets and forward them
• Path probing mechanism: detects failed paths, so no extra rerouting technique is required; detects link capacities, which supports load balancing, …
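A minimal simulation of the probing round trip, assuming links are undirected, a failed link simply loses the probe (so no reply arrives), and the source then picks the candidate whose probe came back with the widest bottleneck. The slides only say that capacities are recorded and used for load balancing, so the selection rule and the names `probe`, `choose_path` and `Capacities` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence

# Undirected link (pair of servers) -> currently available capacity.
Capacities = Dict[FrozenSet[Hashable], float]


def probe(path: Sequence[Hashable], capacity: Capacities) -> Optional[List[float]]:
    """Simulate one probing request travelling from path[0] to path[-1].

    Every server on the way records the available capacity of the link it
    forwards on. A failed link (missing, or with no capacity left) loses the
    probe: no reply reaches the source, which then treats the path as failed.
    """
    recorded: List[float] = []
    for a, b in zip(path, path[1:]):
        cap = capacity.get(frozenset((a, b)), 0.0)
        if cap <= 0:
            return None
        recorded.append(cap)
    return recorded  # carried back to the source in the probing reply


def choose_path(paths: Sequence[Sequence[Hashable]], capacity: Capacities):
    """Hypothetical policy: widest bottleneck first, then fewer hops."""
    best_key, best = None, None
    for path in paths:
        recorded = probe(path, capacity)
        if recorded is None:
            continue  # bypass paths whose probe was lost
        key = (min(recorded, default=math.inf), -len(recorded))
        if best_key is None or key > best_key:
            best_key, best = key, path
    return best  # None if every candidate path failed


caps = {frozenset("AB"): 10.0, frozenset("BD"): 1.0,
        frozenset("AC"): 5.0, frozenset("CD"): 5.0}
candidates = [("A", "B", "D"), ("A", "C", "D")]
print(choose_path(candidates, caps))  # ('A', 'C', 'D')
```

Because a lost probe just removes a candidate, failed nodes and links are bypassed without a separate rerouting step, matching the point made on the slide.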
• Protocol implementation: ARM packet format
[Figures: path-probing packet; data packet]
• Protocol implementation: ARM is a 2.5-layer protocol
• How does an intermediate server determine the next hop? Two adjacent servers in a path differ in only one “bit” (address digit), so only the differing “bit”s are stored in the vector.
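To make "store only the differing bit" concrete, here is a sketch assuming addresses are digit tuples like those in the Totoro2 figure and that the vector holds one (position, new value) pair per hop. The actual ARM header layout is not given in this transcript, so `encode_route`, `next_hop` and the pair encoding are hypothetical.

```python
from typing import List, Sequence, Tuple

Address = Tuple[int, ...]


def encode_route(path: Sequence[Address]) -> List[Tuple[int, int]]:
    """Keep only (digit position, new value) for each hop of the path."""
    vector = []
    for a, b in zip(path, path[1:]):
        changed = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(changed) != 1:
            raise ValueError(f"{a} -> {b} changes {len(changed)} digits, expected 1")
        pos = changed[0]
        vector.append((pos, b[pos]))
    return vector


def next_hop(current: Address, vector: Sequence[Tuple[int, int]], hop: int) -> Address:
    """What a forwarding server does: patch one digit of its own address."""
    pos, value = vector[hop]
    return current[:pos] + (value,) + current[pos + 1:]


route = [(0, 0, 0), (0, 0, 2), (1, 0, 2), (1, 0, 0)]
vec = encode_route(route)
print(vec)  # [(2, 2), (0, 1), (2, 0)]
```

Each entry needs only a digit index and a digit value instead of a full (K+1)-digit address, and a server forwarding hop h computes the next address from its own address and entry h alone.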
• Evaluating path failure and average path lengths: ARM vs. TFR vs. SPA
TFR: the original Totoro Fault-tolerant Routing algorithm
SPA: Shortest Path Algorithm (Floyd-Warshall), used as the performance bound
• Evaluating Resource Usage
• Evaluating path failure and average path lengths: experimental parameters
Types of failures: link, node, switch and rack failures
Platform: Totoro2 (4,096 servers)
Failure ratios: 2%-20%
Communication mode: all-to-all
Number of simulation runs: 20
• Evaluating path failure: path failure ratio vs. server/rack failure ratio
The performance of ARM and TFR is almost identical to that of SPA!
• Evaluating path failure: path failure ratio vs. switch failure ratio
The performance of ARM is almost identical to that of SPA! That of TFR is not.
• Evaluating path failure: path failure ratio vs. link failure ratio
When the link failure ratio is high, ARM achieves slightly better capacity than TFR, but a performance gap between ARM and SPA still exists, because SPA traverses all feasible links in the whole structure until it finds a valid path.
This is a tradeoff ARM makes to reduce algorithmic complexity and save computation resources.
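SPA is described here only as Floyd-Warshall-based. A minimal sketch, assuming unit-weight undirected links and that failed links or servers are simply left out of the input, shows where its cost comes from: the triple loop tries every node as a relay for every pair, O(V^3) time and O(V^2) memory over the whole structure, which is the cost the slides contrast with ARA. The names `floyd_warshall` and `spa_path` are illustrative.

```python
import math
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple


def floyd_warshall(nodes: Sequence[Hashable], links: Iterable[Tuple[Hashable, Hashable]]):
    """All-pairs hop counts and a next-hop table over the surviving links."""
    index: Dict[Hashable, int] = {v: i for i, v in enumerate(nodes)}
    V = len(nodes)
    dist = [[math.inf] * V for _ in range(V)]
    nxt: List[List[Optional[int]]] = [[None] * V for _ in range(V)]
    for i in range(V):
        dist[i][i], nxt[i][i] = 0, i
    for a, b in links:                  # undirected, unit weight
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = 1
        nxt[i][j], nxt[j][i] = j, i
    for k in range(V):                  # O(V^3): try every node as a relay
        for i in range(V):
            if dist[i][k] == math.inf:
                continue
            for j in range(V):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return index, dist, nxt


def spa_path(nodes, index, nxt, src, dst) -> Optional[list]:
    """Rebuild one shortest path, or None if dst is unreachable."""
    i, j = index[src], index[dst]
    if nxt[i][j] is None:
        return None
    path = [src]
    while i != j:
        i = nxt[i][j]
        path.append(nodes[i])
    return path


nodes = ["A", "B", "C", "D"]
index, dist, nxt = floyd_warshall(nodes, [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")])
print(spa_path(nodes, index, nxt, "A", "D"))  # ['A', 'B', 'D']
```

Because it explores the entire surviving graph, this baseline finds a path whenever one exists, which is why it serves as the performance bound in the path failure experiments.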
• Evaluating average path lengths
ARM:
1. Better than TFR.
2. Almost identical to SPA.
3. Shorter than SPA in total path length: ARM's path failure ratio is slightly higher than SPA's, so fewer paths contribute to its total.
• Evaluating resource usage: experimental parameters
Testbed: Lenovo T350, quad-core, 8 GB memory
Platform: Totoro2 (4,096 servers)
Size of each result: 10 paths
Communication mode: one-to-all in 4 Totoro1s
• Evaluating resource usage
[Figure: CPU usage over time, annotated with +10 nodes/s, 0%, and a 28% peak at 18 s]
CPU:
1. The number of nodes increases by 10 per second.
2. CPU usage peaks at 28% at 18 s.
3. ARM benefits from the cache.
Memory: each host uses at most 164 KB.
• More resilient
• Shorter latency
• Higher capacity
• Lower complexity
• In future work, we will focus on implementing ARM in DCell, FiConn and other structures!