![Page 1: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/1.jpg)
© N. Zilberman, G. Bracha, G. Schzukin 2019
Stardust
Divide and Conquer in the Data Center Network
Golan Schzukin & Gabi Bracha, Broadcom
Noa Zilberman, University of Cambridge
February 2019
![Page 2: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/2.jpg)
Network switches
Switch silicon
Switch box
Switch chassis
Scale: 12.8Tbps, 32×400GE
![Page 3: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/3.jpg)
Network switch systems
Scale: Petabit / second
![Page 4: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/4.jpg)
Data center networks
Connecting 10Ks to 100Ks of servers
![Page 5: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/5.jpg)
Do data center networks scale?
[Figure labels: Network Fabric, Link Bundle]
![Page 6: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/6.jpg)
Do data center networks scale?
• Example: Building a DC with 100K servers (2500 ToR switches)
• Option 1 – Link bundle of 1 (L=1):
  – 6.4Tbps Fabric Switch, 256×25G
  – Requires 2 tiers
  – #fabric-switches = 1172
• Option 2 – Link bundle of 4 (L=4):
  – 6.4Tbps Fabric Switch, 64×100G
  – Requires 3 tiers
  – #fabric-switches = 1954 (×1.66 more)
In a network of $n$ tiers, scale is $O(L^{-n})$
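The two options can be sanity-checked with a rough sketch. It rests on assumptions that are not on the slide: a folded-Clos (fat-tree) fabric in which an n-tier fabric of radix-k switches reaches at most 2·(k/2)^n ToRs per plane and needs about (2n−1)·N/k switches for N ToR uplinks, and ToRs with 40 non-oversubscribed 25G uplinks each (100K servers / 2500 ToRs). The function names are hypothetical.

```python
import math

def tiers_needed(radix: int, tors: int) -> int:
    """Smallest number of fabric tiers whose fat-tree reach covers all ToRs.

    Assumes an n-tier fat-tree of radix-k switches reaches 2 * (k/2)**n ToRs.
    """
    n = 1
    while 2 * (radix // 2) ** n < tors:
        n += 1
    return n

def fabric_switches(radix: int, tors: int, uplinks_per_tor: int) -> tuple[int, int]:
    """Return (tiers, fabric switch count) for a non-blocking fat-tree.

    Assumes every tier but the top spends half its ports on uplinks, so
    N edge links cost (2n - 1) * N / k switches in total.
    """
    n = tiers_needed(radix, tors)
    edge_links = tors * uplinks_per_tor
    return n, math.ceil((2 * n - 1) * edge_links / radix)

# L=1: 256x25G switches, 40 single-link uplinks per ToR.
print(fabric_switches(radix=256, tors=2500, uplinks_per_tor=40))
# L=4: the same silicon seen as 64x100G, 10 bundled uplinks per ToR.
print(fabric_switches(radix=64, tors=2500, uplinks_per_tor=10))
```

Under these assumptions the sketch gives 2 tiers and 1172 switches for L=1, and 3 tiers and 1954 switches for L=4. Bundling divides the effective radix by L, and the reach term (k/2L)^n is where the $L^{-n}$ factor comes from.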
![Page 7: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/7.jpg)
Observation: A link bundle of one enables an optimum build of the network (i.e., fewer tiers, fewer switches, …)
Do data center networks scale?
![Page 8: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/8.jpg)
Designing new network devices
• A decade ago: “Can we implement this feature?”
• Today: “Is this feature worth implementing, given the design constraints?”
![Page 9: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/9.jpg)
The resource wall
• Network silicon die > 7 Billion transistors (Tomahawk, 2014)
• Limited by:
  • Power density
  • Die size
  • Manufacturing feasibility
![Page 10: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/10.jpg)
Data center network
![Page 11: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/11.jpg)
Switch system
[Figure: a packet (PKT) crossing a switch system: Line card, three Fabric cards, Line card]
![Page 12: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/12.jpg)
Why waste resources?
In an n-tier network:
Packet processing at every tier:
O(n×(Switching + 2×I/O + 2×NIF) + n×(Ingress Processing + Egress Processing + Queueing))
Packet processing once, at the edge:
O(n×(Switching + 2×I/O + 2×NIF) + 1×(Ingress Processing + Egress Processing + Queueing))
![Page 13: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/13.jpg)
Observation: Significant resources can be saved by simplifying the data center network
Why waste resources?
![Page 14: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/14.jpg)
12.8Tbps Switches!
Let's convert to packet rate requirements:
5800 Mpps @ 256B (100GE → 38.7Mpps)
19200 Mpps @ 64B (100GE → 150Mpps)
But clock rate is only ~1GHz…
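The conversion can be reproduced with a short sketch. It assumes 20B of per-packet Ethernet wire overhead (8B preamble and start-of-frame delimiter plus a 12B inter-frame gap), which the slide does not state; the function names are hypothetical.

```python
ETHERNET_OVERHEAD_BYTES = 20  # assumed: 8B preamble/SFD + 12B inter-frame gap

def packet_rate_mpps(link_bps: float, packet_bytes: int,
                     overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """Packets per second (in millions) at full line rate."""
    bits_on_wire = (packet_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire / 1e6

def packets_per_cycle(link_bps: float, packet_bytes: int,
                      clock_hz: float = 1e9) -> float:
    """Packets that must be handled every clock cycle to keep up."""
    return packet_rate_mpps(link_bps, packet_bytes) * 1e6 / clock_hz

for size in (64, 256):
    print(size,
          round(packet_rate_mpps(100e9, size), 1),    # one 100GE port
          round(packet_rate_mpps(12.8e12, size)),     # whole 12.8Tbps device
          round(packets_per_cycle(12.8e12, size), 1)) # at ~1GHz
```

With this overhead the aggregate figures land close to the slide's 5800 and 19200 Mpps, and a 1GHz device has to handle roughly 19 minimum-size packets per clock. The per-port 256B figure comes out nearer 45 Mpps than the 38.7 Mpps quoted, so the slide may use a different overhead or packet size for that number.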
The single-pipeline switch
![Page 15: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/15.jpg)
Observation: To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.
The age of multi core has reached switching…
The single-pipeline switch
![Page 16: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/16.jpg)
The switch pipeline
The common depiction:
![Page 17: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/17.jpg)
The switch pipeline
Actual implementation: Throughput = clock frequency × bus width
[Figure: a 512B packet on a data path of width e.g. 256B: 256B in clock cycle 1, 256B in clock cycle 2]
![Page 18: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/18.jpg)
The switch pipeline
Actual implementation: Throughput ≠ clock frequency × bus width
[Figure: a 257B packet on a data path of width e.g. 256B: 256B in clock cycle 1, 1B in clock cycle 2]
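The gap between the 512B and 257B cases can be put in numbers with a small sketch. It assumes, as the slides depict, that each packet starts on a fresh bus word, so a packet always occupies a whole number of clock cycles; the function names are hypothetical.

```python
import math

def cycles_for_packet(packet_bytes: int, bus_bytes: int = 256) -> int:
    """Clock cycles a packet occupies when it cannot share a bus word."""
    return math.ceil(packet_bytes / bus_bytes)

def bus_utilization(packet_bytes: int, bus_bytes: int = 256) -> float:
    """Fraction of the bus capacity that carries real packet bytes."""
    used_capacity = cycles_for_packet(packet_bytes, bus_bytes) * bus_bytes
    return packet_bytes / used_capacity

print(cycles_for_packet(512), bus_utilization(512))  # two full cycles
print(cycles_for_packet(257), bus_utilization(257))  # second cycle carries 1B
```

A 257B packet costs the same two cycles as a 512B packet but fills only about half of the bus capacity it occupies, which is why effective throughput falls below clock frequency × bus width.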
![Page 19: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/19.jpg)
The single-pipeline switch
12.8Tbps Switches!
Let's convert to packet rate requirements:
5800 Mpps @ 256B (100GE → 38.7Mpps)
19200 Mpps @ 64B (100GE → 150Mpps)
But clock rate is only ~1GHz…
[Figure: required parallelism (0 to 20) vs. packet size (64B to 1600B)]
But if we pack data optimally…
[Figure: required parallelism (0 to 20) vs. packet size (64B to 1600B)]
![Page 20: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/20.jpg)
Observation: To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.
Observation: For best switch utilization, use fixed-size data units (cells)
The age of multi core has reached networking…
The single-pipeline switch
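The benefit of fixed-size cells with packing can be illustrated by counting data-path units. This is a minimal sketch that assumes a 256B cell payload and ignores cell headers; neither the cell size nor the function names come from the slides.

```python
import math

def cells_unpacked(packet_sizes: list[int], cell_bytes: int = 256) -> int:
    """Cells used when every packet starts in a new cell."""
    return sum(math.ceil(size / cell_bytes) for size in packet_sizes)

def cells_packed(packet_sizes: list[int], cell_bytes: int = 256) -> int:
    """Cells used when packets are laid back to back and then chopped."""
    return math.ceil(sum(packet_sizes) / cell_bytes)

burst = [257, 257, 64]  # hypothetical packets headed to the same destination
print(cells_unpacked(burst), cells_packed(burst))
```

For this burst, per-packet chopping needs 5 cells while packing needs 3, so the same traffic occupies fewer clock cycles in the fabric.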
![Page 21: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/21.jpg)
Observations
• A link bundle of one enables an optimum build of the network (i.e., fewer tiers, fewer switches, …)
• Significant resources can be saved by simplifying the network fabric
• To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.
• For best switch utilization, use fixed-size data units (cells)
![Page 22: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/22.jpg)
Introducing Stardust
From switch-system to data-center scale
![Page 23: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/23.jpg)
Introducing Stardust
• Complex edge, simple network fabric
• Fabric Element – Fabric device
  • A simple cell switch
• Fabric Adapter – Edge device
  • A packet switch
  • Quite similar to a ToR
  • Chops packets into cells
Widely used in switch-systems
[Figure labels: 7th generation, 5th generation]
![Page 24: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/24.jpg)
A Stardust based network
No Link Bundles
![Page 25: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/25.jpg)
Dynamic cell routing
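Non-Blocking
[Figure: a fabric with inputs 1 to 9 and outputs 1 to 9; example flows Input 1 to Output 1, Input 7 to Output 1, Input 9 to Output 7, Input 8 to Output 2; each flow is split 1/3, 1/3, 1/3 across the available links]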
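The even 1/3 split can be sketched as per-cell spraying over all links that lead to the destination. Round-robin selection is an assumption made for illustration; the slides do not say how the next link is picked, and `spray_cells` is a hypothetical name.

```python
from collections import Counter
from itertools import cycle

def spray_cells(num_cells: int, links: list[int]) -> Counter:
    """Assign cells to links in round-robin order and count cells per link."""
    next_link = cycle(links)
    return Counter(next(next_link) for _ in range(num_cells))

# Nine cells of one flow over three eligible links: each carries a third.
print(spray_cells(9, links=[1, 2, 3]))
```

Because the choice is made per cell rather than per flow, the load stays balanced regardless of how many flows there are or how large each one is.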
![Page 26: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/26.jpg)
Reachability table
• Need to know only the destination Fabric Adapter
• 1M virtual machines → 100K end hosts → 2500 Fabric Adapters
• Entries indicate “reachable through these links”
• “You can get to Fabric Adapter 1 using links 1,5,8,14,36”
• Bitmap of size “switch radix”
• Automatically constructed and updated
• Using reachability messages
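A reachability table of this shape can be sketched as one bitmap per destination Fabric Adapter, with one bit per local link. The class and method names are hypothetical, the radix of 64 is an arbitrary example, and the format of the reachability messages themselves is not modeled.

```python
class ReachabilityTable:
    """Per-destination bitmaps of width 'switch radix' (illustrative only)."""

    def __init__(self, radix: int):
        self.radix = radix
        self.bitmaps: dict[int, int] = {}  # Fabric Adapter id -> link bitmap

    def set_link(self, adapter_id: int, link: int, reachable: bool) -> None:
        """Record what a reachability message reported for one link."""
        if not 0 <= link < self.radix:
            raise ValueError("link index outside switch radix")
        bitmap = self.bitmaps.get(adapter_id, 0)
        if reachable:
            bitmap |= 1 << link
        else:
            bitmap &= ~(1 << link)
        self.bitmaps[adapter_id] = bitmap

    def links_to(self, adapter_id: int) -> list[int]:
        """Links through which the given Fabric Adapter is reachable."""
        bitmap = self.bitmaps.get(adapter_id, 0)
        return [link for link in range(self.radix) if (bitmap >> link) & 1]

table = ReachabilityTable(radix=64)
for link in (1, 5, 8, 14, 36):
    table.set_link(adapter_id=1, link=link, reachable=True)
print(table.links_to(1))   # the slide's example entry
table.set_link(adapter_id=1, link=8, reachable=False)  # link 8 fails
print(table.links_to(1))
```

With 2500 Fabric Adapters and one radix-wide bitmap each, the table stays small compared with per-host or per-VM forwarding state.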
![Page 27: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/27.jpg)
Buffering and scheduling
• Packet buffering at the edge
• Using virtual output queues (VOQ) at the ingress Fabric Adapter
• A distributed scheduled fabric
• A Fabric Adapter generates credits (e.g. 4KB) to all non-empty associated VOQs
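The credit loop can be sketched as follows. The 4KB credit comes from the slide; everything else is an assumption made for illustration: one grant per non-empty VOQ per round, a VOQ that may overdraw its balance by part of one packet, and the names `VOQ` and `grant_round`.

```python
from collections import deque

CREDIT_BYTES = 4096  # "e.g. 4KB" on the slide

class VOQ:
    """A virtual output queue holding packet sizes (in bytes)."""

    def __init__(self):
        self.packets = deque()
        self.credit = 0

    def enqueue(self, size: int) -> None:
        self.packets.append(size)

    def receive_credit(self, credit_bytes: int = CREDIT_BYTES) -> list[int]:
        """Add a credit and release packets while the balance is positive."""
        self.credit += credit_bytes
        sent = []
        while self.packets and self.credit > 0:
            size = self.packets.popleft()
            self.credit -= size  # may go negative; repaid by the next credit
            sent.append(size)
        if not self.packets:
            self.credit = 0  # an empty queue does not hoard credit
        return sent

def grant_round(voqs: dict[str, VOQ]) -> dict[str, list[int]]:
    """One scheduling round: credit every non-empty VOQ, collect what it sends."""
    return {name: q.receive_credit() for name, q in voqs.items() if q.packets}

voqs = {"adapter-a": VOQ(), "adapter-b": VOQ()}
for _ in range(4):
    voqs["adapter-a"].enqueue(1500)
print(grant_round(voqs))  # only the non-empty VOQ is credited
print(grant_round(voqs))
```

Because traffic enters the fabric only against credits, packets wait in the ingress VOQs rather than inside the fabric.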
[Figure: 432-node Fat-Tree (simulation), 90KB flows]
![Page 28: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/28.jpg)
Packet packing
![Page 29: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/29.jpg)
Packet packing
NetFPGA SUME
![Page 30: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/30.jpg)
Properties
• Protocol and traffic pattern agnosticism
• Improved resilience and self healing
• Fewer network tiers, better scalability
• Optimal load balancing
• Lossless transmission
• Incast absorption
• Pull fabric and port fairness
Mechanisms listed on the slide:
• Cell switching & packing, dynamic routing, fabric scheduling
• Reachability messages, link bundling, dynamic routing
• Link bundling, reachability messages, dynamic routing
• Dynamic routing, cell switching & packing, fabric scheduling
• Fabric scheduling, dynamic routing, cell switching, reachability messages
• Fabric scheduling, dynamic routing, cell switching, reachability messages
• Fabric scheduling, dynamic routing, cell switching, link bundling
![Page 31: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/31.jpg)
Power and cost – entire network
• Fewer network tiers → fewer devices
• Less power & area (cost) per device
  − Fabric Element saves 35% of power
  − Fabric Element saves 33.3% of silicon area
• Save 87% of header processing area
• Save 70% of network interface area
![Page 32: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/32.jpg)
What about the future?
• Scalability of ToR / Fabric Adapter is the bottleneck
• Let us replace the ToR with a Fabric Element
• Let us turn the NIC into a Fabric Adapter
  • Lighter MAC
  • Smaller tables
  • Limited VOQs
  • Fabric Adapters already support DMA
[Figure: SoC block diagram: Ports, Light MAC, VoQ, Reachability Table, DMA Engine, PCIe]
![Page 33: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/33.jpg)
Stardust - summary
From switch-system to data center scale:
• Simple network fabric
• Push complexity to the edge
• Combines:
  • Cell switching and packet packing
  • Load balancing
  • Scheduled fabric
  • Reduced network tiers
• Better performance
• Lower power, lower cost
![Page 34: A scalable fabric architecture for data center networks...Introducing Stardust • Complex edge, simple network fabric • Fabric Element - Fabric device • A simple cell switch •](https://reader030.vdocuments.site/reader030/viewer/2022040608/5ec8da352f87ac71af46f1ce/html5/thumbnails/34.jpg)
Acknowledgements