Data Center Switch Architecture in the Age of Merchant Silicon
Nathan Farrington, Erik Rubow, Amin Vahdat
The Network is a Bottleneck
• HTTP request amplification
– Web search (e.g. Google)
– Small object retrieval (e.g. Facebook)
– Web services (e.g. Amazon.com)
• MapReduce-style parallel computation
– Inverted search index
– Data analytics
• Need high-performance interconnects
Hot Interconnects August 27, 2009
Nathan Farrington [email protected]
2
The Network is Expensive
[Figure: Rack 1, Rack 2, Rack 3, ..., Rack N. Each rack holds 40x1U servers and a 48xGbE TOR switch. Link labels: 8xGbE, 10GbE.]
What we really need: One Big Switch
• Commodity
• Plug-and-play
• Potentially no oversubscription
[Figure: Rack 1, Rack 2, Rack 3, ..., Rack N]
Why not just use a fat tree of commodity TOR switches?
M. Al-Fares, A. Loukissas, A. Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM ’08.
[Figure: fat tree with k=4, n=3]
10 Tons of Cable
• 55,296 Cat-6 cables
• 1,128 separate cable bundles
The “Yellow Wall”
Merchant Silicon gives us Commodity Switches

| Maker | Broadcom | Fulcrum | Fujitsu |
|---|---|---|---|
| Model | BCM56820 | FM4224 | MB86C69RBC |
| Ports | 24 | 24 | 26 |
| Cost | NDA | NDA | $410 |
| Power | NDA | 20 W | 22 W |
| Latency | < 1 μs | 300 ns | 300 ns |
| Area | NDA | 40 x 40 mm | 35 x 35 mm |
| SRAM | NDA | 2 MB | 2.9 MB |
| Process | 65 nm | 130 nm | 90 nm |
Eliminate Redundancy
• Networks of packet switches contain many redundant components
– chassis, power conditioning circuits, cooling
– CPUs, DRAM
• Repackage these discrete switches to lower the cost and power consumption
[Figure: a discrete 8-port switch containing a CPU, an ASIC, a PHY, SFP+ modules, four fans, and a PSU]
Our Architecture, in a Nutshell
• Fat tree of merchant silicon switch ASICs
• Hiding cabling complexity with PCB traces and optics
• Partition into multiple pod switches + single core switch array
• Custom EEP ASIC to further reduce cost and power
• Scales to 65,536 ports when 64-port ASICs become available, late 2009
3 Different Designs
• 24-ary 3-tree
• 720 switch ASICs
• 3,456 ports of 10GbE
• No oversubscription
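The port and ASIC counts quoted on these slides can be checked with the textbook sizing formulas for a k-ary n-tree. The slides do not state these formulas, so the sketch below is an illustration only, and the function name `fat_tree_size` is hypothetical. It assumes every switch has k ports, the top level uses all k ports downward, and the lower n-1 levels split their ports evenly between up and down.

```python
def fat_tree_size(k: int, n: int) -> tuple[int, int]:
    """Return (end_ports, switches) for a k-ary n-tree of k-port switches.

    Assumed textbook formulas (not taken from the slides):
      end_ports = 2 * (k/2)^n
      switches  = (2n - 1) * (k/2)^(n-1)
    """
    if k < 2 or k % 2 != 0 or n < 1:
        raise ValueError("k must be an even integer >= 2 and n must be >= 1")
    half = k // 2
    end_ports = 2 * half ** n
    switches = (2 * n - 1) * half ** (n - 1)
    return end_ports, switches


if __name__ == "__main__":
    # k=4 is the small example figure, k=24 is the design on this slide,
    # k=64 is the future 64-port ASIC case.
    for k in (4, 24, 64):
        ports, switches = fat_tree_size(k, 3)
        print(f"{k}-ary 3-tree: {ports:,} ports, {switches:,} switch ASICs")
```

With k=24 and n=3 this gives 3,456 ports and 720 switch ASICs, and with k=64 it gives 65,536 ports, matching the figures on the slides.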
Network 1: No Engineering Required
Cost of Parts $4.88M
Power 52.7 kW
Cabling Complexity 3,456
Footprint 720 RU
NRE $0
• 720 discrete packet switches, connected with optical fiber
Cabling complexity (noun): the number of long cables in a data center network.
Network 2: Custom Boards and Chassis
Cost of Parts $3.07M
Power 41.0 kW
Cabling Complexity 96
Footprint 192 RU
NRE $3M est
• 24 “pod” switches, one core switch array, 96 cables
This design is shown in more detail later.
Switch at 10G, but Transmit at 40G

| | SFP | SFP+ | QSFP |
|---|---|---|---|
| Rate | 1 Gb/s | 10 Gb/s | 40 Gb/s |
| Cost/Gb/s | $35* | $25* | $15* |
| Power/Gb/s | 500 mW | 150 mW | 60 mW |

* 2008-2009 Prices
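The per-Gb/s figures above become easier to compare when multiplied out per module. The sketch below does that arithmetic for carrying 40 Gb/s over either four SFP+ modules or one QSFP module. The table values come from the slide. The helper names (`per_module`, `to_carry`) are hypothetical, and the assumption that cost and power scale linearly with rate within one module type is mine.

```python
import math

# name: (rate in Gb/s, cost in $ per Gb/s, power in mW per Gb/s), from the slide.
TRANSCEIVERS = {
    "SFP": (1, 35, 500),
    "SFP+": (10, 25, 150),
    "QSFP": (40, 15, 60),
}


def per_module(name: str) -> tuple[int, int]:
    """Return (cost in $, power in mW) for one module of the given type."""
    rate, cost_per_gb, mw_per_gb = TRANSCEIVERS[name]
    return rate * cost_per_gb, rate * mw_per_gb


def to_carry(name: str, gbps: int) -> tuple[int, int, int]:
    """Return (modules, total cost in $, total power in mW) to carry gbps."""
    rate = TRANSCEIVERS[name][0]
    modules = math.ceil(gbps / rate)
    cost, power_mw = per_module(name)
    return modules, modules * cost, modules * power_mw


if __name__ == "__main__":
    for name in ("SFP+", "QSFP"):
        modules, cost, power_mw = to_carry(name, 40)
        print(f"{name}: {modules} module(s), ${cost}, {power_mw / 1000:.1f} W")
```

Under these assumptions, 40 Gb/s costs $1,000 and 6.0 W with four SFP+ modules, versus $600 and 2.4 W with a single QSFP module.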
Network 3: Network 2 + Custom ASIC
Cost of Parts $2.33M
Power 36.4 kW
Cabling Complexity 96
Footprint 114 RU
NRE $8M est
• Uses 40GbE between pod switches and core switch array; everything else is same as Network 2.
EEP: This simple ASIC provides tremendous cost and power savings.
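One way to read the NRE figures against the cost of parts is to ask how many complete 3,456-port switches must be built before a design's one-time engineering cost is repaid by its cheaper parts. The slides do not perform this calculation. The sketch below is an illustration that ignores power, cabling labor, and footprint, and the name `break_even_units` is hypothetical.

```python
import math

# (cost of parts, NRE) in thousands of dollars, taken from the three design slides.
DESIGNS = {
    "Network 1": (4880, 0),
    "Network 2": (3070, 3000),
    "Network 3": (2330, 8000),
}


def break_even_units(candidate: str, baseline: str = "Network 1") -> int:
    """Smallest number of units at which candidate's total cost <= baseline's.

    Total cost for n units is modeled as n * parts + NRE (an assumption).
    """
    parts_c, nre_c = DESIGNS[candidate]
    parts_b, nre_b = DESIGNS[baseline]
    saving_per_unit = parts_b - parts_c
    if saving_per_unit <= 0:
        raise ValueError("candidate must have cheaper parts than baseline")
    extra_nre = nre_c - nre_b
    return max(0, math.ceil(extra_nre / saving_per_unit))


if __name__ == "__main__":
    print("Network 2 vs Network 1:", break_even_units("Network 2"))
    print("Network 3 vs Network 1:", break_even_units("Network 3"))
    print("Network 3 vs Network 2:", break_even_units("Network 3", "Network 2"))
```

Under this simple model, Network 2 pays for itself at 2 units and Network 3 at 4 units relative to Network 1.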
Cost of Parts
[Bar chart: cost of parts, in millions of dollars. Network 1: 4.88; Network 2: 3.07; Network 3: 2.33]
Power Consumption
[Bar chart: power consumption, in kW. Network 1: 52.7; Network 2: 41; Network 3: 36.4]
Cabling Complexity
[Bar chart: cabling complexity, in number of long cables. Network 1: 3,456; Network 2: 96; Network 3: 96]
Footprint
[Bar chart: footprint, in rack units. Network 1: 720; Network 2: 192; Network 3: 114]
Why an Ethernet Extension Protocol?
• Optical transceivers are 80% of the cost
• EEP allows the use of fewer and faster optical transceivers
[Figure: two EEP ASICs connected by a single 40GbE link, with 10GbE links attached on either side]
How does EEP work?
• Ethernet frames are split up into EEP frames
• Most EEP frames are 65 bytes
– Header is 1 byte; payload is 64 bytes
• Header encodes ingress/egress port
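The splitting step can be sketched as below, using the 64-byte payload and 1-byte header sizes from the slide. The function names are hypothetical. The sketch also assumes the final, shorter chunk carries the same 1-byte header as the others, which ignores any extra length information a short EEP frame may need.

```python
EEP_PAYLOAD_BYTES = 64  # from the slide
EEP_HEADER_BYTES = 1    # from the slide


def split_into_payloads(ethernet_frame: bytes) -> list[bytes]:
    """Cut one Ethernet frame into consecutive payloads of at most 64 bytes."""
    return [
        ethernet_frame[i:i + EEP_PAYLOAD_BYTES]
        for i in range(0, len(ethernet_frame), EEP_PAYLOAD_BYTES)
    ]


def eep_wire_bytes(ethernet_frame: bytes) -> int:
    """Bytes on the wire, assuming exactly one header byte per EEP frame."""
    chunks = split_into_payloads(ethernet_frame)
    return len(ethernet_frame) + EEP_HEADER_BYTES * len(chunks)


if __name__ == "__main__":
    frame = bytes(1500)  # a dummy 1500-byte frame of zeros
    chunks = split_into_payloads(frame)
    print(len(chunks), "EEP frames; last payload is", len(chunks[-1]), "bytes")
    print("wire bytes:", eep_wire_bytes(frame))
```

For a 1500-byte frame this yields 23 full 64-byte payloads plus one 28-byte payload, which is why most, but not all, EEP frames are 65 bytes.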
How does EEP work?
• Round-robin arbiter
• EEP frames are transmitted as one large Ethernet frame
• 40GbE overclocked by 1.6%
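A round-robin arbiter over per-port queues can be sketched as follows. This is a software illustration, not the ASIC's logic. It assumes a work-conserving arbiter that skips empty queues, and the names `round_robin` and `required_line_rate_gbps` are hypothetical. The second function shows one plausible reading of the 1.6% figure: four 10GbE ports carry 40 Gb/s of payload, and adding 1 header byte per 64 payload bytes raises the required line rate by 1/64, about 1.56%.

```python
from collections import deque
from fractions import Fraction
from typing import Iterable, Iterator


def round_robin(queues: Iterable[Iterable[str]]) -> Iterator[tuple[int, str]]:
    """Yield (port, eep_frame), visiting non-empty queues in cyclic port order."""
    pending = [deque(q) for q in queues]
    while any(pending):
        for port, queue in enumerate(pending):
            if queue:  # assumption: empty ports are skipped, not given idle slots
                yield port, queue.popleft()


def required_line_rate_gbps(payload_gbps: int, payload_bytes: int = 64,
                            header_bytes: int = 1) -> Fraction:
    """Line rate needed to carry payload_gbps once per-frame headers are added."""
    return Fraction(payload_gbps) * Fraction(payload_bytes + header_bytes, payload_bytes)


if __name__ == "__main__":
    ports = [["a1", "a2", "a3"], ["b1"], [], ["d1", "d2"]]
    print(list(round_robin(ports)))
    rate = required_line_rate_gbps(40)
    print(f"line rate: {float(rate)} Gb/s, overhead: {float(rate / 40 - 1):.4%}")
```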
[Figure: animation of two EEP ASICs exchanging EEP frames, labeled 1, 2, and 3]
EEP Frame Format
SOF: Start of Ethernet Frame
EOF: End of Ethernet Frame
LEN: Set if EEP Frame contains less than 64B of payload
Virtual Link ID: Corresponds to port number (0-15)
Payload Length: (0-63B)
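The fields above can be packed into header bytes roughly as follows. The slide lists the fields but the transcript does not preserve their bit positions, so the layout here is an assumption made for illustration: bit 7 is SOF, bit 6 is EOF, bit 5 is LEN, the low 4 bits hold the Virtual Link ID, and a second byte carrying the Payload Length is present only when LEN is set. The function names are hypothetical.

```python
FULL_PAYLOAD = 64  # payload size of a full EEP frame, from the slides


def encode_header(sof: bool, eof: bool, vlink: int, payload_len: int) -> bytes:
    """Build the header for one EEP frame (bit layout is an assumption)."""
    if not 0 <= vlink <= 15:
        raise ValueError("virtual link ID must be in 0..15")
    if not 0 <= payload_len <= FULL_PAYLOAD:
        raise ValueError("payload length must be in 0..64")
    short = payload_len < FULL_PAYLOAD  # LEN is set for fewer than 64 payload bytes
    first = (int(sof) << 7) | (int(eof) << 6) | (int(short) << 5) | vlink
    # Assumed: the length byte follows the first header byte only when LEN is set.
    return bytes([first, payload_len]) if short else bytes([first])


def decode_header(data: bytes) -> tuple[bool, bool, int, int, int]:
    """Return (sof, eof, vlink, payload_len, header_len) from the start of data."""
    first = data[0]
    sof = bool(first & 0x80)
    eof = bool(first & 0x40)
    short = bool(first & 0x20)
    vlink = first & 0x0F
    if short:
        return sof, eof, vlink, data[1], 2
    return sof, eof, vlink, FULL_PAYLOAD, 1


if __name__ == "__main__":
    header = encode_header(sof=False, eof=True, vlink=15, payload_len=28)
    print(header.hex(), decode_header(header))
```

Under this assumed layout, a full frame spends only one byte on its header, consistent with the 65-byte frame size quoted earlier, while a short frame spends two.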
Why not use VLANs?
• Because it adds latency and requires more SRAM
• FPGA Implementation
– VLAN tagging
– EEP
Related Work
• M. Al-Fares, A. Loukissas, A. Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM ’08.
– Fat trees of commodity switches, Layer 3 routing, flow scheduling
• R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. In SIGCOMM ’09.
– Layer 2 routing, plug-and-play configuration, fault tolerance, switch software modifications
• A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. In SIGCOMM ’09.
– Layer 2 routing, end-host modifications
Conclusion
• General architecture
– Fat tree of merchant silicon switch ASICs
– Hiding cabling complexity
– Pods + Core
– Custom EEP ASIC
– Scales to 65,536 ports with 64-port ASICs
• Design of a 3,456-port 10GbE switch
• Design of the EEP ASIC