ocpeu14
Accelerate networking innovation through programmable data plane
Removing switches from datacenters with TRILL/VNT and smartNIC
Ahmed Amamou, [email protected] Benoît Ganne, [email protected]
Who is Gandi?
• Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
• We provide both:
– IaaS: Infrastructure as a Service
– PaaS: Platform as a Service
• We support the open source community:
– We publish open source code: https://github.com/Gandi
– We support open source projects: VLC, Debian, … *
* Check http://www.gandi.net/supports/ for an exhaustive list
New network challenges for IaaS
• Cisco forecast report*:
– Cloud traffic was about 3.3 zettabytes (10^21 bytes) in 2013
– Cloud traffic will reach 6.6 zettabytes in 2016
– 76% of cloud traffic is East-West (within the same datacenter)
→ A high density of links within a datacenter is needed
• Customers need full network access:
– It should be isolated
– VM network configuration should not be restrictive
→ Overlaying tenant traffic should be considered
* Cisco Global Cloud Index: Forecast and Methodology, 2011-2016.
Why OpenCompute?
• New protocols have been proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT, …), but:
– Hardware integration is slow
– Protocol extensions are hard to integrate
• We believe the OpenCompute community can help us:
– Define an open, vendor-neutral API for programmable data planes
– Bring open hardware fulfilling those needs
New datacenter architecture
• Switch from a classic datacenter architecture to a full-mesh one
• Upgrade hardware to improve performance
TRILL @Gandi
• Gandi has used commodity hardware as TRILL RBridges since 2013
• We have not yet found hardware that suits our needs
TRILL: TRansparent Interconnection of Lots of Links
• Layer 2 routing protocol
• Uses a control plane and a data plane
– Control plane: based on IS-IS, computes all routing information
– Data plane: forwards packets using the information provided by the control plane
• Uses MAC-in-MAC encapsulation
[Figure: TRILL header encapsulating the original payload]
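To make the encapsulation concrete, here is a minimal C sketch that decodes the fixed 6-byte TRILL header carried after the TRILL Ethertype (0x22F3). It follows the RFC 6325 layout: V (2 bits), R (2 bits), M (1 bit), Op-Length (5 bits), Hop Count (6 bits), then the 16-bit egress and ingress RBridge nicknames. The trill_hdr struct and trill_decode() are illustrative names, not Gandi's code.

```c
#include <stddef.h>
#include <stdint.h>

#define TRILL_ETHERTYPE 0x22F3u  /* Ethertype that precedes the TRILL header */
#define TRILL_HDR_LEN   6        /* fixed part of the TRILL header, in bytes */

struct trill_hdr {
    uint8_t  version;    /* V: 2 bits */
    uint8_t  multi;      /* M: 1 bit, set for multi-destination (multicast) frames */
    uint8_t  op_length;  /* Op-Length: options length in 4-byte units (5 bits) */
    uint8_t  hop_count;  /* Hop Count: 6 bits, decremented at each RBridge */
    uint16_t egress;     /* Egress RBridge nickname (tree root when M=1) */
    uint16_t ingress;    /* Ingress RBridge nickname */
};

/* Decode the fixed TRILL header that follows the TRILL Ethertype.
 * Returns 0 on success, -1 if the buffer is too short. */
int trill_decode(const uint8_t *p, size_t len, struct trill_hdr *h)
{
    if (len < TRILL_HDR_LEN)
        return -1;
    uint16_t w = (uint16_t)((p[0] << 8) | p[1]);  /* network byte order */
    h->version   = (w >> 14) & 0x3;   /* bits 15-14 */
    h->multi     = (w >> 11) & 0x1;   /* bit 11 (bits 13-12 are reserved) */
    h->op_length = (w >> 6) & 0x1F;   /* bits 10-6 */
    h->hop_count = w & 0x3F;          /* bits 5-0 */
    h->egress    = (uint16_t)((p[2] << 8) | p[3]);
    h->ingress   = (uint16_t)((p[4] << 8) | p[5]);
    return 0;
}
```

For example, the bytes 08 7F 01 01 02 02 decode to M=1, Op-Length=1, Hop Count=63, egress nickname 0x0101 and ingress nickname 0x0202.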
TRILL benefits

| | Switching (L2) | Routing (L3) | TRILL |
|---|---|---|---|
| Configuration | Minimal | Extensive | Minimal |
| Plug & play | Yes | No | Yes |
| Discovery | Automatic | Configured | Automatic |
| Learning | Automatic | Configured | Automatic |
| Multipath | No | Yes | Yes |
| Convergence | Slow | Fast | Fast |
| Connectivity | Inflexible | Flexible | Flexible |
| Scale | Limited | Large | Large |
Control Plane: Forwarding database
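The forwarding database is filled by the IS-IS control plane: each RBridge runs a shortest-path computation over the link-state database and records, for every destination, which neighbour to use. The C sketch below shows that step with Dijkstra's algorithm. It assumes the link-state database has already been flooded and is given as a cost matrix over hypothetical RBridge indices (0 meaning no link), and it ignores equal-cost multipath and distribution trees, which a real TRILL implementation also handles.

```c
#include <limits.h>

#define N_RB    8   /* number of RBridges in this hypothetical example */
#define NO_LINK 0   /* a cost of 0 means "no adjacency" */

/* Compute, from RBridge `self`, the first hop towards every other RBridge.
 * next_hop[d] = neighbour to use to reach d, self for itself, -1 if unreachable.
 * `cost` is only read. */
void build_fib(unsigned cost[N_RB][N_RB], int self, int next_hop[N_RB])
{
    unsigned dist[N_RB];
    int done[N_RB];

    for (int i = 0; i < N_RB; i++) {
        dist[i] = UINT_MAX;
        done[i] = 0;
        next_hop[i] = -1;
    }
    dist[self] = 0;
    next_hop[self] = self;

    for (int iter = 0; iter < N_RB; iter++) {
        /* settle the closest RBridge not processed yet */
        int u = -1;
        for (int i = 0; i < N_RB; i++)
            if (!done[i] && dist[i] != UINT_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0)
            break;  /* the remaining RBridges are unreachable */
        done[u] = 1;

        for (int v = 0; v < N_RB; v++) {
            if (cost[u][v] == NO_LINK || done[v])
                continue;
            unsigned alt = dist[u] + cost[u][v];
            if (alt < dist[v]) {
                dist[v] = alt;
                /* a direct neighbour of self is its own first hop;
                 * otherwise inherit the first hop used to reach u */
                next_hop[v] = (u == self) ? v : next_hop[u];
            }
        }
    }
}
```

With links 0-1 and 1-2 of cost 1, 0-3 of cost 5 and 3-2 of cost 1, RBridge 0 reaches RBridges 1, 2 and 3 through neighbour 1 (the path via 1 and 2 to RBridge 3 costs 3 instead of 5), and isolated RBridges get -1.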
Multitenancy: Virtual Network over TRILL (VNT)
New cloud architectures have to take multitenancy into account. TRILL does not provide multitenancy handling mechanisms → we need to extend it
VNT vs TRILL
• Update both the control and data planes
– Control plane: prune the multicast tree to limit multicast traffic
– Data plane: forwarding is conditioned by VNI support
VNT encapsulation (figure), from outer to inner:
– Outer Destination MAC Address
– Outer Source MAC Address
– Optional Outer IEEE 802.1Q tag
– TRILL Header (L2 routing information): Egress RBridge Nickname, Ingress RBridge Nickname, options description
– VNT Header Extensions (tenant identification): TLV VNI Tag (24 bits)
– Original Ethernet Frame: original packet payload
Publication: Amamou, A., Haddadou, K., & Pujolle, G. (2014). A TRILL-based multi-tenant data center network. Computer Networks.
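The slide only states that the VNI travels as a 24-bit tag inside a TLV; the exact encoding is not detailed here. As an illustration, the sketch below assumes a plain TLV with a 1-byte type and a 1-byte length, using a hypothetical type code VNT_TLV_VNI: it writes a VNI into such a TLV and walks a list of TLVs to get it back. A 24-bit field allows 2^24 = 16,777,216 tenant networks, versus 4,096 values for a 12-bit 802.1Q VLAN ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV type code for the VNI Tag; the real value belongs to the
 * VNT specification, not to this sketch. */
#define VNT_TLV_VNI 0x01u
#define VNI_MAX     0xFFFFFFu  /* largest 24-bit VNI */

/* Write [type][length=3][VNI, big-endian] into buf.
 * Returns the number of bytes written (5), or 0 if the VNI or buffer is invalid. */
size_t vnt_put_vni(uint8_t *buf, size_t cap, uint32_t vni)
{
    if (vni > VNI_MAX || cap < 5)
        return 0;
    buf[0] = VNT_TLV_VNI;
    buf[1] = 3;
    buf[2] = (uint8_t)(vni >> 16);
    buf[3] = (uint8_t)(vni >> 8);
    buf[4] = (uint8_t)vni;
    return 5;
}

/* Walk a sequence of TLVs (1-byte type, 1-byte length, value) and extract the
 * VNI. Returns 0 and sets *vni on success, -1 if absent or malformed. */
int vnt_get_vni(const uint8_t *opt, size_t len, uint32_t *vni)
{
    size_t off = 0;
    while (off + 2 <= len) {
        uint8_t type = opt[off];
        uint8_t vlen = opt[off + 1];
        if (off + 2 + vlen > len)
            return -1;  /* truncated TLV */
        if (type == VNT_TLV_VNI && vlen == 3) {
            const uint8_t *v = opt + off + 2;
            *vni = ((uint32_t)v[0] << 16) | ((uint32_t)v[1] << 8) | v[2];
            return 0;
        }
        off += 2 + (size_t)vlen;  /* skip unrelated TLVs */
    }
    return -1;
}
```

Skipping unknown TLVs by their length is what lets other extensions coexist with the VNI Tag in the same options area.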
VNT: Multicast tree pruning
[Figure: left, the topology of eight RBridges (n1 to n8) linked through interfaces i1 to i3; right, the multicast tree spanning all RBridges and the same tree pruned for VNI 1 (n1, n2, n5, n6), which still reaches the VNI 1 end stations A and B]
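Pruning keeps only the branches of the distribution tree that lead to RBridges with end stations in the frame's VNI. A minimal C sketch, assuming the tree is already computed and given as a parent array (-1 for the root, -2 for RBridges outside the tree) and that each RBridge knows whether it has local members of the VNI. VNT performs this in its IS-IS control plane, which the sketch does not model.

```c
#include <stdbool.h>

/* Prune a distribution tree for one VNI.
 * parent[i] : parent of RBridge i (-1 for the root, -2 if not in the tree)
 * member[i] : true if RBridge i has local end stations in this VNI
 * keep[i]   : output, true if RBridge i must still receive the VNI's multicast
 * A node is kept if it or one of its descendants has members. */
void prune_tree(int n, const int parent[], const bool member[], bool keep[])
{
    for (int i = 0; i < n; i++)
        keep[i] = false;
    for (int i = 0; i < n; i++) {
        if (!member[i] || parent[i] == -2)
            continue;
        /* walk up towards the root; stop early at an already-kept node,
         * since its ancestors are kept too */
        for (int v = i; v >= 0 && !keep[v]; v = parent[v])
            keep[v] = true;
    }
}
```

A member on a leaf keeps its whole path to the root, while a subtree without members is cut and never receives that VNI's multicast traffic.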
Current VNT implementation on Linux
Control plane: Quagga daemon
Data plane: Linux bridge module
https://github.com/Gandi/
Data plane: performance
• Throughput is affected by the additional processing operations
• The processing delay of a single packet is not affected
[Figure: throughput and delay measurements]
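Improving performance
• Shift the data plane from the host to a smartNIC
– Increase performance
– Offload the x86 for other uses, e.g. customer workloads
[Figure: with a regular NIC, the host runs both the control plane and the data plane; with a smartNIC, the host keeps the control plane and the data plane moves to the smartNIC]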
KALRAY: deterministic supercomputing on a chip
• Fabless semiconductor company founded in 2008
• Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture
– Leading performance/energy ratio worldwide
– Time predictability and low latency
– Heterogeneous applications on the same chip
– High programmability
• Working with industry-leading partners and customers
• 55 employees
• Offices in France and the US
[Figure: first MPPA®-256 chips in TSMC 28 nm CMOS; leading performance/energy ratio worldwide]
MPPA®-256 Bostan Networking Strengths
• Software Defined NIC
– Smart packet classification/dispatching
– 256 cores for packet processing
– Standard C/C++ with GCC 4.9
– Advanced debugging and profiling
• Low latency
– Zero-copy Ethernet/PCIe
– < 1 µs port-to-port (transparent mode)
– < 1 µs port to system memory
• System integration
– Linux support
– Virtualization support
– Low power
• High throughput / line rate
– 80 Gbps full-duplex line rate (2 x 120 Mpps)
– 3400 instructions per packet @ 64 B
– AES, SHA-1, SHA-2 and CRC accelerators
– 2 x PCIe Gen3 8-lane
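The 120 Mpps figure is the Ethernet line rate for minimum-size frames: each 64-byte frame also occupies 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire, so 80 Gbps carries 80·10^9 / (84 × 8) ≈ 119 million frames per second in each direction. The helpers below compute this, plus a rough per-packet cycle budget under our own simplifying assumption that all 256 cores at 800 MHz share the load evenly.

```c
/* Per-frame wire overhead: 7-byte preamble + 1-byte start-of-frame delimiter
 * + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20.0

/* Maximum frames per second for a link rate (bits/s) and a frame size (bytes). */
double line_rate_pps(double bits_per_s, double frame_bytes)
{
    return bits_per_s / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}

/* Core cycles available per packet if `cores` cores at `hz` share `pps` evenly
 * (back-of-envelope only: ignores dispatch, memory stalls and I/O clusters). */
double cycles_per_packet(unsigned cores, double hz, double pps)
{
    return cores * hz / pps;
}
```

line_rate_pps(80e9, 64) gives about 119.05 Mpps (a single 10GbE port at 64 B gives about 14.88 Mpps), and cycles_per_packet(256, 800e6, that rate) gives about 1,720 cycles per packet.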
MPPA®-256 Bostan
• 64-bit processor
• Up to 800 MHz
• High performance
– 845 GFLOPS SP / 422 GFLOPS DP
– 1 TOPS
• High-bandwidth Network-on-Chip
– 2 x 12.8 GB/s
• High-speed Ethernet
– Up to 2 x 40 Gbps / 2 x 120 Mpps @ 64 B
• DDR3 memory interfaces
– 2 x 64-bit + ECC @ 2133 MT/s / 2 x 17 GB/s
• PCIe Gen3 interface
– 2 x 8-lane / 2 x 8 GB/s full duplex
– Endpoint / Root Complex
• NoCX extension
– 2 x 40 Gbps + 2 x 80 Gbps Interlaken (ILK)
• Flash controller, GPIOs…
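The memory and PCIe figures follow from the interface parameters: a 64-bit DDR3 channel at 2133 MT/s moves 2133·10^6 × 8 bytes ≈ 17.1 GB/s, and a PCIe Gen3 x8 link carries 8 GT/s per lane with 128b/130b encoding, about 7.9 GB/s per direction. A small C sketch of both calculations, giving peak values only (ECC bits and PCIe protocol overheads such as packet headers are ignored):

```c
/* Peak DDR bandwidth in bytes/s: transfers per second x data bus width in bytes
 * (ECC bits are not counted as payload). */
double ddr_peak_bytes(double transfers_per_s, unsigned bus_bits)
{
    return transfers_per_s * (bus_bits / 8.0);
}

/* PCIe Gen3 bandwidth per direction in bytes/s: 8 GT/s per lane,
 * 128b/130b line coding, 8 bits per byte. */
double pcie_gen3_bytes(unsigned lanes)
{
    return 8e9 * lanes * (128.0 / 130.0) / 8.0;
}
```

ddr_peak_bytes(2133e6, 64) is about 17.06 GB/s and pcie_gen3_bytes(8) about 7.88 GB/s, matching the rounded 17 GB/s and 8 GB/s on the slide.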
MPPA®-256 Processor Hierarchical Architecture
256 Processing Engine cores + 32 Resource Management cores
[Figure: manycore processor (process-level parallelism) → compute cluster (thread-level parallelism) → VLIW core (instruction-level parallelism)]
High Speed Ethernet Packet processing
• Ethernet Rx dispatcher
– 8 classification tables: classify, extract fields
– Smart dispatch:
• Round-robin
• Flexible core allocation: round-robin vs. classification, per 10G port
• Ethernet Tx
– 64 Tx FIFOs
– QoS between the FIFOs
– Flow control between clusters and Tx FIFOs
Patent pending
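The dispatcher's two policies can be modelled as follows. This is a hypothetical software model, not Kalray's dispatcher API: it assumes 16 compute clusters and a per-port mode, where round-robin spreads packets evenly and classification hashes a field extracted by the classifier so that packets sharing that field always reach the same cluster.

```c
#include <stdint.h>

#define N_CLUSTERS 16  /* assumed number of compute clusters */

enum dispatch_mode { DISPATCH_ROUND_ROBIN, DISPATCH_CLASSIFY };

/* Per-port dispatcher state (hypothetical model). */
struct port_dispatch {
    enum dispatch_mode mode;
    unsigned next;  /* round-robin cursor */
};

/* Choose a compute cluster for a packet whose classifier-extracted key is `key`. */
unsigned dispatch(struct port_dispatch *pd, uint32_t key)
{
    if (pd->mode == DISPATCH_ROUND_ROBIN) {
        unsigned c = pd->next;
        pd->next = (pd->next + 1) % N_CLUSTERS;
        return c;
    }
    /* Knuth-style multiplicative hash (32-bit wraparound), then reduce to a
     * cluster index: equal keys always map to the same cluster */
    uint32_t h = key * 2654435761u;
    return (unsigned)(h >> 16) % N_CLUSTERS;
}
```

Round-robin balances load best but may process packets of one flow on different clusters; classification pins each key to one cluster, so per-key state (such as a FIB subset) can stay local to that cluster.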
VNT on a programmable data plane: multicast forwarding example
[Figure: x86 host with a hypervisor; software stacks (TRILL controller, userspace application) over the Linux networking stack and the MPPA Linux Ethernet driver; below, the Kalray Bostan smartNIC with its IO Ethernet driver and 8x10GbE ports]
• On-going work between Gandi and Kalray:
– Explore programmable data plane opportunities
– Study the feasibility and architecture of a VNT smartNIC
• Multicast forwarding puts a high load on each node
• Dispatch the packet based on the Egress RBridge
– In case of multicast, the Egress RBridge is set to the tree root
– Each cluster “owns” a subset of the possible Egress RBridges (i.e. a FIB subset)
IO Ethernet driver dispatch rule:
if (Packet[Ethertype] == TRILL) {
    send to cluster #HASH(Egress RBridge)
}
Example packet: <Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>
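A runnable C version of the dispatch rule above, as a sketch: it reads the outer Ethertype (skipping an optional 802.1Q tag), checks for the TRILL Ethertype 0x22F3 and extracts the Egress RBridge nickname from bytes 2-3 of the TRILL header. The pseudocode's HASH() is unspecified; a modulo over an assumed 16 clusters stands in for it, so cluster k owns every nickname congruent to k modulo 16.

```c
#include <stddef.h>
#include <stdint.h>

#define ETYPE_8021Q 0x8100u
#define ETYPE_TRILL 0x22F3u
#define N_CLUSTERS  16  /* assumed number of compute clusters */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Return the cluster owning the frame's Egress RBridge nickname, or -1 if the
 * frame is not TRILL (it then follows another dispatch rule). For a multicast
 * frame the egress nickname is the tree root, so a whole tree lands on one cluster. */
int trill_dispatch(const uint8_t *frame, size_t len)
{
    size_t off = 12;  /* skip outer destination + source MAC */
    if (len < off + 2)
        return -1;
    uint16_t etype = rd16(frame + off);
    if (etype == ETYPE_8021Q) {  /* optional outer 802.1Q tag */
        off += 4;
        if (len < off + 2)
            return -1;
        etype = rd16(frame + off);
    }
    if (etype != ETYPE_TRILL || len < off + 2 + 6)
        return -1;
    uint16_t egress = rd16(frame + off + 2 + 2);  /* TRILL header bytes 2-3 */
    return (int)(egress % N_CLUSTERS);            /* stand-in for HASH() */
}
```

A frame whose egress nickname is 0x0123 goes to cluster 3, with or without an outer VLAN tag; non-TRILL frames return -1.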
• Look up the list of next-hop RBridges for this multicast tree
– RBridge owner clusters can be local or remote
• Look up the LIB for local ports, if any
FIB[Egress RBridge] = {
    Egress RBridge MAC;
    Egress RBridge Interface;
    MCTree = [ RBx, RBy, … ];
    VNI = [ VNI-1, VNI-2, … ];
}
LIB = {
    (Local MACx, Local Portx, VNI-1);
    …
}
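The FIB and LIB records above map naturally onto C structs. In this sketch the field names mirror the slide, while the fixed array sizes, the nickname field and the linear searches are simplifying assumptions (a real data plane could index the FIB directly by the 16-bit nickname). fib_lookup() returns the entry holding the tree's next hops; lib_local_ports() collects the local ports in a VNI.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NEXT 8
#define MAX_VNI  8

/* FIB entry, mirroring FIB[Egress RBridge]. */
struct fib_entry {
    uint16_t nickname;          /* Egress RBridge nickname (the lookup key) */
    uint8_t  mac[6];            /* Egress RBridge MAC */
    int      ifindex;           /* Egress RBridge interface */
    uint16_t mctree[MAX_NEXT];  /* next hops on the tree rooted at this RBridge */
    size_t   n_mctree;
    uint32_t vni[MAX_VNI];      /* VNIs served behind this RBridge */
    size_t   n_vni;
};

/* LIB entry: a local end station attached to this RBridge. */
struct lib_entry {
    uint8_t  mac[6];
    int      port;
    uint32_t vni;
};

const struct fib_entry *fib_lookup(const struct fib_entry *fib, size_t n, uint16_t nick)
{
    for (size_t i = 0; i < n; i++)
        if (fib[i].nickname == nick)
            return &fib[i];
    return NULL;
}

/* Write the local ports belonging to `vni`; returns how many were written. */
size_t lib_local_ports(const struct lib_entry *lib, size_t n_lib, uint32_t vni,
                       int *ports, size_t max_ports)
{
    size_t n = 0;
    for (size_t i = 0; i < n_lib && n < max_ports; i++)
        if (lib[i].vni == vni)
            ports[n++] = lib[i].port;
    return n;
}
```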
• Forward the frame
– Remote: forward it to the clusters owning the next-hop RBridges
– Local: decapsulate the inner frame and forward it to the local VM
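For the local case, decapsulation amounts to skipping the outer headers. A minimal sketch, assuming the VNT extension lives in the TRILL options area, whose size is given by the Op-Length field in 4-byte units (RFC 6325). It returns a pointer into the original buffer instead of copying, in the spirit of the zero-copy path mentioned for the smartNIC.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the inner (original) Ethernet frame inside a TRILL/VNT
 * frame and store its length in *inner_len, or NULL if the frame is malformed. */
const uint8_t *trill_decap(const uint8_t *frame, size_t len, size_t *inner_len)
{
    size_t off = 12;  /* outer destination + source MAC */
    if (len < off + 2)
        return NULL;
    uint16_t etype = (uint16_t)((frame[off] << 8) | frame[off + 1]);
    if (etype == 0x8100) {  /* optional outer 802.1Q tag */
        off += 4;
        if (len < off + 2)
            return NULL;
        etype = (uint16_t)((frame[off] << 8) | frame[off + 1]);
    }
    if (etype != 0x22F3)
        return NULL;  /* not a TRILL frame */
    off += 2;         /* now at the TRILL header */
    if (len < off + 6)
        return NULL;
    /* Op-Length: low 3 bits of byte 0 and top 2 bits of byte 1 */
    unsigned op_len = ((frame[off] & 0x07u) << 2) | (frame[off + 1] >> 6);
    off += 6 + (size_t)op_len * 4;  /* skip the fixed header and the options */
    if (len < off + 14)             /* need at least an inner Ethernet header */
        return NULL;
    *inner_len = len - off;
    return frame + off;
}
```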
• Check whether the RBridge supports the appropriate VNI
– If yes, forward to the RBridge
– If not, stop here
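The VNI check can be written as a filter over the multicast next hops. A sketch with a hypothetical minimal FIB view (nickname plus supported VNIs): a next hop is kept only if its FIB entry lists the frame's VNI, otherwise forwarding stops on that branch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VNI 8

/* Minimal view of a FIB entry: the RBridge and the VNIs reachable through it. */
struct fib_vni {
    uint16_t nickname;
    uint32_t vni[MAX_VNI];
    size_t   n_vni;
};

static bool supports_vni(const struct fib_vni *e, uint32_t vni)
{
    for (size_t i = 0; i < e->n_vni; i++)
        if (e->vni[i] == vni)
            return true;
    return false;
}

/* Write the nicknames of the next hops that support `vni` to `out`
 * and return how many were kept. */
size_t select_next_hops(const struct fib_vni *next_hops, size_t n, uint32_t vni,
                        uint16_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (supports_vni(&next_hops[i], vni))
            out[kept++] = next_hops[i].nickname;
    return kept;
}
```

With next hops serving VNIs {1, 2}, {2} and {1}, a VNI 1 frame is forwarded to the first and third only.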
Innovation and efficiency
• Solving SDN and network virtualization challenges requires new protocols
– e.g. VXLAN, NVGRE, TRILL/VNT…
• Efficiency generally means hardware support…
– …but hardware development cannot keep up with software and slows down innovation
• Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation…
– …but we need open ecosystems, standards and APIs
Thank you for your attention!
Questions?