Accelerate networking innovation through programmable data plane
Removing switches from datacenters with TRILL/VNT and smartNIC
Ahmed Amamou, [email protected]
Benoît Ganne, [email protected]

Upload: kalray

Post on 03-Aug-2015



Page 2: Ocpeu14

Who is Gandi?

• Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
• We provide both:
  – IaaS: Infrastructure as a Service
  – PaaS: Platform as a Service
• We support the open source community:
  – We publish open source code: https://github.com/Gandi
  – We support open source projects: VLC, Debian, … *

* See http://www.gandi.net/supports/ for the exhaustive list

Page 3: Ocpeu14

New network challenges for IaaS

• Cisco forecast report*:
  – Cloud traffic was about 3.3 zettabytes (10^21 bytes) in 2013
  – Cloud traffic will reach 6.6 zettabytes in 2016
  – 76% of cloud traffic is East-West (within the same datacenter)
→ A high density of links within a datacenter is needed

• Customers need full network access:
  – Customer networks should be isolated
  – VM network configuration should not be restrictive
→ Overlaying tenant traffic should be considered

* Cisco Global Cloud Index: Forecast and Methodology, 2011-2016.

Page 4: Ocpeu14

Why OpenCompute?

• New protocols have been proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT, …) but:
  – Hardware integration is slow
  – Protocol extensions are hard to integrate
• We believe the OpenCompute community can help us:
  – Define an open, vendor-neutral API for a programmable data plane
  – Bring open hardware that fulfills those needs

Page 5: Ocpeu14

New datacenter architecture

• Switch from a classic datacenter architecture to a full-mesh one
• Upgrade hardware to improve performance

Page 6: Ocpeu14

TRILL @Gandi

• Gandi has used commodity hardware as TRILL RBridges since 2013
• We have not yet found hardware that suits our needs

Page 7: Ocpeu14

TRILL: TRansparent Interconnection of Lots of Links

• Layer 2 routing protocol
• Uses a control plane and a data plane
• Control plane: based on IS-IS, which computes all routing information
• Data plane: forwards packets using the information provided by the control plane
• Uses MAC-in-MAC encapsulation

[Figure: encapsulated frame, with labels "TRILL Header" and "Original payload"]
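As an illustration of the encapsulation, here is a minimal sketch of the base TRILL header. It assumes the layout standardized in RFC 6325 (2-bit version, 2 reserved bits, 1 multi-destination bit, 5-bit option length, 6-bit hop count, then the 16-bit egress and ingress RBridge nicknames) and the TRILL Ethertype 0x22F3. The function names are illustrative and do not come from the slides.

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # Ethertype assigned to TRILL (RFC 6325)


def pack_trill_header(egress, ingress, hop_count=32, multi_dest=False,
                      op_len=0, version=0):
    """Build the 6-byte base TRILL header (no options)."""
    if not (0 <= hop_count < 64 and 0 <= op_len < 32 and 0 <= version < 4):
        raise ValueError("field out of range")
    # First 16 bits: V(2) | R(2) | M(1) | Op-Length(5) | Hop Count(6)
    first = (version << 14) | (int(multi_dest) << 11) | (op_len << 6) | hop_count
    return struct.pack("!HHH", first, egress, ingress)


def unpack_trill_header(data):
    """Parse the 6-byte base TRILL header into a dict of fields."""
    first, egress, ingress = struct.unpack("!HHH", data[:6])
    return {
        "version": first >> 14,
        "multi_dest": bool((first >> 11) & 1),
        "op_len": (first >> 6) & 0x1F,
        "hop_count": first & 0x3F,
        "egress": egress,
        "ingress": ingress,
    }


def encapsulate(outer_dst, outer_src, trill_header, inner_frame):
    """Outer Ethernet header + TRILL header + original frame (no outer VLAN tag)."""
    return (outer_dst + outer_src + struct.pack("!H", TRILL_ETHERTYPE)
            + trill_header + inner_frame)
```

The outer MAC addresses change at every hop, while the nicknames in the TRILL header identify the ingress and egress RBridges end to end.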

Page 8: Ocpeu14

TRILL benefits

                Switching (L2)   Routing (L3)   TRILL
Configuration   Minimal          Intensive      Minimal
Plug & play     Yes              No             Yes
Discovery       Automatic        Configured     Automatic
Learning        Automatic        Configured     Automatic
Multipath       No               Yes            Yes
Convergence     Slow             Fast           Fast
Connectivity    Inflexible       Flexible       Flexible
Scale           Limited          Large          Large

Page 9: Ocpeu14

Control Plane: Forwarding database


Page 10: Ocpeu14

Multitenancy: Virtual Network over TRILL (VNT)

• New cloud architectures have to take multitenancy into consideration
• TRILL does not provide mechanisms for handling multitenancy
→ We need to extend it

Page 11: Ocpeu14

VNT vs TRILL

• VNT updates both the control and data planes:
  – Control plane: prunes the multicast tree to limit multicast traffic
  – Data plane: forwarding is conditioned by VNI support

VNT encapsulation

[Figure: frame layout, in order]
  – Outer destination MAC address
  – Outer source MAC address
  – Optional outer IEEE 802.1Q tag
  – TRILL header (L2 routing information): egress RBridge nickname, ingress RBridge nickname
  – VNT header extensions (tenant identification): options description, TLV, VNI tag (24 bits)
  – Original packet payload (the original Ethernet frame)

Publication: Amamou, A., Haddadou, K., & Pujolle, G. (2014). A TRILL-based multi-tenant data center network. Computer Networks.
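To make the 24-bit VNI tag concrete, here is a small sketch that carries a VNI inside a type-length-value element. The slide only says that the VNT extension holds a TLV with a 24-bit VNI tag; the type code, the one-byte type and length fields, and the function names below are assumptions for illustration, not the wire format defined in the publication.

```python
import struct

VNI_TLV_TYPE = 0x01        # hypothetical type code, for illustration only
VNI_MAX = (1 << 24) - 1    # a 24-bit tag allows about 16.7 million tenants


def pack_vni_tlv(vni):
    """Encode a VNI as type (1 byte), length (1 byte), value (3 bytes)."""
    if not 0 <= vni <= VNI_MAX:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!BB", VNI_TLV_TYPE, 3) + vni.to_bytes(3, "big")


def unpack_vni_tlv(data):
    """Decode the TLV produced by pack_vni_tlv and return the VNI."""
    tlv_type, length = struct.unpack("!BB", data[:2])
    if tlv_type != VNI_TLV_TYPE or length != 3:
        raise ValueError("not a VNI TLV")
    return int.from_bytes(data[2:5], "big")
```

Whatever the exact encoding, the data plane only needs to extract this tag to decide whether a frame may be forwarded to a given RBridge.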

Page 12: Ocpeu14

VNT: Multicast tree pruning

[Figure: two panels, "Topology" and "Multicast tree". The topology connects RBridges n1 to n8 through interfaces i1 to i3, with tenant endpoints labelled "A – Vni1" and "B – Vni1". The full multicast tree spans n1 to n8; the pruned tree contains only n1, n2, n5 and n6.]
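The pruning idea can be sketched as follows: starting from a distribution tree, keep only the branches that lead to an RBridge with at least one endpoint in the given VNI. The tree shape and membership set in the usage example are hypothetical (the exact links of the figure are not recoverable from the transcript), and `prune_tree` is an illustrative name rather than the authors' implementation.

```python
def prune_tree(children, root, members):
    """Prune a multicast tree for one VNI.

    children: dict mapping a node to the list of its child nodes
    root:     root of the distribution tree
    members:  set of RBridges that have endpoints in the VNI
    Returns a dict in the same shape, containing only nodes that are
    members or that lie on a path from the root to a member.
    """
    pruned = {}

    def visit(node):
        # Keep only children whose subtree contains at least one member.
        kept = [child for child in children.get(node, []) if visit(child)]
        if kept or node in members or node == root:
            pruned[node] = kept
            return True
        return False

    visit(root)
    return pruned


# Hypothetical tree rooted at n1; only n5 and n6 host endpoints of Vni1.
tree = {"n1": ["n5", "n2", "n8"], "n2": ["n6", "n7"], "n8": ["n4", "n3"]}
print(prune_tree(tree, "n1", {"n5", "n6"}))
```

With these inputs, n2 is kept as a transit node because it leads to n6, while the n8 branch and n7 receive no multicast traffic for this VNI.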

Page 13: Ocpeu14

Current VNT implementation on Linux

Control plane: Quagga daemon

Data plane: Linux bridge module


https://github.com/Gandi/

Page 15: Ocpeu14

Data plane: performance

• Throughput is affected by the additional processing operations
• Processing delay for a single packet is not affected

[Figures: throughput and delay measurements]

Page 16: Ocpeu14

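Improving performance

• Shift the data plane from the host to a smartNIC
  – Increases performance
  – Frees the x86 CPU for other uses, e.g. customer workloads

[Figure: two hosts side by side. With a standard NIC, the host runs both the control plane and the data plane. With a smartNIC, the host keeps the control plane and the data plane moves to the smartNIC.]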

Page 17: Ocpeu14

KALRAY deterministic supercomputing on a chip

• Founded in 2008, fabless semiconductor company
• Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture
  – Leading performance / energy ratio worldwide
  – Time predictability and low latency
  – Heterogeneous applications on the same chip
  – High programmability
• Working with industry-leading partners and customers
• 55 employees
• Offices in France and the US

[Figure: first MPPA®-256 chips, CMOS 28nm TSMC]

Page 18: Ocpeu14

MPPA®-256 Bostan Networking Strengths

Software Defined NIC
• Smart packet classification/dispatching
• 256 cores for packet processing
• Standard C/C++ with GCC-4.9
• Advanced debugging and profiling

Low latency
• Zero-copy Ethernet PCIe
• < 1µs port-to-port transparent mode
• < 1µs port to system memory

System integration
• Linux support
• Virtualization support
• Low power

High throughput / Line rate
• 80 Gbps full-duplex line-rate (2x120MPPS)
• 3400 instructions per packet @64B
• AES, SHA-1, SHA-2, CRC accelerators
• 2 x PCIe Gen3 8-lanes

Page 19: Ocpeu14

MPPA®-256 Bostan

• 64-bit processor
• Up to 800MHz
• High Performance
  – 845 GFLOPS SP / 422 GFLOPS DP
  – 1 TOPS
• High Bandwidth Network on Chip
  – 2 x 12.8 GB/s
• High Speed Ethernet
  – Up to 2x40 Gbps / 2x120 MPPS @ 64B
• DDR3 Memory interfaces
  – 2 x 64-bit + ECC @2133MT/s / 2 x 17GB/s
• PCIe Gen3 interface
  – 2 x 8-lanes / 2 x 8 GB/s full duplex
  – End Point / Root Complex
• NoCX extension
  – 2 x 40 Gbps + 2 x 80 Gbps ILK
• Flash controller, GPIOs…
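The "120 MPPS @ 64B" figure can be checked with standard Ethernet arithmetic. On the wire, each frame also costs 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap, so a 64-byte frame occupies 84 bytes. The sketch below assumes that "2x120 MPPS" means roughly 120 Mpps in each direction over 2 x 40 Gbps; that reading is an interpretation, not something the slide states.

```python
PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum idle time between frames


def max_packet_rate(link_bps, frame_bytes):
    """Maximum frames per second on an Ethernet link for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


per_direction = max_packet_rate(2 * 40e9, 64)
print(f"{per_direction / 1e6:.1f} Mpps per direction")
```

This gives about 119 Mpps per direction, consistent with the rounded 120 MPPS on the slide.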

Page 20: Ocpeu14

MPPA®-256 Processor Hierarchical Architecture
256 Processing Engine cores + 32 Resource Management cores

[Figure: three levels of the hierarchy and the parallelism each one exploits]
• VLIW Core: Instruction Level Parallelism
• Compute Cluster: Thread Level Parallelism
• Manycore Processor: Process Level Parallelism

Page 21: Ocpeu14

High Speed Ethernet Packet processing

• Ethernet Rx dispatcher
  – 8 classification tables
    • Classify
    • Extract fields
• Smart Dispatch
  – Round Robin way
  – Flexible cores allocation
    • Round Robin vs. classification
    • Per 10G Ports
• Ethernet Tx
  – 64 Tx FIFOs
  – QoS between the FIFOs
  – Flow Control between clusters and Tx FIFOs

Patent pending

Page 22: Ocpeu14

VNT on a programmable data plane: multicast forwarding example

• Ongoing work between Gandi and Kalray
  – Explore programmable data plane opportunities
  – Study the feasibility and architecture of a VNT smartNIC
• Multicast forwarding puts a high load on each node

[Figure: x86 host and Kalray Bostan smartNIC with 8x10GbE ports. Labels: Hypervisor, TRILL controller, Userspace application, Linux networking stack, MPPA Linux Ethernet driver, IO Ethernet driver]

Page 23: Ocpeu14

VNT on a programmable data plane: multicast forwarding example

• Dispatch the packet based on the Egress RBridge
  – In case of multicast, the Egress RBridge is set to the tree root
  – Each cluster “owns” a subset of the possible Egress RBridges (i.e. a FIB subset)

Incoming frame: <Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>

IO ethernet driver:
    if (Packet[Ethertype] == TRILL) {
        send to cluster #HASH(Egress RBridge)
    }
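The dispatch rule above can be sketched in a few lines. The slide does not specify the hash function, so a plain modulo over the 16-bit nickname stands in for it here. The cluster count of 16 and the dictionary representation of a parsed packet are also assumptions made for illustration.

```python
TRILL_ETHERTYPE = 0x22F3  # Ethertype assigned to TRILL (RFC 6325)
NUM_CLUSTERS = 16         # assumption: number of compute clusters available


def dispatch(packet, num_clusters=NUM_CLUSTERS):
    """Return the index of the cluster that owns the packet's egress RBridge.

    packet: dict with at least "ethertype" and, for TRILL frames,
            "egress_rbridge" (the 16-bit nickname).
    Returns None for non-TRILL traffic, which this sketch does not handle.
    """
    if packet["ethertype"] != TRILL_ETHERTYPE:
        return None
    # Stand-in hash: any deterministic function of the nickname works,
    # as long as every frame for one egress RBridge reaches the same cluster.
    return packet["egress_rbridge"] % num_clusters
```

Because the mapping is deterministic, each cluster only needs the FIB entries for the nicknames that hash to it, which is what lets a cluster "own" a FIB subset.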


Page 25: Ocpeu14

VNT on a programmable data plane: multicast forwarding example

• Look up the list of next-hop RBridges for this multicast tree
  – RBridge owner clusters can be local or remote
• Look up the LIB for local ports, if any

FIB[Egress RBridge] = {
    Egress RBridge MAC;
    Egress RBridge Interface;
    MCTree = [ RBx, RBy, … ];
    VNI = [ VNI-1, VNI-2, … ];
}

LIB = {
    (Local MACx, Local Portx, VNI-1);
}
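The FIB and LIB tables above map naturally onto a dictionary and a list of tuples. The following is a minimal sketch of the two lookups under that assumption; the key names (`mac`, `interface`, `mc_tree`, `vnis`), the sample values, and the function name are illustrative and not taken from the actual implementation.

```python
def lookup_multicast(fib, lib, egress_rbridge, vni):
    """Return (next_hops, local_ports) for a multicast frame.

    fib: dict keyed by egress RBridge; each entry is a dict with the keys
         "mac", "interface", "mc_tree" (next-hop RBridges) and "vnis".
    lib: list of (local MAC, local port, VNI) tuples.
    """
    entry = fib.get(egress_rbridge)
    # For multicast, the egress RBridge is the tree root, so its entry
    # carries the list of next-hop RBridges for that tree.
    next_hops = list(entry["mc_tree"]) if entry else []
    # Local delivery: every local port attached to the frame's VNI.
    local_ports = [port for (_mac, port, entry_vni) in lib if entry_vni == vni]
    return next_hops, local_ports


# Hypothetical tables mirroring the slide's notation.
fib = {
    "DTROOT": {
        "mac": "02:00:00:00:00:01",
        "interface": "eth0",
        "mc_tree": ["RBx", "RBy"],
        "vnis": {"VNI-1", "VNI-2"},
    }
}
lib = [("52:54:00:00:00:01", "port-x", "VNI-1")]
print(lookup_multicast(fib, lib, "DTROOT", "VNI-1"))
```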

Page 26: Ocpeu14

VNT on a programmable data plane: multicast forwarding example

• Forward the frame
  – Remote
    • Forward to the clusters owning the next-hop RBridges
  – Local
    • Decapsulate the inner frame
    • Forward it to the local VM

Page 27: Ocpeu14

VNT on a programmable data plane: multicast forwarding example

• Check whether the RBridge supports the appropriate VNI
  – If yes, forward to the RBridge
  – If not, stop here

FIB[Egress RBridge] = {
    Egress RBridge MAC;
    Egress RBridge Interface;
    MCTree = [ RBx, RBy, … ];
    VNI = [ VNI-1, VNI-2, … ];
}
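The VNI check can be sketched as a filter over the next hops of the multicast tree. This assumes that each next-hop RBridge has its own FIB entry listing the VNIs it supports, which is one plausible reading of the table above. The dictionary layout and the function name are illustrative.

```python
def vni_filtered_next_hops(fib, tree_root, vni):
    """Return the next-hop RBridges of a multicast tree that support `vni`.

    fib: dict keyed by RBridge; each entry is a dict with "mc_tree"
         (next-hop RBridges) and "vnis" (the VNIs that RBridge supports).
    """
    entry = fib.get(tree_root)
    if entry is None:
        return []  # unknown tree root: nothing to forward
    # Forward only to next hops that advertise the frame's VNI;
    # for all others, forwarding stops here.
    return [rb for rb in entry["mc_tree"]
            if vni in fib.get(rb, {}).get("vnis", ())]
```

Dropping the frame at this point is what keeps one tenant's multicast traffic off links that lead only to RBridges serving other tenants.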


Page 29: Ocpeu14

Innovation and efficiency

• Solving SDN and network virtualization challenges requires new protocols
  – e.g. VXLAN, NVGRE, TRILL/VNT…
• Efficiency generally means hardware support
  …but hardware development cannot keep up with software, which slows down innovation
• Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation
  …but we need open ecosystems, standards and APIs

Page 30: Ocpeu14

Thank you for your attention!

Questions?

Ahmed Amamou, [email protected] Benoît Ganne, [email protected]