Scalable High-performance Userland Container Networking for NFV · 2017-12-14

Scalable High-performance Userland Container Networking for NFV

Jianfeng Tan, Cunming Liang, Huawei Xie, Zhihong Wang, Yuanhan Liu, Heqing Zhu

Agenda

• Background

• Userland container networking

• Impact on application development

NFV and challenges

Figure labels: Virtual machine, Container, Virtual Network Function

VM vs Container based VNFs

• Challenges of VM-based VNFs

– Provisioning time

– Runtime performance overhead

• Challenges of container-based VNFs

– High performance networking

• For VNFs, like LB, FW, IDS/IPS, DPI, VPN, pktgen, Proxy, AppFilter, etc

– Security

Challenges of high perf. network

• Forward 1~2 Mpps per core

NIC      Time budget for 64B    Time budget for 1518B
10Gb     67.2 ns                1,230 ns
40Gb     N/A                    307 ns
100Gb    N/A                    120 ns

Operation               Time cost
System call             75 ns / 42 ns
Atomic ops              8.25 ns
Spinlock lock/unlock    16+ ns
L3 miss*                ~53 ns

* Tested on Intel i7-3770 (Ivy Bridge) by www.7-cpu.com. Other data from LWN article.
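The per-packet time budgets follow from how long one frame occupies the wire. A minimal Python sketch, assuming the usual 20 bytes of per-frame Ethernet overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) on top of the frame size; the function names are illustrative and not from the slides:

```python
ETH_OVERHEAD_BYTES = 20  # assumption: preamble (7) + SFD (1) + inter-frame gap (12)


def time_budget_ns(link_gbps: float, frame_bytes: int) -> float:
    """Time one frame occupies the wire, i.e. the per-packet budget at line rate."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return bits_on_wire / link_gbps  # bits / (Gbit/s) gives nanoseconds


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Packets per second (in millions) needed to saturate the link."""
    return 1e3 / time_budget_ns(link_gbps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 40, 100):
        for size in (64, 1518):
            print(f"{gbps}Gb, {size}B: {time_budget_ns(gbps, size):.1f} ns")
```

Under this assumption the 10Gb and 40Gb rows are reproduced closely; the 100Gb/1518B case comes out near 123 ns, which the table appears to round to 120 ns.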

Container networking status quo

• Multi-host networking

Figure labels: Container, Network Virtualization Service, Linux Kernel (each shown twice), Agent. Legend: Data flow, Control flow

Userland Container Networking

SR-IOV in Virtualization Technologies

Figure labels: Virtual Appliance, VM kernel, C (x2), Host Kernel, VTEP, PF, VF (x3), Virtual Ethernet Bridge & Classifier

SR-IOV in Kernel

Figure labels: Container (x2), netns, ETHx, ETHy, Host Kernel, PF Driver, VF Driver (x2), PF, VF (x2), Virtual Ethernet Bridge & Classifier, VTEP

SR-IOV in Userland

Figure labels: Container (x2), Userland NIC driver, Host Kernel, PF Driver, PF, VF (x2), Virtual Ethernet Bridge & Classifier, VTEP

SR-IOV in Userland

• Pros:

– Line rate even with small packets

– Low latency

– HW-based QoS

• Cons:

– # of VFs is limited (64 or 128)

– Not flexible (needs a router or switch with VTEP support)

Figure labels: Container, Linux Kernel

Setup: SR-IOV in Userland

• Prepare VFs

$ echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/sriov_numvfs
$ ./tools/dpdk_nic_bind.py --status
…
0000:81:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=eth1 drv=ixgbe unused=
0000:81:10.0 '82599 Ethernet Controller Virtual Function' if=eth5 drv=ixgbevf unused=
…

• Bind to vfio driver

$ modprobe vfio-pci
$ ./tools/dpdk_nic_bind.py -b vfio-pci 0000:81:10.0

• Prepare hugetlbfs

$ mount -t hugetlbfs -o pagesize=2M,size=1024M none /mnt/huge_c0/

• Start container

$ docker run … -v /dev/vfio/vfio0:/dev/vfio/vfio0 -v /mnt/huge_c0/:/dev/hugepages/ …
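The "Prepare VFs" step writes a VF count into sysfs, and the number of VFs a device can expose is limited. A minimal Python sketch of that step with a bounds check against the kernel's `sriov_totalvfs` file; the helper name and the `sysfs_root` parameter are hypothetical (the latter exists only so the logic can be exercised without real hardware):

```python
from pathlib import Path


def request_vfs(pci_addr: str, num_vfs: int,
                sysfs_root: str = "/sys/bus/pci/devices") -> int:
    """Check num_vfs against sriov_totalvfs, then write it to sriov_numvfs.

    Returns the device's VF limit. Needs root on a real system.
    """
    dev = Path(sysfs_root) / pci_addr
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(
            f"{pci_addr} supports at most {total} VFs, requested {num_vfs}")
    # Note: on a real system the kernel may refuse to change an already
    # non-zero VF count directly; writing 0 first is the usual workaround.
    (dev / "sriov_numvfs").write_text(str(num_vfs))
    return total
```

For the device used above this would be called as `request_vfs("0000:81:00.0", 1)`, which has the same effect as the `echo 1 > …/sriov_numvfs` command.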

Container Networking Scenarios

Figure labels: C (x3), vSwitch, Host Kernel, VF, VF pass-through, Aggregation, Linux bridge, OVS bridge

Aggregation Scenario

Figure labels: C (x3), vSwitch, VF Driver, Host Kernel, VF, VF pass-through, Aggregation

VIRTIO for IPC

Why virtio for IPC?

• Performance

– Bypass kernel (shm-based)

– Smarter notification

– Cache friendly

• Make best use of what we already have in DPDK

VIRTIO as Unified Interface

Figure labels: Virtual Appliance, VM, C (x2), Host Kernel, vSwitch

VIRTIO in VM

Figure labels: VM, VIRTIO PMD or Kernel VIRTIO NET, Transport: PCI bus, Hypervisor, Device Emulation, MMIO or PMIO, Interrupt injection, vhost-user adapter, Socket /tmp/xx.socket, virtio, vhost, vSwitch, DPDK

VIRTIO in Container

Figure labels: Container/App, DPDK, ETHDEV, VIRTIO PMD, VIRTIO (PCI device), VIRTIO-USER (Virtual device), vhost-user adapter, Socket /tmp/xx.socket, virtio, vhost, vSwitch

Address translation

VIRTIO (VM)

GPA      BVA      LEN
GPA 0    BVA 0    Len 0
GPA 1    BVA 1    Len 1
…        …        …
GPA n    BVA n    Len n

GPA: Guest Physical Address; BVA: Backend Virtual Address

VIRTIO-USER (Container)

FVA      BVA      LEN
FVA 0    BVA 0    Len 0
FVA 1    BVA 1    Len 1
…        …        …
FVA n    BVA n    Len n

FVA: Frontend Virtual Address; BVA: Backend Virtual Address

Figure labels: Holes, Non-DPDK memory
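Both tables describe the same lookup: the backend finds the region that contains a frontend address (GPA for a VM, FVA for a container) and applies that region's offset. A minimal Python sketch of such a lookup, assuming a small list of non-overlapping regions; the `Region` type and function name are hypothetical and do not mirror the DPDK vhost code:

```python
from typing import NamedTuple, Sequence


class Region(NamedTuple):
    front_addr: int  # GPA (VM case) or FVA (container case)
    back_addr: int   # BVA: where the backend mapped the same memory
    length: int      # region size in bytes


def to_backend_va(regions: Sequence[Region], addr: int, size: int = 1) -> int:
    """Translate a frontend address range to a backend virtual address.

    Raises LookupError if the range falls into a hole between regions
    or crosses the end of a region.
    """
    for r in regions:
        if r.front_addr <= addr and addr + size <= r.front_addr + r.length:
            return r.back_addr + (addr - r.front_addr)
    raise LookupError(f"address {addr:#x} (+{size}) is not in any shared region")
```

A linear scan is enough for a sketch because the table is short; addresses that land in a hole are rejected rather than translated.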

Setup: VIRTIO in container

• Add a bridge and a vhost-user port in ovs-dpdk

$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
$ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser

• Prepare hugetlbfs

$ mount -t hugetlbfs -o pagesize=2M,size=1024M none /mnt/huge_c0/

• Run container

$ docker run … \
    -v /usr/local/var/run/openvswitch/vhost-user-1:/var/run/usvhost \
    -v /mnt/huge_c0/:/dev/hugepages/ \
    … -c 0x4 -n 4 --no-pci --vdev=virtio-user0,path=/var/run/usvhost \
    …

Performance Evaluation - throughput

Figure: throughput chart with a 10Gb line rate reference. Cases: Case I: Native Linux; Case II: SR-IOV; Case III: VIRTIO. Component labels: ixgbe kernel, pcap lib, ixgbe PMD, vSwitch, vhost PMD, virtio PMD, Container, Kernel

Performance Evaluation - latency

• For native Linux, ms level

• For the other two, us level

– Polling mode

– Batching

– SIMD
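Batching helps because any fixed per-burst cost is spread over all packets in the burst. A minimal Python sketch of that arithmetic; the 75 ns fixed cost and the 67.2 ns budget are taken from the tables earlier in the deck, while the 50 ns per-packet cost used in the example is a made-up illustrative figure:

```python
import math


def per_packet_cost_ns(fixed_ns: float, per_packet_ns: float, batch: int) -> float:
    """Amortized cost per packet when a fixed cost is paid once per batch."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return fixed_ns / batch + per_packet_ns


def min_batch_for_budget(fixed_ns: float, per_packet_ns: float,
                         budget_ns: float) -> int:
    """Smallest batch size whose amortized per-packet cost fits the budget."""
    if per_packet_ns >= budget_ns:
        raise ValueError("per-packet work alone already exceeds the budget")
    return math.ceil(fixed_ns / (budget_ns - per_packet_ns))


if __name__ == "__main__":
    # Hypothetical workload: 50 ns of real work per packet.
    print(min_batch_for_budget(fixed_ns=75, per_packet_ns=50, budget_ns=67.2))
```

The model ignores cache and latency effects of larger batches, so it only indicates the direction of the trade-off.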

Figure: latency chart. Cases: Case I: Native Linux; Case II: SR-IOV; Case III: VIRTIO. Component labels: ixgbe kernel, pcap lib, ixgbe PMD, vSwitch, vhost PMD, virtio PMD, Container, Kernel

Use case

Figure labels: ovs-vswitchd, ovsdb-server, Bridge, vhost (x3), virtio (x3), Container (x3), App, Virtual Appliance, VM, VF

Is everything OK to run DPDK in container?

DPDK efforts towards container

• Hugetlb initialization process

– sysfs is not containerized, and DPDK allocates all free pages

• Addressed (link in the original slides); avoids having to use -m or --socket-mem

• Cores initialization

– When/how to specify cores for DPDK?

• Addressed (link in the original slides); avoids having to use -c, -l or --lcores

• Reduce boot time

– Addressed (two links in the original slides)

Run DPDK in Container securely

• Dedicated hugetlbfs for each container

• Run without --privileged

– DPDK needs privilege to do VA2PA translation

• Use VA instead of PA (see here)

• DMA attack

• Use VA as the IOVA for IOMMU table
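The privilege requirement comes from how a process learns physical addresses on Linux: it reads its own `/proc/self/pagemap`, and recent kernels hide the page frame number (PFN) from readers without sufficient privilege. A minimal Python sketch of that lookup, assuming the documented 64-bit pagemap entry layout (bit 63 = page present, bits 0-54 = PFN); the function names are illustrative:

```python
import os
import sys

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1  # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63     # bit 63: page is present in RAM


def decode_pagemap_entry(entry: int):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    pfn = entry & PFN_MASK if present else 0
    return present, pfn


def virt_to_phys(vaddr: int, pid: str = "self"):
    """Translate a virtual address to a physical one, or None if unavailable."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * PAGEMAP_ENTRY_BYTES)
        entry = int.from_bytes(f.read(PAGEMAP_ENTRY_BYTES), sys.byteorder)
    present, pfn = decode_pagemap_entry(entry)
    if not present or pfn == 0:
        # Not resident, or the kernel zeroed the PFN for an unprivileged reader.
        return None
    return pfn * page_size + vaddr % page_size
```

When the IOMMU is programmed with virtual addresses as IOVAs, this lookup is not needed at all, which is what makes running without --privileged possible.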

Deterministic Environment

• Deterministic CPU env

– Boot-time: disable timer / task scheduler

• … default_hugepagesz=1G isolcpus=16-19 …

• Reduce scheduling-clock ticks: adaptive-tick mode

– Run-time: core-thread affinity

• cpuset tool: taskset / numactl

• cgroup.cpuset: cset / docker run … --cpuset-cpus …

– BIOS setting: if necessary, disable Hyper-Threading
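Run-time affinity can also be set from inside the application. A minimal Python sketch (Linux only) that parses a CPU list in the same `16-19` syntax used by `isolcpus` and `--cpuset-cpus` and pins the calling thread to it; the function names are illustrative, and the CPU numbers depend entirely on the host:

```python
import os
from typing import List, Set


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a CPU list such as '16-19' or '0,2-3' into sorted CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        start = int(low)
        end = int(high) if high else start
        if end < start:
            raise ValueError(f"bad CPU range: {part!r}")
        cpus.update(range(start, end + 1))
    return sorted(cpus)


def pin_current_thread(spec: str) -> Set[int]:
    """Pin the calling thread to the requested CPUs that it is allowed to use."""
    wanted = set(parse_cpu_list(spec)) & os.sched_getaffinity(0)
    if not wanted:
        raise RuntimeError(f"none of the CPUs in {spec!r} are available here")
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)
```

Intersecting with the current affinity mask keeps the call valid inside a container whose cpuset is already restricted.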

Deterministic Environment

• Deterministic cache env

– Data Direct I/O (DDIO) technology

– Cache Allocation Technology (CAT)

• An example from here

Noisy Neighbor Scenario*

CAT            Throughput (Mpps)    LLC Occupancy (MB)
Not Present    9.8                  4.5
Present        15                   13.75

$ pqos -e "llc:2=0x00003"
$ pqos -a "llc:2=8,9,10"

* DPDK IP Pipeline Application (Packet size = 64 Bytes, Flows = 16 Million)
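In the first pqos command, class of service 2 is given the cache-way bitmask 0x00003; the second command associates cores 8, 9 and 10 with that class. A minimal Python sketch for inspecting such a mask, assuming Intel CAT's requirement that the set bits be contiguous; the helper names are hypothetical and this is not part of pqos:

```python
from typing import Tuple


def parse_llc_class(spec: str) -> Tuple[int, int]:
    """Split a definition such as 'llc:2=0x00003' into (class id, way mask)."""
    resource, _, rest = spec.partition(":")
    if resource != "llc":
        raise ValueError(f"unsupported resource: {resource!r}")
    class_id, _, mask = rest.partition("=")
    return int(class_id), int(mask, 16)


def cache_ways(mask: int) -> int:
    """Number of cache ways the mask grants."""
    return bin(mask).count("1")


def is_contiguous(mask: int) -> bool:
    """True if the set bits form one unbroken run."""
    if mask <= 0:
        return False
    shifted = mask >> ((mask & -mask).bit_length() - 1)  # drop trailing zeros
    return shifted & (shifted + 1) == 0
```

For the mask above, `cache_ways(0x3)` reports two ways; how many megabytes that represents depends on the way size of the particular CPU.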

Impact on application development

How to Develop Apps?

• Type I: DPI, FW, LB, vSwitch/vRouter, …

• Type II: Applications in need of TCP/UDP stack

• From scratch: mTCP, LwIP, TLDK

• Ported: Libuinet, NUSE (libos), Linux Kernel Library

Figure labels: Type I App, Type II App, TCP/UDP, ICMP, ARP, DHCP, Interfaces, Driver

Transform Middleboxes with VNFs

Figure labels: Gateway, Middleboxes (x3), VNF (x4), Container Host (x3), vSwitch (x2), C (x2), Logic Network 0, Logic Network 1, Userland container networking

Full Stack in Container

• Full stack

– Hardware resource

– Driver (network)

– Under layer network facilities (optional)

– Dependencies and application itself

• Benefits

– Better isolation

– Convenient for live migration

Projects shown: Snort-DPDK, TRex, Vortex, Contrail, ScyllaDB, 6WINDGate

Future work

• Interrupt mode of virtio (to scale)

• Long path to handle VF interrupts in userland (low latency)

• Integrate with popular orchestrators

Summary

• Use DPDK to accelerate container networking

– Userland SR-IOV

– Userland virtio-user

• Compared to traditional ways, it provides

– High throughput

– Low latency

– Deterministic networking

Q & A

Backup: How DPDK improves net?

• CPU affinity

• Hugepages

• UIO

• Polling

• Lockless

• Batching

• SSE/AVX

• High-throughput

• Low-latency

• Deterministic