DevConf 2014: Kernel Networking Walkthrough

Kernel Networking Walkthrough. Thomas Graf – Principal Software Engineer, Networking Services, Red Hat. Feb 7, 2014

Upload: thomas-graf

Post on 10-May-2015

DESCRIPTION

This presentation is a walk-through of the Linux kernel networking stack, covering the essentials and recent developments a developer needs to know. Our starting point is the network card driver as it feeds a packet into the stack. We will follow the packet as it traverses various subsystems such as packet filtering, routing, protocol stacks, and the socket layer. We will pause here and there to look into concepts such as segmentation offloading, TCP small queues, and low-latency polling. We will cover APIs exposed by the kernel that go beyond the use of write()/read() on sockets and will look into how they are implemented on the kernel side.

TRANSCRIPT

Page 1: DevConf 2014   Kernel Networking Walkthrough

Kernel Networking Walkthrough1

Kernel Networking Walkthrough

Thomas Graf – Principal Software Engineer, Networking Services, Red Hat, Feb 7, 2014

Page 2: DevConf 2014   Kernel Networking Walkthrough

Agenda

● How does a packet get in and out of the net stack?
  ● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO
● How does a packet get through the net stack?
  ● RX Handler, IP Processing, TCP Processing, TCP Fast Open
● How to account for memory and do flow control?
  ● Socket Buffers, Flow Control, TCP Small Queues

● Q&A

Page 3: DevConf 2014   Kernel Networking Walkthrough

Touring the Network Stack

[Two images, captioned "Expectation" and "Reality"]

Page 4: DevConf 2014   Kernel Networking Walkthrough

How does a packet get in and out of the Network Stack?

Page 5: DevConf 2014   Kernel Networking Walkthrough

Receive & Transmit Process

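[Diagram with three columns: NIC | Network Stack (Kernel Space) | Process (User Space)]

● Receive: Ring Buffer (DMA) → Parse IP → Local? → Parse TCP/UDP → Socket Buffer → Task: read()
● Forward: Local? (no) → Forward → Device? → Ring Buffer
● Transmit: Task: write() → Socket Buffer → Construct TCP/UDP → Construct IP → Device? → Ring Buffer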

Page 6: DevConf 2014   Kernel Networking Walkthrough

The 3 ways into the Network Stack

● Interrupt Driven: Ring Buffer → Network Stack
● NAPI based Polling: the Network Stack calls poll() on the Ring Buffer
● Busy Polling: the Task calls busy_poll() on the Ring Buffer
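
The polling path can be sketched as a small simulation. This is not kernel code: `napi_poll`, `drain`, and the deque standing in for the ring buffer are hypothetical names chosen for illustration, and the budget of 64 is an assumed value. The sketch shows the general NAPI idea: each poll handles at most a fixed budget of packets, and polling stops (so that interrupts could be re-enabled) only once a poll finds the ring drained.

```python
from collections import deque

def napi_poll(ring, budget, deliver):
    """Process up to `budget` packets from `ring`; return the work done."""
    work_done = 0
    while ring and work_done < budget:
        deliver(ring.popleft())   # hand the packet to the network stack
        work_done += 1
    return work_done

def drain(ring, budget=64):
    """Poll repeatedly until one poll uses less than its budget."""
    delivered, polls = [], 0
    while True:
        polls += 1
        if napi_poll(ring, budget, delivered.append) < budget:
            break  # ring drained; a real driver would re-enable its IRQ here
    return delivered, polls

# 150 queued packets with a budget of 64 take three polls: 64 + 64 + 22.
packets, polls = drain(deque(range(150)), budget=64)
```

Bounding the work per poll is what keeps one busy device from monopolizing a CPU in this model.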

Page 7: DevConf 2014   Kernel Networking Walkthrough

RSS – Receive Side Scaling

● NIC distributes packets across multiple RX queues allowing for parallel processing.

● Each RX queue has its own IRQ, so the queue a packet lands in also selects the CPU that runs the hardware interrupt handler.

[Diagram: a filter on the NIC distributes packets to RX-queue-1, RX-queue-2, RX-queue-3, and RX-queue-4, which are served by CPU 1, CPU 3, CPU 1, and CPU 5.]
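
The queue selection can be illustrated with a toy model. Everything here is an assumption for illustration: `rss_queue` and the 8-entry table are hypothetical, and CRC32 merely stands in for whatever hash the NIC implements. The point of the sketch is that all packets of one flow hash to the same RX queue, which keeps a flow on one CPU.

```python
import zlib

def rss_queue(src_ip, dst_ip, src_port, dst_port, indirection_table):
    """Map a flow 4-tuple to an RX queue via a hash and an indirection table."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    flow_hash = zlib.crc32(key)   # stand-in for the NIC's hash function
    return indirection_table[flow_hash % len(indirection_table)]

# Hypothetical indirection table spreading flows over four RX queues (0..3).
table = [0, 1, 2, 3, 0, 1, 2, 3]

first = rss_queue("10.0.0.1", "10.0.0.2", 40000, 80, table)
again = rss_queue("10.0.0.1", "10.0.0.2", 40000, 80, table)
```

Rewriting entries of the indirection table changes the spread of flows over queues without touching the hash itself.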

Page 8: DevConf 2014   Kernel Networking Walkthrough

RPS – Receive Packet Steering

● Software filter to select CPU # for processing

● Use it to ...
  ● ... redo queue - CPU mapping
  ● ... distribute single queue to multiple CPUs

[Diagram: RX-queue-1 through RX-queue-4 mapped in software onto CPU 1, CPU 2, and CPU 3.]
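
On Linux, RPS is configured per RX queue by writing a hexadecimal CPU bitmask to the queue's `rps_cpus` file in sysfs. The helper below is a minimal sketch that only computes the mask and the path; the function names and the device name `eth0` are hypothetical, it assumes at most 32 CPUs, and it deliberately does not write to sysfs. Note that the kernel numbers CPUs from 0.

```python
def rps_cpu_mask(cpus):
    """Return the hexadecimal bitmask string for the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu          # bit N set means CPU N may process packets
    return format(mask, "x")

def rps_sysfs_path(dev, rx_queue):
    """Return the sysfs file that holds the RPS CPU mask of one RX queue."""
    return f"/sys/class/net/{dev}/queues/rx-{rx_queue}/rps_cpus"

# Distribute the single queue rx-0 of a hypothetical eth0 to CPUs 1, 2 and 3.
mask = rps_cpu_mask([1, 2, 3])     # binary 1110
path = rps_sysfs_path("eth0", 0)
```

Writing the mask string to that path as root would apply the setting; a mask of 0 leaves RPS disabled for the queue.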

Page 9: DevConf 2014   Kernel Networking Walkthrough

Hardware Offload

● RX/TX Checksumming
  ● Perform CPU intensive checksumming in hardware.
● Virtual LAN filtering and tag stripping
  ● Strip 802.1Q header and store VLAN ID in network packet meta data.
  ● Filter out unsubscribed VLANs.
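
To give a feel for the work that checksum offload moves into hardware, here is a minimal software sketch of the Internet checksum as described in RFC 1071, which IP, TCP, and UDP use. The function name and the sample bytes are chosen for illustration; real stacks use heavily optimized routines rather than a loop like this.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                     # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)

# A receiver verifies by summing data plus checksum; the result must be 0.
verified = internet_checksum(payload + csum.to_bytes(2, "big")) == 0
```

Every byte of every packet passes through this sum, which is why doing it on the NIC saves CPU time.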

Page 10: DevConf 2014   Kernel Networking Walkthrough

Generic Receive Offload

[Diagram, NAPI based GRO: as poll() pulls MTU-sized packets from the Ring Buffer, GRO merges them into packets of up to 64K before they enter the Network Stack.]
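
The merging step can be sketched as follows. This is a strong simplification under stated assumptions: `gro_merge` is a hypothetical helper, packets are modelled as (flow, payload) pairs, and only back-to-back packets of the same flow are merged, whereas real GRO also compares headers before merging.

```python
def gro_merge(packets, max_size=65535):
    """Merge consecutive (flow, payload) packets that belong to the same flow."""
    merged = []
    for flow, payload in packets:
        if (merged and merged[-1][0] == flow
                and len(merged[-1][1]) + len(payload) <= max_size):
            merged[-1] = (flow, merged[-1][1] + payload)   # grow last packet
        else:
            merged.append((flow, payload))                 # start a new one
    return merged

# Three MTU-sized packets of flow A, one of flow B, then one more of flow A.
rx = [("A", b"x" * 1448)] * 3 + [("B", b"y" * 1448), ("A", b"x" * 1448)]
sizes = [(flow, len(payload)) for flow, payload in gro_merge(rx)]
```

The stack then runs its per-packet processing three times instead of five in this example.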

Page 11: DevConf 2014   Kernel Networking Walkthrough

Segmentation Offload

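[Diagram: the Network Stack produces packets of up to 64K. With Generic Segmentation Offload (GSO) they are split into MTU-sized packets in software before they reach the Ring Buffer; with TCP Segmentation Offload (TSO) they reach the Ring Buffer unsplit and the NIC splits them into MTU-sized packets.]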
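
The split itself is simple, whether it happens in software (GSO) or on the NIC (TSO). The sketch below assumes a payload-only view with a hypothetical MSS of 1448 bytes; header replication and sequence number updates, which real segmentation must also do, are left out.

```python
def segment(payload: bytes, mss: int):
    """Split one large payload into MSS-sized segments; the last may be shorter."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]

big = b"z" * 65536                 # one 64K packet handed down by the stack
segments = segment(big, 1448)      # 45 full segments plus a 376-byte remainder
```

The benefit is that the stack above the split point processes one packet instead of 46.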

Page 12: DevConf 2014   Kernel Networking Walkthrough

How does a packet get through the Network Stack?

(c) Karen Sagovac

Page 13: DevConf 2014   Kernel Networking Walkthrough

Packet Processing

Feast of the hungry chicks

Link Layer → Ingress QoS → RX Handler → Proto Handler

● Packet Socket (ETH_P_ALL): tcpdump
● RX Handler: Open vSwitch, Team, Bonding, Bridge, macvlan, macvtap
● Proto Handler: IPv4, IPv6, ARP, IPX, ... (otherwise: Drop)
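
The protocol handler step amounts to a lookup keyed by the EtherType of the frame. The sketch below mimics that with a dictionary; `PROTO_HANDLERS` and `dispatch` are hypothetical names, and the handlers are just labels, whereas the kernel registers handler functions per protocol.

```python
import struct

ETH_P_IP, ETH_P_ARP, ETH_P_IPX, ETH_P_IPV6 = 0x0800, 0x0806, 0x8137, 0x86DD

PROTO_HANDLERS = {
    ETH_P_IP: "IPv4",
    ETH_P_ARP: "ARP",
    ETH_P_IPX: "IPX",
    ETH_P_IPV6: "IPv6",
}

def dispatch(frame: bytes) -> str:
    """Pick a protocol handler from the EtherType, or drop the frame."""
    if len(frame) < 14:                       # shorter than an Ethernet header
        return "Drop"
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return PROTO_HANDLERS.get(ethertype, "Drop")

# A frame with zeroed MAC addresses and the IPv6 EtherType.
frame = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IPV6) + b"payload"
handler = dispatch(frame)
```

A packet socket bound to ETH_P_ALL, as used by tcpdump, sees frames regardless of this lookup.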

Page 14: DevConf 2014   Kernel Networking Walkthrough

IP Processing

● Receive: IP Handler → PREROUTING → Route Lookup
  ● Local Delivery → INPUT → L4 (TCP, ...) → User Space
  ● Forwarding → FORWARD → POSTROUTING → Link Layer
● Transmit: IPv4 Construction → Route Lookup → Local Output → OUTPUT → POSTROUTING → Link Layer
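
The three paths through the netfilter hooks can be written down as a tiny lookup. `netfilter_hooks` is a hypothetical helper for illustration, not a kernel or iptables API; for locally generated packets it covers the transmit side only.

```python
def netfilter_hooks(origin: str, destination: str):
    """Return the hooks a packet passes, in order.

    origin:      "network" (received from a device) or "local" (sent by a task)
    destination: "local" (this host) or "remote" (another host)
    """
    if origin == "local":
        return ["OUTPUT", "POSTROUTING"]
    if destination == "local":
        return ["PREROUTING", "INPUT"]
    return ["PREROUTING", "FORWARD", "POSTROUTING"]

delivered = netfilter_hooks("network", "local")
forwarded = netfilter_hooks("network", "remote")
sent = netfilter_hooks("local", "remote")
```

This is why a rule meant to affect forwarded traffic has no effect when placed in INPUT or OUTPUT.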

Page 15: DevConf 2014   Kernel Networking Walkthrough

TCP Processing

IP → Parse TCP → Lookup Socket → Socket Filter → Receive TCP → Receive Socket Buffer → Task: poll() / read()

● Backlog (socket locked)
● Prequeue (task exists)
● process context ← softirq
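
The choice between the three queues can be summarized as a small decision function. This is a sketch of the logic shown on the slide, with a hypothetical function name and boolean inputs; the actual kernel checks involve more conditions.

```python
def select_queue(socket_locked: bool, task_waiting: bool) -> str:
    """Decide where softirq context puts an incoming TCP segment."""
    if socket_locked:
        return "backlog"    # a task owns the socket; it processes this later
    if task_waiting:
        return "prequeue"   # defer TCP processing to the waiting task
    return "receive"        # process now, then queue on the receive buffer

choices = [
    select_queue(socket_locked=True, task_waiting=True),
    select_queue(socket_locked=False, task_waiting=True),
    select_queue(socket_locked=False, task_waiting=False),
]
```

In the backlog and prequeue cases the remaining work runs in process context rather than in softirq context.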

Page 16: DevConf 2014   Kernel Networking Walkthrough

TCP Fast Open (net.ipv4.tcp_fastopen)

Regular (Client ↔ Server)

● 1st Req: SYN → SYN+ACK → ACK+HTTP GET → Data (2x RTT)
● 2nd Req: SYN → SYN+ACK → ACK+HTTP GET → Data (2x RTT)

Fast Open (Client ↔ Server)

● 1st Req: SYN → SYN+ACK+Cookie → ACK+HTTP GET → Data (2x RTT)
● 2nd Req: SYN+Cookie+HTTP GET → SYN+ACK+Data (1x RTT)
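
The client-side behaviour in the Fast Open exchange can be modelled with a cookie cache. This is a toy model, not the socket API: `FastOpenClient`, `request`, and the `issue_cookie` callback are hypothetical, and the returned numbers are simply the round trips shown on the slide.

```python
class FastOpenClient:
    """Toy model of a client that caches Fast Open cookies per server."""

    def __init__(self):
        self.cookies = {}   # server name -> cookie received earlier

    def request(self, server, issue_cookie):
        """Return the number of RTTs until response data arrives."""
        if server in self.cookies:
            return 1        # SYN+Cookie+HTTP GET, answered by SYN+ACK+Data
        # The first contact is a regular handshake that also yields a cookie.
        self.cookies[server] = issue_cookie(server)
        return 2            # SYN, SYN+ACK+Cookie, ACK+HTTP GET, Data

client = FastOpenClient()
first = client.request("example-server", lambda name: b"cookie")
second = client.request("example-server", lambda name: b"cookie")
```

The saving only applies to repeat connections to the same server, since the cookie has to be obtained first.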

Page 17: DevConf 2014   Kernel Networking Walkthrough

Memory Accounting & Flow Control

Page 18: DevConf 2014   Kernel Networking Walkthrough

Socket Buffers & Flow Control (net.ipv4.tcp_{r|w}mem)

● Transmit: ssh → write() → wmem over limit? (yes: Block or EWOULDBLOCK) → wmem += packet-size → Socket Buffer → TCP/IP → TX Ring Buffer → wmem -= packet-size
● Receive: RX Ring Buffer → TCP/IP → rmem over limit? (yes: Reduce TCP Window) → rmem += packet-size → Socket Buffer → ssh → rmem -= packet-size
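
The transmit-side accounting can be sketched as a counter with a limit. `SendBuffer` is a hypothetical class for illustration and models a non-blocking socket only; the limit of 3000 bytes is an arbitrary value, not a kernel default.

```python
import errno

class SendBuffer:
    """Toy model of write memory (wmem) accounting for one socket."""

    def __init__(self, limit):
        self.limit = limit
        self.wmem = 0

    def write(self, size):
        if self.wmem >= self.limit:          # wmem over limit?
            raise BlockingIOError(errno.EWOULDBLOCK, "send buffer full")
        self.wmem += size                    # wmem += packet-size

    def tx_complete(self, size):
        self.wmem -= size                    # wmem -= packet-size

buf = SendBuffer(limit=3000)
buf.write(1500)
buf.write(1500)
try:
    buf.write(1500)
    blocked = False
except BlockingIOError:
    blocked = True                           # the writer must wait or retry
buf.tx_complete(1500)                        # a packet left the TX ring buffer
buf.write(1500)                              # now there is room again
```

The receive side works the same way with rmem, except that hitting the limit shrinks the advertised TCP window instead of blocking a writer.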

Page 19: DevConf 2014   Kernel Networking Walkthrough

TCP Small Queues (net.ipv4.tcp_limit_output_bytes)

● ssh → write() → Socket Buffer
● torrent → write() → Socket Buffer
● Both Socket Buffers feed: TCP/IP → Queuing Discipline → Driver → TX Ring Buffer
● TSQ: max 128 KB in flight per socket
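
The effect of the per-socket limit can be sketched as follows. `tsq_push` is a hypothetical helper, packets are represented only by their sizes, and the rule "stop once the limit is reached" is a simplification of the real check.

```python
TSQ_LIMIT = 128 * 1024   # bytes in flight per socket, as stated on the slide

def tsq_push(socket_buffer, in_flight, limit=TSQ_LIMIT):
    """Move packets from a socket buffer to the lower layers up to the limit.

    Returns the packets still waiting in the socket buffer and the number of
    bytes now in flight below the socket (qdisc, driver, TX ring buffer).
    """
    waiting = list(socket_buffer)
    while waiting and in_flight < limit:
        in_flight += waiting.pop(0)
    return waiting, in_flight

# A bulk sender with ten 64 KB packets may only push two of them down ...
torrent_waiting, torrent_in_flight = tsq_push([64 * 1024] * 10, in_flight=0)

# ... so a small interactive packet does not queue behind all ten.
ssh_waiting, ssh_in_flight = tsq_push([100], in_flight=0)
```

The rest of the bulk data stays in its own socket buffer until transmitted packets are freed.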

Page 20: DevConf 2014   Kernel Networking Walkthrough

Q&A

Feedback Page

● http://devconf.cz/f/1

Coming Up Next:

NetworkManager for Enterprise

Dan Williams