©2017 Open-NFP

Accelerating Networked Applications with Flexible Packet Processing

Antoine Kaufmann, Naveen Kr. Sharma, Thomas Anderson, Arvind Krishnamurthy

Timothy Stamler, Simon Peter

University of Washington, The University of Texas at Austin


Networks are becoming faster

[Figure: Ethernet bandwidth in bits/s (log scale, 100M to 1T) versus year of standard release (1990 to 2020), with points for 100MbE, 1GbE, 10GbE, 40GbE, 100GbE, and 400GbE.]

5 ns inter-arrival time for 64B packets at 100 Gbps
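The 5 ns figure can be checked with a few lines of arithmetic. The sketch below assumes back-to-back packets and ignores the Ethernet preamble and inter-frame gap; the function name is only illustrative.

```python
def inter_arrival_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time between back-to-back packets on a saturated link.

    Assumption: framing overhead (preamble, inter-frame gap) is ignored.
    """
    bits = packet_bytes * 8
    # bits / (Gbit/s) yields nanoseconds directly.
    return bits / link_gbps


if __name__ == "__main__":
    for gbps in (1, 10, 40, 100, 400):
        print(f"{gbps:>4} Gb/s: {inter_arrival_ns(64, gbps):8.2f} ns per 64B packet")
```

At 100 Gb/s this gives 5.12 ns per 64B packet, which the slide rounds to 5 ns.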


...but software packet processing is slow

Recv+send TCP stack processing time (2.2 GHz)
▪ Linux: 3.5µs
▪ Kernel bypass: ~1µs

Single core performance has stalled
Parallelize? Assuming 1µs per packet at 100Gb/s, ignoring Amdahl's law:
▪ 64B packets => 200 cores
▪ 1KB packets => 14 cores

Many cloud apps dominated by packet processing
▪ Key-value storage, real-time analytics, intrusion detection, file service, ...
▪ All rely on small messages: latency & throughput equally important
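The core-count estimate on this slide follows from multiplying the packet rate by the per-packet processing time. The sketch below is a back-of-the-envelope model under the slide's stated assumptions (1µs per packet, 100Gb/s, perfect parallel scaling); it ignores framing overhead, and the function name is illustrative. It yields roughly 195 cores for 64B packets and roughly 12 for 1KB packets, so the slide's rounded figures (200 and 14) presumably rest on slightly different assumptions that the slides do not spell out.

```python
def cores_needed(packet_bytes: int, link_gbps: float = 100.0,
                 per_packet_us: float = 1.0) -> float:
    """Cores required to keep up with line rate, assuming perfect scaling."""
    packets_per_second = link_gbps * 1e9 / (packet_bytes * 8)
    # Each core handles 1 / per_packet_us million packets per second.
    return packets_per_second * per_packet_us * 1e-6


if __name__ == "__main__":
    for size in (64, 1024):
        print(f"{size:>5}B packets: {cores_needed(size):6.1f} cores")
```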


What are the alternatives?

RDMA
▪ Bypasses server software entirely
▪ Not well matched to client/server processing (security, two-sided for RPC)

Full application offload to NIC (FPGA, etc.)
▪ Application now at slower hardware-development speed
▪ Difficult to change once deployed

Fixed-function offloads (segmentation, checksums, RSS)
▪ Good start!
▪ Too rigid for today’s complex server & network architecture (next slide)

Flexible function offload to NIC (NFP, FlexNIC, etc.)
▪ Break down functions (e.g., RSS) and provide API for software flexibility


Fixed-function offloads are not well integrated

Wasted CPU cycles
▪ Packet parsing and validation repeated in software
▪ Packet formatted for network, not software access
▪ Multiplexing, filtering repeated in software

Poor cache locality, extra synchronization
▪ NIC steers packets to cores by connection
▪ Application locality may not match connection


A more flexible NIC can help

With multi-core, NIC needs to pick destination core
▪ The “right” core is application specific

NIC is perfectly situated: it sees all traffic
▪ Can scalably preprocess packets according to software needs
▪ Can scalably forward packets among host CPUs and network

With kernel-bypass, only NIC can enforce OS policy
▪ Need flexible NIC mechanisms, or go back into kernel


Talk Outline

• Motivation
• FlexNIC model
• Experience with Agilio-CX as prototyping platform
• Accelerating packet-oriented networking (UDP, DCCP)
• Key-value store
• Real-time analytics
• Network Intrusion Detection
• WiP: Accelerating stream-oriented networking (TCP)


FLEXNIC MODEL


FlexNIC: A Model for Integrated NIC/SW Processing [ASPLOS’16]

• Implementable at Tbps line rate & low cost

Match+action pipeline:

[Figure: match+action pipeline. Labels: Packet, Parser, Extracted Header Fields, M+A Stage 1, M+A 2, ..., Match Table, Action ALU, Modified Fields.]


Match+Action Programs

Match: IF udp.port == kvs
Action: core = HASH(kvs.key) % ncores
        DMA hash, kvs TO Cores[core]

Supports:
▪ Steer packet
▪ Calculate hash/Xsum
▪ Initiate DMA operations
▪ Trigger reply packet
▪ Modify packets

Does not support:
▪ Loops
▪ Complex calculations
▪ Keeping large state
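To make the match+action structure concrete, here is a minimal software model of the program on this slide. It is a sketch, not the FlexNIC or P4 implementation: the port number, the use of CRC32 as the hash, the `Packet` type, and all function names are assumptions made for illustration. Note that the pipeline is a fixed, loop-free sequence of (match, action) stages, in keeping with the restrictions listed above.

```python
import zlib
from dataclasses import dataclass, field

KVS_PORT = 11211   # hypothetical UDP port for the key-value store
NCORES = 4         # hypothetical number of host cores


@dataclass
class Packet:
    udp_port: int
    payload: bytes                      # assumed to carry the key for KVS packets
    meta: dict = field(default_factory=dict)


def parse(pkt: Packet) -> dict:
    """Parser stage: extract the header fields later stages match on."""
    return {"udp.port": pkt.udp_port, "kvs.key": pkt.payload}


def match_kvs(fields: dict) -> bool:
    return fields["udp.port"] == KVS_PORT


def action_steer(fields: dict, pkt: Packet) -> None:
    """Compute the hash once and pick the destination core from it."""
    h = zlib.crc32(fields["kvs.key"])
    pkt.meta["hash"] = h
    pkt.meta["core"] = h % NCORES


# One (match, action) pair per stage; stages run once each, in order.
PIPELINE = [(match_kvs, action_steer)]


def run_pipeline(pkt: Packet) -> Packet:
    fields = parse(pkt)
    for match, action in PIPELINE:
        if match(fields):
            action(fields, pkt)
    return pkt
```

A packet on another port passes through unmodified, while a KVS packet leaves the pipeline annotated with its hash and destination core, which a DMA stage could then use.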


FlexNIC: M+A for NICs

Efficient application level processing in the NIC
▪ Improve locality by steering to cores based on app criteria
▪ Transform packets for efficient processing in SW
▪ DMA directly into and out of application data structures
▪ Send acknowledgements on NIC

[Figure: FlexNIC block diagram. Labels: Ingress Pipeline, Egress Pipeline, DMA Pipeline, Queues.]


Netronome Agilio-CX

We use Agilio-CX to prototype FlexNIC
• Implement M&A programs in P4
• Run on NIC

Our experience with Agilio-CX:
▪ Improve locality by steering to cores based on app criteria
▪ Transform packets for efficient processing in SW
▪ DMA directly into and out of application data structures
▪ Send acknowledgements on NIC



ACCELERATING PACKET-ORIENTED NETWORKING


Example: Key-Value Store

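[Figure: Client 1 (K=3,4), Client 2 (K=4,7), and Client 3 (K=7,8) send requests through the NIC to Core 1 and Core 2, which share one hash table. Keys 4 and 7 are marked at both cores.]

Receive-side scaling: core = hash(connection) % N

• Lock contention
• Poor cache utilization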


Key-based Steering

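[Figure: Client 1 (K=3,4), Client 2 (K=4,7), and Client 3 (K=7,8) send requests through the NIC to Core 1 and Core 2. The hash table holding keys 3, 4, 7, and 8 is split between the two cores.]

Match: IF udp.port == kvs
Action: core = HASH(kvs.key) % N
        DMA hash, kvs TO Cores[core]

• No locks needed
• Higher cache utilization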
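The difference between receive-side scaling and key-based steering can be reproduced with the three clients from the figures. This is a toy model: the steering functions (`client % N` standing in for a connection hash, `key % N` standing in for a key hash) are illustrative assumptions, so the resulting partition differs from the one drawn on the slide, but the property of interest is the same.

```python
from collections import defaultdict

N = 2  # number of cores, as in the figures

# Client id -> keys requested, taken from the slides.
REQUESTS = {1: [3, 4], 2: [4, 7], 3: [7, 8]}


def keys_per_core(steer):
    """Return {core: set of keys that core ends up touching}."""
    cores = defaultdict(set)
    for client, keys in REQUESTS.items():
        for key in keys:
            cores[steer(client, key)].add(key)
    return cores


def shared_keys(cores):
    """Keys touched by more than one core; these need locks and bounce between caches."""
    counts = defaultdict(int)
    for keys in cores.values():
        for key in keys:
            counts[key] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# RSS: the core depends only on the connection (here: the client id).
rss = keys_per_core(lambda client, key: client % N)
# Key-based steering: the core depends only on the key.
key_based = keys_per_core(lambda client, key: key % N)

if __name__ == "__main__":
    print("RSS shared keys:      ", shared_keys(rss))
    print("Key-based shared keys:", shared_keys(key_based))
```

Under the connection-based rule, keys 4 and 7 are touched by both cores, matching the contention shown on the previous slide; under the key-based rule no key is shared, which is why no locks are needed.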


Custom DMA

DMA to application-level data structures
Requires packet validation and transformation

[Figure: an item log (Item 1, Item 2) and an event queue holding GET (G) and SET (S) entries.]

Event queue entries:
▪ GET, ClientID, Hash, Key
▪ SET, ClientID, ItemPointer
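A software emulation of this receive path might look like the sketch below. The two event layouts follow the slide (GET carries client ID, hash, and key; SET carries client ID and an item pointer), but everything else is an assumption: the class and field names are hypothetical, CRC32 stands in for the hash, a list index stands in for the item pointer, and the validation rule is deliberately simplistic.

```python
import zlib
from collections import deque
from dataclasses import dataclass


@dataclass
class GetEvent:
    client_id: int
    key_hash: int
    key: bytes


@dataclass
class SetEvent:
    client_id: int
    item_index: int   # stand-in for the slide's "ItemPointer"


class KvsReceivePath:
    """Emulates what the NIC would write into application data structures."""

    def __init__(self):
        self.item_log = []          # append-only log of (key, value) items
        self.event_queue = deque()  # events consumed by the application core

    def receive(self, op: str, client_id: int, key: bytes, value: bytes = b"") -> bool:
        # Validation: drop anything that is not a well-formed GET or SET.
        if op not in ("GET", "SET") or not key:
            return False
        if op == "GET":
            # Transformation: precompute the hash so software need not.
            self.event_queue.append(GetEvent(client_id, zlib.crc32(key), key))
        else:
            # The item goes straight into the log; the event only points at it.
            self.item_log.append((key, value))
            self.event_queue.append(SetEvent(client_id, len(self.item_log) - 1))
        return True
```

The point of the design is that the application core dequeues ready-to-use events and never parses or copies the original packet.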


Evaluation of the Model

• Measure impact on application performance
• Key-based steering: Use NIC
• Custom DMA: Software emulation of M&A pipeline
• Workload: 100k 32B keys, 64B values, 90% GET
• 6 Core Sandy Bridge Xeon 2.2GHz, 2x10G links


Key-based steering

• Better scalability
▪ PCIe is bottleneck for 4+ cores
• 45% higher throughput
• Processing time reduced to 310ns

[Figure: throughput in m op/s (0 to 8) versus number of CPU cores (1 to 5) for FlexKVS/RSS, FlexKVS/Key, FlexKVS/Linux, and Memcached.]

Custom DMA reduces time to 200ns


Real-time Analytics System

(De-)Multiplexing threads are performance bottleneck
• 2 CPUs required for 10 Gb/s => 20 CPUs for 100 Gb/s

[Figure: analytics pipeline spanning NIC and software. Labels: Rx Queue, Tx Queue, Demux, ACKs, Mux, Count (x2), Rank (x2).]


Real-time Analytics System

Offload (de)multiplexing and ACK generation to FlexNIC
• No CPUs needed => Energy-efficiency

[Figure: analytics pipeline spanning NIC and software. Labels: Rx Queue, Tx Queue, Demux, ACKs, Mux, Count (x2), Rank (x2).]


Performance Evaluation

[Figure: throughput in m tuples/s (0 to 6) for the Balanced and Grouped configurations, comparing Apache Storm, FlexStorm/Linux, FlexStorm/Bypass, and FlexStorm/FlexNIC. Bar annotations: .5x, 1x, 2x and .3x, 1x, 2.5x.]

• Cluster of 3 machines
• Determine Top-n Twitter posters (real trace)
• Measure attainable throughput


Network Intrusion Detection

Snort sniffs packets and analyzes them
• Parallelized by running multiple instances
• Status quo: Receive-side scaling

FlexNIC:
• Analyze rules loaded into Snort
• Partition rules among cores to maximize caching
• Fine-grained steering to cores

Result: 1.6x higher throughput, 30% fewer cache misses
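The slides do not say how rules are partitioned, so the following is only one plausible sketch of the idea: group rules by the destination port they inspect, give each group to a single core, and steer packets by the same port so that each core touches only its own rules. The rule set, the grouping criterion, and all names are hypothetical.

```python
NCORES = 2

# (rule id, destination port the rule applies to); hypothetical rule set.
RULES = [
    ("http-1", 80), ("http-2", 80), ("dns-1", 53), ("ssh-1", 22), ("dns-2", 53),
]


def partition_rules(rules, ncores):
    """Assign each port's rules to one core, placing new ports on the least-loaded core."""
    port_to_core = {}
    core_rules = [[] for _ in range(ncores)]
    for rule_id, port in rules:
        if port not in port_to_core:
            port_to_core[port] = min(range(ncores), key=lambda c: len(core_rules[c]))
        core_rules[port_to_core[port]].append(rule_id)
    return port_to_core, core_rules


def steer(dst_port, port_to_core, ncores):
    """NIC-side steering: use the rule partition, fall back to a simple spread."""
    return port_to_core.get(dst_port, dst_port % ncores)


if __name__ == "__main__":
    mapping, per_core = partition_rules(RULES, NCORES)
    print(mapping)
    print(per_core)
```

With this assignment every rule lives on exactly one core, so each Snort instance keeps a smaller working set in cache than it would if connection-based steering sent any packet to any core.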


ACCELERATING STREAM-ORIENTED NETWORKING


Ongoing work: Stream protocols

Full TCP processing is too complex for M&A processing
▪ Significant connection state required
▪ Tricky edge cases: reordering, drops
▪ Complicated algorithms for congestion control

But the common case is simpler: it can be offloaded
▪ Reduces the critical path in software

Opportunity: Enforce correct protocol onto untrusted app
▪ Focus: congestion control


FlexTCP ideas

Safety critical & common processing on NIC
▪ Includes filtering, validating ACKs, enforcing rate limits

Handle all non-common cases in software
▪ E.g. packet drops, re-ordering, timeouts, …

Requires small per-flow state
▪ 64 bytes (SEQ/ACK, queues, rate-limit, …)
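To give a feel for how much fits into 64 bytes, the sketch below packs one possible per-flow record with Python's `struct` module. The slide only names the categories (SEQ/ACK, queues, rate limit); the specific fields, their widths, and their order are assumptions made for illustration and are not the FlexTCP layout.

```python
import struct
from collections import namedtuple

# Little-endian, no padding: 2 x u32, 2 x u16, 9 x u32, 2 x u64 = 64 bytes.
FLOW_STATE = struct.Struct("<2I2H9I2Q")

FlowState = namedtuple("FlowState", [
    "local_ip", "remote_ip", "local_port", "remote_port",  # flow identity
    "seq", "ack",                                           # SEQ/ACK numbers
    "rx_head", "rx_tail", "tx_head", "tx_tail",             # queue positions
    "rate_limit_kbps", "ecn_marked", "acked_total",         # rate limit, ECN counters
    "rx_buf_addr", "tx_buf_addr",                           # host buffer addresses
])


def pack_flow(state: FlowState) -> bytes:
    return FLOW_STATE.pack(*state)


def unpack_flow(buf: bytes) -> FlowState:
    return FlowState._make(FLOW_STATE.unpack(buf))


if __name__ == "__main__":
    print(FLOW_STATE.size, "bytes per flow")
```

At this size, state for a million flows would occupy 64 MB, which suggests why keeping the per-flow record small matters for on-NIC memory.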


FlexTCP overview


Flexible congestion control offload

NIC enforces per-flow rate limits set by trusted kernel
▪ Flexibility to choose congestion control

Example: DCTCP

Common-case processing on NIC
▪ Echo ECN marks in generated ACK
▪ Track fraction of ECN marked packets per flow

Kernel implements control policy (DCTCP)
▪ Use NIC-reported fraction of packets that are ECN marked
▪ Adapt rate limit according to DCTCP protocol

Result: Indistinguishable from pure software implementations
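The kernel-side policy can be sketched as follows. The moving average of the marked fraction, alpha = (1 - g) * alpha + g * F, and the multiplicative decrease by alpha / 2 are the standard DCTCP rules. Applying them to a rate limit rather than a congestion window, the additive increase step, the rate bounds, and the class itself are assumptions for illustration; the slides do not describe how FlexTCP maps DCTCP onto rates.

```python
class DctcpRateController:
    """Kernel-side control loop fed by NIC-reported ECN counts (sketch)."""

    def __init__(self, rate_mbps: float, g: float = 1 / 16,
                 increase_mbps: float = 10.0,
                 min_mbps: float = 1.0, max_mbps: float = 10_000.0):
        self.rate_mbps = rate_mbps
        self.g = g                          # estimation gain; 1/16 is the commonly used value
        self.increase_mbps = increase_mbps  # hypothetical additive increase per interval
        self.min_mbps = min_mbps
        self.max_mbps = max_mbps
        self.alpha = 0.0                    # running estimate of the marked fraction

    def on_interval(self, marked: int, total: int) -> float:
        """Update the rate limit from one interval's counters; returns the new limit."""
        if total > 0:
            fraction = marked / total
            self.alpha = (1 - self.g) * self.alpha + self.g * fraction
            if marked > 0:
                # Back off in proportion to the extent of congestion.
                self.rate_mbps *= 1 - self.alpha / 2
            else:
                self.rate_mbps += self.increase_mbps
        self.rate_mbps = min(self.max_mbps, max(self.min_mbps, self.rate_mbps))
        return self.rate_mbps
```

In this split, the NIC only counts marks and enforces the returned limit, while the choice of control law stays in the trusted kernel and can be swapped without touching the NIC program.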


FlexTCP overhead evaluation

• We implemented FlexTCP in P4
• Run on Agilio-CX with null application
• Compare throughput to basic NIC (wiretest)

[Figure: throughput in Gb/s (0 to 40) versus packet size in bytes (256, 512, 1024, 1500) for the Basic and Full configurations.]


Summary

Networks are becoming faster, CPUs are not
▪ Server applications need to keep up
▪ Fast I/O requires efficient I/O path to application

Flexible offloads can eliminate inefficiencies
▪ Application control over where packets are processed
▪ Efficient steering, validation, transformation

Case studies: Key-value store, real-time analytics, IDS
▪ Up to 2.5x throughput & latency improvement vs. kernel-bypass
▪ Vastly more energy-efficient (no CPUs for packet processing)