vpp accelerated high performance & scalable l3dsr l4 load ... · *other names and brands may be...

31
VPP Accelerated High Performance & Scalable L3DSR L4 Load Balancer on Top Clos Yusuke Tatsumi @ Yahoo Japan Corporation & Naoyuki Mori @ Intel Acknowledgement: Shunya Kitada, Yuta Kinoshita @ Yahoo Japan Corporation Hongjun Ni, Ray Kinsella @Intel Pierre Pfister, Jerome Tollet @Cisco

Upload: others

Post on 18-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

VPP Accelerated High Performance & Scalable L3DSR L4 Load Balancer on Top Clos

Yusuke Tatsumi @ Yahoo Japan Corporation & Naoyuki Mori @ Intel

Acknowledgement:Shunya Kitada, Yuta Kinoshita @ Yahoo Japan Corporation

Hongjun Ni, Ray Kinsella @IntelPierre Pfister, Jerome Tollet @Cisco

Page 2: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• ! ! , • FD.io VPP Overview• VPP LB data-plane• VPP LB control-plane• Summary• !

Agenda

Page 3: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Yahoo! JAPAN introduction

Page 4: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Daily Unique Browser

90+ MillionDaily Unique Browser(Only Smartphone)

60+ Million

Monthly Page Views

70+ BillionNumber of services

100+Many Load Balancers support our services!

https://s.yimg.jp/i/docs/ir/archives/jp/nenji/2017/jp2018integrated_report.pdf

Yahoo! JAPAN introduction

Page 5: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Background and issues of our LB

2 3 9 1 0 2 3

0 B = BE

[1] h9ps://www.linux-kvm.org[2] https://kubernetes.io/

[1] [2]

[3] LBaaS: Load Balancer as a Service

#

:+ B = / /4 = B 4 BE B= E B B 4 = =B 1 -

[3]

Page 6: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• G I H & ( GL CB GC / +• E GC B CHG

• B E EC E G EL ( C• G A E E C

• ) E H C E G CB B GC (

) -

) 2-

)- B 2-

Internet

• B B CHG ( GL CB GC ) C• )CAAC GL EI E -) 1 G C• L C E G CB GC ( G (+2

- ) - A --

Leaf/Spine IP Clos fabric

-- ) 2- -- ) 2- -- ) 2-CCC

((

Internet

Limitations of our systemand OSS possibilities

Page 7: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

-G I: ���

-G B L ���

(CBGEC D B:

Internet

Leaf/Spine IP Clos fabric

1 ��� 1 ��� 1 ���

(CBGEC D B:

MMM

( 2 GC .2I:EG F B

.2F I 2

Internet

(C CE G: G : FG B D:E CEA B : 1 G D B: ) C 22):I: CD CBGEC D B: ECA F E G F: CB CHE CD:E G CB : D:E :B :F

GE G: L GC E: E:: C CHE HEE:BG A G G CBF

Limitations of our systemand OSS possibilities

Page 8: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Universal data plane Extensible Modular Design• Layer 2 – 4 Network Stack

• CP, TM, Overlays and more …

• Linux (and FreeBSD) support

• Kernel Interfaces (Netmap, FastTap)

• Container and Virtualization support

• Appliance, Infrastructure, VNF & CNF

Architecture• Pluggable, easy to understand & extend.

• Mature graph node architecture.

Plugins• Full control to reorganize the pipeline.

• Fast, plugins are equal citizens

Fast, Scalable and Deterministic Developer friendly• L2XC - 25+ Mpps per core

Deterministic• 0 packet drops, ~15µs latency

• Continuous & extensive latency testing.

Scalability• Linear scaling with core/thread count

• Supporting millions of concurrent L[2,3]tables entries.

• Runtime counters for everything.( throughput, ipc, errors etc )

• Full pipeline tracing facilities.

• Multi-language API bindings.

• VPP command line introspection.

Packet Processing: VPP

Management AgentNetconf/Yang REST BGP

ethernet-input

dpdk-inputaf-packet-input

vhost-user-input

mpls-inputlldp-input

...-no-checksum

ip4-input ip6-inputarp-inputcdp-input l2-input

ip4-lookup ip4-lookup-mulitcast

ip4-rewrite-transit

ip4-load-balance

ip4-midchain

mpls-policy-encap

interface-output

Graph N

odesSDN IntegrationFD.io* VPP Overview

Page 9: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

VPP Layeringfor modular design

Plugins

VNET

VLIB

VPP Infra VPP InfraLibrary of function primitives, for• memory management• memory operations• vectors• rings• hashing• timers

VLIBVPP application management • buffer, buffer management• graph node, node management• tracing, counters• threading• CLIand most importantly …• main()

VNET VPP networking source• Devices• Layer [2, 3, 4]• Session Management• Overlays• Traffic Management

Plugins• Plugins can be in-tree:

SNAT, Policy ACL,Flow Per Packet,ILA, IOAM,Load Balancer, SIXRD,VCGNSegment RouWng

• Separate fd.io project:NSH_SFC

Page 10: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• 33 E E C A•

• 3 A A BG C• : 3: / 3/ 3 :

• A D BA I (/1 1 BA ( 3• -/: B 3 :

• )3 GE A 1 )3(1 K 4• -

• / A A 1• 3: /: 3 3:

D E D D D E D D D E D D(

A3 :

/ †

-

CCC

† - C A BAE E ABD D E BA

: 3

1

A

1

! 33 3

:

Leaf/Spine IP Clos fabric

[1] h:ps://www.nanog.org/mee?ngs/nanog51/presenta?ons/Monday/NANOG51.Talk45.nanog51-Schaumann.pdf

What’s available?& What’s missing?

Missing

Par?ally exist

Existing

Page 11: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Category Feature Description First appeared

Framework Consistent Hash Brings great scalability and redundancy w/ LB nodes 16.09

Multi-Threading Performance scalability w/ CPU cores 16.09

Protocols GRE4/GRE6 Encap w/ GRE IPv4 and IPv6 16.09

L3DSR Layer 3 Direct Server Return 18.04

NAT4/NAT6 NAT (originally for kube-proxy) 18.07

Port # aware Care TCP/UDP port number for LBaaS integration 18.10

API Manipulation Set/Get VIP/member over API from controller 18.10

Statistics Collect more detailed statistics like # of connections On going

Added by this project

VPP LB plugin Enhancements

Page 12: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Client

Load Balancer (VIP)

Server

• Src IP: Client• Dst IP: VIP• DSCP: unflag

• Src IP: Client• Dst IP: Server• DSCP: flagged

• Src IP: VIP• Dst IP: Client• DSCP: unflag

1

23

DSCP/VIP mapping in hostVIP DSCP192.168.1.2 (000 010)b

192.168.1.3 (000 011)b DSCP: Part of IP header

• No tunneling overhead with DSCP• DSR: Saving LB bandwidth (3. is generally x2-10 larger than 1.)

Why L3DSR (L3 Direct Server Return) ?

Page 13: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• 33 E E C A• -

• 3 A A BG C•

• A D BA I (/1 1 BA ( 3• - / : 3

• )3 GE A 1 )3(1 K 4• - ::

• / A A 1• -

[1] https://www.nanog.org/meetings/nanog51/presentations/Monday/NANOG51.Talk45.nanog51-Schaumann.pdf

Filling missing parts,build up system!

D E D D D E D D D E D D(

3 -

-

- :

AAA

† - C A BAE E ABD D E BA

1

A

1

/

3

Leaf/Spine IP Clos fabricMissing

ParJally exist

ExisJng

Successfully Developed!

Page 14: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Mpps

@ 9 B % 6 6 6 @ @6B ) 0 9 BA 0 A @C

Frame size

Frame size V.S. Mpps @VPP L3DSR by IXIA• 8 S B MI 2P 2P5 M I E• :66 N

• 6 2 M I 6 I LLI0) *) 3 N

• 4 GI (1• 52 2 M 0MC M 5 M I E

I M I )( -8 1 L !

( @ @6B

.

2 0 1

IL MI 1 M AI G C N

2 cores

1 core

4 cores

[1] hIps://www.ixiacom.com/

[1]

*Other names and brands may be claimed as the property of others.

VPP LB/node performance test@ 10Gbps

To VIP

To real servers

Page 15: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

1 Packet processing is decomposedinto a directed graph of nodes …

… packets move through graph nodes in vector …2

Microprocessor

… graph nodes are optimized to fit inside the instruction cache …

… packets are pre-fetched into the data cache.

Instruction Cache3

Data Cache4

3

4

Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache & data cache always hot è Minimized memory latency and usage.

vhost-user-input

af-packet-input dpdk-input

ip4-lookup-mulitcast ip4-lookup*

ethernet-input

mpls-inputlldp-input

arp-inputcdp-input...-no-

checksum

ip6-inputl2-input ip4-input

ip4-load-balance

mpls-policy-encap

ip4-rewrite-transit

ip4-midchain

interface-output

* Each graph node implements a “micro-NF”, a “micro-NetworkFunction” processing packets.

Packet 0

Packet 1

Packet 2

Packet 3

Packet 4

Packet 5

Packet 6

Packet 7

Packet 8

Packet 9

Packet 10

FD.io VPP – The “Magic” of VectorsCompute Optimized SW Network Platform

Page 16: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• RSS (Receive Side Scaling) enables traffic associated with one connec>on to a given thread.• Well leveraging HW feature RSS and SW mul>-threading for true scalability

VPP Load Balancer with L3DSRWorker 0 to N

NIC

ip4 load balance

Ib4 l3dsr

DPDK input

ip4 input

ip4 rewrite

ip4 lookup

IF output

RSS

Queue 0

Queue 1

Queue N

Multi-threading for performance scalability

NIC

ip4 load balance

Ib4 l3dsr

DPDK input

ip4 input

ip4 rewrite

ip4 lookup

IF output

ip4 load balance

Ib4 l3dsr

DPDK input

ip4 input

ip4 rewrite

ip4 lookup

IF output

Queue 0

Queue 1

Queue N

HW SW HWETH IP

[packet][packet][packet]

Load Balancer

Worker 0

Worker N

Page 17: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

::1 1 C -- B: 4

• // 1 4 1 : / 14 1:1 3

• /31: B B 4 + 31 1 : D

• : C - B

• /B 3 1 3

Page 18: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

/

A

B A

-

D B

(N+1) nodes

C

(

-

/C

A

ACD

/

LB node

A B A AInternet

/

/ / B

A / B

DC AD

(A

: C: :) A FG

*Other names and brands may be claimed as the property of others.

Internal VPP LB node architectureand control-plane

Page 19: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• Full-scratch written by Golang• 22k LoC with gRPC

• Components• LB agent and drivers

• VPP API agent• BGP agent• Health check agent

• LB management system • Controller & DB• User-API & Interface

-

/

/ / /

GoBGP

/

[1] [2]

[3]

*Other names and brands may be claimed as the property of others.

LB control-plane

[1] h>ps://bird.network.cz/ [2] h>p://www.haproxy.org/ [3] h>ps://osrg.github.io/gobgp/

Page 20: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Leaf/Spine IP Clos fabric

����� ����� �����

-

Internet

xUpgrading…

xUpgrading…

xUpgrading…

• Features• Serving API to integrate with any system• Zero-downtime upgrading• Life cycle management

• HW EoL migration

Completed first node!Completed all node!

LB control-plane (Cont.)

Page 21: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

New Data Center

• Features• Serving API to integrate with any system• Zero-downtime upgrading• Life cycle management

• HW EoL migration

����� �����

Leaf/Spine IP Clos fabric

�����

-

IP Clos fabric

�����Internet

xx x

LB control-plane (Cont.)

Page 22: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• Setting• Traffic generator: iperf 10Gbps * 8• 8 VPP-LB nodes: 10Gbps * 8• ECMP

• Spine• Leaf• VPP-LB

• 80Gbps to the 1 VIP on 8 VPP-LB

LB system performance@ 80Gbps

( (( (

* 8

(

80Gbps

( (( (VIP

( ) * 8

(

40Gbps/Spine

20Gbps/Leaf

10Gbps/member

10Gbps/VPPLB

Page 23: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• Setting• Traffic generator: iperf 10Gbps * 8• 8 VPP-LB nodes: 10Gbps * 8• ECMP

• Spine• Leaf• VPP-LB

• 80Gbps to the 1 VIP on 8 VPP-LB

9 0. ( 2 0 2 11 )0. )

LB system performance@ 80Gbps

vpplb-node01VIP: 172.27.156.248

vpplb-node02VIP: 172.27.156.248

vpplb-node03VIP: 172.27.156.248

vpplb-node04VIP: 172.27.156.248

vpplb-node05VIP: 172.27.156.248

vpplb-node06VIP: 172.27.156.248

vpplb-node07VIP: 172.27.156.248

vpplb-node08VIP: 172.27.156.248

( (( (

* 8

(

80Gbps

( (( (VIP

( ) * 8

(

40Gbps/Spine

20Gbps/Leaf

10Gbps/member

10Gbps/VPPLB

Page 24: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• Setting• Traffic generator: iperf 10Gbps * 8• 8 VPP-LB nodes: 10Gbps * 8• ECMP

• Spine• Leaf• VPP-LB

• 80Gbps to the 1 VIP on 8 VPP-LB

A B ( . A BA B ) 2 C 2 C BE 9

iperf –c VIP iperf –c VIP iperf –c VIP iperf –c VIP

iperf –c VIP iperf –c VIP iperf –c VIP iperf –c VIP

LB system performance@ 80Gbps

)) ()

=> 8.29 Gbps => 9.42 Gbps => 8.50 Gbps => 8.81 Gbps

=> 8.98 Gbps => 8.18 Gbps => 9.34 Gbps => 9.08 Gbps

- AC 8 E 11 / A C 01

)) )))) ))

* 8

() &)

80Gbps

)) )))) ))VIP

&) - * 8

() &)

40Gbps/Spine

20Gbps/Leaf

10Gbps/member

10Gbps/VPPLB

Page 25: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

LBaaS Summary::/ : : /

• D 4 4 34 C -• Serving API to integrate with any system• Zero-downtime upgrading• Life cycle management (EoL migration)

• // 1 43 4 4 / 3 4• / 34 + 1 D• C - 1 4• / 4 4 4 4 4 4 B4

--

Clos fabric

Internet

-

Page 26: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

• ! : :: : /: : - A:A: - : A -

• / : : /:• A: :A /:

• : -.: - :• - : -• : - : : :A /• / : A /:

C A: :

- : / :

[email protected][email protected]

Page 27: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

Legal Disclaimers• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

• Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.

• Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

• Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

• All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

• Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

• *Other names and brands may be claimed as the property of others.

• © 2018 Intel Corporation.

27

Page 28: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework
Page 29: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

Backup

Page 30: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• P E 1MA 1M4 L B• 55 6 6G

• 58 1E C : E 5 II( )( 2 6

• 3 N * 0-• 41 1E C E 4 L B

EE A E :( 0 GI

. 0

[1] h9ps://www.ixiacom.com/

[1]

VPP LB/node performance test@ 10Gbps

To VIP

To real servers

VPPLB AddiFonal latency

:= all_latency – returning_by_switch_latency

( ( 12 )2 29 ( 2 9

Page 31: VPP Accelerated High Performance & Scalable L3DSR L4 Load ... · *Other names and brands may be claimed as the property of others. Category Feature Description First appeared Framework

*Other names and brands may be claimed as the property of others.

• ( CA D ( 1• - ::

• -3 3 B•

[1] https://www.nanog.org/meetings/nanog51/presentations/Monday/NANOG51.Talk45.nanog51-Schaumann.pdf

Missing

ParIally exist

ExisIng

3 A D 3 A D 3 A D

3

/ †

-

:

† A AB4 A B

3 B

)

/

Leaf/Spine IP Clos fabric

L3DSR LB configuraIon example:lb vip 192.168.1.1/32 protocol tcp port 80 encap l3dsr dscp 23 new_len 1024lb as 192.168.1.1/32 protocol tcp port 80 172.16.1.1lb as 192.168.1.1/32 protocol tcp port 80 172.16.1.2lb as 192.168.1.1/32 protocol tcp port 80 172.16.1.3 * Added by this project

L3DSR LB configuration example