Post on 12-Jun-2020


FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds

Daehyeok Kim

Tianlong Yu¹, Hongqiang Liu³, Yibo Zhu⁴, Jitu Padhye², Shachar Raindel²

Chuanxiong Guo⁴, Vyas Sekar¹, Srinivasan Seshan¹

Carnegie Mellon University¹, Microsoft², Alibaba Group³, ByteDance⁴


Two Trends in Cloud Applications

Containerization

• Lightweight isolation

• Portability

RDMA networking

• Higher networking performance

Benefits of Containerization

[Figure: Two hosts, each with a NIC and a software switch. On Host 1 (IP: 30.0.0.1), Container 1 (IP: 10.0.0.1) and Container 2 (IP: 20.0.0.1) each run a network app (namespace isolation). Container 2 migrates to Host 2 (IP: 40.0.0.1) and keeps IP 20.0.0.1 (portability).]

Containerization and RDMA are in Conflict!

[Figure: Two hosts, each with an RDMA NIC. On Host 1 (IP: 10.0.0.1), the RDMA apps in Container 1 and Container 2 both use IP 10.0.0.1, which conflicts with namespace isolation. After migrating to Host 2 (IP: 20.0.0.1), Container 2 has IP 20.0.0.1, which conflicts with portability.]

Existing H/W based Virtualization Isn’t Working

Using Single Root I/O Virtualization (SR-IOV)

[Figure: Each host's RDMA NIC exposes Virtual Functions (VFs) behind a NIC switch. On Host 1, Container 1 (IP: 10.0.0.1) uses VF 1 and Container 2 (IP: 10.0.0.2) uses VF 2 (namespace isolation). After migrating to Host 2, Container 2 is attached to a different VF and its IP becomes 20.0.0.1 (portability).]

Sub-optimal Performance of Containerized Apps

RDMA networking can improve the training speed of NN models by ~10x!

[Figure, panel 1: Image classification CNN training. Training speed (images/sec) by model (Resnet-50, Inception-v3, Alexnet), Native RDMA vs. Container+TCP.]

[Figure, panel 2: Speech recognition RNN training. CDF of time per step (sec), Native RDMA vs. Container+TCP.]

Annotated speedups: 9.2x, 14.4x


Our Work: FreeFlow

• Enable high speed RDMA networking capabilities for containerized applications

• Compatible with existing RDMA applications

• Close to native RDMA performance
  • Evaluation with real-world data-intensive applications


Outline

• Motivation

• FreeFlow Design

• Implementation and Evaluation


FreeFlow Design Overview

[Figure: Native RDMA vs. FreeFlow. Native RDMA: the RDMA app calls the Verbs API of the verbs library, which sends NIC commands to the RDMA NIC. FreeFlow: RDMA apps in Container 1 (IP: 10.0.0.1) and Container 2 (IP: 20.0.0.1) call the same Verbs API; FreeFlow on the host (IP: 30.0.0.1) sits between the containers and the verbs library that drives the RDMA NIC.]

Background on RDMA

“Host 1 wants to write contents in MEM-1 to MEM-2 on Host 2”

1. Control path
  • Set up RDMA context
  • Post work requests (e.g., write)

2. Data path
  • NIC processes work requests
  • NIC directly accesses memory

[Figure: On each host, an RDMA app with a memory region (MEM-1 on Host 1, MEM-2 on Host 2) and an RDMA context (RDMA CTX) sits above the verbs library and the RDMA NIC.]
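The split between the two paths can be pictured with a toy model. This is only a sketch under loud assumptions: `WorkRequest` and `ToyNic` are hypothetical names, both "hosts" are plain byte buffers in one process, and no real RDMA hardware or verbs call is involved. The application only posts a request (control path); the copy into the remote buffer is done by the NIC object (data path).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    """Hypothetical stand-in for a posted RDMA write request."""
    src_offset: int
    dst_offset: int
    length: int


class ToyNic:
    """Toy NIC that moves bytes between two registered buffers."""

    def __init__(self, local_mem: bytearray, remote_mem: bytearray):
        self.local_mem = local_mem      # plays the role of MEM-1
        self.remote_mem = remote_mem    # plays the role of MEM-2
        self.send_queue = deque()

    def post_send(self, wr: WorkRequest) -> None:
        """Control path: the app only enqueues a description of the transfer."""
        self.send_queue.append(wr)

    def process(self) -> int:
        """Data path: the NIC reads MEM-1 and writes MEM-2 without the app."""
        completed = 0
        while self.send_queue:
            wr = self.send_queue.popleft()
            src = memoryview(self.local_mem)[wr.src_offset:wr.src_offset + wr.length]
            self.remote_mem[wr.dst_offset:wr.dst_offset + wr.length] = src
            completed += 1
        return completed
```

Posting a request leaves the remote buffer untouched until `process()` runs, which mirrors the slide's point that the NIC, not the application, accesses memory on the data path.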

FreeFlow in the Scene

“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”

[Figure: On each side, a container runs an RDMA app with its memory (MEM-1 in Container 1, MEM-2 in Container 2) and an RDMA context (RDMA CTX). FreeFlow holds a shadow context and shadow memory for each (S-RDMA CTX with S-MEM-1, S-RDMA CTX with S-MEM-2) above the verbs library and the RDMA NIC.]

C1: How to forward verbs calls?

C2: How to synchronize memory?

Challenge 1: Verbs forwarding in Control Path

[Figure: Inside the container, the RDMA app calls the Verbs API on a shim. The shim has to forward each call to the verbs library in FreeFlow, which sends NIC commands to the RDMA NIC. How to forward the call is the open question.]

ibv_post_send (struct ibv_qp* qp, …)

struct ibv_qp {
    struct ibv_context *context;
    ….
};

Attempt 1: Forward “as it is” ➔ Incorrect

Attempt 2: “Serialize” and forward ➔ Inefficient
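Why forwarding a pointer-bearing struct "as it is" fails can be sketched with `ctypes`. The two structure types below are hypothetical, heavily reduced stand-ins for `struct ibv_context` and `struct ibv_qp` (the field names `num_comp_vectors` and `qp_num` are chosen for illustration only). The raw bytes of the struct carry just an address that is meaningful only in the sender's address space, while manual serialization has to follow the pointer and pack every value, which is presumably the per-call work the slide calls inefficient.

```python
import ctypes
import struct


class Context(ctypes.Structure):
    """Hypothetical, reduced stand-in for struct ibv_context."""
    _fields_ = [("num_comp_vectors", ctypes.c_int)]


class QueuePair(ctypes.Structure):
    """Hypothetical, reduced stand-in for struct ibv_qp (it holds a pointer)."""
    _fields_ = [("context", ctypes.POINTER(Context)),
                ("qp_num", ctypes.c_uint32)]


def forward_as_is(qp: QueuePair) -> bytes:
    """Attempt 1: ship the struct's raw bytes; the pointer travels as an address."""
    return ctypes.string_at(ctypes.addressof(qp), ctypes.sizeof(qp))


def serialize(qp: QueuePair) -> bytes:
    """Attempt 2: follow the pointer and pack the values it refers to."""
    return struct.pack("<iI", qp.context.contents.num_comp_vectors, qp.qp_num)


def deserialize(payload: bytes) -> dict:
    """Receiver side of attempt 2: rebuild the values from the packed bytes."""
    num_comp_vectors, qp_num = struct.unpack("<iI", payload)
    return {"num_comp_vectors": num_comp_vectors, "qp_num": qp_num}
```

A receiver in another process that got the output of `forward_as_is` would see only the sender's address for `context`, not the context itself.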

Internal Structure of Verbs Library

[Figure: The shim in the container forwards ibv_post_send (struct ibv_qp* qp, …), whose struct ibv_qp argument contains a struct ibv_context *context pointer, to the verbs library in FreeFlow, which sends NIC commands to the RDMA NIC.]

Parameters are serialized by Verbs library!

FreeFlow Control Path Channel

Idea: Leveraging the serialized output of verbs library

[Figure: In the container, the RDMA app calls ibv_post_send (struct ibv_qp* qp, ….) through the Verbs API on the FreeFlow library, which issues Write (VNIC_fd, serialized parameters) to the VNIC. The FreeFlow router receives the parameters through the VNIC and passes them via a shim to its verbs library, which sends NIC commands to the RDMA NIC.]

Parameters are forwarded correctly without manual serialization!
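A minimal sketch of such a channel follows, with several assumptions stated up front: the VNIC is modeled as one end of a `socket.socketpair()`, the `(command id, payload length)` header is invented framing for this example, and `device_write` is a hypothetical callback standing in for whatever the router does with the bytes. None of these names come from FreeFlow. The point is only that the bytes the library would have written to the NIC are passed through unchanged.

```python
import socket
import struct

# Hypothetical framing for this sketch: (command id, payload length).
HEADER = struct.Struct("<II")


def write_command(vnic: socket.socket, command_id: int, payload: bytes) -> None:
    """Library side: write the already-serialized parameters to the VNIC."""
    vnic.sendall(HEADER.pack(command_id, len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the channel."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("VNIC channel closed")
        buf += chunk
    return bytes(buf)


def router_forward(vnic: socket.socket, device_write):
    """Router side: read one command and hand the bytes on unchanged."""
    command_id, length = HEADER.unpack(_recv_exact(vnic, HEADER.size))
    payload = _recv_exact(vnic, length)
    return device_write(command_id, payload)
```

Because the payload is treated as opaque bytes, the router in this sketch never needs to know the layout of the structures behind it.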

Challenge 2: Synchronizing Memory for Data Path

• Shadow memory in FreeFlow router
  • A copy of application’s memory region
  • Directly accessed by NICs

• S-MEM and MEM must be synchronized.

• How to synchronize S-MEM and MEM?

[Figure: The container holds the RDMA app with MEM, RDMA CTX, and a VNIC. The FreeFlow router holds S-RDMA CTX and S-MEM above the verbs library and the RDMA NIC.]

Strawman Approach for Synchronization

“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”

[Figure: Each side has a container (RDMA app, MEM, RDMA CTX, VNIC) and a FreeFlow router (S-RDMA CTX, S-MEM) above the verbs library and the RDMA NIC. DATA in MEM-1 has to reach MEM-2 by way of S-MEM-1 and S-MEM-2.]

Explicit synchronization?

• High freq. ➔ High overhead

• Low freq. ➔ Wrong data for app
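The trade-off of explicit synchronization can be made concrete with a toy count. This is a sketch, not a measurement: `simulate_sync` is a hypothetical helper in which the application updates MEM once per step, S-MEM is refreshed by an explicit copy every `sync_every` steps, and the NIC is assumed to read S-MEM after every step.

```python
def simulate_sync(num_writes: int, sync_every: int) -> tuple:
    """Return (explicit copies made, reads of S-MEM that saw stale data)."""
    mem = 0       # application memory (MEM)
    shadow = 0    # shadow memory in the router (S-MEM)
    copies = 0
    stale_reads = 0
    for step in range(1, num_writes + 1):
        mem = step                      # the app updates its buffer
        if step % sync_every == 0:
            shadow = mem                # explicit copy MEM -> S-MEM
            copies += 1
        if shadow != mem:               # the NIC would read S-MEM here
            stale_reads += 1
    return copies, stale_reads
```

Synchronizing on every step removes stale reads but costs one copy per step; synchronizing rarely saves copies but lets the NIC read outdated data on most steps.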

Containers can Share Memory Regions

• FreeFlow router is running in a container

MEM and S-MEM can be located on the same physical memory region

[Figure: On the host, the application container (RDMA app, MEM-1, RDMA CTX, VNIC) and the FreeFlow router (S-RDMA CTX, S-MEM-1, verbs library) sit above the RDMA NIC; MEM-1 and S-MEM-1 both map to one MEM region in shared memory.]

Zero-copy Synchronization in Data Path

How to allocate MEM-1 in shadow memory space?

Synchronization without explicit memory copy:

Method 1: Allocate shared buffers with FreeFlow APIs

Method 2: Re-map app’s memory space to shadow memory space

FreeFlow supports both!

[Figure: On the host, the application container (RDMA app, MEM-1, RDMA CTX, VNIC) and the FreeFlow router (S-RDMA CTX, S-MEM-1, verbs library) sit above the RDMA NIC; MEM-1 and S-MEM-1 both map to one MEM region in shared memory.]
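The underlying mechanism, two views backed by the same physical pages, can be sketched with `mmap`. This is loosely in the spirit of allocating a shared buffer up front; it is not the FreeFlow API. `shared_buffer_demo` is a hypothetical helper, the backing store is an ordinary temporary file, and both mappings live in one process here, whereas in the slides they would belong to the application container and the router container.

```python
import mmap
import os
import tempfile


def shared_buffer_demo(message: bytes) -> bytes:
    """Write through one mapping and read the same bytes through another."""
    size = mmap.PAGESIZE
    if len(message) > size:
        raise ValueError("message does not fit in one page")
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
        app_mem = mmap.mmap(fd, size)            # plays the role of MEM
        router_fd = os.open(path, os.O_RDWR)
        shadow_mem = mmap.mmap(router_fd, size)  # plays the role of S-MEM
        os.close(router_fd)                      # the mapping keeps its own handle
        try:
            app_mem[:len(message)] = message     # the app writes; no copy to S-MEM
            return bytes(shadow_mem[:len(message)])
        finally:
            shadow_mem.close()
            app_mem.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

The write through `app_mem` is visible through `shadow_mem` without any explicit synchronization step, which is the property the slide relies on.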

FreeFlow Design Summary

• FreeFlow control path channel

• Zero-copy memory synchronization

FreeFlow provides near native RDMA performance for containers!

[Figure: RDMA apps in Container 1 (IP: 10.0.0.1) and Container 2 (IP: 20.0.0.1) each connect through a VNIC to the FreeFlow router (IP: 30.0.0.1), which uses the verbs library to drive the RDMA NIC.]


Outline

• Motivation

• FreeFlow Design

• Implementation and Evaluation


Implementation and Experimental Setup

• FreeFlow Library
  • Adds 4000 lines of C to libibverbs and libmlx4

• FreeFlow Router
  • 2000 lines of C++

• Testbed setup
  • Two Intel Xeon E5-2620 8-core CPUs, 64 GB RAM
  • 56 Gbps Mellanox ConnectX-3 NICs
  • Docker containers

Does FreeFlow Support Low Latency?

[Figure: Latency (μs) vs. message size (64 B, 256 B, 1 KB, 4 KB) for Native RDMA and FreeFlow. Annotation: 0.38 μs.]

Does FreeFlow Support High Throughput?

[Figure: Throughput (Gbps) vs. message size (2 KB, 8 KB, 32 KB, 128 KB, 512 KB, 1 MB) for Native RDMA and FreeFlow.]

Bounded by control path channel performance

Do Applications Benefit from FreeFlow?

[Figure: CDF of time per step (sec) for Container+TCP, Native RDMA, and FreeFlow. Annotation: 8.7x.]


Summary

• Containerization today can’t benefit from speed of RDMA.

• Existing solutions for NIC virtualization don’t work (e.g., SR-IOV).

• FreeFlow enables containerized apps to use RDMA.

• Challenges and Key Ideas
  • Control path: Leveraging Verbs library structure for efficient Verbs forwarding
  • Data path: Zero-copy memory synchronization

• Performance close to native RDMA

github.com/microsoft/freeflow