Post on 12-Mar-2020

Yibo Zhu, Monia Ghobadi, Jitendra Padhye (all Microsoft)

[Figure: TCP throughput (Gbps) vs. message size, 4KB to 4MB]

Small messages: CPU is the bottleneck

Larger messages: ~3 CPU cores are burnt by TCP

Sender Receiver

[Figure: time to transfer 2KB (μs) for TCP, RDMA (read/write), and RDMA (send)]

[Figure: TCP CPU utilization (%) vs. message size, 4KB to 4MB]

RDMA bypasses host OS stack: frees host CPU, lowers latency

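[Diagram: Sender and Receiver each have an Application, Memory, and a NIC. Each application allocates a buffer (Buffer A on the sender, Buffer B on the receiver). The sender requests "Write local buffer at address A to remote buffer at address B"; the NICs move the data by DMA, and Buffer B is filled.]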
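The one-sided write in the diagram can be mimicked with a toy model. This is only a sketch under stated assumptions: `Host`, `allocate`, and `rdma_write` are hypothetical names, Python bytearrays stand in for registered memory, and a plain function stands in for the NICs' DMA engines. No real RDMA verbs API is used.

```python
class Host:
    """Toy host: application-visible buffers plus a counter of application work."""

    def __init__(self, name):
        self.name = name
        self.memory = {}          # address label -> bytearray (stand-in for registered memory)
        self.app_cpu_events = 0   # times the application had to handle incoming data

    def allocate(self, address, size):
        """The application allocates a buffer that the NIC is allowed to access."""
        self.memory[address] = bytearray(size)
        return self.memory[address]


def rdma_write(sender, local_address, receiver, remote_address):
    """Stand-in for the NICs: copy the local buffer into the remote buffer.

    No receiver application code runs here, so receiver.app_cpu_events
    is left unchanged (the point of bypassing the host stack).
    """
    source = sender.memory[local_address]
    target = receiver.memory[remote_address]
    if len(source) > len(target):
        raise ValueError("remote buffer is too small")
    target[: len(source)] = source


sender = Host("sender")
receiver = Host("receiver")
buffer_a = sender.allocate("A", 8)
buffer_b = receiver.allocate("B", 8)
buffer_a[:] = b"RDMAdata"

# "Write local buffer at address A to remote buffer at address B"
rdma_write(sender, "A", receiver, "B")
print(bytes(buffer_b), receiver.app_cpu_events)
```

After the call, Buffer B holds the sender's bytes while the receiver's application counter is still zero, which is the property the slide attributes to RDMA.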

RDMA single thread: ~40Gbps

RDMA CPU: ~0%

RDMA latency: 1~2 μs

[Figure: time to transfer 2KB (μs) for TCP, RDMA (read/write), and RDMA (send)]

[Figure: CPU utilization (%) vs. message size, 4KB to 4MB, TCP and RDMA]

[Figure: throughput (Gbps) vs. message size, 4KB to 4MB, TCP and RDMA]

• Solution:

• Problem


Enter DCQCN and TIMELY: Congestion Control for RoCEv2

DCQCN: ECN-based

TIMELY: delay-based

Takeaway:

DCQCN is a little too complicated

DCQCN model matches simulations and implementation

TIMELY model matches simulations

• Stability

• Rate of convergence

• Fairness

• High utilization

• Low flow completion time

We don’t have an intuitive explanation

Load factor = 0.8

• Feedback is delayed as queue builds up

T0, Q = 2: Blue packet is about to arrive

T1, Q = 3: Blue packet arrival complete

T2, Q = 4: Blue packet ready to depart … and is marked, reflecting state of queue at T2

Marking threshold = 4 packets

T0, Q = 2: Blue packet is about to arrive

T1, Q = 3: Blue packet arrival complete … timer starts

T2, Q = 4: Blue packet ready to depart … and reflects state of queue at T0

• Delay inherently reports “stale” information

• The staleness is affected by queue length!

• Longer queue means more stale feedback

• This can lead to instability
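The contrast between the two walkthroughs (a mark applied at departure vs. a delay that reflects the queue at arrival) can be sketched with a toy FIFO queue. This is an illustrative model, not the switch or NIC behaviour from the talk: `simulate_fifo`, the one-packet-per-tick service rate, and the arrival pattern are all assumptions made for the example.

```python
from collections import deque


def simulate_fifo(arrivals_per_tick, marking_threshold):
    """Serve one packet per tick from a FIFO queue and record both feedback signals."""
    queue = deque()   # entries: (packet_id, arrival_tick, packets_ahead_on_arrival)
    records = []
    next_id = 0
    for tick, arrivals in enumerate(arrivals_per_tick):
        for _ in range(arrivals):
            queue.append((next_id, tick, len(queue)))
            next_id += 1
        if queue:
            queue_at_departure = len(queue)   # includes the departing packet
            packet_id, arrival_tick, ahead = queue.popleft()
            records.append({
                "packet": packet_id,
                "ahead_on_arrival": ahead,
                # Delay signal: fixed by the backlog the packet found on arrival.
                "queueing_delay": tick - arrival_tick,
                # ECN signal: decided from the backlog at the moment of departure.
                "queue_at_departure": queue_at_departure,
                "ecn_marked": queue_at_departure >= marking_threshold,
            })
    return records


# The queue builds up for four ticks (two arrivals, one departure per tick), then drains.
records = simulate_fifo([2, 2, 2, 2, 0, 0, 0, 0, 0, 0], marking_threshold=4)
for record in records:
    print(record)
```

In this toy model every packet's queueing delay equals the number of packets ahead of it when it arrived, so the delay signal is as old as the delay itself, and a longer queue makes it older. The mark, by contrast, is computed from the queue as it stands when the packet leaves.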

• Can have fixed queue or fairness – but not both!

Bottleneck queue is a function of number of flows.

DCQCN (40Gbps link) TIMELY (10Gbps link)

DCQCN with RED-like marking

DCQCN with PI-like marking
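Why the marking scheme decides whether the queue depends on the number of flows can be illustrated with a deliberately simple toy loop. Everything here is an assumption for illustration and not the DCQCN model from the talk: `p_needed` is a hypothetical stand-in for the marking probability at which the senders' aggregate rate matches the link (assumed to vary with the number of flows), the queue update is a made-up linear plant, and the PI step is the textbook incremental form with arbitrary gains.

```python
def red_mark_prob(queue, k_min, k_max, p_max):
    """RED-like marking: probability rises linearly between k_min and k_max."""
    if queue <= k_min:
        return 0.0
    if queue >= k_max:
        return 1.0
    return p_max * (queue - k_min) / (k_max - k_min)


def clamp(value, low, high):
    return max(low, min(high, value))


def settle_with_red(p_needed, steps=1000, gain=50.0):
    """Toy plant: the queue grows while marking is below p_needed, shrinks above it."""
    queue = 0.0
    for _ in range(steps):
        p = red_mark_prob(queue, k_min=10.0, k_max=110.0, p_max=1.0)
        queue = max(0.0, queue + gain * (p_needed - p))
    return queue


def settle_with_pi(p_needed, q_ref=30.0, steps=1000, gain=50.0, a=0.01, b=0.005):
    """Same toy plant, but the marking probability integrates the queue error."""
    queue = prev_queue = 0.0
    p = 0.0
    for _ in range(steps):
        p = clamp(p + a * (queue - q_ref) - b * (prev_queue - q_ref), 0.0, 1.0)
        prev_queue = queue
        queue = max(0.0, queue + gain * (p_needed - p))
    return queue


for p_needed in (0.2, 0.4):   # hypothetical values for "few flows" vs "many flows"
    print(p_needed, round(settle_with_red(p_needed), 3), round(settle_with_pi(p_needed), 3))
```

With the RED-like curve the marking probability is a fixed function of queue length, so a different required probability forces a different resting queue. With the PI-like update the probability keeps moving until the queue sits at `q_ref`, whatever probability that takes.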

• Can have fixed queue or fairness – but not both!

• ECN marking is resistant to feedback jitter

[Figure: queue length (KB) vs. time (s), TIMELY and DCQCN]
