Making Linux TCP Fast - NetDev conf

Yuchung Cheng, Neal Cardwell. netdev 1.2, Tokyo, October 2016

Page 1: Making Linux TCP Fast - NetDev conf

Making Linux TCP Fast

Yuchung Cheng, Neal Cardwell

netdev 1.2, Tokyo, October 2016

Page 2: Making Linux TCP Fast - NetDev conf

Once upon a time, there was a TCP ACK...

Here is a story of what happened next...

Page 3: Making Linux TCP Fast - NetDev conf

RACK: detect losses by packets’ send time

Monitors the delivery process of every (re)transmission. E.g.:

Sent packets P1 and P2

Receives a SACK of P2

=> P1 is lost if sent more than $RTT + $reo_wnd ago [1]

Reduces timeouts in Disorder state by 80% on Google.com

[1] RACK: draft-ietf-tcpm-rack-00, since Linux 4.4
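
A minimal Python sketch of the time-based rule on this slide. The names `SentPacket` and `rack_detect_losses` are hypothetical, and the RTT and reo_wnd values used with it are illustrative assumptions, not numbers from the talk. It shows only the send-time test; the RACK draft keeps more state than this.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    name: str
    send_time: float       # seconds on the sender's clock
    sacked: bool = False   # True once an ACK/SACK confirmed delivery


def rack_detect_losses(packets, now, rtt, reo_wnd):
    """Return names of packets the slide's rule would mark as lost.

    A packet is marked lost when it is still unacknowledged, a packet
    sent after it has already been delivered, and it was sent more than
    rtt + reo_wnd ago.
    """
    delivered = [p for p in packets if p.sacked]
    if not delivered:
        return []  # no delivery evidence yet, so nothing to infer
    newest_delivered = max(p.send_time for p in delivered)

    lost = []
    for p in packets:
        if p.sacked:
            continue
        sent_before_delivered = p.send_time < newest_delivered
        too_old = (now - p.send_time) > (rtt + reo_wnd)
        if sent_before_delivered and too_old:
            lost.append(p.name)
    return lost
```

With P1 sent at t=0, P2 sent at t=1ms and SACKed, rtt=100ms and reo_wnd=25ms, P1 is not yet marked lost at t=101ms but is marked lost at t=130ms.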

Page 4: Making Linux TCP Fast - NetDev conf

Congestion control: how fast to send?

Page 5: Making Linux TCP Fast - NetDev conf

Congestion and bottlenecks

Page 6: Making Linux TCP Fast - NetDev conf

Congestion and bottlenecks

[Figure: delivery rate vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize]

Page 7: Making Linux TCP Fast - NetDev conf

[Figure: delivery rate and RTT vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize]

Page 8: Making Linux TCP Fast - NetDev conf

[Figure: delivery rate and RTT vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize; annotated "CUBIC / Reno"]

Page 9: Making Linux TCP Fast - NetDev conf

Optimal: max BW and min RTT (Gail & Kleinrock, 1981)

[Figure: delivery rate and RTT vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize]

Page 10: Making Linux TCP Fast - NetDev conf

BDP = (max BW) * (min RTT)

Estimating optimal point (max BW, min RTT)

[Figure: delivery rate and RTT vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize]

Est min RTT = windowed min of RTT samples

Est max BW = windowed max of BW samples
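
A rough Python sketch of the two windowed estimators and the BDP formula on this slide. The class name `WindowedFilter`, the helper `estimate_bdp`, and the window lengths passed in are assumptions for illustration; the slide does not say how the filters are implemented.

```python
from collections import deque


class WindowedFilter:
    """Track the best sample (min or max) seen within a sliding window."""

    def __init__(self, window, better):
        self.window = window      # window length, in the caller's time unit
        self.better = better      # better(a, b) is True if a beats b
        self.samples = deque()    # (time, value) pairs, best value at the front

    def update(self, now, value):
        # Expire samples that fell out of the window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()
        # Older samples that do not beat the new one can never win again.
        while self.samples and not self.better(self.samples[-1][1], value):
            self.samples.pop()
        self.samples.append((now, value))
        return self.samples[0][1]


def estimate_bdp(max_bw, min_rtt):
    """BDP = (max BW) * (min RTT), in bytes if max_bw is in bytes/s."""
    return max_bw * min_rtt
```

A min-RTT filter would be built as `WindowedFilter(10.0, lambda a, b: a < b)` and a max-BW filter as `WindowedFilter(10, lambda a, b: a > b)`, feeding each one a new sample per ACK.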

Page 11: Making Linux TCP Fast - NetDev conf

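[Figure: delivery rate and RTT vs. amount in flight, with the x-axis marked at BDP and BDP + BufSize]

Only min RTT is visible (amount in flight below BDP)

Only max BW is visible (amount in flight above BDP)

But to see both max BW and min RTT, must probe on both sides of BDP...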

Page 12: Making Linux TCP Fast - NetDev conf

One way to stay near (max BW, min RTT) point:

Model network, update max BW and min RTT estimates on each ACK

Control sending based on the model, to...

Probe both max BW and min RTT, to feed the model samples

Pace near estimated BW, to reduce queues and loss

Vary pacing rate to keep inflight near BDP (for full pipe but small queue)

That's BBR congestion control (code in Linux v4.9; paper in ACM Queue, Oct 2016)

BBR = Bottleneck Bandwidth and Round-trip propagation time

BBR seeks high tput with small queue by probing BW and RTT sequentially
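
One way to picture "control sending based on the model" is the Python sketch below. The function name `bbr_targets` and the default `cwnd_gain` of 2 are assumptions for illustration and are not stated on the slide; the point is only that both the pacing rate and the inflight cap are derived from the two estimates.

```python
def bbr_targets(max_bw, min_rtt, pacing_gain=1.0, cwnd_gain=2.0):
    """Derive sending targets from the model (max_bw in bytes/s, min_rtt in s).

    Returns (pacing_rate, inflight_cap):
      pacing_rate  - how fast to release packets, near the estimated BW
      inflight_cap - upper bound on bytes in flight, a small multiple of BDP
    """
    bdp = max_bw * min_rtt
    pacing_rate = pacing_gain * max_bw
    inflight_cap = cwnd_gain * bdp
    return pacing_rate, inflight_cap
```

A pacing gain above 1 probes for more bandwidth, and a gain below 1 lets any queue drain.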

Page 13: Making Linux TCP Fast - NetDev conf

BBR: model-based walk toward max BW, min RTT

optimal operating point

Page 14: Making Linux TCP Fast - NetDev conf

STARTUP: exponential BW search
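
A toy Python simulation of an exponential bandwidth search. It assumes the sending rate doubles every round trip and that the search stops once the measured bandwidth has failed to grow by 25% for 3 consecutive rounds; both the doubling and the exit rule are assumptions for this sketch, not details given on the slide. The name `startup_rounds` is hypothetical.

```python
FULL_BW_THRESH = 1.25   # assumed: "still growing" means at least +25%
FULL_BW_ROUNDS = 3      # assumed: rounds without growth before exiting


def startup_rounds(bottleneck_bw, initial_rate):
    """Simulate the search; return (rounds_taken, estimated_bw)."""
    rate = initial_rate
    full_bw = 0.0
    stalled = 0
    rounds = 0
    while stalled < FULL_BW_ROUNDS:
        rounds += 1
        # The path cannot deliver faster than its bottleneck.
        delivered_bw = min(rate, bottleneck_bw)
        if delivered_bw >= full_bw * FULL_BW_THRESH:
            full_bw = delivered_bw   # bandwidth still growing
            stalled = 0
        else:
            stalled += 1             # plateau: the pipe looks full
        rate *= 2                    # exponential search
    return rounds, full_bw
```

The number of rounds grows with the logarithm of the ratio between the bottleneck bandwidth and the initial rate.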

Page 15: Making Linux TCP Fast - NetDev conf

DRAIN: drain the queue created during startup
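
A back-of-the-envelope Python sketch of draining, under a simple fluid model: the bottleneck serves at its full bandwidth while the sender paces at a fraction of it, so the queue shrinks at the difference. The function name `drain_time_s` and the `drain_gain` parameter are hypothetical; the slide gives no numbers.

```python
def drain_time_s(queue_bytes, bottleneck_bw, drain_gain):
    """Seconds needed to empty a queue when pacing at drain_gain * bottleneck_bw.

    bottleneck_bw is in bytes/s; drain_gain must be below 1 so that
    the queue is served faster than it is refilled.
    """
    if not 0.0 <= drain_gain < 1.0:
        raise ValueError("drain_gain must be in [0, 1) for the queue to shrink")
    net_drain_rate = (1.0 - drain_gain) * bottleneck_bw
    return queue_bytes / net_drain_rate
```

For example, a 1.25 MB queue at a 12.5 MB/s bottleneck, paced at half the bottleneck rate, empties in about 0.2 s.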

Page 16: Making Linux TCP Fast - NetDev conf

PROBE_BW: explore max BW, drain queue, cruise
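
The explore / drain / cruise pattern can be sketched in Python as a cycle of pacing gains. The specific gain values (1.25, 0.75, then six phases at 1.0) are an assumption taken from the published BBR v1 description, not from this slide, and `probe_bw_pacing_rates` is a hypothetical helper.

```python
import random

# Assumed gain cycle: probe above the estimate, drain below it, then cruise.
PROBE_BW_GAINS = (1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def probe_bw_pacing_rates(max_bw, n_phases, start=None, rng=random):
    """Return the pacing rate for each of n_phases consecutive phases.

    start picks the first phase in the cycle; if omitted it is chosen
    at random so that competing flows do not probe in lockstep.
    """
    if start is None:
        start = rng.randrange(len(PROBE_BW_GAINS))
    n = len(PROBE_BW_GAINS)
    return [PROBE_BW_GAINS[(start + i) % n] * max_bw for i in range(n_phases)]
```

Over one full cycle the gains average to 1, so the flow sends at the estimated bandwidth on average while still probing for more.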

Page 17: Making Linux TCP Fast - NetDev conf

PROBE_RTT briefly if min RTT filter expires (=10s)*

[*] if continuously sending

minimal packets in flight for max(0.2s, 1 round trip)
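
The two timing rules on this slide, written out as a small Python sketch. The constants come from the slide (10 s filter window, 0.2 s minimum); the function names are hypothetical.

```python
MIN_RTT_WINDOW_S = 10.0         # min RTT filter lifetime, from the slide
PROBE_RTT_MIN_DURATION_S = 0.2  # minimum time spent in PROBE_RTT, from the slide


def min_rtt_expired(now, min_rtt_stamp):
    """True if the current min RTT sample is older than the filter window."""
    return now - min_rtt_stamp > MIN_RTT_WINDOW_S


def probe_rtt_duration(rtt):
    """How long to hold inflight at the minimum: max(0.2s, 1 round trip)."""
    return max(PROBE_RTT_MIN_DURATION_S, rtt)
```

On a 40 ms path the hold time is 0.2 s; on a 500 ms path it is one full round trip, 0.5 s.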

Page 18: Making Linux TCP Fast - NetDev conf

Packet scheduling: when to send?

Page 19: Making Linux TCP Fast - NetDev conf

[Diagram labels: TCP (TCP Small Queues (TSQ), TSO autosizing); fq (pacing, fair queuing); NIC; link]
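
A minimal Python sketch of what pacing means for "when to send": each packet's earliest departure time is spaced from the previous one by that packet's serialization time at the pacing rate. The function name `paced_send_times` is hypothetical, and this models only the pacing idea, not the fq qdisc itself.

```python
def paced_send_times(packet_sizes, pacing_rate, start=0.0):
    """Return the earliest departure time (seconds) for each packet.

    packet_sizes are in bytes and pacing_rate is in bytes/s. After a
    packet is released, the next one waits size / pacing_rate seconds.
    """
    if pacing_rate <= 0:
        raise ValueError("pacing_rate must be positive")
    times = []
    t = start
    for size in packet_sizes:
        times.append(t)
        t += size / pacing_rate
    return times
```

Three 1500-byte packets paced at 1.5 MB/s leave about 1 ms apart instead of back to back.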

Page 20: Making Linux TCP Fast - NetDev conf

Performance results...

Page 21: Making Linux TCP Fast - NetDev conf

Fully use bandwidth, despite high loss

BBR vs CUBIC synthetic bulk TCP test with 1 flow, bottleneck_bw=100Mbps, RTT=100ms

Page 22: Making Linux TCP Fast - NetDev conf

Low queue delay, despite bloated buffers

BBR vs CUBIC synthetic bulk TCP test with 8 flows, bottleneck_bw=128kbps, RTT=40ms

Page 23: Making Linux TCP Fast - NetDev conf

BBR is 2-20x faster on Google WAN

● BBR used for all TCP on Google B4

● Most BBR flows so far rwin-limited

○ max RWIN here was 8MB (tcp_rmem[2])

○ 10 Gbps x 100ms = 125MB BDP

● after lifting rwin limit

○ BBR 133x faster than CUBIC
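
The arithmetic behind "rwin-limited" can be checked with a short Python sketch. It assumes decimal units (1 MB = 10^6 bytes) and reuses the slide's example of a 10 Gbps path with a 100 ms RTT; the function names are hypothetical.

```python
def bdp_bytes(bw_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes."""
    return bw_bits_per_s * rtt_s / 8


def rwin_limited_throughput_bps(rwin_bytes, rtt_s):
    """Best-case throughput when at most rwin_bytes can be in flight per RTT."""
    return rwin_bytes * 8 / rtt_s


path_bdp = bdp_bytes(10e9, 0.100)                  # about 125 MB, as on the slide
capped = rwin_limited_throughput_bps(8e6, 0.100)   # about 640 Mbit/s
```

Under these assumptions an 8 MB receive window is far below the 125 MB BDP, so a single flow tops out near 640 Mbit/s on a 10 Gbps path no matter which congestion control is used.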

Page 24: Making Linux TCP Fast - NetDev conf

Conclusion

Algorithms and architecture in Linux TCP have evolved

● Maximizing BW, minimizing queue, and one-RTT recovery (BBR, RACK)

● Based on groundwork of a high-performance packet scheduler (fq/pacing/tsq/tso-autosizing)

● Orders of magnitude higher bandwidth and lower latency

Next: Google, YouTube, and... the Internet?

● Help us make them better! https://groups.google.com/forum/#!forum/bbr-dev

Page 25: Making Linux TCP Fast - NetDev conf

Backup slides...

Page 26: Making Linux TCP Fast - NetDev conf

BBR convergence dynamics

Converge by sync'd PROBE_RTT + randomized cycling phases in PROBE_BW

● Queue (RTT) reduction is observed by every (active) flow

● Elephants yield more (multiplicative decrease) to let mice grow

bw = 100 Mbit/sec, path rtt = 10ms