-
TCP BBR for Ultra-Low Latency Networking: Challenges, Analysis, and Solutions
Rajeev Kumar*, Athanasios Koutsaftis*, Fraida Fund*, Gaurang Naik†, Pei Liu*, Yong Liu*, and Shivendra Panwar*
*Tandon School of Engineering, New York University, Brooklyn, NY, USA; †Virginia Tech, Arlington, VA, USA
-
Outline
01 Motivation
02 Identifying the problem
03 Mathematical model
04 Possible solutions
-
Emerging applications require high throughput, low latency
Can be supported by LTE or LTE Advanced
Likely to be supported by millimeter wave (mmW) 5G networks
Figure source: GSMA Intelligence Understanding 5G
5G mmWave links promise to support these requirements…
… but we still need a suitable transport protocol on top of 5G
-
TCP BBR: designed for maximum throughput, minimum latency
Loss-based congestion control fills the buffer: achieves full throughput, but induces queueing delay
TCP BBR doesn't fill the buffer: estimates the bottleneck bandwidth and sends at that rate
TCP BBR operates at the BDP: also tries to estimate the link RTT without queueing delay
BDP = bottleneck bandwidth × link RTT without queueing
Figure source: N. Cardwell et al., “BBR: congestion-based congestion control,” Queue, vol. 14, no. 5, pp. 50:20–50:53, Oct. 2016.
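As a back-of-the-envelope illustration of the operating point described above (the link numbers here are illustrative assumptions, not values from the talk), the BDP and BBR's congestion window cap can be computed as:

```python
# Illustrative sketch: BBR sizes its congestion window from its
# bandwidth and min-RTT estimates (numbers below are assumptions).
bw_est = 1e9 / 8          # bottleneck bandwidth estimate: 1 Gbps, in bytes/s
min_rtt = 0.005           # estimated RTT without queueing: 5 ms
mss = 1500                # bytes per segment

bdp_bytes = bw_est * min_rtt          # bandwidth-delay product
cwnd_segments = 2 * bdp_bytes / mss   # BBR caps CWND at 2x the estimated BDP

print(bdp_bytes)        # 625000.0 bytes in flight at the BDP
print(cwnd_segments)    # about 833 segments
```

When the two estimates are accurate, this CWND cap is twice the BDP, so pacing (not the window) governs the sending rate.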
-
TCP BBR has been successfully deployed over Google's WAN and over other wired networks, with reported higher throughput and lower latency relative to Cubic
But when we tried BBR over a prototype mmWave backhaul link, throughput was much less than the link capacity
In the rest of this talk, we explain this observation:
• Identify the cause and validate it experimentally
• Model it
• Suggest solutions
Significant throughput loss over mmWave backhaul link
-
Why did BBR throughput collapse over the mmWave link?
-
A possible reason: on a wireless link with very low latency,
jitter is large relative to mean delay
Jitter occurs due to process scheduling, MAC scheduling, channel dynamics, and handovers
RTT over time in our 60 GHz link
-
Does jitter reliably cause throughput collapse?
Observed throughput and BBR’s bandwidth estimate, RTT estimate, and CWND
Experimental setup on CloudLab to confirm that jitter causes throughput degradation
Setup: bottleneck link (1 Gbps), minimal propagation delay (< 0.1 ms), emulated delay (5 ms mean, normally distributed jitter)
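A common way to produce this kind of emulated delay on Linux is tc netem; this is a plausible configuration sketch (the interface name and the 1 ms standard deviation are illustrative assumptions, not necessarily the configuration used in the experiments):

```shell
# Add 5 ms mean delay with normally distributed jitter on eth0
# (interface name and std-dev are illustrative assumptions; requires root)
tc qdisc add dev eth0 root netem delay 5ms 1ms distribution normal
```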
-
Jitter causes TCP BBR throughput to collapse
At first, increasing jitter does not affect throughput
Beyond a certain point, throughput collapses completely
Why?
At intermediate points, time is divided between low-throughput and high-throughput periods
-
BBR’s BW estimate collapses at the same point
At intermediate points, time is divided between low-estimate and high-estimate periods
-
The “knee” occurs when CWND drops below BDP
Link BDP
-
CWND < BDP when BBR’s RTT estimate is too low
BBR measures “RTT without queueing delay”:
1. Drain queue (send at CWND=4 for 200 ms)
2. Measure minimum RTT
Assumes min RTT ≈ RTT without queueing delay
But, when there is delay variation that is not due to congestion,
min RTT can be much less than “typical RTT without queueing delay”
When min RTT < ½ of “RTT without queueing delay”, CWND < BDP
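This threshold can be checked with a quick numeric sketch (the bandwidth value is an illustrative assumption): CWND falls below the BDP exactly when the min-RTT estimate drops below half the uncongested RTT.

```python
def cwnd_below_bdp(bw_est, mrtt_est, rtt_uncongested):
    """BBR's CWND cap is 2 * BW_est * mRTT_est; the link BDP is
    BW_est * RTT_uncongested (RTT without queueing delay)."""
    cwnd = 2 * bw_est * mrtt_est
    bdp = bw_est * rtt_uncongested
    return cwnd < bdp

bw = 1e9 / 8  # 1 Gbps in bytes/s (illustrative)
print(cwnd_below_bdp(bw, 0.003, 0.005))  # 3 ms > 2.5 ms threshold -> False
print(cwnd_below_bdp(bw, 0.002, 0.005))  # 2 ms < 2.5 ms threshold -> True
```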
-
Jitter causes throughput collapse in BBR
1. Due to jitter on the link, mRTT_est < ½ of the "RTT without queuing delay"
2. Then CWND = 2 × BDP_est = 2 × mRTT_est × BW_est < BDP
3. BW_est collapses
4. Throughput collapses to a minimal value
-
Why does CWND < BDP cause BW_est to collapse?
BBR estimates BW as follows:
• When packet Y is sent, we note the previous ACK X
• When the ACK for Y arrives, we compute:
(Y – X) / (time between ACK X and ACK Y)
If sending is CWND-limited, there is an idle period and the new BW_est is too low!
Then, CWND drops even lower
→ more CWND exhaustion while estimating BW
→ feedback loop results in complete collapse
-
Mathematical model
-
BBR BW estimation: without CWND exhaustion
BBR estimates bottleneck bandwidth as maximum of 10 delivery rate samples
Packets are paced, so the inter-departure time of packets at the time of packet P's transmission is:
Δt_{P−1} = 1 / BW_{est,P−1}
Between the transmission of P and the receipt of its ACK (RTT_P), the number of packets sent* is
RTT_P / Δt_{P−1}
Then its delivery rate sample will be
d_P = (1 / RTT_P) · (RTT_P / Δt_{P−1}) = 1 / Δt_{P−1}
→ Delivery rate sample d_P is equal to the link capacity (if BW_{est,P−1} is correct)
* Assuming Δt_{P−1} is constant between P's transmission and its ACK.
-
BBR BW estimation: with CWND exhaustion
If CWND exhaustion may occur, then the number of packets sent* is
min(W_P, RTT_P / Δt_{P−1})
(where the CWND is W_P = 2 · BW_{est,P−1} · mRTT_{est,P−1}).
And the delivery rate sample is
d_P = (1 / RTT_P) · min(W_P, RTT_P / Δt_{P−1}) ≤ 1 / Δt_{P−1}
→ Delivery rate sample d_P may be less than the link capacity
* Assuming W_P and Δt_{P−1} are constant between P's transmission and its ACK.
-
BW estimate as a function of previous BW, mRTT estimates
Suppose we observe BW estimation over a duration of length RTT_UCAV ("typical uncongested RTT")
Then the BW estimate at the end of observation window j is*
BW_{est,j} = (1 / RTT_UCAV) · min(W_j, RTT_UCAV / Δt_{j−1})
= (1 / RTT_UCAV) · min(2 · BW_{est,j−1} · mRTT_{est,j−1}, RTT_UCAV / Δt_{j−1})
= (1 / RTT_UCAV) · min(2 · mRTT_{est,j−1} / Δt_{j−1}, RTT_UCAV / Δt_{j−1})
= (1 / (RTT_UCAV · Δt_{j−1})) · min(2 · mRTT_{est,j−1}, RTT_UCAV)
→ The model expresses the BW estimate as a function of the previous BW estimate and the min RTT estimates, and shows that BW is underestimated when mRTT_est < ½ RTT_UCAV
When mRTT_est < ½ RTT_UCAV → the first term is the minimum → the BW estimate is too low → triggers the start of collapse
* W_j and Δt_{j−1} are assumed to be constant, and delivery rate samples identical, during the observation window.
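Since Δt_{j−1} = 1 / BW_{est,j−1}, this recursion reduces to multiplying the previous estimate by min(2 · mRTT_est / RTT_UCAV, 1) each window, and can be iterated numerically. The sketch below uses illustrative values (not parameters from the paper) to show the geometric collapse when mRTT_est < ½ RTT_UCAV:

```python
# Iterate the model's recursion for the BW estimate (illustrative values):
# BW_est,j = min(2 * mRTT_est, RTT_UCAV) / (RTT_UCAV * dt_{j-1}),
# where dt_{j-1} = 1 / BW_est,{j-1} is the paced inter-departure time.
def bw_sequence(bw0, mrtt_est, rtt_ucav, steps=10):
    bw = bw0
    seq = [bw]
    for _ in range(steps):
        # Substituting dt = 1/bw gives a simple multiplicative update:
        bw = min(2 * mrtt_est, rtt_ucav) / rtt_ucav * bw
        seq.append(bw)
    return seq

# mRTT_est = 2 ms < RTT_UCAV/2 = 2.5 ms -> ratio 0.8 per window: collapse
print(bw_sequence(125e6, 0.002, 0.005)[-1])
# mRTT_est = 3 ms >= RTT_UCAV/2 -> ratio 1: the estimate holds
print(bw_sequence(125e6, 0.003, 0.005)[-1])
```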
-
Experimental observation and model prediction
We compare experimental observations to
• the average BW estimate, computed iteratively from the BW estimate expression, with a Monte Carlo simulation of the mRTT estimate assuming a truncated normal distribution and an approximate queuing delay
• the average throughput, computed iteratively using the CWND
with good agreement.
Comparison of experimental observation and analytical result
(a) Bandwidth estimate (b) TCP BBR throughput
-
Potential solutions
-
Possible solutions
Avoid throughput and bandwidth estimate collapse by keeping CWND > BDP
1. Fix the bandwidth estimate at the bottleneck link capacity, and use it for estimating the BDP?
• Fixes the "collapse", but the BDP estimate is still lower than it should be:
low min RTT estimate → CWND-limited while sending → throughput degradation
• How would the sender "know" the link capacity?
2. Fix the minimum RTT at the uncongested average RTT, and use it in computing the estimated BDP?
• Fixes the BW estimate collapse, and the uncongested average RTT can be properly estimated
(a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput
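A rough sketch of why solution 2 restores CWND > BDP while solution 1 alone does not (the numbers are illustrative assumptions, not the authors' implementation):

```python
def cwnd(bw_est, mrtt_est):
    return 2 * bw_est * mrtt_est  # BBR's CWND cap

capacity, rtt_ucav = 125e6, 0.005   # 1 Gbps link, 5 ms uncongested RTT (assumed)
bdp = capacity * rtt_ucav
mrtt_low = 0.002                    # jitter-induced RTT underestimate (assumed)

# Solution 1: pin BW_est to link capacity -> no estimate collapse,
# but the low mRTT still leaves CWND below the BDP.
print(cwnd(capacity, mrtt_low) < bdp)    # True: still CWND-limited

# Solution 2: pin mRTT_est to the uncongested average RTT -> CWND = 2 * BDP.
print(cwnd(capacity, rtt_ucav) >= bdp)   # True: CWND limiting avoided
```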
-
Conclusions
-
Conclusions
o BBR could be a good candidate for emerging applications requiring high throughput, low delay on 5G networks
o But, we observed severe throughput degradation over a 60 GHz link
o With experiments on CloudLab, we show that this is caused by RTT variation
• In particular, when minimum RTT estimate falls below half of the typical RTT of the link when it is not congested, a series of events results in complete throughput collapse
o We present a mathematical model of this process
o We evaluate two possible approaches to solve the problem
• By “fixing” the RTT estimate
• By “fixing” the BW estimate
-
Thank You
-
Back-up slides
-
TCP BBR Phases
01 Bandwidth Estimation Phase (beginning of TCP session): obtains the initial estimate of the bottleneck bandwidth using binary search
02 Drain Phase (beginning of TCP session): obtains the minimum RTT of the TCP session by exponentially reducing the sending rate
03 Bandwidth Probe Phase (steady state, ~98% of a persistent TCP session): continuously updates the bottleneck bandwidth estimate and minimum RTT to size the congestion window for maximum throughput and the lowest possible latency
04 RTT Probe Phase (~2% of the session): entered if the minimum RTT has not been updated for the last 10 seconds; sets the congestion window to 4 MSS to obtain a new estimate of the minimum RTT
-
Effect of jitter: an example
[Figure: time series over 10 s showing RTT [ms], mRTT [ms], estimated BDP [MSS], CWND [MSS], and BW estimate [Mbps]]
-
BBR 2019
Some recent changes to BBR have the side effect of increased CWND.
This does not resolve the problem, but reduces the likelihood of observing it in practice.
-
BBR 2019
The proposed solutions work in a similar way with the 2019 version of BBR.
(a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput
-
Approximation of mRTT_est
A minimum RTT may be observed in the Probe BW phase (where BBR also transmits data at the estimated link capacity) or in the Probe RTT phase (where CWND = 4).
The expectation of the min RTT estimate at the end of a single Probe RTT phase followed by a single Probe BW phase:
E[mRTT_est] = ∫₀^∞ (1 − F_X(s))^N · (1 − F_Y(s))^M ds
where
• F_X(s) is the CDF of the RTT samples in the Probe RTT phase,
• N is the number of RTT samples in the Probe RTT phase,
• F_Y(s) is the CDF of the RTT samples in the Probe BW phase*,
• M is the number of RTT samples in the Probe BW phase*.
* M depends on the sending rate during the Probe BW phase, and on whether 10 seconds elapse without observing a new min RTT. F_Y(s) depends on queuing delay, which in turn depends on the sending rate (and on whether collapse has occurred).
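The integral above is the survival-function form of the expected minimum over N samples of X and M samples of Y, and it can be sanity-checked numerically. The distributions, N, and M below are placeholder assumptions chosen for easy simulation, not the distributions from the paper:

```python
import random

# Check E[min] = integral of P(min > s) ds against a Monte Carlo estimate.
# Placeholder assumptions: Probe RTT samples X ~ Uniform[4, 6] ms,
# Probe BW samples Y ~ Uniform[5, 8] ms, N = 5, M = 20.
def F_X(s):  # CDF of Probe RTT phase samples
    return min(max((s - 4.0) / 2.0, 0.0), 1.0)

def F_Y(s):  # CDF of Probe BW phase samples
    return min(max((s - 5.0) / 3.0, 0.0), 1.0)

N, M = 5, 20

# Riemann sum of the survival function over [0, 10) ms
ds = 1e-4
expected = sum((1 - F_X(i * ds)) ** N * (1 - F_Y(i * ds)) ** M * ds
               for i in range(100000))

# Direct Monte Carlo: minimum over N + M independent RTT samples
random.seed(1)
trials = 50000
mc = sum(min(min(random.uniform(4, 6) for _ in range(N)),
             min(random.uniform(5, 8) for _ in range(M)))
         for _ in range(trials)) / trials

print(abs(expected - mc) < 0.01)  # the two estimates agree closely
```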