-
TCP BBR for Ultra-Low Latency Networking: Challenges, Analysis, and Solutions
Rajeev Kumar*, Athanasios Koutsaftis*, Fraida Fund*, Gaurang Naik†, Pei Liu*, Yong Liu*, and Shivendra Panwar*
*Tandon School of Engineering, New York University, Brooklyn, NY, USA; †Virginia Tech, Arlington, VA, USA
-
Outline
01 Motivation
02 Identifying the problem
03 Mathematical model
04 Possible solutions
-
Emerging applications require high throughput, low latency
Can be supported by LTE or LTE Advanced
Likely to be supported by millimeter wave (mmW) 5G networks
Figure source: GSMA Intelligence Understanding 5G
5G mmWave links promise to support these requirements…
… but we still need a suitable transport protocol on top of 5G
-
TCP BBR: designed for maximum throughput, minimum latency
Loss-based congestion control fills the buffer: achieves full throughput, but induces queueing delay
TCP BBR doesn't fill the buffer: estimates the bottleneck bandwidth and sends at that rate
TCP BBR operates at the BDP: also tries to estimate the link RTT without queueing delay
BDP = bottleneck bandwidth × link RTT without queueing
Figure source: N. Cardwell et al., “BBR: congestion-based congestion control,” Queue, vol. 14, no. 5, pp. 50:20–50:53, Oct. 2016.
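As a back-of-the-envelope illustration of the operating point described above (the link numbers here are illustrative assumptions, not values from the talk), the BDP and BBR's congestion window cap can be computed as:

```python
# Illustrative sketch: BBR sizes its congestion window from its
# bandwidth and min-RTT estimates (numbers below are assumptions).
bw_est = 1e9 / 8          # bottleneck bandwidth estimate: 1 Gbps, in bytes/s
min_rtt = 0.005           # estimated RTT without queueing: 5 ms
mss = 1500                # bytes per segment

bdp_bytes = bw_est * min_rtt          # bandwidth-delay product
cwnd_segments = 2 * bdp_bytes / mss   # BBR caps CWND at 2x the estimated BDP

print(bdp_bytes)        # 625000.0 bytes in flight at the BDP
print(cwnd_segments)    # about 833 segments
```

When the two estimates are accurate, this CWND cap is twice the BDP, so pacing (not the window) governs the sending rate.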
-
TCP BBR has been successfully deployed over Google's WAN and over other wired networks, with reported higher throughput and lower latency relative to Cubic
But when we tried BBR over a prototype mmWave backhaul link, throughput was much less than the link capacity
In the rest of this talk, we explain this observation:
• Identify the cause and validate it experimentally
• Model it
• Suggest solutions
Significant throughput loss over mmWave backhaul link
-
Why did BBR throughput collapse over the mmWave link?
-
A possible reason: on a wireless link with very low latency,
jitter is large relative to mean delay
Jitter occurs due to process scheduling, MAC scheduling, channel dynamics, and handovers
RTT over time in our 60 GHz link
-
Does jitter reliably cause throughput collapse?
Observed throughput and BBR’s bandwidth estimate, RTT estimate, and CWND
Experimental setup on CloudLab to confirm that jitter causes throughput degradation
Setup: bottleneck link (1 Gbps), minimal propagation delay (< 0.1 ms), emulated delay (5 ms mean, normally distributed jitter)
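A common way to produce this kind of emulated delay on Linux is tc netem; this is a plausible configuration sketch (the interface name and the 1 ms standard deviation are illustrative assumptions, not necessarily the configuration used in the experiments):

```shell
# Add 5 ms mean delay with normally distributed jitter on eth0
# (interface name and std-dev are illustrative assumptions; requires root)
tc qdisc add dev eth0 root netem delay 5ms 1ms distribution normal
```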
-
Jitter causes TCP BBR throughput to collapse
At first, increasing jitter does not affect throughput
Beyond a certain point, throughput collapses completely
Why?
At intermediate points, time is divided between low-throughput and high-throughput periods
-
BBR’s BW estimate collapses at the same point
At intermediate points, time is divided between low-estimate and high-estimate periods
-
The “knee” occurs when CWND drops below BDP
Link BDP
-
CWND < BDP when BBR’s RTT estimate is too low
BBR measures “RTT without queueing delay”:
1. Drain queue (send at CWND=4 for 200 ms)
2. Measure minimum RTT
Assumes min RTT ≈ RTT without queueing delay
But, when there is delay variation that is not due to congestion,
min RTT can be much less than “typical RTT without queueing delay”
When min RTT < ½ of “RTT without queueing delay”, CWND < BDP
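This threshold can be checked with a quick numeric sketch (the bandwidth value is an illustrative assumption): CWND falls below the BDP exactly when the min-RTT estimate drops below half the uncongested RTT.

```python
def cwnd_below_bdp(bw_est, mrtt_est, rtt_uncongested):
    """BBR's CWND cap is 2 * BW_est * mRTT_est; the link BDP is
    BW_est * RTT_uncongested (RTT without queueing delay)."""
    cwnd = 2 * bw_est * mrtt_est
    bdp = bw_est * rtt_uncongested
    return cwnd < bdp

bw = 1e9 / 8  # 1 Gbps in bytes/s (illustrative)
print(cwnd_below_bdp(bw, 0.003, 0.005))  # 3 ms > 2.5 ms threshold -> False
print(cwnd_below_bdp(bw, 0.002, 0.005))  # 2 ms < 2.5 ms threshold -> True
```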
-
Jitter causes throughput collapse in BBR
1. Due to jitter on the link, mRTT_est < ½ of the "RTT without queuing delay"
2. Then CWND = 2 × BDP_est = 2 × mRTT_est × BW_est < BDP
3. BW_est collapses
4. Throughput collapses to a minimal value
-
Why does CWND < BDP cause BW_est to collapse?
BBR estimates BW as follows:
• When packet Y is sent, we note the previous ACK X
• When the ACK for Y arrives, we compute:
(Y – X) / (time between ACK X and ACK Y)
If sending is CWND-limited, there is an idle period and the new BW_est is too low!
Then, CWND drops even lower
→ more CWND exhaustion while estimating BW
→ feedback loop results in complete collapse
-
Mathematical model
-
BBR BW estimation: without CWND exhaustion
BBR estimates bottleneck bandwidth as maximum of 10 delivery rate samples
Packets are paced, so the inter-departure time of packets at the time of packet P's transmission is:
Δt_{P−1} = 1 / BW_{est,P−1}
Between the transmission of P and the receipt of its ACK (RTT_P), the number of packets sent* is
RTT_P / Δt_{P−1}
Then its delivery rate sample will be
d_P = (1 / RTT_P) · (RTT_P / Δt_{P−1}) = 1 / Δt_{P−1}
→ Delivery rate sample d_P is equal to the link capacity (if BW_{est,P−1} is correct)
* Assuming Δt_{P−1} is constant between P's transmission and its ACK.
-
BBR BW estimation: with CWND exhaustion
If CWND exhaustion may occur, then the number of packets sent* is
min(W_P, RTT_P / Δt_{P−1})
(where the CWND is W_P = 2 · BW_{est,P−1} · mRTT_{est,P−1}).
And the delivery rate sample is
d_P = (1 / RTT_P) · min(W_P, RTT_P / Δt_{P−1}) ≤ 1 / Δt_{P−1}
→ Delivery rate sample d_P may be less than the link capacity
* Assuming W_P and Δt_{P−1} are constant between P's transmission and its ACK.
-
BW estimate as a function of previous BW, mRTT estimates
Suppose we observe BW estimation over a duration of length RTT_UCAV ("typical uncongested RTT")
Then the BW estimate at the end of observation window j is*
BW_{est,j} = (1 / RTT_UCAV) · min(W_j, RTT_UCAV / Δt_{j−1})
= (1 / RTT_UCAV) · min(2 · BW_{est,j−1} · mRTT_{est,j−1}, RTT_UCAV / Δt_{j−1})
= (1 / RTT_UCAV) · min(2 · mRTT_{est,j−1} / Δt_{j−1}, RTT_UCAV / Δt_{j−1})
= (1 / (RTT_UCAV · Δt_{j−1})) · min(2 · mRTT_{est,j−1}, RTT_UCAV)
→ The model expresses the BW estimate as a function of the previous BW estimate and the min RTT estimates, and shows that BW is underestimated when mRTT_est < ½ RTT_UCAV
When mRTT_est < ½ RTT_UCAV → the first term is the minimum → the BW estimate is too low → triggers the start of collapse
* W_j and Δt_{j−1} are assumed to be constant, and delivery rate samples identical, during the observation window.
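Since Δt_{j−1} = 1 / BW_{est,j−1}, this recursion reduces to multiplying the previous estimate by min(2 · mRTT_est / RTT_UCAV, 1) each window, and can be iterated numerically. The sketch below uses illustrative values (not parameters from the paper) to show the geometric collapse when mRTT_est < ½ RTT_UCAV:

```python
# Iterate the model's recursion for the BW estimate (illustrative values):
# BW_est,j = min(2 * mRTT_est, RTT_UCAV) / (RTT_UCAV * dt_{j-1}),
# where dt_{j-1} = 1 / BW_est,{j-1} is the paced inter-departure time.
def bw_sequence(bw0, mrtt_est, rtt_ucav, steps=10):
    bw = bw0
    seq = [bw]
    for _ in range(steps):
        # Substituting dt = 1/bw gives a simple multiplicative update:
        bw = min(2 * mrtt_est, rtt_ucav) / rtt_ucav * bw
        seq.append(bw)
    return seq

# mRTT_est = 2 ms < RTT_UCAV/2 = 2.5 ms -> ratio 0.8 per window: collapse
print(bw_sequence(125e6, 0.002, 0.005)[-1])
# mRTT_est = 3 ms >= RTT_UCAV/2 -> ratio 1: the estimate holds
print(bw_sequence(125e6, 0.003, 0.005)[-1])
```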
-
Experimental observation and model prediction
We compare experimental observations to
• the average BW estimate, computed iteratively from the BW estimate expression, with a Monte Carlo simulation of the mRTT estimate assuming a truncated normal distribution and an approximate queuing delay
• the average throughput, computed iteratively using the CWND
with good agreement.
Comparison of experimental observation and analytical result
(a) Bandwidth estimate (b) TCP BBR throughput
-
Potential solutions
-
Possible solutions
Avoid throughput and bandwidth estimate collapse by keeping CWND > BDP
1. Fix the bandwidth estimate at the bottleneck link capacity, and use it for estimating the BDP?
• Fixes the "collapse", but the BDP estimate is still lower than it should be:
low min RTT estimate → CWND-limited while sending → throughput degradation
• How would the sender "know" the link capacity?
2. Fix the minimum RTT at the uncongested average RTT, and use it in computing the estimated BDP?
• Fixes the BW estimate collapse, and the uncongested average RTT can be properly estimated
(a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput
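A rough sketch of why solution 2 restores CWND > BDP while solution 1 alone does not (the numbers are illustrative assumptions, not the authors' implementation):

```python
def cwnd(bw_est, mrtt_est):
    return 2 * bw_est * mrtt_est  # BBR's CWND cap

capacity, rtt_ucav = 125e6, 0.005   # 1 Gbps link, 5 ms uncongested RTT (assumed)
bdp = capacity * rtt_ucav
mrtt_low = 0.002                    # jitter-induced RTT underestimate (assumed)

# Solution 1: pin BW_est to link capacity -> no estimate collapse,
# but the low mRTT still leaves CWND below the BDP.
print(cwnd(capacity, mrtt_low) < bdp)    # True: still CWND-limited

# Solution 2: pin mRTT_est to the uncongested average RTT -> CWND = 2 * BDP.
print(cwnd(capacity, rtt_ucav) >= bdp)   # True: CWND limiting avoided
```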
-
Conclusions
-
Conclusions
o BBR could be a good candidate for emerging applications requiring high throughput, low delay on 5G networks
o But, we observed severe throughput degradation over a 60 GHz link
o With experiments on CloudLab, we show that this is caused by RTT variation
• In particular, when minimum RTT estimate falls below half of the typical RTT of the link when it is not congested, a series of events results in complete throughput collapse
o We present a mathematical model of this process
o We evaluate two possible approaches to solve the problem
• By “fixing” the RTT estimate
• By “fixing” the BW estimate
-
Thank You
-
Back-up slides
-
TCP BBR Phases
01 Bandwidth Estimation Phase (beginning of TCP session): obtains the initial estimate of the bottleneck bandwidth using binary search
02 Drain Phase (beginning of TCP session): obtains the minimum RTT of the TCP session by exponentially reducing the sending rate
03 Bandwidth Probe Phase (steady state, ~98% of a persistent TCP session): continuously updates the bottleneck bandwidth estimate and minimum RTT to size the congestion window for maximum throughput and the lowest possible latency
04 RTT Probe Phase (~2% of the session): entered if the minimum RTT has not been updated for the last 10 seconds; sets the congestion window to 4 MSS to obtain a new estimate of the minimum RTT
-
Effect of jitter: an example
[Figure: time series over 10 s showing RTT [ms], mRTT [ms], estimated BDP [MSS], CWND [MSS], and BW estimate [Mbps]]
-
BBR 2019
Some recent changes to BBR have the side effect of increased CWND.
This does not resolve the problem, but reduces the likelihood of observing it in practice.
-
BBR 2019
The proposed solutions work in a similar way with the 2019 version of BBR.
(a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput
-
Approximation of mRTT_est
A minimum RTT may be observed in the Probe BW phase (where BBR also transmits data at the estimated link capacity) or in the Probe RTT phase (where CWND = 4).
The expectation of the min RTT estimate at the end of a single Probe RTT phase followed by a single Probe BW phase:
E[mRTT_est] = ∫₀^∞ (1 − F_X(s))^N · (1 − F_Y(s))^M ds
where
• F_X(s) is the CDF of the RTT samples in the Probe RTT phase,
• N is the number of RTT samples in the Probe RTT phase,
• F_Y(s) is the CDF of the RTT samples in the Probe BW phase*,
• M is the number of RTT samples in the Probe BW phase*.
* M depends on the sending rate during the Probe BW phase, and on whether 10 seconds elapse without observing a new min RTT. F_Y(s) depends on queuing delay, which in turn depends on the sending rate (and on whether collapse has occurred).
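The integral above is the survival-function form of the expected minimum over N samples of X and M samples of Y, and it can be sanity-checked numerically. The distributions, N, and M below are placeholder assumptions chosen for easy simulation, not the distributions from the paper:

```python
import random

# Check E[min] = integral of P(min > s) ds against a Monte Carlo estimate.
# Placeholder assumptions: Probe RTT samples X ~ Uniform[4, 6] ms,
# Probe BW samples Y ~ Uniform[5, 8] ms, N = 5, M = 20.
def F_X(s):  # CDF of Probe RTT phase samples
    return min(max((s - 4.0) / 2.0, 0.0), 1.0)

def F_Y(s):  # CDF of Probe BW phase samples
    return min(max((s - 5.0) / 3.0, 0.0), 1.0)

N, M = 5, 20

# Riemann sum of the survival function over [0, 10) ms
ds = 1e-4
expected = sum((1 - F_X(i * ds)) ** N * (1 - F_Y(i * ds)) ** M * ds
               for i in range(100000))

# Direct Monte Carlo: minimum over N + M independent RTT samples
random.seed(1)
trials = 50000
mc = sum(min(min(random.uniform(4, 6) for _ in range(N)),
             min(random.uniform(5, 8) for _ in range(M)))
         for _ in range(trials)) / trials

print(abs(expected - mc) < 0.01)  # the two estimates agree closely
```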