TCP BBR for Ultra-Low Latency Networking: Challenges, Analysis, and Solutions

Rajeev Kumar*, Athanasios Koutsaftis*, Fraida Fund*, Gaurang Naik†, Pei Liu*, Yong Liu*, and Shivendra Panwar*

*Tandon School of Engineering, New York University, Brooklyn, NY, USA, †Virginia Tech, Arlington, VA, USA



  • Outline

    1. Motivation

    2. Identifying the problem

    3. Mathematical model

    4. Possible solutions

  • Emerging applications require high throughput, low latency

    Can be supported by LTE or LTE advanced

    Likely to be supported by millimeter wave (mmW) 5G networks

    Figure source: GSMA Intelligence Understanding 5G

    5G mmWave links promise to support these requirements…

    … but we still need a suitable transport protocol on top of 5G

  • TCP BBR: designed for maximum throughput, minimum latency

    Loss-based congestion control fills the buffer: it achieves full throughput, but induces queuing delay

    TCP BBR doesn't fill the buffer: it estimates the bottleneck bandwidth and sends at that rate

    TCP BBR operates at the BDP: it also tries to estimate the link RTT without queueing delay (BDP = bottleneck bandwidth × link RTT without queueing)

    Figure source: N. Cardwell et al., “BBR: congestion-based congestion control,” Queue, vol. 14, no. 5, pp. 50:20–50:53, Oct. 2016.
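As a concrete illustration of the BDP that BBR targets, here is a minimal sketch with hypothetical numbers (the 1 Gbps link rate and 5 ms uncongested RTT are assumptions for illustration, not figures from the talk):

```python
def bdp_packets(bottleneck_bw_bps, rtt_no_queue_s, mss_bytes=1500):
    """Bandwidth-delay product in packets: bottleneck BW x RTT without queueing."""
    bdp_bytes = bottleneck_bw_bps / 8 * rtt_no_queue_s
    return bdp_bytes / mss_bytes

# Hypothetical 1 Gbps link with a 5 ms RTT and no queueing:
print(round(bdp_packets(1e9, 0.005), 1))  # 416.7 MSS
```

Operating at this point keeps roughly that many packets in flight: fewer sacrifices throughput, while more only adds queueing delay.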

  • TCP BBR has been successfully deployed over Google's WAN and over other wired networks, with reported higher throughput and lower latency relative to Cubic

    But when we tried BBR over a prototype mmWave backhaul link, throughput was much less than the link capacity

    In the rest of this talk, we explain this observation:
    • Identify the cause and validate it experimentally
    • Model it
    • Suggest solutions

    Significant throughput loss over mmWave backhaul link

  • Why did BBR throughput collapse over the mmWave link?

  • A possible reason: on a wireless link with very low latency, jitter is large relative to the mean delay

    Jitter occurs due to process scheduling, MAC scheduling, channel dynamics, and the handover process

    RTT over time in our 60 GHz link

  • Does jitter reliably cause throughput collapse?

    Observed throughput and BBR’s bandwidth estimate, RTT estimate, and CWND

    Experimental setup on CloudLab to confirm that jitter causes throughput degradation

    Bottleneck link (1 Gbps)

    Minimal propagation delay (< 0.1 ms)

    Emulated delay (5 ms mean, normally distributed jitter)

  • Jitter causes TCP BBR throughput to collapse

    At first, increasing jitter does not affect throughput

    Beyond a certain point, throughput collapses completely

    Why?

    Intermediate points: time is divided between low-throughput and high-throughput periods

  • BBR’s BW estimate collapses at the same point

    Intermediate points: time is divided between low-estimate and high-estimate periods

  • The “knee” occurs when CWND drops below BDP

    Link BDP

  • CWND < BDP when BBR’s RTT estimate is too low

    BBR measures “RTT without queueing delay”:

    1. Drain queue (send at CWND=4 for 200 ms)

    2. Measure minimum RTT

    Assumes min RTT ≈ RTT without queueing delay

    But, when there is delay variation that is not due to congestion,

    min RTT can be much less than “typical RTT without queueing delay”

    When min RTT < ½ of “RTT without queueing delay”, CWND < BDP
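This threshold can be checked directly. A minimal sketch (hypothetical numbers; it assumes the BW estimate equals the true bottleneck bandwidth, so the comparison reduces to mRTT_est < ½ × RTT-without-queueing):

```python
def cwnd_below_bdp(bw_est_pps, mrtt_est_s, rtt_no_queue_s):
    """True iff BBR's CWND cap (2 x estimated BDP) falls below the link BDP.

    Assumes the BW estimate equals the true bottleneck bandwidth, so the
    comparison reduces to: mRTT_est < 1/2 x RTT-without-queueing.
    """
    cwnd = 2 * bw_est_pps * mrtt_est_s   # packets
    bdp = bw_est_pps * rtt_no_queue_s    # packets
    return cwnd < bdp

# RTT without queueing = 5 ms, so the threshold is mRTT_est = 2.5 ms:
print(cwnd_below_bdp(83333, 0.0026, 0.005))  # False: mRTT_est > 2.5 ms
print(cwnd_below_bdp(83333, 0.0024, 0.005))  # True:  mRTT_est < 2.5 ms
```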

  • Jitter causes throughput collapse in BBR

    1. Due to jitter on the link, mRTT_est < ½ × "RTT without queuing delay"

    2. Then CWND = 2 × BDP_est = 2 × mRTT_est × BW_est < BDP

    3. BW_est collapses

    4. Throughput collapses to a minimal value

  • Why does CWND < BDP cause BWest to collapse?

    BBR estimates BW as follows:
    • When packet Y is sent, we note the previous ACK X
    • When ACK Y arrives, we compute:

    (Y − X) / (time between ACK X and ACK Y)

    If sending is CWND-limited, there is an idle period and the new BW_est is too low!

    Then CWND drops even lower → more CWND exhaustion while estimating BW → a feedback loop results in complete collapse
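To see how an idle period deflates the sample, here is a sketch of the sampling arithmetic above (hypothetical numbers, not BBR's actual kernel code; bytes and seconds as units):

```python
def delivery_rate(bytes_y, bytes_x, t_ack_y, t_ack_x):
    """Delivery-rate sample: (Y - X) / (time between ACK X and ACK Y)."""
    return (bytes_y - bytes_x) / (t_ack_y - t_ack_x)

# 625 kB delivered between two ACKs 5 ms apart -> 1000 Mbps sample:
print(round(delivery_rate(1_500_000, 875_000, 1.005, 1.000) * 8 / 1e6, 1))  # 1000.0
# Same data, but a CWND-limited idle period stretches the ACK gap to 10 ms:
print(round(delivery_rate(1_500_000, 875_000, 1.010, 1.000) * 8 / 1e6, 1))  # 500.0
```

The second sample halves the apparent bandwidth even though the link itself lost no capacity, which is what seeds the feedback loop.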

  • Mathematical model

  • BBR BW estimation: without CWND exhaustion

    BBR estimates bottleneck bandwidth as maximum of 10 delivery rate samples

    Packets are paced, so the inter-departure time of packets at the time of packet P's transmission is

    Δt_{P−1} = 1 / BW_{est,P−1}

    Between the transmission of P and the receipt of its ACK (RTT_P), the number of packets sent* is

    RTT_P / Δt_{P−1}

    Then its delivery rate sample will be

    d_P = (1 / RTT_P) × (RTT_P / Δt_{P−1}) = 1 / Δt_{P−1}

    → Delivery rate sample d_P is equal to the link capacity (if BW_{est,P−1} is correct)

    * Assuming Δt_{P−1} is constant between P's transmission and its ACK.

  • BBR BW estimation: with CWND exhaustion

    If CWND exhaustion may occur, then the number of packets sent* is

    min(W_P, RTT_P / Δt_{P−1})

    where CWND W_P = 2 × BW_{est,P−1} × mRTT_{est,P−1}.

    And the delivery rate sample is

    d_P = (1 / RTT_P) × min(W_P, RTT_P / Δt_{P−1}) ≤ 1 / Δt_{P−1}

    → Delivery rate sample d_P may be less than the link capacity

    * Assuming W_P and Δt_{P−1} are constant between P's transmission and its ACK.
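The two cases of the min can be seen numerically; a minimal sketch of the model's sample d_P (hypothetical numbers, in packets and seconds):

```python
def delivery_rate_sample(bw_est_prev, mrtt_est_prev, rtt_p):
    """Model's delivery-rate sample d_P (packets/s) under possible CWND exhaustion."""
    dt = 1.0 / bw_est_prev                  # pacing interval, Δt_{P-1}
    w_p = 2 * bw_est_prev * mrtt_est_prev   # CWND W_P, in packets
    sent = min(w_p, rtt_p / dt)             # packets sent within RTT_P
    return sent / rtt_p

bw = 83333.0   # ~1 Gbps with 1500-byte packets, in packets/s
# mRTT_est large enough: the sample equals the previous BW estimate
print(round(delivery_rate_sample(bw, 0.004, 0.005)))  # 83333
# mRTT_est < RTT_P / 2: CWND limits sending and the sample drops
print(round(delivery_rate_sample(bw, 0.002, 0.005)))  # 66666
```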

  • BW estimate as a function of previous BW, mRTT estimates

    Suppose we observe BW estimation over a duration of length RTT_UCAV ("typical uncongested RTT").

    Then the BW estimate at the end of observation window j is*

    BW_{est,j} = (1 / RTT_UCAV) × min(W_j, RTT_UCAV / Δt_{j−1})

    = (1 / RTT_UCAV) × min(2 × BW_{est,j−1} × mRTT_{est,j−1}, RTT_UCAV / Δt_{j−1})

    = (1 / RTT_UCAV) × min(2 × mRTT_{est,j−1} / Δt_{j−1}, RTT_UCAV / Δt_{j−1})

    = (1 / (RTT_UCAV × Δt_{j−1})) × min(2 × mRTT_{est,j−1}, RTT_UCAV)

    When mRTT_est < ½ × RTT_UCAV, the first term is the minimum → the BW estimate is too low → this triggers the start of collapse

    → The model expresses the BW estimate as a function of the previous BW and min RTT estimates, and shows that BW is underestimated when mRTT_est < ½ × RTT_UCAV

    * W_j and Δt_{j−1} are assumed to be constant, and delivery rate samples are identical during the observation window.
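Iterating this window-level recursion shows how the collapse unfolds; a toy sketch that ignores BBR's 10-sample max filter, pacing-gain cycling, and randomness in the mRTT estimate (all numbers hypothetical):

```python
RTT_UCAV = 0.005  # typical uncongested RTT, s (hypothetical)

def bw_est_after(windows, mrtt_est, bw0_mbps):
    """Iterate BW_{est,j} = BW_{est,j-1} * min(2 * mRTT_est / RTT_UCAV, 1)."""
    bw = bw0_mbps
    for _ in range(windows):
        bw *= min(2 * mrtt_est / RTT_UCAV, 1.0)
    return bw

# mRTT_est above RTT_UCAV / 2: the estimate holds steady
print(round(bw_est_after(20, 0.0026, 1000.0)))  # 1000
# mRTT_est below RTT_UCAV / 2: geometric decay toward collapse
print(round(bw_est_after(20, 0.0020, 1000.0)))  # 12
```

Each window multiplies the estimate by 2 × mRTT_est / RTT_UCAV < 1, so once the threshold is crossed the decay is geometric rather than a one-off dip.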

  • Experimental observation and model prediction

    We compare experimental observations to:

    • the average BW estimate, computed iteratively from the BW estimate expression, using a Monte Carlo simulation of the mRTT estimate (assuming a truncated normal distribution and an approximate queuing delay)

    • the average throughput, computed iteratively using the CWND

    with good agreement.

    Comparison of experimental observation and analytical result

    (a) Bandwidth estimate (b) TCP BBR throughput

  • Potential solutions

  • Possible solutions: avoid throughput and bandwidth-estimate collapse by keeping CWND > BDP

    1. Fix the bandwidth estimate at the bottleneck link capacity, and use it for estimating the BDP?
    • Fixes the "collapse", but the estimated BDP is still lower than it should be: a low min RTT estimate → CWND-limited while sending → throughput degradation
    • How would we "know" the link capacity?

    2. Fix the minimum RTT at the uncongested average RTT, and use it in computing the estimated BDP?
    • Fixes the BW estimate collapse, and the uncongested average RTT can be properly estimated

    (a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput

  • Conclusions

    o BBR could be a good candidate for emerging applications requiring high throughput, low delay on 5G networks

    o But, we observed severe throughput degradation over a 60 GHz link

    o With experiments on CloudLab, we show that this is caused by RTT variation

    • In particular, when minimum RTT estimate falls below half of the typical RTT of the link when it is not congested, a series of events results in complete throughput collapse

    o We present a mathematical model of this process

    o We evaluate two possible approaches to solve the problem

    • By “fixing” the RTT estimate

    • By “fixing” the BW estimate

  • Thank You

  • Back-up slides

  • TCP BBR Phases

    01 Bandwidth Estimation Phase: obtains the initial estimate of the bottleneck bandwidth using binary search (beginning of the TCP session)

    02 Drain Phase: obtains the minimum RTT of the TCP session by exponentially reducing the sending rate (beginning of the TCP session)

    03 Bandwidth Probe Phase: continuously updates the bottleneck bandwidth estimate and minimum RTT to size the congestion window for maximum throughput and the lowest possible latency (steady state; ~98% of a persistent TCP session)

    04 RTT Probe Phase: entered if the minimum RTT has not been updated in the last 10 seconds; sets the congestion window to 4 MSS to obtain a new estimate of the minimum RTT (~2% of the TCP session)

  • Effect of jitter: an example

    [Figure: time series over 0–10 s of RTT [ms], mRTT [ms], estimated BDP [MSS], CWND [MSS], and BW estimate [Mbps]]

  • BBR 2019

    Some recent changes to BBR have the side effect of increased CWND.

    This does not resolve the problem, but reduces the likelihood of observing it in practice.

  • BBR 2019

    Solutions work in a similar way.

    (a) Bandwidth estimate (b) Minimum RTT estimate (c) TCP BBR throughput

  • Approximation of mRTT_est

    A minimum RTT may be observed in the Probe BW phase (where BBR also transmits data at the estimated link capacity) or in the Probe RTT phase (where CWND = 4).

    The expectation of the min RTT estimate at the end of a single Probe RTT phase followed by a single Probe BW phase:

    E[mRTT_est] = ∫₀^∞ (1 − F_X(s))^N × (1 − F_Y(s))^M ds

    where
    • F_X(s) is the CDF of the RTT samples in the Probe RTT phase,
    • N is the number of RTT samples in the Probe RTT phase,
    • F_Y(s) is the CDF of the RTT samples in the Probe BW phase*,
    • M is the number of RTT samples in the Probe BW phase*.

    * M depends on the sending rate during the Probe BW phase, and on whether 10 seconds elapse without observing a new min RTT. F_Y(s) depends on the queuing delay, which in turn depends on the sending rate (and on whether collapse has occurred).
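The expression above is the standard identity E[min] = ∫ P(min > s) ds for the minimum of independent samples. A Monte Carlo sketch of it, using a hypothetical exponential-jitter RTT model rather than the talk's truncated normal (and, for simplicity, the same distribution for both phases):

```python
import random

def mc_min_rtt(n, m, base=0.005, jitter=0.001, trials=20_000, seed=1):
    """Monte Carlo E[mRTT_est]: minimum over N Probe RTT and M Probe BW samples.

    Both phases draw base + Exp(mean=jitter) here for simplicity; in the
    talk, the Probe BW samples (F_Y) would also include queueing delay.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(base + rng.expovariate(1 / jitter) for _ in range(n + m))
    return total / trials

# With N + M = 110 samples, E[min] ≈ base + jitter / (N + M) ≈ 5.009 ms,
# far below a typical RTT of base + jitter ≈ 6 ms.
print(mc_min_rtt(10, 100))
```

This is the mechanism in miniature: on a low-latency link with proportionally large jitter, the minimum over many samples sits well below the typical uncongested RTT.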