
Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers

Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong (Larry) Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao

Motivation

- Services care about the tail flow completion time (tail FCT)
  - Large number of flows generated in each operation
  - Overall performance governed by the last completed flows

[Figure: a large-scale web application hosted in a Data Center Network (DCN); many App Logic servers cooperate in responding to a user request.]

- But packet loss hurts tail FCT
  - Real case in a Microsoft Azure DCN: a spine switch with a 2% random drop rate increased the 99th-percentile latency of all users

[Figure: DCN tail latency visualization [Pingmesh (SIGCOMM'15)]; (a) Normal, (b) Spine failure.]

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

Packet Loss in DCN

- Loss characteristics
  - Measured in a Microsoft production DCN during Dec. 1st-5th, 2015

[Figure: loss rate and location distribution of lossy links (loss rate > 1%); mean loss rate 4%; 78% of lossy links above the ToR; similar across the 5 days.]

1) Loss happens frequently (although the overall loss rate is low)
2) Most losses happen in the network instead of at the edge

Packet Loss in DCN

- Reasons causing loss
  - Congestion loss: uneven load balance; incast
    - Bursty; transient
    - Greatly mitigated (e.g., 1% -> 0.01%) [Jupiter Rising SIGCOMM'15]
  - Failure loss: silent random drop; packet black-hole
    - Complex; hard to detect
    - Common & huge impact on performance [Pingmesh SIGCOMM'15]

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
  - Why does loss hurt the tail?
  - How much does loss hurt?
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

How TCP Handles Loss?

- Fast recovery
  - Wait for a certain number of DupACKs to detect the loss, then retransmit

[Timeline: sender transmits packets 1-2 and receives their ACKs; transmits 3-6; packet 3 is lost; DupACKs for 3 arrive and the sender retransmits packet 3, costing roughly one extra RTT.]

- Timeout (RTO)
  - If not enough DupACKs return, retransmit only after a timeout

[Timeline: sender transmits packets 1-2 and receives their ACKs; transmits 3-6; packet 3 is lost and too few DupACKs return, so packet 3 is retransmitted only after an RTO.]

RTO >> RTT, e.g. RTO = 5ms while RTT < 100us [Pingmesh (SIGCOMM'15), DCTCP (SIGCOMM'10)]

Encountering one RTO -> dramatically increased FCT
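To make that concrete, here is a back-of-envelope sketch (ours, not the authors'; the only inputs are the RTT and RTO values quoted above, and the assumed 2-RTT flow duration is illustrative):

```python
# Back-of-envelope: cost of one RTO for a short data-center flow.
# Assumed numbers from the slide: RTT ~ 100 us, min RTO = 5 ms.
RTT = 100e-6   # seconds
RTO = 5e-3     # seconds

fct_no_loss = 2 * RTT            # a short flow finishing in ~2 RTTs
fct_one_rto = fct_no_loss + RTO  # one timeout adds a full RTO of idle waiting

print(f"FCT without loss: {fct_no_loss * 1e3:.2f} ms")
print(f"FCT with one RTO: {fct_one_rto * 1e3:.2f} ms")
print(f"Slowdown: {fct_one_rto / fct_no_loss:.0f}x")  # ~26x for these numbers
```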

Loss Incurs Timeout

- A little loss causes enough timeouts to hurt the tail FCT

[Figure: timeout probability of flows of different sizes traversing a path with varying packet loss rate; curves: 10KB (testbed), 100KB (testbed), 10KB (analysis), 100KB (analysis). 99th FCT > RTO; 3% loss -> ~10% timeout.]

1. 1% loss -> more than 1% of flows time out
2. For larger flows (e.g. 100KB), the timeout ratio grows sharply once the loss rate exceeds 1%

To avoid RTO, loss must be recovered earlier.
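Before moving on, a rough Monte Carlo sketch in the spirit of the "analysis" curves above, under strongly simplified assumptions (independent drops at rate p; a loss is recoverable by fast retransmit only if at least 3 later packets of the same flow survive to generate DupACKs; everything else times out). This model and all names in it are ours, not the paper's:

```python
import random

def times_out(num_pkts: int, p: float) -> bool:
    """Simplified model: a dropped packet is recovered by fast retransmit
    only if at least 3 packets sent after it arrive to trigger DupACKs."""
    drops = [random.random() < p for _ in range(num_pkts)]
    for i, dropped in enumerate(drops):
        if dropped:
            dupacks = sum(1 for d in drops[i + 1:] if not d)
            if dupacks < 3:
                return True  # not enough DupACKs -> fall back to RTO
    return False

def timeout_prob(flow_bytes: int, p: float, trials: int = 20_000) -> float:
    pkts = max(1, flow_bytes // 1460)  # ~MSS-sized packets
    return sum(times_out(pkts, p) for _ in range(trials)) / trials

for loss in (0.01, 0.03):
    print(f"p={loss:.0%}: 10KB -> {timeout_prob(10_000, loss):.1%}, "
          f"100KB -> {timeout_prob(100_000, loss):.1%}")
```

Even this crude model reproduces the direction of the slide's findings: at 1% loss a few percent of flows time out, because any drop near the tail of a flow cannot gather 3 DupACKs.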

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

Challenge for TCP Loss Recovery

- Prior works add aggressiveness to congestion control to recover losses before the timeout (RTO)
  - Tail Loss Probe (TLP) [SIGCOMM'13, RFC5827]
    - Transmit one probe after 2 RTT
  - Instant Recovery (TCP-IR) [SIGCOMM'13, RFC5827]
    - Generate an FEC packet for every group of packets (up to 16)
    - FEC packets also act as probes, delayed 1/4 RTT before being sent
  - Proactive / RepFlow [SIGCOMM'13, INFOCOM'14]
    - Duplicate every packet / flow
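As an illustration of the TCP-IR idea (our sketch, not the paper's or the kernel's code): one XOR parity packet over a group lets the receiver rebuild any single lost packet in that group.

```python
from functools import reduce

def xor_parity(packets: list[bytes]) -> bytes:
    """One FEC packet = byte-wise XOR of every packet in the group
    (packets padded to equal length)."""
    size = max(len(p) for p in packets)
    padded = [p.ljust(size, b"\x00") for p in packets]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), padded)

# Sender: a group of up to 16 packets plus one parity packet.
group = [b"pkt-A", b"pkt-B", b"pkt-C"]
parity = xor_parity(group)

# Receiver: "pkt-B" is lost; XOR of the survivors and the parity rebuilds it.
received = [group[0], group[2]]
recovered = xor_parity(received + [parity])
assert recovered.rstrip(b"\x00") == b"pkt-B"
```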

Challenge for TCP Loss Recovery

- How long to wait before sending recovery packets?
  - For congestion loss: should delay long enough to avoid worsening the congestion
    - Bursty: leads to multiple consecutive losses [Incast (WREN'09), DCTCP (SIGCOMM'10)]
  - For failure loss such as random drop: should recover as fast as possible, otherwise the FCT has already increased
    - Waiting 2 RTT is too costly; accurate, high-precision RTT measurement is challenging [TLP SIGCOMM'13, RFC5827]

How can loss recovery be accelerated as much as possible, under various loss conditions, without causing congestion?

Brief Summary

- Loss easily incurs timeouts that hurt the tail
- To prevent timeouts, prior works add a fixed amount of aggressiveness to recover losses before the timeout
- Hard to adapt to various loss conditions
  - Should be fast for failure loss
  - Should be cautious for congestion loss

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

FUSO: Fast Multi-path Loss Recovery

- Utilize the "good" paths to proactively conduct loss recovery for the "bad" paths
  - Leveraging path diversity (multiple paths; only a few encounter loss)
- Fast and Cautious
  - Fast: proactive (immediate) recovery of potential packet loss, utilizing spare transmission opportunities
  - Cautious: strictly follow congestion control without adding aggressiveness

Multi-path Transport Background

[Figure: sender and receiver connected by multiple paths. The connection is split into sub-flows SF1-SF3, which implicitly/explicitly map to physical paths. Multi-path congestion control maintains a per-sub-flow window (CWND1, CWND2, CWND3) under a total CWNDtotal, and data distribution assigns packets to sub-flows. A toy sketch of these structures follows.]
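A toy data-structure sketch of this background, used again in later examples (illustrative only; names such as SubFlow are ours, not from the paper or any MPTCP implementation):

```python
from dataclasses import dataclass, field

@dataclass
class SubFlow:
    """One sub-flow, implicitly mapped to a physical path (e.g., via ECMP)."""
    name: str
    cwnd: int                       # packets this sub-flow may have in flight
    in_flight: list[int] = field(default_factory=list)  # un-ACKed seq numbers

    def spare_cwnd(self) -> int:
        return self.cwnd - len(self.in_flight)

@dataclass
class MultipathConnection:
    subflows: list[SubFlow]

    @property
    def cwnd_total(self) -> int:
        # Coupled congestion control keeps the sum of sub-flow windows
        # comparable to a single-path flow's window.
        return sum(sf.cwnd for sf in self.subflows)

conn = MultipathConnection([SubFlow("SF1", 4), SubFlow("SF2", 4), SubFlow("SF3", 4)])
print(conn.cwnd_total)  # 12
```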

FUSO

[Animation, walking through one FUSO transfer: the sender and receiver are connected by sub-flows SF1-SF3, each with its own window (CWND1-CWND3) under CWNDtotal. Packets P1-P5 are distributed across the sub-flows by the multi-path congestion control. P2 is lost in the network, while P1, P3, P4 and P5 are delivered and ACKed. The sender is now left with spare CWND and no new application data to send, so FUSO performs proactive loss recovery: the sub-flow still holding un-ACKed data (P2's) is the "worst" sub-flow, a sub-flow with spare window is the "best" sub-flow, and a recovery copy of P2 is immediately sent on the best sub-flow. The copy arrives and the transfer completes ("Done!") without waiting for any timeout.]

Standard MPTCP

[Animation, same scenario: standard MPTCP retransmits the lost P2 on its original sub-flow only after an RTO.]

FUSO: Path Selection

- "Worst" sub-flow: has un-ACKed data; most likely to have encountered loss
- "Best" sub-flow: has spare CWND; least likely to have encountered loss

[Figure: sender with sub-flows SF1-SF3; P2 is un-ACKed (lost) on the "worst" sub-flow, while the "best" sub-flow has spare CWND. A sketch of this selection is given below.]
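A minimal sketch of the selection rule, reusing the toy SubFlow/MultipathConnection classes from the background section (the tie-break heuristics here are our assumptions; the paper's actual policy may differ):

```python
def pick_worst(conn: MultipathConnection) -> SubFlow | None:
    """Sub-flow with un-ACKed data, i.e., most likely holding a lost packet."""
    candidates = [sf for sf in conn.subflows if sf.in_flight]
    # Assumed tie-break: the more un-ACKed data, the more suspect the path.
    return max(candidates, key=lambda sf: len(sf.in_flight), default=None)

def pick_best(conn: MultipathConnection) -> SubFlow | None:
    """Sub-flow with spare window, i.e., least likely lossy or congested."""
    candidates = [sf for sf in conn.subflows if sf.spare_cwnd() > 0]
    return max(candidates, key=lambda sf: sf.spare_cwnd(), default=None)
```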

FUSO in 1 Slide

- If (spare CWND) && (no new data)
  - Utilize the transmission opportunity to proactively recover
  - Use "good" paths to help "bad" paths
- Multi-path diversity offers many transmission opportunities
  - "Good" paths have spare window

[Figure: app data flows through multipath congestion control onto Sub-Flow 1..N; un-ACKed packets on a lossy sub-flow are re-sent as recovery packets (R) on the best sub-flow, which has spare window. A code sketch follows below.]
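Putting the trigger condition and the path selection together, a hedged end-to-end sketch (again built on the toy classes above; fuso_try_recover is an invented name, not the kernel implementation linked below):

```python
def fuso_try_recover(conn: MultipathConnection, has_new_data: bool) -> None:
    """FUSO's core rule: only when congestion control grants a send
    opportunity (spare CWND) and the app has nothing new to send, spend
    that opportunity on proactive recovery. No added aggressiveness:
    the recovery packet still consumes the best sub-flow's window."""
    if has_new_data:
        return  # new application data always has priority
    best = pick_best(conn)
    worst = pick_worst(conn)
    if best is None or worst is None or best is worst:
        return  # no spare window, nothing to recover, or no distinct path
        # (assumption: prefer recovering over a *different* path)
    seq = worst.in_flight[0]    # oldest un-ACKed packet: the most suspect
    best.in_flight.append(seq)  # recovery copy occupies the best sub-flow's CWND
    print(f"proactively re-sent P{seq} from {worst.name} via {best.name}")
```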

FUSO Implementation

- Implemented in the Linux kernel; ~900 lines of code
- https://github.com/1989chenguo/FUSO

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

Testbed Settings

- Network
  - 1Gbps fabric & 1Gbps hosts; ECMP routing; ECN enabled
- TCP
  - init_cwnd = 16; min_RTO = 5ms

Testbed Results

- Failure loss: random drop, loss rate 0.125%-4%; latency-sensitive flows
- Fast: reduces the 99th FCT by up to ~82.3% and the number of timeout flows by up to 100%

[Figure: 99th FCT and % of flows encountering timeout vs. loss rate (lower is better).]

Testbed Results

- Congestion loss: incast (concurrent responses)
- Cautious: FUSO performs the best

[Figure: incast results (lower is better).]

Testbed Results

- Failure loss & congestion loss: traffic shifts from failure-loss-dominated to congestion-loss-dominated (loss rate 2%; latency-sensitive flows plus background long flows)
- FUSO adapts to the various loss conditions

[Figure: FCT across the mix (lower is better).]

Larger-scale Simulations

- Simulation settings
  - NS2 simulator; 3-layer, 4-port FatTree fabric
  - 40Gbps fabric, 10Gbps hosts; 64 hosts, 20 switches
  - Empirical failure generation

[Figure: FCT of latency-sensitive flows and background long flows under random failures (lower is better).]

- Reduces the average FCT by up to ~60.3%
- Reduces the 99th FCT by up to ~87.4%

Outline

- Motivation
- Packet Loss in DCN
- Impact of Packet Loss
- Challenge for Loss Recovery
- FUSO Design
- Evaluation
- Summary

Summary

- Loss hurts tail latency
  - Loss is not uncommon
  - A little loss leads to enough timeouts to hurt the tail
- Challenge for loss recovery
  - How to accelerate loss recovery under various loss conditions without causing congestion?
- Philosophy of FUSO
  - Being fast and being cautious are equally important
  - Fast: proactive loss recovery utilizing spare transmission opportunities, leveraging multi-path diversity
  - Cautious: strictly follows congestion control without adding aggressiveness

Q&A?

Thanks
