Experience with Loss-Based Congestion Controlled TCP Stacks, Yee-Ting Li, University College London


Post on 03-Jan-2016


Page 1: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Experience with Loss-Based Congestion Controlled TCP Stacks

Yee-Ting Li

University College London

Page 2: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Introduction

Transport of data for next generation applications
Network hardware is capable of Gigabits per second
Current ‘Vanilla’ TCP is not capable over long distances and high throughputs
New TCP stacks have been introduced to rectify the problem
Investigation into the performance, bottlenecks and deploy-ability of new algorithms

Page 3: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Transmission Control Protocol

Connection orientated
Reliable transport of data
Window based
Congestion and flow control to prevent network collapse
Provides ‘fairness’ between competing streams
20 years old
Originally designed for kbit/sec pipes

Page 4: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

TCP Algorithms

Based on two algorithms to determine the rate at which data is to be sent
  Slowstart: probe for initial bandwidth
  Congestion Avoidance: maintain a steady state transfer rate
Focus on Steady State: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss).
Maintained through a ‘congestion window’ cwnd that regulates the number of unacknowledged packets allowed on connection.
Size of window approx equals Bandwidth delay product
  Determines the appropriate window size to set to obtain a bandwidth under a certain delay
  Window = Bandwidth x Delay
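As a worked example of Window = Bandwidth x Delay, the sketch below computes the window needed to fill a 1 Gb/sec path at 120 msec RTT. The 1500-byte packet size and the function name `window_size` are assumptions made for illustration; they do not come from the slides.

```python
def window_size(bandwidth_bps: float, rtt_s: float, packet_bytes: int = 1500):
    """Return the bandwidth-delay product in bytes and in packets.

    packet_bytes=1500 is an assumed packet size, not a value from the slides.
    """
    window_bytes = bandwidth_bps * rtt_s / 8  # convert bits to bytes
    return window_bytes, window_bytes / packet_bytes


# 1 Gb/sec bottleneck with a 120 msec round-trip time
bdp_bytes, bdp_packets = window_size(1e9, 0.120)
print(f"window of {bdp_bytes / 1e6:.1f} MB, about {bdp_packets:.0f} packets")
```

Under these assumptions the window is on the order of ten thousand packets, which is the regime where the slow growth of cwnd becomes the limiting factor.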

Page 5: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Algorithms

Congestion Avoidance
  For every packet (ack) received by sender
    $\mathit{cwnd} \leftarrow \mathit{cwnd} + 1/\mathit{cwnd}$
  When loss is detected (through dupacks)
    $\mathit{cwnd} \leftarrow \mathit{cwnd} / 2$
Growth of cwnd determined by:
  The RTT of the connection
    When RTT is high, cwnd grows slowly (because of acking)
  The loss rate on the line
    High loss means that cwnd never achieves a large value
  Capacity of the link
    Allows for large cwnd value (when low loss)
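The two update rules above can be sketched as follows to show why recovery after a single loss is slow at large windows. This is a simplified model, assuming one ack per packet and exactly one window of acks per round trip; the function names are illustrative only.

```python
def on_ack(cwnd: float) -> float:
    """Congestion avoidance increase: cwnd <- cwnd + 1/cwnd per ack."""
    return cwnd + 1.0 / cwnd


def on_loss(cwnd: float) -> float:
    """Multiplicative decrease on loss: cwnd <- cwnd / 2."""
    return cwnd / 2.0


def rtts_to_recover(target: float) -> int:
    """Count round trips needed to grow from target/2 back to target.

    Assumes one ack per packet and int(cwnd) acks per round trip.
    """
    cwnd = on_loss(target)
    rtts = 0
    while cwnd < target:
        for _ in range(int(cwnd)):
            cwnd = on_ack(cwnd)
        rtts += 1
    return rtts


rtts = rtts_to_recover(1000)
print(f"{rtts} round trips to recover a 1000-packet window")
```

Because each round trip adds roughly one packet to cwnd, recovery takes about half the window size in round trips, so the cost of one loss grows with both the window and the RTT.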

Page 6: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Current Methods of Achieving High Throughput

Advantages
  Achieves good throughput
  No changes to kernels required
Disadvantages
  Have to manually tune the number of flows
  May induce extra loss on lossy networks
  Need to reprogram/recompile software

Page 7: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

New TCP Stacks

Modify the congestion control algorithm to improve response times
All based on modifying the cwnd growth and decrease values
Define:
  a = increase of data packets per window of acks
  b = decrease factor upon congestion
To maintain compatibility (and hence network stability and fairness), for small cwnd values:
  Mode switch from Vanilla to New TCP

Page 8: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

HSTCP

Designed by Sally Floyd
Determine a and b as a function of cwnd
  $a \rightarrow a(\mathit{cwnd})$
  $b \rightarrow b(\mathit{cwnd})$
Gradual improvement in throughput as we approach larger bandwidth delay products
Current implementation focused on performance up to 10Gb/sec; sets a linear relation between loss and throughput (response function)

Page 9: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Scalable TCP

Designed by Tom Kelly
Define a and b to be constant:
  a: $\mathit{cwnd} \leftarrow \mathit{cwnd} + a$ (per ack)
  b: $\mathit{cwnd} \leftarrow \mathit{cwnd} - b \times \mathit{cwnd}$
Intrinsic scaling property that has the same performance over any link (beyond the initial threshold)
Recommended settings
  a = 1/100
  b = 1/8
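The scaling property can be illustrated with a short sketch using the recommended a = 1/100 and b = 1/8. It assumes one ack per packet and one window of acks per round trip, and it ignores the initial threshold below which the stack behaves like Vanilla TCP; the function names are illustrative.

```python
A = 1 / 100  # increase per ack (recommended setting from the slide)
B = 1 / 8    # decrease factor on congestion (recommended setting from the slide)


def scalable_on_ack(cwnd: float) -> float:
    """cwnd <- cwnd + a, applied once per ack."""
    return cwnd + A


def scalable_on_loss(cwnd: float) -> float:
    """cwnd <- cwnd - b * cwnd."""
    return cwnd - B * cwnd


def rtts_to_recover(target: float) -> int:
    """Round trips to climb back to target after one loss.

    Assumes int(cwnd) acks arrive per round trip, each adding A.
    """
    cwnd = scalable_on_loss(target)
    rtts = 0
    while cwnd < target:
        cwnd += A * int(cwnd)
        rtts += 1
    return rtts


for window in (100, 10_000):
    print(window, rtts_to_recover(window))
```

Each round trip multiplies cwnd by roughly 1.01, so climbing back from 7/8 of the window takes about ln(8/7)/ln(1.01), or roughly 13 to 14 round trips, whatever the window size. Vanilla TCP, by contrast, needs about half the window in round trips.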

Page 10: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

H-TCP

Designed by Doug Leith and Robert Shorten
Define a mode switch so that after congestion we do normal Vanilla
After a predefined period $\Delta_L$, switch to a high performance a
  $\Delta_i \le \Delta_L$: $a = 1$
  $\Delta_i > \Delta_L$: $a = 1 + (\Delta_i - \Delta_L) + [(\Delta_i - \Delta_L)/20]^2$
Upon loss drop by
  $|[B_i^{max}(k+1) - B_i^{max}(k)] / B_i^{max}(k)| > 0.2$: $b = 0.5$
  Else: $b = RTT_{min}/RTT_{max}$
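The H-TCP rules on this slide can be written as two small functions. This is a sketch of the formulas exactly as they appear on the slide, not of any particular kernel implementation. The units of the elapsed time since the last congestion event and all function and parameter names are assumptions.

```python
def htcp_increase(delta: float, delta_l: float) -> float:
    """Increase parameter a as a function of time since the last congestion event.

    delta   : time elapsed since the last congestion event (units assumed)
    delta_l : predefined period during which the flow behaves like Vanilla TCP
    """
    if delta <= delta_l:
        return 1.0  # Vanilla behaviour
    d = delta - delta_l
    return 1.0 + d + (d / 20.0) ** 2


def htcp_backoff(b_max_prev: float, b_max_new: float,
                 rtt_min: float, rtt_max: float) -> float:
    """Decrease parameter b chosen at a congestion event.

    b_max_prev, b_max_new: maximum throughput measured just before the
    previous and the current congestion events, B_max(k) and B_max(k+1).
    """
    relative_change = abs((b_max_new - b_max_prev) / b_max_prev)
    if relative_change > 0.2:
        return 0.5  # throughput changed a lot: fall back to halving
    return rtt_min / rtt_max


print(htcp_increase(0.5, 1.0), htcp_increase(21.0, 1.0))
print(htcp_backoff(100.0, 130.0, 0.100, 0.125),
      htcp_backoff(100.0, 110.0, 0.100, 0.125))
```

The increase grows with the time since the last loss, so a flow that has run loss-free for a long time probes for bandwidth progressively harder, while a recently congested flow stays compatible with Vanilla TCP.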

Page 11: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Implementation

All New Stacks have own implementation
  Small differences between implementations mean that we are comparing the kernel differences rather than just the algorithmic differences
Led to development of ‘test platform’ kernel altAIMD
  Implements all three stacks via simple sysctl switch.
  Also incorporates switches for certain undesirable kernel ‘features’
    moderate_cwnd()
    IFQ
  Added extra features for testing/evaluation purposes
    Appropriate Byte Counting (RFC3465)
    Inducible packet loss (at recv)
    Web100 TCP logging (cwnd etc)
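For readers unfamiliar with Appropriate Byte Counting, the sketch below shows its congestion avoidance rule as described in RFC 3465: cwnd grows according to the number of bytes acknowledged rather than the number of acks received. It is not the altAIMD code; the class name and the 1460-byte segment size are assumptions.

```python
SMSS = 1460  # assumed sender maximum segment size in bytes


class AbcState:
    """Congestion avoidance with byte counting (cwnd held in bytes)."""

    def __init__(self, cwnd_bytes: int):
        self.cwnd = cwnd_bytes
        self.bytes_acked = 0

    def on_ack(self, newly_acked_bytes: int) -> None:
        # Accumulate acknowledged bytes; once a full window's worth has been
        # acknowledged, grow cwnd by one segment.
        self.bytes_acked += newly_acked_bytes
        if self.bytes_acked >= self.cwnd:
            self.bytes_acked -= self.cwnd
            self.cwnd += SMSS


state = AbcState(10 * SMSS)
for _ in range(5):             # five delayed acks, each covering two segments
    state.on_ack(2 * SMSS)
print(state.cwnd // SMSS)      # window in segments after one round trip
```

With ack counting, a receiver that acknowledges every second segment would halve the growth rate; byte counting removes that dependence on the receiver's acking policy.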

Page 12: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Networks Under Test

[Figure: diagrams of the two test networks, built from Cisco 7600 and Juniper routers]
DataTAG (CERN to StarLight): Bottleneck Capacity 1Gb/sec, RTT 120msec
MB-NG (Manchester to UCL): Bottleneck Capacity 1Gb/sec, RTT 6msec

Page 13: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Graph/Demo

Mode switch between stacks on constant packet drop

[Figure: trace with three regions labelled Vanilla TCP, Scalable TCP and HS-TCP]

Page 14: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Comparison against theory

Response function

Page 15: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Self Similar Background Tests

Results skewed
  Not comparing differences in TCP algorithms!
  Not useful results!

Page 16: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

SACK …

Look into what’s happening at the algorithmic level:
Strange hiccups in cwnd; only correlation is SACK arrivals

[Figure: Scalable TCP on MB-NG with 200mbit/sec CBR Background]

Page 17: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

SACKS

Supplies the sender information about what segments the recv has
  Sender infers the missing packets to resend
  Aids recovery during loss and prevents timeouts
Current implementation in Linux 2.4 and 2.6 does a walk through the entire sack list for each SACK
  Very cpu intensive
  Can be interrupted by arrival of next SACK which causes the SACK implementation to misbehave
Tests conducted with Tom Kelly’s SACK fast-path patch
  Improves SACK processing, but still not sufficient

Page 18: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

SACK Processing overhead

Periods of web100 silence due to high cpu utilization
Logging done in userspace; kernel time taken up by tcp sack processing
TCP resets cwnd

Page 19: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Congestion Window Moderation

Linux TCP implementation adds ‘feature’ of moderate_cwnd()
Idea is to prevent large bursts of data packets under ‘dubious’ conditions
  When an ACK acknowledges more than 3 packets (typically 2)
  Adjusts cwnd to known number of packets ‘in-flight’ (plus extra 3 packets)
Under large cwnd sizes (high bandwidth delay products), throughput can be diminished as result
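The moderation rule described above amounts to clamping cwnd to the number of packets in flight plus a small burst allowance. The sketch below models that rule in Python for illustration; it is not the kernel code, and the constant and parameter names are assumptions.

```python
MAX_BURST = 3  # the "plus extra 3 packets" allowance from the slide


def moderate_cwnd(cwnd: int, packets_in_flight: int) -> int:
    """Clamp cwnd so that at most MAX_BURST packets can be sent back to back."""
    return min(cwnd, packets_in_flight + MAX_BURST)


# A large window that is only partly filled when a 'dubious' ACK arrives
print(moderate_cwnd(cwnd=10_000, packets_in_flight=6_000))

# A small window is barely affected
print(moderate_cwnd(cwnd=20, packets_in_flight=18))
```

At large windows the clamp can discard thousands of packets of window in one step, which then has to be regrown at the congestion avoidance rate. This is why the effect matters mainly on high bandwidth delay product paths.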

Page 20: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

CPU Load and Throughput

Page 21: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

moderate_cwnd(): Vanilla TCP

[Figure: CWND and Throughput with moderate_cwnd OFF and moderate_cwnd ON; panel labelled 90% TCP AF]

Page 22: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

moderate_cwnd(): HS-TCP

[Figure: moderate_cwnd OFF and moderate_cwnd ON; panels labelled 70% TCP AF and 90% TCP AF]

Page 23: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

moderate_cwnd(): Scalable-TCP

[Figure: moderate_cwnd OFF and moderate_cwnd ON; panels labelled 70% TCP AF and 90% TCP AF]

Page 24: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Multiple Streams

[Figure: plots of Aggregate BW and CoV]

Page 25: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

10 TCP Flows versus Self-Similar Background

[Figure: plots of Aggregate BW and CoV]

Page 26: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

10 TCP Flows versus Self-Similar Background

[Figure: plot of BG Loss per TCP BW]

Page 27: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Impact

Fairness: ratio of throughput achieved by one stack against another
  Means that fairness against vanilla tcp is defined by how much more throughput a new stack gets than vanilla
  Doesn’t really consider deploy-ability of the stacks in real life: how do these stacks affect the existing traffic? (mostly vanilla tcp)
Redefine fairness in terms of the Impact:
  Consider the effect on the background traffic only under different stacks
  Vary against number of TCP Flows to determine impact (vanilla flows)

$\text{BW impact} = \dfrac{\text{throughput of } n \text{ Vanilla flows}}{\text{throughput of } (n-1) \text{ Vanilla flows} + 1 \text{ new TCP flow}}$
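The BW impact ratio can be computed as below. This is a sketch of one reading of the slide, taking the numerator as the throughput measured with n Vanilla flows and the denominator as the throughput measured when one of those flows is replaced by a new-stack flow. The function name, argument names and example figures are hypothetical, not measurements from the talk.

```python
def bw_impact(throughput_n_vanilla: float,
              throughput_mixed: float) -> float:
    """Ratio of throughput with n Vanilla flows to throughput with
    (n-1) Vanilla flows + 1 new TCP flow, both in the same units."""
    if throughput_mixed <= 0:
        raise ValueError("throughput_mixed must be positive")
    return throughput_n_vanilla / throughput_mixed


# Hypothetical throughputs in Mbit/sec, for illustration only
print(round(bw_impact(900.0, 950.0), 3))
```

A value of 1 means the two measurements are identical; how far the ratio departs from 1 as n varies is what the impact plots on the following slides report.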

Page 28: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Impact of 1 TCP Flow

[Figure: plots of Throughput Impact and Throughput]

Page 29: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

1 New TCP Impact

[Figure: plot of CoV]

Page 30: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Impact of 10 TCP Flows

[Figure: plots of Throughput Impact and Throughput]

Page 31: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

10 TCP Flows Impact

[Figure: plot of CoV]

Page 32: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

WAN Tests

Page 33: Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London

Summary

Comparison of actual TCP differences through test platform kernel
Problems with SACK implementations mean that it is difficult under loss to maintain high throughput (>500Mbit/sec)
Other problems exist with kernel implementation that hinder performance
Compare stacks under different artificial (and hence repeatable) conditions
  Single stream:
  Multiple stream:
Need to study over wider range of networks
Move tests onto real production environments