Recent Linux TCP Updates, and how to tune your 100G host
Brian Tierney and Nate Hanford, ESnet
[email protected]
http://fasterdata.es.net
SC16 INDIS Workshop
November 13, 2016



Observation #1

• TCP is more stable in CentOS 7 vs. CentOS 6
  – Throughput ramps up much quicker
    • More aggressive slow start
  – Less variability over life of the flow

[Figure: Berkeley to Amsterdam]

[Figure: New York to Texas]

Observation #2

• Turning on FQ helps throughput even more
  – TCP is even more stable
  – Works better with small buffer devices
• Pacing to match bottleneck link works better yet

TCP option: Fair Queuing Scheduler (FQ)

Available in Linux kernel 3.11 (released late 2013) or higher
  – Available in Fedora 20, Debian 8, and Ubuntu 13.10
  – Backported to 3.10.0-327 kernel in v7.2 CentOS/RHEL (Dec 2015)

To enable Fair Queuing (which is off by default), do:
  – tc qdisc add dev $ETH root fq
Or add this to /etc/sysctl.conf:
    net.core.default_qdisc = fq

To both pace and shape the traffic:
  – tc qdisc add dev $ETH root fq maxrate Ngbit
    • Can reliably pace up to a maxrate of 32 Gbps on a fast processor

Can also do application pacing using a ‘setsockopt(SO_MAX_PACING_RATE)’ system call
  – iperf3 supports this via the ‘--bandwidth’ flag
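The application pacing call mentioned above can be sketched in Python. This is a minimal illustration, not code from the talk: the helper names are made up for the example, the numeric fallback 47 is the value of SO_MAX_PACING_RATE in the generic Linux socket header (it differs on a few architectures), and the option takes a rate in bytes per second. Pacing is then carried out by the FQ qdisc, or by TCP itself on newer kernels.

```python
import socket

# Linux value of SO_MAX_PACING_RATE from asm-generic/socket.h. Python's
# socket module may not export this constant, so fall back to the number.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)


def gbps_to_bytes_per_sec(gbps):
    """Convert a rate in Gbit/s to the bytes-per-second unit the option takes."""
    return int(gbps * 1_000_000_000 / 8)


def set_max_pacing_rate(sock, gbps):
    """Ask the kernel to pace this socket at no more than `gbps` Gbit/s."""
    rate = gbps_to_bytes_per_sec(gbps)
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate)
    return rate


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("pacing rate set to", set_max_pacing_rate(s, 10), "bytes/s")
    except OSError as exc:
        # Non-Linux systems and old kernels reject the option.
        print("SO_MAX_PACING_RATE not supported here:", exc)
    finally:
        s.close()
```

Passing the rate as a plain Python integer sends it as a signed C int, so this form is only usable up to about 17 Gbit/s; higher rates would need the value packed explicitly.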

[Figure: New York to Texas, with pacing]

FQ packets are much more evenly spaced

[Figure: tcptrace/xplot output, FQ on left, standard TCP on right]

100G Host Tuning

Test Environment

• Hosts:
  – Supermicro X10DRi DTNs
  – Intel Xeon E5-2643 v3, 2 sockets, 6 cores each
  – CentOS 7.2 running kernel 3.10.0-327.el7.x86_64
  – Mellanox ConnectX-4 EN/VPI 100G NICs with ports in EN mode
  – Mellanox OFED driver 3.3-1.0.4 (03 Jul 2016), firmware 12.16.1020
• Topology:
  – Both systems connected to Dell Z9100 100 Gbps ON top-of-rack switch
  – Uplink to nersc-tb1 ALU SR7750 router running 100G loop to StarLight and back
    • 92 ms RTT
  – Using tagged 802.1q to switch between loop and local VLANs
  – LAN had 54 usec RTT
• Configuration:
  – MTU was 9000 B
  – irqbalance, tuned, and numad were off
  – Core affinity was set to cores 7 and 8 (on the NUMA node closest to the NIC)
  – All tests are IPv4 unless otherwise stated

Testbed Topology

[Figure: testbed topology diagram. Labels: hosts nersc-tbn-4 and nersc-tbn-5; Dell Z9100; nersc-tb1 and star-tb1; Alcatel 7750 Router (two); Oakland, CA; StarLight (Chicago); 100G loop: RTT = 92 ms; 100G and 40G links.]

Each host has:
• Mellanox ConnectX-4 (100G)
• Mellanox ConnectX-3 (40G)

Our Current Best Single Flow Results

• TCP
  – LAN: 79 Gbps
  – WAN (RTT = 92 ms): 36.5 Gbps, 49 Gbps using ‘sendfile’ API (‘zero-copy’)
  – Test commands:
    • LAN: nuttcp -i1 -xc 7/7 -w1m -T30 hostname
    • WAN: nuttcp -i1 -xc 7/7 -w900M -T30 hostname
• UDP:
  – LAN and WAN: 33 Gbps
  – Test command: nuttcp -l8972 -T30 -u -w4m -Ru -i1 -xc 7/7 hostname

Others have reported up to 85 Gbps LAN performance with similar hardware

CPU governor

Linux CPU governor (P-States) setting makes a big difference:

    RHEL:   cpupower frequency-set -g performance
    Debian: cpufreq-set -r -g performance

57 Gbps with default settings (powersave) vs. 79 Gbps in ‘performance’ mode on the LAN

To watch the CPU governor in action:

    watch -n 1 grep MHz /proc/cpuinfo

    cpu MHz : 1281.109
    cpu MHz : 1199.960
    cpu MHz : 1299.968
    cpu MHz : 1199.960
    cpu MHz : 1291.601
    cpu MHz : 3700.000
    cpu MHz : 2295.796
    cpu MHz : 1381.250
    cpu MHz : 1778.492
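The same check can be scripted when a summary is more useful than a scrolling list. The sketch below is illustrative only: the function names are invented for this example, and it assumes the ‘cpu MHz’ line format that x86 Linux kernels write to /proc/cpuinfo.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' values found in /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.M)
    return [float(m) for m in matches]


def summarize(mhz_values):
    """Return (min, max) clock speed in MHz, or None if nothing was parsed."""
    if not mhz_values:
        return None
    return min(mhz_values), max(mhz_values)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(summarize(parse_cpu_mhz(f.read())))
    except FileNotFoundError:
        print("no /proc/cpuinfo on this system")
```

A wide gap between the minimum and maximum, as in the sample output above, suggests the cores are not being held at full clock speed.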

TCP Buffers

    # add to /etc/sysctl.conf
    # allow testing with 2GB buffers
    net.core.rmem_max = 2147483647
    net.core.wmem_max = 2147483647
    # allow auto-tuning up to 2GB buffers
    net.ipv4.tcp_rmem = 4096 87380 2147483647
    net.ipv4.tcp_wmem = 4096 65536 2147483647

2 GB is the max allowable under Linux
WAN BDP = 12.5 GB/s * 92 ms = 1150 MB (autotuning set this to 1136 MB)
LAN BDP = 12.5 GB/s * 54 us = 675 KB (autotuning set this to 2-9 MB)
Manual buffer tuning made a big difference on the LAN:
  – 50-60 Gbps vs. 79 Gbps
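The bandwidth-delay product figures above can be reproduced with a few lines of arithmetic. This is a worked example only; the function and constant names are chosen for illustration, and decimal units (1 MB = 10^6 bytes) are assumed to match the slide.

```python
def bdp_bytes(rate_bits_per_sec, rtt_microseconds):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return rate_bits_per_sec * rtt_microseconds // (8 * 1_000_000)


LINK_RATE = 100_000_000_000       # 100 Gbit/s, i.e. 12.5 GB/s
LINUX_MAX_BUFFER = 2_147_483_647  # the 2 GB ceiling used in the sysctl settings

wan_bdp = bdp_bytes(LINK_RATE, 92_000)  # 92 ms RTT
lan_bdp = bdp_bytes(LINK_RATE, 54)      # 54 us RTT

print(f"WAN BDP: {wan_bdp / 1e6:.0f} MB")  # WAN BDP: 1150 MB
print(f"LAN BDP: {lan_bdp / 1e3:.0f} KB")  # LAN BDP: 675 KB
print("WAN BDP fits under the 2 GB limit:", wan_bdp <= LINUX_MAX_BUFFER)
```

A single flow needs a buffer of at least one BDP to fill the path, which is why the 92 ms WAN path needs a buffer above 1 GB while the LAN BDP is well under 1 MB.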

zerocopy (sendfile) results

• iperf3 -Z option
• No significant difference on the LAN
• Significant improvement on the WAN
  – 36.5 Gbps vs. 49 Gbps

IPv4 vs. IPv6 results

• IPv6 is slightly faster on the WAN, slightly slower on the LAN
• LAN:
  – IPv4: 79 Gbps
  – IPv6: 77.2 Gbps
• WAN:
  – IPv4: 36.5 Gbps
  – IPv6: 37.3 Gbps

Don’t Forget about NUMA Issues

• Up to 2x performance difference if you use the wrong core.
• If you have a 2 CPU socket NUMA host, be sure to:
  – Turn off irqbalance
  – Figure out what socket your NIC is connected to:
        cat /sys/class/net/ethN/device/numa_node
  – Run Mellanox IRQ script:
        /usr/sbin/set_irq_affinity_bynode.sh 1 ethN
  – Bind your program to the same CPU socket as the NIC:
        numactl -N 1 program_name
• Which cores belong to a NUMA socket?
  – cat /sys/devices/system/node/node0/cpulist
  – (note: on some Dell servers, that might be: 0,2,4,6,...)
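The cpulist file uses a compact format that mixes ranges and single values, which is why the contiguous and interleaved layouts mentioned above look so different. The following sketch expands such a string into explicit core numbers. The function names are illustrative, and it assumes only the plain ‘a-b’ and ‘a,b,c’ forms.

```python
def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Read and expand the cpulist of one NUMA node (Linux only)."""
    with open(f"{sysfs_root}/node{node}/cpulist") as f:
        return parse_cpulist(f.read())
```

On Linux, passing the result to os.sched_setaffinity(0, node_cpus(1)) would restrict the current process to those cores. Unlike numactl, that call sets CPU affinity only and does not change memory placement policy.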

Settings to leave alone in CentOS 7

Recommend leaving these at the default settings, and none of these seem to impact performance much

• Interrupt Coalescence
• Ring Buffer size
• LRO (off) and GRO (on)
• net.core.netdev_max_backlog
• txqueuelen
• tcp_timestamps

Tool Selection

• iperf3, nuttcp, and iperf2 have different strengths.
• nuttcp is about 10% faster on LAN tests, and has lots of cool options.
• iperf3 has nice retransmit/congestion window report, supports FQ pacing, and JSON output option is great for producing plots
• iperf2 is multi-threaded, and better for parallel stream testing
• Use all! All are part of the ‘perfsonar-tools’ package
  – Installation instructions at: http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/

BIOS Settings

• DCA/IOAT/DDIO: ON
  – Allows the NIC to directly address the cache in DMA transfers
• PCIe Max Read Request: Turn it up to 4096, but our results suggest it doesn’t seem to hurt or help
• Turbo boost: ON
• Hyperthreading: OFF
  – Added excessive variability in LAN performance (51G to 77G)

FQ on 100G Hosts

100G Host, Parallel Streams: no pacing vs. 20G pacing

[Figure: parallel streams, no pacing vs. 20G pacing]

We also see consistent loss on the LAN with 4 streams, no pacing
Packet loss due to small buffers in Dell Z9100 switch?

[Figure: 100G Host to 10G Host]

FQ as a fix for underbuffered devices

Finding the optimal sending rate

• If iperf3 tests show lots of retransmits, try gradually reducing the send rate
      bwctl -c bwctl100g.sc16.org
      bwctl -c bwctl100g.sc16.org -b 20G
      bwctl -c bwctl100g.sc16.org -b 15G
• Then configure your host to use that as a max send rate:
      /sbin/tc qdisc add dev eth1 root fq maxrate 15gbit

Summary of our 100G results

• New enhancements to the Linux kernel make tuning easier in general.
• A few of the standard 10G tuning knobs no longer apply
• TCP buffer autotuning does not work well on a 100G LAN
• Use the ‘performance’ CPU governor
• Use FQ pacing to match receive host speed if possible
• Important to be using the latest driver from Mellanox
  – version: 3.3-1.0.4 (03 Jul 2016), firmware-version: 12.16.1020

What’s next in the TCP world?

• TCP BBR (Bottleneck Bandwidth and RTT) from Google
  – https://patchwork.ozlabs.org/patch/671069/
  – Google Group: https://groups.google.com/forum/#!topic/bbr-dev
• A detailed description of BBR published in ACM Queue, Vol. 14 No. 5, September-October 2016:
  – "BBR: Congestion-Based Congestion Control".
• Google reports 2-4 orders of magnitude performance improvement on a path with 1% loss and 100 ms RTT.
  – Sample result: cubic: 3.3 Mbps, BBR: 9150 Mbps!!
  – Early testing on ESnet less conclusive, but seems to help on some paths

Initial BBR TCP results (bwctl, 3 streams, 40 sec test)

Remote Host               Throughput               Retransmits
perfsonar.nssl.noaa.gov   htcp: 183   bbr: 803     htcp: 1070  bbr: 240340
kstar-ps.nfri.re.kr       htcp: 4301  bbr: 4430    htcp: 1641  bbr: 98329
ps1.jpl.net               htcp: 940   bbr: 935     htcp: 1247  bbr: 399110
uhmanoa-tp.ps.uhnet.net   htcp: 5051  bbr: 3095    htcp: 5364  bbr: 412348

Varies between 4x better and 30% worse, all with WAY more retransmits.
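The per-host comparison can be made explicit by computing the bbr-to-htcp ratios from the table. The sketch below assumes the first htcp/bbr pair for each host is throughput and the second is retransmits. The slide does not state units, so only ratios are computed.

```python
# (remote host, htcp throughput, bbr throughput, htcp retransmits, bbr retransmits)
RESULTS = [
    ("perfsonar.nssl.noaa.gov", 183, 803, 1070, 240340),
    ("kstar-ps.nfri.re.kr", 4301, 4430, 1641, 98329),
    ("ps1.jpl.net", 940, 935, 1247, 399110),
    ("uhmanoa-tp.ps.uhnet.net", 5051, 3095, 5364, 412348),
]


def ratios(results):
    """Return {host: (bbr/htcp throughput ratio, bbr/htcp retransmit ratio)}."""
    return {
        host: (bbr_tput / htcp_tput, bbr_rtx / htcp_rtx)
        for host, htcp_tput, bbr_tput, htcp_rtx, bbr_rtx in results
    }


for host, (tput, rtx) in ratios(RESULTS).items():
    print(f"{host:26s} throughput x{tput:.2f}  retransmits x{rtx:.0f}")
```

Under that reading, the throughput ratio ranges from roughly 0.6x to 4.4x, and the retransmit ratio is far above 1 for every host.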

More Information

• http://fasterdata.es.net/host-tuning/packet-pacing/
• http://fasterdata.es.net/host-tuning/100g-tuning/
• Talk on switch buffer size experiments:
  – http://meetings.internet2.edu/2015-technology-exchange/detail/10003941/
• Mellanox Tuning Guide:
  – https://community.mellanox.com/docs/DOC-1523
• Email: [email protected]

Extra Slides

FQ Background

• Lots of discussion around ‘buffer bloat’ starting in 2011
  – https://www.bufferbloat.net/
• Google wanted to be able to get higher utilization on their network
  – Paper: “B4: Experience with a Globally-Deployed Software Defined WAN”, SIGCOMM 2013
• Google hired some very smart TCP people
  – Van Jacobson, Matt Mathis, Eric Dumazet, and others
• Result: Lots of improvements to the TCP stack in 2013-14, including most notably the ‘fair queuing’ pacer

Benchmarking vs. Production Host Settings

There are some settings that will give you more consistent results for benchmarking, but that you may not want to run on a production DTN

Benchmarking:
• Use a specific core for IRQs:
      /usr/sbin/set_irq_affinity_cpulist.sh 8 ethN
• Use a fixed clock speed (set to the max for your processor):
      /bin/cpupower -c all frequency-set -f 3.4GHz

Production DTN:
      /usr/sbin/set_irq_affinity_bynode.sh 1 ethN
      /bin/cpupower frequency-set -g performance

Fast Host to Slow Host

Throttled the receive host using ‘cpupower’ command:
      /bin/cpupower -c all frequency-set -f 1.2GHz

A small amount of packet loss makes a huge difference in TCP performance

[Figure: TCP performance by path length. Categories: Local (LAN), Metro Area, Regional, Continental, International. Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss).]

With loss, high performance beyond metro distances is essentially impossible
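The shape of that result can be illustrated with the widely used Mathis et al. model for TCP Reno, in which throughput is bounded by (MSS / RTT) * (C / sqrt(p)) with C = sqrt(3/2). The slide does not say which model produced its theoretical curve, so treating it as the Mathis model is an assumption here. The MSS, RTT, and loss values below are hypothetical inputs, not measurements from the talk.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Upper bound on TCP Reno throughput (bits/s) under the Mathis et al. model."""
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Hypothetical inputs: 8960-byte MSS (9000-byte MTU) and 0.01% loss.
    for label, rtt in [("LAN", 0.001), ("metro", 0.005),
                       ("continental", 0.05), ("international", 0.15)]:
        gbps = mathis_throughput_bps(8960, rtt, 0.0001) / 1e9
        print(f"{label:14s} RTT {rtt * 1000:6.1f} ms  bound {gbps:8.3f} Gbit/s")
```

Because the bound is inversely proportional to RTT, the same loss rate that is tolerable on a LAN caps a long path at a small fraction of link speed.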

TCP’s Congestion Control

[Figure: TCP congestion control behavior. Conditions: 50 ms simulated RTT; congestion w/ 2 Gbps UDP traffic; HTCP / Linux 2.6.32.]

Slide from Michael Smitasin, LBLnet (© 2015 Internet2)

Fair Queuing and Small Switch Buffers

TCP Throughput on Small Buffer Switch (Congestion w/ 2 Gbps UDP background traffic)

Requires CentOS 7.2 or higher

Enable Fair Queuing:
      tc qdisc add dev EthN root fq

Pacing side effect of Fair Queuing yields ~1.25 Gbps increase in throughput @ 10 Gbps on our hosts

TSO differences still negligible on our hosts w/ Intel X520

Slide from Michael Smitasin, LBL

More examples of pacing helping

Parallel Stream Test 1

[Figure: left side: sum of 4 streams; right side: throughput of each stream]

Streams appear to be much better balanced with FQ; pacing to 2.4 performed best

Run your own tests

• Find a remote perfSONAR host on a path of interest
  – Most of the 2000+ worldwide perfSONAR hosts will accept tests
• See: http://stats.es.net/ServicesDirectory/
• Run some tests
  – bwctl -c hostname -t60 --parsable > results.json
• Convert JSON to gnuplot format:
  – https://github.com/esnet/iperf/tree/master/contrib