RDMA in Data Centers: Looking Back and Looking Forward
Chuanxiong Guo
ACM SIGCOMM APNet 2017
August 3, 2017
Microsoft Research
The Rise of Cloud Computing
40 AZURE REGIONS
Data Centers
Data Centers
5
• Cloud scale services: IaaS, PaaS, Search, Big Data, Storage, Machine Learning, Deep Learning
• Services are latency sensitive, bandwidth hungry, or both
• Cloud scale services need cloud scale computing and communication infrastructure
Data center networks (DCN)
6
Data center networks (DCN)
• Single ownership
• Large scale
• High bisection bandwidth
• Commodity Ethernet switches
• TCP/IP protocol suite
[Figure: Clos topology with Spine, Leaf, and ToR layers; servers are grouped into Pods and Podsets]
7
But TCP/IP is not doing well
8
TCP latency
405 us (P50)
716 us (P90)
2132 us (P99)
Long latency tail
Pingmesh measurement results
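Pingmesh-style reports summarize many probe latencies by percentile. The following is a minimal sketch of the nearest-rank percentile, assuming the samples are already collected in microseconds. The function names and the sample values are hypothetical, and Pingmesh's own aggregation may work differently.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in [0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling of p * n / 100 using integer arithmetic, as a 1-based rank.
    rank = -(-p * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples_us):
    """Return the P50/P90/P99 latencies reported on the slide."""
    return {f"P{p}": percentile(samples_us, p) for p in (50, 90, 99)}


# Made-up samples (not Pingmesh data); prints {'P50': 120, 'P90': 700, 'P99': 3000}
print(latency_summary([120, 90, 405, 3000, 80, 150, 200, 95, 110, 700]))
```

A single 3000 us outlier already sets P99 here. This is why tail percentiles, not averages, expose the long latency tail.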
9
TCP processing overhead (40G)
Sender / Receiver
8 TCP connections
40G NIC
10
An RDMA renaissance story
11
Virtual Interface Architecture Spec 1.0: 1997
InfiniBand Architecture Spec: 1.0 in 2000, 1.1 in 2002, 1.2 in 2004, 1.3 in 2015
RoCE: 2010
RoCEv2: 2014
12
RDMA
• Remote Direct Memory Access (RDMA): a method of accessing memory on a remote system without interrupting the processing of the CPU(s) on that system
• RDMA offloads packet processing protocols to the NIC
• RDMA in Ethernet-based data centers
13
RoCEv2: RDMA over Commodity Ethernet
• RoCEv2 for Ethernet-based data centers
• RoCEv2 encapsulates packets in UDP
• The OS kernel is not in the data path
• The NIC handles network protocol processing and message DMA
[Figure: protocol stacks on two hosts. TCP/IP and the NIC driver run in the kernel. The RDMA app uses RDMA verbs in user space, while the RDMA transport, IP, and Ethernet run in NIC hardware with DMA to memory. The two hosts are connected by a lossless network.]
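Because RoCEv2 carries the RDMA transport header inside UDP/IP, ordinary IP routing and ECMP can forward it. The sketch below packs a simplified InfiniBand Base Transport Header (BTH) behind a UDP header sent to destination port 4791, the IANA-assigned RoCEv2 port. The helper names are hypothetical. The flag bits are zeroed, the UDP checksum is left at zero, and the IP/Ethernet headers and the trailing ICRC are omitted. Treat the field layout as a simplified reading of the BTH and check the InfiniBand specification before relying on it.

```python
import struct

ROCEV2_UDP_PORT = 4791          # IANA-assigned UDP destination port for RoCEv2
BTH_OPCODE_RC_SEND_ONLY = 0x04  # reliable-connection SEND Only


def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Pack a 12-byte Base Transport Header (simplified, hypothetical helper).

    SE, MigReq, PadCnt, TVer, and the reserved bits are left at zero.
    """
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24):
        raise ValueError("DestQP and PSN are 24-bit fields")
    return struct.pack(
        "!BBHB3sB3s",
        opcode,                      # OpCode
        0,                           # SE | MigReq | PadCnt | TVer
        pkey,                        # partition key
        0,                           # reserved
        dest_qp.to_bytes(3, "big"),  # destination queue pair number
        0x80 if ack_req else 0,      # AckReq bit followed by reserved bits
        psn.to_bytes(3, "big"),      # packet sequence number
    )


def pack_udp(src_port, payload):
    """Prefix payload with a UDP header addressed to the RoCEv2 port.

    The checksum is left at zero in this sketch.
    """
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, ROCEV2_UDP_PORT, length, 0) + payload


datagram = pack_udp(49152, pack_bth(BTH_OPCODE_RC_SEND_ONLY, dest_qp=0x12, psn=7, ack_req=True))
print(datagram.hex())
```

The UDP source port is commonly varied per connection, which gives ECMP hashing some entropy.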
14
RDMA benefit: latency reduction
• For small messages (<32 KB), OS processing latency matters
• For large messages (100 KB+), network speed matters
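A simple model makes the two regimes concrete. Message latency is a fixed per-message host overhead plus serialization time on the link. The overheads below (25 us for TCP, 3 us for RDMA) and the 40 Gb/s link are illustrative assumptions, not measurements from the talk.

```python
def message_latency_us(msg_bytes, fixed_overhead_us, link_gbps):
    """Fixed per-message overhead plus serialization time on the link."""
    serialization_us = msg_bytes * 8 / (link_gbps * 1e3)  # bits / (Gb/s), in us
    return fixed_overhead_us + serialization_us


# Illustrative assumptions only: 25 us host-stack overhead for TCP,
# 3 us for RDMA, both on a 40 Gb/s link.
for size in (1_000, 32_000, 1_000_000):
    tcp = message_latency_us(size, 25.0, 40)
    rdma = message_latency_us(size, 3.0, 40)
    print(f"{size:>9} B  TCP {tcp:7.1f} us  RDMA {rdma:7.1f} us  ratio {tcp / rdma:4.1f}x")
```

Under these assumptions, a 1 KB message takes 25.2 us over TCP and 3.2 us over RDMA, so the fixed overhead dominates. A 1 MB message needs 200 us of serialization on either stack, and the gap shrinks to about 10%.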
15
RDMA benefit: CPU overhead reduction
Sender / Receiver
One ND connection
40G NIC
37 Gb/s goodput
16
RDMA benefit: CPU overhead reduction
Intel(R) Xeon(R) CPU, two sockets, 28 cores
• RDMA: single QP, 88 Gb/s, 1.7% CPU
• TCP: eight connections, 30-50 Gb/s, client: 2.6% CPU, server: 4.3% CPU
17
RoCEv2 needs a lossless Ethernet network
• PFC for hop-by-hop flow control
• DCQCN for connection-level congestion control
18
Priority-based flow control (PFC)
• Hop-by-hop flow control, with eight priorities for HOL blocking mitigation
• The priority in data packets is carried in the VLAN tag or DSCP
• A PFC pause frame informs the upstream to stop
• PFC causes HOL blocking and collateral damage
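[Figure: an egress port sends data packets to an ingress port, each with eight priority queues p0 to p7. When the ingress queue for priority p1 exceeds the XOFF threshold, the ingress side sends a PFC pause frame for p1 back upstream.]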
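The per-priority XOFF behavior can be modeled as a queue that asks its upstream to pause when occupancy reaches XOFF and to resume once it drains to a lower XON level. This is a toy sketch. The class names and thresholds are hypothetical. A real switch also reserves headroom above XOFF for bytes already in flight. In actual PFC, "resume" is signalled by a pause frame with zero pause time; a string stands in for it here.

```python
from dataclasses import dataclass

NUM_PRIORITIES = 8  # PFC defines eight priority classes


@dataclass
class LosslessQueue:
    xoff_bytes: int               # request PAUSE once occupancy reaches this level
    xon_bytes: int                # request RESUME once occupancy drains to this level
    depth: int = 0
    paused_upstream: bool = False

    def enqueue(self, nbytes):
        self.depth += nbytes
        if not self.paused_upstream and self.depth >= self.xoff_bytes:
            self.paused_upstream = True
            return "PAUSE"
        return None

    def dequeue(self, nbytes):
        self.depth = max(0, self.depth - nbytes)
        if self.paused_upstream and self.depth <= self.xon_bytes:
            self.paused_upstream = False
            return "RESUME"
        return None


class PfcIngressPort:
    """Per-priority queues: pausing one priority leaves the others running."""

    def __init__(self, xoff_bytes, xon_bytes):
        self.queues = [LosslessQueue(xoff_bytes, xon_bytes) for _ in range(NUM_PRIORITIES)]

    def receive(self, priority, nbytes):
        frame = self.queues[priority].enqueue(nbytes)
        return (frame, priority) if frame else None

    def transmit(self, priority, nbytes):
        frame = self.queues[priority].dequeue(nbytes)
        return (frame, priority) if frame else None
```

The pause applies to the whole priority class on that link. Every flow sharing p1 stops, including flows not headed to the congested destination. This is the HOL blocking and collateral damage noted above.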
19
DCQCN
• CP: switches use ECN for packet marking
• NP: periodically checks whether ECN-marked packets have arrived and, if so, notifies the sender
• RP: adjusts the sending rate based on NP feedback
Sender NIC: Reaction Point (RP); Switch: Congestion Point (CP); Receiver NIC: Notification Point (NP)
DCQCN = Keep PFC + Use ECN + hardware rate-based congestion control
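The RP side of DCQCN is rate-based. On each congestion notification (CNP) from the NP, it remembers the current rate as a target, cuts the current rate by a factor tied to a congestion estimate alpha, and raises alpha. Afterwards it recovers toward the target and then probes above it. The sketch below is a simplified rendering loosely following the published DCQCN description, not a NIC implementation. The class name is hypothetical, and the parameters (g = 1/256, five fast-recovery steps, a 0.04 Gb/s additive step) are assumptions. Hyper-increase, timers, and byte counters are omitted; callers invoke the event methods directly.

```python
class DcqcnReactionPoint:
    """Simplified DCQCN sender-side rate control (hypothetical sketch)."""

    def __init__(self, line_rate_gbps, g=1 / 256, rai_gbps=0.04, fast_recovery_steps=5):
        self.line_rate = line_rate_gbps
        self.rc = line_rate_gbps   # current sending rate
        self.rt = line_rate_gbps   # target rate to recover toward
        self.alpha = 1.0           # estimate of congestion severity
        self.g = g
        self.rai = rai_gbps
        self.fast_recovery_steps = fast_recovery_steps
        self.increase_events = 0

    def on_cnp(self):
        """A CNP arrived: remember the rate, cut it, and raise alpha."""
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.increase_events = 0

    def on_alpha_timer(self):
        """No CNP during the last alpha-update period: decay alpha."""
        self.alpha *= 1 - self.g

    def on_increase_event(self):
        """Fast recovery first, then additive increase of the target."""
        self.increase_events += 1
        if self.increase_events > self.fast_recovery_steps:
            self.rt = min(self.rt + self.rai, self.line_rate)
        self.rc = (self.rt + self.rc) / 2
```

Starting at 40 Gb/s with alpha = 1, one CNP halves the rate to 20 Gb/s. Five fast-recovery steps then bring it back to 39.375 Gb/s.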
20
The lossless requirement causes safety and performance challenges
• RDMA transport livelock
• PFC deadlock
• PFC pause frame storm
• Slow-receiver symptom
21
RDMA transport livelock
[Figure: a sender and a receiver connected through a switch that drops packets at a rate of 1/256. With go-back-0, a NAK for packet N makes the sender restart from RDMA Send 0, so the message is retransmitted from the beginning. With go-back-N, the sender resumes from RDMA Send N and makes progress.]
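A toy model reproduces the livelock. Every 256th transmission is dropped, a deterministic stand-in for the 1/256 drop rate, and the sender rewinds right after each loss instead of waiting for the NAK. The function name and the transmission budget are hypothetical simplifications.

```python
def packets_on_wire(num_packets, go_back_zero, drop_every=256, budget=100_000):
    """Count transmissions needed to deliver a message, or None on livelock."""
    sent = 0
    next_psn = 0  # next packet sequence number the sender will transmit
    while next_psn < num_packets:
        if sent >= budget:
            return None  # no completion within the budget: treat as livelock
        sent += 1
        if sent % drop_every == 0:
            # Lost packet: go-back-0 restarts the message; go-back-N resends
            # the lost packet, so next_psn stays where it is.
            if go_back_zero:
                next_psn = 0
        else:
            next_psn += 1
    return sent


print(packets_on_wire(1000, go_back_zero=True))   # None: never completes
print(packets_on_wire(1000, go_back_zero=False))  # 1003: three drops resent
```

In this model, go-back-0 never gets more than 255 packets in a row through, so any message longer than that never completes. Go-back-N pays only one retransmission per drop.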
22
PFC deadlock
• Our data centers use a Clos network
• Packets first travel up, then go down
• No cyclic buffer dependency for up-down routing -> no deadlock
• But we did experience deadlock!
[Figure: Clos topology with Spine, Leaf, and ToR layers; servers are grouped into Pods and Podsets]
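The claim that up-down routing has no cyclic buffer dependency (CBD) can be checked mechanically. A packet buffered for link (a, b) waits on buffer space for link (b, c) whenever its path continues a -> b -> c. Deadlock is possible only if these dependencies form a cycle. The sketch below builds the graph and searches it for a cycle. The function names are hypothetical. The up-down paths reuse switch names from the deadlock example, and the "bounced" paths are hypothetical.

```python
from collections import defaultdict


def buffer_dependencies(paths):
    """Link (a, b) depends on link (b, c) whenever some path goes a -> b -> c."""
    deps = defaultdict(set)
    for path in paths:
        links = list(zip(path, path[1:]))
        for upstream, downstream in zip(links, links[1:]):
            deps[upstream].add(downstream)
    return deps


def has_cyclic_dependency(deps):
    """Depth-first search for a cycle in the dependency graph."""
    state = {}  # link -> "visiting" or "done"

    def visit(link):
        state[link] = "visiting"
        for nxt in deps.get(link, ()):
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[link] = "done"
        return False

    return any(link not in state and visit(link) for link in list(deps))


up_down = [["S1", "T0", "La", "T1", "S3"], ["S4", "T1", "Lb", "T0", "S2"]]
bounced = [["T0", "La", "T1", "Lb"], ["T1", "Lb", "T0", "La"]]  # hypothetical down-up paths
print(has_cyclic_dependency(buffer_dependencies(up_down)))            # False
print(has_cyclic_dependency(buffer_dependencies(up_down + bounced)))  # True
```

The up-down paths alone produce no cycle. Two paths that go down to a ToR and back up close the loop T0 -> La -> T1 -> Lb -> T0.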
23
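PFC deadlock
• Preliminaries
  • ARP table: IP address to MAC address mapping
  • MAC table: MAC address to port mapping
  • If the MAC entry is missing, packets are flooded to all ports

Input: a packet with Dst: IP1

ARP table:

| IP | MAC | TTL |
|---|---|---|
| IP0 | MAC0 | 2h |
| IP1 | MAC1 | 1h |

MAC table:

| MAC | Port | TTL |
|---|---|---|
| MAC0 | Port0 | 10min |
| MAC1 | - | - |

Output: MAC1 has no MAC table entry, so the packet is flooded to all ports.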
24
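PFC deadlock
[Figure: PFC deadlock example with leaf switches La and Lb, ToR switches T0 and T1, and servers S1 to S5, where S5 is a dead server. Flows follow the paths {S1, T0, La, T1, S3}, {S1, T0, La, T1, S5}, and {S4, T1, Lb, T0, S2}. A congested port, a packet drop, and flooded packets destined to the dead server produce PFC pause frames (numbered 1 to 4) across the egress and ingress ports.]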
25
PFC deadlock
• The PFC deadlock root cause: the interaction between PFC flow control and Ethernet packet flooding
• Solution: drop lossless packets if the ARP entry is incomplete
• Recommendation: do not flood or multicast lossless traffic
26
Tagger: practical PFC deadlock prevention
• The Tagger algorithm works for general network topologies
• Deployable in existing switching ASICs
• Concept: Expected Lossless Path (ELP) to decouple Tagger from routing
• Strategy: move packets to a different lossless queue before a cyclic buffer dependency (CBD) forms
[Figure: two copies of a topology with switches S0 and S1, L0 to L3, and T0 to T3]
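The strategy of switching queues before a CBD forms can be illustrated with a simplified bounce-counting rule. Each packet carries a tag that increases whenever it bounces, that is, takes a downward hop followed by an upward one. Each tag value maps to its own lossless queue, and packets whose tag runs out of lossless queues fall back to a lossy queue. This is a hypothetical sketch of the idea only, not the ELP-based Tagger algorithm. The tier map and function name are assumptions. The intuition is that within one queue class every path segment is up-down, which has no CBD, and packets only ever move to a higher class, so no cycle can close across classes.

```python
def tagger_queues(path, level, lossless_queues=2):
    """Per-hop queue class for a path under a simplified bounce-counting rule.

    level maps each switch to its tier (e.g. ToR=1, leaf=2). The tag starts
    at 0 and increases at every bounce (a down hop followed by an up hop).
    Tags >= lossless_queues use a lossy queue.
    """
    tag, previous = 0, None
    classes = []
    for here, nxt in zip(path, path[1:]):
        direction = "up" if level[nxt] > level[here] else "down"
        if previous == "down" and direction == "up":
            tag += 1
        previous = direction
        classes.append(tag if tag < lossless_queues else "lossy")
    return classes


LEVEL = {"T0": 1, "T1": 1, "T2": 1, "T3": 1, "L0": 2, "L1": 2}  # hypothetical tiers
print(tagger_queues(["T0", "L0", "T1"], LEVEL))              # [0, 0]
print(tagger_queues(["T0", "L0", "T1", "L1", "T2"], LEVEL))  # [0, 0, 1, 1]
```

With two lossless queues, a single bounce stays lossless. A second bounce pushes the packet into the lossy class, where it may be dropped instead of feeding a cycle.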
27
NIC PFC pause frame storm
• A malfunctioning NIC may block the whole network
• PFC pause frame storms caused several incidents
• Solution: watchdogs at both the NIC and switch sides to stop the storm
[Figure: Podset 0 and Podset 1 connected through the spine layer, with the leaf layer, ToRs, and servers 0 to 7 below; one server has a malfunctioning NIC]
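A pause storm watchdog needs only a timer. If a queue stays paused continuously for longer than a threshold, it trips and mitigation starts. The sketch below shows only the detection logic. The class name, the sampling interface, and the threshold are hypothetical. The mitigation action is also an assumption here, for example a switch port that stops honoring pause frames on that queue or a NIC that stops emitting them.

```python
class PauseStormWatchdog:
    """Trip when a queue has stayed paused for at least threshold_ms."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms
        self.paused_since = None
        self.tripped = False

    def observe(self, now_ms, paused):
        """Feed one periodic sample; returns True once mitigation should be active."""
        if not paused:
            self.paused_since = None      # pause ended: restart the clock
        elif self.paused_since is None:
            self.paused_since = now_ms    # pause just started
        elif now_ms - self.paused_since >= self.threshold_ms:
            self.tripped = True           # hypothetical action: stop honoring PFC here
        return self.tripped

    def reset(self):
        """Re-arm after an operator or recovery timer restores lossless mode."""
        self.paused_since = None
        self.tripped = False
```

Requiring a continuous pause keeps normal, short-lived PFC activity from triggering the watchdog. Only a queue that never gets to drain trips it.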
28
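The slow-receiver symptom
• ToR to NIC is 40 Gb/s, NIC to server is 64 Gb/s
• But NICs may generate a large number of PFC pause frames
• Root cause: the NIC is resource constrained
• Mitigation
  • Large page size for the MTT (memory translation table) entry
  • Dynamic buffer sharing at the ToR
[Figure: a server with CPU, DRAM, and a NIC holding MTT entries, WQEs, and QPC. The NIC connects to the ToR over QSFP at 40 Gb/s and to the server over PCIe Gen3 8x8 at 64 Gb/s. Pause frames flow from the NIC to the ToR.]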
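The large-page mitigation comes down to simple arithmetic. The NIC presumably keeps only part of the translation table in on-chip memory, so the number of MTT entries needed to cover the registered buffers affects how often a lookup misses and must fetch over PCIe. The 16 GiB registered-memory figure and the function name are hypothetical.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def mtt_entries(registered_bytes, page_bytes):
    """Translation entries needed to cover a registered region (ceiling division)."""
    return -(-registered_bytes // page_bytes)


# Hypothetical: 16 GiB of registered memory on one server.
for page in (4 * KiB, 2 * MiB):
    print(f"{page // KiB:>5} KiB pages -> {mtt_entries(16 * GiB, page):>9,} MTT entries")

# The slide's link numbers: PCIe Gen3 x8 is 8 lanes x 8 GT/s (raw, before
# 128b/130b encoding) versus 40 Gb/s on the ToR link.
print(f"PCIe Gen3 x8: {8 * 8} Gb/s raw vs 40 Gb/s from the ToR")
```

Moving from 4 KiB to 2 MiB pages cuts the entry count 512-fold, from 4,194,304 to 8,192. A table that small is far more likely to fit on chip. This explains how a NIC with a 64 Gb/s host link can still fall behind a 40 Gb/s network link when its lookups stall.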
29
Deployment experiences and lessons learned
30
Latency reduction
• RoCEv2 deployed in Bing world-wide for two and a half years
• Significant latency reduction
• Incast problem solved, as there are no packet drops
31
RDMA throughput
• Using two podsets, each with 500+ servers
• 5 Tb/s capacity between the two podsets
• Achieved 3 Tb/s inter-podset throughput
• Bottlenecked by ECMP routing
• Close to 0 CPU overhead
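One plausible reason ECMP caps throughput below capacity is hash collisions. Each flow is hashed onto one uplink, so some uplinks carry several large flows while others sit idle. The sketch hashes hypothetical RoCEv2 5-tuples with CRC32, a stand-in for whatever hash the switch ASIC actually uses, and reports how unevenly they land. The workload and function names are hypothetical.

```python
import zlib
from collections import Counter


def ecmp_uplink(five_tuple, num_uplinks):
    """Pick an uplink by hashing the flow's 5-tuple (CRC32 as a stand-in hash)."""
    key = "|".join(str(part) for part in five_tuple).encode()
    return zlib.crc32(key) % num_uplinks


def uplink_loads(flows, num_uplinks):
    """Number of flows hashed onto each uplink."""
    counts = Counter(ecmp_uplink(f, num_uplinks) for f in flows)
    return [counts.get(i, 0) for i in range(num_uplinks)]


# Hypothetical workload: 64 equal-rate RoCEv2 flows (UDP dport 4791) over 64 uplinks.
flows = [(f"10.0.0.{i}", f"10.1.0.{i}", 17, 49152 + i, 4791) for i in range(64)]
loads = uplink_loads(flows, 64)
busiest = max(loads)
idle = loads.count(0)
print(f"busiest uplink carries {busiest} flows; {idle} of 64 uplinks carry none")
```

If every flow could fill an uplink on its own, each idle uplink is capacity the fabric cannot use. Flows that share an uplink also split its bandwidth, so aggregate throughput ends up well below the bisection capacity.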
32
Latency and throughput tradeoff
• RDMA latencies increase as data shuffling starts
• Low latency vs high throughput
[Figure: RDMA latency (us) before and during data shuffling, measured across leaf switches L0 and L1, ToRs T0 and T1, and servers S0,0 to S0,23 and S1,0 to S1,23]
33
Lessons learned
• Providing lossless is hard!
  • Deadlock, livelock, PFC pause frame propagation, and storms did happen
• Be prepared for the unexpected
  • Configuration management, latency/availability, PFC pause frame, and RDMA traffic monitoring
• NICs are the key to making RoCEv2 work
34
What’s next?
35
Applications
Technologies
Architectures
Protocols
• RDMA for X (Search, Storage, HFT, DNN, etc.)
• Lossy vs lossless network
• Practical, large-scale deadlock-free network
• RDMA programming
• RDMA for heterogeneous computing systems
• RDMA virtualization
• Reducing collateral damage
• RDMA security
• Software vs hardware
• Inter-DC RDMA
36
Will software win (again)?
• Historically, software-based packet processing won (multiple times)
  • TCP processing overhead analysis by David Clark et al.
  • None of the stateful TCP offloading took off (e.g., TCP Chimney)
• The story is different this time
  • Moore’s law is ending
  • Accelerators are coming
  • Network speed keeps increasing
  • Demands for ultra-low latency are real
37
Is lossless mandatory for RDMA?
• There is no binding between RDMA and a lossless network
• But implementing a more sophisticated transport protocol in hardware is a challenge
38
RDMA virtualization for container networking
• A router acts as a proxy for the containers
• Shared memory for improved performance
• Zero copy possible
[Figure: FreeFlow architecture with Host 1 (Container 1, IP 1.1.1.1; Container 2, IP 2.2.2.2) and Host 2 (Container 3, IP 3.3.3.3). Each container runs an application on NetAPI and the FreeFlow NetLib with a vNIC. Components shown: FreeFlow Router, control agents, IPC channel, shared memory space, physical NICs with RDMA, the host network, and the FreeFlow NetOrchestrator.]
39
RDMA for DNN
• TCP does not work for distributed DNN training
• For 16-GPU, 2-host speech training with CNTK, TCP communication dominates the training time (72%), while RDMA is much faster (44%)
40
RDMA Programming
• How many LOC for a “hello world” communication using RDMA?
  • For TCP, it is 60 LOC for client or server code
  • For RDMA, it is complicated…
• IB Verbs: 600 LOC
• RDMA CM: 300 LOC
• Rsocket: 60 LOC
41
RDMA Programming
• Make RDMA programming more accessible
  • Easy-to-set-up RDMA server and switch configurations
  • Can I run and debug my RDMA code on my desktop/laptop?
  • High-quality code samples
• Loosely coupled vs tightly coupled (Send/Recv vs Write/Read)
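The two programming styles differ in who participates. Send/Recv is two-sided: the receiver must have posted a buffer, and it sees a completion. Write/Read is one-sided: the initiator names a remote address, and the remote application is neither involved nor notified. That makes the two sides tightly coupled through their shared memory layout. The in-process toy model below only illustrates these semantics. Every name in it is hypothetical, and real verbs additionally require memory registration, remote keys, and queue pairs, none of which are modeled.

```python
class RemoteHost:
    """Toy model of a host with one registered memory region (hypothetical)."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.posted_recvs = []   # offsets of receive buffers the application posted
        self.completions = []    # (offset, length) entries the application can poll

    def post_recv(self, offset):
        self.posted_recvs.append(offset)


def send(dst, payload):
    """Two-sided: needs a posted receive and produces a completion at dst."""
    if not dst.posted_recvs:
        # On a reliable connection this surfaces as a receiver-not-ready condition.
        raise RuntimeError("no receive buffer posted")
    offset = dst.posted_recvs.pop(0)
    dst.memory[offset:offset + len(payload)] = payload
    dst.completions.append((offset, len(payload)))


def write(dst, remote_offset, payload):
    """One-sided: places data at an address the initiator chose; dst is not notified."""
    dst.memory[remote_offset:remote_offset + len(payload)] = payload


def read(src, remote_offset, length):
    """One-sided: fetches remote bytes without involving src's application."""
    return bytes(src.memory[remote_offset:remote_offset + length])


host = RemoteHost(64)
write(host, 0, b"hi")   # lands silently: host.completions stays empty
host.post_recv(16)
send(host, b"yo")       # consumes the posted buffer and records a completion
print(read(host, 0, 2), read(host, 16, 2), host.completions)
```

Send/Recv lets the receiver control buffer placement and learn when data arrives, at the cost of keeping receives posted. Write/Read removes the remote CPU from the data path, but the peers must agree on addresses in advance.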
42
Summary: RDMA for data centers!
• RDMA is experiencing a renaissance in data centers
• RoCEv2 has been running safely in Microsoft data centers for two and a half years
• Many opportunities and interesting problems for high-speed, low-latency RDMA networking
• Many opportunities in making RDMA accessible to more developers
43
Acknowledgement
• Yan Cai, Gang Cheng, Zhong Deng, Daniel Firestone, Juncheng Gu, Shuihai Hu, Hongqiang Liu, Marina Lipshteyn, Ali Monfared, Jitendra Padhye, Gaurav Soni, Haitao Wu, Jianxi Ye, Yibo Zhu
• Azure, Bing, CNTK, Philly collaborators
• Arista Networks, Cisco, Dell, Mellanox partners
44
Questions?