TRANSCRIPT
03/12/08 Nuova Systems Inc.
Page 1
TCP Issues in the Data Center
Tom Lyon
The Future of TCP: Train-wreck or Evolution?
Stanford University
2008-04-01
Page 2
TCP: Not Just for “The Internet”
Essentially all network software relies on TCP/IP semantics
“The network is the data center”
In the data center, gigabits are “free”
105 times cheaper than WAN bandwidth
Terabit class switches
10Gb endpoints
TCP needs:
High bandwidth
Low latency
Predictability & fairness
Page 3
Storage Networks
Storage Access slowly evolving from hardware bus to open network
NAS vs SAN
NFS & CIFS vs SCSI's many flavors
Ethernet vs Fibre Channel vs Infiniband
Page 4
Storage Networks: Ethernet vs EtherNot
Ethernet              EtherNot
iSCSI, NFS, CIFS      SCSI-FCP, SCSI-SRP
TCP & Ethernet        F.C. and Infiniband
Congestion Loss       Credit Flow Control
Stream Oriented       Block Oriented
Software Transport    Hardware Transport
High CPU overhead     Low CPU overhead
Page 5
Storage Networks: Convergence
Data Center Ethernet
Choice of congestion classes
Lossy vs lossless
Choice of storage transports
TCP or F.C. (FCOE)
Choice of hardware or software transport
TOE w TCP, software FCOE, ...
Page 6
TCP: Time Out of Joint
TCP was standardized in a much slower world
½ Second minimum retransmit timeout
20 micro-second RTT achievable today!
Fast re-transmit algorithm only works for streams – more data being sent
Most data center traffic is request/response – often single packets
Packet loss hurts because TCP won't (not can't) respond fast enough
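To put the two numbers on this slide side by side, here is a minimal back-of-the-envelope sketch. It uses only the half-second timeout and the 20 microsecond RTT quoted above. The function names and the loss rates are illustrative assumptions, and the model assumes a lost single-packet request is recovered by exactly one timeout.

```python
# Back-of-the-envelope cost of one lost packet, using the slide's figures.
MIN_RTO_S = 0.5      # half-second minimum retransmit timeout (from the slide)
RTT_S = 20e-6        # 20 microsecond round-trip time (from the slide)


def rtts_per_timeout(min_rto_s: float, rtt_s: float) -> float:
    """How many round trips fit inside one retransmit timeout."""
    return min_rto_s / rtt_s


def mean_latency_s(rtt_s: float, min_rto_s: float, loss_rate: float) -> float:
    """Mean latency of a single-packet request/response exchange.

    Simplifying assumption (not from the slide): a lost packet is lost at
    most once and is always recovered by exactly one timeout, because a
    single-packet exchange has no following data to trigger fast retransmit.
    """
    return rtt_s + loss_rate * min_rto_s


if __name__ == "__main__":
    print(f"one timeout = {rtts_per_timeout(MIN_RTO_S, RTT_S):,.0f} RTTs")
    for loss in (0.0, 1e-4, 1e-3):  # hypothetical loss rates
        mean_us = mean_latency_s(RTT_S, MIN_RTO_S, loss) * 1e6
        print(f"loss rate {loss:g}: mean latency about {mean_us:,.0f} us")
```

Under these assumptions one timeout costs as much as 25,000 round trips, and a loss rate of 0.1% raises the mean latency from 20 microseconds to roughly 520 microseconds.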
Page 7
Congestion in the Data Center
Gigantic, non-blocking switches are the norm
Hundreds of ports, terabits of throughput
Buffers and buffer management are the most costly part of the switch
Link based flow control (“pause”) allows switch to push congestion back to its upstream neighbors
If the upstream neighbor is the source server, then the congestion “Goes away”
Or does it?
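One way to see where the congestion goes is a toy discrete-time model of a single switch port, sketched below. Everything in it (the function name, the per-tick rates, the buffer size) is a hypothetical choice for illustration, not a description of any real switch or of the 802.3 pause mechanism in detail.

```python
def simulate_port(ticks, offered, drain, switch_buffer, pause):
    """Toy model of a server feeding one congested switch port.

    Each tick the server generates `offered` packets and the switch forwards
    `drain` packets. The switch can hold at most `switch_buffer` packets.
    With pause=True the switch stops the server when it has no room, so the
    excess waits in the server's queue. With pause=False the excess is dropped.
    """
    switch_q = server_q = dropped = 0
    for _ in range(ticks):
        server_q += offered
        room = switch_buffer - switch_q
        if pause:
            sent = min(server_q, room)      # link is paused once room hits 0
        else:
            sent = server_q                 # server sends everything it has
            dropped += max(0, sent - room)  # the switch drops what does not fit
        server_q -= sent
        switch_q += min(sent, room)
        switch_q -= min(switch_q, drain)    # forward toward the destination
    return {"switch_queue": switch_q, "server_queue": server_q, "dropped": dropped}


if __name__ == "__main__":
    for pause in (False, True):
        print("pause" if pause else "lossy", simulate_port(100, 10, 4, 20, pause))
```

In this sketch the same 584 packets that are dropped in the lossy case end up waiting in the server's own queue in the paused case. The congestion has not disappeared, it has moved to the source.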
Page 8
Servers and Gigabits
Any current x86 server can easily saturate a 1Gb Ethernet link with TCP traffic
Many current servers can saturate 10Gb Ethernet links!
Lossless classes cause the pipe to fill faster
What happens when the first hop, the server's own Ethernet link, is the point of congestion?
Page 9
TCP and the Fat Pipe
If TCP doesn't “see” congestion (loss or ECN) then it will continue to increase its window to try to get more bandwidth in the network
Lossless network => high throughput
But... a single streaming connection will consume all available buffers
Newer connections will have a hard time getting buffers => extreme unfairness
The server needs good congestion management
Page 10
Servers, Ethernet, and Queues
“Everyone” knows that big, simple FIFO queues are a bad idea in routers
What do servers have today? - big, simple FIFO queues!
The queues are owned and maintained by the Ethernet NIC hardware
Horrible unfairness can be demonstrated with only 2 TCP connections
Many servers deal with 1000s of TCP connections
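The two-connection unfairness can be illustrated with a small sketch that compares a single FIFO transmit queue against per-flow round-robin service. Round robin is used here only as a generic textbook alternative for contrast. It is not taken from the talk, and the flow names and packet counts are hypothetical. The model sends one packet per time slot and assumes each flow appears once in the arrival list.

```python
from collections import deque


def fifo_finish_times(arrivals):
    """Single FIFO queue: flows are served strictly in arrival order.

    `arrivals` is a list of (flow, packet_count). Returns a dict mapping each
    flow to the slot in which its last packet is transmitted.
    """
    finish = {}
    slot = 0
    for flow, count in arrivals:
        slot += count
        finish[flow] = slot
    return finish


def round_robin_finish_times(arrivals):
    """Per-flow queues served one packet at a time in round-robin order."""
    queues = deque((flow, count) for flow, count in arrivals if count > 0)
    finish = {}
    slot = 0
    while queues:
        flow, remaining = queues.popleft()
        slot += 1                      # transmit one packet of this flow
        if remaining == 1:
            finish[flow] = slot
        else:
            queues.append((flow, remaining - 1))
    return finish


if __name__ == "__main__":
    arrivals = [("hog", 1000), ("small", 1)]
    print("FIFO:       ", fifo_finish_times(arrivals))
    print("round robin:", round_robin_finish_times(arrivals))
```

With a 1000-packet hog queued ahead of it, the single-packet request leaves in slot 1001 under FIFO and in slot 2 under round robin, while the hog finishes at essentially the same time either way.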
Page 11
Connection Size vs Throughput – idle 1G link
[Figure: throughput (y-axis, 0 to 1,000,000,000) versus connection size (x-axis, 10 to 1,000,000,000)]
Page 12
Connection Size vs Throughput – busy 1G link – competing with a single “hog” connection
[Figure: throughput (y-axis, 0 to 500,000,000) versus connection size (x-axis, 10 to 1,000,000,000); series: Ideal, Actual; annotation: “UNFAIR!”]
Page 13
Improving Server Congestion Management
Omitted due to event rules!
Page 14
TCP: Rock or Hard Place?
With lossy Ethernet, TCP bandwidth can collapse due to stupidly high timeouts => Unpredictable performance
With lossless Ethernet, TCP fairness can collapse due to stupid queuing policies => Unpredictable performance
Data Center Managers hate unpredictability
Ethernet standards have evolved, TCP needs to catch up
TCP and Ethernet implementations must improve
Page 15
Why does this matter?
The Earth is being paved by data centers
Google, Microsoft, NSA, Walmart, Facebook, ...
Improving TCP means more overall efficiency in the data center
Heat, CO2, and radioactive waste are becoming measurable by-products of TCP inefficiency
Fix TCP => Save the World!