03/12/08 Nuova Systems Inc. Page 1 TCP Issues in the Data Center Tom Lyon The Future of TCP: Train-wreck or Evolution? Stanford University 2008-04-01


03/12/08 Nuova Systems Inc.

Page 1

TCP Issues in the Data Center

Tom Lyon

The Future of TCP: Train-wreck or Evolution?

Stanford University

2008-04-01


Page 2

TCP: Not Just for “The Internet”

Essentially all network software relies on TCP/IP semantics

“The network is the data center”

In the data center, gigabits are “free”
  10^5 times cheaper than WAN bandwidth
  Terabit-class switches
  10Gb endpoints

TCP needs:
  High bandwidth
  Low latency
  Predictability & fairness


Page 3

Storage Networks

Storage Access slowly evolving from hardware bus to open network

NAS vs SAN

NFS & CIFS vs SCSI's many flavors

Ethernet vs Fibre Channel vs Infiniband


Page 4

Storage Networks: Ethernet vs EtherNot

Ethernet                    EtherNot
iSCSI, NFS, CIFS            SCSI-FCP, SCSI-SRP
TCP & Ethernet              F.C. and Infiniband
Congestion Loss             Credit Flow Control
Stream Oriented             Block Oriented
Software Transport          Hardware Transport
High CPU overhead           Low CPU overhead
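The difference between "Congestion Loss" and "Credit Flow Control" can be illustrated with a toy model. This is only a sketch, not anything from the talk: the function name `run_link`, the tick-based timing, and the single shared receive buffer are all assumptions made for illustration. A sender pushes frames at a receiver that drains more slowly; in lossy mode the overflow is dropped (and would have to be recovered by TCP), while in credit mode the sender may only transmit when it holds a credit for a free buffer slot.

```python
def run_link(total_frames, buffer_slots, send_per_tick, drain_per_tick, credit_based):
    """Toy model of one link. Returns (delivered, dropped) frame counts.

    Hypothetical helper: all parameters are illustrative, not from the slides.
    Requires buffer_slots > 0 and drain_per_tick > 0 so the loop terminates.
    """
    to_send = total_frames
    queued = 0                 # frames sitting in the receive buffer
    delivered = dropped = 0
    credits = buffer_slots     # one credit per free receive buffer slot

    while to_send > 0 or queued > 0:
        # Sender side: a credit-based sender never exceeds its credits.
        burst = min(send_per_tick, to_send)
        if credit_based:
            burst = min(burst, credits)
            credits -= burst
        to_send -= burst

        # Receiver side: frames that do not fit in the buffer are lost.
        accepted = min(burst, buffer_slots - queued)
        dropped += burst - accepted
        queued += accepted

        # Receiver drains its buffer and, in credit mode, returns credits.
        drained = min(drain_per_tick, queued)
        queued -= drained
        delivered += drained
        if credit_based:
            credits += drained

    return delivered, dropped


lossy = run_link(100, buffer_slots=4, send_per_tick=3, drain_per_tick=1, credit_based=False)
credit = run_link(100, buffer_slots=4, send_per_tick=3, drain_per_tick=1, credit_based=True)
print("lossy  (delivered, dropped):", lossy)
print("credit (delivered, dropped):", credit)
```

In this model the credit-based link never drops a frame; the cost is that the sender is held back to the receiver's drain rate.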


Page 5

Storage Networks: Convergence

Data Center Ethernet

Choice of congestion classes
  Lossy vs lossless

Choice of storage transports
  TCP or F.C. (FCoE)

Choice of hardware or software transport
  TOE w/ TCP, software FCoE, ...


Page 6

TCP: Time Out of Joint

TCP was standardized in a much slower world
  ½ second minimum retransmit timeout
  20 microsecond RTT achievable today!

Fast retransmit algorithm only works for streams, where more data is being sent

Most data center traffic is request/response, often single packets

Packet loss hurts because TCP won't (not can't) respond fast enough
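The numbers on this slide can be worked through in a few lines. The sketch below uses the ½ second minimum timeout and 20 microsecond RTT from the slide; the three-duplicate-ACK threshold is the standard fast retransmit trigger, and the function names are hypothetical. It also assumes exactly one segment is lost and every later segment arrives.

```python
MIN_RTO_US = 500_000    # 1/2 second minimum retransmit timeout (from the slide)
RTT_US = 20             # 20 microsecond data center RTT (from the slide)
DUPACK_THRESHOLD = 3    # standard fast retransmit trigger: three duplicate ACKs


def rtts_per_timeout(min_rto_us=MIN_RTO_US, rtt_us=RTT_US):
    """How many round trips fit inside one minimum retransmit timeout."""
    return min_rto_us // rtt_us


def fast_retransmit_possible(segments_sent, lost_index, threshold=DUPACK_THRESHOLD):
    """Each segment that arrives after the lost one produces one duplicate ACK.

    Fast retransmit fires only if enough later segments exist to reach the
    threshold; otherwise the sender has to wait for the timeout.
    """
    segments_after_loss = segments_sent - lost_index - 1
    return segments_after_loss >= threshold


def exchange_time_us(lost, rtt_us=RTT_US, min_rto_us=MIN_RTO_US):
    """Time for a single-packet request/response, with or without one loss."""
    return rtt_us + (min_rto_us if lost else 0)


print(rtts_per_timeout())                                         # 25000
print(fast_retransmit_possible(segments_sent=1, lost_index=0))    # False
print(fast_retransmit_possible(segments_sent=10, lost_index=2))   # True
print(exchange_time_us(lost=False), exchange_time_us(lost=True))  # 20 500020
```

A single lost packet in a one-packet request therefore costs the equivalent of 25,000 round trips, because no later segments exist to generate duplicate ACKs.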


Page 7

Congestion in the Data Center

Gigantic, non-blocking switches are the norm
  Hundreds of ports, terabits of throughput

Buffers and buffer management are the most costly part of the switch

Link-based flow control (“pause”) allows a switch to push congestion back to its upstream neighbors

If the upstream neighbor is the source server, then the congestion “goes away”

Or does it?
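One way to explore that question is a toy model of pause-style back-pressure. Everything here is an assumption for illustration: the name `simulate_pause`, the one-frame-per-tick links, the fixed per-hop buffer capacity, and an unbounded queue inside the source server. A hop forwards a frame only when the next hop has room; a full hop effectively pauses its upstream neighbor.

```python
def simulate_pause(hops, capacity, ticks, source_rate=1, drain_rate=0):
    """Return per-hop queue depths after `ticks` time steps.

    queues[0] is the source server's own queue (assumed unbounded);
    queues[-1] feeds the congested egress port, which drains `drain_rate`
    frames per tick. Requires hops >= 2.
    """
    queues = [0] * hops
    for _ in range(ticks):
        # The congested egress port drains whatever it can.
        queues[-1] -= min(drain_rate, queues[-1])

        # Walk from the last link back to the first: a hop may forward one
        # frame only if its downstream neighbor is not full (not pausing it).
        for i in range(hops - 2, -1, -1):
            if queues[i] > 0 and queues[i + 1] < capacity:
                queues[i] -= 1
                queues[i + 1] += 1

        # The application keeps producing into the server's queue.
        queues[0] += source_rate
    return queues


# Server -> switch -> switch with a fully blocked egress port.
print(simulate_pause(hops=3, capacity=4, ticks=20))   # [12, 4, 4]
```

In this model the switch buffers stay bounded, but the backlog does not disappear: it accumulates in the server's queue.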


Page 8

Servers and Gigabits

Any current x86 server can easily saturate a 1Gb Ethernet link with TCP traffic

Many current servers can saturate 10Gb Ethernet links!

Lossless classes cause the pipe to fill faster

What happens when the first hop, the server's own Ethernet link, is the point of congestion?


Page 9

TCP and the Fat Pipe

If TCP doesn't “see” congestion (loss or ECN) then it will continue to increase its window to try to get more bandwidth in the network

Lossless network => high throughput

But... a single streaming connection will consume all available buffers

Newer connections will have a hard time getting buffers => extreme unfairness

The server needs good congestion management
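The buffer-hogging effect can be sketched with a deliberately simplified window model. The assumptions are loud ones: the window grows by one packet per RTT (plain additive increase, no slow start), no loss or ECN signal ever arrives, anything beyond the bandwidth-delay product sits in buffers, and the name `hog_buffer_occupancy` and its parameters are hypothetical.

```python
def hog_buffer_occupancy(rtts, bdp_pkts, buffer_pkts, initial_cwnd=1):
    """Buffers held by one streaming connection at the start of each RTT.

    Packets in flight beyond the bandwidth-delay product (bdp_pkts) must be
    queued somewhere; the lossless network caps that at buffer_pkts instead
    of dropping.
    """
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        queued = min(max(cwnd - bdp_pkts, 0), buffer_pkts)
        history.append(queued)
        cwnd += 1   # no loss or ECN seen, so the window keeps growing
    return history


occupancy = hog_buffer_occupancy(rtts=12, bdp_pkts=2, buffer_pkts=8)
free_for_newcomer = [8 - q for q in occupancy]
print(occupancy)           # [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 8]
print(free_for_newcomer)   # [8, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0]
```

Once the streaming connection's window exceeds the pipe size plus the buffer pool, a connection that starts later finds no free buffers in this model.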


Page 10

Servers, Ethernet, and Queues

“Everyone” knows that big, simple FIFO queues are a bad idea in routers

What do servers have today? - big, simple FIFO queues!

The queues are owned and maintained by the Ethernet NIC hardware

Horrible unfairness can be demonstrated with only 2 TCP connections

Many servers deal with 1000s of TCP connections
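The two-connection case is easy to reproduce on paper. The sketch below compares a single FIFO transmit queue with per-flow round-robin service. Round-robin is used here only as a generic point of comparison; the talk does not say which remedy it has in mind, and the function names and the "hog"/"mouse" flow labels are illustrative.

```python
from collections import deque


def fifo_order(arrivals):
    """A FIFO queue transmits packets in exactly the order they arrived."""
    return list(arrivals)


def round_robin_order(arrivals):
    """Serve one packet per flow in turn, using one queue per flow."""
    per_flow = {}
    for flow in arrivals:
        per_flow.setdefault(flow, deque()).append(flow)

    order = []
    active = deque(per_flow)          # flows in first-arrival order
    while active:
        flow = active.popleft()
        order.append(per_flow[flow].popleft())
        if per_flow[flow]:            # flow still has packets: back of the line
            active.append(flow)
    return order


def packets_ahead(order, flow):
    """Number of packets transmitted before the flow's first packet."""
    return order.index(flow)


# A streaming "hog" has 1000 packets queued when a one-packet "mouse" arrives.
arrivals = ["hog"] * 1000 + ["mouse"]
print(packets_ahead(fifo_order(arrivals), "mouse"))          # 1000
print(packets_ahead(round_robin_order(arrivals), "mouse"))   # 1
```

With FIFO the small connection waits behind the hog's entire backlog; with per-flow service it waits behind a single packet.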


Page 11

[Figure: Connection Size vs Throughput – idle 1G link. X-axis: connection size, 10 to 1,000,000,000 (powers of ten). Y-axis: throughput, 0 to 1,000,000,000.]


Page 12

[Figure: Connection Size vs Throughput – busy 1G link – competing with a single “hog” connection. X-axis: connection size, 10 to 1,000,000,000 (powers of ten). Y-axis: throughput, 0 to 500,000,000. Series: Ideal, Actual. Annotation: “UNFAIR!”]


Page 13

Improving Server Congestion Management

Omitted due to event rules!


Page 14

TCP: Rock or Hard Place?

With lossy Ethernet, TCP bandwidth can collapse due to stupidly high timeouts => Unpredictable performance

With lossless Ethernet, TCP fairness can collapse due to stupid queuing policies => Unpredictable performance

Data Center Managers hate unpredictability

Ethernet standards have evolved, TCP needs to catch up

TCP and Ethernet implementations must improve


Page 15

Why does this matter?

The Earth is being paved by data centers
  Google, Microsoft, NSA, Walmart, Facebook, ...

Improving TCP means more overall efficiency in the data center

Heat, CO2, and radioactive waste are becoming measurable by-products of TCP inefficiency

Fix TCP => Save the World!