Optimizing UDP-based Protocol Implementations
Yunhong Gu and Robert L. Grossman
Presenter: Michal Sabala
National Center for Data Mining
Outline
• UDP Performance Characteristics and Optimizations
• Composable UDT: A Framework for UDP-based Protocol Implementations
Part I. UDP Performance Characteristics and Optimization Techniques
Introduction
• UDP-based protocols are needed
– As a short-term solution to the lack of effective kernel space transport protocols for high bandwidth-delay product networks
– As application-specific data transfer libraries, e.g., for multimedia data transfer
• Implementing a new UDP-based protocol from scratch is not an easy task
– And it may not be necessary!
UDP Performance
• Sending and receiving buffer size
• Packet size
• IO mode
– Scattering/gathering (writev/readv)
– Memory copy avoidance (e.g., overlapped IO of Windows Sockets 2)
• To reach the same data transfer rate, UDP needs slightly less CPU time than TCP and causes slightly less end system delay
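The buffer-size and scattering/gathering items above can be illustrated with a minimal POSIX sketch. The function names (setSocketBuffers, sendGathered, recvScattered) are hypothetical and not part of UDT; the sketch assumes a connected datagram socket and relies only on the standard setsockopt, writev, and readv calls.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstddef>

// Request larger kernel send and receive buffers on a socket.
// Returns true if both calls succeed; the kernel may still clamp
// the value to a system-wide limit.
bool setSocketBuffers(int fd, int bytes) {
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == 0 &&
           setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
}

// Gather: send a protocol header and a payload that live in separate
// buffers as one datagram, without first copying them into one buffer.
ssize_t sendGathered(int fd, const void* hdr, size_t hdrLen,
                     const void* data, size_t dataLen) {
    iovec iov[2];
    iov[0].iov_base = const_cast<void*>(hdr);
    iov[0].iov_len  = hdrLen;
    iov[1].iov_base = const_cast<void*>(data);
    iov[1].iov_len  = dataLen;
    return writev(fd, iov, 2);
}

// Scatter: receive one datagram, placing the header and the payload
// directly into their separate destination buffers.
ssize_t recvScattered(int fd, void* hdr, size_t hdrLen,
                      void* data, size_t dataCap) {
    iovec iov[2];
    iov[0].iov_base = hdr;
    iov[0].iov_len  = hdrLen;
    iov[1].iov_base = data;
    iov[1].iov_len  = dataCap;
    return readv(fd, iov, 2);
}
```

With this layout the payload can be read straight into the application's buffer while the protocol header goes to a small fixed-size structure, which is one way to avoid an intermediate copy.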
UDP Performance: Impact of Buffer Size
[Figure: throughput (Mbps, axis 0 to 1000) as a function of sender's buffer size (KB) and receiver's buffer size (KB); buffer-size ticks on both axes: 16, 64, 256, 1K, 4K, 16K]
UDP Performance: Impact of Packet Size
[Figure: throughput (Mbps, axis 0 to 1000) and CPU utilization (MHz/Mbps, axis 0 to 8) versus packet size (bytes; ticks 100, 300, 600, 900, 1500, 3000, 6000, 9000), plotted for sending and receiving]
UDP-based Protocol Performance
• Additional overhead
– Additional memory copy
– Additional packet processing
– Additional context switches
Optimization Guidelines
• Avoid additional memory copy
• Reduce the number of packets
– Control packets, esp. acknowledgements
• Reduce overall processing time
– Simpler mechanism is better
• Avoid bursts in processing time
– CPU may be too busy to process incoming packets
Optimization Guidelines
• Memory copy avoidance
– UDP IO
– API semantics
• Acknowledgements
– Timer-based acknowledging
– Light ACK
– Loss processing
• Timing, rate control, and self-clocking
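As a rough illustration of timer-based acknowledging, the sketch below sends at most one ACK per timer interval instead of one per received packet. The class TimerAck, its methods, and the interval passed to it are hypothetical and are not taken from the UDT source; loss handling is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical receiver-side helper: acknowledge on a fixed timer
// rather than once per received packet.
class TimerAck {
public:
    explicit TimerAck(std::uint64_t intervalUs)
        : m_intervalUs(intervalUs), m_lastAckUs(0),
          m_nextExpected(0), m_lastAcked(0) {}

    // Record an in-order packet arrival. No ACK is generated here.
    void onData(std::uint32_t seq) {
        if (seq == m_nextExpected) ++m_nextExpected;
    }

    // Called from the receiving loop with the current time. Returns true
    // and fills ackSeq when the timer has expired and new data arrived
    // since the previous ACK.
    bool poll(std::uint64_t nowUs, std::uint32_t& ackSeq) {
        if (nowUs - m_lastAckUs < m_intervalUs) return false;
        m_lastAckUs = nowUs;
        if (m_nextExpected == m_lastAcked) return false;
        m_lastAcked = m_nextExpected;
        ackSeq = m_lastAcked;
        return true;
    }

private:
    std::uint64_t m_intervalUs;    // ACK timer period in microseconds
    std::uint64_t m_lastAckUs;     // time the timer last fired
    std::uint32_t m_nextExpected;  // next in-order sequence number
    std::uint32_t m_lastAcked;     // value carried by the previous ACK
};
```

At high packet rates many packets arrive within one interval, so a single cumulative ACK replaces many per-packet ACKs, which is the reduction in control packets the guidelines call for.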
Optimization Guidelines
• Disk IO
– sendfile/recvfile
• Threading
– Synchronization cost
• Code optimization
– Sending/receiving loop
• Profiling
Part II. Composable UDT: A Framework for UDP-based Protocol Implementations
Composable UDT
• Based on the UDT (UDP-based Data Transfer library) implementation
• Integrates the optimization techniques described in this paper
Objectives
• Rapid development of UDP-based transport protocols and application specific data transfer libraries
• Easy evaluation of new congestion control algorithms
• Non-objectives
– Replace kernel space protocol implementations
– User-level TCP implementation
Current Status
• UDT/CCC: Configurable congestion control
• In the future
– Data reliability configuration
– Message boundary support
Configurable Congestion Control
• Packet sending control
– Rate-based, window-based, hybrid
• Redefinition of control event handlers
– Loss, ACK, Time Out, etc.
• Access to internal protocol parameters
– RTT, RTO, Loss Rate, etc.
• User customized packet formats
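One way to picture the three packet sending modes is a gate that checks both an inter-packet period (rate control) and a window of unacknowledged packets (window control). The SendGate class and periodForRate helper below are hypothetical illustrations, not the UDT implementation: a period of 0 gives pure window-based control, a very large window gives pure rate-based control, and finite values for both give a hybrid.

```cpp
// Inter-packet period in microseconds needed to reach a target rate.
// packetBytes * 8 bits divided by (mbps * 1e6 bits/s) is packetBytes * 8 / mbps us.
double periodForRate(double mbps, unsigned packetBytes) {
    return packetBytes * 8.0 / mbps;
}

// Hypothetical sender-side gate combining rate and window control.
class SendGate {
public:
    SendGate(double periodUs, double cwndPackets)
        : m_periodUs(periodUs), m_cwnd(cwndPackets),
          m_nextSendUs(0.0), m_inFlight(0) {}

    // Returns true if a packet may be sent at time nowUs, and if so
    // accounts for it as in flight and schedules the next send time.
    bool trySend(double nowUs) {
        if (m_inFlight >= m_cwnd) return false;  // window-limited
        if (nowUs < m_nextSendUs) return false;  // rate-limited
        ++m_inFlight;
        m_nextSendUs = nowUs + m_periodUs;
        return true;
    }

    // An ACK covering `acked` packets frees room in the window.
    void onAck(unsigned acked) {
        m_inFlight = (acked >= m_inFlight) ? 0 : m_inFlight - acked;
    }

private:
    double   m_periodUs;    // minimum gap between packets; 0 disables pacing
    double   m_cwnd;        // maximum number of unacknowledged packets
    double   m_nextSendUs;  // earliest time the next packet may leave
    unsigned m_inFlight;    // packets sent but not yet acknowledged
};
```

For example, 1500-byte packets at 1000 Mbps correspond to a period of 12 microseconds.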
Implementation
• C++ class inheritance
– CCC: base class for control event handling
• Callbacks
• Performance monitoring
– Internal protocol parameters
– Performance statistics
Implementation
[Figure: architecture diagram with the labels Application, CC, UDT Socket, UDT, CC, OS Socket Interface, UDP, Callbacks, and Memory Copy Bypass]
Example: Simplified TCP
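class CTCP: public CCC
{
public:
   // No inter-packet sending period (window-based control only),
   // initial window of 2 packets, ACK interval of 2.
   virtual void init() { m_dPktSndPeriod = 0.0; m_dCWndSize = 2.0; setACKInterval(2); }

   // Additive increase: grow the window by 1/cwnd on every ACK.
   virtual void onACK(const int&) { m_dCWndSize += 1.0/m_dCWndSize; }

   // Multiplicative decrease: halve the window on a loss event.
   virtual void onLoss(const int*, const int&) { m_dCWndSize *= 0.5; }
};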
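To trace how the window evolves without linking against UDT, the example can be compiled against a stand-in base class. The CCC class below is NOT the real UDT header: it is a minimal assumption that mirrors only the member names the slide uses, and the window() accessor is a hypothetical addition for inspection.

```cpp
// Stand-in for the UDT CCC base class, reduced to what the slide's
// example touches. Not the real UDT header.
class CCC {
public:
    virtual ~CCC() {}
    virtual void init() {}
    virtual void onACK(const int&) {}
    virtual void onLoss(const int*, const int&) {}

    // Hypothetical accessor so a test harness can read the window.
    double window() const { return m_dCWndSize; }

protected:
    void setACKInterval(const int& n) { m_iACKInterval = n; }

    double m_dPktSndPeriod = 1.0;  // inter-packet period (rate control)
    double m_dCWndSize = 16.0;     // congestion window in packets
    int    m_iACKInterval = 0;     // ACK interval set by the subclass
};

// The simplified TCP controller from the slide, unchanged in behavior.
class CTCP : public CCC {
public:
    virtual void init() {
        m_dPktSndPeriod = 0.0;
        m_dCWndSize = 2.0;
        setACKInterval(2);
    }
    virtual void onACK(const int&) { m_dCWndSize += 1.0 / m_dCWndSize; }
    virtual void onLoss(const int*, const int&) { m_dCWndSize *= 0.5; }
};
```

Starting from a window of 2.0, one ACK raises it to 2.5 and a subsequent loss event halves it to 1.25, which is the additive-increase, multiplicative-decrease pattern in miniature.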
Configurable Congestion Control
[Figure: throughput (Mb/s, axis 0 to 700) over time (seconds, ticks 1 to 6001) for U-TCP and K-TCP, together with the average of each]
Future Work
• Continue to improve the UDT/CCC library
• More experimental evaluation work of the UDT/CCC library
– Compare k-TCP and u-TCP in more network environments
– Implement more TCP variants
• More pre-implemented congestion control algorithms
Conclusion
• UDP-based protocols are one solution to bulk data transfer in high BDP networks
• Some optimization principles and techniques are discussed in this paper
• We further propose a composable framework in order to make it much easier to implement UDP-based protocols
Thank you!
For more information, please visit
UDT Project: http://udt.sf.net
NCDM: http://www.ncdm.uic.edu
Backup Slides
UDP Performance: Experiment Setup
Name      CPU                    Memory   NIC      OS
onno      Dual Itanium2 1.5GHz   8 GB     10 GbE   Linux 2.6.0
sara77    Dual Xeon 2.4GHz       2 GB     1 GbE    Linux 2.4.18
ncdm171   Dual PowerPC G4 1GHz   2 GB     1 GbE    Mac OS X
win91     Dual Xeon 2.4GHz       2 GB     1 GbE    Windows XP Professional
ncdm87    Dual Opteron 2.4GHz    4 GB     1 GbE    Linux 2.6.8
UDP Performance: CPU Utilization
Name      UDP Sending   UDP Receiving   TCP Sending   TCP Receiving
onno      0.22          0.35            0.23          0.50
sara77    0.40          0.45            0.51          0.51
ncdm171   1.22          1.45            2.22          2.73
win91     1.03          1.09            1.14          1.28
ncdm87    0.26          0.40            0.25          0.56
UDP Performance: End System Delay
Name      UDP Delay (ms)   TCP Delay (ms)
onno      0.062            0.068
sara77    0.070            0.086
ncdm171   0.202            0.245
win91     0.203            0.302
ncdm87    0.065            0.087
UDT Profiling: Modules
[Figure: percentage (axis 0 to 100) of processing time per module, for sending and receiving]
UDT Profiling: Functionalities
[Figure: percentage (axis 0 to 30) of processing time per functionality, for sending and receiving]
CPU Utilization: K-TCP vs U-TCP
Machine   Sender K-TCP   Sender U-TCP   Receiver K-TCP   Receiver U-TCP
onno      0.23           0.60           0.50             0.44
sara77    0.51           0.65           0.51             0.78
ncdm171   2.22           3.46           2.73             3.26
win91     1.14           2.07           1.28             1.20
ncdm87    0.25           0.52           0.56             0.60