Reliable Sockets: A Foundation for Mobile Communications
Paradyn/Condor Week (March 2001, Madison WI) ©2001 Victor C. Zandy
Reliable Sockets: A Foundation for Mobile Communications
Victor C. Zandy
Computer Sciences Department
University of Wisconsin-Madison
[2/36]
Motivation
• Network communication is unreliable
  • Modems disconnect spontaneously
  • Computers run on batteries
• Many IP addresses are not static
  • Assignment by DHCP
  • Mobile computers move across networks
• Applications do not respond well to these failures
[3/36]
Reliable Sockets (Rocks)
• Sockets that tolerate
  • IP address changes
  • Link failures
  • Extended periods of disconnection
• Automatically detect failures and recover
• No loss of in-flight data
• Applications are oblivious to failures
[4/36]
Rocks are General Purpose
• Rocks can be used for
  • UDP and TCP (and everything over them)
  • Connected sockets and listening sockets
• Interoperate with plain sockets
• Transparent, user-level, and portable
[5/36]
Applications
• Remote shells
  • Mail, editor
  • Long-running builds
• Remote GUI-based applications
  • Office apps
• Mobile and reliable UDP
  • Streaming video and audio
[6/36]
Applications
• Process migration
  • Checkpoint Condor jobs with open sockets
  • Migrate desktop applications
[7/36]
Related Work
• Emphasize mobility, not reliability
  • No extended periods of disconnection
  • Lack mechanisms for failure detection and automatic reconnection
• Based on kernel modifications
  • Must be root to install
  • Unportable
  • Protocol internals
• Mobile IP, TCP Migrate, MSOCKS
[8/36]
TCP Sockets
[Figure: Host A (IP 128.1.2.3). The application uses the Sockets API to reach a TCP socket in the kernel (port 10000, send and receive buffers), which connects to the network.]
[9/36]
TCP Data Flow
[Figure: Host A (IP 128.1.2.3, port 10000) calls write on a TCP connection to Host B (IP 144.0.1.1, port 22). Host A writes data blocks 1 to 5; blocks 1 to 3 appear in the socket buffers.]
[10/36]
TCP Data Flow
[Figure: as on the previous slide, with blocks 4 and 5 also shown in the socket buffers.]
[11/36]
TCP Data Flow
[Figure: as on the previous slide, with the buffered blocks labeled "In-flight data".]
[12/36]
TCP Data Flow
[Figure: as on the previous slide; Host B calls read and receives blocks 1 to 3.]
[13/36]
Socket Failures
[Figure: the TCP connection between Host A (IP 128.1.2.3, port 10000) and Host B (IP 144.0.1.1, port 22), annotated with two kinds of failure.]
• Disconnection
  • Host suspension
  • Link failure
• New IP address
  • Host movement
  • Lease expiry
  • Process migration
[14/36]
Effect on Applications
[Figure: the same connection between Host A (IP 128.1.2.3, port 10000) and Host B (IP 144.0.1.1, port 22), with write and read calls in progress.]
• Sockets API calls (write, read) fail
• In-flight data is lost
[15/36]
What Rocks Do
• Detect socket failure
  • Hide failure from the application
• Automatically reconnect
  • Recover in-flight data
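[16/36]
Reliable Sockets
[Figure: Host A (IP 128.1.2.3). The Rocks library, which holds an in-flight buffer, sits between the application and the kernel TCP socket (port 10000, send and receive buffers). The Sockets API appears both above and below the library. Labels: Application, Rocks Library, In-Flight, Kernel, TCP Socket, Rock, Network.]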
[17/36]
Rock Data Flow
[Figure: Host A (IP 128.1.2.3, port 10000) calls write and Host B (IP 144.0.1.1, port 22) calls read. Each rock has an in-flight buffer between two Sockets API layers.]
• On write (Host A): copy data, count bytes sent.
• On read (Host B): count bytes read.
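The bookkeeping on this slide (copy data, count bytes sent, count bytes read) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Rocks implementation: the class `RockEnd`, its method names, and the `transport_send` callback are hypothetical, and the in-flight copy is left unbounded for simplicity.

```python
class RockEnd:
    """Hypothetical bookkeeping for one end of a rock (illustration only)."""

    def __init__(self, transport_send):
        # transport_send is a stand-in for the real TCP send call.
        self._transport_send = transport_send
        self.in_flight = bytearray()  # copy of data handed to the network
        self.bytes_sent = 0           # total bytes written by the application
        self.bytes_read = 0           # total bytes read by the application

    def write(self, data: bytes) -> int:
        self.in_flight += data        # copy data
        self.bytes_sent += len(data)  # count bytes sent
        self._transport_send(data)
        return len(data)

    def on_read(self, data: bytes) -> bytes:
        self.bytes_read += len(data)  # count bytes read
        return data
```

The two counters are what later let each end work out which part of its in-flight copy the remote application has not yet read.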
[18/36]
Response to Failure
[Figure: Host A (IP 128.1.2.3, port 10000) and Host B (IP 144.0.1.1, port 22) connected through rocks, each with an in-flight buffer. Host A calls write.]
[19/36]
Response to Failure
[Figure: the same two rocks, without the write call.]
[20/36]
Response to Failure
[Figure: the same two rocks, each marked with "!".]
Each rock detects the failure within seconds.
[21/36]
Response to Failure
[Figure: Host A (IP 128.1.2.3) and Host B (IP 144.0.1.1), each with only its in-flight buffer; the TCP sockets are no longer shown.]
Each rock suspends:
• Close TCP socket
• Block application
• Attempt to reconnect
[22/36]
Response to Failure
[Figure: as on the previous slide, but Host A now has a new IP address, 207.10.0.1. Host B remains at 144.0.1.1.]
Each rock suspends:
• Close TCP socket
• Block application
• Attempt to reconnect
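One way to picture the "block application" part of suspension is a gate that application calls must pass through. The sketch below is an assumption-laden illustration, not the Rocks code: the class `SuspendableRock` and its methods are hypothetical, and closing the TCP socket and reconnecting are left out.

```python
import threading


class SuspendableRock:
    """Hypothetical rock that blocks application writes while suspended."""

    def __init__(self):
        self._connected = threading.Event()
        self._connected.set()  # starts out connected
        self.sent = []         # stand-in for data handed to the TCP socket

    def suspend(self):
        # Called when a failure is detected: application calls now block.
        self._connected.clear()

    def resume(self):
        # Called after recovery: blocked application calls continue.
        self._connected.set()

    def write(self, data: bytes, timeout=None) -> int:
        # Wait until the rock is connected instead of returning an error.
        if not self._connected.wait(timeout):
            raise TimeoutError("rock is still suspended")
        self.sent.append(data)
        return len(data)
```

From the application's point of view, a write issued during suspension simply takes longer to return, which is how the failure stays hidden.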
[23/36]
Recovery
[Figure: Host A (IP 207.10.0.1, port 30001) and Host B (IP 144.0.1.1, port 22), each with an in-flight buffer and fresh send and receive buffers.]
New TCP connection.
[24/36]
Recovery
[Figure: as on the previous slide.]
Authenticate.
[25/36]
Recovery
[Figure: as on the previous slide.]
Authenticate.
Retransmit in-flight data not received by remote application.
[26/36]
Recovery
[Figure: as on the previous slide; Host B calls read.]
Authenticate.
Retransmit in-flight data not received by remote application.
Then resume the rock.
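The retransmission step can be worked out from the byte counters alone. The sketch below assumes that, after reconnecting, each end tells its peer how many bytes its application has read; the function name `bytes_to_retransmit` and its parameters are hypothetical and only illustrate the arithmetic.

```python
def bytes_to_retransmit(in_flight: bytes, bytes_sent: int,
                        peer_bytes_read: int) -> bytes:
    """Return the tail of in_flight that the remote application has not read.

    in_flight holds the most recently sent bytes, ending at offset bytes_sent.
    peer_bytes_read is the count reported by the peer after reconnection.
    """
    missing = bytes_sent - peer_bytes_read
    if missing < 0 or missing > len(in_flight):
        raise ValueError("peer count is outside the buffered range")
    # The last `missing` bytes of the copy are the ones to send again.
    return in_flight[len(in_flight) - missing:]
```

For example, if blocks 1 to 5 were sent and the peer reports that it read 3 of them, `bytes_to_retransmit(b"12345", 5, 3)` returns `b"45"`.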
[27/36]
Reconnection
[Figure: Host A (128.1.2.3) connected to Host B (144.0.1.1).]
[28/36]
Reconnection
[Figure: Host A changes its IP address to 207.10.0.1. Host B remains at 144.0.1.1.]
Connection end moves to new IP address.
[29/36]
Reconnection
[Figure: Host A (207.10.0.1) and Host B (144.0.1.1). One connection attempt is marked "Connection does not complete".]
Each end attempts to reconnect to its peer at its last known address.
[30/36]
Reconnection
[Figure: Host A (207.10.0.1) and Host B (144.0.1.1), reconnected.]
As long as one end does not move, they eventually reconnect.
[31/36]
Reconnection
[Figure: Host A (207.10.0.1) and Host B (101.8.7.1). Both connection attempts are marked "Connection does not complete".]
They cannot reconnect if both ends move.
[32/36]
Reconnection
[Figure: Host A (207.10.0.1) and Host B (101.8.7.1) each query a network proxy: "Where is A?" and "Where is B?"]
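The retry behavior described on these slides, where each end keeps trying its peer's last known address, might look like the loop below. It is a sketch under assumptions: the function `reconnect`, its parameters, and the default values are hypothetical; only the connecting side is shown, and authentication and the network proxy lookup are omitted.

```python
import socket
import time


def reconnect(last_known_addr, timeout_s=10.0, retry_interval_s=0.2):
    """Retry a TCP connection to the peer's last known (host, port).

    Returns a connected socket, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            return socket.create_connection(last_known_addr,
                                            timeout=retry_interval_s)
        except OSError:
            # Peer unreachable or not listening yet: wait, then try again.
            time.sleep(retry_interval_s)
    return None
```

If the peer has moved, every attempt fails and the loop returns `None` at the deadline. That is the case in which both ends have moved and neither can find the other directly.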
[33/36]
Expanded Rocks API
• API allows rocks-aware applications to control rocks behavior
• Fine control of reconnection
  • Notification when rock is suspended
  • Manual control of reconnection addresses
  • Notification when rock is resumed
[34/36]
Expanded Rocks API
• New socket options
  • Extended getsockopt and setsockopt
• Policies
  • Which ports are excluded?
• Parameters
  • Reconnection timeout
  • Sensitivity to connection failures
[35/36]
Performance
• Reconnection latency
  • 1-2 seconds to reconnect
  • Usually less than time to acquire DHCP lease
• Suspended rocks have negligible overhead

            Sockets    Rocks      Slowdown
10MB FTP    27.38 s    26.48 s    1x
Connect     5.6 ms     20.4 ms    4x
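The slowdown column is the Rocks time divided by the plain-sockets time, rounded to a whole number. A short check of that arithmetic, using only the figures in the table (the names `measurements` and `slowdown` are illustrative):

```python
# Measurements copied from the table, converted to seconds.
measurements = {
    "10MB FTP": (27.38, 26.48),
    "Connect": (5.6e-3, 20.4e-3),
}


def slowdown(sockets_s: float, rocks_s: float) -> float:
    """Ratio of Rocks time to plain-sockets time."""
    return rocks_s / sockets_s


for name, (sockets_s, rocks_s) in measurements.items():
    print(f"{name}: {slowdown(sockets_s, rocks_s):.2f}x")
```

The ratios come out to about 0.97 for the FTP transfer and 3.64 for connection setup, which the slide rounds to 1x and 4x.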
[36/36]
Conclusion
• Rocks make sockets completely reliable
  • Protect from link failures and IP address changes
  • Use with any application
• Our release is ready for download
  • Ready for remote shells and remote GUIs
  • http://www.cs.wisc.edu/~zandy/rocks
• See the demo on Wednesday!
[37/36]
Detecting Failures
• Users expect quick response to failures.
• Heartbeat:
  • Periodically send heartbeat to peer
  • Watch for too many missed heartbeats
• Sockets API Errors:
  • Too slow to rely upon
  • Not reported for idle connections
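The heartbeat approach can be sketched as a small monitor that counts how many heartbeat intervals have passed since the peer was last heard from. This is an illustration only: the class `HeartbeatMonitor`, the interval, and the missed-heartbeat threshold are hypothetical values, not the ones Rocks uses, and sending the heartbeats is left out.

```python
import time


class HeartbeatMonitor:
    """Hypothetical detector: declare failure after too many missed heartbeats."""

    def __init__(self, interval_s=1.0, max_missed=3, clock=time.monotonic):
        self.interval_s = interval_s  # expected time between heartbeats
        self.max_missed = max_missed  # tolerated number of missed heartbeats
        self._clock = clock           # injectable clock, useful for testing
        self._last_heard = clock()

    def on_heartbeat(self):
        # Call whenever a heartbeat arrives from the peer.
        self._last_heard = self._clock()

    def missed(self) -> int:
        # Whole heartbeat intervals elapsed since the last heartbeat.
        return int((self._clock() - self._last_heard) // self.interval_s)

    def peer_failed(self) -> bool:
        return self.missed() > self.max_missed
```

With a one-second interval and a threshold of three, a silent peer is reported as failed after four seconds, whether or not the application is using the connection.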
[38/36]
Detecting Failures
• The TCP keep-alive probe is inadequate
  • It waits two hours to send its first probe
  • User cannot change its period