transport and application layer approaches to improve end-to-end performance in the internet phd...


Post on 23-Dec-2015


  • Slide 1
  • Transport and Application Layer Approaches to Improve End-to-end Performance in the Internet. PhD thesis defense, Amit Mondal. Committee: Aleksandar Kuzmanovic, Asst. Professor, Northwestern Univ; Peter Dinda, Assoc. Professor, Northwestern Univ; Yan Chen, Assoc. Professor, Northwestern Univ; Jin Li, Principal Researcher, Microsoft Research.
  • Slide 2
  • The Internet is a commercial infrastructure used by a diverse set of applications and services. [Figure: the Internet as a multiservice IP network carrying FTP, IPTV, VoIP, video conferencing, streaming, and gaming.]
  • Slide 3
  • Challenges involved. Applications have end-to-end network performance requirements (jitter, latency, packet loss, bandwidth, etc.), such as low delay and high throughput. The original Internet offers best-effort service with no service assurance: TCP ensures only in-order packet delivery, and IP routing is destination-based. There is a need to support the new set of emerging applications in the Internet.
  • Slide 4
  • Application classification based on QoS. Latency sensitive, low bandwidth: VoIP, network games, SSH, chatting, web browsing, e-commerce. Latency sensitive, high bandwidth: multimedia streaming, IPTV, audio/video conferencing. Latency insensitive, low bandwidth: email. Latency insensitive, high bandwidth: file transfer (FTP/BitTorrent). My focus: low-latency interactive TCP applications (Chapter II and III), e.g. Telnet, SSH, network games, e-commerce; and interactive multimedia services (Chapter IV and V), e.g. audio/video conferencing, VoIP, streamed multimedia services.
  • Slide 5
  • The spectrum of QoS provisioning, by layer and by where the mechanism is deployed. Application layer: infrastructure side has overlay routing (QRON, QSON, etc.) and Chapter IV; endpoint side has forward error correction, bitrate adaptation, Chapter V, etc. Transport layer: infrastructure side has ECN, ECN+, packet marking & differential dropping, service differentiation, etc.; endpoint side has TCP smart framing, limited retransmit, early retransmit, Chapter II, Chapter III, etc. Network layer: IntServ, DiffServ, traffic engineering, constraint-based routing. Data link layer: MPLS. Physical layer: bandwidth over-provisioning. Endpoint side for the lower layers: N/A.
  • Slide 6
  • Research thesis: Despite much work to improve end-to-end performance in the Internet, there still exists significant room for improvement. In my dissertation, I develop techniques to reduce the gap further. For example, I propose techniques that improve response times of short TCP flows by five times in certain scenarios, and the median Mean Opinion Score (MOS) of VoIP calls over WiFi by a factor of two.
  • Slide 7
  • Outline. Chapter I: Introduction. Chapter II: Improving performance of thin-stream TCP applications. Chapter III: Removing exponential backoff from TCP. Chapter IV: Multi-constraint QoS routing framework. Chapter V: Audio/video performance issues: diagnosis and solutions. Conclusion.
  • Slide 8
  • Chapter II: Improving thin-stream TCP flows. Upgrading mice to elephants. [Figure labels: data packets, dummy packets, strict priority, TCP-fair rate; packet switched vs. circuit switched.] A. Mondal and A. Kuzmanovic, When TCP Friendliness Becomes Harmful, IEEE INFOCOM 2007. A. Mondal and A. Kuzmanovic, Upgrading Mice to Elephants: Effects and End-Point Solutions, IEEE/ACM Transactions on Networking, Volume 18, Issue 2, April 2010.
  • Slide 9
  • Chapter III: Removing Exponential Backoff from TCP. Background: V. Jacobson, Congestion Avoidance and Control, in ACM CCR, 18(4): 314-329, Aug 1988. Exponential retransmit timer backoff. Implicit packet conservation principle. Response times of short and interactive flows improve by five times in certain scenarios. A. Mondal and A. Kuzmanovic, Removing Exponential Backoff from TCP, In ACM SIGCOMM CCR, Volume 38, Number 5, October 2008.
  • Slide 10
  • Chapter IV: Multi-constraint QoS routing framework. We design a framework that finds paths under multiple constraints without NP-hard computation (Dijkstra-style path computation under multiple constraints is NP-hard). The framework is a hybrid of a path vector protocol and on-demand route discovery. Using simulation based on real-world data, we demonstrated that our solution is both efficient and scalable. We built a functional prototype using the Click modular router. A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, Supporting Application Network Flows with Multiple QoS Constraints, In IEEE IWQoS 2009.
  • Slide 11
  • Chapter V: Audio/video performance issues: Diagnosis and solutions. Identify challenges towards high-quality audio/video conferencing over the Internet. Understand loss and jitter behavior at shorter time scales and quantify the impact of various network scenarios. Investigate solutions. A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, SureCall: Towards Glitch-Free Real-time Audio/Video Conferencing, In IEEE IWQoS 2010. A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, A Case of WiFi Relay: Improving VoIP Quality for WiFi Users, In IEEE ICC 2010.
  • Slide 12
  • Modern AV conferencing system. [Figure not preserved in the transcript.]
  • Slide 13
  • SureCall platform: a distributed measurement and experiment platform for understanding problems and experimenting with solutions. Agents are installed on volunteers' machines. Measurements and experiments are driven by masters. SureCall agents are upgradeable without user intervention. Available from http://research.microsoft.com/~chengh/SureCall/SureCall.htm
  • Slide 14
  • SureCall measurement. Emulated bidirectional audio/video sessions using UDP: 5 minutes per hour; audio bitrate 24 kbps; video bitrate 192 kbps; STUN NAT traversal protocol for home users; detailed packet-level traces collected. Network connectivity close to the clients: ICMP packet pair with TTL=2; traceroute to the other endpoint at the beginning and end of each session. Environmental details on client machines: CPU load, network interface type.
  • Slide 15
  • SureCall deployment: Microsoft global enterprise network and many residential networks. Current deployment status: 80 unique machines (enterprise: 32, home: 20, both: 28). Enterprise trace and home trace. Two separate masters (one within the enterprise network and one in the public Internet).
  • Slide 16
  • SureCall dataset. 4,800 hours of packet traces: 4,100 from enterprise, 700 from home. 1968 unique IP addresses: enterprise 1212, home 756. Trace classification and stratification: intra-continental vs inter-continental, wired vs wireless, audio-only vs audio+video. Trace preprocessing: clock skew removal. [Figure: clock skew in the wild.]
  • Slide 17
  • Jitter computation algorithm. There are multiple algorithms to compute jitter: (1) variance of one-way-delay samples; (2) time difference between the actual packet receiving time and the ideal receiving time. The second is most relevant for multimedia streaming/conferencing with a playout buffer.
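The two jitter definitions on this slide can be sketched as follows. This is a minimal illustration, not the SureCall implementation: the function names are hypothetical, timestamps are assumed to be in milliseconds with clock skew already removed, and the "ideal" receiving time is assumed to be the first arrival plus a constant packet interval.

```python
from statistics import pvariance


def jitter_owd_variance(send_ms, recv_ms):
    """Definition 1: variance of the one-way-delay samples (ms^2)."""
    delays = [r - s for s, r in zip(send_ms, recv_ms)]
    return pvariance(delays)


def jitter_vs_ideal(recv_ms, interval_ms):
    """Definition 2: per-packet gap between actual and ideal arrival time.

    Assumption: packet i should ideally arrive at first_arrival + i * interval.
    """
    first = recv_ms[0]
    return [r - (first + i * interval_ms) for i, r in enumerate(recv_ms)]


# Four packets sent every 20 ms; the third one is delayed by an extra 10 ms.
send = [0, 20, 40, 60]
recv = [50, 70, 100, 110]
variance = jitter_owd_variance(send, recv)
deviations = jitter_vs_ideal(recv, 20)
```

The second form maps directly onto a playout buffer: a packet whose deviation exceeds the buffer depth misses its playout deadline.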
  • Slide 18
  • Jitter in enterprise and residential networks. [Figures: US-US wired traces; inter-continental wired traces.] Residential networks have significantly higher jitter than enterprise networks and are greatly affected by inter-continental links.
  • Slide 19
  • Jitter variation across hosts. [Figures: enterprise; home.] Jitter variation is much higher in residential networks than in enterprise networks. The 95th percentile jitter values are significantly worse than the median jitter values in home networks.
  • Slide 20
  • Packet loss in residential and enterprise networks. Even well-provisioned enterprise networks can become quite congested at short time scales. Both enterprise and home networks show a long tail in the loss burst size distribution.
  • Slide 21
  • Impact of WiFi connections. [Figures: enterprise; home.] In both enterprise and home networks, wireless traces show significantly worse jitter statistics than wired traces.
  • Slide 22
  • Impact of WiFi connections. [Figures: enterprise; home.] In both enterprise and home networks, wireless traces show significantly worse jitter statistics than wired traces. The degradation due to WiFi in enterprise scenarios is more severe than that in home scenarios.
  • Slide 23
  • Impact of VPN on performance. [Figures: jitter; loss.] A VPN connection causes more degradation than a wireless connection.
  • Slide 24
  • Can jitter predict future loss events? We examine the extent to which loss and jitter are correlated, i.e. whether an abrupt jitter increase can serve as a precursor of network congestion and predict future loss events, so that audio/video conferencing applications can take anticipatory action. Observed: > 10 ms average increase in end-to-end delay for the last three packets preceding a loss event; enterprise networks ~82%, home networks ~80%.
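A rough sketch of how such a precursor check could be run over a packet trace is shown below. The slide does not say what baseline the "increase" is measured against, so the median delay of the whole trace is used here purely as an assumption; the function name and trace format (delay in ms, `None` for a lost packet) are also hypothetical.

```python
from statistics import mean, median


def loss_precursor_fraction(delays_ms, window=3, threshold_ms=10.0):
    """Fraction of loss events preceded by a delay rise above threshold.

    A loss event is the first lost packet of a burst. The rise is the mean
    delay of the last `window` received packets minus the trace median
    (the baseline choice is an assumption, not taken from the slides).
    """
    received = [d for d in delays_ms if d is not None]
    baseline = median(received)
    events = flagged = 0
    for i, d in enumerate(delays_ms):
        starts_burst = d is None and i > 0 and delays_ms[i - 1] is not None
        if not starts_burst:
            continue
        prior = [x for x in delays_ms[:i] if x is not None][-window:]
        if len(prior) < window:
            continue  # not enough history before this loss event
        events += 1
        if mean(prior) - baseline > threshold_ms:
            flagged += 1
    return flagged / events if events else 0.0


# Two loss events: the first follows a delay ramp, the second does not.
trace = [20, 21, 20, 22, 35, 40, 45, None, None, 21, 20, 22, 21, None, 20]
fraction = loss_precursor_fraction(trace)
```

An application could use the same test online, for example to add redundancy when the last few delays rise sharply.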
  • Slide 25
  • Correlation between loss burst size and jitter. [Figures: enterprise; home.] 1. End-to-end delay increases significantly before loss events in both enterprise and home networks. 2. The increase in end-to-end delay is not a great indicator of loss burst size in enterprise networks.
  • Slide 26
  • Network audio diagnostics. Concealed: percent of packets interpolated or extrapolated due to unrecovered packet loss. Stretched: percent of packets stretched via time compression. Classifier operates as follows [details not preserved in the transcript]. Supervised training with ground truth objectively determined by PESQ score.
  • Slide 27
  • Audio classifier performance. The classifier achieves a true positive rate > 80% and a false positive rate < 1% for T1 = T2 = 0.07.
  • Slide 28
  • WiFi Relay: Improving VoIP Quality for WiFi Users. There are a large number of WiFi clients in both enterprise and residential networks: 43% of enterprises provide only WiFi connections to their employees, and 36% use VoIP over WiFi. WiFi links can significantly degrade VoIP performance. Possible reasons: dense deployment of APs, overloading of an access point, other wireless devices in the vicinity, etc.
  • Slide 29
  • Effectiveness of redundancy. Passive analysis with voice packet replication, with replication ratio r = 2, 3, 4, or 5. Packet losses can be effectively mitigated using application-layer packet replication.
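One way such a passive analysis could be run on a recorded loss trace is sketched below. It assumes, as an illustration only, that with replication ratio r the payload of packet i is also carried in the following r-1 packets, so it is unrecoverable only if all r carrier packets were lost; the slides do not spell out the exact scheme.

```python
def residual_loss_rate(lost, r):
    """Loss rate left after replication ratio r, replayed over a loss trace.

    lost[i] is 1 if packet i was lost in the original trace, else 0.
    Assumption: payload i rides in packets i .. i+r-1 (fewer at the tail).
    """
    n = len(lost)
    unrecovered = 0
    for i in range(n):
        carriers = lost[i:i + r]
        if all(carriers):  # every copy of payload i was lost
            unrecovered += 1
    return unrecovered / n


# One isolated loss and one burst of three consecutive losses.
trace = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0]
rates = {r: residual_loss_rate(trace, r) for r in (1, 2, 3, 4)}
```

In this toy trace the isolated loss is repaired at r = 2, while the burst of three needs r = 4, which is why bursty loss calls for heavier replication.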
  • Slide 30
  • Overhead of replication. Typical audio packet size = 60 bytes, encapsulated with RTP (12 bytes), UDP (8 bytes), IP (20 bytes), 802.11 MAC (28 bytes), and PHY (20 us for 802.11g) headers. Without ACK: air time = DIFS + PHY header + (60+76 bytes)/54 Mbps = 70 us. Air time (us) by replication ratio: ratio 1: 70 w/o ACK, 102 w/ ACK; ratio 2: 79 w/o ACK, 111 w/ ACK; ratio 3: 87 w/o ACK, 120 w/ ACK; ratio 4: 96 w/o ACK, 128 w/ ACK. Replicating the audio packet at the application layer causes only a marginal increase in air time.
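The marginal cost in the air-time table can be checked with a few lines of arithmetic. The sketch below covers only the payload-dependent term of the slide's formula; DIFS and ACK timing are left out because their values are not given. It also assumes that the replicas are bundled into a single frame, which is inferred from the roughly 9 us step between table rows rather than stated on the slide.

```python
PAYLOAD_BYTES = 60  # typical audio packet size (from the slide)
HEADER_BYTES = 76   # header bytes used in the slide's air-time formula
RATE_MBPS = 54      # 802.11g data rate (from the slide)


def frame_tx_time_us(r):
    """Time to transmit the frame bytes for replication ratio r.

    Bits divided by Mbps gives microseconds. Assumes r payload copies
    share one set of headers (an inference from the table).
    """
    return (PAYLOAD_BYTES * r + HEADER_BYTES) * 8 / RATE_MBPS


base_us = frame_tx_time_us(1)                # about 20.1 us
marginal_us = frame_tx_time_us(2) - base_us  # about 8.9 us per extra copy
```

Each extra copy costs about 8.9 us, which matches the 8 to 9 us increments in the table and is small next to the fixed per-frame overhead.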
  • Slide 31
  • WiFi relay solution. Nearby wired endpoints act as relays. Heavy replication between relays and wireless endpoints. No dedicated infrastructure.
  • Slide 32
  • Evaluation. Evaluated on the SureCall platform, with SureCall clients upgraded to support relay. Simultaneous direct and relayed VoIP calls between each pair of SureCall agents give an apples-to-apples comparison. One-hop overlay (only one wireless endpoint) and two-hop overlay (both endpoints are wireless). Relay node selection is based on an enterprise internal database.
  • Slide 33
  • Impact of relay on jitter. No dedicated infrastructure; ordinary endpoints serve as relay nodes. [Figures: CDF of jitter difference at the 50th percentile; CDF of jitter difference at the 95th percentile.] Relay has negligible impact on end-to-end jitter.
  • Slide 34
  • Improvement with WiFi relay. Mean Opinion Score (MOS) is calculated from packet loss rate and jitter (Cole et al., CCR 01), with a fixed de-jitter buffer of 100 ms. WiFi relay significantly improves VoIP quality for WiFi users. WiFi relay greatly reduces packet loss.
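A minimal sketch of an E-model style MOS computation is given below for orientation. The R-to-MOS mapping is the standard ITU-T G.107 conversion. The delay impairment and the 94.2 base value follow the simplified reduction commonly attributed to Cole et al., quoted here from memory and worth checking against the paper. The loss impairment depends on the codec, so it is left as a caller-supplied input rather than guessed; all function names are hypothetical.

```python
def r_to_mos(r):
    """Standard E-model conversion from R-factor to MOS (1.0 to 4.5)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def delay_impairment(delay_ms):
    """Delay impairment Id; grows faster once delay exceeds 177.3 ms."""
    extra = 0.11 * (delay_ms - 177.3) if delay_ms > 177.3 else 0.0
    return 0.024 * delay_ms + extra


def r_factor(delay_ms, loss_impairment):
    """Simplified R-factor: base value minus delay and loss impairments.

    loss_impairment (Ie) is codec specific and must be supplied by the
    caller; with a fixed de-jitter buffer, late packets count as lost.
    """
    return 94.2 - delay_impairment(delay_ms) - loss_impairment


# 100 ms mouth-to-ear delay and no loss impairment.
mos_clean = r_to_mos(r_factor(100, 0.0))
```

With a fixed 100 ms de-jitter buffer, jitter enters this model indirectly: it either stays within the buffer or turns into additional effective loss.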
  • Slide 35
  • Summary of Chapter V. SureCall, a distributed experimental platform, addresses the challenges of audio/video communications over the Internet. Characterized enterprise and residential networks over a wide variety of network scenarios. Built a classifier that accurately predicts when network issues are most likely to cause audio quality degradation. Proposed WiFi relay, which significantly improves VoIP quality for WiFi clients.
  • Slide 36
  • Conclusion. Proposed easily deployable techniques to improve the performance of TCP-based interactive applications. Demonstrated that exponential backoff can be removed from TCP altogether without any stability issues. Designed an overlay framework to support multimedia services with multiple QoS constraints. Developed a distributed experimental framework, SureCall, to understand the challenges of IP-based audio/video communications and to rapidly evaluate new protocols.
  • Slide 37
  • Thank you!
  • Slide 38
  • Publications. [1] A. Mondal and A. Kuzmanovic, When TCP Friendliness Becomes Harmful, In IEEE INFOCOM 2007. [2] A. Mondal and A. Kuzmanovic, A Poisoning-Resilient TCP Stack, In IEEE ICNP 2007. [3] A. Mondal and A. Kuzmanovic, Removing Exponential Backoff from TCP, In ACM SIGCOMM CCR, Volume 38, Number 5, October 2008. [4] A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, Supporting Application Network Flows with Multiple QoS Constraints, In IEEE IWQoS 2009. [5] A. Kuzmanovic, A. Mondal, S. Floyd, and K.K. Ramakrishnan, Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets, RFC 5562, June 2009. [6] A. Mondal and A. Kuzmanovic, Upgrading Mice to Elephants: Effects and End-Point Solutions, In IEEE/ACM Transactions on Networking, Volume 18, Issue 2, April 2010. [7] A. Mondal, R. Cutler, C. Huang, J. Li, and A. Kuzmanovic, SureCall: Towards Glitch-Free Real-time Audio/Video Conferencing, In IEEE IWQoS 2010. [8] A. Mondal, C. Huang, M. Jain, J. Li, and A. Kuzmanovic, A Case of WiFi Relay: Improving VoIP Quality for WiFi Users, In IEEE ICC 2010. [9] A. Mondal, I. Trestian, Z. Quin, and A. Kuzmanovic, P2P as CDN (Akamizing BitTorrent), under submission. [10] J. Miller, A. Mondal, R. Potharaju, P. Dinda, and A. Kuzmanovic, Network Monitoring is People: Understanding End-user Perception of Network Problems, under submission.
  • Slide 39
  • Backup slides
  • Slide 40
  • QoS and the Internet. QoS architectures: Integrated Services (IntServ), Differentiated Services (DiffServ), Multi-Protocol Label Switching (MPLS), traffic engineering and constraint-based routing. Key challenges: scalability issues in the core, complex signaling protocols, deployment overhead. The current Internet still offers only a best-effort service. This motivates investigating easily deployable solutions that improve end-to-end network performance.
  • Slide 41
  • QoS using transport and application layer techniques without network support. Explicit congestion notification [Floyd 94]; packet marking and differential dropping [Guo and Matta 01]; limited transmit [Allman et al. 01]; service differentiation [Neoreddine and Tobagi 02]; TCP SAReno [Yang and Vecinia 02]; differential congestion notification [Le et al. 04]; TCP smart framing [Mellia et al. 05]; ECN+ [Kuzmanovic 05]; early retransmit [Allman et al. 06]; PCP [Anderson et al. 06].
  • Slide 42
  • Going beyond TCP-fair. Differentiated minRTO: application-limited flows use a reduced minRTO value. Short-term padding with dummy packets: application data is followed by three tiny dummy packets. Diversity approach: an application-layer FEC-based approach; the simplest FEC scheme is replication.
  • Slide 43
  • Why Exponential Backoff? Jacobson adopted exponential backoff from the classical shared-medium Ethernet protocol, arguing that an IP gateway has essentially the same behavior as the Ether in a shared-medium network.
  • Slide 44
  • Why Exponential Backoff? (continued) The claim that an IP gateway has essentially the same behavior as the Ether in a shared-medium network is not true!
  • Slide 45
  • Removing exponential backoff from TCP and its implications. Other reasons: no admission control, finite flow size, skewed traffic distribution, etc. When to resend a packet? Under the implicit packet conservation principle, as soon as the retransmission timeout expires. End-to-end performance can only improve if we remove the exponential backoff from TCP. Implications: significant improvement of response times for short and interactive TCP flows.
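To make the difference concrete, the sketch below compares when successive retransmissions of the same packet fire with and without timer doubling. It is a toy model under stated assumptions: a fixed base RTO, every retransmission lost, and no upper cap on the timer; the function name is hypothetical.

```python
def retransmit_times(rto, attempts, backoff=True):
    """Elapsed time (s) at which each successive retransmission is sent.

    With backoff the timer doubles after every expiry; without it the
    packet is resent as soon as the same timeout expires again.
    """
    times = []
    elapsed = 0.0
    timer = rto
    for _ in range(attempts):
        elapsed += timer
        times.append(elapsed)
        if backoff:
            timer *= 2
    return times


with_backoff = retransmit_times(1.0, 4, backoff=True)
without_backoff = retransmit_times(1.0, 4, backoff=False)
```

After four consecutive timeouts the fourth retransmission goes out at 15 s with backoff versus 4 s without, which is where the response-time gain for short flows comes from.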
  • Slide 46
  • Multiple QoS Constraints. The Internet is evolving towards a global multiservice IP network with diverse applications and different QoS requirements. Many applications have multiple QoS requirements: video streaming, VoIP, video conferencing, etc. They need support for end-to-end QoS guarantees under multiple constraints. Multiple QoS constraints often make the routing problem intractable.
  • Slide 47
  • QoS provisioning using overlay networks. Build an overlay backbone: deploy overlay nodes at strategic locations in the Internet; provide support for per-flow forwarding (e.g. Anagran flow-aware routers). Flow route management architecture: discover and set up end-to-end paths for individual flows with diverse QoS requirements; monitor end-to-end flow performance to trigger path adaptation.
  • Slide 48
  • Overlay flow QoS management architecture. [Figure: end users, overlay nodes, physical links, and logical links across AS1, AS2, AS3, and AS4.] Steps: sense local link characteristics; find a path to X with b/w > b, delay < d, and loss < l%; configure intermediate overlay nodes for per-flow forwarding; adapt to a different path dynamically when the current path fails to meet the QoS parameters.
  • Slide 49
  • Contribution. Design a scalable QoS routing protocol that finds paths under multiple constraints. Propose a distributed algorithm for dynamic path adaptation. Evaluate the accuracy, efficiency, and scalability of the protocol using large-scale simulation and compare it with existing approaches. Build a functional prototype using the Click modular router.
  • Slide 50
  • Design challenges. Multiple QoS metrics: finding a feasible path under multiple constraints with a Dijkstra-style algorithm is NP-complete; alternatives include randomized and approximation algorithms, or a single composite metric derived from multiple metrics, in which case paths might not meet individual QoS constraints. Dynamic overlay-link properties: these increase control message overhead.
  • Slide 51
  • Multi-constraint QoS routing protocol. A path vector protocol disseminates path information, tagged with QoS parameters. How to aggregate path information when multiple QoS metrics are considered? Distribute the best path for each metric. What about QoS requests that could be served by paths not in the best-path set? On-demand route discovery. A. Mondal, P. Sharma, S. Banerjee, and A. Kuzmanovic, Supporting Application Network Flows with Multiple QoS Constraints, In IEEE IWQoS 2009.
  • Slide 52
  • MCQoS: Disseminating path information. Each node takes local link info, tags paths with QoS characteristics, and advertises the best path for each QoS metric. [Figure: nodes A and B and the QoS path table for destination X, with one entry per metric. Delay entry: X, AS1 (2ms, 0.01%, 128Kbps), AS3 (3ms, 0.02%, 378Kbps), AS5. Loss entry: X, AS1 (2ms, 0.0%, 128Kbps), AS3 (3ms, 0.005%, 378Kbps), AS5. B/w entry: X, AS1 (10ms, 0.01%, 1Mbps), AS3 (5ms, 0.01%, 768Kbps), AS5.]
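The "advertise the best path for each QoS metric" step can be sketched as a simple selection over candidate paths. The candidate list, field names, and values below are hypothetical and only illustrate the idea; they are not the table from the slide.

```python
def best_per_metric(paths):
    """Pick the path to advertise for each QoS metric.

    Lower is better for delay and loss; higher is better for bandwidth.
    """
    return {
        "delay": min(paths, key=lambda p: p["delay_ms"]),
        "loss": min(paths, key=lambda p: p["loss_pct"]),
        "bw": max(paths, key=lambda p: p["bw_kbps"]),
    }


# Hypothetical candidate paths to one destination.
candidates = [
    {"via": "p1", "delay_ms": 2, "loss_pct": 0.01, "bw_kbps": 128},
    {"via": "p2", "delay_ms": 3, "loss_pct": 0.005, "bw_kbps": 378},
    {"via": "p3", "delay_ms": 10, "loss_pct": 0.01, "bw_kbps": 1000},
]
advertised = best_per_metric(candidates)
```

Any path that is not the best for at least one metric is never advertised, which keeps the advertisement size bounded by the number of metrics.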
  • Slide 53
  • MCQoS: Aggregating path information. [Figure: delay vs. bandwidth (b/w) plane divided into feasible, infeasible, and undecidable regions by the best-b/w and best-delay paths.] The source node already knows a path if the QoS request falls in the feasible region. No feasible path can exist in the network if the QoS request falls in the infeasible region. In the undecidable region there will be feasible requests that can be supported, but the source node might not know about those paths and thus cannot admit flows based on local information. What about QoS requests in the undecidable region?
  • Slide 54
  • MCQoS: On-demand route discovery. Admit or deny a flow based on the local QoS table if the request falls in the feasible or infeasible region. Otherwise, use on-demand route discovery for requests in the undecidable region. Exploit advertisements received from neighbors to reduce the search space during route discovery. [Figure: delay vs. b/w plane with feasible, infeasible, and undecidable regions; example nodes A, B, C, D, E.]
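The local admit/deny decision can be sketched for the two-metric (delay, bandwidth) case as follows. This is an illustration under assumptions, not the protocol's code: the source is assumed to know only its best-delay and best-b/w paths, and the numbers are chosen so that four requests reproduce the OK / OK / X / ??? outcomes listed on the example slide.

```python
def classify(request, best_delay_path, best_bw_path):
    """Place a QoS request in the feasible, infeasible, or undecidable region.

    request is (max_delay_ms, min_bw_mbps); each path is (delay_ms, bw_mbps).
    """
    max_delay, min_bw = request
    # Feasible: a path already known locally satisfies both constraints.
    for delay, bw in (best_delay_path, best_bw_path):
        if delay <= max_delay and bw >= min_bw:
            return "feasible"
    # Infeasible: the request beats the best achievable value of a metric.
    if max_delay < best_delay_path[0] or min_bw > best_bw_path[1]:
        return "infeasible"
    # Otherwise a suitable path may exist but was pruned from advertisements.
    return "undecidable"


best_delay = (10, 12)   # illustrative: lowest-delay path, 10 ms at 12 Mbps
best_bw = (100, 50)     # illustrative: highest-b/w path, 100 ms at 50 Mbps
requests = [(10, 3), (120, 15), (10, 100), (15, 15)]
regions = [classify(q, best_delay, best_bw) for q in requests]
```

Only the last request would trigger on-demand route discovery; the first three are decided from the local table alone.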
  • Slide 55
  • MCQoS: Illustration through example. [Figure: five-node overlay (A, B, C, D, E) with per-link delay and bandwidth labels and the best-b/w and best-delay entries advertised at each node; the individual labels are not recoverable from the transcript.] Requests and outcomes: (10ms, 3Mbps) OK via ABDE; (120ms, 15Mbps) OK via ABCE; (10ms, 100Mbps) rejected (X); (15ms, 15Mbps) initially unknown (???), then OK via ABDE.
  • Slide 56
  • Route maintenance in MCQoS. Route maintenance works through path patching. Each intermediate node knows the QoS requirements from that node to the destination; the upstream node periodically pushes QoS requirements to downstream nodes. When a node detects a QoS violation, it triggers an alternate path search locally and notifies the upstream node if there is no alternative path. [Figure: example topology with nodes A through H.]
  • Slide 57
  • Overhead analysis of path dissemination. [Figure: example topology with numbered nodes.] In the MCQoS protocol, a node advertises only the best path to a destination. Many alternative paths are thus pruned, which increases scalability.
  • Slide 58
  • Overhead analysis of on-demand route discovery. Parameters: average out-degree of the nodes; overlay distance between source and destination. Worst case: message overhead is proportional to the sum of all possible path lengths from source to destination. Amortized cost: depends on the fraction of requests in the undecidable region; limit the number of hops of route discovery. More than 99% of the undecidable region is discovered within 5 hops from the source node, so the amortized cost will be significantly less than in the worst case.
  • Slide 59
  • Experimental evaluation of MCQoS. Built an event-driven simulator. Generated random flat topologies of nodes using GT-ITM, with out-degree min(10, size/2). Assigned link metrics from actual PlanetLab link measurement data.
  • Slide 60
  • Convergence time of path dissemination. Convergence time: how long does it take to stabilize for a given network snapshot? Re-stabilization time: how long does it take to stabilize once a link metric changes? QRON: a link-state based multi-QoS routing protocol using the composite metric approach. Being a path vector based protocol, MCQoS takes longer to converge, but it does not involve any NP-hard computation and thus scales with network size.
  • Slide 61
  • Message overhead of path dissemination. The message overhead of MCQoS is comparable to that of the link-state based (QRON) protocol.
  • Slide 62
  • Elaborating the undecidable region. Global feasible region: the feasible region at the source node if the source node knew all alternative paths, as in a link-state protocol. Depletion area: the part of the global feasible QoS region not known at the source node because many alternate paths are suppressed. K-hop path: paths in the undecidable region discovered within k hops of the on-demand route discovery process.
  • Slide 63
  • Overhead of on-demand path discovery. How many hops does it take to discover the entire depletion area? We measure the fraction of the depletion area discovered within k hops from the source node. More than 90% of the depletion area is discovered within 3 hops.
  • Slide 64
  • Improvement in accuracy by MCQoS. A feasible path under a composite metric might not satisfy the individual QoS metrics. The line-segment based approach often suffers from loss/distortion. Our hybrid approach has no false positives, and the false negative percentage can be reduced to less than 1% by 3-hop on-demand route discovery.
  • Slide 65
  • QoS violation ratio in dynamic environment with MCQoS. Setup: 100-node topology; QoS requests generated at a given arrival rate with b/w in [5Mbps, 55Mbps] and delay in [100ms, 400ms]; each flow lasts between 5 and 10 minutes; we simulate the network behavior for 10 minutes. New flows arrive before the network stabilizes, so we expect to observe QoS violations. Violation ratio by arrival rate: 60 conn/sec: 0.32%; 120 conn/sec: 0.33%; 240 conn/sec: 0.78%; 300 conn/sec: 0.4%; 600 conn/sec: 1.12%. The QoS violation ratio is negligible even with an arrival rate of 600 conn/sec.
  • Slide 66
  • MCQoS-enabled overlay node prototype. [Figure: control plane (MCQoS) that takes local link characteristics, exchanges path ads with peers, handles route discovery requests and replies and flow setup requests, and maintains the QoS path table; data plane (Click router, DataIn to DataOut) with a flow table mapping flow id (Y:p -> X:q) to next hop (C). QoS path setup message: (Y:p -> X:q, D ms, L%, B Kbps).]
  • Slide 67
  • Summary. Designed a scalable multiple-constraint QoS flow route management protocol: a hybrid of path vector routing and on-demand route discovery that balances flow setup time against control message overhead and requires no complex NP-hard computation. Performed large-scale simulations to demonstrate the efficiency and scalability of the approach. Built a prototype using the Click modular router.
  • Slide 68
  • Composite Metric approach to multi-QoS routing (1/2). Composite metric = k1*delay + k2/bw, where k1 = 1, k2 = 10^7, delay in sec, b/w in bps. False positive: the flow is admitted but the path does not meet the QoS. False negative: there exists a feasible path but the flow is not admitted.
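A small worked example shows how a single composite metric can hide a violated constraint. The metric and constants are from the slide; the admission rule (admit when the path's composite value is no worse than the request's) and the two candidate paths are assumptions made only for illustration.

```python
K1, K2 = 1.0, 1e7  # constants from the slide; delay in seconds, b/w in bps


def composite(delay_s, bw_bps):
    """Composite metric k1*delay + k2/bw (lower is better)."""
    return K1 * delay_s + K2 / bw_bps


def admit_by_composite(path, request):
    """Assumed admission rule: compare composite values only."""
    return composite(*path) <= composite(*request)


def meets_constraints(path, request):
    """True if the path satisfies both the delay and bandwidth bounds."""
    return path[0] <= request[0] and path[1] >= request[1]


request = (0.1, 10e6)   # at most 0.1 s delay, at least 10 Mbps
path_a = (0.5, 100e6)   # high bandwidth but too slow; composite 0.6
path_b = (0.05, 10e6)   # meets both bounds; composite 1.05
best = min([path_a, path_b], key=lambda p: composite(*p))
```

Here path_a wins on the composite metric and would be admitted under the assumed rule although it violates the delay bound (a false positive); a router that checks path_a against the real constraints and knows no other path would deny the flow even though path_b is feasible (a false negative).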
  • Slide 69
  • Composite Metric approach to multi-QoS routing (2/2). [Figure not preserved in the transcript.]
  • Slide 70
  • Line Segment approach to multi-QoS routing (1/2). Lui et al. proposed a line-segment based approach for topology aggregation in the delay-bw plane. Tam et al. designed a distance vector based QoS protocol using the line-segment approach. False positive: fraction of the undecidable region that is actually infeasible but that the approach labels as feasible. False negative: fraction of the undecidable region that is feasible but that the approach labels as infeasible.
  • Slide 71
  • Line Segment approach to multi-QoS routing (2/2). [Figure not preserved in the transcript.]