Detecting Shared Congestion of Flows Via End-to-end Measurement (and other inference problems)
Dan Rubenstein
joint work with Jim Kurose and Don Towsley
UMass Amherst
Network Inference
• What’s going on in there?
[Figure: NETWORK]
• Where are packets getting lost / delayed?
• Where is congestion occurring?
• Where are the network hot spots?
• What are routers doing (WFQ, RED)?
• What version of TCP are end-hosts using?
Multiple Autonomous Systems
• What routing capabilities does your ISP provide? “That’s proprietary info”
• Who’s to blame for poor service?
• Consequence: who has to figure out what and where the problem is and how to fix it?
somebody else!
Overview
• Overview of other inference work:
– Identifying bottleneck capacities
– Multicast inference of loss (MINC)
– TCP inference (TBIT)
• Detecting shared points of congestion
Identifying bottleneck bandwidths
• Links have different capacities
– “skinniest” link processes slowest: creates a rate bottleneck
– can the bottleneck rate be identified?
• Lots of work here [Carter’96, Jacobson’97, Downey’99, Lai’99, Melander’99, Lai’00]
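One well-known approach from this line of work is the packet-pair technique: two packets sent back to back leave the bottleneck link spaced by the time that link needs to transmit one packet, so capacity is roughly packet size divided by that spacing. The sketch below is illustrative only; the function name, its parameters, and the use of a median to damp cross-traffic noise are assumptions, not something taken from the slides.

```python
def packet_pair_capacity(packet_size_bytes, arrival_gaps_s):
    """Estimate bottleneck capacity in bits/s from packet-pair arrival gaps.

    packet_size_bytes: size of each probe packet (hypothetical parameter).
    arrival_gaps_s: receiver-side gaps, in seconds, between the two packets
                    of each back-to-back pair.
    """
    gaps = sorted(g for g in arrival_gaps_s if g > 0)
    if not gaps:
        raise ValueError("need at least one positive gap")
    # Assumption: the median gap is a reasonable filter against pairs that
    # were compressed or stretched by cross traffic.
    median_gap = gaps[len(gaps) // 2]
    return packet_size_bytes * 8 / median_gap
```

For example, 1500-byte probes arriving about 8 ms apart suggest a bottleneck of about 1.5 Mb/s.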
Multicast Inference
• Infer loss points on multicast tree via correlation patterns of receivers w/in a multicast group [Ratnas’99, Caceres’99 (3), LoPresti’99, Adler’00]
[Figure: multicast tree from source S to receivers R, with pts of loss marked]
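To illustrate how receiver correlation can localize loss, consider the smallest case: a source, one shared link, and two branch links to receivers R1 and R2. If losses on the three links are independent, then P(R1 gets probe) * P(R2 gets probe) / P(both get probe) equals the pass probability of the shared link. The sketch below applies that identity; the function name and the input layout are hypothetical, and the independence of link losses is an assumption.

```python
def two_receiver_loss_inference(outcomes):
    """Estimate per-link pass probabilities on a two-receiver multicast tree.

    outcomes: list of (r1_got, r2_got) booleans, one pair per multicast probe.
    Returns (shared_link, branch_to_r1, branch_to_r2) pass probabilities.
    Assumes independent losses on each link and at least one probe that
    reached both receivers.
    """
    n = len(outcomes)
    p1 = sum(1 for a, _ in outcomes if a) / n          # P(R1 receives)
    p2 = sum(1 for _, b in outcomes if b) / n          # P(R2 receives)
    p12 = sum(1 for a, b in outcomes if a and b) / n   # P(both receive)
    shared = p1 * p2 / p12
    return shared, p1 / shared, p2 / shared
```

When the two receivers always lose the same probes, the estimate places all loss on the shared link; when their losses look independent, it places the loss on the branches.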
TCP Inference (TBIT)
• Many versions of TCP exist
– Reno, Tahoe, Vegas
• Many “optional” components
– SACK, ECN compliance
• Are specification requirements being met?
– initial window sizes, slow start
• TBIT: TCP Behavior Identification Tool [Padhye’00]
– stress-tests a server’s TCP by intentionally delaying / dropping various ACKs
– different TCPs / TCP options respond differently to the delayed / dropped ACKs
Detecting Shared Pts of Congestion: Why bother?
• When flows share a common point of congestion (POC), bandwidth can be “transferred” between flows w/o impacting other traffic
• Applications: WWW servers, multi-flow (multi-media) sessions, multi-sender multicast
• Can limit “transfer” to flows w/ identical e2e data paths [Balak’99]
– ensures flows have common bottleneck
– but limits applicability
[Figure labels: Client, Server, Point of congestion]
Detecting Shared POCs
Q: Can we identify whether two flows share the same Point of Congestion (POC)?
Network Assumptions:
– routers use FIFO forwarding
– the two flows’ POCs are either all shared or all separate
Techniques for detecting shared POCs
• Requirement: flows’ senders or receivers are co-located
• Packet ordering through a potential shared POC (SPOC) is the same as that at the co-located end-system
• Good SPOC candidates
[Figure: two topologies with senders S1, S2 and receivers R1, R2: co-located senders and co-located receivers]
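Simple Queueing Models of POCs for two flows
[Figure: two queueing models inside the Internet (FG = foreground, BG = background). “A Shared POC”: FG Flow 1 and FG Flow 2 pass through one queue together with BG traffic. “Separate POCs”: FG Flow 1 and FG Flow 2 pass through different queues, each with its own BG traffic.]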
Approach (High level)
• Idea: Packets passing through same POC close in time experience loss and delay correlations [Moon’98, Yajnik’99]
• Using either loss or delay statistics, compute two measures of correlation:
– Mc: cross-measure (correlation between flows)
– Ma: auto-measure (correlation within a flow)
• such that
– if Mc < Ma then infer POCs are separate
– else (Mc > Ma) infer POCs are shared
The Correlation Statistics...
Loss-Corr for co-located senders:
Mc = Pr(Lost(i) | Lost(i-1))
Ma = Pr(Lost(i) | Lost(prev(i)))
Loss-Corr for co-located receivers: a bit more complex
Delay: Either co-located topology:
Mc = C(Delay(i), Delay(i-1))
Ma = C(Delay(i), Delay(prev(i)))
where C(X,Y) is the correlation coefficient:
$$C(X,Y) = \frac{E[XY] - E[X]E[Y]}{\sqrt{(E[X^2] - E^2[X])(E[Y^2] - E^2[Y])}}$$
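[Figure: timeline of interleaved packets from Flow 1 and Flow 2; pkts i-4, i-2, i belong to one flow and pkts i-3, i-1, i+1 to the other, so i-1 is the other flow’s pkt just before i and prev(i) = i-2]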
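The delay statistics above can be sketched in a few lines. This is a minimal illustration, not the authors’ implementation: the function names and the trace layout are hypothetical, and it assumes Mc is taken over packets i whose immediate predecessor i-1 belongs to the other flow, while Ma is taken over every packet that has an earlier packet prev(i) in its own flow.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation coefficient C(X, Y) computed from sample means."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum(x * x for x in xs) / n - ex * ex
    var_y = sum(y * y for y in ys) / n - ey * ey
    return (exy - ex * ey) / sqrt(var_x * var_y)  # undefined if a variance is 0

def delay_measures(trace):
    """Return (Mc, Ma) for a trace of (flow_id, delay) tuples.

    The trace is assumed to be in the packet order seen at the co-located
    end-system.
    """
    cross_x, cross_y, auto_x, auto_y = [], [], [], []
    last_delay = {}  # flow_id -> delay of that flow's most recent packet
    for k, (flow, delay) in enumerate(trace):
        if k > 0 and trace[k - 1][0] != flow:   # i-1 is from the other flow
            cross_x.append(delay)
            cross_y.append(trace[k - 1][1])
        if flow in last_delay:                  # prev(i) exists in same flow
            auto_x.append(delay)
            auto_y.append(last_delay[flow])
        last_delay[flow] = delay
    return corr(cross_x, cross_y), corr(auto_x, auto_y)

def infer_shared(mc, ma):
    """Decision rule from the talk: shared POCs iff Mc > Ma."""
    return mc > ma
```

With a real trace, `infer_shared(*delay_measures(trace))` gives the shared / separate decision; the estimate is only meaningful once enough samples have been collected.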
Intuition: Why the comparison works
Tarr(prev(i), i)Tarr(i-1, i) • Recall: Pkts closer together exhibit higher correlation
• E[Tarr(i-1, i)] < E[Tarr(prev(i), i)]– On avg, i “more correlated” with i-1 than with prev(i) – True for many distributions, e.g.,
• deterministic, any• poisson, poisson
• Rest of talk: assume poisson, poisson
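For the (Poisson, Poisson) case the inequality can be checked directly. The worked step below is a sketch under stated assumptions: the two flows are independent Poisson processes, and the rate symbols \lambda_1 and \lambda_2 are introduced here for illustration. Because the flow label of each packet in a merged Poisson process is independent of the gaps, requiring i-1 and i to come from different flows does not change the expected gap between them.

```latex
% Assumption: flows 1 and 2 are independent Poisson processes with
% rates \lambda_1 and \lambda_2 (illustrative symbols, not from the slides).
\begin{align*}
E[T_{arr}(i-1,\, i)] &= \frac{1}{\lambda_1 + \lambda_2}
  && \text{(adjacent pkts of the merged process)} \\
E[T_{arr}(\mathrm{prev}(i),\, i)] &= \frac{1}{\lambda_f}
  && \text{(consecutive pkts of flow } f \in \{1, 2\}\text{)} \\
\frac{1}{\lambda_1 + \lambda_2} &< \frac{1}{\lambda_f}
  && \text{since } \lambda_1 + \lambda_2 > \lambda_f
\end{align*}
```

For instance, with two flows at 20 pps each, the expected gaps are 25 ms and 50 ms respectively.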
Analytical Results
As # samples → ∞:
• Loss-Correlation technique:
– Assume POC(s) are M+M/M/1/K queues
– Thm: Co-located senders: Mc > Ma iff flows share POCs
– Co-located receivers: Mc > Ma iff flows share POCs, shown via extensive tests using recursive solutions of Mc and Ma
• Delay-Correlation technique:
– Assume POC(s) are M+G/G/1/∞ queues
– Thm: Both co-located topologies: Mc > Ma iff flows share POCs
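The loss-correlation result for co-located senders can be explored with a small simulation of the M+M/M/1/K model. The sketch below is illustrative only: all rates, the buffer size K, and the function names are hypothetical choices, and it assumes Mc is measured over packets whose immediate foreground predecessor belongs to the other flow and Ma over packets with an earlier packet in the same flow. Since only packet order matters for the loss statistics, it draws the next event (arrival or service completion) in proportion to its rate instead of tracking clock time.

```python
import random

def simulate(shared, n_events=1_000_000, k=10, seed=1):
    """Simulate two Poisson FG flows through one shared or two separate
    M/M/1/K queues; return [(flow, lost)] for FG packets in send order.

    Hypothetical rates (pkts/s): each FG flow 10, BG 80 per queue,
    service 100 per queue.
    """
    rng = random.Random(seed)
    fg, bg, mu = 10.0, 80.0, 100.0
    n_queues = 1 if shared else 2
    # Event table entries: (rate, kind, queue index, FG flow id or None).
    events = [(fg, "arr", 0, 1), (fg, "arr", n_queues - 1, 2)]
    for q in range(n_queues):
        events += [(bg, "arr", q, None), (mu, "svc", q, None)]
    weights = [e[0] for e in events]
    occupancy = [0] * n_queues
    trace = []
    for _, kind, q, flow in rng.choices(events, weights=weights, k=n_events):
        if kind == "svc":
            occupancy[q] = max(0, occupancy[q] - 1)
            continue
        lost = occupancy[q] == k        # arrival finds the buffer full
        if not lost:
            occupancy[q] += 1
        if flow is not None:            # record foreground packets only
            trace.append((flow, lost))
    return trace

def loss_measures(trace):
    """Return (Mc, Ma) as conditional loss frequencies."""
    c_hits = c_n = a_hits = a_n = 0
    last_lost = {}  # flow -> was that flow's previous packet lost?
    for idx, (flow, lost) in enumerate(trace):
        if idx > 0:
            p_flow, p_lost = trace[idx - 1]
            if p_flow != flow and p_lost:   # i-1 from other flow, and lost
                c_n += 1
                c_hits += lost
        if last_lost.get(flow):             # prev(i) in same flow was lost
            a_n += 1
            a_hits += lost
        last_lost[flow] = lost
    return c_hits / c_n, a_hits / a_n
```

Running `loss_measures(simulate(shared=True))` and `loss_measures(simulate(shared=False))` lets one compare Mc against Ma in each configuration; short runs contain few loss samples, so the comparison is only reliable for long traces.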
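Simulation Setup
• Co-located senders: Shared POCs
[Figure: simulation topology with co-located senders S1, S2 and receivers R1, R2; link delays of 10, 20, and 30 ms; link capacities of 1.5 Mb/s and 1000 Mb/s; cross traffic labeled “TCP traffic” and “on/off sources”; two 20 pps probe flows]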
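As a rough illustration of the probe traffic in this setup, the sketch below generates two Poisson probe streams at 20 pps each (the Poisson assumption comes from earlier in the talk) and merges them into the send order seen at co-located senders. The function names, the 60 s duration, and the seed are hypothetical.

```python
import random

def poisson_send_times(rate_pps, duration_s, rng):
    """Send times of a Poisson probe stream over [0, duration_s)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_pps)  # exponential gap with mean 1/rate
        if t >= duration_s:
            return times
        times.append(t)

def merged_schedule(rate_pps=20.0, duration_s=60.0, seed=0):
    """Return [(send_time, flow_id)] for flows 1 and 2, sorted by time."""
    rng = random.Random(seed)
    schedule = [(t, flow)
                for flow in (1, 2)
                for t in poisson_send_times(rate_pps, duration_s, rng)]
    return sorted(schedule)
```

The merged order is what determines which packet is i-1 and which is prev(i) for each probe i.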
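2nd Simulation Setup
• Co-located senders: Independent POCs
[Figure: same topology as the first setup (S1, S2, R1, R2; link delays of 10, 20, and 30 ms; link capacities of 1000 Mb/s and 1.5 Mb/s; two 20 pps probe flows), with cross traffic labeled “TCP traffic” and “on/off sources” at two separate points]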
Simulation results
[Figure: two result panels, “Independent POCs” and “Shared POCs”]
• Delay-corr an order of magnitude faster than loss-corr
• The Shared loss-corr dip: bias due to delayed Mc samples
• Similar results on co-located receiver topology simulations
Internet Experiments
• Goal: Verify techniques using real Internet traces
• Experimental Setup:
– Choose topologies where POC status (shared or unshared) is known in advance with reasonable confidence
– Use traceroute to assess shared links and approximate per-link delays
[Figure: example topology with UMass, ACIRI, and UCL; path delays of 193 ms, 264 ms, and 30 ms; labeled “Separate POCs (?)”]
Experimental Results
[Figure: inference outcomes per experiment, classified as Correct, Inconclusive, or Wrong]
Sites: 3 UMass (MA), Columbia (NY), UCL (UK), ACIRI (Calif.), AT&T (Calif.)
Summary
• E2E Shared-POC detection techniques
– Delay-based techniques more accurate, take less time (order of magnitude)
• Future Directions:
– Experiment with non-Poisson foreground traffic
– Focus on making techniques more practical (e.g., Byers @ BU CS for recent TR)
• Paper available (SIGMETRICS’00)