detecting shared congestion of flows via end-to-end measurement (and other inference problems)

Detecting Shared Congestion of Flows Via End-to-end Measurement (and other inference problems)

Dan Rubenstein

joint work with Jim Kurose and Don Towsley

UMass Amherst

Network Inference

• What’s going on in there?

NETWORK

• Where are packets getting lost / delayed?
• Where is congestion occurring?
• Where are the network hot spots?
• What are routers doing (WFQ, RED)?
• What version of TCP are end-hosts using?

Multiple Autonomous Systems

• What routing capabilities does your ISP provide? “That’s proprietary info”
• Who’s to blame for poor service?

• Consequence: who has to figure out what and where the problem is and how to fix it?

somebody else!

Overview

• Overview of other inference work:
– Identifying bottleneck capacities
– Multicast inference of loss (MINC)
– TCP inference (TBIT)

• Detecting shared points of congestion

Identifying bottleneck bandwidths

• Links have different capacities
– “skinniest” link processes slowest: creates a rate bottleneck
– can the bottleneck rate be identified?

• Lots of work here [Carter’96, Jacobson’97, Downey’99, Lai’99, Melander’99, Lai’00]

Multicast Inference

• Infer loss points on multicast tree via correlation patterns of receivers w/in a multicast group [Ratnas’99, Caceres’99 (3), LoPresti’99, Adler’00]

[Figure: multicast tree from source S to receivers R, with points of loss marked]

TCP Inference (TBIT)

• Many versions of TCP exist
– Reno, Tahoe, Vegas

• Many “optional” components
– SACK, ECN compliance

• Are specification requirements being met?
– initial window sizes, slow start

• TBIT: TCP Behavior Identification Tool [Padhye’00]
– stress-tests a server’s TCP by intentionally delaying / dropping various ACKs
– different TCPs / TCP options respond differently to the delayed / dropped ACKs

[Figure: client and point of congestion]

Detecting Shared Pts of Congestion: Why bother?

• When flows share common point of congestion (POC), bandwidth can be “transferred” between flows w/o impacting other traffic

• Applications: WWW servers, multi-flow (multi-media) sessions, multi-sender multicast

• Can limit “transfer” to flows w/ identical e2e data paths [Balak’99]
– ensures flows have common bottleneck
– but limits applicability

[Figure: server and point of congestion]

Detecting Shared POCs

Q: Can we identify whether two flows share the same Point of Congestion (POC)?

Network Assumptions:
– routers use FIFO forwarding
– the two flows’ POCs are either all shared or all separate

Techniques for detecting shared POCs

• Requirement: flows’ senders or receivers are co-located

• Packet ordering through a potential shared POC (SPOC) is the same as that at the co-located end-system

• Good SPOC candidates:

[Figure: two topologies with senders S1, S2 and receivers R1, R2: co-located senders and co-located receivers]

Simple Queueing Models of POCs for two flows

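[Figure: two queueing models of the Internet path. A Shared POC: FG Flow 1 and FG Flow 2 pass through one queue together with BG traffic. Separate POCs: FG Flow 1 and FG Flow 2 each pass through their own queue with BG traffic.]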

Approach (High level)

• Idea: Packets passing through same POC close in time experience loss and delay correlations [Moon’98, Yajnik’99]

• Using either loss or delay statistics, compute two measures of correlation:

– Mc: cross-measure (correlation between flows)

– Ma: auto-measure (correlation within a flow)

• such that
– if Mc < Ma then infer POCs are separate
– else Mc > Ma and infer POCs are shared

The Correlation Statistics...

Loss-Corr for co-located senders:

Mc = Pr(Lost(i) | Lost(i-1))

Ma = Pr(Lost(i) | Lost(prev(i)))

Loss-Corr for co-located receivers: a bit more complex
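For the co-located-sender case, the two loss measures could be estimated from a packet trace roughly as follows. This is a minimal sketch, not the authors' implementation: the trace format (a list of `(flow, lost)` pairs in send order), the function name `loss_measures`, and the choice to pool samples from both flows are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical trace format: (flow id, was the packet lost?) in send order.
Packet = Tuple[int, bool]


def loss_measures(trace: List[Packet]) -> Tuple[Optional[float], Optional[float]]:
    """Estimate (Mc, Ma) from a merged trace of two flows.

    Mc ~ Pr(Lost(i) | Lost(i-1)), using only pairs where packet i-1
         belongs to the other flow (assumption: adjacent in the trace).
    Ma ~ Pr(Lost(i) | Lost(prev(i))), where prev(i) is the previous
         packet of the same flow as i.
    Returns None for a measure that has no conditioning samples.
    """
    cross_hits = cross_n = 0
    auto_hits = auto_n = 0
    last_lost: Dict[int, bool] = {}  # loss status of each flow's latest packet

    for k, (flow, lost) in enumerate(trace):
        # Auto-measure sample: previous packet of the same flow was lost.
        if last_lost.get(flow) is True:
            auto_n += 1
            auto_hits += int(lost)
        # Cross-measure sample: preceding packet is from the other flow and was lost.
        if k > 0:
            prev_flow, prev_lost = trace[k - 1]
            if prev_flow != flow and prev_lost:
                cross_n += 1
                cross_hits += int(lost)
        last_lost[flow] = lost

    mc = cross_hits / cross_n if cross_n else None
    ma = auto_hits / auto_n if auto_n else None
    return mc, ma
```

With both values in hand, the comparison rule from the previous slide applies: Mc > Ma suggests shared POCs, Mc < Ma suggests separate POCs.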

Delay: Either co-located topology:

Mc = C(Delay(i), Delay(i-1))

Ma = C(Delay(i), Delay(prev(i)))

where C(X,Y) is the correlation coefficient:

$$C(X,Y) = \frac{E[XY] - E[X]E[Y]}{\sqrt{(E[X^2] - E^2[X])(E[Y^2] - E^2[Y])}}$$
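The delay-based measures can be sketched the same way. The code below assumes a hypothetical trace of `(flow, delay)` pairs in arrival order and uses the standard correlation coefficient for C(X,Y); the names `corr` and `delay_measures` are illustrative, and pooling samples from both flows is an assumption rather than something the slides specify.

```python
from math import sqrt
from typing import Dict, List, Tuple


def corr(xs: List[float], ys: List[float]) -> float:
    """Sample correlation coefficient C(X, Y) of paired observations.

    Raises ZeroDivisionError if either sample has zero variance or is empty.
    """
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    var_x = sum(x * x for x in xs) / n - ex * ex
    var_y = sum(y * y for y in ys) / n - ey * ey
    return (exy - ex * ey) / sqrt(var_x * var_y)


def delay_measures(trace: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Estimate (Mc, Ma) from (flow id, delay) pairs in arrival order.

    Mc pairs each packet with the immediately preceding packet when that
    packet belongs to the other flow; Ma pairs each packet with the
    previous packet of its own flow.
    """
    cross_x: List[float] = []
    cross_y: List[float] = []
    auto_x: List[float] = []
    auto_y: List[float] = []
    last_delay: Dict[int, float] = {}  # delay of each flow's latest packet

    for k, (flow, delay) in enumerate(trace):
        if flow in last_delay:
            auto_x.append(last_delay[flow])
            auto_y.append(delay)
        if k > 0 and trace[k - 1][0] != flow:
            cross_x.append(trace[k - 1][1])
            cross_y.append(delay)
        last_delay[flow] = delay

    return corr(cross_x, cross_y), corr(auto_x, auto_y)
```

In practice the one-way delays would come from sender and receiver timestamps; because the correlation coefficient is unaffected by a constant offset, a fixed clock offset between hosts would not change either measure.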

[Figure: timeline of interleaved Flow 1 and Flow 2 packets, labeled i-4 through i+1]

Intuition: Why the comparison works

[Figure: packet timeline marking the arrival gaps Tarr(prev(i), i) and Tarr(i-1, i)]

• Recall: Pkts closer together exhibit higher correlation

• E[Tarr(i-1, i)] < E[Tarr(prev(i), i)]
– On avg, i “more correlated” with i-1 than with prev(i)
– True for many distributions, e.g.,
  • deterministic, any
  • Poisson, Poisson

• Rest of talk: assume Poisson, Poisson
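For the Poisson, Poisson case the inequality can be checked in one line. The sketch below assumes the two flows are independent Poisson processes with rates $\lambda_1$ and $\lambda_2$ (symbols introduced here for illustration, not taken from the slides) and takes packet i to belong to flow 2. The merged stream is then Poisson with rate $\lambda_1 + \lambda_2$, and the gap between adjacent packets does not depend on which flows they belong to.

```latex
\begin{align*}
E[T_{arr}(i-1,\, i)] &= \frac{1}{\lambda_1 + \lambda_2}
  && \text{(gap between adjacent packets of the merged stream)} \\
E[T_{arr}(prev(i),\, i)] &= \frac{1}{\lambda_2}
  && \text{(gap between consecutive packets of flow 2)} \\
\frac{1}{\lambda_1 + \lambda_2} &< \frac{1}{\lambda_2}
  && \text{whenever } \lambda_1 > 0 .
\end{align*}
```

For example, with both flows at 20 pps the mean cross-flow gap is 25 ms against a mean same-flow gap of 50 ms.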

Analytical Results

As # samples → ∞:

• Loss-Correlation technique:
– Assume POC(s) are M+M/M/1/K queues
– Thm: Co-located senders: Mc > Ma iff flows share POCs
– Co-located receivers: Mc > Ma iff flows share POCs, shown via extensive tests using recursive solutions of Mc and Ma

• Delay-Correlation technique:
– Assume POC(s) are M+G/G/1/∞ queues
– Thm: Both co-located topologies: Mc > Ma iff flows share POCs
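The shared-POC loss model can be explored numerically with a small simulation. The following is a rough sketch under stated assumptions, not the authors' analysis: one FIFO queue with exponential service and room for `capacity` packets, two Poisson foreground flows plus Poisson background traffic, and parameter values picked arbitrarily for illustration. All function and parameter names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


def simulate_shared_poc(n_arrivals: int = 200_000, fg_rate: float = 20.0,
                        bg_rate: float = 160.0, service_rate: float = 190.0,
                        capacity: int = 10, seed: int = 1) -> List[Tuple[int, bool]]:
    """Simulate one shared M/M/1/K-style queue; return foreground (flow, lost) pairs."""
    rng = random.Random(seed)
    total_rate = 2 * fg_rate + bg_rate
    in_system: Deque[float] = deque()  # departure times of packets still in the queue
    trace: List[Tuple[int, bool]] = []
    t = 0.0
    for _ in range(n_arrivals):
        t += rng.expovariate(total_rate)  # next arrival of the merged Poisson stream
        while in_system and in_system[0] <= t:
            in_system.popleft()  # drop packets that have already departed
        u = rng.random() * total_rate
        flow = 1 if u < fg_rate else 2 if u < 2 * fg_rate else 0  # 0 = background
        lost = len(in_system) >= capacity
        if not lost:
            start = in_system[-1] if in_system else t  # FIFO: wait for the packet ahead
            in_system.append(start + rng.expovariate(service_rate))
        if flow:
            trace.append((flow, lost))  # end hosts only observe foreground packets
    return trace


def loss_measures(trace: List[Tuple[int, bool]]) -> Tuple[Optional[float], Optional[float]]:
    """Estimate (Mc, Ma) as conditional loss probabilities, pooled over both flows."""
    cross_hits = cross_n = auto_hits = auto_n = 0
    last_lost: Dict[int, bool] = {}
    for k, (flow, lost) in enumerate(trace):
        if last_lost.get(flow) is True:  # previous packet of the same flow was lost
            auto_n += 1
            auto_hits += int(lost)
        if k > 0 and trace[k - 1][0] != flow and trace[k - 1][1]:
            cross_n += 1  # preceding packet was from the other flow and was lost
            cross_hits += int(lost)
        last_lost[flow] = lost
    mc = cross_hits / cross_n if cross_n else None
    ma = auto_hits / auto_n if auto_n else None
    return mc, ma


if __name__ == "__main__":
    mc, ma = loss_measures(simulate_shared_poc())
    print("Mc =", mc, "Ma =", ma)
```

Since the theorem is an asymptotic statement, a finite run can only suggest the ordering of Mc and Ma; longer traces give more conditioning samples and a steadier comparison.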

Simulation Setup

• Co-located senders: Shared POCs

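[Figure: simulation topology with co-located senders S1, S2 and receivers R1, R2; link delays of 10, 20, and 30 ms; link capacities of 1.5 Mb/s and 1000 Mb/s; background TCP traffic and on/off sources; foreground flows at 20 pps each]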

2nd Simulation Setup

• Co-located senders: Independent POCs


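[Figure: simulation topology with co-located senders S1, S2 and receivers R1, R2; link delays of 10, 20, and 30 ms; link capacities of 1000 Mb/s and 1.5 Mb/s; foreground flows at 20 pps each]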


Simulation results

[Figure: results in two panels, Independent POCs and Shared POCs]

• Delay-corr an order of magnitude faster than loss-corr
• The Shared loss-corr dip: bias due to delayed Mc samples

• Similar results on co-located receiver topology simulations

Internet Experiments

• Goal: Verify techniques using real Internet traces
• Experimental Setup:
– Choose topologies where POC status (shared or unshared)
– Use traceroute to assess shared links and approximate per-link delays

[Figure: example topology with UMass, ACIRI, and UCL; path delays of 193 ms, 264 ms, and 30 ms; labeled Separate POCs (?)]

Experimental Results

[Figure: results by site, classified as Correct, Inconclusive, or Wrong]

Sites: UMass (MA), Columbia (NY), UCL (UK), ACIRI (Calif.), AT&T (Calif.)

Summary

• E2E Shared-POC detecting techniques
– Delay-based techniques more accurate, take less time (order of magnitude)

• Future Directions:
– Experiment with non-Poisson foreground traffic
– Focus on making techniques more practical (e.g., Byers @ BU CS for recent TR)

• Paper available (SIGMETRICS’00)
