The Impact of False Sharing on Shared Congestion Management

Srinivasa Aditya Akella

Joint work with Srini Seshan and Hari Balakrishnan

28 Feb, 2001

Introduction

Predominant model for congestion control

Slow-start AIMD

Not always optimal

Multiple concurrent flows from Src to Dest may share a bottleneck

Compete for resources rather than co-operate

Especially visible in the context of Web transfers

Sharing Congestion Information...

Solution - share congestion information

Granularity of sharing

Common destination host (network interface)

All destination hosts on the same IP subnet

Set of flows sharing congestion info - macroflow

What are the drawbacks of sharing at a granularity larger than a single flow?
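The two sharing granularities above can be illustrated with a minimal sketch. The function names, the `(flow_id, dst_ip)` flow representation, and the `/24` prefix length are assumptions made for illustration only; a real sender generally cannot know the destination's true subnet size.

```python
import ipaddress
from collections import defaultdict

def macroflow_key(dst_ip, granularity="host", prefix_len=24):
    """Return the key under which a flow's congestion state is shared.

    granularity="host"   -> one macroflow per destination address
    granularity="subnet" -> one macroflow per destination subnet
    prefix_len is an assumed subnet size (hypothetical default).
    """
    if granularity == "host":
        return str(ipaddress.ip_address(dst_ip))
    if granularity == "subnet":
        # strict=False masks off the host bits of dst_ip
        net = ipaddress.ip_network(f"{dst_ip}/{prefix_len}", strict=False)
        return str(net)
    raise ValueError(f"unknown granularity: {granularity}")

def group_into_macroflows(flows, granularity="host", prefix_len=24):
    """Group (flow_id, dst_ip) pairs into macroflows keyed by destination."""
    groups = defaultdict(list)
    for flow_id, dst_ip in flows:
        groups[macroflow_key(dst_ip, granularity, prefix_len)].append(flow_id)
    return dict(groups)

flows = [("f1", "10.0.0.5"), ("f2", "10.0.0.5"), ("f3", "10.0.0.9")]
by_host = group_into_macroflows(flows, "host")
by_subnet = group_into_macroflows(flows, "subnet")
```

With per-host sharing, `f3` keeps its own congestion state; with per-subnet sharing, all three flows land in one macroflow, which is the coarser granularity whose drawbacks the slide asks about.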

False Sharing

Flows sharing congestion state might not share the same bottleneck

Sender has no knowledge

False sharing in the Internet

Flows are treated differently - Service Differentiation

Flows take different paths - Path Diversity

False Sharing

Service Differentiation

Integrated Services

Differentiated Services (DiffServ)

Path Diversity

Network Load Balancers

Network Address Translators (NATs)

The sender observes different bottleneck bandwidths, RTTs and loss rates for flows sharing congestion info

Questions...

Impact on performance and correctness

Compromise to end-to-end congestion control?

Degradation in performance of individual flows?

Detection

Under what conditions can false-sharing be detected?

Response

How should congestion sharing systems be modified?

What effect do these modifications have?

What should be the default behavior?

Quantifying the Penalty XXX needs to be fixed

Analysis

False sharing reduces observed flow throughput

$\lambda_{share} = \lambda_1 \lambda_2 / (\lambda_1 + \lambda_2)$

False sharing increases observed flow loss rate

$\rho_{noshare} = \sqrt{\rho_1 \rho_2}$

$\rho_{share} = (\rho_1 + \rho_2) / 2$
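A small worked example may help make the penalty concrete. It reads the expressions above as a shared throughput of (product of the two per-flow throughputs) / (their sum), an unshared loss rate equal to the geometric mean of the two per-flow loss rates, and a shared loss rate equal to their arithmetic mean. The symbol interpretation, the function names, and the input values are assumptions for illustration, not measurements from the talk.

```python
import math

def shared_throughput(lam1, lam2):
    """Throughput under sharing, read as lam1 * lam2 / (lam1 + lam2)."""
    return lam1 * lam2 / (lam1 + lam2)

def loss_without_sharing(rho1, rho2):
    """Loss rate without sharing, read as the geometric mean."""
    return math.sqrt(rho1 * rho2)

def loss_with_sharing(rho1, rho2):
    """Loss rate with sharing, read as the arithmetic mean."""
    return (rho1 + rho2) / 2

# Hypothetical inputs: throughputs of 10 and 2 (same unit), loss rates 1% and 4%.
thr = shared_throughput(10.0, 2.0)
loss_ns = loss_without_sharing(0.01, 0.04)
loss_s = loss_with_sharing(0.01, 0.04)
```

Under this reading the shared throughput falls below the slower flow's own throughput, and because an arithmetic mean is never smaller than a geometric mean, the shared loss rate is never lower than the unshared one.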

Service Differentiation

Network treats different flows differently

Bandwidth allocation and buffer resources

IETF DiffServ architecture

Three PHBs: Assured Forwarding, Expedited Forwarding, Best Effort

Nortel's implementation of DiffServ

Experiments with two traffic classes: AF and BE

WRR for bandwidth sharing

RIO (for AF) and RED (for BE) for buffer management

Styles of buffer management

Shared and unshared

Topology for DiffServ

Results...

Predicted throughput = XXX need to fill

The faster connection is slowed down by the slower one

Slower connection is never persistently overloaded

Loss rate for the slower connection does not increase appreciably with sharing

Path Diversity

Two flows taking different routes may not share a bottleneck

Two scenarios where path diversity leads to false sharing

Dispersity Routing

NATs

Three distinct categories

Unshared bottleneck: No shared bottleneck link

Semi-shared bottleneck: One of the unshared paths has a bottleneck

Fully shared bottleneck: No bottlenecks in the unshared portions; RTTs would be different

Topology for Unshared Bottleneck

Results for Unshared-Bottleneck

Bandwidth is close to the prediction

Loss rates followed similar pattern as with the DiffServ case

Delays and Losses...

Delays vary independently of each other

Losses are uncorrelated

Variations in delays and losses within one flow are more correlated than those across flows

Path Diversity, Other Cases

Fully Shared Bottleneck - How is it Different?

Variations in delay seem correlated

The two flows share a common point of congestion

The flows should not share congestion information

Detection

Test description

Rubenstein's Delay and Loss Correlation tests

Need modifications to be a part of the architecture

Flows might undergo false-sharing if even one of their bottlenecks is unshared

Two differentially served flows might observe statistically dependent delays

Scheduler at the sender might apportion bandwidths non-uniformly

Congestion control schemes depend on RTTs

Aggregating flows with different RTTs would lead to false sharing

Loss-correlation Test

Idea -- Losses are likely to come in bursts

This should hold across flows from the same source when a bottleneck is shared

Rubenstein's tests compare the auto and cross correlation metrics for pairs of flows

Does not detect unshared bottlenecks

Need a test to detect whether all bottlenecks are shared

New test - Symmetric Loss Correlation

Defining the loss and cross correlation metrics in a manner independent of the flows solves the problem

However, packets across flows are assumed to be spaced closer than those within a flow -- Not always true

A fix -- Schedule transmissions appropriately
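The idea of comparing an auto-correlation metric against a cross-correlation metric can be sketched as follows. This is a simplified illustration only: it is neither Rubenstein's exact definition nor the Symmetric Loss Correlation test, and the trace format and function names are hypothetical. It assumes packets of the two flows are interleaved closely in time, which is the spacing assumption noted above.

```python
def loss_correlation_metrics(trace, flow_a, flow_b):
    """Compute (cross, auto) loss metrics for flow_a.

    trace: list of (flow_id, lost) tuples in transmission order.
      cross = P(a-packet lost | the immediately preceding packet
                was a flow_b packet and was lost)
      auto  = P(a-packet lost | the previous a-packet was lost)
    Either value is None if there were no conditioning events.
    """
    cross_hits = cross_total = auto_hits = auto_total = 0
    prev_any = None      # most recent packet of any flow
    prev_a_lost = None   # loss status of the most recent flow_a packet
    for flow, lost in trace:
        if flow == flow_a:
            if prev_any is not None and prev_any[0] == flow_b and prev_any[1]:
                cross_total += 1
                cross_hits += int(lost)
            if prev_a_lost:
                auto_total += 1
                auto_hits += int(lost)
            prev_a_lost = lost
        prev_any = (flow, lost)
    cross = cross_hits / cross_total if cross_total else None
    auto = auto_hits / auto_total if auto_total else None
    return cross, auto

def looks_shared(trace, flow_a, flow_b):
    """Guess 'shared bottleneck' when cross exceeds auto; None if undecided."""
    cross, auto = loss_correlation_metrics(trace, flow_a, flow_b)
    if cross is None or auto is None:
        return None
    return cross > auto

# Hypothetical trace: loss bursts hit adjacent packets of both flows.
T, F = True, False
shared_trace = [("a", F), ("b", F), ("a", F), ("b", T), ("a", T), ("b", T),
                ("a", F), ("b", F), ("a", F), ("b", T), ("a", T), ("b", F)]
# Hypothetical trace: each flow's losses cluster within that flow only.
unshared_trace = [("a", T), ("b", F), ("a", T), ("b", F), ("a", F),
                  ("b", T), ("a", F), ("b", T), ("a", F), ("b", F)]
```

The sketch is asymmetric (it conditions on one flow's packets only), so it shares the limitation that motivates the symmetric variant.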

Delay-correlation Test

Delay = f(propagation time, queueing delay)

Queueing delay (Q) can vary significantly with time

Current Q is strongly related to recent values

Challenges with measuring delay

Clocks cannot be easily synchronized

Use change in delay or the relative delay

Methodology of the tests

Use timestamps to compute delays

Compute correlations

Correlation is independent of constant differences

Out-of-Order Test

Flows might have fundamentally different delays
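The point that correlation is independent of constant differences is why unsynchronized clocks are tolerable. A minimal sketch, using a plain Pearson correlation coefficient as a stand-in for whatever correlation measure the tests actually use (that choice, the function names, and all sample values are assumptions):

```python
import math

def relative_delays(send_ts, recv_ts):
    """Receive minus send timestamps; includes an unknown constant clock offset."""
    return [r - s for s, r in zip(send_ts, recv_ts)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(vx * vy)

# Hypothetical samples: two flows whose queueing delays move together.
send = [0.0, 1.0, 2.0, 3.0, 4.0]
queue_a = [0.010, 0.030, 0.020, 0.050, 0.040]
queue_b = [0.012, 0.028, 0.024, 0.047, 0.041]
recv_a = [s + 0.1 + q for s, q in zip(send, queue_a)]
# Flow b's receiver clock is off by an unknown constant (500 s here).
recv_b = [s + 0.2 + q + 500.0 for s, q in zip(send, queue_b)]

corr_true = pearson(queue_a, queue_b)
corr_measured = pearson(relative_delays(send, recv_a),
                        relative_delays(send, recv_b))
```

The measured correlation matches the correlation of the underlying queueing delays, because propagation time and clock offset only shift each series by a constant.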

DelayCorr does not identify this

Loss and Delay tests might help detect false-sharing

MultiPath Routing where bottleneck is shared

Out-of-Order test handles this well

Look at packet reordering from a source

Reordering by more than 3 packets => No sharing

Limitation: Packets must be delivered to the same physical destination

Cannot be applied to situations like NAT

Rely on RTTs in such situations
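The reordering rule above could be sketched as follows. The slides do not define how "reordering by more than 3 packets" is measured, so this sketch assumes one plausible definition: a packet is reordered by k if k packets sent after it arrived before it. It also assumes a single send-order numbering across the flows from the source; both the definition and the function names are hypothetical.

```python
def max_reordering_extent(arrivals):
    """Largest number of later-sent packets that overtook any one packet.

    arrivals: send-order sequence numbers, listed in the order received
    (no duplicates).
    """
    worst = 0
    seen = []
    for seq in arrivals:
        # Packets already received that were sent after this one.
        overtaken_by = sum(1 for s in seen if s > seq)
        worst = max(worst, overtaken_by)
        seen.append(seq)
    return worst

def out_of_order_says_no_sharing(arrivals, threshold=3):
    """True when reordering exceeds the threshold (3 packets in the slides)."""
    return max_reordering_extent(arrivals) > threshold

in_order = [1, 2, 3, 4, 5]
mild = [2, 1, 3, 4]          # packet 1 overtaken by one packet
severe = [2, 3, 4, 5, 6, 1]  # packet 1 overtaken by five packets
```

The quadratic scan is fine for short windows; a longer trace would want a smarter counting structure.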

Genuine Sharing is Harder to Detect

Evaluation of the Tests

Two metrics for each test

Detection time

Probability of correct decision

Which test is the best?

Out-of-order tests are mostly accurate

Loss tests are neither timely nor accurate

Delay tests are timely but not as accurate

Symmetric Loss test outputs the correct result much more often than the asymmetric test

Response to False Sharing

Design Issues

Default behavior: share information and detect false-sharing

Scheduling

False sharing detected more easily than genuine sharing

Default of no-sharing makes no sense with out-of-order tests

Upon detection, stop sharing

In CM, associate the different flows to different macroflows

Relatively small confidence intervals can be used

No significant penalty due to an incorrect decision
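The response described above (share by default, then move a flow to a different macroflow once false sharing is detected) might look like the following sketch. `MacroflowTable` and its methods are hypothetical names invented for illustration; this is not the Congestion Manager API.

```python
class MacroflowTable:
    """Hypothetical flow -> macroflow mapping (not the CM API)."""

    def __init__(self):
        self._next_id = 0
        self._macroflow_of = {}

    def _new_macroflow(self):
        mf = self._next_id
        self._next_id += 1
        return mf

    def add_flow(self, flow_id, share_with=None):
        """Default behavior: join the macroflow of an existing flow if given."""
        if share_with is not None:
            self._macroflow_of[flow_id] = self._macroflow_of[share_with]
        else:
            self._macroflow_of[flow_id] = self._new_macroflow()

    def macroflow(self, flow_id):
        return self._macroflow_of[flow_id]

    def on_false_sharing_detected(self, flow_id):
        """Stop sharing: give flow_id a fresh macroflow of its own."""
        self._macroflow_of[flow_id] = self._new_macroflow()

table = MacroflowTable()
table.add_flow("f1")
table.add_flow("f2", share_with="f1")   # shares congestion state by default
shared_before = table.macroflow("f1") == table.macroflow("f2")
table.on_false_sharing_detected("f2")   # a detection test fired for f2
shared_after = table.macroflow("f1") == table.macroflow("f2")
```

Because the split is cheap, a wrong decision costs little, which is consistent with the slide's point that relatively small confidence intervals can be used.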

Performance

How good can restoration possibly be?

False sharing may penalize flows significantly

It might take time to restore performance

However, the greater the penalty, the easier it is to detect

Approach to performance evaluation -- multiple, de-randomized, offline runs

Performance restored in less than a factor of 3 of the time taken to detect
