stable internet route selectionstable internet route selection brighten godfrey matthew caesar ian...
TRANSCRIPT
Stable Internet Route Selection
Brighten GodfreyMatthew Caesar
Ian HakenScott Shenker
Ion StoicaUC Berkeley
NANOG 40June 6, 2007
BGP instability: troubleCPU cyclesupdate processing uses majority of cycles on some core routers
control plane
data plane degraded path quality BGP causes majority of packet loss bursts
Stable Route Selection:simple technique to
significantly improve stability
What aboutRoute Flap Damping?
• Introduces pathologies
• Impacts availability
“...the application of flap damping in ISP networks is NOT recommended.”
--RIPE Route Working Group, May 2006
• Only helps for very unstable routes
Stable Route Selection
Given a choice between routes,select routes that are less likely to fail.
RFD philosophy
SRS philosophy
shut off bad routes
always pick a route if possible, but prefermore stable routes
Challenges
• Inferring stability of paths, locally
• Dependence: does one ISP’s benefit require others’ participation?
• Flexibility required
Sign image from the Manual of Traffic Signs <http://www.trafficsign.us/>This sign image copyright Richard C. Moeur. All rights reserved.
W15-1Playground
TRADEOFFSAHEAD
Outline
• Design
• Evaluation
Improvement in stability
Dependence
Flexibility
• Conclusion
Design
2. Current route3. Shortest path length4. Longest uptime
SRS heuristic
?1. Highest local pref2. Shortest path length3. Lowest origin type4. Lowest MED5. eBGP- over iBGP-learned6. Lowest IGP cost7. Lowest router ID
BGP decision process
Simplified processes
• Simulator has one router per AS, at most one link between AS’s
• So...
1. Highest local pref2. Shortest path length3. Lowest router ID
Standard BGP
1. Highest local pref2. Current route3. Shortest path length4. Longest uptime
SRS
Outline
• Design
• Evaluation
Improvement in stability
Dependence
Flexibility
• Conclusion
Evaluation methodology
• Event-based BGP simulator
• Measuring interruptions: route changes/withdrawals
Topology
Local prefs
AS-adjacency failures
Internet AS-level
cust./prov./peer
inferred from RouteViews
The bottom line
Std. BGP
SRS 5.3
26
RFD 8.9
RFD & SRS 2.8
Mean interruptions permonth per src-dst pair
Availability lossrelative to Std BGP
4%
0%
5%
0%
Dependence between ISPs
0
5
10
15
20
25
30
0 0.2 0.4 0.6 0.8 1
Inte
rrupt
ions
per
mon
th
Fraction of ASs running SRS
0
5
10
15
20
25
30
0 0.2 0.4 0.6 0.8 1
Inte
rrupt
ions
per
mon
th
Fraction of ASs running SRS
0
5
10
15
20
25
30
0 0.2 0.4 0.6 0.8 1
Inte
rrupt
ions
per
mon
th
Fraction of ASs running SRS
Std. BGP
80% reductionat full
deployment
44% reductionfor first adopters
34
5
4
3
2
1
8 16 32 64 128 256
Fact
or im
prov
emen
t usin
g SR
S
Interruptions per month in Standard BGP
mean for all ISPs
AT&TQuest
XO Comm.
Tier 1s benefit more than other ISPs when
unilaterally running SRS
Dependence between ISPs
Tier 1’s
Route flexibility
path length
business relationships
load balancingFlexibility also
needed for
What is the tradeoff withthese other objectives?
How much flexibility?
Std BGP
Interruptions per month per src/dst pair
Business LP’s 5.3
26
Realistic traffic eng., etc. ?
Flexibilityfor SRS
Flexibility forother objectives
Tradeoffs: path length
SRS only4% longer!
(Hypothetical“Longest paths”
strategy:32% longer) 0
0.2
0.4
0.6
0.8
1
0 2 4 6 8 10 12
Frac
tion
of s
ourc
e-de
stin
atio
n pa
irs
Mean path length
SRSStandard BGP
0
0.2
0.4
0.6
0.8
1
0 2 4 6 8 10 12
Frac
tion
of s
ourc
e-de
stin
atio
n pa
irs
Mean path length
Longest pathsSRS
Standard BGP 0
0.2
0.4
0.6
0.8
1
0 2 4 6 8 10 12
Frac
tion
of s
ourc
e-de
stin
atio
n pa
irs
Mean path length
Standard BGP
Outline
• Design
• Evaluation
Improvement in stability
Dependence
Flexibility
• Conclusion
Summary
• Stable Route Selection: use flexibility in path selection to optimize for stability
• Significantly more stable
• No impact on availability
• Very low stretch
• Ongoing work: implementation & deployment
Questions for operators
1 How useful is stability?
3How much flexibility would be available to SRS?
2What are the barriers?(nondeterminism, traffic engineering...)
Very interested in feedback and [email protected]
Backup slides
AS adjacency mean session time distribution
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
10 100 1000 10000 100000 1e+06 1e+07
Fra
ction o
f lin
ks
Average sessiontime (seconds)
Attribution of improvement
instability = interruptions per event x # events
speed convergence avoid failures
~8% better Majority of theimprovement
SRS vs. flap damping
SRS: better mean improvement
0.001
0.01
0.1
1
1 10 100 1000 10000
Frac
tion
of s
ourc
e-de
stin
atio
n pa
irs
Interruptions per month
2x10x
Std. BGPFD
SRSSRS + FD
RFD: better forworst ~0.5%
of src-dst pairs
SRS vs. flap damping
SRS is more conservative
SRS is more aggressive
always pick a routeif one is advertised
use any flexibilityavailable for stability
SRS with less flexibility
0 5
10 15 20 25 30 35 40 45 50
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9
Inte
rrupt
ions
per
mon
th
Mean number of route options
1.85 possible paths with business class Local Prefs
Substantial benefitwith just 1.2 choices
But how much flexibility in practice?
SRS convergence
• Convergence depends on decision process
• If heuristic is passive
• Any stable state for Std BGP is still stable
• Gao-Rexford constraints still sufficient to guarantee convergence to stable state
• (Simulations: slightly faster convergence)
SRS Heuristic2. Current route3. Lowest path length4. Longest uptime
SRS can converge where standard BGP doesn’t
4
13414
32434
21424
2
1
3 tie}
Dispute wheel
Acknowledgements
• This material is based upon work supported by a National Science Foundation Graduate Research Fellowship
• Traffic sign image from Manual of Traffic Signs, by Richard C. Moeur (http://www.trafficsign.us)