stable internet route selectionstable internet route selection brighten godfrey matthew caesar ian...

28
Stable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley [email protected] NANOG 40 June 6, 2007

Upload: others

Post on 25-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Stable Internet Route Selection

Brighten GodfreyMatthew Caesar

Ian HakenScott Shenker

Ion StoicaUC Berkeley

[email protected]

NANOG 40June 6, 2007

Page 2: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

BGP instability: troubleCPU cyclesupdate processing uses majority of cycles on some core routers

control plane

data plane degraded path quality BGP causes majority of packet loss bursts

Stable Route Selection:simple technique to

significantly improve stability

Page 3: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

What aboutRoute Flap Damping?

• Introduces pathologies

• Impacts availability

“...the application of flap damping in ISP networks is NOT recommended.”

--RIPE Route Working Group, May 2006

• Only helps for very unstable routes

Page 4: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Stable Route Selection

Given a choice between routes,select routes that are less likely to fail.

RFD philosophy

SRS philosophy

shut off bad routes

always pick a route if possible, but prefermore stable routes

Page 5: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Challenges

• Inferring stability of paths, locally

• Dependence: does one ISP’s benefit require others’ participation?

• Flexibility required

Sign image from the Manual of Traffic Signs <http://www.trafficsign.us/>This sign image copyright Richard C. Moeur. All rights reserved.

W15-1Playground

TRADEOFFSAHEAD

Page 6: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Outline

• Design

• Evaluation

Improvement in stability

Dependence

Flexibility

• Conclusion

Page 7: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Design

2. Current route3. Shortest path length4. Longest uptime

SRS heuristic

?1. Highest local pref2. Shortest path length3. Lowest origin type4. Lowest MED5. eBGP- over iBGP-learned6. Lowest IGP cost7. Lowest router ID

BGP decision process

Page 8: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Simplified processes

• Simulator has one router per AS, at most one link between AS’s

• So...

1. Highest local pref2. Shortest path length3. Lowest router ID

Standard BGP

1. Highest local pref2. Current route3. Shortest path length4. Longest uptime

SRS

Page 9: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Outline

• Design

• Evaluation

Improvement in stability

Dependence

Flexibility

• Conclusion

Page 10: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Evaluation methodology

• Event-based BGP simulator

• Measuring interruptions: route changes/withdrawals

Topology

Local prefs

AS-adjacency failures

Internet AS-level

cust./prov./peer

inferred from RouteViews

Page 11: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

The bottom line

Std. BGP

SRS 5.3

26

RFD 8.9

RFD & SRS 2.8

Mean interruptions permonth per src-dst pair

Availability lossrelative to Std BGP

4%

0%

5%

0%

Page 12: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Dependence between ISPs

0

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 1

Inte

rrupt

ions

per

mon

th

Fraction of ASs running SRS

0

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 1

Inte

rrupt

ions

per

mon

th

Fraction of ASs running SRS

0

5

10

15

20

25

30

0 0.2 0.4 0.6 0.8 1

Inte

rrupt

ions

per

mon

th

Fraction of ASs running SRS

Std. BGP

80% reductionat full

deployment

44% reductionfor first adopters

Page 13: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

34

5

4

3

2

1

8 16 32 64 128 256

Fact

or im

prov

emen

t usin

g SR

S

Interruptions per month in Standard BGP

mean for all ISPs

AT&TQuest

XO Comm.

Tier 1s benefit more than other ISPs when

unilaterally running SRS

Dependence between ISPs

Tier 1’s

Page 14: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Route flexibility

path length

business relationships

load balancingFlexibility also

needed for

What is the tradeoff withthese other objectives?

Page 15: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

How much flexibility?

Std BGP

Interruptions per month per src/dst pair

Business LP’s 5.3

26

Realistic traffic eng., etc. ?

Flexibilityfor SRS

Flexibility forother objectives

Page 16: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Tradeoffs: path length

SRS only4% longer!

(Hypothetical“Longest paths”

strategy:32% longer) 0

0.2

0.4

0.6

0.8

1

0 2 4 6 8 10 12

Frac

tion

of s

ourc

e-de

stin

atio

n pa

irs

Mean path length

SRSStandard BGP

0

0.2

0.4

0.6

0.8

1

0 2 4 6 8 10 12

Frac

tion

of s

ourc

e-de

stin

atio

n pa

irs

Mean path length

Longest pathsSRS

Standard BGP 0

0.2

0.4

0.6

0.8

1

0 2 4 6 8 10 12

Frac

tion

of s

ourc

e-de

stin

atio

n pa

irs

Mean path length

Standard BGP

Page 17: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Outline

• Design

• Evaluation

Improvement in stability

Dependence

Flexibility

• Conclusion

Page 18: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Summary

• Stable Route Selection: use flexibility in path selection to optimize for stability

• Significantly more stable

• No impact on availability

• Very low stretch

• Ongoing work: implementation & deployment

Page 19: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Questions for operators

1 How useful is stability?

3How much flexibility would be available to SRS?

2What are the barriers?(nondeterminism, traffic engineering...)

Very interested in feedback and [email protected]

Page 20: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Backup slides

Page 21: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

AS adjacency mean session time distribution

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

10 100 1000 10000 100000 1e+06 1e+07

Fra

ction o

f lin

ks

Average sessiontime (seconds)

Page 22: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Attribution of improvement

instability = interruptions per event x # events

speed convergence avoid failures

~8% better Majority of theimprovement

Page 23: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

SRS vs. flap damping

SRS: better mean improvement

0.001

0.01

0.1

1

1 10 100 1000 10000

Frac

tion

of s

ourc

e-de

stin

atio

n pa

irs

Interruptions per month

2x10x

Std. BGPFD

SRSSRS + FD

RFD: better forworst ~0.5%

of src-dst pairs

Page 24: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

SRS vs. flap damping

SRS is more conservative

SRS is more aggressive

always pick a routeif one is advertised

use any flexibilityavailable for stability

Page 25: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

SRS with less flexibility

0 5

10 15 20 25 30 35 40 45 50

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9

Inte

rrupt

ions

per

mon

th

Mean number of route options

1.85 possible paths with business class Local Prefs

Substantial benefitwith just 1.2 choices

But how much flexibility in practice?

Page 26: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

SRS convergence

• Convergence depends on decision process

• If heuristic is passive

• Any stable state for Std BGP is still stable

• Gao-Rexford constraints still sufficient to guarantee convergence to stable state

• (Simulations: slightly faster convergence)

SRS Heuristic2. Current route3. Lowest path length4. Longest uptime

Page 27: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

SRS can converge where standard BGP doesn’t

4

13414

32434

21424

2

1

3 tie}

Dispute wheel

Page 28: Stable Internet Route SelectionStable Internet Route Selection Brighten Godfrey Matthew Caesar Ian Haken Scott Shenker Ion Stoica UC Berkeley pbg@cs.berkeley.edu NANOG 40 June 6, 2007

Acknowledgements

• This material is based upon work supported by a National Science Foundation Graduate Research Fellowship

• Traffic sign image from Manual of Traffic Signs, by Richard C. Moeur (http://www.trafficsign.us)