CS 557 Routing Measurements
End to End Routing Behavior in the Internet
Vern Paxson, 1996
Internet Routing Instability
Labovitz, Malan, Jahanian, 1997
Spring 2013
End to End Routing Behavior
• Objective:
– Understand the actual behavior of Internet routing
• Approach:
– Use traceroute to measure routes from multiple sites.
• Contributions:
– Analysis of how routing is really behaving in 1994-1996.
– Example of how to conduct large-scale measurements
– Importance of observing real data
Review and Expected Behavior (1/2)
• How does traceroute work?
– Start with a TTL of 1, get an ICMP reply from the router 1 hop away.
– Next use a TTL of 2, get an ICMP message from the router 2 hops away.
– Continue until the destination is reached.
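The TTL loop above can be sketched as a small simulation. This is only an illustration: it does not send real packets, and the path `PATH` and the helper names `send_probe` and `traceroute` are made up for the example.

```python
# Hypothetical forward path; the last entry is the destination.
PATH = ["r1", "r2", "r3", "dest"]

def send_probe(path, ttl):
    """Simulate one probe: the node where the TTL runs out answers.

    Intermediate routers answer with an ICMP time-exceeded message;
    the destination answers once the TTL is large enough to reach it.
    """
    hop = min(ttl, len(path))
    node = path[hop - 1]
    kind = "destination" if hop == len(path) else "time-exceeded"
    return node, kind

def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... and record who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        node, kind = send_probe(path, ttl)
        hops.append(node)
        if kind == "destination":
            break
    return hops

print(traceroute(PATH))
```

Each probe reveals one more router on the forward path, which is why the output lists one hop per TTL value.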
Review and Expected Behavior (2/2)
traceroute to 129.82.100.64 (129.82.100.64), 30 hops max, 40 byte packets
 1 FastEthernet6-0.civ-service1.Canberra.telstra.net (203.50.1.65) 0.236 ms 0.176 ms 0.243 ms
 2 GigabitEthernet3-0.civ-core2.Canberra.telstra.net (203.50.10.129) 0.762 ms 0.814 ms 0.776 ms
 3 GigabitEthernet2-2.dkn-core1.Canberra.telstra.net (203.50.6.126) 1.052 ms 1.008 ms 0.942 ms
 4 Pos4-1.ken-core4.Sydney.telstra.net (203.50.6.69) 4.983 ms 4.953 ms 5.036 ms
 5 10GigabitEthernet3-0.pad-core4.Sydney.telstra.net (203.50.6.86) 5.31 ms 5.281 ms 5.2 ms
 6 GigabitEthernet2-2.syd-core02.Sydney.net.reach.com (203.50.13.42) 26.281 ms 5.318 ms 5.322 ms
 7 i-4-0.syd-core01.net.reach.com (202.84.221.89) 5.475 ms 5.456 ms 5.528 ms
 8 i-12-1.wil-core02.net.reach.com (202.84.144.65) 162.252 ms 162.236 ms 162.178 ms
 9 i-6-2.wil04.net.reach.com (202.84.251.186) 162.542 ms 162.561 ms 162.509 ms
10 lax-brdr-01.inet.qwest.net (205.171.4.53) 162.866 ms 162.401 ms 162.305 ms
11 lax-core-02.inet.qwest.net (205.171.19.41) 162.745 ms 162.563 ms 162.469 ms
12 bur-core-01.inet.qwest.net (205.171.8.42) 168.971 ms 168.827 ms 169.185 ms
13 dia-core-03.inet.qwest.net (205.171.8.118) 204.15 ms 204.166 ms 203.956 ms
14 dvr-edge-09.inet.qwest.net (205.171.10.70) 204.313 ms 204.007 ms 204.078 ms
15 65.121.56.106 (65.121.56.106) 204.027 ms 203.851 ms 203.971 ms
16 peer01.ari-co.icg.net (170.147.161.87) 204.062 ms 204.299 ms 204.243 ms
17 165.236.232.190 (165.236.232.190) 205.499 ms 205.336 ms 205.43 ms
18 csu-frgp-gw.colostate.edu (129.82.10.5) 206.788 ms 206.451 ms 207.029 ms
19 129.82.2.10 (129.82.2.10) 207.259 ms 206.967 ms 207.849 ms
20 yuma.acns.colostate.edu (129.82.100.64) 206.985 ms 206.941 ms 207.193 ms
Path Vector Routing and Loops
[Figure: nodes A, B, C, D and destination X]
Path(X)=A,B,C,D,X Next(A,X)=B
Path(X)=B,C,D,X Next(B,X)=C
Path(X)=C,D,X Next(C,X)=D
Path(X)=D,X Next(D,X)=X
1. Link(D,X) fails => Path(D,X)=none
2. Update Path(A,X)=A,B,C,D,X arrives at D => Path(D,X)=?
Claim: D will ignore this path. Why?
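One way to reason about the claim is the standard path-vector loop check: a node discards any advertised path that already contains itself. The sketch below assumes that rule and a simple shortest-path preference; the function name `select_path` is made up for the example.

```python
def select_path(node, advertised_paths):
    """Pick the shortest loop-free path among the advertisements received.

    A path that already lists `node` would loop back through it,
    so it is discarded before any comparison takes place.
    """
    loop_free = [p for p in advertised_paths if node not in p]
    if not loop_free:
        return None
    return [node] + min(loop_free, key=len)

# Normal operation: A learns B's path and prepends itself.
print(select_path("A", [["B", "C", "D", "X"]]))

# After Link(D,X) fails, D hears A's old path, which contains D itself.
print(select_path("D", [["A", "B", "C", "D", "X"]]))
```

Under this rule D is left with no path to X after the failure, rather than adopting a stale path that runs through D.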
Internet Routing Loops
Prevalence and Persistence
• Prevalence: how likely are you to encounter a given route?
• Persistence: how long will the route last?
• Very different metrics
– Can be prevalent, but not persistent
– Why is persistence important?
– Why is prevalence important?
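The difference between the two metrics can be illustrated on a made-up series of route observations. This is a simplified sketch, not the exact estimators used in [Pax96]: the route labels `R1` and `R2` and the function names are assumptions, and observations are treated as equally spaced samples.

```python
from collections import Counter
from itertools import groupby

def prevalence(observations):
    """Return the most common route and the fraction of samples showing it."""
    route, hits = Counter(observations).most_common(1)[0]
    return route, hits / len(observations)

def persistence_runs(observations):
    """Return (route, run length) for each stretch of consecutive identical samples."""
    return [(route, len(list(run))) for route, run in groupby(observations)]

# Hypothetical samples: R1 dominates, yet it never lasts more than two samples.
samples = ["R1", "R1", "R2", "R1", "R1", "R2", "R1", "R1"]
print(prevalence(samples))
print(persistence_runs(samples))
```

In this example R1 is prevalent (seen in 6 of 8 samples) but not persistent, since the route keeps flipping to R2 and back.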
Internet Route Persistence
[Pax96] Conclusions
• Important to measure the actual system behavior.
• Some conclusions as of 1996:
– Routing pathologies are emerging as a challenge for the growing Internet.
– Internet routes are heavily dominated by a prevalent route.
– But wide variation in persistence
– About 2/3 of paths persisted for days or weeks.
• Next we consider how well BGP responds to changes in policy and topology.
Internet Routing Instability
• Objective:
– Analyze BGP updates and identify BGP routing behaviors and pathologies
• Approach:
– Log BGP updates collected from a peering point.
• Contributions:
– Identification of routing pathologies.
– Identification of routing convergence problems
Exchange Points
• Public Exchange Points
– Network and physical location for connecting BGP routers from different Autonomous Systems.
– Not all routers peer with each other (policies)
[Figure: Sprint, AT&T, Verio, UUNet, and Regional AS1 through AS5 at an exchange point, with a Monitoring Point]
Multi-Homing and BGP (1/2)
[Figure: AS1, AS2, AS3, AS4, AS5 with prefixes 10.0.0.0/9, 10.128.0.0/10, 10.192.0.0/10]
10.0.0.0/8 Path=AS4,{AS1,AS2,AS3}
All traffic to 10.192.0.0/10 will follow link AS5-AS3 (/10 more specific than /8)
10.192.0.0/10 Path=AS4,AS5
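The "/10 more specific than /8" rule is longest-prefix matching. A minimal sketch using Python's standard `ipaddress` module follows; the table labels `"aggregate"` and `"more-specific"` and the function name `lookup` are made up for the example.

```python
import ipaddress

# Hypothetical forwarding table holding the two announcements from the slide.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "aggregate",
    ipaddress.ip_network("10.192.0.0/10"): "more-specific",
}

def lookup(table, address):
    """Return the entry whose prefix matches `address` with the longest mask."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

print(lookup(ROUTES, "10.200.1.1"))  # inside 10.192.0.0/10
print(lookup(ROUTES, "10.1.1.1"))    # only covered by 10.0.0.0/8
```

Any destination inside 10.192.0.0/10 matches both entries, and the /10 wins, which is why the more specific announcement attracts all of that traffic.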
Multi-Homing and BGP (2/2)
[Figure: AS1, AS2, AS3, AS4, AS5 with prefixes 10.0.0.0/9, 10.128.0.0/10, 10.192.0.0/10]
10.0.0.0/8 Path=AS4,{AS1,AS2,AS3}
10.192.0.0/10 Path=AS4,AS3
Traffic to 10.192.0.0/10 is split between link AS5-AS3 and link AS4-AS3
10.192.0.0/10 Path=AS4,AS5
Types of Routing Events
• Forwarding Instability (change of path)
– WADiff = withdraw, then announce a different route
– AADiff = implicit withdrawal by replacing with a different route.
• Possible Pathologies
– WADup = withdraw, then re-announce the same route
– AADup = implicit withdrawal by replacing with a new route that has the same AS path and the same next hop
• Pathologies
– WWDup = withdraw an already withdrawn route
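The categories above can be sketched as a small per-(peer, prefix) state machine. This is an illustration, not the tooling used in the paper: the class name, the extra labels `"New"` and `"Withdraw"`, and the choice to count a withdrawal for a never-announced prefix as WWDup are all assumptions of this example.

```python
class UpdateClassifier:
    """Label each BGP update by comparing it with the previous one for the same key."""

    def __init__(self):
        self.last_route = {}  # (peer, prefix) -> last announced (as_path, next_hop)
        self.withdrawn = {}   # (peer, prefix) -> True if currently withdrawn

    def classify(self, peer, prefix, route):
        """`route` is None for a withdrawal, else an (as_path, next_hop) tuple."""
        key = (peer, prefix)
        previous = self.last_route.get(key)
        is_withdrawn = self.withdrawn.get(key, True)  # unseen prefixes count as withdrawn
        if route is None:
            self.withdrawn[key] = True
            return "WWDup" if is_withdrawn else "Withdraw"
        if previous is None:
            label = "New"
        elif is_withdrawn:
            label = "WADup" if route == previous else "WADiff"
        else:
            label = "AADup" if route == previous else "AADiff"
        self.last_route[key] = route
        self.withdrawn[key] = False
        return label

# Hypothetical update stream for a single peer and prefix.
r1 = (("AS1", "AS2"), "192.0.2.1")
r2 = (("AS3", "AS2"), "192.0.2.1")
clf = UpdateClassifier()
labels = [clf.classify("peer", "10.0.0.0/8", u) for u in [r1, r1, r2, None, None, r2, None, r1]]
print(labels)
```

Only the "Diff" labels reflect a real change in the forwarding path; the "Dup" labels carry no new routing information.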
Gross Observations
• Internet Stats in 1997
– 45,000 prefixes (BGP destinations)
– 1300 Autonomous Systems
– 1500 AS paths observed in updates
• BGP Updates
– Average of 125 updates per prefix per day
– Over 30 million updates on one day
– Surprising, since BGP should only send an update if the path changes
– Dominated by WWDup, AADup, and WADup
MAE-East Gross Observations
(Duplicate withdrawals not shown)
Duplicate Withdrawals
• Dominate the BGP update traffic
– 500,000 to 6 million duplicate withdrawals per day at MAE-East
• ISP I Example
– 259 prefixes announced
– 2.4 million withdrawals
– Withdrawals for 14,112 prefixes
– Withdrawals for nearly 14K prefixes never announced in the first place.
• Partial Explanation: Stateless BGP
– Don’t keep track of what you advertised
– Thus propagate any withdraw to all neighbors
  • Even if you never announced the route in the first place
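The stateless explanation can be contrasted with a speaker that remembers what it advertised. Both classes below are hypothetical sketches, not real router code; neighbor names such as `N1` are made up.

```python
class StatelessSpeaker:
    """Forwards every withdrawal to all other neighbors, with no memory."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)

    def on_withdraw(self, prefix, from_neighbor):
        return [(n, prefix) for n in self.neighbors if n != from_neighbor]


class StatefulSpeaker:
    """Remembers per neighbor which prefixes it has advertised."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)
        self.advertised = {n: set() for n in self.neighbors}

    def on_announce(self, prefix, from_neighbor):
        sent = []
        for n in self.neighbors:
            if n != from_neighbor:
                self.advertised[n].add(prefix)
                sent.append((n, prefix))
        return sent

    def on_withdraw(self, prefix, from_neighbor):
        sent = []
        for n in self.neighbors:
            # Only withdraw from neighbors that were actually told about the prefix.
            if n != from_neighbor and prefix in self.advertised[n]:
                self.advertised[n].discard(prefix)
                sent.append((n, prefix))
        return sent


stateless = StatelessSpeaker(["N1", "N2", "N3"])
stateful = StatefulSpeaker(["N1", "N2", "N3"])
# A withdrawal for a prefix that was never announced to anyone:
print(stateless.on_withdraw("10.0.0.0/8", "N1"))
print(stateful.on_withdraw("10.0.0.0/8", "N1"))
```

The stateless speaker emits a withdrawal to every other neighbor even though none of them ever heard the route, which is how withdrawals for never-announced prefixes can appear at a monitoring point.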
Some Duplicate Explanations
• Stateless BGP
– Don’t keep track of what you advertised
– Thus propagate any withdraw to all neighbors
  • Even if you never announced the route in the first place
• BGP MRAI Timer
– MRAI Timer: Advertise only once every 30 seconds
– Time 0: update P1 sent
– Time 10: P1 changes to P2, but announcement delayed due to MRAI
– Time 20: P2 changes back to P1, so delayed update modified to list P1
– Time 30: update listing P1 sent
• Self-Synchronization (recall earlier paper)
– Need to add jitter to MRAI timer
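The MRAI timeline above can be replayed with a small simulation. This is a simplified sketch under stated assumptions: a single peer and prefix, a fixed 30-second timer with no jitter, and made-up function names `mrai_sends` and `duplicate_announcements`.

```python
def mrai_sends(changes, mrai=30):
    """Replay path changes through an MRAI timer.

    `changes` is a time-sorted list of (time, path). Returns the list of
    (time, path) updates actually sent to the neighbor.
    """
    sent = []
    next_allowed = 0  # earliest time the next update may be sent
    pending = None    # latest path waiting for the timer to expire
    for time, path in changes:
        # Flush a delayed update if the timer expired before this change.
        if pending is not None and time >= next_allowed:
            sent.append((next_allowed, pending))
            next_allowed += mrai
            pending = None
        if time >= next_allowed:
            sent.append((time, path))
            next_allowed = time + mrai
        else:
            pending = path  # overwrite any earlier delayed update
    if pending is not None:
        sent.append((next_allowed, pending))
    return sent

def duplicate_announcements(sent):
    """Updates that repeat the path of the update sent just before them."""
    return [(t, p) for (_, prev), (t, p) in zip(sent, sent[1:]) if p == prev]

timeline = [(0, "P1"), (10, "P2"), (20, "P1")]
sent = mrai_sends(timeline)
print(sent)
print(duplicate_announcements(sent))
```

The neighbor sees P1 at time 0 and P1 again at time 30, so the second update is a duplicate announcement even though the path really did change twice in between.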
Routing Instability
Major upgrade at the end of May. More instability at 10am
No Dominant Problem AS
• No single AS dominates instability.
• No correlation between size and portion of instability.
• Instability is evenly distributed across routers
Yet The Internet Mostly Works
Conclusions
• Very high percentage of pathological updates.
• No dominant AS responsible for problems.
• Lots more current work on BGP measurements
– Need to understand the current system
– Reminder that systems don’t behave as expected
– Fix current problems to keep network running
– Draw lessons for future protocol designs