improving internet availability
DESCRIPTION
Improving Internet Availability. Nick Feamster Georgia Tech. Can the Internet be “Always On”?. E911 service Air traffic control …. OK for email and the Web, but what about:. Stanford University Clean-Slate Design for the Internet:. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/1.jpg)
ImprovingInternet Availability
Nick FeamsterGeorgia Tech
![Page 2: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/2.jpg)
2
Can the Internet be“Always On”?
“It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. … It should be:
1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.
2. …
• E911 service• Air traffic control• …
Stanford University Clean-Slate Design for the Internet:
OK for email and the Web, but what about:
![Page 3: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/3.jpg)
3
Work to do…
• Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”
• More “critical” (or at least availability-centric) applications on the Internet
• At the same time, the Internet is getting more difficult to debug– Scale, complexity, disconnection, etc.
![Page 4: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/4.jpg)
4
Natural Disasters
![Page 5: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/5.jpg)
5
Unnatural Disasters
![Page 6: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/6.jpg)
6
Economic Threats
![Page 7: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/7.jpg)
7
Operator Error
![Page 8: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/8.jpg)
8
Threats to Availability
• Natural disasters• Physical failures (node, link)• Router software bugs• Misconfiguration• Mis-coordination• Denial-of-service (DoS) attacks• Changes in traffic patterns (e.g., flash crowd)• …
![Page 9: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/9.jpg)
9
Availability of Other Services
• Carrier Airlines (2002 FAA Fact Book)– 41 accidents, 6.7M departures– 99.9993% availability
• 911 Phone service (1993 NRIC report +)– 29 minutes per year per line– 99.994% availability
• Std. Phone service (various sources)– 53+ minutes per line per year– 99.99+% availability
Credit: David Andersen job talk
![Page 10: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/10.jpg)
10
Can the Internet Be “Always On”?
• Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”
• More “critical” (or at least availability-centric) applications on the Internet
• At the same time, the Internet is getting more difficult to debug– Increasing scale, complexity, disconnection, etc.
Is it possible to get to “5 nines” of availability?If so, how?
![Page 11: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/11.jpg)
11
Two Philosophies
• Bandage: Accept the Internet as is. Devise band-aids.
• Amputation: Redesign Internet routing to guarantee safety, route validity, and path visibility
![Page 12: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/12.jpg)
12
Two Approaches
• Proactive: Catch the fault before it happens on the live network.
• Reactive: Recover from the fault when it occurs, and mask or limit the damage.
![Page 13: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/13.jpg)
13
rcc (routers)
FIREMAN (firewalls),
OpNet, …
IP Fast Reroute,
RON
Routing Control Platform
4D Architecture
CoNMan
Failure-Carrying Packets
Multi-Router Configuration
Path Splicing
Proactive Reactive
Bandage
Amputation
Tutorial Outline
![Page 14: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/14.jpg)
14
Proactive Techniques
• Today: router configuration checker (“rcc”)– Check configuration offline, in advance– Reason about protocol dynamics with static analysis
• Tomorrow– Simplify the configuration
• CONMan– Simplify the protocol operation
• RCP• 4D
![Page 15: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/15.jpg)
15
What can go wrong?
Two-thirds of the problems are caused by configuration of the routing protocol
Some downtime is very hard to protect against…
But…
![Page 16: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/16.jpg)
16
Internet Routing Protocol: BGP
Route Advertisement
Autonomous Systems (ASes)
Session
Traffic
Destination Next-hop AS Path130.207.0.0/16
130.207.0.0/16
192.5.89.89
66.250.252.44
10578..2637
174… 2637
![Page 17: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/17.jpg)
17
Two Flavors of BGP
• External BGP (eBGP): exchanging routes between ASes
• Internal BGP (iBGP): disseminating routes to external destinations among the routers within an AS
eBGPiBGP
Question: What’s the difference between IGP and iBGP?
![Page 18: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/18.jpg)
18
Complex configuration!
• Which neighboring networks can send traffic
• Where traffic enters and leaves the network
• How routers within the network learn routes to external destinations
Flexibility for realizing goals in complex business landscape
Flexibility Complexity
Traffic
Route No Route
![Page 19: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/19.jpg)
19
What types of problems does configuration cause?
• Persistent oscillation (last time)• Forwarding loops• Partitions• “Blackholes”• Route instability• …
![Page 20: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/20.jpg)
20
Real Problems: “AS 7007”“…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.”
-- news.com, April 25, 1997
UUNet
Florida InternetBarn
Sprint
![Page 21: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/21.jpg)
21
Real, Recurrent Problems“…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.”
-- news.com, April 25, 1997
“Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes.” -- wired.com, January 25, 2001
“WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue."
-- cnn.com, October 3, 2002
"A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration).”
-- dslreports.com, February 23, 2004
![Page 22: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/22.jpg)
22
January 2006: Route Leak, Take 2
“Of course, there are measures one can take against this sort of thing; but it's hard to deploy some of them effectively when the party stealing your routes was in fact once authorized to offer them, and its own peers may be explicitly allowing them in filter lists (which, I think, is the case here). “
Con Ed 'stealing' Panix routes (alexis) Sun Jan 22 12:38:16 2006
All Panix services are currently unreachable from large portions of the Internet (though not all of it). This is because Con Ed Communications, a competence-challenged ISP in New York, is announcing our routes to the Internet. In English, that means that they are claiming that all our traffic should be passing through them, when of course it should not. Those portions of the net that are "closer" (in network topology terms) to Con Ed will send them our traffic, which makes us unreachable.
![Page 23: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/23.jpg)
23
Several “Big” Problems a Week
0102030405060708090
Filtering RouteLeaks
RouteHijacks
RouteInstability
RoutingLoops
Blackholes
# T
hre
ad
s o
ve
r S
tate
d P
eri
od
1994-1997 1998-2001 2001-2004
![Page 24: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/24.jpg)
24
Why is routing hard to get right?
• Defining correctness is hard
• Interactions cause unintended consequences– Each network independently configured– Unintended policy interactions
• Operators make mistakes – Configuration is difficult– Complex policies, distributed configuration
![Page 25: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/25.jpg)
25
Today: Stimulus-Response
• Problems cause downtime• Problems often not immediately apparent
What happens if I tweak this policy…?
Configure ObserveWait for
Next ProblemDesired Effect?
RevertNo
Yes
![Page 26: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/26.jpg)
26
“rcc”
Idea: Proactive Checks
Normalized Representation
CorrectnessSpecification
Constraints
Faults
• Analyzing complex, distributed configuration• Defining a correctness specification• Mapping specification to constraints
Challenges
Distributed routerconfigurations
(Single AS)
![Page 27: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/27.jpg)
27
Correctness SpecificationSafetyThe protocol converges to a stable path assignment for every possible initial state and message orderingThe protocol does not oscillate
![Page 28: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/28.jpg)
28
What about properties of resulting paths, after the protocol has converged?
We need additional correctness properties.
![Page 29: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/29.jpg)
29
Correctness SpecificationSafetyThe protocol converges to a stable path assignment for every possible initial state and message orderingThe protocol does not oscillate
Path Visibility Every destination with a usable path has a route advertisement
Route Validity Every route advertisement corresponds to a usable path
Example violation: Network partition
Example violation: Routing loop
If there exists a path, then there exists a route
If there exists a route, then there exists a path
![Page 30: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/30.jpg)
30
Configuration Semantics
Ranking: route selection
Dissemination: internal route advertisement
Filtering: route advertisement
Customer
Competitor
Primary
Backup
![Page 31: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/31.jpg)
31
Path Visibility: Internal BGP (iBGP)
“iBGP”Default: “Full mesh” iBGP. Doesn’t scale.
Large ASes use “Route reflection” Route reflector: non-client routes over client sessions; client routes over all sessions Client: don’t re-advertise iBGP routes.
![Page 32: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/32.jpg)
32
iBGP Signaling: Static CheckTheorem.Suppose the iBGP reflector-client relationship graph contains no cycles. Then, path visibility is satisfied if, and only if, the set of routers that are not route reflector clients forms a clique.
Condition is easy to check with static analysis.
![Page 33: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/33.jpg)
33
rcc Implementation
Preprocessor Parser
Verifier
Distributed routerconfigurations Relational
Database(mySQL)
Constraints
Faults
(Cisco, Avici, Juniper, Procket, etc.)
![Page 34: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/34.jpg)
34
Configuration Checking:Take-home lessons
• Static configuration analysis uncovers many errors
• Major causes of error:– Distributed configuration– Intra-AS dissemination is too complex– Mechanistic expression of policy
![Page 35: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/35.jpg)
35
Limitations of Static Analysis
• Problem: Many problems can’t be detected from static configuration analysis of a single AS
• Dependencies/Interactions among multiple ASes– Contract violations– Route hijacks– BGP “wedgies” (RFC 4264)– Filtering
• Dependencies on route arrivals– Simple network configurations can oscillate, but
operators can’t tell until the routes actually arrive.
![Page 36: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/36.jpg)
36
BGP Wedgies
• AS 1 implements backup link by sending AS 2 a “depref me” community.
• AS 2 sets localpref to smaller than that of routes from its upstream provider (AS 3 routes)
Backup Primary
“Depref”
AS 2
AS 1
AS 3 AS 4
![Page 37: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/37.jpg)
37
Wedgie: Failure and “Recovery”
• Requires manual intervention
Backup Primary
“Depref”
AS 2
AS 1
AS 3 AS 4
![Page 38: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/38.jpg)
38
Routing Attributes and Route Selection
• Local preference: numerical value assigned by routing policy. Higher values are more preferred.
• AS path length: number of AS-level hops in the path• Multiple exit discriminator (“MED”): allows one AS to specify that
one exit point is more preferred than another. Lower values are more preferred.
• eBGP over iBGP• Shortest IGP path cost to next hop: implements “hot potato”
routing• Router ID tiebreak: arbitrary tiebreak, since only a single “best”
route can be selected
BGP routes have the following attributes, on which the route selection process is based:
![Page 39: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/39.jpg)
39
Problems with MED
• R3 selects A• R1 advertises A to R2• R2 selects C• R1 selects C
– (R1 withdraws A from R2)
• R2 selects B– (R2 withdraws C from R1)
• R1 selects A, advertises to R2
R1
R3 R2
AB
C
2 1
MED: 10MED: 20
Preference between B and C at R2 depends on presence or absence of A.
![Page 40: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/40.jpg)
40
rcc (routers)
FIREMAN (firewalls),
OpNet, …
IP Fast Reroute,
RON
Routing Control Platform
4D Architecture
CoNMan
Failure-Carrying Packets
Multi-Router Configuration
Path Splicing
Proactive Reactive
Bandage
Amputation
Tutorial Outline
![Page 41: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/41.jpg)
41
Routing Control Platform
iBGP
RCP
After: RCP gets “best” iBGP routes (and IGP topology)
iBGP
eBGPBefore: conventional iBGP
Caesar et al., “Design and Implementation of a Routing Control Platform”, NSDI, 2005
![Page 42: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/42.jpg)
42
How ISPs Route
Border routerInternal router
1. Provide internal reachability (IGP)2. Learn routes to external destinations (eBGP)3. Distribute externally learned routes internally (iBGP)4. Select closest egress (IGP)
62
4 9 2
13
3
![Page 43: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/43.jpg)
43
What’s wrong with Internet routing?
• Full-mesh iBGP doesn’t scale– # sessions, control traffic, router memory/cpu– Route-reflectors help by introducing hierarchy
• but introduce configuration complexity, protocol oscillations/loops
• Hard to manage– Many highly configurable mechanisms– Difficult to model effects of configuration changes– Hard to diagnose when things go wrong
• Hard to evolve– Hard to provide new services, improve upon protocols
![Page 44: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/44.jpg)
44
Routing Control Platform
• What’s causing these problems?– Each router has limited visibility of IGP and BGP– No central point of control/observation– Resource limitations on legacy routers
network
RCP
Compute routes from central point, remove protocols from routers
network network
RCPInter-AS Protocol
RCP
![Page 45: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/45.jpg)
45
RCP in a Single ISP
• Better scalability: reduces load on routers• Easier management: configuration from a single point• Easier evolvability: freedom from router software
RCP
![Page 46: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/46.jpg)
46
Example of a Forwarding Loop
RR1 RR2
C1 C2
3 3
1
1 1
d
C1 learns BGP route to destination from RR1C2 learns BGP route to destination from RR2
C1 sends packets to RR1 via its IGP shortest path which traverses C2
C2 sends packets to RR2 via its IGP shortest path which traverses C1
![Page 47: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/47.jpg)
47
Avoiding Forwarding Loops with RCP
• RCP learns BGP routes• Computes consistent router-level paths• Intrinsic loop freedom and faster convergence• No neeed to stick to BGP decision process
RR1 RR2
C1 C23 3
1
1 1
d
“RR2”“RR2” RCP
![Page 48: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/48.jpg)
48
RCP architecture
• Divide design into components– Replication improves availability
• Distributed operation, but global state per component
Route Control Server (RCS)
BGP EngineIGP Viewer(NSDI ’04)
Routing Control Platform (RCP)
Available BGP routes
BGP updates
…
Selected BGP routes
BGP updates…
Path cost matrix
IGP link-state advertisements
…
![Page 49: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/49.jpg)
49
Challenges and contributions
• Reliability– Problem: single point of failure– Contribution: simple replication of RCP components
• Consistency– Problem: inconsistent decisions by replicas– Contribution: guaranteed consistency without inter-replica
protocol
• Scalability– Problem: storing all routes increases cpu/memory usage– Contribution : can support large ISP in one computer
![Page 50: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/50.jpg)
50
Generalization: 4D Architecture
• Decision: makes all decisions re: network control
• Dissemination: connect routers with decision elements
• Discovery: discover physical identifiers and assign logical identifiers
• Data: handle packets based on data output by the decision plane
Separate decision logic from packet forwarding.
![Page 51: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/51.jpg)
51
Configuration is too complex: Fix Bottom Up!
• MIBDepot.com lists:– 6200 SNMP MIBs, from
142 vendors, a million MIB objects
• SNMPLink.com lists:– More than 1000
management applications
• Market survey: – Config errors account
for 62% of network downtime
• CONMan abstraction exploits commonality among all protocols
• Protocol details are hidden inside protocol implementations
• Shift complexity from network manager to protocol implementer– Who in any event must deal
with the complexity
Problem Solution
![Page 52: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/52.jpg)
52
Research: Techniques for Availability
• Efficient algorithms for testing correctness offline– Networks: VLANs, IGP, BGP, etc.– Security: Firewalls
• Scalable techniques for enforcing correct behavior in the protocol itself
![Page 53: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/53.jpg)
53
rcc (routers)
FIREMAN (firewalls),
OpNet, …
IP Fast Reroute,
RON
Routing Control Platform
4D Architecture
CoNMan
Failure-Carrying Packets
Multi-Router Configuration
Path Splicing
Proactive Reactive
Bandage
Amputation
Tutorial Outline
![Page 54: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/54.jpg)
54
Reactive Approach
• Failures will happen…what to do?
• (At least) three options– Nothing– Diagnosis + Semi-manual intervention – Automatic masking/recovery
• How to detect faults?– At the network interface/node (MPLS fast reroute)– End-to-end (RON, Path Splicing)
• How to mask faults?– At the network layer (“conventional” routing, FRR, splicing)– Above the network layer (Overlays)
![Page 55: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/55.jpg)
55
The Internet Ideal
• Dynamic routing routes around failures• End-user is none the wiser
![Page 56: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/56.jpg)
56
Reality
• Routing pathologies: 3.3% of routes had “serious problems”
• Slow convergence: BGP can take a long time to converge– Up to 30 minutes!– 10% of routes available < 95% of the time [Labovitz]
• “Invisible” failures: about 50% of prolonged outages not visible in BGP [Feamster]
![Page 57: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/57.jpg)
57
What to Protect?
• Links• Shared risk link groups• End-to-end paths
![Page 58: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/58.jpg)
58
Fast Reroute
• Idea: Detect link failure locally, switch to a pre-computed backup path that protects that link
• Two deployment scenarios– MPLS Fast Reroute
• Source-routed path around each link failure• Requires MPLS infrastructure
– IP Fast Reroute• Connectionless alternative• Various approaches: ECMP, Not-via
![Page 59: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/59.jpg)
59
IP Fast Reroute
• Interface protection (vs. path protection)– Detect interface/node failure locally– Reroute either to that node or one hop past
• Various mechanisms– Equal cost multipath– Loop-free Alternatives– Not-via Addresses
![Page 60: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/60.jpg)
60
Equal Cost Multipath
• Set up link weights so that several paths have equal cost
• Protects only the paths for which such weights exist
Link not protected
S
D
I
15 5
55 5
15
15
5
20
![Page 61: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/61.jpg)
61
ECMP: Strengths and Weaknesses
• Simple• No path stretch upon recovery
(at least not nominally)
• Won’t protect a large number of paths• Hard to protect a path from multiple failures• Might interfere with other objectives (e.g., TE)
Strengths
Weaknesses
![Page 62: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/62.jpg)
62
Loop-Free Alternates
• Precompute alternate next-hop
• Choose alternate next-hop to avoid microloops:
S N
D
5
3 2 6
9
10
• More flexibility than ECMP• Tradeoff between loop-freedom and available
alternate paths
![Page 63: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/63.jpg)
63
Not-via Addresses
• Connectionless version of MPLS Fast Reroute– Local detection + tunneling
• Avoid the failed component– Repair to next-next hop
• Create special not-via addresses for ”deflection”– 2E addresses needed
S F Bf
D
![Page 64: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/64.jpg)
64
Not-via Memory Overhead
• Extra FIB entries
dst nexthop
Kansas C. Denver
New York Denver
… …
2095
1176
902
1295 639
856
1893
587
233260
846
700
548
366
NVD->K Los Angeles
… …
# of extra entries = # of unidirectional links unprotected by LFA
![Page 65: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/65.jpg)
65
Not-via: Strengths and Weaknesses
• 100% coverage• Easy support for multicast traffic
– Due to repair to next-next hop
• Easy support for SRLGs
• Relies on tunneling– Heavy processing– MTU issues
• Suboptimal backup path lengths– Due to repair to next-next hop
Strengths
Weaknesses
![Page 66: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/66.jpg)
66
Concerns with Recovery: Loops
• Topology changes caused by either failures or management operations can cause temporary inconsistent state
• Causes– Asymmetric link costs– Topology changes that affect multiple links
• Undesirable effects– Wasted bandwidth– Failure to recover
![Page 67: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/67.jpg)
67
Loop Mitigation/Prevention
• Incremental cost changes• Synchronizeded FIB installation• Ordered FIB installation (OFIB)• Tunnel-based• Packet marking
![Page 68: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/68.jpg)
68
Incremental Cost Change
• A change in a link cost of x can only cause loops whose “cyclic” cost is <=x
• Minimum cycle is 2 (1 in each direction)• Cost change of 1 can never cause a loop• Where minimum cycle is larger, larger
increments can be used• Once cost reaches cost of alternate path no
more loops possible
![Page 69: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/69.jpg)
69
Synchronized FIB Installation
• Network synchronized change-over at predetermined time– Signal/determine time to change– Network Synchronized Time (NTP is there)
• Either Two FIBs for fast swap – Substantial hardware implications
• FIB update “fast-enough” from change-over time• Dependent on NTP
![Page 70: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/70.jpg)
70
Ordered FIB changes
• For any isolated link/node change• Determine “safe” ordering for FIB installation
– bad news: update from edge to failure, – good news: update from change to edge
• Each router computes its “rank” with respect to the change.
• Delays for a number of worst-case FIB compute/install times proportional to its rank.
![Page 71: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/71.jpg)
71
Computing the Ordering
• Single Reverse SPF rooted at change node– Use old SPT to determine relevant node
• For bad news: count maximum depth of sub-tree below you
• For good news: count maximum hops to change
![Page 72: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/72.jpg)
72
Delay Proportional to Network Diameter
• For Good News, rSPF gives necessary depth.• For Bad News, rSPF is overly pessimistic for
some topologies.• Strategies to reduce unnecessary delay
– Prune rSPF by only considering the branch across the failure – but still too pessimistic.
– Run SPF rooted at edge nodes to correctly prune them – but doesn’t scale.
– Compare rSPFs before and after failure
S
F
E
1
G
A
B
1
1
D
1
5
10
Calc Delay 0Needed Delay 0
Calc Delay NNeeded Delay 0
Calc Delay N+1Needed Delay 1
Avoids all micro-loops and requires single FIB install. Delay depends on network diameter; may be unacceptable.
![Page 73: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/73.jpg)
73
Ordered SPF Summary
• No forwarding changes required.• No signalling required at time of change.• Complete prevention of loops for isolated node or link
changes.
• Requires cooperation from all routers• May delay re-convergence for tens of seconds (unless
optional signalling used)
![Page 74: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/74.jpg)
74
Packet Marking
• Mark packets to force forwarding according to a particular topology.
• Topology can be new or old
![Page 75: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/75.jpg)
75
Failure-Carrying Packets
• When a router detects a failed link, the packet carries the information about the failure
• Routers recompute shortest paths based on these missing edges
![Page 76: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/76.jpg)
76
FCP: Strengths and Weaknesses
• Stretch is bounded/enforced for single failures– Though still somewhat high (20% of paths have 1.2+)
• No tunneling required
Strengths
Weaknesses
• Overhead– Option 1: All nodes must have same network map,
and recompute SPF– Option 2: Packets must carry source routes
• Multiple failures could cause very high stretch
![Page 77: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/77.jpg)
77
Alternate Approach: Protect Paths
• Idea: compute backup topologies in advance– No dynamic routing, just dynamic forwarding– End systems (routers, hosts, proxies) detect failures
and send hints to deflect packets– Detection can also happen locally
• Various proposals– Multihoming/multi-path routing– Multi-router configurations– Path Splicing
![Page 78: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/78.jpg)
78
What is Multihoming?
• The use of redundant network links for the purposes of external connectivity
• Can be achieved at many layers of the protocol stack and many places in the network– Multiple network interfaces in a PC– An ISP with multiple upstream interfaces
• Can refer to having multiple connections to– The same ISP– Multiple ISPs
![Page 79: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/79.jpg)
79
Why Multihome?
• Redundancy• Availability• Performance• Cost
Interdomain traffic engineering: the process by which a multihomed network configures its
network to achieve these goals
![Page 80: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/80.jpg)
80
Redundancy
• Maintain connectivity in the face of:– Physical connectivity problems (fiber cut, device
failures, etc.)– Failures in upstream ISP
![Page 81: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/81.jpg)
81
Performance
• Use multiple network links at once to achieve higher throughput than just over a single link.
• Allows incoming traffic to be load-balanced.
70% of traffic30% of traffic
![Page 82: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/82.jpg)
82
Multihoming in IP Networks Today
• Stub AS: no transit service for other ASes– No need to use BGP
• Multi-homed stub AS: has connectivity to multiple immediate upstream ISPs– Need BGP– No need for a public AS number– No need for IP prefix allocation
• Multi-homed transit AS: connectivity to multiple ASes and transit service– Need BGP, public AS number, IP prefix allocation
![Page 83: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/83.jpg)
83
BGP or no?
• Advantages of static routing– Cheaper/smaller routers (less true nowadays)– Simpler to configure
• Advantages of BGP– More control of your destiny (have providers stop
announcing you)– Faster/more intelligent selection of where to send
outbound packets.– Better debugging of net problems (you can see the
Internet topology now)
![Page 84: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/84.jpg)
84
Same Provider or Multiple?
• If your provider is reliable and fast, and affordably, and offers good tech-support, you may want to multi-home initially to them via some backup path (slow is better than dead).
• Eventually you’ll want to multi-home to different providers, to avoid failure modes due to one provider’s architecture decisions.
![Page 85: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/85.jpg)
85
Multihomed Stub: One Link
• Downstream ISP’s routers configure default (“static”) routes pointing to border router.
• Upstream ISP advertises reachability
Upstream ISP
Multiple links between same pair of routers.
Default routes to “border”
“Stub”ISP
![Page 86: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/86.jpg)
86
Multihomed Stub: Multiple Links
• Use BGP to share load• Use private AS number (why is this OK?)• As before, upstream ISP advertises prefix
Upstream ISP
Multiple links to different upstream routers
“Stub”ISP
Internal routing for “hot potato”
BGP for load balance at edge
![Page 87: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/87.jpg)
87
Multihomed Stub: Multiple ISPs
• Many possibilities– Load sharing– Primary-backup– Selective use of different ISPs
• Requires BGP, public AS number, etc.
“Stub”ISP
Upstream
ISP 1
Upstream
ISP 2
![Page 88: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/88.jpg)
88
Multihomed Transit Network
• BGP everywhere• Incoming and outcoming traffic• Challenge: balancing load on intradomain and egress
links, given an offered traffic load
TransitISP
ISP 1
ISP 2
ISP 3
![Page 89: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/89.jpg)
89
Protecting Paths: Multi-Path Routing
• Idea: Compute multiple paths– If paths are disjoint, one is likely to survive failure– Send traffic along multiple paths in parallel
• Two functions– Dissemination: How nodes discover paths– Selection: How nodes send traffic along paths
• Key problem: Scaling
![Page 90: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/90.jpg)
90
Multiple Routing Configurations
• Relies on multiple logical topologies – Builds backup configurations so that all components
are protected
• Recovered traffic is routed in the backup configurations
• Detection and recovery is local
• Path protection to egress node
![Page 91: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/91.jpg)
91
MRC: How It Works
• Precomputation of backup: Backup paths computed to protect failed nodes or edges– Set link weights high so that no traffic goes through a
particular node
• Recovery: When a router detects a failure, it switches topologies.– Packets keep track of when they have switched
topologies, to avoid loops
![Page 92: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/92.jpg)
92
Configuration
• Each configuration is a set of link weights.• Each is resistant to failures in node n and link l if no
traffic is routed over l or via n. In this configuration, l and n are called isolated.
• There must exist enough configurations so that every network component is isolated in one of them.
• A link to an isolated node is called restricted. Its weight is set high enough so that only packets addressed to that node is sent over the link.
• Isolated links have their weight set to infinity, and are never used.
92
![Page 93: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/93.jpg)
93
Recovering from Failure
• The router does not immediately inform the rest of the network.
• Packets that should be forwarded over the failed interface are marked as belonging to the chosen backup configuration and are forwarded over an alternative interface.
• The routers along the way will see which configuration to use.
![Page 94: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/94.jpg)
94
MRC: Strengths and Weaknesses
• 100% coverage• Better control over recovery paths
– Recovered traffic routed independently
• Needs a topology identifier– Packet marking, or tunnelling
• Potentially large number of topologies required• No end-to-end recovery• Only one switch
Strengths
Weaknesses
![Page 95: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/95.jpg)
95
Multipath: Promise and Problems
• Bad: If any link fails on both paths, s is disconnected from t
• Want: End systems remain connected unless the underlying graph is disconnected
ts
![Page 96: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/96.jpg)
96
Path Splicing
• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration
• Step 2 (Parallelization): Allow traffic to switch between instances at any node in the protocol
ts
Compute multiple forwarding trees per destination.Allow packets to switch slices midstream.
![Page 97: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/97.jpg)
97
Perturbations
• Goal: Each instance provides different paths• Mechanism: Each edge is given a weight that is
a slightly perturbed version of the original weight– Two schemes: Uniform and degree-based
ts
3
3
3
“Base” Graph
ts
3.5
4
5 1.5
1.5
1.25
Perturbed Graph
![Page 98: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/98.jpg)
98
Slicing
• Goal: Allow multiple instances to co-exist• Mechanism: Virtual forwarding tables
a
t
c
s b
t a
t c
Slice 1
Slice 2
dst next-hop
![Page 99: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/99.jpg)
99
Path Splicing in Practice
• Packet has shim header with routing bits
• Routers use lg(k) bits to index forwarding tables– Shift bits after inspection– Incremental deployment is trivial– Persistent loops cannot occur
• To access different (or multiple) paths, end systems simply change the forwarding bits
![Page 100: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/100.jpg)
100
Recovery in the Wide-Area
![Page 101: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/101.jpg)
101
Recovery in the Wide-Area
Scalability
Performance (convergence speed, etc.)
BGP
Routing overlays (e.g., RON)
![Page 102: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/102.jpg)
102
Slow Convergence in BGPGiven a failure, can take up to 15 minutes to see BGP.
Sometimes, not at all.
![Page 103: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/103.jpg)
104
Routing Convergence in the Wild
• Route withdrawn, but stub cycles through backup path…
![Page 104: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/104.jpg)
105
Resilient Overlay Networks: Goal
• Increase reliability of communication for a small (i.e., < 50 nodes) set of connected hosts
• Main idea: End hosts discover network-level path failure and cooperate to re-route.
![Page 105: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/105.jpg)
106
RON: Resilient Overlay Networks
Premise: application overlay network to increase performance and reliability of routing
Two-hop (application-level) route
application-layer router
![Page 106: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/106.jpg)
107
RON Can Outperform IP Routing
• IP routing does not adapt to congestion– But RON can reroute when the direct path is congested
• IP routing is sometimes slow to converge– But RON can quickly direct traffic through intermediary
• IP routing depends on AS routing policies– But RON may pick paths that circumvent policies
• Then again, RON has its own overheads– Packets go in and out at intermediate nodes
• Performance degradation, load on hosts, and financial cost– Probing overhead to monitor the virtual links
• Limits RON to deployments with a small number of nodes
![Page 107: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/107.jpg)
108
RON Architecture• Outage detection
– Active UDP-based probing• Uniform random in [0,14]• O(n2)
– 3-way probe• Both sides get RTT information• Store latency and loss-rate information in DB
• Routing protocol: Link-state between nodes• Policy: restrict some paths from hosts
– E.g., don’t use Internet2 hosts to improve non-Internet2 paths
![Page 108: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/108.jpg)
109
Main results
• RON can route around failures in ~ 10 seconds
• Often improves latency, loss, and throughput
• Single-hop indirection works well enough– Motivation for second paper (SOSR)– Also begs the question about the benefits of overlays
![Page 109: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/109.jpg)
110
When (and why) does RON work?
• Location: Where do failures appear?– A few paths experience many failures, but many paths experience
at least a few failures (80% of failures on 20% of links).
• Duration: How long do failures last?– 70% of failures last less than 5 minutes
• Correlation: Do failures correlate with BGP instability?– BGP updates often coincide with failures– Failures near end hosts less likely to coincide with BGP– Sometimes, BGP updates precede failures (why?)
Feamster et al., Measuring the Effects of Internet Path Faults on Reactive Routing, SIGMETRICS 2003
![Page 110: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/110.jpg)
111
Location of Failures
• Why it matters: failures closer to the edge are more difficult to route around, particularly last-hop failures– RON testbed study (2003): About 60% of failures
within two hops of the edge– SOSR study (2004): About half of failures potentially
recoverable with one-hop source routing• Harder to route around broadband failures (why?)
![Page 111: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/111.jpg)
112
Benefits of Overlays
• Access to multiple paths– Provided by BGP multihoming
• Fast outage detection– But…requires aggressive probing; doesn’t scale
Question: What benefits does overlay routing provide over traditional multihoming + intelligent routing (e.g., RouteScience)?
![Page 112: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/112.jpg)
113
Drawbacks and Open Questions
• Efficiency– Requires redundant traffic on access links
• Scaling– Can a RON be made to scale to > 50 nodes?– How to achieve probing efficiency?
• Interaction of overlays and IP network• Interaction of multiple overlays
![Page 113: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/113.jpg)
114
Efficiency
• Problem: traffic must traverse bottleneck link both inbound and outbound
• Solution: in-network support for overlays– End-hosts establish reflection points in routers
• Reduces strain on bottleneck links• Reduces packet duplication in application-layer multicast
(next lecture)
Upstream ISP
![Page 114: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/114.jpg)
115
Interaction of Overlays and IP Network• ISPs: “Overlays will interfere with our traffic
engineering goals.”– Likely would only become a problem if overlays
became a significant fraction of all traffic– Control theory: feedback loop between ISPs and
overlays– Philosophy: Who should have the final say in how
traffic flows through the network?
End-hostsobserve
conditions, react
ISP measures traffic matrix,
changes routing config.
Traffic matrix
Changes in end-to-end paths
![Page 115: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/115.jpg)
116
Interaction of Multiple Overlays
• End-hosts observe qualities of end-to-end paths• Might multiple overlays see a common “good
path”• Could these multiple overlays interact to create
increase congestion, oscillations, etc.?
“Selfish routing”
![Page 116: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/116.jpg)
117
Lesson from Routing Overlays
• End-hosts can measure path performance metrics on the (small number of) paths that matter
• Internet routing scales well, but at the cost of performance
End-hosts are often better informed about performance, reachability
problems than routers.
![Page 117: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/117.jpg)
118
Recovery: Research Problems
• Tradeoffs between stretch and reliability
• Fast, scalable recovery in the wide area
• Interactions between routing at multiple layers
![Page 118: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/118.jpg)
119
rcc (routers)
FIREMAN (firewalls),
OpNet, …
IP Fast Reroute,
RON
Routing Control Platform
4D Architecture
CoNMan
Failure-Carrying Packets
Multi-Router Configuration
Path Splicing
Proactive Reactive
Bandage
Amputation
Summary
![Page 119: Improving Internet Availability](https://reader030.vdocuments.site/reader030/viewer/2022032612/5681309f550346895d9694dc/html5/thumbnails/119.jpg)
120
Looking Forward: Rethinking Availability
• What definitions of availability are appropriate?– Downtime
• Fraction of time that path exists between endpoints• Fraction of time that endpoints can communicate on any path
– Transfer time• How long must I wait to get content?• (Perhaps this makes more sense in delay-tolerant networks,
bittorrent-style protocols, etc.)
• Some applications depend more on availability of content, rather than uptime/availability of any particular Internet path or host