ars-msr 3 interdomain 2013
TRANSCRIPT
-
8/13/2019 Ars-msr 3 Interdomain 2013
1/48
Routing protocols
Inter-domain routing
Architectures for Networks
and Services
Arhitecturi pentru retele si servicii (ARS)
Managementul serviciilor si retelelor (MSR)
-
8/13/2019 Ars-msr 3 Interdomain 2013
2/48
Octavian Catrina 2
Routing in the Internet (review)
Routing and administrative boundaries
The Internet is a federation of autonomous IP networks, owned
and operated by different organizations.
Each organization manages autonomously:
the connectivity and routing within its own network;
the policies for interconnection with other networks.
Internet routing must take into account these administrative
boundaries.
-
8/13/2019 Ars-msr 3 Interdomain 2013
3/48
Octavian Catrina 3
Autonomous systems (AS)
An AS is a connected group of networks under a single technical
administration, with consistent interior routing, and unified exteriorrouting policy.
An AS is treated as a single entityby the rest of the Internet. It is
also called (autonomous) routing domain.
Autonomous systems are distinguished byAS numbers.
16-bit numbers initially; 32-bit numbers since RFC 4893, 2007. A routing domain needs a unique AS number only if it
participates in inter-domain routing, with distinct routing policies.
The range 64512 - 65535 is reserved for private use.
AS 100 AS 500
ISP-1 ISP-2
Single-homed stub network.
Extension of ISP-1 network.
Does not need its own AS
number.
Multi-homed stub network.
Needs an AS number for
typical routing policies
related to multi-homing.AS 7000
Transit networks.
Need an AS number.
S1 S2
-
8/13/2019 Ars-msr 3 Interdomain 2013
4/48
Octavian Catrina 4
How many Autonomous Systems?
http://www.potaroo.net, Oct. 2012
Evolution of the number of assigned AS numbers(found in BGP routing tables)
-
8/13/2019 Ars-msr 3 Interdomain 2013
5/48
Octavian Catrina 5
Intra-domain vs. inter-domain routing
Inter-domain routing
Based on AS interconnection
topology and routing policies.
Designated pairs of AS border
routers exchange routing info.
Focus on scalability, route
stability, convergence.
Exterior routing (or gateway)
protocols (EGP): BGP4.
Intra-domain routing
Based on IP network topology
and performance metrics.
All routers in the AS participate
in route finding and selection.
Focus on efficiency and
performance (QoS).
Interior routing (or gateway)
protocols (IGP): RIP, OSPF, ...
IGP EGP
-
8/13/2019 Ars-msr 3 Interdomain 2013
6/48
Octavian Catrina 6
Internet structure
Peer Relationship (shared-cost) Shortcuts
Customer-Provider Relationship (transit) Hierarchical structure
Tier-1 ISPs.
Very large,
global networks
Transit
domains
Stub
domains
Tier-2 ISPs.National or
regional ISPs.
Company
networks,small ISPs,
...
Examples of paths
-
8/13/2019 Ars-msr 3 Interdomain 2013
7/48
Map of the Internet
Octavian Catrina 7
http://www.technologyreview.com
Three distinct regions are apparent: an
inner core of highly connected nodes, an
outer periphery of isolated networks, and amantle-like mass of peer-connected nodes.
Bigger the node, more connections it has.
The picture shows the Internet as
consisting of a dense core of 80 or so
critical nodes surrounded by an outer shell
of 5,000 sparsely connected, isolated
nodes that are very much dependent upon
this core. Separating the core from the
outer shell are approximately 15,000 peer-
connected and self-sufficient nodes.
Take away the core, and an interesting
thing happens: about 30 percent of the
nodes from the outer shell become
completely cut off. But the remaining 70
percent can continue communicatingbecause the middle region has enough
peer-connected nodes to bypass the core.
With the core connected, any node is able
to communicate with any other node
within about 4 links. If the core is removed,
it takes about 7 or 8 links.
-
8/13/2019 Ars-msr 3 Interdomain 2013
8/48 Octavian Catrina 8
Types of routing domains
Transit domain
Allows other routing domains to use its network infrastructure
to communicate with each-other (ISPs, typically).
Stub domain
Does not allow the use of its network infrastructure for traffic
between other routing domains.
A stub domain is connected to at least one transit domain.
Single-homed: 1 transit domain. Dual-homed: 2 transit domains.
Can also be connected to other stub domains.
Different traffic patterns: content rich; access rich.
S2S3
S4
S5T1 T3
T2
S1T1 - T3: Transit domains.
S1 - S5: Stub domains.S1, S3: Multi-homed stub
domains.
-
8/13/2019 Ars-msr 3 Interdomain 2013
9/48 Octavian Catrina 9
Inter-domain relationships
Customer-provider relationship
Provider offers transit for traffic between the customer and all
prefixes it can reach or part of them. Customer pays provider.
Peering relationship
Two ASes interconnect their networks such that to allow traffic
between their customers and between their internal prefixes. Typically established by ISPs of similar sizes, without payment.
Peering provides shortcuts and reduces transit costs.
AS1
AS2
AS3 AS5
AS7
Peer
Customer
AS6
AS4
Contribution to delivery costs (implicit)
Provider
Peer
-
8/13/2019 Ars-msr 3 Interdomain 2013
10/48 Octavian Catrina 10
Physical ISP interconnection
Internet exchange
More efficient solution: rendezvouspoint for many ISPs hub and
spokes physical topology.
Run by an independent organization.
A typical Internet exchange offers:
Collocation for ISP routers running
BGP.
High speed switched Ethernet LAN
(1-10 Gbps links) for the
interconnection of the ISP routers.
Dedicated links
Set up between 2 ISPs that agree to
interconnect their networksmesh
topology for AS interconnection.
AS2
AS4
AS3
AS5
Internet Exchange
Dedicated
links
AS1
-
8/13/2019 Ars-msr 3 Interdomain 2013
11/48 Octavian Catrina 11
Border Gateway Protocol (BGP)
BGP: The inter-domain routing protocol of the Internet
"BGP is running on more than 100K routers (my estimate),
making it one of worlds largest and most visible distributed
systems". T. Griffin, 2002.
BGP evolution("BGP was not designed, it evolved")
1989-1995: The beginnings.1989: BGP-1 (RFC 1105) replaces EGP.
1990, 1991: BGP-2 (RFC 1163). BGP-3 (RFC 1267).
1995: BGP-4 (RFC 1771).
Uses CIDR, the new IP addressing & routing scheme. Deployed
all over the Internet during the following years. So far, it scaled up quite well with the fast growth of the Internet.
2006: Updated BGP-4 specification, RFC 4271.
Tens of other RFCs describe various BGP-4 extensions
introduced in the mean time.
-
8/13/2019 Ars-msr 3 Interdomain 2013
12/48 Octavian Catrina 12
Active BGP routes (2004)
http://www.potaroo.net, Oct. 2004
Fast exponential
growth before CIDR
CIDR is deployed,
route aggregation limits
routing table growth
ISPs improve
BGP policies
Route aggregation
does not work anymore
Route aggregation
problems again!
About 180000 routes
-
8/13/2019 Ars-msr 3 Interdomain 2013
13/48
Octavian Catrina 13
Active BGP routes (2012)
http://www.potaroo.net, Oct. 2012
About 430000 routes
-
8/13/2019 Ars-msr 3 Interdomain 2013
14/48
Octavian Catrina 14
Outline of BGP operation
Path vector routing protocol Similar to distance vector protocols.
BGP sessions
Routers establish a BGP sessionon a TCP
connection, exchange the current routes in the
BGP Routing Information Base (RIB), and thensend incremental updatesof RIB changes.
Feasible
routes.
Withdrawn
routes.
BGP
session
AS1
R1
AS2
R2
BGP routing information (UPDATE message)
Feasible routes: Address prefixes reachable via the sender's
ASand an associated set ofpath attributes.
Withdrawn (infeasible) routes: Address prefixes of previously
announced routes that are not reachable anymore via the
sender's AS.
An UPDATE message can indicate both feasible routes and
infeasible routes.
-
8/13/2019 Ars-msr 3 Interdomain 2013
15/48
Octavian Catrina 15
BGP path attributes
Path attributes
BGP selects a feasible "best" route to a destination based on
information provided in a set of attributesassociated to each
path, including: loop detection, next hop, local preference, etc.
Well-known attributes
Must be recognized by all compliant BGP implementations.Are propagated to other neighbors.
Well-known mandatory: Must be specified for all paths.
Well-known discretionary: May be omitted for some paths.
Optional attributes
Not required and not expected to be recognized by all BGPimplementations. If recognized they may be propagated to
other BGP routers, according to their meaning.
Optional transitive: If not recognized, may be propagated.
Optional non-transitive: If not recognized, must be discarded.
-
8/13/2019 Ars-msr 3 Interdomain 2013
16/48
Octavian Catrina 16
Main BGP path attributes
Attribute Category Description
ORIGIN well-knownmandatory
Origin of the path information: 0 = intra-AS (IGP);1 = EGP; 2 = incomplete (other means). Used to
assess the trust in the source of routing info.
AS_PATH well-known
mandatory
List of all the ASes on the path: AS_SEQUENCE,
AS_SET (AS_SET is needed for route aggregation).
Loop detection, in/out filtering, best path selection.
NEXT_HOP well-knownmandatory
Address of the router to be used as next hop on thepath to the specified prefix.
MULTI_EXIT_DISC
(MED)
optional non-
transitive
Value used in best path selection to discriminate
among multiple entry points of a neighboring AS.
LOCAL_PREF well-known Value of the degree of preference for this route.
Only used inside an AS (iBGP session, not eBGP).
COMMUNITY
(RFC 1997)
optional
transitive
32-bit tag that can be associated to any group of
prefixes in order to specify various policies (can be
structured as 16-bit AS number and 16-bit value).
Standard (e.g., NO_EXPORT) and custom policies.
Often used to control inbound traffic.
. . .
-
8/13/2019 Ars-msr 3 Interdomain 2013
17/48
BGP extensions
Multiprotocol BGP (MP-BGP)
Extends BGP in order to carry routing information for other
protocols besides IPv4 (e.g., IPv6, VPN-IPv4, VPN-IPv6).
RFC 4760, 2007, Multiprotocol Extensions for BGP-4.
BGP Extended Communities Attribute
Extends the range of values and adds structure (Type field). RFC 4360, 2006, BGP Extended Communities Attribute.
Octavian Catrina 17
Attribute Category Description
MP_REACH_NLRI
MP_UNREACH_NLRI
optional non-
transitive
Multiprotocol Reachable NLRI. Used to advertise MP route.
Multiprotocol Unreachable NLRI. Used to withdraw MP route.
Two fields are used to describe the kind of route: AddressFamily Identifier (AFI), e.g., IPv4, IPv6, vpnv4 (VPN-IPv4),
vpnv6 (VPN-IPv6). Subsequent Address Family Identifier
(SAFI), e.g. unicast, multicast.
Extended Communities optional
transitive
Extension of Community attribute8 octets, out of which 1-2
octets for type field. E.g., Route Target (RT) Community used
for MPLS-based L3 VPNs.
-
8/13/2019 Ars-msr 3 Interdomain 2013
18/48
Octavian Catrina 18
BGP messages
BGP message Description
OPEN Used to establish a BGP session between two routers (includes
mutual identification and optional capabilities negotiation). Each
router sends an OPEN message, which is acknowledged (if
accepted) by a KEEPALIVE message.
KEEPALIVE Used to verify the liveness of the BGP session. A KEEPALIVE
message is sent if no other message was transmitted duringHoldTime/3 seconds. A BGP session is canceled if no message
was received during HoldTime seconds.
UPDATE Used to exchange routing information on a BGP session. Can
carry a set of feasible routes with common attribute-set (prefixes
and attribute-set), and/or a set of withdrawn routes (prefixes).
NOTIFICATION Used to inform the other BGP router of an error. After sending or
receiving this message a BGP session is closed.
ROUTE_REFRESH Message added in 2000 to enable a graceful restart of a BGP
session (e.g., when policies are reconfigured).
-
8/13/2019 Ars-msr 3 Interdomain 2013
19/48
Octavian Catrina 19
BGP session overview
AS 100
R1
AS 200
R2
TCP connection establishment (to port 179)
OPEN (Version = 4, My-AS = 100, Hold-Time = 180,
BGP-Identifier = 10.1.3.1, Optional-Param.)
BGP session establishment
BGP BGP
Session
established
10.1.3.1 10.1.3.2
OPEN (Version = 4, My-AS = 200, Hold-Time = 180,
BGP-Identifier = 10.1.3.2, Optional-Param.)
Session
establishedExchange of current routes in RIB
KEEPALIVE
KEEPALIVE
UPDATE (Path-Attributes-Len = ..., Path-Attributes = ...,
NLRI = 140.1.128.0/18, 160.2.0.0/16)
140.1.128.0/18160.2.0.0/16 190.3.192.0/18170.4.0.0/16
. . .
UPDATE (Path-Attributes-Len = ..., Path-Attributes = ...,
NLRI = 190.3.192.0/18, 170.4.0.0/16)
Incremental updates (RIB changes) using UPDATE:
- Announce new feasible routes.
- Withdraw infeasible routes.
Periodic KEEPALIVE messages (e.g., at Hold-Time/3)
NLRI = Network Layer
Reachability Information
= IP address prefix
-
8/13/2019 Ars-msr 3 Interdomain 2013
20/48
Example: BGP session setup
Octavian Catrina 20
OPEN Messageand main parameters
KEEPALIVE Message
-
8/13/2019 Ars-msr 3 Interdomain 2013
21/48
Example: BGP updates (1)
Octavian Catrina 21
Prefix of the BGP route: 10.20.0.0/16
Attributes of the BGP route
to the prefix specified below
UPDATE Message advertising a
feasible rout eto 10.20.0.0/16
-
8/13/2019 Ars-msr 3 Interdomain 2013
22/48
Example: BGP updates (2)
Octavian Catrina 22
Prefix of the withdrawn route:
10.20.0.0/16
UPDATE Message announcing a
withdrawn ( infeasib le) routeto 10.20.0.0/16
A BGP UPDATE message can advertise feasible routes and also announce withdrawn routes
(previously advertised routes which are no longer feasible).
-
8/13/2019 Ars-msr 3 Interdomain 2013
23/48
Octavian Catrina 23
BGP path vector routing (1)
AS 10
R1
AS 20
R2
BGP session setup
11.1.1.1
11.1.1.2
140.1.0.0/16150.1.0.0/16
AS 30
R3160.1.0.0/1611.1.2.1
11.1.2.2
UPDATE (Prefix = 140.1.0.0/16,
AS_PATH = 10, NEXT_HOP = 11.1.1.1)
Router R1: BGP routing table
Prefix AS_PATH NEXT_HOP
140.1.0.0/16 internal . . .
Router R2: BGP routing table
Prefix AS_PATH NEXT_HOP
150.1.0.0/16 internal . . .
Router R3: BGP routing table
Prefix AS_PATH NEXT_HOP
160.1.0.0/16 internal . . .
UPDATE (Prefix = 150.1.0.0/16,
AS_PATH = 20, NEXT_HOP = 11.1.1.2)
Router R1: BGP routing table
Prefix AS_PATH NEXT_HOP
140.1.0.0/16 internal . . .
150.1.0.0/16 20 11.1.1.2
Router R2: BGP routing table
Prefix AS_PATH NEXT_HOP
150.1.0.0/16 internal . . .
140.1.0.0/16 10 11.1.1.1
TheAS_PATH attributelists the ASes on the path. It is used for loop
detection, for policy-based routing, and as basic path metric (AS count).
The NEXT_HOP attributeindicates the IP address of the router used to
reach the prefix.
R1 and R2 exchange the
current routes in their tables.
-
8/13/2019 Ars-msr 3 Interdomain 2013
24/48
Octavian Catrina 24
BGP path vector routing (2)
AS 10
R1
AS 20
R2
11.1.1.1
11.1.1.2
140.1.0.0/16150.1.0.0/16
AS 30
R3160.1.0.0/1611.1.2.1
11.1.2.2
Router R3: BGP routing table
Prefix AS_PATH NEXT_HOP
160.1.0.0/16 internal . . .
Router R1: BGP routing table
Prefix AS_PATH NEXT_HOP
140.1.0.0/16 internal . . .
150.1.0.0/16 20 11.1.1.2
Router R2: BGP routing table
Prefix AS_PATH NEXT_HOP
150.1.0.0/16 internal . . .
140.1.0.0/16 10 11.1.1.1
BGP session setup
UPDATE (Prefix = 150.1.0.0/16,
AS_PATH = 20, NEXT_HOP = 11.1.2.1)
UPDATE (Prefix = 160.1.0.0/16,AS_PATH = 30, NEXT_HOP = 11.1.2.2)
UPDATE (Prefix = 140.1.0.0/16,
AS_PATH = 20 10, NEXT_HOP = 11.1.2.1)
Router R3: BGP routing table
Prefix AS_PATH NEXT_HOP
160.1.0.0/16 internal . . .
150.1.0.0/16 20 11.1.2.1
140.1.0.0/16 20 10 11.1.2.1
Router R2: BGP routing table
Prefix AS_PATH NEXT_HOP
150.1.0.0/16 internal . . .
140.1.0.0/16 10 11.1.1.1
160.1.0.0/16 30 11.1.2.2
We assum e that AS 20 is the pr ovider, and
AS 10 and AS 30 are its cu stom ers. AS 20
provid es transit for its customers, hence it
announc es the routes to them.
R2 and R3 exchange the
current routes in their tables.
-
8/13/2019 Ars-msr 3 Interdomain 2013
25/48
-
8/13/2019 Ars-msr 3 Interdomain 2013
26/48
Octavian Catrina 26
BGP policy-based routing
Route import policies:For each BGP peer
- Import filter selects the received
routes accepted from that peer.
- Path attribute manipulation
influences best route selection bythis BGP router.
Control of outbound traffic:
Imported route participates in the
selection of the best path toward
that prefix.
Route export policies:For each BGP peer
- Export filter selects the routes
announced to that peer.
- Path attribute manipulation
influences the best route selectionby that peer.
Control of inbound traffic:
Announcing a route enables the
other AS to route traffic to that
prefix via this AS.
Received BGProute updates
Sent BGProute updates
Control outbound
traffic
BGP router
Control inbound
traffic
Best route
table
+ Importpolicies
+ Exportpolicies
-
8/13/2019 Ars-msr 3 Interdomain 2013
27/48
Octavian Catrina 27
Example: Route filtering
Peer
Customer Provider
Peer
AS1
AS2
Tier-2 ISP
AS3
Tier-1 ISP
AS5
Tier-1 ISP
AS7AS6
All
routesCustomer (AS1) &
internal routes (AS2)
AS4
Tier-2 ISP
Customer (AS1) &
internal routes (AS2)
BGP not
necessary
Customer (AS6) &
internal routes (AS4)
All routes
All routes
All
routes
Customer (AS6) &
internal routes (AS4)
BGP not
necessary
AS6
routes
AS6
routesDefault route
Static route Default route
Static route
Note: Assume that AS2, AS3, AS4, AS5 have many other interconnections, not shown
in this picture. In particular, AS2 and AS4 are also connected to other Tier-1 and Tier-2
ISPs, and have many other customers.
-
8/13/2019 Ars-msr 3 Interdomain 2013
28/48
Octavian Catrina 28
BGP route processing
Receive
BGPupdates
Apply import
policies
Import filter
and attributemanipulation
Best route
selection
Apply export
policies
BGP best
route table
Send
BGPupdates
Select one best route
to each destinationbased on attributes
Export filter
and attributemanipulation
IP routing
table
Install routesfor IP forwarding
IGP routes
Dir. con. net.
Static routes
BGP Loc-RIB
Policies are typically defined using vendor specific languages. Route filtering can be done based onprefix, AS number, attributes.
Import and export of internal routes is also controlled by policies.
The interactions between BGP and internal routing (IGP, static) are
not precisely defined.
BGP Adj-RIB-In BGP Adj-RIB-Out
-
8/13/2019 Ars-msr 3 Interdomain 2013
29/48
Octavian Catrina 29
Best route selection
Overview The algorithm operates on the set of routes in Adj-RIBs-In after
applying import policies (filter, tweak attributes, assign local
preference). It selects one best route for each destination.
Sequence of selection steps for each destination
Discard all the routes for which NEXT_HOP is inaccessible.
Discard all the routes for which AS_PATH contains the local AS or a loop.
Let n be the number of routes left.
If n>1, select the routes with the largest LOCAL_PREF.
If n>1, select the routes with the smallest number of ASes in AS_PATH.
If n>1, select the routes with the lowest ORIGIN (IGP1 and a route learned from eBGP, discard routes learned from iBGP.
If n>1, select the routes with shortest path (local metric) to NEXT_HOP.
If n>1, select the route from the BGP speaker with the lowest BGP Id.
-
8/13/2019 Ars-msr 3 Interdomain 2013
30/48
Octavian Catrina 30
Route propagation in an AS
AS 10
AS 20
R3
11.1.1.1
11.1.1.2 AS 30
11.1.2.1
11.1.2.2R1
R2
140.1.0.0/16
150.1.0.0/16
150.2.0.0/16
IGP?BGP
BGP
160.1.0.0/16R4
10.0.0.110.0.0.2
Example
AS 20 has two border routers, R2 and R3.
How can R2 advertise to R3 the routes learned from R1?
How can R3 advertise to R2 the routes learned from R4?
AS 20 runs an IGP for interior routing.
Could this IGP distribute the BGP routes as well?
No. The IGP cannot carry the BGP path attributes and does not
scale up to BGP routing table size.
Solution: Propagate BGP routes within an AS using BGP.
-
8/13/2019 Ars-msr 3 Interdomain 2013
31/48
Octavian Catrina 31
External vs. internal BGP
AS 10
AS 20
R3
11.1.1.1
11.1.1.2 AS 30
11.1.2.1
11.1.2.2R1
R2
140.1.0.0/16
150.1.0.0/16
150.2.0.0/16
iBGPeBGP
eBGP
160.1.0.0/16R4
10.0.0.110.0.0.2
External BGP (eBGP): BGP routers in different ASs
Internal (iBGP): BGP routers in the same AS
eBGP session
Directly connected routers (usually).
Do not use LOCAL_PREF.
Update AS_PATH, NEXT_HOPattributes in advertised paths.
Advertise the best route to each
destination. Usually, an export filter
for each session.
iBGP session
Need not directly connect the routers.
Use LOCAL_PREF attribute.
No change to AS_PATH andNEXT_HOP attributes.
Advertise only best routes learned
from eBGP (not from iBGP, to avoid
loops). Usually, no export filter.
Full mesh of iBGP sessions.
-
8/13/2019 Ars-msr 3 Interdomain 2013
32/48
Octavian Catrina 32
Example: eBGP + iBGP + IGP
AS 10
AS 20
R3
11.1.1.1
11.1.1.2 AS 30
11.1.2.1
11.1.2.2R1
R2140.1.0.0/16
150.1.0.0/16
150.2.0.0/16
iBGPeBGP
eBGP
160.1.0.0/16R4
UPDATE (eBGP)
Prefix = 140.1.0.0/16,
AS_PATH = 10,
NEXT_HOP = 11.1.1.1
10.0.0.1 10.0.0.2
UPDATE (iBGP)
Prefix = 140.1.0.0/16,
AS_PATH = 10,NEXT_HOP = 11.1.1.1,
LOCAL_PREF = 100
UPDATE (eBGP)
Prefix = 140.1.0.0/16,
AS_PATH = 20 10,
NEXT_HOP = 11.1.2.1 Note how the
path attributeschange during
route propagation
Router R2: BGP routing table
Prefix AS_PATH NEXT_HOP
140.1.0.0/16 10 11.1.1.1
Router R2: IP routing table
P Prefix INTF NEXT_HOP
BGP 140.1.0.0/16 E 11.1.1.1
- 11.1.1.0/30 E direct
- 10.0.0.0/30 S direct
- 150.1.0.0/16 W direct
IGP 150.2.0.0/16 S 10.0.0.2
IGP 11.1.2.0/30 S 10.0.0.2
Router R3: BGP routing table
Prefix AS_PATH NEXT_HOP
140.1.0.0/16 10 11.1.1.1
Router R3: IP routing table
P Prefix INTF NEXT_HOP
BGP 140.1.0.0/16 N 10.0.0.1
- 11.1.2.0/30 W direct
- 10.0.0.0/30 N direct
- 150.2.0.0/16 E direct
IGP 150.1.0.0/16 N 10.0.0.1
IGP 11.1.1.0/30 N 10.0.0.1
-
8/13/2019 Ars-msr 3 Interdomain 2013
33/48
Octavian Catrina 33
Which is the right path?
Hot potato routing
Closest exit routing (get ridof it as soon as possible).
"Natural" BGP behavior.
Cold potato routing
Best exit routing (closest todestination). Must use MED
or COMMUNITY attributes.
SF NY
Customer
Provider (ISP)
Hot
potato!
Hot
potato!
Example: Dual-homed customer
Asymmetric routes. If the customer downloads only,
then its own network carries most of the traffic!
SF NY
Customer
Provider (ISP)
Hot
potato
Cold
potato
Symmetric routes. The provider carries most
of the traffic in this case.
-
8/13/2019 Ars-msr 3 Interdomain 2013
34/48
Octavian Catrina 34
Multi-homing with one provider (1)
Example: Dual-homing without BGP
Configuration example: The customer can configure default routes on R3 (via R1) and on R4
(via R2) and distribute them internally using IGP; outbound packets follow the IGP shortest
paths and exit on either R3 or R4. The provider can configure static routes to the customer's
prefix on R1 (via R3) and R2 (via R1) and distribute them using BGP (possibly also IGP).
AS 10
Transit providerCustomer
R2
R1
R4
R3
150.1.0.0/24
11.1.1.111.1.1.2
11.1.2.111.1.2.2
Hot potato
routing
Hot potato
routing
How can we control outbound traffic?
Easier: We have full control of the routers in our own AS.
How can we control inbound traffic?
Difficult: We need to influence routing in the provider's AS.
We need BGP on R3, R4. However, solutions currently offered
by BGP are not satisfactory - still anopen issue.
No control of inbound traffic.
Asymmetric routing.
-
8/13/2019 Ars-msr 3 Interdomain 2013
35/48
Octavian Catrina 35
Multi-homing with one provider (2)
AS 10Transit provider
iBGP
AS 456
Customer
R2
R1
iBGP
R4
R3
150.1.0.0/24
11.1.1.111.1.1.2
11.1.2.111.1.2.2
Hot potato
routing
Example: Backup link using BGP Simpler goal: Primary link carries all the traffic. Backup link is
used only when the primary link fails.
Higher local
preference for
default route
via primary link
Control of outbound traffic
Configuration example for outbound traffic: R1/R2 send default routes to R3/R4. R3 and R4
set the local preference of the received default routes such that the primary link has higher
preference than the backup link. They don't need to import other routes. The default route is
then distributed in the customer network by IGP. Variant: on Cisco routers we can configure a
default network instead of default route - one of the networks advertised by the provider.
Primary
link
Backup
link
Traffic can still return on the backup link.
We have to also control inbound traffic.
UPDATE (eBGP)
Prefix = 0.0.0.0/0,
AS_PATH = 10,NEXT_HOP = 11.1.1.1
UPDATE (eBGP)
Prefix = 0.0.0.0/0,
AS_PATH = 10,
NEXT_HOP = 11.1.2.1
-
8/13/2019 Ars-msr 3 Interdomain 2013
36/48
Octavian Catrina 36
Multi-homing with one provider (3)
AS 10
Transit provider
iBGP
AS 456
Customer
Primary
link
Backup
link
R2
R1
iBGP
R3
150.1.0.0/24
11.1.1.111.1.1.2
11.1.2.111.1.2.2
Cold potato
routing
UPDATE (eBGP)
Prefix = 150.1.0.0/24,
AS_PATH = 456,
NEXT_HOP = 11.1.1.2
MED = 100
UPDATE (eBGP)
Prefix = 150.1.0.0/24,AS_PATH = 456,
NEXT_HOP = 11.1.2.2
MED = 200
Control of inbound traffic using the MED attribute MED: Multi-exit discriminator attribute. Optional, non-transitive. Lower MED means better route (MED can be IGP metric).
Applies only to routes received from the same adjacent AS.Does not help if multi-homed with different providers.
R4
Configuration example for inbound traffic: R3 advertises the customer's route with a lower
MED than R4. This indicates that the preferred route for inbound traffic is via R3 (the primary
link). Then, R1 and R2 select the route via R3 as best route to the customer (assuming that
the provider supports and accepts MED).
Example: Using MED for backup link
-
8/13/2019 Ars-msr 3 Interdomain 2013
37/48
Octavian Catrina 37
Community attribute
Community attributes
Versatile mechanism that allows grouping destinations intocommunities and applying to them various policies based on a
Community attribute. The attribute is optional and transitive.
RFC 1997, RFC 4360 (extended communities), etc.
Several standard (well-known) communities NO_EXPORT (0xFFFFFF01): Do not export the route in eBGP
sessions (only use it in within your AS).
NO_ADVERTISE (0xFFFFFF02): Do not advertise the route in
iBGP sessions (only use it in that router).
Delegated communities Operators can define communities for policies of their own
choice. Attribute value is structured into a 16-bit AS number
(local AS), and a 16-bit value for the policy.
-
8/13/2019 Ars-msr 3 Interdomain 2013
38/48
Octavian Catrina 38
Route filtering based on communities
AS 99
Our ISP
R2
AS 10Our transit provider
R1
AS 4567
Our customer
AS 1234
Our customer
R8
R7 R5
R6
R3 R4
AS 88
Our peer
Communities defined by AS 99
for export filters:
99:100Customer prefixes
99:200Transit prefixes
99:300Peer prefixes99:400 Internal prefixes
Received routes are tagged
with these communities when
they are imported.
Import: set 99:100Export: match 99:100,
99:200, 99:300, 99:400
Import: set 99:300Export: match 99:100,
99:400
Import: set 99:200
Export: match 99:100, 99:400
-
8/13/2019 Ars-msr 3 Interdomain 2013
39/48
Octavian Catrina 39
Multi-homing with different providers (1)
AS 10Transit provider
150.1.0.0/16
AS 456
Our AS
150.1.0.0/24
AS 20
Transit provider
160.1.0.0/16
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 456,
NEXT_HOP = 150.1.1.2
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 456,
NEXT_HOP = 160.1.1.2
150.1.1.1
150.1.1.2
160.1.1.1
160.1.1.2
UPDATE
Prefix = 150.1.0.0/16,
AS_PATH = {10 456},
NEXT_HOP = 150.1.2.1
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 20 456,
NEXT_HOP = 160.1.2.1
Can aggregate with lo ss
of p ath info : us e AS_SET.
No aggregat ion
Some Router: BGP routing tablePrefix AS_PATH NEXT_HOP
150.1.0.0/16 ... {10 456} ...
150.1.0.0/24 ... 20 456 ...
Best path, because it is more specific!
Increase of the number of multi-homed
networks leads to less aggregation,hence fast growth of the BGP routing
table, and more instability.
If AS 10 aggregates, routers in the Internet will take the
path via AS 20 because the route is more specific.
Should AS 10 de-aggregate its /16 to balance the traffic?!
R1
Multi-homing and
route aggregation
R2
R3
160.1.2.1
150.1.2.1
-
8/13/2019 Ars-msr 3 Interdomain 2013
40/48
Octavian Catrina 40
Multi-homing with different providers (2)
AS 10
Transit provider
150.1.0.0/16
AS 456
Our AS
150.1.0.0/24
AS 20
Transit provider
160.1.0.0/16
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 456,
NEXT_HOP = 150.1.1.2
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 456,
NEXT_HOP = 160.1.1.2
COMMUNITY = 20:30
150.1.1.1
150.1.1.2
160.1.1.1
160.1.1.2
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 10 456,
NEXT_HOP = ...
R1
Control of inbound traffic
using communitiesExample: Backup link
Communities defined by AS 20
for customers:
20:90 Set LOCAL_PREF = 90
20:70 Set LOCAL_PREF = 70
20:50 Set LOCAL_PREF = 50
20:30 Set LOCAL_PREF = 30
Provider policy for assigning default
LOCAL_PREF to imported routes:
Customer: LOCAL_PREF = 80
Peer: LOCAL_PREF = 60
Transit: LOCAL_PREF = 40
Assume that the providers
use the communities and
policies listed below.
Backup link
AS 20 routes to 456 via AS 10
due to the community attributereceived. It could forward the
community attribute, but this
helps only if it is recognized by
its peers and providers.
UPDATE
Prefix = 150.1.0.0/24,
AS_PATH = 20 10 456,
NEXT_HOP = ...
R2
R3
AS 10 uses primary link to 456
because it prefers a customer
route to peer and provider routes.
No aggregation of AS 456 prefix suchthat other ASes do not route via AS 20.
Caveat: De-aggregation violates CIDR
principles and increases BGP routing
table, which will become unmanageable.
S G
-
8/13/2019 Ars-msr 3 Interdomain 2013
41/48
Octavian Catrina 41
Scaling up iBGP
How to scale up iBGP to a large domain?
Routes learned from an iBGP session cannot be advertised oniBGP sessions, to avoid routing loops.
Hence we must set up a full mesh of iBGP speakers.
For N routers, each router must establish N-1 sessions, for a
total of N(N-1)/2 sessions. Does not scale up. ASes can have
100s of routers or more.
Solution 1: Confederations
Divide a large AS in smaller sub-domains.
Use an iBGP full mesh inside each sub-domain, and eBGP
between sub-domains.
Solution 2: Route reflectors
Establish one or more routers, called route reflectors, that
forward the iBGP routes to a cluster of other routers.
R t fl t (1)
-
8/13/2019 Ars-msr 3 Interdomain 2013
42/48
Octavian Catrina 42
Route reflectors (1)
From iBGP full mesh to iBGP with route reflector
iBGP
eBGPR2
R3
R1
iBGP
iBGP
eBGP
Full mesh iBGP iBGP with route reflector (RR)
iBGP
eBGPR2
R3
R1
iBGP
eBGP
RR
Interactions between a router reflector and its clients
R2 and R3 are configured as clients of the route reflector R1.Clients establish iBGP sessions only with their route reflector.
When a route reflector receives a route update from its clients
or from eBGP, it computes the best route and then advertises it
to its clients.
eBGPeBGP
R t fl t (2)
-
8/13/2019 Ars-msr 3 Interdomain 2013
43/48
Octavian Catrina 43
Route reflectors (2)
A more general configuration: route reflector and mesh
R1 = Route reflector (RR).
R2, R3, R4 = RR clients.
R5, R6 = non-clients.
R5 and R6 are not clients of the route reflector R1. They
establish together with R1 a full mesh of iBGP sessions. When R1 receives a route update from a client, it computes
the best path and advertises it to its clients and to non-clients.
When R1 receives a route update from a non-client, it
computes the best path and advertises it to its clients.
R5
R6
R2
R4
RR
R3
Full mesh Route reflector
R1
eBGP
iBGP
iBGP
eBGP
iBGP
eBGP
eBGP
eBGPiBGP
iBGP
iBGP
BGP
-
8/13/2019 Ars-msr 3 Interdomain 2013
44/48
Octavian Catrina 44
BGP convergence
Convergence issues
BGP cannot guarantee that a unique stable routing exists forany set of policies.
Moreover, BGP cannot guarantee that any stable routing
exists or that it can be found.
Various scenarios with non-deterministic outcome or persistent
oscillation have been presented over time.
Causes Locally defined policies may conflict globally.
Unconstrained policy languages.
However, outages are caused by configuration errors ...
Convergence speed Improved by the use of triggered, incremental updates.
But with current Internet size & complexity, path vector routing
is quite slow (e.g., for unreachable destination).
St bilit f i t d i ti
-
8/13/2019 Ars-msr 3 Interdomain 2013
45/48
Octavian Catrina 45
Stability of inter-domain routing
Challenges
Inter-domain routing in the global Internet relies on a hugedistributed system of over 100K BGP routers.
A router reconfiguration/restart, BGP configuration error, or
local instability (link, BGP session, etc.) can flood with updates
all the BGP routers in the Internet.
This causes routing instability and disruption of Internet traffic.
BGP traffic analysis
Most routes do not change frequently. Most of the Internet
traffic uses these routes.
A small fraction of the routes are responsible for most of theBGP messages exchanged.
Approach
Penalize the unstable routes such that to preserve the more
stable ones(and reduce Internet traffic disruption).
-
8/13/2019 Ars-msr 3 Interdomain 2013
46/48
R t fl d i l
-
8/13/2019 Ars-msr 3 Interdomain 2013
47/48
Octavian Catrina 47
Route flap dampening example
S
-
8/13/2019 Ars-msr 3 Interdomain 2013
48/48
Summary
"BGP was not designed. It evolved ..."
BGP remains a quite simple protocol, and there are manyrobust and efficient implementations.
It has managed fairly well over the last 15 years to cope with
the fast growth and increasing complexity of the Internet.
BGP configuration for large domains is quite complicated and
error prone (especially policies).
Challenges
BGP security (so far, only TCP MAC with static keys).
Multi-homing and BGP "traffic engineering". How to do them
without breaking CIDR and exploding the BGP routing tables? Policy-based routing.
BGP next generation?