ars-msr 3 interdomain 2013

Upload: gabyyy09

Post on 04-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    1/48

    Routing protocols

    Inter-domain routing

    Architectures for Networks

    and Services

    Arhitecturi pentru retele si servicii (ARS)

    Managementul serviciilor si retelelor (MSR)

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    2/48

    Octavian Catrina 2

    Routing in the Internet (review)

    Routing and administrative boundaries

    The Internet is a federation of autonomous IP networks, owned

    and operated by different organizations.

    Each organization manages autonomously:

    the connectivity and routing within its own network;

    the policies for interconnection with other networks.

    Internet routing must take into account these administrative

    boundaries.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    3/48

    Octavian Catrina 3

    Autonomous systems (AS)

    An AS is a connected group of networks under a single technical

    administration, with consistent interior routing, and unified exteriorrouting policy.

    An AS is treated as a single entityby the rest of the Internet. It is

    also called (autonomous) routing domain.

    Autonomous systems are distinguished byAS numbers.

    16-bit numbers initially; 32-bit numbers since RFC 4893, 2007. A routing domain needs a unique AS number only if it

    participates in inter-domain routing, with distinct routing policies.

    The range 64512 - 65535 is reserved for private use.

    AS 100 AS 500

    ISP-1 ISP-2

    Single-homed stub network.

    Extension of ISP-1 network.

    Does not need its own AS

    number.

    Multi-homed stub network.

    Needs an AS number for

    typical routing policies

    related to multi-homing.AS 7000

    Transit networks.

    Need an AS number.

    S1 S2

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    4/48

    Octavian Catrina 4

    How many Autonomous Systems?

    http://www.potaroo.net, Oct. 2012

    Evolution of the number of assigned AS numbers(found in BGP routing tables)

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    5/48

    Octavian Catrina 5

    Intra-domain vs. inter-domain routing

    Inter-domain routing

    Based on AS interconnection

    topology and routing policies.

    Designated pairs of AS border

    routers exchange routing info.

    Focus on scalability, route

    stability, convergence.

    Exterior routing (or gateway)

    protocols (EGP): BGP4.

    Intra-domain routing

    Based on IP network topology

    and performance metrics.

    All routers in the AS participate

    in route finding and selection.

    Focus on efficiency and

    performance (QoS).

    Interior routing (or gateway)

    protocols (IGP): RIP, OSPF, ...

    IGP EGP

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    6/48

    Octavian Catrina 6

    Internet structure

    Peer Relationship (shared-cost) Shortcuts

    Customer-Provider Relationship (transit) Hierarchical structure

    Tier-1 ISPs.

    Very large,

    global networks

    Transit

    domains

    Stub

    domains

    Tier-2 ISPs.National or

    regional ISPs.

    Company

    networks,small ISPs,

    ...

    Examples of paths

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    7/48

    Map of the Internet

    Octavian Catrina 7

    http://www.technologyreview.com

    Three distinct regions are apparent: an

    inner core of highly connected nodes, an

    outer periphery of isolated networks, and amantle-like mass of peer-connected nodes.

    Bigger the node, more connections it has.

    The picture shows the Internet as

    consisting of a dense core of 80 or so

    critical nodes surrounded by an outer shell

    of 5,000 sparsely connected, isolated

    nodes that are very much dependent upon

    this core. Separating the core from the

    outer shell are approximately 15,000 peer-

    connected and self-sufficient nodes.

    Take away the core, and an interesting

    thing happens: about 30 percent of the

    nodes from the outer shell become

    completely cut off. But the remaining 70

    percent can continue communicatingbecause the middle region has enough

    peer-connected nodes to bypass the core.

    With the core connected, any node is able

    to communicate with any other node

    within about 4 links. If the core is removed,

    it takes about 7 or 8 links.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    8/48 Octavian Catrina 8

    Types of routing domains

    Transit domain

    Allows other routing domains to use its network infrastructure

    to communicate with each-other (ISPs, typically).

    Stub domain

    Does not allow the use of its network infrastructure for traffic

    between other routing domains.

    A stub domain is connected to at least one transit domain.

    Single-homed: 1 transit domain. Dual-homed: 2 transit domains.

    Can also be connected to other stub domains.

    Different traffic patterns: content rich; access rich.

    S2S3

    S4

    S5T1 T3

    T2

    S1T1 - T3: Transit domains.

    S1 - S5: Stub domains.S1, S3: Multi-homed stub

    domains.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    9/48 Octavian Catrina 9

    Inter-domain relationships

    Customer-provider relationship

    Provider offers transit for traffic between the customer and all

    prefixes it can reach or part of them. Customer pays provider.

    Peering relationship

    Two ASes interconnect their networks such that to allow traffic

    between their customers and between their internal prefixes. Typically established by ISPs of similar sizes, without payment.

    Peering provides shortcuts and reduces transit costs.

    AS1

    AS2

    AS3 AS5

    AS7

    Peer

    Customer

    AS6

    AS4

    Contribution to delivery costs (implicit)

    Provider

    Peer

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    10/48 Octavian Catrina 10

    Physical ISP interconnection

    Internet exchange

    More efficient solution: rendezvouspoint for many ISPs hub and

    spokes physical topology.

    Run by an independent organization.

    A typical Internet exchange offers:

    Collocation for ISP routers running

    BGP.

    High speed switched Ethernet LAN

    (1-10 Gbps links) for the

    interconnection of the ISP routers.

    Dedicated links

    Set up between 2 ISPs that agree to

    interconnect their networksmesh

    topology for AS interconnection.

    AS2

    AS4

    AS3

    AS5

    Internet Exchange

    Dedicated

    links

    AS1

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    11/48 Octavian Catrina 11

    Border Gateway Protocol (BGP)

    BGP: The inter-domain routing protocol of the Internet

    "BGP is running on more than 100K routers (my estimate),

    making it one of worlds largest and most visible distributed

    systems". T. Griffin, 2002.

    BGP evolution("BGP was not designed, it evolved")

    1989-1995: The beginnings.1989: BGP-1 (RFC 1105) replaces EGP.

    1990, 1991: BGP-2 (RFC 1163). BGP-3 (RFC 1267).

    1995: BGP-4 (RFC 1771).

    Uses CIDR, the new IP addressing & routing scheme. Deployed

    all over the Internet during the following years. So far, it scaled up quite well with the fast growth of the Internet.

    2006: Updated BGP-4 specification, RFC 4271.

    Tens of other RFCs describe various BGP-4 extensions

    introduced in the mean time.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    12/48 Octavian Catrina 12

    Active BGP routes (2004)

    http://www.potaroo.net, Oct. 2004

    Fast exponential

    growth before CIDR

    CIDR is deployed,

    route aggregation limits

    routing table growth

    ISPs improve

    BGP policies

    Route aggregation

    does not work anymore

    Route aggregation

    problems again!

    About 180000 routes

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    13/48

    Octavian Catrina 13

    Active BGP routes (2012)

    http://www.potaroo.net, Oct. 2012

    About 430000 routes

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    14/48

    Octavian Catrina 14

    Outline of BGP operation

    Path vector routing protocol Similar to distance vector protocols.

    BGP sessions

    Routers establish a BGP sessionon a TCP

    connection, exchange the current routes in the

    BGP Routing Information Base (RIB), and thensend incremental updatesof RIB changes.

    Feasible

    routes.

    Withdrawn

    routes.

    BGP

    session

    AS1

    R1

    AS2

    R2

    BGP routing information (UPDATE message)

    Feasible routes: Address prefixes reachable via the sender's

    ASand an associated set ofpath attributes.

    Withdrawn (infeasible) routes: Address prefixes of previously

    announced routes that are not reachable anymore via the

    sender's AS.

    An UPDATE message can indicate both feasible routes and

    infeasible routes.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    15/48

    Octavian Catrina 15

    BGP path attributes

    Path attributes

    BGP selects a feasible "best" route to a destination based on

    information provided in a set of attributesassociated to each

    path, including: loop detection, next hop, local preference, etc.

    Well-known attributes

    Must be recognized by all compliant BGP implementations.Are propagated to other neighbors.

    Well-known mandatory: Must be specified for all paths.

    Well-known discretionary: May be omitted for some paths.

    Optional attributes

    Not required and not expected to be recognized by all BGPimplementations. If recognized they may be propagated to

    other BGP routers, according to their meaning.

    Optional transitive: If not recognized, may be propagated.

    Optional non-transitive: If not recognized, must be discarded.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    16/48

    Octavian Catrina 16

    Main BGP path attributes

    Attribute Category Description

    ORIGIN well-knownmandatory

    Origin of the path information: 0 = intra-AS (IGP);1 = EGP; 2 = incomplete (other means). Used to

    assess the trust in the source of routing info.

    AS_PATH well-known

    mandatory

    List of all the ASes on the path: AS_SEQUENCE,

    AS_SET (AS_SET is needed for route aggregation).

    Loop detection, in/out filtering, best path selection.

    NEXT_HOP well-knownmandatory

    Address of the router to be used as next hop on thepath to the specified prefix.

    MULTI_EXIT_DISC

    (MED)

    optional non-

    transitive

    Value used in best path selection to discriminate

    among multiple entry points of a neighboring AS.

    LOCAL_PREF well-known Value of the degree of preference for this route.

    Only used inside an AS (iBGP session, not eBGP).

    COMMUNITY

    (RFC 1997)

    optional

    transitive

    32-bit tag that can be associated to any group of

    prefixes in order to specify various policies (can be

    structured as 16-bit AS number and 16-bit value).

    Standard (e.g., NO_EXPORT) and custom policies.

    Often used to control inbound traffic.

    . . .

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    17/48

    BGP extensions

    Multiprotocol BGP (MP-BGP)

    Extends BGP in order to carry routing information for other

    protocols besides IPv4 (e.g., IPv6, VPN-IPv4, VPN-IPv6).

    RFC 4760, 2007, Multiprotocol Extensions for BGP-4.

    BGP Extended Communities Attribute

    Extends the range of values and adds structure (Type field). RFC 4360, 2006, BGP Extended Communities Attribute.

    Octavian Catrina 17

    Attribute Category Description

    MP_REACH_NLRI

    MP_UNREACH_NLRI

    optional non-

    transitive

    Multiprotocol Reachable NLRI. Used to advertise MP route.

    Multiprotocol Unreachable NLRI. Used to withdraw MP route.

    Two fields are used to describe the kind of route: AddressFamily Identifier (AFI), e.g., IPv4, IPv6, vpnv4 (VPN-IPv4),

    vpnv6 (VPN-IPv6). Subsequent Address Family Identifier

    (SAFI), e.g. unicast, multicast.

    Extended Communities optional

    transitive

    Extension of Community attribute8 octets, out of which 1-2

    octets for type field. E.g., Route Target (RT) Community used

    for MPLS-based L3 VPNs.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    18/48

    Octavian Catrina 18

    BGP messages

    BGP message Description

    OPEN Used to establish a BGP session between two routers (includes

    mutual identification and optional capabilities negotiation). Each

    router sends an OPEN message, which is acknowledged (if

    accepted) by a KEEPALIVE message.

    KEEPALIVE Used to verify the liveness of the BGP session. A KEEPALIVE

    message is sent if no other message was transmitted duringHoldTime/3 seconds. A BGP session is canceled if no message

    was received during HoldTime seconds.

    UPDATE Used to exchange routing information on a BGP session. Can

    carry a set of feasible routes with common attribute-set (prefixes

    and attribute-set), and/or a set of withdrawn routes (prefixes).

    NOTIFICATION Used to inform the other BGP router of an error. After sending or

    receiving this message a BGP session is closed.

    ROUTE_REFRESH Message added in 2000 to enable a graceful restart of a BGP

    session (e.g., when policies are reconfigured).

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    19/48

    Octavian Catrina 19

    BGP session overview

    AS 100

    R1

    AS 200

    R2

    TCP connection establishment (to port 179)

    OPEN (Version = 4, My-AS = 100, Hold-Time = 180,

    BGP-Identifier = 10.1.3.1, Optional-Param.)

    BGP session establishment

    BGP BGP

    Session

    established

    10.1.3.1 10.1.3.2

    OPEN (Version = 4, My-AS = 200, Hold-Time = 180,

    BGP-Identifier = 10.1.3.2, Optional-Param.)

    Session

    establishedExchange of current routes in RIB

    KEEPALIVE

    KEEPALIVE

    UPDATE (Path-Attributes-Len = ..., Path-Attributes = ...,

    NLRI = 140.1.128.0/18, 160.2.0.0/16)

    140.1.128.0/18160.2.0.0/16 190.3.192.0/18170.4.0.0/16

    . . .

    UPDATE (Path-Attributes-Len = ..., Path-Attributes = ...,

    NLRI = 190.3.192.0/18, 170.4.0.0/16)

    Incremental updates (RIB changes) using UPDATE:

    - Announce new feasible routes.

    - Withdraw infeasible routes.

    Periodic KEEPALIVE messages (e.g., at Hold-Time/3)

    NLRI = Network Layer

    Reachability Information

    = IP address prefix

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    20/48

    Example: BGP session setup

    Octavian Catrina 20

    OPEN Messageand main parameters

    KEEPALIVE Message

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    21/48

    Example: BGP updates (1)

    Octavian Catrina 21

    Prefix of the BGP route: 10.20.0.0/16

    Attributes of the BGP route

    to the prefix specified below

    UPDATE Message advertising a

    feasible rout eto 10.20.0.0/16

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    22/48

    Example: BGP updates (2)

    Octavian Catrina 22

    Prefix of the withdrawn route:

    10.20.0.0/16

    UPDATE Message announcing a

    withdrawn ( infeasib le) routeto 10.20.0.0/16

    A BGP UPDATE message can advertise feasible routes and also announce withdrawn routes

    (previously advertised routes which are no longer feasible).

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    23/48

    Octavian Catrina 23

    BGP path vector routing (1)

    AS 10

    R1

    AS 20

    R2

    BGP session setup

    11.1.1.1

    11.1.1.2

    140.1.0.0/16150.1.0.0/16

    AS 30

    R3160.1.0.0/1611.1.2.1

    11.1.2.2

    UPDATE (Prefix = 140.1.0.0/16,

    AS_PATH = 10, NEXT_HOP = 11.1.1.1)

    Router R1: BGP routing table

    Prefix AS_PATH NEXT_HOP

    140.1.0.0/16 internal . . .

    Router R2: BGP routing table

    Prefix AS_PATH NEXT_HOP

    150.1.0.0/16 internal . . .

    Router R3: BGP routing table

    Prefix AS_PATH NEXT_HOP

    160.1.0.0/16 internal . . .

    UPDATE (Prefix = 150.1.0.0/16,

    AS_PATH = 20, NEXT_HOP = 11.1.1.2)

    Router R1: BGP routing table

    Prefix AS_PATH NEXT_HOP

    140.1.0.0/16 internal . . .

    150.1.0.0/16 20 11.1.1.2

    Router R2: BGP routing table

    Prefix AS_PATH NEXT_HOP

    150.1.0.0/16 internal . . .

    140.1.0.0/16 10 11.1.1.1

    TheAS_PATH attributelists the ASes on the path. It is used for loop

    detection, for policy-based routing, and as basic path metric (AS count).

    The NEXT_HOP attributeindicates the IP address of the router used to

    reach the prefix.

    R1 and R2 exchange the

    current routes in their tables.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    24/48

    Octavian Catrina 24

    BGP path vector routing (2)

    AS 10

    R1

    AS 20

    R2

    11.1.1.1

    11.1.1.2

    140.1.0.0/16150.1.0.0/16

    AS 30

    R3160.1.0.0/1611.1.2.1

    11.1.2.2

    Router R3: BGP routing table

    Prefix AS_PATH NEXT_HOP

    160.1.0.0/16 internal . . .

    Router R1: BGP routing table

    Prefix AS_PATH NEXT_HOP

    140.1.0.0/16 internal . . .

    150.1.0.0/16 20 11.1.1.2

    Router R2: BGP routing table

    Prefix AS_PATH NEXT_HOP

    150.1.0.0/16 internal . . .

    140.1.0.0/16 10 11.1.1.1

    BGP session setup

    UPDATE (Prefix = 150.1.0.0/16,

    AS_PATH = 20, NEXT_HOP = 11.1.2.1)

    UPDATE (Prefix = 160.1.0.0/16,AS_PATH = 30, NEXT_HOP = 11.1.2.2)

    UPDATE (Prefix = 140.1.0.0/16,

    AS_PATH = 20 10, NEXT_HOP = 11.1.2.1)

    Router R3: BGP routing table

    Prefix AS_PATH NEXT_HOP

    160.1.0.0/16 internal . . .

    150.1.0.0/16 20 11.1.2.1

    140.1.0.0/16 20 10 11.1.2.1

    Router R2: BGP routing table

    Prefix AS_PATH NEXT_HOP

    150.1.0.0/16 internal . . .

    140.1.0.0/16 10 11.1.1.1

    160.1.0.0/16 30 11.1.2.2

    We assum e that AS 20 is the pr ovider, and

    AS 10 and AS 30 are its cu stom ers. AS 20

    provid es transit for its customers, hence it

    announc es the routes to them.

    R2 and R3 exchange the

    current routes in their tables.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    25/48

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    26/48

    Octavian Catrina 26

    BGP policy-based routing

    Route import policies:For each BGP peer

    - Import filter selects the received

    routes accepted from that peer.

    - Path attribute manipulation

    influences best route selection bythis BGP router.

    Control of outbound traffic:

    Imported route participates in the

    selection of the best path toward

    that prefix.

    Route export policies:For each BGP peer

    - Export filter selects the routes

    announced to that peer.

    - Path attribute manipulation

    influences the best route selectionby that peer.

    Control of inbound traffic:

    Announcing a route enables the

    other AS to route traffic to that

    prefix via this AS.

    Received BGProute updates

    Sent BGProute updates

    Control outbound

    traffic

    BGP router

    Control inbound

    traffic

    Best route

    table

    + Importpolicies

    + Exportpolicies

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    27/48

    Octavian Catrina 27

    Example: Route filtering

    Peer

    Customer Provider

    Peer

    AS1

    AS2

    Tier-2 ISP

    AS3

    Tier-1 ISP

    AS5

    Tier-1 ISP

    AS7AS6

    All

    routesCustomer (AS1) &

    internal routes (AS2)

    AS4

    Tier-2 ISP

    Customer (AS1) &

    internal routes (AS2)

    BGP not

    necessary

    Customer (AS6) &

    internal routes (AS4)

    All routes

    All routes

    All

    routes

    Customer (AS6) &

    internal routes (AS4)

    BGP not

    necessary

    AS6

    routes

    AS6

    routesDefault route

    Static route Default route

    Static route

    Note: Assume that AS2, AS3, AS4, AS5 have many other interconnections, not shown

    in this picture. In particular, AS2 and AS4 are also connected to other Tier-1 and Tier-2

    ISPs, and have many other customers.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    28/48

    Octavian Catrina 28

    BGP route processing

    Receive

    BGPupdates

    Apply import

    policies

    Import filter

    and attributemanipulation

    Best route

    selection

    Apply export

    policies

    BGP best

    route table

    Send

    BGPupdates

    Select one best route

    to each destinationbased on attributes

    Export filter

    and attributemanipulation

    IP routing

    table

    Install routesfor IP forwarding

    IGP routes

    Dir. con. net.

    Static routes

    BGP Loc-RIB

    Policies are typically defined using vendor specific languages. Route filtering can be done based onprefix, AS number, attributes.

    Import and export of internal routes is also controlled by policies.

    The interactions between BGP and internal routing (IGP, static) are

    not precisely defined.

    BGP Adj-RIB-In BGP Adj-RIB-Out

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    29/48

    Octavian Catrina 29

    Best route selection

    Overview The algorithm operates on the set of routes in Adj-RIBs-In after

    applying import policies (filter, tweak attributes, assign local

    preference). It selects one best route for each destination.

    Sequence of selection steps for each destination

    Discard all the routes for which NEXT_HOP is inaccessible.

    Discard all the routes for which AS_PATH contains the local AS or a loop.

    Let n be the number of routes left.

    If n>1, select the routes with the largest LOCAL_PREF.

    If n>1, select the routes with the smallest number of ASes in AS_PATH.

    If n>1, select the routes with the lowest ORIGIN (IGP1 and a route learned from eBGP, discard routes learned from iBGP.

    If n>1, select the routes with shortest path (local metric) to NEXT_HOP.

    If n>1, select the route from the BGP speaker with the lowest BGP Id.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    30/48

    Octavian Catrina 30

    Route propagation in an AS

    AS 10

    AS 20

    R3

    11.1.1.1

    11.1.1.2 AS 30

    11.1.2.1

    11.1.2.2R1

    R2

    140.1.0.0/16

    150.1.0.0/16

    150.2.0.0/16

    IGP?BGP

    BGP

    160.1.0.0/16R4

    10.0.0.110.0.0.2

    Example

    AS 20 has two border routers, R2 and R3.

    How can R2 advertise to R3 the routes learned from R1?

    How can R3 advertise to R2 the routes learned from R4?

    AS 20 runs an IGP for interior routing.

    Could this IGP distribute the BGP routes as well?

    No. The IGP cannot carry the BGP path attributes and does not

    scale up to BGP routing table size.

    Solution: Propagate BGP routes within an AS using BGP.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    31/48

    Octavian Catrina 31

    External vs. internal BGP

    AS 10

    AS 20

    R3

    11.1.1.1

    11.1.1.2 AS 30

    11.1.2.1

    11.1.2.2R1

    R2

    140.1.0.0/16

    150.1.0.0/16

    150.2.0.0/16

    iBGPeBGP

    eBGP

    160.1.0.0/16R4

    10.0.0.110.0.0.2

    External BGP (eBGP): BGP routers in different ASs

    Internal (iBGP): BGP routers in the same AS

    eBGP session

    Directly connected routers (usually).

    Do not use LOCAL_PREF.

    Update AS_PATH, NEXT_HOPattributes in advertised paths.

    Advertise the best route to each

    destination. Usually, an export filter

    for each session.

    iBGP session

    Need not directly connect the routers.

    Use LOCAL_PREF attribute.

    No change to AS_PATH andNEXT_HOP attributes.

    Advertise only best routes learned

    from eBGP (not from iBGP, to avoid

    loops). Usually, no export filter.

    Full mesh of iBGP sessions.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    32/48

    Octavian Catrina 32

    Example: eBGP + iBGP + IGP

    AS 10

    AS 20

    R3

    11.1.1.1

    11.1.1.2 AS 30

    11.1.2.1

    11.1.2.2R1

    R2140.1.0.0/16

    150.1.0.0/16

    150.2.0.0/16

    iBGPeBGP

    eBGP

    160.1.0.0/16R4

    UPDATE (eBGP)

    Prefix = 140.1.0.0/16,

    AS_PATH = 10,

    NEXT_HOP = 11.1.1.1

    10.0.0.1 10.0.0.2

    UPDATE (iBGP)

    Prefix = 140.1.0.0/16,

    AS_PATH = 10,NEXT_HOP = 11.1.1.1,

    LOCAL_PREF = 100

    UPDATE (eBGP)

    Prefix = 140.1.0.0/16,

    AS_PATH = 20 10,

    NEXT_HOP = 11.1.2.1 Note how the

    path attributeschange during

    route propagation

    Router R2: BGP routing table

    Prefix AS_PATH NEXT_HOP

    140.1.0.0/16 10 11.1.1.1

    Router R2: IP routing table

    P Prefix INTF NEXT_HOP

    BGP 140.1.0.0/16 E 11.1.1.1

    - 11.1.1.0/30 E direct

    - 10.0.0.0/30 S direct

    - 150.1.0.0/16 W direct

    IGP 150.2.0.0/16 S 10.0.0.2

    IGP 11.1.2.0/30 S 10.0.0.2

    Router R3: BGP routing table

    Prefix AS_PATH NEXT_HOP

    140.1.0.0/16 10 11.1.1.1

    Router R3: IP routing table

    P Prefix INTF NEXT_HOP

    BGP 140.1.0.0/16 N 10.0.0.1

    - 11.1.2.0/30 W direct

    - 10.0.0.0/30 N direct

    - 150.2.0.0/16 E direct

    IGP 150.1.0.0/16 N 10.0.0.1

    IGP 11.1.1.0/30 N 10.0.0.1

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    33/48

    Octavian Catrina 33

    Which is the right path?

    Hot potato routing

    Closest exit routing (get ridof it as soon as possible).

    "Natural" BGP behavior.

    Cold potato routing

    Best exit routing (closest todestination). Must use MED

    or COMMUNITY attributes.

    SF NY

    Customer

    Provider (ISP)

    Hot

    potato!

    Hot

    potato!

    Example: Dual-homed customer

    Asymmetric routes. If the customer downloads only,

    then its own network carries most of the traffic!

    SF NY

    Customer

    Provider (ISP)

    Hot

    potato

    Cold

    potato

    Symmetric routes. The provider carries most

    of the traffic in this case.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    34/48

    Octavian Catrina 34

    Multi-homing with one provider (1)

    Example: Dual-homing without BGP

    Configuration example: The customer can configure default routes on R3 (via R1) and on R4

    (via R2) and distribute them internally using IGP; outbound packets follow the IGP shortest

    paths and exit on either R3 or R4. The provider can configure static routes to the customer's

    prefix on R1 (via R3) and R2 (via R1) and distribute them using BGP (possibly also IGP).

    AS 10

    Transit providerCustomer

    R2

    R1

    R4

    R3

    150.1.0.0/24

    11.1.1.111.1.1.2

    11.1.2.111.1.2.2

    Hot potato

    routing

    Hot potato

    routing

    How can we control outbound traffic?

    Easier: We have full control of the routers in our own AS.

    How can we control inbound traffic?

    Difficult: We need to influence routing in the provider's AS.

    We need BGP on R3, R4. However, solutions currently offered

    by BGP are not satisfactory - still anopen issue.

    No control of inbound traffic.

    Asymmetric routing.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    35/48

    Octavian Catrina 35

    Multi-homing with one provider (2)

    AS 10Transit provider

    iBGP

    AS 456

    Customer

    R2

    R1

    iBGP

    R4

    R3

    150.1.0.0/24

    11.1.1.111.1.1.2

    11.1.2.111.1.2.2

    Hot potato

    routing

    Example: Backup link using BGP Simpler goal: Primary link carries all the traffic. Backup link is

    used only when the primary link fails.

    Higher local

    preference for

    default route

    via primary link

    Control of outbound traffic

    Configuration example for outbound traffic: R1/R2 send default routes to R3/R4. R3 and R4

    set the local preference of the received default routes such that the primary link has higher

    preference than the backup link. They don't need to import other routes. The default route is

    then distributed in the customer network by IGP. Variant: on Cisco routers we can configure a

    default network instead of default route - one of the networks advertised by the provider.

    Primary

    link

    Backup

    link

    Traffic can still return on the backup link.

    We have to also control inbound traffic.

    UPDATE (eBGP)

    Prefix = 0.0.0.0/0,

    AS_PATH = 10,NEXT_HOP = 11.1.1.1

    UPDATE (eBGP)

    Prefix = 0.0.0.0/0,

    AS_PATH = 10,

    NEXT_HOP = 11.1.2.1

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    36/48

    Octavian Catrina 36

    Multi-homing with one provider (3)

    AS 10

    Transit provider

    iBGP

    AS 456

    Customer

    Primary

    link

    Backup

    link

    R2

    R1

    iBGP

    R3

    150.1.0.0/24

    11.1.1.111.1.1.2

    11.1.2.111.1.2.2

    Cold potato

    routing

    UPDATE (eBGP)

    Prefix = 150.1.0.0/24,

    AS_PATH = 456,

    NEXT_HOP = 11.1.1.2

    MED = 100

    UPDATE (eBGP)

    Prefix = 150.1.0.0/24,AS_PATH = 456,

    NEXT_HOP = 11.1.2.2

    MED = 200

    Control of inbound traffic using the MED attribute MED: Multi-exit discriminator attribute. Optional, non-transitive. Lower MED means better route (MED can be IGP metric).

    Applies only to routes received from the same adjacent AS.Does not help if multi-homed with different providers.

    R4

    Configuration example for inbound traffic: R3 advertises the customer's route with a lower

    MED than R4. This indicates that the preferred route for inbound traffic is via R3 (the primary

    link). Then, R1 and R2 select the route via R3 as best route to the customer (assuming that

    the provider supports and accepts MED).

    Example: Using MED for backup link

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    37/48

    Octavian Catrina 37

    Community attribute

    Community attributes

    Versatile mechanism that allows grouping destinations intocommunities and applying to them various policies based on a

    Community attribute. The attribute is optional and transitive.

    RFC 1997, RFC 4360 (extended communities), etc.

    Several standard (well-known) communities NO_EXPORT (0xFFFFFF01): Do not export the route in eBGP

    sessions (only use it in within your AS).

    NO_ADVERTISE (0xFFFFFF02): Do not advertise the route in

    iBGP sessions (only use it in that router).

    Delegated communities Operators can define communities for policies of their own

    choice. Attribute value is structured into a 16-bit AS number

    (local AS), and a 16-bit value for the policy.

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    38/48

    Octavian Catrina 38

    Route filtering based on communities

    AS 99

    Our ISP

    R2

    AS 10Our transit provider

    R1

    AS 4567

    Our customer

    AS 1234

    Our customer

    R8

    R7 R5

    R6

    R3 R4

    AS 88

    Our peer

    Communities defined by AS 99

    for export filters:

    99:100Customer prefixes

    99:200Transit prefixes

    99:300Peer prefixes99:400 Internal prefixes

    Received routes are tagged

    with these communities when

    they are imported.

    Import: set 99:100Export: match 99:100,

    99:200, 99:300, 99:400

    Import: set 99:300Export: match 99:100,

    99:400

    Import: set 99:200

    Export: match 99:100, 99:400

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    39/48

    Octavian Catrina 39

    Multi-homing with different providers (1)

    AS 10Transit provider

    150.1.0.0/16

    AS 456

    Our AS

    150.1.0.0/24

    AS 20

    Transit provider

    160.1.0.0/16

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 456,

    NEXT_HOP = 150.1.1.2

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 456,

    NEXT_HOP = 160.1.1.2

    150.1.1.1

    150.1.1.2

    160.1.1.1

    160.1.1.2

    UPDATE

    Prefix = 150.1.0.0/16,

    AS_PATH = {10 456},

    NEXT_HOP = 150.1.2.1

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 20 456,

    NEXT_HOP = 160.1.2.1

    Can aggregate with lo ss

    of p ath info : us e AS_SET.

    No aggregat ion

    Some Router: BGP routing tablePrefix AS_PATH NEXT_HOP

    150.1.0.0/16 ... {10 456} ...

    150.1.0.0/24 ... 20 456 ...

    Best path, because it is more specific!

    Increase of the number of multi-homed

    networks leads to less aggregation,hence fast growth of the BGP routing

    table, and more instability.

    If AS 10 aggregates, routers in the Internet will take the

    path via AS 20 because the route is more specific.

    Should AS 10 de-aggregate its /16 to balance the traffic?!

    R1

    Multi-homing and

    route aggregation

    R2

    R3

    160.1.2.1

    150.1.2.1

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    40/48

    Octavian Catrina 40

    Multi-homing with different providers (2)

    AS 10

    Transit provider

    150.1.0.0/16

    AS 456

    Our AS

    150.1.0.0/24

    AS 20

    Transit provider

    160.1.0.0/16

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 456,

    NEXT_HOP = 150.1.1.2

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 456,

    NEXT_HOP = 160.1.1.2

    COMMUNITY = 20:30

    150.1.1.1

    150.1.1.2

    160.1.1.1

    160.1.1.2

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 10 456,

    NEXT_HOP = ...

    R1

    Control of inbound traffic

    using communitiesExample: Backup link

    Communities defined by AS 20

    for customers:

    20:90 Set LOCAL_PREF = 90

    20:70 Set LOCAL_PREF = 70

    20:50 Set LOCAL_PREF = 50

    20:30 Set LOCAL_PREF = 30

    Provider policy for assigning default

    LOCAL_PREF to imported routes:

    Customer: LOCAL_PREF = 80

    Peer: LOCAL_PREF = 60

    Transit: LOCAL_PREF = 40

    Assume that the providers

    use the communities and

    policies listed below.

    Backup link

    AS 20 routes to 456 via AS 10

    due to the community attributereceived. It could forward the

    community attribute, but this

    helps only if it is recognized by

    its peers and providers.

    UPDATE

    Prefix = 150.1.0.0/24,

    AS_PATH = 20 10 456,

    NEXT_HOP = ...

    R2

    R3

    AS 10 uses primary link to 456

    because it prefers a customer

    route to peer and provider routes.

    No aggregation of AS 456 prefix suchthat other ASes do not route via AS 20.

    Caveat: De-aggregation violates CIDR

    principles and increases BGP routing

    table, which will become unmanageable.

    S G

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    41/48

    Octavian Catrina 41

    Scaling up iBGP

    How to scale up iBGP to a large domain?

    Routes learned from an iBGP session cannot be advertised oniBGP sessions, to avoid routing loops.

    Hence we must set up a full mesh of iBGP speakers.

    For N routers, each router must establish N-1 sessions, for a

    total of N(N-1)/2 sessions. Does not scale up. ASes can have

    100s of routers or more.

    Solution 1: Confederations

    Divide a large AS in smaller sub-domains.

    Use an iBGP full mesh inside each sub-domain, and eBGP

    between sub-domains.

    Solution 2: Route reflectors

    Establish one or more routers, called route reflectors, that

    forward the iBGP routes to a cluster of other routers.

    R t fl t (1)

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    42/48

    Octavian Catrina 42

    Route reflectors (1)

    From iBGP full mesh to iBGP with route reflector

    iBGP

    eBGPR2

    R3

    R1

    iBGP

    iBGP

    eBGP

    Full mesh iBGP iBGP with route reflector (RR)

    iBGP

    eBGPR2

    R3

    R1

    iBGP

    eBGP

    RR

    Interactions between a router reflector and its clients

    R2 and R3 are configured as clients of the route reflector R1.Clients establish iBGP sessions only with their route reflector.

    When a route reflector receives a route update from its clients

    or from eBGP, it computes the best route and then advertises it

    to its clients.

    eBGPeBGP

    R t fl t (2)

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    43/48

    Octavian Catrina 43

    Route reflectors (2)

    A more general configuration: route reflector and mesh

    R1 = Route reflector (RR).

    R2, R3, R4 = RR clients.

    R5, R6 = non-clients.

    R5 and R6 are not clients of the route reflector R1. They

    establish together with R1 a full mesh of iBGP sessions. When R1 receives a route update from a client, it computes

    the best path and advertises it to its clients and to non-clients.

    When R1 receives a route update from a non-client, it

    computes the best path and advertises it to its clients.

    R5

    R6

    R2

    R4

    RR

    R3

    Full mesh Route reflector

    R1

    eBGP

    iBGP

    iBGP

    eBGP

    iBGP

    eBGP

    eBGP

    eBGPiBGP

    iBGP

    iBGP

    BGP

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    44/48

    Octavian Catrina 44

    BGP convergence

    Convergence issues

    BGP cannot guarantee that a unique stable routing exists forany set of policies.

    Moreover, BGP cannot guarantee that any stable routing

    exists or that it can be found.

    Various scenarios with non-deterministic outcome or persistent

    oscillation have been presented over time.

    Causes Locally defined policies may conflict globally.

    Unconstrained policy languages.

    However, outages are caused by configuration errors ...

    Convergence speed Improved by the use of triggered, incremental updates.

    But with current Internet size & complexity, path vector routing

    is quite slow (e.g., for unreachable destination).

    St bilit f i t d i ti

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    45/48

    Octavian Catrina 45

    Stability of inter-domain routing

    Challenges

    Inter-domain routing in the global Internet relies on a hugedistributed system of over 100K BGP routers.

    A router reconfiguration/restart, BGP configuration error, or

    local instability (link, BGP session, etc.) can flood with updates

    all the BGP routers in the Internet.

    This causes routing instability and disruption of Internet traffic.

    BGP traffic analysis

    Most routes do not change frequently. Most of the Internet

    traffic uses these routes.

    A small fraction of the routes are responsible for most of theBGP messages exchanged.

    Approach

    Penalize the unstable routes such that to preserve the more

    stable ones(and reduce Internet traffic disruption).

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    46/48

    R t fl d i l

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    47/48

    Octavian Catrina 47

    Route flap dampening example

    S

  • 8/13/2019 Ars-msr 3 Interdomain 2013

    48/48

    Summary

    "BGP was not designed. It evolved ..."

    BGP remains a quite simple protocol, and there are manyrobust and efficient implementations.

    It has managed fairly well over the last 15 years to cope with

    the fast growth and increasing complexity of the Internet.

    BGP configuration for large domains is quite complicated and

    error prone (especially policies).

    Challenges

    BGP security (so far, only TCP MAC with static keys).

    Multi-homing and BGP "traffic engineering". How to do them

    without breaking CIDR and exploding the BGP routing tables? Policy-based routing.

    BGP next generation?