Understanding Large Internet Service Provider Backbone Networks Joel M. Gottlieb IP Network Management & Performance Department AT&T Labs – Research Florham Park, New Jersey [email protected]


Page 1: Understanding Large Internet Service Provider Backbone Networks

Understanding Large Internet Service Provider Backbone Networks

Joel M. Gottlieb
IP Network Management & Performance Department
AT&T Labs – Research
Florham Park, New Jersey

[email protected]

Page 2: Understanding Large Internet Service Provider Backbone Networks

Purpose of the talk

• You’ve heard of the big ISPs
– WorldCom, Sprint, AT&T, AOL…
• You’ve learned about all the stuff
– Routers, layers, TCP/IP, protocols…
• How do these ISP networks work?
– How is a large network structured?
– Which routing protocols do you use?
– How does it all fit together?
– What are some of the challenges in operating the network?

Page 3: Understanding Large Internet Service Provider Backbone Networks

Outline

• Network Architecture
– From a cloud to individual routers
– Router hierarchy
– Routing protocols
• Operational challenges
– A variety of practical issues
• Focus on network configuration
– The actual process of configuration
– Configuration management and Netdb

Page 4: Understanding Large Internet Service Provider Backbone Networks

Internet Architecture

• Divided into Autonomous Systems (“clouds”)
– Distinct regions of administrative control (~15,000)
– Set of routers and links managed by a single “institution”
– Service provider, company, university, …
• Hierarchy of Autonomous Systems
– Large, tier-1 provider with a nationwide backbone
– Medium-sized regional provider with smaller backbone
– Small network run by a single company or university
• Interaction between Autonomous Systems
– Internal topology is not shared between ASes
– … but neighboring ASes interact to coordinate routing

Page 5: Understanding Large Internet Service Provider Backbone Networks

Connections Between Providers

[Figure: ISP 1, ISP 2, and ISP 3, with a NAP, a commercial customer, dial-in access, and two destinations; labels mark where interdomain protocols and intradomain protocols are used.]

Page 6: Understanding Large Internet Service Provider Backbone Networks
Page 7: Understanding Large Internet Service Provider Backbone Networks

Inside the Cloud

• Multiple POPs (Points of Presence)

– Like central offices in telephone network

– Space in POP may be owned or rented

– Cages containing large number of vertical racks

• Within a POP:

– Multiple routers

– Routers may have different responsibilities:

• Access router

• Backbone router

• Internet Gateway Router

– Routers with different responsibilities may be the same model

Page 8: Understanding Large Internet Service Provider Backbone Networks

Internet Gateway Router

• Connections to neighboring Tier 1 providers
– Peer: an entity that may exchange full routing tables with you
– Peers often have a contractual relationship
• Few interfaces (interface = slot, as in a PC, for plugging in cards and cables)
• Fast interfaces
• Limited filtering (filter = router feature to prevent unwanted traffic, by source or by destination)

Page 9: Understanding Large Internet Service Provider Backbone Networks

Backbone Router

• No connections outside the network
• Moderate number of interfaces
• Fastest interfaces
• Very limited filtering
• Top-of-the-line router models, connected by fastest links (presently OC-192)
• Main purpose: move traffic through the network as fast and efficiently as possible
• “Big, fast and stupid”

Page 10: Understanding Large Internet Service Provider Backbone Networks

Access Router

• Many connections: to customers, modem banks…
• Connections only to backbone routers, not to each other
• Large number of interfaces
• Variety of interface speeds (depends on customer)
• Extensive filtering
– Important to protect network from unwanted or dangerous traffic

Page 11: Understanding Large Internet Service Provider Backbone Networks

The Router Hierarchy in a POP

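[Figure: the router hierarchy in a POP. A few Internet Gateway Routers (IGR) face neighboring providers; Backbone Routers (BR) sit in the middle; many Access Routers (AR) face modem banks, business customers, and web/e-mail servers.]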

Page 12: Understanding Large Internet Service Provider Backbone Networks

Motivation: Routing Protocols

[Figure: Customer A, Customer B, Customer C, ISP 1, ISP 2, Router X, and Router Y.]

Our Customer A wants to reach Customer C.

How can we guarantee this?

Page 13: Understanding Large Internet Service Provider Backbone Networks

What we need to do

• Customer A sends us traffic destined for Customer C, which arrives at Router X
• Router Y needs to know how to reach C
• Router X needs to know to choose Router Y in order to reach C
• Router X needs to know how to reach Router Y

Page 14: Understanding Large Internet Service Provider Backbone Networks

Some candidate routing protocols

• RIP version 2
– Not enough knobs, plus convergence problems
• EIGRP (Cisco proprietary)
– Somewhat popular, but has instabilities
– “Storms” a major problem for large networks
• MPLS (Multiprotocol Label Switching)
– Adds labels to packets inside network, then switches based on the labels
– Increasing in popularity
– Very complex configuration

Page 15: Understanding Large Internet Service Provider Backbone Networks

Focus on most popular routing protocols

• BGP (Border Gateway Protocol)
– Path-vector
– Keep track of who knows how to reach prefixes
– Send prefixes we know how to reach to others
– Use internally and externally
• OSPF (Open Shortest Path First)
– Link-state
– Each router computes best paths to all destinations, based on link information it receives
– Moy (1990)
– Use internally only

Page 16: Understanding Large Internet Service Provider Backbone Networks

Border Gateway Protocol (BGP)

• ASes announce info about prefixes they can reach

• Prefixes no longer reachable are withdrawn

• Local policies for path selection (which to use?)

• Local policies for route propagation (who to tell?)

• Policies configured by the AS’s network operator

[Figure: AS 1, AS 2, and AS 3; the host 12.34.158.5 is in AS 1. AS 1 announces “I can reach 12.34.158.0/23”; AS 2 announces “I can reach 12.34.158.0/23 via AS 1”.]

Page 17: Understanding Large Internet Service Provider Backbone Networks

BGP (continued)

• When AS3 receives this advertisement, it may update its forwarding table: a table of best routes to use

• When AS3 encounters packets which want to reach 12.34.158.0/23, it will send them to its next-hop, AS2

• AS3’s forwarding table may only contain one best route for a specific destination

• What if AS3 receives more than one advertisement for 12.34.158.0/23?
– BGP attributes and BGP Best Route Selection
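The announcement chain from these two slides can be sketched in a few lines of Python. This is only an illustration, not how a router implements BGP: the `advertise` and `next_hop_as` helpers are hypothetical names, the forwarding table is a plain dictionary, and the longest-prefix lookup is standard IP forwarding behavior added here so the lookup is well defined.

```python
import ipaddress

def advertise(route, sender_as):
    """Return the route as a neighbor hears it: the sender prepends its AS number."""
    return {"prefix": route["prefix"], "as_path": [sender_as] + route["as_path"]}

# AS 1 originates the prefix, so the AS path starts out empty.
origin = {"prefix": ipaddress.ip_network("12.34.158.0/23"), "as_path": []}
heard_by_as2 = advertise(origin, 1)        # "I can reach 12.34.158.0/23"
heard_by_as3 = advertise(heard_by_as2, 2)  # "I can reach 12.34.158.0/23 via AS 1"

# AS 3 keeps one best route per prefix; the next hop is the first AS in the path.
forwarding_table = {heard_by_as3["prefix"]: heard_by_as3["as_path"][0]}

def next_hop_as(table, destination):
    """Return the next-hop AS for a destination address, or None if no prefix matches."""
    addr = ipaddress.ip_address(destination)
    matches = [prefix for prefix in table if addr in prefix]
    if not matches:
        return None
    # Longest matching prefix wins.
    return table[max(matches, key=lambda prefix: prefix.prefixlen)]

print(heard_by_as3["as_path"])                       # AS path as seen by AS 3
print(next_hop_as(forwarding_table, "12.34.158.5"))  # next-hop AS for the host on the slide
```

With a single advertisement there is nothing to choose between; the selection rules only matter once AS3 hears the same prefix from more than one neighbor.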

Page 18: Understanding Large Internet Service Provider Backbone Networks

Some BGP Attributes

Local Preference: a single number, used inside an AS to prefer particular routes. It is not passed from one AS to another.
AS-Path: contains a list of ASes that the route has traversed (example coming up)
Weight: a Cisco-only parameter (a single number) allowing weighting of routes
Community: a parameter in the form ASN:xxxx which allows coloring of particular routes
Origin: IGP, EGP, or Incomplete (perhaps a static route)
There are more… (MPLS route distinguisher, etc.)

Page 19: Understanding Large Internet Service Provider Backbone Networks

Example: The AS-PATH parameter

• Each time a route is advertised by an AS, the AS-PATH is stamped with the AS number of the AS doing the advertising
• Route found in BGP routing table on AT&T router:
  * i161.173.0.0 192.205.31.97 0 82 0 2828 7911 13965 10695 i
• What this means:
– Route for prefix 161.173.0.0/16
– AS path: 2828 7911 13965 10695
– The next hop is 192.205.31.97 (different AT&T router)
– This route was obtained from Concentric (AS 2828)
– The route originated with AS 10695 (Wal-Mart) (see www.arin.net)
– The block of addresses 161.173.0.0/16 is indeed owned by Wal-Mart
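Reading an AS path is mechanical: the leftmost AS is the neighbor the route was learned from, and the rightmost is the originator. A minimal sketch, assuming the path is given as a space-separated string of AS numbers (the `summarize_as_path` helper is hypothetical, not a router command):

```python
def summarize_as_path(as_path: str) -> dict:
    """Split an AS path string into the neighbor AS, the origin AS, and the hop list."""
    hops = [int(asn) for asn in as_path.split()]
    if not hops:
        raise ValueError("empty AS path: the route was originated locally")
    return {
        "neighbor_as": hops[0],   # the AS that advertised the route to us
        "origin_as": hops[-1],    # the AS that first announced the prefix
        "length": len(hops),      # compared during best route selection
        "hops": hops,
    }

info = summarize_as_path("2828 7911 13965 10695")
print(info["neighbor_as"], info["origin_as"], info["length"])
```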

Page 20: Understanding Large Internet Service Provider Backbone Networks

Path Preference: BGP Best Route Selection

BGP Best Route Selection Process
Can select at most one route to any given prefix
• Prefer routes with the largest weight (Cisco proprietary)
• If the weights are equal, prefer routes with the largest local preference
• Then prefer routes with the shortest AS-paths
• Then select the route which has the smallest cost to the NextHop

This is a simplified version of the actual process…
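The simplified ordering above maps directly onto a sort key. The sketch below covers only those four steps and is not the full BGP decision process; the `Route` class and its field names are hypothetical, the default local preference of 100 is an assumption, and the second route uses documentation-range addresses and AS numbers rather than real ones.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple          # AS numbers, neighbor first, origin last
    weight: int = 0         # Cisco-only, local to one router
    local_pref: int = 100   # assumed default; local to one AS
    igp_cost: int = 0       # interior cost to reach next_hop

def preference_key(route):
    """Smaller tuples are better: negate the fields where larger values win."""
    return (-route.weight, -route.local_pref, len(route.as_path), route.igp_cost)

def best_route(candidates):
    """Pick the single best route among candidates for the same prefix."""
    return min(candidates, key=preference_key)

# The route from the AS-PATH slide, plus a hypothetical shorter alternative.
learned = Route("161.173.0.0/16", "192.205.31.97", (2828, 7911, 13965, 10695),
                local_pref=82, igp_cost=10)
alternative = Route("161.173.0.0/16", "192.0.2.2", (64500, 10695),
                    local_pref=82, igp_cost=30)
print(best_route([learned, alternative]).next_hop)
```

Because weight and local preference are compared before AS-path length, an operator can override the shortest-path choice purely through local policy.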

Page 21: Understanding Large Internet Service Provider Backbone Networks

Typical use of BGP

• Internal (I-BGP)
– Remember: Router X needs to know to choose Router Y in order to reach C
– Use pairwise BGP relationships to ensure that the entire network knows the best ways to reach external destinations (create a full mesh internally)
• External (E-BGP)
– Remember: Router Y needs to know how to reach C
– Policies set to receive most (or all) routes from peers
– Policies set to receive some routes from customers (e.g., using communities for ‘color’)
– Exchange no routes with others
  • Many customers too small to generate routes to others
  • Security risk
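A full internal mesh means one I-BGP session per pair of routers, so the session count grows as n(n-1)/2. A small sketch of that arithmetic, with hypothetical router names:

```python
from itertools import combinations

def full_mesh_sessions(routers):
    """Return every unordered router pair; each pair needs one I-BGP session."""
    return list(combinations(sorted(routers), 2))

pop_routers = ["ar1", "ar2", "br1", "igr1"]   # hypothetical names
sessions = full_mesh_sessions(pop_routers)
print(len(sessions), sessions[0])

# The count grows quadratically with the number of routers.
print(len(full_mesh_sessions(f"r{i}" for i in range(100))))
```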

Page 22: Understanding Large Internet Service Provider Backbone Networks

OSPF

• Routers flood information to learn the topology

• Routers determine “next hop” to reach other routers, by running Dijkstra algorithm

• Path selection based on link weights (shortest path)

• Link weights configured by the network operator

[Figure: example network with a weight (between 1 and 5) on each link; the highlighted shortest path has path cost = 8.]
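A minimal sketch of the computation each router performs, using Dijkstra's algorithm over configured link weights. The four-node topology, its city names, and its weights are made up for illustration and are not the network on the slide; links are assumed bidirectional with non-negative weights.

```python
import heapq

def shortest_paths(links, source):
    """Return (distance, first_hop) dicts from source over weighted bidirectional links."""
    graph = {}
    for (a, b), weight in links.items():
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))

    dist = {source: 0}
    first_hop = {}            # destination -> neighbor of source to forward to
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue          # stale heap entry; a shorter path was already settled
        done.add(node)
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                first_hop[neighbor] = neighbor if node == source else first_hop[node]
                heapq.heappush(heap, (candidate, neighbor))
    return dist, first_hop

# Hypothetical topology: two ways from NY to LA.
links = {("NY", "CHI"): 2, ("CHI", "LA"): 3, ("NY", "DAL"): 4, ("DAL", "LA"): 4}
dist, hop = shortest_paths(links, "NY")
print(dist["LA"], hop["LA"])

# Raising one link's weight steers traffic onto the other path.
raised = dict(links)
raised[("CHI", "LA")] = 100
dist2, hop2 = shortest_paths(raised, "NY")
print(dist2["LA"], hop2["LA"])
```

Only the weights change between the two runs; the algorithm and the topology stay the same, which is why link weights are the operator's main lever for steering internal traffic.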

Page 23: Understanding Large Internet Service Provider Backbone Networks

Typical use of OSPF

• Internal
– Remember: Router X needs to know how to reach Router Y
– Can use link weights to select/deselect specific paths (for maintenance, bandwidth reasons – “costing out”)
– You have the ability to engineer paths through the network
  • NY to LA: use your fastest long-haul links
  • Avoid using backup links when others are available
• Additional design options
– Multiple areas (avoid too-large Dijkstra calculation)
– Stub areas (isolate particular network features)

Page 24: Understanding Large Internet Service Provider Backbone Networks

Some operational challenges

• Now you have a rough idea of the structure

• Operational challenges – what is it really like to operate a large ISP?

• Some topics to touch on briefly
– Management issues
– Provisioning issues
– Capacity planning issues
– Performance issues
– Configuration issues (we’ll focus here in a moment)

Page 25: Understanding Large Internet Service Provider Backbone Networks

Management issues

• Multiple groups often share responsibility
– Communication must be good, and often isn’t
– Outages require very skilled problem-solving
– No personnel at many physical locations (router crash)
• Complexity
– Network policy not widely available/understood
– Policy changes faster than documentation
– Very deep subject matter (e.g., interfaces)
– Database of record hard to keep updated
• Unforeseen problems with serious impact
– Angry customers quick to make issues known
• Security – hard to protect assets

Page 26: Understanding Large Internet Service Provider Backbone Networks

Provisioning Issues

• Very complicated transactions
– Customer needs may be very specific
– Offers tend to be very complex
– Difficult tasks for customer care staff
• Must have very accurate network picture
– Oversubscribing router can cause problems
– Example: duplicate IP address allocation
• Must respect network time issues
– Maintenance windows
• Validation testing must be efficient, timely and thorough

Page 27: Understanding Large Internet Service Provider Backbone Networks

Practical Operational Challenges

• Increase in the scale of the network
– Link speeds, # of routers/links
– Large network has 100s of routers and 1000s of links (already discussed managing routing protocols)
• Significant traffic fluctuations
– Time-of-day changes and addition of new customers
– Special events (Olympics) and new applications (Napster/KaZaA)
– Difficult to forecast traffic load before designing topology
• Market demand for stringent network performance
– Service level agreements (SLAs), high-quality voice-over-IP

Page 28: Understanding Large Internet Service Provider Backbone Networks

Practical Capacity Planning Issues

• Deciding whether to buy/install new equipment
– What? Where? When?
• Examples
– Where to put the next backbone router
– Whether the network can accommodate a new customer
– Whether to install a caching proxy for cable modems
• Requirements
– Projections of future traffic patterns from measurements
– Cost estimates for buying/deploying the new equipment
– Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)

Page 29: Understanding Large Internet Service Provider Backbone Networks

Performance Issues

• Data is enormous and hard to analyze
– Netflow: per-packet, heavy load on routers
– SNMP: not very detailed (link utilization)
– Perhaps packet sampling/sniffing?
– File systems fill up quickly
• Real-time performance analysis is hard
– Open problem: recognizing trouble
– Support staff must interpret very complex analysis
• Problems are expensive and damaging
– Unhappy customers and press coverage

Page 30: Understanding Large Internet Service Provider Backbone Networks

Focus on router configuration

• Uncertainty
– Decentralized manual router configuration (telnet)
– Databases of record must be kept accurate
• Complexity (as mentioned earlier)
– Network policy not widely available/understood
– Very deep subject matter (e.g., interfaces); experts are expensive and hard to find
• Limited commercial tools for CM/debugging
– Tools do not cover local conventions and policies
– Tools typically lag behind product releases

Page 31: Understanding Large Internet Service Provider Backbone Networks

Cisco Router Configuration Language (IOS)

• Not user-friendly
– Certifications offered (CCNA, CCIE, etc.)
– Requires knowledge of low-level details (“assembly language”)
– Many options for arguments
• Not a formal language
– Simple grammar (keywords mixed with optional args)
– Generally unstructured; very specific parsing required
• Presents a moving target
– Multiple versions in marketplace (and in single network)
– Command set extended very often
• Substantial expertise required
– 900+ unique statements in single network
– Long files (AR 1000s of lines; BR and IGR 100s)

Page 32: Understanding Large Internet Service Provider Backbone Networks

How configuration is done

• Telnet directly to router
• Type in specific command: router status
– “show running-config” – get full configuration
– “show diag” – lower-level hardware details
– “show version” – get the running IOS levels
– “show ip interface brief” – one-line status summary for each interface
• Type in specific command: making changes
– “no ip access-list” – get rid of existing access list (and then you have to put the new one in, line by line!)
– “router bgp 701” – bring up the BGP process on your router
• Some command options depend on previous options
• New tools – keystroke trackers

Page 33: Understanding Large Internet Service Provider Backbone Networks

Example: Cisco Router Configuration File

• Language with hundreds of different commands
• Cisco IOS is a de facto standard config language
• Sections for interfaces, routing protocols, filters, etc.

version 12.0
hostname MyRouter
!
interface Loopback0
 ip address 12.123.37.250 255.255.255.255
!
interface Serial9/1/0/4:0
 description MyT1Customer
 bandwidth 1536
 ip address 12.125.133.89 255.255.255.252
 ip access-group 10 in
!
interface POS6/0
 description MyBackboneLink
 ip address 12.123.36.73 255.255.255.252
 ip ospf cost 1024
!
router ospf 2
 network 12.123.36.72 0.0.0.3 area 9
 network 12.123.37.250 0.0.0.0 area 9
!
access-list 10 permit 12.125.133.88 0.0.0.3
access-list 10 permit 135.205.0.0 0.0.255.255
ip route 135.205.0.0 255.255.0.0 Serial9/1/0/4:0
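The configuration above mixes two mask styles: interface addresses use subnet masks (255.255.255.252), while the OSPF “network” statements and the access list use wildcard masks (0.0.0.3), which are the bitwise inverse. The sketch below, using Python's standard `ipaddress` module, converts wildcard pairs to prefixes and checks them against the addresses in the file. The helper names are hypothetical, and only contiguous wildcard masks are handled.

```python
import ipaddress

def wildcard_to_network(address, wildcard):
    """Convert an IOS address/wildcard pair to a network by inverting the wildcard."""
    netmask = ipaddress.ip_address(int(ipaddress.ip_address(wildcard)) ^ 0xFFFFFFFF)
    return ipaddress.ip_network(f"{address}/{netmask}")

# The two "access-list 10 permit" lines from the configuration file.
ACL_10 = [
    wildcard_to_network("12.125.133.88", "0.0.0.3"),
    wildcard_to_network("135.205.0.0", "0.0.255.255"),
]

def permits(acl, source):
    """True if any permit entry matches; otherwise the implicit deny at the end applies."""
    addr = ipaddress.ip_address(source)
    return any(addr in network for network in acl)

# The OSPF statement for the backbone link covers the POS6/0 interface address.
backbone = wildcard_to_network("12.123.36.72", "0.0.0.3")
print(backbone, ipaddress.ip_address("12.123.36.73") in backbone)

print(permits(ACL_10, "12.125.133.90"))   # other end of the customer /30
print(permits(ACL_10, "10.1.1.1"))        # not listed, so denied
```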

Page 34: Understanding Large Internet Service Provider Backbone Networks

Netdb: Router configuration files to Network abstraction

[Figure: router configuration files are turned into a network abstraction. The configuration fragments visible on the slide are clipped: interface ATM9/0/0, description, ip access-group, rate-limit input, ip route-cache, no ip route-, bandwidth 12500, load-interval 3, . . .]

Using Netdb, an accurate network view can be stored in a database, permitting querying, error checking, and specialized reporting
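To make the idea concrete, here is a toy version of “configuration file in, queryable records and error checks out”. It is a hypothetical illustration only, not Netdb's actual design or code: the function names are invented, the parser understands just three interface sub-commands, and the `ip access-group 20 in` line in the sample was added so the check has something to report.

```python
SAMPLE_CONFIG = """\
hostname MyRouter
!
interface Serial9/1/0/4:0
 description MyT1Customer
 ip address 12.125.133.89 255.255.255.252
 ip access-group 10 in
!
interface POS6/0
 description MyBackboneLink
 ip address 12.123.36.73 255.255.255.252
 ip access-group 20 in
!
access-list 10 permit 12.125.133.88 0.0.0.3
"""

def parse_interfaces(config_text):
    """Collect per-interface records from the indented lines under each interface."""
    interfaces = {}
    current = None
    for line in config_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("interface "):
            current = interfaces.setdefault(line.split()[1], {"access_groups": []})
        elif line.startswith(" ") and current is not None:
            words = line.split()
            if words[:2] == ["ip", "address"] and len(words) >= 4:
                current["address"], current["mask"] = words[2], words[3]
            elif words[:2] == ["ip", "access-group"] and len(words) >= 3:
                current["access_groups"].append(words[2])
            elif words[0] == "description":
                current["description"] = " ".join(words[1:])
        else:
            current = None    # any other top-level line ends the interface section
    return interfaces

def defined_access_lists(config_text):
    """Return the set of access-list numbers that have at least one entry."""
    return {line.split()[1] for line in config_text.splitlines()
            if line.startswith("access-list ") and len(line.split()) > 1}

def undefined_access_groups(config_text):
    """Report (interface, acl) pairs where an interface applies an ACL that is never defined."""
    defined = defined_access_lists(config_text)
    return [(name, acl)
            for name, record in parse_interfaces(config_text).items()
            for acl in record["access_groups"]
            if acl not in defined]

print(parse_interfaces(SAMPLE_CONFIG)["POS6/0"]["address"])
print(undefined_access_groups(SAMPLE_CONFIG))
```

Once every router's file is reduced to records like these, questions such as “which interfaces apply a filter that does not exist?” become simple queries run across the whole network.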

Page 35: Understanding Large Internet Service Provider Backbone Networks

Netdb Architecture

[Figure: Netdb architecture. Labels: router config files; low-level standard form; abstract network database; Netdb; database of record; discords; queries; Operations; traffic analysis; security audits; network management tools (tools on top).]

Page 36: Understanding Large Internet Service Provider Backbone Networks

Netdb and the CBB network

• Queries developed for specific topics
– Which router cards? Which router models?
– Are security features configured properly?
– Are BGP relationships configured properly?
• Results available daily
– Operations groups note discords, may fix them
– Capacity Planning may use topology queries
• Many research efforts enhanced
– Traffic engineering, traffic analysis
– Developing expertise base in configuration management

Page 37: Understanding Large Internet Service Provider Backbone Networks

Tracking the State of the Network

• Network management groups
– Tier 1: Customer care
– Tier 2: Individual network elements
– Tier 3: Network-wide view
• Databases
– Customers (name, billing info, IP addresses, service, …)
– Network assets (routers, links, configuration, …)
• Data from the operational network
– Router configuration files (commands applied to router)
– Fault data (link/router failures, BGP session failures, …)
– Routing tables (dumps of BGP and forwarding tables)
– Netflow packet-level information
– Link utilization

Page 38: Understanding Large Internet Service Provider Backbone Networks

Conclusions

• Large IP network is very complex; requires both expertise and personpower to manage carefully

• Router configuration is a very big subject (High schools now teaching Cisco configuration)

• Management issues often as difficult as technical issues

• Small changes in network configuration can have serious consequences; work must be done very expertly

• Performance management is a recent subject and just developing

• Large IP networks are very complicated!

• New technologies coming all the time