TRILL Routing Scalability Considerations
Post on 06-Feb-2016
IETF-62 TRILL BOF
TRILL Routing Scalability Considerations
Alex Zinin <zinin@psg.com>
General scalability framework
About growth functions for:
- Data overhead (adjacencies, LSDB, MAC entries)
- BW overhead (Hellos, updates, refreshes/sec)
- CPU overhead (computation complexity, frequency)
Scaling parameters:
- N — total number of stations
- L — number of VLANs
- F — relocation frequency
Types of devices:
- Edge switch (attached to a fraction of N and L)
- Core switch (most of L)
Scenarios for analysis:
- Single stationary bcast domain
  - No practical station mobility
  - N = O(1K) by natural bcast limits
- Bcast domain with mobile stations
- Multiple stationary VLANs
  - L = O(1K) total, O(100) visible to a switch
  - N = O(10K) total
- Multiple VLANs with mobile stations
Protocol params of interest
What:
- Amount of data (topology, leaf entries)
- Number of LSPs
- LSP refresh rate
- LSP update rate
- Flooding complexity
- Route calculation complexity & frequency
Why:
- Required memory increase as the network grows
- Required memory & CPU to keep up with protocol dynamics
- Link BW overhead to control the network
How:
- Absolute: big-O notation
- Relative: compare to e.g. bridging & IP routing
Why is this important
If data-inefficient:
- Increased memory requirements
- Frequent memory upgrades as the network grows
- Much more info to flood
If computationally inefficient:
- Substantial compute power increase == marginal network size increase
- High CPU utilization
- Inability to keep up with protocol dynamics
Link-state Protocol Dynamics
Network events are visible everywhere.
Main assumption for stationary networks:
- Network change is temporary
- Topology stabilizes within finite time T
For each node:
- Rinp — input update rate (network event frequency)
- Rprc — update processing rate
Long-term convergence condition: Rprc >> Rinp
What if Rprc < Rinp?
- Micro-bursts are buffered by queues
- Short-term (normal for stationary nets): update drops, retransmits, convergence
- Long-term/permanent: the net never converges; CPU upgrade needed
Rprc = f(protocol design, CPU, implementation)
Rinp = f(protocol design, network)
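The convergence condition above can be illustrated with a toy queue model; this is a sketch with made-up rates, not anything measured in the slides:

```python
# Toy fluid model of a node's update queue (illustrative rates only).
def queue_backlog(r_inp: float, r_prc: float, seconds: int) -> float:
    """Backlog after `seconds` of updates arriving at r_inp/sec
    and being processed at r_prc/sec."""
    backlog = 0.0
    for _ in range(seconds):
        backlog = max(0.0, backlog + r_inp - r_prc)
    return backlog

# Rprc >> Rinp: backlog stays at zero, the network converges
print(queue_backlog(r_inp=1.0, r_prc=30.0, seconds=60))   # 0.0
# Rprc < Rinp: backlog grows without bound, it never converges
print(queue_backlog(r_inp=40.0, r_prc=30.0, seconds=60))  # 600.0
```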
Data-plane parameters
Data overhead: number of MAC entries in the CAM table.
Why worry?
- CAM memory is expensive
- 1-8K entries for small switches, 32K-128K for core switches
- Shared among VLANs
- Entries expire when stations go silent
Single bcast domain (CP)
Total of O(1K) MAC addresses.
Each address: 12-bit VLAN tag + 48-bit MAC = 60 bits.
IS-IS update packing:
- 4 addr's per TLV (TLV is 255B max)
- 20 addr's per LSP fragment (1470B default)
- ~5K addr's per node (256 fragments total)
LSP refresh rate:
- 1K MACs = 50 LSPs
- 1h renewal = 1 update every 72 secs
MAC update rate: depends on the MAC learning & dead-detection procedure.
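A quick check of the packing and refresh arithmetic above (the per-fragment count and fragment limit are taken straight from the slides):

```python
# Back-of-the-envelope check of the slide's LSP packing numbers.
ADDRS_PER_FRAG = 20        # addresses per 1470B LSP fragment (slide)
MAX_FRAGS = 256            # IS-IS LSP fragments per node
TOTAL_MACS = 1000          # O(1K) MACs in the bcast domain
REFRESH_PERIOD_S = 3600    # 1-hour LSP renewal

max_addrs_per_node = ADDRS_PER_FRAG * MAX_FRAGS      # ~5K addresses
lsps_needed = TOTAL_MACS // ADDRS_PER_FRAG           # 50 LSPs
refresh_interval_s = REFRESH_PERIOD_S / lsps_needed  # one update per 72 s

print(max_addrs_per_node, lsps_needed, refresh_interval_s)  # 5120 50 72.0
```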
MAC learning
Traffic-based + expiration (5-15 min):
- Announces station activity
- 1K stations, 30-min fluctuations = 1 update every 1.8 seconds on average
- Likely bursts due to the "start-of-day" phenomenon
Reachability-based:
- Start announcing a MAC when first heard from the station
- Assume it's there until evidence otherwise, even if silent (presumption of reachability)
- Removes activity-sensitive fluctuations
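The average-rate figure above follows directly from the slide's inputs:

```python
# Traffic-based learning churn (inputs from the slide).
stations = 1000
fluctuation_period_s = 30 * 60        # each station flaps every 30 min
interval_s = fluctuation_period_s / stations
print(interval_s)  # 1.8 -> one MAC update every 1.8 s on average
```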
Single bcast domain (DP)
Number of entries:
- Bridges: f(traffic); limited by local config and location within the network
- RBridges: all attached stations
  - No big change for core switches (they see most MACs)
  - May be a problem for smaller ones
Single bcast: summary
With reachability-based MAC announcements...
- CP is well within the limits of current link-state routing protocols
  - Can comfortably handle O(10K) routes
  - Dynamics are very similar
  - There's an existence proof that this works
- CP data overhead is O(N)
  - Worse than IP routing: O(log N)
  - However, net size is upper-bounded by bcast limits
  - Small switches will need to store & compute more
- Data plane may require bigger MAC tables in smaller switches
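The data-plane concern can be made concrete with the slides' own figures (8K is the upper end of the small-switch CAM range, 10K the multi-VLAN station count):

```python
# Small-switch CAM capacity vs. the MACs an RBridge must hold
# (all figures taken from the slides).
small_switch_cam = 8_000      # upper end for small switches
single_bcast_macs = 1_000     # O(1K) single bcast domain
multi_vlan_macs = 10_000      # O(10K) total across VLANs

print(single_bcast_macs <= small_switch_cam)  # True: fits today
print(multi_vlan_macs <= small_switch_cam)    # False: table too small
```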
Note: comfort limit
It is always possible to overload a neighbor with updates, so update flow control is employed.
- Dynamic flow control is possible, yet...
- Experience-based heuristic: pace updates at 30/sec
  - Not a hard rule, a ballpark
  - Limits burst Rinp for the neighbor
  - Prevents drops during flooding storms
- Given the (Rprc >> Rinp) condition, want the average an order of magnitude lower than the max, e.g. O(1) upd/sec
Note: protocol upper-bound
LSP generation is paced: normally not more frequent than once every 5 secs.
Each LSP fragment has its own timer.
With equal distribution, max node origination rate == 51 upd/sec.
This does not address long-term stability.
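The ~51 upd/sec bound follows from the fragment limit and the pacing timer:

```python
# Per-node origination upper bound (constants from the slides).
MAX_FRAGS = 256      # IS-IS LSP fragments per node
MIN_INTERVAL_S = 5   # each fragment re-originated at most once per 5 s
max_rate = MAX_FRAGS / MIN_INTERVAL_S
print(max_rate)  # 51.2 -> the slide's ~51 upd/sec
```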
Single bcast + mobility
Same number of stations:
- Same data efficiency for CP and DP
- Different dynamics
Take the IETF wireless network, worst case:
- ~700 stations
- New location within 10 minutes
- Average 1 MAC every 0.86 sec, or 1.16 MAC/sec
- Note: every small switch in the VLAN will see the updates
How does it work now? Bridges (APs + switches) relearn MACs and expire old entries.
Summary: dynamics barely fit within the comfort range.
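The mobility rates above come straight from the slide's worst-case inputs (rounding gives 1.17 rather than the slide's truncated 1.16):

```python
# IETF wireless network, worst case (inputs from the slide).
stations = 700
relocation_period_s = 10 * 60          # new location within 10 minutes
per_mac_s = relocation_period_s / stations  # gap between MAC moves
rate_per_s = stations / relocation_period_s # MAC updates per second

print(round(per_mac_s, 2), round(rate_per_s, 2))  # 0.86 1.17
```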
Multiple VLANs
Real networks have VLANs.
Assuming the current proposal is used (standard IS-IS flooding), two possibilities:
- Single IS-IS instance for the whole network
- Separate IS-IS instance per VLAN
Similar scaling challenges as with VR-based L3 VPNs.
VLANs: single IS-IS
Assuming reachability-based MAC announcement:
- Adjacencies and convergence scale well
However...
- Easily hit the 5K MAC/node limit (solvable)
- Every switch sees every MAC in every VLAN, even if it doesn't need it
Clear scaling issue.
VLANs: multiple instances
- MAC announcements scale well
- Good resource separation
However...
- N adjacencies for a VLAN trunk
- N times more processing for a single topological event
- N times more data structures (neighbors, timers, etc.)
- N = 100...1000 for a core switch
Clear scaling issue for core switches.
VLANs: data plane
Core switches:
- No big difference; exposed to most MACs in their VLANs anyway
Smaller switches:
- Have to install all MACs even if a single port on the switch belongs to a VLAN
- May require bigger MAC tables than available today
VLANs: summary
Control plane: currently available solutions have scaling issues.
Data plane: smaller switches may have to pay.
VLANs + Mobility
Assuming some VLANs will have mobile stations:
- Data plane: same as stationary VLANs
- All scaling considerations for VLANs apply
- Mobility dynamics get multiplied
  - Single IS-IS: updates hit the same adjacency
  - Multiple IS-IS: updates hit the same CPU
- Activity is not bounded naturally anymore
- Update rate easily goes outside the comfort range
Clear scaling issues.
Resolving scaling concerns
- The 5K MAC/node limit in IS-IS could be solved with RFC 3786
- Don't use per-VLAN (multi-instance) routing
- Use reachability-based MAC announcement
- Scaling MAC distribution requires VLAN-aware flooding:
  - Each node and link is associated with a set of VLANs
  - Only information needed by the remote neighbor is flooded to it
  - Not present in the current IS-IS framework
- Forget about mobility ;-)
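The VLAN-aware flooding idea can be sketched as a simple set-intersection filter; this is an illustration of the concept, not an existing IS-IS mechanism, and all link names and VLAN sets are made up:

```python
# Sketch of VLAN-aware flooding: forward an LSP over a link only if
# the link carries at least one of the VLANs the LSP describes.
def flood_targets(lsp_vlans: set, link_vlans: dict) -> list:
    """Return the links an LSP should be flooded over."""
    return [link for link, vlans in link_vlans.items()
            if lsp_vlans & vlans]  # non-empty overlap => neighbor needs it

links = {"to-core1": {10, 20, 30},   # trunk carrying many VLANs
         "to-edge7": {10},           # edge link, VLAN 10 only
         "to-edge9": {40}}           # edge link, VLAN 40 only

print(flood_targets({20, 40}, links))  # ['to-core1', 'to-edge9']
```

With this filter, to-edge7 never receives MAC information for VLANs 20 and 40, which is exactly the per-neighbor pruning the slide says standard IS-IS flooding lacks.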