winter 2008 Router Design 1
Router Design
• Overview of Generic Router Architecture
• Input-Queued Switches (Routers)
• IP Address Look-up Algorithms
• Packet Classification Algorithms
Readings: do the required readings, and the optional readings if interested
winter 2008 Router Design
Routers in a Network
winter 2008 Router Design 3
Sample Routers and Switches
• Cisco 12416 Router
– up to 160 Gb/s throughput
– up to 10 Gb/s ports
• 3Com 4950
– 24 port gigabit Ethernet switch
• Juniper Networks T640 Router
– up to 160 Gb/s throughput
– up to 10 Gb/s ports
winter 2008 Router Design 4
High Capacity Router
• Cisco CRS-1
– up to 46 Tb/s throughput
• two rack types
• line card rack
– 640 Gb/s throughput
– up to 16 line cards
  • up to 40 Gb/s each
– up to 72 racks
• switch rack
– central switch stage
– up to 8 racks
• in-service scaling
winter 2008 Router Design
Components of a Basic Router
• Input/Output Interfaces (II, OI)
– convert between optical signals and electronic signals
– extract timing from received signals
– encode (decode) data for transmission
• Input Port Processor (IPP)
– synchronize signals
– determine required OI or OIs from routing table
• Output Port Processor (OPP)
– queue outgoing cells
• shared bus interconnects IPPs and OPPs
• Control Processor (CP)
– configures routing tables
– coordinates end-to-end channel setup together with neighboring routers
[Figure: block diagram with II, IPP (routing table), CP, OPP (output queue), and OI.]
winter 2008 Router Design 6
Generic Router Architecture
[Figure: N line cards (1, 2, …, N). Each performs header processing on an incoming packet (data plus header): look up the IP address in the address table, then update the header. The packet is then queued in buffer memory. The buffer memories are labeled "N times line rate".]
winter 2008 Router Design 7
Switch Fabric: Three Design Approaches
winter 2008 Router Design 8
Switch Fabric: First Generation Routers
• Traditional computers with switching under direct control of the CPU
• Packet copied to the system’s memory
• Speed limited by the memory bandwidth (two bus crossings per packet)
[Figure: input port, output port, and memory connected by the system bus.]
winter 2008 Router Design 9
Shared Memory (1st Generation)
[Figure: CPU with route table and buffer memory, connected over a shared backplane to several line interfaces (MAC).]
• Typically < 0.5 Gb/s aggregate capacity
• Limited by rate of shared memory
winter 2008 Router Design 10
Switch Fabric: Switching Via a Bus
• Packet from input port memory to output port memory via a shared bus
• Bus contention: switching speed limited by bus bandwidth
• 1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)
winter 2008 Router Design 11
Shared Bus (2nd Generation)
[Figure: CPU with route table and buffer memory, plus several line cards, each with MAC, buffer memory, and forwarding cache.]
• Typically < 5 Gb/s aggregate capacity
• Limited by shared bus
winter 2008 Router Design 12
Switch Fabric: Interconnection Network
• Banyan networks, other interconnection nets initially created for multiprocessors
• Advanced design: fragmenting packet into fixed length cells to send through the fabric
• Cisco 12000: switches Gbps through the interconnection network
winter 2008 Router Design 13
Point-to-Point Switch (3rd Generation)
[Figure: line cards (MAC, local buffer memory, forwarding table) and a CPU card (CPU, memory, routing table) connected by a switched backplane.]
• Typically < 50 Gb/s aggregate capacity
winter 2008 Router Design 14
Buffer Placement: Output Port Queuing
• Buffering when the aggregate arrival rate exceeds the output line speed
• Memory must operate at very high speed
winter 2008 Router Design 15
Simple model of output queued switch
[Figure: Links 1 to 4, ingress on one side and egress on the other, each at link rate R.]
winter 2008 Router Design 16
Characteristics of an output queued (OQ) switch
• arriving packets immediately written into output queue, without intermediate buffering
• flow of packets to one output does not affect flow to another output
• OQ switch is work conserving: output line always busy when there is a packet in switch for it
• OQ switch has highest throughput, lowest average delay
winter 2008 Router Design 17
Switching Speed-up Needed
[Figure: the generic router architecture again: N line cards with header processing (look up IP address, update header) and packet queueing; the buffer memories are labeled "N times line rate".]
winter 2008 Router Design 18
Buffer Placement: Input Port Queuing
• Fabric slower than input ports combined
– So, queuing may occur at input queues
• Head-of-the-Line (HOL) blocking
– Queued packet at the front of the queue prevents others in queue from moving forward
winter 2008 Router Design 19
Simple model of input queued switch
[Figure: Links 1 to 4, ingress on one side and egress on the other, each at link rate R.]
winter 2008 Router Design 20
Head-of-line Blocking
• Packet at the head of an input queue cannot be transferred, thus blocking the following packets (or cells, i.e., packets of fixed size)
[Figure: three inputs and three outputs. The red head packet cannot be transferred because the output buffer is full; the packet behind it cannot be transferred because it is blocked by the red packet.]
winter 2008 Router Design 21
Characteristics of an input queued (IQ) switch
• arriving packets written into input queue
• only one packet can be sent to output link at a time
• head-of-line blocking
• IQ switch cannot keep output links fully utilized
winter 2008 Router Design 22
Buffer Placement: Design Trade-offs
• Output queues
– Pro: work-conserving, so maximizes throughput
– Con: memory must operate at speed N*R
• Input queues
– Pro: memory can operate at speed R
– Con: head-of-line blocking for access to output
• Work-conserving: output line is always busy when there is a packet in the switch for it
• Head-of-line blocking: head packet in a FIFO cannot be transmitted, forcing others to wait
winter 2008 Router Design 23
What is capacity of IQ: Model
[optional: Karol et al., Globecom ’86]
• Large input-queued switch with
– single FIFO at each input
– packet destinations i.i.d. (independent, identically distributed), uniform across outputs
– HoL blocked packets not flushed
• throughput analysis
– saturated switch (i.e., there is always an arrival at each input queue)
– balls/urns model: N balls, N urns
– focus on first urn
– $X_t$: number of balls in urn at time t
– $D_t$: number of balls removed from all urns at end of time t
– $D_t$ is switch throughput
winter 2008 Router Design 24
Model (cont’d)
• $A_{t+1}$: no. of balls dropped into urn 1 at t+1
• $X_{t+1} = (X_t - 1)^+ + A_{t+1}$
• where $P(A_{t+1} = k \mid D_t) = \binom{D_t}{k} (1/N)^k (1 - 1/N)^{D_t - k}$
• $E(D_t) = \rho N$, where ρ is output throughput
• for large N, binomial distribution can be approximated by Poisson distribution: $P(A_t = k) = \rho^k e^{-\rho} / k!$
winter 2008 Router Design 25
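Model (cont’d)
$EX = \frac{E(A^2) + EA - 2(EA)^2}{2(1 - EA)}$
where $EA = \rho$, $E(A^2) = \rho + \rho^2$
therefore $EX = \frac{2\rho - \rho^2}{2(1 - \rho)}$
$EX = 1$ (in a saturated switch the N head-of-line packets are spread over N outputs, so one per output on average), therefore $\frac{2\rho - \rho^2}{2(1 - \rho)} = 1$
and $\rho = 2 - \sqrt{2} \approx 58.6\%$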
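As a quick numerical check of the saturation result, the sketch below evaluates $EX = (2\rho - \rho^2) / (2(1 - \rho))$ and confirms that it equals 1 at $\rho = 2 - \sqrt{2}$, the root of $\rho^2 - 4\rho + 2 = 0$ that lies below 1. The function names are illustrative and do not come from the slides.

```python
import math


def mean_hol_backlog(rho):
    """EX for Poisson arrivals with mean rho: (2*rho - rho^2) / (2*(1 - rho))."""
    return (2 * rho - rho ** 2) / (2 * (1 - rho))


def saturation_throughput():
    """Root of rho^2 - 4*rho + 2 = 0 that lies in (0, 1), i.e. where EX = 1."""
    return 2 - math.sqrt(2)


if __name__ == "__main__":
    rho = saturation_throughput()
    print(f"rho = {rho:.4f}, EX at rho = {mean_hol_backlog(rho):.4f}")
```

Setting EX = 1 gives $2\rho - \rho^2 = 2 - 2\rho$; the other root, $2 + \sqrt{2}$, exceeds 1 and cannot be a throughput.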
winter 2008 Router Design 26
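A Router with Input Queues: Head of Line Blocking
[Figure: delay vs. load (0% to 100%). With head-of-line blocking the delay curve blows up at $2 - \sqrt{2} \approx 58\%$ load; the other curve is the best that any queueing system can achieve.]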
winter 2008 Router Design 27
Solution to Avoid Head-of-line Blocking
• How to improve capacity without increasing switching fabric speed ?
• Maintain at each input N virtual queues, i.e., one per output
– use non-FIFO scheduler, matching input/output
[Figure: three inputs and three outputs.]
winter 2008 Router Design 28
Virtual Output Queueing
• assume fixed length packets
• each input manages separate queue per output
• at each time slot, matching scheduler selects the best set of packets to send from inputs to outputs
• maximum-weight matching
[Figure: inputs 1 to N and outputs 1 to N connected under the control of a matching scheduler.]
winter 2008 Router Design 29
Matching
• $L_{ij}(t)$: no. of packets at input i for output j at t
• bipartite graph $(V_1 \cup V_2, E)$, $E \subseteq V_1 \times V_2$
– $V_1$, $V_2$: inputs, outputs
– $(i,j) \in E$ iff $L_{ij}(t) > 0$
• matching: subset of E such that no two edges are adjacent
[Figure: bipartite graph between inputs and outputs.]
winter 2008 Router Design 30
Matching problems
• maximum size matching
– matching with largest number of edges
– when traffic uniform, provides 100% utilization
– network flow problem, $O(N^{5/2})$
• maximum weight matching
– add weight $w_{ij}$ to edge from i to j
– matching with highest weight
– when $w_{ij} = L_{ij}(t)$ provides 100% utilization
– equivalent to a network flow problem, $O(N^3)$
– MWM algorithms involve backtracking, i.e., edges laid down in one iteration may be removed later, so the algorithm is not amenable to pipelining
winter 2008 Router Design 31
Scheduling Algorithms
[Figure: example bipartite graph with edge weights 19, 34, 21, 18, 7, 1, comparing three matchings: Max Size Matching (not stable), Max Wt Matching (stable), and practical maximal matchings (not stable).]
winter 2008 Router Design 32
Switch Algorithms
[Figure: maximal matching (not stable), Max Size Matching (not stable), and Max Wt Matching (stable, low backlogs), arranged between "easier to implement" and "better performance".]
winter 2008 Router Design 33
Better Matching Algorithms
• Need simple algorithms that perform well
– efficient packet processing at line speeds
– high throughput
– low latencies/backlogs
• Randomized algorithms with linear complexity available
– Tassiulas’ Randomized Algorithm
– LAURA
– SERENA
Use randomization, history, problem structure, and arrival information
For more details, see “Efficient Randomized Algorithms for Input-Queued Switch Scheduling” by Shah, Giaccone and Prabhakar, IEEE Micro Vol 22, Issue 1, Jan 2002
winter 2008 Router Design 34
Combined Input-Output Queued (CIOQ) Routers
• Both input and output interfaces store packets
• Advantages
– Easy to build
  • Utilization 1 can be achieved with limited input/output speedup (<= 2)
• Disadvantages
– Harder to design algorithms
  • Two congestion points
  • Need to design flow control
[Figure: input interfaces and output interfaces connected through a backplane.]
winter 2008 Router Design 35
Output Queue Emulation using CIOQ (with Speed-up)
Stable Marriage Problem -- Gale-Shapley Algorithm (GSA)
• As long as there is a free man m
– m proposes to the highest ranked woman w in his list to whom he hasn’t proposed yet
– If w is free, m and w are engaged
– If w is engaged to m’ and w prefers m to m’, w releases m’ and becomes engaged to m
  • Otherwise m remains free
• A stable matching exists for every set of preference lists
• Complexity: worst-case $O(N^2)$
winter 2008 Router Design 36
Stable Marriage Problem
• Consider N women and N men
• Each woman/man ranks each man/woman in the order of their preferences
• Stable matching: a matching with no blocking pairs
• Blocking pair: let p(i) denote the partner of i
– There are matched pairs (k, p(k)) and (j, p(j)) such that k prefers p(j) to p(k), and p(j) prefers k to j
winter 2008 Router Design 37
Example
• If men propose to women, the stable matching is
– 1st round: (1,2), (2,1), (3,4), (4,1) -> w1 releases m2
– 2nd round: (2,4) -> w4 releases m3
– 3rd round: (3,3)
– final match: (1,2), (2,4), (3,3), (4,1)
• What is the stable matching if women propose to men?

men pref. list
1: 2 4 3 1
2: 1 4 3 2
3: 4 3 2 1
4: 1 2 4 3

women pref. list
1: 1 4 3 2
2: 3 1 4 2
3: 1 2 3 4
4: 2 1 4 3
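The proposal rounds in this example can be reproduced with a short sketch of the Gale-Shapley algorithm. The function name `gale_shapley` and the dictionary layout are illustrative choices, not part of the slides. The preference lists are read from the slide as men 1: 2 4 3 1, 2: 1 4 3 2, 3: 4 3 2 1, 4: 1 2 4 3 and women 1: 1 4 3 2, 2: 3 1 4 2, 3: 1 2 3 4, 4: 2 1 4 3.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return {proposer: acceptor} for the proposer-optimal stable matching.

    Both arguments map an id to that participant's preference list,
    most preferred first.
    """
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged = {}                                  # acceptor -> current proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                # acceptor was free
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                # acceptor releases current partner
            free.append(current)
        else:
            free.append(p)                # rejected, proposer stays free
    return {p: a for a, p in engaged.items()}


MEN = {1: [2, 4, 3, 1], 2: [1, 4, 3, 2], 3: [4, 3, 2, 1], 4: [1, 2, 4, 3]}
WOMEN = {1: [1, 4, 3, 2], 2: [3, 1, 4, 2], 3: [1, 2, 3, 4], 4: [2, 1, 4, 3]}

if __name__ == "__main__":
    print("men propose:  ", sorted(gale_shapley(MEN, WOMEN).items()))
    print("women propose:", sorted(gale_shapley(WOMEN, MEN).items()))
```

With men proposing, the result is the final match listed above. Swapping the two arguments answers the closing question: as (woman, man) pairs, the women-proposing matching comes out as (1,4), (2,3), (3,1), (4,2).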
winter 2008 Router Design 38
OQ Emulation with a Speedup of 2
• Each input and output maintains a preference list
• Input preference list: list of cells at that input ordered in the inverse order of their arrival
• Output preference list: list of all input cells to be forwarded to that output ordered by the times they would be served in an Output Queueing schedule
• Use GSA to match inputs to outputs– Outputs initiate the matching
• Can emulate all work-conserving schedulers
winter 2008 Router Design 39
Line Cards
• Interfacing
– Physical link
– Switching fabric
• Packet handling
– Packet forwarding (FIB)
– Packet filtering (ACLs)
– Buffer management
– Link scheduling
– Rate-limiting
– Packet marking
– Measurement
[Figure: line card with receive and transmit paths between the link (to/from link) and the switch (to/from switch), and a FIB.]
winter 2008 Router Design 40
Line Card: Abstract view
[Figure: a packet (data plus header) passes through header processing (look up the IP address in the address table, which maps IP address to next hop, then update the header) and is then queued in buffer memory.]
winter 2008 Router Design 41
Line Cards: Longest-Prefix Match Forwarding
• Forwarding Information Base in IP routers
– Maps each IP prefix to next-hop link(s)
• Destination-based forwarding
– Packet has a destination address
– Router identifies longest-matching prefix
– Pushing complexity into forwarding decisions
[Figure: FIB with prefixes 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/24, 126.255.103.0/24; destination 12.34.158.5 maps to outgoing link Serial0/0.1.]
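A minimal sketch of destination-based forwarding over the five prefixes on this slide, using Python's standard `ipaddress` module and a plain linear scan. Only the pairing of 12.34.158.5 with Serial0/0.1 comes from the slide; it is assumed here that this link belongs to 12.34.158.0/24, and the other link names are hypothetical placeholders.

```python
import ipaddress

# Prefixes are from the slide. "link-a" .. "link-d" are hypothetical names;
# attaching Serial0/0.1 to 12.34.158.0/24 is an assumption based on the example.
FIB = {
    "4.0.0.0/8": "link-a",
    "4.83.128.0/17": "link-b",
    "12.0.0.0/8": "link-c",
    "12.34.158.0/24": "Serial0/0.1",
    "126.255.103.0/24": "link-d",
}


def longest_prefix_match(fib, destination):
    """Return (prefix, link) of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, link in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, link)
    return None if best is None else (str(best[0]), best[1])


if __name__ == "__main__":
    print(longest_prefix_match(FIB, "12.34.158.5"))
```

Both 12.0.0.0/8 and 12.34.158.0/24 match 12.34.158.5; the /24 wins because it is longer. A real line card would use a trie or TCAM rather than a scan over every prefix.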
winter 2008 Router Design 42
Line Cards: Packet Forwarding Evolution
• Software on the router CPU
– Central processor makes forwarding decision
– Not scalable to large aggregate throughput
• Route cache on the line card
– Maintain a small FIB cache on each line card
– Store (destination, output link) mappings
– Cache misses handled by the router CPU
• Full FIB on each line card
– Store the entire FIB on each line card
– Apply dedicated hardware for longest-prefix match
winter 2008 Router Design 43
Line Cards: Packet Filtering With Access Control Lists
• “Five tuple” for access control lists (ACLs)
– Source and destination IP addresses
– TCP/UDP source and destination ports
– Protocol (e.g., UDP vs. TCP)
• Should arriving packet be allowed in? Departing packet let out?
winter 2008 Router Design 44
ACL Examples
• Filter packets based on source address
– Customer access link to the service provider
– Source address should fall in customer prefix
• Filter packets based on port number
– Block traffic for unwanted applications
– Known security vulnerabilities, peer-to-peer, …
• Block pairs of hosts from communicating
– Protect access to special servers
– E.g., block the dorms from the grading server
winter 2008 Router Design 45
Line Cards: Mapping Traffic to Classes
• Gold traffic
– All traffic to/from President’s IP address
– All traffic to/from the port number for DNS
• Silver traffic
– All traffic to/from academic and administrative buildings
• Bronze traffic
– All traffic on the public wireless network
• Then, schedule resources accordingly
– 50% for gold, 30% for silver, and 20% for bronze
winter 2008 Router Design 46
Addressing and Look-up
• Flat address
– Ethernet: 48 bit MAC address
– ATM: 28 bit VPI/VCI
– DS-0: timeslot location
• Limited scalability
• High speed lookup
• Hierarchical address
– IP: <network>.<subnet>.<host>
– Telephone: country.area.home
• Scalable
• Easy lookup if boundary is fixed
– telephony
• Difficult lookup if boundary is flexible
– longest prefix match for IP
winter 2008 Router Design 47
Lookups Must be Fast
Year   Line       40-byte packets (Mpkt/s)
1997   622 Mb/s   1.94
1999   2.5 Gb/s   7.81
2001   10 Gb/s    31.25
2003   40 Gb/s    125

1. lookup mechanism must be simple, easy to implement
2. memory access time is the long-term bottleneck
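The packet rates in this table follow from dividing the line rate by the size of a minimum-length 40-byte packet (320 bits). The sketch below redoes that arithmetic and also prints the resulting time budget per lookup; the function names are illustrative.

```python
def packets_per_second(line_rate_bps, packet_bytes=40):
    """Back-to-back packets per second at the given line rate."""
    return line_rate_bps / (packet_bytes * 8)


def lookup_budget_ns(line_rate_bps, packet_bytes=40):
    """Time available per packet, in nanoseconds, to keep up with the line."""
    return 1e9 / packets_per_second(line_rate_bps, packet_bytes)


if __name__ == "__main__":
    for year, rate in [(1997, 622e6), (1999, 2.5e9), (2001, 10e9), (2003, 40e9)]:
        mpps = packets_per_second(rate) / 1e6
        print(f"{year}: {mpps:.2f} Mpkt/s, {lookup_budget_ns(rate):.1f} ns per packet")
```

At 40 Gb/s the budget is 8 ns per packet, so each lookup must finish within roughly one memory access of the fastest memories available, which is why memory access time is the bottleneck.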
winter 2008 Router Design 48
Memory Technology (2003-04)
Technology        Single chip density   $/chip ($/MByte)          Access speed   Watts/chip
Networking DRAM   64 MB                 $30-$50 ($0.50-$0.75)     40-80 ns       0.5-2 W
SRAM              4 MB                  $20-$30 ($5-$8)           4-8 ns         1-3 W
TCAM              1 MB                  $200-$250 ($200-$250)     4-8 ns         15-30 W
Note: price, speed, and power are manufacturer and market dependent
winter 2008 Router Design 49
Lookup Mechanism is Protocol Dependent
Protocol              Mechanism                     Techniques
MPLS, ATM, Ethernet   Exact match search            Direct lookup; associative lookup; hashing; binary/multi-way search trie/tree
IPv4, IPv6            Longest-prefix match search   Radix trie and variants; compressed trie; binary search on prefix intervals
winter 2008 Router Design 50
Exact Matches in Ethernet Switches
• layer-2 addresses usually 48 bits long
• address global, not just local to link
• range/size of address not “negotiable”
• $2^{48} > 10^{12}$, therefore cannot hold all addresses in table and use direct lookup
winter 2008 Router Design 51
Exact Matches in Ethernet Switches (Associative Lookup)
• associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data
[Figure: the 48-bit network address is presented as data to the associative memory (“CAM”); the match location is used as the address into a “normal” memory, whose data output is the port.]
winter 2008 Router Design 52
Exact Matches in Ethernet Switches: Hashing
• use pseudo-random hash function (relatively insensitive to actual function)
• bucket linearly searched (or could be binary search, etc.)
• unpredictable number of memory references
[Figure: the 48-bit network address is hashed to a shorter address (16 bits, say) into a memory whose data is a pointer; the pointer addresses a second memory holding the list/bucket, i.e., the list of network addresses in this bucket.]
winter 2008 Router Design 53
Exact Matches Using Hashing: Number of memory references
Expected number of memory references:
$ER = \frac{1}{2}\left(1 + E[\text{length of list} \mid \text{list not empty}]\right) = \frac{1}{2}\left(1 + \frac{\alpha}{1 - (1 - 1/N)^M}\right)$
Where:
ER = expected number of memory references
M = number of memory addresses in table
N = number of linked lists
α = M/N
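To get a feel for the numbers, the sketch below evaluates the expected number of memory references, assuming the formula reads $ER = \frac{1}{2}(1 + \alpha / (1 - (1 - 1/N)^M))$ with $\alpha = M/N$. This form follows from the fact that a list's expected length is α and the probability that it is non-empty is $1 - (1 - 1/N)^M$. The function name is illustrative.

```python
def expected_memory_references(m, n):
    """ER = 1/2 * (1 + alpha / P(list not empty)), with alpha = m / n.

    m: number of addresses stored in the table
    n: number of linked lists (hash buckets)
    """
    alpha = m / n
    p_not_empty = 1 - (1 - 1 / n) ** m
    return 0.5 * (1 + alpha / p_not_empty)


if __name__ == "__main__":
    for load in (0.25, 0.5, 1.0, 2.0):
        n = 65536  # e.g. a 16-bit hash, as in the hashing figure
        m = int(load * n)
        print(f"alpha = {load}: ER = {expected_memory_references(m, n):.2f}")
```

For α around 1 the expected cost is only about 1.3 references, but this is an average: the worst-case bucket can be much longer, which is the non-determinism noted for hashing.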
winter 2008 Router Design 54
Exact Matches in Ethernet Switches: Perfect Hashing
[Figure: the 48-bit network address is hashed to a shorter address (16 bits, say) into a memory whose data output is the port.]
There always exists a perfect hash function
Goal: with a perfect hash function, memory lookup always takes O(1) memory references
Problem:
- finding a perfect hash function is very complex
- updates?
winter 2008 Router Design 55
Exact Matches in Ethernet Switches: Hashing
• advantages:
– simple
– expected lookup time is small
• disadvantages:
– inefficient use of memory
– non-deterministic lookup time
• attractive for software-based switches, but decreasing use in hardware platforms
winter 2008 Router Design
IP Address Lookup
• routing tables contain (prefix, next hop) pairs
• address in packet compared to stored prefixes, starting at left
• prefix that matches largest number of address bits is desired match
• packet forwarded to specified next hop

routing table
prefix        next hop
01*           5
110*          3
1011*         5
0001*         0
10*           7
0001 0*       1
0011 00*      2
1011 001*     3
1011 010*     5
0101 1*       7
0100 1100*    4
1011 0011*    8
1001 1000*    10
0101 1001*    9
0100 110*     6

address: 1011 0010 1000
Problem: large router may have 100,000 prefixes in its list
winter 2008 Router Design 57
Longest Prefix Match Harder than Exact Match
• destination address of arriving packet does not carry information to determine length of longest matching prefix
• need to search space of all prefix lengths; as well as space of prefixes of given length
winter 2008 Router Design 58
LPM in IPv4: exact match
Use 32 exact match algorithms
[Figure: the network address is matched in parallel by exact match against prefixes of length 1, of length 2, …, of length 32; a priority encoder picks the result and outputs the port.]
winter 2008 Router Design 59
Address Lookup Using Tries
• prefixes “spelled” out by following path from root
• to find best prefix, spell out address in tree
• last green node marks longest matching prefix
• adding prefix easy

P1   111*    H1
P2   10*     H2
P3   1010*   H3
P4   10101   H4

Lookup 10111
add P5 = 1110*
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr
[Figure: binary trie with nodes A to I; 0 branches go left and 1 branches go right; the nodes for P1 to P4 are marked as prefixes, and node I is added for P5.]
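The trie lookup can be sketched as follows, using the trie-node layout named on the slide (a next-hop pointer plus left and right child pointers) and the four prefixes P1 to P4. The class and method names are illustrative. The slide adds P5 = 1110* without giving its next hop, so "H5" in the usage example is a hypothetical value.

```python
class TrieNode:
    def __init__(self):
        self.next_hop = None       # set only if a prefix ends at this node
        self.child = [None, None]  # child[0] = left-ptr, child[1] = right-ptr


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        """Spell out the prefix from the root, creating nodes as needed."""
        node = self.root
        for bit in prefix_bits:
            i = int(bit)
            if node.child[i] is None:
                node.child[i] = TrieNode()
            node = node.child[i]
        node.next_hop = next_hop

    def lookup(self, address_bits):
        """Walk the address bits and remember the last prefix node seen."""
        node = self.root
        best = node.next_hop
        for bit in address_bits:
            node = node.child[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


def build_example_trie():
    trie = BinaryTrie()
    for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
        trie.insert(prefix, hop)
    return trie


if __name__ == "__main__":
    trie = build_example_trie()
    print(trie.lookup("10111"))  # walk stops after 1-0-1; last prefix seen is P2
    trie.insert("1110", "H5")    # adding P5; "H5" is a hypothetical next hop
    print(trie.lookup("11100"))
```

Looking up 10111 passes the P2 node (10*), follows one more branch, and then falls off the trie, so the answer is P2's next hop. Both insert and lookup touch at most W nodes.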
winter 2008 Router Design 60
Binary Tries
• W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
Advantages
• Simplicity
• Extensible to wider fields
Disadvantages
• Worst case lookup slow
• Wastage of storage space in chains
winter 2008 Router Design 61
Leaf-pushed Binary Trie
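Trie node: left-ptr or next-hop, right-ptr or next-hop

P1   111*    H1
P2   10*     H2
P3   1010*   H3
P4   10101   H4

[Figure: leaf-pushed binary trie with internal nodes A, B, C, D, E, G; next hops are pushed down to the leaves, which read P2, P3, P4, P2, and P1.]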
winter 2008 Router Design 62
PATRICIA
• PATRICIA (Practical Algorithm to Retrieve Information Coded in Alphanumeric)
– Eliminate internal nodes with only one descendant
– Encode bit position for determining (right) branching
Patricia tree internal node: bit-position, left-ptr, right-ptr

P1   111*    H1
P2   10*     H2
P3   1010*   H3
P4   10101   H4
Bitpos 12345

Lookup 10111
[Figure: Patricia tree whose internal nodes are labeled with the bit position to test (2, 3, 5) and whose leaves are P1 to P4.]
winter 2008 Router Design 63
PATRICIA
• W-bit prefixes: $O(W^2)$ lookup, O(N) storage and O(W) update complexity
Advantages
• decreased storage
• extensible to wider fields
Disadvantages
• worst case lookup slow
• backtracking makes implementation complex
winter 2008 Router Design 64
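Path-compressed Tree
Path-compressed tree node structure: variable-length bitstring, next-hop (if prefix present), bit-position, left-ptr, right-ptr

P1   111*    H1
P2   10*     H2
P3   1010*   H3
P4   10101   H4

Lookup 10111
[Figure: path-compressed tree with nodes A to E; node labels (bitstring, prefix, bit position) include (1, , 2), (10, P2, 4), and (1010, P3, 5); leaves hold P1 and P4.]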
winter 2008 Router Design 65
Path-compressed Tree
• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity
Advantages
• decreased storage
Disadvantages
• worst case lookup slow
winter 2008 Router Design 66
Multi-bit Tries
Binary trie: Depth = W, Degree = 2, Stride = 1 bit
Multi-ary trie: Depth = W/k, Degree = $2^k$, Stride = k bits
winter 2008 Router Design 67
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded
E.g., k = 2:
Prefix   Expanded prefixes
0*       00*, 01*
11*      11*
Maximum number of expanded prefixes corresponding to one non-expanded prefix = $2^{k-1}$
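Prefix expansion itself is a few lines of code: pad the prefix up to the next multiple of the stride with every possible bit combination. The sketch below reproduces the k = 2 example; the function name is illustrative.

```python
from itertools import product


def expand_prefix(prefix_bits, stride):
    """Expand a prefix so that its length becomes a multiple of the stride."""
    pad = (-len(prefix_bits)) % stride  # number of bits to fill in
    return [prefix_bits + "".join(tail) for tail in product("01", repeat=pad)]


if __name__ == "__main__":
    print(expand_prefix("0", 2))   # 0*  -> 00*, 01*
    print(expand_prefix("11", 2))  # 11* -> 11* (already a multiple of k)
```

The worst case is a prefix one bit longer than a multiple of k, which needs k-1 padding bits and therefore yields $2^{k-1}$ expanded prefixes.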
winter 2008 Router Design 68
4-ary Trie (k=2)
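A four-ary trie node: next-hop-ptr (if prefix), ptr00, ptr01, ptr10, ptr11

P1   111*    H1
P2   10*     H2
P3   1010*   H3
P4   10101   H4

Lookup 10111
[Figure: four-ary trie with nodes A to H and 2-bit branch labels (10, 11); expanded prefixes P11, P12 (from P1) and P41, P42 (from P4) appear alongside P2 and P3.]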
winter 2008 Router Design 69
Prefix Expansion Increases Storage Consumption
• replication of next-hop ptr
• greater number of unused (null) pointers in a node
Time ~ W/k
Storage ~ (NW/k) * $2^{k-1}$
winter 2008 Router Design 70
Generalization: Different Strides at Each Trie Level
• 16-8-8 split
• 4-10-10-8 split
• 24-8 split
• 21-3-8 split
winter 2008 Router Design 71
Choice of Strides: Controlled Prefix Expansion
Given forwarding table and desired number of memory accesses in worst case (i.e., maximum tree depth, D)
A dynamic programming algorithm computes the optimal sequence of strides that minimizes storage requirements: runs in $O(W^2 D)$ time
Advantages
• Optimal storage under these constraints
Disadvantages
• Updates lead to sub-optimality anyway
• Hardware implementation difficult
winter 2008 Router Design 72
Fast IP Address Lookup Algorithms
• Lulea’s Algorithm (SIGCOMM 1997)
– Key goal: compactly represent routing table in small memory (hopefully, within cache size), to minimize memory access
– Use a three-level data structure
  • Cut the look-up tree at level 16 and level 24
– Clever ways to design compact data structures to represent routing look-up info at each level
• Binary Search on Levels (SIGCOMM 1997)
– Represent look-up tree as array of hash tables
– Notion of “marker” to guide binary search
– Prefix expansion to reduce size of array (thus memory accesses)
winter 2008 Router Design 73
Packet Classification
• general router mechanism
– firewalls
– network address translation
– web server load balancing
– special processing for selected flows
• common form based on 5 IP header fields
– source/dest. addr.: either/both specified by prefixes
– protocol field: may be “wild-card”
– source/dest. port #s (TCP/UDP): may be port ranges
• no ideal design
– exhaustive search: slow links, few filters
– ternary content-addressable memory: exhaustive search
– efficient special cases: exact match, one or two address prefixes
winter 2008 Router Design 74
Packet Classification
Packet Classification: find action associated with highest priority rule matching incoming packet header
         Field 1        Field 2          …   Field k   Action
Rule 1   5.3.40.0/21    2.13.8.11/32     …   UDP       A1
Rule 2   5.168.3.0/24   152.133.0.0/16   …   TCP       A2
…        …              …                …   …         …
Rule N   5.168.0.0/16   152.0.0.0/8      …   ANY       AN

Example: packet (5.168.3.32, 152.133.171.71, …, TCP)
Packet fields: L3-DA, L3-SA, …, L4-PROT
winter 2008 Router Design 75
Formal Problem Definition
Given classifier C with N rules, $R_j$, $1 \le j \le N$, where $R_j$ consists of three entities:
1) a regular expression $R_j[i]$, $1 \le i \le d$, on each of the d header fields,
2) a number, $pri(R_j)$, indicating the priority of the rule in the classifier, and
3) an action, referred to as $action(R_j)$.
For incoming packet P with header considered as d-tuple of points $(P_1, P_2, …, P_d)$, the d-dimensional packet classification problem is to find rule $R_m$ with highest priority among all rules $R_j$ matching d-tuple; i.e., $pri(R_m) > pri(R_j)$, $\forall j \ne m$, $1 \le j \le N$, such that $P_i$ matches $R_j[i]$, $1 \le i \le d$. Rule $R_m$ is best matching rule for packet P.
winter 2008 Router Design 76
Routing Lookup: Instance of 1D Classification
• one-dimension (destination address)
• forwarding table ↔ classifier
• routing table entry ↔ rule
• outgoing interface ↔ action
• prefix-length ↔ priority
winter 2008 Router Design 77
Example 4D Classifier
Rule   L3-DA                             L3-SA                             L4-DP         L3-PROT   Action
R1     152.163.190.69/255.255.255.255    152.163.80.11/255.255.255.255     *             *         Deny
R2     152.168.3/255.255.255             152.163.200.157/255.255.255.255   eq www        udp       Deny
R3     152.168.3/255.255.255             152.163.200.157/255.255.255.255   range 20-21   udp       Permit
R4     152.168.3/255.255.255             152.163.200.157/255.255.255.255   eq www        tcp       Deny
R5     *                                 *                                 *             *         Deny
winter 2008 Router Design 78
Example Classification Results
Pkt Hdr   L3-DA            L3-SA             L4-DP   L3-PROT   Rule, Action
P1        152.163.190.69   152.163.80.11     www     tcp       R1, Deny
P2        152.168.3.21     152.163.200.157   www     udp       R2, Deny
winter 2008 Router Design 79
Geometric Interpretation
[Figure: rules R1 to R7 drawn as rectangles in the plane of Dimension 1 and Dimension 2, e.g. (128.16.46.23, *) and (144.24/24, 64/16); packets P1 and P2 are points.]
Packet classification problem: Find the highest priority rectangle containing an incoming point
winter 2008 Router Design 80
Linear Search
• keep rules in a linked list
• O(N) storage, O(N) lookup time, O(1) update complexity
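A sketch of linear-search classification over the rules R1 to R5 of the earlier example 4D classifier. Several details are assumptions made for illustration: the rules are kept in a Python list in priority order rather than a linked list, "eq www" is taken to mean destination port 80, and 152.168.3/255.255.255 is written as 152.168.3.0/24. The names `RULES` and `classify` are not from the slides.

```python
import ipaddress

ANY = None  # wildcard field ("*")
WWW = 80    # assumption: "eq www" means destination port 80

# (name, dest prefix, source prefix, (port_lo, port_hi), protocol, action),
# listed in priority order, highest priority first.
RULES = [
    ("R1", "152.163.190.69/32", "152.163.80.11/32", ANY, ANY, "Deny"),
    ("R2", "152.168.3.0/24", "152.163.200.157/32", (WWW, WWW), "udp", "Deny"),
    ("R3", "152.168.3.0/24", "152.163.200.157/32", (20, 21), "udp", "Permit"),
    ("R4", "152.168.3.0/24", "152.163.200.157/32", (WWW, WWW), "tcp", "Deny"),
    ("R5", ANY, ANY, ANY, ANY, "Deny"),
]


def _in_prefix(addr, prefix):
    """True if the prefix is a wildcard or contains the address."""
    return prefix is ANY or ipaddress.ip_address(addr) in ipaddress.ip_network(prefix)


def classify(rules, dst, src, dport, proto):
    """Scan rules in priority order; return (rule name, action) of the first match."""
    for name, dst_pfx, src_pfx, ports, rule_proto, action in rules:
        if (_in_prefix(dst, dst_pfx)
                and _in_prefix(src, src_pfx)
                and (ports is ANY or ports[0] <= dport <= ports[1])
                and (rule_proto is ANY or rule_proto == proto)):
            return name, action
    return None


if __name__ == "__main__":
    print(classify(RULES, "152.163.190.69", "152.163.80.11", WWW, "tcp"))
    print(classify(RULES, "152.168.3.21", "152.163.200.157", WWW, "udp"))
```

The two calls correspond to packets P1 and P2 in the example classification results. Every packet may have to be compared against all N rules, which is the O(N) lookup time.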
winter 2008 Router Design 81
Ternary Match Operation
• Each TCAM entry stores a value, V, and mask, M
• Hence, two bits ($V_i$ and $M_i$) for each bit position i (i = 1..W)
• For an incoming packet header, H = {$H_i$}, the TCAM entry outputs a match if $H_i$ matches $V_i$ in each bit position for which $M_i$ equals ‘1’.

$V_i$   $M_i$   Match in bit position i?
X       0       Yes
0       1       Iff ($H_i$ == 0)
1       1       Iff ($H_i$ == 1)
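In software, the per-bit rule in this table collapses to a single bitwise expression: a header matches when it agrees with the value on every bit where the mask is 1. The sketch below also models the priority encoder as "lowest-index matching entry wins", which is an assumption about entry ordering; the 8-bit entries and function names are illustrative only.

```python
def ternary_match(header, value, mask):
    """True if header equals value on every bit position where mask is 1."""
    return ((header ^ value) & mask) == 0


def tcam_lookup(entries, header):
    """Return the index of the first matching (value, mask) entry, or None.

    A hardware TCAM compares all entries in parallel; the loop here only
    models the result that the priority encoder would report.
    """
    for index, (value, mask) in enumerate(entries):
        if ternary_match(header, value, mask):
            return index
    return None


# Hypothetical 8-bit entries: 1011xxxx first, then the less specific 10xxxxxx.
ENTRIES = [
    (0b10110000, 0b11110000),
    (0b10000000, 0b11000000),
]

if __name__ == "__main__":
    for header in (0b10110101, 0b10010101, 0b01000000):
        print(f"{header:08b} -> entry {tcam_lookup(ENTRIES, header)}")
```

Placing more specific entries at lower indices is what lets the same structure perform longest-prefix matching.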
winter 2008 Router Design 82
Lookups/Classification with Ternary CAM
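[Figure: the packet header is presented to the TCAM memory array (entries 0 to M), with example entries "1.23.11.3, tcp" and "1.23.x.x, x"; each entry outputs a match bit (0 or 1); a priority encoder selects one matching entry, which indexes the action memory (RAM) to produce the action.]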
winter 2008 Router Design 83
Lookups/Classification with Ternary CAM
[Figure: the same TCAM arrangement used for LPM: entries are grouped by prefix length from P32 and P31 down to P8, with example entries 1.23.11.3 and 1.23.x.x, so that the priority encoder selects among the matching entries.]
winter 2008 Router Design 84
Range-to-prefix Blowup
• prefixes easier to handle than ranges
• can transform ranges to prefixes
• Range-to-prefix blowup problem
winter 2008 Router Design 85
Range-to-prefix Blowup
Rule   Range     Maximal Prefixes
R1     [3,11]    0011, 01**, 10**
R2     [2,7]     001*, 01**
R3     [4,11]    01**, 10**
R4     [4,7]     01**
R5     [1,14]    0001, 001*, 01**, 10**, 110*, 1110

Maximum memory blowup = factor of $(2W-2)^d$
Luckily, real-life does not see too many arbitrary ranges.
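The maximal-prefix decomposition in this table can be computed greedily: starting at the low end of the range, repeatedly take the largest aligned power-of-two block that still fits. The sketch below does this for W-bit fields and reproduces the five rows with W = 4; the function name is illustrative.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into maximal prefixes of a width-bit field."""
    prefixes = []
    while lo <= hi:
        # Largest block size allowed by the alignment of lo (lowest set bit).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1  # number of trailing wildcard bits
        fixed = width - wild          # number of fixed leading bits
        head = format(lo >> wild, f"0{fixed}b") if fixed else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes


if __name__ == "__main__":
    for name, (lo, hi) in {"R1": (3, 11), "R2": (2, 7), "R3": (4, 11),
                           "R4": (4, 7), "R5": (1, 14)}.items():
        print(name, [lo, hi], ", ".join(range_to_prefixes(lo, hi, 4)))
```

R5 = [1,14] hits the worst case for W = 4: it needs 2W - 2 = 6 prefixes, and a rule with d such range fields multiplies the counts, giving the $(2W-2)^d$ factor.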
winter 2008 Router Design 86
TCAMs
Advantages
• extensible to multiple fields
• fast: 10-16 ns today (66-100 M searches per second), going to 250 Msps
• simple to understand and use
Disadvantages
• inflexible: range-to-prefix blowup
• high power, cost
• low density: largest available in 2003-4 is ~2 MB, i.e., 128K x 128 (can be cascaded)
winter 2008 Router Design 87
Example Classifier
Rule   Destination Address   Source Address
R1     0*                    10*
R2     0*                    01*
R3     0*                    1*
R4     00*                   1*
R5     00*                   11*
R6     10*                   1*
R7     *                     00*
winter 2008 Router Design 88
Hierarchical Tries
O(NW) memory
$O(W^2)$ lookup

Rule   DA    SA
R1     0*    10*
R2     0*    01*
R3     0*    1*
R4     00*   1*
R5     00*   11*
R6     10*   1*
R7     *     00*

[Figure: a trie on dimension DA whose prefix nodes point to tries on dimension SA; the SA tries hold rules R1 to R7.]
Search (000,010)
winter 2008 Router Design 89
Set-pruning Tries
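$O(N^2)$ memory
O(2W) lookup

Rule   DA    SA
R1     0*    10*
R2     0*    01*
R3     0*    1*
R4     00*   1*
R5     00*   11*
R6     10*   1*
R7     *     00*

[Figure: a trie on dimension DA whose prefix nodes point to tries on dimension SA; rules are replicated into the SA tries (e.g., R7, R2, R1 appear more than once).]
Search (000,010)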
winter 2008 Router Design 90
Grid-of-Tries
O(NW) memory
O(2W) lookup

Rule   DA    SA
R1     0*    10*
R2     0*    01*
R3     0*    1*
R4     00*   1*
R5     00*   11*
R6     10*   1*
R7     *     00*

[Figure: a trie on dimension DA whose prefix nodes point to tries on dimension SA holding R1 to R7; switch pointers link the SA tries.]
Search (000,010)
winter 2008 Router Design 91
Grid-of-Tries
Advantages
• good solution for two dimensions
Disadvantages
• difficult to carry out updates
• not easily extensible to more than two dimensions
20K 2D rules: 2 MB, 9 memory accesses (with prefix-expansion)
winter 2008 Router Design 92
Classification Algorithms: Speed vs. Storage Tradeoff
$O(\log N)$ time with $O(N^d)$ storage, or $O(\log^{d-1} N)$ time with $O(N)$ storage
Lower bounds for Point Location in N regions with d dimensions from Computational Geometry
N = 100, d = 4: $N^d$ = 100 MBytes and $\log^{d-1} N$ = 350 memory accesses
winter 2008 Router Design 93
Packet Classification Summary
• Algorithms discussed so far
– good for two fields, but don’t scale to more than two fields, OR
– good for very small classifiers (< 50 rules) only, OR
– have non-deterministic classification time, etc.
• Heuristic-Based Algorithms
– Recursive Flow Classification (RFC)
  • Exploit structure of classifiers, recursively reduce rule space
– Hierarchical Intelligent Cuttings (HiCuts)
  • Use heuristics to reduce d-dim search space into sub-spaces
– Tuple Space Search
  • decompose query into a number of exact match queries
  • store rules into hash table
winter 2008 Router Design 94
Example of Packet Flow in RFC
winter 2008 Router Design 95
RFC Example
• Four fields → six chunks
– Source and destination IP addresses → two chunks each
– Protocol number → one chunk
– Destination port number → one chunk
winter 2008 Router Design 96
Lookup: What’s Used Out There?
• overwhelming majority of routers:
– modifications of multi-bit tries (h/w optimized trie algorithms)
– DRAM (sometimes SRAM) based, large number of routes (>0.25M)
– parallelism required for speed/storage becomes an issue
• others mostly TCAM based
– for smaller number of routes (256K)
– used more frequently in L2/L3 switches
– power and cost main bottlenecks
winter 2008 Router Design 97
Classification: What’s Used Out There?
• majority of hardware platforms: TCAMs
– High performance, cost, power, deterministic worst-case
• some others: Modifications of RFC
– Low speed, low cost DRAM-based, heuristic
– Works well in software platforms
• some others: nothing/linear search/simulated-parallel-search etc.