CS 3035/GZ01: Networked Systems Kyle Jamieson Department of Computer Science University College London Inside Internet Routers

Upload: tiva

Post on 24-Feb-2016


Page 1: Inside Internet Routers

CS 3035/GZ01: Networked Systems
Kyle Jamieson

Department of Computer Science, University College London

Inside Internet Routers

Page 2: Inside Internet Routers

Today: Inside Internet routers

1. Longest-prefix lookup for IP forwarding
   – The Luleå algorithm

2. Router architecture

Cisco CRS-1 Carrier Routing System

Networked Systems 3035/GZ01

Page 3: Inside Internet Routers

The IP forwarding problem
• Core Internet links have extremely fast line speeds:
  – SONET optical fiber links
    • OC-48 @ 2.4 Gbits/s: backbones of secondary ISPs
    • OC-192 @ 10 Gbits/s: widespread in the core
    • OC-768 @ 40 Gbits/s: found in many core links

• Internet routers must handle minimum-sized packets (40−64 bytes) at the line speed of the link
  – Minimum-sized packets are the hardest case for IP lookup
    • They require the most lookups per second
    • At 10 Gbits/s, the router has 32−51 ns to decide for each packet
    • Compare: DRAM latency ≈ 50 ns; SRAM latency ≈ 5 ns
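The 32−51 ns figure is just packet size divided by line rate. A minimal Python sketch of that arithmetic, assuming back-to-back minimum-sized packets at exactly 10 Gbits/s (the function name `time_budget_ns` is hypothetical):

```python
def time_budget_ns(packet_bytes: int, line_rate_bps: int) -> float:
    """Time (ns) between back-to-back packets of this size at this line rate."""
    return packet_bytes * 8 * 10**9 / line_rate_bps

LINE_RATE = 10 * 10**9  # 10 Gbits/s (OC-192, rounded as on the slide)

for size in (40, 64):
    print(f"{size}-byte packets: {time_budget_ns(size, LINE_RATE):.1f} ns per lookup")
# 40-byte packets: 32.0 ns per lookup
# 64-byte packets: 51.2 ns per lookup
```

At 32 ns per packet, a single DRAM access (≈ 50 ns) already exceeds the budget, while about six SRAM accesses (≈ 5 ns each) fit.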

Page 4: Inside Internet Routers

The IP forwarding problem (2)
• Given an incoming packet with IP destination D, choose an output port for the packet by the longest prefix match rule
  – Then configure the switching fabric to connect the input port of the packet to the chosen output port

• What kind of data structure can we use for longest prefix match?

Page 5: Inside Internet Routers

Longest prefix match (LPM)
• Given an incoming IP datagram to destination D
  – The forwarding table maps a prefix to an outgoing port:

     Prefix          Port
     192.0.0.0/4     2
     4.83.128.0/17   1
     201.10.0.0/21   3
     201.10.6.0/23   2

• Longest prefix match rule: Choose the forwarding table entry with the longest prefix P/x that matches D in the first x bits

• Forward the IP datagram out the chosen entry's outgoing port

Page 6: Inside Internet Routers

Computing the longest prefix match (LPM)

• The destination matches forwarding table entry 192.0.0.0/4

Destination: 201.10.7.17

Forwarding Table:
     Prefix          Port
  ✔  192.0.0.0/4     2
     4.83.128.0/17   1
     201.10.0.0/21   3
     201.10.6.0/23   2

IP Header
  Destination (201.10.7.17):   11001001 00001010 00000111 00010001
  Prefix /4 (192.0.0.0):       11000000  (the first 4 bits, 1100, match)

Page 7: Inside Internet Routers

Computing the longest prefix match (LPM)

• No match for forwarding table entry 4.83.128.0/17

Destination: 201.10.7.17

Forwarding Table:
     Prefix          Port
  ✔  192.0.0.0/4     2
  ✗  4.83.128.0/17   1
     201.10.0.0/21   3
     201.10.6.0/23   2

IP Header
  Destination (201.10.7.17):   11001001 00001010 00000111 00010001
  Prefix /17 (4.83.128.0):     00000100 01010011 10000000  (the first 17 bits differ)

Page 8: Inside Internet Routers

Computing the longest prefix match (LPM)

• The destination matches forwarding table entry 201.10.0.0/21

Destination: 201.10.7.17

Forwarding Table:
     Prefix          Port
  ✔  192.0.0.0/4     2
  ✗  4.83.128.0/17   1
  ✔  201.10.0.0/21   3
     201.10.6.0/23   2

IP Header
  Destination (201.10.7.17):   11001001 00001010 00000111 00010001
  Prefix /21 (201.10.0.0):     11001001 00001010 00000000  (the first 21 bits match)

Page 9: Inside Internet Routers

Computing the longest prefix match (LPM)

• The destination matches forwarding table entry 201.10.6.0/23

Destination: 201.10.7.17

Forwarding Table:
     Prefix          Port
  ✔  192.0.0.0/4     2
  ✗  4.83.128.0/17   1
  ✔  201.10.0.0/21   3
  ✔  201.10.6.0/23   2

IP Header
  Destination (201.10.7.17):   11001001 00001010 00000111 00010001
  Prefix /23 (201.10.6.0):     11001001 00001010 00000110  (the first 23 bits match)

Page 10: Inside Internet Routers

Computing the longest prefix match (LPM)

• Applying the longest prefix match rule, we consider only the three matching prefixes

• We choose the longest, 201.10.6.0/23

Destination: 201.10.7.17

Forwarding Table:
     Prefix          Port
  ✔  192.0.0.0/4     2
  ✗  4.83.128.0/17   1
  ✔  201.10.0.0/21   3
  ✔  201.10.6.0/23   2
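The linear scan walked through by hand above can be sketched in a few lines of Python using the standard `ipaddress` module. This illustrates the rule and is not router code; `TABLE` and `lpm_linear` are names introduced here:

```python
import ipaddress

# Forwarding table from the slides: (prefix, outgoing port)
TABLE = [
    ("192.0.0.0/4", 2),
    ("4.83.128.0/17", 1),
    ("201.10.0.0/21", 3),
    ("201.10.6.0/23", 2),
]

def lpm_linear(table, dest):
    """Linear-scan longest prefix match: test every entry, keep the longest match."""
    addr = ipaddress.IPv4Address(dest)
    best_len, best_port = -1, None
    for prefix, port in table:
        net = ipaddress.IPv4Network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port  # None if no entry matches

print(lpm_linear(TABLE, "201.10.7.17"))  # 2 (via 201.10.6.0/23)
```

Every lookup touches every entry, which is exactly the cost problem raised on the next slide.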

Page 11: Inside Internet Routers

LPM: Performance
• How fast does the preceding algorithm run?

• The number of steps is linear in the size of the forwarding table
  – Today, that means 200,000−250,000 entries!
  – And the router may have just tens of nanoseconds before the next packet arrives
  – Recall: DRAM latency ≈ 50 ns; SRAM latency ≈ 5 ns

• So, we need much greater efficiency to keep up with line speed
  – Better algorithms
  – Hardware implementations

• Algorithmic problem: How do we do LPM faster than a linear scan?

Page 12: Inside Internet Routers

First attempt: Binary tree
• Store routing table prefixes and outgoing ports in a binary tree

• For each routing table prefix P/x:
  – Begin at the root of the binary tree
  – Start from the most-significant bit of the prefix
  – Traverse down the ith-level branch corresponding to the ith most significant bit of the prefix; store the prefix and port at depth x

[Figure: binary tree with depth measured in prefix length (bits). The root's 0 and 1 branches lead to 00/1 and 80/1; 80/1's branches lead to 80/2 and C0/2. Note convention: prefixes are written in hexadecimal notation.]

Page 13: Inside Internet Routers

Example: 192.0.0.0/4 in the binary tree

Forwarding Table:
     Prefix          Port
     192.0.0.0/4     2

192.0.0.0/4: 0xC0 = 11000000

[Figure: from the root, follow bits 1, 1, 0, 0: Root → 80/1 → C0/2 → C0/3 → C0/4, where node C0/4 stores Port 2.]

Page 14: Inside Internet Routers

Routing table lookups with the binary tree
• When a packet arrives:
  – Walk down the tree based on the destination address
  – The deepest node on that path that stores a prefix gives the longest-prefix match

• How much time do we need to perform an IP lookup?
  – We still need to keep the big routing table in slow DRAM
  – In the worst case, lookup time scales directly with the number of bits in the longest prefix, each of which involves a slow memory lookup
  – Back-of-the-envelope calculation:
    • 20-bit prefix × 50 ns DRAM latency = 1 μs (goal: ≈ 32 ns)
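A minimal sketch of the binary tree (a binary trie) in Python, assuming one node per bit and counting each step below the root as one memory read. The names `TrieNode`, `insert`, and `lookup` are hypothetical:

```python
import ipaddress

class TrieNode:
    """One node of the binary tree; `port` is set if a prefix ends here."""
    __slots__ = ("child", "port")

    def __init__(self):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.port = None

def insert(root, prefix, port):
    """Walk down from the most-significant bit and store the port at depth x."""
    net = ipaddress.IPv4Network(prefix)
    value, length = int(net.network_address), net.prefixlen
    node = root
    for i in range(length):
        bit = (value >> (31 - i)) & 1
        if node.child[bit] is None:
            node.child[bit] = TrieNode()
        node = node.child[bit]
    node.port = port

def lookup(root, dest):
    """Return (port of the deepest matching prefix, nodes visited below the root)."""
    addr = int(ipaddress.IPv4Address(dest))
    node, best, depth = root, root.port, 0
    while depth < 32:
        nxt = node.child[(addr >> (31 - depth)) & 1]
        if nxt is None:
            break
        node, depth = nxt, depth + 1
        if node.port is not None:
            best = node.port
    return best, depth

root = TrieNode()
for prefix, port in [("192.0.0.0/4", 2), ("4.83.128.0/17", 1),
                     ("201.10.0.0/21", 3), ("201.10.6.0/23", 2)]:
    insert(root, prefix, port)

port, reads = lookup(root, "201.10.7.17")
print(port, reads, f"{reads * 50} ns at 50 ns per DRAM read")  # 2 23 1150 ns at 50 ns per DRAM read
```

The lookup for 201.10.7.17 walks 23 levels; at 50 ns per DRAM read that is 1150 ns, the same order of magnitude as the 1 μs estimate above.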

Page 15: Inside Internet Routers

Luleå algorithm
• Degermark et al., Small Forwarding Tables for Fast Routing Lookups (ACM SIGCOMM '97)

• Observation: The binary tree is too large
  – It won't fit in fast CPU cache memory in a software router
  – Memory bandwidth becomes the limiting factor in a hardware router

• Therefore, the goal becomes: How can we minimize memory accesses and the size of the data structure?
  – Method for compressing the binary tree
  – Compact 40K routing entries into 150−160 Kbytes
  – So we can use SRAM for the lookup, and thus perform IP lookup at Gigabit line rates!

Page 16: Inside Internet Routers

Luleå algorithm: Binary tree
• The full binary tree has a height of 32 (the number of bits in an IP address)
  – So it has 2^32 leaves (one for each possible IP address)

• Luleå stores prefixes of different lengths differently
  – Level 1 stores prefixes ≤ 16 bits in length
  – Level 2 stores prefixes 17-24 bits in length
  – Level 3 stores prefixes 25-32 bits in length

[Figure: the IP address space, 2^32 possible addresses, bits 31 … 0.]

Page 17: Inside Internet Routers

Luleå algorithm: Level 1
• Imagine a cut across the binary tree at level 16

• Construct a bit vector of length 2^16 that contains information about the routing table entries at or above the cut
  – The bit vector stores one bit per /16 prefix

• Let's zoom in on the binary tree here:

[Figure: the IP address space (2^32 possible addresses, bits 31 … 0) with the level-16 cut marked.]

Page 18: Inside Internet Routers

Constructing the bit vector
• Put a 1 in the bit vector wherever there is a routing table entry at depth ≤ 16
  – If the entry is above level 16, follow pointers left (i.e., 0) down to level 16

  Bit vector: 1 1 …

• These routing table entries are called genuine heads

Page 19: Inside Internet Routers

Constructing the bit vector
• Put a 0 in the bit vector wherever a genuine head at depth < 16 contains this /16 prefix

  Bit vector: 1 0 1 …

• For example, 0000/15 contains 0001/16 (they have the same next hop)

• These /16 positions are called members

Page 20: Inside Internet Routers

Constructing the bit vector
• Put a 1 in the bit vector wherever there are routing entries at depth > 16, below the cut

  Bit vector: 1 0 1 1 …

• These bits correspond to root heads, and are stored in Levels 2 and 3 of the data structure
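The three rules above can be combined into one pass over a completed prefix tree (tree completion is explained a few slides later). The following Python sketch is one way to do it, under stated assumptions: prefixes are 32-bit integers, the tree is completed by adding siblings that copy the nearest enclosing next hop (`None` standing for "no route"), and a completed leaf above the cut marks only its leftmost /16 position. All names are hypothetical, not taken from the Luleå paper:

```python
class Node:
    """Binary-tree node; next_hop is set where a routing entry (or added sibling) ends."""
    def __init__(self, next_hop=None):
        self.child = [None, None]
        self.next_hop = next_hop

    def is_leaf(self):
        return self.child[0] is None and self.child[1] is None

def insert(root, prefix, length, next_hop):
    """prefix is a 32-bit integer; only its top `length` bits are used."""
    node = root
    for i in range(length):
        b = (prefix >> (31 - i)) & 1
        if node.child[b] is None:
            node.child[b] = Node()
        node = node.child[b]
    node.next_hop = next_hop

def complete(node, inherited=None):
    """Give every internal node two children; an added sibling copies the
    nearest enclosing next hop (None stands for 'no route')."""
    hop = node.next_hop if node.next_hop is not None else inherited
    if node.is_leaf():
        node.next_hop = hop
        return
    for b in (0, 1):
        if node.child[b] is None:
            node.child[b] = Node(hop)
        else:
            complete(node.child[b], hop)

def level1_bit_vector(root, cut=16):
    bits = [0] * (1 << cut)

    def walk(node, depth, path):
        if depth == cut:              # genuine head at depth 16, or a root head
            bits[path] = 1            # (a node with entries below the cut)
        elif node.is_leaf():          # genuine head above the cut: follow 0s
            bits[path << (cut - depth)] = 1   # down to its leftmost /16 leaf
        else:
            walk(node.child[0], depth + 1, path << 1)
            walk(node.child[1], depth + 1, (path << 1) | 1)

    walk(root, 0, 0)
    return bits                       # every position left at 0 is a member

root = Node()
insert(root, 0x00000000, 15, "A")     # 0000/15: genuine head; 0001/16 is a member
insert(root, 0x00020000, 16, "B")     # 0002/16: genuine head
insert(root, 0x00030000, 24, "C")     # below the cut: 0003/16 becomes a root head
complete(root)
bits = level1_bit_vector(root)
print(bits[:4])                       # [1, 0, 1, 1]
```

For these three example routes the vector starts 1 0 1 1, as on the slides. The other set bits come from the "no route" siblings added while completing the tree along the all-zero path.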

Page 21: Inside Internet Routers

Bit masks

• The bit vector is divided into 2^12 bit masks of 16 bits each

• From the figure below, notice:
  – The first 16 bits of the lookup IP address index the bit in the bit vector
  – The first 12 bits of the lookup IP address index the containing bit mask

[Figure: the bit mask under node 000/12 (depth 12) covers the 16 depth-16 positions starting at 0000/16; its bits are 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1.]

Page 22: Inside Internet Routers

The pointer array
• To compress the routing table, Luleå stores neither the binary tree nor the bit vector

• Instead, the pointer array has one entry per set bit (i.e., equal to 1) in the bit mask
  – For genuine heads, the pointer field indexes into a next-hop table
  – For root heads, the pointer field indexes into a table of Level 2 chunks

• Given a lookup IP, how do we compute the correct index into the pointer array, pix?
  – There is no simple relationship between the lookup IP and pix

[Figure: each pointer array entry has a 2-bit type and a 14-bit pointer. "Genuine" entries hold a next-hop index; "Root" entries hold an L2 chunk index. The entry at index pix corresponds to one set bit of the bit mask 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1.]
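The 2-bit type / 14-bit pointer layout can be made concrete with a little bit packing. This is a sketch only: the numeric type codes below are hypothetical, since the slide gives the field widths but not the encoding:

```python
TYPE_GENUINE, TYPE_ROOT = 0, 1   # hypothetical type codes (not given on the slide)

def pack_pointer(kind: int, index: int) -> int:
    """Pack a 2-bit type and a 14-bit pointer into one 16-bit pointer-array entry."""
    if not (0 <= kind < 4 and 0 <= index < (1 << 14)):
        raise ValueError("type must fit in 2 bits and pointer in 14 bits")
    return (kind << 14) | index

def unpack_pointer(entry: int):
    """Return (type, pointer) from a 16-bit entry."""
    return entry >> 14, entry & 0x3FFF

entry = pack_pointer(TYPE_ROOT, 1234)
print(hex(entry), unpack_pointer(entry))  # 0x44d2 (1, 1234)
```

With 14 pointer bits, each type can address at most 16,384 next-hop entries or Level 2 chunks.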

Page 23: Inside Internet Routers

Finding the pointer group: Code word array
• Group the pointers associated with each bit mask together (pointer group)

[Figure: the bit vector starting at 0000/16. Bit mask for 000/12 (nine bits set): 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1. Bit mask for 001/12 (five bits set): 1 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0.]

  code (2^12 entries), field six:
    000:  0
    001:  9
    002: 14

• In its six field, the code word array stores the index in the pointer array where each pointer group begins
  – code has 2^12 entries, one for each bit mask
  – It is indexed by the top 12 bits of the lookup IP

Page 24: Inside Internet Routers

Finding the pointer group: Base index array
• After four bit masks, up to 4 × 16 = 64 bits could be set, too many to represent in the 6-bit six field, so we "reset" six to zero every fourth bit mask

• The base index array base stores the index in the pointer array where each group of four bit masks begins
  – It is indexed by the top 10 bits of the lookup IP

  Bit masks: 000/12 (9 bits set), 001/12 (5 bits set), 002/12 (2 bits set), 003/12 (2 bits set), 004/12 (2 bits set); 000/12−003/12 fall under 000/10, and 004/12 falls under 004/10

  code (2^12 entries), field six:
    000:  0
    001:  9
    002: 14
    003: 16
    004:  0

  base (2^10 entries of 16 bits):
    000:  0
    001: 18

So, the pointer group begins at base[IP[31:22]] + code[IP[31:20]].six
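Building the six fields and the base array amounts to a single running count of set bits. A Python sketch, assuming the bit masks are given as 16-bit integers indexed by the top 12 address bits; the masks for 002/12−004/12 are hypothetical, since the slide only gives how many bits they have set:

```python
def build_code_six_and_base(masks):
    """masks[ix] is the 16-bit bit mask for 12-bit index ix.

    Returns (six, base): six[ix] is the offset of mask ix's pointer group
    relative to base[ix >> 2], where each group of four masks begins.
    """
    six, base = [], []
    total = 0                      # pointers emitted so far = set bits seen so far
    for ix, mask in enumerate(masks):
        if ix % 4 == 0:            # "reset" six every fourth bit mask
            base.append(total)
        offset = total - base[-1]  # at most 3 x 16 = 48, so it fits in six bits
        if offset >= 64:
            raise ValueError("offset does not fit in the 6-bit six field")
        six.append(offset)
        total += bin(mask).count("1")
    return six, base

# Masks 000/12 and 001/12 are from the slides; 002/12-004/12 are hypothetical
# masks with two bits set each.
masks = [0b1000101110001111, 0b1000000010111000,
         0b1000000010000000, 0b1000000010000000, 0b1000000010000000]
six, base = build_code_six_and_base(masks)
print(six, base)                         # [0, 9, 14, 16, 0] [0, 18]

ip = 0x00400000                          # hypothetical address with top 12 bits 004
print(base[ip >> 22] + six[ip >> 20])    # 18: where the pointer group for 004/12 begins
```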

Page 25: Inside Internet Routers

Finding the correct pointer in the pointer group

• The high-order 16 bits (31−16) of the lookup IP identify a leaf at depth 16
  – Recall: The first 12 of those (31−20) determine the bit mask
  – The last four of those (19−16, i.e., the path between depth 12 and depth 16) determine the bit within a bit mask

• The maptable tells us how many pointers to skip in the pointer group to find a particular bit's pointer
  – There are relatively few possible bit masks, so we store all possibilities

[Figure: bits IP[19], IP[18], IP[17], IP[16] select the path from depth 12 to depth 16 within the bit mask 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1.]

Page 26: Inside Internet Routers

Completing the binary tree
• But there is a problem: two different prefix trees can have the same bit vector, e.g.:

[Figure: two different prefix trees whose bit vectors both begin 1 0 …]

• So the algorithm requires that the prefix tree be complete: each node in the tree has either zero or two children

• As a preliminary step, each node with a single child gets a sibling node added, with duplicate next-hop information

[Figure: after completion, the bit vector begins 1 1 …]

Page 27: Inside Internet Routers

How many bit masks are possible?
• Bit masks are constrained: not all 2^16 combinations are possible, since they are formed from a complete prefix tree

• How many are there? Let's count:
  – Let a(n) be the number of valid non-zero bit masks of length 2^n
    • a(0) = 1 (one possible non-zero bit mask of length one)
    • a(n) = 1 + a(n − 1)^2
      – Either the mask whose only set bit is the leading 1, or any combination of two non-zero, half-size masks
    • e.g. a(1) = 2: [1 1] [1 0]
  – a(4) = 677 possible non-zero bit masks of length 16
  – So we need 10 bits to index the 678 possibilities (the 677 non-zero masks plus the all-zero mask)
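The recurrence can be checked by brute force: enumerate all 2^16 masks and keep those that a complete prefix tree could produce. A Python sketch; the validity test encodes the same rule as the recurrence (either a single leading 1, or two valid non-zero halves), and the function names are hypothetical:

```python
def a(n: int) -> int:
    """Number of valid non-zero bit masks of length 2**n (recurrence from the slide)."""
    return 1 if n == 0 else 1 + a(n - 1) ** 2

def is_valid(mask: int, width: int) -> bool:
    """True if `mask` (width bits, width a power of two) can come from a complete tree."""
    if mask == 1 << (width - 1):   # single leading 1: one head covers the whole span
        return True
    if width == 1:                 # the only other 1-bit mask is 0, which is not counted
        return False
    half = width // 2
    left, right = mask >> half, mask & ((1 << half) - 1)
    # Otherwise the span splits into two non-zero, valid half-size masks.
    return left != 0 and right != 0 and is_valid(left, half) and is_valid(right, half)

print([a(n) for n in range(5)])                       # [1, 2, 5, 26, 677]
print(sum(is_valid(m, 16) for m in range(1 << 16)))   # 677, by brute force
rows = 677 + 1                                        # plus the all-zero mask
print((rows - 1).bit_length())                        # 10 bits index rows 0..677
```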

Page 28: Inside Internet Routers

Finding the correct pointer in the pointer group
• The ten field of the code word stores the index of the maptable row that holds the offsets for the bit mask pattern at that location in the tree

• Bits 19−16 of the IP address index the maptable columns, giving the right (4-bit) offset for the bit we're interested in within the bit mask

• maptable entries are pre-computed and don't depend on the routing table

[Figure: each of the 2^12 code words has a 10-bit ten field and a 6-bit six field. The IP address (bits 31 … 0) splits into 10 + 2 + 4 + 16 bits; the 4 bits 19−16 select one of the maptable columns 0 … 15, and ten selects one of the rows 0 … 675, giving a maptable entry.]

Page 29: Inside Internet Routers

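Luleå: Summary of finding pointer index

  ix  = IP[31:20]   (top 12 bits: index into code)
  bix = IP[31:22]   (top 10 bits: index into base)
  bit = IP[19:16]   (next 4 bits: maptable column)

  ten = code[ix].ten
  six = code[ix].six
  pix = base[bix] + six + maptable[ten][bit]
  pointer = pointer_array[pix]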
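Putting the pieces together, here is a Python sketch of the Level 1 pointer-index computation. Assumptions: maptable rows are generated from the 677 valid non-zero masks (the all-zero row and the code-word optimization on the next slide are left out), bit masks are 16-bit integers with position 0 as the leftmost bit, and all names are hypothetical. A member bit shares the pointer of the nearest head to its left, so each maptable entry counts the set bits up to and including that position, minus one:

```python
def valid_masks(n=4):
    """Every valid non-zero bit mask of length 2**n, as a '0'/'1' string."""
    if n == 0:
        return ["1"]
    half = valid_masks(n - 1)
    return ["1" + "0" * (2**n - 1)] + [l + r for l in half for r in half]

MASKS = valid_masks()                                   # 677 patterns
ROW = {int(m, 2): row for row, m in enumerate(MASKS)}   # mask value -> maptable row
# MAPTABLE[row][bit]: pointers to skip = (set bits at positions 0..bit) - 1,
# where position 0 is the mask's leftmost bit (IP[19:16] == 0).
MAPTABLE = [[m[: b + 1].count("1") - 1 for b in range(16)] for m in MASKS]

def build_level1(masks):
    """Build (code, base) from per-12-bit-index masks (assumed non-zero here)."""
    code, base, total = [], [], 0
    for ix, mask in enumerate(masks):
        if ix % 4 == 0:
            base.append(total)
        code.append((ROW[mask], total - base[-1]))      # (ten, six)
        total += bin(mask).count("1")
    return code, base

def pointer_index(ip, code, base):
    """pix = base[bix] + six + maptable[ten][bit]."""
    ix, bix, bit = ip >> 20, ip >> 22, (ip >> 16) & 0xF
    ten, six = code[ix]
    return base[bix] + six + MAPTABLE[ten][bit]

code, base = build_level1([0b1000101110001111, 0b1000000010111000])
for ip in (0x00000000, 0x000F0000, 0x00100000, 0x001A0000):
    print(hex(ip), pointer_index(ip, code, base))
```

For the two example masks, these four addresses map to pointer indices 0, 8, 9, and 11.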

Page 30: Inside Internet Routers

Optimization at level 1
• When the bit mask is zero, or has a single bit set, the pointer array holds an index into the next-hop table (i.e., a genuine head)

• In these cases, store the next-hop index directly in the code word

• This reduces the number of rows needed for maptable from 678 to 676

[Figure: two consecutive bit masks under the genuine head 000/11: the mask for 000/12 is 1 0 0 0 … 0 (a single bit set) and the next mask is all zeros.]

Page 31: Inside Internet Routers

Luleå algorithm: Levels 2 and 3
• Root heads point to subtrees of height at most eight, called chunks
  – A chunk itself contains at most 256 heads

• There are three types of chunk, depending on how many heads it contains:
  – Sparse: 1-8 heads; an array of the next-hop indices of the heads within the chunk
  – Dense: 9-64 heads; same format as Level 1, but no base index
  – Very dense: 65-256 heads; same format as Level 1

[Figure: pointer array entries (2-bit type, 14-bit pointer): Genuine → next-hop index; Root → L2 chunk index; the entry is selected by pix.]
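The chunk thresholds translate directly into code. For sparse chunks, the sketch below assumes that each chunk keeps the sorted 8-bit positions of its heads next to their next-hop indices and searches them; this layout is an illustrative assumption, not something stated on the slide, and all names are hypothetical:

```python
from bisect import bisect_right

def chunk_kind(num_heads: int) -> str:
    """Classify a Level 2/3 chunk by its number of heads (thresholds from the slide)."""
    if 1 <= num_heads <= 8:
        return "sparse"
    if 9 <= num_heads <= 64:
        return "dense"
    if 65 <= num_heads <= 256:
        return "very dense"
    raise ValueError("a chunk holds between 1 and 256 heads")

def sparse_lookup(head_positions, next_hops, key):
    """Assumed sparse layout: sorted 8-bit head positions (the first is 0, since the
    leftmost leaf of a complete chunk is a head) plus matching next-hop indices.
    A member position uses the nearest head to its left."""
    i = bisect_right(head_positions, key) - 1
    return next_hops[i]

heads, hops = [0, 64, 128, 192], [5, 6, 7, 8]   # hypothetical chunk with four heads
print(chunk_kind(len(heads)), sparse_lookup(heads, hops, 100))  # sparse 6
```

Searching at most eight sorted positions keeps a sparse chunk tiny while costing only a few comparisons per lookup.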

Page 32: Inside Internet Routers

Sprint routing table lookup time distribution (Alpha 21164 CPU)

• Fastest lookups (17 cycles = 85 ns): code word directly storing the next-hop index

• 41 clock cycles: pointer in level 1 indexes the next-hop table (genuine head)

• Longer lookup times correspond to searching at levels 2 or 3

[Figure: histogram of lookup counts versus lookup time in clock cycles (cycle time = 5 ns), with 17 and 41 cycles marked.]

500 ns worst-case lookup time on a commodity CPU

Page 33: Inside Internet Routers

Luleå algorithm: Summary
• Current state of the art in IP router lookup

• Trades off mutability and table construction time for speed
  – Adding a routing entry requires rebuilding the entire table
  – But routing tables don't change often, and the authors argue they can sustain one table rebuild per second on their platform

• Table size: 150 Kbytes for 40,000 entries, so it can fit in fast SRAM on a commodity system

• Used in hardware as well as software routers to get lookup times down to tens of nanoseconds

Page 34: Inside Internet Routers

Today: Inside Internet routers

1. Longest-prefix lookup for IP forwarding
   – The Luleå algorithm

2. Router architecture
   – Crossbar scheduling and the iSLIP algorithm
   – Self-routing fabric: Banyan network

Cisco CRS-1 Carrier Routing System

Page 35: Inside Internet Routers

Router architecture
1. Data path: functions performed on each datagram
   – Forwarding decision
   – Switching fabric (backplane)
   – Output link scheduling

2. Control plane: functions performed relatively infrequently
   – Routing table information exchange with others
   – Configuration and management

• Key design factor: Rate at which components operate (packets/sec)
  – If one component operates at n times the rate of another, we say it has a speedup of n relative to the other

Page 36: Inside Internet Routers

Input port functionality

• IP address lookup
  – CIDR longest-prefix match
  – Uses a copy of the forwarding table from the control processor

• Check IP header, decrement TTL, recalculate checksum, prepend next-hop link-layer address

• Possible queuing, depending on design
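Decrementing the TTL changes the header, so the checksum must be recomputed. A minimal Python sketch of that step, assuming a 20-byte IPv4 header without options and a sample header chosen for illustration; routers commonly use the incremental update of RFC 1624 instead of re-summing the whole header. The function names are hypothetical:

```python
def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: ones' complement of the ones'-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def decrement_ttl(header: bytes) -> bytes:
    """Decrement TTL (byte 8) and recompute the checksum (bytes 10-11)."""
    h = bytearray(header)
    if h[8] <= 1:
        # A real router drops the packet and sends ICMP Time Exceeded instead.
        raise ValueError("TTL expired")
    h[8] -= 1
    h[10:12] = b"\x00\x00"                   # checksum field is zero while summing
    h[10:12] = ipv4_checksum(bytes(h)).to_bytes(2, "big")
    return bytes(h)

hdr = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
print(ipv4_checksum(hdr))                    # 0: a valid header verifies to zero
out = decrement_ttl(hdr)
print(out[8], hex(int.from_bytes(out[10:12], "big")))   # 63 0xb961
```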

Page 37: Inside Internet Routers

Switching fabric
• So now the input port has tagged the packet with the right output port (based on the IP lookup)

• The job of the switching fabric is to move the packet from an input port to the right output port

• How can this be done?
  1. Copy it into some memory location and out again
  2. Send it over a shared hardware bus
  3. Crossbar interconnect
  4. Self-routing fabric

Page 38: Inside Internet Routers

Switching via shared memory
• First-generation routers: traditional computers with switching under direct control of the CPU
  1. Packet copied from input port across shared bus to RAM
  2. Packet copied from RAM across shared bus to output port

✔ Simple design
✔ All ports share queue memory in RAM
✗ Speed limited by CPU: it must process every packet

[Image: N. McKeown]

Page 39: Inside Internet Routers

Switching via shared bus
• Datagram moves from input port memory to output port memory via a shared bus
  – e.g. Cisco 5600: 32 Gbit/s bus yields sufficient speed for access routers

✔ Eliminates CPU bottleneck
✗ Bus contention: switching speed limited by shared bus bandwidth
✗ CPU speed still a factor

[Image: N. McKeown]

Page 40: Inside Internet Routers

Switched interconnection fabrics
• Shared buses divide bandwidth among contenders
  – Electrical reason: the speed of a bus is limited by the number of connectors

• Crossbar interconnect
  – Up to n^2 crosspoints join n inputs to n outputs

Multiple input ports can then communicate simultaneously with multiple output ports

[Image: N. McKeown]

Page 41: Inside Internet Routers

Switching via crossbar
• Datagram moves from input port memory to output port memory via the crossbar
  – e.g. Cisco 12000 family: 60 Gbit/s; sufficient speed for core routers

✔ Eliminates bus bottleneck
✔ Custom hardware forwarding engines replace general-purpose CPUs
✗ Requires an algorithm to determine the crossbar configuration
✗ Requires n× output port speedup

[Image: N. McKeown; crossbar]

Page 42: Inside Internet Routers

Where does queuing occur?
• Central issue in switch design; three choices:
  – At input ports (input queuing)
  – At output ports (output queuing)
  – Some combination of the above

• n = max(# input ports, # output ports)

Page 43: Inside Internet Routers

Output queuing
• No buffering at input ports, therefore:
  – Multiple packets may arrive at an output port in one cycle; this requires a switch fabric speedup of n
  – Output port buffers all packets

• Drawback: Output port speedup required: n

Page 44: Inside Internet Routers

Input queuing

• Input ports buffer packets

• Send at most one packet per cycle to an output port

Page 45: Inside Internet Routers

Input queuing: Head-of-line blocking

• At most one packet per cycle can be sent to any output

• The blue packet is blocked despite available capacity at the output ports and in the switch fabric

• This reduces the throughput of the switch

Page 46: Inside Internet Routers

Virtual output queuing
• On each input port, place one input queue per output port

• Use a crossbar switch fabric

• The input port places each packet in the virtual output queue (VOQ) corresponding to the output port of its forwarding decision
  ✔ No head-of-line blocking
  ✔ All ports (input and output) operate at the same rate
  ✗ Need to schedule the fabric: choose which VOQs get service when

[Figure: input ports, each holding one VOQ per output port (3 output ports).]
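The benefit of VOQs over a single FIFO per input shows up in a tiny deterministic example. A Python sketch, assuming packets are just output-port numbers, lower-numbered inputs win ties, and, in the VOQ case, each slot uses a simple greedy match (not iSLIP, which comes later). Function names are hypothetical:

```python
from collections import deque

def slots_fifo(queues):
    """One FIFO per input: only head-of-line packets compete for outputs."""
    queues = [deque(q) for q in queues]
    slots = 0
    while any(queues):
        slots += 1
        busy = set()                        # outputs already used in this slot
        for q in queues:                    # lower-numbered inputs win ties
            if q and q[0] not in busy:
                busy.add(q.popleft())
    return slots

def slots_voq(queues):
    """One queue per (input, output) pair; a simple greedy match each slot."""
    voqs = [{} for _ in queues]             # voqs[i][j] = packets at input i for output j
    for i, q in enumerate(queues):
        for j in q:
            voqs[i][j] = voqs[i].get(j, 0) + 1
    slots = 0
    while any(voqs):
        slots += 1
        busy = set()
        for v in voqs:                      # each input sends at most one packet
            for j in sorted(v):
                if j not in busy:
                    busy.add(j)
                    v[j] -= 1
                    if v[j] == 0:
                        del v[j]
                    break
    return slots

traffic = [[1], [1, 2]]   # input 1: one packet for output 1; input 2: output 1, then output 2
print(slots_fifo(traffic), slots_voq(traffic))   # 3 2
```

With one FIFO, input 2's packet for output 2 waits behind its head-of-line packet even though output 2 is idle, so draining takes three slots; with VOQs it takes two.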

Page 47: Inside Internet Routers

Virtual output queuing

[Image: N. McKeown]

Page 48: Inside Internet Routers

Today: Inside Internet routers

1. Longest-prefix lookup for IP forwarding
   – The Luleå algorithm

2. Router architecture
   – Crossbar scheduling and the iSLIP algorithm
   – Self-routing fabric: Banyan network

Cisco CRS-1 Carrier Routing System

Page 49: Inside Internet Routers

Crossbar scheduling algorithm: goals
1. High throughput
   – Low queue occupancy in VOQs
   – Sustain 100% of rate R on all n inputs, n outputs

2. Starvation-free
   – Don't allow any one virtual output queue to be unserved indefinitely

3. Speed of execution
   – Should not be the performance bottleneck in the router

4. Simplicity of implementation
   – Will likely be implemented on a special-purpose chip

Page 50: Inside Internet Routers

iSLIP algorithm: Introduction
• Model the problem as a bipartite graph
  – Input port = graph node on left
  – Output port = graph node on right
  – Edge (i, j) indicates packets in VOQ Q(i, j) at input port i

• Scheduling = a bipartite matching (no two edges connected to the same node)

[Figure: a request graph and a bipartite matching drawn from it.]

Page 51: Inside Internet Routers

iSLIP: High-level overview
• For simplicity, we will look at "single-iteration" iSLIP
  – One iteration per packet

• Each iteration consists of three phases:

1. Request phase: all inputs send requests to outputs

2. Grant phase: each output grants one of the requests it received

3. Accept phase: each input chooses one output's grant to accept

Page 52: Inside Internet Routers

iSLIP: Accept and grant counters
• Each input port i has a round-robin accept counter a_i

• Each output port j has a round-robin grant counter g_j

• a_i and g_j are round-robin counters: 1, 2, 3, …, n, 1, 2, …

[Figure: 4 × 4 example with accept counters a1−a4 at the inputs and grant counters g1−g4 at the outputs, each cycling through positions 1−4.]

Page 53: Inside Internet Routers

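iSLIP: One iteration in detail
1. Request phase
   – Each input sends a request to every output for which it has backlogged packets

2. Grant phase
   – Output j grants the request from the input its grant pointer g_j points to or, if that input did not request, from the next requesting input in round-robin order

3. Accept phase
   – Input i accepts the grant from the output its accept pointer a_i points to or, if that output did not grant, from the next granting output in round-robin order
   – For each input k that accepted a grant from output j: move a_k to one beyond j, and move g_j to one beyond k (pointers move only when a grant is accepted)

[Figure: 4 × 4 example with accept counters a1−a4 and grant counters g1−g4.]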

Page 54: Inside Internet Routers

iSLIP example

• Two inputs, two outputs
  – Input 1 always has traffic for outputs 1 and 2
  – Input 2 always has traffic for outputs 1 and 2

• All accept and grant counters initialized to 1

[Figure: inputs 1 and 2 with accept counters a1 and a2; outputs 1 and 2 with grant counters g1 and g2; each counter cycles between 1 and 2.]

Page 55: Inside Internet Routers

iSLIP example: Packet time 1

[Figure: three panels (Request phase, Grant phase, Accept phase) showing inputs 1−2, outputs 1−2, and the counters a1, a2, g1, g2.]

Page 56: Inside Internet Routers

iSLIP example: Packet time 2

[Figure: three panels (Request phase, Grant phase, Accept phase) showing inputs 1−2, outputs 1−2, and the counters a1, a2, g1, g2.]

Page 57: Inside Internet Routers

iSLIP example: Packet time 3

[Figure: three panels (Request phase, Grant phase, Accept phase) showing inputs 1−2, outputs 1−2, and the counters a1, a2, g1, g2.]
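The three packet times above can be reproduced with a short simulation. A Python sketch of single-iteration iSLIP, assuming the usual iSLIP pointer-update rule (a pointer moves to one beyond the port whose grant was accepted, and grant pointers move only when their grant is accepted); the names are hypothetical:

```python
def islip_iteration(requests, g, a, n):
    """One single-iteration iSLIP round (ports numbered 1..n).

    requests[i]: set of outputs for which input i has queued packets.
    g[j], a[i]: grant and accept pointers, updated in place.
    Returns the matching as {input: output}.
    """
    def round_robin(pointer, candidates):
        # First candidate at or after `pointer`, wrapping around after n.
        for k in range(n):
            port = (pointer - 1 + k) % n + 1
            if port in candidates:
                return port
        return None

    # Request phase is implicit: output j sees every input i with j in requests[i].
    # Grant phase: each output grants one requesting input.
    grants = {}                                   # input -> outputs that granted it
    for j in range(1, n + 1):
        requesters = {k for k in range(1, n + 1) if j in requests[k]}
        i = round_robin(g[j], requesters)
        if i is not None:
            grants.setdefault(i, set()).add(j)

    # Accept phase: each input accepts one grant; pointers move only on acceptance.
    match = {}
    for i, granted in grants.items():
        j = round_robin(a[i], granted)
        match[i] = j
        a[i] = j % n + 1                          # one beyond the accepted output
        g[j] = i % n + 1                          # one beyond the accepted input
    return match

# The example from the slides: both inputs always have traffic for both outputs.
n = 2
requests = {1: {1, 2}, 2: {1, 2}}
g, a = {1: 1, 2: 1}, {1: 1, 2: 1}
for t in (1, 2, 3):
    print(f"packet time {t}:", islip_iteration(requests, g, a, n))
```

In packet time 1 only input 1 is matched (to output 1). From then on the grant pointers point at different inputs, so every packet time produces a full match and both outputs stay busy.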

Page 58: Inside Internet Routers

Implementing iSLIP

[Figure: for a 2 × 2 switch, the request vector (r11 = 1, r21 = 1, r12 = 1, r22 = 1) feeds the grant arbiters (request phase → grant phase), whose outputs feed the accept arbiters (accept phase), which produce the decision vector.]

Page 59: Inside Internet Routers

Implementing iSLIP: General circuit

Page 60: Inside Internet Routers

Implementing iSLIP: Inside an arbiter

[Figure: arbiter internals, with blocks labeled "Highest priority" and "Incrementer".]

Page 61: Inside Internet Routers

Today: Inside Internet routers

1. Longest-prefix lookup for IP forwarding
   – The Luleå algorithm

2. Router architecture
   – Crossbar scheduling and the iSLIP algorithm
   – Self-routing fabric: Banyan network

Cisco CRS-1 Carrier Routing System

Page 62: Inside Internet Routers

Switching via self-routing fabric
• Can we achieve high throughput without a crossbar scheduling algorithm?

• One way: self-routing fabrics
  – Input port appends a self-routing header to the packet
  – Self-routing header contains the output port
  – Output port removes the self-routing header

• Example: Banyan-Batcher architecture

Page 63: Inside Internet Routers

Self-routing fabric example: Banyan network
• Composition: 2 × 2 comparator elements

• A comparator element switches its output connections so that a 0-tagged packet exits the top and a 1-tagged packet exits the bottom

• A comparator blocks when its two packets have the same header value

[Figure: 2 × 2 comparator elements, each sending the packet tagged 0 to its top output and the packet tagged 1 to its bottom output.]

Page 64: Inside Internet Routers

Self-routing fabric example: Banyan network
• Organized in stages

• Designed to deliver a packet with self-routing header x to output x

• Self-routing header: use the ith most significant bit for the ith stage

• The first stage moves packets to the correct upper or lower half based on the 1st bit (0↗, 1↘)

[Figure: Banyan with four arriving packets, stages 1, 2, 3 leading to the outputs; 3-bit labels shown include 000, 010, 100, 101.]

Page 65: Inside Internet Routers

Self-routing fabric example: Banyan network

[Figure: the same Banyan with four arriving packets after stage 1, with the "0 half" and "1 half" of the network marked; 3-bit labels shown include 001, 110, 011, 111 and 000, 010, 100, 101.]

Page 66: Inside Internet Routers

Self-routing fabric example: Banyan network
• The 2nd stage moves packets to the correct quadrant based on the 2nd bit (0↗, 1↘)

[Figure: Banyan with four arriving packets after stage 2, with quadrants marked "0 quad", "1 quad", "0 quad", "1 quad".]

Page 67: Inside Internet Routers

Self-routing fabric example: Banyan network
• The 3rd stage moves packets to the correct output based on the third bit (0↗, 1↘)

• Fact: A Banyan network is blocking-free if packets are presented in sorted ascending order on contiguous inputs (with no gaps between them)

[Figure: Banyan with four arriving packets delivered through stages 1, 2, and 3 to their outputs.]
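The stage-by-stage routing can be simulated to check the sorted-input claim on a concrete case. A Python sketch, assuming a butterfly-style wiring in which stage s joins rows that differ only in one bit, most significant first (the figure may wire the stages differently). It checks one sorted and one unsorted input pattern rather than proving the general fact, and `route_banyan` is a hypothetical name:

```python
def route_banyan(packets, stages=3):
    """Route packets through a 2**stages-port Banyan of 2x2 elements.

    packets: {input_row: header}. At stage s, the element joining two rows
    that differ only in bit (stages - s) sends a packet up (0) or down (1)
    according to the s-th most significant header bit. Raises RuntimeError
    if two packets in one element want the same output (the element blocks).
    """
    rows = dict(packets)                       # current row -> header
    for s in range(1, stages + 1):
        bit = stages - s                       # row bit set at this stage, MSB first
        nxt = {}
        for row, header in rows.items():
            want = (header >> bit) & 1         # s-th most significant header bit
            new_row = (row & ~(1 << bit)) | (want << bit)
            if new_row in nxt:
                raise RuntimeError(f"blocked at stage {s}")
            nxt[new_row] = header
        rows = nxt
    return {header: row for row, header in rows.items()}

# Four packets presented in sorted ascending order on inputs 0-3.
print(route_banyan({0: 0b000, 1: 0b001, 2: 0b100, 3: 0b101}))
```

Presented in sorted order on inputs 0−3, packets 000, 001, 100, and 101 each reach the output equal to their header. The same headers presented in the order 100, 000, 101, 001 collide in a stage-2 element.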

Page 68: Inside Internet Routers

NEXT TIME

Content Delivery: HTTP, Web Caching, and Content Distribution Networks (KJ)

Pre-reading: P & D, Section 9.4.3