Page 1: The Fork-Join Router

High Performance Switching and Routing

Telecom Center Workshop: Sept 4, 1997.

The Fork-Join Router

Nick McKeown

Assistant Professor of Electrical Engineering and Computer Science, Stanford University

[email protected] http://www.stanford.edu/~nickm

Page 2: The Fork-Join Router

Outline

• Quick Background on Packet Switches
• What’s the problem? “What if data rates exceed memory bandwidth?”
• The Fork-Join Router
• Parallel Packet Switches

Page 3: The Fork-Join Router

First Generation Packet Switches

[Figure: a shared backplane connecting a CPU with buffer memory to three line interfaces, each with DMA and MAC.]

Fixed length “DMA” blocks or cells. Reassembled on egress linecard.

Fixed length cells or variable length packets.

Page 4: The Fork-Join Router

Second Generation Packet Switches

[Figure: a CPU with buffer memory and three line cards, each with DMA, MAC, and local buffer memory.]

Page 5: The Fork-Join Router

Third Generation Packet Switches

[Figure: a switched backplane connecting line cards (each with line interface, MAC, and local buffer memory) and a CPU card (CPU and memory).]

Page 6: The Fork-Join Router

Fourth Generation Packet Switches

Page 7: The Fork-Join Router

Two Basic Techniques

• Input-queued Crossbar: 1+1 = 2 operations per cell time
• Shared Memory: N+N = 2N operations per cell time
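To make the operation counts concrete, here is a minimal sketch of how they translate into a memory access-time budget. The 64-byte cell size, the function name, and the example line rate and port count are assumptions for illustration, not figures from the talk.

```python
def required_access_time_ns(line_rate_gbps, num_ports, shared_memory, cell_bytes=64):
    """Time budget per memory operation, in nanoseconds.

    Assumes fixed-size cells (cell_bytes is a hypothetical choice) and that
    every port runs at line_rate_gbps.
    """
    # One cell time: bits per cell divided by bits per nanosecond (Gb/s == bits/ns).
    cell_time_ns = cell_bytes * 8 / line_rate_gbps
    # Shared memory: N writes + N reads per cell time.
    # Input-queued crossbar: 1 write + 1 read per cell time, per input memory.
    ops_per_cell_time = 2 * num_ports if shared_memory else 2
    return cell_time_ns / ops_per_cell_time


if __name__ == "__main__":
    for shared in (False, True):
        t = required_access_time_ns(10.0, 16, shared_memory=shared)
        kind = "shared memory" if shared else "input-queued"
        print(f"{kind}: {t:.2f} ns per memory operation")
```

Under these assumptions the shared memory must be N times faster than each input-queue memory, which is why its access time becomes the limit as port count and line rate grow.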

Page 8: The Fork-Join Router

Shared Memory: The Ideal

[Figure: cells, labeled by letter, queued in a shared memory.]

A large body of work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees

Page 9: The Fork-Join Router

Precise Emulation of an Output Queued Switch

[Figure: Output Queued Switch with N inputs and N outputs = ? Combined Input-Output Queued Switch (ports 1 to N) with a Scheduler.]

Page 10: The Fork-Join Router

Result

Theorem: A speedup of 2 - 1/N is necessary and sufficient for a combined input- and output-queued switch to precisely emulate an output-queued switch for all traffic.

Joint work with Balaji Prabhakar at Stanford.
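As a quick numerical illustration of the theorem's bound, the sketch below evaluates 2 - 1/N for a few port counts. The function name and the sample values of N are chosen here for illustration only.

```python
from fractions import Fraction


def cioq_speedup(n_ports):
    """Speedup 2 - 1/N from the theorem, as an exact fraction."""
    if n_ports < 1:
        raise ValueError("a switch needs at least one port")
    return 2 - Fraction(1, n_ports)


if __name__ == "__main__":
    for n in (1, 2, 16, 1000):
        s = cioq_speedup(n)
        print(f"N = {n:4d}: speedup = {s} (about {float(s):.4f})")
```

The bound stays below 2 for every finite N and approaches 2 as N grows.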

Page 11: The Fork-Join Router

Outline

• Quick Background on Packet Switches
• What’s the problem? “What if data rates exceed memory bandwidth?”
• The Fork-Join Router
• Parallel Packet Switches

Page 12: The Fork-Join Router

Buffer Memory: How Fast Can I Make a Packet Buffer?

[Figure: a buffer memory built from 5ns SRAM, with a 64-byte wide bus on each side.]

Rough Estimate:
– 5ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum 51.2Gb/s.
– In practice, closer to 40Gb/s.
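The rough estimate can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation on the slide; the function and parameter names are made up here.

```python
def buffer_throughput_gbps(bus_bytes, op_time_ns, ops_per_packet):
    """Peak throughput of a single packet buffer in Gb/s.

    Each packet needs ops_per_packet memory operations (one write, one read),
    and each operation moves bus_bytes across the bus in op_time_ns.
    """
    bits_per_transfer = bus_bytes * 8
    # bits per nanosecond is numerically equal to Gb/s
    return bits_per_transfer / (op_time_ns * ops_per_packet)


if __name__ == "__main__":
    peak = buffer_throughput_gbps(bus_bytes=64, op_time_ns=5, ops_per_packet=2)
    print(f"Peak buffer throughput: {peak} Gb/s")
```

With a 64-byte bus, 5ns SRAM, and two operations per packet, this gives the 51.2Gb/s ceiling quoted above; the talk's "closer to 40Gb/s" is a practical figure that this arithmetic does not model.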

Page 13: The Fork-Join Router

Buffer Memory: Is It Going to Get Better?

[Figure: two plots against time. One shows Specmarks, memory size, and gate density; the other shows memory bandwidth (to core).]

Page 14: The Fork-Join Router

Optical Physical Layers… are Going to Make Things “Worse”

DWDM:
– More λ’s per fiber ⇒ more “ports” per switch.
– # ports: 16, …, 1000’s.

Data rate:
– More b/s per λ ⇒ higher capacity.
– Data rates: 2.5Gb/s, 10Gb/s, 40Gb/s, 160Gb/s, …

Page 15: The Fork-Join Router

Approach #1: Ping-pong Buffering

[Figure: two buffer memories, each with a 64-byte wide bus.]

Page 16: The Fork-Join Router

Approach #1: Ping-pong Buffering

[Figure: two buffer memories, each with a 64-byte wide bus.]

Memory bandwidth doubled to ~80 Gb/s

Page 17: The Fork-Join Router

Approach #2: Multiple Parallel Buffers

aka Banking, Interleaving

[Figure: four buffer memories in parallel.]

Page 18: The Fork-Join Router

Outline

• Quick Background on Packet Switches
• What’s the problem? “What if data rates exceed memory bandwidth?”
• The Fork-Join Router
• Parallel Packet Switches

Page 19: The Fork-Join Router

The Fork-Join Router

[Figure: N inputs and N outputs, each at rate R, connected to k parallel routers (1, 2, …, k); labeled “Bufferless”.]

Page 20: The Fork-Join Router

The Fork-Join Router

• Advantages
– k × memory bandwidth
– k × lookup/classification rate
– k × routing/classification table size

• Problems
– How to demultiplex prior to lookup/classification?
– How does the system perform/behave?
– Can we predict/guarantee performance?

Page 21: The Fork-Join Router

Outline

• Quick Background on Packet Switches
• What’s the problem? “What if data rates exceed memory bandwidth?”
• The Fork-Join Router
• Parallel Packet Switches

Page 22: The Fork-Join Router

A Parallel Packet Switch

[Figure: N inputs and N outputs, each at rate R, connected through k parallel output-queued switches (1, 2, …, k).]

Page 23: The Fork-Join Router

Parallel Packet Switch: Questions

1. Can it be work-conserving?
2. Can it emulate a single big output queued switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?
4. What happens with multicast?

Page 24: The Fork-Join Router

Parallel Packet Switch: Work Conservation

[Figure: one input and one output, each at rate R, connected to layers 1, 2, …, k by internal links of rate R/k; annotations: “Input Link Constraint” and “Output Link Constraint”.]

Page 25: The Fork-Join Router

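Parallel Packet Switch: Work Conservation

[Figure: the same input and output at rate R with layers 1, 2, …, k and internal links of rate R/k, now showing numbered cells (1 to 5) spread across the layers; annotation: “Output Link Constraint”.]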

Page 26: The Fork-Join Router

Parallel Packet Switch: Work Conservation

[Figure: N inputs and N outputs, each at rate R, connected through k parallel output-queued switches (1, 2, …, k) by internal links of rate S(R/k).]
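One way to read the input link constraint: an internal link running at S(R/k) needs about k/S external cell times to carry one cell, so a demultiplexer cannot reuse the same layer until that link is idle again. The sketch below illustrates this for a single input with a plain round-robin dispatcher. The round-robin policy, the function names, and the slot-based timing model are assumptions for illustration; this is not the algorithm from the talk, and the output link constraint is not modeled.

```python
import math


def link_busy_slots(k, speedup=1.0):
    """External cell times that one cell occupies an internal link of rate S*(R/k)."""
    return math.ceil(k / speedup)


def round_robin_dispatch(num_cells, k, speedup=1.0):
    """Send one cell per time slot from a single input, using layer = slot mod k.

    Returns the list of chosen layers. Raises RuntimeError if the chosen
    internal link is still busy, i.e. the input link constraint is violated.
    """
    busy = link_busy_slots(k, speedup)
    free_at = [0] * k  # first slot at which each internal link is idle again
    layers = []
    for slot in range(num_cells):
        layer = slot % k
        if free_at[layer] > slot:
            raise RuntimeError(f"link to layer {layer} is still busy at slot {slot}")
        free_at[layer] = slot + busy
        layers.append(layer)
    return layers


if __name__ == "__main__":
    print(round_robin_dispatch(num_cells=6, k=3))
```

With S = 1 each link is reused exactly when it becomes idle, so round-robin satisfies the input constraint on its own; the harder part, which the questions above are about, is meeting the output link constraint at the same time.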

Page 27: The Fork-Join Router

Precise Emulation of an Output Queued Switch

[Figure: Output Queued Switch with N inputs and N outputs = ? Parallel Packet Switch with inputs 1 to N and outputs 1 to N.]

Page 28: The Fork-Join Router

Parallel Packet Switch: Theorems

1. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can be work-conserving for all traffic.

2. If S > 2k/(k+2) ≈ 2, then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

Page 29: The Fork-Join Router

Parallel Packet Switch: Theorems

3. If S > 3k/(k+3) ≈ 3, then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
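To see how the speedup thresholds in the three theorems behave as the number of layers grows, the sketch below tabulates 2k/(k+2) and 3k/(k+3) exactly. The function names and the sample values of k are chosen here for illustration.

```python
from fractions import Fraction


def fcfs_speedup_bound(k):
    """Threshold 2k/(k+2) from theorems 1 and 2."""
    return Fraction(2 * k, k + 2)


def qos_speedup_bound(k):
    """Threshold 3k/(k+3) from theorem 3."""
    return Fraction(3 * k, k + 3)


if __name__ == "__main__":
    for k in (2, 4, 8, 64, 1024):
        f, q = fcfs_speedup_bound(k), qos_speedup_bound(k)
        print(f"k = {k:5d}: 2k/(k+2) = {float(f):.3f}, 3k/(k+3) = {float(q):.3f}")
```

The thresholds stay strictly below 2 and 3 respectively and approach them as k grows. Note also that 2k/(k+2) can be rewritten as 2 - 4/(k+2).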

Page 30: The Fork-Join Router

An aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N

Page 31: The Fork-Join Router

Clos Network

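[Figure: a three-stage Clos network with input switches I1 … IX, middle-stage switches a, b, c, …, and output switches O1 … OX, with braces marked m on the input and output switches; beside it, a connection matrix with rows I1 … Ix and columns O1 … Ox whose entries name the middle-stage switch used (e.g. b).]

<= min(R,m) entries in each row

<= min(R,m) entries in each column

R middle-stage switches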

Page 32: The Fork-Join Router

Clos Network

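[Figure: a three-stage Clos network with input switches I1 … IX, middle-stage switches a, b, c, …, and output switches O1 … OX, with braces marked m on the input and output switches; beside it, a connection matrix with rows I1 … Ix and columns O1 … Ox whose entries name the middle-stage switch used (e.g. b).]

<= min(R,m) entries in each row

<= min(R,m) entries in each column

R middle-stage switches

Define:
– UIL(Ii) = used links at switch Ii to connect to middle stages.
– UOL(Oi) = used links at switch Oi to connect to middle stages.

If we wish to connect Ii to Oi:

When adding connection: |UIL(Ii)| <= m-1 and |UOL(Oi)| <= m-1

Worst-case: |UIL(Ii) U UOL(Oi)| = 2m-2

Therefore, if R > 2m-2 (that is, R >= 2m-1) there is always a free middle-stage switch.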
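The counting argument can be checked with a small sketch. In the worst case the input switch and the output switch each already use m-1 middle-stage switches and the two sets do not overlap, so 2m-2 middle-stage switches are taken and one more is needed for the new connection. The function names and the use of integer indices for middle-stage switches are assumptions made for illustration.

```python
def free_middle_switch(num_middle, used_in, used_out):
    """Return the index of a middle-stage switch unused on both sides, or None.

    used_in plays the role of UIL(Ii) and used_out the role of UOL(Oi),
    both given as sets of middle-stage switch indices.
    """
    for s in range(num_middle):
        if s not in used_in and s not in used_out:
            return s
    return None


def worst_case_sets(m):
    """Worst case from the argument: m-1 used links per side, no overlap."""
    used_in = set(range(0, m - 1))
    used_out = set(range(m - 1, 2 * m - 2))
    return used_in, used_out


if __name__ == "__main__":
    m = 4
    used_in, used_out = worst_case_sets(m)
    for num_middle in (2 * m - 2, 2 * m - 1):
        s = free_middle_switch(num_middle, used_in, used_out)
        print(f"R = {num_middle}: free middle-stage switch = {s}")
```

With m = 4, R = 6 leaves no free middle-stage switch in the worst case, while R = 7 = 2m-1 always does, which corresponds to an expansion factor of (2m-1)/m = 2 - 1/m.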

Page 33: The Fork-Join Router

An aside: Unbuffered Clos Circuit Switch

Expansion factor required = 2-1/N

Expansion 2 - 4/(k+2)

Page 34: The Fork-Join Router

Fork-Join Router Project: What’s next?

• Theory:
– Extending results to distributed algorithms.
– Extending results to multicast.

• Implementation/Prototyping:
– Under discussion...