
Post on 20-Dec-2015


Packet-Mode Emulation of Output-Queued Switches

David Hay, CS, Technion

Joint work with Hagit Attiya (CS, Technion), Isaac Keslassy (EE, Technion)

CIOQ Switches

Cell-Mode Scheduling


Trend towards Packet-Mode

Cell-mode scheduling is getting too hard

Fragmentation and reassembly must work very fast, at the external rate

Extra header for each cell, causing loss of bandwidth

For optical switches, such fragmentation and reassembly are prohibitive

Cell-mode schedulers are packet-oblivious, which degrades overall performance

Packet-Mode Scheduling


No need for fragmentation and reassembly

Must ensure contiguous packet delivery over the fabric

While input i delivers a packet to output j, neither input i nor output j can handle other packets.

Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?

[Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]

Output Queuing Emulation

OQ switches are considered optimal with respect to queuing delay and throughput

But they are too hard to implement in practice…

Emulation: same input traffic ⇒ same output traffic

How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?


Easy with speedup S=N

N scheduling decisions every time-slot:

In the 1st decision forward the cell of input 1

In the 2nd decision forward the cell of input 2

⋮

In the Nth decision forward the cell of input N

Possible with speedup S=2: CCF algorithm

Lower bound: S≥2-1/N is required

[Chuang et al., 1999]

Cell-Mode Emulation is Possible

What is the speedup required for packet-mode emulation?

Packet-Mode Emulation is Impossible

Regardless of speedup

Even with speedup S=N


Emulation w/ Relative Queuing Delay

The CIOQ switch is allowed a bounded lag behind the shadow OQ switch

Exact same behavior as the optimal OQ switch, but with some extra delay

This extra delay is called the relative queuing delay (RQD)

Can we provide packet-mode OQ emulation with bounded RQD and small speedup?

Our Results: Speedup-RQD tradeoff

[Figure: plot of the speedup-RQD tradeoff; axes: Speedup and RQD; marked values: 2, 4, 2Lmax]

Lower bound on RQD (even with infinite speedup)

Lower bound on the speedup (from cell-mode scheduling)

Generalization of cell-mode scheduling with S=2: Taking each packet of size ≤ Lmax as one huge cell

Lmax=maximum packet size

First algorithm: S≈4 with RQD=O(NLmax)

Intuition for Emulation Algorithms

[Diagram: Packet-Mode CIOQ, Packet-Mode OQ, Cell-Mode CIOQ w/ S=2]

Underlying CCF Algorithm

Observation: A Packet-Mode OQ switch is a Cell-Mode OQ switch with a different queuing discipline (called PIFO)

Cell-Mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al.,1999]

But, CCF does not maintain contiguous packet forwarding over the fabric!

[Diagram: Packet-Mode CIOQ; Packet-Mode OQ = PIFO Cell-Mode OQ; Cell-Mode CIOQ w/ S=2]

Intuition for Emulation Algorithms

[Diagram: Packet-Mode CIOQ, Packet-Mode OQ, Cell-Mode CIOQ w/ S=2]

Two sub-steps:

1. Framing

2. Contiguous Decomposition

Frame-Based Schedulers

Works in a pipelined, frame-based manner

Within each frame:

Build a demand matrix for this frame

Schedule the demand matrix of the previous frame

In each frame of size T, CCF forwards at most 2T cells from each input and to each output.
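The per-frame bound above can be checked mechanically. The following is a minimal sketch, not part of the talk: the function name `check_frame_bound` is hypothetical, the matrix values are the 4×4 example shown on the demand-matrix slide, and the frame length T=3 is an assumption chosen so that 2T matches that example's row and column sums.

```python
def check_frame_bound(demand, frame_len):
    """Return True if every row and column of `demand` sums to at most 2 * frame_len.

    demand[i][j] is the number of cells CCF forwarded from input i to output j
    during one frame of `frame_len` time-slots.
    """
    bound = 2 * frame_len
    row_sums = [sum(row) for row in demand]          # cells leaving each input
    col_sums = [sum(col) for col in zip(*demand)]    # cells reaching each output
    return all(s <= bound for s in row_sums + col_sums)


# Example matrix from the slides; T = 3 is an assumed frame length.
demand = [
    [3, 0, 1, 2],
    [1, 2, 2, 1],
    [2, 2, 2, 0],
    [0, 2, 1, 3],
]
ok_for_t3 = check_frame_bound(demand, 3)
ok_for_t2 = check_frame_bound(demand, 2)
```

Every row and column of this matrix sums to 6, so the check passes for T=3 and fails for T=2.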

Building the Demand Matrix

Demand matrix for a frame (rows: inputs, columns: outputs):

3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3

Entry (1,1): number of cells CCF sent from input 1 to output 1 in the last frame

Each row sum and each column sum is ≤ 2T

Problem: A packet may span several frames.

Building the Demand Matrix

Count only packets whose last cell is forwarded by CCF in the frame

Each row/column in the matrix is bounded by 2T+N(Lmax-1)

For each input-output pair, only cells of one additional packet can be added.

Translates into RQD of 2T+Lmax-2.
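The counting rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the event format `(slot, src, dst, packet_id, is_last_cell)`, the `packet_len` lookup, and the function name `build_demand_matrix` are all hypothetical.

```python
def build_demand_matrix(events, packet_len, n_ports, frame_start, frame_len):
    """Build the demand matrix of one frame from CCF's cell-forwarding events.

    events: iterable of (slot, src, dst, packet_id, is_last_cell) tuples
            (hypothetical trace format).
    packet_len: dict mapping packet_id to the packet's length in cells.
    A packet is charged, with its full length, to the frame in which CCF
    forwards its last cell, so a packet spanning several frames counts once.
    """
    demand = [[0] * n_ports for _ in range(n_ports)]
    frame_end = frame_start + frame_len
    for slot, src, dst, pkt_id, is_last in events:
        if is_last and frame_start <= slot < frame_end:
            demand[src][dst] += packet_len[pkt_id]
    return demand


# Packet "a" (3 cells, input 0 -> output 1) straddles frames 0 and 1 when T = 4.
packet_len = {"a": 3, "b": 1}
events = [
    (1, 0, 0, "b", True),
    (2, 0, 1, "a", False),
    (3, 0, 1, "a", False),
    (4, 0, 1, "a", True),
]
frame0 = build_demand_matrix(events, packet_len, 2, 0, 4)
frame1 = build_demand_matrix(events, packet_len, 2, 4, 4)
```

Packet "a" contributes all three of its cells to frame 1, which is why a row or column can exceed 2T by the cells of one extra packet per input-output pair.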


Decomposing the Demand Matrix

Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery.

Each permutation dictates a scheduling decision.

Speedup = Number of permutations / Frame length

First try: the optimal Birkhoff-von Neumann decomposition results in 2T+N(Lmax-1) permutations.

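Example: the demand matrix

3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3

decomposes into six permutation matrices:

0 0 1 0    1 0 0 0    1 0 0 0
0 1 0 0    0 0 1 0    0 1 0 0
1 0 0 0    0 1 0 0    0 0 1 0
0 0 0 1    0 0 0 1    0 0 0 1

0 0 0 1    0 0 0 1    1 0 0 0
0 0 1 0    1 0 0 0    0 0 0 1
1 0 0 0    0 1 0 0    0 0 1 0
0 1 0 0    0 0 1 0    0 1 0 0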
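A Birkhoff-von Neumann decomposition of the example matrix can be sketched as follows. This is a minimal illustration, assuming a non-negative integer matrix whose row and column sums are all equal (a general demand matrix would first need padding, which is not shown). The helper names are hypothetical, and the perfect matching uses a simple augmenting-path search rather than any method named in the talk.

```python
def perfect_matching(matrix):
    """Find a perfect matching on the positive entries (augmenting paths).

    Returns perm with perm[row] = column, or None if no perfect matching exists.
    """
    n = len(matrix)
    col_owner = [-1] * n  # col_owner[c] = row currently matched to column c

    def try_row(r, seen):
        for c in range(n):
            if matrix[r][c] > 0 and not seen[c]:
                seen[c] = True
                if col_owner[c] == -1 or try_row(col_owner[c], seen):
                    col_owner[c] = r
                    return True
        return False

    for r in range(n):
        if not try_row(r, [False] * n):
            return None
    perm = [0] * n
    for c, r in enumerate(col_owner):
        perm[r] = c
    return perm


def bvn_decompose(matrix):
    """Decompose into (weight, perm) pairs whose weighted sum equals `matrix`."""
    work = [row[:] for row in matrix]
    parts = []
    while any(any(row) for row in work):
        perm = perfect_matching(work)
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        # Use the permutation as many times as its smallest entry allows.
        weight = min(work[r][perm[r]] for r in range(len(work)))
        for r, c in enumerate(perm):
            work[r][c] -= weight
        parts.append((weight, perm))
    return parts


demand = [
    [3, 0, 1, 2],
    [1, 2, 2, 1],
    [2, 2, 2, 0],
    [0, 2, 1, 3],
]
parts = bvn_decompose(demand)
```

The weights always add up to the common row sum (6 here), so the schedule uses six time-slots. The particular permutations found may differ from those on the slide, and nothing in this decomposition keeps the cells of one packet contiguous.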

Contiguous Greedy Decomposition

To maintain contiguous packet delivery:

If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule, keep the match for iteration t.

Find a greedy matching for the rest of the matrix.

[Figure: example permutation matrices for iteration t-1 and iteration t; the match from input 1 to output 1 is kept in iteration t because cells are left from 1 to 1]

Speedup: S = 4 + (2N(Lmax-1)-1)/T

RQD: 2T+Lmax-2
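The two rules of the contiguous greedy decomposition can be sketched as below. This is a simplified illustration, not the authors' algorithm as published: it works on cell counts only, so all (i,j) cells of a frame are sent as one contiguous run, and the greedy step scans inputs and outputs in index order, which is an arbitrary choice. The function name is hypothetical.

```python
def contiguous_greedy(demand):
    """Decompose `demand` into matchings, one per iteration.

    Each matching is a dict {input: output}. A pair matched in the previous
    iteration is kept while it still has cells left; the remaining ports are
    matched greedily.
    """
    n = len(demand)
    work = [row[:] for row in demand]
    schedule = []
    prev = {}
    while any(any(row) for row in work):
        match = {}
        used_outputs = set()
        # Rule 1: keep every pair from iteration t-1 that still has cells.
        for i, j in prev.items():
            if work[i][j] > 0:
                match[i] = j
                used_outputs.add(j)
        # Rule 2: greedy matching for the rest of the matrix.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in used_outputs and work[i][j] > 0:
                    match[i] = j
                    used_outputs.add(j)
                    break
        for i, j in match.items():
            work[i][j] -= 1  # one cell per matched pair per iteration
        schedule.append(match)
        prev = match
    return schedule


demand = [
    [3, 0, 1, 2],
    [1, 2, 2, 1],
    [2, 2, 2, 0],
    [0, 2, 1, 3],
]
schedule = contiguous_greedy(demand)
```

Because each iteration is a maximal matching, the number of iterations can be roughly twice that of an optimal decomposition, which is consistent with the speedup rising from about 2 to about 4.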

Our Results: Speedup-RQD tradeoff

[Figure: plot of the speedup-RQD tradeoff; axes: Speedup and RQD; marked values: 2, 4, 2Lmax]

S = 4 + (2N(Lmax-1)-1)/T

RQD = 2T+Lmax-2
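To see how the frame length T trades speedup against RQD, the two formulas can be evaluated directly. This is a small sketch: the function name and the sample values N=4, Lmax=3 are assumptions chosen for illustration only.

```python
from fractions import Fraction


def first_algorithm_tradeoff(n, l_max, t):
    """Evaluate S = 4 + (2N(Lmax-1)-1)/T and RQD = 2T + Lmax - 2 exactly."""
    speedup = 4 + Fraction(2 * n * (l_max - 1) - 1, t)
    rqd = 2 * t + l_max - 2
    return speedup, rqd


# Assumed sample values: N = 4 ports, Lmax = 3 cells.
short_frame = first_algorithm_tradeoff(4, 3, 15)
long_frame = first_algorithm_tradeoff(4, 3, 150)
```

With T=15 the speedup is exactly 5 and the RQD is 31 time-slots; with T=150 the speedup drops to 4.1 while the RQD grows to 301. Longer frames push the speedup towards 4 at the cost of a proportionally larger delay.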

Next…

Packet-Mode Emulation w/ S≈2

Separate demand matrix for every possible packet size

Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax)

Leftover matrix for each size m
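The mega-packet bookkeeping above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the split shown is for a single input-output pair and a single packet size m.

```python
from functools import reduce
from math import gcd


def lcm_upto(l_max):
    """k = LCM(1, ..., Lmax): a length every packet size divides evenly."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, l_max + 1), 1)


def split_into_mega_packets(count, size, k):
    """Split `count` packets of `size` cells into mega-packets of k cells.

    Returns (mega_packets, leftover_packets). Each mega-packet concatenates
    k // size packets of the same size; the rest go to the leftover matrix
    for this size.
    """
    per_mega = k // size
    return divmod(count, per_mega)


k = lcm_upto(4)                           # Lmax = 4 is an assumed example value
split = split_into_mega_packets(7, 3, k)  # 7 packets of 3 cells each
```

With Lmax=4 the mega-packet size is k=12 cells, so seven 3-cell packets form one mega-packet of four packets, leaving three packets for the leftover matrix of size 3. Since k grows quickly with Lmax, the frames, and with them the RQD, must be long for the leftovers to stay negligible.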


Optimally decompose (w/ Birkhoff-von Neumann) the mega-packets matrix, then the leftover matrices


S = 2 + NLmax(k-1)/T

RQD = 2T+Lmax-2

Wrap-up

Packet-mode scheduling can be done with the same speedup as cell-mode scheduling

At the price of bounded RQD

Future work: lower bounds

??

Thank You!