Bluespec-4: Architectural exploration using IP lookup

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

February 21, 2007 http://csg.csail.mit.edu/6.375/ L07-1


Page 2: IP Lookup block in a router

[Figure: router block diagram. Labels: Line Card (LC) plus three more LC blocks, Switch, Arbitration, Packet Processor, Control Processor, Queue Manager, Exit functions, IP Lookup, SRAM (lookup table).]

A packet is routed based on the "Longest Prefix Match" (LPM) of its IP address with entries in a routing table.

Line rate and the order of arrival must be maintained. Line rate: 15 Mpps for 10GE.
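The 15 Mpps figure is consistent with a 10GE link carrying back-to-back minimum-size frames. The sketch below works through that arithmetic; the framing constants (64-byte minimum frame, 20 bytes of preamble and inter-frame gap) are standard Ethernet values assumed here, not numbers taken from the slide.

```python
# Assumed Ethernet framing constants (not stated on the slide).
LINK_RATE_BPS = 10e9      # 10 Gigabit Ethernet
MIN_FRAME_BYTES = 64      # minimum Ethernet frame
OVERHEAD_BYTES = 20       # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes, overhead_bytes):
    """Worst-case packet rate when every frame has the minimum size."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


pps = packets_per_second(LINK_RATE_BPS, MIN_FRAME_BYTES, OVERHEAD_BYTES)
ns_per_packet = 1e9 / pps
print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet:.1f} ns per packet")
```

Under these assumptions the rate comes out at roughly 14.88 Mpps, or about 67 ns per packet, which is the budget the lookup block has to meet.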

Page 3: Sparse tree representation

Prefix table:

F    *
E    5.*.*.*
D    10.18.200.5
C    10.18.200.*
B    7.14.7.3
A    7.14.*.*

Example lookups:

IP address     Result
7.13.7.3       F
10.18.201.5    F
7.14.7.2       A
5.13.7.2       E
10.18.200.7    C

[Figure: sparse tree of 256-entry tables (indices 0 to 255). Labelled indices include 5, 7, 10, 14, 18 and 200; entries hold either a result (A to F) or a pointer to the next table. The lookup table on the slide also has an "M Ref" column.]

Real-world lookup algorithms are more complex, but all make a sequence of dependent memory references.
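As a reference point for the tree-based lookup, longest prefix match can be written as an exhaustive scan over the prefix table A to F from this slide. This is only a sketch of the specification: it assumes byte-granular prefixes written with `*` wildcards, as on the slide, and the helper names are made up for the example.

```python
# Prefix table from the slide: (dotted prefix with '*' wildcards, result).
PREFIXES = [
    ("*", "F"),
    ("5.*.*.*", "E"),
    ("10.18.200.5", "D"),
    ("10.18.200.*", "C"),
    ("7.14.7.3", "B"),
    ("7.14.*.*", "A"),
]


def fixed_bytes(prefix):
    """Return the leading non-wildcard bytes of a dotted prefix."""
    out = []
    for part in prefix.split("."):
        if part == "*":
            break
        out.append(int(part))
    return out


def lpm(address):
    """Longest prefix match by exhaustive scan (reference model only)."""
    addr = [int(b) for b in address.split(".")]
    best_len, best = -1, None
    for prefix, result in PREFIXES:
        fixed = fixed_bytes(prefix)
        if addr[:len(fixed)] == fixed and len(fixed) > best_len:
            best_len, best = len(fixed), result
    return best


for ip in ["7.13.7.3", "10.18.201.5", "7.14.7.2", "5.13.7.2", "10.18.200.7"]:
    print(ip, "->", lpm(ip))
```

The scan touches every table entry for every packet, which is why a router replaces it with a tree walk that needs only a few dependent memory references.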

Page 4: Table representation issues

Table size
- Depends on the number of entries: 10K to 100K
- Too big to fit in on-chip memory, so SRAM or DRAM is needed: latency, cost, power issues

Number of memory accesses for an LPM?
- Too many makes it difficult to do table lookup at line rate (say at 10 Gbps)

Control-plane issues:
- incremental table update
- size, speed of table maintenance software

In this lecture (to fit the code on slides!): Level 1: 16 bits, Level 2: 8 bits, Level 3: 8 bits, giving from 1 to 3 memory accesses for an LPM

Page 5: "C" version of LPM

int lpm (IPA ipa) /* 3 memory lookups */
{
  int p;

  /* Level 1: 16 bits */
  p = RAM [ipa[31:16]];
  if (isLeaf(p)) return value(p);

  /* Level 2: 8 bits */
  p = RAM [ptr(p) + ipa [15:8]];
  if (isLeaf(p)) return value(p);

  /* Level 3: 8 bits */
  p = RAM [ptr(p) + ipa [7:0]];
  return value(p); /* must be a leaf */
}

[Figure: the three table levels, indexed 0 to 2^16 - 1 at level 1 and 0 to 2^8 - 1 at levels 2 and 3]

Not obvious from the C code how to deal with
- memory latency
- pipelining

Must process a packet every 1/15 µs, or 67 ns
Must sustain 3 dependent memory lookups in 67 ns
Memory latency ~30 ns to 40 ns
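The 16/8/8 lookup can be modelled in software to see how many memory accesses each address needs. The sketch below hand-builds a table for the prefixes A to F of the sparse-tree slide; the `(kind, payload)` entry encoding, the dictionary used as RAM, and the helper names are assumptions made for the example, standing in for `isLeaf`, `value` and `ptr`.

```python
LEAF, PTR = "leaf", "ptr"   # hypothetical entry tags standing in for isLeaf()

RAM = {}                    # sparse model of the lookup SRAM: address -> entry
L1_SIZE = 1 << 16
next_free = L1_SIZE         # level-2/3 blocks are allocated after level 1


def alloc_block(default):
    """Allocate a 256-entry block whose entries all start as `default`."""
    global next_free
    base = next_free
    next_free += 256
    for i in range(256):
        RAM[base + i] = (LEAF, default)
    return base


def read(addr):
    # Unpopulated level-1 entries fall back to the default route F.
    return RAM.get(addr, (LEAF, "F"))


# Hand-built table for the prefixes A..F.
for low in range(256):                       # E: 5.*.*.*
    RAM[(5 << 8) | low] = (LEAF, "E")
blk_7_14 = alloc_block("A")                  # A: 7.14.*.*
RAM[(7 << 8) | 14] = (PTR, blk_7_14)
blk_7_14_7 = alloc_block("A")
RAM[blk_7_14 + 7] = (PTR, blk_7_14_7)
RAM[blk_7_14_7 + 3] = (LEAF, "B")            # B: 7.14.7.3
blk_10_18 = alloc_block("F")
RAM[(10 << 8) | 18] = (PTR, blk_10_18)
blk_10_18_200 = alloc_block("C")             # C: 10.18.200.*
RAM[blk_10_18 + 200] = (PTR, blk_10_18_200)
RAM[blk_10_18_200 + 5] = (LEAF, "D")         # D: 10.18.200.5


def lpm(ipa):
    """Return (result, number of memory accesses) for a 32-bit address."""
    kind, payload = read(ipa >> 16)                        # Level 1: 16 bits
    if kind == LEAF:
        return payload, 1
    kind, payload = read(payload + ((ipa >> 8) & 0xFF))    # Level 2: 8 bits
    if kind == LEAF:
        return payload, 2
    kind, payload = read(payload + (ipa & 0xFF))           # Level 3: 8 bits
    return payload, 3                                      # must be a leaf


def ip(a, b, c, d):
    """Pack four address bytes into one 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d


print(lpm(ip(7, 14, 7, 2)), lpm(ip(5, 13, 7, 2)))
```

With a memory latency of 30 ns to 40 ns, three dependent accesses take 90 ns to 120 ns, which exceeds the 67 ns per-packet budget. Several lookups therefore have to be in flight at once, which is what the pipelined designs provide.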

Page 6: Bluespec-4: Architectural exploration using IP lookup Arvind

February 21, 2007 http://csg.csail.mit.edu/6.375/ L07-6

IP Lookup:

Microarchitecture -1Static Pipeline

Page 7: Static Pipeline

Assume the memory has a latency of n (4) cycles and can accept a request every cycle.
Assume every IP lookup takes exactly m (3) memory reads.
Assuming there is always an input to process:

[Figure: pipeline schedule, annotated "Pipelining to deal with latency" and "Number of lookups per packet"]

- The system needs space for at least n packets for full pipelining
- Inefficient memory usage: unused memory slots represent wasted bandwidth
- Difficult to schedule table updates
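The wasted bandwidth can be estimated with a short calculation. The sketch assumes, as the slide does, that the pipeline is always full and that every packet reserves m memory slots whether or not it needs them; the function name and the workload are hypothetical.

```python
def static_pipeline_utilization(accesses_per_packet, m=3):
    """Fraction of reserved memory slots that carry a real request.

    Each packet reserves m slots in the static schedule, however many
    memory reads its lookup actually needs (between 1 and m).
    """
    used = sum(accesses_per_packet)
    reserved = m * len(accesses_per_packet)
    return used / reserved


# Hypothetical workload: number of reads each packet actually needs.
workload = [1, 3, 2, 1, 3, 2]
print(static_pipeline_utilization(workload))
```

Only a workload in which every lookup needs all m reads reaches full utilization; any shorter lookup leaves its remaining slots idle.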

Page 8: Static (Synchronous) Pipeline Microarchitecture

- Provide n (> latency) registers; mark all of them as Empty
- Let a new message enter the system when the last register is empty or an old request leaves
- Each register r holds either the result value or the remainder of the IP address. r5 also has to hold the next address for the memory

typedef union tagged {
  Value Result;
  struct { Bit#(16) remainingIP; Bit#(19) ptr; } IPptr;
} RegData;

The state c of each register is

typedef enum { Empty, Level1, Level2, Level3 } State;

[Figure: registers (ri, ci) arranged around a RAM with latency 4; the IP addr enters, req goes to the RAM and resp returns from it]

Page 9: Static code

rule static (True);
  if (next(c5) == Empty)
    if (inQ.notEmpty) begin
      IP ip = inQ.first(); inQ.deq();
      ram.req(ext(ip[31:16]));
      r1 <= IPptr{ip[15:0],?}; c1 <= Level1;
    end
    else c1 <= Empty;
  else begin
    r1 <= r5; c1 <= next(c5);
    if (!isResult(r5)) ram.req(ptr(r5));
  end
  r2 <= r1; c2 <= c1;
  r3 <= r2; c3 <= c2;
  r4 <= r3; c4 <= c3;
  TableEntry p;
  if ((c4 != Empty) && !isResult(r4)) p <- ram.resp();  // action value method
  r5 <= nextReq(p, r4); c5 <= c4;
  if (c5 == Level3) outQ.enq(result(r5));
endrule

[Figure: inQ, registers (r1, c1) to (r5, c5), RAM with latency 4 (req, resp), IP addr, outQ]

Page 10: The next function

function State next (State c);
  case (c)
    Empty  : return(Empty);
    Level1 : return(Level2);
    Level2 : return(Level3);
    Level3 : return(Empty);
  endcase
endfunction

Page 11: The nextReq function

function RegData nextReq(TableEntry p, RegData r);
  case (r) matches
    tagged Result .*: return r;
    tagged IPptr .ip:
      if (isLeaf(p)) return tagged Result value(p);
      else return tagged IPptr{remainingIP: ip.remainingIP << 8,
                               ptr: ptr(p) + ip.remainingIP[15:8]};
  endcase
endfunction
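To check the bit manipulation in nextReq, the function can be mirrored in software. In this sketch the classes `Result`, `IPptr` and `TableEntry` are hypothetical stand-ins for the tagged union and the RAM word; the field layout of a real table entry is not given on the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Result:            # models `tagged Result`
    value: int


@dataclass(frozen=True)
class IPptr:             # models `tagged IPptr`
    remaining_ip: int    # 16 bits of the address not yet consumed
    ptr: int             # next RAM address to read


@dataclass(frozen=True)
class TableEntry:        # hypothetical encoding of a RAM word
    is_leaf: bool
    payload: int         # result value if leaf, base pointer otherwise


RegData = Union[Result, IPptr]

# The `next` function as a lookup table over state names.
NEXT_STATE = {"Empty": "Empty", "Level1": "Level2",
              "Level2": "Level3", "Level3": "Empty"}


def next_req(p, r: RegData) -> RegData:
    """Combine a RAM response p with register contents r."""
    if isinstance(r, Result):
        return r                                   # already resolved
    if p.is_leaf:
        return Result(p.payload)
    return IPptr(remaining_ip=(r.remaining_ip << 8) & 0xFFFF,   # stay 16 bits
                 ptr=p.payload + ((r.remaining_ip >> 8) & 0xFF))


print(next_req(TableEntry(False, 0x100), IPptr(0x1234, 0)))
```

For a pointer entry with base 0x100 and a remaining address of 0x1234, the next request goes to 0x100 + 0x12 and the remaining address becomes 0x3400.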

Page 12: Another Static Organization

[Figure: a Counter drives two MUX / De-MUX blocks around a bank of FSMs that share one RAM; IP addr in, result out]

- Each packet is processed by its own FSM
- Counter determines which FSM gets to go

Page 13: Code for Static-2 Organization

function Action doFSM(r,c);
  action
    if (c == Empty)
      if (inQ.notEmpty) begin
        IP ip = inQ.first(); inQ.deq();
        ram.req(ext(ip[31:16]));
        c <= Level1; r <= IPptr{ip[15:0],?};
      end
      else c <= Empty;
    else if (c == Level1 || c == Level2) begin
      if (!isResult(r)) p <- ram.resp();
      RegData nextr = nextReq(p, r);
      if (!isResult(nextr)) ram.req(ptr(nextr));
      c <= next(c); r <= nextr;
    end
    else if (c == Level3) begin
      if (!isResult(r)) p <- ram.resp();
      RegData nextr = nextReq(p, r);
      outQ.enq(result(nextr));
      if (inQ.notEmpty) begin
        IP ip = inQ.first(); inQ.deq();
        ram.req(ext(ip[31:16]));
        c <= Level1; r <= IPptr{ip[15:0],?};
      end
      else c <= Empty;
    end
  endaction
endfunction

rule static2(True);
  cnt <= cnt + 1;
  for (Integer i=0; i<maxLat; i=i+1)
    if (fromInteger(i)==cnt) doFSM(r[cnt],c[cnt]);
endrule

[Figure: Counter, MUX / De-MUX, bank of FSMs, RAM; IP addr in, result out]

Page 14: Implementations of Static pipelines

Two designers, two results

LPM versions                  Best Area (gates)   Best Speed (ns)
Static V (Replicated FSMs)    8898                3.60
Static V (Single FSM)         2271                3.56

Replicated: each packet is processed by one FSM
[Figure: Counter, MUX / De-MUX, bank of FSMs, RAM; IP addr in, result out]

BEST: shared FSM
[Figure: a single FSM with a MUX and the RAM; IP addr in, result out]

Page 15: Bluespec-4: Architectural exploration using IP lookup Arvind

February 21, 2007 http://csg.csail.mit.edu/6.375/ L07-15

IP Lookup:

Microarchitecture -2Circular Pipeline

5-minute break to stretch you legs

Page 16: Circular pipeline

[Figure: circular pipeline with blocks inQ, enter?, RAM (luReq, luResp), fifo (remainingIP), done? (yes / no) and cbuf (getToken)]

Completion buffer
- gives out tokens to control the entry into the circular pipeline
- ensures that departures take place in order even if lookups complete out-of-order

The fifo holds the token while the memory access is in progress: Tuple2#(Bit#(16), Token)

Page 17: Circular Pipeline Code

rule enter (True);
  Token tok <- cbuf.getToken();
  IP ip = inQ.first();
  ram.req(ext(ip[31:16]));
  fifo.enq(tuple2(ip[15:0], tok));
  inQ.deq();
endrule

rule recirculate (True);
  TableEntry p <- ram.resp();
  match {.rip, .t} = fifo.first();
  if (isLeaf(p)) cbuf.put(t, p);
  else begin
    fifo.enq(tuple2(rip << 8, t));
    ram.req(p+signExtend(rip[15:8]));
  end
  fifo.deq();
endrule

[Figure: circular pipeline (inQ, enter?, RAM, fifo, done?, cbuf)]
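The effect of recirculation on completion order can be seen in a small untimed model. The sketch below abstracts each packet to the number of memory reads it needs and treats `capacity` as a bound on lookups in flight; both simplifications, and the function name, are assumptions made for the example rather than part of the design.

```python
from collections import deque


def circular_pipeline(accesses_needed, capacity=4):
    """Untimed model of the circular pipeline.

    accesses_needed[i] is the number of memory reads packet i requires
    (at least 1); capacity bounds the number of lookups in flight.
    Returns the tokens in the order in which their lookups complete.
    """
    waiting = deque(enumerate(accesses_needed))   # inQ: (token, reads needed)
    fifo = deque()                                # in flight: (token, reads left)
    completed = []
    while waiting or fifo:
        # rule enter: admit packets while there is room
        while waiting and len(fifo) < capacity:
            fifo.append(waiting.popleft())
        # rule recirculate: the response for the head of the fifo arrives
        tok, left = fifo.popleft()
        if left == 1:
            completed.append(tok)                 # leaf: cbuf.put(tok, value)
        else:
            fifo.append((tok, left - 1))          # pointer: go around again
    return completed


print(circular_pipeline([3, 1, 2, 1]))
```

Packets that need fewer reads overtake those that need more, so the completion order differs from the arrival order. Restoring the arrival order is the job of the completion buffer.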

Page 18: Completion buffer

interface CBuffer#(type t);
  method ActionValue#(Token) getToken();
  method Action put(Token tok, t d);
  method ActionValue#(t) getResult();
endinterface

module mkCBuffer (CBuffer#(t))
    provisos (Bits#(t,sz));   // Entry must be representable as bits
  RegFile#(Token, Maybe#(t)) buf <- mkRegFileFull();
  Reg#(Token) i <- mkReg(0);    //input index
  Reg#(Token) o <- mkReg(0);    //output index
  Reg#(Token) cnt <- mkReg(0);  //number of filled slots
  …

[Figure: buf drawn as an array of Invalid (I) and Valid (V) entries, with pointers i and o and the counter cnt]

Page 19: Completion buffer

... // state elements buf, i, o, cnt ...

method ActionValue#(Token) getToken() if (cnt <= maxToken);
  cnt <= cnt + 1; i <= i + 1;
  buf.upd(i, Invalid);
  return i;
endmethod

method Action put(Token tok, t data);
  return buf.upd(tok, Valid data);
endmethod

method ActionValue#(t) getResult()
    if (cnt > 0) &&& (buf.sub(o) matches tagged (Valid .x));
  o <= o + 1; cnt <= cnt - 1;
  return x;
endmethod

[Figure: buf drawn as an array of Invalid (I) and Valid (V) entries, with pointers i and o and the counter cnt]
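The behaviour of the completion buffer can be prototyped in software before writing hardware. The class below is a sketch under a few assumptions: `None` models an Invalid entry (so stored results must not be `None`), indices wrap modulo the buffer size, and the `can_*` methods play the role of the method guards.

```python
class CompletionBuffer:
    """Software model of the completion buffer (a reorder buffer)."""

    def __init__(self, size):
        self.buf = [None] * size   # None models Invalid
        self.size = size
        self.i = 0                 # input index: next token to hand out
        self.o = 0                 # output index: oldest outstanding token
        self.cnt = 0               # number of outstanding tokens

    def can_get_token(self):
        return self.cnt < self.size

    def get_token(self):
        assert self.can_get_token(), "guard: buffer full"
        tok = self.i
        self.buf[tok] = None       # mark the slot Invalid
        self.i = (self.i + 1) % self.size
        self.cnt += 1
        return tok

    def put(self, tok, data):
        self.buf[tok] = data       # mark the slot Valid

    def can_get_result(self):
        return self.cnt > 0 and self.buf[self.o] is not None

    def get_result(self):
        assert self.can_get_result(), "guard: oldest result not ready"
        data = self.buf[self.o]
        self.o = (self.o + 1) % self.size
        self.cnt -= 1
        return data


cb = CompletionBuffer(4)
t0, t1 = cb.get_token(), cb.get_token()
cb.put(t1, "second")               # completes first, but must wait
cb.put(t0, "first")
print(cb.get_result(), cb.get_result())
```

Results leave strictly in token order: a result that completes early stays in its slot until every older token has been drained.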

Page 20: Longest Prefix Match for IP lookup: 3 possible implementation architectures

- Rigid pipeline: inefficient memory usage but simple design
- Linear pipeline: efficient memory usage through memory port replicator
- Circular pipeline: efficient memory with most complex control

Designer's Ranking (in the order listed): 1, 2, 3

Which is "best"?

Arvind, Nikhil, Rosenband & Dave, ICCAD 2004

Page 21: Synthesis results

LPM versions   Code size (lines)   Best Area (gates)    Best Speed (ns)    Mem. util. (random workload)
Static V       220                 2271                 3.56               63.5%
Static BSV     179                 2391 (5% larger)     3.32 (7% faster)   63.5%
Linear V       410                 14759                4.7                99.9%
Linear BSV     168                 15910 (8% larger)    4.7 (same)         99.9%
Circular V     364                 8103                 3.62               99.9%
Circular BSV   257                 8170 (1% larger)     3.67 (2% slower)   99.9%

Synthesis: TSMC 0.18 µm lib
V = Verilog; BSV = Bluespec SystemVerilog

- Bluespec results can match carefully coded Verilog
- Micro-architecture has a dramatic impact on performance
- Architecture differences are much more important than language differences in determining QoR (quality of results)

Page 22: A problem ...

rule recirculate (True);
  TableEntry p <- ram.resp();
  match {.rip, .t} = fifo.first();
  if (isLeaf(p)) cbuf.put(t, p);
  else begin
    fifo.enq(tuple2(rip << 8, t));
    ram.req(p+signExtend(rip[15:8]));
  end
  fifo.deq();
endrule

What condition does the fifo need to satisfy for this rule to fire?

Page 23: One Element FIFO

module mkFIFO1 (FIFO#(t));
  Reg#(t) data <- mkRegU();
  Reg#(Bool) full <- mkReg(False);
  method Action enq(t x) if (!full);
    full <= True; data <= x;
  endmethod
  method Action deq() if (full);
    full <= False;
  endmethod
  method t first() if (full);
    return (data);
  endmethod
  method Action clear();
    full <= False;
  endmethod
endmodule

enq and deq cannot be enabled together!
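The conflict can be made concrete by modelling the guards of the one-element FIFO. In this sketch the `can_*` methods stand for the implicit conditions of `enq` and `deq`, and `recirculate_can_fire` is a hypothetical helper that asks whether a rule calling both methods on the same FIFO could ever be enabled.

```python
class FIFO1:
    """Model of the one-element FIFO: one data register plus a full bit."""

    def __init__(self):
        self.data = None
        self.full = False

    # Guards (implicit conditions) of the methods.
    def can_enq(self):
        return not self.full

    def can_deq(self):
        return self.full

    def enq(self, x):
        assert self.can_enq()
        self.data, self.full = x, True

    def deq(self):
        assert self.can_deq()
        self.full = False

    def first(self):
        assert self.full
        return self.data


def recirculate_can_fire(fifo):
    """A rule that calls both deq and enq needs both guards to hold in the
    same state, evaluated before any of the rule's updates take effect."""
    return fifo.can_deq() and fifo.can_enq()


f = FIFO1()
print(recirculate_can_fire(f))     # empty FIFO
f.enq(1)
print(recirculate_can_fire(f))     # full FIFO
```

Because `full` and `!full` are never true together, the combined guard is false in both states. A rule that needs to enqueue and dequeue on this FIFO in the same cycle can therefore never fire.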

Page 24: Another Problem: Dead cycle elimination

rule enter (True);
  Token tok <- cbuf.getToken();
  IP ip = inQ.first();
  ram.req(ext(ip[31:16]));
  fifo.enq(tuple2(ip[15:0], tok));
  inQ.deq();
endrule

rule recirculate (True);
  TableEntry p <- ram.resp();
  match {.rip, .t} = fifo.first();
  if (isLeaf(p)) cbuf.put(t, p);
  else begin
    fifo.enq(tuple2(rip << 8, t));
    ram.req(p+signExtend(rip[15:8]));
  end
  fifo.deq();
endrule

[Figure: circular pipeline (inQ, enter?, RAM, fifo, done?, cbuf)]

Can a new request enter the system simultaneously with an old one leaving?

Solutions next time ...