
A 10Gbps/port 8x8 Shared Bus Switch with embedded DRAM Hierarchical Output Buffer

Kangmin Lee*, Se-Joong Lee, and Hoi-Jun Yoo

Semiconductor System Laboratory, Department of EECS

Korea Advanced Institute of Science and Technology


ESSCIRC 2003


Outline

• Introduction & Motivation
• Hierarchical Output Buffering Technique
• Simulation Results
• Implementation
• Measurement Result
• Conclusion


Introduction

• Simplified Router System

[Figure: simplified router system built around the 8x8 shared bus switch. Labelled blocks: input ports (NP RX, S/P), 512b shared bus, output ports (AF, FIFO, P/S, NP TX), ports numbered 1, 2, ..., 8. Labelled rates: 10Gbps and 1GHz x 10b at the ports, 20MHz x 512b at the S/P and P/S converters, 160MHz x 512b (BW = 80Gbps) on the shared bus. Callout: bottleneck, 90Gbps FIFO buffer.]
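The rates labelled on this slide follow from clock frequency times bus width. A minimal sketch of that arithmetic, assuming raw bandwidth with no protocol overhead; the helper name `bandwidth_gbps` is illustrative and does not come from the slides:

```python
def bandwidth_gbps(clock_mhz: float, width_bits: int) -> float:
    """Raw bandwidth in Gbps of a bus clocked at clock_mhz with width_bits data lines."""
    return clock_mhz * width_bits / 1000.0


port = bandwidth_gbps(1000, 10)      # 1GHz x 10b port
converter = bandwidth_gbps(20, 512)  # 20MHz x 512b after S/P conversion
bus = bandwidth_gbps(160, 512)       # 160MHz x 512b shared bus

print(f"port:       {port:.2f} Gbps")
print(f"converter:  {converter:.2f} Gbps")
print(f"shared bus: {bus:.2f} Gbps")

# Assumption: the 90Gbps bottleneck is the FIFO's worst-case write
# traffic (8 ports x 10Gbps) plus its 10Gbps read traffic.
fifo_total = 8 * 10 + 10
print(f"FIFO total: {fifo_total} Gbps")
```

The raw products come out slightly above the nominal figures (10.24 and 81.92 Gbps); the slide quotes the rounded 10Gbps and 80Gbps values.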


Motivation

• FIFO Requirements
– Max. Input BW: 80Gbps
– Max. Output BW: 10Gbps
– Buffer Capacity: 1Mbits (= 2048 packets)

(1) Dual Port SRAM
– t_cycle = 6.25nsec
– Area: 16mm² @ 0.18µm CMOS

(2) Parallel eDRAM buffer
– t_cycle = 40nsec
– 9 eDRAM MACROs
– Cell efficiency is degraded.
  • Area: 20mm² inc. Bus and Latch

[Figure: (1) a 1Mb FIFO with 512b ports, 80Gbps in and 10Gbps out; (2) parallel eDRAM buffer with eDRAM1 to eDRAM9, each paired with a latch (LAT), on a 512b shared bus, 80Gbps in and 10Gbps out.]


Hierarchical Output Buffer

[Figure: SRAM (large bandwidth, small capacity) and DRAM (small bandwidth, large capacity) combined into a Hierarchical Buffer = SRAM + eDRAM (large bandwidth, large capacity). An SRAM funnel takes the 80G input and hands it to the DRAM FIFO at an intermediate bandwidth K (30Gbps); the FIFO output is 10G. Two plots against time: input b/w (max. 80G) and regulated b/w.]


Hierarchical Output Buffer (Cont’d)

[Figure: HOB FIFO. Irregular input at 80Gbps enters a dual-port SRAM (512 bits I/O); the regulated 30Gbps stream goes through a DMUX to four eDRAMs (512 bits I/O) and a MUX to the 10Gbps output. An address manager controls the buffer.]

• Determination of K, SRAM and eDRAM capacity
– Tradeoff between area cost and switch performance
– Target Performance
  • Packet loss probability: < 10⁻⁶ @ 90% offered load
  • Packet Latency < 100 cycles
– K = 30Gbps
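The tradeoff between K and the two capacities can be explored with a small slotted queue model. The sketch below is not the authors' simulator: it assumes uniform Bernoulli arrivals instead of the real packet traces, one time slot per cell-time at the 10Gbps output rate (so 80Gbps is up to 8 cells per slot and K = 30Gbps is 3 cells per slot), and no bypass path around the DRAM. The function name and the default capacities (64 and 1024 cells) are illustrative choices.

```python
import random
from collections import deque


def simulate_hob(load, slots=10_000, ports=8, k=3,
                 sram_cap=64, dram_cap=1024, seed=0):
    """Slotted model of an SRAM funnel feeding a DRAM FIFO.

    load is the offered load on this output (0..1); each of the `ports`
    inputs sends one cell per slot with probability load / ports.
    """
    rng = random.Random(seed)
    sram, dram = deque(), deque()   # queues hold arrival timestamps
    arrived = dropped = delivered = total_latency = 0

    for t in range(slots):
        # 1) Up to `ports` cells arrive and are written into the SRAM.
        for _ in range(ports):
            if rng.random() < load / ports:
                arrived += 1
                if len(sram) < sram_cap:
                    sram.append(t)
                else:
                    dropped += 1    # SRAM overflow: cell is lost
        # 2) The SRAM drains into the DRAM at the regulated rate K.
        for _ in range(k):
            if sram and len(dram) < dram_cap:
                dram.append(sram.popleft())
        # 3) The DRAM serves one cell per slot to the output port.
        if dram:
            total_latency += t - dram.popleft()
            delivered += 1

    return {
        "arrived": arrived,
        "dropped": dropped,
        "delivered": delivered,
        "buffered": len(sram) + len(dram),
        "mean_latency": total_latency / delivered if delivered else 0.0,
    }


for load in (0.5, 0.9):
    print(load, simulate_hob(load))
```

Sweeping `k`, `sram_cap` and `dram_cap` in such a model shows the direction of the tradeoff: a smaller K needs a larger SRAM to absorb bursts, while a larger K pushes bandwidth (and area) into the DRAM stage.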


Hierarchical Output Buffer (Cont’d)

• Simulation Results (K=30Gbps)

Buffer Capacity:
– SRAM: 64 packets (= 4KBytes)
– DRAM: 1024 packets (= 0.5Mbits)

Latency / Packet Loss Rate:
– Latency: 100 cycles (= 4.8µsec)
– Packet Loss Rate: ~ 10⁻⁶

– Simulation Inputs: Trace of real Internet Protocol packets

[Figure, left: buffer size of eDRAM [cells] and buffer size of SRAM [cells] vs. offered load (%), 20 to 100%; 1 cell = 64 Bytes.]

[Figure, right: latency [cell-time] and cell loss probability (1E-8 to 1) vs. offered load [%], 20 to 100%; 1 cell-time = 48nsec.]
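The capacities and latency quoted above can be cross-checked against the plot units (1 cell = 64 Bytes, 1 cell-time = 48nsec). A short sketch, assuming that a "packet" on this slide is one 64-byte cell and that a latency "cycle" is one cell-time; the variable names are illustrative:

```python
CELL_BYTES = 64      # 1 cell = 64 Bytes (from the plot annotation)
CELL_TIME_NS = 48    # 1 cell-time = 48nsec (from the plot annotation)

sram_cells = 64
dram_cells = 1024
latency_cycles = 100

sram_bytes = sram_cells * CELL_BYTES               # SRAM capacity in bytes
dram_bits = dram_cells * CELL_BYTES * 8            # DRAM capacity in bits
latency_us = latency_cycles * CELL_TIME_NS / 1000  # latency in microseconds

print(f"SRAM: {sram_bytes} bytes = {sram_bytes / 1024:g} KBytes")
print(f"DRAM: {dram_bits} bits = {dram_bits / 2**20:g} Mbits")
print(f"Latency: {latency_us:g} usec")
```

Under these assumptions the numbers agree with the slide: 4 KBytes of SRAM, 0.5 Mbits of DRAM and 4.8 µsec of latency.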


Implementation of a Prototype

[Figure: prototype block diagram. ROM 1 to ROM 4 with run-time traffic control, input bus (512b), repeater, HOB FIFO with dual-port SRAM, controller, four DRAM MACROs (256b I/O), PLL.]

• Input Packet Generator
– 20Gbps / ROM x 4 = 80Gbps Traffic Emulation
– Run-Time Traffic Control
• PLL
– Generates Multiple Clocks, 200MHz for SRAM, 25MHz for eDRAM


Implementation (Cont’d)

[Figure: prototype block diagram. Input generator, input bus (512b), repeater, HOB FIFO with dual-port SRAM, four DRAM MACROs (256b I/O), PLL.]

• Dual Port SRAM
– 200MHz, 512b I/O, 64 words (4kB)
– 1 Write Port, 1 Read Port
– 4.5mm²


Implementation (Cont’d)

[Figure: prototype block diagram with a detail of one DRAM MACRO: DRAM (512x512) array, sense amps, DRAM controller, latches (128x2), HOB controller.]

• eDRAM MACRO
– 25MHz, 512b I/O, 512 words x 4 MACROs (= 1Mb)
– Dual I/O Scheme for huge I/O bandwidth
– 3.4mm²


Implementation (Cont’d)

• Dual I/O Scheme
– Doubles the I/O Bandwidth
– I/O width = Page Size: Energy Efficient

[Figure: one eDRAM MACRO (512x512) with its Dual-I/O interface. Labels: cell array, WDRV x 64 drivers, sense amps, 128b, 256b and 512b data widths. Dual I/O circuits: eDRAM, latches, MUX/DEMUX, clock pulse, odd/even data, burst select, write and read paths.]


Die Photo

[Die photo: ROM, SRAM, DRAM, S-D bus and PLL blocks labelled; regions marked 0.16µm DRAM process and 0.35µm DRAM peripheral process.]

• 0.16µm DRAM Process
• Die Area: 6 x 11mm²
• HOB FIFO: 14.7mm² → 7.6mm² @ 0.18µm SRAM


Measurements

[Figures: Chip On Board measurement setup; measured waveforms.]


Measurements

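[Figure: measured waveforms at 25ns/div: (1) 25MHz DRAM clock, (2) ROM1 enable, (3) ROM to SRAM transfer at 200MHz, (4) eDRAM MACRO output. Probe points (1) to (4) are marked on the ROM, SRAM, DRAM data path with the PLL. Labelled rates: 200MHz x 512b, 100MHz x 512b, 25MHz x 512b, 100Gbps, 12.5Gbps.]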


Conclusion

• Hierarchical Output Buffering (HOB) Technique is proposed for 10Gbps 8x8 Shared Bus Switch

– Area reduction
  • 7.6mm² @ 0.18µm embedded DRAM Process (< 50% of the conventional approach)
– Performance summary
  • Max. Bandwidth of 90Gbps with 1Mb capacity
  • Latency: 100 cycles, Packet Loss Rate: 10⁻⁶

• Dual I/O scheme expands the I/O width to 512bits
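The area claim can be checked against the alternatives listed on the Motivation slide. This sketch assumes that "conventional approach" refers to those two options (16mm² dual-port SRAM and 20mm² parallel eDRAM buffer); the slides do not say which one is meant, and the variable names are illustrative.

```python
hob_mm2 = 7.6              # HOB FIFO area scaled to 0.18µm (from the slides)
dual_port_sram_mm2 = 16.0  # option (1) on the Motivation slide
parallel_edram_mm2 = 20.0  # option (2) on the Motivation slide, inc. bus and latch

ratio_sram = hob_mm2 / dual_port_sram_mm2
ratio_edram = hob_mm2 / parallel_edram_mm2

print(f"HOB vs dual-port SRAM: {ratio_sram:.1%}")
print(f"HOB vs parallel eDRAM: {ratio_edram:.1%}")
```

Against either baseline the ratio stays below 50%, consistent with the figure quoted in the conclusion.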