dx-gt: memory management and crossbar switch generator...
TRANSCRIPT
© Georgia Institute of Technology, 2003
DX-Gt: Memory Management and Crossbar Switch Generator for Multiprocessor System-on-a-Chip
Mohamed Shalan*, Eung Shin* and Vincent J. Mooney III+
{shalan, shin, mooney}@ece.gatech.edu
+Assistant Professor, *School of Electrical and Computer Engineering+Adjunct Assistant Professor, College of Computing
Georgia Institute of TechnologyAtlanta, Georgia, USA
SASIMI Workshop, April 3-4, 2003
SASIMI April 3-4, 2003 2 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 3 © Georgia Institute of Technology, 2003
Introduction
n In next five years multiprocessor SoCs will be dominated by designs with four to eight processors and on-chip DRAM of 16Mbytes to 128Mbytes
n As the number of transistors on a single chip increases rapidly, there is a productivity gap between the increasing number of available transistors and the design time.
SASIMI April 3-4, 2003 4 © Georgia Institute of Technology, 2003
Introduction
n To reduce the productivity gap, designers use IP cores.
n An IP core should be customized before being used in a system different than the one for which it was designed.
n Either an engineer must spend significant effort altering the core by hand or else an IP generators should be used.
SASIMI April 3-4, 2003 5 © Georgia Institute of Technology, 2003
Motivation
RTOS1
Hardware RTOS library
Software RTOS library
GUI tool
SW RTOS w/ dyn. mem.mngmnt
SW RTOS + SoCDMM
U
SW RTOS + SoCLC + SoCDMM
U
Compile Stage for each systemApplication
Executable HW file for each
Executable SW file for each
Simulation in Seamless CVE
Base Architecture
library
VCS XRAY
RTOS2 RTOS3 RTOS4 RTOS5
RTU
User Input
SW RTOS w/
sem
SW RTOS + SoCLC
RTOS6
SASIMI April 3-4, 2003 6 © Georgia Institute of Technology, 2003
Previous Work
n No known previous work in SoCDMMUgeneration or Xbar generation
n Previous work exists in memory management approaches similar to SoCDMMU
n Previous work exists in Xbar design
SASIMI April 3-4, 2003 7 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 8 © Georgia Institute of Technology, 2003
Target Architecture
SoCDMMU
PEM
Cache
PE2
Cache
PE1
Cache. . . . .
ConfigurableXbar
. . .
Mem
ory
Mod
ule N
Mem
ory
Mod
ule 2
Mem
ory
Mod
ule 1
. . .
. . .
cont
rol
Global On-Chip L2 Memory
SASIMI April 3-4, 2003 9 © Georgia Institute of Technology, 2003
The SoCDMMUPE1
AddressConverter
Command1/Status1
PEnAddress
Converter
BASICSoCDMMU MUX
CM
DR
EG
RequestSchedulerControl Unit
STA
TUS
RE
GC
MD
RE
GS
TATU
SR
EG
CM
DR
EG
STA
TUS
RE
G
Command2/Status2
Commandn/Statusn
Rd 1
Wr 1
Rd n
Wr n
. . .
To GlobalMemory Buses
PE1 Memory BusPE2 Memory Bus
PEn Memory Bus
.
.
.
.
.
.
.
.
.
.
AllocationTable
Allocation UnitDeallocation Unit
Allocation Vector
Com
mand
Register
0
MUX
Statu
sR
egister
ControlUnit
Data Bus
BusControlLines
SASIMI April 3-4, 2003 10 © Georgia Institute of Technology, 2003
The SoCDMMUPE1
AddressConverter
Command1/Status1
PEnAddress
Converter
BASICSoCDMMU MUX
CM
DR
EG
RequestSchedulerControl Unit
STA
TUS
RE
GC
MD
RE
GS
TATU
SR
EG
CM
DR
EG
STA
TUS
RE
G
Command2/Status2
Commandn/Statusn
Rd 1
Wr 1
Rd n
Wr n
. . .
To GlobalMemory Buses
PE1 Memory BusPE2 Memory Bus
PEn Memory Bus
.
.
.
.
.
.
.
.
.
.
Address Converter
Physical G_block no. Offset
Virtual Block no. Offset
0x00FA
0x0A3F
Bin
ary
Enc
oder
.
.
.
=
=
=
=
0
1
2
n-1
SASIMI April 3-4, 2003 11 © Georgia Institute of Technology, 2003
Xbar Switch
n One configuration of target architecture for 4x4 case
n All switches directly interface to SoCDMMU.
n Each switch compares physical addresses from SoCDMMU and judges if addresses belong to the address space of the attached memory block.
SoCDMMU
PE3
PE2
PE1
PE0
4x4 Xbar
4x1switch
3
4x1switch
0
4x1switch
2
4x1switch
1
mem0
mem1
mem3
mem2
SASIMI April 3-4, 2003 12 © Georgia Institute of Technology, 2003
Xbar (Continued)n An MxN switch consists of N Mx1 switches, where
M = # of PEs and N = # of memory blocks.n If an ‘address’input belongs to the address space
of the attached memory block, ‘mem_req’inside a 4x1 switch is asserted.
n ‘mem_req’signals ask an arbiter to grant a single bus attached to the corresponding memory block.
n An arbiter handles M requests from M processors and grants one request in round-robin order
SASIMI April 3-4, 2003 13 © Georgia Institute of Technology, 2003
Example of Xbar Configuration
4x1switch
7
4x1switch
0
4x1switch
6
4x1switch
1
4x1switch
5
4x1switch
2
4x1switch
4
4x1switch
3
mem0
mem1
mem2
mem3
mem7
mem6
mem5
mem4
n 4x8 Xbarn Supports 4 PEs and
8 memory modules.
n Connects a particular PE signals to a specific memory module by a 4x1 switch according to a physical address from an SoCDMMU.
SASIMI April 3-4, 2003 14 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 15 © Georgia Institute of Technology, 2003
DX-Gt Overview
DX-Gt
H/W DB
RT
OS
DB
VP
P Config.SoC H/W
(*.v)
Config.RTOS
(*.c, *.S)
DCTM ScriptCVE *.cveReport *.rpt
DX-Gt + δ
SASIMI April 3-4, 2003 16 © Georgia Institute of Technology, 2003
User Specified Parameters§ System wide parameters (first two screens)
ü The number and type of PEsü The number and size of the global on-chip memory G_blocksü The number of memory modules & ports/moduleü The memory type ü The choice of use of SoCDMMU, Xbar, both or none
§ SoCDMMU related parameters (third screen)ü The scheduling scheme to resolve concurrent SoCDMMU
requests ü Memory G_blocks initially assigned to the PEs
§ Xbar related parameters (fourth screen)ü The data bus width of each PE
SASIMI April 3-4, 2003 17 © Georgia Institute of Technology, 2003
The SoCDMMU GenerationPE1
AddressConverter
Command1/Status1
PEnAddress
Converter
BASICSoCDMMU MUX
CM
DR
EG
RequestSchedulerControl Unit
STA
TUS
RE
GC
MD
RE
GS
TATU
SR
EG
CM
DR
EG
STA
TUS
RE
G
Command2/Status2
Commandn/Statusn
Rd 1
Wr 1
Rd n
Wr n
. . .
To GlobalMemory Buses
PE1 Memory BusPE2 Memory Bus
PEn Memory Bus
.
.
.
.
.
.
.
.
.
.
AllocationTable
Allocation UnitDeallocation Unit
Allocation Vector
Com
mand
Register
0
MUX
Statu
sR
egister
ControlUnit
Data Bus
BusControlLines
SASIMI April 3-4, 2003 18 © Georgia Institute of Technology, 2003
n Verilog Languagen `define & `ifdef
n not enough, e.g., cannot automatically generate n modules
n Verilog 2000/2001n Generate loops (not supported by available tools)n not enough, e.g., cannot calculate log2(n)
n Verilog PreProcessor (VPP) n `ifdef, `ifndef, `if, `let, `for, `while, `switch & `case
n LOG2, ROUND, CEIL, FLOOR, EVEN, ODD, MAX, MIN & ABS
Customizing the SoCDMMU
SASIMI April 3-4, 2003 19 © Georgia Institute of Technology, 2003
Customizing the SoCDMMU
VPP
SASIMI April 3-4, 2003 20 © Georgia Institute of Technology, 2003
Allocation Unit Optimization
-
0's Counter
-
0's Counter
sz0sz1sz2
s0s1I0I1
mm
I_MUX
I0I1Ik-1
SZ_MUX
sz0sz1szk-1
1's Selector
MUX 0
I0I
I
s0..
.
.
s0s1
sk-1
in[m-1:0]in[2m-1:m]
out[m-1:0]
MUX1
I1I
out[2m-1:m]
-
0's Counter
szk
sk-1Ik-1
m
szk-1
. . .
. . .
. . .MUXk-1
Ik-1I
out[n-1:n-m-2]
in[n-1:n-m-2]
s0s1sk-1
sz_selected I_selected
mmm
log2n
mlog2n
. . .. . .
s1sk-1P
riorit
yE
ncod
er
Prio
rity
Enc
oder
1
MM
RequestSize
0’s Counter
Almost Constant
k Subtractors
k x DS
SZ_MUX
Almost Constatnt
1’s Selector
m x D1
MUX
Almost Constant
SASIMI April 3-4, 2003 21 © Georgia Institute of Technology, 2003
Allocation Unit Optimization
n Delay over the critical pathDelay = C + k*Ds + m*D1
n Also, we haven = k * m : n is the no. of G_blocks
n This leads toDelay = C + k*Ds + (n/k)*D1
n The Delay is minimum whenk = SQR(n*D1/Ds) : k is power of 2
SASIMI April 3-4, 2003 22 © Georgia Institute of Technology, 2003
Xbar Generation
n Customized Xbar generated in Verilog at the RTL level.n An arbiter is generated by RAGn Parameterizable switch blocks are hand-
coded beforehand.n All submodules are connected by wire
names.
SASIMI April 3-4, 2003 23 © Georgia Institute of Technology, 2003
Flowchart of DX-GtUser Input
Validation
SoCDMMU?
Fetch the required *.v *.vpp files
Change the parameters of each *.vpp file
Xbar?
Pass the *.vppfiles to VPP for
processing
gen_xbar()
Generate top level file and compress
all verilog files
No No
YesYes
O/P files to user
User Input*
• 4 ARM9tdmi processors (N=4)
• Use SoCDMMU & Xbar
• 256 G_blocks (n=256)
• 4 Memory Modules (M=4)
• Initial Memory Allocations
*partial list
SASIMI April 3-4, 2003 24 © Georgia Institute of Technology, 2003
Flowchart of the DX-GtGet User I/P
Validation
SoCDMMU?
Fetch the required *.v *.vpp files
Change the parameters of each *.vpp file
Xbar?
Pass the *.vppfiles to VPP for
processing
gen_xbar()
Generate top level file and compress
all verilog files
No No
YesYes
O/P files to user
Fetch *.vpp files
Fetch SocDMMU bus wrapper for ARM9tdmi
Fetch the SoCDMMU *.vpp & *.v files
Calculate the k & m parameters that optimize the Allocation Unit (k=16, m=16 for TSMC 0.25)
SASIMI April 3-4, 2003 25 © Georgia Institute of Technology, 2003
Flowchart of the DX-GtGet User I/P
Validation
SoCDMMU?
Fetch the required *.v *.vpp files
Set the parameters of each *.vpp file
Xbar?
Pass the *.vppfiles to VPP for
processing
gen_xbar()
Generate top level file and compress
all verilog files
No No
YesYes
O/P files to user
`let n = 256 //No. of G_blocks`let p = 4 //No. of processors`let lbsz = 13 //log2(G_block size)`let VLOGR = 1 //Verilogger Friendly`let k = 16 //# of segments`let m = 16 //Segment width
`let ate = LOG2(p) + LOG2(n) + 2 `let v = 32 - lbsz`let ln = LOG2 (n)`let lp = LOG2 (p)`let ln1 = ln - 1`let lp1 = lp - 1`let ate1 = ate - 1`let n1 = n - 1`let p1 = p - 1`let v1 = v - 1
//Allocation Unit Specific`let k1 = k-1`let m1 = m-1`let lk = LOG2(k)`let lm = LOG2(m)...
AllocationUnit.vpp
SASIMI April 3-4, 2003 26 © Georgia Institute of Technology, 2003
Flowchart of the DX-GtGet User I/P
Validation
SoCDMMU?
Fetch the required *.v *.vpp files
Set the parameters of each *.vpp file
Xbar?
Pass the *.vppfiles to VPP for
processing
gen_xbar()
Generate top level file and compress
all verilog files
No No
YesYes
O/P files to user
VPP
SASIMI April 3-4, 2003 27 © Georgia Institute of Technology, 2003
Flowchart of the DX-GtGet User I/P
Validation
SoCDMMU?
Fetch the required *.v *.vpp files
Set the parameters of each *.vpp file
Xbar?
Pass the *.vppfiles to VPP for
processing
gen_xbar()
Generate top level file and compress
all verilog files
No No
YesYes
O/P files to user
SASIMI April 3-4, 2003 28 © Georgia Institute of Technology, 2003
parameters
m<M
prev_req[m]...
m++
gen_proc_wire(M)
n<N
mem_addrn...
n++
m<M
gen_mem_wire(M)
gen_addr_bus_switch(M)gen_data_bus_switch(M)
gen_wire_switch(M)gen_wre_ta_switch(M)
yes
yes
RAG generatingan arbiter
m++
gen_comp(M)
yesgen_Mx1(parameters)
MxN Xbar
n prev_ indicates that signals come from SoCDMMU.
n mem_ indicates that signals to memory blocks.
SASIMI April 3-4, 2003 29 © Georgia Institute of Technology, 2003
parameters
m<M
prev_req[m]...
m++
gen_proc_wire(M)
n<N
mem_addrn...
n++
m<N
gen_mem_wire(M)
gen_addr_bus_switch(M)gen_data_bus_switch(M)
gen_wire_switch(M)gen_wre_ta_switch(M)
yes
yes
RAG generatingan arbiter
m++
gen_comp(M)
yes
gen_Mx1(parameters)
4x4 Xbar
prev_req[0]prev_addr0prev_data0prev_read0prev_write0prev_ta0
m=0
M=4, N=4
prev_req[1]prev_addr1prev_data1prev_read1prev_write1prev_ta1
m=1
prev_req[2]prev_addr2prev_data2prev_read2prev_write2prev_ta2
m=2
prev_req[3]prev_addr3prev_data3prev_read3prev_write3prev_ta3
m=3
mem_addr0mem_data0mem_read0mem_write0mem_ta0
n=0
mem_addr1mem_data1mem_read1mem_write1mem_ta1
n=1
mem_addr2mem_data2mem_read2mem_write2mem_ta2
n=2
mem_addr3mem_data3mem_read3mem_write3mem_ta3
n=3
4x1switch
0
m=1
4x1switch
1
m=2
4x1switch
2
m=3
4x1switch
3
m=4
4x1switch
34x1sw
itch2
4x1switch
04x1sw
itch1
SASIMI April 3-4, 2003 30 © Georgia Institute of Technology, 2003
n Verilog Languagen `define & `ifdef
n not enough, e.g., cannot automatically generate n modules
n Verilog 2000/2001n Generate loops (not supported by available tools)n not enough, e.g., cannot calculate log2(n)
n C Code generates custom Verilog directly
Customizing the Xbar
SASIMI April 3-4, 2003 31 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 32 © Georgia Institute of Technology, 2003
Synthesis Results
n We synthesized different configurations of the SoCDMMU and the Xbar.
n We use the Synopsys Design Compiler ?with a 0.25µm TSMC technology libraryfrom LEDA Systems
SASIMI April 3-4, 2003 33 © Georgia Institute of Technology, 2003
Mx1 Switch Area
0
500
1000
1500
2000
2500
3000
3500
4000
4500
0 2 4 6 8 10 12 14
M: number of processors
Mx1
sw
itch
area
in t
he
nu
mb
er o
f N
AN
D g
ates
equ
ival
ent
wit
h
TS
MC
.25u
m
SASIMI April 3-4, 2003 34 © Georgia Institute of Technology, 2003
MxN Xbar Area
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
50000
0 2 4 6 8 10 12 14
N: number of processors and number of memory ports
NxN
Xba
r ar
ea in
the
num
ber
of N
AND g
ates
equiv
alen
ts w
ith T
SM
C .2
5um
SASIMI April 3-4, 2003 35 © Georgia Institute of Technology, 2003
SoCDMMU Area (w/o memory)
SASIMI April 3-4, 2003 36 © Georgia Institute of Technology, 2003
SoCDMMU Area (Memory)
SoCDMMU Address Converter and Allocation Table Area
0
10
20
30
40
50
60
70
0 2 4 6 8 10 12 14
Number of PEs
6T-S
RA
M w
ith t
he S
ame
area
(KB
)
1024
512
256
128
SASIMI April 3-4, 2003 37 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 38 © Georgia Institute of Technology, 2003
SoC Floorplan
ARM9tdmi+
Caches
DRAMbank 0
ARM9tdmi+
Caches
MPC750 +
Caches
MPC750+
Caches
SoCDMMU w/oAddr. Conv.
Addr. Conv. Addr. Conv. Addr. Conv. Addr. Conv.
Memory CTRL
Peripherals(e.g., Network Interface)
DRAMbank 1
DRAMbank 2
DRAMbank 3
Xbar
Custom Logic
n ARM9TDMI Core: 112k transistorsn L1 $ (128KB: 64KB I$ + 64KB D$):
~6.5M* transistors n SoCDMMU (w/o the memory
elements -- Allocation Table and Address Converters): ~28k transistors.
n Allocation Table: ~168k transistorsn Address Converter: ~320k**
transistorsn L2 (Global Memory)=~16M * 8 =
~128M transistorsn For TSMC 0.25u
n SoCDMMU w/o memory elements: 1.43mm2
n Xbar : 0.23mm2
* Using dual-port 6T SRAM Cells.
** A custom physical design would a much smaller number.
Tools/Information Used to Floorplan:n ARM website (ARM core area) n Synopsys Design Compilern Cadence Silicon Ensemble
SASIMI April 3-4, 2003 39 © Georgia Institute of Technology, 2003
Agenda
n Introduction & Motivationn Target Architecturen The DX-Gtn Synthesis Resultsn SoC Floorplann Conclusions
SASIMI April 3-4, 2003 40 © Georgia Institute of Technology, 2003
Conclusion
n DX-Gt is a System-on-a-Chip IP generation tool that enables an SoC designer to design a multiprocessor SoC and configure its memory and bus subsystems to meet the design constraints with ease.
SASIMI April 3-4, 2003 41 © Georgia Institute of Technology, 2003
Previous Work in Custom Memory Management: Matisse
n Virtual Memory Management Search Spacen Keeping track of free blocksn Choosing a free blockn Freeing allocated blocksn Merging Free Blocks
n Physical Memory Optimizationn Basic Groupsn Basic Groups memory assignmentn Address Optimization
SASIMI April 3-4, 2003 42 © Georgia Institute of Technology, 2003
Previous Work:Matisse vs. SoCDMMU
n Matissen DMM Synthesis (VM & Physical Memory)n Application Specific (suitable for special-
purpose systems, e.g., an ATM switch)
n SoCDMMUn Run-Time DMMn General Purpose (not tied to any
application or configuration)