A Tileable Switch Module Architecture for Homogeneous 3D FPGAs

Seyyed Ahmad Razavi1, Morteza Saheb Zamani1, Kia Bazargan2

1 Amirkabir University of Technology, Tehran, Iran. {a.razavi, szamani}@aut.ac.ir

2 University of Minnesota, Minneapolis, Minnesota, USA. [email protected]

Abstract

3D technology is an attractive solution for reducing wirelength in a field programmable gate array (FPGA). However, through-silicon vias (TSVs) are limited in number. In this paper, we propose a tileable switch module architecture based on the 3D disjoint switch module for 3D FPGAs. Experimental results over 20 MCNC benchmarks show a 62% reduction in the number of TSVs on average and small improvements in horizontal channel width and delay compared to the original 3D disjoint SM.

1. Introduction

Due to advances in fabrication technology and shrinking feature sizes, interconnection delay has become responsible for most of a circuit's delay. One of the solutions proposed in recent years is to use 3D technologies. In 3D technologies, multiple substrates are stacked on top of each other, using through-silicon vias (TSVs) for inter-tier communication. 3D technologies provide a few useful properties: (1) wire length reduction, (2) design footprint reduction and (3) the ability to integrate different technologies [1].

In SRAM-based FPGAs, programmable routing resources have the largest impact on the delay and area of the chip. 3D technology provides interesting opportunities for reducing FPGA routing delay through shorter wire lengths, although care must be taken not to increase the area too much. 3D FPGAs can be categorized into two types: (1) monolithically stacked 3D FPGAs [3,4] and (2) 3D FPGAs formed by stacking multiple active device layers [1][2]. In the first type, the devices used for logic blocks, routing resources and configuration memory are laid out in separate layers. With this architecture, the distance between logic blocks is reduced, leading to a decrease in the length of interconnection wires. The reported results show improvements in logic density, delay and power of 3.2, 1.7 and 1.7 times, respectively, compared to a 2D FPGA [3]. Lin et al. [4] designed a routing fabric for such FPGA architectures, resulting in circuit delay improvements.

In the second type, a 3D FPGA consists of multiple stacked island-based 2D FPGAs connected via 3D switch boxes (SBs). A 3D SB connects to the top/bottom layers using TSVs. In current 3D technologies, TSVs are large and have large pitch sizes, resulting in practical limitations on the number of such 3D vias that one can use [1]. In addition, 3D SBs have more switches than 2D SBs, which can lead to more area and delay. Therefore, it is reasonable to reduce their count in a switch module (SM) to decrease the FPGA area and cost. Ababei et al. [2] introduced the TPR tool for 3D FPGA placement and routing. Their tool is based on VPR [5] and is open source [6]. To minimize the number of TSVs, they first partition logic blocks using the min-cut hMetis partitioning algorithm [7], and then assign the partitions to layers using a linear placement method. After assigning logic blocks to layers, they use partitioning-based placement [8] for each layer with some restrictions on the location of some blocks. They use the PathFinder algorithm to route the design. They also design a 3D SM named disjoint, which is a 3D extension of the ordinary 2D disjoint SM. In their architecture, the number of 3D SBs in an SM is smaller than the horizontal channel width.

In [1], a number of 3D SM topologies as well as thermal issues are discussed. Compared to the disjoint SM in [1,2], their best SM topology, Universal-twist, reduces the number of TSVs and the area-delay product by 49% and 9%, respectively, over 20 MCNC benchmarks. This SM topology is 2D universal in the horizontal direction. They used SA-based placement and timing-driven routing.

In previous 3D SM architectures, nets that span multiple layers had to be routed on tracks that connected exclusively to 3D SBs. We first show that SM architectures with this property cannot use TSVs efficiently. Then we present our tileable SM architecture, which distributes 3D SBs better and hence uses TSVs more efficiently. Our SM architecture, called MDisjoint, is based on the 3D disjoint SM [1][2] but has a different topology. With the proposed architecture, more tracks in a channel can access TSVs using the same number of switches. Therefore, routability can improve, resulting in a decrease in the number of 3D SBs and TSVs. Our idea can be applied to the SMs in [1] as well.

The paper is organized as follows. In Section 2, we first show that the 3D disjoint SM cannot use TSVs efficiently, and then in Section 3 we describe our SM architecture. In Section 4, we quantify the improvement obtained with the new 3D SM, and finally we conclude the paper in Section 5.

978-1-4244-4512-7/09/$25.00 2009 IEEE

2. Shortcomings of the 3D Disjoint Switch Module

As in 2D FPGAs, 3D SM architecture has a significant effect on area, routability and delay of a design. We first explain the inefficiency of disjoint 3D SBs and then describe our tileable 3D SM (MDisjoint).

Figure 1.a shows a 2D disjoint SB. It uses 6 switches, shown by dashed lines, to connect 4 segments from 4 tracks connected to the SM. Each switch consists of a pass transistor, a reconfigurable memory and potentially a buffer. Figure 1.b shows the 3D extension of the 2D SB. In this architecture, wire segments in one layer are connected to neighboring layers by TSVs and there are 8 additional switches for connections to TSVs. Therefore, this 3D SB takes more area and causes more delay compared to the 2D SB. A 3D disjoint SM may consist of 2D and 3D disjoint SBs. In Figure 1.c, a 3D disjoint SM with horizontal channel width of four and one TSV is shown.

Figure 1: (a) 2D SB, (b) 3D SB and (c) 3D disjoint SM with horizontal channel width of four and one TSV
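The switch counts quoted for Figure 1 can be reproduced with a short enumeration. This is a minimal sketch under two assumptions that the text does not state explicitly: the 2D SB has one switch per pair of planar sides, and each of the 8 additional 3D switches joins one planar side to one vertical side. The side names and function names are illustrative only.

```python
from itertools import combinations

PLANAR_SIDES = ("left", "top", "right", "bottom")
VERTICAL_SIDES = ("above", "below")


def switches_2d():
    """One switch per unordered pair of planar sides (assumed model of the 2D disjoint SB)."""
    return list(combinations(PLANAR_SIDES, 2))


def extra_switches_3d():
    """One switch per (planar side, vertical side) pair (assumed model of the TSV connections)."""
    return [(p, v) for p in PLANAR_SIDES for v in VERTICAL_SIDES]


print(len(switches_2d()))        # 6
print(len(extra_switches_3d()))  # 8
```

Under this model a 3D SB carries 6 + 8 = 14 switches, which is the source of the extra area and junction capacitance discussed in the text.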

We use the term disjoint 3D SBs to refer to SMs in which 3D SBs only connect to other 3D SBs. In SM architectures with disjoint 3D SBs, nets that span multiple layers have to be routed on tracks that connect exclusively to 3D SBs. For example, Figure 2.a shows four neighboring 3D disjoint SMs. The horizontal channel width is 4 in this figure, and the number of TSVs in an SM is 1. The darker SBs are 3D. The connections to TSVs are not shown in the SMs to simplify the figure. As can be seen, if a net spans multiple layers, it must be routed exclusively on track 0, both in the horizontal direction (within a 2D plane) and in the vertical direction (using TSVs). All of the SBs on track 0 are 3D, and therefore a net routed on track 0 passes through 3D SBs in all the SMs on its way. To better understand the inefficiencies associated with using TSVs in this architecture, consider three adjacent disjoint SMs (Figure 2.b). Only tracks in the X direction are shown for simplicity. In the 3D disjoint SM, only one of the four nets entering the right-most SM can use TSVs, by routing on track 0.

If wire segments span no more than one CLB, a two-terminal net, $net_i$, with total wire length $L$ will pass through $L$ SBs. If $net_i$ spans multiple layers, it will pass through $L$ 3D SBs and no 2D SB. These 3D SBs contain the same number of TSVs. For example, in Figure 2.b the horizontal segments of the net, shown in bold, use 3D SBs and occupy a horizontal track passing through 3D SBs; therefore, other nets cannot use the TSVs in these SBs.

Assume that a circuit has $N_{3D}$ 3D nets with an average wire length of $L_{3D}^{avg}$. In the best case, all 6 segments of every 3D disjoint SB are occupied. Since a net passing through an SB occupies two of its segments, each 3D SB serves at most 6/2 = 3 nets, so we need at least $L_{3D}^{avg} \cdot N_{3D} / (6/2)$ 3D SBs to route these 3D nets. The number of TSVs is the same. Because these nets use 3D SBs for both their horizontal and vertical routing, the tracks connected to the TSVs become occupied and other nets cannot reach the unused TSVs. A corollary to this argument is that architectures that use TSVs more efficiently will need fewer TSVs for routing the nets. For example, assume that all 3D nets span 2 layers. In an optimal routing, each 3D net needs one TSV. Then the ratio of the optimal number of TSVs to the number of TSVs needed in the disjoint 3D SM is $3 / L_{3D}^{avg}$. For $L_{3D}^{avg} = 9$, this ratio is 1/3, which shows the low TSV utilization (about 33%) of the disjoint 3D SM.
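The bound above is easy to check numerically. The following sketch evaluates the minimum number of 3D SBs for a given number of 3D nets and average wire length, and the resulting ratio of optimal to required TSVs for nets spanning two layers. The function names and the sample net count are illustrative assumptions, not values from the experiments.

```python
from fractions import Fraction

SEGMENTS_PER_3D_SB = 6   # segments of one 3D disjoint SB
SEGMENTS_PER_NET = 2     # a net passing through an SB occupies two segments


def min_3d_sbs(n_3d_nets, avg_len):
    """Lower bound on the 3D SBs (and TSVs) needed by the disjoint 3D SM."""
    nets_per_sb = Fraction(SEGMENTS_PER_3D_SB, SEGMENTS_PER_NET)  # 6/2 = 3
    return Fraction(avg_len) * n_3d_nets / nets_per_sb


def optimal_to_disjoint_ratio(avg_len):
    """Optimal TSVs (one per two-layer net) over the disjoint lower bound: 3 / avg_len."""
    n_3d_nets = 1  # the ratio does not depend on the number of nets
    return Fraction(n_3d_nets) / min_3d_sbs(n_3d_nets, avg_len)


print(min_3d_sbs(100, 9))            # 300
print(optimal_to_disjoint_ratio(9))  # 1/3
```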

Figure 2: (a) Adjacent 3D disjoint SMs and (b) segment view in the X direction


If a critical net spans multiple layers, it must pass through 3D SBs. In addition to inefficient use of TSVs, disjoint 3D SBs may cause higher circuit delay because 3D SBs have more switches and hence more junction capacitance than 2D SBs.

3. Proposed Architecture

We improve the routability and delay of SMs by staggering 3D SBs between tracks, which places 3D SBs on different tracks in adjacent SMs. In this architecture, 3D nets try to use 2D SBs in the horizontal section of their routes, hence reducing the number of TSVs used.

The proposed architecture, called MDisjoint, is an extension of the 3D disjoint SM. MDisjoint is tileable, and the method that we use to change the 3D SB distribution is applicable to other 3D SMs (e.g., those in [1]) as well. MDisjoint is similar to the 2D disjoint SM in the horizontal plane except that tracks are staggered. It is regular, and its formal description is shown in Figure 3. In this description, parameter S indicates the number of tracks in the channel which can connect to TSVs in adjacent SMs. Using this architecture, more tracks can reach TSVs, which improves routability.

Original disjoint SM
For (i = 0; i < tracks_in_horizontal_channel; i++) {
    If (i < number_of_TSVs)
        Connect tracks (i) in [left, top, right, bottom, above, below] sides to each other;
    else
        Connect tracks (i) in [left, top, right, bottom] sides to each other;
}

MDisjoint SM
// connections in horizontal plane
For (i = 0; i < tracks_in_horizontal_channel; i++) {
    If (i < S)
        Connect tracks (i) in [left, top] sides and tracks (i+1) in [right, bottom] sides;
    If (i == S)
        Connect track (S) in [left, top] sides and track (0) in [right, bottom] sides;
    If (i > S)
        Connect tracks (i) in [left, top, right, bottom] sides;
}
// connections of TSVs
For (j = 0; j < number_of_TSVs; j++) {
    Connect 2 TSVs from [above, below] layers to 2D disjoint SB on track (j * S / number_of_TSVs);
    Connect TSV (j) in [below] and [above] layers;
}

Figure 3: Formal description of the original 3D disjoint SM and MDisjoint
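The rules in Figure 3 can be written as a small track-mapping sketch. Here an SM is modeled only by which right/bottom track each left/top track is joined to, and by the list of SB positions that receive TSVs. The function names are illustrative, and the use of integer division for j * S / number_of_TSVs is an assumption, since the figure does not say how fractional positions are rounded.

```python
def disjoint_map(width):
    """Original disjoint SM: left/top track i is joined to right/bottom track i."""
    return {i: i for i in range(width)}


def mdisjoint_map(width, s):
    """MDisjoint SM: tracks 0..s are rotated by one position, tracks above s go straight."""
    if not 0 <= s < width:
        raise ValueError("s must satisfy 0 <= s < width")
    mapping = {}
    for i in range(width):
        if i < s:
            mapping[i] = i + 1   # left/top track i <-> right/bottom track i + 1
        elif i == s:
            mapping[i] = 0       # wrap around to track 0
        else:
            mapping[i] = i       # ordinary 2D disjoint connection
    return mapping


def tsv_sb_tracks(s, n_tsvs):
    """Left/top track index of each SB that receives TSVs (integer division assumed)."""
    return [j * s // n_tsvs for j in range(n_tsvs)]


print(mdisjoint_map(4, 2))   # {0: 1, 1: 2, 2: 0, 3: 3}
print(tsv_sb_tracks(2, 1))   # [0]
```

With a channel width of 4 and S = 2, following the figure literally gives three staggered tracks (0, 1 and 2) and one straight track (3).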

Figure 4 shows the MDisjoint architecture for a horizontal channel width of 4 and one TSV. In this architecture, tracks are divided into two categories: (1) tracks that connect to disjoint 2D SBs (e.g., track 3 in Figure 4), where nets routed on these tracks pass through disjoint 2D SBs only, and (2) tracks that can reach TSVs directly or indirectly (e.g., tracks 0, 1 and 2 in the figure). In the proposed architecture, due to the staggering of tracks, more nets can use TSVs, although after different numbers of SM hops. Therefore, not only do 3D nets avoid the overhead of large disjoint 3D SBs, but the required number of 3D SBs and TSVs also decreases. In Figure 4.b, only tracks in the X direction are shown. Three nets entering the right-most SM can use TSVs by routing on tracks 0, 1 and 2, although after different numbers of hops. For example, a net entering on track 0 (1) can use the TSV in the left-most (right-most) SM.
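The hop counts described for Figure 4.b can be reproduced with a small walk over staggered tracks. This sketch assumes a net enters an SM from the right and travels left, that tracks 0..S rotate by one position per SM as in Figure 3, and that the TSV sits on the SB whose left-side track is 0. The function name and the return convention are illustrative.

```python
def hops_to_tsv(entry_track, s, tsv_tracks):
    """Number of SM-to-SM hops before a left-moving net meets an SB with a TSV.

    Returns 0 if the TSV is in the SM the net enters, or None if the track
    only ever sees plain 2D disjoint SBs.
    """
    if entry_track > s:
        return None                        # straight 2D disjoint track
    period = s + 1                         # tracks 0..s form one rotation cycle
    track = entry_track
    for hop in range(period):
        left_track = (track - 1) % period  # SB joining this right track to its left track
        if left_track in tsv_tracks:
            return hop
        track = left_track                 # the next SM is entered on this track
    return None


# Channel width 4, S = 2, one TSV on the SB with left-side track 0.
for t in range(4):
    print(t, hops_to_tsv(t, 2, {0}))
# 0 2
# 1 0
# 2 1
# 3 None
```

The result matches the text: a net entering on track 1 uses the TSV in the right-most SM, while a net entering on track 0 reaches a TSV two SMs further left, and track 3 never reaches one.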

Figure 4: (a) MDisjoint SM and (b) segment view in the X direction

4. Experimental Results

To evaluate our SM architecture, we used TPR [2], a placement and routing tool for 3D FPGAs. We modified the TPR routing algorithm, which is routability-driven, to make it timing-driven. A configurable logic block (CLB) in our experiments consists of one 4-input LUT and a FF. Each input and output pin of the CLB can connect to all adjacent tracks. All horizontal segments (CHANX and CHANY) span 1 CLB and are driven by tri-state buffers. The TSVs span one layer. Parameter S in the MDisjoint SM is set to 5 and 7 for the first and the second ten benchmarks, respectively.

To eliminate the effects of placement variation over multiple runs on the results, we first place the netlist once and then route the placed netlist for the different SM architectures. Similar to the method used in [1], the placed netlists are first routed with a large horizontal channel width (HCW) to find the minimum number of TSVs; then, with the number of TSVs fixed, the design is rerouted to find the minimum HCW.
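The two-phase procedure can be sketched as two successive searches. In the sketch below, `routable(hcw, tsvs)` is a hypothetical stand-in for a full routing run of the placed netlist, and the linear scans are an assumption made for simplicity; the text does not say how each minimum is searched for.

```python
from typing import Callable, Optional, Tuple


def min_tsvs_then_hcw(
    routable: Callable[[int, int], bool],
    large_hcw: int,
    max_tsvs: int,
) -> Optional[Tuple[int, int]]:
    """Return (hcw, tsvs): minimum TSVs per SM at a large HCW, then minimum HCW."""
    # Phase 1: smallest TSV count that routes with a generous channel width.
    tsvs = next((t for t in range(1, max_tsvs + 1) if routable(large_hcw, t)), None)
    if tsvs is None:
        return None
    # Phase 2: with the TSV count fixed, smallest channel width that still routes.
    hcw = next(w for w in range(1, large_hcw + 1) if routable(w, tsvs))
    return hcw, tsvs


def toy_routable(hcw, tsvs):
    """Toy routability model (illustrative only): needs HCW >= 8 and TSVs >= 2."""
    return hcw >= 8 and tsvs >= 2


print(min_tsvs_then_hcw(toy_routable, large_hcw=30, max_tsvs=6))  # (8, 2)
```

Fixing the TSV count before minimizing HCW reflects the priority given to TSVs as the scarcer resource.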

Table 1 tabulates the results for a four-layer 3D FPGA using the original 3D disjoint SM and MDisjoint. For all of the benchmarks, using the MDisjoint SM reduces the number of TSVs as expected, and almost all the benchmarks are routed with only one TSV in every SM. The average HCW is almost the same in the two SM architectures. In our experiments, we found that HCW is highly dependent on the number of TSVs in both the original disjoint and MDisjoint SM architectures. Experimental results over 20 MCNC benchmarks show a 62% reduction in the number of TSVs on average and small improvements in HCW and delay compared to the original 3D disjoint SM.

Table 1: Experimental results for the original 3D disjoint SM and the proposed architecture

              3D disjoint SM     Proposed SM        Ratio
circuit       HCW  TSV  Delay    HCW  TSV  Delay    HCW   TSV   Delay
tseng           9    2  0.474      8    1  0.490    0.89  0.50  1.03
diffeq         10    2  0.543      9    1  0.543    0.90  0.50  1.00
dsip           14    2  0.319     13    1  0.266    0.93  0.50  0.83
bigkey         16    2  0.348     14    1  0.306    0.88  0.50  0.88
s298           11    3  0.887     12    1  0.965    1.09  0.33  1.09
frisc          12    4  1.200     13    1  1.207    1.08  0.25  1.01
elliptic       14    3  0.865     13    1  0.842    0.93  0.33  0.97
s38417         16    1  0.810     13    1  0.774    0.81  1.00  0.96
s38584.1        9    2  0.714      9    1  0.682    1.00  0.50  0.95
clma           16    3  1.338     15    1  1.465    0.94  0.33  1.10
ex5p           16    5  0.478     14    2  0.462    0.88  0.40  0.97
apex4          12    5  0.472     14    1  0.577    1.17  0.20  1.22
misex3         10    4  0.526     11    1  0.495    1.10  0.25  0.94
alu4            9    4  0.506     10    1  0.509    1.11  0.25  1.01
des            12    2  0.450     11    1  0.443    0.92  0.50  0.98
seq            12    5  0.524     14    1  0.516    1.17  0.20  0.98
apex2          12    4  0.648     14    1  0.644    1.17  0.25  0.99
spla           22    5  0.944     21    1  0.887    0.95  0.20  0.94
pdc            22    6  1.021     22    2  0.963    1.00  0.33  0.94
ex1010         17    3  0.897     15    1  0.892    0.88  0.33  1.00
Average ratio                                       0.99  0.38  0.99
Improvement                                         0.01  0.62  0.01
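The average ratios in Table 1 can be recomputed from the per-circuit HCW, TSV and delay columns. The sketch below does this as the arithmetic mean of the per-circuit ratios (proposed over original), which is an assumption about how the averages were formed. Individual ratios recomputed from the rounded delay values may differ from the printed ratio column in the last digit.

```python
# (circuit, HCW, TSV, delay) for the 3D disjoint SM, then the same for the proposed SM
ROWS = [
    ("tseng", 9, 2, 0.474, 8, 1, 0.490),
    ("diffeq", 10, 2, 0.543, 9, 1, 0.543),
    ("dsip", 14, 2, 0.319, 13, 1, 0.266),
    ("bigkey", 16, 2, 0.348, 14, 1, 0.306),
    ("s298", 11, 3, 0.887, 12, 1, 0.965),
    ("frisc", 12, 4, 1.200, 13, 1, 1.207),
    ("elliptic", 14, 3, 0.865, 13, 1, 0.842),
    ("s38417", 16, 1, 0.810, 13, 1, 0.774),
    ("s38584.1", 9, 2, 0.714, 9, 1, 0.682),
    ("clma", 16, 3, 1.338, 15, 1, 1.465),
    ("ex5p", 16, 5, 0.478, 14, 2, 0.462),
    ("apex4", 12, 5, 0.472, 14, 1, 0.577),
    ("misex3", 10, 4, 0.526, 11, 1, 0.495),
    ("alu4", 9, 4, 0.506, 10, 1, 0.509),
    ("des", 12, 2, 0.450, 11, 1, 0.443),
    ("seq", 12, 5, 0.524, 14, 1, 0.516),
    ("apex2", 12, 4, 0.648, 14, 1, 0.644),
    ("spla", 22, 5, 0.944, 21, 1, 0.887),
    ("pdc", 22, 6, 1.021, 22, 2, 0.963),
    ("ex1010", 17, 3, 0.897, 15, 1, 0.892),
]


def average_ratio(new_col, old_col):
    """Mean over all circuits of (proposed value / original value)."""
    return sum(row[new_col] / row[old_col] for row in ROWS) / len(ROWS)


hcw_ratio = average_ratio(4, 1)
tsv_ratio = average_ratio(5, 2)
delay_ratio = average_ratio(6, 3)
print(round(hcw_ratio, 2), round(tsv_ratio, 2), round(delay_ratio, 2))  # 0.99 0.38 0.99
```

An average TSV ratio of 0.38 corresponds to the 62% average reduction in TSVs reported in the text.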

In these experiments, the average TSV utilization was 46% for the original 3D disjoint SM and 70% for the MDisjoint SM. This increase of 24 percentage points in TSV utilization shows that our architecture uses TSVs more efficiently.

5. Conclusion

In an SM with disjoint 3D SBs, a net that spans multiple layers is routed through 3D SBs along its entire routing path. In this paper, we first showed that SMs with disjoint 3D SBs cannot use TSVs efficiently. Then we described our tileable SM architecture, MDisjoint, which is an extension of the 3D disjoint SM. By staggering tracks in the MDisjoint SM, 3D SBs appear on different tracks in adjacent SMs. Experimental results over 20 MCNC benchmarks show a 62% reduction in the number of TSVs on average, without degradation in performance or an increase in horizontal channel width, compared to the original 3D disjoint SM. Our idea can be applied to the SMs in [1] as well.

6. References

[1] A. Gayasen et al., Designing a 3-D FPGA: switch box architecture and thermal issues, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 2008.

[2] C. Ababei et al., Three-dimensional place and route for FPGAs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2006.

[3] M. Lin et al., Performance benefits of monolithically stacked 3D-FPGA, International Symposium on Field Programmable Gate Arrays (FPGA), 2006.

[4] M. Lin et al., A Routing Fabric for monolithically stacked 3D-FPGA, International Symposium on Field Programmable Gate Arrays (FPGA), 2007.

[5] V. Betz et al., VPR: a new packing placement and routing tool for FPGA research, International Symposium on Field Programmable Gate Arrays (FPGA), 1997.

[6] https://netfiles.umn.edu/users/kia/www/index.html

[7] G. Karypis et al., Multi-level hypergraph partitioning: application in VLSI design, Design Automation Conference (DAC), 1997.

[8] P. Maidee et al., Timing-driven Partitioning-based Placement for Island Style FPGAs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2005.
