
296 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2007

A 90-nm Low-Power FPGA for Battery-Powered Applications

Tim Tuan, Arif Rahman, Satyaki Das, Steve Trimberger, and Sean Kao

Abstract—Programmable logic devices such as field-programmable gate arrays (FPGAs) are useful for a wide range of applications. However, FPGAs are not commonly used in battery-powered applications because they consume more power than application-specific integrated circuits and lack power management features. In this paper, we describe the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications. Our design is based on a commercial low-cost FPGA and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial design tools. The implementation is done in a 90-nm triple-oxide CMOS process. Compared to the baseline design, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode and wakes up from standby mode in approximately 100 ns.

Index Terms—Field-programmable gate array (FPGA), programmable logic devices, standby power.

I. INTRODUCTION

A key challenge in the IC scaling era is delivering high-performance solutions while minimizing power and cost. Programmable logic devices such as field-programmable gate arrays (FPGAs) address this challenge by providing a cost-efficient solution for low- to mid-volume applications due to their low non-recurring engineering costs. Also, with in-field programmability, FPGAs provide a platform solution with faster time to market and longer product lifetime. Despite their many advantages, FPGAs are not widely found in today’s mobile applications.

Mobile applications generally have two power requirements: active power and standby power. A typical user application has an extremely low duty cycle, where the device is active for a short period of time (less than 1 h) and then is inactive for a long period of time (days or weeks). During the periods of activity, the device must be energy efficient, that is, perform the necessary functions while consuming minimum energy. During the idle periods, the device must consume little or no power to extend battery life. The active power requirement for a typical mobile IC is on the order of 100s of milliwatts, while its standby power requirement is on the order of 10s to 100s of microwatts [2], [3].
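The duty-cycle argument can be made concrete with a time-averaged power estimate. The following is a minimal Python sketch, not part of the original design flow; the function name `average_power` and all power figures are hypothetical values chosen only to fall within the ranges quoted above (one active hour per 168-hour week).

```python
def average_power(p_active_w: float, p_standby_w: float, duty_cycle: float) -> float:
    """Time-averaged power of a device that is active for a fraction
    `duty_cycle` of the time and in standby for the remainder."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return duty_cycle * p_active_w + (1.0 - duty_cycle) * p_standby_w


if __name__ == "__main__":
    duty = 1.0 / 168.0   # hypothetical: active 1 h per 168-h week
    p_active = 0.3       # hypothetical 300 mW active power
    for p_standby in (100e-6, 10e-3):  # hypothetical 100 uW vs. 10 mW standby
        avg = average_power(p_active, p_standby, duty)
        standby_share = (1.0 - duty) * p_standby / avg
        print(f"standby {p_standby * 1e3:.1f} mW -> "
              f"average {avg * 1e3:.2f} mW ({standby_share:.0%} from standby)")
```

Under these assumed numbers, a 10-mW standby draw accounts for most of the average power even though the device is active only about 0.6% of the time, which is why standby power dominates battery life at low duty cycles.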

Despite FPGAs’ computational energy efficiency advantage over digital signal processors (DSPs) [4], [5], DSPs are today widely used in battery-operated applications, primarily due to their extensive power management capabilities that enable very low power consumption during standby. In contrast, existing FPGAs, designed for high-throughput, high-duty-cycle applications, have few or no power management features. Current low-cost FPGAs consume up to 100s of milliwatts of standby power [23], while high-end FPGAs can consume over 1 W [26]. Compared to mobile ICs, the FPGA’s standby power is at least two orders of magnitude higher than what is required.

In this paper, we present the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications. We base our design on a low-cost commercial FPGA [23]. Due to practical concerns, we constrain the design to be compatible with existing software and process technology. Compared to the baseline Spartan-3 core, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode and wakes up from standby mode in approximately 100 ns.

Manuscript received March 16, 2006; revised July 7, 2006. This paper was recommended by Associate Editor K. Bazargan.

T. Tuan, A. Rahman, S. Das, and S. Trimberger are with Xilinx, Inc., San Jose, CA 95124 USA.

S. Kao is with Newport Media, Lake Forest, CA 92630 USA.

Digital Object Identifier 10.1109/TCAD.2006.885731

II. RELATED WORK

Low-power FPGA design has recently become an area of research interest. Software solutions have been proposed to optimize FPGA power by performing power-aware technology mapping, placement, routing, and lookup-table (LUT) reprogramming [10]–[13]. In the area of hardware design, low-power techniques such as low-swing interconnect, heterogeneous interconnect, multi-Vt, multi-Vdd, and fine-grain power gating have been proposed to reduce FPGA power consumption [14]–[21].

Some of the power optimization techniques presented in this paper have been applied to application-specific integrated circuit (ASIC) and processor design. An early application of power gating was a 0.5-µm multithreshold CMOS (MTCMOS) DSP that achieved a 1000X power reduction in standby mode [6]. An example of aggressive voltage scaling is a commercial 0.18-µm microprocessor that operates down to 0.75 V to achieve dramatic power reduction at low frequencies [7]. More recently, power gating was applied to processor IP blocks in a 0.13-µm process to achieve a 300X standby power reduction [8] and, when combined with multithreshold design, a 40X leakage reduction in a 90-nm DSP processor [9].

The main contribution of this paper is the application of proven low-power ASIC and processor design techniques to a 90-nm commercial FPGA. In doing so, we uncover and address new design issues associated with FPGA architecture, advanced process challenges, cost and performance constraints, and software compatibility.

III. BASELINE ARCHITECTURE AND POWER

For our baseline architecture, we use the Xilinx Spartan-3 FPGA [23], a low-cost FPGA built in a 1.2-V 90-nm CMOS process. We choose a low-cost architecture because many low-power applications are also cost sensitive. To facilitate comparison and to ensure our FPGA can be manufactured and programmed without substantial effort on process technology and software design, we keep our design changes compatible with existing manufacturing processes and computer-aided design tools.

The Spartan-3 core architecture (Fig. 1) comprises an array of configurable logic blocks (CLBs). Each CLB is coupled with a programmable interconnect switch matrix that connects the CLB to adjacent and nearby CLBs. We will refer to each CLB/switch matrix pair as a tile.

Each CLB has four logic slices. Each logic slice has two four-input LUTs (4LUTs), two configurable flip-flops (FFs), and some additional circuitry for fast arithmetic operations and wide-input functions. Each interconnect switch matrix comprises numerous programmable switches that drive CLB outputs to other CLBs or select CLB inputs from signals in the interconnect. Each programmable switch is a buffered multiplexer controlled by a set of configuration memory cells. Although the programmable switches are simple in structure, they typically dominate the CLB power consumption because they make up a large portion of the total area.

In addition to the CLBs and the switch matrices, the FPGA core also has a number of specialty blocks such as block RAMs (BRAMs), multipliers, and digital clock managers that provide efficient implementation of complex functions common to many applications. These specialty blocks are beyond the scope of this paper, but the optimizations described in this paper can be applied to these blocks to achieve similar savings.

Fig. 1. Architecture of the baseline Spartan-3 core.

0278-0070/$25.00 © 2007 IEEE

The power consumption of an FPGA depends on block capacitance, block leakage, switching activity, configuration state, resource utilization, and temperature. To estimate typical FPGA power, we obtain block capacitance and block leakage through exhaustive SPICE simulations of each block under a wide range of input states and configuration states. Typical switching activity is set to 12.5%. Typical utilization for the given architecture is obtained by analyzing over 100 proprietary benchmark designs. Temperature for a typical design is defined as 25 °C for an idle device and 85 °C for an active device. The accuracy of this characterization methodology has been validated in previous work [24], [25].
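The characterization flow above amounts to combining per-resource SPICE data with utilization and activity statistics. The following is a minimal sketch of that bookkeeping, assuming dynamic power of the form α·C·V²·f for used resources and leakage drawn by every resource whether used or not (no power gating, as in the baseline). The 12.5% activity and the 85 °C/25 °C split follow the text; the `Resource` fields, function names, and example numbers are hypothetical and do not describe the authors' tool.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Resource:
    """Characterization data for one resource type (hypothetical fields)."""
    name: str
    count: int           # instances in the array
    utilization: float   # fraction of instances used by a typical design
    cap_f: float         # switched capacitance per used instance (F)
    leak_hot_a: float    # leakage current per instance at 85 C (A)
    leak_cold_a: float   # leakage current per instance at 25 C (A)


def estimate_power(resources: Iterable[Resource], vdd: float, f_clk: float,
                   activity: float = 0.125) -> Tuple[float, float]:
    """Return (active_power_w, standby_power_w).

    Active power = dynamic power of used resources + leakage of all
    resources at the active temperature. Standby power = leakage of all
    resources at the idle temperature (nothing is gated in the baseline)."""
    resources = list(resources)
    dynamic = sum(r.count * r.utilization * activity * r.cap_f * vdd ** 2 * f_clk
                  for r in resources)
    static_active = sum(r.count * r.leak_hot_a * vdd for r in resources)
    standby = sum(r.count * r.leak_cold_a * vdd for r in resources)
    return dynamic + static_active, standby


if __name__ == "__main__":
    # Hypothetical numbers for a single resource type.
    switches = Resource("routing switch", 100_000, 0.4, 20e-15, 50e-9, 10e-9)
    active, standby = estimate_power([switches], vdd=1.2, f_clk=100e6)
    print(f"active {active * 1e3:.2f} mW, standby {standby * 1e3:.2f} mW")
```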

Fig. 2 shows the typical core power consumption of the baseline architecture, excluding the specialty blocks. Active power consists of the dynamic and static power of an active device, while standby power consists of the static power of an idle device. For the array sizes shown, active power is on the order of 10s–100s of milliwatts, while standby power is on the order of 1s–10s of milliwatts. This active power is comparable to that of existing mobile devices, while the standby power is about two orders of magnitude higher. Therefore, we prioritize dramatic standby power reduction and treat active power reduction as a secondary goal.

We further break down total power to identify high-power components. Routing switches make up most of the total active power, while both routing switches and configuration memory represent significant parts of the total static power.

IV. POWER OPTIMIZATIONS

In this section, we describe each of the power optimizations applied to the baseline architecture.

A. Voltage Scaling

Voltage scaling is particularly effective for reducing power because dynamic power and static power are quadratic and exponential functions of the supply voltage, respectively, while circuit speed is approximately a linear function of the supply voltage [1]. Designs such as FPGAs that are optimized for performance typically use relatively high supply voltages to gain speed at the expense of higher power. Consequently, lowering the supply voltage of a high-speed design can often yield a more energy-efficient solution.
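The scaling relations can be checked numerically for the 1.2-V to 1.0-V change made in Pika. The following is a minimal sketch, assuming fixed clock frequency and capacitance for the dynamic term and an exponential leakage model I_leak ∝ exp(k·V) whose coefficient `k_per_volt` is a hypothetical placeholder rather than a characterized 90-nm value; the function names are illustrative.

```python
import math


def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """P_dyn = alpha*C*V^2*f, so at fixed alpha, C, f the ratio is (V_new/V_old)^2."""
    return (v_new / v_old) ** 2


def static_power_ratio(v_new: float, v_old: float, k_per_volt: float) -> float:
    """P_static = V * I_leak with an assumed I_leak ~ exp(k*V)."""
    return (v_new / v_old) * math.exp(k_per_volt * (v_new - v_old))


if __name__ == "__main__":
    # 1.2 V (baseline Spartan-3 core) -> 1.0 V (Pika core voltage)
    print(f"dynamic: {dynamic_power_ratio(1.0, 1.2):.3f}x")
    print(f"static (k = 3/V, hypothetical): {static_power_ratio(1.0, 1.2, 3.0):.3f}x")
```

The dynamic term drops to (1.0/1.2)² ≈ 0.69 of its baseline value, about a 31% reduction at the same frequency; with the assumed k = 3/V, the static term falls further, to roughly 0.46.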

Fig. 3 shows the speed and the static power of a test circuit, a 4LUT driving two double switches, at different voltages. Looking purely at energy efficiency, one would typically set the supply voltage at the point where the power-delay product (PDP) is minimal. However, in this example, PDP continues to drop even below 0.8 V, where performance degradation becomes prohibitively large and reliability becomes a concern. Consequently, considering performance, energy efficiency, and reliability, we choose 1.0 V as our core operating voltage. This leads to power reduction in all core blocks except the configuration memory, which is excluded because its power can be addressed more effectively as described in the next section.
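The voltage choice can be sketched as a constrained minimization: among candidate supplies, keep those whose slowdown relative to nominal stays within a budget, then take the one with the lowest PDP. This toy Python model rests on assumptions throughout: delay follows the Sakurai–Newton alpha-power law with made-up `VT` and `ALPHA`, leakage uses a hypothetical exponential form, and the 20% slowdown budget is illustrative. Reliability, which also drove the 1.0-V choice, is not modeled.

```python
import math

VT = 0.35      # hypothetical threshold voltage (V)
ALPHA = 1.3    # hypothetical velocity-saturation exponent


def delay(v: float) -> float:
    """Alpha-power-law gate delay, arbitrary units: d ~ V / (V - Vt)^alpha."""
    return v / (v - VT) ** ALPHA


def pdp(v: float, c_eff: float = 1.0, leak_coeff: float = 0.02, k: float = 3.0) -> float:
    """Energy per operation: dynamic C*V^2 plus static power over one delay."""
    p_static = leak_coeff * v * math.exp(k * v)
    return c_eff * v ** 2 + p_static * delay(v)


def choose_vdd(candidates, v_nominal: float, max_slowdown: float) -> float:
    """Lowest-PDP supply whose delay is within max_slowdown x the nominal delay."""
    d_nom = delay(v_nominal)
    feasible = [v for v in candidates if delay(v) / d_nom <= max_slowdown]
    if not feasible:
        raise ValueError("no candidate voltage meets the slowdown budget")
    return min(feasible, key=pdp)


if __name__ == "__main__":
    volts = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
    for v in volts:
        print(f"{v:.1f} V: delay x{delay(v) / delay(1.2):.2f}, PDP {pdp(v):.3f}")
    print("chosen:", choose_vdd(volts, v_nominal=1.2, max_slowdown=1.2))
```

As in Fig. 3, PDP in this toy model keeps falling below 0.8 V, so an unconstrained minimum would pick the lowest candidate; the slowdown budget is what stops the search at 1.0 V here.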

B. Low-Leakage Configuration Memory

As shown in Fig. 2, configuration memory represents 44% of the core leakage power. Because it is not timing critical, it is a good candidate for aggressive power optimization, even at some cost in performance. It has been suggested to use high-Vt transistors for configuration memory to save leakage power [11], [19]. However, such a scheme does not address gate leakage, which can be more than 50% of the total leakage power at low temperatures. Therefore, in that regime, any technique that fails to address gate leakage cannot reduce leakage power by more than 50%. Should future process generations successfully adopt high-κ dielectrics to achieve dramatic gate leakage reduction, high-Vt devices will again become suitable for power gating.
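The 50% bound is simple arithmetic: a technique that removes only subthreshold leakage leaves the gate component untouched. The following is a minimal sketch with hypothetical function names, treating total leakage as the sum of the two components.

```python
def residual_leakage(i_sub: float, i_gate: float,
                     sub_reduction: float, gate_reduction: float = 0.0) -> float:
    """Leakage left after cutting the subthreshold and gate components by
    the given fractions (0 = untouched, 1 = eliminated)."""
    return i_sub * (1.0 - sub_reduction) + i_gate * (1.0 - gate_reduction)


def max_savings_subthreshold_only(gate_fraction: float) -> float:
    """Upper bound on total-leakage savings when only subthreshold leakage
    is addressed: the gate fraction always remains."""
    return 1.0 - gate_fraction
```

For example, if gate leakage is half the total, `residual_leakage(1.0, 1.0, sub_reduction=1.0)` still leaves 1.0 of the original 2.0 units, matching `max_savings_subthreshold_only(0.5) == 0.5`. Cutting both components, as the midoxide, high-Vt cell does, is the only way past that bound.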

The presence of significant gate leakage mandates that a low-leakage device have thicker gate oxide. Standard thick-oxide devices used for I/O buffers are much too large to be used for millions of configuration memory cells. Instead, we use a midoxide, high-Vt device for the configuration memory cells. The midoxide transistor is available in the triple-oxide process used by the Virtex-4 FPGA family [26]. Therefore, its use does not require a new process technology.

Using midoxide, high-Vt transistors dramatically cuts both subthreshold leakage and gate leakage in the configuration memory. As shown in Fig. 4, total memory leakage is reduced by nearly two orders of magnitude. The use of midoxide transistors does require additional mask cost and causes some die area increase (quantified in Section V), but the power savings sufficiently justify the added cost.

C. Power Gating for Active Leakage Reduction

Power gating is a well-known power reduction technique in ASICs and microprocessors [6], [8], [9]. To support power gating, one or more power transistors are inserted between a circuit block and its power and/or ground. When the power gates are turned on, active operation is unaffected except for a small performance penalty due to the on-resistance of the power gates. When the power gates are turned off, circuit current is limited to the leakage current of the power gates.

Typically, power gating is applied to coarse-grain functional blocks to reduce standby leakage when the block is temporarily idle. In this paper, we also use power gating to reduce the chip’s active leakage power by power gating unused blocks.

Fig. 2. Typical power consumption of Spartan-3 cores.

Fig. 3. Delay, static power, and PDP of the test circuit at various voltages.

Fig. 4. Leakage of thin-oxide, regular-Vt and midoxide, high-Vt memory cells.

1) Granularity: One of the main decisions in power gating design is the granularity of the smallest block that can be independently power gated. At one extreme, each LUT, FF, and routing switch may be independently gated [16]. This approach leads to a larger fraction of unused blocks and hence greater power savings from switching them OFF, but the area overhead is also greater. At the other extreme, clusters of 20 or more tiles may be controlled by a single power gate [11]. This approach has the benefit of less area overhead from power gating [8], but fewer clusters will be completely unused. The tradeoff of power gating granularity is studied in greater detail in a separate work [22]. We opt for power gating at the level of individual tiles as a compromise between the two extremes and because it leads to efficient physical layout. Averaged over more than 100 benchmarks, we find that 25% of the tiles are unused and can be power gated. A diagram of the power gating architecture is shown in Fig. 5. Configuration memory cells are not power gated, both because they consume very little power owing to their midoxide design and because the ability to retain state in a low-power mode is valuable.
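The granularity tradeoff can be illustrated by counting how many tiles can be switched off when one power gate controls a square cluster of tiles: a cluster is gatable only if every tile in it is unused. This Python sketch uses a made-up 4 × 4 usage map and hypothetical function names, and it ignores the area-overhead side of the tradeoff entirely.

```python
from typing import List


def gatable_fraction(used: List[List[bool]], cluster: int) -> float:
    """Fraction of tiles that can be power gated when each square cluster of
    `cluster` x `cluster` tiles shares one power gate (cluster=1 is tile level)."""
    rows, cols = len(used), len(used[0])
    gated = 0
    for r0 in range(0, rows, cluster):
        for c0 in range(0, cols, cluster):
            block = [used[r][c]
                     for r in range(r0, min(r0 + cluster, rows))
                     for c in range(c0, min(c0 + cluster, cols))]
            if not any(block):  # whole cluster unused -> its gate can be turned off
                gated += len(block)
    return gated / (rows * cols)


def example_grid() -> List[List[bool]]:
    """Hypothetical 4x4 usage map: a free 2x2 corner plus one isolated free tile."""
    grid = [[True] * 4 for _ in range(4)]
    for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]:
        grid[r][c] = False
    return grid


if __name__ == "__main__":
    grid = example_grid()
    for size in (1, 2, 4):
        print(f"cluster {size}x{size}: {gatable_fraction(grid, size):.0%} gatable")
```

Coarser clusters lose the isolated free tile first and, at the largest size, every gating opportunity, which is the effect that tile-level gating trades against its extra area.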

Fig. 5. Proposed power gating architecture, where configuration memory cells are not power gated.

2) Power Gate Design: Another design consideration is the implementation of the power gating transistors. The most straightforward power gating implementation is to use both PMOS and NMOS power gates, but by using only one or the other, one can achieve comparable power savings with much less area overhead. For the same size, NMOS power gates generally give better speed characteristics than PMOS power gates due to greater carrier mobility. In our simulations, we find that the power savings are similar for both. Thus, we choose NMOS-only power gates.

Conventional power gating uses thin-oxide high-Vt transistors [6], which are susceptible to gate leakage in a 90-nm process. Fig. 6(a) shows the gate leakage path introduced by thin-oxide high-Vt power gates. Fig. 6(b) shows the power and delay behavior of thin-oxide and midoxide power gating. Each data point represents a different power gate size. Although thin-oxide power gating is potentially faster, it also consumes more power than midoxide power gating due to the additional gate leakage. Consequently, we implement midoxide power gates. Using simulation, we size the power gates such that performance is degraded by no more than 10%.
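The sizing rule can be sketched as a search over candidate widths. The assumptions here are ours, not the paper's: the delay penalty falls as 1/W because on-resistance scales inversely with width, and off-state leakage grows with W, so the smallest width meeting the 10% budget is also the lowest-leakage compliant choice. `penalty_at_unit_width` is a hypothetical calibration constant that would come from simulation.

```python
from typing import Iterable


def size_power_gate(widths: Iterable[float], penalty_at_unit_width: float,
                    max_penalty: float = 0.10) -> float:
    """Return the smallest candidate width whose delay penalty is within budget.

    Assumes penalty ~ penalty_at_unit_width / W (on-resistance ~ 1/W) and
    leakage ~ W, so the first width that meets the budget leaks least."""
    for w in sorted(widths):
        if penalty_at_unit_width / w <= max_penalty:
            return w
    raise ValueError("no candidate width meets the delay budget")


if __name__ == "__main__":
    # Hypothetical: a unit-width gate would slow the tile by 50%.
    print("chosen width:", size_power_gate([1, 2, 4, 8, 16], 0.5))
```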

D. Standby Modes

A simple extension to tile-level power gating enables a low-power standby mode in which all tiles are power gated while the circuit configuration is retained. To retain the circuit state stored in FFs, we capture the FF contents into dedicated configuration memory cells before entering standby mode and restore the FF states when exiting standby mode. A simple controller coordinates the standby and wake-up sequences to prevent contention.
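The ordering the controller must enforce can be sketched as a small state machine: save before gating, and restore only after power returns. This Python model is hypothetical (the class and method names are ours, and each tile's FF state is collapsed to one bit); it captures sequencing only, not the electrical contention the real controller prevents.

```python
from enum import Enum, auto
from typing import List, Optional


class Mode(Enum):
    ACTIVE = auto()
    STANDBY = auto()


class StandbyController:
    """Toy model of standby entry/exit ordering (hypothetical API).

    `retention` stands in for the dedicated configuration memory cells,
    which are never power gated and so keep their contents in standby."""

    def __init__(self, ff_state: List[int]) -> None:
        self.mode = Mode.ACTIVE
        self.ff: List[Optional[int]] = list(ff_state)
        self.retention: List[Optional[int]] = [None] * len(ff_state)
        self.powered = [True] * len(ff_state)
        self.log: List[str] = []

    def enter_standby(self) -> None:
        if self.mode is not Mode.ACTIVE:
            raise RuntimeError("already in standby")
        # 1. Save FF contents while the tiles are still powered.
        self.retention = list(self.ff)
        self.log.append("save")
        # 2. Turn off the power gates; FF contents are lost.
        self.powered = [False] * len(self.ff)
        self.ff = [None] * len(self.ff)
        self.log.append("gate")
        self.mode = Mode.STANDBY

    def wake_up(self) -> None:
        if self.mode is not Mode.STANDBY:
            raise RuntimeError("not in standby")
        # 1. Turn the power gates back on before touching the FFs.
        self.powered = [True] * len(self.ff)
        self.log.append("ungate")
        # 2. Restore FF contents from the retention cells.
        self.ff = list(self.retention)
        self.log.append("restore")
        self.mode = Mode.ACTIVE


if __name__ == "__main__":
    ctrl = StandbyController([1, 0, 1])
    ctrl.enter_standby()
    ctrl.wake_up()
    print(ctrl.ff, ctrl.log)
```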

Fig. 6. (a) Sleep transistors introduce new gate leakage paths. (b) Sizing of thin-oxide and midoxide power gates produces leakage-delay tradeoffs.

TABLE I
SUMMARY OF EACH TECHNIQUE’S IMPACT ON TOTAL CORE

Many applications need a small amount of functionality to remain active during standby mode to detect wake-up events. Tile-based power gating enables such a partial standby mode. Since each tile can be independently power gated, any combination of tiles can be selected to remain active during standby mode. This decision is programmable by setting the value of a single configuration bit per tile. In Fig. 5, a configurable multiplexer (labeled partial standby) chooses whether each tile remains awake during partial standby mode.
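The per-tile selection reduces to a mask over the tile array. The following hypothetical model assumes, following Section IV-C, that unused tiles are gated in every mode, and that in partial standby a used tile stays powered only if its configuration bit is set; the mode names are our own labels.

```python
from typing import List, Sequence


def tiles_powered(used: Sequence[bool], stay_awake: Sequence[int], mode: str) -> List[bool]:
    """Which tiles have their power gates on in each mode (hypothetical model).

    used       -- tile is occupied by the user design (unused tiles are gated)
    stay_awake -- per-tile partial-standby configuration bit
    mode       -- "active", "partial_standby", or "standby"
    """
    if len(used) != len(stay_awake):
        raise ValueError("one configuration bit per tile is required")
    if mode == "active":
        return [bool(u) for u in used]
    if mode == "partial_standby":
        return [bool(u) and bool(b) for u, b in zip(used, stay_awake)]
    if mode == "standby":
        return [False] * len(used)
    raise ValueError(f"unknown mode: {mode}")
```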

V. RESULTS

A. Power

We designed and laid out Pika in a 90-nm dual-Vt, triple-oxide CMOS process. To characterize Pika’s power, we determined the capacitance and leakage of individual resources with postlayout simulations, and then used this characterization data to estimate the active and standby power of a typical user design with the same methodology described in Section III.

Overall, Pika’s active power is 46% less than that of an equivalent Spartan-3 core, while its standby power is reduced by 99%. The active power improvement comes from voltage scaling as well as static power improvements in the configuration memory cell design. The standby power reduction primarily comes from static power improvements and the standby mode. Because the implemented techniques are not circuit specific, the power breakdown by resource type is approximately the same as that in Fig. 2. Table I shows a breakdown of the power reduction. For each technique, the listed reduction is what we would achieve if we applied that technique alone. Since the techniques are not independent, the individual reductions do not add up to the total reduction.
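Why per-technique entries need not sum to the total can be illustrated with an idealized model in which each technique removes a fraction of whatever power the previous ones left. This is an assumption for illustration only: the actual interactions among the techniques in Table I are not modeled, and the fractions below are hypothetical.

```python
from math import prod
from typing import Iterable


def combined_reduction(reductions: Iterable[float]) -> float:
    """Overall fractional reduction when each technique scales the remaining
    power by (1 - r); order does not matter in this idealized model."""
    return 1.0 - prod(1.0 - r for r in reductions)


if __name__ == "__main__":
    parts = [0.5, 0.75]  # hypothetical stand-alone reductions
    print("sum of parts:", sum(parts))              # 1.25, i.e. an impossible 125%
    print("combined:", combined_reduction(parts))   # 0.875
```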

Fig. 7 compares the power of Spartan-3 cores and Pika cores of equivalent sizes. Since Pika does not have specialty blocks, those blocks are excluded from the Spartan-3 cores in this comparison.

Fig. 7. Active and standby power comparison between baseline and Pika for various size arrays.

Fig. 8. Power consumption of a single tile entering and exiting standby mode.

For equivalent cores of approximately 1500 to 15 000 logic cells (representing low- to medium-density Spartan-3 parts), Pika’s typical active power consumption ranges from 13 to 130 mW, and its standby power (in sleep mode) ranges from 46 to 460 µW. The latter range falls within the aforementioned requirement of 10s–100s of microwatts.
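Both quoted ranges scale by exactly 10X across a 10X range of logic cells, i.e., the power per logic cell is constant. The arithmetic is below; the estimate for a 5000-cell core is our own assumption of linear scaling, not a reported result.

```python
def per_cell(power_w: float, logic_cells: int) -> float:
    """Power per logic cell, in watts."""
    return power_w / logic_cells


# (logic cells, typical active power W, standby power W) as quoted above
PIKA_CORES = [(1500, 13e-3, 46e-6), (15000, 130e-3, 460e-6)]

if __name__ == "__main__":
    for cells, p_active, p_standby in PIKA_CORES:
        print(f"{cells:>6} cells: {per_cell(p_active, cells) * 1e6:.2f} uW/cell active, "
              f"{per_cell(p_standby, cells) * 1e9:.1f} nW/cell standby")
    # Hypothetical linear interpolation for an intermediate core size.
    cells = 5000
    estimate = cells * per_cell(46e-6, 1500)
    print(f"~{estimate * 1e6:.0f} uW standby estimated for {cells} cells")
```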

B. Area

Area results are obtained by measuring the physical layout area. The Pika tile is 40% larger than the equivalent Spartan-3 tile. This additional area increases dynamic power consumption and delay by approximately 5%. These increases are included in our results. They are modest because large contributors to the total delay and dynamic power, such as interconnect buffers and logic blocks, are unaffected by the increases in routing wire length. The area increase also negatively affects chip cost, although when the costs of testing, assembly, and packaging are accounted for, die cost is only a fraction of the total chip cost.

C. Performance

We estimated a user design’s performance by determining the delay of individual resources through postlayout simulation and totaling the delays of a typical critical path. The total performance impact in Pika is approximately 27%. Of that, approximately 7% is due to power gating, 5% is due to the layout area increase, and the rest is due to voltage scaling. The performance penalty from power gating is smaller than the 10% budget we targeted because, in our physical layout, we were able to upsize the power gates to fill open spaces.
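The breakdown leaves about 15% for voltage scaling if the contributions are simply added, as the text does. If the individual slowdowns are instead assumed to compound multiplicatively (our assumption, not the paper's), the voltage-scaling share comes out slightly smaller, near 13%. The helper names below are hypothetical.

```python
def remaining_share(total: float, *known: float) -> float:
    """Additive split, as in the text: whatever is not attributed to known causes."""
    return total - sum(known)


def remaining_factor(total: float, *known: float) -> float:
    """Multiplicative view: slowdowns compose as (1+a)(1+b)(1+c) = 1 + total."""
    product = 1.0
    for k in known:
        product *= 1.0 + k
    return (1.0 + total) / product - 1.0


if __name__ == "__main__":
    total, gating, area = 0.27, 0.07, 0.05
    print(f"voltage scaling (additive):       {remaining_share(total, gating, area):.1%}")
    print(f"voltage scaling (multiplicative): {remaining_factor(total, gating, area):.1%}")
```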

D. Mode Transition Behavior

One of the objectives of this design is a fast wake-up time from standby mode, which involves restoring power and state to the power-gated tiles. Because configuration data is retained in the device, no reconfiguration is needed, which saves a considerable amount of wake-up time.

Fig. 8 shows the power curve of a single tile entering and exiting standby mode. The exit, or wake-up, time is approximately 100 ns. Most of this time is spent charging the gate of the power gate and discharging the virtual ground. Since all tiles are woken in parallel, the entire core also wakes up in approximately 100 ns.

VI. CONCLUSION

FPGAs present numerous advantages in deep-submicrometer IC design. However, high power consumption, in particular high standby power, has thus far prevented FPGAs from being widely adopted in battery-powered applications. We have presented a low-power FPGA core based on a commercial FPGA architecture that achieves dramatic active and standby power reductions with limited performance and area degradation.

REFERENCES

[1] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.

[2] Intel Corp., PXA270 Processor Datasheet. [Online]. Available: http://www.intel.com/design/pca/products/pxa27x/techdocs.htm

[3] Texas Instruments, OMAP5910 Dual-Core Processor Data Manual. [Online]. Available: http://focus.ti.com/docs/prod/folders/print/omap5910.html

[4] T. Claasen, “High speed: Not the only way to exploit the intrinsic computational power of silicon,” in Proc. ISSCC, 1999, pp. 22–25.

[5] P. Schumacher et al., “An efficient JPEG2000 encoder implemented on a platform FPGA,” in Proc. SPIE Annu. Meeting, 2003, pp. 306–313.

[6] S. Mutoh et al., “A 1 V multi-threshold voltage CMOS DSP with an efficient power management technique for mobile phone application,” in Proc. ISSCC, 1996, pp. 168–169.

[7] L. Clark, N. Deutscher, F. Ricci, and S. Demmons, “Standby power management for a 0.18 µm microprocessor,” in Proc. ISLPED, 2002, pp. 7–12.

[8] R. Puri, L. Stok, and S. Bhattacharya, “Keeping hot chips cool,” in Proc. Des. Autom. Conf., 2005, pp. 285–288.

[9] P. Royannez, “90 nm low-leakage SoC design techniques for wireless applications,” in Proc. Int. Solid-State Circuits Conf., 2005, pp. 138–589.

[10] J. Lamoureux and S. Wilton, “On the interaction between power-aware FPGA CAD algorithms,” in Proc. Int. Conf. CAD, 2003, pp. 701–708.

[11] A. Gayasen et al., “Reducing leakage energy in FPGAs using region-constrained placement,” in Proc. Int. Symp. FPGA, 2004, pp. 51–58.

[12] J. Anderson and F. Najm, “Power-aware technology mapping for LUT-based FPGAs,” in Proc. FPT, 2002, pp. 211–218.

[13] J. Anderson, F. Najm, and T. Tuan, “Active leakage power optimization for FPGAs,” in Proc. Int. Symp. FPGAs, 2004, pp. 33–41.

[14] V. George, H. Zhang, and J. Rabaey, “The design of low energy FPGA,” in Proc. Int. Symp. Low Power Electron. Des., 1999, pp. 188–193.

[15] A. Rahman, “Evaluation of low leakage design techniques for field programmable gate arrays,” in Proc. Int. Symp. FPGA, 2004, pp. 23–30.

[16] B. Calhoun, F. Honore, and A. Chandrakasan, “Design methodology for fine-grained leakage control in MTCMOS,” in Proc. ISLPED, 2003, pp. 104–109.

[17] J. Anderson and F. Najm, “Low-power programmable routing circuitry for FPGAs,” in Proc. Custom Integr. Circuits Conf., 2004, pp. 602–609.

[18] A. Rahman, S. Das, T. Tuan, and A. Rahut, “Heterogeneous routing architecture for low power FPGA fabric,” in Proc. CICC, 2005, pp. 183–186.

[19] F. Li, Y. Lin, L. He, and J. Cong, “Low-power FPGA using pre-defined Dual-Vdd/Dual-Vt fabrics,” in Proc. Int. Symp. FPGA, 2004, pp. 42–50.

[20] ——, “FPGA power reduction using configurable Dual-Vdd,” in Proc. Des. Autom. Conf., 2004, pp. 735–740.

[21] F. Li, Y. Lin, and L. He, “Vdd programmability to reduce FPGA interconnect power,” in Proc. Int. Conf. CAD, 2004, pp. 760–765.

[22] A. Rahman, S. Das, T. Tuan, and S. Trimberger, “Determination of power gating granularity for FPGA fabric,” in Proc. CICC, 2006.

[23] Xilinx Inc., Spartan-3 FPGA Family Datasheet. [Online]. Available: http://direct.xilinx.com/bvdocs/publications/ds099.pdf

[24] T. Tuan and B. Lai, “Leakage power analysis of a 90 nm FPGA,” in Proc. Custom Integr. Circuits Conf., 2003, pp. 57–60.

[25] V. Degalahal and T. Tuan, “Methodology for high level estimation of FPGA power consumption,” in Proc. ASP-DAC, 2005, pp. 657–660.

[26] Xilinx Inc., Virtex-4 FPGA Family Datasheet. [Online]. Available: http://direct.xilinx.com/bvdocs/publications/ds302.pdf
