resonant clocking using distributed parasitic capacitance



1520 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 9, SEPTEMBER 2004

Resonant Clocking Using Distributed Parasitic Capacitance

Alan J. Drake, Student Member, IEEE, Kevin J. Nowka, Member, IEEE, Tuyet Y. Nguyen, Jeffrey L. Burns, Member, IEEE, and Richard B. Brown, Senior Member, IEEE

Abstract: A resonant-clock generation and distribution scheme that uses the inherent, parasitic capacitance of the clocked logic as a lumped capacitor in a negative-resistance oscillator is described. Clock energy is resonated between inductors and the parasitic, local clock network to save power over traditional clocking methodologies. Theory predicts that the data passing through the clocked logic will change the clock frequency by less than 1.25%. A resonant clock test chip was designed and fabricated in an IBM 0.13-µm partially depleted SOI process. Although the test chip was designed to operate in the gigahertz range using integrated inductors, startup difficulties required the addition of external inductance to reduce the resonant frequency so that the effects of the parasitic capacitance could be measured. The parasitic capacitance is approximately 40 pF per clock phase, resulting in a clock frequency between 106 and 146 MHz, depending on biasing. At its most efficient bias point, the clock dissipated 2.09 mW, which is approximately 35% less power than a conventional, buffer-driven clock. The maximum period jitter measured in the resonant clock due to changing data in the clocked latches was 55 ps at 124 MHz, or 0.68% of the clock period.

Index Terms: Clock generator, energy-recovery circuit, harmonic resonance, low power.

Manuscript received January 24, 2004; revised February 27, 2004. This work was supported in part by a faculty grant from the IBM Austin Center for Advanced Studies.

A. J. Drake, K. J. Nowka, and T. Y. Nguyen are with the IBM Austin Research Laboratory, Austin, TX 78758 USA (e-mail: [email protected]).

J. L. Burns is with the IBM Thomas J. Watson Research Laboratory, Yorktown Heights, NY 10598 USA.

R. B. Brown is with the University of Utah, Salt Lake City, UT 84112 USA.

Digital Object Identifier 10.1109/JSSC.2004.831435

I. INTRODUCTION

The clock distribution network of a microprocessor is typically divided into global and local clock distributions, as shown in Fig. 1(a). The global clock distribution comprises a clock source and the wires and buffers needed to drive the clock source to the logic gates. Since the clock buffers essentially drive the clock network in parallel, they can be combined to form the simplified circuit in Fig. 1(b). The buffers form an $N$-stage exponential horn where each stage has a gain of $k$. The total capacitance of the global buffers and wires is labeled as $C_{global}$. The local clock distribution consists of the wires that connect the clock loads (latches and gates) in the microprocessor's functional units. The capacitance of the local clock distribution, $C_{local}$, is the sum of the local wires and gate loads and forms the clock sink. In a properly designed exponential horn, the gain is balanced evenly across a number of stages, $N$; the input capacitance of each stage is the output capacitance of the stage divided by the stage gain, as shown in Fig. 1(a). An approximate value for the clock distribution capacitance can be computed as

$$C_{clk} = C_{local} + C_{global} \approx C_{local}\sum_{i=0}^{N}\frac{1}{k^{i}} \approx \frac{k}{k-1}\,C_{local} \tag{1}$$

which is sufficiently accurate even for a small number, $N$, of buffer stages; the added capacitance of the third stage from the load of a balanced buffer horn with a stage gain of 3 is $C_{local}/27$. Ignoring leakage and short-circuit power, the value for $C_{clk}$ obtained in (1) can be used to estimate the power dissipated in the clock distribution network as

$$P_{clk} = C_{clk} V_{dd}^{2} f_{clk} \approx \frac{k}{k-1}\,C_{local} V_{dd}^{2} f_{clk} \tag{2}$$

where $V_{dd}$ is the power-supply voltage and $f_{clk}$ is the clock frequency. For a buffer horn with a stage gain of 3, 2/3 of the clock power is dissipated in the local clock distribution and latches, which makes reducing local clock capacitance the prime target for reducing clock power.
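The buffer-horn arithmetic above can be checked numerically. The sketch below assumes the horn capacitance is a plain geometric series in the stage gain (wire capacitance in the global tree is ignored), and the operating point (40 pF, 1.2 V, 1 GHz, seven stages) is a hypothetical choice for illustration rather than a value tied to these equations in the text.

```python
def horn_capacitance(c_local, stage_gain, n_stages):
    """Local load plus the input capacitance of each of n_stages buffer stages."""
    return sum(c_local / stage_gain**i for i in range(n_stages + 1))


def horn_capacitance_limit(c_local, stage_gain):
    """Closed-form limit of the geometric series: k / (k - 1) * C_local."""
    return c_local * stage_gain / (stage_gain - 1)


def clock_power(c_clk, v_dd, f_clk):
    """Dynamic switching power C * V^2 * f (leakage and short-circuit ignored)."""
    return c_clk * v_dd**2 * f_clk


# Hypothetical operating point, chosen only for illustration.
C_LOCAL = 40e-12   # farads
V_DD = 1.2         # volts
F_CLK = 1e9        # hertz
STAGE_GAIN = 3

c_exact = horn_capacitance(C_LOCAL, STAGE_GAIN, n_stages=7)
c_limit = horn_capacitance_limit(C_LOCAL, STAGE_GAIN)
local_fraction = C_LOCAL / c_limit                      # share of power in the local load
third_stage_share = (C_LOCAL / STAGE_GAIN**3) / C_LOCAL  # increment from the third stage

print(f"series / closed form   = {c_exact / c_limit:.5f}")
print(f"local share of power   = {local_fraction:.3f}")
print(f"third-stage increment  = C_local / {1 / third_stage_share:.0f}")
print(f"estimated clock power  = {clock_power(c_limit, V_DD, F_CLK) * 1e3:.1f} mW")
```

With a stage gain of 3 the closed form gives 1.5 times the local capacitance, which is where the 2/3 local share comes from, and the finite series is already very close to that limit after a few stages.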

As can be seen from (2), the clock power dissipation depends strongly on the capacitance of the local clock distribution. Deeper pipelining and greater complexity with each new generation of microprocessors have steadily increased the size of the local clock load. This increasing clock load, combined with ever-increasing clock frequencies, has made the clock-distribution network the major power consumer in modern microprocessors. The POWER4 microprocessor, for example, dissipates 70% of its power in its clock distribution and latches [1].

Little can be done by the clock-tree designer to reduce the capacitance of the local clock distribution, since the logic fixes the clock load. Instead, local clock power is reduced by shutting off sections of the local clock distribution using aggressive clock gating. To reduce global clock power, care is taken to optimize the global clock-distribution capacitance through efficient buffer allocation [2]. Another approach is to leverage the clock line inductance to aid in signal propagation of the global clock [3]. These techniques have been successful in slowing the growth of clock power dissipation but are ultimately limited by the fixed clock capacitance that needs to be switched.

More exotic clock-power-reduction techniques that use some form of resonance to recycle clock energy have been increasingly studied due to their potential for significant power reduction. The general idea is to form an energy-efficient tank with a high quality factor that dissipates power only in the parasitic resistance of the network, not in switching the clock capacitance. Adiabatic circuits represent the ultimate goal in resonance in that all circuit power is recycled. Adiabatic logic benefits from slow transition times, making it impractical for high-performance logic, although modified adiabatic circuits have been developed that run above 100 MHz [4], [5]. Another resonant-clock generation technique establishes a standing or traveling wave using the transmission-line characteristics of the clock lines; this approach has yet to demonstrate a power advantage over established clock-distribution techniques [6]–[9].

Fig. 1. Buffer-driven clock network and resonant clock network diagrams. (a) Buffer-driven clock distribution. (b) Reduced clock tree.

The resonant-clock scheme presented here addresses the power dissipation in the local clock directly by using the parasitic capacitance inherent in the local clock distribution as the capacitor in an LC tank. All clock buffers and their associated capacitance are removed and the clock energy is resonated between integrated inductors and the local clock capacitance. Unlike adiabatic circuits, which power the logic from the clock and rely on slow edges, this resonant-clock scheme has the potential to run at frequencies used in modern microprocessors since only the gate capacitance is driven by the clock.

By incorporating the capacitance as part of the oscillator, clock generation and distribution are designed concurrently and the oscillator naturally selects the most efficient frequency, unlike buffer-driven resonant clock networks such as in [9], where the natural frequency of the network has to be tuned to the clock frequency. Unfortunately, clock gating can only be achieved in the proposed scheme by shutting down the oscillator, which is possible if the startup time of the oscillator can be tolerated.

Fig. 2. Ideal resonant clock-generation and distribution.

The resonant-clock generation scheme presented here can be used to replace entire clock systems for small designs or the quadrant clocks in larger designs. Thanks to improving integrated inductors and copper metallization in advanced semiconductor technologies, the quality factor of the resonant circuit is sufficient to effect clock power reduction over ungated buffer-driven local clocking techniques. The next section reviews the theory behind distributed-capacitance resonant-clock generation. Following that, a prototype resonant clock, built in IBM's 0.13-µm partially depleted SOI (PD-SOI) process [10], is described.

II. RESONANT CLOCKING THEORY

The main advantage of resonant clocking is a reduction of clock power, but the procedure introduces challenges for the designer such as jitter and skew management and nonlinear load capacitance. Each of these will now be examined.

A. Power

The power reduction can only occur if less static power is dissipated in the parasitic resistance of the resonant clock than is dissipated switching the buffers and local clock capacitance of a buffer-driven clock. To form the resonant clock, integrated inductors are placed in parallel with the clock load, $C_{local}$, creating an RLC circuit as shown in Fig. 2. At resonance, the impedance of the parallel $LC$ elements is infinite. Power is only dissipated in the parasitic resistance, $R$, which arises from the resistive elements of the inductors and the distributed capacitance. The clock generated by the resonant circuit is a sinusoid of the form $v(t) = V_{dd}/2 + V_m \sin(\omega_0 t)$ whose magnitude, $V_m$, depends on the magnitude of the drive current. The resonant frequency, $\omega_0$, is determined by $\omega_0 = 1/\sqrt{L C_{local}}$. To allow comparisons to the buffer-driven clock, $V_m$ is assumed to be $V_{dd}/2$, providing a clock that swings between 0 V and $V_{dd}$. The average power dissipation in the RLC circuit at resonance is

$$P_{res} = \frac{\overline{v^{2}(t)}}{R} = \frac{3V_{dd}^{2}}{8R} \tag{3}$$

Given that the quality factor, $Q$, of a parallel RLC circuit is $Q = \omega_0 R C$, that the clock load is $C = C_{local}$, and that the clock frequency is $f_{clk} = \omega_0/2\pi$, then the ratio of power dissipated in the proposed resonant clock versus a buffer-driven clock, from (2) and (3), is

$$\frac{P_{res}}{P_{clk}} = \frac{3\pi}{4Q}\cdot\frac{k-1}{k} = \frac{\pi}{2Q} \quad \text{for } k = 3 \tag{4}$$

Thus, a resonant-clock distribution with a $Q$ greater than $\pi/2$ will use less power than a buffer-driven clock network with a stage gain of 3. The quality of integrated inductors has improved significantly with each technology generation; inductor $Q$ values greater than 15 were reported for the target technology [11]. The achievable power reduction in the resonant clock depends on how low the resistance of the clock distribution can be made.
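The break-even quality factor can be explored with a short numerical sketch. It rests on an assumption that should be read as such: the parallel resistance is taken to dissipate both the DC and the AC part of a clock swinging sinusoidally between 0 V and the supply, so the mean-square voltage is 3/8 of the supply squared. That reading gives a ratio of π/(2Q) for a stage gain of 3, which agrees with the tank quality factors quoted in the test results, but the function names and structure here are illustrative only.

```python
import math


def resonant_power(v_dd, f_clk, c_local, q):
    """Power burned in the tank's parallel resistance for a 0-to-Vdd sinusoid.

    Assumes R = Q / (2*pi*f*C) and mean(v^2) = 3 * Vdd^2 / 8.
    """
    r_parallel = q / (2.0 * math.pi * f_clk * c_local)
    return 3.0 * v_dd**2 / (8.0 * r_parallel)


def buffered_power(v_dd, f_clk, c_local, stage_gain=3):
    """C*V^2*f power of the local load plus an ideal exponential buffer horn."""
    return stage_gain / (stage_gain - 1) * c_local * v_dd**2 * f_clk


def power_ratio(q, stage_gain=3):
    """Resonant power divided by buffer-driven power (independent of V, f, C)."""
    return resonant_power(1.0, 1.0, 1.0, q) / buffered_power(1.0, 1.0, 1.0, stage_gain)


break_even_q = math.pi / 2.0
for q in (break_even_q, 2.4, 5.3, 15.0):
    print(f"Q = {q:5.2f}  ->  P_res / P_buf = {power_ratio(q):.3f}")
```

Because supply voltage, frequency and capacitance cancel in the ratio, the comparison depends only on the tank quality factor and the stage gain of the horn being replaced.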

Another advantage of the proposed resonant clock is the ability to control the maximum voltage of the output clock simply by varying the drive current. This feature can be used to overdrive the clock signal for faster rise and fall times at the logic gates at the expense of some extra power, but without having to generate and propagate higher clock harmonics. To do this with buffer-driven clocks, the output resistance of the drivers must be made smaller by increasing the driver size, which increases the capacitance in the network. Unfortunately, the clock-distribution network can nullify such efforts by filtering the higher clock harmonics, which is why clocks on high-performance processors become more sinusoidal with each generation. Care must be taken when overdriving the resonant clock to prevent the clock waveform from clipping so that it is no longer sinusoidal. Clipping increases the power dissipation in the resonant circuit and reduces its efficiency [12].

B. Skew Management

Since the clock network serves as the capacitance setting the clock frequency, it must be made small enough to avoid transmission-line effects and to keep skew manageable. Electromagnetic signals propagate at a speed given by

$$v = \frac{1}{\sqrt{\mu\varepsilon}} = \frac{c}{\sqrt{\mu_r\varepsilon_r}} \tag{5}$$

or roughly 150 µm/ps in wires in an SiO$_2$ insulator. Thus, for a 1-GHz clock, the clock sinks must extend less than 15 mm from the clock source to meet a skew requirement of 10%. The actual propagation time will be slower than predicted by (5) due to the loading and branching in the clock network, so some margin must be included in the design. A block of logic 7 cells by 64 cells, where the cells are 16 × 16 wire tracks and contain pass-gate clock loads, was simulated in IBM's 0.13-µm SOI process with a 3-GHz clock. Using distributed RLC pi models, the skew from top left to bottom right of the network was 10 ps, which is longer than the 6.9 ps predicted by (5). Fortunately, the skew requirements are stringent enough to ensure that the clock network does not behave like a transmission line. Since the rise time of a sinusoidal clock is 50% of its period and skew targets are less than 10% of the clock period, a clock network that meets skew requirements will never be long enough for reflections to become problematic.
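The propagation-speed and reach figures in this subsection follow from simple arithmetic, sketched below. The relative permittivity of 3.9 for SiO2 is an assumed textbook value that the text does not state, and the helper names are illustrative.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0
EPS_R_SIO2 = 3.9  # assumed relative permittivity of SiO2 (not given in the text)


def propagation_speed(eps_r, mu_r=1.0):
    """Ideal signal speed in a dielectric, in metres per second."""
    return C_VACUUM_M_PER_S / math.sqrt(eps_r * mu_r)


def max_reach(f_clk, skew_fraction, eps_r=EPS_R_SIO2):
    """Longest source-to-sink distance that keeps flight time within the skew budget."""
    skew_budget_s = skew_fraction / f_clk
    return propagation_speed(eps_r) * skew_budget_s


def flight_time(distance_m, eps_r=EPS_R_SIO2):
    """Ideal (unloaded) flight time over a given distance, in seconds."""
    return distance_m / propagation_speed(eps_r)


# metres per second -> micrometres per picosecond is a factor of 1e-6
speed_um_per_ps = propagation_speed(EPS_R_SIO2) * 1e-6
reach_mm = max_reach(f_clk=1e9, skew_fraction=0.10) * 1e3

print(f"speed in SiO2         ~ {speed_um_per_ps:.0f} um/ps")
print(f"reach at 1 GHz, 10%   ~ {reach_mm:.1f} mm")
print(f"flight time over 1 mm ~ {flight_time(1e-3) * 1e12:.1f} ps")
```

Loading and branching slow a real clock network below this ideal speed, which is consistent with the simulated skew exceeding the value predicted from the propagation speed alone.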

C. Quality Factor

There are a number of definitions for quality factor which are essentially equivalent:

$$Q = 2\pi\,\frac{\text{Maximum Energy Stored}}{\text{Energy Dissipated per Cycle}} = \omega_0\,\frac{\text{Maximum Energy Stored}}{\text{Average Power Dissipation}} = \frac{\omega_0}{\Delta\omega} \tag{6}$$

where $\omega_0$ is the resonant frequency and $\Delta\omega$ is the bandwidth, the difference between the half-power frequencies above and below $\omega_0$ [13]. Quality factors of individual components are more easily characterized than that of a resonant circuit, so it is useful to be able to relate the quality factor of the components to that of the overall circuit. Real inductors and capacitors contain lossy elements and have somewhat complicated models when all physical effects are taken into consideration. However, if the RLC resonant frequency is well below the self-resonance of the circuit elements, then at resonance the inductor and capacitor are inductive and capacitive with some real lossy component. The quality factor of a nonideal inductor [13] in parallel form is approximated from (6) by

$$Q_L = \frac{R_{pL}}{\omega_0 L} \tag{7}$$

where $R_{pL}$ is the parasitic resistance of the inductor expressed as a parallel resistance at resonance. The quality factor for a nonideal capacitor [13] in parallel form is approximated by

$$Q_C = \omega_0 R_{pC} C \tag{8}$$

where $R_{pC}$ is the equivalent parallel resistance in the capacitor. Solving (7) and (8) for $R_{pL}$ and $R_{pC}$, substituting into the quality factor of a parallel RLC tank, $Q = \omega_0 (R_{pL} \parallel R_{pC})\,C$, and performing some algebra provides the tank quality factor in terms of its components' quality factors:

$$Q = \frac{Q_L Q_C}{Q_L + Q_C} \tag{9}$$

From (9) it is apparent that a low-quality distributed capacitor will limit the quality of the resonant circuit. To improve the quality factor, the clock resistance must be kept to a minimum by utilizing techniques already needed for reducing skew in standard clock-distribution methods, such as clock grids, fat wires, and multiple vias. Unlike a standard clock distribution where reduced capacitance is a must, the quality factor of the resonant clock is improved by adding extra capacitance to the distribution network. In most integrated oscillators, the quality of the inductor limits the quality factor, but here the distributed nature of the capacitor adds enough parasitic resistance to the capacitance to limit the quality factor.
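How strongly the weaker component dominates can be seen with a small sketch. It assumes the tank quality factor combines the component quality factors like parallel resistances (reciprocals add), which follows from placing the two equivalent parallel loss resistances side by side. The capacitor values swept below are illustrative, not measurements.

```python
def tank_q(q_inductor, q_capacitor):
    """Tank quality factor when component losses appear as parallel resistances."""
    return q_inductor * q_capacitor / (q_inductor + q_capacitor)


Q_INDUCTOR = 15.0  # inductor quality reported for the target technology

# Illustrative capacitor qualities; the tank Q stays below the smaller of the two.
sweep = {q_c: tank_q(Q_INDUCTOR, q_c) for q_c in (1.0, 3.0, 10.0, 30.0, 100.0)}

for q_c, q_tank in sweep.items():
    print(f"Q_L = {Q_INDUCTOR:.0f}, Q_C = {q_c:5.1f}  ->  Q_tank = {q_tank:.2f}")
```

Even a tenfold improvement in capacitor quality from 3 to 30 only moves the tank from 2.5 to 10 here, so reducing clock-grid resistance is most effective while the capacitor is the limiting element.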

D. Nonlinear Capacitance

The most challenging part of the resonant clock is characterizing the distributed, parasitic capacitor. If the clock network is designed to meet skew requirements and avoid transmission-line effects, its parasitic capacitance acts like a lumped capacitor, but with a time-varying characteristic. Fig. 3 shows an equivalent model of a negative-resistance oscillator used for the resonant clock. The time-varying capacitance is represented as a fixed capacitance, $C_0$, in parallel with a periodically varying capacitance, $C_p(t)$, and a time-dependent noise capacitance, $C_n(t)$. If designed correctly, the negative resistance and parasitic resistance cancel and the circuit behaves like an ideal LC tank, which has a transfer function of

$$H(s) = \frac{sL}{s^{2} L C_0 + 1} \tag{10}$$

and a natural frequency of $\omega_0 = 1/\sqrt{L C_0}$. Unfortunately, the capacitance in the distributed network is not constant, but a function of two independent voltages applied to the gate and drain of the transistors. The gate voltage comes from the clock signal and the drain voltage, which is pseudo-random, comes from the data flowing through the logic. Equation (10) is therefore not an accurate model of the transfer function. In Fig. 3, the time-varying capacitance associated with the gate voltage is the periodic capacitance, $C_p(t)$, and the time-varying capacitance associated with the data signals is the noise capacitance, $C_n(t)$. The common flip-flop design in Fig. 4 is used for the clock loads in this study. Fig. 5 shows the flip-flop's simulated input gate capacitance variation for sinusoidal and square-wave gate voltages, ignoring the effect of the data signals. The change in gate capacitance is periodic with the input waveform, and since the capacitance changes are driven by the clock at steady state, a stable oscillation frequency will be reached, as will be explained later.

Fig. 3. Equivalent negative-resistance oscillator circuit.

Fig. 4. Master–slave D-flip-flop.

Fig. 5. Gate capacitance variation with input waveform.

The noise capacitance, $C_n(t)$, is more difficult to understand because it results from logic signals travelling through the clocked logic and will be pseudo-random in nature. The logic is driven by sources independent of the clock and will cause some amount of mixing in the clock signal. An intuitive understanding of this effect is obtained from the KCL node equation for the circuit in Fig. 3

(11)

There is no analytical solution to (11), but some insight into the solution can be gleaned from its Fourier transform, which is given by

(12)

The last two convolution terms on the right-hand side of (12) are not in (10) and result directly from the two time-varying capacitances. In the steady-state solution for (12), the clock voltage and the periodic capacitance, $C_p(t)$, are co-periodic, so they will not cause phase noise. The noise capacitance, $C_n(t)$, and the data voltages, $v_d(t)$, are random and will modulate the gate voltage, causing jitter. The difficulty is in analyzing the magnitude of this effect in a clock circuit.

The oscillator behaves like a frequency modulation circuit where the data voltage acts as a modulating signal [14]. The instantaneous frequency of the oscillator is given by

$$\omega_i(t) = \frac{1}{\sqrt{L\,[C_0 + C(t)]}} = \frac{\omega_0}{\sqrt{1 + C(t)/C_0}} \tag{13}$$

Thus, the frequency is composed of a fundamental frequency, $\omega_0$, divided by a normalized time-varying capacitance. If the time-varying capacitance, $C(t)$, has a small maximum amplitude variation, $\Delta C$, then a reasonable approximation can be made as follows

$$\frac{1}{\sqrt{1 + C(t)/C_0}} \approx 1 - \frac{C(t)}{2C_0} \tag{14}$$

Equation (14) is valid when the change in capacitance is small relative to the average capacitance value. Plugging (14) into (13) gives an instantaneous frequency, $\omega_i(t)$, of

$$\omega_i(t) \approx \omega_0\left[1 - \frac{C(t)}{2C_0}\right] \tag{15}$$

If the time-varying capacitance is $C(t) = \Delta C\cos(\omega_m t)$, where $C_0$ is the fixed capacitance, then the instantaneous frequency is

$$\omega_i(t) \approx \omega_0 - \frac{\omega_0\,\Delta C}{2C_0}\cos(\omega_m t) \tag{16}$$

By defining $\Delta\omega = \omega_0\,\Delta C/(2C_0)$, then the phase shift over time is given by

$$\theta(t) = \int_0^{t}\omega_i(\tau)\,d\tau = \omega_0 t - \frac{\Delta\omega}{\omega_m}\sin(\omega_m t) \tag{17}$$

Equations (16) and (17) can be used to analyze the effects of the time-varying load capacitance on the oscillation frequency and on the jitter of the oscillator [15]. The most important observation is that the frequency deviation is small if the change in capacitance is also small. If $\Delta C/C_0 = 10\%$, then the frequency will only deviate by 5%. Simulations show that the clock load capacitance of the D-flip-flop changes by 5% due to changes in the data flow. If 1/3 to 1/2 of the clock network capacitance is in the gates of the latches, then the maximum change in the clock network capacitance, which occurs when the data in all the latches changes at the same time and in the same direction, is between 1.7% and 2.5%. This results in a change in frequency of 0.83% to 1.25%. Simulations of a simplified resonant clock network showed less than 15-ps period jitter for a 2-GHz clock. This analysis does not account for capacitive coupling between the gate and the drain, which will depend strongly on the edge rate of the logic and which may be significant if data movement is the same through a majority of the logic gates.
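The small-signal estimate used here, that a fractional capacitance change shifts the frequency by about half as much, can be compared against the exact square-root dependence. In this sketch the gate share of the network capacitance is taken as one third to one half, which is the range implied by the quoted 1.7% to 2.5% figures; the function names are illustrative.

```python
import math


def exact_frequency_shift(dc_over_c):
    """Fractional frequency drop for a fractional capacitance increase (LC tank)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + dc_over_c)


def linear_frequency_shift(dc_over_c):
    """First-order approximation: half of the fractional capacitance change."""
    return dc_over_c / 2.0


GATE_CAP_CHANGE = 0.05           # flip-flop clock-load change due to data
GATE_SHARES = (1.0 / 3.0, 0.5)   # assumed share of network capacitance in latch gates

shifts = []
for share in GATE_SHARES:
    network_change = GATE_CAP_CHANGE * share
    shifts.append(linear_frequency_shift(network_change))
    print(f"gate share {share:.2f}: dC/C = {network_change:.2%}, "
          f"df/f ~ {shifts[-1]:.2%} (exact {exact_frequency_shift(network_change):.2%})")

print(f"10% capacitance change: linear {linear_frequency_shift(0.10):.2%}, "
      f"exact {exact_frequency_shift(0.10):.2%}")
```

The linear estimate slightly overstates the shift for a capacitance increase, so the quoted frequency deviations are on the conservative side for that case.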

E. Miscellaneous Concerns

Some other concerns for resonant clocking include the area penalty associated with the integrated inductors, how to gate the clock to reduce power when a functional unit is not needed, and how to synchronize clock domains. Each of these will be briefly addressed. The area penalty can only be reduced by using resonant clocks that require the minimum number and size of inductors. Fortunately, since the quality of the parasitic capacitor limits the quality of the resonant clock, some tradeoffs can be made between inductor area and quality. Clock gating is a challenge since the local clock buffers have been removed. Turning off the clock is an option but wastes time and power waiting for the clock to settle. Finally, to use the presented resonant clock in a large design that cannot be covered by a single clock domain due to skew requirements, some tuning mechanism would have to be incorporated to synchronize multiple clock domains [16], or a hand-shaking system used between different clock networks.

Fig. 6. Modified test-chip block diagram.

III. RESONANT CLOCKING TEST CHIP

A test chip was fabricated in IBM's 0.13-µm RF PD-SOI CMOS process [10] to analyze the frequency, power dissipation, and quality of the proposed resonant clock as compared to a buffer-driven clock network. A block diagram of the test chip is shown in Fig. 6 and a microphotograph of the test chip, minus the external inductors, is shown in Fig. 7. The load of the local distribution consists of three 8 × 64 scan-chains connected by a clock grid. Each scan-chain has eight rows, for a total of 24 rows, where each row is composed of 64 D-flip-flops. The experimental clock load represents the clock load of a block of static CMOS logic with 24 latch stages, as may be found in the functional units of a 64-bit pipelined microprocessor. The clock distribution is laid out differentially in metal-2 over each cell with a parallel grid on metal-4. The logic gates are powered by the supply voltage $V_{dd}$.

A negative-resistance oscillator was designed as the resonant clock source with integrated inductors similar to those reported in [11]. The capacitance used to set the resonant frequency of the oscillator is the parasitic capacitance of the clock network. Since an automatic extraction deck was not available, it was estimated from wire models and manual extraction to be about 21 pF per clock phase. The parasitic resistance and inductance in the clock wires joining the VCO with its clock load were underestimated, and so the NFETs in the VCO are undersized. The wires have 3-Ω resistance and about 0.8 nH of inductance, which is enough to keep the VCO from starting. To get the chip to start up, the integrated inductors were cut out and the internal clock node bonded to connect to off-chip inductors. The total inductance consists of the bond-wire inductance, $L_{bond}$, and the external inductor, $L_{ext}$. By using larger, off-chip inductors, the resonant frequency of the oscillator was lowered to a value where the transconductance of the cross-coupled NFETs could cancel the parasitic resistance and maintain the oscillation. Using off-chip inductance changes the resonant-clock experiment because the inductance is no longer integrated and the clock frequency is much lower than on a high-performance VLSI circuit. Nevertheless, the capacitance that sets the oscillation frequency is still the local clock capacitance, and its effect on clock stability due to data signals can still be measured.

Fig. 7. Microphotograph of resonant clock test macro.

To simulate a conventional, buffer-driven clock distribution, an 11-stage ring oscillator and associated 7-stage buffer horn, with a stage gain of 3, were added for power comparisons. Both the resonant clock and the ring oscillator drive the same local clock network but the ring oscillator is tri-stated, so only one clock driver has access to the clock grid at a time. The ring-oscillator characteristics were measured after cutting out the inductors with a laser. Simulations were performed to ensure that the ring-oscillator frequency was close to 2 GHz with 10% edge rates at 1.2 V. Both clock phases are divided down by 64 and output for frequency measurements. There are three power-supply domains on the chip that separate the oscillator, scan-chain, and ring-oscillator power.

IV. TEST RESULTS

The test chip operates in two modes for testing. In the resonant mode, the ring oscillator is disabled and the resonant clock controls the clock network. In the ring-oscillator mode, the resonant clock is disconnected from the clock grid using laser trimming and the ring oscillator drives the clock network without the inductors. Functionality of the latches, and by correlation the clock, was determined by monitoring the divided clock output. The clock was also monitored at the junction of the bond-wire inductance and the external inductor.

Fig. 8. Operating frequency of the resonant clock and the ring-oscillator.

Fig. 8 shows the measured clock frequency of the resonant clock and the ring oscillator as a function of the supply voltage on the logic. The resonant-clock frequency varies from 147 MHz when the VCO is driven by 0.4 V to 112 MHz when driven by 0.6 V, which is consistent with simulations. The ring-oscillator frequency, on the other hand, increases rapidly with power supply because of increasing current drive in the individual delay elements. Measurements were taken with external inductors ranging from a simple wire to a 420-nH air-core inductor. Measurements indicate that the clock load is between 38 and 45 pF in each phase of the clock, the bond-wire has an inductance of 15.9 nH, and the external wire has an inductance of 17.1 nH. The clock frequency measured is within 12.5% of the predicted value. For all remaining measurements, the external inductor consisted only of a simple wire between the bond-pad and the power supply.
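As a rough cross-check of these measurements, the ideal tank frequency can be computed from the quoted inductances and load capacitance. This is only a sketch: it assumes each clock phase sees the bond-wire and external-wire inductances in series resonating with that phase's capacitance, and it ignores the on-chip wire inductance and any bias dependence of the capacitance.

```python
import math


def resonant_frequency(l_henry, c_farad):
    """Ideal LC resonance, f = 1 / (2*pi*sqrt(L*C)), in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


L_BOND = 15.9e-9   # bond-wire inductance, henries
L_WIRE = 17.1e-9   # external wire inductance, henries
l_total = L_BOND + L_WIRE  # assumed to act in series for each clock phase

for c_pf in (38.0, 40.0, 45.0):
    f_hz = resonant_frequency(l_total, c_pf * 1e-12)
    print(f"C = {c_pf:.0f} pF  ->  f ~ {f_hz / 1e6:.0f} MHz")

f_nominal = resonant_frequency(l_total, 40e-12)
```

Under these assumptions the ideal frequencies land inside the measured 112 to 147 MHz span, with the spread across bias points left unexplained by this simple model.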

Fig. 9 shows the eye diagram of the dual-phase clock mea-sured at the inductor bond-pin. The low voltage swing and dis-tortion in the waveform occur, according to simulations, becausethe eye diagram was measured between a voltage divider com-posed of the external and the bond-wire inductance, not at theclock gates. Simulations show a cleaner signal at the clock gates,and the logic is functional in measurements, but the actual shapeof the clock signal cannot be verified in this test chip.

Fig. 10 shows the power dissipated in the ring-oscillator-driven clock versus the resonant clock. The ring-oscillator power has also been scaled by frequency and supply voltage to compare the two techniques. Three things complicate this comparison. First, the buffer-horn was not optimized and may dissipate more power than a well-designed clock distribution network. Second, the tri-state inverters add an extra load to the resonant clock that would not normally be present. Third, this comparison is not exact because the actual amplitude of the resonant clock voltage could not be measured. Knowing the amplitude of the clock signal is necessary for an accurate comparison between resonant clocking and buffer-driven clocking from (4).


Fig. 9. Dual-phase clock eye-diagram.

In the test chip, the clock voltage swing can be measured at the point where the bond-wire inductor (16 nH) and the external wire (17 nH) meet. Measurements taken at that tapped point show the clock voltage swing at the logic gates is from V to , depending on the bias on the resonant clock. The clock voltage at the clock gates on chip is higher than that measured at the tapped point and its magnitude is determined by the ratio of inductance in the bond-wire to the inductance in the external wire. Simulations of the clock network, under the bias conditions being tested, show that the on-chip clock voltage is 300 mV higher than the clock voltage at the pad where the two inductors meet. The simulations predict a power dissipation of 2.8 mW at this bias point. Based on these simulations and the measurements taken, the on-chip resonant clock swing at the most efficient point measured ( of 0.67 V and of 0.43 V) is between 0.63 and 0.93 V. The resonant-clock power, measured at V with a frequency of 147 MHz, is 2.06 mW. The ring-oscillator clock power was measured as 7.72 mW with a clock frequency of 360 MHz at V. Scaling the measured ring-oscillator clock power to 147 MHz at 0.63 V and 147 MHz at 0.93 V yields a scaled power of 3.15 mW and 6.9 mW. The resonant clock thus dissipates between 35% and 70% less power than the buffer-driven clock at the same frequency and clock amplitude. The actual power savings is most likely near the lower part of this range. From (4), the quality factor of the resonant circuit is 2.4 if the power savings is 35% and 5.3 if the power savings is 70%, which is quite low. For an inductor quality of 15, this means the capacitance has a quality between 2.8 and 8.2 from (9). Again, the lower number is probably most accurate based on simulations.
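The arithmetic behind these figures can be retraced with a short sketch. Equations (4) and (9) are not reproduced in this part of the paper, so the forms used here are assumptions: a power ratio P_res/P_buf = π/(2Q) for (4) and the usual combination 1/Q_tank = 1/Q_L + 1/Q_C for (9). They were chosen only because they reproduce the quoted values. The scaling helper assumes dynamic power proportional to f·V². Because the ring-oscillator supply voltage is not legible in this copy, the frequency and voltage parts of the scaling are exercised separately. All function names are illustrative.

```python
import math

# Values quoted in the text (mW, MHz, V).
P_RESONANT_MW = 2.06
P_RING_MEASURED_MW = 7.72
F_RING_MHZ = 360.0
F_TARGET_MHZ = 147.0
P_BUFFER_SCALED_MW = {0.63: 3.15, 0.93: 6.9}  # keyed by assumed clock swing


def scale_power(p_mw, f_from, f_to, v_from, v_to):
    """Scale dynamic power, assuming P is proportional to f * V^2."""
    return p_mw * (f_to / f_from) * (v_to / v_from) ** 2


def power_savings(p_res_mw, p_buf_mw):
    """Fraction of buffer-driven clock power saved by the resonant clock."""
    return 1.0 - p_res_mw / p_buf_mw


def tank_q(p_res_mw, p_buf_mw):
    """ASSUMED form of (4): P_res / P_buf = pi / (2 * Q)."""
    return math.pi * p_buf_mw / (2.0 * p_res_mw)


def capacitor_q(q_tank, q_inductor):
    """ASSUMED form of (9): 1/Q_tank = 1/Q_L + 1/Q_C."""
    return 1.0 / (1.0 / q_tank - 1.0 / q_inductor)


# Frequency-only part of the scaling (voltage held fixed).
p_freq_only = scale_power(P_RING_MEASURED_MW, F_RING_MHZ, F_TARGET_MHZ, 1.0, 1.0)
print(f"7.72 mW scaled from 360 to 147 MHz: {p_freq_only:.2f} mW")

# Voltage-only part, from the 0.63 V figure to 0.93 V.
p_at_093 = scale_power(3.15, F_TARGET_MHZ, F_TARGET_MHZ, 0.63, 0.93)
print(f"3.15 mW rescaled from 0.63 V to 0.93 V: {p_at_093:.2f} mW")

for swing_v, p_buf in P_BUFFER_SCALED_MW.items():
    q = tank_q(P_RESONANT_MW, p_buf)
    q_c = capacitor_q(round(q, 1), 15.0)  # inductor Q of 15, as in the text
    print(f"swing {swing_v} V: savings {power_savings(P_RESONANT_MW, p_buf):.1%}, "
          f"tank Q {q:.2f}, capacitor Q {q_c:.2f}")
```

Under these assumed forms the sketch lands close to the quoted 35% and 70% savings, tank quality factors of about 2.4 and 5.3, and capacitor quality factors near 2.8 and 8.2.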

It is important to note that the resonant clock has a wide swing in power dissipation without increasing clock frequency. This is due to the resonant clock leaving its sinusoidal operation and generating a waveform that looks like a half sine wave [12]. For the resonant clock to make sense from a power perspective, it must be designed efficiently and kept in its sinusoidal operating mode.

Fig. 10. Power dissipation of the resonant clock and the ring oscillator.

Fig. 11. Period jitter of the resonant clock and the ring oscillator as the frequency of data passing through the scan-chain is swept.

Fig. 11 shows the period jitter measurements of the output of the divider as the scan-chain input frequency was increased from DC to one-half of the resonant frequency. Period jitter is the square root of the variance of the width of the clock period. The measurements were made with an Agilent Infiniium oscilloscope using the method described in [17]. Since the clock load is a scan-chain, the state of all of the flip-flops changes each time the input changes and in the same direction, maximizing the capacitance change in the clock network. The resonant clock jitter was measured with the oscillator running at 2.0 MHz and the ring-oscillator jitter was measured with the ring oscillator running at 1.96 MHz. The output clock frequency is the internal clock divided by 64. The on-chip period jitter can be approximated by dividing the period jitter of the output clock by the square root of the divider [18], or 8 in this case, although this ignores the jitter contribution of the divider.
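The jitter definition and the divider correction described above can be sketched as follows. This is a minimal illustration, not the oscilloscope's algorithm from [17]: it assumes a list of rising-edge timestamps is available and uses the population standard deviation, since the text does not say whether a population or sample estimate was used. The function names and the timestamps are hypothetical.

```python
import math
import statistics


def period_jitter(edge_times_s):
    """Standard deviation of the clock period (square root of its variance)."""
    periods = [t1 - t0 for t0, t1 in zip(edge_times_s, edge_times_s[1:])]
    return statistics.pstdev(periods)


def on_chip_jitter(output_jitter_s, divide_ratio):
    """Approximate internal-clock jitter from divided-output jitter.

    Divides by sqrt(divide_ratio), which ignores the divider's own jitter.
    """
    return output_jitter_s / math.sqrt(divide_ratio)


# Hypothetical rising-edge timestamps (seconds) of a nominally 500-ns clock.
edges = [0.0, 500.0e-9, 1000.4e-9, 1500.0e-9, 2000.2e-9]
jitter_out = period_jitter(edges)
print(f"output period jitter: {jitter_out * 1e12:.0f} ps")
print(f"estimated on-chip jitter: {on_chip_jitter(jitter_out, 64) * 1e12:.0f} ps")
```

With a divide ratio of 64 the correction factor is 8, as stated in the text.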

The ring-oscillator jitter is higher than the resonant clock jitter due to well-studied differences between LC and delay-based oscillators. Jitter in the 2-MHz divided output clock is a relatively flat 400 ps until the data rate approaches one-half of the clock frequency, where jitter rapidly rises to 910 ps, or 0.18% of the clock period. The maximum jitter of the internal clock, measured at the inductor bond-pad, was 55 ps, or 0.68% of the 124-MHz clock period. The closer the data frequency is to half the resonant frequency, the worse the jitter becomes, which is expected because faster data rates mean more capacitive coupling and more data-induced capacitance changes. Unfortunately, a logic chip will have random data patterns, not deterministic patterns as measured here. To approximate a more realistic scenario, the data frequency was varied randomly between 1 and 60 MHz for several minutes. The measured jitter was 555 ps, or 0.11%. At higher frequencies, the edge rates are sharper, so capacitive coupling should increase the jitter as the gigahertz range is approached, but since these changes are mostly local in the distributed clock network, they should not be significant.
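The percentages quoted for the jitter follow from dividing each jitter value by the corresponding clock period. A minimal sketch, assuming the 0.18% and 0.11% figures refer to the 2-MHz divided output and the 0.68% figure to the 124-MHz internal clock (the helper name is illustrative):

```python
def jitter_percent_of_period(jitter_s, clock_hz):
    """Period jitter as a percentage of the clock period (period = 1 / f)."""
    return 100.0 * jitter_s * clock_hz


cases = [
    ("910 ps on the 2-MHz divided output", 910e-12, 2.0e6),
    ("55 ps on the 124-MHz internal clock", 55e-12, 124e6),
    ("555 ps on the 2-MHz divided output", 555e-12, 2.0e6),
]
for label, jitter_s, clock_hz in cases:
    print(f"{label}: {jitter_percent_of_period(jitter_s, clock_hz):.2f}% of the period")
```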

V. CONCLUSION

The test macro demonstrates that a stable resonant clock can be implemented using the inherent parasitic capacitance of the local clock network in an LC tank. Both a resonant clock using the local gate capacitance and a ring-oscillator-driven buffer-horn clock distribution were implemented. A stable sinusoidal clock between 112 and 147 MHz, depending on biasing, was measured using a straight wire for the external inductance. An analysis of the voltage-varying gate capacitance shows that data flowing in the clock network should change the clock frequency by less than 1.25%. A maximum period jitter of 0.68% was measured when the scan-chain data frequency approached one-half of the clock frequency. Power comparisons indicate that the resonant clock dissipates around 35% less power than the buffer-driven clock with an estimated quality factor between 2.4 and 5.3. Since the off-chip inductors used in the measurements have quality factors between 15 and 30, the Q of the parasitic capacitance is nearly the quality of the tank. On-chip inductors in this technology were measured with quality factors above 15 as well [11], so moving the inductance on-chip should not adversely affect the power savings. The main disadvantage of scaling the clock into the multigigahertz range is the increase in wire resistance due to skin effect, which will decrease the already low quality of the distributed capacitor. Some things can be done to improve the quality of the parasitic capacitance. The clock signal was partially routed in poly-silicon within the D-flip-flop, so removing poly routing and using wider wires in lower metal layers would improve the quality of the capacitor; clock wire widths in general need to be wider to handle the current needed at higher frequencies.

Since the capacitor quality is the limiting factor, some loss of Q in the inductor can be tolerated to save area. Moving the inductor close to the logic circuits and using a multiturn inductor instead of a single-turn inductor would save area at the expense of some inductor quality. A balanced VCO with cross-coupled PFETs as well as NFETs uses only one inductor instead of two for even more area savings. A second generation of the resonant clock designed to operate in the multigigahertz range is being developed that improves the clock load and area using these techniques.

ACKNOWLEDGMENT

The authors acknowledge the contributions made by R. Montoye and U. Ghoshal of IBM’s Austin Research Laboratory and the help with the technology given by N. Zamdmer, M. Sherony, M. Talbi, and J.-O. Plouchart of IBM-Fishkill.

REFERENCES

[1] C. J. Anderson et al., “Physical design of a fourth-generation POWER GHz microprocessor,” in IEEE ISSCC Dig. Tech. Papers, 2001, pp. 232–233.

[2] P. J. Restle et al., “A clock distribution network for microprocessors,” J. Solid-State Circuits, vol. 36, pp. 792–799, May 2001.

[3] X. Huang, P. Restle, T. Bucelot, Y. Cao, T. J. King, and C. Hu, “Loop-based interconnect modeling and optimization approach for multi-gigahertz clock network design,” J. Solid-State Circuits, vol. 38, pp. 457–463, Mar. 2003.

[4] W. Athas et al., “The design and implementation of a low-power clock-powered microprocessor,” J. Solid-State Circuits, vol. 35, pp. 1561–1569, Nov. 2000.

[5] S. Kimm, C. Ziesler, and M. Papaefthymiou, “A true single-phase energy-recovery multiplier,” IEEE Trans. VLSI Syst., vol. 11, pp. 194–207, Apr. 2003.

[6] P. Restle and X. Huang, “Inductance: Implications and solutions for high-speed digital circuits,” in IEEE ISSCC Dig. Tech. Papers, 2002, pp. 558–562.

[7] J. Wood, T. C. Edwards, and S. Lipa, “Rotary traveling wave oscillator arrays: A new clock technology,” J. Solid-State Circuits, vol. 36, pp. 1654–1665, Nov. 2001.

[8] F. O’Mahony, C. P. Yue, M. A. Horowitz, and S. S. Wong, “A 10-GHz global clock distribution using coupled standing-wave oscillators,” J. Solid-State Circuits, vol. 38, pp. 1813–1820, Nov. 2003.

[9] S. Chan, K. Shepard, and P. Restle, “Design of resonant global clock distributions,” in Proc. ICCD, 2003, pp. 248–253.

[10] N. Zamdmer et al., “A 0.13-μm SOI CMOS technology for low-power digital and RF applications,” in Symp. VLSI Technology Dig. Tech. Papers, 2001, pp. 85–86.

[11] N. Zamdmer et al., “Suitability of scaled SOI CMOS for high-frequency analog circuits,” in Proc. ESSDERC, 2002, pp. 511–514.

[12] D. Ham and A. Hajimiri, “Concepts and methods in optimization of integrated LC VCOs,” J. Solid-State Circuits, vol. 36, pp. 896–909, June 2001.

[13] J. W. Nilsson, Electric Circuits, 3rd ed. New York: Addison-Wesley, 1990.

[14] S. Haykin, Communication Systems, 3rd ed. New York: Wiley, 1994.

[15] S. Goldman, Frequency Analysis, Modulation, and Noise. New York: McGraw-Hill, 1948.

[16] V. Gutnik and A. P. Chandrakasan, “Active GHz clock network using distributed PLLs,” J. Solid-State Circuits, vol. 35, pp. 1553–1560, Nov. 2000.

[17] “Jitter analysis techniques using an Agilent Infiniium oscilloscope,” Agilent Technologies, Palo Alto, CA, [Online.] Available: http://cp.literature.agilent.com/litweb/pdf/5988-6109EN.pdf, May 2002.

[18] M. S. McCorquodale, M. K. Ding, and R. B. Brown, “Study and simulation of CMOS LC oscillator phase noise and jitter,” in Proc. ISCAS, 2003, pp. 665–668.

Alan J. Drake (S’99) received the B.S. degree in electrical engineering from the University of Arizona, Tucson, in 1997 and the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, in 2000. Currently, he is working toward the Ph.D. degree at the University of Michigan.

His research interests include low-power VLSI, resonant clock generation and distribution, and SOI technology. In March, 2004, he joined the IBM Austin Research Laboratory where he is conducting research on clock distribution and high-performance processor circuit design.

Kevin J. Nowka (S’84–M’85) received the B.S. degree in computer engineering from Iowa State University, Ames, in 1986 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1995, respectively.

He joined the IBM Austin Research Laboratory in 1996 where he has conducted research on CMOS VLSI circuits for two 1-GHz microprocessors and for a low-power embedded PowerPC processor. He currently manages the Exploratory VLSI Design Department of the IBM Austin Research Laboratory. He holds 35 patents related to microprocessor design.


Tuyet Y. Nguyen was born in Vietnam.

She joined IBM in 1987. She has been involved in process support, specializing in analyzing device failures resulting from manufacturing process and layout design issues. Her current focus is VLSI mask design for high speed analog and digital VLSI designs.

Jeffrey L. Burns received the B.S. degree in engineering from the University of California, Los Angeles, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley.

In October 1988, he joined the IBM T. J. Watson Research Center as a Research Staff Member, where he worked in the areas of layout compaction, layout synthesis for control logic, CAD system architecture, and microprocessor design. In 1996, he joined the IBM Austin Research Laboratory, Austin, TX, where he worked initially on high-frequency microprocessor design and design-tools strategy. From 1999 to 2003, he managed the Exploratory VLSI Design Department of the Austin Research Laboratory, working in the areas of high-end microprocessors, ultra-low-power embedded processors, and high-bandwidth data communications. Since mid 2003, he has been on the IBM Research Technical Strategy staff, in Yorktown Heights, NY, where his main responsibility has been to produce IBM Research’s long-term IT industry outlook.

Dr. Burns received an IBM Outstanding Technical Achievement Award in 1997 for his microprocessor tools and design work for IBM’s S/390 products, and an IBM Research Division Award for his work on IBM’s 1.0-GHz PowerPC prototype disclosed in 1998.

Richard B. Brown (S’74–M’76–SM’91) received the B.S. and M.S. degrees in electrical engineering from Brigham Young University, Provo, UT, in 1976, and the Ph.D. degree in electrical engineering (solid-state) from the University of Utah, Salt Lake City, in 1985.

From 1976 to 1981, he worked in computer design as Vice-President of Engineering at Holman Industries, Oakdale, CA, and then as Manager of Computer Development at Cardinal Industries, Webb City, MO. He joined the faculty of the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, in 1985. He has conducted major research projects in the areas of solid-state sensors, mixed-signal circuits, GaAs and silicon-on-insulator circuits, and high performance and low power microprocessors. He served as Associate Chair of Electrical Engineering for four years and as Interim Chair of Electrical Engineering and Computer Science for two years at the University of Michigan. He became Dean of Engineering at the University of Utah in July 2004.

Prof. Brown serves as Chairman of the NSF MOSIS Advisory Council for Education. He was Chair of the 1997 Conference on Advanced Research in VLSI and the 2001 Microelectronic System Education Conference. He has served as Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and Proceedings of the IEEE, and as associate editor of IEEE TRANSACTIONS ON VLSI SYSTEMS.