[IEEE IEEE 34th Annual Spring Reliability Symposium, 'Reliability - Investing in the Future' - Boxborough, MA, USA (1996.04.18-1996.04.18)] IEEE 34th Annual Spring Reliability Symposium, 'Reliability - Investing in the Future'. - Design for Reliability in the Alpha 21164 Microprocessor



  • Design for Reliability in the Alpha 21164 Microprocessor

    John Kitchin

    Digital Equipment Corporation

    (508) 568-4650 kitchin@bigq.enet.dec.com 77 Reed Road, Hudson, MA 01749-2839


    An increasingly important area where reliability engineering can contribute to success in the semiconductor industry is in methods and tools for executing design for reliability. VLSI process variation presents design-dependent levels of statistical reliability risk to all products fabricated in the process. Reliability methods and tools that allow process reliability to be projected directly and quantitatively back into the design can enable designers to achieve balance in design parameters that simultaneously achieve performance, time-to-market and long-term reliability goals. Statistical Electromigration Budgeting (SEB) is a method for setting and verifying electromigration (EM) design for interconnect that allows designers to selectively supersede fixed EM design rules to achieve increased chip performance while simultaneously quantifying chip-level EM reliability to directly assure design conformance to chip-level reliability requirements. An overview of SEB and some results from its application in the design and verification of the Alpha 21164 microprocessor are given, followed by a tabulation of opportunities for further reliability engineering activities enabled by developments like SEB.


    Consider a (strict) definition of VLSI Design for Reliability as achieving a chip-level reliability goal. Then an increasingly important area where reliability engineering can contribute to success in the semiconductor industry is in methods and tools for executing design for reliability. This is especially true under the market push for higher levels of performance and shorter design cycles in VLSI products. In particular, the spiraling complexity and geometrically increasing clock rates for high performance microprocessors need sophisticated design and verification approaches to ensure that long-term reliability requirements are met without sacrifice of competitive performance and time-to-market goals.

    VLSI process variation, which can be reduced with good process development and manufacturing process improvement strategies, can never be completely eliminated. This variation presents design-dependent levels of statistical reliability risk to all products fabricated in the process. If design for reliability is taken to mean achieving a chip-level reliability goal, then fixed limits for reliability related design parameters are mathematically arbitrary. For example, having all chip interconnect satisfy a fixed current density limit may not guarantee chip-level reliable design for electromigration (EM), nor does allowing some interconnect to exceed fixed limits necessarily lead to an unreliable design. In design for reliability under process variation it is the chip-level reliability risk that is ultimately meaningful.

    Reliability methods and tools that allow process reliability to be projected directly and quantitatively back into the design can enable designers to achieve balance in design parameters that simultaneously achieve performance, time-to-market and long-term reliability goals. A recent example of such methods and tools is Statistical Electromigration Budgeting (SEB) [1], which was applied in the design and verification of the Alpha 21164 microprocessor [2], a 4-level metal, 0.5 µm CMOS implementation of Digital's Alpha architecture operating at up to 333 MHz.


  • Design for EM Reliability

    Commercial CMOS VLSI process technologies typically employ Al as the interconnect metal. The resistance to EM degradation of Al interconnect has been improved over the past several years through use of better alloys, adding underlayers, and increasing relative grain size. However, during the same period VLSI designers have aggressively pursued increased product performance (especially for microprocessors). The result has been significantly higher clock speeds and power dissipation. (Versions of the Alpha 21164 that run at 300 MHz dissipate a maximum of 50 W [2,3].) Increasing the clock speed increases the current density, while higher power dissipation increases the chip junction temperature. Current density and temperature drive the EM directed diffusion mechanism that can lead to voids (leading to open circuits) or hillocks (leading to short circuits) in Al interconnect. Thus EM has remained a key limiter in the design of reliable high performance VLSI products.

    Historically, designers have compared interconnect dc average current per unit width, $I_{\mathrm{eff}}$, to a conservative fixed design limit. Let

    $$S = \frac{\text{Actual } I_{\mathrm{eff}}}{\text{Design Limit } I_{\mathrm{eff}}}$$

    ($I_{\mathrm{eff}}$ is typically modified to correctly account for the effects of current crowding in corners and vias and the smaller net charge redistributed by ac nodes compared to dc nodes.) A design was viewed as reliable with respect to EM if S ≤ 1, while any interconnect with S > 1 had to be redesigned.
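    As a minimal sketch of the fixed-limit check just described, the ratio S can be computed per segment and screened against 1. The segment names, current values, and units below are hypothetical illustrations, not data from the Alpha 21164 design.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str            # hypothetical net name
    i_eff: float         # actual effective current per unit width (e.g. mA/um)
    design_limit: float  # fixed design-limit I_eff in the same units


def stress_ratio(seg: Segment) -> float:
    """Relative current stress S = actual I_eff / design-limit I_eff."""
    if seg.design_limit <= 0:
        raise ValueError("design limit must be positive")
    return seg.i_eff / seg.design_limit


def violations(segments):
    """Names of segments a fixed-limit rule would send back for redesign (S > 1)."""
    return [s.name for s in segments if stress_ratio(s) > 1.0]


# Hypothetical example values.
segments = [
    Segment("clk_trunk", i_eff=1.5, design_limit=1.0),
    Segment("bus_bit0", i_eff=0.4, design_limit=1.0),
    Segment("vdd_strap", i_eff=1.0, design_limit=1.0),
]
print(violations(segments))
```

    Under the fixed rule only the first segment is flagged; the rule says nothing about how much chip-level risk any of the three segments actually contributes.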

    However, when viewed from the process reliability perspective, EM degradation is inherently statistical. A wide dispersion in the times to failure (defined as a specific increase in interconnect resistance) is typically observed for identically sized and stressed segments of interconnect or contacts (as shown by the probability plot of actual EM accelerated testing data in Figure 1). Approaches that factor EM failure statistics into the setting of EM design limits [4] or into computing EM reliability in circuit simulation [5] do not quantify chip-level design reliability or reveal the relationship between S and EM risk in a way that enables a chip-level EM design for reliability strategy.

    Figure 1: Illustration of the Statistical Nature of EM Degradation. (Probability plot; horizontal axis: EM Failure Time under Acceleration (hrs).)

    SEB Concept

    If design for reliability is defined to mean achieving a chip-level reliability goal, then fixed current density design limits become mathematically arbitrary. Note that having all interconnect satisfy S ≤ 1 does not guarantee a particular chip-level reliability, nor does having some interconnect with S > 1 necessarily mean that the chip-level reliability goal is exceeded. Only the total statistical risk to the chip is meaningful. Then if the EM reliability of each segment of interconnect at each stress level can be computed, the chip-level EM reliability goal can be budgeted among classes of interconnect or chip design subdivisions (such as boxes) to minimize the performance limitations that an EM reliability goal places on the design.

    Figure 2 illustrates some aspects of a statistical budget approach. It shows, for example, that $10^2$ unit lengths of Narrow first level metal (MT1) at S = 2.1 present about the same EM risk as $10^6$ unit lengths of Narrow MT1 at S = 1.0. Further, the per unit length EM risks of Narrow and Wide MT1 diverge significantly as S increases (due to different microstructure effects [6]), indicating that Wide MT1, not Narrow MT1, will limit EM design at the highest current stresses.

    Figure 2: Comparison of EM Risk in Narrow & Wide MT1. (Horizontal axis: S, relative current stress; log-scale vertical axis from 1e-14 to 1e-4.)

    SEB Method

    Interconnect test chips are subjected to high current density and temperature while being monitored for increases in resistance, as is typical in EM process reliability testing. The elapsed test time until the resistance of a structure increases beyond a fixed limit is defined as its failure time. The failure times of a sample of structures tested under identical current and temperature stresses are used to estimate the failure distribution for the structure class. (Narrow MT1 and Wide MT1 are examples of classes of interconnect structures. Within a class the relative reliability of each size structure is assumed to be a function only of the relative current density and temperature stress on the structure.) Electromigration models are used to scale the failure time distribution to the Design Limit $I_{\mathrm{eff}}$ and temperature.
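    The two steps in this paragraph, estimating a failure distribution from accelerated test data and scaling it to use conditions, can be sketched as follows. This is a simplified illustration under stated assumptions: it fits a single lognormal to a complete (uncensored) sample, and it scales the median with Black's equation using the inverse square current dependence mentioned later in the text plus an Arrhenius temperature term. The failure times, the 0.7 eV activation energy, and all function names are hypothetical; the paper's own procedure handles censored samples and a Multilognormal model.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_lognormal(failure_times):
    """Maximum-likelihood lognormal fit for a complete (uncensored) sample.

    Returns (median, log standard deviation).
    """
    logs = [math.log(t) for t in failure_times]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)  # MLE uses 1/n
    return math.exp(mu), math.sqrt(var)


def scale_median(t50_test, j_ratio, temp_test_k, temp_use_k, ea_ev):
    """Scale a test-condition median to use conditions (assumed Black's equation).

    j_ratio is J_test / J_use; the current exponent is fixed at 2.
    """
    current_factor = j_ratio ** 2
    thermal_factor = math.exp(
        ea_ev / BOLTZMANN_EV * (1.0 / temp_use_k - 1.0 / temp_test_k)
    )
    return t50_test * current_factor * thermal_factor


# Hypothetical accelerated-test failure times in hours.
t50_test, sigma = fit_lognormal([100.0, 200.0, 400.0])
t50_use = scale_median(t50_test, j_ratio=10.0, temp_test_k=500.0,
                       temp_use_k=373.15, ea_ev=0.7)
print(t50_test, sigma, t50_use)
```

    The log standard deviation is left unchanged by the scaling, which reflects the common assumption that stress shifts the median but not the shape of the distribution.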

    CAD tools are used to determine the relative current stress $S_{ij}$ and temperature acceleration factor $A_{ij}$ at worst case operating conditions for the $j$-th line segment (or contact) (where $j = 1, \ldots, m_i$) of the $i$-th structure class (where $i = 1, \ldots, k$) in the design. For each interconnect class SEB computes the estimated composite EM risk $\hat{R}_i$ for the chip arising from the class over the chip product life $L$. Composite EM Risk is based on a series reliability model: any interconnect segment failure means chip failure. Interconnect segment failure times are assumed to be statistically independent. The standard error of $\hat{R}_i$, denoted $\hat{v}_i$, is also computed. This is to account for the statistical uncertainty in the estimated EM risk arising from the relatively small samples of interconnect that can feasibly be tested under accelerated conditions.

    To obtain the SEB results presented in this paper, a Multilognormal failure distribution for interconnect failure times was assumed [7], along with the commonly used inverse square dependence of failure time on current density. EM risk is then given by Equation (1), where $\hat{t}_{50,i}$ and $\hat{\sigma}_i$ are the estimated median and log standard deviation parameters of the lognormal statistical model standardized to $S = 1$, $A = 1$, and a unit length (for lines). $n_{ij}$ is the number of unit lengths (or number of contacts) present on the chip at the stress combination $S_{ij}$ and $A_{ij}$. For the $i$-th class there are $m_i$ stress combinations occurring on the chip. Also,

    $$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-u^2/2)\,du$$

    is the standard normal cumulative distribution function. The total chip EM risk over the $k$ interconnect classes is

    $$\hat{R} = 1 - \prod_{i=1}^{k} \left(1 - \hat{R}_i\right) \qquad (2)$$
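    The series model can be sketched numerically. The per-class expression below is an assumed form that is consistent with the definitions in the text, not a transcription of Equation (1): each unit length fails by time $L$ with probability $\Phi\left(\ln(L\,S^2 A / t_{50}) / \sigma\right)$, and a class survives only if all of its unit lengths survive. The combination across classes uses the same independence assumption. Function names and numeric inputs are hypothetical.

```python
import math


def phi(x):
    """Standard normal CDF, written with erfc so small tail values stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def class_risk(t50, sigma, life, stresses):
    """Assumed series-model risk for one interconnect class.

    stresses: iterable of (S, A, n) with relative current stress S,
    temperature acceleration factor A, and n unit lengths at that combination.
    t50 and sigma are standardized to S = 1, A = 1 and a unit length.
    """
    log_survival = 0.0
    for s, a, n in stresses:
        p_fail = phi(math.log(life * s ** 2 * a / t50) / sigma)
        if p_fail >= 1.0:
            return 1.0
        log_survival += n * math.log1p(-p_fail)
    return -math.expm1(log_survival)  # 1 - exp(sum of log survivals)


def chip_risk(class_risks):
    """Combine independent class risks: 1 - product of (1 - R_i)."""
    if any(r >= 1.0 for r in class_risks):
        return 1.0
    return -math.expm1(sum(math.log1p(-r) for r in class_risks))


# Hypothetical inputs: life and t50 in the same time unit.
narrow = class_risk(t50=500.0, sigma=0.8, life=10.0,
                    stresses=[(1.0, 1.0, 10_000), (1.5, 1.0, 100)])
wide = class_risk(t50=800.0, sigma=1.0, life=10.0,
                  stresses=[(1.0, 1.0, 1_000)])
print(narrow, wide, chip_risk([narrow, wide]))
```

    Summing log survivals instead of multiplying survival probabilities keeps the arithmetic stable when millions of unit lengths each carry a very small risk.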

    To be consistent with the statistical nature of EM degradation (and to be conservative in design), the 90% upper confidence bound (UCB) for $\hat{R}$,

    $$\mathrm{UCB} = 1 - (1 - \hat{R})\,\exp\left[-1.28\,(\cdots)\right] \qquad (3)$$

    is always used, based on $\hat{v} = \sum_{i=1}^{k} \hat{v}_i$. Formulas for the $\hat{v}_i$ and the UCB are based on the theory of Maximum Likelihood Estimation (MLE). Derivation of these formulas, along with Monte Carlo simulation to examine their accuracy with small, censored samples, is given in [8]. Since the MLE theory is quite general, SEB can be adapted to use almost any EM statistical model and stress scaling model. The MLE estimates of the statistical parameters of the model, along with the $\hat{v}_i$, are computed based on the likelihood equation of the model. The resulting parameter estimates are substituted into Equation (1) with the Multilognormal failure distribution replaced by the failure distribution of the new model. Equations (2) and (3) are unchanged.
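    The claim that the failure distribution and the stress scaling model can be swapped without touching the chip-level combination can be illustrated with a small sketch. Here the per-unit-length failure distribution is passed in as a function, so a lognormal can be replaced by, say, a Weibull. The Weibull alternative, the `current_exponent` parameter, and all names are illustrative assumptions, not models taken from the paper.

```python
import math


def lognormal_cdf(t50, sigma):
    """Unit-length lognormal failure CDF with median t50 and log std dev sigma."""
    def cdf(t):
        return 0.5 * math.erfc(-math.log(t / t50) / (sigma * math.sqrt(2.0)))
    return cdf


def weibull_cdf(scale, shape):
    """Unit-length Weibull failure CDF (an illustrative alternative model)."""
    def cdf(t):
        return -math.expm1(-((t / scale) ** shape))
    return cdf


def series_risk(unit_cdf, life, stresses, current_exponent=2.0):
    """Series-model class risk for any unit-length CDF and current exponent.

    stresses: iterable of (S, A, n); the equivalent age of a unit length is
    assumed to be life * S**current_exponent * A.
    """
    log_survival = 0.0
    for s, a, n in stresses:
        p_fail = unit_cdf(life * s ** current_exponent * a)
        if p_fail >= 1.0:
            return 1.0
        log_survival += n * math.log1p(-p_fail)
    return -math.expm1(log_survival)


stresses = [(1.0, 1.0, 1_000), (1.4, 1.0, 50)]  # hypothetical (S, A, n)
print(series_risk(lognormal_cdf(500.0, 0.8), 10.0, stresses))
print(series_risk(weibull_cdf(2000.0, 2.5), 10.0, stresses))
```

    Only `unit_cdf` and the exponent change between the two calls; the series combination itself is untouched.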

    Figure 3: Alpha 21164 Distribution of S @ 333 MHz. (Horizontal axis: S, relative current stress; log-scale vertical axis from 1e+0 to 1e+5.)

    SEB Application to Alpha 21164

    Digital's proprietary WAWOTH CAD tool suite [9] was used to obtain the node currents and layout geometry that are required to determine the $S_{ij}$. See [10] for more detail on how these CAD tools were used to process the millions of interconnect segments in the Alpha 21164 design. Figure 3 tabulates those $S_{ij}$ for 3 classes of interconnect in the 4-metal level design at the design verification frequency of 333 MHz and maximum junction temperature of 100 °C.

    Each risk estimate $\hat{R}_i$ is computed using Equation (1) and its 90% UCB computed using Equation (3) with $k = 1$. The composite EM Risk (all MT1 and MT4) is then computed by applying Equations (2) and (3) to the 3 interconnect classes analyzed. Figure 4 shows the resulting SEB-computed EM risk at these extremes.

    Figure 4: Alpha 21164 EM Risk @ 333 MHz, 100 °C. (Bars for Narrow MT1, Wide MT1, MT4, and all MT1 & MT4.)

    Other Reliability Engineering Activities

    An examination of the hot carrier design for reliability methodology applied to the Alpha 21164 is given in [10], along with an overview of the CAD tools employed in its EM and hot carrier design verification.

    The greater bandwidth between design and process entailed by approaches like SEB can enable additional reliability engineering activities that contribute to business success. Building direct links between process reliability data and the design function enables the following:

    Identification of reliability-critical structures that can focus process improvement and increase the assessment accuracy of qualification testing

    Quantification of acceleration model sensitivities that can drive the design and prioritization of kinetic studies and reliability model development projects

    Rapid assessment of the chip-level impact of improvements in process reliability or chip thermal management

    Rapid assessment of the chip-level impact of changes in chip operating voltages or frequencies

    All of these activities can lead to higher performance products with shorter time-to-market, but with more thoroughly verified reliability.
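    As one hedged illustration of the "rapid assessment" activities listed above, a what-if for a frequency change can be sketched by rescaling every relative current stress and recomputing the class risk. The sketch assumes that average interconnect current, and therefore S, scales linearly with clock frequency at fixed voltage and temperature. It uses an assumed lognormal series model with inverse square current dependence, and the stress table and parameters are hypothetical.

```python
import math


def phi(x):
    """Standard normal CDF via erfc for accurate small tail probabilities."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def class_risk(t50, sigma, life, stresses):
    """Assumed series-model risk; stresses is a list of (S, A, n) tuples."""
    log_survival = 0.0
    for s, a, n in stresses:
        p_fail = phi(math.log(life * s ** 2 * a / t50) / sigma)
        if p_fail >= 1.0:
            return 1.0
        log_survival += n * math.log1p(-p_fail)
    return -math.expm1(log_survival)


def rescale_for_frequency(stresses, f_old_mhz, f_new_mhz):
    """Scale each S by f_new / f_old (assumes current is proportional to frequency)."""
    ratio = f_new_mhz / f_old_mhz
    return [(s * ratio, a, n) for s, a, n in stresses]


# Hypothetical stress table: (S, A, number of unit lengths).
baseline = [(0.5, 1.0, 1_000_000), (1.0, 1.0, 10_000), (1.5, 1.0, 100)]
faster = rescale_for_frequency(baseline, 300.0, 333.0)

risk_300 = class_risk(t50=500.0, sigma=0.8, life=10.0, stresses=baseline)
risk_333 = class_risk(t50=500.0, sigma=0.8, life=10.0, stresses=faster)
print(f"{risk_300:.3e} -> {risk_333:.3e}")
```

    Because the stress table is reused, such a comparison needs no new circuit simulation, only a re-evaluation of the risk model.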

    References

    [1] J. Kitchin, Statistical electromigration budgeting for reliable design and verification in a 300-MHz microprocessor, 1995 Symposium on VLSI Circuits Digest of Technical Papers, pp. 115-6.

    [2] W. Bowhill et al., A 300-MHz 64b quad-issue CMOS RISC microprocessor, 1995 ISSCC Digest of Technical Papers.

    [3] S. Bell et al., Implementation of a 300-MHz 64b second generation CMOS Alpha RISC CPU, Digital Technical Journal, in press.

    [4] M. Triantafyllou and A. Leone, Electromigration reliability evaluation for a three-level metalization process, Microelectron. Reliab., Vol. 32, pp. 621-631, 1992.

    [5] B-K. Liew, P. Fang, N. W. Cheung, and C. Hu, Circuit reliability simulator for interconnect, via, and contact electromigration, IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, Vol. 12, pp. 1524-1533, 1993.

    [6] M. Atakov, J. J. Clement, and B. Miner, Two electromigration failure modes in poly-crystalline aluminum interconnects, Int. Reliab. Phys. Symp., pp. 213-224, 1994.

    [7] J. R. Lloyd and J. Kitchin, The electromigration failure distribution: the fine-line case, J. Appl. Phys., Vol. 69, pp. 2117-2127, 15 February 1991.

    [8] J. Kitchin and W. J. Martin, Risk estimation and confidence bounds for statistical electromigration budgeting, unpublished.

    [9] V. Peng, D. R. Donchin, and Y-T. Yen, Design methodology and CAD tools for the NVAX microprocessor, IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 310-313, October 1992.

    [10] R. P. Preston et al., Design and verification strategies for ensuring long-term reliability of a 300-MHz microprocessor, Proceedings ESSCIRC'95, pp. 278-281.

