IEEE 34th Annual Spring Reliability Symposium, 'Reliability - Investing in the Future', Boxborough, MA, USA, April 18, 1996
Design for Reliability in the Alpha 21164 Microprocessor
Digital Equipment Corporation
(508) 568-4650 email@example.com 77 Reed Road, Hudson, MA 01749-2839
An increasingly important area where reliability engineering can contribute to success in the semiconductor industry is in methods and tools for executing design for reliability. VLSI process variation presents design-dependent levels of statistical reliability risk to all products fabricated in the process. Reliability methods and tools that allow process reliability to be projected directly and quantitatively back into the design can enable designers to achieve balance in design parameters that simultaneously achieve performance, time-to-market and long-term reliability goals. Statistical Electromigration Budgeting (SEB) is a method for setting and verifying electromigration (EM) design for interconnect. It allows designers to selectively supersede fixed EM design rules to achieve increased chip performance while simultaneously quantifying chip-level EM reliability, directly assuring design conformance to chip-level reliability requirements. An overview of SEB and some results from its application in the design and verification of the Alpha 21164 microprocessor are given, followed by a tabulation of opportunities for further reliability engineering activities enabled by developments like SEB.
Consider a (strict) definition of VLSI Design for Reliability as achieving a chip-level reliability goal. Then an increasingly important area where reliability engineering can contribute to success in the semiconductor industry is in methods and tools for executing design for reliability. This is especially true under the market push for higher levels of performance and shorter design cycles in VLSI products. In particular, the spiraling complexity and geometrically increasing clock rates of high performance microprocessors call for sophisticated design and verification approaches to ensure that long-term reliability requirements are met without sacrifice of competitive performance and time-to-market goals.
VLSI process variation, which can be reduced with good process development and manufacturing process improvement strategies, can never be completely eliminated. This variation presents design-dependent levels of statistical reliability risk to all products fabricated in the process. If design for reliability is taken to mean achieving a chip-level reliability goal, then fixed limits for reliability related design parameters are mathematically arbitrary. For example, having all chip interconnect satisfy a fixed current density limit may not guarantee chip-level reliable design for electromigration (EM), nor does allowing some interconnect to exceed fixed limits necessarily lead to an unreliable design. In design for reliability under process variation it is the chip-level reliability risk that is ultimately meaningful.
Reliability methods and tools that allow process reliability to be projected directly and quantitatively back into the design can enable designers to achieve balance in design parameters that simultaneously achieve performance, time-to-market and long-term reliability goals. A recent example of such methods and tools is Statistical Electromigration Budgeting (SEB) [1], which was applied in the design and verification of the Alpha 21164 microprocessor, a 4-level metal, 0.5 µm CMOS implementation of Digital's Alpha architecture operating at up to 333 MHz.
Design for EM Reliability
Commercial CMOS VLSI process technologies typically employ Al as the interconnect metal. The resistance of Al interconnect to EM degradation has been improved over the past several years through the use of better alloys, the addition of underlayers, and increases in relative grain size. However, during the same period VLSI designers have aggressively pursued increased product performance (especially for microprocessors). The result has been significantly higher clock speeds and power dissipation. (Versions of the Alpha 21164 that run at 300 MHz dissipate a maximum of 50 W [2,3].) Increasing the clock speed increases the current density, while higher power dissipation increases the chip junction temperature. Current density and temperature drive the EM directed diffusion mechanism that can lead to voids (leading to open circuits) or hillocks (leading to short circuits) in Al interconnect. Thus EM has remained a key limiter in the design of reliable high performance VLSI products.
Historically, designers have compared interconnect dc average current per unit width, $I_{\mathrm{eff}}$, to a conservative fixed design limit. Let

$$S = \frac{\text{Actual } I_{\mathrm{eff}}}{\text{Design Limit } I_{\mathrm{eff}}}$$

($I_{\mathrm{eff}}$ is typically modified to correctly account for the effects of current crowding in corners and vias and the smaller net charge redistributed by ac nodes compared to dc nodes.) A design was viewed as reliable with respect to EM if $S \le 1$, while any interconnect with $S > 1$ had to be redesigned.
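The fixed-limit check described above can be sketched in a few lines of Python. This is only an illustration: the function names, segment names, and current values are hypothetical, and the inputs are assumed to be already corrected for current crowding and ac effects.

```python
def relative_current_stress(actual_ieff, design_limit_ieff):
    """Return S = actual I_eff / design-limit I_eff (same units for both)."""
    if design_limit_ieff <= 0:
        raise ValueError("design limit must be positive")
    return actual_ieff / design_limit_ieff


def passes_fixed_rule(actual_ieff, design_limit_ieff):
    """Traditional rule: a segment is acceptable only if S <= 1."""
    return relative_current_stress(actual_ieff, design_limit_ieff) <= 1.0


# Hypothetical segments: (name, actual I_eff, design-limit I_eff)
segments = [("clk_trunk", 1.8, 1.0), ("data_bus", 0.6, 1.0)]

# Under the fixed rule, every segment listed here would have to be redesigned.
violations = [name for name, actual, limit in segments
              if not passes_fixed_rule(actual, limit)]
```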
However, when viewed from the process reliability perspective, EM degradation is inherently statistical. A wide dispersion in the times to failure (defined as a specific increase in interconnect resistance) is typically observed for identically sized and stressed segments of interconnect or contacts (as shown by the probability plot of actual EM accelerated testing data in Figure 1). Approaches that factor EM failure statistics into the setting of EM design limits or into computing EM reliability in circuit simulation do not quantify chip-level design reliability or reveal the relationship between S and EM risk in a way that enables a chip-level EM design for reliability strategy.
Figure 1: Illustration of the Statistical Nature of EM Degradation. [Probability plot; horizontal axis: EM Failure Time under Acceleration (hrs).]
If design for reliability is defined to mean achieving a chip-level reliability goal, then fixed current density design limits become mathematically arbitrary. Note that having all interconnect satisfy $S \le 1$ does not guarantee a particular chip-level reliability, nor does having some interconnect with $S > 1$ necessarily mean that the chip-level reliability goal is exceeded. Only the total statistical risk to the chip is meaningful. If the EM reliability of each segment of interconnect at each stress level can be computed, then the chip-level EM reliability goal can be budgeted among classes of interconnect or chip design subdivisions (such as boxes) to minimize the performance limitations that an EM reliability goal places on the design.
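A small numerical sketch can make this point concrete. It assumes a lognormal failure time per unit length, an inverse-square dependence of failure time on current stress, and a series model with independent segments. Every number below (product life, median, log standard deviation, and both interconnect populations) is hypothetical and chosen only for illustration.

```python
import math


def unit_risk(s, life_hrs, median_hrs, sigma, accel=1.0):
    """Probability that one unit length fails by life_hrs.

    median_hrs and sigma describe a lognormal failure time at S = 1, A = 1.
    Failure time is assumed to scale as 1 / (accel * s**2).
    """
    z = (math.log(life_hrs * accel * s ** 2) - math.log(median_hrs)) / sigma
    # Standard normal CDF written with erfc to keep precision in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def chip_risk(population, life_hrs, median_hrs, sigma):
    """Series model: the chip fails if any unit length fails.

    population is a list of (S, number_of_unit_lengths) pairs.
    """
    log_survival = sum(n * math.log1p(-unit_risk(s, life_hrs, median_hrs, sigma))
                       for s, n in population)
    return -math.expm1(log_survival)


LIFE_HRS, MEDIAN_HRS, SIGMA = 1.0e5, 1.0e7, 1.0   # hypothetical values

# Design A obeys the fixed rule everywhere but has a lot of metal at S = 1.
design_a = [(1.0, 100_000)]
# Design B breaks the fixed rule on a few unit lengths and is lightly stressed elsewhere.
design_b = [(2.0, 10), (0.5, 1_000)]

risk_a = chip_risk(design_a, LIFE_HRS, MEDIAN_HRS, SIGMA)
risk_b = chip_risk(design_b, LIFE_HRS, MEDIAN_HRS, SIGMA)
```

With these assumed numbers the rule-compliant design A carries the larger chip-level risk, which is the sense in which a fixed limit on S alone says little about chip-level reliability.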
Figure 2 illustrates some aspects of a statistical budget approach. It shows, for example, that $10^2$ unit lengths of Narrow first level metal (MT1) at S = 2.1 present about the same EM risk as $10^6$ unit lengths of Narrow MT1 at S = 1.0. Further, the per unit length EM risks of Narrow and Wide MT1 diverge significantly as S increases (due to different microstructure effects), indicating that Wide MT1, not Narrow MT1, will limit EM design at the highest current stresses.
Figure 2: Comparison of EM Risk in Narrow & Wide MT1. [Plot of EM risk versus S (relative current stress) for Narrow and Wide MT1.]
Interconnect test chips are subjected to high current density and temperature while being monitored for increases in resistance, as is typical in EM process reliability testing. The elapsed test time until the resistance of a structure increases beyond a fixed limit is defined as its failure time. The failure times of a sample of structures tested under identical current and temperature stresses are used to estimate the failure distribution for the structure class. (Narrow MT1 and Wide MT1 are examples of classes of interconnect structures. Within a class the relative reliability of each size structure is assumed to be a function only of the relative current density and temperature stress on the structure.) Electromigration models are used to scale the failure time distribution to the Design Limit $I_{\mathrm{eff}}$ and temperature.
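The estimation and scaling steps above might look roughly like the following Python sketch. It assumes a simple lognormal fit to complete (uncensored) failure times and the inverse-square current dependence adopted later in the paper. The temperature acceleration factor is taken as a given input rather than derived from an activation energy. The function names and sample data are hypothetical.

```python
import math
import statistics


def fit_lognormal(failure_times_hrs):
    """Estimate the lognormal median and log standard deviation.

    Assumes every structure in the sample failed (no censoring).
    """
    logs = [math.log(t) for t in failure_times_hrs]
    median = math.exp(statistics.mean(logs))
    sigma = statistics.stdev(logs)
    return median, sigma


def scale_median_to_design_limit(median_test_hrs, current_ratio, temp_accel):
    """Scale a test-condition median to the design-limit condition.

    current_ratio: test current density / design-limit current density.
    temp_accel:    temperature acceleration of test relative to design conditions.
    Inverse-square current dependence; sigma is assumed unchanged by scaling.
    """
    return median_test_hrs * current_ratio ** 2 * temp_accel


# Hypothetical accelerated-test failure times in hours.
median_test, sigma_test = fit_lognormal([100.0, 200.0, 400.0])
median_design = scale_median_to_design_limit(median_test, current_ratio=10.0,
                                             temp_accel=50.0)
```

A production analysis would normally handle censored samples and report confidence bounds. This sketch omits both.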
CAD tools are used to determine the relative current stress $S_{ij}$ and temperature acceleration factor $A_{ij}$ at worst case operating conditions for the j-th line segment (or contact) (where $j = 1, \ldots, m_i$) of the i-th structure class (where $i = 1, \ldots, k$) in the design. For each interconnect class SEB computes the estimated composite EM risk $\hat{F}_i$ for the chip arising from the class over the chip product life $L$. Composite EM risk is based on a series reliability model: any interconnect segment failure means chip failure. Interconnect segment failure times are assumed to be statistically independent. The standard error of $\hat{F}_i$ is also computed. This accounts for the statistical uncertainty in the estimated EM risk arising from the relatively small samples of interconnect that can feasibly be tested under accelerated conditions.
To obtain the SEB results presented in this paper, a Multilognormal failure distribution for interconnect failure times was assumed, along with the commonly used inverse square dependence of failure time on current density. EM risk is then given by
where $\hat{t}_{50,i}$ and $\hat{\sigma}_i$ are the estimated median and log standard deviation parameters of the lognormal statistical model standardized to $S = 1$, $A = 1$, and a unit length (for lines). $n_{ij}$ is the number of unit lengths (or number of contacts) present on the chip at the stress combination $S_{ij}$ and $A_{ij}$
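As a sketch only, one expression consistent with the stated assumptions (series model, independent unit lengths, lognormal failure times, inverse-square current dependence) is given below. It is not necessarily the exact form used in SEB. The symbol choices are assumptions made for this illustration: $\hat{t}_{50,i}$ and $\hat{\sigma}_i$ are the class median and log standard deviation at $S = 1$, $A = 1$; $n_{ij}$ is the number of unit lengths at stress combination $(S_{ij}, A_{ij})$; $L$ is the product life; $\Phi$ is the standard normal distribution function.

```latex
\hat{F}_i(L) \;=\; 1 - \prod_{j=1}^{m_i}
  \left[\, 1 - \Phi\!\left(
    \frac{\ln\!\left(L \, A_{ij} \, S_{ij}^{2}\right) - \ln \hat{t}_{50,i}}
         {\hat{\sigma}_i}
  \right) \right]^{n_{ij}}
```

When every per-unit failure probability is very small, this is approximately $\sum_{j} n_{ij}\,\Phi(\cdot)$, which is one way to see why the risk from a class can be budgeted additively across stress levels.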