

[IEEE Proceedings of IEEE International Reliability Physics Symposium - Atlanta, GA, USA (1993.03.23-1993.03.25)] 31st Annual Proceedings Reliability Physics 1993 - The role of field

THE ROLE OF FIELD PERFORMANCE INFORMATION IN BUILDING IN RELIABILITY

Bernard M. Pietrucha
AT&T Bell Laboratories

67 Whippany Road
P.O. Box 903

Whippany, New Jersey 07981-0903

R- 4E-230

(201) 386-30

"Those who cannot remember the past are condemned to repeat it."

George Santayana (1863-1952)

ABSTRACT

Analysis of the field performance of VLSI devices and feedback of that information to device designers has been instrumental in improving device reliability. Such a system is required for certification under the military Qualified Manufacturers List (QML) program and is a required element of other certification systems, e.g., ISO-9000. Field performance information can also be used to model the expected reliability of new designs. This paper describes a working feedback system which provides the information required for both product improvement and reliability prediction.

INTRODUCTION

The traditional approach to component reliability has been to build a batch of components and then try to "test out" the weak part of the population by applying various stresses, usually over time. This technique is illustrated in Figure 1, which shows a fixed stress being applied to a component population. Components which are weaker than the stress fail and are removed from the population. The remainder of the population is considered to have a strength which is greater than the stress and is, by definition, suitable for use, i.e., it is "reliable".
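The screening model just described can be sketched in a few lines of Python. This is only an illustration: the normal strength distribution, its parameters, and the screen stress level are assumptions chosen for the example and do not come from any measured population.

```python
import random


def screen_population(strengths, screen_stress):
    """Split a population into survivors and screen failures.

    A device fails the screen when its strength is below the applied stress.
    """
    survivors = [s for s in strengths if s >= screen_stress]
    failures = [s for s in strengths if s < screen_stress]
    return survivors, failures


# Hypothetical population: strengths drawn from a normal distribution.
# The distribution shape, mean, and spread are assumptions for illustration.
rng = random.Random(0)
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]

survivors, failures = screen_population(population, screen_stress=70.0)
print(f"screened out: {len(failures)} of {len(population)}")
print(f"weakest surviving device: {min(survivors):.1f}")
```

In this sketch the screen only establishes that every survivor is at least as strong as the applied stress. It does nothing to change the strength distribution that the manufacturing process produces.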

Types of stress to be applied to components and their levels are delineated in our military, aerospace, and industrial specifications, and are usually set in response to some prior failure experience. When a failure of a certain ilk is discovered by the customer in what was thought to be a lot of "reliable" parts, the reliability justice system moves into action. If the failure were deemed an "escapee", the system would set in motion a process to create a new screen which would preclude the acceptance of that kind of failure. As devices became more complex, new screens were developed in response to each new failure, and the documents which described the screening methodology grew in proportion. The industry had all of the resources necessary to remove potentially bad product, but the means and incentive to improve product were absent from the specification system.

CH3194-8/93/0000-0077$01.00 © 1993 IEEE/IRPS

With increasing device complexity and more demanding customers, we need to focus less on screening out the bad ones and more on not manufacturing them in the first place.

Figure 1. Screening the Weak Product (plot labels: distribution of population strength; devices failing; screen stress; axes: devices versus strength)


ISSUES AND CONCERNS

George Santayana was addressing political and social issues and the consequences of ignoring those lessons when he penned his famous lines in 1905, but his words have far-reaching applicability in the modern electronic product realization and delivery process. The performance history of components and systems abounds with information which can reveal to us the secrets of successful product design and manufacture. Electronic component manufacturers can learn a great deal about their products from accurate field performance information.

Knowledgeable reliability engineers realize that screening devices to a certain stress level or series of stress levels does not guarantee that a component failure will not occur. In explaining away these component failures, astute manufacturers have cited screening as a cause for shifting the responsibility for assuring reliable product from the manufacturer to the customer. Manufacturers may claim that they have no liability for device failures because they screened the devices in accordance with a military standard or some customer specification. The implicit message here is that if I, as a manufacturer, must screen or, in fact, build the product according to customer dictates, the failures then become the customer's responsibility. This is the reason for the proliferation of screening methodology over the years. But we know now that specifying a screening level is not in itself a guarantee that reliable product will be produced. Something more than screening is needed. Screens which yield 100% contribute little to product reliability, but they do contribute to the product cost.

Screening, however, can be more than a test applied to finished product. A screening methodology can be applied during the concept and design phases of a new product to assure ourselves and our customers that reliability is being built into the product.

Too often in our race to be "first to market" with new products we rely on innovative but less-than-proven technology to grab a significant share of the market before the other guy can get a large slice of that same customer base. How often do we ask ourselves if we are making some of the same mistakes again as we rush the new product into production? (Post-mortems frequently show that field failures are caused by old mistakes being repeated on new products.)

We need to address these concerns through systematic, disciplined feedback of field performance information to design and manufacturing. If we heed the voice of the customer as we create new and more exciting products, we can use our customers' knowledge of our current products to dramatically improve new products' performance and reliability.

MILITARY AND COMMERCIAL STANDARDS

The need to build in reliability is an acknowledged requirement in new and current military and commercial standards and specifications. Emphasis is being placed more strongly on "what" manufacturers must do to be recognized as a reliable product supplier and less on "how" to accomplish it.

The listing of specific screens and test methodology is being abandoned in favor of a set of customer-oriented expectations which may be implemented in any number of ways.

MIL-I-38535 (Qualified Manufacturers' List) requires that a field failure return program be established as a prerequisite for certification and qualification under that specification. Procedures must be self-imposed by a manufacturer of ASIC/VLSI devices to test and analyze failed parts from the field and to implement corrective actions.

The ISO-9000 series of standards is gaining world-wide acceptance as a certification methodology, and its provisions do not include a reference to "how" to implement the program. The provisions do clearly state that customer complaints must be analyzed to eliminate causes of nonconformance. This does not mean that, if a customer complains, he gets a new product to replace the defective one. There must be a system in place to get the failure information to the responsible entity within the company so that the cause of failure can be eliminated at its source.

The National Electronic Process Certification Standard (NECPS/ANSI-EIA-599) was created to provide a commercial model that would reflect the innovative provisions of MIL-I-38535 without requiring certification by a government agency. It also requires that product performance, quality, and reliability information be evaluated to assess whether improvement is continuous. In addition, on-going customer communication must be maintained to determine and measure customer satisfaction.

The purpose of the cited requirements in these standards is to get field performance information to the people who can use it effectively to improve the product, thereby preventing potentially unreliable product from being built, i.e., to build reliability into the products.

FIELD FAILURE INFORMATION FLOW

It is not sufficient that field failures be analyzed and that results be given to the customer. The program must also include a feedback path to designers, process developers, and other responsible technical personnel. Product management and marketing must also be in the loop, for it is often their decision whether a problem can and will be addressed.

Figure 2 is a description of how a failure is analyzed and documented under a MIL-I-38535 certified process.

The failed component is indicted using the system manufacturer's standard factory test procedure, usually at the component level while the component is still attached to the printed wiring board. The alleged defective component is then cataloged in the removal database. If a pattern of failures for that particular code starts to become apparent, the device type is assigned a unique identifier (in the root cause database) and failures are returned to the device manufacturer for analysis. Simple devices may be analyzed by the systems manufacturer, but ASIC devices with high gate counts can only be fully analyzed by the device manufacturer.

When the devices arrive at the device manufacturer's facility, they are cataloged at the Customer Technical Support Center using the unique identifier from the root cause database and the failure analysis process begins. The devices are electrically tested and the verified bad devices are failure analyzed. Results from this analysis are then delivered to the appropriate activity within the factory for resolution. The Customer Technical Support Center then reports the resolution to the customer.
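The first steps of this flow (cataloging removals, noticing a pattern for a device code, and assigning a root cause identifier) can be sketched as follows. The threshold value, the record fields, and the identifier format are hypothetical; the paper does not specify any of them.

```python
from collections import Counter

# Hypothetical threshold: the number of removals of one device code that
# counts as a "pattern". No specific number is given in the text.
PATTERN_THRESHOLD = 3


def codes_needing_analysis(removals, threshold=PATTERN_THRESHOLD):
    """Return device codes whose removal count has reached the threshold."""
    counts = Counter(r["code"] for r in removals)
    return sorted(code for code, n in counts.items() if n >= threshold)


def assign_root_cause_ids(codes, start=1):
    """Give each flagged device code a unique root cause identifier."""
    return {code: f"RC-{i:04d}" for i, code in enumerate(codes, start)}


# Illustrative removal database entries (made-up data).
removal_db = [
    {"code": "A", "board": "PWB-17"},
    {"code": "B", "board": "PWB-02"},
    {"code": "A", "board": "PWB-31"},
    {"code": "A", "board": "PWB-08"},
]

flagged = codes_needing_analysis(removal_db)
root_cause_db = assign_root_cause_ids(flagged)
print(root_cause_db)  # {'A': 'RC-0001'}
```

The identifier is the key that lets the Customer Technical Support Center tie a returned device back to its removal history, which is why it is assigned before the parts leave the systems manufacturer in this sketch.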

Figure 2. Field Failure Information Flow (flow diagram; legible labels: root cause database, removal database, modeling, failure, technical support center, field/factory, process/procedure, assembly, correlation, cause, issue, device, change)

The process becomes more complex when the devices returned test good upon receipt. This is not unusual for complex devices with high I/O counts since there are many different sets of test vector possibilities. Failures of this type require intense involvement of both design and test personnel in the failure analysis process. This activity usually results in additional test vectors to preclude a repeat of this type of failure. Since the failures are usually point defects, i.e., a defect in a single transistor in a single gate, the root cause is not a lack of fault coverage but is, in fact, a question of defect density in the lot which yielded the device.

Once this type of failure is resolved, the customer is again informed of the results and the nature of the resolution. Resolution of this kind of failure usually requires frequent communication between the vendor and the customer to assure that the customer's expectations and test methodology are adequately addressed and incorporated into the failure resolution.

When the circumstances surrounding the failure are generic in nature, design rules, tools, and techniques can be modified to keep the failure mechanism from being built into future designs.

This flow of field failure information reaches a broad audience with the necessary skills to resolve customer satisfaction issues quickly and competently. With the total commitment of the entire product development/manufacturing organization, unpleasant replays of historic events can be prevented from damaging the producer's reputation and delivering the customer to the arms of a competitor.

TRACKING AND MODELING FIELD PERFORMANCE INFORMATION

After the failure is resolved, data entered into the component removal database can be analyzed to determine the effect of the design/test changes on device performance. An example of this type of analysis is shown by the arrow in Figures 3, 4, and 5. Each of these ASIC codes had test vectors added to the test program at the vendor's facility to more fully exercise the logic. The date that the vectors were added is shown on the individual charts. The improvements in the numbers are measurable, but the contribution of the improvements to customer satisfaction is significant. A joint effort like this in which the vendor and the customer act as partners rather than adversaries helps to improve the field performance of the devices, and the partnership can continue to grow.

Once field performance information on devices is collected and analyzed, that information can be used to predict the performance of new products during the design phase of the product realization process. Knowing the population for a given device type and the total operating time for the population (within reasonable accuracy limits, e.g., to the nearest month), we can compute a reliability estimate for that device type based on its field performance. Analyzing this information for many different component types then yields a reliability estimate for each component type. Using these reliability estimates, we can then model the field performance of a new design by assigning a reliability estimate to each component in the proposed circuit and adding those estimates (in FIT) to get a reliability estimate (in FIT) for the complete circuit. With appropriate multipliers for temperature and other stresses, we can get an estimate of field performance information for the new design and make that information available to device and circuit designers and process developers. Guided by this information, designers can perform sensitivity analyses and optimize the predicted reliability of their designs.
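The arithmetic behind this prediction can be sketched in Python. One FIT is one failure per 10^9 device-hours. The function names, the component names, the field data, and the stress multiplier below are all hypothetical values chosen for illustration; they are not taken from the field data in this paper.

```python
def fit_rate(failures, device_hours):
    """Failure rate in FIT (failures per 10**9 device-hours)."""
    return failures / device_hours * 1e9


def circuit_fit(component_fits, stress_multipliers=None):
    """Sum per-component FIT values, each scaled by an optional multiplier.

    A component with no entry in stress_multipliers is scaled by 1.0.
    """
    stress_multipliers = stress_multipliers or {}
    return sum(
        fit * stress_multipliers.get(name, 1.0)
        for name, fit in component_fits.items()
    )


# Hypothetical field data: 5 failures observed over 5e8 device-hours.
asic_fit = fit_rate(failures=5, device_hours=500_000_000)

# Hypothetical bill of materials for a proposed circuit, in FIT.
components = {"asic": asic_fit, "memory": 25.0, "logic": 5.0}

# Hypothetical multiplier: assume the ASIC runs hotter in the new design.
total = circuit_fit(components, stress_multipliers={"asic": 2.0})
print(f"ASIC: {asic_fit:.1f} FIT, circuit: {total:.1f} FIT")
# prints "ASIC: 10.0 FIT, circuit: 50.0 FIT"
```

Because the circuit estimate is a plain sum, a designer can rerun it with one component swapped or one multiplier changed to see which part dominates the prediction, which is the kind of sensitivity analysis the text describes.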

Field failure rates for two ASIC devices which were estimated using this technique and the respective populations from which these failure rates are derived are shown in Table 1.

Device    FIT     Population
Code B    10.3    475573
Code C    15.0    1601700

Table 1. Field Failure Rates for Device Codes


Figure 3. Device Removals, Code A (plot: Code A factory removals versus time, Aug-87 through Aug-92)

Figure 4. Device Removals, Code B (plot: Code B factory removals versus time)

Figure 5. Device Removals, Code C (plot: Code C factory removals versus time)

SUMMARY

The importance of field information in building in reliability cannot be overstated. Customers' knowledge of how our products perform in their applications, if they share that knowledge with us, can be used to improve our product reliability. Laboratory testing can only give an approximation of device performance in customer applications.

Smaller device geometries call for a proactive effort to bring reliability into everyone's design process. We can no longer rely on after-the-fact testing to remove the weaker devices. We have recognized that additional screening on finished product does little to improve product reliability. The field environment is the ultimate screen.

Reliability comes to those who recognize the importance of field performance information. We must use that information to make reliability an integral part of the design process, and not a post-manufacturing ritual. Reliability of future products can only grow if we embrace this concept. We must remember our past, or we shall indeed be condemned to repeat it.