
Page 1: [IEEE IEEE 34th Annual Spring Reliability Symposium, 'Reliability - Investing in the Future' - Boxborough, MA, USA (1996.04.18-1996.04.18)] IEEE 34th Annual Spring Reliability Symposium,

An Improved Accelerated Life Testing Approach for Evaluating System Reliability

Jeffrey A. Clark Ugo S. Garganese

The MITRE Corporation*

Abstract

Renewed focus on the cost of reliability engineering has further motivated the development of faster and cheaper methods for evaluating system reliability. Conventional reliability verification testing (RVT) is time-consuming and expensive. Analytical predictions are economical, but sacrifice some credibility for their low cost. They also do not identify manufacturing-related reliability problems. An attractive alternative to RVT and analytical predictions is accelerated life testing (ALT); however, little research has been done in the area of system-level ALT using multiple environmental stresses. Of the few approaches that exist today, the Stress Model for Accelerated Reliability Testing ($MART), developed by the Electronic Systems Center at Hanscom AFB, is perhaps the most advanced.

We have developed an improved ALT approach based on $MART for evaluating system reliability under multiple environmental stresses. In the three previous applications of $MART, little emphasis was placed on design of experiments (DOE). As a consequence, failures due to cumulative damage and/or overstress were observed. Our DOE-based approach improves the accuracy of the results obtained from $MART by ensuring that the design limits of the test item are not exceeded. It also improves the confidence in the results obtained from $MART by increasing the probability of observing statistically random failures for a given test length.

Both of these improvements are achieved by performing several destructive evaluation experiments on the test item before the accelerated life test takes place. This destructive evaluation determines the design limits of the test item, which can then be used to optimize the ALT stress profile for accuracy and statistical confidence. Alternatively, the stress profile can be optimized for accuracy and maximum acceleration, if a shorter test is desired. In either case, our ALT approach preserves many of the better attributes of a conventional RVT, but costs much less to perform. The approach is particularly applicable to systems with commercial off-the-shelf (COTS) components for which little information is available from the manufacturers on their true design limits.

In this talk, we will present our improved ALT approach and discuss its evaluation on a COTS single-board computer. A comparison of the time and cost associated with our ALT approach versus conventional RVT will also be provided.

* The views, opinions, and/or findings contained in this presentation are those of the authors, and should not be construed as the official positions, policies, and/or decisions of the MITRE Corporation or its government sponsors.


Motivation

- Shrinking budgets have renewed the search for faster and cheaper methods of evaluating system reliability
- Conventional reliability verification testing (RVT) is time consuming and expensive
- Analytical reliability predictions are often inaccurate and do not identify manufacturing-related problems
- Accelerated life testing (ALT) is an attractive alternative
  - Test at elevated levels of environmental stress to precipitate failures more rapidly
  - Statistical model translates the failure time distribution down to the normal use environment

RVT-ALT Comparison

- 1994 MITRE R&M Center study
- Average time and cost to evaluate the MTBFs of a military synthesizer/detector and receiver/transmitter
- Includes all labor, equipment, facilities, and consumables

             Synthesizer/Detector       Receiver/Transmitter
             Time       Cost            Time       Cost
RVT          34 wks.    $802,000        30 wks.    $695,000
ALT          14 wks.    $246,000        13 wks.    $207,000
Savings      60%        70%             60%        70%
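
The savings row can be checked directly from the time and cost figures above. The following is a minimal Python sketch; the dictionary layout and the function names are illustrative and do not come from the study.

```python
# Time (weeks) and cost (USD) figures copied from the RVT-ALT comparison.
STUDY = {
    "Synthesizer/Detector": {"RVT": (34, 802_000), "ALT": (14, 246_000)},
    "Receiver/Transmitter": {"RVT": (30, 695_000), "ALT": (13, 207_000)},
}


def savings_percent(rvt: float, alt: float) -> float:
    """Reduction achieved by ALT relative to RVT, in percent."""
    return 100.0 * (rvt - alt) / rvt


def summarize(study: dict) -> dict:
    """Return {item: (time_savings_pct, cost_savings_pct)}."""
    out = {}
    for item, rows in study.items():
        (rvt_wks, rvt_cost), (alt_wks, alt_cost) = rows["RVT"], rows["ALT"]
        out[item] = (
            savings_percent(rvt_wks, alt_wks),
            savings_percent(rvt_cost, alt_cost),
        )
    return out


if __name__ == "__main__":
    for item, (time_pct, cost_pct) in summarize(STUDY).items():
        print(f"{item}: time savings {time_pct:.1f}%, cost savings {cost_pct:.1f}%")
```

The unrounded values are roughly 57 to 59 percent in time and 69 to 70 percent in cost, which is consistent with the rounded 60% and 70% reported on the slide.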


ALT at the System Level

- ALT traditionally performed on circuit-level components using a single environmental stress
- Little research done in the area of system-level ALT using multiple environmental stresses
- Multiple environmental stresses needed to maximize test acceleration and provide the most representative results
- Only two statistical models have been proposed
  - Design of experiments based model
  - Stress Model for Accelerated Reliability Testing ($MART)

$MART

- Cooperatively developed by the Air Force's Electronic Systems Center and George Washington University
- Step-stress test up to the design limits
- Bayesian model combines prior failure rate predictions with test results to obtain posterior failure rate estimates
- MIL-HDBK-217 typically used to generate the priors
- Same (or better) confidence as classical statistical models but requires fewer test items


Statistics Behind $MART

Λ = Set of prior failure rates
X = Set of test results (number of failures on each step)

Posterior probability density function (pdf) of Λ given X is obtained by applying Bayes' Theorem:

\[ f(\Lambda \mid X) = \frac{l(X \mid \Lambda)\, p(\Lambda)}{g(X)} \]

where:

f(Λ | X) = Posterior pdf of Λ given X
l(X | Λ) = Likelihood pdf of X given Λ
p(Λ) = Prior pdf of Λ
g(X) = Normalization pdf of X
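
The slides do not give the functional forms of the prior and likelihood used inside $MART, so the sketch below only illustrates the mechanics of Bayes' Theorem for a single failure rate. It assumes a discrete grid of candidate failure rates and a Poisson likelihood with a constant rate; both are textbook simplifications substituted here for illustration, and the function name and example numbers are hypothetical.

```python
import math


def posterior_on_grid(rates, prior, failures, unit_hours):
    """Apply Bayes' Theorem on a discrete grid of candidate failure rates.

    rates      -- candidate failure rates (failures per hour)
    prior      -- prior probability of each candidate (should sum to 1)
    failures   -- number of failures observed in the test
    unit_hours -- total exposure (number of units times hours on test)

    ASSUMPTION: failures follow a Poisson process with a constant rate.
    This is a simplification and not the $MART likelihood.
    """
    def likelihood(rate):
        mean = rate * unit_hours
        return math.exp(-mean) * mean ** failures / math.factorial(failures)

    joint = [likelihood(r) * p for r, p in zip(rates, prior)]
    norm = sum(joint)  # plays the role of the normalization term g(X)
    return [j / norm for j in joint]


if __name__ == "__main__":
    # Hypothetical grid: MTBFs of 500, 1,000, and 2,000 hours, equally likely.
    rates = [1 / 500, 1 / 1000, 1 / 2000]
    prior = [1 / 3, 1 / 3, 1 / 3]
    post = posterior_on_grid(rates, prior, failures=0, unit_hours=600)
    for rate, prob in zip(rates, post):
        print(f"MTBF {1 / rate:6.0f} h: posterior probability {prob:.3f}")
```

With no failures observed, the posterior shifts weight toward the lower failure rates (longer MTBFs), which is the qualitative behavior a prior-plus-test-data model is meant to capture.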


Previous Applications of $MART

- Estimated design limits
- Linear step-stress profiles
- Thermal stress not always characterized
- Voltage stress on microcircuits not modeled
- Failures due to cumulative damage and/or overstress
- Performed on military equipment by their manufacturers


Our Improved Approach

- Experimentally determined design limits
- Optimized step-stress profiles
- Thermal surveys of the test item and fixture
- Voltage stress model for microcircuits
- Vibration used as a background (unaccelerated) stress
- Applicable to commercial off-the-shelf (COTS) electronics as well as military equipment

Implemented in part through a destructive evaluation of the test item before the accelerated life test

- Establishes the design limits of the test item
- Three experiments starting at the maximum specified (max spec) limits of the test item
  1. Ramp temperature only
  2. Ramp voltage only
  3. Ramp both temperature and voltage
- Hard failure points of the first two experiments determine stress ramp rates for the third experiment
- Assume the design limits lie ten percent below the hard failure point of the third experiment
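
The destructive evaluation procedure can be sketched as follows. The slides do not state how the ramp rates are derived from the first two experiments, nor whether the ten percent margin is applied to raw stress values, so both choices below are assumptions. All names and the example failure points are hypothetical; only the max spec point (85 °C, 5.25 V) is taken from the specification of the SBC tested.

```python
from dataclasses import dataclass


@dataclass
class StressPoint:
    temperature_c: float
    voltage_v: float


def ramp_rates(max_spec, temp_only_failure_c, volt_only_failure_v, ramp_hours):
    """Ramp rates (deg C per hour, volts per hour) for the combined experiment.

    ASSUMPTION: both stresses start at max spec and are ramped so that each
    would reach its single-stress hard failure point after ramp_hours.
    This is one plausible reading, not the authors' documented rule.
    """
    temp_rate = (temp_only_failure_c - max_spec.temperature_c) / ramp_hours
    volt_rate = (volt_only_failure_v - max_spec.voltage_v) / ramp_hours
    return temp_rate, volt_rate


def design_limits(combined_failure, margin=0.10):
    """Design limits placed `margin` below the combined hard failure point.

    ASSUMPTION: the margin is applied to the raw values in deg C and volts.
    """
    return StressPoint(
        combined_failure.temperature_c * (1 - margin),
        combined_failure.voltage_v * (1 - margin),
    )


if __name__ == "__main__":
    max_spec = StressPoint(85.0, 5.25)
    # Hypothetical hard failure points, for illustration only.
    t_rate, v_rate = ramp_rates(max_spec, 185.0, 9.25, ramp_hours=10.0)
    limits = design_limits(StressPoint(150.0, 7.0))
    print(f"ramp rates: {t_rate:.1f} C/h, {v_rate:.2f} V/h")
    print(f"design limits: {limits.temperature_c:.1f} C, {limits.voltage_v:.2f} V")
```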


Accelerated Life Test

- First two steps of the stress profile are at the normal use and max spec levels
- Last step of the stress profile is at the design limits
- Nine intermediate steps are based on the underlying relationship between failure rate and applied stress
  - Want failure rate to increase linearly with time
  - Stresses should increase as an inverted, negative exponential function of time
- Times on all steps are equal and chosen to achieve desired probability of obtaining one or more failures
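
One way to see why a linearly increasing failure rate leads to a concave stress profile is to assume that the failure rate is log-linear in stress. That assumption, the function names, and the example numbers below are illustrative only; the slides do not give the authors' actual stress-to-failure-rate relationship. The second function estimates the probability of one or more failures, assuming a constant failure rate within each step.

```python
import math


def step_profile(s_max_spec, s_design, rate_max_spec, rate_design, n_intermediate=9):
    """Stress levels from the max spec step through the design limit step.

    ASSUMPTION: ln(failure rate) = a + b * stress, calibrated by the two end
    points. Raising the failure rate linearly from step to step then makes
    stress a logarithmic (concave, flattening) function of time.
    """
    n_steps = n_intermediate + 2  # max spec, intermediates, design limit
    span = math.log(rate_design / rate_max_spec)
    levels = []
    for k in range(n_steps):
        frac = k / (n_steps - 1)
        rate = rate_max_spec + frac * (rate_design - rate_max_spec)  # linear in time
        levels.append(
            s_max_spec
            + (s_design - s_max_spec) * math.log(rate / rate_max_spec) / span
        )
    return levels


def prob_at_least_one_failure(rates, step_hours, n_units):
    """P(one or more failures) with a constant failure rate within each step."""
    expected_failures = n_units * step_hours * sum(rates)
    return 1.0 - math.exp(-expected_failures)


if __name__ == "__main__":
    # Hypothetical end points: 85 C at 1/30,000 per hour, 130 C at 1/1,000 per hour.
    for step, level in enumerate(step_profile(85.0, 130.0, 1 / 30000, 1 / 1000), 2):
        print(f"step {step:2d}: {level:6.1f} C")
    print(prob_at_least_one_failure([0.001] * 10, step_hours=8, n_units=6))
```

Under these assumptions the stress rises quickly over the first intermediate steps and then flattens as it approaches the design limit, which matches the qualitative shape described on the slide.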


Evaluation of Our Approach

- Apply it to a COTS single-board computer (SBC)
  - Representative mixture of components
  - Easy to monitor using COTS diagnostic software
  - Results can be extrapolated to PCs and workstations
- Ten SBCs purchased
  - Three for the destructive evaluation
  - Six for the accelerated life test
  - One for software development
- Tests and failure analyses performed in MITRE's Product Evaluation Laboratory


Octagon 5025A SBC

- 25-MHz 80386CX microprocessor
- 1-Mbyte DRAM
- Three solid-state disks (SRAM, UV-EPROM, Flash EPROM)
- Serial, parallel, keyboard, and speaker ports
- Industrial temperature range (-40°C to +85°C)
- 5V-only (± 5 percent) operation
- 5 Grms vibration specification

I Diag (the chosen COTS diagnostic program) under DOS


Test Setup


Random Vibration Profile

[Figure: power spectral density (G²/Hz) versus frequency (Hz) on logarithmic axes, with frequency spanning 10¹ to 10³ Hz]

- Simulates an airborne inhabited cargo environment
- MILSTAR line replaceable unit performance profile attenuated through a 30-Hz vibration isolator
- Applied perpendicular to the SBC during all testing

Destructive Evaluation Results

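[Figure: temperature (°C) versus voltage (V) for the destructive evaluation experiments, with hard failure points marked. Legend: Experiment #1, Hard Failure. Labeled axis values: 5.25, 6.3, 6.7, 7.35, and 9.2 V; 85, 132, 155, and 195 °C.]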


Accelerated Life Test Results

3 of 6 SBCs Tested with No Failures

[Figure: step-stress profile showing stress level (Normal Use, Max Spec, Design Limit) versus time (hours), with the time axis marked every 8 hours from 8 to 96. Legible step table entries (Temp. in °C, Volt. in V): 124 and 6.42; 131 and 6.62; 134 and 6.71.]

Constant vibe on all steps at the attenuated level


$MART Analysis

- Prior Predictions
  - MIL-HDBK-217F and Voltage Stress Model by Shirley
  - Normal Use: MTBF = 85,470 hours (3,327 hours)
  - Max Spec: MTBF = 34,040 hours (2,629 hours)
  - Design Limit: MTBF = 1,180 hours (282 hours)
- Posterior Estimates
  - Will be computed using $MART Version 1.5

95 percent lower confidence limits given in parentheses
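
The prior MTBFs can be turned into failure rates and into acceleration factors relative to normal use. The sketch below assumes a constant failure rate (rate = 1 / MTBF), which is the usual convention for MIL-HDBK-217 predictions; the function names are illustrative.

```python
# Prior MTBF point predictions (hours) from the $MART Analysis slide.
PRIOR_MTBF_HOURS = {
    "Normal Use": 85_470,
    "Max Spec": 34_040,
    "Design Limit": 1_180,
}


def failures_per_million_hours(mtbf_hours: float) -> float:
    """Constant failure rate implied by an MTBF, in failures per 10^6 hours."""
    return 1e6 / mtbf_hours


def acceleration_factor(mtbf_use: float, mtbf_stress: float) -> float:
    """How many times faster failures are expected at the stressed level."""
    return mtbf_use / mtbf_stress


if __name__ == "__main__":
    use = PRIOR_MTBF_HOURS["Normal Use"]
    for level, mtbf in PRIOR_MTBF_HOURS.items():
        print(
            f"{level:12s} rate = {failures_per_million_hours(mtbf):7.2f} per 10^6 h, "
            f"acceleration = {acceleration_factor(use, mtbf):5.2f}"
        )
```

On these priors, the predicted acceleration relative to normal use is about 2.5 at the max spec level and about 72 at the design limits.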


Future Work

- Apply our approach to the next higher level of assembly
- Considering a desktop PC
  - Re-use most of the test setup
  - Bypass the power supply
  - Isolate the hard disk from vibration
- Validate the $MART model with field reliability data
- Demonstrate repeatability of the approach on other items


Conclusion

- ALT is an attractive alternative to conventional RVT and analytical reliability predictions
- We have developed an improved system-level ALT approach based on $MART
- We are evaluating our approach on a COTS SBC
- We plan to apply the approach to the next higher level of assembly in the near future
- Many opportunities exist for additional research
  - Developing ALT statistical models
  - Verifying and validating the models


Bibliography

1. “A Bayesian Approach to Accelerated Life Testing,” van Dorp, et al., to appear in IEEE Transactions on Reliability, 1996

2. A Wide-Parametric Bayesian Methodology for System-Level, Step-Stress Accelerated Life Testing, Pollock, Florida Institute of Technology Doctoral Dissertation, 1989

3. Accelerated Reliability Testing Utilizing Design of Experiments, McKinney, Rome Laboratory Technical Report RL-TR-93-249, December 1993

4. “A Defect Model of Reliability,” Shirley, Tutorial Notes of the 33rd Annual International Reliability Physics Symposium, April 1995, pp. 3.1-3.17

Biographies

Jeffrey A. Clark

The MITRE Corporation
202 Burlington Road, Mail Stop H113
Bedford, MA 01730
Phone: 617-271-3486
Fax: 617-271-2734
E-mail: [email protected]

Jeff is a Senior Staff member in the Reliability and Maintainability Center at the MITRE Corporation. Previously, he held summer staff positions at Sandia National Laboratory, Lawrence Livermore National Laboratory, MIT Lincoln Laboratory, and Halmstad University in Sweden. Jeff earned a B.S. in Electrical and Computer Engineering (ECE) from Carnegie Mellon University in 1987, and an M.S. and Ph.D. in ECE from the University of Massachusetts at Amherst in 1989 and 1993, respectively. His research interests include fault-tolerant computing, system dependability modeling, simulation, and experimental evaluation, and parallel and distributed processing. He is a member of the IEEE Reliability and Computer Societies, and is the Publicity Chair for the IEEE Boston Section Reliability Chapter.

Ugo S. Garganese

The MITRE Corporation
202 Burlington Road, Mail Stop H113
Bedford, MA 01730
Phone: 617-271-7887
Fax: 617-271-2734
E-mail: [email protected]

Ugo is a Group Leader in the Reliability and Maintainability Center at the MITRE Corporation, where he supervises reliability engineering efforts on large-scale command, control, communication, and computer systems. He was responsible for initiating the accelerated life testing research program at MITRE. Prior to joining MITRE in 1981, Ugo worked as an analytical engineer for the Pratt and Whitney Aircraft and Sikorsky Aircraft Divisions of United Technologies. He was also a project engineer at Westinghouse Electric Corporation and a program manager at Kaman Aerospace Corporation. He received a B.S.M.E. from the University of Lowell in 1973 and an M.S.M.E. from Rensselaer Polytechnic Institute in 1976. He is a member of the SAE G-11 Subcommittee on FMECA.