[IEEE Annual Reliability and Maintainability Symposium 1995 - Washington, DC, USA (16-19 Jan. 1995)]


On Reliability Growth Testing

Edward Demko, Grumman Melbourne Systems Division, Melbourne

Key Words: Reliability Development Growth Testing (RDGT), Environmental Stress Screening (ESS), Environmental Qualification Testing (EQT), Pattern Failures, Test, Analyze, And Fix (TAAF).

SUMMARY & CONCLUSIONS

Reliability Development Growth Testing (RDGT) is the most common method used to improve equipment reliability. The author had an opportunity to analyze hardware that experienced Environmental Stress Screening (ESS), Environmental Qualification Testing (EQT), RDGT, and field usage. The failure mode and corrective action data were used to qualitatively assess the effectiveness of RDGT. The results of this analysis yield the following conclusions:

1. RDGT is not a very good precipitator of field related failure modes; therefore, RDGT alone does not appear to be a strong driver of reliability growth.

2. RDGT, EQT, and ESS tests precipitate a high percentage of failure modes that occur only in "chamber-type" environments and are not related to field use.

3. Of the three "chamber-type" tests (ESS, RDGT, and EQT) evaluated as precipitators of field related failure modes, ESS appears to be the most effective.

4. "Chamber-type" tests are more efficient in developing corrective actions than field operation.

1. INTRODUCTION

The primary purpose of ESS testing is to precipitate latent defects (weak parts) and prevent their occurrence in field use. These failure modes usually occur as single occurrence events. However, ESS also precipitates pattern failure modes and identifies design flaws.

The purpose of an RDGT is to discover pattern design deficiencies, correct them, and "grow" the hardware reliability. Reliability growth occurs by correcting pattern (repeating) failures and preventing them from recurring in the field environment. RDGT is a process of Test-Analyze-And-Fix (TAAF) under repeated application of a worst-case mission environment. In actual use, however, the worst-case environment may occur only occasionally (if ever). RDGT precipitates both pattern and single occurrence type failure modes.

The data analyzed in this paper are based on hardware that experienced RDGT and accumulated more than half a lifetime of field use without the benefit of the RDGT corrective actions. This occurred because the hardware delivery schedule required LRU delivery before completion of the RDGT, and there was no opportunity to implement corrective actions in the delivered hardware. This provided an opportunity to compare the failure modes and corrective actions observed in RDGT, ESS, and EQT with those observed in the field.

This paper attempts to provide some qualitative insight into the following questions:

1. How effective is RDGT in "growing" reliability by precipitating failure modes that may occur in the field?

2. How effective are "chamber-type" tests in precipitating failures that may occur in the field?

3. Which "chamber-type" test is most effective in precipitating failure modes that may occur in the field?

2. DESCRIPTION OF HARDWARE AND TESTS

The hardware population analyzed in this paper consists of 283 avionics LRUs (line replaceable units) of 13 different types. The 283 LRUs experienced approximately 2 million hours of operation in a field environment. The field environment consisted of LRU integration into an aircraft, system test, and deployed field operation (airborne inhabited cargo application). The complexity of these LRUs ranges from simple to quite complex. The ratio of digital to analog design is approximately 60% to 40%, respectively.

Five different companies conducted RDGT tests according to MIL-STD-1635. This provided a varied cross section of test personnel, facilities, company policies/procedures, etc. Two tests covered one LRU type each, two tests covered three types each, and one test covered five LRU types. Over 21,000 RDGT operating hours were logged from the combined tests. Four of the five RDGT tests exceeded 4,000 hours of operation. The calendar time for the tests averaged over 2 years. The shortest test lasted approximately one year; the longest lasted 3.3 years.

The ESS tests consisted of thermal cycling and vibration. The temperatures ranged from -54°C to 55°C. The thermal cycling duration consisted of 40 operating hours of burn-in (failures permitted in each cycle) and 40 hours of failure-free operation. The vibration consisted of 5 minutes of burn-in and 5 minutes of failure-free operation, with a random profile of 2 to 3 g RMS over a broad frequency spectrum. On average, each LRU experienced approximately 100 hours of ESS operation, for a total of approximately 28,000 hours of ESS test time.
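The test-time figures in this section can be cross-checked with simple arithmetic. The sketch below is only an illustration: it assumes the approximate values quoted above (283 LRUs, about 100 ESS hours per LRU, 40 hours of burn-in plus 40 hours of failure-free thermal cycling, and five RDGT tests covering 1, 1, 3, 3, and 5 LRU types), and the constant names are hypothetical. Because the text says "over 21,000" RDGT hours, the mean hours per test it derives is a lower bound.

```python
# Approximate figures quoted in Section 2 (constant names are hypothetical).
LRU_COUNT = 283                        # avionics LRUs in the population
ESS_HOURS_PER_LRU = 100                # average ESS operating hours per LRU
THERMAL_BURN_IN_H = 40                 # thermal cycling, failures permitted
THERMAL_FAILURE_FREE_H = 40            # thermal cycling, failure-free portion
RDGT_TYPES_PER_TEST = [1, 1, 3, 3, 5]  # LRU types covered by each RDGT
RDGT_TOTAL_HOURS = 21_000              # text says "over 21,000" hours


def summarize() -> dict:
    """Derive quantities that can be checked against the text."""
    return {
        # 2 tests x 1 type + 2 tests x 3 types + 1 test x 5 types
        "lru_types_in_rdgt": sum(RDGT_TYPES_PER_TEST),
        # thermal-cycling portion of the ESS profile
        "thermal_ess_hours": THERMAL_BURN_IN_H + THERMAL_FAILURE_FREE_H,
        # population-level ESS time implied by the per-LRU average
        "total_ess_hours": LRU_COUNT * ESS_HOURS_PER_LRU,
        # lower bound, because the combined total is "over" 21,000 h
        "min_mean_rdgt_hours_per_test": RDGT_TOTAL_HOURS / len(RDGT_TYPES_PER_TEST),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The derived values (13 LRU types, 80 thermal-cycling hours, about 28,300 ESS hours, and at least 4,200 RDGT hours per test) agree with the population description, with the 80 hour ESS program discussed in the concluding comments, and with the quoted ESS total of approximately 28,000 hours.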

EQT was conducted according to MIL-STD-810 and consisted of Temperature/Altitude, Temperature Shock, Humidity, Sand/Dust, Salt Fog, Explosive Atmosphere, Vibration, Operating Shock, and Bench Handling tests.

0149-144X/95/$4.00 © 1995 IEEE, 1995 PROCEEDINGS Annual RELIABILITY and MAINTAINABILITY Symposium


3. THE NATURE OF FAILURES

This paper focuses on categorizing failure modes and corrective actions. The following definitions are used in the categorization:

1. Pattern failure modes recur and have a common, repeatable cause. A failure with the potential to recur can sometimes be identified and corrected based on its first occurrence alone. For example, if a part flies off a printed circuit card during vibration testing, one does not need several parts flying off to determine that the part restraint must be corrected.

2. Single occurrence failure modes have unrelated causes. Because they share no common cause and exhibit no propensity to recur, single occurrence failures require no corrective action.

3. Latent defects involve part weaknesses that, upon application of sufficient stress, precipitate the failure mode. These can be either pattern or single occurrence failure modes.

4. Infant mortality failures are observed early in the "life" of equipment. These can be any combination of pattern, single occurrence, design weakness, or latent failure modes.

One might ask: what types of failures are expected to be discovered by RDGT? The author expected to find failure modes not previously observed during ESS or EQT. The author also expected to find reasonable correlation between the failure modes observed in RDGT and in field use. The RDGT failure modes were expected to relate to stress from thermal cycling, vibration, etc., simulating accelerated life. Surprisingly, the analysis of the data presented in this paper does not support these expectations.

4. DATA CATEGORIZATION

Figure 1 shows a categorization of the test failures and corrective actions that resulted from the factory and field tests experienced by the hardware.

The failure data are segregated into two major categories: Single Occurrence and Pattern Failures. The Pattern Failures are subdivided into two major groups: Factory Test Patterns, which include pattern failures observed only in factory-type testing, and Field Related Patterns, which include all pattern failures that had at least one occurrence in the field environment.
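The categorization rules above can be expressed as a small decision procedure. The sketch below is a hypothetical illustration, not the author's tooling: it assumes each failure mode is recorded as a chronological list of environments ("RDGT", "ESS", "EQT", "Other", "Field"), and it treats a mode as a pattern if it recurred or if analysis of its first occurrence showed a repeatable cause, as in the part-restraint example of Section 3. The names `FailureMode`, `repeatable_cause`, and `categorize` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Environments used in Figure 1; "Other" covers the remaining factory tests.
FACTORY_TESTS = {"RDGT", "ESS", "EQT", "Other"}
ENVIRONMENTS = FACTORY_TESTS | {"Field"}


@dataclass
class FailureMode:
    """One failure mode (hypothetical record layout)."""
    occurrences: List[str]          # environments, in chronological order
    repeatable_cause: bool = False  # analyst judgment after first occurrence

    @property
    def is_pattern(self) -> bool:
        # Assumption: a mode is a pattern if it recurred, or if analysis of
        # its first occurrence already showed a common, repeatable cause.
        return self.repeatable_cause or len(self.occurrences) > 1


def categorize(mode: FailureMode) -> Tuple[str, str, str]:
    """Return (major category, subgroup, environment of first discovery)."""
    if not mode.occurrences:
        raise ValueError("a failure mode needs at least one occurrence")
    unknown = set(mode.occurrences) - ENVIRONMENTS
    if unknown:
        raise ValueError(f"unknown environment(s): {sorted(unknown)}")
    first = mode.occurrences[0]
    if not mode.is_pattern:
        # Single occurrence failures split into Field and Factory Test.
        subgroup = "Field" if first == "Field" else "Factory Test"
        return ("Single Occurrence", subgroup, first)
    if "Field" in mode.occurrences:
        # At least one field occurrence makes the pattern field related.
        return ("Pattern", "Field Related", first)
    return ("Pattern", "Factory Test", first)


if __name__ == "__main__":
    print(categorize(FailureMode(["RDGT", "ESS", "Field"])))
    print(categorize(FailureMode(["ESS"])))
    print(categorize(FailureMode(["EQT"], repeatable_cause=True)))
```

The third call shows why the `repeatable_cause` flag is useful: a single EQT occurrence with an obviously repeatable cause is still filed as a Factory Test pattern rather than a single occurrence failure.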

[Figure 1. Failure Categories. Total failure categories: P+S = 743 (291 C/As), comprising Pattern Failures, P = 140 (119 C/As), and Single Occurrence Failures, S = 603 (172 C/As). Legend: P = pattern failures; S = single occurrence primary failures; C/A = corrective actions.]


The Factory Test Patterns are further subdivided into those first discovered in RDGT, ESS, EQT, or Other (factory tests). Within each of these test categories, the failure modes that repeated in other tests are also identified.

Field Related Patterns are similarly subdivided into those first discovered in RDGT, ESS, EQT, or Other (factory tests). However, since these failure modes were also observed in the field environment, an additional category, Field, was added. This category identifies pattern failure modes that were first observed in the field environment. As with the other test results, failure modes that repeated in other tests are also identified. For example (see the highlighted box in Figure 1), 6 field related pattern failures were first discovered in RDGT, shown as P=6. One of these pattern failures was subsequently observed in the field, and 5 were later observed in ESS. The number of corrective actions identified is also shown in each box. In this example, 5 corrective actions were identified for the 6 failure modes first observed in RDGT.

The Single Occurrence Failures are subdivided into those observed in the Field and those observed in Factory Test. The Factory Test failures are further subdivided by specific test. For example, 204 single occurrence failures were observed in ESS tests, and 86 corrective actions were developed for them.
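The top-level counts in Figure 1 and the two worked examples above can be reconciled with a few lines of arithmetic. This sketch only re-uses numbers stated in the text; the dictionary names and the `ca_ratio` helper are hypothetical. The ratio is expressed as corrective actions per failure mode rather than as a fraction of modes corrected, since one corrective action may address several failure modes (6 modes, 5 C/As in the highlighted example).

```python
# Counts read from Figure 1 and from the examples in Section 4.
# P = pattern failure modes, S = single occurrence failure modes,
# C/A = corrective actions.
FIGURE_1 = {
    "Pattern (P)": {"modes": 140, "cas": 119},
    "Single Occurrence (S)": {"modes": 603, "cas": 172},
}
EXAMPLES = {
    "Field related patterns first seen in RDGT": {"modes": 6, "cas": 5},
    "Single occurrence failures in ESS": {"modes": 204, "cas": 86},
}


def ca_ratio(modes: int, cas: int) -> float:
    """Corrective actions per failure mode (one C/A may cover several modes)."""
    return cas / modes


# Totals should reproduce the top box of Figure 1 (P+S = 743, 291 C/As).
total_modes = sum(v["modes"] for v in FIGURE_1.values())
total_cas = sum(v["cas"] for v in FIGURE_1.values())

if __name__ == "__main__":
    print(f"P+S = {total_modes} failure modes, {total_cas} C/As")
    for name, v in {**FIGURE_1, **EXAMPLES}.items():
        print(f"{name}: {ca_ratio(v['modes'], v['cas']):.2f} C/A per mode")
```

The totals reproduce the 743 failure modes and 291 corrective actions of Figure 1. The ratios (about 0.85 C/A per pattern failure mode versus about 0.29 per single occurrence failure mode) are in line with the definitions in Section 3, under which pattern failures are the primary targets for corrective action.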

5. ANALYSIS OF RESULTS

The data categorized in Figure 1 are analyzed for the following failure mode characteristics:

1. OBSERVED - A count of failure modes observed in a specific test environment, regardless of where the failure mode was first discovered. By definition, single occurrence failure modes are counted as both observed and first observed.

2. FIRST OBSERVED - A count of failure modes first observed in a specific test environment.

3. PATTERNS - A count of pattern failure modes observed in a specific test environment, regardless of where the failure mode was first observed.

4. CORRECTIVE ACTIONS - A count of corrective actions resulting from failure modes first observed in a specific test environment.

5. UNIQUE OBSERVED - A count of failure modes that are observed only in a specific test environment.
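Characteristics 1 through 5 are simple counts over the failure-mode records, so they can be sketched as one function per test environment. The record layout below (`Mode`, with a chronological list of environments, an `is_pattern` flag from failure analysis, and an optional corrective-action identifier `ca_id`) is a hypothetical assumption, as are all names in the code. Counting distinct `ca_id` values reflects that one corrective action can cover several failure modes. The sample records are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mode:
    """Hypothetical failure-mode record used for illustration."""
    occurrences: List[str]        # environments, in chronological order
    is_pattern: bool              # result of failure analysis
    ca_id: Optional[str] = None   # hypothetical corrective-action identifier


def characteristics(modes: List[Mode], env: str) -> Dict[str, int]:
    """Compute characteristics 1-5 of Section 5 for one test environment."""
    first = [m for m in modes if m.occurrences[0] == env]
    return {
        # 1. observed in env, wherever it was first discovered
        "observed": sum(env in m.occurrences for m in modes),
        # 2. first observed in env
        "first_observed": len(first),
        # 3. pattern modes observed in env
        "patterns": sum(m.is_pattern and env in m.occurrences for m in modes),
        # 4. distinct corrective actions for modes first observed in env
        "corrective_actions": len({m.ca_id for m in first if m.ca_id is not None}),
        # 5. observed only in env
        "unique_observed": sum(set(m.occurrences) == {env} for m in modes),
    }


if __name__ == "__main__":
    sample = [
        Mode(["ESS"], is_pattern=False),
        Mode(["ESS"], is_pattern=False, ca_id="CA-1"),
        Mode(["RDGT", "ESS", "Field"], is_pattern=True, ca_id="CA-2"),
        Mode(["RDGT", "Field"], is_pattern=True, ca_id="CA-2"),  # shared C/A
        Mode(["Field", "Field"], is_pattern=True),
    ]
    for env in ("ESS", "RDGT", "Field"):
        print(env, characteristics(sample, env))
```

The two single occurrence ESS records count toward both OBSERVED and FIRST OBSERVED, as the definition of characteristic 1 notes. Evaluating `characteristics` for each environment and placing the results side by side gives the kind of comparison plotted in Figures 2 through 4.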

Figure 2 compares the factory test environments on the first four characteristics defined above. Figure 2 shows that ESS testing yields more observed and first discovered failure modes, more patterns, and more corrective actions than any other factory test. The implication is that ESS is a more efficient test for precipitating failure modes and patterns and for generating corrective actions.

Figure 3 compares the "chamber-type" tests with the field environment on the first four characteristics defined above. It shows that "chamber-type" test environments yield more observed and first discovered failure modes, more patterns, and more corrective actions than field operation. The implication is that "chamber-type" tests precipitate a large quantity of failure modes that are observed only in factory test environments and are not related to the field environment.

Figure 4 compares field related pattern failure modes from factory tests with those from the field only, for characteristics 2, 4, and 5 defined above. Figure 4 shows that the field environment yields more first discovered pattern failure modes, more corrective actions, and more unique discovered failure modes than factory tests. The implication is that many pattern failure modes escape factory tests and are observed only in the field.

[Figure 2. Comparison of Factory Test Failure Modes (characteristics: Observed, First Discovered, Patterns, Corrective Actions)]

[Figure 3. Comparison of Chamber Test and Field Failure Modes (characteristics: Observed, First Discovered, Patterns, Corrective Actions)]

[Figure 4. Comparison of Field Related Pattern Failure Modes (characteristics: First Discovered, Corrective Actions, Unique Discovered)]


6. COMMENTS ON THE RESULTS

The conclusions presented in this paper indicate that while ESS and EQT tests can cull out infant mortality single occurrence failure modes, they can also precipitate most of the pattern failures that might otherwise be discovered in RDGT. The results of this study suggest that ESS has the potential to be more effective than RDGT in increasing field reliability.

As a point of interest, of the 13 types of LRUs presented in this study, 10 types met or exceeded their reliability goals. Two of the three LRU types that did not meet their goals were designed out of the final production configuration for non-reliability reasons. In addition to the LRUs analyzed in this paper, the overall system configuration included LRUs that did not experience RDGT but were subjected to ESS and EQT. Most of these LRUs exceeded their reliability requirements.

The writer also concludes that the 80 hour ESS test program described in this paper was effective in "growing" the equipment reliability. The strong ESS test program appears to be the reason why the LRUs that did not have the benefit of RDGT met their reliability requirements.

The cost of an RDGT does not appear to be justified, given the extensive test time and the troubleshooting, repair, and corrective action costs for failure modes that, for the most part, can be discovered with other tests. With development budgets cut to the bone, funding an RDGT that ties up assets for two or more years seems very unlikely. Instead of an RDGT, a more cost-effective alternative may be to implement an aggressive ESS of 80 hours or more with a failure-free element.

This paper also points out the need for an effective Failure Reporting, Analysis, and Corrective Action System (FRACAS) to provide the data needed to monitor test effectiveness. Given the poor corrective action effectiveness for field failures, the field FRACAS clearly needs improvement.

The raw data from these five RDGT case studies can be analyzed for other characteristics. No attempt was made in this paper to determine the reasons behind the results obtained, or what changes might improve the effectiveness of RDGT or ESS in promoting reliability growth. Such studies would not affect the qualitative conclusions of this paper, but they may be the subject of future papers.

ACKNOWLEDGMENTS

I wish to express my appreciation to my colleagues at Melbourne Systems, Gary Epstein, Billy DeBusk, and Charles Contess, for their assistance in preparing this paper.

BIOGRAPHY

Edward Demko
Reliability Manager
Grumman Melbourne Systems
Melbourne, Florida 32902-9650 USA

Edward Demko joined Grumman Melbourne Systems in February 1988 as Manager of Reliability. He has held reliability positions since 1958 with Singer Kearfott, Lockheed Electronics Company, and Picatinny Arsenal. Mr. Demko received BSME and MSEE degrees from Stevens Institute of Technology, as well as an MBA from Fairleigh Dickinson University. Mr. Demko has published six technical papers, including four for RAMS.
