
Experimental Design Techniques in Reliability-Growth Assessment

H. Claudio Benski • Merlin Gerin • Grenoble
Emmanuel Cabau • Merlin Gerin • Grenoble

Key Words: Experimental Design, Reliability Growth, Fractional Factorials

ABSTRACT

The goal of corrective maintenance actions in repairable systems is to lengthen as much as possible the expected time to the next system failure. In order to achieve this goal there are usually several system parameters that can be changed before each new run. Now, the effect of each parameter change on the system’s reliability can best be assessed by specific experimental designs. The response of these experiments will be the actual time until the system fails. Since for a given system only one such time will be available after each corrective maintenance action, the experimental design will be an unreplicated one. This is a typical situation in reliability growth programs, although growth as such will not occur during the experiments. It is therefore important to determine how powerful the available numerical techniques are in identifying s-significant effects when applied to these unreplicated factorial designs.

The results of an extensive Monte Carlo simulation are presented in which the power of three recently published techniques is analyzed. In view of these results, a measure of the performance-to-complexity ratio is also given for each technique.

1. INTRODUCTION

In reliability growth programs the time pattern of system failures is observed. The usual assumption behind these programs is that, after each system failure, corrective maintenance is applied such that the system’s reliability is better after these actions than it was before. This should translate into a (stochastic) increase in times between successive failures. However, in real life, there may be many system parameters that maintenance actions could modify to achieve a reliability improvement, and the precise effect of each such parameter on the system’s reliability is rarely known. Even less so are the possible interactions between these parameters.

Consequently, a natural approach to this problem is to use a factorial experimental design to determine the impact of the different system parameters on its reliability. In general, this experimental design will be a fraction of the full factorial. This is so because high-order interactions between the parameters are of little interest, and fractional factorial designs allow for a very significant reduction in the number of experiments [1].

In most cases these designs will also be unreplicated. The reason for this is that, for a single system, the response of these experiments is a unique measurement: for each configuration of the factors, only a single measurement of the time to the following failure is obtained. After each change in the system parameters, usually consecutive to a failure, a new value of the system’s lifetime is obtained. Note, however, that if the modified parameters do have an impact on the system’s reliability, this time to failure is no longer issued from the same statistical population as the previous times between failures [2]. The implication here is that there is no independent way to evaluate the residual noise of these measurements, and therefore it is not possible to perform either a classical or a non-parametric Analysis of Variance to assess the s-significance of each factor and eventual interactions. Due to the fractioning feature of these designs, variance pooling is also very restricted [1].

To overcome these limitations, several recent statistical methods, including a Bayesian technique, have been proposed in the literature to detect the presence of significant effects in unreplicated factorials [3 - 5]. It is recognized, however, that these techniques were developed for s-normally distributed responses, and this may or may not be the case for times between failures. In fact, for Homogeneous Poisson Processes (HPP), these times are exponentially distributed. Still, response data transformations can be applied to these times [6] so that, at least approximately, these procedures can be used. It was therefore considered important to determine how well these different techniques perform in terms of power. The actual details of a fractional factorial design applied in the context of reliability growth are described in Section 2. The power comparison results are described in Section 3.

2. EXPERIMENTAL DESIGNS IN RELIABILITY GROWTH

Failures of a complex system usually result in an intervention by a corrective maintenance team. During system development and after these failures, it is frequently the case that many system parameters (or factors) are considered to be possible candidates for change. As an example, consider the case of an elaborate programmable controller which fails to perform a function it was supposed to perform when a certain external event is present at one of its sensor inputs. This failure can be due to many different causes: a software failure, an incorrect threshold, a thermally induced early failure of a hardware component, poor electromagnetic shielding, etc.

The maintenance team may ponder what system parameters should be changed and in what order. Maybe the hardware component failed due to a bad circuit design or a poor thermal design. Or an electromagnetic glitch, induced by the external event, resulted in a wrong calculation of the triggering threshold. In the Design of Experiments (DOE) parlance, changing a system parameter implies assigning it a different level. Thus, the parameter levels that could be considered include different software versions, the position of a cooling fan, the type of grounding of a circuit board, the characteristics of a critical component, etc.

Experimenting with one design change at a time is known to be a costly and inefficient way of achieving the sought improvement [1]. Alternatively, using an appropriate experimental design will lead to a more efficient reliability improvement program since several parameters are modified in a single maintenance action. These designs consist of a series of different system configurations in which all the parameter levels are specified at each experiment. Of course, the improvement in system reliability, if any, will not be visible until the end of the whole series of experiments. The statistical analysis of these experiments will suggest the actual configuration of the significant system parameters which will result in the longest expected time to system failure.

Consider, for example, a system in which six design parameters, say A, B, C, D, E and F, have been targeted as candidates for having an effect on the system’s reliability. With only two levels for each parameter, a complete factorial design will result in 2^6 = 64 experiments, in which the time to system failure has to be measured for each of the 64 system configurations. This is economically penalizing and usually unnecessary. In fact, if only the effects of each individual parameter and, say, seven of the two-parameter interactions are to be estimated, it can be shown [1] that 16 experiments will suffice to obtain the desired result using a fractional factorial design (FFD). At the end of this experimental phase it may well turn out that not all specified factors and interactions have a s-significant influence on the time to failure. But then, the next phase will only need to concentrate on the s-significant ones. And here is where important savings occur: fewer system parameters are modified to obtain a concurrent increase in system reliability. Conversely, without the experimental design, useless and sometimes costly changes will rarely produce reliability growth. And little experience is gained in the process.

To illustrate this problem by a specific example, let us assume that out of a total of 15 two-factor interactions, the seven following ones are suspected as possibly having an effect on the MTBF: AB, AC, BF, CD, CF, DE and DF. The remaining eight interactions are felt to be physically impossible. It is not intuitively obvious which experiments have to be carried out to assess the effects of these six factors and seven interactions. Fortunately, there are now a number of programs that automatically generate the necessary experimental arrays as a function of the specified interactions. With parameter levels labeled - and +, such a design will take the following form:

[16-run fractional factorial array: columns Run, A, B, C, D, E, F, with each run specifying a - or + level for every factor; the individual settings are not legible in this transcription.]
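Whatever specific array is used, the requirement it must satisfy is easy to check: the 16 runs must allow the 6 main effects and the 7 requested interactions to be estimated simultaneously. The sketch below, an illustration rather than the authors’ array, builds one regular 2^(6-2) fraction (with assumed generators E = BCD and F = ABCD) and verifies that the corresponding 14-column model matrix has full rank; design-generation software automates exactly this kind of search when the requested interactions rule out the more common generator choices.

```python
import numpy as np
from itertools import product

# Full 2^4 factorial in the basic factors A, B, C, D (levels -1/+1).
runs = np.array(list(product([-1, 1], repeat=4)))   # 16 x 4
A, B, C, D = runs.T

# Assumed generators, for illustration only (not necessarily the paper's choice).
E = B * C * D
F = A * B * C * D

# Model matrix: overall mean, 6 main effects, and the 7 requested interactions.
X = np.column_stack([
    np.ones(16),
    A, B, C, D, E, F,
    A * B, A * C, B * F, C * D, C * F, D * E, D * F,
])

# All 14 terms are jointly estimable iff the model matrix has full column rank.
print(X.shape, np.linalg.matrix_rank(X))            # expect (16, 14) and rank 14
```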

To each run there will be only one corresponding measurement of the time to system failure. Now, the analysis of FFDs is based on formal or informal Analysis of Variance (ANOVA). This is not always possible, as has been mentioned in the Introduction: there may be no independent estimate of the residual (statistical) noise to which one could compare the estimate of each effect. Of course, a ranking of the effects is an alternative, and an arbitrary selection of the two or three largest effects is one possibility [7]. But this solution is a dangerous one: there may be more than three s-significant effects or perhaps less than two. In both cases, choosing the wrong number of system factors to change can lead to either a suboptimal reliability improvement or economic waste. The evaluation of the statistical risk incurred in making a wrong choice is not possible by this approach.

In order to illustrate the difficulties associated with the analysis of this type of design, we will assume that the above FFD has resulted in the following sixteen times to failure, given in the same order as the design:

Run      ti        ln(ti)
 1       0.4703    -0.754
 2      74.279     +4.308
 3       0.0322    -3.437
 4       1.0980    +0.094
 5       0.0005    -7.664
 6       0.1884    -1.669
 7       0.0176    -4.037
 8       8.1720    +2.101
 9       0.0850    -2.464
10       0.3989    -0.919
11       1.8602    +0.621
12       3.7119    +1.312
13       0.7626    -0.271
14     104.07      +4.645
15       0.0298    -3.512
16       0.0650    -2.733

The natural logarithms of the times to failure are also given since they are the transformed responses that will be used for the analysis, for reasons that will become clear later. The times ti are simulated times between system failures, randomly issued from exponential distributions having different λ parameters depending on the settings of factor B and the interaction AB. The parameter of the exponential distribution used for each run is calculated using the formula:

λ = λb × Π_j π_j^(±1)                                   (1)

where λb is the base value, arbitrarily set equal to 1, and the π factors are similar to the ones used by MIL-HDBK-217 for electronic components. This corresponds to a model in which a base λ is multiplied or divided by factors of improvement or degradation, depending on whether the π factors are smaller or greater than 1. For example, if a better cooling system was one of the factor levels, we would expect electronic components to have a reduced failure rate, which leads to a smaller intensity of failures for the system. We have used the following numerical values for the π factors: πB = 7 and πAB = 5, giving a thirty-fivefold improvement in MTBF with respect to its base value 1/λb.
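A minimal sketch of how such data could be generated is given below. It assumes, as the sign convention, that a + setting of B and of the AB contrast divides the base failure rate, which is what makes the + settings the favourable ones later in the text; the columns A and B are taken from the illustrative array built above.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_b, pi_B, pi_AB = 1.0, 7.0, 5.0          # base rate and pi factors from the text

def simulate_run_times(xB, xAB):
    """One exponential inter-failure time per run; xB and xAB are the -1/+1
    columns of factor B and of the AB contrast (e.g. from the array above).
    Assumed convention: a + setting divides the base failure rate."""
    lam = lam_b * pi_B ** (-xB) * pi_AB ** (-xAB)
    return rng.exponential(1.0 / lam)

t = simulate_run_times(B, A * B)            # A, B taken from the previous sketch
y = np.log(t)                               # log response used in the analysis
```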

Using the logarithms of the times between failures as the response has two effects: the response is now an additive linear model in terms of the active parameters, and it is approximately s-normally distributed under the assumption that there aren’t any s-significant effects. (The exact distribution is an extreme-value function.)
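The extreme-value claim is easy to verify numerically: if T is exponential with rate λ, then ln T follows a left-skewed Gumbel distribution with location -ln λ. A minimal check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam = 1.0
log_t = np.log(rng.exponential(1.0 / lam, size=100_000))

# P(ln T <= x) = 1 - exp(-lam * e**x), i.e. the gumbel_l CDF shifted by -ln(lam).
print(stats.kstest(log_t, stats.gumbel_l(loc=-np.log(lam)).cdf))
```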

The goal of the analysis is then to “discover” that, out of 15 potentially active factors or double interactions, only B and AB are s-significant, and to estimate their actual effects.

Using Yates’ algorithm [1] the calculated effects are as follows:

Factor   Effect
A        +0.62
B        +3.58
C        +0.97
D        -1.49
E        -0.01
F        -0.60
AB       +3.78
AC       +0.20
CD       +1.38
BF       -0.80
CF       -0.73
DE       -1.60
DF       -0.20
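For a balanced two-level design, each of these effects is simply the difference between the mean response at the + and - levels of the corresponding contrast column, which is what Yates’ algorithm computes efficiently. A generic sketch (the contrast matrix would come from the design actually run, e.g. the illustrative array above):

```python
import numpy as np

def two_level_effects(contrasts, y):
    """Effects for a balanced two-level design: mean response at the + level
    minus mean response at the - level of each -1/+1 contrast column."""
    contrasts = np.asarray(contrasts, dtype=float)   # n x m matrix of -1/+1 columns
    y = np.asarray(y, dtype=float)
    return 2.0 * contrasts.T @ y / len(y)

# e.g. effects for the columns A, B, ..., D*F of the illustrative array:
# eff = two_level_effects(X[:, 1:], y)
```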

Had there not been any noise in the simulation, only factors B and AB would have been different from zero. The theoretical expected values of their effects are 3.89 and 3.22 respectively. Applying now the techniques given in references [3 - 5] we can assess the statistical significance of the above estimated effects.

Reference [3], the Bayesian technique, gives as expected the highest posterior probability to effects B and AB: 85% and 88% respectively. All the others are smaller than 20%. The prior probability adopted for this test was a uniform 20% for all effects, a typical assumption in the literature. Real effects are assumed to come from a s-normal distribution with a scale factor ten times bigger than the noise.
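A sketch of a Box-Meyer-type calculation under these prior assumptions is given below (each effect active with prior probability alpha = 0.2; active effects s-normal with a scale k = 10 times the noise; a noninformative prior on the noise scale). This is a plausible rendering of the idea rather than a line-by-line reproduction of Ref. [3], so the numbers it prints need not match the 85% and 88% quoted above exactly.

```python
import numpy as np
from itertools import combinations

def active_effect_posteriors(effects, alpha=0.2, k=10.0):
    """Posterior probability that each contrast is 'active', enumerating all
    subsets: inactive contrasts ~ N(0, sigma^2), active ~ N(0, (k*sigma)^2),
    prior activity probability alpha, noninformative prior on sigma."""
    t = np.asarray(effects, dtype=float)
    m = len(t)
    post, total = np.zeros(m), 0.0
    for r in range(m + 1):
        for subset in combinations(range(m), r):
            active = np.zeros(m, dtype=bool)
            active[list(subset)] = True
            q = np.sum(t[~active] ** 2) + np.sum(t[active] ** 2) / k ** 2
            w = alpha ** r * (1 - alpha) ** (m - r) * k ** (-r) * q ** (-m / 2)
            post[active] += w
            total += w
    return post / total

effs = [0.62, 3.58, 0.97, -1.49, -0.01, -0.60, 3.78,
        0.20, 1.38, -0.80, -0.73, -1.60, -0.20]
print(np.round(active_effect_posteriors(effs), 2))
```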

Reference [4] produces two sets of limits that effects must exceed to be s-significant. Effects exceeding the outer limits are s-significant at a 95% confidence level. Effects smaller than the inner limits are not s-significant at this confidence level. In-between cases “may be s-significant”. This latter situation is the case for effects B and AB, since the outer limits are ±5.7 and the inner limits are ±2.8. All other effects are smaller than the inner limits and thus are not statistically significant.
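Reference [4] is Lenth’s pseudo-standard-error procedure; a standard rendering, using the common t-quantile approximation for its critical values, is sketched below. Applied to the effects listed above it gives inner and outer limits in the same range as the ±2.8 and ±5.7 quoted in the text.

```python
import numpy as np
from scipy import stats

def lenth_limits(effects, level=0.95):
    """Lenth's pseudo standard error (PSE) with the margin of error
    (inner limits) and simultaneous margin of error (outer limits)."""
    c = np.abs(np.asarray(effects, dtype=float))
    m = len(c)
    s0 = 1.5 * np.median(c)
    pse = 1.5 * np.median(c[c < 2.5 * s0])
    d = m / 3.0                                   # approximate degrees of freedom
    me = stats.t.ppf(0.5 + level / 2.0, d) * pse          # inner limits
    gamma = (1.0 + level ** (1.0 / m)) / 2.0
    sme = stats.t.ppf(gamma, d) * pse                     # outer limits
    return me, sme

print(lenth_limits([0.62, 3.58, 0.97, -1.49, -0.01, -0.60, 3.78,
                    0.20, 1.38, -0.80, -0.73, -1.60, -0.20]))
```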


Reference [5] is a combination of two techniques: a normality test and an outlier detection test. In this technique, effects that are s-significant must produce a rejection of the hypothesis of normality of all effects and be detected as outliers. Only B and AB qualify as such, with the normality test level set at 89% and the limits for the outlier test calculated as ±1.82. No other effects are identified as significant, even when the normality test level is lowered to 60% confidence.
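The exact procedure of Ref. [5] is not reproduced here; as a rough stand-in conveying the same idea, the sketch below rejects normality of the set of effects with an off-the-shelf test (Shapiro-Wilk, at the 89% level used above) and then flags as active the effects lying outside box-plot-style fences. The published method uses its own normality statistic and outlier limits (±1.82 above), so this is only illustrative.

```python
import numpy as np
from scipy import stats

def normality_outlier_screen(effects, level=0.89, fence=1.5):
    """Stand-in for a 'normality test + outlier detection' screen: if the
    effects as a whole fail a normality test, flag those outside
    box-plot-style fences as candidate active effects."""
    e = np.asarray(effects, dtype=float)
    _, p = stats.shapiro(e)
    if p >= 1.0 - level:                 # effects are compatible with pure noise
        return np.zeros(len(e), dtype=bool)
    q1, q3 = np.percentile(e, [25, 75])
    iqr = q3 - q1
    return (e < q1 - fence * iqr) | (e > q3 + fence * iqr)

effs = [0.62, 3.58, 0.97, -1.49, -0.01, -0.60, 3.78,
        0.20, 1.38, -0.80, -0.73, -1.60, -0.20]
print(normality_outlier_screen(effs))
```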

Since in this particular experimental design two degrees of freedom are available to estimate the error variance, a classical ANOVA is also possible for comparison purposes. When this was done, again effects B and AB were identified as s-significant at a 97% confidence level. According to this ANOVA, three other effects, DE, D and CD, had 86%, 84% and 82% confidence levels. We should point out that an ANOVA would not have been possible had we been interested in 9 double interactions instead of 7 (except through an after-the-fact variance pooling).
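For completeness, the ANOVA referred to here is straightforward once the left-over contrasts are available: in a balanced 16-run two-level design each contrast carries a sum of squares n·(effect)²/4 on one degree of freedom, and the unused contrasts pool into the error term. A generic sketch (the paper does not list the two error-contrast estimates, so none are filled in here):

```python
import numpy as np
from scipy import stats

def contrast_anova(model_effects, error_effects, n=16):
    """F-tests for a balanced two-level design where the contrasts left out
    of the model provide the error estimate. Each contrast carries
    SS = n * effect**2 / 4 on one degree of freedom."""
    model_effects = np.asarray(model_effects, dtype=float)
    error_effects = np.asarray(error_effects, dtype=float)
    df_err = len(error_effects)
    ms_err = np.sum(n * error_effects ** 2 / 4.0) / df_err
    f = (n * model_effects ** 2 / 4.0) / ms_err
    return f, stats.f.sf(f, 1, df_err)        # F statistics and p-values
```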

The conclusion of all these analyses is that B and AB have been (correctly) identified as being the only s-significant effects, although with a somewhat limited confidence. Setting both A and B to their + level should therefore maximize the expected time to system failure. The effect of A will only manifest itself via its interaction with B. The four other single factors are irrelevant and can be disregarded, as well as all the other two-factor interactions.

To compute the actual improvement that would be possible on the MTBF by setting the s-significant factors to their + value, we must go back to the original time domain. If we denote the estimated effects of B and AB by ÊB and ÊAB, we would have the formula:

MTBF' = MTBF0 × exp(ÊB/2) × exp(ÊAB/2)

where MTBF' is the improved MTBF and MTBF0 is the baseline MTBF. The factor ½ is due to the fact that in Eq. (1) the π factors can multiply or divide the base λb. Numerically, the effect of B in lengthening the MTBF is exp(3.58/2) ≃ 6 and the effect of the interaction AB is exp(3.78/2) ≃ 6.6. These numbers are to be compared to the theoretical factors of 7 and 5. The agreement is only fair due to the scarcity of the data: we are trying to estimate 13 possible π factors using only 16 observations! Overall, we can expect the MTBF of the improved design to be about 6 × 6.6 ≃ 40 times better than the original. This compares very well to the theoretical factor of 35 used for the simulation.
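A quick numerical check of this back-transformation, using the effect estimates from the table above:

```python
import numpy as np

b_hat, ab_hat = 3.58, 3.78                      # estimated effects of B and AB
gain_b = np.exp(b_hat / 2)                      # ~6.0, vs. a theoretical factor of 7
gain_ab = np.exp(ab_hat / 2)                    # ~6.6, vs. a theoretical factor of 5
print(gain_b, gain_ab, gain_b * gain_ab)        # overall improvement ~40, vs. 35
```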

A legitimate question arises now as to why the observed times between failures of this experiment are so wildly dispersed, spanning over five orders of magnitude. The answer lies in the exponential nature of these times. We have assumed an HPP, and this is a very unrealistic model in this case, in spite of the large amount of literature using it. Real-world systems behave in a more orderly way and times between failures are not so dispersed. For a DOE application, this is good news, because the effect of system parameters should be even more easily detectable than this example might otherwise suggest.

3. MONTE-CARLO POWER ANALYSIS

Because an experimenter will often find it impossible to perform the variance-pooled ANOVA, Refs. [3 - 5] will be the only way to assess the statistical significance of the calculated effects. In this section we present a summary of a power study comparing these techniques.

A Monte Carlo study [8] was carried out in which 15 and 31 s-normal variates were generated. The noise effects were issued from a s-normal distribution, N(0, σ). This noise distribution was contaminated by simulated "real" effects, which were issued from a N(0, kσ) distribution with k = 5, 10 and 15. The number of contaminants varied from 0 to 3. This corresponds to many typical fractional factorial designs. The number of Monte Carlo samples was always set equal to 10 000, and σ was set equal to 1 without loss of generality. A vectorized machine was used to ensure reasonable response times during these tests.

For each technique, we counted the number of times that the correct number of real effects was detected, as well as the number of times each technique detected some but not all of these. Also recorded were the number of times that spurious (noise) effects were detected, either alone or together with some of the real effects. This gave a performance measure of each technique. The following table, extracted from this study, gives the percent probabilities that each technique will detect the number of real effects (0, 1, 2 or 3) out of a total of 15 and 31 possible ones, and nothing else. This table corresponds to a simulation in which the prior assumptions of the Bayesian technique were satisfied.

Case    Ref. [3]   Ref. [4]   Ref. [5]
0/15     97.53      97.98      92.36
0/31     96.47      97.02      92.65
1/15     68.04      58.76      67.44
1/31     68.98      64.24      66.50
2/15     44.78      36.18      46.35
2/31     48.66      43.28      47.43
3/15     29.25      22.31      33.48
3/31     35.09      29.65      35.52


The analysis of this table shows that when the prior hypothesis of the Bayesian technique, Ref. [3], corresponds exactly to the conditions of the simulated sample, this is the most powerful technique, albeit by a small margin. When this is not the case, a much simpler and faster technique [5] is shown in Ref. [8] to perform better than the others. Ref. [4] was the least powerful of all three.
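A minimal sketch of this kind of power evaluation is given below. It scores one detection rule (here, the Lenth outer limits from the earlier sketch) by the same criterion as the table: flagging all of the simulated real effects and nothing else. The actual study [8] applied all three techniques and the additional criteria discussed next.

```python
import numpy as np

rng = np.random.default_rng(3)

def exact_detection_rate(detect, m=15, n_active=2, k=10.0, trials=10_000):
    """Fraction of trials in which the rule flags all simulated 'real'
    effects (drawn from N(0, k)) and none of the noise effects (N(0, 1))."""
    hits = 0
    for _ in range(trials):
        effects = rng.normal(0.0, 1.0, size=m)
        effects[:n_active] = rng.normal(0.0, k, size=n_active)
        flagged = detect(effects)
        if flagged[:n_active].all() and not flagged[n_active:].any():
            hits += 1
    return hits / trials

def lenth_outer_rule(effects):
    _, sme = lenth_limits(effects)            # from the earlier Lenth sketch
    return np.abs(effects) > sme

print(exact_detection_rate(lenth_outer_rule))
```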

Many other measures of performance are of course possible. A technique which detects all real effects but also detects lots of non-real ones is not very useful. Conversely, it is interesting to assess the ability of a procedure to detect a large fraction of the real effects without contamination by spurious ones. For example, Ref. [8] gives the probabilities of detecting two out of three real effects without adding any other spurious factors. All other combinations are also analyzed.

Another useful characteristic available to compare these techniques is to evaluate their complexity, as measured by the CPU time needed to complete an analysis. This is shown in the following table. The technique described in Ref. [4] executed in the shortest time and is used as a reference here by giving it a CPU time of 1.

[Table: CPU time per analysis for each technique, in units of Ref. [4] = 1; only the value 1400 survives legibly in the scanned table.]

The technique given in Ref. [5], almost as powerful as that in Ref. [3], seems to attain a good compromise of desirable statistical characteristics and a straightforward implementation.

REFERENCES

[1] G. E. P. Box, W. G. Hunter and J. S. Hunter, "Statistics for Experimenters", John Wiley & Sons, New York, NY, 1978.
[2] H. Ascher and H. Feingold, "Repairable Systems Reliability", Lecture Notes in Statistics, vol. 7, Marcel Dekker, Inc., 1984.
[3] G. E. P. Box and R. D. Meyer, "An Analysis for Unreplicated Fractional Factorials", Technometrics, vol. 28, 1986, pp. 11-18.
[4] R. V. Lenth, "Quick and Easy Analysis of Unreplicated Factorials", Technometrics, vol. 31, 1989, pp. 469-473.
[5] H. C. Benski, "Use of a Normality Test to Identify Significant Effects in Factorial Designs", Journal of Quality Technology, vol. 21, 1989, pp. 174-178.


[6] G. E. P. Box and N. R. Draper, "Empirical Model-Building and Response Surfaces", John Wiley & Sons, New York, NY, 1987.

[7] R. N. Kackar and A. C. Shoemaker, "Robust Design: A Cost-Effective Method for Improving Manufacturing Processes", AT&T Technical Journal, vol. 65, 1986, pp. 39-50.
[8] E. Cabau and H. C. Benski, to be published.

BIOGRAPHIES

H. Claudio Benski, PhD
Merlin Gerin
38050 Grenoble Cedex, France

H. Claudio Benski received his PhD in Physics from Brandeis University in 1972. He also holds a ”Licenciado” degree from the University of Buenos Aires. He is the president of the French delegation to the Technical Committee 56 of the International Electrotechnical Commission (Reliability and Maintainability) and an associate editor for the IEEE Transactions on Reliability. Presently, he is the reliability manager at the Technical Division of Merlin Gerin. He also teaches Reliability and Statistics at the Institute of Applied Mathematics and the Engineering School of the University of Grenoble.

Emmanuel Cabau
Merlin Gerin
38050 Grenoble Cedex, France

Mr. Cabau is an engineer from the Polytechnic Institute of Grenoble. He graduated from this institute in 1989 and has worked since then in the reliability group of the Technical Division of Merlin Gerin. He also has a Masters degree in Statistics from the University of Grenoble.
