[ieee annual symposium reliability and maintainability, 2004 - rams - los angeles, ca, usa (jan....

Modeling Reliability Growth in the Product Design Process

Milena Krasich, Bose Corporation, Massachusetts, USA
John Quigley, Lesley Walls, University of Strathclyde, Glasgow, Scotland

Key Words: product design, reliability growth, goal reliability, integrated reliability engineering

SUMMARY AND CONCLUSION

Relying on reliability growth testing to improve system design is not always cost effective and certainly not efficient. Instead, it is important to design in reliability. This requires models to estimate reliability growth in the design and to assess whether goal reliability will be achieved within the target timescale. While many models have been developed for analysis of reliability growth in test, less attention has been given to reliability growth in design. This paper proposes and compares two models: one motivated by the practical engineering process (the Modified Power Law) and the other by extending the reasoning of statistical reliability growth modeling (the Modified IBM). The commonalities and differences between these models are explored through an assessment of their logic and an application.

We conclude that the choice of model depends on the growth process being modeled. Key drivers are the type of system design and the project management of the growth process. When the design activities are well understood and project workloads can be managed evenly, leading to predictable and equally spaced modifications, each having a similar effect on the reliability of the item, the Modified Power Law is the more appropriate model. The Modified IBM, on the other hand, is more appropriate for more uncertain situations, where the reliability improvement of a design is driven by the removal of faults that are as yet unknown and will be identified only through further investigation of the design. These situations have less predictable workloads, and fewer modifications are likely later in the project.

1. INTRODUCTION

Improving reliability through a growth program should be a key part of the overall reliability activity in the product development process. This is especially true for a design that uses novel or unproven techniques, the latest technologies, newly developed component parts, or a substantial content of software. In such cases the analytical growth program may expose many types of design-related weaknesses over a period of time. It is essential to reduce the probability of product failure due to these design weaknesses as far as possible, to prevent their later appearance in formal tests or in the field. This is because, at that late stage, design correction is often highly inconvenient, costly, time-consuming, and at times not practical. It is generally accepted that life cycle costs will be minimized if the necessary design changes are made at the earliest possible stage during program development. The importance of integrating the reliability growth program into the product development process is driven by limited time to market, program costs, and the drive for product cost reduction.

Although a reliability growth test program is effective for disclosure of potential field problems, it is typically expensive. It requires extensive test time and resources, and the corrective actions are considerably more costly, and at times impossible to implement, compared with faults found and corrected in early design, before the design is frozen. Additionally, reliability growth tests may last two or three years, which could seriously affect the marketing or deployment schedule of the product. Therefore, while reliability growth testing still has an important role, it should be secondary to growth management during design, and should be considerably shortened so that it discloses only faults not identified during reliability analysis.

During the last few years leading industry organizations have developed and applied state-of-the-art integrated reliability growth analysis methods and processes for increasing product reliability during the design phase, thus requiring less reliance on formal reliability growth testing. Many models already exist for reliability testing programs, so it is timely to consider how these might be adapted for growth in design. In this paper we present two models that are applied to reliability growth in the product design phase. The first is a modification of the Power Law Model for reliability growth developed by Krasich (1999). The second is an adaptation of the growth model proposed by Quigley and Walls (1999, 2003), which is, in turn, related to other growth models such as the IBM model (Rosner 1961) and that of Meinhold and Singpurwalla (1983).

The basic principles of reliability growth of a product are the same during design and test. This is because both involve identifying and mitigating weaknesses to improve the product, and both measure that improvement by comparing the estimated reliability with the goal reliability. The mathematical models for reliability growth are constructed to estimate the growth achieved in the projected reliability of the product for a predetermined time (such as desired life, warranty period, or a mission); see, for example, Jewell (1984). Reliability growth models aim to support the planning of a reliability improvement program by estimating the number and the magnitude of the product changes during the design and development process, or the test time required to reach a specified reliability goal.

The reliability growth models can be formulated in terms of the failure rate (or intensity) or probability of survival to a specified time (the reliability) as shown in Figure 1 assuming that the usage profile of the product during the time of interest is known.

The paper begins by describing the model formulations and then considers their application to a common example that is based on a real problem. Based on both this experience and an assessment of the underpinning logic, we compare the use of these two models for supporting reliability growth programs in design.

1.1 Symbols

$\alpha_D$: reliability growth rate resultant from fault mitigation

$d(t)$: number of design modifications

$D$: number of modifications that will be made during a specified design period $t_D$

$\eta_k$: expected number of design weaknesses in fault class $k$ likely to result in failure if the design is not modified

$\lambda_a(t)$: hazard rate of the product at time $t$

$\lambda_{a0}$: initial average hazard rate of product

$\lambda_{aG}(t_D)$: goal average failure rate at the end of design period $t_D$

$R_0(T)$: initial reliability for the predetermined product operational life at time $T$

$R_G(T)$: product reliability goal at time $T$

$R_a(t,T)$: actual reliability at time $T$ as a function of $t$

$R(t,T)$: product reliability at time $T$ as a function of time $t$ (the reliability growth model in the time period from 0 to $t_D$)

$t_D$: specified design period

$t_G$: expected time to reach the goal reliability

2. GROWTH MODEL FORMULATIONS

2.1 Modified Power Law Model

The modified power law model (Krasich, 1999) aims to support planning decisions in the product design phase. It assumes design reliability improvements are implemented successfully to mitigate a failure mode or to reduce its probability of occurrence, and requires information about the time from the beginning of design to the occurrence of each improvement. This model can estimate the number, or the magnitude, of improvements to the original design needed to increase its reliability from the initially assessed value to its goal value.

The assumption of a power law is justified by the fact that the early improvements will be those that contribute most to the reliability improvement. That is, the failure modes with the highest probability of occurrence will be addressed first, followed by improvements that remove progressively smaller contributions to unreliability. The actual reliability values achieved in the course of the design are then plotted against the design time at which they were realized, and compared to the model. This model is used to plan the strategies necessary for reliability improvement of a design during the available time period, from the initial design revision until the design is completed and released for production.

The model can be formulated as follows. The initial product reliability for the predetermined product operational life at time T is denoted by R0(T). Assuming that the time to failure is exponentially distributed, the initial average hazard rate of that product is:

$$\lambda_{a0} = -\frac{\ln R_0(T)}{T} \tag{1}$$

Under the assumption that growth in reliability follows a power function, the product hazard rate decreases as modifications are made. We can represent the hazard rate after a period of design t as a function of the accumulated number of modifications made by t, d(t). Therefore, we can express the hazard rate of the product at time t during the design period as:

$$\lambda_a(t) = \lambda_{a0}\,[1 + d(t)]^{-\alpha_D} \tag{2}$$

Assuming the number of design modifications is a function of time distributed linearly over the design period, then:

$$d(t) = D \times \frac{t}{t_D} \tag{3}$$

Figure 1. Planned improvement of the equivalent failure rate or reliability (plotted against the number of design improvements, 1 to 5, from the initial value to the goal)

where $D$ is the number of modifications that will be made during a specified design period $t_D$ and $\alpha_D$ is the reliability growth rate resultant from fault mitigation.

The average failure rate as a function of design time becomes:

$$\lambda_a(t) = \lambda_{a0}\left(1 + D \times \frac{t}{t_D}\right)^{-\alpha_D} \tag{4}$$
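As a minimal sketch of equations (2) to (4), the following Python function evaluates the average hazard rate after a design time t, assuming linearly spaced modifications. The function and argument names are illustrative and do not come from the paper; the numerical inputs are those listed in the Figure 2 spreadsheet.

```python
def mpl_hazard_rate(t, t_d, lam0, d, alpha):
    """Average hazard rate after design time t under the modified power law.

    t and t_d must share one unit (hours here); lam0 is the initial average
    hazard rate, d the planned number of modifications, alpha the growth rate.
    """
    if not 0 <= t <= t_d:
        raise ValueError("t must lie within the design period [0, t_d]")
    modifications_so_far = d * t / t_d                        # equation (3)
    return lam0 * (1.0 + modifications_so_far) ** (-alpha)    # equations (2), (4)


# Inputs as listed in the Figure 2 spreadsheet (hours, failures per hour).
lam_96h = mpl_hazard_rate(t=96, t_d=7200, lam0=2.547e-6, d=10, alpha=1.002)
print(f"{lam_96h:.3e}")
```

With these inputs the result agrees, to the precision shown in the spreadsheet, with the value listed there for t = 96 hours.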

Denoting the product reliability goal by RG(T), then the goal average failure rate at the end of design period tD, is:

$$\lambda_{aG}(t_D) = -\frac{\ln[R_G(T)]}{T} \tag{5}$$

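Introducing the initial reliability estimate to express the initial average hazard rate, the goal average failure rate at the end of the design period can be written as:

$$\lambda_{aG}(t_D) = \lambda_{a0}\left(1 + D \times \frac{t_D}{t_D}\right)^{-\alpha_D} = \lambda_{a0}\,(1 + D)^{-\alpha_D} = -\frac{\ln[R_0(T)]}{T}\,(1 + D)^{-\alpha_D} \tag{6}$$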

Substituting $\lambda_{aG}(t_D)$ into the expression for goal reliability and solving for $D$ gives:

$$D = \exp\left\{\frac{1}{\alpha_D}\,\ln\left[\frac{\ln R_0(T)}{\ln R_G(T)}\right]\right\} - 1 \tag{7}$$

Solving the same equation for the growth rate, expressed as a function of the number of design modifications and the initial and goal reliabilities, gives:

$$\alpha_D = \frac{\ln\left[\dfrac{\ln R_0(T)}{\ln R_G(T)}\right]}{\ln(1 + D)} \tag{8}$$
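Equations (7) and (8) are inverses of one another, which a short Python sketch can confirm. The function names are hypothetical; the inputs (initial reliability 0.72, goal 0.95, ten modifications) are the values quoted in the illustrative example of Section 3.1.

```python
import math


def mpl_growth_rate(r0, rg, d):
    """Growth rate alpha_D from equation (8)."""
    if not 0.0 < r0 < rg < 1.0:
        raise ValueError("expected 0 < r0 < rg < 1")
    return math.log(math.log(r0) / math.log(rg)) / math.log(1.0 + d)


def mpl_modifications(r0, rg, alpha):
    """Number of modifications D from equation (7)."""
    if not 0.0 < r0 < rg < 1.0:
        raise ValueError("expected 0 < r0 < rg < 1")
    return math.exp(math.log(math.log(r0) / math.log(rg)) / alpha) - 1.0


# Values quoted in the illustrative example of Section 3.1.
alpha = mpl_growth_rate(r0=0.72, rg=0.95, d=10)
d_back = mpl_modifications(r0=0.72, rg=0.95, alpha=alpha)
print(round(alpha, 3), round(d_back, 6))
```

The computed growth rate rounds to the 0.774 reported in Section 3.1, and feeding it back into equation (7) recovers the ten modifications.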

During the design period, the reliability that the product has to have at time T improves continuously as a function of design time t. This reliability growth model, for the time period from 0 to tD, can be written as:

$$R(t,T) = \exp\left(-\lambda_a(t)\,T\right) \tag{9}$$

Substituting the expression for the average failure rate, the modified power law model for the design phase 0 < t < tD, is derived as follows:

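$$R(t,T) = \exp\left[-\lambda_{a0}\left(1 + D\,\frac{t}{t_D}\right)^{-\alpha_D} T\right] = \exp\left[\frac{\ln R_0(T)}{T}\left(1 + D\,\frac{t}{t_D}\right)^{-\alpha_D} T\right] \tag{10}$$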

Expressing D in the above equation in terms of the initial and goal reliability, the reliability growth as a function of time in the design period available for design improvements becomes:

$$R(t,T) = R_0(T)^{\left[1 + \left(\left\{\frac{\ln R_0(T)}{\ln R_G(T)}\right\}^{1/\alpha_D} - 1\right)\frac{t}{t_D}\right]^{-\alpha_D}} \tag{11}$$
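The planned growth curve can be sketched in Python by combining equations (7) and (10). This is an illustration under the paper's assumptions (exponential time to failure, linearly spaced modifications); the function name is hypothetical and the inputs are the header values of the Figure 2 spreadsheet.

```python
import math


def mpl_reliability(t, t_d, r0, rg, alpha):
    """R(t, T) under the modified power law, from equations (7) and (10)."""
    d = (math.log(r0) / math.log(rg)) ** (1.0 / alpha) - 1.0   # equation (7)
    exponent = (1.0 + d * t / t_d) ** (-alpha)
    return r0 ** exponent          # same as exp(ln(r0) * exponent)


t_d = 300.0  # design period in days (Figure 2)
curve = [(day, mpl_reliability(day, t_d, r0=0.80, rg=0.98, alpha=1.002))
         for day in (0, 100, 200, 300)]
for day, r in curve:
    print(day, round(r, 3))
```

By construction the curve starts at the initial reliability and reaches the goal exactly at the end of the design period, which is a quick sanity check on any implementation.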

When done routinely in development of different products, it is convenient to graph the desired reliability improvement (the growth model), and then compare the actual results to the model in the course of product improvement.

For that purpose, a spreadsheet template has been developed and is shown in Figure 2 with embedded equations for the reliability growth model, where changing the parameters allows the model to be re-drawn. Adding the actual product reliability allows easy comparison of actual results to the planned reliability growth, as shown in Figure 3.

2.2 Modified IBM Model

This modification of the IBM-Rosner reliability growth was initially motivated by the analysis of test data (Quigley and Walls, 1999, 2003). This version is adapted for supporting planning decisions during the product design phase.

The model is based on a Bayesian approach that combines a prior distribution for the number of design weaknesses in the new product design with empirical data for the reliability of similar product designs to produce a posterior distribution for estimating the reliability of the new product design.

Like the IBM-Rosner model (Rosner 1961), this model assumes that a fixed number of weaknesses or potential faults are inherent in the product design and that, within the period between design modifications, the rate at which failures occur is constant. It is further assumed that modifications to the design to remove weaknesses are perfect.

In this model we additionally decompose the inherent failure rate of the product design into systematic and non-systematic (or noise) failures. This allows us to modify the reliability growth profile as the systematic failure rate changes when design modifications are implemented, while always taking into account the impact of noise failures on the estimated reliability at a given time.

The non-systematic failures are assumed to occur at a constant rate ($\lambda_{NS}$) and can be estimated using data from similar product designs or using engineering judgment.

The systematic failures are assessed through a combination of expert judgment about the design weaknesses and the failure rates associated with fault classes from engineering experience.

To assess the effect of systematic failures on the reliability of the product design, all potential design weaknesses (D) should be identified and may be allocated to one of K fault classes as appropriate. The probability of each design weakness within each fault class resulting in failure during the specified life of the product should be estimated using, for example, engineering judgment. Procedures for identifying design weaknesses and estimating their probability of resulting in failure using engineering judgment may be required. See, for example, Walls and Quigley (2001). The expected number of design weaknesses in fault class k ($\eta_k$) likely to result in failure if the design is not modified can be calculated using:

$$\eta_k = \sum_{j=1}^{D_k} p_{kj} \tag{12}$$

where $D_k$ is the total number of design weaknesses expected in fault class k and $p_{kj}$ is the probability of the jth design weakness in fault class k being realized. This calculation is based on the assumption that the number of design weaknesses for each fault class is a Poisson random variable. Systematic failure rates are also required for each fault class. These may be estimated using empirical or generic data on relevant existing product designs.

Figure 3. Reliability growth model and tracking of actual reliability
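Equation (12) amounts to summing the elicited probabilities within each fault class. The Python sketch below illustrates this; the function name and the probabilities are invented for illustration only and are not data from the paper. They were chosen so that the class totals equal 2 and 1, the expected numbers used later in Section 3.2.

```python
def expected_weaknesses(probabilities_by_class):
    """Equation (12): eta_k is the sum of the elicited probabilities p_kj."""
    eta = {}
    for fault_class, probs in probabilities_by_class.items():
        if any(not 0.0 <= p <= 1.0 for p in probs):
            raise ValueError(f"probabilities for {fault_class!r} must be in [0, 1]")
        eta[fault_class] = sum(probs)
    return eta


# Hypothetical elicited probabilities, invented for illustration only.
elicited = {
    "component": [0.5, 0.25, 0.75, 0.5],
    "board layout": [0.25, 0.5, 0.25],
}
eta = expected_weaknesses(elicited)
print(eta)
```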

Once the input data have been specified, the (posterior) estimator of the reliability of the initial product design can be found. This is the product of the reliability with respect to the non-systematic failures and the reliability with respect to the systematic failures. The latter combines the (prior) distribution for the number of design weaknesses with the (empirical) data for the systematic failures. Thus the reliability of the initial product design can be written as:

$$R_0(T) = \exp\left\{-\left[\lambda_{NS}\,T + \sum_{k=1}^{K} \eta_k\left(1 - e^{-\lambda_k T}\right)\right]\right\} \tag{13}$$

Given that modifications will be implemented to remove design weaknesses, the reliability of the product design will grow. Therefore, to take into account the rate of reliability growth ($\alpha_D$), the reliability of the modified product design at time T is given by:

$$R(t,T) = \exp\left\{-\left[\lambda_{NS}\,T + e^{-\alpha_D t}\sum_{k=1}^{K} \eta_k\left(1 - e^{-\lambda_k T}\right)\right]\right\} \tag{14}$$
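A minimal Python sketch of equations (13) and (14) follows. It assumes that the growth term enters as an exponential decay in design time t applied to the systematic part only, with the noise term unaffected; the function and argument names are illustrative. The inputs are those quoted in the example of Section 3.2.

```python
import math


def mibm_reliability(t, T, lam_ns, eta, lam, alpha):
    """R(t, T) from equation (14); t = 0 gives the initial reliability (13).

    eta and lam are equal-length sequences of eta_k and lambda_k per fault class.
    """
    systematic = sum(e * (1.0 - math.exp(-l * T)) for e, l in zip(eta, lam))
    return math.exp(-(lam_ns * T + math.exp(-alpha * t) * systematic))


# Inputs quoted in the illustrative example of Section 3.2 (hours, per hour).
T = 365 * 24 * 15
params = dict(T=T, lam_ns=1e-8, eta=[2, 1], lam=[1e-6, 6.67e-7], alpha=5.6e-4)
r_initial = mibm_reliability(0.0, **params)
r_end = mibm_reliability(140 * 24, **params)
print(round(r_initial, 2), round(r_end, 2))
```

With these inputs the sketch gives an initial reliability close to 0.72 and a reliability close to 0.95 at the end of the 140 day design period, in line with the figures quoted in Section 3.2.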

To estimate the rate of growth, set $R(t,T)$ equal to the goal reliability $R_G(T)$ and the design time $t$ equal to the specified design period $t_D$ in the previous equation. Re-arranging gives:

$$\alpha_D = -\frac{1}{t_D}\,\ln\left[\frac{-\ln R_G(T) - \lambda_{NS}\,T}{\sum_{k=1}^{K} \eta_k\left(1 - e^{-\lambda_k T}\right)}\right] \tag{15}$$

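Input block (headers in spreadsheet row 1, formulas in row 2, values in row 3):

| Column | Header | Formula (row 2) | Value (row 3) |
|---|---|---|---|
| A | tD (hours) | | 7200 |
| B | Design (days) | | 300 |
| C | T (hours) | | 87600 |
| D | R0(T) | | 0.800 |
| E | RCT(T) | | 0.990 |
| F | RG(T) | | 0.98 |
| G | D | | 10 |
| H | λ0 | =(-LN($D$3)/$C$3) | 2.547E-06 |
| I | λG | =-LN($F$3)/$C$3 | 2.306E-07 |
| J | α | =(-LN($I$3/$H$3)/LN(1+$G$3)) | 1.002E+00 |

Tracking block (formulas in spreadsheet row 6, headers in row 7, data in rows 8 to 27). The embedded formulas are R(t,T): =EXP(-H8*$C$3); RG(T): =$F$3; λ(t): =$H$3*(1+$G$3*(A8/$A$3))^(-$J$3).

| Row | t (hours) | t (days) | R(t,T) | RG(T) | Ra(t,T) | λ(t) |
|---|---|---|---|---|---|---|
| 8 | 0 | 0 | 0.800 | 0.980 | | 2.547E-06 |
| 9 | 96 | 4 | 0.821 | 0.980 | 0.768 | 2.247E-06 |
| 10 | 192 | 8 | 0.839 | 0.980 | | 2.010E-06 |
| 11 | 288 | 12 | 0.853 | 0.980 | | 1.818E-06 |
| 12 | 384 | 16 | 0.865 | 0.980 | 0.832 | 1.660E-06 |
| 13 | 480 | 20 | 0.875 | 0.980 | | 1.527E-06 |
| 14 | 576 | 24 | 0.884 | 0.980 | | 1.414E-06 |
| 15 | 672 | 28 | 0.891 | 0.980 | 0.867 | 1.316E-06 |
| 16 | 768 | 32 | 0.898 | 0.980 | | 1.231E-06 |
| 17 | 864 | 36 | 0.904 | 0.980 | | 1.156E-06 |
| 18 | 960 | 40 | 0.909 | 0.980 | | 1.090E-06 |
| 19 | 1056 | 44 | 0.914 | 0.980 | 0.905 | 1.031E-06 |
| 20 | 1152 | 48 | 0.918 | 0.980 | | 9.781E-07 |
| 21 | 1248 | 52 | 0.922 | 0.980 | | 9.303E-07 |
| 22 | 1344 | 56 | 0.925 | 0.980 | | 8.870E-07 |
| 23 | 1440 | 60 | 0.928 | 0.980 | | 8.475E-07 |
| 24 | 1536 | 64 | 0.931 | 0.980 | 0.916 | 8.114E-07 |
| 25 | 1632 | 68 | 0.934 | 0.980 | | 7.782E-07 |
| 26 | 1728 | 72 | 0.937 | 0.980 | | 7.476E-07 |
| 27 | 1824 | 76 | 0.939 | 0.980 | | 7.194E-07 |

Figure 2. Spreadsheet set-up for Krasich reliability growth model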
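The Figure 2 spreadsheet can be mirrored in a few lines of Python, which may be convenient when the template is not to hand. This is a sketch only: the variable names are illustrative, and it recomputes the model columns from the header inputs rather than reading the spreadsheet itself. The actual reliability column Ra(t,T) is observed data and is not reproduced.

```python
import math

# Header-row inputs from the Figure 2 spreadsheet.
T_D_HOURS = 7200.0      # design period (300 days)
T_LIFE = 87600.0        # product life T in hours
R0, RG, D = 0.800, 0.98, 10

lam0 = -math.log(R0) / T_LIFE                          # initial hazard rate
lam_goal = -math.log(RG) / T_LIFE                      # goal hazard rate
alpha = -math.log(lam_goal / lam0) / math.log(1 + D)   # growth rate

rows = []
for t in range(0, 1825, 96):                 # hours, in steps of 4 days
    lam_t = lam0 * (1 + D * t / T_D_HOURS) ** (-alpha)
    rows.append((t, t // 24, math.exp(-lam_t * T_LIFE), lam_t))

for t, days, r, lam_t in rows[:3]:
    print(f"{t:5d} {days:3d} {r:.3f} {lam_t:.3E}")
```

The first rows agree with the spreadsheet values to the three significant figures shown there.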

[Plot for Figure 3: planned reliability R(t,T), actual reliability Ra(t,T), and reliability goal RG(T), starting from R0(T), against design period duration t (days), 0 to 300.]

If a growth rate has been specified or estimated, then similarly an estimate of the expected time to reach the goal reliability is given by:

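$$t_G = -\frac{1}{\alpha_D}\,\ln\left[\frac{-\ln R_G(T) - \lambda_{NS}\,T}{\sum_{k=1}^{K} \eta_k\left(1 - e^{-\lambda_k T}\right)}\right] \tag{16}$$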
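The growth rate and the expected time to reach the goal are two rearrangements of the same relationship, which the following Python sketch makes explicit. It assumes the exponential-decay form of the growth term in design time; all function names are hypothetical, and the inputs are those quoted in the example of Section 3.2.

```python
import math


def _goal_ratio(rg, T, lam_ns, eta, lam):
    """Fraction of the initial systematic term that may remain at the goal."""
    systematic = sum(e * (1.0 - math.exp(-l * T)) for e, l in zip(eta, lam))
    allowed = -math.log(rg) - lam_ns * T
    if allowed <= 0:
        raise ValueError("goal is unreachable: noise failures alone exceed it")
    return allowed / systematic


def mibm_growth_rate(rg, t_d, T, lam_ns, eta, lam):
    """Growth rate needed to reach the goal rg by the end of design period t_d."""
    return -math.log(_goal_ratio(rg, T, lam_ns, eta, lam)) / t_d


def mibm_time_to_goal(rg, alpha, T, lam_ns, eta, lam):
    """Expected design time needed to reach the goal rg at growth rate alpha."""
    return -math.log(_goal_ratio(rg, T, lam_ns, eta, lam)) / alpha


inputs = dict(T=365 * 24 * 15, lam_ns=1e-8, eta=[2, 1], lam=[1e-6, 6.67e-7])
alpha = mibm_growth_rate(rg=0.95, t_d=140 * 24, **inputs)
t_goal = mibm_time_to_goal(rg=0.95, alpha=alpha, **inputs)
print(f"{alpha:.2e}", round(t_goal))
```

With a 3360 hour design period the growth rate comes out close to the 5.6 × 10^-4 quoted in Section 3.2, and substituting it back returns the design period as the time to goal.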

3. ILLUSTRATIVE EXAMPLE

An industry-based example is used to illustrate the application of the models. The modified power law model was used directly in the design phase, while the modified IBM model has been applied retrospectively. For the product concerned the required life is 15 years with a goal reliability of 95%. The product development program specifies the design period to be 140 days.

3.1 Modified Power Law Model

Since the required life is 15 years, T = 365 × 24 × 15 = $1.314 \times 10^{5}$ hours, and the 140 day duration of the design period implies tD = 140 × 24 = 3360 hours. A fault tree analysis (or another reliability assessment or prediction method) of the initial design estimates the initial 15-year reliability as R0(T = 15 years) = 0.72. Given the initial and goal reliabilities, the initial and goal average failure (hazard) rates are found to be $2.5 \times 10^{-6}$ and $3.9 \times 10^{-7}$ failures per hour respectively. From engineering judgment ten types of design weakness were assumed (so D = 10 in the model), giving an estimated growth rate of 0.774. The reliability growth function under the modified power law (MPL) is shown in Figure 4.

3.2 Modified IBM Model

Again the goal reliability is 0.95 at $1.314 \times 10^{5}$ hours and ten potential design weaknesses are assumed. They were all classified as being related to component failures. The duration of the design phase is 3360 hours. We further assume that we can elicit the probability of each potential design weakness resulting in failure and construct the prior distribution of the number of potential design weaknesses. In this case two classes of failure can be identified, relating to electronic components and board layout. We assume the prior distributions are Poisson, with the expected number of design weaknesses being 2 and 1 respectively. We further assume that the non-systematic failure rate is $1 \times 10^{-8}$ failures/hour, with component related failures occurring at a rate of $1 \times 10^{-6}$ failures/hour and board layout related failures at a rate of $6.67 \times 10^{-7}$ failures/hour. Thus the reliability of the initial product design is estimated to be 0.72 and the reliability growth rate in design is estimated to be $5.6 \times 10^{-4}$ per hour of design time. The reliability growth curve under the modified IBM model (MIBM) is also shown in Figure 4.

3.3 Achieved growth and relationship to models

The observed growth during this product design is also plotted in Figure 4. The improvements were in fact made in three steps, with many similar types of modification included in the same change. The first modification involved changing capacitor types to those with much better dielectric properties and higher reliability (106 capacitors of various values and various contributions to unreliability were considered one design modification). The second change was the introduction of parts with higher ratings, as there were some parts (capacitors on semiconductors) that, because of improper electrical rating, demonstrated high unreliability. The third change was also a compilation of several modifications: more reliable IC components, some switching field effect transistors (FETs) obtained from a more reliable vendor, and a reduction in some discrete semiconductor components. Following each design modification, product reliability was re-estimated using fault tree analysis. Further improvement was not considered cost effective, and the final reliability estimate was accepted as satisfactory.

Both models present growth profiles that show pathways to achieving the goal reliability of 0.95 by the end of the design period. Despite showing more early growth than was observed, in part because faults were accumulated and addressed together in one modification, the MPL model fits the observed profile well in the latter stages of design and did help support the decision that goal reliability was achievable. The profile of the MIBM model is less steep than that of the modified power law and so presents a more conservative estimate of reliability growth. Statistically this model matches the observed profile better than the modified power law. However, this is partly due to the retrospective nature of the analysis, although it may also be argued that delayed modifications lead to less immediate growth and so the MIBM model may present a more realistic profile.
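The two planned profiles can be tabulated side by side from the inputs quoted in Sections 3.1 and 3.2. The Python sketch below is illustrative only: it assumes the MIBM growth term decays exponentially in design time, it uses helper names that are not from the paper, and it does not reproduce the observed values, which are only available graphically in Figure 4.

```python
import math

T = 365 * 24 * 15          # required life in hours
T_D = 140 * 24             # design period in hours
R0, RG, D = 0.72, 0.95, 10

# Modified power law (MPL): growth rate from equation (8).
alpha_mpl = math.log(math.log(R0) / math.log(RG)) / math.log(1 + D)

def mpl(t):
    """Planned reliability at design time t (hours) under the MPL."""
    return R0 ** ((1 + D * t / T_D) ** (-alpha_mpl))

# Modified IBM (MIBM): inputs quoted in Section 3.2.
lam_ns, eta, lam = 1e-8, [2, 1], [1e-6, 6.67e-7]
systematic = sum(e * (1 - math.exp(-l * T)) for e, l in zip(eta, lam))
alpha_mibm = -math.log((-math.log(RG) - lam_ns * T) / systematic) / T_D

def mibm(t):
    """Planned reliability at design time t (hours) under the MIBM."""
    return math.exp(-(lam_ns * T + math.exp(-alpha_mibm * t) * systematic))

profile = [(day, mpl(day * 24), mibm(day * 24)) for day in range(0, 141, 35)]
for day, r_mpl, r_mibm in profile:
    print(f"day {day:3d}  MPL {r_mpl:.3f}  MIBM {r_mibm:.3f}")
```

Under these inputs both curves end at the goal of 0.95, and the MPL curve lies above the MIBM curve at the intermediate days, consistent with the description of the MIBM profile as the more conservative of the two.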

4. COMPARISON AND FUTURE DEVELOPMENT OF MODELS

More general comparison of the two models can be conducted in terms of broader criteria such as the assumptions, data requirements, model implementation, and support provided for planning decisions.

Figure 4. Achieved and planned growth under two models (horizontal axis: design period in days, 0 to 150; series: IBM, Power, Observed, and Power (Observed))

4.1 Assumptions and model formulations

The MPL assumes that there is a fixed number of design modifications, D, that will be conducted over a known design time period, tD. Moreover, this number is known a priori. The MIBM differs in so far as the number of design modifications is assumed unknown a priori, but is modeled as a Poisson random variable with a known mean.

Both modeling approaches utilize subjective assessments to support inference regarding the number of design modifications that will be made. The MPL requires a point estimate of the number of modifications, while the MIBM demands a prior distribution describing the uncertainty associated with the number of faults that exist within the design. Formal processes have been developed and evaluated to support the acquisition of this distribution for the MIBM; see, for example, Hodge et al. (2001). Each model makes different assumptions concerning the rate at which these modifications are implemented. The MPL assumes that modifications are evenly spaced throughout the design phase, while the MIBM assumes that modifications are more likely early in design and become fewer as time progresses.

Both models agree that the hazard rate decreases more substantially in early design. By assuming a Power Law and equally spaced modifications the MPL implies that earlier modifications have a greater impact on the hazard rate than later. The MIBM focuses more on the underlying fault, whereby it is assumed that each fault within a class contributes identically to the overall hazard rate of the design. By assuming more modifications in early design, the MIBM captures the decreasing rate of failures.

Both models assume that when design is terminated, the product has a constant hazard rate. However, in the MPL this hazard rate is assumed known, while in the MIBM it is composed of an unknown number of underlying faults for which we are in possession of a prior distribution. As such, this latter model will not characterize the time to failure with an exponential distribution but models the time to failure through an average of exponential distributions weighted against the prior distribution.
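The weighted average of exponential distributions can be made explicit with a short worked derivation. It is a sketch under two assumptions that the text implies but does not state in this form: the number of weaknesses $N_k$ in fault class $k$ is Poisson with mean $\eta_k$, and each weakness independently causes a failure after an exponentially distributed time with rate $\lambda_k$.

$$\Pr(\text{no class-}k\text{ failure by } T) = \sum_{n=0}^{\infty} \frac{\eta_k^{\,n} e^{-\eta_k}}{n!}\left(e^{-\lambda_k T}\right)^{n} = \exp\left\{-\eta_k\left(1 - e^{-\lambda_k T}\right)\right\}$$

Multiplying these terms over the $K$ fault classes, together with the noise term $e^{-\lambda_{NS} T}$, gives a survival function that is a Poisson-weighted mixture of exponentials in $T$ rather than a single exponential, which is why the MIBM does not characterize the time to failure with an exponential distribution.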

Neither model explicitly considers imperfect fault mitigation, or the introduction of new faults throughout design. However re-assessing the input parameter D for the MPL and re-eliciting the prior distribution for MIBM can address this.

4.2 Data requirements and model outputs

Both models require a clear specification of the goal reliability and an initial assessment of the reliability. Using these as inputs they explicate the relationship between the number of design modifications and the effectiveness of the design phase. The MIBM requires as input a mean number of faults within the initial design, and therefore the effectiveness of design time in mitigating faults is an output, selected to achieve the goal reliability. With the MPL, however, the number of modifications drives the reliability enhancement and is determined assuming an effectiveness parameter, which would be characteristic of the product and the anticipated types of modifications.

Neither model is complex. Each can be supported using a spreadsheet and can provide a useful guide to growth profiling in practice. Both models have formulations that are intuitive to project management teams. The MIBM requires more inputs and is grounded in stochastic process theory, and so requires a slightly deeper understanding to implement. However, it can formally account for uncertainty in reliability growth through a prior distribution, which can be updated mathematically through Bayesian methods or through re-elicitation during design. The MPL has a simpler formulation that is very intuitive and integrates with standard reliability techniques used in the design phase. However, it is only able to support point estimates of reliability growth.

REFERENCES

1. Hodge R, Evans M, Marshall J, Quigley J and Walls LA (2001) Eliciting Engineering Knowledge about Reliability during Design - Lessons Learnt from Implementation, Quality and Reliability Engineering International, 17, pp 169-179.

2. Jewell W (1984) General Framework for Learning Curve Reliability Growth Models, Operations Research, 32, pp 547-558.

3. Krasich M (1999) Analysis Approach to Reliability Improvement, Proceedings of the Annual Technical Meeting of the Institute of Environmental Sciences and Technology, Ontario, CA, pp 180-188.

4. Meinhold R and Singpurwalla ND (1983) Bayesian Analysis of a Commonly Used Model for Describing Software Failures, The Statistician, 32, pp 168-173.

5. Quigley J and Walls LA (1999) Measuring the Effectiveness of Reliability Growth Testing, Quality and Reliability Engineering International, 15, pp 87-93.

6. Quigley J and Walls LA (2003) Confidence Intervals for Reliability Growth Models with Small Sample Sizes, to appear in IEEE Transactions on Reliability.

7. Rosner N (1961) System Analysis - Non-Linear Estimation Techniques, Proceedings of the Seventh National Symposium on Reliability and Quality Control, Institute of Radio Engineers.

8. Walls LA and Quigley J (2001) Building Prior Distributions to Support Bayesian Reliability Growth Modelling Using Expert Judgement, Reliability Engineering and System Safety, 74, pp 117-128.

BIOGRAPHIES

Milena Krasich
Bose Corporation
Massachusetts 01701-7330 USA

Milena Krasich is the Senior Technical Lead of Reliability Engineering in Design Assurance Engineering at the Bose Corporation. She holds a BS and MS in Electrical Engineering and is a Registered Professional Electrical Engineer. She is a member of IEEE and ASQ Reliability Society.

John Quigley
Department of Management Science
University of Strathclyde
Glasgow G1 1QE, SCOTLAND

John Quigley is a Senior Lecturer specializing in probability and statistical modeling. He has a BMath and a PhD in reliability growth modeling and is a Fellow of the RSS and ORS.

Lesley Walls
Department of Management Science
University of Strathclyde
Glasgow G1 1QE, SCOTLAND

Lesley Walls is a Professor with a BSc and PhD in reliability data analysis. She is a Chartered Statistician and a member of IEC TC56.
