distribution of residual system-life after partial failures

IEEE TRANSACTIONS ON RELIABILITY, VOL. 37, NO. 5, 1988 DECEMBER

Distribution of Residual System-Life After Partial Failures


Janusz Karpinski, Systems Research Institute, Warsaw

I have not found any non-trivial results on this subject in the literature.

Key Words - Residual system life time, Residual life set

Reader Aids -
Purpose: Widen state of the art
Special math needed for explanations: Probability
Special math needed to use results: Same
Results useful to: Reliability theoreticians

Summary & Conclusions - This paper presents a general method to determine the distribution of residual system life (RSL) of a coherent system after some partial failures. The RSL begins once all the system components of at least one member of the well-defined set of residual life sets have failed. The approach is based on the knowledge of a special distribution of component lives and system life. It is general with respect to statistical dependence of system component states, unless cold standby is included. It can be applied to any coherent system and can be useful for solving problems of safety forecasts for nuclear power plants, chemical systems, and other potentially dangerous (environmentally) systems. A computer program has been written that permits practical application of the method. The process of determining the distribution of the RSL is generally useful for theoreticians. Analytic and numerical examples of the method are included.

1. INTRODUCTION

In coherent systems the failure of additional components always increases the possibility of system failure. Especially if safety depends on system functioning (nuclear power plants, chemical systems, etc.), the problem of forecasting system reliability after the failure of certain components becomes very important. For instance, for the 2-out-of-3:G system, which tolerates one fault, the following question can be asked: What is the distribution of residual system life (RSL) after the failure of any one component? In a special case, if only one component can be checked for operability, then one can ask for the distribution of RSL after failure of this component, taking into consideration that unchecked components can fail earlier.

The problem mentioned above was solved for all possible systems of 2 and 3 components in [3], where cases of cold standby were also discussed. The presentation in [3] is very explicit. However, the results were found in an ad hoc manner that is not useful for systems composed of many components.

This paper presents a general method of determining the distribution of RSL. It is easy to prepare an algorithm and computer program to determine the distribution of RSL for any coherent system. Analytic and numerical examples are presented to illustrate the method.

Notation

$S$    system
$c_i$    component i of S
$n$    number of $c_i$ in S
$T_i$    life time of $c_i$, a r.v.
$C_i$    minimal cut set i
$N$    number of minimal cuts
$T$    system life time, a r.v.
$F_i$, $f_i$    Cdf, pdf of $T_i$
$G(t_1, \ldots, t_n, t)$    joint Cdf of $T_1, \ldots, T_n, T$

Other, standard notation is given in “Information for Readers & Authors” at the rear of each issue.

Nomenclature

RSL    residual system life, a r.v.
$l_j$    residual life set: the set j of certain $c_i$, such that when the last one fails, then RSL begins
$L$    $\{l_1, \ldots, l_M\}$: the set of all residual life sets (of the S under consideration)
$L_i$    set; union of certain $l_j$
$K_i$    number of $l_j$ that form $L_i$
$I$    number of $L_i$
$F_r$, $\bar F_r$    Cdf, Sf of RSL, $F_r + \bar F_r = 1$
$\bar F_{r,i}$    auxiliary function used for determining $\bar F_r$
$Q_i(t_1, \ldots, t_n, t)$    function, obtained from $G(t_1, \ldots, t_n, t)$ with respect to $L_i$
$A(x, y, \tau, S, L)$, $A_i(x, y, \tau, S, L_i)$    random events
$A$    short for $A(x, y, \tau, S, L)$
$A'(x, S, L)$, $A''(y, \tau, S)$    random events
$i: c_i \in H$    set of indexes of all $c_i$ contained in $H$
$C_k \setminus l_i$    $C_k$ without those $c_j$ that belong to $l_i$

Explanation

The set L of all residual life sets defines the conditions for starting RSL, i.e., RSL begins when the last component of at least one residual life set fails. For instance, if for the 2-out-of-3 system we define three one-component residual life sets, then RSL begins when any one component fails. In the case of only one one-component residual life set, RSL begins at the moment of failure of the component of this set. However, unchecked components can fail earlier, so that in this case RSL could be zero with some non-zero probability.
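The start rule above can be illustrated on a single realization of component lifetimes. The following is a minimal sketch, assuming lifetimes are supplied as a plain dict; the function names (`system_life`, `rsl_start`, `residual_system_life`) and the sample lifetimes are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations


def set_failure_time(lifetimes, members):
    """Moment at which the last component of `members` fails."""
    return max(lifetimes[c] for c in members)


def system_life(lifetimes, min_cuts):
    """A coherent system fails as soon as all components of some minimal cut have failed."""
    return min(set_failure_time(lifetimes, cut) for cut in min_cuts)


def rsl_start(lifetimes, residual_life_sets):
    """RSL begins when the last component of at least one residual life set fails."""
    return min(set_failure_time(lifetimes, l) for l in residual_life_sets)


def residual_system_life(lifetimes, min_cuts, residual_life_sets):
    """RSL is zero when the system has already failed at the start moment."""
    return max(system_life(lifetimes, min_cuts) - rsl_start(lifetimes, residual_life_sets), 0.0)


# 2-out-of-3:G system: every pair of components is a minimal cut.
components = ["c1", "c2", "c3"]
min_cuts = [frozenset(pair) for pair in combinations(components, 2)]
lifetimes = {"c1": 1.0, "c2": 4.0, "c3": 2.5}  # hypothetical sample

all_singletons = [frozenset([c]) for c in components]  # RSL starts at the first failure
only_c2 = [frozenset(["c2"])]                          # only c2 is checked

rsl_any = residual_system_life(lifetimes, min_cuts, all_singletons)
rsl_c2 = residual_system_life(lifetimes, min_cuts, only_c2)
```

In this sample the system fails at time 2.5 (second component failure). With three one-component residual life sets the RSL is 1.5; with the single set {c2} the system is already failed when c2 fails, so the RSL is zero.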

0018-9529/88/1200-0539$01.00 © 1988 IEEE


Assumptions

1. The system and its components each have two states: functioning and failed.

2. There exist only basic (no standby) components. (Cold standby does not exist).

3. The system is s-coherent with known reliability structure. (All minimal cuts are known).

4. The joint Cdf of component life times is known. (If system component failures are s-independent, then the Cdf’s of the life times of the individual components are known).

2. GENERAL SOLUTION

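For determining the Cdf of RSL it is useful to consider the event

$$A(x, y, \tau, S, L) = A'(x, S, L) \cap A''(y, \tau, S), \tag{1}$$

where

$$A'(x, S, L) = \bigcup_{j=1}^{M} \ \bigcap_{i:\, c_i \in l_j} \{T_i < x\}$$

is the random event that, up to the moment $x$, all components of at least one residual life set $l_j$ have failed; and

$$A''(y, \tau, S) = \{T > y + \tau\}$$

is the event that, up to the moment $y + \tau$, the system does not fail. $\Pr\{A(x, y, \tau, S, L)\}$ is a mixture of a Cdf of a certain subsystem and a survivor function (Sf) of the system. From (1) and from the definition of RSL it follows that the Sf of RSL is:

$$\bar F_r(\tau) = \Pr\{RSL > \tau\} = \int_0^\infty \left[\frac{\partial}{\partial x} \Pr\{A(x, y, \tau, S, L)\}\right]_{x=y=t} dt. \tag{2}$$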

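Now we must determine $\Pr\{A\}$. On the basis of the Poincaré formula we get:

$$\Pr\{A\} = \sum_{j=1}^{M} \Pr\Big\{\bigcap_{i:\, c_i \in l_j} (T_i < x) \cap (T \ge y + \tau)\Big\} - \sum_{j<k} \Pr\Big\{\bigcap_{i:\, c_i \in l_j \cup l_k} (T_i < x) \cap (T \ge y + \tau)\Big\} + \ldots + (-1)^{M+1} \Pr\Big\{\bigcap_{i:\, c_i \in \mathcal{L}} (T_i < x) \cap (T \ge y + \tau)\Big\}, \qquad \mathcal{L} = \bigcup_{i=1}^{M} l_i. \tag{3}$$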

In order to give names to the several joint probabilities of (3), define:

$$L_i = \{l_{i_1} \cup l_{i_2} \cup \ldots \cup l_{i_{K_i}}\}, \qquad i_1 \ne i_2 \ne \ldots \ne i_{K_i}, \qquad K_i \le M. \tag{4}$$

The number of sets of the type $L_i$ is: $I = 2^M - 1$.

$$A_i(x, y, \tau, S, L_i) = \bigcap_{j:\, c_j \in L_i} (T_j < x) \cap (T \ge y + \tau). \tag{5}$$

Then, by (3), for properly defined $L_i$:

$$\Pr\{A(x, y, \tau, S, L)\} = \sum_{i=1}^{I} (-1)^{K_i + 1} \Pr\{A_i(x, y, \tau, S, L_i)\}. \tag{6}$$
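The bookkeeping behind (4) and (6) can be sketched in a few lines: every non-empty selection of residual life sets gives one union $L_i$ with its count $K_i$ and sign $(-1)^{K_i+1}$. The sketch below also drops unions that contain a minimal cut, as permitted by Remark 1. The function names and the 2-out-of-3:G data are illustrative assumptions, not the author's program.

```python
from itertools import combinations


def enumerate_unions(residual_life_sets):
    """Return (L_i, K_i, sign) for every non-empty selection of residual life sets."""
    terms = []
    for k in range(1, len(residual_life_sets) + 1):
        for chosen in combinations(residual_life_sets, k):
            union = frozenset().union(*chosen)
            terms.append((union, k, (-1) ** (k + 1)))
    return terms


def is_cut(component_set, min_cuts):
    """A component set is a cut if it contains at least one minimal cut."""
    return any(cut <= component_set for cut in min_cuts)


def non_cut_terms(residual_life_sets, min_cuts):
    """Terms of (6) that survive after omitting sets L_i that are system cuts."""
    return [t for t in enumerate_unions(residual_life_sets) if not is_cut(t[0], min_cuts)]


# 2-out-of-3:G system with three one-component residual life sets.
components = ["c1", "c2", "c3"]
life_sets = [frozenset([c]) for c in components]
min_cuts = [frozenset(pair) for pair in combinations(components, 2)]

terms = enumerate_unions(life_sets)       # 2**3 - 1 = 7 terms
kept = non_cut_terms(life_sets, min_cuts)  # only the three singletons remain
```

Different selections may produce the same union; they are kept as separate terms here because each carries its own $K_i$ and sign in (6).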

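Using the abbreviation

$$\bar F_{r,i}(\tau) \equiv \int_0^\infty \left[\frac{\partial}{\partial x} \Pr\{A_i(x, y, \tau, S, L_i)\}\right]_{x=y=t} dt, \tag{7}$$

eq. (2) and (6) yield

$$\bar F_r(\tau) = \sum_{i=1}^{I} (-1)^{K_i + 1} \bar F_{r,i}(\tau). \tag{8}$$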

Now it is necessary to model the fact that $T_r$ begins when all the $c_i$ of any $l_j$ have failed, and it ends when all the $c_i$ of any mincut $C_k$ have failed. The $\bar F_{r,i}(\tau)$ can be determined by using the joint Cdf of the random vector $(T_1, \ldots, T_n, T)$. In [2, 3] it is shown that the joint Cdf of $(T_1, \ldots, T_n, T)$ is:

The function $G(t_1, \ldots, t_n, t)$ is discontinuous at $t_i = t$. Hence, differentiation at $t_i = t$ is not defined. However, for determining the probability of the event $A_i$ it is sufficient to consider the moments $t_j < t$ for all $j$: $c_j \in L_i$, because we are interested in failures (of components belonging to $L_i$) before $y$, being the beginning of RSL. Moreover, for $j$: $c_j \notin L_i$ it is sufficient to consider the moments $t_j \ge t$, because we are interested in failures (of components not belonging to $L_i$) only after $y + \tau$. Thus, in determining $\bar F_{r,i}(\tau)$ we can use, instead of $G(t_1, \ldots, t_n, t)$, the simpler function $Q_i(t_1, \ldots, t_n, t)$ obtained from $G$, where instead of expressions of the type $\min\{t_j, t\}$ there are only $t_j$ for $j$: $c_j \in L_i$, and $t$ for $j$: $c_j \notin L_i$. More precisely, define:


By definition of $Q_i$ it follows that for $t_j > t$ ($j$: $c_j \in L_i$), and for $t_j < t$ ($j$: $c_j \notin L_i$) we have $Q_i = 0$. For $t_j < t$ ($j$: $c_j \in L_i$) the function $Q_i$ is continuous, and at $t_j = t$ it is left-continuous. For $t_j > t$ ($j$: $c_j \notin L_i$), the $Q_i$ is continuous, and it is right-continuous at $t_j = t$. This means that for $t_j \le t$ ($j$: $c_j \in L_i$) and for $t_j \ge t$ ($j$: $c_j \notin L_i$), the $Q_i$ has partial derivatives with respect to all $t_j$ and $t$. Thus we can write:

$$\Pr\{A_i(x, y, \tau, S, L_i)\}\big|_{x=y=t} = \int_0^t \int_{t+\tau}^{\infty} \frac{\partial^2}{\partial z_i\, \partial y}\, Q_i(\infty, \ldots, z_i, \ldots, \infty, z_i, \ldots, \infty, y)\, dy\, dz_i. \tag{11}$$

In (11), the $z_i$ exist in place of $t_j$ ($j$: $c_j \in L_i$); $\infty$ exists in place of $t_m$ ($m$: $c_m \notin L_i$). Integrating (11) by $y$ we obtain

$$\Pr\{A_i(x, y, \tau, S, L_i)\}\big|_{x=y=t} = \int_0^t \frac{\partial}{\partial z_i}\, \Delta Q_i\, dz_i, \tag{12}$$

$$\Delta Q_i \equiv Q_i(\infty, \ldots, z_i, \ldots, \infty, z_i, \ldots, \infty, \infty) - Q_i(\infty, \ldots, z_i, \ldots, \infty, z_i, \ldots, \infty, t + \tau).$$

By (7) and (12) we have:

$$\bar F_{r,i}(\tau) = \int_0^\infty \left[\frac{\partial}{\partial z_i}\, \Delta Q_i\right]_{z_i = t} dt. \tag{13}$$

Inserting (13) into (8) we finally get the basic result:

$$\bar F_r(\tau) = \sum_{i=1}^{I} (-1)^{K_i + 1} \int_0^\infty \left[\frac{\partial}{\partial z_i}\, \Delta Q_i\right]_{z_i = t} dt. \tag{14}$$

Remark 1

If $L_i$ is a cut set (failure of all components belonging to $L_i$ causes system failure), then from (5), the $\Pr\{A_i(x, y, \tau, S, L_i)\} = 0$ for all $\tau > 0$; and then from (7), the $\bar F_{r,i} \equiv 0$. This means that by considering successive sets $L_i$ in (8) we can omit sets $L_i$ that are cuts of the system.

Remark 2

After the removal of all sets $L_i$ that are cuts of the system, it often happens that the number of remaining sets $L_i$ is small. For example, if the system tolerates $k - 1$ failures of components (k-out-of-n:F) and $L$ is a set of all $(k-1)$-element sets of system components, then $M = \binom{n}{k-1}$ and $I = 2^M - 1$. The number of non-trivial sets $L_i$ is then only $M$, because every union of two or more distinct $(k-1)$-element sets contains at least $k$ components and is therefore a cut.

Remark 3

From (9) it follows that the joint Cdf $G(t_1, \ldots, t_n, t)$ is a sum of $2^N - 1$ constituents. But the $Q_i$ in (10), being the basis for determining the $\Pr\{A(x, y, \tau, S, L)\}$, is (after reduction of identical elements) a sum of no more than $2^{(N-k)}$ terms, where $k$ is the number of elements of the set $L_i$.

Remark 4

Taking into consideration remarks 1-3 it is easy to prepare an algorithm of the method and then, with the aid of a computer program, to determine the Cdf of system residual life for any coherent structure.

3. EXAMPLES

In all examples it is assumed that system components fail s-independently of each other.

3.1. The 1-out-of-2:G system (“parallel” system)

Let $L = \{l_1, l_2\}$, $l_1 = \{c_1\}$, $l_2 = \{c_2\}$. This means that $T_r$ begins at the failure of the first component. By (3)

$$\Pr\{A(x, y, \tau, S, L)\} = \Pr\{[(T_1 < x) \cup (T_2 < x)] \cap (T \ge y + \tau)\}$$

$$= \Pr\{(T_1 < x) \cap (T \ge y + \tau)\} + \Pr\{(T_2 < x) \cap (T \ge y + \tau)\} - \Pr\{(T_1 < x) \cap (T_2 < x) \cap (T \ge y + \tau)\}. \tag{15}$$

From (15) it follows that: $L_1 = \{c_1\}$, $L_2 = \{c_2\}$, $L_3 = \{c_1, c_2\}$. The $L_3$ is a system cut and (by remark 1) is not considered. The joint Cdf of the random vector $(T_1, T_2, T)$ is:

$$G(t_1, t_2, t) = \Pr\{(T_1 < t_1) \cap (T_2 < t_2) \cap (T < t)\} = \Pr\{(T_1 < \min\{t_1, t\}) \cap (T_2 < \min\{t_2, t\})\}. \tag{16}$$

By (10), for $L_1 = \{c_1\}$ we have:


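Using (13) we obtain

3.2. The series system

Any set of components is a cut of the system. Thus, by remark 1:

$$\bar F_r(\tau) \equiv 0.$$

3.3 The 2-out-of-3:G system

Let $L = \{l_1, l_2, l_3\}$, $l_1 = \{c_1\}$, $l_2 = \{c_2\}$, $l_3 = \{c_3\}$. Here, $M = 3$, $I = 2^3 - 1 = 7$, so that there are 7 different sets of the type $L_i$:

$$L_1 = \{c_1\},\ L_2 = \{c_2\},\ L_3 = \{c_3\},\ L_4 = \{c_1, c_2\},\ L_5 = \{c_1, c_3\},\ L_6 = \{c_2, c_3\},\ L_7 = \{c_1, c_2, c_3\}.$$

$$\ldots \cap (T_2 < \min\{t_2, t\}) \cap (T_3 < \min\{t_3, t\})\}. \tag{23}$$

By (10), for $L_1 = \{c_1\}$ we get

$$\frac{\partial}{\partial z_1} Q_1(z_1, t_2, t_3, t) = f_1(z_1)\,[F_2(t) F_3(t_3) + F_2(t_2) F_3(t) - F_2(t) F_3(t)],$$

$$\frac{\partial}{\partial z_1} Q_1(z_1, \infty, \infty, \infty) = f_1(z_1),$$

$$\frac{\partial}{\partial z_1} Q_1(z_1, \infty, \infty, t + \tau) = f_1(z_1)\,[F_2(t + \tau) + F_3(t + \tau) - F_2(t + \tau) F_3(t + \tau)].$$

Hence, one can rewrite the $\bar F_r(\tau)$ as: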


3.4. The 1-out-of-3:G system

A. Assume that $L = \{l_1, l_2, l_3\}$, $l_1 = \{c_1, c_2\}$, $l_2 = \{c_1, c_3\}$, $l_3 = \{c_2, c_3\}$. Now we obtain:

Sets $L_i$ ($i = 4, \ldots, 7$) are system cuts. Then

B. Assume now that $L = \{l_1, l_2\}$, $l_1 = \{c_1\}$, $l_2 = \{c_2, c_3\}$. In this case:

3.5 The 3-component “series-parallel” system

Consider the reliability block diagram in figure 1.

Fig. 1. Reliability Block Diagram for 3-Component System.

Assume that $L = \{l_1, l_2\}$, $l_1 = \{c_1\}$, $l_2 = \{c_2\}$. Then $T_r$ begins at the failure of $c_1$ or $c_2$. Component $c_3$ is not observed. There are 3 sets of the type $L_i$:

$$L_1 = \{c_1\}, \quad L_2 = \{c_2\}, \quad L_3 = \{c_1, c_2\},$$

and from (4), the $K_1 = K_2 = 1$, $K_3 = 2$. The joint Cdf of $(T_1, T_2, T_3, T)$ can be written as:

$$G(t_1, t_2, t_3, t) = F_1(\min\{t_1, t\})\, F_2(t_2)\, F_3(\min\{t_3, t\}) + F_1(t_1)\, F_2(\min\{t_2, t\})\, F_3(\min\{t_3, t\}) - F_1(\min\{t_1, t\})\, F_2(\min\{t_2, t\})\, F_3(\min\{t_3, t\}). \tag{33}$$

For $L_1 = \{c_1\}$, by (10) we obtain:

$$Q_1(t_1, t_2, t_3, t) = F_1(t_1) F_2(t_2) F_3(t) + F_1(t_1) F_2(t) F_3(t) - F_1(t_1) F_2(t) F_3(t) = F_1(t_1) F_2(t_2) F_3(t). \tag{34}$$

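Thus

$$\frac{\partial}{\partial z_1} Q_1(z_1, t_2, t_3, t) = f_1(z_1) F_2(t_2) F_3(t). \tag{35}$$

Using (13) we get:

$$\bar F_{r,1}(\tau) = \int_0^\infty f_1(t)\, \bar F_3(t + \tau)\, dt, \tag{36a}$$

and similarly:

$$\bar F_{r,2}(\tau) = \int_0^\infty f_2(t)\, \bar F_3(t + \tau)\, dt. \tag{36b}$$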
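The substitution that turns the joint Cdf (33) into $Q_1$ of (34) can be spot-checked numerically. The sketch below assumes exponential component lives with hypothetical rates (0.5, 0.3, 0.2), which are not taken from the paper; it confirms that $G$ and $Q_1$ coincide on the region $t_1 \le t$, $t_2 \ge t$, $t_3 \ge t$ and that $Q_1$ reduces to $F_1(t_1) F_2(t_2) F_3(t)$.

```python
import math


def exp_cdf(rate):
    """Cdf of an exponential life time; the rates used below are assumptions."""
    return lambda t: 1.0 - math.exp(-rate * t) if t > 0 else 0.0


F1, F2, F3 = exp_cdf(0.5), exp_cdf(0.3), exp_cdf(0.2)


def G(t1, t2, t3, t):
    """Joint Cdf (33): s-independent components, minimal cuts {c1, c3} and {c2, c3}."""
    m1, m2, m3 = min(t1, t), min(t2, t), min(t3, t)
    return (F1(m1) * F2(t2) * F3(m3)
            + F1(t1) * F2(m2) * F3(m3)
            - F1(m1) * F2(m2) * F3(m3))


def Q1(t1, t2, t3, t):
    """Q_1 for L_1 = {c1}: min{t1, t} -> t1, min{t2, t} -> t, min{t3, t} -> t."""
    return (F1(t1) * F2(t2) * F3(t)
            + F1(t1) * F2(t) * F3(t)
            - F1(t1) * F2(t) * F3(t))


# Points with t1 <= t, t2 >= t, t3 >= t, the region on which Q_1 replaces G.
points = [(0.5, 3.0, 4.0, 1.0), (1.0, 1.0, 2.0, 1.0), (0.2, math.inf, math.inf, 2.5)]

gap_G_Q1 = max(abs(G(*p) - Q1(*p)) for p in points)
gap_Q1_product = max(abs(Q1(*p) - F1(p[0]) * F2(p[1]) * F3(p[3])) for p in points)
```

Both gaps are at the level of floating-point rounding, which is all this check is meant to show; it does not replace the analytic argument.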

For $L_3 = \{c_1, c_2\}$ we have:

$$Q_3(z_3, z_3, t_3, t) = F_1(z_3) F_2(z_3) F_3(t),$$

so that

$$\frac{\partial}{\partial z_3} Q_3(z_3, z_3, t_3, t) = [f_1(z_3) F_2(z_3) + F_1(z_3) f_2(z_3)]\, F_3(t). \tag{37}$$

From (13) we obtain:

$$\bar F_{r,3}(\tau) = \int_0^\infty [f_1(t) F_2(t) + F_1(t) f_2(t)]\, \bar F_3(t + \tau)\, dt, \tag{38}$$

and by (8) we finally get:

$$\bar F_r(\tau) = \int_0^\infty f_1(t)\, \bar F_3(t + \tau)\, dt + \int_0^\infty f_2(t)\, \bar F_3(t + \tau)\, dt - \int_0^\infty [f_1(t) F_2(t) + F_1(t) f_2(t)]\, \bar F_3(t + \tau)\, dt. \tag{39}$$

The $F_r(0+) \ne 0$. This result is correct because component $c_3$ is not observed and with a certain positive probability it can fail first. In that case, failure of any observed component ($c_1$ or $c_2$) causes system failure, so that, sometimes, RSL = 0.
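For a concrete feel of (39), the sketch below evaluates it by the trapezoidal rule under the assumption of exponential lives with hypothetical rates $\lambda_1, \lambda_2, \lambda_3$ (not values from the paper). Under that assumption the integrand collapses to $(\lambda_1 + \lambda_2) e^{-(\lambda_1 + \lambda_2) t} e^{-\lambda_3 (t + \tau)}$, so $\bar F_r(\tau) = \frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} e^{-\lambda_3 \tau}$, which serves as a reference value. The function names, the integration limit, and the step count are illustrative choices.

```python
import math


def sf_rsl_numeric(tau, lam1, lam2, lam3, upper=100.0, steps=100_000):
    """Trapezoidal evaluation of (39) for exponential lives (assumed rates)."""
    def integrand(t):
        f1 = lam1 * math.exp(-lam1 * t)
        f2 = lam2 * math.exp(-lam2 * t)
        F1 = 1.0 - math.exp(-lam1 * t)
        F2 = 1.0 - math.exp(-lam2 * t)
        sf3 = math.exp(-lam3 * (t + tau))  # survivor function of c3 at t + tau
        return (f1 + f2 - f1 * F2 - F1 * f2) * sf3

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h


def sf_rsl_closed_form(tau, lam1, lam2, lam3):
    """Closed form of (39) that holds only in the exponential case."""
    s = lam1 + lam2
    return s / (s + lam3) * math.exp(-lam3 * tau)


rates = (0.5, 0.3, 0.2)                       # hypothetical failure rates
sf_at_zero = sf_rsl_numeric(0.0, *rates)      # Sf of RSL at tau = 0
cdf_jump = 1.0 - sf_at_zero                   # F_r(0+), probability that RSL = 0
```

With these rates the jump $F_r(0+)$ equals $\lambda_3 / (\lambda_1 + \lambda_2 + \lambda_3) = 0.2$, which is the probability that the unobserved component $c_3$ fails first.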

4. NUMERICAL EXAMPLES

For systems more complex than those considered in section 3, determining the Cdf of residual system life time requires many laborious mathematical operations. Therefore, a computer program RES1, calculating values of $F_r(\tau)$ for any $\tau$, has been written. This program was coded in FORTRAN IV and run on a PDP-11/60 computer.
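The program RES1 is not reproduced in the paper. As an independent cross-check of values of $F_r(\tau)$, one can swap in plain Monte Carlo simulation, which is a different technique from the integration of (14). The sketch below is an assumption-laden illustration: the function names are hypothetical, components are indexed 0 to 3, and the Weibull parametrization is that of Python's `random.weibullvariate(alpha, beta)` (scale, shape), which may differ from the one used by the author. It simulates a 3-out-of-4:F system with residual life sets {c1, c2} and {c3, c4} and i.i.d. lives.

```python
import random
from itertools import combinations


def simulate_rsl(min_cuts, residual_life_sets, draw_lifetimes, n_runs, seed=0):
    """Draw component lives repeatedly and return the resulting RSL samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_runs):
        life = draw_lifetimes(rng)
        t_sys = min(max(life[c] for c in cut) for cut in min_cuts)
        t_start = min(max(life[c] for c in l) for l in residual_life_sets)
        samples.append(max(t_sys - t_start, 0.0))
    return samples


def empirical_cdf(samples, tau):
    """Fraction of samples with RSL <= tau, an estimate of F_r(tau)."""
    return sum(1 for s in samples if s <= tau) / len(samples)


components = range(4)
min_cuts = list(combinations(components, 3))  # 3-out-of-4:F: any 3 failures fail the system
life_sets = [(0, 1), (2, 3)]                   # l1 = {c1, c2}, l2 = {c3, c4}


def draw_iid_weibull(rng):
    # Assumed parametrization: scale 3.0, shape 2.0.
    return [rng.weibullvariate(3.0, 2.0) for _ in components]


samples = simulate_rsl(min_cuts, life_sets, draw_iid_weibull, n_runs=20_000, seed=1)
p_zero = empirical_cdf(samples, 0.0)  # estimate of F_r(0+)
```

For i.i.d. lives the estimate of $F_r(0+)$ should be close to 2/3 regardless of the life distribution, since only 2 of the 6 equally likely pairs of first-failing components are residual life sets. Values at $\tau > 0$ depend on the assumed parametrization and are therefore not compared here.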


Example 1

Consider the 3-out-of-4:F system. Let $L = \{l_1, \ldots, l_6\}$, where $L$ is a set of all 2-element subsets of system components. It means that RSL begins at the failure of the last of any pair of components. Assume that all component times-to-failure (lives) have a Weibull distribution with scale and shape parameters, respectively: (3.0; 1.0), (4.0; 1.5), (5.0; 2.0) and (6.0; 2.5). The values of $F_r(\tau)$ for some $\tau$ are presented in table 1.

TABLE 1
Values of $F_r(\tau)$

  τ      F_r(τ)
 0.0     0.000
 0.1     0.050
 0.2     0.096
 0.3     0.139
 ...     ...
 2.1     0.707
 2.2     0.726
 2.3     0.744
 2.4     0.761
 2.5     0.778
 2.6     0.793
 ...     ...
 6.0     0.995
 6.1     0.996
 6.2     0.996
 6.3     0.997
 6.4     0.997
 6.5     0.998
 6.6     0.998
 6.7     0.999

Example 2

For the 3-out-of-4:F system assume that $L = \{l_1, l_2\}$, where $l_1 = \{c_1, c_2\}$, $l_2 = \{c_3, c_4\}$. Let all system component lives be i.i.d. with a Weibull distribution with scale parameter $\alpha = 3.0$ and shape parameter $\beta = 2.0$. The results of computing $F_r(\tau)$ for $\tau$ are presented in table 2.

TABLE 2
Values of $F_r(\tau)$: Numerical Example 2

  τ      F_r(τ)
 0.0     0.667
 0.1     0.698
 0.2     0.727
 0.3     0.754
 ...     ...
 1.4     0.937
 1.5     0.946
 1.6     0.953
 1.7     0.960
 ...     ...
 2.0     0.975
 2.1     0.979
 2.2     0.983
 2.3     0.985
 2.4     0.988
 2.5     0.990
 2.6     1.000

The $F_r(0+) = 2/3$. This result is obvious. There are 6 2-element sets of components, but RSL begins only after the failure of two of them: $\{c_1, c_2\}$ and $\{c_3, c_4\}$. Otherwise, RSL begins after the next (third) component failure, but in these cases the system is failed, and RSL = 0.

5. FINAL REMARKS

The above method of determining the distribution of RSL is universal and can be successfully applied in many practical cases. Nevertheless, for large systems this method is not very effective because of the many laborious mathematical operations. The computational time required for determining the Cdf of RSL is a quickly increasing function of the number of minimal cuts, the number of considered residual life sets, and the exactness of numerical integration in (14). For small systems, like 2-out-of-3 or 3-out-of-4, a value of $F_r(\tau)$ is obtained after several seconds, but for systems containing ten or more minimal cuts and considering more than 10 residual life sets, in most cases the computational time required for evaluating one value of $F_r(\tau)$ is greater than one minute. Using new generations of computers, faster than the PDP-11/60, this time can be considerably shorter.

ACKNOWLEDGMENT

I am very grateful to Prof. Dr. Winfrid G. Schneeweiss of Hagen University for helpful discussions throughout the course of this research and suggestions in improving the presentation of this paper.

REFERENCES

[1] J. Karpiński, “General probability of system failure”, IEEE Trans. Reliability, vol R-32, 1983 Dec, pp 444-449.

[2] J. Karpiński, “General method of determining the probability of system components failure before system failure”, Systems Research Institute Report #110, Warsaw 1985.

[3] W. Schneeweiss, J. Karpiński, “The theory of delayed repair for all systems with 2 or 3 components”, Informatik-Berichte, 1986, Fernuniversität, D-5800 Hagen, Fed. Rep. Germany.

AUTHOR

Dr. Janusz Karpinski; Systems Research Institute; Polish Academy of Sciences; Newelska 6; 01-447 Warsaw, POLAND.

Janusz Karpiński was born in Poland on 1948 March 17. He received the MS, Dr of Techn. Sci, and PhD degrees in 1971, 1975, 1980. His interests include reliability analysis and optimal maintenance policies of systems. He is an Assistant Professor and Head of Reliability Section in the Systems Research Institute of the Polish Academy of Sciences.

Manuscript TR86-164 received 1986 December 15; revised 1988 February 17.

IEEE Log Number 22777
