chap3 basic reli maths

Upload: koteshwarryahoocom

Post on 09-Apr-2018


  • 8/7/2019 Chap3 basic reli maths

    1/31

    Chapter 3

    BASIC RELIABILITY MATHEMATICS

    This chapter introduces the terms reliability R(t), unreliability F(t), time-to-failure density f(t), failure rate function f_r(t), hazard h(t) and cumulative hazard H(t) functions, as well as their interrelationships. Other terms relating to mean life are also introduced.

    It contains the mathematical definitions and relationships necessary to understand each of the chapters which follow. These definitions and relationships are the building blocks of reliability engineering. It introduces the four fundamental failure distributions (densities) of reliability engineering. It also explains how we can estimate the percent of the population which will fail by a certain time simply by using the sample data order number and the number in the sample. This provides the basis for probability plotting, discussed in Chapter 8.

    Many of the developments in this chapter have their origin in the mathematics of actuarial science, developed for over 200 years before they were applied to electro-mechanical devices. Also, there are statistics and biostatistics courses in survival analysis which focus on many of the same topics as do reliability engineering courses.

    Glossary of terms and symbols:

    Bathtub Curve: A plot of h(t), the hazard function over time, t. So-called because its shape resembles the profile of a bathtub.


    Conditional Reliability: The probability of no failure in an interval, given no failure (survival) from time zero until the starting time of the interval.

    Cumulative Hazard Function H(t): The area under the hazard function from 0 to t. H(t) is not a probability.

    Hazard Function h(t): The instantaneous conditional probability of failure in a small interval (t, t + dt) divided by the width of the interval.

    Failure Rate Function: A function depicting the number of failures per unit of time at a particular time. The failure rate function f_r(t) is related to the hazard function in that its plot over time has the same shape. Only the Y axis values differ.

    Non-parametric: Distribution-free. In this chapter the term refers to estimates of functions such as the unreliability F(t) which are made without reference to the underlying failure distribution (Weibull, normal, etc.).

    Reliability R(t): The probability that a device or system will perform its intended function for a given interval of time under specified operating conditions.

    Time-to-Failure Density Function f(t): A probability density function describing the failure behavior of a system over time.

    Unreliability F(t) = 1 - R(t): The probability that a device or system will not perform its intended function for a given interval of time under specified operating conditions. The unreliability is identical to the cumulative distribution function (cdf) in probability theory.

    3.1 Definition of reliability

    Most definitions of reliability have five elements. Consider the definition proposed by the Advisory Group on Reliability of Electronic Equipment (AGREE) in 1952 and reported in AGREE (1957).


    Definition: Reliability is the probability of performing without failure, a specific function under given conditions for a specified period of time.

    The five elements are:

    1) Probability: Reliability is a probability, a probability of performing without failure; thus, a reliability is a number between zero and one.

    2) Failure: What constitutes a failure must be agreed upon in advance of the testing and use of the component or system under study. For example, if the function of a pump is to deliver at least 200 gallons of fluid per minute and it is now delivering 150 gallons per minute, the pump has failed, by this definition.

    3) Function: The device whose reliability is in question must perform a specific function. For example, if I use my gasoline-powered lawnmower to trim my hedges and a blade breaks, this should not be charged as a failure.

    4) Conditions: The device must perform its function under given conditions. For example, if my company builds and sells small gasoline-powered electrical generators intended for use in ambient temperatures of 0-120 degrees Fahrenheit and several are brought to Nome, Alaska and fail to operate in the winter, we should not charge failures to these units.

    5) Time: The device must perform for a period of time. One should never cite a reliability figure without specifying the time in question. The exception to this rule is for one-shot devices such as munitions, rockets, automobile air-bags, and the like. In this case we think of the reliability as the probability that the device will operate properly (once) when deployed or used. Equivalently, one-shot reliability may be thought of as the proportion of all identical devices which will operate properly (once) when deployed or used. In reliability, unless otherwise specified, time begins at zero. We treat conditional probability of failure and conditional reliability separately and call them as such.

    Elements 2, 3 and 4 are important to the reliability of a device, but they differ in different situations; elements 1 and 5 are more basic. Since reliability is a probability, the theory outlined in Section 1 of this chapter is available for use in reliability theory, and the methods of probability assignment discussed in Section 1 are also important for reliability studies. The probability element of reliability also allows one to calculate reliabilities in a quantitative way; that is, the assessment of a reliability can be done probabilistically, so that the quantity given to the reliability has the meaning and structure of probability for its manipulation and interpretation.

    The time element is also basic in reliability. In fact, the same publication in which the AGREE definition of reliability appeared proposes that the basic distinction between reliability and quality control is related to this element. In this way of comparing reliability and quality control, quality control studies failure at a given time whereas reliability studies failure over time.

    In a sense, this comparison introduces a new definition of reliability, that is, a study of failure over time. Also, the term failure is introduced and, to be consistent, it is important to define failure. Thus, a failure is defined as any functioning of the device or component which is not considered within the prescribed limits of satisfactory functioning.

    Since the element time is so basic to reliability, it is quite natural that the primary random variable in reliability studies is time and that the subject of such studies is often life length. When life length is the focus of a reliability study, the study is often referred to as a life test. With these points in mind, one can imagine the kinds of interesting discussions and arguments one may observe when design engineers, manufacturing engineers, electrical, mechanical and quality engineers get together in design review or failure analysis meetings and discuss things such as:

    1) was it a failure or not? and

    2) was it electrical or mechanical, or was it really mechanical caused by electrical (or vice versa), or was it caused by software, or was it caused by those guys over there?

    Thus, there will be some finger-pointing. This makes for interesting meetings. To minimize these problems, you must classify potential failures, define what is a failure and have some kind of meeting of the minds before you actually see the failures.


    3.2 Mathematical definition of reliability

    The life of a device under reliability study follows a sequence that results in an observable time to failure. A new device is put into service, it functions acceptably for a period of time and then it fails to function satisfactorily. The observed time to failure is a value of the random variable T, which represents the lifetime of the device. T takes its values in an interval of the real numbers, $\mathbb{R}$, most often in the interval $[0, \infty)$. Since the lifetime of a device is represented by a random variable T, there is a cumulative distribution function (cdf) of T,

    $$F_T(t) = P(T \le t), \quad 0 < t. \tag{3.1}$$

    $F_T(t)$ is usually called the unreliability at time t. It represents the probability of failure in the interval [0, t]. The probability of failure in the interval $(t_1, t_2]$ equals $F_T(t_2) - F_T(t_1)$.

    Definition: The reliability function is:

    $$R_T(t) = P(T > t) = 1 - F_T(t) \tag{3.2}$$

    Thus, reliability is the probability of no failures in the interval [0, t] or, equivalently, the probability of failure after time t. Sometimes T will take on only a countable number of values in $\mathbb{R}$. This case, called the discrete case, occurs when T is a number of cycles, for example, or when failure can occur at only discrete points in time.

    Most of the time, however, T will be a continuous random variable and its distribution $F_T(t)$ will be a continuous distribution having a density $f_T(t)$.

    3.2.1 Reliability With Continuous Random Variables

    Assume T is a continuous random variable, taking values in $(0, \infty)$ and with density function $f_T(t)$. The reliability function $R_T(t)$ is:

    $$R_T(t) = \int_t^{\infty} f_T(x)\,dx = 1 - \int_0^t f_T(x)\,dx = 1 - F_T(t) \tag{3.3}$$

    where $f_T(t) \ge 0$ and $\int_0^{\infty} f_T(x)\,dx = 1$.


    Note that,

    $$f_T(t) = -\frac{dR_T(t)}{dt} \tag{3.4}$$

    It is also worth noting that the probability that the failure time T occurs in an interval $(t_1, t_2)$ can be written:

    $$P(t_1 < T < t_2) = F_T(t_2) - F_T(t_1) = R_T(t_1) - R_T(t_2) \tag{3.5}$$

    At this point, we abandon the notation $f_T(t)$, $R_T(t)$ and $F_T(t)$ and for simplicity use f(t), R(t) and F(t), respectively. Figure 3.1 presents the relationship between f(t), F(t) and R(t) graphically.

    Figure 3.1: Relationship between f(t), F(t) and R(t)

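    Example: The exponential time-to-failure density is given by

    $$f(t) = \frac{1}{\theta} \exp\left(-\frac{t}{\theta}\right), \quad t > 0.$$

    Using the above relationships,

    $$F(t) = 1 - \exp\left(-\frac{t}{\theta}\right) \quad \text{and} \quad R(t) = \exp\left(-\frac{t}{\theta}\right).$$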
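    The relationship F(t) = 1 - R(t) for the exponential density can be checked numerically. The following is a minimal sketch, assuming the density is parameterized by its mean life; the name `theta`, the value of 100 hours and the helper `integrate` are illustrative assumptions, not part of the text.

```python
import math

def exp_density(t, theta):
    """Exponential time-to-failure density f(t) = (1/theta) * exp(-t/theta)."""
    return math.exp(-t / theta) / theta

def exp_unreliability(t, theta):
    """Closed-form unreliability F(t) = 1 - exp(-t/theta)."""
    return 1.0 - math.exp(-t / theta)

def exp_reliability(t, theta):
    """Closed-form reliability R(t) = exp(-t/theta)."""
    return math.exp(-t / theta)

def integrate(func, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h

theta = 100.0   # hypothetical mean life, in hours
t = 50.0

# F(t) obtained by integrating the density from 0 to t, as in (3.3)
F_numeric = integrate(lambda x: exp_density(x, theta), 0.0, t)

print(F_numeric, exp_unreliability(t, theta), exp_reliability(t, theta))
```

    The integrated density and the closed-form F(t) should agree to several decimal places, and F(t) + R(t) should equal one.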

    One selects a time-to-failure (TTF) density, f(t), by collecting failure data and either doing a goodness-of-fit test if sufficient data exist, or by making a probability plot if there is very little data. Probability plotting procedures are discussed in Chapter 8. We illustrate the use of TTF densities with exponential and Weibull densities, which are discussed in much more detail in Chapter 4.

    Figure 3.2 below illustrates a histogram of 1000 data points with an exponential density curve overlaid. Figure 3.3 represents a probability plot on exponential paper of 10 TTF points. The data points are reasonably close to the fitted line and hence we may conclude that the exponential distribution is an appropriate choice for f(t).

    Figure 3.2: Histogram for Exponential Distribution

    Figure 3.4 represents plots of Weibull density functions with various parameters. The general form of this density function is given by

    $$f(t) = \frac{\beta\, t^{\beta - 1}}{\theta^{\beta}}\, e^{-(t/\theta)^{\beta}}, \quad t > 0$$

    where $\theta$ is a scale parameter called the characteristic value and $\beta$ is called the shape parameter. More on the Weibull distribution will be presented throughout the book, beginning with Chapter 4. Some densities in Figure 3.4 have a positive skewness (ski-slope to the right), which indicates that most failures occur in the early part of life. Figure 3.5 represents F(t) vs. t for those distributions. Note that as time increases, the cumulative probability of failure (failure on or before time t) increases and ultimately approaches one.

    Figure 3.3: Probability plot for Exponential Distribution

    Next, Figure 3.6 presents the hazard function for the same set of Weibull random variables. Compare these plots to the ones in Figure 3.4 and notice how the shapes of both depend on the parameter $\beta$.

    Example:

    Suppose that the TTF density of a pick and place machine used in printed circuit surface mount technology is given by a Weibull density with parameters $\theta = 40$ and $\beta = 2$. Thus,

    $$f(t) = \frac{2\, t^{2-1}}{40^{2}}\, e^{-(t/40)^{2}}.$$

    Hence,

    $$R(t) = \int_t^{\infty} \frac{2x}{1600}\, e^{-(x/40)^{2}}\, dx \quad \text{or} \quad R(t) = e^{-(t/40)^{2}}.$$

    The probability of surviving the interval (0, 40) is $R(40) = \exp(-1) = 0.368$.
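    The pick and place calculation can be reproduced in a few lines. This is a sketch under the assumption that the scale and shape parameters are called `theta` and `beta`; the trapezoidal helper `integrate` and the upper integration limit of 400 (beyond which the density is negligible) are illustrative choices.

```python
import math

def weibull_density(t, theta, beta):
    """Weibull density f(t) = beta * t**(beta-1) / theta**beta * exp(-(t/theta)**beta)."""
    return (beta / theta) * (t / theta) ** (beta - 1) * math.exp(-((t / theta) ** beta))

def weibull_reliability(t, theta, beta):
    """Weibull reliability R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def integrate(func, a, b, n=20_000):
    """Composite trapezoidal rule on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h

theta, beta = 40.0, 2.0

# Closed form: R(40) = exp(-1)
r40 = weibull_reliability(40.0, theta, beta)

# Same quantity as the area under the density to the right of t = 40
r40_numeric = integrate(lambda x: weibull_density(x, theta, beta), 40.0, 400.0)

print(round(r40, 3), round(r40_numeric, 3))  # 0.368 0.368
```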

    3.2.2 Reliability With Discrete Random Variables:

    Suppose now T is discrete, taking values $0 = t_0 < t_1 < t_2 < \ldots$ with probability function:

    $$p(t_i) = P(T = t_i), \quad i = 0, 1, 2, \ldots \tag{3.6}$$

    Figure 3.4: Weibull Density Function [plot of f(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8]

    In practice, as with cycles or discrete time periods (like minutes), these points $t_i$ may be taken to be equally spaced with $t_i = i$ in suitable units. This is not necessary, but it is often convenient. If $t_i = i$, then:

    $$R_T(t_i) = P(T > t_i) = p(i+1) + p(i+2) + \ldots \tag{3.7}$$

    In general,

    $$R_T(t) = P\{T > t\} = \sum_{i:\, t_i > t} p(t_i) \tag{3.8}$$

    Notice that:

    $$p(t_i) = R(t_{i-1}) - R(t_i) \tag{3.9}$$
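    Relations (3.7) and (3.9) can be illustrated with a small discrete example. The probability function below is hypothetical (a device that fails at one of cycles 1 to 4); exact fractions are used so that the check of (3.9) is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical probability function p(t_i) with t_i = i, for i = 0, 1, 2, 3, 4
pmf = [Fraction(0), Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

def reliability(i, pmf):
    """R(t_i) = P(T > t_i) = p(i+1) + p(i+2) + ..., as in (3.7)."""
    return sum(pmf[i + 1:], Fraction(0))

R = [reliability(i, pmf) for i in range(len(pmf))]

# Recover the probability function from the reliabilities using (3.9)
recovered = [R[i - 1] - R[i] for i in range(1, len(pmf))]

print([str(r) for r in R])          # ['1', '9/10', '7/10', '2/5', '0']
print(recovered == pmf[1:])         # True
```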

    Figure 3.5: Weibull Cumulative Distribution Function [plot of F(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8]

    3.2.3 Conditional Reliability and Unreliability

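    We first define conditional reliability. Here T denotes the length of the interval of interest rather than the lifetime random variable. Using Bayes' Rule (2.10),

    $$P[\text{no failure in } (t, t+T) \mid \text{no failure in } (0, t)] = \frac{P[\text{no failure in } (t, t+T) \cap \text{no failure in } (0, t)]}{P[\text{no failure in } (0, t)]} = \frac{R(t+T)}{R(t)} \tag{3.10}$$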

    Example: For the pick and place machine of the previous example, the probability of surviving the interval (40, 50) given survival (0, 40) is

    $$\frac{R(50)}{R(40)} = \frac{e^{-(50/40)^{2}}}{e^{-(40/40)^{2}}} = \frac{e^{-1.5625}}{e^{-1}} = 0.5698$$

    Figure 3.6: Weibull Hazard or Survival Function [plot of h(t) versus t for scale = 1 and shape = 0.5, 1, 2, 3.5 and 8]

    Conditional reliability is always calculated as the ratio of the reliability at the end of the interval to the reliability at the beginning of the interval.

    Conditional unreliability is given by

    $$P[\text{failure in } (t, t+T) \mid \text{no failure in } (0, t)] = \frac{P[\text{failure in } (t, t+T) \cap \text{no failure in } (0, t)]}{P[\text{no failure in } (0, t)]} = \frac{F(t+T) - F(t)}{R(t)} = \frac{R(t) - R(t+T)}{R(t)} \tag{3.11}$$

    Figure 3.8 illustrates the regions of interest.

    Figure 3.7: Region of interest for Reliability and Unreliability

    Figure 3.8: Region of interest for Conditional Reliability and Unreliability

    Example: The probability that the pick and place machine will fail in the interval (40, 50) given survival (0, 40) is

    $$\frac{F(50) - F(40)}{R(40)} = \frac{1 - e^{-(50/40)^{2}} - (1 - e^{-1})}{e^{-(40/40)^{2}}} = \frac{e^{-1} - e^{-1.5625}}{e^{-1}} = 0.4302$$

    Note that for this problem, in which the interval and TTF density are the same as in the previous example, the conditional unreliability could have been obtained by subtracting the conditional reliability from one.
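    Both conditional quantities for the pick and place machine can be computed from the reliability function alone. The sketch below assumes the Weibull parameters of the example (scale 40, shape 2); the function names and the argument `length` for the interval width are illustrative.

```python
import math

def weibull_reliability(t, theta=40.0, beta=2.0):
    """Weibull reliability R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def conditional_reliability(t, length, R):
    """P[no failure in (t, t+length) | no failure in (0, t)], as in (3.10)."""
    return R(t + length) / R(t)

def conditional_unreliability(t, length, R):
    """P[failure in (t, t+length) | no failure in (0, t)], as in (3.11)."""
    return (R(t) - R(t + length)) / R(t)

cr = conditional_reliability(40.0, 10.0, weibull_reliability)
cu = conditional_unreliability(40.0, 10.0, weibull_reliability)

print(round(cr, 4), round(cu, 4))  # 0.5698 0.4302
```

    The two results sum to one, which is the shortcut noted above.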

    3.3 Hazard functions (continuous)

    Sometimes it is difficult to specify the distribution function of T directly from the physical information that is available. A function found useful in clarifying the relationship between physical modes of failure and the probability distribution of T is the conditional density function h(t), called the hazard function or failure rate. Consider the probability that a failure will occur in the small interval of time (t, t + dt):

    $$P\{t \le T < t + dt\} = P\{T \ge t\}\, P\{T < t + dt \mid T \ge t\},$$

    which is true by the multiplication rule of probability. Further, if $R(t) = P(T > t)$ is positive,

    $$P(T < t + dt \mid T > t) = \frac{P(t < T < t + dt)}{R(t)} \tag{3.12}$$

    Now the conditional rate of failure for the interval (t, t + dt) is the conditional probability of failure in the interval (given that the life of the device has reached t) divided by the length of the interval. Thus, the conditional interval failure rate is given by:

    $$\frac{P(t < T < t + dt \mid T > t)}{dt} = \frac{P(t < T < t + dt)}{R(t)\,dt} = \frac{R(t) - R(t + dt)}{R(t)\,dt} \tag{3.13}$$

    The instantaneous failure rate, or the hazard rate, is the limit of the above equations as $dt \to 0$. That is,

    $$h_T(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \lim_{dt \to 0} \frac{R(t) - R(t + dt)}{R(t)\,dt} = -\frac{R'(t)}{R(t)} \tag{3.14}$$

    For simplicity we replace $h_T(t)$ with h(t). The function h(t) is usually referred to in reliability as the hazard rate. Above, it is also called the instantaneous failure rate and elsewhere the failure rate function. In actuarial statistics it is called the force of mortality and, in other places, the intensity function. In economics, the reciprocal of h(t) is called Mills' ratio. Note that if both sides of the first equality in equation (3.14) are multiplied by dt, then h(t)dt is the instantaneous conditional probability of failure, i.e., the probability of failure in the vanishingly small interval (t, t + dt) given no failure in (0, t).
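    The limit in (3.14) can be watched numerically by shrinking dt in the conditional interval failure rate (3.13). This sketch assumes a Weibull lifetime with scale 40 and shape 2 (the pick and place example); the closed form h(t) = (beta/theta)(t/theta)^(beta-1) follows from dividing the Weibull density by its reliability.

```python
import math

def reliability(t, theta=40.0, beta=2.0):
    """Weibull reliability R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def interval_failure_rate(t, dt):
    """Conditional interval failure rate [R(t) - R(t+dt)] / (R(t) * dt), as in (3.13)."""
    return (reliability(t) - reliability(t + dt)) / (reliability(t) * dt)

def weibull_hazard(t, theta=40.0, beta=2.0):
    """Limiting value h(t) = f(t)/R(t) = (beta/theta) * (t/theta)**(beta-1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

t = 40.0
for dt in (10.0, 1.0, 0.1, 0.001):
    # The interval rate approaches h(t) as dt shrinks
    print(dt, interval_failure_rate(t, dt))

print("h(40) =", weibull_hazard(t))
```

    For this density the interval rate settles near h(40) = 0.05 once dt is small relative to t.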

    3.3.1 The Bathtub Curve

    With respect to both human and electromechanical failures, the hazard function over the lifetime appears to take on a shape somewhat like a bathtub. We are of the opinion that the bathtub curves for electrical and mechanical devices are different and, furthermore, that each has changed since attention to quality became a strategic issue in America. Figures 3.9 and 3.10 depict hypothetical bathtub curves for electrical and mechanical devices.

    Figure 3.9: Bathtub curve for Electrical devices

    Prior to the quality movement, a lifetime was thought of as being comprised of three failure regions. The first, where the hazard rate is decreasing, is called the infancy hazard rate or the burn-in hazard rate. The second region, which was represented by a rather constant hazard rate, is the region where failure is usually attributed to a chance occurrence. The third region is where the lifetime has reached the stage where the device is beginning to wear out and the hazard rate begins to increase. We feel that today's plots of h(t) vs. t are not like the traditional bathtub shapes.

    Figure 3.10: Bathtub curve for Mechanical devices

    A few arguments for the demise of the bathtub curve will now be presented. The initial portion, called the burn-in period for electronic devices and the period of early failures for mechanical components, was due to defects present in the raw materials or subassemblies, errors in workmanship, early manufacturing problems and the like. That is, the early part of the bathtub curve was primarily associated with poor quality. As defects were identified and removed, quality improved, and the hazard function began to steadily decrease. With TQM and more attention to supplier quality, nurturing of suppliers, supplier evaluation and certification, and with attention to eliminating and removing the root causes of defects in manufacturing, quality has vastly improved. There is very little poor quality to improve upon and hence few reasons to expect a downward slope to the curve.

    Many manufacturers of consumer products with substantial electronic circuitry, e.g., appliances, are now forgoing the burn-in period; that is, there is no burn-in. The reason for burn-in (high temperatures, and sometimes vibration) is to allow substandard components (e.g., a bad capacitor) and faulty processes (e.g., poor solderability) to identify themselves by failing under the increased stress(es). Replacements or repairs would be made before assembly of the component to the printed circuit board and/or before boards are placed in the cabinet or housing. These manufacturers feel that quality has improved to the extent that it is more likely that burn-in will cause latent defects than that it will identify substandard components or processes. This is somewhat analogous to the cessation of polio immunizations in the United States, with the belief that it is more likely that the polio shot will cause the disease than prevent it, since it is now so rare in the U.S. population.

    Since the exponential distribution, as will be shown in a later section, has a constant hazard rate, the hazard rate function is useful for comparing distributions to the exponential. In addition, the empirical hazard function (based on data alone) has been shown to be convenient for comparing groups of devices. Other strengths of the use of the hazard function relate to its facility and stability when there is censoring of some of the data and when there are several modes of failure present in the failure process.

    3.3.2 Considerations in Selecting a TTF Density Function

    If we examine the simplified plots of Figure 3.11 below, we observe three hazard functions: A, B and C. A is monotonically decreasing, B is constant and C is monotonically increasing. Relationship A implies that as time goes by the instantaneous conditional probability of failure decreases. This is unrealistic beyond small values of t for nearly all cases. Hence, most reliability studies use relationship B or relationship C. As we shall soon see, B represents the constant hazard function of the exponential distribution, implying that the instantaneous conditional probability of failure does not change over time. B is often selected for use in modeling TTF for electronic and electrical devices. C is most common for mechanical and electromechanical failures.

    A is usually observed only for a brief initial period after manufacture or processing. If devices behaved according to A, the more they were used, the better they would get, and paradoxically, as we shall see in Chapter 12, the more they are repaired, the worse they get. Hence, we recommend modeling TTF with a random variable whose hazard function is, for the most part, either relatively constant or increasing in nature, although such an increase may not be strictly monotone. This is the way most things in life behave; the more we use them, the worse they get (the more likely they are to fail). Even for the exponential random variable with the constant hazard function, wearout occurs. Failure eventually happens. It's just that with the exponential, the conditional probability of failure in a fixed interval is independent of where the interval begins (how long the device has been operating).

    Figure 3.11: Three Hazard Functions

    3.4 Relationship of h(t), f(t) and R(t)

    Because of the relationship on the right-hand side of (3.14), it also follows that

    $$h(t) = -\frac{d(\ln R(t))}{dt} \tag{3.15}$$

    and from that, integrating both sides and using R(0) = 1, we have

    $$R(t) = \exp\left(-\int_0^t h(x)\,dx\right) \tag{3.16}$$

    Now, from (3.15), $h(t)R(t) = -\frac{dR(t)}{dt}$. However, from (3.4), $f(t) = -\frac{dR(t)}{dt}$, or,

    $$h(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)} \tag{3.17}$$

    and the relationship among h(t), f(t) and R(t) is established. Note that if any one of the three functions is known, the others are known. Thus knowledge of the hazard rate is equivalent to knowledge of the distribution.
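    To illustrate that knowing the hazard function is enough, the sketch below starts from a hazard function only and recovers R(t) through (3.16) and f(t) through (3.17). The Weibull hazard with scale 40 and shape 2 is an assumed example, and the trapezoidal integration is an illustrative choice.

```python
import math

def hazard(t, theta=40.0, beta=2.0):
    """Assumed Weibull hazard h(t) = (beta/theta) * (t/theta)**(beta-1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def integrated_hazard(t, n=2_000):
    """Trapezoidal approximation of the integral of h(x) from 0 to t."""
    if t == 0:
        return 0.0
    step = t / n
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, n):
        total += hazard(k * step)
    return total * step

def reliability_from_hazard(t):
    """R(t) = exp(-integral of h from 0 to t), as in (3.16)."""
    return math.exp(-integrated_hazard(t))

def density_from_hazard(t):
    """f(t) = h(t) * R(t), a rearrangement of (3.17)."""
    return hazard(t) * reliability_from_hazard(t)

print(reliability_from_hazard(40.0))   # close to exp(-1)
print(density_from_hazard(40.0))       # close to 0.05 * exp(-1)
```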

    3.5 The cumulative hazard rate

    In some situations there is interest in the function

    $$H(t) = \int_0^t h(x)\,dx = \int_0^t \frac{f(x)}{1 - F(x)}\,dx \tag{3.18}$$

    which is called the cumulative hazard rate. Using (3.16), it is seen that

    $$R(t) = e^{-H(t)} \quad \text{and} \quad H(t) = -\ln\{R(t)\} \tag{3.19}$$

    Notice that the condition $R(t) \le 1$ indicates that $H(t) \ge 0$. It can easily be shown that $-2 \ln U \sim \chi^2_2$, where U is distributed uniformly on the unit interval. Since F(T), evaluated at the random failure time T, is uniform over (0, 1), so is R(T). Thus, we can write

    $$-2 \ln\{R(T)\} \sim \chi^2_2,$$

    or, from (3.19),

    $$-2 \ln\left(e^{-H(T)}\right) \sim \chi^2_2.$$

    Thus

    $$2H(T) \sim \chi^2_2 \tag{3.20}$$

    Equation (3.20) is the basis for a test of hypotheses to be introduced in Chapter 4. The cumulative hazard has been proposed (see, for example, Nelson (1972) or Nelson (1982)) as an effective characteristic to use as a basis for the determination of the failure distribution through the use of plotting techniques.
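    The chi-square result can be checked by simulation: draw failure times, evaluate twice the cumulative hazard at each, and compare the sample with a chi-square distribution with 2 degrees of freedom, which has mean 2 and median 2 ln 2. This is a sketch assuming Weibull lifetimes with scale 40 and shape 2; the seed and sample size are arbitrary choices.

```python
import math
import random

def cumulative_hazard(t, theta, beta):
    """Weibull cumulative hazard H(t) = (t/theta)**beta = -ln R(t)."""
    return (t / theta) ** beta

rng = random.Random(2024)      # fixed seed so the run is repeatable
theta, beta = 40.0, 2.0        # assumed scale and shape
n = 100_000

# random.weibullvariate(alpha, beta) takes the scale first and the shape second
samples = [2.0 * cumulative_hazard(rng.weibullvariate(theta, beta), theta, beta)
           for _ in range(n)]

sample_mean = sum(samples) / n
chi2_median = 2.0 * math.log(2.0)   # solves 1 - exp(-y/2) = 0.5
frac_below_median = sum(s <= chi2_median for s in samples) / n

print(sample_mean, frac_below_median)
```

    The sample mean should be close to 2 and about half of the values should fall below 2 ln 2.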


    3.5.1 Explanatory variables or regression models

    In the early reliability analyses, it was usual to assume that the population of devices under study was sufficiently homogeneous so that the lifetimes of the devices could be considered independent and identically distributed random variables. However, in many applications, it is not possible or instructive to obtain devices from homogeneous populations and thus the devices under study or available may differ in their intrinsic properties or in the conditions under which they operate. These differing conditions make it important to consider and add explanatory variables or covariates to the reliability model. These explanatory variables are variables that are associated with each device and are believed to affect the lifetime of the device. These variables may be continuous, as in the case of temperature or voltage, or discrete, as in the case of a particular material used in the device or the presence or absence of a particular factor in the device. These variables can also be classified as constant over time or as time dependent.

    The relationship of these explanatory variables to the lifetime of the device is usually studied by means of a regression model in which the lifetime of the device has a distribution that depends on the explanatory variables. If the amount of information about the lifetime distribution that is available is minimal, then there are appropriate non-parametric analyses that can be used.

    In the following sub-sections, two possible models are outlined where the model allows one to include the effect of explanatory variables on the lifetime of a device. When the effect of the explanatory variables is applied to the hazard function as a multiplicative factor, the resulting model is the proportional hazards model, developed by D. R. Cox (1972). When the effect of the explanatory variables is applied to the time scale as a multiplicative factor, the resulting model is the accelerated life model. These models and the related methods will be considered in more detail in Chapter 9.

    3.5.2 Proportional hazards models

    One successful method of including explanatory variables in the model is to allow a function of the explanatory variables to affect the hazard function of the lifetimes as a multiplicative factor. Thus, a standard or use condition or baseline hazard function is multiplied by a function of the explanatory variables, resulting in a new hazard function which is now a function of the parameters associated with the explanatory variables. Thus:

    $$h(t; x) = \psi(x)\, h(t; x = 0) \tag{3.21}$$

    where h(t; x = 0) is the standard or baseline hazard function and $\psi(x)$ is the function of the vector of explanatory variables x with an associated vector of parameters $\boldsymbol{\beta}$. Since it is required that $\psi(x)$ be positive and that $\psi(0) = 1$, it is usual to define $\psi(x)$ as:

    $$\psi(x) = e^{(x_1\beta_1 + x_2\beta_2 + \cdots + x_r\beta_r)},$$

    when there are r explanatory variables.

    Also, since $R(t) = e^{-\int_0^t h(u)\,du}$, the relationship of the reliability functions in the proportional hazards model is:

    $$R(t; x) = [R(t; x = 0)]^{\psi(x)}$$

    In many applications, the assumption that a change in one or more conditions under which a device is tested or used has a multiplicative effect on the hazard rate of the device is a reasonable and effective assumption. In addition, the techniques associated with the proportional hazards model accommodate censored data, tied values and failure times of zero. These situations, which occur regularly in reliability studies, can cause difficulty in some analyses, but pose no problem in the use of proportional hazards techniques. General non-parametric techniques are also available, so that estimation of the reliability can be achieved without an assumption as to the underlying failure time distribution.
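    The two forms of the proportional hazards model, the multiplied hazard (3.21) and the powered reliability function, can be compared on a small example. Everything numeric here is hypothetical: a Weibull baseline with scale 40 and shape 2, two explanatory variables, and the coefficient values in `coeffs`. The name `psi` for the multiplicative factor is also an assumption.

```python
import math

THETA, BETA = 40.0, 2.0          # assumed Weibull baseline parameters

def psi(x, coeffs):
    """Multiplicative factor exp(x1*b1 + ... + xr*br); equals 1 when x = 0."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, coeffs)))

def baseline_hazard(t):
    """Baseline hazard h(t; x = 0) for the assumed Weibull."""
    return (BETA / THETA) * (t / THETA) ** (BETA - 1)

def baseline_reliability(t):
    """Baseline reliability R(t; x = 0) for the assumed Weibull."""
    return math.exp(-((t / THETA) ** BETA))

def ph_hazard(t, x, coeffs):
    """h(t; x) = psi(x) * h(t; x = 0), as in (3.21)."""
    return psi(x, coeffs) * baseline_hazard(t)

def ph_reliability(t, x, coeffs):
    """R(t; x) = R(t; x = 0) ** psi(x)."""
    return baseline_reliability(t) ** psi(x, coeffs)

coeffs = [0.5, -0.2]             # hypothetical regression parameters
x = [1.0, 2.0]                   # hypothetical explanatory variables

print(psi(x, coeffs))                       # exp(0.1)
print(ph_reliability(40.0, x, coeffs))      # exp(-1) ** exp(0.1)
```

    Because the factor does not depend on t, the ratio of the two hazards is the same at every age, which is what "proportional" refers to.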

    Example 1

    A CMOS integrated circuit memory device is such that its time to failure is assumed to follow the exponential distribution and its failure rate $\lambda$ is a function of temperature according to the Arrhenius model, that is,

    $$\lambda = K e^{-A/T} = e^{\ln K - A(1/T)},$$

    where $\lambda$ is the failure rate, K is the proportionality constant, A is a constant equal to the activation energy divided by Boltzmann's constant and T is the absolute temperature in kelvins. Choose the baseline proportionality constant so that $\ln K - A(1/T_0) = 0$, that is, the baseline failure rate is 1. Then for the proportional hazards model:

    $$h\left(t; x = \frac{1}{T}\right) = \psi(x)\, h(t; x_0), \quad \psi(x) = e^{\ln K - A(1/T)} = e^{a + bx},$$

    that is, the Arrhenius model is a special case of the proportional hazards model.

    The Arrhenius model was developed as an accelerated life model, which it is (see the next section and Chapter **), and it will be seen that for the Weibull distribution, of which the exponential distribution is a member, the accelerated life model and the proportional hazards model are equivalent. In the case of this example, note that the accelerated life model is such that:

    $$R_a(t) = R_u(\psi(x)\,t) = e^{-\lambda(T)\,t} = e^{-e^{\ln K - A(1/T)}\,t} = e^{-e^{a + bx}\,t}$$
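    The equivalence of the two models for the exponential case can be seen numerically. The values below are hypothetical: an Arrhenius constant A of 5000 K, a baseline temperature of 300 K and a stress temperature of 350 K. The constant ln K is set to A/T0 so that the baseline failure rate is 1, as in the example.

```python
import math

A = 5000.0          # hypothetical Arrhenius constant, in kelvins
T0 = 300.0          # hypothetical baseline temperature, in kelvins
LN_K = A / T0       # chosen so that ln K - A/T0 = 0 (baseline failure rate 1)

def failure_rate(temp_k):
    """Arrhenius failure rate exp(ln K - A/T); also the factor psi(x) here."""
    return math.exp(LN_K - A / temp_k)

def baseline_reliability(t):
    """Exponential reliability at the baseline temperature (rate 1)."""
    return math.exp(-t)

def reliability_ph(t, temp_k):
    """Proportional hazards form: baseline reliability raised to psi(x)."""
    return baseline_reliability(t) ** failure_rate(temp_k)

def reliability_al(t, temp_k):
    """Accelerated life form: baseline reliability evaluated at psi(x) * t."""
    return baseline_reliability(failure_rate(temp_k) * t)

t, temp = 0.1, 350.0
print(failure_rate(temp))                       # acceleration relative to 300 K
print(reliability_ph(t, temp), reliability_al(t, temp))
```

    Both forms return the same reliability, since raising exp(-t) to a power and rescaling t by the same factor give the same expression.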

    3.6 Mean time to failure (MTTF), and mean time between failures (MTBF)

    It is important to distinguish between the concepts Mean Time To Failure (MTTF) and Mean Time Between Failures (MTBF). The MTTF is the expected time to failure of a component or system, that is, the mean of the time to failure (TTF) for that component or system. The MTBF is the expected time to failure after a failure and repair of the component or system. With the MTBF, it is easily seen that some assumptions are necessary as to the state of the component after its repair. The terms Time Between Failures and Mean Time Between Failures are usually reserved for the study of repairable systems. Although many practitioners assign the same meaning to the symbols TTF and TBF, as well as treating MTTF and MTBF identically, the practice is discouraged. We will study TBFs and MTBFs extensively in Chapter 11. Throughout this book, it will be assumed that when we use the symbols TTF or MTTF, we are referring to operating time until failure, for both repairable and non-repairable components and systems. We will use the symbols TBF and MTBF only when referring to down times for repairable components and systems.

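    Suppose the random variable (lifetime) T has density f(t) and reliability function R(t). The MTTF is:

    $$\mathrm{MTTF} = E(T) = \int_0^\infty t\, f(t)\,dt = -\int_0^\infty t\, \frac{dR(t)}{dt}\,dt = -\,t R(t)\Big|_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt \qquad (3.22)$$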


    if $\lim_{t \to \infty} t\, R(t) = 0$, which is true for distributions whose mean exists, particularly those of interest in reliability practice. For many of the popular densities of reliability, it will not be necessary to perform integration to determine the mean as it is well-known.
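    The two forms of the MTTF, the integral of t f(t) and the integral of R(t), can be compared numerically. The sketch below assumes an exponential lifetime with a mean of 500 hours (an illustrative choice) and uses a simple trapezoidal rule with a truncated upper limit. The helper name `integrate` and the truncation point are assumptions made for this illustration.

```python
import math


def integrate(func, lower, upper, steps=200_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


theta = 500.0  # assumed mean life, in hours


def density(t):
    """Exponential density f(t) = (1 / theta) * exp(-t / theta)."""
    return math.exp(-t / theta) / theta


def reliability(t):
    """Exponential reliability R(t) = exp(-t / theta)."""
    return math.exp(-t / theta)


upper_limit = 50.0 * theta  # truncation; the remaining tail is negligible

mttf_from_density = integrate(lambda t: t * density(t), 0.0, upper_limit)
mttf_from_reliability = integrate(reliability, 0.0, upper_limit)

if __name__ == "__main__":
    print(mttf_from_density, mttf_from_reliability)
```

    Both integrals come out very close to the assumed mean, as the integration by parts predicts.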

    3.7 Variance of the TTF

    The variance of the TTF is given by

    $$\mathrm{VAR}(T) = \int_0^\infty t^2 f(t)\,dt - \mathrm{MTTF}^2 \qquad (3.23)$$

    Once again, it will often not be necessary to perform the above integration since, for the most part, one will be dealing with well-known TTF densities whose variances are well-established.
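    As an illustration of why the integration can usually be skipped, the sketch below compares a numerical evaluation of the variance integral with the standard closed-form Weibull mean and variance. The shape and scale values are hypothetical, and the trapezoidal helper and truncation point are assumptions of this sketch rather than anything prescribed by the text.

```python
import math


def integrate(func, lower, upper, steps=100_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


# Hypothetical Weibull parameters, chosen only for illustration.
shape = 2.0     # beta
scale = 100.0   # eta, in hours


def weibull_density(t):
    """f(t) = (beta / eta) * (t / eta)**(beta - 1) * exp(-(t / eta)**beta)."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


upper_limit = 10.0 * scale  # truncation; the tail beyond it is negligible here

mean_numeric = integrate(lambda t: t * weibull_density(t), 0.0, upper_limit)
second_moment = integrate(lambda t: t * t * weibull_density(t), 0.0, upper_limit)
variance_numeric = second_moment - mean_numeric ** 2

# Closed forms for the Weibull mean and variance in terms of the gamma function.
mean_closed = scale * math.gamma(1.0 + 1.0 / shape)
variance_closed = scale ** 2 * (
    math.gamma(1.0 + 2.0 / shape) - math.gamma(1.0 + 1.0 / shape) ** 2
)

if __name__ == "__main__":
    print(mean_numeric, mean_closed)
    print(variance_numeric, variance_closed)
```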

    3.8 Mean residual life (MRL)

    The MRL is the mean remaining lifetime of a component given that it has reached age t. It is quite important in the study of systems in which censoring occurs. Censoring is discussed in section 3.9. The MRL is defined as:

    $$r(t) = E(T - t \mid T \ge t) = \frac{\int_t^\infty (u - t)\, f(u)\,du}{R(t)} = \frac{1}{R(t)}\left\{ -\big[(u - t) R(u)\big]\Big|_t^\infty + \int_t^\infty R(u)\,du \right\} = \frac{\int_t^\infty R(u)\,du}{R(t)} \qquad (3.24)$$

    Note that r(0) = E(T) and if the life has a constant hazard rate $\lambda$, then $r(t) = 1/\lambda$. This is further evidence of the memoryless property of the exponential distribution, discussed in Chapter 4. It also follows that:

    $$R(t) = \frac{r(0)}{r(t)} \exp\left( -\int_0^t \frac{du}{r(u)} \right) \qquad (3.25)$$
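    One way to see why (3.25) holds, sketched here under the assumption that r(t) is differentiable and positive, is to start from the last form of the MRL definition, $r(t) R(t) = \int_t^\infty R(u)\,du$, and differentiate both sides with respect to t:

    $$r'(t) R(t) + r(t) R'(t) = -R(t) \quad\Longrightarrow\quad \frac{R'(t)}{R(t)} = -\frac{1}{r(t)} - \frac{r'(t)}{r(t)}$$

    Integrating from 0 to t with R(0) = 1 gives

    $$\ln R(t) = -\int_0^t \frac{du}{r(u)} - \ln\frac{r(t)}{r(0)},$$

    and exponentiating both sides recovers the stated expression for R(t).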

    Thus the time to failure distribution is completely specified by the MRL.

    EXAMPLE: Consider a linear mean residual life given by: $r(t) = \dfrac{1 + mt}{m(k - 1)}$.

    Using (3.25), one finds that the reliability function R(t) is:


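    $$R(t) = \frac{1/[m(k-1)]}{(1 + mt)/[m(k-1)]} \exp\left( -\int_0^t \frac{(k - 1)\, m\,du}{1 + mu} \right) = \frac{1}{1 + mt} \exp\left( -(k - 1) \ln(1 + mu)\Big|_0^t \right) = \frac{1}{1 + mt}\, e^{\ln (1 + mt)^{-k + 1}} = \frac{1}{(1 + mt)^k},$$

    which is the reliability function for the Burr distribution (see Chapter 6) with parameter c = 1.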
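    The example above can be checked numerically. The sketch below assumes the linear MRL r(t) = (1 + m t) / (m (k - 1)) with illustrative values of m and k, evaluates the inversion formula for R(t) using a trapezoidal rule, and compares the result with the closed form 1 / (1 + m t)^k. The function names and parameter values are assumptions of this illustration.

```python
import math


def integrate(func, lower, upper, steps=20_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


def reliability_from_mrl(mrl, t):
    """R(t) = r(0) / r(t) * exp(-integral from 0 to t of du / r(u))."""
    integral = integrate(lambda u: 1.0 / mrl(u), 0.0, t)
    return mrl(0.0) / mrl(t) * math.exp(-integral)


# Hypothetical parameter values, chosen only for illustration.
m = 0.01
k = 3.0


def linear_mrl(t):
    """Linear mean residual life r(t) = (1 + m t) / (m (k - 1))."""
    return (1.0 + m * t) / (m * (k - 1.0))


def burr_reliability(t):
    """Closed form from the example: R(t) = (1 + m t) ** (-k)."""
    return (1.0 + m * t) ** (-k)


if __name__ == "__main__":
    for t in (0.0, 50.0, 200.0):
        print(t, reliability_from_mrl(linear_mrl, t), burr_reliability(t))
```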

    3.9 Mean life with censoring (MLC)

    In (3.24) and (3.25) above, it is important to note that the MRL is measured from time t, the lifetime already achieved without failure. An expression similar to the mean residual life, called the mean life with censoring (MLC), combines the lifetime already achieved with the expected remaining life. Thus MLC = MRL + t.

    $$\mathrm{MLC}(\text{at time } t) = \frac{\int_t^\infty u\, f(u)\,du}{R(t)} \qquad (3.26)$$

    The MLC (at time t) is simply the conditional expectation of the entire life given survival until time t.

    Example: Suppose that a device with an exponential TTF with mean 500 hours is removed from service after 200 hours. What is the MLC at 200 hours? Recall from Section 3.2.1 that the density of the exponential is given by $f(t) = \frac{1}{\theta} \exp(-t/\theta)$, t > 0, where $\theta$ is the mean of the exponential.

    $$\mathrm{MLC}(200) = \frac{\int_{200}^\infty t\, \frac{1}{500}\, e^{-t/500}\,dt}{e^{-200/500}} = \frac{469.224}{0.67032} = 700$$

    Thus a unit removed from service after 200 hours will have a total expected life of 700 hours. This is 500 hours after censoring (removal from service). The MRL(200) from (3.24) is

    $$\frac{\int_{200}^\infty e^{-t/500}\,dt}{e^{-200/500}} = \frac{335.16}{0.67032} = 500$$

    In general, for the exponential, MLC(t) = $\theta + t$. Also, we have verified through this example that MLC(t) = MRL(t) + t. More will be said about the MRL and MLC in Chapter 4.
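    The arithmetic of this example can be reproduced numerically. The following sketch uses the example's values (mean 500 hours, removal at 200 hours) and a trapezoidal rule with a truncated upper limit. The helper name `integrate` and the truncation point are assumptions of this illustration.

```python
import math


def integrate(func, lower, upper, steps=200_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


theta = 500.0   # mean of the exponential, in hours
age = 200.0     # time at which the unit is removed from service

upper_limit = age + 50.0 * theta  # truncation; the remaining tail is negligible

# Numerators of the MLC and MRL expressions, and the common denominator R(age).
mlc_numerator = integrate(lambda t: t * math.exp(-t / theta) / theta, age, upper_limit)
mrl_numerator = integrate(lambda t: math.exp(-t / theta), age, upper_limit)
survival = math.exp(-age / theta)

mlc = mlc_numerator / survival
mrl = mrl_numerator / survival

if __name__ == "__main__":
    print(mlc_numerator, survival, mlc)
    print(mrl_numerator, survival, mrl)
```

    The computed values agree with the text's 700 and 500 hours to within the accuracy of the numerical integration, and their difference equals the age at removal.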


    3.10 Life testing

    3.10.1 Introduction

    Life testing is concerned with measuring the pertinent characteristics of the life of the unit under study. Often this is accomplished by making statistical inferences about probability distributions or their parameters.

    In general, units are put on test, observed and the times of failure recorded as they occur. For example, a group of similar components is placed on test and the failure times observed. Obviously, the times at which individual units fail will vary. Sometimes, assignable causes can be found that contribute to that variation. Suppose some components have been subjected to testing in a high temperature environment; it is possible that such components will fail sooner than those tested in an ambient temperature environment. However, the components at the high temperature will still have different failure times; and, if there are no assignable causes in operation, these components will still have different failure times. That is, it is always assumed that the failure times of the components have some random elements, and the failure time will be assumed to be a random variable with a probability distribution.

    To make statistical inferences about the probability distribution of the failure time random variable, one uses the failure times that have been observed from a life test, ideally a test that has been statistically designed for the purpose of the study. If the failure times of a particular component under a given set of conditions can be adequately described by a probability distribution, there are considerable practical benefits. The failure times can then be used to estimate the parameters of the distribution and to perhaps study the relationship of these parameters to associated explanatory variables. The estimates can be used to make predictions, determine component configurations in systems, determine replacement procedures, specify guarantee periods and make other decisions about the use of the component.

    3.10.2 Failure Times

    Before a study of the effects of a group of failure times is begun, it must be determined precisely what these data values involve. There must be agreement among participating parties about certain characteristics of the failure data. That is, the start of the time measurement, the scale of the time measurement and the definition of a failure are not always consistent in life test situations and must be precisely specified in a given study.

    The time origin in some studies is obvious. In some other studies, however, there is enough confusion about the origin of time measurements that some agreement as to the origin must be reached before the study begins. For example, in some studies the unit under test may have undergone earlier testing in development studies and some agreement must be reached as to whether to include the earlier times on test as running times for the present study.

    The same is true of the time scale. Usually the scale is clock time but other measures may also be used, such as the number of cycles, the mileage to the first puncture of a tire, etc.

    There may also be differing definitions of what constitutes a failure. It is important that one definition be specified or that different modes of failure be recognized and allowed as failures. It is usually informative in the data analysis if the differing modes of failure are distinguished and recorded in the test results. For many components, failure is catastrophic and the definition of a failure is obvious. But for some components, the performance slowly degrades and the amount of degradation to be judged a failure must be defined.

    3.10.3 Censoring of Data

    One of the circumstances that has traditionally caused concern and some difficulty in statistical studies has been the occurrence of missing observations. Although techniques have been proposed for accommodating missing observations in most types of statistical analyses, the problem of missing or incomplete observations in general does not seem to occur as often as in modern reliability studies. With highly reliable components, it is unusual if all the components have failed by the end of the time allotted for the test. In human survival studies and in some engineering studies, some of the units on test may be withdrawn from the test for various reasons or may fail due to a cause that is not under study. Such incomplete data observations in reliability studies are called censored items. Although the failure time information on such an item is incomplete, there is usually still some information in the time data that is available on the item and so the censoring time should always be recorded in a study.

    Censoring is often distinguished according to type and order. The type of censoring reflects the rule for censoring and influences which variables in the study are random. A consideration of which variables are random affects the distributional assumptions of estimates and will be discussed later.

    Type I censoring is the rule that specifies that the testing is terminated at a specific, fixed time $t_c$. In this case, the time $t_c$ is a fixed value and the number of units which are censored in a study is a random variable. Type I censoring is the most common type of censoring used in practice because it is the easiest to implement since the duration of the study is determined and fixed beforehand. However, it is not the most convenient in terms of the distributional considerations.

    Type II censoring is the rule that specifies that the testing is terminated when a pre-set number of units, say r, have failed. In the case of Type II censoring the time at which the test is stopped is a random variable, that is, the time at which the rth failure occurred. This type of censoring is less practical because it does not allow an upper bound on the total time duration. It does, however, result in a more convenient theory.

    The order of censoring indicates whether there is a single rule or there are multiple rules for censoring in a test. Multiply censored data are made up of failure times and a mixture of censored times.

    For example, n units are on test:

    a) The test is terminated at $t_c = 100$ hours and there are r failures. The number of failures, R, is a random variable as is the total test time, TT:

    $$TT = \sum_{i=1}^{r} t_i + (n - r)\, t_c \qquad \text{(Type I, single)}$$

    b) The test is terminated when the rth, say 10th, failure occurs, which is at time $t_{(10)}$. The time $t_{(10)}$ is a random variable as is the total test time:

    $$TT = \sum_{i=1}^{10} t_i + (n - 10)\, t_{(10)} \qquad \text{(Type II, single)}$$

    c) The test is terminated at $t_c = 100$ hours and there are r failures. In addition, two units have been removed while still functioning at 50 hours. (Type I, multiple)

    More generally, for the ith unit from a sample of n on life test, one could record the observation $(x_i, d_i)$, where $x_i$ is the failure time if the indicator variable $d_i = 1$ and $x_i$ is the censored time if $d_i = 0$. In Type I censoring, all the $x_i$ values are equal to $t_c$ when $d_i = 0$ and when $d_i = 1$, the $x_i$ values have the values $t_i$ which are observations of the random failure variable T. In Type II censoring, the censoring time is a random variable, the rth order statistic $T_{(r)}$, if the test is stopped at the time of the rth failure.

    The $(x_i, d_i)$ notation can handle multiple censoring mechanisms also, and will be particularly useful in the maximum likelihood derivations of estimators.

    It is important that the censoring mechanism remain independent of the failure mechanism. It would be impossible to obtain meaningful data if units were censored when they appeared to have a high probability of failure at the time of censoring. Any unit censored at a time $t_c$ should be representative of all the units under the same test conditions at time $t_c$.
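    The (x, d) notation described in this section maps directly onto a list of pairs. The sketch below is one possible encoding, using hypothetical lifetimes with no ties. It applies a Type I rule (fixed stopping time) and a Type II rule (stop at the rth failure) to the same lifetimes, then computes the number of failures and the total time on test, counting the running time of censored units. The function names are assumptions made for this illustration.

```python
def censor_type_1(failure_times, t_c):
    """Type I rule: stop at the fixed time t_c; the number of failures is random."""
    return [(min(t, t_c), 1 if t <= t_c else 0) for t in failure_times]


def censor_type_2(failure_times, r):
    """Type II rule: stop at the r-th ordered failure; the stopping time is random."""
    t_r = sorted(failure_times)[r - 1]
    return [(min(t, t_r), 1 if t <= t_r else 0) for t in failure_times]


def count_failures(observations):
    """Number of observed failures, i.e. the sum of the indicators d."""
    return sum(d for _, d in observations)


def total_time_on_test(observations):
    """Sum of all recorded times x, for failed and censored units alike."""
    return sum(x for x, _ in observations)


# Hypothetical true lifetimes in hours, for illustration only.
lifetimes = [23.0, 151.0, 64.0, 310.0, 97.0, 42.0, 205.0, 118.0]

type_1 = censor_type_1(lifetimes, t_c=100.0)
type_2 = censor_type_2(lifetimes, r=3)

if __name__ == "__main__":
    print(type_1, count_failures(type_1), total_time_on_test(type_1))
    print(type_2, count_failures(type_2), total_time_on_test(type_2))
```

    Multiply censored data fit the same structure: each unit simply carries its own censoring time in x when d is 0.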

    3.11 Reliability data from the field

    After release of a product, most of the data provided to the reliability engineer come from the field (from actual use conditions). In this case, the data are almost always multiply-censored. This means there is a mixture of failure times and non-failure running times, i.e., there is no particular order in which the failed and non-failed units occur. They are completely intermixed. Many of the topics in this book deal with multiply-censored data.

    3.12 Reliable life

    Sometimes, instead of computing the reliability at time t, it is of interest to compute the time for which the reliability is R. This value is called the reliable life for reliability R, written $t_R$. Reliable life is a useful way of allowing engineers to specify reliability goals or targets as well as specifying intolerable reliability values. More will be said about this in Chapter 4. It gives the time at which 100R% of the components in question are functioning and is equivalent to determining the $100(1 - R)$th percentile of the time to failure distribution. Estimates of the reliable life can be obtained from one-sided tolerance limits. If the reliability function is known, the reliable life can be obtained by inverting the reliability function at the appropriate value.


    EXAMPLE: Suppose the reliability function for the life of a particular component is given by $R(t) = e^{-0.01 t^2}$, where t is in hours. The reliable life for reliability R = 0.90 is:

    $$t_{0.90} = \sqrt{\frac{-\ln(0.90)}{0.01}} = \sqrt{10.536} = 3.246 \text{ hours.}$$
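    When the reliability function cannot be inverted in closed form, the inversion can be done numerically. The sketch below is one simple approach, assuming R(t) is continuous and decreasing from 1 at t = 0: it brackets the solution and then bisects. The function name `reliable_life` is an assumption of this illustration. Applied to the example above, it can be compared with the closed-form answer.

```python
import math


def reliable_life(reliability, target, upper=1.0, tol=1e-10):
    """Solve reliability(t) = target by bisection.

    Assumes reliability is continuous and decreasing, with reliability(0) = 1
    and 0 < target < 1.
    """
    # Expand the bracket until the reliability at `upper` is at or below target.
    while reliability(upper) > target:
        upper *= 2.0
    lower = 0.0
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if reliability(mid) > target:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


def example_reliability(t):
    """Reliability function from the example: R(t) = exp(-0.01 * t**2)."""
    return math.exp(-0.01 * t * t)


t_90 = reliable_life(example_reliability, 0.90)
closed_form = math.sqrt(-math.log(0.90) / 0.01)

if __name__ == "__main__":
    print(t_90, closed_form)
```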

    3.13 Other responses

    Although the failure time of a unit on life test is the primary response to be studied, other responses with associated probability distributions can occur. In some test situations, the time of failure is not relevant; only whether the unit fails or not is of importance. This situation results in what is called attribute life test data. It is often more economical and straightforward to run an attribute test and analyze attribute data but there are obvious disadvantages in that less information is obtained than if one observed the failure times. The analysis of attribute test data will be treated in a later chapter but the emphasis of the procedures in this text will be the cases where a variable, such as failure time, will be observed.

    Another type of data that can arise in a life test is called quantal-response data. Quantal-response data is observed when the failure time itself cannot be observed and a unit is only inspected once at a certain time to see if it has failed or not.

    A more usual situation in life testing, where the failure time itself cannot be observed, is the case of interval or grouped data responses. In this case, the units are inspected more than once, but one only knows whether a unit failed in an interval between inspections. Techniques for this type of data will be discussed for some of the graphical data analysis procedures. When interval data occurs, it is often with large data sets and in these situations the graphical analyses do well.

    3.14 Problems in reliability mathematics

    Problem 3.1 If $h(t) = 3t^2 - 2t$,
    a) What are f(t) and R(t)?
    b) What are the restrictions on t?
    c) R(2)?

    Problem 3.2 A system has a time to failure (TTF) density $f(t) = \dfrac{6t}{(1 + t)^4}$. Find R(t), h(t) and H(t).

    Problem 3.3 Find E(T) for a system whose TTF density is $16\, t\, e^{-4t}$.

    Problem 3.4 For a component having the Rayleigh TTF density, i.e., $f(t) = \dfrac{t}{a^2}\, e^{-t^2/(2a^2)}$,
    a) find E(T);
    b) find R(t) and h(t);
    c) find the reliable life $t_{0.90}$.

    Problem 3.5 If $f(t) = \dfrac{t}{25}\, e^{-0.2t}$,
    a) what is the probability that this part fails during the first 10 hours of life?
    b) what is the probability that it fails during the interval (10, 20)?
    c) what is the conditional probability that it fails during the interval (10, 20), given that it has survived until 10 hours?
    d) what is the MRL at t = 10?

    Problem 3.6 h(t) = t12 ,

    a) what is H(t)?
    b) what is R(3)?

    Problem 3.7 A component has TTF density given by $f(t) = k\, t^4 e^{-5t}$, t > 0. Find:
    a) k
    b) R(t)
    c) h(t)
    d) MTTF

    Problem 3.8 Suppose that $R(t) = t^2 \exp(-9t^2)$, t > 0.
    a) What is the MTTF? (Numerical answer required)
    b) What is the expression for f(t)?
    c) Is the random variable, T, a time-to-failure random variable? Check by evaluating F(0) and $F(\infty)$.
    d) What is H(t)?

    Problem 3.9 Find the mean time-to-failure of the time-to-failure density given by $f(t) = \dfrac{t}{4}\, e^{-t^2/8}$, t > 0. Numerical answer required. Note that $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$.

    Problem 3.10 Suppose that $f(t) = \dfrac{t^3 e^{-t/4}}{1536}$, t > 0.
    a) What is the MTTF?
    b) What is the probability of failure before t = 100?

    Problem 3.11 If T is discrete, say cycles, with numbers small enough so that a continuous approximation is not valid, and $P\{T = t_i = i\} = \dfrac{\lambda^i}{i!}\, e^{-\lambda}$, i = 0, 1, ...,
    a) plot $h(t_i)$, $R(t_i)$ and $H(t_i)$ for i = 0, 1, ..., 10 and $\lambda = 2$,
    b) show that $h(t_i)$ is monotone increasing for any $\lambda$.

    Problem 3.12 $f_T(t) = \dfrac{1}{t_u}$, $0 \le T \le t_u$, that is, f(t) is uniform in the time interval $[0, t_u]$. Find:
    a) R(t)
    b) h(t)
    c) MTTF
    d) MTTF using the right hand side of (3.22)
    e) MRL
    f) Is the process represented by f(t) an aging process?

    Problem 3.13 Consider a process where the components are replaced at a set time $t_r$, or replaced at failure, if failure occurs before $t_r$. What is the mean life of a component of this type, in terms of the reliability function?

    Problem 3.14 In Problem 3.13, the cost of replacement of such a component at failure is $C_f$ and at replacement, $C_r$. What is the average cost per unit time per component?

    Problem 3.15 If the components in Problem 3.14 have constant hazard, show that the best strategy is to replace on failure only. Is this also true for components with decreasing hazard rate?

    Problem 3.16 If the components in Problem 3.15 have f(t) as in Problem 3.12 with $t_u = 50$ hours and $t_r = 30$ hours, find the cost per unit time per component. If $2C_r = C_f$, can you find a better $t_r$?

    Problem 3.17 What is the mean of the random variable with the following TTF density?

    $$f(t) = \frac{a^b}{\Gamma(b)}\, e^{-a/t}\, \frac{1}{t^{b+1}}, \qquad t > 0.$$


    Problem 3.18 Suppose that $R(t) = t^2 \exp(-9t^2)$. What is the MTTF?

    Problem 3.19 The time to failure density for a particular component is given by $f(t) = \dfrac{1}{124416}\, t^3 \exp(-t/12)$, t > 0. What is the probability of failure before 120 hours?

    Problem 3.20 For an exponential distribution with a mean of 500 hours, find
    a) P[failure in (300, 400) | no failure in (0, 300)].
    b) P[failure in (600, 700) | no failure in (0, 600)].

    Problem 3.21 Write expressions for H(t) for each of the following densities:
    a) exponential b) Weibull c) normal d) lognormal e) gamma

    Problem 3.22 Estimate the parameters of the lognormal distribution fitted to the following:

    Data
    11.0  23.5   7.6   5.2  10.6
    25.8  28.6   3.5   6.7   8.1
    18.7   5.7   4.3   6.9   3.3
    10.4   4.4   3.5   6.5   6.3
    23.6   2.0   7.4   9.4  17.8
     8.3   8.8   1.9  10.4  13.2
