Appendix A: Review of Probability Theory

A.1 Introduction: What Is Probability?

What's in a word? The words "probably" and "probability" are used commonly in everyday speech. We all know how to interpret expressions such as "It will probably rain tomorrow," or "Careless smoking probably caused that fire," although the meanings are not particularly precise. The common usage of "probability" has to do with how closely a given statement resembles truth. Note that in common usage, it may be impossible to verify whether the statement is true or not; that is, the truth may not be knowable. Informally, we use the terms "probable" and "probability" to express a likelihood or chance of truth.

While these common usages of the term "probability" are effective in communicating ideas, from a mathematical point of view, they lack the precision and standardization of terminology to be particularly functional. Thus scientists and mathematicians have developed various theories of probability to address the needs of scientific analysis and decision making. We will use a particular theory that has its origins in the early twentieth century and is now (by far) the most widely used theory of probability. This theory provides a formal structure (entities, definitions, axioms, etc.) that allows us to use other well-developed mathematical concepts (limits, sums, averages, etc.) in a way that remains consistent with our understanding of physical principles. All theories have limitations. Our theory of probability, for instance, will not help us answer questions like, "What is the probability that individual X is guilty of a crime?" or "What is the probability that pigs will fly?" Fortunately, a well-developed theory has well-defined limitations, and we should be able to identify when we have overstepped the bounds of scientific validity.

As we discuss these concepts, keep in mind that it is "probably" inevitable that we will at times encounter conflicts between the colloquial meanings of words and their formal mathematical definitions. These conflicts are natural and are no cause for alarm!



A.2 Random Experiments and Probability Spaces: The Building Blocks of Probability

Our theory of probability begins with the concept of a random experiment. The idea is that we intend to perform an experiment that results in (precisely) one of a group of outcomes. We use the term random experiment because we cannot be certain in advance about the outcome. That is, we can identify all possible outcomes of the experiment, but we do not know in advance which particular outcome will occur. The experiment is assumed to be repeatable, in the sense that we could recreate the exact conditions of the experiment. If we repeat the experiment, however, we are not guaranteed that the same outcome will occur. To effectively describe the random experiment, we must be able to: (i) identify its outcomes, (ii) characterize the information available to us about the outcome of the experiment, and (iii) quantify the likelihood that the experiment results in a particular incident. In mathematical terminology, a random experiment will be identified with (actually, is equivalent to) a probability space. A probability space consists of three entities: a sample space (we will call it Ω), an event space (we'll call it F), and a probability measure (we'll call it P). Let us discuss each of these entities in turn.

A.2.1 Sample Space

Formally, we define the sample space Ω to be the collection of all possible outcomes. Elements of the sample space are distinct and exhaustive (i.e., on any given performance of the experiment, one and only one outcome occurs), and we can think of the sample space as a set of distinct points. The sample space may be discrete (countable or denumerable) or continuous (uncountable or nondenumerable); likewise, it may be finite or infinite.

Example A.1 The experiment consists of tossing a coin three times consecutively. Assuming that we do not allow the possibility of a coin landing on its side (H = heads, T = tails), the sample space can be identified as {(HHH), (HHT), (HTH), (THH), (HTT), (THT), (TTH), (TTT)}. The sample space is discrete and finite.

Example A.2 The experiment consists of two players (A and B) playing hands of poker for $1 per hand. Each player begins with $5, and the game continues until one of the players is bankrupt. Here the sample space can be identified as all sequences of the elements A and B such that the number of one letter does not exceed the number of the other letter by more than 5. The sample space is discrete and infinite.

Example A.3 The experiment consists of measuring the diameter of every 5th steel cylinder that leaves a manufacturing line. The sample space consists of sequences of real numbers; it is continuous and infinite.


To reiterate, a sample space is a set of outcomes; it obeys the typical rules that obtain with sets (unions, intersections, complements, differences, etc.).

A.2.2 Event Space

The second element of a probability space is a collection of so-called events F. Events themselves consist of particular groups of outcomes. Thus the set of events is a collection of subsets of the sample space. Events can be thought of as characteristics of outcomes that can be identified once the experiment has been performed; that is, they are the "information scale" at which we can view the results of an experiment. In many, but not all, experiments, we can identify individual outcomes of an experiment; in some experiments we can identify only certain characteristics of individual outcomes. Thus the event space characterizes the information that we have available to us about the outcomes of a random experiment; it is the mesh or filter through which we can view the outcomes. Some terminology: we say that "an event has occurred" if the outcome that occurred is contained in that event.

The specification of the event space is not completely arbitrary; in order to maintain consistency, we need to instill some structure (rules) on the event space. The structure makes perfect intuitive sense. First, if we are able to observe that a particular group of outcomes occurred, we should be able to observe that the same group of outcomes did not occur. This means that if a set of outcomes F is in the event space, then the set of outcomes F̄ (the complement of F) is also in the event space. Secondly, if we are able to determine if a set of outcomes F1 occurred, and we are able to determine if a group of outcomes F2 occurred, then we should be able to determine if either F1 or F2 occurred. That is, if F1 and F2 are in the event space, then F1 ∪ F2 must be in the event space. Finally, we must be able to observe that some outcome occurred; that is, Ω itself must be an event. Note that since Ω is in the event space, so is φ, the empty set (also called the impossible event). With these rules for the event space, the smallest event space that we can work with is F = {Ω, φ}.

Example A.4 Suppose the random experiment is as in Example A.1, and suppose that we are able to observe the outcome of each individual coin toss. Then the event space consists of all subsets of the sample space (the power set of the sample space).

Example A.5 Now suppose the random experiment is as in Example A.1, except that we are able to observe only the outcome of the last toss. Then the event space consists of Ω, φ, and the sets {(HHH), (HTH), (THH), (TTH)} and {(HHT), (HTT), (THT), (TTT)}.

Note that an event can be determined either by listing its elements or by stating a condition that its elements must satisfy; e.g., if the sample space of our experiment is as in Example A.1, the set {(HHT), (HTH), (THH)} and the statement "exactly two heads occurred" determine the same event.


A.2.3 Probability Measure

The final element of our probability space is an assignment of probabilities for each event in the event space. Such an assignment is described by a function P that assigns a value to each event. This value represents our belief in the likelihood that the experiment will result in an event's occurrence. The choice of this function quantifies our knowledge of the randomness of the experiment. It is important to remember that a probability measure lives on (assigns values to) events rather than outcomes, but remember, also, that there are certain situations where individual outcomes can also be events; such events are called atomic events.

Definition 55 A sample space Ω of a random experiment is the set of all possible outcomes of the experiment.

Definition 56 An event space F of a random experiment is a collection of subsets of the sample space that satisfy

• Ω is in F.
• If F is in F, then F̄ is in F.
• If F1 and F2 are in F, then F1 ∪ F2 is in F.

Definition 57 A probability measure P for a random experiment is a function that assigns a numerical value to each event in an event space such that

• If F is an event, 0 ≤ P(F) ≤ 1.
• P(Ω) = 1.
• If F1, F2, . . . are mutually exclusive events, then

P(F1 ∪ F2 ∪ ···) = ∑_i P(Fi).

These rules guarantee that a probability measure is meaningful and workable and are often referred to as the "Axioms of Probability." Beyond these rules, how we determine which probability measure to use for a given random experiment is a modeling issue rather than a mathematical one. Many different choices for probability measures are possible, depending on how we believe the probabilistic mechanism producing the outcomes works.

To summarize, we have fully described any random experiment if we have specified a probability space {Ω, F, P} consisting of a sample space Ω, an event space F, and a probability measure P.

Example A.6 Consider again the random experiment described in Example A.1 and the event space described in Example A.4. If we believe that the coin we are using is fair (unbiased), it should follow that each of the atomic events should have the same probability (i.e., be equally likely). If on any given toss, a head is twice as likely as a tail, the probability of the event {(HHH)} should be eight times the probability of the event {(TTT)}, and the events {(HTH)} and {(THH)} should have the same probability.


The probability axioms lead to several elementary properties of probability. These properties follow easily by considering simple set operations.

Property 1 For any event F, P(F̄) = 1 − P(F).

Proof F and F̄ are mutually exclusive events, and Ω = F ∪ F̄. Hence by Axioms 2 and 3,

1 = P(Ω) = P(F ∪ F̄) = P(F) + P(F̄),    (A.1)

and hence P(F̄) = 1 − P(F).

Property 2 If F1 and F2 are any events (not necessarily mutually exclusive), then

P(F1 ∪ F2) = P(F1) + P(F2) − P(F1 ∩ F2).    (A.2)

Proof By simple set properties,

F1 ∪ F2 = F1 ∪ (F̄1 ∩ F2) and F2 = (F1 ∩ F2) ∪ (F̄1 ∩ F2).    (A.3)

The unions on the right-hand side of each equation are of mutually exclusive events, so by Axiom 3,

P(F1 ∪ F2) = P(F1) + P(F̄1 ∩ F2)
P(F2) = P(F1 ∩ F2) + P(F̄1 ∩ F2)

Solving both equations for P(F̄1 ∩ F2) gives the desired result.

Property 3 If F1, F2, . . . , Fk are any events,

P(F1 ∪ F2 ∪ ··· ∪ Fk) = ∑_i P(Fi) − ∑_{i<j} P(Fi ∩ Fj) + ··· + (−1)^{k+1} P(F1 ∩ F2 ∩ ··· ∩ Fk).

Proof Follows from Property 2 by mathematical induction.

A.2.4 Conditional Probability and the Law of Total Probability

The probability measure ensures that we have assigned a probability to every event in the event space of our random experiment. In many situations, we may be able to observe partial information about the outcome of an experiment in terms of the occurrence of an event. We would like to have a consistent way of "updating" the probabilities of other events based on this information. To this end, we give an elementary definition of conditional probability.


Definition 58 Given events F1 and F2, the conditional probability of F2 given that F1 occurs is given by

P(F2|F1) = P(F1 ∩ F2) / P(F1).    (A.4)

Of course, this definition only makes sense if P(F1) > 0. For now, we leave the conditional probability undefined if P(F1) = 0, but there are other ways to consistently define the conditional probability in this case.

Now consider a set of events F1, F2, . . . that form a partition of the sample space Ω; that is, the events are mutually exclusive (Fi ∩ Fj = ∅, i ≠ j) and exhaustive (∪_j Fj = Ω). The number of events in the partition may be finite or infinite. For any event A, by the properties of the partition, we can write

A = [A ∩ F1] ∪ [A ∩ F2] ∪ ··· ,    (A.5)

and since the [A ∩ Fi]'s are mutually exclusive, we have

P(A) = P(A ∩ F1) + P(A ∩ F2) + ··· ,    (A.6)

and using the definition of conditional probability,

P(A) = P(A|F1)P(F1) + P(A|F2)P(F2) + ··· = ∑_i P(A|Fi)P(Fi).    (A.7)

This result is known as the Law of Total Probability and is very useful.
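
As a quick illustration of how Eq. A.7 is used in practice, the short Python sketch below computes P(A) from a hypothetical three-event partition; the numerical values are invented solely for illustration.

# Law of Total Probability, Eq. (A.7): P(A) = sum_i P(A|Fi) P(Fi).
# The partition probabilities and conditional probabilities below are hypothetical.
priors = [0.5, 0.3, 0.2]            # P(F1), P(F2), P(F3); they must sum to 1
conditionals = [0.10, 0.40, 0.25]   # P(A|F1), P(A|F2), P(A|F3)

p_a = sum(pa_given_f * pf for pa_given_f, pf in zip(conditionals, priors))
print(p_a)  # 0.5*0.10 + 0.3*0.40 + 0.2*0.25 = 0.22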

A.3 Random Variables

A.3.1 Definition

Once we have a probability space that describes our random experiment, there are many things that we can "measure" about each outcome in the sample space. These measurable properties, which depend on the actual outcome realized by the experiment, are termed random variables.

Definition 59 A random variable X is a function that assigns a real number X(ω) to each element ω of the sample space such that for any collection of real numbers C,

X^{−1}(C) = {ω : X(ω) ∈ C}    (A.8)

is an event (i.e., is in F).


Mathematically, an assignment of a numerical value to an element of the sample space is a mapping (function) of the sample space to the real line. Such a mapping is called a random variable provided we can "trace back" values of the function to events. Formally, a random variable is a function whose domain is the sample space and whose range is some subset of (or possibly the whole of) the real line; that is, a random variable assigns a real number to each element of the sample space. A random variable must have the property that, if we take a particular range of numerical values, the collection of outcomes that gets assigned a value in that range is an event. This last property is called measurability and ensures that our probability space is "rich enough" to support the random variable.

Example A.7 Suppose our experiment consists of selecting an individual at random from a classroom with n students. A reasonable choice for a probability space for this experiment might be to choose Ω to be the list of students' id numbers (to make sure each student is uniquely identified), F to be the power set of Ω, and to choose P such that it assigns value 1/n to each atomic event. Now to each outcome in the sample space (each student), assign a numerical value equal to the student's height, weight, cumulative GPA, and score on the last exam.

Example A.8 Consider the random experiment of Example A.1, and suppose we define a function X to be the number of heads in all three tosses. Then X((HHH)) = 3, X((HHT)) = X((HTH)) = X((THH)) = 2, X((HTT)) = X((THT)) = X((TTH)) = 1, X((TTT)) = 0. X is a random variable for the event space described in Example A.4 but not for the event space described in Example A.5.

Random variables are termed discrete if the set of possible values they can take on is a discrete set and continuous if it is a continuous set.

Example A.9 A manufacturing facility contains a sophisticated CNC pipe bending station. In-process jobs arrive at the bending station from an upstream cutting station, and after processing at the bending station, are placed on a conveyor that takes them to a drilling station. Let X be the number of jobs waiting for processing at the machine at the beginning of a particular day. X is a discrete random variable. Let Y be the amount of time between the first two departures from the machine on a given day. Y is a continuous random variable.

A.3.2 Events Defined by Random Variables

A probability measure is part of the description of a random experiment. A probability measure "lives on" the event space that we have chosen for our random experiment. How do we make a connection between probability and random variables? The answer lies in constructing appropriate events using random variables.


Let X be a random variable defined on a probability space (Ω, F, P). For simplicity, suppose X is discrete. Take any real number x, and consider the set

Fx = {ω ∈ Ω : X(ω) = x}.    (A.9)

Fx is an event, and therefore it makes sense to talk about P(Fx). That is, for any real number x, we can use the random variable X to construct an event by considering all sample points whose X-value is x. Such an event is called an event generated by the random variable X.

We will use the notation {X = x} to indicate the event {ω ∈ Ω : X(ω) = x}, and we will write P(X = x) to mean P({ω ∈ Ω : X(ω) = x}). Similarly, we can define events such as {X < x}, {X ≥ x}, and even such events as {X ≤ y, X ≥ x} and {y ≤ X ≤ x}. As long as we associate statements about random variables with events in the event space and use the rules for probability measure, we have no difficulty in assigning the proper probabilities to any event generated by a random variable.

A.3.3 Distribution Function

Suppose we have defined a random variable X on a probability space. For a given x, we know how to interpret the event {X ≤ x}, and how to evaluate its probability. As x varies over the real line, P(X ≤ x) defines a function of x; this function is called the cumulative distribution function (distribution function or cdf for short) and it plays a very important role in probability theory.

Definition 60 The distribution function of a random variable X is defined by

F(x) = P(X ≤ x), −∞ < x < ∞.    (A.10)

Note that knowing the cdf of a random variable is equivalent to knowing the probability of each and every event generated by that random variable.

The cdf of any random variable has a number of important properties.

• The cdf is right continuous.
• The cdf is nondecreasing.
• F(−∞) = 0, F(∞) = 1.

The cdf of a discrete random variable is a step function; the cdf of a continuous random variable is a continuous function.

Example A.10 Let X be the number of heads in three consecutive tosses of a fair coin. Then

X(ω) = 0 if ω = (TTT);
       1 if ω ∈ {(TTH), (THT), (HTT)};
       2 if ω ∈ {(HHT), (HTH), (THH)};
       3 if ω = (HHH).


Since the coin is fair, the probability measure assigns the following values to the events {X = x}:

P(X = x) = 1/8 if x = 0;
           3/8 if x = 1;
           3/8 if x = 2;
           1/8 if x = 3.

and therefore, the distribution function of X is

F(x) = 0   if x < 0;
       1/8 if 0 ≤ x < 1;
       1/2 if 1 ≤ x < 2;
       7/8 if 2 ≤ x < 3;
       1   if x ≥ 3.

Example A.11 Let X be an exponentially distributed random variable. Then

P(X ≤ x) = F(x) = 1 − e^{−λx}, x > 0.    (A.11)
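
The two cdfs above are easy to evaluate directly. The following small Python sketch does so; the rate λ = 1 used for the exponential case is an arbitrary choice made only for illustration.

import math

def F_coin(x):
    # cdf of Example A.10: number of heads in three fair coin tosses
    mass = [(0, 1/8), (1, 3/8), (2, 3/8), (3, 1/8)]
    return sum(p for k, p in mass if k <= x)

def F_exp(x, lam=1.0):
    # cdf of Example A.11; the rate lam is an assumed value
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

print(F_coin(1.5))  # 0.5, i.e. F(x) = 1/2 for 1 <= x < 2
print(F_exp(2.0))   # 1 - e^{-2} ≈ 0.8647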

A.3.4 Expectation and Moments

We have seen that its distribution function completely specifies the probabilistic structure of a random variable. Only the distribution function is capable of giving us the probability that the random variable takes on values in a particular range. We may, however, be interested in other, less detailed, information about the structure of the random variable. For instance, we might want to know the 95th percentile (value α such that P(X ≤ α) = 0.95), the median (value β such that P(X ≤ β) = P(X ≥ β)), or the mean (probabilistic average) of the random variable. Each of these entities is a number (rather than a function) and contains some useful information about the random variable. In this section, we will define a probabilistic average that will be of great use to us in characterizing random variables.

The expectation operator E of a random variable X is defined as

E[X] = ∫ X(ω) P(dω),    (A.12)

or in terms of the distribution function

E[X] = ∫_{−∞}^{∞} x dF(x).    (A.13)


Expectation is an averaging operation; as you can see from the right-hand side of the definition, it "weights" values assigned by the random variable by their "likelihood" as assigned by the probability measure. We can define the expectation for functions of random variables similarly:

E[φ(X)] = ∫ φ(X(ω)) P(dω) = ∫_{−∞}^{∞} φ(x) dF(x).    (A.14)

We refer to E[X] as the mean of X, and we often denote it by μ. If we choose φ(X) = X^k, we have

E[X^k] = ∫_{−∞}^{∞} x^k dF(x),    (A.15)

where E[X^k] is called the kth moment about zero of the random variable X. If we choose φ(X) = (X − μ)^k, we have

E[(X − μ)^k] = ∫_{−∞}^{∞} (x − μ)^k dF(x),    (A.16)

where E[(X − μ)^k] is called the kth moment about the mean of the random variable X.

A.3.5 Discrete Random Variables

If X is a discrete random variable, then F(x) is a step function, and dF(x) is computed as a difference F(x) − F(x−). Note that this difference will be zero except at jump points (steps) of F(x). In this case, dF(x) is known as the mass function p(x) and is defined for each jump point x of F(x). Notice that

p(x) = dF(x) = F(x) − F(x−) = P(X ≤ x) − P(X < x) = P(X = x).    (A.17)

Thus for a discrete random variable X, E[X] is calculated as

E[X] = ∑_x x p(x).    (A.18)

Example A.12 Consider the random variable X in Example A.10. Here

dF(x) = p(x) = 1/8 if x = 0;
               3/8 if x = 1;
               3/8 if x = 2;
               1/8 if x = 3.


Then

E(X) = ∑_x x p(x) = 0 · (1/8) + 1 · (3/8) + 2 · (3/8) + 3 · (1/8) = 3/2    (A.19)

and

E(X^2) = ∑_x x^2 p(x) = 0 · (1/8) + 1 · (3/8) + 4 · (3/8) + 9 · (1/8) = 3.    (A.20)
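
The same arithmetic can be checked mechanically. The Python fragment below recomputes the two moments of Example A.12 directly from the mass function; it is only a numerical sanity check of Eqs. A.19 and A.20.

# Moments of the discrete random variable of Example A.12,
# computed directly from its mass function p(x).
p = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mean = sum(x * px for x, px in p.items())              # E[X]
second_moment = sum(x**2 * px for x, px in p.items())  # E[X^2]

print(mean)           # 1.5
print(second_moment)  # 3.0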

A.3.6 Continuous Random Variables

If X is a continuous random variable, then F(x) is a continuous function. Thus it has a derivative f(x); i.e.,

dF(x) = f(x) dx.    (A.21)

The derivative f(x) = (d/dx) F(x) is called the density function of X. Thus, for a continuous random variable X, E[X] is calculated by

E[X] = ∫_{−∞}^{∞} x f(x) dx.    (A.22)

Example A.13 Consider the random variable X in Example A.11. Here

f(x) = dF(x)/dx = λe^{−λx}.    (A.23)

This gives

E[X] = ∫_0^∞ x λe^{−λx} dx = 1/λ    (A.24)

and

E[X^2] = ∫_0^∞ x^2 λe^{−λx} dx = 2/λ^2.    (A.25)

A.3.7 Variance and Coefficient of Variation

The second moment about the mean, E[(X − μ)^2], is known as the variance of the random variable X and is of great importance in both probability and statistics. It provides a simple measure of the dispersion of X around the mean. The variance of X is written as Var(X) and is often denoted by σ^2. Variance can be computed in terms of the second moment of X by

Var(X) = E[X^2] − (E[X])^2.    (A.26)

The square root of the variance is known as the standard deviation, StDev(X), and is denoted by σ.

Also of great importance is the ratio of the standard deviation to the mean of the random variable, known as the coefficient of variation of X:

COV = StDev(X)/E[X] = σ/μ.    (A.27)
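
For the exponential random variable of Example A.13 these quantities follow immediately from Eqs. A.24-A.27, and the sketch below also checks them by simulation. The rate λ = 2 and the sample size are arbitrary choices made only for illustration.

import numpy as np

lam = 2.0                      # assumed rate parameter
mean = 1.0 / lam               # E[X], Eq. (A.24)
second_moment = 2.0 / lam**2   # E[X^2], Eq. (A.25)
var = second_moment - mean**2  # Eq. (A.26)
cov = np.sqrt(var) / mean      # Eq. (A.27); equals 1 for any exponential
print(var, cov)                # 0.25 1.0

# Monte Carlo check of the variance and coefficient of variation
x = np.random.default_rng(0).exponential(scale=1.0 / lam, size=100_000)
print(x.var(), x.std() / x.mean())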

A.4 Multiple Random Variables: Joint and Conditional Distributions

In many applications, we will be interested in studying two or more random variables defined on the same probability space. For instance, in a manufacturing environment, we might be interested in studying the number of jobs waiting to be processed (the work-in-process inventory, or wip) at n machines at a given point in time. We are interested in this section in describing the properties of several random variables simultaneously. We will discuss the joint distribution of two random variables, but our discussion extends naturally to several random variables or an entire sequence of random variables.

A.4.1 Events Generated by Pairs of Random Variables

When two random variables X and Y are considered simultaneously, the events generated by X and Y take the form

{X ∈ EX and Y ∈ EY} = {ω ∈ Ω : X(ω) ∈ EX and Y(ω) ∈ EY},    (A.28)

where EX and EY are, respectively, subsets of the range space of X and the range space of Y. Events generated by X and Y are such sets as {X < x1 and y1 < Y ≤ y2} or {X ≥ x1 and Y ≥ y1}, or even {X < x1}, which is really the event {X < x1 and Y ≤ ∞}.

To compute probabilities of events generated by pairs of random variables, we need only to find the subset F ∈ F of the sample space that the event represents, and then to find the assignment P(F) made by the probability measure to that subset.


A.4.2 Joint Distributions

In the previous section, we defined the cdf of a random variable X to be

F(x) = P(X ≤ x), −∞ < x < ∞. (A.29)

We can similarly define a joint distribution of two random variables X and Y as

F(x, y) = P(X ≤ x and Y ≤ y), −∞ < x < ∞,−∞ < y < ∞. (A.30)

With respect to the joint distribution of X and Y, we refer to the cdf of X alone, or of Y alone, as a marginal distribution. F(x, y) has the following properties, which correspond to the properties of the marginal distribution functions we have encountered earlier.

• 0 ≤ F(x, y) ≤ 1 for −∞ < x < ∞, −∞ < y < ∞.
• lim_{x→a+} F(x, y) = F(a, y) and lim_{y→b+} F(x, y) = F(x, b).
• If x1 ≤ x2 and y1 ≤ y2, then F(x1, y1) ≤ F(x2, y2).
• lim_{x→−∞} F(x, y) = 0, lim_{y→−∞} F(x, y) = 0, lim_{x→∞, y→∞} F(x, y) = 1.
• Whenever a ≤ b and c ≤ d, then F(a, c) − F(a, d) − F(b, c) + F(b, d) ≥ 0.

Notice that we can always recover the marginal cdfs from the joint cdf:

lim_{y→∞} F(x, y) = F(x, ∞) = FX(x)
lim_{x→∞} F(x, y) = F(∞, y) = FY(y)

Example A.14 Let the joint distribution of X and Y be given by

F(x, y) = 1 − e^{−x} − e^{−y} + e^{−(x+y)}, 0 ≤ x < ∞, 0 ≤ y < ∞;
          0, otherwise.

Then the marginal cdfs of X and Y are, respectively,

FX(x) = lim_{y→∞} F(x, y) = 1 − e^{−x}, 0 ≤ x < ∞;
                            0, otherwise.

FY(y) = lim_{x→∞} F(x, y) = 1 − e^{−y}, 0 ≤ y < ∞;
                            0, otherwise.


A.4.3 Determining Probabilities from the Joint Distribution Function

Just as in the one-dimensional case, the joint distribution function of X and Y allows us to compute the probability of any event generated by the random variables X and Y. Any event of the form {X ≤ x and Y ≤ y} has probability F(x, y). For more complicated events, it is often useful to sketch the event as a region in the (x, y) plane. Doing so, we observe that

P(x1 < X ≤ x2 and Y ≤ y) = F(x2, y) − F(x1, y),    (A.31)

and

P(x1 < X ≤ x2 and y1 < Y ≤ y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1).    (A.32)

Another way to understand the last equality is to examine set relationships. Let

A = {x1 < X ≤ x2 and y1 < Y ≤ y2}
B = {X ≤ x2 and Y ≤ y2}
C = {X ≤ x1 and Y ≤ y2}
D = {X ≤ x2 and Y ≤ y1}

We are interested in computing P(A). Notice that any point of the set B that does not lie in A must lie in C or D; i.e.,

B = A ∪ (C ∪ D).    (A.33)

Moreover, the sets A and C ∪ D are mutually exclusive, so that

P(B) = P(A) + P(C ∪ D).    (A.34)

Therefore,

P(A) = P(B) − P(C ∪ D)
     = P(B) − (P(C) + P(D) − P(C ∩ D))    (Property 2, Sect. A.2.3)
     = P(B) − P(C) − P(D) + P(C ∩ D),

which is what we needed to show.
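
Equation A.32 is straightforward to apply numerically. The sketch below evaluates a rectangle probability for the joint cdf of Example A.14; the particular rectangle chosen is arbitrary.

import math

def F(x, y):
    # joint cdf of Example A.14
    if x < 0 or y < 0:
        return 0.0
    return 1 - math.exp(-x) - math.exp(-y) + math.exp(-(x + y))

def rect_prob(x1, x2, y1, y2):
    # P(x1 < X <= x2 and y1 < Y <= y2), Eq. (A.32)
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

# For this cdf, X and Y are independent unit exponentials, so the result
# should equal (e^{-0.5} - e^{-1.5}) * (1 - e^{-1}).
print(rect_prob(0.5, 1.5, 0.0, 1.0))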


A.4.4 Joint Mass and Density Functions

As for a single random variable, we can define a joint mass function (for discrete random variables) or density function (for continuous random variables) for a pair of random variables. We may also have one discrete and one continuous random variable, in which case we have a mixture of a mass function and a density function.

When random variables X and Y are both discrete, we define the joint mass function

p(i, j) = P(X = i and Y = j), ∀ i in the range of X, j in the range of Y.    (A.35)

The joint mass function has the following properties:

• 0 ≤ p(i, j) ≤ 1 for each i, j.
• ∑_{i,j} p(i, j) = ∑_i ∑_j p(i, j) = 1.
• F(x, y) = ∑_{i≤x} ∑_{j≤y} p(i, j).

The marginal mass functions are easily calculated from the joint mass function:

pX(x) = P(X = x) = ∑_j p(x, j),    pY(y) = P(Y = y) = ∑_i p(i, y).    (A.36)

Example A.15 Suppose a coin is tossed three times consecutively. Let X be the total number of heads in the first two tosses, and Y the total number of heads in the last two tosses. Assuming that all 8 outcomes are equally likely, that is,

P({HHH}) = P({HHT}) = P({HTH}) = P({THH}) = P({HTT}) = P({THT}) = P({TTH}) = P({TTT}) = 1/8,

the values assigned by X and Y to these outcomes are

X(HHH) = 2    Y(HHH) = 2
X(HHT) = 2    Y(HHT) = 1
X(HTH) = 1    Y(HTH) = 1
X(THH) = 1    Y(THH) = 2
X(HTT) = 1    Y(HTT) = 0
X(THT) = 1    Y(THT) = 1
X(TTH) = 0    Y(TTH) = 1
X(TTT) = 0    Y(TTT) = 0


This gives the joint mass function to be

p(0, 0) = P(X = 0 and Y = 0) = P({TTT}) = 1/8
p(0, 1) = P(X = 0 and Y = 1) = P({TTH}) = 1/8
p(0, 2) = P(X = 0 and Y = 2) = P(∅) = 0
p(1, 0) = P(X = 1 and Y = 0) = P({HTT}) = 1/8
p(1, 1) = P(X = 1 and Y = 1) = P({HTH} ∪ {THT}) = 1/4
p(1, 2) = P(X = 1 and Y = 2) = P({THH}) = 1/8
p(2, 0) = P(X = 2 and Y = 0) = P(∅) = 0
p(2, 1) = P(X = 2 and Y = 1) = P({HHT}) = 1/8
p(2, 2) = P(X = 2 and Y = 2) = P({HHH}) = 1/8

and the marginal mass functions by

pX(0) = P({TTH} ∪ {TTT}) = 1/4
pX(1) = P({HTH} ∪ {THH} ∪ {HTT} ∪ {THT}) = 1/2
pX(2) = P({HHH} ∪ {HHT}) = 1/4
pY(0) = P({HTT} ∪ {TTT}) = 1/4
pY(1) = P({HHT} ∪ {HTH} ∪ {THT} ∪ {TTH}) = 1/2
pY(2) = P({HHH} ∪ {THH}) = 1/4
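
Enumerating the eight outcomes is also a convenient way to tabulate the joint and marginal mass functions of Example A.15 by machine; the short Python sketch below does exactly that, using exact fractions so the output can be compared with the values above.

from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three coin tosses (Example A.15).
outcomes = list(product("HT", repeat=3))
prob = Fraction(1, 8)

joint = Counter()
for w in outcomes:
    x = w[:2].count("H")   # heads in the first two tosses
    y = w[1:].count("H")   # heads in the last two tosses
    joint[(x, y)] += prob

marginal_x, marginal_y = Counter(), Counter()
for (x, y), p in joint.items():
    marginal_x[x] += p
    marginal_y[y] += p

print(joint[(1, 1)])   # 1/4
print(marginal_x[1])   # 1/2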

When random variables X and Y are both continuous, we define the joint density function by

f(x, y) = ∂^2 F(x, y) / ∂x ∂y.    (A.37)

The joint density function has the following properties:

• f(x, y) ≥ 0 for all x, y.
• ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(s, t) dt ds = 1.
• F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds.


The marginal density functions are easily calculated from the joint density function:

fX(x) = (d/dx) F(x, ∞) = ∫_{−∞}^{∞} f(x, t) dt,

fY(y) = (d/dy) F(∞, y) = ∫_{−∞}^{∞} f(s, y) ds.

Example A.16 Let X and Y be continuous random variables with ranges (0, ∞) and (0, ∞), respectively, and joint density function

f(x, y) = xe^{−x(y+1)}, 0 ≤ x < ∞, 0 ≤ y < ∞;
          0, otherwise.

The marginal density functions are given by

fX(x) = ∫_0^∞ xe^{−x(y+1)} dy = xe^{−x} ∫_0^∞ e^{−xy} dy = e^{−x}, 0 ≤ x < ∞

and

fY(y) = ∫_0^∞ xe^{−x(y+1)} dx = 1/(y + 1)^2, 0 ≤ y < ∞.
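
The two marginal integrals above can be checked by numerical integration. The sketch below uses scipy.integrate.quad and evaluates the marginals at arbitrarily chosen points x0 and y0.

import numpy as np
from scipy.integrate import quad

def f(x, y):
    # joint density of Example A.16
    return x * np.exp(-x * (y + 1)) if x >= 0 and y >= 0 else 0.0

x0, y0 = 1.2, 0.7   # arbitrary evaluation points

fX, _ = quad(lambda y: f(x0, y), 0, np.inf)  # should equal e^{-x0}
fY, _ = quad(lambda x: f(x, y0), 0, np.inf)  # should equal 1/(y0 + 1)^2

print(fX, np.exp(-x0))
print(fY, 1.0 / (y0 + 1)**2)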

A.4.5 Conditional Distributions

For two random variables X and Y with joint distribution function F(x, y), and marginal distribution functions FX(x) and FY(y), respectively, we define the conditional distribution function of X given Y as

GX|Y(x|y) = F(x, y)/FY(y)    (A.38)

provided FY(y) > 0. Whenever FY(y) = 0, GX|Y(x|y) is not defined. Similarly, we define the conditional distribution function of Y given X as

GY|X(y|x) = F(x, y)/FX(x)    (A.39)

provided FX(x) > 0. Whenever FX(x) = 0, GY|X(y|x) is not defined. In terms of conditional probability, GX|Y(x|y) and GY|X(y|x) are, respectively, P(X ≤ x|Y ≤ y) and P(Y ≤ y|X ≤ x).


If X and Y are both discrete random variables, we can define the conditional mass function of X, given that Y = j, as

pX|Y(i|j) = P(X = i|Y = j) = P(X = i and Y = j)/P(Y = j) = p(i, j)/pY(j), pY(j) > 0.    (A.40)

The conditional mass function of Y, given that X = i, pY|X(j|i), is defined similarly.

Example A.17 Suppose we perform the following experiment. First, we roll a fair die and observe the number of spots on the face pointing up. Call this number x. Then, a fair coin is tossed x times, and the number of resulting heads is recorded. We can think of this experiment as defining two random variables X and N, where X is the first number selected and N is the number of heads observed.

The marginal mass function of X is given by

pX(x) = 1/6, x = 1, 2, . . . , 6;
        0, otherwise.

The conditional mass function of N given X is

pN|X(n|x) = P(N = n|X = x) = C(x, n)(1/2)^x, n = 0, 1, . . . , x,

where C(x, n) denotes the binomial coefficient.

Thus the joint mass function of X and N is given by

p(x, n) = p(n|x)pX(x) = C(x, n)(1/2)^x · (1/6), x = 1, 2, . . . , 6, n = 0, 1, . . . , x,

and the marginal mass function of N is given by

pN(n) = ∑_{x=1}^{6} C(x, n)(1/2)^x · (1/6), n = 0, 1, . . . , 6.
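
A small enumeration makes Example A.17 concrete. The Python sketch below builds the joint mass function from the conditional and marginal mass functions and then sums out X to obtain the marginal mass function of N; exact fractions are used so the results can be compared with hand calculations.

from fractions import Fraction
from math import comb

# Example A.17: roll a fair die (X), then toss a fair coin X times; N = number of heads.
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

def p_N_given_X(n, x):
    # binomial mass function with x trials and success probability 1/2
    return Fraction(comb(x, n), 2**x)

# joint mass function p(x, n) = p(n|x) p_X(x)
joint = {(x, n): p_N_given_X(n, x) * p_X[x] for x in range(1, 7) for n in range(x + 1)}

# marginal mass function of N, obtained by summing over x
p_N = {}
for (x, n), pr in joint.items():
    p_N[n] = p_N.get(n, Fraction(0)) + pr

print(p_N[0])             # 21/128
print(sum(p_N.values()))  # 1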

In the case that X and Y are both continuous random variables, we define conditional density functions of X, given that Y = y, and of Y, given that X = x, analogously:

fX|Y(x|y) = f(x, y)/fY(y) and fY|X(y|x) = f(x, y)/fX(x)

provided, respectively, that fY(y) > 0 and fX(x) > 0.


Example A.18 Consider the joint density function of Example A.16. For this case,

fX|Y(x|y) = f(x, y)/fY(y) = xe^{−x(y+1)} / [1/(y + 1)^2] = x(y + 1)^2 e^{−x(y+1)}, 0 ≤ x < ∞, 0 ≤ y < ∞    (A.41)

and

fY|X(y|x) = f(x, y)/fX(x) = xe^{−x(y+1)} / e^{−x} = xe^{−xy}, 0 ≤ x < ∞, 0 ≤ y < ∞.    (A.42)

When the random variables are clear from the context, we will drop the subscripts of the conditional distribution, mass, and density functions.

A.4.6 A Mixed Case from Queueing Theory

There are many cases of interest that involve the joint distribution of a discrete and a continuous random variable. All of our results will carry over to this mixed case. In this section, we will work through an example from queueing theory that illustrates the use of a mixed density function.

Suppose that individual jobs arrive at random to a single machine for processing. We will call the sequence of arriving jobs the arrival stream. Jobs are served one-at-a-time in the order of arrival. When processing is complete, the jobs depart for finished goods inventory. Those jobs that arrive while the machine is processing another job wait in a queue until the machine becomes available and all previously arrived jobs are completed. Let us define At as the random number of jobs that arrive to the machine in the time interval [0, t], where t is a fixed time. Note that At is a discrete random variable that can take on values 0, 1, 2, . . .. Suppose we model the probability distribution of At as a Poisson distribution; i.e., we assume the mass function of At is given by

p(a) = P(At = a) = e^{−λt}(λt)^a / a!, a = 0, 1, 2, . . . ,    (A.43)

where λ is a given positive constant (we will justify this particular choice of mass function later).

Another random variable of interest to us is the length of time it takes for a particular job to be processed on the machine. Note that here we are measuring the time from start to completion of processing of the job; we are not including the time that the job may wait in queue before processing begins. We will assume that all the jobs are statistically identical and independent of each other; that is, the processing time of each job is selected independently from a common distribution function. We define T as the time it takes to process a particular job, and we assume that T is a continuous random variable that follows an exponential distribution; i.e., we assume that T has density function

f(t) = γe^{−γt}, 0 < t < ∞;
       0, otherwise,

where γ is another given positive constant.

With these definitions, let us attempt to find the distribution function for a third random variable N, which is the number of jobs arriving during the service time of a particular job. We begin by considering the pair (N, T), where N is a discrete random variable and T is a continuous random variable. Note that if the actual value of T were known (say, t), then N would have the same mass function as At. Thus, the conditional mass function of N, given that T = t, is

f(n|t) = P(N = n|T = t) = e^{−λt}(λt)^n / n!, n = 0, 1, 2, . . . .

The joint density function of (N, T) is then obtained by multiplying this conditional mass function by the marginal density function of T; i.e.,

f(n, t) = f(n|t) f(t) = [e^{−λt}(λt)^n / n!] · γe^{−γt} = γe^{−(λ+γ)t}(λt)^n / n!, n = 0, 1, . . . , t > 0.

To find the marginal mass function of N, we integrate the joint density function over all t:

pN(n) = P(N = n) = ∫_0^∞ f(n, t) dt
      = ∫_0^∞ γe^{−(λ+γ)t}(λt)^n / n! dt
      = (γλ^n / n!) ∫_0^∞ e^{−(λ+γ)t} t^n dt
      = [γλ^n / (n!(λ + γ))] ∫_0^∞ t^n (λ + γ)e^{−(λ+γ)t} dt.

Note that the integral on the right-hand side is the nth moment of an exponential random variable with parameter λ + γ; hence

pN(n) = [γλ^n / (n!(λ + γ))] · [n! / (λ + γ)^n] = (λ/(λ + γ))^n (γ/(λ + γ)), n = 0, 1, 2, . . . .

All these manipulations carry through in spite of the fact that N is discrete and T is continuous. Notice that N follows a geometric distribution. Can you provide any intuitive justification for this result?
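
The geometric form of pN(n) is easy to confirm by simulation: draw a service time T from the exponential density, then draw N from a Poisson distribution with mean λT, and compare the empirical frequencies with the formula above. The rates used below are arbitrary illustrative values.

import numpy as np

rng = np.random.default_rng(1)
lam, gamma = 2.0, 3.0   # assumed arrival rate (lambda) and service rate (gamma)

T = rng.exponential(scale=1.0 / gamma, size=200_000)  # service times
N = rng.poisson(lam * T)                              # arrivals during each service time

for n in range(4):
    empirical = np.mean(N == n)
    theoretical = (lam / (lam + gamma))**n * (gamma / (lam + gamma))
    print(n, round(float(empirical), 4), round(theoretical, 4))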


A.4.7 Independence

We have seen that the probability of any event generated jointly by random variables X and Y can be computed via the joint distribution function. That is, the joint distribution function encapsulates not only the probability structure of each random variable separately, but also of their relationship. In general, it is not possible to deduce the probability of an event generated by both X and Y if we only know the marginal distributions of X and Y. This section considers a particular kind of relationship (namely, independence) between random variables that does allow us to deduce the joint distribution from marginal distributions. We first define the idea of independent events.

Definition 61 Two events F1 and F2 (defined on the same probability space) are said to be independent if

P(F1 ∩ F2) = P(F1)P(F2). (A.44)

Written in terms of conditional probability, the definition yields the following: Two events F1 and F2 are independent if and only if

P(F1|F2) = P(F1)P(F2)/P(F2) = P(F1) (A.45)

and

P(F2|F1) = P(F1)P(F2)/P(F1) = P(F2). (A.46)

The definition of independent events leads to an analogous definition of independent random variables.

Definition 62 Two random variables X and Y are independent if the probability of any event generated jointly by the random variables equals the product of the probabilities of the marginal events generated by each random variable; i.e., for any subsets R1 of the range of X and R2 of the range of Y,

P(X ∈ R1, Y ∈ R2) = P(X ∈ R1)P(Y ∈ R2). (A.47)

Since the joint distribution function yields the probability of any event generated by X and Y, and the marginal distributions yield the probability of any event generated by X and Y separately, the above definition is equivalent to the following statement. Random variables X and Y are independent if and only if

F(x, y) = FX(x)FY(y) for any −∞ < x < ∞, −∞ < y < ∞.    (A.48)

In terms of the mass or density functions, the above statement is equivalent to the following statements.


Discrete random variables X and Y are independent if and only if

p(x, y) = pX(x)pY(y) for any x, y.    (A.49)

Continuous random variables X and Y are independent if and only if

f(x, y) = fX(x)fY(y) for any x, y.    (A.50)

Determining whether X and Y are independent involves verifying any of the above conditions.

Example A.19 Suppose the joint density function of X and Y is given by

f(x, y) = 2e^{−x−y}, 0 ≤ x ≤ y, 0 ≤ y < ∞;
          0, otherwise.

Notice that f(x, y) can be written as f(x) f(y) = (2e^{−x})(e^{−y}). But

fX(x) = 2 ∫_x^∞ e^{−x−y} dy = 2e^{−x} ∫_x^∞ e^{−y} dy = 2e^{−2x},

fY(y) = 2 ∫_0^y e^{−x−y} dx = 2e^{−y}[1 − e^{−y}].

Clearly f(x, y) ≠ fX(x)fY(y), and hence X and Y are not independent.

A.5 Bayesian Analysis

A.5.1 Bayes’ Theorem

Bayes' theorem is a particularly useful statement regarding conditional probabilities. Let the events B1, B2, . . . make up a partition of the sample space Ω. Now suppose we are able to observe from an experiment that the event A has occurred, but we do not know which of the events {Bj} has occurred (because the Bj's form a partition, one and only one of them occurs). Bayes' theorem, which is a simple restatement of the definition of conditional probability (Eq. A.4) and the law of total probability (Eq. A.7), allows us to refine our guess at the probabilities of occurrence of each of the Bj's:

P(Bj|A) = P(A|Bj)P(Bj) / ∑_i P(A|Bi)P(Bi)    (A.51)

Bayes' theorem is of particular importance in modeling experiments where new information (in terms of the occurrence of an event or empirical evidence in the form of data) may lead us to update the likelihood of other events. Speaking somewhat informally, suppose we are interested in estimating some property of a probabilistic mechanism that we will term a system state, and suppose we have available to us some empirical output of that probabilistic mechanism that we will term a sample. Then Bayes' theorem can be used to help refine our estimate of the system state as follows:

P(state|sample) = P(sample|state)P(state) / ∑_{all states} P(sample|state)P(state)    (A.52)

Beyond the formal use of Bayes' theorem in Eq. A.51, this interpretation allows us to use the result to refine our model of the probabilistic mechanism based on observed output from the mechanism. Clearly, this expression may have important applications when modeling damage accumulation. The following section provides further details.

A.5.2 Bayesian Inference and Bayesian Updating

Bayesian analysis refers to a collection of procedures in which Bayes' theorem is used to refine estimates of event likelihoods as new evidence becomes available. Bayesian analysis includes Bayesian inference, Bayesian updating, Bayesian regression, and many other techniques. This approach has found wide application in many fields and is often contrasted with frequentist reasoning, which assumes that observations (data) are the product of a statistical mechanism (distribution) whose design is known a priori and remains constant as data are accumulated. Bayesian analysis, on the other hand, asserts that the statistical mechanisms producing observed data are themselves probabilistic in nature, so that, in particular, their parameters are random and can be estimated and updated as observations are revealed. In simple terms, the frequentist approach holds that data are realizations from a mechanism whose parameters are fixed (and thus data are potentially infinitely repeatable), while the Bayesian approach holds that available data from a particular study are fixed realizations from an unknown (random) mechanism, and thus as additional data are revealed, our understanding of the random nature of the mechanism changes. Bayesian analysis provides a means of updating the estimates of the statistical properties of the mechanism (i.e., its parameters).


Suppose a probabilistic mechanism produces a random variable X, and suppose the distribution of X involves a parameter Θ that can take on only discrete values {θ1, θ2, . . .}. In Bayesian analysis, the parameter Θ is taken to be a random variable, and we begin with a prior distribution, which conveys the probability law of the parameter prior to observing any data. If the parameter takes on discrete values, the prior distribution can be described by a probability mass function p, i.e., {p(θi) = P(Θ = θi), i = 1, 2, . . .}. The choice of the prior distribution may be based on any already available information, such as previous studies or other data sources, expertise or intuition, or simply convenience. In practice, it is common to assume a uniform distribution for the prior distribution, which is commonly referred to as a diffuse prior [1].

Consider now that new information e becomes available as a realization of the probabilistic mechanism. Then, conditioned on the new information, the updated pmf of Θ, denoted by p′, where p′(θi) = P(Θ = θi|e), i = 1, 2, . . ., can be obtained from Bayes' theorem as [1]

p′(θi) = P(e|Θ = θi)p(θi) / ∑_j P(e|Θ = θj)p(θj), i = 1, 2, . . . ,    (A.53)

where P(e|Θ = θi) is the conditional probability of the information given that the parameter takes on the value θi. The pmf p′ is known as the posterior probability mass function; i.e., the new pmf for Θ given the observations.

The expected value of Θ, computed using the posterior distribution, is known as the Bayesian (updated) estimator of the parameter Θ, and is computed as

θ̂′ = E[Θ|e] = ∑_i θi p′(θi)    (A.54)

The new information e leads to a change in the pmf of Θ, and this change should be reflected in the evaluation of the probability of the random variable X. Based on the theorem of total probability (Eq. A.7) and using the posterior pmf from Eq. A.53, we obtain the distribution function of X as follows:

P(X ≤ x) = ∑_i P(X ≤ x|θi) p′(θi)    (A.55)
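
A minimal numerical sketch of Eqs. A.53-A.55 is given below. The setting, the three candidate parameter values, the uniform (diffuse) prior, and the observed data are all hypothetical and serve only to show the mechanics of the update.

import numpy as np
from scipy.stats import binom

# Hypothetical setting: Theta is the unknown probability that a component
# survives a single shock, restricted to three candidate values.
thetas = np.array([0.7, 0.8, 0.9])
prior = np.array([1/3, 1/3, 1/3])        # diffuse prior p(theta_i)

# New information e: the component survived 8 of 10 independent shocks.
likelihood = binom.pmf(8, 10, thetas)    # P(e | Theta = theta_i)

posterior = likelihood * prior
posterior /= posterior.sum()             # Eq. (A.53)

theta_hat = float(np.dot(thetas, posterior))   # Bayesian estimator, Eq. (A.54)
print(posterior, theta_hat)

# Updated probability that the component survives at least 9 of the next
# 10 shocks, mixing over the posterior as in Eq. (A.55).
p_event = float(np.dot(1 - binom.cdf(8, 10, thetas), posterior))
print(p_event)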

Similarly, for the continuous case, we can define f(θ), −∞ ≤ θ ≤ ∞, as the prior density function for Θ. Then, when additional information e becomes available, the posterior probability density function f′ can be computed as follows [1]:

f′(θ) = P(e|Θ = θ) f(θ) / ∫_{−∞}^{∞} P(e|Θ = θ) f(θ) dθ    (A.56)


where P(e|Θ = θ) is the conditional probability of the information (data) given Θ = θ, −∞ ≤ θ ≤ ∞. This is commonly referred to as the likelihood function of Θ and it is denoted by L(θ). Then, the updated estimator of the parameter is

θ̂′ = E[Θ|e] = ∫_{−∞}^{∞} θ f′(θ) dθ    (A.57)

and, similar to Eq. A.55,

P(X ≤ x) = ∫_{−∞}^{∞} P(X ≤ x|θ) f′(θ) dθ    (A.58)

The posterior distribution can be used to develop Bayesian inferential statistics, such as Bayesian confidence intervals. As an aside, one of the primary differences between the frequentist and Bayesian approaches is how confidence intervals are interpreted. In the frequentist case, confidence intervals are interpreted in terms of coverage; an α-level confidence interval means that in a large number of repeated trials with the same number of observations, approximately α · 100-percent of the computed confidence intervals contain the true parameter. In the Bayesian case, we interpret the confidence interval in terms of probability; an α-level confidence interval means that, based on the information provided, the parameter is in the computed confidence interval with probability α.

Reference

1. A.H-S. Ang, W.H. Tang, Probability Concepts in Engineering: Emphasis on Applications to Civil and Environmental Engineering (Wiley, New York, 2007)
