
9 Expectation and Variance

Two numbers are often used to summarize a probability distribution for a random variable X. The mean is a measure of the center or middle of the probability distribution, and the variance is a measure of the dispersion, or variability, in the distribution. These two measures do not uniquely identify a probability distribution. That is, two different distributions can have the same mean and variance. Still, these measures are simple, useful summaries of the probability distribution of X.

9.1 Expectation of Discrete Random Variable

The most important characteristic of a random variable is its expectation. Synonyms for expectation are expected value, mean, and first moment.

The definition of expectation is motivated by the conventional idea of numerical average. Recall that the numerical average of n numbers, say a1, a2, . . . , an, is

(1/n) ∑_{k=1}^{n} ak.

We use the average to summarize or characterize the entire collection of numbers a1, . . . , an with a single value.

Example 9.1. Consider 10 numbers: 5, 2, 3, 2, 5, -2, 3, 2, 5, 2. The average is

(5 + 2 + 3 + 2 + 5 + (−2) + 3 + 2 + 5 + 2) / 10 = 27/10 = 2.7.

We can rewrite the above calculation as

(−2) × (1/10) + 2 × (4/10) + 3 × (2/10) + 5 × (3/10).
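A quick numerical check of this rewriting (a small sketch in plain MATLAB, no toolboxes assumed): the ordinary average equals the sum of the distinct values weighted by their relative frequencies.

    a = [5 2 3 2 5 -2 3 2 5 2];           % the 10 numbers from Example 9.1
    direct = mean(a);                      % ordinary numerical average
    v = unique(a);                         % distinct values: -2, 2, 3, 5
    w = arrayfun(@(t) mean(a == t), v);    % relative frequency of each distinct value
    weighted = sum(v .* w);                % weighted-sum form of the average
    disp([direct weighted])                % both are 2.7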


Definition 9.2. Suppose X is a discrete random variable. We define the expectation (or mean or expected value) of X by

EX = ∑_x x × P[X = x] = ∑_x x × pX(x).    (15)

In other words, the expected value of a discrete random variable is a weighted mean of the values the random variable can take on, where the weights come from the pmf of the random variable.

• Some references use mX or µX to represent EX.

• For conciseness, we simply write x under the summation symbol in (15); this means that the sum runs over all x values in the support of X. (Of course, for x outside of the support, pX(x) is 0 anyway.)

9.3. Analogy: In mechanics, think of point masses on a line with a mass of pX(x) kg at a distance x meters from the origin.

In this model, EX is the center of mass (the balance point). This is why pX(x) is called a probability mass function.

Example 9.4. When X ∼ Bernoulli(p) with p ∈ (0, 1), EX = 0 × (1 − p) + 1 × p = p.

Note that, since X takes only the values 0 and 1, its expected value p is “never seen”.

9.5. Interpretation: The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment.

Example 9.6.

pX(x) = { 1/4, x = 0; 3/4, x = 2; 0, otherwise }
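A small sketch of the weighted-sum computation (15) for this pmf, together with a long-run-average check in the spirit of 9.5 (plain MATLAB; the sample size is an arbitrary choice):

    x  = [0 2];                            % support of X (Example 9.6)
    px = [1/4 3/4];                        % pmf values
    EX = sum(x .* px);                     % (15): EX = 0*(1/4) + 2*(3/4) = 1.5
    N  = 1e6;                              % number of simulated independent repetitions
    samples = x(1 + (rand(1, N) < 3/4));   % draw X: value 2 with prob. 3/4, else 0
    disp([EX, mean(samples)])              % the long-run average is close to EX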


Example 9.7. For X ∼ P(α),

EX = ∑_{i=0}^{∞} i e^(−α) α^i / i! = 0 + ∑_{i=1}^{∞} i e^(−α) α^i / i!
   = e^(−α) α ∑_{i=1}^{∞} α^(i−1) / (i − 1)! = e^(−α) α ∑_{k=0}^{∞} α^k / k! = e^(−α) α e^α = α.
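A quick numerical sanity check of this derivation (plain MATLAB; α = 3 is an arbitrary choice, and the infinite sum is truncated where the remaining terms are negligible):

    alpha = 3;                                      % arbitrary Poisson parameter
    i = 0:100;                                      % truncation of the infinite sum
    pmf = exp(-alpha) * alpha.^i ./ factorial(i);   % P[X = i]
    disp(sum(i .* pmf))                             % approximately alpha = 3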

Example 9.8. For X ∼ B(n, p),

EX = ∑_{i=0}^{n} i (n choose i) p^i (1 − p)^(n−i) = ∑_{i=1}^{n} i · n! / (i! (n − i)!) · p^i (1 − p)^(n−i)
   = n ∑_{i=1}^{n} (n − 1)! / ((i − 1)! (n − i)!) · p^i (1 − p)^(n−i) = n ∑_{i=1}^{n} (n−1 choose i−1) p^i (1 − p)^(n−i).

Let k = i − 1. Then,

EX = n ∑_{k=0}^{n−1} (n−1 choose k) p^(k+1) (1 − p)^(n−(k+1)) = np ∑_{k=0}^{n−1} (n−1 choose k) p^k (1 − p)^(n−1−k).

We now have the expression in a form where we can apply the binomial theorem, which finally gives

EX = np (p + (1 − p))^(n−1) = np.

We shall revisit this example using another approach in Example 11.48.

Example 9.9. Pascal's wager: Suppose you concede that you don't know whether or not God exists and therefore assign a 50 percent chance to either proposition. How should you weigh these odds when deciding whether to lead a pious life? If you act piously and God exists, Pascal argued, your gain, eternal happiness, is infinite. If, on the other hand, God does not exist, your loss, or negative return, is small: the sacrifices of piety. To weigh these possible gains and losses, Pascal proposed, you multiply the probability of each possible outcome by its payoff and add them all up, forming a kind of average or expected payoff. In other words, the mathematical expectation of your return on piety is one-half infinity (your gain if God exists) minus one-half a small number (your loss if he does not exist). Pascal knew enough about infinity to know that the answer to this calculation is infinite, and thus the expected return on piety is infinitely positive. Every reasonable person, Pascal concluded, should therefore follow the laws of God. [14, p 76]

• Pascal's wager is often considered the founding of the mathematical discipline of game theory, the quantitative study of optimal decision strategies in games.

9.10. Technical issue: Definition (15) is only meaningful if the sum is well defined.

The sum of infinitely many nonnegative terms is always well-defined, with +∞ as a possible value for the sum.

• Infinite Expectation: Consider a random variable X whose pmf is defined by

pX(x) = { 1/(cx²), x = 1, 2, 3, . . . ; 0, otherwise }

Then, c = ∑_{n=1}^{∞} 1/n², which is a finite positive number (π²/6). However,

EX = ∑_{k=1}^{∞} k pX(k) = ∑_{k=1}^{∞} k × (1/c) × (1/k²) = (1/c) ∑_{k=1}^{∞} 1/k = +∞.

Some care is necessary when computing expectations of signed random variables that take infinitely many values.

• The sum over countably infinitely many terms is not always well defined when both positive and negative terms are involved.

• For example, the infinite series 1 − 1 + 1 − 1 + · · · has the sum 0 when you sum the terms according to (1 − 1) + (1 − 1) + · · ·, whereas you get the sum 1 when you sum the terms according to 1 + (−1 + 1) + (−1 + 1) + (−1 + 1) + · · ·.

• Such abnormalities cannot happen when all terms in the infinite summation are nonnegative.


It is the convention in probability theory that EX should be evaluated as

EX = ∑_{x≥0} x pX(x) − ∑_{x<0} (−x) pX(x).

• If at least one of these sums is finite, then it is clear what value should be assigned as EX.

• If both sums are +∞, then no value is assigned to EX, and we say that EX is undefined.

Example 9.11. Undefined Expectation: Let

pX(x) = { 1/(2c x²), x = ±1, ±2, ±3, . . . ; 0, otherwise }

Then,

EX = ∑_{k=1}^{∞} k pX(k) − ∑_{k=−∞}^{−1} (−k) pX(k).

The first sum gives

∑_{k=1}^{∞} k pX(k) = ∑_{k=1}^{∞} k × 1/(2ck²) = (1/(2c)) ∑_{k=1}^{∞} 1/k = ∞.

The second sum gives

∑_{k=−∞}^{−1} (−k) pX(k) = ∑_{k=1}^{∞} k pX(−k) = ∑_{k=1}^{∞} k × 1/(2ck²) = (1/(2c)) ∑_{k=1}^{∞} 1/k = ∞.

Because both sums are infinite, we conclude that EX is undefined.

9.12. More rigorously, to define EX, we let X+ = max{X, 0} and X− = −min{X, 0}. Then observe that X = X+ − X− and that both X+ and X− are nonnegative r.v.'s. We say that a random variable X admits an expectation if EX+ and EX− are not both equal to +∞. In that case, EX = EX+ − EX−.


9.2 Function of a Discrete Random Variable

Given a random variable X, we will often have occasion to define a new random variable by Y ≡ g(X), where g(x) is a real-valued function of the real-valued variable x. More precisely, recall that a random variable X is actually a function taking points of the sample space, ω ∈ Ω, into real numbers X(ω). Hence, we have the following definition.

Definition 9.13. The notation Y = g(X) is actually shorthand for Y(ω) := g(X(ω)).

• The random variable Y = g(X) is sometimes called a derived random variable.

Example 9.14. Let

pX(x) = { (1/c) x², x = ±1, ±2; 0, otherwise }

and Y = X⁴. Find pY(y) and then calculate EY.

9.15. For a discrete random variable X, the pmf of a derived random variable Y = g(X) is given by

pY(y) = ∑_{x: g(x)=y} pX(x).


Note that the sum is over all x in the support of X which satisfy g(x) = y.
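A small sketch (plain MATLAB) of this computation for the setting of Example 9.14, where pX(x) = x²/c on {±1, ±2} (so c = 1 + 1 + 4 + 4 = 10 by normalization) and Y = X⁴:

    x  = [-2 -1 1 2];                    % support of X (Example 9.14)
    px = x.^2 / 10;                      % pX(x) = x^2/c with c = 10
    g  = @(t) t.^4;                      % the function defining Y = g(X)
    y  = unique(g(x));                   % possible values of Y: 1 and 16
    py = zeros(size(y));
    for k = 1:numel(y)
        py(k) = sum(px(g(x) == y(k)));   % pY(y): add pX(x) over all x with g(x) = y
    end
    disp([y; py])                        % pY(1) = 0.2 and pY(16) = 0.8
    disp(sum(y .* py))                   % EY = 1*0.2 + 16*0.8 = 13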

Example 9.16. A “binary” random variable X takes only two values a and b with

P[X = b] = 1 − P[X = a] = p.

X can be expressed as X = (b − a)I + a, where I is a Bernoulli random variable with parameter p.

9.3 Expectation of a Function of a Discrete Random Variable

Recall that for a discrete random variable X, the pmf of a derived random variable Y = g(X) is given by

pY(y) = ∑_{x: g(x)=y} pX(x).

If we want to compute EY, it might seem that we first have to find the pmf of Y. Typically, this requires a detailed analysis of g which can be complicated, and it is avoided by the following result.

9.17. Suppose X is a discrete random variable. Then

E[g(X)] = ∑_x g(x) pX(x).

This is referred to as the law/rule of the lazy/unconscious statistician (LOTUS) [22, Thm 3.6 p 48], [9, p. 149], [8, p. 50] because it is so much easier to use the above formula than to first find the pmf of Y. It is also called the substitution rule [21, p 271].
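Continuing the sketch from 9.15 (same pmf as in Example 9.14), the lines below check that LOTUS gives the same EY without constructing pY first:

    x  = [-2 -1 1 2];                 % support of X (Example 9.14)
    px = x.^2 / 10;                   % pX(x) = x^2/10
    g  = @(t) t.^4;                   % Y = g(X) = X^4
    EY_lotus = sum(g(x) .* px);       % LOTUS: E[g(X)] = sum of g(x)*pX(x) over x
    disp(EY_lotus)                    % 13, matching the value obtained via pY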

Example 9.18. Back to Example 9.14. Recall that

pX(x) = { (1/c) x², x = ±1, ±2; 0, otherwise }

(a) When Y = X⁴, EY =


(b) E [2X − 1]

9.19. Caution: A frequently made mistake of beginning students is to set E[g(X)] equal to g(EX). In general, E[g(X)] ≠ g(EX).

(a) In particular, E[1/X] is not the same as 1/EX; see the small numerical check after this list.

(b) An exception is the case of an affine function g(x) = ax + b. See also (9.23).
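As a concrete illustration (a hypothetical two-point distribution, not one used elsewhere in these notes): suppose X is equally likely to be 1 or 2. Then

E[1/X] = (1/2)(1) + (1/2)(1/2) = 3/4, while 1/EX = 1/1.5 = 2/3,

so the two quantities indeed differ.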

Example 9.20. Continue from Example 9.4. For X ∼ Bernoulli(p),

(a) EX = p

(b) E[X²] = 0² × (1 − p) + 1² × p = p ≠ (EX)².

Example 9.21. Continue from Example 9.7. Suppose X ∼ P(α).

E[X²] = ∑_{i=0}^{∞} i² e^(−α) α^i / i! = e^(−α) α ∑_{i=1}^{∞} i α^(i−1) / (i − 1)!    (16)

We can evaluate the infinite sum in (16) by rewriting i as i − 1 + 1:

∑_{i=1}^{∞} i α^(i−1)/(i − 1)! = ∑_{i=1}^{∞} (i − 1 + 1) α^(i−1)/(i − 1)!
   = ∑_{i=1}^{∞} (i − 1) α^(i−1)/(i − 1)! + ∑_{i=1}^{∞} α^(i−1)/(i − 1)!
   = α ∑_{i=2}^{∞} α^(i−2)/(i − 2)! + ∑_{i=1}^{∞} α^(i−1)/(i − 1)! = α e^α + e^α = e^α (α + 1).

Plugging this back into (16), we get

E[X²] = α(α + 1) = α² + α.

9.22. Continue from Example 9.8. For X ∼ B(n, p), one can find E[X²] = np(1 − p) + (np)².


9.23. Some Basic Properties of Expectations

(a) For c ∈ R, E [c] = c

(b) For c ∈ R, E [X + c] = EX + c and E [cX] = cEX

(c) For constants a, b, we have

E [aX + b] = aEX + b.

(d) For constants c1 and c2,

E [c1g1(X) + c2g2(X)] = c1E [g1(X)] + c2E [g2(X)] .

(e) For constants c1, c2, . . . , cn,

E[∑_{k=1}^{n} ck gk(X)] = ∑_{k=1}^{n} ck E[gk(X)].

Definition 9.24. Some definitions involving expectation of a function of a random variable:

(a) Absolute moment: E[|X|^k], where we define E[|X|⁰] = 1.

(b) Moment: mk = E[X^k] is the kth moment of X, k ∈ N.

• The first moment of X is its expectation EX.

• The second moment of X is E[X²].


9.4 Variance and Standard Deviation

An average (expectation) can be regarded as one number that summarizes an entire probability model. After finding an average, someone who wants to look further into the probability model might ask, “How typical is the average?” or, “What are the chances of observing an event far from the average?” A measure of dispersion/deviation/spread is an answer to these questions wrapped up in a single number. (The opposite of this measure is the peakedness.) If this measure is small, observations are likely to be near the average. A high measure of dispersion suggests that it is not unusual to observe events that are far from the average.

Example 9.25. Consider your score on the midterm exam. After you find out your score is 7 points above average, you are likely to ask, “How good is that? Is it near the top of the class or somewhere near the middle?”

Example 9.26. In the case that the random variable X is the random payoff in a game that can be repeated many times under identical conditions, the expected value of X is an informative measure on the grounds of the law of large numbers. However, the information provided by EX is usually not sufficient when X is the random payoff in a nonrepeatable game.

Suppose your investment has yielded a profit of $3,000 and you must choose between the following two options:

• the first option is to take the sure profit of $3,000 and

• the second option is to reinvest the profit of $3,000 under the scenario that this profit increases to $4,000 with probability 0.8 and is lost with probability 0.2.

The expected profit of the second option is

0.8 × $4,000 + 0.2 × $0 = $3,200

and is larger than the $3,000 from the first option. Nevertheless, most people would prefer the first option. The downside risk is too big for them. A measure that takes into account the aspect of risk is the variance of a random variable. [21, p 35]


9.27. The most important measures of dispersion are the standard deviation and its close relative, the variance.

Definition 9.28. Variance:

VarX = E[(X − EX)²].    (17)

• Read “the variance of X”.

• Notation: DX, or σ²(X), or σX², or VX [22, p. 51]

• In some references, to avoid confusion from the two expectation symbols, they first define m = EX and then define the variance of X by

VarX = E[(X − m)²].

• We can also calculate the variance via another identity (a short derivation follows this list):

VarX = E[X²] − (EX)²

• The units of the variance are squares of the units of the random variable.
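A short derivation of this identity, using the linearity properties in 9.23 and writing m = EX:

VarX = E[(X − m)²] = E[X² − 2mX + m²] = E[X²] − 2m EX + m² = E[X²] − m² = E[X²] − (EX)².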

9.29. Basic properties of variance:

• VarX ≥ 0.

• VarX ≤ E[X²].

• Var[cX] = c² VarX.

• Var[X + c] = VarX.

• Var[aX + b] = a² VarX.
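A quick simulation sketch illustrating the last property, Var[aX + b] = a² VarX (plain MATLAB; the distribution of X and the constants a and b are arbitrary choices):

    N = 1e6;                       % number of simulated samples
    X = (rand(1, N) < 0.3);        % X ~ Bernoulli(0.3), so VarX = 0.3*0.7 = 0.21
    a = 4; b = -2;                 % arbitrary constants
    disp(var(a*X + b))             % empirically close to a^2 * VarX = 3.36
    disp(a^2 * var(X))             % the same quantity via the property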


Definition 9.30. Standard Deviation:

σX = √(VarX).

• It is useful to work with the standard deviation since it has the same units as EX.

• Informally we think of outcomes within ±σX of EX as being in the center of the distribution. Some references would informally interpret sample values within ±σX of the expected value, x ∈ [EX − σX, EX + σX], as “typical” values of X and other values as “unusual”.

• σaX+b = |a|σX.

9.31. σX and √(VarX): Note that the √· function is a strictly increasing function. Because σX = √(VarX), if one of them is large, the other one is also large. Therefore, both values quantify the amount of spread/dispersion in the RV X (which can be observed from the spread or dispersion of the pmf or the histogram or the relative frequency graph). However, VarX does not have the same unit as the RV X.

9.32. In finance, standard deviation is a key concept and is used to measure the volatility (risk) of investment returns and stock returns.

It is common wisdom in finance that diversification of a portfolio of stocks generally reduces the total risk exposure of the investment. We shall return to this point in Example 11.67.

Example 9.33. Continue from Example 9.25. If the standard deviation of exam scores is 12 points, the student with a score of +7 with respect to the mean can think of herself in the middle of the class. If the standard deviation is 3 points, she is likely to be near the top.

Example 9.34. Suppose X ∼ Bernoulli(p).

(a) E[X²] = 0² × (1 − p) + 1² × p = p.


(b) VarX = E[X²] − (EX)² = p − p² = p(1 − p).

Alternatively, if we directly use (17), we have

VarX = E[(X − EX)²] = (0 − p)² × (1 − p) + (1 − p)² × p = p(1 − p)(p + (1 − p)) = p(1 − p).

Example 9.35. Continue from Example 9.7 and Example 9.21. Suppose X ∼ P(α). We have

VarX = E[X²] − (EX)² = α² + α − α² = α.

Therefore, for a Poisson random variable, the expected value is the same as the variance.

Example 9.36. Consider the two pmfs shown in Figure 11. The random variable X with pmf at the left has a smaller variance than the random variable Y with pmf at the right because more probability mass is concentrated near zero (their mean) in the graph at the left than in the graph at the right. [9, p. 85]

[Figure 11 here. In the figure (taken from [9, Fig. 2.9 and Ex. 2.27]), X has pmf pX(±1) = 1/3 and pX(±2) = 1/6, while Y has pmf pY(±1) = 1/6 and pY(±2) = 1/3; both have zero mean, with E[X²] = 2 and E[Y²] = 3.]

Figure 11: Example 9.36 shows that a random variable whose probability mass is concentrated near the mean has smaller variance. [9, Fig. 2.9]

9.37. We have already talked about variance and standard deviation as numbers that indicate the spread/dispersion of the pmf. More specifically, let's imagine a pmf that is shaped like a bell curve. As the value of σX gets smaller, the spread of the pmf will be smaller and hence the pmf will “look sharper”. Therefore, the probability that the random variable X takes a value far from the mean will be smaller.

The next property involves the use of σX to bound “the tail probability” of a random variable.

9.38. Chebyshev's Inequality:

P[|X − EX| ≥ α] ≤ σX² / α²

or equivalently

P[|X − EX| ≥ nσX] ≤ 1/n².

• Useful only when α > σX.
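A small simulation sketch comparing the actual tail probability with the Chebyshev bound (plain MATLAB; the binomial parameters and the choice n = 2 are arbitrary):

    N = 1e5;                                 % number of simulated samples
    X = sum(rand(20, N) < 0.5, 1);           % X ~ B(20, 0.5): EX = 10, VarX = 5
    sigma = sqrt(5);                         % standard deviation of X
    n = 2;                                   % deviations of at least n*sigma
    disp(mean(abs(X - 10) >= n*sigma))       % empirical tail probability (about 0.04)
    disp(1/n^2)                              % Chebyshev bound: 0.25, much larger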

Example 9.39. If X has mean m and variance σ², it is sometimes convenient to introduce the normalized random variable

Y = (X − m)/σ.

Definition 9.40. Central Moments: A generalization of the variance is the nth central moment, which is defined to be

µn = E[(X − EX)^n].

(a) µ1 = E[X − EX] = 0.

(b) µ2 = σX² = VarX: the second central moment is the variance.


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part IV.1 Dr.Prapun

10 Continuous Random Variables

10.1 From Discrete to Continuous Random Variables

In many practical applications of probability, physical situations are better described by random variables that can take on a continuum of possible values rather than a discrete number of values. For this type of random variable, the interesting fact is that

• any individual value has probability zero:

P[X = x] = 0 for all x    (18)

and that

• the support is always uncountable.

These random variables are called continuous random variables.

10.1. We can see from (18) that the pmf is going to be useless for this type of random variable. It turns out that the cdf FX is still useful and we shall introduce another useful function called the probability density function (pdf) to replace the role of the pmf. However, integral calculus36 is required to formulate this continuous analog of a pmf.

10.2. In some cases, the random variable X is actually discrete but, because the range of possible values is so large, it might be more convenient to analyze X as a continuous random variable.

36This is always a difficult concept for the beginning student.


Example 10.3. Suppose that current measurements are read from a digital instrument that displays the current to the nearest one-hundredth of a mA. Because the possible measurements are limited, the random variable is discrete. However, it might be a more convenient, simple approximation to assume that the current measurements are values of a continuous random variable.

Example 10.4. If you can measure the heights of people with infinite precision, the height of a randomly chosen person is a continuous random variable. In reality, heights cannot be measured with infinite precision, but the mathematical analysis of the distribution of heights of people is greatly simplified when using a mathematical model in which the height of a randomly chosen person is modeled as a continuous random variable. [17, p 284]

Example 10.5. Continuous random variables are important models for

(a) voltages in communication receivers

(b) file download times on the Internet

(c) velocity and position of an airliner on radar

(d) lifetime of a battery

(e) decay time of a radioactive particle

(f) time until the occurrence of the next earthquake in a certain region

Example 10.6. The simplest example of a continuous random variable is the “random choice” of a number from the interval (0, 1).

• In MATLAB, this can be generated by the command rand. In Excel, use rand().

• The generation is “unbiased” in the sense that “any number in the range is as likely to occur as another number.”

• The histogram is flat over (0, 1).

• Formally, this is called a uniform RV on the interval (0, 1).


Definition 10.7. We say that X is a continuous random variable37 if we can find a (real-valued) function38 f such that, for any set B, P[X ∈ B] has the form

P[X ∈ B] = ∫_B f(x) dx.    (19)

• In particular,

P[a ≤ X ≤ b] = ∫_a^b f(x) dx.    (20)

In other words, the area under the graph of f(x) between the points a and b gives the probability P[a ≤ X ≤ b].

• The function f is called the probability density function (pdf) or simply density.

• When we want to emphasize that the function f is a density of a particular random variable X, we write fX instead of f.

37To be more rigorous, this is the definition for an absolutely continuous random variable. At this level, we will not distinguish between the continuous random variable and the absolutely continuous random variable. When the distinction between them is considered, a random variable X is said to be continuous (not necessarily absolutely continuous) when condition (18) is satisfied. Alternatively, condition (18) is equivalent to requiring the cdf FX to be continuous. Another fact worth mentioning is that if a random variable is absolutely continuous, then it is continuous. So, absolute continuity is a stronger condition.

38Strictly speaking, the δ-“function” is not a function; so we can't use a δ-function here.


Figure 13: For a continuous random variable, the probability distribution is described by a curve called the probability density function, f(x). The total area beneath the curve is 1.0, and the probability that X will take on some value between a and b is the area beneath the curve between points a and b.

Example 10.8. For the random variable generated by the rand command in MATLAB39 or the rand() command in Excel,

Definition 10.9. Recall that the support SX of a random variable X is any set S such that P[X ∈ S] = 1. For a continuous random variable, SX is usually set to be {x : fX(x) > 0}.

39The rand command in MATLAB is an approximation for two reasons:

(a) It produces pseudorandom numbers; the numbers seem random but are actually the output of a deterministic algorithm.

(b) It produces a double-precision floating-point number, represented in the computer by 64 bits. Thus MATLAB distinguishes no more than 2^64 unique double-precision floating-point numbers. By comparison, there are uncountably many real numbers in the interval from 0 to 1.


10.2 Properties of PDF and CDF for Continuous Random Variables

10.10. fX is determined only almost everywhere40. That is, given a pdf f for a random variable X, if we construct a function g by changing the function f at a countable number of points41, then g can also serve as a pdf for X.

10.11. The cdf of any kind of random variable X is defined as

FX(x) = P[X ≤ x].

Note that even though there can be more than one valid pdf for any given random variable, the cdf is unique. There is only one cdf for each random variable.

10.12. For a continuous random variable, given the pdf fX(x), we can find the cdf of X by

FX(x) = P[X ≤ x] = ∫_{−∞}^{x} fX(t) dt.

10.13. Given the cdf FX(x), we can find the pdf fX(x) as follows:

• If FX is differentiable at x, we will set

fX(x) = (d/dx) FX(x).

• If FX is not differentiable at x, we can set the value of fX(x) to be any value. Usually, the values are selected to give a simple expression. (In many cases, they are simply set to 0.)

40Lebesgue-a.e., to be exact.

41More specifically, if g = f Lebesgue-a.e., then g is also a pdf for X.


Example 10.14. For the random variable generated by the rand command in MATLAB or the rand() command in Excel,

Example 10.15. Suppose that the lifetime X of a device has the cdf

FX(x) = { 0, x < 0; (1/4)x², 0 ≤ x ≤ 2; 1, x > 2 }

Observe that it is differentiable at each point x except at x = 2. The probability density function is obtained by differentiation of the cdf, which gives

fX(x) = { (1/2)x, 0 < x < 2; 0, otherwise }

At x = 2, where FX has no derivative, it does not matter what value we give to fX. Here, we set it to be 0.

10.16. In many situations when you are asked to find a pdf, it may be easier to find the cdf first and then differentiate it to get the pdf.

Exercise 10.17. A point is “picked at random” in the inside of a circular disk with radius r. Let the random variable X denote the distance from the center of the disk to this point. Find fX(x).

10.18. Unlike the cdf of a discrete random variable, the cdf of a continuous random variable has no jumps and is continuous everywhere.

10.19. pX(x) = P[X = x] = P[x ≤ X ≤ x] = ∫_x^x fX(t) dt = 0.

Again, it makes no sense to speak of the probability that X will take on a pre-specified value. This probability is always zero.

10.20. P [X = a] = P [X = b] = 0. Hence,

P [a < X < b] = P [a ≤ X < b] = P [a < X ≤ b] = P [a ≤ X ≤ b]


• The corresponding integrals over an interval are not affected by whether or not the endpoints are included or excluded.

• When we work with continuous random variables, it is usually not necessary to be precise about specifying whether or not a range of numbers includes the endpoints. This is quite different from the situation we encounter with discrete random variables, where it is critical to carefully examine the type of inequality.

10.21. fX is nonnegative and ∫_R fX(x) dx = 1.

Example 10.22. Random variable X has pdf

fX(x) = { c e^(−2x), x > 0; 0, otherwise }

Find the constant c and sketch the pdf.
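A short worked computation using the normalization condition in 10.21:

1 = ∫_R fX(x) dx = ∫_0^∞ c e^(−2x) dx = c/2, so c = 2.

The resulting pdf is that of an exponential random variable with λ = 2 (see Definition 10.23 below).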

Definition 10.23. A continuous random variable is called exponential if its pdf is given by

fX(x) = { λ e^(−λx), x > 0; 0, x ≤ 0 }

for some λ > 0.

Theorem 10.24. Any nonnegative42 function that integrates to one is a probability density function (pdf) of some random variable [8, p.139].

42or nonnegative a.e.


10.25. Intuition/Interpretation: The use of the word “density” originated with the analogy to the distribution of matter in space. In physics, any finite volume, no matter how small, has a positive mass, but there is no mass at a single point. A similar description applies to continuous random variables.

Approximately, for a small ∆x,

P[X ∈ [x, x + ∆x]] = ∫_x^{x+∆x} fX(t) dt ≈ fX(x)∆x.

This is why we call fX the density function.


Figure 14: P [x ≤ X ≤ x+ ∆x] is the area of the shaded vertical strip.

In other words, the probability of the random variable X taking on a value in a small interval around point c is approximately equal to f(c)∆c when ∆c is the length of the interval.

• In fact, fX(x) = lim_{∆x→0} P[x < X ≤ x + ∆x] / ∆x.

• The number fX(x) itself is not a probability. In particular, it does not have to be between 0 and 1.

• fX(c) is a relative measure for the likelihood that the random variable X will take on a value in the immediate neighborhood of point c.

Stated differently, the pdf fX(x) expresses how densely the probability mass of the random variable X is smeared out in the neighborhood of point x. Hence the name density function.


10.26. Histogram and pdf [17, p 143 and 145]:


Figure 15: From histogram to pdf.

(a) A probability histogram is a bar chart that divides the range of values covered by the samples/measurements into intervals of the same width, and shows the proportion (relative frequency) of the samples in each interval.

• To make a histogram, you break up the range of values covered by the samples into a number of disjoint adjacent intervals, each having the same width, say width ∆. The height of the bar on each interval [j∆, (j + 1)∆) is taken such that the area of the bar is equal to the proportion of the measurements falling in that interval (the proportion of measurements within the interval is divided by the width of the interval to obtain the height of the bar).

• The total area under the probability histogram is thus standardized/normalized to one.

(b) If you take sufficiently many independent samples from a continuous random variable and make the width ∆ of the base intervals of the probability histogram smaller and smaller, the graph of the probability histogram will begin to look more and more like the pdf.

(c) Conclusion: A probability density function can be seen as a “smoothed out” version of a probability histogram.
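A small sketch of this idea (plain MATLAB, no toolboxes assumed; the sample size and bin width are arbitrary choices), using the uniform random variable generated by rand, whose pdf is flat over (0, 1):

    N = 1e5;                                  % number of independent samples
    samples = rand(1, N);                     % uniform samples on (0, 1)
    delta = 0.05;                             % bin width
    edges = 0:delta:1;                        % disjoint adjacent intervals
    counts = histc(samples, edges);           % counts per bin (last bin: x == 1 exactly)
    heights = counts(1:end-1) / (N * delta);  % normalize so the total bar area is 1
    bar(edges(1:end-1) + delta/2, heights, 1) % probability histogram
    hold on
    plot([0 1], [1 1], 'r', 'LineWidth', 2)   % the flat pdf of U(0, 1) for comparison
    hold off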

10.3 Expectation and Variance

10.27. Expectation: Suppose X is a continuous random variable with probability density function fX(x).

EX = ∫_{−∞}^{∞} x fX(x) dx    (21)

E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx    (22)

In particular,

E[X²] = ∫_{−∞}^{∞} x² fX(x) dx

VarX = ∫_{−∞}^{∞} (x − EX)² fX(x) dx = E[X²] − (EX)².

Example 10.28. For the random variable generated by the rand command in MATLAB or the rand() command in Excel,
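A numerical sketch for this case (MATLAB's integral is used to evaluate (21) and the second-moment integral for the flat pdf on (0, 1)):

    f = @(x) ones(size(x));                    % pdf of the rand output: 1 on (0, 1)
    EX  = integral(@(x) x    .* f(x), 0, 1);   % (21): EX = 1/2
    EX2 = integral(@(x) x.^2 .* f(x), 0, 1);   % second moment: E[X^2] = 1/3
    VarX = EX2 - EX^2;                         % variance: 1/3 - 1/4 = 1/12
    disp([EX EX2 VarX])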

Example 10.29. For the exponential random variable introduced in Definition 10.23,
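For reference, the standard computation (integration by parts) gives

EX = ∫_0^∞ x λ e^(−λx) dx = [−x e^(−λx)]_0^∞ + ∫_0^∞ e^(−λx) dx = 1/λ,

E[X²] = ∫_0^∞ x² λ e^(−λx) dx = 2/λ², and therefore VarX = E[X²] − (EX)² = 2/λ² − 1/λ² = 1/λ².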


10.30. If we compare other characteristics of discrete and continuous random variables, we find that with discrete random variables, many facts are expressed as sums. With continuous random variables, the corresponding facts are expressed as integrals.

10.31. All of the properties for the expectation and variance of discrete random variables also hold for continuous random variables:

(a) Intuition/interpretation of the expected value: As n → ∞, the average of n independent samples of X will approach EX. This observation is known as the “Law of Large Numbers”.

(b) For c ∈ R, E [c] = c

(c) For constants a, b, we have E [aX + b] = aEX + b.

(d) E[∑_{i=1}^{n} ci gi(X)] = ∑_{i=1}^{n} ci E[gi(X)].

(e) VarX = E[X²] − (EX)²

(f) VarX ≥ 0.

(g) VarX ≤ E[X²].

(h) Var[aX + b] = a² VarX.

(i) σaX+b = |a|σX .

10.32. Chebyshev's Inequality:

P[|X − EX| ≥ α] ≤ σX² / α²

or equivalently

P[|X − EX| ≥ nσX] ≤ 1/n².

• This inequality uses the variance to bound the “tail probability” of a random variable.

• Useful only when α > σX.


Example 10.33. A circuit is designed to handle a current of 20 mA plus or minus a deviation of less than 5 mA. If the applied current has mean 20 mA and variance 4 mA², use the Chebyshev inequality to bound the probability that the applied current violates the design parameters.

Let X denote the applied current. Then X is within the design parameters if and only if |X − 20| < 5. To bound the probability that this does not happen, write

P[|X − 20| ≥ 5] ≤ VarX / 5² = 4/25 = 0.16.

Hence, the probability of violating the design parameters is at most 16%.

10.34. Interesting applications of expectation:

(a) fX (x) = E [δ (X − x)]

(b) P [X ∈ B] = E [1B(X)]
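Property (b) is the basis of Monte Carlo estimates of probabilities: by the long-run-average interpretation in 10.31(a), averaging the indicator 1B(X) over many independent samples approximates P[X ∈ B]. A small sketch (plain MATLAB; the event B = (0.7, 1) and the sample size are arbitrary choices), using X generated by rand:

    N = 1e6;                       % number of independent samples of X
    X = rand(1, N);                % X uniform on (0, 1)
    indicator = (X > 0.7);         % 1_B(X) for the event B = (0.7, 1)
    disp(mean(indicator))          % estimate of P[X in B]; the exact value is 0.3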


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part IV.2 Dr.Prapun

10.4 Families of Continuous Random Variables

Theorem 10.24 states that any nonnegative function f(x) whose integral over the interval (−∞, +∞) equals 1 can be regarded as a probability density function of a random variable. In real-world applications, however, special mathematical forms naturally show up. In this section, we introduce a couple of families of continuous random variables that frequently appear in practical applications. The probability densities of the members of each family all have the same mathematical form but differ only in one or more parameters.

10.4.1 Uniform Distribution

Definition 10.35. For a uniform random variable on an interval [a, b], we denote its family by uniform([a, b]) or U([a, b]) or simply U(a, b). Expressions that are synonymous with “X is a uniform random variable” are “X is uniformly distributed”, “X has a uniform distribution”, and “X has a uniform density”. This family is characterized by

fX(x) = { 0, x < a or x > b; 1/(b − a), a ≤ x ≤ b }

• The random variable X is just as likely to be near any value in [a, b] as any other value.


• In MATLAB,

(a) use X = a+(b-a)*rand or X = random('Uniform',a,b) to generate the RV,

(b) use pdf('Uniform',x,a,b) and cdf('Uniform',x,a,b) to calculate the pdf and cdf, respectively.

Exercise 10.36. Show that

FX(x) = { 0, x < a; (x − a)/(b − a), a ≤ x ≤ b; 1, x > b }

Figure 16: The pdf and cdf for the uniform random variable. [16, Fig. 3.5]

Example 10.37 (F2011). Suppose X is uniformly distributed on the interval (1, 2). (X ∼ U(1, 2).)

(a) Plot the pdf fX(x) of X.

(b) Plot the cdf FX(x) of X.

10.38. The uniform distribution provides a probability model for selecting a point at random from the interval [a, b].

• Use with caution to model a quantity that is known to vary randomly between a and b but about which little else is known.


Example 10.39. [9, Ex. 4.1 p. 140-141] In coherent radio communications, the phase difference between the transmitter and the receiver, denoted by Θ, is modeled as having a uniform density on [−π, π].

(a) P[Θ ≤ 0] = 1/2

(b) P[Θ ≤ π/2] = 3/4

Exercise 10.40. Show that when X ∼ U([a, b]), EX = (a + b)/2, VarX = (b − a)²/12, and E[X²] = (1/3)(b² + ab + a²).

10.4.2 Gaussian Distribution

10.41. This is the most widely used model for the distribution of a random variable. When you have many independent random variables, a fundamental result called the central limit theorem (CLT) (informally) says that the sum (technically, the average) of them can often be approximated by a normal distribution.

Definition 10.42. Gaussian random variables:

(a) Often called normal random variables because they occur so frequently in practice.

(b) In MATLAB, use X = random('Normal',m,σ) or X = σ*randn + m.

(c) fX(x) = (1/(√(2π) σ)) e^(−(1/2)((x−m)/σ)²).

• In Excel, use NORMDIST(x,m,σ,FALSE). In MATLAB, use normpdf(x,m,σ) or pdf('Normal',x,m,σ).

• Figure 17 displays the famous bell-shaped graph of the Gaussian pdf. This curve is also called the normal curve.


(d) FX(x) has no closed-form expression. However, see 10.48.

• In MATLAB, use normcdf(x,m,σ) or cdf(’Normal’,x,m,σ).

• In Excel, use NORMDIST(x,m,σ,TRUE).

(e) We write X ∼ N(m, σ²).

Figure 17: The pdf and cdf of N(µ, σ²). [16, Fig. 3.6]

10.43. EX = m and VarX = σ².

10.44. Important probabilities:

P[|X − µ| < σ] = 0.6827;  P[|X − µ| > σ] = 0.3173;
P[|X − µ| > 2σ] = 0.0455;  P[|X − µ| < 2σ] = 0.9545.

These values are illustrated in Figure 20.
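These values can be checked directly from the standard normal cdf (a one-line sketch using the normcdf command mentioned above; the Statistics Toolbox is assumed):

    disp(normcdf(1) - normcdf(-1))          % P[|X - mu| < sigma]    = 0.6827
    disp(normcdf(2) - normcdf(-2))          % P[|X - mu| < 2*sigma]  = 0.9545
    disp(1 - (normcdf(2) - normcdf(-2)))    % P[|X - mu| > 2*sigma]  = 0.0455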

Example 10.45. Figure 21 compares several deviation scores and the normal distribution:

(a) Standard scores have a mean of zero and a standard deviation of 1.0.

(b) Scholastic Aptitude Test scores have a mean of 500 and a standard deviation of 100.


Figure 18: Electrical activity of a skeletal muscle: (a) a sample skeletal muscle (emg) signal, and (b) its histogram and pdf fits. [16, Fig. 3.14]


Figure 19: Plots of the zero-mean Gaussian pdf for different values of the standard deviation, σX. [16, Fig. 3.15]


Figure 20: Probability density function of X ∼ N(µ, σ²).


Figure 21: Comparison of Several Deviation Scores and the Normal Distribution


(c) Binet Intelligence Scale43 scores have a mean of 100 and a standard deviation of 16.

In each case there are 34 percent of the scores between the mean and one standard deviation, 14 percent between one and two standard deviations, and 2 percent beyond two standard deviations. [Source: Beck, Applying Psychology: Critical and Creative Thinking.]

10.46. N (0, 1) is the standard Gaussian (normal) distribution.

• In Excel, use NORMSINV(RAND()). In MATLAB, use randn.

• The standard normal cdf is denoted by Φ(z).

It inherits all properties of a cdf.

Moreover, note that Φ(−z) = 1 − Φ(z).

10.47. Relationship between N(0, 1) and N(m, σ²).

(a) An arbitrary Gaussian random variable with mean m and variance σ² can be represented as σZ + m, where Z ∼ N(0, 1).
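This representation is what makes the standard normal cdf sufficient for computing Gaussian probabilities: since X = σZ + m, we have P[X ≤ x] = P[Z ≤ (x − m)/σ] = Φ((x − m)/σ). A quick numerical check (MATLAB with the Statistics Toolbox; the values of m, σ, and x are arbitrary choices):

    m = 3; sigma = 2; x = 4.5;           % arbitrary parameter values
    disp(normcdf(x, m, sigma))           % P[X <= x] for X ~ N(m, sigma^2)
    disp(normcdf((x - m)/sigma))         % Phi((x - m)/sigma): the same value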

43Alfred Binet, who devised the first general aptitude test at the beginning of the 20th century, defined intelligence as the ability to make adaptations. The general purpose of the test was to determine which children in Paris could benefit from school. Binet's test, like its subsequent revisions, consists of a series of progressively more difficult tasks that children of different ages can successfully complete. A child who can solve problems typically solved by children at a particular age level is said to have that mental age. For example, if a child can successfully do the same tasks that an average 8-year-old can do, he or she is said to have a mental age of 8. The intelligence quotient, or IQ, is defined by the formula:

IQ = 100 × (Mental Age/Chronological Age)

There has been a great deal of controversy in recent years over what intelligence tests measure. Many of the test items depend on either language or other specific cultural experiences for correct answers. Nevertheless, such tests can rather effectively predict school success. If school requires language and the tests measure language ability at a particular point of time in a child's life, then the test is a better-than-chance predictor of school performance.


Table 3.1: The standard normal CDF Φ(z) for z from 0.00 to 2.99 in steps of 0.01. (The full table of values is omitted here; representative entries are Φ(0.00) = 0.5000, Φ(0.50) = 0.6915, Φ(1.00) = 0.8413, Φ(1.50) = 0.9332, Φ(2.00) = 0.97725, Φ(2.50) = 0.99379, Φ(2.99) = 0.99861.)

This relationship can be used to generate a general Gaussian RV from a standard Gaussian RV.

(b) If X ∼ N (m, σ²), the random variable

Z = (X − m)/σ

is a standard normal random variable. That is, Z ∼ N (0, 1).

• Creating a new random variable by this transformation is referred to as standardizing.

• The standardized variable is called the "standard score" or "z-score".

10.48. It is impossible to express the integral of a Gaussian PDF between non-infinite limits (e.g., (20)) in terms of functions that appear on most scientific calculators.

• An old but still popular technique to find integrals of the Gaussian PDF is to refer to tables that have been obtained by numerical integration.

One such table is the table that lists Φ(z) for many values of positive z.

For X ∼ N (m, σ²), we can show that the CDF of X can be calculated by

FX(x) = Φ((x − m)/σ).
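A minimal MATLAB check of this standardization formula (the values m = 1, σ² = 2, and x = 2 are arbitrary illustration choices, not from the notes): Φ can be built from the built-in erf function via Φ(z) = (1/2)(1 + erf(z/√2)) (see 10.52 below), so no lookup table is needed.

% CDF of X ~ N(m, sigma^2) evaluated through the standard normal CDF Phi
Phi = @(z) 0.5*(1 + erf(z/sqrt(2)));   % standard normal CDF from erf
m = 1; sigma = sqrt(2); x = 2;
F_X = Phi((x - m)/sigma)               % F_X(x) = Phi((x-m)/sigma)
Phi(-1) + Phi(1)                       % check: Phi(-z) = 1 - Phi(z), so this is 1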

Example 10.49. Suppose Z ∼ N (0, 1). Evaluate the following probabilities.

(a) P [−1 ≤ Z ≤ 1]


(b) P [−2 ≤ Z ≤ 2]

Example 10.50. Suppose X ∼ N (1, 2). Find P [1 ≤ X ≤ 2].

10.51. Q-function: Q(z) = ∫_z^∞ (1/√(2π)) e^{−x²/2} dx corresponds to P [X > z] where X ∼ N (0, 1); that is, Q(z) is the probability of the "tail" of N (0, 1). The Q function is therefore a complementary cdf (ccdf).


Figure 22: Q-function

(a) Q is a decreasing function with Q(0) = 1/2.

(b) Q(−z) = 1 − Q(z) = Φ(z)

10.52. Error function (MATLAB): erf(z) = (2/√π) ∫_0^z e^{−x²} dx = 1 − 2Q(√2 z)

(a) It is an odd function of z.

(b) For z ≥ 0, it corresponds to P [|X| < z] where X ∼ N (0, 1/2).

(c) lim_{z→∞} erf(z) = 1


Table 3.2: The standard normal complementary CDF Q(z) for z from 3.00 to 4.99 in steps of 0.01. (The full table of values is omitted here; representative entries are Q(3.00) ≈ 1.35×10^−3, Q(3.50) ≈ 2.33×10^−4, Q(4.00) ≈ 3.17×10^−5, Q(4.50) ≈ 3.40×10^−6, Q(4.99) ≈ 3.02×10^−7.)

Page 39: 9 Expectation and Variance · 9 Expectation and Variance Two numbers are often used to summarize a probability distribu-tion for a random variable X. The mean is a measure of the

(d) erf(−z) = −erf(z)

(e) Φ(x) = (1/2)(1 + erf(x/√2)) = (1/2) erfc(−x/√2)

(f) The complementary error function:

erfc(z) = 1 − erf(z) = 2Q(√2 z) = (2/√π) ∫_z^∞ e^{−x²} dx

(g) Approximations:

(i) Q(z) ≈ [1/((1 − a)z + a√(z² + b))] · (1/√(2π)) e^{−z²/2}, where a = 1/π and b = 2π.

(ii) For x > 0, (1 − 1/x²)(1/(x√(2π))) e^{−x²/2} ≤ Q(x) ≤ (1/(x√(2π))) e^{−x²/2}.

Moments and central moments of X ∼ N (μ, σ²):

n            : 0    1    2         3            4
E[X^n]       : 1    μ    μ² + σ²   μ³ + 3μσ²    μ⁴ + 6μ²σ² + 3σ⁴
E[(X − μ)^n] : 1    0    σ²        0            3σ⁴

• E[(X − μ)^k] = 0 for odd k, and 1·3·5···(k − 1) σ^k for even k.

• E[|X − μ|^k] = 2·4·6···(k − 1) σ^k √(2/π) for odd k, and 1·3·5···(k − 1) σ^k for even k [Papoulis p 111].

• Var[X²] = 4μ²σ² + 2σ⁴.

For N (0, 1) and k ≥ 1, E[X^k] = 0 for odd k and 1·3·5···(k − 1) for even k.

Error function (MATLAB): erf(z) = (2/√π) ∫_0^z e^{−x²} dx = 1 − 2Q(√2 z) corresponds to P [|X| < z] where X ∼ N (0, 1/2).

(a) lim_{z→∞} erf(z) = 1

(b) erf(−z) = −erf(z)

Figure 23: erf-function and Q-function
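The following MATLAB sketch (a minimal illustration, not from the notes; the test points z = 0, 1, 2, 3 are arbitrary) evaluates Q through the built-in erfc function, using Q(z) = (1/2) erfc(z/√2), and compares it with approximation (g)(i) above:

% Q-function via erfc, compared with the approximation in (g)(i)
Q  = @(z) 0.5*erfc(z/sqrt(2));                  % exact (up to erfc accuracy)
a = 1/pi; b = 2*pi;
Qa = @(z) (1./((1-a)*z + a*sqrt(z.^2 + b))) ...
          .* (1/sqrt(2*pi)) .* exp(-z.^2/2);    % approximation (g)(i)
z = [0 1 2 3];
[Q(z); Qa(z)]                                   % the two rows should be close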

10.4.3 Exponential Distribution

Definition 10.53. The exponential distribution is denoted by E(λ).

(a) λ > 0 is a parameter of the distribution, often called the rate parameter.

(b) Characterized by

• fX(x) = λe^{−λx} for x > 0, and fX(x) = 0 for x ≤ 0.

• FX(x) = 1 − e^{−λx} for x > 0, and FX(x) = 0 for x ≤ 0.


• Survival-, survivor-, or reliability-function:

(c) MATLAB:

• X = exprnd(1/λ) or random(’exp’,1/λ)

• fX(x) = exppdf(x,1/λ) or pdf(’exp’,x,1/λ)

• FX(x) = expcdf(x,1/λ) or cdf(’exp’,x,1/λ)
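As a sanity check on these commands (a minimal sketch, not from the notes; λ = 1/3 and the sample size are arbitrary choices, and since exprnd/expcdf need the Statistics Toolbox, a toolbox-free inverse-CDF line is shown as well):

% Sampling from E(lambda) and checking the CDF empirically
lambda = 1/3;
U = rand(1, 1e5);
X = -log(U)/lambda;        % inverse-CDF method; same distribution as exprnd(1/lambda)
mean(X <= 2)               % empirical P[X <= 2]
1 - exp(-lambda*2)         % F_X(2) = 1 - e^{-2*lambda}; should match closely
% With the Statistics Toolbox: X = exprnd(1/lambda, 1, 1e5); expcdf(2, 1/lambda)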

Example 10.54. Suppose X ∼ E(λ), find P [1 < X < 2].

Exercise 10.55. Exponential random variable as a continuous version of geometric random variable: Suppose X ∼ E(λ). Show that ⌊X⌋ ∼ G0(e^{−λ}) and ⌈X⌉ ∼ G1(e^{−λ}).

Example 10.56. The exponential distribution is intimately related to the Poisson process. It is often used as a probability model for the (waiting) time until a "rare" event occurs.

• time elapsed until the next earthquake in a certain region

• decay time of a radioactive particle

• time between independent events such as arrivals at a service facility or arrivals of customers in a shop

• duration of a cell-phone call

• time it takes a computer network to transmit a message from one node to another


10.57. EX = 1/λ
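For reference, a one-line derivation by integration by parts (this step is not spelled out in the notes):

EX = ∫_0^∞ x λe^{−λx} dx = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + 1/λ = 1/λ.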

Example 10.58. Phone Company A charges $0.15 per minute for telephone calls. For any fraction of a minute at the end of a call, they charge for a full minute. Phone Company B also charges $0.15 per minute. However, Phone Company B calculates its charge based on the exact duration of a call. If T, the duration of a call in minutes, is exponential with parameter λ = 1/3, what are the expected revenues per call E [RA] and E [RB] for companies A and B?

Solution: First, note that ET = 1/λ = 3. Hence,

E [RB] = E [0.15 × T] = 0.15 ET = $0.45,

and

E [RA] = E [0.15 × ⌈T⌉] = 0.15 E⌈T⌉.

Now, recall, from Exercise 10.55, that ⌈T⌉ ∼ G1(e^{−λ}). Hence, E⌈T⌉ = 1/(1 − e^{−λ}) ≈ 3.53. Therefore,

E [RA] = 0.15 E⌈T⌉ ≈ 0.5292.

10.59. Memoryless property: The exponential r.v. is the only continuous44 r.v. on [0,∞) that satisfies the memoryless property:

P [X > s + x | X > s] = P [X > x]

for all x > 0 and all s > 0 [18, p. 157–159]. In words, the future is independent of the past. The fact that it hasn't happened yet tells us nothing about how much longer it will take before it does happen.

• Imagining that the exponentially distributed random variable X represents the lifetime of an item, the residual life of an item has the same exponential distribution as the original lifetime, regardless of how long the item has been already in use. In other words, there is no deterioration/degradation over time. If it is still currently working after 20 years of use, then today, its condition is "just like new".

44For discrete random variables, the geometric random variable satisfies the memoryless property.

• In particular, suppose we define the set B + x to be {x + b : b ∈ B}. For any x > 0 and set B ⊂ [0,∞), we have

P [X ∈ B + x | X > x] = P [X ∈ B]

because

P [X ∈ B + x] / P [X > x] = (∫_{B+x} λe^{−λt} dt) / e^{−λx}
  (substituting τ = t − x)
  = (∫_B λe^{−λ(τ+x)} dτ) / e^{−λx} = ∫_B λe^{−λτ} dτ = P [X ∈ B].

10.60. Summary:

X ∼                           Support SX    fX(x)
Uniform U(a, b)               (a, b)        1/(b − a) for a < x < b; 0 otherwise
Normal (Gaussian) N (m, σ²)   R             (1/(√(2π) σ)) e^{−(1/2)((x−m)/σ)²}
Exponential E(λ)              (0, ∞)        λe^{−λx} for x > 0; 0 for x ≤ 0

Table 4: Examples of probability density functions. Here, λ, σ > 0.


10.5 Function of Continuous Random Variables: SISO

Reconsider the derived random variable Y = g(X).

Recall that we can find EY easily by (22):

EY = E [g(X)] = ∫_R g(x) fX(x) dx.

However, there are cases when we have to evaluate probability directly involving the random variable Y or find fY (y) directly.

Recall that for discrete random variables, it is easy to find pY (y) by adding all pX(x) over all x such that g(x) = y:

pY (y) = ∑_{x: g(x)=y} pX(x). (23)

For continuous random variables, it turns out that we can't45 simply integrate the pdf of X to get the pdf of Y.

10.61. For Y = g(X), if you want to find fY (y), the following two-step procedure will always work and is easy to remember:

(a) Find the cdf FY (y) = P [Y ≤ y].

(b) Compute the pdf from the cdf by "finding the derivative" fY (y) = (d/dy) FY (y) (as described in 10.13).
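For instance (a small illustration not taken from the notes), let X ∼ U(0, 1) and Y = X³. Then for 0 < y < 1,

FY (y) = P [X³ ≤ y] = P [X ≤ y^{1/3}] = y^{1/3}, so fY (y) = (d/dy) y^{1/3} = (1/3) y^{−2/3} on (0, 1), and fY (y) = 0 otherwise.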

10.62. Linear Transformation: Suppose Y = aX + b. Then, the cdf of Y is given by

FY (y) = P [Y ≤ y] = P [aX + b ≤ y] = P [X ≤ (y − b)/a] for a > 0, and P [X ≥ (y − b)/a] for a < 0.

Now, by definition, we know that

P [X ≤ (y − b)/a] = FX((y − b)/a),

and

P [X ≥ (y − b)/a] = P [X > (y − b)/a] + P [X = (y − b)/a] = 1 − FX((y − b)/a) + P [X = (y − b)/a].

For a continuous random variable, P [X = (y − b)/a] = 0. Hence,

FY (y) = FX((y − b)/a) for a > 0, and FY (y) = 1 − FX((y − b)/a) for a < 0.

Finally, the fundamental theorem of calculus and the chain rule give

fY (y) = (d/dy) FY (y) = (1/a) fX((y − b)/a) for a > 0, and −(1/a) fX((y − b)/a) for a < 0.

Note that we can further simplify the final formula by using the | · | function:

fY (y) = (1/|a|) fX((y − b)/a), a ≠ 0. (24)

Graphically, to get the plot of fY, we scale fX horizontally by a factor of a (a stretch when |a| > 1, a compression when |a| < 1, with a reflection when a < 0), scale it vertically by a factor of 1/|a|, and shift it to the right by b.

Of course, if a = 0, then we get the uninteresting degenerate random variable Y ≡ b.

45When you apply Equation (23) to continuous random variables, what you get is 0 = 0, which is true but neither interesting nor useful.

Example 10.63. Suppose X ∼ E(λ). Let Y = 5X. Find fY (y).


10.64. Suppose X ∼ N (m, σ²) and Y = aX + b for some constants a and b. Then, we can use (24) to show that Y ∼ N (am + b, a²σ²).
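A quick simulation sketch of this fact (a minimal check, not from the notes; the values m = 1, σ = 2, a = −3, b = 5 are arbitrary):

% Check that Y = a*X + b is N(a*m + b, a^2*sigma^2) when X ~ N(m, sigma^2)
m = 1; sigma = 2; a = -3; b = 5;
X = m + sigma*randn(1, 1e6);     % X ~ N(m, sigma^2), using 10.47(a)
Y = a*X + b;
[mean(Y), a*m + b]               % empirical vs. theoretical mean
[var(Y), a^2*sigma^2]            % empirical vs. theoretical variance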

Example 10.65. Amplitude modulation in certain communication systems can be accomplished using various nonlinear devices such as a semiconductor diode. Suppose we model the nonlinear device by the function Y = X². If the input X is a continuous random variable, find the density of the output Y = X².

Example 10.66. Suppose X ∼ E(λ). Let Y = 1/X². Find fY (y).


Exercise 10.67 (F2011). Suppose X is uniformly distributed on the interval (1, 2). (X ∼ U(1, 2).) Let Y = 1/X².

(a) Find fY (y).

(b) Find EY .

Exercise 10.68 (F2011). Consider the function

g(x) = x for x ≥ 0, and g(x) = −x for x < 0.

Suppose Y = g(X), where X ∼ U(−2, 2).

Remark: The function g operates like a full-wave rectifier in that if a positive input voltage X is applied, the output is Y = X, while if a negative input voltage X is applied, the output is Y = −X.

(a) Find EY .

(b) Plot the cdf of Y .

(c) Find the pdf of Y.


                       Discrete                              Continuous

P [X ∈ B]              ∑_{x∈B} pX(x)                         ∫_B fX(x) dx

P [X = x]              pX(x) = F(x) − F(x−)                  0

Interval prob.         PX((a, b]) = F(b) − F(a)              PX((a, b]) = PX([a, b]) = PX([a, b)) = PX((a, b))
                       PX([a, b]) = F(b) − F(a−)                = ∫_a^b fX(x) dx = F(b) − F(a)
                       PX([a, b)) = F(b−) − F(a−)
                       PX((a, b)) = F(b−) − F(a)

EX                     ∑_x x pX(x)                           ∫_{−∞}^{+∞} x fX(x) dx

For Y = g(X)           pY (y) = ∑_{x: g(x)=y} pX(x)          fY (y) = (d/dy) P [g(X) ≤ y].
                                                             Alternatively, fY (y) = ∑_k fX(xk)/|g′(xk)|,
                                                             where the xk are the real-valued roots
                                                             of the equation y = g(x).

For Y = g(X)           P [Y ∈ B] = ∑_{x: g(x)∈B} pX(x)       P [Y ∈ B] = ∫_{x: g(x)∈B} fX(x) dx

E [g(X)]               ∑_x g(x) pX(x)                        ∫_{−∞}^{+∞} g(x) fX(x) dx

E [X²]                 ∑_x x² pX(x)                          ∫_{−∞}^{+∞} x² fX(x) dx

VarX                   ∑_x (x − EX)² pX(x)                   ∫_{−∞}^{+∞} (x − EX)² fX(x) dx

Table 5: Important Formulas for Discrete and Continuous Random Variables


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part V.1 Dr.Prapun

11 Multiple Random Variables

One is often interested not only in individual random variables, but also in relationships between two or more random variables. Furthermore, one often wishes to make inferences about one random variable on the basis of observations of other random variables.

Example 11.1. If the experiment is the testing of a new medicine, the researcher might be interested in cholesterol level, blood pressure, and the glucose level of a test person.

11.1 A Pair of Discrete Random Variables

In this section, we consider two discrete random variables, say X and Y, simultaneously.

11.2. The analysis here differs from that of Section 9.2 in two main aspects. First, there may be no deterministic relationship (such as Y = g(X)) between the two random variables. Second, we want to look at both random variables as a whole, not just X alone or Y alone.

Example 11.3. Communication engineers may be interested in the input X and output Y of a communication channel.


Example 11.4. Of course, to rigorously define (any) random variables, we need to go back to the sample space Ω. Recall Example 7.4 where we considered several random variables defined on the sample space Ω = {1, 2, 3, 4, 5, 6} where the outcomes are equally likely. In that example, we define X(ω) = ω and Y (ω) = (ω − 3)².

Example 11.5. Consider the scores of 20 students below:

Room #1: 10, 9, 10, 9, 9, 10, 9, 10, 10, 9
Room #2: 1, 3, 4, 6, 5, 5, 3, 3, 1, 3

The first ten scores are from (ten) students in room #1. The last 10 scores are from (ten) students in room #2.

Suppose we have a score report card for each student. Then, in total, we have 20 report cards.

Figure 24: In Example 11.5, we pick a report card randomly from a pile of cards.

I pick one report card up randomly. Let X be the score on that card.

• What is the chance that X > 5? (Ans: P [X > 5] = 11/20.)


• What is the chance that X = 10? (Ans: pX(10) = P [X = 10] = 5/20 = 1/4.)

Now, let the random variable Y denote the room# of the student whose report card is picked up.

• What is the probability that X = 10 and Y = 2?

• What is the probability that X = 10 and Y = 1?

• What is the probability that X > 5 and Y = 1?

• What is the probability that X > 5 and Y = 2?

Now suppose someone informs me that the report card which I picked up is from a student in room #1. (He may be able to tell this by the color of the report card, of which I have no knowledge.) I now have the extra information that Y = 1.

• What is the probability that X > 5 given that Y = 1?

• What is the probability that X = 10 given that Y = 1?


11.6. Recall that, in probability, “,” means “and”. For example,

P [X = x, Y = y] = P [X = x and Y = y]

and

P [3 ≤ X < 4, Y < 1] = P [3 ≤ X < 4 and Y < 1]

= P [X ∈ [3, 4) and Y ∈ (−∞, 1)] .

In general, the event

[“Some condition(s) on X”,“Some condition(s) on Y ”]

is the same as the intersection of two events:

[“Some condition(s) on X”] ∩ [“Some condition(s) on Y ”]

which simply means both statements happen. More technically,

[X ∈ B, Y ∈ C] = [X ∈ B and Y ∈ C] = [X ∈ B] ∩ [Y ∈ C]

and

P [X ∈ B, Y ∈ C] = P [X ∈ B and Y ∈ C] = P ([X ∈ B] ∩ [Y ∈ C]).

Remark: Linking back to the original sample space, this shorthand actually says

[X ∈ B, Y ∈ C] = [X ∈ B and Y ∈ C]
= {ω ∈ Ω : X(ω) ∈ B and Y (ω) ∈ C}
= {ω ∈ Ω : X(ω) ∈ B} ∩ {ω ∈ Ω : Y (ω) ∈ C}
= [X ∈ B] ∩ [Y ∈ C].


11.7. The concept of conditional probability can be straightforwardly applied to discrete random variables. For example,

P ["Some condition(s) on X" | "Some condition(s) on Y"] (25)

is the conditional probability P (A|B) where

A = ["Some condition(s) on X"] and
B = ["Some condition(s) on Y"].

Recall that P (A|B) = P (A ∩ B)/P (B). Therefore,

P [X = x|Y = y] = P [X = x and Y = y] / P [Y = y],

and

P [3 ≤ X < 4|Y < 1] = P [3 ≤ X < 4 and Y < 1] / P [Y < 1].

More generally, (25) is

= P (["Some condition(s) on X"] ∩ ["Some condition(s) on Y"]) / P (["Some condition(s) on Y"])
= P (["Some condition(s) on X", "Some condition(s) on Y"]) / P (["Some condition(s) on Y"])
= P ["Some condition(s) on X", "Some condition(s) on Y"] / P ["Some condition(s) on Y"].

More technically,

P [X ∈ B|Y ∈ C] = P ([X ∈ B] | [Y ∈ C]) = P ([X ∈ B] ∩ [Y ∈ C]) / P ([Y ∈ C]) = P [X ∈ B, Y ∈ C] / P [Y ∈ C].


Definition 11.8. Joint pmf: If X and Y are two discrete random variables (defined on the same sample space with probability measure P), the function pX,Y (x, y) defined by

pX,Y (x, y) = P [X = x, Y = y]

is called the joint probability mass function of X and Y .

(a) We can visualize the joint pmf via a stem plot. See Figure 25.

(b) To evaluate the probability for a statement that involves both X and Y random variables:

We first find all pairs (x, y) that satisfy the condition(s) in the statement, and then add up all the corresponding values from the joint pmf.

More technically, we can then evaluate P [(X, Y ) ∈ R] by

P [(X, Y ) ∈ R] = ∑_{(x,y): (x,y)∈R} pX,Y (x, y).

Example 11.9 (F2011). Consider random variables X and Y whose joint pmf is given by

pX,Y (x, y) = c (x + y) for x ∈ {1, 3} and y ∈ {2, 4}, and pX,Y (x, y) = 0 otherwise.

(a) Check that c = 1/20.

(b) Find P [X² + Y² = 13].

In most situations, it is much more convenient to focus on the "important" part of the joint pmf. To do this, we usually present the joint pmf (and the conditional pmf) in their matrix forms:


Definition 11.10. When both X and Y take finitely many values (both have finite supports), say SX = {x1, . . . , xm} and SY = {y1, . . . , yn}, respectively, we can arrange the probabilities pX,Y (xi, yj) in an m × n matrix

[ pX,Y (x1, y1)  pX,Y (x1, y2)  . . .  pX,Y (x1, yn)
  pX,Y (x2, y1)  pX,Y (x2, y2)  . . .  pX,Y (x2, yn)
  ...            ...            . . .  ...
  pX,Y (xm, y1)  pX,Y (xm, y2)  . . .  pX,Y (xm, yn) ].  (26)

• We shall call this matrix the joint pmf matrix.

• The sum of all the entries in the matrix is one.


Figure 25: Example of the plot of a joint pmf. [9, Fig. 2.8]

• pX,Y (x, y) = 0 if46 x ∉ SX or y ∉ SY. In other words, we don't have to consider the x and y outside the supports of X and Y, respectively.

46To see this, note that pX,Y (x, y) cannot exceed pX(x) because P (A ∩ B) ≤ P (A). Now, suppose at x = a, we have pX(a) = 0. Then pX,Y (a, y) must also = 0 for any y because it cannot exceed pX(a) = 0. Similarly, suppose at y = a, we have pY (a) = 0. Then pX,Y (x, a) = 0 for any x.


11.11. From the joint pmf, we can find pX(x) and pY (y) by

pX(x) = ∑_y pX,Y (x, y) (27)

pY (y) = ∑_x pX,Y (x, y) (28)

In this setting, pX(x) and pY (y) are called the marginal pmfs (to distinguish them from the joint one).

(a) Suppose we have the joint pmf matrix in (26). Then, the sum of the entries in the ith row is47 pX(xi), and the sum of the entries in the jth column is pY (yj):

pX(xi) = ∑_{j=1}^{n} pX,Y (xi, yj) and pY (yj) = ∑_{i=1}^{m} pX,Y (xi, yj)

(b) In MATLAB, suppose we save the joint pmf matrix as P_XY, then the marginal pmf (row) vectors p_X and p_Y can be found by

p_X = (sum(P_XY,2))'
p_Y = (sum(P_XY,1))
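For concreteness, here is a minimal sketch with a small, made-up joint pmf matrix (the numbers are hypothetical, chosen only so the entries sum to one; P_XY follows the row = x, column = y convention of (26)):

% Hypothetical joint pmf matrix: rows are x in {0,1}, columns are y in {0,1}
P_XY = [0.1 0.2;
        0.3 0.4];
sum(P_XY(:))              % should be 1 (a valid joint pmf)
p_X = (sum(P_XY,2))'      % marginal pmf of X: [0.3 0.7]
p_Y = (sum(P_XY,1))       % marginal pmf of Y: [0.4 0.6]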

Example 11.12. Consider the following joint pmf matrix

47To see this, we consider A = [X = xi] and a collection defined by Bj = [Y = yj] and B0 = [Y ∉ SY]. Note that the collection B0, B1, . . . , Bn partitions Ω. So, P (A) = ∑_{j=0}^{n} P (A ∩ Bj). Of course, because the support of Y is SY, we have P (A ∩ B0) = 0. Hence, the sum can start at j = 1 instead of j = 0.


Definition 11.13. The conditional pmf of X given Y is defined as

pX|Y (x|y) = P [X = x|Y = y]

which gives

pX,Y (x, y) = pX|Y (x|y)pY (y) = pY |X(y|x)pX(x). (29)

11.14. Equation (29) is quite important in practice. In most cases, systems are naturally defined/given/studied in terms of their conditional probabilities, say pY|X(y|x). Therefore, it is important that we know how to construct the joint pmf from the conditional pmf.

Example 11.15. Consider a binary symmetric channel. Suppose the input X to the channel is Bernoulli(0.3). At the output Y of this channel, the crossover (bit-flipped) probability is 0.1. Find the joint pmf pX,Y (x, y) of X and Y.
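A minimal MATLAB sketch of how (29) turns a marginal pmf of X plus a conditional pmf pY|X into a joint pmf matrix (this assumes the convention P [X = 1] = 0.3 for Bernoulli(0.3); rows are x ∈ {0, 1}, columns are y ∈ {0, 1}):

% Joint pmf of a binary symmetric channel via p_{X,Y}(x,y) = p_{Y|X}(y|x) p_X(x)
p_X = [0.7 0.3];                 % P[X=0], P[X=1]
P_Y_given_X = [0.9 0.1;          % row x=0: P[Y=0|X=0], P[Y=1|X=0]
               0.1 0.9];         % row x=1: P[Y=0|X=1], P[Y=1|X=1]
P_XY = diag(p_X) * P_Y_given_X   % entry (i,j) = p_X(x_i) * p_{Y|X}(y_j|x_i)
sum(P_XY(:))                     % sanity check: must equal 1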

Exercise 11.16. Toss-and-Roll Game:

Step 1 Toss a fair coin. Define X by

X = 1 if the result = H, and X = 0 if the result = T.

Step 2 You have two dice, Dice 1 and Dice 2. Dice 1 is fair. Dice 2 is unfair with p(1) = p(2) = p(3) = 2/9 and p(4) = p(5) = p(6) = 1/9.

(i) If X = 0, roll Dice 1.

(ii) If X = 1, roll Dice 2.


Record the result as Y .

Find the joint pmf pX,Y (x, y) of X and Y .

Exercise 11.17 (F2011). Continue from Example 11.9. Random variables X and Y have the following joint pmf

pX,Y (x, y) = c (x + y) for x ∈ {1, 3} and y ∈ {2, 4}, and pX,Y (x, y) = 0 otherwise.

(a) Find pX(x).

(b) Find EX.

(c) Find pY|X(y|1). Note that your answer should be of the form

pY|X(y|1) = ? for y = 2, ? for y = 4, and 0 otherwise.

(d) Find pY |X(y|3).

Definition 11.18. The joint cdf of X and Y is defined by

FX,Y (x, y) = P [X ≤ x, Y ≤ y] .


Definition 11.19. Two random variables X and Y are said to be identically distributed if, for every B, P [X ∈ B] = P [Y ∈ B].

Example 11.20. Let X ∼ Bernoulli(1/2). Let Y = X and Z = 1 − X. Then, all of these random variables are identically distributed.

11.21. The following statements are equivalent:

(a) Random variables X and Y are identically distributed .

(b) For every B, P [X ∈ B] = P [Y ∈ B]

(c) pX(c) = pY (c) for all c

(d) FX(c) = FY (c) for all c

Definition 11.22. Two random variables X and Y are said to be independent if the events [X ∈ B] and [Y ∈ C] are independent for all sets B and C.

11.23. The following statements are equivalent:

(a) Random variables X and Y are independent .

(b) [X ∈ B] |= [Y ∈ C] for all B,C.

(c) P [X ∈ B, Y ∈ C] = P [X ∈ B]× P [Y ∈ C] for all B,C.

(d) pX,Y (x, y) = pX(x)× pY (y) for all x, y.

(e) FX,Y (x, y) = FX(x)× FY (y) for all x, y.

Definition 11.24. Two random variables X and Y are said to be independent and identically distributed (i.i.d.) if X and Y are both independent and identically distributed.

11.25. Being identically distributed does not imply independence. Similarly, being independent does not imply being identically distributed.


Example 11.26. Roll a dice. Let X be the result. Set Y = X.

Example 11.27. Suppose the pmf of a random variable X is given by

pX(x) = 1/4 for x = 3, α for x = 4, and 0 otherwise.

Let Y be another random variable. Assume that X and Y are i.i.d.

Let Y be another random variable. Assume that X and Y arei.i.d.

Find

(a) α,

(b) the pmf of Y , and

(c) the joint pmf of X and Y .


Example 11.28. Consider a pair of random variables X and Y whose joint pmf is given by

pX,Y (x, y) = 1/15 for (x, y) = (3, 1); 2/15 for (x, y) = (4, 1); 4/15 for (x, y) = (3, 3); β for (x, y) = (4, 3); and 0 otherwise.

(a) Are X and Y identically distributed?

(b) Are X and Y independent?


11.2 Extending the Definitions to Multiple RVs

Definition 11.29. Joint pmf:

pX1,X2,...,Xn(x1, x2, . . . , xn) = P [X1 = x1, X2 = x2, . . . , Xn = xn] .

Joint cdf:

FX1,X2,...,Xn(x1, x2, . . . , xn) = P [X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn] .

11.30. Marginal pmf:

Definition 11.31. Identically distributed random variables: The following statements are equivalent.

(a) Random variables X1, X2, . . . are identically distributed

(b) For every B, P [Xj ∈ B] does not depend on j.

(c) pXi(c) = pXj(c) for all c, i, j.

(d) FXi(c) = FXj(c) for all c, i, j.

Definition 11.32. Independence among a finite number of random variables: The following statements are equivalent.

(a) X1, X2, . . . , Xn are independent

(b) [X1 ∈ B1], [X2 ∈ B2], . . . , [Xn ∈ Bn] are independent, for all B1, B2, . . . , Bn.

(c) P [Xi ∈ Bi, ∀i] = ∏_{i=1}^{n} P [Xi ∈ Bi], for all B1, B2, . . . , Bn.

(d) pX1,X2,...,Xn(x1, x2, . . . , xn) = ∏_{i=1}^{n} pXi(xi) for all x1, x2, . . . , xn.

(e) FX1,X2,...,Xn(x1, x2, . . . , xn) = ∏_{i=1}^{n} FXi(xi) for all x1, x2, . . . , xn.

Example 11.33. Toss a coin n times. For the ith toss, let

Xi = 1 if H happens on the ith toss, and Xi = 0 if T happens on the ith toss.

We then have a collection of i.i.d. random variables X1, X2, X3, . . . , Xn.


Example 11.34. Roll a dice n times. Let Ni be the result of the ith roll. We then have another collection of i.i.d. random variables N1, N2, N3, . . . , Nn.

Example 11.35. Let X1 be the result of tossing a coin. Set X2 = X3 = · · · = Xn = X1.

11.36. If X1, X2, . . . , Xn are independent, then so is any subcollection of them.

11.37. For i.i.d. Xi ∼ Bernoulli(p), Y = X1 + X2 + · · · + Xn is B(n, p).

Definition 11.38. A pairwise independent collection of random variables is a collection of random variables any two of which are independent.

(a) Any collection of (mutually) independent random variables is pairwise independent.

(b) Some pairwise independent collections are not independent. See Example 11.39.

Example 11.39. Suppose X, Y, and Z have the following joint probability distribution: pX,Y,Z(x, y, z) = 1/4 for (x, y, z) ∈ {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}. This, for example, can be constructed by starting with independent X and Y that are Bernoulli(1/2). Then set Z = X ⊕ Y = X + Y mod 2.

(a) X, Y, Z are pairwise independent.

(b) X, Y, Z are not independent.


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part V.2 Dr.Prapun

11.3 Function of Discrete Random Variables

11.40. Recall that for a discrete random variable X, the pmf of a derived random variable Y = g(X) is given by

pY (y) = ∑_{x: g(x)=y} pX(x).

Similarly, for discrete random variables X and Y, the pmf of a derived random variable Z = g(X, Y) is given by

pZ(z) = ∑_{(x,y): g(x,y)=z} pX,Y (x, y).

Example 11.41. Suppose the joint pmf of X and Y is given by

pX,Y (x, y) = 1/15 for (x, y) = (0, 0); 2/15 for (x, y) = (1, 0); 4/15 for (x, y) = (0, 1); 8/15 for (x, y) = (1, 1); and 0 otherwise.

Let Z = X + Y. Find the pmf of Z.


Exercise 11.42 (F2011). Continue from Example 11.9. Let Z = X + Y.

(a) Find the pmf of Z.

(b) Find EZ.

11.43. In general, when Z = X + Y,

pZ(z) = ∑_{(x,y): x+y=z} pX,Y (x, y) = ∑_y pX,Y (z − y, y) = ∑_x pX,Y (x, z − x).

Furthermore, if X and Y are independent,

pZ(z) = ∑_{(x,y): x+y=z} pX(x) pY (y) (30)
      = ∑_y pX(z − y) pY (y) = ∑_x pX(x) pY (z − x). (31)
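Equation (31) is a discrete convolution, so for integer-valued independent random variables whose pmfs are stored as vectors over consecutive supports, MATLAB's conv computes pZ directly. A minimal sketch with made-up pmfs (the supports {0, 1, 2} and {0, 1} and the probabilities are hypothetical):

% pmf of Z = X + Y for independent, integer-valued X and Y via convolution
p_X = [0.2 0.5 0.3];      % pmf of X on {0, 1, 2}
p_Y = [0.6 0.4];          % pmf of Y on {0, 1}
p_Z = conv(p_X, p_Y)      % pmf of Z on {0, 1, 2, 3}; entries sum to 1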

Example 11.44. Suppose Λ1 ∼ P(α1) and Λ2 ∼ P(α2) are independent. Let Λ = Λ1 + Λ2. Use (31) to show48 that Λ ∼ P(α1 + α2).

First, note that pΛ(ℓ) would be positive only on nonnegative integers because a sum of nonnegative integers (Λ1 and Λ2) is still a nonnegative integer. So, the support of Λ is the same as the support for Λ1 and Λ2. Now, we know, from (31), that

P [Λ = ℓ] = P [Λ1 + Λ2 = ℓ] = ∑_i P [Λ1 = i] P [Λ2 = ℓ − i].

Of course, we are interested in ℓ that is a nonnegative integer. The summation runs over i = 0, 1, 2, . . .. Other values of i would make P [Λ1 = i] = 0. Note also that if i > ℓ, then ℓ − i < 0 and P [Λ2 = ℓ − i] = 0. Hence, we conclude that the index i can only


be integers from 0 to ℓ:

P [Λ = ℓ] = ∑_{i=0}^{ℓ} e^{−α1} (α1^i / i!) e^{−α2} (α2^{ℓ−i} / (ℓ − i)!)
= e^{−(α1+α2)} (1/ℓ!) ∑_{i=0}^{ℓ} (ℓ! / (i! (ℓ − i)!)) α1^i α2^{ℓ−i}
= e^{−(α1+α2)} (α1 + α2)^ℓ / ℓ!,

where the last equality is from the binomial theorem. Hence, the sum of two independent Poisson random variables is still Poisson!

pΛ(ℓ) = e^{−(α1+α2)} (α1 + α2)^ℓ / ℓ! for ℓ ∈ {0, 1, 2, . . .}, and pΛ(ℓ) = 0 otherwise.

48Remark: You may feel that simplifying the sum in this example (and in Exercise 11.45) is difficult and tedious. In Section 13, we will introduce another technique which will make the answer obvious. The idea is to realize that (31) is a convolution and hence we can use the Fourier transform to work with a product in another domain.

Exercise 11.45. Suppose B1 ∼ B(n1, p) and B2 ∼ B(n2, p) are independent. Let B = B1 + B2. Use (31) to show that B ∼ B(n1 + n2, p).

11.4 Expectation of Function of Discrete Random Variables

11.46. Recall that the expected value of "any" function g of a discrete random variable X can be calculated from

E [g(X)] = ∑_x g(x) pX(x).

Similarly49, the expected value of "any" function g of two discrete random variables X and Y can be calculated from

E [g(X, Y)] = ∑_x ∑_y g(x, y) pX,Y (x, y).

49Again, these are called the law/rule of the lazy statistician (LOTUS) [22, Thm 3.6 p 48], [9, p. 149] because it is so much easier to use the above formula than to first find the pmf of g(X) or g(X, Y). It is also called the substitution rule [21, p 271].
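In MATLAB, this double sum is one line once the joint pmf is stored as a matrix. A minimal sketch, reusing the hypothetical 2×2 matrix from the marginal-pmf example above (x ∈ {0, 1} on the rows, y ∈ {0, 1} on the columns, and g(x, y) = xy as the illustrative choice):

% E[g(X,Y)] by the law of the lazy statistician (LOTUS)
P_XY = [0.1 0.2; 0.3 0.4];       % hypothetical joint pmf matrix
x = [0 1]; y = [0 1];
[xg, yg] = ndgrid(x, y);         % xg(i,j) = x_i, yg(i,j) = y_j, matching P_XY
E_XY = sum(sum(xg.*yg.*P_XY))    % E[XY]; here 1*1*0.4 = 0.4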


                     Discrete

P [X ∈ B]            ∑_{x∈B} pX(x)

P [(X, Y ) ∈ R]      ∑_{(x,y): (x,y)∈R} pX,Y (x, y)

Joint to Marginal:   pX(x) = ∑_y pX,Y (x, y)
(Law of Total Prob.) pY (y) = ∑_x pX,Y (x, y)

P [X > Y ]           ∑_x ∑_{y: y<x} pX,Y (x, y) = ∑_y ∑_{x: x>y} pX,Y (x, y)

P [X = Y ]           ∑_x pX,Y (x, x)

X |= Y               pX,Y (x, y) = pX(x) pY (y)

Conditional          pX|Y (x|y) = pX,Y (x, y)/pY (y)

E [g(X, Y )]         ∑_x ∑_y g(x, y) pX,Y (x, y)

Table 6: Joint pmf: A Summary

11.47. E [·] is a linear operator: E [aX + bY ] = aEX + bEY .

(a) Homogeneous: E [cX] = cEX

(b) Additive: E [X + Y ] = EX + EY

(c) Extension: E [∑_{i=1}^{n} ci gi(Xi)] = ∑_{i=1}^{n} ci E [gi(Xi)].

Example 11.48. Recall from 11.37 that when i.i.d. Xi ∼ Bernoulli(p), Y = X1 + X2 + · · · + Xn is B(n, p). Also, from Example 9.4, we have EXi = p. Hence,

EY = E [∑_{i=1}^{n} Xi] = ∑_{i=1}^{n} E [Xi] = ∑_{i=1}^{n} p = np.

Therefore, the expectation of a binomial random variable with parameters n and p is np.


Example 11.49. A binary communication link has bit-error probability p. What is the expected number of bit errors in a transmission of n bits?

Theorem 11.50 (Expectation and Independence). Two random variables X and Y are independent if and only if

E [h(X)g(Y )] = E [h(X)] E [g(Y )]

for "all" functions h and g.

• In other words, X and Y are independent if and only if for every pair of functions h and g, the expectation of the product h(X)g(Y ) is equal to the product of the individual expectations.

• One special case is that

X |= Y implies E [XY ] = EX × EY. (32)

However, independence means more than this property. In other words, having E [XY ] = (EX)(EY ) does not necessarily imply X |= Y. See Example 11.61.

11.51. Let's combine what we have just learned about independence into the definition/equivalent statements that we already have in 11.32.

The following statements are equivalent:

(a) Random variables X and Y are independent .

(b) [X ∈ B] |= [Y ∈ C] for all B,C.

(c) P [X ∈ B, Y ∈ C] = P [X ∈ B]× P [Y ∈ C] for all B,C.

(d) pX,Y (x, y) = pX(x)× pY (y) for all x, y.

(e) FX,Y (x, y) = FX(x)× FY (y) for all x, y.

(f)


Exercise 11.52 (F2011). Suppose X and Y are i.i.d. with EX = EY = 1 and VarX = VarY = 2. Find Var[XY ].

11.53. To quantify the amount of dependence between two random variables, we may calculate their mutual information. This quantity is crucial in the study of digital communications and information theory. However, in an introductory probability class (and an introductory communication class), it is traditionally omitted.

11.5 Linear Dependence

Definition 11.54. Given two random variables X and Y, we may calculate the following quantities:

(a) Correlation: E [XY ].

(b) Covariance: Cov [X, Y ] = E [(X − EX)(Y − EY )].

(c) Correlation coefficient: ρX,Y = Cov [X, Y ] / (σXσY )

Exercise 11.55 (F2011). Continue from Example 11.9.

(a) Find E [XY ].

(b) Check that Cov [X, Y ] = −1/25.

11.56. Cov [X, Y ] = E [(X − EX)(Y − EY )] = E [XY ]−EXEY

• Note that VarX = Cov [X,X].

11.57. Var [X + Y ] = VarX + VarY + 2Cov [X, Y ]
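For completeness, a short derivation of 11.57 (this step is not written out in the notes): with μX = EX and μY = EY,

Var [X + Y ] = E[((X − μX) + (Y − μY ))²] = E[(X − μX)²] + E[(Y − μY )²] + 2 E[(X − μX)(Y − μY )] = VarX + VarY + 2 Cov [X, Y ].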


Definition 11.58. X and Y are said to be uncorrelated if and only if Cov [X, Y ] = 0.

11.59. The following statements are equivalent:

(a) X and Y are uncorrelated.

(b) Cov [X, Y ] = 0.

(c) E [XY ] = EXEY .

(d)

11.60. Independence implies uncorrelatedness; that is, if X |= Y, then Cov [X, Y ] = 0.

The converse is not true. Uncorrelatedness does not imply independence. See Example 11.61.

Example 11.61. Let X be uniform on {±1, ±2} and Y = |X|.

11.62. The variance of the sum of uncorrelated (or independent) random variables is the sum of their variances.


Exercise 11.63. Suppose two fair dice are tossed. Denote by the random variable V1 the number appearing on the first dice and by the random variable V2 the number appearing on the second dice. Let X = V1 + V2 and Y = V1 − V2.

(a) Show that X and Y are not independent.

(b) Show that E [XY ] = EXEY .

11.64. Cov [aX + b, cY + d] = acCov [X, Y ]

Cov [aX + b, cY + d] = E [((aX + b)− E [aX + b]) ((cY + d)− E [cY + d])]

= E [((aX + b)− (aEX + b)) ((cY + d)− (cEY + d))]

= E [(aX − aEX) (cY − cEY )]

= acE [(X − EX) (Y − EY )]

= acCov [X,Y ] .

Definition 11.65. Correlation coefficient:

ρX,Y = Cov [X, Y ] / (σXσY ) = E [((X − EX)/σX)((Y − EY )/σY )] = (E [XY ] − EXEY ) / (σXσY ).

• ρX,Y is dimensionless

• ρX,X = 1

• ρX,Y = 0 if and only if X and Y are uncorrelated.

• Cauchy-Schwartz Inequality 50:

|ρX,Y | ≤ 1.

In other words, ρXY ∈ [−1, 1].

50The Cauchy-Schwartz inequality shows up in many areas of mathematics. A general form of this inequality can be stated in any inner product space:

|〈a, b〉|² ≤ 〈a, a〉 〈b, b〉.

Here, the inner product is defined by 〈X, Y 〉 = E [XY ]. The Cauchy-Schwartz inequality then gives |E [XY ]|² ≤ E[X²] E[Y²].


11.66. Linear Dependence and Cauchy-Schwartz Inequality

(a) If Y = aX + b, then ρX,Y = sign(a), which is 1 for a > 0 and −1 for a < 0.

• To be rigorous, we should also require that σX > 0 and a ≠ 0.

(b) When σY, σX > 0, equality (i.e., |ρX,Y | = 1) occurs if and only if the following equivalent conditions hold:

≡ ∃ a ≠ 0 such that (X − EX) = a(Y − EY )

≡ ∃ a ≠ 0 and b ∈ R such that X = aY + b

≡ ∃ c ≠ 0 and d ∈ R such that Y = cX + d

≡ |ρX,Y | = 1

In this case, |a| = σX/σY and ρX,Y = a/|a| = sgn a. Hence, ρX,Y is used to quantify the linear dependence between X and Y. The closer |ρX,Y | is to 1, the higher the degree of linear dependence between X and Y.

Example 11.67. [21, Section 5.2.3] Consider an important fact that investment experience supports: spreading investments over a variety of funds (diversification) diminishes risk. To illustrate, imagine that the random variable X is the return on every invested dollar in a local fund, and random variable Y is the return on every invested dollar in a foreign fund. Assume that random variables X and Y are i.i.d. with expected value 0.15 and standard deviation 0.12.

If you invest all of your money, say c, in either the local or the foreign fund, your return R would be cX or cY.

• The expected return is ER = cEX = cEY = 0.15c.

• The standard deviation is cσX = cσY = 0.12c.

Now imagine that your money is equally distributed over the two funds. Then, the return R is (1/2)cX + (1/2)cY. The expected return


is ER = (1/2)cEX + (1/2)cEY = 0.15c. Hence, the expected return remains at 15%. However,

VarR = Var[(c/2)(X + Y )] = (c²/4)VarX + (c²/4)VarY = (c²/2) × 0.12².

So, the standard deviation is (0.12/√2)c ≈ 0.0849c.

In comparison with the distributions of X and Y, the pmf of (1/2)(X + Y ) is concentrated more around the expected value. The centralization of the distribution as random variables are averaged together is a manifestation of the central limit theorem.

11.68. [21, Section 5.2.3] Example 11.67 is based on the assumption that return rates X and Y are independent from each other. In the world of investment, however, risks are more commonly reduced by combining negatively correlated funds (two funds are negatively correlated when one tends to go up as the other falls).

This becomes clear when one considers the following hypothetical situation. Suppose that two stock market outcomes ω1 and ω2 are possible, and that each outcome will occur with a probability of 1/2. Assume that domestic and foreign fund returns X and Y are determined by X(ω1) = Y (ω2) = 0.25 and X(ω2) = Y (ω1) = −0.10. Each of the two funds then has an expected return of 7.5%, with equal probability for actual returns of 25% and −10%. The random variable Z = (1/2)(X + Y ) satisfies Z(ω1) = Z(ω2) = 0.075. In other words, Z is equal to 0.075 with certainty. This means that an investment that is equally divided between the domestic and foreign funds has a guaranteed return of 7.5%.


Exercise 11.69. The input X and output Y of a system subject to random perturbations are described probabilistically by the following joint pmf matrix (rows are x ∈ {1, 3}, columns are y ∈ {2, 4, 5}):

         y = 2   y = 4   y = 5
x = 1    0.02    0.10    0.08
x = 3    0.08    0.32    0.40

(a) Evaluate the following quantities.

(i) EX

(ii) P [X = Y ]

(iii) P [XY < 6]

(iv) E [(X − 3)(Y − 2)]

(v) E [X(Y³ − 11Y² + 38Y )]

(vi) Cov [X, Y ]

(vii) ρX,Y

(b) Calculate the following quantities using what you got from part (a).

(i) Cov [3X + 4, 6Y − 7]

(ii) ρ3X+4,6Y−7

(iii) Cov [X, 6X − 7]

(iv) ρX,6X−7


Answers:

(a)

(i) EX = 2.6

(ii) P [X = Y ] = 0

(iii) P [XY < 6] = 0.2

(iv) E [(X − 3)(Y − 2)] = −0.88

(v) E [X(Y³ − 11Y² + 38Y )] = 104

(vi) Cov [X, Y ] = 0.032

(vii) ρX,Y = 0.0447

(b)

(i) Cov [3X + 4, 6Y − 7] = 3 × 6 × Cov [X, Y ] = 18 × 0.032 = 0.576.

(ii) Note that

ρaX+b,cY+d = Cov [aX + b, cY + d] / (σaX+b σcY+d) = ac Cov [X, Y ] / (|a|σX |c|σY ) = (ac/|ac|) ρX,Y = sign(ac) × ρX,Y.

Hence, ρ3X+4,6Y−7 = sign(3 × 6) ρX,Y = ρX,Y = 0.0447.

(iii) Cov [X, 6X − 7] = 1 × 6 × Cov [X, X] = 6 × Var[X] ≈ 3.84.

(iv) ρX,6X−7 = sign(1× 6)× ρX,X = 1 .
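A MATLAB sketch that reproduces the part (a) answers directly from the joint pmf matrix (a minimal check, not part of the original exercise):

% Reproduce the answers in part (a) from the joint pmf matrix
P_XY = [0.02 0.10 0.08;          % rows: x = 1, 3
        0.08 0.32 0.40];         % columns: y = 2, 4, 5
x = [1 3]; y = [2 4 5];
[xg, yg] = ndgrid(x, y);
EX  = sum(sum(xg.*P_XY))                 % 2.6
EY  = sum(sum(yg.*P_XY));
EXY = sum(sum(xg.*yg.*P_XY));
P_XYlt6 = sum(P_XY(xg.*yg < 6))          % P[XY < 6] = 0.2
CovXY = EXY - EX*EY                      % 0.032
sx = sqrt(sum(sum(xg.^2.*P_XY)) - EX^2);
sy = sqrt(sum(sum(yg.^2.*P_XY)) - EY^2);
rho = CovXY/(sx*sy)                      % about 0.0447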


11.6 Multiple Continuous Random Variables

                     Discrete                                Continuous

P [X ∈ B]            ∑_{x∈B} pX(x)                           ∫_B fX(x) dx

P [(X, Y ) ∈ R]      ∑_{(x,y): (x,y)∈R} pX,Y (x, y)          ∫∫_{(x,y): (x,y)∈R} fX,Y (x, y) dx dy

Joint to Marginal:   pX(x) = ∑_y pX,Y (x, y)                 fX(x) = ∫_{−∞}^{+∞} fX,Y (x, y) dy
(Law of Total Prob.) pY (y) = ∑_x pX,Y (x, y)                fY (y) = ∫_{−∞}^{+∞} fX,Y (x, y) dx

P [X > Y ]           ∑_x ∑_{y: y<x} pX,Y (x, y)              ∫_{−∞}^{+∞} ∫_{−∞}^{x} fX,Y (x, y) dy dx
                     = ∑_y ∑_{x: x>y} pX,Y (x, y)            = ∫_{−∞}^{+∞} ∫_{y}^{∞} fX,Y (x, y) dx dy

P [X = Y ]           ∑_x pX,Y (x, x)                         0

X |= Y               pX,Y (x, y) = pX(x)pY (y)               fX,Y (x, y) = fX(x)fY (y)

Conditional          pX|Y (x|y) = pX,Y (x, y)/pY (y)         fX|Y (x|y) = fX,Y (x, y)/fY (y)

E [g(X, Y )]         ∑_x ∑_y g(x, y) pX,Y (x, y)             ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) fX,Y (x, y) dx dy

P [g(X, Y ) ∈ B]     ∑_{(x,y): g(x,y)∈B} pX,Y (x, y)         ∫∫_{(x,y): g(x,y)∈B} fX,Y (x, y) dx dy

Z = X + Y            pZ(z) = ∑_x pX,Y (x, z − x)             fZ(z) = ∫_{−∞}^{+∞} fX,Y (x, z − x) dx
                     = ∑_y pX,Y (z − y, y)                   = ∫_{−∞}^{+∞} fX,Y (z − y, y) dy

Table 7: pmf vs. pdf


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part VI Dr.Prapun

12 Limiting Theorems

12.1 Law of Large Numbers (LLN)

Definition 12.1. Let X1, X2, . . . , Xn be a collection of random variables with a common mean E [Xi] = m for all i. In practice, since we do not know m, we use the numerical average, or sample mean,

Mn = (1/n) ∑_{i=1}^{n} Xi

in place of the true, but unknown, value m.

Q: Can this procedure of using Mn as an estimate of m be justified in some sense?

A: This can be done via the law of large numbers.

12.2. The law of large numbers basically says that if you have a sequence of i.i.d. random variables X1, X2, . . ., then the sample means Mn = (1/n) ∑_{i=1}^{n} Xi will converge to the actual mean as n → ∞.

12.3. LLN is easy to see via the property of variance. Note that

E [Mn] = E [(1/n) ∑_{i=1}^{n} Xi] = (1/n) ∑_{i=1}^{n} EXi = m

and

Var[Mn] = Var [(1/n) ∑_{i=1}^{n} Xi] = (1/n²) ∑_{i=1}^{n} VarXi = (1/n) σ², (33)


Remarks:

(a) For (33) to hold, it is sufficient to have uncorrelated Xi’s.

(b) From (33), we also have

σMn = (1/√n) σ. (34)

In words, "when uncorrelated (or independent) random variables each having the same distribution are averaged together, the standard deviation is reduced according to the square root law." [21, p 142].
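A quick simulation sketch of the LLN (a minimal illustration, not from the notes; the E(1) distribution and n = 10^4 are arbitrary choices):

% Sample means M_n converging to the true mean (LLN illustration)
rng(1);
n = 1e4;
X = -log(rand(1, n));            % i.i.d. E(1) samples, so the true mean is 1
M = cumsum(X)./(1:n);            % M(k) is the sample mean of the first k samples
[M(10) M(100) M(1000) M(n)]      % should get closer and closer to 1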

Exercise 12.4 (F2011). Consider i.i.d. random variables X1, X2, . . . , X10. Define the sample mean M by

M = (1/10) ∑_{k=1}^{10} Xk.

Let

V1 = (1/10) ∑_{k=1}^{10} (Xk − E [Xk])²

and

V2 = (1/10) ∑_{j=1}^{10} (Xj − M)².

Suppose E [Xk] = 1 and Var[Xk] = 2.

(a) Find E [M ].

(b) Find Var[M ].

(c) Find E [V1].

(d) Find E [V2].


12.2 Central Limit Theorem (CLT)

In practice, there are many random variables that arise as a sum of many other random variables. In this section, we consider the sum

Sn = ∑_{i=1}^{n} Xi (35)

where the Xi are i.i.d. with common mean m and common variance σ².

• Note that when we talk about the Xi being i.i.d., the definition is that they are independent and identically distributed. It is then convenient to talk about a random variable X which shares the same distribution (pdf/pmf) with these Xi. This allows us to write

Xi  i.i.d.∼  X, (36)

which is much more compact than saying that the Xi are i.i.d. with the same distribution (pdf/pmf) as X. Moreover, we can also use EX and σ²X for the common expected value and variance of the Xi.

Q: How does Sn behave?

In the previous section, we considered the sample mean of identically distributed random variables. More specifically, we considered the random variable Mn = (1/n)Sn. We found that Mn will converge to m as n increases to ∞. Here, we don't want to rescale the sum Sn by the factor 1/n.

12.5 (Approximation of densities and pmfs using the CLT). The actual statement of the CLT is a bit difficult to state. So, we first give you the interpretation/insight from the CLT which is very easy to remember and use:

For n large enough, we can approximate Sn by a Gaussian random variable with the same mean and variance as Sn.


Note that the mean and variance of Sn are nm and nσ², respectively. Hence, for n large enough, we can approximate Sn by N(nm, nσ²). In particular,

(a) F_{Sn}(s) ≈ Φ((s − nm)/(σ√n)).

(b) If the Xi are continuous random variables, then

    f_{Sn}(s) ≈ (1/(√(2π) σ√n)) e^{−(1/2)((s − nm)/(σ√n))²}.

(c) If the Xi are integer-valued, then

    P[Sn = k] = P[k − 1/2 < Sn ≤ k + 1/2] ≈ (1/(√(2π) σ√n)) e^{−(1/2)((k − nm)/(σ√n))²}

[9, eq (5.14), p. 213].

The approximation is best for k near nm [9, p. 211].
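The Gaussian approximation in 12.5 is easy to check in simulation. The following minimal MATLAB sketch (not from the original notes) assumes Xi ∼ U(0, 1), so m = 1/2 and σ² = 1/12, and compares the empirical cdf of Sn with the approximation in (a); Φ is computed via erf to avoid any toolbox dependence.

    % Empirical cdf of Sn versus the CLT approximation Phi((s-nm)/(sigma*sqrt(n))).
    n = 20; m = 1/2; sigma = sqrt(1/12);
    S = sum(rand(n,1e5), 1);                                   % 1e5 realizations of Sn
    s = linspace(n*m - 4*sigma*sqrt(n), n*m + 4*sigma*sqrt(n), 200);
    Femp = arrayfun(@(t) mean(S <= t), s);                     % empirical cdf of Sn
    Fclt = 0.5*(1 + erf((s - n*m)./(sigma*sqrt(n)*sqrt(2))));  % Phi((s-nm)/(sigma*sqrt(n)))
    plot(s, Femp, s, Fclt, '--'); legend('empirical','CLT approximation')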

Example 12.6. Approximation for the binomial distribution: For X ∼ B(n, p), when n is large, the binomial pmf becomes difficult to compute directly because of the need to calculate factorial terms.

(a) When p is not close to either 0 or 1, so that the variance is also large, we can use the CLT to approximate

    P[X = k] ≈ (1/√(2π Var X)) e^{−(k − EX)²/(2 Var X)} (37)
             = (1/√(2πnp(1 − p))) e^{−(k − np)²/(2np(1 − p))}. (38)

This is called the Laplace approximation to the binomial distribution [25, p. 282]. (A MATLAB sketch illustrating (38) appears after this list.)

(b) When p is small, the binomial distribution can be approximated by P(np) as discussed in 8.45.

(c) If p is very close to 1, then n − X will be approximately Poisson (with mean n(1 − p)).
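The sketch below (not from the original notes) illustrates (38) by overlaying the Gaussian density on the exact B(n, p) pmf; the choice n = 100, p = 0.3 is purely illustrative, and the exact pmf is computed through gammaln to avoid overflow.

    % Exact binomial pmf versus the Laplace (Gaussian) approximation in (38).
    n = 100; p = 0.3; k = 0:n;
    logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
             + k*log(p) + (n-k)*log(1-p);
    pmf   = exp(logpmf);                                        % exact B(n,p) pmf
    gauss = exp(-(k - n*p).^2 ./ (2*n*p*(1-p))) / sqrt(2*pi*n*p*(1-p));
    stem(k, pmf); hold on; plot(k, gauss, 'r'); hold off
    legend('exact pmf','Gaussian approximation')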


• Normal approximation to the Poisson distribution with large λ:

Let X ∼ P(λ). X can be thought of as a sum of n i.i.d. random variables Xi ∼ P(λ0), i.e., X = ∑_{i=1}^{n} Xi, where λ = nλ0. Hence, by the CLT, X is approximately N(λ, λ) for large λ. Some say that the normal approximation is good when λ > 5.

Figure 26 compares 1) the Poisson pmf (evaluated at integer x), 2) the Gaussian density, 3) the Gamma density, and 4) the binomial pmf.

• If g : Z+ → R is any bounded function and Λ ∼ P(λ), then E[λg(Λ + 1) − Λg(Λ)] = 0.

Proof.

    E[λg(Λ + 1) − Λg(Λ)] = ∑_{i=0}^{∞} (λg(i + 1) − i g(i)) e^{−λ} λ^i / i!
      = e^{−λ} ( ∑_{i=0}^{∞} g(i + 1) λ^{i+1} / i! − ∑_{i=1}^{∞} g(i) λ^i / (i − 1)! )
      = e^{−λ} ( ∑_{i=0}^{∞} g(i + 1) λ^{i+1} / i! − ∑_{m=0}^{∞} g(m + 1) λ^{m+1} / m! )
      = 0,

where the second sum is re-indexed with m = i − 1.

Any function f : Z+ → R for which E[f(Λ)] = 0 can be expressed in the form f(j) = λg(j + 1) − j g(j) for a bounded function g. Thus, conversely, if E[λg(Λ + 1) − Λg(Λ)] = 0 for all bounded g, then Λ has the Poisson distribution P(λ).
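The identity above is easy to check numerically. The following minimal MATLAB sketch (not from the original notes) assumes the Statistics Toolbox function poissrnd is available and uses the arbitrary bounded choice g(k) = min(k, 5).

    % Numerical check of E[lambda*g(L+1) - L*g(L)] = 0 for L ~ P(lambda).
    lambda = 3;
    L = poissrnd(lambda, 1e6, 1);       % 1e6 Poisson(lambda) samples
    g = @(k) min(k, 5);                 % an arbitrary bounded function
    mean(lambda*g(L+1) - L.*g(L))       % should be close to 0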

• The Poisson distribution can be obtained as a limit of negative binomial distributions. Thus, the negative binomial distribution with parameters r and p can be approximated by the Poisson distribution with parameter λ = rq/p (mean-matching), provided that p is “sufficiently” close to 1 and r is “sufficiently” large.

• Let X be Poisson with mean λ. Suppose that the mean λ is itself chosen in accord with a probability distribution F_Λ(λ).

Figure 26: Gaussian approximation to the binomial, Poisson, and Gamma distributions. The left panel uses p = 0.05, n = 100, λ = 5; the right panel uses p = 0.05, n = 800, λ = 40. Each panel plots, as functions of x, the Poisson pmf e^{−λ}λ^x/Γ(x + 1), the Gaussian density (1/√(2πλ)) e^{−(x−λ)²/(2λ)}, the Gamma density e^{−x}x^{λ−1}/Γ(λ), and the binomial pmf [Γ(n + 1)/(Γ(n − x + 1)Γ(x + 1))] p^x (1 − p)^{n−x}.

Exercise 12.7 (F2011). Continue from Exercise 6.53. The stronger person (Kakashi) should win the competition if n is very large. (By the law of large numbers, the proportion of fights that Kakashi wins should be close to 55%.) However, because the results are random and n cannot be very large, we cannot guarantee that Kakashi will win. It may be good enough, though, if the probability that Kakashi wins the competition is greater than 0.85.

We want to find the minimal value of n such that the probability that Kakashi wins the competition is greater than 0.85.

Let N be the number of fights that Kakashi wins among the n fights. Then, we need

    P[N > n/2] ≥ 0.85. (39)

Use the central limit theorem and Table 3.1 or Table 3.2 from [Yates and Goodman] to approximate the minimal value of n such that (39) is satisfied.
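After solving the exercise with the tables, you can cross-check the result numerically. The sketch below (not part of the exercise statement) takes p = 0.55 from the discussion above and searches for the smallest n whose CLT approximation of P[N > n/2] reaches 0.85; Φ is again computed via erf to avoid toolbox dependence.

    % Search for the smallest n such that the CLT approximation of P[N > n/2]
    % is at least 0.85, where N ~ B(n, 0.55).
    p = 0.55;
    for n = 1:500
        m  = n*p;  sd = sqrt(n*p*(1-p));                     % mean and std of N
        Pwin = 1 - 0.5*(1 + erf(((n/2) - m)/(sd*sqrt(2))));  % approx. P[N > n/2]
        if Pwin >= 0.85
            fprintf('smallest n by the CLT approximation: %d\n', n);
            break
        end
    end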


Sirindhorn International Institute of Technology

Thammasat University

School of Information, Computer and Communication Technology

ECS315 2014/1 Part VII Dr.Prapun

13 Three Types of Random Variables

13.1. Review: You may recall51 the following properties of the cdf of discrete random variables. These properties hold for any kind of random variable.

(a) The cdf is defined as FX(x) = P[X ≤ x]. This is valid for any type of random variable.

(b) Moreover, the cdf of any kind of random variable must satisfy three properties which we have discussed earlier:

CDF1 FX is non-decreasing.

CDF2 FX is right-continuous.

CDF3 lim_{x→−∞} FX(x) = 0 and lim_{x→∞} FX(x) = 1.

(c) P[X = x] = FX(x) − FX(x−) = the jump or saltus in FX at x.

Theorem 13.2. If you find a function F that satisfies CDF1, CDF2, and CDF3 above, then F is a cdf of some random variable.

51 If you don't know these properties by now, you should review them as soon as possible.


Example 13.3. Consider an input X to a device whose output Y will be the same as the input if the input level does not exceed 5. For input levels that exceed 5, the output will be saturated at 5. Suppose X ∼ U(0, 6). Find FY(y).
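Before deriving FY analytically, it can help to look at an empirical cdf. The following minimal MATLAB sketch (not from the original notes) simulates the device as Y = min(X, 5), which matches the saturation described above.

    % Empirical cdf of the saturated output Y = min(X,5) with X ~ U(0,6).
    X = 6*rand(1e5,1);                       % X ~ U(0,6)
    Y = min(X, 5);                           % output saturates at 5
    y = linspace(-1, 7, 400);
    F = arrayfun(@(t) mean(Y <= t), y);      % empirical cdf of Y
    plot(y, F)       % the plot reveals both a continuous part and a jump at y = 5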

13.4. We can categorize random variables into three types according to their cdf:

(a) If FX(x) is piecewise flat with discontinuous jumps, then X is discrete.

(b) If FX(x) is a continuous function, then X is continuous.

(c) If FX(x) is a piecewise continuous function with discontinuities, then X is mixed.



For a discrete random variable, FX(x) is a staircase function, whereas a random variable is called continuous if FX(x) is a continuous function. A random variable is called mixed if it is neither discrete nor continuous. Typical cdfs for discrete, continuous, and mixed random variables are shown in Figures 27(a), 27(b), and 27(c), respectively.

Rather than dealing with the cdf, it is more common to deal with the probability density function (pdf), which is defined as the derivative of FX(x), i.e.,

    fX(x) = dFX(x)/dx.

From the definition it follows that

    P[x1 ≤ X ≤ x2] = P[X ≤ x2] − P[X ≤ x1] = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx.

Figure 27: Typical cdfs: (a) a discrete random variable, (b) a continuous random variable, and (c) a mixed random variable [16, Fig. 3.2].


We have seen in Example 13.3 that a function can turn a continuous random variable into a mixed random variable. Next, we will work on an example where a continuous random variable is turned into a discrete random variable.

Example 13.5. Let X ∼ U(0, 1) and Y = g(X) where

    g(x) = { 1, x < 0.6,
             0, x ≥ 0.6.

Before going deeply into the math, it is helpful to think about the nature of the derived random variable Y. The definition of g(x) tells us that Y has only two possible values, Y = 0 and Y = 1. Thus, Y is a discrete random variable.

Example 13.6. In MATLAB, we have the rand command to generate U(0, 1). If we want to generate a Bernoulli random variable with success probability p, what can we do?
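One natural answer, sketched below (one possible approach, not the only one), is to threshold the uniform sample: for U ∼ U(0, 1) we have P[U < p] = p.

    % Generate Bernoulli(p) samples by thresholding rand.
    p = 0.3;                          % any desired success probability
    X = double(rand(1,1e5) < p);      % Xi = 1 with probability p, 0 otherwise
    mean(X)                           % sample proportion of 1's; should be near p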

Exercise 13.7. In MATLAB, how can we generate X ∼ binomial(2, 1/4) from the rand command?
