TRANSCRIPT
INSTITUTO POLITÉCNICO NACIONAL
CENTRO DE INVESTIGACION EN COMPUTACION
Probability, Random Processes and Inference
Dr. Ponciano Jorge Escamilla [email protected]
http://www.cic.ipn.mx/~pescamilla/
Laboratorio de
Ciberseguridad
CIC
Course Content
1.4. General Random Variables
1.4.1. Continuous Random Variables and PDFs
1.4.2. Cumulative Distribution Function
1.4.3. Normal Random Variables
1.4.4. Joint PDFs of Multiple Random Variables
1.4.5. Conditioning
1.4.6. The Continuous Bayes’ Rule
1.4.7. The Strong Law of Large Numbers
❑ Continuous random variables
➢ Example: the velocity of a vehicle traveling along the highway.
❑ Continuous random variables can take on any real value in an interval,
➢ possibly of infinite length, such as (0, ∞), or the entire real line.
❑ In this section, the concepts and methods introduced for discrete r.v.s, such as expectation, PMF, and conditioning, are extended to their continuous counterparts.
General Random Variables
❑ Continuous random variable. A random variable X is called continuous if there exists a nonnegative function fX, called the probability density function of X, or PDF, such that:
P(X ∈ B) = ∫_B fX(x) dx
for every subset B of the real line.
Probability Density Function
❑ The probability that the value of X falls within an interval is:
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx,
which can be interpreted as the area under the graph of the PDF.
Probability Density Function
❑ For any single value a, we have:
P(X = a) = ∫_a^a fX(x) dx = 0.
❑ For this reason, including or excluding the endpoints of an interval has no effect on its probability:
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
Probability Density Function
❑ To qualify as a PDF, a function fX must:
o be nonnegative, i.e., fX(x) ≥ 0 for every x,
o have the normalisation property:
∫_{−∞}^{∞} fX(x) dx = 1.
❑ Graphically, this means that the entire area under the graph of the PDF must be equal to 1.
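These two requirements can be checked numerically. The sketch below uses a hypothetical exponential density with rate 2 (an arbitrary choice), verifies nonnegativity on a grid, and approximates the total area with the trapezoidal rule:

```python
import math

# Hypothetical example density: exponential with rate lam = 2 on x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)

# Requirement 1: nonnegativity on a grid over [0, 20].
xs = [i * 0.001 for i in range(20001)]
assert all(f(x) >= 0 for x in xs)

# Requirement 2: the area under the graph is 1 (trapezoidal rule;
# the tail beyond x = 20 is negligible for this density).
area = sum(0.001 * (f(a) + f(b)) / 2 for a, b in zip(xs, xs[1:]))
assert abs(area - 1.0) < 1e-3
```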
Probability Density Function
Discrete vs. continuous r.v.s.
Recall that for a discrete r.v., the CDF jumps at every point in the
support, and is flat everywhere else. In contrast, for a continuous
r.v. the CDF increases smoothly.
❑ For a continuous r.v. X with CDF, FX(x), the
probability density function (PDF) of X is the
derivative fX(x) of the CDF, given by fX(x) = F′X (x).
The support of X, and of its distribution, is the set of
all x where fX(x) > 0.
❑ The PDF represents the “density” of probability at
the point x.
Discrete vs. continuous r.v.s.
❑ To get from the PDF back to the CDF we apply:
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt.
❑ Thus, analogous to how we obtained the value of a discrete CDF at x by summing the PMF over all values less than or equal to x, here we integrate the PDF over all values up to x, so the CDF is the accumulated area under the PDF.
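This accumulation can be checked numerically. The sketch below assumes a hypothetical exponential density with rate 1.5, recovers the CDF by accumulating area under the PDF, and compares it with the closed form 1 − e^(−λx):

```python
import math

lam = 1.5
pdf = lambda t: lam * math.exp(-lam * t)   # hypothetical exponential PDF

def cdf_numeric(x, n=2000):
    # Accumulate area under the PDF from 0 (start of the support) to x
    # with the trapezoidal rule.
    h = x / n
    return sum(h * (pdf(i * h) + pdf((i + 1) * h)) / 2 for i in range(n))

x = 0.8
exact = 1 - math.exp(-lam * x)             # closed-form exponential CDF
assert abs(cdf_numeric(x) - exact) < 1e-6
```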
Probability Density Function
❑ Since we can freely convert between the PDF and
the CDF using the inverse operations of integration
and differentiation, both the PDF and CDF carry
complete information about the distribution of a
continuous r.v.
❑ Thus the PDF completely specifies the behavior
of continuous random variables.
Probability Density Function
❑ For an interval [x, x + δ] with very small length δ, we have:
P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} fX(t) dt ≈ fX(x)·δ,
so we can view fX(x) as the “probability mass per unit length” near x.
Probability Density Function
Even though a PDF is used to calculate
event probabilities, fX(x) is not the
probability of any particular event.
In particular, it is not restricted to be
less than or equal to one.
❑ An important way in which continuous r.v.s differ
from discrete r.v.s is that for a continuous r.v. X,
P(X = x) = 0 for all x. This is because P(X = x) is the
height of a jump in the CDF at x, but the CDF of X
has no jumps! Since the PMF of a continuous r.v.
would just be 0 everywhere, we work with a PDF
instead.
Probability Density Function
❑ The PDF is analogous to the PMF in many ways, but
there is a key difference: for a PDF fX , the quantity
fX(x) is not a probability, and in fact it is possible to
have fX(x) > 1 for some values of x. To obtain a
probability, we need to integrate the PDF.
❑ In summary:
➢To get a desired probability, integrate the PDF over
the appropriate range.
Probability Density Function
❑ The Logistic distribution has CDF:
FX(x) = e^x / (1 + e^x), for all real x.
❑ To get the PDF, we differentiate the CDF, which gives:
fX(x) = e^x / (1 + e^x)².
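The differentiation step can be sanity-checked numerically: a central difference of the Logistic CDF should match the stated PDF at any point.

```python
import math

F = lambda x: math.exp(x) / (1 + math.exp(x))        # Logistic CDF
f = lambda x: math.exp(x) / (1 + math.exp(x)) ** 2   # PDF = F'(x)

x, h = 0.7, 1e-5
numeric_slope = (F(x + h) - F(x - h)) / (2 * h)      # central difference
assert abs(numeric_slope - f(x)) < 1e-8
```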
Examples of PDFs
❑ The Rayleigh distribution has CDF:
FX(x) = 1 − e^{−x²/2}, for x > 0.
❑ To get the PDF, we differentiate the CDF, which gives:
fX(x) = x e^{−x²/2}, for x > 0.
Examples of PDFs
❑ A continuous r.v. X is said to have Uniform distribution on the interval (a, b) if its PDF is:
fX(x) = 1/(b − a) for a < x < b, and fX(x) = 0 otherwise.
❑ The CDF is the accumulated area under the PDF:
FX(x) = 0 for x ≤ a; FX(x) = (x − a)/(b − a) for a < x < b; FX(x) = 1 for x ≥ b.
Examples of PDFs
❑ We denote this by X ∼ Unif(a, b).
❑ The Uniform distribution that we will most frequently
use is the Unif(0, 1) distribution, also called the
standard Uniform.
❑ The Unif(0, 1) PDF and CDF are particularly simple:
f(x) = 1 and F(x) = x for 0 < x < 1.
❑ For a general Unif(a, b) distribution, the PDF is
constant on (a, b), and the CDF is ramp-shaped,
increasing linearly from 0 to 1 as x ranges from a to b.
Examples of PDFs
For Uniform distributions, probability is proportional to length.
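A minimal sketch of this fact, assuming a hypothetical Unif(2, 7) random variable: any two subintervals of equal length receive equal probability.

```python
a, b = 2.0, 7.0   # hypothetical Unif(a, b)

def unif_prob(c, d):
    # P(c <= X <= d): clip [c, d] to the support, then length / (b - a).
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

assert unif_prob(3, 4) == unif_prob(5.5, 6.5) == 0.2  # same length, same probability
assert unif_prob(0, 10) == 1.0                        # the whole support
```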
PDF Properties
❑ The expected value or expectation or mean of a continuous r.v. X is defined by:
E[X] = ∫_{−∞}^{∞} x fX(x) dx.
❑ This is similar to the discrete case, except that the PMF is replaced by the PDF, and summation is replaced by integration.
❑ Its mathematical properties are similar to the discrete case.
Expected Value and Variance of a Continuous r.v.
❑ If X is a continuous random variable with given PDF, then any real-valued function Y = ɡ(X) of X is also a random variable.
➢ Note that Y can be a continuous r.v., but Y can also be discrete, e.g., ɡ(x) = 1 for x > 0 and ɡ(x) = 0 otherwise.
❑ In either case, the mean of ɡ(X) satisfies the expected value rule:
E[ɡ(X)] = ∫_{−∞}^{∞} ɡ(x) fX(x) dx.
Expected Value and Variance of a Continuous r.v.
❑ The nth moment of a continuous r.v. X is defined as E[Xⁿ], the expected value of the random variable Xⁿ.
❑ The variance of X, denoted as var(X), is defined as the expected value of the random variable (X − E[X])²:
var(X) = E[(X − E[X])²] = ∫_{−∞}^{∞} (x − E[X])² fX(x) dx.
Expected Value and Variance of a Continuous r.v.
❑ Example. Consider a uniform PDF over an interval [a, b]; its expectation is given by:
E[X] = ∫_a^b x · (1/(b − a)) dx = (a + b)/2.
Expected Value and Variance of a Continuous r.v.
❑ Its variance is given as:
var(X) = E[X²] − (E[X])² = (b − a)²/12.
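Both the mean and the variance formulas can be verified by numerical integration against the Unif(a, b) density; the endpoints a = 2, b = 7 below are an arbitrary choice.

```python
a, b = 2.0, 7.0
n = 20000
h = (b - a) / n
density = 1.0 / (b - a)
xs = [a + (i + 0.5) * h for i in range(n)]            # midpoint-rule grid

mean = sum(x * density * h for x in xs)               # E[X]
var = sum((x - mean) ** 2 * density * h for x in xs)  # var(X)

assert abs(mean - (a + b) / 2) < 1e-9                 # (a + b)/2
assert abs(var - (b - a) ** 2 / 12) < 1e-6            # (b - a)^2 / 12
```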
Expected Value and Variance of a Continuous r.v.
❑ The exponential continuous random variable has PDF:
fX(x) = λe^{−λx} for x ≥ 0, and fX(x) = 0 otherwise,
where λ is a positive parameter characterising the PDF, with
∫_0^∞ λe^{−λx} dx = 1.
Expected Value and Variance of a Continuous r.v.
❑ The probability that X exceeds a certain value decreases exponentially. That is, for any a ≥ 0, we have:
P(X ≥ a) = ∫_a^∞ λe^{−λx} dx = e^{−λa}.
❑ An exponential random variable can be a good
model for the amount of time until an incident of
interest takes place.
➢ a message arriving at a computer, some equipment
breaking down, a light bulb burning out, etc.
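A small sketch of the tail formula P(X ≥ a) = e^(−λa), with a hypothetical rate λ = 0.5; it also exhibits the memorylessness that makes the exponential a natural model for waiting times until an incident.

```python
import math

lam = 0.5                                  # hypothetical rate
tail = lambda a: math.exp(-lam * a)        # P(X >= a) for an exponential r.v.

# Memorylessness: P(X > s + t | X > s) = P(X > t), i.e. having already
# waited s time units tells us nothing about the remaining wait.
s, t = 2.0, 3.0
assert abs(tail(s + t) / tail(s) - tail(t)) < 1e-12
assert tail(0) == 1.0                      # X is nonnegative
```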
Expected Value and Variance of a Continuous r.v.
❑ The mean of the exponential r.v. X is calculated by:
E[X] = ∫_0^∞ x λe^{−λx} dx = 1/λ (using integration by parts).
Expected Value and Variance of a Continuous r.v.
❑ The variance of the exponential r.v. X is calculated by:
var(X) = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
Expected Value and Variance of a Continuous r.v.
❑ The cumulative distribution function, CDF, of a random variable X is denoted as FX and provides the probability P(X ≤ x). In particular, for every x we have:
FX(x) = P(X ≤ x) = Σ_{k ≤ x} pX(k) if X is discrete, and
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt if X is continuous.
Cumulative Distribution Functions
The CDF FX(x) “accumulates” probability “up to” the value of x.
❑ Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete or continuous.
➢ {X ≤ x} is always an event and therefore has a well-defined probability.
Cumulative Distribution Functions
Normal Random Variables
❑ A continuous random variable X is normal or Gaussian or normally distributed if it has a PDF of the form:
fX(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},
where μ and σ are two scalar parameters characterising the PDF (abbreviated N(μ, σ²), and referred to as the normal density function), with σ assumed positive.
Normal Random Variables
❑ It can be verified that the normalisation property holds:
(1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(x−μ)²/(2σ²)} dx = 1.
(figure: PDF of N(1, 1))
Normal Random Variables
❑ If X is N(μ, σ²), then E[X] = μ.
Proof: the PDF is symmetric about x = μ.
❑ If X is N(μ, σ²), then var(X) = σ².
Proof: var(X) = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − μ)² e^{−(x−μ)²/(2σ²)} dx = σ², using the change of variables y = (x − μ)/σ and integration by parts.
Normal Random Variables
❑ Its maximum value occurs at the mean value of its
argument.
❑ It is symmetrical about the mean value.
❑ The points of maximum absolute slope occur at one
standard deviation above and below the mean.
❑ Its maximum value is inversely proportional to its
standard deviation.
❑ The limit as the standard deviation approaches zero
is a unit impulse.
Linear Function of a Normal Random Variable
❑ If X is a normal r.v. with mean μ and variance σ², and if a ≠ 0 and b are scalars, then the random variable:
Y = aX + b
is also normal, with mean and variance:
E[Y] = aμ + b,   var(Y) = a²σ².
Standard Normal Random Variables
❑ A normal random variable Y with zero mean and unit variance, N(0, 1), is said to be a standard normal. Its PDF and CDF are denoted by φ and Φ, respectively:
φ(y) = (1/√(2π)) e^{−y²/2},   Φ(y) = P(Y ≤ y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt.
Standard Normal Random Variables
❑ The PDF of a normal r.v. cannot be integrated in
terms of the common elementary functions, and
therefore the probabilities of X falling in various
intervals are obtained from tables or by computer.
❑ Example: the Standard Normal Table.
❑ The table only provides the values of Φ(y) for y ≥ 0, because the omitted values can be calculated using the symmetry of the PDF:
Φ(−y) = 1 − Φ(y).
(standard normal table omitted)
Standard Normal Random Variables
❑ It would be overwhelming to construct tables for all μ and σ values required in applications.
➢ Standardise the r.v.
❑ Let X be a normal (Gaussian) random variable with mean μ and variance σ². We standardise X by defining a new random variable Y given by:
Y = (X − μ)/σ.
Standard Normal Random Variables
❑ Since Y is a linear function of X, it is normal. This means:
E[Y] = (E[X] − μ)/σ = 0,   var(Y) = var(X)/σ² = 1.
❑ Thus, Y is a standard normal random variable.
➢ This allows us to calculate the probability of any event defined in terms of X by redefining the event in terms of Y, and then using the standard normal table.
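In code, the table can be replaced by the error function, which is the usual way Φ is evaluated numerically. A sketch with hypothetical parameters μ = 5, σ = 2:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function (plays the role of the table).
    return (1 + math.erf(z / math.sqrt(2))) / 2

mu, sigma = 5.0, 2.0                 # hypothetical N(mu, sigma^2)
c = 7.0
p = Phi((c - mu) / sigma)            # P(X <= 7) = Phi(1) after standardising
assert abs(p - 0.8413) < 1e-3        # the table value of Phi(1)
```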
Standard Normal Random Variables
❑ Example 2: The annual snowfall at a particular geographic location is modelled as a normal random variable with a mean μ = 60 inches and a standard deviation σ = 20 inches. What is the probability that this year’s snowfall will be at least 80 inches?
Standard Normal Random Variables
❑ Solution:
P(X ≥ 80) = P((X − 60)/20 ≥ (80 − 60)/20) = P(Y ≥ 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.
Standard Normal Random Variables
❑ Example 3: (Height Distribution of Men). Assume
that the height X, in inches, of a randomly selected
man in a certain population is normally distributed
with μ = 69 and σ = 2.6. Find
1. P(X < 72),
2. P(X > 72),
3. P(X < 66),
4. P(|X − μ| < 3).
Standard Normal Random Variables
❑ The table gives Φ(z) only for z ≥ 0; for z < 0 we need to make use of the symmetry of the normal distribution, which implies that, for any z, P(Z < −z) = P(Z > z), i.e., Φ(−z) = 1 − Φ(z). Thus, standardising with z = (72 − 69)/2.6 ≈ 1.15, the solution is:
1. P(X < 72) = Φ(1.15) ≈ 0.8749,
2. P(X > 72) = 1 − Φ(1.15) ≈ 0.1251,
3. P(X < 66) = Φ(−1.15) = 1 − Φ(1.15) ≈ 0.1251,
4. P(|X − μ| < 3) = Φ(3/2.6) − Φ(−3/2.6) = 2Φ(1.15) − 1 ≈ 0.7498.
Standard Normal Random Variables
❑ Normal r.v.s. are often used in signal processing and
communications engineering to model noise and
unpredictable distortions of signals.
Standard Normal Random Variables
Standard Normal Random Variables
❑ Three important benchmarks for the Normal distribution are the probabilities of falling within one, two, and three standard deviations of the mean. The 68-95-99.7% rule tells us that these probabilities are what the name suggests.
❑ (68-95-99.7% rule). If X ∼ N(μ, σ²), then:
P(|X − μ| < σ) ≈ 0.68,  P(|X − μ| < 2σ) ≈ 0.95,  P(|X − μ| < 3σ) ≈ 0.997.
Standardising, these are Φ(1) − Φ(−1), Φ(2) − Φ(−2), and Φ(3) − Φ(−3).
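The rule follows directly from Φ, as the quick check below shows (with Φ evaluated via the error function):

```python
import math

Phi = lambda z: (1 + math.erf(z / math.sqrt(2))) / 2

# P(|X - mu| < k*sigma) standardises to Phi(k) - Phi(-k), for any mu, sigma.
within = lambda k: Phi(k) - Phi(-k)

assert abs(within(1) - 0.68) < 0.003    # one standard deviation
assert abs(within(2) - 0.95) < 0.005    # two standard deviations
assert abs(within(3) - 0.997) < 0.001   # three standard deviations
```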
Standard Normal Random Variables
Joint PDF of Multiple Random Variables
❑ Two continuous random variables associated with the same experiment are jointly continuous and can be described in terms of a joint PDF fX,Y if fX,Y is a nonnegative function that satisfies:
P((X, Y) ∈ B) = ∫∫_B fX,Y(x, y) dx dy
for every subset B of the two-dimensional plane.
❑ The notation means that the integration is carried out over the set B.
Joint PDF of Multiple Random Variables
❑ In the particular case where B is a rectangle of the form B = {(x, y) | a ≤ x ≤ b, c ≤ y ≤ d}, we have:
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b fX,Y(x, y) dx dy.
❑ If B is the entire two-dimensional plane, then we obtain the normalisation property:
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.
Joint PDF of Multiple Random Variables
❑ To interpret the joint PDF, we let δ be a small positive number and consider the probability of a small rectangle. Then we have:
P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^{c+δ} ∫_a^{a+δ} fX,Y(x, y) dx dy ≈ fX,Y(a, c)·δ²,
so we can view fX,Y(a, c) as the probability per unit area in the vicinity of (a, c).
Joint PDF of Multiple Random Variables
❑ The joint PDF contains all relevant probabilistic information on the random variables X, Y, and their dependencies.
❑ Therefore, the joint PDF allows us to calculate the probability of any event that can be defined in terms of these two random variables.
Marginals
❑ Marginal PDF. For continuous r.v.s X and Y with joint PDF fX,Y, the marginal PDF of X is:
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.
❑ Similarly, the marginal PDF of Y is:
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Marginals
❑ Marginalisation works analogously with any number of variables. For example, if we have the joint PDF of X, Y, Z, W but want the joint PDF of X, W, we just have to integrate over all possible values of Y and Z:
fX,W(x, w) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y,Z,W(x, y, z, w) dy dz.
➢ Conceptually this is very easy—just integrate over the unwanted
variables to get the joint PDF of the wanted variables—but
computing the integral may or may not be difficult.
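A numerical sketch with the standard textbook joint density f(x, y) = x + y on the unit square (a hypothetical choice): integrating out y recovers the marginal fX(x) = x + 1/2.

```python
n = 4000
h = 1.0 / n

def marginal_x(x):
    # Integrate the joint PDF f(x, y) = x + y over y in [0, 1] (midpoint rule).
    return sum((x + (j + 0.5) * h) * h for j in range(n))

# Compare against the closed-form marginal f_X(x) = x + 1/2.
for x in (0.1, 0.5, 0.9):
    assert abs(marginal_x(x) - (x + 0.5)) < 1e-9
```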
Joint CDFs
❑ If X and Y are two random variables associated with the same experiment, their joint CDF is defined by:
FX,Y(x, y) = P(X ≤ x, Y ≤ y).
❑ The joint CDF is the joint probability of the two events {X ≤ x} and {Y ≤ y}.
❑ If X and Y are described by a joint PDF fX,Y, then:
FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(s, t) dt ds.
Joint PDF of Multiple Random Variables
❑ Conversely, if X and Y are continuous with joint CDF FX,Y, their joint PDF is the derivative of the joint CDF with respect to x and y:
fX,Y(x, y) = ∂²FX,Y(x, y)/(∂x ∂y).
Joint CDF of Multiple Random Variables
❑ Let X and Y be described by a uniform PDF on the unit square. The joint CDF is given by:
FX,Y(x, y) = P(X ≤ x, Y ≤ y) = xy, for 0 ≤ x, y ≤ 1.
❑ It can be verified that:
fX,Y(x, y) = ∂²FX,Y(x, y)/(∂x ∂y) = 1
for all (x, y) in the unit square.
Expectation
❑ If X and Y are jointly continuous random variables and ɡ is some function, then Z = ɡ(X, Y) is also a random variable. Thus the expected value rule applies:
E[ɡ(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ɡ(x, y) fX,Y(x, y) dx dy.
❑ As an important special case, for any scalars a, b, and c, we have:
E[aX + bY + c] = aE[X] + bE[Y] + c.
More than Two Random Variables
❑ The joint PDF of three random variables X, Y, and Z is defined in analogy with the case of two random variables. For example:
P((X, Y, Z) ∈ B) = ∫∫∫_B fX,Y,Z(x, y, z) dx dy dz
for any set B. We have relations such as:
fX,Y(x, y) = ∫_{−∞}^{∞} fX,Y,Z(x, y, z) dz,   fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y,Z(x, y, z) dy dz.
More than Two Random Variables
❑ The expected value rule takes the form:
E[ɡ(X, Y, Z)] = ∫∫∫ ɡ(x, y, z) fX,Y,Z(x, y, z) dx dy dz.
❑ If ɡ is linear, of the form aX + bY + cZ, then:
E[aX + bY + cZ] = aE[X] + bE[Y] + cE[Z].
Conditioning
❑ The conditional PDF of a continuous random variable X, given an event A with P(A) > 0, is defined as a nonnegative function fX|A that satisfies:
P(X ∈ B | A) = ∫_B fX|A(x) dx
for any subset B of the real line.
Conditioning
❑ In particular, by letting B be the entire real line, we obtain the normalisation property:
∫_{−∞}^{∞} fX|A(x) dx = 1,
so that fX|A is a legitimate PDF.
Conditioning
❑ In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0, the definition of conditional probabilities yields:
P(X ∈ B | X ∈ A) = P(X ∈ A ∩ B)/P(X ∈ A) = ∫_{A∩B} fX(x) dx / P(X ∈ A).
❑ By comparing with the earlier formula, it gives:
fX|{X∈A}(x) = fX(x)/P(X ∈ A) if x ∈ A, and 0 otherwise.
Joint Conditional PDF
❑ Suppose that X and Y are jointly continuous random variables, with joint PDF fX,Y. If we condition on a positive probability event of the form C = {(X, Y) ∈ A}, we have:
fX,Y|C(x, y) = fX,Y(x, y)/P(C) if (x, y) ∈ A, and 0 otherwise.
❑ In this case, the conditional PDF of X, given this event, can be obtained from the formula:
fX|C(x) = ∫_{−∞}^{∞} fX,Y|C(x, y) dy.
Joint Conditional PDF
❑ These two formulas provide one possible method for obtaining the conditional PDF of a random variable X when the conditioning event is not of the form {X ∈ A}, but is instead defined in terms of multiple random variables.
Joint Conditional PDF
❑ A version of the total probability theorem, which involves conditional PDFs, is given as: if the events A1, …, An form a partition of the sample space, then:
fX(x) = Σ_{i=1}^{n} P(Ai) fX|Ai(x).
❑ Using the total probability theorem:
P(X ≤ x) = Σ_{i=1}^{n} P(Ai) P(X ≤ x | Ai).
Joint Conditional PDF
❑ Finally, the formula can be written as:
FX(x) = Σ_{i=1}^{n} P(Ai) FX|Ai(x).
❑ We then take the derivative of both sides with respect to x, and obtain the desired result.
Joint Conditional PDF
❑ To interpret the conditional PDF, let us fix some small positive numbers δ1 and δ2, and condition on the event B = {y ≤ Y ≤ y + δ2}. We have:
P(x ≤ X ≤ x + δ1 | y ≤ Y ≤ y + δ2) ≈ fX,Y(x, y) δ1 δ2 / (fY(y) δ2) = fX|Y(x|y) δ1.
Joint Conditional PDF
❑ Therefore, fX|Y(x|y)δ1 provides us with the probability that X belongs to a small interval [x, x + δ1], given that Y belongs to a small interval [y, y + δ2]. Since fX|Y(x|y)δ1 does not depend on δ2, we can think of the limiting case where δ2 decreases to zero and write:
P(x ≤ X ≤ x + δ1 | Y = y) ≈ fX|Y(x|y) δ1.
❑ And more generally:
P(X ∈ A | Y = y) = ∫_A fX|Y(x|y) dx.
Joint Conditional PDF
❑ The conditional PDF fX|Y(x|y) can be seen as a description of the probability law of X, given that the event {Y = y} has occurred.
❑ As in the discrete case, the conditional PDF fX|Y, together with the marginal PDF fY, are sometimes used to calculate the joint PDF:
fX,Y(x, y) = fY(y) fX|Y(x|y).
➢ This approach can also be used for modelling: instead of directly specifying fX,Y, it is often natural to provide a probability law for Y, in terms of a PDF fY, and then provide a conditional PDF fX|Y(x|y) for X, given any possible value y of Y.
Joint Conditional PDF
❑ Example. The speed of a typical vehicle that drives
past a police radar is modelled as an exponentially
distributed random variable X with mean 50 miles
per hour. The police radar’s measurement Y of the
vehicle’s speed has an error which is modelled as a
normal random variable with zero mean and
standard deviation equal to one tenth of the vehicle’s
speed. What is the joint PDF of X and Y?
Joint Conditional PDF
❑ Solution. We have fX(x) = (1/50)e^{−x/50}, for x ≥ 0. Also, conditioned on X = x, the measurement Y has a normal PDF with mean x and variance x²/100. Therefore:
fY|X(y|x) = (1/(√(2π)(x/10))) e^{−(y−x)²/(2x²/100)}.
❑ Thus, for all x ≥ 0 and all y:
fX,Y(x, y) = fX(x) fY|X(y|x) = (1/50)e^{−x/50} · (10/(√(2π) x)) e^{−50(y−x)²/x²}.
Conditional PDF for More Than Two r.v.s.
❑ The conditional PDF can be defined, by extension, for the case of more than two random variables:
fX,Y|Z(x, y|z) = fX,Y,Z(x, y, z)/fZ(z),   fX|Y,Z(x|y, z) = fX,Y,Z(x, y, z)/fY,Z(y, z).
❑ The analogue of the multiplication rule is given as:
fX,Y,Z(x, y, z) = fX|Y,Z(x|y, z) fY|Z(y|z) fZ(z).
Conditional Expectation
❑ For a continuous random variable X, we define the conditional expectation E[X|A] given an event A similarly to the unconditional case, except that we now need to use the conditional PDF fX|A.
❑ Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0; then the conditional expectation of X given the event A is defined by:
E[X|A] = ∫_{−∞}^{∞} x fX|A(x) dx.
Conditional Expectation
❑ The conditional expectation of X given that Y = y is defined by:
E[X|Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx.
❑ The expectation rule, for a function ɡ(x):
E[ɡ(X)|A] = ∫_{−∞}^{∞} ɡ(x) fX|A(x) dx
and
E[ɡ(X)|Y = y] = ∫_{−∞}^{∞} ɡ(x) fX|Y(x|y) dx.
Conditional Expectation
❑ Total expectation theorem: Let A1, A2, …, An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then:
E[X] = Σ_{i=1}^{n} P(Ai) E[X|Ai].
❑ Similarly:
E[X] = ∫_{−∞}^{∞} E[X|Y = y] fY(y) dy.
Conditional Expectation
❑ There are natural analogues for the case of functions of several random variables. For example:
E[ɡ(X, Y)|Y = y] = ∫_{−∞}^{∞} ɡ(x, y) fX|Y(x|y) dx.
❑ And:
E[ɡ(X, Y)] = ∫_{−∞}^{∞} E[ɡ(X, Y)|Y = y] fY(y) dy.
Independence
❑ Two continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:
fX,Y(x, y) = fX(x) fY(y) for all x, y.
❑ Comparing with the formula fX,Y(x, y) = fX|Y(x|y) fY(y), we see that independence is the same as the condition:
fX|Y(x|y) = fX(x), for all x and all y with fY(y) > 0,
or, symmetrically:
fY|X(y|x) = fY(y), for all y and all x with fX(x) > 0.
Independence
❑ For the case of more than two random variables, we say, for example, that three random variables X, Y, and Z are independent if:
fX,Y,Z(x, y, z) = fX(x) fY(y) fZ(z) for all x, y, z.
Independence
❑ Example. Independent Normal Random Variables. Let X and Y be independent normal random variables with means μx, μy and variances σx², σy², respectively. Their joint PDF is of the form:
fX,Y(x, y) = (1/(2πσxσy)) e^{−(x−μx)²/(2σx²) − (y−μy)²/(2σy²)}.
❑ This joint PDF has the shape of a bell centred at (μx, μy), whose width in the x and y directions is proportional to σx and σy, respectively.
Independence
❑ Additional insight into the form of the PDF can be gained by considering its contours,
➢ i.e., sets of points at which the PDF takes a constant value.
❑ These contours are described by an equation of the form:
(x − μx)²/σx² + (y − μy)²/σy² = constant,
and are ellipses whose two axes are horizontal and vertical. If σx² = σy², then the contours are circles.
Independence
❑ If X and Y are independent, then any two events of the form {X ∈ A} and {Y ∈ B} are independent:
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).
Independence
❑ Independence implies that:
E[XY] = E[X] E[Y].
❑ The property:
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
can be used to provide a general definition of independence between two random variables, e.g., if X is discrete and Y is continuous.
Independence
❑ Similarly to the discrete case, if X and Y are independent, then:
E[ɡ(X) h(Y)] = E[ɡ(X)] E[h(Y)]
for any two functions ɡ and h.
❑ The variance of the sum of independent random variables is equal to the sum of their variances:
var(X + Y) = var(X) + var(Y).
Summary of Independence
The continuous Bayes’ rule: Inference problem
The continuous Bayes’ rule
❑ Inference problem:
❑ We have an unobserved random variable X with known PDF fX, and we obtain a measurement Y according to a conditional PDF fY|X. Given an observed value y of Y, the inference problem is to evaluate the conditional PDF fX|Y(x|y).
The continuous Bayes’ rule
❑ Thus, whatever information is provided by the event {Y = y} is captured by the conditional PDF fX|Y(x|y). It thus suffices to evaluate this PDF. From the formula fX fY|X = fX,Y = fY fX|Y, it follows:
fX|Y(x|y) = fX(x) fY|X(y|x) / fY(y).
The continuous Bayes’ rule
❑ Based on the normalisation property
fY(y) = ∫_{−∞}^{∞} fX(t) fY|X(y|t) dt,
an equivalent expression is:
fX|Y(x|y) = fX(x) fY|X(y|x) / ∫_{−∞}^{∞} fX(t) fY|X(y|t) dt.
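The rule can be sketched on a grid with hypothetical densities: a prior X ~ Exponential(1) and a measurement Y | X = x ~ N(x, 1). The posterior is the prior times the likelihood, divided by the normalising integral.

```python
import math

prior = lambda x: math.exp(-x)                                   # f_X, x >= 0 (hypothetical)
lik = lambda y, x: math.exp(-(y - x) ** 2 / 2) / math.sqrt(2 * math.pi)  # f_{Y|X}

y_obs = 1.3
h = 0.001
grid = [i * h for i in range(10000)]                             # x in [0, 10)
numer = [prior(x) * lik(y_obs, x) for x in grid]                 # f_X(x) f_{Y|X}(y|x)
norm = sum(v * h for v in numer)                                 # approximates f_Y(y)
posterior = [v / norm for v in numer]                            # f_{X|Y}(x|y)

# The posterior is a legitimate PDF: it integrates to 1.
assert abs(sum(p * h for p in posterior) - 1.0) < 1e-9

# Its mode sits at x = y_obs - 1 (from maximising -x - (y - x)^2 / 2).
mode = grid[max(range(len(posterior)), key=lambda i: posterior[i])]
assert abs(mode - (y_obs - 1)) < 0.01
```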
The Bayes’ rule – discrete unknown, continuous measurement
The Bayes’ rule – continuous unknown, discrete measurement
Sums of Independent Random Variables
Convolution
❑ Let Z = X + Y, where X and Y are independent integer-valued random variables with PMFs pX and pY, respectively. Then, for any integer z:
pZ(z) = P(X + Y = z) = Σ_x pX(x) pY(z − x).
❑ The resulting PMF pZ is called the convolution of the PMFs of X and Y.
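The convolution sum can be computed directly. As a hypothetical example, take X and Y each uniform on {1, …, 6} (two fair dice); the convolution gives the PMF of their total.

```python
# PMFs of two hypothetical independent dice.
pX = {k: 1 / 6 for k in range(1, 7)}
pY = {k: 1 / 6 for k in range(1, 7)}

# p_Z(z) = sum over x of p_X(x) * p_Y(z - x).
pZ = {}
for x, px in pX.items():
    for y, py in pY.items():
        pZ[x + y] = pZ.get(x + y, 0.0) + px * py

assert abs(pZ[7] - 6 / 36) < 1e-12             # seven is the most likely total
assert abs(sum(pZ.values()) - 1.0) < 1e-12     # pZ is a legitimate PMF
```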
Covariance and Correlation
❑ The covariance of two random variables X and Y, denoted by cov(X, Y), is defined as:
cov(X, Y) = E[(X − E[X])(Y − E[Y])].
❑ When cov(X, Y) = 0, we say X and Y are uncorrelated.
➢ A positive or negative covariance indicates that the values of X − E[X] and Y − E[Y] obtained in a single experiment “tend” to have the same or the opposite sign, respectively.
Covariance and Correlation
❑ Multiplying this out and using linearity, we have an equivalent expression:
cov(X, Y) = E[XY] − E[X]E[Y].
❑ Covariance has the following key properties:
1. Cov(X, X) = Var(X).
2. Cov(X, Y) = Cov(Y, X).
3. Cov(X, c) = 0 for any constant c.
4. Cov(aX, Y ) = aCov(X, Y) for any constant a.
Covariance and Correlation
5. Cov(X + Y,Z) = Cov(X,Z) + Cov(Y,Z).
6. Cov(X + Y,Z +W) = Cov(X,Z) + Cov(X,W) +
Cov(Y,Z) + Cov(Y,W).
7. Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y). For n r.v.s X1, …, Xn:
Var(X1 + ··· + Xn) = Σ_i Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj).
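The identity cov(X, Y) = E[XY] − E[X]E[Y] and property 4 can be checked on a small hypothetical dataset, with empirical averages standing in for the expectations:

```python
mean = lambda v: sum(v) / len(v)

xs = [1.0, 2.0, 4.0, 7.0]          # hypothetical sample of X
ys = [2.0, 1.0, 5.0, 6.0]          # hypothetical sample of Y
mx, my = mean(xs), mean(ys)

# Definition vs. the expanded expression.
cov_def = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
cov_alt = mean([x * y for x, y in zip(xs, ys)]) - mx * my
assert abs(cov_def - cov_alt) < 1e-12

# Property 4: Cov(aX, Y) = a * Cov(X, Y).
a = 3.0
axs = [a * x for x in xs]
cov_scaled = mean([(x - mean(axs)) * (y - my) for x, y in zip(axs, ys)])
assert abs(cov_scaled - a * cov_def) < 1e-12
```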
Covariance and Correlation
❑ The correlation coefficient ρ(X, Y) of two random variables X and Y that have nonzero variances is defined as:
ρ(X, Y) = cov(X, Y) / √(var(X) var(Y)).
❑ It may be viewed as a normalised version of the covariance cov(X, Y).
❑ ρ ranges from −1 to 1.
Covariance and Correlation
❑ If ρ > 0 (or ρ < 0), then the values of X − E[X] and Y − E[Y] “tend” to have the same (or opposite, respectively) sign.
➢ The size of |ρ| provides a normalised measure of the extent to which this is true.
➢ Always assuming that X and Y have positive variances, it can be shown that ρ = 1 (or ρ = −1) if and only if there exists a positive (or negative, respectively) constant c such that:
Y − E[Y] = c(X − E[X]).
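The extreme case ρ = 1 can be verified empirically: if Y is a positive linear function of X, the computed correlation coefficient is exactly 1. The sample below is hypothetical.

```python
import math

mean = lambda v: sum(v) / len(v)

xs = [1.0, 2.0, 4.0, 7.0]             # hypothetical sample of X
ys = [3 * x + 2 for x in xs]          # Y = cX + b with c = 3 > 0

mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
var = lambda v, m: mean([(t - m) ** 2 for t in v])

rho = cov / math.sqrt(var(xs, mx) * var(ys, my))
assert abs(rho - 1.0) < 1e-12         # perfect positive linear dependence
```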