TRANSCRIPT
INSTITUTO POLITÉCNICO NACIONAL
CENTRO DE INVESTIGACION EN COMPUTACION
Probability, Random Processes and Inference
Dr. Ponciano Jorge Escamilla [email protected]
http://www.cic.ipn.mx/~pescamilla/
Laboratorio de
Ciberseguridad
CIC
Course Content
1.4. General Random Variables
1.4.1. Continuous Random Variables and PDFs
1.4.2. Cumulative Distribution Function
1.4.3. Normal Random Variables
1.4.4. Joint PDFs of Multiple Random Variables
1.4.5. Conditioning
1.4.6. The Continuous Bayes’ Rule
1.4.7. The Strong Law of Large Numbers
❑ Continuous random variables
➢ Example: the velocity of a vehicle traveling along the highway.
❑ Continuous random variables can take on any real value in an interval,
➢ possibly of infinite length, such as (0, ∞), or the entire real line.
❑ In this section, the concepts and methods introduced for discrete r.v.s, such as expectation, PMF, and conditioning, are extended to their continuous counterparts.
General Random Variables
❑ Continuous random variable. A random variable X is called continuous if there exists a nonnegative function fX, called the probability density function of X, or PDF, such that:
P(X ∈ B) = ∫_B fX(x) dx
for every subset B of the real line.
Probability Density Function
❑ The probability that the value of X falls within an interval is:
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx,
which can be interpreted as the area under the graph of the PDF.
Probability Density Function
❑ For any single value a, we have:
P(X = a) = ∫_a^a fX(x) dx = 0.
❑ For this reason, including or excluding the endpoints of an interval has no effect on its probability:
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
Probability Density Function
❑ To qualify as a PDF, a function fX must:
o be nonnegative, i.e., fX(x) ≥ 0 for every x,
o have the normalisation property:
∫_{−∞}^{∞} fX(x) dx = 1.
❑ Graphically, this means that the entire area under the graph of the PDF must be equal to 1.
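These two requirements can be checked numerically. The sketch below uses a hypothetical exponential density with rate 2 (an arbitrary choice), verifies nonnegativity on a grid, and approximates the total area with the trapezoidal rule:

```python
import math

# Hypothetical example density: exponential with rate lam = 2 on x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)

# Requirement 1: nonnegativity on a grid over [0, 20].
xs = [i * 0.001 for i in range(20001)]
assert all(f(x) >= 0 for x in xs)

# Requirement 2: the area under the graph is 1 (trapezoidal rule;
# the tail beyond x = 20 is negligible for this density).
area = sum(0.001 * (f(a) + f(b)) / 2 for a, b in zip(xs, xs[1:]))
assert abs(area - 1.0) < 1e-3
```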
Probability Density Function
Discrete vs. continuous r.v.s.
Recall that for a discrete r.v., the CDF jumps at every point in the
support, and is flat everywhere else. In contrast, for a continuous
r.v. the CDF increases smoothly.
❑ For a continuous r.v. X with CDF, FX(x), the
probability density function (PDF) of X is the
derivative fX(x) of the CDF, given by fX(x) = F′X (x).
The support of X, and of its distribution, is the set of
all x where fX(x) > 0.
❑ The PDF represents the “density” of probability at
the point x.
Discrete vs. continuous r.v.s.
❑ To get from the PDF back to the CDF we apply:
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt.
❑ Thus, analogous to how we obtained the value of a discrete CDF at x by summing the PMF over all values less than or equal to x, here we integrate the PDF over all values up to x, so the CDF is the accumulated area under the PDF.
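This accumulation can be checked numerically. The sketch below assumes a hypothetical exponential density with rate 1.5, recovers the CDF by accumulating area under the PDF, and compares it with the closed form 1 − e^(−λx):

```python
import math

lam = 1.5
pdf = lambda t: lam * math.exp(-lam * t)   # hypothetical exponential PDF

def cdf_numeric(x, n=2000):
    # Accumulate area under the PDF from 0 (start of the support) to x
    # with the trapezoidal rule.
    h = x / n
    return sum(h * (pdf(i * h) + pdf((i + 1) * h)) / 2 for i in range(n))

x = 0.8
exact = 1 - math.exp(-lam * x)             # closed-form exponential CDF
assert abs(cdf_numeric(x) - exact) < 1e-6
```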
Probability Density Function
❑ Since we can freely convert between the PDF and
the CDF using the inverse operations of integration
and differentiation, both the PDF and CDF carry
complete information about the distribution of a
continuous r.v.
❑ Thus the PDF completely specifies the behavior
of continuous random variables.
Probability Density Function
❑ For an interval [x, x + δ] with very small length δ, we have:
P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} fX(t) dt ≈ fX(x)·δ,
so we can view fX(x) as the “probability mass per unit length” near x.
Probability Density Function
Even though a PDF is used to calculate
event probabilities, fX(x) is not the
probability of any particular event.
In particular, it is not restricted to be
less than or equal to one.
❑ An important way in which continuous r.v.s differ
from discrete r.v.s is that for a continuous r.v. X,
P(X = x) = 0 for all x. This is because P(X = x) is the
height of a jump in the CDF at x, but the CDF of X
has no jumps! Since the PMF of a continuous r.v.
would just be 0 everywhere, we work with a PDF
instead.
Probability Density Function
❑ The PDF is analogous to the PMF in many ways, but
there is a key difference: for a PDF fX , the quantity
fX(x) is not a probability, and in fact it is possible to
have fX(x) > 1 for some values of x. To obtain a
probability, we need to integrate the PDF.
❑ In summary:
➢To get a desired probability, integrate the PDF over
the appropriate range.
Probability Density Function
❑ The Logistic distribution has CDF:
FX(x) = e^x / (1 + e^x), for all real x.
❑ To get the PDF, we differentiate the CDF, which gives:
fX(x) = e^x / (1 + e^x)².
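The differentiation step can be sanity-checked numerically: a central difference of the Logistic CDF should match the stated PDF at any point.

```python
import math

F = lambda x: math.exp(x) / (1 + math.exp(x))        # Logistic CDF
f = lambda x: math.exp(x) / (1 + math.exp(x)) ** 2   # PDF = F'(x)

x, h = 0.7, 1e-5
numeric_slope = (F(x + h) - F(x - h)) / (2 * h)      # central difference
assert abs(numeric_slope - f(x)) < 1e-8
```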
Examples of PDFs
❑ The Rayleigh distribution has CDF:
FX(x) = 1 − e^{−x²/2}, for x > 0.
❑ To get the PDF, we differentiate the CDF, which gives:
fX(x) = x e^{−x²/2}, for x > 0.
Examples of PDFs
❑ A continuous r.v. X is said to have Uniform distribution on the interval (a, b) if its PDF is:
fX(x) = 1/(b − a) for a < x < b, and fX(x) = 0 otherwise.
❑ The CDF is the accumulated area under the PDF:
FX(x) = 0 for x ≤ a; FX(x) = (x − a)/(b − a) for a < x < b; FX(x) = 1 for x ≥ b.
Examples of PDFs
❑ We denote this by X ∼ Unif(a, b).
❑ The Uniform distribution that we will most frequently
use is the Unif(0, 1) distribution, also called the
standard Uniform.
❑ The Unif(0, 1) PDF and CDF are particularly simple:
f(x) = 1 and F(x) = x for 0 < x < 1.
❑ For a general Unif(a, b) distribution, the PDF is
constant on (a, b), and the CDF is ramp-shaped,
increasing linearly from 0 to 1 as x ranges from a to b.
Examples of PDFs
For Uniform distributions, probability is proportional to length.
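A minimal sketch of this fact, assuming a hypothetical Unif(2, 7) random variable: any two subintervals of equal length receive equal probability.

```python
a, b = 2.0, 7.0   # hypothetical Unif(a, b)

def unif_prob(c, d):
    # P(c <= X <= d): clip [c, d] to the support, then length / (b - a).
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

assert unif_prob(3, 4) == unif_prob(5.5, 6.5) == 0.2  # same length, same probability
assert unif_prob(0, 10) == 1.0                        # the whole support
```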
PDF Properties
❑ The expected value or expectation or mean of a continuous r.v. X is defined by:
E[X] = ∫_{−∞}^{∞} x fX(x) dx.
❑ This is similar to the discrete case, except that the PMF is replaced by the PDF, and summation is replaced by integration.
❑ Its mathematical properties are similar to the discrete case.
Expected Value and Variance of a Continuous r.v.
❑ If X is a continuous random variable with given PDF, then any real-valued function Y = ɡ(X) of X is also a random variable.
➢ Note that Y can be a continuous r.v., but Y can also be discrete, e.g., ɡ(x) = 1 for x > 0 and ɡ(x) = 0 otherwise.
❑ In either case, the mean of ɡ(X) satisfies the expected value rule:
E[ɡ(X)] = ∫_{−∞}^{∞} ɡ(x) fX(x) dx.
Expected Value and Variance of a Continuous r.v.
❑ The nth moment of a continuous r.v. X is defined as E[Xⁿ], the expected value of the random variable Xⁿ.
❑ The variance of X, denoted as var(X), is defined as the expected value of the random variable (X − E[X])²:
var(X) = E[(X − E[X])²] = ∫_{−∞}^{∞} (x − E[X])² fX(x) dx.
Expected Value and Variance of a Continuous r.v.
❑ Example. Consider a uniform PDF over an interval [a, b]; its expectation is given by:
E[X] = ∫_a^b x · (1/(b − a)) dx = (a + b)/2.
Expected Value and Variance of a Continuous r.v.
❑ Its variance is given as:
var(X) = E[X²] − (E[X])² = (b − a)²/12.
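Both the mean and the variance formulas can be verified by numerical integration against the Unif(a, b) density; the endpoints a = 2, b = 7 below are an arbitrary choice.

```python
a, b = 2.0, 7.0
n = 20000
h = (b - a) / n
density = 1.0 / (b - a)
xs = [a + (i + 0.5) * h for i in range(n)]            # midpoint-rule grid

mean = sum(x * density * h for x in xs)               # E[X]
var = sum((x - mean) ** 2 * density * h for x in xs)  # var(X)

assert abs(mean - (a + b) / 2) < 1e-9                 # (a + b)/2
assert abs(var - (b - a) ** 2 / 12) < 1e-6            # (b - a)^2 / 12
```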
Expected Value and Variance of a Continuous r.v.
❑ The exponential continuous random variable has PDF:
fX(x) = λe^{−λx} for x ≥ 0, and fX(x) = 0 otherwise,
where λ is a positive parameter characterising the PDF, with
∫_0^∞ λe^{−λx} dx = 1.
Expected Value and Variance of a Continuous r.v.
❑ The probability that X exceeds a certain value decreases exponentially. That is, for any a ≥ 0, we have:
P(X ≥ a) = ∫_a^∞ λe^{−λx} dx = e^{−λa}.
❑ An exponential random variable can be a good
model for the amount of time until an incident of
interest takes place.
➢ a message arriving at a computer, some equipment
breaking down, a light bulb burning out, etc.
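A small sketch of the tail formula P(X ≥ a) = e^(−λa), with a hypothetical rate λ = 0.5; it also exhibits the memorylessness that makes the exponential a natural model for waiting times until an incident.

```python
import math

lam = 0.5                                  # hypothetical rate
tail = lambda a: math.exp(-lam * a)        # P(X >= a) for an exponential r.v.

# Memorylessness: P(X > s + t | X > s) = P(X > t), i.e. having already
# waited s time units tells us nothing about the remaining wait.
s, t = 2.0, 3.0
assert abs(tail(s + t) / tail(s) - tail(t)) < 1e-12
assert tail(0) == 1.0                      # X is nonnegative
```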
Expected Value and Variance of a Continuous r.v.
❑ The mean of the exponential r.v. X is calculated by:
E[X] = ∫_0^∞ x λe^{−λx} dx = 1/λ (using integration by parts).
Expected Value and Variance of a Continuous r.v.
❑ The variance of the exponential r.v. X is calculated by:
var(X) = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
Expected Value and Variance of a Continuous r.v.
❑ The cumulative distribution function, CDF, of a random variable X is denoted as FX and provides the probability P(X ≤ x). In particular, for every x we have:
FX(x) = P(X ≤ x) = Σ_{k ≤ x} pX(k) if X is discrete, and
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(t) dt if X is continuous.
Cumulative Distribution Functions
The CDF FX(x) “accumulates” probability “up to” the value of x.
❑ Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete or continuous.
➢ {X ≤ x} is always an event and therefore has a well-defined probability.
Cumulative Distribution Functions
Normal Random Variables
❑ A continuous random variable X is normal or Gaussian or normally distributed if it has a PDF of the form:
fX(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},
where μ and σ are two scalar parameters characterising the PDF (abbreviated N(μ, σ²), and referred to as the normal density function), with σ assumed positive.
Normal Random Variables
❑ It can be verified that the normalisation property holds:
(1/(σ√(2π))) ∫_{−∞}^{∞} e^{−(x−μ)²/(2σ²)} dx = 1.
(figure: PDF of N(1, 1))
Normal Random Variables
❑ If X is N(μ, σ²), then E[X] = μ.
Proof: the PDF is symmetric about x = μ.
❑ If X is N(μ, σ²), then var(X) = σ².
Proof: var(X) = (1/(σ√(2π))) ∫_{−∞}^{∞} (x − μ)² e^{−(x−μ)²/(2σ²)} dx = σ², using the change of variables y = (x − μ)/σ and integration by parts.
Normal Random Variables
❑ Its maximum value occurs at the mean value of its
argument.
❑ It is symmetrical about the mean value.
❑ The points of maximum absolute slope occur at one
standard deviation above and below the mean.
❑ Its maximum value is inversely proportional to its
standard deviation.
❑ The limit as the standard deviation approaches zero
is a unit impulse.
Linear Function of a Normal Random Variable
❑ If X is a normal r.v. with mean μ and variance σ², and if a ≠ 0 and b are scalars, then the random variable:
Y = aX + b
is also normal, with mean and variance:
E[Y] = aμ + b,   var(Y) = a²σ².
Standard Normal Random Variables
❑ A normal random variable Y with zero mean and unit variance, N(0, 1), is said to be a standard normal. Its PDF and CDF are denoted by φ and Φ, respectively:
φ(y) = (1/√(2π)) e^{−y²/2},   Φ(y) = P(Y ≤ y) = (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt.
Standard Normal Random Variables
❑ The PDF of a normal r.v. cannot be integrated in
terms of the common elementary functions, and
therefore the probabilities of X falling in various
intervals are obtained from tables or by computer.
❑ Example: the Standard Normal Table.
❑ The table only provides the values of Φ(y) for y ≥ 0, because the omitted values can be calculated using the symmetry of the PDF:
Φ(−y) = 1 − Φ(y).
(standard normal table omitted)
Standard Normal Random Variables
❑ It would be overwhelming to construct tables for all μ and σ values required in applications.
➢ Standardise the r.v.
❑ Let X be a normal (Gaussian) random variable with mean μ and variance σ². We standardise X by defining a new random variable Y given by:
Y = (X − μ)/σ.
Standard Normal Random Variables
❑ Since Y is a linear function of X, it is normal. This means:
E[Y] = (E[X] − μ)/σ = 0,   var(Y) = var(X)/σ² = 1.
❑ Thus, Y is a standard normal random variable.
➢ This allows us to calculate the probability of any event defined in terms of X by redefining the event in terms of Y, and then using the standard normal table.
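In code, the table can be replaced by the error function, which is the usual way Φ is evaluated numerically. A sketch with hypothetical parameters μ = 5, σ = 2:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function (plays the role of the table).
    return (1 + math.erf(z / math.sqrt(2))) / 2

mu, sigma = 5.0, 2.0                 # hypothetical N(mu, sigma^2)
c = 7.0
p = Phi((c - mu) / sigma)            # P(X <= 7) = Phi(1) after standardising
assert abs(p - 0.8413) < 1e-3        # the table value of Phi(1)
```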
Standard Normal Random Variables
❑ Example 2: The annual snowfall at a particular geographic location is modelled as a normal random variable with a mean μ = 60 inches and a standard deviation σ = 20 inches. What is the probability that this year’s snowfall will be at least 80 inches?
Standard Normal Random Variables
❑ Solution:
P(X ≥ 80) = P((X − 60)/20 ≥ (80 − 60)/20) = P(Y ≥ 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.
Standard Normal Random Variables
❑ Example 3: (Height Distribution of Men). Assume
that the height X, in inches, of a randomly selected
man in a certain population is normally distributed
with μ = 69 and σ = 2.6. Find
1. P(X < 72),
2. P(X > 72),
3. P(X < 66),
4. P(|X − μ| < 3).
Standard Normal Random Variables
❑ The table gives Φ(z) only for z ≥ 0; for z < 0 we need to make use of the symmetry of the normal distribution, which implies that, for any z, P(Z < −z) = P(Z > z), i.e., Φ(−z) = 1 − Φ(z). Thus, standardising with z = (72 − 69)/2.6 ≈ 1.15, the solution is:
1. P(X < 72) = Φ(1.15) ≈ 0.8749,
2. P(X > 72) = 1 − Φ(1.15) ≈ 0.1251,
3. P(X < 66) = Φ(−1.15) = 1 − Φ(1.15) ≈ 0.1251,
4. P(|X − μ| < 3) = Φ(3/2.6) − Φ(−3/2.6) = 2Φ(1.15) − 1 ≈ 0.7498.
Standard Normal Random Variables
❑ Normal r.v.s. are often used in signal processing and
communications engineering to model noise and
unpredictable distortions of signals.
Standard Normal Random Variables
Standard Normal Random Variables
❑ Three important benchmarks for the Normal distribution are the probabilities of falling within one, two, and three standard deviations of the mean. The 68-95-99.7% rule tells us that these probabilities are what the name suggests.
❑ (68-95-99.7% rule). If X ∼ N(μ, σ²), then:
P(|X − μ| < σ) ≈ 0.68,  P(|X − μ| < 2σ) ≈ 0.95,  P(|X − μ| < 3σ) ≈ 0.997.
Standardising, these are Φ(1) − Φ(−1), Φ(2) − Φ(−2), and Φ(3) − Φ(−3).
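The rule follows directly from Φ, as the quick check below shows (with Φ evaluated via the error function):

```python
import math

Phi = lambda z: (1 + math.erf(z / math.sqrt(2))) / 2

# P(|X - mu| < k*sigma) standardises to Phi(k) - Phi(-k), for any mu, sigma.
within = lambda k: Phi(k) - Phi(-k)

assert abs(within(1) - 0.68) < 0.003    # one standard deviation
assert abs(within(2) - 0.95) < 0.005    # two standard deviations
assert abs(within(3) - 0.997) < 0.001   # three standard deviations
```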
Standard Normal Random Variables
Joint PDF of Multiple Random Variables
❑ Two continuous random variables associated with the same experiment are jointly continuous and can be described in terms of a joint PDF fX,Y if fX,Y is a nonnegative function that satisfies:
P((X, Y) ∈ B) = ∫∫_B fX,Y(x, y) dx dy
for every subset B of the two-dimensional plane.
❑ The notation means that the integration is carried out over the set B.
Joint PDF of Multiple Random Variables
❑ In the particular case where B is a rectangle of the form B = {(x, y) | a ≤ x ≤ b, c ≤ y ≤ d}, we have:
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b fX,Y(x, y) dx dy.
❑ If B is the entire two-dimensional plane, then we obtain the normalisation property:
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.
Joint PDF of Multiple Random Variables
❑ To interpret the joint PDF, we let δ be a small positive number and consider the probability of a small rectangle. Then we have:
P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_c^{c+δ} ∫_a^{a+δ} fX,Y(x, y) dx dy ≈ fX,Y(a, c)·δ²,
so we can view fX,Y(a, c) as the probability per unit area in the vicinity of (a, c).
Joint PDF of Multiple Random Variables
❑ The joint PDF contains all relevant probabilistic information on the random variables X, Y, and their dependencies.
❑ Therefore, the joint PDF allows us to calculate the probability of any event that can be defined in terms of these two random variables.
Marginals
❑ Marginal PDF. For continuous r.v.s X and Y with joint PDF fX,Y, the marginal PDF of X is:
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.
❑ Similarly, the marginal PDF of Y is:
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Marginals
❑ Marginalisation works analogously with any number of variables. For example, if we have the joint PDF of X, Y, Z, W but want the joint PDF of X, W, we just have to integrate over all possible values of Y and Z:
fX,W(x, w) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y,Z,W(x, y, z, w) dy dz.
➢ Conceptually this is very easy—just integrate over the unwanted
variables to get the joint PDF of the wanted variables—but
computing the integral may or may not be difficult.
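A numerical sketch with the standard textbook joint density f(x, y) = x + y on the unit square (a hypothetical choice): integrating out y recovers the marginal fX(x) = x + 1/2.

```python
n = 4000
h = 1.0 / n

def marginal_x(x):
    # Integrate the joint PDF f(x, y) = x + y over y in [0, 1] (midpoint rule).
    return sum((x + (j + 0.5) * h) * h for j in range(n))

# Compare against the closed-form marginal f_X(x) = x + 1/2.
for x in (0.1, 0.5, 0.9):
    assert abs(marginal_x(x) - (x + 0.5)) < 1e-9
```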
Joint CDFs
❑ If X and Y are two random variables associated with the same experiment, their joint CDF is defined by:
FX,Y(x, y) = P(X ≤ x, Y ≤ y).
❑ The joint CDF is the joint probability of the two events {X ≤ x} and {Y ≤ y}.
❑ If X and Y are described by a joint PDF fX,Y, then:
FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(s, t) dt ds.
Joint PDF of Multiple Random Variables
❑ Conversely, if X and Y are continuous with joint CDF FX,Y, their joint PDF is the derivative of the joint CDF with respect to x and y:
fX,Y(x, y) = ∂²FX,Y(x, y)/(∂x ∂y).
Joint CDF of Multiple Random Variables
❑ Let X and Y be described by a uniform PDF on the unit square. The joint CDF is given by:
FX,Y(x, y) = P(X ≤ x, Y ≤ y) = xy, for 0 ≤ x, y ≤ 1.
❑ It can be verified that:
fX,Y(x, y) = ∂²FX,Y(x, y)/(∂x ∂y) = 1
for all (x, y) in the unit square.
Expectation
❑ If X and Y are jointly continuous random variables and ɡ is some function, then Z = ɡ(X, Y) is also a random variable. Thus the expected value rule applies:
E[ɡ(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ɡ(x, y) fX,Y(x, y) dx dy.
❑ As an important special case, for any scalars a, b, and c, we have:
E[aX + bY + c] = aE[X] + bE[Y] + c.
More than Two Random Variables
❑ The joint PDF of three random variables X, Y, and Z is defined in analogy with the case of two random variables. For example:
P((X, Y, Z) ∈ B) = ∫∫∫_B fX,Y,Z(x, y, z) dx dy dz
for any set B. We have relations such as:
fX,Y(x, y) = ∫_{−∞}^{∞} fX,Y,Z(x, y, z) dz,   fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y,Z(x, y, z) dy dz.
More than Two Random Variables
❑ The expected value rule takes the form:
E[ɡ(X, Y, Z)] = ∫∫∫ ɡ(x, y, z) fX,Y,Z(x, y, z) dx dy dz.
❑ If ɡ is linear, of the form aX + bY + cZ, then:
E[aX + bY + cZ] = aE[X] + bE[Y] + cE[Z].
Conditioning
❑ The conditional PDF of a continuous random variable X, given an event A with P(A) > 0, is defined as a nonnegative function fX|A that satisfies:
P(X ∈ B | A) = ∫_B fX|A(x) dx
for any subset B of the real line.
Conditioning
❑ In particular, by letting B be the entire real line, we obtain the normalisation property:
∫_{−∞}^{∞} fX|A(x) dx = 1,
so that fX|A is a legitimate PDF.
Conditioning
❑ In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0, the definition of conditional probabilities yields:
P(X ∈ B | X ∈ A) = P(X ∈ A ∩ B)/P(X ∈ A) = ∫_{A∩B} fX(x) dx / P(X ∈ A).
❑ By comparing with the earlier formula, it gives:
fX|{X∈A}(x) = fX(x)/P(X ∈ A) if x ∈ A, and 0 otherwise.
Joint Conditional PDF
❑ Suppose that X and Y are jointly continuous random variables, with joint PDF fX,Y. If we condition on a positive probability event of the form C = {(X, Y) ∈ A}, we have:
fX,Y|C(x, y) = fX,Y(x, y)/P(C) if (x, y) ∈ A, and 0 otherwise.
❑ In this case, the conditional PDF of X, given this event, can be obtained from the formula:
fX|C(x) = ∫_{−∞}^{∞} fX,Y|C(x, y) dy.
Joint Conditional PDF
❑ These two formulas provide one possible method for obtaining the conditional PDF of a random variable X when the conditioning event is not of the form {X ∈ A}, but is instead defined in terms of multiple random variables.
Joint Conditional PDF
❑ A version of the total probability theorem, which involves conditional PDFs, is given as: if the events A1, …, An form a partition of the sample space, then:
fX(x) = Σ_{i=1}^{n} P(Ai) fX|Ai(x).
❑ Using the total probability theorem:
P(X ≤ x) = Σ_{i=1}^{n} P(Ai) P(X ≤ x | Ai).
Joint Conditional PDF
❑ Finally, the formula can be written as:
FX(x) = Σ_{i=1}^{n} P(Ai) FX|Ai(x).
❑ We then take the derivative of both sides with respect to x, and obtain the desired result.
Joint Conditional PDF
❑ To interpret the conditional PDF, let us fix some small positive numbers δ1 and δ2, and condition on the event B = {y ≤ Y ≤ y + δ2}. We have:
P(x ≤ X ≤ x + δ1 | y ≤ Y ≤ y + δ2) ≈ fX,Y(x, y) δ1 δ2 / (fY(y) δ2) = fX|Y(x|y) δ1.
Joint Conditional PDF
❑ Therefore, fX|Y(x|y)δ1 provides us with the probability that X belongs to a small interval [x, x + δ1], given that Y belongs to a small interval [y, y + δ2]. Since fX|Y(x|y)δ1 does not depend on δ2, we can think of the limiting case where δ2 decreases to zero and write:
P(x ≤ X ≤ x + δ1 | Y = y) ≈ fX|Y(x|y) δ1.
❑ And more generally:
P(X ∈ A | Y = y) = ∫_A fX|Y(x|y) dx.
Joint Conditional PDF
❑ The conditional PDF fX|Y(x|y) can be seen as a description of the probability law of X, given that the event {Y = y} has occurred.
❑ As in the discrete case, the conditional PDF fX|Y, together with the marginal PDF fY, are sometimes used to calculate the joint PDF:
fX,Y(x, y) = fY(y) fX|Y(x|y).
➢ This approach can also be used for modelling: instead of directly specifying fX,Y, it is often natural to provide a probability law for Y, in terms of a PDF fY, and then provide a conditional PDF fX|Y(x|y) for X, given any possible value y of Y.
Joint Conditional PDF
❑ Example. The speed of a typical vehicle that drives
past a police radar is modelled as an exponentially
distributed random variable X with mean 50 miles
per hour. The police radar’s measurement Y of the
vehicle’s speed has an error which is modelled as a
normal random variable with zero mean and
standard deviation equal to one tenth of the vehicle’s
speed. What is the joint PDF of X and Y?
Joint Conditional PDF
❑ Solution. We have fX(x) = (1/50)e^{−x/50}, for x ≥ 0. Also, conditioned on X = x, the measurement Y has a normal PDF with mean x and variance x²/100. Therefore:
fY|X(y|x) = (1/(√(2π)(x/10))) e^{−(y−x)²/(2x²/100)}.
❑ Thus, for all x ≥ 0 and all y:
fX,Y(x, y) = fX(x) fY|X(y|x) = (1/50)e^{−x/50} · (10/(√(2π) x)) e^{−50(y−x)²/x²}.
Conditional PDF for More Than Two r.v.s.
❑ The conditional PDF can be defined, by extension, for the case of more than two random variables:
fX,Y|Z(x, y|z) = fX,Y,Z(x, y, z)/fZ(z),   fX|Y,Z(x|y, z) = fX,Y,Z(x, y, z)/fY,Z(y, z).
❑ The analogue of the multiplication rule is given as:
fX,Y,Z(x, y, z) = fX|Y,Z(x|y, z) fY|Z(y|z) fZ(z).
Conditional Expectation
❑ For a continuous random variable X, we define the conditional expectation E[X|A] given an event A similarly to the unconditional case, except that we now need to use the conditional PDF fX|A.
❑ Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0; then the conditional expectation of X given the event A is defined by:
E[X|A] = ∫_{−∞}^{∞} x fX|A(x) dx.
Conditional Expectation
❑ The conditional expectation of X given that Y = y is defined by:
E[X|Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx.
❑ The expectation rule, for a function ɡ(x):
E[ɡ(X)|A] = ∫_{−∞}^{∞} ɡ(x) fX|A(x) dx
and
E[ɡ(X)|Y = y] = ∫_{−∞}^{∞} ɡ(x) fX|Y(x|y) dx.
Conditional Expectation
❑ Total expectation theorem: Let A1, A2, …, An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then:
E[X] = Σ_{i=1}^{n} P(Ai) E[X|Ai].
❑ Similarly:
E[X] = ∫_{−∞}^{∞} E[X|Y = y] fY(y) dy.
Conditional Expectation
❑ There are natural analogues for the case of functions of several random variables. For example:
E[ɡ(X, Y)|Y = y] = ∫_{−∞}^{∞} ɡ(x, y) fX|Y(x|y) dx.
❑ And:
E[ɡ(X, Y)] = ∫_{−∞}^{∞} E[ɡ(X, Y)|Y = y] fY(y) dy.
Independence
❑ Two continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:
fX,Y(x, y) = fX(x) fY(y) for all x, y.
❑ Comparing with the formula fX,Y(x, y) = fX|Y(x|y) fY(y), we see that independence is the same as the condition:
fX|Y(x|y) = fX(x), for all x and all y with fY(y) > 0,
or, symmetrically:
fY|X(y|x) = fY(y), for all y and all x with fX(x) > 0.
Independence
❑ For the case of more than two random variables, we say, for example, that three random variables X, Y, and Z are independent if:
fX,Y,Z(x, y, z) = fX(x) fY(y) fZ(z) for all x, y, z.
Independence
❑ Example. Independent Normal Random Variables. Let X and Y be independent normal random variables with means μx, μy and variances σx², σy², respectively. Their joint PDF is of the form:
fX,Y(x, y) = (1/(2πσxσy)) e^{−(x−μx)²/(2σx²) − (y−μy)²/(2σy²)}.
❑ This joint PDF has the shape of a bell centred at (μx, μy), whose width in the x and y directions is proportional to σx and σy, respectively.
Independence
❑ Additional insight into the form of the PDF can be gained by considering its contours,
➢ i.e., sets of points at which the PDF takes a constant value.
❑ These contours are described by an equation of the form:
(x − μx)²/σx² + (y − μy)²/σy² = constant,
and are ellipses whose two axes are horizontal and vertical. If σx² = σy², then the contours are circles.
Independence
❑ If X and Y are independent, then any two events of the form {X ∈ A} and {Y ∈ B} are independent:
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).
Independence
❑ Independence implies that:
E[XY] = E[X] E[Y].
❑ The property:
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
can be used to provide a general definition of independence between two random variables, e.g., if X is discrete and Y is continuous.
Independence
❑ Similarly to the discrete case, if X and Y are independent, then:
E[ɡ(X) h(Y)] = E[ɡ(X)] E[h(Y)]
for any two functions ɡ and h.
❑ The variance of the sum of independent random variables is equal to the sum of their variances:
var(X + Y) = var(X) + var(Y).
Summary of Independence
The continuous Bayes’ rule: Inference problem
The continuous Bayes’ rule
❑ Inference problem:
❑ We have an unobserved random variable X with known PDF fX, and we obtain a measurement Y according to a conditional PDF fY|X. Given an observed value y of Y, the inference problem is to evaluate the conditional PDF fX|Y(x|y).
The continuous Bayes’ rule
❑ Thus, whatever information is provided by the event {Y = y} is captured by the conditional PDF fX|Y(x|y). It thus suffices to evaluate this PDF. From the formula fX fY|X = fX,Y = fY fX|Y, it follows:
fX|Y(x|y) = fX(x) fY|X(y|x) / fY(y).
The continuous Bayes’ rule
❑ Based on the normalisation property
fY(y) = ∫_{−∞}^{∞} fX(t) fY|X(y|t) dt,
an equivalent expression is:
fX|Y(x|y) = fX(x) fY|X(y|x) / ∫_{−∞}^{∞} fX(t) fY|X(y|t) dt.
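The rule can be sketched on a grid with hypothetical densities: a prior X ~ Exponential(1) and a measurement Y | X = x ~ N(x, 1). The posterior is the prior times the likelihood, divided by the normalising integral.

```python
import math

prior = lambda x: math.exp(-x)                                   # f_X, x >= 0 (hypothetical)
lik = lambda y, x: math.exp(-(y - x) ** 2 / 2) / math.sqrt(2 * math.pi)  # f_{Y|X}

y_obs = 1.3
h = 0.001
grid = [i * h for i in range(10000)]                             # x in [0, 10)
numer = [prior(x) * lik(y_obs, x) for x in grid]                 # f_X(x) f_{Y|X}(y|x)
norm = sum(v * h for v in numer)                                 # approximates f_Y(y)
posterior = [v / norm for v in numer]                            # f_{X|Y}(x|y)

# The posterior is a legitimate PDF: it integrates to 1.
assert abs(sum(p * h for p in posterior) - 1.0) < 1e-9

# Its mode sits at x = y_obs - 1 (from maximising -x - (y - x)^2 / 2).
mode = grid[max(range(len(posterior)), key=lambda i: posterior[i])]
assert abs(mode - (y_obs - 1)) < 0.01
```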
The Bayes’ rule – discrete unknown, continuous measurement
The Bayes’ rule – continuous unknown, discrete measurement
Sums of Independent Random Variables
Convolution
❑ Let Z = X + Y, where X and Y are independent integer-valued random variables with PMFs pX and pY, respectively. Then, for any integer z:
pZ(z) = P(X + Y = z) = Σ_x pX(x) pY(z − x).
❑ The resulting PMF pZ is called the convolution of the PMFs of X and Y.
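The convolution sum can be computed directly. As a hypothetical example, take X and Y each uniform on {1, …, 6} (two fair dice); the convolution gives the PMF of their total.

```python
# PMFs of two hypothetical independent dice.
pX = {k: 1 / 6 for k in range(1, 7)}
pY = {k: 1 / 6 for k in range(1, 7)}

# p_Z(z) = sum over x of p_X(x) * p_Y(z - x).
pZ = {}
for x, px in pX.items():
    for y, py in pY.items():
        pZ[x + y] = pZ.get(x + y, 0.0) + px * py

assert abs(pZ[7] - 6 / 36) < 1e-12             # seven is the most likely total
assert abs(sum(pZ.values()) - 1.0) < 1e-12     # pZ is a legitimate PMF
```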
Covariance and Correlation
❑ The covariance of two random variables X and Y, denoted by cov(X, Y), is defined as:
cov(X, Y) = E[(X − E[X])(Y − E[Y])].
❑ When cov(X, Y) = 0, we say X and Y are uncorrelated.
➢ A positive or negative covariance indicates that the values of X − E[X] and Y − E[Y] obtained in a single experiment “tend” to have the same or the opposite sign, respectively.
Covariance and Correlation
❑ Multiplying this out and using linearity, we have an equivalent expression:
cov(X, Y) = E[XY] − E[X]E[Y].
❑ Covariance has the following key properties:
1. Cov(X, X) = Var(X).
2. Cov(X, Y) = Cov(Y, X).
3. Cov(X, c) = 0 for any constant c.
4. Cov(aX, Y ) = aCov(X, Y) for any constant a.
Covariance and Correlation
5. Cov(X + Y,Z) = Cov(X,Z) + Cov(Y,Z).
6. Cov(X + Y,Z +W) = Cov(X,Z) + Cov(X,W) +
Cov(Y,Z) + Cov(Y,W).
7. Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y). For n r.v.s X1, …, Xn:
Var(X1 + ··· + Xn) = Σ_i Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj).
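The identity cov(X, Y) = E[XY] − E[X]E[Y] and property 4 can be checked on a small hypothetical dataset, with empirical averages standing in for the expectations:

```python
mean = lambda v: sum(v) / len(v)

xs = [1.0, 2.0, 4.0, 7.0]          # hypothetical sample of X
ys = [2.0, 1.0, 5.0, 6.0]          # hypothetical sample of Y
mx, my = mean(xs), mean(ys)

# Definition vs. the expanded expression.
cov_def = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
cov_alt = mean([x * y for x, y in zip(xs, ys)]) - mx * my
assert abs(cov_def - cov_alt) < 1e-12

# Property 4: Cov(aX, Y) = a * Cov(X, Y).
a = 3.0
axs = [a * x for x in xs]
cov_scaled = mean([(x - mean(axs)) * (y - my) for x, y in zip(axs, ys)])
assert abs(cov_scaled - a * cov_def) < 1e-12
```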
Covariance and Correlation
❑ The correlation coefficient ρ(X, Y) of two random variables X and Y that have nonzero variances is defined as:
ρ(X, Y) = cov(X, Y) / √(var(X) var(Y)).
❑ It may be viewed as a normalised version of the covariance cov(X, Y).
❑ ρ ranges from −1 to 1.
Covariance and Correlation
❑ If ρ > 0 (or ρ < 0), then the values of X − E[X] and Y − E[Y] “tend” to have the same (or opposite, respectively) sign.
➢ The size of |ρ| provides a normalised measure of the extent to which this is true.
➢ Always assuming that X and Y have positive variances, it can be shown that ρ = 1 (or ρ = −1) if and only if there exists a positive (or negative, respectively) constant c such that:
Y − E[Y] = c(X − E[X]).
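The extreme case ρ = 1 can be verified empirically: if Y is a positive linear function of X, the computed correlation coefficient is exactly 1. The sample below is hypothetical.

```python
import math

mean = lambda v: sum(v) / len(v)

xs = [1.0, 2.0, 4.0, 7.0]             # hypothetical sample of X
ys = [3 * x + 2 for x in xs]          # Y = cX + b with c = 3 > 0

mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
var = lambda v, m: mean([(t - m) ** 2 for t in v])

rho = cov / math.sqrt(var(xs, mx) * var(ys, my))
assert abs(rho - 1.0) < 1e-12         # perfect positive linear dependence
```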