MATH2901 Revision Seminar
Part I: Probability Theory
Jeffrey Yang
UNSW Society of Statistics/UNSW Mathematics Society
August 2020
Jeffrey Yang (UNSW Society of Statistics/UNSW Mathematics Society) MATH2901 Revision Seminar August 2020 1 / 116
Note
The theoretical content was mostly sourced from Libo’s notes.
Most of the examples were taken from the slides Rui Tong (2018 StatSoc Team) made for that year's revision seminar.
All examples are presented at the end so that it isn't as obvious which techniques/methods should be used.
It is recommended that you refer to the official lecture notes when quoting definitions/results.
Table of Contents
1 Basic Probability Theory
2 Review of Random Variables
3 Computations for Common Distributions
4 Inequalities
5 Limit of Random Variables
6 Tips and Assorted Examples
Basic Probability Theory
1 Axioms and Basic Results
2 Conditional Probability and Independence
3 Basic Laws
4 Bayes Formula
Introduction to Probability Theory
Probability theory is about modelling and analysing random experiments. We can do this mathematically by specifying an appropriate probability space consisting of three components:
1 a sample space, Ω, which is the set of all possible outcomes.
2 an event space, F, which is the set of all events "of interest". Here, we can understand F as being a set of subsets of Ω which satisfies certain conditions.
3 a probability function, P : F → [0, 1], which assigns each event in the event space a probability.
Of course, for the model to be meaningful, P should satisfy certain axioms.
Axioms
Definition (Probability Space)
A probability space is the triple (Ω, F, P). Here, P satisfies the axioms
1 P(A) ≥ 0 for all A ∈ F
2 P(Ω) = 1
3 P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) for mutually exclusive events A1, A2, ... ∈ F
Exercise for the reader: derive the elementary results on the next slide from these axioms.
Elementary Results
From the axioms, we are able to derive the following fundamental results:
1 If A1, A2, ..., Ak are mutually exclusive, then
P(⋃_{i=1}^k Ai) = Σ_{i=1}^k P(Ai).
2 P(∅) = 0.
3 For any A ⊆ Ω, 0 ≤ P(A) ≤ 1 and P(Aᶜ) = 1 − P(A).
4 If B ⊂ A, then P(B) ≤ P(A). Hence, if "B occurs" implies "A occurs", then P(B) ≤ P(A).
These results can be used without proof.
Conditional Probability
Definition (Conditional Probability)
The conditional probability that an event A occurs, given that an event B has occurred, is
P(A|B) = P(A ∩ B) / P(B).
Here, we require P(B) ≠ 0.
Given that B has occurred, the total probability for the possible results of an experiment equals P(B). Visually, we see that the only outcomes in A that are now possible are those in A ∩ B.
Independence
Definition (Independent Events)
Events A and B are independent if P(A ∩ B) = P(A)P(B).
Recall that for any two events A and B, we have P(A ∩ B) = P(A|B)P(B). Thus, A and B are independent if and only if P(A|B) = P(A) (or equivalently, P(B|A) = P(B)). Intuitively, this means that knowing that event A has occurred does not give us any information on the probability of event B occurring (and vice versa).
Independence
Definition (Pairwise Independent Events)
For a countable sequence of events Ai, the events are pairwise independent if
P(Ai ∩ Aj) = P(Ai)P(Aj)
for all i ≠ j.
Definition ((Mutually) Independent Events)
For a countable sequence of events Ai, the events are (mutually) independent if for any finite collection Ai1, Ai2, ..., Ain, we have
P(Ai1 ∩ ... ∩ Ain) = P(Ai1) ... P(Ain).
Independence implies pairwise independence but not vice versa. Can you think of an example where pairwise independence does not imply independence?
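One standard example (the coin-toss setup here is our own illustration, not from the slides): toss two fair coins and let A = "first toss is heads", B = "second toss is heads", C = "the two tosses agree". The sketch below verifies pairwise independence and the failure of mutual independence by exact enumeration.

```python
from fractions import Fraction

# Two fair coin tosses; each of the four outcomes has probability 1/4.
outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

def prob(event):
    """Exact probability of an event, i.e. a set of outcomes."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}   # first toss is heads
B = {o for o in outcomes if o[1] == "H"}   # second toss is heads
C = {o for o in outcomes if o[0] == o[1]}  # the two tosses agree

# Pairwise independence: P(X ∩ Y) = P(X)P(Y) for every pair.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(X & Y) == prob(X) * prob(Y)

# Mutual independence fails: P(A ∩ B ∩ C) = 1/4 but P(A)P(B)P(C) = 1/8.
print(prob(A & B & C), prob(A) * prob(B) * prob(C))
```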
The Multiplicative Law
Definition (The Multiplicative Law)
For events A1, A2, we have
P(A1 ∩ A2) = P(A2|A1)P(A1).
For events A1, A2, A3, we have
P(A1 ∩ A2 ∩ A3) = P(A3 ∩ A2 ∩ A1)
= P(A3 | A2 ∩ A1) P(A2 ∩ A1)
= P(A3 | A1 ∩ A2) P(A2 | A1) P(A1).
We trust that the reader is able to generalise this to cases involving a greater number of events.
The Multiplicative Law is highly useful when dealing with a sequence of dependent trials.
The Additive Law
Definition (The Additive Law)
For events A and B, we have
P(A ∪ B) = P(A) + P(B)− P(A ∩ B).
You might have noticed that the Additive Law resembles the inclusion-exclusion principle. The following quote from Wikipedia sheds some light on this:
"As finite probabilities are computed as counts relative to the cardinality of the probability space, the formulas for the principle of inclusion-exclusion remain valid when the cardinalities of the sets are replaced by finite probabilities."
https://en.wikipedia.org/wiki/Inclusion-exclusion_principle
The Law of Total Probability
Definition (The Law of Total Probability)
Suppose A1, A2, ..., Ak are mutually exclusive (Ai ∩ Aj = ∅ for all i ≠ j) and exhaustive (⋃_{i=1}^k Ai = Ω, the sample space); that is, A1, ..., Ak forms a partition of Ω. Then, for any event B, we have
P(B) = Σ_{i=1}^k P(B|Ai)P(Ai).
The Law of Total Probability relates marginal probabilities to conditional probabilities and is often used in calculations involving Bayes' Formula.
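As a small numerical sketch (the factory shares and defect rates below are invented purely for illustration), the law reduces to a weighted sum over the partition:

```python
# Invented numbers: three factories partition production; B = "item is defective".
P_A = [0.5, 0.3, 0.2]             # P(A_i): share of items from factory i
P_B_given_A = [0.01, 0.02, 0.05]  # P(B | A_i): defect rate at factory i

# Law of total probability: P(B) = sum over i of P(B | A_i) P(A_i).
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
print(round(P_B, 6))  # ≈ 0.021
```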
Bayes’ Formula/Bayes’ Theorem/Bayes’ Law
Definition (Bayes’ Theorem)
For a partition A1,A2, ...,Ak and an event B,
P(Aj | B) = P(B|Aj)P(Aj) / Σ_{i=1}^k P(B|Ai)P(Ai) = P(B|Aj)P(Aj) / P(B).
In essence, Bayes' Theorem allows us to reverse the order of conditioning, provided that we know the marginal probabilities of event A and event B. When dealing with problems involving Bayes' Theorem, it is recommended that one draws a tree diagram.
Here, we shall adopt the frequentist interpretation of probability, where probability measures a proportion of outcomes, as opposed to the Bayesian interpretation, where probability measures a degree of belief.
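A worked sketch of reversing the conditioning (the screening-test numbers below are invented for illustration): take the partition A1 = "has the disease", A2 = "does not", and B = "tests positive".

```python
# Invented screening-test numbers, for illustration only.
p_disease = 0.01      # P(A1): prior probability of the disease
p_pos_given_d = 0.99  # P(B | A1): sensitivity
p_pos_given_h = 0.05  # P(B | A2): false-positive rate (A2 = no disease)

# Denominator: law of total probability over the partition {A1, A2}.
p_pos = p_pos_given_d * p_disease + p_pos_given_h * (1 - p_disease)

# Bayes' formula reverses the order of conditioning.
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))  # ≈ 0.167: most positives are false positives
```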
Table of Contents
1 Basic Probability Theory
2 Review of Random Variables
3 Computations for Common Distributions
4 Inequalities
5 Limit of Random Variables
6 Tips and Assorted Examples
Review of Random Variables
1 Discrete Random Variables
2 Continuous Random Variables
3 Cumulative Distribution Function
4 Expectation and Moments
Discrete Random Variables
Definition (Discrete Random Variable)
The random variable X is discrete if there are countably many values x for which P(X = x) > 0.
The probability structure of X is typically described by its probability (mass) function.
Definition (Probability (Mass) Function)
The probability function of the discrete random variable X is given by
fX(x) = P(X = x).
The following two properties are important and apply to all discrete random variables:
1 fX(x) ≥ 0 for all x ∈ R
2 Σ_{all x} fX(x) = 1
Continuous Random Variables
A continuous random variable has a continuum of possible values. Continuous random variables do not have a probability (mass) function, but have the analogous (probability) density function.
Definition ((Probability) Density Function)
The density function of a continuous random variable X is a real-valued function fX on R with the property
∫_A fX(x) dx = P(X ∈ A)
for any (measurable) set A ⊆ R.
The following two properties are important and apply to all continuous random variables:
1 fX(x) ≥ 0 for all x ∈ R.
2 ∫_{−∞}^∞ fX(x) dx = 1.
Cumulative Distribution Function (cdf)
Definition (Cumulative Distribution Function)
The cumulative distribution function (cdf) of a random variable X is defined by
FX(x) = P(X ≤ x).
Here, X can be either continuous or discrete.
In the case where X is a continuous random variable, we have the following important results:
1 FX(x) = ∫_{−∞}^x fX(t) dt.
2 fX(x) = F′X(x).
3 P(a ≤ X ≤ b) = ∫_a^b fX(x) dx = area under fX between a and b.
Thus, if we know one of FX or fX, we are able to derive the other. Moreover, once we know these, we are able to derive any probability/property of X.
Important Remarks on Continuous Random Variables
Suppose X is a continuous random variable. Then we have
P(X = a) = 0 for any a ∈ R.
Hence, it is only meaningful to talk about the probability of X lying in some subinterval(s) of R. Consequently, we have
P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b).
Thus, we typically do not have to worry about whether an interval contains its boundary points or not.
This is NOT the case for discrete random variables.
Expectation
Definition (Expected Value of a Discrete Random Variable)
The expected value or mean of a discrete random variable X is
EX = E[X] := Σ_{all x} x · P(X = x) = Σ_{all x} x · fX(x),
where fX is the probability function of X.
Definition (Expected Value of a Continuous Random Variable)
The expected value or mean of a continuous random variable X is
E[X] = ∫_{−∞}^∞ x · fX(x) dx,
where fX is the density function of X.
In either case, E[X] can be interpreted as the long run average of X.
Expectation of Transformed Random Variables
Often, we are interested in a transformation of a random variable. In particular, we often examine the rth moment of X about some constant a, E[(X − a)^r]. The following results provide a method of calculating the expectation of a transformation of a random variable.
Result (Transformation of a Discrete Random Variable)
Let X be a discrete random variable and let g be a function of X . Then
Eg(X) = E[g(X)] = Σ_{all x} g(x) · fX(x).
Result (Transformation of a Continuous Random Variable)
Let X be a continuous random variable and let g be a function of X. Then
Eg(X) = E[g(X)] = ∫_{−∞}^∞ g(x) fX(x) dx.
Properties of Expectation
Result (Linearity of Expectation)
Let X, Y be random variables and a, b be constants. Then
E[aX + bY] = aE[X] + bE[Y].
Result (Expected Value of a Constant)
For a constant c, E[c] = c.
In general, for dependent random variables X, Y,
E[XY] ≠ E[X]E[Y].
Also, if g is a transformation of X, then typically,
E[g(X)] ≠ g(E[X]).
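A quick exact check of E[g(X)] ≠ g(E[X]), using a fair six-sided die and g(x) = x² (our own illustration, not from the slides):

```python
from fractions import Fraction

# X = a fair six-sided die; expectations computed exactly by enumeration.
support = range(1, 7)
p = Fraction(1, 6)

E_X = sum(x * p for x in support)       # E[X] = 7/2
E_gX = sum(x**2 * p for x in support)   # E[g(X)] = E[X^2] = 91/6

# E[g(X)] = 91/6 but g(E[X]) = (7/2)^2 = 49/4: not equal.
print(E_gX, E_X**2)
```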
Variance and Standard Deviation
Definition (Variance)
Let µ = E[X ]. Then the variance of X , denoted Var[X ], is defined as
Var[X] = E[(X − µ)²].
Observe that this is the second moment of X about µ.
Definition (Standard Deviation)
The standard deviation of X is the square root of its variance:
σ = √Var[X].
Both variance and standard deviation are measures of the spread of a random variable. Standard deviations are in the same units as X and thus can be more readily interpreted. However, variances are easier to work with theoretically.
Properties of Variance
Result (Alternative Formula for Variance)
Let X be a random variable and let µ = E[X]. Then
Var[X] = E[X²] − µ².
The variance will often be calculated using this formula.
Result (Nonlinearity of Variance)
Let X be a random variable and let a be a constant. Then
Var[X + a] = Var[X]
Var[aX] = a² Var[X].
Pay attention to the nonlinearity of variance. A common mistake is treating variance as being linear.
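Both the shortcut formula and the nonlinearity rules can be verified exactly for a fair die (an illustration of ours, not from the slides):

```python
from fractions import Fraction

support = range(1, 7)  # fair six-sided die
p = Fraction(1, 6)
E = lambda f: sum(f(x) * p for x in support)

mu = E(lambda x: x)
var_def = E(lambda x: (x - mu) ** 2)     # definition: E[(X - mu)^2]
var_alt = E(lambda x: x ** 2) - mu ** 2  # shortcut: E[X^2] - mu^2
assert var_def == var_alt == Fraction(35, 12)

# Nonlinearity: Var[aX + b] = a^2 Var[X]; the shift b drops out.
a, b = 3, 10
var_lin = E(lambda x: (a * x + b - (a * mu + b)) ** 2)
print(var_lin, a**2 * var_def)
```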
Moment Generating Functions
Definition (Moment Generating Function)
The moment generating function (mgf) of a random variable X is
mX(u) = E[e^{uX}].
We say that the moment generating function of X exists if mX(u) is finite in some interval containing zero.
The following result regarding the rth moment of X, E[X^r], shows why the moment generating function is called as such.
Result (r th Moment of a random variable)
Let X be a random variable. Then in general, we have
E[X^r] = mX^(r)(0)
for r = 0, 1, 2, ....
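A numerical sanity check (our own illustration): the standard normal has the known closed form mgf m(u) = e^{u²/2}, and finite differences at 0 recover the first two moments, E[X] = 0 and E[X²] = 1.

```python
import math

# Known closed form: the mgf of N(0, 1) is m(u) = exp(u^2 / 2).
def m(u):
    return math.exp(u * u / 2)

h = 1e-4
m1 = (m(h) - m(-h)) / (2 * h)             # central difference ~ m'(0) = E[X]
m2 = (m(h) - 2 * m(0) + m(-h)) / (h * h)  # ~ m''(0) = E[X^2]
print(m1, round(m2, 4))  # E[X] = 0 and E[X^2] = 1 for N(0, 1)
```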
Properties of Moment Generating Functions
Result (Uniqueness)
Let X and Y be two random variables all of whose moments exist. If
mX(u) = mY(u)
for all u in a neighbourhood of 0 (i.e., for all |u| < ε for some ε > 0), then
FX(x) = FY(x) for all x ∈ R.
That is, the moment generating function of a random variable uniquely determines its distribution.
Properties of Moment Generating Functions
Result (Convergence)
Let {Xn : n = 1, 2, ...} be a sequence of random variables, each with moment generating function mXn(u). Furthermore, suppose that
lim_{n→∞} mXn(u) = mX(u) for all u in a neighbourhood of 0,
where mX(u) is the moment generating function of a random variable X. Then
lim_{n→∞} FXn(x) = FX(x) for all x ∈ R.
That is, convergence of moment generating functions implies convergence of cumulative distribution functions.
Location and Scale Families of Densities
Result (Location Family of Densities)
A location family of densities based on the random variable U is the family of densities fX(x) where X = U + c for all possible c. Here, fX(x) is given by
fX(x) = fU(x − c).
Result (Scale Family of Densities)
A scale family of densities based on the random variable U is the family of densities fX(x) where X = cU for all possible c > 0. fX(x) is given by
fX(x) = c⁻¹ fU(x/c).
In essence, the density function of a continuous random variable may belong to a family of density functions that all have a similar form. This relates to the concept of parameters in statistics.
Proving the above results is not difficult and is a good exercise.
Table of Contents
1 Basic Probability Theory
2 Review of Random Variables
3 Computations for Common Distributions
4 Inequalities
5 Limit of Random Variables
6 Tips and Assorted Examples
Computations for Common Distributions
1 Common distributions summary
2 Computing the MGF of random variables
3 Computing the moments directly
4 Computing the distribution of simple transformed random variables
5 Bivariate density computations
6 Computing the sum of random variables using the convolution formula, bivariate transformation and MGFs
7 Identifying distributions from the MGF
Bernoulli Distribution
Definition (Bernoulli Distribution)
For a Bernoulli trial, define the random variable
X =
1 if the trial results in success
0 otherwise
Then X is said to have a Bernoulli distribution.
Result (Probability Function of X)
If X is a Bernoulli random variable defined according to a Bernoulli trial with success probability 0 < p < 1, then the probability function of X is
fX(x) =
p if x = 1
1 − p if x = 0.
An equivalent way of writing this is fX(x) = p^x (1 − p)^{1−x}, x = 0, 1.
Binomial Distribution
Definition (Binomial Random Variable)
Consider a sequence of n independent Bernoulli trials, each with success probability p. If
X = total number of successes,
then X is a Binomial random variable with parameters n and p. A common shorthand is
X ∼ Bin(n, p).
Here, "∼" means "is distributed as" or "has distribution".
Results
If X ∼ Bin(n, p), then
1 fX(x) = (n choose x) p^x (1 − p)^{n−x}, x = 0, ..., n,
2 E[X] = np,
3 Var(X) = np(1 − p).
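These three results can be verified by direct enumeration of the pmf (the values of n and p below are arbitrary illustrative choices):

```python
from math import comb

n, p = 10, 0.3  # arbitrary illustrative parameters
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2

# Should match np = 3.0 and np(1-p) = 2.1 up to rounding error.
print(round(mean, 6), round(var, 6))
```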
Geometric Distribution
Definition (Geometric Distribution)
If
X = number of trials until first success,
then X is said to have a geometric distribution with parameter p (the probability of success on each trial).
Results
If X ∼ Geom(p), then
1 fX(x; p) = p(1 − p)^{x−1}, x = 1, 2, ...
2 E[X] = 1/p,
3 Var(X) = (1 − p)/p².
Poisson Distribution
Definition (Poisson Distribution)
The random variable X has a Poisson distribution with parameter λ > 0 if its probability function is
fX(x; λ) = P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, ...
A common abbreviation is X ∼ Poisson(λ).
The Poisson distribution is a model of the occurrence of point events in a continuum. The number of points occurring in a time interval of length t is a random variable with a Poisson(λt) distribution.
Results
If X ∼ Poisson(λ), then
1 E(X) = λ,
2 Var(X) = λ.
Exponential Distribution
The exponential distribution is useful for describing the probability structure of positive random variables. The exponential distribution is closely related to the Poisson distribution; the time until the next event has an exponential distribution with parameter β = 1/λ.
Definition
A random variable X is said to have an exponential distribution with parameter β > 0 if X has density function
fX(x; β) = (1/β) e^{−x/β}, x > 0.
Results
1 E(X) = β,
2 Var(X) = β²,
3 Memoryless property: P(X > s + t | X > s) = P(X > t).
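The memoryless property can be checked directly from the survival function P(X > x) = e^{−x/β} (the parameter value and test points below are arbitrary):

```python
import math

beta = 2.0  # arbitrary scale parameter

def surv(x):
    """P(X > x) for X ~ Exponential(beta)."""
    return math.exp(-x / beta)

s, t = 1.5, 0.7              # arbitrary test points
lhs = surv(s + t) / surv(s)  # P(X > s + t | X > s)
rhs = surv(t)                # P(X > t)
print(abs(lhs - rhs) < 1e-12)
```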
Uniform Distribution
Definition (Uniform Distribution)
A continuous random variable X that can take values in the interval (a, b) with equal likelihood is said to have a uniform distribution on (a, b). A common shorthand is
X ∼ Unif(a, b).
Results
If X ∼ Unif(a, b), then
1 fX(x; a, b) = 1/(b − a), a < x < b,
2 E(X) = (a + b)/2,
3 Var(X) = (b − a)²/12.
Gamma Function
The Gamma Function extends the factorial function to the real numbers.
Definition (Gamma Function)
The Gamma function at x > 0 is given by
Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.
Results
1 Γ(x) = (x − 1)Γ(x − 1),
2 Γ(n) = (n − 1)!, n = 1, 2, 3, ...
3 Γ(1/2) = √π,
4 ∫_0^∞ x^m e^{−x} dx = m! for m = 0, 1, 2, ....
Beta Function
Definition (Beta Function)
The Beta function at x, y > 0 is given by
B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt.
Result
For all x, y > 0,
B(x, y) = Γ(x)Γ(y) / Γ(x + y).
Phi Function
Definition (Φ)
For all x ∈ R,
Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt.
We cannot simplify the above expression for Φ(x) as there is no closed-form anti-derivative. Φ gives the cumulative distribution function of the standard normal distribution.
Results
1 lim_{x→−∞} Φ(x) = 0,
2 lim_{x→∞} Φ(x) = 1,
3 Φ(0) = 1/2,
4 Φ is monotonically increasing over R.
Normal Distribution
Definition (Normal Distribution)
The random variable X is said to have a normal distribution with parameters µ and σ² (where −∞ < µ < ∞ and σ² > 0) if X has density function
fX(x; µ, σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
A common shorthand is
X ∼ N(µ, σ²).
Results
If X ∼ N(µ, σ²),
1 E(X) = µ,
2 Var(X) = σ².
Computing Normal Distribution Probabilities
Result
If Z ∼ N(0, 1), then
P(Z ≤ x) = FZ(x) = Φ(x).
In other words, the Φ function is the cumulative distribution function of the N(0, 1) random variable.
Result
If X ∼ N(µ, σ²), then
Z = (X − µ)/σ ∼ N(0, 1).
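In code, Φ can be expressed via the error function, Φ(x) = (1 + erf(x/√2))/2, and standardisation turns any normal probability into a difference of Φ values (the µ, σ, a, b below are arbitrary illustrative values):

```python
import math

# Phi(x) in terms of the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
def Phi(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

mu, sigma = 100.0, 15.0  # arbitrary illustrative parameters
a, b = 85.0, 115.0

# Standardise: P(a < X < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
p = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)  # = Phi(1) - Phi(-1)
print(round(p, 4))  # ≈ 0.6827, the "68%" of the 68-95-99.7 rule
```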
Gamma Distribution
Definition (Gamma Distribution)
A random variable X is said to have a Gamma distribution with parameters α and β (where α, β > 0) if X has density function
fX(x; α, β) = x^{α−1} e^{−x/β} / (Γ(α) β^α), x > 0.
A common shorthand is
X ∼ Gamma(α, β).
Results
If X ∼ Gamma(α, β), then
1 E(X) = αβ,
2 Var(X) = αβ².
Moreover, Y has an Exponential distribution iff Y ∼ Gamma(1, β).
Beta Distribution
Definition (Beta Distribution)
A random variable X is said to have a Beta distribution with parameters α, β > 0 if its density function is
fX(x; α, β) = x^{α−1} (1 − x)^{β−1} / B(α, β), 0 < x < 1.
The Beta distribution generalises the Unif(0, 1) distribution, which can be thought of as a Beta distribution with α = β = 1.
Results
If X has a Beta distribution with parameters α and β, then
1 E(X) = α/(α + β),
2 Var(X) = αβ/((α + β + 1)(α + β)²).
Joint Probability Function and Density Function
Definition (Joint Probability Function)
If X and Y are discrete random variables, then the joint probability function of X and Y is
fX,Y(x, y) = P(X = x, Y = y),
the probability that X = x and Y = y.
Definition (Joint Density Function)
The joint density function of continuous random variables X and Y is a bivariate function fX,Y with the property
∫∫_A fX,Y(x, y) dx dy = P((X, Y) ∈ A)
for any (measurable) subset A of R².
Joint Cumulative Distribution Function
Definition (Joint Cumulative Distribution Function)
The joint cdf of X and Y is
FX,Y(x, y) = P(X ≤ x, Y ≤ y)
= Σ_{u≤x} Σ_{v≤y} P(X = u, Y = v) (X, Y discrete)
= ∫_{−∞}^y ∫_{−∞}^x fX,Y(u, v) du dv (X, Y continuous).
Result
1 If X and Y are discrete random variables, then Σ_{all x} Σ_{all y} fX,Y(x, y) = 1.
2 If X and Y are continuous random variables, then ∫_{−∞}^∞ ∫_{−∞}^∞ fX,Y(x, y) dx dy = 1.
Expectation of Joint Functions
Result (Expectation of Joint Functions)
If g is any function of (X, Y), then
E[g(X, Y)] = Σ_{all x} Σ_{all y} g(x, y) P(X = x, Y = y) (discrete)
= ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) fX,Y(x, y) dx dy (continuous).
Marginal Probability Function
Result (Marginal Probability Function)
If X and Y are discrete, then fX(x) and fY(y) can be calculated from fX,Y(x, y) as follows:
fX(x) = Σ_{all y} fX,Y(x, y)
fY(y) = Σ_{all x} fX,Y(x, y).
fX(x) is sometimes referred to as the marginal probability function of X.
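A short sketch with a made-up 2×2 joint probability function, marginalising by summing over the other variable:

```python
from fractions import Fraction

# A made-up joint probability function f_{X,Y}(x, y) on {0,1} x {0,1}.
f = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(3, 8),
}
assert sum(f.values()) == 1  # a valid joint probability function

# Marginals: sum the joint pf over the other variable.
f_X = {x: sum(q for (u, v), q in f.items() if u == x) for x in (0, 1)}
f_Y = {y: sum(q for (u, v), q in f.items() if v == y) for y in (0, 1)}
print(f_X, f_Y)
```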
Marginal Density Function
Result (Marginal Density Function)
If X and Y are continuous, then fX(x) and fY(y) can be calculated from fX,Y(x, y) as follows:
fX(x) = ∫_{−∞}^∞ fX,Y(x, y) dy
fY(y) = ∫_{−∞}^∞ fX,Y(x, y) dx.
fX(x) is sometimes referred to as the marginal density function of X.
Conditional Probability Function
Definition (Conditional Probability Function)
If X and Y are discrete, the conditional probability function of X given Y = y is
fX|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = fX,Y(x, y) / fY(y).
Similarly,
fY|X(y|x) = P(Y = y | X = x) = fX,Y(x, y) / fX(x).
Result
P(Y ∈ A | X = x) = Σ_{y∈A} fY|X(y|x).
Conditional Density Function
Definition (Conditional Density Function)
If X and Y are continuous, the conditional density function of X given Y = y is
fX|Y(x | Y = y) = fX,Y(x, y) / fY(y).
Similarly,
fY|X(y | X = x) = fX,Y(x, y) / fX(x).
Shorthand: fY|X(y|x).
Result
P(a ≤ Y ≤ b | X = x) = ∫_a^b fY|X(y|x) dy.
Conditional Expected Value and Variance
Result (Conditional Expected Value)
The conditional expected value of X given Y = y is
E(X | Y = y) = Σ_{all x} x P(X = x | Y = y) if X is discrete, or ∫_{−∞}^∞ x fX|Y(x|y) dx if X is continuous.
Result (Conditional Variance)
The conditional variance of X given Y = y is
Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]²
where
E(X² | Y = y) = Σ_{all x} x² P(X = x | Y = y) in the discrete case, or ∫_{−∞}^∞ x² fX|Y(x|y) dx in the continuous case.
Independent Random Variables
Definition (Independent)
Random variables X and Y are independent if and only if, for all x, y,
fX,Y(x, y) = fX(x) fY(y).
Result
Random variables X and Y are independent if and only if, for all x, y,
fY|X(y|x) = fY(y).
Result
If X and Y are independent, then
FX,Y(x, y) = FX(x) · FY(y).
Covariance
Definition (Covariance)
The covariance of X and Y is
Cov(X, Y) = E[(X − µX)(Y − µY)].
Cov(X, Y) measures how much X and Y vary about their means and also how much they vary together linearly.
Results
1 Cov(X, X) = Var(X)
2 Cov(X, Y) = E(XY) − µX µY
3 If X and Y are independent, then Cov(X, Y) = 0.
Correlation
Definition (Correlation)
The correlation between X and Y is
Corr(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y)).
Corr(X, Y) measures the strength of the linear association between X and Y. Independent random variables are uncorrelated, but uncorrelated variables are not necessarily independent (consider the case where E(X) = 0 and Y = X²).
Results
1 |Corr(X, Y)| ≤ 1
2 |Corr(X, Y)| = 1 if and only if P(Y = a + bX) = 1 for some constants a, b.
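The parenthetical example can be checked exactly: take X uniform on {−1, 0, 1} and Y = X²; then Cov(X, Y) = 0 even though Y is a function of X (our own concrete instance of the hint):

```python
from fractions import Fraction

# X uniform on {-1, 0, 1} and Y = X^2: uncorrelated yet clearly dependent.
support = [-1, 0, 1]
p = Fraction(1, 3)
E = lambda f: sum(f(x) * p for x in support)

# Cov(X, Y) = E[XY] - E[X]E[Y] with Y = X^2, so E[XY] = E[X^3].
cov = E(lambda x: x**3) - E(lambda x: x) * E(lambda x: x**2)
print(cov)  # 0, even though Y is a deterministic function of X
```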
Transformations
Result
For discrete X and Y = h(X),
fY(y) = P(Y = y) = P[h(X) = y] = Σ_{x: h(x)=y} P(X = x).
Result
For continuous X, if h is monotonic over the set {x : fX(x) > 0}, then
fY(y) = fX(x) |dx/dy| = fX(h⁻¹(y)) |dx/dy|
for y such that fX(h⁻¹(y)) > 0.
Linear Transformation
Result
For a continuous random variable X, if Y = aX + b is a linear transformation of X with a ≠ 0, then
fY(y) = (1/|a|) fX((y − b)/a)
for all y such that fX((y − b)/a) > 0.
Leading into Bivariate Transformations...
If X and Y have joint density fX,Y(x, y) and U is a function of X and Y, we can find the density of U by calculating FU(u) = P(U ≤ u) and differentiating.
Bivariate Transformations
Result
If U and V are functions of continuous random variables X and Y, then
fU,V(u, v) = fX,Y(x, y) · |J|
where
J = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ]
is a determinant called the Jacobian of the transformation.
The full specification of fU,V(u, v) requires that the range of (u, v) values corresponding to those (x, y) for which fX,Y(x, y) > 0 is determined.
Bivariate Transformations
To find fU(u) by bivariate transformation:
1 Define some bivariate transformation (U, V).
2 Find fU,V(u, v).
3 We want the marginal distribution of U, so now find ∫_{−∞}^∞ fU,V(u, v) dv.
Using a bivariate transformation to find the distribution of U is often more convenient than deriving it via the cumulative distribution function. Using the cdf requires double integration, which we can avoid when we use a bivariate transformation.
Sum of Independent Random Variables - Probability Function/Density Function Approach
Result (Discrete Convolution Formula)
Suppose that X and Y are independent random variables taking only non-negative integer values, and let Z = X + Y. Then
fZ(z) = Σ_{y=0}^z fX(z − y) fY(y), z = 0, 1, ...
Result (Continuous Convolution Formula)
Suppose X and Y are independent continuous random variables with X ∼ fX(x) and Y ∼ fY(y). Then Z = X + Y has density
fZ(z) = ∫_{all possible y} fX(z − y) fY(y) dy.
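The discrete convolution formula can be sanity-checked numerically: for independent X ∼ Poisson(λ1) and Y ∼ Poisson(λ2), it agrees with the known fact that X + Y ∼ Poisson(λ1 + λ2). The parameter values and evaluation point below are arbitrary.

```python
import math

def pois(lam, x):
    """Poisson(lam) probability function at x."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam1, lam2 = 1.5, 2.5  # arbitrary illustrative rates
z = 4                  # arbitrary evaluation point

# Discrete convolution: f_Z(z) = sum_{y=0}^{z} f_X(z - y) f_Y(y).
conv = sum(pois(lam1, z - y) * pois(lam2, y) for y in range(z + 1))
direct = pois(lam1 + lam2, z)  # Poisson(lam1 + lam2) pf at z
print(abs(conv - direct) < 1e-12)
```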
Sum of Gammas
Result
If X1, X2, ..., Xn are independent with Xi ∼ Gamma(αi, β), then
Σ_{i=1}^n Xi ∼ Gamma(Σ_{i=1}^n αi, β).
Sum of Independent Random Variables - Moment Generating Function Approach
Result
Suppose that X and Y are independent random variables with moment generating functions mX and mY. Then
mX+Y(u) = mX(u) mY(u).
This generalises to n independent random variables.
Result
If X ∼ N(µX, σ²X) and Y ∼ N(µY, σ²Y) are independent, then
X + Y ∼ N(µX + µY, σ²X + σ²Y).
This also generalises.
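A numerical check of the mgf product rule for normals, using the known closed form m(u) = exp(µu + σ²u²/2) (all parameter values below are arbitrary):

```python
import math

def mgf_normal(mu, var, u):
    """mgf of N(mu, var): exp(mu*u + var*u^2/2)."""
    return math.exp(mu * u + var * u * u / 2)

mu_x, var_x = 1.0, 2.0   # arbitrary parameters for X
mu_y, var_y = -0.5, 3.0  # arbitrary parameters for Y
u = 0.3                  # arbitrary evaluation point

# Independence gives m_{X+Y}(u) = m_X(u) m_Y(u), which here equals the
# mgf of N(mu_x + mu_y, var_x + var_y).
lhs = mgf_normal(mu_x, var_x, u) * mgf_normal(mu_y, var_y, u)
rhs = mgf_normal(mu_x + mu_y, var_x + var_y, u)
print(abs(lhs - rhs) < 1e-12)
```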
Table of Contents
1 Basic Probability Theory
2 Review of Random Variables
3 Computations for Common Distributions
4 Inequalities
5 Limit of Random Variables
6 Tips and Assorted Examples
Inequalities
1 Jensen’s Inequality
2 Markov’s inequality
3 Chebyshev’s inequality
Jensen’s Inequality
Result (Jensen’s Inequality)
If h is a convex function (i.e., concave up) and X is a random variable, then
E[h(X)] ≥ h(E[X]).
There are several formulations of Jensen's inequality, and the above formulation is the one most relevant for probability theory. Students studying MATH2701 next term will be delighted to re-encounter Jensen's inequality, albeit in a different formulation.
Markov’s Inequality
Result (Markov’s Inequality)
Let X be a nonnegative random variable and a > 0. Then
P(X ≥ a) ≤ E[X ]
a.
That is, the probability that X is at least a is at most the expectation of Xdivided by a.
Jeffrey Yang (UNSW Society of Statistics/UNSW Mathematics Society)MATH2901 Revision Seminar August 2020 67 / 116
Chebyshev's Inequality

Result (Chebyshev's Inequality)

If X is any random variable with E[X] = µ and Var(X) = σ^2, then

P(|X − µ| > kσ) ≤ \frac{1}{k^2}.

That is, the probability that X is more than k standard deviations from its mean is at most \frac{1}{k^2}.

The significance of Chebyshev's Inequality is that we can make specific probabilistic statements about a random variable given only its mean and standard deviation – observe that we have made no assumptions on the distribution of X.

Interestingly, Chebyshev's Inequality can be easily derived as a corollary of Markov's Inequality (hint: consider the random variable (X − E[X])^2).
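A quick empirical sketch of the bound (the shifted-Exponential choice below is arbitrary, picked because it is skewed and decidedly non-Normal): the observed tail frequency should never exceed 1/k².

```python
import random

random.seed(0)
mu, sigma, k = 0.0, 1.0, 2.0
n = 100_000
# Shifted Exponential(1): mean 0 and standard deviation 1, but heavily skewed
samples = [random.expovariate(1.0) - 1.0 for _ in range(n)]
tail_freq = sum(abs(x - mu) > k * sigma for x in samples) / n
# Chebyshev guarantees the tail probability is at most 1/k^2 = 0.25;
# for this particular distribution the true value is e^{-3} ~ 0.0498
assert tail_freq <= 1 / k ** 2
```

Note how loose the bound is here (0.05 versus 0.25) — the price of assuming nothing about the distribution.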
Limit of Random Variables
1 Almost Sure Convergence
2 Convergence in Probability
3 Convergence in Distribution
4 Central Limit Theorem
5 Law of Large Numbers
6 The Statement and Application of the Delta Method
Almost Sure Convergence
Definition (Almost Sure Convergence)

The sequence of numerical random variables X_1, X_2, \ldots is said to converge almost surely to a numerical random variable X, denoted X_n \xrightarrow{a.s.} X, if

P\left( \omega : \lim_{n→\infty} X_n(\omega) = X(\omega) \right) = 1.

Within the context of probability theory, an event is said to happen almost surely if it happens with probability 1. Note that it is possible for the set of exceptions to be non-empty as long as it has probability 0.
Almost Sure Convergence
The previous definition of almost sure convergence can be difficult to work with, so we will make use of an alternative definition.

Result (Alternative Definition of Almost Sure Convergence)

X_n \xrightarrow{a.s.} X if and only if, for every ε > 0,

\lim_{n→\infty} P\left( \sup_{k ≥ n} |X_k − X| > ε \right) = 0.

Almost sure convergence is the mode of convergence used in the Strong Law of Large Numbers.
Convergence in Probability
The main idea behind Convergence in Probability is that the probability of an "unusual" event becomes smaller as the sequence progresses.

Definition (Convergence in Probability)

The sequence of random variables X_1, X_2, \ldots converges in probability to a random variable X if, for all ε > 0,

\lim_{n→\infty} P(|X_n − X| > ε) = 0.

This is usually written as X_n \xrightarrow{P} X.

For context, an estimator is called consistent if it converges in probability to the quantity being estimated. Furthermore, convergence in probability is the type of convergence established by the Weak Law of Large Numbers.
Relationship between Almost Sure Convergence and Convergence in Probability

Result (Almost Sure Convergence implies Convergence in Probability)

X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X

and

X_n \xrightarrow{a.s.} 0 \iff \sup_{k ≥ n} |X_k| \xrightarrow{P} 0.

To see why almost sure convergence is stronger than convergence in probability: almost sure convergence depends on the joint distribution of the whole sequence, whereas convergence in probability depends only on the marginal distributions.

A helpful picture: almost sure convergence means no noodle leaves the strip (for large enough n), while convergence in probability means the proportion of noodles leaving the strip goes to 0 (as n → ∞).
Convergence in Distribution
Convergence in Distribution is concerned with whether the distributions of the X_i converge to the distribution of some random variable X. In other words, we increasingly expect the next outcome in a sequence of random experiments to be well modelled by a given probability distribution.

Definition (Convergence in Distribution)

Let X_1, X_2, \ldots be a sequence of random variables. We say that X_n converges in distribution to X if

\lim_{n→\infty} F_{X_n}(x) = F_X(x)

for all x where F_X is continuous. A common shorthand is X_n \xrightarrow{d} X.

We say that F_X is the limiting distribution of X_n.

Convergence in distribution often arises in applications of the Central Limit Theorem.
More on Convergence in Distribution
Convergence in distribution allows us to make approximate probability statements about X_n, for large n, if we can derive the limiting distribution F_X.

Result (Establishing Convergence in Distribution using Moment Generating Functions)

Let \{X_n\}_{n ∈ \mathbb{N}} be a sequence of random variables, each with mgf M_{X_n}(t). Furthermore, suppose that

\lim_{n→\infty} M_{X_n}(t) = M_X(t).

If M_X(t) is a moment generating function, then there is a unique F_X (which gives a random variable X) whose moments are determined by M_X(t), and for all points of continuity of F_X we have

\lim_{n→\infty} F_{X_n}(x) = F_X(x).
Relationship between Convergence in Probability and Convergence in Distribution

Result (Convergence in Probability implies Convergence in Distribution)

X_n \xrightarrow{P} X \implies X_n \xrightarrow{d} X.

Convergence in probability is concerned with the convergence of the actual values (the x_i's), whereas convergence in distribution is concerned with the convergence of the distributions (the F_{X_i}(x)'s).
Weak Law of Large Numbers
Result (Weak Law of Large Numbers)

Suppose X_1, X_2, \ldots are independent, each with mean µ and variance 0 < σ^2 < \infty. If

\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{then} \quad \bar{X}_n \xrightarrow{P} µ.

The Weak Law of Large Numbers describes how the sample average converges to the distributional average as the sample size increases.
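A small simulation sketch of the law (Uniform(0, 1) draws are an arbitrary illustrative choice, so µ = 0.5): the sample mean settles near µ as n grows.

```python
import random

random.seed(1)
mu = 0.5  # mean of Uniform(0, 1)
errors = []
for n in [100, 10_000, 1_000_000]:
    xbar = sum(random.random() for _ in range(n)) / n
    errors.append(abs(xbar - mu))
# The deviation |Xbar_n - mu| should be tiny for the largest n
assert errors[-1] < 0.005
```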
Slutsky’s Theorem
Result (Slutsky's Theorem)

Let X_1, X_2, \ldots be a sequence of random variables such that X_n \xrightarrow{d} X for some random variable X, and let Y_1, Y_2, \ldots be another sequence of random variables such that Y_n \xrightarrow{P} c for some constant c. Then

1 X_n + Y_n \xrightarrow{d} X + c

2 X_n Y_n \xrightarrow{d} cX.

Slutsky's Theorem extends some properties of algebraic operations on convergent sequences of real numbers to sequences of random variables, and is useful for establishing convergence in distribution results.
Strong Law of Large Numbers
The Weak Law corresponds to convergence in probability, while the Strong Law corresponds to almost sure convergence.

Result (Strong Law of Large Numbers)

Let X_1, X_2, \ldots be independent with common mean E[X_i] = µ and variance Var(X_i) = σ^2 < \infty. Then

\bar{X}_n \xrightarrow{a.s.} µ.
Central Limit Theorem
Result (Central Limit Theorem)

Suppose X_1, X_2, \ldots are independent and identically distributed random variables with common mean µ = E(X_i) and common variance σ^2 = Var(X_i) < \infty. For each n ≥ 1, let \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i. Then

\frac{\bar{X}_n − µ}{σ/\sqrt{n}} \xrightarrow{d} Z

where Z ∼ N(0, 1). It is common to write

\frac{\bar{X}_n − µ}{σ/\sqrt{n}} \xrightarrow{d} N(0, 1).

Note that E(\bar{X}_n) = µ and Var(\bar{X}_n) = \frac{σ^2}{n}, so the Central Limit Theorem states that the limiting distribution of a standardised average of i.i.d. random variables is the standard Normal distribution.
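The statement can be checked by simulation. In the sketch below (Uniform(0, 1) summands are an illustrative choice), the empirical frequency of the standardised mean falling below 1.96 should be close to the Normal value Φ(1.96) ≈ 0.975.

```python
import random
from math import sqrt

random.seed(2)
n, reps = 200, 20_000
mu, sigma = 0.5, sqrt(1 / 12)  # mean and sd of Uniform(0, 1)
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / sqrt(n)))  # standardised average
# Empirical P(Z_n <= 1.96) should be close to Phi(1.96) ~ 0.975
frac = sum(z <= 1.96 for z in zs) / reps
assert abs(frac - 0.975) < 0.01
```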
Alternative Forms of the Central Limit Theorem
Sometimes, probabilities involving related quantities such as the sum \sum_{i=1}^{n} X_i are required. Since \sum_{i=1}^{n} X_i = n\bar{X}_n, the Central Limit Theorem also applies to the sum of a sequence of random variables.

Results

1 \sqrt{n}(\bar{X}_n − µ) \xrightarrow{d} N(0, σ^2)

2 \frac{\sum_i X_i − nµ}{σ\sqrt{n}} \xrightarrow{d} N(0, 1)

3 \frac{\sum_i X_i − nµ}{\sqrt{n}} \xrightarrow{d} N(0, σ^2)
Applications of the Central Limit Theorem
Central Limit Theorem for the Binomial Distribution

Suppose X ∼ Bin(n, p). Then

\frac{X − np}{\sqrt{np(1 − p)}} \xrightarrow{d} N(0, 1).

Normal Approximation to the Poisson Distribution

Suppose X ∼ Poisson(λ). Then

\lim_{λ→\infty} P\left( \frac{X − λ}{\sqrt{λ}} ≤ x \right) = P(Z ≤ x)

where Z ∼ N(0, 1).

Continuity correction: add \frac{1}{2} to the numerator.
The Delta Method
The Delta Method

Let Y_1, Y_2, \ldots be a sequence of random variables such that

\frac{\sqrt{n}(Y_n − θ)}{σ} \xrightarrow{d} N(0, 1).

Suppose the function g is differentiable in a neighbourhood of θ and g′(θ) ≠ 0. Then

\sqrt{n}\left( g(Y_n) − g(θ) \right) \xrightarrow{d} N(0, σ^2 [g′(θ)]^2).

Alternatively,

\frac{g(Y_n) − g(θ)}{σ g′(θ)/\sqrt{n}} \xrightarrow{d} N(0, 1).

There are several ways of stating the Delta Method.
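A numeric sketch of the Delta Method (illustrative choices, not from the slides: Uniform(0, 1) data and g(x) = x², so θ = 0.5 and g′(θ) = 1): the variance of g(Ȳ_n) should be close to σ²[g′(θ)]²/n for large n.

```python
import random

random.seed(3)
n, reps = 400, 10_000
theta, sigma2 = 0.5, 1 / 12      # mean and variance of Uniform(0, 1)
g = lambda x: x * x              # smooth transform; g'(theta) = 2 * theta = 1
vals = []
for _ in range(reps):
    ybar = sum(random.random() for _ in range(n)) / n
    vals.append(g(ybar))
m = sum(vals) / reps
var = sum((v - m) ** 2 for v in vals) / reps
# Delta-method prediction: Var(g(Ybar_n)) ~ sigma^2 [g'(theta)]^2 / n
predicted = sigma2 * (2 * theta) ** 2 / n
assert abs(var - predicted) / predicted < 0.1
```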
General Tips
1 When finding a pdf/cdf, always specify the domain over which it is defined.

2 Make sure that the relevant conditions are satisfied before applying a result/theorem (e.g., are the random variables i.i.d.? Does a sequence of random variables have the right mode of convergence?).

3 Does the pdf in question belong to a known family? If so, you can typically make use of known results.

4 \xrightarrow{a.s.} \implies \xrightarrow{P} \implies \xrightarrow{d}

5 Know how to do calculations involving bivariate distributions.

6 Know how to apply (bivariate) transformations.

7 Know how to sum independent random variables using both the convolution formula approach and the moment generating function approach.

8 Know the Laws of Large Numbers, the CLT and the Delta Method.
Example 1
Example (MATH1251)

99% of the people with the disease receive a positive test. 98% of those without receive a negative test. If 2% of the population have the disease, determine the probability of someone having the disease given they received a positive test.
Example 1
We require P(D | T) = \frac{P(T | D)\, P(D)}{P(T)}.

P(T) = P(T | D)\, P(D) + P(T | D^c)\, P(D^c)
     = P(T | D)\, P(D) + \left( 1 − P(T^c | D^c) \right) P(D^c)
     = 0.99 × 0.02 + (1 − 0.98) × 0.98 = 0.0394

∴ P(D | T) = \frac{0.99 × 0.02}{0.0394} ≈ 0.5025
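The same calculation in a few lines of Python (the numbers come straight from the example):

```python
# Sensitivity P(T|D), specificity P(T^c|D^c), prevalence P(D) from the example
sens, spec, prev = 0.99, 0.98, 0.02

p_T = sens * prev + (1 - spec) * (1 - prev)  # law of total probability
p_D_given_T = sens * prev / p_T              # Bayes' formula

assert abs(p_T - 0.0394) < 1e-9
assert round(p_D_given_T, 4) == 0.5025
```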
Example 1
A lot of people get stuck with Bayes' law, especially when used with other results. Use a tree diagram!
Example 2

Example

Given the distribution of X below, compute its expectation and standard deviation.

x          0     3     9     27
P(X = x)   0.3   0.1   0.5   0.1

E[X] = \sum_{\text{all } x} x\, P(X = x) = 0 × 0.3 + 3 × 0.1 + 9 × 0.5 + 27 × 0.1 = 7.5

E[X^2] = 0^2 × 0.3 + 3^2 × 0.1 + 9^2 × 0.5 + 27^2 × 0.1 = 114.3

σ_X = \sqrt{E[X^2] − (E[X])^2} = \sqrt{114.3 − 7.5^2} = \sqrt{58.05} ≈ 7.619
Example 2

Example (2901 oriented)

Let X ∼ Geom(p). Prove that E[X] = \frac{1}{p}.

Recall: P(X = x) = p(1 − p)^{x−1} for x = 1, 2, \ldots

E[X] = \sum_{x=1}^{\infty} x\, p(1 − p)^{x−1}
     = \sum_{y=0}^{\infty} (y + 1)\, p(1 − p)^{y} \quad (y = x − 1)
     = (1 − p) \sum_{y=0}^{\infty} (y + 1)\, p(1 − p)^{y−1}
     = (1 − p) \sum_{y=0}^{\infty} y\, p(1 − p)^{y−1} + (1 − p) \sum_{y=0}^{\infty} p(1 − p)^{y−1}
     = (1 − p) \sum_{y=1}^{\infty} y\, p(1 − p)^{y−1} + (1 − p) \left( \sum_{y=1}^{\infty} p(1 − p)^{y−1} + p(1 − p)^{−1} \right) \quad \text{(splitting off the } y = 0 \text{ term; the first sum's } y = 0 \text{ term is } 0\text{)}
     = (1 − p)\, E[X] + (1 − p) \left( 1 + p(1 − p)^{−1} \right)
Example 3

Example (2901 oriented)

Let X ∼ Geom(p). Prove that E[X] = \frac{1}{p}.

Since (1 − p)\left( 1 + p(1 − p)^{−1} \right) = (1 − p) + p = 1, the previous display gives E[X] = (1 − p)E[X] + 1.

∴ pE[X] = E[X] − (1 − p)E[X] = 1, \quad \text{so} \quad E[X] = \frac{1}{p}.
Example 3
In general, this can be done with the aid of Taylor series or the binomial theorem. But preferably just do this:

Method (Deriving the Expected Value from the definition) (2901)

Keep rearranging the expression until you make the entire density, or E[X], appear again.

Discrete case - use a change of summation index at some point.

Continuous case - use integration by parts (or occasionally integration by substitution).
Example 4
Example
Let f_X(x) = \frac{2}{θ^2} x for 0 < x < θ. Compute the MGF and (2901) assert its existence.
Example 4

Integrate by parts, then slowly tidy everything up:

m_X(u) = E[e^{uX}] = \frac{2}{θ^2} \int_0^θ x e^{ux}\, dx
       = \frac{2}{θ^2} \left( \left. \frac{x e^{ux}}{u} \right|_0^θ − \int_0^θ \frac{e^{ux}}{u}\, dx \right)
       = \frac{2θ e^{uθ}}{uθ^2} − \frac{2}{θ^2} \left( \left. \frac{e^{ux}}{u^2} \right|_0^θ \right)
       = \frac{2(uθ e^{uθ} − e^{uθ} + 1)}{u^2 θ^2}
Example 4

Idea: Can check that the limit as u → 0 is finite. The finiteness of the limit implies the required result.

\lim_{u→0} \frac{2(uθ e^{uθ} − e^{uθ} + 1)}{u^2 θ^2}
  \overset{\text{L'H}}{=} \lim_{u→0} \frac{2\left( θ e^{uθ} + uθ^2 e^{uθ} − θ e^{uθ} \right)}{2uθ^2}
  = \lim_{u→0} e^{uθ}
  = 1
Example 5

Example

Use the MGF of X ∼ Bin(n, p) to prove that E[X] = np.

E[X] = \lim_{u→0} \frac{d}{du} (1 − p + pe^u)^n
     = \lim_{u→0} n(1 − p + pe^u)^{n−1} · pe^u
     = n(1 − p + p)^{n−1} · p
     = np
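A quick numerical cross-check of this MGF differentiation (n = 10 and p = 0.3 are arbitrary choices): approximate m′(0) by central differences and compare with np.

```python
from math import exp

n, p = 10, 0.3
m = lambda u: (1 - p + p * exp(u)) ** n   # mgf of Bin(n, p)

h = 1e-6
deriv = (m(h) - m(-h)) / (2 * h)          # central-difference approximation of m'(0)
assert abs(deriv - n * p) < 1e-4          # m'(0) = E[X] = np = 3
```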
Example 6

Example

A busy switchboard receives 150 calls an hour on average. Assume that calls are independent of each other and can be modelled with a Poisson distribution. Find the probability of

1 exactly 3 calls in a given minute;

2 at least 10 calls in a given 5 minute period.

Naive (and wrong): X ∼ Poisson(150) — the rate must first be rescaled to the relevant time interval.
Example 6

In Q1, take X ∼ Poisson(150/60) = Poisson(2.5) calls per minute. Then

P(X = 3) = e^{−2.5} \frac{2.5^3}{3!} ≈ 0.2138
Example 6

In Q2, take Y ∼ Poisson(2.5 × 5) = Poisson(12.5) calls per 5 minutes. Then

P(Y ≥ 10) = 1 − P(Y ≤ 9)
          = 1 − e^{−12.5} \left( \frac{12.5^0}{0!} + \cdots + \frac{12.5^9}{9!} \right)
          = 1 - ppois(9, lambda=12.5, lower=TRUE)   (in R)
          ≈ 0.7985689
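The two answers can also be reproduced without R; a small Python sketch that sums the Poisson pmf directly:

```python
from math import exp, factorial

def pois_pmf(lam, k):
    # Poisson(lam) probability mass function at k
    return exp(-lam) * lam ** k / factorial(k)

def pois_cdf(lam, k):
    # P(X <= k) for X ~ Poisson(lam), by direct summation
    return sum(pois_pmf(lam, j) for j in range(k + 1))

p1 = pois_pmf(2.5, 3)          # exactly 3 calls in a minute, rate 2.5/min
p2 = 1 - pois_cdf(12.5, 9)     # at least 10 calls in 5 minutes, rate 12.5

assert abs(p1 - 0.2138) < 5e-4
assert abs(p2 - 0.7985689) < 1e-6
```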
Example 7

Example (2901 course pack)

If, on average, 5 servers go offline during the day, what is the chance that no servers will go offline in the next hour?

The number of servers going offline in a day is X ∼ Poisson(5).

So the time taken for the next server to go offline is T ∼ Exp(0.2) (mean 0.2 days, equivalently rate 5 per day), measured in days. We require P(T > \frac{1}{24}).

P\left( T > \frac{1}{24} \right) = \int_{1/24}^{\infty} 5e^{−5t}\, dt = e^{−5/24}
Example 8
Formula (Transforming a Discrete r.v.)

P(h(X) = y) = \sum_{x : h(x) = y} P(X = x)

Um, ye wat?
Example 8

Example

A random variable has the following distribution:

x          −1     0      1      2
P(X = x)   0.38   0.21   0.14   0.27

Determine the distribution of Y = X^3 and Z = X^2.
Example 8

If X can take the values −1, 0, 1, 2, then Y = X^3 takes the values −1, 0, 1, 8.

P(Y = −1) = P(X^3 = −1) = P(X = −1) = 0.38

Similarly, P(Y = 0) = 0.21, P(Y = 1) = 0.14, P(Y = 8) = 0.27.
Example 8

On the other hand, Z = X^2 can only take the values 0, 1, 4.

P(Z = 0) = P(X^2 = 0) = P(X = 0) = 0.21

P(Z = 1) = P(X^2 = 1) = P(X = ±1) = 0.38 + 0.14 = 0.62

...and P(Z = 4) is still equal to 0.27.
Example 8

Just to think about... (2901 oriented)

If X ∼ Poisson(λ), what must be the distribution of Y = X^2?

P(Y = y) = e^{−λ} \frac{λ^{\sqrt{y}}}{(\sqrt{y})!} \text{ if } y = 0, 1, 4, 9, \ldots, \text{ and } 0 \text{ otherwise.}
Example 9

Method 1 (Continuous random variable transform theorem)

Consider the transform y = h(x). If h is monotonic wherever f_X(x) is non-zero, then the density of Y = h(X) is

f_Y(y) = f_X\left( h^{−1}(y) \right) \left| \frac{dx}{dy} \right|

Example

Let X ∼ Exp(λ). What is the density of Y = X^2?
Example 9

f_X(x) = \frac{1}{λ} e^{−x/λ} for all x > 0.

h(x) = x^2 is invertible for all x > 0, with h^{−1}(y) = \sqrt{y}.

x = \sqrt{y}, so \frac{dx}{dy} = \frac{1}{2\sqrt{y}}.

∴ f_Y(y) = f_X(\sqrt{y}) \left| \frac{1}{2\sqrt{y}} \right|
         = \frac{1}{λ} e^{−\sqrt{y}/λ} \left| \frac{1}{2\sqrt{y}} \right|
         = \frac{1}{2λ\sqrt{y}} e^{−\sqrt{y}/λ}
Example 10
Example
Let X ∼ Exp(λ). What is the density of Y = X^2?

f_Y(y) = \frac{1}{2λ\sqrt{y}} e^{−\sqrt{y}/λ}

Since x > 0 and y = x^2, y > 0 as well.
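As a sanity check (a sketch; λ = 2 is an arbitrary choice), the derived density should integrate to 1 over y > 0. Substituting y = t² removes the 1/√y singularity and recovers the Exp(λ) density:

```python
from math import exp, sqrt

lam = 2.0
f_Y = lambda y: exp(-sqrt(y) / lam) / (2 * lam * sqrt(y))

# Midpoint rule on [0, T] after the substitution y = t^2 (dy = 2t dt),
# under which the integrand becomes the Exp(lam) density (1/lam) e^{-t/lam}
N, T = 100_000, 60.0
dt = T / N
total = 0.0
for i in range(N):
    t = (i + 0.5) * dt
    total += f_Y(t * t) * 2 * t * dt

assert abs(total - 1.0) < 1e-3
```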
Example 10

Example

Let X ∼ Unif(−10, 10). What is the density of Y = X^2?

Here h(x) = x^2 is not monotonic on (−10, 10), so we sum the contributions from the two branches x = ±\sqrt{y}:

f_Y(y) = 2 × \frac{1}{20} × \frac{1}{2\sqrt{y}} = \frac{1}{20\sqrt{y}}

Since −10 < x < 10 and y = x^2, we must have 0 < y < 100.
Example 11

Example

The joint probability distribution of X and Y is

         y = 0   y = 1   y = 2
x = 0    1/16    1/8     1/8
x = 1    1/8     1/16    0
x = 2    3/16    1/4     1/16

Determine P(X = 0, Y = 1), P(X ≥ 1, Y < 1) and P(X − Y = 1).

P(X = 0, Y = 1) = \frac{1}{8}
Example 11

P(X ≥ 1, Y < 1) = P(X = 1, Y = 0) + P(X = 2, Y = 0) = \frac{1}{8} + \frac{3}{16} = \frac{5}{16}
Example 11

P(X − Y = 1) = P(X = 2, Y = 1) + P(X = 1, Y = 0) = \frac{1}{4} + \frac{1}{8} = \frac{3}{8}
Example 12

Joint continuous distributions

Unless you know how to use indicator functions really well (2901), sketch the region!

Example

f_{X,Y}(x, y) = \frac{1}{x^2 y^2}, \quad x ≥ 1, y ≥ 1

is the joint density of the continuous r.v.s X and Y. Find P(X < 2, Y ≥ 4) and P(X ≤ Y^2).
Example 12

P(X < 2, Y ≥ 4) = \int_1^2 \int_4^{\infty} \frac{1}{x^2 y^2}\, dy\, dx = \int_1^2 \frac{1}{4x^2}\, dx = \frac{1}{8}
Example 12

P(X ≤ Y^2) = \int_1^{\infty} \int_1^{y^2} \frac{1}{x^2 y^2}\, dx\, dy = \int_1^{\infty} \left( \frac{1}{y^2} − \frac{1}{y^4} \right) dy = \frac{2}{3}
Example 13

Example

Find E[Y^2 \ln X] for the following distribution

         y = 1   y = 2
x = 1    1/10    1/5
x = 2    3/10    2/5

E[Y^2 \ln X] = 1^2 \ln 1 · P(X = 1, Y = 1) + 2^2 \ln 1 · P(X = 1, Y = 2)
             + 1^2 \ln 2 · P(X = 2, Y = 1) + 2^2 \ln 2 · P(X = 2, Y = 2)
             = \left( \frac{3}{10} + 4 × \frac{2}{5} \right) \ln 2 = \frac{19 \ln 2}{10}
Example 14

Problem

Examine the existence of E[XY] for the earlier example:

f_{X,Y}(x, y) = \frac{1}{x^2 y^2} \quad \text{for } x, y ≥ 1.
Example 15

Definition (Cumulative Distribution Function)

The CDF F_{X,Y}(x, y) is the function given by

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

Finding a CDF (Continuous case)

F_{X,Y}(x, y) = \int_{−\infty}^{x} \int_{−\infty}^{y} f_{X,Y}(u, v)\, dv\, du

Example

For the earlier example, F_{X,Y}(x, y) = 0 if x < 1 or y < 1. Else:

F_{X,Y}(x, y) = \int_1^x \int_1^y \frac{1}{u^2 v^2}\, dv\, du = \left( 1 − \frac{1}{x} \right)\left( 1 − \frac{1}{y} \right)
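A quick numerical sketch (the test points x = 3, y = 5 are arbitrary choices) confirming the closed form: the integrand is separable, so the double integral factors into two one-dimensional midpoint sums.

```python
def int_inv_sq(a, b, n=2000):
    # Midpoint-rule approximation of the integral of 1/u^2 over [a, b]
    h = (b - a) / n
    return sum(h / (a + (i + 0.5) * h) ** 2 for i in range(n))

x, y = 3.0, 5.0
F_num = int_inv_sq(1.0, x) * int_inv_sq(1.0, y)   # separable integrand factors
F_closed = (1 - 1 / x) * (1 - 1 / y)
assert abs(F_num - F_closed) < 1e-5
```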
Example 16

Recall that for independent events, P(A ∩ B) = P(A)P(B).

Definition (Independence of random variables)

Two random variables are independent when:

P(X = x, Y = y) = P(X = x)P(Y = y) (discrete case)

f_{X,Y}(x, y) = f_X(x)\, f_Y(y) (continuous case)

Example

Test if X and Y are independent, for

f_{X,Y}(x, y) = \frac{1}{x^2 y^2}, \quad x, y ≥ 1.
Example 16

f_X(x) = \int_1^{\infty} \frac{1}{x^2 y^2}\, dy = \frac{1}{x^2}, \quad x ≥ 1

Similarly, f_Y(y) = \frac{1}{y^2} for y ≥ 1.

Therefore, since f_{X,Y}(x, y) = f_X(x)\, f_Y(y), X and Y are independent.
Example 17

Example

Determine P(X = x | Y = 2), i.e. f_{X|Y}(x | 2), for

         y = 1   y = 2
x = 1    1/10    1/5
x = 2    3/10    2/5

P(Y = 2) = P(X = 1, Y = 2) + P(X = 2, Y = 2) = \frac{1}{5} + \frac{2}{5} = \frac{3}{5}.
Example 17

P(X = 1 | Y = 2) = \frac{P(X = 1, Y = 2)}{P(Y = 2)} = \frac{1/5}{3/5} = \frac{1}{3}

P(X = 2 | Y = 2) = \frac{P(X = 2, Y = 2)}{P(Y = 2)} = \frac{2/5}{3/5} = \frac{2}{3}
Example 18

Lemma (Independence of random variables)

Two random variables are independent if and only if

f_{Y|X}(y | x) = f_Y(y) \quad \text{or equivalently} \quad f_{X|Y}(x | y) = f_X(x).

Investigation

For the earlier example with f_{X,Y}(x, y) = x^{−2} y^{−2} for x ≥ 1, y ≥ 1, prove the independence of X and Y using this lemma instead.
Example 19

Definition (Conditional Expectation)

E[X | Y = y] = \sum_{\text{all } x} x\, P(X = x | Y = y) \quad \text{(discrete case)}

E[X | Y = y] = \int_{−\infty}^{\infty} x\, f_{X|Y}(x | y)\, dx \quad \text{(continuous case)}

Definition (Conditional Variance)

Var(X | Y = y) = E[X^2 | Y = y] − \left( E[X | Y = y] \right)^2

(And similarly for Y. Basically, just add the condition to the original formula.)
Example 19

Example

Find E[X | Y = 2] and Var(X | Y = 2) for

         y = 1   y = 2
x = 1    1/10    1/5
x = 2    3/10    2/5

E[X | Y = 2] = 1 · P(X = 1 | Y = 2) + 2 · P(X = 2 | Y = 2) = 1 × \frac{1}{3} + 2 × \frac{2}{3} = \frac{5}{3}.
Example 19
Example
Find E[X | Y = 2] and Var(X | Y = 2) for
y
1 2
x 1 1/10 1/52 3/10 2/5
E[X 2 | Y = 2] = 12 · P(X = 1 | Y = 2) + 22 · P(X = 2 | Y = 2)
= 12 × 1
3+ 22 × 2
3= 3.
Jeffrey Yang (UNSW Society of Statistics/UNSW Mathematics Society)MATH2901 Revision Seminar August 2020 105 / 116
Example 19
Example
Find E[X | Y = 2] and Var(X | Y = 2) for
y
1 2
x 1 1/10 1/52 3/10 2/5
Var(X 2 | Y = 2) = 3−(
5
3
)2
=2
9
Jeffrey Yang (UNSW Society of Statistics/UNSW Mathematics Society)MATH2901 Revision Seminar August 2020 105 / 116
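These conditional moments can be verified exactly with rational arithmetic (an illustrative Python sketch, starting from the conditional pmf derived above):

```python
from fractions import Fraction as F

cond = {1: F(1, 3), 2: F(2, 3)}  # f_{X|Y}(x | 2) from the example

mean = sum(x * p for x, p in cond.items())        # E[X | Y = 2]
second = sum(x * x * p for x, p in cond.items())  # E[X^2 | Y = 2]
var = second - mean**2                            # Var(X | Y = 2)

assert mean == F(5, 3)
assert second == 3
assert var == F(2, 9)
```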
Example 20

Example
Let $f_{X,Y}(x,y) = xy$ for $x \in [0,1]$, $y \in [0,2]$. Determine their covariance the old-fashioned way.

Step 1: Determine the marginal densities.
\[ f_X(x) = \int_0^2 xy \, dy = 2x \quad (0 \le x \le 1), \qquad f_Y(y) = \int_0^1 xy \, dx = \frac{y}{2} \quad (0 \le y \le 2). \]

Step 2: Find the marginal expectations $E[X]$ and $E[Y]$.
\[ E[X] = \int_0^1 2x^2 \, dx = \frac{2}{3}, \qquad E[Y] = \int_0^2 \frac{y^2}{2} \, dy = \frac{4}{3}. \]

Step 3: Find $E[XY]$.
\[ E[XY] = \int_0^1 \int_0^2 xy \, f_{X,Y}(x,y) \, dy \, dx = \int_0^1 \int_0^2 x^2 y^2 \, dy \, dx = \frac{8}{9}. \]

Step 4: Plug in:
\[ \mathrm{Cov}(X, Y) = E[XY] - E[X] E[Y] = \frac{8}{9} - \frac{2}{3} \times \frac{4}{3} = 0. \]

That was a horrible idea.

- Can prove that $X$ and $Y$ are independent (the joint density factorises over a rectangular support), so the covariance is $0$ immediately.
- Can use the Fubini–Tonelli theorem to just check that $E[XY]$ equals $E[X]E[Y]$.
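The three integrals can be approximated numerically to confirm the covariance really is zero (an illustrative Python sketch using a crude midpoint rule; grid sizes are arbitrary):

```python
def f(x, y):
    return x * y  # joint density on [0, 1] x [0, 2]

def integrate2d(g, nx=400, ny=400):
    # midpoint rule over the support [0, 1] x [0, 2]
    hx, hy = 1.0 / nx, 2.0 / ny
    return hx * hy * sum(
        g((i + 0.5) * hx, (j + 0.5) * hy)
        for i in range(nx) for j in range(ny)
    )

e_xy = integrate2d(lambda x, y: x * y * f(x, y))  # E[XY]
e_x = integrate2d(lambda x, y: x * f(x, y))       # E[X]
e_y = integrate2d(lambda x, y: y * f(x, y))       # E[Y]

assert abs(e_xy - 8 / 9) < 1e-4
assert abs(e_xy - e_x * e_y) < 1e-5  # Cov(X, Y) = 0
```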
Example 21

Example (2901)
Let $Z \sim \mathcal{N}(0, 1)$ and let $W$ satisfy $P(W = 1) = P(W = -1) = \frac{1}{2}$. Suppose that $W$ and $Z$ are independent and define $X := WZ$.

Show that $\mathrm{Cov}(X, Z) = 0$.

Noting that $E[Z] = 0$,
\[ \mathrm{Cov}(X, Z) = E[XZ] - E[X] E[Z] = E[XZ]. \]

Subbing in $X = WZ$ and using independence gives
\[ \mathrm{Cov}(X, Z) = E[W Z^2] = E[W] E[Z^2]. \]

Observe that
\[ E[W] = 1 \cdot P(W = 1) - 1 \cdot P(W = -1) = 0. \]

Hence $\mathrm{Cov}(X, Z) = E[W] E[Z^2] = 0$.
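A Monte Carlo simulation makes this plausible empirically (an illustrative Python sketch; the sample size, seed, and tolerance are arbitrary choices):

```python
import random

random.seed(0)
n = 200_000
zs = [random.gauss(0.0, 1.0) for _ in range(n)]     # Z ~ N(0, 1)
ws = [random.choice((-1.0, 1.0)) for _ in range(n)] # W = +-1, each w.p. 1/2
xs = [w * z for w, z in zip(ws, zs)]                # X = WZ

def mean(v):
    return sum(v) / len(v)

# sample covariance of X and Z
cov = mean([x * z for x, z in zip(xs, zs)]) - mean(xs) * mean(zs)
assert abs(cov) < 0.02  # sampling error is of order 1/sqrt(n)
```

(Despite the zero covariance, $X$ and $Z$ are not independent, e.g. $|X| = |Z|$, which is the usual moral of this example.)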
Example 22

Theorem (Bivariate Transform Formula)
Suppose $X$ and $Y$ have joint density function $f_{X,Y}$ and let $U$ and $V$ be transformations of these random variables. Then the joint density of $U, V$ is
\[ f_{U,V}(u, v) = f_{X,Y}(x, y) \, |\det(J)| \]
where $J$ is the Jacobian matrix
\[ J = \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[1ex] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}. \]
Remember: $x$ above $y$, and $u$ left of $v$.

Example (Course pack)
Let $X$ and $Y$ be i.i.d. Exp(4) r.v.s. Find the joint density of $U$ and $V$ if $U = \frac{1}{2}(X - Y)$ and $V = Y$.

We have $y = v$ and
\[ u = \frac{1}{2}(x - v) \implies x = 2u + v. \]

\[ \therefore J = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad \det(J) = 2. \]

The joint density is
\[ f_{X,Y}(x, y) = \frac{1}{16} e^{-(x+y)/4}. \]
Since $y = v$ and $x = 2u + v$, we get $x + y = 2u + 2v$. Therefore
\[ f_{U,V}(u, v) = \frac{1}{8} e^{-(u+v)/2}. \]

For the support: we know that $y > 0$. Since $v = y$, it immediately follows that $v > 0$. However, $x > 0$ and $x = 2u + v$. Therefore
\[ 2u + v > 0, \quad \text{i.e.} \quad u > -\frac{v}{2}. \]
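Simulation gives a cheap check of the support and symmetry of $(U, V)$ (an illustrative Python sketch; here Exp(4) is taken as mean 4, the parameterization implied by the density $\frac{1}{16}e^{-(x+y)/4}$):

```python
import random

random.seed(1)
n = 100_000
# exponential draws with mean 4 (rate 1/4)
xs = [random.expovariate(0.25) for _ in range(n)]
ys = [random.expovariate(0.25) for _ in range(n)]

us = [(x - y) / 2 for x, y in zip(xs, ys)]  # U = (X - Y)/2
vs = ys                                     # V = Y

# every draw lies in the support u > -v/2 (equivalently 2u + v = x > 0)
assert all(2 * u + v > 0 for u, v in zip(us, vs))
# E[U] = (E[X] - E[Y]) / 2 = 0 by symmetry
assert abs(sum(us) / n) < 0.1
```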
Example 23

Example
Let $X$ and $Y$ be i.i.d. Geom($p$). Use convolutions to find the probability function of $Z := X + Y$.

The probability functions are $P(X = x) = p(1-p)^{x-1}$ for $x = 1, 2, 3, \ldots$, and $P(Y = y) = p(1-p)^{y-1}$ for $y = 1, 2, 3, \ldots$. Therefore
\[ P(X = z - y) = p(1-p)^{z-y-1} \]
for $z - y = 1, 2, 3, \ldots$, i.e.
\[ y = z - 1, \, z - 2, \, z - 3, \ldots \]

Hence
\[ P(X = z - y) P(Y = y) = p(1-p)^{z-y-1} \cdot p(1-p)^{y-1} = p^2 (1-p)^{z-2}, \]
when $y = 1, 2, 3, \ldots$ and $y = z - 1, z - 2, z - 3, \ldots$, i.e. $y = 1, 2, \ldots, z - 1$.

\[ \therefore P(Z = z) = \sum_{y=1}^{z-1} p^2 (1-p)^{z-2} = (z - 1) \, p^2 (1-p)^{z-2} \quad \text{(the summand does not depend on } y\text{!)} \]

Since $x = 1, 2, \ldots$ and $y = 1, 2, \ldots$, i.e. $x$ and $y$ are natural numbers greater than or equal to 1, $z = x + y = 2, 3, 4, \ldots$
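The closed form can be checked against a direct convolution sum (an illustrative Python sketch; $p = 0.3$ and the range of $z$ are arbitrary):

```python
p = 0.3
q = 1 - p

def geom_pmf(k):
    # P(X = k) = p (1 - p)^(k - 1) on the support k = 1, 2, 3, ...
    return p * q ** (k - 1) if k >= 1 else 0.0

for z in range(2, 30):
    # convolution: P(Z = z) = sum over y of P(X = z - y) P(Y = y)
    conv = sum(geom_pmf(z - y) * geom_pmf(y) for y in range(1, z))
    closed = (z - 1) * p**2 * q ** (z - 2)
    assert abs(conv - closed) < 1e-12
```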
Example 24

Example
Let $X$ and $Y$ be i.i.d. Exp(1). Prove that $Z := X + Y$ follows a Gamma(2, 1) distribution using a convolution.

The densities are $f_X(x) = e^{-x}$ for $x > 0$, and $f_Y(y) = e^{-y}$ for $y > 0$. Therefore
\[ f_X(z - y) = e^{-(z-y)}, \quad \text{for } z - y > 0, \text{ i.e. } y < z. \]

Hence $f_X(z - y) f_Y(y) = e^{-z}$ when $y < z$ and $y > 0$, i.e.
\[ f_X(z - y) f_Y(y) = e^{-z} \quad \text{for } 0 < y < z. \]

\[ \therefore f_Z(z) = \int_0^z e^{-z} \, dy = z e^{-z} = \frac{e^{-z/1} \, z^{2-1}}{\Gamma(2) \, 1^2}. \]

Since $x > 0$ and $y > 0$, $z = x + y > 0$. Thus $Z$ has the density of a Gamma(2, 1) random variable.
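A numerical version of the convolution integral confirms the answer (an illustrative Python sketch; the step count and test points are arbitrary):

```python
import math

def f_exp(t):
    return math.exp(-t) if t > 0 else 0.0  # Exp(1) density

def f_z_numeric(z, steps=20_000):
    # midpoint rule for the convolution integral over y in (0, z)
    h = z / steps
    return h * sum(
        f_exp(z - (k + 0.5) * h) * f_exp((k + 0.5) * h) for k in range(steps)
    )

for z in (0.5, 1.0, 3.0):
    # compare with the Gamma(2, 1) density z e^{-z}
    assert abs(f_z_numeric(z) - z * math.exp(-z)) < 1e-9
```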
Example 25

Theorem (MGF of a sum)
If $X$ and $Y$ are independent random variables, then
\[ m_{X+Y}(u) = m_X(u) \, m_Y(u). \]

Example
Let $X$ and $Y$ be i.i.d. Exp(1). Prove that $Z := X + Y$ follows a Gamma(2, 1) distribution from quoting MGFs.

$m_X(u) = \frac{1}{1-u}$ and $m_Y(u) = \frac{1}{1-u}$ (for $u < 1$). So clearly
\[ m_Z(u) = m_X(u) \, m_Y(u) = \left( \frac{1}{1-u} \right)^2, \]
which is the MGF of a Gamma(2, 1) distribution. Since an MGF that exists in a neighbourhood of $0$ uniquely determines the distribution, $Z$ follows this distribution as well.
Example 26

Example
Let $U_1, \ldots, U_n$ be i.i.d. Unif(0, 1) random variables. Define $Y_n = n \min\{U_1, \ldots, U_n\}$. Prove that $Y_n \xrightarrow{d} Y$, where $Y \sim$ Exp(1).

For $0 \le y \le n$,
\[
\begin{aligned}
F_{Y_n}(y) &= P(Y_n \le y) = P(n \min\{U_1, \ldots, U_n\} \le y) \\
&= P\left( \min\{U_1, \ldots, U_n\} \le \frac{y}{n} \right) \\
&= 1 - P\left( \min\{U_1, \ldots, U_n\} > \frac{y}{n} \right) \\
&= 1 - P\left( U_1 > \frac{y}{n}, \ldots, U_n > \frac{y}{n} \right)
\end{aligned}
\]

In general, if $\min\{u_1, \ldots, u_n\} \le u$, it is not true that every $u_i \le u$. But it is true that if $\min\{u_1, \ldots, u_n\} > u$, then every $u_i > u$, which is why we pass to the complement.

\[
\begin{aligned}
F_{Y_n}(y) &= 1 - P\left( U_1 > \frac{y}{n} \right) \cdots P\left( U_n > \frac{y}{n} \right) \quad \text{(independence)} \\
&= 1 - \left[ P\left( U_1 > \frac{y}{n} \right) \right]^n \quad \text{(identically distributed)} \\
&= 1 - \left[ \int_{y/n}^1 1 \, dt \right]^n = 1 - \left( 1 - \frac{y}{n} \right)^n
\end{aligned}
\]

\[ \therefore \lim_{n \to \infty} F_{Y_n}(y) = 1 - e^{-y} = F_Y(y). \]

Hence $Y_n \xrightarrow{d} Y$.
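The convergence is easy to see by simulation: for moderately large $n$ the empirical CDF of $Y_n$ is already close to $1 - e^{-y}$ (an illustrative Python sketch; $n$, the replication count, seed, and tolerance are arbitrary):

```python
import math
import random

random.seed(2)
n, reps = 200, 10_000
# draws of Y_n = n * min(U_1, ..., U_n)
ys = [n * min(random.random() for _ in range(n)) for _ in range(reps)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(1 for v in ys if v <= y) / reps
    # compare with the Exp(1) CDF, 1 - e^{-y}
    assert abs(empirical - (1 - math.exp(-y))) < 0.03
```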
Example 27

Example (Libo's notes)
Australians have average weight about 68 kg and variance about 16 kg². Suppose 40 random Australians are chosen. What is the (approximate) probability that the average weight of these Australians is over 80?

Let $X_1, \ldots, X_{40}$ be the weights of the Australians. Then $n = 40$, $\mu = 68$ and $\sigma = 4$, so by the CLT:
\[ \frac{\bar{X} - 68}{4/\sqrt{40}} \xrightarrow{d} Z, \]
where $Z \sim \mathcal{N}(0, 1)$.

\[ \therefore P(\bar{X}_{40} > 80) = P\left( \frac{\bar{X}_{40} - 68}{4/\sqrt{40}} > \frac{80 - 68}{4/\sqrt{40}} \right) \approx P\left( Z > \frac{80 - 68}{4/\sqrt{40}} \right) = P(Z > 3\sqrt{40}). \]

In R: 1-pnorm(3*sqrt(40)) or pnorm(3*sqrt(40), lower.tail=FALSE).
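The same tail probability can be evaluated without R (an illustrative Python sketch using the identity $1 - \Phi(z) = \frac{1}{2}\operatorname{erfc}(z/\sqrt{2})$):

```python
import math

z = 3 * math.sqrt(40)  # about 18.97 standard deviations
# standard normal upper-tail probability via the complementary error function
p = 0.5 * math.erfc(z / math.sqrt(2))

assert 0 < p < 1e-75  # the probability is astronomically small
```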
Example 28

Lemma (Normal Approximation to the Binomial)
Let $X \sim \mathrm{Bin}(n, p)$, which is a sum of $n$ independent Ber($p$) r.v.s. Then
\[ \frac{X - np}{\sqrt{np(1-p)}} \xrightarrow{d} \mathcal{N}(0, 1). \]

Example
An unfortunate soul decided to sit his exam despite having a migraine and the flu. Fortunately, it was not a university exam, and the paper involved only 200 multiple choice questions with 5 options each. Therefore, he randomly guesses every answer. What is the (approximate) probability he fails?

Let $X$ be how many he gets correct. Then $X \sim \mathrm{Bin}\left(200, \frac{1}{5}\right)$.

Since $np = 40$ and $np(1-p) = 32$, we may approximate $X$ with $Y \sim \mathcal{N}(40, 32)$. Then
\[ P(X < 100) \approx P(Y < 100) = P\left( \frac{Y - 40}{\sqrt{32}} < \frac{100 - 40}{\sqrt{32}} \right) = P\left( Z < \frac{60}{\sqrt{32}} \right) = P(Z < 10.6066\ldots) \]

Oh my...
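Comparing the exact binomial probability with the normal approximation shows both are indistinguishable from 1 here (an illustrative Python sketch; "fails" is read as scoring under 100 of 200, as in the working above):

```python
import math

n, p = 200, 0.2
# exact P(X < 100) for X ~ Bin(200, 1/5)
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(100))
# normal approximation P(Z < 60 / sqrt(32)), using Phi(z) = erfc(-z / sqrt 2) / 2
approx = 0.5 * math.erfc(-(60 / math.sqrt(32)) / math.sqrt(2))

assert exact > 0.999999 and approx > 0.999999  # he fails, essentially surely
```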
Example 29

Example (Libo's notes)
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean 2 and variance 7. Obtain a large sample approximation for the distribution of $(\bar{X}_n)^3$.

The CLT gives
\[ \sqrt{n}(\bar{X}_n - 2) \xrightarrow{d} \mathcal{N}(0, 7). \]

Applying the Delta Method with $g(x) = x^3$ leads to $g'(x) = 3x^2$ and then
\[ \sqrt{n}\left[ (\bar{X}_n)^3 - 2^3 \right] \xrightarrow{d} \mathcal{N}\left(0, \, 7 \cdot (3 \cdot 2^2)^2\right). \]

Simplifying, we have
\[ \sqrt{n}\left[ (\bar{X}_n)^3 - 8 \right] \xrightarrow{d} \mathcal{N}(0, 1008). \]

Thus, for large $n$, the approximate distribution of $(\bar{X}_n)^3$ is $\mathcal{N}\left(8, \frac{1008}{n}\right)$.
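The Delta Method approximation can be checked by simulation (an illustrative Python sketch; the underlying distribution is taken to be normal with mean 2 and variance 7 purely for convenience — any distribution with those moments would do, and the sample sizes and tolerances are arbitrary):

```python
import random

random.seed(3)
n, reps = 400, 5_000
sigma = 7 ** 0.5

cubes = []
for _ in range(reps):
    # one sample mean of n i.i.d. draws with mean 2, variance 7, then cubed
    xbar = sum(random.gauss(2.0, sigma) for _ in range(n)) / n
    cubes.append(xbar**3)

m = sum(cubes) / reps                       # should be near 8
v = sum((c - m) ** 2 for c in cubes) / reps # should be near 1008 / n = 2.52

assert abs(m - 8) < 0.3
assert abs(v - 1008 / n) < 0.5
```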
Example 30

Example (More 2801 focused)
The Riemann zeta function is defined for complex $s$ with real part greater than 1 by the absolutely convergent infinite series
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
Prove that the real part of every non-trivial zero of the Riemann zeta function is $\frac{1}{2}$.