TRANSCRIPT
STAT:5100 (22S:193) Statistical Inference I, Week 4
Luke Tierney
University of Iowa
Fall 2015
Monday, September 14, 2015
Recap
• Reliability block diagrams
• Conditional probability.
• Law of total probability
• Bayes theorem
• Conditioning examples
• Gambler’s ruin problem
Monday, September 14, 2015 Gambler’s Ruin
Gambler’s Ruin
Example
• The gambler’s ruin problem is another famous problem in probability.
• It is related to sequential stopping rules in clinical trials.
• Players A and B have a total fortune of M dollars between them.
• Player A starts with n dollars and B with M − n.
• Each turn a coin is tossed independently:
  • The probability of heads is p.
  • If a head occurs then B pays one dollar to A.
  • If a tail occurs then A pays one dollar to B.
• The game ends when one player has all M dollars.
• What is the probability that A ends up with all the fortune, that is, that B is ruined?
• Some simulations:
http://www.stat.uiowa.edu/~luke/classes/193/ruin.R.
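A minimal R simulation in the same spirit (this is my own sketch, not the contents of ruin.R; the function name sim_ruin and the parameter choices are made up for illustration):

sim_ruin <- function(n, M, p) {
  fortune <- n                                       # A's current fortune
  while (fortune > 0 && fortune < M) {
    fortune <- fortune + ifelse(runif(1) < p, 1, -1) # heads: B pays A; tails: A pays B
  }
  fortune == M                                       # TRUE if A ends up with all M dollars
}
mean(replicate(10000, sim_ruin(5, 10, 0.5)))         # estimates P(B is ruined); about 0.5 here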
Monday, September 14, 2015 Gambler’s Ruin
Example (gambler’s ruin, continued)
• Let p_n be the probability that B is ruined when A starts with n dollars.
• Then
p_0 = 0
p_M = 1.
• For 0 < n < M,
p_n = P(B is ruined | first toss is H) p + P(B is ruined | first toss is T)(1 - p)
• Now
P(B is ruined | first toss is H) = p_{n+1}
P(B is ruined | first toss is T) = p_{n-1}
• So p_n satisfies the second order difference equation
p_n = p_{n+1} p + p_{n-1} (1 - p).
Monday, September 14, 2015 Gambler’s Ruin
Example (gambler’s ruin, continued)
• Subtracting p_n p from both sides and rearranging yields
p_{n+1} - p_n = \frac{1-p}{p} (p_n - p_{n-1}).
• Since p_0 = 0 this implies that
p_{n+1} - p_n = \left(\frac{1-p}{p}\right)^{n} p_1.
• Therefore for 0 < n ≤ M
p_n = p_1 \sum_{k=1}^{n} \left(\frac{1-p}{p}\right)^{k-1}.
Monday, September 14, 2015 Gambler’s Ruin
Example (gambler’s ruin, continued)
• The sum is
\sum_{k=1}^{n} \left(\frac{1-p}{p}\right)^{k-1} =
\begin{cases} \dfrac{1 - \left(\frac{1-p}{p}\right)^{n}}{1 - \frac{1-p}{p}} & \text{if } p \neq \frac{1}{2} \\ n & \text{if } p = \frac{1}{2}. \end{cases}
• Using the fact that p_M = 1 this gives
p_n =
\begin{cases} \dfrac{1 - \left(\frac{1-p}{p}\right)^{n}}{1 - \left(\frac{1-p}{p}\right)^{M}} & \text{if } p \neq \frac{1}{2} \\ \dfrac{n}{M} & \text{if } p = \frac{1}{2}. \end{cases}
• This holds for 0 ≤ n ≤ M.
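The closed form is easy to evaluate numerically; a small sketch (the function name ruin_prob is mine) that can be checked against a simulation like the one above:

ruin_prob <- function(n, M, p) {
  if (p == 0.5) return(n / M)      # the p = 1/2 case
  r <- (1 - p) / p
  (1 - r^n) / (1 - r^M)            # the p != 1/2 case
}
ruin_prob(5, 10, 0.5)   # 0.5
ruin_prob(5, 10, 0.6)   # about 0.88: even a modestly favorable coin helps A a lot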
Monday, September 14, 2015 Continuous Sample Spaces
Continuous Sample Spaces
• Some experiments require sample spaces that are continuous ranges:
  • Measuring the melting point of a substance.
  • Measuring the angle at which a particle leaves a point source.
  • Recording the time it takes to service a customer at a bank.
• Some experiments may require several continuous ranges:
  • Recording height and weight of a randomly selected person.
  • Recording the service times for each of the first 10 customers.
• There can be a continuous range of continuous ranges:
  • Recording the trajectory of a migrating bird.
• Continuous ranges are uncountable and require some different approaches.
Monday, September 14, 2015 Continuous Sample Spaces
Example
• We would like to model the direction in which a particle leaves a point source.
• This has a number of applications:
  • Photon emission in optics.
  • Gamma particles in nuclear medicine and medical imaging.
• We would like to capture the idea that the particle is equally likely to leave in any direction.
• The angle will be in the range [0, 2π).
• Equally likely to leave in any direction would mean
  • The probability of the angle falling in the ranges [0, π) and [π, 2π) should be 1/2 each.
  • The probability of the angle falling in the range [π, 3π/2) should be 1/4.
  • The probability of the angle falling in an interval [a, b) should be proportional to the length b − a.
  • This means the probability would be (b − a)/(2π).
Monday, September 14, 2015 Continuous Sample Spaces
Example (continued)
• What is the probability the particle leaves in a particular direction a; what is P({a})?
• Let A_n = [a, a + 1/n). Then A_1 ⊃ A_2 ⊃ · · · and \bigcap_{n=1}^{\infty} A_n = {a}.
• So by the continuity property of probability
P({a}) = \lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} \frac{1}{2\pi n} = 0.
• A consequence is that
P([a, b)) = P((a, b)) = P((a, b]) = P([a, b])
• We can compute the probability density near a direction a as
f(a) = \lim_{h \downarrow 0} \frac{P([a, a+h))}{h} = \lim_{h \downarrow 0} \frac{P([a-h, a])}{h} = \lim_{h \downarrow 0} \frac{1}{h} \cdot \frac{h}{2\pi} = \frac{1}{2\pi}.
Monday, September 14, 2015 Continuous Sample Spaces
• Does a probability with the properties of the previous example exist?
• Equivalent question: is it possible to compute a meaningful size, or length, for any subset of the real line?
• Unfortunately: no.
• There exist some very strange subsets of the real line.
• Reasonable subsets, even many very complicated ones, do allow their size to be computed.
• To move forward we need to be able to restrict our probabilities to a collection B of reasonable subsets.
• For our axioms to make sense we need our collection of reasonable subsets B to satisfy some closure conditions:
(i) ∅ ∈ B.
(ii) If A ∈ B then A^c ∈ B.
(iii) If A_1, A_2, · · · ∈ B then \bigcup_{i=1}^{\infty} A_i ∈ B.
• A collection B of subsets of a sample space S that satisfies these conditions is called a sigma-algebra of subsets.
Monday, September 14, 2015 Continuous Sample Spaces
Examples
• The smallest sigma-algebra on a sample space S is
B = {∅, S}.
• The largest sigma-algebra on a sample space S is
B = {all subsets of S} = 2^S.
• If S is countable and {s} ∈ B for all s ∈ S, then B = {all subsets of S}.
• If S is not countable, then we often need to use a B that does not contain all subsets of S.
• Even for countable state spaces it is sometimes useful to use a B that does not contain all subsets of S.
Monday, September 14, 2015 Continuous Sample Spaces
Example
Let S = R = (−∞, ∞) and let B be the smallest sigma-algebra containing all open intervals (a, b), a, b ∈ R.
• This is well-defined.
• B is called the Borel sigma-algebra on S .
• The sets in B are called Borel sets.
• Non-Borel sets do exist but are quite strange.
• B is also the smallest sigma-algebra containing all intervals of the form (−∞, a] for all a ∈ R.
• Other characterizations are possible.
• Analogous definitions apply when S is an interval.
• Analogous definitions apply when S is R2 or Rn.
Monday, September 14, 2015 A Final Definition of Probability
A Final Definition of Probability
Definition
A probability, or probability function, is a real-valued function defined on a sigma-algebra B of subsets of a sample space S such that
(i) P(A) ≥ 0 for all A ∈ B
(ii) P(S) = 1
(iii) If A_1, A_2, . . . ∈ B are pairwise disjoint, then
P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).
• This is the definition introduced by Kolmogorov in 1933 that is now the standard.
• A triple (S, B, P) is called a probability space.
• If S is a subset of the real line then a probability is sometimes called a probability distribution.
Monday, September 14, 2015 A Final Definition of Probability
With this definition we can use the probability we developed to model the angle of a particle leaving a point source:
Theorem
For any bounded interval [L, U] with L < U there exists a unique probability defined on the Borel subsets of the interval such that
P((a, b)) = \frac{b - a}{U - L}
for L ≤ a < b ≤ U. This is called the uniform distribution on [L, U].
Monday, September 14, 2015 Random Variables
Random Variables
• Random variables are numerical quantities with values we are uncertain about.
• In our framework we can think of each outcome in our sample space as producing a particular value for the random variable.
• We can think of a random variable as a real-valued function defined on our sample space.
• Sometimes we are interested in only one random variable, sometimes in many.
Monday, September 14, 2015 Random Variables
Examples
• Outcomes are all possible subsets of 3 out of 100 items.
  • We are mainly interested in the number of defectives in the sample.
• Outcomes are all possible past and future prices of a stock.
  • We are mainly interested in the stock price today and one month from now.
• A student is selected at random. Outcomes are all possible students.
  • We are interested in the height and weight of the student selected.
Wednesday, September 16, 2015
Recap
• Conditioning examples
  • reliability block diagram
  • matching problem
  • gambler’s ruin problem
• Continuous sample spaces
  • sigma algebras
  • Borel sets
• A final definition of probability
• Random variables
Wednesday, September 16, 2015
• A random variable represents a numerical measurement that can be computed for each outcome.
• Informally: A random variable is a real-valued function defined on a sample space.
• Random variables are usually denoted by upper case letters from the end of the alphabet:
X(s) : S → R
• Lower case letters usually denote generic realized values: x = X(s).
• Two dice are rolled; the sample space is S = {(1, 1), (1, 2), . . . , (6, 6)}.
  • X is the sum of the values on the two dice.
  • The experiment is performed and results in the outcome s = (3, 2).
  • The realized value of X is x = X(s) = X((3, 2)) = 5.
• Often the argument is dropped:
P({s ∈ S : X(s) ≤ 5}) = P({X ≤ 5}) = P(X ≤ 5)
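As a quick illustration (a sketch of mine, not from the slides), P(X ≤ 5) for the sum of two dice can be computed in R by enumerating the 36 equally likely outcomes:

S <- expand.grid(die1 = 1:6, die2 = 1:6)   # the sample space of 36 outcomes
X <- S$die1 + S$die2                       # X(s), the sum of the two dice
mean(X <= 5)                               # P(X <= 5) = 10/36, about 0.278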
Wednesday, September 16, 2015 Random Variables
Random Variables
• When our probability is defined on all subsets of the sample space then any real-valued function on the sample space is a random variable.
• If our probability is defined only for some subsets of the sample space, the subsets in a sigma-algebra B, then to be able to compute
P(X ≤ 5) = P({s ∈ S : X(s) ≤ 5})
we need to have {s ∈ S : X(s) ≤ 5} ∈ B.
• This property is called measurability with respect to B.
• A random variable is a B-measurable real-valued function on the sample space S.
Wednesday, September 16, 2015 Random Variables
Definition
Let S be a sample space and B a sigma-algebra of events on S. A real-valued function X = X(s) : S → R is a random variable if
{s : X(s) ≤ t} ∈ B
for all t ∈ R.
An equivalent condition is that
X^{-1}(A) = {s : X(s) ∈ A} ∈ B
for any Borel set A.
Wednesday, September 16, 2015 Random Variables
Examples
• Toss a coin n times, X = number of heads.
• Roll two dice,
X = sum, Y = difference.
• Sample 1000 people,
X = number who favor some issue,
Y = number who oppose the issue,
Z = 1000 − X − Y.
Wednesday, September 16, 2015 A Measurability Example
Example
• A coin is tossed twice; the sample space is S = {HH, HT, TH, TT}.
• Let Xi be the number of heads on toss i and let Y = X1 + X2 be the total number of heads.
• Let B be all subsets of S and let
B1 = {∅, S, {HH, HT}, {TH, TT}}.
• Both B and B1 are sigma-algebras.
• X1, X2, and Y are all measurable with respect to B.
• X1 is measurable with respect to B1.
• X2 and Y are not measurable with respect to B1:
{X2 ≤ 0} = {HT, TT} ∉ B1
{Y ≤ 0} = {TT} ∉ B1
Wednesday, September 16, 2015 A Measurability Example
Example (continued)
• B1 is called the sigma-algebra generated by X1.
• B1 is the smallest sigma-algebra with respect to which X1 is measurable.
• B1 consists of those subsets of S for which one can determine whether the outcome that has occurred is in the subset or not by knowing only the value of X1.
• Conversely, if we know for each event A ∈ B1 whether A has occurred or not, then we know the value of X1.
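A small R sketch (my own; the objects S, X1, X2, and B1 mirror the example) makes the preimage checks concrete:

S  <- c("HH", "HT", "TH", "TT")
X1 <- c(1, 1, 0, 0)                         # heads on toss 1 for each outcome
X2 <- c(1, 0, 1, 0)                         # heads on toss 2 for each outcome
B1 <- list(character(0), S, c("HH", "HT"), c("TH", "TT"))  # sigma-algebra generated by X1
in_B1 <- function(A) any(sapply(B1, setequal, A))
in_B1(S[X1 <= 0])   # TRUE:  {TH, TT} is an event in B1
in_B1(S[X2 <= 0])   # FALSE: {HT, TT} is not in B1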
Wednesday, September 16, 2015 Distributions
Distributions
• Let X be a random variable on a probability space (S, B, P).
• For any Borel set A, let
PX (A) = P({X ∈ A}) = P({s ∈ S : X (s) ∈ A}).
• PX is a probability on R, the Borel sigma-algebra of subsets of the real line R.
• A random variable induces the probability PX on the real line R.
• This induced probability is called the distribution of X .
• The triple (R, R, PX) is the probability space induced by X on the real line.
• Similarly, a pair of random variables X and Y induces a probability on R^2.
• This is called the joint probability distribution of X and Y .
Friday, September 18, 2015
Recap
• Random variables
• Measurability
• Distributions, induced probabilities
• Cumulative Distribution Function (CDF)
Friday, September 18, 2015 Cumulative Distribution Functions
Cumulative Distribution Functions
• Writing down probabilities for all Borel sets would be a daunting task.
• We need some simpler tools for specifying distributions.
• One tool that is available for any random variable is the cumulative distribution function:
Definition
The cumulative distribution function (CDF) of a random variable X is
FX (x) = P(X ≤ x)
for all x ∈ R.
• A few authors define the CDF as FX (x) = P(X < x).
• The survival function SX (x) = P(X > x) is sometimes used for lifetime distributions.
• The survival function satisfies SX (x) = 1− FX (x).
Friday, September 18, 2015 Cumulative Distribution Functions
Example
For X = number of heads in two coin flips:
F_X(x) = \begin{cases} 0 & x < 0 \\ 1/4 & 0 \le x < 1 \\ 3/4 & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}
[Figure: step-function plot of F_X(x) against x, with jumps at x = 0, 1, and 2.]
The plot was created with
s <- stepfun(c(0, 1, 2), c(0, .25, .75, 1))  # jumps at x = 0, 1, 2; levels 0, 1/4, 3/4, 1
plot(s, verticals = FALSE, pch = 19, main = "", ylab = expression(F[X](x)))
Friday, September 18, 2015 Cumulative Distribution Functions
Example
• Let X be the angle in which a particle leaves a point source.
• Suppose X has a uniform distribution on [0, 2π).
• This means for 0 ≤ a ≤ b ≤ 2π
P(a < X < b) = \frac{b - a}{2\pi}.
• Therefore
F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{2\pi} & 0 \le x < 2\pi \\ 1 & x \ge 2\pi. \end{cases}
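In R the same probabilities can be read off the built-in uniform CDF punif; a quick check (not from the slides):

punif(pi, min = 0, max = 2 * pi)                                             # F_X(pi) = 0.5
punif(3 * pi / 2, min = 0, max = 2 * pi) - punif(pi, min = 0, max = 2 * pi)  # P(pi < X <= 3*pi/2) = 0.25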
Friday, September 18, 2015 Cumulative Distribution Functions
• Some properties of the CDF:
(a) \lim_{x \to -\infty} F(x) = 0
(b) \lim_{x \to \infty} F(x) = 1
(c) F is nondecreasing
(d) F is right-continuous
• Properties (a), (b), and (d) follow from the continuity of probability.
• If a function F has these four properties, then it is the CDF of some random variable.
• If F has a jump at x , then P(X = x) is the size of the jump.
Friday, September 18, 2015 Computing Some Probabilities Using the CDF
Computing Some Probabilities
P(a < X ≤ b) = F (b)− F (a) for any a < b.
Proof.
The event {X ≤ b} can be written as the disjoint union
{X ≤ b} = {X ≤ a} ∪ {a < X ≤ b}
By finite additivity,
F (b) = P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b)
= F (a) + P(a < X ≤ b)
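A quick Monte Carlo check of this identity for the uniform angle distribution (a sketch, not from the slides; the endpoints a and b are arbitrary):

x <- runif(100000, 0, 2 * pi)                # simulated angles
a <- 1; b <- 4
mean(x > a & x <= b)                         # empirical P(a < X <= b)
punif(b, 0, 2 * pi) - punif(a, 0, 2 * pi)    # F(b) - F(a) = 3 / (2 * pi), about 0.477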
Friday, September 18, 2015 Computing Some Probabilities Using the CDF
Computing Some Probabilities
P(X < a) = F (a−) = limx↑a F (x) for any a ∈ R.
Proof.
Let A_n = {a − 1/n < X < a}. Then for each n ≥ 1
P(X < a) = F(a − 1/n) + P(A_n).
Furthermore,
\bigcap_{n=1}^{\infty} A_n = ∅.
So by the continuity property \lim_{n \to \infty} P(A_n) = 0, and letting n → ∞ gives
P(X < a) = F(a−) + \lim_{n \to \infty} P(A_n) = F(a−).
Friday, September 18, 2015 Computing Some Probabilities Using the CDF
Computing Some Probabilities
P(X = a) = F (a)− F (a−) = size of the jump of F at a.
Proof.
The event {X = a} can be written as
{X = a} = {X ≤ a}\{X < a}.
So
P(X = a) = P(X ≤ a) − P(X < a) = F(a) − F(a−).
P(a ≤ X ≤ b) = F (b)− F (a−).
Proof.
Follows similarly from observing that
{a ≤ X ≤ b} = {X ≤ b}\{X < a}.
Friday, September 18, 2015 Computing Some Probabilities Using the CDF
Joint CDF of Two Random Variables
• The joint CDF of two random variables X and Y is defined as
FXY (x , y) = P({X ≤ x} ∩ {Y ≤ y}).
• The probability of any rectangle with sides parallel to the axes can be calculated from the joint CDF.
• Other probabilities are harder to obtain from the joint CDF.
Friday, September 18, 2015 Comparing Distributions
Comparing Distributions
Definition
Two random variables X and Y are identically distributed if their distributions PX and PY are identical, i.e. if
PX (A) = P(X ∈ A) = P(Y ∈ A) = PY (A)
for all Borel sets A.
Theorem
Two random variables X and Y are identically distributed if and only if their CDFs are identical.
Friday, September 18, 2015 Classifying Distributions
Classifying Distributions
• A random variable is discrete if the set of its possible values is finite or countable.
• Equivalently, a random variable is discrete if its CDF is a step function.
• A random variable is continuous if its CDF is continuous and there is a nonnegative function f such that
F(x) = \int_{-\infty}^{x} f(u)\,du
for all x ∈ R.
  • F will be differentiable “almost everywhere” with F′ = f.
  • f is called a probability density function.
• Continuous distributions are sometimes called absolutely continuous.
Friday, September 18, 2015 Classifying Distributions
• Combinations of discrete and continuous features are possible.
  • The CDF of the waiting time at a bank might look like this:
[Figure: a CDF with a jump at 0 followed by a continuous increase toward 1.]
  • The CDF of the amount of rainfall on a day would also look like this.
• There also exist weird random variables with a CDF that is continuous but has no density; the derivative of the CDF is zero almost everywhere (a singular continuous distribution).
Friday, September 18, 2015 Characterizing Discrete and Continuous Distributions
Characterizing Discrete and Continuous Distributions
• The CDF can be used to characterize any probability distribution on the real line.
• Simpler characterizations are available if the RVs are discrete or continuous.
• For discrete random variables we can use the probability mass function (PMF):
Definition
The probability mass function (PMF) of a discrete random variable isgiven by
fX (x) = P(X = x)
for all x ∈ R.
• If 𝒳 is the discrete set of possible values of X, then fX(x) = 0 for all x ∉ 𝒳.
Friday, September 18, 2015 Characterizing Discrete and Continuous Distributions
• Using the PMF we can compute any probabilities we want:
P(a ≤ X ≤ b) = \sum_{a \le x \le b} f_X(x)
P(X ∈ A) = \sum_{x \in A} f_X(x)
• In particular,
F_X(x) = \sum_{y \le x} f_X(y)
• So the PMF completely determines the distribution of a RV.
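For the two-coin-flip example these computations are one line each in R (a sketch; the variable names are mine):

x   <- 0:2                       # possible values of X = number of heads
pmf <- c(0.25, 0.5, 0.25)        # f_X(0), f_X(1), f_X(2)
sum(pmf[x >= 1 & x <= 2])        # P(1 <= X <= 2) = 0.75
cumsum(pmf)                      # F_X at x = 0, 1, 2: 0.25, 0.75, 1.00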
Friday, September 18, 2015 Characterizing Discrete and Continuous Distributions
• The PMF is defined for continuous random variables but isn’t useful.
• Instead, we use the probability density function (PDF):
Definition
If X is a random variable and fX(x) is a nonnegative real-valued function on R such that
F_X(x) = P(X ≤ x) = \int_{-\infty}^{x} f_X(u)\,du
for all x ∈ R, then X is continuous and fX(x) is a probability density function (PDF) for X (or for FX).
Friday, September 18, 2015 Characterizing Discrete and Continuous Distributions
Notes
• If fX(x) is continuous at x, then fX(x) = F′X(x), i.e.
f_X(x) = \lim_{\varepsilon \downarrow 0} \frac{F_X(x + \varepsilon) - F_X(x)}{\varepsilon} = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} P(x \le X \le x + \varepsilon)
• So fX (x) is the “density of probability” at x .
• fX (x) > 1 is possible.
• P(X = x) = 0 for all x if X is continuous.
• For any Borel subset A of R,
P(X ∈ A) = \int_{A} f_X(x)\,dx
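A numerical illustration for the uniform angle distribution, whose density should be the constant 1/(2π), about 0.159 (a sketch, not from the slides):

h <- 1e-6; x0 <- 1
(punif(x0 + h, 0, 2 * pi) - punif(x0, 0, 2 * pi)) / h                # approximately 1 / (2 * pi)
integrate(function(u) rep(1 / (2 * pi), length(u)), pi, 3 * pi / 2)  # P(pi < X < 3*pi/2) = 0.25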
Friday, September 18, 2015 Characterizing Discrete and Continuous Distributions
Theorem
A function fX(x) is a PMF (or PDF) of some random variable if and only if
(i) fX(x) ≥ 0 for all x ∈ R.
(ii) \sum_{x} f_X(x) = 1 (PMF) or \int_{-\infty}^{\infty} f_X(x)\,dx = 1 (PDF)
Friday, September 18, 2015 Examples
Example
• A random variable with density
f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}
for some λ > 0 is called an exponential random variable.
• This is a useful simple model for lifetimes.
• We have fX(x) ≥ 0 for all x and
\int_{-\infty}^{\infty} f_X(x)\,dx = 0 + \int_{0}^{\infty} \lambda e^{-\lambda x}\,dx = 0 + 1 = 1.
So fX is a density for any λ > 0.
• The CDF is
F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy = \begin{cases} 0 & \text{for } x \le 0 \\ 1 - e^{-\lambda x} & \text{for } x > 0 \end{cases}
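A quick numerical sanity check in R using base R's integrate and pexp (the value λ = 2 is arbitrary):

lam <- 2
f <- function(x) lam * exp(-lam * x)
integrate(f, 0, Inf)$value        # 1: the density integrates to 1
1 - exp(-lam * 1.5)               # the CDF formula at x = 1.5
pexp(1.5, rate = lam)             # matches R's built-in exponential CDF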