random variable and distribution
Chapter 4: Random Variables and Distribution
Statistics
Where We’re Going
Develop the notion of a random variable
Numerical data and discrete random variables
Discrete random variables and their probabilities
4.1: Two Types of Random Variables
A random variable is a variable that assumes numerical values associated with the random outcome of an experiment, where one (and only one) numerical value is assigned to each sample point.
A discrete random variable can assume a countable number of values. Example: the number of steps to the top of the Eiffel Tower.*
A continuous random variable can assume any value along a given interval of a number line. Example: the time a tourist stays at the top once s/he gets there.
*Believe it or not, the answer ranges from 1,652 to 1,789. See Great Buildings
Discrete random variables: number of sales, number of calls, shares of stock, people in line, mistakes per page
Continuous random variables: length, depth, volume, time, weight
4.2: Probability Distributions for Discrete Random Variables
The probability distribution of a discrete random variable is a graph, table or formula that specifies the probability associated with each possible outcome the random variable can assume. It must satisfy two requirements:
p(x) ≥ 0 for all values of x
Σ p(x) = 1, where the sum is taken over all values of x
Say a random variable x follows this pattern: p(x) = (.3)(.7)^(x-1) for integer x > 0. This table gives the probabilities (rounded to two digits) for x between 1 and 10.
x P(x)
1 .30
2 .21
3 .15
4 .10
5 .07
6 .05
7 .04
8 .02
9 .02
10 .01
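The table can be regenerated directly from the formula. The following is a minimal sketch in Python; the function name `p` and the variable names are illustrative choices, not part of the original slides.

```python
def p(x: int) -> float:
    """p(x) = (.3)(.7)^(x-1) for x = 1, 2, 3, ..."""
    return 0.3 * 0.7 ** (x - 1)

# Probabilities for x = 1..10, rounded to two digits as in the table.
table = {x: round(p(x), 2) for x in range(1, 11)}
for x, prob in table.items():
    print(f"{x:>2}  {prob:.2f}")

# The first ten probabilities sum to less than 1 because
# the values x = 11, 12, ... also carry some probability.
partial_sum = sum(p(x) for x in range(1, 11))
print(f"sum for x = 1..10: {partial_sum:.4f}")
```

The partial sum equals 1 - (.7)^10, so the remaining probability sits in the tail beyond x = 10.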
4.3: Expected Values of Discrete Random Variables
The mean, or expected value, of a discrete random variable is
μ = E(x) = Σ x p(x).
The variance of a discrete random variable x is
σ² = E[(x - μ)²] = Σ (x - μ)² p(x).
The standard deviation of a discrete random variable x is
σ = √(σ²).
Interval                    Chebyshev’s Rule   Empirical Rule
P(μ - σ < x < μ + σ)        ≥ 0                ≈ .68
P(μ - 2σ < x < μ + 2σ)      ≥ .75              ≈ .95
P(μ - 3σ < x < μ + 3σ)      ≥ .89              ≈ 1.00
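The Chebyshev bounds of 0, .75 and .89 come from the general lower bound 1 - 1/k² for the probability of falling within k standard deviations of the mean. The sketch below, with an illustrative function name, reproduces them and sets them beside the Empirical Rule values, which apply only to mound-shaped, roughly symmetric distributions.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on P(mu - k*sigma < x < mu + k*sigma), valid for any distribution."""
    return max(0.0, 1.0 - 1.0 / k ** 2)

# Empirical Rule values as quoted on the slide (approximate, mound-shaped data only).
empirical = {1: 0.68, 2: 0.95, 3: 1.00}

for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: Chebyshev >= {bound:.2f}, Empirical ~ {empirical[k]:.2f}")
```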
In a roulette wheel in a U.S. casino, a $1 bet on “even” wins $1 if the ball falls on an even number (same for “odd,” or “red,” or “black”).
The probability of winning this bet is 47.37%.
μ = E(x) = (+$1) P(win) + (-$1) P(lose)
  = $1(.4737) - $1(.5263) = -$.0526
σ = $.9986
On average, bettors lose about a nickel for each dollar they put down on a bet like this. (These are the best bets for patrons.)
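The roulette figures follow from the expected-value and variance definitions in this section. The sketch below assumes the standard U.S. wheel of 38 slots, 18 of which win an "even" bet; that count is not stated on the slide but matches the quoted 47.37%.

```python
from math import sqrt

# Payoff in dollars -> probability, for a $1 bet on "even".
# Assumption: 38 slots, 18 winning and 20 losing (0 and 00 lose).
outcomes = {+1: 18 / 38, -1: 20 / 38}

mean = sum(x * p for x, p in outcomes.items())
variance = sum((x - mean) ** 2 * p for x, p in outcomes.items())
sd = sqrt(variance)

print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```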
Binomial Distribution
4 Properties of Binomial Distribution
1. Fixed number of Trials (n)
2. Two outcomes in a trial, SUCCESS or FAILURE
3. Trials are independent
4. Probability of success (p) remains constant
Throwing a die
Tree Diagram
X ~ B(n,p)
X – number of successes in n trials
X ~ B(3, 1/6)
Is there a formula for calculating Binomial Probabilities rather than draw a tree diagram?
There are five things you need to do to work a binomial story problem.
1. Define Success first. Success must be for a single trial. Success = "Rolling a 6 on a single die"
2. Define the probability of success (p): p = 1/6
3. Find the probability of failure (q): q = 5/6
4. Define the number of trials: n = 3
5. Define the number of successes out of those trials (r)
The General Binomial Probability Formula
P(X = r) = C(n, r) p^r q^(n-r)
Where:
r – number of successes out of those trials
n – number of trials
p – probability of success
q – probability of failure, q = 1 - p
C(n, r) = n! / (r! (n - r)!)
In the old days, there was a probability of 0.8 of success in any attempt to make a telephone call. Calculate the probability of having 7 successes in 10 attempts.
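The telephone question can be worked with the binomial formula using n = 10, r = 7, p = 0.8 and q = 0.2. A minimal Python sketch follows; the function name `binomial_pmf` is an illustrative choice, and `math.comb` supplies the number of combinations.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p  # probability of failure
    return comb(n, r) * p ** r * q ** (n - r)

# Telephone example: 7 successes in 10 attempts with p = 0.8.
p_seven_calls = binomial_pmf(7, 10, 0.8)
print(f"P(X = 7) = {p_seven_calls:.4f}")

# Die example from earlier: X ~ B(3, 1/6), number of sixes in three throws.
die_distribution = [binomial_pmf(r, 3, 1 / 6) for r in range(4)]
for r, prob in enumerate(die_distribution):
    print(f"P(X = {r}) = {prob:.4f}")
```

The telephone probability comes to about 0.2013, and the four die probabilities sum to 1, as any probability distribution must.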
Mean and Variance
For X ~ B(n, p), the mean is μ = np and the variance is σ² = npq.
4.5: The Poisson Distribution
The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space).
The Poisson distribution can be used to calculate the probabilities of various numbers of "successes" based on the mean number of successes. In order to apply the Poisson distribution, the various events must be independent.
Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc. For example:
• The number of cases of a disease in different towns
• The number of mutations in set sized regions of a chromosome
• The number of dolphin pod sightings along a flight path through a region
• The number of particles emitted by a radioactive source in a given time
• The number of births per hour during a given day
FORMULA
The formula for the Poisson probability mass function is
P(x; μ) = (e^(-μ))(μ^x) / x!
where
• e is the base of natural logarithms (approximately 2.7183)
• μ is the mean number of "successes"
• x is the number of "successes" in question
EXAMPLE
The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following: μ = 2; x = 3; e ≈ 2.71828, a constant. We plug these values into the Poisson formula as follows:
P(x; μ) = (e^(-μ))(μ^x) / x!
P(3; 2) = (2.71828^(-2))(2^3) / 3!
P(3; 2) = (0.13534)(8) / 6
P(3; 2) = 0.180
Thus, the probability of selling 3 homes tomorrow is 0.180.
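The Acme Realty calculation can be checked with a few lines of Python. This is a minimal sketch; `poisson_pmf` is an illustrative name for a direct transcription of the Poisson formula.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)

# Acme Realty: mean of 2 homes per day, exactly 3 homes tomorrow.
p_three_homes = poisson_pmf(3, 2)
print(f"P(3; 2) = {p_three_homes:.3f}")
```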
EXAMPLE
Suppose you knew that the mean number of calls to a fire station on a weekday is 8. What is the probability that on a given weekday there would be 11 calls?
μ = 8; x = 11; e ≈ 2.71828, a constant.
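The slide leaves the fire station answer to the reader. One way to evaluate P(11; 8) is sketched below. As a variation on plugging straight into the formula, this version works with logarithms (using `math.lgamma`, where lgamma(x + 1) = log x!), a common way to avoid very large intermediate powers and factorials; the function name is illustrative.

```python
from math import exp, lgamma, log

def poisson_pmf_log(x: int, mu: float) -> float:
    """Poisson probability computed as exp(-mu + x*log(mu) - log(x!))."""
    return exp(-mu + x * log(mu) - lgamma(x + 1))

# Fire station: mean of 8 calls per weekday, exactly 11 calls.
p_11_calls = poisson_pmf_log(11, 8)
print(f"P(11; 8) = {p_11_calls:.4f}")
```

The result is roughly 0.072, so a day with exactly 11 calls happens about 7% of the time.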
Changing the size of the interval
Suppose we know that births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability that we observe 5 births in a given 2 hour interval?
Well, if births occur randomly at a rate of 1.8 births per 1 hour interval, then births occur randomly at a rate of 3.6 births per 2 hour interval.
Let Y = No. of births in a 2 hour period
P(Y = 5) = (e^(-3.6))(3.6^5) / 5!
= 0.13768
Sum of two Poisson variables
Now suppose we know that in hospital A births occur randomly at an average rate of 2.3 births per hour and in hospital B births occur randomly at an average rate of 3.1 births per hour. What is the probability that we observe 7 births in total from the two hospitals in a given 1 hour period?
So if we let X = No. of births in a given hour at hospital A and Y = No. of births in a given hour at hospital B, and the two hospitals are independent, then X + Y follows a Poisson distribution with mean 2.3 + 3.1 = 5.4 births per hour.
P(X + Y = 7) = (e^(-5.4))(5.4^7) / 7!
= 0.11999
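Both birth examples come down to choosing the right mean before applying the Poisson formula: scale the hourly rate by the interval length, or add the rates of independent sources. A hedged sketch, with illustrative variable names:

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)

# Changing the interval: 1.8 births per hour over a 2 hour window.
mu_two_hours = 1.8 * 2
p_five_births = poisson_pmf(5, mu_two_hours)

# Sum of two independent Poisson variables: the means add.
mu_both_hospitals = 2.3 + 3.1
p_seven_births = poisson_pmf(7, mu_both_hospitals)

print(f"P(Y = 5)     = {p_five_births:.5f}")
print(f"P(X + Y = 7) = {p_seven_births:.5f}")
```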
Cumulative Poisson Probability
A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than some specified lower limit and less than some specified upper limit.
Example 1
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following: μ = 5; x = 0, 1, 2, or 3; e ≈ 2.71828, a constant.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use the Poisson formula:
P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(x ≤ 3; 5) = [ (e^(-5))(5^0) / 0! ] + [ (e^(-5))(5^1) / 1! ] + [ (e^(-5))(5^2) / 2! ] + [ (e^(-5))(5^3) / 3! ]
P(x ≤ 3; 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ] + [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x ≤ 3; 5) = 0.006738 + 0.03369 + 0.084224 + 0.140375
P(x ≤ 3; 5) = 0.2650
Thus, the probability of seeing no more than 3 lions is 0.2650.
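The same cumulative sum can be written as a short loop over x = 0, 1, 2, 3. A minimal sketch, with an illustrative function name:

```python
from math import exp, factorial

def poisson_cdf(k: int, mu: float) -> float:
    """P(x <= k; mu): sum of the Poisson probabilities for x = 0, 1, ..., k."""
    return sum(exp(-mu) * mu ** x / factorial(x) for x in range(k + 1))

# Safari example: mean of 5 lions per day, fewer than four lions seen.
p_at_most_three = poisson_cdf(3, 5)
print(f"P(x <= 3; 5) = {p_at_most_three:.4f}")
```

The complement, 1 - poisson_cdf(3, 5), gives the probability of seeing four or more lions.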
4.6: The Hypergeometric Distribution
In the binomial situation, each trial was independent.
• Example: drawing cards from a deck and replacing the drawn card each time
If the card is not replaced, each trial depends on the previous trial(s).
• The hypergeometric distribution can be used in this case.
Randomly draw n elements from a set of N elements, without replacement. Assume there are r successes and N-r failures in the N elements.
The hypergeometric random variable is the number of successes, x, drawn from the r available in the n selections.
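P(x) = C(r, x) C(N - r, n - x) / C(N, n)
where
N = the total number of elements
r = number of successes in the N elements
n = number of elements drawn
x = the number of successes in the n elements
C(a, b) = a! / (b! (a - b)!), the number of ways to choose b elements from a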
The mean and variance of a hypergeometric random variable are
μ = nr / N
σ² = r(N - r) n(N - n) / (N² (N - 1))
Suppose a customer at a pet store wants to buy two hamsters for his daughter, but he wants two males or two females (i.e., he wants only two hamsters in a few months).
If there are ten hamsters, five male and five female, what is the probability of drawing two of the same sex? (With hamsters, it’s virtually a random selection.)
Writing C(a, b) = a! / (b! (a - b)!) for the number of ways to choose b elements from a:
P(M = 2) = P(F = 2) = C(5, 2) C(5, 0) / C(10, 2) = (10)(1) / 45 = .22
P(M = 2 or F = 2) = P(M = 2) + P(F = 2) = .22 + .22 = .44
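The hamster probabilities can be reproduced with the hypergeometric formula, treating "male" as a success (N = 10, r = 5, n = 2). The sketch below uses `math.comb` for the combinations; the function and variable names are illustrative.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(x successes) when drawing n of N elements without replacement, r of them successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Ten hamsters, five male (success) and five female, two drawn.
p_two_males = hypergeom_pmf(2, N=10, r=5, n=2)
p_two_females = hypergeom_pmf(0, N=10, r=5, n=2)  # zero males means two females
p_same_sex = p_two_males + p_two_females

print(f"P(M = 2) = {p_two_males:.2f}")
print(f"P(same sex) = {p_same_sex:.2f}")

# Mean number of males drawn, from the formula n*r/N.
mean_males = 2 * 5 / 10
```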
Continuous Random Variable
• Normal Distribution
The normal distribution refers to a family of continuous probability distributions described by the normal equation:
f(x) = [1 / (σ √(2π))] e^(-(x - μ)² / (2σ²))
where μ is the mean and σ is the standard deviation.
z = (X - μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.
Example
An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard deviation of 50 days. Assuming that bulb life is normally distributed, what is the probability that an Acme light bulb will last at most 365 days?
Solution: The value of the normal random variable is 365 days.
• The mean is equal to 300 days.
• The standard deviation is equal to 50 days.
z = (X - μ) / σ = (365 - 300) / 50 = 1.3
The answer is: P(X ≤ 365) = P(Z ≤ 1.3) = 0.90. Hence, there is a 90% chance that a light bulb will burn out within 365 days.
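Instead of a printed table, the cumulative probability for the light bulb example can be computed from the error function, using the identity Φ(z) = ½ [1 + erf(z / √2)]. The sketch below is one way to do it; the function name `normal_cdf` is illustrative.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # z-score transformation
    return 0.5 * (1 + erf(z / sqrt(2)))

# Light bulb example: mean 300 days, standard deviation 50 days.
p_within_365 = normal_cdf(365, 300, 50)
print(f"P(X <= 365) = {p_within_365:.4f}")
```

To four places this gives 0.9032, which the slide rounds to 0.90.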
• Standard Normal Distribution
The standard normal distribution is a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.
Standard Score (aka, z Score)
The normal random variable of a standard normal distribution is called a standard score or a z-score. Every normal random variable X can be transformed into a z score via the following equation:
z = (X - μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.
Example
Molly earned a score of 940 on a national achievement test. The mean test score was 850 with a standard deviation of 100. What proportion of students had a higher score than Molly? (Assume that test scores are normally distributed.)
(A) 0.10 (B) 0.18 (C) 0.50 (D) 0.82 (E) 0.90
• First, we transform Molly's test score into a z-score, using the z-score transformation equation.
z = (X - μ) / σ = (940 - 850) / 100 = 0.90
• Then, using the standard normal distribution table, we find the cumulative probability associated with the z-score. In this case, we find P(Z < 0.90) = 0.8159.
• Therefore, P(Z > 0.90) = 1 - P(Z < 0.90) = 1 - 0.8159 = 0.1841.
Thus, we estimate that 18.41 percent of the students tested had a higher score than Molly, so the answer is (B).
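The table lookup for Molly's score can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

mean_score = 850
sd_score = 100
molly = 940

# Step 1: z-score transformation.
z = (molly - mean_score) / sd_score

# Step 2: cumulative probability P(Z < z) under the standard normal distribution.
p_below = NormalDist(mu=0, sigma=1).cdf(z)

# Step 3: the proportion with a higher score is the upper tail.
p_above = 1 - p_below

print(f"z = {z:.2f}, P(Z < z) = {p_below:.4f}, P(Z > z) = {p_above:.4f}")
```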