unit4quantitativeskills.org/module1/unit4.pdf · 2020. 9. 11.
TRANSCRIPT
UNIT 4 THEORETICAL PROBABILITY DISTRIBUTION
Prof. Ryung Kim
Unit 1.3 (PnG p. 4)
This unit extends the notion of probability and introduces some common probability distributions. These mathematical models are useful as a basis for the methods studied in the remainder of the text.
1. PROBABILITY DISTRIBUTIONS
Random Variable
A random variable is a function that assigns a number to each outcome.
E.g. In a single toss of a coin, let X be 1 if we observe a head and 0 if we observe a tail.
E.g. Consider tossing a coin 3 times, and let Y be the number of heads.
Notation
X, Y, Z, … : random variable
x, y, z, … : observation
Definitions - Types of Random Variables
Discrete random variable: takes a finite (or countable) number of values
• Marital status, number of ear infections an infant develops during the first year of life, …
Continuous random variable: takes infinitely many values that can be mapped onto a continuous scale
• Height, weight, lifetime, forced expiratory volume in 1 second, …
Probability distribution
A probability distribution (for a discrete random variable) is a table (or formula) giving the probability of each value of the random variable.
Elementary Statistics, 10th Edition, p 202
Requirements for Probability Distribution
$\sum_i P(x_i) = 1$
$0 \le P(x_i) \le 1$ for every $x_i$
Mean, Variance, Standard Deviation of a Discrete Probability Distribution
$\mu = \sum_i x_i \, P(x_i)$ (Mean or Expected Value)
$\sigma^2 = \sum_i (x_i - \mu)^2 \, P(x_i)$ (Variance)
$\sigma = \sqrt{\sum_i (x_i - \mu)^2 \, P(x_i)}$ (Standard Deviation)
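The definitions of mean, variance, and standard deviation can be checked on the earlier example of Y, the number of heads in 3 coin tosses. The following is a minimal sketch in Python (the slides use R and Stata; Python is an assumption here). It also assumes a fair coin, which the slides do not state, and the variable names are illustrative only.

```python
from itertools import product
from math import sqrt

# Assumption: a fair coin, so each of the 8 head/tail sequences has probability 1/8.
outcomes = list(product([0, 1], repeat=3))  # 1 = head, 0 = tail

# Build the probability distribution of Y = number of heads.
dist = {}
for outcome in outcomes:
    y = sum(outcome)
    dist[y] = dist.get(y, 0) + 1 / len(outcomes)

# Apply the definitions: weighted sums over the values of Y.
mean = sum(y * prob for y, prob in dist.items())
variance = sum((y - mean) ** 2 * prob for y, prob in dist.items())
sd = sqrt(variance)

print(dist)            # probabilities sum to 1, each between 0 and 1
print(mean, variance)  # 1.5 0.75
```

Under the fair-coin assumption the table satisfies both requirements for a probability distribution, and the mean is 1.5 heads.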
2. BERNOULLI AND BINOMIAL PROBABILITY DISTRIBUTION
Bernoulli Random Variable
A Bernoulli random variable Y has two possible values 1 and 0, and the probabilities of obtaining those values are p and 1-p, respectively.
In other words,
P(Y=1) = p and P(Y=0) = 1-p
E.g. life/death, male/female, sickness/health
Bernoulli Random variable
In 1987, 29% of the adults in the U.S. smoked cigarettes, cigars, or pipes [CDC, 1989].
Suppose we randomly select one person and let Y be 1 if the person smokes and 0 if he/she does not.
P(Y=1) = p=0.29
P(Y=0) = 1-p=0.71
Binomial Random variable
Now, suppose we randomly select three people, and let X be the number of smokers.
1st person  2nd person  3rd person  Probability      Number of smokers (X)
0           0           0           (1-p)(1-p)(1-p)  0
1           0           0           p(1-p)(1-p)      1
0           1           0           (1-p)p(1-p)      1
0           0           1           (1-p)(1-p)p      1
1           1           0           pp(1-p)          2
1           0           1           p(1-p)p          2
0           1           1           (1-p)pp          2
1           1           1           ppp              3

$P(X=0) = (1-p)^3 = (0.71)^3 = 0.358$
$P(X=1) = 3p(1-p)^2 = 3(0.29)(0.71)^2 = 0.439$
$P(X=2) = 3p^2(1-p) = 3(0.29)^2(0.71) = 0.179$
$P(X=3) = p^3 = (0.29)^3 = 0.024$
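The enumeration of the eight smoker/non-smoker sequences can be reproduced by brute force. This is a small Python sketch (Python is an assumption; the slides use R and Stata) that multiplies the per-person probabilities, which presumes the three selections are independent.

```python
from itertools import product

p = 0.29  # probability that a randomly selected adult smokes (from the example)

# Accumulate P(X = k) over all 2^3 smoker/non-smoker sequences.
prob = {k: 0.0 for k in range(4)}
for seq in product([0, 1], repeat=3):  # 1 = smoker, 0 = non-smoker
    pr = 1.0
    for s in seq:
        pr *= p if s == 1 else 1 - p   # independence: multiply per-person probabilities
    prob[sum(seq)] += pr

for k in range(4):
    print(k, round(prob[k], 3))
# 0 0.358
# 1 0.439
# 2 0.179
# 3 0.024
```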
Notation
n = number of trials
x = number of successes among n trials
p = probability of success in any one trial
q = probability of failure in any one trial (q = 1 – p)
Probability distribution of Binomial Random Variable
$P(X = x) = \frac{n!}{(n-x)!\,x!} \, p^x (1-p)^{n-x}$ for x = 0, 1, 2, . . ., n
Understanding the Binomial Probability Formula
$P(X = x) = \frac{n!}{(n-x)!\,x!} \cdot p^x (1-p)^{n-x}$
$\frac{n!}{(n-x)!\,x!} = \binom{n}{x}$: number of combinations with exactly x successes in n trials
$p^x (1-p)^{n-x}$: the probability of x successes and n-x failures in a particular order
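The two factors of the binomial formula map directly onto code. The sketch below is in Python (an assumption; the slides use R and Stata), and `binomial_pmf` is a hypothetical helper name, not a library function. It is checked against the three-person smoking example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    n_orderings = comb(n, x)                  # n! / ((n - x)! x!)
    prob_one_order = p**x * (1 - p) ** (n - x)  # x successes and n - x failures in one order
    return n_orderings * prob_one_order

# Smoking example: n = 3 people, p = 0.29
probs = [binomial_pmf(x, 3, 0.29) for x in range(4)]
print([round(pr, 3) for pr in probs])  # [0.358, 0.439, 0.179, 0.024]
```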
Binomial probability distribution
One can model a random phenomenon by a binomial probability distribution if it meets the following requirements:
1. The procedure has a fixed number of trials.
2. The trials must be independent.
3. Each trial has outcomes classified into two categories (‘success’ or ‘failure’).
4. The probability of a success remains the same in all trials.
Example – binomial model
When and how do you model a phenomenon with a binomial distribution?
In a population of flatworms (Planaria) living in a certain pond, one in five individuals is an adult and the other four are juveniles. An ecologist plans to count the adults in a random sample of 12 flatworms from the pond. What is the probability that she finds fewer than 5 adults?
Binomial probability model
With n = 12 and p = 0.2, the distribution is right-skewed.
The distribution is left-skewed when p > 0.5 and symmetric when p = 0.5.
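For the flatworm question, "fewer than 5 adults" is the event X ≤ 4 under a binomial model with n = 12 and p = 0.2. A minimal Python sketch of that sum (Python is an assumption here; the slides use R and Stata) is:

```python
from math import comb

n, p = 12, 0.2  # 12 sampled flatworms; one in five is an adult

# "Fewer than 5 adults" means X <= 4, so sum the binomial probabilities for x = 0..4.
prob_fewer_than_5 = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(5))
print(round(prob_fewer_than_5, 4))  # 0.9274
```

The model treats the 12 flatworms as independent trials with the same probability of being an adult, which is reasonable when the sample is small relative to the pond population.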
Use R to compute binomial probabilities
Use Stata to compute binomial probabilities
Mean, Variance, Standard Deviation of Binomial Distribution
Mean: $\mu = np$
Variance: $\sigma^2 = np(1-p)$
Std. Dev.: $\sigma = \sqrt{np(1-p)}$
Recall the definition of mean and variance:
$\mu = \sum_i x_i \, P(x_i)$, $\sigma^2 = \sum_i (x_i - \mu)^2 \, P(x_i)$
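The closed-form binomial mean and variance agree with the general definitions. The following Python sketch (Python is an assumption; n = 12 and p = 0.2 are taken from the flatworm example) computes both ways and compares them.

```python
from math import comb, sqrt

n, p = 12, 0.2  # values from the flatworm example

# Binomial probabilities P(X = x) for x = 0..n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance from the general definitions (weighted sums)
mean = sum(x * pmf[x] for x in range(n + 1))
variance = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))

print(round(mean, 4), round(n * p, 4))                # 2.4 2.4
print(round(variance, 4), round(n * p * (1 - p), 4))  # 1.92 1.92
print(round(sqrt(variance), 4))                       # 1.3856
```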
3. NORMAL DISTRIBUTION
Normal distribution
Many random variables of interest (e.g. blood pressure, amount of chemicals in the human body, height, and weight) are approximately normally distributed. (PnG, p.177)
The importance of the normal distribution will become clear in the following chapters.
The normal distribution is a continuous distribution.
Probability density function
A density curve is the graph of a continuous probability distribution.
1. The total area under the curve must equal 1.
2. Every point on the curve must have a vertical height that is 0 or greater. (That is, the curve cannot fall below the x-axis.)
Density function of Normal Distribution
The normal probability distribution has a bell-shaped density function, and the total area under its density curve is equal to 1. Its mean µ can be any number and its variance σ² can be any positive number.
Graph from Elementary Statistics, 10th Edition
Probability density function - Area and Probability
Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.
The area under the density curve between a and b corresponds to P(a ≤ X ≤ b), i.e. probability that the random variable has value between a and b.
Mean, Variance, Standard Deviation of a Continuous Probability Distribution
$\mu = \int y \, p(y) \, dy$ (Mean or Expected Value)
$\sigma^2 = \int (y - \mu)^2 \, p(y) \, dy$ (Variance)
$\sigma = \sqrt{\int (y - \mu)^2 \, p(y) \, dy}$ (Standard Deviation)
I will not ask you to do integration to compute mean, variance, or standard deviation.
Shape of Normal Probability Density
[Figure: normal density curves with mean = 0]
[Figure: normal density curves with standard deviation (σ) = 1]
THE STANDARD NORMAL DISTRIBUTION
Definition – Standard Normal Distribution
The standard normal distribution is the normal distribution with mean equal to 0 and standard deviation equal to 1.
It is extremely important to develop the skill to find areas corresponding to various regions under the graph of the standard normal distribution.
Graph from Elementary Statistics, 10th Edition
Computing Standard Normal Probabilities
P(z < 1.58) = 0.9429
P(z > –1.23) = 0.8907
Computing standard normal probabilities
P(–2.00 < z < 1.50) = ?
P(z < 1.50) = 0.9332 and P(z < –2.00) = 0.0228
P(–2.00 < z < 1.50) = 0.9332 – 0.0228 = 0.9104
Graph from Elementary Statistics, 10th Edition
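The three standard normal probabilities above can be reproduced without a printed table. This is a minimal Python sketch using `statistics.NormalDist` from the standard library (Python 3.8+). Python is an assumption here, since the slides use R and Stata.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

p_below = Z.cdf(1.58)                   # P(z < 1.58): area to the left
p_above = 1 - Z.cdf(-1.23)              # P(z > -1.23): complement of the left area
p_between = Z.cdf(1.50) - Z.cdf(-2.00)  # P(-2.00 < z < 1.50): difference of left areas

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
# 0.9429 0.8907 0.9104
```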
Finding Percentiles of Standard Normal Distribution
Finding the 95th percentile: the area to the right is 5% (0.05), so the z score will be positive: z = 1.645.
Graph from Elementary Statistics, 10th Edition
Finding Percentiles (continued)
Finding the bottom 2.5% and upper 2.5% (one z score will be negative and the other positive): z = –1.96 and z = 1.96.
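Percentiles are the inverse problem: given an area to the left, find the z score. A short Python sketch with `statistics.NormalDist.inv_cdf` (Python is an assumption; the slides use R and Stata) recovers the values shown above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

z_95 = Z.inv_cdf(0.95)      # 95th percentile: 95% of the area lies to the left
z_lower = Z.inv_cdf(0.025)  # cuts off the bottom 2.5%
z_upper = Z.inv_cdf(0.975)  # cuts off the upper 2.5%

print(round(z_95, 3), round(z_lower, 2), round(z_upper, 2))
# 1.645 -1.96 1.96
```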
Use R to compute standard normal percentiles
R can be used to find the standard normal percentiles.
Use Stata to compute standard normal probability
Stata can be used to find the standard normal probabilities.
Use Stata to compute standard normal percentiles
Stata can be used to find the standard normal percentiles.
Z-TRANSFORMATION
To compute probabilities for a non-standard normal distribution, convert it to the standard normal (Z-transformation):
$z = \frac{x - \mu}{\sigma}$, $Z = \frac{Y - \mu}{\sigma}$
Graph from Elementary Statistics, 10th Edition
If Y is a normal random variable with mean µ and variance σ², then Z is a standard normal random variable.
Example – Systolic blood pressure (p.180)
For the population of 18- to 74-year-old males in the U.S., systolic blood pressure is approximately normally distributed with mean 129 millimeters of mercury (mm Hg) and standard deviation 19.8 mm Hg [5].
What is the proportion of men in the population who have systolic blood pressures greater than 150 mm Hg?
Example - cont
µ = 129 mm Hg, σ = 19.8 mm Hg
$z = \frac{150 - 129}{19.8} = 1.06$
P(Y > 150 mm Hg) = P(Z > 1.06) = 1 – 0.8554 = 0.1446
Elementary Statistics, 10th Edition
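The blood pressure calculation can be done either through the Z-transformation or directly on the non-standard normal distribution. The Python sketch below (Python is an assumption; the slides use R and Stata) shows both routes giving the same answer. Because it does not round z to two decimals, the result is about 0.144, slightly different from the table-based value.

```python
from statistics import NormalDist

mu, sigma = 129, 19.8  # mean and standard deviation of systolic blood pressure (mm Hg)

# Route 1: Z-transformation, then use the standard normal distribution.
z = (150 - mu) / sigma
p_via_z = 1 - NormalDist().cdf(z)

# Route 2: work directly with the normal distribution of Y.
p_direct = 1 - NormalDist(mu, sigma).cdf(150)

print(round(z, 2))  # 1.06
print(round(p_via_z, 3), round(p_direct, 3))
```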
Use R to compute Normal Probability
Example – Finding percentiles
Find the values that cut off the upper and lower 2.5% of the curve of systolic blood pressure.
Example – Finding percentiles (cont.)
$y = \mu + z\sigma = 129 + 1.96 \times 19.8 = 167.81$
“The pressure of 167.81 (mm Hg) separates the lowest 97.5% from the highest 2.5%”
By symmetry, the lower cutoff is $y = 129 - 1.96 \times 19.8 = 90.19$ mm Hg.
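The cutoffs for the upper and lower 2.5% of systolic blood pressure can also be obtained by inverting the normal distribution directly. This is a minimal Python sketch with `statistics.NormalDist.inv_cdf` (Python is an assumption; the slides use R and Stata).

```python
from statistics import NormalDist

sbp = NormalDist(mu=129, sigma=19.8)  # systolic blood pressure model from the example

lower = sbp.inv_cdf(0.025)  # 2.5% of the population lies below this value
upper = sbp.inv_cdf(0.975)  # 2.5% of the population lies above this value

print(round(lower, 2), round(upper, 2))  # 90.19 167.81
```

This is equivalent to computing µ ± 1.96σ by hand, because `inv_cdf` applies the reverse of the Z-transformation.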
Use R to compute Normal Probability
Use Stata to compute Normal Probability
Other Probability Distributions
For counts: Poisson distributions
For positive continuous variables: Exponential, Gamma, Weibull, and Chi-square distributions
For percentages or proportions as continuous variables: Beta distributions
And MANY others
Reference
Principles of Biostatistics (Pagano and Gauvreau)
Elementary Statistics by Triola, 10th edition.
Applied Statistics for Engineers and Scientists by Petruccelli et al.
Acknowledgements
Prof. Jayson Wilbur, WPI
Prof. Balgobin Nandram, WPI
Prof. Lee Jaeyong, Seoul National University
Some slides provided by Pearson Education, Inc., publishing as Pearson Addison-Wesley