Apr 21, 2023
Chapter 7: Normal Probability Distributions
In Chapter 7:
7.1 Normal Distributions
7.2 Determining Normal Probabilities
7.3 Finding Values That Correspond to Normal Probabilities
7.4 Assessing Departures from Normality
§7.1: Normal Distributions
• Normal random variables are the most common type of continuous random variable
• First described by de Moivre in 1733
• Laplace elaborated the mathematics in 1812
• Describe some (not all) natural phenomena
• More importantly, describe the behavior of means
Normal Probability Density Function
• Recall that continuous random variables are described with smooth probability density functions (pdfs) – Ch 5
• Normal pdfs are recognized by their familiar bell shape
This is the age distribution of a pediatric population. The overlying curve represents its Normal pdf model
Area Under the Curve
• The darker bars of the histogram correspond to ages less than or equal to 9 (~40% of observations)
• The darker area under the curve also corresponds to ages less than or equal to 9 (~40% of the total area)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Parameters μ and σ
• Normal pdfs are a family of distributions
• Family members are identified by parameters μ (mean) and σ (standard deviation)
• μ controls location; σ controls spread
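The Normal density with parameters μ and σ can be evaluated directly. The sketch below is only an illustration: the function name `normal_pdf` and the choice μ = 0, σ = 1 are assumptions made for the example, and the result is cross-checked against Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the Normal density curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative parameters (assumed for this example, not taken from the slides).
mu, sigma = 0.0, 1.0

# The curve is highest at x = mu.
peak = normal_pdf(mu, mu, sigma)
print(round(peak, 4))

# Cross-check against the standard library's implementation.
print(math.isclose(peak, NormalDist(mu, sigma).pdf(mu)))
```

Changing `mu` slides the curve left or right without changing its shape; increasing `sigma` lowers the peak and widens the curve.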
Mean and Standard Deviation of Normal Density
[Figure: Normal density curve labeled with μ and σ]
Standard Deviation σ
• Points of inflection (where the slope of the curve begins to level) occur one σ below and one σ above μ
• Practice sketching Normal curves to feel inflection points
• Practice labeling the horizontal axis of curves with standard deviation markers (figure)
68-95-99.7 Rule for Normal Distributions
• 68% of the AUC falls within ±1σ of μ
• 95% of the AUC falls within ±2σ of μ
• 99.7% of the AUC falls within ±3σ of μ
Example: 68-95-99.7 Rule
Wechsler adult intelligence scores are Normally distributed with μ = 100 and σ = 15; X ~ N(100, 15). Using the 68-95-99.7 rule:
• 68% of scores fall in μ ± σ = 100 ± 15 = 85 to 115
• 95% of scores fall in μ ± 2σ = 100 ± (2)(15) = 70 to 130
• 99.7% of scores fall in μ ± 3σ = 100 ± (3)(15) = 55 to 145
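The 68-95-99.7 rule is a rounding of exact areas under the curve. As a rough check, the sketch below computes those areas for the Wechsler example using the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Wechsler adult intelligence scores: X ~ N(100, 15)
iq = NormalDist(mu=100, sigma=15)

areas = {}
for k in (1, 2, 3):
    low = iq.mean - k * iq.stdev
    high = iq.mean + k * iq.stdev
    # Area between the two boundaries = difference of cumulative probabilities.
    areas[k] = iq.cdf(high) - iq.cdf(low)
    print(f"{low:.0f} to {high:.0f}: {areas[k]:.4f}")
```

The exact areas are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.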
Symmetry in the Tails
Because the Normal curve is symmetrical and the total AUC adds to 1, we can determine the AUC in the tails. For example, because 95% of the AUC is in μ ± 2σ, 2.5% is in each tail beyond μ ± 2σ.
Example: Male Height
• Male height is approximately Normal with μ = 70.0˝ and σ = 2.8˝
• Because of the 68-95-99.7 rule, 68% of the population is in the range 70.0˝ ± 2.8˝ = 67.2˝ to 72.8˝
• Because the total AUC adds to 100%, 32% are in the tails below 67.2˝ and above 72.8˝
• Because of symmetry, half of this 32% (i.e., 16%) is below 67.2˝ and 16% is above 72.8˝
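The tail areas in the male height example can be checked numerically. This is a minimal sketch using `statistics.NormalDist`; the names `below` and `above` are illustrative.

```python
from statistics import NormalDist

# Male height: approximately Normal with mu = 70.0 in. and sigma = 2.8 in.
height = NormalDist(mu=70.0, sigma=2.8)

below = height.cdf(67.2)        # area in the left tail, below mu - sigma
above = 1 - height.cdf(72.8)    # area in the right tail, above mu + sigma

print(f"below 67.2: {below:.4f}")
print(f"above 72.8: {above:.4f}")
```

Both tails come out near 0.159, which the rule rounds to 16%; they are equal because the curve is symmetrical.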
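[Figure: Normal curve for male height with 67.2, 70, and 72.8 marked on the horizontal axis; 68% of the area lies in the middle and 16% in each tail]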
Reexpression of Non-Normal Variables
• Many biostatistical variables are not Normal
• We can reexpress non-Normal variables with a mathematical transformation to make them more Normal
• Examples of mathematical transforms include logarithms, exponents, square roots, and so on
• Let us review the logarithmic transformation
Logarithms
• Logarithms are exponents of their base
• There are two main logarithmic bases
– common log10 (base 10)
– natural ln (base e)
Landmarks:
• log10(1) = 0 (because 10^0 = 1)
• log10(10) = 1 (because 10^1 = 10)
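The landmarks can be reproduced with Python's `math` module. This is a small illustrative sketch; the value 1000 is an arbitrary choice for the example.

```python
import math

# Common (base 10) logarithm landmarks
log_of_1 = math.log10(1)     # exponent that turns 10 into 1
log_of_10 = math.log10(10)   # exponent that turns 10 into 10
print(log_of_1, log_of_10)

# Natural (base e) logarithm: ln(e) is 1 because e^1 = e
ln_of_e = math.log(math.e)

# A logarithm is undone by raising the base to it (the anti-log).
value = 1000
round_trip = 10 ** math.log10(value)
print(math.isclose(round_trip, value))
```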
Example: Logarithmic Re-expression
• Prostate specific antigen (PSA) is not Normal in 60-year-olds, but ln(PSA) is approximately Normal with μ = −0.3 and σ = 0.8
• 95% of ln(PSA) values fall in μ ± 2σ = −0.3 ± (2)(0.8) = −1.9 to 1.3
• Thus, 2.5% of ln(PSA) values are above 1.3; take the anti-log of 1.3: e^1.3 = 3.67
• Since only 2.5% of the population has PSA values greater than 3.67, use this as the cut-point for suspiciously high results
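The PSA cut-point calculation can be sketched as follows, working on the log scale and then converting back with the anti-log. The variable names are illustrative; the exact upper-tail area is also computed to show how much rounding the 68-95-99.7 rule involves.

```python
import math
from statistics import NormalDist

# ln(PSA) is approximately Normal with mu = -0.3 and sigma = 0.8
ln_psa = NormalDist(mu=-0.3, sigma=0.8)

# Upper boundary of mu +/- 2 sigma on the log scale
upper_ln = ln_psa.mean + 2 * ln_psa.stdev

# Anti-log converts the boundary back to the original PSA scale.
cut_point = math.exp(upper_ln)

# Exact area above the boundary (the rule approximates this as 2.5%).
tail = 1 - ln_psa.cdf(upper_ln)

print(f"cut-point: {cut_point:.2f}")
print(f"exact upper tail: {tail:.4f}")
```

The cut-point comes out near 3.67. The exact tail area is slightly under 2.5% because ±2σ covers a little more than 95% of the curve.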
§7.2: Determining Normal Probabilities
To determine a Normal probability when the value does not fall directly on a ±1σ, ±2σ, or ±3σ landmark, follow this procedure:
1. State the problem
2. Standardize the value (z score)
3. Sketch and shade the curve
4. Use Table B to determine the probability
Example: Normal Probability
Step 1. Statement of Problem
• We want to determine the percentage of human gestations that are less than 40 weeks in length
• We know that uncomplicated human pregnancy from conception to birth is approximately Normally distributed with μ = 39 weeks and σ = 2 weeks. [Note: clinicians measure gestation from last menstrual period to birth, which adds 2 weeks to the μ.]
• Let X represent human gestation: X ~ N(39, 2)
• Statement of the problem: Pr(X ≤ 40) = ?
Standard Normal (Z) Variable
• Standard Normal variable ≡ a Normal random variable with μ = 0 and σ = 1
• Called “Z variables”
• Notation: Z ~ N(0,1)
• Use Table B to look up cumulative probabilities
• Part of Table B shown on next slide…
Example: A Standard Normal (Z) variable with a value of 1.96 has a cumulative probability of .9750.
Normal Probability
Step 2. Standardize
To standardize, subtract μ and divide by σ:
$$z = \frac{x - \mu}{\sigma}$$
For example, the value 40 from X ~ N(39, 2) has
$$z = \frac{40 - 39}{2} = 0.5$$
The z-score tells you how many σ-units the value falls above or below μ.
Steps 3 & 4. Sketch and Use Table B
3. Sketch and label axes
4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915
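The standardize-then-look-up procedure for the gestation example can be sketched in code, with the standard Normal cumulative probability from `statistics.NormalDist` standing in for Table B. The variable names are illustrative.

```python
from statistics import NormalDist

# Gestation length: X ~ N(39, 2); problem: Pr(X <= 40) = ?
mu, sigma = 39, 2
x = 40

# Step 2. Standardize: subtract mu and divide by sigma.
z = (x - mu) / sigma

# Step 4. Cumulative probability of the standard Normal (in place of Table B).
p = NormalDist().cdf(z)

print(f"z = {z}")
print(f"Pr(Z <= {z}) = {p:.4f}")
```

This reproduces z = 0.5 and a cumulative probability of about 0.6915, so roughly 69% of gestations are 40 weeks or shorter under this model.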
Probabilities Between Two Points
Let a represent the lower boundary and b represent the upper boundary of a range:
Pr(a ≤ Z ≤ b) = Pr(Z ≤ b) − Pr(Z ≤ a)
Use of this concept will be demonstrated in class and on HW exercises.
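As a sketch of the between-two-points rule, the helper below subtracts two cumulative probabilities of the standard Normal. The function name `prob_between` and the boundaries −1.96 and 1.96 are assumptions chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

def prob_between(a: float, b: float) -> float:
    """Pr(a <= Z <= b) = Pr(Z <= b) - Pr(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)

# Illustrative boundaries (assumed for this example)
p = prob_between(-1.96, 1.96)
print(f"{p:.4f}")
```

With these boundaries the result is close to 0.95, consistent with z.975 = 1.96 leaving 2.5% in each tail.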
§7.3: Finding Values Corresponding to Normal Probabilities
1. State the problem
2. Use Table B to look up the z-percentile value
3. Sketch
4. Unstandardize with this formula:
$$x = \mu + z_p \sigma$$
Looking up the z percentile value
Use Table B to look up the z percentile value, i.e., the z score for the probability in question
Look inside the table for the entry closest to the associated cumulative probability.
Then trace the z score to the row and column labels.
Notation: Let z_p represent the z score with cumulative probability p, e.g., z.975 = 1.96
Suppose you wanted the 97.5th percentile z score. Look inside the table for .9750. Then trace the z score to the margins.
Finding Normal Values - Example
Suppose we want to know what gestation length is shorter than 97.5% of all gestations.
Step 1. State the problem!
Let X represent gestation length
Prior problem established X ~ N(39, 2)
We want the gestation length that is shorter than .975 of all gestations. This is equivalent to the gestation length that is longer than .025 of gestations.
Example, cont.
Step 2. Use Table B to look up the z value. Table B lists only “left tails”: “less than 97.5%” (right tail) = “greater than 2.5%” (left tail).
The z lookup in the table shows z.025 = −1.96
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
–1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
3. Sketch
4. Unstandardize: $x = \mu + z_p \sigma = 39 + (-1.96)(2) = 35.08 \approx 35$
“The 2.5th percentile gestation is 35 weeks.”
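The percentile lookup and the unstandardizing step can be sketched in code. Here `NormalDist().inv_cdf` from the standard library plays the role of reading Table B in reverse; the variable names are illustrative.

```python
from statistics import NormalDist

# Gestation length: X ~ N(39, 2); we want the 2.5th percentile.
mu, sigma = 39, 2
p = 0.025

# Step 2. z score with cumulative probability p (in place of Table B).
z_p = NormalDist().inv_cdf(p)

# Step 4. Unstandardize: x = mu + z_p * sigma
x = mu + z_p * sigma

print(f"z_p = {z_p:.2f}")
print(f"x = {x:.2f} weeks")
```

This gives z_p of about −1.96 and x of about 35.08 weeks, matching the hand calculation.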
§7.4: Assessing Departures from Normality
Normal “Q-Q” Plot of same distribution
Approximately Normal histogram
The best way to assess Normality is graphically
A Normal distribution will adhere to a diagonal line on the Q-Q plot
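To make the idea of a Q-Q plot concrete, the sketch below computes the plotted points: each sorted observation is paired with the value a Normal model would expect at the same rank. The function name `qq_points`, the plotting position (i − 0.5)/n, and the data values are all assumptions made for illustration; statistical packages may use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the Normal quantile expected at its rank."""
    xs = sorted(data)
    n = len(xs)
    # Normal model fitted with the sample mean and sample standard deviation.
    model = NormalDist(mean(xs), stdev(xs))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [(model.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Made-up illustrative data, not taken from the slides.
ages = [4, 5, 6, 7, 8, 9, 10, 11, 12]

points = qq_points(ages)
for expected, observed in points:
    print(f"{expected:6.2f} {observed:6.2f}")
```

Plotting observed against expected values gives the Q-Q plot; points that track the diagonal suggest the Normal model fits, while systematic curvature suggests a departure from Normality.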
Negative Skew
A negative skew will show an upward curve on the Q-Q plot
Positive Skew
A positive skew will show a downward curve on the Q-Q plot
Same data as previous slide but with logarithmic transform
A mathematical transform can Normalize a skew
Leptokurtic
A leptokurtic distribution (narrow peak, heavy tails) will show an S-shape on the Q-Q plot