lecture 2 : association and inference
DESCRIPTION
Lecture 2 : Association and Inference. Karen Bandeen -Roche, PhD Department of Biostatistics Johns Hopkins University. July 12, 2011. Introduction to Statistical Measurement and Modeling. Data motivation. Osteoporosis screening - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/1.jpg)
Lecture 2Lecture 2: Association and Inference: Association and Inference
July 12, 2011July 12, 2011
Karen Bandeen-Roche, PhDDepartment of Biostatistics
Johns Hopkins University
Introduction to Statistical Measurement
and Modeling
![Page 2: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/2.jpg)
Data motivation Osteoporosis screening
Scientific question: Can we detect osteoporosis earlier and more safely?
Some related statistical questions:
Are ultrasound measurements associated with osteoporosis status?
How strong is the evidence of association?
By how much do the mean ultrasound values differ between those with and without osteoporosis?
How precisely can we determine that difference?
![Page 3: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/3.jpg)
Data motivation
control case
1600
1700
1800
1900
2000
Ultrasound scores by osteoporosis groups
control case
0.6
0.7
0.8
0.9
1.0
1.1
1.2
DPA scores by osteoporosis groups
![Page 4: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/4.jpg)
Outline Association
Conditional probability
Joint distributions / Independence (note SRS) Correlation
Statistical evidence / certainty
Confidence intervals
Statistical tests
![Page 5: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/5.jpg)
Association - Heuristic Two random variables are associated if…
… they are “connected” or “related”
… knowing one helps predict the other
“Association” and “causality” are not equivalent Two variables may be associated because a
third variable causes each
Even if an association is causal the direction may not be clear (which causes which)
Association is necessary, but not sufficient, for causality
![Page 6: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/6.jpg)
Association - Formal Tool: Conditional probability
Definition for two events A and B: If P(B) > 0, the conditional probability that event A occurs, given that event B has occurred, is P(A|B) := P(A∩B)/P(B)
Key concept: independence A,B are pairwise independent if P{A,B} = P{A}P{B}.
Events {Aj, j = 1,...,n} are mutually independent if P{∩j=1
n Aj} = ∏1n P{XjεAj}
A, B are independent iff P(A|B) = P(A)
Osteoporosis example: A=ultrasound<1750, B=case?
![Page 7: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/7.jpg)
Association - Formal What about random variables?
Joint distribution function: FX,Y(x,y) = P{X≤x,Y≤y} = P{{X≤x} ∩ {Y≤y}}
Joint mass, density functions:
Discrete X, Y: pXY(x,y)=P{X=x,Y=y}
Continuous X, Y: fX,Y (x,y) = d2/(dsdt) FX,Y (s,t) |(x,y)
Conditional mass, density functions:
Discrete X, Y: pX|Y(x|y)=pXY(x,y)/pY(y)
Continuous X, Y: f(x|y) = fX,Y(x,y)/fY (y)
![Page 8: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/8.jpg)
Association - Formal Key concept: independence
RVs X,Y are independent if for each x є SX , y є SY
FX,Y(x,y) = FX(x)FY(y)
pX|Y(x|y)=pX(x) (discrete X)
f(x|y) = fX(x) (continuous Y)
Osteoporosis example: X=ultrasound score, Y = 1 if osteoporosis case and Y=0 if osteoporosis control Are the ultrasound densities the same for cases &
controls?
Concepts generalize to mixed continuous / discrete (etc.), multiple random variables
![Page 9: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/9.jpg)
Association - Examples Discrete random variables
Let X=1 if ultrasound < 1750, X=0 otherwise
Y = 1 if osteoporosis case and Y=0 if control
Suppose the joint mass function in a population of older women is as follows:
Are low ultrasound and osteoporosis associated?
Y=0 Y=1
X=0 .43 .07
X=1 .10 .40
![Page 10: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/10.jpg)
Association - Examples Continuous random variables
A pair of random variables (X,Y) is said to be bivariate normal if it is distributed according to the following density:
is the “correlation” parameter
Bivariate normal X,Y are independent if =0
f x yx x y y
( , ) exp
1
2 1
1
2 12
1 22 2
1
1
2
1
1
2
2
2
2
f xx
1
2
1
212
1
1
2
exp
![Page 11: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/11.jpg)
Association - Examples Continuous random variables
![Page 12: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/12.jpg)
Association - Formal A related concept to independence: Covariance
Heuristic: measures degree of linear relationship between variables.
Covariance=Cov(X,Y) =E{[X-E(X)][Y-E(Y)]} = E[XY] - E[X]E[Y]
Note: Details provided on adjunct handout
Two relationships now easy to characterize:
a) Pairwise independence ⇒ Cov(X,Y) = 0; not vice versa
b) Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)
![Page 13: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/13.jpg)
Association - Example Correlation of (ultrasound, DPA) = 0.36
0.6 0.7 0.8 0.9 1.0 1.1 1.2
16
00
17
00
18
00
19
00
20
00
DPA
Ultr
aso
un
d s
core
![Page 14: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/14.jpg)
Association - Formal X, Y are dependent (not independent) if there exist x1 є SX,
y1, y2 є SY such that
FX,Y(x1,y1) ≠ FX(x1)FY(y1)
pX|Y(x1|y1) ≠ pX(x1) ≠ pX|Y(x1|y2) (discrete X)
fX|Y(x1|y1) ≠ fX(x1) ≠ fX|Y(x1|y2)(continuous Y)
A sufficient but not necessary condition: X, Y are dependent (not independent) if the mean of X varies conditionally on Y
E{X|Y} = ∫xfX|Y(x|Y)dx (continuous)
= ΣxεSx xpX|Y(x|Y) (discrete)
> Basis of regression!
![Page 15: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/15.jpg)
Data motivation
control case
1600
1700
1800
1900
2000
Ultrasound scores by osteoporosis groups In our data sample the means are certainly different for the women with versus without osteoporosis
How persuasive is the evidence of a mean difference in a population of older women?
= 1828X
= 1688X
![Page 16: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/16.jpg)
Basic paradigm of statistics We wish to learn about populations
All about which we wish to make an inference
“True” experimental outcomes and their mechanisms
We do this by studying samples A subset of a given population
“Represents” the population
Sample features are used to infer population features Method of obtaining the sample is important
Simple random sample: All population elements / outcomes have equal probability of inclusion
![Page 17: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/17.jpg)
Basic paradigm of statistics
Truth for Population
Observed Value for a
Representative Sample
Probability
Statistical inference
Our example: thedifference in mean ultrasound measurementBetween older womenWith and without osteoporosis
![Page 18: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/18.jpg)
Basic paradigm of statistics Identify a parameter that summarizes the
scientific quantity of interest
Obtain a representative sample of the target population
X1, … , Xn (possibly within groups)
Estimate the parameter using the sample (statistic)
Characterize the variability of the estimate
Make an inference, characterize its uncertainty, and draw a conclusion
![Page 19: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/19.jpg)
Basic paradigm Identify a parameter that summarizes the
scientific quantity of interest
µ1 = E[X|Y=1] = mean ultrasound given osteoporosis
µ0 = E[X|Y=0] = mean ultrasound given no osteoporosis
Target parameter: µ1 - µ0
Obtain a representative sample of the target population The current sample was a clinical series.
Questionable how representative.
![Page 20: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/20.jpg)
Basic paradigm Estimate the parameter using the sample
(statistic) Many estimation methods have been developed.
Proposed estimator here: Means based on ECDF
“Method of moments”
Pause: Is this a good estimator? What makes it one?
Consider a generic estimator (statistic) gn(X)
Denote the target parameter we wish to estimate by Θ
X X1 0
![Page 21: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/21.jpg)
Basic paradigm Some properties of a good estimator gn(X)
Accuracy with respect to target parameter Θ
Unbiased: E[gn(X) ] = Θ
That is, bias = E[gn(X)] – Θ = 0
Consistent: For each ε > 0, limn→∞ P{|gn(X)-Θ|> ε} = 0or (stronger) P{limn→∞ gn(X) = Θ} = 1
Precision: small Var(gn(X))
Good tradeoff of accuracy and precision:
Low mean squared error
= E(gn(X)-Θ)2 = Var(gn (X))+[Bias(gn (X))]2
![Page 22: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/22.jpg)
Basic paradigm Estimate the parameter using the sample
(statistic)
Characterize the variability of the estimate Method 1: Calculate the variance directly
Assumption: Sampling method ⇒ mutual independence
Then, Var( ) = Var( ) + Var( )
Var( ) = 1/n1 Var(X1) = σ12/n1
Var( ) = σ12/n1 + σ0
2/n0
X X1 0
X X1 0 X 1 X 0
X 1
X X1 0
![Page 23: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/23.jpg)
*** Basic paradigm *** Characterize the variability of the estimate
Method 1: Var( ) = σ12/n1 + σ0
2/n0
What does this mean?
We “have” one possible sample among many that could have been taken from the population
Suppose it was drawn randomly
Definition: The values that the statistic of interest could have taken over all possible samples (of the same size randomly drawn from the population), and their probability measure, is called the sampling distribution of that statistic.
σ12/n1 + σ0
2/n0 is the variance of the sampling distribution
Important term: The standard deviation of the sampling distribution is called the standard error
X X1 0
![Page 24: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/24.jpg)
24
1. Treat the sample as if it is the whole population. The original sample statistic becomes the true value (“truth”) we seek.
2. Draw the first bootstrap sample at random (with replacement) from the original sample and calculate the statistic of interest.
3. Repeat this process 1000* times. The distribution of bootstrapped statistics approximates the sampling distribution of the statistic.
Way to “see” the sampling distribution: Bootstrap-Efron,
1979 Idea: mimic sampling that produced the original sample.
Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann Stat 1979; 7:1-26.
![Page 25: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/25.jpg)
25
Repeat 1000 TimesBootstrapped
mean difference:
-150.14
-118.16
-128.30
-168.46
-157.69
.
.
.
-196.65
Original sample value of -140.48
![Page 26: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/26.jpg)
Characterize variability of gn(X)
Method 2: Var( ) estimated by the variance of the bootstrapped distribution Variance is still a bit difficult to interpret
Method 3: Employ the Central Limit Theorem Distributions of sample means converge to normal as n→∞
Definition: Fgn(X)(s) converges to F(s) in distribution ⇒ limn→∞ P{gn(X)≤s} = F(s) at every continuity point of F.
Central limit theorem: Let X1,...,Xn be a sequence of mutually independent RVs with common distribution. Define Sn=Σi Xi. Then limn→∞ P{(Sn-nμ)/(σn1/2) ≤ z} = Φ(z) for every fixed z , where Φ is the Normal CDF with mean=0 and variance=1.
X X1 0
![Page 27: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/27.jpg)
Characterize variability of gn(X)
Implications of the Central Limit Theorem
~ 95% of estimates within +/- 1.96 standard errors of µ1 - µ0
![Page 28: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/28.jpg)
Characterize variability of gn(X)
Implications of the Central Limit Theorem
~ 95% of estimates within +/- 1.96 standard errors of µ1 - µ0
P{µ1 - µ0 ε ± 1.96 SE( )} = 0.95
Many gn(X) converge in distribution to normal
Broadly: P[Θ ε gn(X) ± z(1-α/2) SE{gn(X)}] ≈ 1-α with z(1-α/2) = Q{(1-α/2)} of the normal distribution with µ=0 and σ=1
An interval I(X) = [L{gn(X)},U{gn(X)}] satisfying P{Θ ε I(X)}= 1-α is called a 100x(1-α) confidence interval
X X1 0 X X1 0
![Page 29: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/29.jpg)
29
Bootstrap Confidence Interval
Fact: the interval that includes the middle 95% of the bootstrapped statistics covers the true unknown population value for roughly 95% of samples if the sampling distribution of the statistic or a function thereof is roughly Gaussian.
Original sample value of -140.48
![Page 30: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/30.jpg)
Analytic Confidence Interval (CI)
Sample mean difference = 1688-1828 = -140
Var( ) = σ12/n1 + σ0
2/n0; SE = square root of this
Estimate σ12, σ0
2 by sample analogs s12, s0
2 = (5362,13570)
n1 = n0 = 21
SE = √{5362/21 + 13570/21} = 30.03
95% CI = (-140-1.96*30.03,-140+1.96*30.03)= (-199,-81)
= an interval in which we have 95% confidence for including the difference in mean ultrasound scores between osteoporotic and non-osteoporotic older women
X X1 0
X X1 0
![Page 31: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/31.jpg)
Analytic CI – A detail The preceding calculation assumed
/ √ (s12/n1 + s0
2/n0)
approximately distributed as normal mean=0, variance=1
This does not account for variability in substituting s for σ
Rather, the correct distribution is “t” (close to normal for large n)
With this correction the 95% CI is (-202,-79)
X X1 0
![Page 32: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/32.jpg)
Basic paradigm of statistics Make an inference, characterize its
uncertainty, and draw a conclusion Inference Method 1: (-202,-79) is a 95% CI for
the difference in mean ultrasound scores between those with, without osteoporosis.
Conclusion: Based on the data, we are confident that the mean ultrasound score is substantially lower in osteoporotic older women than in those without osteoporosis.
The conclusion is a population statement.
![Page 33: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/33.jpg)
Basic paradigm of statistics Make an inference, characterize its
uncertainty, and draw a conclusion Inference Method 2: Statistical testing
Two goals
Assess evidence for a statement / hypothesis
Ultrasound measurements are (or are not) associated with osteoporosis status
Make yes/no decision about question of interest:
Are ultrasound measurements associated with osteoporosis status?
![Page 34: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/34.jpg)
Decision making Neyman-Pearson framework for hypothesis testing
Define complementary hypotheses about the truth in the population Denote as θ the parameter space={possible values of Θ}
Define θ0, θ1 as distinct subsets of θ corresponding to the different hypotheses
Example: No difference in means versus a difference in means
“Null” Hypothesis - H0: Θ ε θ0 (H0: µ1 - µ0 = 0)
“Alternative” Hypothesis - H1: Θ ε θ1 (H1: µ1 - µ0 ≠ 0)
![Page 35: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/35.jpg)
Decision making Choose an estimator of Θ, gn(X) (as above)
Main idea: If the value of gn(X) observed in one’s data is “unusual” assuming H0, then conclude H0 is not a reasonable model for the data and decide against it
Testing – steps:
Compute distribution of gn(X) for Θ = θ0
Define a “rejection” region of values far from θ0 occurring with low probability = α when H0 is true
Reject H0 if gn(x) is in the “rejection” region; do not reject it otherwise.
![Page 36: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/36.jpg)
Rejection region - example
We are interested in
If H0 is true (µ1 - µ0 = 0), then ( - 0)/SE is approximately distributed as normal with mean = 0 and variance = 1 (henceforth, N(0,1) )
X X1 0
X X1 0
Density if H0 is true
![Page 37: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/37.jpg)
Possible Results of Hypothesis Testing
Reject when H0 true
Fail to reject when H1 true
![Page 38: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/38.jpg)
Significance testing Variant of Neyman-Pearson
Based on calculation of a p-value:
The probability of observing a test statistic as or more extreme than occurred in sample under the null hypothesis
As or more extreme: as different or more different from the hypothesized value
p-value sometimes used as measure of evidence against null, and often, "for" alternative
![Page 39: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/39.jpg)
Osteoporosis data Are ultrasound measurements associated with
osteoporosis status?
Hypotheses: H0: µ1 - µ0 = 0 vs. H1: µ1 - µ0 ≠ 0
Test statistic: /√ (s12/n1 + s0
2/n0) =140/30.03=4.66
Rejection region: > 1.96 or < -1.96
4.66 > 1.96, thus we reject H0 (again, approximate: “t”)
p-value: probability of a value as far or farther from 0 than 4.66 if the true mean difference were 0
P{gn(X) > 4.66 or gn(X) < -4.66} = 3.16E-06
Conclusion: Data strongly support a “yes” answer.
X X1 0
![Page 40: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/40.jpg)
Data motivation Osteoporosis screening
Scientific question: Can we detect osteoporosis earlier and more safely?
Some related statistical questions:
Are ultrasound measurements associated with osteoporosis status? - Testing indicates “yes”.
How strong is the evidence of association? Strong (95% CI very substantially excludes values near 0)
By how much do the mean ultrasound values differ between those with and without osteoporosis? We estimate that older women with osteoporosis have mean 140 dB/MHz lower than those without osteoporosis
How precisely can we determine that difference? Standard error = 30.03; 95% CI = (-202,-79)
![Page 41: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/41.jpg)
Main points Variables (characteristics, features, etc.) are
associated if they are statistically dependent.
Covariance / correlation measure linear association
We use representative samples to study features of populations
Estimators (“statistics”) provide approximations of population parameters Possible values and their probabilities are characterized
by sampling distributions
Standard errors and confidence intervals summarize their associated variability / uncertainty
![Page 42: Lecture 2 : Association and Inference](https://reader036.vdocuments.site/reader036/viewer/2022062423/56814e93550346895dbc3ca3/html5/thumbnails/42.jpg)
Main points Statistical tests evaluate evidence and
inform decisions about scientific (and other) questions Neyman-Pearson paradigm
Hypothesis testing
Significance testing