llsk_confidence interval printout
TRANSCRIPT
-
8/11/2019 Llsk_confidence Interval Printout
Confidence Intervals
Prof. Benjamin HK Yip
Division of Family Medicine
School of Public Health and Primary Care
-
Overview
Outline
• Background
• Confidence intervals (CIs)
• Examples
Learning Objectives
• To understand CI construction
• To be able to name 3 factors that affect CIs
• To be able to interpret CIs found in the literature
-
Motivation
• Research/clinical question: Do HK young adults have a low BMD (L2-L4)?
• Example: In this particular class, the sample mean of BMD (L2-L4) is 0.96 g/cm².
• Questions:
  ◦ How meaningful is this sample mean?
  ◦ Would you trust this estimate?
-
Statistical inference
• Methods for drawing conclusions about a population from sample data:
  ◦ Parameter estimation and confidence intervals
  ◦ Hypothesis testing (p-values)
• Question: What allows us to make valid inferences about a population based only on a sample?
  ◦ Probability (see previous lecture)
  ◦ A random process, i.e., randomization is the key
-
Why should I sample instead of using the entire population?
The reasons to avoid using the entire population are as follows:
• Cost ($) and time
• Impractical
• Inaccurate
  ◦ There is a lot of error to control and monitor
  ◦ Lists are rarely up to date
• Random sampling
-
Terminology
Population parameter
• A quantity that describes a population.
Sample statistic
• An estimate of the population parameter.
Statistical inference
• The process of drawing conclusions about a population based on observations in a sample.
-
Framework for statistical inference
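[Diagram: a sample is drawn from the population by sampling (random sampling, study design); inference (statistical estimation, hypothesis testing) runs from the sample back to the population. Population parameters (Greek symbols such as σ²) are paired with the sample statistics x̄, s², r and p.]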
-
Example of population, sample and parameters
Arbitrary population:
• Objective: Smoking => cancer
• Population: ???
• The underlying true process, which holds universally for the population
-
Definitions
Confidence interval (CI)
• A range of values that probably contains the population value
Confidence limits
• The values that state the boundaries of the confidence interval
-
Construction
Most CIs have the following form:
Sample statistic ± (critical value) × (SE of sample statistic)
The second term, (critical value) × (SE of sample statistic), is the margin of error.
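The general form can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the input numbers are assumptions chosen for the example, not values from the lecture.

```python
def confidence_interval(estimate, critical_value, standard_error):
    """Return (lower, upper) = estimate -/+ critical_value * standard_error."""
    margin_of_error = critical_value * standard_error
    return estimate - margin_of_error, estimate + margin_of_error


# Hypothetical inputs: point estimate 50, critical value 1.96, SE 2.
lower, upper = confidence_interval(50.0, 1.96, 2.0)
print(f"CI: ({lower:.2f}, {upper:.2f}), margin of error = {1.96 * 2.0:.2f}")
```

Keeping the margin of error as its own quantity makes it easy to see that the interval is always symmetric around the point estimate in this form.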
-
Construction
• The sample statistic is a point estimate based on sampled data (e.g., sample mean, sample proportion).
• The critical value represents the desired confidence level, based on distribution theory (normal, t, Poisson).
• The SE of the sample statistic is a measure of the precision of the sample estimate. When the estimate is a mean (central tendency), it is also called the standard error of the mean (SEM). SE differs from SD, but they are related (see later slides).
-
Critical value
• Decide which distribution the desired CI is based on.
  ◦ Continuous: Normal or Student's t
  ◦ Count: Poisson
  ◦ Binary: Binomial
• In general, the normal distribution (z-table) is the default, provided the sample size is large enough (central limit theorem).
• Decide the type I error rate (α): the probability of incorrectly claiming a significant result (false positive). In general:
  ◦ α = 0.05
-
Recall
$\Pr(-1.96 < z < 1.96) = 0.95$
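The recalled probability can be checked numerically, and the same calculation yields the two-sided critical value for other choices of α. The sketch below assumes a standard normal distribution and uses Python's standard library; the helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_critical(alpha=0.05):
    """Two-sided critical value z_{1 - alpha/2} of the standard normal."""
    return standard_normal.inv_cdf(1 - alpha / 2)


# Probability that a standard normal value lies between -1.96 and 1.96.
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(f"Pr(-1.96 < z < 1.96) = {coverage:.4f}")

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> z = {z_critical(alpha):.3f}")
```

For α = 0.05 the critical value is about 1.96, which is where the number in the recalled statement comes from.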
-
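SE and SD
Quantity | Population | Sample (unbiased estimator)
Mean | $m$ or $\mu$ | $\bar{x}$
Standard deviation | $\sigma$ | $\mathrm{SD} = \sqrt{\frac{1}{n-1}\sum_{i}(x_i - \bar{x})^2}$
$\mathrm{SE} = \frac{\sigma}{\sqrt{N}} = \frac{\mathrm{SD}}{\sqrt{N}}$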
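As a small sketch of how SD and SE relate in practice, the code below computes both from a list of measurements. The data values are hypothetical and the function name `sd_and_se` is an assumption for illustration; `statistics.stdev` uses the n - 1 denominator.

```python
import math
import statistics


def sd_and_se(values):
    """Return (SD, SE) where SD uses the n - 1 denominator and SE = SD / sqrt(n)."""
    sd = statistics.stdev(values)
    se = sd / math.sqrt(len(values))
    return sd, se


# Hypothetical BMD measurements in g/cm^2 (made up for this example).
bmd = [0.91, 0.98, 1.02, 0.95, 0.94, 1.01, 0.89, 0.97]
sd, se = sd_and_se(bmd)
print(f"mean = {statistics.mean(bmd):.3f}, SD = {sd:.3f}, SE = {se:.3f}")
```

With more than one observation the SE is always smaller than the SD, because it divides the SD by the square root of the sample size.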
-
SE vs SD
• The standard deviation (SD) tells you the variability of your data.
• The standard error of the mean (SEM) tells you how good your estimate of the mean is (its precision). It is in general smaller than the SD, but don't let this be a reason for choosing to use it!
• Which one to use depends on the context: do you want to describe the variability of the data or the precision of your mean estimate?
-
Probability and Confidence Interval
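• From the CLT we know that
  $\frac{\bar{x} - \mu}{\mathrm{SE}} \sim z \sim N(0, 1)$
• From a N(0,1) table we have
  $\Pr\left(-1.96 < \frac{\bar{x} - \mu}{\mathrm{SE}} < 1.96\right) = 0.95$
• Rearranging gives
  $\Pr(\bar{x} - 1.96 \times \mathrm{SE} < \mu < \bar{x} + 1.96 \times \mathrm{SE}) = 0.95$
• Thus, the interval $\bar{x} \pm 1.96 \times \mathrm{SE}$ is a 95% CI for $\mu$.
November 08, 2012, Benjamin Yip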
-
*Theory behind CI
• Constructing a CI is simple; it needs only 3 components: a statistic (e.g., the mean), the SE of the statistic, and the desired confidence level (% CI).
• However, the logic behind it is more complicated. It involves three types of standard deviation (SD): the SD of the population, the SD of the sample, and the SD of the sampling distribution.
-
"When we calculate the sample mean we are usually interested not in the mean of this particular sample, but in the mean for individuals of this type: in statistical terms, of the population from which the sample comes. We usually collect data in order to ...
-
[Figure: 95% CIs for the mean diastolic BP from 20 simulated studies, with 50 subjects per simulation. A reference line indicates the true mean (75 mmHg). Only 1 CI missed the true mean.]
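A simulation of this kind can be sketched as follows. The true mean of 75 mmHg, 20 studies and 50 subjects come from the slide; the population SD of 10 mmHg, the normal data model, the seed, and the function name `count_covering_intervals` are assumptions made for the example, so the result will not reproduce the slide's figure exactly.

```python
import math
import random
import statistics


def count_covering_intervals(n_studies=20, n_subjects=50, true_mean=75.0,
                             true_sd=10.0, z=1.96, seed=1):
    """Simulate studies and count how many 95% CIs contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n_subjects)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n_subjects)
        if mean - z * se < true_mean < mean + z * se:
            covered += 1
    return covered


covered = count_covering_intervals()
print(f"{covered} of 20 simulated 95% CIs contain the true mean")
```

Over many repetitions roughly 95% of such intervals contain the true mean, so with 20 studies one miss is about what one would expect on average.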
-
An example
• Suppose that you would like to compare the effect of a newly developed drug (drug A) and a current drug (drug B) on systolic blood pressure (SBP).
• Let's say 35 patients were randomly assigned to receive drug A and another 35 to drug B. The mean SBP among drug A and drug B patients was 107 mmHg (SD = 19) and 125 mmHg (SD = 20), respectively.
• Construct a 95% CI for each group. Do the CIs overlap, and what is the interpretation?
-
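95% CI for Drug A
$95\%\ \mathrm{CI} = \text{mean} \pm z_{1-\alpha/2} \times \mathrm{SE} = \text{mean} \pm z_{1-\alpha/2} \times \frac{\mathrm{SD}}{\sqrt{N}}$
Mean =
$z_{1-\alpha/2}$ =
SE = SD/sqrt(N) =
95% CI =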
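One way to work through the exercise is the short Python sketch below, using the means, SDs and group sizes stated in the example and the normal critical value 1.96. The function name `mean_ci` is an assumption for illustration.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """95% CI for a mean: mean -/+ z * SD / sqrt(N)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


ci_a = mean_ci(mean=107, sd=19, n=35)  # drug A
ci_b = mean_ci(mean=125, sd=20, n=35)  # drug B

print(f"Drug A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"Drug B: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")

# The intervals overlap only if A's upper limit reaches B's lower limit.
overlap = ci_a[1] >= ci_b[0]
print("CIs overlap" if overlap else "CIs do not overlap")
```

For drug A the SE is about 3.21, giving an interval of roughly 100.7 to 113.3 mmHg; for drug B the interval is roughly 118.4 to 131.6 mmHg, so the two do not overlap.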
-
Graph the CIs
[Figure: 95% CIs for Drug A and Drug B plotted on a common axis running from 100 to 140.]
Non-overlapping CIs indicate a true (i.e., significant) mean difference: drug A is more effective at lowering SBP than drug B.
-
In general:
Source: http://www.measuringusability.com/blog/ci-10things.php
-
Factors that affect the width of a CI are:
• Targeted confidence level, 1 - α (higher %, wider CI)
• Sample size, N (larger sample size, shorter CI)
• Variability or standard deviation, σ (or SD) (higher SD, wider CI)
$\text{mean} \pm z_{1-\alpha/2} \times \frac{\mathrm{SD}}{\sqrt{N}}$
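The three factors can be explored with a small sketch that computes the CI width, 2 × z × SD / √N, while changing one quantity at a time. The baseline values (SD = 19, N = 35) are borrowed from the drug A example; the function name `ci_width` and the alternative values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, confidence=0.95):
    """Full width of a normal-based CI for a mean: 2 * z * SD / sqrt(N)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sd / math.sqrt(n)


print(f"baseline (95%, N=35, SD=19): {ci_width(19, 35):.2f}")
print(f"higher confidence (99%):     {ci_width(19, 35, confidence=0.99):.2f}")
print(f"larger sample (N=140):       {ci_width(19, 140):.2f}")
print(f"more variability (SD=38):    {ci_width(38, 35):.2f}")
```

Because N enters through its square root, quadrupling the sample size halves the width, whereas doubling the SD doubles it.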
-
Factors that affect the width of a CI are: Targeted confidence level, 1 - α
• Intuition: a higher confidence level without improving data quality means a larger margin of error.
• As the targeted confidence level increases, the CI width increases, provided all other quantities remain unchanged.
-
Factors that affect the width of a CI are: Sample size, N
• Intuition: a larger sample size means more information, which implies better inference.
• As the sample size increases, the CI width decreases, provided all other quantities remain unchanged.
-
Factors that affect the width of a CI are: Variability, SD
• Intuition: more variability, or a larger spread, makes it more difficult to estimate the population value without large amounts of data.
• As the variability increases, the CI width increases, provided all other quantities remain unchanged.
-
5 things to know about CI
1. A CI tells you the most likely range of the unknown population parameter (e.g., mean, proportion).
2. A CI provides both the location and the precision of a measure.
3. Three things influence the width of a CI (α, N, SD).
4. Our CI estimated from sample data may or may not contain the population average.
5. Overlap in CIs is a quick way to check for statistical significance. However, the term "significance" is more related to hypothesis testing.