Power Analysis (sneb.org)
• By the end of this webinar, participants should be able to:
• Explain the significance of a power analysis
• Use effect size to determine outcomes in an intervention
• Conduct a power analysis using Sample Power or other software
Karen Chapman-Novakofski, PhD, RDN
The Research Experiment
To reject or not to reject.
That is the question!
• Develop hypothesis and null hypothesis
• Set alpha, usually .05
• Calculate power and determine sample size
• Collect data, run the statistical test, calculate P
• Compare P to alpha
• P < .05: reject the null hypothesis
• P ≥ .05: fail to reject the null hypothesis
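The last two steps (calculate P, compare P to alpha) can be sketched in Python. This is a minimal illustration that assumes a large-sample two-sample z test as a stand-in for whatever test a study actually uses. With groups as small as the toy data below, a t test would be the usual choice. The helpers `two_sample_z_p_value` and `decide` are hypothetical names, not part of any package.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_p_value(group1, group2):
    """Two-sided P value for mean(group1) - mean(group2), using a
    large-sample normal (z) approximation with each group's sample variance."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    z = (mean(group1) - mean(group2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Compare P to alpha: reject H0 when P < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical toy data: daily vegetable servings in two small groups
p = two_sample_z_p_value([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(p, decide(p))
```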
Type I error (α)
• Incorrect rejection of a true null hypothesis (a false positive). Null hypothesis: you are not pregnant. Type I error: "You’re pregnant!" when you are not.
• Alpha is the maximum probability of a type I error that we are willing to accept.
• For a 95% confidence level, alpha is 0.05.
• This means that, when the null hypothesis is true, there is a 5% probability that we will reject it.
• On average, one out of every twenty tests of a true null hypothesis performed at this level will result in a type I error.
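The "one in twenty" idea can be checked by simulation. The sketch below is an illustration under stated assumptions: both groups are drawn from the same normal population (so the null hypothesis is true) and are compared with a z test that uses the known population SD. The function `type_i_error_rate` and its defaults are hypothetical choices for this example. The returned fraction of false positives should land close to 0.05, i.e., about 1 test in 20.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(n_tests=10_000, n=20, alpha=0.05, seed=1):
    """Simulate many studies in which H0 is TRUE and count false positives."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    false_positives = 0
    for _ in range(n_tests):
        # Both groups come from the same N(0, 1) population, so H0 is true
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z test using the known population SD (sigma = 1)
        z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives / n_tests


rate = type_i_error_rate()
print(rate)
```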
Protection!
• Corrections for multiple comparisons within 1 data set:
– Bonferroni (controls the family-wise error rate)
– Benjamini-Hochberg (controls the false discovery rate)
• When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false.
• Lack of significance does not support the conclusion that the null hypothesis is true.
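The two corrections listed above can be sketched in a few lines. Bonferroni compares each P value to alpha/m. Benjamini-Hochberg sorts the m P values, finds the largest rank k with P(k) ≤ (k/m)·alpha, and rejects every hypothesis up to that rank. The function names and the example P values are hypothetical; statistics packages offer tested versions of both procedures.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: reject all hypotheses whose rank is <= the largest
    rank k (1-based, P values sorted ascending) with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject


# Hypothetical P values from 10 comparisons in one data set
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Here Bonferroni keeps only the smallest P value, while Benjamini-Hochberg, being less conservative, keeps the two smallest.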
What’s so powerful about a power analysis?
Type II error (β)
• Failing to reject a null hypothesis that is false (a false negative). Null hypothesis: you are not pregnant. Type II error: "You’re not pregnant!" when you are.
• Typically, when we try to decrease the probability of one type of error, the probability of the other type increases.
• We could decrease alpha from 0.05 to 0.01, corresponding to a 99% level of confidence of avoiding a type I error.
• However, if everything else remains the same, the probability of a type II error (β) will nearly always increase.
• The probability of a type II error is called β (beta).
• The probability of correctly rejecting a false null hypothesis equals 1 - β and is called power.
• Correctly reject the null hypothesis that you are not pregnant, and you can feel pretty confident a baby is on the way!
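Power (1 - β) can be estimated by simulation: generate many studies in which the null hypothesis is false and count how often it is rejected. This sketch assumes normal data with SD 1, a true standardized difference d, and a two-sided z test. `simulated_power` is a hypothetical helper. With d = 0.5 and 64 per group, power comes out near 0.80 at alpha = .05 and near 0.60 at alpha = .01, illustrating the trade-off above: lowering alpha raises β.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(d, n, alpha=0.05, n_sims=4_000, seed=2):
    """Estimate power as the fraction of simulated studies that reject H0
    when H0 is FALSE: the true standardized difference is d (SD = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(d, 1) for _ in range(n)]   # "intervention" group
        b = [rng.gauss(0, 1) for _ in range(n)]   # "control" group
        z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


for alpha in (0.05, 0.01):
    power = simulated_power(d=0.5, n=64, alpha=alpha)
    print(f"alpha={alpha}: power={power:.2f}, beta={1 - power:.2f}")
```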
• The power of your test generally depends on four things:
1. your sample size,
2. the variability of the sample,
3. the effect size you want to be able to detect (usually medium),
4. the Type I error rate (alpha, usually .05).
• Power is usually specified at 0.80, that is, an 80% probability of detecting the effect (rejecting the null hypothesis) if an effect of that size truly exists.
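The four inputs map directly onto a common normal-approximation formula for the power of a two-sided, two-sample comparison of means: power ≈ Φ(d·√(n/2) - z₁₋α/₂), where d = mean difference / SD. The sketch below is an approximation (t-based software gives slightly different values), and `approx_power` is a hypothetical helper. It shows the direction of each factor: more subjects or a larger effect raises power, while more variability or a smaller alpha lowers it.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.
    Normal approximation; ignores the (negligible) opposite rejection tail."""
    d = mean_diff / sd                                  # effect size (uses variability)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # from the Type I error rate
    return NormalDist().cdf(d * sqrt(n_per_group / 2) - z_crit)


baseline = approx_power(mean_diff=1.0, sd=2.0, n_per_group=64)   # d = 0.5
print(f"baseline power: {baseline:.2f}")
print("larger n:      ", round(approx_power(1.0, 2.0, 100), 2))
print("more variable: ", round(approx_power(1.0, 3.0, 64), 2))
print("larger effect: ", round(approx_power(1.5, 2.0, 64), 2))
print("alpha = .01:   ", round(approx_power(1.0, 2.0, 64, alpha=0.01), 2))
```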
Sample size
• The sample size is chosen to give a high chance (the power) of detecting a specified mean difference as statistically significant.
With so few participants in each group (n=10), it is difficult to say whether these groups are significantly different. There is a lot of overlap. Only more subjects will let you see the distinction between the groups.
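Central Limit Theorem
• The central limit theorem states that the sampling distribution of the mean will be normal or nearly normal if the sample size is large enough, even when the population distribution is not normal.
– If the population distribution is normal, the sampling distribution of the mean is normal for any sample size.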
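A quick simulation makes the central limit theorem concrete. This sketch draws samples from a strongly right-skewed exponential population (skewness 2) and measures the skewness of the resulting sample means. The helpers `skewness` and `skew_of_sample_means` are hypothetical. Theory predicts a skewness of about 2/√n, so the printed values should shrink from roughly 2 (n = 1) toward roughly 0.3 (n = 50) as the sampling distribution approaches normal.

```python
import random
from statistics import pstdev


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    s = pstdev(values, m)
    return sum((v - m) ** 3 for v in values) / (n * s ** 3)


def skew_of_sample_means(sample_size, n_means=5_000, seed=3):
    """Skewness of the sampling distribution of the mean, for samples drawn
    from an exponential population (skewness = 2)."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_means)
    ]
    return skewness(means)


for size in (1, 5, 50):
    print(size, round(skew_of_sample_means(size), 2))
```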
How can you tell?
• Visual
• The frequency distribution (histogram), stem-and-leaf plot, boxplot, P-P plot (probability-probability plot), and Q-Q plot (quantile-quantile plot) are used for checking normality visually.
How can you tell?
• Statistically
• The main tests for the assessment of normality are
– Kolmogorov-Smirnov (K-S) test
– Lilliefors corrected K-S test
– Shapiro-Wilk test
• So if your data are normally distributed, you might invoke the Central Limit Theorem…
• A power analysis is the more popular approach
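To show how the Lilliefors-corrected K-S test listed above works, here is a sketch. The statistic is the largest gap between the data's empirical CDF and a normal CDF whose mean and SD are estimated from the same data. Because estimating them changes the null distribution, this sketch gets the P value by Monte Carlo simulation of normal samples of the same size, a substitute for the published Lilliefors tables. The function names are hypothetical. For real analyses, SciPy (`scipy.stats.shapiro`, `scipy.stats.kstest`) and statsmodels (`statsmodels.stats.diagnostic.lilliefors`) provide tested implementations.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(data):
    """K-S distance between the data and a normal distribution whose mean and
    SD are estimated from the same data (the Lilliefors statistic)."""
    n = len(data)
    cdf = NormalDist(mean(data), stdev(data)).cdf
    d = 0.0
    for i, x in enumerate(sorted(data), start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lilliefors_p_value(data, n_sims=500, seed=4):
    """Monte Carlo P value: how often truly normal samples of the same size
    give a statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = ks_normal_statistic(data)
    n = len(data)
    exceed = sum(
        ks_normal_statistic([rng.gauss(0, 1) for _ in range(n)]) >= observed
        for _ in range(n_sims)
    )
    return (exceed + 1) / (n_sims + 1)
```

A small P value (for example, < .05) suggests the data are not normally distributed. Strongly skewed data, such as intake amounts with a long right tail, will typically fail this check.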
Variability of sample
• Measures
– Range
– Interquartile range
– Variance
– Standard deviation
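All four measures can be computed with Python's standard `statistics` module. In this sketch, `variability` is a hypothetical helper and the servings data are made up. `variance` and `stdev` use the sample (n - 1) formulas. The IQR uses the module's default "exclusive" quartile method, so other software may report a slightly different IQR for the same data.

```python
from statistics import quantiles, stdev, variance


def variability(data):
    """Range, interquartile range, sample variance, and sample SD."""
    q1, _, q3 = quantiles(data, n=4)          # quartile cut points
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": variance(data),           # divides by n - 1
        "sd": stdev(data),                    # square root of the variance
    }


# Hypothetical daily fruit and vegetable servings for 8 adults
print(variability([1, 2, 3, 4, 5, 6, 7, 8]))
```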
Effect size
• Magnitude of difference you are looking for
• Usually, standardized difference between two means
• Cohen’s d
• Cohen suggested that d=0.2 be considered a 'small' effect size, 0.5 represents a 'medium' effect size and 0.8 a 'large' effect size.
• This means that if two groups' means don't differ by 0.2 standard deviations or more, the difference is trivial, even if it is statistically significant.
• What counts as a meaningful effect size can vary, depending on the means compared.
• Would an effect size of 0.2 in servings of vegetables be important?
Various formulas, depending on the type of statistic
e.g., for a difference in means (t-test):
d = (mean1 - mean2) / standard deviation
Various labels, e.g., d for a difference in two means
So whether an effect size of .2 in vegetable intake matters would depend on the means and SDs.
Computing Effect Size
• Mean 1 = 2 cups/day
• Mean 2 = 1 cup/day
• SD = 5 cups
• (2 - 1) / 5 = effect size of .2!
• Mean 1 = .2 cups/day [1/5 cup]
• Mean 2 = .1 cup/day [1/10 cup]
• SD = .5 cup
• (.2 - .1) / .5 = effect size of .2!
• Same effect size, even though the mean differences are very different (1 cup vs. 1/10 cup per day)
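The two worked examples can be reproduced in a few lines. The parentheses matter: the difference is computed first, then divided by the SD. `cohens_d` is a hypothetical helper. `pooled_sd` shows one common choice of SD when two groups have different SDs. That is an assumption added for illustration, since the slides just say "standard deviation."

```python
from math import sqrt


def cohens_d(mean1, mean2, sd):
    """Standardized mean difference: d = (mean1 - mean2) / SD."""
    return (mean1 - mean2) / sd


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD, a common choice for the denominator of d."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


print(cohens_d(2, 1, 5))        # 1 cup/day difference, SD 5 cups
print(cohens_d(0.2, 0.1, 0.5))  # 0.1 cup/day difference, SD 0.5 cup
```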
Effect size: other indicators
• Measure of size of association: Cohen’s
• Correlation/regression coefficients r and R are actually measures of effect size
• Cohen provided rules of thumb for interpreting these effect sizes, suggesting that an r of |.1| represents a 'small' effect size, |.3| represents a 'medium' effect size and |.5| represents a 'large' effect size
Effect size
• Can use previously published data to calculate
• Own pilot data
• Medium effect sizes are often used with nutrition education or psychosocial studies.
Number of subjects needed for different effect sizes. Bright blue = small effect size: 30% vs 40%; dark blue = medium effect size: 30% vs 50%; blue-green = large effect size: 30% vs 60%.
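The pattern in the figure can be reproduced with the standard normal-approximation sample-size formula for comparing two proportions (two-sided, no continuity correction). This is a sketch under those assumptions, with alpha = .05 and power = .80. The numbers in the slide's figure are not reproduced here and may differ, for example if continuity correction was used. `n_per_group_two_proportions` is a hypothetical helper. For 30% vs 40%, 30% vs 50%, and 30% vs 60%, it gives about 356, 93, and 42 per group.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect p1 vs p2 with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


for label, p2 in (("small", 0.40), ("medium", 0.50), ("large", 0.60)):
    print(label, n_per_group_two_proportions(0.30, p2))
```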
For power calculation, you will need to know:
• What type of test you plan to use (e.g., independent t-test, paired t-test, ANOVA, regression, etc.)
• The alpha (significance) level
• The expected effect size
Homework given
• Illinois BRFSS 2005
• Fruit and vegetable intake in adults
• How many adults will you need to see an effect of your intervention, at 80% power, alpha .05, with an effect size of .5?
Sample Power
Example
• What type of test you plan to use:
– independent t-test
• The alpha (significance) level
– .05
• The expected effect size
– .50
IL BRFSS 2005 Total FV Intake
• Means or mean difference
• Variance or standard deviation
Calculating effect size and estimating Mean2
• Mean1 = 3.91
• Mean2 = x
• SD = 2.22
• d = (3.91 - x) / 2.22
• If we want a medium effect size of .5, then
• .5 = (3.91 - x) / 2.22
• 1.11 = 3.91 - x
• x = 3.91 - 1.11 = 2.8
• Mean2 = 2.8
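The homework arithmetic and a matching sample-size estimate can be sketched in Python. The inputs are the IL BRFSS 2005 values on the slide (Mean1 = 3.91, SD = 2.22) with a medium effect size of .5. The sample size uses the normal-approximation formula n = 2((z₁₋α/₂ + z₁₋β)/d)² per group, which is an assumption for illustration. It gives 63 per group. t-test-based software such as G*Power typically reports 64 per group for these inputs, because the t distribution has heavier tails.

```python
from math import ceil
from statistics import NormalDist

mean1, sd, d = 3.91, 2.22, 0.5   # IL BRFSS 2005 values; medium effect size
alpha, power = 0.05, 0.80

# d = (mean1 - mean2) / sd, solved for mean2
mean2 = mean1 - d * sd

# Normal-approximation sample size per group, two-sided two-sample test
z_a = NormalDist().inv_cdf(1 - alpha / 2)
z_b = NormalDist().inv_cdf(power)
n_per_group = ceil(2 * ((z_a + z_b) / d) ** 2)

print(f"Mean2 = {mean2:.2f}, n per group = {n_per_group}")
```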
Sample Power
Very large sample sizes will usually produce statistical significance, even for trivially small differences
Post hoc power analysis
• For
– Found no difference: was it because there were not enough people?
– Can provide an estimate for future studies
– Useful for analysis of pilot or novel data
• Against
– Can’t tell, because if you add people, you add data, so the results would be different
– Should use confidence intervals (CI) instead
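As an alternative to post hoc power, a confidence interval for the mean difference shows which effects remain compatible with the data. This sketch uses a large-sample normal approximation (a t-based interval would be slightly wider for small groups). `diff_ci` is a hypothetical helper, and the summary statistics in the usage example are hypothetical. Here the 95% CI runs from about -0.21 to 1.03 servings: the difference is not significant, yet a difference of up to about 1 serving per day has not been ruled out.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Large-sample (normal approximation) CI for mean1 - mean2."""
    diff = mean1 - mean2
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical study: F&V servings/day in two groups of 100
lo, hi = diff_ci(3.91, 2.22, 100, 3.50, 2.22, 100)
print(f"95% CI for the difference: {lo:.2f} to {hi:.2f}")
```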
WIC Farmers’ Market Nutrition Program
• 1992
• F&V
• Awareness of and use of FM
• Vouchers, $3 increments
Show me the data!
• ↑ F&V intake: Herman DR, Harrison GG, Afifi AA, Jenks E. Am J Public Health. 2008 Jan;98(1):98-105
• FMNP participants & nonparticipants: Kropf ML, Holben DH, Holcomb JP Jr, Anderson H. J Am Diet Assoc. 2007 Nov;107(11):1903-8
Cost as a Barrier
• F&V sold at FM would not cost more than at grocery stores
• 3 FM (12 vendors) and 5 grocery stores: WIC clinic
• Prices collected biweekly, mid-May to mid-August
• Lowest unit price recorded
– Note voucher value doubled June/July
Corn, peppers, squash: ns (not significant)
Strawberries needed >3 pairs
Summary & Conclusions
• Power analysis is a statistical tool
• Helps you avoid accepting the null hypothesis of “no difference” when a difference might exist
• Many more sophisticated applications
Software
• Commercial
• SamplePower is available from SPSS
• NCSS/PASS (power and sample size)
• Power and Precision http://www.power-analysis.com/software_overview.htm
• Free
• G*Power http://download.cnet.com/G-Power/3000-2054_4-10647044.html
• PS http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize