Topics in Biostatistics: Part II
Post on 04-Jul-2015
CCEB
Topics in Biostatistics, Part 2
Sarah J. Ratcliffe, Ph.D.
Center for Clinical Epidemiology and Biostatistics
University of Penn School of Medicine
Outline
Hypothesis testing
Examples
Interpreting results
Resources
Hypothesis testing
Steps:
  Select a one-sided or two-sided test.
  Establish the level of significance (e.g., α = .05).
  Select an appropriate test statistic.
  Compute test statistic with actual data.
  Calculate degrees of freedom (df) for the test statistic.
Hypothesis testing
Steps cont’d:
  Obtain a tabled (critical) value for the statistical test.
  Compare the test statistic to the tabled value.
  Calculate a p-value.
  Make decision to reject or fail to reject the null hypothesis.
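The steps above can be sketched for a one-sample t-test. This is only a minimal illustration, not an analysis from the slides: the measurements, the null value `mu0`, and all variable names are hypothetical, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (NOT from the slides); H0: population mean = mu0.
sample = np.array([12.1, 9.8, 11.4, 10.9, 12.6, 11.0, 9.5, 12.2])
mu0 = 10.0
alpha = 0.05  # level of significance for a two-sided test

n = sample.size
# Test statistic: one-sample t computed from the actual data
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
df = n - 1  # degrees of freedom for the test statistic

# "Tabled" critical value for a two-sided test at level alpha
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Two-sided p-value
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Decision: reject H0 only if the statistic exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Comparing the statistic to the critical value and comparing the p-value to α always lead to the same decision; the p-value simply reports how strong the evidence is.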
Hypothesis testing: One-sided versus Two-sided
Determined by the alternative hypothesis.
Unidirectional = one-sided
Example: Infected macaques given vaccine or placebo. Higher viral replication in the vaccine group would not be a benefit of interest.
H0: vaccine has no beneficial effect on viral-replication levels at 6 weeks after infection.
Ha: vaccine lowers viral-replication levels by 6 weeks after infection.
Hypothesis testing: One-sided versus Two-sided
Bi-directional = two-sided
Example: Infected macaques given vaccine or placebo. Interested in whether vaccine has any effect on viral-replication levels, regardless of direction of effect.
H0: vaccine has no effect on viral-replication levels at 6 weeks after infection.
Ha: vaccine affects the viral-replication levels.
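A small sketch of how the choice of alternative hypothesis changes the p-value. The viral-replication values below are hypothetical (they are not data from the slides), and the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical log10 viral-replication levels (NOT data from the slides).
vaccine = np.array([3.1, 2.8, 3.4, 2.9, 3.0, 2.6])
placebo = np.array([3.6, 3.3, 3.9, 3.2, 3.8, 3.5])

# Two-sided: Ha is "vaccine affects levels", in either direction.
two_sided = stats.ttest_ind(vaccine, placebo, alternative="two-sided")

# One-sided: Ha is "vaccine lowers levels" (vaccine mean < placebo mean).
one_sided = stats.ttest_ind(vaccine, placebo, alternative="less")

print(f"two-sided p = {two_sided.pvalue:.4f}")
print(f"one-sided p = {one_sided.pvalue:.4f}")
```

When the observed effect lies in the hypothesized direction, the one-sided p-value is half the two-sided one. For that reason the direction should be fixed before the data are seen.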
Hypothesis testing: Level of Significance
How many different hypotheses are being examined?
How many comparisons are needed to answer this hypothesis?
Are any interim analyses planned? e.g. test data and, depending on results, collect more data and re-test.
=> How many tests will be run in total?
Hypothesis testing: Level of Significance
α_total = desired total Type-I error (false positives) for all comparisons.
One test: α_1 = α_total
Multiple tests / comparisons: if each α_i = α_total, then ∑α_i > α_total
Need to use a smaller α for each test.
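To see why a smaller per-test α is needed, the chance of at least one false positive can be computed directly. The sketch below assumes the tests are independent, which is a simplification; the function name is just for illustration.

```python
def familywise_error(alpha_per_test, n_tests):
    """Probability of at least one false positive across n_tests
    independent tests, each run at level alpha_per_test."""
    return 1 - (1 - alpha_per_test) ** n_tests

alpha_total = 0.05
for k in (1, 5, 10, 20):
    unadjusted = familywise_error(alpha_total, k)
    adjusted = familywise_error(alpha_total / k, k)  # alpha split evenly
    print(f"{k:2d} tests: unadjusted = {unadjusted:.3f}, alpha/k per test = {adjusted:.3f}")
```

With ten tests each run at 0.05, the overall false-positive rate is roughly 40%. Dividing α by the number of tests keeps it at or below 0.05.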
Hypothesis testing: Level of Significance
Conservative approach: α_i = α_total / number of comparisons
Can give a different α to each comparison.
Formal methods include: Bonferroni, Tukey-Kramer, Scheffé’s method, Waller-Duncan.
An O’Brien-Fleming boundary or a Lan-DeMets analog can be used to determine α_i for interim analyses.
Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1): 289-300.
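A minimal sketch of two of the approaches named above: the Bonferroni split of α and the Benjamini-Hochberg step-up procedure for the false discovery rate. The p-values are made up for illustration and the function names are not from any particular package.

```python
import numpy as np

def bonferroni_reject(p_values, alpha_total=0.05):
    """Reject H0_i when p_i <= alpha_total / (number of comparisons)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha_total / p.size

def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices of p-values, smallest first
    thresholds = q * np.arange(1, m + 1) / m  # q * rank / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest rank meeting its threshold
        reject[order[: k + 1]] = True         # reject all hypotheses up to that rank
    return reject

# Hypothetical p-values from 8 comparisons (NOT from the slides).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni rejections:", int(bonferroni_reject(p_values).sum()))
print("Benjamini-Hochberg rejections:", int(benjamini_hochberg_reject(p_values).sum()))
```

Bonferroni controls the chance of any false positive, while Benjamini-Hochberg controls the expected proportion of false positives among the rejections, so it is usually less conservative.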
Hypothesis testing: Selecting an Appropriate test
How many samples are being compared?
  One sample
  Two samples
  Multi-samples
Are these samples independent?
  Unrelated subjects in each sample.
  Subjects in each sample related / same.
Hypothesis testing: Selecting an Appropriate test
Are your variables continuous or categorical?
If continuous, are the data normally distributed?
  Normality can be assessed using a P-P (or Q-Q) plot.
  Plot should be approximately a straight line for normality.
  If not normal, can it be transformed to normality?
Blindly assuming normality can lead to wrong conclusions!
Hypothesis testing: Selecting an Appropriate test
Approximately a straight line
= normal assumption okay
Hypothesis testing: Selecting an Appropriate test
Not a straight line
= NOT normal
Can it be transformed to normality?
Hypothesis testing: Selecting an Appropriate test
The natural log transform of the data is approximately a straight line
= normal assumption okay
Analyze the transformed data NOT the original data.
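The Q-Q check described above can be sketched with `scipy.stats.probplot`, which returns the Q-Q points together with the correlation `r` of the fitted straight line. The data here are simulated log-normal values standing in for real measurements. Treating values of `r` near 1 as "approximately a straight line" is an informal rule of thumb, not a formal test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated right-skewed (log-normal) data; NOT data from the slides.
raw = rng.lognormal(mean=2.0, sigma=1.0, size=200)

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
_, (_, _, r_raw) = stats.probplot(raw, dist="norm")
_, (_, _, r_log) = stats.probplot(np.log(raw), dist="norm")

print(f"Q-Q correlation, original data:        r = {r_raw:.3f}")
print(f"Q-Q correlation, log-transformed data: r = {r_log:.3f}")
```

To draw the plot itself, `probplot` accepts a `plot=` argument (for example a Matplotlib axes object); it is left out here so the sketch runs without a plotting backend.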
Hypothesis testing: Geometric versus Arithmetic mean
Geometric mean of n positive numerical values is the nth root of the product of the n values.
Geometric will never be greater than arithmetic (they are equal only when all values are identical).
Geometric better when some values are very large in magnitude and others are small.
If geometric is used, log-transform the data before analyzing.
  Arithmetic mean of log-transformed data is the log of the geometric mean of the data.
  E.g. t-test on log-transformed data = test for location of the geometric mean.
Langley R., Practical Statistics Simply Explained, 1970, Dover Press.
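The relationship between the geometric mean and the log transform can be checked numerically. The values below are hypothetical, chosen so that one observation is much larger than the rest.

```python
import numpy as np
from scipy import stats

# Hypothetical positive values with one very large observation.
values = np.array([2.0, 8.0, 4.0, 1000.0])

arithmetic = values.mean()
geo_root = np.prod(values) ** (1 / values.size)  # nth root of the product
geo_log = np.exp(np.log(values).mean())          # back-transformed mean of the logs
geo_scipy = stats.gmean(values)                  # SciPy's built-in geometric mean

print(f"arithmetic mean = {arithmetic:.2f}")
print(f"geometric mean  = {geo_log:.2f}")
```

All three ways of computing the geometric mean agree, and the single large value pulls the arithmetic mean far above it.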
Source: Richardson & Overbaugh (2005). Basic statistical considerations in virological experiments. Journal of Virology, 79(2): 669-676.
Type of data | No. of samples being compared | Relationship between samples | Underlying distribution of all samples | Potential statistical tests
Binary | 1 | n/a | Binary | One-sample binomial test
Binary | 2 | Independent | Binary | Chi-square test, Fisher's exact test
Binary | >2 | Independent | Binary | Chi-square test
Binary | 2 | Paired | Binary | McNemar's test
Binary | >2 | Related | Binary | Cochran's Q test
Continuous | 1 | n/a | Normal | One-sample t-test for means, one-sample chi-square test for variances
Continuous | 1 | n/a | Non-normal | One-sample Wilcoxon signed-rank test, one-sample sign test
Continuous | 2 | Independent | Normal | Two-sample t-test for means, two-sample F test for variances
Continuous | 2 | Independent | Non-normal | Wilcoxon rank sum test
Continuous | 2 | Paired | Normal | Paired t-test
Continuous | 2 | Paired | Non-normal | Wilcoxon signed-rank test, sign test
Continuous | >2 | Independent | Normal | One-way ANOVA for means, Bartlett's test of homogeneity for variances
Continuous | >2 | Independent | Non-normal | Kruskal-Wallis test
Continuous | >2 | Related | Non-normal | Friedman rank sum test
Hypothesis testing: Selecting an Appropriate test
Other tests are available for more complex situations. For example,
Repeated measures ANOVA: >2 measurements taken on each subject; usually interested in time effect.
GEEs / Mixed-effects models: >2 measurements taken on each subject; adjust for other covariates.
Hypothesis testing
Steps:
  Select a one-tailed or two-tailed test.
  Establish the level of significance (e.g., α = .05).
  Select an appropriate test statistic.
  Run the test.
Example 1
Expression of chemokine receptors on CD14+/CD14- populations of blood monocytes.
Percent of cells positive by FACS.
CCR8
subject CD14+ CD14-
1 5 17
2 9 25
3 13 36
4 2 9
5 5 18
6 0 2
7 6 6
8 21 30
9 5 6
10 36 35
mean 10.2 18.4
st dev 10.9 12.6
st error 3.4 4.0
Example 1 cont’d
Continuous data, 2 samples
=> t-test, if normal OR
=> Wilcoxon rank sum or signed-rank test, if non-normal
Are samples independent or paired?
If independent, can test for equality of variances using Levene’s test.
Example 1 cont’d
T-tests in Excel
=TTEST(L6:L15,M6:M15,2,2)
Arguments, in order:
  L6:L15 = cells containing data from sample 1
  M6:M15 = cells containing data from sample 2
  2 = 1-sided or 2-sided test (here 2-sided)
  2 = type of t-test (here independent, equal variance):
    1: paired
    2: independent, equal variance
    3: independent, unequal variance
Example 1 cont’d
Possible results for different assumptions:
P-values | Normal (t-tests) | Non-normal (non-parametric tests)
Independent, equal variance | 0.137 |
Independent, unequal variance | 0.137 | 0.105
Paired | 0.010 | 0.013
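The t-test p-values in this example can be reproduced from the CCR8 data with SciPy, as a rough equivalent of the three Excel t-test types. This is a sketch, not the software used for the slides. The non-parametric p-values may differ slightly from those shown, because packages handle ties, zero differences and continuity corrections differently.

```python
import numpy as np
from scipy import stats

# CCR8 percent-positive cells for the 10 subjects in Example 1.
cd14_pos = np.array([5, 9, 13, 2, 5, 0, 6, 21, 5, 36], dtype=float)
cd14_neg = np.array([17, 25, 36, 9, 18, 2, 6, 30, 6, 35], dtype=float)

# t-tests (two-sided by default)
paired = stats.ttest_rel(cd14_pos, cd14_neg)                  # Excel type 1
pooled = stats.ttest_ind(cd14_pos, cd14_neg)                  # Excel type 2
welch = stats.ttest_ind(cd14_pos, cd14_neg, equal_var=False)  # Excel type 3

# Levene's test for equality of variances (relevant if samples are independent)
levene = stats.levene(cd14_pos, cd14_neg)

# Non-parametric counterparts
rank_sum = stats.mannwhitneyu(cd14_pos, cd14_neg, alternative="two-sided")
signed_rank = stats.wilcoxon(cd14_pos, cd14_neg)

print(f"paired t-test:          p = {paired.pvalue:.4f}")
print(f"independent, equal var: p = {pooled.pvalue:.3f}")
print(f"independent, unequal:   p = {welch.pvalue:.3f}")
print(f"Levene's test:          p = {levene.pvalue:.3f}")
print(f"Wilcoxon rank sum:      p = {rank_sum.pvalue:.3f}")
print(f"Wilcoxon signed-rank:   p = {signed_rank.pvalue:.3f}")
```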
Example 1 cont’d
Which result is correct?
Data are paired.
The differences for each subject are normally distributed.
=> Paired t-test
p = .0095
There is a difference in the percentage of positive CD14+ and CD14- cells.
A graph of the 95% CIs for the means would give the impression there is no difference …
When it’s really the differences we are testing.
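The contrast between the two graphs can be checked numerically with the CCR8 data from Example 1. The helper `mean_ci` is a hypothetical name used only for this sketch; it computes a t-based 95% confidence interval for a mean.

```python
import numpy as np
from scipy import stats

def mean_ci(x, level=0.95):
    """t-based confidence interval for the mean of x."""
    x = np.asarray(x, dtype=float)
    se = x.std(ddof=1) / np.sqrt(x.size)
    half_width = stats.t.ppf(0.5 + level / 2, x.size - 1) * se
    return x.mean() - half_width, x.mean() + half_width

# CCR8 percent-positive cells for the 10 subjects in Example 1.
cd14_pos = np.array([5, 9, 13, 2, 5, 0, 6, 21, 5, 36], dtype=float)
cd14_neg = np.array([17, 25, 36, 9, 18, 2, 6, 30, 6, 35], dtype=float)

ci_pos = mean_ci(cd14_pos)
ci_neg = mean_ci(cd14_neg)
ci_diff = mean_ci(cd14_pos - cd14_neg)  # CI for the mean within-subject difference

print(f"CD14+ mean CI:      ({ci_pos[0]:.2f}, {ci_pos[1]:.2f})")
print(f"CD14- mean CI:      ({ci_neg[0]:.2f}, {ci_neg[1]:.2f})")
print(f"mean difference CI: ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```

The two group intervals overlap, yet the interval for the mean difference excludes 0, which agrees with the paired t-test result.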
Example 1 cont’d
Note: paired tests don’t always give lower p-values.
A 1-sided test on the CCR5 values would give p-values of:
p = 0.06 independent samples
p = 0.11 paired samples
WHY?
Example 1 cont’d
The differences have a larger spread than the individual variables.
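The spread of the differences depends on how the two measurements co-vary: Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y). The CCR5 values are not given in this transcript, so the sketch below uses the CCR8 data from Example 1 only to illustrate the identity. For CCR8 the covariance is strongly positive, so the differences have a smaller spread and pairing helps. When the covariance is weak or negative, the differences can be more variable and a paired test loses power.

```python
import numpy as np

# CCR8 percent-positive cells for the 10 subjects in Example 1.
cd14_pos = np.array([5, 9, 13, 2, 5, 0, 6, 21, 5, 36], dtype=float)
cd14_neg = np.array([17, 25, 36, 9, 18, 2, 6, 30, 6, 35], dtype=float)

var_pos = cd14_pos.var(ddof=1)
var_neg = cd14_neg.var(ddof=1)
cov = np.cov(cd14_pos, cd14_neg)[0, 1]  # sample covariance (n - 1 denominator)
var_diff = (cd14_pos - cd14_neg).var(ddof=1)

print(f"Var(CD14+) = {var_pos:.1f}, Var(CD14-) = {var_neg:.1f}, Cov = {cov:.1f}")
print(f"Var(difference) = {var_diff:.1f}")
print(f"Var(X) + Var(Y) - 2 Cov = {var_pos + var_neg - 2 * cov:.1f}")
```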
Example 2
Does the level of CCR5 expression on PBLs (basal or upregulated using lentiviral vector) determine the % of entry that occurs via CCR5?
Two viruses: 89.6, DH12
Example 2 cont’d
[Figure: CCR5-mediated entry into PBL from 6 donors. x-axis: % of cells CCR5 positive; y-axis: % of entry mediated by CCR5. Linear fits: 89.6: y = 3.7371x - 0.1265, R² = 0.4473; DH12: y = 4.1408x + 4.2137, R² = 0.4333.]
Example 2 cont’d
How do we know if the slope of the line is significantly different from 0?
Can perform a t-test on the slope estimate.
For simple linear regression, this is the same as a t-test for the correlation (r = square root of R², with the sign of the slope).
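The equivalence between the slope test and the correlation test can be checked with SciPy. The six donor values below are hypothetical (the real values appear only in the slide's figure and are not reproduced here), and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical values for 6 donors (NOT read from the slide's figure).
pct_ccr5_positive = np.array([0.3, 0.6, 1.0, 1.4, 1.9, 2.3])
pct_entry_via_ccr5 = np.array([2.0, 1.5, 6.0, 3.5, 9.0, 7.0])

# Simple linear regression: the reported p-value tests H0: slope = 0
fit = stats.linregress(pct_ccr5_positive, pct_entry_via_ccr5)

# The same test by hand: t = slope / SE(slope), with n - 2 degrees of freedom
n = pct_ccr5_positive.size
t_slope = fit.slope / fit.stderr
p_slope = 2 * stats.t.sf(abs(t_slope), n - 2)

# Test of H0: correlation = 0
r, p_corr = stats.pearsonr(pct_ccr5_positive, pct_entry_via_ccr5)

print(f"slope = {fit.slope:.3f}, R^2 = {fit.rvalue ** 2:.3f}")
print(f"p (slope) = {p_slope:.4f}, p (correlation) = {p_corr:.4f}")
```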
Interpreting Results
P-values
  Is there a statistically significant result?
  If not, was the sample size large enough to detect a biologically meaningful difference?
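Whether a sample was large enough can be gauged with a power calculation. The sketch below uses the standard normal-approximation formula for comparing two independent means, n per group ≈ 2((z_{1-α/2} + z_{power}) / d)², where d is the difference of interest divided by the standard deviation. The function name and default values are assumptions for illustration. An exact t-based calculation gives slightly larger numbers.

```python
import math
from scipy import stats

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation).
    effect_size = (difference of interest) / (standard deviation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.25, 0.5, 1.0):
    print(f"effect size {d}: about {n_per_group(d)} subjects per group")
```

Halving the effect size roughly quadruples the required sample size, so a non-significant result from a small study says little about small but biologically meaningful differences.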
Online Resources
Power / sample size calculators
  http://calculators.stat.ucla.edu/powercalc/
  http://www.stat.uiowa.edu/~rlenth/Power/
Free statistical software
  http://members.aol.com/johnp71/javasta2.html#Freebies
BECC – Consulting Center
www.cceb.upenn.edu/main/center/becc.html
Hourly fee service
  Design and analysis strategies for research proposals;
  Selecting and implementing appropriate statistical methods for specific applications to research data;
  Statistical and graphical analysis of data;
  Statistical review of manuscripts.