Biostatistics in Practice
Session 6: Data and Analyses:
Too Little or Too Much
Youngju Pak, Biostatistician
• http://research.LABioMed.org/Biostat
• Too Little
  • Too few subjects: study not sufficiently powered (Session 4)
  • A biasing characteristic not measured: attributability of effects questionable (Session 5)
  • Subjects do not complete the study, or do not comply, e.g., do not take all doses (This session)
• “Too Much”
  • All subjects, not a sample (This session)
  • Irrelevant detectability (This session)
Too Little or Too Much: Data
• Too Few: Miss an Effect
• Too Many: Spurious Results

Too Little or Too Much: Analyses
• Numerous analyses due to:
  • Multiple possible outcomes.
  • Ongoing analyses as more subjects accrue.
  • Many potential subgroups.
Non-Completing or Non-Complying Subjects
• All Study Subjects or “Appropriate” Subset: What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed the study, or …?
Possible Bias Using Only Completers
• Comparison: % cured, placebo vs. treated.
• Many more placebo subjects do not cure, go elsewhere, and do not complete the study.
• If the cure rate is biased upward in placebo completers → the treatment effect is under-estimated.
• If the cure rate is biased upward in treatment completers → the treatment effect is over-estimated.
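To make the dropout bias concrete, here is a minimal simulation sketch (not from the slides; the cure rates and dropout probabilities are assumed purely for illustration) showing how a completers-only comparison can inflate the placebo cure rate and shrink the apparent treatment effect.

```python
# Hypothetical illustration: differential dropout biases a completers-only analysis.
# The cure and dropout rates below are assumed, not taken from any study.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                     # subjects per arm (assumed)
true_cure = {"placebo": 0.30, "treated": 0.50}

for arm, p in true_cure.items():
    cured = rng.random(n) < p
    # Uncured subjects drop out more often, especially on placebo (assumed rates).
    p_dropout = np.where(cured, 0.05, 0.60 if arm == "placebo" else 0.20)
    completer = rng.random(n) > p_dropout
    print(f"{arm}: true cure rate {p:.2f}, "
          f"completers-only cure rate {cured[completer].mean():.2f}")
```

With these assumed numbers, the completers-only gap between arms is only a few percentage points, far less than the true 20-point difference, i.e., the treatment effect is under-estimated.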
Criteria for Appropriate Subset
• Study goal: Scientific effect? Societal impact?
• Potential biased conclusions: Why not completed? Are the study arms equivalent?
• Primarily compliance vs. primarily dropout.
Possible Study Populations
• Per-Protocol Subjects:
  • Had all measurements, visits, doses, etc.
  • “Modified”: relaxations, e.g., 85% of doses.
  • Emphasis on scientific effect.
• Intention-to-Treat Subjects:
  • Everyone who was randomized.
  • “Modified”: slight relaxations, e.g., ≥ 1 dose.
  • Emphasis on non-biased policy conclusion.
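As a concrete, hypothetical illustration of these population definitions, the sketch below filters a subject table into ITT, modified ITT, and modified per-protocol sets; the column names and thresholds mirror the bullets above, but the data are invented.

```python
# Hypothetical sketch: defining analysis populations from a subject table.
# Column names (randomized, doses_taken, doses_planned) are assumptions.
import pandas as pd

subjects = pd.DataFrame({
    "id":            [1, 2, 3, 4],
    "randomized":    [True, True, True, True],
    "doses_taken":   [20, 17, 1, 0],
    "doses_planned": [20, 20, 20, 20],
})

itt  = subjects[subjects["randomized"]]                        # everyone randomized
mitt = itt[itt["doses_taken"] >= 1]                            # modified ITT: >= 1 dose
pp   = itt[itt["doses_taken"] / itt["doses_planned"] >= 0.85]  # modified per-protocol: >= 85% of doses
print(len(itt), len(mitt), len(pp))                            # 4, 3, 2
```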
Intention-to-Treat (ITT)
• ITT specifies the population; it includes non-completers.
• Still need to define outcomes for non-completers, i.e., “impute” values.
• Typical to define non-completers as not cured.
ITT: Two Ways to Impute Unknown Values
• LOCF (Last Observation Carried Forward): ignores presumed progression.
• LRCF (Last Rank Carried Forward): maintains expected relative progression.
[Figure: change from baseline for individual subjects (observations and ranks) at baseline, an intermediate visit, and the final visit, contrasting LOCF and LRCF imputation.]
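A minimal pandas sketch of LOCF-style imputation, assuming a wide table of visit values (the data and layout are invented); LRCF would instead carry forward each subject's last observed rank among subjects rather than the raw value.

```python
# Minimal LOCF sketch (assumed data layout): carry a subject's last observed
# value forward into later, missing visits.
import numpy as np
import pandas as pd

visits = pd.DataFrame(
    {"baseline":     [10.0, 12.0, 11.0],
     "intermediate": [ 8.0, np.nan, 9.0],
     "final":        [np.nan, np.nan, 7.0]},
    index=["subj1", "subj2", "subj3"],
)

locf = visits.ffill(axis=1)                   # last observation carried forward
change = locf["final"] - locf["baseline"]     # change from baseline with imputed finals
print(change)
```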
“Too Much” Data
• All Possible Data, No Sample
• “Too much” data to need probabilistic statements; we already have the whole truth.
• Not always as obvious as it sounds.
• Examples: Electronic Medical Records (EMR), some chart reviews; these are site-specific, not samples.
• Confidence intervals usually irrelevant.
• Reference ranges and some non-generalizable comparisons may still be valid.
Irrelevant (?) Detectability with a Large Study
• Significant differences (p < 0.05) in %s between placebo and treatment groups:

  N/Group   Difference        #Treated* to Cure 1
  100       50% vs. 63.7%       7
  1000      50% vs. 54.4%      23
  5000      50% vs. 52.0%      50
  10000     50% vs. 51.4%      71
  50000     50% vs. 50.6%     167

• *NNT = Number Needed to Treat = 100/Δ
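The last column can be reproduced from the difference in cure percentages; a quick check of the NNT formula above (values copied from the table):

```python
# NNT = Number Needed to Treat = 100 / (difference in % cured), per the slide.
rows = [(100, 63.7), (1000, 54.4), (5000, 52.0), (10000, 51.4), (50000, 50.6)]
for n_per_group, treated_pct in rows:
    delta = treated_pct - 50.0                 # placebo fixed at 50% in the table
    print(f"N/group {n_per_group:>6}: NNT ≈ {100 / delta:.0f}")
```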
Too Little or Too Much: Analyses
• Multiple:
  • Outcomes
  • Subgroups
  • Ongoing effects
• Exploring vs. Proving
• Balance Between Missing an Effect and Spurious Results
Multiple Outcomes
• Food Additives and Hyperactivity Study:
  • Uses a composite score.
  • Many other indicators of hyperactivity.
• GHA: Global Hyperactivity Aggregate, built from four instruments: Teacher ADHD (10 items), Parent ADHD (10 items), Class ADHD (12 items), Conners (4 items).
• Could perform: 10 + 10 + 12 + 4 = 36 separate item analyses.
Multiple Subgroup Analyses: Example
• Comparing two treatments in 25 subgroups + overall.
• Editorial: Lagakos, NEJM 354(16):1667-1669.
False Positive Conclusions
• 72% chance of claiming at least one false effect with 25 comparisons.
A Correction for Multiple Analyses
• No Correction:
  • If using p < 0.05, then P[true negative] = 0.95.
  • If 25 comparisons are independent, P[all true negative] = (1 − 0.05)^25 = (0.95)^25 ≈ 0.28.
  • So, P[at least 1 false positive] = 1 − 0.28 = 0.72.
• Bonferroni Correction:
  • To maintain P[true negative in k tests] = 0.95 = (1 − p*)^k, need p* = 1 − (0.95)^(1/k) ≈ 0.05/k.
  • So, use p < 0.05/k to maintain a < 5% overall false positive rate (type I error rate).
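The arithmetic on this slide can be checked directly; a quick sketch:

```python
# Family-wise false positive chance with k independent tests at p < 0.05,
# plus the Bonferroni-corrected per-test threshold.
k, alpha = 25, 0.05
p_all_true_negative = (1 - alpha) ** k               # (0.95)^25 ≈ 0.28
p_at_least_one_false_pos = 1 - p_all_true_negative   # ≈ 0.72
exact_threshold = 1 - (1 - alpha) ** (1 / k)         # ≈ 0.00205
approx_threshold = alpha / k                         # 0.05 / 25 = 0.002
print(p_at_least_one_false_pos, exact_threshold, approx_threshold)
```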
• Some formal corrections are “built in” to p-values:
  • Bonferroni: general purpose.
  • Tukey: for pairs of group means, > 2 groups.
• Many statistical software packages will compute p-values adjusted for the multiple tests using these methods.
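For instance, statsmodels can return Bonferroni-adjusted p-values for a list of raw p-values (the p-values below are invented; Tukey's method for pairwise group means lives in a separate statsmodels function, pairwise_tukeyhsd). A minimal sketch:

```python
# Adjusted p-values for multiple tests using statsmodels (raw p-values invented).
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.020, 0.049, 0.300]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
print(adj_p)    # each raw p multiplied by the number of tests (capped at 1)
print(reject)   # which tests stay significant after correction
```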
Accounting for Multiple Analyses
• Formal corrections may not be necessary:
  • Transparency about what was done is most important.
  • Be aware of the number of analyses performed and report it with any conclusions.
• Cohan, Crit Care Med 33(10):2358-2366.
Reporting Multiple Analyses
• Clopidogrel paper from four slides back: no p-values or probabilistic conclusions for the 25 subgroups.
• Another paper's transparency (example shown on the slide).
Multiple Mid-Study Analyses
• Should effects be monitored as more and more subjects complete?
• Some mid-study analyses:
  • Interim analyses
  • Study size re-evaluation
  • Feasibility analyses
Mid-Study Analyses
[Figure: estimated effect plotted against number of subjects enrolled over time, fluctuating around 0.]
• Too many analyses → wrong early conclusion.
• Need to monitor, but also account for the many analyses.
Mid-Study Analyses
• Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable and can invalidate final comparisons.
• Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early for very dramatic effects; if the study continues, final comparisons are adjusted to validly account for “peeking”.
• Continued …
Mid-Study Analyses
• Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not the effects themselves, are used to assess the original design assumptions.
• Feasibility analysis:
  • May use the assessment noted above to decide whether to continue the study.
  • May measure effects, like interim analyses, by unmasked advisors, to project the likelihood of finding effects at the planned end of the study.
• Continued …
Mid-Study Analyses
• Study 1: Groups do not differ; plan to add more subjects.
  • Consequence → final p-value not valid; its probability calculation assumes no prior knowledge of the effect.
• Study 2: Groups differ significantly; plan to stop the study.
  • Consequence → using this p-value is not valid; the probability must incorporate the later comparisons (see the simulation sketch below).
• Examples: Studies at Harbor
  • Randomized; not masked; data available to the PI.
  • Compared treatment groups repeatedly, as more subjects were enrolled.
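A small simulation sketch (settings assumed) of why such repeated “peeking” invalidates the nominal p-value: testing at several interim looks and stopping at the first p < 0.05 yields a false positive rate well above 5%, even when the groups truly do not differ.

```python
# Hypothetical simulation: repeatedly testing as subjects accrue, stopping at the
# first p < 0.05, inflates the type I error rate even under a true null effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
looks = [25, 50, 75, 100]      # interim looks, subjects per group (assumed)
n_sims, false_positives = 2000, 0

for _ in range(n_sims):
    a = rng.normal(size=max(looks))    # both groups drawn from the same distribution
    b = rng.normal(size=max(looks))
    if any(stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05 for n in looks):
        false_positives += 1

print(false_positives / n_sims)        # noticeably above the nominal 0.05
```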
Bad Science That Seems So Good
1. Re-examining data, or using many outcomes, seeming to be due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Looking for effects in many subgroups.
• Actually bad? Could be negligent NOT to do these, but need to account for doing them.
How to Avoid a Misleading Result
• Analyses should be planned before the data are collected (how many dependent and independent variables are to be collected, which hypotheses are to be tested).
• All planned analyses should be completed and reported.
We have learned …
1. Study designs
2. Descriptive vs. inferential statistics
3. Hypothesis testing and a p-value
4. Five elements to determine a sample size
5. Covariates and multivariate regression models
6. Bonferroni's correction
EPILOGUE
GIVE A BIG CLAP TO YOURSELF
SINCE YOU'VE MADE IT THIS FAR!
CONGRATULATIONS!!!