Biostatistics in Practice
Session 6: Data and Analyses:
Too Little or Too Much
Youngju Pak, Biostatistician
• http://research.LABioMed.org/Biostat
• Too Little
  • Too few subjects: study not sufficiently powered (Session 4)
  • A biasing characteristic not measured: attributability of effects questionable (Session 5)
  • Subjects do not complete the study, or do not comply, e.g., do not take all doses (This session)
• “Too Much”
  • All subjects, not a sample (This session)
  • Irrelevant detectability (This session)
Too Little or Too Much: Data
• Too Few: Miss an Effect
• Too Many: Spurious Results

Too Little or Too Much: Analyses
• Numerous analyses due to:
  • Multiple possible outcomes.
  • Ongoing analyses as more subjects accrue.
  • Many potential subgroups.
Non-Completing or Non-Complying Subjects
• All Study Subjects or “Appropriate” Subset: What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed the study, or …?
Possible Bias Using Only Completers
• Comparison: % cured, placebo vs. treated.
• Many more placebo subjects do not cure, go elsewhere, and do not complete the study.
• If the cure rate is biased upward in placebo completers → the treatment effect is under-estimated.
• If the cure rate is biased upward in treatment completers → the treatment effect is over-estimated.
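To make the dropout bias concrete, here is a minimal simulation sketch (not from the slides; the cure rates and dropout probabilities are assumed purely for illustration) showing how a completers-only comparison can inflate the placebo cure rate and shrink the apparent treatment effect.

```python
# Hypothetical illustration: differential dropout biases a completers-only analysis.
# The cure and dropout rates below are assumed, not taken from any study.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                     # subjects per arm (assumed)
true_cure = {"placebo": 0.30, "treated": 0.50}

for arm, p in true_cure.items():
    cured = rng.random(n) < p
    # Uncured subjects drop out more often, especially on placebo (assumed rates).
    p_dropout = np.where(cured, 0.05, 0.60 if arm == "placebo" else 0.20)
    completer = rng.random(n) > p_dropout
    print(f"{arm}: true cure rate {p:.2f}, "
          f"completers-only cure rate {cured[completer].mean():.2f}")
```

With these assumed numbers, the completers-only gap between arms is only a few percentage points, far less than the true 20-point difference, i.e., the treatment effect is under-estimated.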
Criteria for Appropriate Subset
• Study goal: Scientific effect? Societal impact?
• Potential biased conclusions: Why not completed? Are the study arms equivalent?
• Primarily compliance vs. primarily dropout.
Possible Study Populations
• Per-Protocol Subjects:
  • Had all measurements, visits, doses, etc.
  • “Modified”: relaxations, e.g., 85% of doses.
  • Emphasis on scientific effect.
• Intention-to-Treat Subjects:
  • Everyone who was randomized.
  • “Modified”: slight relaxations, e.g., ≥ 1 dose.
  • Emphasis on non-biased policy conclusion.
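As a concrete, hypothetical illustration of these population definitions, the sketch below filters a subject table into ITT, modified ITT, and modified per-protocol sets; the column names and thresholds mirror the bullets above, but the data are invented.

```python
# Hypothetical sketch: defining analysis populations from a subject table.
# Column names (randomized, doses_taken, doses_planned) are assumptions.
import pandas as pd

subjects = pd.DataFrame({
    "id":            [1, 2, 3, 4],
    "randomized":    [True, True, True, True],
    "doses_taken":   [20, 17, 1, 0],
    "doses_planned": [20, 20, 20, 20],
})

itt  = subjects[subjects["randomized"]]                        # everyone randomized
mitt = itt[itt["doses_taken"] >= 1]                            # modified ITT: >= 1 dose
pp   = itt[itt["doses_taken"] / itt["doses_planned"] >= 0.85]  # modified per-protocol: >= 85% of doses
print(len(itt), len(mitt), len(pp))                            # 4, 3, 2
```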
Intention-to-Treat (ITT)
• ITT specifies the population; it includes non-completers.
• Still need to define outcomes for non-completers, i.e., “impute” values.
• Typical to define non-completers as not cured.
ITT: Two Ways to Impute Unknown Values
• LOCF (Last Observation Carried Forward): ignores presumed progression.
• LRCF (Last Rank Carried Forward): maintains expected relative progression.
[Figure: change from baseline for individual subjects (observations and ranks) at baseline, an intermediate visit, and the final visit, contrasting LOCF and LRCF imputation.]
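A minimal pandas sketch of LOCF-style imputation, assuming a wide table of visit values (the data and layout are invented); LRCF would instead carry forward each subject's last observed rank among subjects rather than the raw value.

```python
# Minimal LOCF sketch (assumed data layout): carry a subject's last observed
# value forward into later, missing visits.
import numpy as np
import pandas as pd

visits = pd.DataFrame(
    {"baseline":     [10.0, 12.0, 11.0],
     "intermediate": [ 8.0, np.nan, 9.0],
     "final":        [np.nan, np.nan, 7.0]},
    index=["subj1", "subj2", "subj3"],
)

locf = visits.ffill(axis=1)                   # last observation carried forward
change = locf["final"] - locf["baseline"]     # change from baseline with imputed finals
print(change)
```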
“Too Much” Data
• All Possible Data, No Sample
• “Too much” data to need probabilistic statements; we already have the whole truth.
• Not always as obvious as it sounds.
• Examples: Electronic Medical Records (EMR), some chart reviews; these are site-specific, not samples.
• Confidence intervals usually irrelevant.
• Reference ranges and some non-generalizable comparisons may still be valid.
Irrelevant (?) Detectability with a Large Study
• Significant differences (p < 0.05) in %s between placebo and treatment groups:

  N/Group   Difference        #Treated* to Cure 1
  100       50% vs. 63.7%       7
  1000      50% vs. 54.4%      23
  5000      50% vs. 52.0%      50
  10000     50% vs. 51.4%      71
  50000     50% vs. 50.6%     167

• *NNT = Number Needed to Treat = 100/Δ
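The last column can be reproduced from the difference in cure percentages; a quick check of the NNT formula above (values copied from the table):

```python
# NNT = Number Needed to Treat = 100 / (difference in % cured), per the slide.
rows = [(100, 63.7), (1000, 54.4), (5000, 52.0), (10000, 51.4), (50000, 50.6)]
for n_per_group, treated_pct in rows:
    delta = treated_pct - 50.0                 # placebo fixed at 50% in the table
    print(f"N/group {n_per_group:>6}: NNT ≈ {100 / delta:.0f}")
```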
Too Little or Too Much: Analyses
• Multiple:
  • Outcomes
  • Subgroups
  • Ongoing effects
• Exploring vs. Proving
• Balance Between Missing an Effect and Spurious Results
Multiple Outcomes
• Food Additives and Hyperactivity Study:
  • Uses a composite score.
  • Many other indicators of hyperactivity.
• GHA: Global Hyperactivity Aggregate, built from four instruments: Teacher ADHD (10 items), Parent ADHD (10 items), Class ADHD (12 items), Conners (4 items).
• Could perform: 10 + 10 + 12 + 4 = 36 separate item analyses.
Multiple Subgroup Analyses: Example
• Comparing two treatments in 25 subgroups + overall.
• Editorial: Lagakos, NEJM 354(16):1667-1669.
False Positive Conclusions
• 72% chance of claiming at least one false effect with 25 comparisons.
A Correction for Multiple Analyses
• No Correction:
  • If using p < 0.05, then P[true negative] = 0.95.
  • If 25 comparisons are independent, P[all true negative] = (1 − 0.05)^25 = (0.95)^25 ≈ 0.28.
  • So, P[at least 1 false positive] = 1 − 0.28 = 0.72.
• Bonferroni Correction:
  • To maintain P[true negative in k tests] = 0.95 = (1 − p*)^k, need p* = 1 − (0.95)^(1/k) ≈ 0.05/k.
  • So, use p < 0.05/k to maintain a < 5% overall false positive rate (type I error rate).
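The arithmetic on this slide can be checked directly; a quick sketch:

```python
# Family-wise false positive chance with k independent tests at p < 0.05,
# plus the Bonferroni-corrected per-test threshold.
k, alpha = 25, 0.05
p_all_true_negative = (1 - alpha) ** k               # (0.95)^25 ≈ 0.28
p_at_least_one_false_pos = 1 - p_all_true_negative   # ≈ 0.72
exact_threshold = 1 - (1 - alpha) ** (1 / k)         # ≈ 0.00205
approx_threshold = alpha / k                         # 0.05 / 25 = 0.002
print(p_at_least_one_false_pos, exact_threshold, approx_threshold)
```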
• Some formal corrections are “built in” to p-values:
  • Bonferroni: general purpose.
  • Tukey: for pairs of group means, > 2 groups.
• Many statistical software packages will compute p-values adjusted for the multiple tests using these methods.
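For instance, statsmodels can return Bonferroni-adjusted p-values for a list of raw p-values (the p-values below are invented; Tukey's method for pairwise group means lives in a separate statsmodels function, pairwise_tukeyhsd). A minimal sketch:

```python
# Adjusted p-values for multiple tests using statsmodels (raw p-values invented).
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.020, 0.049, 0.300]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
print(adj_p)    # each raw p multiplied by the number of tests (capped at 1)
print(reject)   # which tests stay significant after correction
```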
Accounting for Multiple Analyses
• Formal corrections may not be necessary:
  • Transparency about what was done is most important.
  • Be aware of the number of analyses performed and report it with any conclusions.
• Cohan, Crit Care Med 33(10):2358-2366.
Reporting Multiple Analyses
• Clopidogrel paper from four slides back: no p-values or probabilistic conclusions for the 25 subgroups.
• Another paper's transparency (example shown on the slide).
Multiple Mid-Study Analyses
• Should effects be monitored as more and more subjects complete?
• Some mid-study analyses:
  • Interim analyses
  • Study size re-evaluation
  • Feasibility analyses
Mid-Study Analyses
[Figure: estimated effect plotted against number of subjects enrolled over time, fluctuating around 0.]
• Too many analyses → wrong early conclusion.
• Need to monitor, but also account for the many analyses.
Mid-Study Analyses
• Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable and can invalidate final comparisons.
• Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early for very dramatic effects; if the study continues, final comparisons are adjusted to validly account for “peeking”.
• Continued …
Mid-Study Analyses
• Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not the effects themselves, are used to assess the original design assumptions.
• Feasibility analysis:
  • May use the assessment noted above to decide whether to continue the study.
  • May measure effects, like interim analyses, by unmasked advisors, to project the likelihood of finding effects at the planned end of the study.
• Continued …
Mid-Study Analyses
• Study 1: Groups do not differ; plan to add more subjects.
  • Consequence → final p-value not valid; its probability calculation assumes no prior knowledge of the effect.
• Study 2: Groups differ significantly; plan to stop the study.
  • Consequence → using this p-value is not valid; the probability must incorporate the later comparisons (see the simulation sketch below).
• Examples: Studies at Harbor
  • Randomized; not masked; data available to the PI.
  • Compared treatment groups repeatedly, as more subjects were enrolled.
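A small simulation sketch (settings assumed) of why such repeated “peeking” invalidates the nominal p-value: testing at several interim looks and stopping at the first p < 0.05 yields a false positive rate well above 5%, even when the groups truly do not differ.

```python
# Hypothetical simulation: repeatedly testing as subjects accrue, stopping at the
# first p < 0.05, inflates the type I error rate even under a true null effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
looks = [25, 50, 75, 100]      # interim looks, subjects per group (assumed)
n_sims, false_positives = 2000, 0

for _ in range(n_sims):
    a = rng.normal(size=max(looks))    # both groups drawn from the same distribution
    b = rng.normal(size=max(looks))
    if any(stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05 for n in looks):
        false_positives += 1

print(false_positives / n_sims)        # noticeably above the nominal 0.05
```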
Bad Science That Seems So Good
1. Re-examining data, or using many outcomes, seeming to be due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Looking for effects in many subgroups.
• Actually bad? Could be negligent NOT to do these, but need to account for doing them.
How to Avoid a Misleading Result
• Analyses should be planned before the data are collected (how many dependent and independent variables are to be collected, which hypotheses are to be tested).
• All planned analyses should be completed and reported.
We have learned …
1. Study designs
2. Descriptive vs. inferential statistics
3. Hypothesis testing and a p-value
4. Five elements to determine a sample size
5. Covariates and multivariate regression models
6. Bonferroni's correction
EPILOGUE
GIVE A BIG CLAP TO YOURSELF
SINCE YOU'VE MADE IT THIS FAR!
CONGRATULATIONS!!!