
STATISTICAL QUESTION

Multiple significance tests: the Bonferroni correction

Philip Sedgwick, senior lecturer in medical statistics

Centre for Medical and Healthcare Education, St George’s, University of London, Tooting, London, UK

Researchers assessed the effects of hormone replacement therapy, consisting of combined oestrogen and progestogen, on health related quality of life. A randomised, placebo controlled, double blind trial design was used. Women were recruited if they were postmenopausal, had a uterus, and were aged 50-69 at randomisation. Outcome measures included health related quality of life and psychological wellbeing. The study period was one year.[1]

The researchers investigated the effects of combined hormone replacement therapy compared with placebo at one year, using a 0.05 (5%) critical level of significance and adjusting this with the Bonferroni correction. The researchers concluded that combined hormone replacement therapy started many years after the menopause can improve health related quality of life.

For which of the following does the Bonferroni correction reduce the probability of occurrence?

a) Type I error
b) Type II error

Answers

The Bonferroni correction reduces the probability of making a type I error (answer a) but not a type II error (answer b).

Combined hormone replacement therapy was compared with placebo using statistical hypothesis testing, the purpose of which was to make inferences about the population on the basis of the sample. However, if the sample was not representative of the population then errors could have been committed in the hypothesis testing. Two types of error were possible, type I and II, described in a previous question.[2] The purpose of the Bonferroni correction was to limit the probability of committing a type I error (answer a).

Type I and II errors would both result in the incorrect inference being made about the effectiveness of the combined hormone replacement therapy. A type I error would occur if the null hypothesis was incorrectly rejected in favour of the alternative; that is, if there was a difference in outcome between combined hormone treatment and placebo in the sample but not in the population. A type I error would occur because of sampling error: only a proportion of the population was studied, possibly resulting in an unrepresentative sample. Sampling error can also result in a type II error, which is when the null hypothesis is not rejected in favour of the alternative when it should have been; that is, there is a difference in outcome in the population between combined hormone treatment and placebo but the difference was not seen in the sample. However, the Bonferroni correction does not limit the probability of a type II error occurring (answer b is false). Sampling error can be reduced by increasing sample size, thus obtaining a more representative sample, and therefore doing so increases the power of the statistical test.[3]
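The link between sample size and power can be illustrated with a rough calculation. The sketch below is not taken from the trial: it assumes a two-sided, two-sample z test under a normal approximation, and the effect size (0.25), standard deviation (1.0), and group sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test.

    Assumes equal group sizes, a common known standard deviation, and a
    normally distributed test statistic (illustrative assumptions only).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for the chosen alpha
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference in means
    shift = abs(effect) / se               # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical effect size and group sizes
for n in (50, 100, 200, 400):
    power = approx_power(effect=0.25, sd=1.0, n_per_group=n)
    print(f"n per group = {n:3d}: power = {power:.2f}, P(type II error) = {1 - power:.2f}")
```

Under these assumptions the power rises, and the probability of a type II error falls, as the number of participants per group increases.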

For each hypothesis test in the study, the P value can be thought of as being derived by hypothetically repeating the study an infinite number of times, assuming the null hypothesis is true. The P value is the proportion of these hypothetical studies that would have produced a test statistic greater than or equal to, in absolute value, the one calculated in the above study. The critical level of significance was set at 0.05 (5%). Therefore, for each hypothesis test the null hypothesis would be rejected in favour of the alternative for those 5% of the infinite number of studies with the largest test statistics; hence for any hypothesis test the maximum probability of rejecting the null hypothesis when it was true was 0.05. Since any hypothesis test could result in a type I error, the probability of it occurring for each test was 0.05. When multiple hypothesis tests are performed, the probability of at least one type I error occurring is greater than 0.05.[4]
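How quickly this probability grows can be sketched with a short calculation. The example assumes that the tests are independent and that every null hypothesis is true; neither assumption is stated for the trial, so the figures are illustrative only. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one type I error across n_tests tests.

    Assumes the tests are independent and all null hypotheses are true
    (illustrative assumptions, not properties of the trial).
    """
    # P(no type I error in one test) = 1 - alpha, so for n_tests independent
    # tests P(no type I error at all) = (1 - alpha) ** n_tests
    return 1 - (1 - alpha) ** n_tests


for m in (1, 5, 10, 41):
    print(f"{m:2d} tests: P(at least one type I error) = {familywise_error_rate(m):.3f}")
```

Under these assumptions, 41 tests each carried out at the 0.05 level would give a probability of about 0.88 of at least one type I error.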

Care must be taken when studies undertake a large number of statistical tests; ultimately some of these will result in a type I error. However, we will not know which significant findings are a type I error. Various approaches have been suggested to reduce the number of type I errors when undertaking multiple testing, including the Bonferroni correction.

The Bonferroni correction involved adjusting the critical significance level of 0.05 by dividing it by the number of statistical tests performed. The researchers reported performing 41 statistical tests, so statistical significance was achieved if P was less than 0.05 ÷ 41, or approximately 0.001. The correction is conservative and not recommended if a large number of tests are performed, since few if any tests will be significant after the correction has been applied.
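The arithmetic of the correction can be sketched in a few lines. The function names and the example P values below are hypothetical and are not results from the trial; only the 0.05 level and the 41 tests come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Critical significance level after the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05, n_tests=None):
    """Flag which P values are below the Bonferroni-adjusted threshold.

    n_tests defaults to the number of P values supplied; pass it explicitly
    when only some of the tests performed are listed.
    """
    m = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, m)
    return [p < threshold for p in p_values]


# Threshold for 41 tests at the 0.05 level, as in the text
print(f"Adjusted threshold: {bonferroni_threshold(0.05, 41):.5f}")

# Hypothetical P values, treated as five of the 41 tests performed
example_p = [0.0004, 0.003, 0.02, 0.04, 0.30]
print(bonferroni_significant(example_p, alpha=0.05, n_tests=41))
```

With these hypothetical values only the first P value falls below the adjusted threshold, although four of the five are below the unadjusted level of 0.05. This illustrates why the correction is described as conservative.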


For personal use only: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe

BMJ 2012;344:e509 doi: 10.1136/bmj.e509 (Published 25 January 2012)

Endgames


Competing interests: None declared.

1 Welton AJ, Vickers MR, Kim J, Ford D, Lawton BA, MacLennan AH, et al on behalf of the WISDOM team. Health related quality of life after combined hormone replacement therapy: randomised controlled trial. BMJ 2008;337:a1190.

2 Sedgwick P. Errors when statistical hypothesis testing. BMJ 2010;340:c2348.

3 Sedgwick P. Sample size calculations I. BMJ 2010;340:c3104.

4 Sedgwick P. Multiple significance tests. BMJ 2010;340:c2963.

Cite this as: BMJ 2012;344:e509

© BMJ Publishing Group Ltd 2012


