Study design and simple statistics
17th Feb 2005
Kath Bennett
Overview
• Overview of research methods, study design.
• Some common statistical definitions.
Research
• Basic research: lab, biochemical, genetic
• Epidemiology: distribution & determinants of disease in a population
• Clinical: deals with patients with a particular disease
Research
• Clear aims and objectives from start – hypothesis
• Design study to be able to address the objectives set out
• Collect complete and accurate data
• Enter and analyse data
• Interpret the data in light of available evidence
• Publish
Types of Clinical Research
Quantitative Qualitative
Types of clinical studies
Quantitative
• Observational (epidemiological): cohort, case-control, cross-sectional, case reports
• Experimental (interventional), “clinical trials”: randomised controlled trial, open studies
• Pilot study; large simplified trial
Observational versus Experimental Research
• Observational research is seen as complementary to experimental research:
– An intervention producing a large impact can be shown using observational studies.
– Infrequent adverse events require large numbers, which is impractical in RCTs.
– Follow-up is longer term than in RCTs.
– Clinical uncertainty, providing evidence for RCTs.
– It may be impractical or unethical to do an RCT.
Comparison of random and non-random studies
HRT and coronary heart disease. Evidence from observational studies and recently published RCT (Lancet 2002)
Relative risk
Observational studies 0.5-0.75
RCT 1.29
Quantitative Methods
Advantages
• ‘Objective’ assessment
• Can sample large numbers (cost!)
• Can assess prevalence
• Repeatable results (consistency)
Quantitative Methods
Disadvantages
• Way in which questions are generated
– Researcher decides limits and imposes structure
– Little opportunity to detect “unexpected” new outcomes
• Sources of bias
– lack of explanatory power
– limited ability to describe context
Types of clinical studies
Qualitative
Focus group discussions
In-depth interviewing
Observation
Documentary
Primary versus Secondary Research
Primary (original research focused on patients or populations)
• Clinical trials
• Surveys
• Cohort studies
Secondary (reanalysis of previously gathered data)
• Systematic reviews
• Meta-analyses
• Economic analyses
Clinical trials
• Importance for ventures into clinical research
Principles required
• Appropriate design
• Randomisation
• Blinding
• Study power or sample size
Randomised Controlled Trial - RCT
QUESTION: Treatment (efficacy, safety comparison etc.)
PREFERRED DESIGN: R.C.T. (randomised controlled trial)
Clinical trial design
• Parallel group trials
– RANDOMISED: patients randomly allocated to either one treatment or another
– NON-RANDOMISED: patients not randomly allocated to treatment
• Factorial design
– Patients may receive none, one or more than one of several interventions.
• Cross-over trials
– Patients receive one treatment followed by another. Comparisons are within-subject, so there is less variability and the results are more precise; fewer patients are required, but the trial takes longer.
Randomised parallel group design
Participants satisfying entry criteria
Randomly allocated to receive A or B
A B
Participants followed up exactly the same way
Example: Digoxin vs Placebo – DIG study
Factorial design
Participants satisfying entry criteria
Participants randomly allocated to one of four groups. 2x2 factorial design
Example: Heart Protection Study (simvastatin, vitamins, placebo)
MRC/BHF Heart Protection Study
2x2 factorial treatment comparisons. Randomised to either:
• Simvastatin (40 mg daily) vs placebo tablets
• Vitamins (600 mg E, 250 mg C & 20 mg beta-carotene) vs placebo capsules
Planned mean duration: at least 5 years
Two-period, two-treatment cross-over trial
Participants satisfying entry criteria – sometimes followed by run-in period
A then B
B then A
Randomised to A followed by B or vice-versa
Usually ‘washout’ in between
Example: Aspergesic (A) vs ibuprofen (B) in rheumatoid arthritis.
RELIABILITY
To obtain evidence as reliable as possible
CHANCE EFFECTS (random error) and SYSTEMATIC BIASES (systematic error)
• Minimise chance effects (random error) by
– Increasing the number of patients studied (do large trials and reviews of trials)
• Minimise systematic biases (systematic error) by
– Using an appropriate method of allocation (randomisation)
– Ensuring investigator and/or subject unaware of treatment allocation (blinding)
– Basing the analyses on the allocated treatment (intention-to-treat)
– Including all relevant evidence (systematic review of similar trials)
Randomisation
• Clinical trials, and any studies, need to avoid bias
– By doctor, e.g. preferences for a treatment
– By individual patient
– By choice of design
• Randomisation avoids bias by removing choice of treatment by doctor or patient
• Randomisation is not always possible for practical or ethical reasons, leading to a controlled clinical trial (treated group compared directly with non-treated group)
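As a minimal sketch of how random allocation takes the choice of treatment away from doctor and patient, the snippet below assigns participants to arm A or B in permuted blocks. Permuted-block allocation, the block size, the fixed seed and the helper name `block_randomise` are illustrative assumptions, not something specified in the lecture.

```python
import random


def block_randomise(n_participants, block_size=4, seed=2005):
    """Allocate participants to arm 'A' or 'B' in permuted blocks (hypothetical helper)."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    allocations = []
    while len(allocations) < n_participants:
        # Each block holds equal numbers of A and B in a random order
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


allocation = block_randomise(12)
print(allocation)
```

Because 12 is a multiple of the block size, the two arms end up with six participants each; in a real trial the list would be generated and concealed by someone independent of recruitment.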
Blinding
• Blinding avoids bias in subjective assessments, e.g. pain, frequency of side effects
• Double blind (masked) trials – when neither patients nor investigators are aware of which treatment group has been assigned
• Single blind (masked) trials – when only the study participant is not aware of the treatment group assigned to them
• ‘Placebo’ is also useful in avoiding bias
Intention to treat (ITT)
• Intention of randomisation is to establish similar groups of patients in each arm
• Problems arise when non-adherence may be related to outcome or prognosis, leading to biased representation
• ITT analyses all patients according to randomised treatment irrespective of protocol violations etc.
• However, it does not solve all problems
Number of patients required – sample size
• Requirement for well-designed studies
• Most journals now require sample size calculations
• Reassurance money well spent – likelihood study will give unequivocal results
• Requirement for regulatory authorities, e.g. FDA
• Low sample size can be a reason for not recognising that one treatment is superior
• Unethical to perform a study if numbers too small to detect a useful difference
What is “power” of a study?
• “the ability to detect a true difference of clinical importance” (Doug Altman)
• “the confidence with which the investigator can claim that a specified treatment benefit has not been overlooked” (Sheila Gore)
Estimating sample size and power
• Identify a single major outcome measure – primary endpoint
– Survival, response rate, quality of life
• Specify size of difference required to detect
– Improvement in response from 20% to 30%
• ‘We want to be reasonably certain of detecting such a difference if it really exists’
– ‘detecting a difference’ refers to P<0.05
– ‘reasonably certain’ refers to having a chance of at least 80% of obtaining such a P value
Methods to calculate sample size
• Equations
– Mathematical equations available for computing sample size given the effect size, the significance level α and the power (1-β)
• Tables
– Based on equations above
• Nomogram
– Summarises figures in a graph, easy to use
• Computer packages
Example
• Objective: to compare effect of drug A vs drug B using blood pressure as outcome measure
• Design: RCT – half to drug A, half to drug B
• Require 80% power, and significance level set at 5%
• Expected mean difference between the two groups = 6
• Pooled standard deviation SD = 10
• Effect size = difference in means / SD = 6/10 = 0.6
• From tables n=45 per group
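The example can be checked with a short calculation. This is a sketch that assumes the usual normal-approximation formula for comparing two means, n per group = 2(z for α/2 + z for power)² / (effect size)²; the function name `n_per_group` is hypothetical. The approximation gives 44 per group, slightly below the 45 read from tables, which presumably include a small-sample correction.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


effect_size = 6 / 10  # difference in means / pooled SD, as in the example
n = n_per_group(effect_size)
print(n)
```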
Common statistical definitions
Classification of data
• Different types of data
– Nominal / categorical - used in classification (e.g. blood groups; also female / male)
– Ordinal - ordered categorical data (e.g. cigarettes smoked: non-smoker, <10 a day, 10-20 a day, >20 a day)
– Interval / continuous data (e.g. age, birthweight, plasma K levels)
Graphical presentations
BAR CHARTS
• Bar charts are used to show (graphically) frequency distributions for categorical data.
• The height of each ‘bar’ in the bar chart is proportional to the number of observations or frequency of the observations in each category.
BAR CHART
Figure: Bar chart of blood groups. Horizontal axis: blood group (O, B, AB, A). Vertical axis: number of patients (10 to 60).
Histograms
• Similar to bar charts but for continuous (interval) data
• the width of the bars varies only with varying intervals of data.
• Boundaries of histogram ‘bars’ are taken as half way between the upper limit of the lower group and the lower limit of the upper group.
Figure: Histogram of pre-operative haemoglobin rates. Horizontal axis: pre-operative % haemoglobin (30 to 100). Vertical axis: frequency (number of patients, 0 to 16). Mean = 61.3, Std. Dev = 14.40, N = 45.
The Normal distribution
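• An important distribution in statistics
– used for continuous data
– bell-shaped curve
– symmetric about the mean (or median)
Figure: Normal curve, horizontal axis from -4 to 4, vertical axis showing increasing probability (0 to 0.4). 95% of the area lies between -1.96 and 1.96, with 2.5% in each tail.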
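The 95% and 2.5% figures for the standard Normal curve can be checked numerically. A small sketch using only the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

central = z.cdf(1.96) - z.cdf(-1.96)  # area between -1.96 and 1.96
tail = z.cdf(-1.96)                   # area below -1.96 (same as above 1.96)

print(round(central, 3), round(tail, 3))
```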
Measures of location
• Gives an idea of the ‘average’ value on a particular scale
Common measures are:
– Mean - sum of observations / number of observations
– Median - middle value of the sample when arranged in order
– Mode - most common value (used when only a few different values)
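A short illustration of the three measures, using Python's standard `statistics` module. The observations are made up for the example and do not come from the lecture.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 4, 5, 7, 11]  # hypothetical observations

avg = mean(values)    # sum of observations / number of observations
mid = median(values)  # middle value when arranged in order
top = mode(values)    # most common value

print(avg, mid, top)
```

The single large value (11) pulls the mean above the median, which is why the median is often preferred for skewed data.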
Variation
• Humans differ in response to exposure to adverse effects
• Humans differ in response to treatment
• Humans differ in disease symptoms
• Diagnosis and treatment is often probabilistically based
Measures of variation
• Gives an idea of the spread or variability of the data
• Common measures are:
– Range
– Quartiles - the ‘inter-quartile range’ is the difference between the 25th and 75th centiles
– Sample variance - $\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Measures of dispersion (contd.)
The standard deviation ($\sigma$) is the square root of the variance.
– Standard error (if repeated samples were taken, the standard deviation of means from each sample)
• SE(Mean) = $\sigma/\sqrt{n}$
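The sketch below works through the variance, standard deviation and standard error. The first part uses made-up observations; the second applies SE = SD/√n to the figures quoted with the haemoglobin histogram (Std. Dev = 14.40, N = 45). Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

values = [2, 3, 3, 4, 5, 7, 11]  # hypothetical observations
n = len(values)
x_bar = mean(values)

# Sample variance by the formula: sum of squared deviations / (n - 1)
var_by_hand = sum((x - x_bar) ** 2 for x in values) / (n - 1)
var = variance(values)  # library version, also uses the n - 1 divisor
sd = stdev(values)      # square root of the sample variance
se = sd / sqrt(n)       # standard error of the mean

# Standard error of the mean for the haemoglobin data quoted in the lecture
se_hb = 14.40 / sqrt(45)

print(round(var, 2), round(sd, 2), round(se, 2), round(se_hb, 2))
```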
Confidence intervals
• Over-emphasis on hypothesis testing and p-values.
• The size and range of the difference between two groups is more informative than whether it is statistically significant or not.
• Confidence intervals, if appropriate to the type of study, should be used for major findings in both main text and abstract.
Confidence intervals
• If a CI is constructed, the significance of a hypothesis test can be inferred from it.
• For example, a 95% CI for the difference of two means that contains 0 would imply that the difference between the means was non-significant at the 5% level.
Systolic blood pressure in 100 diabetic and 100 non-diabetic men
Figure: Histograms of systolic blood pressure (mm Hg) for each group. Diabetics: mean 146.4. Non-diabetics: mean 140.4.
Difference between sample means = 6 mm Hg.
Systolic blood pressure in 100 men with diabetes and 100 men without
• Difference of 6.0 mm Hg found between mean systolic blood pressures, standard error 2.5 mm Hg.
• 95% confidence interval for population difference is from 1.1 to 10.9 mm Hg.
• This means there is a 95% chance that the indicated range includes the ‘true’ population difference in mean blood pressure.
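The interval quoted for the blood pressure example follows from the difference and its standard error. A sketch assuming the usual large-sample form, difference ± 1.96 × SE; variable names are illustrative.

```python
from statistics import NormalDist

diff = 6.0  # difference between sample means, mm Hg
se = 2.5    # standard error of the difference, mm Hg

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
lower = diff - z * se
upper = diff + z * se

print(round(lower, 1), round(upper, 1))
```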
What affects the width of a CI?
• The sample size: the width is proportional to 1/√n, so a smaller sample size leads to lower precision.
• Variability of data - the less variable the data, the more precise the estimate.
• Degree of confidence. 95% is most commonly used. If greater or less confidence is required, the CI widens or narrows respectively.
P-values and CIs
• One can infer from a CI whether there is a statistically significant difference, but not vice versa.
• Example, difference in BP between diabetics and non-diabetics found to be 6mm Hg. 95% confidence interval for population difference is from 1.1 to 10.9 mm Hg.
• The interval does not contain ‘0’ so we can infer that there is a statistically significant difference between the groups. In fact, the p-value from an independent t-test was p=0.02.
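The quoted p-value can be roughly reproduced from the same two numbers. This sketch uses a normal approximation in place of the independent t-test named in the lecture, which is an assumption that should be reasonable with 100 men in each group.

```python
from statistics import NormalDist

diff = 6.0  # difference in mean systolic blood pressure, mm Hg
se = 2.5    # standard error of the difference, mm Hg

z_stat = diff / se                              # test statistic
p_value = 2 * (1 - NormalDist().cdf(z_stat))    # two-sided p-value

print(round(z_stat, 2), round(p_value, 3))
```

To two decimal places the result agrees with the p=0.02 reported above.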
Probability
• Probability and statistical tests
– Statistical tests are used to assess the weight of evidence and to estimate the probability that the data arose by chance
– Presented as a ‘p value’, usually p<0.05, i.e. if there were no true difference, a difference at least as large as that observed would be expected to arise by chance less than 5% of the time; or p<0.001, less than 0.1% of the time
– 5% or 1% is known as the significance level of the test, or alpha (α)
Effect on significance
• ‘Non-significance’
– Indicates insufficient weight of evidence
– Does not mean ‘no clinically important difference between groups’
– If power of test is low (i.e. sample size too small), all one can conclude is that the question of difference between groups is unresolved
• Confidence intervals show, more informatively, the impact of sample size upon precision of a difference
Reporting p-values
P value Wording Summary
>0.05 Not significant ns
0.01 to 0.05 Significant *
0.001 to 0.01 Very significant **
< 0.001 Extremely significant ***
Report the actual p-value
Measuring effectiveness
Risk
PROPORTION
A ratio where the numerator (top) is part of the denominator (bottom).
RISK
Number of subjects in a group who have an event divided by the total number of subjects in the group. It is the probability (proportion) of having an event in that group (P). It is called incidence when expressed per unit time.
RELATIVE RISK (RR)
Ratio of risk in exposed group to risk in not exposed group (P1/P2)
Example
Type of vaccine   Got influenza   Avoided influenza   Total
I                 43              237                 280
II (Control)      52              198                 250
Risk of disease in Vaccine Group I = 43/280=0.154
Risk of disease in Vaccine Group II=52/250=0.208
Relative Risk (Risk Ratio) =0.154/0.208 =0.74
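The same arithmetic as a short script, using the counts from the vaccine table. Variable names are illustrative.

```python
# Counts from the vaccine example
cases_1, total_1 = 43, 280  # Vaccine I
cases_2, total_2 = 52, 250  # Vaccine II (control)

risk_1 = cases_1 / total_1  # risk of influenza with vaccine I
risk_2 = cases_2 / total_2  # risk of influenza with vaccine II
relative_risk = risk_1 / risk_2

print(round(risk_1, 3), round(risk_2, 3), round(relative_risk, 2))
```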
Odds
ODDS
Probability of developing disease divided by probability of not developing disease. P/ (1-P)
Often expressed as number of times something expected not to happen: number of times something expected to happen.
ODDS RATIO (OR)
Ratio of odds for exposed group divided by odds for not exposed group.
{P1/(1-P1)}/{P2/(1-P2)}
Odds ratios are often interpreted as relative risks, which they approximate closely when events are rare, and they emerge naturally in some types of studies (case-control studies).
Example
Odds of disease in Vaccine Group I = 0.154/(1-0.154)=0.182
Odds of disease in Vaccine Group II= 0.208/(1-0.208)=0.263
Odds ratio of getting disease in Group I relative to Group II=0.182/0.263=0.69 (close to relative risk of 0.74)
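A sketch of the odds calculation from the vaccine counts. Working from unrounded risks gives odds of about 0.181 for Group I rather than the 0.182 obtained from the rounded risk, but the odds ratio still comes to 0.69. Variable names are illustrative.

```python
# Counts from the vaccine example
cases_1, total_1 = 43, 280  # Vaccine I
cases_2, total_2 = 52, 250  # Vaccine II (control)

p1 = cases_1 / total_1
p2 = cases_2 / total_2

odds_1 = p1 / (1 - p1)  # equivalent to 43 / 237
odds_2 = p2 / (1 - p2)  # equivalent to 52 / 198
odds_ratio = odds_1 / odds_2

print(round(odds_1, 3), round(odds_2, 3), round(odds_ratio, 2))
```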
Absolute risk reduction
Absolute risk reduction (ARR)
Risk in control group minus risk in treated group
ARR=p2-p1 (p1 = risk in treated group, p2 = risk in control group)
Number needed to treat (NNT)=1/ARR
This is the number you would need to treat under each of two treatments to get one extra person cured under the new treatment
Example
Absolute risk reduction for vaccine I=
0.208 - 0.154=0.054
NNT=1/0.054=18.5
Thus on average one would have to give vaccine I to 19 patients to expect one extra patient to be protected from influenza compared with vaccine II.
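The vaccine example as a short script, taking the absolute risk reduction as control risk minus treated risk, as in the worked example. With unrounded risks the NNT is about 18.4 rather than 18.5, and it still rounds up to 19. Variable names are illustrative.

```python
from math import ceil

risk_treated = 43 / 280  # Vaccine I
risk_control = 52 / 250  # Vaccine II (control)

arr = risk_control - risk_treated  # absolute risk reduction
nnt = 1 / arr                      # number needed to treat
nnt_whole = ceil(nnt)              # round up to whole patients

print(round(arr, 3), round(nnt, 1), nnt_whole)
```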
Summary
• Have clear aims and objectives for the study
• Choose the study design that best addresses these aims
• Use randomisation, blinding etc. where appropriate
• Make sure sufficient numbers of individuals studied to be able to reliably answer the question.
Useful statistical references
• M Bland. An Introduction to Medical Statistics.
• MJ Campbell and D Machin. Medical Statistics: a commonsense approach. Wiley, 1993.
• DG Altman. Practical statistics for medical research. London: Chapman & Hall, 1991.
• DS Moore and GP McCabe. Introduction to the practice of statistics. WH Freeman and Company, New York, 3rd Edition. 1999.