statistics bootcamp 2013
Statistics Bootcamp 101 for HLABC Members
Penny Brasher, PhD
Vancouver, BC
June 14, 2013
© PMA Brasher (UBC) Biostats Bootcamp 14.Jun.2013
Statistics are everywhere
Angus Reid Public Opinion surveyed 808 randomly selected B.C. residents from May 1 to 2. It claims a margin of error of +/-3.5 percentage points.
What is Statistics?
What is Biostatistics?
Biostatistics = statistics applied to biomedical problems
design and analysis of experiments
design and analysis of observational studies
measurement, data analysis (description, inference), statistical graphics
detective work
making decisions in the face of uncertainty (variability)
inference from a sample (specific) to a population (general)
Part I
Basic Statistical Concepts
Basic Concepts
Two broad categories of statistics:
Descriptive Statistics
Inferential Statistics
Basic Concepts
Descriptive Statistics
using numerical summaries and figures to summarize or characterize a set of data.
mean, median, variance, range, etc.
histograms, scatterplots, boxplots, etc.
no assumptions are made.
⇒ If the data are a random sample from a certain population, the sample represents the population in miniature.
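As a minimal sketch, the numerical summaries listed above can be computed with Python's standard `statistics` module. The data are the six IG-group times from the circulatory-collapse example later in this talk. The `describe` helper is a hypothetical name chosen for illustration.

```python
import statistics

# IG-group times from the circulatory-collapse example later in this talk.
ig_times = [10, 35, 42, 42, 43, 70]

def describe(values):
    """Return common numerical summaries of a list of numbers (hypothetical helper)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # statistics.variance / stdev use the n - 1 (sample) denominator
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }

summary = describe(ig_times)
print(summary)
```

No distributional assumption is needed to compute these numbers. Assumptions only come in when the numbers are used to say something about a population.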
Part II
Types of Data
Types of Data
Categorical Data
Nominal variables assume values that fall into unordered categories. Nominal data may be binary (dichotomous) or polychotomous (polytomous). Examples: admission status (admitted, not admitted), survival status (alive, dead), race (Caucasian, Asian, Black, ...).
Ordinal variables assume values that fall into ordered categories, but differences between values are not meaningful. Examples: response to treatment (worse, same, improved), degree of illness (none, mild, moderate, severe), Likert item (strongly disagree, disagree, neutral, agree, strongly agree).
Numerical (Metric, Quantitative) Data
Numerical discrete variables assume a countable number of values, and there can be gaps between possible values. Examples: number of comorbidities, number of falls in a year.
Numerical continuous variables can, in theory, assume infinitely many values in a given range, with no gaps between possible values. Examples: age, weight, etc.
Grip strength of health librarians
Data collected:
Variable                    Value   Type
Cohort                      HLABC
ID                          1
Year of birth               1952    numerical discrete
Height (cm)                 161.3   numerical continuous
Sex                         F       nominal
Grip position               2       ordinal
Dominant hand               R       nominal
Order                       RL      nominal
Grip strength, Right (kg)   31.8    numerical continuous
Scrunchy face (R)           1       nominal
Grip strength, Left (kg)    24.6    numerical continuous
Scrunchy face (L)           1       nominal
Descriptive Statistics: Types of Data
Nota bene
There is no such thing as "nonparametric data".
⇒ Parameters belong to models.
Part III
Descriptive Statistics
Grip strength of health librarians: Descriptive Statistics
How would you summarize the characteristics of this sample of librarians?
The characteristics we have collected include:
Year of birth
Height (cm)
Sex
Dominant hand
Grip strength
Descriptive Statistics: Data Summaries
For categorical variables, report frequencies and percentages.¹
For numerical continuous variables, one typically wants to describe the central tendency (central location) of the data and the degree to which the data are, or are not, spread out (dispersion).
Why are mean and standard deviation often used to describe continuous variables?
¹ Don't report percentages if the sample size is small.
The Normal (Gaussian) Distribution
Normal distributions are completely determined by only two values – the mean, µ, and the standard deviation, σ.
[Figure: Gaussian (normal) distributions with (µ, σ) = (0, 1), (3, 1) and (0, 2), plotted from −8 to 8]
The mean, µ, determines the center.
The standard deviation, σ, determines the spread (variability).
The Normal (Gaussian) Distribution
[Figure: density of N(µ, σ) plotted from µ − 4σ to µ + 4σ, with the central 95% of the area marked]
95% of observations will lie in the interval (µ − 1.96σ, µ + 1.96σ).
∼68% of observations will lie in the interval (µ − σ, µ + σ).
50% of observations will lie in the interval (µ − 0.674σ, µ + 0.674σ).
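The three coverage statements above can be checked with a short sketch using `statistics.NormalDist` from the Python standard library. The `central_coverage` helper is a hypothetical name. The result does not depend on µ or σ, which is why the statements hold for every normal distribution.

```python
from statistics import NormalDist

def central_coverage(k, mu=0.0, sigma=1.0):
    """Probability that a N(mu, sigma) observation lies in (mu - k*sigma, mu + k*sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1.96, 1.0, 0.674):
    print(k, round(central_coverage(k), 3))

# The multiplier for the central 50% is the 75th percentile of N(0, 1).
print(round(NormalDist().inv_cdf(0.75), 4))
```

The 1σ coverage works out to about 0.683, which is why "∼68%" is the usual figure.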
Describing Data: Data Summaries
For continuous variables that are approximately normally distributed, the sample distribution may be summarized with the sample mean, x̄, and the sample standard deviation, sd.
For continuous variables with skewed distributions, other summary statistics should be used. If the distribution is unimodal, the median with P25 & P75 (Q1 & Q3), or the median with P10 & P90, could be used.
– Altman DG, Bland JM. Quartiles, quintiles, centiles, and other quantiles. BMJ 1994;309:996.
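A minimal sketch of the median-and-quartiles summary uses `statistics.quantiles` from the Python standard library. The data are again the six IG-group times from the circulatory-collapse example later in this talk. Software packages define percentiles in slightly different ways. Here Python's default `method="exclusive"` is assumed, and `method="inclusive"` would give different quartiles for a sample this small.

```python
import statistics

# IG-group times from the circulatory-collapse example later in this talk.
ig_times = [10, 35, 42, 42, 43, 70]

# Cut points for quartiles: P25 (Q1), P50 (median), P75 (Q3).
q1, median, q3 = statistics.quantiles(ig_times, n=4)
print(f"median {median} (P25 {q1}, P75 {q3})")
```

With only six values, each quartile rests on one or two observations. This is one reason the talk later lists such data in full instead of summarizing them.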
Descriptive Statistics
Part of Table 1 from a randomized trial in patients undergoing CABG.
Table 1. Anthropometric, baseline and procedural characteristics (intent-to-treat and safety population)

                                                   Clevidipine     Nitroglycerin
                                                   N=49            N=51
Age, years; mean (SD)                              65.8 (11.3)     63.2 (12.3)
Sex
  Male, n (%)                                      40 (81.6)       43 (84.3)
  Female, n (%)                                    9 (18.4)        8 (15.7)
Weight, kg; mean (SD)                              79.7 (15.9)     82.1 (18.5)
Height, cm; mean (SD)                              170.4 (9.0)     170.5 (12.4)
ASA Physical Status*, n (%)
  I                                                0 (0.0)         0 (0.0)
  II                                               0 (0.0)         1 (2.0)
  III                                              29 (59.2)       33 (64.7)
  IV                                               19 (38.8)       16 (31.4)
  V                                                1 (2.0)         0 (0.0)
Body Mass Index, kg/m2; mean (SD)                  27.4 (5.1)      28.2 (5.2)
Index Procedure, n (%)
  CABG                                             43 (87.8)       45 (88.2)
  CABG plus valve surgery                          6 (12.2)        6 (11.8)
Target MAP, pre-CPB, mmHg; mean (SD)               76.1 (7.0)      76.4 (7.9)
Target MAP, aortic cannulation, mmHg; mean (SD);
  CLV n=49, NTG n=49                               64.6 (11.9)     63.6 (10.4)
Duration of bypass, min; mean (SD);
  CLV n=47, NTG n=51                               102.5 (37.1)    99.2 (35.8)
Duration of aortic cannulation (min) mean (SD);
  CLV n=35, NTG n=38                               18.9 (40.3)     13.3 (26.1)
IABP used, mean (SD)                               2 (4.1)         0 (0.0)
Number of grafts, mean (SD)                        3.1 (0.8)       3.0 (1.0)

Abbreviations: kg = kilograms, cm = centimeters. ASA = American Society of Anesthesiologists. IABP = intra-aortic balloon pump. SD = standard deviation. CLV = clevidipine. NTG = nitroglycerin.
*ASA physical status unknown for 1 NTG-treated patient.
What changes would you make to this table?
Descriptive Statistics
⇒ For skewed (asymmetric) data use percentiles.
⇒ For nominal and ordinal variables, and for numerical discrete variables with a limited range, use a table of frequencies.
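A frequency table like the ASA rows of Table 1 can be produced with a sketch built on `collections.Counter`. The input list is rebuilt from the clevidipine-arm ASA counts in Table 1 (29 III, 19 IV, 1 V; N = 49). The `frequency_table` helper is hypothetical. Passing the full list of categories keeps empty ones (I, II) in the table, as Table 1 does.

```python
from collections import Counter

def frequency_table(values, categories):
    """Count and percentage (1 decimal) for each category, including empty ones."""
    counts = Counter(values)
    n = len(values)
    return {c: (counts[c], round(100 * counts[c] / n, 1)) for c in categories}

# ASA physical status in the clevidipine arm of Table 1 (N = 49).
asa_clevidipine = ["III"] * 29 + ["IV"] * 19 + ["V"] * 1
asa_levels = ["I", "II", "III", "IV", "V"]

for status, (count, pct) in frequency_table(asa_clevidipine, asa_levels).items():
    print(f"{status:>3}  {count:>2} ({pct})")
```

This reproduces the clevidipine column of Table 1. With small arms, the counts alone may be the better report.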
Describing data: Data Summaries
Sometimes you don’t need to summarize data:
Times to circulatory collapse(s) were 10, 35, 42, 42, 43, 70 and 5, 46, 50, 50, 54, 64 in the IG and C groups, respectively.
Part V
Inferential Statistics
Basic Concepts
Inferential Statistics
making inferences about a population from a sample.
estimation and hypothesis testing.
some assumptions are made.
Quantifying the role of chance
A very simple example.
We wish to know if a coin is "fair". By "fair" we mean that the probability of getting a head on any flip is 1/2.
To determine if the coin is "fair" we could take it to the laboratory and:
determine the weight distribution throughout the coin,
determine the aerodynamics of the coin,
etc.
In this way we would discover the "truth".
OR
We could conduct an experiment, compute some statistics and try to get close to the truth.
⇒ Statistical inference.
Significance Testing: Quantifying the role of chance
Returning to our very simple example.
We decide to flip the coin 15 times.
We observe 4 heads in 15 flips.
Significance Testing: Quantifying the role of chance
    N   Observed   Expected   Assumed p   Observed p
-----------------------------------------------------
   15          4        7.5     0.50000      0.26667

Pr(k <= 4)            = 0.059  (one-sided test)
Pr(k <= 4 or k >= 11) = 0.118  (two-sided test)
What does this mean?
Significance Testing
Sir Ronald A. Fisher
"In general, tests of significance are based on hypothetical probabilities calculated from their null hypotheses. They do not generally lead to any probability statements about the real world, but to a rational and well-defined measure of reluctance to the acceptance of the hypotheses they test."
– Fisher RA. Statistical Methods and ScientificInference (1956)
Significance Testing: A very simple example
Study design: flip coin 15 times.
Test statistic: number of heads.
Evidence against: too many or too few heads.
Probability model: Binomial(n = 15, π = 0.5)
[Figure: theoretical distribution of the number of heads in 15 flips if the coin is fair; x-axis: number of heads (0 to 15); y-axis: probability (0.00 to 0.15)]
Fisher would ask us to consider whether 4 heads (plus more extreme results) is unlikely under the null hypothesis, i.e. a fair coin.
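A minimal sketch of the P-values reported earlier for 4 heads in 15 flips uses exact rational arithmetic from Python's standard library (`fractions.Fraction`, `math.comb`). The helper names `binom_pmf` and `binomial_test` are hypothetical. The two-sided rule shown sums the probabilities of every outcome that is no more likely than the one observed. That is one common convention, and for a fair coin it coincides with Pr(k <= 4 or k >= 11).

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Exact Pr(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_test(k_obs, n, p=Fraction(1, 2)):
    """Exact one-sided (lower-tail) and two-sided P-values under H0: pi = p."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    one_sided = sum(pmf[: k_obs + 1])                      # Pr(K <= k_obs)
    two_sided = sum(q for q in pmf if q <= pmf[k_obs])     # outcomes no more likely than observed
    return one_sided, two_sided

one, two = binomial_test(4, 15)
print(round(float(one), 3), round(float(two), 3))
```

The lower tail is used here because fewer heads than expected (7.5) were observed. Both numbers are probabilities of data given H0. Neither is the probability that the coin is fair.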
Significance Testing: The P-value
Pr(k <= 4 or k >= 11 | coin is fair) = 0.118
The more interesting question is . . .
What is the probability that the coin is fair? i.e. What is the probability that the null hypothesis is true?
I have no idea.
Quantifying the role of chance: The P-value
In significance testing, the P-value is the probability of obtaining a result (i.e. a test statistic) at least as extreme as the one that was actually observed, when the null hypothesis is true.
It is Pr(data at least as extreme as those observed | H0 is true), not Pr(H0 is true | data).
An "unlikely" event suggests that H0 is unlikely, but the P-value provides no measure of just how unlikely H0 is.
Akin to proof by contradiction.
We have a model and we examine the extent to which the data contradict the model.
The basis for suggesting a contradiction is observing data that are highly improbable under the model.
⇒ In health research involving human subjects, P-values are next to useless.
And yet they’re everywhere.
Inferential Statistics: P-values vs Confidence Intervals
In a randomized trial comparing two treatments, the following mortality results were reported by the authors.
                 number         percent
              Std    Exp      Std    Exp
died           19     12     31.7   20.7
survived       41     46     68.3   79.3
total          60     58
P = 0.21, Fisher’s exact test.
The authors concluded "there is no difference in mortality".
What do you think?
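As a minimal sketch, the reported P = 0.21 can be recomputed from the 2x2 table above with exact arithmetic from the Python standard library. The function name `fisher_exact_two_sided` is hypothetical. It assumes the usual two-sided rule: sum the hypergeometric probabilities of all tables, with the same margins, that are no more likely than the observed one. If SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided value.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left count follows a
    hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Pr(top-left cell = x | margins), exact
        return Fraction(comb(row1, x) * comb(n - row1, col1 - x), denom)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    return sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs)

# Rows: died, survived; columns: Std, Exp.
p_value = fisher_exact_two_sided(19, 12, 41, 46)
print(round(float(p_value), 2))
```

The P-value only says that 19 vs 12 deaths is not very surprising if the treatments are equivalent. It says nothing about how large the difference might be.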
Inferential Statistics: Confidence Intervals
A P-value tells you nothing about the size of the treatment effect.
The estimate of the true treatment effect is:
31.7% - 20.7% = 11.0%, 95% CI: -4.9% to 26.1%, 80% CI: 0.6% to 21.0%.
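The talk does not say which method produced these intervals. This sketch uses Newcombe's hybrid score method, which combines a Wilson score interval for each arm. Run on 19/60 vs 12/58, it reproduces the reported 95% and 80% limits to one decimal place, but that match is the only evidence it is the method the authors used. The helper names `wilson` and `newcombe_diff` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson(x, n, z):
    """Wilson score interval for a single proportion x/n."""
    p = x / n
    centre = (x + z * z / 2) / (n + z * z)
    half = z * sqrt(n * p * (1 - p) + z * z / 4) / (n + z * z)
    return centre - half, centre + half

def newcombe_diff(x1, n1, x2, n2, level=0.95):
    """Difference p1 - p2 with Newcombe's hybrid score confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper

# Deaths: 19/60 on standard treatment, 12/58 on experimental.
for level in (0.95, 0.80):
    d, lo, hi = newcombe_diff(19, 60, 12, 58, level)
    print(f"{level:.0%} CI: {d:.1%} ({lo:.1%} to {hi:.1%})")
```

Unlike the simpler Wald interval, this interval need not be symmetric about the estimate. The reported limits (-4.9% and 26.1% around 11.0%) show that asymmetry.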
What does the confidence interval represent?
statistical definition: If the study were repeated 1000 times and a 95% CI were constructed each time, we would expect about 950 of those intervals to include the population parameter. A confidence interval reported from a particular study may or may not include the actual population value.
working definition: values of the population parameter that are consistent with the sample data.
⇒ The confidence interval gives a plausible range of values for the unknown population parameter.
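The "repeat the study 1000 times" definition can be illustrated with a small simulation. This sketch is illustrative only. It assumes normally distributed data with a known σ, so the 95% interval for the mean is x̄ ± 1.96σ/√n. The parameter values (n = 25, µ = 10, σ = 2) and the seed are arbitrary choices, and `coverage` is a hypothetical helper.

```python
import random
from statistics import NormalDist, fmean

def coverage(reps=1000, n=25, mu=10.0, sigma=2.0, level=0.95, seed=2013):
    """Fraction of simulated studies whose CI for the mean contains the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5          # sigma treated as known, for simplicity
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half) < mu < (xbar + half)
    return hits / reps

print(coverage())  # close to, but rarely exactly, 0.95
```

Any single simulated interval either contains µ or it does not. The 95% describes the procedure, not one particular interval.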
Would you want to receive standard treatment?
Angus Reid Public Opinion surveyed 808 randomly selected B.C. residents from May 1 to 2. It claims a margin of error of +/-3.5 percentage points.
The variance of a sample proportion, p(1 − p)/n, is largest when p = 0.50.
. cii 808 404
-- Binomial Exact --
Obs Mean Std. Err. [95% Conf. Interval]
----------------------------------------------------
808 0.5 0.01759 0.46496 .53504
2*.01759 = 0.03518
Angus Reid is reporting the half-width (margin of error) of the widest possible 95% confidence interval, using 2 × SE in place of 1.96 × SE.
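The same figure can be sketched with the normal approximation in a few lines of Python. The helper `max_margin_of_error` is hypothetical. It evaluates the standard error at p = 0.5, where the variance of a proportion is largest. The Stata `cii` output above reports exact binomial limits, which this sketch does not compute.

```python
from math import sqrt
from statistics import NormalDist

def max_margin_of_error(n, level=0.95):
    """Largest SE and margins of error for a proportion, attained at p = 0.5."""
    se = sqrt(0.5 * 0.5 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return se, z * se, 2 * se

se, moe_z, moe_2se = max_margin_of_error(808)
print(f"SE = {se:.5f}; z*SE = {moe_z:.4f}; 2*SE = {moe_2se:.4f}")
```

Using z ≈ 1.96 gives about 3.4 percentage points, while the rounder 2 × SE gives the 3.5 quoted in the poll.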
Confidence Intervals vs P-values: Interpreting Results
Overemphasis on hypothesis testing — and the use of P-values to dichotomize significant or non-significant results — has detracted from more useful approaches to interpreting study results, such as estimation and confidence intervals. In medical studies investigators should usually be interested in determining the size of difference of a measured outcome between groups, rather than a simple indication of whether or not it is statistically significant.
– Gardner MJ, Altman DG. Statistics with Confidence
"'The 0.05 syndrome', a severe, debilitating statistical illness."
– Palmer CR. 2002
Inferential Statistics
P-values Confidence intervals
Part VI
The other big problem - useless graphics.
Graphical Displays
[Figure: pie chart with slices "Common pitfalls in statistics", "Evaluating research articles", "Other" and "What to look for in a clinical trial"]
Red larger than yellow or yellow larger than red?
Graphical Displays
[Figure: the same categories as a bar chart; x-axis: frequency (0 to 10); categories "What to look for in a clinical trial", "Other", "Common pitfalls in statistics" and "Evaluating research articles"]
Graphical Displays
"Any data that can be encoded by one of the pop charts [pie charts, divided bar charts, area charts] can also be encoded by either a dot plot or a multiway dot plot that typically provides far more efficient pattern perception and table look-up than the pop-chart encoding."
– WS Cleveland, The Elements of Graphing Data (rev. Ed) 1994.
Graphical Displays
[Figure: dot plot of the 2011 Operating Expenditure Budget ($1.03B); x-axis: percent of total budget (0 to 25); categories: Civic Theatres, Contingency & Transfers, Civic Grants, General Government, Library, Community Services, Engineering, Capital Program & Debt, Fire, Support Services, Parks & Recreation, Police, Utilities]
Graphical Displays: Dynamite plots
"Dynamite pushers" "Skyscrapers with TV-aerials" "Pinhead plots"
[Figure: dynamite plot for the sham acupuncture group; y-axis: post-treatment score, mean + sd (0 to 100)]
Low information-to-ink ratio. Inaccurate "look-up".
Graphical Displays: Dynamite plots
[Figure: two panels for the sham acupuncture group: left, the dynamite plot (y-axis: post-treatment score, mean + sd, 0 to 100); right, the same group on a y-axis labelled "Post-treatment score" (0 to 100)]
Graphical Displays: Tufte's list of nine "shoulds" (The Visual Display of Quantitative Information, 1983)
Graphical displays should:
show the data,
induce the viewer to think about the substance rather than about methodology,graphic design, the technology of graphic production, or something else,
avoid distorting what the data have to say,
present many numbers in a small space,
make a large data set coherent,
encourage the eye to compare different pieces of data,
reveal the data at several levels of detail, from a broad overview to the fine structure,
serve a reasonably clear purpose: description, exploration, tabulation, or decoration, and
be closely integrated with the statistical and verbal descriptions of a data set.