Statistics for Anaesthesiologists

Dr John George K., MD, PDCC

Associate Professor of Anaesthesiology

KMC, Manipal

Recommended Software

• RStudio (GUI) with R, R Commander, R Commander Plugins like EZR (Free, Cross platform, powerful programming paradigm)

• G*Power (Free, for power analysis)

• SPSS (Commercial, expensive)

• SOFA (Free, basic)

• Graphpad.com

• Spreadsheet software like MS Excel for initial data entry (export as CSV file format)

Data Types

• Nominal or Categorical data

• Ordinal data

• Interval data

• Ratio data

Nominal: Categorical data and numbers that are simply used as identifiers or names. Ex: social security (Aadhaar) number

Ordinal: An ordered series of relationships or rank order. Ex: first, second, or third place in a contest, Likert scale

Interval: A scale that represents quantity and has equal units, but for which zero is simply another point of measurement (not an absence of the quantity). Ex: Fahrenheit scale

Ratio: Similar to the interval scale. However, this scale also has an absolute zero (no numbers exist below zero). Ex: Height, Weight

Parametric tests

Non-parametric tests

Reporting data types

| OK to compute | Nominal | Ordinal | Interval | Ratio |
| Frequency distribution | Yes | Yes | Yes | Yes |
| Median, percentiles | No | Yes | Yes | Yes |
| Mean, SD, SE of mean | No | No | Yes | Yes |
| Ratio or coefficient of variation | No | No | No | Yes |

Tests for normality of data

• Kolmogorov-Smirnov Test – inferior to others, relies on goodness of fit of a sample with a normal distribution curve, avoid its use!

• Shapiro-Wilk Test – better, more specific, more powerful especially with small sample sizes, available in R Commander and SPSS (under menu Analyze > Descriptive Statistics > Explore)

• D'Agostino-Pearson test

• Anderson-Darling test

• Q-Q (Quantile-Quantile) Plot – visual guide

• Histogram – inferior, look for Skew or Kurtosis

• Density Plot – better, look for Skew or Kurtosis
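The skew and kurtosis that the histogram and density plot are inspected for can also be put into numbers. The following is a minimal sketch in plain Python, assuming the simple moment-based definitions (population moments, excess kurtosis relative to a normal curve); the function name and the sample values are hypothetical and only serve as illustration. It does not replace a formal test such as Shapiro-Wilk.

```python
def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both are 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical sample with a long right tail (illustration only)
sample = [1, 1, 2, 2, 3, 10]
skewness, excess_kurtosis = skew_and_excess_kurtosis(sample)
print(f"skewness = {skewness:.2f}, excess kurtosis = {excess_kurtosis:.2f}")
```

A clearly positive skewness points to a right tail, a clearly negative one to a left tail; values near zero are compatible with, but do not prove, normality.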

Choosing a statistical test

• Make sure you have adequate sample size (power) to reject null hypothesis (Ho)

• Check whether it is a one-tailed (only < or > μ, only one direction) or two-tailed comparison (≠ μ, tests significance on both sides) – in general use two-tailed

• Look at your data types – ordinal, interval etc

• Do descriptive statistics testing

• Test normality of data – tests and visual comparison (especially when n<30)

• Decide to use Parametric Vs Non-parametric tests

• Look at the number of groups: for 2 groups use t-tests (if n<30) or z-test (n>30); for more than 2 groups use ANOVA (F-test); or use their non-parametric equivalents

• For 2 or more groups check if data is paired or independent

What is p-value?

Ronald Fisher

• The p-value is a probability computed from the test statistic’s sampling distribution under the null hypothesis (the null distribution; we first assume Ho is true!): the probability of a result at least as extreme as the one observed

• The left-tailed p-value is the cumulative probability (under the null distribution) of the observed value of the test statistic, the right-tailed p-value is one minus that probability, while the two-tailed p-value is twice whichever of these is smaller.

• The p-value is NOT the probability that the null hypothesis is true, nor is it the probability that the alternative hypothesis is false
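The three tail definitions above can be sketched for the simplest case, a z statistic whose null distribution is the standard normal curve. This is an illustrative sketch only; the function name and the example value z = 1.96 are assumptions, and other tests (t, F, chi-square) use their own null distributions.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Left-, right- and two-tailed p-values for a z statistic under Ho."""
    left = NormalDist().cdf(z)            # cumulative probability up to z
    right = 1.0 - left                    # probability beyond z
    two_tailed = min(1.0, 2.0 * min(left, right))
    return left, right, two_tailed


# Hypothetical observed statistic (illustration only)
left, right, two_tailed = p_values_from_z(1.96)
alpha = 0.05  # set by design, before looking at the data
print(f"left = {left:.4f}, right = {right:.4f}, two-tailed = {two_tailed:.4f}")
print("reject Ho" if two_tailed < alpha else "do not reject Ho")
```

Note that α is fixed in advance while the p-value is computed from the data; the comparison between the two is the only point where they meet.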

• p-value is NOT the same as α !

• p-value is NOT the probability of rejecting the null hypothesis (we reject Ho when p-value is less than the significance level which is α)

• p-value is computed while α is set by experimental design

• If Ho is true, α is the probability of rejecting null hypothesis

CHI SQUARE OR FISHER’S EXACT TEST?

• In the days before computers were readily available, people analyzed contingency tables by hand, or using a calculator, using chi-square tests

• The chi-square test works by computing the expected values for each cell if the relative risk (or odds ratio) were 1.0. It then combines the discrepancies between observed and expected values into a chi-square statistic from which a P value is computed


• The chi-square test is only an approximation!

• Yates' continuity correction is designed to make it better, but it overcorrects and so gives a p-value that is too large (too "conservative")

• With large sample sizes, Yates' correction makes little difference, and the chi-square test works very well. With small sample sizes, chi-square is not accurate, with or without Yates' correction


• Fisher's exact test, as its name implies, always gives an exact P value and works fine with small sample sizes

• Fisher's test (unlike chi-square) is very hard to calculate by hand (so generally used for 2 x 2 or 2 x n table), but is easy to compute with a computer

• Advisable to use when any cell of the table has expected value < 5


• Most statistical books advise using it instead of chi-square test (especially small samples, but chi square becomes acceptable for large sample sizes)

• Fisher’s exact test can be used for an m x n table

• Some have criticized it as the exact answer to the wrong question!

| | Men | Women | Total |
| Dieting | a | b | a+b |
| Not Dieting | c | d | c+d |
| Total | a+c | b+d | a+b+c+d = n |
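For a 2 x 2 table laid out as above, both tests can be sketched in plain Python. This is a minimal sketch, not a replacement for R or SPSS: the function names are hypothetical, the counts in the example are made up, all row and column totals are assumed to be non-zero, and the two-sided Fisher p-value uses one common convention (summing every table that is no more probable than the observed one).

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic and p-value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # value if there were no association
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)               # Yates' continuity correction
            statistic += diff ** 2 / expected
    # Upper tail of a chi-square distribution with 1 df
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_probability(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical small-sample table (illustration only)
a, b, c, d = 3, 1, 1, 3
print("chi-square:", chi_square_2x2(a, b, c, d))
print("chi-square with Yates:", chi_square_2x2(a, b, c, d, yates=True))
print("Fisher exact p:", fisher_exact_2x2(a, b, c, d))
```

With counts this small every expected value is below 5, which is exactly the situation where the exact test is the advisable choice.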

ANOVA (ANALYSIS OF VARIANCE)

• The one-way analysis of variance (ANOVA) is used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups

• For ex: to understand if exam performance (dependent variable) differed based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low, medium and high-stressed students)

ONE-WAY ANOVA DESIGN

| Treatment/Condition | Levels (Independent Variable) | | |
| | Group1 | Group2 | Group3 |
| CONDITION1 | S1 DV | S6 DV | S11 DV |
| | S2 DV | S7 DV | S12 DV |
| | S3 DV | S8 DV | S13 DV |
| | S4 DV | S9 DV | S14 DV |
| | S5 DV | S10 DV | S15 DV |

DV = Dependent Variable, S = Subject


• It is an omnibus test statistic and cannot tell you which specific groups were significantly different from each other; it only tells you that at least two groups were different.

• Since you may have ≥3 groups in your study design, determining which of these groups differ from each other is done using a Post-hoc test (Tukey’s test is preferred) which gives a Multiple comparisons table.


• To apply ANOVA 6 assumptions must be met:

• Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., they are continuous)

• Assumption #2: Your independent variable should consist of two or more categorical, independent groups; it can be used for just two groups (but an independent-samples t-test is more commonly used for two groups)


• Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.

• Assumption #4: There should be no significant outliers.

• Assumption #5: Your dependent variable should be approximately normally distributed for each category of the independent variable (but it is quite "robust" to violations of normality)

• Assumption #6: There needs to be homogeneity of variances (can be tested in SPSS using Levene's test for homogeneity of variances)

ANOVA (ANALYSIS OF VARIANCE) METHOD

• ANOVA calculates the mean for each of the groups - the Group Means.

• It calculates the mean for all the groups combined - the Overall Mean.

• Then it calculates, within each group, the total deviation of each individual's score from the Group Mean - Within Group (Error) Variation.


• Next, it calculates the deviation of each Group Mean from the Overall Mean - Between Group Variation.

• Finally, ANOVA produces the F statistic, which is the ratio of the Between Group Variation to the Within Group (Error) Variation (each expressed as a mean square, i.e. divided by its degrees of freedom).
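The steps of the method can be followed in a short sketch. This is plain Python for illustration, assuming independent groups supplied as lists of scores; the function name and the scores are hypothetical, and the p-value lookup from the F distribution is left to statistical software.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_scores = [x for group in groups for x in group]
    overall_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Within Group (Error) Variation: each score against its own Group Mean
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    # Between Group Variation: each Group Mean against the Overall Mean
    ss_between = sum(
        len(group) * (m - overall_mean) ** 2 for group, m in zip(groups, group_means)
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical exam scores for low, medium and high-stressed students
low, medium, high = [1, 2, 3], [2, 3, 4], [3, 4, 5]
f_statistic, df_between, df_within = one_way_anova_f([low, medium, high])
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

A large F means the group means are spread further apart than the scatter within the groups would explain; which groups differ still needs a post-hoc test.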

TWO-WAY ANOVA DESIGN

| Treatment/Condition (Independent) | Levels (Independent Variable) | | |
| | Group1 | Group2 | Group3 |
| CONDITION1 | S1 DV | S6 DV | S11 DV |
| | S2 DV | S7 DV | S12 DV |
| | S3 DV | S8 DV | S13 DV |
| | S4 DV | S9 DV | S14 DV |
| | S5 DV | S10 DV | S15 DV |
| CONDITION2 | S16 DV | S21 DV | S26 DV |
| | S17 DV | S22 DV | S27 DV |




ANCOVA (ANALYSIS OF COVARIANCE)

• An extension of the one-way ANOVA used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups (specifically, the adjusted means) by adjusting for a third or confounding variable

• The third variable (known as a "covariate" or "confounding variable") is one that may be affecting the results of the ANOVA and that you want to "statistically control"

• In each of the groups we can compute the correlation coefficient between the third variable and the dependent variable

REPEATED MEASURES ANOVA

• A repeated measures ANOVA is used when you have a single group on which you have measured something a few times

• For example, you may have a test of understanding of Classes. You give this test at the beginning of the topic, at the end of the topic and then at the end of the subject

• You would use a one-way repeated measures ANOVA to see if student performance on the test changed over time


• Repeated measures ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups, and is the extension of the dependent t-test

• A repeated measures ANOVA is also referred to as a within-subjects ANOVA or ANOVA for correlated samples

• The major advantage with running a repeated measures ANOVA over an independent ANOVA is that the test is generally much more powerful. This particular advantage is achieved by the reduction in variability (due to differences between subjects) during the performance of the test


| Subjects | Time/Condition (Independent Variable) | | |
| | T1 | T2 | T3 |
| S1 | S1 | S1 | S1 |
| S2 | S2 | S2 | S2 |
| S3 | S3 | S3 | S3 |
| S4 | S4 | S4 | S4 |
| S5 | S5 | S5 | S5 |

TWO-WAY ANOVA REPEATED MEASURES

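| Factor (Independent) | Subjects | Time/Condition (Independent Variable) | | |
| | | T1 | T2 | T3 |
| GROUP1 | S1 | S1 | S1 | S1 |
| | S2 | S2 | S2 | S2 |
| | S3 | S3 | S3 | S3 |
| | S4 | S4 | S4 | S4 |
| | S5 | S5 | S5 | S5 |
| GROUP2 | S6 | S6 | S6 | S6 |
| | S7 | S7 | S7 | S7 |
| | S8 | S8 | S8 | S8 |
| | S9 | S9 | S9 | S9 |
| | S10 | S10 | S10 | S10 |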

Variable type & CHOOSING A Test

| Explanatory Variable | Response Variable | Methods |
| Categorical | Categorical | Contingency Tables |
| Categorical | Quantitative | ANOVA |
| Quantitative | Quantitative | Regression |

ANOVA – WHY NOT JUST USE t-TESTS?

• Multiple t-tests are not the answer because as the number of groups grows, the number of needed pair comparisons grows quickly. For example in 7 groups there are 21 pairs. If we test 21 pairs we should not be surprised to observe things that happen only 5% of the time. Thus in 21 pairings, a p-value = 0.05 for one pair cannot be considered significant.

• Our level of significance α has to be divided by the number of comparisons (Bonferroni correction; ex: for the above it becomes α/21)

• ANOVA puts all the data into one number (F) and gives us one p-value for the null hypothesis.
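The arithmetic behind the 7-group example can be checked with a few lines. This is a sketch only: the function name is hypothetical, and the chance of at least one false positive is computed under the simplifying assumption that the pairwise tests are independent, which pairwise t-tests on shared groups are not exactly.

```python
from math import comb


def multiple_comparison_summary(number_of_groups, alpha=0.05):
    """Pairs to test, chance of >= 1 false positive, and the divided alpha."""
    pairs = comb(number_of_groups, 2)
    # Assumes independent tests: probability that at least one is falsely significant
    familywise_error = 1.0 - (1.0 - alpha) ** pairs
    alpha_per_comparison = alpha / pairs
    return pairs, familywise_error, alpha_per_comparison


pairs, familywise_error, alpha_per_comparison = multiple_comparison_summary(7)
print(f"pairs = {pairs}")
print(f"chance of at least one false positive = {familywise_error:.2f}")
print(f"alpha per comparison = {alpha_per_comparison:.4f}")
```

Under that assumption, testing all 21 pairs at 0.05 gives roughly a two-in-three chance of at least one spurious "significant" result, which is why the threshold is tightened or a single omnibus test is used first.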

From eBook: Research skills for Psychology Majors by William Gabrenya

Likert ITEM & LIKERT Scale


• Likert scale consists of multiple Likert-type items

• Likert-type items are rated on an ordered response scale (such as "On a scale of 1 to 10, with one being no pain and ten being high pain, how much pain are you in today?")

• Represent ordinal data (order, rank, but no real distance)


• Fundamentally, these scales do not represent a measurable quantity

• An individual may respond 8 and be in less pain than someone else who responded 5

• A person who responded 4 may not be in exactly half as much pain as one who responded 8

• Visual Analog Scale is a Likert scale but often (wrongly) analyzed as if it were continuous data

COMPOSITE SCORE & LIKERT Scale

• Composite scores combine multiple Likert item scales into a single scale

• Composite scores must first be analyzed for internal consistency and inter-item correlation for each item and reported (ex: using Cronbach’s alpha – scale reliability analysis)

• These scores represent ordinal data, so non-parametric tests and descriptive statistics must be used

Cronbach’s Alpha For scales

• Check for internal consistency and overall validity of a multiple Likert-type item scale

• Check α with each item deleted in turn ("alpha if item deleted")

• Based on number of items and comparison of its variances

• Values of α range from 0 to 1

• Ideally overall α and α for each item (when deleted from scale) must be > 0.7 to 0.8

• Clinical scores need higher α > 0.8 to 0.9 (Bland-Altman)
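How α follows from the number of items and their variances can be sketched as below. This is a minimal sketch using the standard formula α = k/(k-1) × (1 − sum of item variances / variance of the total score); the function names and the questionnaire responses are hypothetical, and each row is assumed to hold one respondent's answers to all items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])                                   # number of items
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(responses[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in responses])
        for i in range(k)
    ]


# Hypothetical answers of 5 respondents to a 3-item Likert scale
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]
print(f"overall alpha = {cronbach_alpha(responses):.2f}")
print("alpha if item deleted:", [round(a, 2) for a in alpha_if_item_deleted(responses)])
```

An item whose removal raises α noticeably is a candidate for review, since it agrees poorly with the rest of the scale.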

Power analysis & effect size

• To calculate sample size (n) we must know the type of statistical test involved in our primary outcome measure

• We must also know:

• Desired α error (usually taken as 0.05)

• Power (1-β) usually taken 0.8 (80%) or greater

• Two or one-tailed comparison

• Effect size

• Power is the fraction of experiments that you expect to yield a "statistically significant" p-value when a real effect of the assumed size exists (ex: with 80% power, 80% of such experiments would be expected to yield a significant p-value)

• Effect size (Cohen’s d for mean) depends on study design, it is calculated by data from pilot studies or reference studies

• The target effect size should reflect a clinically defined level of significance (ex: more than 20% difference between 2 groups, expressed as a difference in proportions or in mean ± SD data etc)

• Cohen’s d is usually calculated based on pilot studies, but if the effect size is unknown Jacob Cohen provided 3 guess-estimate effect sizes (values vary slightly for different statistical tests):

1. Small effect: d around 0.2 (requires large sample sizes)

2. Medium effect: d around 0.5 (seen with careful observation, use when in doubt)

3. Large effect: d greater than 0.8 (if large it is obvious)

• Criticized when d is used as above as “T-shirt” effect sizes
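The link between effect size and sample size can be sketched for the simplest case, comparing the means of two independent groups. This is an approximation under stated assumptions: Cohen's d uses the pooled SD, and the sample size uses the normal-approximation formula n per group = 2 × ((z for α + z for power) / d)², so dedicated software such as G*Power, which works with the t distribution, will typically report a slightly larger n. The function names and pilot values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_variance)


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# Hypothetical pilot data (illustration only)
pilot_treatment, pilot_control = [2, 4, 6], [1, 3, 5]
d = cohens_d(pilot_treatment, pilot_control)
print(f"Cohen's d from pilot = {d:.2f}")
for label, effect in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d = {effect}): about {n_per_group(effect)} per group")
```

The loop shows why a small effect demands a large study: halving d roughly quadruples the required sample size.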

• Calculation of the required sample size with a set target for power before starting the final study is called A priori analysis (before the fact) – the accepted method, especially important to avoid incorrectly being “blind” to a real difference in a negative study (due to a large β error)

• Calculation of the achieved power at the end of the final study is called Post hoc analysis (after the fact) – incorrect, as the computed power is a simple reflection of the p-value!

• G*Power software is a free useful resource
