How to Learn Everything You Ever Wanted to Know About Biostatistics

Daniel W. Byrne
Director of Biostatistics and Study Design
General Clinical Research Center
Vanderbilt University Medical Center

The presenter has no financial interests in the products mentioned in this talk.

Objective of This Workshop

To provide a 1-hour overview of the important practical information that a clinical investigator needs to know about biostatistics to be successful.

I. You Will Need the Right Tools

Install a powerful, yet easy to use, statistical software package on your computer.

I recommend SPSS for Windows.
Bring an 1180 for $80 to Karen Montefiori in 143 Hill Student Center (3-1630).
She will lend you the SPSS CD for the day and you can install this software easily.

SPSS is the 2nd most popular package. It is much easier to use than SAS and Stata.

Rank  Count  Package
   1    163  SAS
   2     52  SPSS
   3     48  STATA
   4     36  Epi Info
   5     22  SUDAAN
   6     19  S-PLUS
   7     12  Statxact
   8      8  BMDP
   9      6  Statistica
  10      5  Statview

Install additional software for statistical “odds and ends”

Instat by GraphPad (graphpad.com): for summary data analysis, $100
True Epistat by Epistat Services (true-epistat.com): for random number table, etc., $395
CIA (Confidence Interval Analysis) (bmj.com): for confidence intervals, $35.95 with the book “Statistics with Confidence” by D. Altman

Install a sample size program.

If you can afford to spend $400, buy nQuery Advisor from Statistical Solutions: www.statsol.com
If you can afford to spend $0, download PS from the Vanderbilt web site: http://www.mc.vanderbilt.edu/prevmed/ps/index.htm
Both packages are on the CRC’s statistical workstation in room A-3101. VUMC investigators are welcome to use this workstation.

II. You Will Need a Plan

Use the scientific method to keep your project focused.

State the problem
Formulate the null hypothesis
Design the study
Collect the data
Interpret the data
Draw conclusions

State the Problem

Among patients hospitalized for a hip fracture who develop pneumonia during their stay in the hospital, the mortality rate is 2.3 times higher at non-trauma centers compared with trauma centers (48.7% vs. 21.1%, P=0.043).
It is not clear if, or how, those who will develop pneumonia could be identified on admission.

Formulate the Null Hypothesis

Among patients hospitalized for treatment of a hip fracture, there are no factors known upon admission that are statistically different between those who develop pneumonia during their stay and those who do not.

Why bother with a null hypothesis?

For the same reason that we assume that a person is innocent until proven guilty.
The burden of responsibility is on the prosecutor to demonstrate enough evidence for members of a jury to be convinced that the charges are true and to change their minds.
Example of a null hypothesis: outcome after treatment with Drug A will not be significantly different from placebo.

Design the Study

Data on 933 patients with a hip fracture from a New York trauma registry will be analyzed.
The 58 patients with pneumonia will be compared with the 875 without pneumonia.

The Most Common Type of Flaw

[Bar chart. Axis: Number of Responses (0 to 25). Categories: Presentation of the results; Importance of the topic; Interpretation of the findings; Study Design. Data labels as extracted: 4, 4, 20, 0.]

Example of Recall Bias

A control group is asked, “Two weeks ago from today, did you eat X for breakfast?”
Two weeks after their MI, patients are asked, “Did you eat X for breakfast on the day of your heart attack?”
You can prove any food causes an MI using this method (X=bacon, X=Flintstone vitamins, etc.)

John Bailar’s Quote:

“Study design and bias are much more important than complex statistical methods.”
Devote more time to improving the study design, and minimizing and measuring bias.
Become an expert at study design issues and biases in your area of research.

What is the statistical power of the study?

Power
Beta
Alpha
Sample size
Ratio of treated to control group
Measure of outcome
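
The ingredients listed on this slide can be tied together in a short calculation. The sketch below is only an illustration: it assumes a two-sided test of two proportions, equal group sizes (a 1:1 ratio), and an unpooled normal approximation. The function name and the 10% vs. 5% rates are hypothetical inputs, and this is not the method used by nQuery Advisor or PS.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions.

    Assumes equal group sizes, a two-sided alpha, and the unpooled
    normal approximation (an illustrative choice, not a package's method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical outcome rates: 10% in one group vs. 5% in the other.
n_per_group = sample_size_two_proportions(0.10, 0.05)
print(n_per_group)
```

Raising the power or shrinking the expected difference increases the required sample size, which is why these inputs are worth settling before data collection starts.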

Sample Size Table

See Table 9-1 in the handout, “Sample Size Requirements for Each of Two Groups”.

Collect the Data

See the handouts for:
  I TEC Trauma Systems Study

III. You Will Need Data Management Skills

Enter your data with statistical analysis in mind.

For small projects, enter data into Microsoft Excel or directly into SPSS.
For large projects, create a database with Microsoft Access.
Keep variable names in the first row, with <=8 characters, and no internal spaces.
Enter as little text as possible and use codes for categories, such as 1=male, 2=female.
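
The naming rules above are easy to check automatically before importing a spreadsheet. The following is a minimal sketch; the header row, the function name, and the code dictionary are hypothetical examples, not part of the workshop materials.

```python
def check_variable_names(names, max_length=8):
    """Return the names that are empty, too long, or contain spaces."""
    problems = []
    for name in names:
        if not name or len(name) > max_length or " " in name:
            problems.append(name)
    return problems


# Hypothetical first row of a spreadsheet.
header = ["ID", "AGE", "SEX", "DAYS IN HOSPITAL", "PNEUMONIA"]
bad_names = check_variable_names(header)
print(bad_names)

# Codes instead of free text for a categorical variable (1=male, 2=female).
SEX_CODES = {"male": 1, "female": 2}
```

Here "DAYS IN HOSPITAL" fails because of its spaces and length, and "PNEUMONIA" fails because it has nine characters.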

Spreadsheet from Hell

Spreadsheet from Heaven

IV. You Will Need to Learn Descriptive Statistics

Descriptive vs. Inferential

Descriptive statistics summarize your group.
  Average age 78.5, 89.3% white.
Inferential statistics use the theory of probability to make inferences about larger populations from your sample.
  White patients were significantly older than black and Hispanic patients, P<0.001.

Import your data into a statistical program for screening and analysis.

Screen your data thoroughly for errors and inconsistencies before doing ANY analyses.

Check the lowest and highest value for each variable.
  For example, age 1-777.
Look at histograms to detect typos.
Cross-check variables to detect impossible combinations.
  For example, pregnant males, survivors discharged to the morgue, patients in the ICU for 25 days with no complications.
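
These screening checks can also be scripted. The sketch below assumes a hypothetical record layout (field names, the 1=male coding, and the plausible age range of 0 to 110 are all assumptions for illustration); it only flags records for review and does not change any data.

```python
def screen_records(records, age_range=(0, 110)):
    """Return (record id, problem) pairs for values that need checking."""
    low, high = age_range
    problems = []
    for rec in records:
        # Range check: lowest and highest plausible value.
        if not low <= rec["age"] <= high:
            problems.append((rec["id"], "age out of range"))
        # Cross-checks for impossible combinations (sex coded 1=male).
        if rec["sex"] == 1 and rec["pregnant"] == 1:
            problems.append((rec["id"], "pregnant male"))
        if rec["survived"] == 1 and rec["disposition"] == "MORGUE":
            problems.append((rec["id"], "survivor discharged to morgue"))
    return problems


# Hypothetical records; the second and third contain deliberate errors.
records = [
    {"id": 1, "age": 81, "sex": 2, "pregnant": 0, "survived": 1, "disposition": "HOME"},
    {"id": 2, "age": 777, "sex": 1, "pregnant": 0, "survived": 1, "disposition": "HOME"},
    {"id": 3, "age": 64, "sex": 1, "pregnant": 1, "survived": 1, "disposition": "MORGUE"},
]
flagged = screen_records(records)
for record_id, problem in flagged:
    print(record_id, problem)
```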

Analyze, Descriptive Statistics, Frequencies, select the variable

Statistics: AGE

N Valid           933
N Missing           0
Mean           79.292
Median         81.300
Mode             90.0
Std. Deviation 26.537
Range           763.0
Minimum          14.0
Maximum         777.0

[Histogram of AGE. x-axis: AGE (bins from 25.0 to 775.0); y-axis: Frequency (0 to 700). Std. Dev = 26.54, Mean = 79.3, N = 933.]

Analyze, Descriptive Statistics, Crosstabs

SURVIVAL * 48-DISPOSITION Crosstabulation (Count)

48-DISPOSITION                         EXPIRED  SURVIVED  Total
HOME                                         0       224    224
REHABILITATION FACILITY                      0        56     56
OTHER HOSPITAL                               0        12     12
MORGUE                                      63         0     63
SKILLED NURSING FACILITY                     0       201    201
HOME WITH ASSISTANCE                         0       236    236
AMA DISCHARGE AGAINST MEDICAL ADVICE         0         3      3
8                                            0       138    138
Total                                       63       870    933

Correct the data in the original database or spreadsheet and import a revised version into the statistical package.

The age of 777 should be checked and changed to the correct age.
Suspicious values, such as an age of 106, should be checked. In this case it is correct.

Interpret the Data

Run descriptive statistics to summarize your data.

SURVIVAL

           Frequency  Percent  Valid Percent  Cumulative Percent
EXPIRED           63      6.8            6.8                 6.8
SURVIVED         870     93.2           93.2               100.0
Total            933    100.0          100.0

Statistics: 49-DAYS IN HOSPITAL

N Valid          933
N Missing          0
Mean           23.34
Median         19.00
Mode              20
Std. Deviation 18.03
Range            236
Minimum            1
Maximum          237

[Histogram of 49-DAYS IN HOSPITAL. x-axis: 49-DAYS IN HOSPITAL (0.0 to 240.0); y-axis: Frequency (0 to 400). Std. Dev = 18.03, Mean = 23.3, N = 933.]

V. You Will Need to Learn Inferential Statistics

P Value

A P value is an estimate of the probability that results such as yours could have occurred by chance alone if there truly was no difference or association.
P < 0.05 = less than a 5% chance, 1 in 20.
P < 0.01 = less than a 1% chance, 1 in 100.
Alpha is the threshold. If P is < this threshold, you consider it statistically significant.

Basic formula for inferential tests

Based on the total number of observations and the size of the test statistic, one can determine the P value.

$$\text{Test Statistic} = \frac{\text{Observed} - \text{Expected}}{\text{Variability}}$$

How many noise units?

Test statistic & sample size (degrees of freedom) convert to a probability or P Value.

$$\text{Test Statistic} = \frac{\text{Signal}}{\text{Noise}}$$

Use inferential statistics to test for differences and associations.

There are hundreds of statistical tests.
A clinical researcher does not need to know them all.
Learn how to perform the most common tests on SPSS.
Learn how to use the statistical flowchart to determine which test to use.

VI. You Will Need to Understand the Statistical Terminology Required to Select the Proper Inferential Test

Univariate vs. Multivariate

Univariate analysis usually refers to one predictor variable and one outcome variable.
  Is gender a predictor of pneumonia?
Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously.
  After adjusting for age, is gender a predictor of pneumonia?

Difference vs. Association

Some tests are designed to assess whether there are statistically significant differences between groups.
  Is there a statistically significant difference between the age of patients with and without pneumonia?
Some tests are designed to assess whether there are statistically significant associations between variables.
  Is the age of the patient associated with the number of days in the hospital?

Unmatched vs. Matched

Some statistical tests are designed to assess groups that are unmatched or independent.
  Is the admission systolic blood pressure different between men and women?
Some statistical tests are designed to assess groups that are matched or data that are paired.
  Is the systolic blood pressure different between admission and discharge?

Level of Measurement

Categorical vs. continuous variables
If you take the average of a continuous variable, it has meaning.
  Average age, blood pressure, days in the hospital.
If you take the average of a categorical variable, it has no meaning.
  Average gender, race, smoker.

Level of Measurement

Nominal - categorical
  gender, race, hypertensive
Ordinal - categories that can be ranked
  none, light, moderate, heavy smoker
Interval - continuous
  blood pressure, age, days in the hospital

Horse race example

Nominal
  Did this horse come in first place?
  0=no, 1=yes
Ordinal
  In what position did this horse finish?
  1=first, 2=second, 3=third, etc.
Interval (scale)
  How long did it take for this horse to finish?
  60 seconds, etc.

Normal vs. Skewed Distributions

Parametric statistical tests can be used to assess variables that have a “normal” or symmetrical bell-shaped distribution curve for a histogram.
Nonparametric statistical tests can be used to assess variables that are skewed or nonnormal.
Look at a histogram to decide.

Examples of Normal and Skewed

[Histogram of 44-DAYS IN ICU. x-axis: 44-DAYS IN ICU (0.0 to 70.0); y-axis: Frequency (0 to 1000). Std. Dev = 3.99, Mean = .9, N = 933.]

[Histogram of 35-SYSTOLIC BLOOD PRESSURE FIRST ER. x-axis: 35-SYSTOLIC BLOOD PRESSURE FIRST ER (60.0 to 250.0); y-axis: Frequency (0 to 160). Std. Dev = 27.74, Mean = 146.9, N = 925.]

VII. You Will Need to Know Which Statistical Test to Use

Flowchart of common inferential statistics

See the handout, Figure 16-1, pages 78-79.

Commonly used statistical methods

1. Chi-square
2. Logistic regression
3. Student's t-test
4. Fisher's exact test
5. Cox proportional-hazards
6. Kaplan-Meier method
7. Wilcoxon rank-sum test
8. Log-rank test
9. Linear regression analysis
10. Mantel-Haenszel method

Commonly used statistical methods

11. One-way analysis of variance (ANOVA)
12. Mann-Whitney U test
13. Kruskal-Wallis test
14. Repeated-measures analysis of variance
15. Paired t-test
16. Chi-square test for trend
17. Wilcoxon signed-rank test
18. Analysis of variance (two-way)
19. Spearman rank-order correlation
20. Analysis of covariance (ANCOVA)

Chi-square

The most commonly used statistical test.
Used to test if two or more percentages are different.
For example, suppose that in a study of 933 patients with a hip fracture, 10% of the men (22/219) develop pneumonia compared with 5% of the women (36/714).
What is the probability that this could happen by chance alone?
Univariate, difference, unmatched, nominal, >=2 groups, n>=20.
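
The pneumonia example can be worked through by hand. The sketch below uses the shortcut formula for the Pearson chi-square of a 2 by 2 table without continuity correction, and the closed form for the upper tail of a chi-square distribution with 1 degree of freedom. The function name is hypothetical; the counts are the ones quoted on this slide.

```python
from math import erfc, sqrt


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

        a  b
        c  d

    Returns the statistic and its P value with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square, df = 1
    return chi2, p_value


# Men: 22 with pneumonia, 197 without. Women: 36 with, 678 without.
chi2, p_value = pearson_chi_square_2x2(22, 197, 36, 678)
print(round(chi2, 3), round(p_value, 3))
```

For tables larger than 2 by 2, the shortcut formula no longer applies and the statistic is built from observed and expected counts cell by cell.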

Chi-square example

PNEUMONIA COMPLICATION 480.00-486.99 * SEX Crosstabulation

PNEUMONIA COMPLICATION                     SEX
480.00-486.99                     MALE   FEMALE    Total
ABSENT    Count                    197      678      875
          % within SEX           90.0%    95.0%    93.8%
PRESENT   Count                     22       36       58
          % within SEX           10.0%     5.0%     6.2%
Total     Count                    219      714      933
          % within SEX          100.0%   100.0%   100.0%

Chi-Square Tests

                                Value  df  Asymp. Sig.  Exact Sig.  Exact Sig.
                                           (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square          7.197 (b)   1         .007
Continuity Correction (a)       6.364   1         .012
Likelihood Ratio                6.492   1         .011
Fisher's Exact Test                                           .010        .008
Linear-by-Linear Association    7.189   1         .007
N of Valid Cases                  933

a. Computed only for a 2x2 table.
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 13.61.

Fisher’s Exact Test

This test can be used for 2 by 2 tables when the number of cases is too small to satisfy the assumptions of the chi-square.
  Total number of cases is <20, or
  The expected number of cases in any cell is <1, or
  More than 25% of the cells have expected frequencies <5.
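
For readers curious about what the exact test computes, here is a minimal sketch based on the hypergeometric distribution with the table margins held fixed. The function name is hypothetical, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention that may differ from a given package's definition. The counts are those of the pneumonia by cirrhosis table in this deck (3 of 8 patients with cirrhosis developed pneumonia, 55 of 925 without cirrhosis).

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table

        a  b
        c  d

    Returns (one-sided P for counts >= a, two-sided P).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    one_sided = sum(prob(x) for x in range(a, highest + 1))
    two_sided = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return one_sided, two_sided


# Rows: pneumonia present / absent. Columns: cirrhosis present / absent.
one_sided, two_sided = fisher_exact_2x2(3, 55, 5, 870)
print(round(one_sided, 3), round(two_sided, 3))
```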

Chi-Square Tests

                                Value  df  Asymp. Sig.  Exact Sig.  Exact Sig.
                                           (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square         13.545 (b)   1         .000
Continuity Correction (a)       8.674   1         .003
Likelihood Ratio                6.842   1         .009
Fisher's Exact Test                                           .010        .010
Linear-by-Linear Association   13.531   1         .000
N of Valid Cases                  933

a. Computed only for a 2x2 table.
b. 1 cells (25.0%) have expected count less than 5. The minimum expected count is .50.

PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation

PNEUMONIA COMPLICATION                                  CIRRHOSIS OR CHRONIC LIVER 571
480.00-486.99                                           ABSENT   PRESENT     Total
ABSENT    Count                                            870         5       875
          Expected Count                                 867.5       7.5     875.0
          % within PNEUMONIA COMPLICATION 480.00-486.99  99.4%       .6%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571        94.1%     62.5%     93.8%
PRESENT   Count                                             55         3        58
          Expected Count                                  57.5        .5      58.0
          % within PNEUMONIA COMPLICATION 480.00-486.99  94.8%      5.2%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571         5.9%     37.5%      6.2%
Total     Count                                            925         8       933
          Expected Count                                 925.0       8.0     933.0
          % within PNEUMONIA COMPLICATION 480.00-486.99  99.1%       .9%    100.0%
          % within CIRRHOSIS OR CHRONIC LIVER 571       100.0%    100.0%    100.0%

How to calculate the expected number in a cell

PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation

PNEUMONIA COMPLICATION         CIRRHOSIS OR CHRONIC LIVER 571
480.00-486.99                  ABSENT   PRESENT     Total
ABSENT    Count                   870         5       875
          Expected Count        867.5       7.5     875.0
PRESENT   Count                    55         3        58
          Expected Count         57.5        .5      58.0
Total     Count                   925         8       933
          Expected Count        925.0       8.0     933.0
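
The expected count in each cell is the row total multiplied by the column total, divided by the grand total. A short sketch of that arithmetic follows; the function name is hypothetical, and the observed counts are those of the pneumonia by cirrhosis crosstabulation.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: pneumonia absent / present. Columns: cirrhosis absent / present.
observed = [[870, 5], [55, 3]]
expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The smallest expected count, 58 * 8 / 933, is about 0.5, which is below 1 and is why Fisher's exact test is preferred over the chi-square for this table.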

Chi-square for a trend test

Used to assess a nominal variable and an ordinal variable.
Does the pneumonia rate increase with the total number of comorbidities?
Univariate, association, nominal.
Analyze, Descriptive Statistics, Crosstabs.

PNEUMONIA COMPLICATION 480.00-486.99 * NUMBER OF COMORBIDITES (0-9) Crosstabulation

PNEUMONIA COMPLICATION                        NUMBER OF COMORBIDITES (0-9)
480.00-486.99                   .00    1.00    2.00    3.00    4.00    5.00   Total
ABSENT    Count                 250     292     213      98      19       3     875
          % within NUMBER     98.8%   94.2%   93.0%   86.0%   90.5%   50.0%   93.8%
PRESENT   Count                   3      18      16      16       2       3      58
          % within NUMBER      1.2%    5.8%    7.0%   14.0%    9.5%   50.0%    6.2%
Total     Count                 253     310     229     114      21       6     933
          % within NUMBER    100.0%  100.0%  100.0%  100.0%  100.0%  100.0%  100.0%

(% within NUMBER = % within NUMBER OF COMORBIDITES (0-9))

Chi-Square Tests

                                 Value  df  Asymp. Sig. (2-sided)
Pearson Chi-Square          43.381 (a)   5                   .000
Likelihood Ratio                34.576   5                   .000
Linear-by-Linear Association    30.522   1                   .000
N of Valid Cases                   933

a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is .37.

Mantel-Haenszel Method

Used to assess a factor across a number of 2 by 2 tables.
Is the mortality rate associated with pneumonia different between trauma centers and nontrauma centers?
Analyze, Descriptive Statistics, Crosstabs.

Student’s t-test

Used to compare the average (mean) in one group with the average in another group.
Is the average age of patients significantly different between those who developed pneumonia and those who did not?
Univariate, Difference, Unmatched, Interval, Normal, 2 groups.
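
To make the comparison of two means concrete, the sketch below computes the equal-variances (pooled) t statistic and its degrees of freedom. The ages are hypothetical and chosen so the arithmetic is easy to follow. Converting t to a P value requires the t distribution, which a statistical package provides, so that step is left out here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Student's t statistic assuming equal variances in the two groups.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))  # the "noise"
    return (mean(group1) - mean(group2)) / standard_error, df


# Hypothetical ages for patients with and without a complication.
ages_with = [70, 75, 80, 85, 90]
ages_without = [60, 65, 70, 75, 80]
t, df = pooled_t_statistic(ages_with, ages_without)
print(round(t, 3), df)
```

The difference in means (10 years) is the signal and the standard error (5 years) is the noise, giving a statistic of 2 noise units on 8 degrees of freedom.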

Independent Samples Test: AGE

                                          Equal variances   Equal variances
                                          assumed           not assumed
Levene's Test for Equality of Variances
  F                                       1.937
  Sig.                                    .164
t-test for Equality of Means
  t                                       -1.561            -2.085
  df                                      931               72.574
  Sig. (2-tailed)                         .119              .041
  Mean Difference                         -2.849            -2.849
  Std. Error Difference                   1.825             1.366
  95% Confidence Interval of the Difference
    Lower                                 -6.429            -5.572
    Upper                                 .732              -.125

Mann-Whitney U test

Same as the Wilcoxon rank-sum test.
Used in place of the Student’s t-test when the data are skewed.
A nonparametric test that uses the rank of the value rather than the actual value.
Univariate, Difference, Unmatched, Interval, Nonnormal, 2 groups.
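
The idea of using ranks rather than actual values can be illustrated by counting, for every pair of patients from the two groups, which one has the larger value. This pairwise count gives the same U statistic as the rank-based formula. The data and function name below are hypothetical, and the step of converting U to a P value is omitted.

```python
def mann_whitney_u(group1, group2):
    """U statistics for two independent groups.

    Each pair (x, y) adds 1 to U1 if x > y and 0.5 if the values tie.
    """
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1
    return u1, u2


# Hypothetical skewed data, such as days in the ICU.
days_group_a = [0, 1, 2, 10]
days_group_b = [0, 0, 1, 3]
u_a, u_b = mann_whitney_u(days_group_a, days_group_b)
print(u_a, u_b)
```

Because only the ordering matters, replacing the value 10 with 100 would leave U unchanged, which is what makes the test robust to skewed data.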

Page 65: How to Learn Everything You Ever Wanted to Know About Biostatistics

65

Paired t-test

Used to compare the average for measurements made twice within the same person, before vs. after.

Used to compare a treatment group and a matched control group.

For example, did the systolic blood pressure change significantly from the scene of the injury to admission?

Univariate, Difference, Matched, Interval, Normal, 2 groups.
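A short sketch of the paired calculation: the test works on the within-person differences, so it reduces to a one-sample t-test on those differences. The function name `paired_t` and the blood pressure readings are hypothetical; only the t statistic and degrees of freedom are returned.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired t-test on after-minus-before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical systolic blood pressures (mm Hg) for illustration only
sbp_scene = [120, 130, 140, 110, 150]
sbp_admission = [118, 125, 141, 104, 139]
t_stat, df = paired_t(sbp_scene, sbp_admission)
print(f"t = {t_stat:.3f}, df = {df}")
```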


Wilcoxon signed-rank test

Used to compare two skewed continuous variables that are paired or matched.

Nonparametric equivalent of the paired t-test.

For example, “Was the Glasgow Coma Scale score different between the scene and admission?”

Univariate, Difference, Matched, Interval, Nonnormal, 2 groups.


ANOVA

One-way: used to compare 3 or more means from independent groups.
“Is the age different among White, Black, and Hispanic patients?”

Two-way: used to compare 2 or more means by 2 or more factors.
“Is the age different between Males and Females, With and Without Pneumonia?”


Tests of Between-Subjects Effects

Dependent Variable: AGE

Source         Type III Sum of Squares  df   Mean Square  F         Sig.
Model          5769944 (a)              4    1442486      8664.775  .000
SEX            1981.683                 1    1981.683     11.904    .001
PNEUMON        1299.320                 1    1299.320     7.805     .005
SEX * PNEUMON  519.282                  1    519.282      3.119     .078
Error          154657.2                 929  166.477
Total          5924601                  933

a. R Squared = .974 (Adjusted R Squared = .974)
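One way to read this output: each F ratio is the effect's mean square divided by the error mean square. The sketch below recomputes the three F ratios from the sums of squares and degrees of freedom printed in the output. The variable names are illustrative, and the results should match the printed F column to within rounding.

```python
# Sums of squares and degrees of freedom as printed in the SPSS output
error_ss, error_df = 154657.2, 929
effects = {
    "SEX": (1981.683, 1),
    "PNEUMON": (1299.320, 1),
    "SEX * PNEUMON": (519.282, 1),
}

# Mean square = sum of squares / degrees of freedom
ms_error = error_ss / error_df

# F = effect mean square / error mean square
f_ratios = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

for name, f_value in f_ratios.items():
    print(f"{name}: F = {f_value:.3f}")
```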


Kruskal-Wallis One-Way ANOVA

Used to compare continuous variables that are not normally distributed between more than 2 groups.

Nonparametric equivalent to the one-way ANOVA.

Is the length of stay different by ethnicity?

Analyze, Nonparametric Tests, K Independent Samples.


Repeated-Measures ANOVA

Used to assess the change in 2 or more continuous measurements made on the same person. Can also compare groups and adjust for covariates.

Do changes in the vital signs within the first 24 hours of a hip fracture predict which patients will develop pneumonia?

Analyze, General Linear Model, Repeated Measures.


Pearson Correlation

Used to assess the linear association between two continuous variables.

r = 1.0: perfect correlation
r = 0.0: no correlation
r = -1.0: perfect inverse correlation

Univariate, Association, Interval
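A minimal sketch of how r is computed, using made-up data chosen so that the three reference values above (1.0, 0.0, and -1.0) each appear. The function name `pearson_r` and all of the numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))    # y rises in step with x
print(pearson_r(x, [8, 6, 4, 2]))    # y falls in step with x
print(pearson_r(x, [1, -1, -1, 1]))  # no linear trend
```

The last data set has a clear U-shaped pattern yet no linear trend, a reminder that r measures only linear association.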


Correlations (Pearson)

Column key: AGE = AGE; DAYS = 49-DAYS IN HOSPITAL; COMORB = NUMBER OF COMORBIDITES (0-9); COMPL = 43-TOTAL NUMBER OF COMPLICATIONS; SBP = 35-SYSTOLIC BLOOD PRESSURE FIRST ER; GCS = 35-GLASGOW COMA SCALE FIRST ER; PULSE = 35-PULSE FIRST ER.

Variable  Statistic            AGE     DAYS    COMORB  COMPL   SBP     GCS      PULSE
AGE       Pearson Correlation  1.000   .088**  .211**  .137**  .149**  -.030    -.008
          Sig. (2-tailed)      .       .007    .000    .000    .000    .356     .809
          N                    933     933     933     933     925     926      923
DAYS      Pearson Correlation  .088**  1.000   .167**  .453**  .039    .016     .022
          Sig. (2-tailed)      .007    .       .000    .000    .237    .633     .499
          N                    933     933     933     933     925     926      923
COMORB    Pearson Correlation  .211**  .167**  1.000   .222**  .034    -.079*   .055
          Sig. (2-tailed)      .000    .000    .       .000    .296    .017     .093
          N                    933     933     933     933     925     926      923
COMPL     Pearson Correlation  .137**  .453**  .222**  1.000   -.033   -.028    .046
          Sig. (2-tailed)      .000    .000    .000    .       .310    .393     .161
          N                    933     933     933     933     925     926      923
SBP       Pearson Correlation  .149**  .039    .034    -.033   1.000   .043     .069*
          Sig. (2-tailed)      .000    .237    .296    .310    .       .196     .035
          N                    925     925     925     925     925     925      923
GCS       Pearson Correlation  -.030   .016    -.079*  -.028   .043    1.000    -.100**
          Sig. (2-tailed)      .356    .633    .017    .393    .196    .        .002
          N                    926     926     926     926     925     926      923
PULSE     Pearson Correlation  -.008   .022    .055    .046    .069*   -.100**  1.000
          Sig. (2-tailed)      .809    .499    .093    .161    .035    .002     .
          N                    923     923     923     923     923     923      923

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).


Spearman rank-order correlation

Used to assess the relationship between two ordinal variables or two skewed continuous variables.

Nonparametric equivalent of the Pearson correlation.

Univariate, Association, Ordinal (or skewed).


Correlations (Spearman's rho)

Column key: AGE = AGE; DAYS = 49-DAYS IN HOSPITAL; COMORB = NUMBER OF COMORBIDITES (0-9); COMPL = 43-TOTAL NUMBER OF COMPLICATIONS; SBP = 35-SYSTOLIC BLOOD PRESSURE FIRST ER; GCS = 35-GLASGOW COMA SCALE FIRST ER; PULSE = 35-PULSE FIRST ER.

Variable  Statistic                AGE      DAYS    COMORB   COMPL   SBP     GCS      PULSE
AGE       Correlation Coefficient  1.000    .089**  .158**   .145**  .091**  -.146**  -.008
          Sig. (2-tailed)          .        .007    .000     .000    .005    .000     .806
          N                        933      933     933      933     925     926      923
DAYS      Correlation Coefficient  .089**   1.000   .142**   .389**  .073*   .048     .037
          Sig. (2-tailed)          .007     .       .000     .000    .027    .149     .268
          N                        933      933     933      933     925     926      923
COMORB    Correlation Coefficient  .158**   .142**  1.000    .229**  .037    -.091**  .042
          Sig. (2-tailed)          .000     .000    .        .000    .257    .006     .202
          N                        933      933     933      933     925     926      923
COMPL     Correlation Coefficient  .145**   .389**  .229**   1.000   -.014   -.076*   .043
          Sig. (2-tailed)          .000     .000    .000     .       .676    .020     .196
          N                        933      933     933      933     925     926      923
SBP       Correlation Coefficient  .091**   .073*   .037     -.014   1.000   .079*    .080*
          Sig. (2-tailed)          .005     .027    .257     .676    .       .017     .015
          N                        925      925     925      925     925     925      923
GCS       Correlation Coefficient  -.146**  .048    -.091**  -.076*  .079*   1.000    -.038
          Sig. (2-tailed)          .000     .149    .006     .020    .017    .        .252
          N                        926      926     926      926     925     926      923
PULSE     Correlation Coefficient  -.008    .037    .042     .043    .080*   -.038    1.000
          Sig. (2-tailed)          .806     .268    .202     .196    .015    .252     .
          N                        923      923     923      923     923     923      923

**. Correlation is significant at the .01 level (2-tailed).

*. Correlation is significant at the .05 level (2-tailed).


Summary of Inferential Tests


Unpaired vs. Paired

Unpaired               Paired
Student’s t-test       Paired t-test
Chi-square             McNemar’s test
One-way ANOVA          Repeated-measures ANOVA
Mann-Whitney U test    Wilcoxon signed-rank
Kruskal-Wallis H test  Friedman ANOVA


Parametric vs. Nonparametric

Parametric                                      Nonparametric
Student’s t-test                                Mann-Whitney U test
One-way ANOVA                                   Kruskal-Wallis test
Paired t-test                                   Wilcoxon signed-rank
Pearson correlation                             Spearman’s r
Correlated F ratio (repeated-measures ANOVA)    Friedman ANOVA


A Good Rule to Follow

Always check your results with a nonparametric test.

If you test your null hypothesis with a Student’s t-test, also check it with a Mann-Whitney U test.

It will only take an extra 25 seconds.


VIII. You Will Need to Understand Regression Techniques


Linear Regression

Used to assess how one or more predictor variables can be used to predict a continuous outcome variable.

“Do age, number of comorbidities, or admission vital signs predict the length of stay in the hospital after a hip fracture?”

Multivariate, Association, Interval/Ordinal dependent variable.


Coefficients (a)

Model 1                              Unstandardized B  Std. Error  Standardized Beta  t      Sig.
(Constant)                           -4.451            18.889                         -.236  .814
AGE                                  7.136E-02         .045        .053               1.571  .117
NUMBER OF COMORBIDITES (0-9)         2.606             .548        .159               4.757  .000
35-SYSTOLIC BLOOD PRESSURE FIRST ER  1.562E-02         .022        .024               .726   .468
35-GLASGOW COMA SCALE FIRST ER       1.067             1.170       .030               .912   .362
35-PULSE FIRST ER                    2.581E-02         .047        .019               .554   .580
35-RESPIRATION RATE FIRST ER         -8.00E-02         .188        -.014              -.425  .671

a. Dependent Variable: 49-DAYS IN HOSPITAL
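The unstandardized B column can be read as a prediction equation: predicted days in hospital = constant + the sum of each B times the patient's value. A sketch using the printed coefficients follows; the dictionary keys, the function name, and the example patient are hypothetical. Only the number of comorbidities was statistically significant in this output, so the prediction is illustrative rather than a validated model.

```python
# Unstandardized coefficients (B) as printed in the SPSS output
INTERCEPT = -4.451
COEFFICIENTS = {
    "age": 7.136e-02,
    "comorbidities": 2.606,
    "systolic_bp": 1.562e-02,
    "glasgow_coma_scale": 1.067,
    "pulse": 2.581e-02,
    "respiration_rate": -8.00e-02,
}


def predicted_days(patient):
    """Predicted length of stay: intercept plus each coefficient times its value."""
    return INTERCEPT + sum(b * patient[name] for name, b in COEFFICIENTS.items())


# Hypothetical patient for illustration only
patient = {
    "age": 80,
    "comorbidities": 2,
    "systolic_bp": 130,
    "glasgow_coma_scale": 15,
    "pulse": 80,
    "respiration_rate": 18,
}
print(f"Predicted days in hospital: {predicted_days(patient):.1f}")
```

Holding everything else fixed, each additional comorbidity adds 2.606 days to the prediction, which is how a B coefficient is interpreted.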


Logistic Regression

Used to assess the predictive value of one or more variables on an outcome that is a yes/no question.

“Do age, gender, and comorbidities predict which hip fracture patients will develop pneumonia?”

Multivariate, Difference, Nominal dependent variable, not time-dependent, 2 groups.


1. Total number of comorbidities

2. Cirrhosis

3. COPD

4. Gender

5. Age


Draw Conclusions

We reject the null hypothesis.

Patients who are at high risk of developing pneumonia during their hospitalization for a hip fracture can be identified by:
total number of pre-existing conditions
cirrhosis
COPD
male gender


How this information could be used to predict pneumonia on admission

Z = -4.899 + (number of comorbidities x 0.469) + (cirrhosis x 2.275) + (COPD x 0.714) + (age x 0.021) + (gender [female=1, male=0] x -0.715)

Probability of Pneumonia = $\frac{1}{1 + e^{-Z}}$, where e = 2.718

Example: an 80-year-old male with cirrhosis and one other comorbidity (but not COPD).

Z = -4.899 + (2 x 0.469) + (1 x 2.275) + (0 x 0.714) + (80 x 0.021) + (0 x -0.715) = -0.006

Probability of Pneumonia = $\frac{1}{1 + e^{0.006}}$ = 0.4985, so with these coefficients this patient has about a 49.9% chance of developing pneumonia.
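A short sketch that evaluates the logistic equation with the coefficients printed above, using the example patient (80-year-old male, cirrhosis, two comorbidities in total, no COPD). The function and argument names are hypothetical. Evaluated as written, the coefficients give Z of about -0.006 and a probability of about 0.4985.

```python
import math


def pneumonia_probability(comorbidities, cirrhosis, copd, age, female):
    """Return (Z, probability) from the logistic regression equation.

    cirrhosis, copd, and female are coded 1 = yes, 0 = no (female=0 means male).
    """
    z = (-4.899
         + 0.469 * comorbidities
         + 2.275 * cirrhosis
         + 0.714 * copd
         + 0.021 * age
         - 0.715 * female)
    # Logistic function converts Z (the log odds) into a probability
    probability = 1 / (1 + math.exp(-z))
    return z, probability


z, p = pneumonia_probability(comorbidities=2, cirrhosis=1, copd=0, age=80, female=0)
print(f"Z = {z:.3f}, probability = {p:.4f}")
```

Note that e raised to the power Z gives the odds, not the probability; the logistic transformation in the last step is what converts the odds to a value between 0 and 1.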


Survival Analysis

Kaplan-Meier method: used to plot cumulative survival.

Log-rank test: used to compare survival curves.

Cox proportional-hazards: used to adjust for covariates in survival analysis.
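To make the Kaplan-Meier method concrete, here is a minimal sketch of the product-limit calculation: at each time when a death occurs, cumulative survival is multiplied by the fraction of at-risk patients who survived that time, and censored patients simply leave the at-risk group. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each time with at least one death.

    events[i] is 1 if patient i died at times[i], 0 if censored at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        # Deaths and censored patients both leave the at-risk group
        at_risk -= leaving
    return curve


# Hypothetical follow-up in days; 0 marks a censored patient
follow_up = [1, 2, 3, 4]
died = [1, 0, 1, 1]
for day, surviving in kaplan_meier(follow_up, died):
    print(day, surviving)
```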


Odds and Ends You Will Need


95% Confidence Intervals

A 95% confidence interval is an estimate that you make from your sample as to where the true population value lies.

If your study were to be repeated 100 times, you would expect the 95% CIs to contain the true value for the population in about 95 of these 100 studies.

The value might be a mean, percentage, or RR.

Confidence intervals should be included in publications for the major findings of the study.
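As one illustration, a 95% confidence interval for a percentage can be sketched with the normal approximation: the observed proportion plus or minus 1.96 standard errors. The function name and the counts are hypothetical, and this simple approximation is only reasonable for fairly large samples with proportions not close to 0 or 1.

```python
import math


def proportion_ci(events, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (z=1.96 for 95%)."""
    p = events / n
    # Standard error of a sample proportion
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Hypothetical counts: 50 of 100 patients developed a complication
low, high = proportion_ci(50, 100)
print(f"50.0% (95% CI {low * 100:.1f}% to {high * 100:.1f}%)")
```

Quadrupling the sample size halves the width of the interval, which is why larger studies give more precise estimates.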


Prevalence vs. Incidence

Prevalence: How many of you now have the flu?

Incidence: How many of you have had the flu in the past year?


Random

Random is not the same as haphazard, unplanned, or incidental.

Allocating patients to the treatment group on even days and to the control group on odd days is systematic, not random.

Random refers to the idea that each element in a set has an equal probability of occurrence.
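A minimal sketch of simple randomization, in contrast to alternating by calendar day: each patient has the same probability of landing in either arm, and no one can predict the next assignment. The function name, arm labels, and patient IDs are hypothetical; real trials typically use a pre-generated, concealed allocation sequence, often with blocking or stratification.

```python
import random


def randomize(patient_ids, seed=None):
    """Assign each patient to an arm with equal probability.

    Passing a seed makes the sequence reproducible for auditing.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["treatment", "control"]) for pid in patient_ids}


# Hypothetical patient identifiers for illustration only
assignments = randomize(["P001", "P002", "P003", "P004", "P005", "P006"], seed=2024)
for patient_id, arm in assignments.items():
    print(patient_id, arm)
```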


Improving a RCT

See the handout, Table 3-2, pages 18-19.

“Checklist to Be Used by Authors When Preparing or by Readers When Analyzing a Report of a Randomized Controlled Trial”.


IX. You Will Need to Continue Learning About Statistics


Recommended books on statistics

Kuzma – Statistics in the Health Sciences
Norusis – Data Analysis with SPSS
Altman – Statistics with Confidence
Friedman – Fundamentals of Clinical Trials
Pagano – Principles of Biostatistics
Encyclopedia of Biostatistics
SPSS manuals


Future Workshops


Future CRC Workshops

Oct 11 - How to use wireless hand-helds for clinical research (Paul St Jacques, MD, Anesthesiology)

Oct 18 - How to conduct ANOVA statistical tests - Part 1/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)

Oct 25 - How to conduct ANOVA statistical tests - Part 2/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)

Nov 1 - How to conduct ANOVA statistical tests - Part 3/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)

Nov 8 - How to write a data and safety-monitoring plan (Harvey Murff, MD)


X. One Final Skill You Will Need to Master


A response to the comment: “You’re comparing apples and oranges”

“No – this is comparing apples and oranges!”