Advanced Biostatistics - Simplified

PREPARED & PRESENTED BY: DR. M. ALHEFZI, DR. B. ALHEJAILI, DR. N. ALOTAIBI, DR. M. ALGOTHAMI, DR. A. KHALAWI, DR. S. ALGHAMDI (SBCM | R1 | Taif)

Upload: mohammed-alhefzi

Post on 20-Jun-2015

DESCRIPTION

A presentation I gave as part of the Saudi Board of Community Medicine, Western Region. It simplifies the ideas behind hypotheses and hypothesis testing, and covers several approaches to choosing the most suitable statistical test for a study.

TRANSCRIPT

Page 1: Advanced Biostatistics - Simplified

Advanced Biostatistics Simplified

SBCM | R1 | Taif

PREPARED & PRESENTED BY:

DR. M. ALHEFZI

DR. B. ALHEJAILI

DR. N. ALOTAIBI

DR. M. ALGOTHAMI

DR. A. KHALAWI

DR. S. ALGHAMDI

Page 2: Advanced Biostatistics - Simplified

SBCM | R1 | Taif


WHY BIOSTAT ?!

Collection

Summarization

Analysis – inference.

Interpretation of the results

Abhaya Indrayan (2012). Medical Biostatistics. CRC Press. ISBN 978-1-4398-8414-0.

Page 3: Advanced Biostatistics - Simplified

Philosophy behind Hypothesis

What is a hypothesis?

CHANCE?!

Mill’s Canons / Methods – Agreement, Difference, Concomitant Variations, Residues

Page 4: Advanced Biostatistics - Simplified

Am I right or wrong?!

Is it the truth?!

Page 5: Advanced Biostatistics - Simplified


SIGNIFICANCE

• BIAS?

• CONFOUNDING?

• CHANCE?

• CAUSE / EFFECT?

• GENERALIZABILITY!

Page 6: Advanced Biostatistics - Simplified

My Hypothesis (Ha)

TEST!

Page 7: Advanced Biostatistics - Simplified


Page 8: Advanced Biostatistics - Simplified

In other words …

Hypothesis → Test Hypothesis → Measure Assoc. → Sig. → Reject or FTR (fail to reject).

Page 9: Advanced Biostatistics - Simplified


So, what language do we speak in biostat?

MATH?

MEAN, MEDIAN, MODE, RANGE …

AREA UNDER THE CURVE, VARIANCE, SD …

MEDICINE?

EXPOSURE, DISEASE, OUTCOME, EFFECTIVENESS, PREVENTION

RELATIVE RISK, ABSOLUTE RISK
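As a rough illustration of the two risk terms, the sketch below computes them from cohort counts. The function name `risk_measures` and the counts are hypothetical, chosen only for the example. "Absolute risk" is taken here as the incidence in each group, with the risk difference as the gap between them.

```python
def risk_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (risk in exposed, risk in unexposed, relative risk, risk difference)."""
    risk_exposed = exposed_cases / exposed_total        # absolute risk, exposed group
    risk_unexposed = unexposed_cases / unexposed_total  # absolute risk, unexposed group
    relative_risk = risk_exposed / risk_unexposed       # ratio of the two risks
    risk_difference = risk_exposed - risk_unexposed     # absolute risk difference
    return risk_exposed, risk_unexposed, relative_risk, risk_difference


# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed develop the disease.
risk_exposed, risk_unexposed, relative_risk, risk_difference = risk_measures(30, 100, 15, 100)
print(relative_risk, risk_difference)
```

A relative risk above 1 suggests the outcome is more frequent among the exposed, but it says nothing yet about bias, confounding, or chance.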

Page 10: Advanced Biostatistics - Simplified


Biostatisticians’ language

MEAN (μ).

MEDIAN.

MODE.

AREA UNDER THE CURVE: Variance.

SD (σ).
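A minimal sketch of these summary measures, using Python's standard `statistics` module. The readings are made-up values for illustration, not data from the presentation, and the population formulas (σ rather than the sample s) are an assumption that matches the σ notation above.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); illustrative only.
readings = [118, 120, 120, 124, 126, 130, 142]

mean = statistics.mean(readings)            # arithmetic mean (mu)
median = statistics.median(readings)        # middle value of the sorted data
mode = statistics.mode(readings)            # most frequent value
variance = statistics.pvariance(readings)   # population variance (sigma squared)
sd = statistics.pstdev(readings)            # population standard deviation (sigma)

print(mean, median, mode, variance, sd)
```

Note how the single high reading pulls the mean above the median, which is one reason to look at both.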

Page 11: Advanced Biostatistics - Simplified

Biostatisticians’ language

Standard Deviation (SD)

Page 12: Advanced Biostatistics - Simplified


Photo courtesy of Judy Davidson, DNP, RN

Page 13: Advanced Biostatistics - Simplified


WE MAKE MISTAKES!

TO LIMIT THEM, WE NEED TO SET RANGES FOR CHANCE AND SET OUR CRITICAL LIMITS, SO THAT WE END UP WITH A MASTERPIECE OF EVIDENCE!

H0

p-value vs. α level

CI *

Page 14: Advanced Biostatistics - Simplified


Page 15: Advanced Biostatistics - Simplified


Test Hypothesis

Page 16: Advanced Biostatistics - Simplified


Test Hypothesis

ASSUMPTIONS.

STEPS.

TESTS.

Page 17: Advanced Biostatistics - Simplified


Test Hypothesis

ASSUMPTIONS – Differ for each test.

LARGE SAMPLE SIZE.

NORMAL DISTRIBUTION (Gaussian Dist.).

HOMOGENEITY.

NO MULTICOLLINEARITY.

KNOWN ( μ & σ ).

INDEPENDENCE.

Page 18: Advanced Biostatistics - Simplified


Test Hypothesis

STEPS – 7 steps of hypothesis testing.

1) RQ (research question)?

2) H0 & H1

3) TEST & ASSUMPTIONS.

4) α LEVEL, P-VALUE.

5) TEST STATISTIC (DF).

6) DECISION.

7) CONCLUSION (YES/NO).
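The seven steps can be sketched on a toy problem. The example below assumes a two-sided one-sample z-test with a known population SD, which is only one possible choice at step 3. The function name `one_sample_z_test` and the measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming a known population SD (sigma)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))     # step 5: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # step 4: p-value to compare with alpha
    decision = "reject H0" if p_value < alpha else "fail to reject H0"  # step 6
    return z, p_value, decision


# Step 1 (RQ): does the mean of this group differ from 5.5?
# Step 2: H0: mu = 5.5; H1: mu != 5.5.
# Step 3: z-test; assumes independent observations and a known sigma of 0.6.
sample = [5.4, 5.9, 6.1, 5.8, 6.3, 6.0, 5.7, 6.2, 5.9]  # hypothetical measurements
z, p_value, decision = one_sample_z_test(sample, mu0=5.5, sigma=0.6)

# Step 7: state the conclusion in plain words.
print(f"z = {z:.2f}, p = {p_value:.3f}: {decision}")
```

With a small sample and an unknown σ, a t-test would usually replace the z-test at step 3; the remaining steps stay the same.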

Page 19: Advanced Biostatistics - Simplified


Test Hypothesis

TEST STATISTICS

Page 20: Advanced Biostatistics - Simplified

Dependency Concept

Input: Indep. VA (Exposure)

Output: Dep. VA (Outcome, Disease)

Each member in this group is exclusively linked to it

Output changes whenever input does so

Page 21: Advanced Biostatistics - Simplified

Data Analysis

• Univariate: summarizing percentages, averages…

• Bivariate: 2 VA

• Multivariate: control confounding

• Randomization.

• Restriction.

• Matching.

• Stratification.

Page 22: Advanced Biostatistics - Simplified

Statistical Tests

Parametric Tests

Student’s t-test

Paired Samples t-test

ANOVA

Correlation

Regression

Page 23: Advanced Biostatistics - Simplified

Statistical Tests

Non-Parametric Tests

Chi-Square (χ2)

Wilcoxon

Mann-Whitney (U Test)

Kruskal Wallis

Logistic Regression

Page 24: Advanced Biostatistics - Simplified

Choosing a Bivariate test

Rows: Indep. VA (input, exposure). Columns: Dependent VA (outcome, output).

Indep. VA | 2 Cat. | >2 Cat. | Continuous

2 Cat. | χ2 | χ2 | t-test

>2 Cat. | χ2 | χ2 | ANOVA

Continuous | t-test | ANOVA | Correlation, Linear Regression
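The grid above can be read as a simple lookup keyed by the two variable types. The sketch below transcribes it into a dictionary. The key labels and the function name `choose_bivariate_test` are hypothetical, and the first row is assumed to mean a two-category independent variable.

```python
# Keys: (independent variable type, dependent variable type).
# Values transcribed from the "Choosing a Bivariate test" grid.
BIVARIATE_TESTS = {
    ("2 cat", "2 cat"): "chi-square",
    ("2 cat", ">2 cat"): "chi-square",
    ("2 cat", "continuous"): "t-test",
    (">2 cat", "2 cat"): "chi-square",
    (">2 cat", ">2 cat"): "chi-square",
    (">2 cat", "continuous"): "ANOVA",
    ("continuous", "2 cat"): "t-test",
    ("continuous", ">2 cat"): "ANOVA",
    ("continuous", "continuous"): "correlation / linear regression",
}


def choose_bivariate_test(independent, dependent):
    """Look up the suggested test for a pair of variable types."""
    return BIVARIATE_TESTS[(independent, dependent)]


# Example: smoking status (2 categories) versus blood pressure (continuous).
print(choose_bivariate_test("2 cat", "continuous"))
```

The lookup only encodes the variable types; the assumptions of the suggested test still need to be checked before it is used.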

Page 25: Advanced Biostatistics - Simplified

Continuous Data

Comparing 2 Gps: t-test

Comparing >2 Gps: ANOVA

Assoc. 2 Gps: Pearson Correlation

Prediction: Regression

Page 26: Advanced Biostatistics - Simplified

Ordinal Data

Comparing 2 Gps: Mann-Whitney (U) test; Wilcoxon (Pre-Post).

Comparing >2 Gps: Kruskal Wallis

Assoc. 2 Gps: Spearman’s ρ

Page 27: Advanced Biostatistics - Simplified

Categorical Data

Test of frequency (χ2)

How often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity)

Examples:

- Do negative ads change how people vote?

- Is there a relationship between marital status and health insurance coverage?

Page 28: Advanced Biostatistics - Simplified

Choosing the Best Statistical Test (by @alhefzi)

PMT = parametric test; NPMT = non-parametric test.

Comparison of the difference between groups

Cat. VA (2) and Cont. VA: Independent sample t-test (PMT) or Mann-Whitney U test (NPMT)

Cont. Dep. VA, same group: Paired Sample t-test (PMT) or Wilcoxon (NPMT)

Cat. VA (>2) and Cont. VA: One Way ANOVA (PMT) or Kruskal Wallis (NPMT)

Cat. VA and Cat. VA: Chi-Square (χ2), McNemar

Association / Strength of Relationship

Cont. VA and Cont. VA: Pearson (r) (PMT) or Spearman’s ρ (NPMT)

Prediction

Cont. VA from Cont. or Cat.: SLR (Bivariate)

Cont. VA from Cont. + Other VAs: MLR

Cat. VA from >1 Other VAs: Logistic Regression

Page 29: Advanced Biostatistics - Simplified


Page 30: Advanced Biostatistics - Simplified


Page 31: Advanced Biostatistics - Simplified

Considerations

Normal Distribution & Sample Size. Large sample size ().

Shape by inspection.

Otherwise, do (Kolmogorov Smirnov) to check normality.

An NPMT with a large sample size () is less powerful than a PMT.

Gaussian Distribution ().

NPMT with Gaussian distribution, “small” sample size (). (small, Non-Gaussian) ( p-value).

PMT with Non-Gaussian distribution (): relies on the CLT.

PMT with Non-Gaussian distribution, “small” sample size (): the CLT won’t work, so the p-value is inaccurate.
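The role of the CLT in the last two points can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential (strongly skewed), the sample sizes 3 and 100 are arbitrary stand-ins for "small" and "large", and the helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Skewness of a list of values; close to 0 for a symmetric (e.g. Gaussian) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulated_means(sample_size, replicates=5000, seed=0):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replicates)
    ]


skew_small = skewness(simulated_means(sample_size=3))    # "small" samples
skew_large = skewness(simulated_means(sample_size=100))  # "large" samples
print(f"skewness of sample means: n=3 -> {skew_small:.2f}, n=100 -> {skew_large:.2f}")
```

The sample means stay clearly skewed for the small samples and move toward a symmetric, Gaussian-like shape for the large ones, which is why a parametric p-value is more trustworthy in the second case.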

Page 32: Advanced Biostatistics - Simplified

Considerations

1 or 2 sided p-value

H0 ().

Based on: equal population means. Under H0, any observed discrepancy is due to chance!!

Question: WHICH p-value is larger and why? (1 or 2 sided)?

i.e. when formulating your Ha, consider the “larger” critical p-value accordingly!

Go for 1 sided (if)

You have formulated a “directional” hypothesis.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.

Go for 2 sided (if)

Unsure or in doubt of your hypothesis direction.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
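One way to explore the question above is to compute both p-values from the same test statistic. The sketch assumes a z statistic and a standard normal reference distribution. The function name `p_values` and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-sided upper-tail p, two-sided p) for a z statistic."""
    std_normal = NormalDist()                     # mean 0, SD 1
    one_sided = 1 - std_normal.cdf(z)             # area in the upper tail only
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails
    return one_sided, two_sided


one_sided, two_sided = p_values(1.8)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

For a statistic in the hypothesized direction, the two-sided p-value is twice the one-sided one, so the same data can fall below α = .05 one-sided and above it two-sided. This is why the choice has to be fixed before data collection.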

Page 33: Advanced Biostatistics - Simplified

2-tailed test

Biostatisticians’ language

The critical value is the number that separates the “blue zone” from the middle (± 1.96 in this example).

In a t-test, in order to be statistically significant the t score needs to be in the “dark-blue zone”.

If α = .05, then 2.5% of the area is in each tail

Page 34: Advanced Biostatistics - Simplified

1-tailed test

Biostatisticians’ language

The critical value is either + or -, but not both.

e.g. in a t-test with a large sample, you would have statistical significance (p < .05) if t ≥ 1.645 (upper-tailed test).
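The critical values quoted for the 2-tailed and 1-tailed tests (1.96 and 1.645) can be recovered from the standard normal distribution, which a t distribution approaches as the sample grows. A minimal sketch, with variable names chosen for illustration:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, SD 1

# 2-tailed: alpha is split so that 2.5% of the area lies in each tail.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

# 1-tailed: the whole 5% sits in a single tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

print(round(two_tailed_critical, 2), round(one_tailed_critical, 3))  # 1.96 1.645
```

For small samples the critical t is larger than these values and depends on the degrees of freedom, so a t table or a statistics package is needed.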

Page 35: Advanced Biostatistics - Simplified

Chi-Square (χ2) – as an example

Biostatisticians’ language

Any number squared is a positive number.

Therefore, area under the curve starts at 0 and goes to infinity (∞).

To be statistically significant, the χ2 statistic needs to be in the upper 5% of the area (α = .05).

Compares observed frequency to what we expected.

Published on STAT 100 - Statistical Concepts and Reasoning
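The comparison of observed and expected frequencies can be sketched for a 2x2 table. The counts are hypothetical, the function name `chi_square_2x2` is illustrative, and no continuity correction is applied. The p-value line uses the closed form of the upper-tail area for 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            statistic += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(statistic / 2))  # upper-tail area of chi-square with df = 1
    return statistic, p_value


# Hypothetical counts: rows = exposed / unexposed, columns = diseased / healthy.
statistic, p_value = chi_square_2x2([[30, 70], [15, 85]])
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

Because every term is squared, the statistic can only be zero or positive, and significance is judged in the upper tail only (above 3.841 for df = 1 at α = .05).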

Page 36: Advanced Biostatistics - Simplified

Considerations

Regression or Correlation

Cause-effect relationship: in regression, X & Y are important to be set; swapping X & Y in the curve gives different results.

In Gaussian distribution: Pearson (correlation); SLR, MLR (regression).

NPMT: Spearman’s rho (correlation); Logistic Regression (regression).
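The point about swapping X and Y can be checked numerically. The sketch below computes Pearson's r and the least-squares slope by hand. The data and the helper names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of the least-squares line that predicts y from x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


age = [25, 32, 41, 47, 53, 60]        # hypothetical ages (years)
sbp = [118, 121, 130, 128, 139, 145]  # hypothetical systolic BP (mmHg)

print(pearson_r(age, sbp), pearson_r(sbp, age))  # same value either way
print(ols_slope(age, sbp), ols_slope(sbp, age))  # two different slopes
```

Correlation treats the two variables symmetrically, while regression answers a directional question (predict Y from X), so the roles of X and Y have to be set in advance.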

Page 37: Advanced Biostatistics - Simplified


End of Part I

Thank you…

QUESTIONS?

@alhefzi