Advanced Biostatistics - Simplified

Posted on 20-Jun-2015

Category: Health & Medicine

DESCRIPTION

A presentation I gave as part of the Saudi Board of Community Medicine, Western Region. It simplifies the ideas behind hypotheses and hypothesis testing, and covers several approaches to choosing the best statistical test for a study.

TRANSCRIPT

1

Advanced Biostatistics Simplified

PREPARED & PRESENTED BY:

DR. M. ALHEFZI

DR. B. ALHEJAILI

DR. N. ALOTAIBI

DR. M. ALGOTHAMI

DR. A. KHALAWI

DR. S. ALGHAMDI

SBCM | R1 | Taif

2

WHY BIOSTAT ?!

Collection

Summarization

Analysis – inference.

Interpretation of the results

Abhaya Indrayan (2012). Medical Biostatistics. CRC Press. ISBN 978-1-4398-8414-0.

3

Philosophy behind Hypothesis

What is a hypothesis?

CHANCE?!

Mill’s Canons / Methods – Agreement, Difference, Concomitant Variation, Residues

4

Am I right or wrong?!

Is it the truth?!

5

SIGNIFICANCE

• BIAS?

• CONFOUNDING?

• CHANCE?

• CAUSE / EFFECT?

• GENERALIZABILITY!

6

My Hypothesis

Ha

TEST!

7

8

In other words …

Hypothesis

Test Hypothesis

Measure Association

Significance

Reject or Fail to Reject (FTR).

9

So, what language do we speak in biostat?

MATH?

MEAN, MEDIAN, MODE, RANGE …

AREA UNDER THE CURVE, VARIANCE, SD …

MEDICINE?

EXPOSURE, DISEASE, OUTCOME, EFFECTIVENESS, PREVENTION

RELATIVE RISK, ABSOLUTE RISK

10

Biostatisticians’ language

MEAN (μ).

MEDIAN.

MODE.

AREA UNDER THE CURVE: Variance.

SD (σ).
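As a rough illustration of the terms on this slide, the sketch below computes each summary measure with Python's standard `statistics` module. The blood pressure readings are invented for the example and are not from the presentation.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118, 120, 120, 124, 126, 130, 138]

mean = statistics.mean(readings)              # arithmetic mean (sample estimate of μ)
median = statistics.median(readings)          # middle value of the sorted data
mode = statistics.mode(readings)              # most frequent value
value_range = max(readings) - min(readings)   # largest minus smallest
variance = statistics.variance(readings)      # sample variance (n - 1 denominator)
sd = statistics.stdev(readings)               # sample SD (estimate of σ), square root of the variance

print(f"mean={mean:.2f} median={median} mode={mode} range={value_range}")
print(f"variance={variance:.2f} sd={sd:.2f}")
```

The SD is in the same units as the data, which is usually why it is reported instead of the variance.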

11

Biostatisticians’ language

Standard Deviation (SD)

12

Photo courtesy of Judy Davidson, DNP, RN

13

WE MAKE MISTAKES!

TO AVOID THEM, WE NEED TO SET RANGES FOR CHANCE AND SET OUR CRITICAL LIMITS, SO THAT WE END UP WITH A MASTERPIECE OF EVIDENCE!

H0

p-value vs. α level

CI *
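One way to see how the p-value, the α level and the confidence interval (CI) relate is a one-sample z-test. The sketch below assumes a known population SD and uses invented numbers. For a two-sided test, rejecting H0 at α = 0.05 and a 95% CI that excludes the null value are the same decision.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers: a sample of n = 100 with mean 103, tested against a
# null value mu0 = 100, with a known population SD sigma = 15.
n, sample_mean, mu0, sigma = 100, 103.0, 100.0, 15.0
alpha = 0.05  # critical limit, chosen before looking at the data

z_dist = NormalDist()                     # standard normal distribution
se = sigma / sqrt(n)                      # standard error of the mean
z = (sample_mean - mu0) / se              # test statistic
p_value = 2 * (1 - z_dist.cdf(abs(z)))    # two-sided p-value

z_crit = z_dist.inv_cdf(1 - alpha / 2)    # two-sided critical value
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)

reject_h0 = p_value < alpha
ci_excludes_mu0 = not (ci[0] <= mu0 <= ci[1])

print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("Reject H0:", reject_h0, "| CI excludes mu0:", ci_excludes_mu0)
```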

14

15

Test Hypothesis

16

Test Hypothesis

ASSUMPTIONS.

STEPS.

TESTS.

17

Test Hypothesis

ASSUMPTIONS

– Differ for each test.

LARGE SAMPLE SIZE.

NORMAL DISTRIBUTION (Gaussian distribution).

HOMOGENEITY.

NO MULTICOLLINEARITY.

KNOWN (μ & σ).

INDEPENDENCE.

18

Test Hypothesis

STEPS – 7 steps of hypothesis testing.

1) RESEARCH QUESTION (RQ)?

2) H0 & H1

3) TEST & ASSUMPTIONS.

4) α LEVEL, P-VALUE.

5) TEST STATISTIC (DF).

6) DECISION.

7) CONCLUSION (YES/NO).
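The seven steps above can be walked through on a small example. This is a minimal sketch assuming a two-proportion z-test on large, independent samples. The counts are invented and the choice of test is an illustration, not one prescribed by the slides.

```python
from math import sqrt
from statistics import NormalDist

# 1) RQ: is the disease proportion different in exposed vs. unexposed people?
#    Hypothetical counts, invented for illustration.
exposed_cases, exposed_n = 48, 150
unexposed_cases, unexposed_n = 30, 150

# 2) H0: p_exposed == p_unexposed;  H1: p_exposed != p_unexposed (two-sided).
# 3) Test: two-proportion z-test; assumes independent, large samples.
# 4) Alpha level, fixed before looking at the data.
alpha = 0.05

# 5) Test statistic (a z statistic has no degrees of freedom; t and χ² do).
p1 = exposed_cases / exposed_n
p2 = unexposed_cases / unexposed_n
pooled = (exposed_cases + unexposed_cases) / (exposed_n + unexposed_n)
se = sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / unexposed_n))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 6) Decision: reject H0 if p < alpha, otherwise fail to reject (FTR).
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# 7) Conclusion, stated in plain words.
print(f"z = {z:.2f}, p = {p_value:.3f}: {decision}")
```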

19

Test Hypothesis

TEST STATISTICS

20

Dependency Concept

Input: Indep. VA, Exposure

Output: Dep. VA, Outcome, Disease

Each member in this group is exclusively linked to it

Output changes whenever the input does so

21

Data Analysis

• Univariate: summarizing percentages, averages…

• Bivariate: 2 VA

• Multivariate: control of confounding

• Randomization.

• Restriction.

• Matching.

• Stratification.

22

Statistical Tests

Parametric Tests

Student’s t-test

Paired Samples t-test

ANOVA

Correlation

Regression

23

Statistical Tests

Non-Parametric Tests

Chi-Square (χ²)

Wilcoxon

Mann-Whitney (U Test)

Kruskal-Wallis

Logistic Regression

24

Choosing a Bivariate test

Rows: Indep. VA (input, exposure). Columns: Dependent VA (outcome, output).

Indep. VA  | 2 Cat. | >2 Cat. | Continuous
2 Cat.     | χ²     | χ²      | t-test
>2 Cat.    | χ²     | χ²      | ANOVA
Continuous | t-test | ANOVA   | Correlation, Linear Regression
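The bivariate table on this slide can be read as a simple lookup from the pair (independent variable type, dependent variable type) to a test. A minimal sketch follows. The labels "2 cat", ">2 cat" and "continuous" and the function name `choose_bivariate_test` are hypothetical, chosen only for this example.

```python
# Lookup keyed by (independent variable type, dependent variable type),
# transcribed from the slide's table. The labels are hypothetical names.
BIVARIATE_TEST = {
    ("2 cat", "2 cat"): "chi-square",
    ("2 cat", ">2 cat"): "chi-square",
    ("2 cat", "continuous"): "t-test",
    (">2 cat", "2 cat"): "chi-square",
    (">2 cat", ">2 cat"): "chi-square",
    (">2 cat", "continuous"): "ANOVA",
    ("continuous", "2 cat"): "t-test",
    ("continuous", ">2 cat"): "ANOVA",
    ("continuous", "continuous"): "correlation / linear regression",
}


def choose_bivariate_test(independent: str, dependent: str) -> str:
    """Return the test suggested by the table for this pair of variable types."""
    try:
        return BIVARIATE_TEST[(independent, dependent)]
    except KeyError:
        raise ValueError(f"unknown variable types: {independent!r}, {dependent!r}")


# Example: exposure with three categories, continuous outcome.
print(choose_bivariate_test(">2 cat", "continuous"))
```

The lookup only encodes the table. It does not check the assumptions of the chosen test.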

25

Continuous Data

Comparing 2 Gps: t-test

Comparing >2 Gps: ANOVA

Assoc. 2 Gps: Pearson Correlation

Prediction: Regression

26

Ordinal Data

Comparing 2 Gps: Mann-Whitney (U) test; Wilcoxon (Pre-Post).

Comparing >2 Gps: Kruskal-Wallis

Assoc. 2 Gps: Spearman’s ρ

27

Categorical Data

Test of frequency (χ²)

How often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity)

Examples:

- Do negative ads change how people vote?

- Is there a relationship between marital status and health insurance coverage?
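The marital status and insurance example can be sketched as a χ² test on a 2x2 table. The counts below are invented, no continuity correction is applied, and the p-value shortcut in the last step is valid only for 1 degree of freedom.

```python
from math import erfc, sqrt

# Hypothetical 2x2 counts: rows = married / not married,
# columns = insured / uninsured. Invented for illustration.
observed = [[90, 10],
            [70, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # 1 for a 2x2 table
p_value = erfc(sqrt(chi2 / 2))                       # χ² upper tail, df = 1 only

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
```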

28

Choosing the Best Statistical Test

By @alhefzi

PMT: parametric test. NPMT: non-parametric test.

Comparing the difference between groups

Cat. VA (2) and Cont. VA: Independent sample t-test (PMT); Mann-Whitney U test (NPMT)

Cont. Dep. VA, same group: Paired sample t-test (PMT); Wilcoxon (NPMT)

Cat. VA (>2) and Cont. VA: One Way ANOVA (PMT); Kruskal-Wallis (NPMT)

Cat. VA and Cat. VA: Chi-Square (χ²); McNemar

Association / Strength of Relationship

Cont. VA and Cont. VA: Pearson (r) (PMT); Spearman’s ρ (NPMT)

Prediction

Cont. VA from Cont. or Cat.: SLR (Bivariate)

Cont. VA from Cont. + Other VAs: MLR

Cat. VA from >1 Other VAs: Logistic Regression

29

30

31

Considerations

Normal Distribution & Sample Size.

Large sample size ().

Shape by inspection.

Otherwise, use the Kolmogorov-Smirnov test to check normality.

A NPMT with a large sample size () is less powerful than a PMT.

Gaussian Distribution ().

NPMT with Gaussian distribution, “small” sample size (). (small, Non-Gaussian) ( p-value).

PMT with Non-Gaussian distribution (): CLT (central limit theorem).

PMT with Non-Gaussian distribution, “small” sample size (): the CLT won’t work, giving an inaccurate p-value.
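To make the normality check concrete, the sketch below computes the Kolmogorov-Smirnov statistic D by hand: the largest gap between the empirical CDF and a fitted normal CDF. Both samples are made up. The cut-off 1.36/√n is the usual large-sample approximation for α = 0.05. Because the mean and SD are estimated from the same data, that cut-off is conservative, and a Lilliefors-type correction would be more appropriate in practice.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


n = 100
# Deterministic, made-up samples: evenly spaced quantiles of a normal and of a
# right-skewed (exponential) distribution.
probs = [(i + 0.5) / n for i in range(n)]
bell_shaped = [NormalDist(50, 10).inv_cdf(p) for p in probs]
right_skewed = [-10 * log(1 - p) for p in probs]

critical = 1.36 / sqrt(n)   # large-sample approximation for alpha = 0.05
for name, sample in [("bell-shaped", bell_shaped), ("right-skewed", right_skewed)]:
    d = ks_statistic(sample)
    verdict = "looks non-normal" if d > critical else "no evidence against normality"
    print(f"{name}: D = {d:.3f} vs {critical:.3f} -> {verdict}")
```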

32

Considerations

1 or 2 sided p-value

H0 ().

Based on: equal population means, so any discrepancy is attributed to chance!!

Question: WHICH p-value is larger, and why (1 or 2 sided)?

i.e. when formulating your Ha, consider the “larger” critical p-value accordingly!

Go for 1 sided (if)

You have formulated a “directional” hypothesis.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.

Go for 2 sided (if)

Unsure or in doubt of your hypothesis direction.

Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
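The question on this slide can be checked numerically. A minimal sketch, assuming a z statistic that falls in the direction predicted by Ha: the 2-sided p-value counts both tails, so it is twice the 1-sided one. The value z = 1.8 is invented.

```python
from statistics import NormalDist

z = 1.8       # hypothetical test statistic, in the direction predicted by Ha
alpha = 0.05

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)              # area in one tail only
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # area in both tails

print(f"1-sided p = {p_one_sided:.4f}, 2-sided p = {p_two_sided:.4f}")
print("Significant 1-sided:", p_one_sided < alpha)
print("Significant 2-sided:", p_two_sided < alpha)
```

With these numbers the same data are significant under a 1-sided test and not under a 2-sided one, which is why the direction has to be fixed before data collection.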

33

2-tailed test

Biostatisticians’ language

The critical value is the number that separates the “blue zone” from the middle (± 1.96 in this example).

In a t-test, to be statistically significant the t score needs to fall in the “dark-blue zone”. (± 1.96 is the standard normal value, which the t critical value approaches as the degrees of freedom grow.)

If α = .05, then 2.5% of the area is in each tail.

34

1-tailed test

Biostatisticians’ language

The critical value is either + or -, but not both.

e.g. in a t-test with large degrees of freedom, you would have statistical significance (p < .05) if t ≥ 1.645.
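The critical values quoted on the 2-tailed and 1-tailed slides can be recovered from the standard normal distribution. The sketch below uses the normal rather than the t distribution, which is an assumption that holds for large degrees of freedom. Small-sample t critical values are larger.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# 2-tailed: alpha is split, 2.5% in each tail.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

# 1-tailed: all 5% sits in a single tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

print(f"2-tailed: +/- {two_tailed_critical:.2f}")   # about 1.96
print(f"1-tailed: {one_tailed_critical:.3f}")       # about 1.645
```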

35

Chi-Square (χ²) – as an example

Biostatisticians’ language

Any number squared is a positive number.

Therefore, the curve starts at 0 and extends to infinity (∞).

To be statistically significant, the χ² statistic needs to be in the upper 5% of the distribution (α = .05).

It compares the observed frequencies with the frequencies we expected.

Published on STAT 100 - Statistical Concepts and Reasoning
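The comparison of observed with expected frequencies on this slide can be sketched as a goodness-of-fit calculation. The example assumes three categories that H0 expects to be equally common, with invented counts. With 2 degrees of freedom the χ² upper tail has the closed form exp(-x/2), which is what the p-value and the cut-off below rely on. That shortcut does not hold for other degrees of freedom.

```python
from math import exp, log

# Hypothetical counts in three categories, invented for illustration.
observed = [50, 30, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # H0: equal frequencies

# Every term is squared, so the statistic can never be negative.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1            # 2 degrees of freedom

alpha = 0.05
critical = -2 * log(alpha)        # start of the upper 5% for df = 2 (about 5.99)
p_value = exp(-chi2 / 2)          # upper-tail area, df = 2 only

print(f"chi2 = {chi2:.2f}, critical = {critical:.2f}, p = {p_value:.5f}")
print("Significant:", chi2 > critical)
```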

36

Considerations

Regression or Correlation

Regression: cause-effect relationship; X & Y are important to be set; swapping X & Y in the curve gives different results.

In Gaussian distribution: Pearson (correlation); SLR, MLR (regression).

NPMT: Spearman’s rho (correlation); Logistic Regression (regression).
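The point that swapping X and Y changes a regression but not a correlation can be checked on a small made-up data set. The helper names `pearson_r` and `ols_slope` are hypothetical and written out by hand so the sketch does not depend on any particular library version.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Slope of the least-squares line that predicts y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical dose and response values, invented for illustration.
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(dose, response)
r_yx = pearson_r(response, dose)
slope_y_on_x = ols_slope(dose, response)    # response predicted from dose
slope_x_on_y = ols_slope(response, dose)    # dose predicted from response

print(f"r(dose, response) = {r_xy:.4f}, r(response, dose) = {r_yx:.4f}")
print(f"slope y on x = {slope_y_on_x:.3f}, slope x on y = {slope_x_on_y:.3f}")
```

The two slopes differ, and their product equals r squared, which ties the two analyses together.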

37

End of Part I

Thank you…

QUESTIONS?

@alhefzi
