Introductory Statistics

Presentation Outline:

Types of statistics

Statistical test definitions

Simple statistical tests

What are statistics?

One way to describe statistics is as a set of scientific techniques used for learning in the presence of variation

Statistical measures such as P-values and confidence intervals help to quantify how much we can learn from a sample of data

There are two types of statistics

Descriptive Statistics

Concerned with presentation, organization, and summarization of data

Give a lot of VERY important information about data prior to performing inferential statistics (percentages, means, confidence intervals)
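As a minimal sketch of the descriptive summaries named above (percentages and means), the following uses only the Python standard library. The pain scores and the sex variable are hypothetical values invented for illustration, and `percentages` is an illustrative helper name.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical sample: pain scores (0-10) and one categorical variable.
pain_scores = [2, 4, 4, 5, 7, 8]
sex = ["F", "M", "F", "F", "M", "F"]


def percentages(values):
    """Return the percentage of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * n / total for category, n in counts.items()}


summary = {
    "n": len(pain_scores),
    "mean": mean(pain_scores),
    "median": median(pain_scores),
    "sd": stdev(pain_scores),  # sample standard deviation (n - 1 denominator)
    "percent_by_sex": percentages(sex),
}
print(summary)
```

Summaries like these are usually inspected before any inferential test is run, since they reveal skew, outliers, and sparse categories.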

Inferential Statistics

Used to make inferences about the characteristics of the population from the characteristics of a random sample drawn from the population

Hypothesis testing: using data samples to establish the credibility of a theory about the population

P-values are calculated from the different inferential statistical tests and quantify the strength of the evidence against the null hypothesis; a small P-value supports the study hypothesis but does not prove it

Inferential Statistics

Inferential statistics are then divided into two more categories:

Parametric - assumes a normal distribution based on population means and standard deviations

• Interval/Integer (pain level, temperature)

• Ratio (weight, Body Mass Index)

Nonparametric - makes no assumptions about the nature of the distribution underlying the data; the data still follow some distribution, but we do not know what that distribution looks like

• Nominal (gender, ethnicity)

• Ordinal (tumor position)

Inferential Statistics cont’d.

The type of inferential statistical test performed is driven by the type of data being analyzed

Types of Data

Categorical / Nominal data consists of named categories with no implied order among the categories.

Ordinal / Rank data consists of ordered categories, where the differences between categories cannot be considered to be equal.

Continuous data may take any value within a defined range and assumes equal distances between values.

Choosing an appropriate test

Assumptions

Inferential tests have certain assumptions that you should be familiar with before you use a test

Violating the assumptions can produce misleading results

Considerations for choosing a test

• Variables

• Distribution

• Parametric and non-parametric

The type of research design and type of data will ultimately drive the appropriate statistical test used (if all assumptions of the statistical test are met).

Type of Statistical Test, by Type of Design and Type of Data

Type of Design                                | Continuous (Parametric) | Ordinal (Non-parametric)        | Categorical (Non-parametric)
Compares 2 independent groups                 | Independent t-test      | Wilcoxon-Mann-Whitney test      | Chi-square test (r x 2)
Compares 3 or more independent groups         | One-way ANOVA           | Kruskal-Wallis test             | Chi-square test (r x k)
Compares pre and post in the same sample      | Paired t-test           | Wilcoxon signed rank test       | McNemar Change test
Compares multiple measures in the same sample | Repeated Measures ANOVA | Friedman test                   | Cochran Q test
Correlation between two variables             | Pearson Correlation     | Spearman Correlation            | Kappa coefficient
To model which variables predict an outcome   | Multiple Regression     | Multinomial Logistic Regression | Logistic Regression

Chi-Square Test (Pearson’s, goodness-of-fit)

Underlying concept: do the observed frequencies differ from the expected frequencies?

If H0 (NULL) is true: expected = observed

If HA (ALTERNATIVE) is true: expected ≠ observed

Design is represented by contingency tables (often stated as percentages, but counts are the level of analysis)

Contingency tables = frequency tables = cross-tabulation tables

Generic Contingency Table

               Column 1   Column 2   Row total
Row 1          A          B          A + B
Row 2          C          D          C + D
Column total   A + C      B + D

For a 2 x 2 contingency table the Chi-square statistic is calculated by the formula:

χ² = N(AD − BC)² / [(A + B)(C + D)(A + C)(B + D)], where N = A + B + C + D

Just like with the t-test, the computation will result in a test statistic and the associated p-value that will allow discussion of the group differences.
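The 2 x 2 calculation can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' own code. It assumes the standard shortcut formula for a 2 x 2 table with cells A, B, C, D, applies no continuity correction, and uses hypothetical counts. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table [[a, b], [c, d]], no continuity correction.

    Uses chi2 = N(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)].
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 20 of 30 improved in one group, 10 of 30 in the other.
chi2, p = chi_square_2x2(20, 10, 10, 20)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

The closed-form p-value holds only for one degree of freedom, which is the case for a 2 x 2 table. Larger tables need the general chi-square distribution from a statistics package.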

Fisher’s Exact Test

Similar to the chi-square test, but is preferred with:

Smaller sample sizes

Severely unequal cell distribution

Cells with an expected frequency of < 5 (some guidelines use < 10)
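For small tables the exact test can be computed directly from the hypergeometric distribution. The sketch below is one common two-sided definition: it sums the probabilities of all tables, with the same margins, that are no more likely than the observed one. Other definitions exist. The counts are hypothetical and `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probabilities = [table_probability(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Hypothetical small study: 3 of 4 respond in one group, 1 of 4 in the other.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```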

T-test

Independent t-test / Student’s t-test – compares continuous data (means) between 2 independent groups (generally robust to modest departures from normality)

Paired t-test – compares continuous data (means) between 2 dependent / matched / paired groups
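The two t statistics can be sketched as follows, assuming the pooled-variance (Student) form for independent groups and the difference-score form for paired data. The data are hypothetical and the function names are illustrative. The sketch returns only the statistic and its degrees of freedom, because the p-value requires a t distribution from a table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(group1, group2):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def paired_t(before, after):
    """Paired t statistic computed on the after-minus-before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical data for two independent groups and for one group measured twice.
t_ind, df_ind = independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
t_pair, df_pair = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(t_ind, df_ind, t_pair, df_pair)
```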

ANOVA

3 or more groups

Multiple repeated measures

Within and between subject designs (also MIXED design)

Study Design (one-way ANOVA analysis)

• Group A: full dose of drug ‘wonderful’

• Group B: half-dose of drug ‘wonderful’

• Group C: placebo

• t-test = 3 pairwise tests (A vs. B, A vs. C, B vs. C), ANOVA = 1 test to tell if a difference exists
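A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic for three groups shaped like the study design above. The outcome values are hypothetical and `one_way_anova_f` is an illustrative name. The p-value is omitted because it requires the F distribution.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical symptom scores for full dose, half dose, and placebo.
full_dose = [4, 5, 6]
half_dose = [6, 7, 8]
placebo = [8, 9, 10]
f_stat, df_b, df_w = one_way_anova_f(full_dose, half_dose, placebo)
print(f_stat, df_b, df_w)
```

A significant F indicates only that at least one group mean differs. Identifying which groups differ takes follow-up (post hoc) comparisons.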

Two-way ANOVA

Two independent variables (IV) (Example: 2x2, DV – BMI)

Main effect of Diet: Yes (A+B) vs. No (C+D)

Simple Main Effect of Diet Yes: A vs. B

Interaction: A by B by C by D

                 (weight loss) DRUG
Diet Plan        YES        NO
YES              20 (A)     26 (B)
NO               23 (C)     29 (D)
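The main effects and the interaction in the 2 x 2 example can be read off the four cell values with simple arithmetic. The sketch below assumes the four numbers are cell means with equal group sizes, which the slide does not state. It computes descriptive contrasts only, not the ANOVA significance tests.

```python
# Cell values from the 2 x 2 table (rows: diet plan, columns: drug).
# Assumption: these are cell means from equally sized groups.
a, b = 20, 26  # diet YES: drug YES (A), drug NO (B)
c, d = 23, 29  # diet NO:  drug YES (C), drug NO (D)

# Main effect of diet: average of the YES row minus average of the NO row.
diet_effect = (a + b) / 2 - (c + d) / 2

# Main effect of drug: average of the YES column minus average of the NO column.
drug_effect = (a + c) / 2 - (b + d) / 2

# Simple main effect of drug within diet YES: A vs. B.
simple_effect_diet_yes = a - b

# Interaction: does the drug difference change across diet levels?
interaction = (a - b) - (c - d)

print(diet_effect, drug_effect, simple_effect_diet_yes, interaction)
```

With these values the drug difference is the same at both diet levels, so the interaction contrast is zero and the two factors act additively.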

Repeated Measure ANOVA

Within subject design only or within-between design

Every subject is exposed to each level of an IV

Most common example: time

All other types of repeated measure are less common and not appropriate for many questions

Tip: avoid order effects

Measures of Association

Pearson’s Correlation

A measure of the strength of association between 2 variables

Ranges from -1 to +1

Correlation coefficient is r

-/+ signs indicate the direction of the relationship

Correlation ≠ causation

Effect size: 0.10 (small), 0.30 (medium), 0.50 (large)

Spurious/illusory correlations

Statistical significance does not mean the results are clinically significant
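Pearson's r can be sketched directly from its definition, as the covariance of the two variables scaled by their spreads. The data are hypothetical. `effect_size_label` is an illustrative helper that applies the 0.10 / 0.30 / 0.50 benchmarks above, and its "negligible" label for values under 0.10 is an assumption.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def effect_size_label(r):
    """Label |r| using the 0.10 / 0.30 / 0.50 benchmarks."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label below 0.10 is an assumption, not from the slides


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f} ({effect_size_label(r)})")
```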

Confidence Intervals (CIs)

A range of values within which a researcher can say with a certain degree of confidence that a population parameter will fall.

Originally designed to analyze a sample of samples, but is now used on one sample

Useful when the mean is uncertain due to conflicting results

Meta-analysis

Used to test non-inferiority and superiority

Provides confidence without specificity
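A confidence interval for a mean can be sketched as the sample mean plus or minus a critical value times the standard error. The version below uses a normal (z) critical value from the standard library, which is an approximation. For small samples a t critical value would give a wider and more appropriate interval. The sample values are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))
    return center - half_width, center + half_width


# Hypothetical sample of measurements.
lower, upper = mean_confidence_interval([5, 6, 7, 8, 9])
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The link to hypothesis testing is direct: if a 95% interval excludes the null value, the corresponding two-sided test is significant at the 0.05 level.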

CIs and Hypothesis Testing

Factors That Create Misleading Results

Restricted range tends to reduce r

Nonlinear relationships – cannot use Pearson’s correlation

X and/or Y have skewed distribution – underestimate r

Outliers – over- or underestimate r

Extreme groups – overestimate r

Non-Normal distribution

Spearman rank correlation

Uses the same ranking principle as the Mann-Whitney

Same characteristics as Pearson’s r

• Ranges from -1 to +1

• Correlation coefficient is rs

• -/+ signs indicate the direction of the relationship

• Correlation ≠ causation

• Spurious correlation

• Statistical significance does not mean clinically significant

Not used for dichotomous data (use chi-square instead)
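Spearman's coefficient is Pearson's r computed on ranks rather than raw values. The sketch below assigns average ranks to ties and then applies the Pearson formula. The data are hypothetical and the function names are illustrative.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    return [
        sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
        for v in values
    ]


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations.
rs = spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(f"rs = {rs:.2f}")
```

Because only the ordering matters, any monotonic transformation of either variable leaves the coefficient unchanged.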

Non-parametric group comparisons

Mann-Whitney U Test

Alternative to the independent t-test when the assumption of a normal distribution is severely violated

Converts raw scores to ranks

Compares ranks between groups to determine if there is a difference
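The ranking steps can be sketched as follows: pool both groups, rank the pooled scores, and convert the rank sum of one group into the U statistic. The data are hypothetical and the function names are illustrative. The sketch returns only U, and the p-value would come from U tables or a normal approximation.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    return [
        sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
        for v in values
    ]


def mann_whitney_u(group1, group2):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    pooled = list(group1) + list(group2)
    ranks = average_ranks(pooled)  # raw scores converted to ranks
    n1, n2 = len(group1), len(group2)
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


# Hypothetical scores for two independent groups.
u = mann_whitney_u([3, 5, 8], [4, 9, 10, 12])
print(f"U = {u}")
```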

Wilcoxon Signed-Ranks Test

Alternative to the dependent (paired) t-test when the assumption of a normal distribution is severely violated

Compares rankings of difference scores
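The signed-rank procedure can be sketched as: compute each pair's difference, drop zero differences, rank the absolute differences, and sum the ranks separately for positive and negative differences. The data are hypothetical and the function names are illustrative. Dropping zero differences is one common convention.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    return [
        sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
        for v in values
    ]


def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): rank sums of the positive and negative differences."""
    # After-minus-before differences, with zero differences dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(diff) for diff in diffs])
    w_plus = sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)
    w_minus = sum(rank for rank, diff in zip(ranks, diffs) if diff < 0)
    return w_plus, w_minus


# Hypothetical pre and post scores for the same five subjects.
w_plus, w_minus = wilcoxon_signed_rank([10, 12, 14, 16, 18], [12, 13, 17, 15, 23])
print(w_plus, w_minus)
```

The test statistic is usually taken as the smaller of the two sums. A very small value means nearly all differences point in one direction.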

References

UCLA
http://www.linguistics.ucla.edu/faciliti/facilities/statistics/power.htm

The Florida State University
http://stat.fsu.edu/undergrad/statinf2.php

About.com Sociology
http://sociology.about.com/od/Statistics/a/Descriptive-inferential-statistics.htm

University of South Carolina
http://www.usca.edu/polisci/apls301/Text/Chapter%2012.%20Significance%20and%20Measures%20of%20Association.htm

Boston University School of Public Health
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric2.html

Previous internal Advocate research department presentations
