statistical significance p-values analytic decisions


Upload: steven-riley

Post on 23-Dec-2015


Page 1: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

STATISTICAL SIGNIFICANCE

P-VALUES

ANALYTIC DECISIONS

Page 2: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Where we are:

Thus far we’ve covered:
  Measures of Central Tendency
  Measures of Variability
  Z-scores
  Frequency Distributions
  Graphing/Plotting data

All of the above are used to describe individual variables

Tonight we begin to look into analyzing the relationship between two variables

Page 3: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

However…

As soon as we begin analyzing relationships, we have to discuss ‘statistical significance’, random sampling error (RSE), p-values, and hypothesis testing

Descriptive statistics do NOT require such things, as we are not ‘testing’ theories about the data, only exploring
  You aren’t trying to ‘prove’ something with descriptive statistics, just ‘show’ something

These next few slides are critical to your understanding of the rest of the course– please stop me for questions!

Page 4: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Hypotheses

Hypothesis - the prediction about what will happen during an experiment or observational study, or what researchers will find.

Examples:
  Drug X will lower blood pressure
  Smoking will increase the risk of cancer
  Lowering ticket prices will increase event attendance
  Wide receivers can run faster than linemen

Page 5: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Hypotheses

Example: Wide receivers can run faster than linemen

However, keep in mind that our hypothesis might be wrong – and the opposite might be true:

Wide receivers can NOT run faster than linemen

So, each time we investigate a single hypothesis, we actually test two, competing hypotheses.

Page 6: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Hypothesis testing

HA: Wide receivers can run faster than linemen
  This is what we expect to be true
  This is the alternative hypothesis (HA)

HO: Wide receivers can NOT run faster than linemen
  This is the hypothesis we have to reject before we can claim support for our real hypothesis
  The default hypothesis
  This is the null hypothesis (HO)

Page 7: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Hypothesis Testing

Every time you run a statistical analysis (excluding descriptive statistics), you are trying to reject a null hypothesis

Could be very specific:
  HA: Men taking Lipitor will have a lower LDL cholesterol after 6 weeks compared to men not taking Lipitor
  HO: Men taking Lipitor will have a similar LDL cholesterol after 6 weeks compared to men not taking Lipitor (no difference)

…or very simple (and non-directional):
  HA: There is an association between smoking and cancer
  HO: There is not an association between smoking and cancer

Page 8: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Why null vs alternative?

All statistical tests boil down to…

HO vs. HA

We write and test our hypotheses in this ‘competing’ fashion for several reasons. One is to address the issue of random sampling error (RSE)

Page 9: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Random Sampling Error

Remember RSE?
  By chance/accident, the group you sampled does NOT EXACTLY represent the population you sampled from
  Red blocks vs Green blocks
  Always have a chance of RSE

All statistical tests provide you with a probability that helps you weigh two explanations for a result:
  You are seeing something due to chance (RSE)
  vs
  You are seeing something real (a real association or real difference between groups)

Page 10: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Summary so far…

#1- Each time we use a statistical test, there are two competing hypotheses
  HO: Null Hypothesis
  HA: Alternative Hypothesis

#2- Each time we use a statistical test, we have to consider random sampling error
  The result is due to random chance (RSE, bad sample)
  The result is due to a real difference or association

These two things, #1 and #2, are interconnected and we have to consider potential errors in our decision making

Page 11: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Examples of Competing Hypotheses and Error

Suppose we collected data on risk of death and smoking

We generate our hypotheses:
  HA: Smoking increases risk of death
  HO: Smoking does not increase risk of death

Now we go and run our statistical test on our hypotheses and need to make a final decision about them
  But, due to RSE, there are two potential errors we could make

Page 12: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Error…

There are two possible errors:

Type I Error
  We could reject the null hypothesis although it was really true
  HA: Smoking increases risk of death (FALSE)
  HO: Smoking does not increase risk of death (TRUE)

This error led to unwarranted changes. We went around telling everyone to stop smoking even though it didn’t really harm them

OR…

Page 13: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Error…

Type II Error
  We could fail to reject the null hypothesis when it was really untrue
  HA: Smoking increases risk of death (TRUE)
  HO: Smoking does not increase risk of death (FALSE)

This error led to inaction against a preventable outcome (keeping the status quo). We went around telling everyone to keep smoking while it killed them

OR…

Page 14: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

There are really 4 potential decisions, based on what is “true” and what we “decide”

                    Our Decision
What is True        Reject HO                            Accept HO
HO                  Type I Error (Unwarranted Change)    Correct
HA                  Correct                              Type II Error (Kept Status Quo)

HA: Smoking increases risk of death
HO: Smoking does not increase risk of death

Questions…?

Page 15: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

RANDOM SAMPLING ERROR

Kent Brockman: Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack beatings are up a shocking nine hundred percent?
Homer Simpson: Aw, you can come up with statistics to prove anything, Kent. Forty percent of all people know that.

Page 16: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Example of RSE

RSE is the fact that each time you draw a sample from a population, the values of the sample statistics (Mean, SD, etc…) will differ to some degree

Suppose we want to determine the average points per game of an NBA player from 2008-2009 (population parameter)
  If I sample around 30 players 3 times and calculate their average points per game, I’ll end up with 3 different numbers (sample statistics)

Which 1 of the 3 sample statistics is correct?

Page 17: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

8 random samples of 10% of population: Note the varying Mean and SD – this is RSE!
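The repeated-sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is simulated with made-up numbers (it is not the NBA data), and the names `population` and `sample_summary` are hypothetical.

```python
import random
import statistics

# Hypothetical population: 400 made-up points-per-game values (NOT real NBA data).
rng = random.Random(42)
population = [max(0.0, rng.gauss(10.0, 6.0)) for _ in range(400)]

def sample_summary(pop, n, rng):
    """Draw n values without replacement and return (mean, SD) of that sample."""
    sample = rng.sample(pop, n)
    return statistics.mean(sample), statistics.stdev(sample)

# Eight random samples, each 10% of the population, as on the slide.
summaries = [sample_summary(population, len(population) // 10, rng) for _ in range(8)]

print(f"population mean = {statistics.mean(population):.2f}")
for i, (m, sd) in enumerate(summaries, start=1):
    print(f"sample {i}: mean = {m:.2f}, SD = {sd:.2f}")
```

Every sample comes from the same population, yet each one reports its own mean and SD. None of them is "the correct one"; the spread between them is RSE.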

Page 18: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Knowing this…

The process of statistics provides us with a guide to help us minimize the risk of making Type I/Type II errors and RSE
  Statistical significance

Recall, random sampling error is less likely when:
  You draw a larger sample size from the population (larger n)
  The variable you are measuring has less variance (smaller standard deviation)

Hence, we calculate statistical significance with a formula that incorporates the sample size, the mean, and the SD of the sample
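The slides do not show the formula itself. One common building block, assumed here for illustration, is the standard error of the mean, SE = SD / sqrt(n). The sketch below uses hypothetical SD and n values to show why a larger n or a smaller SD leaves less room for RSE.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)

# Larger n -> smaller standard error (SD held at a hypothetical 6.0).
for n in (10, 40, 160):
    print(f"n = {n:3d}, SD = 6.0 -> SE = {standard_error(6.0, n):.3f}")

# Smaller SD -> smaller standard error (n held at a hypothetical 40).
for sd in (6.0, 3.0, 1.5):
    print(f"n =  40, SD = {sd:.1f} -> SE = {standard_error(sd, 40):.3f}")
```

Quadrupling the sample size halves the standard error, which is one way to see why small samples are so vulnerable to RSE.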

Page 19: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Statistical Significance

All statistical tests (t-tests, correlation, regression, etc…) provide an estimate of statistical significance

When comparing two groups (experimental vs control), how different do they need to be before we can determine if the treatment worked?
  Perhaps any difference is due to the random chance of sampling (RSE)?

When looking for an association between 2 variables, how do we know if there really is an association or if what we’re seeing is due to the random chance of sampling?

Statistical significance puts a value on this chance

Page 20: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Statistical Significance

Statistical significance is defined with a p-value
  p is a probability, ranging from near 0 to near 1

Assuming the null hypothesis is true, p is the probability of seeing results at least this extreme due to RSE alone
  If p is small, you can be more confident you are looking at the reality (truth)

If p is large, it’s more likely any differences between groups or associations between variables are due to random chance

Notice there are no absolutes here: we are never 100% sure

Page 21: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Statistical Significance

All analytic research estimates ‘statistical significance’, but this is different from ‘importance’

Dictionary definition of Significance:
  The probability the observed effect was caused by something other than mere chance (mere chance = RSE)

This does NOT tell you anything about how important or meaningful the result is!

P-values are about RSE and statistical interpretation, not about how “significant” your findings are

Page 22: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Example

Tonight we’ll be working with NFL combine data

Suppose I want to see if WRs are faster than OLs
  Compare 40-yard dash times

I’ll randomly select a few cases and run a statistical test (in this case, a t-test)

The test will provide me with the mean and standard deviation of 40 yard dash times – along with a p-value for that test

Page 23: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Results

HA: WR are faster than linemen
HO: WR are not faster than linemen

WR are faster than linemen, by about 0.8 seconds
  With a p-value so low, there is a small chance this difference is due to RSE

Position    Mean 40yd (seconds)    SD      p-value
WR          4.52                   0.12    0.02
OL          5.32                   0.25

Page 24: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Results

WR are faster than linemen, by about 0.8 seconds
  If the null hypothesis were true, and we drew more samples and repeated this comparison 1,000 times, we would expect to see a difference of 0.8 seconds or larger only 20 times out of 1,000 (2% of the time)

Unlikely this is NOT a real difference (low prob of Type I error)

Position    Mean 40yd (seconds)    SD      p-value
WR          4.52                   0.12    0.02
OL          5.32                   0.25

HO: WR are not faster than linemen
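The “repeat this comparison many times while assuming the null hypothesis is true” idea can be acted out directly with a permutation test. Note that this is a swapped-in technique for illustration, not the t-test used in class, and the ten dash times below are invented (the slide reports only means and SDs, not the raw data or the sample size). The function name `permutation_p_value` is hypothetical.

```python
import random
import statistics

# Hypothetical 40-yard dash times in seconds (made up, NOT the combine data).
wr = [4.41, 4.48, 4.52, 4.55, 4.63]
ol = [5.05, 5.21, 5.30, 5.44, 5.60]

def permutation_p_value(fast, slow, n_shuffles=5000, seed=0):
    """One-sided p-value: how often does shuffling the group labels produce
    a mean difference (slow - fast) at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = statistics.mean(slow) - statistics.mean(fast)
    pooled = list(fast) + list(slow)
    hits = 0
    for _ in range(n_shuffles):
        # Under HO the labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        new_fast = pooled[:len(fast)]
        new_slow = pooled[len(fast):]
        if statistics.mean(new_slow) - statistics.mean(new_fast) >= observed:
            hits += 1
    return observed, hits / n_shuffles

diff, p = permutation_p_value(wr, ol)
print(f"observed difference = {diff:.2f} s, permutation p-value = {p:.4f}")
```

The p-value is simply the share of label shuffles that look at least as extreme as the real data, which is the same “x times out of 1,000” reading used on the slide.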

Page 25: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Example…AGAIN

Suppose I want to see if OGs are faster than OTs
  Compare 40-yard dash times

I’ll randomly select a few cases and run a statistical test

The test will provide me with the mean and standard deviation of 40 yard dash times – along with a p-value for that test

Page 26: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Results

HA: OG are faster than OT
HO: OG are not faster than OT

OG are faster than OT, by about 0.1 seconds
  With a p-value so high, there is a high chance this difference is due to RSE (OG aren’t really faster)

Position    Mean 40yd (seconds)    SD      p-value
OG          5.33                   0.14    0.57
OT          5.42                   0.16

Page 27: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Position    Mean 40yd (seconds)    SD      p-value
OG          5.33                   0.14    0.57
OT          5.42                   0.16

Results

OG are faster than OT, by about 0.1 seconds
  If the null hypothesis were true, and we drew more samples and repeated this comparison 1,000 times, we would expect to see a difference of 0.1 seconds or larger 570 times out of 1,000 (57% of the time)

Unlikely this is a real difference (high prob of Type I error if we rejected HO)

HO: OG are not faster than OT

Page 28: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Alpha

However, this raises the question, “How small a p-value is small enough?”
  To conclude there is a real difference or real association

To remain objective, researchers make this decision BEFORE each new statistical test (the cutoff for p is set a priori)
  Referred to as alpha, α
  The value of p that needs to be obtained before concluding that the difference is statistically significant
    p < 0.10
    p < 0.05
    p < 0.01
    p < 0.001
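The decision rule itself is a one-line comparison. The sketch below applies it to the two p-values from the combine examples, then simulates many experiments in which HO is true to show that alpha is also the long-run Type I error rate. Assumptions to note: alpha = 0.05 is just one of the conventional cutoffs listed above, the simulation uses a z-test with a known population SD of 1 (a simplification, not the t-test used in class), and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def is_significant(p_value, alpha=0.05):
    """Decision rule: reject the null hypothesis only when p is below alpha."""
    return p_value < alpha

# The two p-values reported in the combine examples.
for label, p in (("WR vs OL", 0.02), ("OG vs OT", 0.57)):
    verdict = "reject HO" if is_significant(p) else "fail to reject HO"
    print(f"{label}: p = {p} -> {verdict}")

def false_positive_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Simulate experiments where HO is TRUE (both groups share one population)
    and return the fraction that still come out 'significant' (Type I errors)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z-test with a known population SD of 1 (simplifying assumption).
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if is_significant(p, alpha):
            rejections += 1
    return rejections / trials

print(f"Type I error rate at alpha = 0.05: {false_positive_rate():.3f}")
```

Choosing a stricter alpha (say 0.01) lowers the Type I error rate, at the cost of making real differences harder to detect.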

Page 29: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

p-values WARNINGS:

A p-value of 0.03 is NOT interpreted as:
  “This difference has a 97% chance of being real and a 3% chance of being due to RSE”
Rather:
  “If the null hypothesis is true, there is a 3% chance of observing a difference (or association) as large (or larger)”

p-values are calculated differently for each statistic (t-test, correlations, etc…); just know a p-value incorporates the SD (variability) and n (sample size)

SPSS outputs a p-value for each test
  Sometimes it’s “0.000” in SPSS, but that is NOT true
  Instead report as “p < 0.001”
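The reporting rule can be written as a small helper. This is a sketch with a hypothetical function name, `format_p`; the three-decimal format mirrors how SPSS displays p-values.

```python
def format_p(p_value):
    """Format a p-value for reporting; never print a p-value as exactly zero."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "p < 0.001"
    return f"p = {p_value:.3f}"

print(format_p(0.00004))  # p < 0.001
print(format_p(0.02))     # p = 0.020
print(format_p(0.57))     # p = 0.570
```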

Page 30: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

SLIDE

Page 31: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

CORRELATION

Association between 2 variables

Page 32: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

The everyday notion of correlation

Connection
Relation
Linkage
Conjunction
Dependence
and the ever too ready “cause”

NY Times, 10/24/2010, “Stories vs. Statistics” by John Allen Paulos

Page 33: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Correlations

Knowing p-values and statistical significance, now we can begin ‘analyzing’ data
  Perhaps the most often used stat with a p-value is the correlation

Suppose we wished to graph the relationship between foot length and height of 20 subjects

In order to create the scatterplot, we need the foot length and height for each of our subjects.

Page 34: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Scatterplot

Assume our first subject had a 12 inch foot and was 70 inches tall.

1. Find 12 inches on the x-axis.
2. Find 70 inches on the y-axis.
3. Locate the intersection of 12 and 70.
4. Place a dot at the intersection of 12 and 70.

Page 35: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Scatterplot

[Figure: scatterplot of Height (y-axis, 58 to 74) against Foot Length (x-axis, 4 to 14)]

Page 36: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Scatterplot

Continue to plot each subject based on x and y

Eventually, if the two variables are related in some way, we will see a pattern…

Page 37: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

A Pattern Emerges

The more closely the points cluster to a line that is drawn through them, the stronger the linear relationship between the two variables is (in this case foot length and height).

[Figure: scatterplot of Height (y-axis, 58 to 74) against Foot Length (x-axis, 4 to 14); annotation: “Envelope”]

Page 38: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Describing These Patterns

If the points have an upward movement from left to right, the relationship is “positive”

As one increases, the other increases (larger feet > taller people + smaller feet > shorter people)

[Figure: scatterplot, y-axis 58 to 74, x-axis 4 to 14]

Page 39: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Describing These Patterns

[Figure: scatterplot, y-axis 58 to 74, x-axis 4 to 14]

Page 40: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Describing These Patterns

If the points on the scatterplot have a downward movement from left to right, the relationship is negative.

As one increases, the other decreases (and vice versa)

[Figure: scatterplot, y-axis 58 to 74, x-axis 4 to 14]

Page 41: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Strength of Relationship

Not only do relationships have direction (positive and negative), they also have strength (from 0.00 to 1.00 and from 0.00 to –1.00).
  Also known as “magnitude” of the relationship

The more closely the points cluster toward a straight line, the stronger the relationship is.

Page 42: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Pearson’s r

For this procedure, we use Pearson’s r
  aka Pearson Product Moment Correlation Coefficient

What calculations go into this calculation? Recognize them?

$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} $$
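The pieces of the formula are deviations from the mean, the same building blocks used for variance and SD. A minimal sketch that computes r step by step follows; the foot-length and height values are made up for illustration, and `pearson_r` is a hypothetical helper name rather than anything from SPSS.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-products of deviations divided by the square
    root of (sum of squared X deviations * sum of squared Y deviations)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical foot lengths and heights in inches (made up for illustration).
foot = [9.0, 10.0, 10.5, 11.0, 12.0]
height = [62.0, 65.0, 67.0, 68.0, 70.0]
print(f"r = {pearson_r(foot, height):.3f}")
```

The numerator carries the direction (its sign), while the denominator rescales the result so that it always falls between -1 and +1.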

Page 43: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Pearson’s r

As mentioned, correlations like Pearson’s r accomplish two things:
  Explain the direction of the relationship between 2 variables
    Positive vs Negative
  Explain the strength (magnitude) of the relationship between 2 variables
    Range from -1 to 0 to +1
    The closer to 1 (positive or negative), the stronger it is

Page 44: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Strength of Relationship

A set of scores with r = –0.60 has the same strength as a set of scores with r = +0.60 because both sets cluster similarly.

Page 46: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Statistical Assumptions

From here forward, each new statistic we discuss will have its own set of ‘assumptions’

Statistical assumptions serve as a checklist of items that should be true in order for the statistic to be valid
  SPSS will do whatever you tell it to do; you have to personally verify assumptions before moving forward

Kind of like being female is an ‘assumption’ of taking a pregnancy test
  If you aren’t female, you can take one, but it’s not really going to mean anything

Page 47: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Assumptions of Pearson’s r

1) The measures are approximately normally distributed
  Avoid using highly skewed data, or data with multiple modes, etc…; the data should approximate that bell curve shape

2) The variance of the two measures is similar (homoscedasticity); check with scatterplot
  See upcoming slide

3) The sample represents the population
  If your sample doesn’t represent your target population, then your correlation won’t mean anything

These three assumptions are pretty much critical to most of the statistics we’ll learn about (not unique to correlation)

Page 48: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Homoscedasticity

Homoscedasticity is the assumption that the variability in scores for one variable is roughly the same at all values of the other variable
  Heteroscedasticity = dissimilar variability across values; ex. income vs. food consumption (income is highly variable and skewed, but food consumption is not)

Page 49: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

NBA Data: Heteroscedasticity Example

Page 50: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Note how variable the points are, especially towards one end of the plot

Page 51: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

NFL Data: Homoscedasticity Example

Page 52: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Here, the variance appears to be equal across the entire range of scores

Page 53: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Two more (most) critical assumptions for r

4) The relationship is linear
  Can’t use variables that have a curvilinear relationship
  Check with scatterplot (like last week), plotting is always the first step!

5) The variables are measured on an interval or ratio scale (continuous variables)
  No nominal or ordinal data
  Can’t correlate body weight with gender (even if it’s coded as a number!)

Page 54: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Linear correlations can’t inform you about non-linear relationships
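A small constructed example makes the point concrete. Below, y is completely determined by x (y = x squared), yet Pearson’s r comes out as zero because the relationship is curved rather than linear. The data and the helper name `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# A perfect U-shaped (curvilinear) relationship: y depends entirely on x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

An r near zero therefore means “no linear relationship”, not “no relationship”, which is why the scatterplot comes first.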

Page 55: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Strength of Association - r

Describing and/or comparing multiple correlations can be difficult. However, there are standards to use:

Strength           r
High (Strong)      0.85 - 1.0
Moderately-High    0.60 - 0.85
Moderate           0.30 - 0.60
Low                0.00 - 0.30

(R.M. Malina & C. Bouchard, 1991)

Correlations are generally reported with two or three digits past the decimal (as 0.57 or 0.568)
  Most use 2, just make sure you are consistent
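The strength bands from the table above can be turned into a lookup. This is a sketch with one labeled assumption: the bands share their boundary values (0.30, 0.60, 0.85), and the table does not say which band owns them, so the function below arbitrarily assigns a boundary value to the higher band. The name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Label the magnitude of a correlation using the bands in the table.

    Assumption: a value sitting exactly on a shared boundary (0.30, 0.60,
    0.85) is placed in the higher band; the table itself does not say.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign (direction)
    if size >= 0.85:
        return "High (Strong)"
    if size >= 0.60:
        return "Moderately-High"
    if size >= 0.30:
        return "Moderate"
    return "Low"

for r in (0.92, -0.60, 0.38, 0.05):
    print(f"r = {r:+.2f} -> {describe_strength(r)}")
```

Because the function works on the absolute value, r = -0.60 and r = +0.60 receive the same label, matching the point that both have the same strength.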

Page 56: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Research Questions

Typical research questions that can be answered through correlation:

What is the relationship between GRE scores and graduate school GPA?

What is the relationship between athletic performance and admissions applications in college athletics?

What is the relationship between %BF and blood pressure?

Page 57: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Research Questions

Typical research questions that can be answered through correlation: (continued)

What is the relationship between throwing mechanics and shoulder distraction in professional baseball pitchers?

What is the relationship between certain baseball statistics (batting average, on-base percentage, etc…) and runs scored?

Page 58: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Correlations and causality

WARNING on correlations:
  Correlations only describe the relationship, they do not prove causation (that variable A causes B)
  Correlation is just not a sufficient test for determining causality when used alone

Statistically speaking, there are 3 Requirements to Infer a Causal Relationship:
  1) A statistically significant relationship (r = yes)
  2) Time-order (A comes before B), (r = maybe)
  3) No other variable can explain this association (r = no)

Page 59: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Correlations and causality

If there is a relationship between A and B it could be because
  A -> B
  A <- B
  A <- C -> B

In this example, C is a confounding variable
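The A <- C -> B case can be simulated. In the sketch below, all data are synthetic: A and B are each built as “C plus independent noise”, so neither affects the other, yet they still correlate strongly. Removing C’s contribution makes the correlation vanish. The variable names and noise levels are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

rng = random.Random(7)
n = 500

# C is the confounder. A and B each depend on C plus their own noise,
# and neither one has any direct effect on the other (A <- C -> B).
c = [rng.gauss(0.0, 1.0) for _ in range(n)]
a = [ci + rng.gauss(0.0, 0.5) for ci in c]
b = [ci + rng.gauss(0.0, 0.5) for ci in c]

r_ab = pearson_r(a, b)

# Because this data was built as "C plus noise", subtracting C removes its
# influence exactly; with real data that influence would have to be estimated.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_ab_without_c = pearson_r(a_without_c, b_without_c)

print(f"r(A, B)              = {r_ab:.2f}")
print(f"r(A, B) with C removed = {r_ab_without_c:.2f}")
```

A significant correlation between A and B satisfies only the first of the three requirements; it cannot rule out a C like this one.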

Page 60: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Other Types of Correlations

Besides r, there are many types of correlations. For example:

Spearman rho correlation = Use when 1 or both of the two variables are ordinal
  Computed in SPSS the same way as Pearson’s r… simply toggle the Spearman button on the Bivariate Correlations window
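Outside SPSS, Spearman’s rho can be understood as Pearson’s r applied to ranks instead of raw scores. The sketch below shows that idea in plain Python; the example data are invented, and `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical ordinal data: draft round (1 = earliest) and a 1-5 scout rating.
draft_round = [1, 1, 2, 3, 4, 5, 6]
scout_rating = [5, 4, 4, 3, 3, 2, 1]
print(f"rho = {spearman_rho(draft_round, scout_rating):.3f}")
```

Because only the ordering matters, rho suits ordinal variables, where the distance between adjacent categories is not meaningful.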

Page 61: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Correlation Example

Our research question (NBA Dataset):
  Is there a relationship between free throw percentage and 3-point percentage (min. 1 attempt per game)?
  HA: There is a relationship between FT% and 3PT%
  HO: There is no relationship between FT% and 3PT%

Analysis Plan:
  1) Visually check data (scatterplot)
  2) Pearson correlation between the two variables

Page 62: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Scatterplot

Page 63: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Results of correlation analysis

1. Correlation is positive
2. Correlation is 0.38, moderate-to-low
3. Correlation is statistically significant, p = 0.003

If there were no real relationship, we would only see a correlation of 0.375 or greater 0.3% of the time with repeated sampling and analysis

CONCLUSION: Reject the null hypothesis and accept the alternative

Page 64: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Results of correlation analysis

CONCLUSION: Reject the null hypothesis and accept the alternative

There is a positive, moderate-to-low relationship between NBA 3-point percentage and free throw percentage. Players that tend to shoot well at the free throw line also tend to shoot well behind the three point line.

QUESTIONS??

Page 65: STATISTICAL SIGNIFICANCE P-VALUES ANALYTIC DECISIONS

Upcoming…

In-class activity

Homework:
  Cronk 5.1 and 5.2
  Holcomb Exercises 25 and 26
  Reading Cronk 6.1 (optional, may be helpful)

Regression/Prediction next week