STATISTICAL SIGNIFICANCE, P-VALUES, AND ANALYTIC DECISIONS
Where we are:
Thus far we've covered: Measures of Central Tendency, Measures of Variability, Z-scores, Frequency Distributions, and Graphing/Plotting Data.
All of the above are used to describe individual variables
Tonight we begin to look into analyzing the relationship between two variables
However… As soon as we begin analyzing relationships, we
have to discuss ‘statistical significance’, RSE, p-values, and hypothesis testing
Descriptive statistics do NOT require such things, as we are not 'testing' theories about the data, only exploring it. You aren't trying to 'prove' something with descriptive statistics, just 'show' something.
These next few slides are critical to your understanding of the rest of the course– please stop me for questions!
Hypotheses
Hypothesis - the prediction about what will happen during an experiment or observational study, or what researchers will find.
Examples:
Drug X will lower blood pressure
Smoking will increase the risk of cancer
Lowering ticket prices will increase event attendance
Wide receivers can run faster than linemen
Hypotheses
Example: Wide receivers can run faster than linemen
However, keep in mind that our hypothesis might be wrong – and the opposite might be true:
Wide receivers can NOT run faster than linemen
So, each time we investigate a single hypothesis, we actually test two competing hypotheses.
Hypothesis testing
HA: Wide receivers can run faster than linemen
This is what we expect to be true. This is the alternative hypothesis (HA).
HO: Wide receivers can NOT run faster than linemen
This is the hypothesis we have to prove wrong before our real hypothesis can be accepted. It is the default hypothesis. This is the null hypothesis (HO).
Hypothesis Testing
Every time you run a statistical analysis (excluding descriptive statistics), you are trying to reject a null hypothesis
Could be very specific:
HA: Men taking Lipitor will have a lower LDL cholesterol after 6 weeks compared to men not taking Lipitor
HO: Men taking Lipitor will have a similar LDL cholesterol after 6 weeks compared to men not taking Lipitor (no difference)
…or very simple (and non-directional):
HA: There is an association between smoking and cancer
HO: There is no association between smoking and cancer
Why null vs alternative?
All statistical tests boil down to…
HO vs. HA
We write and test our hypotheses in this 'competing' fashion for several reasons; one is to address the issue of random sampling error (RSE).
Random Sampling Error
Remember RSE? Because the group you sampled does NOT EXACTLY represent the population you sampled from (by chance/accident).
Red blocks vs. green blocks: there is always a chance of RSE.
All statistical tests provide you with the probability that sampling error has occurred in that test:
The odds that you are seeing something due to chance (RSE) vs. the odds you are seeing something real (a real association or real difference between groups)
Summary so far…
#1 – Each time we use a statistical test, there are two competing hypotheses: HO (the null hypothesis) and HA (the alternative hypothesis).
#2 – Each time we use a statistical test, we have to consider random sampling error: either the result is due to random chance (RSE, a bad sample), or the result is due to a real difference or association.
These two things, #1 and #2, are interconnected, and we have to consider potential errors in our decision making.
Examples of Competing Hypotheses and Error
Suppose we collected data on risk of death and smoking
We generate our hypotheses:
HA: Smoking increases risk of death
HO: Smoking does not increase risk of death
Now we run our statistical test on our hypotheses and need to make a final decision about them. But, due to RSE, there are two potential errors we could make.
Error…
There are two possible errors:
Type I Error: We could reject the null hypothesis although it was really true.
HA: Smoking increases risk of death (FALSE)
HO: Smoking does not increase risk of death (TRUE)
This error led to unwarranted changes. We went around telling everyone to stop smoking even though it didn't really harm them.
OR…
Error…
Type II Error: We could fail to reject the null hypothesis when it was really untrue.
HA: Smoking increases risk of death (TRUE)
HO: Smoking does not increase risk of death (FALSE)
This error led to inaction against a preventable outcome (keeping the status quo). We went around telling everyone to keep smoking while it killed them.
OR…
There are really 4 potential decisions, based on what is “true” and what we “decide”
                  Our Decision
What is True      Reject HO                            Accept HO
HO                Type I Error (Unwarranted Change)    Correct
HA                Correct                              Type II Error (Kept Status Quo)

HA: Smoking increases risk of death
HO: Smoking does not increase risk of death
Questions…?
RANDOM SAMPLING ERROR
Kent Brockman: Mr. Simpson, how do you respond to the charges that petty vandalism such as graffiti is down eighty percent, while heavy sack beatings are up a shocking nine hundred percent?
Homer Simpson: Aw, you can come up with statistics to prove anything, Kent. Forty percent of all people know that.
Example of RSE
RSE is the fact that each time you draw a sample from a population, the values of those statistics (mean, SD, etc.) will be different to some degree.
Suppose we want to determine the average points per game of an NBA player from 2008-2009 (a population parameter). If I sample around 30 players 3 times and calculate their average points per game, I'll end up with 3 different numbers (sample statistics).
Which 1 of the 3 sample statistics is correct?
[Table: 8 random samples of 10% of the population. Note the varying mean and SD – this is RSE!]
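To make RSE tangible outside of SPSS, here is a minimal Python sketch (my addition, not from the slides) that mimics the "8 random samples of 10% of the population" idea. The population itself is invented for illustration; only the sampling procedure matters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "population": points per game for 450 NBA players.
# The distribution parameters are made up for illustration only.
population = rng.gamma(shape=2.0, scale=5.0, size=450)
print(f"Population mean: {population.mean():.2f}, SD: {population.std():.2f}")

# Draw 8 random samples of 10% of the population (as on the slide) and
# watch the sample mean and SD vary from draw to draw -- that's RSE.
n = int(0.10 * len(population))
for i in range(8):
    sample = rng.choice(population, size=n, replace=False)
    print(f"Sample {i + 1}: mean = {sample.mean():5.2f}, SD = {sample.std(ddof=1):5.2f}")
```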
Knowing this…
The process of statistics provides us with a guide to help us minimize the risk of making Type I/Type II errors and RSE: statistical significance.
Recall, random sampling error is less likely when:
You draw a larger sample size from the population (larger n)
The variable you are measuring has less variance (smaller standard deviation)
Hence, we calculate statistical significance with a formula that incorporates the sample size, the mean, and the SD of the sample.
Statistical Significance
All statistical tests (t-tests, correlation, regression, etc.) provide an estimate of statistical significance.
When comparing two groups (experimental vs. control): how different do they need to be before we can determine that the treatment worked? Perhaps any difference is due to the random chance of sampling (RSE)?
When looking for an association between 2 variables: how do we know if there really is an association, or if what we're seeing is due to the random chance of sampling?
Statistical significance puts a value on this chance.
Statistical Significance
Statistical significance is defined with a p-value.
p is a probability, ranging from near 0 to near 1.
Assuming the null hypothesis is true, p is the probability that these results could be due to RSE.
If p is small, you can be more confident you are looking at the reality (truth).
If p is large, it's more likely any differences between groups or associations between variables are due to random chance.
Notice there are no absolutes here – never 100% sure.
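To see what p actually measures, here is a small simulation sketch (my addition; the group sizes, means, and the "observed" difference are all hypothetical). It assumes the null hypothesis is true by drawing both groups from the same population, then counts how often RSE alone produces a difference at least as large as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)

observed_diff = 0.5  # a hypothetical observed difference between group means

# Both groups come from the SAME population (the null is true), so any
# difference in their means is pure random sampling error.
count = 0
trials = 10_000
for _ in range(trials):
    a = rng.normal(loc=10.0, scale=2.0, size=30)
    b = rng.normal(loc=10.0, scale=2.0, size=30)
    if abs(a.mean() - b.mean()) >= observed_diff:
        count += 1

# The p-value: assuming the null is true, the probability of seeing a
# difference at least this large due to RSE alone.
print(f"Simulated p-value: {count / trials:.3f}")
```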
Statistical Significance
All analytic research estimates 'statistical significance' – but this is different from 'importance'.
Dictionary definition of significance: the probability the observed effect was caused by something other than mere chance (mere chance = RSE).
This does NOT tell you anything about how important or meaningful the result is!
P-values are about RSE and statistical interpretation, not about how "significant" your findings are.
Example
Tonight we’ll be working with NFL combine data
Suppose I want to see if WRs are faster than OLs: compare 40-yard dash times.
I’ll randomly select a few cases and run a statistical test (in this case, a t-test)
The test will provide me with the mean and standard deviation of 40 yard dash times – along with a p-value for that test
Results
HA: WR are faster than linemen
HO: WR are not faster than linemen
WR are faster than linemen, by about 0.8 seconds. With a p-value so low, there is a small chance this difference is due to RSE.
Position   Mean 40yd (seconds)   SD     p-value
WR         4.52                  0.12   0.02
OL         5.32                  0.25
Results
WR are faster than linemen, by about 0.8 seconds. If the null hypothesis was true, and we drew more samples and repeated this comparison 1,000 times, we would expect to see a difference of 0.8 seconds or larger only 20 times out of 1,000 (2% of the time).
Unlikely this is NOT a real difference (low probability of Type I error).
Position   Mean 40yd (seconds)   SD     p-value
WR         4.52                  0.12   0.02
OL         5.32                  0.25
HO: WR are not faster than linemen
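For anyone following along in Python rather than SPSS, a comparison like this one might be run as a two-sample t-test with scipy. The dash times below are fabricated to resemble the table's summary numbers (they are not the actual combine data), so the resulting p-value will differ from the slide's 0.02.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Fabricated 40-yard dash times mimicking the slide's summary table:
# WR: mean ~4.52 s, SD ~0.12; OL: mean ~5.32 s, SD ~0.25.
wr_times = rng.normal(loc=4.52, scale=0.12, size=12)
ol_times = rng.normal(loc=5.32, scale=0.25, size=12)

# One-tailed test of HA: WR are faster (i.e., have LOWER times) than linemen.
t_stat, p_value = stats.ttest_ind(wr_times, ol_times,
                                  equal_var=False, alternative='less')
print(f"Mean difference: {ol_times.mean() - wr_times.mean():.2f} s")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```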
Example…AGAIN
Suppose I want to see if OGs are faster than OTs: compare 40-yard dash times.
I’ll randomly select a few cases and run a statistical test
The test will provide me with the mean and standard deviation of 40 yard dash times – along with a p-value for that test
Results
HA: OG are faster than OT
HO: OG are not faster than OT
OG are faster than OT, by about 0.1 seconds. With a p-value so high, there is a high chance this difference is due to RSE (OG aren't really faster).
Position   Mean 40yd (seconds)   SD     p-value
OG         5.33                  0.14   0.57
OT         5.42                  0.16
Results
OG are faster than OT, by about 0.1 seconds. If the null hypothesis was true, and we drew more samples and repeated this comparison 1,000 times, we would expect to see a difference of 0.1 seconds or larger 570 times out of 1,000 (57% of the time).
Unlikely this is a real difference (high probability of Type I error).

Position   Mean 40yd (seconds)   SD     p-value
OG         5.33                  0.14   0.57
OT         5.42                  0.16

HO: OG are not faster than OT
Alpha
However, this raises the question, "How small a p-value is small enough?" to conclude there is a real difference or real association.
To remain objective, researchers make this decision BEFORE each new statistical test (the threshold is set a priori). It is referred to as alpha, α: the value of p that needs to be obtained before concluding that the difference is statistically significant.
Common thresholds: p < 0.10, p < 0.05, p < 0.01, p < 0.001
p-values WARNINGS:
A p-value of 0.03 is NOT interpreted as: "This difference has a 97% chance of being real and a 3% chance of being due to RSE."
Rather: "If the null hypothesis is true, there is a 3% chance of observing a difference (or association) as large (or larger)."
p-values are calculated differently for each statistic (t-tests, correlations, etc.) – just know a p-value incorporates the SD (variability) and n (sample size).
SPSS outputs a p-value for each test. Sometimes it's "0.000" in SPSS – but that is NOT true. Instead, report it as "p < 0.001".
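A tiny helper along the lines of that reporting rule (a sketch; the cutoff and number of digits follow the conventions on this slide):

```python
def format_p(p: float) -> str:
    """Format a p-value for reporting; never report 'p = 0.000'."""
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"

print(format_p(0.0000004))  # p < 0.001
print(format_p(0.02))       # p = 0.020
```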
CORRELATION: Association between 2 variables
The everyday notion of correlation: connection, relation, linkage, conjunction, dependence, and the ever-too-ready "cause".
NY Times, 10/24/2010: "Stories vs. Statistics" by John Allen Paulos
Correlations
Knowing p-values and statistical significance, now we can begin ‘analyzing’ data Perhaps the most often used stat with a p-value
is the correlation
Suppose we wished to graph the relationship between foot length and height of 20 subjects
In order to create the scatterplot, we need the foot length and height for each of our subjects.
Scatterplot
Assume our first subject had a 12 inch foot and was 70 inches tall.
1. Find 12 inches on the x-axis.
2. Find 70 inches on the y-axis.
3. Locate the intersection of 12 and 70.
4. Place a dot at the intersection of 12 and 70.
[Scatterplot: Foot Length in inches (x-axis, 4-14) vs. Height in inches (y-axis, 58-74)]
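The same four plotting steps, sketched in Python with matplotlib (the foot-length/height pairs beyond the first subject are made up, since the slide's 20 subjects aren't reproduced in this transcript):

```python
import matplotlib.pyplot as plt

# Made-up (foot length, height) pairs in inches; the first subject matches
# the slide's example: a 12-inch foot and 70 inches tall.
foot_length = [12, 10, 11, 9, 13, 8, 10.5, 11.5]
height = [70, 65, 68, 62, 72, 60, 66, 69]

plt.scatter(foot_length, height)
plt.xlabel("Foot Length (inches)")
plt.ylabel("Height (inches)")
plt.title("Foot Length vs. Height")
plt.show()
```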
Scatterplot
Continue to plot each subject based on x and y
Eventually, if the two variables are related in some way, we will see a pattern…
A Pattern Emerges
The more closely they cluster to a line that is drawn through them, the stronger the linear relationship between the two variables is (in this case foot length and height).
[Scatterplot with an "envelope" drawn around the points: Foot Length (x-axis, 4-14) vs. Height (y-axis, 58-74)]
Describing These Patterns
If the points have an upward movement from left to right, the relationship is "positive."
As one increases, the other increases (larger feet go with taller people; smaller feet with shorter people).
Describing These Patterns
If the points on the scatterplot have a downward movement from left to right, the relationship is "negative."
As one increases, the other decreases (and vice versa).
Strength of Relationship
Not only do relationships have direction (positive and negative), they also have strength (from 0.00 to 1.00 and from 0.00 to –1.00), also known as the "magnitude" of the relationship.
The more closely the points cluster toward a straight line, the stronger the relationship is.
Pearson’s r
For this procedure, we use Pearson's r, aka the Pearson Product Moment Correlation Coefficient.
What calculations go into this formula? Recognize them?

r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}
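The formula translates almost line-for-line into code. A from-scratch sketch (assuming two plain Python lists of equal length; for real work you would use scipy.stats.pearsonr):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from the formula above."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of deviations from each mean.
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) *
                    sum((yi - mean_y) ** 2 for yi in y))
    return num / den

# Quick check with made-up foot-length/height data from earlier:
print(round(pearson_r([12, 10, 11, 9, 13], [70, 65, 68, 62, 72]), 3))
```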
Pearson’s r
As mentioned, correlations like Pearson's r accomplish two things:
Explain the direction of the relationship between 2 variables (positive vs. negative)
Explain the strength (magnitude) of the relationship between 2 variables (ranges from -1 to 0 to +1; the closer to 1, positive or negative, the stronger it is)
Strength of Relationship
A set of scores with r = –0.60 has the same strength as a set of scores with r = +0.60 because both sets cluster similarly.
Statistical Assumptions
From here forward, each new statistic we discuss will have its own set of 'assumptions'.
Statistical assumptions serve as a checklist of items that should be true in order for the statistic to be valid. SPSS will do whatever you tell it to do – you have to personally verify assumptions before moving forward.
Kind of like being female is an 'assumption' of taking a pregnancy test: if you aren't female, you can take one, but it's not really going to mean anything.
Assumptions of Pearson's r
1) The measures are approximately normally distributed. Avoid using highly skewed data, data with multiple modes, etc.; the data should approximate that bell-curve shape.
2) The variance of the two measures is similar (homoscedasticity). Check with a scatterplot (see upcoming slide).
3) The sample represents the population. If your sample doesn't represent your target population, then your correlation won't mean anything.
These three assumptions are pretty much critical to most of the statistics we'll learn about (not unique to correlation).
Homoscedasticity
Homoscedasticity is the assumption that the variability in scores for one variable is roughly the same at all values of the other variable.
Heteroscedasticity = dissimilar variability across values; e.g., income vs. food consumption (income is highly variable and skewed, but food consumption is not).
NBA Data: Heteroscedasticity Example
Note how variable the points are, especially towards one end of the plot
NFL Data: Homoscedasticity Example
Here, the variance appears to be equal across the entire range of scores
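Since the NBA/NFL plots aren't reproduced here, this sketch (my addition, with simulated data) generates one homoscedastic and one heteroscedastic scatterplot so you can see the "even spread" vs. "fan shape" contrast for yourself:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)

# Homoscedastic example: the spread of y is similar across the range of x.
y_homo = 2 * x + rng.normal(0, 1.0, 200)

# Heteroscedastic example: the spread of y grows with x (a fan shape).
y_hetero = 2 * x + rng.normal(0, 1.0, 200) * x

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y_homo, s=10)
axes[0].set_title("Homoscedastic: even spread")
axes[1].scatter(x, y_hetero, s=10)
axes[1].set_title("Heteroscedastic: fan shape")
plt.show()
```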
Two more (most) critical assumptions for r:
4) The relationship is linear. You can't use variables that have a curvilinear relationship. Check with a scatterplot (like last week) – plotting is always the first step!
5) The variables are measured on an interval or ratio scale (continuous variables). No nominal or ordinal data: you can't correlate body weight with gender (even if it's coded as a number!).
Linear correlations can't inform you about non-linear relationships.
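A quick demonstration of why assumption 4 matters (my own toy example): for a perfectly curvilinear relationship, here y = x² over a symmetric range, Pearson's r comes out at essentially zero even though the variables are perfectly related.

```python
import numpy as np
from scipy import stats

x = np.linspace(-5, 5, 101)
y = x ** 2  # a perfect curvilinear (U-shaped) relationship

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}")  # ~0.000: linear correlation misses the U-shape entirely
```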
Strength of Association - r
Strength           r
High (Strong)      0.85 - 1.00
Moderately-High    0.60 - 0.85
Moderate           0.30 - 0.60
Low                0.00 - 0.30
(R.M. Malina & C. Bouchard, 1991)
Describing and/or comparing multiple correlations can be difficult. However, there are standards to use:
Correlations are generally reported with two or three digits past the decimal (as 0.57 or 0.568). Most use 2 – just make sure you are consistent.
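If you want to label r values in code, a small sketch applying the Malina & Bouchard bands from the table above (how to assign a value sitting exactly on a boundary, like 0.60, is my choice; the slide doesn't say):

```python
def label_strength(r: float) -> str:
    """Label |r| using the Malina & Bouchard (1991) bands from the slide."""
    magnitude = abs(r)  # strength is about magnitude, not direction
    if magnitude >= 0.85:
        return "High (Strong)"
    if magnitude >= 0.60:
        return "Moderately-High"
    if magnitude >= 0.30:
        return "Moderate"
    return "Low"

print(label_strength(0.57))   # Moderate
print(label_strength(-0.90))  # High (Strong): -0.90 is as strong as +0.90
```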
Research Questions
Typical research questions that can be answered through correlation: What is the relationship between GRE scores and
graduate school GPA?
What is the relationship between athletic performance and admissions applications in college athletics?
What is the relationship between %BF and blood pressure?
Research Questions
Typical research questions that can be answered through correlation: (continued) What is the relationship between throwing
mechanics and shoulder distraction in professional baseball pitchers?
What is the relationship between certain baseball statistics (batting average, on-base percentage, etc…) and runs scored?
Correlations and causality
WARNING on correlations:
Correlations only describe the relationship; they do not prove causation (that variable A causes B). Correlation alone is just not a sufficient test for determining causality.
Statistically speaking, there are 3 requirements to infer a causal relationship:
1) A statistically significant relationship (r = yes)
2) Time-order: A comes before B (r = maybe)
3) No other variable can explain the association (r = no)
Correlations and causality
If there is a relationship between A and B, it could be because:
A -> B
A <- B
A <- C -> B
In this example, C is a confounding variable.
Other Types of Correlations
Besides r, there are many types of correlations. For example:
Spearman rho correlation: use when 1 or both of the two variables are ordinal. Computed in SPSS the same way as Pearson's r – simply toggle the Spearman button on the Bivariate Correlations window.
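The Python counterpart (a sketch with hypothetical data; scipy.stats.spearmanr does the ranking internally):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical data: an ordinal rank (a 1-5 rating) and a continuous score.
ratings = rng.integers(1, 6, size=40)
scores = ratings * 2.0 + rng.normal(0, 1.5, size=40)

rho, p = stats.spearmanr(ratings, scores)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```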
Correlation Example
Our research question (NBA Dataset): Is there a relationship between free throw percentage and 3-point percentage (min. 1 attempt per game)?
HA: There is a relationship between FT% and 3PT%
HO: There is no relationship between FT% and 3PT%
Analysis Plan:
1) Visually check data (scatterplot)
2) Pearson correlation between the two variables
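The same two-step plan, sketched in Python. The FT% and 3PT% values are simulated stand-ins, since the NBA dataset isn't included in this transcript, so the resulting r and p will not match the slide's 0.38 and 0.003.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulated stand-ins for the NBA dataset: FT% and a weakly related 3PT%.
ft_pct = rng.normal(75, 8, size=60).clip(40, 95)
three_pct = (0.2 * ft_pct + rng.normal(20, 4, size=60)).clip(10, 50)

# Step 1: visually check the data with a scatterplot.
plt.scatter(ft_pct, three_pct)
plt.xlabel("FT%")
plt.ylabel("3PT%")
plt.show()

# Step 2: Pearson correlation between the two variables.
r, p = stats.pearsonr(ft_pct, three_pct)
print(f"r = {r:.2f}, p = {p:.4f}")
```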
Scatterplot
Results of correlation analysis
1. Correlation is positive
2. Correlation is 0.38, moderate-to-low
3. Correlation is statistically significant, p = 0.003
If there were no real relationship, we would only see a correlation of 0.375 or greater 0.3% of the time with repeated sampling and analysis.
CONCLUSION: Reject the null hypothesis and accept the alternative
Results of correlation analysis
CONCLUSION: Reject the null hypothesis and accept the alternative
There is a positive, moderate-to-low relationship between NBA 3-point percentage and free throw percentage. Players that tend to shoot well at the free throw line also tend to shoot well behind the three point line.
QUESTIONS??
Upcoming…
In-class activity
Homework: Cronk 5.1 and 5.2; Holcomb Exercises 25 and 26
Reading: Cronk 6.1 (optional, may be helpful)
Regression/Prediction next week