Stat203                       Page  1  of  28  Fall2011  –  Week  9,  Lecture  1    

Topics for Today

- Scatterplots
- Relationship between 2 continuous variables
- Pearson’s correlation
- Facts and myths
- Correlation as a statistic

Two Continuous Variables

Using the 2-sample Chi-square test, we were able to investigate the relationship between two discrete variables, e.g.:

- radio format and age
- weather and city

Now we will examine the relationship between two continuous variables. The first tool we will discuss is called correlation.

But even before that …

Scatter Plots

A scatter plot shows the relationship between 2 continuous variables measured on the same individuals. Values of one variable (X) are plotted on the horizontal axis and values of the other variable (Y) on the vertical axis. Each individual appears as a single point. Let’s look at this in SPSS …

Let’s look at a dataset called Detroit that has information from the city for the years 1961 to 1973. It contains 6 variables:

- year
- homicide rate (per 100,000 population)
- # of police (per 100,000 population)
- unemployment rate (%)
- # of registered handguns (per 10,000 population)
- average weekly income ($)

Let’s create a scatterplot of two of these variables.

A scatterplot of the # of registered handguns and the # of police officers:

Let’s look at the first row of the data table, and then identify that point (circle it) in the scatterplot on the previous page:

Each row in the data table corresponds to exactly one point in the scatter plot. What sort of relationship between the # of registered handguns and the # of police officers does this scatterplot show?
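To make the row-to-point idea concrete, here is a minimal Python sketch (the course itself uses SPSS). The table values and the column names `x` and `y` are made up for illustration and are not the Detroit data; `rows_to_points` is a hypothetical helper.

```python
# Hypothetical toy data table (made-up numbers, not the Detroit data):
# each row holds two measurements taken on the same individual.
rows = [
    {"individual": "A", "x": 2.0, "y": 5.0},
    {"individual": "B", "x": 3.5, "y": 6.5},
    {"individual": "C", "x": 5.0, "y": 6.0},
]


def rows_to_points(rows, x_key, y_key):
    """Map each row of the table to exactly one (x, y) point."""
    return [(row[x_key], row[y_key]) for row in rows]


points = rows_to_points(rows, "x", "y")
print(points)  # [(2.0, 5.0), (3.5, 6.5), (5.0, 6.0)]
```

Swapping the two keys swaps which variable goes on the horizontal axis, but there is still one point per row.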

Correlation

The term correlation is often used in everyday language, with the general interpretation of a relationship between two events … including two discrete events: “Autism is correlated with vaccination” … or things that can’t really be measured: “There’s a correlation between my mood and my partner’s behavior.” However, in statistics the term correlation means something specific.

Statistical Correlation

Correlation measures the strength and direction of a linear relationship between two continuous variables (X and Y). Pearson’s correlation is the most commonly used:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)\left(\sum_{i=1}^{n} (y_i - \bar{y})^2\right)}}
$$

Note:

- this measures ONLY a linear relationship
- there are many types of relationships that are not linear
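The formula above can be sketched in Python as follows. This is an illustrative implementation, assuming the two lists are paired measurements on the same individuals; `pearson_r` is a hypothetical name and the data in the usage line are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of paired deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sqrt of (sum of squared x-deviations) * (sum of squared y-deviations)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up illustrative data
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```

Here the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / √60 ≈ 0.775, which is exactly the kind of tedious arithmetic the software does for us.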

I only give you the formula for completeness; we will not be calculating it by hand (it is extremely tedious).

In this class, as whenever you analyze data in the future, we will have the software calculate the correlation.

However, it is important to understand that it is just another statistic calculated from the data, just like the mean, the standard deviation, or the odds ratio.

Some Facts about Correlation

1. Correlation can only be used when both variables are measured at the interval or ratio level.

2. Correlation does not change when we change the units of measurement of X and Y. For example, the correlation between height and weight is the same whether height is in cm or in inches and weight is in kg or lbs.

3. Positive correlation indicates a positive association between the variables, and negative correlation indicates a negative association.

4. Correlation is always between -1 and 1. Values near 0 indicate a very weak linear relationship; -1 or 1 will occur only if the points fall exactly on a straight line.

Examples

The following are scatter plots of two variables, with the correlation between the two listed above each plot.

Pearson Correlation of 1

As in the definition, correlation measures the strength of the linear relationship. All of these figures have the same correlation!

Important note! The strength of the correlation doesn’t depend on the slope of the line, only on how closely the points are clustered around a straight line … any straight line!

Examples of a relationship with Pearson Correlation of 0

Facts in a video

http://www.youtube.com/watch?v=Ypgo4qUBt5o

Let’s do some examples – Correlation guessing

Q15, pg 370 – correlation between poverty and rates of teen pregnancy in 8 US states.

a) [-0.95, -0.5) b) [-0.5, 0) c) (0, 0.5) d) [0.5, 0.95)

Q16, pg 370 (edited) – Hours studied and exam grade

a) [-0.95, -0.5) b) [-0.5, 0) c) (0, 0.5) d) [0.5, 0.95)

Q19, pg 371 – Hours watching TV vs # books read

a) [-0.95, -0.5) b) [-0.5, 0) c) (0, 0.5) d) [0.5, 0.95)

An Example

In which of these two scatter plots is the correlation higher?

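[Figure: two scatter plots of y versus x. Left panel: x axis from about -3 to 2, y axis from about -4 to 2. Right panel: x and y axes both from about -5 to 5.]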

The correlation of x and y in the two figures is the same; only the scale of the axes is different! Don’t trust your eye: always calculate the correlation … but don’t trust the correlation alone: always check by eye.

Myths about Correlation

1. Correlation implies causation

WRONG! There could be a third, unknown variable that influences both X and Y.

2. A correlation coefficient of zero implies no relationship between two variables

WRONG! It only implies no LINEAR relationship!

Remember the funky-shaped figures with a correlation of 0!

Myths explained in video:
http://www.youtube.com/watch?v=MTbZoKEOkUg
http://www.youtube.com/watch?v=VW1IEqKuf6s

(Only to 2:48)

Correlation as a statistic

As with the mean, the odds ratio and the other statistics we have looked at, a correlation is a characteristic of a population that we estimate with our sample:

| | Population (Parameter) | Sample (Statistic) |
|---|---|---|
| Mean | µ | X̄ |
| Proportion | p | p̂ |
| Odds Ratio | OR | |
| Correlation | ρ | r |

The r tells part of the story

Remember, the correlation (r) we calculate from a sample is only one of the many possible correlations we could have obtained, from one of many possible samples. It’s possible that the true population correlation, ρ, has another value … say 0, or ρ₀. So there is some variability in our estimate r, measured by its standard error:

$$
\widehat{se}(r) = \sqrt{\frac{1 - r^2}{n - 2}}
$$
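The standard error formula translates directly into a few lines of Python. `se_r` is a hypothetical helper name, and the values of r and n in the usage lines are made up.

```python
import math


def se_r(r, n):
    """Estimated standard error of a sample correlation r based on n pairs."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical values: the same r = 0.5 from a smaller and a larger sample
print(round(se_r(0.5, 27), 4))   # 0.1732
print(round(se_r(0.5, 102), 4))  # 0.0866
```

For the same r, a larger sample gives a smaller standard error, so our estimate is less variable.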

Hypotheses for Associations between Continuous Variables

H0: there is no linear relationship between X and Y

Ha: there is a linear relationship between X and Y

This is the same as:

H0: ρ = 0

Ha: ρ ≠ 0

And as in our other hypothesis tests, we will use a statistic (r) to estimate a parameter (ρ).

Testing for Correlation = 0

Recall that in our hypothesis tests of µ = 0, we used a t-test:

$$
t = \frac{\bar{x} - 0}{se(\bar{x})} = \frac{\bar{x}}{s/\sqrt{n}}
$$

If both X and Y are normally distributed, the test for H0: ρ = 0 is very similar:

$$
t = \frac{r - 0}{se(r)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}
$$

and we look up our t value in the t table with n - 2 degrees of freedom to find the p-value!
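A minimal Python sketch of the t statistic for H0: ρ = 0, under the slide’s assumption that X and Y are normally distributed. `correlation_t_test` is a hypothetical helper; it returns the t value and its n - 2 degrees of freedom for lookup in a t table.

```python
import math


def correlation_t_test(r, n):
    """t statistic and degrees of freedom for H0: rho = 0 (assumes X and Y are normal)."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    se = math.sqrt((1 - r ** 2) / (n - 2))  # estimated standard error of r
    t = (r - 0) / se
    return t, n - 2


# Hypothetical values: r = 0.5 from n = 27 pairs
t, df = correlation_t_test(0.5, 27)
print(round(t, 3), df)  # 2.887 25
```

For this hypothetical example, the two-sided 5% critical value from a t table with 25 df is about 2.06, so t ≈ 2.887 would lead us to reject H0. If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` returns the two-sided p-value directly.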

New Topics Covered Today

Pearson’s Correlation
• Most commonly calculated correlation statistic
• No need to define which variable is the response and which is the predictor
• Always between -1 and 1

Hypothesis testing for Correlation
• Does a correlation exist? Rejecting H0: ρ = 0 means concluding that the population correlation is non-zero

Reading: Chapter 10 up to page 360