Correlation: A statistic to describe the relationship between variables
[Figure: three scatterplots relating Hours Worked and Pay]
Univariate vs. Bivariate Statistics
Univariate analyses/graphical representations:
• Frequency histograms
• Measures of central tendency and variability
• Z-scores
Bivariate analyses/graphical representations:
• Scatterplots
• Correlation: a linear pattern of relationship between one variable (x) and another variable (y); an association between two variables. The relative position of one variable correlates with the relative distribution of another variable.
How can we define correlation?
Correlations allow us to look for evidence of a relationship between variables.
Correlations can vary in strength
[Figure: three scatterplots with CIGARETT on the horizontal axis and DOCVISIT, DOCVIS2, and VAR00001 on the vertical axes, showing relationships of different strengths]
Correlations can vary in direction
So, how do we QUANTIFY a correlation?
We need to come up with a NUMBER that reflects both the strength and direction of the correlation.
[Figure: scatterplot with axis label "flu shots given"]
Correlation finds the strength and direction of the best-fitting line to the data.
$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$
The number we calculate in Statistics is called the correlation coefficient. Developed by Karl Pearson, it is also sometimes referred to as Pearson’s r.
Example calculation: The following data represent the number of emergency room visits per year (x) and the number of cigarettes smoked per day (y) for three individuals recruited from New York Methodist Hospital.
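        x     y     x²    y²    xy
        2     4     4     16    8
        3     5     9     25    15
        7     6     49    36    42
Σ       12    15    62    77    65

$$ r = \frac{65 - \frac{(12)(15)}{3}}{\sqrt{\left[62 - \frac{(12)^2}{3}\right]\left[77 - \frac{(15)^2}{3}\right]}} = \frac{5}{\sqrt{(14)(2)}} = 0.94 $$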
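The hand calculation above can be checked with a short script. This is a minimal sketch in Python using only the standard library. The function name pearson_r and the variable names are illustrative choices, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (raw-sums) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: sum of XY minus (sum of X)(sum of Y) / n
    numerator = sum_xy - (sum_x * sum_y) / n
    # Denominator: square root of the product of the two bracketed terms
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

er_visits = [2, 3, 7]    # x: emergency room visits per year
cigarettes = [4, 5, 6]   # y: cigarettes smoked per day

r = pearson_r(er_visits, cigarettes)
print(round(r, 2))  # 0.94
```

The intermediate values match the slide: the numerator is 65 - 60 = 5 and the denominator is the square root of (14)(2).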
Another way to think of the correlation
• The product of the Z-scores for each pair of scores
r = (Σ ZxZy) / (n − 1)

        x     y     Zx     Zy    ZxZy
        2     4     -.76   -1    .76
        3     5     -.38    0    0
        7     6     1.13    1    1.13
Σ                                1.89

Mean of x = 4, standard deviation of x = 2.65
Mean of y = 5, standard deviation of y = 1
If x = 2, Zx = (2 − 4)/2.65 = -.76 … If y = 4, Zy = (4 − 5)/1 = -1
r = 1.89/2 = .945 ≈ .95
The small difference from the 0.94 obtained with the first formula comes from rounding the Z-scores to two decimal places.
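The Z-score version of the calculation can be sketched the same way. The helper names z_scores and pearson_r_from_z are hypothetical, and the sketch assumes the sample standard deviation (dividing by n − 1), which is what reproduces the 2.65 and 1 used above.

```python
import math

def z_scores(values):
    """Convert raw scores to Z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """r = sum of the Z-score products divided by (n - 1)."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

x = [2, 3, 7]
y = [4, 5, 6]

print([round(z, 2) for z in z_scores(x)])  # [-0.76, -0.38, 1.13]
print(round(pearson_r_from_z(x, y), 3))    # 0.945
```

Because the script keeps full precision in the Z-scores, it gives the same value as the raw-sums formula (0.9449...), which rounds to 0.94.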
Interpreting the Pearson r
Interpreting the value of r
* Range of values: -1.0 to +1.0
* Direction from the sign
  negative => anticorrelated: as one variable goes up, the other goes down in value.
  positive => correlated: as one variable goes up, so does the other.
* Strength from the magnitude
  | r | = 1.0 perfect relationship
  | r | = 0.0 no evidence of relationship
  0.0 < | r | < 1.0 intermediate strength relationship
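These interpretation rules can be written as a small helper. This is only a sketch: describe_r is a hypothetical name, and the wording of the returned labels is an assumption based on the categories listed above.

```python
def describe_r(r):
    """Describe direction (from the sign) and strength (from the magnitude) of r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    if r == 0:
        return "no evidence of a linear relationship"
    direction = "positive" if r > 0 else "negative"
    strength = "perfect" if abs(r) == 1.0 else "intermediate-strength"
    return f"{strength} {direction} relationship"

print(describe_r(0.94))  # intermediate-strength positive relationship
print(describe_r(-1.0))  # perfect negative relationship
```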
When NOT to use a correlation:
• Extreme scores
  r = .97
• Non-linear relationships
  r = .20
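Both problems can be illustrated with small made-up data sets. The numbers below are invented for illustration only and are not the data behind the slide's plots. The sketch assumes the raw-sums formula for r given earlier in the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Extreme scores: five points with no linear relationship...
x = [1, 2, 3, 4, 5]
y = [3, 1, 3, 1, 3]
r_without_outlier = pearson_r(x, y)
print(r_without_outlier)  # 0.0

# ...and the same points plus one extreme score at (20, 20).
r_with_outlier = pearson_r(x + [20], y + [20])
print(round(r_with_outlier, 2))  # 0.97

# Non-linear relationship: y is completely determined by x (y = x squared),
# yet the linear correlation is zero.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)
print(r_curve)  # 0.0
```

A single extreme score is enough to move r from 0 to about .97 here, and a perfect curved relationship produces r = 0, which is why a scatterplot should be inspected before r is interpreted.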
Some Issues with Correlation
• NO CAUSATION!
• Spurious correlation
Number of people who drowned in a swimming pool & number of Nicolas Cage films in a given year: r = .67
Per capita consumption of cheese & number of deaths by becoming tangled in bed sheets: r = .95
Divorce rate in Maine & consumption of margarine in the United States: r = .99
Preview of Next Lecture:
[Figure: scatterplot titled "Textile Workers", with Weight (lbs) on the horizontal axis and Height (inches) on the vertical axis]
Regression: finding the best-fitting line to a data set.