correlation chapter 6. what is a correlation? it is a way of measuring the extent to which two...
![Page 1: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/1.jpg)
Correlation
Chapter 6
![Page 2: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/2.jpg)
What is a Correlation?
• It is a way of measuring the extent to which two variables are related.
• It measures the pattern of responses across variables.
![Page 3: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/3.jpg)
No relationship
![Page 4: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/4.jpg)
![Page 5: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/5.jpg)
![Page 6: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/6.jpg)
Scatterplots
• Plotting reminder
– ggplot(data, aes(X, Y, color/fill = categorical))
– + theme coding
– + geom_point() to get the dots
– + geom_smooth() to get a line
– + xlab/ylab
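The plotting reminder can be sketched as one runnable chain. The data frame `scores` and its columns (`revise`, `exam`, `group`) are made-up stand-ins, and `theme_bw()` is just one possible choice for the "theme coding" step.

```r
library(ggplot2)

# Hypothetical data: 60 students, two numeric scores and one grouping factor
set.seed(42)
scores <- data.frame(
  revise = runif(60, min = 0, max = 40),
  group  = factor(rep(c("A", "B"), each = 30))
)
scores$exam <- 40 + 1.2 * scores$revise + rnorm(60, sd = 10)

p <- ggplot(scores, aes(revise, exam, color = group)) +
  theme_bw() +                              # theme coding (one possible theme)
  geom_point() +                            # the dots
  geom_smooth(method = "lm", se = FALSE) +  # a straight line per group
  xlab("Time spent revising (hours)") +
  ylab("Exam performance (%)")

print(p)
```

Using `method = "lm"` asks for a straight line, which matches the linear relationship that a Pearson correlation describes.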
![Page 7: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/7.jpg)
Scatterplots
• Check X and Y for proper labels.
• Also make sure the X and Y axes are appropriate lengths for your scatterplot.
– Although this rule is less strictly enforced than the bar graph rule.
![Page 8: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/8.jpg)
Fix the X/Y Axes
• coord_cartesian(xlim = NULL, ylim = NULL)
• coord_cartesian(xlim = c(0,100), ylim = c(0,100))
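A short sketch of fixing both axes to a 0 to 100 range, assuming two percentage-style variables. The data frame `d` and its columns are hypothetical.

```r
library(ggplot2)

# Hypothetical percentage scores, so both axes should run from 0 to 100
set.seed(1)
d <- data.frame(anxiety = runif(50, min = 40, max = 95))
d$exam <- pmin(100, pmax(0, 110 - 0.8 * d$anxiety + rnorm(50, sd = 12)))

base_plot <- ggplot(d, aes(anxiety, exam)) +
  geom_point()

# coord_cartesian() changes the visible range without dropping data points
fixed_plot <- base_plot +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100))

print(fixed_plot)
```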
![Page 9: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/9.jpg)
Modeling Relationships
• First, look at some scatterplots of the variables that have been measured.
• $\text{Outcome}_i = (\text{model}) + \text{error}_i$
• $\text{Outcome}_i = (bX_i) + \text{error}_i$
– b = β (beta) = r when you have one predictor and the variables are standardized
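To illustrate the link between b, beta, and r with one predictor, here is a minimal sketch on simulated data (the variables `x` and `y` are made up). The standardized slope equals r, while the raw slope is r rescaled by the two standard deviations.

```r
set.seed(123)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100)

# Standardize both variables (mean 0, SD 1)
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))

fit_raw <- lm(y ~ x)    # slope b is in raw units
fit_std <- lm(zy ~ zx)  # slope is the standardized beta

b    <- unname(coef(fit_raw)["x"])
beta <- unname(coef(fit_std)["zx"])
r    <- cor(x, y)

c(b = b, beta = beta, r = r)

# The raw slope is r rescaled by the ratio of standard deviations
r * sd(y) / sd(x)
```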
![Page 10: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/10.jpg)
![Page 11: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/11.jpg)
Modeling Relationships
• $\text{Outcome}_i = (\text{model}) + \text{error}_i$
– Previously, this was Mean + SE
– And we used SE to determine if the model “fit” well.
![Page 12: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/12.jpg)
Modeling Relationships
• $\text{Outcome}_i = (bX_i) + \text{error}_i$
• Now we are using b or r or β (beta) to determine the strength of the model
– Traditionally you don’t see the error values reported (sometimes you see CI)
![Page 13: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/13.jpg)
Measuring Relationships
• We need to see whether as one variable increases, the other increases, decreases or stays the same.
• This can be done by calculating the covariance.
– We look at how much each score deviates from the mean.
– If both variables deviate from their means in the same way, they are likely to be related.
![Page 14: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/14.jpg)
Revision of Variance
• The variance tells us by how much scores deviate from the mean for a single variable.
• It is closely linked to the sum of squares.
• Covariance is similar: it tells us by how much scores on two variables differ from their respective means.
![Page 15: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/15.jpg)
$$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1}$$
![Page 16: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/16.jpg)
$$\text{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
![Page 17: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/17.jpg)
![Page 18: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/18.jpg)
![Page 19: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/19.jpg)
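$$
\begin{aligned}
\text{cov}(x, y) &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} \\
&= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} \\
&= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} \\
&= \frac{17}{4} = 4.25
\end{aligned}
$$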
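The covariance calculation can be checked by hand in R. The raw scores below are an assumption: they are not given in the slides, and were chosen so that five pairs of scores reproduce the worked result of 4.25.

```r
# Assumed raw scores (not given in the slides) that reproduce the worked result
x <- c(5, 4, 4, 6, 8)      # mean 5.4
y <- c(8, 9, 10, 13, 15)   # mean 11

dev_x <- x - mean(x)       # deviations from the mean of x
dev_y <- y - mean(y)       # deviations from the mean of y
cross_products <- dev_x * dev_y

cov_by_hand <- sum(cross_products) / (length(x) - 1)
cov_by_hand   # 4.25
cov(x, y)     # built-in function, same value

# Variance is the covariance of a variable with itself
cov(x, x)
var(x)
```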
![Page 20: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/20.jpg)
![Page 21: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/21.jpg)
Problems with Covariance
• It depends upon the units of measurement.
• One solution: standardize it!
– Divide by the standard deviations of both variables.
• The standardized version of covariance is known as the correlation coefficient.
– It is relatively unaffected by units of measurement.
![Page 22: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/22.jpg)
The Correlation Coefficient
$$r = \frac{\text{Cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$
![Page 23: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/23.jpg)
The Correlation Coefficient
$$r = \frac{\text{Cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$$
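The same standardization can be sketched in R. As before, the raw scores are assumed values (not from the slides) chosen to match a covariance of 4.25 and standard deviations of about 1.67 and 2.92.

```r
# Assumed raw scores (not given in the slides)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

# Standardize the covariance by dividing by both standard deviations
r_by_hand <- cov(x, y) / (sd(x) * sd(y))
round(r_by_hand, 2)   # 0.87

cor(x, y)             # built-in function, same value

# Changing the units of measurement leaves r unchanged
cor(x * 100, y)
```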
![Page 24: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/24.jpg)
Correlation Coefficient
• Basically, r values have the model + error built into one number
– Instead of M + SE
– So we can just look at r to determine “model fit” or strength of relationship.
![Page 25: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/25.jpg)
Things to know about the Correlation
• It varies between -1 and +1
– 0 = no relationship
• It is an effect size
– ±.1 = small effect
– ±.3 = medium effect
– ±.5 = large effect
• Coefficient of determination, $r^2$
– By squaring the value of r you get the proportion of variance in one variable shared by the other.
![Page 26: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/26.jpg)
R versus r
• R = correlation coefficient for 3+ variables
• r = correlation coefficient for 2 variables
• $r^2$ = coefficient of determination for 2 variables
• $R^2$ = coefficient of determination for 3+ variables
– In reality, we use $R^2$ for anything effect size related, even if it’s only 2
![Page 27: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/27.jpg)
Correlation: Example
• Anxiety and Exam Performance
• Participants:
– 103 students
• Measures
– Time spent revising (hours)
– Exam performance (%)
– Exam Anxiety (the EAQ, score out of 100)
– Gender
![Page 28: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/28.jpg)
Assumptions
• Accuracy
• Missing (exclude pairwise if you don’t fill in)
• Outliers
• Normality
• Linearity**
• Homogeneity
• Homoscedasticity
![Page 29: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/29.jpg)
How to Calculate
• You’ve already been doing these!
• cor(x, y)
• But what about p values? We’ve only been looking at the magnitude of the correlation for assumption checks.
![Page 30: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/30.jpg)
How to Calculate
• cor() function will calculate:
– Pearson, Spearman, Kendall, multiple correlations at once
• rcorr() function will calculate:
– Pearson, Spearman, p values, multiple correlations at once
• cor.test() function will calculate:
– Pearson, Spearman, Kendall, p values, CI (the CI is for Pearson only)
![Page 31: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/31.jpg)
cor output
• cor(examdata, use = "pairwise.complete.obs", method = "pearson")
– Remember these all have to be numeric, no factor variables.
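A minimal sketch of the cor() call on a data frame that contains a factor and some missing values. The `examdata` built here is a simulated stand-in with hypothetical column names, not the course data set.

```r
# Simulated stand-in for examdata: three numeric columns plus one factor
set.seed(7)
examdata <- data.frame(
  revise  = rnorm(30, mean = 20, sd = 8),
  anxiety = rnorm(30, mean = 70, sd = 15)
)
examdata$exam   <- 50 + 0.8 * examdata$revise - 0.3 * examdata$anxiety +
  rnorm(30, sd = 10)
examdata$gender <- factor(sample(c("female", "male"), 30, replace = TRUE))
examdata$exam[c(3, 11)] <- NA   # a couple of missing exam scores

# Keep only the numeric columns; cor() stops with an error on factor columns
numeric_cols <- examdata[, sapply(examdata, is.numeric)]

cor_matrix <- cor(numeric_cols,
                  use = "pairwise.complete.obs",
                  method = "pearson")
round(cor_matrix, 2)
```

With `use = "pairwise.complete.obs"`, each correlation uses every row that is complete for that particular pair, so different cells of the matrix can be based on different numbers of cases.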
![Page 32: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/32.jpg)
rcorr output
• rcorr(as.matrix(examdata), type = "pearson")
– Automatically does pairwise deletion.
– All things must be numeric, as well as in matrix format
– Load the Hmisc library for this function
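A short sketch of rcorr() on simulated data, assuming the Hmisc package is installed. The `examdata` here is a made-up stand-in with hypothetical column names.

```r
library(Hmisc)

# Simulated stand-in for examdata (numeric columns only)
set.seed(7)
examdata <- data.frame(
  revise  = rnorm(30, mean = 20, sd = 8),
  anxiety = rnorm(30, mean = 70, sd = 15)
)
examdata$exam <- 50 + 0.8 * examdata$revise - 0.3 * examdata$anxiety +
  rnorm(30, sd = 10)
examdata$exam[c(3, 11)] <- NA   # two missing exam scores

result <- rcorr(as.matrix(examdata), type = "pearson")

round(result$r, 2)   # correlation matrix
result$n             # number of cases used for each pair
round(result$P, 3)   # p values (the diagonal is NA)
```

The `n` matrix is worth checking: pairs involving `exam` use 28 cases here, while the `revise` and `anxiety` pair uses all 30.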
![Page 33: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/33.jpg)
rcorr output
![Page 34: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/34.jpg)
cor.test output
• cor.test(x, y, method = "pearson")
– Must use single vectors/columns for x, y
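A minimal cor.test() sketch on two simulated vectors (`revise` and `exam` are hypothetical), showing where the estimate, p value, and confidence interval live in the result.

```r
set.seed(7)
revise <- rnorm(30, mean = 20, sd = 8)
exam   <- 50 + 0.8 * revise + rnorm(30, sd = 10)

# One pair of vectors at a time
test <- cor.test(revise, exam, method = "pearson")

test$estimate   # r
test$p.value    # two-sided p value
test$conf.int   # 95% confidence interval for r
```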
![Page 35: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/35.jpg)
Correlation Interpretation
• The third-variable problem:
– In any correlation, causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results.
• Direction of causality:
– Correlation coefficients say nothing about which variable causes the other to change.
![Page 36: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/36.jpg)
Nonparametric Correlation
• Spearman’s Rho
– Pearson’s correlation on the ranked data
• Kendall’s Tau
– Better than Spearman’s for small samples
– When lots of things have the same rank
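The "Pearson on the ranked data" idea can be verified directly. This sketch uses simulated scores (`creativity` and `position` are hypothetical stand-ins) and also shows the Kendall call through cor.test().

```r
# Simulated data: a creativity score and a finishing position (1 = best)
set.seed(99)
creativity <- round(runif(20, min = 20, max = 60))
position   <- rank(-(creativity + rnorm(20, sd = 8)))

# Spearman's rho is Pearson's r computed on the ranks
rho_builtin <- cor(creativity, position, method = "spearman")
rho_ranks   <- cor(rank(creativity), rank(position), method = "pearson")
c(rho_builtin, rho_ranks)

# cor.test() adds p values; exact = FALSE avoids warnings when ranks are tied
spearman_test <- cor.test(creativity, position, method = "spearman", exact = FALSE)
kendall_test  <- cor.test(creativity, position, method = "kendall", exact = FALSE)

spearman_test$estimate
kendall_test$estimate
```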
![Page 37: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/37.jpg)
Example
• World’s best Liar Competition
– 68 contestants
– Measures
  • Where they were placed in the competition (first, second, third, etc.)
  • Creativity questionnaire (maximum score 60)
C6 liar.csv
![Page 38: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/38.jpg)
Spearman/Kendall
• You can calculate:
– Spearman and Kendall with cor() but no p values.
– Spearman with rcorr() but no Kendall
– All the things with cor.test() but one pair at a time.
![Page 39: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/39.jpg)
Spearman
![Page 40: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/40.jpg)
Kendall
![Page 41: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/41.jpg)
A note
• All values must be numeric to be able to do any of these correlations
– Therefore, if you have any variables that import with labels, you will have to defactor them.
– as.numeric()
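One thing to watch when defactoring: as.numeric() on a factor returns the internal level codes. That is fine for labelled categories, but if a numeric column was imported as a factor, converting through as.character() first recovers the original numbers. A small sketch with made-up values:

```r
# A numeric column that imported as a factor (hypothetical values)
score <- factor(c("10", "25", "7", "25"))

as.numeric(score)                 # 1 2 3 2  (level codes, not the scores)
as.numeric(as.character(score))   # 10 25 7 25

# A labelled category: the level codes are what you want here
gender <- factor(c("female", "male", "male", "female"))
as.numeric(gender)                # 1 2 2 1
```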
![Page 42: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/42.jpg)
Point vs not point
• Point/Biserial correlation = correlation with a dichotomous variable
• Which is which?
– Point biserial = true dichotomy, no underlying continuum (e.g., gender)
– Biserial = not quite discrete (e.g., pass/fail)
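A point-biserial correlation is simply Pearson's r with the dichotomy coded as two numbers. The sketch below uses simulated data (`group` and `score` are hypothetical) and shows that its p value matches the equal-variance t test on the same two groups.

```r
set.seed(5)
group <- factor(rep(c("control", "treatment"), each = 25))
score <- rnorm(50, mean = ifelse(group == "treatment", 55, 50), sd = 10)

# Code the dichotomy as 0/1, then use the ordinary Pearson correlation
group01 <- as.numeric(group == "treatment")
r_pb <- cor(score, group01)
r_pb

# Same p value as an independent t test with pooled variance
p_cor <- cor.test(score, group01)$p.value
p_t   <- t.test(score ~ group, var.equal = TRUE)$p.value
c(p_cor, p_t)
```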
![Page 43: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/43.jpg)
Point vs not point
![Page 44: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/44.jpg)
Comparing Correlations
• A question I am asked a lot – how can I tell if these two correlation coefficients are significantly different?
• Install package cocor to be able to compare them!
![Page 45: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/45.jpg)
Comparing Correlations
• First, you have to decide if the correlations are independent or dependent
• Independent correlations: the correlations come from separate groups of people, but are on the same variables
• Dependent correlations: the correlations are from the same people, but are different variables (overlapping or not)
![Page 46: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/46.jpg)
Independent Correlations
• There’s no split/subset function within cocor.
• Therefore, you have to split up the dataset on your independent variable.
– (and then put it back together in list format).
• Subset the data, then create a list.
![Page 47: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/47.jpg)
Independent Correlations
• cocor(~ X + Y | X + Y, data = data)
• Fill in X and Y with the variables you want to correlate
– (for independent correlations, both sides are likely to be the same).
• Data = new list we just created.
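The subset-then-list steps might look like the sketch below. It assumes the cocor package is installed; the data and column names (`gender`, `anxiety`, `exam`) are simulated stand-ins, and the formula follows the pattern given in the slides.

```r
library(cocor)

# Simulated data with a grouping variable (hypothetical columns)
set.seed(11)
n <- 50
examdata <- data.frame(
  gender  = rep(c("female", "male"), each = n),
  anxiety = rnorm(2 * n, mean = 70, sd = 15)
)
examdata$exam <- 80 - 0.4 * examdata$anxiety + rnorm(2 * n, sd = 10)

# Subset on the independent variable, keeping only the numeric columns
females <- subset(examdata, gender == "female", select = c(exam, anxiety))
males   <- subset(examdata, gender == "male",   select = c(exam, anxiety))

# Put the two groups back together in list format
groups <- list(females = females, males = males)

# Same pair of variables on both sides, one correlation per group
result <- cocor(~ exam + anxiety | exam + anxiety, data = groups)
result
```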
![Page 48: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/48.jpg)
Independent Correlations
![Page 49: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/49.jpg)
Dependent Correlations
• cocor(~ X + Y | Y + Z, data = data)
• Fill in X, Y, Z with the variables you want to correlate
– Overlapping correlation
• You can also have X + Y | Q + Z
– Non-overlapping correlation
• Data is the dataframe with all of the columns
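A sketch of the overlapping case, again assuming the cocor package is installed. The data frame `d` and its columns are simulated; `exam` plays the role of the shared variable Y in the X + Y | Y + Z pattern.

```r
library(cocor)

# Simulated data: everyone is measured on all three variables
set.seed(12)
d <- data.frame(anxiety = rnorm(100, mean = 70, sd = 15))
d$revise <- 30 - 0.2 * d$anxiety + rnorm(100, sd = 5)
d$exam   <- 60 - 0.3 * d$anxiety + 0.5 * d$revise + rnorm(100, sd = 10)

# Overlapping: both correlations involve exam, in the same people
overlap <- cocor(~ anxiety + exam | exam + revise, data = d)
overlap
```

Here the data argument is a single data frame rather than a list, because both correlations come from the same people.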
![Page 50: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/50.jpg)
Dependent Correlations
• So much output!
![Page 51: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/51.jpg)
Partial and Semi-Partial Correlations
• Partial correlation:
– Measures the relationship between two variables, controlling for the effect that a third variable has on them both.
• Semi-partial correlation:
– Measures the relationship between two variables controlling for the effect that a third variable has on only one of the others.
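Both definitions can be made concrete with residuals in base R: regress out the third variable, then correlate what is left over. The variables here are simulated and hypothetical; this is an illustration of the idea, not the ppcor implementation.

```r
set.seed(21)
n <- 200
revise  <- rnorm(n)
anxiety <- -0.6 * revise + rnorm(n)
exam    <- 0.5 * revise - 0.3 * anxiety + rnorm(n)

# Residuals = the part of each variable not explained by revise
exam_res    <- resid(lm(exam ~ revise))
anxiety_res <- resid(lm(anxiety ~ revise))

# Partial: revise removed from BOTH exam and anxiety
partial <- cor(exam_res, anxiety_res)

# Semi-partial: revise removed from anxiety ONLY
semipartial <- cor(exam, anxiety_res)

# Check against the closed-form formulas built from the three correlations
r_xy <- cor(exam, anxiety)
r_xz <- cor(exam, revise)
r_yz <- cor(anxiety, revise)
partial_formula     <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
semipartial_formula <- (r_xy - r_xz * r_yz) / sqrt(1 - r_yz^2)

c(partial = partial, semipartial = semipartial)
```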
![Page 52: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/52.jpg)
![Page 53: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/53.jpg)
![Page 54: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/54.jpg)
Partial Correlations
• Install ppcor package!
• pcor(dataset, method = "pearson")
![Page 55: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/55.jpg)
![Page 56: Correlation Chapter 6. What is a Correlation? It is a way of measuring the extent to which two variables are related. It measures the pattern of responses](https://reader036.vdocuments.site/reader036/viewer/2022062519/5697c0111a28abf838ccb7bf/html5/thumbnails/56.jpg)
Semipartial Correlations
• spcor(data, method = "pearson")
• Notice how the top and bottom halves of the output are not equal.
• Calculated as: correlation between 1 and 2, given that 2 and 3 are correlated.
– The first column = 1
– The rest are 2/3
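A sketch of both ppcor calls on simulated data, assuming the ppcor package is installed. The data frame `d` and its columns are hypothetical. Comparing each estimate matrix with its transpose shows the point about the two halves of the output.

```r
library(ppcor)

# Simulated data: three numeric columns, no missing values
set.seed(21)
n <- 200
d <- data.frame(revise = rnorm(n))
d$anxiety <- -0.6 * d$revise + rnorm(n)
d$exam    <- 0.5 * d$revise - 0.3 * d$anxiety + rnorm(n)

pc <- pcor(d, method = "pearson")    # partial correlations
sp <- spcor(d, method = "pearson")   # semi-partial correlations

round(pc$estimate, 3)   # symmetric: top and bottom halves match
round(sp$estimate, 3)   # not symmetric: the halves differ

# TRUE for the partial matrix, FALSE for the semi-partial matrix
isTRUE(all.equal(pc$estimate, t(pc$estimate)))
isTRUE(all.equal(sp$estimate, t(sp$estimate)))
```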