A full analysis example Multiple correlations Partial correlations

Post on 24-Jan-2020

Page 1: A full analysis example Multiple correlations Partial ...jackd/Stat203_2011/Wk09_1.pdf · Graphs Legacy Dialog Scatter/Dot No obvious signs of non-linear trends, but there doesn’t

A full analysis example

Multiple correlations

Partial correlations

New Dataset: Confidence

This dataset records the confidence scores of 41 employees, collected some years ago, on 4 facets of confidence (Physical, Appearance, Emotional, and Problem Solving), as well as their gender and their citizenship status.

Example problem 1: Analyze the correlation between physical confidence and appearance confidence.

The first question we should ask is “Is Pearson correlation appropriate?”

Four requirements for correlation:

1. A straight-line relationship.

2. Interval data.

3. Random sampling (we will need to assume this).

4. Normally distributed characteristics.

Check for normality in each of the histograms.

(Graphs → Legacy Dialogs → Histogram)

The appearance variable is close enough to normal, although it has more values at the upper and lower ends than it should.

The physical variable has a negative skew, so that could be a problem.

There are at least two values that are far below the mean for physical confidence. We should investigate them further.

Graphs → Legacy Dialogs → Boxplot

Use summaries of separate variables, and

Options → Exclude Variable-by-Variable

Boxplots identify outliers. From the boxplot we find that cases 31 and 37 are the outliers in physical confidence.

Looking at the data directly, we find that neither of these cases even has a value for appearance.

The two outliers in ‘physical’ have no measured value for ‘appearance’.

That means they will have no effect on a correlation between ‘physical’ and ‘appearance’. Correlation can only consider cases where there are values for both variables (a point needs both an X and a Y to exist).
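The idea above can be sketched in a few lines of Python. This is only an illustration and not what SPSS does internally: the function name `pearson_complete_cases` and the scores are made up for the example, and missing values are represented as `None`.

```python
import math

def pearson_complete_cases(xs, ys):
    """Pearson r using only cases where both values are present (None = missing)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical scores, NOT the course data.
# The two very low physical scores have no appearance value.
physical = [7, 8, 6, 9, 1, 8, 2]
appearance = [5, 7, 6, 8, None, 6, None]

r_all, n_used = pearson_complete_cases(physical, appearance)

# Deleting those two cases by hand changes nothing.
r_trimmed, n_trimmed = pearson_complete_cases([7, 8, 6, 9, 8], [5, 7, 6, 8, 6])

print(n_used, n_trimmed)      # both count only the complete cases
print(r_all == r_trimmed)     # the outliers never entered the calculation
```

The low physical scores are dropped before any sums are taken, which is why they cannot pull the correlation in either direction.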

Next, we look at the scatterplot.

Graphs → Legacy Dialogs → Scatter/Dot

No obvious signs of non-linear trends, but there doesn’t seem to be any strong trend at all.

Correlation is an appropriate measure, but it won’t be strong.

We run the correlation to find its value and see if it’s significant at alpha = 0.05.

Analyze → Correlate → Bivariate

Sig. (2-tailed) is .039, so the correlation is significant at alpha = .05. (Had we chosen the .01 level, this would not be the case.)

We could also run a t-test by hand to verify the significance

level we found. (r= .373, n=31)

t* = 2.045 at 0.05 level, 29 df

t* = 2.756 at 0.01 level, 29 df
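A minimal sketch of that hand calculation, assuming the usual test statistic for a Pearson correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The helper name `t_from_r` is just for illustration; the critical values are the ones quoted above.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_from_r(0.373, 31)          # r and n from the SPSS output

t_crit_05 = 2.045                # two-tailed critical value, 29 df, alpha = 0.05
t_crit_01 = 2.756                # two-tailed critical value, 29 df, alpha = 0.01

significant_05 = abs(t) > t_crit_05
significant_01 = abs(t) > t_crit_01

print(round(t, 3), significant_05, significant_01)
```

The statistic comes out a little above 2.16, which sits between the two critical values. That agrees with the SPSS result: significant at the .05 level but not at the .01 level.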

Let’s not sully this moment with a bad pun or something.

The correlation matrix is a table that shows the correlation between two variables.

              Physical   Appearance
Physical        1.000       .373
Appearance       .373      1.000

In this case, Physical is correlated with Appearance with r = .373.

Likewise, Appearance is correlated with Physical with r = .373.

Also, everything correlates with itself with r = 1.000.

SPSS takes it a little farther by making a matrix of correlation coefficients, significance levels, and sample sizes.

The two confidences are significantly correlated, and there are 31 cases with values for both (not 41, because real data has blanks).

However, if we go to the correlations menu and select more

than two variables of interest:

We get a 4x4 correlation matrix instead!

What’s better than two variables? FOUR VARIABLES!

Cutting away all the sample size and significance stuff, I find:

                  Phys.   Appear.   Emot.    Pr.Solve.
Physical          1       .373*     .430**   .730**
Appearance                1         .483**   .527**
Emotional                           1        .540**
Problem Solving                              1

There is a positive correlation between every pair of facets. That means that as any one facet of confidence increases, the others tend to increase as well.

* significant at 0.05 level

** significant at 0.01 level

                  Phys.   Appear.   Emot.    Pr.Solve.
Physical          1       .373*     .430**   .730**
Appearance                1         .483**   .527**
Emotional                           1        .540**
Problem Solving                              1

Multiple correlation is useful as a first-look search for

connections between variables, and to see broad trends

between data.

If there were only a few variables connected to each other, it

would help us identify which ones without having to look at all

6 pairs individually.

Pitfalls of multiple correlations:

1. Multiple testing. With 4 variables, there are 6 correlations being tested for significance. At alpha = 0.05, there’s a 26.5% chance that at least one correlation is going to show as significant even if there are no correlations at all.

At 5 variables, there are 10 tests and a 40.1% chance of falsely rejecting at least one null. (Assuming no correlations)

At 6 variables, there are 15 tests and a 53.7% chance of falsely rejecting at least one null.
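These percentages can be reproduced with a short calculation. The sketch below assumes the tests behave as if they were independent, so the chance of at least one false rejection is 1 − (1 − alpha)^k for k tests. Correlations that share variables are not truly independent, so treat this as an approximation. The function name `familywise_error` is just for illustration.

```python
from math import comb

def familywise_error(n_variables, alpha=0.05):
    """Number of pairwise tests and the chance of at least one false rejection,
    assuming independent tests and no true correlations."""
    n_tests = comb(n_variables, 2)            # every pair of variables is one test
    chance = 1 - (1 - alpha) ** n_tests
    return n_tests, chance

for k in (4, 5, 6):
    n_tests, chance = familywise_error(k)
    print(k, "variables:", n_tests, "tests,", round(100 * chance, 1), "% chance")
```

The number of tests grows much faster than the number of variables, which is why the problem gets out of hand quickly.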

You don’t need to know how to handle multiple testing

problems in this class. However, be cautious when dealing

with many variables.

Be suspicious of correlations that are significant, but just

barely.

Example: The weakest correlation here is physical with

appearance, a correlation of .373. That correlation being

significant could be a fluke.

2. Diagnostics don’t get easier.

Doing correlations as a matrix allows you to do the math of a

correlation much faster than checking them one at a time.

However, the diagnostic tests like histograms, scatterplots, and

residual plots don’t get any faster.

Any correlation we’re interested in (even if it’s not showing as

significant) still needs checks for normality and linearity before

use in research.

One big advantage of correlating with multiple variables is that

we can isolate the connections between different variables

where they might not be obvious otherwise.

                  Phys.   Appear.   Emot.    Pr.Solve.
Physical          1       .373*     .430**   .730**
Appearance                1         .483**   .527**
Emotional                           1        .540**
Problem Solving                              1

Example: Is there really a correlation between appearance

confidence and problem solving confidence SPECIFICALLY, or

are they both attached to the same general confidence?

Ponder that over a Mandarin Duck.

To isolate a correlation between two variables from a third

variable, we want to only look at the part of that correlation

that’s really between those two and not the third.

We want the partial correlation.

Example: Ice cream sales increase when murder rates increase. These two variables have nothing logical to do with each other; however, they both increase when it’s hot out.

This is the ______________ between these two variables.

We want the relationship between murder and ice cream WITHOUT the ______________ of heat.

In the dataset “murderice.csv”, we can run a partial correlation and find out.

First, a simple correlation reveals very significant correlations between everything.

But how much of that connection is truly between murder and

ice cream? ____________________________

From here, put the two variables of interest in the ‘variables’ slot (you can put more than two if you wish).

Put the confounding variable in the ‘control for’ slot.

The partial correlation between ice cream and murder is much weaker than the simple correlation. It appears that heat (or something common to all three) was a major factor in both.

In fact, the correlation is no longer significant (we fail to reject the null that there is no correlation).

Also note: SPSS tells us in the output table that heat is a

control variable, so we know from the output that this is a

partial correlation (hint, hint).

We’re using three degrees of freedom, one for each variable

involved, so the df is 57 even when n is 60 (for interest)

Key observation:

The partial correlation will be less than the simple correlation if both variables of interest are correlated to the confounding variable in the same direction.

Here, both murder and ice cream are correlated to heat positively, so the partial correlation removes that common positive relationship from murder and ice cream.

Removing a positive relationship makes the correlation less positive.

Likewise, if the correlations to the confounding variable are in opposing directions, then the partial correlation will be higher than the simple correlation.

If we’re only considering positive correlations, this means a confounding variable could be hiding or ______________ a correlation between two variables rather than creating a false correlation.
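Both directions can be seen with the standard formula for a first-order partial correlation, r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below is an illustration only: the function name `partial_corr` and all three input correlations are hypothetical, not taken from “murderice.csv”.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y, controlling for z,
    computed from the three simple correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

simple = 0.30   # hypothetical simple correlation between x and y

# Both x and y correlate positively with z (same direction)
same_direction = partial_corr(simple, 0.50, 0.50)

# x correlates positively with z, y negatively (opposing directions)
opposing = partial_corr(simple, 0.50, -0.50)

print(round(same_direction, 3), simple, round(opposing, 3))
```

With the same simple correlation in both cases, the same-direction confounder shrinks it and the opposing confounder raises it.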

Example: Confidence.

Consider the correlation between types of confidence. Do the

correlations between the other three still show after we

control for problem solving confidence?

Simple Correlations

                  Phys.   Appear.   Emot.    Pr.Solve.
Physical          1       .373*     .430**   .730**
Appearance                1         .483**   .527**
Emotional                           1        .540**
Problem Solving                              1

The correlation between physical and anything else is removed entirely. (That means that knowing problem solving confidence tells you as much about an employee’s physical confidence as knowing all three other facets.)
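As a rough check, the partial correlations can be approximated from the simple correlations in the table using the standard first-order formula. This is a sketch under assumptions: SPSS works from the raw data using only cases complete on all three variables, while the table values are rounded and may come from different case counts, so the numbers here will not match the SPSS output exactly. The short variable labels are made up for the example.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simple correlations copied from the table above
r = {
    ("phys", "appear"): 0.373,
    ("phys", "emot"): 0.430,
    ("appear", "emot"): 0.483,
    ("phys", "solve"): 0.730,
    ("appear", "solve"): 0.527,
    ("emot", "solve"): 0.540,
}

# Control for problem solving confidence in each remaining pair
partials = {
    (a, b): partial_corr(r[(a, b)], r[(a, "solve")], r[(b, "solve")])
    for (a, b) in [("phys", "appear"), ("phys", "emot"), ("appear", "emot")]
}

for pair, value in partials.items():
    print(pair, round(value, 3))
```

Under this approximation, both correlations involving physical confidence drop to nearly zero once problem solving is controlled for, while appearance and emotional keep a modest positive correlation.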

With the heat behind murder and ice cream we had some

other non-math information to make the claim that heat was

behind the other two variables.

It could have easily been something we didn’t measure, like

the proportion of elderly in an area (retirees often migrate

south for winter).

In the case of facets of confidence, we don’t have any reason why problem solving confidence would be the common thread. The partial correlations shrink to nothing because after problem solving, the other variables weren’t giving much info.

If we control for emotional confidence, we see there’s a

connection between problem solving and physical when

emotional is taken out of the picture.

Interestingly, controlling for appearance produces the same

result. They all have a common thread and so increase

together, but the real connection is between problem solving

and physical confidence.

Without partial correlation we would have never caught this.