Correlational Problems and Fallacies

James H. Steiger

Upload: briar-cummings

Post on 31-Dec-2015



Page 1: Correlational Problems and Fallacies

Correlational Problems and Fallacies

James H. Steiger

Page 2: Correlational Problems and Fallacies

Introduction

In this module, we discuss some common problems and fallacies regarding correlation coefficients and their interpretation:

Interpreting a Correlation
Correlation and Causality
Perfect Correlation and Equivalence
No Correlation vs. No Relation
Combining Populations, and Ignoring Explanatory Variables
Restriction of Range

Page 3: Correlational Problems and Fallacies

Interpreting a Correlation

If scores are on roughly similar scales, the shape of the scatterplot can reveal a substantial amount about the correlation.

Page 4: Correlational Problems and Fallacies

Interpreting a Correlation

[Figure: Scatterplot (Cigarettes vs. Cardiac Reserve). x-axis: Cigarettes Smoked; y-axis: Cardiac Reserve. r = (value not shown)]

Page 5: Correlational Problems and Fallacies

Interpreting a Correlation

[Figure: Scatterplot (Shoe Size vs. IQ). x-axis: Shoe Size; y-axis: IQ. r = .01]

Page 6: Correlational Problems and Fallacies

Interpreting a Correlation

[Figure: Scatterplot (GPA vs. IQ). x-axis: GPA; y-axis: IQ. r = .72]

Page 7: Correlational Problems and Fallacies

Anscombe’s Quartet

X1    Y1      X2    Y2      X3    Y3      X4    Y4
10     8.04   10     9.14   10     7.46    8     6.58
 8     6.95    8     8.14    8     6.77    8     5.76
13     7.58   13     8.74   13    12.74    8     7.71
 9     8.81    9     8.77    9     7.11    8     8.84
11     8.33   11     9.26   11     7.81    8     8.47
14     9.96   14     8.10   14     8.84    8     7.04
 6     7.24    6     6.13    6     6.08    8     5.25
 4     4.26    4     3.10    4     5.39   19    12.50
12    10.84   12     9.13   12     8.15    8     5.56
 7     4.82    7     7.26    7     6.42    8     7.91
 5     5.68    5     4.74    5     5.73    8     6.89

Page 8: Correlational Problems and Fallacies

Anscombe’s Quartet

Each of the above 4 data sets has the following summary statistics:

$M_X = 9$, $s_X^2 = 11$

$M_Y = 7.5$, $s_Y^2 = 4.13$

$r_{XY} = .82$

Each has a best-fitting linear regression line of $\hat{Y} = .5X + 3$.
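The summary statistics quoted above can be checked directly from the quartet table. The following is a minimal sketch in plain Python; the names `quartet` and `summarize` are illustrative and not part of the original slides, and the variances use the sample (n - 1) denominator.

```python
from math import sqrt

# Anscombe's quartet, copied from the table in this module.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # X1, X2 and X3 are identical
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Means, sample variances, Pearson r, and least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "M_X": mx, "var_X": sxx / (n - 1),
        "M_Y": my, "var_Y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }

summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The four printed rows should agree to about two decimal places, even though the four scatterplots look nothing alike.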

Page 9: Correlational Problems and Fallacies

Anscombe’s Quartet

[Figure: four scatterplots (Anscombe.STA 8v*11c): Y1 vs. X1, Y2 vs. X2, Y3 vs. X3, and Y4 vs. X4, each with the fitted line y = 3 + 0.5*x + eps]

Page 10: Correlational Problems and Fallacies

Correlation and Causality

Correlation is not causality. This is a standard adage in textbooks on statistics and experimental design, but it is still forgotten on occasion.

Example: The number of fire trucks sent to a fire correlates positively with the dollar damage done by the fire, yet the trucks do not cause the damage; both reflect the size of the fire.
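The fire truck example can be imitated with simulated data. The sketch below is only an illustration under invented assumptions: a hypothetical `fire_size` variable drives both `trucks` and `damage`, all noise is standard normal, and the seed and sample size are arbitrary. It compares the raw correlation with the correlation left after fire size has been regressed out of both variables (a partial correlation).

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residuals(y, x):
    """Residuals of y after fitting a least-squares line on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

random.seed(0)  # arbitrary seed, for reproducibility only
n = 5000
# Hypothetical model: fire size drives both variables; trucks never affect damage.
fire_size = [random.gauss(0, 1) for _ in range(n)]
trucks = [s + random.gauss(0, 1) for s in fire_size]
damage = [s + random.gauss(0, 1) for s in fire_size]

r_raw = pearson_r(trucks, damage)
r_partial = pearson_r(residuals(trucks, fire_size), residuals(damage, fire_size))
print("raw r:", round(r_raw, 2), " r with fire size removed:", round(r_partial, 2))
```

Under this model the raw correlation is substantial (about .5 in theory), while the correlation with fire size removed is close to zero.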

Page 11: Correlational Problems and Fallacies

Perfect Correlation and Equivalence

Two variables may correlate highly (or even perfectly), without measuring the same construct.

Example: Height and weight on the planet Zorg.
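As a small illustration, suppose (purely hypothetically; the numbers below are invented and are not taken from the slides) that weight on Zorg were an exact linear function of height. The correlation would then be perfect, although height and weight would remain different constructs measured in different units.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: weight is an exact linear function of height.
heights = [150, 155, 160, 165, 170, 175, 180]
weights = [0.9 * h - 90 for h in heights]

r_zorg = pearson_r(heights, weights)
print("r =", round(r_zorg, 6))
```

Any exact linear rule with a positive slope would give the same result, so a perfect correlation says nothing about whether two variables are the same thing.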

Page 12: Correlational Problems and Fallacies

Height and Weight on the Planet Zorg

Page 13: Correlational Problems and Fallacies

Zero Correlation vs. No Relation

The Pearson correlation coefficient is a measure of linear relation. Many strong relationships are nonlinear. Always examine the scatterplot!
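A minimal sketch of a strong nonlinear relation with zero linear correlation, using invented data: y is completely determined by x through y = x^2, yet because the x values are symmetric about zero, the Pearson correlation is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: a perfect, but purely quadratic, relation.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_quadratic = pearson_r(x, y)
print("r =", r_quadratic)
```

A scatterplot of these points shows the parabola immediately, which is why the plot should be examined before r is interpreted.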

Page 14: Correlational Problems and Fallacies

Combining Populations

If two groups with different means and/or covariances are combined, the resulting mixture can exhibit spurious correlations.

Example. (C.P.) Suppose the correlation between strength and mathematics performance is zero for 6th grade boys, and zero for 8th grade boys. Does this mean it will be zero in a combined group of 6th and 8th graders?
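The question can be explored with simulated data. In the sketch below, every number is an assumption made for illustration: strength and mathematics scores are independent within each grade, but the simulated 8th graders average 2 standard deviations higher on both variables than the 6th graders.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulate_grade(n, mean_strength, mean_math):
    """Independent normal scores (SD 1) around the given group means."""
    strength = [random.gauss(mean_strength, 1) for _ in range(n)]
    math_score = [random.gauss(mean_math, 1) for _ in range(n)]
    return strength, math_score

random.seed(1)  # arbitrary seed, for reproducibility only
s6, m6 = simulate_grade(2000, 0.0, 0.0)  # hypothetical 6th grade boys
s8, m8 = simulate_grade(2000, 2.0, 2.0)  # hypothetical 8th grade boys

r_grade6 = pearson_r(s6, m6)
r_grade8 = pearson_r(s8, m8)
r_combined = pearson_r(s6 + s8, m6 + m8)
print("6th:", round(r_grade6, 2), " 8th:", round(r_grade8, 2),
      " combined:", round(r_combined, 2))
```

Within each simulated grade the correlation is near zero, but the combined group shows a clearly positive correlation, produced entirely by the difference in group means.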

Page 15: Correlational Problems and Fallacies

Restriction of Range

Often, when linear regression is used to predict performance, the population is restricted. For example, the GRE is used to predict performance in graduate school, but people with low GRE scores are often refused admission to graduate school. Consequently, the “available data” are a truncated version of the full data set.

Page 16: Correlational Problems and Fallacies

Restriction of Range

[Figure: Scatterplot (Restriction of Range.STA 10v*1000c). x-axis: VAR1; y-axis: VAR2. Fitted line y = 33.905 + 0.514*x + eps. N = 1000, r = .73]

Page 17: Correlational Problems and Fallacies

Restriction of Range

[Figure: Scatterplot (Restriction of Range.STA 10v*1000c). x-axis: VAR1; y-axis: VAR2. Fitted line y = 35.09 + 0.497*x + eps. N = 153, r = .40]
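The drop in r under restriction of range can be reproduced in a simulation. This sketch does not use the data behind the two scatterplots above; it assumes standardized bivariate normal scores with a population correlation of .73 and keeps only cases whose predictor exceeds 1 standard deviation above the mean. The cutoff, seed, and sample size are arbitrary choices.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(2)  # arbitrary seed, for reproducibility only
rho = 0.73      # assumed population correlation
n = 20000

# Standardized predictor x and criterion y with correlation rho.
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * a + sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]

# "Admit" only cases with a high predictor score (assumed cutoff of 1 SD).
admitted = [(a, b) for a, b in zip(x, y) if a > 1.0]
x_adm = [a for a, _ in admitted]
y_adm = [b for _, b in admitted]

r_full = pearson_r(x, y)
r_restricted = pearson_r(x_adm, y_adm)
print("full: N =", n, " r =", round(r_full, 2))
print("restricted: N =", len(admitted), " r =", round(r_restricted, 2))
```

The regression slope is roughly the same in both groups, but the correlation in the admitted group is much lower, because the predictor's variance has been cut while the error variance has not.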

Page 18: Correlational Problems and Fallacies

The “Third Variable Fallacy”

Often people assume, sometimes almost subconsciously, that when two variables correlate highly with a third variable, they correlate highly with each other.

Actually, if $r_{XW}$ and $r_{YW}$ are both .7071, $r_{XY}$ can vary anywhere from 0 to 1.

Only when $r_{XW}$ and/or $r_{YW}$ become very high does the correlation between X and Y become highly restricted.
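The range quoted above follows from the requirement that a 3 x 3 correlation matrix be positive semidefinite, which gives the bounds $r_{XW} r_{YW} \pm \sqrt{(1 - r_{XW}^2)(1 - r_{YW}^2)}$ for $r_{XY}$. The sketch below evaluates these bounds; the function name `r_xy_bounds` is illustrative and does not come from the slides.

```python
from math import sqrt

def r_xy_bounds(r_xw, r_yw):
    """Lowest and highest r_XY consistent with the given r_XW and r_YW."""
    center = r_xw * r_yw
    half_width = sqrt((1 - r_xw ** 2) * (1 - r_yw ** 2))
    return center - half_width, center + half_width

# Both correlations equal to sqrt(.5), approximately .7071 as in the text.
low, high = r_xy_bounds(sqrt(0.5), sqrt(0.5))
print("r = .7071:", round(low, 3), "to", round(high, 3))

# Much higher correlations with W leave far less room for r_XY.
low95, high95 = r_xy_bounds(0.95, 0.95)
print("r = .95:", round(low95, 3), "to", round(high95, 3))
```

With both correlations at .7071 the bounds span essentially the whole interval from 0 to 1, whereas at .95 the lower bound rises to about .8.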