Correlation: A statistic to describe the relationship between variables

Upload: rose-nicholson

Post on 11-Jan-2016


Page 1

Correlation: A statistic to describe the relationship between variables

[Figure: three scatterplots, each with axes labeled Hours Worked and Pay]

Page 2

Univariate vs. Bivariate Statistics

Univariate analyses/graphical representations:
• Frequency histograms
• Measures of central tendency and variability
• Z-scores

Bivariate analyses/graphical representations:
• Scatterplots
• Correlation

How can we define correlation?
• A linear pattern of relationship between one variable (x) and another variable (y): an association between two variables
• The relative position of scores on one variable correlates with the relative position of scores on another variable

Page 3

Correlations allow us to look for evidence of a relationship between variables.

Page 4

Correlations can vary in strength

[Figure: three scatterplots with CIGARETT on the x-axis and DOCVISIT, DOCVIS2, and VAR00001 on the y-axes; axis ticks run from 0 to 10]

Page 5

Correlations can vary in direction

So, how do we QUANTIFY a correlation?

We need to come up with a NUMBER that reflects both the strength and direction of the correlation.

[Figure: scatterplot with an axis labeled "flu shots given"]

Page 6

Correlation describes the strength and direction of the best-fitting line through the data.

$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} $$

The number we calculate in Statistics is called the correlation coefficient. Developed by Karl Pearson, it is also sometimes referred to as Pearson’s r.
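
To connect the computational formula for r to the idea of two variables varying together, it can help to set it beside the deviation-score form. The block below is a standard identity rather than something taken from the slides; $\bar{X}$ and $\bar{Y}$ are assumed to denote the sample means of X and Y.

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}},
\qquad
\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n},
\qquad
\sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}
```

The numerator is positive when high X tends to go with high Y and negative when high X tends to go with low Y, which is where the sign of r comes from.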

Page 7

Example Calculation: The following data represent the number of emergency room visits per year (x) and cigarettes smoked per day (y) for three individuals recruited from New York Methodist Hospital.

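        x     y     x²    y²    xy
        2     4     4     16    8
        3     5     9     25    15
        7     6     49    36    42
Σ       12    15    62    77    65

$$ r = \frac{65 - \frac{(12)(15)}{3}}{\sqrt{\left[62 - \frac{(12)^2}{3}\right]\left[77 - \frac{(15)^2}{3}\right]}} = \frac{5}{\sqrt{(14)(2)}} = 0.94 $$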
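
The worked example for the three individuals (x = 2, 3, 7 and y = 4, 5, 6) can be checked in a few lines of code. What follows is a minimal sketch of the computational formula; the function name `pearson_r` is an illustrative choice and does not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the computational (raw-score) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Values in the comments are for the slide's three data points.
    numerator = sum_xy - (sum_x * sum_y) / n   # 65 - (12)(15)/3 = 5
    ss_x = sum_x2 - sum_x ** 2 / n             # 62 - (12)^2/3 = 14
    ss_y = sum_y2 - sum_y ** 2 / n             # 77 - (15)^2/3 = 2
    return numerator / math.sqrt(ss_x * ss_y)

x = [2, 3, 7]   # emergency room visits per year
y = [4, 5, 6]   # cigarettes smoked per day
r = pearson_r(x, y)
print(round(r, 3))  # 0.945
```

The unrounded value is 5/√28, which the slide reports as 0.94.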

Page 8

Another way to think of the correlation

• The sum of the products of the Z-scores for each pair of scores, divided by n - 1

$$ r = \frac{\sum Z_x Z_y}{n - 1} $$

        x     y     Zx      Zy
        2     4     -.76    -1
        3     5     -.38     0
        7     6     1.13     1

Mean of x = 4, s_x = 2.65; mean of y = 5, s_y = 1

If x = 2, Zx = (2 - 4)/2.65 = -.76 … If y = 4, Zy = (4 - 5)/1 = -1

Page 9

Another way to think of the correlation

• The sum of the products of the Z-scores for each pair of scores, divided by n - 1

$$ r = \frac{\sum Z_x Z_y}{n - 1} $$

        x     y     Zx      Zy     ZxZy
        2     4     -.76    -1     .76
        3     5     -.38     0     0
        7     6     1.13     1     1.13
Σ                                  1.89

$$ r = \frac{1.89}{2} = .945 \approx .95 $$
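
The Z-score route to r can be sketched the same way. This is a minimal illustration that assumes the sample standard deviation (n - 1 denominator), which is what Python's `statistics.stdev` computes and what the values s_x = 2.65 and s_y = 1 imply; the function name `pearson_r_from_z` is hypothetical.

```python
import statistics

def pearson_r_from_z(xs, ys):
    """Pearson's r as the sum of Z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

x = [2, 3, 7]
y = [4, 5, 6]
r_z = pearson_r_from_z(x, y)
print(round(r_z, 3))  # 0.945
```

Because the Z-scores are not rounded along the way, the result agrees with the computational formula (5/√28); the slide's hand calculation uses two-decimal Z-scores and rounds the final answer to .95.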

Page 10

Interpreting the Pearson r

* Range of values: -1.0 to +1.0

Interpreting the value of r

* Direction from the sign
negative => anticorrelated: as one variable goes up, the other goes down in value.
positive => correlated: as one variable goes up, so does the other.

* Strength from the magnitude
| r | = 1.0: perfect relationship
| r | = 0.0: no evidence of relationship
0.0 < | r | < 1.0: intermediate strength relationship

Page 11

Page 12

When NOT to use a correlation:

• Extreme scores
  r = .97

• Non-linear relationships
  r = .20
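
Both cautions can be reproduced with small made-up data sets. The sketch below is an illustration only: every number in it is hypothetical and chosen to make the effect visible, not taken from the plots on this slide, and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: five points with almost no linear trend...
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# ...plus one extreme score far from the rest.
r_with = pearson_r(x + [20], y + [20])

# Hypothetical data: y depends perfectly on x, but not linearly (y = x^2).
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print(f"without extreme score: r = {r_without:.2f}")
print(f"with extreme score:    r = {r_with:.2f}")
print(f"quadratic relation:    r = {r_curve:.2f}")
```

A single extreme score can manufacture a large r where the remaining points show almost none, and a perfectly predictable curved relationship can produce an r near zero, so a scatterplot is worth checking before r is trusted.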

Page 13

Some Issues with Correlation

• NO CAUSATION!
• Spurious correlation

Page 14

Page 15

Page 16

Page 17

Page 18

Number of people who drowned in a swimming pool & number of Nicolas Cage films in a given year: r = .67

Per capita consumption of cheese & number of deaths by becoming tangled in bed sheets: r = .95

Divorce rate in Maine & consumption of margarine in the United States: r = .99

Page 19

Preview of Next Lecture:

[Figure: scatterplot titled "Textile Workers", with Weight (lbs) on the x-axis and Height (inches) on the y-axis]

Regression: finding the best-fitting line to a data set.