r xy. when two variables are correlated, we can predict a score on one variable from a score on the...

Post on 19-Dec-2015

Page 1: R xy. When two variables are correlated, we can predict a score on one variable from a score on the other The stronger the correlation, the more accurate

rxy

Page 2

rxy

• When two variables are correlated, we can predict a score on one variable from a score on the other

• The stronger the correlation, the more accurate our prediction will be

Page 3

rxy

• We need a measure of the “strength” of a correlation

Page 4

rxy

• We need a number that gets bigger when big numbers are paired with big numbers and small numbers are paired with small numbers

• We need a number that gets smaller when big numbers are paired with small numbers and small numbers are paired with big numbers

Page 5

rxy

• Remember the height/weight example:

• A big number indicates this (strong positive correlation)

[Figure: individuals a to f placed on a height scale (5’, 5’2, 5’4, 5’6, 5’8, 5’10) and on a weight scale (100, 110, 120, 130, 140, 150)]

Page 6

rxy

• Remember the height/weight example:

• A small number indicates this (strong negative correlation)

[Figure: individuals a to f placed on a height scale (5’, 5’2, 5’4, 5’6, 5’8, 5’10) and on a weight scale (100, 110, 120, 130, 140, 150)]

Page 7

rxy

• Two sets of scores, x_i and y_i

• What could we do?

Page 8

rxy

• What could we do?

\[ \sum_{i=1}^{n} (x_i y_i) \]

Page 9

rxy

• What could we do?

• When pairs are multiplied and the products are summed up:

– Greatest when big numbers are paired with big numbers and small numbers with small numbers

– Least when small numbers are paired with big numbers and big numbers are paired with small numbers

Page 10

rxy

• analogy: this gets you the most money

[Figure: pennies, quarters, loonies]

Page 11

rxy

• analogy: this gets you the least…

[Figure: pennies, quarters, loonies]

Page 12

rxy

• analogy:

Because:

3 x $1 plus 2 x $0.25 plus 1 x $0.01

is more than

1 x $1 plus 2 x $0.25 plus 3 x $0.01
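The arithmetic in the analogy can be checked with a short Python sketch. The helper name `sum_of_products` is made up for illustration; the coin counts and values are the ones on the slide.

```python
# Coin values in dollars, as in the analogy: loonie, quarter, penny.
values = [1.00, 0.25, 0.01]


def sum_of_products(xs, ys):
    """Multiply the paired scores and add up the products."""
    return sum(x * y for x, y in zip(xs, ys))


# Big counts paired with big values, small counts with small values.
matched = sum_of_products([3, 2, 1], values)

# Big counts paired with small values, small counts with big values.
mismatched = sum_of_products([1, 2, 3], values)

print(round(matched, 2), round(mismatched, 2))
```

The same three counts and the same three values are used both times; only the pairing changes, and that alone changes the total.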

Page 13

rxy

• But there’s a problem

\[ \sum_{i=1}^{n} (x_i y_i) \]

Not a good measure because the value ultimately depends on n AND the size of the numbers

Page 14

rxy

• Try this

\[ \frac{\sum_{i=1}^{n} (x_i y_i)}{n} \]

Page 15

rxy

• Try this

\[ \frac{\sum_{i=1}^{n} (x_i y_i)}{n} \]

Still not so good: it doesn’t depend on n anymore, but it does depend on the size of the x’s and y’s
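A small Python sketch of this point, reusing the height/weight numbers from the earlier example. The heights are converted to inches (5’ = 60 and so on), the weight units are not stated in the slides, and the function and variable names are illustrative assumptions.

```python
def mean_product(xs, ys):
    """Sum of the paired products divided by n."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


heights_in = [60, 62, 64, 66, 68, 70]          # 5', 5'2, ... 5'10 in inches
weights = [100, 110, 120, 130, 140, 150]       # units not given in the slides
heights_cm = [h * 2.54 for h in heights_in]    # same people, different unit

per_inch = mean_product(heights_in, weights)
per_cm = mean_product(heights_cm, weights)

# Doubling the sample (same pairs twice) leaves the mean product unchanged,
# but merely changing the unit of height changes it by a factor of 2.54.
doubled = mean_product(heights_in * 2, weights * 2)
print(per_inch, doubled, per_cm)
```

Dividing by n removes the dependence on sample size, yet the result still moves when the scale of the scores changes, even though the relationship between height and weight is exactly the same.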

Page 16

rxy

• How about multiplying deviation scores

– comparing each variable relative to its respective mean

\[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]

Page 17

rxy

• Multiply deviation scores

\[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]

Now the value depends on the spread of the data
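To illustrate, here is a hedged Python sketch using the height/weight numbers from the earlier example (heights in inches, weight units not stated in the slides). The function names are assumptions made for the example.

```python
def mean(values):
    return sum(values) / len(values)


def mean_deviation_product(xs, ys):
    """Average of (x_i - mean of x) * (y_i - mean of y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


heights = [60, 62, 64, 66, 68, 70]
weights = [100, 110, 120, 130, 140, 150]

base = mean_deviation_product(heights, weights)

# Adding a constant to every height changes the mean but not the deviations.
shifted = mean_deviation_product([h + 100 for h in heights], weights)

# Stretching the heights makes them ten times as spread out.
stretched = mean_deviation_product([h * 10 for h in heights], weights)

print(base, shifted, stretched)
```

Shifting the scores no longer matters, which is the gain from using deviation scores, but stretching them multiplies the result by the same factor. That remaining dependence on spread is what still needs to be removed.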

Page 18

rxy

• So standardize the scores

\[ \frac{\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x} \cdot \frac{(y_i - \bar{y})}{S_y}}{n} \]

Page 19

rxy

• This measures strength of correlation:

\[ \frac{\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x} \cdot \frac{(y_i - \bar{y})}{S_y}}{n} = \frac{\sum_{i=1}^{n} (z_{x_i} z_{y_i})}{n} = r_{xy} \]
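A minimal Python sketch of this formula follows. It assumes S_x and S_y are standard deviations computed with n in the denominator; the slides do not say which form they use, but that choice is what keeps the mean of the z-score products between -1 and +1. The function names and sample data (the heights in inches and the weights from the earlier example) are illustrative only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd_n(values):
    """Standard deviation with n in the denominator (an assumption)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def r_xy(xs, ys):
    """Mean of the products of paired z-scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sd_n(xs), sd_n(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / len(products)


heights = [60, 62, 64, 66, 68, 70]
weights = [100, 110, 120, 130, 140, 150]

r_same_order = r_xy(heights, weights)              # big paired with big
r_reversed = r_xy(heights, weights[::-1])          # big paired with small
r_rescaled = r_xy([h * 2.54 for h in heights], weights)  # heights in cm

print(r_same_order, r_reversed, r_rescaled)
```

Because the scores are standardized first, changing the unit of measurement leaves the result unchanged, and reversing the pairing flips its sign.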

Page 20

rxy

• rxy ranges from -1.0 indicating a perfect negative correlation to +1.0 indicating a perfect positive correlation

• An rxy of zero indicates no linear correlation whatsoever. Scores are random with respect to each other.

Page 21

rxy

• rxy also has a geometric meaning

Page 22

rxy

• rxy also has a geometric meaning

• Recall that the mean of the zx and zy distributions is zero and each z-score is a deviation from the mean

Page 23

rxy

• Each point lands in one of four quadrants

[Figure: scatterplot with zx on the horizontal axis and zy on the vertical axis, showing a point at (zx, zy)]

Page 24

rxy

• notice that:

both zx and zy are positive (so their product is positive)

\[ r_{xy} = \frac{\sum_{i=1}^{n} (z_{x_i} z_{y_i})}{n} \]

Page 25

rxy

• notice that:

zx is negative and zy is positive (so their product is negative)

\[ r_{xy} = \frac{\sum_{i=1}^{n} (z_{x_i} z_{y_i})}{n} \]

Page 26

rxy

• notice that:

zx is negative and zy is negative (so their product is positive)

\[ r_{xy} = \frac{\sum_{i=1}^{n} (z_{x_i} z_{y_i})}{n} \]

Page 27

rxy

• notice that:

zx is positive and zy is negative (so their product is negative)

\[ r_{xy} = \frac{\sum_{i=1}^{n} (z_{x_i} z_{y_i})}{n} \]

Page 28

rxy

• So

Thus if most points tend to fall around a line with a positive (45 degree) slope (I and III), the cross-products will tend to be positive

[Figure: zx/zy plane with quadrants labelled II and I (top), III and IV (bottom)]

Page 29

rxy

• So

If most points tend to fall around a line with a negative slope (II and IV), the cross-products will tend to be negative

Thus if most points tend to fall around a line with a positive (45 degree) slope (I and III), the cross-products will tend to be positive

[Figure: zx/zy plane with quadrants labelled II and I (top), III and IV (bottom)]

Page 30

rxy

• So

If the points were randomly scattered about, the negative and positive cross-products cancel
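The cancellation can be seen in a deliberately tiny Python sketch. The four points are hypothetical, one per quadrant, and are chosen so that each coordinate already has mean 0 and standard deviation 1 (that is, they are valid z-scores).

```python
# One hypothetical point in each quadrant, in the order I, II, III, IV.
points = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]

# Cross-product zx * zy for every point.
cross_products = [zx * zy for zx, zy in points]

# rxy is the mean of the cross-products.
r = sum(cross_products) / len(points)

print(cross_products, r)
```

Quadrants I and III contribute positive products, II and IV contribute negative ones, and with the points spread evenly over all four the sum is zero.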

Page 31

Covariance

• A related measure of the relationship between scores on two different variables is the covariance

\[ S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]

Page 32

Covariance

• Notice that the variance (\( S_x^2 \)) is the covariance between a variable and itself!

\[ S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]
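A short Python check of this identity, as a sketch. The `covariance` function follows the formula above (dividing by n), and it is compared against `statistics.pvariance` from the standard library, which also divides by n. The data are the heights (in inches) and weights from the earlier example.

```python
import statistics


def covariance(xs, ys):
    """Mean of the products of paired deviation scores (divides by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


heights = [60, 62, 64, 66, 68, 70]
weights = [100, 110, 120, 130, 140, 150]

s_xy = covariance(heights, weights)   # covariance of height with weight
s_xx = covariance(heights, heights)   # covariance of height with itself
var_x = statistics.pvariance(heights) # population variance of height

print(s_xy, s_xx, var_x)
```

When y is replaced by x, each product of deviations becomes a squared deviation, so the covariance formula turns into the variance formula.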

Page 33

Regression

• If two variables are perfectly correlated (r = + or - 1.0) then one can exactly predict a score on one variable given a score on another

Page 34

Regression

• For example: a university charges $250 registration fee plus $100 / credit

Page 35

Regression

• tuition = $100(X) + $250

– where X is the number of credits

• Notice this is a linear relationship (an equation of the form y = ax + b)

– a = $100/credit

– b = $250

– x = number of credits

Page 36

Regression

• Tuition as a function of credit hours is a straight line

• There is a perfect correlation between credit hours and tuition

• You could predict perfectly the tuition required given the number of credit hours
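The tuition example can be sketched in Python as follows. The `tuition` function encodes the fee structure given in the slides; the list of credit loads is hypothetical, and `r_xy` is an illustrative helper that assumes standard deviations computed with n in the denominator.

```python
import math


def tuition(credits):
    """$250 registration fee plus $100 per credit."""
    return 100 * credits + 250


def r_xy(xs, ys):
    """Mean product of z-scores, using n-denominator standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n


credit_loads = [3, 6, 9, 12, 15]           # hypothetical students
fees = [tuition(c) for c in credit_loads]  # exact prediction for each load

r = r_xy(credit_loads, fees)
print(fees, r)
```

Because every fee lies exactly on the line y = ax + b, the correlation comes out at 1 (up to floating-point rounding), which is what a perfect prediction looks like.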

Page 37

Next Time

• Regression - read chapter 8