correlation

Post on 05-Jan-2016

DESCRIPTION

Correlation. A bit about Pearson’s r. Why does the maximum value of r equal 1.0? What does it mean when a correlation is positive? Negative? What is the purpose of the Fisher r to z transformation? What is range restriction? Range enhancement? What do they do to r?

TRANSCRIPT

Correlation

A bit about Pearson’s r

Questions

• Why does the maximum value of r equal 1.0?

• What does it mean when a correlation is positive? Negative?

• What is the purpose of the Fisher r to z transformation?

• What is range restriction? Range enhancement? What do they do to r?

• Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.

• Why do we care about the sampling distribution of the correlation coefficient?

• What is the effect of reliability on r?

Basic Ideas

• Nominal vs. continuous IV

• Degree (direction) & closeness (magnitude) of linear relations
  – Sign (+ or -) for direction
  – Absolute value for magnitude

• Pearson product-moment correlation coefficient

$$ r = \frac{\sum z_X z_Y}{N} $$

Illustrations

[Figure: Plot of Weight by Height. x-axis: Height (60 to 75); y-axis: Weight (90 to 210).]

[Figure: Plot of Errors by Study Time. x-axis: Study Time (0 to 400); y-axis: Errors (0 to 30).]

[Figure: Plot of SAT-V by Toe Size. x-axis: Toe Size (1.5 to 1.9); y-axis: SAT-V (400 to 700).]

Positive, negative, zero

Simple Formulas

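$$ r_{XY} = \frac{\sum xy}{N S_X S_Y}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y} $$

$$ S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} $$

$$ r = \frac{\sum z_X z_Y}{N}, \quad \text{where } z_X = \frac{X - \bar{X}}{S_X} $$

Use either N throughout or else use N-1 throughout (SD and denominator); result is the same as long as you are consistent.

$$ \mathrm{Cov}(X, Y) = \frac{\sum xy}{N} $$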

Pearson’s r is the average cross product of z scores. Product of (standardized) moments from the means.
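A minimal Python sketch of r as the average cross product of z scores. The data values and the `ddof` parameter name are made up for illustration; the point is that dividing by N throughout and dividing by N-1 throughout give the same r.

```python
import math

def pearson_r(xs, ys, ddof=0):
    """Pearson r as the average cross product of z scores.

    ddof=0 divides by N throughout (SD and average);
    ddof=1 divides by N-1 throughout.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - ddof))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - ddof))
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - ddof)

# Hypothetical heights (inches) and weights (lbs), for illustration only.
heights = [60, 63, 66, 69, 72, 75]
weights = [100, 130, 125, 160, 170, 200]

r_n = pearson_r(heights, weights, ddof=0)    # N throughout
r_n1 = pearson_r(heights, weights, ddof=1)   # N-1 throughout
print(round(r_n, 4), round(r_n1, 4))         # the two values agree
```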

Graphic Representation

[Figure: Plot of Weight by Height, annotated with the means (Mean = 66.8 Inches, Mean = 150.7 lbs.). x-axis: Height (60 to 75); y-axis: Weight (90 to 210).]

[Figure: Plot of Weight by Height in Z-scores. x-axis: Z-height (-2 to 2); y-axis: Z-weight (-2 to 2). The four quadrants are marked + or - according to the sign of the cross product.]

1. Conversion from raw to z.

2. Points & quadrants. Positive & negative products.

3. Correlation is average of cross products. Sign & magnitude of r depend on where the points fall.

4. Product at maximum (average =1) when points on line where zX=zY.
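One way to see why the average cross product cannot exceed 1, written out as a short derivation. It assumes z scores computed with N in the denominator, so that each set of z scores has mean square 1.

```latex
% The mean squared difference between the z scores is never negative:
\frac{1}{N}\sum (z_X - z_Y)^2
  = \frac{\sum z_X^2}{N} + \frac{\sum z_Y^2}{N} - 2\,\frac{\sum z_X z_Y}{N}
  = 1 + 1 - 2r \;\ge\; 0
  \quad\Longrightarrow\quad r \le 1 .

% Equality holds when z_X = z_Y for every case:
r = \frac{\sum z_X z_X}{N} = \frac{\sum z_X^2}{N} = 1 .
```

The same argument applied to the sum of the z scores, (z_X + z_Y), gives r ≥ -1.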

Descriptive Statistics

                     N    Minimum   Maximum   Mean       Std. Deviation
Ht                   10   60.00     78.00     69.0000    6.05530
Wt                   10   110.00    200.00    155.0000   30.27650
Valid N (listwise)   10

r = 1.0

Leave X, add error to Y: r = .99

Add more error: r = .91

With 2 variables, the correlation is the z-score slope.
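A small simulation sketch of both ideas, using made-up data: Y starts as an exact linear function of X, then random error is added to Y. The noise levels, seed, and sample size are arbitrary assumptions, not the values behind the slides. The slope from regressing z-scored Y on z-scored X is compared with r.

```python
import math
import random

def zscores(values):
    """Convert raw scores to z scores (N in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Average cross product of z scores."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)                      # fixed seed for repeatability
x = [rng.gauss(0, 1) for _ in range(500)]

rs, z_slopes = [], []
for noise_sd in (0.0, 0.3, 1.0):            # leave X alone, add error to Y
    y = [2 * xi + 3 + rng.gauss(0, noise_sd) for xi in x]
    rs.append(pearson_r(x, y))
    z_slopes.append(slope(zscores(x), zscores(y)))

for r, b in zip(rs, z_slopes):
    print(round(r, 3), round(b, 3))         # r and the z-score slope match
```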

Review

• Why does the maximum value of r equal 1.0?

• What does it mean when a correlation is positive? Negative?

Sampling Distribution of r

Statistic is r, parameter is ρ (rho). In general, r is slightly biased.

[Figure: Sampling Distributions of r for rho = -.5, rho = 0, and rho = .5. x-axis: Observed r (-1.2 to 1.2); y-axis: Relative Frequency (0.00 to 0.08).]

The sampling variance is approximately:

$$ \sigma_r^2 = \frac{(1 - \rho^2)^2}{N} $$

Sampling variance depends both on N and on ρ.
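To make the dependence on both N and ρ concrete, here is a small sketch that evaluates the large-sample approximation (1 - ρ²)²/N for a few combinations. The function name is hypothetical, and the formula is only an approximation.

```python
def approx_var_r(rho, n):
    """Large-sample approximation to the sampling variance of r."""
    return (1 - rho ** 2) ** 2 / n

variances = {}
for rho in (0.0, 0.5, 0.7):
    for n in (50, 100):
        variances[(rho, n)] = approx_var_r(rho, n)
        print(f"rho={rho:.1f}  N={n:3d}  var(r) ~ {variances[(rho, n)]:.5f}")
```

The variance shrinks as N grows and as ρ moves away from zero.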

Empirical Sampling Distributions of the Correlation Coefficient

Conditions: ρ = .5, N = 100; ρ = .7, N = 100; ρ = .5, N = 50; ρ = .7, N = 50.

[Figure: side-by-side box plots of observed r (vertical axis from -0.1 to 0.9) for the four conditions, labeled .5_N100, .5_N50, .7_N100, .7_N50.]

Fisher’s r to z Transformation

r    .10   .20   .30   .40   .50   .60   .70   .80    .90
z    .10   .20   .31   .42   .55   .69   .87   1.10   1.47

[Figure: Fisher r to z Transformation. x-axis: r (sample value input), 0.0 to 1.0; y-axis: z (output), 0.0 to 1.5.]

The sampling distribution of z approaches normal as N increases. The transformation pulls out the short tail to make a better (more nearly normal) distribution. The sampling variance of z, 1/(N-3), does not depend on ρ.

$$ z = .5 \ln\left(\frac{1 + r}{1 - r}\right) $$
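A short sketch of the transformation in Python. The function names are hypothetical; the inverse uses the fact that Fisher's z is the inverse hyperbolic tangent of r. It evaluates z at r = .10 through .90, rounded to two decimals.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation: z = .5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform from z to r."""
    return math.tanh(z)

rs = [k / 10 for k in range(1, 10)]          # .10, .20, ..., .90
zs = [round(fisher_z(r), 2) for r in rs]
for r, z in zip(rs, zs):
    print(f"r = {r:.2f}   z = {z:.2f}")
```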

Hypothesis test: H0: ρ = 0

$$ t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} $$

Result is compared to t with (N-2) df for significance.

Say r=.25, N=100

$$ t = \frac{.25\sqrt{98}}{\sqrt{1 - .25^2}} = \frac{(.25)(9.899)}{.968} = 2.56 $$

t(.05, 98) = 1.984.

p< .05
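The same calculation as a minimal Python sketch. The function name is hypothetical, and the critical value 1.984 is taken from the slide rather than computed.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_r(0.25, 100)
critical_t = 1.984                 # t(.05, 98), two-tailed, from the slide
significant = abs(t) > critical_t
print(round(t, 2), significant)
```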

Hypothesis test 2: H0: ρ = value

$$ z = \frac{.5\ln\left(\frac{1 + r}{1 - r}\right) - .5\ln\left(\frac{1 + \rho}{1 - \rho}\right)}{\sqrt{1/(N - 3)}} $$

One sample z test where r is sample value and ρ is hypothesized population value.

Say N=200, r = .54, and ρ is .30.

$$ z = \frac{.5\ln\left(\frac{1 + .54}{1 - .54}\right) - .5\ln\left(\frac{1 + .30}{1 - .30}\right)}{\sqrt{1/(200 - 3)}} = \frac{.60 - .31}{.07} = 4.13 $$

Compare to unit normal, e.g., 4.13 > 1.96 so it is significant. Our sample was not drawn from a population in which rho is .30.
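A sketch of this one-sample test in Python, assuming the normal approximation described above. The function names are hypothetical, and the two-tailed p value is an addition for illustration.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def one_sample_z(r, rho0, n):
    """z test of H0: rho = rho0 using Fisher-transformed values."""
    standard_error = math.sqrt(1 / (n - 3))
    return (fisher_z(r) - fisher_z(rho0)) / standard_error

z = one_sample_z(r=0.54, rho0=0.30, n=200)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), p_two_tailed < 0.05)
```

Carried at full precision, z comes out very close to the hand calculation, which rounds the intermediate values.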

Hypothesis test 3: H0: ρ1 = ρ2

Testing equality of correlations from 2 INDEPENDENT samples.

$$ z = \frac{.5\ln\left(\frac{1 + r_1}{1 - r_1}\right) - .5\ln\left(\frac{1 + r_2}{1 - r_2}\right)}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} $$

Say N1=150, r1=.63, N2=175, r2=.70.

$$ z = \frac{.5\ln\left(\frac{1 + .63}{1 - .63}\right) - .5\ln\left(\frac{1 + .70}{1 - .70}\right)}{\sqrt{1/(150 - 3) + 1/(175 - 3)}} = \frac{.74 - .87}{.11} = -1.18, \text{ n.s.} $$
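A Python sketch of the two-sample test, with hypothetical function names. At full precision the statistic is about -1.12 rather than -1.18, because the hand calculation rounds the intermediate values; the conclusion (not significant) is the same.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_independent_rs(r1, n1, r2, n2):
    """z test of H0: rho1 = rho2 for two independent samples."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p value
    return z, p

z, p = compare_independent_rs(r1=0.63, n1=150, r2=0.70, n2=175)
print(round(z, 2), round(p, 3))
```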

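Hypothesis test 4: H0: ρ1 = ρ2 = ... = ρk

Testing equality of any number of independent correlations.

$$ \bar{z} = \frac{\sum_{i=1}^{k} (n_i - 3) z_i}{\sum_{i=1}^{k} (n_i - 3)} $$

$$ Q = \sum_{i=1}^{k} (n_i - 3)(z_i - \bar{z})^2 $$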

Compare Q to chi-square with k-1 df.

Study   r    n     z     (n-3)z   zbar   (z-zbar)^2   (n-3)(z-zbar)^2
1       .2   200   .2    39.94    .41    .0441        8.69
2       .5   150   .55   80.75    .41    .0196        2.88
3       .6   75    .69   49.91    .41    .0784        5.64
sum          425         170.6                        17.21 = Q

Chi-square at .05 with 2 df = 5.99. Not all rho are equal.
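The Q computation for the three studies can be sketched in Python as follows. Function names are hypothetical, and the critical value 5.99 is taken from the slide. At full precision Q is about 17.1 rather than 17.21, since the table rounds z and zbar to two decimals before squaring.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def homogeneity_q(rs, ns):
    """Weighted mean z and Q statistic for k independent correlations."""
    weights = [n - 3 for n in ns]
    zs = [fisher_z(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return z_bar, q

z_bar, q = homogeneity_q(rs=[0.2, 0.5, 0.6], ns=[200, 150, 75])
critical_chi_square = 5.99          # chi-square at .05 with k - 1 = 2 df
reject = q > critical_chi_square
print(round(z_bar, 2), round(q, 2), reject)
```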

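Hypothesis test 5: dependent r

H0: ρ12 = ρ13

H0: ρ12 = ρ34

Hotelling-Williams test

$$ t_{(N-3)} = (r_{12} - r_{13}) \sqrt{\frac{(N - 1)(1 + r_{23})}{2\left(\frac{N - 1}{N - 3}\right)|R| + \bar{r}^2 (1 - r_{23})^3}} $$

$$ \bar{r} = (r_{12} + r_{13})/2 $$

$$ |R| = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2(r_{12})(r_{13})(r_{23}) $$

Say N=101, r12=.4, r13=.6, r23=.3

$$ \bar{r} = (.4 + .6)/2 = .5 $$

$$ |R| = 1 - .4^2 - .6^2 - .3^2 + 2(.4)(.6)(.3) = .534 $$

$$ t_{(N-3)} = (.4 - .6) \sqrt{\frac{(100)(1 + .3)}{2(100/98)(.534) + .5^2 (1 - .3)^3}} = -2.1 $$

t(.05, 98) = 1.98

See my notes.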
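A Python sketch of the Hotelling-Williams statistic for two dependent correlations that share variable 1, applied to the example values. The function name is hypothetical, and the formula follows the standard form of the test rather than anything recovered from the author's notes.

```python
import math

def hotelling_williams_t(r12, r13, r23, n):
    """t (df = n - 3) for H0: rho12 = rho13 in a single sample of size n."""
    det_r = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    numerator = (n - 1) * (1 + r23)
    denominator = (2 * ((n - 1) / (n - 3)) * det_r
                   + r_bar ** 2 * (1 - r23) ** 3)
    return (r12 - r13) * math.sqrt(numerator / denominator)

t = hotelling_williams_t(r12=0.4, r13=0.6, r23=0.3, n=101)
critical_t = 1.98                   # t(.05, 98), from the slide
print(round(t, 2), abs(t) > critical_t)
```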

Review

• What is the purpose of the Fisher r to z transformation?

• Test the hypothesis that ρ1 = ρ2, given that r1 = .50, N1 = 103, r2 = .60, N2 = 128, and the samples are independent.

• Why do we care about the sampling distribution of the correlation coefficient?

Range Restriction/Enhancement

Reliability

Reliability sets the ceiling for validity. Measurement error attenuates correlations.

$$ \rho_{XY} = \rho_{T_X T_Y} \sqrt{\rho_{XX'} \rho_{YY'}} $$

If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7*sqrt(.8*.8) = .7*.8 = .56.

Disattenuated correlation

$$ \rho_{T_X T_Y} = \rho_{XY} / \sqrt{\rho_{XX'} \rho_{YY'}} $$

If our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56/.8 = .70.
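Both directions of the reliability correction, as a minimal Python sketch with hypothetical function names, applied to the numbers in the examples.

```python
import math

def attenuate(true_r, rel_x, rel_y):
    """Observed correlation implied by a true-score correlation."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuate(observed_r, rel_x, rel_y):
    """Estimated true-score correlation from an observed correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)

observed = attenuate(true_r=0.7, rel_x=0.8, rel_y=0.8)
estimated_true = disattenuate(observed_r=0.56, rel_x=0.8, rel_y=0.8)
print(round(observed, 2), round(estimated_true, 2))
```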

Review

• What is range restriction? Range enhancement? What do they do to r?

• What is the effect of reliability on r?

SAS Power Estimation

proc power;
   onecorr dist=fisherz
      corr = 0.35
      nullcorr = 0.2
      sides = 1
      ntotal = 100
      power = .;
run;

Computed Power: Actual alpha = .05, Power = .486

proc power;
   onecorr
      corr = 0.35
      nullcorr = 0
      sides = 2
      ntotal = .
      power = .8;
run;

Computed N Total: Alpha = .05, Actual Power = .801, Ntotal = 61

Power for Correlations

Rho    N required against Null: rho = 0
.10    782
.15    346
.20    193
.25    123
.30    84
.35    61

Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology. Power = .8, two tails, alpha is .05.
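For a rough check on these sample sizes, the required N can be approximated from the Fisher z transformation and the unit normal distribution. This is an assumed textbook approximation, not the method SAS used to produce the table, so the results can differ from the tabled values by a case or so. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_n_for_power(rho, power=0.8, alpha=0.05):
    """Approximate N to detect rho against a null of 0, two-tailed."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
    z_power = NormalDist().inv_cdf(power)
    z_rho = math.atanh(rho)                         # Fisher z of rho
    return math.ceil(((z_alpha + z_power) / z_rho) ** 2 + 3)

# Values from the table above, for comparison.
table_n = {0.10: 782, 0.15: 346, 0.20: 193, 0.25: 123, 0.30: 84, 0.35: 61}
approx_n = {rho: approx_n_for_power(rho) for rho in table_n}
for rho in table_n:
    print(f"rho={rho:.2f}  table N={table_n[rho]}  approx N={approx_n[rho]}")
```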
