simple correlation


Upload: ibrahim-lubbad

Post on 27-Jan-2015


Page 1: Simple correlation

14-12-13

1

Magdy Ibrahim Mostafa

Prof. Obstetrics & Gynecology, Faculty of Medicine, Cairo University

Director; Research, Biostatistics & IT Units, MEDC, Cairo University

Management member; EBM Unit, MEDC, Cairo University

Scientific Council Member, Egyptian IT Fellowship

Board Member, Egyptian Ob/Gyn Fellowship

Associate Editor; Kasr Al Aini Journal of Obstetrics and Gynecology

Peer Reviewer; Gyn Endocrin J, Gyn Oncol J, Obstet Gynecol Invest Journal

Peer Reviewer; Cairo University Medical Journal, Kasr El Aini Medical Journal, MEFS Journal


Correlation

• In two series of numerical data

• The values in one variable may vary correspondingly with the other one

[Figure: example variable pairs: Age against Height; Age against BMD]

Correlation: +ve OR -ve

• When the two variables increase & decrease in parallel (same direction) → positive correlation

• When one goes up the other goes down proportionally (opposite directions) → negative correlation

Correlation ≠ Causation


Importance of correlation

1. Facilitates difficult measurements (an easily measured variable can stand in for one that is hard to measure)

2. Study of effectors:

• Dependent variable (outcome)

• Independent variable(s) (predictors or effectors)

Correlation between payment & working hours

Scatter diagram


Conclusion:

1. As working hours increase, payment increases → positive correlation (proportionate correlation)

2. The increase in payment is constant in relation to the increase in working hours → linear correlation

Correlation between Payment & working hours

Correlation between TV watch & school grade

Scatter diagram


Conclusion:

1. As TV watching hours increase, the final school grade decreases → negative correlation (inverse correlation)

2. The decrease in grade is constant in relation to the increase in TV watching hours → linear correlation

Correlation between TV watching & school grade

Correlation between Age and Height

Scatter Diagram


[Figure: four example scatter diagrams]

• Positive linear correlation: weight (g) against gestational age (weeks)

• Negative linear correlation: distance before discomfort (m) against age (years)

• No correlation: height (cm) against age (years)

• Non-linear correlation: amniotic fluid volume (ml) against gestational age (weeks)


Non-Linear Correlation

[Figure: height against age (years), fitted once with a straight line (linear) and once with a curve (non-linear)]

The curve is closer to the points than the straight line

The correlation coefficient (meaning and magnitude)

• Examining plots is a good way to determine the nature and strength of the relationship between two variables

• However, you need an objective measure to replace subjective descriptions like strong, weak, I can't make up my mind, and none


• Mathematically, correlation is represented by what is known as the correlation coefficient

• The magnitude of the correlation coefficient ranges from “0” (no correlation) to 1 (perfect correlation)

• The sign indicates the direction of the relationship, not its magnitude

The correlation coefficient (interpretation)

Interpretation of “cc”:

• From 0 to 0.25 (0 to -0.25) = little or no relationship

• From 0.25 to 0.50 (-0.25 to -0.50) = fair

• From 0.50 to 0.75 (-0.50 to -0.75) = moderate to good

• Greater than 0.75 (or less than -0.75) = very good to excellent

Strong relation may not be clinically important
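The interpretation bands above can be sketched as a small lookup. This is only an illustration: the function name `interpret_cc` is hypothetical, and because the bands on the slide share their end points (0.25, 0.50, 0.75), assigning each boundary value to the lower band is an assumption made here.

```python
def interpret_cc(cc: float) -> str:
    """Map a correlation coefficient to the descriptive bands listed above."""
    if not -1.0 <= cc <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")

    strength = abs(cc)  # the sign only gives the direction
    # Assumption: a boundary value (0.25, 0.50, 0.75) falls in the lower band.
    if strength <= 0.25:
        label = "little or no relationship"
    elif strength <= 0.50:
        label = "fair"
    elif strength <= 0.75:
        label = "moderate to good"
    else:
        label = "very good to excellent"

    if cc > 0:
        direction = "positive"
    elif cc < 0:
        direction = "negative"
    else:
        direction = "none"
    return f"{label} ({direction})"


# Coefficients quoted elsewhere in these slides.
for value in (0.983, 0.012, -0.73):
    print(value, "->", interpret_cc(value))
```

The label describes statistical strength only; whether the relation matters clinically is a separate judgement.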


The correlation

• Does NOT tell us if Y is a function of X

• Does NOT tell us if X is a function of Y

• Does NOT tell us if X causes Y

• Does NOT tell us if Y causes X

• The coefficient does NOT tell us what the scatterplot looks like

Strength of correlation

[Figure: two scatter diagrams of height against age]

• Height against age (0 to 8 years): cc = 0.983 (strong)

• Height (cm) against age (0 to 80 years): cc = 0.012 (weak)


Direction of correlation

[Figure: two scatter diagrams]

• Height against age (0 to 8 years): cc = 0.983 (positive)

• Number of cold episodes per year against exposure to sun (h/week): cc = -0.73 (negative)

Correlation coefficient

[Figure: scale of the correlation coefficient, with X the independent variable and Y the dependent variable]

• +1: X increases, Y increases

• 0: X changes, Y does not follow

• -1: X increases, Y decreases


Which test?

• Pearson product moment correlation (r): linear correlation, normal data

• Spearman correlation (R): non-normal data

The Pearson correlation

• Is a measure of the strength of the linear correlation between two variables in one sample

• “r” indicates:

  • Strength of relationship (strong, weak, or none) → from 0 to 1

  • Direction of relationship → either (-)ve or (+)ve
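For reference, the slides do not spell out how “r” is computed. A sketch of the usual textbook definition follows; the notation (n paired observations x_i and y_i, with sample means x̄ and ȳ) is introduced here and is not taken from the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to deviate from their means in the same direction and negative when they deviate in opposite directions, which is where the sign of “r” comes from.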


Pearson Correlation

Assumptions:

1. Variables are quantitative or ordinal

2. Normally distributed variables

3. Linear relationship (monotonic + constant change)

The Pearson “r” is

• Symmetric, since the correlation of x and y is the same as the correlation of y and x

• Unaffected by linear transformations, such as adding a constant to all numbers or dividing all numbers by a positive constant (a negative constant reverses the sign of “r”)

• WARNING: Never compute correlation coefficients for nominal variables, even if they are nicely coded with numbers. A correlation between governorate and income is meaningless
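The symmetry and transformation properties can be checked numerically. The sketch below computes “r” from its usual definition; the helper name `pearson_r` and the hours/payment figures are made up for illustration and do not come from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data (not from the slides).
hours = [10, 20, 30, 40, 50]
payment = [120, 190, 330, 390, 520]

r_xy = pearson_r(hours, payment)
r_yx = pearson_r(payment, hours)                          # swap the two variables
r_shifted = pearson_r([h + 100 for h in hours], payment)  # add a constant to x
r_scaled = pearson_r(hours, [p / 10 for p in payment])    # divide y by a positive constant

print(r_xy, r_yx, r_shifted, r_scaled)
```

All four values should agree up to floating-point rounding, so changing units (for example hours to minutes) leaves the coefficient unchanged.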


The Pearson correlation

• “r” is a measure of LINEAR ASSOCIATION

• When “r” = ZERO → this means NO LINEAR CORRELATION; it does NOT mean there is NO CORRELATION
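A minimal sketch of this point, using made-up data in which y is completely determined by x (y = x²) and yet the Pearson coefficient is zero because the relationship is symmetric rather than linear. The helper `pearson_r` is written out here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]  # perfect, but non-linear, dependence on x

r = pearson_r(x, y)
print(r)  # zero: no LINEAR correlation, although y depends entirely on x
```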


Pearson Correlation

• Pearson “r” is an appropriate summary measure for the first plot only, since data are near a straight line

• In the second plot, the relationship is not linear, so it doesn't make sense to describe how tightly the points cluster around a straight line

• In the third plot, the perfect relationship is distorted by an outlier point

• In the fourth plot, there appear to be two subgroups of cases in which there is no linear relationship between the two variables


Pearson Correlation

• If you don't plot your data, you can't tell whether a correlation coefficient is a good summary of the relationship

• The value of a correlation coefficient also depends on the range of values for which observations are taken

• Even if there is a linear relationship between two variables, you may not detect it if you consider only a small range of values of the variables

• For example, height may be a poor predictor of weight if you restrict your range of heights to those over six feet

No extrapolation beyond the observed range of values
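The effect of a restricted range can be illustrated with synthetic numbers. In the sketch below, y follows x plus a fixed repeating disturbance; the data, the disturbance pattern and the helper `pearson_r` are all assumptions made for the demonstration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data: a linear trend plus a fixed, repeating disturbance.
disturbance = [3, -3, -3, 3]
x_full = list(range(20))
y_full = [x + disturbance[x % 4] for x in x_full]

# Keep only the top end of the x range (the last four observations).
x_top = x_full[-4:]
y_top = y_full[-4:]

r_full = pearson_r(x_full, y_full)
r_top = pearson_r(x_top, y_top)
print(round(r_full, 2), round(r_top, 2))  # about 0.89 over the full range, about 0.35 when truncated
```

The underlying relationship is the same in both cases; only the window of x values differs, which is why a coefficient from a narrow range should not be generalised to the whole range.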


Pearson Correlation

Limitations:

• Linearity: cannot describe non-linear relationships (which include most biological relations)

• Truncation of range: underestimates the strength of the relationship if the full range of x values is not observed

• No proof of causation

Testing hypothesis

• The Pearson correlation coefficient “r” describes the correlation between the sample observations on two variables in the same way that ρ describes the relationship in a population

• Thus we need to know whether we may conclude that ρ ≠ 0

• The hypotheses are:

  • H0: ρ = 0 (no correlation in the population)

  • Ha: ρ ≠ 0 (there is correlation in the population)


Testing hypothesis

• The test used is the t test (revise t test uses)

• Statistically significant doesn’t mean clinically important or useful

• If you are examining many correlation coefficients, you have to use the Bonferroni adjustment
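As a rough sketch of the two calculations mentioned above: the usual t statistic for H0: ρ = 0 is t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom, and the Bonferroni adjustment divides the significance level by the number of correlations tested. The function names and the example values (r = 0.5, n = 27, ten tests) are hypothetical; the slides give no formula or numbers.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level when n_tests correlations are examined."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical values, for illustration only.
t = t_statistic(r=0.5, n=27)
alpha_each = bonferroni_alpha(alpha=0.05, n_tests=10)
print(t, alpha_each)
```

The resulting t is compared with the critical value of the t distribution at n − 2 degrees of freedom (from tables or statistical software), using the adjusted level when several coefficients are tested.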

Coefficient of determination

• The square of the Pearson cc, r², is the proportion of variation in the values of y that is explained by the regression model with x

• Amount of variance accounted for in y by x

• Percentage increase in accuracy you gain by using the regression line to make predictions

• 0 ≤ r² ≤ 1 (100%)

• The larger r², the stronger the linear relationship

• The closer r² is to 1, the more confident we are in our prediction
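One way to write the "proportion of variation explained" is sketched below. The notation is an assumption introduced here: ŷ_i is the value predicted by the least-squares regression line of y on x, and ȳ is the mean of y.

```latex
r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \text{e.g. } r = 0.5 \;\Rightarrow\; r^2 = 0.25 \ (25\%)
```

So a coefficient that looks moderate (r = 0.5) corresponds to only a quarter of the variation in y being accounted for by x.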


Coefficient of determination

Example

• Topography of adipose tissue (AT) is associated with metabolic complications considered as risk factors for cardiovascular disease

• It is therefore important to measure the amount of intra-abdominal AT as part of the evaluation of an individual's cardiovascular-disease risk. However, computed tomography (CT), the only available technique that precisely and reliably measures the amount of deep abdominal AT, is costly, exposes the subject to irradiation and is not available to many physicians


Example

• Despres and his colleagues conducted a study to develop equations to predict the amount of deep abdominal AT from simple anthropometric measurements

• Among the measurements taken on each subject were deep abdominal AT obtained by CT and waist circumference. The question of interest is how well deep abdominal AT correlates with waist circumference

Spearman Correlation

• It is a measure of the strength and direction of association that exists between two variables measured on at least an ordinal scale

• It is denoted by the symbol rs (or R)

• The test is used either for ordinal variables or for interval/ratio data that have failed the assumptions necessary for conducting the Pearson product-moment correlation

• The values of the variables are converted to ranks and then correlated
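The "convert to ranks, then correlate" step can be sketched as follows. Giving tied values the mean of their ranks is a common convention assumed here (the slides do not say how ties are handled), and the function names and data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman coefficient: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [value ** 3 for value in x]  # monotonic but not linear (made-up data)

print(average_ranks([10, 20, 20, 30]))  # the two tied values share rank 2.5
print(spearman_rs(x, y), pearson_r(x, y))
```

Because y rises whenever x rises, the ranks agree perfectly and the Spearman coefficient is 1, while the Pearson coefficient on the raw values is lower because the relationship is curved.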


Spearman Correlation

Assumptions:

1. Variables are measured on an ordinal, interval or ratio scale

2. Variables need NOT be normally distributed

3. There is a monotonic relationship (either the variables increase in value together or as one variable value increases the other variable value decreases) but linearity is not needed

4. This type of correlation is NOT very sensitive to outliers
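Point 4 can be illustrated with made-up data containing one extreme value. The sketch uses the rank-difference form rs = 1 − 6Σd² / (n(n² − 1)), which is valid only when there are no ties; the helper names and the data are assumptions for the demonstration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks_no_ties(values):
    """Ranks from 1 (smallest); assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman coefficient via the rank-difference formula (no ties assumed)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(x), ranks_no_ties(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = list(range(1, 11))
y = list(range(1, 10)) + [1000]  # the last y value is an extreme outlier

print(round(pearson_r(x, y), 2), spearman_rs(x, y))  # Pearson about 0.53, Spearman 1.0
```

The outlier keeps its rank (largest y with largest x), so the Spearman coefficient is unaffected, while the Pearson coefficient drops sharply even though y still increases with x throughout.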

SPSS work


Example

no.   x (WC)   y (Abd. AT)
1     74.8     28.8
2     72.6     25.7
3     84.0     42.8
4     74.7     25.9
5     71.9     21.7
6     80.9     39.1
7     83.4     42.6
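As a hedged illustration, the Pearson coefficient and the coefficient of determination can be computed for the seven rows shown above. These are only the rows visible on the slide, so the result describes this excerpt and not the full study; the variable and function names are chosen here for the example.

```python
import math

# The seven rows shown on the slide: waist circumference (x) and deep abdominal AT (y).
wc = [74.8, 72.6, 84.0, 74.7, 71.9, 80.9, 83.4]
abd_at = [28.8, 25.7, 42.8, 25.9, 21.7, 39.1, 42.6]


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(wc, abd_at)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For these seven pairs r comes out at roughly 0.99, a strong positive linear correlation; with so few observations the estimate should be read cautiously.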

Multiple Correlation