Statistics for the Behavioral Sciences (5th ed.), Gravetter & Wallnau. Chapter 16: Correlations and Regression. University of Guelph, Psychology 3320, Dr. K. Hennig, Winter 2003 Term

Upload: ezra-owens

Post on 23-Dec-2015


Page 1:

Statistics for the Behavioral Sciences (5th ed.)

Gravetter & Wallnau

Chapter 16

Correlations and Regression

University of Guelph
Psychology 3320, Dr. K. Hennig

Winter 2003 Term

Page 2:

Overview of chapter

Correlations
– Pearson r
– For non-linear (non scalar) data:
  Spearman r (with non-linear data)
  point-biserial (where one variable is dichotomous)
  phi-coefficient (where both variables are dichotomous)

Regressions

Page 3:

CORRELATIONS: Figure 16-1 (p. 522)
The relationship between exam grade and time needed to complete the exam. Notice the general trend in these data: Students who finish the exam early tend to have better grades.

Page 4:

Figure 16-2 (p. 523)
The same set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. Notice that the scatterplot allows you to see the relationship between X and Y.

Page 5:

Three characteristics

1. Direction: Examples of positive and negative relationships. (a) Beer sales are positively related to temperature. (b) Coffee sales are negatively related to temperature.

Page 6:

2. Form: Examples of relationships that are not linear: (a) relationship between reaction time and age; (b) relationship between mood and drug dose.

Page 7:

3. Degree: Examples of different values for linear correlations: (a) shows a strong positive relationship, approximately +0.90; (b) shows a relatively weak negative correlation, approximately -0.40; (c) shows a perfect negative correlation, -1.00; (d) shows no linear trend, 0.00.

Page 8:

Pearson (product-moment) correlation

Sum of products of deviations, or SP:

$SP = \sum (X - M_X)(Y - M_Y)$, where $M_X$ = mean of the X scores and $M_Y$ = mean of the Y scores.

Recall: $SS = \sum (X - M)^2 = \sum (X - M)(X - M)$
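The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not course material: the function names are made up here, and the scores are borrowed from the small regression example later in this chapter (X = 1, 3, 5 and Y = 4, 9, 8).

```python
def mean(values):
    """Arithmetic mean M of a list of scores."""
    return sum(values) / len(values)


def sum_of_products(xs, ys):
    """SP = sum of (X - M_X)(Y - M_Y) over the paired scores."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sum_of_squares(xs):
    """SS = sum of (X - M)^2, which is SP of a variable with itself."""
    return sum_of_products(xs, xs)


# Scores from the chapter's later regression example.
X = [1, 3, 5]
Y = [4, 9, 8]

print(sum_of_products(X, Y))  # 8.0
print(sum_of_squares(X))      # 8.0
print(sum_of_squares(Y))      # 14.0
```

Writing SS as SP of a variable with itself mirrors the "Recall" line: SS multiplies each deviation by itself, while SP multiplies an X deviation by the paired Y deviation.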

Page 9:

Pearson (product-moment) correlation

$r = \dfrac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$

Computational formula: $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$

Expressed in terms of z-scores: $r = \dfrac{\sum z_X z_Y}{n}$ (note: the z-scores must be computed with the population standard deviation)
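As a rough check that the two routes to r agree, the sketch below computes r once from the computational formula for SP and once from z-scores. The function names are invented for this illustration, and the data are the X = 1, 3, 5 and Y = 4, 9, 8 scores from the chapter's later regression example.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)


def pearson_r_from_z(xs, ys):
    """r = sum(z_X * z_Y) / n, with z-scores based on the population SD."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Population standard deviation: divide SS by n, not n - 1.
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sd_x) * ((y - my) / sd_y)
               for x, y in zip(xs, ys)) / n


X = [1, 3, 5]
Y = [4, 9, 8]

print(round(pearson_r(X, Y), 4))         # 0.7559
print(round(pearson_r_from_z(X, Y), 4))  # 0.7559
```

If the z-scores were computed with the sample standard deviation (dividing by n - 1), the second function would no longer match the first, which is the point of the note on the slide.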

Page 10:

Understanding and interpreting r

1) Correlations do not prove causation, but they can disprove causation.

2) The value of a correlation can be affected greatly by the range of scores in the data.

3) Outliers can have a dramatic effect.

4) Do not interpret a correlation as a proportion (e.g., r = 0.50 does not mean 50%); rather, $r^2$ = .25, or 25% of the total variability, is accounted for. $r^2$ is called the coefficient of determination.

Page 11:

The effect of range
(a) In this example, the full range of X and Y values shows a strong, positive correlation, but the restricted range of scores produces a correlation near zero.
(b) An example in which the full range of X and Y values shows a correlation near zero, but the scores in the restricted range produce a strong, positive correlation.
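Case (a) can be reproduced with a small made-up data set. The scores below are hypothetical, chosen so that the full range shows a strong positive correlation while the middle of the range shows none; they are not the data behind the figure.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical scores covering the full range of X.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 3, 2, 5, 4, 5, 8, 7, 10]

# Restricted range: keep only the pairs with X between 4 and 6.
restricted = [(x, y) for x, y in zip(X, Y) if 4 <= x <= 6]
X_restricted = [x for x, _ in restricted]
Y_restricted = [y for _, y in restricted]

r_full = pearson_r(X, Y)
r_restricted = pearson_r(X_restricted, Y_restricted)

print(round(r_full, 2))        # 0.94
print(round(r_restricted, 2))  # 0.0
```

The practical reading is that a correlation describes only the range of scores actually sampled and should not be generalized beyond it.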

Page 12:

Outliers
A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
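A small numerical sketch of the same idea, using hypothetical scores (not the data in the figure): five points with almost no linear trend, then the same five points plus one extreme pair.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical scores with only a weak linear trend.
X = [1, 2, 3, 4, 5]
Y = [3, 1, 4, 2, 3]

# The same scores with one extreme point (15, 15) added.
X_outlier = X + [15]
Y_outlier = Y + [15]

r_without = pearson_r(X, Y)
r_with = pearson_r(X_outlier, Y_outlier)

print(round(r_without, 2))  # 0.14
print(round(r_with, 2))     # 0.95
```

A single pair moves r from roughly 0.14 to roughly 0.95, which is why a scatterplot should be inspected before a correlation is interpreted.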

Page 13:

Hypothesis testing

$H_0$: $\rho = 0$ (there is no population correlation)

$H_1$: $\rho \neq 0$ (there is a real correlation)
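The slide states only the hypotheses. One standard way to test them, which may differ from the procedure used in the course (for example, a table of critical values of r), is to convert r to a t statistic with df = n - 2. The sketch below assumes that approach; the function name is invented here, and the example values r = 0.80 and n = 9 are taken from the chapter's final figure. The critical value 2.365 is the two-tailed t for df = 7 at alpha = .05.

```python
import math


def t_for_correlation(r, n):
    """Return (t, df) for testing H0: rho = 0, where t = r * sqrt(df / (1 - r^2))."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Example values from the chapter's final figure: r = 0.80 with n = 9 pairs.
t, df = t_for_correlation(0.80, 9)

# Two-tailed critical t for df = 7 at alpha = .05 (from a standard t table).
T_CRITICAL_DF7 = 2.365
reject_h0 = abs(t) > T_CRITICAL_DF7

print(df)           # 7
print(round(t, 2))  # 3.53
print(reject_h0)    # True
```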

Page 14:

CORRELATIONS: For non-linear relations
Relationship between practice and performance. Although this relationship is not linear, there is a consistent positive relationship. An increase in performance tends to accompany an increase in practice.

Page 15:

Spearman r: Scatterplots showing (a) the scores and (b) the ranks for the data in Example 16.8. Notice that there is a consistent, positive relationship between the X and Y scores, although it is not a linear relationship. Also notice that the scatterplot of the ranks shows a perfect linear relationship.

Steps:
1. Rank order the scores.
2. Use the Pearson r formula on the ranks, or the special formula:

$r_s = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$

where D is the difference between the X rank and the Y rank for each individual.
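The two steps can be sketched as follows. This is an illustration under stated assumptions: the function names are made up, the scores are hypothetical (not the Example 16.8 data), and the ranking helper assumes there are no tied scores, which is also when the special formula applies exactly.

```python
def ranks(values):
    """Rank the scores from 1 (smallest) to n. Assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), D = difference in ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical practice/performance scores: consistent but not linear.
practice = [1, 2, 3, 4, 5]
performance = [2, 9, 12, 13, 13.5]
print(spearman_rs(practice, performance))  # 1.0

# Same X scores, with two Y scores swapped out of order.
performance_swapped = [2, 9, 8, 13, 14]
print(spearman_rs(practice, performance_swapped))  # 0.9
```

The first data set is curved in the raw scores but perfectly ordered, so the ranks line up exactly and r_s is 1.0.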

Page 16:

Other measures of relationship

Point-biserial - where one variable is dichotomous (has two values; male vs. female, first-born vs. later born, etc.)

phi-coefficient - where both variables are dichotomous (e.g., the variable above, birth order: 1st vs. later born)
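Both measures can be obtained by coding each dichotomous variable as 0 or 1 and applying the ordinary Pearson formula. The sketch below assumes that coding approach; the function name and all scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)


# Point-biserial: one dichotomous variable (0 = later born, 1 = first born)
# and one numerical score. Hypothetical data.
birth_order = [0, 0, 0, 1, 1, 1]
score = [2, 4, 6, 6, 8, 10]
r_pb = pearson_r(birth_order, score)
print(round(r_pb, 4))  # 0.7746

# Phi-coefficient: both variables dichotomous, each coded 0/1.
# Hypothetical data.
first_born = [0, 0, 0, 0, 1, 1, 1, 1]
other_dichotomy = [0, 0, 0, 1, 0, 1, 1, 1]
phi = pearson_r(first_born, other_dichotomy)
print(phi)  # 0.5
```

Which category receives the 1 is arbitrary; swapping the codes flips the sign of the result but not its size.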

Page 17:

Introduction to regression
SAT scores and GPA - regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (GPA).

Page 18:

Relationship between total cost and number of hours playing tennis. The tennis club charges a $25 membership fee plus $5 per hour. The relationship is described by a linear equation: Total cost = $5 (number of hours) + $25, which has the form Y = bX + a.

The statistical technique for finding a best-fit line is called regression.

Page 19:

The distance between the actual data point (Y) and the predicted point on the line (Ŷ) is defined as Y - Ŷ. The goal of regression is to find the equation for the line that minimizes these distances.

Page 20:

Best-fit straight line. The predicted Y values (Ŷ) are on the regression line. Unless the correlation is perfect (+1.00 or -1.00), there will be some error between the actual Y values and the predicted Y values. The larger the correlation is (in absolute value), the less the error will be.

Page 21:

(a) Scatterplot showing data points that perfectly fit the regression equation Ŷ = 1.6X - 2. Note that the correlation is r = 1.00. (b) Scatterplot for the data from Example 16.14. Notice that there is error between the actual data points and the predicted Y values of the regression line.
- total squared error = $\sum (Y - \hat{Y})^2$ -> least-squares solution

Page 22:

Regression (contd.)

The regression equation for Y is the linear equation:
– Goal is to find the best a and b for the best-fit line
– $\hat{Y} = bX + a$, where:
– $b = SP / SS_X$, and $a = M_Y - bM_X$

$SP = \sum (X - M_X)(Y - M_Y)$

$SS_X = \sum (X - M_X)^2$

Example: X = 1, 3, 5; Y = 4, 9, 8 (from text p. 559)
What are the predicted values for 5, 7, 9?
SPSS
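The example can be worked through with a short script. This is a sketch rather than the SPSS procedure the slide points to; the function name is made up, and it assumes that "5, 7, 9" are new X values for which Y is to be predicted.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = b*X + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x        # slope: b = SP / SS_X
    a = my - b * mx      # intercept: a = M_Y - b * M_X
    return b, a


X = [1, 3, 5]
Y = [4, 9, 8]

b, a = fit_line(X, Y)
print(b, a)  # 1.0 4.0

# Predicted Y values, assuming 5, 7, 9 are new X values.
predictions = [b * x + a for x in [5, 7, 9]]
print(predictions)  # [9.0, 11.0, 13.0]

# Total squared error of the fitted line on the original data.
ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
print(ss_error)  # 6.0
```

Here SP = 8 and SS_X = 8, so b = 1 and a = 7 - 1(3) = 4, giving Ŷ = X + 4. Note that X = 7 and X = 9 lie outside the range of the original X scores, so those two predictions are extrapolations.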

Page 23:

A set of 9 data points (X and Y values) with a correlation of r = 0.80. The colored lines in part (a) show deviations from the mean for Y. For these data, $SS_Y$ = 240 (total variability). In part (b) the colored lines show deviations from the regression line. For these data, $SS_{error}$ = 86.4. The regression line reduces the SS value by $r^2$ = 0.64, or 64%. Proportion of error variability = $1 - r^2$.
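The numbers on this slide fit together as shown in the worked arithmetic below. The label $SS_{\text{regression}}$ for the explained portion is an assumed name, not one used on the slide.

```latex
\begin{align*}
r^2 &= (0.80)^2 = 0.64 \\
SS_{\text{regression}} &= r^2 \, SS_Y = 0.64 \times 240 = 153.6 \\
SS_{\text{error}} &= (1 - r^2) \, SS_Y = 0.36 \times 240 = 86.4 \\
SS_{\text{regression}} + SS_{\text{error}} &= 153.6 + 86.4 = 240 = SS_Y
\end{align*}
```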