ch13 - linear regression and correlation


Post on 12-Jan-2022


Page 1: Ch13 - Linear Regression and Correlation


Linear Regression and Correlation

Dr. Rick Jerz


Correlation and Regression Analysis

• Correlation Analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.

• Regression Analysis is a technique used to express the relationship between two variables. If the relationship is assumed to be a straight line, this is called “linear regression.”

• Variables must be measured on an interval or ratio scale.


Three Questions

1. Are two variables related? (correlation analysis)

2. Is there a linear relationship between two variables? (linear regression analysis)

3. How strong are these relationships?


Correlation and Linear Regression


Correlation & Regression Example

• The sales manager of Copier Sales of America, which has a large sales force throughout the United States and Canada, wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 15 representatives and determines the number of sales calls each representative made last month and the number of copiers sold.


Step 1: Look at the Data (Plot the Data)

• A Scatter Diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.

• The Dependent Variable is the variable being predicted or estimated. It is plotted on the y-axis.

• The Independent Variable provides the basis for estimation. It is the predictor variable. It is plotted on the x-axis.


Step 1: Look at the Data (Plot the Data)


Step 2: Are they correlated?


The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables. It requires interval or ratio-scaled data.

• It can range from -1.00 to 1.00.

• Values of -1.00 or 1.00 indicate perfect correlation.

• Values close to 0.0 indicate weak correlation.

• Negative values indicate an inverse relationship, and positive values indicate a direct relationship.


Correlation Coefficient: Interpretation


Correlation Coefficient Equation, r

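$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

For the copier data, $r = 0.865$.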
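The computational formula for r can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the five data points are hypothetical, chosen so the sums are easy to verify by hand (the copier data behind r = 0.865 is not reproduced in this transcript).

```python
import math

def correlation(x, y):
    """Coefficient of correlation r via the computational formula:
    r = (nΣXY − ΣXΣY) / sqrt([nΣX² − (ΣX)²][nΣY² − (ΣY)²])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55, ΣY² = 86
calls = [1, 2, 3, 4, 5]
sold = [2, 4, 5, 4, 5]
print(round(correlation(calls, sold), 4))  # 0.7746
```

Here the numerator is 5(66) − 15(20) = 30 and the denominator is √(50 × 30) ≈ 38.73, so r ≈ 0.775, a fairly strong direct relationship.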


Correlation Coefficient Calculation


“The Variables are Related”

• A strong coefficient of correlation only means that the two variables are strongly associated.

• It does not mean that one "causes" the other.

• Be careful of spurious relationships.


Coefficient of Determination

The coefficient of determination ($r^2$) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation.

• It ranges from 0 to 1.

• It does not give any information on the direction of the relationship between the variables.


Coefficient of Determination ($r^2$) - Example

• The coefficient of determination, $r^2$, is 0.748, found by $(0.865)^2$.

• Because it is a proportion, it can also be read as a percent: about 75 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.
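For a straight-line fit, "the square of r" and "the proportion of variation in Y explained by X" are the same number. A small Python sketch shows both routes: squaring r directly, and computing 1 − SSE/SST from the least-squares line, where SSE is the unexplained variation and SST the total variation of Y about its mean. The five-point data set is hypothetical.

```python
def r_squared_from_r(r):
    """Coefficient of determination as the square of r."""
    return r ** 2

def r_squared_from_fit(x, y):
    """Same quantity as 1 - SSE/SST for the least-squares line:
    the share of the variation in Y accounted for by X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                      # total
    return 1 - sse / sst

print(round(r_squared_from_r(0.865), 3))  # 0.748 (copier example)
# Hypothetical data: r ≈ 0.7746, so r² = 0.6
print(round(r_squared_from_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.6
```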


Step 3: Testing the Significance of the Correlation Coefficient

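• H0: ρ = 0 (the correlation in the population is 0)

• H1: ρ ≠ 0 (the correlation in the population is not 0)

• Test statistic, with n − 2 degrees of freedom: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$

• Reject H0 if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$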


t-Test Using Tables

The computed t (6.216) is greater than the critical value (4.320), so it falls in the rejection region and we reject H0. This means the correlation in the population is not zero. From a practical standpoint, it tells the sales manager that there is a correlation between the number of sales calls made and the number of copiers sold in the population of salespeople.
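The test statistic for H0: ρ = 0 is $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n − 2 degrees of freedom. A minimal Python sketch reproduces the slide's value from r = 0.865 and n = 15. The critical value is passed in rather than computed, on the assumption that it is read from a t table as on the slide, so no statistics library is needed.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom.
    Assumes -1 < r < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def reject_h0(t, t_critical):
    """Two-tailed rule: reject if t > t_critical or t < -t_critical."""
    return t > t_critical or t < -t_critical

# Copier example from the slides: r = 0.865, n = 15 (13 degrees of freedom)
t = correlation_t(0.865, 15)
print(round(t, 3))          # 6.216
print(reject_h0(t, 4.320))  # True with the critical value quoted on the slide
```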


Step 4: Is there a linear relationship?

Regression Analysis

• An equation?

• A linear equation


Example: Robot Repeatability Data


Bivariate Regression Analysis

• Bivariate regression analyzes the relationship between two variables.

• It specifies one dependent (response) variable and one independent (predictor) variable.

• The hypothesized relationship may be linear, quadratic, or some other form.


Linear Regression

• The unknown parameters are β0 (or a), the intercept, and β1 (or b), the slope.

• The assumed model for a linear relationship is

$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$

or, equivalently,

$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n$$

• The error term $e_i$ is not observable; it is assumed to be normally distributed with a mean of 0 and standard deviation σ.
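One way to make the model concrete is to simulate it: pick values for a, b and σ, then generate each $y_i$ as $a + b x_i + e_i$ with a normally distributed error. This Python sketch uses only the standard `random` module; the parameter values and X values below are hypothetical and only loosely echo the copier line.

```python
import random

def simulate(a, b, sigma, xs, seed=0):
    """Draw y_i = a + b*x_i + e_i, where e_i ~ Normal(mean 0, sd sigma)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameters and X values (10, 20, ..., 100)
xs = list(range(10, 110, 10))
ys = simulate(a=20.0, b=0.26, sigma=3.0, xs=xs, seed=42)
for x, y in zip(xs, ys):
    print(x, round(y, 1))
```

Because the errors are random, a line fitted to `ys` would land near, but generally not exactly on, a = 20 and b = 0.26; with σ = 0 every point falls on the line itself.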


Linear Regression Model

$$\hat{Y} = a + bX$$

• $\hat{Y}$ ("Y hat") is the estimate of Y given X

• a is the Y-intercept

• b is the slope of the line

• X is any value of the independent variable


Which Line is Correct?


Least Squares Principle

• Determine the regression equation by minimizing the sum of the squared vertical distances between the actual Y values and the predicted values of Y.
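The least squares principle gives a concrete answer to "Which line is correct?": compute the sum of squared vertical distances (SSE) for each candidate line and prefer the smaller one. A minimal Python sketch with a hypothetical five-point data set; for these points the least-squares line is $\hat{Y} = 2.2 + 0.6X$, which can be verified with the slope and intercept formulas.

```python
def sse(x, y, a, b):
    """Sum of squared vertical distances between observed y and a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(sse(x, y, 2.2, 0.6), 4))  # 2.4  (least-squares line)
print(round(sse(x, y, 2.0, 0.7), 4))  # 2.55 (another plausible-looking line)
```

No other choice of a and b gives a smaller SSE than 2.4 for these points; that minimum is what "least squares" means.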


Calculating the y-Intercept, a, and Slope, b

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

$$a = \bar{Y} - b\bar{X}$$
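The slope and intercept formulas translate directly into Python. This sketch uses a hypothetical five-point data set that can be checked by hand: ΣXY = 66, ΣX² = 55, X̄ = 3 and Ȳ = 4, so b = (66 − 60)/(55 − 45) = 0.6 and a = 4 − 0.6(3) = 2.2.

```python
def least_squares(x, y):
    """Slope b = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²); intercept a = Ȳ − b·X̄."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data
a, b = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 2.2 0.6
```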


Example: Finding the Regression Equation

$$\hat{Y} = 19.9632 + 0.2608X$$


Example: Using the Regression Equation

• How many copiers would we expect to sell if we made 96 sales calls?

$$\hat{Y} = 19.9632 + 0.2608(96) = 45$$
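Using the regression equation is just evaluating $\hat{Y} = a + bX$. A tiny Python sketch with the copier coefficients as defaults; the function name `predict` is illustrative only.

```python
def predict(x, a=19.9632, b=0.2608):
    """Estimated Y (Y-hat) from the copier regression line."""
    return a + b * x

print(round(predict(96), 1))  # 45.0  (19.9632 + 25.0368)
```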


Step 3: Testing the Significance of the Slope

• H0: β1 = 0 (the population slope is 0)

• H1: β1 ≠ 0 (the population slope is not 0)

• Reject H0 if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$, where $t = b / s_b$ and $s_b$ is the standard error of the slope
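The slope test statistic is $t = b / s_b$, where $s_b = s_{y\cdot x} / \sqrt{\sum (X - \bar{X})^2}$ and $s_{y\cdot x} = \sqrt{\text{SSE}/(n-2)}$ is the standard error of estimate. A minimal Python sketch, standard library only, applied to a hypothetical five-point data set.

```python
import math

def slope_t(x, y):
    """Return (b, s_b, t) for testing H0: beta1 = 0 with n - 2 df."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of estimate
    s_b = s_yx / math.sqrt(sxx)      # standard error of the slope
    return b, s_b, b / s_b

# Hypothetical data
b, s_b, t = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 4), round(s_b, 4), round(t, 4))  # 0.6 0.2828 2.1213
```

With n = 5 there are 3 degrees of freedom, and the two-tailed 5 percent critical value is 3.182, so t ≈ 2.12 would not lead us to reject H0 for this hypothetical data. In simple linear regression this t equals the correlation test statistic $r\sqrt{n-2}/\sqrt{1-r^2}$ computed from the same data.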


Assumptions Underlying Linear Regression

• For each value of X, there is a group of Y values, and these Y values are normally distributed. The means of these normal distributions of Y values all lie on the straight line of regression.

• The standard deviations of these normal distributions are equal.

• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.


Assumptions: Graphic


Confidence and Prediction Intervals

• Prediction interval: used to predict an individual value of Y for a given value of X

• Confidence interval: used to estimate the mean value of Y for all observations that share a given value of X


Confidence Interval Estimate

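• Substitute the values computed on the previous slides into the confidence interval formula for the mean of Y at a given X:

$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, s_{y\cdot x} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

where $s_{y\cdot x}$ is the standard error of estimate.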

• Thus, the 95 percent confidence interval for the average sales of all sales representatives who make 50 calls is from 27.4 up to 38.6 copiers.


Prediction Interval Estimate

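• Substitute the values computed on the previous slides into the prediction interval formula for an individual Y at a given X:

$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, s_{y\cdot x} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

where $s_{y\cdot x}$ is the standard error of estimate.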

• Thus, the 95 percent prediction interval for the number of copiers sold by an individual sales representative who makes 50 calls is from 17.4 up to 48.6 copiers.
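Both intervals are centered on $\hat{Y}$; they differ only in the "1 +" term, which adds the scatter of an individual observation around the line. A minimal Python sketch under these assumptions: the data set is hypothetical, and the critical t value is supplied by the caller (3.182 is the two-tailed 95 percent value for 3 degrees of freedom) rather than computed by a statistics library.

```python
import math

def intervals(x, y, x0, t_crit):
    """Return (confidence interval for the mean of Y at x0,
    prediction interval for an individual Y at x0)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))           # standard error of estimate
    y_hat = a + b * x0
    term = 1 / n + (x0 - x_bar) ** 2 / sxx    # shared part under the square root
    ci_half = t_crit * s_yx * math.sqrt(term)
    pi_half = t_crit * s_yx * math.sqrt(1 + term)
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical data; 3.182 = t for a 95% two-tailed interval with 3 df
ci, pi = intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print([round(v, 2) for v in ci])  # [2.73, 5.27]
print([round(v, 2) for v in pi])  # [0.88, 7.12]
```

As in the copier example (27.4 to 38.6 versus 17.4 to 48.6), the prediction interval is always wider than the confidence interval at the same X.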


Graphic Illustration
