Basic Statistics Linear Regression

Upload: marianna-dickerson

Post on 18-Jan-2016


TRANSCRIPT

Page 1: Basic Statistics Linear Regression. X Y Simple Linear Regression

Basic Statistics

Linear Regression

Page 2: Basic Statistics Linear Regression. X Y Simple Linear Regression

X

Y

Simple Linear Regression

Page 3: Basic Statistics Linear Regression. X Y Simple Linear Regression

Predicting Y from X

• Recall that when we looked at scatter plots in our discussion of correlation, we showed roughly how Y could be estimated from a given value of X, even when the correlation was not perfect.

• We will now look at how to use our knowledge of the correlation to predict a value for Y, when we know a value for X.

Page 4: Basic Statistics Linear Regression. X Y Simple Linear Regression

[Figure: Scatter Plot of Y and X. Horizontal axis: Variable X (low to high). Vertical axis: Variable Y (low to high). A point on the line is labeled "Estimated Y value".]

The GREEN line shows our prediction or regression line.

Page 5: Basic Statistics Linear Regression. X Y Simple Linear Regression

Prediction Equation

• The green line in the previous slide showed us our prediction line.

• We will use the mathematical formula for a straight line as the method for predicting a value for Y when we know the value for X.

• The process is called “Linear Regression” because, in this class, we will only deal with relationships that can be fitted by a straight line.

• The general formula for a straight line is:

\[ \hat{Y} = a_y + b_y X \]

Page 6: Basic Statistics Linear Regression. X Y Simple Linear Regression

The Prediction Equation

• \(a_y\) = the intercept, or where the prediction line crosses the Y-axis (the value of Y when X = 0)

• \(b_y\) = the regression coefficient that indicates the amount of change in Y when the value of X increases one unit.

\[ \hat{Y} = a_y + b_y X \]

Page 7: Basic Statistics Linear Regression. X Y Simple Linear Regression

A Simple Example

• Suppose that a club charges a flat $25 to use their facilities.

• They also charge a $10 fee per hour for using the tennis courts.

• Now, assume that you want to play tennis for 2 hours at this club. How much would you have to pay?

Ŷ = $25 + (2)($10) = $25 + $20 = $45 for two hours of tennis

Page 8: Basic Statistics Linear Regression. X Y Simple Linear Regression

Linking the Simple Example to Regression

Ŷ= $25 + (2) $10 = $25 + $20 = $45 for two hours of tennis

• In our example:

– $25 is \(a_y\), the intercept. Even if we didn’t play any tennis (X = 0), it would cost $25 to use the club.

– $10 is \(b_y\), the regression coefficient (it costs $10 for each hour of tennis played).

• In this case we predicted how much it would cost (Y) when we knew how long we wanted to play tennis.
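The tennis example can be written as a tiny function. This is only an illustrative sketch; the names `predict`, `flat_fee`, and `hourly_fee` are made up here and do not come from the slides.

```python
def predict(intercept, slope, x):
    """Return the predicted value: intercept + slope * x."""
    return intercept + slope * x


flat_fee = 25    # intercept: cost of using the club with zero hours of tennis
hourly_fee = 10  # regression coefficient: extra cost per hour of tennis

cost = predict(flat_fee, hourly_fee, 2)
print(cost)  # 45
```

Setting the hours to 0 returns the flat fee of 25, which is exactly what the intercept means.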

Page 9: Basic Statistics Linear Regression. X Y Simple Linear Regression

Formulae for Sums of Squares

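\[ SS_x = \sum X^2 - \frac{(\sum X)^2}{n} \]

\[ SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} \]

\[ SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} \]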

These were introduced in our discussion of correlation.
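As a rough sketch, the three sums of squares can be computed directly from paired lists of scores. The function name `sums_of_squares` and the toy data are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def sums_of_squares(xs, ys):
    """Return (SSx, SSy, SSxy) using the computational formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy


# Hypothetical toy data, not taken from the slides.
toy_x = [1, 2, 3]
toy_y = [2, 4, 6]
print(sums_of_squares(toy_x, toy_y))  # (2.0, 8.0, 4.0)
```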

Page 10: Basic Statistics Linear Regression. X Y Simple Linear Regression

Calculating the Regression Coefficient (b)

\[ b = \frac{SS_{xy}}{SS_x} \]

or

\[ b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \]

Page 11: Basic Statistics Linear Regression. X Y Simple Linear Regression

Calculating the Intercept (a)

\[ a = \bar{Y} - b\bar{X} \]

You will notice that you must calculate the regression coefficient (b) before you can calculate the intercept (a), since the calculation of a uses b.
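The order of the two calculations can be seen in a short sketch: b first, then a. The function name `fit_line` and the toy data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line Y-hat = a + b * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = ss_xy / ss_x               # regression coefficient comes first
    a = sum_y / n - b * sum_x / n  # intercept uses b and the two means
    return a, b


# Hypothetical toy data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3], [3, 5, 7])
print(a, b)  # 1.0 2.0
```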

Page 12: Basic Statistics Linear Regression. X Y Simple Linear Regression

An Example

• From our earlier example, suppose that our college statistics professor is interested in predicting how many errors students might make on the mid-term examination based on how many hours they studied. Specifically, the professor wants to know how many errors a student might make if the student studied for 5 hours.

Page 13: Basic Statistics Linear Regression. X Y Simple Linear Regression

The Stats Professor’s Data

Student    X    Y    X²    Y²    XY
1          4   15    16   225    60
2          4   12    16   144    48
3          5    9    25    81    45
4          6   10    36   100    60
5          7    8    49    64    56
6          7    4    49    16    28
7          7    6    49    36    42
8          9    2    81     4    18
9          9    4    81    16    36
10        12    3   144     9    36

Total: ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429
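The column totals can be checked with a few lines of Python. The variable names `hours` and `errors` are illustrative; the values are the X and Y columns of the professor's data.

```python
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]     # X: hours studied
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]  # Y: errors on the mid-term

sum_x = sum(hours)
sum_y = sum(errors)
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in errors)
sum_xy = sum(x * y for x, y in zip(hours, errors))

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 70 73 546 695 429
```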

Page 14: Basic Statistics Linear Regression. X Y Simple Linear Regression

The Resulting Sum of Squares

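Totals: ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429, with n = 10

\[ SS_x = \sum X^2 - \frac{(\sum X)^2}{n} = 546 - \frac{70^2}{10} = 546 - 490 = 56 \]

\[ SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 695 - \frac{73^2}{10} = 695 - 532.9 = 162.1 \]

\[ SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 429 - \frac{(70)(73)}{10} = 429 - 511 = -82 \]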
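The same three sums of squares can be reproduced from the column totals alone. This is a minimal check, with variable names chosen here for readability rather than taken from the slides.

```python
n = 10
sum_x, sum_y = 70, 73
sum_x2, sum_y2, sum_xy = 546, 695, 429

ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(round(ss_x, 1), round(ss_y, 1), round(ss_xy, 1))  # 56.0 162.1 -82.0
```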

Page 15: Basic Statistics Linear Regression. X Y Simple Linear Regression

Calculating the Regression Coefficient (b)

\[ b = \frac{SS_{xy}}{SS_x} = \frac{-82}{56} = -1.46 \]

This can be interpreted as the change in the value of Y (in our case, errors made on the mid-term), for a unit change in X, or for us, each additional hour studied! Thus, study for another hour and make 1.46 fewer mistakes (on average!).

Page 16: Basic Statistics Linear Regression. X Y Simple Linear Regression

Calculating the Intercept (a)

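\[ a = \bar{Y} - b\bar{X} = 7.3 - (-1.46)(7) = 7.3 + 10.25 = 17.55 \]

(The product 10.25 comes from the unrounded b = -82/56; multiplying the rounded -1.46 by 7 would give 10.22.)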

Therefore, our prediction equation is Ŷ = 17.55 + (-1.46) (X)
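A short sketch of the two calculations for the professor's data, assuming the sums of squares SSx = 56 and SSxy = -82 and the means 7.0 and 7.3 from the worked example. Keeping b unrounded until the end is what yields an intercept of 17.55.

```python
ss_x, ss_xy = 56, -82      # sums of squares from the worked example
mean_x, mean_y = 7.0, 7.3  # 70 / 10 and 73 / 10

b = ss_xy / ss_x           # regression coefficient
a = mean_y - b * mean_x    # intercept, computed with the unrounded b

print(round(b, 2), round(a, 2))  # -1.46 17.55
```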

Page 17: Basic Statistics Linear Regression. X Y Simple Linear Regression

Using Our Prediction Equation

Ŷ = 17.55 + (-1.46) (X)

If the professor wanted to predict the number of errors a student might make if the student had studied for 5 hours, then we would substitute 5 for X in the above equation and obtain:

Ŷ = 17.55 + (-1.46) (5) = 17.55 + (-7.3) = 10.25

Thus, the professor would predict 10.25 errors for a student who had studied for 5 hours.
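The substitution can be sketched as a function of hours studied. The function name `predicted_errors` is hypothetical, and its defaults are the rounded coefficients used on the slide; passing the unrounded slope -82/56 shifts the prediction slightly.

```python
def predicted_errors(hours, a=17.55, b=-1.46):
    """Predicted mid-term errors for a given number of hours studied."""
    return a + b * hours


print(round(predicted_errors(5), 2))               # 10.25
print(round(predicted_errors(5, b=-82 / 56), 2))   # 10.23
```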

Page 18: Basic Statistics Linear Regression. X Y Simple Linear Regression

Measuring Prediction Errors: The Standard Error of the Estimate

Since we know that the estimate is not exact, as statisticians, we must report how much error we feel is in our estimate. The formula is:

\[ S_{y/x} = \sqrt{\frac{SS_y - \frac{\left[\sum XY - \frac{(\sum X)(\sum Y)}{n}\right]^2}{SS_x}}{n - 2}} \]

OR

\[ S_{y/x} = \sqrt{\frac{SS_y (1 - r^2)}{n - 2}} \]

Page 19: Basic Statistics Linear Regression. X Y Simple Linear Regression

Calculating the Standard Error of the Estimate

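\[ S_{y/x} = \sqrt{\frac{SS_y (1 - r^2)}{n - 2}} = \sqrt{\frac{(1 - .74)(162.1)}{8}} = 2.29 \]

Here .74 is the squared correlation, \(r^2 = SS_{xy}^2 / (SS_x \, SS_y)\).

Thus, when we estimated 10.25 errors, we also would report that the Standard Error of the Estimate is 2.29.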
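As a check, the standard error of the estimate can be computed in two equivalent ways from the sums of squares of the worked example (SSx = 56, SSy = 162.1, SSxy = -82, n = 10). The variable names are illustrative only.

```python
import math

n = 10
ss_x, ss_y, ss_xy = 56, 162.1, -82

# Form 1: remove the part of SSy explained by the regression line.
se_1 = math.sqrt((ss_y - ss_xy ** 2 / ss_x) / (n - 2))

# Form 2: the same quantity written with the squared correlation.
r_squared = ss_xy ** 2 / (ss_x * ss_y)
se_2 = math.sqrt(ss_y * (1 - r_squared) / (n - 2))

print(round(r_squared, 2), round(se_1, 2), round(se_2, 2))  # 0.74 2.29 2.29
```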

Page 20: Basic Statistics Linear Regression. X Y Simple Linear Regression

Summarizing Prediction Equations

• The existence of a relationship between two variables allows us to use that knowledge to make predictions.

• The prediction based on our equation will result in less error in prediction than using the mean of the dependent variable.

• Two sums of squares are required to calculate the regression coefficient and the intercept.
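The claim that the equation predicts with less error than the mean can be illustrated on the professor's data. This sketch compares the sum of squared prediction errors for the regression line against the same sum when every student is simply predicted to make the mean number of errors; the variable names are assumptions made for illustration.

```python
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]     # X: hours studied
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]  # Y: errors on the mid-term

a, b = 17.55, -82 / 56                      # intercept and unrounded slope
mean_y = sum(errors) / len(errors)

# Squared error when predicting with the regression line.
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, errors))

# Squared error when predicting the mean of Y for everyone.
sse_mean = sum((y - mean_y) ** 2 for y in errors)

print(round(sse_line, 2), round(sse_mean, 2))  # 42.03 162.1
```

The second number is SSy itself, so the regression line removes roughly three quarters of the squared error left by the mean.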