PS 225 Lecture 20
Linear Regression Equation and Prediction
Adding Regression Line
Dependence
What if two variables are correlated?
What if the mean of a variable is dependent on the value of another variable?
• Is it dependent?
• How much is it dependent?
• How can we express the dependence algebraically?
Examples of Dependence
• The distance traveled at a given speed: Distance = Speed × Time
• The cost of a bag of bulk mixed nuts with a given price per pound: Cost = Price × Weight
Linear Relationships
Types of Relationships
Deterministic Relationship
• One variable totally determines the value of another variable with perfect accuracy
• Algebraic linear relationship
• Previous examples
Variable Relationship
• One variable affects the value of another variable with some element of variability
• Example: Height and weight
Using SPSS to Determine a Linear Relationship
Is there a relationship?
Linear Regression: Form of a Line
Algebraic Form of Line: y = a + bx
• a is the y-intercept
• b is the slope
Linear Regression: Meaning of the Line
• a is the ‘constant’
• b is a ‘coefficient’
SPSS Output for A Regression Line
Y = -18331.2 + 3909.907*x
X = Education Level
Y = Current Salary
Interpreting the Constant
Only has meaning if:
• Data are present near x = 0 to validate it
• A value of x = 0 can naturally occur
Interpreting the Coefficient
Change in dependent variable for each unit change in the independent variable
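The interpretation of the constant and the coefficient can be checked against the salary equation from the SPSS output. The sketch below is only an illustration: the function name and the education levels 12 and 13 are assumptions chosen for the example, not values from the lecture.

```python
# Coefficients copied from the SPSS output quoted in the lecture.
CONSTANT = -18331.2      # a: value of the line at x = 0
COEFFICIENT = 3909.907   # b: change in salary per one-unit change in education


def predicted_salary(education_level):
    """Return the salary predicted by Y = a + b*x (hypothetical helper)."""
    return CONSTANT + COEFFICIENT * education_level


# The constant is the prediction at x = 0. A negative salary cannot
# naturally occur, so the constant has no real-world meaning here.
salary_at_zero = predicted_salary(0)

# Raising education by one unit changes the prediction by the coefficient.
# The levels 12 and 13 are arbitrary illustrative inputs.
change_per_unit = predicted_salary(13) - predicted_salary(12)

print(f"Prediction at x = 0: {salary_at_zero:.1f}")
print(f"Change from x = 12 to x = 13: {change_per_unit:.3f}")
```

Any two education levels one unit apart give the same difference, which is what "change for each unit change" means for a straight line.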
2-Step Hypothesis Process
• Test Overall Linear Relationship
• Test Contribution of Each Component
Similar to 2-Way ANOVA
Step 1: Overall Test
Is there a linear relationship?
• Ho: Means are the same at all values of x (No relationship)
• Ha: There is a linear relationship between x and y
If significance < .05, conclude there is a relationship. Otherwise, stop the analysis.
Step 2: Component Tests
Is the component significant?
• Intercept
• Coefficient
Ho: Not Significant
Ha: Significant
If significance < .05, conclude the component is significant. Otherwise, eliminate it from the analysis and recreate the model.
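The two-step decision rule can be written out as a short sketch. It assumes the significance values have already been read from SPSS output; the function name, argument names, and the example significance values are all hypothetical.

```python
ALPHA = 0.05  # significance cutoff used on the slides


def two_step_decision(overall_sig, component_sigs, alpha=ALPHA):
    """Apply the two-step process to significance values (hypothetical helper).

    overall_sig: significance of the overall test (step 1).
    component_sigs: dict mapping component name to its significance (step 2).
    Returns (relationship_found, components_to_keep, components_to_drop).
    """
    # Step 1: without an overall linear relationship, stop the analysis.
    if overall_sig >= alpha:
        return False, [], []

    # Step 2: keep components below the cutoff. The rest are eliminated
    # before the model is recreated.
    keep = [name for name, sig in component_sigs.items() if sig < alpha]
    drop = [name for name, sig in component_sigs.items() if sig >= alpha]
    return True, keep, drop


# Made-up significance values, for illustration only.
result = two_step_decision(0.001, {"intercept": 0.20, "coefficient": 0.001})
print(result)
```

With these made-up values the overall test passes, the coefficient is kept, and the intercept would be dropped before refitting.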
Line of Best Fit
• Regression line that minimizes the distance to data points
• SPSS calculations
Sum of Squares
• Sum of squared differences for each data point
• Regression: Difference between the overall mean and the regression line
• Residual: Difference between the regression line and the data points
Regression lines minimize the residual sum of squares
Deviations
Sum of Squares
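The regression and residual sums of squares can be computed by hand for a small data set. The sketch below uses made-up data (not from the lecture) and the standard least-squares formulas for the slope and intercept. SPSS reports the same quantities in its ANOVA table, but this is not SPSS code.

```python
# Made-up illustrative data (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope (b) and intercept (a) of the line y = a + b*x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x


def residual_ss(intercept, slope):
    """Sum of squared differences between a line and the data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


fitted = [a + b * x for x in xs]

# Regression: squared differences between the line and the overall mean.
ss_regression = sum((f - mean_y) ** 2 for f in fitted)
# Residual: squared differences between the data points and the line.
ss_residual = residual_ss(a, b)
# Total: squared differences between the data points and the overall mean.
ss_total = sum((y - mean_y) ** 2 for y in ys)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"regression SS = {ss_regression:.2f}, residual SS = {ss_residual:.2f}")
print(f"total SS = {ss_total:.2f}")

# Any other line, such as this arbitrary one, has a larger residual SS.
print(f"residual SS for y = 2.0 + 0.7x: {residual_ss(2.0, 0.7):.2f}")
```

For a least-squares line the regression and residual sums of squares add up to the total sum of squares, which is a useful check on the arithmetic.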
Predicting Values from a Linear Regression
• Write the equation for the regression line
• ‘Plug in’ the independent variable
• Gain a prediction for the dependent variable
The relationship between the values of the independent variable and the prediction is deterministic
Accuracy of Predictions
• The BEST guess
• Probably not exact due to variability
• Correct on average
Quality of Prediction
Predicted values must be within the range of the data
Relationship must be linear over the entire range of the data
Line must not depend too strongly on one point
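One reading of the first condition is that a prediction should only be made for values of the independent variable that fall inside the observed data range. The sketch below guards against predictions outside that range. The function name and the range of 8 to 21 are assumptions for illustration, since the lecture does not state the observed range, and the other two conditions (linearity and influential points) are not checked here.

```python
def predict_within_range(x, a, b, x_min, x_max):
    """Return a + b*x, refusing to predict outside the observed x range.

    Hypothetical helper: x_min and x_max are the smallest and largest
    values of the independent variable in the data used to fit the line.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return a + b * x


# Salary equation from the SPSS output. The range 8 to 21 is assumed.
A, B = -18331.2, 3909.907
X_MIN, X_MAX = 8, 21

prediction = predict_within_range(16, A, B, X_MIN, X_MAX)
print(f"Predicted salary at x = 16: {prediction:.2f}")

try:
    predict_within_range(30, A, B, X_MIN, X_MAX)
except ValueError as err:
    print(f"No prediction made: {err}")
```

Raising an error rather than returning a number makes an out-of-range request visible instead of silently extrapolating the line.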
SPSS Assignment
Last class we answered the following questions:
• Does the number of years of education an individual has affect the hours of television a person watches?
• Does age affect the hours of television a person watches?
This class:
• Use SPSS to find the regression equation that best represents each relationship.
• Write the full regression equation.
• Make a prediction for yourself with each regression equation.
• How different is each prediction from the number of hours you watch? If the equation underpredicts, report your answer as a negative number. If it overpredicts, report your answer as a positive number.
• Add your prediction error to the class data.
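The sign convention for the prediction error matches "predicted minus actual". A minimal sketch, with a hypothetical function name and made-up hours:

```python
def prediction_error(predicted_hours, actual_hours):
    """Signed error: negative if the equation underpredicts, positive if it overpredicts."""
    return predicted_hours - actual_hours


# Made-up numbers, for illustration only.
under = prediction_error(2.5, 4.0)  # equation predicts fewer hours than watched
over = prediction_error(3.0, 1.0)   # equation predicts more hours than watched

print(under)  # -1.5
print(over)   # 2.0
```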