Section 3.3 Linear Regression
Statistics
AP Statistics, Section 3.3, Part 1 2
Linear Regression
It would be great to be able to look at multi-variable data and reduce it to a single equation that might help us make predictions, for example:
“What would be the predicted number of wins for a team with a 4.0 ERA?”
The Least-Squares Regression
Finds the best-fit line by minimizing the total area of the squares formed by the differences between the observed data and the values predicted by the model.
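As a minimal sketch of what "minimizing the squares" means, the Python snippet below computes the total squared difference for a fitted line and checks that nudging the intercept or slope only makes that total larger. The data values and function names are invented for illustration, and the slope and intercept come from the standard closed-form least-squares formulas, not from anything specific to these slides.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sum_squared_errors(a, b, xs, ys):
    """Total area of the squares between observed y and predicted a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least-squares intercept a and slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(x, y)
best = sum_squared_errors(a, b, x, y)

# Nudge the intercept or the slope a little in either direction.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
worse = [sum_squared_errors(a + da, b + db, x, y) for da, db in nudges]

print("fitted line total:", best)
print("nudged line totals:", worse)
```

Every nudged line in this sketch has a larger total than the fitted line, which is the sense in which the least-squares line is the "best fit".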
AP Statistics, Section 3.3, Part 1 6
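The Least-Squares Regression
Statisticians use a slightly different version of “slope-intercept” form:
$\hat{y} = a + bx$
$b = r \dfrac{s_y}{s_x}$
$a = \bar{y} - b\bar{x}$
Here $r$ is the correlation, $s_x$ and $s_y$ are the standard deviations of x and y, and $\bar{x}$ and $\bar{y}$ are their means.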
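The formulas for the slope and intercept can be tried out in a few lines of Python. This is only a sketch: the (ERA, wins) pairs are made up for illustration and are not the data behind these slides, and the helper names `correlation` and `least_squares_line` are hypothetical.

```python
import statistics as st

# Hypothetical (ERA, wins) pairs, invented purely for illustration.
era = [3.2, 3.6, 4.0, 4.4, 4.8, 5.3]
wins = [98, 90, 85, 78, 70, 62]

def correlation(x, y):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.stdev(x), st.stdev(y)
    n = len(x)
    return sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)

def least_squares_line(x, y):
    """Return (a, b) for y-hat = a + b*x, using b = r*sy/sx and a = ybar - b*xbar."""
    r = correlation(x, y)
    b = r * st.stdev(y) / st.stdev(x)
    a = st.mean(y) - b * st.mean(x)
    return a, b

a, b = least_squares_line(era, wins)

# Prediction for a hypothetical team with a 4.0 ERA.
predicted_wins = a + b * 4.0
print(f"y-hat = {a:.2f} + ({b:.2f})x")
print(f"predicted wins at a 4.0 ERA: {predicted_wins:.1f}")
```

With real data, `era` and `wins` would be replaced by the observed values; the two formulas themselves do not change.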
AP Statistics, Section 3.3, Part 1 7
Predicting Model
To put the regression line on the graph, use Statistics:Eq:RegEQ from the Vars menu to paste the regression equation into Y1.
Then you can use Trace, Table, or Y1 to find the response values that correspond to particular explanatory values.
AP Statistics, Section 3.3, Part 1 8
Facts about least-squares regression
Make sure you know which is the explanatory (x) variable and which is the response (y) variable. Switching them gives a different regression line.
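To see why the roles of x and y matter, the sketch below fits the line twice on the same made-up data: once predicting y from x, and once predicting x from y. The data and the helper name `fit_line` are assumptions for illustration only.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 6.0, 9.0]

def fit_line(explanatory, response):
    """Least-squares (intercept, slope) for predicting response from explanatory."""
    n = len(explanatory)
    e_bar = sum(explanatory) / n
    r_bar = sum(response) / n
    s_er = sum((e - e_bar) * (r - r_bar) for e, r in zip(explanatory, response))
    s_ee = sum((e - e_bar) ** 2 for e in explanatory)
    slope = s_er / s_ee
    return r_bar - slope * e_bar, slope

a_yx, b_yx = fit_line(x, y)  # y-hat = a_yx + b_yx * x
a_xy, b_xy = fit_line(y, x)  # x-hat = a_xy + b_xy * y

# Solve the second line for y so both slopes are on the same axes.
slope_of_swapped_line = 1 / b_xy

print("slope when y is the response:", b_yx)
print("slope of the swapped line, drawn on the same axes:", slope_of_swapped_line)
```

The two slopes agree only when the points fall exactly on a line; otherwise the two regressions are different lines through the same point of means.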
AP Statistics, Section 3.3, Part 1 9
Facts about least-squares regression
The regression line always goes through the point $(\bar{x}, \bar{y})$.
The coefficient of correlation ($r$) measures the strength and direction of the linear relationship.
The square of the correlation ($r^2$) is the fraction of the variation in the values of y that is explained by the least-squares regression on x.
___% ($r^2$) of the variation in ______ (y) is explained by ______ (x).
AP Statistics, Section 3.3, Part 1 10
$r^2$: the coefficient of determination (“coefficient of explanation”)
In the regression of WINS on ERA, we find an $r^2$ value of 0.4512.
We say “45% of the variation in WINS can be explained by ERA.”
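One way to make the "explained variation" reading concrete is to compute $r^2$ as one minus the ratio of leftover variation to total variation, and confirm that it matches the square of the correlation. The sketch below does this on made-up data (not the ERA and WINS data quoted above); all variable names are illustrative.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 6.0, 9.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left over after the fit versus total variation in y.
unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
total = syy
r_squared = 1 - unexplained / total

# Correlation computed directly, for comparison.
r = sxy / math.sqrt(sxx * syy)

print(f"{100 * r_squared:.0f}% of the variation in y is explained by x")
print("r squared from the correlation:", r ** 2)
```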
AP Statistics, Section 3.3, Part 1 11
Outliers vs. Influential Data
An outlier is an observation that lies outside the overall pattern.
An observation is influential if it has a large effect on the regression line: removing it markedly changes the fitted line.
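A rough way to check for influence is to fit the line with and without the suspect point and compare. The sketch below uses deliberately artificial data (five points exactly on y = 2x, plus one added point) so the effect is easy to see; the numbers and the helper name `slope` are assumptions for illustration.

```python
def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    return sxy / sxx

# Artificial base data lying exactly on y = 2x (for illustration only).
base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [2.0, 4.0, 6.0, 8.0, 10.0]

slope_base = slope(base_x, base_y)

# An outlier in y placed at the mean of x: off the pattern, but the slope holds.
slope_with_outlier = slope(base_x + [3.0], base_y + [12.0])

# A point far out in x and off the pattern: it drags the line toward itself.
slope_with_influential = slope(base_x + [15.0], base_y + [5.0])

print("base slope:", slope_base)
print("slope with outlier at mean x:", slope_with_outlier)
print("slope with far-out point:", slope_with_influential)
```

In this constructed case the outlier shifts only the intercept, while the far-out point nearly flattens the line. Real data are rarely this clean, so the comparison is a guide, not a rule.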
AP Statistics, Section 3.3, Part 1 12
Outliers vs. Influential Data
AP Statistics, Section 3.3, Part 1 13
Residuals
It is important to note that the observed values almost never match the predicted values exactly.
The difference between the observed value and the predicted value has a special name: the residual.
Observed value ($y$): 5.3 ERA, 43 wins
Predicted value ($\hat{y}$): 5.3 ERA, 67.03 wins
Residual: $y - \hat{y} = 43 - 67.03 = -24.03$
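The residual arithmetic from this slide can be written as a one-line function. The function name `residual` is an illustrative choice; the two numbers are the ones given above.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value (y minus y-hat)."""
    return observed - predicted

observed_wins = 43       # from the slide: a team with a 5.3 ERA
predicted_wins = 67.03   # from the slide: the model's prediction at a 5.3 ERA

res = residual(observed_wins, predicted_wins)
print(round(res, 2))
```

The result is negative because the team won fewer games than the model predicted.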
AP Statistics, Section 3.3, Part 1 14
Residuals
Residuals are negative when the observed value is below the predicted value.
Residuals are positive when the observed value is above the predicted value.
AP Statistics, Section 3.3, Part 1 15
Residual Plots
You can plot the residuals to see whether there are any trends in the quality of the predictive model.
AP Statistics, Section 3.3, Part 1 16
Residual Plots
This residual plot shows no tendencies. The model is equally inaccurate throughout.
AP Statistics, Section 3.3, Part 1 17
“Under predicts on the ends”
AP Statistics, Section 3.3, Part 1 18
“Predictive accuracy decreases”
AP Statistics, Section 3.3, Part 1 19
“Well Distributed”
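The patterns named on the residual plot slides can be spotted from the residuals themselves, even without a graph. As a sketch, the snippet below fits a straight line to deliberately curved, made-up data (y = x squared) and lists the residual at each x. The data and variable names are assumptions for illustration.

```python
# Artificial curved data (y = x squared), invented for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed minus predicted, one per data point.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# A residual plot is these residuals plotted against x.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:.0f}   residual = {ri:+.1f}")
```

Here the residuals are positive at both ends and negative in the middle, so the line under-predicts on the ends. A well-distributed plot would show no such run of signs.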
AP Statistics, Section 3.3, Part 1 20
Assignment
Exercises: 3.38, 3.40 for Tuesday
Exercises: 3.42, 3.43, 3.46, 3.47, 3.49, 3.53, 3.55, 3.57, 3.61 for Thursday
Chapter Review for Monday: 3.63, 3.67, 3.71, 3.73, 3.75, 3.77
Sample Test due Monday
Chapter 3 Test (take-home) due on Monday