
Inference for the slope of the LSRL Student Saturday Session

© Chris True • National Math and Science.

Student Notes – Prep Session Topic: Inference for Regression

The AP Statistics exam is likely to have several items that test your ability to compute confidence intervals and perform significance tests for the slope of a least squares regression line. Past questions on this topic have provided computer output with the standard error of the slope. Note that the topic outline does not include inference for the intercept of a least squares line, nor does it include inference for predictions.

Formula Provided: The following formula is provided in the section on descriptive statistics:

$$s_{b_1} = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}}$$
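To make the formula concrete, here is a minimal Python sketch that computes $s_{b_1}$ for a sample and then imitates "taking many samples" by simulation. The population (true line y = 3 + 2x, Normal errors with σ = 4, ten fixed x-values) and the helper name `fit_lsrl` are assumptions invented for illustration; they do not come from the handout.

```python
import math
import random
import statistics


def fit_lsrl(x, y):
    """Return (b0, b1, s_b1) for the least squares regression line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # standard deviation of the residuals
    return b0, b1, s / math.sqrt(sxx)   # last value is s_b1 from the formula


# Hypothetical population (an assumption for illustration only):
# true line y = 3 + 2x, Normal(0, 4) errors, the same ten x-values each time.
random.seed(1)
BETA0, BETA1, SIGMA = 3.0, 2.0, 4.0
x_values = list(range(1, 11))

slopes = []
standard_errors = []
for _ in range(2000):
    y_values = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x_values]
    _, b1, s_b1 = fit_lsrl(x_values, y_values)
    slopes.append(b1)
    standard_errors.append(s_b1)

x_bar = sum(x_values) / len(x_values)
sigma_b1 = SIGMA / math.sqrt(sum((xi - x_bar) ** 2 for xi in x_values))

print(f"theoretical SD of slopes : {sigma_b1:.4f}")
print(f"SD of simulated slopes   : {statistics.stdev(slopes):.4f}")
print(f"average s_b1 over samples: {statistics.mean(standard_errors):.4f}")
```

The three printed values should be close to one another: each sample's $s_{b_1}$ is an estimate of the spread of the slopes across samples.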

The formula for $s_{b_1}$ is not very intuitive, but it estimates the spread of the sampling distribution of the slope of the LSRL. That is, if you were to take every possible sample of size n from a population, calculate the slope for each sample, and build a distribution of all of those slopes, then $s_{b_1}$ estimates the standard deviation of that sampling distribution.

Communication, skills, and understanding:

• Inference about the population (true) regression line $y = \beta_0 + \beta_1 x$ is based on the sample least squares regression line $\hat{y} = b_0 + b_1 x$.

• Be sure you understand that the regression coefficients $b_0$ and $b_1$ vary from sample to sample.

• Inference about $\beta_1$ is based on knowledge of the sampling distribution of the sample slope $b_1$.

• Theory tells us that if certain conditions (see below) are satisfied, sample slopes will be normally distributed with mean equal to the true value of the slope for the population, $\beta_1$, and standard deviation $\sigma_{b_1} = \dfrac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}$, where $\sigma$ is the standard deviation of the y-values about the true regression line. When we estimate $\sigma_{b_1}$ with $s_{b_1}$ (see formula above), the quantity $\dfrac{b_1 - \beta_1}{s_{b_1}}$ has a t distribution with n − 2 degrees of freedom.

• Conditions for inference:

1. Linear model is appropriate. (True relationship is linear.) Check the scatter plot for linearity and the residual plot for no pattern.

2. Independent observations. This is a design issue that should be addressed in the information about the data. Random sampling or random assignment will suffice.

3. Normality – the y-values vary normally about the true regression line. Check that the residuals are approximately normally distributed using a histogram, dotplot, normal probability plot, or stem and leaf plot.

4. Standard deviation of the y-values is the same for every value of x. Check the residual plot to be sure that the spread of the residuals about the horizontal axis is approximately uniform (no “trumpet” appearance).
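Conditions 1, 3, and 4 are all checked from the residuals, so it may help to see how they are obtained. The sketch below fits a line to a small data set and prints each residual next to its x-value with a crude text bar. The data are hypothetical and the bar display is only an informal stand-in for a residual plot, not an AP procedure.

```python
# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.6, 16.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    bar = "#" * round(abs(ri) * 20)   # bar length is proportional to |residual|
    print(f"x = {xi}: residual = {ri:+.3f} {bar}")
```

When reading such a display, look for a curved pattern in the signs (a problem for condition 1) or bars that grow steadily with x (the "trumpet" shape that violates condition 4).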

• Confidence intervals (as always) require that you:

1. Identify the confidence interval procedure that you will use.

2. State the conditions and verify that they are satisfied.

3. Carry out the computations for the confidence interval. Be sure to state degrees of freedom. A confidence interval for slope has the familiar form (estimate) ± (critical value)(standard deviation of the statistic). In the context of the slope of the LSRL, that is $b_1 \pm t^*_{n-2}\, s_{b_1}$.

4. Interpret the confidence interval in the context of the problem.
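The computation in step 3 can be sketched in a few lines of Python. The function name `slope_confidence_interval` and the numbers used (n = 20, b1 = 1.50, s_b1 = 0.40) are hypothetical; the critical value 2.101 is the t-table entry for 18 degrees of freedom at 95% confidence.

```python
def slope_confidence_interval(b1, s_b1, t_star):
    """Return (lower, upper) for the interval b1 ± t* · s_b1."""
    margin_of_error = t_star * s_b1
    return b1 - margin_of_error, b1 + margin_of_error


# Hypothetical regression summary (not from the handout).
n = 20
df = n - 2            # degrees of freedom for inference about the slope
t_star = 2.101        # t-table critical value for df = 18, 95% confidence

lower, upper = slope_confidence_interval(b1=1.50, s_b1=0.40, t_star=t_star)
print(f"df = {df}, 95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

The critical value is passed in rather than computed, which mirrors how it would be read from the t table on the exam.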


• Significance tests (as always) require that you:

1. State the hypotheses in terms of parameters, using standard symbols and words that communicate the context of the problem.

2. Identify the test you will perform. Then state the conditions and verify that they are satisfied.

3. Calculate the test statistic and P-value. Also state the degrees of freedom.

4. State your conclusion in context and connect it to the P-value.

• The statement of the null hypothesis assumed by computers and calculators is $H_0: \beta_1 = 0$. However, the null hypothesis may specify any value for $\beta_1$, for example, $H_0: \beta_1 = 1$.

• When the null hypothesis is $H_0: \beta_1 = 0$, the test is often referred to as the model utility test since the null hypothesis can be thought of as a statement that there is no useful relationship between the variables. Rejecting the null hypothesis leads to the conclusion that the linear regression model with x as the explanatory variable is useful for predicting y. It would also be appropriate to state that there is evidence to suggest a linear relationship between the variables.

• Typically, the information will be presented as output from a computer software package such as Minitab. The output may look similar to the output in this example from the 2012 exam. It is important that you can pick out the important information in the output.

Example: As part of a class project at a large university, Amber selected a random sample of 12 students in her major field of study. All students in the sample were asked to report their number of hours spent studying for the final exam and their score on the final exam. A regression analysis on the data produced the following partial computer output. Assume all conditions for inference have been met.

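Predictor      Coef      SE Coef    T        P
Constant       62.328    4.570      13.64    0.000
Study Hours    2.697     0.745      3.62     0.005

S = 5.505    R-sq = 56.7%

Reading the output:

• y intercept: the Coef entry for Constant (62.328)
• slope: the Coef entry for Study Hours (2.697)
• Standard error of the slope: the SE Coef entry for Study Hours (0.745)
• P-value associated with the 2-tailed test: the P entry for Study Hours (0.005)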
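One way to see how the columns of the output fit together is to recompute the T column. The sketch below uses the values from Amber's output; the variable names are illustrative. It also shows how the statistic would change for a different null value such as $\beta_1 = 1$, which the software does not report.

```python
# Values copied from the computer output above.
coef_const, se_const = 62.328, 4.570
coef_slope, se_slope = 2.697, 0.745

# The T column is Coef divided by SE Coef, i.e. the statistic for H0: parameter = 0.
t_const = coef_const / se_const
t_slope = coef_slope / se_slope

# For a null hypothesis other than zero, e.g. H0: beta1 = 1,
# the statistic must be recomputed by hand from Coef and SE Coef.
t_slope_null_one = (coef_slope - 1) / se_slope

print(f"T for Constant    : {t_const:.2f}")
print(f"T for Study Hours : {t_slope:.2f}")
print(f"t for H0: beta1=1 : {t_slope_null_one:.2f}")
```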


From the information given above, answer the following questions:

A. Find the equation of the least squares regression line. Be sure to define the variables in the equation.

B. Interpret the slope and y intercept in the equation in part A.

C. What percent of the variation in students’ grades can be attributed to a linear relationship with the number of hours spent studying?

D. Find a 95% confidence interval for the slope of the least squares regression line for all students in Amber’s field of study. Interpret the interval.

E. Is there evidence at the 0.05 level to suggest that students’ scores increase with the number of hours spent studying?


Answers:

A. $\hat{y} = 62.328 + 2.697x$, where $\hat{y}$ is the predicted final exam score for a student in Amber’s major who spent x hours studying for the exam.

B. Slope: for each additional hour spent studying, the predicted final exam score increases by 2.697 points. y intercept: a student who spends 0 hours studying has a predicted final exam score of 62.328 points.

C. 56.7%. This is simply $r^2$, which is given in the table as “R-sq”.

D. With n − 2 = 10 degrees of freedom, a 95% confidence interval is $b_1 \pm t^*_{n-2}\, s_{b_1} = 2.697 \pm (2.228)(0.745) = (1.037, 4.357)$. I am 95% confident that, for students in Amber’s major, each additional hour of studying is associated with an increase of between 1.037 and 4.357 points in the mean final exam score.

E. 1. $H_0: \beta_1 = 0$ and $H_a: \beta_1 > 0$, where $\beta_1$ is the slope of the true regression line for all students in Amber’s major.

2. I will perform a linear regression t test for the slope of the least squares regression line. The problem stated that the conditions were satisfied. These conditions are: a linear model is appropriate (the scatterplot of hours vs. scores looks linear, not curved), the sample was a simple random sample, the distribution of the residuals is approximately normal, and the spread of the residuals is fairly constant for all values of x.

3. t = 3.62 with 10 degrees of freedom, and $P(t_{10} > 3.62) \approx (0.5)(0.005) = 0.0025$ (half of the two-sided P-value given in the table).

4. Since the P-value is less than 0.05, I reject the null hypothesis in favor of the alternative. That is, I have sufficient evidence to suggest that the scores for students in Amber’s major tend to increase as they spend more time studying.
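The interval and the one-sided P-value in these answers can be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule instead of calling a statistics package, which is a choice made here for self-containment and not something the handout prescribes. The function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=20000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t] and symmetry."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Values from Amber's output; n = 12 students, so df = 10.
b1, s_b1, df = 2.697, 0.745, 10
t_star = 2.228                         # t-table value for df = 10, 95% confidence

margin = t_star * s_b1
print(f"95% CI: ({b1 - margin:.3f}, {b1 + margin:.3f})")

p_one_sided = upper_tail(3.62, df)
print(f"P(t10 > 3.62) is about {p_one_sided:.4f}")
```

The computed tail area comes out slightly below 0.0025 because the table's two-sided P-value of 0.005 is itself rounded; either value leads to the same conclusion at the 0.05 level.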

Multiple Choice Questions

1. In a study of the performance of a computer printer, the size (in kilobytes) and the printing time (in seconds) for each of 22 small text files were recorded. A regression line was a satisfactory description of the relationship between size and printing time. The results of the regression analysis are shown below. Which of the following should be used to compute a 95 percent confidence interval for the slope of the regression line?

A. 3.47812 ± 2.086 × 0.294
B. 3.47812 ± 1.96 × 0.6174
C. 3.47812 ± 1.725 × 0.294
D. 11.6559 ± 2.086 × 0.3153
E. 11.6559 ± 1.725 × 0.3153


For problems 2-3: Boiling and melting points (in degrees Celsius) are recorded for 21 selected substances, and regression analysis is used to describe the relationship between them. The results of the analysis are shown below.

Dependent variable is: Boiling point

Predictor        Coef       SE Coef    T       P
Constant         309.914    146.7      2.11    0.0481
Melting point    0.9594     0.2104     4.56    0.0001

S = 626.4    R-sq = 73.4%

Assume that all of the conditions for regression have been met.

2. Which of the following gives a 95% confidence interval for the slope of the regression line?

A. 0.9594 ± 1.729(0.2104)
B. 0.9594 ± 1.96(0.2104)
C. 0.9594 ± 2.093(0.2104)
D. 309.914 ± 1.729(146.7)
E. 309.914 ± 2.093(626.4)

3. Suppose that a significance test was conducted to determine whether there was a useful positive linear relationship between the melting points and the boiling points of substances. Does this analysis provide sufficient evidence to suggest that there is a positive linear relationship between melting points and boiling points of substances at the 5% level?

A. Yes, because the slope of the line for these 21 substances is 0.9594 (which is positive).
B. Yes, the P-value for the 1-sided test is 0.0481 and 0.0481 < 0.05.
C. Yes, the P-value for the 1-sided test is (0.5)(0.0001) = 0.00005 and 0.00005 < 0.05.
D. No, the P-value for this 1-sided test is 0.734 and 0.734 > 0.05.
E. No, the P-value for this 1-sided test is (2)(0.0481) = 0.0962 and 0.0962 > 0.05.


MC Answers: 1-A, 2-C, 3-C

2005B #5

5. John believes that as he increases his walking speed, his pulse rate will increase. He wants to model this relationship. John records his pulse rate, in beats per minute (bpm), while walking at each of seven different speeds, in miles per hour (mph). A scatterplot and regression output are shown below.

A. Using the regression output, write the equation of the fitted regression line.


B. Do your estimates of the slope and intercept parameters have meaningful interpretations in the context of this question? If so, provide interpretations in this context. If not, explain why not.

C. John wants to provide a 98 percent confidence interval for the slope parameter in his final report. Compute the margin of error that John should use. Assume that conditions for inference are satisfied.


AP® STATISTICS 2005 SCORING GUIDELINES (Form B)

Question 5

Solution

Part (a): Predicted Pulse = 63.457 + 16.2809 (Speed)

Part (b): The intercept (63.457 bpm) provides an estimate for John’s mean resting pulse (walking at a speed of zero mph). The slope (16.2809 bpm/mph) provides an estimate for the mean increase in John’s heart rate as his speed is increased by one mile per hour.

Part (c): The margin of error for the confidence interval for the slope parameter is $t^*_{n-2} \times s_{b_1} = 3.365 \times 0.8192 = 2.7566$ bpm.

Scoring

Part (a) is scored as essentially correct (E) or incorrect (I). Parts (b) and (c) are scored as essentially correct (E), partially correct (P), or incorrect (I).

Note: If the student uses x and y, then both variables must be identified.

Part (b): There are four steps to constructing correct interpretations:

Step 1: A correct mathematical interpretation of the reported slope (16.2809) as a rate of increase in heart rate as walking speed increases.
Step 2: A correct mathematical interpretation of the reported intercept as a pulse rate when walking speed is zero.
Step 3: Correct use of units of measurement, e.g., John’s heart rate increases 16.2809 bpm as his speed is increased by one mile per hour.
Step 4: Interpretation of the reported values as estimates of the corresponding mean quantities.

Part (b) is essentially correct (E) if all four steps are correct.

Part (b) is partially correct (P) if two or three steps are correctly addressed. Step 2 is scored as incorrect, for example, if the student suggests that the intercept does not have a meaningful interpretation.

Part (b) is incorrect (I) if at most one step is correct.

Note: The student is only penalized once for switching the variables.


Part (c) is essentially correct (E) if the standard error of the slope is identified and the correct critical value is used to calculate the margin of error.

Part (c) is partially correct (P) if the student:

• Computes the 98% confidence interval but does not identify the margin of error; OR
• Recognizes that the margin of error consists of the standard error of the coefficient and the critical value but uses an incorrect value for one of the two components or uses a t-value with 6 degrees of freedom and an incorrect standard error.

Part (c) is incorrect (I) if the student uses:

• The standard error of the coefficient as the margin of error; OR
• A critical value as the margin of error.

4 Complete Response (3E): All three parts essentially correct
3 Substantial Response (2E 1P): Two parts essentially correct and one part partially correct
2 Developing Response (2E 0P or 1E 2P): Two parts essentially correct and zero parts partially correct OR one part essentially correct and two parts partially correct
1 Minimal Response (1E 1P or 1E 0P or 0E 2P): One part essentially correct and either zero parts or one part partially correct OR zero parts essentially correct and two parts partially correct
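The margin of error in the solution to part (c) can be reproduced with a short sketch. The slope, standard error, and critical value are the ones quoted in the scoring guidelines; the variable names are illustrative, and the interval endpoints are derived here and are not part of the published solution.

```python
# Values quoted in the 2005B #5 solution.
n = 7                    # John walked at seven different speeds
slope = 16.2809          # bpm per mph
se_slope = 0.8192
t_star = 3.365           # t-table value for df = 5, 98% confidence

df = n - 2
margin_of_error = t_star * se_slope

print(f"df = {df}")
print(f"margin of error = {margin_of_error:.4f} bpm")
# Derived for illustration: the full 98% interval for the slope.
print(f"98% CI: ({slope - margin_of_error:.4f}, {slope + margin_of_error:.4f})")
```

Note that the margin of error is the product of both pieces; reporting only the standard error or only the critical value is what the guidelines score as incorrect.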


Sample: 5A Score: 4 In part (a) the correct formula for the estimated regression line is reported, and the variables are clearly defined. The student clearly realizes the estimated regression line provides estimates of pulse rate for various walking speeds. In part (b) the student clearly indicates that John’s pulse rate would be expected to be close to the estimated intercept (63.457 bpm) when his walking speed is zero. This conveys the notion of the estimated intercept as an estimate of John’s mean pulse rate when he is not walking. Both the estimated intercept and the estimated slope are interpreted in the context of the problem using appropriate units of measurement. The margin of error is correctly evaluated in the response to part (c). The student clearly shows that the t-value is based on 5 degrees of freedom.


Sample: 5B Score: 3 The response to part (a) does not report the estimated regression line in the context of the problem, nor does it define the X and Y variables used in the formula. The response to part (b) provides interpretations of both the estimated intercept and the estimated slope in the context of the problem. Appropriate units of measurement are used in the interpretation of the slope, but bpm is omitted from the interpretation of the intercept. The interpretation of the slopes uses “increases on average by” to indicate that the slope is an average rate of increase and “heart rate is around” to indicate that the intercept is a prediction of John’s resting heart rate. The communication of these concepts could have been better. The margin of error is correctly evaluated in the response to part (c) and the supporting work is shown.


2010B #6

Although this next problem is classified as an investigative task, it has many of the elements of linear regression, including inference for the slope of the least squares regression line.

STATISTICS SECTION II

Part B Question 6

Spend about 25 minutes on this part of the exam. Percent of Section II score—25

Directions: Show all your work. Indicate clearly the methods you use, because you will be scored on the correctness of your methods as well as on the accuracy and completeness of your results and explanations.

6. A real estate agent is interested in developing a model to estimate the prices of houses in a particular part of a large city. She takes a random sample of 25 recent sales and, for each house, records the price (in thousands of dollars), the size of the house (in square feet), and whether or not the house has a swimming pool. This information, along with regression output for a linear model using size to predict price, is shown below and on the next page.


A. Interpret the slope of the least squares regression line in the context of the study.


B. The second house in the table has a residual of 49. Interpret this residual value in the context of the study.

The real estate agent is interested in investigating the effect of having a swimming pool on the price of a house.

C. Use the residuals from all 25 houses to estimate how much greater the price for a house with a swimming pool would be, on average, than the price for a house of the same size without a swimming pool.


To further investigate the effect of having a swimming pool on the price of a house, the real estate agent creates two regression models, one for houses with a swimming pool and one for houses without a swimming pool. Regression output for these two models is shown below.

D. The conditions for inference have been checked and verified, and a 95 percent confidence interval for the true difference in the two slopes is (−0.099, 0.110). Based on this interval, is there a significant difference in the two slopes? Explain your answer.


E. Use the regression model for houses with a swimming pool and the regression model for houses without a swimming pool to estimate how much greater the price for a house with a swimming pool would be than the price for a house of the same size without a swimming pool. How does this estimate compare with your result from part C?

AP® STATISTICS 2010 SCORING GUIDELINES (Form B)

Question 6

Intent of Question

The primary goals of this investigative task were to assess students’ ability to understand, apply and draw conclusions from a regression analysis beyond what they have previously studied. More specific goals were to assess students’ ability to (1) interpret a slope coefficient and residual value; (2) interpret a confidence interval; (3) compare two regression models and draw appropriate conclusions.

Solution

Part (a): The slope coefficient is 0.165. This means that for each additional square foot of size, the predicted price of the house increases by 0.165 thousand dollars, which is $165. In other words, this model predicts that the average price of a house increases by $165 for each additional square foot of a house’s size.

Part (b): The residual value of 49 for this house indicates that its actual price is 49 thousand dollars higher than the model would predict for a house of its size.


Part (c): The average residual value for the eight houses with a swimming pool is:

$$\frac{6 + 49 + (-18) + \cdots + 42}{8} = \frac{149}{8} \approx 18.6 \text{ thousand dollars.}$$

The average residual value for the 17 houses with no swimming pool is:

$$\frac{13 + 26 + (-45) + \cdots + 33}{17} = \frac{-150}{17} \approx -8.8 \text{ thousand dollars.}$$

The residual averages suggest that the regression line tends to underestimate the price of homes with a swimming pool by about 18.6 thousand dollars and to overestimate the price of homes with no pool by about 8.8 thousand dollars. The difference between these two residual averages is 18.6 − (−8.8) = 27.4 thousand dollars. This suggests that, for two houses of the same size, the house with a swimming pool would be estimated to cost $27,400 more than the house with no swimming pool.

Part (d): No, this confidence interval does not indicate a significant difference (at the 95 percent confidence level, equivalent to the 5 percent significance level) between the two slope coefficients because the interval includes the value zero.

Part (e): If the two population regression lines do in fact have the same slope, the impact of a swimming pool is the (constant) vertical distance between the two lines. However, because the two fitted lines do not have the same slope, the distance between the two fitted lines depends on the size of the house. Using the available information, there are two acceptable approaches to estimating the impact of having a swimming pool.

Approach 1: Use the two fitted lines to predict the price of a house with and without a pool for a particular house size. For example, using the value of size = 2,250 square feet (which is near the middle of the distribution of house sizes), we find:

Predicted price for a 2,250 square-foot house with a swimming pool = −11.602 + 0.166(2,250) = 361.898 thousand dollars.

Predicted price for a 2,250 square-foot house with no swimming pool = −27.382 + 0.160(2,250) = 332.618 thousand dollars.

The difference in these predicted prices is 361.898 − 332.618 = 29.280 thousand dollars, which is an estimate of the impact of a swimming pool on the predicted price of a 2,250 square-foot house. This is quite similar to the estimate based on residuals in part (c).

Approach 2: Because the slopes of the two sample regression lines were judged not to be significantly different, another acceptable approach would be to use the difference in the intercepts of the two fitted lines as an estimate of the vertical distance between the two population regression lines. The difference in the intercepts of the two fitted lines is −11.602 − (−27.382) = 15.780 thousand dollars, which is an estimate of the impact of a swimming pool on the predicted price of a house, assuming this difference does not change with the size of the house. This is quite different from the estimate based on residuals in part (c).
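The arithmetic in parts (c), (d), and (e) can be sketched as follows. The residual totals (149 and −150), the interval, and the fitted coefficients are the ones quoted in the solution; the individual residuals are not reproduced here because the solution elides them, and the function names are illustrative.

```python
# Part (c): averages of residuals, using the totals quoted in the solution.
mean_resid_pool = 149 / 8          # eight houses with a pool
mean_resid_no_pool = -150 / 17     # seventeen houses without a pool
estimate_part_c = mean_resid_pool - mean_resid_no_pool

# Part (d): a difference in slopes is not significant if the interval contains 0.
ci_low, ci_high = -0.099, 0.110
slopes_differ = not (ci_low <= 0 <= ci_high)


# Part (e): the two fitted lines (price in thousands of dollars, size in sq ft).
def price_with_pool(size):
    return -11.602 + 0.166 * size


def price_without_pool(size):
    return -27.382 + 0.160 * size


approach_1 = price_with_pool(2250) - price_without_pool(2250)
approach_2 = -11.602 - (-27.382)   # difference in intercepts

print(f"part (c) estimate: {estimate_part_c:.1f} thousand dollars")
print(f"significant difference in slopes? {slopes_differ}")
print(f"approach 1 at 2,250 sq ft: {approach_1:.3f} thousand dollars")
print(f"approach 2 (intercepts)  : {approach_2:.3f} thousand dollars")
```

Because the two fitted slopes differ by 0.006, the approach 1 estimate grows by 0.006 thousand dollars for every additional square foot, which is why it depends on the house size chosen.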


Scoring

This question is scored in four sections. Section 1 consists of part (a); section 2 consists of part (b); section 3 consists of part (c); section 4 consists of parts (d) and (e). Each of the four sections is scored as essentially correct (E), partially correct (P) or incorrect (I).

Section 1 is scored as follows:

Essentially correct (E) if the response identifies the correct value for the slope coefficient and provides a correct interpretation in context.

Partially correct (P) if the response identifies the correct value for the slope coefficient and provides a correct interpretation but not in context OR the response provides an incorrect value for the slope but provides a correct interpretation of this value in context OR the response identifies the correct value for the slope but the interpretation is incomplete because of one or more of the following errors:

• The interpretation does not mention “predicted” or “on average” or any other indication of a probabilistic rather than a deterministic relationship.

• The interpretation does not include the notion of each additional square foot of size by saying something like “for every square foot.”

• The interpretation does not use units for the price variable, or it uses incorrect units for the price variable (e.g., dollars instead of thousands of dollars).

Incorrect (I) if there is no interpretation or if the interpretation does not warrant a score of P.

Note: It is possible to earn an E for section 1 without stating the actual numerical value of the slope, if a correct and well-communicated interpretation of the slope is given in context.

Section 2 is scored as follows:

Essentially correct (E) if the response provides a correct interpretation of the residual value, in context, including both direction and a comparison with the model’s predicted or average value (e.g., actual price is higher than predicted).

Partially correct (P) if the response provides an interpretation of the residual value that fails to mention direction or that gives the incorrect direction OR if the response provides a correct interpretation of the residual value that includes direction, but that is not in context.

Incorrect (I) if there is no interpretation of the residual value OR the interpretation does not include direction and is not in context.

Section 3 is scored as follows:

Essentially correct (E) if the response correctly calculates averages of residual values both for houses with pools and houses without pools AND correctly reports the difference between those averages as the estimate of the impact of a swimming pool.

Partially correct (P) if the response either correctly calculates averages of residual values both for houses with pools and houses without pools but does not correctly report the difference between those averages as the estimate of the impact of a swimming pool OR incorrectly calculates one or both averages of residual values but does report the difference between those averages as the estimate of the impact of a swimming pool OR does not use all of the residual values but does use a reasonable set of residual values (such as houses of similar size) and correctly calculates both averages and correctly reports the difference between those averages as the estimate of the impact of a swimming pool.

Incorrect (I) if the response does not meet the criteria for an E or P.

Notes:

• If the student calculates some other measure of center for the two sets of residuals (e.g., medians) and reports the difference as the estimate of the impact of a swimming pool, this part can be scored, at best, partially correct (P).

• If the student estimates the values of the residuals from the residual plot rather than using the residuals provided in the table, the response can be scored as essentially correct (E), provided it is clear that this is what was done.
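The section 3 calculation can be sketched in a few lines of Python. This is only an illustration of the method: the residual values below are hypothetical placeholders, not the values from the exam table, and the names `residuals_pool`, `residuals_no_pool`, and `pool_effect` are made up for this sketch.

```python
from statistics import mean

# Hypothetical residuals in thousands of dollars.
# These are NOT the residuals from the exam table; substitute the real ones.
residuals_pool = [12.0, 8.0, 15.0, 5.0]
residuals_no_pool = [-9.0, -11.0, -6.0, -14.0]


def pool_effect(with_pool, without_pool):
    """Return mean residual (pool) minus mean residual (no pool)."""
    return mean(with_pool) - mean(without_pool)


estimate = pool_effect(residuals_pool, residuals_no_pool)
print(f"Estimated impact of a pool: {estimate} thousand dollars")
```

Replacing `mean` with a median would match the alternative described in the first note above, which earns at best a P.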

Section 4 is scored as follows:

Essentially correct (E) if the response includes all three of the following components:

1. Correctly notes that the confidence interval in part (d) includes zero and so the difference in the slopes is not statistically significant.

2. Calculates a reasonable estimate in part (e):

• For approach 1, this includes choosing a house size within the range of the data and correctly computing the difference in predicted prices.

• For approach 2, this includes appealing to the fact that the slopes were judged as not significantly different and computing the difference in intercepts.

3. Includes a comparison of the estimate in part (e) to the estimate in part (c).

Partially correct (P) if the response includes only one of (1) and (2) above.

Incorrect (I) if the response includes neither (1) nor (2) above.

Notes:

• If the response uses approach 1, the difference between the two predicted values can range from 25.38 to 33.44, depending on the house size used.

• If the response uses approach 2, the constant vertical distance can be estimated from the graph showing the two regression lines rather than from the difference in intercepts, provided that the response makes it clear that this is what is being done.

• In the comparison with the estimate in part (c), an assessment of the size of the difference in estimates is not required. Statements that merely use phrases like “greater than,” “about the same,” etc. are acceptable for the comparison component of parts (d) and (e).

• If this section receives a score of partially correct only because the student neglects to compare the estimate in part (e) to the estimate in part (c), the response should be scored up if a decision on whether to score up or down is required.

• If the response subtracts the two fitted equations to obtain a general expression for the vertical distance between the two fitted lines as a function of house size, this should be considered an essentially correct approach for component 2 of section 4. The resulting expression is 15.580 + 0.006(size).

• If the student uses a house size outside the range of the data to compute the difference in predicted price, this can only be considered correct if the student appeals to the fact that the slopes of the sample regression lines are not significantly different.
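As a quick check of approach 1, the general expression quoted in the notes, 15.580 + 0.006(size), can be evaluated at a chosen house size. The sketch below assumes size is in square feet and price in thousands of dollars, which agrees with the units used elsewhere in these guidelines; the function name `price_gap` is made up for this illustration.

```python
def price_gap(size_sq_ft):
    """Vertical distance between the two fitted lines (thousands of dollars).

    Uses the expression from the scoring notes: 15.580 + 0.006(size).
    Assumes size is measured in square feet.
    """
    return 15.580 + 0.006 * size_sq_ft


# 2,000 square feet is the house size chosen in Sample 6A.
gap_2000 = price_gap(2000)
print(f"Predicted difference at 2,000 sq ft: {gap_2000:.2f} thousand dollars")
```

At 2,000 square feet the expression gives about 27.58, which falls inside the 25.38 to 33.44 range quoted above for approach 1.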

Each essentially correct (E) section counts as 1 point. Each partially correct (P) section counts as 1/2 point.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

If a response is between two scores (for example, 2 1/2 points), use a holistic approach to determine whether to score up or down, depending on the overall strength of the response and communication. In deciding whether to score up or down, pay particular attention to the response to the investigative part of the question (section 4).
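The point tally can be sketched as follows. This is an illustration only: it assumes an incorrect (I) section counts as 0 points, which the guidelines imply but do not state, and the names `POINTS`, `total_points`, and `needs_holistic_call` are made up here. The holistic decision itself is a judgment call and is not modeled.

```python
# Points per section score; I = 0 is an assumption implied by the guidelines.
POINTS = {"E": 1.0, "P": 0.5, "I": 0.0}


def total_points(section_scores):
    """Sum the points for the four section scores (each 'E', 'P', or 'I')."""
    return sum(POINTS[score] for score in section_scores)


def needs_holistic_call(total):
    """True when the total falls between two whole scores (e.g., 2.5)."""
    return total != int(total)


# Sample 6A earned an E on all four sections.
sample_6a = total_points(["E", "E", "E", "E"])

# A made-up response that lands between two scores.
between = total_points(["E", "P", "E", "I"])

print(sample_6a, needs_holistic_call(sample_6a))
print(between, needs_holistic_call(between))
```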

Sample: 6A
Score: 4

Part (a) of this response includes a correct interpretation of the slope, in context, so section 1, consisting of part (a), was scored as essentially correct. Section 2, consisting of part (b), was also scored as essentially correct because the residual of 49 is correctly interpreted in context. In part (c) residual averages are computed separately for houses with pools and for houses without pools, and the difference in the residual averages is correctly calculated; thus section 3, consisting of part (c), was scored as essentially correct.

In part (d) the response correctly states that there is no significant difference in the slopes and provides appropriate justification based on the given confidence interval. In part (e) a house size of 2,000 square feet, which is within the range of house sizes in the sample, is chosen, and the difference in price for a house of this size with a pool and a house of this size without a pool is computed. This estimate is then compared with the estimate in part (c). Section 4, consisting of parts (d) and (e), therefore includes all three components needed to receive a score of essentially correct.

The entire answer, based on all four sections, was judged a complete response and earned a score of 4.