BIOSTATS 640 Exam 2 – Spring 2020 Name ________________________________________________

Z:\bigelow\...\2020\...\BIOSTATS 640 Exam II 2020.docx Page 1 of 21

BIOSTATS 640 Intermediate Biostatistics Spring 2020

Examination II Unit 4 – Categorical Data Analysis &

Unit 5 - Normal Theory Regression and Correlation Due: Monday April 6, 2020

Last Date for Submission for Credit: Monday April 13, 2020 (-20 points)

Before you begin: This is a “take-home” exam. You are welcome to use any reference materials you wish. You are welcome to use the computer as you wish, too. However, you MUST work this exam by yourself and you may not consult with anyone (except me and that is fine…).

Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photocopy of your exam for safekeeping prior to submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions!! I have them….

How to submit your exam:

(1) ONLINE Students. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam2.pdf. To submit: please upload your exam to Blackboard Learn using the ASSIGNMENT tab at left, no later than Monday April 6, 2020.

(2) Worcester Section. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam2.pdf. To submit: 1) my preference is that you bring a hard copy (yes, be sure to keep an original) to class on Monday April 6, 2020; alternatively 2) if you prefer, you can upload your exam to Blackboard Learn using the ASSIGNMENT tab at left, no later than Monday April 6, 2020.

(3) Amherst Section. Please be sure your name is somewhere on your submission. Next, save it as a SINGLE FILE pdf using the naming convention lastname_exam2.pdf. To submit: 1) please upload your exam to Blackboard Learn using the ASSIGNMENT tab at left, no later than Monday April 6, 2020; alternatively 2) you can put a hard copy (yes, be sure to keep an original) in my mailbox (4th floor of Arnold House, across from the elevator), no later than Monday April 6, 2020.

(4) ALL. I will also accept exams sent by U.S. Post. Please mail with postmark no later than Monday April 6, 2020 to:

Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319

Signature

This is to confirm that in completing this exam, I worked independently and did not consult with anyone.

Name: ___________________________________________________________ Date: ___________________________

Thank you!

Before You Begin! Choose ONE Question to Skip

As with the first test, this test also has 11 questions. Choose EXACTLY 10 questions. If you submit answers to 11 questions, I will grade the first 10 questions only. The maximum score possible on this exam is 100.

1. (10 points total) The following data are from a study of utilization patterns of a Boston Health Maintenance Organization:

_________________________________________________
Number of Visits in One Year    Males    Females
_________________________________________________
0                                 35        24
1-2                               30        24
3-5                               21        24
6-10                               9        18
11 or more                         5        10
_________________________________________________
TOTALS                           100       100

1a. (5 points) Under the null hypothesis of no association between sex and number of visits, how many women would be expected to have 6-10 visits? How many men?

1b. (5 points) Test the null hypothesis of no association between sex and number of visits. Tip - Be sure to state the null and alternative hypothesis (1 point), show the test statistic (1 point) and its calculated value (1 point), report the achieved significance level (1 point), and interpret your findings (1 point).
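The mechanics behind questions 1a and 1b can be sketched in a few lines of Python. This is only an illustrative sketch, not an official solution: the variable names are made up here, and the helper `chi2_sf_even_df` is a hypothetical convenience function that uses a closed form valid only when the degrees of freedom are even (as they are for this 5 x 2 table).

```python
import math

# Counts copied from the table in question 1 (columns: males, females).
visits = ["0", "1-2", "3-5", "6-10", "11 or more"]
observed = [[35, 24], [30, 24], [21, 24], [9, 18], [5, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic, summed over all cells of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(chi_square, df)

for label, exp_row in zip(visits, expected):
    print(label, exp_row)
print("chi-square =", round(chi_square, 3), "df =", df, "p =", round(p_value, 3))
```

For tables with odd degrees of freedom, a statistics library would be the safer route for the p-value.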

2. (10 points total) A study was made of 100 terminal cancer patients who were given either vitamin C or placebo as part of their therapy. Patients differed in their age (AGE), gender (SEX), and location of tumor (SITE). Of interest is the outcome 0/1 remission (REMISS) at 30 days. A subset of the data, which includes 40 patients, is presented here.

Vitamin C Group                      Placebo Group
SITE       SEX   AGE   REMISS        SITE       SEX   AGE   REMISS
Stomach    F     61    Yes           Stomach    F     58    No
Stomach    M     69    No            Stomach    F     71    No
Stomach    F     62    Yes           Stomach    M     63    Yes
Stomach    F     66    No            Stomach    M     45    Yes
Stomach    M     63    Yes           Stomach    M     57    No
Bronchus   M     74    No            Bronchus   M     74    Yes
Bronchus   M     74    Yes           Bronchus   F     50    Yes
Bronchus   M     66    No            Bronchus   F     66    No
Bronchus   M     52    No            Bronchus   M     50    Yes
Bronchus   F     48    No            Bronchus   M     87    No
Colon      F     76    Yes           Colon      F     35    Yes
Colon      F     58    Yes           Colon      M     50    No
Colon      M     49    Yes           Colon      M     89    No
Colon      M     69    Yes           Colon      F     67    Yes
Colon      F     70    No            Colon      F     55    No
Rectum     F     56    No            Rectum     M     82    No
Rectum     F     75    Yes           Rectum     M     51    Yes
Rectum     F     57    Yes           Rectum     F     73    No
Rectum     M     56    Yes           Rectum     M     85    No
Rectum     M     68    No            Rectum     F     64    Yes

Carry out the appropriate statistical test to assess whether, overall, the data suggest that supplemental treatment with vitamin C is effective with respect to the outcome of remission.

Tip #1 - Be sure to state the null and alternative hypothesis (2 points), show the test statistic (2 points) and its calculated value (2 points), report the achieved significance level (2 points), and interpret your findings (1 points).

Tip #2 – To develop your solution, you will need to use the information provided to construct the 2x2 table that you then analyze.

3. (10 points total) The following data, from an article by Cochran (1954), show the clinical change (Improvement) by degree of infiltration – a measure of a type of skin damage – present at the beginning of a study of 196 leprosy patients who received 48 weeks of treatment.

                                      Improvement
Degree of Infiltration   Worse   Same   Slight   Moderate   Marked   Total
0-7                        11     27      42        53        11      144
8-15                        7     15      16        13         1       52
Total                      18     42      58        66        12      196

3a. (4 points) Test the hypothesis of no association between the degree of infiltration and the clinical change. In 1-2 sentences, explain your answer. Tip - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.

3b. (4 points) Next, assign scores from -1 to +3 for the clinical change categories worse to marked improvement and test the hypothesis of no linear trend.

3c. (2 points) In 1-2 sentences, explain the distinction between the test you performed in question #3a and the test you performed in question #3b.

4. (10 points total) A case-control study investigated exposures to high risk occupation and cigarette use. Cases were cases of bladder cancer. The following data were obtained.

High Risk     High Cigarette    Number of    Number of
Occupation    Consumption       Cases        Controls
No            No                  43           94
No            Yes                173          189
Yes           No                  26           20
Yes           Yes                111           72

4a. (2 points) What is the value of the estimated odds ratio for bladder cancer comparing high risk occupation to non-high risk occupation (ignoring cigarette consumption)?

4b. (2 points) Test the null hypothesis that the association of occupation and bladder cancer is the same for high cigarette smokers and non-cigarette smokers. Tip - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.

4c. (3 points) What is the value of the Mantel-Haenszel “common” odds ratio measure of association relating bladder cancer case status to high risk occupation (yes or no)?

4d. (3 points) Test the null hypothesis that the smoking adjusted odds ratio measure of association between high risk occupation and bladder cancer is one. You may assume homogeneity of association. Again - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.
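As a hedged illustration of the quantities named in 4a and 4c, the sketch below computes a crude odds ratio from the collapsed table and the Mantel-Haenszel summary odds ratio across the two smoking strata. The cell labels (a, b, c, d) and all variable names are choices made for this sketch, not notation from the exam, and the code does not carry out the hypothesis tests in 4b or 4d.

```python
# Each stratum is (a, b, c, d), where "exposed" means high risk occupation = Yes:
#   a = exposed cases,   b = exposed controls,
#   c = unexposed cases, d = unexposed controls.
strata = {
    "not high cigarette consumption": (26, 20, 43, 94),
    "high cigarette consumption": (111, 72, 173, 189),
}


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for one 2x2 table."""
    return (a * d) / (b * c)


# Crude odds ratio: collapse the strata by summing corresponding cells.
crude_cells = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_cells)

# Stratum-specific odds ratios, useful for eyeballing homogeneity.
stratum_or = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel estimator: sum(a*d/n) / sum(b*c/n) over strata.
mh_numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = mh_numerator / mh_denominator

print("crude OR:", round(crude_or, 3))
for name, value in stratum_or.items():
    print(name, "OR:", round(value, 3))
print("Mantel-Haenszel OR:", round(mh_or, 3))
```

Comparing the crude and Mantel-Haenszel values gives a rough sense of how much smoking confounds the occupation association.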

5. (10 points total) The Minnesota Department of Conservation is interested in the migration habits of its Canada geese. To study this question, 18,306 geese were captured and classified by age and sex. Each was banded and then released. During the subsequent migration season, hunters in the same area were asked to forward to the Minnesota Department of Conservation the band from any goose that they shot. The following information was obtained.

Age and Sex        Number Banded   Proportion Banded   Number Returned
Adult male              4,144            .2264               17
Adult female            3,597            .1965               21
Yearling male           5,034            .2750               38
Yearling female         5,531            .3021               36
TOTAL                  18,306           1.0000              112

Carry out the appropriate statistical test to assess whether or not the distribution of geese types returning (and shot, unfortunately) is the same as the distribution of geese types among the 18,306 that were banded. Tip - Be sure to state the null and alternative hypothesis (2 points), show the test statistic (2 points) and its calculated value (2 points), report the achieved significance level (2 points), and interpret your findings (1 points).
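One way to set up this comparison is a chi-square goodness-of-fit calculation, with the banded proportions acting as the hypothesized distribution for the 112 returned bands. The following is a minimal sketch under that assumption; the dictionary names are illustrative, and the p-value lookup is deliberately left out.

```python
# Returned band counts and banded proportions, copied from the table.
observed = {
    "Adult male": 17,
    "Adult female": 21,
    "Yearling male": 38,
    "Yearling female": 36,
}
banded_proportion = {
    "Adult male": 0.2264,
    "Adult female": 0.1965,
    "Yearling male": 0.2750,
    "Yearling female": 0.3021,
}

n_returned = sum(observed.values())

# Expected returns if returned geese follow the banded distribution.
expected = {group: n_returned * p for group, p in banded_proportion.items()}

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[group] - expected[group]) ** 2 / expected[group] for group in observed
)
df = len(observed) - 1  # no parameters are estimated from the returned sample

for group in observed:
    print(group, observed[group], round(expected[group], 2))
print("chi-square =", round(chi_square, 3), "on", df, "df")
```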

6. (10 points total) BEFORE YOU BEGIN – This question tests your understanding of simple linear regression. You might want to make use of the BIOSTATS 540 Unit 12 resources here, in particular the course notes for that unit. Here is the link to the BIOSTATS 540 webpage for simple linear regression:

https://people.umass.edu/biep540w/webpages/fall%202016%2012%20regression.html

Introduction. You might reasonably suspect that a person’s high school GPA is a good predictor of that person’s first-year college GPA. This question explores that. Following are the data.

Student ID   High School GPA   First-Year College GPA
 1                3.5                  3.3
 2                2.5                  2.2
 3                4.0                  3.5
 4                3.8                  2.7
 5                2.8                  3.5
 6                1.9                  2.0
 7                3.2                  3.1
 8                3.7                  3.4
 9                2.7                  1.9
10                3.3                  3.7

6a. (1 point) State the assumptions necessary for a simple linear regression model. Given the description provided above, what is the dependent (outcome) variable Y? What is the independent (predictor) variable X?

6b. (1 point) Calculate the least squares estimate of the slope and intercept.
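The least squares calculation asked for in 6b can be sketched by hand with the usual sums of squares and cross-products. The sketch below assumes X is the high school GPA and Y is the first-year college GPA; the names `s_xx`, `s_xy`, and `predict` are introduced here for illustration only.

```python
# Data copied from the table: X = high school GPA, Y = first-year college GPA.
x = [3.5, 2.5, 4.0, 3.8, 2.8, 1.9, 3.2, 3.7, 2.7, 3.3]
y = [3.3, 2.2, 3.5, 2.7, 3.5, 2.0, 3.1, 3.4, 1.9, 3.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sum of squares of X and corrected sum of cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def predict(x_new):
    """Fitted value on the estimated line at x_new."""
    return intercept + slope * x_new


print("slope =", round(slope, 4), "intercept =", round(intercept, 4))
print("fitted value at X = 2.5:", round(predict(2.5), 3))
```

The same `s_xx` and the residual mean square are the ingredients for the standard errors that later parts of the question ask about.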

6. (10 points total) - continued

6c. (1 point) Complete the following analysis of variance table by completing the 10 blanks, ?___.

Source              DF     Sum of Squares   Mean Square   F-Ratio   p-value
Regression          ?___   ?___             ?___          ?___      ?___
Residual            ?___   ?___             ?___
Total, corrected    ?___   ?___

6d. (1 point) Test the fitted model for statistical significance. In reporting your answer, please include statements of: the null and alternative hypotheses, the formula for the test statistic, the value of the test statistic, the p-value and most importantly, your interpretation!

6e. (1 point) Using the answer you obtained in question #6b, what is the predicted First-Year College GPA for an average student with High School GPA = 2.5?

6f. (1 point) Consider your answer to question #6e. What is the estimated standard error of the estimated mean prediction you obtained there?

6g. (1 point) Using the answer you obtained in question #6b, what is the predicted First-Year College GPA for an individual student with High School GPA = 2.5?

6h. (1 point) Consider your answer to question #6g. What is the estimated standard error of the estimated individual prediction you obtained?

6. (10 points total) - continued

6i. (1 point) Look again at the raw data. How does student #2 (with High School GPA = 2.5) compare with other students (with High School GPA = 2.5) given that that student’s First-Year College GPA is 2.2?

6j. (1 point) Comment on the difference between the two predictions (questions #6e and #6g) and the two standard errors (questions #6f and #6h).

7. (10 points total) To assess physical conditioning in normal individuals, it is useful to know how much energy they are capable of expending. Since the process of expending energy requires oxygen, one way to evaluate this is to look at the rate at which they use oxygen at peak physical activity. To examine the peak physical activity, tests have been designed where the individual runs on a treadmill. At specified time intervals, the speed at which the treadmill moves and the grade of the treadmill both increase. The individual is then systematically run to maximum physical capacity. The maximum capacity is determined by the individual; the person stops when unable to go further. Because physical conditioning is relative to the size of the individual, such measures take into account body size. One of these is VO2 MAX (ml/kg/min); this is computed by looking at the volume of oxygen used per minute per kilogram of body weight. Consider the following multiple predictor regression analysis of n=94 sedentary males with treadmill tests. The dependent (outcome) variable is Y = VO2 MAX . There are four predictors:

X1 = treadmill duration (seconds)
X2 = maximum heart rate (beats/minute)
X3 = height (centimeters)
X4 = weight (kilograms)

A partial display of the regression results is provided.

Coefficients Table
Constant or Predictor        β          SE(β)
X1 = treadmill duration      0.0510     0.00416
X2 = max heart rate          0.0191     0.0258
X3 = height                 -0.0320     0.0444
X4 = weight                  0.0089     0.0520
Constant (intercept)         2.89      11.17

Analysis of Variance Table
Source              Sum of Squares   DF     Mean Square    F-Ratio     P
Regression          4,314.69         ____   ____________   _________   _____
Residual            ___________      ____   ____________
Total, corrected    5,245.31         ____

7. (10 points total) - continued

7a. (2 points) Compute the t-statistic value for testing the adjusted statistical significance of X1 = treadmill duration (β1 = 0). What is its achieved significance (the p-value)? Do we reject at the 10% significance level?

7b. (3 points) Fill in the missing values in the analysis of variance table by completing the 8 blanks below.

Source        Sum of Squares   DF      Mean Square      F-Ratio     P
Regression    4,314.69         _____   ______________   _________   _______
Residual      ______________   _____   ______________
Total         5,245.31         _____

7c. (3 points) Next, test the overall significance of the fitted model. In developing your answer, be sure to state the null and alternative hypotheses. In 1-2 sentences, interpret your findings.

7d. (2 points) Calculate the value of the R2. In one sentence, explain its meaning.
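The arithmetic that links the pieces of question 7 (the coefficient t-statistic, the analysis of variance entries, and R2) can be sketched as below. This is an illustrative sketch that assumes an intercept plus four predictors fit to n = 94 observations, as described in the question; the variable names are invented here, and the p-values are left to tables or software.

```python
# Values given in the question.
n = 94                    # sedentary males
k = 4                     # predictors X1..X4
ss_regression = 4314.69
ss_total = 5245.31
beta_x1, se_x1 = 0.0510, 0.00416

# t-statistic for one coefficient: estimate divided by its standard error.
t_x1 = beta_x1 / se_x1

# Degrees of freedom for a model with an intercept and k predictors.
df_regression = k
df_residual = n - k - 1
df_total = n - 1

# Residual sum of squares by subtraction, then mean squares and overall F.
ss_residual = ss_total - ss_regression
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# R-squared: proportion of total variability explained by the regression.
r_squared = ss_regression / ss_total

print("t for X1 =", round(t_x1, 2), "on", df_residual, "df")
print("SS residual =", round(ss_residual, 2))
print("F =", round(f_ratio, 2), "on", (df_regression, df_residual), "df")
print("R-squared =", round(r_squared, 4))
```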

8. (10 points total) A multiple linear regression analysis of n=19 cases of coronary artery disease investigated three predictors in relationship to Y = VO2 max.

X1 = maximal ejection fraction
X2 = maximal heart rate
X3 = maximal systolic blood pressure

Preliminary descriptive statistics on the 19 values of Y = VO2 max yielded a sample mean Ȳ = 37.052 and s = 8.7017.

Suppose several multiple predictor models are fit and you are given the following.

Model   Predictors in the model   Sum of Squares Residual (due error)
1       X1, X2, X3                  790.76
2       X1, X2                      791.49
3       X1, X3                     1270.24
4       X2, X3                      814.16
5       X1                         1357.48
6       X2                          814.41
7       X3                         1281.19

8a. (2 points) The sample standard deviation of Y = VO2 max is given to you; it is s=8.7017. Using this information, calculate the value of the total sum of squares, corrected. This is also called

8b. (2 points) Using your answer to question #8a and the information in the table above, complete the following analysis of variance table by completing the 10 cells with “?___”

Source                       DF     SSQ    MSQ    F-Ratio   R2
Regression {X1, X2, X3}      ?___   ?___   ?___   ?___      ?___
Residual                     ?___   ?___   ?___
Total, corrected             ?___   ?___

8. (10 points total) - continued

8c. (2 points) Again, using your answer to question #8a and the information in the table above, complete the following analysis of variance table by completing the 7 cells with “?___”

Source                         DF     SSQ
Regression
    (X3)                       1      ?___
    (X2 | X3)                  1      ?___
    (X1 | X2, X3)              1      ?___
Residual                       ?___   ?___
Total, corrected               ?___   ?___

8d. (4 points) Carry out the appropriate test to compare the following two models

Y = β0 + β1X1 + β2X2 + β3X3 + E
versus
Y = β0 + β3X3 + E

In your answer, please indicate
8d (i). (1 point) The null and alternative hypotheses.
8d (ii). (1 point) The test statistic formula and its value for these data.
8d (iii). (1 point) The achieved level of significance (p-value).
8d (iv). (1 point) Interpretation of your findings.
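A partial F-test is the usual way to compare a full model with a nested reduced model, and the sketch below shows its arithmetic using the residual sums of squares listed for Model 1 (X1, X2, X3) and Model 7 (X3 only). It also recovers the corrected total sum of squares from the sample standard deviation. Treat it as a sketch under those assumptions; the variable names are illustrative, and the p-value is left to an F table or software.

```python
# Values given in question 8.
n = 19
s_y = 8.7017             # sample standard deviation of Y
sse_full = 790.76        # Model 1: X1, X2, X3
sse_reduced = 1281.19    # Model 7: X3 only

# Total sum of squares, corrected: (n - 1) * s^2.
ss_total = (n - 1) * s_y ** 2

# Degrees of freedom: residual df of the full model, and the number of
# coefficients set to zero under the null hypothesis (here those of X1 and X2).
df_full = n - 3 - 1
extra_terms = 2

# Partial F = (extra regression SS / extra df) / (full-model residual mean square).
extra_ss = sse_reduced - sse_full
partial_f = (extra_ss / extra_terms) / (sse_full / df_full)

print("total SS, corrected =", round(ss_total, 2))
print("extra SS due to X1, X2 given X3 =", round(extra_ss, 2))
print("partial F =", round(partial_f, 3), "on", (extra_terms, df_full), "df")
```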

9. (10 points, total) It is known that both body mass index (BMI) and LDL cholesterol are risk factors for heart disease. It is also appreciated that BMI and LDL might be interrelated themselves. Consider a multiple linear regression analysis that is focused on the relationship of BMI as an important correlate of LDL. Here, LDL is the outcome variable (Y) and BMI is the predictor variable of primary interest. Possible confounders of a BMI-LDL relationship include age, nonwhite ethnicity, smoking, and alcohol use (DRINKANY). Consider the following two models.

Single Predictor Model

      Source |       SS       df       MS              Number of obs =    2747
-------------+------------------------------           F(  1,  2745) =   10.14
       Model |  14446.0223     1  14446.0223           Prob > F      =  0.0015
    Residual |  3910928.63  2745  1424.74631           R-squared     =  0.0037
-------------+------------------------------           Adj R-squared =  0.0033
       Total |  3925374.66  2746  1429.48822           Root MSE      =  37.746

------------------------------------------------------------------------------
         LDL |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .4151123   .1303648     3.18   0.001     .1594894    .6707353
      _cons* |   133.1913     3.7939    35.11   0.000     125.7521    140.6305
------------------------------------------------------------------------------

* Key: “_cons” in the coefficients table refers to the intercept.

Multiple Predictor Model

      Source |       SS       df       MS              Number of obs =    2745
-------------+------------------------------           F(  5,  2739) =    5.97
       Model |  42279.1877     5  8455.83753           Prob > F      =  0.0000
    Residual |   3881903.3  2739  1417.27028           R-squared     =  0.0108
-------------+------------------------------           Adj R-squared =  0.0090
       Total |  3924182.49  2744  1430.09566           Root MSE      =  37.647

------------------------------------------------------------------------------
         LDL |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .3591038   .1341047     2.68   0.007     .0961472    .6220605
         age |  -.1897166   .1130776    -1.68   0.094    -.4114426    .0320095
    nonwhite |   5.219436   2.323673     2.25   0.025     .6631081    9.775764
     smoking |   4.750738   2.210391     2.15   0.032     .4165363     9.08494
    drinkany |  -2.722354   1.498854    -1.82   0.069    -5.661351    .2166444
       _cons |   147.3153   9.256449    15.91   0.000      129.165    165.4656
------------------------------------------------------------------------------

9. (10 points total) - continued

9a. (2 points) In unadjusted analysis, what is the estimated change in LDL per unit increase in BMI?

9b. (2 points) In unadjusted analysis, what is the 95% confidence interval estimate of the change in LDL per unit increase in BMI?

9c. (2 points) Controlling for age, non-white ethnicity, smoking and alcohol, what is the estimated change in LDL per unit increase in BMI?

9d. (2 points) Controlling for age, non-white ethnicity, smoking and alcohol, what is the 95% confidence interval estimate of the change in LDL per unit increase in BMI?

9e. (2 points) Is the confounding of the BMI-LDL relationship by age, non-white ethnicity, smoking and alcohol an example of positive confounding or negative confounding? Explain.

10. (10 points, total) Radial keratotomy is a type of surgery performed to reduce myopia in near sighted patients. The Prospective Evaluation of Radial Keratotomy (PERK) study was initiated in 1983 with the goal of investigating the effects of radial keratotomy. In one study the outcome of interest was Y = 5-year post-surgical change in refractive error (diopters) in relationship to an hypothesized predictor X1 = baseline refractive error. A sample of n=54 was studied.

Now suppose we want to investigate whether the relationship of Y = 5-year post-surgical change in refractive error (diopters) to X1 = baseline refractive error is different, depending on gender. To address this, two new variables are created, Z and X1Z.

Z = 1 if patient is male
  = 0 if patient is female
X1Z = (Z) * (X1)

Recall from class what this kind of new variable does:

X1Z = (Z) * (X1) = (1) * X1 if patient is male
                 = 0 if patient is female

The following two models are fit and yielded the following output.

Model 1: Y regressed on X1 and Z

                    df    Sum of squares   Mean square   F       p-value
Model                2      15.30101         7.65207     6.009   0.0045
Error               51      64.94880         1.27351
Total, corrected    53      80.25294

y = 2.752647 - 0.309731*x1 - 0.412878*z

Model 2: Y regressed on X1 and Z and X1Z

                    df    Sum of squares   Mean square   F       p-value
Model                3      19.65170         6.55057     5.405   0.0027
Error               50      60.60124         1.21202
Total, corrected    53      80.25294

y = 3.178210 - 0.201008*x1 - 1.995126*z - 0.383826*x1z

10. (10 points total) - continued

10a. (2 points) State a single multiple linear regression model that defines straight-line models relating Y = 5-year post surgical change in refractive error (diopters) to X1 = baseline refractive error for both males and females. Be sure to define all terms.

10b. (3 points) Using the output provided on the previous page, carry out the appropriate statistical test to test whether the lines for males and females coincide. Tip - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.

10c. (3 points) Again using the output provided on the previous page, carry out the appropriate statistical test to test whether the lines for males and females are parallel. Tip - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.

10d. (2 points) In 1-2 sentences at most, comment on the comparison of the straight-line models relating Y = 5-year post-surgical change in refractive error (diopters) to X1 = baseline refractive error for both males and females.

11. (10 points total) Following the 1973 oil crisis, the Department of Transportation performed a normal theory multiple linear regression analysis of 1972 fuel consumption (Y) data from the 48 contiguous states. Several predictors were hypothesized a priori as potentially important, including:

X1: 1972 population in thousands (POP)
X2: 1972 tax on fuel in cents per gallon (TAX)
X3: number of licensed drivers (NLIC)
X4: 1972 average per capita income (INC), and
X5: length of federal aid highways (ROAD).

The dependent variable in these analyses was the 1972 fuel consumption for the state, in millions of gallons (Y=FUELC). A partial listing of the data follows:

State     Y=FUELC   X1=POP   X2=TAX   X3=NLIC   X4=INC   X5=ROAD
Maine       541      1029     9.00      540      3571     1976
Iowa        635      2883     7.00     1689      4318    10340
…
Oregon      524      2182     7.00    12130      5002     9794
Utah        591      1126     7.00      572      3745     2611

11a. (2 points) In one analysis, the Department of Transportation was interested in the relationship between TAX and FUELC, after controlling for NLIC. Two models were fit. The first contained the single predictor TAX. The second contained both predictors, TAX and NLIC. The following regression coefficients were obtained for the predictor TAX:

Predictors in Model          Estimated β for X2 = TAX
X2 = TAX                       -53.00
X2 = TAX and X3 = NLIC         -32.08

Analyses not shown here indicate that -53.00 and -32.08 are really different. In 1-3 sentences, what does this suggest about the association of FUELC with TAX? Tip - This question is asking you to show your understanding of crude versus adjusted association.

11. (10 points total) - continued

11b. (4 points) The fit of four predictors (ROAD, INC, NLIC, TAX) yielded the following analysis of variance table:

Source              DF    Sum of Squares   Mean Square
Regression           4       339,316          84,829
Residual            43       189,050           4,397
Total, corrected    47       528,366

In one analysis, it was of interest to determine whether a reduced model containing the three predictors ROAD, INC, and NLIC was just as good as the full four predictor model. The following analysis of variance table was produced:

Source                           DF    Sum of Squares   Mean Square
Regression
    ROAD, INC, NLIC               3       307,684         102,561
    TAX | road, inc, nlic         1        31,632          31,632
Residual                         43       189,050           4,397
Total, corrected                 47       528,366

Perform the appropriate test for the null hypothesis that knowledge of TAX contains no additional information for the prediction of FUELC beyond that contained in the model with the three predictors (ROAD, INC, NLIC). Tip - Be sure to state the null and alternative hypothesis, show the test statistic and its calculated value, report the achieved significance level, and interpret your findings.

11c. (4 points) Using the analysis of variance information from question #11b, compute the partial correlation squared between FUELC and TAX, after adjustment for ROAD, INC, and NLIC.
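The two quantities in 11b and 11c both come straight from the sequential analysis of variance table, and a short sketch makes the relationship explicit. The formulas used are the standard ones for a single added predictor (partial F as the extra sum of squares over the full-model residual mean square, and squared partial correlation as the extra sum of squares over the reduced-model residual sum of squares); the variable names are made up for this sketch, and the p-value is left to an F table or software.

```python
# Values copied from the sequential analysis of variance table in 11b.
ss_tax_given_others = 31632   # TAX | ROAD, INC, NLIC (1 df)
ss_residual_full = 189050     # residual SS of the four-predictor model
df_residual_full = 43

# Residual mean square of the full model (the table rounds this to 4,397).
ms_residual_full = ss_residual_full / df_residual_full

# Partial F for adding TAX: extra SS per extra df over the full-model MSE.
partial_f = (ss_tax_given_others / 1) / ms_residual_full

# Residual SS of the reduced model = full residual SS + extra SS due to TAX.
ss_residual_reduced = ss_residual_full + ss_tax_given_others

# Squared partial correlation: share of the reduced model's unexplained
# variability that TAX accounts for.
partial_r_squared = ss_tax_given_others / ss_residual_reduced

print("partial F =", round(partial_f, 3), "on", (1, df_residual_full), "df")
print("partial r-squared =", round(partial_r_squared, 4))
```

The two are linked by partial r-squared = F / (F + residual df), which gives a quick consistency check on hand calculations.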