Page 1: Multiple Regression

Multiple Regression

EPP 245/298

Statistical Analysis of Laboratory Data

Page 2: Multiple Regression

October 26, 2006 EPP 245 Statistical Analysis of Laboratory Data

Cystic Fibrosis Data

Cystic fibrosis lung function data

Lung function data for cystic fibrosis patients (7-23 years old)

age      a numeric vector. Age in years.
sex      a numeric vector code. 0: male, 1: female.
height   a numeric vector. Height (cm).
weight   a numeric vector. Weight (kg).
bmp      a numeric vector. Body mass (% of normal).
fev1     a numeric vector. Forced expiratory volume.
rv       a numeric vector. Residual volume.
frc      a numeric vector. Functional residual capacity.
tlc      a numeric vector. Total lung capacity.
pemax    a numeric vector. Maximum expiratory pressure.

Page 3: Multiple Regression

Some Stata Commands

. insheet using "C:\TD\CLASS\K30Bench2005\cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf

Page 4: Multiple Regression

[Figure: scatterplot matrix (graph matrix) of age, sex, height, weight, bmp, fev1, rv, frc, tlc, and pemax]

Page 5: Multiple Regression

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    2.93
       Model |  17101.3907     9  1900.15452           Prob > F      =  0.0320
    Residual |  9731.24928    15  648.749952           R-squared     =  0.6373
-------------+------------------------------           Adj R-squared =  0.4197
       Total |    26832.64    24  1118.02667           Root MSE      =  25.471

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -2.54196   4.801699    -0.53   0.604    -12.77654    7.692618
         sex |  -3.736782   15.45982    -0.24   0.812    -36.68861    29.21505
      height |  -.4462549   .9033548    -0.49   0.628     -2.37171      1.4792
      weight |   2.992816   2.007957     1.49   0.157    -1.287044    7.272675
         bmp |  -1.744944   1.155237    -1.51   0.152    -4.207274    .7173865
        fev1 |   1.080697   1.080947     1.00   0.333    -1.223288    3.384682
          rv |    .196972   .1962136     1.00   0.331    -.2212474    .6151915
         frc |  -.3084314   .4923899    -0.63   0.540    -1.357936    .7410729
         tlc |   .1886017   .4997351     0.38   0.711    -.8765585    1.253762
       _cons |   176.0582   225.8911     0.78   0.448    -305.4174    657.5338
------------------------------------------------------------------------------
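
The summary figures in the header of the full-model output all follow from the sums of squares and degrees of freedom. The following is a minimal Python sketch (not part of the original Stata session) that recomputes them from the Model and Residual rows; the variable names are illustrative only.

```python
import math

# Sums of squares and degrees of freedom copied from the full-model output.
ss_model, df_model = 17101.3907, 9
ss_resid, df_resid = 9731.24928, 15
ss_total = ss_model + ss_resid      # Total row of the table
df_total = df_model + df_resid      # n - 1 = 24

ms_model = ss_model / df_model      # mean square for the model
ms_resid = ss_resid / df_resid      # mean square error
f_stat = ms_model / ms_resid        # whole-model F statistic
r2 = ss_model / ss_total            # R-squared
adj_r2 = 1 - (1 - r2) * df_total / df_resid   # penalised for 9 predictors
root_mse = math.sqrt(ms_resid)      # residual standard deviation

print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared = {r2:.4f}, Adj R-squared = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.3f}")
```

The recomputed values should agree with the reported 2.93, 0.6373, 0.4197, and 25.471 up to rounding.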

Page 6: Multiple Regression

T-test of additional value of variable: in the full-model output (Page 5), the t and P>|t| columns test, for each variable, whether it adds anything to a model that already contains all the others.
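
Each t statistic is the coefficient divided by its standard error, and the 95% interval is the coefficient plus or minus a t critical value times that standard error. A minimal Python sketch for the weight row of the full model follows; the critical value is an assumption taken from a standard t table (two-sided 5%, 15 residual degrees of freedom) rather than from the slides.

```python
# Coef. and Std. Err. for `weight`, copied from the full-model output.
coef, std_err = 2.992816, 2.007957

# ASSUMPTION: tabulated two-sided 5% critical value of Student's t with
# 15 residual degrees of freedom (25 observations - 9 predictors - 1).
T_CRIT_15_DF = 2.13145

t_stat = coef / std_err                      # the "t" column
ci_lower = coef - T_CRIT_15_DF * std_err     # lower 95% limit
ci_upper = coef + T_CRIT_15_DF * std_err     # upper 95% limit

print(f"t = {t_stat:.2f}")
print(f"95% CI = [{ci_lower:.6f}, {ci_upper:.6f}]")
```

The interval contains zero exactly when |t| is below the critical value, which is why weight (t = 1.49) is not significant at the 5% level in the full model.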

Page 7: Multiple Regression

Test of whole model: in the full-model output (Page 5), F( 9, 15) = 2.93 with Prob > F = 0.0320 tests all nine coefficients jointly against the intercept-only model.

Page 8: Multiple Regression

[Figure: residual-versus-fitted plot (rvfplot) for the full model. Vertical axis: Residuals, -40 to 40. Horizontal axis: Fitted values, 80 to 180.]

Page 9: Multiple Regression

Least significant variable: sex (t = -0.24, P>|t| = 0.812) in the full-model output (Page 5).

Page 10: Multiple Regression

. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  8,    16) =    3.49
       Model |  17063.4886     8  2132.93607           Prob > F      =  0.0159
    Residual |  9769.15144    16  610.571965           R-squared     =  0.6359
-------------+------------------------------           Adj R-squared =  0.4539
       Total |    26832.64    24  1118.02667           Root MSE      =   24.71

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.114515   4.330841    -0.49   0.632    -11.29549    7.066459
      height |   -.394836    .851725    -0.46   0.649    -2.200412     1.41074
      weight |   2.834909   1.841995     1.54   0.143    -1.069947    6.739765
         bmp |  -1.741637   1.120651    -1.55   0.140    -4.117312     .634038
        fev1 |    1.26509   .7429407     1.70   0.108    -.3098737    2.840054
          rv |   .1779046   .1742911     1.02   0.323    -.1915759    .5473852
         frc |  -.2483218   .4122804    -0.60   0.555    -1.122317    .6256736
         tlc |   .2084044   .4782484     0.44   0.669    -.8054369    1.222246
       _cons |   153.0385   198.7149     0.77   0.452    -268.2183    574.2953
------------------------------------------------------------------------------

Least significant variable: tlc (P>|t| = 0.669)

Page 11: Multiple Regression

. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  7,    17) =    4.16
       Model |  16947.5458     7  2421.07798           Prob > F      =  0.0077
    Residual |  9885.09416    17  581.476127           R-squared     =  0.6316
-------------+------------------------------           Adj R-squared =  0.4799
       Total |    26832.64    24  1118.02667           Root MSE      =  24.114

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.663193   4.043832    -0.66   0.519    -11.19493    5.868546
      height |  -.4895733   .8036502    -0.61   0.550    -2.185127    1.205981
      weight |   3.155659   1.647815     1.92   0.072    -.3209274    6.632245
         bmp |  -1.962543   .9753332    -2.01   0.060    -4.020316    .0952305
        fev1 |   1.247861   .7239953     1.72   0.103    -.2796361    2.775357
          rv |   .1595988   .1650733     0.97   0.347    -.1886753    .5078729
         frc |  -.1764595    .368749    -0.48   0.638    -.9544518    .6015328
       _cons |   198.2942   165.3311     1.20   0.247    -150.5238    547.1123
------------------------------------------------------------------------------

Least significant variable: frc (P>|t| = 0.638)

Page 12: Multiple Regression

. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  6,    18) =    5.04
       Model |  16814.3899     6  2802.39832           Prob > F      =  0.0034
    Residual |  10018.2501    18  556.569447           R-squared     =  0.6266
-------------+------------------------------           Adj R-squared =  0.5022
       Total |    26832.64    24  1118.02667           Root MSE      =  23.592

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -1.819342   3.560301    -0.51   0.616    -9.299258    5.660573
      height |  -.4101508   .7693006    -0.53   0.600    -2.026391     1.20609
      weight |   2.874434   1.506126     1.91   0.072    -.2898203    6.038688
         bmp |  -1.949083   .9538193    -2.04   0.056    -3.952983    .0548169
        fev1 |   1.411959   .6238279     2.26   0.036     .1013452    2.722573
          rv |   .0955779   .0946057     1.01   0.326    -.1031813    .2943371
       _cons |   166.9049   148.4762     1.12   0.276    -145.0321    478.8418
------------------------------------------------------------------------------

Least significant variable: age (P>|t| = 0.616)

Page 13: Multiple Regression

. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  5,    19) =    6.23
       Model |  16669.0534     5  3333.81068           Prob > F      =  0.0014
    Residual |  10163.5866    19   534.92561           R-squared     =  0.6212
-------------+------------------------------           Adj R-squared =  0.5215
       Total |    26832.64    24  1118.02667           Root MSE      =  23.128

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.4485274   .7505918    -0.60   0.557    -2.019534    1.122479
      weight |   2.338692   1.060094     2.21   0.040     .1198889    4.557495
         bmp |  -1.641001   .7246036    -2.26   0.035    -3.157614   -.1243885
        fev1 |   1.471767   .6007182     2.45   0.024     .2144491    2.729084
          rv |    .110117   .0884543     1.24   0.228      -.07502     .295254
       _cons |   137.0958   133.8559     1.02   0.319    -143.0677    417.2594
------------------------------------------------------------------------------

Least significant variable: height (P>|t| = 0.557)

Page 14: Multiple Regression

. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  4,    20) =    7.96
       Model |  16478.0401     4  4119.51002           Prob > F      =  0.0005
    Residual |  10354.5999    20  517.729996           R-squared     =  0.6141
-------------+------------------------------           Adj R-squared =  0.5369
       Total |    26832.64    24  1118.02667           Root MSE      =  22.754

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.748914   .3806332     4.59   0.000     .9549274    2.542901
         bmp |  -1.377243   .5653421    -2.44   0.024    -2.556526   -.1979604
        fev1 |   1.547698   .5776112     2.68   0.014     .3428223    2.752574
          rv |   .1257152   .0831456     1.51   0.146    -.0477234    .2991538
       _cons |    63.9467   53.27673     1.20   0.244    -47.18661      175.08
------------------------------------------------------------------------------

Least significant variable: rv (P>|t| = 0.146)

Page 15: Multiple Regression

. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
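
Across the sequence of models fitted so far, R-squared can only fall as variables are dropped, while Adj R-squared can rise because it charges for each predictor. A minimal Python sketch (illustrative names, values copied from the regress headers of the nine- through three-predictor models) recomputes Adj R-squared from R-squared and shows where it peaks.

```python
N_OBS = 25  # observations in the cystic fibrosis data

# (number of predictors, R-squared, reported Adj R-squared), copied from the
# regress output for each model in the backward-elimination sequence.
models = [
    (9, 0.6373, 0.4197),
    (8, 0.6359, 0.4539),
    (7, 0.6316, 0.4799),
    (6, 0.6266, 0.5022),
    (5, 0.6212, 0.5215),
    (4, 0.6141, 0.5369),
    (3, 0.5700, 0.5086),
]


def adjusted_r2(r2, n_obs, n_predictors):
    """Adj R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


recomputed = [adjusted_r2(r2, N_OBS, k) for k, r2, _ in models]
for (k, r2, reported), adj in zip(models, recomputed):
    print(f"k = {k}: R2 = {r2:.4f}, Adj R2 = {adj:.4f} (reported {reported:.4f})")

# Model size with the largest reported Adj R-squared.
best_k = max(models, key=lambda m: m[2])[0]
print(f"largest Adj R-squared at k = {best_k}")
```

Small differences from the reported values are expected because the inputs are already rounded to four decimals.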

Page 16: Multiple Regression

. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.0500  removing sex
p = 0.6688 >= 0.0500  removing tlc
p = 0.6384 >= 0.0500  removing frc
p = 0.6156 >= 0.0500  removing age
p = 0.5572 >= 0.0500  removing height
p = 0.1462 >= 0.0500  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
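
The removal rule behind this log can be sketched in a few lines of Python. This is only an illustration of backward elimination, not Stata's implementation: the hypothetical refit() function does not fit anything, it just looks up the P>|t| values already reported in the manual regress runs on the earlier slides.

```python
# P>|t| for each predictor in each fitted model, copied from the regress
# output on the earlier slides (one dict per model, in the order fitted).
REPORTED_FITS = [
    {"age": 0.604, "sex": 0.812, "height": 0.628, "weight": 0.157, "bmp": 0.152,
     "fev1": 0.333, "rv": 0.331, "frc": 0.540, "tlc": 0.711},
    {"age": 0.632, "height": 0.649, "weight": 0.143, "bmp": 0.140,
     "fev1": 0.108, "rv": 0.323, "frc": 0.555, "tlc": 0.669},
    {"age": 0.519, "height": 0.550, "weight": 0.072, "bmp": 0.060,
     "fev1": 0.103, "rv": 0.347, "frc": 0.638},
    {"age": 0.616, "height": 0.600, "weight": 0.072, "bmp": 0.056,
     "fev1": 0.036, "rv": 0.326},
    {"height": 0.557, "weight": 0.040, "bmp": 0.035, "fev1": 0.024, "rv": 0.228},
    {"weight": 0.000, "bmp": 0.024, "fev1": 0.014, "rv": 0.146},
    {"weight": 0.000, "bmp": 0.019, "fev1": 0.043},
]
LOOKUP = {frozenset(fit): fit for fit in REPORTED_FITS}


def refit(variables):
    """Hypothetical stand-in for re-running regress: return reported p-values."""
    return LOOKUP[frozenset(variables)]


def backward_eliminate(variables, pr):
    """Drop the least significant variable until every p-value is below pr."""
    kept = list(variables)
    removed = []
    while kept:
        p_values = refit(kept)
        worst = max(kept, key=lambda v: p_values[v])
        if p_values[worst] < pr:
            break                  # everything left is significant at pr
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


kept, removed = backward_eliminate(list(REPORTED_FITS[0]), pr=0.05)
print("removed, in order:", removed)
print("kept:", kept)
```

Replaying the reported p-values this way drops the same six variables in the same order as the stepwise log and stops at weight, bmp, and fev1.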

Page 17: Multiple Regression

. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.1000  removing sex
p = 0.6688 >= 0.1000  removing tlc
p = 0.6384 >= 0.1000  removing frc
p = 0.6156 >= 0.1000  removing age
p = 0.5572 >= 0.1000  removing height
p = 0.1462 >= 0.1000  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------

Page 18: Multiple Regression

Cautionary Notes

• The significance levels are not necessarily believable after variable selection

• The original full model F-statistic is significant, indicating that there is some significant relationship: F(9,15) = 2.93, p = 0.0320

• After variable selection, F(3,21) = 9.28, p = 0.0004, which is biased: the p-value is too small because the three variables were chosen using the same data.

Page 19: Multiple Regression

* 25 observations; invnormal(uniform()) draws an independent standard normal
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* y is unrelated to x1-x9, so any "significant" predictor is a false positive
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

Page 20: Multiple Regression

. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------

Page 21: Multiple Regression

. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
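
A single simulated data set could be a fluke, so it may help to repeat the experiment many times. The Python sketch below uses a deliberately simplified selection rule, not Stata's stepwise: for each pure-noise data set it keeps only the one predictor (out of nine) with the largest |t| in a simple regression on y, then applies the usual 5% test as if that predictor had been chosen in advance. The critical value is an assumption taken from a standard t table (two-sided 5%, 23 degrees of freedom), and the seed and replication count are arbitrary.

```python
import math
import random

N_OBS = 25          # same sample size as the slides
N_PREDICTORS = 9    # same number of candidate predictors
T_CRIT = 2.0687     # ASSUMPTION: tabulated two-sided 5% t critical value, 23 df


def slope_t_stat(x, y):
    """t statistic for the slope when y is regressed on the single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def best_predictor_looks_significant(rng):
    """Generate pure noise, pick the best of nine predictors, test it naively."""
    y = [rng.gauss(0, 1) for _ in range(N_OBS)]
    best_abs_t = 0.0
    for _ in range(N_PREDICTORS):
        x = [rng.gauss(0, 1) for _ in range(N_OBS)]
        best_abs_t = max(best_abs_t, abs(slope_t_stat(x, y)))
    return best_abs_t > T_CRIT


def false_positive_rate(n_reps=2000, seed=245):
    rng = random.Random(seed)
    hits = sum(best_predictor_looks_significant(rng) for _ in range(n_reps))
    return hits / n_reps


rate = false_positive_rate()
print(f"share of pure-noise data sets with a 'significant' best predictor: {rate:.2f}")
```

If the nine tests were independent, the chance that at least one passes would be about 1 - 0.95^9, roughly 0.37, so the printed share should land far above the nominal 0.05. That gap is the bias the cautionary notes warn about.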