multiple regression

Post on 30-Dec-2015


Multiple Regression

EPP 245/298 Statistical Analysis of Laboratory Data

October 26, 2006 EPP 245 Statistical Analysis of Laboratory Data


Cystic Fibrosis Data

Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old).

• age: a numeric vector. Age in years.
• sex: a numeric vector code. 0: male, 1: female.
• height: a numeric vector. Height (cm).
• weight: a numeric vector. Weight (kg).
• bmp: a numeric vector. Body mass (% of normal).
• fev1: a numeric vector. Forced expiratory volume.
• rv: a numeric vector. Residual volume.
• frc: a numeric vector. Functional residual capacity.
• tlc: a numeric vector. Total lung capacity.
• pemax: a numeric vector. Maximum expiratory pressure.


Some Stata Commands

. insheet using "C:\TD\CLASS\K30Bench2005\cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf


[Figure: scatterplot matrix (graph matrix) of age, sex, height, weight, bmp, fev1, rv, frc, tlc, and pemax; exported as cystfibr-scm.wmf]


      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    2.93
       Model |  17101.3907     9  1900.15452           Prob > F      =  0.0320
    Residual |  9731.24928    15  648.749952           R-squared     =  0.6373
-------------+------------------------------           Adj R-squared =  0.4197
       Total |    26832.64    24  1118.02667           Root MSE      =  25.471

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -2.54196   4.801699    -0.53   0.604    -12.77654    7.692618
         sex |  -3.736782   15.45982    -0.24   0.812    -36.68861    29.21505
      height |  -.4462549   .9033548    -0.49   0.628     -2.37171      1.4792
      weight |   2.992816   2.007957     1.49   0.157    -1.287044    7.272675
         bmp |  -1.744944   1.155237    -1.51   0.152    -4.207274    .7173865
        fev1 |   1.080697   1.080947     1.00   0.333    -1.223288    3.384682
          rv |    .196972   .1962136     1.00   0.331    -.2212474    .6151915
         frc |  -.3084314   .4923899    -0.63   0.540    -1.357936    .7410729
         tlc |   .1886017   .4997351     0.38   0.711    -.8765585    1.253762
       _cons |   176.0582   225.8911     0.78   0.448    -305.4174    657.5338
------------------------------------------------------------------------------


T-test of additional value of variable: in the full-model output above, the t and P>|t| entries in each row test whether that variable adds anything once the other eight are already in the model. None is significant at the 0.05 level; the smallest p-value is 0.152 (bmp).
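The t column can be checked by hand: each t is the coefficient divided by its standard error, and each 95% interval is the coefficient plus or minus a critical value times the standard error. A minimal Python sketch (Python rather than Stata, used purely as a calculator; the dictionary and variable names are illustrative) recovers both quantities from three rows of the full-model output:

```python
# (coefficient, standard error, upper 95% limit) copied from the full-model table
rows = {
    "age": (-2.54196, 4.801699, 7.692618),
    "weight": (2.992816, 2.007957, 7.272675),
    "bmp": (-1.744944, 1.155237, 0.7173865),
}

t_stats = {}
implied_crit = {}
for name, (coef, se, upper) in rows.items():
    # t statistic for H0: this coefficient is zero, given the other variables
    t_stats[name] = coef / se
    # critical value implied by the printed interval: upper = coef + crit * se
    implied_crit[name] = (upper - coef) / se
    print(f"{name:>6}: t = {t_stats[name]:6.2f}, implied critical value = {implied_crit[name]:.3f}")
```

The implied critical value comes out at about 2.131 for every row, which is the 0.975 quantile of Student's t with 15 residual degrees of freedom.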


Test of whole model: the F statistic in the header of the same full-model output, F(9, 15) = 2.93 with Prob > F = 0.0320, tests all nine coefficients jointly against the intercept-only model.
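Every number in the header of the full-model output follows from the two sums of squares and their degrees of freedom. A small Python sketch of that arithmetic (Python used only as a calculator; variable names are illustrative and the inputs are copied from the table):

```python
import math

# Inputs copied from the ANOVA block of the full model
n_obs = 25
ss_model, df_model = 17101.3907, 9
ss_resid, df_resid = 9731.24928, 15
ss_total = ss_model + ss_resid

ms_model = ss_model / df_model          # mean square for the model
ms_resid = ss_resid / df_resid          # mean square error
f_stat = ms_model / ms_resid            # whole-model F statistic
r2 = ss_model / ss_total                # R-squared
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid   # adjusted R-squared
root_mse = math.sqrt(ms_resid)          # Root MSE

print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared = {r2:.4f}, Adj R-squared = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.3f}")
```

The results agree with the Stata header: F = 2.93, R-squared = 0.6373, adjusted R-squared = 0.4197, Root MSE = 25.471.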


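[Figure: residuals versus fitted values (rvfplot) for the full model; residual axis ticks from -40 to 40, fitted-value axis ticks from 80 to 180; exported as cystfibr-rvf.wmf]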


Least significant variable: in the full model, sex has the largest p-value (t = -0.24, p = 0.812), so it is the first candidate for removal.


. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  8,    16) =    3.49
       Model |  17063.4886     8  2132.93607           Prob > F      =  0.0159
    Residual |  9769.15144    16  610.571965           R-squared     =  0.6359
-------------+------------------------------           Adj R-squared =  0.4539
       Total |    26832.64    24  1118.02667           Root MSE      =   24.71

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.114515   4.330841    -0.49   0.632    -11.29549    7.066459
      height |   -.394836    .851725    -0.46   0.649    -2.200412     1.41074
      weight |   2.834909   1.841995     1.54   0.143    -1.069947    6.739765
         bmp |  -1.741637   1.120651    -1.55   0.140    -4.117312     .634038
        fev1 |    1.26509   .7429407     1.70   0.108    -.3098737    2.840054
          rv |   .1779046   .1742911     1.02   0.323    -.1915759    .5473852
         frc |  -.2483218   .4122804    -0.60   0.555    -1.122317    .6256736
         tlc |   .2084044   .4782484     0.44   0.669    -.8054369    1.222246
       _cons |   153.0385   198.7149     0.77   0.452    -268.2183    574.2953
------------------------------------------------------------------------------

Least significant variable: tlc (t = 0.44, p = 0.669).
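Dropping a variable can also be judged by comparing the residual sums of squares of the two fits. A sketch in Python (used as a calculator, not Stata), assuming the usual partial F formula; when a single variable is dropped, the partial F should equal the square of that variable's t statistic in the larger model. Here the variable is sex, removed between the nine-variable and eight-variable fits:

```python
# Residual sum of squares and residual df, copied from the two outputs
rss_full, df_full = 9731.24928, 15        # model including sex
rss_reduced, df_reduced = 9769.15144, 16  # model without sex

# Partial F: increase in residual SS per dropped term, over the full-model MSE
partial_f = ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)

# t statistic for sex in the full model: coefficient / standard error
t_sex = -3.736782 / 15.45982

print(f"partial F = {partial_f:.4f}")
print(f"t squared = {t_sex ** 2:.4f}")
```

Both numbers are about 0.058, so removing sex costs almost nothing in fit, consistent with its p-value of 0.812.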


. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  7,    17) =    4.16
       Model |  16947.5458     7  2421.07798           Prob > F      =  0.0077
    Residual |  9885.09416    17  581.476127           R-squared     =  0.6316
-------------+------------------------------           Adj R-squared =  0.4799
       Total |    26832.64    24  1118.02667           Root MSE      =  24.114

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.663193   4.043832    -0.66   0.519    -11.19493    5.868546
      height |  -.4895733   .8036502    -0.61   0.550    -2.185127    1.205981
      weight |   3.155659   1.647815     1.92   0.072    -.3209274    6.632245
         bmp |  -1.962543   .9753332    -2.01   0.060    -4.020316    .0952305
        fev1 |   1.247861   .7239953     1.72   0.103    -.2796361    2.775357
          rv |   .1595988   .1650733     0.97   0.347    -.1886753    .5078729
         frc |  -.1764595    .368749    -0.48   0.638    -.9544518    .6015328
       _cons |   198.2942   165.3311     1.20   0.247    -150.5238    547.1123
------------------------------------------------------------------------------

Least significant variable: frc (t = -0.48, p = 0.638).


. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  6,    18) =    5.04
       Model |  16814.3899     6  2802.39832           Prob > F      =  0.0034
    Residual |  10018.2501    18  556.569447           R-squared     =  0.6266
-------------+------------------------------           Adj R-squared =  0.5022
       Total |    26832.64    24  1118.02667           Root MSE      =  23.592

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -1.819342   3.560301    -0.51   0.616    -9.299258    5.660573
      height |  -.4101508   .7693006    -0.53   0.600    -2.026391     1.20609
      weight |   2.874434   1.506126     1.91   0.072    -.2898203    6.038688
         bmp |  -1.949083   .9538193    -2.04   0.056    -3.952983    .0548169
        fev1 |   1.411959   .6238279     2.26   0.036     .1013452    2.722573
          rv |   .0955779   .0946057     1.01   0.326    -.1031813    .2943371
       _cons |   166.9049   148.4762     1.12   0.276    -145.0321    478.8418
------------------------------------------------------------------------------

Least significant variable: age (t = -0.51, p = 0.616).


. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  5,    19) =    6.23
       Model |  16669.0534     5  3333.81068           Prob > F      =  0.0014
    Residual |  10163.5866    19   534.92561           R-squared     =  0.6212
-------------+------------------------------           Adj R-squared =  0.5215
       Total |    26832.64    24  1118.02667           Root MSE      =  23.128

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.4485274   .7505918    -0.60   0.557    -2.019534    1.122479
      weight |   2.338692   1.060094     2.21   0.040     .1198889    4.557495
         bmp |  -1.641001   .7246036    -2.26   0.035    -3.157614   -.1243885
        fev1 |   1.471767   .6007182     2.45   0.024     .2144491    2.729084
          rv |    .110117   .0884543     1.24   0.228      -.07502     .295254
       _cons |   137.0958   133.8559     1.02   0.319    -143.0677    417.2594
------------------------------------------------------------------------------

Least significant variable: height (t = -0.60, p = 0.557).


. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  4,    20) =    7.96
       Model |  16478.0401     4  4119.51002           Prob > F      =  0.0005
    Residual |  10354.5999    20  517.729996           R-squared     =  0.6141
-------------+------------------------------           Adj R-squared =  0.5369
       Total |    26832.64    24  1118.02667           Root MSE      =  22.754

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.748914   .3806332     4.59   0.000     .9549274    2.542901
         bmp |  -1.377243   .5653421    -2.44   0.024    -2.556526   -.1979604
        fev1 |   1.547698   .5776112     2.68   0.014     .3428223    2.752574
          rv |   .1257152   .0831456     1.51   0.146    -.0477234    .2991538
       _cons |    63.9467   53.27673     1.20   0.244    -47.18661      175.08
------------------------------------------------------------------------------

Least significant variable: rv (t = 1.51, p = 0.146).


. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
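Tracking adjusted R-squared over the elimination sequence shows that the final three-variable model is not the one that maximizes it. The Python sketch below (Python as a calculator; names are illustrative) recomputes R-squared and adjusted R-squared from each model's residual sum of squares, copied from the outputs above:

```python
n_obs = 25
ss_total = 26832.64

# (number of predictors, residual SS) for each model in the elimination sequence
models = [
    (9, 9731.24928),
    (8, 9769.15144),
    (7, 9885.09416),
    (6, 10018.2501),
    (5, 10163.5866),
    (4, 10354.5999),
    (3, 11538.1881),
]

adj_by_p = {}
for p, rss in models:
    r2 = 1 - rss / ss_total
    # adjusted R-squared compares mean squares instead of sums of squares
    adj = 1 - (rss / (n_obs - p - 1)) / (ss_total / (n_obs - 1))
    adj_by_p[p] = adj
    print(f"{p} predictors: R-squared = {r2:.4f}, Adj R-squared = {adj:.4f}")

best_p = max(adj_by_p, key=adj_by_p.get)
print(f"Adjusted R-squared is largest with {best_p} predictors")
```

Adjusted R-squared peaks at the four-variable model (0.5369) and falls to 0.5086 once rv is removed, even though rv is not significant at the 0.05 level.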


. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.0500  removing sex
p = 0.6688 >= 0.0500  removing tlc
p = 0.6384 >= 0.0500  removing frc
p = 0.6156 >= 0.0500  removing age
p = 0.5572 >= 0.0500  removing height
p = 0.1462 >= 0.0500  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------


. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.1000  removing sex
p = 0.6688 >= 0.1000  removing tlc
p = 0.6384 >= 0.1000  removing frc
p = 0.6156 >= 0.1000  removing age
p = 0.5572 >= 0.1000  removing height
p = 0.1462 >= 0.1000  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------


Cautionary Notes

• The significance levels are not necessarily believable after variable selection

• The original full model F-statistic is significant, indicating that there is some significant relationship: F(9,15) = 2.93, p = 0.0320

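• After variable selection, F(3,21) = 9.28, p = 0.0004. This p-value is biased toward significance, because it ignores the six variables that were tested and discarded on the same data.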


* Simulate pure noise: y and x1-x9 are independent standard normal draws
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9


. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------


. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
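One run could be luck, so it helps to repeat the experiment many times. The following Python sketch is a rough stand-in for Stata's stepwise, pr(.1), not a reimplementation of it. Under stated assumptions it fits OLS by the normal equations, drops the predictor with the smallest |t| while that |t| does not exceed a hard-coded two-sided 10% critical value, and counts how often at least one pure-noise predictor survives. All function names, the seed, and the number of runs are illustrative choices.

```python
import math
import random

# Two-sided 10% critical values of Student's t by residual df
# (standard table values, hard-coded so no statistics library is needed).
T_CRIT_10 = {15: 1.753, 16: 1.746, 17: 1.740, 18: 1.734, 19: 1.729,
             20: 1.725, 21: 1.721, 22: 1.717, 23: 1.714}


def solve_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def t_statistics(xcols, y):
    """OLS of y on an intercept plus xcols; returns the t statistics of the slopes."""
    n = len(y)
    cols = [[1.0] * n] + xcols
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    inv = solve_inverse(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s2 = rss / (n - k)  # residual mean square
    return [beta[j] / math.sqrt(s2 * inv[j][j]) for j in range(1, k)]


def backward_eliminate(xcols, y):
    """Drop the weakest predictor until every remaining |t| exceeds the 10% cutoff."""
    kept = list(range(len(xcols)))
    while kept:
        tvals = t_statistics([xcols[j] for j in kept], y)
        weakest = min(range(len(kept)), key=lambda i: abs(tvals[i]))
        df = len(y) - len(kept) - 1
        if abs(tvals[weakest]) > T_CRIT_10[df]:
            break
        kept.pop(weakest)
    return kept


def simulate(n_runs=200, n_obs=25, n_vars=9, seed=1):
    """Share of runs in which selection keeps at least one pure-noise predictor."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        xcols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        if backward_eliminate(xcols, y):
            hits += 1
    return hits / n_runs


share = simulate()
print(f"Runs keeping at least one pure-noise predictor: {share:.0%}")
```

If selection did not distort the tests, a spurious predictor would survive in roughly 10% of runs. The printed share is expected to be well above that, though the exact figure depends on the seed and the number of runs.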
