1
Multiple Regression
EPP 245/298
Statistical Analysis of Laboratory Data
October 26, 2006 EPP 245 Statistical Analysis of Laboratory Data
2
Cystic Fibrosis Data
Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old). All variables are numeric vectors.
• age: age in years
• sex: numeric code, 0 = male, 1 = female
• height: height (cm)
• weight: weight (kg)
• bmp: body mass (% of normal)
• fev1: forced expiratory volume
• rv: residual volume
• frc: functional residual capacity
• tlc: total lung capacity
• pemax: maximum expiratory pressure
3
Some Stata Commands
. insheet using "C:\TD\CLASS\K30Bench2005\cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf
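The numbers that `regress` prints can be reproduced outside Stata. Below is a minimal sketch in Python with NumPy, not Stata, run on simulated data rather than the cystic fibrosis file. It fits ordinary least squares with a constant and returns the Coef., Std. Err. and t columns. The helper name `ols_table` is hypothetical.

```python
import numpy as np

def ols_table(X, y):
    """OLS with a constant, returning coefficients, standard errors and t statistics.

    X is an (n, k) array of predictors and y an (n,) response.
    The constant is placed last, like the _cons row in Stata output.
    """
    n, k = X.shape
    Xc = np.column_stack([X, np.ones(n)])          # append the constant column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # least-squares coefficients
    resid = y - Xc @ beta
    ms_resid = resid @ resid / (n - k - 1)         # residual mean square (Root MSE squared)
    cov = ms_resid * np.linalg.inv(Xc.T @ Xc)      # estimated covariance of the coefficients
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Usage on simulated data (true slopes 1, 2, 3 and constant 5)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 5.0 + X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=200)
beta, se, t = ols_table(X, y)
print(np.round(beta, 2))
```

Solving with `lstsq` instead of inverting X'X for the coefficients is the numerically safer choice. The explicit inverse is kept only for the standard errors.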
4
[Figure: scatterplot matrix (graph matrix) of age, sex, height, weight, bmp, fev1, rv, frc, tlc and pemax]
5
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(9, 15)      =     2.93
       Model |  17101.3907     9  1900.15452    Prob > F      =   0.0320
    Residual |  9731.24928    15  648.749952    R-squared     =   0.6373
-------------+-------------------------------    Adj R-squared =   0.4197
       Total |    26832.64    24  1118.02667    Root MSE      =   25.471

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |   -2.54196   4.801699   -0.53  0.604   -12.77654    7.692618
         sex |  -3.736782   15.45982   -0.24  0.812   -36.68861    29.21505
      height |  -.4462549   .9033548   -0.49  0.628    -2.37171      1.4792
      weight |   2.992816   2.007957    1.49  0.157   -1.287044    7.272675
         bmp |  -1.744944   1.155237   -1.51  0.152   -4.207274    .7173865
        fev1 |   1.080697   1.080947    1.00  0.333   -1.223288    3.384682
          rv |    .196972   .1962136    1.00  0.331   -.2212474    .6151915
         frc |  -.3084314   .4923899   -0.63  0.540   -1.357936    .7410729
         tlc |   .1886017   .4997351    0.38  0.711   -.8765585    1.253762
       _cons |   176.0582   225.8911    0.78  0.448   -305.4174    657.5338
---------------------------------------------------------------------------
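The header statistics follow from the sums of squares and degrees of freedom by simple arithmetic. Below is a short Python check using the numbers printed on this slide. The variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the full-model output
ss_model, df_model = 17101.3907, 9
ss_resid, df_resid = 9731.24928, 15
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model                          # MS column, Model row
ms_resid = ss_resid / df_resid                          # MS column, Residual row
r_squared = ss_model / ss_total                         # share of total variation explained
adj_r_squared = 1 - ms_resid / (ss_total / df_total)    # penalizes extra predictors
root_mse = ms_resid ** 0.5                              # estimated residual standard deviation
f_stat = ms_model / ms_resid                            # whole-model F statistic

print(f"R2={r_squared:.4f} adjR2={adj_r_squared:.4f} RootMSE={root_mse:.3f} F={f_stat:.2f}")
```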
6
(Same full-model output as slide 5.)
T-test of additional value of variable: each row's t statistic (Coef. / Std. Err.) and P>|t| test whether that variable's coefficient is zero, given all the other variables in the model.
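For a single row, the t statistic and the confidence interval come directly from the coefficient and its standard error. Below is a Python sketch for the `age` row. The critical value of t with 15 degrees of freedom (about 2.1314) is hard-coded rather than computed.

```python
# `age` row of the full model (slide 5) and its residual degrees of freedom
coef, se = -2.54196, 4.801699
df_resid = 15

# Two-sided 95% critical value of Student's t with 15 df, hard-coded here
# rather than looked up with a statistics library.
t_crit = 2.1314495

t_stat = coef / se                 # H0: this coefficient is 0, other terms kept
ci_low = coef - t_crit * se        # lower limit of the 95% confidence interval
ci_high = coef + t_crit * se       # upper limit
print(f"df = {df_resid}: t = {t_stat:.2f}, 95% CI = [{ci_low:.5f}, {ci_high:.6f}]")
```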
7
(Same full-model output as slide 5.)
Test of whole model: F(9, 15) = 2.93, Prob > F = 0.0320 in the header tests the null hypothesis that all nine slope coefficients are zero.
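The same F statistic can be written in terms of R-squared, with n observations and k predictors: F = (R²/k) / ((1 − R²)/(n − k − 1)). Below is a quick Python check using the rounded R-squared from the output.

```python
# Whole-model F test, H0: all k slope coefficients are zero
n, k = 25, 9                 # observations and predictors in the full model
r_squared = 0.6373           # as printed (rounded) on slide 5

# Equivalent to MS_model / MS_resid, rewritten with R-squared
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(f"F({k}, {n - k - 1}) = {f_stat:.2f}")
```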
8
[Figure: residuals versus fitted values (rvfplot); residuals range roughly from -40 to 40, fitted values from about 80 to 180]
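`rvfplot` plots the residuals against the fitted values. Below is a Python sketch with NumPy on simulated data, not the cystic fibrosis data, showing the two plotted quantities. With a constant in the model, OLS residuals sum to zero and are uncorrelated with the fitted values. As a result, a well-specified model should show no trend in this plot, and visible curvature or fanning suggests misspecification or non-constant variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25
X = rng.normal(size=(n, 3))                   # simulated predictors (illustrative only)
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), X])         # include the constant
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
fitted = Xc @ beta                            # x axis of rvfplot
resid = y - fitted                            # y axis of rvfplot

# OLS with a constant makes both of these zero, up to rounding error
print(resid.sum(), resid @ fitted)
```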
9
(Same full-model output as slide 5.)
Least significant variable: sex has the largest P>|t| (0.812), so it is the first candidate for removal.
10
. regress pemax age height weight bmp fev1 rv frc tlc
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(8, 16)      =     3.49
       Model |  17063.4886     8  2132.93607    Prob > F      =   0.0159
    Residual |  9769.15144    16  610.571965    R-squared     =   0.6359
-------------+-------------------------------    Adj R-squared =   0.4539
       Total |    26832.64    24  1118.02667    Root MSE      =    24.71

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |  -2.114515   4.330841   -0.49  0.632   -11.29549    7.066459
      height |   -.394836    .851725   -0.46  0.649   -2.200412     1.41074
      weight |   2.834909   1.841995    1.54  0.143   -1.069947    6.739765
         bmp |  -1.741637   1.120651   -1.55  0.140   -4.117312     .634038
        fev1 |    1.26509   .7429407    1.70  0.108   -.3098737    2.840054
          rv |   .1779046   .1742911    1.02  0.323   -.1915759    .5473852
         frc |  -.2483218   .4122804   -0.60  0.555   -1.122317    .6256736
         tlc |   .2084044   .4782484    0.44  0.669   -.8054369    1.222246
       _cons |   153.0385   198.7149    0.77  0.452   -268.2183    574.2953
---------------------------------------------------------------------------
Least significant variable: tlc (P>|t| = 0.669)
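Dropping one variable and refitting is closely tied to that variable's t-test. The partial F statistic for the drop is the rise in residual SS divided by the larger model's residual mean square, and it equals t² from the larger model. Below is a Python check with the printed numbers for sex, comparing slide 5 with this slide. Only the arithmetic is shown.

```python
# Residual sums of squares: full model (slide 5) and model without sex (slide 10)
rss_full, df_resid_full = 9731.24928, 15
rss_without_sex = 9769.15144

# Partial F for dropping one term: rise in residual SS (1 df) divided by the
# full model's residual mean square
partial_f = (rss_without_sex - rss_full) / (rss_full / df_resid_full)

# t statistic for sex in the full model: Coef. / Std. Err.
t_sex = -3.736782 / 15.45982

print(f"partial F = {partial_f:.4f}, t squared = {t_sex ** 2:.4f}")
```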
11
. regress pemax age height weight bmp fev1 rv frc
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(7, 17)      =     4.16
       Model |  16947.5458     7  2421.07798    Prob > F      =   0.0077
    Residual |  9885.09416    17  581.476127    R-squared     =   0.6316
-------------+-------------------------------    Adj R-squared =   0.4799
       Total |    26832.64    24  1118.02667    Root MSE      =   24.114

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |  -2.663193   4.043832   -0.66  0.519   -11.19493    5.868546
      height |  -.4895733   .8036502   -0.61  0.550   -2.185127    1.205981
      weight |   3.155659   1.647815    1.92  0.072   -.3209274    6.632245
         bmp |  -1.962543   .9753332   -2.01  0.060   -4.020316    .0952305
        fev1 |   1.247861   .7239953    1.72  0.103   -.2796361    2.775357
          rv |   .1595988   .1650733    0.97  0.347   -.1886753    .5078729
         frc |  -.1764595    .368749   -0.48  0.638   -.9544518    .6015328
       _cons |   198.2942   165.3311    1.20  0.247   -150.5238    547.1123
---------------------------------------------------------------------------
Least significant variable: frc (P>|t| = 0.638)
12
. regress pemax age height weight bmp fev1 rv
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(6, 18)      =     5.04
       Model |  16814.3899     6  2802.39832    Prob > F      =   0.0034
    Residual |  10018.2501    18  556.569447    R-squared     =   0.6266
-------------+-------------------------------    Adj R-squared =   0.5022
       Total |    26832.64    24  1118.02667    Root MSE      =   23.592

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |  -1.819342   3.560301   -0.51  0.616   -9.299258    5.660573
      height |  -.4101508   .7693006   -0.53  0.600   -2.026391     1.20609
      weight |   2.874434   1.506126    1.91  0.072   -.2898203    6.038688
         bmp |  -1.949083   .9538193   -2.04  0.056   -3.952983    .0548169
        fev1 |   1.411959   .6238279    2.26  0.036    .1013452    2.722573
          rv |   .0955779   .0946057    1.01  0.326   -.1031813    .2943371
       _cons |   166.9049   148.4762    1.12  0.276   -145.0321    478.8418
---------------------------------------------------------------------------
Least significant variable: age (P>|t| = 0.616)
13
. regress pemax height weight bmp fev1 rv
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(5, 19)      =     6.23
       Model |  16669.0534     5  3333.81068    Prob > F      =   0.0014
    Residual |  10163.5866    19   534.92561    R-squared     =   0.6212
-------------+-------------------------------    Adj R-squared =   0.5215
       Total |    26832.64    24  1118.02667    Root MSE      =   23.128

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
      height |  -.4485274   .7505918   -0.60  0.557   -2.019534    1.122479
      weight |   2.338692   1.060094    2.21  0.040    .1198889    4.557495
         bmp |  -1.641001   .7246036   -2.26  0.035   -3.157614   -.1243885
        fev1 |   1.471767   .6007182    2.45  0.024    .2144491    2.729084
          rv |    .110117   .0884543    1.24  0.228     -.07502     .295254
       _cons |   137.0958   133.8559    1.02  0.319   -143.0677    417.2594
---------------------------------------------------------------------------
Least significant variable: height (P>|t| = 0.557)
14
. regress pemax weight bmp fev1 rv
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(4, 20)      =     7.96
       Model |  16478.0401     4  4119.51002    Prob > F      =   0.0005
    Residual |  10354.5999    20  517.729996    R-squared     =   0.6141
-------------+-------------------------------    Adj R-squared =   0.5369
       Total |    26832.64    24  1118.02667    Root MSE      =   22.754

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
      weight |   1.748914   .3806332    4.59  0.000    .9549274    2.542901
         bmp |  -1.377243   .5653421   -2.44  0.024   -2.556526   -.1979604
        fev1 |   1.547698   .5776112    2.68  0.014    .3428223    2.752574
          rv |   .1257152   .0831456    1.51  0.146   -.0477234    .2991538
       _cons |    63.9467   53.27673    1.20  0.244   -47.18661      175.08
---------------------------------------------------------------------------
Least significant variable: rv (P>|t| = 0.146)
15
. regress pemax weight bmp fev1
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(3, 21)      =     9.28
       Model |  15294.4519     3  5098.15064    Prob > F      =   0.0004
    Residual |  11538.1881    21  549.437528    R-squared     =   0.5700
-------------+-------------------------------    Adj R-squared =   0.5086
       Total |    26832.64    24  1118.02667    Root MSE      =    23.44

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
      weight |   1.536475   .3644235    4.22  0.000    .7786149    2.294335
         bmp |  -1.465406   .5792906   -2.53  0.019   -2.670106    -.260705
        fev1 |   1.108629   .5143694    2.16  0.043    .0389396    2.178319
       _cons |   126.3336   34.71986    3.64  0.002    54.12965    198.5375
---------------------------------------------------------------------------
16
. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.0500  removing sex
p = 0.6688 >= 0.0500  removing tlc
p = 0.6384 >= 0.0500  removing frc
p = 0.6156 >= 0.0500  removing age
p = 0.5572 >= 0.0500  removing height
p = 0.1462 >= 0.0500  removing rv

      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(3, 21)      =     9.28
       Model |  15294.4519     3  5098.15064    Prob > F      =   0.0004
    Residual |  11538.1881    21  549.437528    R-squared     =   0.5700
-------------+-------------------------------    Adj R-squared =   0.5086
       Total |    26832.64    24  1118.02667    Root MSE      =    23.44

---------------------------------------------------------------------------
       pemax |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
        fev1 |   1.108629   .5143694    2.16  0.043    .0389396    2.178319
      weight |   1.536475   .3644235    4.22  0.000    .7786149    2.294335
         bmp |  -1.465406   .5792906   -2.53  0.019   -2.670106    -.260705
       _cons |   126.3336   34.71986    3.64  0.002    54.12965    198.5375
---------------------------------------------------------------------------
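`stepwise` with `pr()` automates the "least significant variable" step from slides 9 to 14. At each step it finds the term with the largest p-value and removes it if that p-value is at least pr. The Python sketch below only illustrates that decision rule. It replays the rule on the P>|t| values printed on slides 9 to 15 and does not refit anything. The helper `backward_elimination` and the list `recorded_p` are hypothetical. The re-entry check that `pe()` adds on slide 17 is not modeled.

```python
# P>|t| column (constant omitted) of each model fitted on slides 9 to 15,
# in the order the models were fitted.
recorded_p = [
    {"age": 0.604, "sex": 0.812, "height": 0.628, "weight": 0.157, "bmp": 0.152,
     "fev1": 0.333, "rv": 0.331, "frc": 0.540, "tlc": 0.711},
    {"age": 0.632, "height": 0.649, "weight": 0.143, "bmp": 0.140,
     "fev1": 0.108, "rv": 0.323, "frc": 0.555, "tlc": 0.669},
    {"age": 0.519, "height": 0.550, "weight": 0.072, "bmp": 0.060,
     "fev1": 0.103, "rv": 0.347, "frc": 0.638},
    {"age": 0.616, "height": 0.600, "weight": 0.072, "bmp": 0.056,
     "fev1": 0.036, "rv": 0.326},
    {"height": 0.557, "weight": 0.040, "bmp": 0.035, "fev1": 0.024, "rv": 0.228},
    {"weight": 0.000, "bmp": 0.024, "fev1": 0.014, "rv": 0.146},
    {"weight": 0.000, "bmp": 0.019, "fev1": 0.043},
]

def backward_elimination(recorded_p, pr):
    """Replay the removal rule: drop the largest p-value while it is >= pr.

    Uses recorded p-values instead of refitting, so it is only valid along
    the path the slides actually took.
    """
    removed = []
    for p_values in recorded_p:
        worst = max(p_values, key=p_values.get)
        if p_values[worst] < pr:
            return removed, sorted(p_values)   # stop: every remaining term stays
        removed.append(worst)
    raise ValueError("threshold leaves the recorded path; refit the models instead")

removed, kept = backward_elimination(recorded_p, pr=0.05)
print("removed:", removed)
print("kept:", kept)
```

With pr = 0.10 the replay removes the same six variables, which matches the log on slide 17.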
17
. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.1000  removing sex
p = 0.6688 >= 0.1000  removing tlc
p = 0.6384 >= 0.1000  removing frc
p = 0.6156 >= 0.1000  removing age
p = 0.5572 >= 0.1000  removing height
p = 0.1462 >= 0.1000  removing rv

(Same final model and output as slide 16; no removed variable re-entered at the pe(.05) level.)
18
Cautionary Notes
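• The significance levels (p-values) are not necessarily believable after variable selection, because the same data were used both to choose the variables and to test them.
• The original full-model F statistic is significant, indicating that there is some relationship between pemax and the predictors: F(9, 15) = 2.93, p = 0.0320.
• After variable selection, F(3, 21) = 9.28, p = 0.0004. This p-value is biased toward significance because it ignores the search over nine candidate variables that produced the three-variable model.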
19
* 25 observations of pure noise: y is generated independently of x1-x9
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* fit the full model, then let backward selection choose variables
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
20
. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(9, 15)      =     0.91
       Model |  12.3235639     9  1.36928488    Prob > F      =   0.5397
    Residual |  22.5105993    15  1.50070662    R-squared     =   0.3538
-------------+-------------------------------    Adj R-squared =  -0.0340
       Total |  34.8341632    24  1.45142347    Root MSE      =    1.225

---------------------------------------------------------------------------
           y |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
          x1 |  -.0441858   .2998066   -0.15  0.885   -.6832085     .594837
          x2 |  -.9078136   .4347798   -2.09  0.054   -1.834525    .0188976
          x3 |   .2076754   .3789522    0.55  0.592   -.6000421    1.015393
          x4 |  -.0056383   .3319125   -0.02  0.987   -.7130931    .7018166
          x5 |   -.330546   .3854497   -0.86  0.405   -1.152113    .4910207
          x6 |   .0202964   .3470704    0.06  0.954   -.7194666    .7600594
          x7 |   -.073401   .3135234   -0.23  0.818   -.7416603    .5948583
          x8 |  -.0552909   .3026913   -0.18  0.858   -.7004621    .5898803
          x9 |  -.3190092   .3137931   -1.02  0.325   -.9878434     .349825
       _cons |  -.2490392   .3078424   -0.81  0.431   -.9051898    .4071113
---------------------------------------------------------------------------
21
. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |          SS    df          MS    Number of obs =       25
-------------+-------------------------------    F(1, 23)      =     7.23
       Model |  8.33379862     1  8.33379862    Prob > F      =   0.0131
    Residual |  26.5003646    23  1.15218977    R-squared     =   0.2392
-------------+-------------------------------    Adj R-squared =   0.2062
       Total |  34.8341632    24  1.45142347    Root MSE      =   1.0734

---------------------------------------------------------------------------
           y |      Coef.  Std. Err.       t  P>|t|    [95% Conf. Interval]
-------------+-------------------------------------------------------------
          x2 |  -.6644002   .2470417   -2.69  0.013   -1.175445   -.1533555
       _cons |  -.1523124    .214703   -0.71  0.485   -.5964594    .2918346
---------------------------------------------------------------------------
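Slides 19 to 21 show a single simulated data set. Repeating that experiment many times shows how often selection on pure noise yields an apparently significant predictor. Below is a Python sketch with NumPy. As a simplification it swaps in a cruder selection rule than `stepwise`: it keeps the single x most correlated with y and tests it in a one-predictor regression. The 5% critical value of t with 23 degrees of freedom (2.0687) is hard-coded, and the constants and helper name are hypothetical.

```python
import numpy as np

N, K = 25, 9               # observations and noise predictors, as on slide 19
T_CRIT_23DF = 2.0687       # two-sided 5% critical value of t with N - 2 = 23 df

def selection_rates(reps=2000, seed=1):
    """Share of pure-noise data sets in which a predictor looks significant at 5%."""
    rng = np.random.default_rng(seed)
    best_hits = 0          # the x most correlated with y, chosen after looking
    fixed_hits = 0         # x1, chosen before looking at the data
    for _ in range(reps):
        X = rng.standard_normal((N, K))    # x1..x9: independent noise
        y = rng.standard_normal(N)         # y is unrelated to every x
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # correlation of y with each x, then the slope t statistic of y on that x alone
        r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        t = r * np.sqrt((N - 2) / (1 - r ** 2))
        best_hits += int(np.abs(t).max() > T_CRIT_23DF)
        fixed_hits += int(abs(t[0]) > T_CRIT_23DF)
    return best_hits / reps, fixed_hits / reps

best_rate, fixed_rate = selection_rates()
print(f"best of 9 looks significant: {best_rate:.2f}; prespecified x1: {fixed_rate:.2f}")
```

The prespecified variable should be declared significant close to the nominal 5% of the time. The best-of-nine variable should be declared significant far more often, which is the bias described in the cautionary notes.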