model selection

Model selection Best subsets regression

Upload: tom

Post on 08-Feb-2016

DESCRIPTION

Model selection. Best subsets regression. Statement of problem. A common problem is that there is a large set of candidate predictor variables. The goal is to choose a small subset from the larger set so that the resulting regression model is simple yet has good predictive ability. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Model selection

Model selection

Best subsets regression

Page 2: Model selection

Statement of problem

• A common problem is that there is a large set of candidate predictor variables.

• The goal is to choose a small subset from the larger set so that the resulting regression model is simple yet has good predictive ability.

Page 3: Model selection

Example: Cement data

• Response y: heat evolved in calories during hardening of cement on a per gram basis

• Predictor x1: % of tricalcium aluminate

• Predictor x2: % of tricalcium silicate

• Predictor x3: % of tetracalcium alumino ferrite

• Predictor x4: % of dicalcium silicate

Page 4: Model selection

Example: Cement data

[Figure: matrix plot of y, x1, x2, x3, x4]

Page 5: Model selection

Two basic methods of selecting predictors

• Stepwise regression: Enter and remove predictors, in a stepwise manner, until no justifiable reason to enter or remove more.

• Best subsets regression: Select the subset of predictors that does the best at meeting some well-defined objective criterion.

Page 6: Model selection

Why best subsets regression?

# of predictors (p-1)    # of regression models
1                        2: ( ) (x1)
2                        4: ( ) (x1) (x2) (x1, x2)
3                        8: ( ) (x1) (x2) (x3) (x1, x2) (x1, x3) (x2, x3) (x1, x2, x3)
4                        16: 1 none, 4 one, 6 two, 4 three, 1 four
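
The counts in this table can be reproduced by enumerating every subset of the candidate predictors. The sketch below is only an illustration; the function name `all_subsets` is an assumption made for this example.

```python
from itertools import combinations

def all_subsets(predictors):
    """Return every subset of the predictors, including the empty (intercept-only) one."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets

models = all_subsets(["x1", "x2", "x3", "x4"])

# Number of candidate models of each size: none, one, two, three, four predictors.
counts_by_size = [sum(1 for m in models if len(m) == k) for k in range(5)]

print(len(models))       # total number of candidate models
print(counts_by_size)    # models grouped by number of predictors
```

With four predictors this gives 16 models, split 1, 4, 6, 4, 1 by size, matching the last row of the table.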

Page 7: Model selection

Why best subsets regression?

• If there are p-1 possible predictors, then there are $2^{p-1}$ possible regression models containing the predictors.

• For example, 10 predictors yield $2^{10} = 1024$ possible regression models.

• A best subsets algorithm determines the best subsets of each size, so that the choice of the final model can be made by the researcher.
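
To make the idea concrete, here is a minimal brute-force sketch of a best subsets search, assuming ordinary least squares with an intercept and using the smallest SSE as the "best" criterion within each subset size. It is not the algorithm Minitab uses; the helper names (`solve`, `sse`, `best_subsets`) and the small data set are made up for illustration.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sse(columns, y):
    """Error sum of squares for least squares of y on an intercept plus the columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2 for i in range(n))

def best_subsets(predictors, y):
    """Return {size: (names, SSE)} holding the smallest-SSE subset of each size."""
    best = {}
    names = list(predictors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            s = sse([predictors[name] for name in subset], y)
            if size not in best or s < best[size][1]:
                best[size] = (subset, s)
    return best

# Made-up data: y depends on a and b only; c is an unrelated column.
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 1, 4, 3, 6, 5, 8, 7],
    "c": [5, 3, 1, 8, 2, 9, 7, 4],
}
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(data["a"], data["b"], noise)]

best = best_subsets(data, y)
for size, (names, s) in best.items():
    print(size, names, round(s, 4))
```

The search only reports the best subset of each size; choosing among sizes is left to the criteria discussed on the following slides.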

Page 8: Model selection

What is used to judge “best”?

• R-squared

• Adjusted R-squared

• MSE (or S = square root of MSE)

• Mallows' Cp

Page 9: Model selection

R-squared

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-squared.

Page 10: Model selection

Adjusted R-squared or MSE

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right) MSE$$

Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information.

Find a few subsets for which MSE is smallest (or adjusted R-squared is largest) or so close to the smallest (largest) that adding more predictors is not worthwhile.
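
As a quick check, adjusted R-squared can be recomputed from R-squared using the equivalent form 1 - ((n-1)/(n-p)) (1 - R-squared), where p counts the parameters including the intercept. The sketch below applies it to the R-Sq values printed in the PIQ stepwise output (N = 38); the function name `adjusted_r2` is an assumption for this example.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared from R-squared; p counts parameters, intercept included."""
    return 1 - (n - 1) / (n - p) * (1 - r2)

n = 38  # PIQ data: N = 38

# Step 1: MRI only (p = 2), R-Sq = 14.27%
adj_step1 = adjusted_r2(0.1427, n, 2)

# Step 2: MRI and Height (p = 3), R-Sq = 29.49%
adj_step2 = adjusted_r2(0.2949, n, 3)

print(round(100 * adj_step1, 2), round(100 * adj_step2, 2))
```

The results agree with the R-Sq(adj) values of 11.89 and 25.46 reported in that output.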

Page 11: Model selection

Mallows' Cp criterion

The goal is to minimize the total standardized mean square error of prediction:

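$$\Gamma_p = \frac{1}{\sigma^2} E\left\{ \sum_{i=1}^{n} \left[ \hat{Y}_{ip} - E(Y_i) \right]^2 \right\}$$

which equals:

$$\Gamma_p = \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{n} \left[ E(\hat{Y}_{ip}) - E(Y_i) \right]^2 + \sum_{i=1}^{n} Var(\hat{Y}_{ip}) \right\}$$

which in English is:

$$\Gamma_p = \text{some bias} + \text{some variance}$$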

Page 12: Model selection

Mallows' Cp criterion

Mallows' Cp statistic

$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{p-1})} - (n - 2p)$$

estimates $\Gamma_p$,

where:

• SSEp is the error sum of squares for the fitted (subset) regression model with p parameters.

• MSE(X1,…, Xp-1) is the MSE of the full model containing all p-1 candidate predictors. It is an unbiased estimator of $\sigma^2$.

• p is the number of parameters in the (subset) model
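
The definitions above can be checked against the cement output in this transcript. Since S is the square root of MSE, each model's SSE can be recovered as S² × (n − p). The sketch below recomputes Cp for three cement models from their printed S values; the function names `mallows_cp` and `sse_from_s` are illustrative assumptions, not Minitab functions.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Cp = SSE_p / MSE_full - (n - 2p); p counts parameters, intercept included."""
    return sse_p / mse_full - (n - 2 * p)

def sse_from_s(s, n, p):
    """Recover SSE from S = sqrt(MSE), using MSE = SSE / (n - p)."""
    return s ** 2 * (n - p)

n = 13                   # cement data: N = 13
mse_full = 2.4460 ** 2   # S of the model with all four predictors, squared

cp_one = mallows_cp(sse_from_s(8.9639, n, 2), mse_full, n, 2)    # best one-predictor model
cp_two = mallows_cp(sse_from_s(2.4063, n, 3), mse_full, n, 3)    # best two-predictor model
cp_full = mallows_cp(sse_from_s(2.4460, n, 5), mse_full, n, 5)   # all four predictors

print(round(cp_one, 1), round(cp_two, 1), round(cp_full, 1))
```

These values match the C-p column of the cement best subsets output (138.7, 2.7 and 5.0), up to the rounding of S.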

Page 13: Model selection

Facts about Mallows' Cp

• Subset models with small Cp values have a small total standardized MSE of prediction.

• When the Cp value is …
– near p, the bias is small (next to none),
– much greater than p, the bias is substantial,
– below p, it is due to sampling error; interpret as no bias.

• For the largest model with all possible predictors, Cp= p (always).
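
The last fact follows directly from the definition of Cp. In the short derivation below, P denotes the number of parameters in the largest model (a symbol introduced here only to keep it apart from a subset's p), so that the MSE in the denominator is SSE_P / (n − P):

$$C_P = \frac{SSE_P}{MSE(X_1, \ldots, X_{P-1})} - (n - 2P) = \frac{SSE_P}{SSE_P / (n - P)} - (n - 2P) = (n - P) - (n - 2P) = P$$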

Page 14: Model selection

Using the Cp criterion

• So, identify subsets of predictors for which:
– the Cp value is smallest, and
– the Cp value is near p (if possible)

• In general, though, don’t always choose the largest model just because it yields Cp= p.
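
One way to apply this rule mechanically is sketched below, using the Cp values reported for the cement models in this transcript. The cutoff for "near p" (Cp no more than p + 0.5) and the function name `screen_by_cp` are assumptions made for this example; the slides do not give a numeric threshold.

```python
def screen_by_cp(candidates, tolerance=0.5):
    """Keep models whose Cp is at most p + tolerance, sorted by Cp (smallest first)."""
    kept = [c for c in candidates if c[2] <= c[1] + tolerance]
    return sorted(kept, key=lambda c: c[2])

# (predictors, p = number of parameters incl. intercept, Cp) from the cement output
cement = [
    (("x4",), 2, 138.7),
    (("x1", "x4"), 3, 5.5),
    (("x1", "x2", "x4"), 4, 3.0),
    (("x1", "x2"), 3, 2.7),
    (("x1", "x2", "x3", "x4"), 5, 5.0),
]

shortlist = screen_by_cp(cement)
for predictors, p, cp in shortlist:
    print(predictors, p, cp)
```

The full model passes the screen only because its Cp always equals p, which is why the final choice still needs the researcher's judgment.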

Page 15: Model selection

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars  R-Sq  R-Sq(adj)    C-p       S  x1  x2  x3  x4
   1  67.5       64.5  138.7  8.9639               X
   1  66.6       63.6  142.5  9.0771       X
   2  97.9       97.4    2.7  2.4063   X   X
   2  97.2       96.7    5.5  2.7343   X           X
   3  98.2       97.6    3.0  2.3087   X   X       X
   3  98.2       97.6    3.0  2.3121   X   X   X
   4  98.2       97.4    5.0  2.4460   X   X   X   X

Page 16: Model selection

Stepwise Regression: y versus x1, x2, x3, x4

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is y on 4 predictors, with N = 13

Step            1        2        3        4
Constant   117.57   103.10    71.65    52.58

x4         -0.738   -0.614   -0.237
T-Value     -4.77   -12.62    -1.37
P-Value     0.001    0.000    0.205

x1                    1.44     1.45     1.47
T-Value              10.40    12.41    12.10
P-Value              0.000    0.000    0.000

x2                            0.416    0.662
T-Value                        2.24    14.44
P-Value                       0.052    0.000

S            8.96     2.73     2.31     2.41
R-Sq        67.45    97.25    98.23    97.87
R-Sq(adj)   64.50    96.70    97.64    97.44
C-p         138.7      5.5      3.0      2.7

Page 17: Model selection

Example: Modeling PIQ

[Figure: matrix plot of PIQ, MRI, Height, Weight]

Page 18: Model selection

Best Subsets Regression: PIQ versus MRI, Height, Weight

Response is PIQ

Vars  R-Sq  R-Sq(adj)   C-p       S  MRI  Height  Weight
   1  14.3       11.9   7.3  21.212    X
   1   0.9        0.0  13.8  22.810         X
   2  29.5       25.5   2.0  19.510    X    X
   2  19.3       14.6   6.9  20.878    X            X
   3  29.5       23.3   4.0  19.794    X    X       X

Page 19: Model selection

Stepwise Regression: PIQ versus MRI, Height, Weight

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is PIQ on 3 predictors, with N = 38

Step            1        2
Constant    4.652  111.276

MRI          1.18     2.06
T-Value      2.45     3.77
P-Value     0.019    0.001

Height               -2.73
T-Value              -2.75
P-Value              0.009

S            21.2     19.5
R-Sq        14.27    29.49
R-Sq(adj)   11.89    25.46
C-p           7.3      2.0

Page 20: Model selection

Example: Modeling BP

[Figure: matrix plot of BP, Age, Weight, BSA, Duration, Pulse, Stress]

Page 21: Model selection

Best Subsets Regression: BP versus Age, Weight, ...

Response is BP

Predictor columns, in order: Age, Weight, BSA, Duration, Pulse, Stress

Vars  R-Sq  R-Sq(adj)    C-p        S  Predictors in model (one X each)
   1  90.3       89.7  312.8   1.7405  X
   1  75.0       73.6  829.1   2.7903  X
   2  99.1       99.0   15.1  0.53269  X X
   2  92.0       91.0  256.6   1.6246  X X
   3  99.5       99.4    6.4  0.43705  X X X
   3  99.2       99.1   14.1  0.52012  X X X
   4  99.5       99.4    6.4  0.42591  X X X X
   4  99.5       99.4    7.1  0.43500  X X X X
   5  99.6       99.4    7.0  0.42142  X X X X X
   5  99.5       99.4    7.7  0.43078  X X X X X
   6  99.6       99.4    7.0  0.40723  X X X X X X

Page 22: Model selection

Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15

Response is BP on 6 predictors, with N = 20

Step             1         2         3
Constant     2.205   -16.579   -13.667

Weight       1.201     1.033     0.906
T-Value      12.92     33.15     18.49
P-Value      0.000     0.000     0.000

Age                    0.708     0.702
T-Value                13.23     15.96
P-Value                0.000     0.000

BSA                                4.6
T-Value                           3.04
P-Value                          0.008

S             1.74     0.533     0.437
R-Sq         90.26     99.14     99.45
R-Sq(adj)    89.72     99.04     99.35
C-p          312.8      15.1       6.4

Page 23: Model selection

Best subsets regression

• Stat >> Regression >> Best subsets …

• Specify response and all possible predictors.

• If desired, specify predictors that must be included in every model. (Researcher’s knowledge!)

• Select OK. Results appear in session window.