G89.2229 Multiple Regression, Lect 13W (Week 13, Wednesday)

Upload: kelley-atkins

Posted on 13-Dec-2015



• Imputation (data augmentation) of missing data

• Multiple imputation

• Examples



Missing Data Woes

• Suppose a subject has completed nearly all of a two-hour interview but forgets (or refuses) to answer a handful of questions.
  » If these questions are used to define a variable to be included in a model, the subject's whole record is lost.

• Suppose that in a 10-year longitudinal study, a subject misses one annual follow-up.
  » Listwise deletion eliminates the subject's full sequence of data.
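To make the cost of listwise deletion concrete, here is a small sketch assuming a hypothetical data set of 200 subjects and 10 variables in which each value is independently missing with probability 0.03. Even though only about 3% of the cells are missing, a record survives listwise deletion only if all 10 of its values are present, so roughly 1 − 0.97^10 ≈ 26% of the records are expected to be dropped. The helper `listwise_delete` is illustrative, not part of any package.

```python
import numpy as np

def listwise_delete(data: np.ndarray) -> np.ndarray:
    """Keep only rows with no missing (NaN) values, as listwise deletion does."""
    complete = ~np.isnan(data).any(axis=1)
    return data[complete]

# Hypothetical data: 200 subjects x 10 variables, each cell missing w.p. 0.03.
rng = np.random.default_rng(2229)
n_subjects, n_vars, p_missing = 200, 10, 0.03
data = rng.normal(size=(n_subjects, n_vars))
data[rng.random(data.shape) < p_missing] = np.nan

kept = listwise_delete(data)
cell_rate = np.isnan(data).mean()           # share of individual values missing
row_loss = 1 - kept.shape[0] / n_subjects   # share of whole records discarded
print(f"cells missing: {cell_rate:.1%}")
print(f"records lost:  {row_loss:.1%}")
print(f"expected loss: {1 - (1 - p_missing) ** n_vars:.1%}")
```

The gap between the two printed rates grows with the number of variables in the model, which is why long interviews and long follow-up sequences are hit hardest.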


Model-based ML approaches

• There are several multivariate methods that can use the available data for subjects whose records are incomplete:
  » Structural equation methods
  » Multilevel methods
  » Generalized estimating equations
  » Survival analysis

• These methods assume that the same model applies to incomplete as well as complete records.
  » Data Missing at Random (MAR): whether a value is missing may depend on the observed data, but not on the missing value itself.
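A small simulation can illustrate what the MAR assumption buys. In this hypothetical setup, y depends linearly on a fully observed x (intercept 1, slope 2), and y is more likely to be missing when x is large, so missingness depends only on observed data. The complete-case mean of y is then biased, but the regression of y on x fitted to the complete records still recovers the intercept and slope, because the same model holds for the complete and the incomplete records. All parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true intercept 1, slope 2; mean of y is 1

# MAR: the probability that y is missing rises with the observed x.
p_miss = 1 / (1 + np.exp(-2.0 * x))
observed = rng.random(n) >= p_miss

# Naive summary: complete-case mean of y.
cc_mean = y[observed].mean()

# Model-based: regress y on x using the complete records only.
X_obs = np.column_stack([np.ones(observed.sum()), x[observed]])
coef, *_ = np.linalg.lstsq(X_obs, y[observed], rcond=None)

print(f"complete-case mean of y: {cc_mean:.2f} (population mean 1.00)")
print(f"complete-case intercept: {coef[0]:.2f} (true 1.00)")
print(f"complete-case slope:     {coef[1]:.2f} (true 2.00)")
```

If missingness instead depended on y itself (not MAR), the complete-case regression would generally be biased as well.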


Data Augmentation for Multiple Regression

• For multiple regression, it is often convenient to use multiple imputation.
  » We generate guesses of what the data might have been had they been observed.
    • The guesses use the available data and the relations among the variables.
    • The guesses are constructed to have reasonable variances.
  » We repeat the imputation process 5 or more times.
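The recipe on this slide can be sketched for the simplest case: a single outcome y missing for some records and a fully observed predictor matrix X. This is a minimal sketch assuming a normal linear regression imputation model with a noninformative prior (one standard Bayesian approach, not necessarily what any particular package does). Each repetition first draws plausible regression parameters from their posterior given the complete cases, then draws each missing value as a prediction plus random residual noise, so the guesses carry realistic variance instead of sitting exactly on the regression line. The function name `impute_normal_regression` and the data are hypothetical.

```python
import numpy as np

def impute_normal_regression(X, y, rng):
    """Return one completed copy of y, with NaNs replaced by stochastic draws.

    X: (n, p) fully observed design matrix (include a column of ones).
    y: (n,) outcome with np.nan marking missing values.
    """
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    n_obs, p = X_obs.shape

    # Least-squares fit to the complete cases.
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    rss = np.sum((y_obs - X_obs @ beta_hat) ** 2)

    # Draw sigma^2 and beta from their posterior, so that uncertainty about
    # the imputation model itself varies across imputations.
    sigma2 = rss / rng.chisquare(n_obs - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)

    # Draw each missing value: prediction plus residual noise.
    y_completed = y.copy()
    X_mis = X[~obs]
    noise = rng.normal(scale=np.sqrt(sigma2), size=X_mis.shape[0])
    y_completed[~obs] = X_mis @ beta + noise
    return y_completed

# Hypothetical data: 300 cases, about 30% of y missing completely at random.
rng = np.random.default_rng(89)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.5 + 1.5 * x + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan

K = 5  # the slide suggests 5 or more imputations
completed = [impute_normal_regression(X, y, rng) for _ in range(K)]
for k, yk in enumerate(completed, start=1):
    print(f"imputation {k}: mean of y = {yk.mean():.3f}")
```

Each completed data set would then be analyzed with the same regression, and the K sets of results combined.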


Inference from Multiple Imputation

• Rubin (1987) recommends computing, for each regression weight,
  » an average across the K imputations, and
  » an estimate of the standard error that takes into account the variation over imputations.

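$$\bar{B} = \frac{1}{K}\sum_{k=1}^{K} B_k$$

$$S_{\bar{B}}^2 = \frac{1}{K}\sum_{k=1}^{K} S_{B_k}^2 + \left(1 + \frac{1}{K}\right)\frac{1}{K-1}\sum_{k=1}^{K}\left(B_k - \bar{B}\right)^2$$

where $B_k$ is the regression weight estimated from the $k$th imputed data set and $S_{B_k}$ is its standard error. The first term is the average within-imputation variance; the second adds the between-imputation variance, inflated by $1 + 1/K$ because only $K$ imputations are used.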
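Rubin's two quantities translate directly into code. The sketch below is a minimal version assuming K estimates and their standard errors for a single regression weight; it returns the pooled estimate, the total variance, its square root, and the t ratio, and also computes the degrees of freedom Rubin (1987) gives for referring that ratio to a t distribution. The function name `pool_rubin` is hypothetical. The sample input is the (Constant) row of the example that follows, so the results should match that row up to rounding.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one coefficient across K imputations with Rubin's rules."""
    K = len(estimates)
    b_bar = mean(estimates)                       # average estimate
    within = mean(s_k ** 2 for s_k in std_errors) # mean squared SE
    between = variance(estimates)                 # spread across imputations (divisor K - 1)
    total = within + (1 + 1 / K) * between        # squared SE of the pooled estimate
    se = math.sqrt(total)
    r = (1 + 1 / K) * between / within            # relative increase in variance
    df = (K - 1) * (1 + 1 / r) ** 2               # Rubin's (1987) degrees of freedom
    return b_bar, total, se, b_bar / se, df

# (Constant) row from the example: five estimates and their standard errors.
b = [2.04, 1.16, 1.54, -3.66, 7.65]
s = [3.17, 3.07, 3.42, 3.03, 3.24]
b_bar, total, se, t, df = pool_rubin(b, s)
print(f"mean={b_bar:.3f} var2={total:.2f} se={se:.2f} t={t:.2f} df={df:.1f}")
```

With only five imputations and a large between-imputation component, the degrees of freedom are small, so the pooled t ratio should be judged against a t distribution rather than the normal.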


Example

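Coefficients from the five imputed data sets (columns 1 to 5), pooled across imputations:

Coef             1      2      3      4      5   Mean    Var   Var2     se      t
(Constant)    2.04   1.16   1.54  -3.66   7.65   1.74  16.13  29.53   5.43   0.32
CESD2         0.59   0.70   0.59   0.68   0.51   0.61   0.01   0.01   0.11   5.72
Percss        0.87   1.40   1.05   2.37   0.18   1.18   0.65   1.29   1.14   1.03
MARRIED      -1.15  -2.32  -1.59  -1.35  -3.13  -1.91   0.66   2.99   1.73  -1.10
ETHB         -0.01  -0.92   0.37   1.36  -1.85  -0.21   1.51   6.02   2.45  -0.09
ETHL          3.16   1.74   4.23   3.03   0.79   2.59   1.79   5.22   2.28   1.13
ETHO          0.61  -1.00   3.24   2.45   0.00   1.06   3.06  11.00   3.32   0.32

Standard errors from the same five imputed data sets:

Se               1      2      3      4      5  mean sq(Se)
(Constant)    3.17   3.07   3.42   3.03   3.24       10.175
CESD2         0.06   0.07   0.07   0.06   0.06        0.004
Percss        0.71   0.69   0.78   0.70   0.71        0.518
MARRIED       1.50   1.41   1.54   1.43   1.51        2.189
ETHB          2.07   1.97   2.14   1.97   2.11        4.211
ETHL          1.77   1.67   1.84   1.67   1.81        3.068
ETHO          2.75   2.59   2.82   2.60   2.78        7.333

Key: Var is the variance of the five estimates across imputations (divisor K − 1); mean sq(Se) is the average of the five squared standard errors; Var2 = mean sq(Se) + (1 + 1/5) Var is the total variance; se = √Var2; and t = Mean/se.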
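As a check on the arithmetic, the ETHB row can be reproduced from Rubin's formulas with K = 5, using the five ETHB estimates and the tabled mean sq(Se) of 4.211 as the within-imputation term:

```latex
\begin{aligned}
\bar{B} &= \tfrac{1}{5}\left(-0.01 - 0.92 + 0.37 + 1.36 - 1.85\right) = -0.21\\
\tfrac{1}{K-1}\sum_{k=1}^{K}\left(B_k - \bar{B}\right)^2
  &= \tfrac{1}{4}\left(0.20^2 + 0.71^2 + 0.58^2 + 1.57^2 + 1.64^2\right) = \tfrac{6.035}{4} \approx 1.51\\
S_{\bar{B}}^2 &\approx 4.211 + \left(1 + \tfrac{1}{5}\right)(1.51) \approx 6.02\\
S_{\bar{B}} &\approx \sqrt{6.02} \approx 2.45, \qquad t = \bar{B}/S_{\bar{B}} \approx -0.21/2.45 \approx -0.09
\end{aligned}
```

These agree with the Mean, Var, Var2, se, and t entries for ETHB. Note that the pooled se (2.45) is larger than any single-imputation standard error (1.97 to 2.14), because it includes the between-imputation variation.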


Goals of Accounting for Missing Values

• Adjust for bias
  » Make use of a more representative sample
  » Don't let respondents determine the nature of the sample

• Sometimes increase power
  » If missing data are limited, including the additional cases can help.
  » Standard errors from individual imputation runs are likely to be smaller than those from the complete cases alone.


Approaches to imputation

• SPSS has a regression-based approach.
  » It relies on the structure of the complete cases to infer what the missing cases look like.

• Newer Bayesian methods iterate to adjust the prediction model for the imputed information.
  » They use beliefs or information about the form of the distribution.
  » Example: NORM (free software)
    • http://methodology.psu.edu/mde.html
  » Example: PROC MI in SAS
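The iterative Bayesian idea can be sketched as data augmentation for a bivariate normal model, the kind of multivariate normal model NORM is built around. This is a minimal sketch, not NORM's or PROC MI's actual code: it assumes each record is missing at most one of its two values and uses the standard noninformative prior, under which the covariance matrix has an inverse-Wishart posterior. Each iteration alternates a posterior step (draw a new mean and covariance given the currently completed data) with an imputation step (redraw the missing values given those parameters). The function names `draw_inv_wishart` and `data_augmentation` and all data are hypothetical.

```python
import numpy as np

def draw_inv_wishart(df, scale, rng):
    """Draw from inverse-Wishart(df, scale) for integer df >= dimension."""
    # If g_i ~ N(0, scale^-1), then sum_i g_i g_i^T ~ Wishart(df, scale^-1),
    # and its inverse is inverse-Wishart(df, scale).
    g = rng.multivariate_normal(np.zeros(scale.shape[0]), np.linalg.inv(scale), size=df)
    inv = np.linalg.inv(g.T @ g)
    return (inv + inv.T) / 2  # remove floating-point asymmetry

def data_augmentation(Z, n_iter, rng):
    """Alternate P-steps and I-steps for a bivariate normal with missing values.

    Z: (n, 2) array, np.nan marks a missing value (at most one per row).
    Returns the list of (mu, sigma) draws, one per iteration.
    """
    miss = np.isnan(Z)
    # Start by filling missing values with the observed column means.
    Z = np.where(miss, np.nanmean(Z, axis=0), Z)
    n = Z.shape[0]
    draws = []
    for _ in range(n_iter):
        # P-step: draw Sigma, then mu given Sigma, from the completed data.
        z_bar = Z.mean(axis=0)
        dev = Z - z_bar
        sigma = draw_inv_wishart(n - 1, dev.T @ dev, rng)
        mu = rng.multivariate_normal(z_bar, sigma / n)
        # I-step: redraw each missing value from its conditional normal.
        for j, o in ((0, 1), (1, 0)):
            rows = miss[:, j]
            slope = sigma[j, o] / sigma[o, o]
            cond_mean = mu[j] + slope * (Z[rows, o] - mu[o])
            cond_var = sigma[j, j] - slope * sigma[j, o]
            Z[rows, j] = cond_mean + rng.normal(scale=np.sqrt(cond_var), size=rows.sum())
        draws.append((mu, sigma))
    return draws

# Hypothetical data: 400 cases; about 20% missing x and another 20% missing y.
rng = np.random.default_rng(2015)
n = 400
true_mu = np.array([1.0, 2.0])
true_sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(true_mu, true_sigma, size=n)
u = rng.random(n)
Z[u < 0.2, 0] = np.nan
Z[(u >= 0.2) & (u < 0.4), 1] = np.nan

draws = data_augmentation(Z, n_iter=600, rng=rng)
kept = draws[100:]  # discard burn-in iterations
mu_hat = np.mean([m for m, _ in kept], axis=0)
sigma_hat = np.mean([s for _, s in kept], axis=0)
corr_hat = sigma_hat[0, 1] / np.sqrt(sigma_hat[0, 0] * sigma_hat[1, 1])
print("posterior mean of mu:", mu_hat.round(2))
print("posterior mean correlation:", round(corr_hat, 2))
```

To produce multiple imputations from a chain like this, one would save completed data sets from iterations spaced far enough apart that they are approximately independent, then analyze and pool them as above.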


Some References

Graham, J.W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle, & S. West (Eds.), New Methodological Approaches to Alcohol Prevention Research. Washington, DC: American Psychological Association.

Horton, N.J., & Lipsitz, S.R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55(3), 244-254.

Little, R.J.A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121.

Little, R.J.A., & Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: Wiley.

Little, R.J.A., & Rubin, D.B. (1989). The analysis of social science data with missing values. Sociological Methods and Research, 18, 292-326.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Schafer, J.L. & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177 (available as PDF on class web site).


Some websites

• Penn State Methodology Center: http://methodology.psu.edu/mde.html

• Multiple Imputation Online: http://www.multiple-imputation.com/

• SAS description of PROC MI: http://support.sas.com/rnd/app/papers/mi.pdf