1G89.2229 Lect 13W
• Imputation (data augmentation) of missing data
• Multiple imputation
• Examples
G89.2229 Multiple Regression Week 13 (Wednesday)
Missing Data Woes
• Suppose a subject has completed nearly all of a two-hour interview but forgets (or refuses) to answer a handful of questions.
  » If these questions are used to define a variable to be included in a model, the subject’s whole record is lost.
• Suppose that in a 10-year longitudinal study, a subject misses one annual follow-up.
  » Listwise deletion results in the full sequence of data being eliminated.
Model-based ML approaches
• There are several multivariate methods that can use available data for subjects whose records are incomplete
  » Structural equation methods
  » Multilevel methods
  » Generalized estimating equations
  » Survival analysis
• These methods assume that the same model applies to incomplete as well as complete records
  » Data Missing at Random (MAR)
Data Augmentation for Multiple Regression
• For multiple regression, it is often convenient to use multiple imputation
  » We generate guesses of what the data might have been had they been observed
    • The guesses use available data and relations among the variables
    • The guesses are constructed to have reasonable variances
  » We repeat the imputation process 5 or more times
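The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure used in the lecture: one predictor x is fully observed, one outcome y has missing values, and each guess is a regression prediction from the complete cases plus a random residual so that the imputed values keep a reasonable variance. The function names and the toy data are invented for the example. A fully proper imputation would also draw the regression parameters at random, which this sketch skips.

```python
import numpy as np

def impute_once(x, y, rng):
    """Return a copy of y with each missing value (NaN) replaced by one random guess."""
    observed = ~np.isnan(y)
    # Fit y = a + b*x using the complete cases only.
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    residuals = y[observed] - design @ coef
    sigma = residuals.std(ddof=2)  # residual SD, allowing for two fitted parameters
    # Guess = regression prediction + random noise with the residual SD.
    y_filled = y.copy()
    n_missing = (~observed).sum()
    noise = rng.normal(0.0, sigma, n_missing)
    y_filled[~observed] = coef[0] + coef[1] * x[~observed] + noise
    return y_filled

def multiply_impute(x, y, k=5, seed=0):
    """Repeat the imputation k times, giving k completed copies of y."""
    rng = np.random.default_rng(seed)
    return [impute_once(x, y, rng) for _ in range(k)]

# Toy data (invented): about 20% of y is set to missing at random.
toy_rng = np.random.default_rng(1)
x = toy_rng.normal(size=200)
y = 2.0 + 0.5 * x + toy_rng.normal(scale=1.0, size=200)
y[toy_rng.random(200) < 0.2] = np.nan

completed = multiply_impute(x, y, k=5)
```

Each element of `completed` is one filled-in version of y. The regression of interest would then be run once per completed data set, and the results combined across the imputations.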
Inference from Multiple Imputation
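• Rubin (1987) recommends computing for each regression weight
  » An average across the K imputations
    $$\bar{B} = \frac{1}{K}\sum_{k=1}^{K} B_k$$
  » An estimate of the standard error that takes into account the variation over imputations
    $$S^2_{\bar{B}} = \frac{1}{K}\sum_{k=1}^{K} S^2_{B_k} + \left(1 + \frac{1}{K}\right)\frac{1}{K-1}\sum_{k=1}^{K}\left(B_k - \bar{B}\right)^2$$
    where $B_k$ is the estimate from imputation $k$ and $S_{B_k}$ is its standard error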
Example
Coefficients from each of the 5 imputations, with pooled results:

Coef            1      2      3      4      5   Mean    Var   Var2    se      t
(Constant)   2.04   1.16   1.54  -3.66   7.65   1.74  16.13  29.53  5.43   0.32
CESD2        0.59   0.70   0.59   0.68   0.51   0.61   0.01   0.01  0.11   5.72
Percss       0.87   1.40   1.05   2.37   0.18   1.18   0.65   1.29  1.14   1.03
MARRIED     -1.15  -2.32  -1.59  -1.35  -3.13  -1.91   0.66   2.99  1.73  -1.10
ETHB        -0.01  -0.92   0.37   1.36  -1.85  -0.21   1.51   6.02  2.45  -0.09
ETHL         3.16   1.74   4.23   3.03   0.79   2.59   1.79   5.22  2.28   1.13
ETHO         0.61  -1.00   3.24   2.45   0.00   1.06   3.06  11.00  3.32   0.32

Standard errors from each of the 5 imputations:

Se              1      2      3      4      5   Mean sq(Se)
(Constant)   3.17   3.07   3.42   3.03   3.24   10.175
CESD2        0.06   0.07   0.07   0.06   0.06    0.004
Percss       0.71   0.69   0.78   0.70   0.71    0.518
MARRIED      1.50   1.41   1.54   1.43   1.51    2.189
ETHB         2.07   1.97   2.14   1.97   2.11    4.211
ETHL         1.77   1.67   1.84   1.67   1.81    3.068
ETHO         2.75   2.59   2.82   2.60   2.78    7.333

Var is the variance of the coefficient across the 5 imputations; Var2 = Mean sq(Se) + (1 + 1/5) Var; se is the square root of Var2; t = Mean / se.
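As a check on the arithmetic, the pooling rules can be written as a short Python function and applied to the (Constant) row of the example. This is a minimal sketch; the function name `pool_rubin` is made up for illustration, and the inputs are the rounded values printed on the slide.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool one regression weight across K imputations."""
    b = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    k = b.size
    pooled = b.mean()              # average across the K imputations
    within = np.mean(se ** 2)      # mean of the squared standard errors
    between = b.var(ddof=1)        # variance of the estimates over imputations
    total = within + (1.0 + 1.0 / k) * between
    pooled_se = np.sqrt(total)
    return pooled, pooled_se, pooled / pooled_se

# (Constant) row of the example: five coefficients and their standard errors.
coef = [2.04, 1.16, 1.54, -3.66, 7.65]
se = [3.17, 3.07, 3.42, 3.03, 3.24]
pooled, pooled_se, ratio = pool_rubin(coef, se)
print(f"mean={pooled:.3f}  se={pooled_se:.3f}  t={ratio:.3f}")
```

The printed values should agree with the Mean, se, and t columns for (Constant) in the table (1.74, 5.43, 0.32), up to rounding of the inputs.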
Goals of Accounting for Missing Values
• Adjust for bias
  » Make use of a more representative sample
  » Don’t let respondents determine the nature of the sample
• Sometimes increase power
  » If missing data are limited, including additional cases can help.
  » Standard errors of individual imputation runs are likely to be smaller
Approaches to imputation
• SPSS has a regression-based approach
  » Relies on structure of complete cases to infer what the missing cases look like.
• Newer Bayesian methods iterate to adjust the prediction model for the imputed information.
  » Uses beliefs or information about distribution form
  » Example: NORM (free software)
    • http://methodology.psu.edu/mde.html
  » Example: PROC MI in SAS
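To give a feel for what "iterate" means here, the following Python sketch alternates between two steps for a single normal regression: drawing the regression parameters given the currently completed data, then re-drawing the missing outcomes given those parameters. It is an illustration only, under assumptions not stated in the slides (normal errors, a noninformative prior, missing values only in y), and it is not the algorithm implemented in NORM or PROC MI. The function name and toy data are invented.

```python
import numpy as np

def data_augmentation(X, y, n_iter=50, seed=0):
    """Alternate parameter draws and imputation draws; return one completed y."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(y)
    n, p = X.shape
    y_work = y.copy()
    y_work[missing] = np.nanmean(y)  # crude starting values
    xtx_inv = np.linalg.inv(X.T @ X)
    for _ in range(n_iter):
        # Parameter step: draw (beta, sigma^2) given the completed data.
        beta_hat = xtx_inv @ X.T @ y_work
        sse = np.sum((y_work - X @ beta_hat) ** 2)
        sigma2 = sse / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # Imputation step: re-draw the missing y values given the parameters.
        noise = rng.normal(0.0, np.sqrt(sigma2), missing.sum())
        y_work[missing] = X[missing] @ beta + noise
    return y_work

# Toy data (invented): intercept plus one predictor, about 20% of y missing.
toy_rng = np.random.default_rng(2)
x = toy_rng.normal(size=150)
X = np.column_stack([np.ones(150), x])
y = 1.0 + 0.8 * x + toy_rng.normal(size=150)
y[toy_rng.random(150) < 0.2] = np.nan

y_completed = data_augmentation(X, y)
```

Running the function several times with different seeds would give several completed data sets, each reflecting uncertainty in both the regression parameters and the missing values.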
Some References
Graham, J.W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, W. Windle, and S. West (Eds.), New Methodological Approaches to Alcohol Prevention Research. Washington, DC: American Psychological Association.
Horton, N.J., & Lipsitz, S.R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55(3), 244-254.
Little, R.J.A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121.
Little, R.J.A., & Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Little, R.J.A., & Rubin, D.B. (1989). The analysis of social science data with missing values. Sociological Methods and Research, 18, 292-326.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-592.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J.L. & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177 (available as PDF on class web site).
Some websites
• Penn State Methodology Center: http://methodology.psu.edu/mde.html
• Multiple Imputation Online: http://www.multiple-imputation.com/
• SAS description of PROC MI: http://support.sas.com/rnd/app/papers/mi.pdf