Multiple Imputation: Joint and Conditional Modeling of Missing Data

Multiple Imputation. Octavious Talbot & Kazuki Yoshida. Dec 16, 2015. BIO235 Final Project. This document was created by students to fulfill a course requirement. Be aware of potential errors, and check with the original papers. There is a corresponding report document at https://github.com/kaz-yos/misc/blob/master/MI_Project.Rnw.pdf

Upload: kazuki-yoshida

Post on 11-Apr-2017



Page 2: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Outline

•  Background
•  Multiple Imputation
   –  Joint Distribution
   –  Conditional Distribution
•  Compare/Contrast
•  Conclusion

Page 3: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Background

•  Missing data is an omnipresent problem that affects almost all real datasets.

•  MI has become one of the most popular methods to address missing data.

•  We review major MI algorithms, including their relative strengths and weaknesses and implications for high-dimensional data.

Page 4: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Missing data classification

•  Missing Completely At Random (MCAR)

•  Missing At Random (MAR)

•  Not Missing At Random (NMAR)
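
The three mechanisms differ in what the probability of missingness depends on: nothing (MCAR), observed values only (MAR), or the unobserved values themselves (NMAR). The following is a minimal simulation sketch, not taken from the slides; the function name, the variables `x` and `y`, and the 0.9/0.1 missingness probabilities are illustrative assumptions.

```python
import random


def simulate_missingness(n=10000, seed=0):
    """Mean of the observed y under three hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]   # always observed
    y = [xi + rng.gauss(0, 1) for xi in x]    # subject to missingness

    def observed_mean(prob_missing):
        # Keep y_i when a uniform draw is at least its missingness probability.
        kept = [yi for xi, yi in zip(x, y)
                if rng.random() >= prob_missing(xi, yi)]
        return sum(kept) / len(kept)

    return {
        "full": sum(y) / n,
        # MCAR: missingness ignores the data entirely.
        "MCAR": observed_mean(lambda xi, yi: 0.5),
        # MAR: missingness depends only on the observed x.
        "MAR": observed_mean(lambda xi, yi: 0.9 if xi > 0 else 0.1),
        # NMAR: missingness depends on the value of y that goes missing.
        "NMAR": observed_mean(lambda xi, yi: 0.9 if yi > 0 else 0.1),
    }


if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(name, round(value, 3))
```

In this setup the complete-case mean of `y` stays close to the full-data mean only under MCAR. Under MAR the bias can be removed by conditioning on `x`, which is the situation MI is designed for; under NMAR it cannot be removed using the observed data alone.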

Page 5: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Approaches

•  Insufficient
   –  Complete cases, indicator, single imputation
•  Better
   –  Multiple imputation
   –  Likelihood-based
   –  Weighting
•  Best
   –  Prevention

Page 10: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Theory behind MI

•  Posterior distribution of quantity of interest Q given observed data only

•  Likelihood-based approaches such as full information maximum likelihood (FIML) model this expression itself, but this can be difficult.

•  Decompose into more tractable parts.
   –  Distribution of Q given complete data (outcome model)
   –  Distribution of missing data given observed data (missing data model)
   –  Integration over missing data distribution
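
The decomposition above can be written compactly. The notation here is an assumption, since the slide's own equation did not survive extraction: Y_obs and Y_mis stand for the observed and missing parts of the data, and m is the number of imputed datasets.

```latex
P(Q \mid Y_{\mathrm{obs}})
  = \int
    \underbrace{P(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}})}_{\text{outcome model}}
    \,
    \underbrace{P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})}_{\text{missing data model}}
    \, dY_{\mathrm{mis}}
  \approx \frac{1}{m} \sum_{i=1}^{m}
    P\bigl(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}^{(i)}\bigr),
\qquad
Y_{\mathrm{mis}}^{(i)} \sim P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})
```

The approximation on the right is the Monte Carlo view of MI: each imputed dataset is one draw from the missing data model, and averaging over the draws stands in for the integral.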

Page 12: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Overview of MI

van Buuren 1999

Rubin's rules

Page 13: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Overview of MI

Impute based on missing data model

Outcome model using complete data

"Integrate" over imputed datasets

What you get

Little 2002
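
The "integrate over imputed datasets" step is usually carried out with Rubin's rules: average the point estimates, then combine the within-imputation and between-imputation variances. The sketch below assumes a scalar quantity of interest; the function name `pool_rubin` and its arguments are hypothetical and not from the slides.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m scalar estimates and their squared standard errors.

    estimates: point estimates of Q, one per imputed dataset
    variances: the corresponding within-imputation variances
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean(variances)                       # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return q_bar, within, between, total


if __name__ == "__main__":
    q_bar, within, between, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(q_bar, within, between, total)
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty that comes from not knowing the missing values.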

Page 14: Multiple Imputation: Joint and Conditional Modeling of Missing Data

MI: Two approaches

•  Joint distribution MI
   –  Utilizes assumed joint distribution of missing and observed data to impute missing values

•  Conditional distribution MI
   –  Models the conditional distribution of partially observed values (missing data)

Page 15: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Joint approach

•  Two main approaches
   –  Imputation-Posterior (IP) algorithm
   –  Expectation-Maximization (EM) algorithm

•  Usual assumptions
   –  MVN joint distribution for entire dataset
   –  MAR

Page 16: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Joint approach

Imputation-Posterior (IP) algorithm: Samples from the distribution of MVN parameters are obtained (MCMC). Samples are correlated. Using one chain for each MVN parameter sample is a solution. Implemented in norm.

Expectation-Maximization (EM) algorithm: Point estimates of MVN parameters are obtained. Estimation uncertainty is lost. Bootstrapping EM is a solution for this. Implemented in Amelia.

King 2001

Page 17: Multiple Imputation: Joint and Conditional Modeling of Missing Data

EM with bootstrap (Amelia)

Honaker 2015

-> Varying MVN parameter estimates

Page 18: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Conditional approach

•  Models the missingness within distinct variables separately and does not assume a joint distribution. MAR is still assumed.


van Buuren 2006
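
To make the variable-by-variable idea concrete, here is a deliberately simplified sketch of conditional (chained) imputation for two continuous variables. It is an illustration under stated assumptions rather than the algorithm of any particular package: the function names are hypothetical, each conditional model is a simple linear regression, and only residual noise is drawn. A proper implementation would also draw the regression parameters so that estimation uncertainty is propagated.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5


def chained_imputation(rows, n_iter=10, seed=0):
    """Return one completed copy of rows, a list of [a, b] with None as missing.

    Assumes each column has at least three observed values.
    """
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Initialise every missing cell with the observed mean of its column.
    for j in (0, 1):
        col_mean = mean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = col_mean
    for _ in range(n_iter):
        for j in (0, 1):          # cycle through the variables in turn
            k = 1 - j             # the other variable acts as predictor
            obs = [r for r, m in zip(data, missing) if not m[j]]
            b0, b1, sd = fit_line([r[k] for r in obs], [r[j] for r in obs])
            for r, m in zip(data, missing):
                if m[j]:          # redraw only the originally missing cells
                    r[j] = b0 + b1 * r[k] + rng.gauss(0, sd)
    return data
```

Calling the function m times with different seeds gives m completed datasets, each of which can be analyzed separately and then pooled.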

Page 20: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Comparison

•  Joint Distribution
   –  MVN can be an unreasonable assumption when dealing with categorical variables, and handling them requires more effort
   –  Robust when dealing with continuous variables
   –  Guarantees convergence (MCMC)

•  Conditional Distribution
   –  Relatively more flexible
   –  Theoretical convergence pitfalls
   –  Robust in simulation

Page 21: Multiple Imputation: Joint and Conditional Modeling of Missing Data

High-dimensional data

•  Joint MI has to estimate a huge covariance matrix with many parameters, whereas conditional MI has an overfitting issue in each regression model.

•  Introducing structures for the covariance matrix (joint MI) [1] and using regularization (conditional MI) [2] have been examined.

•  Widely available software implementations are lacking.

[1] He 2014; [2] Zhao 2013

Page 22: Multiple Imputation: Joint and Conditional Modeling of Missing Data

R packages

See below for R code examples: http://rpubs.com/kaz_yos/mi-examples

•  R: miceadds (high-dimensional FCS (conditional) through PLS)
•  SAS PROC MI: EM and MCMC (joint) and FCS (conditional)
•  Stata: mi impute mvn (joint, MCMC), ice (conditional), and smcfcs (conditional)

Page 23: Multiple Imputation: Joint and Conditional Modeling of Missing Data

Conclusion

•  The joint approach is theoretically more sound.

•  The conditional approach approximates the joint approach; although it has been effective in simulations, it is not theoretically guaranteed.

•  Both methods have difficulty with high-dimensional data, where the number of covariates is larger than the number of observations.