Missing Data/Imputation Discussion > Advice on multiple imputation
Suresh,
You are dealing with an interesting situation. First of all, I don't recommend doing so many imputations (35 to 50). Most MI research that I've seen indicates that there is little benefit from creating more than 3 to 5 imputed datasets to analyze and pool (a good explanation of why this is the case can be found on this Penn State MI FAQ page).
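The usual argument for a small number of imputations rests on Rubin's approximate relative efficiency, 1 / (1 + gamma/m), where gamma is the fraction of missing information and m is the number of imputations. Here is a rough sketch of that calculation. Treating gamma as 0.35 is only an assumption borrowed from your 35% missing-data rate; the true fraction of missing information can differ from the raw missingness rate, and the function name is just illustrative.

```python
def relative_efficiency(gamma, m):
    """Approximate efficiency of m imputations relative to infinitely many.

    gamma: fraction of missing information (between 0 and 1)
    m:     number of imputed datasets
    """
    return 1.0 / (1.0 + gamma / m)


# Assumption: gamma = 0.35, taken loosely from the 35% missingness rate.
gamma = 0.35
for m in (3, 5, 20, 35, 50):
    print(f"m = {m:2d}  relative efficiency = {relative_efficiency(gamma, m):.3f}")
```

Under that assumption the efficiency is already above 0.9 at m = 5, and going from 5 to 50 imputations adds only a few percentage points.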
Regarding the loss of significance, if your imputation models are poor, it is possible to introduce a lot of noise (random error) into your data through imputation, thus increasing your standard errors and decreasing your power.
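To see where the extra standard error comes from, it can help to look at how Rubin's rules pool the results: the total variance is the average within-imputation variance plus an inflated between-imputation variance. Below is a minimal sketch. The coefficient and standard-error values are made up purely for illustration, and `pool_rubin` is a hypothetical helper, not a function from SPSS or any R package.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)            # average sampling variance
    between = variance(estimates)       # sample variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Hypothetical log-odds estimates and standard errors from 5 imputations.
estimates = [0.40, 0.55, 0.35, 0.60, 0.45]
std_errors = [0.15, 0.16, 0.15, 0.17, 0.16]

pooled, within, between, total = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate  = {pooled:.3f}")
print(f"within variance  = {within:.4f}")
print(f"between variance = {between:.4f}")
print(f"pooled std error = {total ** 0.5:.3f}")
```

If the between-imputation variance is large relative to the within-imputation variance, the imputed values disagree a lot from one dataset to the next, which is the kind of noise a weak imputation model tends to produce.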
With imputation models, more information is typically better, as long as it is relevant information. It isn't realistic to explain how to build imputation models in depth on this forum, but here is a good resource if you are willing to conduct the MI in R (which I also recommend):
Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box
I hope this is helpful!
Many thanks Jeremy, that is quite helpful.
Kind regards
Suresh
Glad it was helpful!
Dear Jeremy,
I am doing MI on my dataset in SPSS 21. I have about 35% missing data and therefore have done 35 imputations. I am imputing 3 binary outcomes for subjects lost to follow-up. I am facing a few issues and hope you would be able to offer some advice.
1. My main issue is that some of the predictor variables that are significant (some with p<0.01) in the complete-case regression analysis are no longer significant in the imputed data (I have done 20, 35 and 50 imputations just to check, and the results are similar). Is it possible that MI is reducing the power because of increased within- and between-imputation variance? Is this common, and are there any possible solutions? I have also tried imputing all 3 outcome variables separately (instead of in a single MI model), but the results are not very different.
2. Do you think doing something like bootstrapping on imputed data might help? I am not sure how easy bootstrapping and interpreting the results is for imputed data.
3. How do I check and choose the right imputation and analysis model? I could do residual plots for continuous variables, but how do I compare observed and imputed data for categorical variables?
I have spent a lot of time searching the net for solutions and haven't found satisfactory answers so far (there isn't much info especially for categorical variables) and would greatly appreciate advice/suggestions. Many thanks