Hi!
I am running into a problem with missing data and multiple imputation. It's a purely technical problem, but I can't seem to work it out.
I am working with a pretty big database (about 300,000 cases), of which approximately 10% have missing values for some variables, specifically date of birth and sex. Other variables, such as highest educational level achieved, are complete. The missing values come from an error in my database, and I need to impute them to obtain a complete dataset.
I am using multiple imputation, which after some trial and error runs fine and gives sensible results. After running it, I end up with five imputed datasets plus a pooled result, which AFAIK averages the values imputed in each iteration for each missing case. This lets me run statistical tests that would otherwise be impossible because of the missing data.
However, what I need is a new dataset with these imputed values filled in. It seems to me this should be pretty straightforward, but I can't get it done. How can I save a single final dataset in which every missing value is replaced by its imputed value (already averaged across the imputation iterations)? It sounds easy, but I can't figure it out!
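One way to build such a file is to export the imputed data in stacked ("long") form and collapse the imputations into one row per case outside the statistics package. The sketch below assumes a hypothetical layout: a list of dicts with one row per case per imputation, a case identifier column, and an imputation-number column (named `Imputation_` here, where 0 marks the original, unimputed data). Dates of birth are averaged via their day ordinals, and sex is collapsed to its most frequent imputed value; the names `collapse_imputations`, `case_id`, `dob`, `sex` and `edu` are illustrative only. Keep in mind that collapsing the five imputations into one value per case discards the between-imputation variability that multiple imputation is designed to preserve, so standard errors computed on the collapsed file will tend to be too small.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean


def collapse_imputations(rows, id_key, imp_key, date_cols=(), categorical_cols=()):
    """Collapse stacked imputed datasets into one row per case.

    Hypothetical layout: `rows` is a list of dicts in long format, one dict per
    case per imputation; `imp_key` holds the imputation number and 0 marks the
    original (unimputed) data.
    """
    replicates = defaultdict(list)
    for row in rows:
        if row[imp_key] != 0:  # skip the original data with its missing values
            replicates[row[id_key]].append(row)

    collapsed = []
    for case_id in sorted(replicates):
        reps = replicates[case_id]
        # Start from the first imputation: complete variables (e.g. education)
        # are identical across imputations, so they are carried over as-is.
        out = {k: v for k, v in reps[0].items() if k != imp_key}
        for col in date_cols:
            # Average the dates via their day ordinals, then round to a whole day.
            avg_day = mean(r[col].toordinal() for r in reps)
            out[col] = date.fromordinal(round(avg_day))
        for col in categorical_cols:
            # A category cannot be averaged, so take the most frequent value;
            # ties go to the value encountered first.
            out[col] = Counter(r[col] for r in reps).most_common(1)[0][0]
        collapsed.append(out)
    return collapsed
```

Using the mode for sex rather than a mean is a deliberate choice: an "average sex" of 0.4 is not a valid value, whereas the most frequent imputed category always is.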
Best regards and thanks in advance,
Sebastián