Missing Data/Imputation Discussion > Deleting outliers in a multiple imputation data set

Hi there,

I had a fair bit of missing data in the variables I am using for a multiple regression, so I used multiple imputation to create a more complete data set. However, there are also quite a few multivariate outliers that I am now trying to delete. I've deleted those cases from the original data set (identified using Mahalanobis distance), but my supervisor has asked whether I can also delete the same cases from the pooled results so that I can compare the pooled result to the imputed results. Is there a relatively simple way to do this? There are 61 outliers in the original data set, so I don't want to go through each imputation deleting them manually.

Or am I on the wrong track, and should I have deleted the multivariate outliers prior to using multiple imputation? (Although Tabachnick & Fidell seem to say to deal with missing data first.)

November 5, 2012 | Unregistered Commenter Claire

Whether you exclude before or after imputation depends on your intuitions about why they are outliers. If you believe that the outliers are invalid somehow (e.g., they are not actually from the population you intend to sample, or they result from measurement error), then I'd exclude them prior to imputation so they don't bias your imputation model. However, if you think they are valid scores that are simply extreme, then I wouldn't exclude them.
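
If you do end up removing the same cases after imputation, it shouldn't need to be done by hand. Here is a minimal sketch in Python rather than any particular stats package. It assumes each imputed data set keeps a case ID column that matches the original data; the names `drop_cases`, `id_key`, and the toy values are hypothetical and only for illustration.

```python
from typing import Dict, List, Set

Row = Dict[str, float]


def drop_cases(
    imputations: List[List[Row]],
    outlier_ids: Set[int],
    id_key: str = "id",  # assumed name of the case ID column
) -> List[List[Row]]:
    """Return a copy of every imputed data set with the flagged cases removed."""
    return [
        [row for row in dataset if row[id_key] not in outlier_ids]
        for dataset in imputations
    ]


# Toy data: two imputations of the same four cases.
# Case 3 stands in for a case flagged as an outlier in the original data.
imputations = [
    [{"id": 1, "x": 2.0}, {"id": 2, "x": 2.5}, {"id": 3, "x": 40.0}, {"id": 4, "x": 3.0}],
    [{"id": 1, "x": 2.0}, {"id": 2, "x": 2.7}, {"id": 3, "x": 40.0}, {"id": 4, "x": 3.0}],
]
outlier_ids = {3}

# The same IDs are removed from every imputation in one pass.
trimmed = drop_cases(imputations, outlier_ids)
```

The regression would then be rerun on each trimmed data set and pooled again, since the cases have to come out of the imputed data sets themselves, not out of the pooled estimates.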

November 20, 2012 | Registered Commenter Jeremy Taylor