Missing Data/Imputation Discussion > Multiple Imputation and Pooling Parameter Estimates

I am using (or trying to use) SPSS v. 19.0.1 to perform multiple imputation.

Because SPSS seems to provide only some pooled results (e.g., for t, unstandardized b), how would I go about obtaining other pooled parameter estimates (e.g., F statistic, B, R square change, confidence intervals, df's, partial eta squared, etc)?

Any help would be much appreciated!

March 17, 2011 | Unregistered CommenterLisa

Great question, Lisa! It is frustrating that SPSS doesn't aggregate all of the pooled results. This is probably because there aren't "standard" ways to report these statistics, since these methods have only recently become popular (although they've been around for quite a bit longer).

In general, pooled (or summary) values for all of these statistics can be obtained by simply calculating the "mean" across the imputation results (generally 5 imputation datasets are sufficient to eliminate bias). Other techniques have been suggested for pooling F-statistics, but I'm not as familiar with them and haven't seen them implemented very much (Schafer, 1997, pp. 113-114). I hope this is helpful and feel free to keep the questions coming!

PS: I've not yet come across a macro for this that works. I found one at one point, but when I went back to download it, the website was down, unfortunately.

March 17, 2011 | Registered CommenterJeremy Taylor

Thank you, Jeremy, that helps. I am wondering if this is the macro you are referring to:

http://www.spss.com/devcentral/index.cfm?pg=downloadDet&dId=55

I am not sure how to install it, or whether it is what I need at all.

Another question I have is how to go about reporting results? Is there an APA style for this?

March 17, 2011 | Unregistered CommenterLisa

Hey Lisa,

That actually is not the macro I was talking about, as the one I saw wasn't made by IBM (or SPSS), but rather by a third-party developer. I'm not familiar with the SPSS/IBM plugin that you found, but it seems like it would be the place to start.

As for the APA format, I'm not aware of an APA style for it, but the best way to handle this is to look at articles from the journal to which you might wish to submit and see how they have handled it. In general, I'd recommend using standard APA format for reporting the analyses that you perform, since you are simply running ordinary analyses after the imputation is complete. To indicate that you are reporting summary statistics, a subscript might be in order. Other than that, I'd probably report all other stats just as I normally would. I hope that helps!

March 21, 2011 | Registered CommenterJeremy Taylor

Hi Jeremy,

I came across this post and it is a relief to see what you wrote: that I can get pooled (or summary) statistics for all of these by simply calculating the "mean" across the imputation results. I had thought about it before, but I was hesitant and have now spent a few days trying to find a way.

So, I am planning now to take your suggestion. I have a few questions though:
1. My data is categorical so is it acceptable to use "mode"?
2. Do I include the original data with missing values in this process? Intuitively I do not want to but not sure if I'm right.

Thank you!

August 31, 2011 | Unregistered CommenterSoco

Hi Soco!

I'm glad my post was helpful to you! With respect to your questions:

1) What do you want to use mode for? When I say you average the estimates across the imputation datasets, I am referring to the estimates from analyses, not averaging the actual imputed values. Analyses are done on all five datasets and then the estimates/parameters from the analysis are averaged to get a pooled estimate. In that circumstance, I'm not sure where mode would be used...

2) Original values are not used when pooling estimates, only the imputed datasets are used.
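
To make the two points above concrete, here is a minimal sketch in Python (not SPSS syntax). It assumes the SPSS convention that the `Imputation_` variable is 0 for the original data and 1 to 5 for the imputed datasets; the coefficient values and the name `pooled_estimate` are made up for illustration.

```python
from statistics import mean

# Hypothetical regression coefficients from the same analysis, keyed by the
# value of SPSS's Imputation_ variable (assumed: 0 = original data with
# missing values, 1..5 = the five imputed datasets). Numbers are invented.
results = {0: 0.62, 1: 0.48, 2: 0.52, 3: 0.50, 4: 0.47, 5: 0.53}

def pooled_estimate(estimates_by_imputation):
    """Average an analysis estimate over the imputed datasets only."""
    imputed = [est for imp, est in estimates_by_imputation.items() if imp != 0]
    return mean(imputed)

print(round(pooled_estimate(results), 3))
```

Note that what gets averaged is the result of each analysis (here a coefficient), not the imputed data values themselves, and the entry for the original data is left out.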

I hope this is helpful!

Best.

Jeremy

August 31, 2011 | Registered CommenterJeremy Taylor

hi
just wondering if a 'solution' to this was found. I am trying to pool estimates from repeated measures ANOVA following MI and was hoping there was an easy solution as PASW/SPSS does not seem to provide this option.
thanks
mitch

January 31, 2012 | Unregistered Commentermitch

Hi Mitch,

Which estimates in particular are you trying to pool in SPSS? If you have your dataset split by imputation, and you have the imputation add-on (which you probably do, if you ran the MI in SPSS), then many of the parameters are pooled in SPSS. Which are you trying to obtain, specifically?

February 15, 2012 | Registered CommenterJeremy Taylor

I am having the same issue discussed by various authors in this thread, and after reading the responses, I am still a little unclear about how I would proceed.

I ran a 2 (time1 and time2 measure) x 2 (gender) x 2 (couple type) repeated measures ANOVA using 5 different data sets (created from 5 multiple imputations). I now have 5 different sums of squares, F-values, p-values, and partial eta-squared values for the time main effect, gender main effect, time*coupletype, and time*coupletype*gender. What do I aggregate across the five data sets to get one final reported value?

Any assistance would be greatly appreciated!

Thanks!

February 29, 2012 | Unregistered CommenterAllen

Allen, you can aggregate or "pool" any statistic you choose. What statistical platform are you using, SPSS?

February 29, 2012 | Registered CommenterJeremy Taylor

Hi there,

I've run the default 5 multiple imputations in SPSS (v. 19) on my dataset, split the cases according to the imputation_ variable, and run a series of univariate GLMs with Bonferroni correction. My output matches all of the textbooks and forums apart from the crucial part: the "pooled" section. When I run the analysis it has the swirly icon next to it, suggesting it is supported, but I don't understand why the output doesn't contain the pooled summary.

Do you know why this might be?

Best,
Emma

June 19, 2012 | Unregistered CommenterEmma

Hi Jeremy,

My question is similar to Allen's. Could you provide more explanation on how to get pooled effect for the statistics listed by Allen (e.g., F value for the main effect; F value for the interaction effect)? I am using SPSS.

Thanks much for your time!

June 19, 2012 | Unregistered CommenterLily

Lily and Emma,

If you are using a version of SPSS that supports MI (which I'm assuming you are, since you produced the MI datasets in SPSS), then the pooled results should be produced automatically. If they aren't for some reason, you can try the following:

1. double click on a table in the results (to activate the table and make it editable; you can also right-click and select "edit content")
2. with the table in edit mode, go to the "Pivot" menu (which should've appeared when you switched to edit mode in the table)
3. drag the imputation pivot component (which is probably in the "rows" section of the pivot table) to the "LAYER" box (in the upper left corner)
4. Close the Pivot Tray dialogue
5. Double click on the table again, and you should see a new drop-down menu, click on it and you should see a list of the datasets, and hopefully it will say "pooled" at the bottom

If this does not work, check on the IBM website and see if there is a patch available for your version of SPSS, as it could be a bug in the system (particularly if the specific analysis you are doing has the "swirly icon" next to it when you are split by imputations, indicating that MI is supported for that type of analysis).

I hope this is helpful and feel free to come back and let us know if it ended up working!

On a broader note, I generally recommend using something other than SPSS when working with data that will require multiple imputation. I really like the "mi" package in R, because it is very flexible, allows users to obtain pooled estimates for almost any kind of analysis, and best of all it's FREE! The learning curve for using R is a bit on the steep side, but the payoff is HUGE, in my opinion.

June 19, 2012 | Registered CommenterJeremy Taylor

I wish that were true Jeremy, but at least up to v.20, SPSS won't pool GLMs. A shame as it'd be extremely useful.

July 24, 2012 | Unregistered CommenterDuncan Babbage

Even if you have the missing data/multiple imputation add-on, Duncan?

July 28, 2012 | Registered CommenterJeremy Taylor

Hi Jeremy!

Could you recommend a way of pooling repeated-measures MANOVA analyses across imputation results in R? Following your suggestion I tried to tackle the problem in R; I learned how to do repeated-measures MANOVA (using the 'car' package) and how to use the 'mi' package, but I still have a problem with pooling, because the 'mi' package handles pooling only for more standard analyses.

You also recommended above calculating the mean across the imputation results to obtain a pooled result. What did you mean by that: averaging F-values, for example?

thanks in advance!
Jakub
ps. I believe the macro you referred to was this one: http://www.socialsciences.leiden.edu/educationandchildstudies/childandfamilystudies/organisation/staffcfs/van-ginkel.html

July 30, 2012 | Unregistered CommenterJakub

Hey Jakub,

Congrats on venturing into the world of R! I'm glad to hear that you've found it useful so far! I've found using the Zelig package useful for running analyses with multiple imputation and obtaining pooled results.

Here is a link to their package page: http://cran.r-project.org/web/packages/Zelig/index.html

August 27, 2012 | Registered CommenterJeremy Taylor

Jeremy, thanks!
I will check this out.
Jakub

August 29, 2012 | Unregistered CommenterJakub

Hi Jeremy

I'm trying to do a one-way repeated measures ANOVA in SPSS with 5 imputed datasets.
In earlier posts you mentioned pooling your results by simply calculating the mean of your results.
I am interested in the pooled F-value and corresponding p-value, so could I just take the mean of my 5 F-values from the 5 imputed datasets?
How would I find the corresponding pooled p-value?
Because my data violate the assumption of sphericity, I would use the Greenhouse-Geisser corrected results.
These results show different df's per imputed dataset. Should I also take the mean degrees of freedom to find the significance level in an F-table?

I'm really confused so I hope I made some sense to you and you can help me out!

September 19, 2012 | Unregistered CommenterAnna

According to the SPSS manual, descriptive statistics is one procedure that should support pooling. However, I get this error message: At least one of the vectors or matrices in the model being added is different in size from the corresponding vector or matrix in previously added models. This model cannot be added. I have not found anything helpful online (in general or through SPSS's site). Does anyone here have any idea what the problem is? Thank you!

September 29, 2012 | Unregistered CommenterHaley

Do you have your data set up as an MI dataset (split by imputation)?

October 18, 2012 | Registered CommenterJeremy Taylor

Can you pool this way?...

For example, you have five imputations. So you could use the aggregate function to create a new dataset in which the five imputations for each case's variable are averaged and aggregated into a single cell. Then, for each missing value, the value that replaces it would be the average value of all the five imputations. From there, you could run your analyses as all the data are already pooled ... wouldn't this equate to the same output response? Or am I missing something?

October 20, 2012 | Unregistered CommenterSteve

Typically the pooling is done at the level of the results, not at the data level. This allows separate estimates of both effects and error to be calculated and then pooled, which is most robust.

October 21, 2012 | Registered CommenterJeremy Taylor

So the bottom line when it comes to SPSS and imputed data analysis (but correct me if I'm wrong) is:

1. Pooled statistics are available (and automatically displayed) when asking SPSS for descriptives

2. SPSS is not able to provide you with a pooled F-value, p-value, etc. when conducting AN(C)OVA and other GLM analyses

3. Use a different program for the latter analyses

The question then remains: are there any calculations you can do to pool the statistics yourself?

November 8, 2012 | Unregistered CommenterYannick

Hi Jeremy,

I'm in the middle of completing my dissertation for my maths degree, using SPSS for collected data regarding side effects of radiotherapy. My dissertation involves writing about how reliable multiple imputation is for medical research purposes. SPSS use is a requirement, as my dissertation will be used in further research to compare results from SPSS with results from other statistical software programs.

I'm currently trying to carry out ANOVA with my data; however, I am unable to get a pooled result. I read your post from 19th June and followed your instructions, but on the last step the drop-down menu does not give me the option of pooling the results. Can you recommend anything for me?

Hope to hear back.

Kate

November 16, 2012 | Unregistered CommenterKate

Hi Kate and Yannick

I have just recently submitted my thesis involving SPSS and multiple imputation. The process was a big learning curve, especially as I used 20 imputations, and I searched both the literature and the web. I was initially pooling each parameter estimate manually, which took quite some time. But I found that the best way to do it is to double-click on the output table, click Pivot, then Pivoting Trays, and then click and drag most of the items at the bottom of the tray to the top left tray. As you may already know, this allows you to control which items to display. With full control of what is viewed, you can filter the information so that, for instance, the table only displays the p value for all imputations. As they are all together in a column, if you hold Shift and click on the first and last p value displayed for the imputations, you can select them all. Right-click, then click 'Copy'. Now go to any online mean calculator on Google, and press Ctrl+V to paste all the p values. This may seem lengthy, but it is a lot quicker than having to calculate and input each p value manually. And you just do this for the rest of the parameters. I hope that makes sense. I saved much time this way.

November 16, 2012 | Unregistered CommenterSteve

Hi Steve,
Thank you so much!!! You have rescued my dissertation!!

November 17, 2012 | Unregistered CommenterKate

SPSS made some strange choices in how it implemented MI. For some procedures it supports pooling (e.g., descriptives and linear regression), but for others it does not. There are some articles that detail how to pool these estimates (see, for example, Rubin, 1987; Schafer, 1997), and there have been macros created to carry out these procedures (see link below). However, I haven't used these macros myself, so I can't offer guidance on them:

http://spssx-discussion.1045642.n5.nabble.com/FW-Multiple-imputation-td1090143.html

I prefer using R for multiple imputation projects (and most others) these days. If you choose to use R, you can use the "mi" package to complete imputation and the Zelig package for analysis (and pooling).

mi: http://cran.r-project.org/web/packages/mi/index.html

Zelig: http://cran.r-project.org/web/packages/Zelig/index.html

November 20, 2012 | Registered CommenterJeremy Taylor

Hi, I want to use MI in SPSS for an RM-ANOVA as I have a lot of missing data. I have carried out the MI and now have a dataset that has 1000 subjects (rather than 100, as I did 10 MIs). If I split the file by imputation my analysis is not pooled, as discussed above, but my question is this: can I just use my new MI dataset as a complete dataset and analyse it without the split file command?
Thanks

August 13, 2013 | Unregistered Commentersheila

Hi Sheila,

I had the same problem when I was carrying out repeated measures ANOVA in SPSS. Unfortunately, there is no way of getting a pooled analysis from SPSS when carrying this out with a multiply imputed dataset (I tried so hard to find out that I even called IBM for help, and they told me it's not in any of the current models but they are looking to introduce it in future versions of the programme).

Using your full dataset without splitting the file first won't work, since SPSS will see it as one massive dataset rather than individual ones. So it will treat it as though you have 1000 subjects, which is not the same. MI creates all the individual datasets to account for the error and the uncertainty in creating the imputed values. So you need to use them individually.
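
A small sketch in Python (not SPSS) of why the stacked file is misleading: the standard error of a mean is sd / sqrt(n), so pretending to have 1000 subjects instead of 100 shrinks it by roughly a factor of sqrt(10). The scores below are simulated, and repeating the same 100 values 10 times is only a crude stand-in for 10 imputed datasets, which in reality differ in the cells that were missing.

```python
import random
from statistics import stdev

def standard_error(values):
    """Standard error of the mean: sd / sqrt(n)."""
    return stdev(values) / len(values) ** 0.5

random.seed(1)
n_subjects, n_imputations = 100, 10

# One hypothetical completed dataset of 100 scores (simulated values).
one_dataset = [random.gauss(50, 10) for _ in range(n_subjects)]

# Crude stand-in for the stacked file: the same subjects repeated 10 times.
stacked = one_dataset * n_imputations

se_single = standard_error(one_dataset)
se_stacked = standard_error(stacked)

print(len(stacked))             # number of rows the software would "see"
print(se_stacked / se_single)   # roughly 1 / sqrt(10)
```

The same inflation of sample size feeds into the error degrees of freedom, so F tests and p-values from the unsplit file look far more precise than the data justify.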

I had to copy and paste the RM ANOVA output into an Excel file and find the pooled values manually. I did this by taking the averages of all the outputs from SPSS. Take the max and min values into account so you can see how far your average value is from the actual data. It's a tedious job but not too difficult!

Good luck!

August 13, 2013 | Unregistered CommenterKate

Hi,

I have the same problem. I would like to use the imputed data set for repeated measures ANOVA. I tried out what you suggested above (as far as I understood ;-) ) and wondered if one could do the following:
Take the average of every single SPSS cell across the 5 imputations. This would lead to a new "mean imputed" matrix with new "raw data" that could then be used for repeated measures ANOVA, couldn't it?
Is it allowed to build those means? And if yes, is there any faster way to do it than manually?

Hope to hear back and thanks in advance!

August 19, 2013 | Unregistered CommenterVanessa

Hi all,

Very helpful discussion so far. I have a very specific question for which I have not been able to find a good answer.

If you have an imputed dataset and you want to obtain a 95% CI for a logistic regression OR or beta using bootstrapping, can you pool the upper and lower limits across the MI sets similar to what was described previously?

December 26, 2013 | Unregistered CommenterTim

Hi all,

The answer to all the questions about multiple imputation and repeated-measures ANOVA can be found in:

Van Ginkel, J. R. & Kroonenberg (2014). Analysis of variance of multiply imputed data. Multivariate Behavioral Research, 49, 78-91.

Best regards,

Joost van Ginkel

February 24, 2014 | Unregistered CommenterJoost van Ginkel

Hi, I read that it might be OK (or is often done) to just pool a list of p-values by taking the average of them?

I would need a reference for my thesis for this. Can anyone help me?

I have several imputed sets and ran chi-square tests on them, plus post hoc tests to compare each group with each other. I will take the average of those now and present those as my results since I am running out of time, but it would be great if I could point to someone saying that this is how it can be done.

May 21, 2015 | Unregistered CommenterDudu

Unfortunately, there is no reference for this because it is not justified to average p-values. :( Pooling significance tests is not a simple matter, unless it concerns t- or z-tests.
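
A hedged illustration in Python of why the two approaches disagree, using a z-test on a single coefficient because that is the simple case. The estimates and standard errors are invented, the normal reference distribution is an assumption made to keep the sketch short (software typically uses a t distribution with adjusted degrees of freedom), and the helper names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, variance

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return erfc(abs(z) / sqrt(2))

def averaged_p(estimates, std_errors):
    """The unjustified shortcut: test each dataset, then average the p-values."""
    return mean([two_sided_p(q / se) for q, se in zip(estimates, std_errors)])

def pooled_p(estimates, std_errors):
    """Pool the estimate and its variance first, then test once."""
    m = len(estimates)
    within = mean([se ** 2 for se in std_errors])
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return two_sided_p(mean(estimates) / sqrt(total))

# Made-up results from 5 imputed datasets that disagree quite a bit.
b = [0.30, 0.12, 0.25, 0.08, 0.28]
se = [0.10, 0.10, 0.10, 0.10, 0.10]

print(averaged_p(b, se))
print(pooled_p(b, se))
```

The two numbers differ because the pooled test carries the disagreement between datasets into the standard error, whereas an average of p-values has no such interpretation. For F and chi-square tests with more than one degree of freedom there is no equally simple recipe.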

May 28, 2015 | Unregistered CommenterJoost van Ginkel

I am new to MI and have been using SPSS MI to replace missing data. I have run 5 datasets and was able to analyze the pooled results. I am looking to create a dataset that would have 1 final value for the missing data based on the 5 imputed datasets. Can I simply aggregate the 5 imputed datasets (average the results) to create the final dataset? If so, is this an acceptable solution for replacing missing data, or does it fail to account for the error associated with each imputed dataset?

Thanks,

Ken

July 1, 2015 | Unregistered CommenterKen

Here is how I resolve the situation in SPSS for ANOVA, ANCOVA...
ANOVA is a special case of regression, so you could just run a regression model and SPSS will give you a POOLED result for your p value and so on. The t-test statistic squared will equal your F-test in ANOVA.
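
A quick numerical check of that last point, sketched in Python with invented scores for two groups. The function names are hypothetical. The equality holds for an effect with a single degree of freedom (two groups, or one regression coefficient); a factor with three or more levels has no single t to square.

```python
from statistics import mean

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F statistic: MS between / MS within."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand) ** 2 for g, gm in zip(groups, means))
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up scores for two groups.
group_a = [4.1, 5.0, 5.5, 6.2, 4.8]
group_b = [6.0, 6.8, 7.1, 5.9, 7.4]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(t ** 2, f)  # the two values agree up to rounding error
```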

The computation is not a plain average of all the results, although it is close to that. See this reference for a simple explanation of this type of averaging (pooling):
http://www.tandfonline.com/doi/pdf/10.1080/1743727X.2014.979146

In the end, you can't just average results; I don't think it would be accepted in the scientific community.
Use the formula in the reference if you have to do it by hand, or use regression to estimate your model, as SPSS gives POOLED output with regression.

Sorry for my English, I am working on that :)

February 10, 2016 | Unregistered CommenterAlexandre

I have used SPSS MI to replace missing data before carrying out some model testing (CFA) using AMOS 22. However, when I use the imputed data set, I am unable to use Modification Indices, as the message tells me that there is missing data. Can I use the 5th imputation only, as it is the data set which corrects for all the missing data?
Many thanks

Eamon

March 11, 2016 | Unregistered CommenterEamon

Hi all,

Steve, I have a question regarding what you suggested: taking the mean of all your imputed data to calculate the pooled p-value, F-ratio, etc.
Do you have a reference that supports this?
It would be of great help! Thank you,
Jess.

August 10, 2017 | Unregistered CommenterJess

Hi everyone. I have done the MI with 5 imputations and run regressions using the pooled data - that's all great. However, I'm doing a moderation analysis with Process and that doesn't seem to allow me to split the file and run the analysis. As a result I have df(3, 1892)! Does anyone have any suggestions how to use the pooled data for Process? My data set is a bit complicated as I have 5 variables: 4 of which data were missing completely at random and 1 was MNAR. So I did the imputation for the 4 variables MCAR but not the MNAR one. Does this make sense? I should add that stats is so not my strong suit! Any help would be appreciated. Thanks.

August 24, 2017 | Unregistered CommenterFiona

I have run a MANCOVA with a multiply imputed data set in SPSS. How would I go about calculating the pooled partial eta squared and the pooled standard error? Can I simply calculate the mean of the estimates (partial eta squared, standard error) across all 5 datasets, or is there some other step I need to take?

September 14, 2017 | Unregistered CommenterAngela