Confusing Stats Terms Explained: Residual
Monday, October 25, 2010 at 12:19PM
Jeremy Taylor in Stats Make Me Cry Blog Entries, confusing stats terms, distribution, equation, normal, parametric, regression, residual, statistics, top ten


When I hear the word "residual", the pulp left over after I drink my orange juice pops into my brain, or perhaps the film left on the car after a heavy rain. However, when my regression model spits out an estimate of my model's residual, I'm fairly confident it isn't referring to OJ or automobile gunk...right? Not so fast: that imagery is closer to its statistical meaning than you might initially think.

In statistics, a residual refers to the amount of variability in a dependent variable (DV) that is "left over" after accounting for the variability explained by the predictors in your analysis (often a regression). Right about now you are probably thinking: "This guy likes the word 'variability' way too much. He should buy a thesaurus already!"

Let me try again: when you include predictors (independent variables, or IVs) in a regression, you are making a guess (or prediction) that they are associated with the DV; a residual is a numeric value for how wrong that prediction was. The smaller the residuals (ignoring their sign), the more accurate the predictions in your regression are, which suggests your IVs are related to (predictive of) the DV.



Keep in mind that each person in your sample will have their own residual score. This is because a regression model provides a "predicted value" for every individual, which is estimated from that individual's values on the IVs in the regression. Each person's residual score is the difference between the score actually observed for them on the DV and their predicted score (observed minus predicted). That "left-over" value is a residual.
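To make the "observed minus predicted" idea concrete, here is a minimal sketch in plain Python. The data are made up for illustration (hours studied as the IV, exam score as the DV, five hypothetical people), and the variable names are my own assumptions, not output from any particular stats package. It fits a one-predictor regression by ordinary least squares and then computes one residual per person.

```python
# Hypothetical data: one IV (hours studied) and one DV (exam score)
# for five made-up people. These numbers are for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 58.0, 61.0, 70.0, 74.0]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(score) / n

# Ordinary least squares for a single predictor:
# slope = covariation of x and y / variation of x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))
sxx = sum((x - mean_x) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Each person gets a predicted score from their own IV value...
predicted = [intercept + slope * x for x in hours]

# ...and a residual: observed score minus predicted score.
residuals = [y - p for y, p in zip(score, predicted)]

for x, y, p, r in zip(hours, score, predicted, residuals):
    print(f"hours={x:.0f}  observed={y:.1f}  predicted={p:.1f}  residual={r:+.1f}")
```

A positive residual means the person scored higher than the model predicted, and a negative one means they scored lower. With an intercept in the model, least-squares residuals add up to zero (apart from rounding), which is why their sizes, not their plain sum, tell you how far off the predictions are.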

Like the orange pulp, statistical residuals are simply what's left over from your regression model. They can be used for many things, such as estimating the accuracy of your model and checking its assumptions, but that is a chat for another time...

Editorial Note: Stats Make Me Cry is owned and operated by Jeremy J. Taylor. The site offers many free statistical resources (e.g. a blog, SPSS video tutorials, R video tutorials, and a discussion forum), as well as fee-based statistical consulting and dissertation consulting services to individuals from a variety of disciplines all over the world.





Article originally appeared on Stats Make Me Cry (http://www.statsmakemecry.com/).
See website for complete article licensing information.