STATISTICS QUESTIONS FROM YOU FOR THE STATS MAKE ME CRY GUY!


This page features questions previously submitted by users on the "Ask the Stats Make Me Cry Guy" page. Although we now use the forum for these questions instead, I decided to leave these posted so that the information remains available!

Tuesday, Dec 28, 2010

What are residuals in regression? How can we find residuals? (Raja, location unknown)

Great question(s), Raja! "Residual" is a term that gets used often in statistics, which can make things very confusing if you aren't clear on what it means. Before I go further in answering this question here (which I will), I'll also refer you to a blog post I wrote in October on this very topic, called: "Top Ten Confusing Stats Terms Explained in Plain English (#8 Residual)". While I'm happy to try to address this question here, you might find that the blog post offers more detail and examples that are useful for understanding this complex topic.

That said, here is my attempt to answer this question concisely:

In general, a residual is the difference between the actual value of a dependent variable (DV) and the value of that variable predicted by a statistical model (residual = actual value - predicted value). In the context of a regression, a residual is how far a predicted value (as determined by the predictors in the model) is from the actual value of the dependent variable. This is often loosely called an "error term" (because it represents how much your regression model was in error, in terms of its ability to predict the value of the DV). Strictly speaking, the residual is the estimate, computed from your sample, of the model's unobservable error term.
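To make "how we can find residuals" concrete, here is a minimal sketch in plain Python. The data (hours studied and exam scores) and the helper name `fit_line` are made up for illustration; the fit is an ordinary least squares line with a single predictor, which is an assumption on my part and only one of many kinds of regression model.

```python
# Hypothetical data: hours studied (predictor) and exam score (DV).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(hours, scores)

# Predicted value of the DV for each person, from the fitted line.
predicted = [intercept + slope * h for h in hours]

# Residual = actual value - predicted value, one per person.
residuals = [actual - pred for actual, pred in zip(scores, predicted)]

for h, actual, pred, res in zip(hours, scores, predicted, residuals):
    print(f"hours={h:.0f} actual={actual:.1f} predicted={pred:.1f} residual={res:+.1f}")
```

A positive residual means the model under-predicted that person's score, and a negative residual means it over-predicted. In a least squares fit that includes an intercept, the residuals sum to zero (up to rounding).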

In any statistical model, a residual can be thought of at more than one level. For example, each person will have their own "residual" score (which is simply the difference between the actual and predicted value of the DV), while the model as a whole also has a residual score (which represents how much variability in the DV remained "unexplained" by the predictors in the model). Residuals serve several purposes, including: 1) determining the accuracy of your model (how much variability is explained by the model) and 2) testing the assumptions inherent in the regression analysis (such as the assumption that the residuals are normally distributed, or that their variance is equal across all levels of the predicted value of the DV).
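The model-level idea can be sketched the same way. The snippet below assumes you already have actual and predicted DV values (the numbers here are hypothetical) and summarizes the individual residuals as a residual sum of squares, then compares it with the total variability in the DV to get the proportion explained (R-squared). The variable names are my own, not from any particular stats package.

```python
# Hypothetical actual DV values and the values predicted by some fitted model.
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

# Person-level residuals: actual minus predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

# Model-level residual: variability in the DV left unexplained by the model.
ss_residual = sum(r ** 2 for r in residuals)

# Total variability in the DV around its own mean.
mean_actual = sum(actual) / len(actual)
ss_total = sum((a - mean_actual) ** 2 for a in actual)

# Proportion of variability in the DV explained by the model.
r_squared = 1 - ss_residual / ss_total

print(f"SS residual = {ss_residual:.2f}")
print(f"SS total    = {ss_total:.2f}")
print(f"R-squared   = {r_squared:.2f}")
```

For checking assumptions, the same `residuals` and `predicted` lists are what you would typically plot against each other: a roughly even scatter around zero is consistent with equal variance, while a funnel shape suggests the variance changes across predicted values.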

Again, please visit the blog linked above for more in-depth information. Good luck!