STATISTICS QUESTIONS FROM YOU FOR THE STATS MAKE ME CRY GUY!


This page features questions previously submitted by users on the "Ask the Stats Make Me Cry Guy" page. Although we now use the forum for these questions instead, I decided to leave these posted so that the information remains available!

Monday
Jan 03, 2011

Which methods are used for analysis of Residuals? (Raja, location unknown)

Another great question, Raja! There are several ways that one might analyze residuals, and the one that an analyst chooses depends on their purpose for analyzing them. For example, if someone is testing whether the variance of their residuals is equal across levels of the predicted value of their model's DV (an assumption of regression, often called homoscedasticity), they would use a scatterplot, placing the predicted value of the DV on the x-axis and the residual scores from the regression model on the y-axis. If the assumption holds, the points form a roughly even band around zero rather than a funnel or fan shape.

NOTE: most statistical software packages will allow you to save both the predicted scores and residual scores of a regression model (in SPSS, simply place a checkmark in the desired boxes in the "save" dialogue of the "regression" analysis).

Conversely, if the residuals are being analyzed in an effort to control for confounds, the residuals of one regression model (a model with the variable/variables that are to be controlled as predictors) might be used as the DV in another model (a model that would feature your target variable of interest as the predictor).
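To make the two-step idea above concrete, here is a minimal Python sketch, assuming a single control variable and a single target predictor. The variable names and data values are hypothetical, and the helper `simple_ols` is written just for this illustration. Note that this follows the description above literally (only the DV is residualized); it is a sketch of the idea, not a replacement for entering the control variable directly into one regression model.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: age is the variable to control for,
# training_hours is the target predictor, income is the DV.
age = [20, 25, 30, 35, 40, 45]
training_hours = [5, 3, 6, 2, 7, 4]
income = [30, 33, 41, 40, 52, 50]

# Step 1: regress the DV on the control variable and save the residuals.
a0, a1 = simple_ols(age, income)
income_resid = [inc - (a0 + a1 * ag) for inc, ag in zip(income, age)]

# Step 2: use those residuals as the DV in a second model
# that features the target predictor.
b0, b1 = simple_ols(training_hours, income_resid)
print("slope of residualized income on training hours:", round(b1, 3))
```

By construction, the saved residuals are uncorrelated with the control variable, which is what "controlling for" it means in this approach.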

In yet another analysis (of the normality of the residuals; also an assumption of regression), residual scores may be examined with a histogram to determine whether their distribution is approximately "normal."

I hope that is helpful!

 

Tuesday
Dec 28, 2010

What are residuals in regression? How can we find residuals? (Raja, location unknown)

Great question(s), Raja! Residual is a buzzword that is often used in statistics, which can make things very confusing if you aren't clear what it is. Before I go further in answering this question here (which I will), I'll also refer you to a blog I wrote in October on this very topic, called: "Top Ten Confusing Stats Terms Explained in Plain English (#8 Residual)". While I'm happy to try to address this question here, you might find that the blog posting offers more detail and examples that are useful for understanding this complex topic.

That said, here is my attempt to answer this question concisely:

In general, a residual is the difference between the actual value of a dependent variable (DV) and the value of that variable predicted by a statistical model. In the context of a regression, a residual is how far the actual value of the DV is from the value predicted for it (as determined by the predictors in the model). This is often loosely called an "error term" (because it represents how much your regression model was in error, in terms of its ability to predict the value of the DV).

In any statistical model, a residual can be thought of in various contexts. For example, each person will have their own "residual" score (which is simply the difference between the actual and predicted value of the DV), while the model as a whole also has a residual score (which represents how much variability in the DV remained 'unexplained' by the predictors in the model). A residual score serves several purposes, including: 1) determining the accuracy of your model (how much variability is explained by the model) and 2) being used to test the assumptions inherent in the regression analysis (such as the assumption that the residuals are normally distributed OR that their variance is equal across all levels of the predicted value of the DV).
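As a small worked example of both ideas (each case's own residual, and the model-level summary of unexplained variability), here is a minimal Python sketch of a regression with one predictor. The data and the names `hours` and `score` are made up for illustration, and `fit_simple_regression` is a helper written for this example rather than a function from any statistics package.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: one predictor (hours) and one DV (score).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_regression(hours, score)

# Each case gets its own residual: actual value minus predicted value.
predicted = [intercept + slope * h for h in hours]
residuals = [actual - pred for actual, pred in zip(score, predicted)]

# Model-level summary: how much variability in the DV is left unexplained.
ss_residual = sum(r ** 2 for r in residuals)
ss_total = sum((s - mean(score)) ** 2 for s in score)
r_squared = 1 - ss_residual / ss_total

print("residuals:", [round(r, 2) for r in residuals])
print("R-squared:", round(r_squared, 2))
```

The residuals sum to zero whenever the model includes an intercept, and R-squared is simply one minus the share of variability left in the residuals.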

Again, please visit the blog linked above for more in-depth information. Good luck!

Thursday
Dec 09, 2010

How do I determine if a questionnaire I want to use has 'good psychometric properties'? (Jessica, United Kingdom)


Thanks for the great question, Jessica! Typically, when the term psychometrics is used in the context of survey research, it is in reference to a survey instrument's reliability. The most common statistic used to determine a survey instrument's reliability is Cronbach's alpha. Cronbach's alpha is a statistic that evaluates how much individual survey items covary with one another, as they should if they all measure a single construct. In English, that means it is a test of how consistently a group of survey questions measures the construct they are intended to represent.
As an example, if an assessment of depressive symptoms contains 10 items, a Cronbach's alpha for those 10 items is a measure of how well the group of those 10 items (as a whole) represents a respondent's level of depressive symptoms. If you've not yet collected data, the best way to determine an instrument's psychometrics is to review previous studies that have used the survey and calculate what the average Cronbach's alpha was among them. Cronbach's alpha typically ranges from 0 to 1 (although negative numbers are possible, they are usually meaningless), with values closer to one indicating stronger reliability. There is no official value that indicates strong reliability, but a review of the literature does show some general conventions on the topic.
Commonly, a Cronbach's alpha in the range of .70 to .79 is considered adequate, a value in the range of .80 to .89 is considered good, and a Cronbach's alpha in the range of .90 to .99 is considered excellent (an alpha of 1.00 is most likely an error or an indication that something is wrong with your data). Another form of reliability is test-retest reliability (which uses a correlation to test for agreement between two administrations of the same measure at different points in time). If you've already collected your data, Cronbach's alpha can generally be obtained easily using any statistical program (such as SPSS or SAS). I hope that was helpful and please keep the questions coming!
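If you are curious what a statistical program is doing behind the scenes, here is a minimal Python sketch of the standard Cronbach's alpha formula: k / (k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical, and the sketch assumes complete data with all items scored in the same direction.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Transpose so that each element holds every respondent's score on one item.
    items = list(zip(*item_scores))
    item_variances = [variance(item) for item in items]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]

alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 3))
```

With so few respondents this is only an illustration of the arithmetic; a real reliability estimate needs a much larger sample.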

Tuesday
Nov 23, 2010

3 DV and 2 IV. I also have (possibly) up to 5 covariates (demographic info, eg. age, education, ethnicity, etc.); Only 1 group. So what's the best test? (Bren)

Great question, Bren! The best analysis to use, assuming the assumptions of the test hold up, would be a MANCOVA (which stands for Multivariate Analysis of Covariance). It is multivariate because you have more than one dependent variable (DV), and it is an analysis of covariance because you are including covariates alongside your independent variables (IVs). This can be accomplished in SPSS by using the following drop-down menu path:

Analyze -> General Linear Model -> Multivariate

Good Luck!

- The Stats Make Me Cry Guy (Jeremy)

Thursday
May 20, 2010

Data Analysis Question: Will removing your partial (or missing) data and doubling your complete data give you the same results as completing a multiple imputation on the data? (Matt, Chicago, IL)  

Great question! I get questions about missing data and how to deal with it all of the time. To your question, simply removing partial data and doubling your complete data points will not result in the same outcome as a multiple imputation. This is because multiple imputation fills in each missing value with plausible values drawn from a distribution based on your observed data, repeats this to create several completed datasets, and then pools the results, so that the uncertainty about the missing values is reflected in your estimates. By contrast, if a bias exists in the pattern of missing data in your original dataset, doubling the existing values will merely carry that bias forward, while making your results look more precise than they really are. Multiple imputation is widely considered to be one of the least biased methods of dealing with missing data. Although it can be complex and time intensive, modern statistical software (such as SPSS and AMOS) offers "multiple imputation" features, which vastly simplify the process.
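To see why doubling your complete cases does not add any real information, here is a minimal Python sketch with made-up values. It does not perform multiple imputation; it only illustrates what duplication does to a mean and its standard error. The names `complete_cases` and `standard_error` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(values) / sqrt(len(values))

# Hypothetical complete cases left after removing partial records.
complete_cases = [4, 5, 6, 7, 8]

# "Doubling" the data simply repeats every complete case.
doubled_cases = complete_cases * 2

print("mean, original:", mean(complete_cases))
print("mean, doubled: ", mean(doubled_cases))
print("SE, original:", round(standard_error(complete_cases), 3))
print("SE, doubled: ", round(standard_error(doubled_cases), 3))
```

The mean does not move, so any bias caused by who had missing data is still there, but the standard error shrinks as if you had collected twice as many independent cases, which you did not.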

Thursday
May 20, 2010

Data Analysis Question: When dealing with significance levels, should I use p < .001 or p < .05? Also, should it change between tests or stay consistent throughout my analyses? (Matt, Chicago, IL)

In statistics, the threshold below which a p-value is considered significant is known as "alpha". The most common levels of alpha are .05, .01, and .001. The decision about which to use is a difficult one, and is somewhat subjective. Essentially, alpha is the risk of a false positive that a researcher is willing to accept: the probability of finding a significant result when no true effect exists (known as a Type I error). In other words, if I choose to use an alpha of .05, I'm accepting a 5% chance of declaring an effect significant when there is really no such effect in the population I am seeking to analyze. An alpha of .05 is the most commonly used, although lower values of alpha (such as .001) are considered more conservative. There is no right or wrong answer about which to choose, although it is typically encouraged to keep the alpha level consistent across each analysis conducted in any given project, and to make the decision about which to use prior to running your analysis.
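One way to get a feel for what alpha means is to simulate many studies in which there is truly no effect and count how often a test comes out "significant" anyway. The sketch below is a minimal Python illustration under simplifying assumptions: each simulated study draws from a normal population with a true mean of 0 and a known standard deviation of 1, and uses a two-sided z-test. The function name `false_positive_rate` and all of its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(alpha, n_studies=5000, n_per_study=30, seed=1):
    """Share of simulated no-effect studies that are significant at the given alpha."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0, 1) for _ in range(n_per_study)]
        sample_mean = sum(sample) / n_per_study
        # z statistic for H0: mean = 0, with the SD assumed known to be 1.
        z = sample_mean * sqrt(n_per_study)
        if abs(z) > critical_z:
            rejections += 1
    return rejections / n_studies

rate_05 = false_positive_rate(0.05)
rate_001 = false_positive_rate(0.001)
print("share significant at alpha = .05: ", rate_05)
print("share significant at alpha = .001:", rate_001)
```

The share of false positives should land close to whichever alpha you chose, which is why a smaller alpha is described as more conservative.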

Wednesday
May 05, 2010

ANOVA and t-test Question: How do I know if I should be using a t-test or an ANOVA with my data? (Will, Ann Arbor, MI)

Thanks for the question, Will! From a practical/application perspective, both a t-test and an ANOVA are generally used to compare means between groups (although there are several types of t-tests). However, a t-test is restricted to comparing two groups (which could be independent or related, depending on the type of t-test), while an ANOVA can accommodate a comparison of three or more groups. In some cases, a t-test can even be used to test a single group, to determine if the mean of that group is significantly different from a predetermined value (a one-sample t-test). There are other statistical differences also, but that is a summary of the differences in application.
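The two tests are closely related: with exactly two independent groups, the F statistic from a one-way ANOVA equals the square of the pooled-variance t statistic. Here is a minimal Python sketch of that relationship, using made-up scores. The function names `independent_t` and `one_way_f` are hypothetical helpers written for this illustration, and the sketch computes test statistics only (no p-values).

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Student's t statistic for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def one_way_f(*groups):
    """F statistic from a one-way ANOVA on two or more independent groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three independent groups.
group_a = [5, 6, 7, 8]
group_b = [7, 8, 9, 10]
group_c = [1, 2, 3, 4]

t_stat = independent_t(group_a, group_b)
f_two_groups = one_way_f(group_a, group_b)
f_three_groups = one_way_f(group_a, group_b, group_c)

print("t squared (A vs. B):", round(t_stat ** 2, 3))
print("F (A vs. B):        ", round(f_two_groups, 3))
print("F (A, B, and C):    ", round(f_three_groups, 3))
```

Once a third group is added, a single t-test no longer applies, while the ANOVA handles all three groups in one test.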

Monday
May 03, 2010

Survey Data Analysis Question: How do I know if I should be using Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA)? (Jamie, Phoenix, AZ)

Fantastic question, Jamie! The decision about whether to use EFA or CFA isn't always a clear-cut one. At its most basic, an EFA is useful when you do not have an a priori hypothesis about how a set of items should be grouped together to measure unique constructs, but you think there are some distinct constructs that can be measured amongst a set of items. By contrast, a CFA is more appropriate when an a priori hypothesis exists about the structure of the data (the hypothesis may be rooted in a conceptual framework, prior EFA analysis, or both).
With that said, you are likely to see EFA used when some hypotheses exist about a set of items, so the above rules are not always rigid. The key in your decision is: what is the question you are trying to answer? If your research question is one of an "exploratory nature", then an EFA may be your best choice. However, if you are seeking to test an existing theory, hypothesis, or test competing models/structures, a CFA is what you are looking for. When sample size is abundant, one can randomly split their sample and extract a factor structure from the first half of their data (using EFA) and then test that structure, using a CFA on the second half of their data! If this remains unclear, feel free to send along more specifics (perhaps in the forum) and I'll try to offer a bit more guidance.
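The split-sample approach mentioned above starts with a random division of your cases. Here is a minimal Python sketch of just that splitting step; the EFA and CFA themselves would still be run in your statistical package. The function name `split_half`, the seed, and the case IDs are hypothetical.

```python
import random

def split_half(case_ids, seed=42):
    """Randomly split case IDs into two halves (EFA half, CFA half)."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical sample of 200 respondents, identified by ID numbers 1-200.
all_cases = range(1, 201)
efa_cases, cfa_cases = split_half(all_cases)

print("cases for EFA:", len(efa_cases))
print("cases for CFA:", len(cfa_cases))
```

Fixing the seed is a design choice that lets you (or a reviewer) recreate exactly the same two halves later.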

Tuesday
Apr 27, 2010

ANOVA, MANOVA, and MANCOVA Question: What is the difference between ANOVA, MANOVA, and MANCOVA? (Eric, Lafayette, IN)

The difference can definitely be confusing. There are differences on a few different levels. First, an ANOVA is different from both a MANOVA and MANCOVA because an ANOVA has only one dependent variable, while both a MANOVA and MANCOVA have multiple dependent variables. An ANOVA typically compares a continuous dependent variable (a.k.a. an interval or scale variable) between multiple independent groups of responses (usually 3 or more groups).

By contrast, both a MANOVA and MANCOVA have multiple dependent variables, but there are differences between the two as well. The difference between a MANOVA and MANCOVA lies in whether covariates are included. A MANOVA, like an ANOVA, compares independent groups defined by one or more categorical independent variables, but it compares multiple dependent variables between those groups. A MANCOVA is a similar concept to a MANOVA, except it also allows for covariates (additional variables that are statistically controlled).

In a MANCOVA, one is able to examine multiple dependent variables for differences between independent groups, while controlling for other variables that may also be related to the DVs. These covariates are typically continuous, although categorical control variables can also be included in the model. I hope this helps, Eric! Please keep the questions coming!

Monday
Apr 26, 2010

Regression Analysis Question: What is the difference between a mediator and a moderator? (anonymous)

Great Question! The difference is basically parallel to the difference between explaining something and changing something. In statistics, a mediator is a variable that explains the relationship between two other variables. For example, age may be hypothetically related to having a higher income, but that may be explained by age's association with work experience, which may itself be related to a higher income. If work experience accounts for a significant portion of the variability in income that is explained by age, then work experience is a mediator of that relationship (there are different kinds of mediators also, such as full or partial, but we won't get into that here).
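To put some numbers on the mediation idea, here is a minimal Python sketch using the age, work experience, and income example with made-up data. It splits the total relationship between age and income into an indirect part (the part that runs through work experience) and a direct part (what is left over). The helper functions and values are hypothetical, and the sketch only shows the decomposition; it does not test whether the indirect effect is statistically significant.

```python
from statistics import mean

def centered(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slope_one_predictor(x, y):
    """Least-squares slope of y on a single predictor x."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def slopes_two_predictors(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 entered together."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data for six people.
age = [25, 30, 35, 40, 45, 50]
experience = [2, 6, 9, 15, 18, 27]   # proposed mediator
income = [30, 38, 41, 55, 58, 75]    # in thousands

total_effect = slope_one_predictor(age, income)       # age -> income
path_a = slope_one_predictor(age, experience)         # age -> experience
direct_effect, path_b = slopes_two_predictors(age, experience, income)
indirect_effect = path_a * path_b                     # through experience

print("total effect:   ", round(total_effect, 3))
print("direct effect:  ", round(direct_effect, 3))
print("indirect effect:", round(indirect_effect, 3))
```

In ordinary least-squares regression, the total effect always equals the direct effect plus the indirect effect, so the larger the indirect share, the more of the relationship the mediator accounts for.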

While a mediator explains, a moderator changes. When the strength of the relationship between two variables is dependent on the value of a third variable, that variable is called a moderator. With respect to moderators, the easiest example is often gender. For example, let's pretend that age was associated with liking of ice cream, but that was only true for boys, while girls' liking of ice cream did not tend to vary with age. In this case, one's gender determines whether their age is related to their liking of ice cream. In an entire sample of people, a researcher might say that knowing an individual's gender would CHANGE the extent to which they are able to predict one's liking of ice cream from their age.
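Here is a minimal Python sketch of the ice cream example, with made-up ratings. It fits the age-to-liking slope separately for boys and for girls; a difference between the two slopes is what a moderator (an interaction, in regression terms) looks like. The data and the helper `slope` are hypothetical, and a real analysis would test the difference formally, typically with an interaction term in a single model.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical ratings of ice cream liking (1-10) at four ages.
ages = [6, 8, 10, 12]
boys_liking = [3, 5, 6, 8]    # rises with age
girls_liking = [5, 6, 6, 5]   # no trend with age

boys_slope = slope(ages, boys_liking)
girls_slope = slope(ages, girls_liking)

print("boys: change in liking per year of age: ", round(boys_slope, 2))
print("girls: change in liking per year of age:", round(girls_slope, 2))
```

Because the slope depends on which group you are looking at, gender moderates the relationship between age and liking of ice cream in this made-up example.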

This is a topic that is commonly confused, so much so, that I made a video about it (which also includes information about suppressors)! Check out my video about mediators, moderators, and suppressors HERE! Thanks for the question and keep them coming!