
STATISTICS QUESTIONS FROM YOU FOR THE STATS MAKE ME CRY GUY!


This page features questions previously submitted by users on the "Ask the Stats Make Me Cry Guy" page. Although we now use the forum for these questions instead, I decided to leave these posted so that the information remains available!

Entries in data analysis (4)

Thursday
May 20, 2010

Data Analysis Question: Will removing your partial (or missing) data and doubling your complete data give you the same results as completing a multiple imputation on the data? (Matt, Chicago, IL)  

Great question! I get questions about missing data and how to deal with it all of the time. To your question: no, simply removing partial data and doubling your complete data points will not produce the same outcome as multiple imputation. Multiple imputation draws plausible values (based on the observed relationships and parameters of your data) to fill in your dataset, so that the replacement values themselves do not introduce bias. By contrast, if a bias exists in the pattern of missing data in your original dataset, doubling the existing values will merely magnify that bias. Multiple imputation is widely considered to be the least biased method of dealing with missing data. Although it can be complex and time intensive, modern statistical software (such as SPSS and AMOS) offers "multiple imputation" features, which vastly simplify the process.
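To make the contrast concrete, here's a short Python sketch (the simulated data and variable names are my own invention, not from any particular package). It uses single regression imputation as a stand-in for one step of multiple imputation; a real MI procedure would repeat the draw with random noise and pool results across several completed datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)          # fully observed variable
y = x + rng.normal(0, 2, 200)        # partially observed variable

# Missingness depends on x: cases with high x tend to lose their y value,
# so the complete cases are a biased (low-y) subset.
missing = x > np.quantile(x, 0.7)
y_obs = y.copy()
y_obs[missing] = np.nan

# Strategy 1: drop incomplete cases and double the complete ones.
complete = y_obs[~missing]
doubled_mean = np.concatenate([complete, complete]).mean()
# Doubling leaves the biased complete-case estimate completely unchanged.

# Strategy 2: regression imputation using x -- a single-imputation sketch;
# multiple imputation would add random draws and pool across datasets.
b = np.polyfit(x[~missing], y_obs[~missing], 1)
y_imp = y_obs.copy()
y_imp[missing] = np.polyval(b, x[missing])
```

With this setup, the imputed-data mean lands much closer to the true mean of `y` than the doubled complete-case mean does, because imputation exploits the observed relationship between `x` and `y` while doubling merely repeats the biased subset.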

Thursday
May 20, 2010

Data Analysis Question: When dealing with significance levels, should I use p < .001 or p < .05? Also, should it change between tests or stay consistent throughout my analyses? (Matt, Chicago, IL)

In statistics, the threshold at which a p-value is considered significant is known as "alpha". The most common alpha levels are p < .05, p < .01, and p < .001. The decision about which to use is a difficult one, and is somewhat subjective. Essentially, alpha is the amount of risk a researcher is willing to accept of inferring an effect that does not actually exist (a Type I error). In other words, if I choose an alpha of .05, I'm accepting an approximately 5% chance of drawing conclusions from my results that misrepresent the population I am seeking to analyze. An alpha of .05 is the most commonly used, although lower values (such as p < .001) are considered more conservative. There is no right or wrong answer about which to choose, but you are typically encouraged to keep the alpha level consistent across every analysis in a given project, and to decide which level to use before running your analyses.
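You can see what "a 5% chance of error" means by simulation. The sketch below (my own illustrative code, using only NumPy and the standard library) repeatedly tests data generated under a true null hypothesis; at alpha = .05, roughly 5% of the tests come back "significant" purely by chance:

```python
import math
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.05
n, reps = 30, 2000
false_positives = 0

for _ in range(reps):
    # Draw a sample where the null hypothesis (mean = 0) is actually TRUE.
    sample = rng.normal(0, 1, n)
    # One-sample z-test with known sigma = 1.
    z = sample.mean() / (1 / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    if p < alpha:
        false_positives += 1

rate = false_positives / reps   # hovers around 0.05, as alpha promises
```

Dropping alpha to .001 would shrink that false-positive rate to about one in a thousand, at the cost of needing stronger evidence to call any real effect significant.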

Wednesday
May 05, 2010

ANOVA and t-test Question: How do I know if I should be using a t-test or an ANOVA with my data? (Will, Ann Arbor, MI)

Thanks for the question, Will! From a practical/application perspective, both a t-test and an ANOVA are generally used to compare means between groups (although there are several types of t-tests). However, a t-test is commonly restricted to testing between two groups (which could be independent or related, depending on the type of t-test), while an ANOVA can accommodate a comparison of three or more groups. In some cases a t-test can even be used to test a single group, to determine whether that group's mean differs significantly from a predetermined value (a one-sample t-test). There are other statistical differences as well, but that is a summary of the differences in application.
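A quick way to see the relationship is to run both tests on the same data. In the hedged sketch below (simulated data of my own, using SciPy), the two-group ANOVA and the independent-samples t-test give the identical p-value, and the F statistic equals the t statistic squared; only with a third group does the ANOVA do something the t-test can't:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10, 2, 30)
b = rng.normal(12, 2, 30)
c = rng.normal(11, 2, 30)

# Two groups: either test works, and they agree exactly (F = t^2).
t, p_t = stats.ttest_ind(a, b)      # independent-samples t-test
f2, p_f2 = stats.f_oneway(a, b)     # one-way ANOVA on the same two groups

# Three groups: the ANOVA handles them in a single omnibus test.
f3, p_f3 = stats.f_oneway(a, b, c)
```

So with exactly two independent groups the choice is cosmetic; the ANOVA earns its keep once you have three or more.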

Monday
May 03, 2010

Survey Data Analysis Question: How do I know if I should be using Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA)? (Jamie, Phoenix, AZ)

Fantastic question, Jamie! The decision about whether to use EFA or CFA isn't always a clear-cut one. At its most basic statistical root, EFA is useful when you do not have an a priori hypothesis about how a set of items should be grouped together to measure unique constructs, but you suspect there are some distinct constructs that can be measured among the items. By contrast, a CFA is more appropriate when an a priori hypothesis exists about the structure of the data (a hypothesis that may be rooted in a conceptual framework, a prior EFA, or both).
With that said, you are likely to see EFA used when some hypotheses exist about a set of items, so the above rules are not always rigid. The key in your decision is: what is the question you are trying to answer? If your research question is one of an "exploratory nature", then an EFA may be your best choice. However, if you are seeking to test an existing theory, hypothesis, or test competing models/structures, a CFA is what you are looking for. When sample size is abundant, one can randomly split their sample and extract a factor structure from the first half of their data (using EFA) and then test that structure, using a CFA on the second half of their data! If this remains unclear, feel free to send along more specifics (perhaps in the forum) and I'll try to offer a bit more guidance.
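The exploratory half of that split-sample strategy can be sketched in a few lines of Python. This is an illustrative toy example of my own (simulated items, scikit-learn's FactorAnalysis for the EFA step); the confirmatory half would typically be run in dedicated SEM software such as AMOS:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 400
f1 = rng.normal(size=n)   # two hypothetical latent constructs
f2 = rng.normal(size=n)

# Six observed items: the first three load on factor 1, the last three on factor 2.
X = np.column_stack([
    f1 + rng.normal(0, 0.5, n), f1 + rng.normal(0, 0.5, n), f1 + rng.normal(0, 0.5, n),
    f2 + rng.normal(0, 0.5, n), f2 + rng.normal(0, 0.5, n), f2 + rng.normal(0, 0.5, n),
])

# Randomly split: EFA on the first half to extract a structure...
half = n // 2
efa = FactorAnalysis(n_components=2).fit(X[:half])
loadings = efa.components_.T   # items x factors loading matrix

# ...then the structure suggested by `loadings` would be specified as a
# CFA model and tested on the held-out half, X[half:], in SEM software.
```

The loading matrix from the first half gives you the hypothesized structure; fitting that structure to the second half as a CFA is what turns exploration into a genuine test.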