Sunday, August 1, 2010

Confusing Stats Terms Explained: Standard Deviation


Most people find statistics to be complicated, confusing, and just generally frustrating. One of the biggest causes of confusion is the complicated vocabulary that is associated with stats. Frankly, it sometimes seems that stats terms were made to be intentionally complicated. In fact, some concepts seem perfectly understandable when described in plain English, but seem incomprehensible when described in stats lingo.

With this in mind, I decided to compile a list of the most confusing stats terms and describe them in plain English, to clear up some of the confusion that surrounds them. Initially, this was intended to be a single blog entry, but I soon realized that far too many words are required to adequately explain this list in one entry, so I’ve decided to present the terms over a series of entries. I hope this will allow me to offer thorough explanations and examples.


Without further delay...

Confusing Stats Terms Explained: Standard Deviation

Standard deviation is a descriptive statistic that is used to understand the distribution of a dataset. It is often reported in combination with the mean (or average), giving context to that statistic. Specifically, a standard deviation describes how much the scores in a dataset tend to spread out from the mean.

A small standard deviation (relative to the mean score) indicates that the majority of individuals (or data points) tend to have scores that are very close to the mean (see figure below). Here, the cases look clustered around the mean score, with only a few scores farther away from the mean (probably outliers).

[Figure: Standard Deviation Example]

By contrast, a sample with a large standard deviation (relative to the mean score) tends to have cases that are more widely spread out from the mean (see figure below), perhaps with only a few cases actually having scores that fall close to the mean.

[Figure: Standard Deviation Large Graphic Example]

You may be wondering to yourself: “Why should I care about the standard deviation?” The answer to that question is context. To really understand the basic characteristics of a dataset, you must put your statistics in context.

Allow me to demonstrate:

For the sake of demonstration, imagine we have two samples of chocolate cake eaters, each sample with 10 people, self-reporting how many pieces of chocolate cake they've eaten in the last seven days.

  • In dataset #1, we have five people that report eating 4 pieces of cake and five people that report eating 6 pieces of cake, for a mean of 5 pieces of cake
    • (4+4+4+4+4+6+6+6+6+6)/10 = 5
      • Mean (Average) = 5
  • In dataset #2, we have five people that report eating 0 pieces of cake and five people that report eating 10 pieces of cake, for a mean of 5 pieces of cake
    • (0+0+0+0+0+10+10+10+10+10)/10 = 5
      • Mean (Average) = 5
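If you would rather let a computer do the arithmetic, the two means above can be checked with a few lines of Python. This is only a sketch: the names dataset_1 and dataset_2 are my own labels for the two cake-eating samples, and mean comes from Python's built-in statistics module.

```python
from statistics import mean

# Pieces of chocolate cake eaten in the last seven days (10 people per sample)
dataset_1 = [4, 4, 4, 4, 4, 6, 6, 6, 6, 6]
dataset_2 = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]

mean_1 = mean(dataset_1)
mean_2 = mean(dataset_2)

# Both samples come out with a mean of 5 pieces of cake
print(mean_1, mean_2)
```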
Looking at the mean score alone would lead us to believe that these two groups of people have the same chocolate cake eating habits (about 5 pieces per person), but would we ever come to that conclusion given access to the full information that we have here? Of course not. Instead, we would probably say that the mean of 5 pieces per person describes sample #1 reasonably well, but not so much sample #2, which seems to be composed of people with more extreme chocolate cake eating habits (either eating a whole lot of chocolate cake in a week or having none at all).

In this case the two samples have identical means, but that shared mean is somewhat deceptive. In fact, the mean statistic can be a deceptive little bugger in general, when it is not presented in context. That is where a standard deviation comes in!

Now, you might be thinking: “Why not just look at the raw data and come to that conclusion? After all, you just came to that conclusion without ever talking about the standard deviation!”

Well, that is fine as long as you only have ten people in each sample AND as long as your sample is so neatly, cleanly, and clearly organized into moderate values and extreme values, as it is here. If that is the case, then you likely can get a perfectly firm grasp on your data without ever knowing the standard deviation! Unfortunately, data is rarely that clear and sample sizes can be in the hundreds, thousands, or even millions, making it impossible to "eyeball" the data and draw reliable conclusions.

When these instances arise (which will be almost every time you work with data), your friendly standard deviation can give you the context you need. Let's consider the standard deviations of our chocolate cake datasets. Knowing that larger values of standard deviation are indicative of more points "spread" away from the mean, compared to smaller standard deviation values (as discussed in our first paragraph), which sample (#1 or #2) would you expect to have a larger standard deviation?...

(I'll pause while you ponder your answer and then answer out loud, at a volume just loud enough to make the person sitting next to you wonder if you are stable and if it is safe to sit next to you...)

To calculate the standard deviation for your data, well let's face it, all you need to do is use SPSS (or any other statistical software package). In SPSS, you can obtain the standard deviation by:

  • In the menu bar, go to: Analyze -> Descriptive Statistics -> Descriptives -> move the variable you want to summarize from the list of variables in the left box to the empty box on the right -> click Options -> check the Std. deviation box, if it isn't already [should be by default] -> click Continue -> click OK.
If you happen to want to calculate by hand, you simply:

  1. Subtract your mean score from every person's actual (observed) score
  2. Square those difference scores for each person
  3. Add those values together for the whole sample
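  4. Divide that sum by the number of cases in your data (10 in our case). This gives the standard deviation of a whole population; if your data are a sample from a larger population, divide by the number of cases minus one (n - 1) instead
  5. Finally, calculate the square root of the number calculated in step #4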
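For anyone who would rather see those five steps as code, here is a minimal Python sketch. The function name standard_deviation_by_hand and the made-up scores are my own, purely for illustration, and the function divides by the number of cases exactly as step #4 is written (so it returns the population version of the standard deviation).

```python
from math import sqrt

def standard_deviation_by_hand(scores):
    """Population standard deviation, following the five hand-calculation steps."""
    mean_score = sum(scores) / len(scores)
    differences = [score - mean_score for score in scores]  # step 1: subtract the mean
    squared = [d ** 2 for d in differences]                 # step 2: square each difference
    total = sum(squared)                                    # step 3: add them together
    average_squared = total / len(scores)                   # step 4: divide by number of cases
    return sqrt(average_squared)                            # step 5: take the square root

# Hypothetical scores, not from the cake example
made_up_scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = standard_deviation_by_hand(made_up_scores)
print(result)
```

The intermediate lists are kept on purpose, so each line maps onto one of the numbered steps; a shorter version is possible, but harder to follow alongside the list.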
For those of you that like formulas:

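$$SD = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}$$

where $x_i$ is each person's observed score, $\bar{x}$ is the mean score, and $N$ is the number of cases in your data.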

Now, back to our example. You will recall that I asked you to guess which sample you would expect to have the larger standard deviation (#1 or #2). Well, if you said sample #2, you would be correct!
  • In dataset #1, we have five people that report eating 4 pieces of cake and five people that report eating 6 pieces of cake, for a mean of 5 pieces of cake ([4+4+4+4+4+6+6+6+6+6]/10=5).
    • Mean = 5; Standard Deviation = 1
  • In dataset #2, we have five people that report eating 0 pieces of cake and five people that report eating 10 pieces of cake, for a mean of 5 pieces of cake ([0+0+0+0+0+10+10+10+10+10]/10=5).
    • Mean = 5; Standard Deviation = 5
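These two values can be double-checked with Python's built-in statistics module. Treat this as a rough sketch: the variable names are my own, pstdev divides by the number of cases (matching the hand calculation above), and stdev divides by n - 1, which is what you would typically want when your data are a sample from a larger population.

```python
from statistics import pstdev, stdev

dataset_1 = [4, 4, 4, 4, 4, 6, 6, 6, 6, 6]
dataset_2 = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]

# Population standard deviation (divide by N), as in the hand calculation
population_sd_1 = pstdev(dataset_1)
population_sd_2 = pstdev(dataset_2)

# Sample standard deviation (divide by n - 1), slightly larger for the same data
sample_sd_1 = stdev(dataset_1)
sample_sd_2 = stdev(dataset_2)

print(population_sd_1, population_sd_2)
print(sample_sd_1, sample_sd_2)
```

If your software reports values a little larger than 1 and 5 for these datasets, the n - 1 version is the likely reason.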
Note: You will almost never see the mean and standard deviation with the same value. This example was made intentionally extreme for demonstration purposes, but clearly you wouldn't typically have your entire sample fall into either all 0's or all 10's (unless it is categorical data, in which case there is no need for means and standard deviation scores).
From this example, we can see that the standard deviation is critical to understanding your data, because it puts your mean statistic in context. In this case, it indicates that the mean for dataset #2 is not a very meaningful or useful statistic for understanding the eating tendencies of individuals in that dataset.
A few closing notes about standard deviations:
  • A dataset's variance can be calculated by simply squaring the standard deviation.
    • Variance = (standard deviation)²
    • Variance will be the topic of a future "Confusing Stats Terms" blog, so you'll see then why we care about both for different reasons...
  • One standard deviation above and below the mean is expected to include about 68% of the participants' scores in your dataset (assuming your distribution is normal).
    • Two standard deviations above and below the mean would be expected to include about 95% of the values in your dataset (assuming your distribution is normal).
    • Three standard deviations above and below the mean would be expected to include about 99.7% of the values in your dataset (assuming your distribution is normal).
  • As alluded to earlier, standard deviations should only be calculated for interval (or ratio) data (also true for a mean score).
    • Interval data is data that is numeric and holds an intrinsic and consistent distance between values (for example, going from 1 to 2 represents the same increase as going from 2 to 3, or 3 to 4, etc.).
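For the curious, the variance and the 68/95/99.7 notes above can be sketched in Python as well. The helper share_within is a name I made up for illustration; NormalDist, pstdev, and pvariance come from Python's built-in statistics module (NormalDist needs Python 3.8 or later), and the percentages only apply if your distribution really is normal.

```python
from statistics import NormalDist, pstdev, pvariance

# Variance is simply the standard deviation squared
dataset_2 = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
variance_2 = pvariance(dataset_2)
sd_squared_2 = pstdev(dataset_2) ** 2
print(variance_2, sd_squared_2)

# Share of a normal distribution within k standard deviations of the mean
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values expected between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Roughly 68%, 95%, and 99.7% for k = 1, 2, 3
    print(k, round(share_within(k) * 100, 1))
```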

Editorial Note: Stats Make Me Cry is owned and operated by Jeremy J. Taylor. The site offers many free statistical resources (e.g. a blog, SPSS video tutorials, R video tutorials, and a discussion forum), as well as fee-based statistical consulting and dissertation consulting services to individuals from a variety of disciplines all over the world.





Reader Comments (53)

Preparing to go for a scholarship test, just found your blog.

It refreshes what I learnt years back, and I would also like to state that someone new to statistics won't find it difficult.

You did a great job.

March 22, 2011 | Unregistered CommenterBen

Thanks Ben!!!

March 22, 2011 | Registered CommenterJeremy Taylor

I studied statistics in the 1990s and learned how to calculate standard deviation, but never learned what the purpose of it is. Thanks to your simple explanation it is now very clear to me, great. It is never too late to learn :-)

May 6, 2011 | Unregistered CommenterSohail Qureshi

I'm glad it was helpful Sohail! Thanks for the kind words!

June 2, 2011 | Registered CommenterJeremy Taylor

I'm teaching statistics to high school students and am thrilled to have found your blog. Your explanation is better than any I could have come up with and beats the heckle out of the text book.
Thank you!

October 29, 2011 | Unregistered CommenterJenF

Thanks! I'm glad it is helpful!

October 29, 2011 | Registered CommenterJeremy Taylor

This was really helpful! :D I'm waiting for #7 Variance. I don't understand what is the difference between Variance and Standard Deviation.

November 1, 2011 | Unregistered CommenterTina

This was just AMAZING! Thank you SO much for this clear cut explanation. I am taking statistics now in University and I can tell you I feel SO dumb when the professor is talking and I don't want to appear dumb and ask him questions or my friends as they will just give me that are you for real look. Now I can understand standard deviation and feel good about myself and knowledgable and work towards increasing my confidence and self esteem. Thank you once gain SOOOO much. :)

November 5, 2011 | Unregistered CommenterHappy Student

Glad it was helpful!

November 5, 2011 | Registered CommenterJeremy Taylor

Thanks for the useful details Jeremy. I know that SD is measured relative to the mean. One may need to say that SD is low or high (relative to the mean), when can one say low or high? i.e if mean is 3, is 1 low...if so is 1.5 low or high?....etc

December 9, 2011 | Unregistered CommenterRif

Hi Thanks so much it is really useful to understand and make some sense the reasons for calculating SD.
I'm a bit stumped on T tests if you have any more goodies to share! Warm regards,
Leah

March 20, 2012 | Unregistered Commenterleah

I'm glad it was helpful, Leah! What specific questions do you have about t-tests? I'll be happy to try to help!

March 22, 2012 | Registered CommenterJeremy Taylor

Thanks for the useful details Jeremy. I know that SD is measured relative to the mean. One may need to say that SD is low or high (relative to the mean), when can one say low or high? i.e if mean is 3, is 1 low...if so is 1.5 low or high?....etc

July 2, 2012 | Unregistered Commenterandreea

I would echo earlier postings, you have done a masterful job of explaining the use of standard deviation. I've always enjoyed the concept of variability as a descriptor of normality, and indeed you have confirmed my faith that this is truly an understandable concept... Well done!

July 7, 2012 | Unregistered CommenterDr.Karl

Amazing - I use the SD Values every day as a measure of railway track quality - Ive been looking for weeks for a simple explaination - fantastic job well done sir.

July 8, 2012 | Unregistered CommenterDave O-J

Andrea,

Your question is a good one, and a question that I get a lot. Unfortunately there is no easy answer, because there is no real "rule of thumb" for what is "low" or "high" for SD. What I can say is that the larger the SD (relative to the mean), the less precisely the mean describes the individual cases. If the SD is the same size as the mean, then the mean estimate would have less "meaning" to me, because it is not representative of many of the people in my sample.

I know that answer is not definitive, but I'm afraid there is no definitive answer to that question. I hope I was helpful though.

July 15, 2012 | Registered CommenterJeremy Taylor

Thank you for the kind words, Dr. Karl and Dave O-J!

July 15, 2012 | Registered CommenterJeremy Taylor

Hi,
I'm studying psychology (first year) and even kids resources failed to explain this concept to me in a way I understand.
THANK YOU!!

August 30, 2012 | Unregistered CommenterNina

You are truly gifted! I hope you are a teacher/professor. Thank you very much!!! :-)

October 2, 2012 | Unregistered Commenterhappyaboutstatsnow

could you please explain to me how standard deviation in relation to gender works? ive been given results where males were coded with a zero and females with a 1, the mean is .36 so i understand that 36% of the sample are female. the table says the standard deviation is .48, how is there a standard deviation for gender and how would i explain it? good job on explaining standard deviation by the way :)

October 13, 2012 | Unregistered CommenterKatie

Thank you!

October 18, 2012 | Registered CommenterJeremy Taylor

Means and standard deviations (SD) are measures of central tendency and spread, respectively, to be used to understand continuous (numeric) variables, not categorical (or binary) variables. I understand that you've assigned numeric codes to gender groups, but those are arbitrary assignments; they are not numerically meaningful. I would not use means or SD to describe a categorical variable (such as gender or race). I hope this helps!

October 18, 2012 | Registered CommenterJeremy Taylor

Thanks for the really clear explanation! I'm studying for the Project Management Professional Exam and the section on estimating that included SD and Variances just tied me in knots. NOW I understand - thanks!

October 27, 2012 | Unregistered CommenterRon

Thanks for the simple, understandable explanation. As a high school teacher I hear that term thrown about all the time in reference to test scores for the state standards, but I was never very clear on what it meant. Whenever I asked, somewhat sheepishly, what standard deviation really meant I got a pretty good hodge-podge of answers, but no clear understanding. Now I have at least a basic comprehension of the idea.

Where were you when I was taking trig and college algebra?

October 28, 2012 | Unregistered CommenterMark

I'm glad I could be helpful, Mark!

October 29, 2012 | Registered CommenterJeremy Taylor

You're welcome, Ron! Thanks for reading!

October 29, 2012 | Registered CommenterJeremy Taylor

I can see where you're getting your numbers because they're clean, but I was wondering if you could do a similar exercise where the numbers were all over the map and then explain it. I just did my own exercise where I chose random numbers between 1 and 15 in a dataset of 11 numbers and got a mean of 6.8 and a standard deviation of 3.97. This is where I don't understand what the SD means when the dataset isn't as clear as the two you chose to use. Can you help me out?

November 8, 2012 | Unregistered CommenterDerek

Excellent post. Helped me find the words when explaining to a client. I've bookmarked your blog!

November 15, 2012 | Unregistered CommenterCJ @ StrategicMarketingGuy

Thanks CJ!!

November 20, 2012 | Registered CommenterJeremy Taylor

Thank you very well done

November 21, 2012 | Unregistered CommenterRick Ryan

Outstanding explanation, keep up the good work bro!

November 22, 2012 | Unregistered CommenterAmrit

So are we not supposed to subtract 1 from the number of data items in the denominator during the division of the sum of squared deviations by the number of data items? Or are there different variations of standard deviations?

November 25, 2012 | Unregistered Commenterashley

Hi Ashley!

You are supposed to subtract 1 from the sample size (n-1) during calculation, but only if you are dealing with a sample of a larger population. If you are dealing with calculation of SD in an entire known population (as opposed to a sample), then there is no need to subtract 1. Am I making sense?

December 2, 2012 | Registered CommenterJeremy Taylor

Thanks Amrit!

December 2, 2012 | Registered CommenterJeremy Taylor

Thanks Rick!

December 2, 2012 | Registered CommenterJeremy Taylor

This is good.however you need to go further and interprete for us.Fantastic work.

January 21, 2013 | Unregistered CommenterTom

thank you for the explanation.... im working on my thesis now and i was conducting survey.... i already have the result but the problem is how can i tabulate the data in getting standard deviation with the options (yes, no and maybe)?

January 27, 2013 | Unregistered Commenterjean

hoping for your reply as soon as possible sir...... thank you...

January 27, 2013 | Unregistered Commenterjean

Hey Tom, how do you mean? Interpret what?

February 3, 2013 | Registered CommenterJeremy Taylor

Would it be correct to think SD is an average of averages? Thanks for any reply-Sean

February 11, 2013 | Unregistered Commentersean

I think I would think of SD more as a measure of how much your sample tends to vary from the mean.

February 11, 2013 | Registered CommenterJeremy Taylor

I like this. It brings clarity to what can be such mumbo jumbo! Thank you!

February 25, 2013 | Unregistered Commentersharon

Thanks Sharon!

March 26, 2013 | Registered CommenterJeremy Taylor

This is great work, thank you for doing this! It's refreshing to finally read stuff without getting stuck on the comprehension of the most basic and fundamental of stats topics. Very much looking forward to the next 7! Can you please email me when they are up? I've included my email address within this form. Thanks so much - again, really fantastic work!

April 10, 2013 | Unregistered CommenterJess

Thanks, Jess!

April 11, 2013 | Registered CommenterJeremy Taylor

thx for the explaination and stuff :)

May 14, 2013 | Unregistered CommenterAlgebraStudentLady

You're welcome, AlgebraStudentLady!

May 15, 2013 | Registered CommenterJeremy Taylor

I took stats when I first began college many years ago and slid by with a D...I don't know how because I didn't understand a word that professor spoke. I might as well have been in an advanced Mandarin class! Fast forward to now, and I am retaking this course for my RN-BSN class. I am also taking it online because I am crazy. The language of statistics is so confusing, and this is the first time I have understood what Standard Deviation means and why it is used! I am keeping this blog close to my heart and on my desktop for easy access....THANK YOU SO MUCH!!!!

May 31, 2013 | Unregistered CommenterDanielle

Thanks for the kind words Danielle! I'm glad I could be helpful!

May 31, 2013 | Registered CommenterJeremy Taylor

Hi,

It is really useful .

June 11, 2013 | Unregistered Commenteranusha
