Why is the area under the standard normal curve above the mean always the same value?
The Normal Distribution and the Standard Deviation
When talking about the normal distribution, it's useful to think of the standard deviation as being steps away from the mean. One step to the right or one step to the left is considered one standard deviation away from the mean. Two steps to the left or two steps to the right are considered two standard deviations away from the mean. Likewise, three steps to the left or three steps to the right are considered three standard deviations from the mean. The standard deviation of a dataset is simply the number (or distance) that constitutes a complete step away from the mean. Adding or subtracting the standard deviation from the mean tells us the scores that constitute a complete step. Below I've put together a distribution with a mean of 58 and a standard deviation of 5. For example, if I add the standard deviation to the mean, I would get a score of 63 (58 + 5 = 63). In stats terminology, we would say that a score of 63 falls exactly "one standard deviation above the mean." Similarly, we could subtract the standard deviation from the mean (58 – 5 = 53) to find the score that falls one standard deviation below the mean.
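If it helps to see the "steps" idea as arithmetic, here is a minimal sketch. The function name score_at is my own invention for illustration; the mean of 58 and standard deviation of 5 come from the example above.

```python
def score_at(mean, sd, steps):
    """Return the score that lies `steps` standard deviations from the mean.

    Positive steps move right (above the mean); negative steps move left.
    """
    return mean + steps * sd


mean, sd = 58, 5  # the example distribution from the text

# Walk from three steps below the mean to three steps above it.
for steps in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{steps:+d} SD -> {score_at(mean, sd, steps)}")
```

One step up lands on 63 and one step down lands on 53, matching the sums worked out above.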
One reason the normal distribution is so useful is a property often called the empirical rule: for any normal distribution, a given number of standard deviations above and/or below the mean always accounts for the same amount of area under the curve. This works because every normal curve has the same basic shape once distances are measured in standard deviations; changing the mean just slides the curve left or right, and changing the standard deviation just stretches or squeezes it. Let me explain. Take a look at the picture below. The shaded area represents the total area that falls between one standard deviation above and one standard deviation below the mean. Those Greek letters are just statistical notation for the mean and the standard deviation of a population. Regardless of what a normal distribution looks like or how big or small the standard deviation is, approximately 68 percent of the observations (or 68 percent of the area under the curve) will always fall within one standard deviation (above or below) of the mean. Can you guess what proportion falls between the mean and just one standard deviation above it? Because the curve is symmetric about the mean, it's half of that. If you guessed 34 percent, you must be familiar with division (.68 / 2 = .34).
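To check that the shaded area really doesn't depend on the mean or the standard deviation, here is a small sketch using NormalDist from Python's standard statistics module. The helper area_within is a name I made up, and apart from the 58-and-5 example the distributions in the loop are arbitrary values picked only for illustration.

```python
from statistics import NormalDist


def area_within(mean, sd, k):
    """Area under a normal curve between mean - k*sd and mean + k*sd."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# Three distributions with very different centers and spreads.
for mean, sd in [(0, 1), (58, 5), (1000, 250)]:
    area = area_within(mean, sd, 1)
    print(f"mean={mean}, sd={sd}: area within 1 SD = {area:.4f}")

# Half of that area lies between the mean and one SD above it.
example = NormalDist(mu=58, sigma=5)
half = example.cdf(63) - example.cdf(58)
print(f"mean to +1 SD: {half:.4f}")
```

Each line should report the same area, a little over 0.68, and the last line should come out at about half of that.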
Now take a look at the next picture. It's basically the same as the first one, only this time the shaded area covers two standard deviations above and below the mean. For any normal distribution, approximately 95 percent of the observations will fall within this area.
The same thing holds true for our distribution with a mean of 58 and a standard deviation of 5; 68% of the data would be located between 53 and 63. Within this range are all of the data values located within one standard deviation (above or below) of the mean. Furthermore, 95% of the data would fall within two standard deviations of the mean, or in this case between 48 and 68. Finally, 99.7% of the data values would fall between 43 and 73, or within three standard deviations of the mean. The percentages mentioned here make up what some statisticians refer to as the 68%-95%-99.7% rule. These percentages remain the same for all normally distributed data. I've illustrated this principle on the graph below.
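Here is a rough sketch that rebuilds those three ranges for the mean-58, SD-5 distribution and computes the area inside each one. The dictionary name rule is just an illustrative choice. It also shows that 68, 95, and 99.7 are rounded figures.

```python
from statistics import NormalDist

mean, sd = 58, 5  # the example distribution from the text
dist = NormalDist(mu=mean, sigma=sd)

# Maps k (number of SDs) to (lower bound, upper bound, area between them).
rule = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    rule[k] = (low, high, dist.cdf(high) - dist.cdf(low))
    print(f"within {k} SD: {low} to {high} holds {rule[k][2]:.2%} of the data")
```

The bounds come out as 53 to 63, 48 to 68, and 43 to 73, and the computed areas sit very close to the rounded percentages in the rule.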
Fun fact: the percentage of our distribution that falls in a given area is exactly the same as the probability that any single observation will fall in that area. In other words, we know that approximately 34 percent of our data will fall between the mean and one standard deviation above the mean. We can also say that a given observation has a 34 percent chance of falling between the mean and one standard deviation above the mean. Or, to put it another way, if you were to choose an observation at random from our distribution, there is a 34 percent chance that it would come from the area between the mean and one standard deviation above the mean.
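You can see the area-equals-probability idea in action with a quick simulation. This is only a sketch: the sample size and the random seed are arbitrary choices of mine, and the observed fraction will wobble a little from run to run if you change the seed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
mean, sd = 58, 5  # the example distribution from the text
n = 100_000  # number of simulated observations (arbitrary)

# Draw observations at random from the normal distribution.
draws = [random.gauss(mean, sd) for _ in range(n)]

# Count how many landed between the mean and one SD above it.
hits = sum(1 for x in draws if mean <= x <= mean + sd)
fraction = hits / n
print(f"fraction between {mean} and {mean + sd}: {fraction:.3f}")
```

With this many draws, the fraction should land close to 0.34, which is the same number as the area under the curve between the mean and one standard deviation above it.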