Why is the normal distribution so important? What's the big deal behind it, and why do statisticians care so much? Are real-world data really normally distributed?
The normal distribution is the most important probability distribution in statistics because it approximates many natural phenomena well. For example, heights, blood pressure, measurement error, and IQ scores are all approximately normally distributed. It is also known as the Gaussian distribution or the bell curve.
The normal distribution is a continuous probability distribution that describes how the values of a variable are distributed. It is symmetric: most observations cluster around the central peak at the mean, and the probabilities for values further from the mean taper off equally in both directions. Extreme values in either tail are equally unlikely.
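As a rough illustration of how quickly the probabilities taper off, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to compute the share of values within 1, 2, and 3 standard deviations of the mean. The standard Normal (mean 0, standard deviation 1) and the helper name `share_within` are assumptions made for the example.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1 (chosen for illustration).
z = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, p in shares.items():
    print(f"within {k} sd: {p:.4f}")

# Symmetry: the tails below -2 sd and above +2 sd carry the same probability.
lower_tail = z.cdf(-2)
upper_tail = 1 - z.cdf(2)
print(f"lower tail {lower_tail:.5f}, upper tail {upper_tail:.5f}")
```

The three shares come out near 68%, 95%, and 99.7%, which is the familiar rule of thumb for a bell curve.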
Many real quantities look roughly Normal, but it is worth remembering that nothing real follows a Normal distribution exactly.
The reason so many things come close is the Central Limit Theorem, which says (roughly) that if something results from a lot of small influences that are not too correlated with each other, its distribution will be approximately Normal. Height, for example, is controlled by lots of genes, plus nutrition and other factors that work more or less independently.
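A small simulation can make the Central Limit Theorem idea concrete. This is only a sketch under stated assumptions: each "influence" is modeled as an independent Uniform(-1, 1) nudge, and the counts `N_INFLUENCES` and `N_SAMPLES` are hypothetical values picked for the example, not anything measured.

```python
import random
from statistics import NormalDist, mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

N_INFLUENCES = 50     # hypothetical number of small independent factors
N_SAMPLES = 20_000    # hypothetical number of simulated outcomes

def one_outcome():
    # Each influence is a small, independent, decidedly non-Normal nudge.
    return sum(random.uniform(-1, 1) for _ in range(N_INFLUENCES))

outcomes = [one_outcome() for _ in range(N_SAMPLES)]
mu, sd = mean(outcomes), pstdev(outcomes)

# Share of simulated outcomes within 1 and 2 sd of their mean,
# next to what an exact Normal distribution would give.
normal = NormalDist()
observed = {}
for k in (1, 2):
    observed[k] = sum(abs(x - mu) <= k * sd for x in outcomes) / N_SAMPLES
    expected = normal.cdf(k) - normal.cdf(-k)
    print(f"within {k} sd: simulated {observed[k]:.3f}, Normal {expected:.3f}")
```

Although no single influence is Normal, the simulated shares should land close to the Normal ones, which is the pattern the theorem describes.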
However, the Central Limit Theorem works from the center of the distribution outward. Even if there aren’t that many factors, some of them are big, and some are correlated, you can still get a distribution that looks pretty Normal for 80% or 95% of the observations. If there are many factors, none of them big, and no major correlations, the distribution may look Normal for 99% or 99.9% of the observations. But never for 100%. With height, for example, there are outliers due to genetic conditions or stunting. In other cases, the problem is not outliers but hard maximum or minimum values.
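The point about the center fitting while the tails do not can be checked exactly in a toy case. The sketch below assumes just three independent Uniform(0, 1) factors (an illustrative choice) and uses the known closed-form CDF of their sum, the Irwin-Hall distribution; the function name `irwin_hall_cdf` is introduced here for the example.

```python
from math import comb, factorial, floor, sqrt
from statistics import NormalDist

def irwin_hall_cdf(x, n):
    """P(U1 + ... + Un <= x) for n independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n
                for k in range(floor(x) + 1))
    return total / factorial(n)

N = 3                             # only three factors (illustrative choice)
MEAN, SD = N / 2, sqrt(N / 12)    # exact mean and sd of the sum

normal = NormalDist()
within = {}
for k in (1, 2, 3):
    exact = irwin_hall_cdf(MEAN + k * SD, N) - irwin_hall_cdf(MEAN - k * SD, N)
    approx = normal.cdf(k) - normal.cdf(-k)
    within[k] = (exact, approx)
    print(f"within {k} sd: sum of {N} uniforms {exact:.4f}, Normal {approx:.4f}")
```

Within 1 and 2 standard deviations the two agree to within a couple of percentage points. At 3 standard deviations they part ways: the sum has a hard minimum of 0 and maximum of 3, so it can never fall outside that range, while a Normal model would still put about 0.27% of observations there.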
This matters because you can look at a lot of data, see that it follows something reasonably Normal, and therefore build confidence intervals based on Normal assumptions. But you know (or should know) that the tails are never Normal. Depending on the application, a single outlier may be more important than all the rest of your data put together.
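To see how much one observation can matter, here is a minimal sketch of a Normal-based 95% confidence interval for a mean, computed with and without a single extreme value. All of the data are made up for illustration: 200 simulated values around 100, plus one hypothetical outlier of 5000. The helper `normal_ci` is likewise an assumption of the example, not a standard library function.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

def normal_ci(data, level=0.95):
    """Normal-theory confidence interval for the mean of `data`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    half_width = z * stdev(data) / len(data) ** 0.5
    m = mean(data)
    return m - half_width, m + half_width

# Hypothetical data: 200 well-behaved observations around 100 ...
clean = [random.gauss(100, 15) for _ in range(200)]
# ... plus one extreme value that a Normal model treats as essentially impossible.
with_outlier = clean + [5000]

ci_clean = normal_ci(clean)
ci_outlier = normal_ci(with_outlier)
print("without outlier:", ci_clean)
print("with outlier:   ", ci_outlier)
```

One added value out of 201 shifts the estimated mean noticeably and makes the interval many times wider, so the summary ends up driven more by that single point than by the other 200.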