1. Explain how histograms are used to examine data. Describe the various shapes a histogram can take and what they mean.
2. When would it be appropriate to use a histogram as opposed to a scattergram?
3. What is the purpose of a Data Dashboard? If you were the manager of a restaurant, for example, what 5 key metrics would you want to display on a Data Dashboard? (Note: think about which important metrics a company, particularly a restaurant, would want to measure. Answers will vary.)
4. Describe "Simpson's Paradox". Search the Internet and provide an example of this statistical phenomenon.
1)
Use histograms when you have continuous measurements and want to understand the distribution of values and look for outliers. These graphs take your continuous measurements and place them into ranges of values known as bins. Each bin has a bar that represents the count or percentage of observations that fall within that bin.
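The binning step described above can be sketched in a few lines. This is a minimal illustration, not a full plotting routine: the wait times are made-up values, and the helper name `bin_counts`, the bin width of 5, and the starting edge of 0 are all assumptions chosen for the example.

```python
from collections import Counter

def bin_counts(values, bin_width, start):
    """Count how many values fall in each bin [start + k*w, start + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin that contains v
        counts[k] += 1
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }

# Hypothetical continuous measurements (e.g. wait times in minutes)
wait_times = [3.2, 4.8, 5.1, 5.9, 6.4, 7.0, 7.7, 8.3, 9.5, 14.6]

hist = bin_counts(wait_times, bin_width=5, start=0)

# Draw one text "bar" per bin; the bar length is the count in that bin
for (lo, hi), n in hist.items():
    print(f"{lo:>2}-{hi:<2} | {'#' * n} ({n})")
```

Changing `bin_width` changes how the same data look: very wide bins hide detail, and very narrow bins make the shape noisy.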
Histograms and Skewed Distributions
Histograms are an excellent tool for identifying the shape of your distribution. Some distributions, such as the normal distribution, are symmetric. However, not all distributions are symmetrical. You might have nonnormal data that are skewed.
The shape of the distribution is a fundamental characteristic of your sample that can determine which measure of central tendency best reflects the center of your data. Relatedly, the shape also impacts your choice between using a parametric or nonparametric hypothesis test. In this manner, histograms are informative about the summary statistics and hypothesis tests that are appropriate for your data.
For skewed distributions, the direction of the skew indicates which way the longer tail extends.
For right-skewed distributions, the long tail extends to the right while most values cluster on the left, as shown below. These are real data from a study I conducted.
Conversely, for left-skewed distributions, the long tail extends to the left while most values cluster on the right.
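To make the link between skew and the choice of central tendency concrete, here is a small sketch with made-up numbers. It relies on a common rule of thumb, which is an assumption and not a guarantee for every data set: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one long-tail value
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

# Mirror image of the same sample, which makes it left-skewed
left_skewed = [21 - x for x in right_skewed]

def describe(values):
    """Return (mean, median) so the two measures of center can be compared."""
    return statistics.mean(values), statistics.median(values)

mean_r, median_r = describe(right_skewed)
mean_l, median_l = describe(left_skewed)

print(f"right-skewed: mean={mean_r}, median={median_r}")
print(f"left-skewed:  mean={mean_l}, median={median_l}")
```

In both samples the median stays near the cluster of typical values while the mean is dragged toward the tail, which is why the median is often preferred for skewed data.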
Using Histograms to Identify Outliers
Histograms are a handy way to identify outliers. In an instant, you’ll see if there are any unusual values. If you identify potential outliers, investigate them. Are these data entry errors or do they represent observations that occurred under unusual conditions? Or, perhaps they are legitimate observations that accurately describe the variability in the study area.
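On a histogram, a potential outlier appears as a bar standing apart from the rest. As a numeric cross-check, one widely used convention is the 1.5 × IQR rule. That rule is not part of the discussion above; it is swapped in here only as an illustration, and the data are made up.

```python
import statistics

# Hypothetical measurements with one value far from the rest
values = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 55]

# Quartiles using the standard library's default ("exclusive") method
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Fences for the 1.5 * IQR rule of thumb (an assumed convention)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are candidates to investigate, not to delete
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print("potential outliers:", outliers)
```

A flagged value is only a prompt to investigate. As noted above, it may be a data entry error, an unusual condition, or a legitimate observation.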
Histograms and Variability
Suppose you hear that two groups have the same mean of 50. It sounds like they’re practically equivalent. However, after you graph the data, the differences become apparent, as shown below.
The histograms center on the same value of 50, but the spread of values is notably different. The values for group A mostly fall between 40 and 60, while for group B they range from 20 to 90. The mean does not tell the entire story! At a glance, the difference is evident in the histograms.
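The same point can be checked numerically. The two groups below are hypothetical values built to match the description (both have a mean of 50, with group A spanning 40 to 60 and group B spanning 20 to 90); they are not the data behind the original figure.

```python
import statistics

# Hypothetical samples: same mean, very different spread
group_a = [40, 45, 48, 50, 50, 52, 55, 60]
group_b = [20, 30, 40, 50, 50, 60, 60, 90]

def summarize(values):
    """Collect the mean plus two simple measures of spread."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary_a = summarize(group_a)
summary_b = summarize(group_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Group {name}: mean={s['mean']}, stdev={s['stdev']:.1f}, "
          f"range={s['min']}-{s['max']}")
```

The means are identical, but the standard deviation and range of group B are much larger, which is the difference a histogram makes visible at a glance.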
In short, histograms show you which values are more and less common along with their dispersion. You can’t gain this understanding from the raw list of values. Summary statistics, such as the mean and standard deviation, will get you partway there. But histograms make the data pop!