In: Statistics and Probability
True
One of the most fundamental chart types is the bar chart, and one of your most useful tools when it comes to exploring and understanding your data.
What is a bar chart?
A bar chart (aka bar graph, column chart) plots numeric values for levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis. Each categorical value claims one bar, and the length of each bar corresponds to the bar’s value. Bars are plotted on a common baseline to allow for easy comparison of values.
This example bar chart depicts the number of purchases made on a site by different types of users. The categorical feature, user type, is plotted on the horizontal axis, and each bar’s height corresponds to the number of purchases made under each user type. We can see from this chart that while there are about three times as many purchases from new users who create user accounts than those that do not create user accounts (guests), both are dwarfed by the number of purchases made by repeating users.
When you should use a bar chart
A bar chart is used when you want to show a distribution of data points or perform a comparison of metric values across different subgroups of your data. From a bar chart, we can see which groups are highest or most common, and how other groups compare against the others. Since this is a fairly common task, bar charts are a fairly ubiquitous chart type.
The primary variable of a bar chart is its categorical variable. A categorical variable takes discrete values, which can be thought of as labels. Examples include state or country, industry type, website access method (desktop, mobile), and visitor type (free, basic, premium). Some categorical variables have ordered values, like dividing objects by size (small, medium, large). In addition, some non-categorical variables can be converted into groups, like aggregating temporal data based on date (eg. dividing by quarter into 20XX-Q1, 20XX-Q2, 20XX-Q3, 20XX-Q4, etc.) The important point for this primary variable is that the groups are distinct.
In contrast, the secondary variable will be numeric in nature. The secondary variable’s values determine the length of each bar. These values can come from a great variety of sources. In its simplest form, the values may be a simple frequency count or proportion for how much of the data is divided into each category – not an actual data feature at all. For example, the following plot counts pageviews over a period of six months. You can see from this visualization that there was a small peak in June and July before returning to the previous baseline.
Other times, the values may be an average, total, or some other summary measure computed separately for each group. In the following example, the height of each bar depicts the average transaction size by method of payment. Note that while the average payments are highest with checks, it would take a different plot to show how often customers actually use them.