What is the difference between a mean, median and mode of a dataset? Provide an example of when it would be appropriate to report each of these. What is a standard deviation? Why is it necessary to report both a measure of center and a measure of spread?
Mean : The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. It can be used with both discrete and continuous data, although it is used most often with continuous data. The mean has one main disadvantage: it is particularly susceptible to the influence of outliers, because every value, however extreme, enters the sum.
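Mean is given by the formula:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $x_1, \dots, x_n$ are the values in the data set and $n$ is the number of values.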
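Median : The median is the middle score for a set of data that has been arranged in order of magnitude. When the number of values is even, the median is the average of the two middle scores. The median is less affected by outliers and skewed data than the mean.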
Mode : The mode is the most frequent score in our data set. Normally, the mode is used for categorical data where we wish to know which is the most common category. However, one of the problems with the mode is that it may not be unique: two or more values can be tied for most frequent. Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common value is far away from the rest of the data in the data set.
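As a quick illustration of the three measures, here is a minimal Python sketch using the standard library's `statistics` module. The data values are made up for the example; they show how a single outlier moves the mean much more than the median, and how the mode need not be unique.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [40]      # same data plus one extreme value

mean_plain = mean(scores)         # sum of values / number of values
median_plain = median(scores)     # middle value of the sorted data
modes_plain = multimode(scores)   # list of all most-frequent values

mean_out = mean(with_outlier)     # pulled strongly toward the outlier
median_out = median(with_outlier) # average of the two middle values
modes_out = multimode(with_outlier)

# A data set with two values tied for most frequent has two modes.
tied_modes = multimode([1, 1, 2, 2, 3])

print("without outlier:", mean_plain, median_plain, modes_plain)
print("with outlier:   ", mean_out, median_out, modes_out)
print("tied modes:     ", tied_modes)
```

In this sketch the outlier raises the mean from 3.4 to 9.5, while the median only moves from 3 to 3.5 and the mode stays at 3.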
When we have a normally distributed sample, we can use either the mean or the median as our measure of central tendency. In fact, in any symmetrical distribution with a single peak, the mean, median and mode are equal. However, in this situation, the mean is preferred as the best measure of central tendency because it includes all the values in the data.
The median is generally considered to be the best representative of the central location of the data when data is skewed.
The table given below shows the best measure of central tendency for different types of variable.
Type of Variable | Best measure of central tendency |
Nominal | Mode |
Ordinal | Median |
Interval/Ratio (not skewed) | Mean |
Interval/Ratio (skewed) | Median |
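The table can be read as a simple decision rule. The sketch below is one hypothetical way to encode it in Python; the function name `central_tendency`, the `level` strings, and the `skewed` flag are assumptions made for this example, and ordinal data is assumed to be coded as numbers so that a median can be computed.

```python
from statistics import mean, median, multimode

def central_tendency(values, level, skewed=False):
    """Pick a measure of center following the table above (illustrative only)."""
    if level == "nominal":
        # Categories have no order, so only the most frequent value(s) make sense.
        return multimode(values)
    if level == "ordinal":
        # Assumes the ordered categories are coded as numbers.
        return median(values)
    if level in ("interval", "ratio"):
        # Skewed data: median; otherwise: mean.
        return median(values) if skewed else mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")

print(central_tendency(["red", "blue", "red"], "nominal"))
print(central_tendency([1, 2, 2, 3, 5], "ordinal"))
print(central_tendency([10, 20, 30], "interval"))
print(central_tendency([1, 2, 3, 100], "ratio", skewed=True))
```

Whether a data set counts as skewed is left to the caller here; in practice that judgment usually comes from a histogram or a similar look at the data.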
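Standard deviation : The standard deviation is a measure of the spread of scores within a set of data: roughly, how far the values typically lie from the mean. For a sample of $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$, the standard deviation is commonly given by the formula:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
(For an entire population rather than a sample, the sum is divided by $n$ instead of $n-1$.)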
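A measure of spread and a measure of central tendency are usually reported together to provide an overall description of a set of data. Neither is sufficient on its own: two data sets can have the same center yet differ greatly in how widely the values are scattered around it, so the center tells us where the data are located and the spread tells us how representative that center is of the individual values.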
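To see why a measure of center alone is not enough, consider this small Python sketch. The two sets of class scores are invented for the example, and `statistics.stdev` computes the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

# Hypothetical exam scores for two classes, invented for illustration.
class_a = [68, 69, 70, 71, 72]     # tightly clustered
class_b = [40, 55, 70, 85, 100]    # widely scattered

mean_a, mean_b = mean(class_a), mean(class_b)
sd_a, sd_b = stdev(class_a), stdev(class_b)   # sample standard deviations

print(f"class A: mean={mean_a}, sd={sd_a:.2f}")
print(f"class B: mean={mean_b}, sd={sd_b:.2f}")
```

Both classes have a mean of 70, but the standard deviation of class B is far larger than that of class A, so reporting only the mean would hide a substantial difference between them.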