Discuss similarities and differences among these ways to measure data variation.
Why would it seem reasonable to pair the median with a box-and-whisker plot and to pair the mean with the standard deviation?
What are the advantages and disadvantages of each method of describing data spread?
Comment on statements such as the following:
a) The range is easy to compute, but it doesn’t give much information;
b) Although the standard deviation is more complicated to compute, it has some significant applications;
c) The box-and-whisker plot is fairly easy to construct and it gives a lot of information at a glance.
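Statement (a) can be made concrete with a small example. The Python sketch below uses two made-up data sets that share the same minimum and maximum, so their ranges are equal, while the standard deviation shows that one is far more tightly clustered. It assumes the population standard deviation (statistics.pstdev); the sample version (statistics.stdev) would give somewhat larger numbers but the same contrast.

```python
import statistics

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Made-up data for illustration: same minimum (0) and maximum (10).
spread_out = [0, 0, 0, 0, 10, 10, 10, 10]  # values pushed to both ends
clustered = [0, 5, 5, 5, 5, 5, 5, 10]      # values bunched at the middle

for name, values in [("spread_out", spread_out), ("clustered", clustered)]:
    # Population standard deviation is an assumption; stdev() is the sample version.
    print(name, "range =", data_range(values),
          "sd =", round(statistics.pstdev(values), 2))
```

Both ranges are 10, but the population standard deviations are 5.0 and 2.5: the range alone cannot tell these two data sets apart.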
There are several ways to describe the center and spread of a distribution. One is the five-number summary (minimum, first quartile, median, third quartile, maximum), which is exactly what a box plot displays. It uses the median as its center and gives a brief picture of the other key values of the distribution. Another approach uses the mean as the center and the standard deviation as the measure of spread. This technique, however, works best for symmetric distributions with no outliers; when outliers are present, the box plot and five-number summary are the better choice.
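A minimal Python sketch of the two descriptions, using hypothetical data: the same nine values are summarized before and after the largest one is replaced by an outlier. Quartiles come from statistics.quantiles with method="inclusive", which is only one of several quartile conventions, so hand calculations or other software may give slightly different Q1 and Q3. The population standard deviation is assumed.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use statistics.quantiles(..., method="inclusive"); other
    textbooks and software use slightly different quartile rules.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical data: nine values, then the largest replaced by an outlier.
base = [2, 4, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 100]

for name, values in [("base", base), ("with_outlier", with_outlier)]:
    print(name,
          "five-number summary =", five_number_summary(values),
          "mean =", round(statistics.mean(values), 2),
          "sd =", round(statistics.pstdev(values), 2))
```

In the five-number summary only the maximum changes; the quartiles and median stay at 4, 6 and 8. The mean, by contrast, rises by 10 and the standard deviation grows more than tenfold, which is why the box plot is preferred when outliers are present.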
Despite this restriction, the mean and standard deviation are used more commonly than the five-number summary. The reason is that many natural phenomena can be approximately described by a normal distribution, and for normal distributions the mean and standard deviation are the best measures of center and spread, respectively.
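One well-known reason the mean and standard deviation suit normal data is the 68-95-99.7 rule: about 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean. The Python sketch below checks this on simulated normal data; the mean of 50, standard deviation of 10, sample size and random seed are arbitrary choices for illustration, not values from the discussion above.

```python
import random
import statistics

def fraction_within_k_sd(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sd for x in values) / len(values)

random.seed(1)  # fixed seed so the run is repeatable
# Simulated normal data (mean 50, SD 10 are arbitrary illustrative choices).
sample = [random.gauss(50, 10) for _ in range(20_000)]

for k in (1, 2, 3):
    print(k, round(fraction_within_k_sd(sample, k), 3))
```

The printed fractions should land close to 0.68, 0.95 and 0.997; with a finite random sample they will not match exactly.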
The standard deviation takes every value into account, has extremely useful properties when used with a normal distribution, and is mathematically manageable. However, it is not a good measure of spread in highly skewed distributions; in those cases it should be supplemented by other measures such as the semi-interquartile range (half the distance between the first and third quartiles), which can be read directly from a box plot.
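To see why the standard deviation struggles with skewed data, the Python sketch below compares it with the semi-interquartile range on two hypothetical right-skewed data sets that differ only in how far the largest value extends into the tail. As before, quartiles use statistics.quantiles with method="inclusive" (one convention among several) and the population standard deviation is assumed.

```python
import statistics

def semi_interquartile_range(values):
    """Half the distance between Q1 and Q3 (quartiles via method="inclusive")."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical right-skewed data; only the largest value differs.
short_tail = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6]
long_tail = [1, 1, 2, 2, 2, 3, 3, 4, 5, 40]

for name, values in [("short_tail", short_tail), ("long_tail", long_tail)]:
    print(name,
          "sd =", round(statistics.pstdev(values), 2),
          "semi-IQR =", semi_interquartile_range(values))
```

Stretching a single value in the tail raises the standard deviation from about 1.58 to about 11.3, while the semi-interquartile range stays at 0.875 because the quartiles do not move.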