Create a small dataset of at least 5 observations and calculate the mean, median, mode, range, IQR and standard deviation (use Excel or StatCrunch). Add an outlier to your data and recalculate these measures. Which changed and by how much? Explain how this illustrates the idea of resistant measures. Please provide the data that you made up and the measures for your data before and after you added an outlier. Organizing the data into a table would be a great way to display all of this.
Solution:
Consider the following five observations:
| X |
|---|
| 5 |
| 8 |
| 12 |
| 14 |
| 16 |
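The descriptive statistics for these data, computed with Excel, are given below. The standard deviation is the sample standard deviation (divisor n - 1), and the quartiles are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. Excel's QUARTILE.INC function uses a different rule and would give different quartiles.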
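| Statistic | X |
|---|---|
| Mean | 11 |
| Median | 12 |
| Mode | None (no value repeats; Excel shows #N/A) |
| Standard Deviation (sample) | 4.47 |
| Range | 11 |
| Minimum | 5 |
| Maximum | 16 |
| Sum | 55 |
| Count | 5 |
| First Quartile (Q1) | 6.5 |
| Third Quartile (Q3) | 15 |
| IQR | 8.5 |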
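The table values can be checked by hand. The worked arithmetic below assumes the sample standard deviation (divisor n - 1, which is what Excel's STDEV.S returns and what reproduces 4.47) and quartiles taken as the medians of the lower half (5, 8) and upper half (14, 16), with the overall median 12 left out.

```latex
\begin{aligned}
\bar{x} &= \frac{5 + 8 + 12 + 14 + 16}{5} = \frac{55}{5} = 11 \\
s &= \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
   = \sqrt{\frac{(-6)^2 + (-3)^2 + 1^2 + 3^2 + 5^2}{5 - 1}}
   = \sqrt{\frac{80}{4}} = \sqrt{20} \approx 4.47 \\
Q_1 &= \operatorname{median}(5, 8) = 6.5, \qquad
Q_3 = \operatorname{median}(14, 16) = 15 \\
\mathrm{IQR} &= Q_3 - Q_1 = 15 - 6.5 = 8.5, \qquad
\text{Range} = 16 - 5 = 11
\end{aligned}
```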
Next, add an outlier of 49 to the data, so the data become:
| X |
|---|
| 5 |
| 8 |
| 12 |
| 14 |
| 16 |
| 49 |
Recomputing the same descriptive statistics for the data with the outlier gives:
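| Statistic | X |
|---|---|
| Mean | 17.33 |
| Median | 13 |
| Mode | None (no value repeats; Excel shows #N/A) |
| Standard Deviation (sample) | 16.02 |
| Range | 44 |
| Minimum | 5 |
| Maximum | 49 |
| Sum | 104 |
| Count | 6 |
| First Quartile (Q1) | 8 |
| Third Quartile (Q3) | 16 |
| IQR | 8 |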
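As a cross-check on the tables, the statistics can be recomputed with Python's standard `statistics` module. This is a minimal sketch: the helper names `quartiles`, `mode_or_none`, and `describe` are illustrative, and it assumes the sample standard deviation and quartiles taken as medians of the lower and upper halves, the rule that reproduces the quartiles shown here.

```python
from collections import Counter
from statistics import mean, median, stdev


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the
    sorted data; the overall median is left out when n is odd."""
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data[(n + 1) // 2 :])


def mode_or_none(values):
    """Return the most frequent value, or None when no value repeats
    (the case where Excel's MODE function shows #N/A)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None


def describe(values):
    """Collect the descriptive statistics used in the tables."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode_or_none(values),
        "std_dev": stdev(values),  # sample SD: divisor n - 1
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": len(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


with_outlier = [5, 8, 12, 14, 16, 49]
for name, value in describe(with_outlier).items():
    # Round floats for display; ints and None print as-is
    shown = round(value, 2) if isinstance(value, float) else value
    print(f"{name:>8}: {shown}")
```

Passing `[5, 8, 12, 14, 16]` to `describe` instead reproduces the statistics for the original data.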
The table below compares the statistics before and after adding the outlier:
| Statistic | Original Data | Data with Outlier | Change | Comparison |
|---|---|---|---|---|
| Mean | 11 | 17.33 | +6.33 | Large change (about 58% higher) |
| Median | 12 | 13 | +1 | Small change |
| Mode | None | None | n/a | No mode in either data set |
| Standard Deviation | 4.47 | 16.02 | +11.55 | Large change (more than tripled) |
| Range | 11 | 44 | +33 | Large change (four times as large) |
| Minimum | 5 | 5 | 0 | No change |
| Maximum | 16 | 49 | +33 | Large change |
| Sum | 55 | 104 | +49 | Large change |
| First Quartile | 6.5 | 8 | +1.5 | Small change |
| Third Quartile | 15 | 16 | +1 | Small change |
| IQR | 8.5 | 8 | -0.5 | Small change |
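The "Change" figures are simply the after-minus-before differences. A short sketch that computes them, assuming the same conventions as the tables (sample standard deviation; quartiles as medians of the lower and upper halves); the `quartiles` and `summary` helpers are illustrative names:

```python
from statistics import mean, median, stdev


def quartiles(values):
    # Medians of the lower and upper halves of the sorted data;
    # the overall median is left out when n is odd.
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data[(n + 1) // 2 :])


def summary(values):
    q1, q3 = quartiles(values)
    return {
        "Mean": mean(values),
        "Median": median(values),
        "Standard Deviation": stdev(values),  # sample SD
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "First Quartile": q1,
        "Third Quartile": q3,
        "IQR": q3 - q1,
    }


original = [5, 8, 12, 14, 16]
with_outlier = original + [49]

before, after = summary(original), summary(with_outlier)
# Change = value with outlier minus original value
changes = {name: after[name] - before[name] for name in before}

for name, change in changes.items():
    print(f"{name:<20} {before[name]:>7.2f} {after[name]:>7.2f} {change:>+8.2f}")
```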
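So the mean, standard deviation, range, maximum, and sum all changed substantially when the single outlier was added, while the median, first and third quartiles, and IQR each changed by 1.5 or less. The median and IQR are resistant measures: they depend only on the middle values of the ordered data, so one extreme observation moves them very little. The mean and standard deviation use every value, and the range uses the extreme values directly, so none of them is resistant. When data contain outliers, the median and IQR are therefore the more reliable summaries of center and spread.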
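Resistance can also be seen by making the outlier more extreme and watching which measures respond. The sketch below uses hypothetical outlier values 49, 100, and 1000 (only 49 comes from the example above) and assumes quartiles taken as medians of the lower and upper halves; it recomputes the mean, sample standard deviation, median, and IQR for each.

```python
from statistics import mean, median, stdev


def iqr(values):
    # Q3 - Q1, with quartiles as medians of the lower and upper halves
    # of the sorted data (overall median left out when n is odd)
    data = sorted(values)
    n = len(data)
    return median(data[(n + 1) // 2 :]) - median(data[: n // 2])


original = [5, 8, 12, 14, 16]

rows = []
for outlier in (49, 100, 1000):  # hypothetical outlier values
    data = original + [outlier]
    rows.append((outlier, mean(data), stdev(data), median(data), iqr(data)))

for outlier, m, s, med, spread in rows:
    print(f"outlier={outlier:>5}  mean={m:8.2f}  sd={s:8.2f}  "
          f"median={med:5.1f}  IQR={spread:4.1f}")
```

Any added value of 16 or more lands at the top of the sorted data without changing which values sit in the middle, so the median stays at 13 and the IQR at 8 however large the outlier gets, while the mean and standard deviation keep growing.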