The data below represent the cost in dollars that each of 40 cafeteria customers paid for their salad, as observed by statistics students in Fall 2018.
2.49 |
2.89 |
3.11 |
3.34 |
3.38 |
3.80 |
3.83 |
3.88 |
3.93 |
4.06 |
4.09 |
4.19 |
4.23 |
4.29 |
4.33 |
4.38 |
4.75 |
4.96 |
5.00 |
5.08 |
5.20 |
5.25 |
5.28 |
5.45 |
5.65 |
5.70 |
5.72 |
5.78 |
5.81 |
5.93 |
5.98 |
6.15 |
6.28 |
6.40 |
6.56 |
6.59 |
6.79 |
6.90 |
7.08 |
7.24 |
The data are entered into Excel in cells A1 to A40.
Mean | =AVERAGE(A1:A40) | 5.04375 |
Median(Q2) | =MEDIAN(A1:A40) | 5.14 |
s.d. | =STDEV(A1:A40) | 1.241889973 |
Q1 | =PERCENTILE(A1:A40,0.25) | 4.0825 |
Q3 | =PERCENTILE(A1:A40,0.75) | 5.9425 |
IQR | =Q3-Q1 | 1.86 |
Range | =MAX(A1:A40)-MIN(A1:A40) | 4.75 |
Q3-Q2 | =Q3-Q2 | 0.8025 |
Q2-Q1 | =Q2-Q1 | 1.0575 |
Lower fence | =Q1-1.5*IQR | 1.2925 |
Upper fence | =Q3+1.5*IQR | 8.7325 |
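The same summary can be cross-checked outside Excel. The following is a minimal Python sketch, not the method used above: the names `costs` and `summarize` are illustrative, the minimum is taken as 2.49 (the value consistent with the mean and range reported in the table), and `method="inclusive"` is chosen because it interpolates between ordered values the same way Excel's PERCENTILE does.

```python
import statistics

# Salad costs in dollars, sorted ascending. The minimum is taken as 2.49,
# the value consistent with the mean and range reported in the table.
costs = [
    2.49, 2.89, 3.11, 3.34, 3.38, 3.80, 3.83, 3.88, 3.93, 4.06,
    4.09, 4.19, 4.23, 4.29, 4.33, 4.38, 4.75, 4.96, 5.00, 5.08,
    5.20, 5.25, 5.28, 5.45, 5.65, 5.70, 5.72, 5.78, 5.81, 5.93,
    5.98, 6.15, 6.28, 6.40, 6.56, 6.59, 6.79, 6.90, 7.08, 7.24,
]


def summarize(data):
    """Return the summary measures from the table as a dict."""
    # The "inclusive" method interpolates at position (n - 1) * p,
    # which is the rule Excel's PERCENTILE follows.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample s.d., like Excel's STDEV
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "range": max(data) - min(data),
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


summary = summarize(costs)
for name, value in summary.items():
    print(f"{name:12s} {value:.4f}")
```

Printing to four decimals keeps floating-point noise out of the comparison with the spreadsheet values.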
The shape of the distribution of the salad costs is approximately symmetric, since Q3-Q2 ≈ Q2-Q1 (0.8025 versus 1.0575) and the mean (5.04) is close to the median (5.14).
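One optional way to put a number on this comparison is the quartile (Bowley) skewness coefficient, which divides the difference between the two quartile gaps by the IQR. This is a supplementary sketch rather than part of the original answer; the function name `quartile_skewness` is illustrative, and the inputs are the quartiles from the table.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile (Bowley) skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).

    The result lies between -1 and 1; values near 0 indicate that the
    two quartile gaps are similar in size.
    """
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


# Quartiles taken from the summary table.
coef = quartile_skewness(4.0825, 5.14, 5.9425)
print(round(coef, 3))
```

A coefficient that is small in magnitude is consistent with calling the shape approximately symmetric; a negative sign reflects that the lower quartile gap is the larger of the two.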
The maximum value is 7.24 and the minimum is 2.49, and both lie between the two fences (1.2925 and 8.7325). No points fall outside the fences, hence there are no outliers.
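The fence check can be sketched in Python as follows. This is an illustration under stated assumptions: `find_outliers` is a hypothetical helper, the minimum is taken as 2.49 as in the text, quartiles use the same interpolation rule as Excel's PERCENTILE, and the $9.50 value in the last line is invented purely to show what a flagged point looks like.

```python
import statistics

# Salad costs in dollars (minimum taken as 2.49, as in the text).
costs = [
    2.49, 2.89, 3.11, 3.34, 3.38, 3.80, 3.83, 3.88, 3.93, 4.06,
    4.09, 4.19, 4.23, 4.29, 4.33, 4.38, 4.75, 4.96, 5.00, 5.08,
    5.20, 5.25, 5.28, 5.45, 5.65, 5.70, 5.72, 5.78, 5.81, 5.93,
    5.98, 6.15, 6.28, 6.40, 6.56, 6.59, 6.79, 6.90, 7.08, 7.24,
]


def find_outliers(data, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


print(find_outliers(costs))           # empty list: nothing outside the fences
print(find_outliers(costs + [9.50]))  # the made-up 9.50 value is flagged
```

Note that the fences are recomputed from whatever data is passed in, so adding a value shifts the quartiles slightly before the check is made.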
After omitting the highest and lowest costs, with the remaining 38 values in cells A1 to A38, the summary of the data is:
Mean | =AVERAGE(A1:A38) | 5.053157895 |
Median(Q2) | =MEDIAN(A1:A38) | 5.14 |
s.d. | =STDEV(A1:A38) | 1.148451614 |
Q1 | =PERCENTILE(A1:A38,0.25) | 4.115 |
Q3 | =PERCENTILE(A1:A38,0.75) | 5.9 |
IQR | =Q3-Q1 | 1.785 |
Range | =MAX(A1:A38)-MIN(A1:A38) | 4.19 |
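The effect of dropping the two extreme values can be reproduced with a short Python sketch. It is an illustration only: `spread_summary`, `full_stats` and `trimmed_stats` are hypothetical names, the minimum is taken as 2.49 as in the text, and the quartiles use the interpolation rule that matches Excel's PERCENTILE.

```python
import statistics

# Salad costs in dollars (minimum taken as 2.49, as in the text).
costs = [
    2.49, 2.89, 3.11, 3.34, 3.38, 3.80, 3.83, 3.88, 3.93, 4.06,
    4.09, 4.19, 4.23, 4.29, 4.33, 4.38, 4.75, 4.96, 5.00, 5.08,
    5.20, 5.25, 5.28, 5.45, 5.65, 5.70, 5.72, 5.78, 5.81, 5.93,
    5.98, 6.15, 6.28, 6.40, 6.56, 6.59, 6.79, 6.90, 7.08, 7.24,
]


def spread_summary(data):
    """Centre and spread measures for one data set."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }


# Drop one observation from each end of the sorted data.
trimmed = sorted(costs)[1:-1]

full_stats = spread_summary(costs)
trimmed_stats = spread_summary(trimmed)
for name in full_stats:
    print(f"{name:7s} full={full_stats[name]:.4f}  trimmed={trimmed_stats[name]:.4f}")
```

Printing the two summaries side by side makes it easy to see which measures move when the extremes are removed and which do not.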
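The median depends only on the middle of the ordered data, not on the maximum and minimum values. Removing one observation from each end leaves an even number of observations (38) with the same two middle values, 5.08 and 5.20, so the median remains 5.14.
Measures of variability, by contrast, depend on how far the values lie from the centre. The maximum and minimum are the points farthest from the mean, so once they are deleted the remaining data are more tightly clustered around the mean and the variability decreases: the standard deviation falls from about 1.24 to about 1.15, the IQR from 1.86 to 1.785, and the range also shrinks. The mean itself barely changes (5.044 to 5.053) because the two deleted values lie at roughly similar distances on either side of it.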