Question

Different people have different criteria for determining outliers. The late statistician John Tukey suggested the 1.5*IQR rule as one criterion (singular form of criteria), and it has been widely accepted ever since. It says that an observed value is considered an outlier if it is either smaller than Q1-1.5*IQR or larger than Q3+1.5*IQR. People asked him why “1.5” was used, but Tukey simply answered, “Because 1 is too small and 2 is too large.” Suppose that the numerical variable of interest has a Normal distribution. Use parts (a) to (c) to answer the question in part (d).

  (a) What is the proportion (or percentage) of all values that would be considered outliers if we adopted the 1*IQR rule (i.e. either smaller than Q1-1*IQR or larger than Q3+1*IQR)? [4]

  (b) What is the proportion (or percentage) of all values that would be considered outliers if we adopted the 1.5*IQR rule? [2]

  (c) What is the proportion (or percentage) of all values that would be considered outliers if we adopted the 2*IQR rule? [2]

  (d) Briefly explain what Tukey meant when he said “Because 1 is too small and 2 is too large”. [2]

Solutions

Expert Solution

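(a) The proportion of outliers does not depend on the mean or standard deviation, so we can work with z-scores on the standard Normal distribution. Q1 and Q3 are the 25th and 75th percentiles: for Q1, z = -0.6745 and for Q3, z = 0.6745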

Q1 – 1 * IQR = Q1 – (Q3 – Q1) = 2Q1 – Q3 = 2(-0.6745) – 0.6745 = -2.0235

Q3 + 1 * IQR = Q3 + (Q3 – Q1) = 2Q3 – Q1 = 2(0.6745) – (-0.6745) = 2.0235

Area to the left of z = -2.0235 is 0.0215, and by symmetry the area to the right of z = 2.0235 is also 0.0215

So, the total proportion that would be considered outliers = 2 * 0.0215 = 0.043 (4.3%)
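The z-score shortcut in part (a) relies on the proportion being the same for every Normal distribution. A minimal sketch that checks this with Python's standard-library `statistics.NormalDist`; the function name `outlier_proportion` and the example mean and standard deviation (100 and 15) are illustrative assumptions, not part of the original question.

```python
from statistics import NormalDist


def outlier_proportion(dist: NormalDist, k: float) -> float:
    """Proportion of a Normal population outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = dist.inv_cdf(0.25)  # 25th percentile
    q3 = dist.inv_cdf(0.75)  # 75th percentile
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Left tail below the lower fence plus right tail above the upper fence.
    return dist.cdf(lower_fence) + (1.0 - dist.cdf(upper_fence))


standard = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=100.0, sigma=15.0)  # hypothetical example values

print(round(outlier_proportion(standard, 1.0), 4))
print(round(outlier_proportion(shifted, 1.0), 4))
```

Both calls should give the same value, about 0.043, which is why the solution can use z = ±0.6745 without knowing the actual mean or standard deviation.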

(b) Q1 – 1.5 * IQR = Q1 – 1.5(Q3 – Q1) = 2.5Q1 – 1.5Q3 = 2.5(-0.6745) – 1.5(0.6745) = -2.698

Q3 + 1.5 * IQR = Q3 + 1.5(Q3 – Q1) = 2.5Q3 – 1.5Q1 = 2.5(0.6745) – 1.5(-0.6745) = 2.698

Area to the left of z = -2.698 is 0.0035, and by symmetry the area to the right of z = 2.698 is also 0.0035

So, the total proportion that would be considered outliers = 2 * 0.0035 = 0.007 (0.7%)

(c) Q1 – 2 * IQR = Q1 – 2(Q3 – Q1) = 3Q1 – 2Q3 = 3(-0.6745) – 2(0.6745) = -3.3725

Q3 + 2 * IQR = Q3 + 2(Q3 – Q1) = 3Q3 – 2Q1 = 3(0.6745) – 2(-0.6745) = 3.3725

Area to the left of z = -3.3725 is about 0.0004 (to four decimal places), and by symmetry the area to the right of z = 3.3725 is the same

So, the total proportion that would be considered outliers is approximately 2 * 0.0004 = 0.0008 (about 0.08%)
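Parts (a) to (c) follow one pattern: with multiplier k, the fences sit at z = ±(1 + 2k)(0.6745), so the outlier proportion is twice the area to the left of the lower fence. A short sketch of that calculation using the standard-library `statistics.NormalDist` instead of a z-table; the function name `tukey_outlier_fraction` is an illustrative assumption.

```python
from statistics import NormalDist

Z = NormalDist()         # standard Normal distribution
z_q3 = Z.inv_cdf(0.75)   # z-score of Q3, about 0.6745 (Q1 is its negative)


def tukey_outlier_fraction(k: float) -> float:
    """Fraction of Normal data beyond Q1 - k*IQR or Q3 + k*IQR."""
    # IQR = 2 * z_q3, so the upper fence is z_q3 + k * 2 * z_q3.
    fence = (1.0 + 2.0 * k) * z_q3
    # Two symmetric tails.
    return 2.0 * Z.cdf(-fence)


for k in (1.0, 1.5, 2.0):
    fence = (1.0 + 2.0 * k) * z_q3
    print(f"k = {k}: fence z = {fence:.4f}, outliers = {tukey_outlier_fraction(k):.4%}")
```

The values should agree with the table-based answers above up to rounding. For k = 2 the unrounded result is slightly below 0.08%, because the table value 0.0004 is itself rounded up.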

(d) As the calculations above show, the 1*IQR rule would flag a lot of ordinary Normal data as outliers (about 4.3%), whereas the 2*IQR rule would flag hardly any (about 0.08%). That is probably what Tukey meant by “1 is too small and 2 is too large”: a multiplier of 1 puts the fences too close to the quartiles, and a multiplier of 2 puts them too far away. The 1.5*IQR rule, which flags about 0.7% of Normal data, is a reasonable compromise between the two.

