The table shows the weekly income of 20 randomly selected full-time students. If a student did not work, a zero was entered.
0 463 0 0 501 103 527 231 329 385 3383 197 517 165 248 0 572 412 258 93
(a) Check the data set for outliers.
(b) Draw a histogram of the data.
(c) Provide an explanation for any outliers.
A. The outlier(s) is/are _____
(Use a comma to separate answers as needed.)
B. There are no outliers.
(c) Choose the possible reason(s) for any outlier(s) below. Select all that apply.
A. Data entry error
B. A student providing false information
C. A student with unusually high income
D. None of the above
E. There are no outliers.
Answer A:
Arranging the data in ascending order:
0, 0, 0, 0, 93, 103, 165, 197, 231, 248, 258, 329, 385, 412, 463, 501, 517, 527, 572, 3383
The number of observations is n = 20.
For an even number of observations, the median is the average of the (n/2)th and (n/2 + 1)th observations.
The median of these observations is (10th + 11th observations) / 2 = (248 + 258) / 2 = 253.
Now take the median of the first half of the sorted data, that is, of the following set:
0, 0, 0, 0, 93, 103, 165, 197, 231, 248
The first quartile (Q1) is (5th + 6th observations) / 2 = (93 + 103) / 2 = 98.
Now take the median of the second half of the sorted data, that is, of the following set:
258, 329, 385, 412, 463, 501, 517, 527, 572, 3383
The third quartile (Q3) is (5th + 6th observations) / 2 = (463 + 501) / 2 = 482.
The interquartile range (IQR) = third quartile - first quartile = 482 - 98 = 384.
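The quartile arithmetic above can be checked with a short Python sketch. It assumes the data values exactly as listed in the question and uses the same "median of each half" rule as the worked answer, which only splits cleanly when the number of observations is even. The variable names are illustrative.

```python
from statistics import median

# Weekly incomes as listed in the question statement.
incomes = [0, 463, 0, 0, 501, 103, 527, 231, 329, 385,
           3383, 197, 517, 165, 248, 0, 572, 412, 258, 93]

data = sorted(incomes)
n = len(data)                 # 20 observations (even count assumed)

lower_half = data[: n // 2]   # first 10 sorted values
upper_half = data[n // 2 :]   # last 10 sorted values

med = median(data)            # average of the 10th and 11th values
q1 = median(lower_half)       # median of the lower half
q3 = median(upper_half)       # median of the upper half
iqr = q3 - q1

print(med, q1, q3, iqr)       # 253.0 98.0 482.0 384.0
```

Other quartile conventions (for example, interpolation-based ones used by some software) can give slightly different Q1 and Q3 values for the same data.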
Now let t = 1.5 x IQR = 1.5 x 384 = 576.
Subtracting t from the first quartile (Q1) and adding t to the third quartile (Q3), we get:
Q3 + t = 482 + 576 = 1058
Q1 - t = 98 - 576 = -478
If every value in the data set lies in the interval [-478, 1058], there are no outliers; any value outside this interval is an outlier.
Every value lies in this interval except 3383, which is greater than 1058. The outlier is 3383.
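The fence check can be expressed as a small function. This is a minimal sketch, assuming Q1 = 98 and Q3 = 482 from the working above; the function name `fence_outliers` and its parameter `k` are hypothetical and chosen only for illustration.

```python
def fence_outliers(values, q1, q3, k=1.5):
    """Return (lower fence, upper fence, values outside the fences)."""
    t = k * (q3 - q1)                  # t = 1.5 x IQR by default
    lower, upper = q1 - t, q3 + t
    outside = [v for v in values if v < lower or v > upper]
    return lower, upper, outside

# Weekly incomes as listed in the question statement.
incomes = [0, 463, 0, 0, 501, 103, 527, 231, 329, 385,
           3383, 197, 517, 165, 248, 0, 572, 412, 258, 93]

lower, upper, outliers = fence_outliers(incomes, q1=98, q3=482)
print(lower, upper, outliers)          # -478.0 1058.0 [3383]
```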
Answer B:
The frequency distribution of the data set is:
Class Interval | Frequency
0-250 | 10
250-500 | 5
500-750 | 4
750-1000 | 0
1000-1250 | 0
1250-1500 | 0
1500-1750 | 0
1750-2000 | 0
2000-2250 | 0
2250-2500 | 0
2500-2750 | 0
2750-3000 | 0
3000-3250 | 0
3250-3500 | 1
Total | 20
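The class frequencies can be reproduced with a few lines of Python. This sketch assumes a class width of 250 starting at 0 and, as a convention not stated in the answer, places a value that falls exactly on a boundary in the higher class. None of the nonzero data values sits on a boundary, so the choice does not affect the counts here.

```python
from collections import Counter

# Weekly incomes as listed in the question statement.
incomes = [0, 463, 0, 0, 501, 103, 527, 231, 329, 385,
           3383, 197, 517, 165, 248, 0, 572, 412, 258, 93]

width = 250
# Integer division maps each value to the index of its class interval.
counts = Counter(v // width for v in incomes)

n_classes = max(counts) + 1   # include empty classes up to the largest value
table = [(i * width, (i + 1) * width, counts.get(i, 0))
         for i in range(n_classes)]

for low, high, freq in table:
    print(f"{low}-{high} | {freq}")
print(f"Total | {sum(freq for _, _, freq in table)}")
```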
The histogram of the data set is drawn with the class intervals on the X-axis and the frequency on the Y-axis.
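As a rough stand-in for a plotted histogram, the frequencies can be rendered as a text chart. This is only a sketch: the frequencies are copied from the frequency distribution in this answer, the bars run horizontally rather than vertically, and the `#` bar character is an arbitrary choice.

```python
# Frequencies from the frequency distribution (class width 250, starting at 0).
freqs = [10, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
width = 250

lines = []
for i, f in enumerate(freqs):
    label = f"{i * width}-{(i + 1) * width}"
    # One '#' per observation; the label is right-aligned to 9 characters.
    lines.append(f"{label:>9} | {'#' * f}")

print("\n".join(lines))
```

The long run of empty classes between 750 and 3250 makes the isolated bar at 3250-3500 stand out, which is the visual counterpart of the outlier found in part (a).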
Answer C:
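Options A, B, and C are all possible reasons for the outlier: a data entry error, a student providing false information, or a student with an unusually high income.
Any one of the three could explain the value 3383.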