The accompanying data are the percentage of babies born prematurely in a particular year for the 50 U.S. states and the District of Columbia (DC).

State          Premature Percent   State            Premature Percent   State            Premature Percent
Alabama        12.3                Kentucky         11.3                North Dakota     9.0
Alaska         9.1                 Louisiana        12.9                Ohio             10.9
Arizona        9.6                 Maine            9.0                 Oklahoma         10.9
Arkansas       10.6                Maryland         10.7                Oregon           8.3
California     8.9                 Massachusetts    9.2                 Pennsylvania     10.0
Colorado       9.0                 Michigan         10.4                Rhode Island     9.2
Connecticut    9.8                 Minnesota        9.3                 South Carolina   11.4
Delaware       9.9                 Mississippi      13.5                South Dakota     9.1
DC             10.2                Missouri         10.4                Tennessee        11.4
Florida        10.5                Montana          9.9                 Texas            11.0
Georgia        11.4                Nebraska         9.7                 Utah             9.7
Hawaii         10.6                Nevada           10.7                Vermont          8.5
Idaho          8.8                 New Hampshire    8.8                 Virginia         9.8
Illinois       10.7                New Jersey       10.2                Washington       8.7
Indiana        10.3                New Mexico       9.8                 West Virginia    11.4
Iowa           9.9                 New York         9.5                 Wisconsin        9.8
Kansas         9.3                 North Carolina   10.3                Wyoming          11.8

(a) The smallest value in the data set is 8.3 (Oregon), and the largest value is 13.5 (Mississippi). Are these values outliers? Explain.
Any observations smaller than ____% or larger than ____% are considered outliers. Therefore, Oregon's data value (8.3%) ____ an outlier and Mississippi's data value (13.5%) ____ an outlier.

(b) Construct a boxplot for this data set. Comment on the interesting features of the plot.
The boxplot shows ____ and the distribution is ____. The minimum value is ____%, the lower quartile is ____%, the median is ____%, the upper quartile is ____%, and the maximum value is ____%.
DETECTING OUTLIERS:
The smallest value in the data set is 8.3 (Oregon), and the largest value is 13.5 (Mississippi). To check whether these values are outliers, we use the interquartile range (IQR).
The interquartile range measures the spread of the "middle fifty" percent of a data set. Whereas the range measures the distance between the smallest and largest values, the interquartile range measures how widely the bulk of the values is spread.
The interquartile range formula is the first quartile subtracted from the third quartile:
IQR = Q3 – Q1
A commonly used rule says that a data point is an outlier if it is more than 1.5*IQR above the third quartile or more than 1.5*IQR below the first quartile. In other words, low outliers are below Q1 - 1.5*IQR and high outliers are above Q3 + 1.5*IQR.
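The 1.5*IQR rule can be sketched as a small helper. This is only an illustration: the function names `iqr_fences` and `is_outlier`, the parameter `k`, and the quartiles 10 and 20 in the usage lines are assumptions chosen for the example, not values from this data set.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k*IQR outlier rule."""
    iqr = q3 - q1                      # IQR = Q3 - Q1
    return q1 - k * iqr, q3 + k * iqr  # Q1 - k*IQR, Q3 + k*IQR


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


# Illustrative quartiles only (not from the premature-birth data).
print(iqr_fences(10, 20))      # (-5.0, 35.0)
print(is_outlier(40, 10, 20))  # True
print(is_outlier(12, 10, 20))  # False
```

A value exactly on a fence is not flagged, which matches the "more than 1.5*IQR" wording of the rule.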
Now let's calculate the five-number summary: minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value.
Minimum value:
The minimum value is 8.3
Median:
To find the median, first sort the data points (premature percent) in increasing order:
8.3, 8.5, 8.7, 8.8, 8.8, 8.9, 9, 9, 9, 9.1, 9.1, 9.2, 9.2, 9.3, 9.3, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, 9.8, 9.9, 9.9, 9.9, 10, 10.2, 10.2, 10.3, 10.3, 10.4, 10.4, 10.5, 10.6, 10.6, 10.7, 10.7, 10.7, 10.9, 10.9, 11, 11.3, 11.4, 11.4, 11.4, 11.4, 11.8, 12.3, 12.9, 13.5
The total number of data points is N = 51, which is odd, so the median is the middle value (i.e., the 26th value), which is 9.9. Thus MEDIAN = 9.9
First quartile (Q1):
The first quartile (Q1) is the median of the lower half of the data values, excluding the overall median (the 26th value, 9.9).
First half of the data: 8.3, 8.5, 8.7, 8.8, 8.8, 8.9, 9, 9, 9, 9.1, 9.1, 9.2, 9.2, 9.3, 9.3, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, 9.8, 9.9, 9.9
Since the first half contains 25 values, which is odd, its median is the central (13th) value, which is 9.2.
Thus first quartile (Q1) = 9.2
Third quartile (Q3):
The third quartile (Q3) is the median of the upper half of the data values, excluding the overall median (the 26th value, 9.9).
Second half of the data: 10, 10.2, 10.2, 10.3, 10.3, 10.4, 10.4, 10.5, 10.6, 10.6, 10.7, 10.7, 10.7, 10.9, 10.9, 11, 11.3, 11.4, 11.4, 11.4, 11.4, 11.8, 12.3, 12.9, 13.5
Since the second half contains 25 values, which is odd, its median is the central (13th) value, which is 10.7.
Thus third quartile (Q3) = 10.7
Maximum Value:
The maximum value is 13.5
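As a cross-check, the five-number summary can be reproduced with a short script. This is a minimal sketch that follows the same convention as the hand calculation (the overall median is left out of both halves when N is odd); the names `percents` and `quartiles_excluding_median` are assumptions made for this example. Library quartile routines often use other interpolation conventions and may return slightly different quartiles.

```python
from statistics import median

# The 51 premature-birth percentages, in the order they appear in the table.
percents = [
    12.3, 11.3, 9.0, 9.1, 12.9, 10.9, 9.6, 9.0, 10.9,
    10.6, 10.7, 8.3, 8.9, 9.2, 10.0, 9.0, 10.4, 9.2,
    9.8, 9.3, 11.4, 9.9, 13.5, 9.1, 10.2, 10.4, 11.4,
    10.5, 9.9, 11.0, 11.4, 9.7, 9.7, 10.6, 10.7, 8.5,
    8.8, 8.8, 9.8, 10.7, 10.2, 8.7, 10.3, 9.8, 11.4,
    9.9, 9.5, 9.8, 9.3, 10.3, 11.8,
]


def quartiles_excluding_median(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the median if n is odd
    return median(lower), median(data), median(upper)


q1, med, q3 = quartiles_excluding_median(percents)
print(min(percents), q1, med, q3, max(percents))  # 8.3 9.2 9.9 10.7 13.5
```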
TO FIND OUTLIERS:
An observation is an outlier if it falls more than 1.5(IQR) above the upper quartile or more than 1.5(IQR) below the lower quartile. In other words if an observation falls above Q3+1.5(IQR) or falls below Q1-1.5(IQR), it is an outlier.
Now IQR = Q3-Q1
= 10.7-9.2
IQR = 1.5
Thus Q1 - 1.5(IQR) = 9.2 - 1.5(1.5)
= 9.2 - 2.25
Q1 - 1.5(IQR) = 6.95
And Q3 + 1.5(IQR) = 10.7 + 1.5(1.5)
= 10.7 + 2.25
Q3 + 1.5(IQR) = 12.95
Thus the smallest value, 8.3, lies between Q1 - 1.5(IQR) = 6.95 and Q3 + 1.5(IQR) = 12.95, so 8.3 (Oregon) is not an outlier. The largest value, 13.5 (Mississippi), lies above Q3 + 1.5(IQR) = 12.95, so 13.5 is an outlier.
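The same check can be run over every state at once. This is a sketch that takes Q1 = 9.2 and Q3 = 10.7 from the calculation above as given; the variable names (`premature`, `lower_fence`, `upper_fence`, `outliers`) are assumptions for the example, and the fences are rounded only for display because of floating-point arithmetic.

```python
# Premature-birth percentages by state, as listed in the table.
premature = {
    "Alabama": 12.3, "Kentucky": 11.3, "North Dakota": 9.0,
    "Alaska": 9.1, "Louisiana": 12.9, "Ohio": 10.9,
    "Arizona": 9.6, "Maine": 9.0, "Oklahoma": 10.9,
    "Arkansas": 10.6, "Maryland": 10.7, "Oregon": 8.3,
    "California": 8.9, "Massachusetts": 9.2, "Pennsylvania": 10.0,
    "Colorado": 9.0, "Michigan": 10.4, "Rhode Island": 9.2,
    "Connecticut": 9.8, "Minnesota": 9.3, "South Carolina": 11.4,
    "Delaware": 9.9, "Mississippi": 13.5, "South Dakota": 9.1,
    "DC": 10.2, "Missouri": 10.4, "Tennessee": 11.4,
    "Florida": 10.5, "Montana": 9.9, "Texas": 11.0,
    "Georgia": 11.4, "Nebraska": 9.7, "Utah": 9.7,
    "Hawaii": 10.6, "Nevada": 10.7, "Vermont": 8.5,
    "Idaho": 8.8, "New Hampshire": 8.8, "Virginia": 9.8,
    "Illinois": 10.7, "New Jersey": 10.2, "Washington": 8.7,
    "Indiana": 10.3, "New Mexico": 9.8, "West Virginia": 11.4,
    "Iowa": 9.9, "New York": 9.5, "Wisconsin": 9.8,
    "Kansas": 9.3, "North Carolina": 10.3, "Wyoming": 11.8,
}

q1, q3 = 9.2, 10.7               # quartiles from the hand calculation
iqr = q3 - q1                    # 1.5
lower_fence = q1 - 1.5 * iqr     # about 6.95
upper_fence = q3 + 1.5 * iqr     # about 12.95

# Keep only the states that fall outside the fences.
outliers = {state: pct for state, pct in premature.items()
            if pct < lower_fence or pct > upper_fence}

print(round(lower_fence, 2), round(upper_fence, 2))  # 6.95 12.95
print(outliers)                                      # {'Mississippi': 13.5}
```

Louisiana's 12.9 falls just below the upper fence of 12.95, so Mississippi is the only value flagged.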