Question

The accompanying data are the percentage of babies born prematurely in a particular year for the 50 U.S. states and the District of Columbia (DC).

State           Premature Percent    State           Premature Percent    State           Premature Percent
Alabama         12.3                 Kentucky        11.3                 North Dakota    9.0
Alaska          9.1                  Louisiana       12.9                 Ohio            10.9
Arizona         9.6                  Maine           9.0                  Oklahoma        10.9
Arkansas        10.6                 Maryland        10.7                 Oregon          8.3
California      8.9                  Massachusetts   9.2                  Pennsylvania    10.0
Colorado        9.0                  Michigan        10.4                 Rhode Island    9.2
Connecticut     9.8                  Minnesota       9.3                  South Carolina  11.4
Delaware        9.9                  Mississippi     13.5                 South Dakota    9.1
DC              10.2                 Missouri        10.4                 Tennessee       11.4
Florida         10.5                 Montana         9.9                  Texas           11.0
Georgia         11.4                 Nebraska        9.7                  Utah            9.7
Hawaii          10.6                 Nevada          10.7                 Vermont         8.5
Idaho           8.8                  New Hampshire   8.8                  Virginia        9.8
Illinois        10.7                 New Jersey      10.2                 Washington      8.7
Indiana         10.3                 New Mexico      9.8                  West Virginia   11.4
Iowa            9.9                  New York        9.5                  Wisconsin       9.8
Kansas          9.3                  North Carolina  10.3                 Wyoming         11.8

(a) The smallest value in the data set is 8.3 (Oregon), and the largest value is 13.5 (Mississippi). Are these values outliers? Explain.

Any observations smaller than ____% or larger than ____% are considered outliers. Therefore, Oregon's data value (8.3%) ____ an outlier and Mississippi's data value (13.5%) ____ an outlier.

(b) Construct a boxplot for this data set. Comment on the interesting features of the plot.

The boxplot shows ____ and the distribution is ____. The minimum value is ____%, the lower quartile is ____%, the median is ____%, the upper quartile is ____%, and the maximum value is ____%.

Solutions

Expert Solution

DETECTING OUTLIER:

   Given that the smallest value in the data set is 8.3 (Oregon) and the largest value is 13.5 (Mississippi), we use the interquartile range (IQR) to check whether these values are outliers.

The interquartile range describes the spread of the "middle fifty percent" of a data set. Whereas the range measures the distance between the smallest and largest values, the interquartile range measures the spread of the central half of the values, where the bulk of the data lie.

The interquartile range formula is the first quartile subtracted from the third quartile:

IQR = Q3 – Q1

A commonly used rule says that a data point is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile. In other words, low outliers are below Q1 − 1.5*IQR and high outliers are above Q3 + 1.5*IQR.

Now let's calculate the five-number summary: minimum value, first quartile (Q1), median, third quartile (Q3) and maximum value.
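The rule above can be sketched as a small Python helper. The function names `iqr_fences` and `is_outlier` are illustrative, not from the original solution, and the quartiles in the example are made-up numbers chosen only to show the arithmetic.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


# Illustrative quartiles only (not the premature-birth data):
# Q1 = 4 and Q3 = 10 give IQR = 6, so the fences are 4 - 9 and 10 + 9.
print(iqr_fences(4, 10))      # (-5.0, 19.0)
print(is_outlier(20, 4, 10))  # True
print(is_outlier(12, 4, 10))  # False
```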

Minimum value:

   The minimum value is 8.3

Median:

   In order to find the median, let's sort the data points (premature percent).

8.3, 8.5, 8.7, 8.8, 8.8, 8.9, 9, 9, 9, 9.1, 9.1, 9.2, 9.2, 9.3, 9.3, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, 9.8, 9.9, 9.9, 9.9, 10, 10.2, 10.2, 10.3, 10.3, 10.4, 10.4, 10.5, 10.6, 10.6, 10.7, 10.7, 10.7, 10.9, 10.9, 11, 11.3, 11.4, 11.4, 11.4, 11.4, 11.8, 12.3, 12.9, 13.5

Since the total number of data points is N = 51, which is odd, the median is the middle value, i.e. the (51 + 1)/2 = 26th value, which is 9.9. Thus MEDIAN = 9.9
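The sorting and middle-value step can be checked with a short Python sketch. The variable names are illustrative, and the values are typed in from the table in the question, reading across each row.

```python
import statistics

# Premature-birth percentages from the question, in table order.
premature = [
    12.3, 11.3, 9.0, 9.1, 12.9, 10.9, 9.6, 9.0, 10.9, 10.6, 10.7, 8.3,
    8.9, 9.2, 10.0, 9.0, 10.4, 9.2, 9.8, 9.3, 11.4, 9.9, 13.5, 9.1,
    10.2, 10.4, 11.4, 10.5, 9.9, 11.0, 11.4, 9.7, 9.7, 10.6, 10.7, 8.5,
    8.8, 8.8, 9.8, 10.7, 10.2, 8.7, 10.3, 9.8, 11.4, 9.9, 9.5, 9.8,
    9.3, 10.3, 11.8,
]

ordered = sorted(premature)
n = len(ordered)

# For an odd count, the median is the value at position (n + 1) / 2,
# which is index (n - 1) // 2 when counting from zero.
median_value = ordered[(n - 1) // 2]

print(n)                             # 51
print(median_value)                  # 9.9
print(statistics.median(premature))  # 9.9
```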

First quartile (Q1):

  The first quartile (Q1) is the median of the first half of the data values, excluding the overall median (the 26th value, 9.9).

First half of the data: 8.3, 8.5, 8.7, 8.8, 8.8, 8.9, 9, 9, 9, 9.1, 9.1, 9.2, 9.2, 9.3, 9.3, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, 9.8, 9.9, 9.9

Since the first half contains 25 values, which is odd, its median is the central (13th) value, which is 9.2.

Thus first quartile (Q1)= 9.2

Third quartile (Q3):

   The third quartile (Q3) is the median of the second half of the data values, excluding the overall median (the 26th value, 9.9).

Second half of data: 10, 10.2, 10.2, 10.3, 10.3, 10.4, 10.4, 10.5, 10.6, 10.6, 10.7, 10.7, 10.7, 10.9, 10.9, 11, 11.3, 11.4, 11.4, 11.4, 11.4, 11.8, 12.3, 12.9, 13.5

Since the second half contains 25 values, which is odd, its median is the central (13th) value, which is 10.7.

Thus Third quartile (Q3) = 10.7
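The median-of-halves method used for Q1 and Q3 can be sketched in Python as follows. The helper name `quartiles_excluding_median` is an illustrative assumption. Other quartile conventions exist (for example, interpolating methods in statistical software) and may give slightly different values on other data sets.

```python
import statistics


def quartiles_excluding_median(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    When the count is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(upper)


# Premature-birth percentages from the question, in table order.
premature = [
    12.3, 11.3, 9.0, 9.1, 12.9, 10.9, 9.6, 9.0, 10.9, 10.6, 10.7, 8.3,
    8.9, 9.2, 10.0, 9.0, 10.4, 9.2, 9.8, 9.3, 11.4, 9.9, 13.5, 9.1,
    10.2, 10.4, 11.4, 10.5, 9.9, 11.0, 11.4, 9.7, 9.7, 10.6, 10.7, 8.5,
    8.8, 8.8, 9.8, 10.7, 10.2, 8.7, 10.3, 9.8, 11.4, 9.9, 9.5, 9.8,
    9.3, 10.3, 11.8,
]

q1, q3 = quartiles_excluding_median(premature)
print(q1, q3)  # 9.2 10.7
```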

Maximum Value:

    The maximum value is 13.5

TO FIND OUTLIER:

An observation is an outlier if it falls more than 1.5(IQR) above the upper quartile or more than 1.5(IQR) below the lower quartile. In other words, if an observation falls above Q3 + 1.5(IQR) or below Q1 - 1.5(IQR), it is an outlier.

Now IQR = Q3-Q1

= 10.7-9.2

   IQR = 1.5

Thus Q1 - 1.5(IQR) = 9.2 - 1.5(1.5)

= 9.2 - 2.25

Q1 - 1.5(IQR) = 6.95

And Q3 + 1.5(IQR) = 10.7 + 1.5(1.5)

   = 10.7 + 2.25

   Q3 + 1.5(IQR) = 12.95

Thus the smallest value, 8.3 (Oregon), lies between Q1 - 1.5(IQR) = 6.95 and Q3 + 1.5(IQR) = 12.95, so it is not an outlier. The largest value, 13.5 (Mississippi), lies above Q3 + 1.5(IQR) = 12.95, so it is an outlier.
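As a cross-check, the following Python sketch recomputes both fences from Q1 = 9.2 and Q3 = 10.7 and lists every state outside them. The names are illustrative, and `Decimal` is used here only as a convenience to keep the tenths exact. It is not part of the original solution.

```python
from decimal import Decimal

# Premature-birth percentages from the question, kept as strings so that
# Decimal arithmetic is exact (no binary floating-point rounding).
percent_by_state = {
    "Alabama": "12.3", "Alaska": "9.1", "Arizona": "9.6", "Arkansas": "10.6",
    "California": "8.9", "Colorado": "9.0", "Connecticut": "9.8",
    "Delaware": "9.9", "DC": "10.2", "Florida": "10.5", "Georgia": "11.4",
    "Hawaii": "10.6", "Idaho": "8.8", "Illinois": "10.7", "Indiana": "10.3",
    "Iowa": "9.9", "Kansas": "9.3", "Kentucky": "11.3", "Louisiana": "12.9",
    "Maine": "9.0", "Maryland": "10.7", "Massachusetts": "9.2",
    "Michigan": "10.4", "Minnesota": "9.3", "Mississippi": "13.5",
    "Missouri": "10.4", "Montana": "9.9", "Nebraska": "9.7", "Nevada": "10.7",
    "New Hampshire": "8.8", "New Jersey": "10.2", "New Mexico": "9.8",
    "New York": "9.5", "North Carolina": "10.3", "North Dakota": "9.0",
    "Ohio": "10.9", "Oklahoma": "10.9", "Oregon": "8.3",
    "Pennsylvania": "10.0", "Rhode Island": "9.2", "South Carolina": "11.4",
    "South Dakota": "9.1", "Tennessee": "11.4", "Texas": "11.0",
    "Utah": "9.7", "Vermont": "8.5", "Virginia": "9.8", "Washington": "8.7",
    "West Virginia": "11.4", "Wisconsin": "9.8", "Wyoming": "11.8",
}

# Quartiles taken from the worked solution.
q1, q3 = Decimal("9.2"), Decimal("10.7")
iqr = q3 - q1
lower_fence = q1 - Decimal("1.5") * iqr
upper_fence = q3 + Decimal("1.5") * iqr

# Keep only the states whose value falls outside [lower_fence, upper_fence].
outliers = {
    state: value
    for state, value in percent_by_state.items()
    if not lower_fence <= Decimal(value) <= upper_fence
}

print(iqr, lower_fence, upper_fence)  # 1.5 6.95 12.95
print(outliers)                       # {'Mississippi': '13.5'}
```

Louisiana (12.9) falls just inside the upper fence, so Mississippi is the only value flagged.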

