Recall in our discussion of the normal distribution the research study that examined the blood vitamin D levels of the entire US population of landscape gardeners. The intent of this large-scale and comprehensive study was to characterize fully this population of landscapers as normally distributed with a corresponding population mean and standard deviation, which were determined from the data collection of the entire population.

Suppose you are now in a different reality in which this study never took place, though you are still interested in studying the average vitamin D levels of US landscapers. In other words, the underlying population mean and standard deviation are now unknown to you. Furthermore, you would like to examine whether wearing tank tops instead of short sleeve shirts significantly affects vitamin D levels. To accomplish this, you propose to collect data from the landscapers at two different points in time. Specifically, the landscapers are to wear short sleeve shirts while working outside during a period of three weeks. After three weeks, you collect blood specimens, and the landscapers are then to wear tank tops for the next three weeks under the same working conditions, after which you collect blood draws a second time.

You obtain research funding to randomly sample 49 landscapers, collect blood samples at the two time points described above, and send these samples to your collaborating lab in order to quantify the amount of vitamin D in the landscapers' blood. After you anxiously wait for your colleagues to complete their lab quantification protocol, they email you the vitamin D level data shown in the following table.
Subject   Time Point 1, Shirts     Time Point 2, Tank Tops
          Vitamin D (ng/mL)        Vitamin D (ng/mL)
1         40.140                   54.973
2         29.302                   53.264
3         32.903                   54.077
4         33.187                   58.143
5         33.200                   51.652
6         36.299                   55.622
7         37.736                   55.103
8         31.107                   49.355
9         31.413                   49.866
10        31.939                   51.338
11        34.229                   53.347
12        35.135                   51.136
13        30.225                   49.489
14        33.434                   50.957
15        31.279                   54.456
16        32.489                   57.402
17        38.107                   51.085
18        34.673                   56.724
19        33.671                   54.666
20        39.481                   51.757
21        28.559                   48.107
22        33.751                   53.268
23        36.594                   47.042
24        30.494                   49.083
25        40.957                   48.801
26        27.767                   54.648
27        39.023                   50.870
28        25.083                   53.742
29        38.311                   58.927
30        36.607                   53.896
31        39.633                   51.793
32        37.399                   53.477
33        36.925                   56.668
34        38.604                   59.609
35        30.495                   52.566
36        28.180                   62.559
37        34.361                   47.055
38        31.680                   49.573
39        30.137                   55.565
40        35.135                   51.880
41        33.750                   58.505
42        30.822                   42.867
43        39.443                   53.082
44        30.446                   50.593
45        39.399                   44.427
46        32.276                   58.522
47        38.072                   52.338
48        28.701                   50.748
49        30.736                   53.437

What is the estimated 95% confidence interval (CI) of the average difference in blood vitamin D levels between short sleeve shirt and tank top attire amongst US landscapers in ng/mL?

Please note the following:
1) in practice, you as the analyst decide how to calculate the difference in vitamin D levels between time points for a given study participant, and subsequently interpret the aggregated results appropriately in the context of the data, though for the purposes of this exercise the difference is assigned for you as follows. Define the difference as the second minus the first time point, which is common practice, since the plus or minus sign of the resulting difference reflects any change over sequential time;
2) you might calculate a CI that is different from any of the multiple choice options listed below due to rounding differences, therefore select the closest match;
3) ensure you use either the large or small sample CI formula as appropriate; and
4) you may copy and paste the data into Excel to facilitate analysis.

Select one:
a. 17.33 to 20.41 ng/mL
b. 18.55 to 21.84 ng/mL
c. 15.17 to 18.47 ng/mL
d. 15.08 to 23.07 ng/mL
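The same 49 landscapers are measured at both time points, so the two columns are paired rather than independent. The interval is therefore built from the per-subject differences $d_i = x_{i,2} - x_{i,1}$ (second time point minus first), not from a pooled two-sample variance. The population standard deviation is unknown and is estimated from the sample; because $n = 49 \geq 30$, the large sample formula is appropriate. The confidence interval (CI) is defined as

$$\bar{d} \pm z_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$$

where $\bar{d}$ is the mean of the differences, $z_{\alpha/2} = 1.96$ is the critical value for the 95% confidence level, and $s_d / \sqrt{n}$ is the standard error of the mean difference.

The mean and standard deviation of the differences are obtained in Excel by applying the functions =AVERAGE() and =STDEV() respectively to the column of differences:

$$\bar{d} \approx 18.873 \text{ ng/mL}, \qquad s_d \approx 5.500 \text{ ng/mL}$$

The standard error is

$$SE = \frac{s_d}{\sqrt{n}} = \frac{5.500}{\sqrt{49}} \approx 0.786$$

Now the confidence interval is

$$18.873 \pm 1.96 \times 0.786 = 18.873 \pm 1.540, \quad \text{i.e. about } 17.33 \text{ to } 20.41 \text{ ng/mL}$$

The closest option is option (a). Using the small sample formula with $t_{0.025,\,48} \approx 2.011$ instead gives about 17.29 to 20.45 ng/mL, which is still closest to (a). Options (b) and (c) are wrong because they are not centered on the mean difference of about 18.87 ng/mL, and option (d) is wrong because its margin of error (about 4.0 ng/mL) is far wider than the paired data support.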
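As a cross-check on the arithmetic, here is a minimal Python sketch that is not part of the original solution. It assumes the data are treated as paired (the same subject at both time points), takes the difference as second minus first as the question specifies, and uses the large sample critical value 1.96. The function name `paired_ci` and the variable names are illustrative only.

```python
import math
import statistics

# Vitamin D levels (ng/mL) for subjects 1-49, copied from the table above.
shirts = [
    40.140, 29.302, 32.903, 33.187, 33.200, 36.299, 37.736,
    31.107, 31.413, 31.939, 34.229, 35.135, 30.225, 33.434,
    31.279, 32.489, 38.107, 34.673, 33.671, 39.481, 28.559,
    33.751, 36.594, 30.494, 40.957, 27.767, 39.023, 25.083,
    38.311, 36.607, 39.633, 37.399, 36.925, 38.604, 30.495,
    28.180, 34.361, 31.680, 30.137, 35.135, 33.750, 30.822,
    39.443, 30.446, 39.399, 32.276, 38.072, 28.701, 30.736,
]
tank_tops = [
    54.973, 53.264, 54.077, 58.143, 51.652, 55.622, 55.103,
    49.355, 49.866, 51.338, 53.347, 51.136, 49.489, 50.957,
    54.456, 57.402, 51.085, 56.724, 54.666, 51.757, 48.107,
    53.268, 47.042, 49.083, 48.801, 54.648, 50.870, 53.742,
    58.927, 53.896, 51.793, 53.477, 56.668, 59.609, 52.566,
    62.559, 47.055, 49.573, 55.565, 51.880, 58.505, 42.867,
    53.082, 50.593, 44.427, 58.522, 52.338, 50.748, 53.437,
]


def paired_ci(first, second, critical=1.96):
    """Return (mean difference, lower bound, upper bound) for paired samples.

    Differences are taken as second minus first. `critical` is the
    critical value; 1.96 is the large sample (z) value for 95% confidence.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev is the sample standard deviation (n - 1 denominator),
    # the same quantity Excel's =STDEV() returns.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    margin = critical * std_err
    return mean_diff, mean_diff - margin, mean_diff + margin


mean_diff, lower, upper = paired_ci(shirts, tank_tops)
print(f"mean difference = {mean_diff:.3f} ng/mL")
print(f"95% CI = {lower:.2f} to {upper:.2f} ng/mL")
```

Passing `critical=2.011` (approximately the t value for 48 degrees of freedom) gives the small sample version of the same interval, which is slightly wider.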