A car battery manufacturer uses lead in its manufacturing process. Workers are encouraged to shower, shampoo, and change clothes before going home to eliminate the transfer of lead to their children, but there is still concern that children are being exposed by their parents. A study is carried out to determine whether workers carry lead dust home. 33 of the workers' children were selected as subjects, and a blood test for each child determines the level of lead in his/her blood. A control group of 33 children from the same neighborhood with no connection to the manufacturing plant is also selected, and their blood lead levels are also measured. (20 points)
a) Input the data into R using the following commands:
Exposed = c(38, 23, 41, 18, 37, 36, 23, 62, 31, 34, 24,
14, 21, 17, 16, 20, 15, 10, 45, 39, 22, 35,
49, 48, 44, 35, 43, 39, 34, 13, 73, 25, 27)
Control = c(16, 18, 18, 24, 19, 11, 10, 15, 16, 18, 18,
13, 19, 10, 16, 16, 24, 13, 9, 14, 21, 19,
7, 18, 19, 12, 11, 22, 25, 16, 13, 11, 13)
This creates two variables, Exposed and Control, containing the blood lead levels for the two groups, respectively. Make histograms, look at summary statistics, and look at a side-by-side boxplot for the groups using the following commands:
hist(Exposed)
summary(Exposed)
hist(Control)
summary(Control)
boxplot(Exposed, Control)
What differences do you see between the sample distributions of the two groups?
b) Is there evidence that the average blood lead level differs between the Exposed and Control groups? What are the relevant null and alternative hypotheses in terms of population parameters?
c) Perform a t-test of the null hypothesis of no difference in means with a two-sided alternative using the following command
t.test(Exposed, Control, paired=FALSE, alternative="two.sided")
and interpret the results.
d) A blood lead level of more than 45 requires medical treatment. Test the hypotheses
$H_0: \mu_{exp} = 45$
vs
$H_a: \mu_{exp} \neq 45$
where $\mu_{exp}$ is the true mean blood lead level of the exposed children, using the command
t.test(Exposed, mu=45, alternative="two.sided")
What is your conclusion?
e) Interpret the 95% confidence interval given in the output for part d) above.
(a)
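In the side-by-side boxplot, box 1 is Exposed and box 2 is Control. The median of the Exposed group (34) is well above the median of the Control group (16). The Exposed values are also much more spread out (range 10 to 73, versus 7 to 25 for Control), with a few high values in the upper tail.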
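The comparison in (a) can be backed with numbers. The following is a minimal sketch in R; the name `group_stats` and the particular set of statistics are illustrative choices, not part of the original assignment.

```r
# Blood lead levels as given in the question
Exposed <- c(38, 23, 41, 18, 37, 36, 23, 62, 31, 34, 24,
             14, 21, 17, 16, 20, 15, 10, 45, 39, 22, 35,
             49, 48, 44, 35, 43, 39, 34, 13, 73, 25, 27)
Control <- c(16, 18, 18, 24, 19, 11, 10, 15, 16, 18, 18,
             13, 19, 10, 16, 16, 24, 13, 9, 14, 21, 19,
             7, 18, 19, 12, 11, 22, 25, 16, 13, 11, 13)

# One column per group, one row per statistic
# (group_stats is an illustrative name)
group_stats <- sapply(
  list(Exposed = Exposed, Control = Control),
  function(x) c(n = length(x), mean = mean(x), median = median(x),
                sd = sd(x), IQR = IQR(x), min = min(x), max = max(x))
)
print(round(group_stats, 2))
```

Passing `names = c("Exposed", "Control")` to `boxplot()` labels the two boxes directly, which avoids having to remember which group is box 1.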
(b)
From the boxplot it seems that there is a difference between the blood lead levels of the Exposed and Control groups. To check this formally we need a two-sample (unpaired) t-test, since the two groups consist of different children.
The hypotheses are:
$H_0: \mu_{exp} = \mu_{ctrl}$ vs $H_a: \mu_{exp} \neq \mu_{ctrl}$
where,
$\mu_{exp}$: mean blood lead level of the Exposed group
$\mu_{ctrl}$: mean blood lead level of the Control group
(c)
The test conducted is the Welch two-sample t-test (unpaired, equal variances not assumed).
t statistic value is 6.073
degrees of freedom = 38.293
p-value is 4.39e-07, which is very small. At the $\alpha = 0.05$ significance level, p-value < 0.05, so we reject the null hypothesis and conclude that the mean blood lead levels of the Exposed and Control groups differ.
The 95% confidence interval for the difference in means is [10.65, 21.29], which does not contain 0. This agrees with the test: the mean blood lead levels of the two groups are significantly different.
Sample mean of Exposed group blood lead level is 31.85
Sample mean of Control group blood lead level is 15.88
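To see where the numbers in (c) come from, the Welch statistic and its approximate degrees of freedom (Welch-Satterthwaite formula) can be recomputed by hand. This is a sketch for checking the `t.test` output, not a replacement for it; the variable names `v1`, `v2`, `se_diff`, `t_stat`, `df_welch`, `p_value`, `ci`, and `res` are illustrative.

```r
# Blood lead levels as given in the question
Exposed <- c(38, 23, 41, 18, 37, 36, 23, 62, 31, 34, 24,
             14, 21, 17, 16, 20, 15, 10, 45, 39, 22, 35,
             49, 48, 44, 35, 43, 39, 34, 13, 73, 25, 27)
Control <- c(16, 18, 18, 24, 19, 11, 10, 15, 16, 18, 18,
             13, 19, 10, 16, 16, 24, 13, 9, 14, 21, 19,
             7, 18, 19, 12, 11, 22, 25, 16, 13, 11, 13)

n1 <- length(Exposed)
n2 <- length(Control)

# Squared standard error of each sample mean
v1 <- var(Exposed) / n1
v2 <- var(Control) / n2

# Welch t statistic: difference in means over its standard error
se_diff <- sqrt(v1 + v2)
t_stat  <- (mean(Exposed) - mean(Control)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean(Exposed) - mean(Control)) +
  c(-1, 1) * qt(0.975, df = df_welch) * se_diff

# Compare with the built-in test (var.equal = FALSE is the default)
res <- t.test(Exposed, Control, paired = FALSE, alternative = "two.sided")
print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
print(res)
```

The hand computation and `t.test` should agree, because `t.test` uses the Welch form unless `var.equal = TRUE` is requested.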
(d)
t statistic value is -5.2439
degrees of freedom = 32
p-value is 9.775e-06, which is very small. At the $\alpha = 0.05$ significance level, p-value < 0.05, so we reject the null hypothesis and conclude that the mean blood lead level of the Exposed group is different from 45. The sample mean (31.85) is significantly less than 45, so on average the exposed children are below the level that requires medical treatment. This is a statement about the group mean only: four children in the sample (48, 49, 62, 73) have individual levels above 45.
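The one-sample test in (d) can be reproduced from its definition. The sketch below recomputes the t statistic and two-sided p-value and also counts individual readings above the threshold; the names `mu0`, `se`, `t_stat`, `p_value`, and `n_above` are illustrative.

```r
# Blood lead levels of the exposed children, as given in the question
Exposed <- c(38, 23, 41, 18, 37, 36, 23, 62, 31, 34, 24,
             14, 21, 17, 16, 20, 15, 10, 45, 39, 22, 35,
             49, 48, 44, 35, 43, 39, 34, 13, 73, 25, 27)

mu0 <- 45                        # hypothesized mean (treatment threshold)
n   <- length(Exposed)
se  <- sd(Exposed) / sqrt(n)     # standard error of the sample mean

# One-sample t statistic and two-sided p-value on n - 1 degrees of freedom
t_stat  <- (mean(Exposed) - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# The test concerns the mean; individual children can still exceed 45
n_above <- sum(Exposed > mu0)

print(c(t = t_stat, df = n - 1, p = p_value, above_45 = n_above))
```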
(e)
The 95% confidence interval for the mean blood lead level of the Exposed group is [26.74, 36.96]: we are 95% confident that the true mean lies in this range. The interval does not contain 45 and lies entirely below it, which agrees with the test in (d): the mean is significantly different from (less than) 45.
Sample mean of Exposed group blood lead level is 31.85
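The interval in (e) is the sample mean plus or minus a t critical value times the standard error. A minimal sketch of that calculation follows; `t_crit` and `ci` are illustrative names.

```r
# Blood lead levels of the exposed children, as given in the question
Exposed <- c(38, 23, 41, 18, 37, 36, 23, 62, 31, 34, 24,
             14, 21, 17, 16, 20, 15, 10, 45, 39, 22, 35,
             49, 48, 44, 35, 43, 39, 34, 13, 73, 25, 27)

n      <- length(Exposed)
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

# 95% confidence interval for the population mean
ci <- mean(Exposed) + c(-1, 1) * t_crit * sd(Exposed) / sqrt(n)
print(round(ci, 2))
```

A different confidence level can be requested from `t.test` with its `conf.level` argument, for example `conf.level = 0.99`.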