NYU is testing out two different versions of filtering software in order to reduce spam emails. The old version is called "Spam-A-Lot" and the new version is called "Spam-A-Little." In testing each version of the software the following data was produced:
Email Account | Solicited Mail | Unsolicited Mail | TOTAL |
Spam-A-Lot | 305 | 95 | 400 |
Spam-A-Little | 150 | 38 | 188 |
Let p1 and p2 denote the true proportions of unsolicited mail that make it through the "Spam-A-Lot" and "Spam-A-Little" filters, respectively.
(a) Determine the unbiased point estimates of p1 and p2:
(b) Explain why the formula for a large-sample confidence interval estimate for p1 - p2 can be used in this case.
(c) Build a 95% confidence interval for the true decrease in proportion p1 - p2 of unsolicited mail by switching filters from "Spam-A-Lot" (p1) to "Spam-A-Little" (p2), using the sample values given. Record results to 4 decimals.
(d) Based on your answer to (c), has the new filtering program reduced the amount of spam?
(e) Complete the following to perform a hypothesis test at the 5% significance level to test the claim that switching to the new filter "Spam-A-Little" has decreased the proportion of unsolicited emails getting through the filter.
i) H0:
Ha:
Level of Significance:
Observed Test Statistic (z-statistic):
ii) p-value:
Decision with justification:
Conclusion in context:
(a)
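The unbiased point estimate of each proportion is the sample proportion: unsolicited mail divided by total mail for that filter. A minimal sketch of the arithmetic, using the counts from the table (the variable names are illustrative assumptions, not part of the question):

```python
# Counts taken from the table in the question.
unsolicited_a_lot, total_a_lot = 95, 400
unsolicited_a_little, total_a_little = 38, 188

# Sample proportions serve as the point estimates of p1 and p2.
p1_hat = unsolicited_a_lot / total_a_lot
p2_hat = unsolicited_a_little / total_a_little

print(f"p1_hat = {p1_hat:.4f}")  # 0.2375
print(f"p2_hat = {p2_hat:.4f}")  # 0.2021
```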
(b)
Since both sample sizes are large and the numbers of successes and failures in each sample (95 and 305 for Spam-A-Lot, 38 and 150 for Spam-A-Little) are all greater than 5, we can use the formula for a large-sample confidence interval estimate for p1 - p2.
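The condition can be checked directly from the table. The sketch below assumes the "greater than 5" rule of thumb stated in this answer; some textbooks require at least 10 instead, which these counts also satisfy. The dictionary layout and names are illustrative assumptions.

```python
# (successes, failures) = (unsolicited, solicited) counts from the table.
counts = {"Spam-A-Lot": (95, 305), "Spam-A-Little": (38, 150)}
threshold = 5  # rule of thumb used in this answer

# Every success and failure count must exceed the threshold.
conditions_met = all(min(s, f) > threshold for s, f in counts.values())
print(conditions_met)  # True
```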
(c)
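The large-sample interval is (p1_hat - p2_hat) ± z * sqrt(p1_hat(1 - p1_hat)/n1 + p2_hat(1 - p2_hat)/n2), where z is the standard normal critical value (about 1.96 for 95% confidence). A minimal sketch using the table counts; the function name `two_proportion_ci` is a hypothetical helper, not something given in the question:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample (unpooled) confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    # Unpooled standard error of the difference in sample proportions.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * se
    return diff - margin, diff + margin


# x = unsolicited count, n = total count for each filter.
lower, upper = two_proportion_ci(95, 400, 38, 188)
print(f"({lower:.4f}, {upper:.4f})")  # (-0.0356, 0.1063)
```

With these counts the estimated difference is about 0.0354 and the 95% interval is approximately (-0.0356, 0.1063).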
(d)
Since the confidence interval contains zero, we cannot conclude that the new filtering program reduced the amount of spam.
(e)
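The claim is that the new filter lets a smaller proportion of unsolicited mail through, so the test is right-tailed: H0: p1 - p2 = 0 against Ha: p1 - p2 > 0, at significance level 0.05. A minimal sketch of the usual pooled two-proportion z-test, assuming that is the intended method; the function name `two_proportion_z_test` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed pooled z-test of H0: p1 = p2 against Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 the proportions are equal, so pool both samples.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Right-tail area under the standard normal curve.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = two_proportion_z_test(95, 400, 38, 188)
alpha = 0.05
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.2f}, reject H0: {reject_h0}")
```

With the table counts this gives z of about 0.956 and a p-value of about 0.17. Since the p-value exceeds 0.05, the decision is to fail to reject H0.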
Conclusion: At the 5% significance level, there is not sufficient evidence to conclude that the new filtering program reduced the proportion of spam getting through.