If you want to know how important spam filters are to your online experience, try turning them off for a day. You'll quickly see why these tools we tend to take for granted are so essential. Generally speaking, a filtering solution applied to your email system uses a set of rules to determine which incoming messages are spam and which are not. What the filters check can vary, but most do basically the same things: scan header information for evidence of malice, look up senders on blacklists of known spammers, and scan message content for patterns that point to junk mail.
Suppose that a particular spam filter uses a points-based system in which various aspects of an email add points to its score, with 100 points being the maximum and strongly indicating spam. The more points an email accumulates, the stronger the evidence that it is spam. Once an email reaches a sufficient number of points, the filter classifies it as spam and it does not reach your inbox. For each email it reviews, this process is similar to a hypothesis test with the following hypotheses:
H0: The email is a real message (not spam).
HA: The email is spam.
Using this hypothesis setup, answer the following questions using the language and terms we have covered related to hypothesis testing:
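The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `classify` function, its `cutoff` parameter, and the 0 to 100 score range check are hypothetical and not taken from any real filter. The default cutoff of 50 matches the one used in part c. A score at or above the cutoff plays the role of rejecting H0.

```python
# Hypothetical points-based filter: each email gets a score from 0 to 100,
# and higher scores are stronger evidence of spam.
MAX_POINTS = 100

def classify(score: float, cutoff: float = 50) -> str:
    """Return the (hypothetical) filter's decision for one email.

    A score at or above the cutoff corresponds to rejecting H0
    (the email is declared spam); a score below it means we fail
    to reject H0 and the email is delivered to the inbox.
    """
    if not 0 <= score <= MAX_POINTS:
        raise ValueError("score must be between 0 and 100")
    return "spam" if score >= cutoff else "inbox"

print(classify(72))             # spam
print(classify(55, cutoff=60))  # inbox
```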
a. When the filter allows spam to slip through into your inbox, which kind of error is that? Explain in terms of the hypotheses above.
b. Which kind of error is it when a real (i.e., non-spam) email gets classified as spam and does not get to your inbox? Explain in terms of the hypotheses above.
c. Suppose that this particular spam filter classifies any email scoring 50 points or higher as spam. However, you reset the filter so that an email needs 60 points or higher before it is classified as spam. Is that analogous to choosing a higher or a lower alpha level for a hypothesis test? Explain in terms of the hypotheses above.
d. What impact does this change in the spam cutoff value have on the chance of each type of error in hypothesis testing? Explain.
e. What does "power" mean in the context of the spam filter, and how is it related to one of the two types of errors? Explain in terms of the hypotheses above.
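To make the cutoff change from part c concrete, the sketch below counts Type I errors, Type II errors, and power at cutoffs of 50 and 60. The two score lists and the `error_rates` function are hypothetical, invented only for illustration; they are not measurements from any real filter.

```python
# Hypothetical labeled scores, made up purely for illustration (not real data).
real_scores = [5, 12, 20, 33, 41, 48, 52, 58]   # H0 true: real messages
spam_scores = [45, 55, 57, 63, 70, 78, 86, 95]  # HA true: spam

def error_rates(cutoff):
    """Return (Type I rate, Type II rate, power) for a given spam cutoff."""
    # Type I error: a real email scores at or above the cutoff (H0 wrongly rejected).
    type1 = sum(s >= cutoff for s in real_scores) / len(real_scores)
    # Type II error: a spam email scores below the cutoff (false H0 not rejected).
    type2 = sum(s < cutoff for s in spam_scores) / len(spam_scores)
    # Power: the fraction of spam correctly caught, i.e. 1 - Type II rate.
    power = 1 - type2
    return type1, type2, power

for cutoff in (50, 60):
    t1, t2, pw = error_rates(cutoff)
    print(f"cutoff {cutoff}: Type I = {t1:.3f}, Type II = {t2:.3f}, power = {pw:.3f}")
```

For a fixed filter, raising the cutoff can never increase the Type I rate or decrease the Type II rate; how large each change is depends entirely on how the real and spam scores are distributed.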