Spam

Spam filters try to sort your e-mails, deciding which are real messages and which are unwanted. One method used is a point system. The filter reads each incoming message and assigns points to the sender, the subject, key words in the message, and so on. The higher the point total, the more likely it is that the message is unwanted. The filter has a cutoff value for the point total; any message rated lower than the cutoff passes through to your inbox, and the rest, suspected to be spam, are directed to the junk mailbox.

We can think of the filter’s decision as a hypothesis test. The null hypothesis is that the e-mail is a real message and should go to your inbox. A higher point total provides evidence that the message may be spam; when there is sufficient evidence, the filter rejects the null, classifying the message as junk. This usually works pretty well but, of course, sometimes the filter makes a mistake.

a. (1 mark) When the filter allows spam to slip through into your inbox, which kind of error is that?

b. (1 mark) Which kind of error is it when a real message gets classified as junk?

c. Some filters allow the user (that’s you) to adjust the cutoff. Suppose your filter has a default cutoff of 50 points, but you reset it to 40. Is that similar to choosing a larger value or a smaller value of α for a hypothesis test? Explain.
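
To make the decision rule in the problem statement concrete, here is a minimal Python sketch of the cutoff step. The function name `classify` and the point totals are hypothetical, made up purely for illustration; the problem does not say how a real filter computes its points.

```python
def classify(points: int, cutoff: int = 50) -> str:
    """Apply the cutoff rule from the problem statement.

    A message rated lower than the cutoff goes to the inbox
    (the null hypothesis "real message" is not rejected);
    otherwise it is sent to junk (the null is rejected).
    """
    return "inbox" if points < cutoff else "junk"


# Hypothetical point totals for five incoming messages (made-up numbers).
scores = [12, 38, 45, 50, 73]

# Compare the default cutoff of 50 with the reset cutoff of 40.
for cutoff in (50, 40):
    labels = [classify(s, cutoff) for s in scores]
    junked = labels.count("junk")
    print(f"cutoff={cutoff}: {junked} of {len(scores)} messages sent to junk")
```

Running the sketch with both cutoffs shows how the number of rejected messages changes when the cutoff is lowered, which is the behavior part c asks you to relate to α.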