Please show and answer all parts of this question. It is from the "More About Tests and Intervals" chapter. Textbook: Stats: Data and Models (4th Edition), by De Veaux, Velleman, and Bock.
Suppose that a particular spam filter uses a points-based system in which various aspects of an email trigger an accumulation of points, with 100 points being the maximum and strongly indicating spam. So the more points an email accumulates, the stronger the evidence that it is spam. Once an email accumulates a sufficient number of points, the spam filter classifies it as spam and it does not reach your inbox.
This process is similar to hypothesis testing in the following way for each email it reviews:
H0: The email is a real message (not spam)
HA: The email is spam
Using the hypotheses above, answer the following questions using the language and terms of hypothesis testing:
1)
When the filter lets spam slip through into your inbox, it commits a Type II error: a spam message is classified as a real message and reaches your inbox. A Type II error is failing to reject a false null hypothesis.
2)
When a real email is classified as spam and does not reach your inbox, the filter commits a Type I error: a real message is treated as spam. A Type I error is rejecting a true null hypothesis.
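A minimal Python sketch of the possible outcomes for a single email, assuming the filter flags any score at or above a cutoff (50 points, the value used in part 3). The function names `classify` and `outcome` are hypothetical, chosen only for illustration, and the example scores are made up.

```python
# Hypothetical helpers for illustration only (not part of any real filter).
# H0: the email is real; HA: the email is spam. Flagging an email = rejecting H0.

def classify(score: float, cutoff: float = 50) -> bool:
    """Return True if the filter flags the email as spam (rejects H0)."""
    return score >= cutoff


def outcome(is_spam: bool, score: float, cutoff: float = 50) -> str:
    """Name the hypothesis-testing outcome for one email."""
    flagged = classify(score, cutoff)
    if flagged and not is_spam:
        return "Type I error"   # true H0 rejected: real email sent to spam
    if not flagged and is_spam:
        return "Type II error"  # false H0 not rejected: spam reaches inbox
    return "correct decision"


if __name__ == "__main__":
    print(outcome(is_spam=False, score=72))  # Type I error
    print(outcome(is_spam=True, score=35))   # Type II error
    print(outcome(is_spam=True, score=88))   # correct decision
```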
3)
The spam filter classifies an email as spam if it scores 50 points or higher. If you reset the filter to require 60 points or higher, fewer messages will be classified as spam, because emails scoring from 50 to 59 points now reach the inbox. In hypothesis-testing terms, this is like lowering the significance level alpha: a smaller alpha shrinks the rejection region (and enlarges the acceptance region), so H0 is rejected less often and fewer messages are classified as spam.
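To see the effect of moving the cutoff, here is a small sketch that counts flagged emails at 50 and at 60 points. The ten scores are invented for illustration, and `count_flagged` is a hypothetical helper, not part of any real filter.

```python
# Hypothetical point scores for ten incoming emails (made up for illustration).
scores = [12, 35, 48, 52, 55, 58, 61, 74, 90, 99]


def count_flagged(scores, cutoff):
    """Number of emails at or above the cutoff, i.e. classified as spam."""
    return sum(1 for s in scores if s >= cutoff)


flagged_50 = count_flagged(scores, 50)
flagged_60 = count_flagged(scores, 60)
print(flagged_50, flagged_60)  # 7 4
```

With these made-up scores, the 52-, 55-, and 58-point emails are flagged at a cutoff of 50 but delivered at 60. In general, raising the cutoff can only shrink the rejection region, so it never increases the number of emails flagged.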
4)
When the spam cutoff increases, the probability of a Type I error decreases, because fewer real emails score high enough to be flagged as spam. At the same time, the probability of a Type II error increases, because more spam scores below the higher cutoff and reaches the inbox (the acceptance region is bigger).
When the spam cutoff decreases, the probability of a Type I error increases, because more real emails score above the lower cutoff and are flagged as spam. At the same time, the probability of a Type II error decreases, because less spam falls below the cutoff (the acceptance region is smaller).
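The trade-off can be made concrete with a sketch that assumes, purely for illustration, that real emails' scores are roughly Normal(30, 10) and spam scores are roughly Normal(70, 10), ignoring the 0 to 100 bounds. These distributions are assumptions, not anything stated in the problem. Here alpha is the area of the real-email curve at or above the cutoff and beta is the area of the spam curve below it; `error_rates` is a hypothetical helper.

```python
from statistics import NormalDist

# ASSUMPTION (hypothetical, not from the problem): score distributions.
REAL = NormalDist(mu=30, sigma=10)  # scores of real messages (H0 true)
SPAM = NormalDist(mu=70, sigma=10)  # scores of spam (HA true)


def error_rates(cutoff):
    """Return (alpha, beta) for a filter that flags scores >= cutoff."""
    alpha = 1 - REAL.cdf(cutoff)  # P(flagged | real email) = P(Type I error)
    beta = SPAM.cdf(cutoff)       # P(not flagged | spam)   = P(Type II error)
    return alpha, beta


for cutoff in (40, 50, 60):
    alpha, beta = error_rates(cutoff)
    print(f"cutoff {cutoff}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Under these assumed distributions, moving the cutoff from 50 to 60 cuts alpha from about 0.023 to about 0.001, while beta rises from about 0.023 to about 0.16.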
5)
Power is the probability of rejecting the null hypothesis when it is false. Here, it is the probability that the filter classifies a message as spam when it actually is spam.
Power is related to the probability of a Type II error, beta, by:
Power = 1 - beta = 1 - P(Type II error)
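As an illustration, assume hypothetically that spam scores follow a Normal(70, 10) distribution (an assumption, not part of the problem). Power is then the area of the spam curve at or above the cutoff, which equals 1 - beta. The `power` function below is an illustrative sketch, not a property of any real filter.

```python
from statistics import NormalDist

# ASSUMPTION (hypothetical, not from the problem): spam scores ~ Normal(70, 10).
SPAM = NormalDist(mu=70, sigma=10)


def power(cutoff):
    """P(flagged as spam | email is spam) = 1 - beta."""
    beta = SPAM.cdf(cutoff)  # P(score below cutoff | spam) = P(Type II error)
    return 1 - beta


for cutoff in (50, 60):
    print(f"cutoff {cutoff}: power = {power(cutoff):.4f}")
```

With these assumed numbers, power drops from about 0.98 at a cutoff of 50 to about 0.84 at 60, mirroring the rise in the Type II error probability described in part 4.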