a. What is the probability that a random email will be spam and will not contain the word ‘free’ in its subject line? Please show your work.
b. If the word ‘free’ does not appear in the subject line of an email, what is the probability that the email is not spam? Please show your work. [2 points]
Solution:
Given that
Total no. of emails = 20,000
No. of spam emails = 4,400
No. of non-spam emails = 20,000 - 4,400 = 15,600
No. of spam emails that contain the word 'free' = 1,100
No. of non-spam emails that contain the word 'free' = 780
Let's tabulate them as follows:
Type | Free | No-free | Total |
Spam | 1,100 | 3,300 | 4,400 |
Non-spam | 780 | 14,820 | 15,600 |
Total | 1,880 | 18,120 | 20,000 |
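The table can be rebuilt from the four given counts alone, since every other cell follows by subtraction or addition. The following is a minimal sketch of that bookkeeping; the variable names are illustrative and not part of the original solution.

```python
# Counts given in the problem statement.
total = 20_000
spam = 4_400
spam_free = 1_100      # spam emails with 'free' in the subject line
nonspam_free = 780     # non-spam emails with 'free' in the subject line

# Every remaining cell follows by subtraction or addition.
nonspam = total - spam
spam_nofree = spam - spam_free
nonspam_nofree = nonspam - nonspam_free
free_total = spam_free + nonspam_free
nofree_total = spam_nofree + nonspam_nofree

# Rows are (Free, No-free, Total), matching the table's column order.
table = {
    "Spam": (spam_free, spam_nofree, spam),
    "Non-spam": (nonspam_free, nonspam_nofree, nonspam),
    "Total": (free_total, nofree_total, total),
}

for label, row in table.items():
    print(f"{label:<9}" + "".join(f"{value:>8,}" for value in row))
```

The column totals should sum to the grand total (1,880 + 18,120 = 20,000), which is a quick consistency check on the given data.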
a) Let S = event that a randomly selected email is spam
A = event that a randomly selected email does not contain the word 'free'
P(A) = n(A) / 20,000 = 18,120 / 20,000 = 0.906
P(random email is spam and does not contain the word 'free') = P(A ∩ S)
= n(A ∩ S) / 20,000
= 3,300 / 20,000
= 0.165
Probability that a randomly selected email is spam and does not contain the word 'free' = 0.165
b) Let N = event that a randomly selected email is non-spam
P(randomly selected email is non-spam given that it does not contain the word 'free') = P(N | A)
= P(A ∩ N) / P(A)
= (14,820 / 20,000) / 0.906
= 0.741 / 0.906
≈ 0.818
Probability that a randomly selected email is non-spam, given that it does not contain the word 'free' ≈ 0.818
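Parts a) and b) can be checked with exact fractions. The sketch below assumes the counts from the table and uses illustrative variable names. Note that the conditional probability in b) reduces to a ratio of counts, 14,820 / 18,120, because the common denominator 20,000 cancels.

```python
from fractions import Fraction

# Counts taken from the table (the "No-free" column).
total = 20_000
spam_nofree = 3_300
nonspam_nofree = 14_820
nofree = spam_nofree + nonspam_nofree  # 18,120 emails without 'free'

# a) Joint probability: spam AND no 'free' in the subject line.
p_spam_and_nofree = Fraction(spam_nofree, total)

# b) Conditional probability: non-spam GIVEN no 'free'.
#    P(N | A) = P(A and N) / P(A) = n(A and N) / n(A)
p_nonspam_given_nofree = Fraction(nonspam_nofree, nofree)

print(float(p_spam_and_nofree))                  # 0.165
print(round(float(p_nonspam_given_nofree), 3))   # 0.818
```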
c) Let F = event that a randomly selected email contains the word 'free'
P(F) = n(F) / 20,000 = 1,880 / 20,000 = 0.094
S ∪ F = event that the email is spam or contains the word 'free' (or both)
(S ∪ F)' = event that the email is neither spam nor contains the word 'free'
P(S ∪ F) = P(S) + P(F) - P(S ∩ F) = 4,400 / 20,000 + 1,880 / 20,000 - 1,100 / 20,000
= 5,180 / 20,000
= 0.259
Now, P((S ∪ F)') = 1 - P(S ∪ F) = 1 - 0.259 = 0.741
Probability that a randomly selected email is neither spam nor contains the word 'free' = 0.741 (about 0.74)
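The result in c) can be cross-checked two ways: through the addition rule and complement, and directly from the table cell for non-spam emails without 'free' (14,820). A minimal sketch with illustrative variable names:

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)        # P(S): email is spam
p_f = Fraction(1_880, total)        # P(F): subject contains 'free'
p_s_and_f = Fraction(1_100, total)  # P(S and F)

# Addition rule, then complement.
p_s_or_f = p_s + p_f - p_s_and_f
p_neither = 1 - p_s_or_f

# Direct route: the non-spam, no-'free' cell of the table.
p_neither_direct = Fraction(14_820, total)

print(float(p_s_or_f), float(p_neither))  # 0.259 0.741
print(p_neither == p_neither_direct)      # True
```

Both routes give 0.741, so the rounded value 0.74 is consistent with the table.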
d) We need to check whether the events S and F are independent.
If S and F are independent, then P(S ∩ F) = P(S) × P(F)
We have P(S ∩ F) = 1,100 / 20,000 = 0.055
P(S) × P(F) = 0.22 × 0.094
= 0.02068 ≈ 0.02
≠ P(S ∩ F)
Therefore S and F are not independent: an email being spam and its subject line containing the word ‘free’ are not independent events.
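The independence check in d) can be sketched as an exact comparison of P(S ∩ F) with P(S) × P(F). The code below also computes P(S | F), which is not part of the original solution but gives an equivalent view: under independence it would equal P(S) = 0.22. Variable names are illustrative.

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)        # P(S) = 0.22
p_f = Fraction(1_880, total)        # P(F) = 0.094
p_s_and_f = Fraction(1_100, total)  # P(S and F) = 0.055

# Independence requires P(S and F) == P(S) * P(F) exactly.
product = p_s * p_f
independent = (p_s_and_f == product)

# Equivalent view: compare P(S | F) with P(S).
p_s_given_f = p_s_and_f / p_f

print(float(product))                 # 0.02068
print(independent)                    # False
print(round(float(p_s_given_f), 3))   # 0.585
```

Since P(S | F) is about 0.585, far above P(S) = 0.22, seeing 'free' in the subject line makes spam considerably more likely in this data set.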