Question


1. A software company is trying to use Bayes' theorem and other rules of probability to develop an algorithm that can effectively filter out spam emails. The company’s software developers carefully examined a random sample of 20,000 emails received by its employees and found that 4,400 of the emails were spam. A closer look at the spam emails revealed that 1,100 of them contained the word ‘free’ in the subject line. On the other hand, only 780 of the non-spam emails had the word ‘free’ in their subject line. Please answer the following questions based on the given information.

a. What is the probability that a random email will be spam and will not contain the word ‘free’ in its subject line? Please show your work.

b. If the word ‘free’ does not appear in the subject line of an email, what is the probability that the email is not spam? Please show your work. [2 points]

c. What is the probability that a random email will neither be spam nor contain the word ‘free’ in its subject line? Please show your work. [2 points]

d. Are an email being spam and its subject line containing the word ‘free’ independent events? Please show how you arrived at your answer. [1 point]

Solutions

Expert Solution

Given that

Total no. of emails = n(S) = 20,000

No. of spam emails = 4,400

No. of non-spam emails = 20,000 - 4,400 = 15,600

No. of spam emails that contain the word 'free' = 1,100

No. of non-spam emails that contain the word 'free' = 780

Let's tabulate them as following:

Type        Free     No-free    Total
Spam        1,100    3,300      4,400
Non-spam    780      14,820     15,600
Total       1,880    18,120     20,000
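Every cell of the table follows from the four counts given in the problem by subtraction. The following Python sketch rebuilds it; the variable names are illustrative and not part of the original solution.

```python
# Counts stated in the problem.
total = 20_000
spam = 4_400
spam_free = 1_100      # spam emails with 'free' in the subject line
nonspam_free = 780     # non-spam emails with 'free' in the subject line

# Remaining cells follow by subtraction.
nonspam = total - spam
spam_nofree = spam - spam_free
nonspam_nofree = nonspam - nonspam_free
free_total = spam_free + nonspam_free
nofree_total = spam_nofree + nonspam_nofree

table = {
    "Spam": {"free": spam_free, "no_free": spam_nofree, "total": spam},
    "Non-spam": {"free": nonspam_free, "no_free": nonspam_nofree, "total": nonspam},
    "Total": {"free": free_total, "no_free": nofree_total, "total": total},
}

# Print the table with thousands separators.
print(f"{'Type':<10}{'Free':>8}{'No-free':>10}{'Total':>8}")
for row, cells in table.items():
    print(f"{row:<10}{cells['free']:>8,}{cells['no_free']:>10,}{cells['total']:>8,}")
```

The row and column totals act as a consistency check: the 'Free' and 'No-free' totals must add up to 20,000.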

a) Let S = event that a randomly selected email is spam. S now denotes this event, so the whole sample of 20,000 emails, written n(S) above, is denoted Ω below.

A = event that a randomly selected email doesn't contain the word 'free'

P(A) = n(A) / n(Ω) = 18,120 / 20,000 = 0.906

P(random email is spam and doesn't contain the word 'free') = P(A ∩ S)

= n(A ∩ S) / n(Ω)

= 3,300 / 20,000

= 0.165

Probability that a randomly selected email is spam and doesn't contain the word 'free' = 0.165
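As a cross-check of part a, the same joint probability can be obtained from the multiplication rule, P(spam and no 'free') = P(spam) * P(no 'free' | spam). This is a minimal sketch with illustrative variable names, using exact fractions to avoid rounding.

```python
from fractions import Fraction

total = 20_000
spam = 4_400
spam_free = 1_100
spam_nofree = spam - spam_free  # 3,300 spam emails without 'free'

# Direct count: joint probability from the table cell.
p_joint = Fraction(spam_nofree, total)

# Multiplication rule: P(spam) * P(no 'free' | spam).
p_spam = Fraction(spam, total)
p_nofree_given_spam = Fraction(spam_nofree, spam)
p_joint_via_rule = p_spam * p_nofree_given_spam

print(float(p_joint))               # 0.165
print(p_joint == p_joint_via_rule)  # True
```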

b) Let N = event that a randomly selected email is non-spam

P(randomly selected email is non-spam given that it doesn't contain the word 'free') = P(N | A)

= P(A ∩ N) / P(A)

= (14,820 / 20,000) / 0.906

= 0.741 / 0.906

= 0.818

Probability that a randomly selected email is non-spam, given that it doesn't contain the word 'free' = 0.818
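Part b can also be written in the Bayes' theorem form the question alludes to, with the denominator expanded by the law of total probability. The sketch below is one way to verify the result; the variable names are illustrative, not from the original solution.

```python
from fractions import Fraction

# Prior probabilities of each class.
p_spam = Fraction(4_400, 20_000)
p_nonspam = 1 - p_spam

# Likelihood of "no 'free' in the subject line" within each class.
p_nofree_given_spam = Fraction(3_300, 4_400)
p_nofree_given_nonspam = Fraction(14_820, 15_600)

# Law of total probability: P(no 'free').
p_nofree = p_nofree_given_nonspam * p_nonspam + p_nofree_given_spam * p_spam

# Bayes' theorem: P(non-spam | no 'free').
p_nonspam_given_nofree = p_nofree_given_nonspam * p_nonspam / p_nofree

print(float(p_nofree))                          # 0.906
print(round(float(p_nonspam_given_nofree), 3))  # 0.818
```

The result equals the direct ratio 14,820 / 18,120, which is what the conditional-probability definition gives.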

c) Let F = event that a randomly selected email contains the word 'free'

P(F) = n(F) / 20,000 = 1,880 / 20,000 = 0.094

S ∪ F = event that the email is spam or contains the word 'free' (or both)

(S ∪ F)' = event that the email is neither spam nor contains the word 'free'

P(S ∪ F) = P(S) + P(F) - P(S ∩ F) = 4,400 / 20,000 + 1,880 / 20,000 - 1,100 / 20,000

= 5,180 / 20,000

= 0.259

Now, P((S ∪ F)') = 1 - P(S ∪ F) = 0.741

This agrees with the direct count of non-spam emails without 'free': 14,820 / 20,000 = 0.741

Probability that a randomly selected email is neither spam nor contains the word 'free' = 0.741
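Part c can be checked two ways: through the complement of the union (inclusion-exclusion), and through the direct count of non-spam emails without 'free'. Using exact fractions, the unrounded value is 0.741. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)    # P(spam)
p_f = Fraction(1_880, total)    # P('free' in subject line)
p_sf = Fraction(1_100, total)   # P(spam and 'free')

# Inclusion-exclusion for the union, then the complement.
p_union = p_s + p_f - p_sf
p_neither = 1 - p_union

# Direct count: non-spam emails without 'free'.
p_neither_direct = Fraction(14_820, total)

print(float(p_union))                 # 0.259
print(float(p_neither))               # 0.741
print(p_neither == p_neither_direct)  # True
```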

d) We need to check whether events S and F are independent.

If S and F are independent, then P(S ∩ F) = P(S) * P(F)

We have P(S ∩ F) = 1,100 / 20,000 = 0.055

P(S) * P(F) = 0.22 * 0.094

= 0.0207

≠ P(S ∩ F)

S and F are not independent events.

An email being spam and its subject line containing the word ‘free’ are not independent events.
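The independence test in part d compares P(S and F) with the product P(S) * P(F). An equivalent view compares P(F | S) with P(F). The sketch below computes both using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)    # P(spam)
p_f = Fraction(1_880, total)    # P('free' in subject line)
p_sf = Fraction(1_100, total)   # P(spam and 'free')

# Product rule test: independence requires P(S and F) == P(S) * P(F).
product = p_s * p_f
independent = p_sf == product

# Equivalent test: independence requires P(F | S) == P(F).
p_f_given_s = p_sf / p_s

print(float(p_sf), float(product), independent)
print(float(p_f_given_s), float(p_f))
```

Here P(F | S) = 0.25 while P(F) = 0.094, so in this sample a spam email is far more likely to contain 'free' than a random email, consistent with the conclusion that the events are not independent.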

