Question



  1. A software company is trying to use Bayes' theorem and other rules of probability to develop an algorithm that can effectively filter out spam emails. The company’s software developers carefully examined a random sample of 20,000 emails received by its employees and found that 4,400 of the emails were spam. A closer look at the spam emails revealed that 1,100 of them contained the word ‘free’ in the subject line. On the other hand, only 780 of the non-spam emails had the word ‘free’ in their subject line. Please answer the following questions based on the given information.

a. What is the probability that a random email is spam and does not contain the word ‘free’ in its subject line? Please show your work.

b. If the word ‘free’ does not appear in the subject line of an email, what is the probability that the email is not spam? Please show your work. [2 points]

c. What is the probability that a random email is neither spam nor contains the word ‘free’ in its subject line? Please show your work. [2 points]

d. Are an email being spam and its subject line containing the word ‘free’ independent events? Please show how you arrived at your answer. [1 point]

Solutions


Given:

Total number of emails = 20,000

Number of spam emails = 4,400

Number of non-spam emails = 20,000 - 4,400 = 15,600

Number of spam emails that contain the word 'free' = 1,100

Number of non-spam emails that contain the word 'free' = 780

Let's tabulate them as follows:

Type       Free     No-free   Total
Spam       1,100    3,300     4,400
Non-spam   780      14,820    15,600
Total      1,880    18,120    20,000
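
As a quick check on the table, here is a minimal Python sketch that derives the remaining cells and the column totals from the four counts given in the question. The variable and key names are illustrative choices, not part of the original solution.

```python
# Counts taken directly from the problem statement.
total = 20_000
spam = 4_400
spam_free = 1_100      # spam emails with 'free' in the subject line
non_spam_free = 780    # non-spam emails with 'free' in the subject line

# Derive the remaining cells of the 2x2 table.
non_spam = total - spam
table = {
    ("spam", "free"): spam_free,
    ("spam", "no_free"): spam - spam_free,
    ("non_spam", "free"): non_spam_free,
    ("non_spam", "no_free"): non_spam - non_spam_free,
}

# Column totals: sum every cell that shares the same 'free' / 'no_free' label.
col_totals = {
    word: sum(n for (_, w), n in table.items() if w == word)
    for word in ("free", "no_free")
}

for (kind, word), n in table.items():
    print(f"{kind:>8} {word:>8} {n:>7,}")
print(col_totals)
```

The four cells should add back up to the 20,000 emails in the sample, which is a useful sanity check before computing any probabilities.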

a) Let S = event that a randomly selected email is spam

A = event that a randomly selected email doesn't contain the word 'free'

P(A) = n(A) / 20,000 = 18,120 / 20,000 = 0.906

P(random email is spam and doesn't contain the word 'free') = P(A ∩ S)

= n(A ∩ S) / 20,000

= 3,300 / 20,000

= 0.165

Probability that a randomly selected email is spam and doesn't contain the word 'free' = 0.165
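
The same answer can be cross-checked with the multiplication rule, P(A ∩ S) = P(S) * P(A | S). The sketch below is only an illustration using exact fractions. The variable names are assumptions, and the counts come from the question.

```python
from fractions import Fraction

total = 20_000
spam = 4_400
spam_no_free = 3_300  # spam emails without 'free' in the subject line

p_spam = Fraction(spam, total)                        # P(S) = 11/50
p_no_free_given_spam = Fraction(spam_no_free, spam)   # P(A | S) = 3/4

# Multiplication rule: P(A and S) = P(S) * P(A | S)
p_joint = p_spam * p_no_free_given_spam

print(p_joint, float(p_joint))
```

The product reduces to 33/200, which agrees with the direct count 3,300 / 20,000.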

b) Let N = event that a randomly selected email is non-spam

P(randomly selected email is non-spam given that it doesn't contain the word 'free') = P(N | A)

= P(A ∩ N) / P(A)

= (14,820 / 20,000) / 0.906

= 0.741 / 0.906

= 0.818

Probability that a randomly selected email is non-spam, given that it doesn't contain the word 'free' = 0.818
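
Since the question mentions Bayes' theorem, the conditional probability can also be written as P(N | A) = P(A | N) * P(N) / P(A). The following sketch, with illustrative variable names, evaluates that form and should give the same value as the ratio of counts above.

```python
from fractions import Fraction

total = 20_000
non_spam = 15_600
non_spam_no_free = 14_820   # non-spam emails without 'free'
no_free = 18_120            # all emails without 'free'

p_n = Fraction(non_spam, total)                     # prior P(N)
p_a_given_n = Fraction(non_spam_no_free, non_spam)  # likelihood P(A | N)
p_a = Fraction(no_free, total)                      # evidence P(A)

# Bayes' theorem: P(N | A) = P(A | N) * P(N) / P(A)
p_n_given_a = p_a_given_n * p_n / p_a

print(p_n_given_a, round(float(p_n_given_a), 3))
```

The total sample size cancels out, so the result is simply 14,820 / 18,120.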

c) Let F = event that a randomly selected email contains the word 'free'

P(F) = n(F) / 20,000 = 1,880 / 20,000 = 0.094

S ∪ F = event that the email is either spam or contains the word 'free'

(S ∪ F)' = event that the email is neither spam nor contains the word 'free'

P(S ∪ F) = P(S) + P(F) - P(S ∩ F) = 4,400 / 20,000 + 1,880 / 20,000 - 1,100 / 20,000

= 5,180 / 20,000

= 0.259

Now, P((S ∪ F)') = 1 - P(S ∪ F) = 0.741

This matches the non-spam, no-'free' cell of the table: 14,820 / 20,000 = 0.741

Probability that a randomly selected email is neither spam nor contains the word 'free' = 0.741
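
A short sketch, assuming the counts from the question and using illustrative variable names, that evaluates the addition rule and the complement exactly and compares the result with the direct count of emails that are non-spam and lack the word 'free':

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)         # P(S): spam
p_f = Fraction(1_880, total)         # P(F): contains 'free'
p_s_and_f = Fraction(1_100, total)   # P(S and F)

# Addition rule, then complement.
p_s_or_f = p_s + p_f - p_s_and_f
p_neither = 1 - p_s_or_f

# Direct count: non-spam emails without 'free'.
p_direct = Fraction(14_820, total)

print(float(p_s_or_f), float(p_neither), p_neither == p_direct)
```

Working with exact fractions avoids the rounding step, so the union comes out as 259/1000 and the complement as 741/1000.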

d) We need to check whether events S and F are independent.

If S and F are independent, then P(S ∩ F) = P(S) * P(F)

We have P(S ∩ F) = 1,100 / 20,000 = 0.055

P(S) * P(F) = 0.22 * 0.094

= 0.02068

≠ P(S ∩ F)

So S and F are not independent events.

An email being spam and its subject line containing the word ‘free’ are not independent events.
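
To illustrate how far the data are from independence, the sketch below compares P(S ∩ F) with P(S) * P(F), and also compares P(F | S) with P(F). The ratio called `lift` here is an added illustration and not part of the original solution. Variable names are assumptions.

```python
from fractions import Fraction

total = 20_000
p_s = Fraction(4_400, total)         # P(S)
p_f = Fraction(1_880, total)         # P(F)
p_s_and_f = Fraction(1_100, total)   # P(S and F)

product = p_s * p_f                  # value P(S and F) would take under independence
independent = (p_s_and_f == product)

# Equivalent check: under independence P(F | S) would equal P(F).
p_f_given_s = p_s_and_f / p_s

# Ratio of observed joint probability to the independence value.
lift = p_s_and_f / product

print(float(product), independent)
print(float(p_f_given_s), float(p_f), float(lift))
```

Here P(F | S) is 0.25 while P(F) is 0.094, so knowing an email is spam makes 'free' in the subject line noticeably more likely, consistent with the conclusion that the events are dependent.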

