Question

A software company is trying to use Bayes' theorem and other rules of probability to develop an algorithm that can effectively filter out spam emails. The company’s software developers carefully examined a random sample of 5,000 emails received by its employees and found that 1,200 of the emails were spam. A closer look at the spam emails revealed that 300 of them contained the word ‘free’ in the subject line. On the other hand, only 228 of the non-spam emails had the word ‘free’ in their subject line. Please answer the following questions based on the given information.

  1. What is the probability that a random email is spam and does not contain the word ‘free’ in its subject line? Please show your work. [2 points]
  2. If the word ‘free’ does not appear in the subject line of an email, what is the probability that the email is not spam? Please show your work. [2 points]
  3. What is the probability that a random email is neither spam nor contains the word ‘free’ in its subject line? Please show your work. [2 points]
  4. Are an email being spam and its subject line containing the word ‘free’ independent events? Please show how you arrived at your answer. [1 point]

Solutions

Expert Solution

From the given information, the following table is formulated:

|          | Contains ‘free’ | Does not contain ‘free’ | Total |
|----------|-----------------|-------------------------|-------|
| Spam     | 300             |                         | 1200  |
| Not spam | 228             |                         |       |
| Total    |                 |                         | 5000  |

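The blank cells are then filled in by subtraction and addition:

|          | Contains ‘free’ | Does not contain ‘free’ | Total            |
|----------|-----------------|-------------------------|------------------|
| Spam     | 300             | 1200-300=900            | 1200             |
| Not spam | 228             | 3800-228=3572           | 5000-1200=3800   |
| Total    | 300+228=528     | 3572+900=4472           | 5000             |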
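The same bookkeeping can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names (`ham` for non-spam, `table`, and so on) are made up here and do not come from the original solution.

```python
# Counts given in the problem statement.
total = 5000       # emails in the sample
spam = 1200        # spam emails
spam_free = 300    # spam emails with 'free' in the subject line
ham_free = 228     # non-spam emails with 'free' in the subject line

# Every remaining cell follows by subtraction or addition.
ham = total - spam                     # non-spam emails
spam_no_free = spam - spam_free        # spam without 'free'
ham_no_free = ham - ham_free           # non-spam without 'free'
free = spam_free + ham_free            # all emails with 'free'
no_free = spam_no_free + ham_no_free   # all emails without 'free'

table = {
    "spam": {"free": spam_free, "no_free": spam_no_free, "total": spam},
    "not_spam": {"free": ham_free, "no_free": ham_no_free, "total": ham},
    "total": {"free": free, "no_free": no_free, "total": total},
}

# Sanity check: the two column totals must add up to the sample size.
assert free + no_free == total

for row, cells in table.items():
    print(row, cells)
```

The final assertion is a cheap way to catch an arithmetic slip before any probabilities are computed from the table.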

Probability that a random email is spam and does not contain the word ‘free’ in its subject line

= Number of spam emails that do not contain the word ‘free’ in the subject line / Total number of emails

From the above table,

Number of spam emails that do not contain the word ‘free’ in the subject line = 900

Total number of emails = 5000

Therefore, the probability that a random email is spam and does not contain the word ‘free’ in its subject line = 900/5000 = 0.18
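As a cross-check, the same value can be reached without counting the cell directly, by subtracting P(spam and ‘free’) from P(spam). The short Python sketch below does this with exact fractions; the variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Counts given in the problem statement.
total, spam, spam_free = 5000, 1200, 300

p_spam = Fraction(spam, total)                # P(spam) = 0.24
p_spam_and_free = Fraction(spam_free, total)  # P(spam and 'free') = 0.06

# P(spam and no 'free') = P(spam) - P(spam and 'free')
p_spam_and_no_free = p_spam - p_spam_and_free

print(p_spam_and_no_free, float(p_spam_and_no_free))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared to 900/5000 without floating-point rounding.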

If the word ‘free’ does not appear in the subject line of an email, the probability that the email is not spam

= Number of non-spam emails that do not contain the word ‘free’ in the subject line / Number of emails that do not contain the word ‘free’ in the subject line

From the above table,

Number of non-spam emails that do not contain the word ‘free’ in the subject line = 3572

Number of emails that do not contain the word ‘free’ in the subject line = 4472

Therefore, if the word ‘free’ does not appear in the subject line of an email, the probability that the email is not spam = 3572/4472 = 0.798747764 (about 0.80)
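Since the question mentions Bayes' theorem, the same conditional probability can also be reached through it: P(not spam | no ‘free’) = P(no ‘free’ | not spam) x P(not spam) / P(no ‘free’), with the denominator taken from the law of total probability. The Python sketch below is one way to check this under the counts given in the problem; the names (`ham`, `posterior`, and so on) are illustrative only.

```python
from fractions import Fraction

# Counts given in the problem statement.
total, spam, spam_free, ham_free = 5000, 1200, 300, 228
ham = total - spam  # non-spam emails

# Priors and likelihoods estimated from the sample.
p_spam = Fraction(spam, total)
p_ham = Fraction(ham, total)
p_no_free_given_spam = Fraction(spam - spam_free, spam)
p_no_free_given_ham = Fraction(ham - ham_free, ham)

# Law of total probability for the evidence P(no 'free').
p_no_free = p_no_free_given_ham * p_ham + p_no_free_given_spam * p_spam

# Bayes' theorem: P(not spam | no 'free').
posterior = p_no_free_given_ham * p_ham / p_no_free

print(posterior, round(float(posterior), 4))
```

The result agrees with the direct count ratio 3572/4472, which is expected because both routes use the same sample frequencies.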

Probability that a random email is neither spam nor contains the word ‘free’ in its subject line

= Number of non-spam emails that do not contain the word ‘free’ in the subject line / Total number of emails

From the above table,

Number of non-spam emails that do not contain the word ‘free’ in the subject line = 3572

Total number of emails = 5000

Therefore, the probability that a random email is neither spam nor contains the word ‘free’ in its subject line = 3572/5000 = 0.7144
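This answer can also be verified with the complement rule: P(neither) = 1 - P(spam or ‘free’), where P(spam or ‘free’) = P(spam) + P(‘free’) - P(spam and ‘free’). The following Python sketch, with illustrative variable names, checks this under the counts given in the problem.

```python
from fractions import Fraction

# Counts given in the problem statement.
total, spam, spam_free, ham_free = 5000, 1200, 300, 228
free = spam_free + ham_free  # all emails with 'free' in the subject line

# Inclusion-exclusion: P(spam or 'free').
p_spam_or_free = (
    Fraction(spam, total) + Fraction(free, total) - Fraction(spam_free, total)
)

# Complement rule: P(neither spam nor 'free').
p_neither = 1 - p_spam_or_free

print(p_neither, float(p_neither))
```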

Are an email being spam and its subject line containing the word ‘free’ independent events?

Answer: No.

Probability that an email is spam = Number of spam emails / Total number of emails = 1200/5000 = 0.24

Probability that an email contains the word ‘free’ in the subject line = Number of emails that contain ‘free’ in the subject line / Total number of emails = 528/5000 = 0.1056

If an email being spam and its subject line containing the word ‘free’ were independent events, then:

Probability that an email is spam and contains the word ‘free’ in the subject line = Probability that an email is spam x Probability that an email contains the word ‘free’ in the subject line = 0.24 x 0.1056 = 0.025344

From the table, however, the actual value is:

Probability that an email is spam and contains the word ‘free’ in the subject line = Number of spam emails that contain the word ‘free’ in the subject line / Total number of emails = 300/5000 = 0.06

As 0.06 is not equal to 0.025344, i.e.

Probability that an email is spam and contains the word ‘free’ in the subject line ≠ Probability that an email is spam x Probability that an email contains the word ‘free’ in the subject line

Therefore, an email being spam and its subject line containing the word ‘free’ are not independent events.
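The independence check can be sketched in Python as follows. It compares the joint probability with the product of the marginals and, as an equivalent view, compares P(‘free’ | spam) with P(‘free’). The variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Counts given in the problem statement.
total, spam, spam_free, ham_free = 5000, 1200, 300, 228

p_spam = Fraction(spam, total)                  # 0.24
p_free = Fraction(spam_free + ham_free, total)  # 0.1056
p_joint = Fraction(spam_free, total)            # P(spam and 'free') = 0.06

# Independence would require P(spam and 'free') == P(spam) * P('free').
product = p_spam * p_free
independent = p_joint == product

# Equivalent check: under independence, P('free' | spam) would equal P('free').
p_free_given_spam = Fraction(spam_free, spam)   # 0.25

print(float(p_joint), float(product), independent)
print(float(p_free_given_spam), float(p_free))
```

In this sample, ‘free’ appears in 25% of spam subject lines but in only about 10.6% of all subject lines, which is why the word carries information about whether an email is spam.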

