Question

In: Statistics and Probability

The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips!. Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes' theorem. Suppose that for one email account, in every messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.

shipping!      0.050
today!         0.047
here!          0.034
available      0.016
fingertips!    0.016


Also suppose that the proportions of ham messages that have these words are

shipping!      0.0016
today!         0.0021
here!          0.0021
available      0.0041
fingertips!    0.0010

Round your answers to three decimal places.

a. If a message includes the word shipping!, what is the probability the message is spam?

If a message includes the word shipping!, what is the probability the message is ham?

Should messages that include the word shipping! be flagged as spam?

b. If a message includes the word today!, what is the probability the message is spam?

If a message includes the word here!, what is the probability the message is spam?

Which of these two words is a stronger indicator that a message is spam?

Why?

Because the probability is

c. If a message includes the word available, what is the probability the message is spam?

If a message includes the word fingertips!, what is the probability the message is spam?

Which of these two words is a stronger indicator that a message is spam?

Why?

Because the probability is

d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes' theorem to work effectively?

Explain.

It is easier to distinguish spam from ham when a word occurs more often in spam and less often in ham.

Solutions

Expert Solution

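a. Probability that a message is spam given that it has the word shipping!:

P(spam | shipping!) = 0.050 P(spam) / [0.050 P(spam) + 0.0016 (1 - P(spam))]

where P(spam) is the proportion of all messages that are spam, as given in the problem statement.

Probability that a message is ham given that it has the word shipping!:

P(ham | shipping!) = 1 - P(spam | shipping!)

Yes, messages containing shipping! should be flagged as spam, because the probability that such a message is spam is much higher than the probability that it is ham.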
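The calculation in part (a) can be sketched in a few lines of Python. The function name `posterior_spam` and the constant `ASSUMED_PRIOR` are illustrative only. The prior proportion of spam has to be taken from the problem statement; the value 0.10 below is an assumed placeholder, not the problem's figure.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, prior_spam):
    """Return P(spam | word) by Bayes' theorem."""
    # Joint probability that a message is spam and contains the word.
    spam_and_word = p_word_given_spam * prior_spam
    # Joint probability that a message is ham and contains the word.
    ham_and_word = p_word_given_ham * (1.0 - prior_spam)
    return spam_and_word / (spam_and_word + ham_and_word)


# ASSUMPTION: placeholder prior; replace with the value from the problem.
ASSUMED_PRIOR = 0.10

# Proportions for shipping! from the two tables in the question.
p_spam = posterior_spam(0.050, 0.0016, ASSUMED_PRIOR)
p_ham = 1.0 - p_spam

print(round(p_spam, 3), round(p_ham, 3))
```

Changing `ASSUMED_PRIOR` changes both numbers, which is why the prior must be read from the problem before rounding to three decimal places.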

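b. Probability that a message is spam given that it has the word today!:

P(spam | today!) = 0.047 P(spam) / [0.047 P(spam) + 0.0021 (1 - P(spam))]

Probability that a message is spam given that it has the word here!:

P(spam | here!) = 0.034 P(spam) / [0.034 P(spam) + 0.0021 (1 - P(spam))]

today! is the stronger indicator. Both words appear in the same proportion of ham messages (0.0021), but today! appears in a larger proportion of spam messages (0.047 versus 0.034), so a message containing today! has the higher probability of being spam.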

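c. Probability that a message is spam given that it has the word available:

P(spam | available) = 0.016 P(spam) / [0.016 P(spam) + 0.0041 (1 - P(spam))]

Probability that a message is spam given that it has the word fingertips!:

P(spam | fingertips!) = 0.016 P(spam) / [0.016 P(spam) + 0.0010 (1 - P(spam))]

fingertips! is the stronger indicator. Both words appear in the same proportion of spam messages (0.016), but fingertips! appears in a smaller proportion of ham messages (0.0010 versus 0.0041), so a message containing fingertips! has the higher probability of being spam.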

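d. Parts (b) and (c) show that what matters is how much more often a word occurs in spam than in ham. In part (b) the two words are equally common in ham, and the one that is more common in spam (today!) is the stronger indicator. In part (c) the two words are equally common in spam, and the one that is rarer in ham (fingertips!) is the stronger indicator. Of the four words, available is the weakest indicator, since its spam proportion is only about four times its ham proportion. A filter based on Bayes' theorem works best with words that occur often in spam and seldom in ham.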
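The point in part (d) can be checked numerically. The sketch below ranks the five words by the ratio of their spam proportion to their ham proportion and confirms that the ranking by posterior probability is the same for several priors. The names `likelihood_ratio`, `posterior_spam`, and `ASSUMED_PRIORS` are illustrative, and the listed priors are assumed values chosen only to show that the ordering does not depend on them.

```python
# Proportions taken from the two tables in the question.
SPAM = {"shipping!": 0.050, "today!": 0.047, "here!": 0.034,
        "available": 0.016, "fingertips!": 0.016}
HAM = {"shipping!": 0.0016, "today!": 0.0021, "here!": 0.0021,
       "available": 0.0041, "fingertips!": 0.0010}


def likelihood_ratio(word):
    """How many times more common the word is in spam than in ham."""
    return SPAM[word] / HAM[word]


def posterior_spam(word, prior_spam):
    """P(spam | word) by Bayes' theorem for a given prior."""
    spam_and_word = SPAM[word] * prior_spam
    ham_and_word = HAM[word] * (1.0 - prior_spam)
    return spam_and_word / (spam_and_word + ham_and_word)


# Strongest indicator first.
by_ratio = sorted(SPAM, key=likelihood_ratio, reverse=True)
print(by_ratio)

# ASSUMPTION: illustrative priors, not the value from the problem.
ASSUMED_PRIORS = (0.05, 0.10, 0.25, 0.50)
for prior in ASSUMED_PRIORS:
    by_posterior = sorted(
        SPAM, key=lambda w: posterior_spam(w, prior), reverse=True
    )
    print(prior, by_posterior == by_ratio)
```

The ranking is driven entirely by the spam-to-ham ratio, so a word such as available, which is relatively common in ham, ranks last even though it is as frequent in spam as fingertips!.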

You can comment if you still have any doubts. Please rate the answer if it was helpful.

