Question

Spam Filtering using Naïve Bayesian classifier

In the spam filtering problem, the examples (observations) are emails, and the features are the words in each email. Using words as features assumes that some words are more likely to appear in spam than in non-spam, which is the basic premise underlying most spam filters.
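The premise above can be illustrated with a minimal sketch. The toy emails below are invented for illustration and are not taken from Spambase (which stores word frequencies as continuous values rather than raw text); the helper name `word_probabilities` and the smoothing parameter `alpha` are likewise assumptions of this example. It estimates P(word | class) with add-one smoothing and compares the two classes.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only (not from Spambase).
spam_emails = ["win free money now", "free prize claim now"]
ham_emails = ["meeting agenda for monday", "please review the free draft"]


def word_probabilities(emails, vocabulary, alpha=1.0):
    """Estimate P(word | class) with add-alpha (Laplace) smoothing."""
    counts = Counter(word for email in emails for word in email.split())
    total = sum(counts.values())
    denominator = total + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denominator for w in vocabulary}


# Shared vocabulary across both classes.
vocabulary = sorted({w for e in spam_emails + ham_emails for w in e.split()})
p_word_given_spam = word_probabilities(spam_emails, vocabulary)
p_word_given_ham = word_probabilities(ham_emails, vocabulary)

# A ratio above 1 means the word is more typical of spam in this toy corpus.
for word in ("free", "meeting"):
    ratio = p_word_given_spam[word] / p_word_given_ham[word]
    print(f"{word}: P(w|spam) / P(w|ham) = {ratio:.2f}")
```

In this toy corpus "free" comes out more likely under spam and "meeting" more likely under non-spam, which is the kind of signal a naïve Bayes filter combines across all words.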

The attached folder contains three files:

  1. Spambase.Documentation: has all the information about the dataset
    1. Source
    2. Number of examples (instances or observations)
    3. Number of attributes
    4. Attributes information
  2. Spambase.names: has the names of the features, and the target
  3. Spambase.data: contains the dataset

Write Python code to:

  1. Train a Bayesian classifier to classify emails as spam or non-spam.
    1. Divide the dataset into a training set and a testing set.
    2. Train the classifier on the training set, and test it on the testing set.
    3. Use 70% of the dataset for training and 30% for testing.
  2. Calculate the accuracy of the classifier.

The following documents can be found here: https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/

  • spambase.DOCUMENTATION
  • spambase.data
  • spambase.names

Thank you

Solutions

Expert Solution

Below is the code to copy:

#CODE STARTS HERE----------------
import pandas as pd
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Download spambase.data into the same directory as this script.
# The file has no header row, so label the 58 columns with integers 0-57.
df = pd.read_csv("spambase.data", header=None, names=range(58))  # Load data into a DataFrame

X = df.loc[:, :56]  # Features: columns 0-56 (.loc slicing is inclusive)
y = df.loc[:, 57]   # Target: column 57 (1 = spam, 0 = non-spam)

# Split data into train (70%) and test (30%).
# No random_state is set, so the split and the accuracy vary from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

model = GaussianNB()         # Gaussian naive Bayes for continuous features
model.fit(X_train, y_train)  # Train the model

pred = model.predict(X_test)  # Predict the class of each test email
# Calculate and print accuracy
print("Accuracy: ", round(metrics.accuracy_score(y_test, pred), 4))
#CODE ENDS HERE-----------------
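
To show roughly what the classifier above computes, here is a minimal from-scratch sketch of Gaussian naive Bayes in NumPy. It is not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the `var_floor` constant, and the synthetic two-class data (used so the example runs without spambase.data) are all assumptions of this sketch. It performs a 70/30 split, fits per-class means and variances, predicts by the largest log-posterior, and computes accuracy as the fraction of correct predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Spambase (assumption): two classes, three continuous features.
n_per_class = 200
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 3)),  # class 0 ("non-spam")
    rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 3)),  # class 1 ("spam")
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle the row indices, then take 70% for training and 30% for testing.
indices = rng.permutation(len(y))
split = (7 * len(y)) // 10
train_idx, test_idx = indices[:split], indices[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # var_floor (an assumption of this sketch) avoids division by zero.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_floor
    return classes, priors, means, variances


def predict_gaussian_nb(params, X):
    """Pick the class maximizing log P(c) + sum_j log N(x_j | mean_cj, var_cj)."""
    classes, priors, means, variances = params
    diff = X[:, None, :] - means[None, :, :]  # shape: (samples, classes, features)
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]


params = fit_gaussian_nb(X_train, y_train)
pred = predict_gaussian_nb(params, X_test)

# Accuracy = fraction of test examples whose predicted class matches the true class.
accuracy = float(np.mean(pred == y_test))
print("Accuracy:", round(accuracy, 4))
```

The sum over features inside `predict_gaussian_nb` is where the naïve assumption enters: features are treated as independent given the class, so their log-likelihoods simply add.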
