Spam Filtering using Naïve Bayesian classifier
In the spam filtering problem, the examples (observations) are the emails, and the features are the words in each email. Using words as features assumes that some words are more likely to appear in spam than in non-spam, which is the basic premise underlying most spam filters.
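To make that premise concrete, here is a minimal sketch of a word-based naive Bayes classifier in plain Python. The four training messages, the function names (`fit`, `log_score`, `classify`), and the add-one smoothing are assumptions made for illustration; they are not part of the assignment or the Spambase data.

```python
import math
from collections import Counter

LABELS = ("spam", "non-spam")

# Toy training set, made up for illustration (not taken from Spambase).
train = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("lunch on monday", "non-spam"),
]

def fit(examples):
    """Count words per class and documents per class."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab |= set(counts)
    return word_counts, doc_counts, vocab

def log_score(text, label, word_counts, doc_counts, vocab):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for word in text.split():
        if word not in vocab:
            continue  # skip words never seen during training
        count = word_counts[label][word]
        score += math.log((count + 1) / (total_words + len(vocab)))
    return score

def classify(text, model):
    """Pick the label with the higher log score."""
    return max(LABELS, key=lambda label: log_score(text, label, *model))

model = fit(train)
print(classify("free money", model))
print(classify("monday meeting", model))
```

Words such as "free" and "money" occur only in the spam examples, so a message containing them scores higher under the spam class; that is the whole idea behind using words as features.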
The attached folder contains three files:
Write Python code to:
These documents can be found here: https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/
Thank you
Comments in the program explain each step. The original answer also included screenshots of the Python program (to check indentation) and of its output.
Below is the code to copy:
#CODE STARTS HERE----------------
import pandas as pd
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Download spambase.data into the same directory as your code.
# Load the data into a DataFrame; the file has no header row,
# so name the 58 columns with the integers 0-57.
df = pd.read_csv("spambase.data", header=None, names=range(58))

X = df.loc[:, :56]  # Features: columns 0-56 (.loc label slices are inclusive)
y = df.loc[:, 57]   # Class label: column 57 (1 = spam, 0 = non-spam)

# Split the data into train (70%) and test (30%) sets.
# The split is random, so the accuracy varies slightly between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

model = GaussianNB()
model.fit(X_train, y_train)  # Train the model
pred = model.predict(X_test)  # Predict the class of each test email

# Calculate and print the accuracy
print("Accuracy: ", round(metrics.accuracy_score(y_test, pred), 4))
#CODE ENDS HERE-----------------
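For readers who want to see what a Gaussian naive Bayes model computes, here is a rough from-scratch sketch in plain Python. It is not scikit-learn's implementation: the function names (`fit_gaussian_nb`, `predict_one`), the fixed `eps` added to each variance, and the four-row data set are assumptions for illustration only (scikit-learn's `GaussianNB` handles variance smoothing differently, through its `var_smoothing` parameter).

```python
import math

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior, feature means, and feature variances per class."""
    params = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps  # eps avoids zero variance
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(y), means, variances)
    return params

def predict_one(x, params):
    """Return the class with the highest log prior plus log likelihood."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in params.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density for this feature under this class.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up data: two numeric features per email, 1 = spam, 0 = non-spam.
X = [[0.0, 1.0], [0.2, 1.5], [3.0, 9.0], [2.5, 8.0]]
y = [0, 0, 1, 1]
params = fit_gaussian_nb(X, y)
print(predict_one([0.1, 1.2], params))
print(predict_one([2.8, 8.5], params))
```

Each feature is treated as independent given the class, which is the "naive" assumption; the per-feature log densities are simply summed.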