This dataset has four features: author, thread, length, and where the mail is read. Based on these features, the algorithm has to predict the user’s action: whether the user reads or skips the mail.
Use a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread is a follow up, the length of the mail is short, and the mail is read at home.
Author  | Thread    | Length | Where to read | User’s Action
known   | new       | long   | home          | skips
unknown | new       | short  | work          | reads
unknown | follow up | long   | work          | skips
known   | follow up | long   | home          | skips
known   | new       | short  | home          | reads
known   | follow up | long   | work          | skips
unknown | new       | short  | work          | skips
unknown | new       | short  | work          | reads
known   | follow up | long   | home          | skips
known   | new       | long   | work          | skips
unknown | follow up | short  | home          | skips
known   | new       | long   | work          | skips
known   | follow up | short  | home          | reads
known   | new       | short  | work          | reads
known   | new       | short  | home          | reads
known   | follow up | short  | work          | reads
known   | new       | short  | home          | reads
unknown | new       | short  | work          | reads
Hint: in the author feature you can use 0 and 1 instead of unknown and known. In the thread feature you can use 0 and 1 instead of follow up and new. In the length feature you can use 0 and 1 instead of short and long. In the where-to-read feature you can use 0 and 1 instead of home and work. In the target you can use 0 instead of skips and 1 instead of reads.
Data set:
Author (0 = unknown, 1 = known)
[1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
Thread (0 = follow up, 1 = new)
[1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
Length (0 = short, 1 = long)
[1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
Where to read (0 = home, 1 = work)
[0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
User's Action (0 = skip, 1 = read)
[0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]
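The 0/1 coding from the hint can be sketched as a small lookup step. The names `ENCODING` and `encode_row` are hypothetical helpers introduced only for illustration; the rows are the first three rows of the table.

```python
# Hypothetical lookup tables following the 0/1 coding given in the hint.
ENCODING = {
    "author": {"unknown": 0, "known": 1},
    "thread": {"follow up": 0, "new": 1},
    "length": {"short": 0, "long": 1},
    "where_to_read": {"home": 0, "work": 1},
    "user_action": {"skips": 0, "reads": 1},
}
COLUMNS = ["author", "thread", "length", "where_to_read", "user_action"]

def encode_row(row):
    """Map one table row to its 0/1 codes (the action column is optional)."""
    # Strip and lower-case each cell so differently capitalized cells still match.
    return [ENCODING[col][cell.strip().lower()] for col, cell in zip(COLUMNS, row)]

first_rows = [
    ["Known", "new", "long", "home", "Skips"],
    ["unknown", "new", "short", "work", "Reads"],
    ["unknown", "Follow up", "long", "work", "Skips"],
]
encoded = [encode_row(r) for r in first_rows]
print(encoded)  # [[1, 1, 1, 0, 0], [0, 1, 0, 1, 1], [0, 0, 1, 1, 0]]

# The query from the question has no action column yet.
query = encode_row(["Known", "Follow up", "Short", "Home"])
print(query)  # [1, 0, 0, 0]
```

Reading the encoded rows column by column gives the first three entries of each list in the data set.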
Task: predict the user’s action (skips or reads) when the author of the mail is known (1), the thread is a follow up (0), the length of the mail is short (0), and the mail is read at home (0).
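For reference, the Naive Bayes decision rule for this query can be written as below. The symbol a (the user's action) and the hat notation are chosen here for illustration, and the rule assumes the four features are conditionally independent given the action.

```latex
\[
\hat{a} = \arg\max_{a \in \{\text{skips},\,\text{reads}\}}
P(a)\,
P(\text{author}=\text{known} \mid a)\,
P(\text{thread}=\text{follow up} \mid a)\,
P(\text{length}=\text{short} \mid a)\,
P(\text{where}=\text{home} \mid a)
\]
```

Each conditional probability is estimated as a relative frequency within the rows that share the same action.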
Question 1: Training the model without scikit-learn
# Load the given data (0/1 encoding as described above)
author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Number of mails per class (9 reads and 9 skips in this data set)
n_read = user_action.count(1)
n_skip = user_action.count(0)

# Lists storing the precomputed conditional probabilities, one entry per feature:
# read_0[k] = P(feature k = 0 | read), read_1[k] = P(feature k = 1 | read), etc.
read_0 = []
read_1 = []
skip_0 = []
skip_1 = []

# Compute the probabilities of one feature's values conditioned on the user action
def pre_computation(feature):
    read_para0 = 0
    read_para1 = 0
    skip_para0 = 0
    skip_para1 = 0
    for i in range(len(user_action)):
        if user_action[i] == 1:
            if feature[i] == 1:
                read_para1 += 1
            else:
                read_para0 += 1
        else:
            if feature[i] == 1:
                skip_para1 += 1
            else:
                skip_para0 += 1
    # Divide by the class counts instead of a hard-coded 9
    read_0.append(read_para0 / n_read)
    read_1.append(read_para1 / n_read)
    skip_0.append(skip_para0 / n_skip)
    skip_1.append(skip_para1 / n_skip)

# Compute the conditional probabilities for each feature
pre_computation(author)
pre_computation(thread)
pre_computation(length)
pre_computation(where_to_read)

# Class priors; they are equal here (9/18 each), so they do not change the comparison
prior_read = n_read / len(user_action)
prior_skip = n_skip / len(user_action)

# Unnormalized posterior of read for the query: author = 1, thread = 0, length = 0, where = 0
prob_read = prior_read * read_1[0] * read_0[1] * read_0[2] * read_0[3]
# Unnormalized posterior of skip for the same query
prob_skip = prior_skip * skip_1[0] * skip_0[1] * skip_0[2] * skip_0[3]

if prob_read > prob_skip:
    print("Predicted User action is read")
else:
    print("Predicted User action is skip")
Output:
Predicted User action is read
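To see how close the two classes are, the same calculation can be repeated with exact fractions and normalized into a posterior probability. This is a minimal sketch, not part of the original answer; the helper name `score` is hypothetical, and no smoothing is applied.

```python
from fractions import Fraction

author        = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread        = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length        = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action   = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

features = [author, thread, length, where_to_read]
query = [1, 0, 0, 0]  # known author, follow-up thread, short mail, read at home

def score(action):
    """Unnormalized posterior: class prior times the per-feature likelihoods."""
    rows = [i for i, a in enumerate(user_action) if a == action]
    result = Fraction(len(rows), len(user_action))  # prior P(action)
    for feature, value in zip(features, query):
        matches = sum(1 for i in rows if feature[i] == value)
        result *= Fraction(matches, len(rows))  # P(feature = value | action)
    return result

score_read, score_skip = score(1), score(0)
posterior_read = score_read / (score_read + score_skip)
print(score_read, score_skip, posterior_read)  # 8/243 40/2187 9/14
```

The read score is (1/2)(6/9)(2/9)(9/9)(4/9) and the skip score is (1/2)(6/9)(5/9)(2/9)(4/9), so the normalized probability of read is 9/14, about 0.64.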
Question 2: Training the model using scikit-learn
# Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
import numpy as np

author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Stack the four features as columns: one row per mail, shape (18, 4)
X_train = np.vstack((author, thread, length, where_to_read)).T
# Training labels (0 = skip, 1 = read)
y_train = user_action

# Create a Gaussian Naive Bayes classifier
# (note: GaussianNB treats the 0/1 features as continuous values)
model = GaussianNB()
# Train the model on the training set
model.fit(X_train, y_train)
Output: GaussianNB(priors=None, var_smoothing=1e-09)
Now we can predict the query case:
author = 1 (known), thread = 0 (follow up), length = 0 (short), where to read = 0 (home)
X_test = [1,0,0,0]
y_pred = model.predict([X_test])
print(y_pred)
Output: [1]
A prediction of 1 means the predicted user action is read.
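Because every feature here is binary, a Bernoulli Naive Bayes model is arguably a closer match to the hand calculation than a Gaussian one. The sketch below swaps in scikit-learn's `BernoulliNB` as an alternative, not as the method used above; `alpha=1.0` (Laplace smoothing, the library default) is an assumption, so the probabilities differ slightly from the unsmoothed frequencies.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

author        = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread        = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length        = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action   = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# One row per mail, one column per feature
X = np.column_stack((author, thread, length, where_to_read))
y = np.array(user_action)

# alpha=1.0 adds one pseudo-count per feature value (Laplace smoothing)
model = BernoulliNB(alpha=1.0)
model.fit(X, y)

# Query: known author, follow-up thread, short mail, read at home
query = np.array([[1, 0, 0, 0]])
pred = model.predict(query)[0]
proba = model.predict_proba(query)[0]  # ordered as model.classes_, i.e. [skip, read]
print(pred, proba)
```

With smoothing, the read probability works out to (7·3·10·5) / (7·3·10·5 + 7·6·3·5) = 0.625, so this model also predicts read.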