Question

In: Computer Science

  1. The following training dataset is the “reading email dataset”.

This dataset has four features: author, thread, length, and where to read the mail. Based on these features, the algorithm has to predict the user’s action: whether to read or skip the mail.

Use a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the email is read at home.

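| Author  | Thread    | Length | Where to read | User’s Action |
|---------|-----------|--------|---------------|---------------|
| Known   | New       | Long   | Home          | Skips         |
| Unknown | New       | Short  | Work          | Reads         |
| Unknown | Follow up | Long   | Work          | Skips         |
| Known   | Follow up | Long   | Home          | Skips         |
| Known   | New       | Short  | Home          | Reads         |
| Known   | Follow up | Long   | Work          | Skips         |
| Unknown | New       | Short  | Work          | Skips         |
| Unknown | New       | Short  | Work          | Reads         |
| Known   | Follow up | Long   | Home          | Skips         |
| Known   | New       | Long   | Work          | Skips         |
| Unknown | Follow up | Short  | Home          | Skips         |
| Known   | New       | Long   | Work          | Skips         |
| Known   | Follow up | Short  | Home          | Reads         |
| Known   | New       | Short  | Work          | Reads         |
| Known   | New       | Short  | Home          | Reads         |
| Known   | Follow up | Short  | Work          | Reads         |
| Known   | New       | Short  | Home          | Reads         |
| Unknown | New       | Short  | Work          | Reads         |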

  1. Write Python code to implement a naïve Bayes classifier that predicts the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the email is read at home. (Do not use Scikit-Learn.)
  2. Use Scikit-Learn to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the email is read at home.

Hint: in the author feature you can use 0 and 1 instead of unknown and known. In the thread feature you can use 0 and 1 instead of follow up and new. In the length feature you can use 0 and 1 instead of short and long. In the where-to-read feature you can use 0 and 1 instead of home and work. In the target you can use 0 for skips and 1 for reads.
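The 0/1 encoding described in the hint can be sketched as below. This is only an illustration: the names `rows`, `CODES`, `COLUMNS`, and `encode` are made up for this sketch, and the rows are copied from the training table with capitalisation normalised by lower-casing.

```python
# Rows copied from the training table:
# (author, thread, length, where to read, user's action)
rows = [
    ("known", "new", "long", "home", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("unknown", "follow up", "long", "work", "skips"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "long", "work", "skips"),
    ("unknown", "new", "short", "work", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("unknown", "follow up", "short", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("known", "follow up", "short", "home", "reads"),
    ("known", "new", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("unknown", "new", "short", "work", "reads"),
]

# 0/1 codes taken from the hint (hypothetical names, not part of the question).
CODES = {
    "author": {"unknown": 0, "known": 1},
    "thread": {"follow up": 0, "new": 1},
    "length": {"short": 0, "long": 1},
    "where": {"home": 0, "work": 1},
    "action": {"skips": 0, "reads": 1},
}
COLUMNS = ["author", "thread", "length", "where", "action"]


def encode(table_rows):
    """Return a dict mapping each column name to its list of 0/1 codes."""
    encoded = {name: [] for name in COLUMNS}
    for row in table_rows:
        for name, value in zip(COLUMNS, row):
            # Lower-case the cell so "Known" and "known" map to the same code.
            encoded[name].append(CODES[name][value.strip().lower()])
    return encoded


encoded = encode(rows)
for name in COLUMNS:
    print(name, encoded[name])
```

Each printed list has 18 entries, one per row of the table, in the same order as the rows.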

Solutions

Expert Solution

Data set (encoded):

Author (0 = unknown, 1 = known)

[1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]

Thread (0 = follow up, 1 = new)

[1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]

Length (0 = short, 1 = long)

[1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]

Where to read (0 = home, 1 = work)

[0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]

User's Action (0 = skip, 1 = read)

[0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

Task: predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the email is read at home. In the encoding above, this query is author = 1, thread = 0, length = 0, where to read = 0.
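As a hand check, the naïve Bayes scores for this query can be worked out from the 18 training rows (9 reads, 9 skips). This is a sketch using plain relative frequencies with no smoothing; the symbol $x$ for the query is introduced here for convenience.

```latex
\begin{align*}
P(\text{read} \mid x) &\propto P(\text{read})\,
  P(\text{known} \mid \text{read})\,
  P(\text{follow up} \mid \text{read})\,
  P(\text{short} \mid \text{read})\,
  P(\text{home} \mid \text{read}) \\
 &= \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{2}{9}\cdot\tfrac{9}{9}\cdot\tfrac{4}{9}
  = \tfrac{216}{6561} \approx 0.0329 \\[4pt]
P(\text{skip} \mid x) &\propto P(\text{skip})\,
  P(\text{known} \mid \text{skip})\,
  P(\text{follow up} \mid \text{skip})\,
  P(\text{short} \mid \text{skip})\,
  P(\text{home} \mid \text{skip}) \\
 &= \tfrac{9}{18}\cdot\tfrac{6}{9}\cdot\tfrac{5}{9}\cdot\tfrac{2}{9}\cdot\tfrac{4}{9}
  = \tfrac{120}{6561} \approx 0.0183 \\[4pt]
P(\text{read} \mid x) &= \frac{216}{216 + 120} = \tfrac{9}{14} \approx 0.643
\end{align*}
```

The score for read is larger, so the classifier should predict that the user reads the mail.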

Question 1: Training the model without Scikit-Learn

# Load the encoded training data (one list per column, 18 rows each)
author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Class counts, used for the priors and as denominators of the likelihoods
n_read = user_action.count(1)
n_skip = user_action.count(0)

# Conditional probabilities, one entry per feature in the order
# author, thread, length, where_to_read:
#   read_0[k] = P(feature k = 0 | read), read_1[k] = P(feature k = 1 | read)
#   skip_0[k] = P(feature k = 0 | skip), skip_1[k] = P(feature k = 1 | skip)
read_0 = []
read_1 = []
skip_0 = []
skip_1 = []


# Estimate P(feature value | user action) by counting and store the results
def pre_computation(feature):
  read_para0 = 0
  read_para1 = 0
  skip_para0 = 0
  skip_para1 = 0

  for value, action in zip(feature, user_action):
    if action == 1:
      if value == 1:
        read_para1 += 1
      else:
        read_para0 += 1
    else:
      if value == 1:
        skip_para1 += 1
      else:
        skip_para0 += 1

  # Divide by the class size instead of a hard-coded 9
  read_0.append(read_para0 / n_read)
  read_1.append(read_para1 / n_read)
  skip_0.append(skip_para0 / n_skip)
  skip_1.append(skip_para1 / n_skip)


# Compute the conditional probabilities for each feature
pre_computation(author)
pre_computation(thread)
pre_computation(length)
pre_computation(where_to_read)

# Query: author = 1 (known), thread = 0 (follow up), length = 0 (short), where = 0 (home)
# Unnormalized posterior of read: prior times the four conditional probabilities
prior_read = n_read / len(user_action)
prob_read = prior_read * read_1[0] * read_0[1] * read_0[2] * read_0[3]

# Unnormalized posterior of skip for the same query
prior_skip = n_skip / len(user_action)
prob_skip = prior_skip * skip_1[0] * skip_0[1] * skip_0[2] * skip_0[3]

if prob_read > prob_skip:
  print("Predicted User action is read")
else:
  print("Predicted User action is skip")

Output:

Predicted User action is read
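The same counting approach can be written more generally so that it also reports normalized probabilities. The sketch below is one possible variant, not the original solution: the function name `naive_bayes_posterior` and the `alpha` parameter (optional additive smoothing, assuming each feature takes exactly two values) are introduced here for illustration.

```python
from collections import Counter

author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]


def naive_bayes_posterior(features, labels, query, alpha=0.0):
    """Return {label: P(label | query)} for binary categorical features.

    alpha is an additive smoothing constant (assumption of this sketch);
    alpha=0 uses plain relative frequencies. With alpha=0 the function
    fails with a division by zero if every class scores 0.
    """
    class_counts = Counter(labels)
    n = len(labels)
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior P(label)
        for column, value in zip(features, query):
            # Count rows of this class whose feature matches the query value.
            matches = sum(
                1 for x, y in zip(column, labels) if y == label and x == value
            )
            # The 2 in the denominator is the number of values per feature.
            score *= (matches + alpha) / (count + 2 * alpha)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


features = [author, thread, length, where_to_read]
query = [1, 0, 0, 0]  # known author, follow up, short, home

posterior = naive_bayes_posterior(features, user_action, query)
smoothed = naive_bayes_posterior(features, user_action, query, alpha=1.0)
print("P(read | query), no smoothing:", round(posterior[1], 3))
print("P(read | query), alpha = 1:", round(smoothed[1], 3))
```

Without smoothing the probability of read comes out at 9/14, about 0.64, which agrees with the prediction printed above. Smoothing matters mostly when a count is zero, which does not happen for this query.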

Question 2: Training the model using Scikit-Learn

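# Import the Gaussian Naive Bayes model
# Note: GaussianNB treats each feature as a continuous, normally distributed value.
# For 0/1 features like these, BernoulliNB or CategoricalNB from sklearn.naive_bayes
# is usually a closer match to the counting approach in Question 1.
from sklearn.naive_bayes import GaussianNB
import numpy as np

author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Stack the four feature lists as columns: one row per email, shape (18, 4)
X_train = np.vstack((author,thread,length,where_to_read)).T
y_train = user_action

# Create a Gaussian Naive Bayes classifier
model = GaussianNB()

# Train the model on the training set
model.fit(X_train,y_train)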

Output (as echoed in a notebook; the exact text depends on the Scikit-Learn version): GaussianNB(priors=None, var_smoothing=1e-09)

Now we can predict the case in the question:

author = 1, thread = 0, length = 0, where to read = 0

X_test = [1,0,0,0]
y_pred = model.predict([X_test])
print(y_pred)

Output: [1]

A prediction of 1 means the user action is read.

