Question

The following training dataset is the “reading email” dataset.

It has four features: author, thread, length, and where the mail is read. From these features, the algorithm has to predict the user’s action: whether to read or skip the mail.

Use a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the mail is read at home.

Author	Thread	Length	Where to read	User’s Action
Known	New	Long	Home	Skips
Unknown	New	Short	Work	Reads
Unknown	Follow up	Long	Work	Skips
Known	Follow up	Long	Home	Skips
Known	New	Short	Home	Reads
Known	Follow up	Long	Work	Skips
Unknown	New	Short	Work	Skips
Unknown	New	Short	Work	Reads
Known	Follow up	Long	Home	Skips
Known	New	Long	Work	Skips
Unknown	Follow up	Short	Home	Skips
Known	New	Long	Work	Skips
Known	Follow up	Short	Home	Reads
Known	New	Short	Work	Reads
Known	New	Short	Home	Reads
Known	Follow up	Short	Work	Reads
Known	New	Short	Home	Reads
Unknown	New	Short	Work	Reads

Question 1: Write Python code that implements a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the mail is read at home. (Do not use Scikit-Learn.)
Question 2: Use Scikit-Learn to predict the user’s action (skips or reads) for the same case: author known, thread follow up, length short, read at home.

Hint: in the author feature you can use 0 and 1 instead of unknown and known. In the thread feature you can use 0 and 1 instead of follow up and new. In the length feature you can use 0 and 1 instead of short and long. In the where-to-read feature you can use 0 and 1 instead of home and work. In the target you can use 0 instead of skips and 1 instead of reads.
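The encoding in the hint can be applied mechanically to the table. The sketch below is one way to do that, assuming the rows are typed in as lowercase tuples. The names `ROWS`, `CODES`, and `encode` are introduced here for illustration and are not part of the original question.

```python
# Illustrative names (ROWS, CODES, encode); the 0/1 mapping itself follows the hint.
ROWS = [
    ("known", "new", "long", "home", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("unknown", "follow up", "long", "work", "skips"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "long", "work", "skips"),
    ("unknown", "new", "short", "work", "skips"),
    ("unknown", "new", "short", "work", "reads"),
    ("known", "follow up", "long", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("unknown", "follow up", "short", "home", "skips"),
    ("known", "new", "long", "work", "skips"),
    ("known", "follow up", "short", "home", "reads"),
    ("known", "new", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("known", "follow up", "short", "work", "reads"),
    ("known", "new", "short", "home", "reads"),
    ("unknown", "new", "short", "work", "reads"),
]

CODES = {
    "unknown": 0, "known": 1,
    "follow up": 0, "new": 1,
    "short": 0, "long": 1,
    "home": 0, "work": 1,
    "skips": 0, "reads": 1,
}


def encode(rows):
    """Return one list of 0/1 codes per column (four features, then the target)."""
    return [[CODES[cell] for cell in column] for column in zip(*rows)]


author, thread, length, where_to_read, user_action = encode(ROWS)
print(author)
print(user_action)
```

Building the lists from the rows, rather than typing them by hand, makes it easy to check that each encoded list lines up with the table.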

Expert Solution

Data set:

Author (0 - unknown, 1 - known)

[1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]

Thread (0 - follow up, 1 - new)

[1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]

Length (0 - short, 1 - long)

[1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]

Where to read (0 - home, 1 - work)

[0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]

User's Action (0 - skip, 1 - read)

[0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

Task: predict the user’s action (skips or reads) when the author of the mail is known (1), the thread of the mail is follow up (0), the length of the mail is short (0), and the mail is read at home (0).

Question 1: Training the model without Scikit-Learn

# Load the encoded data
author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Precomputed conditional probabilities, one entry per feature in call order:
# read_0[k] = P(feature k = 0 | read), read_1[k] = P(feature k = 1 | read),
# skip_0[k] = P(feature k = 0 | skip), skip_1[k] = P(feature k = 1 | skip)
read_0 = []
read_1 = []
skip_0 = []
skip_1 = []

# Class counts taken from the labels (9 reads and 9 skips here) rather than hard-coded
n_read = user_action.count(1)
n_skip = user_action.count(0)


def pre_computation(feature):
    """Append P(feature = v | action) for v in {0, 1} to the four lists above."""
    read_para0 = read_para1 = skip_para0 = skip_para1 = 0

    for value, action in zip(feature, user_action):
        if action == 1:
            if value == 1:
                read_para1 += 1
            else:
                read_para0 += 1
        else:
            if value == 1:
                skip_para1 += 1
            else:
                skip_para0 += 1

    read_0.append(read_para0 / n_read)
    read_1.append(read_para1 / n_read)
    skip_0.append(skip_para0 / n_skip)
    skip_1.append(skip_para1 / n_skip)


# Compute the conditional probabilities for each feature
pre_computation(author)
pre_computation(thread)
pre_computation(length)
pre_computation(where_to_read)

# Class priors; both are 0.5 for this data, so they do not change the decision
prior_read = n_read / len(user_action)
prior_skip = n_skip / len(user_action)

# Score for read given author = 1, thread = 0, length = 0, where_to_read = 0
prob_read = prior_read * read_1[0] * read_0[1] * read_0[2] * read_0[3]

# Score for skip given the same feature values
prob_skip = prior_skip * skip_1[0] * skip_0[1] * skip_0[2] * skip_0[3]

if prob_read > prob_skip:
    print("Predicted User action is read")
else:
    print("Predicted User action is skip")

Output:

Predicted User action is read
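
As a cross-check on this output, the same calculation can be redone with exact fractions, including the class priors and normalising the two scores into posterior probabilities. This is a minimal sketch and not part of the original answer. The `data` and `query` dictionaries and the `cond_prob` helper are names introduced here for illustration.

```python
from fractions import Fraction

# Encoded training data (0/1 coding as given in the hint)
data = {
    "author":        [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0],
    "thread":        [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1],
    "length":        [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0],
    "where_to_read": [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1],
}
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# Query: author known (1), thread follow up (0), length short (0), read at home (0)
query = {"author": 1, "thread": 0, "length": 0, "where_to_read": 0}


def cond_prob(feature, value, action):
    """Estimate P(feature = value | user_action = action) by counting."""
    in_class = [f for f, a in zip(feature, user_action) if a == action]
    return Fraction(in_class.count(value), len(in_class))


# Unnormalised score per class: prior times the product of the conditionals
scores = {}
for action in (0, 1):
    score = Fraction(user_action.count(action), len(user_action))
    for name, value in query.items():
        score *= cond_prob(data[name], value, action)
    scores[action] = score

total = scores[0] + scores[1]
print("P(read | query) =", scores[1] / total)  # 9/14
print("P(skip | query) =", scores[0] / total)  # 5/14
```

For the read class the conditionals are 6/9, 2/9, 9/9 and 4/9. For the skip class they are 6/9, 5/9, 2/9 and 4/9. With equal priors of 1/2 this gives a posterior of 9/14 (about 0.64) for read, which agrees with the prediction above.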

Question 2: Training the model using Scikit-Learn

# Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
import numpy as np

author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# One row per email, one column per feature (shape 18 x 4)
X_train = np.vstack((author, thread, length, where_to_read)).T
# Training labels: 0 = skip, 1 = read
y_train = user_action

# Create a Gaussian Naive Bayes classifier
model = GaussianNB()

# Train the model on the training set
model.fit(X_train, y_train)

output: GaussianNB(priors=None, var_smoothing=1e-09)

Now we can predict the case:

author - 1, thread - 0, length - 0, where to read - 0

X_test = [1,0,0,0]
y_pred = model.predict([X_test])
print(y_pred)

output: [1]

1 means the predicted user action is read.
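
GaussianNB models each feature as a continuous, normally distributed value, whereas the four features here are binary categories. As an alternative, which is not what the answer above uses, scikit-learn also provides `CategoricalNB`, which estimates per-category frequencies with additive smoothing. The sketch below assumes scikit-learn 0.22 or later and keeps the default `alpha=1.0`. The variable name `X_query` is introduced here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

author = [1,0,0,1,1,1,0,0,1,1,0,1,1,1,1,1,1,0]
thread = [1,1,0,0,1,0,1,1,0,1,0,1,0,1,1,0,1,1]
length = [1,0,1,1,0,1,0,0,1,1,0,1,0,0,0,0,0,0]
where_to_read = [0,1,1,0,0,1,1,1,0,1,0,1,0,1,0,1,0,1]
user_action = [0,1,0,0,1,0,0,1,0,0,0,0,1,1,1,1,1,1]

# One row per email, one column per feature
X_train = np.vstack((author, thread, length, where_to_read)).T
y_train = user_action

# alpha=1.0 is additive (Laplace) smoothing, the library default
model = CategoricalNB(alpha=1.0)
model.fit(X_train, y_train)

# Query: author known, thread follow up, length short, read at home
X_query = [[1, 0, 0, 0]]
y_pred = model.predict(X_query)
proba = model.predict_proba(X_query)

print(y_pred)   # predicted class label
print(proba)    # columns are [P(skip), P(read)]
```

With the default smoothing, the estimated probability of read for this query works out to 0.625, so the prediction is still read. Smoothing also keeps a category that never occurs within a class (for example, no long email is ever read in this data) from forcing a probability of exactly zero.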

venereology answered 6 months ago
