Question 1:
For a voice recognition learning problem, determine the possible:
Question 2:
Question 3:
This dataset has four features: author, thread, length, and where to read (the place where the mail is read). Based on these features, the algorithm has to predict the user’s action, that is, whether the user reads or skips the mail.
Use a Naïve Bayes classifier to predict the user’s action (skips or reads) when the author of the mail is known, the thread of the mail is follow up, the length of the mail is short, and the mail is read at home.
Author | Thread | Length | Where to read | User’s Action
Known | New | Long | Home | Skips
Unknown | New | Short | Work | Reads
Unknown | Follow up | Long | Work | Skips
Known | Follow up | Long | Home | Skips
Known | New | Short | Home | Reads
Known | Follow up | Long | Work | Skips
Unknown | New | Short | Work | Skips
Unknown | New | Short | Work | Reads
Known | Follow up | Long | Home | Skips
Known | New | Long | Work | Skips
Unknown | Follow up | Short | Home | Skips
Known | New | Long | Work | Skips
Known | Follow up | Short | Home | Reads
Known | New | Short | Work | Reads
Known | New | Short | Home | Reads
Known | Follow up | Short | Work | Reads
Known | New | Short | Home | Reads
Unknown | New | Short | Work | Reads
Hint: in the author feature, you can use 0 and 1 instead of unknown and known, respectively. In the thread feature, you can use 0 and 1 instead of follow up and new. In the length feature, you can use 0 and 1 instead of short and long. In the where to read feature, you can use 0 and 1 instead of home and work. In the target, you can use 0 instead of skips and 1 instead of reads.
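The hint's 0/1 encoding makes the Naïve Bayes calculation easy to script. The following is a minimal sketch that uses plain relative-frequency estimates with no Laplace smoothing. The question does not say whether smoothing is expected, so that choice is an assumption.

`DATA` is a hand transcription of the 18-row table into the hint's encoding, and the helper name `naive_bayes_scores` is hypothetical. The score for each action is P(action) multiplied by P(feature value | action) for each of the four features.

```python
from fractions import Fraction

# Encoding from the hint: author (unknown=0, known=1), thread (follow up=0, new=1),
# length (short=0, long=1), where to read (home=0, work=1), action (skips=0, reads=1).
# Each row is (author, thread, length, where, action), transcribed from the table.
DATA = [
    (1, 1, 1, 0, 0), (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
    (1, 0, 1, 0, 0), (1, 1, 0, 0, 1), (1, 0, 1, 1, 0),
    (0, 1, 0, 1, 0), (0, 1, 0, 1, 1), (1, 0, 1, 0, 0),
    (1, 1, 1, 1, 0), (0, 0, 0, 0, 0), (1, 1, 1, 1, 0),
    (1, 0, 0, 0, 1), (1, 1, 0, 1, 1), (1, 1, 0, 0, 1),
    (1, 0, 0, 1, 1), (1, 1, 0, 0, 1), (0, 1, 0, 1, 1),
]

def naive_bayes_scores(data, query):
    """Return {label: P(label) * prod_i P(x_i | label)} from raw frequency counts (no smoothing)."""
    scores = {}
    for label in (0, 1):
        rows = [r for r in data if r[-1] == label]
        score = Fraction(len(rows), len(data))  # prior P(label)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[i] == value)
            score *= Fraction(matches, len(rows))  # likelihood P(x_i | label)
        scores[label] = score
    return scores

# Query from the question: known author, follow-up thread, short mail, read at home.
QUERY = (1, 0, 0, 0)
scores = naive_bayes_scores(DATA, QUERY)
prediction = max(scores, key=scores.get)
total = sum(scores.values())
for label, name in ((0, "skips"), (1, "reads")):
    print(name, scores[label], float(scores[label] / total))
print("prediction:", "reads" if prediction == 1 else "skips")
```

With the counts in the table, the reads score is 1/2 × 6/9 × 2/9 × 9/9 × 4/9 = 8/243. The skips score is 1/2 × 6/9 × 5/9 × 2/9 × 4/9 = 40/2187. The sketch therefore predicts reads, at about 64% after normalising the two scores.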
1) For a voice recognition learning problem,
2) The curse of dimensionality can be summarised as: "As the dimensionality of the feature space increases, the number of possible configurations can grow exponentially, and hence the fraction of configurations covered by the available observations decreases." We can explain it with the help of the Hughes phenomenon. It states that, for a fixed number of training samples, the classifier's performance improves as features are added, but only up to a particular number of features. Beyond that limit, adding more features degrades the classifier's performance.
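The shrinking coverage described above can be illustrated with binary features, like the 0/1 encoding used in Question 3. The sketch assumes d binary features, which gives 2**d possible configurations. Two helpers are defined, and both names are hypothetical, chosen for illustration:

- `max_coverage` gives the best-case fraction of those configurations that a fixed number of samples can cover.
- `sampled_coverage` measures the fraction actually hit by uniformly random points.

```python
import random

def max_coverage(n_samples: int, d: int) -> float:
    """Best-case fraction of the 2**d binary configurations that n_samples points can cover."""
    configs = 2 ** d
    return min(n_samples, configs) / configs

def sampled_coverage(n_samples: int, d: int, seed: int = 0) -> float:
    """Fraction of the 2**d configurations hit by n_samples uniformly random binary points."""
    rng = random.Random(seed)
    seen = {tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n_samples)}
    return len(seen) / 2 ** d

# With a fixed budget of 18 observations (the size of the mail dataset),
# coverage collapses as the number of binary features grows.
for d in (2, 4, 8, 16):
    print(f"d={d:2d}  configs={2 ** d:6d}  "
          f"max={max_coverage(18, d):.6f}  sampled={sampled_coverage(18, d):.6f}")
```

The same 18 observations can cover every configuration of 4 binary features. With 16 features, they cover at most 18 of 65,536 configurations.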
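The Euclidean distance between two n-dimensional vectors with Cartesian coordinates p = (p_1, p_2, …, p_n) and q = (q_1, q_2, …, q_n) is computed using the distance formula:
d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}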
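Thus, each extra dimension adds only one more squared difference, so the cost of computing a single distance grows linearly with n. What grows exponentially is the number of configurations, and hence the amount of data needed to cover the feature space.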
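A direct translation of the distance formula shows the linear cost: the computation does one subtraction, one squaring and one addition per coordinate. This is a minimal sketch, and the function name `euclidean` is hypothetical. Python's standard library already provides `math.dist` (Python 3.8+) for the same computation, and the example uses it as a cross-check.

```python
import math
from typing import Sequence

def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance sqrt(sum_i (q_i - p_i)**2) between two n-dimensional points."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of dimensions")
    # One squared difference per dimension, so the work grows linearly with n.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((0, 0), (3, 4)))        # 5.0
print(euclidean((1, 2, 3), (4, 6, 3)))  # 5.0
print(math.isclose(euclidean((1, 2, 3), (4, 6, 3)), math.dist((1, 2, 3), (4, 6, 3))))  # True
```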
We can draw the following conclusions based on the above:
(Please post Question 3 as a separate question.)