Question


Question 2 Consider the one-dimensional data set shown below.

x:  0.5  3.0  4.5  4.6  4.9  5.2  5.3  5.5  7.0  9.5
y:   -    -    +    +    +    -    -    +    -    -

Classify the data point x = 5.0 with a k-nearest neighbor classifier, using its 1, 3, 5, and 9 nearest neighbors (k = 1, 3, 5, 9).

Question 3 Use the data set mushrooms.csv to develop supervised models. The data set contains two classes, namely edible and poisonous. Perform the following analysis on the data set.


1. Understand the distribution of classes in the data set using suitable plots.

2. Develop supervised models: decision tree and k-nearest neighbor.

3. Identify the best k using cross-validation for the supervised models developed in step 2.

4. Discuss the results achieved by each supervised model using the confusion matrix, sensitivity, specificity, accuracy, F1-score, and ROC curve.

5. Give your opinion on why the performance of the models varies.

Solutions


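Answer 2:

Distances from x = 5.0, sorted nearest first, written as x (distance, label): 4.9 (0.1, +), 5.2 (0.2, −), 5.3 (0.3, −), 4.6 (0.4, +), 4.5 (0.5, +), 5.5 (0.5, +), 3.0 (2.0, −), 7.0 (2.0, −), 0.5 (4.5, −), 9.5 (4.5, −). Points at equal distance share the same label, so ties do not change the majority vote.

1-nearest neighbor: + (1 plus, 0 minus)

3-nearest neighbor: − (1 plus, 2 minus)

5-nearest neighbor: + (3 plus, 2 minus)

9-nearest neighbor: − (4 plus, 5 minus)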
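The votes above can be checked with a few lines of plain Python: sort the training points by absolute distance to the query and take a majority vote over the first k labels. This is a minimal sketch; the function name `knn_classify` is hypothetical. Because `sorted` is stable, equally distant points keep their order from the table, which does not matter here since tied points share a label.

```python
from collections import Counter

# One-dimensional training data from Question 2
xs = [0.5, 3.0, 4.5, 4.6, 4.9, 5.2, 5.3, 5.5, 7.0, 9.5]
ys = ["-", "-", "+", "+", "+", "-", "-", "+", "-", "-"]


def knn_classify(query, k, xs=xs, ys=ys):
    """Majority vote over the labels of the k training points closest to `query`."""
    # Sort (x, label) pairs by distance to the query; sorted() is stable,
    # so equally distant points keep their original order.
    neighbors = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    # With two classes and odd k there is always a strict majority.
    return votes.most_common(1)[0][0]


for k in (1, 3, 5, 9):
    print(f"{k}-nearest neighbor: {knn_classify(5.0, k)}")
```

Using odd values of k, as the question does, avoids tied votes in a two-class problem.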

Answer 3:

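(1)

Distribution of classes

The support column of the report below gives the number of test samples in each class: 1257 for class 0 and 1181 for class 1, so the two classes are roughly balanced.

In:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))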
             precision    recall  f1-score   support

          0       0.85      0.97      0.91      1257
          1       0.97      0.82      0.89      1181

avg / total       0.91      0.90      0.90      2438
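Part (1) asks for plots, but the answer shows only a classification report. A minimal sketch of a class-count table and a bar chart follows. It assumes that mushrooms.csv has a target column named `class` (for example with values `e` for edible and `p` for poisonous), which should be checked against the actual file; the function names are hypothetical, and matplotlib is imported only inside the plotting function.

```python
import pandas as pd


def class_distribution(df: pd.DataFrame, target: str = "class") -> pd.DataFrame:
    """Return the count and share of each class in df[target]."""
    counts = df[target].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def plot_class_distribution(df: pd.DataFrame, target: str = "class"):
    """Draw a bar chart of class counts and return the matplotlib Axes."""
    import matplotlib.pyplot as plt  # imported here so the table works without matplotlib

    ax = df[target].value_counts().plot(kind="bar", rot=0)
    ax.set_xlabel(target)
    ax.set_ylabel("number of samples")
    ax.set_title("Class distribution")
    plt.tight_layout()
    return ax
```

Usage, assuming the file sits in the working directory: `df = pd.read_csv("mushrooms.csv")`, then `print(class_distribution(df))` and `plot_class_distribution(df)`.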

(2)

(i) Decision tree model

In:

from sklearn.tree import DecisionTreeClassifier as DT

classifier = DT(criterion='entropy', random_state=42)
classifier.fit(X_train, y_train)

Out: DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best')
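The answer fits the models on `X_train` and `y_train` without showing where they come from. Since every column of the mushroom data is categorical, the features must be encoded before a decision tree or k-NN can use them. One possible preparation is sketched below; the function name, the one-hot encoding with `pd.get_dummies`, the label mapping (`p` to 1, everything else to 0), the stratified split, `test_size=0.3` and `random_state=42` are all assumptions, not the author's documented choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df: pd.DataFrame, target: str = "class", test_size: float = 0.3,
                 random_state: int = 42):
    """One-hot encode the categorical features and split into train/test sets."""
    # Assumed label encoding: poisonous ('p') -> 1, edible -> 0
    y = (df[target] == "p").astype(int)
    # Each categorical column becomes a set of 0/1 indicator columns
    X = pd.get_dummies(df.drop(columns=[target]), dtype=int)
    # Returns X_train, X_test, y_train, y_test; stratify keeps class ratios equal
    return train_test_split(X, y, test_size=test_size, random_state=random_state,
                            stratify=y)
```

Usage: `X_train, X_test, y_train, y_test = prepare_data(pd.read_csv("mushrooms.csv"))`.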

(ii) k-nearest neighbor model

In:

from sklearn.neighbors import KNeighborsClassifier as KNN

# Defaults: n_neighbors=5, Euclidean distance (minkowski metric with p=2)
classifier = KNN()
classifier.fit(X_train, y_train)

Out:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
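Step (3), choosing the best k by cross-validation, is not shown in the answer, and the model above simply uses the default of 5 neighbors. Interpreting "best k" as the number of neighbors of the k-NN model, one option is a cross-validated grid search over `n_neighbors` on the training split only, so the test split stays untouched for step (4). The function name, the candidate values (odd k from 1 to 21), the 10 folds and accuracy scoring are assumptions; `GridSearchCV`, `cv_results_` and `best_params_` are scikit-learn's.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def best_k_by_cv(X_train, y_train, k_values=range(1, 22, 2), folds=10):
    """Return (best k, {k: mean CV accuracy}) for a k-NN classifier."""
    k_values = list(k_values)
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": k_values},
        cv=folds,            # an integer cv uses stratified k-fold for classifiers
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # cv_results_ lists candidates in the same order as k_values
    mean_scores = dict(zip(k_values, search.cv_results_["mean_test_score"]))
    return search.best_params_["n_neighbors"], mean_scores
```

The same pattern could tune the tree, for example with `DecisionTreeClassifier()` and a grid over `max_depth`.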

(4)

Decision Tree Results

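In:

# `classifier` was reassigned to the k-NN model in (ii), so refit the
# decision tree from (i) under its own name before scoring it.
# print_score is a custom helper that is not shown in this answer.
dt_classifier = DT(criterion='entropy', random_state=42).fit(X_train, y_train)
print_score(dt_classifier, X_train, y_train, X_test, y_test, train=False)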
Test results:

Accuracy Score: 0.9007

Classification Report:
             precision    recall  f1-score   support

          0       0.90      0.91      0.90      1257
          1       0.91      0.89      0.90      1181

avg / total       0.90      0.90      0.90      2438


Confusion Matrix:
[[1147  110]
 [ 132 1049]]
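Step (4) also asks for sensitivity and specificity, which the printed reports do not list by name. Both follow from the 2×2 confusion matrix, whose rows are the true classes and columns the predicted classes, as in scikit-learn's `confusion_matrix`. The sketch below takes class 1 as the positive class, which is an assumption about the label encoding; the function name is hypothetical.

```python
def binary_metrics(cm):
    """Metrics from a 2x2 confusion matrix [[TN, FP], [FN, TP]] (class 1 = positive)."""
    (tn, fp), (fn, tp) = cm
    sensitivity = tp / (tp + fn)          # recall of class 1
    specificity = tn / (tn + fp)          # recall of class 0
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


dt_cm = [[1147, 110], [132, 1049]]   # decision tree confusion matrix from the output above
for name, value in binary_metrics(dt_cm).items():
    print(f"{name}: {value:.4f}")
```

For the decision tree this gives sensitivity 0.8882, specificity 0.9125 and accuracy 0.9007, the last matching the reported accuracy score. Applied to the k-NN matrix [[1211, 46], [123, 1058]], it gives sensitivity 0.8959, specificity 0.9634 and accuracy 0.9307.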

k-NN Test Results

In:

# `classifier` is the k-NN model fitted in (ii)
print_score(classifier, X_train, y_train, X_test, y_test, train=False)
Test results:

Accuracy Score: 0.9307

Classification Report:
             precision    recall  f1-score   support

          0       0.91      0.96      0.93      1257
          1       0.96      0.90      0.93      1181

avg / total       0.93      0.93      0.93      2438


Confusion Matrix:
[[1211   46]
 [ 123 1058]]
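The ROC curve requested in step (4) is not shown. It needs scores rather than hard labels, and both `DecisionTreeClassifier` and `KNeighborsClassifier` provide `predict_proba`. A minimal sketch using scikit-learn's `roc_curve` and `roc_auc_score` is below; the function names and the dictionary of fitted models are hypothetical, and no AUC values are claimed for the mushroom data because the answer does not report them. An unpruned tree tends to output probabilities of only 0 or 1, so its ROC curve may consist of very few points.

```python
from sklearn.metrics import roc_auc_score, roc_curve


def roc_data(models, X_test, y_test):
    """Return {name: (fpr, tpr, auc)} for a dict of fitted binary classifiers."""
    results = {}
    for name, model in models.items():
        # Probability of the positive class, i.e. the class in model.classes_[1]
        scores = model.predict_proba(X_test)[:, 1]
        fpr, tpr, _ = roc_curve(y_test, scores)
        results[name] = (fpr, tpr, roc_auc_score(y_test, scores))
    return results


def plot_roc(results):
    """Plot the curves returned by roc_data on one set of axes."""
    import matplotlib.pyplot as plt  # only needed for plotting

    fig, ax = plt.subplots()
    for name, (fpr, tpr, auc) in results.items():
        ax.plot(fpr, tpr, label=f"{name} (AUC = {auc:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
    ax.set_xlabel("False positive rate (1 - specificity)")
    ax.set_ylabel("True positive rate (sensitivity)")
    ax.legend()
    return ax
```

Usage: `plot_roc(roc_data({"Decision tree": dt_model, "k-NN": knn_model}, X_test, y_test))`, where `dt_model` and `knn_model` are hypothetical names for the two fitted classifiers.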

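(5) On the test split, k-NN performs better than the decision tree (accuracy 0.9307 vs 0.9007, i.e. 169 vs 242 misclassified samples out of 2438), and it makes fewer errors in both classes (46 vs 110 for class 0, 123 vs 132 for class 1). A likely reason is overfitting: the decision tree was grown with no depth or leaf-size limit (max_depth=None, min_samples_leaf=1), so it can keep splitting until it fits the training data almost perfectly, including its noise, and then generalizes less well. k-NN with k = 5 instead predicts by majority vote over five neighbors, which smooths out the effect of individual training samples. More generally, the two models have different inductive biases (axis-aligned splits on single features versus distance-based voting over all features), so they make different errors on the same data. Comparing training and test accuracy would show whether the tree is actually overfitting.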
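Whether an unconstrained tree overfits can be checked by comparing its training and test accuracy with those of depth-limited trees; a large train/test gap for the unpruned tree would support the overfitting explanation. The sketch below is hypothetical: the function name and depth values are illustrative, and it expects the same four arrays used in the answer (`X_train`, `y_train`, `X_test`, `y_test`).

```python
from sklearn.tree import DecisionTreeClassifier


def train_test_gap(X_train, y_train, X_test, y_test, depths=(None, 3, 5, 8)):
    """Return (max_depth, train accuracy, test accuracy) for entropy trees."""
    rows = []
    for depth in depths:
        tree = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                      random_state=42).fit(X_train, y_train)
        # .score() returns mean accuracy on the given data
        rows.append((depth, tree.score(X_train, y_train), tree.score(X_test, y_test)))
    return rows
```

Usage: `for depth, train_acc, test_acc in train_test_gap(X_train, y_train, X_test, y_test): print(depth, round(train_acc, 4), round(test_acc, 4))`.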

