Question

In: Computer Science

Python 3 Rewrite KNN sample code using KNeighborsClassifier . ● Repeat KNN Step 1 – 5,...

Python 3

Rewrite KNN sample code using KNeighborsClassifier .

● Repeat KNN Step 1 – 5, for at least five times and calculate average accuracy to be your result.

● If you use the latest version of scikit -learn, you need to program with Python >= 3.5.

● Use the same dataset: “ iris.data ”

● Split your data: 67% for training and 33% for testing

● Draw a line chart: Use a “for loop” to change k from 1 to 10 and check your model accuracy.

Requirements:

scipy
numpy
pandas
matplotlib
seaborn
sklearn

Code:

import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet=[] , testSet=[]):
with open(filename, 'rb') as csvfile:
lines = csv.reader(csvfile)
dataset = list(lines)
for x in range(len(dataset)-1):
for y in range(4):
dataset[x][y] = float(dataset[x][y])
if random.random() < split:
trainingSet.append(dataset[x])
else:
testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
distance = 0
for x in range(length):
distance += pow((instance1[x] - instance2[x]), 2)
return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
distances = []
length = len(testInstance)-1
for x in range(len(trainingSet)):
dist = euclideanDistance(testInstance, trainingSet[x], length)
distances.append((trainingSet[x], dist))
distances.sort(key=operator.itemgetter(1))
neighbors = []
for x in range(k):
neighbors.append(distances[x][0])
return neighbors

def getResponse(neighbors):
classVotes = {}
for x in range(len(neighbors)):
response = neighbors[x][-1]
if response in classVotes:
classVotes[response] += 1
else:
classVotes[response] = 1
sortedVotes = sorted(classVotes.iteritems(), key=operator.itemgetter(1), reverse=True)
return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
correct = 0
for x in range(len(testSet)):
if testSet[x][-1] == predictions[x]:
correct += 1
return (correct/float(len(testSet))) * 100.0
  
def main():
# prepare data
trainingSet=[]
testSet=[]
split = 0.67
loadDataset('iris.data', split, trainingSet, testSet)
print 'Train set: ' + repr(len(trainingSet))
print 'Test set: ' + repr(len(testSet))
# generate predictions
  
predictions=[]
k_values = range(10)
for k in k_values:
for x in range(len(testSet)):
neighbors = getNeighbors(trainingSet, testSet[x], k+1)
result = getResponse(neighbors)
predictions.append(result)

accuracy = getAccuracy(testSet, predictions)
print('Accuracy: ' + repr(accuracy) + '%')
  

main()

Solutions

Expert Solution

KNN (K-Nearest Neighbor) is a simple supervised classification algorithm we can use to assign a class to new data point. It can be used for regression as well, KNN does not make any assumptions on the data distribution, hence it is non-parametric. It keeps all the training data to make future predictions by computing the similarity between an input sample and each training instance.

KNN can be summarized as below:

  1. Computes the distance between the new data point with every training example.
  2. For computing the distance measures such as Euclidean distance, Hamming distance or Manhattan distance will be used.
  3. Model picks K entries in the database which are closest to the new data point.
  4. Then it does the majority vote i.e the most common class/label among those K entries will be the class of the new data point.

With K=3, Class B will be assigned, with K=6 Class A will be assigned

Detailed documentation on KNN is available here.

Below example shows implementation of KNN on iris dataset using scikit-learn library. Iris dataset has 50 samples for each different species of Iris flower(total of 150). For each sample we have sepal length, width and petal length and width and a species name(class/label).

Iris flower: sepal length, sepal width, petal length and width

  • 150 observations
  • 4 features(sepal length, sepal width, petal length, petal width)
  • Response variable is the iris species
  • Classification problem since response is categorical.

Our task is to build a KNN model which classifies the new species based on the sepal and petal measurements. Iris dataset is available in scikit-learn and we can make use of it build our KNN.

Complete code can be found in the Git Repo.

Step1: Import the required data and check the features.

Import the load_iris function form scikit-learen datasets module and create a iris Bunch object(bunch is a scikitlearn’s special object type for storing datasets and its attributes).

Each observation represents one flower and 4 columns represents 4 measurements.We can see the features(measures) under ‘data’ attribute, where as labels under ‘features_names’. As we can see below, labels/responses are encoded as 0,1 and 2. Because the features and repose should be numeric (Numpy arrays) for scikit-learn models and they should have a specific shape.

Step2: Split the data and Train the Model.

Training and testing on the same data is not an optimal approach, so we do split the data into two pieces, training set and testing set. We use ‘train_test_split’ function to split the data. Optional parameter ‘test-size’ determines the split percentage. ‘random_state’ parameter makes the data split the same way every time you run. Since we are training and testing on different sets of data, the resulting testing accuracy will be a better estimate of how well the model is likely to perform on unseen data.

Scikit-learn is carefully organized into modules, so that we can import the relevant classes easily. Import the class ‘KNeighborsClassifer’ from ‘neighbors’ module and Instantiate the estimator (‘estimator’ is scikit-learn’s term for a model). We are calling model as estimator because their primary role is to estimate unknown quantities.

In our example we are creating an instance (‘knn’ ) of the class ‘KNeighborsClassifer’, in other words we have created an object called ‘knn’ which knows how to do KNN classification once we provide the data. The parameter ‘n_neighbors’ is the tuning parameter/hyper parameter (k) . All other parameters are set to default values.

‘fit’ method is used to train the model on training data (X_train,y_train) and ‘predict’ method to do the testing on testing data (X_test). Choosing the optimal value of K is critical, so we fit and test the model for different values for K (from 1 to 25) using a for loop and record the KNN’s testing accuracy in a variable (scores).

Plot the relationship between the values of K and the corresponding testing accuracy using the matplotlib library. As we can see there is a raise and fall in the accuracy and it is quite typical when examining the model complexity with the accuracy. In general as the value of K increase there appears to be a raise in the accuracy and again it falls.

In general the Training accuracy rises as the model complexity increases, for KNN the model complexity is determined by the value of K. Larger K value leads to smoother decision boundary (less complex model). Smaller K leads to more complex model (may lead to overfitting). Testing accuracy penalizes models that are too complex(over fitting) or not complex enough(underfit). We get the maximum testing accuracy when the model has right level of complexity, in our case we can see that for a K value of 3 to 19 our model accuracy is 96.6%.

For our final model we can choose a optimal value of K as 5 (which falls between 3 and 19) and retrain the model with all the available data. And that will be our final model which is ready to make predictions.


Related Solutions

Python - Rewriting a Program Rewrite Program 1 using functions. The required functions are in the...
Python - Rewriting a Program Rewrite Program 1 using functions. The required functions are in the table below. Create a Python program that will calculate the user’s net pay based on the tax bracket he/she is in. Your program will prompt the user for their first name, last name, their monthly gross pay, and the number of dependents. The number of dependents will determine which tax bracket the user ends up in. The tax bracket is as follows: 0 –...
USING C++ ONLY. Please study the code posted below. the goal is to rewrite the code...
USING C++ ONLY. Please study the code posted below. the goal is to rewrite the code implementing a template class using a linked list instead of an array. Note: The functionality should remain the same. /** * Queue implementation using linked list C style implementation ( no OOP). */ #include <cstdio> #include <cstdlib> #include <climits> #include <iostream> #define CAPACITY 100 // Queue max capacity using namespace std; /** Queue structure definition */ struct QueueType { int data; struct QueueType *...
1. Please program the following in Python 3 code. 2. Please share your code. 3. Please...
1. Please program the following in Python 3 code. 2. Please share your code. 3. Please show all outputs. Instructions: Run Python code  List as Stack  and verify the following calculations; submit screen shots in a single file. Postfix Expression                Result 4 5 7 2 + - * = -16 3 4 + 2  * 7 / = 2 5 7 + 6 2 -  * = 48 4 2 3 5 1 - + * + = 18   List as Stack Code: """...
Python 3 Fix the code. It is not saving the data into the xls file Code:...
Python 3 Fix the code. It is not saving the data into the xls file Code: import tkinter as tk from tkcalendar import DateEntry from openpyxl import load_workbook from tkinter import messagebox from datetime import datetime window = tk.Tk() window.title("daily logs") window.grid_columnconfigure(1,weight=1) window.grid_rowconfigure(1,weight=1) # labels tk.Label(window, text="Bar code").grid(row=0, sticky="W", pady=20, padx=20) tk.Label(window, text="Products failed").grid(row=1, sticky="W", pady=20, padx=20) tk.Label(window, text="Money Lost").grid(row=2, sticky="W", pady=20, padx=20) tk.Label(window, text="sold by").grid(row=3, sticky="W", pady=20, padx=20) tk.Label(window, text="Working product").grid(row=4, sticky="W", pady=20, padx=20) #Working product label tk.Label(window, text="Failed...
USING THE TRADITIONAL 5-STEP PROCESS (NO P VALUE) --> A random sample of 450 visitors to...
USING THE TRADITIONAL 5-STEP PROCESS (NO P VALUE) --> A random sample of 450 visitors to a local shopping mall found that 90 were there to just walk the halls for exercise. Is this sufficient evidence to conclude that more than 15% of visitors to the mall were there simply for the exercise? Let α = 0.05. --> My grandfather, Roscoe, needed a new band saw for his carpentry workshop, and asked me to call two different stores to check...
All code should be in Python 3. Implement the Stack Class, using the push, pop, str,...
All code should be in Python 3. Implement the Stack Class, using the push, pop, str, init methods, and the insurance variable 'list'.
Can you rewrite this MATLAB code using a for loop instead of a while loop? %formatting...
Can you rewrite this MATLAB code using a for loop instead of a while loop? %formatting clc, clear, format compact; %define variables k=1; b=-2; x=-1; y=-2; %while loop initialization for k <= 3 disp([num2str(k), ' ',num2str(b),' ',num2str(x),' ',num2str(y),]); y = x^2 -3; if y< b b = y; end x = x+1; k = k+1; end
Hi, I'm trying to rewrite the code below (code #1) by changing delay() to millis(). void...
Hi, I'm trying to rewrite the code below (code #1) by changing delay() to millis(). void loop() { // Print the value inside of myBPM. Serial.begin(9600); int myBPM = pulseSensor.getBeatsPerMinute(); // Calls function on our pulseSensor object that returns BPM as an "int". // "myBPM" hold this BPM value now. if (pulseSensor.sawStartOfBeat()) { // Constantly test to see if "a beat happened". Serial.println("♥ A HeartBeat Happened ! "); // If test is "true", print a message "a heartbeat happened". Serial.print("BPM:...
Using R Question 3. kNN Classification 3.1 Read in iris dataset using “data(iris)”. Describe the features...
Using R Question 3. kNN Classification 3.1 Read in iris dataset using “data(iris)”. Describe the features in the data using summary 3.2 Randomize the iris data set, mix it up and normalize it 3.3 split data into training & testing (70/30 split) 3.4 Train model in data and use crosstable function to evaluate the results 3.5 Rerun your code for K=10 and 100. Compare results and explain
Python Practice Sample:   Write code to replace every occurrence of THE or the with ### and...
Python Practice Sample:   Write code to replace every occurrence of THE or the with ### and every word ending with the letter s to end with a $. Print the resulting text four words per line (and any remaining words from each paragraph on the last one of each paragraph) "The modern business world goes way beyond the balance sheet. Whether your passion is finance or fashion, economics or the environment, you need an education built for business. At Bentley,...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT