This is a machine learning question. Please use Python in Google Colab format. PLEASE USE BASIC MACHINE LEARNING CODE.
Using the Kaggle diamonds dataset, construct a KNN estimator to predict diamond prices. Choose an appropriate K value and predict the price of a diamond with the following parameters: 'carat': 0.32, 'cut': 'Ideal', 'color': 'E', 'clarity': 'IF', 'depth': 60.7, 'table': 58.0, 'x': 4.46, 'y': 4.48, 'z': 2.71.
Please change the cut, color and clarity to numbers (e.g. cut: 'Fair': 1, 'Good': 2, 'Very Good': 3, 'Premium': 4, 'Ideal': 5, etc.). I need to predict the price of the specific diamond stated above. Thank you!
Please find the Google Colab code below:
import pandas as pd
data = pd.read_csv("diamonds.csv") # read data from the csv file
data.head() # print the first 5 lines to check if the data has been imported correctly
data.drop('Unnamed: 0', inplace=True, axis=1) # drop the serial number column
data.head()
# get the unique values in each categorical column
print(data["cut"].unique())
print(data["color"].unique())
print(data["clarity"].unique())
# replace the categorical values with the corresponding numerical value
# (the result is assigned back to the column; an inplace replace on a selected column triggers warnings in newer pandas versions)
data['cut'] = data['cut'].map({'Fair': 1, 'Good': 2, 'Very Good': 3, 'Premium': 4, 'Ideal': 5}) # cut codes as given in the question
# note: the color and clarity codes below are arbitrary labels, not a quality ranking
data['color'] = data['color'].map({'E': 7, 'I': 6, 'J': 5, 'H': 4, 'F': 3, 'G': 2, 'D': 1})
data['clarity'] = data['clarity'].map({'SI2': 8, 'SI1': 7, 'VS1': 6, 'VS2': 5, 'VVS2': 4, 'VVS1': 3, 'I1': 2, 'IF': 1})
# see if all the data has been correctly replaced
print(data["cut"].unique())
print(data["color"].unique())
print(data["clarity"].unique())
data.head()
# store the price column in y and the rest of the features in X
y = data['price']
X = data.drop("price", axis=1)
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=5) # price is a continuous target, so use the regressor; K = 5 is the scikit-learn default
knn.fit(X, y) # fit the data
# create the test dataframe
test_data = {'carat' : [0.32], 'cut' : [5], 'color' : [7], 'clarity' : [1], 'depth' : [60.7], 'table' : [58.0], 'x' : [4.46], 'y' : [4.48], 'z': [2.71]}
to_test = pd.DataFrame(test_data, columns=['carat', 'cut', 'color', 'clarity', 'depth', 'table', 'x', 'y', 'z'])
to_test
print("Price: ", knn.predict(to_test)[0]) # print the predicted price
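The question also asks for an appropriate K value, which the code above leaves at the default. One common way to pick K is cross-validation over a few candidates, with the features scaled first so that no single column dominates the distance. The sketch below is only an illustration under stated assumptions: `X_demo` and `y_demo` are small synthetic stand-ins (not the real diamonds data), and the candidate K list is arbitrary. With the real notebook you would pass `X` and `y` instead.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded diamonds table (assumption: NOT the real data)
rng = np.random.default_rng(0)
n = 300
X_demo = pd.DataFrame({
    'carat': rng.uniform(0.2, 2.0, n),
    'cut': rng.integers(1, 6, n),
    'depth': rng.uniform(58.0, 64.0, n),
})
y_demo = 4000 * X_demo['carat'] + 100 * X_demo['cut'] + rng.normal(0, 200, n)

# Scale the features, then apply KNN regression
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

# Compare several K values with 5-fold cross-validation (candidate list is arbitrary)
param_grid = {'kneighborsregressor__n_neighbors': [1, 3, 5, 7, 9, 11, 15]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='neg_mean_absolute_error')
search.fit(X_demo, y_demo)

best_k = search.best_params_['kneighborsregressor__n_neighbors']
print("Best K:", best_k)
print("Cross-validated MAE:", -search.best_score_)
```

After the search, `search.predict(to_test)` would give the price from the best pipeline, provided the search was fitted on the same columns as `to_test`.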
Since an .ipynb file can't be attached, the Python code has been pasted above.
[Screenshot: final notebook output]
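KNN works on distances, so the numbers chosen for color and clarity matter. In the pasted code they follow the order in which the values happen to appear, not the grading order. Below is a hedged sketch of an alternative encoding that ranks each grade from worst to best. The orders used here are an assumption based on the usual diamond grading scales (color D best, J worst; clarity I1 worst, IF best), and `sample` is a tiny hand-made frame rather than the real data.

```python
import pandas as pd

# Assumed grading orders, worst to best (not taken from the pasted code)
cut_order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
color_order = ['J', 'I', 'H', 'G', 'F', 'E', 'D']
clarity_order = ['I1', 'SI2', 'SI1', 'VS2', 'VS1', 'VVS2', 'VVS1', 'IF']

def make_codes(order):
    """Map each grade to 1..n following the given worst-to-best order."""
    return {grade: rank for rank, grade in enumerate(order, start=1)}

cut_codes = make_codes(cut_order)
color_codes = make_codes(color_order)
clarity_codes = make_codes(clarity_order)

# Tiny hand-made frame standing in for diamonds.csv (assumption)
sample = pd.DataFrame({
    'cut': ['Ideal', 'Fair'],
    'color': ['E', 'J'],
    'clarity': ['IF', 'SI2'],
})

# Replace each categorical column with its rank
encoded = sample.assign(
    cut=sample['cut'].map(cut_codes),
    color=sample['color'].map(color_codes),
    clarity=sample['clarity'].map(clarity_codes),
)
print(encoded)
```

If this encoding were used in the notebook, the diamond to predict would have to be encoded the same way: color E would become 6 and clarity IF would become 8, instead of the 7 and 1 used in the test dataframe.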
If you have any doubts, please let me know in the comments. (Also upvote/thumbs up if you can!)