1. Use the Universal Bank dataset. Note that Personal.Loan is the outcome variable of interest.
a. Perform a k-NN classification with k=3 for a new record. (You are welcome to choose your own values for the new record; please state them clearly in your report.)
b. Identify the best k. Why do you think this is the best? (Hint: Explain what happens if k increases and if k decreases).
c. Calculate accuracy, sensitivity, and specificity for your validation data using the best k (from part b) without calling the confusion matrix function (that is, compute them directly from the validation data). Then verify your computation by calling the confusion matrix function in R (this is just to give you more practice with R).
d. Partition your dataset into three sets (training, validation, and test). Compare the accuracy metrics from part c for the validation data and the test data. Summarize your results.
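Part (a) asks for a k-NN classification of a single new record with k=3. The following is a minimal sketch of that idea, not the answer's own method: it assumes the `class` package's `knn()` function, and it uses a small made-up training set and a made-up new customer. The column names `Income` and `CCAvg` and all values are hypothetical stand-ins for whichever predictors and values you choose to report.

```r
library(class)  # provides knn(); ships with standard R distributions

# Hypothetical toy training set: two predictors and a 0/1 outcome.
train_x <- data.frame(
  Income = c(40, 45, 50, 60, 150, 160, 170, 180),
  CCAvg  = c(1.0, 1.2, 0.8, 1.5, 6.0, 5.5, 7.0, 6.5)
)
train_y <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))

# Hypothetical new customer to classify (state these values in the report).
new_x <- data.frame(Income = 165, CCAvg = 6.2)

# Standardize with the training means and standard deviations so that
# neither predictor dominates the Euclidean distance.
mu <- sapply(train_x, mean)
s  <- sapply(train_x, sd)
train_z <- scale(train_x, center = mu, scale = s)
new_z   <- scale(new_x,   center = mu, scale = s)

# Majority vote among the 3 nearest training records.
pred_new <- knn(train = train_z, test = new_z, cl = train_y, k = 3)
print(pred_new)
```

With the real data, the same pattern would apply after replacing the toy data frames with the training predictors and a one-row data frame holding the chosen new-record values.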
# !!!!! Don't forget to change the path in read.csv !!!!!
data <- read.csv("c:/Users/Acer/Downloads/UniversalBank.csv")
n <- nrow(data)
str(data)            # str() prints its own output, so no print() wrapper is needed
print(summary(data))
# Assumption: if the file contains identifier-like columns named ID and ZIP.Code,
# drop them so they do not enter the k-NN distance. This line changes nothing otherwise.
data <- data[, !(names(data) %in% c("ID", "ZIP.Code")), drop = FALSE]
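# Three-way split: about 20% test, 10% validation, and the remaining 70% training.
n_train <- floor(0.8 * n)   # size of the temporary pool (training + validation)
n_valid <- floor(0.1 * n)   # size of the validation set
set.seed(123)               # make the random split reproducible
train_indices <- sample(seq_len(n), n_train)
train_temp <- data[train_indices, ]
test <- data[-train_indices, ]   # rows never drawn into the pool
# valid_indices index rows of train_temp, not rows of data
valid_indices <- sample(seq_len(n_train), n_valid)
valid <- train_temp[valid_indices, ]
train <- train_temp[-valid_indices, ]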
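# DMwR provides kNN(); caret provides confusionMatrix() and may need e1071.
# Note: DMwR has been archived on CRAN, so install.packages("DMwR") may fail on
# recent R versions; it may need to be installed from the CRAN archive instead.
# Install once if needed: install.packages(c("caret", "e1071"))
library(DMwR)
library(caret)
library(e1071)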
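# Try k = 2, ..., 9 and report test and validation performance for each k.
for (k in 2:9) {
  pred_test  <- kNN(Personal.Loan ~ ., train, test,  k = k)
  pred_valid <- kNN(Personal.Loan ~ ., train, valid, k = k)
  cat("Number of nearest neighbors:", k, "\n")
  # confusionMatrix() expects predictions first and true labels second.
  # positive = "1" assumes loan acceptance is coded as 1 in Personal.Loan.
  cat("Confusion matrix for test:\n")
  print(confusionMatrix(factor(pred_test), factor(test$Personal.Loan), positive = "1"))
  cat("Confusion matrix for validation:\n")
  print(confusionMatrix(factor(pred_valid), factor(valid$Personal.Loan), positive = "1"))
  cat("-------------------------------------------------\n")
}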
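Part (c) asks for accuracy, sensitivity, and specificity computed directly, without a confusion matrix function. A minimal sketch of that computation follows. The two vectors are made-up toy labels standing in for the true validation labels and the k-NN predictions, and the sketch assumes that 1 (loan accepted) is the positive class.

```r
# Toy labels standing in for the true validation labels and the predictions
# (hypothetical values, not taken from the Universal Bank data).
actual    <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0)

# Count the four outcomes directly, treating 1 as the positive class.
tp <- sum(predicted == 1 & actual == 1)  # true positives
tn <- sum(predicted == 0 & actual == 0)  # true negatives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

accuracy    <- (tp + tn) / length(actual)  # share of all records classified correctly
sensitivity <- tp / (tp + fn)              # share of actual positives that were caught
specificity <- tn / (tn + fp)              # share of actual negatives that were caught

cat("Accuracy:", accuracy, "\n")
cat("Sensitivity:", sensitivity, "\n")
cat("Specificity:", specificity, "\n")
```

On real predictions, these three numbers should match the Accuracy, Sensitivity, and Specificity lines reported by a confusion matrix function, provided the same class is treated as positive in both places.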