Using R
Question 3. kNN Classification
3.1 Read in the iris dataset using "data(iris)". Describe the features in the data using summary.
3.2 Randomize the iris dataset (shuffle the rows) and normalize it.
3.3 Split the data into training and testing sets (70/30 split).
3.4 Train the model on the training data and use the CrossTable function to evaluate the results.
3.5 Rerun your code for K=10 and K=100. Compare the results and explain.
raw_code :
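library(class)     # provides knn()
library(datasets)  # provides the iris dataset
data(iris)         # load the dataset
summary(iris)      # describe the four numeric features and Species
set.seed(99)       # make the shuffle reproducible
rnum <- sample(nrow(iris))  # random permutation of the row indices
iris <- iris[rnum, ]        # shuffle the rows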
# Min-max normalization: rescale each feature to the range [0, 1]
normalize <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}
iris.new <- as.data.frame(lapply(iris[, 1:4], normalize))
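# Split 70/30: the first 105 shuffled rows train the model, the last 45 test it
iris.train <- iris.new[1:105, ]
iris.train.target <- iris[1:105, 5]
iris.test <- iris.new[106:150, ]
iris.test.target <- iris[106:150, 5]
# Fit kNN with k = 10 and k = 100
model1 <- knn(train = iris.train, test = iris.test,
              cl = iris.train.target, k = 10)
model2 <- knn(train = iris.train, test = iris.test,
              cl = iris.train.target, k = 100)
# Confusion tables: rows are the actual species, columns are the predictions
table(iris.test.target, model1)
table(iris.test.target, model2)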
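Task 3.4 asks for a cross table, while the code above uses plain table(). Here is a minimal sketch of that step, assuming the question means CrossTable() from the gmodels package. That package is not part of base R and is an assumption here, so the sketch falls back to table() when it is not installed. The names shuffled, features, train and test are illustrative and repeat the same steps as the code above.

```r
library(class)  # knn()

data(iris)
set.seed(99)
shuffled <- iris[sample(nrow(iris)), ]  # shuffle the rows

# Min-max normalization of the four numeric features
normalize <- function(m) (m - min(m)) / (max(m) - min(m))
features <- as.data.frame(lapply(shuffled[, 1:4], normalize))

# 70/30 split on the shuffled rows
train <- features[1:105, ]
test <- features[106:150, ]
train_target <- shuffled[1:105, 5]
test_target <- shuffled[106:150, 5]

pred <- knn(train = train, test = test, cl = train_target, k = 10)

# Assumption: gmodels provides CrossTable(); fall back to table() without it
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(x = test_target, y = pred, prop.chisq = FALSE,
                      dnn = c("actual", "predicted"))
} else {
  print(table(actual = test_target, predicted = pred))
}
```

In either output, the diagonal cells count the correctly classified test rows and the off-diagonal cells count the mistakes.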
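Result: k=10 gave better accuracy than k=100.
As k increases, the model underfits rather than overfits: each prediction is a vote over more neighbours, so the decision boundary gets smoother and local structure is lost. With k=100 and only 105 training rows, almost every training point votes on every test point, so the prediction is driven mostly by how many rows of each class are in the training set, and accuracy decreases.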
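The comparison can be checked numerically by computing the test accuracy for each k. This is a minimal sketch that repeats the same shuffle, normalization and split. The helper accuracy_for_k and the extra value k=1 are illustrative additions, not part of the original answer, and no particular accuracy values are claimed here.

```r
library(class)  # knn()

data(iris)
set.seed(99)
shuffled <- iris[sample(nrow(iris)), ]

normalize <- function(m) (m - min(m)) / (max(m) - min(m))
features <- as.data.frame(lapply(shuffled[, 1:4], normalize))

train <- features[1:105, ]
test <- features[106:150, ]
train_target <- shuffled[1:105, 5]
test_target <- shuffled[106:150, 5]

# Accuracy = share of test rows whose predicted species matches the actual one
accuracy_for_k <- function(k) {
  pred <- knn(train = train, test = test, cl = train_target, k = k)
  mean(pred == test_target)
}

ks <- c(1, 10, 100)  # k = 1 is added only for contrast
acc <- sapply(ks, accuracy_for_k)
names(acc) <- paste0("k=", ks)
print(round(acc, 3))

# Class counts in the training set: with 100 of 105 rows voting,
# these counts largely decide the k = 100 predictions
print(table(train_target))
```

knn() breaks tied votes at random, so the numbers for large k can vary with the seed.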