Question

Using R

Question 3. kNN Classification
3.1 Read in iris dataset using “data(iris)”. Describe the features in the data using summary
3.2 Randomize (shuffle) the iris data set and normalize it
3.3 Split the data into training and testing sets (70/30 split)
3.4 Train the model on the training data and use the crosstable function to evaluate the results
3.5 Rerun your code for K=10 and K=100. Compare the results and explain

Solutions

Expert Solution

raw_code :

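library(class)     # provides knn()
library(datasets)  # provides the iris data set

data(iris)         # load the dataset
summary(iris)      # describe the features

set.seed(99)                 # make the shuffle reproducible
rnum <- sample(nrow(iris))   # random permutation of the row indices
iris <- iris[rnum, ]         # shuffle the rows

# min-max normalization: rescale each feature to the range [0, 1]
normalize <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}
iris.new <- as.data.frame(lapply(iris[, 1:4], normalize))

# 70/30 split: first 105 shuffled rows for training, remaining 45 for testing
iris.train <- iris.new[1:105, ]
iris.train.target <- iris[1:105, 5]
iris.test <- iris.new[106:150, ]
iris.test.target <- iris[106:150, 5]

# kNN models with k = 10 and k = 100
model1 <- knn(train = iris.train, test = iris.test, cl = iris.train.target, k = 10)
model2 <- knn(train = iris.train, test = iris.test, cl = iris.train.target, k = 100)

# confusion tables: rows are the actual species, columns are the predictions
table(iris.test.target, model1)
table(iris.test.target, model2)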
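
The confusion tables above can be condensed into a single accuracy figure, which makes the two values of k easier to compare. The following is a minimal sketch, not part of the original answer: the helper name accuracy and the object names (idx, features, train, test and so on) are made up for illustration, and the split is recreated inside the block so that it runs on its own. If the gmodels package is installed, its CrossTable() function prints the same counts together with row and column proportions; it is not loaded here because it is not part of a default R installation.

```r
library(class)  # knn(); ships with standard R distributions

# Hypothetical helper: share of predictions that match the true labels
accuracy <- function(actual, predicted) {
  mean(actual == predicted)
}

# Recreate the shuffled, normalized 70/30 split so the block is self-contained
set.seed(99)
idx <- sample(nrow(iris))
normalize <- function(m) (m - min(m)) / (max(m) - min(m))
features <- as.data.frame(lapply(iris[idx, 1:4], normalize))
labels <- iris[idx, 5]

train <- features[1:105, ]
test <- features[106:150, ]
train.target <- labels[1:105]
test.target <- labels[106:150]

pred10 <- knn(train = train, test = test, cl = train.target, k = 10)

# Confusion table: rows are actual species, columns are predicted species
conf10 <- table(actual = test.target, predicted = pred10)
print(conf10)

# Accuracy computed two ways: directly, and as the diagonal share of the table
acc10 <- accuracy(test.target, pred10)
acc10.from.table <- sum(diag(conf10)) / sum(conf10)
print(acc10)
```

The same two lines can be repeated with k = 100 to get a comparable number for the second model.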

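Result: k=10 gave better accuracy than k=100. Increasing k does not cause overfitting; it causes underfitting. Each prediction is a vote over more neighbours, so the decision boundary becomes smoother and less sensitive to local structure. With only 105 training rows, k=100 lets almost the entire training set vote on every test point, so the predictions are driven largely by the class counts in the training set rather than by the nearest points, and accuracy decreases.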
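
To see the effect of k more directly, one option is to sweep several values and compare test accuracy. This is a sketch under the same assumptions as the code above (seed 99, 105/45 split); the values in ks other than 10 and 100, and all object names, are chosen for illustration only. No expected numbers are shown because the exact accuracies depend on the shuffle.

```r
library(class)  # knn()

# Same shuffled, normalized 70/30 split as in the answer
set.seed(99)
idx <- sample(nrow(iris))
normalize <- function(m) (m - min(m)) / (max(m) - min(m))
features <- as.data.frame(lapply(iris[idx, 1:4], normalize))
labels <- iris[idx, 5]

train <- features[1:105, ]
test <- features[106:150, ]
train.target <- labels[1:105]
test.target <- labels[106:150]

# Illustrative grid of k values; 10 and 100 are the ones the question asks for
ks <- c(1, 5, 10, 50, 100)

# Test-set accuracy for each k
acc <- sapply(ks, function(k) {
  pred <- knn(train = train, test = test, cl = train.target, k = k)
  mean(pred == test.target)
})

results <- data.frame(k = ks, accuracy = acc)
print(results)

# Class counts among the 105 training labels; with k = 100 nearly all of them vote
print(table(train.target))
```

Small k follows the training data closely and can be noisy, while very large k approaches a majority vote over the whole training set, so a value somewhere in between is usually chosen by comparing accuracies like this.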


