Question

Using RStudio:

1. Read the iris data set into a data frame.

2. Print the first few lines of the iris dataset.

3. Output all the entries with Sepal Length > 5.

4. Plot a box plot of Petal Length with a color of your choice.

5. Plot a histogram of Sepal Width.

6. Plot a scatter plot showing the relationship between Petal Length and Petal Width.

7. Find the mean of Sepal Length by species. Hint: You could use the tapply function. Other methods are also acceptable.

8. Use the subset function to extract only rows where the species is "versicolor."

9. Install the dplyr package and load it on your console.

10. Use a function in the dplyr package to show only rows with Sepal Length <6 belonging to species "virginica."

Solutions

Expert Solution

Code:

# The iris dataset ships with R's built-in datasets package.
# Load the dataset.

library(datasets)
data(iris)

#1) Read the dataset into a data frame.
cat("Data loaded into a data frame\n\n")
data <- data.frame(iris)


#2) first few lines of iris dataset.
cat("few lines of dataset\n\n")
head(data)
cat("\n\n")

#3) Output all the entries with Sepal Length > 5.
cat("entries with Sepal Length > 5\n\n")
ans1 <- data[(data[,1]>5),]
ans1
cat("\n\n")

Note: The output for this step is long; run the code on your own machine to see it in full.

#4) Plot a box plot of Petal Length with a color of your choice.
cat("boxplot of petal length\n\n")
boxplot(Petal.Length ~ Species, data=iris,
main="Box Plot",
col="red",
xlab="Species",
ylab="Petal Length")
cat("\n\n")

#5) histogram of Sepal Width
cat("Histogram of sepal width\n\n")
sepal_width <- data$Sepal.Width
hist(sepal_width)
cat("\n\n")

#6) scatter plot showing the relationship between Petal Length and Petal Width.
cat("Scatter plot\n\n")
plot(data$Petal.Length, data$Petal.Width)
cat("\n\n")

#7) Find the mean of Sepal Length by species.
cat("mean of Sepal Length by species\n\n")
# Store the result under a new name so it does not shadow the built-in mean().
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means
cat("\n\n")

#8) Use the subset function to extract only rows where the species is "versicolor."
cat("subset function to extract only rows\n\n")
irisV <- subset(data, Species == "versicolor")
irisV
cat("\n\n")

#9) Install the dplyr package and load it.
#Note: Run install.packages() once in your R console; the library() call below loads the package.
install.packages("dplyr")


#10) only rows with Sepal Length <6 belonging to species "virginica."
library(dplyr)
cat("Sepal Length <6 belonging to species virginica\n\n")
# Column names are case-sensitive: Species and Sepal.Length.
ans <- filter(data, Species == "virginica", Sepal.Length < 6)
ans

Explanation of the code:

1) The library function loads a package into the R session.
2) The data function loads a specific dataset, with the dataset name given as the argument.
3) data.frame stores the data taken from the iris dataset as a data frame.
4) The head function prints the first few rows of the data frame (six by default).

Detailed Explanation:
1) ans1 <- data[(data[,1]>5),]

  • This code indexes the data frame by rows. data[,1] is the first column (Sepal.Length), the comparison > 5 gives TRUE or FALSE for each row, and only the TRUE rows are kept. The empty slot after the comma keeps all columns.
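
The same rows can also be selected by column name rather than by position, which is easier to read and does not depend on the column order. A minimal sketch, assuming the built-in iris data frame; the names long_sepal and by_position are only illustrative:

```r
# Logical indexing by column name.
long_sepal <- iris[iris$Sepal.Length > 5, ]

# Positional version, as used in the answer, for comparison.
by_position <- iris[iris[, 1] > 5, ]

# Both approaches select exactly the same rows.
identical(long_sepal, by_position)
nrow(long_sepal)
```

Indexing by name keeps working even if the columns are later reordered.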

2) boxplot(Petal.Length ~ Species, data=iris,
main="Box Plot",
col="red",
xlab="Species",
ylab="Petal Length")

  • This code uses the boxplot function. The formula Petal.Length ~ Species says which variable to plot (Petal.Length) and which variable to group it by (Species), so one box is drawn per species. main is the title of the plot, col sets the colour of the boxes, and xlab and ylab are the axis labels.
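
The question asks only for a box plot of Petal Length, so an ungrouped plot would also be an acceptable answer. The sketch below shows both forms side by side; the colour "steelblue", the titles, and the variable names bp_all and bp_by_species are illustrative choices, not part of the original answer.

```r
# One box for all 150 flowers (no grouping).
bp_all <- boxplot(iris$Petal.Length,
                  main = "Petal Length",
                  col  = "steelblue",
                  ylab = "Petal Length (cm)")

# Grouped version: the formula y ~ group draws one box per species.
bp_by_species <- boxplot(Petal.Length ~ Species, data = iris,
                         main = "Petal Length by Species",
                         col  = "red",
                         xlab = "Species",
                         ylab = "Petal Length (cm)")

# boxplot() invisibly returns the statistics used to draw each box
# (whisker ends, hinges, and median), one column per group.
bp_by_species$names
bp_by_species$stats
```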

3) sepal_width <- data$Sepal.Width
hist(sepal_width)

  • The sepal width column is stored in a variable, and the hist function then plots a histogram of it.

4) plot(data$Petal.Length, data$Petal.Width)

  • This code draws a scatter plot with the plot function. It takes two arguments: the x variable (Petal.Length) and the y variable (Petal.Width).
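
To make the relationship easier to read, the points can be labelled and coloured by species. This is an optional extension and a sketch only; the colours, the legend position, and the names species_cols and petal_cor are assumptions made for illustration.

```r
# Species is a factor, so its integer codes (1, 2, 3) can index
# a vector of colours, giving one colour per species.
palette_cols <- c("red", "darkgreen", "blue")
species_cols <- palette_cols[as.integer(iris$Species)]

plot(iris$Petal.Length, iris$Petal.Width,
     col  = species_cols,
     pch  = 19,
     main = "Petal Length vs. Petal Width",
     xlab = "Petal Length (cm)",
     ylab = "Petal Width (cm)")
legend("topleft", legend = levels(iris$Species),
       col = palette_cols, pch = 19)

# A correlation coefficient summarises the linear relationship numerically.
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)
petal_cor
```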

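5) species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

  • This code uses the tapply function with three arguments: the values (sepal length), the grouping variable (species), and the function to apply to each group (mean). The result is stored as species_means so that it does not shadow the built-in mean function.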
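
The question notes that other methods are also acceptable. One base R alternative is aggregate, which returns the same numbers as a data frame instead of a named array. A short sketch (the name means_df is illustrative):

```r
# tapply(values, groups, function): apply mean() within each species.
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# aggregate() with a formula gives the same means as a data frame.
means_df <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

species_means
means_df
```

For iris, both approaches give means of 5.006 (setosa), 5.936 (versicolor), and 6.588 (virginica).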

6) irisV <- subset(data, Species == "versicolor")

  • This code uses the subset function to keep only the rows of data where the species is versicolor.
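
Two details of subset are worth knowing: the Species factor keeps all three levels after subsetting, and the select argument can restrict the columns as well. A brief sketch, using illustrative variable names:

```r
versicolor <- subset(iris, Species == "versicolor")
nrow(versicolor)   # iris has 50 flowers per species

# The factor still carries the two unused levels after subsetting;
# droplevels() removes them.
levels(versicolor$Species)
versicolor <- droplevels(versicolor)
levels(versicolor$Species)

# select = keeps only the named columns.
versicolor_petals <- subset(iris, Species == "versicolor",
                            select = c(Petal.Length, Petal.Width))
head(versicolor_petals)
```

Dropping unused levels matters mainly for later plots and tables, which otherwise show empty groups for the other species.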

7) ans <- filter(data, Species == "virginica", Sepal.Length < 6)

  • This code uses the filter function from dplyr. The first argument is the data frame, and each further argument is a condition; rows must satisfy all of them, so only virginica rows with sepal length below 6 are kept. Column names are case-sensitive, so they must be written exactly as Species and Sepal.Length.
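
As a cross-check, the dplyr result can be compared against a base R subset call. The sketch below guards the dplyr part with requireNamespace so that it still runs if the package has not been installed yet; the variable names are illustrative.

```r
# Base R version, which needs no extra packages.
base_result <- subset(iris, Species == "virginica" & Sepal.Length < 6)

# dplyr version; column names must match the data exactly
# (Species and Sepal.Length, not species or sepal.length).
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_result <- dplyr::filter(iris, Species == "virginica", Sepal.Length < 6)
  # Row names may differ between the two results, so compare values only.
  print(all(dplyr_result$Sepal.Length == base_result$Sepal.Length))
}

base_result
```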

Note: Refer to the R documentation for more information on these functions; only the parts needed for this assignment are covered here.
