Question

1. Use the R command X <- iris to assign Fisher's iris dataset to the data matrix X. Using the head(X) command, summarize what each column of the dataset measures and represents. Assign Y as a new matrix of dimension 150 by 4 which has the values of X without the species label.

2. Compute and interpret (in summary English) each of the summary statistics X,S,R using R.

3. Visualize the dataset by making a scatterplot of Sepal Length vs. Sepal Width and a scatterplot of Petal Length vs. Petal Width. The pairs function is useful here. Use your plots and stats from #2 to comment on any evident correlations.

Solutions

Expert Solution

Rcode.R

X <- iris # assign the iris dataset to X

head(X)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2 setosa
## 2          4.9         3.0          1.4         0.2 setosa
## 3          4.7         3.2          1.3         0.2 setosa
## 4          4.6         3.1          1.5         0.2 setosa
## 5          5.0         3.6          1.4         0.2 setosa
## 6          5.4         3.9          1.7         0.4 setosa

# the first column is Sepal.Length, which measures the length of the sepals (in cm)
# the second column is Sepal.Width, which measures the width of the sepals (in cm)
# the third column is Petal.Length, which measures the length of the petals (in cm)
# the fourth column is Petal.Width, which measures the width of the petals (in cm)
# the fifth column is Species, the species label (setosa, versicolor, or virginica)

Y <- as.matrix(X[, -5]) # Y is a 150 x 4 numeric matrix: every column of X except the last one (Species)

summary(X) #computes the summary for each column of X

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width  
## Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
## 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
## Median :5.800   Median :3.000   Median :4.350   Median :1.300
## Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
## 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
## Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
## setosa    :50
## versicolor:50
## virginica :50
##                
##                
##
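
Question 2 asks for the statistics X, S, R, while summary() reports only the per-column quartiles and means. A minimal sketch, assuming X, S, and R there denote the sample mean vector, the sample covariance matrix, and the sample correlation matrix (the names xbar, S, and R below are chosen for illustration):

```r
# Sketch: assumes X, S, R in question 2 mean the sample mean vector,
# sample covariance matrix, and sample correlation matrix.
Y <- as.matrix(iris[, 1:4])   # numeric columns only, 150 x 4

xbar <- colMeans(Y)  # sample mean of each measurement
S    <- cov(Y)       # 4 x 4 sample covariance matrix (divisor n - 1)
R    <- cor(Y)       # 4 x 4 sample (Pearson) correlation matrix

print(round(xbar, 3))
print(round(S, 3))
print(round(R, 3))
```

The diagonal of S holds the variance of each measurement, the off-diagonal entries hold the covariances, and R rescales those covariances to lie between -1 and 1, so R is the easiest of the three to read for question 3.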

attach(X)
#Scatter plots
plot(Sepal.Length,Sepal.Width)

plot(Petal.Length,Petal.Width)

## for correlation interpretation
pairs(X[1:2]) ## Pooled over all species, sepal length and sepal width show only a weak negative correlation (r is about -0.12)

pairs(X[3:4]) ## Petal length and petal width show a strong positive correlation (r is about 0.96)
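
The visual impressions from the scatterplots can be checked numerically. The sketch below computes the pooled Pearson correlation for each plotted pair. As an illustrative extension that the question does not ask for, it also repeats the sepal calculation within each species, since pooling three species can hide structure inside the groups. The variable names are hypothetical.

```r
# Pooled Pearson correlations for the two plotted pairs
r_sepal <- cor(iris$Sepal.Length, iris$Sepal.Width)   # about -0.12
r_petal <- cor(iris$Petal.Length, iris$Petal.Width)   # about 0.96

# Same sepal correlation computed within each species (illustrative extension)
r_sepal_by_species <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)

print(round(c(sepal = r_sepal, petal = r_petal), 2))
print(round(r_sepal_by_species, 2))

# All four measurements at once, with points coloured by species
pairs(iris[, 1:4], col = as.integer(iris$Species))
```

Colouring the pairs plot by species makes it easier to judge whether a weak pooled correlation reflects a weak relationship overall or a mixture of distinct groups.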

