Question


1. Use the R command X <- iris to assign Fisher's iris dataset to the data matrix X. Using the head(X) command, summarize what each column of the dataset measures and represents. Assign Y as a new matrix of dimension 150 by 4 which has the values of X without the species label.

2. Compute and interpret (in a brief English summary) each of the summary statistics X̄, S, and R using R.

3. Visualize the dataset by making a scatterplot of Sepal Length vs. Sepal Width and a scatterplot of Petal Length vs. Petal Width. The pairs function is useful here. Use your plots and stats from #2 to comment on any evident correlations.

Solutions

Expert Solution

Rcode.R

X <- iris  # assign the iris dataset to the data matrix X

head(X)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2 setosa
## 2          4.9         3.0          1.4         0.2 setosa
## 3          4.7         3.2          1.3         0.2 setosa
## 4          4.6         3.1          1.5         0.2 setosa
## 5          5.0         3.6          1.4         0.2 setosa
## 6          5.4         3.9          1.7         0.4 setosa

# the first column, Sepal.Length, measures the length of the sepals (in cm)
# the second column, Sepal.Width, measures the width of the sepals (in cm)
# the third column, Petal.Length, measures the length of the petals (in cm)
# the fourth column, Petal.Width, measures the width of the petals (in cm)
# the fifth column, Species, is the species label (setosa, versicolor, or virginica)

Y <- as.matrix(X[, -5])  # Y is a 150 x 4 numeric matrix: the columns of X without the last column (Species); X[, -5] alone would stay a data frame
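Whether Y has the requested shape and type can be verified directly. The following is a minimal sketch, assuming that a true numeric matrix (rather than a data frame) is what the question wants:

```r
# Build Y from the built-in iris data and check its shape and type.
X <- iris
Y <- as.matrix(X[, -5])   # drop the Species column, coerce to a numeric matrix

dim(Y)          # number of rows and columns
is.matrix(Y)    # TRUE for a matrix, FALSE for a data frame
colnames(Y)     # the four measurement names
```

Dropping the column by position (`-5`) relies on Species being the last column; selecting by name, for example `X[, names(X) != "Species"]`, is an alternative that does not depend on column order.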

summary(X) #computes the summary for each column of X

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width  
## Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
## 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
## Median :5.800   Median :3.000   Median :4.350   Median :1.300
## Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
## 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
## Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
## setosa    :50
## versicolor:50
## virginica :50
##                
##                
##
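summary() reports per-column quantiles and means but not the covariance or correlation structure. Assuming the three statistics in question 2 denote the sample mean vector, the sample covariance matrix S, and the sample correlation matrix R, a minimal sketch of computing them with base R is (the names `xbar`, `S`, and `R` are chosen here for illustration):

```r
# Summary statistics for the four numeric iris measurements.
X <- iris
Y <- as.matrix(X[, -5])

xbar <- colMeans(Y)   # sample mean vector (one mean per column)
S    <- cov(Y)        # sample covariance matrix (divisor n - 1)
R    <- cor(Y)        # sample correlation matrix

round(xbar, 3)
round(S, 3)
round(R, 3)
```

In words: `xbar` gives the average of each measurement, the diagonal of `S` gives each variable's variance and the off-diagonal entries give how pairs of variables vary together, and `R` rescales `S` so that every entry lies between -1 and 1, which makes the strength of linear association comparable across pairs.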

# Scatter plots (with() is used instead of attach(), which can mask other objects)
with(X, plot(Sepal.Length, Sepal.Width))

with(X, plot(Petal.Length, Petal.Width))

## for correlation interpretation
pairs(X[1:2]) ## The plot shows only a weak, slightly negative overall linear correlation between sepal length and sepal width (r is about -0.12)

pairs(X[3:4]) ## The plot shows a strong positive correlation between petal length and petal width (r is about 0.96)
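The pooled sepal scatterplot mixes three species, so the overall correlation can hide what happens within each group. As a hedged sketch (the names `overall` and `by_species` are introduced here for illustration), the sepal length vs. sepal width correlation can be compared overall and per species:

```r
# Overall vs. within-species correlation of sepal length and sepal width.
X <- iris

overall <- cor(X$Sepal.Length, X$Sepal.Width)

# split() groups the rows by species; sapply() computes one correlation per group
by_species <- sapply(
  split(X, X$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)

round(overall, 3)
round(by_species, 3)
```

The overall value is slightly negative while each within-species value is positive, which suggests the weak pooled correlation is largely a result of combining the species. Passing `col = as.integer(X$Species)` to `pairs()` colors the points by species and makes this grouping visible in the plots.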

