Question


In the R programming language, we would like to use the data set called iris to build a simple linear regression model to predict Sepal.Length based on Petal.Length.

  1. Calculate the least squares regression line to predict Sepal.Length based on Petal.Length. Interpret the slope of the line in the context of the problem. Remember that both variables are measured in centimeters.
  2. Plot the regression line in a scatterplot of Sepal.Length vs. Petal.Length.
  3. Test H1: β1 ≠ 0 at α = 0.05 using both a t-test and an F-test. Report the test statistics and interpret the results.
  4. Visually check the normality assumption. Does it seem reasonable here?
  5. Visually check the constant variance assumption. Does it seem reasonable here?
  6. Interpret this regression model’s R².
  7. Using the regression line, what would you predict as the Sepal.Length for an iris with a Petal.Length of 3.4 cm?
  8. We would expect approximately 95% of the irises to have a Sepal.Length within ± (fill in the blank)  cm of their predicted values from the regression line.

Answers should be in the form of R code on how to accomplish each part and include the correct statistical explanation for those that require it in the question. Please be as thorough as possible. Thank you so much!!!

Solutions

Expert Solution

a)

The least squares regression equation is

Y = 4.3066 + 0.40892X      where Y = Sepal.Length (cm) and X = Petal.Length (cm)

The slope of the line is 0.40892.

That is, for every 1 cm increase in petal length, the predicted sepal length increases by about 0.409 cm on average.
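The coefficients above can be reproduced with a short sketch like the following. It assumes the built-in iris data frame; the object names fit and coefs are illustrative choices, not part of the original answer.

```r
# Fit the simple linear regression of Sepal.Length on Petal.Length
# using the built-in iris data set.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Named vector holding the intercept and the slope for Petal.Length.
coefs <- coef(fit)
print(round(coefs, 5))
```

The element named Petal.Length is the slope interpreted above; the element named (Intercept) is the predicted sepal length at a petal length of 0 cm, which lies outside the observed data.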

b)
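One way to draw the scatterplot with the fitted line is sketched below using base R graphics. The axis labels, title, and line colour are illustrative choices.

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Scatterplot of the response against the predictor.
plot(Sepal.Length ~ Petal.Length, data = iris,
     xlab = "Petal.Length (cm)",
     ylab = "Sepal.Length (cm)",
     main = "Sepal.Length vs. Petal.Length")

# Overlay the least squares regression line.
abline(fit, col = "red", lwd = 2)
```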

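c) H0: β1 = 0 (no linear relationship between Petal.Length and Sepal.Length)

    H1: β1 ≠ 0

t-value = 21.65    (from the R summary output)

Since the p-value < 0.05, we reject H0 and conclude that there is a significant linear relationship between petal length and sepal length.

F-value = 468.6

The p-value is < 0.05, so we again reject H0 that there is no linear relationship between petal length and sepal length. In simple linear regression the two tests are equivalent: the F statistic is the square of the t statistic for the slope (21.65² ≈ 468.6, up to rounding).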
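Both test statistics can be pulled from the model summary, as in this sketch. The variable names t_stat, p_t, and f_stat are illustrative.

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
s <- summary(fit)

# t-test for the slope: H0: beta1 = 0 vs. H1: beta1 != 0.
t_stat <- s$coefficients["Petal.Length", "t value"]
p_t    <- s$coefficients["Petal.Length", "Pr(>|t|)"]

# Overall F-test; with one predictor it tests the same hypothesis.
f_stat <- s$fstatistic[["value"]]

print(c(t = t_stat, F = f_stat, t_squared = t_stat^2))
print(p_t < 0.05)   # TRUE means reject H0 at alpha = 0.05

# The ANOVA table reports the same F statistic and its p-value.
print(anova(fit))
```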

g) The predicted sepal length when the petal length is 3.4 cm is 4.3066 + 0.40892 × 3.4 ≈ 5.70 cm.
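The prediction can be obtained from the fitted model with predict(), as sketched here. The data frame name new_iris is an illustrative choice; its column must be named Petal.Length to match the model formula.

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# New observation with a petal length of 3.4 cm.
new_iris <- data.frame(Petal.Length = 3.4)

# Point prediction of Sepal.Length (in cm) from the regression line.
pred <- predict(fit, newdata = new_iris)
print(pred)
```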

d and e)

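In the normal Q-Q plot of the residuals, the points fall close to the diagonal reference line, so the assumption of normally distributed residuals seems reasonable.

In the scale-location plot, the points are spread about equally above and below the red line, and the line is roughly flat. Hence we can assume constant variance.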
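The two diagnostic plots referred to above can be produced with the plot method for lm objects, as in this sketch. The side-by-side layout is an illustrative choice.

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Show the two diagnostic plots side by side, then restore the layout.
op <- par(mfrow = c(1, 2))
plot(fit, which = 2)   # normal Q-Q plot of the standardized residuals
plot(fit, which = 3)   # scale-location plot (spread vs. fitted values)
par(op)
```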

f) The adjusted R-squared is 0.758 (the unadjusted R-squared is about 0.76). That means roughly 76% of the variability in sepal length is explained by its linear relationship with petal length.
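Both versions of R-squared are stored in the model summary. A minimal sketch, with illustrative variable names r2 and adj_r2:

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
s <- summary(fit)

r2     <- s$r.squared       # proportion of variance explained
adj_r2 <- s$adj.r.squared   # adjusted for the number of predictors

print(round(c(R2 = r2, adjusted_R2 = adj_r2), 4))
```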

h)
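One common way to fill in the blank, assuming the usual rule of thumb that about 95% of observations fall within two residual standard errors of the regression line, is sketched below. The names s_e and margin are illustrative.

```r
# Refit the model so this block is self-contained.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Residual standard error: the typical distance (in cm) between
# observed and predicted Sepal.Length.
s_e <- sigma(fit)

# Approximate 95% margin under the two-standard-error rule of thumb.
margin <- 2 * s_e
print(round(c(residual_se = s_e, margin = margin), 3))
```

The printed margin is the value for the blank, in centimeters. It relies on the normality and constant variance checks from parts d and e looking reasonable.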

