Question

3. Using the R data set called warpbreaks (See ?warpbreaks for more info), we want to compare the mean breaks across both the different types of wool and the different levels of tension. In this problem, use α = 0.10.

a. Make a boxplot to compare breaks across both wool and tension. Color-code the three different tension levels for easier visibility. Within wool A, describe the relationship between tension and breaks. Within wool B, describe the relationship between tension and breaks.

b. Run a two-way ANOVA to compare breaks across both wool and tension, including their interaction. Is the interaction significant? Which main effects are significant?

c. Regardless of your answer to part b, make an interaction plot (color coding might help, but is not required) and interpret it.

d. Run a one-way ANOVA (or a two-sample t-test) to compare breaks across just wool. What is the result of the test, and how do you reconcile that with our previous results?

Answers should be in the form of R code on how to accomplish each part and include the correct statistical explanation for those that require it in the question. Please be as thorough as possible. Thank you so much!!!

Solutions

Expert Solution

R-commands and outputs:

d=warpbreaks
head(d)

  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L

brk=d$breaks
wool=d$wool
tension=d$tension

# a)
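?boxplot   # opens the help page for boxplot()
# Boxes are drawn in the order A.L, B.L, A.M, B.M, A.H, B.H, so repeating each colour twice colours the boxes by tension (3 = green for L, 7 = yellow for M, 2 = red for H).
boxplot(brk~wool+tension, col=c(3,3,7,7,2,2))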

# Variability is high in most wool-by-tension combinations, especially wool A at tension L.
# This variability may be inherent in the data, or it may reflect the small sample size (n = 9 per combination); keep this in mind when reading the plot.
# Within wool A, breaks are highest at tension L and drop markedly at tension M, with little further change from M to H. Wool A is also MORE VARIABLE than wool B, as shown by its longer boxes and whiskers.
# Within wool B, breaks are similar at tensions L and M and drop at tension H.
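The boxplot call above can also be written with one colour per tension level and a legend, which may make the colour coding easier to read. This is only a sketch: the colour names and the objects `tension_cols` and `bp` are illustrative choices, not part of the original solution.

```r
# Illustrative variant: one colour per tension level plus a legend.
tension_cols <- c(L = "green3", M = "gold", H = "red")

# Boxes are drawn with wool varying fastest (A.L, B.L, A.M, B.M, A.H, B.H),
# so each tension colour is repeated once per wool type.
bp <- boxplot(breaks ~ wool + tension, data = warpbreaks,
              col = rep(tension_cols, each = nlevels(warpbreaks$wool)),
              xlab = "wool.tension", ylab = "breaks")
legend("topright", legend = names(tension_cols), fill = tension_cols,
       title = "tension")

# boxplot() invisibly returns its summary statistics; bp$n holds the
# number of observations behind each box.
bp$n
```

Printing `bp$n` is a quick way to confirm how many observations sit behind each box before reading too much into the spread.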

# b)
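## Two-way ANOVA to compare breaks across both wool and tension, including their interaction
## (wool*tension already expands to wool + tension + wool:tension, so listing the main effects as well is redundant but harmless)
ANOVA=aov(breaks~wool+tension+wool*tension,data=d)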
ANOVA

Call:
aov(formula = breaks ~ wool + tension + wool * tension, data = d)
Terms:
                    wool  tension wool:tension Residuals
Sum of Squares   450.667 2034.259     1002.778  5745.111
Deg. of Freedom        1        2            2        48
Residual standard error: 10.94028
Estimated effects may be unbalanced

s=summary(ANOVA)
s

             Df Sum Sq Mean Sq F value   Pr(>F)
wool          1    451   450.7   3.765 0.058213 .
tension       2   2034  1017.1   8.498 0.000693 ***
wool:tension  2   1003   501.4   4.189 0.021044 *
Residuals    48   5745   119.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## alpha = 0.10
## H0I: there is no wool:tension interaction effect.
## H0A: there is no main effect of factor A (wool).
## H0B: there is no main effect of factor B (tension).
## For the interaction (wool:tension), the p-value is 0.021044, which is less than alpha (0.10). Therefore, we reject H0I.
## Conclude that the interaction is significant.
## For main effect A (wool), the p-value is 0.058213, which is less than alpha (0.10). Thus, we reject H0A.
## For main effect B (tension), the p-value is 0.000693, which is less than alpha (0.10). Thus, we reject H0B.
## Conclude that both main effects are significant at alpha = 0.10. Because the interaction is significant, the main effects should be interpreted with caution: the effect of tension depends on the wool type.
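The decisions above can also be read off programmatically by comparing each p-value in the ANOVA table with alpha. The following is a minimal sketch; the object names (`fit`, `tab`, `p_values`, `decision`) are illustrative and not part of the original solution.

```r
alpha <- 0.10

# wool * tension expands to wool + tension + wool:tension
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# summary() of an aov fit is a list; its first element is the ANOVA table
tab <- summary(fit)[[1]]

# Row names are padded with spaces, so trim them before matching
effects <- trimws(rownames(tab)) != "Residuals"
p_values <- setNames(tab[effects, "Pr(>F)"], trimws(rownames(tab))[effects])

# Compare each p-value with alpha
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
data.frame(p_value = round(p_values, 6), decision)
```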

# c)
## An interaction between factors occurs when the change in response from one level of a factor to another is not the same at every level of a second factor. That is, the effect of one factor depends on the level of the other factor. An interaction plot shows this graphically: roughly parallel lines suggest no interaction, while non-parallel lines suggest an interaction.
interaction.plot(tension,wool,brk) #Response=brk


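## From the interaction plot, the lines for the two wool types are not parallel: wool A shows its decrease in mean breaks between low (L) and medium (M) tension, while wool B shows its decrease between medium (M) and high (H) tension. The effect of tension therefore depends on the wool type, which is consistent with the significant interaction in part b.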
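Since part c mentions that colour coding might help, here is a sketch of a colour-coded version of the plot, together with the cell means that the plotted lines connect. The colours and the object names `cell_means` and `step_changes` are illustrative assumptions, not part of the original solution.

```r
# Colour-coded interaction plot: one line per wool type
with(warpbreaks,
     interaction.plot(x.factor = tension, trace.factor = wool,
                      response = breaks,
                      col = c("red", "blue"), lty = 1, lwd = 2,
                      xlab = "tension", ylab = "mean of breaks",
                      trace.label = "wool"))

# Cell means (rows = wool, columns = tension): the points joined by the lines
cell_means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))

# Change in mean breaks between adjacent tension levels, per wool type
# (first column: M minus L, second column: H minus M)
step_changes <- t(apply(cell_means, 1, diff))

round(cell_means, 2)
round(step_changes, 2)
```

If the two rows of `step_changes` differ noticeably, the lines in the plot are not parallel, which is the graphical signature of an interaction.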

# d)
oneANOVA=aov(breaks~wool,data=d)
oneANOVA

Call:
aov(formula = breaks ~ wool, data = d)
Terms:
                   wool Residuals
Sum of Squares  450.667  8782.148
Deg. of Freedom       1        52
Residual standard error: 12.99567
Estimated effects may be unbalanced

ones=summary(oneANOVA)
ones

            Df Sum Sq Mean Sq F value Pr(>F)
wool         1    451   450.7   2.668  0.108
Residuals   52   8782   168.9

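# From the output of the summary command, the p-value for wool is 0.108, which is slightly greater than alpha = 0.10. We fail to reject H0.
## Conclusion: 'wool' is not a significant effect in the one-way ANOVA.
## This differs from the two-way ANOVA, where wool was significant at alpha = 0.10 (p = 0.058213).
## Reconciliation: the sum of squares for wool is the same in both models (450.667), but the one-way model leaves the variation due to tension and the wool:tension interaction in the residuals (2034.259 + 1002.778 + 5745.111 = 8782.148). The residual mean square therefore rises from 119.7 to 168.9, the F value for wool drops from 3.765 to 2.668, and wool no longer reaches significance.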
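The comparison with the two-way model can be checked directly from the sums of squares, and the two-sample t-test mentioned in the question can be run alongside the one-way ANOVA. This is a sketch only; the object names are illustrative, and the pooled-variance test (`var.equal = TRUE`) is assumed because that is the version equivalent to a one-way ANOVA with two groups.

```r
one_way <- summary(aov(breaks ~ wool, data = warpbreaks))[[1]]
two_way <- summary(aov(breaks ~ wool * tension, data = warpbreaks))[[1]]

# The one-way residual SS should equal the two-way SS for tension,
# wool:tension and residuals added together (rows 2 to 4 of the table).
ss_one_resid <- one_way[2, "Sum Sq"]
ss_two_rest <- sum(two_way[2:4, "Sum Sq"])
c(one_way_residual = ss_one_resid, two_way_remainder = ss_two_rest)

# Pooled two-sample t-test on wool: t squared equals the one-way F value,
# and the p-value matches the one-way ANOVA.
tt <- t.test(breaks ~ wool, data = warpbreaks, var.equal = TRUE)
c(t_squared = unname(tt$statistic)^2,
  F_value = one_way[1, "F value"],
  p_value = tt$p.value)
```

Accounting for tension and the interaction removes their variation from the error term, which is why wool looks more significant in the two-way model even though its own sum of squares is unchanged.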

