In: Statistics and Probability
Researchers compared antibody responses in normal and alloxan diabetic mice. Three groups of mice were studied: normal, alloxan diabetic, and alloxan diabetic treated with insulin. Several comparisons are of interest:
- Does the antibody response of alloxan diabetic mice differ from the antibody response of normal mice?
- Does treating alloxan diabetic mice with insulin affect their antibody response?
- Does the antibody response of alloxan diabetic mice treated with insulin differ from the antibody response of normal mice?
Table displays the amounts of nitrogen-bound bovine serum ablumin produced by the mice:
Normal | 156 | 282 | 197 | 297 | 116 | 127 | 119 | 29 | 253 | 122 | 349 | 110 | 143 | 64 | 26 | 86 | 122 | 455 | 655 | 14 |
Alloxan | 391 | 46 | 469 | 86 | 174 | 133 | 13 | 499 | 168 | 62 | 127 | 276 | 176 | 146 | 108 | 276 | 50 | 73 | ||
Alloxan + Insulin | 82 | 100 | 98 | 150 | 243 | 68 | 228 | 131 | 73 | 18 | 20 | 100 | 72 | 133 | 465 | 40 | 46 | 34 | 44 |
a. Using the above data, investigate the ANOVA assumptions of normality and homoscedasticity. Do these assumptions seem plausible for these data? Why or why not?
b. Now transform the data by taking the square root of each measurement. Using the transformed data, investigate the ANOVA assumptions of normality and homoscedasticity. Do these assumptions seem plausible for the transformed data? Why or why not?
c. Using the transformed data, construct an ANOVA table. State the null and alternative hypotseses tested by this method. Should the null hypothesis be rejected at the alpha = 0.05 level?
d. Using the transformed data, construct suitable contrasts for investigating the research questions framed above. State appropriate null and alternative hypotheses and test them using the method of Bonferroni t-tests. At what significance level should these hypotheses be tested in order to maintain a family rate of Type I error equal to 5%? Which null hypotheses should be rejected?
Note: For thisproblem, you can choose a method to construct the ANOVA table. Please include appropriate labels in your plots!
(please provide solution with appropriate R code)
We first make our data for fitting the model then obtain the residuals and then check the normality and homoscadasticity assumption through some plots and tests.
Here is the R code for creating the data set useful for doing anova and getting the model summary
Normal=c(156, 282, 197, 297, 116 ,127 ,119, 29, 253, 122 ,349, 110, 143 ,64, 26, 86, 122, 455 ,655, 14)
Alloxan=c( 391, 46, 469, 86, 174 ,133 ,13, 499 ,168 ,62, 127, 276, 176, 146, 108, 276, 50, 73)
AlloxanInsulin=c(82, 100, 98 ,150, 243, 68, 228, 131, 73, 18,20, 100, 72, 133 ,465, 40, 46, 34,44)
response=c(156, 282, 197, 297, 116 ,127 ,119, 29, 253, 122 ,349, 110, 143 ,64, 26, 86, 122, 455 ,655, 14,391, 46, 469, 86, 174 ,133 ,13, 499 ,168 ,62, 127, 276, 176, 146, 108, 276, 50, 73,82, 100, 98 ,150, 243, 68, 228, 131, 73, 18,20, 100, 72, 133 ,465, 40, 46, 34,44)
group=c(rep(1,20),rep(2,18),rep(3,19))
group=as.factor(group)
anovamodel=lm(response~group) # fitting the model
anovamodel # seeing The moel in console
summary(anovamodel) # summary of your fitted model
anova(anovamodel) # Anova table
residuals(anovamodel) # obtaining the residuals
Here are the results
Call:
lm(formula = response ~ group)
Coefficients:
(Intercept) group2 group3
186.100 -4.267 -73.205
Have a look at the list of the coefficients.
I The intercept term is the general effect µ:
I The other terms signifies additional effects ?i.
I But there are only two estimates for ?2 and ?3: Why?
I Because of the identifiability restriction sum( ni?i) = 0:
Lets see the summarry of the model:
summary(anovamodel)
Call:
lm(formula = response ~ group)
Residuals:
Min 1Q Median 3Q Max
-172.10 -76.10 -40.89 37.11 468.90
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 186.100 30.994 6.004 1.68e-07 ***
group2 -4.267 45.033 -0.095 0.925
group3 -73.205 44.405 -1.649 0.105
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 138.6 on 54 degrees of freedom
Multiple R-squared: 0.05841, Adjusted R-squared: 0.02354
F-statistic: 1.675 on 2 and 54 DF, p-value: 0.1969
Lets see the anova table:
anova(anovamodel)
Analysis of Variance Table
Response: response
Df Sum Sq Mean Sq F value Pr(>F)
group 2 64357 32178 1.6749 0.1969
Residuals 54 1037470 19212
Now that was the code and output now lets check the normality and homoscadasticity assumption
We draw the normal probability plot of the residuals to see wheather the points are close to the line or not then formally perform the Shapiro-wilks test to check the normality
Here is the code for that
qqnorm(residuals(anovamodel))
qqline(residuals(anovamodel),col="red")
shapiro.test(residuals(anovamodel))
And gere is the output.
We see that the points are not close to the line and get the idea that the data is not nearly normal
Then the result of the test
> shapiro.test(residuals(anovamodel))
Shapiro-Wilk normality test
data: residuals(anovamodel)
W = 0.85533, p-value = 6.999e-06
Here also the p value is less than 0.05 so the data is not atall normal
Now we check the homoscadasticity by Bruce pegan test
Here is the R code
library(lmtest)
bptest(anovamodel)
Result
bptest(anovamodel)
studentized Breusch-Pagan test
data: anovamodel
BP = 1.3698, df = 2, p-value = 0.5041
So the homoscadasticity assumption is satisfied.
B) Transforming the data and performing the tests again
Here is the code
transformedresponce=response^0.5
transformedresponce
group=c(rep(1,20),rep(2,18),rep(3,19))
group=as.factor(group)
anovamodel2=lm(transformedresponce
~group)
anovamodel2
residuals(anovamodel2)
summary(anovamodel2)
anova(anovamodel2)
qqnorm(residuals(anovamodel2))
qqline(residuals(anovamodel2),col="red")
shapiro.test(residuals(anovamodel2))
library(lmtest)
bptest(anovamodel2)
REsults
The points are really close so the transformed data set is nearly normal.
and the test results show that the p value is greater than 0.05 so we dont reject the null hypothesis that the data is normal.
shapiro.test(residuals(anovamodel2))
Shapiro-Wilk normality test
data:
residuals(anovamodel2)
W = 0.96137, p-value = 0.06599
And the result of bruce pegan test also shows tht the p value is higher than 0.05 so the data also have homoscadastic assumption.
bptest(anovamodel2)
studentized Breusch-Pagan test
data:
anovamodel2
BP = 1.1337, df = 2, p-value = 0.5673
c) The anova table for the transformed data is
Analysis of Variance Table
Response: transformedresponce
Df Sum Sq Mean Sq F value Pr(>F)
group 2 94.74 47.368 1.8815 0.1622
Residuals 54 1359.47 25.175
as the p value is greater than 0.05 so we dont reject the null hypothesis that effect of anibody responce for all three group are the same.
D) as all the effects are same no question of pairwise comparision is there but for the question we here test
H0:alphai=alphaj ag H1: alphai=!alphaj
code for that
pairwise.t.test(anovamodel2) # this will give p value for all pairs of test.
Well that was the discussion finally i provide you the whole code here
"""""
response=c(156, 282, 197, 297, 116 ,127 ,119, 29, 253, 122 ,349, 110, 143 ,64, 26, 86, 122, 455 ,655, 14,391, 46, 469, 86, 174 ,133 ,13, 499 ,168 ,62, 127, 276, 176, 146, 108, 276, 50, 73,82, 100, 98 ,150, 243, 68, 228, 131, 73, 18,20, 100, 72, 133 ,465, 40, 46, 34,44)
group=c(rep(1,20),rep(2,18),rep(3,19))
group=as.factor(group)
anovamodel=lm(response~group)
anovamodel
residuals(anovamodel)
summary(anovamodel)
anova(anovamodel)
qqnorm(residuals(anovamodel))
qqline(residuals(anovamodel),col="red")
shapiro.test(residuals(anovamodel))
library(lmtest)
bptest(anovamodel)
transformedresponce=response^0.5
transformedresponce
group=c(rep(1,20),rep(2,18),rep(3,19))
group=as.factor(group)
anovamodel2=lm(transformedresponce
~group)
anovamodel2
residuals(anovamodel2)
summary(anovamodel2)
anova(anovamodel2)
qqnorm(residuals(anovamodel2))
qqline(residuals(anovamodel2),col="red")
shapiro.test(residuals(anovamodel2))
library(lmtest)
bptest(anovamodel2)
""""'