A research facility is evaluating five different processes for extracting starch from corn flour.
The percent yield of starch is the response of interest. The data are shown below:
Process Starch_Yield
1 75.9
1 72.3
1 64.7
1 68.7
1 73.4
1 77.7
2 77.7
2 74.9
2 78.3
2 77.3
3 63.5
3 60.4
3 65.3
3 62.6
3 59.3
4 61
4 64
4 55.9
5 76.3
5 75.3
5 71.5
5 81
a) Perform an ANOVA test and report the F value to 2 decimal places.
b) Are the processes significantly different?
c) Which process(es) would you pick?
Solve using RStudio and post the code.
Solution:
Create a data frame df1 with the given data.
Use the aov function in R to fit the one-way ANOVA model.
Use the summary function to get the ANOVA table.
Use the TukeyHSD function to find which pairs of group means are different.
Rcode:
# Read the data into a data frame
df1 <- read.table(header = TRUE, text = "
Process Starch_Yield
1 75.9
1 72.3
1 64.7
1 68.7
1 73.4
1 77.7
2 77.7
2 74.9
2 78.3
2 77.3
3 63.5
3 60.4
3 65.3
3 62.6
3 59.3
4 61
4 64
4 55.9
5 76.3
5 75.3
5 71.5
5 81
"
)
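df1  # print the data to check that it was read correctly
# Process is a group label, not a number, so convert it to a factor
df1$Process <- factor(df1$Process,
                      levels = c("1", "2", "3", "4", "5"))
summary(df1)
# Fit the one-way ANOVA and print the ANOVA table
res.aov <- aov(Starch_Yield ~ Process, data = df1)
summary(res.aov)
# Tukey HSD pairwise comparisons of the process means
TukeyHSD(res.aov)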
Output:
            Df Sum Sq Mean Sq F value   Pr(>F)
Process      4  955.4  238.85   18.21 5.65e-06 ***
Residuals   17  223.0   13.12
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(res.aov)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Starch_Yield ~ Process, data = df1)
$Process
          diff        lwr       upr     p adj
2-1   4.933333  -2.179518 12.046185 0.2605731
3-1  -9.896667 -16.569113 -3.224220 0.0024770
4-1 -11.816667 -19.608405 -4.024928 0.0020080
5-1   3.908333  -3.204518 11.021185 0.4755964
3-2 -14.830000 -22.221892 -7.438108 0.0001009
4-2 -16.750000 -25.166040 -8.333960 0.0001109
5-2  -1.025000  -8.816739  6.766739 0.9940688
4-3  -1.920000  -9.967273  6.127273 0.9474926
5-3  13.805000   6.413108 21.196892 0.0002301
5-4  15.725000   7.308960 24.141040 0.0002289
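The F value in the ANOVA table above can also be reproduced by hand from the between-group and within-group sums of squares. The following is a minimal sketch in base R; the variable names (yield, process, ss_between, ss_within) are chosen here for illustration and are not part of the original solution.

```r
# Data from the question: processes 1 to 5 have 6, 4, 5, 3 and 4 observations
yield <- c(75.9, 72.3, 64.7, 68.7, 73.4, 77.7,
           77.7, 74.9, 78.3, 77.3,
           63.5, 60.4, 65.3, 62.6, 59.3,
           61, 64, 55.9,
           76.3, 75.3, 71.5, 81)
process <- factor(rep(1:5, times = c(6, 4, 5, 3, 4)))

# Group mean repeated for every observation, and the overall mean
group_mean <- ave(yield, process)
grand_mean <- mean(yield)

# Between-group and within-group sums of squares
ss_between <- sum((group_mean - grand_mean)^2)
ss_within  <- sum((yield - group_mean)^2)

# Degrees of freedom: (groups - 1) and (observations - groups)
df_between <- nlevels(process) - 1
df_within  <- length(yield) - nlevels(process)

# F is the ratio of the two mean squares; the p-value is the upper tail of F
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within, F = f_value), 2)
```

The rounded values should agree with the Sum Sq and F value columns of the ANOVA table (955.4, 223.0 and 18.21).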
a) Perform an ANOVA test and report the F value to 2 decimal places
H0: all 5 process means are equal
Ha: at least one process mean is different from the others
alpha = 0.05
From the ANOVA table:
F = 18.21 (with 4 and 17 degrees of freedom)
p-value = 5.65e-06
b) Are the processes significantly different?
Since p < 0.05, reject H0 in favour of Ha.
There is sufficient statistical evidence at the 5% level of significance to conclude that the mean starch yields of the processes are significantly different.
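Rather than reading F and the p-value off the printed table, they can be pulled out of the fitted model. This is a sketch only: it rebuilds the data with data.frame instead of read.table, and the names anova_tab, f_value, p_value and reject_h0 are illustrative.

```r
# Rebuild the data set from the question
df1 <- data.frame(
  Process = factor(rep(1:5, times = c(6, 4, 5, 3, 4))),
  Starch_Yield = c(75.9, 72.3, 64.7, 68.7, 73.4, 77.7,
                   77.7, 74.9, 78.3, 77.3,
                   63.5, 60.4, 65.3, 62.6, 59.3,
                   61, 64, 55.9,
                   76.3, 75.3, 71.5, 81)
)

res.aov <- aov(Starch_Yield ~ Process, data = df1)

# summary() returns a list; its first element is the ANOVA table
anova_tab <- summary(res.aov)[[1]]

# Row 1 is the Process term, row 2 is the residuals
f_value <- round(anova_tab[["F value"]][1], 2)
p_value <- anova_tab[["Pr(>F)"]][1]

# Decision at the 5% level of significance
alpha <- 0.05
reject_h0 <- p_value < alpha

cat("F =", f_value, " p =", signif(p_value, 3), " reject H0:", reject_h0, "\n")
```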
c) Which process(es) would you pick?
In the Tukey comparisons, a pair of processes is significantly different when its adjusted p-value (p adj) is below 0.05:
$Process
          diff        lwr       upr     p adj
2-1   4.933333  -2.179518 12.046185 0.2605731
3-1  -9.896667 -16.569113 -3.224220 0.0024770
4-1 -11.816667 -19.608405 -4.024928 0.0020080
5-1   3.908333  -3.204518 11.021185 0.4755964
3-2 -14.830000 -22.221892 -7.438108 0.0001009
4-2 -16.750000 -25.166040 -8.333960 0.0001109
5-2  -1.025000  -8.816739  6.766739 0.9940688
4-3  -1.920000  -9.967273  6.127273 0.9474926
5-3  13.805000   6.413108 21.196892 0.0002301
5-4  15.725000   7.308960 24.141040 0.0002289
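The pairs 3-1, 4-1, 3-2, 4-2, 5-3 and 5-4 are significantly different (p adj < 0.05).
Processes 1, 2 and 5 do not differ significantly from one another, and neither do processes 3 and 4.
The mean yields for processes 1 to 5 are 72.12, 77.05, 62.22, 60.30 and 76.03, so each of processes 1, 2 and 5 gives a significantly higher yield than processes 3 and 4.
Pick process 1, 2 or 5. Process 2 has the highest mean yield, but its advantage over processes 1 and 5 is not statistically significant.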
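The choice of process can be backed up by ranking the group means and listing the pairs that Tukey's test flags. This is a minimal sketch under the same assumptions as the solution (one-way ANOVA, alpha = 0.05); the names group_means and sig_pairs are illustrative.

```r
# Rebuild the data set from the question
df1 <- data.frame(
  Process = factor(rep(1:5, times = c(6, 4, 5, 3, 4))),
  Starch_Yield = c(75.9, 72.3, 64.7, 68.7, 73.4, 77.7,
                   77.7, 74.9, 78.3, 77.3,
                   63.5, 60.4, 65.3, 62.6, 59.3,
                   61, 64, 55.9,
                   76.3, 75.3, 71.5, 81)
)

# Mean yield per process, sorted from highest to lowest
group_means <- sapply(split(df1$Starch_Yield, df1$Process), mean)
group_means <- sort(group_means, decreasing = TRUE)
round(group_means, 2)

# Tukey HSD comparisons; keep the pairs with adjusted p-value below 0.05
res.aov <- aov(Starch_Yield ~ Process, data = df1)
tukey_tab <- TukeyHSD(res.aov)$Process
sig_pairs <- rownames(tukey_tab)[tukey_tab[, "p adj"] < 0.05]
sig_pairs
```

Every pair in sig_pairs compares one of processes 1, 2 or 5 with process 3 or 4, which is why the three higher-yield processes are treated as statistically equivalent candidates.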