The computer operations department has a business objective of reducing the amount of time to fully update each subscriber’s set of messages in a special secure email system. A team designed a factorial experiment involving two factors, i.e., Interface and Media, with the goal of understanding the importance of these factors in explaining the variability in Update Time. The factor Interface has three levels, i.e., System I, System II, and System III, while the factor Media has two levels, i.e., Cable and Fiber. Consequently, there are 3x2=6 different factor-level combinations. Subscribers were randomly assigned to one of the 6 factor-level combinations and Update Time was recorded for each observation. The data are available in the worksheet labeled “Problem 5”. Use Excel to report answers when available.
1. Analyze the experimental data. Report the ANOVA Table and comment on the importance of the two factors and their interaction in explaining the variability in Update Time.
2. If there are model terms with p-values > 0.05, drop them from the model and re-analyze the data under your reduced model. Refer to this as your final model.
3. Report and interpret the coefficient of determination, R², for your final model.
4. What level(s) of the important factor(s) appears to produce the shortest Update Time?
Subscriber ID | Update Time | Interface | Media |
1 | 4.56 | System I | Cable |
2 | 4.9 | System I | Cable |
3 | 4.18 | System I | Cable |
4 | 3.56 | System I | Cable |
5 | 4.34 | System I | Cable |
6 | 4.17 | System II | Cable |
7 | 4.28 | System II | Cable |
8 | 4 | System II | Cable |
9 | 3.96 | System II | Cable |
10 | 3.6 | System II | Cable |
11 | 3.53 | System III | Cable |
12 | 3.77 | System III | Cable |
13 | 4.1 | System III | Cable |
14 | 2.87 | System III | Cable |
15 | 3.18 | System III | Cable |
16 | 4.41 | System I | Fiber |
17 | 4.08 | System I | Fiber |
18 | 4.69 | System I | Fiber |
19 | 5.18 | System I | Fiber |
20 | 4.85 | System I | Fiber |
21 | 3.79 | System II | Fiber |
22 | 4.11 | System II | Fiber |
23 | 3.58 | System II | Fiber |
24 | 4.53 | System II | Fiber |
25 | 4.02 | System II | Fiber |
26 | 4.33 | System III | Fiber |
27 | 4 | System III | Fiber |
28 | 4.31 | System III | Fiber |
29 | 3.96 | System III | Fiber |
30 | 3.32 | System III | Fiber |
The analysis below is carried out in R, and each step is explained in detail.
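The R commands in this answer refer to a data frame called `data` that is never constructed in the text. A minimal sketch of one way to build it from the table in the question, assuming the column names `Update.Time`, `Interface`, and `Media` that appear in the model output (the vector name `update_time` is only illustrative):

```r
# Update times in subscriber order (IDs 1-30), copied from the question's table.
update_time <- c(
  4.56, 4.90, 4.18, 3.56, 4.34,   # System I,   Cable
  4.17, 4.28, 4.00, 3.96, 3.60,   # System II,  Cable
  3.53, 3.77, 4.10, 2.87, 3.18,   # System III, Cable
  4.41, 4.08, 4.69, 5.18, 4.85,   # System I,   Fiber
  3.79, 4.11, 3.58, 4.53, 4.02,   # System II,  Fiber
  4.33, 4.00, 4.31, 3.96, 3.32    # System III, Fiber
)

data <- data.frame(
  Update.Time = update_time,
  Interface = factor(rep(rep(c("System I", "System II", "System III"),
                             each = 5), times = 2)),
  Media = factor(rep(c("Cable", "Fiber"), each = 15))
)

# Balanced design check: each Interface x Media cell should hold 5 observations.
cell_counts <- table(data$Interface, data$Media)
print(cell_counts)
```

With the default factor ordering, System I and Cable become the baseline levels, which matches the coefficient names in the output.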
1. Analyze the experimental data. Report the ANOVA Table and comment on the importance of the two factors and their interaction in explaining the variability in Update Time.
Here we fit the full two-factor model with interaction:
Update Time ~ Interface + Media + Interface*Media
R code:
> model4 = lm(Update.Time~Interface+Media+Interface*Media,data=data)
> summary(model4)
Output:
Call:
lm(formula = Update.Time ~ Interface + Media + Interface * Media, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.7480 -0.2280 0.0240 0.2715 0.6100
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.3080 0.1844 23.368 < 2e-16 ***
InterfaceSystem II -0.3060 0.2607 -1.174 0.25204
InterfaceSystem III -0.8180 0.2607 -3.137 0.00447 **
MediaFiber 0.3340 0.2607 1.281 0.21241
InterfaceSystem II:MediaFiber -0.3300 0.3687 -0.895 0.37967
InterfaceSystem III:MediaFiber 0.1600 0.3687 0.434 0.66821
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4122 on 24 degrees of freedom
Multiple R-squared: 0.4744, Adjusted R-squared: 0.3649
F-statistic: 4.333 on 5 and 24 DF, p-value: 0.005963
The overall model is significant (F-test p-value = 0.006 < 0.05), and R-squared = 0.4744, so it explains 47.44% of the variability in Update Time.
> anova(model4)
Analysis of Variance Table
Response: Update.Time
Df Sum Sq Mean Sq F value Pr(>F)
Interface 2 2.7926 1.39629 8.2165 0.001913 **
Media 1 0.5769 0.57685 3.3945 0.077802 .
Interface:Media 2 0.3122 0.15608 0.9185 0.412704
Residuals 24 4.0785 0.16994
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
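Interface is significant (p-value = 0.0019 < 0.05), so it is important in explaining the variability in Update Time. The interaction term (p-value = 0.413) and Media (p-value = 0.078) both have p-values > 0.05 and are not significant at the 5% level.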
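Each F value in the ANOVA table is the term's mean square divided by the residual mean square, and the p-value is the upper tail of the corresponding F distribution. A short check using the figures printed in the table (variable names are illustrative), which should reproduce the table up to rounding:

```r
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss <- c(Interface = 2.7926, Media = 0.5769, Interaction = 0.3122)
df <- c(Interface = 2, Media = 1, Interaction = 2)
ss_resid <- 4.0785
df_resid <- 24

ms <- ss / df                     # mean square for each term
ms_resid <- ss_resid / df_resid   # residual mean square (MSE)
f_value <- ms / ms_resid          # F statistic for each term

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df, df_resid, lower.tail = FALSE)

print(round(cbind(F = f_value, p = p_value), 4))
```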
2. If there are model terms with p-values > 0.05, drop them from the model and re-analyze the data under your reduced model. Refer to this as your final model.
Here we drop the interaction term and Media and re-run the model with Interface only:
> model3 = lm(Update.Time~Interface,data=data)
> summary(model3)
Call:
lm(formula = Update.Time ~ Interface, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.9150 -0.2747 0.0245 0.2727 0.7050
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.4750 0.1356 32.992 < 2e-16 ***
InterfaceSystem II -0.4710 0.1918 -2.455 0.020798 *
InterfaceSystem III -0.7380 0.1918 -3.847 0.000662 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4289 on 27 degrees of freedom
Multiple R-squared: 0.3599, Adjusted R-squared: 0.3124
F-statistic: 7.589 on 2 and 27 DF, p-value: 0.002425
The reduced model is significant (p-value = 0.0024 < 0.05). Its R-squared is lower than that of the full model, as is always the case when terms are removed: it explains 35.99% of the variability.
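Dropping Media and the interaction together can also be checked with a single partial F-test that compares the reduced model with the full one (the comparison `anova(model3, model4)` would report in R). A rough sketch from the sums of squares printed in the ANOVA table, relying on the design being balanced so that the sums of squares of the dropped terms simply add; the variable names are illustrative:

```r
# Sum of squares and degrees of freedom removed by dropping
# Media and Interface:Media from the full model.
ss_dropped <- 0.5769 + 0.3122
df_dropped <- 1 + 2

# Residual mean square of the full model (24 residual df).
mse_full <- 4.0785 / 24

f_partial <- (ss_dropped / df_dropped) / mse_full
p_partial <- pf(f_partial, df_dropped, 24, lower.tail = FALSE)
f_crit <- qf(0.95, df_dropped, 24)   # 5% critical value

cat("Partial F:", round(f_partial, 3),
    " critical value:", round(f_crit, 3),
    " p-value:", round(p_partial, 3), "\n")
```

The partial F statistic falls below the 5% critical value, so the two dropped terms do not jointly improve the fit at that level.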
3. Report and interpret the coefficient of determination, R², for your final model.
For comparison, we can also fit two more models:
> model1 = lm(Update.Time~Media+Interface,data = data)
> summary(model1)
Call:
lm(formula = Update.Time ~ Media + Interface, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.77633 -0.24992 0.08033 0.28758 0.56633
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.3363 0.1501 28.899 < 2e-16 ***
MediaFiber 0.2773 0.1501 1.848 0.075975 .
InterfaceSystem II -0.4710 0.1838 -2.563 0.016517 *
InterfaceSystem III -0.7380 0.1838 -4.016 0.000449 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4109 on 26 degrees of freedom
Multiple R-squared: 0.4342, Adjusted R-squared: 0.3689
F-statistic: 6.651 on 3 and 26 DF, p-value: 0.001754
This model is also significant overall (p-value = 0.0018 < 0.05) and explains 43.42% of the variability, although Media itself is not significant at the 5% level (p-value = 0.076).
> model2 = lm(Update.Time~Media, data = data)
> summary(model2)
Call:
lm(formula = Update.Time ~ Media, data = data)
Residuals:
Min 1Q Median 3Q Max
-1.06333 -0.31267 0.04667 0.30117 0.96933
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9333 0.1308 30.08 <2e-16 ***
MediaFiber 0.2773 0.1849 1.50 0.145
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5065 on 28 degrees of freedom
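Multiple R-squared: 0.07434, Adjusted R-squared: 0.04128
F-statistic: 2.249 on 1 and 28 DF, p-value: 0.1449
Here, the model itself is not significant (p-value = 0.145 > 0.05).
All three models that contain Interface are significant. The full model (model4) has the highest R-squared, but R-squared can only increase as terms are added, so a higher value is not by itself a reason to keep non-significant terms. Following part 2, the final model is model3, i.e., Update Time ~ Interface, with R-squared = 0.3599: Interface alone explains about 36% of the variability in Update Time, and the remaining 64% is due to other sources.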
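R-squared can never decrease when a term is added to a nested model, so the largest model always has the highest value; adjusted R-squared penalizes the extra terms. A small sketch that recomputes both from the sums of squares in the ANOVA table (the design is balanced, so the sequential sums of squares add; names are illustrative):

```r
n <- 30
ss_total <- 2.7926 + 0.5769 + 0.3122 + 4.0785   # total sum of squares

# Explained sum of squares and number of estimated slopes for each model.
ss_model <- c(interface_only = 2.7926,
              additive       = 2.7926 + 0.5769,
              full           = 2.7926 + 0.5769 + 0.3122)
p <- c(interface_only = 2, additive = 3, full = 5)

r2 <- ss_model / ss_total
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(cbind(R2 = r2, adj_R2 = adj_r2), 4))
```

The values should agree with the R-squared and adjusted R-squared lines of the three `summary()` outputs up to rounding.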
4. What level(s) of the important factor(s) appears to produce the shortest Update Time?
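The only important factor is Interface, and System III produces the shortest Update Time: its estimated mean is 4.4750 - 0.7380 = 3.737, compared with 4.004 for System II and 4.475 for System I.
Media is not significant at the 5% level, although the shortest observed cell mean (3.49) occurs for System III with Cable.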
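The estimated mean Update Time for each interface can be read off the Interface-only model by adding each coefficient to the intercept. A minimal sketch using the coefficients printed in the `summary(model3)` output (variable names are illustrative):

```r
# Coefficients of the Interface-only model; System I is the baseline level.
intercept <- 4.4750
effects <- c("System I" = 0, "System II" = -0.4710, "System III" = -0.7380)

# Estimated mean Update Time for each interface.
mean_update_time <- intercept + effects
print(mean_update_time)

# Interface with the smallest estimated mean.
best <- names(which.min(mean_update_time))
cat("Shortest estimated mean Update Time:", best, "\n")
```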