An orange juice processing plant has three production lines. The production lines fill juice into 400 ml packages. The production manager of the plant would like to know whether the production lines are all filling the packages with the same amount.
A sample of 25 packages is taken from each production line and the data are saved in juice.csv. The production manager would like to know:
The production manager already attempted to answer their question by applying a one-way ANOVA to the data. The assumption of normality failed for that test, and transforming the response variable did not correct it. Use another method to try to answer the production manager's question. The production manager is only interested in whether a difference exists, not where the differences are.
Test at the 5% significance level.
milliliters | line |
402.68 | A |
393.35 | A |
401.24 | A |
400.14 | A |
375.38 | A |
411.12 | A |
391.55 | A |
407.26 | A |
395.33 | A |
380.32 | A |
406.84 | A |
383.57 | A |
453.31 | A |
413.78 | A |
397.03 | A |
402.45 | A |
421.87 | A |
401.18 | A |
394.97 | A |
408.89 | A |
400.05 | A |
434.73 | A |
429.74 | A |
393.9 | A |
394.56 | A |
396.86 | B |
401.73 | B |
409.81 | B |
398.55 | B |
400.23 | B |
391.49 | B |
385.4 | B |
395.28 | B |
429.74 | B |
396.3 | B |
405.9 | B |
387.05 | B |
395.24 | B |
371.94 | B |
396.97 | B |
382.35 | B |
402.42 | B |
391.75 | B |
399.24 | B |
400.56 | B |
404.01 | B |
398.32 | B |
377.59 | B |
416.86 | B |
395.4 | B |
389.53 | C |
400.46 | C |
376.49 | C |
392.31 | C |
390.51 | C |
377.89 | C |
394.94 | C |
391.06 | C |
365.18 | C |
381.93 | C |
412.21 | C |
373.84 | C |
381.12 | C |
387 | C |
390.56 | C |
398.56 | C |
392.72 | C |
379.02 | C |
396.9 | C |
399.23 | C |
399.31 | C |
423.61 | C |
396.35 | C |
383.29 | C |
396.33 | C |
Code the production lines as A = 1, B = 2, C = 3.
The normality assumption fails, i.e. the data are not normally distributed, so we use the Kruskal-Wallis test, which is the nonparametric counterpart of one-way ANOVA.
First, all the data needs to be put together in one column as shown below:
Line | millilitres |
1 | 402.68 |
1 | 393.35 |
1 | 401.24 |
1 | 400.14 |
1 | 375.38 |
1 | 411.12 |
1 | 391.55 |
1 | 407.26 |
1 | 395.33 |
1 | 380.32 |
1 | 406.84 |
1 | 383.57 |
1 | 453.31 |
1 | 413.78 |
1 | 397.03 |
1 | 402.45 |
1 | 421.87 |
1 | 401.18 |
1 | 394.97 |
1 | 408.89 |
1 | 400.05 |
1 | 434.73 |
1 | 429.74 |
1 | 393.9 |
1 | 394.56 |
2 | 396.86 |
2 | 401.73 |
2 | 409.81 |
2 | 398.55 |
2 | 400.23 |
2 | 391.49 |
2 | 385.4 |
2 | 395.28 |
2 | 429.74 |
2 | 396.3 |
2 | 405.9 |
2 | 387.05 |
2 | 395.24 |
2 | 371.94 |
2 | 396.97 |
2 | 382.35 |
2 | 402.42 |
2 | 391.75 |
2 | 399.24 |
2 | 400.56 |
2 | 404.01 |
2 | 398.32 |
2 | 377.59 |
2 | 416.86 |
2 | 395.4 |
3 | 389.53 |
3 | 400.46 |
3 | 376.49 |
3 | 392.31 |
3 | 390.51 |
3 | 377.89 |
3 | 394.94 |
3 | 391.06 |
3 | 365.18 |
3 | 381.93 |
3 | 412.21 |
3 | 373.84 |
3 | 381.12 |
3 | 387 |
3 | 390.56 |
3 | 398.56 |
3 | 392.72 |
3 | 379.02 |
3 | 396.9 |
3 | 399.23 |
3 | 399.31 |
3 | 423.61 |
3 | 396.35 |
3 | 383.29 |
3 | 396.33 |
Now, the data need to be sorted in ascending order by value (keeping track of which sample each value belongs to). The results are shown below:
line | millilitres(In Asc. Order) |
3 | 365.18 |
2 | 371.94 |
3 | 373.84 |
1 | 375.38 |
3 | 376.49 |
2 | 377.59 |
3 | 377.89 |
3 | 379.02 |
1 | 380.32 |
3 | 381.12 |
3 | 381.93 |
2 | 382.35 |
3 | 383.29 |
1 | 383.57 |
2 | 385.4 |
3 | 387 |
2 | 387.05 |
3 | 389.53 |
3 | 390.51 |
3 | 390.56 |
3 | 391.06 |
2 | 391.49 |
1 | 391.55 |
2 | 391.75 |
3 | 392.31 |
3 | 392.72 |
1 | 393.35 |
1 | 393.9 |
1 | 394.56 |
3 | 394.94 |
1 | 394.97 |
2 | 395.24 |
2 | 395.28 |
1 | 395.33 |
2 | 395.4 |
2 | 396.3 |
3 | 396.33 |
3 | 396.35 |
2 | 396.86 |
3 | 396.9 |
2 | 396.97 |
1 | 397.03 |
2 | 398.32 |
2 | 398.55 |
3 | 398.56 |
3 | 399.23 |
2 | 399.24 |
3 | 399.31 |
1 | 400.05 |
1 | 400.14 |
2 | 400.23 |
3 | 400.46 |
2 | 400.56 |
1 | 401.18 |
1 | 401.24 |
2 | 401.73 |
2 | 402.42 |
1 | 402.45 |
1 | 402.68 |
2 | 404.01 |
2 | 405.9 |
1 | 406.84 |
1 | 407.26 |
1 | 408.89 |
2 | 409.81 |
1 | 411.12 |
3 | 412.21 |
1 | 413.78 |
2 | 416.86 |
1 | 421.87 |
3 | 423.61 |
1 | 429.74 |
2 | 429.74 |
1 | 434.73 |
1 | 453.31 |
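The pooling and sorting steps above can be sketched in a few lines of Python. This is only an illustration: it uses the first two readings from each line rather than all 75, and the names `samples`, `pooled` and `pooled_sorted` are made up for the example.

```python
# First two readings from each line, taken from the data above (illustration only).
samples = {
    1: [402.68, 393.35],
    2: [396.86, 401.73],
    3: [389.53, 400.46],
}

# Pool every reading into one list of (line, millilitres) pairs.
pooled = [(line, ml) for line, values in samples.items() for ml in values]

# Sort ascending by volume; the line label travels with each value.
pooled_sorted = sorted(pooled, key=lambda pair: pair[1])

for line, ml in pooled_sorted:
    print(line, ml)
```

Keeping the label attached to each value is what later allows the ranks to be summed per line.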
Next, assign ranks to the values, which are already in ascending order. When values are tied, give each of them the average of the ranks they would otherwise occupy (e.g. if two values share first place, assign rank 1.5 to both instead of ranks 1 and 2). The following ranks are obtained:
line | millilitres(In Asc. Order) | Rank | Rank (Adjusted for ties) |
3 | 365.18 | 1 | 1 |
2 | 371.94 | 2 | 2 |
3 | 373.84 | 3 | 3 |
1 | 375.38 | 4 | 4 |
3 | 376.49 | 5 | 5 |
2 | 377.59 | 6 | 6 |
3 | 377.89 | 7 | 7 |
3 | 379.02 | 8 | 8 |
1 | 380.32 | 9 | 9 |
3 | 381.12 | 10 | 10 |
3 | 381.93 | 11 | 11 |
2 | 382.35 | 12 | 12 |
3 | 383.29 | 13 | 13 |
1 | 383.57 | 14 | 14 |
2 | 385.4 | 15 | 15 |
3 | 387 | 16 | 16 |
2 | 387.05 | 17 | 17 |
3 | 389.53 | 18 | 18 |
3 | 390.51 | 19 | 19 |
3 | 390.56 | 20 | 20 |
3 | 391.06 | 21 | 21 |
2 | 391.49 | 22 | 22 |
1 | 391.55 | 23 | 23 |
2 | 391.75 | 24 | 24 |
3 | 392.31 | 25 | 25 |
3 | 392.72 | 26 | 26 |
1 | 393.35 | 27 | 27 |
1 | 393.9 | 28 | 28 |
1 | 394.56 | 29 | 29 |
3 | 394.94 | 30 | 30 |
1 | 394.97 | 31 | 31 |
2 | 395.24 | 32 | 32 |
2 | 395.28 | 33 | 33 |
1 | 395.33 | 34 | 34 |
2 | 395.4 | 35 | 35 |
2 | 396.3 | 36 | 36 |
3 | 396.33 | 37 | 37 |
3 | 396.35 | 38 | 38 |
2 | 396.86 | 39 | 39 |
3 | 396.9 | 40 | 40 |
2 | 396.97 | 41 | 41 |
1 | 397.03 | 42 | 42 |
2 | 398.32 | 43 | 43 |
2 | 398.55 | 44 | 44 |
3 | 398.56 | 45 | 45 |
3 | 399.23 | 46 | 46 |
2 | 399.24 | 47 | 47 |
3 | 399.31 | 48 | 48 |
1 | 400.05 | 49 | 49 |
1 | 400.14 | 50 | 50 |
2 | 400.23 | 51 | 51 |
3 | 400.46 | 52 | 52 |
2 | 400.56 | 53 | 53 |
1 | 401.18 | 54 | 54 |
1 | 401.24 | 55 | 55 |
2 | 401.73 | 56 | 56 |
2 | 402.42 | 57 | 57 |
1 | 402.45 | 58 | 58 |
1 | 402.68 | 59 | 59 |
2 | 404.01 | 60 | 60 |
2 | 405.9 | 61 | 61 |
1 | 406.84 | 62 | 62 |
1 | 407.26 | 63 | 63 |
1 | 408.89 | 64 | 64 |
2 | 409.81 | 65 | 65 |
1 | 411.12 | 66 | 66 |
3 | 412.21 | 67 | 67 |
1 | 413.78 | 68 | 68 |
2 | 416.86 | 69 | 69 |
1 | 421.87 | 70 | 70 |
3 | 423.61 | 71 | 71 |
1 | 429.74 | 72 | 72.5 |
2 | 429.74 | 73 | 72.5 |
1 | 434.73 | 74 | 74 |
1 | 453.31 | 75 | 75 |
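The tie-averaging rule used in the table can be sketched as a small Python function. This is a minimal illustration, not the method any particular package uses; the function name `average_ranks` is hypothetical, and the example ranks only the five largest readings among themselves.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) occupy ranks start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# The five largest readings from the table above, ranked among themselves.
top_five = [423.61, 429.74, 429.74, 434.73, 453.31]
top_five_ranks = average_ranks(top_five)
print(top_five_ranks)  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

In the full table these five values sit after 70 smaller ones, so adding 70 to each result gives the tabulated ranks 71, 72.5, 72.5, 74 and 75.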
To compute the sum of ranks for each sample, it is easier to reorganize the above table by sample. The following is obtained:
line | millilitres | Rank (Adjusted for ties) |
1 | 375.38 | 4 |
1 | 380.32 | 9 |
1 | 383.57 | 14 |
1 | 391.55 | 23 |
1 | 393.35 | 27 |
1 | 393.9 | 28 |
1 | 394.56 | 29 |
1 | 394.97 | 31 |
1 | 395.33 | 34 |
1 | 397.03 | 42 |
1 | 400.05 | 49 |
1 | 400.14 | 50 |
1 | 401.18 | 54 |
1 | 401.24 | 55 |
1 | 402.45 | 58 |
1 | 402.68 | 59 |
1 | 406.84 | 62 |
1 | 407.26 | 63 |
1 | 408.89 | 64 |
1 | 411.12 | 66 |
1 | 413.78 | 68 |
1 | 421.87 | 70 |
1 | 429.74 | 72.5 |
1 | 434.73 | 74 |
1 | 453.31 | 75 |
2 | 371.94 | 2 |
2 | 377.59 | 6 |
2 | 382.35 | 12 |
2 | 385.4 | 15 |
2 | 387.05 | 17 |
2 | 391.49 | 22 |
2 | 391.75 | 24 |
2 | 395.24 | 32 |
2 | 395.28 | 33 |
2 | 395.4 | 35 |
2 | 396.3 | 36 |
2 | 396.86 | 39 |
2 | 396.97 | 41 |
2 | 398.32 | 43 |
2 | 398.55 | 44 |
2 | 399.24 | 47 |
2 | 400.23 | 51 |
2 | 400.56 | 53 |
2 | 401.73 | 56 |
2 | 402.42 | 57 |
2 | 404.01 | 60 |
2 | 405.9 | 61 |
2 | 409.81 | 65 |
2 | 416.86 | 69 |
2 | 429.74 | 72.5 |
3 | 365.18 | 1 |
3 | 373.84 | 3 |
3 | 376.49 | 5 |
3 | 377.89 | 7 |
3 | 379.02 | 8 |
3 | 381.12 | 10 |
3 | 381.93 | 11 |
3 | 383.29 | 13 |
3 | 387 | 16 |
3 | 389.53 | 18 |
3 | 390.51 | 19 |
3 | 390.56 | 20 |
3 | 391.06 | 21 |
3 | 392.31 | 25 |
3 | 392.72 | 26 |
3 | 394.94 | 30 |
3 | 396.33 | 37 |
3 | 396.35 | 38 |
3 | 396.9 | 40 |
3 | 398.56 | 45 |
3 | 399.23 | 46 |
3 | 399.31 | 48 |
3 | 400.46 | 52 |
3 | 412.21 | 67 |
3 | 423.61 | 71 |
With the information provided we can now easily compute the sum of ranks for each line:
R1 = 4 + 9 + 14 + 23 + 27 + 28 + 29 + 31 + 34 + 42 + 49 + 50 +
54 + 55 + 58 + 59 + 62 + 63 + 64 + 66 + 68 + 70 + 72.5 + 74 + 75 =
1180.5
R2 = 2 + 6 + 12 + 15 + 17 + 22 + 24 + 32 + 33 + 35 + 36 + 39 + 41 +
43 + 44 + 47 + 51 + 53 + 56 + 57 + 60 + 61 + 65 + 69 + 72.5 =
992.5
R3 = 1 + 3 + 5 + 7 + 8 + 10 + 11 + 13 + 16 + 18 + 19 + 20 + 21 + 25
+ 26 + 30 + 37 + 38 + 40 + 45 + 46 + 48 + 52 + 67 + 71 =
677
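As a quick arithmetic check, the three sums can be reproduced in Python from the ranks listed above. The variable names are illustrative; the final check relies on the fact that the ranks 1 to N always add up to N(N+1)/2, which here is 75 × 76 / 2 = 2850.

```python
# Tie-adjusted ranks per line, copied from the table above.
ranks_by_line = {
    1: [4, 9, 14, 23, 27, 28, 29, 31, 34, 42, 49, 50, 54,
        55, 58, 59, 62, 63, 64, 66, 68, 70, 72.5, 74, 75],
    2: [2, 6, 12, 15, 17, 22, 24, 32, 33, 35, 36, 39, 41,
        43, 44, 47, 51, 53, 56, 57, 60, 61, 65, 69, 72.5],
    3: [1, 3, 5, 7, 8, 10, 11, 13, 16, 18, 19, 20, 21,
        25, 26, 30, 37, 38, 40, 45, 46, 48, 52, 67, 71],
}

# Sum of ranks for each production line.
rank_sums = {line: sum(ranks) for line, ranks in ranks_by_line.items()}
print(rank_sums)  # {1: 1180.5, 2: 992.5, 3: 677}

# Sanity check: all ranks together must add up to N(N+1)/2.
n_total = sum(len(ranks) for ranks in ranks_by_line.values())
total_matches = sum(rank_sums.values()) == n_total * (n_total + 1) / 2
print(n_total, total_matches)  # 75 True
```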
(1) Null and Alternative Hypotheses
The following null and alternative hypotheses need to be tested:
Ho: The samples come from populations with equal medians
Ha: The samples come from populations with medians that are not all equal
The above hypotheses will be tested using the Kruskal-Wallis test.
(2) Rejection Region
Based on the information provided, the significance level is α = 0.05, and the number of degrees of freedom is df = 3 - 1 = 2. Therefore, the rejection region for this Chi-Square test is
R = {χ²: χ² > 5.991}.
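The critical value 5.991 can be checked without a table. For 2 degrees of freedom specifically, the chi-square upper-tail probability has the closed form exp(-x/2), so the cutoff is -2 ln(α). The sketch below relies on that special case and would not apply to other degrees of freedom.

```python
import math

alpha = 0.05

# For df = 2 only: P(chi-square > x) = exp(-x / 2), so solving
# exp(-x / 2) = alpha gives x = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)
print(round(critical_value, 3))  # 5.991
```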
(3) Test Statistics
The H statistic is computed as shown in the following formula:
$$H = \frac{12}{N(N+1)}\left(\frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k}\right) - 3(N+1)$$

$$= \frac{12}{75(75+1)}\left(\frac{1180.5^2}{25} + \frac{992.5^2}{25} + \frac{677^2}{25}\right) - 3(75+1) = 10.902$$
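The same calculation can be sketched in Python as a check on the arithmetic. The function name `kruskal_wallis_h` is illustrative, and it implements the uncorrected H formula exactly as written above, without any adjustment for ties.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Uncorrected Kruskal-Wallis H from per-group rank sums and group sizes."""
    n_total = sum(sizes)
    # Sum of R_i^2 / n_i over the groups.
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h = kruskal_wallis_h([1180.5, 992.5, 677], [25, 25, 25])
print(round(h, 3))  # 10.902
```

With a single tied pair among 75 observations, the usual tie correction would raise H by less than 0.001, so it does not affect the decision here.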
(4) Decision about the null hypothesis
Since the observed statistic H = 10.902 exceeds the critical value χ² = 5.991, the null hypothesis is rejected.
Using the P-value approach: The p-value is p=0.0043, and since p=0.0043<0.05, it is concluded that the null hypothesis is rejected.
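The p-value can be reproduced with the same df = 2 shortcut: the upper-tail probability of a chi-square variable with 2 degrees of freedom is exp(-x/2). The sketch below assumes the rounded statistic 10.902 and is only valid for 2 degrees of freedom.

```python
import math

h = 10.902    # H statistic from the calculation above
alpha = 0.05

# For df = 2 only: P(chi-square > h) = exp(-h / 2).
p_value = math.exp(-h / 2)
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)  # 0.0043 True
```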
(5) Conclusion
The null hypothesis Ho is rejected. Therefore, there is enough evidence to claim that not all population medians are equal, at the α = 0.05 significance level.
That is, at least one production line is filling packages with a different median amount of juice than the others.