At a gymnastics meet, three judges evaluate the balance beam performances of five gymnasts. The judges use a scale of 1 to 10, where 10 is a perfect score. A statistician wants to examine the objectivity and consistency of the judges. Assume scores are normally distributed.
| | Judge 1 | Judge 2 | Judge 3 |
| Gymnast 1 | 8.0 | 8.5 | 8.2 | 
| Gymnast 2 | 9.5 | 9.2 | 9.7 | 
| Gymnast 3 | 7.3 | 7.5 | 7.7 | 
| Gymnast 4 | 8.3 | 8.7 | 8.5 | 
| Gymnast 5 | 8.8 | 9.2 | 9.0 | 
Using Minitab (Stat -> ANOVA -> General Linear Model -> Fit General Linear Model) with the score as the response and judge and gymnast as the two factors, we get the following results.

To test whether average scores differ by judge (H0: the mean score is the same for all three judges),
the value of the test statistic is F = 2.55
and the P-value = 0.139.
Since the P-value > 0.01, we fail to reject H0 at the 1% level of significance. There is not enough evidence to conclude that average scores differ by judge.
To test whether average scores differ by gymnast (H0: the mean score is the same for all five gymnasts),
the value of the test statistic is F = 44.75
and the P-value = 0.000 to three decimal places (that is, P < 0.001).
Since the P-value < 0.01, we reject H0 at the 1% level of significance and conclude that average scores differ significantly by gymnast.
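The two F tests above can be checked by hand with a short script. The following is a minimal sketch, not the Minitab procedure itself: it assumes the model is a two-way ANOVA without replication (one score per gymnast-judge cell, so the error term has 4 x 2 = 8 degrees of freedom). The names `two_way_anova` and `f_upper_tail` are hypothetical helpers introduced only for this illustration, and the P-value helper relies on a closed form that holds only when both degrees of freedom are even, which happens to be the case here.

```python
from math import comb

# Scores from the table: rows are gymnasts 1-5, columns are judges 1-3.
scores = [
    [8.0, 8.5, 8.2],
    [9.5, 9.2, 9.7],
    [7.3, 7.5, 7.7],
    [8.3, 8.7, 8.5],
    [8.8, 9.2, 9.0],
]


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F(df1, df2) variable (hypothetical helper).

    Uses the binomial-sum form of the regularized incomplete beta
    function, which is only valid when df1 and df2 are both even.
    """
    if df1 % 2 or df2 % 2:
        raise ValueError("this closed form needs even degrees of freedom")
    a, b = df2 // 2, df1 // 2
    x = df2 / (df2 + df1 * f)
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def two_way_anova(table):
    """Two-way ANOVA without replication; rows = gymnasts, columns = judges."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]

    # Sums of squares for each factor, the total, and the residual.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error

    f_rows = (ss_rows / df_rows) / ms_error
    f_cols = (ss_cols / df_cols) / ms_error
    return {
        "gymnast": (f_rows, f_upper_tail(f_rows, df_rows, df_error)),
        "judge": (f_cols, f_upper_tail(f_cols, df_cols, df_error)),
    }


results = two_way_anova(scores)
for factor, (f_stat, p_value) in results.items():
    print(f"{factor}: F = {f_stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions the script should reproduce the F statistics and P-values quoted above. If SciPy is available, `scipy.stats.f.sf(f, df1, df2)` is a more general replacement for the hand-written tail probability.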

By Tukey's test for pairwise comparisons of the gymnast means, we can conclude that gymnast 2 and gymnast 3 are significantly different from the others.