PLEASE EXPLAIN YOUR REASONING
ANOVA is based on an F-ratio that is calculated as the ratio of two variance estimates, the variance between groups and the variance within groups, but enables conclusions to be made about the means of the samples involved. What is the logic of that? I.e., explain the rationale that supports the use of variance estimates.
1) It is done because:
The null hypothesis says that all the group population means are equal. ANOVA also assumes that the populations are normal and that they have equal variances, so if the means are equal too, all the samples come from the same normal distribution. The common variance σ² of that distribution can then be estimated in two different ways:
1. Variance between samples: An estimate of σ² based on the variance of the sample means (for equal sample sizes n, it is n times the variance of the sample means). If the samples are different sizes, each sample mean is weighted by its sample size. This variance is also called variation due to treatment or explained variation.
2. Variance within samples: An estimate of σ² that is the average of the sample variances (also known as a pooled variance). When the sample sizes are different, each sample variance is weighted by its degrees of freedom (sample size minus 1). This variance is also called variation due to error or unexplained variation.
F = MSbetween / MSwithin
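As a rough illustration of how the two estimates and their ratio can be computed, here is a minimal Python sketch. The function name f_ratio and the sample data are made up for this example; the weighting follows the description above (sample sizes for the between estimate, degrees of freedom for the within estimate).

```python
from statistics import mean, variance

def f_ratio(groups):
    """Return (ms_between, ms_within, F) for a list of samples (hypothetical helper)."""
    k = len(groups)                         # number of groups
    n_total = sum(len(g) for g in groups)   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: squared distance of each sample mean
    # from the grand mean, weighted by that sample's size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-groups sum of squares: each sample variance weighted by its
    # degrees of freedom, which yields the pooled variance after division.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

# Illustrative data only: three groups of unequal size.
groups = [[4, 5, 6], [6, 7, 8, 9], [9, 10, 11]]
ms_between, ms_within, F = f_ratio(groups)
print(ms_between, ms_within, F)
```

In this made-up data the sample means are far apart relative to the spread inside each group, so MSbetween is much larger than MSwithin.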
If H0 is true, MSbetween and MSwithin estimate the same value, σ², so the F-ratio should be approximately equal to 1. Only sampling error would move it away from 1. MSwithin estimates the population variance whether or not H0 is true. MSbetween, however, estimates the population variance plus a component produced by differences among the population means. That extra component cannot be negative, so if the null hypothesis is false, MSbetween tends to be larger than MSwithin and the F-ratio tends to be larger than 1. This is why a comparison of two variance estimates supports a conclusion about means: an F-ratio much larger than 1 is evidence that the population means are not all equal. The definitions above allow for groups of different sizes.
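The behaviour of the F-ratio can also be checked with a small simulation. The sketch below is only an illustration under assumed settings (three normal populations, σ = 1, ten observations per group, 2000 repetitions, a fixed seed); the helper names are hypothetical. It averages the F-ratio when all population means are equal and again when they differ.

```python
import random
from statistics import mean, variance

def f_ratio(groups):
    """F = MSbetween / MSwithin for a list of samples (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ms_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return ms_between / ms_within

def average_f(pop_means, sigma=1.0, n=10, trials=2000, seed=0):
    """Average F-ratio over repeated samples from normal populations."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(trials):
        # Draw one sample of size n from each population.
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        ratios.append(f_ratio(groups))
    return mean(ratios)

null_avg = average_f([0.0, 0.0, 0.0])  # H0 true: all population means equal
alt_avg = average_f([0.0, 1.0, 2.0])   # H0 false: population means differ
print("average F when H0 is true: ", null_avg)
print("average F when H0 is false:", alt_avg)
```

With equal population means the average F-ratio should come out close to 1, while with unequal means it should be clearly larger, which matches the reasoning above.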