When does ANOVA use the F-test?
Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means.
The F-test itself is used to compare the variances of two or more samples. Whenever you run a one-way or two-way ANOVA to compare the means of the samples at hand, you first compare the variances of these samples to check whether they are equal. Why do we do that? Remember that the t-test for two samples comes in two variants: one for equal variances and one for unequal variances.
First, we need to understand why we compare variances. How do variances tell us about the sample means? Variances are a measure of dispersion, that is, how far the data are scattered from the mean. Larger values represent greater dispersion.
F-statistics are based on the ratio of mean squares. The term “mean squares” may sound confusing, but a mean square is simply an estimate of population variance that accounts for the degrees of freedom (DF) used to calculate that estimate.
In one-way ANOVA, the F-statistic is this ratio:
F = variation between sample means / variation within the samples
Variation between sample means: If the group means are clustered close to the overall mean, their variance is low. However, if the group means are spread out further from the overall mean, their variance is higher. This directly helps us evaluate whether the group means all sit close to a common overall mean, and hence this value is in the numerator.
Variation within sample: We also need an estimate of the variability within each sample. To calculate this variance, we need to calculate how far each observation is from its group mean. It is the sum of the squared deviations of each observation from its group mean divided by the error DF.
If the observations for each group are close to the group mean, the variance within the samples is low. However, if the observations for each group are further from the group mean, the variance within the samples is higher.
The F-statistic incorporates both measures of variability discussed above.
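The two measures above can be combined by hand. Below is a minimal sketch in plain Python, assuming a small made-up data set and a helper name (`one_way_f`) that is only illustrative. It computes the between-group mean square, the within-group mean square, and their ratio.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean lies from
    # the overall mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: how far each observation lies from
    # its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1          # DF for the numerator
    df_within = n_total - k     # error DF for the denominator
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Made-up example data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_f(groups)
print(f_value, df_b, df_w)
```

For this toy data the between-group mean square is 13 and the within-group mean square is 1, so F comes out at about 13 with 2 and 6 degrees of freedom.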
Look at this graph. A high F-value means that the variability of the group means is large compared to the within-group variability.
The low F-value graph shows a case where the group means are close together (low variability) relative to the variability within each group.
For one-way ANOVA, the ratio of the between-group variability to the within-group variability follows an F-distribution when the null hypothesis is true. The null hypothesis is that all the group means are equal. In that case both mean squares estimate the same population variance, so the ratio is expected to be close to 1. The alternative hypothesis is that not all the group means are equal, which tends to push the ratio above 1.
When you perform a one-way ANOVA for a single study, you obtain a single F-value. However, if we drew multiple random samples of the same size from the same population and performed the same one-way ANOVA on each, we would obtain many F-values and could plot a distribution of all of them. This type of distribution is a sampling distribution, and when the null hypothesis is true it is the F-distribution.
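The repeated-sampling idea can be tried out directly. The following is a rough sketch, assuming a standard normal population, 3 groups of 10 observations, and 2000 repetitions. These settings and the function names are illustrative choices, not part of the original answer. Every group is drawn from the same population, so the null hypothesis holds by construction.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_null_f(n_sims=2000, k=3, n_per_group=10, seed=0):
    """Draw k groups from ONE population n_sims times and collect the F-values."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    f_values = []
    for _ in range(n_sims):
        # Assumption: population is normal with mean 0 and standard deviation 1.
        groups = [
            [rng.gauss(0.0, 1.0) for _ in range(n_per_group)] for _ in range(k)
        ]
        f_values.append(f_statistic(groups))
    return f_values


f_values = simulate_null_f()
f_sorted = sorted(f_values)
# Empirical 95th percentile: F-values above this are rare when the null is true.
critical_95 = f_sorted[int(0.95 * len(f_sorted))]
print("simulated F-values:", len(f_values))
print("empirical 95th percentile:", critical_95)
```

A histogram of `f_values` would show most of the mass near or below 1 with a long right tail, which is the shape the F-distribution describes.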
ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups. If that ratio is sufficiently large, you can conclude that not all the means are equal.
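To make "sufficiently large" concrete without an F-table, one option is to shuffle the group labels many times and see how often a shuffled F-value is at least as large as the observed one. Note that this is a permutation approximation swapped in for the usual F-distribution lookup, and the data, function names, and number of shuffles below are made up for illustration.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Approximate p-value: share of label shuffles with F >= the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between values and groups
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical measurements for three groups.
groups = [
    [4.1, 5.0, 4.6, 5.3],
    [5.9, 6.4, 5.5, 6.8],
    [7.2, 6.9, 7.8, 8.1],
]
observed_f, p_value = permutation_p_value(groups)
print("observed F:", observed_f)
print("approximate p-value:", p_value)
```

A small p-value here means that an F-value as large as the observed one rarely arises when group membership is random, which supports the conclusion that not all the means are equal.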
Hope that answers your question.