In: Statistics and Probability
why does the t test require the degrees of freedom and the z test does not
Let' say:
You have a data set with 10 values. If you’re not estimating anything, each value can take on any number, right? Each value is completely free to vary.
But suppose you want to test the population mean with a sample of 10 values, using a 1-sample t test. You now have a constraint—the estimation of the mean. What is that constraint, exactly? By definition of the mean, the following relationship must hold: The sum of all values in the data must equal n x mean, where n is the number of values in the data set.
So if a data set has 10 values, the sum of the 10 values must equal the mean x 10. If the mean of the 10 values is 3.5 (you could pick any number), this constraint requires that the sum of the 10 values must equal 10 x 3.5 = 35.
With that constraint, the first value in the data set is free to vary. Whatever value it is, it’s still possible for the sum of all 10 numbers to have a value of 35. The second value is also free to vary, because whatever value you choose, it still allows for the possibility that the sum of all the values is 35.
In fact, the first 9 values could be anything, including these two examples:
34, -8.3, -37, -92, -1, 0, 1, -22, 99
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
But to have all 10 values sum to 35, and have a mean of 3.5, the 10th value cannot vary.
Another way to say this is that the number of degrees of freedom equals the number of "observations" minus the number of required relations among the observations (e.g., the number of parameter estimates). For a 1-sample t-test, one degree of freedom is spent estimating the mean, and the remaining n - 1 degree of freedom estimate variability.