What is the t distribution and why is it needed? How do we know to use the t distribution in the construction of a confidence interval of the mean rather than the normal distribution? Give an example.
The t distribution, also known as Student's t-distribution, is a continuous probability distribution that resembles the normal distribution in its bell shape but has heavier tails. It is used to estimate population parameters when the sample size is small and/or the population variance is unknown.
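To make "heavier tails" concrete, the sketch below compares the probability of landing more than 2 units from the center under the standard normal and under t distributions with 9 and 100 degrees of freedom. It is only an illustration: the helper names `t_pdf` and `t_two_sided_tail` are made up for this example, and the tail area is approximated with Simpson's rule rather than taken from a statistics library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_tail(c, df, steps=2000):
    """Approximate P(|T| > c) as 1 minus the Simpson's-rule integral of the pdf over [-c, c].

    steps must be even for Simpson's rule.
    """
    h = 2 * c / steps
    total = t_pdf(-c, df) + t_pdf(c, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-c + i * h, df)
    return 1 - total * h / 3

# Two-sided tail probability beyond 2 for each distribution.
normal_tail = 2 * (1 - NormalDist().cdf(2.0))
t_tail_df9 = t_two_sided_tail(2.0, 9)
t_tail_df100 = t_two_sided_tail(2.0, 100)

print(f"normal:     {normal_tail:.4f}")
print(f"t (df=9):   {t_tail_df9:.4f}")
print(f"t (df=100): {t_tail_df100:.4f}")
```

The t tail with 9 degrees of freedom comes out noticeably larger than the normal tail, and the gap shrinks as the degrees of freedom grow, which is why the two distributions give similar answers for large samples.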
Why do we use this distribution?
According to the central limit theorem, the sampling distribution of a statistic such as the sample mean is approximately normal, as long as the sample size is sufficiently large. When we also know the standard deviation of the population, we can compute a z-score and use the normal distribution to evaluate probabilities for the sample mean.
In many cases, however, the sample size is small, and often we do not know the standard deviation of the population. When either of these problems occurs, the t-distribution is preferred.
Construction of the confidence interval:
Now consider the case in which the population is normally distributed but we do not know its standard deviation. We sample N values, compute the sample mean (M), and estimate the standard error of the mean (σM) with sM. If the standard deviation were known, we could calculate the CI for the mean from the standard normal table. Because the SD is unknown here, we use the t-distribution.
As a rule of thumb, if we have one small set of data (under 30 items), we'll want to use the t-distribution instead of the normal distribution to construct our confidence interval.
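In symbols, the interval described above can be written as follows. M, N, and sM are the quantities named in the text; the symbols s (the sample standard deviation) and t* (the critical value from the t table with N − 1 degrees of freedom for the chosen confidence level) are notation assumed here for illustration.

```latex
M \pm t^{*}\, s_M, \qquad s_M = \frac{s}{\sqrt{N}}
```

If the population standard deviation were known, t* would be replaced by the corresponding critical value from the standard normal table and sM by σM.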
Example:
Construct a 98% Confidence Interval based on the following data: 45, 55, 67, 45, 68, 79, 98, 87, 84, 82.
Step 1: Find the sample mean, M, and the sample standard deviation, s, for the data.
M: 71
s: 18.172
Step 2: Subtract 1 from the sample size to find the degrees of freedom (df). There are 10 numbers listed, so the sample size is 10 and df = 9.
Step 3: Subtract the confidence level from 1, then divide by two. This is the area in each tail (α/2), which is the value to look up in a one-tailed t table.
(1 – .98) / 2 = .01
Step 4: Look up df (Step 2) and the tail area (Step 3) in the t-distribution table. For df = 9 and a one-tail area of .01, the t-distribution table gives us 2.821.
Step 5: Divide the standard deviation (Step 1) by the square root of the sample size. This is the estimated standard error of the mean.
18.172 / √(10) = 5.75
Step 6: Multiply step 4 by step 5. This is the margin of error.
2.821 × 5.75 = 16.22075
Step 7: For the lower end of the range, subtract step 6 from the mean (Step 1).
71 – 16.22075 = 54.77925
Step 8: For the upper end of the range, add step 6 to the mean (Step 1).
71 + 16.22075 = 87.22075
So the 98% t-distribution confidence interval for the population mean is approximately (54.78, 87.22).
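The steps above can be checked with a short script. This is a minimal sketch using only Python's standard library; since that library has no t-distribution table, the critical value 2.821 is hard-coded from Step 4 rather than computed, and the variable names are chosen here for illustration.

```python
import math
from statistics import mean, stdev

data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]
t_crit = 2.821  # table value from Step 4 (df = 9, one-tail area .01); assumed, not computed

n = len(data)
df = n - 1                # Step 2: degrees of freedom
m = mean(data)            # Step 1: sample mean
s = stdev(data)           # Step 1: sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)     # Step 5: estimated standard error of the mean
margin = t_crit * se      # Step 6: margin of error
lower = m - margin        # Step 7: lower end of the interval
upper = m + margin        # Step 8: upper end of the interval

print(f"n={n}, df={df}, mean={m}, s={s:.3f}, se={se:.3f}, margin={margin:.3f}")
print(f"98% CI: ({lower:.2f}, {upper:.2f})")
```

Because the script carries full precision instead of rounding the standard error to 5.75, its endpoints differ from the hand calculation by about 0.01.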