4. You have collected weekly earnings and age data from a sub-sample of 1,744 individuals using the Current Population Survey in a given year.
(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99% confidence interval for average earnings in the entire population. State the meaning of this interval in words, rather than just in numbers. If you constructed a 90% confidence interval instead, would it be smaller or larger? What is the intuition?
(b) When dividing your sample into people 45 years and older, and younger than 45, the information shown in the table is found.
| Age Category | Average Earnings | Standard Deviation | N |
|---|---|---|---|
| Age ≥ 45 | $468.87 | $308.64 | 507 |
| Age < 45 | $412.20 | $276.63 | 1237 |
Test whether or not the difference in average earnings is statistically significant. Given your knowledge of age-earning profiles, does this result make sense?
a)
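The 99% confidence interval for mean weekly earnings is
\[ \bar{Y} \pm 2.58 \times \frac{s_Y}{\sqrt{n}} = 434.49 \pm 2.58 \times \frac{294.67}{\sqrt{1744}} = 434.49 \pm 18.20 = (416.29,\ 452.69). \]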
Based on the sample at hand, the best guess for the population mean is $434.49.
Because of random sampling error, however, this point estimate will almost surely differ from the true population mean. The interval
estimate for average earnings therefore runs from $416.29 to $452.69. If such intervals were constructed
repeatedly from new samples, the statement that the interval contains the population mean would be incorrect about 1 time out of 100.
For a 90% confidence interval, the only change in the calculation of the confidence interval is to replace 2.58 by 1.64.
Hence the confidence interval is smaller. Given the same average earnings and the same standard deviation, a smaller interval implies that the statement will be false more often: about 10 times out of 100 rather than 1 time out of 100.
The larger the confidence interval, the more likely it is to contain the population value.
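The arithmetic in part a) can be checked with a short Python sketch. It assumes the large-sample normal approximation and uses the rounded critical values quoted above (2.58 and 1.64); the function name `mean_confidence_interval` is only an illustrative choice, not part of the original solution.

```python
import math

def mean_confidence_interval(mean, sd, n, z):
    """Return (lower, upper) for mean +/- z * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    half_width = z * standard_error
    return mean - half_width, mean + half_width

# Sample values from the question; z values are the rounded
# critical values used in the text (assumption: normal approximation).
ci99 = mean_confidence_interval(434.49, 294.67, 1744, z=2.58)
ci90 = mean_confidence_interval(434.49, 294.67, 1744, z=1.64)

print("99%% CI: (%.2f, %.2f)" % ci99)
print("90%% CI: (%.2f, %.2f)" % ci90)
```

Only the critical value changes between the two calls, so the 90% interval is necessarily narrower than the 99% interval.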
b)
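Assuming unequal population variances, the t-statistic for the difference in means is
\[ t = \frac{468.87 - 412.20}{\sqrt{\frac{308.64^2}{507} + \frac{276.63^2}{1237}}} = \frac{56.67}{15.80} \approx 3.59, \]
which is statistically significant at conventional levels
whether we use a two-sided or a one-sided t-test.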
Hence the null hypothesis of equal average earnings in the two groups is rejected.
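As a rough check of the test in part b), the following Python sketch computes the unequal-variance t-statistic from the table values. The helper name `difference_in_means_t` is hypothetical, and the p-value relies on the standard normal approximation, which is an assumption that seems reasonable with group sizes of 507 and 1237.

```python
import math
from statistics import NormalDist

def difference_in_means_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, se) for the difference in means with unequal variances."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se

# Group 1: age >= 45; group 2: age < 45 (values from the table).
t_stat, se_diff = difference_in_means_t(468.87, 308.64, 507,
                                        412.20, 276.63, 1237)

# Two-sided p-value under the large-sample normal approximation (assumption).
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print("SE of difference: %.2f" % se_diff)
print("t-statistic: %.2f" % t_stat)
print("Reject at 1%% (two-sided): %s" % (abs(t_stat) > 2.58))
```

Because the statistic exceeds the two-sided 1% critical value of 2.58, it also exceeds the smaller one-sided critical values, so the choice between the two alternatives does not change the conclusion here.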
Age-earnings profiles typically take on an inverted U shape. Maximum earnings occur in the 40s, depending on other factors such as years of education, which are not considered here. Hence it is not clear a priori which of the two groups should earn more, and therefore whether the alternative hypothesis should be one-sided or two-sided. In such a situation, it is best to assume a two-sided alternative hypothesis.