An agency wants to examine the distribution of wages F(x) of graduates with bachelor's degrees. For that purpose, they receive from the US Census Bureau database a sample X1, …, Xn of n = 10,000 numeric wage records for randomly chosen graduates.
Find an algorithm that provides an interval of wages covering the central 50% of the distribution F(x) with approximately 95% confidence. How can you evaluate the accuracy of that estimate?
Given the data, we calculate the following quantities:
a) The sample size
Here n = 10,000, the number of records in the sample. A common rule of thumb asks for n of at least 30 before using the normal approximation, which is easily met here.
b) The sample mean
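The sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i$, estimates the center of the distribution. Note that the central 50% of F(x) itself lies between the 25th and 75th percentiles; the mean is a measure of location, not of that range.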
c) The sample standard deviation
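The sample standard deviation is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean. As the sample size is large, we use the normal approximation for the distribution of the sample mean. We use a z-table to convert a probability to a z-score.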
d) The 95% confidence interval corresponds to a probability range of 0.025 to 0.975. Using a z-table, we get the corresponding z-score for 0.975 as 1.96.
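e) In the final step, we get the 95% confidence interval for the mean wage as:
$\bar{x} \pm 1.96 \, \frac{s}{\sqrt{n}}$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size. All of these quantities have been calculated in the previous steps.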
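Steps a) to e) can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `mean_confidence_interval` and `central_half` are hypothetical, and the wage data are synthetic (drawn from a log-normal distribution chosen only for the demo), since the actual Census records are not given. The second function is not part of the steps above; it is included on the assumption that "the central 50% of F(x)" means the range between the first and third quartiles, so that the two intervals can be compared.

```python
import math
import random
from statistics import NormalDist, mean, quantiles, stdev


def mean_confidence_interval(wages, confidence=0.95):
    """Normal-approximation confidence interval for the mean (steps a to e)."""
    n = len(wages)                      # a) sample size
    x_bar = mean(wages)                 # b) sample mean
    s = stdev(wages)                    # c) sample standard deviation (n - 1 divisor)
    # d) z-score for the upper tail point, e.g. 0.975 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / math.sqrt(n)   # e) margin of error
    return x_bar - half_width, x_bar + half_width


def central_half(wages):
    """Sample quartiles (Q1, Q3): an estimate of the central 50% of F(x).

    Not part of the steps above; added here only for comparison.
    """
    q1, _median, q3 = quantiles(wages, n=4)
    return q1, q3


if __name__ == "__main__":
    # Synthetic stand-in for the 10,000 wage records (an assumption, not real data).
    rng = random.Random(0)
    wages = [rng.lognormvariate(10.8, 0.5) for _ in range(10_000)]
    print("95% CI for the mean wage:", mean_confidence_interval(wages))
    print("Sample quartiles (Q1, Q3):", central_half(wages))
```

The two results answer different questions. The confidence interval narrows as n grows because its width is proportional to 1/√n, while the quartile range reflects the spread of wages and does not shrink with more data.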
To evaluate the estimate, we draw more samples of size n from the same data (for example, by resampling with replacement) and check that about 95% of those samples have their mean inside the confidence interval above.
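One way to carry out this check is a bootstrap, sketched below. It assumes that "more samples from the same data" means resampling with replacement, which the original answer does not state explicitly. The function name `bootstrap_coverage`, the number of resamples, and the synthetic log-normal data are all illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def bootstrap_coverage(wages, n_resamples=500, confidence=0.95, seed=0):
    """Fraction of bootstrap resample means that fall inside the
    normal-approximation confidence interval built from the full sample."""
    rng = random.Random(seed)
    n = len(wages)
    x_bar = mean(wages)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(wages) / math.sqrt(n)
    lower, upper = x_bar - half_width, x_bar + half_width

    inside = 0
    for _ in range(n_resamples):
        # Draw n records with replacement from the original sample.
        resample = rng.choices(wages, k=n)
        if lower <= sum(resample) / n <= upper:
            inside += 1
    return inside / n_resamples


if __name__ == "__main__":
    # Synthetic stand-in for the wage records (an assumption, not real data).
    rng = random.Random(1)
    wages = [rng.lognormvariate(10.8, 0.5) for _ in range(2_000)]
    print("Share of resample means inside the interval:",
          bootstrap_coverage(wages))
```

If the normal approximation is adequate, the returned share should be close to the nominal confidence level. A share well below it would suggest that the interval is too narrow for the data.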