A description of workers. Each year, the Census Bureau selects a different random sample of more than 3 million households to be interviewed in the American Community Survey (ACS). The dataset that has been assigned to you contains information on a small random sample of workers from a particular state interviewed in the 2016 ACS.
Notes: In your answers, use up to one decimal place when the number is not an integer. If the number is close to zero (e.g., 0.0006), use up to four decimal places. Show your work to get full credit. Where applicable, indicate which statistical function of Excel you used to compute the estimate. Use the dataset that was assigned. If you use a different dataset, your homework will not be graded.
(a) Describe the structure of the data set. In your answer include the population of interest, sample size, number of variables, type of variables (i.e., categorical (nominal/ordinal) or quantitative (discrete, continuous)), and type of data (i.e., cross-sectional, time series, or longitudinal data). (10 points)
(b) If each year the Census Bureau interviewed the same sample of households, what would be the type of dataset generated by the ACS in this case? (3 pts). Explain.
(c) Use the earnings (WAGP) of the first ten workers to calculate $\sum x_i$ and $\sum (x_i - \bar{x})^2$ (i.e., the sum of the squared deviations) of the workers' earnings. Use these sums to compute the sample mean and the sample standard deviation of these workers' earnings. (10 pts)
(NOTE: The question mentions a dataset assigned to the student, but that dataset is not attached here. The answer below is therefore a general solution, so that its functions and methods can be applied to whatever data is available. Where necessary, the corresponding R functions are also given.)
(a) Data about data is called metadata. For a dataset it includes the type of each variable (qualitative or quantitative, discrete or continuous), the total number of variables, the number of observations, and the type of data (cross-sectional, time series, or longitudinal). If one prefers R for the analysis, the function str(your_dataset) gives a compact summary of this structure.
The dplyr package in R has a function called glimpse(your_dataset), which also gives information about the dataset's columns.
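As a rough sketch of what str() reports, the snippet below builds a small made-up data frame in place of the assigned file. Only WAGP comes from the question; the other column names and all of the values are assumptions for illustration.

```r
# Hypothetical stand-in for the assigned ACS sample.
# Only WAGP is named in the question; AGEP, SEX and all values are made up.
acs <- data.frame(
  WAGP = c(52000, 31000, 0, 78000, 45000),   # wage earnings (quantitative)
  AGEP = c(34L, 51L, 19L, 42L, 28L),         # age in years (quantitative, discrete)
  SEX  = factor(c("F", "M", "M", "F", "F"))  # categorical, nominal
)

str(acs)             # observations, variables, and the storage type of each column
nrow(acs)            # sample size
ncol(acs)            # number of variables
sapply(acs, class)   # class of each column
```

With the real file, the same calls would be run on the data frame returned by whatever import function was used to read it.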
(b) If the ACS interviewed the same set of households each year, the result would be longitudinal (panel) data, because the same units would be observed repeatedly over time. The yearly samples would then be dependent samples rather than independent ones, and the set of households, and hence the sample size, would stay the same across years. For example, to compare the mean over two years with a t-test, the paired-sample t-test should be used instead of the independent-samples test, and the assumptions of the test change accordingly.
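To illustrate the paired versus independent distinction, here is a minimal R sketch. The two vectors are invented earnings for the same five households in two years, not ACS values.

```r
# Hypothetical earnings of the same five households in two consecutive years
year1 <- c(41000, 52000, 38000, 60000, 45000)
year2 <- c(43000, 51000, 40000, 63000, 47000)

# Paired test: based on the within-household differences year2 - year1
paired <- t.test(year2, year1, paired = TRUE)

# Independent-samples (Welch) test: ignores which household each value belongs to
unpaired <- t.test(year2, year1)

paired$p.value
unpaired$p.value
```

Because the same households appear in both years, the paired version is the appropriate one here. The unpaired call is shown only for contrast.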
(c) To get the sum of squared deviations of the WAGP column for the first 10 workers, Excel has a direct function:
=DEVSQ(<select the first 10 elements in the WAGP column, close the parenthesis, and press Enter>)
Or, if one wants to calculate it manually:
i) Calculate the mean of the first 10 WAGP values with =AVERAGE(<select the first 10 elements in the WAGP column, close the parenthesis, and press Enter>); call it AVG.
ii) Then create a new column by subtracting AVG from each of the 10 WAGP values.
iii) Square each of these deviations, then add all 10 squared values with =SUM(<data range>).
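In symbols, with $n = 10$ and $x_i$ the earnings of worker $i$, the steps above and the quantities the question asks for are the standard sample formulas:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
SS = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{\frac{SS}{n-1}}
$$

So the sample mean is the sum of the earnings divided by 10, and the sample standard deviation is the square root of the sum of squared deviations divided by 9.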
The sample standard deviation of earnings can be calculated with =STDEV(<data range>) (named =STDEV.S in current versions of Excel), and the sample mean with =AVERAGE(<data range, i.e., the 10 values>).
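For those working in R instead of Excel, the same calculation can be sketched as follows. The ten earnings values are made up for illustration and should be replaced by the first ten WAGP values of the assigned dataset.

```r
# Hypothetical earnings for ten workers; replace with the first ten WAGP values
wagp <- c(52000, 31000, 0, 78000, 45000, 23000, 61000, 38000, 27000, 45000)

n     <- length(wagp)
sum_x <- sum(wagp)            # sum of earnings, like Excel =SUM(range)
avg   <- sum_x / n            # sample mean, like =AVERAGE(range)
dev   <- wagp - avg           # deviation of each value from the mean
ss    <- sum(dev^2)           # sum of squared deviations, like =DEVSQ(range)
s     <- sqrt(ss / (n - 1))   # sample standard deviation, like =STDEV(range)

# Cross-check the manual results against R's built-in functions
stopifnot(isTRUE(all.equal(avg, mean(wagp))))
stopifnot(isTRUE(all.equal(s, sd(wagp))))
```

Computing the pieces step by step, rather than calling sd() directly, keeps the intermediate sums visible, which is what the question asks to be shown.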