Description: The data are from a national sample of 6,000 households with a male head earning less than $15,000 annually in 1966. The data were classified into 39 demographic groups for analysis. The study was undertaken in the context of proposals for a guaranteed annual wage (negative income tax). At issue was the response of labor supply (average hours) to increasing hourly wages. The study was undertaken to estimate this response from available data.
Approach: Our plan is to divide up the work so that one person tackles each question proposed above. We hope to find the best-fitting simple linear regression between hours and wages. We also plan to analyze how labor supply responds to increasing hourly wages.
Number of cases: 39
Variable Names:
HRS: Average hours worked during the year
RATE: Average hourly wage ($)
ERSP: Average yearly earnings of spouse ($)
ERNO: Average yearly earnings of other family members ($)
NEIN: Average yearly non-earned income ($)
ASSET: Average family asset holdings (Bank account, etc.) ($)
AGE: Average age of respondent
DEP: Average number of dependents
RACE: Percent of white respondents
SCHOOL: Average highest grade of school completed
The Data (also attached in a separate Excel file):
HRS RATE ERSP ERNO NEIN ASSET AGE DEP RACE SCHOOL
2157 2.905 1121 291 380 7250 38.5 2.340 32.1 10.5
2174 2.970 1128 301 398 7744 39.3 2.335 31.2 10.5
2062 2.350 1214 326 185 3068 40.1 2.851 * 8.9
2111 2.511 1203 49 117 1632 22.4 1.159 27.5 11.5
2134 2.791 1013 594 730 12710 57.7 1.229 32.5 8.8
2185 3.040 1135 287 382 7706 38.6 2.602 31.4 10.7
2210 3.222 1100 295 474 9338 39.0 2.187 10.1 11.2
2105 2.493 1180 310 255 4730 39.9 2.616 71.1 9.3
2267 2.838 1298 252 431 8317 38.9 2.024 9.7 11.1
2205 2.356 885 264 373 6789 38.8 2.662 25.2 9.5
2121 2.922 1251 328 312 5907 39.8 2.287 51.1 10.3
2109 2.499 1207 347 271 5069 39.7 3.193 * 8.9
2108 2.796 1036 300 259 4614 38.2 2.040 * 9.2
2047 2.453 1213 297 139 1987 40.3 2.545 * 9.1
2174 3.582 1141 414 498 10239 40.0 2.064 * 11.7
2067 2.909 1805 290 239 4439 39.1 2.301 * 10.5
2159 2.511 1075 289 308 5621 39.3 2.486 43.6 9.5
2257 2.516 1093 176 392 7293 37.9 2.042 * 10.1
1985 1.423 553 381 146 1866 40.6 3.833 * 6.6
2184 3.636 1091 291 560 11240 39.1 2.328 13.6 11.6
2084 2.983 1327 331 296 5653 39.8 2.208 58.4 10.2
2051 2.573 1194 279 172 2806 40.0 2.362 77.9 9.1
2127 3.262 1226 314 408 8042 39.5 2.259 39.2 10.8
2102 3.234 1188 414 352 7557 39.8 2.019 29.8 10.7
2098 2.280 973 364 272 4400 40.6 2.661 53.6 8.4
2042 2.304 1085 328 140 1739 41.8 2.444 83.1 8.2
2181 2.912 1072 304 383 7340 39.0 2.337 30.2 10.2
2186 3.015 1122 30 352 7292 37.2 2.046 29.5 10.9
2108 2.786 1757 * 506 9658 43.4 * 32.6 10.2
2188 3.010 990 366 374 7325 38.4 2.847 30.9 10.6
2203 3.273 * * 430 8221 38.2 2.324 22.1 11.0
2077 1.901 350 209 95 1370 37.4 4.158 61.3 8.2
2196 3.009 947 294 342 6888 37.5 3.047 31.8 10.6
2093 1.899 342 311 120 1425 37.5 4.512 62.8 8.1
2173 2.959 1116 296 387 7625 39.2 2.342 31.0 10.5
2179 2.971 1128 312 397 7779 39.4 2.341 31.2 10.5
2200 2.980 1126 204 393 7885 39.2 2.341 31.0 10.6
2052 2.630 * * 154 3331 40.5 * 45.8 10.3
2197 3.413 1078 300 512 10450 39.1 2.297 15.5 11.3
Please answer the following using SAS, and include the code as well:
Compare the correlation between wages and age with the correlation between wages and schooling.
The data can be imported using the following code:
PROC IMPORT DATAFILE ="path name where file is saved"
OUT = MONEY
DBMS = TAB
REPLACE;
RUN;
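Several columns in the listing (ERSP, ERNO, DEP, RACE) use * for missing values, and PROC IMPORT may read such a column as character data. As a minimal sketch, assuming the values are whitespace-delimited as in the listing above, a DATA step with the ?? modifier reads each * as a numeric missing value without filling the log with invalid-data notes. The data set name MONEY_DEMO is hypothetical, and only the first five rows of the listing are included here.

```sas
/* Sketch only: MONEY_DEMO is a hypothetical name, not part of the original answer. */
DATA MONEY_DEMO;
   /* ?? after a variable reads non-numeric text such as * as missing (.) */
   /* and suppresses the invalid-data notes in the log.                    */
   INPUT HRS RATE ERSP ?? ERNO ?? NEIN ASSET AGE DEP ?? RACE ?? SCHOOL;
   DATALINES;
2157 2.905 1121 291 380 7250 38.5 2.340 32.1 10.5
2174 2.970 1128 301 398 7744 39.3 2.335 31.2 10.5
2062 2.350 1214 326 185 3068 40.1 2.851 * 8.9
2111 2.511 1203 49 117 1632 22.4 1.159 27.5 11.5
2134 2.791 1013 594 730 12710 57.7 1.229 32.5 8.8
;
RUN;

/* Check that RACE is numeric and that the * in the third row became missing. */
PROC PRINT DATA=MONEY_DEMO;
RUN;
```

Reading every column as numeric this way keeps all ten variables usable in later procedures, which skip or handle missing values on their own.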
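First, create a new variable WAGE from HRS and RATE using the formula WAGE = RATE*HRS. Because RATE is the average hourly wage and HRS is the average hours worked during the year, WAGE is the average annual wage earnings.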
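/* A bare SELECT only prints the result; store WAGE in the data set so later steps can use it. */
DATA MONEY;
SET MONEY;
WAGE = HRS*RATE; /* hours worked per year x hourly rate */
FORMAT WAGE 10.2;
RUN;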
To compare the correlation between wages and age with the correlation between wages and schooling, we can compute each correlation coefficient separately or produce a single correlation matrix for the variables WAGE, AGE, and SCHOOL.
PROC CORR DATA=MONEY;
VAR WAGE AGE SCHOOL;
RUN;
In the output, the cell for the WAGE and AGE variables gives the correlation coefficient between wage and age, and the cell for the WAGE and SCHOOL variables gives the coefficient between wage and schooling. Compare the two values by sign and by magnitude. For example, check whether each variable is positively or negatively correlated with wage. If the correlation for wage vs. school is larger in absolute value than the correlation for wage vs. age, then schooling is more strongly linearly associated with wage than age is, and vice versa.
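The comparison above needs only two cells of the matrix. As a sketch, pairing a VAR statement with a WITH statement makes PROC CORR print just WAGE against AGE and SCHOOL, and the FISHER option adds confidence limits that help judge whether the two coefficients differ by more than sampling noise. The data set name CORR_DEMO is hypothetical and holds only six rows from the listing (HRS, RATE, AGE, SCHOOL columns), so its output is illustrative; the actual comparison should use all 39 groups.

```sas
/* Sketch only: CORR_DEMO is a hypothetical six-row subset of the listing. */
DATA CORR_DEMO;
   INPUT HRS RATE AGE SCHOOL;
   WAGE = HRS * RATE;   /* same formula as in the answer */
   DATALINES;
2157 2.905 38.5 10.5
2174 2.970 39.3 10.5
2062 2.350 40.1 8.9
2111 2.511 22.4 11.5
2134 2.791 57.7 8.8
2185 3.040 38.6 10.7
;
RUN;

/* WITH puts WAGE in the rows and the VAR variables in the columns,      */
/* so the output holds exactly the two correlations being compared.      */
/* FISHER requests confidence limits based on Fisher's z transformation. */
PROC CORR DATA=CORR_DEMO FISHER;
   VAR AGE SCHOOL;
   WITH WAGE;
RUN;
```

If the two confidence intervals overlap heavily, the data give little basis for saying one association is stronger than the other.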