Use the following data to predict the wage you would have to pay to hire a particular individual.
a. Specify a regression equation
b. Estimate a regression equation to explain wages
c. Explain the t-stats, F-stats, and the R-squared. What information do they give?
d. You have a nonwhite female with 20 years in the industry and an MBA. How much will you have to pay to hire her? What if she was a white male?
e. Does this industry discriminate against nonwhites or females?
Annual Income in $1000 | Years of Experience | Education | Sex | Race |
77.8 | 12 | Masters | m | w |
77.1 | 13 | Masters | m | w |
76.19 | 15 | Masters | m | w |
77.52 | 24 | Masters | f | w |
66.13 | 2 | Masters | f | w |
76.62 | 22 | Masters | m | o |
72.56 | 9 | Masters | f | o |
78.43 | 7 | Masters | m | w |
57.98 | 12 | BA | m | w |
56.64 | 15 | BA | m | w |
61.63 | 27 | BA | f | w |
54.45 | 23 | BA | f | w |
65.19 | 24 | BA | m | w |
59 | 17 | BA | m | w |
58.88 | 1 | BA | m | w |
51.66 | 5 | BA | f | w |
56.23 | 15 | BA | f | o |
49.72 | 2 | BA | f | o |
55.25 | 1 | BA | m | o |
56.15 | 16 | BA | m | o |
52.44 | 18 | BA | m | o |
57 | 23 | BA | m | o |
55.73 | 22 | BA | f | o |
50.61 | 18 | BA | f | w |
51.73 | 15 | BA | f | w |
54.66 | 16 | BA | f | w |
54.41 | 18 | BA | m | o |
59.31 | 12 | BA | m | o |
56.65 | 11 | BA | f | w |
38.36 | 10 | Some Coll. | f | w |
42.94 | 12 | Some Coll. | f | w |
43.33 | 13 | Some Coll. | f | o |
41.35 | 15 | Some Coll. | f | o |
42.51 | 22 | Some Coll. | f | o |
47.39 | 19 | Some Coll. | f | o |
48.9 | 23 | Some Coll. | f | w |
48.02 | 12 | Some Coll. | m | w |
48.19 | 15 | Some Coll. | m | o |
53.74 | 16 | Some Coll. | m | w |
43.74 | 12 | Some Coll. | m | o |
51.61 | 12 | Some Coll. | m | w |
48.99 | 10 | Some Coll. | m | w |
38.7 | 7 | High School | m | o |
38.79 | 12 | High School | m | o |
40.34 | 13 | High School | m | o |
32.73 | 15 | High School | m | o |
30.97 | 6 | High School | m | w |
33.64 | 4 | High School | m | w |
38.05 | 8 | High School | m | o |
36.82 | 16 | High School | m | o |
33.6 | 13 | High School | m | o |
29.12 | 16 | High School | m | o |
30.5 | 2 | High School | m | w |
34.28 | 2 | High School | f | w |
30.52 | 4 | High School | m | w |
32.7 | 6 | High School | m | w |
34.26 | 1 | High School | m | w |
32.15 | 2 | High School | m | w |
38.41 | 7 | High School | m | w |
38.31 | 5 | High School | m | w |
30.29 | 5 | High School | f | w |
34.23 | 3 | High School | m | w |
37.95 | 8 | High School | m | o |
28.58 | 9 | High School | m | o |
35.44 | 34 | High School | m | o |
38.57 | 32 | High School | m | w |
34.06 | 12 | High School | m | w |
31.63 | 19 | High School | f | w |
32.84 | 23 | High School | f | o |
29.69 | 15 | High School | f | o |
26.32 | 2 | High School | f | o |
34.92 | 7 | High School | m | w |
32.28 | 4 | High School | m | w |
37.95 | 17 | High School | m | w |
38.4 | 18 | High School | f | w |
33.56 | 2 | High School | f | w |
a. Specify a regression equation
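Let
\(Y\) = annual income in $1000
\(X_1\) = years of experience
\(X_2\) = 1 if education = Masters, else 0
\(X_3\) = 1 if education = BA, else 0
\(X_4\) = 1 if education = Some college, else 0
\(X_5\) = 1 if sex = m (male), else 0
\(X_6\) = 1 if race = w (white), else 0
High School, female, and nonwhite are the baseline categories, for which all the corresponding dummy variables equal 0.
The regression equation that we want to estimate is
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \varepsilon \]
where \(\beta_0\) is the intercept, \(\beta_1, \ldots, \beta_6\) are the slope coefficients of \(X_1, \ldots, X_6\) respectively, and \(\varepsilon\) is a random disturbance.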
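The dummy coding described above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_row` and the column order of its result are assumptions made for this example, not part of the original worksheet.

```python
def encode_row(experience, education, sex, race):
    """Return [experience, masters, ba, some_college, male, white].

    High School, female and nonwhite are the baseline categories,
    so they are represented by all-zero dummies.
    """
    # Normalise the label: the table contains variants such as
    # "SOme Coll." and "Some Coll" for the same category.
    edu = education.strip().lower().rstrip(".")
    masters = 1 if edu == "masters" else 0
    ba = 1 if edu == "ba" else 0
    some_college = 1 if edu == "some coll" else 0
    male = 1 if sex.strip().lower() == "m" else 0
    white = 1 if race.strip().lower() == "w" else 0
    return [experience, masters, ba, some_college, male, white]


print(encode_row(12, "Masters", "m", "w"))      # [12, 1, 0, 0, 1, 1]
print(encode_row(12, "SOme Coll.", "m", "w"))   # [12, 0, 0, 1, 1, 1]
print(encode_row(7, "High School", "m", "o"))   # [7, 0, 0, 0, 1, 0]
```

Only three education dummies are needed for four education levels; adding a fourth dummy for High School would make the columns perfectly collinear with the intercept.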
b. Estimate a regression equation to explain wages
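Prepare a worksheet with the income column, the years of experience, and the five 0/1 dummy columns defined in part a.
[Screenshot: worksheet with the coded data]
Set up the regression using Data --> Data Analysis --> Regression.
[Screenshot: regression output]
The estimated regression equation to explain the wages is obtained by substituting the values in the Coefficients column of the output into the equation from part a. The estimated slopes on male and white are 4.258 and 1.607 (see part e).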
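For readers without the spreadsheet add-in, ordinary least squares can be sketched in plain Python by solving the normal equations. This is a minimal illustration, not the procedure used above: the helper names `solve_linear_system` and `ols` are made up for this example, and the toy data are noise-free values built from assumed coefficients (1, 2, 3), not the wage data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for predictors in rows."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data: one numeric predictor and one 0/1 dummy, y = 1 + 2*x + 3*d exactly.
toy_rows = [[0, 0], [1, 0], [2, 1], [3, 1], [4, 0]]
toy_y = [1 + 2 * x + 3 * d for x, d in toy_rows]
coefs = ols(toy_rows, toy_y)
print([round(c, 6) for c in coefs])
```

Applied to the 76 coded rows of the wage data, the same function would return seven numbers: the intercept followed by the six slopes.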
c. Explain the t-stats, F-stats, and the R-squared. What information do they give?
Suppose we want to test the overall validity of the model. The hypotheses are
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0 \quad \text{(all six slope coefficients are zero)} \]
\[ H_a: \text{at least one slope coefficient } \beta_j \neq 0 \]
The F-stat and the corresponding Significance F are used to test these hypotheses.
The test statistic is F = 167.1897 and the p-value is 4.4878E-39.
We reject the null hypothesis if the p-value is less than the significance level. Here, the p-value of 4.4878E-39 is less than 0.05, and hence we reject the null hypothesis.
We can say that the regression model to estimate the wages is significant: at least one of the predictors helps explain wages.
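As a consistency check, the overall F statistic can be recovered from R-squared with the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes n = 76 observations (the number of rows in the table as listed) and k = 6 predictors, and uses the rounded R-squared of 0.9356 reported in this answer; the function name is illustrative.

```python
def f_from_r_squared(r2, n, k):
    """Overall F statistic implied by R-squared for n observations, k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


f_stat = f_from_r_squared(0.9356, n=76, k=6)
print(round(f_stat, 2))  # about 167.07
```

The result is close to the reported F = 167.1897; the small gap comes from using R-squared rounded to four decimal places.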
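We use the t-stats to test whether each individual predictor variable (experience, the three education dummies, sex, and race) can significantly explain the wages.
For each slope coefficient \(\beta_j\), the hypotheses are
\[ H_0: \beta_j = 0 \quad \text{versus} \quad H_a: \beta_j \neq 0 \]
The t-stat values are the test statistics for these hypotheses, and the corresponding p-values tell us whether we can reject the null hypothesis.
If the p-value is less than the significance level, we reject the null hypothesis.
Here, only the p-value for the variable Race is greater than 0.05, so Race is the only predictor that is not significant.
So Race cannot be shown to explain variation in wages at the 5% level of significance, while every other predictor can.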
Finally, the value of R-square from the output is \(R^2 = 0.9356\).
This indicates that 93.56% of the variation in wages is explained by the model (that is, by the predictors).
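To make "share of variation explained" concrete, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The short sketch below uses made-up observed and fitted values purely for illustration; they are not taken from the wage data.

```python
def r_squared(y, y_hat):
    """R-squared = 1 - SSE/SST for observed y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1.0 - sse / sst


# Hypothetical observed and fitted values.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
print(round(r2, 4))  # 0.98
```

Here SSE = 0.10 and SST = 5.0, so 98% of the variation in the toy data is explained by the fitted values.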
d. You have a nonwhite female with 20 years in the industry and an MBA. How much will you have to pay to hire her? What if she was a white male?
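The values of the variables are (the MBA is coded as Masters):
Experience = 20, Masters = 1, BA = 0, Some college = 0, Male = 0, White = 0
Using the estimated equation we get a predicted income of 72.856 (in $1000).
ans: You will have to pay $72,856 to hire a nonwhite female with 20 years in the industry and an MBA.
If she were a white male (with 20 years in the industry and an MBA), then the variables are
Experience = 20, Masters = 1, BA = 0, Some college = 0, Male = 1, White = 1
Using the estimated equation we get a predicted income of 78.722 (in $1000).
ans: You will have to pay $78,722 to hire a white male with 20 years in the industry and an MBA.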
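The two predictions differ only in the male and white dummies, so the second can be checked from the first by adding the two slopes reported in part e (4.258 and 1.607, in $1000). A quick sketch of that arithmetic, with illustrative variable names:

```python
# All amounts are in $1000, as in the data table.
nonwhite_female = 72.856   # prediction for the nonwhite female candidate
male_slope = 4.258         # slope on the male dummy (part e)
white_slope = 1.607        # slope on the white dummy (part e)

# Switching both dummies from 0 to 1 adds both slopes to the prediction.
white_male = nonwhite_female + male_slope + white_slope
print(round(white_male, 3))  # 78.721
```

This agrees with the reported 78.722 up to rounding of the slopes, and it shows that the $5,866 gap between the two candidates is the sum of the estimated sex and race effects.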
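e. Does this industry discriminate against nonwhites or females?
Yes for females; for nonwhites the estimated gap is not statistically significant.
The slope for white is positive, 1.61. This means that a white candidate is paid on average $1,607 more than a nonwhite candidate, keeping the other variables the same. However, the t-test in part c showed that Race is not significant at the 5% level, so this difference cannot be distinguished from zero with these data.
The slope for male is positive, 4.26. This means that a male candidate is paid on average $4,258 more than a female candidate, keeping the other variables the same. This coefficient is significant at the 5% level, so the data do show a pay gap against females.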