In: Statistics and Probability
The Excel file BankData shows the values of the following variables for randomly selected 93 employees of a large bank. (A very similar data set was used in a court lawsuit against discrimination.)
Let
= monthly salary in dollars (SALARY),
= years of schooling at the time of hire (EDUCAT),
= number of months of previous work experience (EXPER),
= number of months that the individual was hired by the bank (MONTHS),
= dummy variable coded 1 for males and 0 for females (MALE).
Using the t-test studied in Section 10.2, you could find some evidence that the mean salary of all male employees is greater than the mean salaries of all female employees, and hence provide some support for a discrimination suit against the employer. It is recognized, however, that a simplecomparison of the mean salaries might be insufficient to conclude that the female employees have been discriminated against. Obviously there are other factors that affect the salary. These factors have been identified as and defined above.
Assume the following multiple linear regression model,
,
and apply Regression in Data Analysis of Excel with the 99% confidence level (see pages 312 – 314) to find the estimated regression equation
.
Note. Of course, Input Y Range is A1:A94, Input X range is B1:E94, and Labels should be checked.
1. Clearly show the estimated regression equation. What is the percentage of variation in the salary explained by this equation? Assuming that the values of and are fixed, what is the estimated difference between the predicted monthly salaries of male and female employees?
2. What salary would you predict for a male employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What salary would you predict for a female employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What is the difference between the two predicted salaries? Compare this difference with that stated in Task 1.
3. Is there a significant difference in the predicted salaries for male and female employees after accounting for the effects of the three other independent variables? To answer this question, conduct the ttest for the significance of at a 1% level of significance. Clearly show the null and alternative hypotheses to be tested, the value of the test statistic, the p-value of the test, your conclusion and its interpretation; see pages 322 – 323 and 333 – 335.
SALARY | EDUCAT | EXPER | MONTHS | MALE |
5620 | 10 | 12 | 22 | 1 |
5040 | 8 | 14 | 3 | 1 |
5100 | 9 | 36 | 15 | 1 |
5100 | 10 | 55 | 2 | 1 |
5220 | 12 | 29 | 14 | 1 |
5400 | 12 | 37 | 21 | 1 |
5400 | 12 | 38 | 11 | 1 |
5400 | 12 | 39 | 3 | 1 |
5400 | 10 | 48 | 8 | 1 |
5400 | 10 | 60 | 11 | 1 |
5700 | 15 | 74 | 5 | 1 |
6000 | 15 | 88 | 21 | 1 |
6000 | 12 | 98 | 12 | 1 |
6000 | 12 | 113 | 17 | 1 |
6000 | 12 | 115 | 14 | 1 |
6000 | 15 | 123 | 33 | 1 |
6000 | 14 | 152 | 11 | 1 |
6500 | 14 | 173 | 19 | 1 |
6000 | 15 | 150 | 13 | 1 |
6400 | 15 | 136 | 32 | 1 |
6000 | 15 | 156 | 12 | 1 |
6900 | 15 | 180 | 33 | 1 |
6000 | 15 | 156 | 16 | 1 |
6000 | 16 | 145 | 13 | 1 |
6300 | 15 | 220 | 17 | 1 |
6600 | 15 | 164 | 16 | 1 |
7800 | 15 | 259 | 33 | 1 |
6600 | 15 | 216 | 16 | 1 |
6840 | 15 | 142 | 17 | 1 |
6900 | 16 | 175 | 20 | 1 |
6900 | 15 | 132 | 24 | 1 |
8100 | 16 | 315 | 33 | 1 |
6300 | 15 | 187 | 30 | 1 |
6400 | 15 | 231 | 33 | 1 |
4620 | 10 | 12 | 22 | 0 |
4020 | 10 | 12 | 7 | 0 |
4290 | 12 | 5 | 10 | 0 |
4380 | 8 | 6 | 7 | 0 |
4380 | 8 | 8 | 6 | 0 |
4380 | 12 | 3 | 7 | 0 |
4380 | 12 | 4 | 10 | 0 |
4380 | 12 | 5 | 6 | 0 |
4440 | 10 | 11 | 2 | 0 |
4500 | 12 | 12 | 3 | 0 |
4500 | 12 | 8 | 19 | 0 |
4620 | 12 | 52 | 13 | 0 |
4800 | 10 | 70 | 20 | 0 |
4800 | 12 | 52 | 23 | 0 |
4800 | 12 | 11 | 12 | 0 |
4800 | 12 | 75 | 17 | 0 |
4800 | 12 | 63 | 22 | 0 |
4800 | 12 | 144 | 24 | 0 |
4800 | 12 | 163 | 12 | 0 |
4800 | 15 | 228 | 26 | 0 |
4800 | 12 | 381 | 10 | 0 |
4800 | 16 | 214 | 15 | 0 |
4980 | 10 | 318 | 25 | 0 |
5100 | 10 | 96 | 33 | 0 |
5100 | 12 | 36 | 15 | 0 |
5100 | 12 | 59 | 14 | 0 |
5100 | 10 | 115 | 1 | 0 |
5100 | 10 | 165 | 4 | 0 |
5100 | 15 | 123 | 12 | 0 |
5160 | 12 | 118 | 12 | 0 |
5220 | 10 | 102 | 29 | 0 |
5220 | 12 | 127 | 29 | 0 |
5280 | 10 | 90 | 11 | 0 |
5280 | 12 | 190 | 31 | 0 |
5280 | 12 | 107 | 11 | 0 |
5400 | 10 | 113 | 34 | 0 |
5400 | 12 | 128 | 33 | 0 |
5400 | 12 | 126 | 11 | 0 |
5400 | 12 | 112 | 33 | 0 |
5400 | 12 | 98 | 22 | 0 |
5400 | 12 | 82 | 29 | 0 |
5400 | 12 | 169 | 27 | 0 |
5400 | 12 | 124 | 31 | 0 |
5400 | 15 | 94 | 13 | 0 |
5400 | 15 | 49 | 27 | 0 |
5400 | 15 | 121 | 21 | 0 |
5400 | 15 | 122 | 33 | 0 |
5520 | 12 | 97 | 17 | 0 |
5520 | 12 | 196 | 32 | 0 |
5580 | 12 | 133 | 30 | 0 |
5640 | 12 | 155 | 9 | 0 |
5700 | 12 | 123 | 23 | 0 |
5700 | 12 | 117 | 25 | 0 |
5700 | 15 | 151 | 17 | 0 |
5700 | 15 | 161 | 11 | 0 |
5700 | 15 | 241 | 34 | 0 |
6000 | 12 | 121 | 30 | 0 |
6000 | 15 | 244 | 22 | 0 |
6120 | 12 | 209 | 21 | 0 |
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.874491 | |||||||
R Square | 0.764734 | |||||||
Adjusted R Square | 0.75404 | |||||||
Standard Error | 371.2956 | |||||||
Observations | 93 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 4 | 39434252 | 9858563 | 71.51119 | 7.73183E-27 | |||
Residual | 88 | 12131717 | 137860.4 | |||||
Total | 92 | 51565970 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% | |
Intercept | 3490.391 | 256.6959 | 13.59738 | 2.42E-23 | 2980.262261 | 4000.521 | 2814.547557 | 4166.235207 |
EDUCAT | 76.65403 | 23.27278 | 3.293721 | 0.001425 | 30.40427283 | 122.9038 | 15.38010777 | 137.9279442 |
EXPER | 2.92248 | 0.611065 | 4.782605 | 6.9E-06 | 1.708118076 | 4.136843 | 1.313634257 | 4.531326442 |
MONTHS | 20.06142 | 4.58929 | 4.371356 | 3.37E-05 | 10.94117245 | 29.18167 | 7.978472564 | 32.14437197 |
MALE | 856.053 | 85.70218 | 9.988697 | 3.79E-16 | 685.7379859 | 1026.368 | 630.4113856 | 1081.694711 |
(a) Here the estimated regression equation is where b0, b1, b2, b3 and b4 are the coefficients for intercept, educat, exper, months and male respectively.
Hence the estimated regression equation becomes
Now the coefficient of determination i.e. R2 measures the percentage of variation in the salary and the 0.764734 is the coefficient of determination from excel spreadsheet above.
(b) To predict the monthly salary of a male for the given values of the constraint
as number of males are 34
Similarly to predict the monthly salary of a female for the given values of the constraint
as number of females are 58.
Difference = 21401.32246