In: Statistics and Probability
Assignment on Multiple Linear Regression
The Excel file BankData shows the values of the following variables for randomly selected 93 employees of a bank. This real data set was used in a court lawsuit against discrimination.
Let
= monthly salary in dollars (SALARY),
= years of schooling at the time of hire (EDUCAT),
= number of months of previous work experience (EXPER),
= number of months that the individual was hired by the bank (MONTHS),
= dummy variable coded 1 for males and 0 for females (MALE).
Let um = the mean salary for all male bank employees, and uf = the mean salary for all female bank employees. Using the t-test studied in Section 10.2, you could find some evidence of um > uf and provide some support for a discrimination suit against the employer. It is recognized, however, that a simple comparison of the mean salaries might be insufficient to conclude that the female employees have been discriminated against. Obviously there are other factors that affect the salary to which the relation um > uf might be attributed. These factors have been identified as x1, x2 and x3 defined above.
Assume the following regression model,
y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + E
and apply Regression in Data Analysis of Excel to find the estimated regression equation
1. Clearly show the estimated regression equation. Assuming that the values of x1, x2 and x3 are fixed, what is the estimated average difference between the salaries of all male and female employees?
2. What salary would you predict for a male employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What salary would you predict for a female employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What is the difference between the two predicted salaries? Compare this difference with that found in Task 1.
3. Is there a significant difference in the average salaries for male and female employees after accounting for the effects of the three other independent variables? Use a 5% level of significance to answer this question. Clearly show the null and alternative hypotheses to be tested, the value of the test statistic, the p-value of the test, your conclusion and its interpretation.
The detailed example of using Regression in Data Analysis of Excel is shown on pages 312 – 314.
Use Microsoft Word to write a managerial report with your name shown on the first page. The report should include your Excel output from Regression (copy and paste it), so do not attach any separate Excel file.
SALARY EDUCAT EXPER MONTHS MALE
4620 12 11.5 22 1
5040 15 14 3 1
5100 12 180 15 1
5100 12 315 2 1
5220 12 29 14 1
5400 12 7 21 1
5400 12 38 11 1
5400 12 113 3 1
5400 15 17.5 8 1
5400 15 359 11 1
5700 15 36 5 1
6000 8 320 21 1
6000 12 24 2 1
6000 12 32 17 1
6000 12 49 8 1
6000 12 56 33 1
6000 12 252 11 1
6000 12 272 19 1
6000 15 25 13 1
6000 15 35.5 32 1
6000 15 56 12 1
6000 15 64 33 1
6000 15 108 16 1
6000 16 45.5 3 1
6300 15 72 17 1
6600 15 64 16 1
6600 15 84 33 1
6600 15 215.5 16 1
6840 15 41.5 7 1
6900 12 175 10 1
6900 15 132 24 1
8100 16 54.5 33 1
3900 12 0 1 0
4020 10 44 7 0
4290 12 5 30 0
4380 8 6.2 7 0
4380 8 7.5 6 0
4380 12 0 7 0
4380 12 0 10 0
4380 12 4.5 6 0
4440 15 75 2 0
4500 8 52 3 0
4500 12 8 19 0
4620 12 52 3 0
4800 8 70 20 0
4800 12 6 23 0
4800 12 11 12 0
4800 12 11 17 0
4800 12 63 22 0
4800 12 144 24 0
4800 12 163 12 0
4800 12 228 26 0
4800 12 381 1 0
4800 16 214 15 0
4980 8 318 25 0
5100 8 96 33 0
5100 12 36 15 0
5100 12 59 14 0
5100 15 115 1 0
5100 15 165 4 0
5100 16 123 12 0
5160 12 18 12 0
5220 8 102 29 0
5220 12 127 29 0
5280 8 90 11 0
5280 8 190 1 0
5280 12 107 11 0
5400 8 173 34 0
5400 8 228 33 0
5400 12 26 11 0
5400 12 36 33 0
5400 12 38 22 0
5400 12 82 29 0
5400 12 169 27 0
5400 12 244 1 0
5400 15 24 13 0
5400 15 49 27 0
5400 15 51 21 0
5400 15 122 33 0
5520 12 97 17 0
5520 12 196 32 0
5580 12 132.5 30 0
5640 12 55 9 0
5700 12 90 23 0
5700 12 116.5 25 0
5700 15 51 17 0
5700 15 61 11 0
5700 15 241 34 0
6000 12 121 30 0
6000 15 78.5 13 0
6120 12 208.5 21 0
6300 12 86.5 33 0
6300 15 231 15 0
(1) the estimated regression eqution is given as
y=3526.42+90.02x1+1.27x2+23.41x3+722.47x4
SALARY | EDUCAT | Exper | Month | MALE |
y | x1 | x2 | x3 | x4 |
estimated difference between male and femal is regression coefficient of x4 and it is 722.47
(2)
for male x1=12,x2=10, x3=15 and x4=1,
the y=3526.42+90.02*12+1.27*10+23.41*15+722.47*1=5692.98
for female x1=12,x2=10, x3=15 and x4=0,
the y=3526.42+90.02*12+1.27*10+23.41*15+722.47*0=4970.51
difference =5692.98-4970.51=722.47 which is same
(3)null hypothesis H0:4=0 and alternate hypothesis Ha:
t=4/SE(4)=722.46/117.82=6.13 with df=88
and p-value=0.000
following information has been generated using ms-excel
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.714754304 | |||||
R Square | 0.510873714 | |||||
Adjusted R Square | 0.488640701 | |||||
Standard Error | 507.4215001 | |||||
Observations | 93 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 4 | 23665351.39 | 5916338 | 22.97816 | 5.07217E-13 | |
Residual | 88 | 22657938.93 | 257476.6 | |||
Total | 92 | 46323290.32 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 3526.422111 | 327.7254214 | 10.76029 | 1E-17 | 2875.136716 | 4177.707506 |
X Variable 1 | 90.02031094 | 24.69356079 | 3.645497 | 0.000451 | 40.94704888 | 139.093573 |
X Variable 2 | 1.268990015 | 0.587736851 | 2.159113 | 0.033562 | 0.100986552 | 2.436993477 |
X Variable 3 | 23.40623577 | 5.200862805 | 4.500452 | 2.07E-05 | 13.07061404 | 33.74185749 |
X Variable 4 | 722.4606714 | 117.8215905 | 6.131819 | 2.41E-08 | 488.3150236 | 956.6063192 |