In: Statistics and Probability
The Excel file BankData shows the values of the following variables for randomly selected 93 employees of a large bank. This real data set was used in a court lawsuit against discrimination.
Let
= monthly salary in dollars (SALARY),
= years of schooling at the time of hire (EDUCAT),
= number of months of previous work experience (EXPER),
= number of months that the individual was hired by the bank (MONTHS),
= dummy variable coded 1 for males and 0 for females (MALE).
Using the t-test studied in Section 10.2, you could find some evidence that the mean salary of all male employees is greater than the mean salaries of all female employees, and hence provide some support for a discrimination suit against the employer. It is recognized, however, that a simplecomparison of the mean salaries might be insufficient to conclude that the female employees have been discriminated against. Obviously there are other factors that affect the salary. These factors have been identified as and defined above.
Assume the following multiple linear regression model,
,
and apply Regression in Data Analysis of Excel (see pages 312 – 314) to find the estimated regression equation
.
Note. Of course, Input Y Range is A1:A94, Input X range is B1:E94, and Labels should be checked.
1. Clearly show the estimated regression equation. What is the percentage of variation in the salary explained by this equation? Assuming that the values of and are fixed, what is the estimated difference between the predicted salaries of male and female employees?
2. What salary would you predict for a male employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What salary would you predict for a female employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What is the difference between the two predicted salaries? Compare this difference with that stated in Task 1.
3. Is there a significant difference in the predicted salaries for male and female employees after accounting for the effects of the three other independent variables? To answer this question, conduct the ttest for the significance of at a 1% level of significance. Clearly show the null and alternative hypotheses to be tested, the value of the test statistic, the p-value of the test, your conclusion and its interpretation; see pages 322 – 323 and 333 – 335.
SALARY | EDUCAT | EXPER | MONTHS | MALE |
4620 | 10 | 12 | 22 | 1 |
5040 | 8 | 14 | 3 | 1 |
5100 | 9 | 36 | 15 | 1 |
5100 | 10 | 55 | 2 | 1 |
5220 | 12 | 29 | 14 | 1 |
5400 | 12 | 37 | 21 | 1 |
5400 | 12 | 38 | 11 | 1 |
5400 | 12 | 39 | 3 | 1 |
5400 | 10 | 48 | 8 | 1 |
5400 | 10 | 60 | 11 | 1 |
5700 | 15 | 74 | 5 | 1 |
6000 | 15 | 88 | 21 | 1 |
6000 | 12 | 98 | 12 | 1 |
6000 | 12 | 113 | 17 | 1 |
6000 | 12 | 115 | 14 | 1 |
6000 | 15 | 123 | 33 | 1 |
6000 | 14 | 152 | 11 | 1 |
6000 | 14 | 173 | 19 | 1 |
6000 | 15 | 150 | 13 | 1 |
6000 | 15 | 136 | 32 | 1 |
6000 | 15 | 156 | 12 | 1 |
6000 | 15 | 180 | 33 | 1 |
6000 | 15 | 156 | 16 | 1 |
6000 | 16 | 145 | 13 | 1 |
6300 | 15 | 220 | 17 | 1 |
6600 | 15 | 164 | 16 | 1 |
6600 | 15 | 259 | 33 | 1 |
6600 | 15 | 216 | 16 | 1 |
6840 | 15 | 142 | 17 | 1 |
6900 | 16 | 175 | 20 | 1 |
6900 | 15 | 132 | 24 | 1 |
8100 | 16 | 315 | 33 | 1 |
3900 | 9 | 3 | 1 | 0 |
4020 | 10 | 12 | 7 | 0 |
4290 | 12 | 5 | 10 | 0 |
4380 | 8 | 6 | 7 | 0 |
4380 | 8 | 8 | 6 | 0 |
4380 | 12 | 3 | 7 | 0 |
4380 | 12 | 4 | 10 | 0 |
4380 | 12 | 5 | 6 | 0 |
4440 | 10 | 11 | 2 | 0 |
4500 | 12 | 12 | 3 | 0 |
4500 | 12 | 8 | 19 | 0 |
4620 | 12 | 52 | 13 | 0 |
4800 | 10 | 70 | 20 | 0 |
4800 | 12 | 52 | 23 | 0 |
4800 | 12 | 11 | 12 | 0 |
4800 | 12 | 75 | 17 | 0 |
4800 | 12 | 63 | 22 | 0 |
4800 | 12 | 144 | 24 | 0 |
4800 | 12 | 163 | 12 | 0 |
4800 | 15 | 228 | 26 | 0 |
4800 | 12 | 381 | 10 | 0 |
4800 | 16 | 214 | 15 | 0 |
4980 | 10 | 318 | 25 | 0 |
5100 | 10 | 96 | 33 | 0 |
5100 | 12 | 36 | 15 | 0 |
5100 | 12 | 59 | 14 | 0 |
5100 | 10 | 115 | 1 | 0 |
5100 | 10 | 165 | 4 | 0 |
5100 | 15 | 123 | 12 | 0 |
5160 | 12 | 118 | 12 | 0 |
5220 | 10 | 102 | 29 | 0 |
5220 | 12 | 127 | 29 | 0 |
5280 | 10 | 90 | 11 | 0 |
5280 | 12 | 190 | 31 | 0 |
5280 | 12 | 107 | 11 | 0 |
5400 | 10 | 113 | 34 | 0 |
5400 | 12 | 128 | 33 | 0 |
5400 | 12 | 126 | 11 | 0 |
5400 | 12 | 112 | 33 | 0 |
5400 | 12 | 98 | 22 | 0 |
5400 | 12 | 82 | 29 | 0 |
5400 | 12 | 169 | 27 | 0 |
5400 | 12 | 124 | 31 | 0 |
5400 | 15 | 94 | 13 | 0 |
5400 | 15 | 49 | 27 | 0 |
5400 | 15 | 121 | 21 | 0 |
5400 | 15 | 122 | 33 | 0 |
5520 | 12 | 97 | 17 | 0 |
5520 | 12 | 196 | 32 | 0 |
5580 | 12 | 133 | 30 | 0 |
5640 | 12 | 155 | 9 | 0 |
5700 | 12 | 123 | 23 | 0 |
5700 | 12 | 117 | 25 | 0 |
5700 | 15 | 151 | 17 | 0 |
5700 | 15 | 161 | 11 | 0 |
5700 | 15 | 241 | 34 | 0 |
6000 | 12 | 121 | 30 | 0 |
6000 | 15 | 244 | 22 | 0 |
6120 | 12 | 209 | 21 | 0 |
6300 | 15 | 187 | 30 | 0 |
6300 | 15 | 231 | 33 | 0 |
The Excel file BankData shows the values of the following variables for randomly selected 93 employees of a large bank. This real data set was used in a court lawsuit against discrimination.
Let
Dependent variable y = monthly salary in dollars (SALARY),
There are four independent variables.
x1 = years of schooling at the time of hire (EDUCAT),
x2 = number of months of previous work experience (EXPER),
x3 = number of months that the individual was hired by the bank (MONTHS),
x4 = dummy variable coded 1 for males and 0 for females (MALE).
Using the t-test studied in Section 10.2, you could find some evidence that the mean salary of all male employees is greater than the mean salaries of all female employees, and hence provide some support for a discrimination suit against the employer. It is recognized, however, that asimplecomparison of the mean salaries might be insufficient to conclude that the female employees have been discriminated against. Obviously there are other factors that affect the salary. These factors have been identified as and defined above.
Assume the following multiple linear regression model,
,and apply Regression in Data Analysis of Excel (see pages 312 – 314) to find the estimated regression equation
.Note. Of course, Input Y Range is A1:A94, Input X range is B1:E94, and Labels should be checked.
THis is the problem of multiple linear regression.
We can do multiple regression in EXCEL.
steps :
ENTER data into EXCEL sheet --> Data --> Data analysis --> Regression --> ok --> Input y range : Salary --> INput X range : Select all the independent variables --> Labels --> COnfidence level : 99% --> Output range : Select one empty cell --> ok --> ok
1. Clearly show the estimated regression equation. What is the percentage of variation in the salary explained by this equation? Assuming that the values of and are fixed, what is the estimated difference between the predicted salaries of male and female employees?
The regression equation is,
y = 3389.11 + 90.84*x1 + 2.97*x2 + 16.54*x3 + 720.44*x4
Rsq = 0.7230 = 72.30%
Interpretation of R square : 72.30% of the variation in the salary explained by variation in independent variables.
2. What salary would you predict for a male employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What salary would you predict for a female employee with 12 years educations, 10 months of previous work experience, and with the time hired equal to 15 months? What is the difference between the two predicted salaries? Compare this difference with that stated in Task 1.
Now we have to predict salary for male (y) when x1 = 12 years
x2 = 10 months
x3 = 15 months
x4 = 1
We can find salary (y) using regression equation.
y = 3389.11 + 90.84*x1 + 2.97*x2 + 16.54*x3 + 720.44*x4
= 3389.11 + 90.84*12 + 2.97*10 + 16.54*15 + 720.44*1
= 5477.42
Now we have to predict salary for female (y) when x1 = 12 years
x2 = 10 months
x3 = 15 months
x4 = 0
We can find salary (y) using regression equation.
y = 3389.11 + 90.84*x1 + 2.97*x2 + 16.54*x3 + 720.44*x4
= 3389.11 + 90.84*12 + 2.97*10 + 16.54*15 + 720.44*0
= 4756.98
DIfference in salaries = salary for male - salary for female
= 5477.42 - 4756.98
= 720.44
3. Is there a significant difference in the predicted salaries for male and female employees after accounting for the effects of the three other independent variables? To answer this question, conduct the ttest for the significance of at a 1% level of significance. Clearly show the null and alternative hypotheses to be tested, the value of the test statistic, the p-value of the test, your conclusion and its interpretation
Here we have to test the hypothesis that,
H0 : B = 0 Vs H1 : B not = 0
where B is population slope for male.
Assume alpha = level of significance = 0.01
The test statistic follows t-distribution with n-2 degrees of freedoms.
From the output the test statistic for male is 8.12 and P-value = 2.7E-12 = 0.0000
P-value < alpha
Reject H0 at 1% level of significance.
COnclusion : The population slope for male is differ than 0.
We get significant result about t-test.