ABC, Inc. is undergoing scrutiny for a possible wage discrimination suit. The following data are available: SALARY (monthly salary for each employee, in $), YEARS (years with the company), POSITION (position with the company, coded as: 1 = manual labor, 2 = secretary, 3 = lab technician, 4 = chemist, 5 = management), EDUCAT (amount of education completed, coded as: 1 = high school degree, 2 = some college, 3 = college degree, 4 = graduate degree), GENDER (employee gender).
SALARY | YEARS | POSITION | EDUCAT | GENDER |
1720 | 6 | 3 | 2 | female |
2400 | 4.9 | 1 | 1 | male |
1600 | 4.2 | 2 | 2 | female |
2900 | 3.7 | 4 | 3 | female |
1200 | 1.6 | 3 | 1 | female |
1000 | 0.3 | 3 | 1 | female |
2900 | 1 | 4 | 3 | male |
2400 | 1.8 | 4 | 3 | male |
1900 | 6.8 | 3 | 1 | female |
2200 | 1.2 | 4 | 3 | male |
1000 | 0.3 | 3 | 1 | female |
900 | 0.2 | 3 | 1 | female |
1250 | 0.6 | 3 | 1 | female |
950 | 0.5 | 3 | 1 | female |
2000 | 0.7 | 4 | 3 | male |
2000 | 1.9 | 4 | 3 | male |
1900 | 1.6 | 1 | 1 | male |
1000 | 1.4 | 3 | 1 | female |
1000 | 1.4 | 3 | 1 | female |
2800 | 3.4 | 4 | 3 | female |
2900 | 3.5 | 4 | 3 | male |
1550 | 3.1 | 3 | 1 | female |
1550 | 3 | 2 | 1 | female |
2200 | 2.5 | 4 | 3 | male |
1650 | 2.2 | 1 | 1 | male |
2200 | 2 | 4 | 3 | male |
900 | 0.5 | 3 | 1 | female |
1000 | 0.5 | 3 | 2 | female |
1220 | 2 | 3 | 1 | female |
2100 | 0.5 | 4 | 3 | male |
900 | 0.5 | 3 | 1 | female |
900 | 0.2 | 3 | 1 | female |
2000 | 0.5 | 4 | 3 | male |
2330 | 0.6 | 4 | 3 | male |
2400 | 0.3 | 4 | 3 | male |
900 | 1 | 1 | 1 | male |
1069 | 0.5 | 3 | 1 | female |
1400 | 0.5 | 1 | 1 | male |
1650 | 1 | 1 | 1 | male |
1200 | 0.3 | 1 | 1 | male |
3500 | 13.5 | 5 | 4 | male |
1750 | 11 | 5 | 3 | female |
4000 | 6.4 | 5 | 3 | male |
1800 | 7.2 | 2 | 1 | female |
4000 | 6.1 | 5 | 3 | male |
4600 | 5.8 | 5 | 4 | male |
1350 | 5.1 | 4 | 3 | male |
a) Briefly summarize (present & calculate) the descriptive statistics of the data.
We have to find the descriptive statistics for Salary and Years.
Go to Megastat>Descriptive Statistics.
Select the Input Range and click OK.
Descriptive statistics for Salary is:
SALARY | |
count | 47 |
mean | 1,873.17 |
sample standard deviation | 900.56 |
sample variance | 811,009.32 |
minimum | 900 |
maximum | 4600 |
range | 3700 |
skewness | 1.17 |
kurtosis | 1.20 |
coefficient of variation (CV) | 48.08% |
1st quartile | 1,134.50 |
median | 1,720.00 |
3rd quartile | 2,265.00 |
interquartile range | 1,130.50 |
mode | 1,000.00 |
Therefore, we can say that the average salary for an employee is $1,873.17. The minimum salary for an employee is $900 and the maximum salary for an employee is $4,600.
Descriptive statistics for Years is:
YEARS | |
count | 47 |
mean | 2.634 |
sample standard deviation | 2.911 |
sample variance | 8.476 |
minimum | 0.2 |
maximum | 13.5 |
range | 13.3 |
skewness | 1.867 |
kurtosis | 3.916 |
coefficient of variation (CV) | 110.53% |
1st quartile | 0.500 |
median | 1.600 |
3rd quartile | 3.600 |
interquartile range | 3.100 |
mode | 0.500 |
Therefore, we can say that the average number of years an employee has worked for the company is 2.634 years. The shortest tenure is 0.2 years and the longest is 13.5 years.
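For readers without MegaStat, the Salary summary above can be cross-checked with a short script. This is only a sketch using Python's standard `statistics` module; the `describe` helper and its dictionary keys are illustrative names, not part of the original answer, and quartiles are left out because different tools use different quartile conventions.

```python
import statistics

# SALARY column, transcribed from the data table above (47 employees)
salary = [
    1720, 2400, 1600, 2900, 1200, 1000, 2900, 2400, 1900, 2200,
    1000, 900, 1250, 950, 2000, 2000, 1900, 1000, 1000, 2800,
    2900, 1550, 1550, 2200, 1650, 2200, 900, 1000, 1220, 2100,
    900, 900, 2000, 2330, 2400, 900, 1069, 1400, 1650, 1200,
    3500, 1750, 4000, 1800, 4000, 4600, 1350,
]

def describe(values):
    """Return a few of the summary statistics reported above (hypothetical helper)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "count": len(values),
        "mean": mean,
        "stdev": sd,
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "median": statistics.median(values),
        "cv_percent": 100 * sd / mean,  # coefficient of variation
    }

summary = describe(salary)
for name, value in summary.items():
    print(f"{name:>10}: {value:,.2f}")
```

The same helper can be applied to the YEARS column to cross-check the second table.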
b) Interpret and evaluate the model coefficients for the management team and corporate lawyer of ABC, Inc.
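The variables in the model are defined as follows:
Let x1 represent the number of years an employee has worked for the company.
Let x2 represent the position of an employee.
Let x3 represent the education of an employee.
Let x4 represent the gender of an employee, coded as 1 for female and 0 for male.
Using the fitted coefficients from the regression output in part (c), and holding the other variables constant: each additional year with the company is associated with about $102.07 more in monthly salary; each one-step increase in the position code with about $111.63 more; each one-step increase in the education code with about $300.86 more; and female employees are estimated to earn about $579.76 less per month than male employees.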
c) Test for significance of relationships (both individually and jointly for the overall model). Use a 10% level of significance. Please note if you are performing a two-tailed test or one-tailed test and justify.
Let us test for significance of relationship jointly for the overall model.
Go to Megastat>Correlation/Regression.
Select the Input Range and click OK.
For the Input Range, select the independent variable(s), X as Years, Position, Education and Gender column.
For the Input Range, select the dependent variable, Y as Salary column.
The output is as follows:
R² | 0.700 | |||||
Adjusted R² | 0.671 | n | 47 | |||
R | 0.837 | k | 4 | |||
Std. Error | 516.205 | Dep. Var. | SALARY | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 26,114,770.0905 | 4 | 6,528,692.5226 | 24.50 | 1.64E-10 |
Residual | 11,191,658.5478 | 42 | 266,468.0607 | |||
Total | 37,306,428.6383 | 46 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=42) | p-value | 95% lower | 95% upper |
Intercept | 945.2355 | |||||
YEARS | 102.0659 | 29.4081 | 3.471 | .0012 | 42.7179 | 161.4138 |
POSITION | 111.6250 | 139.0671 | 0.803 | .4267 | -169.0237 | 392.2738 |
EDUCAT | 300.8649 | 195.4214 | 1.540 | .1312 | -93.5113 | 695.2412 |
GENDER | -579.7620 | 239.3983 | -2.422 | .0198 | -1,062.8873 | -96.6367 |
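At a significance level of 0.10, the overall model is significant because the p-value of the F test (1.64E-10) is less than the significance level. The F test is a one-tailed (right-tailed) test, since only large values of F indicate that the predictors jointly explain variation in salary.
Therefore, we can say there is a relationship between Salary and the Years, Position, Education and Gender of an employee taken jointly. Within this model, the coefficient t-tests (two-tailed, since a coefficient could differ from zero in either direction) show that YEARS (p = .0012) and GENDER (p = .0198) are individually significant at the 10% level, while POSITION (p = .4267) and EDUCAT (p = .1312) are not.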
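The summary figures in the output (F, R², adjusted R² and the standard error) all follow from the sums of squares in the ANOVA table. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
import math

# Sums of squares copied from the ANOVA table above
ss_regression = 26_114_770.0905
ss_residual = 11_191_658.5478
n, k = 47, 4  # number of observations, number of predictors

ss_total = ss_regression + ss_residual
df_regression = k
df_residual = n - k - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual          # overall F statistic
r_squared = ss_regression / ss_total          # share of variation explained
adj_r_squared = 1 - ms_residual / (ss_total / (n - 1))
std_error = math.sqrt(ms_residual)            # standard error of the estimate

print(f"F = {f_stat:.2f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"Std. Error = {std_error:.3f}")
```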
The regression equation for the model is:
y = 945.2355 + 102.0659*x1 + 111.6250*x2 + 300.8649*x3 - 579.7620*x4
Or
Salary = 945.2355 + 102.0659*Years+ 111.6250*Position + 300.8649*Education - 579.7620*Gender
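Each t value in the regression output is the coefficient divided by its standard error, and the decision at the 10% level is a comparison of the p-value with 0.10. The sketch below redoes that check from the reported numbers; it assumes the p-values in the output are two-tailed, as is usual for regression software, and the names `output` and `results` are illustrative.

```python
ALPHA = 0.10  # significance level given in the question

# (coefficient, standard error, reported p-value) from the regression output above
output = {
    "YEARS": (102.0659, 29.4081, 0.0012),
    "POSITION": (111.6250, 139.0671, 0.4267),
    "EDUCAT": (300.8649, 195.4214, 0.1312),
    "GENDER": (-579.7620, 239.3983, 0.0198),
}

results = {}
for name, (coef, std_err, p_value) in output.items():
    t_stat = coef / std_err            # t = coefficient / standard error
    results[name] = (t_stat, p_value < ALPHA)

for name, (t_stat, significant) in results.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name:<9} t = {t_stat:7.3f}  ->  {verdict} at alpha = {ALPHA}")

significant_vars = [name for name, (_, sig) in results.items() if sig]
```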
Let us test for significance of relationship for Salary and Years.
Go to Megastat>Correlation/Regression.
Select the Input Range and click OK.
For the Input Range, select the independent variable(s), X as Years column.
For the Input Range, select the dependent variable, Y as Salary column.
The output is as follows:
r² | 0.276 | n | 47 | |||
r | 0.526 | k | 1 | |||
Std. Error | 774.603 | Dep. Var. | SALARY | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 10,306,021.8929 | 1 | 10,306,021.8929 | 17.18 | .0001 |
Residual | 27,000,406.7454 | 45 | 600,009.0388 | |||
Total | 37,306,428.6383 | 46 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=45) | p-value | 95% lower | 95% upper |
Intercept | 1,444.9179 | |||||
YEARS | 162.5837 | 39.2293 | 4.144 | .0001 | 83.5719 | 241.5955 |
At a significance level of 0.10, we can say that the result is significant as the p-value (0.0001) is less than the significance level.
Therefore, we can say there is a relationship between Salary and Years of an employee.
The regression equation for the model is:
y = 1,444.9179 + 162.5837*x1
Or
Salary = 1,444.9179 + 162.5837*Years
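The slope and intercept of this one-variable model can also be reproduced by hand from the least-squares formulas, slope = Sxy / Sxx and intercept = ȳ − slope·x̄. The following is a sketch in plain Python; `simple_ols` is a hypothetical helper name, and the two lists are transcribed from the data table.

```python
# YEARS and SALARY columns, transcribed from the data table above
years = [
    6, 4.9, 4.2, 3.7, 1.6, 0.3, 1, 1.8, 6.8, 1.2,
    0.3, 0.2, 0.6, 0.5, 0.7, 1.9, 1.6, 1.4, 1.4, 3.4,
    3.5, 3.1, 3, 2.5, 2.2, 2, 0.5, 0.5, 2, 0.5,
    0.5, 0.2, 0.5, 0.6, 0.3, 1, 0.5, 0.5, 1, 0.3,
    13.5, 11, 6.4, 7.2, 6.1, 5.8, 5.1,
]
salary = [
    1720, 2400, 1600, 2900, 1200, 1000, 2900, 2400, 1900, 2200,
    1000, 900, 1250, 950, 2000, 2000, 1900, 1000, 1000, 2800,
    2900, 1550, 1550, 2200, 1650, 2200, 900, 1000, 1220, 2100,
    900, 900, 2000, 2330, 2400, 900, 1069, 1400, 1650, 1200,
    3500, 1750, 4000, 1800, 4000, 4600, 1350,
]

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = simple_ols(years, salary)
print(f"Salary = {intercept:,.4f} + {slope:.4f}*Years")
```

The same function can be reused for the Position and Education columns.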
Let us test for significance of relationship for Salary and Position.
Go to Megastat>Correlation/Regression.
Select the Input Range and click OK.
For the Input Range, select the independent variable(s), X as Position column.
For the Input Range, select the dependent variable, Y as Salary column.
The output is as follows:
r² | 0.331 | n | 47 | |||
r | 0.575 | k | 1 | |||
Std. Error | 744.827 | Dep. Var. | SALARY | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 12,341,902.8476 | 1 | 12,341,902.8476 | 22.25 | 2.35E-05 |
Residual | 24,964,525.7907 | 45 | 554,767.2398 | |||
Total | 37,306,428.6383 | 46 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=45) | p-value | 95% lower | 95% upper |
Intercept | 487.8999 | |||||
POSITION | 436.9645 | 92.6425 | 4.717 | 2.35E-05 | 250.3728 | 623.5561 |
At a significance level of 0.10, we can say that the result is significant as the p-value (2.35E-05) is less than the significance level.
Therefore, we can say there is a relationship between Salary and Position of an employee.
The regression equation for the model is:
y = 487.8999 + 436.9645*x2
Or
Salary = 487.8999 + 436.9645*Position
Let us test for significance of relationship for Salary and Education.
Go to Megastat>Correlation/Regression.
Select the Input Range and click OK.
For the Input Range, select the independent variable(s), X as Education column.
For the Input Range, select the dependent variable, Y as Salary column.
The output is as follows:
r² | 0.591 | n | 47 | |||
r | 0.769 | k | 1 | |||
Std. Error | 581.968 | Dep. Var. | SALARY | |||
ANOVA table | ||||||
Source | SS | df | MS | F | p-value | |
Regression | 22,065,549.6728 | 1 | 22,065,549.6728 | 65.15 | 2.71E-10 |
Residual | 15,240,878.9655 | 45 | 338,686.1992 | |||
Total | 37,306,428.6383 | 46 | ||||
Regression output | confidence interval | |||||
variables | coefficients | std. error | t (df=45) | p-value | 95% lower | 95% upper |
Intercept | 571.7059 | |||||
EDUCAT | 664.8785 | 82.3728 | 8.072 | 2.71E-10 | 498.9712 | 830.7858 |
At a significance level of 0.10, we can say that the result is significant as the p-value (2.71E-10) is less than the significance level.
Therefore, we can say there is a relationship between Salary and Education of an employee.
The regression equation for the model is:
y = 571.7059 + 664.8785*x3
Or
Salary = 571.7059 + 664.8785*Education
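For the three one-variable models reported so far, r² and F can be recomputed from the regression sum of squares and the shared total sum of squares, which makes it easy to compare how much of the salary variation each predictor explains on its own. A rough sketch (the function name `fit_summary` is illustrative):

```python
SS_TOTAL = 37_306_428.6383  # total sum of squares for SALARY (same in every model)
N = 47                      # number of employees

# Regression sums of squares from the three simple-regression ANOVA tables above
ss_regression = {
    "YEARS": 10_306_021.8929,
    "POSITION": 12_341_902.8476,
    "EDUCAT": 22_065_549.6728,
}

def fit_summary(ssr, sst=SS_TOTAL, n=N):
    """Return (r squared, F) for a regression with a single predictor."""
    sse = sst - ssr                 # residual sum of squares
    r_squared = ssr / sst
    f_stat = ssr / (sse / (n - 2))  # MSR / MSE with 1 and n - 2 degrees of freedom
    return r_squared, f_stat

summaries = {name: fit_summary(ssr) for name, ssr in ss_regression.items()}
for name, (r_squared, f_stat) in summaries.items():
    print(f"{name:<9} r^2 = {r_squared:.3f}   F = {f_stat:.2f}")
```

On its own, Education explains the largest share of the variation in salary (about 59%), followed by Position (about 33%) and Years (about 28%).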
Let us test for significance of relationship for Salary and Gender.
Go to Megastat>Correlation/Regression.
Select the Input Range and click OK.
For the Input Range, select the independent variable(s), X as Gender column.
For the Input Range, select the dependent variable, Y as Salary column.
(The output is not shown here because of the character limit per question.)
At a significance level of 0.10, we can say that the result is significant as the p-value (reported as 0.0000) is less than the significance level.
Therefore, we can say there is a relationship between Salary and Gender of an employee.
The regression equation for the model is:
y = 2,340.8333 - 955.6594*x4
Or
Salary = 2,340.8333 - 955.6594*Gender
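Because Gender is a 0/1 variable, this one-variable regression is just a comparison of group means: the intercept is the average male salary (x4 = 0) and the slope is the average female salary minus the average male salary. A short sketch that checks the equation this way, with the gender column abbreviated to F/M codes for compactness (the variable names are illustrative):

```python
# SALARY column, transcribed from the data table above
salary = [
    1720, 2400, 1600, 2900, 1200, 1000, 2900, 2400, 1900, 2200,
    1000, 900, 1250, 950, 2000, 2000, 1900, 1000, 1000, 2800,
    2900, 1550, 1550, 2200, 1650, 2200, 900, 1000, 1220, 2100,
    900, 900, 2000, 2330, 2400, 900, 1069, 1400, 1650, 1200,
    3500, 1750, 4000, 1800, 4000, 4600, 1350,
]
# GENDER column in the same row order: F = female, M = male
gender_codes = "FMFFFFMMFM" "FFFFMMMFFF" "MFFMMMFFFM" "FFMMMMFMMM" "MFMFMMM"

# Dummy coding used in the answer: 1 = female, 0 = male
x4 = [1 if code == "F" else 0 for code in gender_codes]

male = [s for s, g in zip(salary, x4) if g == 0]
female = [s for s, g in zip(salary, x4) if g == 1]

mean_male = sum(male) / len(male)
mean_female = sum(female) / len(female)

intercept = mean_male                # predicted salary when x4 = 0
slope = mean_female - mean_male      # change in predicted salary when x4 = 1

print(f"{len(male)} male employees, mean salary {mean_male:,.2f}")
print(f"{len(female)} female employees, mean salary {mean_female:,.2f}")
print(f"Salary = {intercept:,.4f} + ({slope:.4f})*Gender")
```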
d) Demonstrate how ABC, Inc., could use the model for predicting employee salary. Include a sample computation.
The model for predicting employee salary has a regression equation:
y = 945.2355 + 102.0659*x1 + 111.6250*x2 + 300.8649*x3 - 579.7620*x4
Or
Salary = 945.2355 + 102.0659*Years+ 111.6250*Position + 300.8649*Education - 579.7620*Gender
A sample computation:
Suppose we want to predict the salary of a female employee who has worked for the company for 10 years as a secretary and has a graduate degree.
That is, we are given:
Years = 10
Position = 2
Education = 4
Gender = 1
Therefore, the predicted salary for this employee is:
Salary = 945.2355 + 102.0659*10+ 111.6250*2 + 300.8649*4 - 579.7620*1
Salary = $2,812.84
Therefore, the predicted monthly salary for a female employee who has worked for the company for 10 years as a secretary and has a graduate degree is about $2,812.84.
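If ABC, Inc. wanted to apply the equation to many employees, it could be wrapped in a small function. This is only a sketch: `predict_salary` and its parameter names are made up for illustration, and the coefficients are the rounded values from the regression output, so results can differ from MegaStat's in the last decimals.

```python
# Rounded coefficients from the multiple regression output
COEFFICIENTS = {
    "intercept": 945.2355,
    "years": 102.0659,
    "position": 111.6250,
    "education": 300.8649,
    "gender": -579.7620,  # applies when gender is coded 1 (female)
}

def predict_salary(years, position, education, gender):
    """Predicted monthly salary in $; gender is 1 for female, 0 for male."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["years"] * years
        + c["position"] * position
        + c["education"] * education
        + c["gender"] * gender
    )

# The sample case above: female secretary, 10 years, graduate degree
sample = predict_salary(years=10, position=2, education=4, gender=1)
print(f"Predicted salary: ${sample:,.2f}")
```

Predictions are most trustworthy for combinations of Years, Position and Education similar to those in the data; the sample case (a secretary with a graduate degree and 10 years of service) lies outside what was observed, so it should be read as an illustration of the arithmetic.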