The personnel director for a small manufacturing company has collected the data in Table 1, which lists the salary (Y) earned by each machinist in the factory along with the average performance rating (X1) over the past 3 years, the years of service (X2), and the number of different machines each employee is certified to operate (X3).
The personnel director wants to build a regression model to estimate the average salary an employee should expect to receive based on his or her performance, years of service, and certifications.
a. Prepare three scatter plots showing the relationship between the salaries and each of the independent variables. What sort of relationship does each plot suggest?
b. If the personnel director wanted to build a regression model using only one independent variable to predict the salaries, what variable should be used?
c. If the personnel director wanted to build a regression model using only two independent variables to predict the salaries, what two variables should be used?
d. Compare the adjusted-R2 statistics obtained in parts b and c with that of a regression model using all three independent variables. Which model would you recommend that the personnel director use?
e. Suppose the personnel director chooses to use the regression function with all three independent variables. What is the estimated regression function?
f. Suppose the company considers an employee’s salary to be “fair” if it is within 1.5 standard errors of the value estimated by the regression function in part e. What salary range would be appropriate for an employee with 12 years of service, who has received average reviews of 4.5, and is certified to operate 4 pieces of machinery?
Obs | Salary | Avg Perf. | Years | Certifications
1 | 48.20 | 3.50 | 9 | 6
2 | 55.30 | 5.30 | 20 | 6
3 | 53.70 | 5.10 | 18 | 7
4 | 61.80 | 5.80 | 33 | 7
5 | 56.40 | 4.20 | 31 | 8
6 | 52.50 | 6.00 | 13 | 6
7 | 54.00 | 6.80 | 25 | 6
8 | 55.70 | 5.50 | 30 | 4
9 | 45.10 | 3.10 | 5 | 6
10 | 67.90 | 7.20 | 47 | 8
11 | 53.20 | 4.50 | 25 | 5
12 | 46.80 | 4.90 | 11 | 6
13 | 58.30 | 8.00 | 23 | 8
14 | 59.10 | 6.50 | 35 | 7
15 | 57.80 | 6.60 | 39 | 5
16 | 48.60 | 3.70 | 21 | 4
17 | 49.20 | 6.20 | 7 | 6
18 | 63.00 | 7.00 | 40 | 7
19 | 53.00 | 4.00 | 35 | 6
20 | 50.90 | 4.50 | 23 | 4
21 | 55.40 | 5.90 | 33 | 5
22 | 51.80 | 5.60 | 27 | 4
23 | 60.20 | 4.80 | 34 | 8
24 | 50.10 | 3.90 | 15 | 5
a. Prepare three scatter plots showing the relationship between the salaries and each of the independent variables. What sort of relationship does each plot suggest?
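As a numeric companion to the three scatter plots, the sketch below computes the Pearson correlation between salary and each independent variable from the Table 1 data. The variable names (`salary`, `avg_perf`, `years`, `certs`) and the `pearson` helper are illustrative choices, not part of the original problem. A correlation only measures linear association, so it supplements the plots rather than replacing them.

```python
from math import sqrt

# Table 1 data, one list per column (names are illustrative).
salary = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
          58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
avg_perf = [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
            8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9]
years = [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
         23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15]
certs = [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
         8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


columns = {"Avg Perf.": avg_perf, "Years": years, "Certifications": certs}
correlations = {name: pearson(col, salary) for name, col in columns.items()}

for name, r in correlations.items():
    print(f"{name:15s} r = {r:+.3f}")
```

A positive `r` corresponds to an upward-sloping cloud of points in the matching scatter plot, and values closer to 1 indicate a tighter linear pattern.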
b. If the personnel director wanted to build a regression model using only one independent variable to predict the salaries, what variable should be used?
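---> Years. The Minitab output below fits Avg Perf. alone and explains only 44.50% of the variation in salary. By the part c ANOVA, a Years-only model has regression SS = 570.53 - 62.46 = 508.07, so its R-sq is 508.07 / 689.26 = 73.7%, well above 44.50%.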
Regression Analysis: Salary versus Avg Perf.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 1 306.732 306.732 17.64 0.000
Avg Perf. 1 306.732 306.732 17.64 0.000
Error 22 382.528 17.388
Lack-of-Fit 21 379.883 18.090 6.84 0.294
Pure Error 1 2.645 2.645
Total 23 689.260
Model Summary
S R-sq R-sq(adj) R-sq(pred)
4.16985 44.50% 41.98% 33.65%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 39.35 3.71 10.62 0.000
Avg Perf. 2.828 0.673 4.20 0.000 1.00
Regression Equation
Salary = 39.35 + 2.828 Avg Perf.
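To compare the single-variable candidates side by side, the sketch below fits a simple linear regression of salary on each independent variable using the closed-form least-squares formulas and reports R-sq and R-sq(adj). The data are copied from Table 1. The list names and the `simple_fit` helper are illustrative assumptions and are not taken from Minitab.

```python
# Table 1 data, one list per column (names are illustrative).
salary = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
          58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
predictors = {
    "Avg Perf.": [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
                  8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9],
    "Years": [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
              23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15],
    "Certifications": [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
                       8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5],
}


def simple_fit(x, y):
    """Return (intercept, slope, r_sq, adj_r_sq) for the fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)       # total sum of squares
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sst - slope * sxy                   # error sum of squares
    r_sq = 1 - sse / sst
    adj_r_sq = 1 - (sse / (n - 2)) / (sst / (n - 1))
    return intercept, slope, r_sq, adj_r_sq


fits = {name: simple_fit(x, salary) for name, x in predictors.items()}
best = max(fits, key=lambda name: fits[name][3])  # highest adjusted R-sq

for name, (b0, b1, r_sq, adj) in fits.items():
    print(f"{name:15s} Salary = {b0:.2f} + {b1:.4f} x   "
          f"R-sq = {r_sq:.2%}   R-sq(adj) = {adj:.2%}")
print("Best single predictor by R-sq(adj):", best)
```

The Avg Perf. row should agree with the Minitab coefficients and model summary above. The other two rows show how the alternatives compare.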
c. If the personnel director wanted to build a regression model using only two independent variables to predict the salaries, what two variables should be used?
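-----> Years and Certifications. Fitting every pair of independent variables to the Table 1 data gives an R-sq of about 86.8% for Years with Certifications, compared with 82.77% for the Avg Perf. and Years model shown in the output below.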
Regression Analysis: Salary versus Avg Perf., Years
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 2 570.53 285.263 50.45 0.000
Avg Perf. 1 62.46 62.458 11.05 0.003
Years 1 263.79 263.794 46.66 0.000
Error 21 118.73 5.654
Total 23 689.26
Model Summary
S R-sq R-sq(adj) R-sq(pred)
2.37781 82.77% 81.13% 78.06%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 38.25 2.12 18.04 0.000
Avg Perf. 1.443 0.434 3.32 0.003 1.28
Years 0.3412 0.0500 6.83 0.000 1.28
Regression Equation
Salary = 38.25 + 1.443 Avg Perf. + 0.3412 Years
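One way to check which pair of variables fits best is to fit all three two-variable models and compare them. The sketch below does this with a small least-squares routine that solves the normal equations by Gauss-Jordan elimination. The `ols` helper and the variable names are illustrative assumptions. In practice a statistics package such as Minitab would do the fitting.

```python
from itertools import combinations

# Table 1 data, one list per column (names are illustrative).
salary = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
          58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
predictors = {
    "Avg Perf.": [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
                  8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9],
    "Years": [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
              23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15],
    "Certifications": [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
                       8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5],
}


def ols(columns, y):
    """Fit y on the given columns plus an intercept; return (coef, r_sq, adj_r_sq)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    return coef, 1 - sse / sst, 1 - (sse / (n - k)) / (sst / (n - 1))


pair_fits = {pair: ols([predictors[name] for name in pair], salary)
             for pair in combinations(predictors, 2)}
best_pair = max(pair_fits, key=lambda pair: pair_fits[pair][2])

for pair, (coef, r_sq, adj) in pair_fits.items():
    print(f"{' + '.join(pair):28s} R-sq = {r_sq:.2%}   R-sq(adj) = {adj:.2%}")
print("Best pair by R-sq(adj):", best_pair)
```

The Avg Perf. and Years row should reproduce the Minitab model summary above. Because all three models use two predictors, ranking them by R-sq or by R-sq(adj) gives the same order.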
d. Compare the adjusted-R2 statistics obtained in parts b and c with that of a regression model using all three independent variables. Which model would you recommend that the personnel director use?
----->
Model 1 (Avg Perf.): R-sq(adj) = 41.98%
Model 2 (Avg Perf., Years): R-sq(adj) = 81.13%
Model 2 is the better of these two models because its adjusted R-square is higher than that of Model 1.
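Parts d, e and f also need the model with all three independent variables. The sketch below fits it to the Table 1 data, reports its R-sq(adj) and standard error S, and then computes the part f "fair" range as the predicted salary plus or minus 1.5 times S. Treating "standard errors" as the standard error of the regression (Minitab's S) is an assumption, and the `ols` helper and variable names are illustrative.

```python
# Table 1 data, one list per column (names are illustrative).
salary = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
          58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
avg_perf = [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
            8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9]
years = [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
         23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15]
certs = [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
         8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5]


def ols(columns, y):
    """Fit y on the columns plus an intercept; return (coef, r_sq, adj_r_sq, s)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - k)
    return coef, 1 - sse / sst, 1 - mse / (sst / (n - 1)), mse ** 0.5


coef, r_sq, adj_r_sq, s = ols([avg_perf, years, certs], salary)
print(f"Salary = {coef[0]:.3f} + {coef[1]:.3f} Avg Perf. "
      f"+ {coef[2]:.4f} Years + {coef[3]:.3f} Certifications")
print(f"R-sq = {r_sq:.2%}   R-sq(adj) = {adj_r_sq:.2%}   S = {s:.4f}")

# Part f: average review 4.5, 12 years of service, 4 certifications.
employee = [4.5, 12, 4]
predicted = coef[0] + sum(b * x for b, x in zip(coef[1:], employee))
low, high = predicted - 1.5 * s, predicted + 1.5 * s
print(f"Predicted salary = {predicted:.2f}, fair range = {low:.2f} to {high:.2f}")
```

The printed R-sq(adj) can be set against the 41.98% and 81.13% figures above to complete the comparison in part d.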