In: Statistics and Probability
You are trying to find a relationship between the productivity
of your sales staff and some other factors such as the number of
training hours, age, and the current annual salary. You have
collected the data of 37 employees. The Output is the percent of
actual sales given the sales leads. You want to study the impact of
other variables on this Output. For example, does age matter? And
if it does, are the younger employees better performers than the
older employees?
Output Data
Employee Output (O)
Training hours(T)
Age (A) Income (I)
1
58.8 24
60 22.1
2
58.7 25
59 24.6
3
58.2 25
59 25.1
4 57
25
53 28
5
56.5 25
52 28.3
6
55.2 27
50 29.7
7
54.7 30
50 32.1
8
54.5 30
49 32.6
9
52.8 30
49 33.7
10
51.1 31
49 44.1
11
44.8 32
48 42.9
12 44
33 47
40.7
13
59.4 39
47 40.2
14
59.8 41
45 40
15
61.7 43
45 39.9
16
62.5 44
43 39.6
17
62.6 45
43 38.5
18
64.6 60
41 38.4
19
64.9 57
40 38
20
65.3 56
40 37.4
21
65.7 56
40 34.5
22
66.1 49
40 34.4
23
68.2 48
40 44.4
24
71.8 48
35 44.6
25
71.7 46
35 44.8
26
71.3 46
35 44.8
27
70.1 60
36 45.6
28
69.9 63
37 46.5
29
69.8 63
38 47.5
30
69.6 63
39 49
31
68.8 64
39 51.4
32
68.5 68
39 52.6
33
72.6 72
34 52.9
34
73.9 75
34 54.8
35
75.3 76
33 55.1
36
81.6 77
31 55.2
37
83.7 79
30 55.6
a) What portion of the variation in the Output is explained by the
3 independent variables?
b) What output would you expect for a 40-year old employee who has
50 hours of training and is making $55,450?
c) Based on this model, productivity
d) Based on this model, productivity
e) Comment on the quality of your regression result. That is, how
strong or weak it is?
First, let's insert the data into Microsoft Excel. We can run the regression test in Excel or any other statistical software of your choice. As the software of choice isn't mentioned, we will run it in Excel.
Insert the data -> Go to Data -> Data Analysis -> Regression -> Select the data -> Click OK
The output is given below:
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.877813 | |||||||
R Square | 0.770556 | |||||||
Adjusted R Square | 0.749697 | |||||||
Standard Error | 4.518052 | |||||||
Observations | 37 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 3 | 2262.265 | 754.0883 | 36.94194 | 1.17E-10 | |||
Residual | 33 | 673.6222 | 20.41279 | |||||
Total | 36 | 2935.887 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 84.10782 | 17.08399 | 4.923194 | 2.31E-05 | 49.35017 | 118.8655 | 49.35017 | 118.8655 |
T | 0.411947 | 0.103893 | 3.965116 | 0.000371 | 0.200576 | 0.623319 | 0.200576 | 0.623319 |
A | -0.5402 | 0.242575 | -2.22695 | 0.032888 | -1.03373 | -0.04668 | -1.03373 | -0.04668 |
I | -0.41191 | 0.178036 | -2.31361 | 0.027061 | -0.77412 | -0.04969 | -0.77412 | -0.04969 |
Let's answer the questions now:
a) The percent of variation explained by the three independent variables is the R-square value which is 0.770556 or 77.06%
b) The regression equation is:
O = 84.10782 + 0.411947 x T - 0.5402 x A - 0.41191 x I
For A = 40, T = 50, Income = 55.45
O = 60.26
The output expected is 60.26
c) The question is incomplete
d) The question is incomplete
e) Based on the regression model, the percent variation explanied
by the model is 77%. The p-value being <0.0001 means that the
regression model is significant. We can say that the model is
strong.
Also, looking at the p-values, we can say that all the variables
are significant in the regression equation as all the p-values are
< 0.05. We can say that age, training hours and income are
significant predictors of output.