In: Statistics and Probability
A survey conducted by a research team was to investigate how the education level, tenurein current employment, and age are related to annual income. A sample of 20 employees is selected and the data is given below. | |||||||||||
Education (no. of years) | length of tenure in current employment in years | Age in years | Annual income dollars | ||||||||
17 | 8 | 40 | 124000 | ||||||||
12 | 12 | 41 | 30000 | ||||||||
20 | 9 | 44 | 193000 | ||||||||
14 | 4 | 42 | 88000 | ||||||||
12 | 1 | 19 | 27000 | ||||||||
14 | 9 | 28 | 43000 | ||||||||
12 | 8 | 43 | 96000 | ||||||||
18 | 10 | 37 | 110000 | ||||||||
16 | 12 | 36 | 88000 | ||||||||
11 | 7 | 39 | 36000 | ||||||||
16 | 14 | 36 | 81000 | ||||||||
12 | 4 | 22 | 38000 | ||||||||
16 | 17 | 45 | 140000 | ||||||||
13 | 7 | 42 | 11000 | ||||||||
11 | 6 | 18 | 21000 | ||||||||
20 | 4 | 40 | 151000 | ||||||||
19 | 7 | 35 | 124000 | ||||||||
16 | 12 | 38 | 48000 | ||||||||
12 | 2 | 19 | 26000 | ||||||||
10 | 6 | 44 | 124000 | ||||||||
a.using the excel out provided, please answer the following question. What is the estimated regression equation? | |||||||||||
b. please verify that their model is significant at the 0.05 level of significance | |||||||||||
c. is each independent variable significant? Conduct t-test using p-value | |||||||||||
d.what is the meaning of r square | |||||||||||
e. interpret the estimated coefficient (slopes) | |||||||||||
f.what is your recommended regression model (equation) | |||||||||||
g.estimate annual income for a 35-old individual with 20 years of education and 5 years of tenure. | |||||||||||
a.
Using Data analysis package in Excel, we run regression on the above data, we get the output as :
So, Estimated regression equation is :
Y = - 143481.1924 + 10011.92124 X1 - 2193.883764 X2 + 2689.240517 X3
Here, Y = Annual income dollars
X1 = Education (no. of years)
X2 = length of tenure in current employment in years
X3 = Age in years
b. We can see from the ANOVA table that Significance value F = 0.000393507 which is also the P-value.
Since P-value < 0.05 ( level of significance) , we conclude that the model is significant at the 0.05 level of significance.
c.
Null hypothesis Ho : Indepedent variable is not significant
Alternative hypothesis H1: Indepedent variable is significant
Now we consider the P- value for each variable.
For X1 = Education (no. of years), P-value = 0.001287795 < 0.05, we reject Ho and conclude that the independent variable "Education " is significant at 0.05 level of significance.
For X2 = length of tenure in current employment in years , P-value = 0.324637926 > 0.05, we do not reject Ho and conclude that the independent variable "length of tenure in current employment" is not significant at 0.05 level of significance.
For X3 = Age in years, P-value = 0.014939599 < 0.05, we reject Ho and conclude that the independent variable "Age" is significant at 0.05 level of significance.
d.
R square is the proportion of the variance in the dependent variable that is predictable from the independent variable.
In our case, R-square = 0.670117089 = 67 % (approx)
Hence, 67% of the variance in the dependent variable that is predictable from the conisdered independent variable.
e.
We have , Estimated regression equation :
Y = - 143481.1924 + 10011.92124 X1 - 2193.883764 X2 + 2689.240517 X3
Coefficient of X1 = 10011.92124 which means that with a unit change in the value of X1 there is an average increase of 10011.92124 in the value of Y.
Coefficient of X2 = - 2193.883764 which means that with a unit change in the value of X2 there is an average decrease of 2193.883764 in the value of Y.
Coefficient of X3 = 2689.240517 which means that with a unit change in the value of X3 there is an average increase of 2689.240517 in the value of Y.
f.
Recommended regression model would be to just remove the variable X2 = length of tenure in current employment in years from ourr model since it is coming out to be insignificant.
When we again run the model after removing X2 we get th equation as :
Y = -138671.6837 + 9588.89332 X1 + 2234.556096 X3
g.
We use the estimated regression equation in the part (a) :
Y = - 143481.1924 + 10011.92124 X1 - 2193.883764 X2 + 2689.240517 X3
We are given, X1 = 20 , X2 = 5 , X3 = 35
Y = - 143481.1924 + 10011.92124 * 20 - 2193.883764 * 5 + 2689.240517 * 35 = 139911.2
Estimated Annual income in dollars = 139911.2