Case Problem - Regression
Are you going to hate your new job?
Getting a new job can be exciting and uplifting. But what if you discover, after a short time on the job, that you hate it? Is there any way to determine ahead of time whether you will love or hate your new job? According to the Wall Street Journal, there are a few things to look for in the interview that might help you determine whether you will be happy in that job.
A study conducted by the University of Connecticut posed several questions to employees to ascertain their job satisfaction. Themes included: relationship with the supervisor, overall quality of the work environment, total weekly hours worked, and opportunity for advancement at the job. Nineteen employees were asked to rate their job satisfaction on a scale of 0-100, with 100 being perfectly satisfied. The results of the survey are as follows. Assume that the relationship with the supervisor is rated from 0-50, with 50 as excellent; overall workplace quality is rated from 0-100, with 100 representing an excellent environment; and opportunities for advancement are rated from 0-50, with 50 representing excellent opportunity.
Job            Relationship    Overall Quality   Total Hrs.   Opportunities
Satisfaction   w/ Supervisor   Work Environ      Worked/wk    Advancement
55             27              65                50           42
20             12              13                60           28
85             40              79                45           7
65             35              53                65           48
45             29              43                40           32
70             42              62                50           41
35             22              18                75           18
60             34              75                40           32
95             50              84                45           48
65             33              68                60           11
85             40              72                55           33
10             5               10                50           21
75             37              64                45           42
80             42              82                40           46
50             31              46                60           48
90             47              95                55           30
75             36              82                70           39
45             20              42                40           22
65             32              73                55           12
1. Develop a multiple regression model and analyze the data above related to job satisfaction. Use the four-step analytical process to analyze the data. Test at the 0.05 level of significance and discuss in detail.
2. Of the variables above that are related to job satisfaction, which are the stronger predictors of job satisfaction? Are other variables, not mentioned here, potentially related to job satisfaction? Discuss in detail.
Let y = job satisfaction, and let the predictors be:
x1 = relationship with supervisor
x2 = overall quality of the work environment
x3 = total hours worked per week
x4 = opportunities for advancement
We fit a multiple regression of y on x1, x2, x3, and x4 in R:
> y=c(55,20,85,65,45,70,35,60,95,65,85,10,75,80,50,90,75,45,65)
> x1=c(27,12,40,35,29,42,22,34,50,33,40,5,37,42,31,47,36,20,32)
> x2=c(65,13,79,53,43,62,18,75,84,68,72,10,64,82,46,95,82,42,73)
> x3=c(50,60,45,65,40,50,75,40,45,60,55,50,45,40,60,55,70,40,55)
> x4=c(42,28,7,48,32,41,18,32,48,11,33,21,42,46,48,30,39,22,12)
> m1=lm(y~x1+x2+x3+x4)
> summary(m1)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-8.3290 -3.0805  0.0917  2.7742  8.7235

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.46912    8.11613  -0.181 0.858953
x1           1.39078    0.26076   5.333 0.000106 ***
x2           0.31737    0.11614   2.733 0.016189 *
x3           0.04330    0.12123   0.357 0.726317
x4          -0.09446    0.10222  -0.924 0.371100
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.141 on 14 degrees of freedom
Multiple R-squared:  0.9617, Adjusted R-squared:  0.9507
F-statistic: 87.79 on 4 and 14 DF,  p-value: 9.412e-10
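The overall F test reported in the summary can be checked step by step at the 0.05 level. The sketch below is only an illustration: it assumes the "four step" process means hypotheses, test statistic, critical value and p-value, then decision, and the names `f_stat`, `f_crit`, `p_val`, and `reject_h0` are introduced here and are not part of the original session.

```r
# Survey data, copied from the session above
y  <- c(55,20,85,65,45,70,35,60,95,65,85,10,75,80,50,90,75,45,65)
x1 <- c(27,12,40,35,29,42,22,34,50,33,40,5,37,42,31,47,36,20,32)
x2 <- c(65,13,79,53,43,62,18,75,84,68,72,10,64,82,46,95,82,42,73)
x3 <- c(50,60,45,65,40,50,75,40,45,60,55,50,45,40,60,55,70,40,55)
x4 <- c(42,28,7,48,32,41,18,32,48,11,33,21,42,46,48,30,39,22,12)
m1 <- lm(y ~ x1 + x2 + x3 + x4)

# Step 1 (assumed): H0: all four slopes are 0; H1: at least one slope is not 0
alpha <- 0.05

# Step 2: test statistic and its degrees of freedom from the fitted model
f_stat <- summary(m1)$fstatistic   # named vector: value, numdf, dendf

# Step 3: critical value and p-value of the F distribution
f_crit <- qf(1 - alpha, f_stat["numdf"], f_stat["dendf"])
p_val  <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
             lower.tail = FALSE)

# Step 4: decision, reject H0 when the statistic exceeds the critical value
reject_h0 <- unname(f_stat["value"] > f_crit)
print(c(F = unname(f_stat["value"]), critical = unname(f_crit),
        p.value = unname(p_val)))
print(reject_h0)
```

The same comparison can be made for each coefficient by setting its t value against `qt(1 - alpha / 2, 14)`, since the model has 14 residual degrees of freedom.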
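The overall F test (F = 87.79 on 4 and 14 DF, p-value 9.412e-10) shows that the model as a whole is significant at the 0.05 level. Among the individual coefficients, only x1 (p = 0.000106) and x2 (p = 0.016189) have p-values below 0.05; x3 (p = 0.726317) and x4 (p = 0.371100) are not significant.
Thus job satisfaction depends on the relationship with the supervisor and the overall quality of the work environment.
Backward stepwise selection by AIC arrives at the same two predictors: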
> step(m1)
Start:  AIC=66.42
y ~ x1 + x2 + x3 + x4

       Df Sum of Sq     RSS    AIC
- x3    1      3.37  373.43 64.588
- x4    1     22.57  392.63 65.540
<none>               370.06 66.415
- x2    1    197.38  567.44 72.537
- x1    1    751.92 1121.98 85.490

Step:  AIC=64.59
y ~ x1 + x2 + x4

       Df Sum of Sq     RSS    AIC
- x4    1     24.41  397.85 63.791
<none>               373.43 64.588
- x2    1    196.83  570.27 70.632
- x1    1    787.83 1161.26 84.144

Step:  AIC=63.79
y ~ x1 + x2

       Df Sum of Sq     RSS    AIC
<none>               397.85 63.791
- x2    1    261.58  659.43 71.392
- x1    1    816.41 1214.26 82.992

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2
    -0.6167       1.3035       0.3387
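Whether dropping x3 and x4 costs anything can also be checked with a partial F test that compares the two-variable model against the full one. This is a minimal sketch under the same data; the names `m2`, `cmp`, `r2_full`, and `r2_reduced` are illustrative and do not appear in the original session.

```r
# Survey data, copied from the session above
y  <- c(55,20,85,65,45,70,35,60,95,65,85,10,75,80,50,90,75,45,65)
x1 <- c(27,12,40,35,29,42,22,34,50,33,40,5,37,42,31,47,36,20,32)
x2 <- c(65,13,79,53,43,62,18,75,84,68,72,10,64,82,46,95,82,42,73)
x3 <- c(50,60,45,65,40,50,75,40,45,60,55,50,45,40,60,55,70,40,55)
x4 <- c(42,28,7,48,32,41,18,32,48,11,33,21,42,46,48,30,39,22,12)

m1 <- lm(y ~ x1 + x2 + x3 + x4)   # full model
m2 <- lm(y ~ x1 + x2)             # model chosen by step()

# Partial F test: H0 is that the coefficients of x3 and x4 are both 0
cmp <- anova(m2, m1)
print(cmp)

# R-squared of each model, to see how much explanatory power is lost
r2_full    <- summary(m1)$r.squared
r2_reduced <- summary(m2)$r.squared
print(c(full = r2_full, reduced = r2_reduced))
```

A p-value above 0.05 in the second row of `cmp` means x3 and x4 do not jointly improve the fit, which agrees with the AIC-based selection.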
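Thus the final model is y = -0.6167 + 1.3035 x1 + 0.3387 x2, so y depends on x1 and x2.
To judge which is the stronger predictor, compare the t values and the increase in RSS when each variable is dropped, rather than the size of the standard errors. x1 has the larger t value (5.333 against 2.733 in the full model), and dropping it from the final model raises RSS by 816.41, against 261.58 for x2.
THE RELATIONSHIP WITH THE SUPERVISOR IS THEREFORE THE STRONGEST PREDICTOR, FOLLOWED BY OVERALL WORK ENVIRONMENT
R-squared = 0.9617 for the full model, and the model with x1 and x2 alone still explains about 96% of the variation in y (RSS 397.85 against 370.06), hence the other variables are not needed.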
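Because x1 is rated on a 0-50 scale and x2 on a 0-100 scale, their raw coefficients are not directly comparable. One common way to put them on the same footing is to refit the model on standardized variables. The sketch below assumes that approach; `z`, `m_std`, and `beta` are illustrative names and are not part of the original session.

```r
# Survey data for the two retained predictors, copied from the session above
y  <- c(55,20,85,65,45,70,35,60,95,65,85,10,75,80,50,90,75,45,65)
x1 <- c(27,12,40,35,29,42,22,34,50,33,40,5,37,42,31,47,36,20,32)
x2 <- c(65,13,79,53,43,62,18,75,84,68,72,10,64,82,46,95,82,42,73)

# Center every variable and divide by its standard deviation
z <- as.data.frame(scale(data.frame(y, x1, x2)))

# Slopes of this fit are standardized (beta) coefficients:
# the change in y, in SDs, per one-SD change in the predictor
m_std <- lm(y ~ x1 + x2, data = z)
beta  <- coef(m_std)[c("x1", "x2")]
print(round(beta, 3))
```

The predictor with the larger absolute standardized coefficient has the larger effect per standard deviation, which gives a scale-free check on the ranking above.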