Watson & Watson Repair Inc. provides maintenance service for a large apartment complex in downtown Saint Petersburg, Florida. Because maintenance calls have increased, the managers are considering hiring another maintenance person. Rafael Roddick and Andy Nadal are currently responsible for maintenance tasks. To investigate what drives repair time and to be able to hire the best candidate, the managers hire you as a statistician to conduct a regression analysis. The dependent variable is the repair time of each job, handled by either Rafael or Andy. You also have the time since the last maintenance. Carry out the analysis step by step, adding one variable at a time. First, look at the correlation table to see how the variables are related.
STEP 1: Use the dummy variable REPAIRPERSON = 1 if responsible = RAFAEL and REPAIRPERSON = 0 if responsible = ANDY. Run a regression model using only REPAIRPERSON as the variable to explain REPAIRTIME.
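A minimal sketch of how Step 1 could be set up in R. The ten rows below are invented purely so that the code runs; they are not the assignment data, and the `Responsible` column name and the `repairs` and `fit_step1` object names are assumptions made for illustration.

```r
# Hypothetical data for illustration only (NOT the assignment data).
repairs <- data.frame(
  Repair.time.hours.       = c(2.5, 3.1, 4.6, 2.0, 3.3, 4.9, 4.1, 4.7, 3.6, 4.2),
  Month.since.last.service = c(2, 4, 8, 1, 3, 9, 7, 8, 5, 6),
  Responsible              = c("RAFAEL", "ANDY", "RAFAEL", "ANDY", "RAFAEL",
                               "ANDY", "RAFAEL", "ANDY", "RAFAEL", "ANDY")
)

# Dummy coding from Step 1: 1 = RAFAEL, 0 = ANDY.
repairs$REPAIRPERSON <- ifelse(repairs$Responsible == "RAFAEL", 1, 0)

# Correlation table of the numeric variables.
cor_table <- cor(repairs[, c("Repair.time.hours.",
                             "Month.since.last.service",
                             "REPAIRPERSON")])
print(round(cor_table, 3))

# Step 1 model: repair time explained by the dummy variable alone.
fit_step1 <- lm(Repair.time.hours. ~ REPAIRPERSON, data = repairs)
print(summary(fit_step1))
```

In a dummy-only model like this, the intercept is the mean repair time for the person coded 0 (Andy) and the REPAIRPERSON coefficient is the difference in mean repair time, Rafael minus Andy.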
Fully explain here:
Sol:
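Goodness of fit of the model:
# Regress repair time on months since last service.
# The variables are referenced directly, so the data are assumed to be attached.
fit <- lm(Repair.time.hours. ~ Month.since.last.service)
summary(fit)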
Output:
Call:
lm(formula = Repair.time.hours. ~ Month.since.last.service)
Residuals:
     Min       1Q   Median       3Q      Max
-1.21074 -0.30179 -0.00549  0.43946  1.06436
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               2.03602    0.52847   3.853  0.00486 **
Month.since.last.service  0.32490    0.08824   3.682  0.00620 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6868 on 8 degrees of freedom
Multiple R-squared: 0.6289, Adjusted R-squared: 0.5825
F-statistic: 13.56 on 1 and 8 DF, p-value: 0.0062
-------------------------------------------------------------------------------
## The p-value of the overall F-test (0.0062) is less than 0.05, so the model is statistically significant at the 5% level.
## The model gives R-squared = 0.6289 (adjusted R-squared = 0.5825), so Month.since.last.service explains about 63% of the variability in Repair.time.hours.
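As a cross-check on the summary output, R-squared can be recovered from the F-statistic and its degrees of freedom alone, using the identity R-squared = F*df1 / (F*df1 + df2). The short sketch below uses only numbers printed in the summary; the variable names are chosen here for illustration.

```r
# Values copied from the summary(fit) output.
f_stat <- 13.56
df1 <- 1            # numerator degrees of freedom (one predictor)
df2 <- 8            # residual degrees of freedom
n <- df1 + df2 + 1  # number of observations implied by the degrees of freedom

# R-squared from the F-statistic.
r2 <- (f_stat * df1) / (f_stat * df1 + df2)

# Adjusted R-squared penalises for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2

# With a single predictor, the squared t value of the slope equals F.
t_slope <- 3.682
t_squared <- t_slope^2

cat("R-squared:", round(r2, 4), "\n")
cat("Adjusted R-squared:", round(adj_r2, 4), "\n")
cat("t^2 for the slope:", round(t_squared, 2), "\n")
```

The results should agree with the reported 0.6289, 0.5825 and 13.56 up to rounding of the printed inputs.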
#Residual Vs. Fitted plot to check goodness of fit.
plot(fitted(fit),residuals(fit),col="green",lwd=5)
Output: [residuals vs. fitted plot]
## The residuals vs. fitted plot shows no systematic pattern, which supports the linear model. With only 10 observations this is a rough check.
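A minimal sketch of a slightly fuller residual check, adding a zero reference line and a normal Q-Q plot. The data below are invented so the code runs on its own; they are not the assignment data, and `months_demo`, `hours_demo` and `fit_demo` are hypothetical names.

```r
# Hypothetical data for illustration only (NOT the assignment data).
months_demo <- c(2, 4, 8, 1, 3, 9, 7, 8, 5, 6)
hours_demo  <- c(2.5, 3.1, 4.6, 2.0, 3.3, 4.9, 4.1, 4.7, 3.6, 4.2)

fit_demo <- lm(hours_demo ~ months_demo)
res_demo <- residuals(fit_demo)

# Residuals vs. fitted values: points should scatter evenly around zero.
plot(fitted(fit_demo), res_demo,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals.
qqnorm(res_demo)
qqline(res_demo)
```

A funnel shape or a curve in the first plot would point to non-constant variance or a missing non-linear term; the Q-Q plot addresses the normality assumption behind the t and F tests.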
Statistical significance of the coefficients:
To check the statistical significance of the coefficients, we use ANOVA.
# Hypotheses:
# H01: Month.since.last.service is not a significant variable for the model
# vs.
# H11: Month.since.last.service is a significant variable for the model
# H02: Maintanance.person is not a significant variable for the model
# vs.
# H12: Maintanance.person is a significant variable for the model
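# Fit1 is the two-predictor model; its formula is inferred from the terms in the ANOVA table.
Fit1 <- lm(Repair.time.hours. ~ Month.since.last.service + Maintanance.person)
anova(Fit1)
Output: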
Analysis of Variance Table
Response: Repair.time.hours.
                         Df Sum Sq Mean Sq F value   Pr(>F)
Month.since.last.service  1 6.3954  6.3954 14.9080 0.006203 **
Maintanance.person        1 0.7706  0.7706  1.7964 0.222018
Residuals                 7 3.0030  0.4290
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Conclusion 1: The p-value of Month.since.last.service is 0.006203 < 0.05, so we reject H01 and conclude that Month.since.last.service is a significant variable for the model at the 5% level of significance.
# Conclusion 2: The p-value of Maintanance.person is 0.222018 > 0.05, so we do not reject H02; there is not enough evidence that Maintanance.person is a significant variable for the model at the 5% level of significance.
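The F values and p-values in the ANOVA table can be reproduced by hand from the sums of squares, which makes the two conclusions easy to verify. The sketch below uses only numbers copied from the table; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table.
ss_month  <- 6.3954   # Sum Sq, Month.since.last.service (1 df)
ss_person <- 0.7706   # Sum Sq, Maintanance.person (1 df)
ss_resid  <- 3.0030   # Sum Sq, Residuals
df_resid  <- 7

# Mean square error of the two-predictor model.
ms_resid <- ss_resid / df_resid

# Each term has 1 df, so its Mean Sq equals its Sum Sq.
f_month  <- ss_month / ms_resid
f_person <- ss_person / ms_resid

# Upper-tail probabilities of the F distribution with (1, 7) df.
p_month  <- pf(f_month, 1, df_resid, lower.tail = FALSE)
p_person <- pf(f_person, 1, df_resid, lower.tail = FALSE)

cat("Month.since.last.service: F =", round(f_month, 3), " p =", signif(p_month, 3), "\n")
cat("Maintanance.person:       F =", round(f_person, 3), " p =", signif(p_person, 3), "\n")
```

Note that anova() on an lm fit reports sequential sums of squares, so the Maintanance.person row tests what that variable adds after Month.since.last.service is already in the model.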