13.50 The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved and the number of pieces of large furniture as the independent variables and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and the travel time was an insignificant portion of the hours worked. The data are organized and stored in Moving.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes in this equation.
c. Predict the mean labor hours for moving 500 cubic feet with two large pieces of furniture.
d. Perform a residual analysis on your results and determine whether the regression assumptions are valid.
e. Determine whether there is a significant relationship between labor hours and the two independent variables (the number of cubic feet moved and the number of pieces of large furniture) at the 0.05 level of significance.
f. Determine the p-value in (e) and interpret its meaning.
g. Interpret the meaning of the coefficient of multiple determination in this problem.
h. Determine the adjusted r2.
i. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Indicate the most appropriate regression model for this set of data.
j. Determine the p-values in (i) and interpret their meaning.
k. Construct a 95% confidence interval estimate of the population slope between labor hours and the number of cubic feet moved. How does the interpretation of the slope here differ from that in Problem 12.44 on page 443?
l. What conclusions can you reach concerning labor hours?
I solved this problem using R.
library(dplyr)
library(data.table)
# Read the Moving data set from a file chosen interactively
data=fread(file.choose())
E: Determine whether there is a significant relationship between labor hours and the two independent variables (the number of cubic feet moved and the number of pieces of large furniture) at the 0.05 level of significance.
> cor.test(data$Hours,data$Feet)
Pearson's product-moment correlation
data: data$Hours and data$Feet
t = 16.522, df = 34, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8902786 0.9707785
sample estimates:
cor
0.9429984
From the result above, there is a significant linear relationship between labor hours and the number of cubic feet moved, because the p-value (< 2.2e-16) is less than 0.05.
> cor.test(data$Hours,data$Large)
Pearson's product-moment correlation
data: data$Hours and data$Large
t = 13.105, df = 34, p-value = 7.564e-15
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8360594 0.9554000
sample estimates:
cor
0.9136406
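From the result above, there is a significant linear relationship between labor hours and the number of large pieces of furniture, because the p-value (7.564e-15) is less than 0.05. These two tests are pairwise; the joint test of both independent variables is the overall F test reported at the bottom of the regression summary under F (F = 228.8 on 2 and 33 degrees of freedom, p-value < 2.2e-16), which is also significant at the 0.05 level.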
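As a check on the two correlation tests above, the t statistics can be recomputed from the sample correlations with t = r * sqrt(n - 2) / sqrt(1 - r^2). This is only a sketch: the correlations are copied from the printed output, n = 36 is taken from the problem statement (consistent with df = 34), and the helper name t_from_r is made up for illustration.

```r
# Recompute the cor.test t statistics from the printed correlations.
# Assumption: n = 36 moves, as stated in the problem (df = n - 2 = 34).
n <- 36

# Hypothetical helper: t statistic for testing H0: rho = 0
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

t_feet  <- t_from_r(0.9429984, n)   # Hours vs Feet
t_large <- t_from_r(0.9136406, n)   # Hours vs Large

# Two-sided p-values from the t distribution with n - 2 degrees of freedom
p_feet  <- 2 * pt(abs(t_feet),  df = n - 2, lower.tail = FALSE)
p_large <- 2 * pt(abs(t_large), df = n - 2, lower.tail = FALSE)

print(c(t_feet = t_feet, t_large = t_large))
print(c(p_feet = p_feet, p_large = p_large))
```

The recomputed statistics should agree with the t values printed by cor.test up to rounding.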
F: Determine the p-value in (e) and interpret its meaning.
> model=lm(data$Hours~data$Large +data$Feet)
> summary(model)
Call:
lm(formula = data$Hours ~ data$Large + data$Feet)
Residuals:
Min 1Q Median 3Q Max
-9.2921 -2.1574 0.3798 2.6174 9.2571
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.915221   1.673790  -2.339   0.0255 *
data$Large   4.222834   0.914190   4.619 5.64e-05 ***
data$Feet    0.031924   0.004604   6.934 6.36e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.98 on 33 degrees of freedom
Multiple R-squared: 0.9327, Adjusted R-squared: 0.9287
F-statistic: 228.8 on 2 and 33 DF, p-value: < 2.2e-16
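Interpretation: the overall F test (F = 228.8 on 2 and 33 degrees of freedom) has a p-value below 2.2e-16. If neither independent variable were related to labor hours, an F statistic this large would be practically impossible, so the model as a whole is significant at the 0.05 level. The individual t-test p-values (5.64e-05 for Large and 6.36e-08 for Feet) are also both less than 0.05, so each variable makes a significant contribution to the model.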
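Parts (a) and (c) can be read off the coefficient table in the summary output above: the fitted equation is Hours = -3.915221 + 4.222834 * Large + 0.031924 * Feet. The sketch below plugs in 500 cubic feet and two large pieces of furniture. The coefficients are copied by hand from the printed output, and the function name predict_hours is an illustrative assumption, not part of the original solution.

```r
# Coefficients copied from the summary(model) output above
b0      <- -3.915221   # intercept
b_large <-  4.222834   # hours per additional large piece, Feet held constant
b_feet  <-  0.031924   # hours per additional cubic foot, Large held constant

# Hypothetical helper: predicted mean labor hours from the fitted equation
predict_hours <- function(feet, large) b0 + b_large * large + b_feet * feet

pred <- predict_hours(feet = 500, large = 2)
print(pred)   # about 20.49 labor hours
```

For part (b), each slope is the estimated change in mean labor hours for a one-unit increase in that variable while the other variable is held constant.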
G: Interpret the meaning of the coefficient of multiple determination in this problem
> R2=summary(model)$r.squared
> R2
[1] 0.9327368
Here R2 = 0.9327, which means that about 93% of the variation in labor hours is explained by the two independent variables (cubic feet moved and number of large pieces of furniture).
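The overall F statistic in the model summary can be recovered from R2, which shows how the two quantities are linked: F = (R2 / k) / ((1 - R2) / (n - k - 1)). A minimal sketch, assuming n = 36 moves and k = 2 independent variables as in the problem statement, with R2 copied from the output above:

```r
# Overall F test recomputed from the coefficient of multiple determination
r2 <- 0.9327368   # R2 printed above
n  <- 36          # number of moves (from the problem statement)
k  <- 2           # number of independent variables

f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

print(f_stat)    # close to the 228.8 reported by summary(model)
print(p_value)   # far below 0.05
```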
H: Determine the adjusted r2.
> adj_R2=summary(model)$adj.r.squared
> adj_R2
[1] 0.9286602
Adjusted R2 = 0.9287.
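The adjusted value can also be checked by hand with adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1). The sketch below assumes n = 36 and k = 2 from the problem statement and copies R2 from the earlier output:

```r
# Adjusted R2 recomputed from R2, the sample size, and the number of predictors
r2 <- 0.9327368   # R2 from summary(model)
n  <- 36          # number of moves
k  <- 2           # number of independent variables

adj_r2_check <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2_check)   # should match summary(model)$adj.r.squared up to rounding
```

The adjustment penalizes R2 for the number of independent variables, so it is always slightly smaller than R2 when k is at least 1.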
K: Construct a 95% confidence interval
> confint(model)
                  2.5 %      97.5 %
(Intercept) -7.32057240 -0.50987045
data$Large   2.36289934  6.08276843
data$Feet    0.02255721  0.04129148
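The interval for the Feet slope can be reproduced as estimate ± t critical value × standard error, using 33 residual degrees of freedom. This is a sketch based on the rounded estimate and standard error printed in the coefficient table, so the last digits may differ slightly from confint(model):

```r
# 95% confidence interval for the Feet slope, computed by hand
est <- 0.031924   # slope estimate for Feet (from the coefficient table)
se  <- 0.004604   # its standard error
df  <- 33         # residual degrees of freedom

t_crit  <- qt(0.975, df)                   # two-sided 95% critical value
ci_feet <- est + c(-1, 1) * t_crit * se    # lower and upper limits
print(ci_feet)
```

Holding the number of large pieces of furniture constant, each additional cubic foot moved is estimated to add between about 0.023 and 0.041 labor hours. Unlike a simple regression slope, this slope is a net effect that takes the other independent variable into account.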