13.50 The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved and the number of pieces of large furniture as the independent variables and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and the travel time was an insignificant portion of the hours worked. The data are organized and stored in Moving.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes in this equation.
c. Predict the mean labor hours for moving 500 cubic feet with two large pieces of furniture.
d. Perform a residual analysis on your results and determine whether the regression assumptions are valid.
e. Determine whether there is a significant relationship between labor hours and the two independent variables (the number of cubic feet moved and the number of pieces of large furniture) at the 0.05 level of significance.
f. Determine the p-value in (e) and interpret its meaning.
g. Interpret the meaning of the coefficient of multiple determination in this problem.
h. Determine the adjusted r^2.
i. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Indicate the most appropriate regression model for this set of data.
j. Determine the p-values in (i) and interpret their meaning.
k. Construct a 95% confidence interval estimate of the population slope between labor hours and the number of cubic feet moved. How does the interpretation of the slope here differ from that in Problem 12.44 on page 443?
l. What conclusions can you reach concerning labor hours?
a)
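The multiple regression equation with two independent variables is defined as
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
where $\hat{Y}_i$ is the predicted labor hours, $X_{1i}$ is the number of cubic feet moved (Feet), and $X_{2i}$ is the number of pieces of large furniture (Large).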
The regression analysis is carried out in Excel with the following steps.
Step 1: Enter the data values in Excel.
Step 2: Select DATA > Data Analysis > Regression > OK.
Step 3: Set Input Y Range to the 'Hours' column and Input X Range to the 'Feet' and 'Large' columns, tick Residuals and Residual Plots for the residual analysis in part d), then click OK.
Excel then produces the regression output used in the parts that follow.
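The Excel fit can also be reproduced programmatically. The following is a minimal Python sketch of an ordinary least squares fit with two predictors, solved through the normal equations. The function name `fit_two_predictor_ols` and the six data rows are made up for illustration. They are not the Moving data set, so the coefficients printed here are not the ones from the Excel output.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Augmented matrix [X'X | X'y] for the 3x3 normal equations.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Hypothetical data (NOT the Moving file): hours follow an exact linear rule
# so the fit should recover intercept 1.0 and slopes 0.03 and 4.0.
feet = [300, 450, 500, 620, 700, 820]
large = [1, 2, 2, 4, 3, 5]
hours = [1.0 + 0.03 * f + 4.0 * l for f, l in zip(feet, large)]

b0, b1, b2 = fit_two_predictor_ols(feet, large, hours)
print(b0, b1, b2)
```

With the actual 36 rows from the Moving file in place of the hypothetical lists, the same function would be expected to return the intercept and slopes that Excel reports.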
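The fitted regression equation is
$$\widehat{\text{Hours}} = b_0 + 0.0319\,\text{Feet} + 4.2228\,\text{Large}$$
where the two slopes are the coefficients used in part b) and the intercept $b_0$ is read from the Excel output.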
b)
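For the independent variable Feet,
Holding the number of large pieces of furniture constant, each additional cubic foot moved is estimated to increase the mean labor hours by 0.0319.
For the independent variable Large,
Holding the number of cubic feet moved constant, each additional large piece of furniture is estimated to increase the mean labor hours by 4.2228.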
c)
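To predict the mean labor hours for 500 cubic feet and two large pieces of furniture, substitute Feet = 500 and Large = 2 into the fitted regression equation from part a).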
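The substitution can be sketched in Python as follows. The slopes 0.0319 and 4.2228 are the values from part b). The intercept is not listed in this solution, so the hypothetical helper `predict_hours` takes it as an argument, and the example only evaluates the part of the prediction that comes from the two slope terms.

```python
# Slopes reported in part b) of this solution.
SLOPE_FEET = 0.0319
SLOPE_LARGE = 4.2228


def predict_hours(intercept, feet, large):
    """Predicted mean labor hours from the fitted two-predictor equation."""
    return intercept + SLOPE_FEET * feet + SLOPE_LARGE * large


# Contribution of the slope terms alone; the intercept is set to 0.0 only as
# a placeholder and must be replaced by the value from the Excel output.
slope_part = predict_hours(0.0, 500, 2)
print(round(slope_part, 4))  # 0.0319*500 + 4.2228*2 = 24.3956
```

Adding the intercept from the Excel output to 24.3956 gives the predicted mean labor hours for this move.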
d)
The residual plots are obtained in Excel from the regression run in part a).
Since the residuals are randomly dispersed around zero with no apparent pattern, the assumptions of the multiple linear regression model appear to be valid for these data.
e)
From the regression model summary,
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 2 | 7248.705601 | 3624.353 | 228.8049 | 4.55335E-20 |
Residual | 33 | 522.7318989 | 15.84036 | | |
Total | 35 | 7771.4375 | | | |
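The Significance F value, which is the p-value of the overall F test (4.55335E-20), is less than the 0.05 level of significance. Hence the null hypothesis that both population slopes are zero is rejected, and there is a significant relationship between labor hours and the two independent variables.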
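As a check on the ANOVA output, the F statistic and its p-value can be recomputed from the sums of squares. The sketch below uses only the values in the table. For the p-value it relies on the closed form of the F distribution's upper tail that holds when the numerator degrees of freedom equal 2, which is the case here but would not apply to a model with a different number of predictors.

```python
# Values taken from the ANOVA table.
ssr, sse = 7248.705601, 522.7318989
df_reg, df_res = 2, 33

msr = ssr / df_reg          # mean square regression
mse = sse / df_res          # mean square error (residual)
f_stat = msr / mse

# Upper-tail probability of F(2, df_res); this closed form is valid only
# because the numerator degrees of freedom are exactly 2.
p_value = (1 + df_reg * f_stat / df_res) ** (-df_res / 2)

print(f_stat, p_value)
print(p_value < 0.05)       # True means reject the null hypothesis at 0.05
```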
f)
P-value | 4.55335E-20 |
The p-value is the probability of obtaining an F statistic of 228.8049 or larger if there were no linear relationship between labor hours and the two independent variables. Because the p-value (4.55335E-20) is far below the 0.05 level of significance, the null hypothesis is rejected and we conclude that at least one of the independent variables is related to labor hours.
g)
The coefficient of multiple determination is the R squared value in the regression model. From the regression model in part a)
R Square | 0.932736781 |
This means that 93.27% of the variation in labor hours is explained by the variation in the number of cubic feet moved and the number of pieces of large furniture.
h)
Adjusted R Square | 0.928660223 |
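The adjusted R-squared modifies R-squared to account for the number of predictors in the model and the sample size. Unlike R-squared, which never decreases when a variable is added, the adjusted value increases only if the added independent variable improves the model by more than would be expected by chance. Here, 92.87% of the variation in labor hours is explained by the model after this adjustment.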
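Both values can be recomputed from the sums of squares in the ANOVA table of part e). The sketch below assumes n = 36 moves and k = 2 independent variables, as stated in the problem, and uses the standard adjustment 1 - (1 - r^2)(n - 1)/(n - k - 1).

```python
# Sums of squares from the ANOVA table in part e).
ssr, sse = 7248.705601, 522.7318989
sst = ssr + sse

n, k = 36, 2  # number of moves and number of independent variables

r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 6), round(adj_r_squared, 6))
```

The gap between the two values is small here because the sample size is large relative to the number of predictors.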
i)
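Each slope is tested by calculating the t statistic and the corresponding p-value for the estimated slope in the regression model, as shown below.
Null Hypothesis: $H_0: \beta_j = 0$ (independent variable $j$ makes no contribution to the model, given that the other variable is included)
Alternate Hypothesis: $H_1: \beta_j \neq 0$ (independent variable $j$ makes a significant contribution)
The p-values are obtained from the regression analysis in part a).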
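Variable | P-value | Comparison | Significance level | Decision |
Feet | 6.36E-08 | < | 0.05 | Null hypothesis is rejected, hence a significant contribution |
Large | 5.64E-05 | < | 0.05 | Null hypothesis is rejected, hence a significant contribution |
Both independent variables make a significant contribution, so the most appropriate regression model for this set of data includes both Feet and Large.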
j)
Variable | P-value |
Feet | 6.36E-08 |
Large | 5.64E-05 |
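The p-value is the probability of obtaining a t statistic at least as extreme as the one observed if the null hypothesis (that the population slope is zero) is true. For Feet this probability is 6.36E-08 and for Large it is 5.64E-05. Both are far below the 0.05 level of significance, so results this extreme would be very unlikely if either variable made no contribution to the model.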
k)
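From the regression output, the 95% confidence interval estimates of the population slopes are:
Variable | Lower 95% | Upper 95% |
Feet | 0.0226 | 0.0413 |
Large | 2.3629 | 6.0828 |
For the number of cubic feet moved, the interval is 0.0226 to 0.0413: holding the number of large pieces of furniture constant, each additional cubic foot moved is estimated to increase the mean labor hours by between 0.0226 and 0.0413. Unlike the slope in a simple linear regression such as Problem 12.44, this slope measures the effect of cubic feet moved after accounting for the number of large pieces of furniture.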
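The intervals can be cross-checked against the slopes in part b) and the tests in part i). The sketch below works backwards from each interval to the point estimate and the standard error it implies. It assumes the usual form estimate ± t × standard error with 33 residual degrees of freedom, and the critical value 2.0345 is taken from a t table rather than from the Excel output. The helper name `summarize_interval` is made up for illustration.

```python
T_CRIT = 2.0345  # assumed two-tailed 5% critical value of t with 33 df

# 95% confidence limits for each slope, from the Excel output.
intervals = {"Feet": (0.0226, 0.0413), "Large": (2.3629, 6.0828)}


def summarize_interval(lower, upper, t_crit=T_CRIT):
    """Return (estimate, standard error, t statistic) implied by an interval."""
    estimate = (lower + upper) / 2             # interval is centered on the slope
    std_err = (upper - lower) / (2 * t_crit)   # half-width = t_crit * std_err
    return estimate, std_err, estimate / std_err


for name, (lower, upper) in intervals.items():
    estimate, std_err, t_stat = summarize_interval(lower, upper)
    excludes_zero = lower > 0 or upper < 0
    print(name, round(estimate, 4), round(std_err, 4), round(t_stat, 2), excludes_zero)
```

Both intervals exclude zero, which agrees with the rejection of the null hypothesis for each slope in part i). Because the limits are rounded to four decimals, the recovered midpoints match the slopes in part b) only approximately.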
Conclusion:
The two independent variables, the number of cubic feet moved and the number of pieces of large furniture, both significantly explain the total number of labor hours required to complete a move.